Re: [Qemu-devel] [PATCH 0/2] Active commit regression fix

2016-02-01 Thread Eric Blake
On 01/29/2016 10:17 PM, Jeff Cody wrote:
> Bug #1300209 is a regression in 2.5, introduced during the 
> change away from bdrv_swap().
> 
> When we change the parent backing link (change_parent_backing_link),
> we must also accommodate non-NULL tqe_prev pointers that point to a
> NULL entry.  Please see patch #1 for more details.
> 
> Jeff Cody (2):
>   block: change parent backing link when *tqe_prev == NULL
>   block: qemu-iotests - add test for snapshot, commit, snapshot bug

Series:
Reviewed-by: Eric Blake 

> 
>  block.c|   2 +-
>  tests/qemu-iotests/143 | 114 +
>  tests/qemu-iotests/143.out |  24 ++
>  tests/qemu-iotests/group   |   1 +
>  4 files changed, 140 insertions(+), 1 deletion(-)
>  create mode 100755 tests/qemu-iotests/143
>  create mode 100644 tests/qemu-iotests/143.out
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org



signature.asc
Description: OpenPGP digital signature


Re: [Qemu-devel] [Qemu-arm] Does QEMU support AArch64 Big Endian emulation on x86-64 host?

2016-02-01 Thread Peter Crosthwaite
On Mon, Feb 1, 2016 at 3:25 AM, Ruslan Bilovol  wrote:
> On Wed, Jan 27, 2016 at 7:39 PM, Peter Crosthwaite
>  wrote:
>> On Tue, Jan 26, 2016 at 4:05 AM, Ruslan Bilovol
>>  wrote:
>>> On Mon, Jan 25, 2016 at 6:07 PM, Peter Maydell  
>>> wrote:
>>>> On 25 January 2016 at 13:51, Ruslan Bilovol wrote:
>>>>> I'm trying to run AArch64 Big Endian image under QEMU that I built for
>>>>> my x86-64 Ubuntu host from latest master branch and when I'm running
>>>>> kernel I'm getting next error:
>>>>>  > qemu: fatal: Trying to execute code outside RAM or ROM at 0x50020880
>>>>>
>>>>> Similar image built as Little Endian runs fine with same QEMU tool
>>>>> (qemu-system-aarch64)
>>>>>
>>>>> So I'm wondering is it possible to run QEMU AArch64 Big Endian
>>>>> emulation on x86-64 host?
>>>>
>>>> It is not currently supported, no. However there are some patches
>>>> on the list[*] to add this support, so I expect a future QEMU version
>>>> will do this.
>>>>
>>>> [*] https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg03025.html
>>>
>>> Thank you for the quick answer.
>>> I tried to apply this patch series to latest qemu master branch, but it
>>> fails to apply in misc places. Peter Crosthwaite, could you please say
>>> which commit ID is it based on?
>>>
>>
>> I pushed the branch just now to here:
>>
>> https://github.com/pcrost/qemu/tree/arm-be.next
>
> Thanks Peter.
>
> I've built qemu on this branch and tried to run it with my
> AArch64 Big Endian binaries but still get same error:
> "qemu: fatal: Trying to execute code outside RAM or
> ROM at 0x7fe60d8e0a30"
>
> So it looks like (unfortunately) these patches do not enable
> AArch64 BE support
>

You need to provide more information, including your QEMU command line
and possibly a link to your binary. Are you using an ELF or a raw
image?

Regards,
Peter

> Best regards,
> Ruslan



Re: [Qemu-devel] [PATCH] fdc: fix detection under Linux

2016-02-01 Thread Hervé Poussineau

On 29/01/2016 23:35, John Snow wrote:

> Accidentally, I removed a "feature" where empty drives had geometry
> values applied to them, which allows seek on empty drives to work
> "by accident," as QEMU actually tries to disallow that.
>
> Seeks on empty drives should work, though, but the easiest thing is to
> restore the misfeature where empty drives have non-zero geometries
> applied.
>
> Document the hack accordingly.
>
> Signed-off-by: John Snow
> ---
>   hw/block/fdc.c | 16 
>   1 file changed, 16 insertions(+)


Tested-by: Hervé Poussineau 





[Qemu-devel] [PATCH] kvm-all: trace: strerror fixup

2016-02-01 Thread Andrew Jones
Signed-off-by: Andrew Jones 
---
 kvm-all.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/kvm-all.c b/kvm-all.c
index 9148889921197..330f509a0da84 100644
--- a/kvm-all.c
+++ b/kvm-all.c
@@ -2362,7 +2362,7 @@ int kvm_set_one_reg(CPUState *cs, uint64_t id, void *source)
     reg.addr = (uintptr_t) source;
     r = kvm_vcpu_ioctl(cs, KVM_SET_ONE_REG, &reg);
     if (r) {
-        trace_kvm_failed_reg_set(id, strerror(r));
+        trace_kvm_failed_reg_set(id, strerror(-r));
     }
     return r;
 }
@@ -2376,7 +2376,7 @@ int kvm_get_one_reg(CPUState *cs, uint64_t id, void *target)
     reg.addr = (uintptr_t) target;
     r = kvm_vcpu_ioctl(cs, KVM_GET_ONE_REG, &reg);
     if (r) {
-        trace_kvm_failed_reg_get(id, strerror(r));
+        trace_kvm_failed_reg_get(id, strerror(-r));
     }
     return r;
 }
-- 
2.4.3
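
The patch carries no commit message, so here is a minimal sketch of the convention it appears to rely on: the ioctl wrapper reports failure as a negative errno value, while strerror() expects the positive errno. The helper names below (failing_call, error_text, strerror_sign_demo) are hypothetical and are not QEMU functions.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

/* Hypothetical stand-in for a wrapper that reports failure as -errno. */
static int failing_call(void)
{
    return -EINVAL;
}

/* Negate the return value before handing it to strerror(), which
 * expects a positive errno value. */
static const char *error_text(int r)
{
    return strerror(-r);
}

/* Small self-check: the negated return value maps to the EINVAL text. */
static void strerror_sign_demo(void)
{
    int r = failing_call();

    assert(r < 0);
    assert(strcmp(error_text(r), strerror(EINVAL)) == 0);
}
```

Passing the raw negative value to strerror() would typically yield an "Unknown error" style string, which is what the trace output would have shown before the fix.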




Re: [Qemu-devel] [PATCH v8] spec: add qcow2 bitmaps extension specification

2016-02-01 Thread Max Reitz
On 27.01.2016 16:52, Vladimir Sementsov-Ogievskiy wrote:
> The new feature for qcow2: storing bitmaps.
> 
> This patch adds new header extension to qcow2 - Bitmaps Extension. It
> provides an ability to store virtual disk related bitmaps in a qcow2
> image. For now there is only one type of such bitmaps: Dirty Tracking
> Bitmap, which just tracks virtual disk changes from some moment.
> 
> Note: Only bitmaps, relative to the virtual disk, stored in qcow2 file,
> should be stored in this qcow2 file. The size of each bitmap
> (considering its granularity) is equal to virtual disk size.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>

The semantics are good, I just have more grammar nitpicks from here on. :-)

[...]

> 
>  docs/specs/qcow2.txt | 225 ++-
>  1 file changed, 224 insertions(+), 1 deletion(-)
> 
> diff --git a/docs/specs/qcow2.txt b/docs/specs/qcow2.txt
> index f236d8c..7b0ebef 100644
> --- a/docs/specs/qcow2.txt
> +++ b/docs/specs/qcow2.txt

[...]

> +12 - 15:flags
> +Bit
> +  0: in_use
> + The bitmap was not saved correctly and may be
> + inconsistent.
> +
> +  1: auto
> + The bitmap must reflect all changes of the virtual
> + disk by any application that would write to this qcow2
> + file (including writes, snapshot switching, etc.). The
> + type of this bitmap must be 'dirty tracking bitmap'.
> +
> +  2: extra_data_compatible
> + This flags is meaningful when extra data is unknown for

s/for/to/, and probably also "the extra data".

> + the software (currently any extra data is unknown for

s/for/to/

> + Qemu).
> + If it is set, the bitmap may be used as expected, extra
> + data must be left as is.
> + If it is not set, the bitmap must not be used, but left
> + as is with extra data.

Maybe s/with/along with its/ would sound better; or "but both it and its
extra data be left as is".

> +
> +Bits 3 - 31 are reserved and must be 0.
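
A minimal sketch of how a reader of the image might interpret the flag bits quoted above. The bit positions come from the quoted spec text; the macro and function names are made up for illustration, and treating an in_use bitmap or a bitmap with reserved bits set as unusable is an assumption of this sketch, not something the quoted text mandates.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the quoted spec text; names are illustrative. */
#define SKETCH_FLAG_IN_USE                (1u << 0)
#define SKETCH_FLAG_AUTO                  (1u << 1)
#define SKETCH_FLAG_EXTRA_DATA_COMPATIBLE (1u << 2)
#define SKETCH_FLAGS_RESERVED             (~0x7u)   /* bits 3 - 31 */

/* Reserved bits must be zero. */
static bool sketch_flags_valid(uint32_t flags)
{
    return (flags & SKETCH_FLAGS_RESERVED) == 0;
}

/* Decide whether the bitmap contents may be used.
 * extra_data_understood says whether this software knows the extra data. */
static bool sketch_bitmap_usable(uint32_t flags, bool extra_data_understood)
{
    if (!sketch_flags_valid(flags)) {
        return false;   /* assumption: reject unknown flag bits */
    }
    if (flags & SKETCH_FLAG_IN_USE) {
        return false;   /* assumption: possibly inconsistent, do not trust */
    }
    if (!extra_data_understood &&
        !(flags & SKETCH_FLAG_EXTRA_DATA_COMPATIBLE)) {
        return false;   /* unknown extra data and not marked compatible */
    }
    return true;
}
```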

[...]

> +=== Dirty tracking bitmaps ===
> +
> +Bitmaps with 'type' field equal to one are dirty tracking bitmaps.
> +
> +When the virtual disk is in use dirty tracking bitmap may be 'enabled' or
> +'disabled'. While the bitmap is 'enabled', all writes to the virtual disk
> +should be reflected in the bitmap. Set bit in the bitmap means that the

s/Set bit/A set bit/

> +corresponding range of the virtual disk (see above) was written while the

Maybe s/written/written to/, but that's optional ("written" sounds to me
like an allocating write, or as if everything in that range was
overwritten).

> +bitmap was 'enabled'. Unset bit means that this range was not written.

s/Unset bit/An unset bit/, and again maybe s/written/written to/.

> +
> +The software should not sync the bitmap in the image file with its
> +representation in RAM after each write. Flag 'in_use' should be set while the
> +bitmap is not synced.
> +
> +In the image file the 'enabled' state is reflected by 'auto' flag. If this flag

s/'auto' flag/the 'auto' flag/

> +is set, the software must consider the bitmap as 'enabled' and start tracking
> +virtual disk changes to this bitmap from the first write to the virtual disk. If
> +this flag is not set then the bitmap is constant.

s/constant/'disabled'/? It's basically the same, but "disabled" is what
you used above.

> +
> +To maintain the bitmap consistent, the only software is allowed to change the
> +value of 'auto' flag: the software which was created the bitmap.

I'd prefer:

To maintain bitmap consistency, the only software which is allowed to
change the value of the 'auto' flag is the one which has created the bitmap.

>   The other
> +software must not change this flag, it only tracks changes to this bitmap, if
> +'auto' flag is set and ignores the bitmap otherwise.

I'd drop the second part and shorten it to:

Any other software must not change this flag.

Or just drop it completely. The previous sentence completely suffices to
tell that no other program is allowed to modify it; and I found the
second part ("it only tracks...") confusing because I had to wonder
about what the "it" referred to, and because it's superfluous. It's just
repeating how any program is supposed to handle such a bitmap anyway.

Max





[Qemu-devel] [PATCH V2 1/2] ARM: PL061: Clear PL061 device state after reset

2016-02-01 Thread Wei Huang
Current QEMU doesn't clear PL061 state after reset. This causes a
weird issue with guest reboot via GPIO. Here is the device state
description with two reboot requests:

  (PL061State fields)             data   old_in_data   istate
  VM boot                         0      0             0
  After 1st ACPI reboot request   8      8             8
  After VM PL061 driver ACK       8      8             0
  After VM reboot                 8      8             0

  2nd ACPI reboot request         8

In the second reboot request above, because the old_in_data field is still 8,
QEMU decides that there is already a pending edge IRQ on the input (see
pl061_update()), so it doesn't raise the IRQ again. As a result,
the second reboot request is lost. The correct fix is to clear the PL061
device state on reset.
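
The lost edge can be sketched with a much simplified model. This is not the pl061.c code: the struct, the function names, and the rising-edge-only behaviour are assumptions chosen to mirror the table above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Simplified stand-in for the three PL061State fields in the table. */
struct gpio_sketch {
    uint8_t data;        /* current input level */
    uint8_t old_in_data; /* level seen at the previous update */
    uint8_t istate;      /* latched interrupt state */
};

/* Latch a rising-edge interrupt only for bits that changed since the
 * previous update. Returns true if an interrupt was latched. */
static bool gpio_sketch_set_input(struct gpio_sketch *s, uint8_t level)
{
    uint8_t changed;

    s->data = level;
    changed = s->old_in_data ^ s->data;
    s->old_in_data = s->data;
    if (changed & s->data) {
        s->istate |= changed & s->data;
        return true;
    }
    return false;
}

/* Guest driver acknowledges the interrupt. */
static void gpio_sketch_ack(struct gpio_sketch *s)
{
    s->istate = 0;
}

/* Device reset that clears all state, as the patch proposes. */
static void gpio_sketch_reset(struct gpio_sketch *s)
{
    memset(s, 0, sizeof(*s));
}

static void gpio_sketch_demo(void)
{
    struct gpio_sketch s = {0};

    assert(gpio_sketch_set_input(&s, 8));   /* 1st reboot request is seen */
    gpio_sketch_ack(&s);
    /* Guest reboots here, but the device state is not cleared. */
    assert(!gpio_sketch_set_input(&s, 8));  /* 2nd request: no edge, lost */

    gpio_sketch_reset(&s);                  /* with a clearing reset ... */
    assert(gpio_sketch_set_input(&s, 8));   /* ... the request is seen */
}
```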

NOTE: The reset state is found from the following documentation:
 - PL061 Technical Reference Manual
 - Stellaris LM3S8962 Microcontroller Data Sheet
 - Stellaris LM3S5P31 Microcontroller Data Sheet

Signed-off-by: Wei Huang 
---
 hw/gpio/pl061.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/gpio/pl061.c b/hw/gpio/pl061.c
index e5a696e..342a70d 100644
--- a/hw/gpio/pl061.c
+++ b/hw/gpio/pl061.c
@@ -284,8 +284,35 @@ static void pl061_write(void *opaque, hwaddr offset,
 
 static void pl061_reset(PL061State *s)
 {
-  s->locked = 1;
-  s->cr = 0xff;
+/* reset values from PL061 TRM, Stellaris LM3S5P31 & LM3S8962 Data Sheet */
+s->data = 0;
+s->old_out_data = 0;
+s->old_in_data = 0;
+s->dir = 0;
+s->isense = 0;
+s->ibe = 0;
+s->iev = 0;
+s->im = 0;
+s->istate = 0;
+s->afsel = 0;
+s->dr2r = 0xff;
+s->dr4r = 0;
+s->dr8r = 0;
+s->odr = 0;
+s->pur = 0;
+s->pdr = 0;
+s->slr = 0;
+s->den = 0;
+s->locked = 1;
+s->cr = 0xff;
+s->amsel = 0;
+}
+
+static void pl061_state_reset(DeviceState *dev)
+{
+PL061State *s = PL061(dev);
+
+pl061_reset(s);
 }
 
 static void pl061_set_irq(void * opaque, int irq, int level)
@@ -343,6 +370,7 @@ static void pl061_class_init(ObjectClass *klass, void *data)
 
 k->init = pl061_initfn;
 dc->vmsd = &vmstate_pl061;
+dc->reset = &pl061_state_reset;
 }
 
 static const TypeInfo pl061_info = {
-- 
1.8.3.1




[Qemu-devel] [PATCH V2 2/2] ARM: PL061: Cleaning field of PL061 device state

2016-02-01 Thread Wei Huang
This patch removes the float_high field of PL061State, which doesn't
seem to be used anywhere. Because this changes the device state, the
version ID is also bumped for compatibility reasons.

Signed-off-by: Wei Huang 
---
 hw/gpio/pl061.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/gpio/pl061.c b/hw/gpio/pl061.c
index 342a70d..5e2abe6 100644
--- a/hw/gpio/pl061.c
+++ b/hw/gpio/pl061.c
@@ -56,7 +56,6 @@ typedef struct PL061State {
 uint32_t slr;
 uint32_t den;
 uint32_t cr;
-uint32_t float_high;
 uint32_t amsel;
 qemu_irq irq;
 qemu_irq out[8];
@@ -65,8 +64,8 @@ typedef struct PL061State {
 
 static const VMStateDescription vmstate_pl061 = {
 .name = "pl061",
-.version_id = 3,
-.minimum_version_id = 3,
+.version_id = 4,
+.minimum_version_id = 4,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(locked, PL061State),
 VMSTATE_UINT32(data, PL061State),
@@ -88,7 +87,6 @@ static const VMStateDescription vmstate_pl061 = {
 VMSTATE_UINT32(slr, PL061State),
 VMSTATE_UINT32(den, PL061State),
 VMSTATE_UINT32(cr, PL061State),
-VMSTATE_UINT32(float_high, PL061State),
 VMSTATE_UINT32_V(amsel, PL061State, 2),
 VMSTATE_END_OF_LIST()
 }
-- 
1.8.3.1




Re: [Qemu-devel] [iGVT-g] VFIO based vGPU(was Re: [Announcement] 2015-Q3 release of XenGT - a Mediated ...)

2016-02-01 Thread Alex Williamson
On Mon, 2016-02-01 at 14:10 +0100, Gerd Hoffmann wrote:
>   Hi,
> 
> > > Unfortunately it's not the only one. Another example is, device-model
> > > may want to write-protect a gfn (RAM). In case that this request goes
> > > to VFIO .. how it is supposed to reach KVM MMU?
> > 
> > Well, let's work through the problem.  How is the GFN related to the
> > device?  Is this some sort of page table for device mappings with a base
> > register in the vgpu hardware?
> 
> IIRC this is needed to make sure the guest can't bypass execbuffer
> verification and works like this:
> 
>   (1) guest submits execbuffer.
>   (2) host makes execbuffer readonly for the guest
>   (3) verify the buffer (make sure it only accesses resources owned by
>   the vm).
>   (4) pass on execbuffer to the hardware.
>   (5) when the gpu is done with it make the execbuffer writable again.

Ok, so are there opportunities to do those page protections outside of
KVM?  We should be able to get the vma for the buffer, can we do
something with that to make it read-only?  Alternatively, can the vgpu
driver copy it to a private buffer and hardware can execute from that?
I'm not a virtual memory expert, but it doesn't seem like an
insurmountable problem.  Thanks,

Alex
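
The "copy it to a private buffer" alternative can be sketched as follows. This is only an illustration of the copy-then-verify idea: the verification policy (every 32-bit word is a resource id and the VM owns ids below a limit) and all names are hypothetical, and nothing here reflects the actual vGPU driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical policy: each word names a resource id, and the VM owns
 * ids strictly below owned_limit. */
static bool sketch_verify(const uint32_t *cmds, size_t n, uint32_t owned_limit)
{
    for (size_t i = 0; i < n; i++) {
        if (cmds[i] >= owned_limit) {
            return false;
        }
    }
    return true;
}

/* Copy the guest-visible buffer first, then verify the private copy, so
 * later guest writes cannot change what was verified. Returns the private
 * copy (caller frees) or NULL on allocation or verification failure. */
static uint32_t *sketch_snapshot_and_verify(const uint32_t *guest_buf,
                                            size_t n, uint32_t owned_limit)
{
    uint32_t *priv = malloc(n * sizeof(*priv));

    if (!priv) {
        return NULL;
    }
    memcpy(priv, guest_buf, n * sizeof(*priv));
    if (!sketch_verify(priv, n, owned_limit)) {
        free(priv);
        return NULL;
    }
    return priv;
}
```

With this shape the guest buffer never needs to be write-protected, at the cost of one copy per submission.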




Re: [Qemu-devel] [PATCH] jobs: remove unused structure

2016-02-01 Thread Eric Blake
On 02/01/2016 03:18 PM, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  blockjob.c | 8 
>  1 file changed, 8 deletions(-)

Reviewed-by: Eric Blake 

> 
> diff --git a/blockjob.c b/blockjob.c
> index 80adb9d..a692142 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -278,14 +278,6 @@ void block_job_iostatus_reset(BlockJob *job)
>  }
>  }
>  
> -struct BlockFinishData {
> -BlockJob *job;
> -BlockCompletionFunc *cb;
> -void *opaque;
> -bool cancelled;
> -int ret;
> -};
> -
>  static int block_job_finish_sync(BlockJob *job,
>   void (*finish)(BlockJob *, Error **errp),
>   Error **errp)
> 

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [PATCH v2 0/3] SD emulation fixes for Pi2 Tianocore EDK2 UEFI

2016-02-01 Thread Andrew Baumann
> From: Peter Maydell [mailto:peter.mayd...@linaro.org]
> Sent: Monday, 25 January 2016 10:37
> 
> On 25 January 2016 at 18:06, Andrew Baumann
>  wrote:
> > This is the most recent version of the patch series. However,
> > there was an unresolved question about migration compatibility
> > for the vmstate layout (patch 2/3).
> 
> Ah yes, that got lost in the Christmas holiday shuffle I think.
> I've replied to that patch with my opinion.
> 
> > The other two patches in the series already got a Reviewed-By from
> > Peter C. Once we nail the migration issue, I can rebase the patches
> > and they should be ready to go.
> 
> You might find it easier to rebase on top of my sd card QOMification
> series:
>   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04425.html
> 
> as that will probably get into master first, though hopefully there
> won't be any major clashes with this series.

Thanks. I've just sent v3 to the list, which hopefully solves the migration
problem. It is based on your sd qom series, but as you expected, the
conflicts are really minor.

Andrew


Re: [Qemu-devel] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Alex Williamson
On Mon, 2016-02-01 at 13:49 +0100, Gerd Hoffmann wrote:
> > > Maybe we should define the interface as "guest writes 0xfc to pick
> > > address, qemu takes care to place opregion there".  That gives us the
> > > freedom to change the qemu implementation (either copy host opregion or
> > > map the host opregion) without breaking things.
> > 
> > Ok, so seabios allocates two pages, writes the base address of those
> > pages to 0xfc and looks to see whether the signature appears at that
> > address due to qemu mapping.  It verifies the size and does a
> > free/realloc if not the right size.
> 
> I think seabios first needs to reserve something big enough for a
> temporary mapping, to check signature + size, otherwise the opregion
> might scratch data structures beyond opregion in case it happens to be
> larger than 8k.
> 
> How likely is it that the opregion size ever changes?  Should we better
> be prepared to handle it?  Or would it be ok to have a ...
> 
>    if (opregion_size > 8k)
>   panic();
> 
> ... style sanity check?
> 

The patch below is what I'm working with now, it assumes that the
opregion is 8K, maps, verifies, and re-allocs if it's a different size.
Maybe it is safer to abort if it is over 8K, but we're not actually
clobbering anything with the mapping, we're just temporarily mapping
over it.  So if there's not another thread of execution that could be
accessing something there and we're not stepping on our own stack or
data, it doesn't seem like there's a problem.

diff --git a/src/fw/pciinit.c b/src/fw/pciinit.c
index c31c2fa..4f3251e 100644
--- a/src/fw/pciinit.c
+++ b/src/fw/pciinit.c
@@ -257,6 +257,52 @@ static void ich9_smbus_setup(struct pci_device *dev, void *
 pci_config_writeb(bdf, ICH9_SMB_HOSTC, ICH9_SMB_HOSTC_HST_EN);
 }
 
+static void intel_igd_opregion_setup(struct pci_device *dev, void *arg)
+{
+u16 bdf = dev->bdf;
+u32 orig;
+void *opregion;
+int size = 8;
+
+if (!CONFIG_QEMU)
+return;
+
+orig = pci_config_readl(bdf, 0xFC);
+
+realloc:
+opregion = malloc_high(size * 1024);
+if (!opregion) {
+warn_noalloc();
+return;
+}
+
+/*
+ * QEMU maps the OpRegion into system memory at the address written here,
+ * this overlaps our malloc, which marks the range e820 reserved.
+ */
+pci_config_writel(bdf, 0xFC, cpu_to_le32((u32)opregion));
+
+if (memcmp(opregion, "IntelGraphicsMem", 16)) {
+pci_config_writel(bdf, 0xFC, orig);
+free(opregion);
+return; /* the opregion didn't magically appear, not supported */
+}
+
+if (size == le32_to_cpu(*(u32 *)(opregion + 16))) {
+dprintf(1, "Intel IGD OpRegion enabled on %02x:%02x.%x\n",
+pci_bdf_to_bus(bdf), pci_bdf_to_dev(bdf), pci_bdf_to_fn(bdf));
+return; /* success! */
+}
+
+pci_config_writel(bdf, 0xFC, orig);
+free(opregion);
+
+if (size == 8) { /* try once more with a new size */
+size = le32_to_cpu(*(u32 *)(opregion + 16));
+goto realloc;
+}
+}
+
 static const struct pci_device_id pci_device_tbl[] = {
 /* PIIX3/PIIX4 PCI to ISA bridge */
 PCI_DEVICE(PCI_VENDOR_ID_INTEL, PCI_DEVICE_ID_INTEL_82371SB_0,
@@ -290,6 +336,10 @@ static const struct pci_device_id pci_device_tbl[] = {
 PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0017, 0xff00, apple_macio_setup),
 PCI_DEVICE_CLASS(PCI_VENDOR_ID_APPLE, 0x0022, 0xff00, apple_macio_setup),
 
+/* Intel IGD OpRegion setup */
+PCI_DEVICE_CLASS(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, PCI_CLASS_DISPLAY_VGA,
+ intel_igd_opregion_setup),
+
 PCI_DEVICE_END,
 };
 

> > If the graphics signature does not
> > appear, free those pages and assume no opregion support.
> 
> Yes.
> 
> > If we later
> > decide to use a copy, we'd need to disable the 0xfc automagic mapping
> > and probably pass the data via fw_cfg.  Sound right?
> 
> I'd have qemu copy the data on 0xfc write then, so things continue to
> work without updating seabios.  So, the firmware has to allocate space,
> reserve it etc.,  and programming the 0xfc register.  Qemu has to make
> sure the opregion appears at the address written by the firmware, by
> whatever method it prefers.

Ah, so here is where we'd clobber data in firmware.  I currently do
this in vfio's pci config write in QEMU:

orig = pci_get_long(pdev->config + IGD_OPREGION);
pci_default_write_config(pdev, addr, val, len);
cur = pci_get_long(pdev->config + IGD_OPREGION);

if (cur != orig) {
if (orig) {
memory_region_del_subregion(get_system_memory(),
vdev->igd_opregion->mem);
}

if (cur) {
memory_region_add_subregion(get_system_memory(),
cur, vdev->igd_opregion->mem);
}
}

This means that fw can write 0x0 back to the ASL storage register and
the mapping goes away, no 

[Qemu-devel] [PATCH] jobs: remove unused structure

2016-02-01 Thread John Snow
Signed-off-by: John Snow 
---
 blockjob.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/blockjob.c b/blockjob.c
index 80adb9d..a692142 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -278,14 +278,6 @@ void block_job_iostatus_reset(BlockJob *job)
 }
 }
 
-struct BlockFinishData {
-BlockJob *job;
-BlockCompletionFunc *cb;
-void *opaque;
-bool cancelled;
-int ret;
-};
-
 static int block_job_finish_sync(BlockJob *job,
  void (*finish)(BlockJob *, Error **errp),
  Error **errp)
-- 
2.4.3




[Qemu-devel] [PATCH v3 0/3] SD emulation fixes for Pi2 Tianocore EDK2 UEFI

2016-02-01 Thread Andrew Baumann
This series contains fixes to the SD card emulation that are needed to
unblock Tianocore EDK2 UEFI (specifically, the bootloader for Windows
on Raspberry Pi 2).

Changes in v2, based on feedback from Peter Crosthwaite:
 * correct implementation of CMD23 to switch to transfer state on completion
 * use an actual timer for the power-up delay, rather than relying on
   the guest polling ACMD41 twice
 * added patch 3: replace fprintfs with guest error logging

Changes in v3:
 * rebased on Peter Maydell's SD QOMification series:
   https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg04425.html
   (although the conflicts are minor -- this could be independent)
 * use a subsection for the new OCR vmstate (patch 2/3), rather than
   bumping the version number, to ensure backward-compatibility

(I'm guessing at the CC list here, since this code appears to be
unmaintained. Apologies if I guessed wrong!)

Cheers,
Andrew


Andrew Baumann (3):
  hw/sd: implement CMD23 (SET_BLOCK_COUNT) for MMC compatibility
  hw/sd: model a power-up delay, as a workaround for an EDK2 bug
  hw/sd: use guest error logging rather than fprintf to stderr

 hw/sd/sd.c | 141 ++---
 1 file changed, 126 insertions(+), 15 deletions(-)
 mode change 100644 => 100755 hw/sd/sd.c

-- 
2.5.3




Re: [Qemu-devel] [PATCH v10 10/25] qapi: Improve generated event use of qapi visitor

2016-02-01 Thread Eric Blake
On 02/01/2016 05:31 AM, Markus Armbruster wrote:

>> |+visit_start_struct(v, NULL, NULL, "ACPI_DEVICE_OST", 0, &err);
>> | if (err) {
>> | goto out;
>> | }
>> | visit_type_ACPIOSTInfo(v, &info, "info", &err);
>> | if (err) {
>> |-goto out;
>> |+goto out_obj;
>> | }
>> |-visit_end_struct(v, &err);
>> |+out_obj:
>> |+visit_end_struct(v, err ? NULL : &err);
> 
> Slightly awkward example, because out_obj is pointless in this
> degenerated case.  You could pick one with multiple members (thus
> multiple goto out_obj), or do pseudo-code hinting at multiple members.

DEVICE_DELETED, DEVICE_TRAY_MOVED, MEM_UNPLUG_ERROR,
NET_RX_FILTER_CHANGED, and SPICE_CONNECTED are nice candidates (two
members instead of one).  Do you want to take care of redoing any
portion of the commit message?
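
For reference, the two-member shape Markus asks for might look roughly like the sketch below. It uses a toy visitor rather than the real QAPI visitor API: every type and function name here is hypothetical, and errors are reported through a bool return instead of an Error object.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy visitor: tracks struct nesting and fails at a chosen member. */
struct sketch_visitor {
    int depth;    /* currently open structs */
    int fail_at;  /* 1-based member index that fails, 0 for none */
    int visited;  /* members visited so far */
};

static bool sketch_start_struct(struct sketch_visitor *v)
{
    v->depth++;
    return true;
}

static void sketch_end_struct(struct sketch_visitor *v)
{
    v->depth--;
}

static bool sketch_visit_member(struct sketch_visitor *v)
{
    v->visited++;
    return v->visited != v->fail_at;
}

/* Two members, so two "goto out_obj" sites share one cleanup path that
 * always closes the struct once it has been opened. */
static bool sketch_visit_event(struct sketch_visitor *v)
{
    bool ok = false;

    if (!sketch_start_struct(v)) {
        goto out;
    }
    if (!sketch_visit_member(v)) {
        goto out_obj;
    }
    if (!sketch_visit_member(v)) {
        goto out_obj;
    }
    ok = true;
out_obj:
    sketch_end_struct(v);
out:
    return ok;
}
```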

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] block: change parent backing link when *tqe_prev == NULL

2016-02-01 Thread Max Reitz
On 30.01.2016 06:17, Jeff Cody wrote:
> In change_parent_backing_link(), we only inserted the new
> BlockDriverState entry into the device_list if the tqe_prev pointer was
> NULL.   However, we must also allow insertion when the BDS pointed
> to by the tqe_prev pointer is NULL as well.
> 
> This fixes a bug with external snapshots, and live active layer commits.
> 
> After a live snapshot occurs, the active layer and the base layer both
> have a non-NULL tqe_prev field in the device_list, although the base
> node's tqe_prev field points to a NULL entry.
> 
> Once the active commit is finished, bdrv_replace_in_backing_chain() is
> called to set the base node as the new active node, and remove the
> node that was the prior active layer from the device_list.
> 
> If we only check against the tqe_prev pointer field and not the entity
> it is pointing to, then we fail to insert base image into the device
> list.  The previous active layer is still removed from the device_list,
> leaving an empty device_list queue.
> 
> With an empty device_list queue, odd behavior occurs - such as not
> allowing any more live snapshots.
> 
> This commit fixes this issue, by checking for a NULL tqe_prev entity
> in the devices_list.
> 
> Signed-off-by: Jeff Cody 
> ---
>  block.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/block.c b/block.c
> index 5709d3d..0b8526b 100644
> --- a/block.c
> +++ b/block.c
> @@ -2272,7 +2272,7 @@ static void change_parent_backing_link(BlockDriverState *from,
>  }
>  if (from->blk) {
>  blk_set_bs(from->blk, to);
> -if (!to->device_list.tqe_prev) {
> +if (!to->device_list.tqe_prev || !*to->device_list.tqe_prev) {

I'm not sure this is the right fix; bdrv_make_anon() clearly states that
we do want device_list.tqe_prev to be NULL if and only if the BDS is not
part of the device list. So this should not be happening.

>  QTAILQ_INSERT_BEFORE(from, to, device_list);
>  }
>  QTAILQ_REMOVE(&bdrv_states, from, device_list);

Inserting a from->device_list.tqe_prev = NULL; here makes the iotest
happy and looks like the better fix to me.
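
To make the two candidate fixes concrete, here is a minimal sketch using a hand-rolled tail queue rather than QEMU's QTAILQ macros. It assumes a removal that leaves the removed node's prev pointer untouched, which is one way a non-NULL tqe_prev pointing at a NULL entry can arise; all names are illustrative.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct sketch_node {
    struct sketch_node *next;
    struct sketch_node **prev;  /* address of the pointer that points at us */
};

struct sketch_list {
    struct sketch_node *first;
    struct sketch_node **last;  /* address of the final next pointer */
};

static void sketch_list_init(struct sketch_list *l)
{
    l->first = NULL;
    l->last = &l->first;
}

static void sketch_insert_tail(struct sketch_list *l, struct sketch_node *n)
{
    n->next = NULL;
    n->prev = l->last;
    *l->last = n;
    l->last = &n->next;
}

/* Removal that does not clear n->prev (assumed behaviour for this sketch). */
static void sketch_remove_keep_prev(struct sketch_list *l, struct sketch_node *n)
{
    if (n->next) {
        n->next->prev = n->prev;
    } else {
        l->last = n->prev;
    }
    *n->prev = n->next;
}

/* Check in the style of the original code: only a NULL prev means unlinked. */
static bool sketch_unlinked_strict(const struct sketch_node *n)
{
    return n->prev == NULL;
}

/* Check in the style of the proposed patch: also accept a stale prev that
 * points at a NULL entry. */
static bool sketch_unlinked_relaxed(const struct sketch_node *n)
{
    return n->prev == NULL || *n->prev == NULL;
}

static void sketch_queue_demo(void)
{
    struct sketch_list l;
    struct sketch_node n = { NULL, NULL };

    sketch_list_init(&l);
    sketch_insert_tail(&l, &n);
    assert(*n.prev == &n);               /* linked: *prev points back at n */

    sketch_remove_keep_prev(&l, &n);
    assert(n.prev != NULL);              /* stale pointer survives removal */
    assert(*n.prev == NULL);             /* and now points at a NULL entry */
    assert(!sketch_unlinked_strict(&n));
    assert(sketch_unlinked_relaxed(&n));

    n.prev = NULL;                       /* alternative: clear on removal */
    assert(sketch_unlinked_strict(&n));
}
```

The last two lines of the demo correspond to the alternative suggested in the reply: if removal clears the pointer, the strict check stays valid and the invariant "NULL if and only if not in the list" is preserved.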

Max

> 





Re: [Qemu-devel] [RFC PATCH 0/3] (Resend) TranslationBlock annotation mechanism

2016-02-01 Thread Lluís Vilanova
Lluís Vilanova writes:

> Bastian Koppelmann writes:
>> Hi Lluis,
>> On 01/27/2016 07:54 PM, Lluís Vilanova wrote:
[...]
>>> 
>>> So, I'd say that such support is on the list of current developments (at least
>>> mine, especially now that I have a bit more time for it). But getting the core
>>> infrastructure mainlined takes some time to ensure it makes sense and can be
>>> easily maintained and be generally useful to vanilla QEMU.
>>> 

>> For us such an API would make a lot of sense and there is no benefit for
>> us to do our own API. Would it make sense for you if we helped you?

> Definitely. The instrumentation code needs some serious update to bring it
> up-to-date with the latest QEMU, but adding generally useful guest code tracing
> events is something that can be easily parallelized.

FYI, I've rebased my local queue and pushed it to the public repository
[1]. Note that except for the head of the queue, patches are not tested (may not
even compile), but it's a first step at seriously rebooting the project.

I've also started splitting some of the series into separate branches to ease
their development in parallel (devel-*).

[1] https://projects.gso.ac.upc.edu/projects/qemu-dbi
https://code.gso.ac.upc.edu/git/qemu-dbi

PS: if you had a previous checkout, you might need to get it again (or do some
tedious checkout+rm+merge+checkout sequence), since I'm developing all these
series using stgit (which rewrites the history).


Cheers,
  Lluis



Re: [Qemu-devel] [PATCH v10 23/25] qapi: Drop unused error argument for list and implicit struct

2016-02-01 Thread Eric Blake
On 02/01/2016 06:07 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> No backend was setting an error when ending an implicit struct,
>> or when iterating a list.
> 
> Perhaps "when ending the visit of a list or implicit struct, or when
> moving to the next list node" would be more precise.  If you like it, I
> can do that on commit.
> 
>>Make the callers a bit easier to follow
>> by making this a part of the contract, and removing the errp
>> argument - callers can then unconditionally end an object as
>> part of cleanup without having to think about whether a second
>> error is dominated by a first, because there is no second error.
>>
>> A later patch will then tackle the larger task of splitting
>> visit_end_struct(), which can indeed set an error (and that
>> cleanup will also have the side-effect of removing the use of
>> error_abort added here).

Oh, while you're touching this up, the last half of this sentence is now
stale (since the addition of &error_abort was split out into 22/25
instead); I'd just delete the entire parenthetical, ending with "indeed
set an error."

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





[Qemu-devel] [PATCH v3 1/3] hw/sd: implement CMD23 (SET_BLOCK_COUNT) for MMC compatibility

2016-02-01 Thread Andrew Baumann
CMD23 is optional for SD but required for MMC, and the UEFI bootloader
used for Windows on Raspberry Pi 2 issues it.

Reviewed-by: Peter Crosthwaite 
Signed-off-by: Andrew Baumann 
---
 hw/sd/sd.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 9e3be2c..8514ac7 100644
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -97,6 +97,7 @@ struct SDState {
 int32_t wpgrps_size;
 uint64_t size;
 uint32_t blk_len;
+uint32_t multi_blk_cnt;
 uint32_t erase_start;
 uint32_t erase_end;
 uint8_t pwd[16];
@@ -429,6 +430,7 @@ static void sd_reset(DeviceState *dev)
 sd->blk_len = 0x200;
 sd->pwd_len = 0;
 sd->expecting_acmd = false;
+sd->multi_blk_cnt = 0;
 }
 
 static bool sd_get_inserted(SDState *sd)
@@ -488,6 +490,7 @@ static const VMStateDescription sd_vmstate = {
 VMSTATE_UINT32(vhs, SDState),
 VMSTATE_BITMAP(wp_groups, SDState, 0, wpgrps_size),
 VMSTATE_UINT32(blk_len, SDState),
+VMSTATE_UINT32(multi_blk_cnt, SDState),
 VMSTATE_UINT32(erase_start, SDState),
 VMSTATE_UINT32(erase_end, SDState),
 VMSTATE_UINT8_ARRAY(pwd, SDState, 16),
@@ -696,6 +699,12 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 if (sd_cmd_type[req.cmd] == sd_ac || sd_cmd_type[req.cmd] == sd_adtc)
 rca = req.arg >> 16;
 
+/* CMD23 (set block count) must be immediately followed by CMD18 or CMD25
+ * if not, its effects are cancelled */
+if (sd->multi_blk_cnt != 0 && !(req.cmd == 18 || req.cmd == 25)) {
+sd->multi_blk_cnt = 0;
+}
+
 DPRINTF("CMD%d 0x%08x state %d\n", req.cmd, req.arg, sd->state);
 switch (req.cmd) {
 /* Basic commands (Class 0 and Class 1) */
@@ -991,6 +1000,17 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 }
 break;
 
+case 23:/* CMD23: SET_BLOCK_COUNT */
+switch (sd->state) {
+case sd_transfer_state:
+sd->multi_blk_cnt = req.arg;
+return sd_r1;
+
+default:
+break;
+}
+break;
+
 /* Block write commands (Class 4) */
 case 24:   /* CMD24:  WRITE_SINGLE_BLOCK */
 if (sd->spi)
@@ -1590,6 +1610,14 @@ void sd_write_data(SDState *sd, uint8_t value)
 sd->csd[14] |= 0x40;
 
 /* Bzzztt  Operation complete.  */
+if (sd->multi_blk_cnt != 0) {
+if (--sd->multi_blk_cnt == 0) {
+/* Stop! */
+sd->state = sd_transfer_state;
+break;
+}
+}
+
 sd->state = sd_receivingdata_state;
 }
 break;
@@ -1736,6 +1764,15 @@ uint8_t sd_read_data(SDState *sd)
 if (sd->data_offset >= io_len) {
 sd->data_start += io_len;
 sd->data_offset = 0;
+
+if (sd->multi_blk_cnt != 0) {
+if (--sd->multi_blk_cnt == 0) {
+/* Stop! */
+sd->state = sd_transfer_state;
+break;
+}
+}
+
 if (sd->data_start + io_len > sd->size) {
 sd->card_status |= ADDRESS_ERROR;
 break;
-- 
2.5.3




[Qemu-devel] [RFC] Programmable guest-to-QEMU hypercalls

2016-02-01 Thread Lluís Vilanova
Hi! I have in my trace instrumentation queue a series that adds a very simple
but efficient way to trigger code in QEMU from guest code using guest-agnostic
code.

Blue Swirl showed some interest long ago in using it in the test suite (e.g.,
instruct QEMU to check the vCPU state after a series of instructions). But I
don't know if there still is interest, or if anybody else finds this useful
(otherwise I'll keep it in my instrumentation branch).

Guest-side interface:

  #include 
  int main()
  {
  // initialize communication device
  qemu_hypercall_init("/tmp/hypercall");

  // memory region to share data between guest and QEMU
  // (QEMU does not trap reads/writes here, so it can be used as a
  // bandwidth-efficient communication channel)
  void *data = qemu_hypercall_data();
  ((char*)data)[0] = 0x1;

  // trigger hypercall callback
  qemu_hypercall(0xcafe);   // in-line data
  }

A dynamic library is loaded when starting QEMU, which gets called as a response
to 'qemu_hypercall()':

  // libmyhypercall.so
  qemu_hypercall(uint64_t cmd, char *data)
  {
  assert(cmd == 0xcafe);
  assert(((char*)data)[0] == 0x1);
  }

To start QEMU:

  qemu-x86_64 -hypercall libmyhypercall.so -hypercall-device=/tmp/backdoor /test/program
  qemu-system-x86_64 -device hypercall

I have a prototype for a guest user library and a guest Linux module to use this
in both user and system mode.


Cheers,
  Lluis



[Qemu-devel] [PATCH v3 2/3] hw/sd: model a power-up delay, as a workaround for an EDK2 bug

2016-02-01 Thread Andrew Baumann
The SD spec for ACMD41 says that a zero argument is an "inquiry"
ACMD41, which does not start initialisation and is used only for
retrieving the OCR. However, Tianocore EDK2 (UEFI) has a bug [1]: it
first sends an inquiry (zero) ACMD41. If that first request returns an
OCR value with the power up bit (0x8000) set, it assumes the card
is ready and continues, leaving the card in the wrong state. (My
assumption is that this works on hardware, because no real card is
immediately powered up upon reset.)

This change models a delay of 0.5ms from the first ACMD41 to the power
being up. However, it also immediately sets the power on upon seeing a
non-zero (non-enquiry) ACMD41. This speeds up UEFI boot; it should
also account for guests that simply delay after card reset and then
issue an ACMD41 that they expect will succeed.

[1] https://github.com/tianocore/edk2/blob/master/EmbeddedPkg/Universal/MmcDxe/MmcIdentification.c#L279
(This is the loop starting with "We need to wait for the MMC or SD
card is ready")

Signed-off-by: Andrew Baumann 
---
Obviously this is a bug that should be fixed in EDK2. However, this
initialisation appears to have been around for quite a while in EDK2
(in various forms), and the fact that it has obviously worked with so
many real SD/MMC cards makes me think that it would be pragmatic to
have the workaround in QEMU as well.

You might argue that the delay timer should start on sd_reset(), and
not the first ACMD41. However, that doesn't work reliably with UEFI,
because a large delay often elapses between the two (particularly in
debug builds that do lots of printing to the serial port). If the
timer fires too early, we'll still hit the bug, but we also don't want
to set a huge timeout value, because some guests may depend on it
expiring.
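
To make the intended behaviour easier to follow, here is a rough stand-alone sketch of the power-up model described above. It is not the patch itself: it uses a simulated clock instead of a QEMUTimer, the `toy_` names are invented, the power-up bit is assumed to be OCR bit 31, and "inquiry" is simplified to "argument is zero" (the patch masks the argument first).

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_OCR_POWER_UP   (UINT32_C(1) << 31)  /* assumed: OCR power-up status bit */
#define TOY_POWER_DELAY_NS INT64_C(500000)      /* 0.5 ms, per the description */

struct toy_sd {
    uint32_t ocr;
    bool timer_armed;
    int64_t timer_expiry_ns;
};

/* Advance simulated time; fire the power-up timer if it has expired. */
static void toy_tick(struct toy_sd *sd, int64_t now_ns)
{
    if (sd->timer_armed && now_ns >= sd->timer_expiry_ns) {
        sd->timer_armed = false;
        sd->ocr |= TOY_OCR_POWER_UP;
    }
}

/* Handle ACMD41 at time now_ns and return the OCR the guest would see. */
static uint32_t toy_acmd41(struct toy_sd *sd, uint32_t arg, int64_t now_ns)
{
    toy_tick(sd, now_ns);
    if (!(sd->ocr & TOY_OCR_POWER_UP)) {
        if (arg != 0) {
            /* Non-inquiry ACMD41: power up immediately. */
            sd->timer_armed = false;
            sd->ocr |= TOY_OCR_POWER_UP;
        } else if (!sd->timer_armed) {
            /* First inquiry: start the delay and report "not ready". */
            sd->timer_armed = true;
            sd->timer_expiry_ns = now_ns + TOY_POWER_DELAY_NS;
        }
    }
    return sd->ocr;
}
```

The point of starting the timer on the first ACMD41 rather than on reset is visible here: the first inquiry always reports "not powered up", regardless of how long the guest waited beforehand.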

 hw/sd/sd.c | 83 +-
 1 file changed, 77 insertions(+), 6 deletions(-)
 mode change 100644 => 100755 hw/sd/sd.c

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
old mode 100644
new mode 100755
index 8514ac7..473d4a0
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -36,6 +36,7 @@
 #include "qemu/bitmap.h"
 #include "hw/qdev-properties.h"
 #include "qemu/error-report.h"
+#include "qemu/timer.h"
 
 //#define DEBUG_SD 1
 
@@ -46,7 +47,9 @@ do { fprintf(stderr, "SD: " fmt , ## __VA_ARGS__); } while (0)
 #define DPRINTF(fmt, ...) do {} while(0)
 #endif
 
-#define ACMD41_ENQUIRY_MASK 0x00ff
+#define ACMD41_ENQUIRY_MASK 0x00ff
+#define OCR_POWER_UP0x8000
+#define OCR_POWER_DELAY (get_ticks_per_sec() / 2000) /* 0.5ms */
 
 typedef enum {
 sd_r0 = 0,/* no response */
@@ -85,6 +88,7 @@ struct SDState {
 uint32_t mode;/* current card mode, one of SDCardModes */
 int32_t state;/* current card state, one of SDCardStates */
 uint32_t ocr;
+QEMUTimer *ocr_power_timer;
 uint8_t scr[8];
 uint8_t cid[16];
 uint8_t csd[16];
@@ -199,8 +203,17 @@ static uint16_t sd_crc16(void *message, size_t width)
 
 static void sd_set_ocr(SDState *sd)
 {
-/* All voltages OK, card power-up OK, Standard Capacity SD Memory Card */
-sd->ocr = 0x8000;
+/* All voltages OK, Standard Capacity SD Memory Card, not yet powered up */
+sd->ocr = 0x0000;
+}
+
+static void sd_ocr_powerup(void *opaque)
+{
+SDState *sd = opaque;
+
+/* Set powered up bit in OCR */
+assert(!(sd->ocr & OCR_POWER_UP));
+sd->ocr |= OCR_POWER_UP;
 }
 
 static void sd_set_scr(SDState *sd)
@@ -475,10 +488,44 @@ static const BlockDevOps sd_block_ops = {
 .change_media_cb = sd_cardchange,
 };
 
+static bool sd_ocr_vmstate_needed(void *opaque)
+{
+SDState *sd = opaque;
+
+/* Include the OCR state (and timer) if it is not yet powered up */
+return !(sd->ocr & OCR_POWER_UP);
+}
+
+static const VMStateDescription sd_ocr_vmstate = {
+.name = "sd-card/ocr-state",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = sd_ocr_vmstate_needed,
+.fields = (VMStateField[]) {
+VMSTATE_UINT32(ocr, SDState),
+VMSTATE_TIMER_PTR(ocr_power_timer, SDState),
+VMSTATE_END_OF_LIST()
+},
+};
+
+static int sd_vmstate_pre_load(void *opaque)
+{
+SDState *sd = opaque;
+
+/* If the OCR state is not included (prior versions, or not
+ * needed), then the OCR must be set as powered up. If the OCR state
+ * is included, this will be replaced by the state restore.
+ */
+sd_ocr_powerup(sd);
+
+return 0;
+}
+
 static const VMStateDescription sd_vmstate = {
 .name = "sd-card",
 .version_id = 1,
 .minimum_version_id = 1,
+.pre_load = sd_vmstate_pre_load,
 .fields = (VMStateField[]) {
 VMSTATE_UINT32(mode, SDState),
 VMSTATE_INT32(state, SDState),
@@ -505,7 +552,11 @@ static const VMStateDescription sd_vmstate = {
 VMSTATE_BUFFER_POINTER_UNSAFE(buf, SDState, 1, 512),
 VMSTATE_BOOL(enable, SDState),
 VMSTATE_END_OF_LIST()
-}
+

[Qemu-devel] [PATCH v3 3/3] hw/sd: use guest error logging rather than fprintf to stderr

2016-02-01 Thread Andrew Baumann
Some of these errors may be harmless (e.g. probing unimplemented
commands, or issuing CMD12 in the wrong state), and may also be quite
frequent. Spamming the standard error output isn't desirable in such
cases.

Reviewed-by: Peter Crosthwaite 
Signed-off-by: Andrew Baumann 
---
It might also be desirable to have a squelch mechanism for these
messages, but at least for my use-case, this is sufficient, since they
only occur during boot time.
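
As a rough illustration of why a category mask helps here, the following sketch shows mask-gated logging in plain C. It only mimics the idea behind qemu_log_mask(); the `toy_` names, the two category bits and the return value are assumptions for the example, not QEMU's API.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical log categories, one bit each. */
#define TOY_LOG_GUEST_ERROR (1u << 0)
#define TOY_LOG_UNIMP       (1u << 1)

static unsigned toy_log_enabled;   /* categories the user asked for */

/* Print only if the category is enabled; returns 1 if printed, else 0. */
static int toy_log_mask(unsigned category, const char *fmt, ...)
{
    va_list ap;

    assert(fmt != NULL);
    if (!(toy_log_enabled & category)) {
        return 0;   /* silent by default, so frequent guest errors cost nothing */
    }
    va_start(ap, fmt);
    vfprintf(stderr, fmt, ap);
    va_end(ap);
    return 1;
}
```

With nothing enabled, a guest probing unimplemented commands produces no output at all; enabling a single category brings back just those messages.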

 hw/sd/sd.c | 21 -
 1 file changed, 12 insertions(+), 9 deletions(-)

diff --git a/hw/sd/sd.c b/hw/sd/sd.c
index 473d4a0..4794538 100755
--- a/hw/sd/sd.c
+++ b/hw/sd/sd.c
@@ -1294,16 +1294,17 @@ static sd_rsp_type_t sd_normal_command(SDState *sd,
 
 default:
 bad_cmd:
-fprintf(stderr, "SD: Unknown CMD%i\n", req.cmd);
+qemu_log_mask(LOG_GUEST_ERROR, "SD: Unknown CMD%i\n", req.cmd);
 return sd_illegal;
 
 unimplemented_cmd:
 /* Commands that are recognised but not yet implemented in SPI mode.  */
-fprintf(stderr, "SD: CMD%i not implemented in SPI mode\n", req.cmd);
+qemu_log_mask(LOG_UNIMP, "SD: CMD%i not implemented in SPI mode\n",
+  req.cmd);
 return sd_illegal;
 }
 
-fprintf(stderr, "SD: CMD%i in a wrong state\n", req.cmd);
+qemu_log_mask(LOG_GUEST_ERROR, "SD: CMD%i in a wrong state\n", req.cmd);
 return sd_illegal;
 }
 
@@ -1435,7 +1436,7 @@ static sd_rsp_type_t sd_app_command(SDState *sd,
 return sd_normal_command(sd, req);
 }
 
-fprintf(stderr, "SD: ACMD%i in a wrong state\n", req.cmd);
+qemu_log_mask(LOG_GUEST_ERROR, "SD: ACMD%i in a wrong state\n", req.cmd);
 return sd_illegal;
 }
 
@@ -1478,7 +1479,7 @@ int sd_do_command(SDState *sd, SDRequest *req,
 if (!cmd_valid_while_locked(sd, req)) {
 sd->card_status |= ILLEGAL_COMMAND;
 sd->expecting_acmd = false;
-fprintf(stderr, "SD: Card is locked\n");
+qemu_log_mask(LOG_GUEST_ERROR, "SD: Card is locked\n");
 rtype = sd_illegal;
 goto send_response;
 }
@@ -1636,7 +1637,8 @@ void sd_write_data(SDState *sd, uint8_t value)
 return;
 
 if (sd->state != sd_receivingdata_state) {
-fprintf(stderr, "sd_write_data: not in Receiving-Data state\n");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "sd_write_data: not in Receiving-Data state\n");
 return;
 }
 
@@ -1755,7 +1757,7 @@ void sd_write_data(SDState *sd, uint8_t value)
 break;
 
 default:
-fprintf(stderr, "sd_write_data: unknown command\n");
+qemu_log_mask(LOG_GUEST_ERROR, "sd_write_data: unknown command\n");
 break;
 }
 }
@@ -1770,7 +1772,8 @@ uint8_t sd_read_data(SDState *sd)
 return 0x00;
 
 if (sd->state != sd_sendingdata_state) {
-fprintf(stderr, "sd_read_data: not in Sending-Data state\n");
+qemu_log_mask(LOG_GUEST_ERROR,
+  "sd_read_data: not in Sending-Data state\n");
 return 0x00;
 }
 
@@ -1881,7 +1884,7 @@ uint8_t sd_read_data(SDState *sd)
 break;
 
 default:
-fprintf(stderr, "sd_read_data: unknown command\n");
+qemu_log_mask(LOG_GUEST_ERROR, "sd_read_data: unknown command\n");
 return 0x00;
 }
 
-- 
2.5.3




Re: [Qemu-devel] [PATCH v1 1/1] qom: Correct object_property_get_int() description

2016-02-01 Thread Alistair Francis
On Sat, Jan 30, 2016 at 1:35 AM, Michael Tokarev  wrote:
> 18.01.2016 21:42, Alistair Francis wrote:
>> The description of object_property_get_int() stated that on an error
>> it returns NULL. This is not the case and the function will return -1
>> if an error occurs. Update the commented documentation accordingly.
>
> Applied to -trivial, thanks!

Thanks!

Alistair

>
> /mjt
>



Re: [Qemu-devel] [PATCH v9 23/37] qmp: Support explicit null during input visit

2016-02-01 Thread Eric Blake
On 01/22/2016 10:12 AM, Markus Armbruster wrote:
> Eric Blake  writes:
> 
>> Implement the new type_null() callback for the qmp input visitor.
>> While we don't yet have a use for this in qapi (the generator
>> will need some tweaks first), one usage is already envisioned:
>> when changing blockdev parameters, it would be nice to have a
>> difference between leaving a tuning parameter unchanged (omit
>> that parameter from the struct) and to explicitly reset the
>> parameter to its default without having to know what the default
>> value is (specify the parameter with an explicit null value,
>> which will require us to allow a qapi alternate that chooses
>> between the normal value and an explicit null).
>>
>> At any rate, we can test this without the use of generated qapi
>> by manually using visit_start_struct()/visit_end_struct().
> 
> Well, we test by calling visit_type_null() manually.  We choose to wrap
> it in a visit_start_struct() ... visit_end_struct() pair, but that's
> detail.  Actually, we do an unwrapped root visit first, and then a
> struct-wrapped visit.
> 
> Suggest "by calling visit_type_null() manually."
> 
>> Signed-off-by: Eric Blake 
>>
>> ---

>> +static void qmp_input_type_null(Visitor *v, const char *name, Error **errp)
>> +{
>> +QmpInputVisitor *qiv = to_qiv(v);
>> +QObject *qobj = qmp_input_get_object(qiv, name, true);
>> +
>> +if (qobject_type(qobj) == QTYPE_QNULL) {
>> +return;
>> +}
>> +
>> +error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
>> +   "null");
> 
> Recommend to put the error in the conditional:
> 
> if (qobject_type(qobj) != QTYPE_QNULL) {
> error_setg(errp, QERR_INVALID_PARAMETER_TYPE, name ? name : "null",
>"null");
> }

Sure, I can reflow the logic.

>> +/* Check that qnull reference counting is sane:
>> + * 1 for global use, 1 for our qnull() use, and 1 still owned by 'v'
>> + * until it is torn down */
>> +null = qnull();
>> +g_assert(null->refcnt == 3);
>> +visitor_input_teardown(data, NULL);
>> +g_assert(null->refcnt == 2);
>> +qobject_decref(null);
> 
> For other kinds of QObject, we leave the testing of reference counting
> to the check-qKIND.c, and don't bother with it when testing the
> visitors.  Any particular reason to do null differently?

Well, 19/37 added reference counting checks to
test-qmp-output-visitor.c, and we don't have a check-qnull.c test yet.
That, and the thing being checked here is that the visitor doesn't over-
or under-reference the static qnull object (just checking qnull()
without a visitor doesn't tell you if the visitor has any reference
counting bugs).  But maybe it is indeed worth writing a check-qnull.c
file that does this work.
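
For anyone following along, the reference-count check under discussion can be sketched in isolation like this. It is a toy model, not the QObject code: `toy_obj`, `toy_null()` and the visitor struct are invented names, and the only thing carried over from the test is the 3 -> 2 -> 1 counting argument.

```c
#include <assert.h>
#include <stddef.h>

struct toy_obj { size_t refcnt; };

/* The single shared "null" object; one reference is held globally. */
static struct toy_obj toy_null_obj = { 1 };

/* Hand out a new reference to the shared object. */
static struct toy_obj *toy_null(void)
{
    toy_null_obj.refcnt++;
    return &toy_null_obj;
}

static void toy_decref(struct toy_obj *o)
{
    assert(o->refcnt > 1);   /* the static object must keep its global ref */
    o->refcnt--;
}

/* Hypothetical visitor that holds one reference to its input. */
struct toy_visitor { struct toy_obj *input; };

static void toy_visitor_setup(struct toy_visitor *v)
{
    v->input = toy_null();
}

static void toy_visitor_teardown(struct toy_visitor *v)
{
    toy_decref(v->input);
    v->input = NULL;
}
```

A visitor that took an extra reference, or dropped one too many, would show up as a count other than 3 before teardown or 2 after it.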

-- 
Eric Blake   eblake redhat com+1-919-301-3266
Libvirt virtualization library http://libvirt.org





Re: [Qemu-devel] [Qemu-block] [PATCH 1/2] block: change parent backing link when *tqe_prev == NULL

2016-02-01 Thread Jeff Cody
On Mon, Feb 01, 2016 at 10:43:02PM +0100, Max Reitz wrote:
> On 30.01.2016 06:17, Jeff Cody wrote:
> > In change_parent_backing_link(), we only inserted the new
> > BlockDriverState entry into the device_list if the tqe_prev pointer was
> > NULL.   However, we must also allow insertion when the BDS pointed
> > to by the tqe_prev pointer is NULL as well.
> > 
> > This fixes a bug with external snapshots, and live active layer commits.
> > 
> > After a live snapshot occurs, the active layer and the base layer both
> > have a non-NULL tqe_prev field in the device_list, although the base
> > node's tqe_prev field points to a NULL entry.
> > 
> > Once the active commit is finished, bdrv_replace_in_backing_chain() is
> > called to set the base node as the new active node, and remove the
> > node that was the prior active layer from the device_list.
> > 
> > If we only check against the tqe_prev pointer field and not the entity
> > it is pointing to, then we fail to insert base image into the device
> > list.  The previous active layer is still removed from the device_list,
> > leaving an empty device_list queue.
> > 
> > With an empty device_list queue, odd behavior occurs - such as not
> > allowing any more live snapshots.
> > 
> > This commit fixes this issue, by checking for a NULL tqe_prev entity
> > in the devices_list.
> > 
> > Signed-off-by: Jeff Cody 
> > ---
> >  block.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/block.c b/block.c
> > index 5709d3d..0b8526b 100644
> > --- a/block.c
> > +++ b/block.c
> > @@ -2272,7 +2272,7 @@ static void change_parent_backing_link(BlockDriverState *from,
> >  }
> >  if (from->blk) {
> >  blk_set_bs(from->blk, to);
> > -if (!to->device_list.tqe_prev) {
> > +if (!to->device_list.tqe_prev || !*to->device_list.tqe_prev) {
> 
> I'm not sure this is the right fix; bdrv_make_anon() clearly states that
> we do want device_list.tqe_prev to be NULL if and only if the BDS is not
> part of the device list. So this should not be happening.
>

Good point.  This also screams for a helper function to remove a BDS
from the device_list, to enforce this behavior (we remove a BDS from
the device_list in 3 spots, and this time it was missed in one of
them).  Hopefully that will help prevent future errors.

> >  QTAILQ_INSERT_BEFORE(from, to, device_list);
> >  }
> >  QTAILQ_REMOVE(&bdrv_states, from, device_list);
> 
> Inserting a from->device_list.tqe_prev = NULL; here makes the iotest
> happy and looks like the better fix to me.
> 
> Max
>

Thanks!

-Jeff
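
To illustrate the invariant being discussed (prev pointer is NULL if and only if the node is not in the list), here is a small hand-rolled tail queue. It is only a sketch: the `toy_` types and helpers are made up and merely imitate the shape of a QTAILQ, and `toy_remove()` stands in for the kind of helper suggested above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_node {
    struct toy_node *next;
    struct toy_node **prev;   /* address of the pointer that points at us */
};

struct toy_head {
    struct toy_node *first;
    struct toy_node **last;   /* address of the final "next" pointer */
};

static void toy_init(struct toy_head *h)
{
    h->first = NULL;
    h->last = &h->first;
}

static void toy_insert_tail(struct toy_head *h, struct toy_node *n)
{
    n->next = NULL;
    n->prev = h->last;
    *h->last = n;
    h->last = &n->next;
}

/* Plain removal: unlinks n but leaves n->prev pointing at its old slot. */
static void toy_remove_raw(struct toy_head *h, struct toy_node *n)
{
    if (n->next) {
        n->next->prev = n->prev;
    } else {
        h->last = n->prev;
    }
    *n->prev = n->next;
}

/* Helper: removal that also restores the "prev == NULL" invariant. */
static void toy_remove(struct toy_head *h, struct toy_node *n)
{
    toy_remove_raw(h, n);
    n->prev = NULL;
}

/* Membership test that relies on the invariant. */
static bool toy_in_list(const struct toy_node *n)
{
    return n->prev != NULL;
}
```

After a raw removal the node has a non-NULL prev that points at a NULL entry, which is exactly the state the membership test misreads.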





[Qemu-devel] [PULL 14/17] e1000: eliminate infinite loops on out-of-bounds transfer start

2016-02-01 Thread Jason Wang
From: Laszlo Ersek 

The start_xmit() and e1000_receive_iov() functions implement DMA transfers
iterating over a set of descriptors that the guest's e1000 driver
prepares:

- the TDLEN and RDLEN registers store the total size of the descriptor
  area,

- while the TDH and RDH registers store the offset (in whole tx / rx
  descriptors) into the area where the transfer is supposed to start.

Each time a descriptor is processed, the TDH or RDH register is bumped
(as appropriate for the transfer direction).

QEMU already contains logic to deal with bogus transfers submitted by the
guest:

- Normally, the transmit case wants to increase TDH from its initial value
  to TDT. (TDT is allowed to be numerically smaller than the initial TDH
  value; wrapping at or above TDLEN bytes to zero is normal.) The failsafe
  that QEMU currently has here is a check against reaching the original
  TDH value again -- a complete wraparound, which should never happen.

- In the receive case RDH is increased from its initial value until
  "total_size" bytes have been received; preferably in a single step, or
  in "s->rxbuf_size" byte steps, if the latter is smaller. However, null
  RX descriptors are skipped without receiving data, while RDH is
  incremented just the same. QEMU tries to prevent an infinite loop
  (processing only null RX descriptors) by detecting whether RDH assumes
  its original value during the loop. (Again, wrapping from RDLEN to 0 is
  normal.)

What both directions miss is that the guest could program TDLEN and RDLEN
so low, and the initial TDH and RDH so high, that these registers will
immediately be truncated to zero, and then never reassume their initial
values in the loop -- a full wraparound will never occur.

The condition that expresses this is:

  xdh_start >= s->mac_reg[XDLEN] / sizeof(desc)

i.e., TDH or RDH start out after the last whole rx or tx descriptor that
fits into the TDLEN or RDLEN sized area.

This condition could be checked before we enter the loops, but
pci_dma_read() / pci_dma_write() knows how to fill in buffers safely for
bogus DMA addresses, so we just extend the existing failsafes with the
above condition.

This is CVE-2016-1981.
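
The failure mode can be reproduced with a few lines of stand-alone C. This is a sketch under stated assumptions, not the e1000 code: `toy_walk()` is a made-up function, the 16-byte descriptor size is assumed, every descriptor is treated as one that never ends the loop by itself, and an iteration cap stands in for "spins forever".

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_DESC_SIZE 16u   /* assumed descriptor size in bytes */

/*
 * Walk the ring starting at index xdh in an area of xdlen bytes.
 * Returns the number of descriptors processed before a failsafe fired,
 * or -1 if max_iter was reached (i.e. the loop would not terminate).
 */
static int toy_walk(uint32_t xdh, uint32_t xdlen, bool extended_check,
                    int max_iter)
{
    uint32_t start = xdh;
    int n = 0;

    while (n < max_iter) {
        n++;                                   /* "process" one descriptor */
        if (++xdh * TOY_DESC_SIZE >= xdlen) {
            xdh = 0;                           /* normal wrap to zero */
        }
        if (xdh == start ||
            (extended_check && start >= xdlen / TOY_DESC_SIZE)) {
            return n;                          /* failsafe fired */
        }
    }
    return -1;
}
```

With a 64-byte area (four descriptors) and a start index of 10, the index is truncated to zero on the first step and then cycles through 0..3, never returning to 10; only the extended condition stops the walk.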

Cc: "Michael S. Tsirkin" 
Cc: Petr Matousek 
Cc: Stefano Stabellini 
Cc: Prasad Pandit 
Cc: Michael Roth 
Cc: Jason Wang 
Cc: qemu-sta...@nongnu.org
RHBZ: https://bugzilla.redhat.com/show_bug.cgi?id=1296044
Signed-off-by: Laszlo Ersek 
Reviewed-by: Jason Wang 
Signed-off-by: Jason Wang 
---
 hw/net/e1000.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/net/e1000.c b/hw/net/e1000.c
index 4eda7a3..0387fa0 100644
--- a/hw/net/e1000.c
+++ b/hw/net/e1000.c
@@ -909,7 +909,8 @@ start_xmit(E1000State *s)
  * bogus values to TDT/TDLEN.
  * there's nothing too intelligent we could do about this.
  */
-if (s->mac_reg[TDH] == tdh_start) {
+if (s->mac_reg[TDH] == tdh_start ||
+tdh_start >= s->mac_reg[TDLEN] / sizeof(desc)) {
 DBGOUT(TXERR, "TDH wraparound @%x, TDT %x, TDLEN %x\n",
tdh_start, s->mac_reg[TDT], s->mac_reg[TDLEN]);
 break;
@@ -1166,7 +1167,8 @@ e1000_receive_iov(NetClientState *nc, const struct iovec *iov, int iovcnt)
 if (++s->mac_reg[RDH] * sizeof(desc) >= s->mac_reg[RDLEN])
 s->mac_reg[RDH] = 0;
 /* see comment in start_xmit; same here */
-if (s->mac_reg[RDH] == rdh_start) {
+if (s->mac_reg[RDH] == rdh_start ||
+rdh_start >= s->mac_reg[RDLEN] / sizeof(desc)) {
 DBGOUT(RXERR, "RDH wraparound @%x, RDT %x, RDLEN %x\n",
rdh_start, s->mac_reg[RDT], s->mac_reg[RDLEN]);
 set_ics(s, 0, E1000_ICS_RXO);
-- 
2.5.0




[Qemu-devel] [PULL 16/17] net: always walk through filters in reverse if traffic is egress

2016-02-01 Thread Jason Wang
From: Li Zhijian 

Previously, if we attached more than one filter to a single netdev,
both ingress and egress traffic would go through the net filters in the
same order, like:

ingress: netdev ->filter1 ->filter2 ->...filter[n] ->emulated device
egress: emulated device ->filter1 ->filter2 ->...filter[n] ->netdev.

This is counterintuitive and complicates filter configuration, since in
some scenarios we want filters to handle egress traffic in reverse
order. For example, in colo-proxy (to be implemented later), we have a
redirector filter and a colo-rewriter filter, and we need the filters to
behave like:

ingress(->)/egress(<-): chardev<->redirector<->colo-rewriter<->emulated device

Since neither the buffer filter nor the dump filter requires a strict
filter order, this patch always lets egress traffic walk through the
net filters in reverse, to simplify possible filter configurations
in the future.
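
As a quick illustration of the ordering rule, the sketch below walks an array of filter ids forward for one direction and backward for the other. It is not the net/filter.c code: the array, the `toy_` names and the direction enum are assumptions, and the mapping (TX forward, everything else reverse) simply mirrors the branches in the diff below.

```c
#include <assert.h>
#include <stddef.h>

enum toy_dir { TOY_DIR_TX, TOY_DIR_RX };

/*
 * Record in out[] the order in which n filters (given by ids[]) are
 * visited for the given direction. Returns the number of filters visited.
 */
static size_t toy_walk_filters(const int *ids, size_t n, enum toy_dir dir,
                               int *out)
{
    size_t i;

    for (i = 0; i < n; i++) {
        /* forward for TX, reverse order otherwise */
        size_t idx = (dir == TOY_DIR_TX) ? i : n - 1 - i;
        out[i] = ids[idx];
    }
    return n;
}
```

A packet going one way therefore meets filter1, filter2, filter3, and a packet going the other way meets filter3, filter2, filter1, which is what a proxy-style chain expects.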

Signed-off-by: Wen Congyang 
Signed-off-by: Li Zhijian 
Reviewed-by: Yang Hongyang 
Signed-off-by: Jason Wang 
---
 include/net/net.h |  2 +-
 net/filter.c  | 21 +++--
 net/net.c | 20 +++-
 3 files changed, 35 insertions(+), 8 deletions(-)

diff --git a/include/net/net.h b/include/net/net.h
index 7af3e15..73e4c46 100644
--- a/include/net/net.h
+++ b/include/net/net.h
@@ -92,7 +92,7 @@ struct NetClientState {
 NetClientDestructor *destructor;
 unsigned int queue_index;
 unsigned rxfilter_notify_enabled:1;
-QTAILQ_HEAD(, NetFilterState) filters;
+QTAILQ_HEAD(NetFilterHead, NetFilterState) filters;
 };
 
 typedef struct NICState {
diff --git a/net/filter.c b/net/filter.c
index 5d90f83..17a8398 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -34,6 +34,22 @@ ssize_t qemu_netfilter_receive(NetFilterState *nf,
 return 0;
 }
 
+static NetFilterState *netfilter_next(NetFilterState *nf,
+  NetFilterDirection dir)
+{
+NetFilterState *next;
+
+if (dir == NET_FILTER_DIRECTION_TX) {
+/* forward walk through filters */
+next = QTAILQ_NEXT(nf, next);
+} else {
+/* reverse order */
+next = QTAILQ_PREV(nf, NetFilterHead, next);
+}
+
+return next;
+}
+
 ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 unsigned flags,
 const struct iovec *iov,
@@ -43,7 +59,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 int ret = 0;
 int direction;
 NetFilterState *nf = opaque;
-NetFilterState *next = QTAILQ_NEXT(nf, next);
+NetFilterState *next = NULL;
 
 if (!sender || !sender->peer) {
 /* no receiver, or sender been deleted, no need to pass it further */
@@ -61,6 +77,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 direction = nf->direction;
 }
 
+next = netfilter_next(nf, direction);
 while (next) {
 /*
  * if qemu_netfilter_pass_to_next been called, means that
@@ -73,7 +90,7 @@ ssize_t qemu_netfilter_pass_to_next(NetClientState *sender,
 if (ret) {
 return ret;
 }
-next = QTAILQ_NEXT(next, next);
+next = netfilter_next(next, direction);
 }
 
 /*
diff --git a/net/net.c b/net/net.c
index 87dd356..c929c41 100644
--- a/net/net.c
+++ b/net/net.c
@@ -580,11 +580,21 @@ static ssize_t filter_receive_iov(NetClientState *nc,
 ssize_t ret = 0;
 NetFilterState *nf = NULL;
 
-QTAILQ_FOREACH(nf, &nc->filters, next) {
-ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
- iovcnt, sent_cb);
-if (ret) {
-return ret;
+if (direction == NET_FILTER_DIRECTION_TX) {
+QTAILQ_FOREACH(nf, &nc->filters, next) {
+ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
+ iovcnt, sent_cb);
+if (ret) {
+return ret;
+}
+}
+} else {
+QTAILQ_FOREACH_REVERSE(nf, &nc->filters, NetFilterHead, next) {
+ret = qemu_netfilter_receive(nf, direction, sender, flags, iov,
+ iovcnt, sent_cb);
+if (ret) {
+return ret;
+}
 }
 }
 
-- 
2.5.0




[Qemu-devel] [PULL 13/17] slirp: Adding family argument to tcp_fconnect()

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

This patch simply adds a sa_family_t argument to remove the hardcoded
"AF_INET" in the call of qemu_socket().

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/slirp.h | 2 +-
 slirp/tcp_input.c | 2 +-
 slirp/tcp_subr.c  | 5 +++--
 3 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/slirp/slirp.h b/slirp/slirp.h
index ec0a4c2..fcd4a05 100644
--- a/slirp/slirp.h
+++ b/slirp/slirp.h
@@ -327,7 +327,7 @@ void tcp_respond(struct tcpcb *, register struct tcpiphdr *, register struct mbu
 struct tcpcb * tcp_newtcpcb(struct socket *);
 struct tcpcb * tcp_close(register struct tcpcb *);
 void tcp_sockclosed(struct tcpcb *);
-int tcp_fconnect(struct socket *);
+int tcp_fconnect(struct socket *, sa_family_t af);
 void tcp_connect(struct socket *);
 int tcp_attach(struct socket *);
 uint8_t tcp_tos(struct socket *);
diff --git a/slirp/tcp_input.c b/slirp/tcp_input.c
index 5e2773c..f24e706 100644
--- a/slirp/tcp_input.c
+++ b/slirp/tcp_input.c
@@ -584,7 +584,7 @@ findso:
goto cont_input;
  }
 
-  if ((tcp_fconnect(so) == -1) &&
+ if ((tcp_fconnect(so, so->so_ffamily) == -1) &&
 #if defined(_WIN32)
   socket_error() != WSAEWOULDBLOCK
 #else
diff --git a/slirp/tcp_subr.c b/slirp/tcp_subr.c
index 76c716f..8ec2729 100644
--- a/slirp/tcp_subr.c
+++ b/slirp/tcp_subr.c
@@ -324,14 +324,15 @@ tcp_sockclosed(struct tcpcb *tp)
  * nonblocking.  Connect returns after the SYN is sent, and does
  * not wait for ACK+SYN.
  */
-int tcp_fconnect(struct socket *so)
+int tcp_fconnect(struct socket *so, sa_family_t af)
 {
   int ret=0;
 
   DEBUG_CALL("tcp_fconnect");
   DEBUG_ARG("so = %p", so);
 
-  if( (ret = so->s = qemu_socket(AF_INET,SOCK_STREAM,0)) >= 0) {
+  ret = so->s = qemu_socket(af, SOCK_STREAM, 0);
+  if (ret >= 0) {
 int opt, s=so->s;
 struct sockaddr_storage addr;
 
-- 
2.5.0




Re: [Qemu-devel] [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset tweaks

2016-02-01 Thread Kay, Allen M


> -Original Message-
> From: Alex Williamson [mailto:alex.william...@redhat.com]
> Sent: Sunday, January 31, 2016 9:42 AM
> To: Kay, Allen M; Gerd Hoffmann; David Woodhouse
> Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo Habkost;
> Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> us...@redhat.com
> Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough chipset
> tweaks
> 
> On Sat, 2016-01-30 at 01:18 +, Kay, Allen M wrote:
> >
> > > -Original Message-
> > > From: iGVT-g [mailto:igvt-g-boun...@lists.01.org] On Behalf Of Alex
> > > Williamson
> > > Sent: Friday, January 29, 2016 10:00 AM
> > > To: Gerd Hoffmann
> > > Cc: igv...@ml01.01.org; xen-de...@lists.xensource.com; Eduardo
> > > Habkost; Stefano Stabellini; qemu-devel@nongnu.org; Cao jin; vfio-
> > > us...@redhat.com
> > > Subject: Re: [iGVT-g] [vfio-users] [PATCH v3 00/11] igd passthrough
> > > chipset tweaks
> > >
> > > Do guest drivers depend on IGD appearing at 00:02.0?  I'm currently
> > > testing for any Intel VGA device, but I wonder if I should only be
> > > enabling anything opregion if it also appears at a specific address.
> > >
> >
> > No.  Both Windows and Linux IGD driver should work at any PCI slot.  We
> have seen 0:5.0 in the guest and the driver works.
> 
> Thanks Allen.  Another question, when I boot a VM with an assigned HD
> P4000 GPU, my console streams with IOMMU faults, like:
> 
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> DMAR: DMAR:[DMA Write] Request device [00:02.0] fault addr 9fa3
> 
> All of these fall within the host RMRR range for the device:
> 
> DMAR: Setting RMRR:
> DMAR: Setting identity map for device :00:02.0 [0x9f80 - 0xaf9f]

Hi Alex,

Do you configure IGD as primary or secondary display in your KVM setup?  If
primary, are you running Intel vBIOS as part of guest boot?

On BDW/SKL systems, we have started to configure IGD as secondary and QEMU VGA
as primary.  In this setup, we are no longer running vBIOS in the guest, which
avoids some complications.  vBIOS uses stolen memory for display buffers, which
requires RMRR mapping.  We have been using a similar setup (IGD as secondary) on
other hypervisors and have not seen IOMMU faults.

I will set up a KVM configuration on my SKL and see if I can duplicate your
problem here.  I will try to call into Don's Thursday meeting to discuss this
(I'm on call for jury duty this week).  I will give you a heads up on Wednesday
evening.

Allen

> 
> A while back, we excluded devices using RMRRs from participating in IOMMU
> API domains because they may continue to DMA to these reserved regions
> after assignment, possibly corrupting VM memory (c875d2c1b808).  Intel
> later decided this exclusion shouldn't apply to graphics devices
> (18436afdc11a).  Don't the above IOMMU faults reveal that exactly the
> problem we're trying to prevent by general exclusion of RMRR encumbered
> devices from the IOMMU API is actually occurring?  If I were to have VM
> memory within the RMRR address range, I wouldn't be seeing these faults,
> I'd be having the GPU corrupt my VM memory.
> 
> David notes in the latter commit above:
> 
> "We should be able to successfully assign graphics devices to guests too, as
> long as the initial handling of stolen memory is reconfigured appropriately."
> 
> What code is supposed to be doing that reconfiguration when a device is
> assigned?  Clearly we don't have it yet, making assignment of these devices
> very unsafe.  It seems like vfio or IOMMU code  in the kernel needs device
> specific code to clear these settings to make it safe for userspace, then
> perhaps VM BIOS support to reallocate.  Is there any consistency across IGD
> revisions for doing this?  Is there a spec?
> Thanks,
> 
> Alex



[Qemu-devel] [PATCH v4] blockjob: Fix hang in block_job_finish_sync

2016-02-01 Thread Fam Zheng
With a mirror job running on a virtio-blk dataplane disk, sending "q" to
HMP will cause a dead loop in block_job_finish_sync.

This is because the aio_poll() only processes the AIO context of bs
which has no more work to do, while the main loop BH that is scheduled
for setting the job->completed flag is never processed.

Fix this by adding a flag in BlockJob structure, to track which context
to poll for the block job to make progress. Its value is set to true
when block_job_coroutine_complete() is called, and is checked in
block_job_finish_sync to determine which context to poll.
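
The hang and the fix can be modelled with a toy pair of event contexts. This is only a sketch: `toy_ctx`, `toy_job` and the functions below are invented for illustration, a pending counter stands in for queued callbacks, and a poll cap stands in for "loops forever".

```c
#include <assert.h>
#include <stdbool.h>

struct toy_ctx { int pending; };   /* number of queued callbacks */

struct toy_job {
    bool completed;
    bool deferred_to_main_loop;
};

/* Run one pending callback of ctx, if any; returns true on progress. */
static bool toy_poll(struct toy_ctx *ctx, struct toy_job *job, bool is_main)
{
    if (ctx->pending == 0) {
        return false;
    }
    ctx->pending--;
    if (is_main && job->deferred_to_main_loop) {
        /* the deferred callback is what marks the job as completed */
        job->deferred_to_main_loop = false;
        job->completed = true;
    }
    return true;
}

/* Returns true if the job completed within max_polls iterations. */
static bool toy_finish_sync(struct toy_ctx *main_ctx, struct toy_ctx *job_ctx,
                            struct toy_job *job, bool pick_by_flag,
                            int max_polls)
{
    int i;

    for (i = 0; i < max_polls && !job->completed; i++) {
        bool use_main = pick_by_flag && job->deferred_to_main_loop;
        toy_poll(use_main ? main_ctx : job_ctx, job, use_main);
    }
    return job->completed;
}
```

Polling only the job's own context makes no progress once the completion has been deferred; choosing the context from the flag lets the single pending main-loop callback run and the loop exits.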

Suggested-by: Stefan Hajnoczi 
Signed-off-by: Fam Zheng 
---
 blockjob.c   | 6 +-
 include/block/blockjob.h | 5 +
 2 files changed, 10 insertions(+), 1 deletion(-)

diff --git a/blockjob.c b/blockjob.c
index 80adb9d..b15df93 100644
--- a/blockjob.c
+++ b/blockjob.c
@@ -304,7 +304,9 @@ static int block_job_finish_sync(BlockJob *job,
 return -EBUSY;
 }
 while (!job->completed) {
-aio_poll(bdrv_get_aio_context(bs), true);
+aio_poll(job->deferred_to_main_loop ? qemu_get_aio_context() :
+  bdrv_get_aio_context(bs),
+ true);
 }
 ret = (job->cancelled && job->ret == 0) ? -ECANCELED : job->ret;
 block_job_unref(job);
@@ -478,6 +480,7 @@ static void block_job_defer_to_main_loop_bh(void *opaque)
 aio_context = bdrv_get_aio_context(data->job->bs);
 aio_context_acquire(aio_context);
 
+data->job->deferred_to_main_loop = false;
 data->fn(data->job, data->opaque);
 
 aio_context_release(aio_context);
@@ -497,6 +500,7 @@ void block_job_defer_to_main_loop(BlockJob *job,
 data->aio_context = bdrv_get_aio_context(job->bs);
 data->fn = fn;
 data->opaque = opaque;
+job->deferred_to_main_loop = true;
 
 qemu_bh_schedule(data->bh);
 }
diff --git a/include/block/blockjob.h b/include/block/blockjob.h
index d84ccd8..8bedc49 100644
--- a/include/block/blockjob.h
+++ b/include/block/blockjob.h
@@ -130,6 +130,11 @@ struct BlockJob {
  */
 bool ready;
 
+/**
+ * Set to true when the job has deferred work to the main loop.
+ */
+bool deferred_to_main_loop;
+
 /** Status that is published by the query-block-jobs QMP API */
 BlockDeviceIoStatus iostatus;
 
-- 
2.4.3




Re: [Qemu-devel] [PATCH v3 0/4] Netfilter: Add each netdev a default filter

2016-02-01 Thread Yang Hongyang

If we have to add a default filter, then I have a suggestion only
for this series:
1. Add a nop filter. filter-nop.c
2. Add a "default-filter=xxx" property to -netdev, if not specified,
   default to nop.

On 02/01/2016 08:01 PM, zhanghailiang wrote:

This series is a prerequisite for COLO, here we add each netdev
a default buffer filter, it is disabled by default, and has
no side effect for delivering packets in net layer.

Note: this series is based on patch
 '[PATCH v2] net/filter: Fix the output information for command 'info network''

v3:
  - Drop patch '[PATCH RFC v2 2/5] vl: Make object_create() public'
  - Use object_new_with_props() instead of object_create() (Daniel)
v2:
  - Drop the patch 'net/filter: prevent the default filter to be deleted' (Jason)
  - Re-implement netdev_add_filter() by re-using object_object() (Jason)
  - Send patch 'net/filter: Fix the output information for command 'info
network'' as an independent one. (Jason)

zhanghailiang (4):
   net/filter: Add a 'status' property for filter object
   net/filter: Introduce a helper to add a filter to the netdev
   filter-buffer: Accept zero interval
   net/filter: Add a default filter to each netdev

  include/net/filter.h | 12 
  net/filter-buffer.c  | 10 ---
  net/filter.c | 79 
  net/net.c| 23 +++
  4 files changed, 114 insertions(+), 10 deletions(-)



--
Thanks,
Yang



[Qemu-devel] [PATCH v2 0/2] Active commit regression fix

2016-02-01 Thread Jeff Cody
Changes from v1:

* Rather than allow insertion when bs->device_list.tqe_prev points to
  a NULL entry, make sure that we follow the block layer scheme of enforcing
  that bs->device_list.tqe_prev is NULL upon deletion. (Thanks Max!)

Bug #1300209 is a regression in 2.5, introduced during the 
change away from bdrv_swap().

When we change the parent backing link (change_parent_backing_link),
we must also accommodate non-NULL tqe_prev pointers that point to a
NULL entry.  Please see patch #1 for more details.

Jeff Cody (2):
  block: set device_list.tqe_prev to NULL on BDS removal
  block: qemu-iotests - add test for snapshot, commit, snapshot bug

 block.c|  24 ++
 blockdev.c |   3 +-
 include/block/block.h  |   1 +
 tests/qemu-iotests/143 | 114 +
 tests/qemu-iotests/143.out |  24 ++
 tests/qemu-iotests/group   |   1 +
 6 files changed, 155 insertions(+), 12 deletions(-)
 create mode 100755 tests/qemu-iotests/143
 create mode 100644 tests/qemu-iotests/143.out

-- 
1.9.3




[Qemu-devel] [PATCH v2 2/2] block: qemu-iotests - add test for snapshot, commit, snapshot bug

2016-02-01 Thread Jeff Cody
Signed-off-by: Jeff Cody 
---
 tests/qemu-iotests/143 | 114 +
 tests/qemu-iotests/143.out |  24 ++
 tests/qemu-iotests/group   |   1 +
 3 files changed, 139 insertions(+)
 create mode 100755 tests/qemu-iotests/143
 create mode 100644 tests/qemu-iotests/143.out

diff --git a/tests/qemu-iotests/143 b/tests/qemu-iotests/143
new file mode 100755
index 000..6f3e0bb
--- /dev/null
+++ b/tests/qemu-iotests/143
@@ -0,0 +1,114 @@
+#!/bin/bash
+# Check live snapshot, followed by active commit, and another snapshot. 
+#
+# This test is to catch the error case of BZ #1300209:
+# https://bugzilla.redhat.com/show_bug.cgi?id=1300209
+#
+# Copyright (C) 2016 Red Hat, Inc.
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=jc...@redhat.com
+
+seq=`basename $0`
+echo "QA output created by $seq"
+
+here=`pwd`
+status=1   # failure is the default!
+
+TMP_SNAP1=${TEST_DIR}/tmp.qcow2
+TMP_SNAP2=${TEST_DIR}/tmp2.qcow2
+
+_cleanup()
+{
+_cleanup_qemu
+rm -f "${TEST_IMG}" "${TMP_SNAP1}" "${TMP_SNAP2}"
+}
+
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# get standard environment, filters and checks
+. ./common.rc
+. ./common.filter
+. ./common.qemu
+
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+size=512M
+
+_make_test_img $size
+
+echo
+echo === Launching QEMU ===
+echo 
+
+qemu_comm_method="qmp"
+_launch_qemu -drive file="${TEST_IMG}",if=virtio
+h=$QEMU_HANDLE
+
+
+echo
+echo === Performing Live Snapshot 1 ===
+echo
+
+_send_qemu_cmd $h "{ 'execute': 'qmp_capabilities' }" "return"
+
+
+# First live snapshot, new overlay as active layer
+_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot-sync', 
+'arguments': { 
+ 'device': 'virtio0',
+ 'snapshot-file':'${TMP_SNAP1}',
+ 'format': 'qcow2'
+ }
+}" "return"
+
+echo
+echo === Performing block-commit on active layer ===
+echo
+
+# Block commit on active layer, push the new overlay into base
+_send_qemu_cmd $h "{ 'execute': 'block-commit',
+'arguments': {
+ 'device': 'virtio0'
+  }
+}" "READY"
+
+_send_qemu_cmd $h "{ 'execute': 'block-job-complete',
+'arguments': {
+'device': 'virtio0'
+  }
+   }" "COMPLETED"
+
+echo
+echo === Performing Live Snapshot 2 ===
+echo
+
+# New live snapshot, new overlays as active layer
+_send_qemu_cmd $h "{ 'execute': 'blockdev-snapshot-sync',
+'arguments': {
+'device': 'virtio0',
+'snapshot-file':'${TMP_SNAP2}',
+'format': 'qcow2'
+  }
+   }" "return"
+
+# success, all done
+echo "*** done"
+rm -f $seq.full
+status=0
diff --git a/tests/qemu-iotests/143.out b/tests/qemu-iotests/143.out
new file mode 100644
index 000..05cc9f4
--- /dev/null
+++ b/tests/qemu-iotests/143.out
@@ -0,0 +1,24 @@
+QA output created by 143
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=536870912
+
+=== Launching QEMU ===
+
+
+=== Performing Live Snapshot 1 ===
+
+{"return": {}}
+Formatting 'TEST_DIR/tmp.qcow2', fmt=qcow2 size=536870912 
backing_file=TEST_DIR/t.qcow2 backing_fmt=qcow2 encryption=off 
cluster_size=65536 lazy_refcounts=off refcount_bits=16
+{"return": {}}
+
+=== Performing block-commit on active layer ===
+
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_READY", "data": {"device": "virtio0", "len": 0, "offset": 0, 
"speed": 0, "type": "commit"}}
+{"return": {}}
+{"timestamp": {"seconds":  TIMESTAMP, "microseconds":  TIMESTAMP}, "event": 
"BLOCK_JOB_COMPLETED", "data": {"device": "virtio0", "len": 0, "offset": 0, 
"speed": 0, "type": "commit"}}
+
+=== Performing Live Snapshot 2 ===
+
+Formatting 'TEST_DIR/tmp2.qcow2', fmt=qcow2 size=536870912 

[Qemu-devel] [PATCH v2 1/2] block: set device_list.tqe_prev to NULL on BDS removal

2016-02-01 Thread Jeff Cody
This fixes a regression introduced with commit 3f09bfbc7.  Multiple
bugs arise in conjunction with live snapshots and mirroring operations
(which include active layer commit).

After a live snapshot occurs, the active layer and the base layer both
have a non-NULL tqe_prev field in the device_list, although the base
node's tqe_prev field points to a NULL entry.  This non-NULL tqe_prev
field occurs after the bdrv_append() in the external snapshot calls
change_parent_backing_link().

In change_parent_backing_link(), when the previous active layer is
removed from device_list, the device_list.tqe_prev pointer is not
set to NULL.

The operating scheme in the block layer is to indicate that a BDS belongs
in the bdrv_states device_list iff the device_list.tqe_prev pointer
is non-NULL.

This patch does two things:

1.) Introduces a new block layer helper bdrv_device_remove() to remove a
BDS from the device_list, and
2.) uses that new API, which also fixes the regression once used in
change_parent_backing_link().
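
The invariant described above ("in the list iff tqe_prev is non-NULL") can
be sketched outside QEMU with the BSD <sys/queue.h> TAILQ macros, which
QEMU's QTAILQ macros resemble. This is only an illustration under that
assumption: struct node, node_insert(), node_remove() and node_in_list()
are hypothetical names, not part of the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <sys/queue.h>

/* Hypothetical list element; link.tqe_prev doubles as the membership flag. */
struct node {
    int id;
    TAILQ_ENTRY(node) link;
};

TAILQ_HEAD(node_list, node);

/* A zero-initialized node has tqe_prev == NULL, i.e. "not in any list". */
static int node_in_list(const struct node *n)
{
    return n->link.tqe_prev != NULL;
}

/* TAILQ_INSERT_TAIL always sets tqe_prev to a non-NULL address. */
static void node_insert(struct node_list *head, struct node *n)
{
    assert(!node_in_list(n));
    TAILQ_INSERT_TAIL(head, n, link);
}

/* TAILQ_REMOVE leaves tqe_prev untouched, so reset it by hand to keep
 * the invariant; every removal should go through this helper. */
static void node_remove(struct node_list *head, struct node *n)
{
    assert(node_in_list(n));
    TAILQ_REMOVE(head, n, link);
    n->link.tqe_prev = NULL;
}
```

Routing every removal through one helper is what keeps the membership test
reliable; a direct TAILQ_REMOVE() elsewhere would leave a stale non-NULL
tqe_prev behind.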

Signed-off-by: Jeff Cody 
---
 block.c   | 24 ++--
 blockdev.c|  3 +--
 include/block/block.h |  1 +
 3 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 5709d3d..08cc130 100644
--- a/block.c
+++ b/block.c
@@ -2220,21 +2220,25 @@ void bdrv_close_all(void)
 }
 }
 
+/* Note that bs->device_list.tqe_prev is initially null,
+ * and gets set to non-null by QTAILQ_INSERT_TAIL().  Establish
+ * the useful invariant "bs in bdrv_states iff bs->tqe_prev" by
+ * resetting it to null on remove.  */
+void bdrv_device_remove(BlockDriverState *bs)
+{
+QTAILQ_REMOVE(&bdrv_states, bs, device_list);
+bs->device_list.tqe_prev = NULL;
+}
+
 /* make a BlockDriverState anonymous by removing from bdrv_state and
  * graph_bdrv_state list.
Also, NULL terminate the device_name to prevent double remove */
 void bdrv_make_anon(BlockDriverState *bs)
 {
-/*
- * Take care to remove bs from bdrv_states only when it's actually
- * in it.  Note that bs->device_list.tqe_prev is initially null,
- * and gets set to non-null by QTAILQ_INSERT_TAIL().  Establish
- * the useful invariant "bs in bdrv_states iff bs->tqe_prev" by
- * resetting it to null on remove.
- */
+/* Take care to remove bs from bdrv_states only when it's actually
+ * in it. */
 if (bs->device_list.tqe_prev) {
-QTAILQ_REMOVE(&bdrv_states, bs, device_list);
-bs->device_list.tqe_prev = NULL;
+bdrv_device_remove(bs);
 }
 if (bs->node_name[0] != '\0') {
 QTAILQ_REMOVE(&graph_bdrv_states, bs, node_list);
@@ -2275,7 +2279,7 @@ static void change_parent_backing_link(BlockDriverState 
*from,
 if (!to->device_list.tqe_prev) {
 QTAILQ_INSERT_BEFORE(from, to, device_list);
 }
-QTAILQ_REMOVE(&bdrv_states, from, device_list);
+bdrv_device_remove(from);
 }
 }
 
diff --git a/blockdev.c b/blockdev.c
index 07cfe25..5f8e1b6 100644
--- a/blockdev.c
+++ b/blockdev.c
@@ -2382,8 +2382,7 @@ void qmp_x_blockdev_remove_medium(const char *device, 
Error **errp)
 
 /* This follows the convention established by bdrv_make_anon() */
 if (bs->device_list.tqe_prev) {
-QTAILQ_REMOVE(&bdrv_states, bs, device_list);
-bs->device_list.tqe_prev = NULL;
+bdrv_device_remove(bs);
 }
 
 blk_remove_bs(blk);
diff --git a/include/block/block.h b/include/block/block.h
index 25f36dc..8c53b93 100644
--- a/include/block/block.h
+++ b/include/block/block.h
@@ -198,6 +198,7 @@ int bdrv_create(BlockDriver *drv, const char* filename,
 int bdrv_create_file(const char *filename, QemuOpts *opts, Error **errp);
 BlockDriverState *bdrv_new_root(void);
 BlockDriverState *bdrv_new(void);
+void bdrv_device_remove(BlockDriverState *bs);
 void bdrv_make_anon(BlockDriverState *bs);
 void bdrv_append(BlockDriverState *bs_new, BlockDriverState *bs_top);
 void bdrv_replace_in_backing_chain(BlockDriverState *old,
-- 
1.9.3




Re: [Qemu-devel] [PATCH 2/2] hw/vfio/platform: Add Qualcomm Technologies, Inc HiDMA device support

2016-02-01 Thread Shanker Donthineni

Hi Eric,

On 02/01/2016 08:37 AM, Eric Auger wrote:

Hi Shanker, Vikram,
On 01/30/2016 12:00 AM, Shanker Donthineni wrote:

From: Vikram Sethi 

This patch introduces a Qualcomm Technologies, Inc HiDMA
device and allows the instantiation of the vfio-qcom-hidma
device from the QEMU command line
(-device vfio-qcom-hidma,host="").

A device tree node is created for the guest containing compat,
dma-coherent, reg and interrupts properties.

Signed-off-by: Vikram Sethi 
Signed-off-by: Shanker Donthineni 
---
  hw/arm/sysbus-fdt.c   |  2 ++
  hw/vfio/Makefile.objs |  1 +
  hw/vfio/qcom-hidma.c  | 57 +++
  include/hw/vfio/vfio-qcom-hidma.h | 49 +
  4 files changed, 109 insertions(+)
  create mode 100644 hw/vfio/qcom-hidma.c
  create mode 100644 include/hw/vfio/vfio-qcom-hidma.h

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 6ee7af2..4a7419e 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -28,6 +28,7 @@
  #include "sysemu/sysemu.h"
  #include "hw/vfio/vfio-platform.h"
  #include "hw/vfio/vfio-calxeda-xgmac.h"
+#include "hw/vfio/vfio-qcom-hidma.h"
  #include "hw/arm/fdt.h"
  
  /*

@@ -126,6 +127,7 @@ fail_reg:
  /* list of supported dynamic sysbus devices */
  static const NodeCreationPair add_fdt_node_functions[] = {
  {TYPE_VFIO_CALXEDA_XGMAC, add_generic_platform_fdt_node},
+{TYPE_VFIO_QCOM_HIDMA, add_generic_platform_fdt_node},
  {"", NULL}, /* last element */
  };
  
diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs

index d324863..9bcb093 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -3,4 +3,5 @@ obj-$(CONFIG_SOFTMMU) += common.o
  obj-$(CONFIG_PCI) += pci.o pci-quirks.o
  obj-$(CONFIG_SOFTMMU) += platform.o
  obj-$(CONFIG_SOFTMMU) += calxeda-xgmac.o
+obj-$(CONFIG_SOFTMMU) += qcom-hidma.o
  endif
diff --git a/hw/vfio/qcom-hidma.c b/hw/vfio/qcom-hidma.c
new file mode 100644
index 000..04acbd8
--- /dev/null
+++ b/hw/vfio/qcom-hidma.c
@@ -0,0 +1,57 @@
+/*
+ * Qualcomm Technologies, Inc VFIO HiDMA platform device
+ *
+ * Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ */
+
+#include "hw/vfio/vfio-qcom-hidma.h"
+
+static void qcom_hidma_realize(DeviceState *dev, Error **errp)
+{
+VFIOPlatformDevice *vdev = VFIO_PLATFORM_DEVICE(dev);
+VFIOQcomHidmaDeviceClass *k = VFIO_QCOM_HIDMA_DEVICE_GET_CLASS(dev);
+
+vdev->compat = g_strdup("qcom,hidma");
+
+k->parent_realize(dev, errp);
+}
+
+static const VMStateDescription vfio_platform_vmstate = {
+.name = TYPE_VFIO_QCOM_HIDMA,
+.unmigratable = 1,
+};
+
+static void vfio_qcom_hidma_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VFIOQcomHidmaDeviceClass *vcxc = VFIO_QCOM_HIDMA_DEVICE_CLASS(klass);
+
+vcxc->parent_realize = dc->realize;
+dc->realize = qcom_hidma_realize;
+dc->desc = "VFIO QCOM HIDMA";

If I am not wrong, you are missing the dc->vmsd = &vfio_platform_vmstate
assignment (VMStateDescription attachment).

This will cause an error with CLANG - I got that one already, reported
by Peter ;-) -

error: unused variable
'vfio_platform_vmstate' [-Werror,-Wunused-const-variable]
static const VMStateDescription vfio_platform_vmstate = {


Thanks for your finding, I will fix.


+}
+
+static const TypeInfo vfio_qcom_hidma_dev_info = {
+.name = TYPE_VFIO_QCOM_HIDMA,
+.parent = TYPE_VFIO_PLATFORM,
+.instance_size = sizeof(VFIOQcomHidmaDevice),
+.class_init = vfio_qcom_hidma_class_init,
+.class_size = sizeof(VFIOQcomHidmaDeviceClass),
+};
+
+static void register_qcom_hidma_dev_type(void)
+{
+type_register_static(&vfio_qcom_hidma_dev_info);
+}
+
+type_init(register_qcom_hidma_dev_type)
diff --git a/include/hw/vfio/vfio-qcom-hidma.h 
b/include/hw/vfio/vfio-qcom-hidma.h
new file mode 100644
index 000..a7cc8e6
--- /dev/null
+++ b/include/hw/vfio/vfio-qcom-hidma.h
@@ -0,0 +1,49 @@
+/*
+ * Qualcomm Technologies, Inc VFIO HiDMA platform device
+ *
+ * Copyright (c) 2016, The Linux Foundation. All rights reserved.
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License version 2 and
+ * only version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * 

[Qemu-devel] [PULL 04/17] cadence_gem: fix buffer overflow

2016-02-01 Thread Jason Wang
From: "Michael S. Tsirkin" 

gem_transmit copies a packet from the guest into a tx_packet[2048]
array on the stack, with the size limited only by the descriptor length set
by the guest.  If a malicious guest specifies a descriptor length large
enough that the packet size exceeds the array size, this results in a
buffer overflow.
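
The kind of check this calls for can be sketched in isolation as follows.
This is a simplified illustration, not the cadence_gem code:
append_fragment() and its parameters are hypothetical, and the fragment is
copied with memcpy() rather than read from guest DMA memory.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: append a fragment to a fixed-size staging buffer.
 * Returns 0 on success, -1 if the fragment does not fit in what is left. */
static int append_fragment(uint8_t *buf, size_t buf_size, size_t *used,
                           const uint8_t *frag, size_t frag_len)
{
    assert(buf != NULL && used != NULL && frag != NULL);

    /* Compare against the remaining space, not the total size; written
     * this way the subtraction cannot wrap around. */
    if (*used > buf_size || frag_len > buf_size - *used) {
        return -1;
    }
    memcpy(buf + *used, frag, frag_len);
    *used += frag_len;
    return 0;
}
```

The length is untrusted input, so it is validated before any copy happens,
and a rejected fragment leaves the buffer state unchanged.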

Reported-by: 刘令 
Signed-off-by: Michael S. Tsirkin 
Signed-off-by: Jason Wang 
---
 hw/net/cadence_gem.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index e513d9d..0346f3e 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -867,6 +867,14 @@ static void gem_transmit(CadenceGEMState *s)
 break;
 }
 
+if (tx_desc_get_length(desc) > sizeof(tx_packet) - (p - tx_packet)) {
+DB_PRINT("TX descriptor @ 0x%x too large: size 0x%x space 0x%x\n",
+ (unsigned)packet_desc_addr,
+ (unsigned)tx_desc_get_length(desc),
+ sizeof(tx_packet) - (p - tx_packet));
+break;
+}
+
 /* Gather this fragment of the packet from "dma memory" to our contig.
  * buffer.
  */
-- 
2.5.0




[Qemu-devel] [PULL 10/17] slirp: Factorizing and cleaning solookup()

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

solookup() was only compatible with TCP. With the socket list passed as an
argument, it is now compatible with UDP too.

Some optimization code is factored into the function (the function
looks at the last returned result before browsing the complete socket
list).
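
The "check the last result first" pattern can be sketched on a generic
circular list with a sentinel head. The names below (struct entry,
lookup_cached()) are hypothetical, and a single integer key stands in for
the address/port tuple that solookup() actually compares.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element of a circular singly linked list whose head is a
 * sentinel that never matches a lookup. */
struct entry {
    struct entry *next;
    unsigned key;
};

/* Look up key, trying the cached last hit before walking the list.
 * *last must point at the sentinel or at an element of the list. */
static struct entry *lookup_cached(struct entry **last, struct entry *head,
                                   unsigned key)
{
    struct entry *e;

    assert(last != NULL && *last != NULL && head != NULL);

    /* Fast path: same entry as the previous successful lookup. */
    e = *last;
    if (e != head && e->key == key) {
        return e;
    }

    /* Slow path: full walk; remember the hit for next time. */
    for (e = head->next; e != head; e = e->next) {
        if (e->key == key) {
            *last = e;
            return e;
        }
    }
    return NULL;
}
```

Passing the cache slot by pointer lets one function serve several lists,
each with its own cache, which is what makes a shared TCP/UDP lookup
possible.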

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/socket.c| 37 -
 slirp/socket.h|  5 +++--
 slirp/tcp_input.c | 13 +++--
 slirp/udp.c   | 21 ++---
 4 files changed, 32 insertions(+), 44 deletions(-)

diff --git a/slirp/socket.c b/slirp/socket.c
index d1034fb..8f73e90 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -16,23 +16,34 @@ static void sofcantrcvmore(struct socket *so);
 static void sofcantsendmore(struct socket *so);
 
 struct socket *
-solookup(struct socket *head, struct in_addr laddr, u_int lport,
+solookup(struct socket **last, struct socket *head,
+ struct in_addr laddr, u_int lport,
  struct in_addr faddr, u_int fport)
 {
-   struct socket *so;
-
-   for (so = head->so_next; so != head; so = so->so_next) {
-   if (so->so_lport == lport &&
-   so->so_laddr.s_addr == laddr.s_addr &&
-   so->so_faddr.s_addr == faddr.s_addr &&
-   so->so_fport == fport)
-  break;
-   }
+struct socket *so = *last;
+
+/* Optimisation */
+if (so != head &&
+so->so_lport == lport &&
+so->so_laddr.s_addr == laddr.s_addr &&
+(!faddr.s_addr ||
+(so->so_faddr.s_addr == faddr.s_addr &&
+ so->so_fport == fport))) {
+return so;
+}
 
-   if (so == head)
-  return (struct socket *)NULL;
-   return so;
+for (so = head->so_next; so != head; so = so->so_next) {
+if (so->so_lport == lport &&
+so->so_laddr.s_addr == laddr.s_addr &&
+(!faddr.s_addr ||
+(so->so_faddr.s_addr == faddr.s_addr &&
+ so->so_fport == fport))) {
+*last = so;
+return so;
+}
+}
 
+return (struct socket *)NULL;
 }
 
 /*
diff --git a/slirp/socket.h b/slirp/socket.h
index b27bbb2..1c8c24c 100644
--- a/slirp/socket.h
+++ b/slirp/socket.h
@@ -87,8 +87,9 @@ struct socket {
 #define SS_HOSTFWD 0x1000  /* Socket describes host->guest 
forwarding */
 #define SS_INCOMING0x2000  /* Connection was initiated by a host 
on the internet */
 
-struct socket * solookup(struct socket *, struct in_addr, u_int, struct 
in_addr, u_int);
-struct socket * socreate(Slirp *);
+struct socket *solookup(struct socket **, struct socket *,
+struct in_addr, u_int, struct in_addr, u_int);
+struct socket *socreate(Slirp *);
 void sofree(struct socket *);
 int soread(struct socket *);
 void sorecvoob(struct socket *);
diff --git a/slirp/tcp_input.c b/slirp/tcp_input.c
index 4c3191d..5492061 100644
--- a/slirp/tcp_input.c
+++ b/slirp/tcp_input.c
@@ -320,16 +320,9 @@ tcp_input(struct mbuf *m, int iphlen, struct socket *inso)
 * Locate pcb for segment.
 */
 findso:
-   so = slirp->tcp_last_so;
-   if (so->so_fport != ti->ti_dport ||
-   so->so_lport != ti->ti_sport ||
-   so->so_laddr.s_addr != ti->ti_src.s_addr ||
-   so->so_faddr.s_addr != ti->ti_dst.s_addr) {
-   so = solookup(&slirp->tcb, ti->ti_src, ti->ti_sport,
-  ti->ti_dst, ti->ti_dport);
-   if (so)
-   slirp->tcp_last_so = so;
-   }
+   so = solookup(&slirp->tcp_last_so, &slirp->tcb,
+ ti->ti_src, ti->ti_sport,
+ ti->ti_dst, ti->ti_dport);
 
/*
 * If the state is CLOSED (i.e., TCB does not exist) then
diff --git a/slirp/udp.c b/slirp/udp.c
index 8203eb1..126ef82 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c
@@ -151,25 +151,8 @@ udp_input(register struct mbuf *m, int iphlen)
/*
 * Locate pcb for datagram.
 */
-   so = slirp->udp_last_so;
-   if (so == &slirp->udb || so->so_lport != uh->uh_sport ||
-   so->so_laddr.s_addr != ip->ip_src.s_addr) {
-   struct socket *tmp;
-
-   for (tmp = slirp->udb.so_next; tmp != &slirp->udb;
-tmp = tmp->so_next) {
-   if (tmp->so_lport == uh->uh_sport &&
-   tmp->so_laddr.s_addr == ip->ip_src.s_addr) {
-   so = tmp;
-   break;
-   }
-   }
-   if (tmp == &slirp->udb) {
- so = NULL;
-   } else {
- slirp->udp_last_so = so;
-   }
-   }
+   so = 

[Qemu-devel] [PULL 05/17] slirp: goto bad in udp_input if sosendto fails

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

Before this patch, if sosendto fails, udp_input continues as if the
packet was sent, recording the packet for icmp errors. That does not
make sense: since the packet was not actually sent, any errors would be
related to a previous packet.

This patch adds a goto bad to cut the execution of this function.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/udp.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/slirp/udp.c b/slirp/udp.c
index fee13b4..ce63414 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c
@@ -218,6 +218,7 @@ udp_input(register struct mbuf *m, int iphlen)
  *ip=save_ip;
  DEBUG_MISC((dfd,"udp tx errno = %d-%s\n",errno,strerror(errno)));
  icmp_error(m, ICMP_UNREACH,ICMP_UNREACH_NET, 0,strerror(errno));
+ goto bad;
}
 
m_free(so->so_m);   /* used for ICMP if error on sorecvfrom */
-- 
2.5.0




[Qemu-devel] [PULL 17/17] net/filter: Fix the output information for command 'info network'

2016-02-01 Thread Jason Wang
From: zhanghailiang 

The properties of a netfilter object can be changed by the 'qom-set'
command, but the output of the 'info network' command is not updated,
because it reads the old information from nf->info_str, which is
not refreshed when a netfilter property changes.

Here we split out a helper function that collects the output
information for a filter, and also remove the now-useless member
'info_str' from struct NetFilterState.
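
The underlying idea, formatting the description from live property values
at query time rather than caching a string at creation time, can be
sketched as follows. struct filter and filter_format_info() are
hypothetical stand-ins, not QEMU's NetFilterState or its QOM property
iteration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a filter object with one mutable property. */
struct filter {
    const char *id;
    unsigned interval;
};

/* Build the description from the current property value on every call,
 * so a later property change is always reflected in the output. */
static int filter_format_info(const struct filter *f, char *out, size_t size)
{
    assert(f != NULL && out != NULL);
    return snprintf(out, size, "%s: interval=%u", f->id, f->interval);
}
```

With no cached copy there is nothing to invalidate when a property is
changed later.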

Signed-off-by: zhanghailiang 
Cc: Jason Wang 
Cc: Eric Blake 
Cc: Markus Armbruster 
Cc: Yang Hongyang 
Reviewed-by: Eric Blake 
Signed-off-by: Jason Wang 
---
 include/net/filter.h |  1 -
 net/filter.c | 22 --
 net/net.c| 32 +---
 3 files changed, 29 insertions(+), 26 deletions(-)

diff --git a/include/net/filter.h b/include/net/filter.h
index 2deda36..5639976 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,7 +55,6 @@ struct NetFilterState {
 char *netdev_id;
 NetClientState *netdev;
 NetFilterDirection direction;
-char info_str[256];
 QTAILQ_ENTRY(NetFilterState) next;
 };
 
diff --git a/net/filter.c b/net/filter.c
index 17a8398..8f07b99 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -15,7 +15,6 @@
 #include "net/vhost_net.h"
 #include "qom/object_interfaces.h"
 #include "qemu/iov.h"
-#include "qapi/string-output-visitor.h"
 
 ssize_t qemu_netfilter_receive(NetFilterState *nf,
NetFilterDirection direction,
@@ -152,10 +151,6 @@ static void netfilter_complete(UserCreatable *uc, Error 
**errp)
 NetFilterClass *nfc = NETFILTER_GET_CLASS(uc);
 int queues;
 Error *local_err = NULL;
-char *str, *info;
-ObjectProperty *prop;
-ObjectPropertyIterator iter;
-StringOutputVisitor *ov;
 
 if (!nf->netdev_id) {
 error_setg(errp, "Parameter 'netdev' is required");
@@ -189,23 +184,6 @@ static void netfilter_complete(UserCreatable *uc, Error 
**errp)
 }
 }
 QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
-
-/* generate info str */
-object_property_iter_init(&iter, OBJECT(nf));
-while ((prop = object_property_iter_next(&iter))) {
-if (!strcmp(prop->name, "type")) {
-continue;
-}
-ov = string_output_visitor_new(false);
-object_property_get(OBJECT(nf), string_output_get_visitor(ov),
-prop->name, errp);
-str = string_output_get_string(ov);
-string_output_visitor_cleanup(ov);
-info = g_strdup_printf(",%s=%s", prop->name, str);
-g_strlcat(nf->info_str, info, sizeof(nf->info_str));
-g_free(str);
-g_free(info);
-}
 }
 
 static void netfilter_finalize(Object *obj)
diff --git a/net/net.c b/net/net.c
index c929c41..55ce154 100644
--- a/net/net.c
+++ b/net/net.c
@@ -45,6 +45,7 @@
 #include "qapi/dealloc-visitor.h"
 #include "sysemu/sysemu.h"
 #include "net/filter.h"
+#include "qapi/string-output-visitor.h"
 
 /* Net bridge is currently not supported for W32. */
 #if !defined(_WIN32)
@@ -1195,6 +1196,30 @@ void qmp_netdev_del(const char *id, Error **errp)
 qemu_opts_del(opts);
 }
 
+static void netfilter_print_info(Monitor *mon, NetFilterState *nf)
+{
+char *str;
+ObjectProperty *prop;
+ObjectPropertyIterator iter;
+StringOutputVisitor *ov;
+
+/* generate info str */
+object_property_iter_init(&iter, OBJECT(nf));
+while ((prop = object_property_iter_next(&iter))) {
+if (!strcmp(prop->name, "type")) {
+continue;
+}
+ov = string_output_visitor_new(false);
+object_property_get(OBJECT(nf), string_output_get_visitor(ov),
+prop->name, NULL);
+str = string_output_get_string(ov);
+string_output_visitor_cleanup(ov);
+monitor_printf(mon, ",%s=%s", prop->name, str);
+g_free(str);
+}
+monitor_printf(mon, "\n");
+}
+
 void print_net_client(Monitor *mon, NetClientState *nc)
 {
 NetFilterState *nf;
@@ -1208,9 +1233,10 @@ void print_net_client(Monitor *mon, NetClientState *nc)
 }
 QTAILQ_FOREACH(nf, &nc->filters, next) {
 char *path = object_get_canonical_path_component(OBJECT(nf));
-monitor_printf(mon, "  - %s: type=%s%s\n", path,
-   object_get_typename(OBJECT(nf)),
-   nf->info_str);
+
+monitor_printf(mon, "  - %s: type=%s", path,
+   object_get_typename(OBJECT(nf)));
+netfilter_print_info(mon, nf);
 g_free(path);
 }
 }
-- 
2.5.0




Re: [Qemu-devel] [RFC PATCH v1 1/1] vGPU core driver : to provide common interface for vGPU.

2016-02-01 Thread Kirti Wankhede
Resending this mail; somehow my previous mail didn't reach
everyone's inbox.


On 2/2/2016 3:16 AM, Kirti Wankhede wrote:

Design for vGPU Driver:
The main purpose of the vGPU driver is to provide a common interface for vGPU
management that can be used by different GPU drivers.

This module would provide a generic interface to create the device, add
it to vGPU bus, add device to IOMMU group and then add it to vfio group.

High Level block diagram:


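+--------------+    vgpu_register_driver()+---------------+
|     __init() +------------------------->+               |
|              |                          |               |
|              +<-------------------------+    vgpu.ko    |
| vfio_vgpu.ko |   probe()/remove()       |               |
|              |                +---------+               +---------+
+--------------+                |         +-------+-------+         |
                                |                 ^                 |
                                | callback        |                 |
                                |         +-------+--------+        |
                                |         |vgpu_register_device()   |
                                |         |                |        |
                                +---^-----+-----+    +-----+------+-+
                                    | nvidia.ko |    |  i915.ko   |
                                    |           |    |            |
                                    +-----------+    +------------+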

vGPU driver provides two types of registration interfaces:
1. Registration interface for vGPU bus driver:

/**
  * struct vgpu_driver - vGPU device driver
  * @name: driver name
  * @probe: called when new device created
  * @remove: called when device removed
  * @driver: device driver structure
  *
  **/
struct vgpu_driver {
 const char *name;
 int  (*probe)  (struct device *dev);
 void (*remove) (struct device *dev);
 struct device_driverdriver;
};

int  vgpu_register_driver(struct vgpu_driver *drv, struct module *owner);
void vgpu_unregister_driver(struct vgpu_driver *drv);

VFIO bus driver for vgpu, should use this interface to register with
vGPU driver. With this, VFIO bus driver for vGPU devices is responsible
to add vGPU device to VFIO group.

2. GPU driver interface

/**
  * struct gpu_device_ops - Structure to be registered for each physical GPU
  * to register the device to vgpu module.
  *
  * @owner:  The module owner.
  * @vgpu_supported_config: Called to get information about supported
  *   vgpu types.
  *  @dev : pci device structure of physical GPU.
  *  @config: should return string listing supported
  *  config
  *  Returns integer: success (0) or error (< 0)
  * @vgpu_create:Called to allocate basic resources in graphics
  *  driver for a particular vgpu.
  *  @dev: physical pci device structure on which vgpu
  *should be created
  *  @uuid: uuid for which VM it is intended to
  *  @instance: vgpu instance in that VM
  *  @vgpu_id: This represents the type of vgpu to be
  *created
  *  Returns integer: success (0) or error (< 0)
  * @vgpu_destroy:   Called to free resources in graphics driver for
  *  a vgpu instance of that VM.
  *  @dev: physical pci device structure to which
  *  this vgpu points to.
  *  @uuid: uuid for which the vgpu belongs to.
  *  @instance: vgpu instance in that VM
  *  Returns integer: success (0) or error (< 0)
  *  If VM is running and vgpu_destroy is called that
  *  means the vGPU is being hot-unplugged. Return error
  *  if VM is running and graphics driver doesn't
  *  support vgpu hotplug.
  * @vgpu_start: Called to initiate vGPU initialization
  *  process in graphics driver when VM boots before
  *  qemu starts.
  *  @uuid: UUID which is booting.
  *  Returns integer: success (0) or error (< 0)
  * @vgpu_shutdown:  Called to teardown vGPU related resources for
  *  the VM
  *  @uuid: UUID which is shutting down .
  *  Returns integer: success (0) or error (< 0)
  * @read:   Read emulation callback
  *  @vdev: vgpu device structure
  *  @buf: read buffer
  *  @count: number bytes to read
  *  @address_space: specifies for which address space
  *  the 

[Qemu-devel] [PULL 07/17] slirp: Adding address family switch for produced frames

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

In if_encap, a switch is added to prepare for the IPv6 case. Some code
is factorized.

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
---
 slirp/slirp.c | 61 ++-
 1 file changed, 48 insertions(+), 13 deletions(-)

diff --git a/slirp/slirp.c b/slirp/slirp.c
index 1d5d172..f8dc505 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -762,20 +762,15 @@ void slirp_input(Slirp *slirp, const uint8_t *pkt, int 
pkt_len)
 }
 }
 
-/* Output the IP packet to the ethernet device. Returns 0 if the packet must be
- * re-queued.
+/* Prepare the IPv4 packet to be sent to the ethernet device. Returns 1 if no
+ * packet should be sent, 0 if the packet must be re-queued, 2 if the packet
+ * is ready to go.
  */
-int if_encap(Slirp *slirp, struct mbuf *ifm)
+static int if_encap4(Slirp *slirp, struct mbuf *ifm, struct ethhdr *eh,
+uint8_t ethaddr[ETH_ALEN])
 {
-uint8_t buf[1600];
-struct ethhdr *eh = (struct ethhdr *)buf;
-uint8_t ethaddr[ETH_ALEN];
 const struct ip *iph = (const struct ip *)ifm->m_data;
 
-if (ifm->m_len + ETH_HLEN > sizeof(buf)) {
-return 1;
-}
-
 if (iph->ip_dst.s_addr == 0) {
 /* 0.0.0.0 can not be a destination address, something went wrong,
  * avoid making it worse */
@@ -819,15 +814,55 @@ int if_encap(Slirp *slirp, struct mbuf *ifm)
 }
 return 0;
 } else {
-memcpy(eh->h_dest, ethaddr, ETH_ALEN);
 memcpy(eh->h_source, special_ethaddr, ETH_ALEN - 4);
 /* XXX: not correct */
 memcpy(&eh->h_source[2], &slirp->vhost_addr, 4);
 eh->h_proto = htons(ETH_P_IP);
-memcpy(buf + sizeof(struct ethhdr), ifm->m_data, ifm->m_len);
-slirp_output(slirp->opaque, buf, ifm->m_len + ETH_HLEN);
+
+/* Send this */
+return 2;
+}
+}
+
+/* Output the IP packet to the ethernet device. Returns 0 if the packet must be
+ * re-queued.
+ */
+int if_encap(Slirp *slirp, struct mbuf *ifm)
+{
+uint8_t buf[1600];
+struct ethhdr *eh = (struct ethhdr *)buf;
+uint8_t ethaddr[ETH_ALEN];
+const struct ip *iph = (const struct ip *)ifm->m_data;
+int ret;
+
+if (ifm->m_len + ETH_HLEN > sizeof(buf)) {
 return 1;
 }
+
+switch (iph->ip_v) {
+case IPVERSION:
+ret = if_encap4(slirp, ifm, eh, ethaddr);
+if (ret < 2) {
+return ret;
+}
+break;
+
+default:
+/* Do not assert while we don't manage IP6VERSION */
+/* assert(0); */
+break;
+}
+
+memcpy(eh->h_dest, ethaddr, ETH_ALEN);
+DEBUG_ARGS((dfd, " src = %02x:%02x:%02x:%02x:%02x:%02x\n",
+eh->h_source[0], eh->h_source[1], eh->h_source[2],
+eh->h_source[3], eh->h_source[4], eh->h_source[5]));
+DEBUG_ARGS((dfd, " dst = %02x:%02x:%02x:%02x:%02x:%02x\n",
+eh->h_dest[0], eh->h_dest[1], eh->h_dest[2],
+eh->h_dest[3], eh->h_dest[4], eh->h_dest[5]));
+memcpy(buf + sizeof(struct ethhdr), ifm->m_data, ifm->m_len);
+slirp_output(slirp->opaque, buf, ifm->m_len + ETH_HLEN);
+return 1;
 }
 
 /* Drop host forwarding rule, return 0 if found. */
-- 
2.5.0




[Qemu-devel] [PULL 09/17] slirp: Factorizing address translation

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

This patch factorizes some duplicate code into a new function,
sotranslate_out(). This function performs the address translation when a
packet is transmitted to the host network. If the packet is destined
for the host, the loopback address is used, and if the packet is
destined for the virtual DNS, the real DNS address is used. This code
is just a copy of the existing code, but factorized and ready to manage
the IPv6 case.
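
The translation rule can be sketched for IPv4 as below, following the
shape of the ICMP code this patch replaces. It is only an illustration:
struct vnet and translate_out() are hypothetical names, addresses are
plain host-order integers, and the fallback to loopback when the real DNS
address cannot be determined is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of the virtual network (host byte order). */
struct vnet {
    uint32_t network;   /* virtual network address */
    uint32_t mask;      /* virtual network mask */
    uint32_t vdns;      /* virtual DNS server address */
    uint32_t real_dns;  /* DNS server the host really uses */
    uint32_t loopback;  /* host loopback address */
};

/* Map a guest-visible destination to the address used on the host. */
static uint32_t translate_out(const struct vnet *v, uint32_t dst)
{
    assert(v != NULL);

    if ((dst & v->mask) != v->network) {
        return dst;             /* outside the virtual network: unchanged */
    }
    if (dst == v->vdns) {
        return v->real_dns;     /* virtual DNS alias */
    }
    return v->loopback;         /* any other alias maps to the host */
}
```

Keeping the rule in one function means each sender calls the same
translation instead of repeating the alias checks.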

On the same model, the major part of udp_output() code is moved into a
new sotranslate_in(). This function is directly used in sorecvfrom(),
like sotranslate_out() in sosendto().
udp_output() becoming useless, it is removed and udp_output2() is
renamed into udp_output(). This adds consistency with the udp6_output()
function introduced by further patches.

Lastly, this factorizes some duplicate code into sotranslate_accept(), which
performs the address translation when a connection is established on the host
for port forwarding: if it comes from localhost, the host virtual address is
used instead.

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/bootp.c|   2 +-
 slirp/ip_icmp.c  |  19 ++
 slirp/socket.c   | 112 +--
 slirp/socket.h   |   5 +++
 slirp/tcp_subr.c |  35 -
 slirp/tftp.c |   6 +--
 slirp/udp.c  |  37 ++
 slirp/udp.h  |   3 +-
 8 files changed, 116 insertions(+), 103 deletions(-)

diff --git a/slirp/bootp.c b/slirp/bootp.c
index 1baaab1..0027279 100644
--- a/slirp/bootp.c
+++ b/slirp/bootp.c
@@ -325,7 +325,7 @@ static void bootp_reply(Slirp *slirp, const struct bootp_t *bp)
 
 m->m_len = sizeof(struct bootp_t) -
 sizeof(struct ip) - sizeof(struct udphdr);
-udp_output2(NULL, m, , , IPTOS_LOWDELAY);
+udp_output(NULL, m, , , IPTOS_LOWDELAY);
 }
 
 void bootp_input(struct mbuf *m)
diff --git a/slirp/ip_icmp.c b/slirp/ip_icmp.c
index 58b7ceb..3a29847 100644
--- a/slirp/ip_icmp.c
+++ b/slirp/ip_icmp.c
@@ -157,7 +157,7 @@ icmp_input(struct mbuf *m, int hlen)
 goto freeit;
 } else {
   struct socket *so;
-  struct sockaddr_in addr;
+  struct sockaddr_storage addr;
   if ((so = socreate(slirp)) == NULL) goto freeit;
   if (icmp_send(so, m, hlen) == 0) {
 return;
@@ -181,20 +181,9 @@ icmp_input(struct mbuf *m, int hlen)
   so->so_state = SS_ISFCONNECTED;
 
   /* Send the packet */
-  addr.sin_family = AF_INET;
-  if ((so->so_faddr.s_addr & slirp->vnetwork_mask.s_addr) ==
-  slirp->vnetwork_addr.s_addr) {
-   /* It's an alias */
-   if (so->so_faddr.s_addr == slirp->vnameserver_addr.s_addr) {
- if (get_dns_addr(_addr) < 0)
-   addr.sin_addr = loopback_addr;
-   } else {
- addr.sin_addr = loopback_addr;
-   }
-  } else {
-   addr.sin_addr = so->so_faddr;
-  }
-  addr.sin_port = so->so_fport;
+  addr = so->fhost.ss;
+  sotranslate_out(so, );
+
   if(sendto(so->s, icmp_ping_msg, strlen(icmp_ping_msg), 0,
(struct sockaddr *), sizeof(addr)) == -1) {
DEBUG_MISC((dfd,"icmp_input udp sendto tx errno = %d-%s\n",
diff --git a/slirp/socket.c b/slirp/socket.c
index bf603c9..d1034fb 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -438,6 +438,7 @@ void
 sorecvfrom(struct socket *so)
 {
struct sockaddr_storage addr;
+   struct sockaddr_storage saddr, daddr;
socklen_t addrlen = sizeof(struct sockaddr_storage);
 
DEBUG_CALL("sorecvfrom");
@@ -525,11 +526,17 @@ sorecvfrom(struct socket *so)
 
/*
 * If this packet was destined for CTL_ADDR,
-* make it look like that's where it came from, done by udp_output
+* make it look like that's where it came from
 */
+   saddr = addr;
+   sotranslate_in(so, );
+   daddr = so->lhost.ss;
+
switch (so->so_ffamily) {
case AF_INET:
-   udp_output(so, m, (struct sockaddr_in *) );
+   udp_output(so, m, (struct sockaddr_in *) ,
+  (struct sockaddr_in *) ,
+  so->so_iptos);
break;
default:
break;
@@ -544,33 +551,20 @@ sorecvfrom(struct socket *so)
 int
 sosendto(struct socket *so, struct mbuf *m)
 {
-   Slirp *slirp = so->slirp;
int ret;
-   struct sockaddr_in addr;
+   struct sockaddr_storage addr;
 
DEBUG_CALL("sosendto");
DEBUG_ARG("so = %p", so);
DEBUG_ARG("m = %p", m);
 
-addr.sin_family = AF_INET;
-   if ((so->so_faddr.s_addr & slirp->vnetwork_mask.s_addr) ==
-   slirp->vnetwork_addr.s_addr) {
- /* It's an alias */
- 

[Qemu-devel] [PULL 08/17] slirp: Make Socket structure IPv6 compatible

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

This patch replaces the foreign and local address/port pairs in the Socket
structure with two sockaddr_storage fields, which can be cast to sockaddr_in.
Direct access to address and port is still possible thanks to some
#define macros, so backward compatibility with the existing code is preserved.

The ss_family field of sockaddr_storage is set after each socket
creation.

The whole structure is also saved/restored when a QEMU session is
saved/restored.
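
As a rough sketch of the idea (not the actual slirp/socket.h change), the
layout below assumes each endpoint is a union of sockaddr_storage and
sockaddr_in. The macro names and the fhost.ss/lhost.ss members appear in
this series; struct mini_socket, the "sin" member and
mini_socket_set_foreign4() are hypothetical.

```c
#include <arpa/inet.h>
#include <netinet/in.h>
#include <sys/socket.h>

/* Hypothetical, cut-down socket; the real struct socket has more fields. */
struct mini_socket {
    union {
        struct sockaddr_storage ss; /* family-agnostic view */
        struct sockaddr_in sin;     /* IPv4 view of the same bytes */
    } fhost;                        /* foreign host */
    union {
        struct sockaddr_storage ss;
        struct sockaddr_in sin;
    } lhost;                        /* local host */
};

/* Accessors so that existing IPv4-only code keeps compiling unchanged. */
#define so_ffamily fhost.ss.ss_family
#define so_faddr   fhost.sin.sin_addr
#define so_fport   fhost.sin.sin_port
#define so_lfamily lhost.ss.ss_family
#define so_laddr   lhost.sin.sin_addr
#define so_lport   lhost.sin.sin_port

/* Fill the foreign endpoint with an IPv4 address and a network-order port. */
static void mini_socket_set_foreign4(struct mini_socket *so,
                                     struct in_addr addr, in_port_t port_be)
{
    so->so_ffamily = AF_INET;
    so->so_faddr = addr;
    so->so_fport = port_be;
}
```

Because ss_family and sin_family share the same storage, code that only
holds a sockaddr_storage pointer can check the family first and then cast
to sockaddr_in.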

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/ip_icmp.c   |  2 ++
 slirp/slirp.c | 51 ++-
 slirp/socket.c| 14 +++---
 slirp/socket.h| 19 +++
 slirp/tcp_input.c |  2 ++
 slirp/tcp_subr.c  |  2 ++
 slirp/udp.c   |  4 
 7 files changed, 78 insertions(+), 16 deletions(-)

diff --git a/slirp/ip_icmp.c b/slirp/ip_icmp.c
index 23b9f0f..58b7ceb 100644
--- a/slirp/ip_icmp.c
+++ b/slirp/ip_icmp.c
@@ -170,8 +170,10 @@ icmp_input(struct mbuf *m, int hlen)
goto end_error;
   }
   so->so_m = m;
+  so->so_ffamily = AF_INET;
   so->so_faddr = ip->ip_dst;
   so->so_fport = htons(7);
+  so->so_lfamily = AF_INET;
   so->so_laddr = ip->ip_src;
   so->so_lport = htons(9);
   so->so_iptos = ip->ip_tos;
diff --git a/slirp/slirp.c b/slirp/slirp.c
index f8dc505..b900775 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -23,6 +23,7 @@
  */
 #include "qemu-common.h"
 #include "qemu/timer.h"
+#include "qemu/error-report.h"
 #include "sysemu/char.h"
 #include "slirp.h"
 #include "hw/hw.h"
@@ -234,7 +235,7 @@ Slirp *slirp_init(int restricted, struct in_addr vnetwork,
 
 slirp->opaque = opaque;
 
-register_savevm(NULL, "slirp", 0, 3,
+register_savevm(NULL, "slirp", 0, 4,
 slirp_state_save, slirp_state_load, slirp);
 
 QTAILQ_INSERT_TAIL(_instances, slirp, entry);
@@ -1046,10 +1047,26 @@ static void slirp_sbuf_save(QEMUFile *f, struct sbuf *sbuf)
 static void slirp_socket_save(QEMUFile *f, struct socket *so)
 {
 qemu_put_be32(f, so->so_urgc);
-qemu_put_be32(f, so->so_faddr.s_addr);
-qemu_put_be32(f, so->so_laddr.s_addr);
-qemu_put_be16(f, so->so_fport);
-qemu_put_be16(f, so->so_lport);
+qemu_put_be16(f, so->so_ffamily);
+switch (so->so_ffamily) {
+case AF_INET:
+qemu_put_be32(f, so->so_faddr.s_addr);
+qemu_put_be16(f, so->so_fport);
+break;
+default:
+error_report(
+"so_ffamily unknown, unable to save so_faddr and so_fport\n");
+}
+qemu_put_be16(f, so->so_lfamily);
+switch (so->so_lfamily) {
+case AF_INET:
+qemu_put_be32(f, so->so_laddr.s_addr);
+qemu_put_be16(f, so->so_lport);
+break;
+default:
+error_report(
+"so_ffamily unknown, unable to save so_laddr and so_lport\n");
+}
 qemu_put_byte(f, so->so_iptos);
 qemu_put_byte(f, so->so_emu);
 qemu_put_byte(f, so->so_type);
@@ -1169,10 +1186,26 @@ static int slirp_socket_load(QEMUFile *f, struct socket *so)
 return -ENOMEM;
 
 so->so_urgc = qemu_get_be32(f);
-so->so_faddr.s_addr = qemu_get_be32(f);
-so->so_laddr.s_addr = qemu_get_be32(f);
-so->so_fport = qemu_get_be16(f);
-so->so_lport = qemu_get_be16(f);
+so->so_ffamily = qemu_get_be16(f);
+switch (so->so_ffamily) {
+case AF_INET:
+so->so_faddr.s_addr = qemu_get_be32(f);
+so->so_fport = qemu_get_be16(f);
+break;
+default:
+error_report(
+"so_ffamily unknown, unable to restore so_faddr and so_lport\n");
+}
+so->so_lfamily = qemu_get_be16(f);
+switch (so->so_lfamily) {
+case AF_INET:
+so->so_laddr.s_addr = qemu_get_be32(f);
+so->so_lport = qemu_get_be16(f);
+break;
+default:
+error_report(
+"so_ffamily unknown, unable to restore so_laddr and so_lport\n");
+}
 so->so_iptos = qemu_get_byte(f);
 so->so_emu = qemu_get_byte(f);
 so->so_type = qemu_get_byte(f);
diff --git a/slirp/socket.c b/slirp/socket.c
index 1673e3a..bf603c9 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -437,8 +437,8 @@ sowrite(struct socket *so)
 void
 sorecvfrom(struct socket *so)
 {
-   struct sockaddr_in addr;
-   socklen_t addrlen = sizeof(struct sockaddr_in);
+   struct sockaddr_storage addr;
+   socklen_t addrlen = sizeof(struct sockaddr_storage);
 
DEBUG_CALL("sorecvfrom");
DEBUG_ARG("so = %p", so);
@@ -527,7 +527,13 @@ sorecvfrom(struct socket *so)
 * If this packet was destined for CTL_ADDR,
 * make it look like that's where it came from, done by udp_output
 */
-   udp_output(so, m, );
+   

[Qemu-devel] [PULL 15/17] net: netmap: use nm_open() to open netmap ports

2016-02-01 Thread Jason Wang
From: Vincenzo Maffione 

This patch simplifies the netmap backend code by means of the nm_open()
helper function provided by netmap_user.h, which hides the details of
open(), ioctl() and mmap() carried out on the netmap device.

Moreover, the semantics of nm_open() make it possible to open special
netmap ports (e.g. pipes, monitors) and use special modes (e.g. host rings
only, single queue mode, exclusive access).

Signed-off-by: Vincenzo Maffione 
Signed-off-by: Jason Wang 
---
 net/netmap.c | 97 
 1 file changed, 32 insertions(+), 65 deletions(-)

diff --git a/net/netmap.c b/net/netmap.c
index 5558368..27295ab 100644
--- a/net/netmap.c
+++ b/net/netmap.c
@@ -39,21 +39,12 @@
 #include "qemu/error-report.h"
 #include "qemu/iov.h"
 
-/* Private netmap device info. */
-typedef struct NetmapPriv {
-int fd;
-size_t  memsize;
-void*mem;
-struct netmap_if*nifp;
-struct netmap_ring  *rx;
-struct netmap_ring  *tx;
-charfdname[PATH_MAX];/* Normally "/dev/netmap". */
-charifname[IFNAMSIZ];
-} NetmapPriv;
-
 typedef struct NetmapState {
 NetClientState  nc;
-NetmapPriv  me;
+struct nm_desc  *nmd;
+charifname[IFNAMSIZ];
+struct netmap_ring  *tx;
+struct netmap_ring  *rx;
 boolread_poll;
 boolwrite_poll;
 struct ioveciov[IOV_MAX];
@@ -90,44 +81,23 @@ pkt_copy(const void *_src, void *_dst, int l)
  * Open a netmap device. We assume there is only one queue
  * (which is the case for the VALE bridge).
  */
-static void netmap_open(NetmapPriv *me, Error **errp)
+static struct nm_desc *netmap_open(const NetdevNetmapOptions *nm_opts,
+   Error **errp)
 {
-int fd;
-int err;
-size_t l;
+struct nm_desc *nmd;
 struct nmreq req;
 
-me->fd = fd = open(me->fdname, O_RDWR);
-if (fd < 0) {
-error_setg_file_open(errp, errno, me->fdname);
-return;
-}
 memset(, 0, sizeof(req));
-pstrcpy(req.nr_name, sizeof(req.nr_name), me->ifname);
-req.nr_ringid = NETMAP_NO_TX_POLL;
-req.nr_version = NETMAP_API;
-err = ioctl(fd, NIOCREGIF, );
-if (err) {
-error_setg_errno(errp, errno, "Unable to register %s", me->ifname);
-goto error;
-}
-l = me->memsize = req.nr_memsize;
 
-me->mem = mmap(0, l, PROT_WRITE | PROT_READ, MAP_SHARED, fd, 0);
-if (me->mem == MAP_FAILED) {
-error_setg_errno(errp, errno, "Unable to mmap netmap shared memory");
-me->mem = NULL;
-goto error;
+nmd = nm_open(nm_opts->ifname, , NETMAP_NO_TX_POLL,
+  NULL);
+if (nmd == NULL) {
+error_setg_errno(errp, errno, "Failed to nm_open() %s",
+ nm_opts->ifname);
+return NULL;
 }
 
-me->nifp = NETMAP_IF(me->mem, req.nr_offset);
-me->tx = NETMAP_TXRING(me->nifp, 0);
-me->rx = NETMAP_RXRING(me->nifp, 0);
-
-return;
-
-error:
-close(me->fd);
+return nmd;
 }
 
 static void netmap_send(void *opaque);
@@ -136,7 +106,7 @@ static void netmap_writable(void *opaque);
 /* Set the event-loop handlers for the netmap backend. */
 static void netmap_update_fd_handler(NetmapState *s)
 {
-qemu_set_fd_handler(s->me.fd,
+qemu_set_fd_handler(s->nmd->fd,
 s->read_poll ? netmap_send : NULL,
 s->write_poll ? netmap_writable : NULL,
 s);
@@ -188,7 +158,7 @@ static ssize_t netmap_receive(NetClientState *nc,
   const uint8_t *buf, size_t size)
 {
 NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
-struct netmap_ring *ring = s->me.tx;
+struct netmap_ring *ring = s->tx;
 uint32_t i;
 uint32_t idx;
 uint8_t *dst;
@@ -218,7 +188,7 @@ static ssize_t netmap_receive(NetClientState *nc,
 ring->slot[i].flags = 0;
 pkt_copy(buf, dst, size);
 ring->cur = ring->head = nm_ring_next(ring, i);
-ioctl(s->me.fd, NIOCTXSYNC, NULL);
+ioctl(s->nmd->fd, NIOCTXSYNC, NULL);
 
 return size;
 }
@@ -227,7 +197,7 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
 const struct iovec *iov, int iovcnt)
 {
 NetmapState *s = DO_UPCAST(NetmapState, nc, nc);
-struct netmap_ring *ring = s->me.tx;
+struct netmap_ring *ring = s->tx;
 uint32_t last;
 uint32_t idx;
 uint8_t *dst;
@@ -284,7 +254,7 @@ static ssize_t netmap_receive_iov(NetClientState *nc,
 /* Now update ring->cur and ring->head. */
 ring->cur = ring->head = i;
 
-ioctl(s->me.fd, NIOCTXSYNC, NULL);
+ioctl(s->nmd->fd, NIOCTXSYNC, NULL);
 
 return iov_size(iov, iovcnt);
 }
@@ -301,7 +271,7 @@ static void netmap_send_completed(NetClientState *nc, ssize_t len)
 static void netmap_send(void *opaque)
 {

[Qemu-devel] [PULL 12/17] slirp: Make udp_attach IPv6 compatible

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

A sa_family_t is now passed as an argument to udp_attach() instead of
using a hardcoded "AF_INET" in the call to qemu_socket().

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/ip_icmp.c | 2 +-
 slirp/udp.c | 7 ---
 slirp/udp.h | 2 +-
 3 files changed, 6 insertions(+), 5 deletions(-)

diff --git a/slirp/ip_icmp.c b/slirp/ip_icmp.c
index 3a29847..592f33a 100644
--- a/slirp/ip_icmp.c
+++ b/slirp/ip_icmp.c
@@ -162,7 +162,7 @@ icmp_input(struct mbuf *m, int hlen)
   if (icmp_send(so, m, hlen) == 0) {
 return;
   }
-  if(udp_attach(so) == -1) {
+  if (udp_attach(so, AF_INET) == -1) {
DEBUG_MISC((dfd,"icmp_input udp_attach errno = %d-%s\n",
errno,strerror(errno)));
sofree(so);
diff --git a/slirp/udp.c b/slirp/udp.c
index 63776c0..94e19b7 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c
@@ -169,7 +169,7 @@ udp_input(register struct mbuf *m, int iphlen)
  if (!so) {
  goto bad;
  }
- if(udp_attach(so) == -1) {
+ if (udp_attach(so, AF_INET) == -1) {
DEBUG_MISC((dfd," udp_attach errno = %d-%s\n",
errno,strerror(errno)));
sofree(so);
@@ -277,9 +277,10 @@ int udp_output(struct socket *so, struct mbuf *m,
 }
 
 int
-udp_attach(struct socket *so)
+udp_attach(struct socket *so, sa_family_t af)
 {
-  if((so->s = qemu_socket(AF_INET,SOCK_DGRAM,0)) != -1) {
+  so->s = qemu_socket(af, SOCK_DGRAM, 0);
+  if (so->s != -1) {
 so->so_expire = curtime + SO_EXPIRE;
 insque(so, >slirp->udb);
   }
diff --git a/slirp/udp.h b/slirp/udp.h
index a04b8ce..15e73c1 100644
--- a/slirp/udp.h
+++ b/slirp/udp.h
@@ -76,7 +76,7 @@ struct mbuf;
 void udp_init(Slirp *);
 void udp_cleanup(Slirp *);
 void udp_input(register struct mbuf *, int);
-int udp_attach(struct socket *);
+int udp_attach(struct socket *, sa_family_t af);
 void udp_detach(struct socket *);
 struct socket * udp_listen(Slirp *, uint32_t, u_int, uint32_t, u_int,
int);
-- 
2.5.0




[Qemu-devel] [PULL 11/17] slirp: Add sockaddr_equal, make solookup family-agnostic

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

This patch makes solookup() compatible with varying address
families, by using a new sockaddr_equal() function that compares
two sockaddr_storage.

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/socket.c| 21 ++---
 slirp/socket.h| 22 +-
 slirp/tcp_input.c | 23 ++-
 slirp/udp.c   | 10 --
 4 files changed, 49 insertions(+), 27 deletions(-)

diff --git a/slirp/socket.c b/slirp/socket.c
index 8f73e90..f7e5968 100644
--- a/slirp/socket.c
+++ b/slirp/socket.c
@@ -15,29 +15,20 @@
 static void sofcantrcvmore(struct socket *so);
 static void sofcantsendmore(struct socket *so);
 
-struct socket *
-solookup(struct socket **last, struct socket *head,
- struct in_addr laddr, u_int lport,
- struct in_addr faddr, u_int fport)
+struct socket *solookup(struct socket **last, struct socket *head,
+struct sockaddr_storage *lhost, struct sockaddr_storage *fhost)
 {
 struct socket *so = *last;
 
 /* Optimisation */
-if (so != head &&
-so->so_lport == lport &&
-so->so_laddr.s_addr == laddr.s_addr &&
-(!faddr.s_addr ||
-(so->so_faddr.s_addr == faddr.s_addr &&
- so->so_fport == fport))) {
+if (so != head && sockaddr_equal(&(so->lhost.ss), lhost)
+&& (!fhost || sockaddr_equal(>fhost.ss, fhost))) {
 return so;
 }
 
 for (so = head->so_next; so != head; so = so->so_next) {
-if (so->so_lport == lport &&
-so->so_laddr.s_addr == laddr.s_addr &&
-(!faddr.s_addr ||
-(so->so_faddr.s_addr == faddr.s_addr &&
- so->so_fport == fport))) {
+if (sockaddr_equal(&(so->lhost.ss), lhost)
+&& (!fhost || sockaddr_equal(>fhost.ss, fhost))) {
 *last = so;
 return so;
 }
diff --git a/slirp/socket.h b/slirp/socket.h
index 1c8c24c..929af0a 100644
--- a/slirp/socket.h
+++ b/slirp/socket.h
@@ -87,8 +87,28 @@ struct socket {
 #define SS_HOSTFWD 0x1000  /* Socket describes host->guest forwarding */
 #define SS_INCOMING 0x2000  /* Connection was initiated by a host on the internet */
 
+static inline int sockaddr_equal(struct sockaddr_storage *a,
+struct sockaddr_storage *b)
+{
+if (a->ss_family != b->ss_family) {
+return 0;
+}
+
+switch (a->ss_family) {
+case AF_INET:
+{
+struct sockaddr_in *a4 = (struct sockaddr_in *) a;
+struct sockaddr_in *b4 = (struct sockaddr_in *) b;
+return a4->sin_addr.s_addr == b4->sin_addr.s_addr
+   && a4->sin_port == b4->sin_port;
+}
+default:
+assert(0);
+}
+}
+
 struct socket *solookup(struct socket **, struct socket *,
-struct in_addr, u_int, struct in_addr, u_int);
+struct sockaddr_storage *, struct sockaddr_storage *);
 struct socket *socreate(Slirp *);
 void sofree(struct socket *);
 int soread(struct socket *);
diff --git a/slirp/tcp_input.c b/slirp/tcp_input.c
index 5492061..5e2773c 100644
--- a/slirp/tcp_input.c
+++ b/slirp/tcp_input.c
@@ -227,6 +227,8 @@ tcp_input(struct mbuf *m, int iphlen, struct socket *inso)
int iss = 0;
u_long tiwin;
int ret;
+   struct sockaddr_storage lhost, fhost;
+   struct sockaddr_in *lhost4, *fhost4;
 struct ex_list *ex_ptr;
 Slirp *slirp;
 
@@ -320,9 +322,16 @@ tcp_input(struct mbuf *m, int iphlen, struct socket *inso)
 * Locate pcb for segment.
 */
 findso:
-   so = solookup(>tcp_last_so, >tcb,
- ti->ti_src, ti->ti_sport,
- ti->ti_dst, ti->ti_dport);
+   lhost.ss_family = AF_INET;
+   lhost4 = (struct sockaddr_in *) 
+   lhost4->sin_addr = ti->ti_src;
+   lhost4->sin_port = ti->ti_sport;
+   fhost.ss_family = AF_INET;
+   fhost4 = (struct sockaddr_in *) 
+   fhost4->sin_addr = ti->ti_dst;
+   fhost4->sin_port = ti->ti_dport;
+
+   so = solookup(>tcp_last_so, >tcb, , );
 
/*
 * If the state is CLOSED (i.e., TCB does not exist) then
@@ -367,12 +376,8 @@ findso:
  sbreserve(>so_snd, TCP_SNDSPACE);
  sbreserve(>so_rcv, TCP_RCVSPACE);
 
- so->so_lfamily = AF_INET;
- so->so_laddr = ti->ti_src;
- so->so_lport = ti->ti_sport;
- so->so_ffamily = AF_INET;
- so->so_faddr = ti->ti_dst;
- so->so_fport = ti->ti_dport;
+ so->lhost.ss = lhost;
+ so->fhost.ss = fhost;
 
  if ((so->so_iptos = tcp_tos(so)) == 0)
so->so_iptos = ((struct ip *)ti)->ip_tos;
diff --git a/slirp/udp.c b/slirp/udp.c
index 126ef82..63776c0 100644
--- a/slirp/udp.c
+++ b/slirp/udp.c

Re: [Qemu-devel] [PATCH v10 0/2] mirror: Improve zero write and discard

2016-02-01 Thread Paolo Bonzini


On 22/01/2016 08:46, Fam Zheng wrote:
> On Wed, 01/13 10:50, Fam Zheng wrote:
>> > v10: Fix and simplify mirror_cow_align. [Max]
> Jeff, are you happy to take these patches?

Ping again.  I have patches waiting for these to be accepted too.

Paolo



Re: [Qemu-devel] [PATCH] jobs: remove unused structure

2016-02-01 Thread Fam Zheng
On Mon, 02/01 17:18, John Snow wrote:
> Signed-off-by: John Snow 
> ---
>  blockjob.c | 8 
>  1 file changed, 8 deletions(-)
> 
> diff --git a/blockjob.c b/blockjob.c
> index 80adb9d..a692142 100644
> --- a/blockjob.c
> +++ b/blockjob.c
> @@ -278,14 +278,6 @@ void block_job_iostatus_reset(BlockJob *job)
>  }
>  }
>  
> -struct BlockFinishData {
> -BlockJob *job;
> -BlockCompletionFunc *cb;
> -void *opaque;
> -bool cancelled;
> -int ret;
> -};
> -

Duplicates this:

https://lists.gnu.org/archive/html/qemu-devel/2016-01/msg05529.html

Fam

>  static int block_job_finish_sync(BlockJob *job,
>   void (*finish)(BlockJob *, Error **errp),
>   Error **errp)
> -- 
> 2.4.3
> 
> 



[Qemu-devel] [PULL 00/17] Net patches

2016-02-01 Thread Jason Wang
The following changes since commit 0430891ce162b986c6e02a7729a942ecd2a32ca4:

  hw: Clean up includes (2016-01-29 15:07:25 +)

are available in the git repository at:

  https://github.com/jasowang/qemu.git tags/net-pull-request

for you to fetch changes up to e8a7a1a574ed6728422959c8aa79ca584cdd1d4d:

  net/filter: Fix the output information for command 'info network' (2016-02-02 10:21:28 +0800)



Major changes for net:

- preparation for ipv6 support in slirp
- fix tx infinite loops of e1000
- fix cadence_gem buffer overflow
- netfilters are now walked in reverse for egress traffic, for future complex netfilter setups


Guillaume Subiron (9):
  slirp: goto bad in udp_input if sosendto fails
  slirp: Generalizing and neutralizing ARP code
  slirp: Adding address family switch for produced frames
  slirp: Make Socket structure IPv6 compatible
  slirp: Factorizing address translation
  slirp: Factorizing and cleaning solookup()
  slirp: Add sockaddr_equal, make solookup family-agnostic
  slirp: Make udp_attach IPv6 compatible
  slirp: Adding family argument to tcp_fconnect()

Laszlo Ersek (1):
  e1000: eliminate infinite loops on out-of-bounds transfer start

Li Zhijian (1):
  net: always walk through filters in reverse if traffic is egress

Michael S. Tsirkin (1):
  cadence_gem: fix buffer overflow

Prasad J Pandit (1):
  net: cadence_gem: check packet size in gem_recieve

Thomas Huth (2):
  net/slirp: Tell the users when they are using deprecated options
  qemu-doc: Do not promote deprecated -smb and -redir options

Vincenzo Maffione (1):
  net: netmap: use nm_open() to open netmap ports

zhanghailiang (1):
  net/filter: Fix the output information for command 'info network'

 hw/net/cadence_gem.c |  12 
 hw/net/e1000.c   |   6 +-
 include/net/filter.h |   1 -
 include/net/net.h|   2 +-
 net/filter.c |  43 +++---
 net/net.c|  52 ++---
 net/netmap.c |  97 +++
 net/slirp.c  |   3 +
 os-posix.c   |   3 +
 qemu-doc.texi|   9 +--
 slirp/bootp.c|   2 +-
 slirp/ip_icmp.c  |  23 +++-
 slirp/mbuf.c |   2 +-
 slirp/mbuf.h |   2 +-
 slirp/slirp.c| 116 +
 slirp/slirp.h|   2 +-
 slirp/socket.c   | 158 ++-
 slirp/socket.h   |  49 ++--
 slirp/tcp_input.c|  30 +-
 slirp/tcp_subr.c |  40 -
 slirp/tftp.c |   6 +-
 slirp/udp.c  |  74 +++-
 slirp/udp.h  |   5 +-
 vl.c |   6 ++
 24 files changed, 445 insertions(+), 298 deletions(-)




[Qemu-devel] [PULL 06/17] slirp: Generalizing and neutralizing ARP code

2016-02-01 Thread Jason Wang
From: Guillaume Subiron 

Basically, this patch replaces "arp" by "resolution" every time "arp"
means "mac resolution" and not specifically ARP.

This prepares for IPv6 support.

Signed-off-by: Guillaume Subiron 
Signed-off-by: Samuel Thibault 
Reviewed-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 slirp/mbuf.c  | 2 +-
 slirp/mbuf.h  | 2 +-
 slirp/slirp.c | 4 ++--
 3 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/slirp/mbuf.c b/slirp/mbuf.c
index 795fc29..bc942b6 100644
--- a/slirp/mbuf.c
+++ b/slirp/mbuf.c
@@ -91,7 +91,7 @@ m_get(Slirp *slirp)
m->m_len = 0;
 m->m_nextpkt = NULL;
 m->m_prevpkt = NULL;
-m->arp_requested = false;
+m->resolution_requested = false;
 m->expiration_date = (uint64_t)-1;
 end_error:
DEBUG_ARG("m = %p", m);
diff --git a/slirp/mbuf.h b/slirp/mbuf.h
index b144f1c..38fedf4 100644
--- a/slirp/mbuf.h
+++ b/slirp/mbuf.h
@@ -79,7 +79,7 @@ struct mbuf {
int m_len;  /* Amount of data in this mbuf */
 
Slirp *slirp;
-   boolarp_requested;
+   boolresolution_requested;
uint64_t expiration_date;
/* start of dynamic buffer area, must be last element */
union {
diff --git a/slirp/slirp.c b/slirp/slirp.c
index 35f819a..1d5d172 100644
--- a/slirp/slirp.c
+++ b/slirp/slirp.c
@@ -786,7 +786,7 @@ int if_encap(Slirp *slirp, struct mbuf *ifm)
 struct ethhdr *reh = (struct ethhdr *)arp_req;
 struct arphdr *rah = (struct arphdr *)(arp_req + ETH_HLEN);
 
-if (!ifm->arp_requested) {
+if (!ifm->resolution_requested) {
 /* If the client addr is not known, send an ARP request */
 memset(reh->h_dest, 0xff, ETH_ALEN);
 memcpy(reh->h_source, special_ethaddr, ETH_ALEN - 4);
@@ -812,7 +812,7 @@ int if_encap(Slirp *slirp, struct mbuf *ifm)
 rah->ar_tip = iph->ip_dst.s_addr;
 slirp->client_ipaddr = iph->ip_dst;
 slirp_output(slirp->opaque, arp_req, sizeof(arp_req));
-ifm->arp_requested = true;
+ifm->resolution_requested = true;
 
 /* Expire request and drop outgoing packet after 1 second */
 ifm->expiration_date = qemu_clock_get_ns(QEMU_CLOCK_REALTIME) + 
10ULL;
-- 
2.5.0




[Qemu-devel] [PULL 02/17] qemu-doc: Do not promote deprecated -smb and -redir options

2016-02-01 Thread Jason Wang
From: Thomas Huth 

Since -smb and -redir are deprecated options, we should not
use them as examples in the documentation anymore.

Signed-off-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 qemu-doc.texi | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/qemu-doc.texi b/qemu-doc.texi
index ca4d9de..212aba3 100644
--- a/qemu-doc.texi
+++ b/qemu-doc.texi
@@ -1237,9 +1237,9 @@ echo 100 100 > /proc/sys/net/ipv4/ping_group_range
 When using the built-in TFTP server, the router is also the TFTP
 server.
 
-When using the @option{-redir} option, TCP or UDP connections can be
-redirected from the host to the guest. It allows for example to
-redirect X11, telnet or SSH connections.
+When using the @option{'-netdev user,hostfwd=...'} option, TCP or UDP
+connections can be redirected from the host to the guest. It allows for
+example to redirect X11, telnet or SSH connections.
 
 @subsection Connecting VLANs between QEMU instances
 
@@ -1889,7 +1889,8 @@ correctly instructs QEMU to shutdown at the appropriate moment.
 
 @subsubsection Share a directory between Unix and Windows
 
-See @ref{sec_invocation} about the help of the option @option{-smb}.
+See @ref{sec_invocation} about the help of the option
+@option{'-netdev user,smb=...'}.
 
 @subsubsection Windows XP security problem
 
-- 
2.5.0




[Qemu-devel] [PULL 01/17] net/slirp: Tell the users when they are using deprecated options

2016-02-01 Thread Jason Wang
From: Thomas Huth 

We don't want to support the legacy -tftp, -bootp, -smb and
-net channel options forever. So let's start telling the users
that they are deprecated and what option should be used instead.

Signed-off-by: Thomas Huth 
Signed-off-by: Jason Wang 
---
 net/slirp.c | 3 +++
 os-posix.c  | 3 +++
 vl.c| 6 ++
 3 files changed, 12 insertions(+)

diff --git a/net/slirp.c b/net/slirp.c
index f505570..eac4fc2 100644
--- a/net/slirp.c
+++ b/net/slirp.c
@@ -784,6 +784,9 @@ int net_slirp_parse_legacy(QemuOptsList *opts_list, const char *optarg, int *ret
 return 0;
 }
 
+error_report("The '-net channel' option is deprecated. "
+ "Please use '-netdev user,guestfwd=...' instead.");
+
 /* handle legacy -net channel,port:chr */
 optarg += strlen("channel,");
 
diff --git a/os-posix.c b/os-posix.c
index e4da406..87e2a16 100644
--- a/os-posix.c
+++ b/os-posix.c
@@ -40,6 +40,7 @@
 #include "net/slirp.h"
 #include "qemu-options.h"
 #include "qemu/rcu.h"
+#include "qemu/error-report.h"
 
 #ifdef CONFIG_LINUX
 #include 
@@ -139,6 +140,8 @@ void os_parse_cmd_args(int index, const char *optarg)
 switch (index) {
 #ifdef CONFIG_SLIRP
 case QEMU_OPTION_smb:
+error_report("The -smb option is deprecated. "
+ "Please use '-netdev user,smb=...' instead.");
 if (net_slirp_smb(optarg) < 0)
 exit(1);
 break;
diff --git a/vl.c b/vl.c
index f043009..a12eabe 100644
--- a/vl.c
+++ b/vl.c
@@ -3308,12 +3308,18 @@ int main(int argc, char **argv, char **envp)
 #endif
 #ifdef CONFIG_SLIRP
 case QEMU_OPTION_tftp:
+error_report("The -tftp option is deprecated. "
+ "Please use '-netdev user,tftp=...' instead.");
 legacy_tftp_prefix = optarg;
 break;
 case QEMU_OPTION_bootp:
+error_report("The -bootp option is deprecated. "
+ "Please use '-netdev user,bootfile=...' instead.");
 legacy_bootp_filename = optarg;
 break;
 case QEMU_OPTION_redir:
+error_report("The -redir option is deprecated. "
+ "Please use '-netdev user,hostfwd=...' instead.");
 if (net_slirp_redir(optarg) < 0)
 exit(1);
 break;
-- 
2.5.0




[Qemu-devel] [PULL 03/17] net: cadence_gem: check packet size in gem_recieve

2016-02-01 Thread Jason Wang
From: Prasad J Pandit 

While receiving packets in the 'gem_receive' routine, if Frame Check
Sequence (FCS) is enabled, it copies the packet into a local
buffer without checking its size. Add a check to validate the packet
length against the buffer size to avoid a buffer overflow.
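
A minimal sketch of this kind of length check, for illustration only. It
is not the cadence_gem code: copy_frame_with_fcs_room() and its
parameters are hypothetical, and the FCS is assumed to occupy four bytes
appended after the frame.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Copy an incoming frame into rxbuf, leaving room for a 4-byte FCS that
 * will be appended afterwards. Oversized frames are truncated rather
 * than allowed to overflow the buffer. Returns the number of bytes copied.
 */
static size_t copy_frame_with_fcs_room(uint8_t *rxbuf, size_t rxbuf_size,
                                       const uint8_t *buf, size_t size)
{
    const size_t fcs_len = sizeof(uint32_t);

    if (rxbuf_size < fcs_len) {
        return 0; /* no room even for the FCS itself */
    }
    if (size > rxbuf_size - fcs_len) {
        size = rxbuf_size - fcs_len; /* clamp to the usable space */
    }
    memcpy(rxbuf, buf, size);
    return size;
}
```

Clamping the length before the copy matches the approach taken in the
diff below; rejecting the frame outright would be an alternative design.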

Reported-by: Ling Liu 
Signed-off-by: Prasad J Pandit 
Signed-off-by: Jason Wang 
---
 hw/net/cadence_gem.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/net/cadence_gem.c b/hw/net/cadence_gem.c
index f9e4091..e513d9d 100644
--- a/hw/net/cadence_gem.c
+++ b/hw/net/cadence_gem.c
@@ -678,6 +678,10 @@ static ssize_t gem_receive(NetClientState *nc, const uint8_t *buf, size_t size)
 } else {
 unsigned crc_val;
 
+if (size > sizeof(rxbuf) - sizeof(crc_val)) {
+size = sizeof(rxbuf) - sizeof(crc_val);
+}
+bytes_to_copy = size;
 /* The application wants the FCS field, which QEMU does not provide.
  * We must try and calculate one.
  */
-- 
2.5.0




Re: [Qemu-devel] [RFC Patch v2 06/10] virtio-net rsc: IPv4 checksum

2016-02-01 Thread Wei Xu



On 02/01/2016 02:31 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

If a field in the IPv4 header is modified, then the checksum
has to be recalculated before sending the packet out.
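
For reference, a self-contained sketch of the recalculation, written
against a raw header byte array rather than QEMU's net_checksum_*
helpers. The function names are hypothetical; the algorithm is the
standard ones' complement Internet checksum.

```c
#include <stddef.h>
#include <stdint.h>

/* Ones' complement sum over the header, read as big-endian 16-bit words. */
static uint16_t ipv4_header_checksum(const uint8_t *hdr, size_t len)
{
    uint32_t sum = 0;
    size_t i;

    for (i = 0; i + 1 < len; i += 2) {
        sum += ((uint32_t)hdr[i] << 8) | hdr[i + 1];
    }
    if (len & 1) {
        sum += (uint32_t)hdr[len - 1] << 8; /* pad a trailing odd byte */
    }
    while (sum >> 16) {
        sum = (sum & 0xffff) + (sum >> 16); /* fold the carries back in */
    }
    return (uint16_t)~sum;
}

/* Zero the checksum field (bytes 10 and 11), recompute it, store it back. */
static void ipv4_refresh_checksum(uint8_t *hdr, size_t len)
{
    uint16_t csum;

    hdr[10] = 0;
    hdr[11] = 0;
    csum = ipv4_header_checksum(hdr, len);
    hdr[10] = (uint8_t)(csum >> 8);
    hdr[11] = (uint8_t)(csum & 0xff);
}
```

A header with a correct checksum sums to zero when the same function is
run over it again, which gives a cheap way to verify the result.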

This in fact breaks bisection. I think you need to either squash this into
the previous patch or introduce virtio_net_rsc_ipv4_checksum() as a helper
before the patch for ipv4 coalescing.

OK.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 19 +++
  1 file changed, 19 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 93df0d5..88fc4f8 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1630,6 +1630,18 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
  return 0;
  }
  
+static void virtio_net_rsc_ipv4_checksum(NetRscSeg *seg)

+{
+uint32_t sum;
+struct ip_header *ip;
+
+ip = (struct ip_header *)(seg->buf + IP_OFFSET);
+
+ip->ip_sum = 0;
+sum = net_checksum_add_cont(sizeof(struct ip_header), (uint8_t *)ip, 0);
+ip->ip_sum = cpu_to_be16(net_checksum_finish(sum));
+}
+
  static void virtio_net_rsc_purge(void *opq)
  {
  int ret = 0;
@@ -1643,6 +1655,10 @@ static void virtio_net_rsc_purge(void *opq)
  continue;
  }
  
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {

+virtio_net_rsc_ipv4_checksum(seg);
+}
+
  ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
  QTAILQ_REMOVE(>buffers, seg, next);
  g_free(seg->buf);
@@ -1853,6 +1869,9 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
  QTAILQ_FOREACH_SAFE(seg, >buffers, next, nseg) {
  ret = coalesce(chain, seg, buf, size);
  if (RSC_FINAL == ret) {
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
  ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
  QTAILQ_REMOVE(>buffers, seg, next);
  g_free(seg->buf);







Re: [Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-02-01 Thread Wei Xu

On 02/01/2016 02:28 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

The timer will only be triggered if the packet pool is not empty,
and it will drain all the cached packets. This reduces the
delay seen by the upper layer protocol stack.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 38 ++
  1 file changed, 38 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4f77fbe..93df0d5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,12 +48,17 @@
  
  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
  
+/* Purge coalesced packets timer interval */

+#define RSC_TIMER_INTERVAL  50

Any hints for choosing this as default value? Do we need a property for
user to change this?
This is still being evaluated; 300ms-500ms is a good value for the
test. This should be configurable.

+
  /* Global statistics */
  static uint32_t rsc_chain_no_mem;
  
  /* Switcher to enable/disable rsc */

  static bool virtio_net_rsc_bypass;
  
+static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;

+
  /* Coalesce callback for ipv4/6 */
  typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
   const uint8_t *buf, size_t size);
@@ -1625,6 +1630,35 @@ static int virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
  return 0;
  }
  
+static void virtio_net_rsc_purge(void *opq)

+{
+int ret = 0;
+NetRscChain *chain = (NetRscChain *)opq;
+NetRscSeg *seg, *rn;
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
+if (!qemu_can_send_packet(seg->nc)) {
+/* Should quit or continue? not sure if one or some
+* of the queues fail would happen, try continue here */

This looks wrong, qemu_can_send_packet() is used for nc's peer not nc
itself.

OK.



+continue;
+}
+
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+if (ret == 0) {
+/* Try next queue */

Try next seg?

Yes, it's seg.



+continue;
+}

Why need above?
Yes, it's optional; it might help if extra code is added after this.
I will remove it.



+}
+
+if (!QTAILQ_EMPTY(&chain->buffers)) {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);

Need stop/start the timer during vm stop/start to save cpu.

Thanks, do you know where I should add the code?



+}
+}
  
  static void virtio_net_rsc_cleanup(VirtIONet *n)

  {
@@ -1810,6 +1844,8 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, 
NetClientState *nc,
  if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
  return 0;
  } else {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + rsc_timeout);
  return size;
  }
  }
@@ -1877,6 +1913,8 @@ static NetRscChain 
*virtio_net_rsc_lookup_chain(NetClientState *nc,
  }
  
  chain->proto = proto;

+chain->drain_timer = timer_new_ns(QEMU_CLOCK_VIRTUAL,
+  virtio_net_rsc_purge, chain);
  chain->do_receive = virtio_net_rsc_receive4;
  
  QTAILQ_INIT(&chain->buffers);







Re: [Qemu-devel] [RFC Patch v2 08/10] virtio-net rsc: Sanity check & More bypass cases check

2016-02-01 Thread Wei Xu

On 02/01/2016 02:58 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

More general exception cases check
1. Incorrect version in IP header
2. IP options & IP fragment
3. Not a TCP packet
4. Sanity size check to prevent buffer overflow attack.

Signed-off-by: Wei Xu 

Let's squash this into previous patches too for a better bisection
ability and complete implementation.

ok.



---
  hw/net/virtio-net.c | 44 
  1 file changed, 44 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index b0987d0..9b44762 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1948,6 +1948,46 @@ static size_t virtio_net_rsc_drain_one(NetRscChain 
*chain, NetClientState *nc,
  
  return virtio_net_do_receive(nc, buf, size);

  }
+
+static int32_t virtio_net_rsc_filter4(NetRscChain *chain, struct ip_header *ip,
+  const uint8_t *buf, size_t size)

This function checks for ip header, so need rename it to something like
"virtio_net_rsc_ipv4_filter()"

OK.



+{
+uint16_t ip_len;
+
+if (size < (TCP4_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+/* Not an ipv4 one */
+if (0x4 != ((0xF0 & ip->ip_ver_len) >> 4)) {

Let's don't use magic value like 0x4 here.

OK.
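
One possible shape for replacing the magic values, as a minimal sketch only. The macro and helper names below (IP4_VERSION, IP4_MIN_HDR_WORDS, ip4_version, ip4_hdr_words, ip4_is_plain_header) are made up for illustration and are not taken from the patch or from QEMU; the sketch only assumes the standard IPv4 layout where the first header byte holds the version in the upper nibble and the header length (in 32-bit words) in the lower nibble.

```c
#include <stdint.h>

/* Hypothetical names for illustration; not from the patch or QEMU. */
#define IP4_VERSION        4u  /* expected value of the version nibble */
#define IP4_MIN_HDR_WORDS  5u  /* header length without options, in 32-bit words */

/* Upper nibble of the first IPv4 header byte: protocol version. */
static inline unsigned ip4_version(uint8_t ver_len)
{
    return (ver_len >> 4) & 0x0Fu;
}

/* Lower nibble of the first IPv4 header byte: header length in 32-bit words. */
static inline unsigned ip4_hdr_words(uint8_t ver_len)
{
    return ver_len & 0x0Fu;
}

/* Returns 1 for an IPv4 header that carries no options, 0 otherwise. */
static int ip4_is_plain_header(uint8_t ver_len)
{
    return ip4_version(ver_len) == IP4_VERSION &&
           ip4_hdr_words(ver_len) == IP4_MIN_HDR_WORDS;
}
```

With helpers like these, the version check and the "no IP options" check in the filter read as named conditions instead of bare 0x4 and 5 comparisons.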



+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip option */
+if (5 != (0xF & ip->ip_ver_len)) {
+return RSC_BYPASS;
+}
+
+/* Don't handle packets with ip fragment */
+if (!(htons(ip->ip_off) & IP_DF)) {
+return RSC_BYPASS;
+}
+
+if (ip->ip_p != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip_len);
+if (ip_len < (sizeof(struct ip_header) + sizeof(struct tcp_header))
+|| ip_len > (size - IP_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return RSC_WANT;
+}
+
+
  static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
const uint8_t *buf, size_t size)
  {
@@ -1958,6 +1998,10 @@ static size_t virtio_net_rsc_receive4(void *opq, 
NetClientState* nc,
  chain = (NetRscChain *)opq;
  ip = (struct ip_header *)(buf + IP_OFFSET);
  
+if (RSC_WANT != virtio_net_rsc_filter4(chain, ip, buf, size)) {

+return virtio_net_do_receive(nc, buf, size);
+}
+
  ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
  (0xF & ip->ip_ver_len) << 2);
  if (RSC_BYPASS == ret) {







Re: [Qemu-devel] [Bug 1539940] [NEW] Qemu 2.5 Solaris 8 and 9 sparc hang after terminal type menu

2016-02-01 Thread Artyom Tarasenko
On Sat, Jan 30, 2016 at 5:41 PM, Zhen Ning Lim  wrote:
> Public bug reported:
>
> Qemu command:
> qemu-system-sparc -nographic -monitor null -serial 
> mon:telnet:localhost:3000,server -bios ../../Downloads/ss20_v2.25_rom -M 
> SS-20 -hda ./solsparc -m 512 -cdrom ./sol-9-905hw-ga-sparc-dvd.iso -boot d 
> -cpu "TI SuperSparc 60" -net nic,vlan=1,macaddr=52:54:0:12:34:56
>
>
> when i do disk2:d, the system loads until the terminal type menu.
>
> What type of terminal are you using?
> 1) ANSI Standard CRT
> 2) DEC VT52
> 3) DEC VT100
> 4) Heathkit 19
> 5) Lear Siegler ADM31
> 6) PC Console
> 7) Sun Command Tool
> 8) Sun Workstation
> 9) Televideo 910
> 10) Televideo 925
> 11) Wyse Model 50
> 12) X Terminal Emulator (xterms)
> 13) CDE Terminal Emulator (dtterm)
> 14) Other
> Type the number of your choice and press Return: 3
> syslog service starting.
> savecore: no dump device configured
> Running in command line mode
>
> And nothing happens after that. Anyone encountered this issue?
>
> ** Affects: qemu
>  Importance: Undecided
>  Status: New
>
> --
> You received this bug notification because you are a member of qemu-
> devel-ml, which is subscribed to QEMU.
> https://bugs.launchpad.net/bugs/1539940
>
> Title:
>   Qemu 2.5 Solaris 8 and 9 sparc hang after terminal type menu
>
> Status in QEMU:
>   New
>
> Bug description:
>   Qemu command:
>   qemu-system-sparc -nographic -monitor null -serial 
> mon:telnet:localhost:3000,server -bios ../../Downloads/ss20_v2.25_rom -M 
> SS-20 -hda ./solsparc -m 512 -cdrom ./sol-9-905hw-ga-sparc-dvd.iso -boot d 
> -cpu "TI SuperSparc 60" -net nic,vlan=1,macaddr=52:54:0:12:34:56
>
>
>   when i do disk2:d, the system loads until the terminal type menu.
>
>   What type of terminal are you using?
>   1) ANSI Standard CRT
>   2) DEC VT52
>   3) DEC VT100
>   4) Heathkit 19
>   5) Lear Siegler ADM31
>   6) PC Console
>   7) Sun Command Tool
>   8) Sun Workstation
>   9) Televideo 910
>   10) Televideo 925
>   11) Wyse Model 50
>   12) X Terminal Emulator (xterms)
>   13) CDE Terminal Emulator (dtterm)
>   14) Other
>   Type the number of your choice and press Return: 3
>   syslog service starting.
>   savecore: no dump device configured
>   Running in command line mode
>
>   And nothing happens after that. Anyone encountered this issue?

Does the boot log look like the "good" or the "bad" example from the link below?

http://tyom.blogspot.de/2010/05/sx-framebuffer-emulation.html


-- 
Regards,
Artyom Tarasenko

SPARC and PPC PReP under qemu blog: http://tyom.blogspot.com/search/label/qemu



Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-02-01 Thread Wei Xu

On 02/01/2016 01:55 PM, Jason Wang wrote:



On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

When a packet arrives, a corresponding chain is selected or created,
or the packet is bypassed if it is not an IPv4 packet.

The callback in the chain is invoked to do the actual coalescing.

Since coalescing is based on the TCP connection, a packet is cached if
there is no previous data within the same connection.

The framework of IPv4 is also introduced.

This patch depends on patch 2918cf2 (Detailed IPv4 and General TCP data
coalescing)

Then looks like the order needs to be changed?


OK, as mentioned in other feedbacks, some of the patches should be merged, will 
adjust the patch set again, thanks.



Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 173 +++-
  1 file changed, 172 insertions(+), 1 deletion(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4e9458e..cfbac6d 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -14,10 +14,12 @@
  #include "qemu/iov.h"
  #include "hw/virtio/virtio.h"
  #include "net/net.h"
+#include "net/eth.h"
  #include "net/checksum.h"
  #include "net/tap.h"
  #include "qemu/error-report.h"
  #include "qemu/timer.h"
+#include "qemu/sockets.h"
  #include "hw/virtio/virtio-net.h"
  #include "net/vhost_net.h"
  #include "hw/virtio/virtio-bus.h"
@@ -37,6 +39,21 @@
  #define endof(container, field) \
  (offsetof(container, field) + sizeof(((container *)0)->field))
  
+#define VIRTIO_HEADER   12/* Virtio net header size */

This looks wrong if mrg_rxbuf (VIRTIO_NET_F_MRG_RXBUF) is off


OK.
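
To illustrate the concern, here is a rough sketch of computing the IP offset from the negotiated header layout instead of a fixed 12. It assumes the legacy virtio-net layouts, where the plain header is 10 bytes and the mergeable-rx-buffer header is 12 bytes, and an untagged 14-byte Ethernet header. All names (VNET_HDR_LEN_PLAIN, VNET_HDR_LEN_MRG, ETH_HDR_LEN, vnet_hdr_len, ip_offset) are hypothetical and not part of the patch.

```c
#include <stddef.h>

/* Legacy virtio-net header sizes; the names are illustrative only. */
#define VNET_HDR_LEN_PLAIN  10u  /* header without the num_buffers field */
#define VNET_HDR_LEN_MRG    12u  /* header with num_buffers (mergeable rx buffers) */
#define ETH_HDR_LEN         14u  /* dst MAC + src MAC + ethertype, no VLAN tag */

/* Header length the guest expects, depending on the negotiated feature. */
static size_t vnet_hdr_len(int mrg_rxbuf)
{
    return mrg_rxbuf ? VNET_HDR_LEN_MRG : VNET_HDR_LEN_PLAIN;
}

/* Offset of the IP header inside a buffer that starts with the virtio header. */
static size_t ip_offset(int mrg_rxbuf)
{
    return vnet_hdr_len(mrg_rxbuf) + ETH_HDR_LEN;
}
```

The point is only that the offset depends on run-time state, so a compile-time IP_OFFSET cannot be right for both cases.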




+#define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
+
+/* Global statistics */
+static uint32_t rsc_chain_no_mem;

This is meaningless, see below comments.


Yes, should remove this.




+
+/* Switcher to enable/disable rsc */
+static bool virtio_net_rsc_bypass;
+
+/* Coalesce callback for ipv4/6 */
+typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg *seg,
+ const uint8_t *buf, size_t size);
+
  typedef struct VirtIOFeature {
  uint32_t flags;
  size_t end;
@@ -1019,7 +1036,8 @@ static int receive_filter(VirtIONet *n, const uint8_t 
*buf, int size)
  return 0;
  }
  
-static ssize_t virtio_net_receive(NetClientState *nc, const uint8_t *buf, size_t size)

+static ssize_t virtio_net_do_receive(NetClientState *nc,
+  const uint8_t *buf, size_t size)
  {
  VirtIONet *n = qemu_get_nic_opaque(nc);
  VirtIONetQueue *q = virtio_net_get_subqueue(nc);
@@ -1623,6 +1641,159 @@ static void virtio_net_rsc_cleanup(VirtIONet *n)
  }
  }
  
+static int virtio_net_rsc_cache_buf(NetRscChain *chain, NetClientState *nc,

+const uint8_t *buf, size_t size)
+{
+NetRscSeg *seg;
+
+seg = g_malloc(sizeof(NetRscSeg));
+if (!seg) {
+return 0;
+}

g_malloc() can't fail, no need to check if it succeeded.


OK.




+
+seg->buf = g_malloc(MAX_VIRTIO_IP_PAYLOAD);
+if (!seg->buf) {
+goto out;
+}
+
+memmove(seg->buf, buf, size);
+seg->size = size;
+seg->dup_ack_count = 0;
+seg->is_coalesced = 0;
+seg->nc = nc;
+
+QTAILQ_INSERT_TAIL(&chain->buffers, seg, next);
+return size;
+
+out:
+g_free(seg);
+return 0;
+}
+
+
+static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
+   NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+/* This real part of this function will be introduced in next patch, just
+*  return a 'final' to feed the compilation. */
+return RSC_FINAL;
+}
+
+static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
+{

Looks like this function was called directly, so "callback" suffix is
not accurate.


OK.




+int ret;
+NetRscSeg *seg, *nseg;
+
+if (QTAILQ_EMPTY(&chain->buffers)) {
+if (!virtio_net_rsc_cache_buf(chain, nc, buf, size)) {
+return 0;
+} else {
+return size;
+}
+}
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+ret = coalesce(chain, seg, buf, size);
+if (RSC_FINAL == ret) {

Let's use "ret == RSC_FINAL" for a consistent coding style with other
qemu codes.


OK.




+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+if (ret == 0) {
+/* Send failed */
+return 0;
+}
+
+/* Send current packet */
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_NO_MATCH == ret) {
+continue;
+} else {
+/* 

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 16:05, Yang Hongyang wrote:



On 02/01/2016 03:56 PM, Hailiang Zhang wrote:

On 2016/2/1 15:46, Jason Wang wrote:



On 02/01/2016 02:13 PM, Hailiang Zhang wrote:

On 2016/2/1 11:14, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

We add a new helper function netdev_add_filter(), this function
can help adding a filter object to a netdev.
Besides, we add a is_default member for struct NetFilterState
to indicate whether the filter is default or not.

Signed-off-by: zhanghailiang 
---
v2:
   -Re-implement netdev_add_filter() by re-using object_create()
(Jason's suggestion)
---
   include/net/filter.h |  7 +
   net/filter.c | 80

   2 files changed, 87 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..ee1c024 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
   char *netdev_id;
   NetClientState *netdev;
   NetFilterDirection direction;
+bool is_default;
   bool enabled;
   QTAILQ_ENTRY(NetFilterState) next;
   };
@@ -74,4 +75,10 @@ ssize_t
qemu_netfilter_pass_to_next(NetClientState *sender,
   int iovcnt,
   void *opaque);

+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp);
+
   #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index d08a2be..dc7aa9b 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable
*uc, Error **errp)
   QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
   }

+QemuOptsList qemu_filter_opts = {
+.name = "default-filter",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
+.desc = {
+{
+.name = "qom-type",
+.type = QEMU_OPT_STRING,
+},{
+.name = "id",
+.type = QEMU_OPT_STRING,
+},{
+.name = "netdev",
+.type = QEMU_OPT_STRING,
+},{
+.name = "status",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static void filter_set_default_flag(const char *id,
+bool is_default,
+Error **errp)
+{
+Object *obj, *container;
+NetFilterState *nf;
+
+container = object_get_objects_root();
+obj = object_resolve_path_component(container, id);
+if (!obj) {
+error_setg(errp, "object id not found");
+return;
+}
+nf = NETFILTER(obj);
+nf->is_default = is_default;
+}
+
+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp)
+{
+NetClientState *nc = qemu_find_netdev(netdev_id);
+char *optarg;
+QemuOpts *opts = NULL;
+Error *err = NULL;
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+/* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" :
"enable"


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean, pass a string parameter which stores the filter property
instead of
assemble it in this helper ?


Yes. E.g just a global string which could be changed by any subsystem.
E.g colo may change it to "filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.
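
A minimal sketch of what this suggestion could look like, under stated assumptions: the global string name, the "default-filter%u" id scheme, the counter, and the helper build_default_filter_opts() are all invented for illustration, and the resulting option string format is not checked against what object_create() actually accepts.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical global that any subsystem (e.g. COLO) could repoint. */
static const char *default_filter_props =
    "filter-buffer,interval=0,status=disable";

/* Hypothetical counter used to generate unique filter ids. */
static unsigned default_filter_seq;

/*
 * Builds "<props>,id=default-filter<N>,netdev=<netdev_id>" into out.
 * Returns 0 on success, -1 if the result does not fit in out_len bytes.
 */
static int build_default_filter_opts(char *out, size_t out_len,
                                     const char *netdev_id)
{
    int n = snprintf(out, out_len, "%s,id=default-filter%u,netdev=%s",
                     default_filter_props, default_filter_seq, netdev_id);

    if (n < 0 || (size_t)n >= out_len) {
        return -1;
    }
    default_filter_seq++;  /* consume the id only when the string was built */
    return 0;
}
```

With this shape the helper needs no is_default parameter; whatever properties the default filter should have live entirely in the global string.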



Got it. Then we don't need the global default_netfilter_type[] in patch 5,
Just use this global string instead ?




- colo code may change the pointer to "filter-buffer,status=disable"




Then, there's no need for lots of codes above:
- no need a "is_default" parameter in netdev_add_filter which does not
scale consider we may want to have more property in the future
- no need to hacking like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filter, use the name (id) to distinguish it ?


What's the reason that you want to distinguish default filters from
others?



The default filters will be used by COLO or MC, (In COLO, we will use it
to control packets buffering/releasing).
For COLO, we don't want to control (use) other filters that added by users.


I think Jason's point is that COLO is a manager, you can add the filter
to netdev when doing COLO, so the only 

Re: [Qemu-devel] [RFC Patch v2 09/10] virtio-net rsc: Add IPv6 support

2016-02-01 Thread Wei Xu



On 02/01/2016 03:14 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

A few more stuffs should be included to support this
1. Corresponding chain lookup
2. Coalescing callback for the protocol chain
3. Filter & Sanity Check.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 104 +++-
  1 file changed, 102 insertions(+), 2 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 9b44762..c9f6bfc 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -46,12 +46,19 @@
  #define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
  #define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
  #define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+
+#define IP6_ADDR_OFFSET (IP_OFFSET + 8) /* ipv6 address start */
+#define TCP6_OFFSET (IP_OFFSET + sizeof(struct ip6_header)) /* tcp6 header */
+#define TCP6_PORT_OFFSET TCP6_OFFSET/* tcp6 port offset */
+#define IP6_ADDR_SIZE   32  /* ipv6 saddr + daddr */
  #define TCP_PORT_SIZE   4   /* sport + dport */
  #define TCP_WINDOW  65535
  
  /* IPv4 max payload, 16 bits in the header */

  #define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
  
+/* ip6 max payload, payload in ipv6 don't include the  header */

+#define MAX_IP6_PAYLOAD  65535
  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
  
  /* Purge coalesced packets timer interval */

@@ -1856,6 +1863,42 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain 
*chain,
  o_data, &o_ip->ip_len, MAX_IP4_PAYLOAD);
  }
  
+static int32_t virtio_net_rsc_try_coalesce6(NetRscChain *chain,

+NetRscSeg *seg, const uint8_t *buf, size_t size)
+{
+uint16_t o_ip_len, n_ip_len;/* len in ip header field */
+uint16_t n_tcp_len, o_tcp_len;  /* tcp header len */
+uint16_t o_data, n_data;/* payload without virtio/eth/ip/tcp */
+struct ip6_header *n_ip, *o_ip;
+struct tcp_header *n_tcp, *o_tcp;
+
+n_ip = (struct ip6_header *)(buf + IP_OFFSET);
+n_ip_len = htons(n_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+n_tcp = (struct tcp_header *)(((uint8_t *)n_ip)\
++ sizeof(struct ip6_header));
+n_tcp_len = (htons(n_tcp->th_offset_flags) & 0xF000) >> 10;
+n_data = n_ip_len - n_tcp_len;
+
+o_ip = (struct ip6_header *)(seg->buf + IP_OFFSET);
+o_ip_len = htons(o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+o_tcp = (struct tcp_header *)(((uint8_t *)o_ip)\
++ sizeof(struct ip6_header));
+o_tcp_len = (htons(o_tcp->th_offset_flags) & 0xF000) >> 10;
+o_data = o_ip_len - o_tcp_len;

Like I've replied in previous mails, need a helper or just store
pointers to both ip and tcp in seg.

OK.
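
A helper along these lines might look as follows. This is only a sketch: the function names are hypothetical, and it assumes th_offset_flags has already been converted to host byte order, with the data offset (in 32-bit words) in the top four bits. Shifting the masked value right by 12 and multiplying by 4 gives the same result as the ">> 10" used in the patch.

```c
#include <stdint.h>

/*
 * TCP header length in bytes, taken from the data-offset nibble.
 * offset_flags_host is the 16-bit offset/flags field in host byte order.
 */
static inline uint16_t tcp_hdr_len_bytes(uint16_t offset_flags_host)
{
    return (uint16_t)(((offset_flags_host & 0xF000u) >> 12) * 4u);
}

/*
 * Payload length given the length of the TCP segment (header + data).
 * Returns 0 if the header length is inconsistent with the segment length.
 */
static inline uint16_t tcp_payload_len(uint16_t seg_len, uint16_t hdr_len)
{
    return seg_len >= hdr_len ? (uint16_t)(seg_len - hdr_len) : 0;
}
```

Both the IPv4 and IPv6 paths could then call the same two helpers after working out the segment length in their own way.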



+
+if (memcmp(&n_ip->ip6_src, &o_ip->ip6_src, sizeof(struct in6_address))
+|| memcmp(&n_ip->ip6_dst, &o_ip->ip6_dst, sizeof(struct in6_address))
+|| (n_tcp->th_sport ^ o_tcp->th_sport)
+|| (n_tcp->th_dport ^ o_tcp->th_dport)) {
+return RSC_NO_MATCH;
+}

And if you still want to handle coalescing in a layer style, better
delay the check of ports to tcp function.

OK.



+
+/* There is a difference between payload lenght in ipv4 and v6,
+   ip header is excluded in ipv6 */
+return virtio_net_rsc_coalesce_tcp(chain, seg, buf,
+   n_tcp, n_tcp_len, n_data, o_tcp, o_tcp_len, o_data,
+   &o_ip->ip6_ctlun.ip6_un1.ip6_un1_plen, MAX_IP6_PAYLOAD);
+}
  
  /* Pakcets with 'SYN' should bypass, other flag should be sent after drain

   * to prevent out of order */
@@ -2015,6 +2058,59 @@ static size_t virtio_net_rsc_receive4(void *opq, 
NetClientState* nc,
 virtio_net_rsc_try_coalesce4);
  }
  
+static int32_t virtio_net_rsc_filter6(NetRscChain *chain, struct ip6_header *ip,

+  const uint8_t *buf, size_t size)
+{
+uint16_t ip_len;
+
+if (size < (TCP6_OFFSET + sizeof(tcp_header))) {
+return RSC_BYPASS;
+}
+
+if (0x6 != (0xF & ip->ip6_ctlun.ip6_un1.ip6_un1_flow)) {
+return RSC_BYPASS;
+}
+
+/* Both option and protocol is checked in this */
+if (ip->ip6_ctlun.ip6_un1.ip6_un1_nxt != IPPROTO_TCP) {
+return RSC_BYPASS;
+}
+
+/* Sanity check */
+ip_len = htons(ip->ip6_ctlun.ip6_un1.ip6_un1_plen);
+if (ip_len < sizeof(struct tcp_header)
+|| ip_len > (size - TCP6_OFFSET)) {
+return RSC_BYPASS;
+}
+
+return 0;

RSC_WANT?

Yes, this is new code and not tested.



+}
+
+static size_t virtio_net_rsc_receive6(void *opq, NetClientState* nc,
+  const uint8_t *buf, size_t size)
+{
+int32_t ret;
+NetRscChain *chain;
+struct ip6_header *ip;
+
+chain = 

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Jason Wang


On 02/01/2016 03:56 PM, Hailiang Zhang wrote:
> On 2016/2/1 15:46, Jason Wang wrote:
>>
>>
>> On 02/01/2016 02:13 PM, Hailiang Zhang wrote:
>>> On 2016/2/1 11:14, Jason Wang wrote:


 On 01/27/2016 04:29 PM, zhanghailiang wrote:
> We add a new helper function netdev_add_filter(), this function
> can help adding a filter object to a netdev.
> Besides, we add a is_default member for struct NetFilterState
> to indicate whether the filter is default or not.
>
> Signed-off-by: zhanghailiang 
> ---
> v2:
>-Re-implement netdev_add_filter() by re-using object_create()
> (Jason's suggestion)
> ---
>include/net/filter.h |  7 +
>net/filter.c | 80
> 
>2 files changed, 87 insertions(+)
>
>

[...]

> +
> +optarg =
> g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
> +filter_type, id, netdev_id, is_default ? "disable" :
> "enable"

 Instead of this, I wonder maybe it's better to:

 - store the default filter property into a pointer to string
>>>
>>> Do you mean, pass a string parameter which stores the filter property
>>> instead of
>>> assemble it in this helper ?
>>
>> Yes. E.g just a global string which could be changed by any subsystem.
>> E.g colo may change it to "filter-buffer,interval=0,status=disable". But
>> filter ids need to be generated automatically.
>>
>
> Got it. Then we don't need the global default_netfilter_type[] in
> patch 5,

Yes.

> Just use this global string instead ?

Right.

>
>>>
 - colo code may change the pointer to "filter-buffer,status=disable"

>>>
 Then, there's no need for lots of codes above:
 - no need a "is_default" parameter in netdev_add_filter which does not
 scale consider we may want to have more property in the future
 - no need to hacking like "qemu_filter_opts"
>>>
>>> Yes, we can use qemu_find_opts("object") instead of it.
>>>
 - no need to have a special flag like "is_default"

>>>
>>> But we have to distinguish the default filter from the common
>>> filter, use the name (id) to distinguish it ?
>>
>> What's the reason that you want to distinguish default filters from
>> others?
>>
>
> The default filters will be used by COLO or MC, (In COLO, we will use it
> to control packets buffering/releasing).

A question is how will you do this?

> For COLO, we don't want to control (use) other filters that added by
> users.
>
> Thanks,
> Hailiang
>
>> Thanks
>>
>>>
>>> Thanks,
>>> Hailiang
>>>
 Thoughts?

> +opts = qemu_opts_parse_noisily(&qemu_filter_opts,
> +   optarg, false);
> +if (!opts) {
> +error_report("Failed to parse param '%s'", optarg);
> +exit(1);
> +}
> +g_free(optarg);
> +if (object_create(NULL, opts, &err) < 0) {
> +error_report("Failed to create object");
> +goto out_clean;
> +}
> +filter_set_default_flag(id, is_default, &err);
> +
> +out_clean:
> +qemu_opts_del(opts);
> +if (err) {
> +error_propagate(errp, err);
> +}
> +}
> +
>static void netfilter_finalize(Object *obj)
>{
>NetFilterState *nf = NETFILTER(obj);


 .

>>>
>>>
>>
>>
>> .
>>
>
>
>




Re: [Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing

2016-02-01 Thread Wei Xu


On 02/01/2016 02:21 PM, Jason Wang wrote:



On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

Since this feature also needs to support IPv6, and there are some
protocol-specific differences between the IPv4 and IPv6 headers,
try to make the interface general.

IPv4/6 should set up both the new and old IP/TCP header before invoking
TCP coalescing, and should also tell the real payload.

The main handler of TCP includes TCP window update, duplicated ACK check
and the real data coalescing if the new segment passed invalid filter
and is identified as an expected one.

An expected segment means:
1. Segment is within current window and the sequence is the expected one.
2. ACK of the segment is in the valid window.
3. If the ACK in the segment is a duplicate, the duplicate count must be less
than 2, so that the upper-layer TCP is notified to start retransmission as the
spec requires.
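
The first condition can be sketched roughly as below. This mirrors the sequence-number comparisons in the patch but is not the patch code: the enum, the function name classify_seq() and the SEQ_SKETCH_WINDOW constant are hypothetical, and all values are assumed to be in host byte order.

```c
#include <stdint.h>

#define SEQ_SKETCH_WINDOW 65535u  /* illustrative stand-in for TCP_WINDOW */

enum seg_order {
    SEG_IN_ORDER,      /* new segment starts right after the cached data */
    SEG_SAME_SEQ,      /* same sequence number: pure ACK, window update, ... */
    SEG_OUT_OF_ORDER   /* outside the window, or a gap/overlap in the data */
};

/*
 * oseq:   sequence number of the cached segment
 * o_data: payload bytes already cached for it
 * nseq:   sequence number of the newly arrived segment
 */
static enum seg_order classify_seq(uint32_t oseq, uint16_t o_data,
                                   uint32_t nseq)
{
    uint32_t delta = nseq - oseq;  /* unsigned, so this wraps modulo 2^32 */

    if (delta > SEQ_SKETCH_WINDOW) {
        return SEG_OUT_OF_ORDER;
    }
    if (delta == 0) {
        return SEG_SAME_SEQ;
    }
    return delta == o_data ? SEG_IN_ORDER : SEG_OUT_OF_ORDER;
}
```

Because the subtraction is done on uint32_t, a new segment whose sequence number has wrapped past zero still yields a small delta, while an older segment yields a very large one and is rejected.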

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 127 ++--
  1 file changed, 124 insertions(+), 3 deletions(-)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index cfbac6d..4f77fbe 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,10 @@
  
  #define VIRTIO_HEADER   12/* Virtio net header size */

  #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+#define TCP_WINDOW  65535

The name is confusing, how about TCP_MAX_WINDOW_SIZE ?


Sounds better, will take it in.




+
+/* IPv4 max payload, 16 bits in the header */
+#define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
  
  #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
  
@@ -1670,13 +1674,130 @@ out:

  return 0;
  }
  
+static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain, NetRscSeg *seg,

+ const uint8_t *buf, struct tcp_header *n_tcp,
+ struct tcp_header *o_tcp)
+{
+uint32_t nack, oack;
+uint16_t nwin, owin;
+
+nack = htonl(n_tcp->th_ack);
+nwin = htons(n_tcp->th_win);
+oack = htonl(o_tcp->th_ack);
+owin = htons(o_tcp->th_win);
+
+if ((nack - oack) >= TCP_WINDOW) {
+return RSC_FINAL;
+} else if (nack == oack) {
+/* duplicated ack or window probe */
+if (nwin == owin) {
+/* duplicated ack, add dup ack count due to whql test up to 1 */
+
+if (seg->dup_ack_count == 0) {
+seg->dup_ack_count++;
+return RSC_COALESCE;
+} else {
+/* Spec says should send it directly */
+return RSC_FINAL;
+}
+} else {
+/* Coalesce window update */

Need we flush this immediately consider it was a window update?


The flowchart in the spec says this can be coalesced as normal.

https://msdn.microsoft.com/en-us/library/windows/hardware/jj853325%28v=vs.85%29.aspx




+o_tcp->th_win = n_tcp->th_win;
+return RSC_COALESCE;
+}
+} else {

What if nack < oack here?


That shouldn't happen; the modulo-2^32 arithmetic check at the beginning of
this function keeps the ack within the current window.
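
To make the argument concrete, here is a small sketch of the check, assuming host-byte-order values. The function name ack_within_window() and the ACK_SKETCH_WINDOW constant are made up for illustration; the comparison itself corresponds to the "(nack - oack) >= TCP_WINDOW" test in the patch.

```c
#include <stdint.h>

#define ACK_SKETCH_WINDOW 65535u  /* illustrative stand-in for TCP_WINDOW */

/*
 * Returns 1 if nack is at or ahead of oack by less than the window,
 * using unsigned 32-bit subtraction so wraparound is handled.
 * If nack is numerically behind oack, the difference wraps to a value
 * far above the window and the function returns 0.
 */
static int ack_within_window(uint32_t oack, uint32_t nack)
{
    return (uint32_t)(nack - oack) < ACK_SKETCH_WINDOW;
}
```

So by the time the final else branch is reached, the "nack < oack" case has already been filtered out by the first comparison.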




+/* pure ack, update ack */
+o_tcp->th_ack = n_tcp->th_ack;
+return RSC_COALESCE;
+}
+}
+
+static int32_t virtio_net_rsc_coalesce_tcp(NetRscChain *chain, NetRscSeg *seg,
+   const uint8_t *buf, struct tcp_header *n_tcp, uint16_t 
n_tcp_len,
+   uint16_t n_data, struct tcp_header *o_tcp, uint16_t o_tcp_len,
+   uint16_t o_data, uint16_t *p_ip_len, uint16_t max_data)
+{
+void *data;
+uint16_t o_ip_len;
+uint32_t nseq, oseq;
+
+o_ip_len = htons(*p_ip_len);
+nseq = htonl(n_tcp->th_seq);
+oseq = htonl(o_tcp->th_seq);
+

Need to do the tcp header check here. And looks like we need also check more:

- Flags
- Data offset
- URG pointer


ok.




+/* Ignore packet with more/larger tcp options */
+if (n_tcp_len > o_tcp_len) {

What if n_tcp_len < o_tcp_len ?


This may be a bug; it's better to bypass it.




+return RSC_FINAL;
+}
+
+/* out of order or retransmitted. */
+if ((nseq - oseq) > TCP_WINDOW) {
+return RSC_FINAL;
+}
+
+data = ((uint8_t *)n_tcp) + n_tcp_len;
+if (nseq == oseq) {
+if ((0 == o_data) && n_data) {
+/* From no payload to payload, normal case, not a dup ack or etc */
+goto coalesce;
+} else {
+return virtio_net_rsc_handle_ack(chain, seg, buf, n_tcp, o_tcp);
+}
+} else if ((nseq - oseq) != o_data) {
+/* Not a consistent packet, out of order */
+return RSC_FINAL;
+} else {
+coalesce:
+if ((o_ip_len + n_data) > max_data) {
+return RSC_FINAL;
+}
+
+/* Here comes the right data, the payload lengh in v4/v6 is different,
+   so use the field value to update */
+*p_ip_len = htons(o_ip_len + 

Re: [Qemu-devel] [PATCH v7 12/13] qmp: Add query-ppc-cpu-cores command

2016-02-01 Thread Bharata B Rao
On Fri, Jan 29, 2016 at 04:45:06PM +0100, Igor Mammedov wrote:
> On Thu, 28 Jan 2016 11:19:54 +0530
> Bharata B Rao  wrote:
> 
> > Show the details of PPC CPU cores via a new QMP command.
> > 
> > TODO: update qmp-commands.hx with example
> > 
> > Signed-off-by: Bharata B Rao 
> > ---
> >  hw/ppc/cpu-core.c   | 77 
> > +
> >  qapi-schema.json| 31 +
> >  qmp-commands.hx | 51 +++
> >  stubs/Makefile.objs |  1 +
> >  stubs/qmp_query_ppc_cpu_cores.c | 10 ++
> >  5 files changed, 170 insertions(+)
> >  create mode 100644 stubs/qmp_query_ppc_cpu_cores.c
> > 
> > diff --git a/hw/ppc/cpu-core.c b/hw/ppc/cpu-core.c
> > index aa96e79..652a5aa 100644
> > --- a/hw/ppc/cpu-core.c
> > +++ b/hw/ppc/cpu-core.c
> > @@ -9,7 +9,84 @@
> >  #include "hw/ppc/cpu-core.h"
> >  #include "hw/boards.h"
> >  #include 
> > +#include 
> >  #include "qemu/error-report.h"
> > +#include "qmp-commands.h"
> > +
> > +/*
> > + * QMP: info ppc-cpu-cores
> > + */
> > +static int qmp_ppc_cpu_list(Object *obj, void *opaque)
> > +{
> > +CpuInfoList ***prev = opaque;
> > +
> > +if (object_dynamic_cast(obj, TYPE_POWERPC_CPU)) {
> > +CpuInfoList *elem = g_new0(CpuInfoList, 1);
> > +CpuInfo *s = g_new0(CpuInfo, 1);
> > +CPUState *cs = CPU(obj);
> > +PowerPCCPU *cpu = POWERPC_CPU(cs);
> > +CPUPPCState *env = &cpu->env;
> > +
> > +cpu_synchronize_state(cs);
> > +s->arch = CPU_INFO_ARCH_PPC;
> > +s->current = (cs == first_cpu);
> > +s->CPU = cs->cpu_index;
> > +s->qom_path = object_get_canonical_path(obj);
> > +s->halted = cs->halted;
> > +s->thread_id = cs->thread_id;
> > +s->u.ppc = g_new0(CpuInfoPPC, 1);
> > +s->u.ppc->nip = env->nip;
> > +
> > +elem->value = s;
> > +elem->next = NULL;
> > +**prev = elem;
> > +*prev = &elem->next;
> > +}
> > +object_child_foreach(obj, qmp_ppc_cpu_list, opaque);
> > +return 0;
> > +}
> > +
> > +static int qmp_ppc_cpu_core_list(Object *obj, void *opaque)
> > +{
> > +PPCCPUCoreList ***prev = opaque;
> > +
> > +if (object_dynamic_cast(obj, TYPE_POWERPC_CPU_CORE)) {
> > +DeviceClass *dc = DEVICE_GET_CLASS(obj);
> > +DeviceState *dev = DEVICE(obj);
> > +
> > +if (dev->realized) {
> > +PPCCPUCoreList *elem = g_new0(PPCCPUCoreList, 1);
> > +PPCCPUCore *s = g_new0(PPCCPUCore, 1);
> > +CpuInfoList *cpu_head = NULL;
> > +CpuInfoList **cpu_prev = &cpu_head;
> > +
> > +if (dev->id) {
> > +s->has_id = true;
> > +s->id = g_strdup(dev->id);
> > +}
> > +s->hotplugged = dev->hotplugged;
> > +s->hotpluggable = dc->hotpluggable;
> > +qmp_ppc_cpu_list(obj, &cpu_prev);
> > +s->threads = cpu_head;
> > +elem->value = s;
> > +elem->next = NULL;
> > +**prev = elem;
> > +*prev = &elem->next;
> > +}
> > +}
> > +
> > +object_child_foreach(obj, qmp_ppc_cpu_core_list, opaque);
> > +return 0;
> > +}
> > +
> > +PPCCPUCoreList *qmp_query_ppc_cpu_cores(Error **errp)
> > +{
> > +PPCCPUCoreList *head = NULL;
> > +PPCCPUCoreList **prev = &head;
> > +
> > +qmp_ppc_cpu_core_list(qdev_get_machine(), &prev);
> > +return head;
> > +}
> >  
> >  static int ppc_cpu_core_realize_child(Object *child, void *opaque)
> >  {
> > diff --git a/qapi-schema.json b/qapi-schema.json
> > index 8d04897..0902697 100644
> > --- a/qapi-schema.json
> > +++ b/qapi-schema.json
> > @@ -4083,3 +4083,34 @@
> >  ##
> >  { 'enum': 'ReplayMode',
> >'data': [ 'none', 'record', 'play' ] }
> > +
> > +##
> > +# @PPCCPUCore:
> > +#
> > +# Information about PPC CPU core devices
> > +#
> > +# @hotplugged: true if device was hotplugged
> > +#
> > +# @hotpluggable: true if device if could be added/removed while machine is 
> > running
> > +#
> > +# Since: 2.6
> > +##
> > +
> > +{ 'struct': 'PPCCPUCore',
> > +  'data': { '*id': 'str',
> > +'hotplugged': 'bool',
> > +'hotpluggable': 'bool',
> > +'threads' : ['CpuInfo']
> > +  }
> > +}
> Could it be made more arch independent?

Except that it is called PPCCPUCore, the fields are actually arch
neutral with 'threads' element defined as CpuInfo that gets interpreted
as arch-specific CpuInfo based on the target architecture.

In any case, this patchset adds PowerPC specific CPU core device that
sPAPR target implements. This is kept arch-specific in order to make it
more acceptable in short term in case arch-neutral, generic CPU hotplug
solutions take long time for reaching consensus.

> Perhaps it might make sense to replace 'threads'
> with qom-path so tools could inspect it in more detail
> if needed?

Hmm 'threads' is of 

Re: [Qemu-devel] [RFC Patch v2 07/10] virtio-net rsc: Checking TCP flag and drain specific connection packets

2016-02-01 Thread Wei Xu



On 02/01/2016 02:44 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

There are normally two ways to handle a TCP control flag: bypass and
finalize. Bypass means the packet is sent out directly. Finalize means
the packet is also bypassed, but only after searching the pool for
packets of the same connection and sending all of them out first; this
avoids out-of-order data.

All 'SYN' packets are bypassed, since a SYN always begins a new
connection. Other flags such as 'FIN/RST' trigger a finalization, because
they normally appear when a connection is about to be closed.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c | 66 +
  1 file changed, 66 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 88fc4f8..b0987d0 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -41,6 +41,12 @@
  
  #define VIRTIO_HEADER   12/* Virtio net header size */

  #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
+
+#define IP4_ADDR_OFFSET (IP_OFFSET + 12)/* ipv4 address start */
+#define TCP4_OFFSET (IP_OFFSET + sizeof(struct ip_header)) /* tcp4 header */
+#define TCP4_PORT_OFFSET TCP4_OFFSET/* tcp4 port offset */
+#define IP4_ADDR_SIZE   8   /* ipv4 saddr + daddr */
+#define TCP_PORT_SIZE   4   /* sport + dport */
  #define TCP_WINDOW  65535
  
  /* IPv4 max payload, 16 bits in the header */

@@ -1850,6 +1856,27 @@ static int32_t virtio_net_rsc_try_coalesce4(NetRscChain *chain,
  o_data, _ip->ip_len, MAX_IP4_PAYLOAD);
  }
  
+

+/* Packets with 'SYN' should bypass, other flag should be sent after drain
+ * to prevent out of order */
+static int virtio_net_rsc_parse_tcp_ctrl(uint8_t *ip, uint16_t offset)
+{
+uint16_t tcp_flag;
+struct tcp_header *tcp;
+
+tcp = (struct tcp_header *)(ip + offset);
+tcp_flag = htons(tcp->th_offset_flags) & 0x3F;
+if (tcp_flag & TH_SYN) {
+return RSC_BYPASS;
+}
+
+if (tcp_flag & (TH_FIN | TH_URG | TH_RST)) {
+return RSC_FINAL;
+}
+
+return 0;
+}

To avoid breaking bisection, this needs to be squashed into the previous
patches for a complete implementation of TCP coalescing.

OK.
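
The classification done by virtio_net_rsc_parse_tcp_ctrl() above can be sketched in isolation as follows. This is a minimal standalone illustration, not the patch's code: the enum names and classify_tcp_flags() are hypothetical. The flag word is assembled byte by byte from network order, which sidesteps the htons()/ntohs() question (both perform the same swap, but ntohs() is the one that names the intent when reading a header).

```c
#include <stdint.h>

/* TCP flag bits in the low 6 bits of the offset/flags word. */
enum {
    TCP_FLAG_FIN = 0x01,
    TCP_FLAG_SYN = 0x02,
    TCP_FLAG_RST = 0x04,
    TCP_FLAG_URG = 0x20
};

/* Hypothetical result codes, mirroring RSC_BYPASS / RSC_FINAL in the patch. */
enum rsc_action {
    RSC_ACT_COALESCE = 0,
    RSC_ACT_BYPASS,
    RSC_ACT_FINAL
};

/* tcp_hdr points at the first byte of a TCP header (at least 14 bytes). */
static enum rsc_action classify_tcp_flags(const uint8_t *tcp_hdr)
{
    /* Bytes 12..13 hold data offset, reserved bits and flags, big-endian. */
    uint16_t word = (uint16_t)((tcp_hdr[12] << 8) | tcp_hdr[13]);
    uint16_t flags = word & 0x3F;

    if (flags & TCP_FLAG_SYN) {
        return RSC_ACT_BYPASS;      /* new connection: send out directly */
    }
    if (flags & (TCP_FLAG_FIN | TCP_FLAG_URG | TCP_FLAG_RST)) {
        return RSC_ACT_FINAL;       /* drain the connection first */
    }
    return RSC_ACT_COALESCE;        /* ordinary segment */
}
```

With this shape, a SYN+ACK is bypassed because the SYN test comes first, and a plain ACK or PSH+ACK segment falls through to coalescing.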



+
  static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
  const uint8_t *buf, size_t size, VirtioNetCoalesce *coalesce)
  {
@@ -1895,12 +1922,51 @@ static size_t virtio_net_rsc_callback(NetRscChain *chain, NetClientState *nc,
  return virtio_net_rsc_cache_buf(chain, nc, buf, size);
  }
  
+/* Drain a connection data, this is to avoid out of order segments */

+static size_t virtio_net_rsc_drain_one(NetRscChain *chain, NetClientState *nc,
+const uint8_t *buf, size_t size, uint16_t ip_start,
+uint16_t ip_size, uint16_t tcp_port, uint16_t port_size)
+{
+NetRscSeg *seg, *nseg;
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, nseg) {
+if (memcmp(buf + ip_start, seg->buf + ip_start, ip_size)
+|| memcmp(buf + tcp_port, seg->buf + tcp_port, port_size)) {

Do you really mean "||" here?

Oops, it's '&&' here.
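
One way to make the skip condition harder to get wrong is to name the positive predicate and negate it at the call site. The sketch below is hypothetical (same_flow() is not part of the patch) and assumes the intent is to skip every cached segment that belongs to a different connection than the incoming packet. Under that assumption the loop should `continue` whenever the helper returns false, which is the case as soon as either comparison is nonzero.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Two packets belong to the same flow only if BOTH the address range
 * and the port range compare equal.
 */
static bool same_flow(const uint8_t *a, const uint8_t *b,
                      size_t addr_off, size_t addr_len,
                      size_t port_off, size_t port_len)
{
    return memcmp(a + addr_off, b + addr_off, addr_len) == 0 &&
           memcmp(a + port_off, b + port_off, port_len) == 0;
}
```

A caller would then write `if (!same_flow(buf, seg->buf, ...)) { continue; }`, so the boolean operator lives in one place.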



+continue;
+}
+if ((chain->proto == ETH_P_IP) && seg->is_coalesced) {
+virtio_net_rsc_ipv4_checksum(seg);
+}
+
+virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);

The above three or four lines look like they are duplicated two or three
times in the code of the previous patch. Please consider a new helper.

OK.



+break;
+}
+
+return virtio_net_do_receive(nc, buf, size);
+}
  static size_t virtio_net_rsc_receive4(void *opq, NetClientState* nc,
const uint8_t *buf, size_t size)
  {
+int32_t ret;
+struct ip_header *ip;
  NetRscChain *chain;
  
  chain = (NetRscChain *)opq;

+ip = (struct ip_header *)(buf + IP_OFFSET);
+
+ret = virtio_net_rsc_parse_tcp_ctrl((uint8_t *)ip,
+(0xF & ip->ip_ver_len) << 2);

This looks like a layer violation here. I think it should be done in
virtio_net_rsc_roalesce_tcp().

Good idea, will check it out.



+if (RSC_BYPASS == ret) {
+return virtio_net_do_receive(nc, buf, size);
+} else if (RSC_FINAL == ret) {
+return virtio_net_rsc_drain_one(chain, nc, buf, size, IP4_ADDR_OFFSET,
+IP4_ADDR_SIZE, TCP4_PORT_OFFSET, TCP_PORT_SIZE);

It's better for virtio_net_rsc_drain_one() itself to check the ip proto
and switch to use v4 or v6 offset/size, instead of passing a long
parameter list of OFFSET/SIZE macros.

Yes, I am considering optimizing it.
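
The suggestion above, letting the drain routine pick offsets from the protocol instead of taking four OFFSET/SIZE arguments, could look roughly like this. It is a sketch under stated assumptions: struct flow_layout and flow_layout_for() are hypothetical names, the 12-byte virtio header matches VIRTIO_HEADER in the patch, the IPv4 entry assumes no IP options (as TCP4_OFFSET in the patch does), and the IPv6 entry assumes no extension headers.

```c
#include <stddef.h>
#include <stdint.h>

#define ETH_TYPE_IPV4 0x0800
#define ETH_TYPE_IPV6 0x86DD

#define VNET_HDR_LEN 12                 /* VIRTIO_HEADER in the patch */
#define ETH_HDR_LEN  14                 /* Ethernet header, no VLAN tag */
#define IP_START     (VNET_HDR_LEN + ETH_HDR_LEN)

/* Where the address pair and the port pair live inside the buffer. */
struct flow_layout {
    uint16_t addr_off;
    uint16_t addr_len;
    uint16_t port_off;
    uint16_t port_len;
};

/* Return the layout for a protocol, or NULL if it is not handled. */
static const struct flow_layout *flow_layout_for(uint16_t proto)
{
    /* IPv4: saddr+daddr at byte 12 of the IP header, TCP after 20 bytes. */
    static const struct flow_layout v4 = { IP_START + 12, 8, IP_START + 20, 4 };
    /* IPv6: saddr+daddr at byte 8 of the IP header, TCP after 40 bytes. */
    static const struct flow_layout v6 = { IP_START + 8, 32, IP_START + 40, 4 };

    switch (proto) {
    case ETH_TYPE_IPV4:
        return &v4;
    case ETH_TYPE_IPV6:
        return &v6;
    default:
        return NULL;
    }
}
```

The drain function would then only need the chain (for its proto field) and the packet, and adding IPv6 later means adding a table entry rather than another set of call-site macros.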



+}
+
  return virtio_net_rsc_callback(chain, nc, buf, size,
 

Re: [Qemu-devel] [PATCH v3 00/10] Allow hotplug of s390 CPUs

2016-02-01 Thread Christian Borntraeger
On 01/27/2016 05:53 PM, Matthew Rosato wrote:
> Changes from v2->v3:
> 
> * Call cpu_remove_sync rather than cpu_remove().
> * Pull latest version of patches from pseries set (v6).  Trivial change to 
>   "Reclaim VCPU objects" to fix checkpatch error.
> * Add object_unparent during s390_cpu_release to accommodate changes in 
>   Patch 4 "Reclaim VCPU objects."
> * Remove a cleanup patch in favor of 2 patches from pseries set.
> 
> **
> 
> The following patchset enables hotplug of s390 CPUs.
> 
> The standard interface is used -- to configure a guest with 2 CPUs online at 
> boot and 4 maximum:
> 
> qemu -smp 2,maxcpus=4
> 
> To subsequently hotplug a CPU:
> 
> Issue 'device_add s390-cpu,id=' from monitor.
> 
> At this point, the guest must bring the CPU online for use -- This can be
> achieved via "echo 1 > /sys/devices/system/cpu/cpuX/online" or via a management
> tool like cpuplugd.
> 
> Hot unplug support is provided via 'device_del ', however s390 does not have
> a mechanism for gracefully handling a CPU that has been removed, so this event
> triggers a reset of the guest in order to force recognition.
> 
> This patch set is based on work previously done by Jason Herne.
> 
> Bharata B Rao (3):
>   exec: Remove cpu from cpus list during cpu_exec_exit()
>   exec: Do vmstate unregistration from cpu_exec_exit()
>   cpu: Add a sync version of cpu_remove()
> 
> Gu Zheng (1):
>   cpu: Reclaim vCPU objects
> 
> Matthew Rosato (6):
>   s390x/cpu: Cleanup init in preparation for hotplug
>   s390x/cpu: Set initial CPU state in common routine
>   s390x/cpu: Move some CPU initialization into realize
>   s390x/cpu: Add functions to (un)register CPU state
>   s390/virtio-ccw: Add hotplug handler and prepare for unplug
>   s390x/cpu: Allow hot plug/unplug of CPUs
> 
>  cpus.c | 50 +
>  exec.c | 30 
>  hw/s390x/s390-virtio-ccw.c | 30 +++-
>  hw/s390x/s390-virtio.c | 64 +++---
>  hw/s390x/s390-virtio.h |  2 +-
>  include/qom/cpu.h  | 18 
>  include/sysemu/kvm.h   |  1 +
>  kvm-all.c  | 57 -
>  kvm-stub.c |  5 
>  target-s390x/cpu.c | 70 +++---
>  target-s390x/cpu.h |  4 +++
>  11 files changed, 308 insertions(+), 23 deletions(-)


Acked-by: Christian Borntraeger 


Alexander, if you are too busy at the moment, we could carry
these patches via the s390/kvm tree?

We want these patches merged, since we also have to adapt libvirt to
use device_add instead of cpu_add (sigh).

Christian




Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Yang Hongyang



On 02/01/2016 03:56 PM, Hailiang Zhang wrote:

On 2016/2/1 15:46, Jason Wang wrote:



On 02/01/2016 02:13 PM, Hailiang Zhang wrote:

On 2016/2/1 11:14, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

We add a new helper function, netdev_add_filter(), which
adds a filter object to a netdev.
In addition, we add an is_default member to struct NetFilterState
to indicate whether the filter is a default one or not.

Signed-off-by: zhanghailiang 
---
v2:
   -Re-implement netdev_add_filter() by re-using object_create()
(Jason's suggestion)
---
   include/net/filter.h |  7 +
   net/filter.c | 80

   2 files changed, 87 insertions(+)

diff --git a/include/net/filter.h b/include/net/filter.h
index af3c53c..ee1c024 100644
--- a/include/net/filter.h
+++ b/include/net/filter.h
@@ -55,6 +55,7 @@ struct NetFilterState {
   char *netdev_id;
   NetClientState *netdev;
   NetFilterDirection direction;
+bool is_default;
   bool enabled;
   QTAILQ_ENTRY(NetFilterState) next;
   };
@@ -74,4 +75,10 @@ ssize_t
qemu_netfilter_pass_to_next(NetClientState *sender,
   int iovcnt,
   void *opaque);

+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp);
+
   #endif /* QEMU_NET_FILTER_H */
diff --git a/net/filter.c b/net/filter.c
index d08a2be..dc7aa9b 100644
--- a/net/filter.c
+++ b/net/filter.c
@@ -214,6 +214,86 @@ static void netfilter_complete(UserCreatable *uc, Error **errp)
   QTAILQ_INSERT_TAIL(&nf->netdev->filters, nf, next);
   }

+QemuOptsList qemu_filter_opts = {
+.name = "default-filter",
+.head = QTAILQ_HEAD_INITIALIZER(qemu_filter_opts.head),
+.desc = {
+{
+.name = "qom-type",
+.type = QEMU_OPT_STRING,
+},{
+.name = "id",
+.type = QEMU_OPT_STRING,
+},{
+.name = "netdev",
+.type = QEMU_OPT_STRING,
+},{
+.name = "status",
+.type = QEMU_OPT_STRING,
+},
+{ /* end of list */ }
+},
+};
+
+static void filter_set_default_flag(const char *id,
+bool is_default,
+Error **errp)
+{
+Object *obj, *container;
+NetFilterState *nf;
+
+container = object_get_objects_root();
+obj = object_resolve_path_component(container, id);
+if (!obj) {
+error_setg(errp, "object id not found");
+return;
+}
+nf = NETFILTER(obj);
+nf->is_default = is_default;
+}
+
+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp)
+{
+NetClientState *nc = qemu_find_netdev(netdev_id);
+char *optarg;
+QemuOpts *opts = NULL;
+Error *err = NULL;
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+/* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" :
"enable"


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean passing a string parameter which stores the filter properties,
instead of assembling it in this helper?


Yes. E.g just a global string which could be changed by any subsystem.
E.g colo may change it to "filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.
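
The idea of a global property string plus automatically generated ids can be sketched as below. This is only an illustration in plain C, not QEMU code: default_filter_props, next_filter_id, the "default-filter%u" id scheme and build_default_filter_opts() are all hypothetical names, and the "status" property is the one proposed in this series.

```c
#include <stdio.h>

/* Any subsystem (e.g. COLO) may point this at a different property string. */
const char *default_filter_props = "filter-buffer,interval=0,status=disable";

/* Monotonic counter used to generate unique filter ids. */
static unsigned next_filter_id;

/*
 * Build the option string for one netdev's default filter.
 * Returns 0 on success, -1 if the buffer is too small.
 */
int build_default_filter_opts(char *out, size_t len, const char *netdev_id)
{
    int n = snprintf(out, len, "%s,id=default-filter%u,netdev=%s",
                     default_filter_props, next_filter_id, netdev_id);

    if (n < 0 || (size_t)n >= len) {
        return -1;
    }
    next_filter_id++;   /* consume the id only when the string was built */
    return 0;
}
```

With this shape the helper needs no is_default parameter: whoever owns the default behaviour changes the string, and the helper only adds the per-netdev parts.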



Got it. Then we don't need the global default_netfilter_type[] in patch 5,
Just use this global string instead ?




- colo code may change the pointer to "filter-buffer,status=disable"




Then there's no need for much of the code above:
- no need for an "is_default" parameter in netdev_add_filter, which does not
scale considering we may want to have more properties in the future
- no need for hacks like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filter, use the name (id) to distinguish it ?


What's the reason that you want to distinguish default filters from
others?



The default filters will be used by COLO or MC (in COLO, we will use them
to control packet buffering/releasing).
For COLO, we don't want to control (use) other filters that are added by users.


I think Jason's point is that COLO is a manager, you can add the filter
to netdev when doing COLO, so the only difference between COLO's default
filter and 

Re: [Qemu-devel] [PATCH] nvme: generate OpenFirmware device path in the "bootorder" fw_cfg file

2016-02-01 Thread Laszlo Ersek
Gerd,

On 02/01/16 06:57, vladislav.vovche...@sk.com wrote:
>> -Original Message-
>> From: Laszlo Ersek [mailto:ler...@redhat.com]
>> Sent: 27 January 2016 02:21
>> To: qemu-devel@nongnu.org
>> Cc: Keith Busch; Kevin Wolf; open list:nvme; Gonglei; Vladislav Vovchenko
>> SFS; Feng Tian; Gerd Hoffmann; Kevin O'Connor
>> Subject: [PATCH] nvme: generate OpenFirmware device path in the
>> "bootorder" fw_cfg file

[snip]

> Tested-by: Vladislav Vovchenko 

Can you please pick up this patch? (You handled the precursor patch,
commit 33739c712982.)

This one has A-b's from Gonglei and Keith, and a T-b from Vladislav.
(Thanks for those, guys!)

Cheers
Laszlo



Re: [Qemu-devel] [RFC Patch v2 10/10] virtio-net rsc: Add Receive Segment Coalesce statistics

2016-02-01 Thread Wei Xu



On 02/01/2016 03:16 PM, Jason Wang wrote:


On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

Add statistics to log what happened during the process.

Signed-off-by: Wei Xu 
---
  hw/net/virtio-net.c| 49 +++---
  include/hw/virtio/virtio.h | 33 +++
  2 files changed, 79 insertions(+), 3 deletions(-)

Statistics are good, but we need a way to report them to either the end user
(ethtool?) or the developer (log, trace or other things).


OK, will check it out.



Re: [Qemu-devel] [PATCH V2] net/traffic-mirror:Add traffic-mirror

2016-02-01 Thread Dr. David Alan Gilbert
* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:
> 
> 
> On 02/01/2016 10:57 AM, Jason Wang wrote:
> >
> >
> >On 01/29/2016 09:38 AM, Li Zhijian wrote:
> >>
> >>
> >>On 01/28/2016 01:44 PM, Jason Wang wrote:
> >>>
> >>>
> >>>On 01/27/2016 10:40 AM, Zhang Chen wrote:
> From: ZhangChen 
> 
> Traffic-mirror is a netfilter plugin.
> It gives qemu the ability to copy and mirror guest's
> net packet. we output packet to chardev.
> 
> usage:
> 
> -netdev tap,id=hn0
> -chardev socket,id=mirror0,host=ip_primary,port=X,server,nowait
> -traffic-mirror,id=m0,netdev=hn0,queue=tx/rx/all,outdev=mirror0
> 
> Signed-off-by: ZhangChen 
> Signed-off-by: Wen Congyang 
> Reviewed-by: Yang Hongyang 
> >>>
> >>>Thanks for the patch. Several questions:
> >>>
> >>>- I'm curious about how the patch was tested? Simple setup e.g:
> >>>
> >>>-netdev tap,id=hn0 -device virtio-net-pci,netdev=hn0 -chardev
> >>>socket,id=c0,host=localhost,port=,server,nowait -object
> >>>traffic-mirror,netdev=hn0,outdev=c0,id=f0 -netdev
> >>>socket,id=s0,connect=127.0.0.1: -device e1000,netdev=s0
> >>>
> 
> A strange thing about "host=localhost": the connection is refused on SUSE 11.3
> but succeeds on Ubuntu 15.10 if I launch qemu with the command line above.
> I tried launching qemu on three physical machines installed with SUSE 11.3, and
> they all failed to connect. But when I specified "host=127.0.0.1", the connection is OK.
> 
> I have confirmed that:
> - "localhost" points to 127.0.0.1 if I "ping localhost" on SUSE
> - "telnet localhost " works on SUSE

My guess is that it's IPv6 related; check /etc/hosts to see if there's
a ::1 entry for localhost; I've seen some weird behaviour on RHEL in the
same way but in other uses.
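
One way to check that guess without reading /etc/hosts by hand is to ask the resolver which address families "localhost" maps to. The helper below is a hypothetical sketch using the POSIX getaddrinfo() API; count_families() is not part of QEMU.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <stddef.h>
#include <sys/types.h>
#include <sys/socket.h>
#include <netdb.h>

/*
 * Count how many IPv4 and IPv6 addresses a host name resolves to.
 * Returns 0 on success, or the getaddrinfo() error code.
 */
static int count_families(const char *host, int *n_v4, int *n_v6)
{
    struct addrinfo hints = {0};
    struct addrinfo *res, *ai;
    int rc;

    *n_v4 = 0;
    *n_v6 = 0;
    hints.ai_family = AF_UNSPEC;        /* ask for both families */
    hints.ai_socktype = SOCK_STREAM;

    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        return rc;
    }
    for (ai = res; ai != NULL; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            (*n_v4)++;
        } else if (ai->ai_family == AF_INET6) {
            (*n_v6)++;
        }
    }
    freeaddrinfo(res);
    return 0;
}
```

If count_families("localhost", ...) reports IPv6 entries on the SUSE hosts but not on the Ubuntu host, that would be consistent with the guess above; it would not by itself prove it.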

Dave

> 
> >>>does not works for me.
> >>Hi, Jason
> >>
> >>I just tested the mirror using the command line above, and it doesn't work
> >>for me either. I am looking into it, and it seems to be caused by the
> >>-net socket problem that I once posted a patch to try to fix (see below):
> >>[Qemu-devel] [PATCH] report a error message if -net socket can not
> >>connect to server
> >>https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg00758.html
> >
> >Will have a look at this.
> >
> >>
> >>after applying this patch, the qemu monitor tell me following message:
> >>(qemu) qemu-system-x86_64: net socket is not connected Connection refused
> >
> >Maybe two issues. Have you tried to start the mirror on one VM and then
> >use the socket backend to connect to it from another VM?
> 
> Yes, if I connect to the mirror on VM1 using the socket backend from another VM2,
> the connection is established successfully. But in the VM2 guest, I can't dump any
> packets using 'tcpdump'. That's because in the current version of the code, the
> mirror is not compatible with the socket backend; we will fix it in the next version.
> 
> 
> Best regards.
> Li Zhijian
> 
> >
> >>
> >>
> >>Thanks
> >>Li Zhijian
> >>
> >>
> >>
> >
> >
> >
> >.
> >
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK



Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Jason Wang


On 02/01/2016 05:39 PM, Hailiang Zhang wrote:
> On 2016/2/1 17:18, Jason Wang wrote:
>>
>>
>> On 02/01/2016 04:21 PM, Hailiang Zhang wrote:

 Instead of this, I wonder maybe it's better to:

 - store the default filter property into a pointer to string
>>>
>>> Do you mean, pass a string parameter which stores the filter
>>> property
>>> instead of
>>> assemble it in this helper ?
>>
>> Yes. E.g just a global string which could be changed by any
>> subsystem.
>> E.g colo may change it to
>> "filter-buffer,interval=0,status=disable". But
>> filter ids need to be generated automatically.
>>
>
> Got it. Then we don't need the global default_netfilter_type[] in
> patch 5,
> Just use this global string instead ?
>
>>>
 - colo code may change the pointer to
 "filter-buffer,status=disable"

>>>
 Then, there's no need for lots of codes above:
 - no need a "is_default" parameter in netdev_add_filter which
 does not
 scale consider we may want to have more property in the future
 - no need to hacking like "qemu_filter_opts"
>>>
>>> Yes, we can use qemu_find_opts("object") instead of it.
>>>
 - no need to have a special flag like "is_default"

>>>
>>> But we have to distinguish the default filter from the common
>>> filter, use the name (id) to distinguish it ?
>>
>> What's the reason that you want to distinguish default filters from
>> others?
>>
>
> The default filters will be used by COLO or MC, (In COLO, we will
> use it
> to control packets buffering/releasing).
> For COLO, we don't want to control (use) other filters that added by
> users.

 I think Jason's point is that COLO is a manager, you can add the
 filter
 to netdev when doing COLO, so the only difference between COLO's
 default
>>>
>>> Er, then we came back to the original question, 'is it necessary to
>>> add each netdev
>>> a default filter ?'
>>
>> The question could be extended to:
>>
>> 1) Do we need a default filter? I think the answer is yes, but of course
>> COLO can work even without this.
>
> Yes, after colo-proxy is realized, we can switch to colo-proxy
> (It should have the capability of buffer and release packets directly).
> But for now, we want to merge COLO prototype without colo-proxy, the COLO
> prototype should have the basic capability.

Right, I see.

> Just like Remus or
> Micro-checkpointing. It is based on the default buffer-filter to
> control net
> packets.
>
>> 2) Do we want to implement COLO on top of default filter? If yes, as you
>> suggest, we may record the ids of the default filter and do what ever we
>
> Yes, we need it.

Or, as I replied earlier, all buffer filters (with zero interval) could be
tracked by the buffer filter code itself. So as you see, there are several
ways to go. It's your call to choose one of them.

>
>> what. If not, COLO need codes to go through each netdev and add filter
>> itself (hotplug is not supported). Or you want management to do this,
>> then even hotplug could be supported.
>>
>
> We also want to support hotplug during VM is in COLO state in the future.
> (For this point, I'm not quite sure if this usage case is really exist.)
>
> Thanks,
> Hailiang

Supporting hotplug should be useful, I think. But I'm also OK if you don't
want to consider it now.




Re: [Qemu-devel] [PATCH v13 00/10] Block replication for continuous checkpoints

2016-02-01 Thread Dr. David Alan Gilbert
* Wen Congyang (we...@cn.fujitsu.com) wrote:
> On 01/29/2016 06:47 PM, Dr. David Alan Gilbert wrote:
> > * Wen Congyang (we...@cn.fujitsu.com) wrote:
> >> On 01/29/2016 06:07 PM, Dr. David Alan Gilbert wrote:
> >>> * Wen Congyang (we...@cn.fujitsu.com) wrote:
>  On 01/27/2016 07:03 PM, Dr. David Alan Gilbert wrote:
> > Hi,
> >   I've got a block error if I kill the secondary.
> >
> > Start both primary & secondary
> > kill -9 secondary qemu
> > x_colo_lost_heartbeat on primary
> >
> > The guest sees a block error and the ext4 root switches to read-only.
> >
> > I gdb'd the primary with a breakpoint on quorum_report_bad; see
> > backtrace below.
> > (This is based on colo-v2.4-periodic-mode of the framework
> > code with the block and network proxy merged in; so it could be my
> > merging but I don't think so ?)
> >
> >
> > (gdb) where
> > #0  quorum_report_bad (node_name=0x7f2946a0892c "node0", ret=-5, acb=0x7f2946cb3910, acb=0x7f2946cb3910)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:222
> > #1  0x7f2943b23058 in quorum_aio_cb (opaque=<optimized out>, ret=<optimized out>)
> >     at /root/colo/jan-2016/qemu/block/quorum.c:315
> > #2  0x7f2943b311be in bdrv_co_complete (acb=0x7f2946cb3f60) at /root/colo/jan-2016/qemu/block/io.c:2122
> > #3  0x7f2943ae777d in aio_bh_call (bh=<optimized out>) at /root/colo/jan-2016/qemu/async.c:64
> > #4  aio_bh_poll (ctx=ctx@entry=0x7f2945b771d0) at /root/colo/jan-2016/qemu/async.c:92
> > #5  0x7f2943af5090 in aio_dispatch (ctx=0x7f2945b771d0) at /root/colo/jan-2016/qemu/aio-posix.c:305
> > #6  0x7f2943ae756e in aio_ctx_dispatch (source=<optimized out>, callback=<optimized out>,
> >     user_data=<optimized out>) at /root/colo/jan-2016/qemu/async.c:231
> > #7  0x7f293b84a79a in g_main_context_dispatch () from /lib64/libglib-2.0.so.0
> > #8  0x7f2943af3a00 in glib_pollfds_poll () at /root/colo/jan-2016/qemu/main-loop.c:211
> > #9  os_host_main_loop_wait (timeout=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:256
> > #10 main_loop_wait (nonblocking=<optimized out>) at /root/colo/jan-2016/qemu/main-loop.c:504
> > #11 0x7f29438529ee in main_loop () at /root/colo/jan-2016/qemu/vl.c:1945
> > #12 main (argc=<optimized out>, argv=<optimized out>, envp=<optimized out>) at /root/colo/jan-2016/qemu/vl.c:4707
> >
> > (gdb) p s->num_children
> > $1 = 2
> > (gdb) p acb->success_count
> > $2 = 0
> > (gdb) p acb->is_read
> > $5 = false
> 
>  Sorry for the late reply.
> >>>
> >>> No problem.
> >>>
>  What it the value of acb->count?
> >>>
> >>> (gdb) p acb->count
> >>> $1 = 1
> >>
> >> Note, the count is 1, not 2. Writing to children.0 is in flight. If
> >> writing to children.0 succeeds, the guest doesn't see this error.
>  If secondary host is down, you should remove quorum's children.1. 
>  Otherwise, you will get
>  I/O error event.
> >>>
> >>> Is that safe?  If the secondary fails, do you always have time to issue 
> >>> the command to
> >>> remove the children.1  before the guest sees the error?
> >>
> >> We will write to two children, and expect that writing to children.0 will
> >> succeed. If so, the guest doesn't see this error. You just get the I/O error event.
> > 
> > I think children.0 is the disk, and that should be OK - so only the 
> > children.1/replication should
> > be failing - so in that case why do I see the error?
> 
> I don't know, and I will check the code.
> 
> > The 'node0' in the backtrace above is the name of the replication, so it 
> > does look like the error
> > is coming from the replication.
> 
> No, the backtrace just shows an I/O error event being reported to the
> management application.

OK, but the guest did see the error as well, so I'd assumed it was that path.
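
The behaviour Wen describes (one failing child only raises an event, and the guest sees an error only if too few children succeed) can be modelled in a few lines. This is a deliberately simplified model, not QEMU's quorum driver: struct quorum_write and quorum_child_done() are hypothetical, and the vote threshold of 1 used in the discussion is an assumption about this COLO setup.

```c
#include <errno.h>

/* Simplified bookkeeping for one write fanned out to all children. */
struct quorum_write {
    int num_children;   /* how many children the write was sent to */
    int threshold;      /* minimum successes for the write to count */
    int count;          /* children that have completed so far */
    int success_count;  /* children that completed without error */
    int error_events;   /* per-child failures reported to management */
};

/*
 * Record one child's completion. Returns 1 and stores the final result
 * in *result once every child has completed, 0 while some are in flight.
 */
static int quorum_child_done(struct quorum_write *w, int ret, int *result)
{
    w->count++;
    if (ret == 0) {
        w->success_count++;
    } else {
        w->error_events++;          /* event only; guest not told yet */
    }
    if (w->count < w->num_children) {
        return 0;                   /* still waiting for other children */
    }
    *result = (w->success_count >= w->threshold) ? 0 : -EIO;
    return 1;
}
```

In this model the state shown by gdb above (num_children = 2, count = 1, success_count = 0) is an intermediate one: the failure of children.1 has been recorded, and the guest-visible result depends on how children.0 completes.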

> >>> Anyway, I tried removing children.1 but it segfaults now, I guess the 
> >>> replication is unhappy:
> >>>
> >>> (qemu) x_block_change colo-disk0 -d children.1
> >>> (qemu) x_colo_lost_heartbeat 
> >>
> >> Hmm, you should not remove the child before failover. I will check how
> >> to avoid that in the code.
> > 
> >  But you said 'If secondary host is down, you should remove quorum's 
> > children.1' - is that not
> > what you meant?
> 
> Yes, you should execute 'x_colo_lost_heartbeat' first, and then execute
> 'x_block_change ... -d ...'.

OK, but then that's quite separate from the problem with the guest seeing it.

> >>> 12973 Segmentation fault  (core dumped) 
> >>> ./try/x86_64-softmmu/qemu-system-x86_64 -enable-kvm $console_param -S 
> >>> -boot c -m 4080 -smp 4 -machine pc-i440fx-2.5,accel=kvm -name 
> >>> debug-threads=on -trace events=trace-file -device virtio-rng-pci 
> >>> $block_param $net_param
> >>>
> >>> #0  0x7f0a398a864c in bdrv_stop_replication (bs=0x7f0a3b0a8430, 
> >>> failover=true, errp=0x7fff6a5c3420)
> >>> at /root/colo/jan-2016/qemu/block.c:4426
> >>>
> 

Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 18:41, Daniel P. Berrange wrote:

On Wed, Jan 27, 2016 at 04:29:37PM +0800, zhanghailiang wrote:

Make the helper object_create() public and fix its first
parameter to accept NULL value.

Signed-off-by: zhanghailiang 
Cc: Paolo Bonzini 
---
v2:
  - New patch
---
  include/qemu-common.h | 2 ++
  vl.c  | 4 ++--
  2 files changed, 4 insertions(+), 2 deletions(-)

diff --git a/include/qemu-common.h b/include/qemu-common.h
index 22b010c..52cf4fd 100644
--- a/include/qemu-common.h
+++ b/include/qemu-common.h
@@ -500,4 +500,6 @@ int parse_debug_env(const char *name, int max, int initial);
  const char *qemu_ether_ntoa(const MACAddr *mac);
  void page_size_init(void);

+int object_create(void *opaque, QemuOpts *opts, Error **errp);
+
  #endif
diff --git a/vl.c b/vl.c
index f043009..b21335e 100644
--- a/vl.c
+++ b/vl.c
@@ -2819,7 +2819,7 @@ static bool object_create_delayed(const char *type)
  }


-static int object_create(void *opaque, QemuOpts *opts, Error **errp)
+int object_create(void *opaque, QemuOpts *opts, Error **errp)
  {
  Error *err = NULL;
  char *type = NULL;
@@ -2842,7 +2842,7 @@ static int object_create(void *opaque, QemuOpts *opts, Error **errp)
  if (err) {
  goto out;
  }
-if (!type_predicate(type)) {
+if (type_predicate && !type_predicate(type)) {
  goto out;
  }


No, please don't do this - your later patch should *not* be using
object_create, it should use object_new_with_props.



Er, I didn't notice this helper before; I will look into it.

Thanks,
Hailiang


Regards,
Daniel







Re: [Qemu-devel] [RFC Patch v2 03/10] virtio-net rsc: Chain Lookup, Packet Caching and Framework of IPv4

2016-02-01 Thread Jason Wang


On 02/01/2016 04:02 PM, Wei Xu wrote:

[...]
>>
>>> +return NULL;
>>> +}
>>> +
>>> +chain->proto = proto;
>>> +chain->do_receive = virtio_net_rsc_receive4;
>>> +
>>> +QTAILQ_INIT(&chain->buffers);
>>> +QTAILQ_INSERT_TAIL(>rsc_chains, chain, next);
>>> +return chain;
>>> +}
>> Better to split the chain initialization from lookup. And we can
>> initialize ipv4 chain during initialization.
>
> Since the allocation happens really seldom, is it ok to keep the
> mechanism to make the logic clean? 

Ok for now.



Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 17:04, Jason Wang wrote:



On 02/01/2016 03:56 PM, Hailiang Zhang wrote:

On 2016/2/1 15:46, Jason Wang wrote:



On 02/01/2016 02:13 PM, Hailiang Zhang wrote:

On 2016/2/1 11:14, Jason Wang wrote:



On 01/27/2016 04:29 PM, zhanghailiang wrote:

We add a new helper function netdev_add_filter(), this function
can help adding a filter object to a netdev.
Besides, we add a is_default member for struct NetFilterState
to indicate whether the filter is default or not.

Signed-off-by: zhanghailiang 
---
v2:
-Re-implement netdev_add_filter() by re-using object_create()
 (Jason's suggestion)
---
include/net/filter.h |  7 +
net/filter.c | 80

2 files changed, 87 insertions(+)




[...]


+
+optarg =
g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" :
"enable"


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean, pass a string parameter which stores the filter property
instead of
assemble it in this helper ?


Yes. E.g just a global string which could be changed by any subsystem.
E.g colo may change it to "filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.



Got it. Then we don't need the global default_netfilter_type[] in
patch 5,


Yes.


Just use this global string instead ?


Right.






- colo code may change the pointer to "filter-buffer,status=disable"




Then, there's no need for lots of codes above:
- no need a "is_default" parameter in netdev_add_filter which does not
scale consider we may want to have more property in the future
- no need to hacking like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filter, use the name (id) to distinguish it ?


What's the reason that you want to distinguish default filters from
others?



The default filters will be used by COLO or MC, (In COLO, we will use it
to control packets buffering/releasing).


A question is how will you do this?



Er, for COLO, we will enable all the default filters in the initialization stage;
the buffer filters will then buffer all netdevs' packets.
After doing a checkpoint, we will release all the buffered packets (flush all
default filters' packets). If the VM fails over, we will set all default filters
back to disabled status.
(This is the periodic mode for COLO, as opposed to another mode, which we call
hybrid mode; that one is based on colo-proxy, which is being developed by zhangchen.)

Thanks,
Hailiang


For COLO, we don't want to control (use) other filters that added by
users.

Thanks,
Hailiang


Thanks



Thanks,
Hailiang


Thoughts?


+opts = qemu_opts_parse_noisily(&qemu_filter_opts,
+   optarg, false);
+if (!opts) {
+error_report("Failed to parse param '%s'", optarg);
+exit(1);
+}
+g_free(optarg);
+if (object_create(NULL, opts, &err) < 0) {
+error_report("Failed to create object");
+goto out_clean;
+}
+filter_set_default_flag(id, is_default, &err);
+
+out_clean:
+qemu_opts_del(opts);
+if (err) {
+error_propagate(errp, err);
+}
+}
+
static void netfilter_finalize(Object *obj)
{
NetFilterState *nf = NETFILTER(obj);










Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Jason Wang


On 02/01/2016 04:21 PM, Hailiang Zhang wrote:
>>
>> Instead of this, I wonder maybe it's better to:
>>
>> - store the default filter property into a pointer to string
>
> Do you mean, pass a string parameter which stores the filter property
> instead of
> assemble it in this helper ?

 Yes. E.g just a global string which could be changed by any subsystem.
 E.g colo may change it to
 "filter-buffer,interval=0,status=disable". But
 filter ids need to be generated automatically.

>>>
>>> Got it. Then we don't need the global default_netfilter_type[] in
>>> patch 5,
>>> Just use this global string instead ?
>>>
>
>> - colo code may change the pointer to "filter-buffer,status=disable"
>>
>
>> Then, there's no need for lots of codes above:
>> - no need a "is_default" parameter in netdev_add_filter which
>> does not
>> scale consider we may want to have more property in the future
>> - no need to hacking like "qemu_filter_opts"
>
> Yes, we can use qemu_find_opts("object") instead of it.
>
>> - no need to have a special flag like "is_default"
>>
>
> But we have to distinguish the default filter from the common
> filter, use the name (id) to distinguish it ?

 What's the reason that you want to distinguish default filters from
 others?

>>>
>>> The default filters will be used by COLO or MC, (In COLO, we will
>>> use it
>>> to control packets buffering/releasing).
>>> For COLO, we don't want to control (use) other filters that added by
>>> users.
>>
>> I think Jason's point is that COLO is a manager, you can add the filter
>> to netdev when doing COLO, so the only difference between COLO's default
>
> Er, then we came back to the original question, 'is it necessary to
> add each netdev
> a default filter ?'

The question could be extended to:

1) Do we need a default filter? I think the answer is yes, but of course
COLO can work even without this.
2) Do we want to implement COLO on top of the default filter? If yes, as you
suggest, we may record the ids of the default filters and do whatever we
want. If not, COLO needs code to go through each netdev and add the filter
itself (hotplug is not supported). Or you want management to do this;
then even hotplug could be supported.

Any thoughts?

> If we add a filter to netdev when doing COLO, it will be added
> dynamically,
> Here we want to add each netdev a default filter while launch QEMU
> (no matter if this VM will go into COLO or not),
> just to support hot-add NIC for VM while in COLO lifetime.

Yes.






[Qemu-devel] [Bug 1539940] Re: Qemu 2.5 Solaris 8 and 9 sparc hang after terminal type menu

2016-02-01 Thread Zhen Ning Lim
Looks bad before i did setenv sbus-probe-list f

Probing Memory Bank #7 64 Megabytes of DRAM
Incorrect configuration checksum; 
Setting NVRAM parameters to default values.
Setting diag-switch? NVRAM parameter to true
Probing /iommu@f,e000/sbus@f,e0001000 at f,0  espdma esp sd st ledma le 
SUNW,bpp 
Probing /iommu@f,e000/sbus@f,e0001000 at e,0  
Probing /iommu@f,e000/sbus@f,e0001000 at 0,0  Nothing there
Probing /iommu@f,e000/sbus@f,e0001000 at 1,0  Nothing there
Probing /iommu@f,e000/sbus@f,e0001000 at 2,0  Nothing there
Probing /iommu@f,e000/sbus@f,e0001000 at 3,0  Nothing there

after:

Probing Memory Bank #7 64 Megabytes of DRAM
Probing /iommu@f,e000/sbus@f,e0001000 at f,0  espdma esp sd st ledma le 
SUNW,bpp 

SPARCstation 20 (1 X 390Z50), No Keyboard
ROM Rev. 2.25, 512 MB memory installed, Serial #0.

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1539940

Title:
  Qemu 2.5 Solaris 8 and 9 sparc hang after terminal type menu

Status in QEMU:
  New

Bug description:
  Qemu command:
  qemu-system-sparc -nographic -monitor null -serial 
mon:telnet:localhost:3000,server -bios ../../Downloads/ss20_v2.25_rom -M SS-20 
-hda ./solsparc -m 512 -cdrom ./sol-9-905hw-ga-sparc-dvd.iso -boot d -cpu "TI 
SuperSparc 60" -net nic,vlan=1,macaddr=52:54:0:12:34:56

  
  when i do disk2:d, the system loads until the terminal type menu.

  What type of terminal are you using?
  1) ANSI Standard CRT
  2) DEC VT52
  3) DEC VT100
  4) Heathkit 19
  5) Lear Siegler ADM31
  6) PC Console
  7) Sun Command Tool
  8) Sun Workstation
  9) Televideo 910
  10) Televideo 925
  11) Wyse Model 50
  12) X Terminal Emulator (xterms)
  13) CDE Terminal Emulator (dtterm)
  14) Other
  Type the number of your choice and press Return: 3
  syslog service starting.
  savecore: no dump device configured
  Running in command line mode

  And nothing happens after that. Anyone encountered this issue?

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1539940/+subscriptions



Re: [Qemu-devel] [PATCH v7 12/13] qmp: Add query-ppc-cpu-cores command

2016-02-01 Thread Igor Mammedov
On Mon, 1 Feb 2016 14:13:58 +0530
Bharata B Rao  wrote:

> On Fri, Jan 29, 2016 at 04:45:06PM +0100, Igor Mammedov wrote:
> > On Thu, 28 Jan 2016 11:19:54 +0530
> > Bharata B Rao  wrote:
> >   
> > > Show the details of PPC CPU cores via a new QMP command.
> > > 
> > > TODO: update qmp-commands.hx with example
> > > 
> > > Signed-off-by: Bharata B Rao 
> > > ---
> > >  hw/ppc/cpu-core.c   | 77 
> > > +
> > >  qapi-schema.json| 31 +
> > >  qmp-commands.hx | 51 +++
> > >  stubs/Makefile.objs |  1 +
> > >  stubs/qmp_query_ppc_cpu_cores.c | 10 ++
> > >  5 files changed, 170 insertions(+)
> > >  create mode 100644 stubs/qmp_query_ppc_cpu_cores.c
> > > 
> > > diff --git a/hw/ppc/cpu-core.c b/hw/ppc/cpu-core.c
> > > index aa96e79..652a5aa 100644
> > > --- a/hw/ppc/cpu-core.c
> > > +++ b/hw/ppc/cpu-core.c
> > > @@ -9,7 +9,84 @@
> > >  #include "hw/ppc/cpu-core.h"
> > >  #include "hw/boards.h"
> > >  #include 
> > > +#include 
> > >  #include "qemu/error-report.h"
> > > +#include "qmp-commands.h"
> > > +
> > > +/*
> > > + * QMP: info ppc-cpu-cores
> > > + */
> > > +static int qmp_ppc_cpu_list(Object *obj, void *opaque)
> > > +{
> > > +CpuInfoList ***prev = opaque;
> > > +
> > > +if (object_dynamic_cast(obj, TYPE_POWERPC_CPU)) {
> > > +CpuInfoList *elem = g_new0(CpuInfoList, 1);
> > > +CpuInfo *s = g_new0(CpuInfo, 1);
> > > +CPUState *cs = CPU(obj);
> > > +PowerPCCPU *cpu = POWERPC_CPU(cs);
> > > > +CPUPPCState *env = &cpu->env;
> > > +
> > > +cpu_synchronize_state(cs);
> > > +s->arch = CPU_INFO_ARCH_PPC;
> > > +s->current = (cs == first_cpu);
> > > +s->CPU = cs->cpu_index;
> > > +s->qom_path = object_get_canonical_path(obj);
> > > +s->halted = cs->halted;
> > > +s->thread_id = cs->thread_id;
> > > +s->u.ppc = g_new0(CpuInfoPPC, 1);
> > > +s->u.ppc->nip = env->nip;
> > > +
> > > +elem->value = s;
> > > +elem->next = NULL;
> > > +**prev = elem;
> > > > +*prev = &elem->next;
> > > +}
> > > +object_child_foreach(obj, qmp_ppc_cpu_list, opaque);
> > > +return 0;
> > > +}
> > > +
> > > +static int qmp_ppc_cpu_core_list(Object *obj, void *opaque)
> > > +{
> > > +PPCCPUCoreList ***prev = opaque;
> > > +
> > > +if (object_dynamic_cast(obj, TYPE_POWERPC_CPU_CORE)) {
> > > +DeviceClass *dc = DEVICE_GET_CLASS(obj);
> > > +DeviceState *dev = DEVICE(obj);
> > > +
> > > +if (dev->realized) {
> > > +PPCCPUCoreList *elem = g_new0(PPCCPUCoreList, 1);
> > > +PPCCPUCore *s = g_new0(PPCCPUCore, 1);
> > > +CpuInfoList *cpu_head = NULL;
> > > > +CpuInfoList **cpu_prev = &cpu_head;
> > > +
> > > +if (dev->id) {
> > > +s->has_id = true;
> > > +s->id = g_strdup(dev->id);
> > > +}
> > > +s->hotplugged = dev->hotplugged;
> > > +s->hotpluggable = dc->hotpluggable;
> > > > +qmp_ppc_cpu_list(obj, &cpu_prev);
> > > +s->threads = cpu_head;
> > > +elem->value = s;
> > > +elem->next = NULL;
> > > +**prev = elem;
> > > > +*prev = &elem->next;
> > > +}
> > > +}
> > > +
> > > +object_child_foreach(obj, qmp_ppc_cpu_core_list, opaque);
> > > +return 0;
> > > +}
> > > +
> > > +PPCCPUCoreList *qmp_query_ppc_cpu_cores(Error **errp)
> > > +{
> > > +PPCCPUCoreList *head = NULL;
> > > > +PPCCPUCoreList **prev = &head;
> > > > +
> > > > +qmp_ppc_cpu_core_list(qdev_get_machine(), &prev);
> > > +return head;
> > > +}
> > >  
> > >  static int ppc_cpu_core_realize_child(Object *child, void *opaque)
> > >  {
> > > diff --git a/qapi-schema.json b/qapi-schema.json
> > > index 8d04897..0902697 100644
> > > --- a/qapi-schema.json
> > > +++ b/qapi-schema.json
> > > @@ -4083,3 +4083,34 @@
> > >  ##
> > >  { 'enum': 'ReplayMode',
> > >'data': [ 'none', 'record', 'play' ] }
> > > +
> > > +##
> > > +# @PPCCPUCore:
> > > +#
> > > +# Information about PPC CPU core devices
> > > +#
> > > +# @hotplugged: true if device was hotplugged
> > > +#
> > > > +# @hotpluggable: true if device could be added/removed while machine 
> > > is running
> > > +#
> > > +# Since: 2.6
> > > +##
> > > +
> > > +{ 'struct': 'PPCCPUCore',
> > > +  'data': { '*id': 'str',
> > > +'hotplugged': 'bool',
> > > +'hotpluggable': 'bool',
> > > +'threads' : ['CpuInfo']
> > > +  }
> > > +}  
> > Could it be made more arch independent?  
> 
> Except that it is called PPCCPUCore, the fields are actually arch
> neutral with 'threads' element defined as CpuInfo that gets interpreted
> as arch-specific CpuInfo based on the target architecture.
> 
> In any case, this patchset 
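
The list-building idiom in the quoted qmp_ppc_cpu_list() and
qmp_ppc_cpu_core_list() (`**prev = elem; *prev = &elem->next;`) can be
looked at in isolation. What follows is only a minimal standalone sketch,
not QEMU code: Node, append(), list_length() and list_free() are
hypothetical names standing in for the QAPI-generated list types and the
visitor callbacks in the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a QAPI-style list type such as CpuInfoList. */
typedef struct Node {
    int value;
    struct Node *next;
} Node;

/*
 * Append one element in O(1). 'tail' points at the variable that holds the
 * address of the last 'next' slot (initially the address of the list head).
 * This is the role the opaque 'prev' argument plays in the quoted patch.
 */
static int append(Node ***tail, int value)
{
    Node *elem = calloc(1, sizeof(*elem));

    if (!elem) {
        return -1;
    }
    elem->value = value;
    **tail = elem;        /* link the new element into the last slot */
    *tail = &elem->next;  /* the new element's 'next' is now the last slot */
    return 0;
}

/* Count the elements by walking the list from the head. */
static size_t list_length(const Node *head)
{
    size_t n = 0;

    for (; head; head = head->next) {
        n++;
    }
    return n;
}

/* Release every element of the list. */
static void list_free(Node *head)
{
    while (head) {
        Node *next = head->next;
        free(head);
        head = next;
    }
}
```

A caller starts with `Node *head = NULL; Node **tail = &head;` and passes
`&tail` to every append() call, so elements stay in insertion order without
re-walking the list.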

Re: [Qemu-devel] [PATCH 1/1] arm: virt: change GPIO trigger interrupt to pulse

2016-02-01 Thread Igor Mammedov
On Fri, 29 Jan 2016 09:13:15 -0600
Wei Huang  wrote:

> On 01/29/2016 08:50 AM, Peter Maydell wrote:
> > On 29 January 2016 at 14:46, Shannon Zhao  wrote:  
> >> On 2016/1/29 22:35, Wei Huang wrote:  
> >>> On 01/29/2016 04:10 AM, Shannon Zhao wrote:  
>  This makes ACPI work well but makes DT not work. The reason is systemd or
>  acpid open /dev/input/event0 failed. So the interrupt could be injected
>  and
>  could see under /proc/interrupts but guest doesn't have any action. I'll
>  investigate why it opens failed later.  
> >>>
> >>>
> >>> That is interesting. Could you try it with the following? This reverses
> >>> the order to down-up and worked on ACPI case.
> >>>  
> >> Yeah, that's very weird.
> >>  
> >>> qemu_set_irq(qdev_get_gpio_in(pl061_dev, 3), 0);
> >>> qemu_set_irq(qdev_get_gpio_in(pl061_dev, 3), 1);
> >>>  
> >> I'll try this tomorrow. But even if this works, it's still weird.  
> > 
> > I wonder if we should be asserting the GPIO pin in the powerdown-request
> > hook and then deasserting it on system reset somewhere...  
> 
> This is another possibility. We can try to reset the pl061 state by
> hooking up with dc->reset and see what happens.
I think that's what we do on x86.

> 
> > 
> > thanks
> > -- PMM
> >   
> 




Re: [Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-02-01 Thread Jason Wang


On 02/01/2016 04:39 PM, Wei Xu wrote:
> On 02/01/2016 02:28 PM, Jason Wang wrote:
>>
>> On 02/01/2016 02:13 AM, w...@redhat.com wrote:
>>> From: Wei Xu 
>>>
>>> The timer will only be triggered if the packets pool is not empty,
>>> and it'll drain off all the cached packets, this is to reduce the
>>> delay to upper layer protocol stack.
>>>
>>> Signed-off-by: Wei Xu 
>>> ---
>>>   hw/net/virtio-net.c | 38 ++
>>>   1 file changed, 38 insertions(+)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index 4f77fbe..93df0d5 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -48,12 +48,17 @@
>>> #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
>>>   +/* Purge coalesced packets timer interval */
>>> +#define RSC_TIMER_INTERVAL  50
>> Any hints for choosing this as default value? Do we need a property for
>> user to change this?
> This is still under estimation, 300ms -500ms is a good value to adapt
> the test, this should be configurable.
>>> +
>>>   /* Global statistics */
>>>   static uint32_t rsc_chain_no_mem;
>>> /* Switcher to enable/disable rsc */
>>>   static bool virtio_net_rsc_bypass;
>>>   +static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;
>>> +
>>>   /* Coalesce callback for ipv4/6 */
>>>   typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg
>>> *seg,
>>>const uint8_t *buf, size_t
>>> size);
>>> @@ -1625,6 +1630,35 @@ static int
>>> virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
>>>   return 0;
>>>   }
>>>   +static void virtio_net_rsc_purge(void *opq)
>>> +{
>>> +int ret = 0;
>>> +NetRscChain *chain = (NetRscChain *)opq;
>>> +NetRscSeg *seg, *rn;
>>> +
>>> +QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
>>> +if (!qemu_can_send_packet(seg->nc)) {
>>> +/* Should quit or continue? not sure if one or some
>>> +* of the queues fail would happen, try continue here */
>> This looks wrong, qemu_can_send_packet() is used for nc's peer not nc
>> itself.
> OK.
>>
>>> +continue;
>>> +}
>>> +
>>> +ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
>>> +QTAILQ_REMOVE(&chain->buffers, seg, next);
>>> +g_free(seg->buf);
>>> +g_free(seg);
>>> +
>>> +if (ret == 0) {
>>> +/* Try next queue */
>> Try next seg?
> Yes, it's seg.
>>
>>> +continue;
>>> +}
>> Why need above?
> Yes, it's optional. It might help if extra code were added after
> this; will remove it.
>>
>>> +}
>>> +
>>> +if (!QTAILQ_EMPTY(&chain->buffers)) {
>>> +timer_mod(chain->drain_timer,
>>> +  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
>>> rsc_timeout);
>> Need stop/start the timer during vm stop/start to save cpu.
> Thanks, do you know where should i add the code?

You may want to look at the vmstate change handler in virtio-net.c.



Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 17:18, Jason Wang wrote:



On 02/01/2016 04:21 PM, Hailiang Zhang wrote:


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean, pass a string parameter which stores the filter property
instead of
assemble it in this helper ?


Yes. E.g just a global string which could be changed by any subsystem.
E.g colo may change it to
"filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.



Got it. Then we don't need the global default_netfilter_type[] in
patch 5,
Just use this global string instead ?




- colo code may change the pointer to "filter-buffer,status=disable"




Then, there's no need for lots of codes above:
- no need a "is_default" parameter in netdev_add_filter which
does not
scale consider we may want to have more property in the future
- no need to hacking like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filter, use the name (id) to distinguish it ?


What's the reason that you want to distinguish default filters from
others?



The default filters will be used by COLO or MC, (In COLO, we will
use it
to control packets buffering/releasing).
For COLO, we don't want to control (use) other filters that added by
users.


I think Jason's point is that COLO is a manager, you can add the filter
to netdev when doing COLO, so the only difference between COLO's default


Er, then we came back to the original question, 'is it necessary to
add each netdev
a default filter ?'


The question could be extended to:

1) Do we need a default filter? I think the answer is yes, but of course
COLO can work even without this.


Yes, after colo-proxy is realized, we can switch to colo-proxy
(It should have the capability of buffer and release packets directly).
But for now, we want to merge COLO prototype without colo-proxy, the COLO
prototype should have the basic capability. Just like Remus or
Micro-checkpointing. It is based on the default buffer-filter to control net
packets.


2) Do we want to implement COLO on top of default filter? If yes, as you
suggest, we may record the ids of the default filter and do whatever we


Yes, we need it.


want. If not, COLO needs code to go through each netdev and add the filter
itself (hotplug is not supported). Or you want management to do this,
then even hotplug could be supported.



We also want to support hotplug during VM is in COLO state in the future.
(For this point, I'm not quite sure if this usage case is really exist.)

Thanks,
Hailiang


Any thoughts?


If we add a filter to netdev when doing COLO, it will be added
dynamically,
Here we want to add each netdev a default filter while launch QEMU
(no matter if this VM will go into COLO or not),
just to support hot-add NIC for VM while in COLO lifetime.


Yes.




.







Re: [Qemu-devel] CPU hotplug

2016-02-01 Thread Christian Borntraeger
On 02/01/2016 06:35 AM, David Gibson wrote:
> Hi,
> 
> It seems to me we're getting rather bogged down in how to proceed with
> an improved CPU hotplug (and hot unplug) interface, both generically
> and for ppc in particular.

Yes, s390 also needs this. 
Can you add Matthew in any cpu hotplug discussion?


> 
> So here's a somewhat more concrete suggestion of a way forward, to see
> if we can get some consensus.
> 
> The biggest difficulty I think we're grappling with is that device-add
> is actually *not* a great interface to cpu hotplug.  Or rather, it's
> not great as the _only_ interface: in order to represent the many
> different constraints on how cpus can be plugged on various platforms,
> it's natural to use a hierarchy of cpu core / socket / package types
> specific to the specific platform or real-world cpu package being
> modeled.  However, for the normal case of a regular homogenous (and at
> least slightly para-virtualized) server, that interface is nasty for
> management layers because they have to know the right type to
> instantiate.
> 
> To address this, I'm proposing this two layer interface:
> 
> Layer 1: Low-level, device-add based
> 
> * a new, generic cpu-package QOM type represents a group of 1 or
>   more cpu threads which can be hotplugged as a unit
> * cpu-package is abstract and can't be instantiated directly
> * archs and/or individual platforms have specific subtypes of
>   cpu-package which can be instantiated
> * for platforms attempting to be faithful representations of real
>   hardware these subtypes would match the specific characteristics
>   of the real hardware devices.  In addition to the cpu threads,
>   they may have other on chip devices as sub-objects.
> * for platforms which are paravirtual - or which have existing
>   firmware abstractions for cpu cores/sockets/packages/whatever -
>   these could be more abstract, but would still be tied to that
>   platform's constraints
> * Depending on the platform the cpu-package object could have
>   further internal structure (e.g. a package object representing a
>   socket contains package objects representing each core, which in
>   turn contain cpu objects for each thread)
> * Some crazy platform that has multiple daughterboards each with
>   several multi-chip-modules each with several chips, each
>   with several cores each with several threads could represent
>   that too.
> 
> What would be common to all the cpu-package subtypes is:
> * A boolean "present" attribute ("realized" might already be
>   suitable, but I'm not certain)
> * A generic means of determining the number of cpu threads in the
>   package, and enumerating those
> * A generic means of determining if the package is hotpluggable or
>   not
> * They'd get listed in a standard place in the QOM tree
> 
> This interface is suitable if you want complete control over
> constructing the system, including weird cases like heterogeneous
> machines (either totally different cpu types, or just different
> numbers of threads in different packages).
> 
> The intention is that these objects would never look at the global cpu
> type or sockets/cores/threads numbers.  The next level up would
> instead configure the packages to match those for the common case.
> 
> Layer 2: Higher-level
> 
> * not all machine types need support this model, but I'd expect
>   all future versions of machine types designed for production use
>   to do so
> * machine types don't construct cpu objects directly
> * instead they create enough cpu-package objects - of a subtype
>   suitable for this machine - to provide maxcpus threads
> * the machine type would set the "present" bit on enough of the
>   cpu packages to provide the base number of cpu threads
> 
> Management layers can then manage hotplug without knowing platform
> specifics by using qmp to toggle the "present" bit on packages.
> Platforms that allow thread-level pluggability can expose a package
> for every thread, those that allow core-level expose a package per
> core, those that have even less granularity expose a package at
> whatever grouping they can do hotplug on.
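
To make the two layers above concrete, here is a minimal standalone model
in plain C. It is only a sketch under stated assumptions, not a QOM
implementation: CpuPackage, machine_init_packages(), present_threads() and
package_set_present() are hypothetical names, every package is assumed to
have the same thread count, and packages present at boot are assumed not
to be unpluggable (the proposal does not say this).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the proposed cpu-package object. */
typedef struct CpuPackage {
    unsigned nr_threads;  /* threads that are plugged as one unit */
    bool hotpluggable;    /* may "present" be toggled at run time? */
    bool present;         /* the generic "present" attribute */
} CpuPackage;

/*
 * Layer 2: the machine type creates enough packages to provide maxcpus
 * threads and marks enough of them present to provide boot_cpus threads.
 * Returns the number of packages created, or 0 if the request cannot be
 * expressed in whole packages.
 */
static size_t machine_init_packages(CpuPackage *pkgs, size_t capacity,
                                    unsigned threads_per_pkg,
                                    unsigned boot_cpus, unsigned maxcpus)
{
    if (threads_per_pkg == 0 || boot_cpus > maxcpus ||
        maxcpus % threads_per_pkg != 0 || boot_cpus % threads_per_pkg != 0) {
        return 0;
    }

    size_t n = maxcpus / threads_per_pkg;
    size_t n_present = boot_cpus / threads_per_pkg;

    if (n > capacity) {
        return 0;
    }
    for (size_t i = 0; i < n; i++) {
        pkgs[i].nr_threads = threads_per_pkg;
        pkgs[i].present = i < n_present;
        pkgs[i].hotpluggable = i >= n_present;  /* assumption, see above */
    }
    return n;
}

/* Generic enumeration: count the threads in all present packages. */
static unsigned present_threads(const CpuPackage *pkgs, size_t n)
{
    unsigned total = 0;

    for (size_t i = 0; i < n; i++) {
        if (pkgs[i].present) {
            total += pkgs[i].nr_threads;
        }
    }
    return total;
}

/* What a management layer would request: toggle the "present" bit. */
static bool package_set_present(CpuPackage *pkg, bool present)
{
    if (pkg->present == present) {
        return true;   /* nothing to do */
    }
    if (!pkg->hotpluggable) {
        return false;  /* this package cannot be plugged or unplugged */
    }
    pkg->present = present;
    return true;
}
```

The point of the model is that the caller of package_set_present() never
needs to know whether a package is a thread, a core or a whole chip; that
knowledge stays in the code that creates the packages.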
> 
> Examples:
> 
> For use with pc (or q35 or whatever) machine type, I'd expect a
> cpu-package subtype called, say "acpi-thread" which represents a
> single thread in the ACPI sense.  Toggling those would trigger ACPI
> hotplug events as cpu_add does now.
> 
> For use with pseries, I'd expect a "papr-core" cpu-package subtype,
> which represents a single (paravirtual) core.  Toggling present on
> this would trigger the PAPR hotplug events.  A property would control
> the number of threads in the core (only settable before enabling
> present).
> 
> For use with the powernv machine type (once ready for merge) I'd
> expect "POWER8-package" type which represents a POWER8 chip / module
> as close to the 

Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 17:49, Jason Wang wrote:



On 02/01/2016 05:39 PM, Hailiang Zhang wrote:

On 2016/2/1 17:18, Jason Wang wrote:



On 02/01/2016 04:21 PM, Hailiang Zhang wrote:


Instead of this, I wonder maybe it's better to:

- store the default filter property into a pointer to string


Do you mean, pass a string parameter which stores the filter
property
instead of
assemble it in this helper ?


Yes. E.g just a global string which could be changed by any
subsystem.
E.g colo may change it to
"filter-buffer,interval=0,status=disable". But
filter ids need to be generated automatically.



Got it. Then we don't need the global default_netfilter_type[] in
patch 5,
Just use this global string instead ?




- colo code may change the pointer to
"filter-buffer,status=disable"




Then, there's no need for lots of codes above:
- no need a "is_default" parameter in netdev_add_filter which
does not
scale consider we may want to have more property in the future
- no need to hacking like "qemu_filter_opts"


Yes, we can use qemu_find_opts("object") instead of it.


- no need to have a special flag like "is_default"



But we have to distinguish the default filter from the common
filter, use the name (id) to distinguish it ?


What's the reason that you want to distinguish default filters from
others?



The default filters will be used by COLO or MC, (In COLO, we will
use it
to control packets buffering/releasing).
For COLO, we don't want to control (use) other filters that added by
users.


I think Jason's point is that COLO is a manager, you can add the
filter
to netdev when doing COLO, so the only difference between COLO's
default


Er, then we came back to the original question, 'is it necessary to
add each netdev
a default filter ?'


The question could be extended to:

1) Do we need a default filter? I think the answer is yes, but of course
COLO can work even without this.


Yes, after colo-proxy is realized, we can switch to colo-proxy
(It should have the capability of buffer and release packets directly).
But for now, we want to merge COLO prototype without colo-proxy, the COLO
prototype should have the basic capability.


Right, I see.


Just like Remus or
Micro-checkpointing. It is based on the default buffer-filter to
control net
packets.


2) Do we want to implement COLO on top of default filter? If yes, as you
suggest, we may record the ids of the default filter and do whatever we


Yes, we need it.


Or just as I reply, all buffer filters (with zero interval) could be
tracked by itself. So as you see, several ways could go. It's your call
to choose one of them.



OK, got it.




want. If not, COLO needs code to go through each netdev and add the filter
itself (hotplug is not supported). Or you want management to do this,
then even hotplug could be supported.



We also want to support hotplug during VM is in COLO state in the future.
(For this point, I'm not quite sure if this usage case is really exist.)

Thanks,
Hailiang


Support hotplug should be useful I think. But I'm also ok if you don't
want to consider for it now.



Thanks very much.

Hailiang





Re: [Qemu-devel] [PATCH RFC v2 2/5] vl: Make object_create() public

2016-02-01 Thread Daniel P. Berrange
On Wed, Jan 27, 2016 at 04:29:37PM +0800, zhanghailiang wrote:
> Make the helper object_create() public and fix its first
> parameter to accept NULL value.
> 
> Signed-off-by: zhanghailiang 
> Cc: Paolo Bonzini 
> ---
> v2:
>  - New patch
> ---
>  include/qemu-common.h | 2 ++
>  vl.c  | 4 ++--
>  2 files changed, 4 insertions(+), 2 deletions(-)
> 
> diff --git a/include/qemu-common.h b/include/qemu-common.h
> index 22b010c..52cf4fd 100644
> --- a/include/qemu-common.h
> +++ b/include/qemu-common.h
> @@ -500,4 +500,6 @@ int parse_debug_env(const char *name, int max, int 
> initial);
>  const char *qemu_ether_ntoa(const MACAddr *mac);
>  void page_size_init(void);
>  
> +int object_create(void *opaque, QemuOpts *opts, Error **errp);
> +
>  #endif
> diff --git a/vl.c b/vl.c
> index f043009..b21335e 100644
> --- a/vl.c
> +++ b/vl.c
> @@ -2819,7 +2819,7 @@ static bool object_create_delayed(const char *type)
>  }
>  
>  
> -static int object_create(void *opaque, QemuOpts *opts, Error **errp)
> +int object_create(void *opaque, QemuOpts *opts, Error **errp)
>  {
>  Error *err = NULL;
>  char *type = NULL;
> @@ -2842,7 +2842,7 @@ static int object_create(void *opaque, QemuOpts *opts, 
> Error **errp)
>  if (err) {
>  goto out;
>  }
> -if (!type_predicate(type)) {
> +if (type_predicate && !type_predicate(type)) {
>  goto out;
>  }

No, please don't do this - your later patch should *not* be using
object_create, it should use object_new_with_props.

Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Daniel P. Berrange
On Wed, Jan 27, 2016 at 04:29:38PM +0800, zhanghailiang wrote:
> We add a new helper function netdev_add_filter(), this function
> can help adding a filter object to a netdev.
> Besides, we add a is_default member for struct NetFilterState
> to indicate whether the filter is default or not.
> 
> Signed-off-by: zhanghailiang 
> ---
> v2:
>  -Re-implement netdev_add_filter() by re-using object_create()
>   (Jason's suggestion)
> ---
>  include/net/filter.h |  7 +
>  net/filter.c | 80 
> 
>  2 files changed, 87 insertions(+)

> +void netdev_add_filter(const char *netdev_id,
> +   const char *filter_type,
> +   const char *id,
> +   bool is_default,
> +   Error **errp)
> +{
> +NetClientState *nc = qemu_find_netdev(netdev_id);
> +char *optarg;
> +QemuOpts *opts = NULL;
> +Error *err = NULL;
> +
> +/* FIXME: Not support multiple queues */
> +if (!nc || nc->queue_index > 1) {
> +return;
> +}
> +/* Not support vhost-net */
> +if (get_vhost_net(nc)) {
> +return;
> +}
> +
> +optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
> +filter_type, id, netdev_id, is_default ? "disable" : "enable");
> +opts = qemu_opts_parse_noisily(&qemu_filter_opts,
> +   optarg, false);

Formatting a string and then immediately parsing it again is totally
crazy, not least because you're not taking care to do escaping of
special characters like ',' in the string parameters.
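
The escaping problem can be shown without any QEMU code. The sketch below
is only an illustration: count_fields() and fields_for_id() are
hypothetical helpers, and the splitter is a deliberately naive one that
treats every ',' as a separator. It formats the same four properties as
the quoted helper and counts how many fields come back out.

```c
#include <assert.h>
#include <stdio.h>

/* Count the fields a naive "key=value,key=value" splitter would see. */
static int count_fields(const char *optarg)
{
    int n = 1;

    for (const char *p = optarg; *p; p++) {
        if (*p == ',') {
            n++;
        }
    }
    return n;
}

/*
 * Format the four properties used by the quoted helper, with a
 * caller-supplied id, and report how many fields result. The type, netdev
 * and status values are fixed placeholders chosen for this example.
 */
static int fields_for_id(const char *id)
{
    char buf[256];

    snprintf(buf, sizeof(buf), "qom-type=%s,id=%s,netdev=%s,status=%s",
             "filter-buffer", id, "net0", "disable");
    return count_fields(buf);
}
```

With the id "f0" the string has the expected four fields. With the id
"f0,status=enable" it has five, so the id has silently turned into an
extra property. Passing each property as a separate argument avoids the
round trip through a string entirely.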

> +if (!opts) {
> +error_report("Failed to parse param '%s'", optarg);
> +exit(1);
> +}
> +g_free(optarg);
> +if (object_create(NULL, opts, &err) < 0) {
> +error_report("Failed to create object");
> +goto out_clean;
> +}

Don't use object_create() - use object_new_with_props() which avoids
the need to format + parse the string above. ie do

  object_new_with_props(filter_type,
                        object_get_objects_root(),
                        id,
                        &err,
                        "netdev", netdev_id,
                        "status", is_default ? "disable" : "enable",
                        NULL);


Regards,
Daniel
-- 
|: http://berrange.com  -o-http://www.flickr.com/photos/dberrange/ :|
|: http://libvirt.org  -o- http://virt-manager.org :|
|: http://autobuild.org   -o- http://search.cpan.org/~danberr/ :|
|: http://entangle-photo.org   -o-   http://live.gnome.org/gtk-vnc :|



Re: [Qemu-devel] [RFC Patch v2 04/10] virtio-net rsc: Detailed IPv4 and General TCP data coalescing

2016-02-01 Thread Jason Wang


On 02/01/2016 04:29 PM, Wei Xu wrote:
>
> On 02/01/2016 02:21 PM, Jason Wang wrote:
>
>>
>> On 02/01/2016 02:13 AM, w...@redhat.com wrote:
>>> From: Wei Xu 
>>>
>>> Since this feature also needs to support IPv6, and there are
>>> some protocol specific differences for IPv4/6 in the header,
>>> so try to make the interface to be general.
>>>
>>> IPv4/6 should set up both the new and old IP/TCP header before invoking
>>> TCP coalescing, and should also tell the real payload.
>>>
>>> The main handler of TCP includes TCP window update, duplicated ACK
>>> check
>>> and the real data coalescing if the new segment passed invalid filter
>>> and is identified as an expected one.
>>>
>>> An expected segment means:
>>> 1. Segment is within current window and the sequence is the expected
>>> one.
>>> 2. ACK of the segment is in the valid window.
>>> 3. If the ACK in the segment is a duplicated one, then it must less
>>> than 2,
>>> this is to notify upper layer TCP starting retransmission due to
>>> the spec.
>>>
>>> Signed-off-by: Wei Xu 
>>> ---
>>>   hw/net/virtio-net.c | 127
>>> ++--
>>>   1 file changed, 124 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
>>> index cfbac6d..4f77fbe 100644
>>> --- a/hw/net/virtio-net.c
>>> +++ b/hw/net/virtio-net.c
>>> @@ -41,6 +41,10 @@
>>> #define VIRTIO_HEADER   12/* Virtio net header size */
>>>   #define IP_OFFSET (VIRTIO_HEADER + sizeof(struct eth_header))
>>> +#define TCP_WINDOW  65535
>> The name is confusing, how about TCP_MAX_WINDOW_SIZE ?
>
> Sounds better, will take it in.
>
>>
>>> +
>>> +/* IPv4 max payload, 16 bits in the header */
>>> +#define MAX_IP4_PAYLOAD  (65535 - sizeof(struct ip_header))
>>> #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
>>>   @@ -1670,13 +1674,130 @@ out:
>>>   return 0;
>>>   }
>>>   +static int32_t virtio_net_rsc_handle_ack(NetRscChain *chain,
>>> NetRscSeg *seg,
>>> + const uint8_t *buf, struct
>>> tcp_header *n_tcp,
>>> + struct tcp_header *o_tcp)
>>> +{
>>> +uint32_t nack, oack;
>>> +uint16_t nwin, owin;
>>> +
>>> +nack = htonl(n_tcp->th_ack);
>>> +nwin = htons(n_tcp->th_win);
>>> +oack = htonl(o_tcp->th_ack);
>>> +owin = htons(o_tcp->th_win);
>>> +
>>> +if ((nack - oack) >= TCP_WINDOW) {
>>> +return RSC_FINAL;
>>> +} else if (nack == oack) {
>>> +/* duplicated ack or window probe */
>>> +if (nwin == owin) {
>>> +/* duplicated ack, add dup ack count due to whql test
>>> up to 1 */
>>> +
>>> +if (seg->dup_ack_count == 0) {
>>> +seg->dup_ack_count++;
>>> +return RSC_COALESCE;
>>> +} else {
>>> +/* Spec says should send it directly */
>>> +return RSC_FINAL;
>>> +}
>>> +} else {
>>> +/* Coalesce window update */
>> Need we flush this immediately consider it was a window update?
>
> The flowchart in the spec says this can be coalesced as normal.
>
> https://msdn.microsoft.com/en-us/library/windows/hardware/jj853325%28v=vs.85%29.aspx

I see.

>
>
>>
>>> +o_tcp->th_win = n_tcp->th_win;
>>> +return RSC_COALESCE;
>>> +}
>>> +} else {
>> What if nack < oack here?
>
> That shouldn't happen; the modulo-2^32 arithmetic check at the beginning of
> this function keeps the ack within the current window.

Ok.

Thanks
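
For reference, the modulo-2^32 check discussed above can be tried on its
own. This is a minimal sketch, not the patch's code: classify_ack() and
the ack_class enum are hypothetical, the values are assumed to be in host
byte order already, and TCP_MAX_WINDOW_SIZE is the name suggested earlier
in this review.

```c
#include <assert.h>
#include <stdint.h>

#define TCP_MAX_WINDOW_SIZE 65535u

/* Hypothetical result type for this example only. */
enum ack_class {
    ACK_OUT_OF_WINDOW,  /* the patch returns RSC_FINAL in this case */
    ACK_DUPLICATE,      /* same ack number as the cached segment */
    ACK_ADVANCED        /* new ack is ahead, within the window */
};

/* Classify a new ack number against the ack of the cached segment. */
static enum ack_class classify_ack(uint32_t nack, uint32_t oack)
{
    /*
     * Unsigned subtraction wraps modulo 2^32. If nack is behind oack the
     * difference becomes a very large number, so the same comparison
     * rejects both "too far ahead" and "behind".
     */
    if ((uint32_t)(nack - oack) >= TCP_MAX_WINDOW_SIZE) {
        return ACK_OUT_OF_WINDOW;
    }
    if (nack == oack) {
        return ACK_DUPLICATE;
    }
    return ACK_ADVANCED;
}
```

So by the time the final else branch is reached, nack is known to be
ahead of oack by less than the window size, including the case where the
ack number has wrapped past 2^32.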



Re: [Qemu-devel] [PATCH v7 01/13] machine: Don't allow CPU toplogies with partially filled cores

2016-02-01 Thread Igor Mammedov
On Fri, 29 Jan 2016 15:24:12 -0200
Eduardo Habkost  wrote:

> On Fri, Jan 29, 2016 at 05:52:15PM +0100, Igor Mammedov wrote:
> > On Fri, 29 Jan 2016 13:36:05 -0200
> > Eduardo Habkost  wrote:
> >   
> > > On Fri, Jan 29, 2016 at 04:10:47PM +0100, Igor Mammedov wrote:  
> > > > On Fri, 29 Jan 2016 12:24:18 -0200
> > > > Eduardo Habkost  wrote:
> > > > 
> > > > > On Fri, Jan 29, 2016 at 02:52:30PM +1100, David Gibson wrote:
> > > > > > On Thu, Jan 28, 2016 at 11:19:43AM +0530, Bharata B Rao wrote:  
> > > > > > > Prevent guests from booting with CPU topologies that have 
> > > > > > > partially
> > > > > > > filled CPU cores or can result in partially filled CPU cores after
> > > > > > > CPU hotplug like
> > > > > > > 
> > > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=16 or
> > > > > > > -smp 15,sockets=1,cores=4,threads=4,maxcpus=17.
> > > > > > > 
> > > > > > > This is enforced by introducing 
> > > > > > > MachineClass::validate_smp_config()
> > > > > > > that gets called from generic SMP parsing code. Machine type 
> > > > > > > versions
> > > > > > > that want to enforce this can define this to the generic version
> > > > > > > provided.
> > > > > > > 
> > > > > > > Only sPAPR and PC machine types starting from version 2.6 enforce 
> > > > > > > this in
> > > > > > > this patch.
> > > > > > > 
> > > > > > > Signed-off-by: Bharata B Rao   
> > > > > > 
> > > > > > I've been kind of lost in the back and forth about
> > > > > > threads/cores/sockets.
> > > > > > 
> > > > > > What, in the end, is the rationale for allowing partially filled
> > > > > > sockets, but not partially filled cores?  
> > > > > 
> > > > > I don't think there's a good reason for that (at least for PC).
> > > > > 
> > > > > It's easier to relax the requirements later if necessary, than
> > > > > dealing with compatibility issues again when making the code more
> > > > > strict. So I suggest we make validate_smp_config_generic() also
> > > > > check if smp_cpus % (smp_threads * smp_cores) == 0.
> > > > 
> > > > > that would break existing setups.
> > > 
> > > Not if we do that only on newer machine classes.
> > > validate_smp_config_generic() will be used only on *-2.6 and
> > > newer.
> > > 
> > >   
> > > > 
> > > > Also in case of cpu hotplug this patch will break migration
> > > > as target QEMU might refuse starting with hotplugged CPU thread.
> > > 
> > > This won't change older machine-types.
> > > 
> > > But I think you are right: it can break migration on pc-2.6, too.
> > > But: isn't migration already broken when creating other sets of
> > > > CPUs that can't be represented using -smp?
> > > 
> > > How exactly would you migrate a machine today, if you run:
> > > 
> > >   $ qemu-system-x86_64 -smp 16,sockets=2,cores=2,threads=2,maxcpus=32
> > >   (QMP) cpu-add id=31  
> > that's invalid topology and should exit with error at start-up,  
> 
> Oops, my mistake. Now the same question with the right numbers:
> 
> How exactly would you migrate a machine today, if you do the
> following?
> 
>   $ qemu-system-x86_64 -smp 8,sockets=2,cores=2,threads=2,maxcpus=16
>   (QMP) cpu-add id=15
isn't it the same broken topology?
  sockets*cores*threads != maxcpus
But if you are asking whether it's possible to migrate a machine with
non-sequentially hotplugged CPUs, then the answer is no, it's not possible
with cpu-add.
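
For reference, the two checks discussed in this thread can be written as one small predicate. This is only a sketch under stated assumptions: the name smp_topology_ok() and its signature are hypothetical and are not part of vl.c or of the patch under review. It combines the "sockets * cores * threads must equal maxcpus" rule with the "no partially filled cores at boot" rule.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (not QEMU code). Returns true when:
 *  - sockets * cores * threads equals maxcpus, and
 *  - the boot-time CPU count does not exceed maxcpus, and
 *  - the boot-time CPU count fills whole cores (cpus % threads == 0).
 */
bool smp_topology_ok(unsigned cpus, unsigned sockets, unsigned cores,
                     unsigned threads, unsigned maxcpus)
{
    if (sockets == 0 || cores == 0 || threads == 0) {
        return false;
    }
    if (sockets * cores * threads != maxcpus) {
        return false;
    }
    if (cpus > maxcpus) {
        return false;
    }
    return cpus % threads == 0;
}
```

With this predicate, "-smp 15,sockets=1,cores=4,threads=4,maxcpus=16" is rejected because 15 is not a multiple of 4, and "-smp 8,sockets=2,cores=2,threads=2,maxcpus=16" is rejected because 2 * 2 * 2 is not 16. The stricter variant suggested earlier in the thread would test cpus % (threads * cores) instead.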

> > however it shouldn't be smp_cpus vs sockets,cores,threads check
> > but rather max_cpus vs sockets,cores,threads,maxcpus check.
> > something like this:
> > 
> > diff --git a/vl.c b/vl.c
> > index f043009..3afa0b6 100644
> > --- a/vl.c
> > +++ b/vl.c
> > @@ -1239,9 +1239,9 @@ static void smp_parse(QemuOpts *opts)
> >  }
> >  
> >  max_cpus = qemu_opt_get_number(opts, "maxcpus", cpus);
> > -if (sockets * cores * threads > max_cpus) {
> > -error_report("cpu topology: "
> > - "sockets (%u) * cores (%u) * threads (%u) > "
> > +if (sockets * cores * threads != max_cpus) {
> > +error_report("invalid cpu topology: "
> > + "sockets (%u) * cores (%u) * threads (%u) not equal "
> >   "maxcpus (%u)",
> >   sockets, cores, threads, max_cpus);
> >  exit(1);
> >   
> > > 
> > >   
> > > > 
> > > > Perhaps this check should be enforced per target/machine if
> > > > arch requires it.
> > > 
> > > It is. Please see the patch. It introduces a validate_smp_config
> > > method.
> > > 
> > > But we need your input to clarify if
> > > validate_smp_config_generic() is safe for pc-2.6 too.  
> > it breaks migration as it could prevent target from starting if
> > there is hotplugged CPUs on source side.  
> 
> It looks like this is a problem only if the machine allows
> hotplug of individual threads. What if we just add this to the
> beginning of 

Re: [Qemu-devel] [PATCH V2] net/traffic-mirror:Add traffic-mirror

2016-02-01 Thread Li Zhijian



On 02/01/2016 05:11 PM, Dr. David Alan Gilbert wrote:

* Li Zhijian (lizhij...@cn.fujitsu.com) wrote:



On 02/01/2016 10:57 AM, Jason Wang wrote:



On 01/29/2016 09:38 AM, Li Zhijian wrote:



On 01/28/2016 01:44 PM, Jason Wang wrote:



On 01/27/2016 10:40 AM, Zhang Chen wrote:

From: ZhangChen 

Traffic-mirror is a netfilter plugin.
It gives qemu the ability to copy and mirror guest's
net packet. we output packet to chardev.

usage:

-netdev tap,id=hn0
-chardev socket,id=mirror0,host=ip_primary,port=X,server,nowait
-traffic-mirror,id=m0,netdev=hn0,queue=tx/rx/all,outdev=mirror0

Signed-off-by: ZhangChen 
Signed-off-by: Wen Congyang 
Reviewed-by: Yang Hongyang 


Thanks for the patch. Several questions:

- I'm curious about how the patch was tested? Simple setup e.g:

-netdev tap,id=hn0 -device virtio-net-pci,netdev=hn0 -chardev
socket,id=c0,host=localhost,port=,server,nowait -object
traffic-mirror,netdev=hn0,outdev=c0,id=f0 -netdev
socket,id=s0,connect=127.0.0.1: -device e1000,netdev=s0



A strange thing about "host=localhost": the connection is refused on SUSE 11.3
but succeeds on Ubuntu 15.10 when I launch qemu with the command line above.
I tried launching qemu on three physical machines installed with SUSE 11.3,
and the connection failed on all of them. But when I specified
"host=127.0.0.1", the connection was OK.

I have confirmed that:
- "localhost" resolves to 127.0.0.1 when I "ping localhost" on SUSE
- "telnet localhost " works on SUSE


My guess is that it's IPv6 related; check /etc/hosts to see if there's
a ::1 entry for localhost; I've seen some weird behaviour on RHEL in the
same way but in other uses.


Thank you Dave,
As you said, there are 2 entries (IPv4 and IPv6) for "localhost" in my
/etc/hosts. After removing the IPv6 entry, the whole world became fine ^_^

Thanks
Li Zhijian
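
One way to see this kind of mismatch is to list which address families a host string resolves to. The sketch below is an illustration only and is not taken from QEMU: count_families() is a hypothetical helper built on the standard getaddrinfo() call. A name that yields both an IPv4 and an IPv6 result can end up connecting over one family while the listener is bound to the other.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <netdb.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>

/*
 * Hypothetical helper: counts the IPv4 and IPv6 stream results that
 * getaddrinfo() returns for host. Pass AI_NUMERICHOST in flags to parse a
 * literal address without any lookup, or 0 to resolve a name such as
 * "localhost". Returns 0 on success or the getaddrinfo() error code.
 */
int count_families(const char *host, int flags, int *v4, int *v6)
{
    struct addrinfo hints, *res, *ai;
    int rc;

    memset(&hints, 0, sizeof(hints));
    hints.ai_family = AF_UNSPEC;      /* accept both IPv4 and IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* one result per address */
    hints.ai_flags = flags;

    *v4 = 0;
    *v6 = 0;
    rc = getaddrinfo(host, NULL, &hints, &res);
    if (rc != 0) {
        return rc;
    }
    for (ai = res; ai; ai = ai->ai_next) {
        if (ai->ai_family == AF_INET) {
            (*v4)++;
        } else if (ai->ai_family == AF_INET6) {
            (*v6)++;
        }
    }
    freeaddrinfo(res);
    return 0;
}
```

Calling count_families("localhost", 0, &v4, &v6) on an affected host would be expected to report both families; the exact result depends on the local resolver configuration.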



Dave




does not work for me.

Hi, Jason

I just tested the mirror using the command line above, and it doesn't work
for me either. I am looking into it, and it seems to be caused by the
-net socket problem that I once posted a patch for (see below):
[Qemu-devel] [PATCH] report a error message if -net socket can not
connect to server
https://lists.gnu.org/archive/html/qemu-devel/2015-12/msg00758.html


Will have a look at this.



after applying this patch, the qemu monitor tells me the following message:
(qemu) qemu-system-x86_64: net socket is not connected Connection refused


Maybe two issues. Have you tried to start the mirror on one VM and then
use the socket backend to connect to it from another VM?


Yes, if I connect to the mirror on VM1 using the socket backend from another
VM2, the connection is established successfully. But in the VM2 guest, I
can't dump any packets using 'tcpdump'. That's because in the current version
of the code, mirror is not compatible with the socket backend; we will fix
it in the next version.


Best regards.
Li Zhijian






Thanks
Li Zhijian







--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK


--
Best regards.
Li Zhijian (8555)





Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Jason Wang


On 02/01/2016 05:22 PM, Hailiang Zhang wrote:
> On 2016/2/1 17:04, Jason Wang wrote:
>>
>>
>> On 02/01/2016 03:56 PM, Hailiang Zhang wrote:
>>> On 2016/2/1 15:46, Jason Wang wrote:


 On 02/01/2016 02:13 PM, Hailiang Zhang wrote:
> On 2016/2/1 11:14, Jason Wang wrote:
>>
>>
>> On 01/27/2016 04:29 PM, zhanghailiang wrote:
>>> We add a new helper function netdev_add_filter(), this function
>>> can help adding a filter object to a netdev.
>>> Besides, we add a is_default member for struct NetFilterState
>>> to indicate whether the filter is default or not.
>>>
>>> Signed-off-by: zhanghailiang 
>>> ---
>>> v2:
>>> -Re-implement netdev_add_filter() by re-using object_create()
>>>  (Jason's suggestion)
>>> ---
>>> include/net/filter.h |  7 +
>>> net/filter.c | 80
>>> 
>>> 2 files changed, 87 insertions(+)
>>>
>>>
>>
>> [...]
>>
>>> +
>>> +optarg =
>>> g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
>>> +filter_type, id, netdev_id, is_default ? "disable" :
>>> "enable"
>>
>> Instead of this, I wonder maybe it's better to:
>>
>> - store the default filter property into a pointer to string
>
> Do you mean, pass a string parameter which stores the filter property
> instead of
> assemble it in this helper ?

 Yes. E.g just a global string which could be changed by any subsystem.
 E.g colo may change it to
 "filter-buffer,interval=0,status=disable". But
 filter ids need to be generated automatically.

>>>
>>> Got it. Then we don't need the global default_netfilter_type[] in
>>> patch 5,
>>
>> Yes.
>>
>>> Just use this global string instead ?
>>
>> Right.
>>
>>>
>
>> - colo code may change the pointer to "filter-buffer,status=disable"
>>
>
>> Then, there's no need for lots of codes above:
>> - no need a "is_default" parameter in netdev_add_filter which
>> does not
>> scale consider we may want to have more property in the future
>> - no need to hacking like "qemu_filter_opts"
>
> Yes, we can use qemu_find_opts("object") instead of it.
>
>> - no need to have a special flag like "is_default"
>>
>
> But we have to distinguish the default filter from the common
> filter, use the name (id) to distinguish it ?

 What's the reason that you want to distinguish default filters from
 others?

>>>
>>> The default filters will be used by COLO or MC, (In COLO, we will
>>> use it
>>> to control packets buffering/releasing).
>>
>> A question is how will you do this?
>>
>
> Er, for COLO, we will enable all the default filter in the
> initialization stage,
> then the buffer-filter will buffer all netdev's packets,
> after doing a checkpoint, we will release all the buffered packets
> (Flush all default
> filters' packets).

Right, that's the point. So it looks like there are several choices here:

1) Track each default filter explicitly, generate and record the netdev
ids for default filter by COLO.  Walk through the ids list and release
the packet in each filter.
2) Track the default filters implicitly. Link all buffer filters (with
zero interval) in a list during filter buffer initialization. And export
a helper for COLO to walk them and release packets.

Either looks fine, but maybe 2 is easier.
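
Option 2 could look roughly like the sketch below. Everything in it is hypothetical and self-contained: BufferFilter, buffer_filter_register() and buffer_filters_release_all() are illustrative names rather than existing QEMU symbols, and the packet queue is reduced to a counter.

```c
#include <assert.h>
#include <stddef.h>

typedef struct BufferFilter {
    unsigned interval;          /* 0 marks a "default" filter driven by COLO */
    unsigned queued;            /* stand-in for the buffered packet queue */
    struct BufferFilter *next;  /* link in the implicit list */
} BufferFilter;

/* Head of the implicit list of zero-interval buffer filters. */
static BufferFilter *default_filters;

/* Called from filter initialization: only zero-interval filters are linked. */
void buffer_filter_register(BufferFilter *f)
{
    if (f->interval == 0) {
        f->next = default_filters;
        default_filters = f;
    }
}

/* Helper for COLO: flush every linked filter, return the packets released. */
unsigned buffer_filters_release_all(void)
{
    unsigned released = 0;

    for (BufferFilter *f = default_filters; f; f = f->next) {
        released += f->queued;
        f->queued = 0;
    }
    return released;
}
```

The caller never needs to know filter ids: after a checkpoint it calls the release helper once, and filters with a non-zero interval are left alone.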

>  If VM is failover, we will set all default filters back to disabled
> status.
> (This is a periodic mode for COLO, different from another mode, which
> we will call it
> hybrid mode, that is based on colo-proxy, which is in developing by
> zhangchen)
>
> Thanks,
> Hailiang

Yes, I see.




Re: [Qemu-devel] [PATCH v10 00/25] qapi visitor cleanups part 1 (post-introspection cleanups subset E)

2016-02-01 Thread Markus Armbruster
Eric Blake  writes:

> Based on qemu.git master. No pending prerequisites
>
> Also available as a tag at this location:
> git fetch git://repo.or.cz/qemu/ericb.git qapi-cleanupv10e
>
> and will soon be part of my branch with the rest of the v5 series, at:
> http://repo.or.cz/qemu/ericb.git/shortlog/refs/heads/qapi
>
> v10 notes:
> This is patches 1-20 of v9; 21-37 of that series will come later,
> but this half was relatively clean and should be ready to merge.
> Plus, this half includes the argument reordering, which touches
> a lot of the tree, so getting it in sooner rather than later will
> minimize rebase churn.  A couple patches were split differently
> or retitled.  Reviewed-by was kept on patches that didn't change
> in content (even if the content was split across different patch
> boundaries).
>
> Most of the work here was addressing Markus' review comments,
> or rebasing later patches on top of earlier changes.

I found one harmless rebase mistake, and got a few suggestions on commit
messages, nothing that can't be touched up on commit.

With the harmless rebase mistake cleaned up, series
Reviewed-by: Markus Armbruster 

[...]



Re: [Qemu-devel] [RFC Patch v2 05/10] virtio-net rsc: Create timer to drain the packets from the cache pool

2016-02-01 Thread Wei Xu



On 02/01/2016 05:31 PM, Jason Wang wrote:


On 02/01/2016 04:39 PM, Wei Xu wrote:

On 02/01/2016 02:28 PM, Jason Wang wrote:

On 02/01/2016 02:13 AM, w...@redhat.com wrote:

From: Wei Xu 

The timer will only be triggered if the packets pool is not empty,
and it'll drain off all the cached packets, this is to reduce the
delay to upper layer protocol stack.

Signed-off-by: Wei Xu 
---
   hw/net/virtio-net.c | 38 ++
   1 file changed, 38 insertions(+)

diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 4f77fbe..93df0d5 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -48,12 +48,17 @@
 #define MAX_VIRTIO_IP_PAYLOAD  (65535 + IP_OFFSET)
   +/* Purge coalesced packets timer interval */
+#define RSC_TIMER_INTERVAL  50

Any hints for choosing this as default value? Do we need a property for
user to change this?

This is still under evaluation; 300ms-500ms is a good value for the test
setup, and this should be configurable.

+
   /* Global statistics */
   static uint32_t rsc_chain_no_mem;
 /* Switcher to enable/disable rsc */
   static bool virtio_net_rsc_bypass;
   +static uint32_t rsc_timeout = RSC_TIMER_INTERVAL;
+
   /* Coalesce callback for ipv4/6 */
   typedef int32_t (VirtioNetCoalesce) (NetRscChain *chain, NetRscSeg
*seg,
const uint8_t *buf, size_t
size);
@@ -1625,6 +1630,35 @@ static int
virtio_net_load_device(VirtIODevice *vdev, QEMUFile *f,
   return 0;
   }
   +static void virtio_net_rsc_purge(void *opq)
+{
+int ret = 0;
+NetRscChain *chain = (NetRscChain *)opq;
+NetRscSeg *seg, *rn;
+
+QTAILQ_FOREACH_SAFE(seg, &chain->buffers, next, rn) {
+if (!qemu_can_send_packet(seg->nc)) {
+/* Should quit or continue? not sure if one or some
+* of the queues fail would happen, try continue here */

This looks wrong, qemu_can_send_packet() is used for nc's peer not nc
itself.

OK.

+continue;
+}
+
+ret = virtio_net_do_receive(seg->nc, seg->buf, seg->size);
+QTAILQ_REMOVE(&chain->buffers, seg, next);
+g_free(seg->buf);
+g_free(seg);
+
+if (ret == 0) {
+/* Try next queue */

Try next seg?

Yes, it's seg.

+continue;
+}

Why need above?

yes, it's optional; it may help if there is extra code after this. Will
remove it.

+}
+
+if (!QTAILQ_EMPTY(&chain->buffers)) {
+timer_mod(chain->drain_timer,
+  qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) +
rsc_timeout);

Need stop/start the timer during vm stop/start to save cpu.

Thanks, do you know where should i add the code?

You may want to look at the vmstate change handler in virtio-net.c.

great, thanks a lot.
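
The shape of the purge loop discussed above (remove entries while walking the list, skip the ones that cannot be delivered yet, and re-arm the timer only if something is left) can be sketched in plain C. This is an assumption-laden illustration: Seg, seg_push() and seg_purge() are hypothetical names, and a singly linked list stands in for the QTAILQ used in the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

typedef struct Seg {
    int id;
    bool deliverable;   /* stand-in for "the receiver can take it now" */
    struct Seg *next;
} Seg;

/* Push a new segment on the front of the list and return the new head. */
Seg *seg_push(Seg *head, int id, bool deliverable)
{
    Seg *s = malloc(sizeof(*s));

    if (!s) {
        abort();
    }
    s->id = id;
    s->deliverable = deliverable;
    s->next = head;
    return s;
}

/*
 * Deliver and free every deliverable segment. Each node is unlinked through
 * a pointer to the previous link before it is freed, so the walk never
 * touches freed memory. Returns the number of segments left behind, which
 * the caller can use to decide whether the timer must be re-armed.
 */
size_t seg_purge(Seg **head)
{
    size_t left = 0;
    Seg **link = head;

    while (*link) {
        Seg *s = *link;

        if (!s->deliverable) {
            left++;
            link = &s->next;    /* keep it and move on to the next segment */
            continue;
        }
        *link = s->next;        /* unlink before freeing */
        free(s);
    }
    return left;
}
```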







[Qemu-devel] [PATCH v6 8/8] hw/arm/sysbus-fdt: remove qemu_fdt_setprop returned value check

2016-02-01 Thread Eric Auger
qemu_fdt_setprop asserts in case of error hence no need to check
the returned value.

Signed-off-by: Eric Auger 

---

v3 -> v4: fix returned value
---
 hw/arm/sysbus-fdt.c | 19 +--
 1 file changed, 5 insertions(+), 14 deletions(-)

diff --git a/hw/arm/sysbus-fdt.c b/hw/arm/sysbus-fdt.c
index 9920388..04afeae 100644
--- a/hw/arm/sysbus-fdt.c
+++ b/hw/arm/sysbus-fdt.c
@@ -217,7 +217,7 @@ static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice 
*sbdev, void *opaque)
 PlatformBusDevice *pbus = data->pbus;
 void *fdt = data->fdt;
 const char *parent_node = data->pbus_node_name;
-int compat_str_len, i, ret = -1;
+int compat_str_len, i;
 char *nodename;
 uint32_t *irq_attr, *reg_attr;
 uint64_t mmio_base, irq_number;
@@ -242,12 +242,8 @@ static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice 
*sbdev, void *opaque)
 reg_attr[2 * i + 1] = cpu_to_be32(
 memory_region_size(&vbasedev->regions[i]->mem));
 }
-ret = qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
-   vbasedev->num_regions * 2 * sizeof(uint32_t));
-if (ret) {
-error_report("could not set reg property of node %s", nodename);
-goto fail_reg;
-}
+qemu_fdt_setprop(fdt, nodename, "reg", reg_attr,
+ vbasedev->num_regions * 2 * sizeof(uint32_t));
 
 irq_attr = g_new(uint32_t, vbasedev->num_irqs * 3);
 for (i = 0; i < vbasedev->num_irqs; i++) {
@@ -257,17 +253,12 @@ static int add_calxeda_midway_xgmac_fdt_node(SysBusDevice 
*sbdev, void *opaque)
 irq_attr[3 * i + 1] = cpu_to_be32(irq_number);
 irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_LEVEL_HI);
 }
-ret = qemu_fdt_setprop(fdt, nodename, "interrupts",
+qemu_fdt_setprop(fdt, nodename, "interrupts",
  irq_attr, vbasedev->num_irqs * 3 * sizeof(uint32_t));
-if (ret) {
-error_report("could not set interrupts property of node %s",
- nodename);
-}
 g_free(irq_attr);
-fail_reg:
 g_free(reg_attr);
 g_free(nodename);
-return ret;
+return 0;
 }
 
 /* AMD xgbe properties whose values are copied/pasted from host */
-- 
1.9.1




[Qemu-devel] [PATCH v6 3/8] device_tree: introduce qemu_fdt_node_path

2016-02-01 Thread Eric Auger
This new helper routine returns a NULL terminated array of
node paths matching a node name and a compat string.

Signed-off-by: Eric Auger 

---
v5 -> v6:
- in case of error, free the resources and return NULL
- update the doc comment

v4 -> v5:
- support the case where several nodes exist, ie.
  return an array of node paths. Also add Error **
  parameter

v1 -> v2:
- move doc comment in header file
- do not use a fixed size buffer
- break on errors in while loop
- use strcmp instead of strncmp

RFC -> v1:
- improve error handling according to Alex' comments
---
 device_tree.c| 51 
 include/sysemu/device_tree.h | 18 
 2 files changed, 69 insertions(+)

diff --git a/device_tree.c b/device_tree.c
index 3797182..a89b838 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -230,6 +230,57 @@ static int findnode_nofail(void *fdt, const char 
*node_path)
 return offset;
 }
 
+char **qemu_fdt_node_path(void *fdt, const char *name, char *compat,
+  Error **errp)
+{
+int offset, len, ret;
+const char *iter_name;
+unsigned int path_len = 16, n = 0;
+GSList *path_list = NULL, *iter;
+char **path_array;
+
+offset = fdt_node_offset_by_compatible(fdt, -1, compat);
+
+while (offset >= 0) {
+iter_name = fdt_get_name(fdt, offset, &len);
+if (!iter_name) {
+offset = len;
+break;
+}
+if (!strcmp(iter_name, name)) {
+char *path;
+
+path = g_malloc(path_len);
+while ((ret = fdt_get_path(fdt, offset, path, path_len))
+  == -FDT_ERR_NOSPACE) {
+path_len += 16;
+path = g_realloc(path, path_len);
+}
+path_list = g_slist_prepend(path_list, path);
+n++;
+}
+offset = fdt_node_offset_by_compatible(fdt, offset, compat);
+}
+
+if (offset < 0 && offset != -FDT_ERR_NOTFOUND) {
+error_setg(errp, "%s: abort parsing dt for %s/%s: %s",
+   __func__, name, compat, fdt_strerror(offset));
+g_slist_free_full(path_list, g_free);
+return NULL;
+}
+
+path_array = g_new(char *, n + 1);
+path_array[n--] = NULL;
+
+for (iter = path_list; iter; iter = iter->next) {
+path_array[n--] = iter->data;
+}
+
+g_slist_free(path_list);
+
+return path_array;
+}
+
 int qemu_fdt_setprop(void *fdt, const char *node_path,
  const char *property, const void *val, int size)
 {
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 62093ba..552df21 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -25,6 +25,24 @@ void *load_device_tree(const char *filename_path, int 
*sizep);
 void *load_device_tree_from_sysfs(void);
 #endif
 
+/**
+ * qemu_fdt_node_path: return the paths of nodes matching a given
+ * name and compat string
+ * @fdt: pointer to the dt blob
+ * @name: node name
+ * @compat: compatibility string
+ * @errp: handle to an error object
+ *
+ * returns a newly allocated NULL-terminated array of node paths.
+ * Use g_strfreev() to free it. If one or more nodes were found, the
+ * array contains the path of each node and the last element equals to
+ * NULL. If there is no error but no matching node was found, the
+ * returned array contains a single element equal to NULL. If an error
+ * was encountered when parsing the blob, the function returns NULL
+ */
+char **qemu_fdt_node_path(void *fdt, const char *name, char *compat,
+  Error **errp);
+
 int qemu_fdt_setprop(void *fdt, const char *node_path,
  const char *property, const void *val, int size);
 int qemu_fdt_setprop_cell(void *fdt, const char *node_path,
-- 
1.9.1
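
The retry loop around fdt_get_path() in the patch above grows the buffer in 16-byte steps until the path fits. The sketch below shows the same pattern in isolation. It is not libfdt code: copy_path() and ERR_NOSPACE are hypothetical stand-ins for fdt_get_path() and -FDT_ERR_NOSPACE, and get_path_grow() is an illustrative name.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define ERR_NOSPACE (-1)   /* stand-in for -FDT_ERR_NOSPACE */

/* Stand-in for fdt_get_path(): fails if buf cannot hold src plus the NUL. */
int copy_path(const char *src, char *buf, size_t buflen)
{
    if (strlen(src) + 1 > buflen) {
        return ERR_NOSPACE;
    }
    strcpy(buf, src);
    return 0;
}

/* Returns a heap-allocated copy of src; the caller frees it. */
char *get_path_grow(const char *src)
{
    size_t len = 16;
    char *buf = malloc(len);

    if (!buf) {
        abort();
    }
    /* Retry with a larger buffer for as long as the callee reports no space. */
    while (copy_path(src, buf, len) == ERR_NOSPACE) {
        char *tmp;

        len += 16;
        tmp = realloc(buf, len);
        if (!tmp) {
            free(buf);
            abort();
        }
        buf = tmp;
    }
    return buf;
}
```

Like the loop in the patch, this sketch only reacts to the no-space case; any other error from the callee would need its own handling.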




[Qemu-devel] [PATCH v6 2/8] device_tree: introduce load_device_tree_from_sysfs

2016-02-01 Thread Eric Auger
This function returns the host device tree blob from sysfs
(/proc/device-tree). It uses a recursive function inspired
from dtc read_fstree.

Signed-off-by: Eric Auger 

---
v5 -> v6:
- fix some spelling mistakes
- error_report + exit replaced by error_setg
- const char *parent_node;
- use g_strdup_printf instead of g_strjoin
- add a doc comment for load_device_tree_from_sysfs
v1 -> v2:
- do not implement/expose read_fstree and load_device_tree_from_sysfs
  if CONFIG_LINUX is not defined (lstat is not implemented in mingw)
- correct indentation in read_fstree
- use /proc/device-tree symlink instead of /sys/firmware/devicetree/base
  path (kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-ofw)
- use g_file_get_contents in read_fstree
- introduce SYSFS_DT_BASEDIR macro and use strlen
- exit on error in load_device_tree_from_sysfs
- use error_setg

RFC -> v1:
- remove runtime dependency on dtc binary and introduce read_fstree
---
 device_tree.c| 99 
 include/sysemu/device_tree.h |  8 
 2 files changed, 107 insertions(+)

diff --git a/device_tree.c b/device_tree.c
index a9f5f8e..3797182 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -17,6 +17,9 @@
 #include 
 #include 
 #include 
+#ifdef CONFIG_LINUX
+#include <dirent.h>
+#endif
 
 #include "qemu-common.h"
 #include "qemu/error-report.h"
@@ -117,6 +120,102 @@ fail:
 return NULL;
 }
 
+#ifdef CONFIG_LINUX
+
+#define SYSFS_DT_BASEDIR "/proc/device-tree"
+
+/**
+ * read_fstree: this function is inspired from dtc read_fstree
+ * @fdt: preallocated fdt blob buffer, to be populated
+ * @dirname: directory to scan under SYSFS_DT_BASEDIR
+ * the search is recursive and the tree is searched down to the
+ * leaves (property files).
+ *
+ * the function asserts in case of error
+ */
+static void read_fstree(void *fdt, const char *dirname)
+{
+DIR *d;
+struct dirent *de;
+struct stat st;
+const char *root_dir = SYSFS_DT_BASEDIR;
+const char *parent_node;
+
+if (strstr(dirname, root_dir) != dirname) {
+error_setg(&error_fatal, "%s: %s must be searched within %s",
+   __func__, dirname, root_dir);
+}
+parent_node = &dirname[strlen(SYSFS_DT_BASEDIR)];
+
+d = opendir(dirname);
+if (!d) {
+error_setg(&error_fatal, "%s cannot open %s", __func__, dirname);
+}
+
+while ((de = readdir(d)) != NULL) {
+char *tmpnam;
+
+if (!g_strcmp0(de->d_name, ".")
+|| !g_strcmp0(de->d_name, "..")) {
+continue;
+}
+
+tmpnam = g_strdup_printf("%s/%s", dirname, de->d_name);
+
+if (lstat(tmpnam, &st) < 0) {
+error_setg(&error_fatal, "%s cannot lstat %s", __func__, tmpnam);
+}
+
+if (S_ISREG(st.st_mode)) {
+gchar *val;
+gsize len;
+
+if (!g_file_get_contents(tmpnam, &val, &len, NULL)) {
+error_setg(&error_fatal, "%s not able to extract info from %s",
+   __func__, tmpnam);
+}
+
+if (strlen(parent_node) > 0) {
+qemu_fdt_setprop(fdt, parent_node,
+ de->d_name, val, len);
+} else {
+qemu_fdt_setprop(fdt, "/", de->d_name, val, len);
+}
+g_free(val);
+} else if (S_ISDIR(st.st_mode)) {
+char *node_name;
+
+node_name = g_strdup_printf("%s/%s",
+parent_node, de->d_name);
+qemu_fdt_add_subnode(fdt, node_name);
+g_free(node_name);
+read_fstree(fdt, tmpnam);
+}
+
+g_free(tmpnam);
+}
+
+closedir(d);
+}
+
+/* load_device_tree_from_sysfs: extract the dt blob from host sysfs */
+void *load_device_tree_from_sysfs(void)
+{
+void *host_fdt;
+int host_fdt_size;
+
+host_fdt = create_device_tree(&host_fdt_size);
+read_fstree(host_fdt, SYSFS_DT_BASEDIR);
+if (fdt_check_header(host_fdt)) {
+error_setg(&error_fatal,
+   "%s host device tree extracted into memory is invalid",
+   __func__);
+}
+return host_fdt;
+}
+
+#endif /* CONFIG_LINUX */
+
 static int findnode_nofail(void *fdt, const char *node_path)
 {
 int offset;
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 359e143..62093ba 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -16,6 +16,14 @@
 
 void *create_device_tree(int *sizep);
 void *load_device_tree(const char *filename_path, int *sizep);
+#ifdef CONFIG_LINUX
+/**
+ * load_device_tree_from_sysfs: reads the device tree information in the
+ * /proc/device-tree directory and return the corresponding binary blob
+ * buffer pointer. Asserts in case of error.
+ */
+void *load_device_tree_from_sysfs(void);
+#endif
 
 int qemu_fdt_setprop(void *fdt, const char *node_path,
  const char *property, const void *val, int size);
-- 
1.9.1
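
read_fstree() above derives the device tree node path from the directory name by skipping the base directory prefix and falling back to "/" for the root. A minimal sketch of just that mapping follows. It is an illustration, not the patch's code: dt_node_path() and DT_BASEDIR are hypothetical names, and the prefix check is slightly stricter than the strstr() test in the patch because it also rejects names such as "/proc/device-tree-x".

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DT_BASEDIR "/proc/device-tree"

/*
 * Returns the node path for a directory under DT_BASEDIR, "/" for the base
 * directory itself, or NULL if dirname is not under DT_BASEDIR. The result
 * points into dirname or to a string literal; nothing is allocated.
 */
const char *dt_node_path(const char *dirname)
{
    size_t base_len = strlen(DT_BASEDIR);

    if (strncmp(dirname, DT_BASEDIR, base_len) != 0) {
        return NULL;
    }
    if (dirname[base_len] == '\0') {
        return "/";                 /* the base directory is the root node */
    }
    if (dirname[base_len] != '/') {
        return NULL;                /* only a textual prefix, not a parent */
    }
    return dirname + base_len;
}
```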




Re: [Qemu-devel] [PATCH 4/8] libqos: remove some leaks

2016-02-01 Thread Marc-André Lureau
Hi

On Fri, Jan 29, 2016 at 4:43 PM, Markus Armbruster  wrote:
> The existing users pass a func that saves dev, and free the saved dev
> later.  Works as long as we call func() at most once.  If multiple
> devices match, all but the last one are leaked.  Can this happen?

It is the responsibility of the func() callback to deal with multiple
matches. I don't think this needs to change.

This fix is only about the case of unmatching devices that need to be
freed within qpci_device_foreach().

Do you ack that fix?


-- 
Marc-André Lureau



Re: [Qemu-devel] [PATCH RFC v2 3/5] net/filter: Introduce a helper to add a filter to the netdev

2016-02-01 Thread Hailiang Zhang

On 2016/2/1 18:43, Daniel P. Berrange wrote:

On Wed, Jan 27, 2016 at 04:29:38PM +0800, zhanghailiang wrote:

We add a new helper function netdev_add_filter(), this function
can help adding a filter object to a netdev.
Besides, we add a is_default member for struct NetFilterState
to indicate whether the filter is default or not.

Signed-off-by: zhanghailiang 
---
v2:
  -Re-implement netdev_add_filter() by re-using object_create()
   (Jason's suggestion)
---
  include/net/filter.h |  7 +
  net/filter.c | 80 
  2 files changed, 87 insertions(+)



+void netdev_add_filter(const char *netdev_id,
+   const char *filter_type,
+   const char *id,
+   bool is_default,
+   Error **errp)
+{
+NetClientState *nc = qemu_find_netdev(netdev_id);
+char *optarg;
+QemuOpts *opts = NULL;
+Error *err = NULL;
+
+/* FIXME: Not support multiple queues */
+if (!nc || nc->queue_index > 1) {
+return;
+}
+/* Not support vhost-net */
+if (get_vhost_net(nc)) {
+return;
+}
+
+optarg = g_strdup_printf("qom-type=%s,id=%s,netdev=%s,status=%s",
+filter_type, id, netdev_id, is_default ? "disable" : "enable");
+opts = qemu_opts_parse_noisily(&qemu_filter_opts,
+   optarg, false);


Formatting a string and then immediately parsing it again is totally
crazy, not least because you're not taking care to do escaping of
special characters like ',' in the string parameters.



Got it.


+if (!opts) {
+error_report("Failed to parse param '%s'", optarg);
+exit(1);
+}
+g_free(optarg);
+if (object_create(NULL, opts, &err) < 0) {
+error_report("Failed to create object");
+goto out_clean;
+}


Don't use object_create() - use object_new_with_props() which avoids
the need to format + parse the string above. ie do

   object_new_with_props(filter_type,
                         object_get_objects_root(),
                         id,
                         &err,
                         "netdev", netdev_id,
                         "status", is_default ? "disable" : "enable",
                         NULL);




Ha, that's really a good idea, i will fix it like that
in next version. Thank you very much.

Hailiang


Regards,
Daniel







Re: [Qemu-devel] [PATCH v5 2/8] device_tree: introduce load_device_tree_from_sysfs

2016-02-01 Thread Eric Auger
Hi Peter,
On 01/25/2016 03:13 PM, Peter Maydell wrote:
> On 18 January 2016 at 15:16, Eric Auger  wrote:
>> This function returns the host device tree blob from sysfs
>> (/proc/device-tree). It uses a recursive function inspired
>> from dtc read_fstree.
>>
>> Signed-off-by: Eric Auger 
>>
>> ---
>> v1 -> v2:
>> - do not implement/expose read_fstree and load_device_tree_from_sysfs
>>   if CONFIG_LINUX is not defined (lstat is not implemented in mingw)
>> - correct indentation in read_fstree
>> - use /proc/device-tree symlink instead of /sys/firmware/devicetree/base
>>   path (kernel.org/doc/Documentation/ABI/testing/sysfs-firmware-ofw)
>> - use g_file_get_contents in read_fstree
>> - introduce SYSFS_DT_BASEDIR macro and use strlen
>> - exit on error in load_device_tree_from_sysfs
>> - use error_setg
>>
>> RFC -> v1:
>> - remove runtime dependency on dtc binary and introduce read_fstree
>> ---
>>  device_tree.c| 100 
>> +++
>>  include/sysemu/device_tree.h |   3 ++
>>  2 files changed, 103 insertions(+)
>>
>> diff --git a/device_tree.c b/device_tree.c
>> index a9f5f8e..b262c2d 100644
>> --- a/device_tree.c
>> +++ b/device_tree.c
>> @@ -17,6 +17,9 @@
>>  #include 
>>  #include 
>>  #include 
>> +#ifdef CONFIG_LINUX
>> +#include <dirent.h>
>> +#endif
>>
>>  #include "qemu-common.h"
>>  #include "qemu/error-report.h"
>> @@ -117,6 +120,103 @@ fail:
>>  return NULL;
>>  }
>>
>> +#ifdef CONFIG_LINUX
>> +
>> +#define SYSFS_DT_BASEDIR "/proc/device-tree"
>> +
>> +/**
>> + * read_fstree: this function is inspired from dtc read_fstree
>> + * @fdt: preallocated fdt blob buffer, to be populated
>> + * @dirname: directory to scan under SYSFS_DT_BASEDIR
>> + * the search is recursive and the tree is searched down to the
>> + * leafs (property files).
> 
> "leaves"
OK
> 
>> + *
>> + * the function self-asserts in case of error
> 
> "asserts"
OK
> 
>> + */
>> +static void read_fstree(void *fdt, const char *dirname)
>> +{
>> +DIR *d;
>> +struct dirent *de;
>> +struct stat st;
>> +const char *root_dir = SYSFS_DT_BASEDIR;
>> +char *parent_node;
>> +
>> +if (strstr(dirname, root_dir) != dirname) {
>> +error_report("%s: %s must be searched within %s",
>> + __func__, dirname, root_dir);
>> +exit(1);
> 
> Why does this one error_report and exit but other errors below use
> error_setg?
replaced with error_setg(&error_fatal, ...)
> 
>> +}
>> +parent_node = (char *)&dirname[strlen(SYSFS_DT_BASEDIR)];
> 
> What causes us to need this cast to char* ?
I changed parent_node to a const char * instead of char*
> 
>> +
>> +d = opendir(dirname);
>> +if (!d) {
>> +error_setg(&error_fatal, "%s cannot open %s", __func__, dirname);
> 
> You need to return here (and similarly to bail out properly
> in the other error paths below).
> 
>> +}
>> +
>> +while ((de = readdir(d)) != NULL) {
>> +char *tmpnam;
>> +
>> +if (!g_strcmp0(de->d_name, ".")
>> +|| !g_strcmp0(de->d_name, "..")) {
>> +continue;
>> +}
>> +
>> +tmpnam = g_strjoin("/", dirname, de->d_name, NULL);
>> +
>> +if (lstat(tmpnam, &st) < 0) {
>> +error_setg(&error_fatal, "%s cannot lstat %s", __func__, tmpnam);
>> +}
>> +
>> +if (S_ISREG(st.st_mode)) {
>> +gchar *val;
>> +gsize len;
>> +
>> +if (!g_file_get_contents(tmpnam, &val, &len, NULL)) {
>> +error_setg(&error_fatal, "%s not able to extract info from %s",
>> +   __func__, tmpnam);
>> +}
>> +
>> +if (strlen(parent_node) > 0) {
>> +qemu_fdt_setprop(fdt, parent_node,
>> + de->d_name, val, len);
>> +} else {
>> +qemu_fdt_setprop(fdt, "/", de->d_name, val, len);
>> +}
>> +g_free(val);
>> +} else if (S_ISDIR(st.st_mode)) {
>> +char *node_name;
>> +
>> +node_name = g_strdup_printf("%s/%s",
>> +parent_node, de->d_name);
> 
> I don't mind whether we use g_strjoin("/",...) or g_strdup_printf("%s/%s", 
> ...)
> to glue together strings with a '/' between them, but can we not use
> both methods in the same function, please?
ok
> 
>> +qemu_fdt_add_subnode(fdt, node_name);
>> +g_free(node_name);
>> +read_fstree(fdt, tmpnam);
>> +}
>> +
>> +g_free(tmpnam);
>> +}
>> +
>> +closedir(d);
>> +}
>> +
>> +/* load_device_tree_from_sysfs: extract the dt blob from host sysfs */
>> +void *load_device_tree_from_sysfs(void)
>> +{
>> +void *host_fdt;
>> +int host_fdt_size;
>> +
>> +host_fdt = create_device_tree(&host_fdt_size);
>> +read_fstree(host_fdt, SYSFS_DT_BASEDIR);
>> +if (fdt_check_header(host_fdt)) {
>> +error_setg(&error_fatal,
>> +   "%s host device tree 

Re: [Qemu-devel] [PATCH v5 7/8] hw/arm/sysbus-fdt: enable amd-xgbe dynamic instantiation

2016-02-01 Thread Eric Auger
Hi Peter,
On 01/25/2016 03:33 PM, Peter Maydell wrote:
> On 18 January 2016 at 15:16, Eric Auger  wrote:
>> This patch allows the instantiation of the vfio-amd-xgbe device
>> from the QEMU command line (-device vfio-amd-xgbe,host="").
>>
>> The guest is exposed with a device tree node that combines the description
>> of both XGBE and PHY (representation supported from 4.2 onwards kernel):
>> Documentation/devicetree/bindings/net/amd-xgbe.txt.
>>
>> There are 5 register regions, 6 interrupts including 4 optional
>> edge-sensitive per-channel interrupts.
>>
>> Some property values are inherited from host device tree. Host device tree
>> must feature a combined XGBE/PHY representation (>= 4.2 host kernel).
>>
>> 2 clock nodes (dma and ptp) also are created. It is checked those clocks
>> are fixed on host side.
>>
>> AMD XGBE node creation function has a dependency on vfio Linux header and
>> more generally node creation function for VFIO platform devices only make
>> sense with CONFIG_LINUX so let's protect this code with #ifdef CONFIG_LINUX.
>>
>> Signed-off-by: Eric Auger 
> 
>>  hw/arm/sysbus-fdt.c | 195 
>> ++--
>>  1 file changed, 189 insertions(+), 6 deletions(-)
> 
> I'll let it pass for this patchset, but this file is starting to
> get big, and probably needs some kind of split into common functions
> in one file and then a separate file for each pass-through device,
> if they're all going to require 200-odd lines to deal with.

That's understood. If you don't mind, I would rather introduce that move
in a separate series then.

Thanks

Eric
> 
>> +/**
>> + * sysfs_to_dt_name: convert the name found in sysfs into the node name
>> + * for instance e090.xgmac is converted into xgmac@e090
>> + * @sysfs_name: directory name in sysfs
>> + *
>> + * returns the device tree name upon success or NULL in case the sysfs name
>> + * does not match the expected format
>> + */
>> +static char *sysfs_to_dt_name(const char *sysfs_name)
>> +{
>> +gchar **substrings =  g_strsplit(sysfs_name, ".", 2);
>> +char *dt_name = NULL;
>> +
>> +if (!substrings || !substrings[1] || !substrings[0]) {
> 
> We should check substrings[0] before substrings[1], as otherwise
> if we're passed the empty string we'll index off the end of the
> vector.
> 
>> +goto out;
>> +}
>> +dt_name = g_strdup_printf("%s@%s", substrings[1], substrings[0]);
>> +out:
>> +g_strfreev(substrings);
>> +return dt_name;
>> +}
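
For illustration only, here is a minimal stand-alone sketch of the same conversion written without GLib, so that the ordering problem raised above cannot occur. The name sysfs_to_dt_name_sketch is hypothetical and this is not the code from the patch. Unlike the g_strsplit() version, it also rejects names where either half is empty (for instance ".xgmac"), which is an assumption about the desired behavior.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical stand-alone variant of sysfs_to_dt_name(): splits
 * "<unit-address>.<name>" at the first '.' and returns a newly allocated
 * "<name>@<unit-address>" string, or NULL when the input does not match
 * that format. The caller releases the result with free().
 */
static char *sysfs_to_dt_name_sketch(const char *sysfs_name)
{
    const char *dot;
    size_t addr_len, name_len, total;
    char *dt_name;

    assert(sysfs_name != NULL);

    /* An empty string or a string without '.' has no second half. */
    dot = strchr(sysfs_name, '.');
    if (!dot) {
        return NULL;
    }

    addr_len = (size_t)(dot - sysfs_name);
    name_len = strlen(dot + 1);
    if (addr_len == 0 || name_len == 0) {
        return NULL; /* stricter than the patch: empty halves are rejected */
    }

    /* name + '@' + address + terminating NUL */
    total = name_len + 1 + addr_len + 1;
    dt_name = malloc(total);
    if (!dt_name) {
        return NULL;
    }
    snprintf(dt_name, total, "%s@%.*s", dot + 1, (int)addr_len, sysfs_name);
    return dt_name;
}
```

With the example from the comment block, "e090.xgmac" would come back as "xgmac@e090", while "" and "xgmac" both yield NULL.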
>> +
>>  /* Device Specific Code */
>>
>>  /**
>> @@ -243,9 +269,166 @@ fail_reg:
>>  return ret;
>>  }
>>
>> +
>> +/* AMD xgbe properties whose values are copied/pasted from host */
>> +static HostProperty amd_xgbe_copied_properties[] = {
>> +{"compatible", false},
>> +{"dma-coherent", true},
>> +{"amd,per-channel-interrupt", true},
>> +{"phy-mode", false},
>> +{"mac-address", true},
>> +{"amd,speed-set", false},
>> +{"amd,serdes-blwc", true},
>> +{"amd,serdes-cdr-rate", true},
>> +{"amd,serdes-pq-skew", true},
>> +{"amd,serdes-tx-amp", true},
>> +{"amd,serdes-dfe-tap-config", true},
>> +{"amd,serdes-dfe-tap-enable", true},
>> +{"clock-names", false},
>> +};
>> +
>> +/**
>> + * add_amd_xgbe_fdt_node
>> + *
>> + * Generates the combined xgbe/phy node following kernel >=4.2
>> + * binding documentation:
>> + * Documentation/devicetree/bindings/net/amd-xgbe.txt:
>> + * Also 2 clock nodes are created (dma and ptp)
>> + *
>> + * self-asserts in case of error
> 
> "asserts".
> 
>> +for (i = 0; i < vbasedev->num_irqs; i++) {
>> +irq_number = platform_bus_get_irqn(pbus, sbdev , i)
>> + + data->irq_start;
>> +irq_attr[3 * i] = cpu_to_be32(GIC_FDT_IRQ_TYPE_SPI);
>> +irq_attr[3 * i + 1] = cpu_to_be32(irq_number);
>> +/*
>> +  * General device interrupt and PCS auto-negociation interrupts are
> 
> "negotiation"
> 
>> +  * level-sensitive while the 4 per-channel interrupts are edge
>> +  * sensitive
>> +  */
>> +QLIST_FOREACH(intp, &vdev->intp_list, next) {
>> +if (intp->pin == i) {
>> +break;
>> +}
>> +}
>> +if (intp->flags & VFIO_IRQ_INFO_AUTOMASKED) {
>> +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_LEVEL_HI);
>> +} else {
>> +irq_attr[3 * i + 2] = cpu_to_be32(GIC_FDT_IRQ_FLAGS_EDGE_LO_HI);
>> +}
>> +}
>> +qemu_fdt_setprop(guest_fdt, nodename, "interrupts",
>> + irq_attr, vbasedev->num_irqs * 3 * sizeof(uint32_t));
>> +
>> +g_free(host_fdt);
>> +g_strfreev(node_path);
>> +g_free(irq_attr);
>> +g_free(reg_attr);
>> +g_free(nodename);
>> +return 0;
>> +}
>> +
>> +#endif /* CONFIG_LINUX */
>> +
>>  /* list of supported dynamic sysbus devices */
>>  static const NodeCreationPair add_fdt_node_functions[] = {
>> +#ifdef 

Re: [Qemu-devel] [PATCH] build: Rename all "syscall.h" in target directories to "target_syscall.h".

2016-02-01 Thread Lluís Vilanova
Peter Maydell writes:

> On 29 January 2016 at 18:53, Lluís Vilanova  wrote:
>> This fixes double-definitions in *-user builds when using the UST
>> tracing backend (which indirectly includes the system's "syscall.h").
>> 
>> Signed-off-by: Lluís Vilanova 
>> ---
>> bsd-user/qemu.h   |2 +-
>> linux-user/aarch64/target_syscall.h   |5 +
>> linux-user/alpha/target_syscall.h |5 +
>> linux-user/arm/target_syscall.h   |4 
>> linux-user/cris/target_syscall.h  |0
>> linux-user/i386/target_syscall.h  |5 +
>> linux-user/m68k/target_syscall.h  |4 
>> linux-user/mips/target_syscall.h  |4 
>> linux-user/mips64/target_syscall.h|4 
>> linux-user/openrisc/target_syscall.h  |5 +
>> linux-user/ppc/target_syscall.h   |5 +
>> linux-user/qemu.h |2 +-
>> linux-user/s390x/target_syscall.h |5 +
>> linux-user/sh4/target_syscall.h   |5 +
>> linux-user/sparc/target_syscall.h |5 +
>> linux-user/sparc64/target_syscall.h   |5 +
>> linux-user/unicore32/target_syscall.h |0
>> linux-user/x86_64/target_syscall.h|5 +
>> target-microblaze/target_syscall.h|1 +
>> target-tilegx/target_syscall.h|1 +
>> 20 files changed, 70 insertions(+), 2 deletions(-)
>> rename linux-user/aarch64/{syscall.h => target_syscall.h} (80%)
>> rename linux-user/alpha/{syscall.h => target_syscall.h} (98%)
>> rename linux-user/arm/{syscall.h => target_syscall.h} (93%)
>> rename linux-user/cris/{syscall.h => target_syscall.h} (100%)
>> rename linux-user/i386/{syscall.h => target_syscall.h} (97%)
>> rename linux-user/m68k/{syscall.h => target_syscall.h} (87%)
>> rename linux-user/mips/{syscall.h => target_syscall.h} (99%)
>> rename linux-user/mips64/{syscall.h => target_syscall.h} (99%)
>> rename linux-user/openrisc/{syscall.h => target_syscall.h} (90%)
>> rename linux-user/ppc/{syscall.h => target_syscall.h} (96%)
>> rename linux-user/s390x/{syscall.h => target_syscall.h} (89%)
>> rename linux-user/sh4/{syscall.h => target_syscall.h} (83%)
>> rename linux-user/sparc/{syscall.h => target_syscall.h} (87%)
>> rename linux-user/sparc64/{syscall.h => target_syscall.h} (87%)
>> rename linux-user/unicore32/{syscall.h => target_syscall.h} (100%)
>> rename linux-user/x86_64/{syscall.h => target_syscall.h} (96%)
>> create mode 100644 target-microblaze/target_syscall.h
>> create mode 100644 target-tilegx/target_syscall.h
>> 
>> diff --git a/bsd-user/qemu.h b/bsd-user/qemu.h
>> index 735cb40..1a361df 100644
>> --- a/bsd-user/qemu.h
>> +++ b/bsd-user/qemu.h
>> @@ -38,7 +38,7 @@ enum BSDType {
>> extern enum BSDType bsd_type;
>> 
>> #include "syscall_defs.h"
>> -#include "syscall.h"
>> +#include "target_syscall.h"

> If you want to change this you need to rename all the
> bsd-user/*/syscall.h too. Otherwise bsd-user compilation will
> break.

> Personally I think I'd do linux-user in one patch and
> bsd-user in a second patch.

True, my bad (my build did not catch this one). I'll send a v2 series.

Thanks,
  Lluis



Re: [Qemu-devel] [PATCH] linux-user,ppc: synchronize syscall_nr.h

2016-02-01 Thread Laurent Vivier
Ping ?

Le 19/01/2016 00:08, Laurent Vivier a écrit :
> Synchronize with include/uapi/asm/unistd.h from kernel v4.4
> 
> This allows the use of timerfd_create().
> 
> Signed-off-by: Laurent Vivier 
> ---
>  linux-user/ppc/syscall_nr.h | 14 +-
>  1 file changed, 13 insertions(+), 1 deletion(-)
> 
> diff --git a/linux-user/ppc/syscall_nr.h b/linux-user/ppc/syscall_nr.h
> index 1e1736e..7382294 100644
> --- a/linux-user/ppc/syscall_nr.h
> +++ b/linux-user/ppc/syscall_nr.h
> @@ -319,7 +319,7 @@
>  #define TARGET_NR_epoll_pwait        303
>  #define TARGET_NR_utimensat          304
>  #define TARGET_NR_signalfd           305
> -#define TARGET_NR_timerfd            306
> +#define TARGET_NR_timerfd_create     306
>  #define TARGET_NR_eventfd            307
>  #define TARGET_NR_sync_file_range2   308
>  #define TARGET_NR_fallocate          309
> @@ -368,3 +368,15 @@
>  #define TARGET_NR_process_vm_writev 352
>  #define TARGET_NR_finit_module  353
>  #define TARGET_NR_kcmp  354
> +#define TARGET_NR_sched_setattr 355
> +#define TARGET_NR_sched_getattr 356
> +#define TARGET_NR_renameat2 357
> +#define TARGET_NR_seccomp   358
> +#define TARGET_NR_getrandom 359
> +#define TARGET_NR_memfd_create  360
> +#define TARGET_NR_bpf   361
> +#define TARGET_NR_execveat  362
> +#define TARGET_NR_switch_endian 363
> +#define TARGET_NR_userfaultfd   364
> +#define TARGET_NR_membarrier    365
> +#define TARGET_NR_mlock2        378
> 



Re: [Qemu-devel] [PATCH] linux-user: fix realloc size of target_fd_trans.

2016-02-01 Thread Laurent Vivier
Ping ?

Le 18/01/2016 23:50, Laurent Vivier a écrit :
> target_fd_trans is an array of "TargetFdTrans *": compute size
> accordingly. Use g_renew() as proposed by Paolo.
> 
> Reported-by: Paolo Bonzini 
> Signed-off-by: Laurent Vivier 
> ---
>  linux-user/syscall.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index 0cbace4..fd04e5f 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -330,8 +330,8 @@ static void fd_trans_register(int fd, TargetFdTrans *trans)
>  if (fd >= target_fd_max) {
>  oldmax = target_fd_max;
>  target_fd_max = ((fd >> 6) + 1) << 6; /* by slice of 64 entries */
> -target_fd_trans = g_realloc(target_fd_trans,
> -target_fd_max * sizeof(TargetFdTrans));
> +target_fd_trans = g_renew(TargetFdTrans *,
> +  target_fd_trans, target_fd_max);
>  memset((void *)(target_fd_trans + oldmax), 0,
> (target_fd_max - oldmax) * sizeof(TargetFdTrans *));
>  }
> 
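
As a rough illustration of the sizing rule this patch applies (the table holds pointers, so growth is computed in units of one pointer), here is a self-contained sketch in plain C. It uses realloc() instead of g_renew(), and FdTransSketch, fd_table, fd_table_max and fd_table_register are hypothetical stand-ins, not the names used in linux-user/syscall.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical stand-in for TargetFdTrans; its real layout is not shown here. */
typedef struct FdTransSketch {
    int dummy[8];
} FdTransSketch;

/* Table of pointers indexed by file descriptor, grown on demand. */
static FdTransSketch **fd_table;
static unsigned int fd_table_max;

/* Registers trans for fd; returns 0 on success, -1 on failure. */
static int fd_table_register(int fd, FdTransSketch *trans)
{
    if (fd < 0) {
        return -1;
    }
    if ((unsigned int)fd >= fd_table_max) {
        unsigned int oldmax = fd_table_max;
        /* Grow by slices of 64 entries, as in the quoted code. */
        unsigned int newmax = (((unsigned int)fd >> 6) + 1) << 6;
        /* Element size is that of one pointer, not of the pointed-to struct. */
        FdTransSketch **grown = realloc(fd_table, newmax * sizeof(*grown));

        if (!grown) {
            return -1;
        }
        /* Clear only the newly added slots. */
        memset(grown + oldmax, 0, (newmax - oldmax) * sizeof(*grown));
        fd_table = grown;
        fd_table_max = newmax;
    }
    fd_table[fd] = trans;
    return 0;
}
```

Writing the size as sizeof(*grown) ties it to the array's element type, which is the same property g_renew() gives by taking the element type as its first argument.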



[Qemu-devel] [PATCH v6 5/8] device_tree: qemu_fdt_getprop_cell converted to use the error API

2016-02-01 Thread Eric Auger
This patch aligns the prototype with qemu_fdt_getprop. The caller
can choose whether the function self-asserts on error (passing
&error_fatal as the Error ** argument, corresponding to the legacy behavior),
or behaves differently, such as simply printing a message.

In this latter case the caller can use the new lenp parameter to interpret
the error, if any.

Signed-off-by: Eric Auger 
Reviewed-by: Peter Crosthwaite 

---
v4 -> v5:
- Add Peter's R-b
- remove comment about error_fatal

v3 : creation
---
 device_tree.c| 21 ++---
 hw/arm/boot.c|  6 --
 hw/arm/vexpress.c|  6 --
 include/sysemu/device_tree.h | 14 +-
 4 files changed, 35 insertions(+), 12 deletions(-)

diff --git a/device_tree.c b/device_tree.c
index 2c44a3d..c25a7b0 100644
--- a/device_tree.c
+++ b/device_tree.c
@@ -351,15 +351,22 @@ const void *qemu_fdt_getprop(void *fdt, const char *node_path,
 }
 
 uint32_t qemu_fdt_getprop_cell(void *fdt, const char *node_path,
-   const char *property)
+   const char *property, int *lenp, Error **errp)
 {
 int len;
-const uint32_t *p = qemu_fdt_getprop(fdt, node_path, property, &len,
- &error_fatal);
-if (len != 4) {
-error_report("%s: %s/%s not 4 bytes long (not a cell?)",
- __func__, node_path, property);
-exit(1);
+const uint32_t *p;
+
+if (!lenp) {
+lenp = &len;
+}
+p = qemu_fdt_getprop(fdt, node_path, property, lenp, errp);
+if (!p) {
+return 0;
+} else if (*lenp != 4) {
+error_setg(errp, "%s: %s/%s not 4 bytes long (not a cell?)",
+   __func__, node_path, property);
+*lenp = -EINVAL;
+return 0;
 }
 return be32_to_cpu(*p);
 }
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index 7742dd3..65b2d9d 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -386,8 +386,10 @@ static int load_dtb(hwaddr addr, const struct arm_boot_info *binfo,
 return 0;
 }
 
-acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells");
-scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells");
+acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells",
+   NULL, &error_fatal);
+scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells",
+   NULL, &error_fatal);
 if (acells == 0 || scells == 0) {
 fprintf(stderr, "dtb file invalid (#address-cells or #size-cells 0)\n");
 goto fail;
diff --git a/hw/arm/vexpress.c b/hw/arm/vexpress.c
index 3154aea..726c4e0 100644
--- a/hw/arm/vexpress.c
+++ b/hw/arm/vexpress.c
@@ -478,8 +478,10 @@ static void vexpress_modify_dtb(const struct arm_boot_info *info, void *fdt)
 uint32_t acells, scells, intc;
 const VEDBoardInfo *daughterboard = (const VEDBoardInfo *)info;
 
-acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells");
-scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells");
+acells = qemu_fdt_getprop_cell(fdt, "/", "#address-cells",
+   NULL, &error_fatal);
+scells = qemu_fdt_getprop_cell(fdt, "/", "#size-cells",
+   NULL, &error_fatal);
 intc = find_int_controller(fdt);
 if (!intc) {
 /* Not fatal, we just won't provide virtio. This will
diff --git a/include/sysemu/device_tree.h b/include/sysemu/device_tree.h
index 48bf3b5..705650a 100644
--- a/include/sysemu/device_tree.h
+++ b/include/sysemu/device_tree.h
@@ -67,8 +67,20 @@ int qemu_fdt_setprop_phandle(void *fdt, const char *node_path,
 const void *qemu_fdt_getprop(void *fdt, const char *node_path,
  const char *property, int *lenp,
  Error **errp);
+/**
+ * qemu_fdt_getprop_cell: retrieve the value of a given 4 byte property
+ * @fdt: pointer to the device tree blob
+ * @node_path: node path
+ * @property: name of the property to find
+ * @lenp: fdt error if any or -EINVAL if the property size is different from
+ *4 bytes, or 4 (expected length of the property) upon success.
+ * @errp: handle to an error object
+ *
+ * returns the property value on success
+ */
 uint32_t qemu_fdt_getprop_cell(void *fdt, const char *node_path,
-   const char *property);
+   const char *property, int *lenp,
+   Error **errp);
 uint32_t qemu_fdt_get_phandle(void *fdt, const char *path);
 uint32_t qemu_fdt_alloc_phandle(void *fdt);
 int qemu_fdt_nop_node(void *fdt, const char *node_path);
-- 
1.9.1



