Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Markus Armbruster
Peter Maydell  writes:

> On Wed, 1 Apr 2020 at 10:03, Markus Armbruster  wrote:
>>
>> QEMU's Error was patterned after GLib's GError.  Differences include:
>
> From my POV the major problem with Error as we have it today
> is that it makes the simple process of writing code like
> device realize functions horrifically boilerplate heavy;
> for instance this is from hw/arm/armsse.c:
>
> object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
>                          "memory", &err);
> if (err) {
> error_propagate(errp, err);
> return;
> }
> object_property_set_link(cpuobj, OBJECT(s), "idau", &err);
> if (err) {
> error_propagate(errp, err);
> return;
> }
> object_property_set_bool(cpuobj, true, "realized", &err);
> if (err) {
> error_propagate(errp, err);
> return;
> }
>
> 16 lines of code just to set 2 properties on an object
> and realize it. It's a lot of boilerplate and as
> a result we frequently get it wrong or take shortcuts
> (eg forgetting the error-handling entirely, calling
> error_propagate just once for a whole sequence of
> calls, taking the lazy approach and using error_abort
> or error_fatal when we ought really to be propagating
> an error, etc). I haven't looked at 'auto propagation'
> yet, hopefully it will help?

With that, you can have

object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
                         "memory", errp);
if (*errp) {
return;
}
object_property_set_link(cpuobj, OBJECT(s), "idau", errp);
if (*errp) {
return;
}
object_property_set_bool(cpuobj, true, "realized", errp);
if (*errp) {
return;
}

but you have to add

ERRP_AUTO_PROPAGATE();

right at the beginning of the function.

It's a small improvement.  A bigger one is

if (object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
                             "memory", errp)) {
return;
}
if (object_property_set_link(cpuobj, OBJECT(s), "idau", errp)) {
return;
}
if (object_property_set_bool(cpuobj, true, "realized", errp)) {
return;
}

This is item "Return value conventions" in the message you replied to.

Elsewhere in this thread, I discussed the difficulties of automating the
conversion to this style.  I think I know how to automate converting the
calls to use the bool return value, but converting the functions to
return it looks hard.  We could do that manually for a modest set of
frequently used functions.  object.h would top my list.
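
For concreteness, a sketch of what such a conversion could look like for
one of the object.h setters (illustration only: the bool return is the
point, the body is assumed from the current void version and may differ
from the real code):

    /* Hypothetical converted setter: true on success, false on failure. */
    bool object_property_set_bool(Object *obj, bool value,
                                  const char *name, Error **errp)
    {
        QBool *qbool = qbool_from_bool(value);
        Error *err = NULL;

        object_property_set_qobject(obj, QOBJECT(qbool), name, &err);
        qobject_unref(qbool);

        if (err) {
            error_propagate(errp, err);
            return false;   /* an error has been set */
        }
        return true;        /* success, *errp untouched */
    }

Callers can then use the return value directly, as in the second example
above, without ever inspecting *errp.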




Re: [PATCH v3 0/3] hw/riscv: Add a serial property to sifive_u

2020-04-01 Thread Bin Meng
On Tue, Mar 24, 2020 at 10:08 AM Bin Meng  wrote:
>
> Hi Palmer,
>
> On Sat, Mar 7, 2020 at 5:45 AM Alistair Francis
>  wrote:
> >
> > At present the board serial number is hard-coded to 1, and passed
> > to OTP model during initialization. Firmware (FSBL, U-Boot) uses
> > the serial number to generate a unique MAC address for the on-chip
> > ethernet controller. When multiple QEMU 'sifive_u' instances are
> > created and connected to the same subnet, they all have the same
> > MAC address, hence it creates an unusable network.
> >
> > A new "serial" property is introduced to specify the board serial
> > number. When not given, the default serial number 1 is used.
> >
>
> Could you please take this for v5.0.0?

Ping?



Re: Question about scsi device hotplug (e.g scsi-hd)

2020-04-01 Thread Markus Armbruster
Paolo Bonzini  writes:

> On 01/04/20 17:09, Stefan Hajnoczi wrote:
>>> What do you think about it?
>>
>> Maybe aio_disable_external() is needed to postpone device emulation
>> until after realize has finished?
>> 
>> Virtqueue kick ioeventfds are marked "external" and won't be processed
>> while external events are disabled.  See also
>> virtio_queue_aio_set_host_notifier_handler() ->
>> aio_set_event_notifier().
>
> Yes, I think Stefan is right.

Is this issue limited to SCSI devices?
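
(For reference, Stefan's aio_disable_external() idea might look roughly
like the following around realize. This is a sketch only, not a tested
patch; "ctx" stands for the device's AioContext, and "dev", "err" and
"errp" are placeholder names.)

    aio_disable_external(ctx);   /* hold back "external" events, e.g. ioeventfd kicks */
    object_property_set_bool(OBJECT(dev), true, "realized", &err);
    aio_enable_external(ctx);    /* resume external event handling */
    if (err) {
        error_propagate(errp, err);
        return;
    }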




Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2020 23:15, Peter Maydell wrote:

On Wed, 1 Apr 2020 at 10:03, Markus Armbruster  wrote:


QEMU's Error was patterned after GLib's GError.  Differences include:


 From my POV the major problem with Error as we have it today
is that it makes the simple process of writing code like
device realize functions horrifically boilerplate heavy;
for instance this is from hw/arm/armsse.c:

 object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
                          "memory", &err);
 if (err) {
 error_propagate(errp, err);
 return;
 }
 object_property_set_link(cpuobj, OBJECT(s), "idau", &err);
 if (err) {
 error_propagate(errp, err);
 return;
 }
 object_property_set_bool(cpuobj, true, "realized", &err);
 if (err) {
 error_propagate(errp, err);
 return;
 }

16 lines of code just to set 2 properties on an object
and realize it. It's a lot of boilerplate and as
a result we frequently get it wrong or take shortcuts
(eg forgetting the error-handling entirely, calling
error_propagate just once for a whole sequence of
calls, taking the lazy approach and using error_abort
or error_fatal when we ought really to be propagating
an error, etc). I haven't looked at 'auto propagation'
yet, hopefully it will help?


Yes, after it the code above will look like this:

... some_func(..., errp)
{
    ERRP_AUTO_PROPAGATE(); # magic macro at function start, and no "Error *err" definition

...
  object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
                           "memory", errp);
  if (*errp) {
  return;
  }
  object_property_set_link(cpuobj, OBJECT(s), "idau", errp);
  if (*errp) {
  return;
  }
  object_property_set_bool(cpuobj, true, "realized", errp);
  if (*errp) {
  return;
  }
...
}

 - propagation is automatic, errp is used directly and may be safely dereferenced.
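
For reference, the mechanism behind ERRP_AUTO_PROPAGATE is roughly the
following (a simplified sketch of the proposal, not the exact definition;
the helper names are assumptions):

    typedef struct ErrorPropagator {
        Error *local_err;
        Error **errp;
    } ErrorPropagator;

    static inline void error_propagator_cleanup(ErrorPropagator *prop)
    {
        error_propagate(prop->errp, prop->local_err);
    }

    G_DEFINE_AUTO_CLEANUP_CLEAR_FUNC(ErrorPropagator, error_propagator_cleanup);

    /*
     * If the caller passed NULL, &error_abort or &error_fatal, substitute a
     * local Error * so that *errp can always be dereferenced; on scope exit
     * the cleanup function propagates the local error back to the original
     * errp.
     */
    #define ERRP_AUTO_PROPAGATE()                                              \
        g_auto(ErrorPropagator) _auto_errp_prop = {.errp = errp};              \
        errp = ((errp == NULL || *errp == error_fatal || *errp == error_abort) \
                ? &_auto_errp_prop.local_err : errp)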



--
Best regards,
Vladimir



Re: [RFC for Linux] virtio_balloon: Add VIRTIO_BALLOON_F_THP_ORDER to handle THP spilt issue

2020-04-01 Thread teawater



> On 1 Apr 2020, at 17:48, David Hildenbrand wrote:
> 
> On 31.03.20 18:37, Nadav Amit wrote:
>>> On Mar 31, 2020, at 7:09 AM, David Hildenbrand  wrote:
>>> 
>>> On 31.03.20 16:07, Michael S. Tsirkin wrote:
 On Tue, Mar 31, 2020 at 04:03:18PM +0200, David Hildenbrand wrote:
> On 31.03.20 15:37, Michael S. Tsirkin wrote:
>> On Tue, Mar 31, 2020 at 03:32:05PM +0200, David Hildenbrand wrote:
>>> On 31.03.20 15:24, Michael S. Tsirkin wrote:
 On Tue, Mar 31, 2020 at 12:35:24PM +0200, David Hildenbrand wrote:
> On 26.03.20 10:49, Michael S. Tsirkin wrote:
>> On Thu, Mar 26, 2020 at 08:54:04AM +0100, David Hildenbrand wrote:
 On 26.03.2020 at 08:21, Michael S. Tsirkin wrote:
 
 On Thu, Mar 12, 2020 at 09:51:25AM +0100, David Hildenbrand wrote:
>> On 12.03.20 09:47, Michael S. Tsirkin wrote:
>> On Thu, Mar 12, 2020 at 09:37:32AM +0100, David Hildenbrand 
>> wrote:
>>> 2. You are essentially stealing THPs in the guest. So the 
>>> fastest
>>> mapping (THP in guest and host) is gone. The guest won't be 
>>> able to make
>>> use of THP where it previously was able to. I can imagine this 
>>> implies a
>>> performance degradation for some workloads. This needs a proper
>>> performance evaluation.
>> 
>> I think the problem is more with the alloc_pages API.
>> That gives you exactly the given order, and if there's
>> a larger chunk available, it will split it up.
>> 
>> But for balloon - I suspect lots of other users,
>> we do not want to stress the system but if a large
>> chunk is available anyway, then we could handle
>> that more optimally by getting it all in one go.
>> 
>> 
>> So if we want to address this, IMHO this calls for a new API.
>> Along the lines of
>> 
>>  struct page *alloc_page_range(gfp_t gfp, unsigned int min_order,
>>  unsigned int max_order, unsigned int *order)
>> 
>> the idea would then be to return at a number of pages in the 
>> given
>> range.
>> 
>> What do you think? Want to try implementing that?
> 
> You can just start with the highest order and decrement the order 
> until
> your allocation succeeds using alloc_pages(), which would be 
> enough for
> a first version. At least I don't see the immediate need for a new
> kernel API.
 
 OK I remember now.  The problem is with reclaim. Unless reclaim is
 completely disabled, any of these calls can sleep. After it wakes 
 up,
 we would like to get the larger order that has become available
 meanwhile.
>>> 
>>> Yes, but that‘s a pure optimization IMHO.
>>> So I think we should do a trivial implementation first and then see 
>>> what we gain from a new allocator API. Then we might also be able 
>>> to justify it using real numbers.
>> 
>> Well how do you propose implement the necessary semantics?
>> I think we are both agreed that alloc_page_range is more or
>> less what's necessary anyway - so how would you approximate it
>> on top of existing APIs?
> diff --git a/include/linux/balloon_compaction.h 
> b/include/linux/balloon_compaction.h
>> 
>> .
>> 
>> 
> diff --git a/mm/balloon_compaction.c b/mm/balloon_compaction.c
> index 26de020aae7b..067810b32813 100644
> --- a/mm/balloon_compaction.c
> +++ b/mm/balloon_compaction.c
> @@ -112,23 +112,35 @@ size_t balloon_page_list_dequeue(struct 
> balloon_dev_info *b_dev_info,
> EXPORT_SYMBOL_GPL(balloon_page_list_dequeue);
> 
> /*
> - * balloon_page_alloc - allocates a new page for insertion into the 
> balloon
> - *   page list.
> + * balloon_pages_alloc - allocates a new page (of at most the given 
> order)
> + *for insertion into the balloon page list.
> *
> * Driver must call this function to properly allocate a new balloon 
> page.
> * Driver must call balloon_page_enqueue before definitively removing 
> the page
> * from the guest system.
> *
> + * Will fall back to smaller orders if allocation fails. The order 
> of the
> + * allocated page is stored in page->private.
> + *
> * Return: struct page for the allocated page or NULL on allocation 
> failure.
> */
> -struct 
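
(The quoted patch is cut off above. For illustration only, a rough sketch
of the "start at the highest order and fall back" loop being discussed --
a hypothetical helper, not the posted patch; the GFP flags are
assumptions:)

    static struct page *balloon_alloc_largest(unsigned int max_order,
                                              unsigned int *order_out)
    {
        unsigned int order = max_order;

        for (;;) {
            struct page *page = alloc_pages(GFP_HIGHUSER | __GFP_NORETRY |
                                            __GFP_NOWARN | __GFP_NOMEMALLOC,
                                            order);
            if (page) {
                *order_out = order;   /* caller must know what it actually got */
                return page;
            }
            if (!order--) {
                return NULL;          /* even order-0 failed */
            }
        }
    }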

Re: [PATCH] util/async: Add memory barrier to aio_ctx_prepare

2020-04-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200402011803.1270-1-fangyi...@huawei.com/



Hi,

This series seems to have some coding style problems. See output below for
more information:

Subject: [PATCH] util/async: Add memory barrier to aio_ctx_prepare
Message-id: 20200402011803.1270-1-fangyi...@huawei.com
Type: series

=== TEST SCRIPT BEGIN ===
#!/bin/bash
git rev-parse base > /dev/null || exit 0
git config --local diff.renamelimit 0
git config --local diff.renames True
git config --local diff.algorithm histogram
./scripts/checkpatch.pl --mailback base..
=== TEST SCRIPT END ===

Updating 3c8cf5a9c21ff8782164d1def7f44bd888713384
From https://github.com/patchew-project/qemu
 * [new tag] patchew/20200402024431.1629-1-fangyi...@huawei.com -> 
patchew/20200402024431.1629-1-fangyi...@huawei.com
Switched to a new branch 'test'
06a5d59 util/async: Add memory barrier to aio_ctx_prepare

=== OUTPUT BEGIN ===
ERROR: memory barrier without comment
#60: FILE: util/async.c:254:
+smp_mb();

total: 1 errors, 0 warnings, 7 lines checked

Commit 06a5d59d541d (util/async: Add memory barrier to aio_ctx_prepare) has 
style problems, please review.  If any of these errors
are false positives report them to the maintainer, see
CHECKPATCH in MAINTAINERS.
=== OUTPUT END ===

Test command exited with code: 1


The full log is available at
http://patchew.org/logs/20200402011803.1270-1-fangyi...@huawei.com/testing.checkpatch/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v2] util/async: Add memory barrier to aio_ctx_prepare

2020-04-01 Thread Ying Fang
The QEMU main thread is found to hang in the mainloop when doing an
image format conversion on the aarch64 platform, and the hang is highly
reproducible by executing a test using:

qemu-img convert -f qcow2 -O qcow2 origin.qcow2 converted.qcow2

This mysterious hang can be explained by a race condition between
the main thread and an I/O worker thread. There is a window in which
the last worker thread has called aio_bh_schedule_oneshot and is
checking notify_me to decide whether to deliver a notify event. At the
same time, the main thread is calling aio_ctx_prepare, but it first
calls qemu_timeout_ns_to_ms, so the worker thread does not see
notify_me as true and does not send a notify event. The timeline
looks like this:

 Main Thread
 
 aio_ctx_prepare
atomic_or(&ctx->notify_me, 1);
/* out of order execution goes here */
*timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));

 Worker Thread
 ---
 aio_bh_schedule_oneshot -> aio_bh_enqueue
aio_notify
smp_mb();
if (ctx->notify_me) {   /* worker thread checks notify_me here */
event_notifier_set(&ctx->notifier);
atomic_mb_set(&ctx->notified, true);
}

Normal VM runtime is not affected by this hang since there is always some
timer timeout or subsequent I/O worker that comes along and notifies the
main thread. To fix this problem, a memory barrier is added to
aio_ctx_prepare, and it is proven to fix the hang in our tests.

This hang is not observed on the x86 platform, but it can be easily
reproduced on the aarch64 platform, so it is architecture related.
Not sure if this is relevant to commit eabc977973103527bbb8fed69c91cfaa6691f8ab.

Signed-off-by: Ying Fang 
Signed-off-by: zhanghailiang 
Reported-by: Euler Robot 

---
v2: add comments before the barrier

---
 util/async.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/util/async.c b/util/async.c
index b94518b..89a4f3e 100644
--- a/util/async.c
+++ b/util/async.c
@@ -250,7 +250,8 @@ aio_ctx_prepare(GSource *source, gint *timeout)
 AioContext *ctx = (AioContext *) source;
 
 atomic_or(&ctx->notify_me, 1);
-
+/* Make sure notify_me is set before aio_compute_timeout */
+smp_mb();
 /* We assume there is no timeout already supplied */
 *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
 
-- 
1.8.3.1





Re: [PATCH v2 1/4] target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation

2020-04-01 Thread David Gibson
On Wed, Apr 01, 2020 at 06:28:07PM +0200, Cédric Le Goater wrote:
> This is moving code under a new ppc_radix64_xlate() routine shared by
> the MMU Radix page fault handler and the 'get_phys_page_debug' PPC
> callback. The difference being that 'get_phys_page_debug' does not
> generate exceptions.
> 
> The specific part of process-scoped Radix translation is moved under
> ppc_radix64_process_scoped_xlate() in preparation of the future support
> for partition-scoped Radix translation. Routines raising the exceptions
> now take a 'cause_excp' bool to cover the 'get_phys_page_debug' case.
> 
> It should be functionally equivalent.
> 
> Signed-off-by: Suraj Jitindar Singh 
> Signed-off-by: Cédric Le Goater 
> ---
>  target/ppc/mmu-radix64.c | 223 ++-
>  1 file changed, 125 insertions(+), 98 deletions(-)
> 
> diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
> index d2422d1c54c9..410376fbeb65 100644
> --- a/target/ppc/mmu-radix64.c
> +++ b/target/ppc/mmu-radix64.c
> @@ -69,11 +69,16 @@ static bool 
> ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr eaddr,
>  return true;
>  }
>  
> -static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr)
> +static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> +   bool cause_excp)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
>  
> +if (!cause_excp) {
> +return;
> +}

Hrm... adding a parameter which makes this function a no-op seems an
odd choice, rather than putting an if in the caller.
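
Something like this on the caller side, for illustration (sketch only,
not part of the posted patch):

    /* e.g. in ppc_radix64_xlate(), keeping the raise helper unconditional */
    if (!ppc_radix64_get_fully_qualified_addr(&cpu->env, eaddr, &lpid, &pid)) {
        if (cause_excp) {
            ppc_radix64_raise_segi(cpu, rwx, eaddr);
        }
        return 1;
    }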

> +
>  if (rwx == 2) { /* Instruction Segment Interrupt */
>  cs->exception_index = POWERPC_EXCP_ISEG;
>  } else { /* Data Segment Interrupt */
> @@ -84,11 +89,15 @@ static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int 
> rwx, vaddr eaddr)
>  }
>  
>  static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, vaddr eaddr,
> -uint32_t cause)
> + uint32_t cause, bool cause_excp)
>  {
>  CPUState *cs = CPU(cpu);
>  CPUPPCState *env = &cpu->env;
>  
> +if (!cause_excp) {
> +return;
> +}
> +
>  if (rwx == 2) { /* Instruction Storage Interrupt */
>  cs->exception_index = POWERPC_EXCP_ISI;
>  env->error_code = cause;
> @@ -219,17 +228,118 @@ static bool validate_pate(PowerPCCPU *cpu, uint64_t 
> lpid, ppc_v3_pate_t *pate)
>  return true;
>  }
>  
> +static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu, int rwx,
> +vaddr eaddr, uint64_t pid,
> +ppc_v3_pate_t pate, hwaddr 
> *g_raddr,
> +int *g_prot, int *g_page_size,
> +bool cause_excp)
> +{
> +CPUState *cs = CPU(cpu);
> +uint64_t offset, size, prtbe_addr, prtbe0, pte;
> +int fault_cause = 0;
> +hwaddr pte_addr;
> +
> +/* Index Process Table by PID to Find Corresponding Process Table Entry 
> */
> +offset = pid * sizeof(struct prtb_entry);
> +size = 1ULL << ((pate.dw1 & PATE1_R_PRTS) + 12);
> +if (offset >= size) {
> +/* offset exceeds size of the process table */
> +ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE, cause_excp);
> +return 1;
> +}
> +prtbe_addr = (pate.dw1 & PATE1_R_PRTB) + offset;
> +prtbe0 = ldq_phys(cs->as, prtbe_addr);
> +
> +/* Walk Radix Tree from Process Table Entry to Convert EA to RA */
> +*g_page_size = PRTBE_R_GET_RTS(prtbe0);
> +pte = ppc_radix64_walk_tree(cpu, eaddr & R_EADDR_MASK,
> +prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
> +                                g_raddr, g_page_size, &fault_cause, &pte_addr);
> +
> +if (!(pte & R_PTE_VALID) ||
> +        ppc_radix64_check_prot(cpu, rwx, pte, &fault_cause, g_prot)) {
> +/* No valid pte or access denied due to protection */
> +ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause, cause_excp);
> +return 1;
> +}
> +
> +ppc_radix64_set_rc(cpu, rwx, pte, pte_addr, g_prot);
> +
> +return 0;
> +}
> +
> +static int ppc_radix64_xlate(PowerPCCPU *cpu, vaddr eaddr, int rwx,
> + bool relocation,
> + hwaddr *raddr, int *psizep, int *protp,
> + bool cause_excp)
> +{
> +uint64_t lpid = 0, pid = 0;
> +ppc_v3_pate_t pate;
> +int psize, prot;
> +hwaddr g_raddr;
> +
> +/* Virtual Mode Access - get the fully qualified address */
> +    if (!ppc_radix64_get_fully_qualified_addr(&cpu->env, eaddr, &lpid, &pid)) {
> +ppc_radix64_raise_segi(cpu, rwx, eaddr, cause_excp);
> +return 1;
> +}
> +
> +/* Get Process Table */
> +if (cpu->vhyp) {
> +PPCVirtualHypervisorClass *vhc;
> +vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
> +

[PATCH] util/async: Add memory barrier to aio_ctx_prepare

2020-04-01 Thread Ying Fang
The QEMU main thread is found to hang in the mainloop when doing an
image format conversion on the aarch64 platform, and the hang is highly
reproducible by executing a test using:

qemu-img convert -f qcow2 -O qcow2 origin.qcow2 converted.qcow2

This mysterious hang can be explained by a race condition between
the main thread and an I/O worker thread. There is a window in which
the last worker thread has called aio_bh_schedule_oneshot and is
checking notify_me to decide whether to deliver a notify event. At the
same time, the main thread is calling aio_ctx_prepare, but it first
calls qemu_timeout_ns_to_ms, so the worker thread does not see
notify_me as true and does not send a notify event. The timeline
looks like this:

 Main Thread
 
 aio_ctx_prepare
atomic_or(&ctx->notify_me, 1);
/* out of order execution goes here */
*timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));

 Worker Thread
 ---
 aio_bh_schedule_oneshot -> aio_bh_enqueue
aio_notify
smp_mb();
if (ctx->notify_me) {   /* worker thread checks notify_me here */
event_notifier_set(&ctx->notifier);
atomic_mb_set(&ctx->notified, true);
}

Normal VM runtime is not affected by this hang since there is always some
timer timeout or subsequent I/O worker that comes along and notifies the
main thread. To fix this problem, a memory barrier is added to
aio_ctx_prepare, and it is proven to fix the hang in our tests.

This hang is not observed on the x86 platform, but it can be easily
reproduced on the aarch64 platform, so it is architecture related.
Not sure if this is relevant to commit eabc977973103527bbb8fed69c91cfaa6691f8ab.

Signed-off-by: Ying Fang 
Signed-off-by: zhanghailiang 
---
 util/async.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/util/async.c b/util/async.c
index b94518b948..a51fffc20d 100644
--- a/util/async.c
+++ b/util/async.c
@@ -251,6 +251,7 @@ aio_ctx_prepare(GSource *source, gint *timeout)
 
     atomic_or(&ctx->notify_me, 1);
 
+smp_mb();
 /* We assume there is no timeout already supplied */
 *timeout = qemu_timeout_ns_to_ms(aio_compute_timeout(ctx));
 
-- 
2.19.1





Re: [PATCH] Add PAT, cr8 and EFER for 32-bit qemu to hax ioctl interface

2020-04-01 Thread Colin Xu

Sorry for missing this one.

If I remember correctly, this patch together with the HAXM patch on 
GitHub will cause some regression in the SMP case. So we'd like to fully 
understand the technical details and make a proper change, not only for a 
very specific purpose, i.e. booting Windows on Windows.


This patch together with PR#204 doesn't change the IOCTL interface 
itself, but extends set/get regs with a version check, so the description 
here isn't quite clear.
And the change looks like it just syncs the QEMU/HAXM state, but no more. 
Could you provide more details on why Windows can't boot without the 
change? For example CR8 (TPR): is there logic that needs to be handled 
when the TPR is read or written?


On 2/29/2020 04:59, Alexey Romko wrote:

Add PAT, cr8 and EFER for 32-bit qemu to hax ioctl interface, part of HAX PR 204

Signed-off-by: Alexey Romko 
---
  target/i386/hax-all.c       | 32 
  target/i386/hax-i386.h      |  2 +-
  target/i386/hax-interface.h |  2 ++
  3 files changed, 31 insertions(+), 5 deletions(-)


diff --git a/target/i386/hax-all.c b/target/i386/hax-all.c
index a8b6e5aeb8..0bdd309665 100644
--- a/target/i386/hax-all.c
+++ b/target/i386/hax-all.c
@@ -45,7 +45,7 @@
      } while (0)
  
  /* Current version */

-const uint32_t hax_cur_version = 0x4; /* API v4: unmapping and MMIO moves */
+const uint32_t hax_cur_version = 0x5; /* API v5: supports CR8/EFER/PAT */
  /* Minimum HAX kernel version */
  const uint32_t hax_min_version = 0x4; /* API v4: supports unmapping */
  
@@ -137,6 +137,7 @@ static int hax_version_support(struct hax_state *hax)

          return 0;
      }
  
+    hax_global.cur_api_version = version.cur_version;

      return 1;
  }
  
@@ -845,12 +846,24 @@ static int hax_sync_vcpu_register(CPUArchState *env, int set)

          regs._cr2 = env->cr[2];
          regs._cr3 = env->cr[3];
          regs._cr4 = env->cr[4];
+
+        if( hax_global.cur_api_version >= 0x5 ) {
+          CPUState *cs = env_cpu(env);
+          X86CPU *x86_cpu = X86_CPU(cs);
+          regs._cr8 = cpu_get_apic_tpr(x86_cpu->apic_state);
+        }
+
          hax_set_segments(env, &regs);
      } else {
          env->cr[0] = regs._cr0;
          env->cr[2] = regs._cr2;
          env->cr[3] = regs._cr3;
          env->cr[4] = regs._cr4;
+
+        //if( hax_global.cur_api_version >= 0x5 ) {
+          //no need to update TPR from regs._cr8, since all changes are notified.
+        //}
+
          hax_get_segments(env, &regs);
      }
  
@@ -881,14 +894,18 @@ static int hax_get_msrs(CPUArchState *env)

      msrs[n++].entry = MSR_IA32_SYSENTER_ESP;
      msrs[n++].entry = MSR_IA32_SYSENTER_EIP;
      msrs[n++].entry = MSR_IA32_TSC;
-#ifdef TARGET_X86_64
      msrs[n++].entry = MSR_EFER;
+#ifdef TARGET_X86_64
      msrs[n++].entry = MSR_STAR;
      msrs[n++].entry = MSR_LSTAR;
      msrs[n++].entry = MSR_CSTAR;
      msrs[n++].entry = MSR_FMASK;
      msrs[n++].entry = MSR_KERNELGSBASE;
  #endif
+    if( hax_global.cur_api_version >= 0x5 ) {
+      msrs[n++].entry = MSR_PAT;
+    }
+
      md.nr_msr = n;
      ret = hax_sync_msr(env, &md, 0);
      if (ret < 0) {
@@ -909,10 +926,10 @@ static int hax_get_msrs(CPUArchState *env)
          case MSR_IA32_TSC:
              env->tsc = msrs[i].value;
              break;
-#ifdef TARGET_X86_64
          case MSR_EFER:
              env->efer = msrs[i].value;
              break;
+#ifdef TARGET_X86_64
          case MSR_STAR:
              env->star = msrs[i].value;
              break;
@@ -929,6 +946,9 @@ static int hax_get_msrs(CPUArchState *env)
              env->kernelgsbase = msrs[i].value;
              break;
  #endif
+        case MSR_PAT:
+            env->pat = msrs[i].value;
+            break;
          }
      }
  
@@ -947,14 +967,18 @@ static int hax_set_msrs(CPUArchState *env)

      hax_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_ESP, env->sysenter_esp);
      hax_msr_entry_set(&msrs[n++], MSR_IA32_SYSENTER_EIP, env->sysenter_eip);
      hax_msr_entry_set(&msrs[n++], MSR_IA32_TSC, env->tsc);
-#ifdef TARGET_X86_64
      hax_msr_entry_set(&msrs[n++], MSR_EFER, env->efer);
+#ifdef TARGET_X86_64
      hax_msr_entry_set(&msrs[n++], MSR_STAR, env->star);
      hax_msr_entry_set(&msrs[n++], MSR_LSTAR, env->lstar);
      hax_msr_entry_set(&msrs[n++], MSR_CSTAR, env->cstar);
      hax_msr_entry_set(&msrs[n++], MSR_FMASK, env->fmask);
      hax_msr_entry_set(&msrs[n++], MSR_KERNELGSBASE, env->kernelgsbase);
  #endif
+    if( hax_global.cur_api_version >= 0x5 ) {
+      hax_msr_entry_set(&msrs[n++], MSR_PAT, env->pat);
+    }
+
      md.nr_msr = n;
      md.done = 0;
  
diff --git a/target/i386/hax-i386.h b/target/i386/hax-i386.h

index 54e9d8b057..9515803184 100644
--- a/target/i386/hax-i386.h
+++ b/target/i386/hax-i386.h
@@ -34,7 +34,7 @@ struct hax_vcpu_state {
  
  struct hax_state {

      hax_fd fd; /* the global hax device interface */
-    uint32_t version;
+    uint32_t cur_api_version;
      struct hax_vm *vm;
      uint64_t mem_quota;
      bool supports_64bit_ramblock;
diff --git 

RE: [PATCH for-5.1 00/31] target/arm: SVE2, part 1

2020-04-01 Thread Ana Pazos
Hello Richard,

I want to introduce you to Stephen Long. He is our new hire who started this 
week.

I want to know if you are available for a sync-up meeting to discuss how we can 
cooperate on QEMU SVE2 support.

Thank you,
Ana.

-Original Message-
From: Richard Henderson 
Sent: Thursday, March 26, 2020 4:08 PM
To: qemu-devel@nongnu.org
Cc: qemu-...@nongnu.org; Ana Pazos ; Raja Venkateswaran 

Subject: [PATCH for-5.1 00/31] target/arm: SVE2, part 1

-
CAUTION: This email originated from outside of the organization.
-

Posting this for early review.  It's based on some other patch sets that I have 
posted recently that also touch SVE, listed below.  But it might just be easier 
to clone the devel tree [2].
While the branch itself will rebase frequently for development, I've also 
created a tag, post-sve2-20200326, for this posting.

This is mostly untested, as the most recently released Foundation Model does 
not support SVE2.  Some of the new instructions overlap with old fashioned 
NEON, and I can verify that those have not broken, and show that SVE2 will use 
the same code path.  But the predicated insns and bottom/top interleaved insns 
are not yet RISU testable, as I have nothing to compare against.

The patches are in general arranged so that one complete group of insns are 
added at once.  The groups within the manual [1] have so far been small-ish.


r~

---

[1] ISA manual: 
https://static.docs.arm.com/ddi0602/d/ISA_A64_xml_futureA-2019-12_OPT.pdf

[2] Devel tree: https://github.com/rth7680/qemu/tree/tgt-arm-sve-2

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=163610
("target/arm: sve load/store improvements")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164500
("target/arm: Use tcg_gen_gvec_5_ptr for sve FMLA/FCMLA")

Based-on: http://patchwork.ozlabs.org/project/qemu-devel/list/?series=164048
("target/arm: Implement ARMv8.5-MemTag, system mode")

Richard Henderson (31):
  target/arm: Add ID_AA64ZFR0 fields and isar_feature_aa64_sve2
  target/arm: Implement SVE2 Integer Multiply - Unpredicated
  target/arm: Implement SVE2 integer pairwise add and accumulate long
  target/arm: Remove fp_status from helper_{recpe,rsqrte}_u32
  target/arm: Implement SVE2 integer unary operations (predicated)
  target/arm: Split out saturating/rounding shifts from neon
  target/arm: Implement SVE2 saturating/rounding bitwise shift left
(predicated)
  target/arm: Implement SVE2 integer halving add/subtract (predicated)
  target/arm: Implement SVE2 integer pairwise arithmetic
  target/arm: Implement SVE2 saturating add/subtract (predicated)
  target/arm: Implement SVE2 integer add/subtract long
  target/arm: Implement SVE2 integer add/subtract interleaved long
  target/arm: Implement SVE2 integer add/subtract wide
  target/arm: Implement SVE2 integer multiply long
  target/arm: Implement PMULLB and PMULLT
  target/arm: Tidy SVE tszimm shift formats
  target/arm: Implement SVE2 bitwise shift left long
  target/arm: Implement SVE2 bitwise exclusive-or interleaved
  target/arm: Implement SVE2 bitwise permute
  target/arm: Implement SVE2 complex integer add
  target/arm: Implement SVE2 integer absolute difference and accumulate
long
  target/arm: Implement SVE2 integer add/subtract long with carry
  target/arm: Create arm_gen_gvec_[us]sra
  target/arm: Create arm_gen_gvec_{u,s}{rshr,rsra}
  target/arm: Implement SVE2 bitwise shift right and accumulate
  target/arm: Create arm_gen_gvec_{sri,sli}
  target/arm: Tidy handle_vec_simd_shri
  target/arm: Implement SVE2 bitwise shift and insert
  target/arm: Vectorize SABD/UABD
  target/arm: Vectorize SABA/UABA
  target/arm: Implement SVE2 integer absolute difference and accumulate

 target/arm/cpu.h   |  31 ++
 target/arm/helper-sve.h| 345 +
 target/arm/helper.h|  81 +++-
 target/arm/translate-a64.h |   9 +
 target/arm/translate.h |  24 +-
 target/arm/vec_internal.h  | 161 
 target/arm/sve.decode  | 217 ++-
 target/arm/helper.c|   3 +-
 target/arm/kvm64.c |   2 +
 target/arm/neon_helper.c   | 515 -
 target/arm/sve_helper.c| 757 ++---
 target/arm/translate-a64.c | 557 +++
 target/arm/translate-sve.c | 557 +++
 target/arm/translate.c | 626 ++
 target/arm/vec_helper.c| 411 
 target/arm/vfp_helper.c|   4 +-
 16 files changed, 3532 insertions(+), 768 deletions(-)
 create mode 100644 target/arm/vec_internal.h

--
2.20.1





Re: [PATCH] ppc/pnv: Introduce common PNV_SETFIELD() and PNV_GETFIELD() macros

2020-04-01 Thread David Gibson
On Wed, Apr 01, 2020 at 05:26:33PM +0200, Cédric Le Goater wrote:
> Most of QEMU definitions of the register fields of the PowerNV machine
> come from skiboot and the models duplicate a set of macros for this
> purpose. Make them common under the pnv_utils.h file.
> 
> Signed-off-by: Cédric Le Goater 

Hrm.  If we're touching these, would it make sense to rewrite them in
terms of the cross-qemu generic extract64() and deposit64()?
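
For illustration, a sketch of what that could look like (assumes ctpop64()
for the field width; not a posted patch):

    static inline uint64_t PNV_GETFIELD(uint64_t mask, uint64_t word)
    {
        return extract64(word, ctz64(mask), ctpop64(mask));
    }

    static inline uint64_t PNV_SETFIELD(uint64_t mask, uint64_t word,
                                        uint64_t value)
    {
        return deposit64(word, ctz64(mask), ctpop64(mask), value);
    }

That would keep the skiboot-style mask arguments while reusing the generic
accessors internally.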

> ---
>  include/hw/pci-host/pnv_phb3_regs.h | 16 --
>  include/hw/ppc/pnv_utils.h  | 29 +++
>  hw/intc/pnv_xive.c  | 76 -
>  hw/pci-host/pnv_phb3.c  | 32 ++--
>  hw/pci-host/pnv_phb3_msi.c  | 24 -
>  hw/pci-host/pnv_phb4.c  | 51 ---
>  6 files changed, 108 insertions(+), 120 deletions(-)
>  create mode 100644 include/hw/ppc/pnv_utils.h
> 
> diff --git a/include/hw/pci-host/pnv_phb3_regs.h 
> b/include/hw/pci-host/pnv_phb3_regs.h
> index a174ef1f7045..38f8ce9d7406 100644
> --- a/include/hw/pci-host/pnv_phb3_regs.h
> +++ b/include/hw/pci-host/pnv_phb3_regs.h
> @@ -12,22 +12,6 @@
>  
>  #include "qemu/host-utils.h"
>  
> -/*
> - * QEMU version of the GETFIELD/SETFIELD macros
> - *
> - * These are common with the PnvXive model.
> - */
> -static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
> -{
> -return (word & mask) >> ctz64(mask);
> -}
> -
> -static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
> -uint64_t value)
> -{
> -return (word & ~mask) | ((value << ctz64(mask)) & mask);
> -}
> -
>  /*
>   * PBCQ XSCOM registers
>   */
> diff --git a/include/hw/ppc/pnv_utils.h b/include/hw/ppc/pnv_utils.h
> new file mode 100644
> index ..8521e13b5149
> --- /dev/null
> +++ b/include/hw/ppc/pnv_utils.h
> @@ -0,0 +1,29 @@
> +/*
> + * QEMU PowerPC PowerNV utilities
> + *
> + * Copyright (c) 2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PPC_PNV_UTILS_H
> +#define PPC_PNV_UTILS_H
> +
> +/*
> + * QEMU version of the GETFIELD/SETFIELD macros used in skiboot to
> + * define the register fields.
> + */
> +
> +static inline uint64_t PNV_GETFIELD(uint64_t mask, uint64_t word)
> +{
> +return (word & mask) >> ctz64(mask);
> +}
> +
> +static inline uint64_t PNV_SETFIELD(uint64_t mask, uint64_t word,
> +uint64_t value)
> +{
> +return (word & ~mask) | ((value << ctz64(mask)) & mask);
> +}
> +
> +#endif /* PPC_PNV_UTILS_H */
> diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
> index aeda488bd113..77cacdd6c623 100644
> --- a/hw/intc/pnv_xive.c
> +++ b/hw/intc/pnv_xive.c
> @@ -21,6 +21,7 @@
>  #include "hw/ppc/pnv_core.h"
>  #include "hw/ppc/pnv_xscom.h"
>  #include "hw/ppc/pnv_xive.h"
> +#include "hw/ppc/pnv_utils.h" /* SETFIELD() and GETFIELD() macros */
>  #include "hw/ppc/xive_regs.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/ppc/ppc.h"
> @@ -65,26 +66,6 @@ static const XiveVstInfo vst_infos[] = {
>  qemu_log_mask(LOG_GUEST_ERROR, "XIVE[%x] - " fmt "\n",  \
>(xive)->chip->chip_id, ## __VA_ARGS__);
>  
> -/*
> - * QEMU version of the GETFIELD/SETFIELD macros
> - *
> - * TODO: It might be better to use the existing extract64() and
> - * deposit64() but this means that all the register definitions will
> - * change and become incompatible with the ones found in skiboot.
> - *
> - * Keep it as it is for now until we find a common ground.
> - */
> -static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
> -{
> -return (word & mask) >> ctz64(mask);
> -}
> -
> -static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
> -uint64_t value)
> -{
> -return (word & ~mask) | ((value << ctz64(mask)) & mask);
> -}
> -
>  /*
>   * When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID
>   * field overrides the hardwired chip ID in the Powerbus operations
> @@ -96,7 +77,7 @@ static uint8_t pnv_xive_block_id(PnvXive *xive)
>  uint64_t cfg_val = xive->regs[PC_TCTXT_CFG >> 3];
>  
>  if (cfg_val & PC_TCTXT_CHIPID_OVERRIDE) {
> -blk = GETFIELD(PC_TCTXT_CHIPID, cfg_val);
> +blk = PNV_GETFIELD(PC_TCTXT_CHIPID, cfg_val);
>  }
>  
>  return blk;
> @@ -145,7 +126,7 @@ static uint64_t pnv_xive_vst_addr_direct(PnvXive *xive, 
> uint32_t type,
>  {
>  const XiveVstInfo *info = _infos[type];
>  uint64_t vst_addr = vsd & VSD_ADDRESS_MASK;
> -uint64_t vst_tsize = 1ull << (GETFIELD(VSD_TSIZE, vsd) + 12);
> +uint64_t vst_tsize = 1ull << (PNV_GETFIELD(VSD_TSIZE, vsd) + 12);
>  uint32_t idx_max;
>  
>  idx_max = vst_tsize / info->size - 1;
> @@ -180,7 +161,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, 
> uint32_t type,
>  return 0;
>  }
>  
> -page_shift = GETFIELD(VSD_TSIZE, vsd) + 12;
> 

Re: [PATCH v2 13/22] intel_iommu: add PASID cache management infrastructure

2020-04-01 Thread Peter Xu
On Sun, Mar 29, 2020 at 09:24:52PM -0700, Liu Yi L wrote:
> This patch adds a PASID cache management infrastructure based on
> new added structure VTDPASIDAddressSpace, which is used to track
> the PASID usage and future PASID tagged DMA address translation
> support in vIOMMU.
> 
> struct VTDPASIDAddressSpace {
> VTDBus *vtd_bus;
> uint8_t devfn;
> AddressSpace as;
> uint32_t pasid;
> IntelIOMMUState *iommu_state;
> VTDContextCacheEntry context_cache_entry;
> QLIST_ENTRY(VTDPASIDAddressSpace) next;
> VTDPASIDCacheEntry pasid_cache_entry;
> };
> 
> Ideally, a VTDPASIDAddressSpace instance is created when a PASID
> is bound with a DMA AddressSpace. Intel VT-d spec requires guest
> software to issue pasid cache invalidation when bind or unbind a
> pasid with an address space under caching-mode. However, as
> VTDPASIDAddressSpace instances also act as pasid cache in this
> implementation, its creation also happens during vIOMMU PASID
> tagged DMA translation. The creation in this path will not be
> added in this patch since no PASID-capable emulated devices for
> now.
> 
> The implementation in this patch manages VTDPASIDAddressSpace
> instances per PASID+BDF (lookup and insert will use PASID and
> BDF) since Intel VT-d spec allows per-BDF PASID Table. When a
> guest binds a PASID to an AddressSpace, QEMU will capture the
> guest's PASID-selective pasid cache invalidation, and allocate or
> remove a VTDPASIDAddressSpace instance according to the invalidation
> reason:
> 
> *) a present pasid entry moved to non-present
> *) a present pasid entry to be a present entry
> *) a non-present pasid entry moved to present
> 
> vIOMMU emulator could figure out the reason by fetching latest
> guest pasid entry.
> 
> v1 -> v2: - merged this patch with former replay binding patch, makes
> PSI/DSI/GSI use the unified function to do cache invalidation
> and pasid binding replay.
>   - dropped pasid_cache_gen in both iommu_state and vtd_pasid_as
> as it is not necessary so far; we may want it when we one day
> introduce an emulated SVA-capable device.
> 
> Cc: Kevin Tian 
> Cc: Jacob Pan 
> Cc: Peter Xu 
> Cc: Yi Sun 
> Cc: Paolo Bonzini 
> Cc: Richard Henderson 
> Cc: Eduardo Habkost 
> Signed-off-by: Liu Yi L 
> ---
>  hw/i386/intel_iommu.c  | 473 
> +
>  hw/i386/intel_iommu_internal.h |  18 ++
>  hw/i386/trace-events   |   1 +
>  include/hw/i386/intel_iommu.h  |  24 +++
>  4 files changed, 516 insertions(+)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index 2eb60c3..a7e9973 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -40,6 +40,7 @@
>  #include "kvm_i386.h"
>  #include "migration/vmstate.h"
>  #include "trace.h"
> +#include "qemu/jhash.h"
>  
>  /* context entry operations */
>  #define VTD_CE_GET_RID2PASID(ce) \
> @@ -65,6 +66,8 @@
>  static void vtd_address_space_refresh_all(IntelIOMMUState *s);
>  static void vtd_address_space_unmap(VTDAddressSpace *as, IOMMUNotifier *n);
>  
> +static void vtd_pasid_cache_reset(IntelIOMMUState *s);
> +
>  static void vtd_panic_require_caching_mode(void)
>  {
>  error_report("We need to set caching-mode=on for intel-iommu to enable "
> @@ -276,6 +279,7 @@ static void vtd_reset_caches(IntelIOMMUState *s)
>  vtd_iommu_lock(s);
>  vtd_reset_iotlb_locked(s);
>  vtd_reset_context_cache_locked(s);
> +vtd_pasid_cache_reset(s);
>  vtd_iommu_unlock(s);
>  }
>  
> @@ -686,6 +690,16 @@ static inline bool vtd_pe_type_check(X86IOMMUState 
> *x86_iommu,
>  return true;
>  }
>  
> +static inline uint16_t vtd_pe_get_domain_id(VTDPASIDEntry *pe)
> +{
> +return VTD_SM_PASID_ENTRY_DID((pe)->val[1]);
> +}
> +
> +static inline uint32_t vtd_sm_ce_get_pdt_entry_num(VTDContextEntry *ce)
> +{
> +return 1U << (VTD_SM_CONTEXT_ENTRY_PDTS(ce->val[0]) + 7);
> +}
> +
>  static inline bool vtd_pdire_present(VTDPASIDDirEntry *pdire)
>  {
>  return pdire->val & 1;
> @@ -2395,9 +2409,452 @@ static bool vtd_process_iotlb_desc(IntelIOMMUState 
> *s, VTDInvDesc *inv_desc)
>  return true;
>  }
>  
> +static inline void vtd_init_pasid_key(uint32_t pasid,
> + uint16_t sid,
> + struct pasid_key *key)
> +{
> +key->pasid = pasid;
> +key->sid = sid;
> +}
> +
> +static guint vtd_pasid_as_key_hash(gconstpointer v)
> +{
> +struct pasid_key *key = (struct pasid_key *)v;
> +uint32_t a, b, c;
> +
> +/* Jenkins hash */
> +a = b = c = JHASH_INITVAL + sizeof(*key);
> +a += key->sid;
> +b += extract32(key->pasid, 0, 16);
> +c += extract32(key->pasid, 16, 16);
> +
> +__jhash_mix(a, b, c);
> +__jhash_final(a, b, c);
> +
> +return c;
> +}
> +
> +static gboolean vtd_pasid_as_key_equal(gconstpointer v1, gconstpointer v2)
> +{
> +const struct pasid_key *k1 

Re: Question on dirty sync before kvm memslot removal

2020-04-01 Thread Peter Xu
On Wed, Apr 01, 2020 at 01:12:04AM +0200, Paolo Bonzini wrote:
> On 31/03/20 18:51, Peter Xu wrote:
> > On Tue, Mar 31, 2020 at 05:34:43PM +0200, Paolo Bonzini wrote:
> >> On 31/03/20 17:23, Peter Xu wrote:
>  Or KVM_MEM_READONLY.
> >>> Yeah, I used a new flag because I thought READONLY was a bit tricky to
> >>> be used directly here.  The thing is IIUC if guest writes to a
> >>> READONLY slot then KVM should either ignore the write or trigger an
> >>> error which I didn't check, however here what we want to do is to let
> >>> the write to fallback to the userspace so it's neither dropped (we
> >>> still want the written data to land gracefully on RAM), nor triggering
> >>> an error (because the slot is actually writable).
> >>
> >> No, writes fall back to userspace with KVM_MEM_READONLY.
> > 
> > I read that __kvm_write_guest_page() will return -EFAULT when writting
> > to the read-only memslot, and e.g. kvm_write_guest_virt_helper() will
> > return with X86EMUL_IO_NEEDED, which will be translated into a
> > EMULATION_OK in x86_emulate_insn().  Then in x86_emulate_instruction()
> > it seems to get a "1" returned (note that I think it does not set
> > either vcpu->arch.pio.count or vcpu->mmio_needed).  Does that mean
> > it'll retry the write forever instead of quit into the userspace?  I
> > may possibly have misread somewhere, though..
> 
> We are definitely relying on KVM_MEM_READONLY to exit to userspace, in
> order to emulate flash memory.
> 
> > However... I think I might find another race with this:
> > 
> >   main thread   vcpu thread
> >   ---   ---
> > dirty GFN1, cached in PML
> > ...
> >   remove memslot1 of GFN1
> > set slot READONLY (whatever, or INVALID)
> > sync log (NOTE: no GFN1 yet)
> > vmexit, flush PML with RCU
> > (will flush to old bitmap)  <--- [1]
> > delete memslot1 (old bitmap freed)  <--- [2]
> >   add memslot2 of GFN1 (memslot2 could be smaller)
> > add memslot2
> > 
> > I'm not 100% sure, but I think GFN1's dirty bit will be lost though
> > it's correctly applied at [1] but quickly freed at [2].
> 
> Yes, we probably need to do a mass vCPU kick when a slot is made
> READONLY, before KVM_SET_USER_MEMORY_REGION returns (and after releasing
> slots_lock).  It makes sense to guarantee that you can't get any more
> dirtying after KVM_SET_USER_MEMORY_REGION returns.

Sounds doable.  Though we still need a synchronous way to kick vcpus
in KVM to make sure the PML is flushed before KVM_SET_MEMORY_REGION
returns, am I right?  Is there an existing good way to do this?
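
(Sketching the kind of thing I have in mind, treating
kvm_make_all_cpus_request() and the KVM_REQUEST_WAIT flag as assumptions
about the current kernel code, and KVM_REQ_FLUSH_PML as a made-up request
name:)

    /* kernel side, after marking the slot READONLY/INVALID */
    kvm_make_all_cpus_request(kvm, KVM_REQ_FLUSH_PML);
    /*
     * A request defined with KVM_REQUEST_WAIT kicks every vCPU out of
     * guest mode and waits for each to acknowledge, so pending PML
     * entries would be flushed before KVM_SET_USER_MEMORY_REGION returns.
     */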

Thanks,

-- 
Peter Xu




Re: [PATCH v2 05/10] target/xtensa: add FIXME for translation memory leak

2020-04-01 Thread Max Filippov
On Wed, Apr 1, 2020 at 2:48 AM Alex Bennée  wrote:
>
> Dynamically allocating a new structure within the DisasContext can
> potentially leak as we can longjmp out of the translation loop (see
> test_phys_mem). The proper fix would be to use static allocation
> within the DisasContext, but as the Xtensa translator imports its code
> from elsewhere I leave that as an exercise for the maintainer.
>
> Signed-off-by: Alex Bennée 
> Cc: Max Filippov 
> ---
>  target/xtensa/translate.c | 5 +
>  1 file changed, 5 insertions(+)

Acked-by: Max Filippov 

-- 
Thanks.
-- Max



[PATCH 1/1] target/i386: fix phadd* with identical destination and source register

2020-04-01 Thread Janne Grunau
Detected by asm test suite failures in dav1d
(https://code.videolan.org/videolan/dav1d). Can be reproduced by
`qemu-x86_64 -cpu core2duo ./tests/checkasm --test=mc_8bpc 1659890620`.

Signed-off-by: Janne Grunau 
---
 target/i386/ops_sse.h | 53 +++
 1 file changed, 33 insertions(+), 20 deletions(-)

diff --git a/target/i386/ops_sse.h b/target/i386/ops_sse.h
index ec1ec745d0..2f41511aef 100644
--- a/target/i386/ops_sse.h
+++ b/target/i386/ops_sse.h
@@ -1435,34 +1435,47 @@ void glue(helper_pshufb, SUFFIX)(CPUX86State *env, Reg 
*d, Reg *s)
 
 void glue(helper_phaddw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-d->W(0) = (int16_t)d->W(0) + (int16_t)d->W(1);
-d->W(1) = (int16_t)d->W(2) + (int16_t)d->W(3);
-XMM_ONLY(d->W(2) = (int16_t)d->W(4) + (int16_t)d->W(5));
-XMM_ONLY(d->W(3) = (int16_t)d->W(6) + (int16_t)d->W(7));
-d->W((2 << SHIFT) + 0) = (int16_t)s->W(0) + (int16_t)s->W(1);
-d->W((2 << SHIFT) + 1) = (int16_t)s->W(2) + (int16_t)s->W(3);
-XMM_ONLY(d->W(6) = (int16_t)s->W(4) + (int16_t)s->W(5));
-XMM_ONLY(d->W(7) = (int16_t)s->W(6) + (int16_t)s->W(7));
+
+Reg r;
+
+r.W(0) = (int16_t)d->W(0) + (int16_t)d->W(1);
+r.W(1) = (int16_t)d->W(2) + (int16_t)d->W(3);
+XMM_ONLY(r.W(2) = (int16_t)d->W(4) + (int16_t)d->W(5));
+XMM_ONLY(r.W(3) = (int16_t)d->W(6) + (int16_t)d->W(7));
+r.W((2 << SHIFT) + 0) = (int16_t)s->W(0) + (int16_t)s->W(1);
+r.W((2 << SHIFT) + 1) = (int16_t)s->W(2) + (int16_t)s->W(3);
+XMM_ONLY(r.W(6) = (int16_t)s->W(4) + (int16_t)s->W(5));
+XMM_ONLY(r.W(7) = (int16_t)s->W(6) + (int16_t)s->W(7));
+
+*d = r;
 }
 
 void glue(helper_phaddd, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-d->L(0) = (int32_t)d->L(0) + (int32_t)d->L(1);
-XMM_ONLY(d->L(1) = (int32_t)d->L(2) + (int32_t)d->L(3));
-d->L((1 << SHIFT) + 0) = (int32_t)s->L(0) + (int32_t)s->L(1);
-XMM_ONLY(d->L(3) = (int32_t)s->L(2) + (int32_t)s->L(3));
+Reg r;
+
+r.L(0) = (int32_t)d->L(0) + (int32_t)d->L(1);
+XMM_ONLY(r.L(1) = (int32_t)d->L(2) + (int32_t)d->L(3));
+r.L((1 << SHIFT) + 0) = (int32_t)s->L(0) + (int32_t)s->L(1);
+XMM_ONLY(r.L(3) = (int32_t)s->L(2) + (int32_t)s->L(3));
+
+*d = r;
 }
 
 void glue(helper_phaddsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
 {
-d->W(0) = satsw((int16_t)d->W(0) + (int16_t)d->W(1));
-d->W(1) = satsw((int16_t)d->W(2) + (int16_t)d->W(3));
-XMM_ONLY(d->W(2) = satsw((int16_t)d->W(4) + (int16_t)d->W(5)));
-XMM_ONLY(d->W(3) = satsw((int16_t)d->W(6) + (int16_t)d->W(7)));
-d->W((2 << SHIFT) + 0) = satsw((int16_t)s->W(0) + (int16_t)s->W(1));
-d->W((2 << SHIFT) + 1) = satsw((int16_t)s->W(2) + (int16_t)s->W(3));
-XMM_ONLY(d->W(6) = satsw((int16_t)s->W(4) + (int16_t)s->W(5)));
-XMM_ONLY(d->W(7) = satsw((int16_t)s->W(6) + (int16_t)s->W(7)));
+Reg r;
+
+r.W(0) = satsw((int16_t)d->W(0) + (int16_t)d->W(1));
+r.W(1) = satsw((int16_t)d->W(2) + (int16_t)d->W(3));
+XMM_ONLY(r.W(2) = satsw((int16_t)d->W(4) + (int16_t)d->W(5)));
+XMM_ONLY(r.W(3) = satsw((int16_t)d->W(6) + (int16_t)d->W(7)));
+r.W((2 << SHIFT) + 0) = satsw((int16_t)s->W(0) + (int16_t)s->W(1));
+r.W((2 << SHIFT) + 1) = satsw((int16_t)s->W(2) + (int16_t)s->W(3));
+XMM_ONLY(r.W(6) = satsw((int16_t)s->W(4) + (int16_t)s->W(5)));
+XMM_ONLY(r.W(7) = satsw((int16_t)s->W(6) + (int16_t)s->W(7)));
+
+*d = r;
 }
 
 void glue(helper_pmaddubsw, SUFFIX)(CPUX86State *env, Reg *d, Reg *s)
-- 
2.25.1




[PATCH for-5.0?] nbd: Attempt reconnect after server error of ESHUTDOWN

2020-04-01 Thread Eric Blake
I was trying to test qemu's reconnect-delay parameter by using nbdkit
as a server that I could easily make disappear and resume.  A bit of
experimenting shows that when nbdkit is abruptly killed (SIGKILL),
qemu detects EOF on the socket and manages to reconnect just fine; but
when nbdkit is gracefully killed (SIGTERM), it merely fails all
further guest requests with NBD_ESHUTDOWN until the client disconnects
first, and qemu was blindly failing the I/O request with ESHUTDOWN
from the server instead of attempting to reconnect.

While most NBD server failures are unlikely to change by merely
retrying the same transaction, our decision to not start a retry loop
in the common case is correct.  But NBD_ESHUTDOWN is rare enough, and
really is indicative of a transient situation, that it is worth
special-casing.

Here's the test setup I used: in one terminal, kick off a sequence of
nbdkit commands that has a temporary window where the server is
offline; in another terminal (and within the first 5 seconds) kick off
a qemu-img convert with reconnect enabled.  If the qemu-img process
completes successfully, the reconnect worked.

$ #term1
$ MYSIG=    # or MYSIG='-s KILL'
$ timeout $MYSIG 5s ~/nbdkit/nbdkit -fv --filter=delay --filter=noextents \
  null 200M delay-read=1s; sleep 5; ~/nbdkit/nbdkit -fv --filter=exitlast \
  --filter=delay --filter=noextents null 200M delay-read=1s

$ #term2
$ MYCONN=server.type=inet,server.host=localhost,server.port=10809
$ qemu-img convert -p -O raw --image-opts \
  driver=nbd,$MYCONN,,reconnect-delay=60 out.img

See also: https://bugzilla.redhat.com/show_bug.cgi?id=1819240#c8

Signed-off-by: Eric Blake 
---

This is not a regression, per se, as reconnect-delay has been unchanged
since 4.2; but I'd like to consider this as an interoperability bugfix
worth including in the next rc.

 block/nbd.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/block/nbd.c b/block/nbd.c
index 2906484390f9..576b95fb8753 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -863,6 +863,15 @@ static coroutine_fn int nbd_co_receive_one_chunk(
 if (ret < 0) {
 memset(reply, 0, sizeof(*reply));
 nbd_channel_error(s, ret);
+} else if (s->reconnect_delay && *request_ret == -ESHUTDOWN) {
+/*
+ * Special case: if we support reconnect and server is warning
+ * us that it wants to shut down, then treat this like an
+ * abrupt connection loss.
+ */
+memset(reply, 0, sizeof(*reply));
+*request_ret = 0;
+nbd_channel_error(s, -EIO);
 } else {
 /* For assert at loop start in nbd_connection_entry */
 *reply = s->reply;
-- 
2.26.0.rc2




Re: [PATCH for-5.0 v2] qemu-img: Report convert errors by bytes, not sectors

2020-04-01 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20200401180436.298613-1-ebl...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

Not run: 259
Failures: 244
Failed 1 of 116 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs
  TESTcheck-qtest-aarch64: tests/qtest/test-hmp
  TESTcheck-qtest-aarch64: tests/qtest/qos-test
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=4ed9f742ca544eb99a9fdb6a074a1398', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-krxqwjvg/src/docker-src.2020-04-01-17.56.22.16834:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=4ed9f742ca544eb99a9fdb6a074a1398
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-krxqwjvg/src'
make: *** [docker-run-test-quick@centos7] Error 2

real14m48.713s
user0m9.005s


The full log is available at
http://patchew.org/logs/20200401180436.298613-1-ebl...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH-for-5.0 1/7] tests/acceptance/machine_sparc_leon3: Disable HelenOS test

2020-04-01 Thread Philippe Mathieu-Daudé

On 4/1/20 10:30 PM, Willian Rampazzo wrote:

On Wed, Apr 1, 2020 at 5:21 PM Philippe Mathieu-Daudé  wrote:
  

Odd, with avocado-framework==76.0 I get:

https://travis-ci.org/github/philmd/qemu/jobs/669851870#L4908

Traceback (most recent call last):
File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
  "__main__", mod_spec)
File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
  exec(code, run_globals)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6zsite-packages/avocado/__main__.py",
line 11, in 
  sys.exit(main.run())
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/core/app.py",
line 91, in run
  return method(self.parser.config)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
line 291, in run
  success, fail = fetch_assets(test_file)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
line 200, in fetch_assets
  handler = FetchAssetHandler(test_file, klass, method)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
line 65, in __init__
  self.visit(self.tree)
File "/usr/lib/python3.6/ast.py", line 253, in visit
  return visitor(node)
File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
  self.visit(item)
File "/usr/lib/python3.6/ast.py", line 253, in visit
  return visitor(node)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
line 139, in visit_ClassDef
  self.generic_visit(node)
File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
  self.visit(item)
File "/usr/lib/python3.6/ast.py", line 253, in visit
  return visitor(node)
File
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
line 171, in visit_Assign
  self.asgmts[cur_klass][cur_method][name] = node.value.s
KeyError: 'launch_and_wait'
/home/travis/build/philmd/qemu/tests/Makefile.include:910: recipe for
target 'fetch-acceptance-assets' failed

This launch_and_wait comes from:

tests/acceptance/boot_linux.py:88:def launch_and_wait(self):


Sorry about that. This is a known bug, see
https://github.com/avocado-framework/avocado/issues/3661. It is fixed
upstream and will be available in the next release of Avocado.


Thanks for the quick reply :)

I'm now using this kludge to include your bugfix:

-- >8 --
diff --git a/tests/requirements.txt b/tests/requirements.txt
index f9c84b4ba1..d625b32dbb 100644
--- a/tests/requirements.txt
+++ b/tests/requirements.txt
@@ -1,5 +1,5 @@
 # Add Python module requirements, one per line, to be installed
 # in the tests/venv Python virtual environment. For more info,
 # refer to: https://pip.pypa.io/en/stable/user_guide/#id1
-avocado-framework==76.0
+-e 
git+https://github.com/avocado-framework/avocado.git@f9b4dc7c58a6424eb8d0ed6781a1d76ae3a5ab06#egg=avocado-framework

 pycdlib==1.9.0
---

But I'm getting another error:

https://travis-ci.org/github/philmd/qemu/builds/669898682#L1702

...
avocado.test: Asset not in cache, fetching it.
avocado.test: Fetching 
ftp://ftp.boulder.ibm.com/rs6000/firmware/7020-40p/P12H0456.IMG -> 
/home/travis/avocado/data/cache/by_location/9234e0fdde347e2a4604c133fa2c8d9e9291/P12H0456.IMG.dp3lw27q
avocado.test: FileNotFoundError: [Errno 2] No such file or directory: 
'/home/travis/avocado/data/cache/by_location/9234e0fdde347e2a4604c133fa2c8d9e9291/P12H0456.IMG.dp3lw27q'

Failed to fetch P12H0456.IMG.
/home/travis/build/philmd/qemu/tests/Makefile.include:910: recipe for 
target 'fetch-acceptance-assets' failed

make: *** [fetch-acceptance-assets] Error 4

I don't understand, because all the other directories are created. I'm 
not sure what is missing here, any idea?


(test branch is https://github.com/philmd/qemu/commits/travis_fetch_avocado)

Thanks,

Phil.




Re: [PATCH 0/3] DirectSound fixes for 5.0

2020-04-01 Thread Volker Rümelin
Sorry, please ignore this patch series. Patch 1/3 "dsoundaudio:
fix never-ending playback loop" is wrong. I'll send a version 2.

With best regards,
Volker



[PATCH for-5.0] configure: Add -Werror to PIE probe

2020-04-01 Thread Richard Henderson
Without -Werror, the probe may succeed, but then compilation fails
later when -Werror is added for other reasons.  Shows up on windows,
where the compiler complains about -fPIC.

Signed-off-by: Richard Henderson 
---
 configure | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/configure b/configure
index 22870f3867..233c671aaa 100755
--- a/configure
+++ b/configure
@@ -2119,7 +2119,7 @@ if compile_prog "-Werror -fno-pie" "-no-pie"; then
 fi
 
 if test "$static" = "yes"; then
-  if test "$pie" != "no" && compile_prog "-fPIE -DPIE" "-static-pie"; then
+  if test "$pie" != "no" && compile_prog "-Werror -fPIE -DPIE" "-static-pie"; 
then
 QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
 QEMU_LDFLAGS="-static-pie $QEMU_LDFLAGS"
 pie="yes"
@@ -2132,7 +2132,7 @@ if test "$static" = "yes"; then
 elif test "$pie" = "no"; then
   QEMU_CFLAGS="$CFLAGS_NOPIE $QEMU_CFLAGS"
   QEMU_LDFLAGS="$LDFLAGS_NOPIE $QEMU_LDFLAGS"
-elif compile_prog "-fPIE -DPIE" "-pie"; then
+elif compile_prog "-Werror -fPIE -DPIE" "-pie"; then
   QEMU_CFLAGS="-fPIE -DPIE $QEMU_CFLAGS"
   QEMU_LDFLAGS="-pie $QEMU_LDFLAGS"
   pie="yes"
-- 
2.20.1




Re: [PATCH-for-5.0 1/7] tests/acceptance/machine_sparc_leon3: Disable HelenOS test

2020-04-01 Thread Willian Rampazzo
On Wed, Apr 1, 2020 at 5:21 PM Philippe Mathieu-Daudé  wrote:
 
> Odd, with avocado-framework==76.0 I get:
>
> https://travis-ci.org/github/philmd/qemu/jobs/669851870#L4908
>
> Traceback (most recent call last):
>File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
>  "__main__", mod_spec)
>File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
>  exec(code, run_globals)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6zsite-packages/avocado/__main__.py",
> line 11, in 
>  sys.exit(main.run())
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/core/app.py",
> line 91, in run
>  return method(self.parser.config)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
> line 291, in run
>  success, fail = fetch_assets(test_file)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
> line 200, in fetch_assets
>  handler = FetchAssetHandler(test_file, klass, method)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
> line 65, in __init__
>  self.visit(self.tree)
>File "/usr/lib/python3.6/ast.py", line 253, in visit
>  return visitor(node)
>File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
>  self.visit(item)
>File "/usr/lib/python3.6/ast.py", line 253, in visit
>  return visitor(node)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
> line 139, in visit_ClassDef
>  self.generic_visit(node)
>File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
>  self.visit(item)
>File "/usr/lib/python3.6/ast.py", line 253, in visit
>  return visitor(node)
>File
> "/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py",
> line 171, in visit_Assign
>  self.asgmts[cur_klass][cur_method][name] = node.value.s
> KeyError: 'launch_and_wait'
> /home/travis/build/philmd/qemu/tests/Makefile.include:910: recipe for
> target 'fetch-acceptance-assets' failed
>
> This launch_and_wait comes from:
>
> tests/acceptance/boot_linux.py:88:def launch_and_wait(self):

Sorry about that. This is a known bug, see
https://github.com/avocado-framework/avocado/issues/3661. It is fixed
upstream and will be available in the next release of Avocado.

Willian




Re: [PATCH-for-5.0 1/7] tests/acceptance/machine_sparc_leon3: Disable HelenOS test

2020-04-01 Thread Philippe Mathieu-Daudé

On 4/1/20 7:43 PM, Willian Rampazzo wrote:

On Tue, Mar 31, 2020 at 5:07 PM Philippe Mathieu-Daudé
 wrote:



First job failed by timeout, 2nd succeeded:
https://travis-ci.org/github/philmd/qemu/jobs/669265466

However "Ran for 46 min 48 sec"

  From the log:

Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_bionic
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9
Fetching asset from
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9
Fetching asset from
tests/acceptance/ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_and_netbsd
...
   (13/82)
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio:
   SKIP: untrusted code
   (24/82)
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_bionic:
   SKIP: storage limited
...
   (25/82)
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9:
   SKIP: storage limited
...
   (63/82)
tests/acceptance/ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_and_netbsd:
   SKIP: Running on Travis-CI

Is it possible that we are now fetching assets for tests we are not
running? In particular the one marked @skip because the downloading time
was too long on Travis?


Yes, your assumption is correct, this execution of Avocado downloaded
assets for tests that were skipped. Let me try to explain how the
asset feature works today on Avocado.

Avocado has two basic ways to work with assets:

1. Parse the limited use cases of the `fetch_asset` call in the test
file and execute them. This operation can happen in two different
scenarios. First, when using the command line `avocado assets fetch `.


Odd, with avocado-framework==76.0 I get:

https://travis-ci.org/github/philmd/qemu/jobs/669851870#L4908

Traceback (most recent call last):
  File "/usr/lib/python3.6/runpy.py", line 193, in _run_module_as_main
"__main__", mod_spec)
  File "/usr/lib/python3.6/runpy.py", line 85, in _run_code
exec(code, run_globals)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6zsite-packages/avocado/__main__.py", 
line 11, in 

sys.exit(main.run())
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/core/app.py", 
line 91, in run

return method(self.parser.config)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py", 
line 291, in run

success, fail = fetch_assets(test_file)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py", 
line 200, in fetch_assets

handler = FetchAssetHandler(test_file, klass, method)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py", 
line 65, in __init__

self.visit(self.tree)
  File "/usr/lib/python3.6/ast.py", line 253, in visit
return visitor(node)
  File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
self.visit(item)
  File "/usr/lib/python3.6/ast.py", line 253, in visit
return visitor(node)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py", 
line 139, in visit_ClassDef

self.generic_visit(node)
  File "/usr/lib/python3.6/ast.py", line 261, in generic_visit
self.visit(item)
  File "/usr/lib/python3.6/ast.py", line 253, in visit
return visitor(node)
  File 
"/home/travis/build/philmd/qemu/build/tests/venv/lib/python3.6/site-packages/avocado/plugins/assets.py", 
line 171, in visit_Assign

self.asgmts[cur_klass][cur_method][name] = node.value.s
KeyError: 'launch_and_wait'
/home/travis/build/philmd/qemu/tests/Makefile.include:910: recipe for 
target 'fetch-acceptance-assets' failed


This launch_and_wait comes from:

tests/acceptance/boot_linux.py:88:def launch_and_wait(self):


In this case, it is a standalone execution of each fetch call and the
test is not executed at all. Second, by running the test. The
FetchAssetJob plugin, enabled by default, will do the same operation of
parsing the test file and executing the occurrences of the `fetch_asset`
call before the tests start to run. Again, the fetch time is not

Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Peter Maydell
On Wed, 1 Apr 2020 at 10:03, Markus Armbruster  wrote:
>
> QEMU's Error was patterned after GLib's GError.  Differences include:

From my POV the major problem with Error as we have it today
is that it makes the simple process of writing code like
device realize functions horrifically boilerplate heavy;
for instance this is from hw/arm/armsse.c:

object_property_set_link(cpuobj, OBJECT(&s->cpu_container[i]),
 "memory", &err);
if (err) {
error_propagate(errp, err);
return;
}
object_property_set_link(cpuobj, OBJECT(s), "idau", &err);
if (err) {
error_propagate(errp, err);
return;
}
object_property_set_bool(cpuobj, true, "realized", &err);
if (err) {
error_propagate(errp, err);
return;
}

16 lines of code just to set 2 properties on an object
and realize it. It's a lot of boilerplate and as
a result we frequently get it wrong or take shortcuts
(eg forgetting the error-handling entirely, calling
error_propagate just once for a whole sequence of
calls, taking the lazy approach and using err_abort
or err_fatal when we ought really to be propagating
an error, etc). I haven't looked at 'auto propagation'
yet, hopefully it will help?

thanks
-- PMM



Re: [PATCH v16 QEMU 12/16] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

2020-04-01 Thread Alex Williamson
On Wed, 1 Apr 2020 20:00:32 +0100
"Dr. David Alan Gilbert"  wrote:

> * Kirti Wankhede (kwankh...@nvidia.com) wrote:
> > Signed-off-by: Kirti Wankhede 
> > ---
> >  memory.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/memory.c b/memory.c
> > index acb7546971c3..285ca2ed6dd9 100644
> > --- a/memory.c
> > +++ b/memory.c
> > @@ -1788,7 +1788,7 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
> >  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
> >  {
> >  uint8_t mask = mr->dirty_log_mask;
> > -if (global_dirty_log && mr->ram_block) {
> > +if (global_dirty_log && (mr->ram_block || memory_region_is_iommu(mr))) 
> > {
> >  mask |= (1 << DIRTY_MEMORY_MIGRATION);  
> 
> I'm missing why the two go together here.
> What does 'is_iommu' really mean?

I take that to mean MemoryRegion is translated by an IOMMU, ie. it's an
IOVA range of the IOMMU.  Therefore we're adding it to dirty log
tracking, just as we do for ram blocks.  At least that's my
interpretation of what it's supposed to do, I'm not an expert here on
whether it's the right way to do that.  Thanks,

Alex




[PATCH] hw/core: properly terminate loading .hex on EOF record

2020-04-01 Thread Alex Bennée
The https://makecode.microbit.org/#editor generates slightly weird
.hex files which work fine on a real microbit but cause QEMU to
choke. The reason is extraneous data after the EOF record, which causes
the loader to attempt to write a bigger file than it should to the
"rom". According to the HEX file spec an EOF really should be the last
thing we process, so let's do that.

Reported-by: Ursula Bennée 
Signed-off-by: Alex Bennée 
---
 hw/core/loader.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/core/loader.c b/hw/core/loader.c
index eeef6da9a1b..8bbb1797a4c 100644
--- a/hw/core/loader.c
+++ b/hw/core/loader.c
@@ -1447,6 +1447,7 @@ typedef struct {
 uint32_t current_rom_index;
 uint32_t rom_start_address;
 AddressSpace *as;
+bool complete;
 } HexParser;
 
 /* return size or -1 if error */
@@ -1484,6 +1485,7 @@ static int handle_record_type(HexParser *parser)
   parser->current_rom_index,
   parser->rom_start_address, parser->as);
 }
+parser->complete = true;
 return parser->total_size;
 case EXT_SEG_ADDR_RECORD:
 case EXT_LINEAR_ADDR_RECORD:
@@ -1548,11 +1550,12 @@ static int parse_hex_blob(const char *filename, hwaddr 
*addr, uint8_t *hex_blob,
 .bin_buf = g_malloc(hex_blob_size),
 .start_addr = addr,
 .as = as,
+.complete = false
 };
 
 rom_transaction_begin();
 
-for (; hex_blob < end; ++hex_blob) {
+for (; hex_blob < end && !parser.complete; ++hex_blob) {
 switch (*hex_blob) {
 case '\r':
 case '\n':
-- 
2.20.1




Re: [PATCH v16 QEMU 14/16] vfio: Add vfio_listener_log_sync to mark dirty pages

2020-04-01 Thread Dr. David Alan Gilbert
* Alex Williamson (alex.william...@redhat.com) wrote:
> On Wed, 25 Mar 2020 02:39:12 +0530
> Kirti Wankhede  wrote:
> 
> > vfio_listener_log_sync gets list of dirty pages from container using
> > VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all
> > devices are stopped and saving state.
> > Return early for the RAM block section of mapped MMIO region.
> > 
> > Signed-off-by: Kirti Wankhede 
> > Reviewed-by: Neo Jia 
> > ---
> >  hw/vfio/common.c | 200 
> > +--
> >  hw/vfio/trace-events |   1 +
> >  2 files changed, 196 insertions(+), 5 deletions(-)
> > 
> > diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> > index 4a2f0d6a2233..6d41e1ac5c2f 100644
> > --- a/hw/vfio/common.c
> > +++ b/hw/vfio/common.c
> > @@ -29,6 +29,7 @@
> >  #include "hw/vfio/vfio.h"
> >  #include "exec/address-spaces.h"
> >  #include "exec/memory.h"
> > +#include "exec/ram_addr.h"
> >  #include "hw/hw.h"
> >  #include "qemu/error-report.h"
> >  #include "qemu/main-loop.h"
> > @@ -38,6 +39,7 @@
> >  #include "sysemu/reset.h"
> >  #include "trace.h"
> >  #include "qapi/error.h"
> > +#include "migration/migration.h"
> >  
> >  VFIOGroupList vfio_group_list =
> >  QLIST_HEAD_INITIALIZER(vfio_group_list);
> > @@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
> >  };
> >  
> >  /*
> > + * Device state interfaces
> > + */
> > +
> > +static bool vfio_devices_are_stopped_and_saving(void)
> > +{
> > +VFIOGroup *group;
> > +VFIODevice *vbasedev;
> > +
> > +QLIST_FOREACH(group, _group_list, next) {
> > +QLIST_FOREACH(vbasedev, >device_list, next) {
> > +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> > +!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> > +continue;
> > +} else {
> > +return false;
> > +}
> > +}
> > +}
> > +return true;
> > +}
> > +
> > +/*
> >   * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
> >   */
> >  static int vfio_dma_unmap(VFIOContainer *container,
> > @@ -408,8 +432,8 @@ static bool 
> > vfio_listener_skipped_section(MemoryRegionSection *section)
> >  }
> >  
> >  /* Called with rcu_read_lock held.  */
> > -static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
> > -   bool *read_only)
> > +static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> > +   ram_addr_t *ram_addr, bool *read_only)
> >  {
> >  MemoryRegion *mr;
> >  hwaddr xlat;
> > @@ -440,9 +464,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
> > **vaddr,
> >  return false;
> >  }
> >  
> > -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > -*read_only = !writable || mr->readonly;
> > +if (vaddr) {
> > +*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> > +}
> >  
> > +if (ram_addr) {
> > +*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> > +}
> > +
> > +if (read_only) {
> > +*read_only = !writable || mr->readonly;
> > +}
> >  return true;
> >  }
> >  
> > @@ -467,7 +499,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> > IOMMUTLBEntry *iotlb)
> >  rcu_read_lock();
> >  
> >  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> > -if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
> > +if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
> >  goto out;
> >  }
> >  /*
> > @@ -813,9 +845,167 @@ static void vfio_listener_region_del(MemoryListener 
> > *listener,
> >  }
> >  }
> >  
> > +static int vfio_get_dirty_bitmap(MemoryListener *listener,
> > + MemoryRegionSection *section)
> > +{
> > +VFIOContainer *container = container_of(listener, VFIOContainer, 
> > listener);
> > +VFIOGuestIOMMU *giommu;
> > +IOMMUTLBEntry iotlb;
> > +hwaddr granularity, address_limit, iova;
> > +int ret;
> > +
> > +if (memory_region_is_iommu(section->mr)) {
> > +QLIST_FOREACH(giommu, >giommu_list, giommu_next) {
> > +if (MEMORY_REGION(giommu->iommu) == section->mr &&
> > +giommu->n.start == section->offset_within_region) {
> > +break;
> > +}
> > +}
> > +
> > +if (!giommu) {
> > +return -EINVAL;
> > +}
> > +}
> > +
> > +if (memory_region_is_iommu(section->mr)) {
> > +granularity = memory_region_iommu_get_min_page_size(giommu->iommu);
> > +
> > +address_limit = MIN(int128_get64(section->size),
> > +
> > memory_region_iommu_get_address_limit(giommu->iommu,
> > + 
> > int128_get64(section->size)));
> > +} else {
> > +granularity = memory_region_size(section->mr);
> > +address_limit = int128_get64(section->size);
> > +}
> > +
> > +iova = 

Re: [PATCH v16 QEMU 13/16] vfio: Add function to start and stop dirty pages tracking

2020-04-01 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
> for VFIO devices.
> 
> Signed-off-by: Kirti Wankhede 
> ---
>  hw/vfio/migration.c | 36 
>  1 file changed, 36 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index ab295d25620e..1827b7cfb316 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -9,6 +9,7 @@
>  
>  #include "qemu/osdep.h"
>  #include "qemu/main-loop.h"
> +#include 
>  #include 
>  
>  #include "sysemu/runstate.h"
> @@ -296,6 +297,32 @@ static int vfio_load_device_config_state(QEMUFile *f, 
> void *opaque)
>  return qemu_file_get_error(f);
>  }
>  
> +static int vfio_start_dirty_page_tracking(VFIODevice *vbasedev, bool start)
> +{
> +int ret;
> +VFIOContainer *container = vbasedev->group->container;
> +struct vfio_iommu_type1_dirty_bitmap dirty = {
> +.argsz = sizeof(dirty),
> +};
> +
> +if (start) {
> +if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
> +dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
> +} else {
> +return 0;
> +}
> +} else {
> +dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
> +}
> +
> > +ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
> +if (ret) {
> +error_report("Failed to set dirty tracking flag 0x%x errno: %d",
> + dirty.flags, errno);
> +}
> +return ret;
> +}
> +
>  /* -- */
>  
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -330,6 +357,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
>   */
>  qemu_put_be64(f, migration->region.size);
>  
> +ret = vfio_start_dirty_page_tracking(vbasedev, true);
> +if (ret) {
> +return ret;
> +}
> +
>  qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
>  
>  ret = qemu_file_get_error(f);
> @@ -346,6 +378,8 @@ static void vfio_save_cleanup(void *opaque)
>  VFIODevice *vbasedev = opaque;
>  VFIOMigration *migration = vbasedev->migration;
>  
> +vfio_start_dirty_page_tracking(vbasedev, false);

Shouldn't you check the return value?
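
For illustration, a minimal sketch of what checking it on the cleanup
path could look like (names taken from the patch; whether cleanup should
merely report the failure is an assumption, and
vfio_start_dirty_page_tracking() already calls error_report() for the
ioctl failure itself):

static void vfio_save_cleanup(void *opaque)
{
    VFIODevice *vbasedev = opaque;
    VFIOMigration *migration = vbasedev->migration;
    int ret;

    /*
     * Cleanup cannot fail the migration at this point, so this sketch
     * only reports the problem instead of propagating it.
     */
    ret = vfio_start_dirty_page_tracking(vbasedev, false);
    if (ret) {
        error_report("%s: failed to stop dirty page tracking (%d)",
                     vbasedev->name, ret);
    }

    if (migration->region.mmaps) {
        vfio_region_unmap(&migration->region);
    }
}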

> +
>  if (migration->region.mmaps) {
>  vfio_region_unmap(&migration->region);
>  }
> @@ -669,6 +703,8 @@ static void vfio_migration_state_notifier(Notifier 
> *notifier, void *data)
>  if (ret) {
>  error_report("%s: Failed to set state RUNNING", vbasedev->name);
>  }
> +
> +vfio_start_dirty_page_tracking(vbasedev, false);
>  }
>  }
>  
> -- 
> 2.7.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH 2/3] dsoundaudio: fix "Could not lock capture buffer" warning

2020-04-01 Thread Volker Rümelin
IDirectSoundCaptureBuffer_Lock() fails on Windows when called
with len = 0. Return early from dsound_get_buffer_in() in this
case.

To reproduce the warning start a linux guest. In the guest
start Audacity and you will see a lot of "Could not lock
capture buffer" warnings.

Signed-off-by: Volker Rümelin 
---
 audio/dsoundaudio.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/audio/dsoundaudio.c b/audio/dsoundaudio.c
index af70dd128e..ea1595dcd1 100644
--- a/audio/dsoundaudio.c
+++ b/audio/dsoundaudio.c
@@ -544,6 +544,11 @@ static void *dsound_get_buffer_in(HWVoiceIn *hw, size_t 
*size)
 req_size = audio_ring_dist(cpos, hw->pos_emul, hw->size_emul);
 req_size = MIN(req_size, hw->size_emul - hw->pos_emul);
 
+if (req_size == 0) {
+*size = 0;
+return NULL;
+}
+
 err = dsound_lock_in(dscb, >info, hw->pos_emul, req_size, , NULL,
  _size, NULL, false, ds->s);
 if (err) {
-- 
2.16.4




[PATCH 1/3] dsoundaudio: fix never-ending playback loop

2020-04-01 Thread Volker Rümelin
Currently the DirectSound backend fails to stop audio playback
in dsound_enable_out(). The call to IDirectSoundBuffer_GetStatus()
in dsound_get_status_out() returns a status word with the flag
DSERR_BUFFERLOST set (probably because of a buffer underrun).
The function dsound_get_status_out() correctly calls
dsound_restore_out(), but then returns an error. This is wrong. If
dsound_restore_out() succeeds, the program should continue without
an error.

To reproduce the bug start qemu on a Windows host with
-soundhw pcspk -audiodev dsound,id=audio0. On the guest
FreeDOS 1.2 command line enter beep. The image Day 1 - F-Bird
from the QEMU Advent Calendar 2018 shows the bug as well.

Fixes: 2762955f72 "dsoundaudio: remove *_retries kludges"
Buglink: https://bugs.launchpad.net/qemu/+bug/1699628
Signed-off-by: Volker Rümelin 
---
 audio/dsoundaudio.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/audio/dsoundaudio.c b/audio/dsoundaudio.c
index bd57082a8d..af70dd128e 100644
--- a/audio/dsoundaudio.c
+++ b/audio/dsoundaudio.c
@@ -280,8 +280,10 @@ static int dsound_get_status_out (LPDIRECTSOUNDBUFFER dsb, 
DWORD *statusp,
 }
 
 if (*statusp & DSERR_BUFFERLOST) {
-dsound_restore_out(dsb, s);
-return -1;
+if (dsound_restore_out(dsb, s)) {
+return -1;
+}
+*statusp &= ~DSERR_BUFFERLOST;
 }
 
 return 0;
-- 
2.16.4




Re: [PATCH v16 QEMU 12/16] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

2020-04-01 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Signed-off-by: Kirti Wankhede 
> ---
>  memory.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/memory.c b/memory.c
> index acb7546971c3..285ca2ed6dd9 100644
> --- a/memory.c
> +++ b/memory.c
> @@ -1788,7 +1788,7 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
>  uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
>  {
>  uint8_t mask = mr->dirty_log_mask;
> -if (global_dirty_log && mr->ram_block) {
> +if (global_dirty_log && (mr->ram_block || memory_region_is_iommu(mr))) {
>  mask |= (1 << DIRTY_MEMORY_MIGRATION);

I'm missing why the two go together here.
What does 'is_iommu' really mean?

Dave

>  }
>  return mask;
> -- 
> 2.7.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




[PATCH 3/3] dsoundaudio: dsound_get_buffer_in should honor *size

2020-04-01 Thread Volker Rümelin
This patch prevents an underflow of variable samples in function
audio_pcm_hw_run_in(). See commit 599eac4e5a "audio:
audio_generic_get_buffer_in should honor *size". This time the
while loop in audio_pcm_hw_run_in() will terminate nevertheless,
because it seems the recording stream in Windows is always rate
limited.

Signed-off-by: Volker Rümelin 
---
 audio/audio.c   | 12 +---
 audio/dsoundaudio.c |  2 +-
 2 files changed, 6 insertions(+), 8 deletions(-)

diff --git a/audio/audio.c b/audio/audio.c
index 9ac9a20c41..7a9e680355 100644
--- a/audio/audio.c
+++ b/audio/audio.c
@@ -1491,15 +1491,13 @@ size_t audio_generic_write(HWVoiceOut *hw, void *buf, 
size_t size)
 
 size_t audio_generic_read(HWVoiceIn *hw, void *buf, size_t size)
 {
-size_t src_size, copy_size;
-size_t src_size, copy_size;
-void *src = hw->pcm_ops->get_buffer_in(hw, &src_size);
-copy_size = MIN(size, src_size);
+void *src = hw->pcm_ops->get_buffer_in(hw, &size);
 
-memcpy(buf, src, copy_size);
-hw->pcm_ops->put_buffer_in(hw, src, copy_size);
-return copy_size;
-}
+memcpy(buf, src, size);
+hw->pcm_ops->put_buffer_in(hw, src, size);
 
+return size;
+}
 
 static int audio_driver_init(AudioState *s, struct audio_driver *drv,
  bool msg, Audiodev *dev)
diff --git a/audio/dsoundaudio.c b/audio/dsoundaudio.c
index ea1595dcd1..d3522f0e00 100644
--- a/audio/dsoundaudio.c
+++ b/audio/dsoundaudio.c
@@ -542,7 +542,7 @@ static void *dsound_get_buffer_in(HWVoiceIn *hw, size_t 
*size)
 }
 
 req_size = audio_ring_dist(cpos, hw->pos_emul, hw->size_emul);
-req_size = MIN(req_size, hw->size_emul - hw->pos_emul);
+req_size = MIN(*size, MIN(req_size, hw->size_emul - hw->pos_emul));
 
 if (req_size == 0) {
 *size = 0;
-- 
2.16.4




[PATCH 0/3] DirectSound fixes for 5.0

2020-04-01 Thread Volker Rümelin
Three audio fixes for DirectSound on Windows. They were tested
on a Windows 10 Home system with HAXM accelerator.

Volker Rümelin (3):
  dsoundaudio: fix never-ending playback loop
  dsoundaudio: fix "Could not lock capture buffer" warning
  dsoundaudio: dsound_get_buffer_in should honor *size

 audio/audio.c   | 12 +---
 audio/dsoundaudio.c | 13 ++---
 2 files changed, 15 insertions(+), 10 deletions(-)

-- 
2.16.4



Re: [PATCH v16 QEMU 10/16] vfio: Add load state functions to SaveVMHandlers

2020-04-01 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Sequence  during _RESUMING device state:
> While data for this device is available, repeat below steps:
> a. read data_offset from where user application should write data.
> b. write data of data_size to migration region from data_offset.
> c. write data_size which indicates vendor driver that data is written in
>staging buffer.
> 
> For user, data is opaque. User should write data in the same order as
> received.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 179 
> +++
>  hw/vfio/trace-events |   3 +
>  2 files changed, 182 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index ecbeed5182c2..ab295d25620e 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -269,6 +269,33 @@ static int vfio_save_device_config_state(QEMUFile *f, 
> void *opaque)
>  return qemu_file_get_error(f);
>  }
>  
> +static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +uint64_t data;
> +
> +if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
> +int ret;
> +
> +ret = vbasedev->ops->vfio_load_config(vbasedev, f);
> +if (ret) {
> +error_report("%s: Failed to load device config space",
> + vbasedev->name);
> +return ret;
> +}
> +}
> +
> +data = qemu_get_be64(f);
> +if (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +error_report("%s: Failed loading device config space, "
> + "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
> +return -EINVAL;
> +}
> +
> +trace_vfio_load_device_config_state(vbasedev->name);
> +return qemu_file_get_error(f);
> +}
> +
>  /* -- */
>  
>  static int vfio_save_setup(QEMUFile *f, void *opaque)
> @@ -434,12 +461,164 @@ static int vfio_save_complete_precopy(QEMUFile *f, 
> void *opaque)
>  return ret;
>  }
>  
> +static int vfio_load_setup(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +
> +if (migration->region.mmaps) {
> +ret = vfio_region_mmap(>region);
> +if (ret) {
> +error_report("%s: Failed to mmap VFIO migration region %d: %s",
> + vbasedev->name, migration->region.nr,
> + strerror(-ret));
> +return ret;
> +}
> +}
> +
> +ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_RESUMING);
> +if (ret) {
> +error_report("%s: Failed to set state RESUMING", vbasedev->name);
> +}
> +return ret;
> +}
> +
> +static int vfio_load_cleanup(void *opaque)
> +{
> +vfio_save_cleanup(opaque);
> +return 0;
> +}
> +
> +static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret = 0;
> +uint64_t data, data_size;
> +
> +data = qemu_get_be64(f);
> +while (data != VFIO_MIG_FLAG_END_OF_STATE) {
> +
> +trace_vfio_load_state(vbasedev->name, data);
> +
> +switch (data) {
> +case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
> +{
> +ret = vfio_load_device_config_state(f, opaque);
> +if (ret) {
> +return ret;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_SETUP_STATE:
> +{
> +uint64_t region_size = qemu_get_be64(f);
> +
> +if (migration->region.size < region_size) {
> +error_report("%s: SETUP STATE: migration region too small, "
> + "0x%"PRIx64 " < 0x%"PRIx64, vbasedev->name,
> + migration->region.size, region_size);
> +return -EINVAL;
> +}
> +
> +data = qemu_get_be64(f);
> +if (data == VFIO_MIG_FLAG_END_OF_STATE) {

Can you explain why you're reading this here rather than letting it drop
through to the read at the end of the loop?
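
To make the alternative concrete, a rough sketch of the drop-through
variant (assuming the qemu_get_be64() at the bottom of the while loop is
the read meant to pick up the END_OF_STATE marker):

case VFIO_MIG_FLAG_DEV_SETUP_STATE:
{
    uint64_t region_size = qemu_get_be64(f);

    if (migration->region.size < region_size) {
        error_report("%s: SETUP STATE: migration region too small, "
                     "0x%"PRIx64 " < 0x%"PRIx64, vbasedev->name,
                     migration->region.size, region_size);
        return -EINVAL;
    }
    /*
     * No early read here: fall out of the switch and let the
     * qemu_get_be64() at the end of the loop consume either
     * VFIO_MIG_FLAG_END_OF_STATE or the next section flag.
     */
    break;
}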

> +return ret;
> +} else {
> +error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
> + vbasedev->name, data);
> +return -EINVAL;
> +}
> +break;
> +}
> +case VFIO_MIG_FLAG_DEV_DATA_STATE:
> +{
> > +VFIORegion *region = &migration->region;
> +void *buf = NULL;
> +bool buffer_mmaped = false;
> +uint64_t data_offset = 0;
> +
> +data_size = qemu_get_be64(f);
> +if (data_size == 0) {
> +break;
> +}
> +
> > +ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
> +

Re: [PATCH] lockable: Replace locks with lock guard macros

2020-04-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200401162023.GA15912@simran-Inspiron-5558/



Hi,

This series failed the asan build test. Please find the testing commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-debug@fedora TARGET_LIST=x86_64-softmmu J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  x86_64-softmmu/hw/i386/x86.o
  CC  x86_64-softmmu/hw/i386/pc.o
  CC  x86_64-softmmu/hw/i386/pc_sysfw.o
/tmp/qemu-test/src/hw/rdma/rdma_utils.c:74:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
:194:1: note: expanded from here
qemu_lockable_auto__COUNTER__
^
/tmp/qemu-test/src/hw/rdma/rdma_utils.c:109:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
:207:1: note: expanded from here
qemu_lockable_auto__COUNTER__
^
/tmp/qemu-test/src/hw/rdma/rdma_utils.c:116:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
3 errors generated.
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: hw/rdma/rdma_utils.o] Error 1
make[1]: *** Waiting for unfinished jobs
  CC  x86_64-softmmu/hw/i386/pc_piix.o
/tmp/qemu-test/src/hw/hyperv/hyperv.c:495:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(_mutex);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
:24:1: note: expanded from here
qemu_lockable_auto__COUNTER__
^
/tmp/qemu-test/src/hw/hyperv/hyperv.c:568:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(_mutex);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
2 errors generated.
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: hw/hyperv/hyperv.o] Error 1
/tmp/qemu-test/src/hw/rdma/rdma_rm.c:150:5: error: unused variable 
'qemu_lockable_auto__COUNTER__' [-Werror,-Wunused-variable]
QEMU_LOCK_GUARD(>lock);
^
/tmp/qemu-test/src/include/qemu/lockable.h:173:29: note: expanded from macro 
'QEMU_LOCK_GUARD'
---
qemu_lockable_auto__COUNTER__
^
1 error generated.
make[1]: *** [/tmp/qemu-test/src/rules.mak:69: hw/rdma/rdma_rm.o] Error 1
make: *** [Makefile:527: x86_64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=d930676030934c1fbb5e45fcfd347949', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 
'TARGET_LIST=x86_64-softmmu', '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 
'J=14', '-e', 'DEBUG=', '-e', 'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', 
'-v', '/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-ie9_gn1g/src/docker-src.2020-04-01-14.53.27.23876:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-debug']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=d930676030934c1fbb5e45fcfd347949
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-ie9_gn1g/src'
make: *** [docker-run-test-debug@fedora] Error 2

real4m7.415s
user0m8.437s


The full log is available at
http://patchew.org/logs/20200401162023.GA15912@simran-Inspiron-5558/testing.asan/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com
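
For reference, the variable the warning is about is the temporary that
the lock guard macro declares, and the literal name
qemu_lockable_auto__COUNTER__ in the notes suggests that __COUNTER__ is
pasted without being expanded. A hedged sketch of one way both points
are commonly handled (the actual macro in include/qemu/lockable.h may
differ in detail):

/*
 * Sketch only: the glue() indirection lets __COUNTER__ expand so each
 * guard in a function gets a unique name, and G_GNUC_UNUSED keeps
 * clang's -Wunused-variable quiet about the never-referenced temporary.
 */
#define SKETCH_LOCK_GUARD(x)                                     \
    g_autoptr(QemuLockable)                                      \
    glue(qemu_lockable_auto, __COUNTER__) G_GNUC_UNUSED =        \
        qemu_lockable_auto_lock(QEMU_MAKE_LOCKABLE((x)))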

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 20:28, Dietmar Maurer wrote:
> > That's a pretty big change, and I'm not sure how it's related to
> > completed requests hanging in the thread pool instead of reentering the
> > file-posix coroutine. But I also tested it enough that I'm confident
> > it's really the first bad commit.
> > 
> > Maybe you want to try if your problem starts at the same commit?
> 
> Stefan already found that by bisecting last week:
> 
> See: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07629.html
> 
> But, IMHO the commit is not the reason for (my) bug - It just makes
> it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
> I have no easy way to reproduce it reliably.
> 
> Also, Stefan sent some patches to the list to fix some of the problems.
> 
> https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00022.html
> 
> Does that fix your problem?

It seems to fix it, yes. Now I don't get any hangs any more. (Also, I
guess this means that this day was essentially wasted because I worked
on a problem that already has a fix... *sigh*)

Kevin




[PATCH v1] nvme: indicate CMB support through controller capabilities register

2020-04-01 Thread Andrzej Jakowski
This patch sets the CMBS bit in the controller capabilities register when the
user configures the NVMe driver with CMB support, so the capabilities are
correctly reported to the guest OS.

Signed-off-by: Andrzej Jakowski 
---
 hw/block/nvme.c  | 2 ++
 include/block/nvme.h | 4 
 2 files changed, 6 insertions(+)

diff --git a/hw/block/nvme.c b/hw/block/nvme.c
index d28335cbf3..986803398f 100644
--- a/hw/block/nvme.c
+++ b/hw/block/nvme.c
@@ -1393,6 +1393,8 @@ static void nvme_realize(PCIDevice *pci_dev, Error **errp)
 n->bar.intmc = n->bar.intms = 0;
 
 if (n->cmb_size_mb) {
+/* Controller capabilities */
+NVME_CAP_SET_CMBS(n->bar.cap, 1);
 
 NVME_CMBLOC_SET_BIR(n->bar.cmbloc, 2);
 NVME_CMBLOC_SET_OFST(n->bar.cmbloc, 0);
diff --git a/include/block/nvme.h b/include/block/nvme.h
index 8fb941c653..561891b140 100644
--- a/include/block/nvme.h
+++ b/include/block/nvme.h
@@ -27,6 +27,7 @@ enum NvmeCapShift {
 CAP_CSS_SHIFT  = 37,
 CAP_MPSMIN_SHIFT   = 48,
 CAP_MPSMAX_SHIFT   = 52,
+CAP_CMB_SHIFT  = 57,
 };
 
 enum NvmeCapMask {
@@ -39,6 +40,7 @@ enum NvmeCapMask {
 CAP_CSS_MASK   = 0xff,
 CAP_MPSMIN_MASK= 0xf,
 CAP_MPSMAX_MASK= 0xf,
+CAP_CMB_MASK   = 0x1,
 };
 
 #define NVME_CAP_MQES(cap)  (((cap) >> CAP_MQES_SHIFT)   & CAP_MQES_MASK)
@@ -69,6 +71,8 @@ enum NvmeCapMask {
<< CAP_MPSMIN_SHIFT)
 #define NVME_CAP_SET_MPSMAX(cap, val) (cap |= (uint64_t)(val & 
CAP_MPSMAX_MASK)\
 << 
CAP_MPSMAX_SHIFT)
+#define NVME_CAP_SET_CMBS(cap, val) (cap |= (uint64_t)(val & CAP_CMB_MASK)\
+<< CAP_CMB_SHIFT)
 
 enum NvmeCcShift {
 CC_EN_SHIFT = 0,
-- 
2.21.1
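
As a small aside, only the setter is added here; if the bit ever needs
to be read back out of the capabilities value, a getter in the style of
the existing NVME_CAP_* macros could look like this sketch (illustrative
only, not part of the patch):

#define NVME_CAP_CMBS(cap)  (((cap) >> CAP_CMB_SHIFT) & CAP_CMB_MASK)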




Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 20:12, Kevin Wolf wrote:
> On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce)?
> > > 
> > > I can get hangs, but that's for job_completed(), not for starting the
> > > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > > like a different case to me.
> > 
> > Please can you post the command line args of your VM? I use something like
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 -chardev
> > 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> > 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> > 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> > 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> > 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> > -device
> > 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> > -machine "type=pc,accel=kvm"
> > 
> Do you also run "stress-ng -d 5" inside the VM?
> 
> I'm not using the exact same test case, but something that I thought
> would be similar enough. Specifically, I run the script below, which
> boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
> of=/dev/sda' while the script keeps starting and cancelling backup jobs
> in the background.
> 
> Anyway, I finally managed to bisect my problem now (did it wrong the
> first time) and got this result:
> [...]

So back on current git master, my deadlock is between main thread and
the iothread. For some reason, the main thread holds the thread pool
mutex of the iothread's thread pool. This means that the iothread can't
complete its requests and the drain operation in the main thread can't
make progress.

I think there is no reason why the main thread should ever take the
mutex of the thread pool of a different thread, so I'm not sure. But
maybe that backup commit changed something in the way nodes are moved
between AioContexts that would cause this to happen.

Kevin


Thread 3 (Thread 0x7f53c7438700 (LWP 3967)):
#0  0x7f53d23449ad in __lll_lock_wait () at /lib64/libpthread.so.0
#1  0x7f53d233dd94 in pthread_mutex_lock () at /lib64/libpthread.so.0
#2  0x55dcc331bdb3 in qemu_mutex_lock_impl (mutex=0x55dcc4ff1c80, 
file=0x55dcc3512bff "util/async.c", line=557) at util/qemu-thread-posix.c:78
#3  0x55dcc33167ae in thread_pool_completion_bh (opaque=0x7f53b8003120) at 
util/thread-pool.c:167
#4  0x55dcc331597e in aio_bh_call (bh=0x55dcc5b94680) at util/async.c:117
#5  0x55dcc331597e in aio_bh_poll (ctx=ctx@entry=0x55dcc4ff1c20) at 
util/async.c:117
#6  0x55dcc3318ee7 in aio_poll (ctx=0x55dcc4ff1c20, 
blocking=blocking@entry=true) at util/aio-posix.c:638
#7  0x55dcc2ff7df0 in iothread_run (opaque=opaque@entry=0x55dcc4cfdac0) at 
iothread.c:75
#8  0x55dcc331bbba in qemu_thread_start (args=) at 
util/qemu-thread-posix.c:519
#9  0x7f53d233b4aa in start_thread () at /lib64/libpthread.so.0
#10 0x7f53d226b3f3 in clone () at /lib64/libc.so.6

Thread 1 (Thread 0x7f53c7dab680 (LWP 3962)):
#0  0x7f53d2260526 in ppoll () at /lib64/libc.so.6
#1  0x55dcc33171c9 in ppoll (__ss=0x0, __timeout=0x0, __nfds=, __fds=) at /usr/include/bits/poll2.h:77
#2  0x55dcc33171c9 in qemu_poll_ns (fds=, nfds=, timeout=timeout@entry=-1) at util/qemu-timer.c:335
#3  0x55dcc33199a1 in fdmon_poll_wait (ctx=0x55dcc4f78470, 
ready_list=0x7fffb3506768, timeout=-1) at util/fdmon-poll.c:79
#4  0x55dcc3318f87 in aio_poll (ctx=0x55dcc4f78470, 
blocking=blocking@entry=true) at util/aio-posix.c:589
#5  0x55dcc3276763 in bdrv_do_drained_begin (poll=, 
ignore_bds_parents=false, parent=0x0, recursive=false, bs=0x55dcc5b66010) at 
block/io.c:429
#6  0x55dcc3276763 in bdrv_do_drained_begin (bs=0x55dcc5b66010, 
recursive=, parent=0x0, ignore_bds_parents=, 
poll=) at block/io.c:395
#7  0x55dcc3291422 in bdrv_backup_top_drop (bs=0x55dcc5b66010) at 
block/backup-top.c:273
#8  0x55dcc328cb4c in backup_clean (job=0x55dcc64ab800) at 
block/backup.c:132
#9  0x55dcc322324d in job_clean (job=0x55dcc64ab800) at job.c:656
#10 0x55dcc322324d in job_finalize_single (job=0x55dcc64ab800) at job.c:672
#11 0x55dcc322324d in job_finalize_single (job=0x55dcc64ab800) at job.c:660
#12 0x55dcc3223baa in job_completed_txn_abort (job=) at 
job.c:748
#13 0x55dcc3223db2 in job_completed (job=0x55dcc64ab800) at job.c:842
#14 0x55dcc3223db2 in job_completed (job=0x55dcc64ab800) at job.c:835
#15 0x55dcc3223f60 in job_exit (opaque=0x55dcc64ab800) at job.c:863
#16 0x55dcc331597e in aio_bh_call (bh=0x7f53b8010eb0) at util/async.c:117
#17 0x55dcc331597e in aio_bh_poll (ctx=ctx@entry=0x55dcc4f78470) at 
util/async.c:117
#18 0x55dcc3318dde in aio_dispatch (ctx=0x55dcc4f78470) at 
util/aio-posix.c:380
#19 0x55dcc331585e in aio_ctx_dispatch (source=, 

Re: [PATCH v16 QEMU 00/16] Add migration support for VFIO devices

2020-04-01 Thread Alex Williamson
On Wed, 1 Apr 2020 02:41:54 -0400
Yan Zhao  wrote:

> On Wed, Apr 01, 2020 at 02:34:24AM +0800, Alex Williamson wrote:
> > On Wed, 25 Mar 2020 02:38:58 +0530
> > Kirti Wankhede  wrote:
> >   
> > > Hi,
> > > 
> > > This Patch set adds migration support for VFIO devices in QEMU.  
> > 
> > Hi Kirti,
> > 
> > Do you have any migration data you can share to show that this solution
> > is viable and useful?  I was chatting with Dave Gilbert and there still
> > seems to be a concern that we actually have a real-world practical
> > solution.  We know this is inefficient with QEMU today, vendor pinned
> > memory will get copied multiple times if we're lucky.  If we're not
> > lucky we may be copying all of guest RAM repeatedly.  There are known
> > inefficiencies with vIOMMU, etc.  QEMU could learn new heuristics to
> > account for some of this and we could potentially report different
> > bitmaps in different phases through vfio, but let's make sure that
> > there are useful cases enabled by this first implementation.
> > 
> > With a reasonably sized VM, running a reasonable graphics demo or
> > workload, can we achieve reasonably live migration?  What kind of
> > downtime do we achieve and what is the working set size of the pinned
> > memory?  Intel folks, if you've been able to port to this or similar
> > code base, please report your results as well, open source consumers
> > are arguably even more important.  Thanks,
> >   
> hi Alex
> we're in the process of porting to this code, and now it's able to
> migrate successfully without dirty pages.
> 
> when there're dirty pages, we met several issues.
> one of them is reported here
> (https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg4.html).
> dirty pages for some regions are not able to be collected correctly,
> especially for memory range from 3G to 4G.
> 
> even without this bug, qemu still gets stuck in the middle before
> reaching the stop-and-copy phase and cannot be killed by the admin.
> We are still debugging this problem.

Thanks, Yan.  So it seems we have various bugs, known limitations, and
we haven't actually proven that this implementation provides a useful
feature, at least for the open source consumer.  This doesn't give me
much confidence to consider the kernel portion ready for v5.7 given how
late we are already :-\  Thanks,

Alex




Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> That's a pretty big change, and I'm not sure how it's related to
> completed requests hanging in the thread pool instead of reentering the
> file-posix coroutine. But I also tested it enough that I'm confident
> it's really the first bad commit.
> 
> Maybe you want to try if your problem starts at the same commit?

Stefan already found that by bisecting last week:

See: https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07629.html

But, IMHO the commit is not the reason for (my) bug - It just makes
it easier to trigger... I can see (my) bug sometimes with 4.1.1, although
I have no easy way to reproduce it reliably.

Also, Stefan sent some patches to the list to fix some of the problems.

https://lists.gnu.org/archive/html/qemu-devel/2020-04/msg00022.html

Does that fix your problem?

I will run further tests with your script, thanks.

> Kevin
> 
> 
> #!/bin/bash
> 
> qmp() {
> cat <<EOF
> {'execute':'qmp_capabilities'}
> EOF
> 
> while true; do
> cat < { "execute": "drive-backup", "arguments": {
>   "job-id":"drive_image1","device": "drive_image1", "sync": "full", "target": 
> "/tmp/backup.raw" } }
> EOF
> sleep 1
> cat < { "execute": "block-job-cancel", "arguments": { "device": "drive_image1"} }
> EOF
> sleep 2
> done
> }
> 
> ./qemu-img create -f qcow2 /tmp/test.qcow2 4G
> for i in $(seq 0 1); do echo "write ${i}G 1G"; done | ./qemu-io 
> /tmp/test.qcow2
> 
> qmp | x86_64-softmmu/qemu-system-x86_64 \
> -enable-kvm \
> -machine pc \
> -m 1G \
> -object 'iothread,id=iothread-virtioscsi0' \
> -device 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' \
> -blockdev node-name=my_drive,driver=file,filename=/tmp/test.qcow2 \
> -blockdev driver=qcow2,node-name=drive_image1,file=my_drive \
> -device scsi-hd,drive=drive_image1,id=image1 \
> -cdrom ~/images/iso/RHEL-8.0-20190116.1-x86_64-dvd1.iso \
> -boot d \
> -qmp stdio -monitor vc




Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Kevin Wolf
On 01.04.2020 at 17:37, Dietmar Maurer wrote:
> > > Is really nobody else able to reproduce this (somebody already tried to
> > > reproduce)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m
> 1024 -object 'iothread,id=iothread-virtioscsi0' -device
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
> -device
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
> -machine "type=pc,accel=kvm"
> 
> Do you also run "stress-ng -d 5" inside the VM?

I'm not using the exact same test case, but something that I thought
would be similar enough. Specifically, I run the script below, which
boots from a RHEL 8 CD and in the rescue shell, I'll do 'dd if=/dev/zero
of=/dev/sda' while the script keeps starting and cancelling backup jobs
in the background.

Anyway, I finally managed to bisect my problem now (did it wrong the
first time) and got this result:

00e30f05de1d19586345ec373970ef4c192c6270 is the first bad commit
commit 00e30f05de1d19586345ec373970ef4c192c6270
Author: Vladimir Sementsov-Ogievskiy 
Date:   Tue Oct 1 16:14:09 2019 +0300

block/backup: use backup-top instead of write notifiers

Drop write notifiers and use filter node instead.

= Changes =

1. Add filter-node-name argument for backup qmp api. We have to do it
in this commit, as 257 needs to be fixed.

2. There are no more write notifiers here, so is_write_notifier
parameter is dropped from block-copy paths.

3. To sync with in-flight requests at job finish we now have drained
removing of the filter, we don't need rw-lock.

4. Block-copy is now using BdrvChildren instead of BlockBackends

5. As backup-top owns these children, we also move block-copy state
into backup-top's ownership.

[...]


That's a pretty big change, and I'm not sure how it's related to
completed requests hanging in the thread pool instead of reentering the
file-posix coroutine. But I also tested it enough that I'm confident
it's really the first bad commit.

Maybe you want to try if your problem starts at the same commit?

Kevin


#!/bin/bash

qmp() {
cat <

Re: [PATCH v16 3/4] qcow2: add zstd cluster compression

2020-04-01 Thread Denis Plotnikov




On 01.04.2020 18:36, Vladimir Sementsov-Ogievskiy wrote:

01.04.2020 17:37, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which, at the moment,
is the only compression method available.

The performance test results:
The test compresses and decompresses a qemu qcow2 image with a
just-installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

                compression               decompression
              zlib        zstd           zlib      zstd

real          65.5        16.3 (-75 %)    1.9      1.6 (-16 %)
user          65.0        15.8            5.3      2.5
sys            3.3         0.2            2.0      2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
  docs/interop/qcow2.txt |   1 +
  configure  |   2 +-
  qapi/block-core.json   |   3 +-
  block/qcow2-threads.c  | 163 +
  block/qcow2.c  |   7 ++
  5 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
    Available compression type values:
  0: zlib 
+    1: zstd 
      === Header padding ===
diff --git a/configure b/configure
index e225a1e3ff..fdc991b010 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is 
enabled if available:

    lzfse   support of lzfse compression library
    (for reading lzfse-compressed dmg images)
    zstd    support for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster 
compression)

    seccomp seccomp support
    coroutine-pool  coroutine freelist (better performance)
    glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
  # Compression type used in qcow2 image file
  #
  # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
  #
  # Since: 5.1
  ##
  { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } 
] }

    ##
  # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..aa133204f0 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
  #define ZLIB_CONST
  #include 
  +#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
  #include "qcow2.h"
  #include "block/thread-pool.h"
  #include "crypto.h"
@@ -166,6 +171,154 @@ static ssize_t qcow2_zlib_decompress(void 
*dest, size_t dest_size,

  return ret;
  }
  +#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store 
compressed data

+ *  -EIO    on any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+    ssize_t ret;
+    ZSTD_outBuffer output = { dest, dest_size, 0 };
+    ZSTD_inBuffer input = { src, src_size, 0 };
+    ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+    if (!cctx) {
+    return -EIO;
+    }
+    /*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * In the loop, we try to compress all the data into one zstd 
frame.

+ * ZSTD_compressStream2 potentially can finish a frame earlier
+ * than the full input data is consumed. That's why we are looping
+ * until all the input data is consumed.
+ */
+    while (input.pos < input.size) {
+    size_t zstd_ret = 0;


dead assignment
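
For clarity, a rough sketch of how the dead initializer could be
dropped, assuming zstd_ret receives its first real value from
ZSTD_compressStream2() inside the loop (ret and the out label as in the
surrounding function):

    while (input.pos < input.size) {
        size_t zstd_ret = ZSTD_compressStream2(cctx, &output, &input,
                                               ZSTD_e_end);

        if (ZSTD_isError(zstd_ret)) {
            ret = -EIO;
            goto out;
        }
    }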


+    /*
+ * ZSTD 

[PATCH for-5.0 v2] qemu-img: Report convert errors by bytes, not sectors

2020-04-01 Thread Eric Blake
Various qemu-img commands are inconsistent on whether they report
status/errors in terms of bytes or sector offsets.  The latter is
confusing (especially as more places move to 4k block sizes), so let's
switch everything to just use bytes everywhere.

Signed-off-by: Eric Blake 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---

- v2: fix printf formats [patchew]

 qemu-img.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/qemu-img.c b/qemu-img.c
index b167376bd72e..821cbf610e5f 100644
--- a/qemu-img.c
+++ b/qemu-img.c
@@ -1924,8 +1924,8 @@ retry:
 if (status == BLK_DATA && !copy_range) {
 ret = convert_co_read(s, sector_num, n, buf);
 if (ret < 0) {
-error_report("error while reading sector %" PRId64
- ": %s", sector_num, strerror(-ret));
+error_report("error while reading at byte %lld: %s",
+ sector_num * BDRV_SECTOR_SIZE, strerror(-ret));
 s->ret = ret;
 }
 } else if (!s->min_sparse && status == BLK_ZERO) {
@@ -1953,8 +1953,8 @@ retry:
 ret = convert_co_write(s, sector_num, n, buf, status);
 }
 if (ret < 0) {
-error_report("error while writing sector %" PRId64
- ": %s", sector_num, strerror(-ret));
+error_report("error while writing at byte %lld: %s",
+ sector_num * BDRV_SECTOR_SIZE, strerror(-ret));
 s->ret = ret;
 }
 }
-- 
2.26.0.rc2




[PATCH v17 1/4] qcow2: introduce compression type feature

2020-04-01 Thread Denis Plotnikov
The patch adds some preparation parts for the incompatible compression type
feature to qcow2, allowing the use of different compression methods for
(de)compressing image clusters.

It is implied that the compression type is set at image creation and
can be changed later only by image conversion; thus the compression type
defines the single compression algorithm used for the image and,
therefore, for all image clusters.

The goal of the feature is to add support for other compression methods
to qcow2, for example ZSTD, which is more effective at compression than ZLIB.

The default compression is ZLIB. Images created with ZLIB compression type
are backward compatible with older qemu versions.

Adding the compression type breaks a number of tests because the
compression type is now reported on image creation and there are some
changes to the qcow2 header size and offsets.

The tests are fixed in the following ways:
* filter out compression_type for many tests
* fix header size, feature table size and backing file offset
  affected tests: 031, 036, 061, 080
  header_size +=8: 1 byte compression type
   7 bytes padding
  feature_table += 48: incompatible feature compression type
  backing_file_offset += 56 (8 + 48 -> header_change + feature_table_change)
* add "compression type" for test output matching when it isn't filtered
  affected tests: 049, 060, 061, 065, 144, 182, 242, 255

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
QAPI part:
Acked-by: Markus Armbruster 
---
 qapi/block-core.json |  22 +-
 block/qcow2.h|  20 +-
 include/block/block_int.h|   1 +
 block/qcow2.c| 113 +++
 tests/qemu-iotests/031.out   |  14 ++--
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 ++--
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 ++
 tests/qemu-iotests/065   |  28 +---
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 ++
 tests/qemu-iotests/255.out   |   8 +--
 tests/qemu-iotests/common.filter |   3 +-
 16 files changed, 267 insertions(+), 96 deletions(-)

diff --git a/qapi/block-core.json b/qapi/block-core.json
index 943df1926a..1522e2983f 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -78,6 +78,8 @@
 #
 # @bitmaps: A list of qcow2 bitmap details (since 4.0)
 #
+# @compression-type: the image cluster compression method (since 5.1)
+#
 # Since: 1.7
 ##
 { 'struct': 'ImageInfoSpecificQCow2',
@@ -89,7 +91,8 @@
   '*corrupt': 'bool',
   'refcount-bits': 'int',
   '*encrypt': 'ImageInfoSpecificQCow2Encryption',
-  '*bitmaps': ['Qcow2BitmapInfo']
+  '*bitmaps': ['Qcow2BitmapInfo'],
+  'compression-type': 'Qcow2CompressionType'
   } }
 
 ##
@@ -4284,6 +4287,18 @@
   'data': [ 'v2', 'v3' ] }
 
 
+##
+# @Qcow2CompressionType:
+#
+# Compression type used in qcow2 image file
+#
+# @zlib: zlib compression, see 
+#
+# Since: 5.1
+##
+{ 'enum': 'Qcow2CompressionType',
+  'data': [ 'zlib' ] }
+
 ##
 # @BlockdevCreateOptionsQcow2:
 #
@@ -4307,6 +4322,8 @@
 # allowed values: off, falloc, full, metadata)
 # @lazy-refcounts: True if refcounts may be updated lazily (default: off)
 # @refcount-bits: Width of reference counts in bits (default: 16)
+# @compression-type: The image cluster compression method
+#(default: zlib, since 5.1)
 #
 # Since: 2.12
 ##
@@ -4322,7 +4339,8 @@
 '*cluster-size':'size',
 '*preallocation':   'PreallocMode',
 '*lazy-refcounts':  'bool',
-'*refcount-bits':   'int' } }
+'*refcount-bits':   'int',
+'*compression-type':'Qcow2CompressionType' } }
 
 ##
 # @BlockdevCreateOptionsQed:
diff --git a/block/qcow2.h b/block/qcow2.h
index f4de0a27d5..6a8b82e6cc 100644
--- a/block/qcow2.h
+++ b/block/qcow2.h
@@ -146,8 +146,16 @@ typedef struct QCowHeader {
 
 uint32_t refcount_order;
 uint32_t header_length;
+
+/* Additional fields */
+uint8_t compression_type;
+
+/* header must be a multiple of 8 */
+uint8_t padding[7];
 } QEMU_PACKED QCowHeader;
 
+QEMU_BUILD_BUG_ON(!QEMU_IS_ALIGNED(sizeof(QCowHeader), 8));
+
 typedef struct QEMU_PACKED QCowSnapshotHeader {
 /* header is 8 byte aligned */
 uint64_t l1_table_offset;
@@ -216,13 +224,16 @@ enum {
 QCOW2_INCOMPAT_DIRTY_BITNR  = 0,
 QCOW2_INCOMPAT_CORRUPT_BITNR= 1,
 QCOW2_INCOMPAT_DATA_FILE_BITNR  = 2,
+QCOW2_INCOMPAT_COMPRESSION_BITNR = 3,
 QCOW2_INCOMPAT_DIRTY= 1 << QCOW2_INCOMPAT_DIRTY_BITNR,
 QCOW2_INCOMPAT_CORRUPT  = 1 << QCOW2_INCOMPAT_CORRUPT_BITNR,
 QCOW2_INCOMPAT_DATA_FILE= 1 << 

Re: [PATCH v16 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-04-01 Thread Kirti Wankhede




On 3/30/2020 7:45 AM, Yan Zhao wrote:

On Fri, Mar 27, 2020 at 12:42:43PM +0800, Kirti Wankhede wrote:



On 3/27/2020 5:34 AM, Yan Zhao wrote:

On Fri, Mar 27, 2020 at 05:39:44AM +0800, Kirti Wankhede wrote:



On 3/25/2020 7:48 AM, Yan Zhao wrote:

On Wed, Mar 25, 2020 at 03:32:37AM +0800, Kirti Wankhede wrote:

DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and the device is still
running. For example, in the pre-copy phase, while the guest driver can access
those pages, the host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get the bitmap during unmap, the user should allocate memory for the bitmap,
set the size of the allocated memory, set the page size to be considered for
the bitmap, and set the VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
drivers/vfio/vfio_iommu_type1.c | 54 
++---
include/uapi/linux/vfio.h   | 10 
2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 27ed069c5053..b98a8d79e13a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -982,7 +982,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t 
bitmap_size)
}

static int vfio_dma_do_unmap(struct vfio_iommu *iommu,

-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+struct vfio_bitmap *bitmap)
{
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -1033,6 +1034,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 * will be returned if these conditions are not met.  The v2 interface
 * will only return success and a size of zero if there were no
 * mappings within the range.
+*
+* When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
+* must be for single mapping. Multiple mappings with this flag set is
+* not supported.
 */
if (iommu->v2) {
dma = vfio_find_dma(iommu, unmap->iova, 1);
@@ -1040,6 +1045,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
ret = -EINVAL;
goto unlock;
}
+
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+   (dma->iova != unmap->iova || dma->size != unmap->size)) {

potential NULL pointer!

And could you address the comments in v14?
How to handle DSI unmaps in vIOMMU
(https://lore.kernel.org/kvm/20200323011041.GB5456@joy-OptiPlex-7040/)



Sorry, I drafted a reply to it but missed sending it; it remained in my drafts

   >
   > it happens in vIOMMU Domain level invalidation of IOTLB
   > (domain-selective invalidation, see vtd_iotlb_domain_invalidate() in
qemu).
   > common in VTD lazy mode, and NOT just happening once at boot time.
   > rather than invalidate page by page, it batches the page invalidation.
   > so, when this invalidation takes place, even higher level page tables
   > have been invalid and therefore it has to invalidate a bigger
combined range.
   > That's why we see IOVAs are mapped in 4k pages, but are unmapped in 2M
   > pages.
   >
   > I think those UNMAPs should also have GET_DIRTY_BIMTAP flag on, right?


vtd_iotlb_domain_invalidate()
 vtd_sync_shadow_page_table()
    vtd_sync_shadow_page_table_range(vtd_as, &ce, 0, UINT64_MAX)
 vtd_page_walk()
   vtd_page_walk_level() - walk over specific level for IOVA range
 vtd_page_walk_one()
   memory_region_notify_iommu()
   ...
 vfio_iommu_map_notify()

In the above trace, won't the page walk take care of creating the proper
IOTLB entry, which should be the same as the one created during mapping for
that IOTLB entry?


No. It does walk the page table, but as it's DSI (delayed & batched unmap),
the page table entry for a whole 2M range (the higher level, not the last
level for 4K) is invalid, so the iotlb->addr_mask that vfio_iommu_map_notify()
receives is (2M - 1), not the same as the size used for the map.
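
To put a number on the granularity mismatch described here, a tiny sketch
(illustrative only; the 4K/2M figures are the ones quoted in this thread):

    /* Sketch: one DSI unmap notification with addr_mask = 2M - 1 spans
     * many of the 4K mappings that were created individually. */
    #include <inttypes.h>
    #include <stdint.h>
    #include <stdio.h>

    int main(void)
    {
        const uint64_t map_size = 4096;               /* per-page DMA maps */
        const uint64_t addr_mask = (2ULL << 20) - 1;  /* from the DSI notification */
        uint64_t unmap_size = addr_mask + 1;

        printf("mappings covered by one unmap: %" PRIu64 "\n",
               unmap_size / map_size);                /* 512 */
        return 0;
    }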



When does this happen? During my testing I never hit this case. How can I
hit it?


Just common settings to enable vIOMMU:
Qemu: -device intel-iommu,caching-mode=true
guest kernel parameter: intel_iommu=on

(intel_iommu=on turns on lazy mode by default)

In lazy mode, guest notifies DMA MAP on page level, but notifies DMA UNMAPs
in batch.
with a pass-through NVMe, there are 89 DSI unmaps in 1 second for a typical fio.
With a pass-through GPU, there are 22 DSI unmaps in total for the benchmark openArena
(lasting around 55 secs)


In this case, will adjacent whole vfio_dmas be clubbed together or
will there be any intersection 

[PATCH v17 3/4] qcow2: add zstd cluster compression

2020-04-01 Thread Denis Plotnikov
zstd significantly reduces cluster compression time.
It provides better compression performance while maintaining
the same compression ratio as zlib, which, at the moment,
is the only compression method available.

The performance test results:
The test compresses and decompresses a qemu qcow2 image with a freshly
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
  time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
  src.img [zlib|zstd]_compressed.img
decompress cmd
  time ./qemu-img convert -O qcow2
  [zlib|zstd]_compressed.img uncompressed.img

              compression            decompression
            zlib    zstd            zlib    zstd

real        65.5    16.3 (-75 %)    1.9     1.6 (-16 %)
user        65.0    15.8            5.3     2.5
sys          3.3     0.2            2.0     2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
QAPI part:
Acked-by: Markus Armbruster 
---
 docs/interop/qcow2.txt |   1 +
 configure  |   2 +-
 qapi/block-core.json   |   3 +-
 block/qcow2-threads.c  | 157 +
 block/qcow2.c  |   7 ++
 5 files changed, 168 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
 
 Available compression type values:
 0: zlib 
+1: zstd 
 
 
 === Header padding ===
diff --git a/configure b/configure
index e225a1e3ff..fdc991b010 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
   lzfse   support of lzfse compression library
   (for reading lzfse-compressed dmg images)
   zstdsupport for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
   seccomp seccomp support
   coroutine-pool  coroutine freelist (better performance)
   glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
 # Compression type used in qcow2 image file
 #
 # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
 #
 # Since: 5.1
 ##
 { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
 
 ##
 # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..0525718704 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
 #define ZLIB_CONST
 #include 
 
+#ifdef CONFIG_ZSTD
+#include 
+#include 
+#endif
+
 #include "qcow2.h"
 #include "block/thread-pool.h"
 #include "crypto.h"
@@ -166,6 +171,148 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
 return ret;
 }
 
+#ifdef CONFIG_ZSTD
+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+ZSTD_outBuffer output = { dest, dest_size, 0 };
+ZSTD_inBuffer input = { src, src_size, 0 };
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * In the loop, we try to compress all the data into one zstd frame.
+ * ZSTD_compressStream2 potentially can finish a frame earlier
+ * than the full input data is consumed. That's why we are looping
+ * until all the input data is consumed.
+ */
+while (input.pos < input.size) {
+size_t zstd_ret;
+/*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+ * with ZSTD_e_end until it returns 0, at which point you are
+ 
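
The message is cut short here by the archive. As a rough sketch of the loop
being described (not the patch's exact code), the ZSTD_e_end pattern looks
roughly like this; it uses only public zstd streaming API calls and builds
with `cc demo.c -lzstd`:

    /* Sketch: compress one buffer into a single zstd frame by calling
     * ZSTD_compressStream2() with ZSTD_e_end until it returns 0. */
    #include <stdio.h>
    #include <string.h>
    #include <sys/types.h>
    #include <zstd.h>

    static ssize_t compress_buf(void *dest, size_t dest_size,
                                const void *src, size_t src_size)
    {
        ZSTD_outBuffer output = { dest, dest_size, 0 };
        ZSTD_inBuffer input = { src, src_size, 0 };
        ZSTD_CCtx *cctx = ZSTD_createCCtx();
        size_t zstd_ret = 0;
        ssize_t ret = -1;

        if (!cctx) {
            return -1;
        }
        do {
            zstd_ret = ZSTD_compressStream2(cctx, &output, &input, ZSTD_e_end);
            if (ZSTD_isError(zstd_ret)) {
                goto out;                     /* hard error */
            }
            if (zstd_ret != 0 && output.pos == output.size) {
                goto out;                     /* destination buffer too small */
            }
        } while (zstd_ret != 0);              /* 0 == frame fully written */

        ret = output.pos;                     /* compressed size */
    out:
        ZSTD_freeCCtx(cctx);
        return ret;
    }

    int main(void)
    {
        char src[4096], dst[8192];
        memset(src, 'A', sizeof(src));
        printf("compressed %zu bytes into %zd\n", sizeof(src),
               compress_buf(dst, sizeof(dst), src, sizeof(src)));
        return 0;
    }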

[PATCH v17 0/4] qcow2: Implement zstd cluster compression method

2020-04-01 Thread Denis Plotnikov
v17:
   * 03: remove incorrect comment in zstd decompress [Vladimir]
   * 03: remove "paraniod" and rewrite the comment on decompress [Vladimir]
   * 03: fix dead assignment [Vladimir]
   * 04: add and remove quotes [Vladimir]
   * 04: replace long offset form with the short one [Vladimir]

v16:
   * 03: ssize_t for ret, size_t for zstd_ret [Vladimir]
   * 04: small fixes according to the comments [Vladimir] 

v15:
   * 01: aiming qemu 5.1 [Eric]
   * 03: change zstd_res definition place [Vladimir]
   * 04: add two new test cases [Eric]
 1. test adjacent cluster compression with zstd
 2. test incompressible cluster processing
   * 03, 04: many rewordings and grammar fixes [Eric]

v14:
   * fix bug on compression - looping until compress == 0 [Me]
   * apply reworked Vladimir's suggestions:
  1. not mixing ssize_t with size_t
  2. safe check for ENOMEM in compression part - avoid overflow
  3. tolerate sanity check allow zstd to make progress only
 on one of the buffers
v13:
   * 03: add progress sanity check to decompression loop [Vladimir]
 03: add successful decompression check [Me]

v12:
   * 03: again, rework compression and decompression loops
 to make them more correct [Vladimir]
 03: move assert in compression to more appropriate place
 [Vladimir]
v11:
   * 03: the loops don't need "do{}while" form anymore, and
 they were buggy (missed "do" in the beginning);
 replace them with usual "while(){}" loops [Vladimir]
v10:
   * 03: fix zstd (de)compressed loops for multi-frame
 cases [Vladimir]
v9:
   * 01: fix error checking and reporting in qcow2_amend compression type part 
[Vladimir]
   * 03: replace asserts with -EIO in qcow2_zstd_decompression [Vladimir, 
Alberto]
   * 03: reword/amend/add comments, fix typos [Vladimir]

v8:
   * 03: switch zstd API from simple to stream [Eric]
 No need to state a special cluster layout for zstd
 compressed clusters.
v7:
   * use qapi_enum_parse instead of the open-coding [Eric]
   * fix wording, typos and spelling [Eric]

v6:
   * "block/qcow2-threads: fix qcow2_decompress" is removed from the series
  since it has been accepted by Max already
   * add compile time checking for Qcow2Header to be a multiple of 8 [Max, 
Alberto]
   * report error on qcow2 amending when the compression type is actually 
changed [Max]
   * remove the extra space and the extra new line [Max]
   * re-arrange acks and signed-off-s [Vladimir]

v5:
   * replace -ENOTSUP with abort in qcow2_co_decompress [Vladimir]
   * set cluster size for all test cases in the beginning of the 287 test

v4:
   * the series is rebased on top of 01 "block/qcow2-threads: fix 
qcow2_decompress"
   * 01 is just a no-change resend to avoid extra dependencies. Still, it may 
be merged in separate

v3:
   * remove redundant max compression type value check [Vladimir, Eric]
 (the switch below checks everything)
   * prevent compression type changing on "qemu-img amend" [Vladimir]
   * remove zstd config setting, since it has been added already by
 "migration" patches [Vladimir]
   * change the compression type error message [Vladimir] 
   * fix alignment and 80-chars exceeding [Vladimir]

v2:
   * rework compression type setting [Vladimir]
   * squash iotest changes to the compression type introduction patch 
[Vladimir, Eric]
   * fix zstd availability checking in zstd iotest [Vladimir]
   * remove unnecessary casting [Eric]
   * remove redundant checks [Eric]
   * fix compressed cluster layout in qcow2 spec [Vladimir]
   * fix wording [Eric, Vladimir]
   * fix compression type filtering in iotests [Eric]

v1:
   the initial series



Denis Plotnikov (4):
  qcow2: introduce compression type feature
  qcow2: rework the cluster compression routine
  qcow2: add zstd cluster compression
  iotests: 287: add qcow2 compression type test

 docs/interop/qcow2.txt   |   1 +
 configure|   2 +-
 qapi/block-core.json |  23 +++-
 block/qcow2.h|  20 ++-
 include/block/block_int.h|   1 +
 block/qcow2-threads.c| 228 +--
 block/qcow2.c| 120 
 tests/qemu-iotests/031.out   |  14 +-
 tests/qemu-iotests/036.out   |   4 +-
 tests/qemu-iotests/049.out   | 102 +++---
 tests/qemu-iotests/060.out   |   1 +
 tests/qemu-iotests/061.out   |  34 +++--
 tests/qemu-iotests/065   |  28 ++--
 tests/qemu-iotests/080   |   2 +-
 tests/qemu-iotests/144.out   |   4 +-
 tests/qemu-iotests/182.out   |   2 +-
 tests/qemu-iotests/242.out   |   5 +
 tests/qemu-iotests/255.out   |   8 +-
 tests/qemu-iotests/287   | 162 ++
 tests/qemu-iotests/287.out   |  70 ++
 tests/qemu-iotests/common.filter |   3 +-
 tests/qemu-iotests/group |   1 +
 22 files changed, 727 insertions(+), 108 deletions(-)
 

[PATCH v17 2/4] qcow2: rework the cluster compression routine

2020-04-01 Thread Denis Plotnikov
The patch enables processing of the compression type defined
for the image and chooses an appropriate method for image cluster
(de)compression.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Alberto Garcia 
---
 block/qcow2-threads.c | 71 ---
 1 file changed, 60 insertions(+), 11 deletions(-)

diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index a68126f291..7dbaf53489 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -74,7 +74,9 @@ typedef struct Qcow2CompressData {
 } Qcow2CompressData;
 
 /*
- * qcow2_compress()
+ * qcow2_zlib_compress()
+ *
+ * Compress @src_size bytes of data using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -83,8 +85,8 @@ typedef struct Qcow2CompressData {
  *  -ENOMEM destination buffer is not enough to store compressed data
  *  -EIOon any other error
  */
-static ssize_t qcow2_compress(void *dest, size_t dest_size,
-  const void *src, size_t src_size)
+static ssize_t qcow2_zlib_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
 {
 ssize_t ret;
 z_stream strm;
@@ -119,10 +121,10 @@ static ssize_t qcow2_compress(void *dest, size_t 
dest_size,
 }
 
 /*
- * qcow2_decompress()
+ * qcow2_zlib_decompress()
  *
  * Decompress some data (not more than @src_size bytes) to produce exactly
- * @dest_size bytes.
+ * @dest_size bytes using zlib compression method
  *
  * @dest - destination buffer, @dest_size bytes
  * @src - source buffer, @src_size bytes
@@ -130,8 +132,8 @@ static ssize_t qcow2_compress(void *dest, size_t dest_size,
  * Returns: 0 on success
  *  -EIO on fail
  */
-static ssize_t qcow2_decompress(void *dest, size_t dest_size,
-const void *src, size_t src_size)
+static ssize_t qcow2_zlib_decompress(void *dest, size_t dest_size,
+ const void *src, size_t src_size)
 {
 int ret;
 z_stream strm;
@@ -191,20 +193,67 @@ qcow2_co_do_compress(BlockDriverState *bs, void *dest, 
size_t dest_size,
 return arg.ret;
 }
 
+/*
+ * qcow2_co_compress()
+ *
+ * Compress @src_size bytes of data using the compression
+ * method defined by the image compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_compress(BlockDriverState *bs, void *dest, size_t dest_size,
   const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_compress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_compress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
+/*
+ * qcow2_co_decompress()
+ *
+ * Decompress some data (not more than @src_size bytes) to produce exactly
+ * @dest_size bytes using the compression method defined by the image
+ * compression type
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: 0 on success
+ *  a negative error code on failure
+ */
 ssize_t coroutine_fn
 qcow2_co_decompress(BlockDriverState *bs, void *dest, size_t dest_size,
 const void *src, size_t src_size)
 {
-return qcow2_co_do_compress(bs, dest, dest_size, src, src_size,
-qcow2_decompress);
+BDRVQcow2State *s = bs->opaque;
+Qcow2CompressFunc fn;
+
+switch (s->compression_type) {
+case QCOW2_COMPRESSION_TYPE_ZLIB:
+fn = qcow2_zlib_decompress;
+break;
+
+default:
+abort();
+}
+
+return qcow2_co_do_compress(bs, dest, dest_size, src, src_size, fn);
 }
 
 
-- 
2.17.0




[PATCH v17 4/4] iotests: 287: add qcow2 compression type test

2020-04-01 Thread Denis Plotnikov
The test checks that the qcow2 requirements for the compression
type feature are fulfilled and that the zstd compression type works.

Signed-off-by: Denis Plotnikov 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 tests/qemu-iotests/287 | 162 +
 tests/qemu-iotests/287.out |  70 
 tests/qemu-iotests/group   |   1 +
 3 files changed, 233 insertions(+)
 create mode 100755 tests/qemu-iotests/287
 create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..ff59c9c154
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,162 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"
+
+_cleanup()
+{
+   _cleanup_test_img
+   rm -f $COMPR_IMG
+   rm -f $RAND_FILE
+}
+trap "_cleanup; exit \$status" 0 1 2 3 15
+
+# for all the cases
+CLUSTER_SIZE=65536
+
+# Check if we can run this test.
+if IMGOPTS='compression_type=zstd' _make_test_img 64M |
+grep "Invalid parameter 'zstd'"; then
+_notrun "ZSTD is disabled"
+fi
+
+# Test: when compression is zlib the incompatible bit is unset
+echo
+echo "=== Testing compression type incompatible bit setting for zlib ==="
+echo
+
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Test: when compression differs from zlib the incompatible bit is set
+echo
+echo "=== Testing compression type incompatible bit setting for zstd ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+
+# Test: an image can't be opened if compression type is zlib and
+#   incompatible feature compression type is set
+echo
+echo "=== Testing zlib with incompatible bit set ==="
+echo
+
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" set-feature-bit incompatible 3
+# to make sure the bit was actually set
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+$QEMU_IMG info "$TEST_IMG" 2>&1 1>/dev/null
+if (($?==0)); then
+echo "Error: The image opened successfully. The image must not be opened"
+fi
+
+# Test: an image can't be opened if compression type is NOT zlib and
+#   incompatible feature compression type is UNSET
+echo
+echo "=== Testing zstd with incompatible bit unset ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$PYTHON qcow2.py "$TEST_IMG" set-header incompatible_features 0
+# to make sure the bit was actually unset
+$PYTHON qcow2.py "$TEST_IMG" dump-header | grep incompatible_features
+$QEMU_IMG info "$TEST_IMG" 2>&1 1>/dev/null
+if (($?==0)); then
+echo "Error: The image opened successfully. The image must not be opened"
+fi
+# Test: check compression type values
+echo
+echo "=== Testing compression type values ==="
+echo
+# zlib=0
+IMGOPTS='compression_type=zlib' _make_test_img 64M
+od -j104 -N1 -An -vtu1 "$TEST_IMG"
+
+# zstd=1
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+od -j104 -N1 -An -vtu1 "$TEST_IMG"
+
+# Test: using zstd compression, write to and read from an image
+echo
+echo "=== Testing simple reading and writing with zstd ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$QEMU_IO -c "write -c -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -P 0xAC 64K 64K " "$TEST_IMG" | _filter_qemu_io
+# read on the cluster boundaries
+$QEMU_IO -c "read -v 131070 8 " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c "read -v 65534 8" "$TEST_IMG" | _filter_qemu_io
+
+# Test: using zstd compression, write and verify three adjacent
+#   compressed clusters
+echo
+echo "=== Testing adjacent clusters reading and writing with zstd ==="
+echo
+
+IMGOPTS='compression_type=zstd' _make_test_img 64M
+$QEMU_IO -c "write -c -P 0xAB 0 64K " "$TEST_IMG" | _filter_qemu_io
+$QEMU_IO -c 

Re: [PATCH v17 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-04-01 Thread Alex Williamson
On Wed, 1 Apr 2020 11:51:03 -0600
Alex Williamson  wrote:

> On Wed, 1 Apr 2020 22:55:57 +0530
> Kirti Wankhede  wrote:
> 
> > On 4/1/2020 12:45 AM, Alex Williamson wrote:  
> > > On Wed, 1 Apr 2020 00:38:49 +0530
> > > Kirti Wankhede  wrote:
> > > 
> > >> On 3/31/2020 2:28 AM, Alex Williamson wrote:
> > >>> On Mon, 30 Mar 2020 22:20:43 +0530
> > >>> Kirti Wankhede  wrote:
> > >>>
> >  Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that 
> >  driver
> >  support dirty pages tracking.
> > 
> >  Signed-off-by: Kirti Wankhede 
> >  Reviewed-by: Neo Jia 
> >  ---
> > drivers/vfio/vfio_iommu_type1.c | 3 ++-
> > include/uapi/linux/vfio.h   | 5 +++--
> > 2 files changed, 5 insertions(+), 3 deletions(-)
> > 
> >  diff --git a/drivers/vfio/vfio_iommu_type1.c 
> >  b/drivers/vfio/vfio_iommu_type1.c
> >  index 266550bd7307..9fe12b425976 100644
> >  --- a/drivers/vfio/vfio_iommu_type1.c
> >  +++ b/drivers/vfio/vfio_iommu_type1.c
> >  @@ -2390,7 +2390,8 @@ static long vfio_iommu_type1_ioctl(void 
> >  *iommu_data,
> > info.cap_offset = 0; /* output, no-recopy 
> >  necessary */
> > }
> > 
> >  -  info.flags = VFIO_IOMMU_INFO_PGSIZES;
> >  +  info.flags = VFIO_IOMMU_INFO_PGSIZES |
> >  +   VFIO_IOMMU_INFO_DIRTY_PGS;
> > 
> > info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> > 
> >  diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> >  index e3cbf8b78623..0fe7c9a6f211 100644
> >  --- a/include/uapi/linux/vfio.h
> >  +++ b/include/uapi/linux/vfio.h
> >  @@ -985,8 +985,9 @@ struct vfio_device_feature {
> > struct vfio_iommu_type1_info {
> > __u32   argsz;
> > __u32   flags;
> >  -#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)  /* supported page sizes 
> >  info */
> >  -#define VFIO_IOMMU_INFO_CAPS  (1 << 1)/* Info supports caps */
> >  +#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes 
> >  info */
> >  +#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
> >  +#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page 
> >  tracking */
> > __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
> > __u32   cap_offset; /* Offset within info struct of first 
> >  cap */
> > };
> > >>>
> > >>>
> > >>> As I just mentioned in my reply to Yan, I'm wondering if
> > >>> VFIO_CHECK_EXTENSION would be a better way to expose this.  The
> > >>> difference is relatively trivial, but currently the only flag
> > >>> set by VFIO_IOMMU_GET_INFO is to indicate the presence of a field in
> > >>> the returned structure.  I think this is largely true of other INFO
> > >>> ioctls within vfio as well and we're already using the
> > >>> VFIO_CHECK_EXTENSION ioctl to check supported IOMMU models, and IOMMU
> > >>> cache coherency.  We'd simply need to define a VFIO_DIRTY_PGS_IOMMU
> > >>> value (9) and return 1 for that case.  Then when we enable support for
> > >>> dirt pages that can span multiple mappings, we can add a v2 extensions,
> > >>> or "MULTI" variant of this extension, since it should be backwards
> > >>> compatible.
> > >>>
> > >>> The v2/multi version will again require that the user provide a zero'd
> > >>> bitmap, but I don't think that should be a problem as part of the
> > >>> definition of that version (we won't know if the user is using v1 or
> > >>> v2, but a v1 user should only retrieve bitmaps that exactly match
> > >>> existing mappings, where all bits will be written).  Thanks,
> > >>>
> > >>> Alex
> > >>>
> > >>
> > >> I look at these two ioctls as : VFIO_CHECK_EXTENSION is used to get
> > >> IOMMU type, while VFIO_IOMMU_GET_INFO is used to get properties of a
> > >> particular IOMMU type, right?
> > > 
> > > Not exclusively, see for example VFIO_DMA_CC_IOMMU,
> > > 
> > >> Then I think VFIO_IOMMU_INFO_DIRTY_PGS should be part of
> > >> VFIO_IOMMU_GET_INFO and when we add code for v2/multi, a flag should be
> > >> added to VFIO_IOMMU_GET_INFO.
> > > 
> > > Which burns through flags, which is a far more limited resource than
> > > our 32bit extension address space, especially when we're already
> > > planning for one or more extensions to this support.  Thanks,
> > > 
> > 
> > To use flag from VFIO_IOMMU_GET_INFO was your original suggestion, only 
> > 3 bits are used here as of now.  
> 
> Sorry, I'm not infallible.  Perhaps I was short sighted and thought we
> might only need one flag, perhaps I forgot about the check-extension
> ioctl.  Are there any technical reasons to keep it on the get-info
> ioctl?  As I'm trying to look ahead for how we're going to fill the
> gaps of this initial implementation, it seems to me that what 

Re: [PATCH v17 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-04-01 Thread Alex Williamson
On Wed, 1 Apr 2020 22:55:57 +0530
Kirti Wankhede  wrote:

> On 4/1/2020 12:45 AM, Alex Williamson wrote:
> > On Wed, 1 Apr 2020 00:38:49 +0530
> > Kirti Wankhede  wrote:
> >   
> >> On 3/31/2020 2:28 AM, Alex Williamson wrote:  
> >>> On Mon, 30 Mar 2020 22:20:43 +0530
> >>> Kirti Wankhede  wrote:
> >>>  
>  Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that 
>  driver
>  support dirty pages tracking.
> 
>  Signed-off-by: Kirti Wankhede 
>  Reviewed-by: Neo Jia 
>  ---
> drivers/vfio/vfio_iommu_type1.c | 3 ++-
> include/uapi/linux/vfio.h   | 5 +++--
> 2 files changed, 5 insertions(+), 3 deletions(-)
> 
>  diff --git a/drivers/vfio/vfio_iommu_type1.c 
>  b/drivers/vfio/vfio_iommu_type1.c
>  index 266550bd7307..9fe12b425976 100644
>  --- a/drivers/vfio/vfio_iommu_type1.c
>  +++ b/drivers/vfio/vfio_iommu_type1.c
>  @@ -2390,7 +2390,8 @@ static long vfio_iommu_type1_ioctl(void 
>  *iommu_data,
>   info.cap_offset = 0; /* output, no-recopy 
>  necessary */
>   }
> 
>  -info.flags = VFIO_IOMMU_INFO_PGSIZES;
>  +info.flags = VFIO_IOMMU_INFO_PGSIZES |
>  + VFIO_IOMMU_INFO_DIRTY_PGS;
> 
>   info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
> 
>  diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
>  index e3cbf8b78623..0fe7c9a6f211 100644
>  --- a/include/uapi/linux/vfio.h
>  +++ b/include/uapi/linux/vfio.h
>  @@ -985,8 +985,9 @@ struct vfio_device_feature {
> struct vfio_iommu_type1_info {
>   __u32   argsz;
>   __u32   flags;
>  -#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)/* supported page sizes 
>  info */
>  -#define VFIO_IOMMU_INFO_CAPS(1 << 1)/* Info supports caps */
>  +#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info 
>  */
>  +#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
>  +#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page 
>  tracking */
>   __u64   iova_pgsizes;   /* Bitmap of supported page sizes */
>   __u32   cap_offset; /* Offset within info struct of first 
>  cap */
> };  
> >>>
> >>>
> >>> As I just mentioned in my reply to Yan, I'm wondering if
> >>> VFIO_CHECK_EXTENSION would be a better way to expose this.  The
> >>> difference is relatively trivial, but currently the only flag
> >>> set by VFIO_IOMMU_GET_INFO is to indicate the presence of a field in
> >>> the returned structure.  I think this is largely true of other INFO
> >>> ioctls within vfio as well and we're already using the
> >>> VFIO_CHECK_EXTENSION ioctl to check supported IOMMU models, and IOMMU
> >>> cache coherency.  We'd simply need to define a VFIO_DIRTY_PGS_IOMMU
> >>> value (9) and return 1 for that case.  Then when we enable support for
> >>> dirt pages that can span multiple mappings, we can add a v2 extensions,
> >>> or "MULTI" variant of this extension, since it should be backwards
> >>> compatible.
> >>>
> >>> The v2/multi version will again require that the user provide a zero'd
> >>> bitmap, but I don't think that should be a problem as part of the
> >>> definition of that version (we won't know if the user is using v1 or
> >>> v2, but a v1 user should only retrieve bitmaps that exactly match
> >>> existing mappings, where all bits will be written).  Thanks,
> >>>
> >>> Alex
> >>>  
> >>
> >> I look at these two ioctls as : VFIO_CHECK_EXTENSION is used to get
> >> IOMMU type, while VFIO_IOMMU_GET_INFO is used to get properties of a
> >> particular IOMMU type, right?  
> > 
> > Not exclusively, see for example VFIO_DMA_CC_IOMMU,
> >   
> >> Then I think VFIO_IOMMU_INFO_DIRTY_PGS should be part of
> >> VFIO_IOMMU_GET_INFO and when we add code for v2/multi, a flag should be
> >> added to VFIO_IOMMU_GET_INFO.  
> > 
> > Which burns through flags, which is a far more limited resource than
> > our 32bit extension address space, especially when we're already
> > planning for one or more extensions to this support.  Thanks,
> >   
> 
> To use flag from VFIO_IOMMU_GET_INFO was your original suggestion, only 
> 3 bits are used here as of now.

Sorry, I'm not infallible.  Perhaps I was short sighted and thought we
might only need one flag, perhaps I forgot about the check-extension
ioctl.  Are there any technical reasons to keep it on the get-info
ioctl?  As I'm trying to look ahead for how we're going to fill the
gaps of this initial implementation, it seems to me that what we're
exposing here is in line with what we've used check-extension for in
the past, and it offers us essentially unlimited extensions to burn
through, while we're clearly limited on the get-info flags.  We do have
the precedent of the reset flag on the device_get_info 
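
The message is truncated here. For readers following the API debate, a minimal
userspace sketch of the check-extension approach being proposed; note that
VFIO_DIRTY_PGS_IOMMU == 9 is only the value suggested above, not merged UAPI:

    /* Sketch: probe the proposed dirty-page-tracking extension on a
     * VFIO container, the same way the type1/coherency extensions are probed. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <sys/ioctl.h>
    #include <linux/vfio.h>

    #ifndef VFIO_DIRTY_PGS_IOMMU
    #define VFIO_DIRTY_PGS_IOMMU 9   /* proposed in this thread, not in the UAPI yet */
    #endif

    int main(void)
    {
        int container = open("/dev/vfio/vfio", O_RDWR);

        if (container < 0) {
            perror("open /dev/vfio/vfio");
            return 1;
        }
        if (ioctl(container, VFIO_CHECK_EXTENSION, VFIO_DIRTY_PGS_IOMMU) == 1) {
            printf("dirty page tracking supported\n");
        } else {
            printf("dirty page tracking not supported\n");
        }
        return 0;
    }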

Re: [PATCH-for-5.0 1/7] tests/acceptance/machine_sparc_leon3: Disable HelenOS test

2020-04-01 Thread Willian Rampazzo
On Tue, Mar 31, 2020 at 5:07 PM Philippe Mathieu-Daudé
 wrote:

>
> First job failed by timeout, 2nd succeeded:
> https://travis-ci.org/github/philmd/qemu/jobs/669265466
>
> However "Ran for 46 min 48 sec"
>
>  From the log:
>
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_initrd
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_sd
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_bionic
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9
> Fetching asset from
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9
> Fetching asset from
> tests/acceptance/ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_and_netbsd
> ...
>   (13/82)
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_mips64el_malta_5KEc_cpio:
>   SKIP: untrusted code
>   (24/82)
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_bionic:
>   SKIP: storage limited
> ...
>   (25/82)
> tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_arm_orangepi_uboot_netbsd9:
>   SKIP: storage limited
> ...
>   (63/82)
> tests/acceptance/ppc_prep_40p.py:IbmPrep40pMachine.test_openbios_and_netbsd:
>   SKIP: Running on Travis-CI
>
> Is it possible that we are now fetching assets for tests we are not
> running? In particular the one marked @skip because the downloading time
> was too long on Travis?

Yes, your assumption is correct, this execution of Avocado downloaded
assets for tests that were skipped. Let me try to explain how the
asset feature works today on Avocado.

Avocado has two basic ways to work with assets:

1. Parse limited use cases of `fetch_asset` call in the test file and
execute them. This operation can happen in two different scenarios.
First, when using the command line `avocado assets fetch `.
In this case, it is a standalone execution of each fetch call and the
test is not executed at all. Second, by running the test. The, enabled
by default, plugin FetchAssetJob will do the same operation of parsing
the test file and executing occurrences of `fetch_asset` call before
the tests start to run. Again, the fetch time is not computed in the
job time.

2. Execute the `fetch_asset` call from each test during the test
execution. In this case, the FetchAssetJob plugin should be disabled.
The fetch time is added to the job time as the asset download occurs
during the test execution.

The acceptance tests which make use of `fetch_asset` are all using the
first method with FetchAssetJob plugin enabled. As Avocado is parsing
the test file before it starts to run the tests, it is not aware of
possible skips that may occur during a test execution due to possible
dynamic dependency.

This is not the desired behavior, as you mentioned, Avocado is
downloading an asset that will not be used because its test will be
skipped. To minimize the damage on the download side, the Travis job
is holding the avocado cache. It means the download should happen just
once. This does not minimize the damage to space usage.

One possible workaround here is to temporarily disable the
FetchAssetJob plugin, now that the needed assets are on Travis Avocado
cache. The downside is that when an asset is not available in the
cache, it will be downloaded during the test execution and the
download time will be added to the job time. I don't know if it is
possible to manually remove an asset from Travis Avocado cache. If so,
this can be done for the tests that should be skipped because of space
usage.

We have been trying to make the Asset feature as flexible as possible
to accommodate the use cases we have been identifying. Thanks for
reporting this!

>
> RESULTS: PASS 65 | ERROR 0 | FAIL 0 | SKIP 14 | WARN 0 | INTERRUPT 0
> | CANCEL 3
> JOB TIME   : 1480.72 s
>
> Does this "JOB TIME" sums the 'Fetching asset' part?

Answered in the comments above.

>
> Thanks,
>
> Phil.
>

Willian




Re: [PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-04-01 Thread Dr. David Alan Gilbert
* Kirti Wankhede (kwankh...@nvidia.com) wrote:
> Define flags to be used as delimiters in the migration file stream.
> Added .save_setup and .save_cleanup functions. Mapped & unmapped migration
> region from these functions at source during saving or pre-copy phase.
> Set VFIO device state depending on VM's state. During live migration, VM is
> running when .save_setup is called, _SAVING | _RUNNING state is set for VFIO
> device. During save-restore, VM is paused, _SAVING state is set for VFIO 
> device.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/migration.c  | 76 
> 
>  hw/vfio/trace-events |  2 ++
>  2 files changed, 78 insertions(+)
> 
> diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
> index 22ded9d28cf3..033f76526e49 100644
> --- a/hw/vfio/migration.c
> +++ b/hw/vfio/migration.c
> @@ -8,6 +8,7 @@
>   */
>  
>  #include "qemu/osdep.h"
> +#include "qemu/main-loop.h"
>  #include 
>  
>  #include "sysemu/runstate.h"
> @@ -24,6 +25,17 @@
>  #include "pci.h"
>  #include "trace.h"
>  
> +/*
> + * Flags used as delimiter:
> + * 0xffffffff => MSB 32-bit all 1s
> + * 0xef10     => emulated (virtual) function IO
> + * 0x0000     => 16-bits reserved for flags
> + */
> +#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
> +#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
> +#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
> +#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
> +
>  static void vfio_migration_region_exit(VFIODevice *vbasedev)
>  {
>  VFIOMigration *migration = vbasedev->migration;
> @@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice 
> *vbasedev, uint32_t mask,
>  return 0;
>  }
>  
> +/* -- */
> +
> +static int vfio_save_setup(QEMUFile *f, void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +int ret;
> +
> +qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
> +
> +if (migration->region.mmaps) {
> +qemu_mutex_lock_iothread();
> +ret = vfio_region_mmap(&migration->region);
> +qemu_mutex_unlock_iothread();
> +if (ret) {
> +error_report("%s: Failed to mmap VFIO migration region %d: %s",
> + vbasedev->name, migration->region.index,
> + strerror(-ret));
> +return ret;
> +}
> +}
> +
> +ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
> +if (ret) {
> +error_report("%s: Failed to set state SAVING", vbasedev->name);
> +return ret;
> +}
> +
> +/*
> + * Save migration region size. This is used to verify migration region 
> size
> + * is greater than or equal to migration region size at destination
> + */
> +qemu_put_be64(f, migration->region.size);
> +
> +qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);

OK, good, so now we can change that to something else if you want to
migrate something extra in the future.

> +ret = qemu_file_get_error(f);
> +if (ret) {
> +return ret;
> +}
> +
> +trace_vfio_save_setup(vbasedev->name);

I'd put that trace at the start of the function.

> +return 0;
> +}
> +
> +static void vfio_save_cleanup(void *opaque)
> +{
> +VFIODevice *vbasedev = opaque;
> +VFIOMigration *migration = vbasedev->migration;
> +
> +if (migration->region.mmaps) {
> +vfio_region_unmap(&migration->region);
> +}
> +trace_vfio_save_cleanup(vbasedev->name);
> +}
> +
> +static SaveVMHandlers savevm_vfio_handlers = {
> +.save_setup = vfio_save_setup,
> +.save_cleanup = vfio_save_cleanup,
> +};
> +
> +/* -- */
> +
>  static void vfio_vmstate_change(void *opaque, int running, RunState state)
>  {
>  VFIODevice *vbasedev = opaque;
> @@ -191,6 +266,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
>  return ret;
>  }
>  
> +register_savevm_live("vfio", -1, 1, &savevm_vfio_handlers, vbasedev);

That doesn't look right to me;  firstly the -1 should now be
VMSTATE_INSTANCE_ID_ANY - after the recent change in commit 1df2c9a

Have you tried this with two vfio devices?
This is quite rare - it's an iterative device that can have
multiple instances;  if you look at 'ram' for example, all the RAM
instances are handled inside the save_setup/save for the one instance of
'ram'.  I think here you're trying to register an individual vfio
device, so if you had multiple devices you'd see this called twice.

So either you need to make vfio_save_* do all of the devices in a loop -
which feels like a bad idea;  or replace "vfio" in that call by a unique
device name;  as long as your device has a bus path then you should be
able to use the same trick vmstate_register_with_alias_id does, and use
I think,  
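
The message is truncated here by the archive. To make the suggestion concrete,
one possible shape (an illustration, not the series' code; it assumes
vbasedev->name is unique per device, which holds for vfio-pci where it is the
sysfs device name):

    /* Sketch: register one savevm entry per VFIO device under a unique
     * idstr instead of a shared "vfio" name.  register_savevm_live()
     * copies the idstr, so the temporary string can be freed. */
    g_autofree char *idstr = g_strdup_printf("%s/vfio", vbasedev->name);

    register_savevm_live(idstr, VMSTATE_INSTANCE_ID_ANY, 1,
                         &savevm_vfio_handlers, vbasedev);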

Re: [PATCH v17 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-04-01 Thread Kirti Wankhede




On 4/1/2020 12:45 AM, Alex Williamson wrote:

On Wed, 1 Apr 2020 00:38:49 +0530
Kirti Wankhede  wrote:


On 3/31/2020 2:28 AM, Alex Williamson wrote:

On Mon, 30 Mar 2020 22:20:43 +0530
Kirti Wankhede  wrote:
   

Flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that driver
support dirty pages tracking.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
   drivers/vfio/vfio_iommu_type1.c | 3 ++-
   include/uapi/linux/vfio.h   | 5 +++--
   2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 266550bd7307..9fe12b425976 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2390,7 +2390,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.cap_offset = 0; /* output, no-recopy necessary */
}
   
-		info.flags = VFIO_IOMMU_INFO_PGSIZES;

+   info.flags = VFIO_IOMMU_INFO_PGSIZES |
+VFIO_IOMMU_INFO_DIRTY_PGS;
   
   		info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
   
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h

index e3cbf8b78623..0fe7c9a6f211 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -985,8 +985,9 @@ struct vfio_device_feature {
   struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0) /* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS   (1 << 1)  /* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
__u64   iova_pgsizes;   /* Bitmap of supported page sizes */
__u32   cap_offset; /* Offset within info struct of first cap */
   };



As I just mentioned in my reply to Yan, I'm wondering if
VFIO_CHECK_EXTENSION would be a better way to expose this.  The
difference is relatively trivial, but currently the only flag
set by VFIO_IOMMU_GET_INFO is to indicate the presence of a field in
the returned structure.  I think this is largely true of other INFO
ioctls within vfio as well and we're already using the
VFIO_CHECK_EXTENSION ioctl to check supported IOMMU models, and IOMMU
cache coherency.  We'd simply need to define a VFIO_DIRTY_PGS_IOMMU
value (9) and return 1 for that case.  Then when we enable support for
dirty pages that can span multiple mappings, we can add a v2 extension,
or "MULTI" variant of this extension, since it should be backwards
compatible.

The v2/multi version will again require that the user provide a zero'd
bitmap, but I don't think that should be a problem as part of the
definition of that version (we won't know if the user is using v1 or
v2, but a v1 user should only retrieve bitmaps that exactly match
existing mappings, where all bits will be written).  Thanks,

Alex
   


I look at these two ioctls as : VFIO_CHECK_EXTENSION is used to get
IOMMU type, while VFIO_IOMMU_GET_INFO is used to get properties of a
particular IOMMU type, right?


Not exclusively, see for example VFIO_DMA_CC_IOMMU,


Then I think VFIO_IOMMU_INFO_DIRTY_PGS should be part of
VFIO_IOMMU_GET_INFO and when we add code for v2/multi, a flag should be
added to VFIO_IOMMU_GET_INFO.


Which burns through flags, which is a far more limited resource than
our 32bit extension address space, especially when we're already
planning for one or more extensions to this support.  Thanks,



Using a flag from VFIO_IOMMU_GET_INFO was your original suggestion; only
3 bits are used here as of now.


Thanks,
Kirti



Re: RFC: use VFIO over a UNIX domain socket to implement device offloading

2020-04-01 Thread Marc-André Lureau
Hi

On Wed, Apr 1, 2020 at 5:51 PM Thanos Makatos
 wrote:
>
> > On Thu, Mar 26, 2020 at 09:47:38AM +, Thanos Makatos wrote:
> > > Build MUSER with vfio-over-socket:
> > >
> > > git clone --single-branch --branch vfio-over-socket
> > g...@github.com:tmakatos/muser.git
> > > cd muser/
> > > git submodule update --init
> > > make
> > >
> > > Run device emulation, e.g.
> > >
> > > ./build/dbg/samples/gpio-pci-idio-16 -s 
> > >
> > > Where  is an available IOMMU group, essentially the device ID, which
> > must not
> > > previously exist in /dev/vfio/.
> > >
> > > Run QEMU using the vfio wrapper library and specifying the MUSER device:
> > >
> > > LD_PRELOAD=muser/build/dbg/libvfio/libvfio.so qemu-system-x86_64
> > \
> > > ... \
> > > -device vfio-pci,sysfsdev=/dev/vfio/ \
> > > -object memory-backend-file,id=ram-node0,prealloc=yes,mem-
> > path=mem,share=yes,size=1073741824 \
> > > -numa node,nodeid=0,cpus=0,memdev=ram-node0
> > >

fyi, with 5.0 you no longer need -numa!:

-object memory-backend-memfd,id=mem,size=2G -M memory-backend=mem

(hopefully, we will get something even simpler one day)

> > > Bear in mind that since this is just a PoC lots of things can break, e.g. 
> > > some
> > > system call not intercepted etc.
> >
> > Cool, I had a quick look at libvfio and how the transport integrates
> > into libmuser.  The integration on the libmuser side is nice and small.
> >
> > It seems likely that there will be several different implementations of
> > the vfio-over-socket device side (server):
> > 1. libmuser
> > 2. A Rust equivalent to libmuser
> > 3. Maybe a native QEMU implementation for multi-process QEMU (I think JJ
> >has been investigating this?)
> >
> > In order to interoperate we'll need to maintain a protocol
> > specification.  Mayb You and JJ could put that together and CC the vfio,
> > rust-vmm, and QEMU communities for discussion?
>
> Sure, I can start by drafting a design doc and share it.

ok! I am quite amazed you went this far with an LD_PRELOAD hack. This
demonstrates some limits of GPL projects, if that was even necessary.

I think with this work, and the muser experience, you have a pretty
good idea of what the protocol could look like. My approach, as I
remember, was a quite straightforward VFIO over socket translation,
while trying to see if it could share some aspects with vhost-user,
such as memory handling etc.

To contrast with the work done on qemu-mp series, I'd also prefer we
focus our work on a vfio-like protocol, before trying to see how qemu
code and interface could be changed over multiple binaries etc. We
will start with some limitations, similar to the ones that apply to
VFIO: migration, introspection, management etc. are mostly left out
for now. (iow, qemu-mp is trying to do too many things simultaneously)

That's the rough ideas/plan I have in mind:
- draft/define a "vfio over unix" protocol
- similar to vhost-user, also define some backend conventions
https://github.com/qemu/qemu/blob/master/docs/interop/vhost-user.rst#backend-program-conventions
- modify qemu vfio code to allow using a socket backend. Ie something
like "-chardev socket=foo -device vfio-pci,chardev=foo"
- implement some test devices (and outside qemu, in whatever
language/framework - the more the merrier!)
- investigate how existing qemu binary could expose some devices over
"vfio-unix", for ex: "qemu -machine none -chardev socket=foo,server
-device pci-serial,vfio=foo". This would avoid a lot of proxy and code
churn proposed in qemu-mp.
- think about evolution of QMP, so that commands are dispatched to the
right process. In my book, this is called a bus, and I would go for
DBus (not through qemu) in the long term. But for now, we probably
want to split QMP code to make it more modular (in qemu-mp series,
this isn't stellar either). Later on, perhaps look at bridging QMP
over DBus.
- code refactoring in qemu, to allow smaller binaries, that implement
the minimum for vfio-user devices. (imho, this will be a bit easier
after the meson move, as the build system is simpler)

That should allow some work sharing.

I can't wait for your design draft, and see how I could help.

>
> > It should cover the UNIX domain socket connection semantics (does a
> > listen socket only accept 1 connection at a time?  What happens when the
> > client disconnects?  What happens when the server disconnects?), how
> > VFIO structs are exchanged, any vfio-over-socket specific protocol
> > messages, etc.  Basically everything needed to write an implementation
> > (although it's not necessary to copy the VFIO struct definitions from
> > the kernel headers into the spec or even document their semantics if
> > they are identical to kernel VFIO).
> >
> > The next step beyond the LD_PRELOAD library is a native vfio-over-socket
> > client implementation in QEMU.  There is a prototype here:
> > 

[PATCH v2] Compress lines for immediate return

2020-04-01 Thread Simran Singhal
Compress two lines into a single line if an immediate return statement is found.

It also removes the variables progress, val, data, ret and sock
as they are no longer needed.

Remove the space between the function name "mixer_load" and '(' to fix the
checkpatch.pl error:
ERROR: space prohibited between function name and open parenthesis '('

Done using following coccinelle script:
@@
local idexpression ret;
expression e;
@@

-ret =
+return
 e;
-return ret;

Signed-off-by: Simran Singhal 
---
Changes in v2:
-Added coccinelle script wrote for changes in commit message.

 block/file-posix.c  | 3 +--
 block/nfs.c | 3 +--
 block/nvme.c| 4 +---
 block/vhdx.c| 3 +--
 hw/audio/ac97.c | 4 +---
 hw/audio/adlib.c| 5 +
 hw/display/cirrus_vga.c | 4 +---
 migration/ram.c | 4 +---
 ui/gtk.c| 3 +--
 util/qemu-sockets.c | 5 +
 10 files changed, 10 insertions(+), 28 deletions(-)

diff --git a/block/file-posix.c b/block/file-posix.c
index 7e19bbff5f..dc01f0d4d3 100644
--- a/block/file-posix.c
+++ b/block/file-posix.c
@@ -1627,8 +1627,7 @@ static int handle_aiocb_write_zeroes_unmap(void *opaque)
 
 /* If we couldn't manage to unmap while guaranteed that the area reads as
  * all-zero afterwards, just write zeroes without unmapping */
-ret = handle_aiocb_write_zeroes(aiocb);
-return ret;
+return handle_aiocb_write_zeroes(aiocb);
 }
 
 #ifndef HAVE_COPY_FILE_RANGE
diff --git a/block/nfs.c b/block/nfs.c
index cc2413d5ab..100f15bd1f 100644
--- a/block/nfs.c
+++ b/block/nfs.c
@@ -623,8 +623,7 @@ static int nfs_file_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 bs->total_sectors = ret;
-ret = 0;
-return ret;
+return 0;
 }
 
 static QemuOptsList nfs_create_opts = {
diff --git a/block/nvme.c b/block/nvme.c
index 7b7c0cc5d6..eb2f54dd9d 100644
--- a/block/nvme.c
+++ b/block/nvme.c
@@ -575,11 +575,9 @@ static bool nvme_poll_cb(void *opaque)
 {
 EventNotifier *e = opaque;
 BDRVNVMeState *s = container_of(e, BDRVNVMeState, irq_notifier);
-bool progress = false;
 
 trace_nvme_poll_cb(s);
-progress = nvme_poll_queues(s);
-return progress;
+return nvme_poll_queues(s);
 }
 
 static int nvme_init(BlockDriverState *bs, const char *device, int namespace,
diff --git a/block/vhdx.c b/block/vhdx.c
index 33e57cd656..2c0e7ee44d 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -411,8 +411,7 @@ int vhdx_update_headers(BlockDriverState *bs, BDRVVHDXState 
*s,
 if (ret < 0) {
 return ret;
 }
-ret = vhdx_update_header(bs, s, generate_data_write_guid, log_guid);
-return ret;
+return vhdx_update_header(bs, s, generate_data_write_guid, log_guid);
 }
 
 /* opens the specified header block from the VHDX file header section */
diff --git a/hw/audio/ac97.c b/hw/audio/ac97.c
index 1ec87feec0..8a9b9924c4 100644
--- a/hw/audio/ac97.c
+++ b/hw/audio/ac97.c
@@ -573,11 +573,9 @@ static uint32_t nam_readb (void *opaque, uint32_t addr)
 static uint32_t nam_readw (void *opaque, uint32_t addr)
 {
 AC97LinkState *s = opaque;
-uint32_t val = ~0U;
 uint32_t index = addr;
 s->cas = 0;
-val = mixer_load (s, index);
-return val;
+return mixer_load(s, index);
 }
 
 static uint32_t nam_readl (void *opaque, uint32_t addr)
diff --git a/hw/audio/adlib.c b/hw/audio/adlib.c
index d6c1fb0586..7c3b67dcfb 100644
--- a/hw/audio/adlib.c
+++ b/hw/audio/adlib.c
@@ -120,13 +120,10 @@ static void adlib_write(void *opaque, uint32_t nport, 
uint32_t val)
 static uint32_t adlib_read(void *opaque, uint32_t nport)
 {
 AdlibState *s = opaque;
-uint8_t data;
 int a = nport & 3;
 
 adlib_kill_timers (s);
-data = OPLRead (s->opl, a);
-
-return data;
+return OPLRead (s->opl, a);
 }
 
 static void timer_handler (void *opaque, int c, double interval_Sec)
diff --git a/hw/display/cirrus_vga.c b/hw/display/cirrus_vga.c
index 0d391e1300..1f29731ffe 100644
--- a/hw/display/cirrus_vga.c
+++ b/hw/display/cirrus_vga.c
@@ -2411,12 +2411,10 @@ static uint64_t cirrus_linear_bitblt_read(void *opaque,
   unsigned size)
 {
 CirrusVGAState *s = opaque;
-uint32_t ret;
 
 /* XXX handle bitblt */
 (void)s;
-ret = 0xff;
-return ret;
+return 0xff;
 }
 
 static void cirrus_linear_bitblt_write(void *opaque,
diff --git a/migration/ram.c b/migration/ram.c
index 04f13feb2e..06cba88632 100644
--- a/migration/ram.c
+++ b/migration/ram.c
@@ -2135,9 +2135,7 @@ int ram_postcopy_send_discard_bitmap(MigrationState *ms)
 }
 trace_ram_postcopy_send_discard_bitmap();
 
-ret = postcopy_each_ram_send_discard(ms);
-
-return ret;
+return postcopy_each_ram_send_discard(ms);
 }
 
 /**
diff --git a/ui/gtk.c b/ui/gtk.c
index 030b251c61..83f2f5d49b 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -1650,8 +1650,7 @@ static GSList *gd_vc_menu_init(GtkDisplayState *s, 
VirtualConsole *vc,
  

Re: [PATCH v2 0/6] nbd: reduce max_block restrictions

2020-04-01 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20200401150112.9557-1-vsement...@virtuozzo.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

--- /tmp/qemu-test/src/tests/qemu-iotests/251.out   2020-04-01 
15:01:19.0 +
+++ /tmp/qemu-test/build/tests/qemu-iotests/251.out.bad 2020-04-01 
16:49:36.542097534 +
@@ -18,26 +18,16 @@
 qemu-img: warning: error while reading offset read_fail_offset_8: Input/output 
error
 qemu-img: warning: error while reading offset read_fail_offset_9: Input/output 
error
 
-wrote 512/512 bytes at offset read_fail_offset_0
-512 bytes, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
---
Not run: 259
Failures: 033 034 154 177 251
Failed 5 of 116 iotests
make: *** [check-tests/check-block.sh] Error 1
make: *** Waiting for unfinished jobs
  TESTcheck-qtest-aarch64: tests/qtest/test-hmp
  TESTcheck-qtest-aarch64: tests/qtest/qos-test
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=067a8c1d1d00403ba8447315095d6388', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-7rgy2yf_/src/docker-src.2020-04-01-12.37.18.28490:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=067a8c1d1d00403ba8447315095d6388
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-7rgy2yf_/src'
make: *** [docker-run-test-quick@centos7] Error 2

real14m49.328s
user0m7.899s


The full log is available at
http://patchew.org/logs/20200401150112.9557-1-vsement...@virtuozzo.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

[PATCH v2 0/2] ppc/xive: Add support for PQ state bits offload

2020-04-01 Thread Cédric Le Goater
Hello,

When the XIVE router unit receives a trigger message coming from a HW
source, the message contains a special bit informing the XIVE interrupt
controller whether the PQ bits have already been checked at the
source. Depending on the value, the IC can perform the check and the
state transition locally using its own PQ state bits.

The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the
PowerNV machine; sPAPR is not concerned by such a complex configuration.
We will use it for upcoming features offloading event coalescing onto
the interrupt controller.
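
To give an idea of the intended use on the router side, the check could
be done along these lines (hand-written illustration, not a hunk from
these patches; the control flow and the XIVE_ROUTER_GET_CLASS usage are
assumptions, only the get_pq/set_pq accessors and the exported
xive_esb_trigger() come from the series):

    static void pq_check_sketch(XiveRouter *xrtr, uint8_t eas_blk,
                                uint32_t eas_idx, bool pq_checked)
    {
        XiveRouterClass *xrc = XIVE_ROUTER_GET_CLASS(xrtr);
        uint8_t pq;
        bool notify;

        if (!pq_checked) {
            /* PQ was not checked at the source: do the transition here */
            if (xrc->get_pq(xrtr, eas_blk, eas_idx, &pq)) {
                return;
            }
            notify = xive_esb_trigger(&pq);
            xrc->set_pq(xrtr, eas_blk, eas_idx, &pq);
            if (!notify) {
                return;         /* coalesced, nothing to forward */
            }
        }
        /* ... continue with the usual EAS lookup and notification ... */
    }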

Thanks,

C.

Cédric Le Goater (2):
  ppc/xive: export PQ routines
  ppc/xive: Add support for PQ state bits offload

 include/hw/ppc/xive.h| 12 +++--
 hw/intc/pnv_xive.c   | 37 ---
 hw/intc/spapr_xive_kvm.c |  8 +++---
 hw/intc/xive.c   | 54 
 hw/pci-host/pnv_phb4.c   |  9 +--
 hw/ppc/pnv_psi.c |  8 --
 6 files changed, 105 insertions(+), 23 deletions(-)

-- 
2.21.1




[PATCH v2 2/2] ppc/xive: Add support for PQ state bits offload

2020-04-01 Thread Cédric Le Goater
The trigger message coming from a HW source contains a special bit
informing the XIVE interrupt controller whether the PQ bits have already
been checked at the source. Depending on the value, the IC can
perform the check and the state transition locally using its own PQ
state bits.

The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the
PowerNV machine; sPAPR is not concerned by such a complex configuration.

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/xive.h  |  8 +--
 hw/intc/pnv_xive.c | 37 +---
 hw/intc/xive.c | 48 --
 hw/pci-host/pnv_phb4.c |  9 ++--
 hw/ppc/pnv_psi.c   |  8 +--
 5 files changed, 94 insertions(+), 16 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 112fb6fb6dbe..050e49c14fd9 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -160,7 +160,7 @@ typedef struct XiveNotifier XiveNotifier;
 
 typedef struct XiveNotifierClass {
 InterfaceClass parent;
-void (*notify)(XiveNotifier *xn, uint32_t lisn);
+void (*notify)(XiveNotifier *xn, uint32_t lisn, bool pq_checked);
 } XiveNotifierClass;
 
 /*
@@ -354,6 +354,10 @@ typedef struct XiveRouterClass {
 /* XIVE table accessors */
 int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
XiveEAS *eas);
+int (*get_pq)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+  uint8_t *pq);
+int (*set_pq)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+  uint8_t *pq);
 int (*get_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
XiveEND *end);
 int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
@@ -375,7 +379,7 @@ int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, 
uint32_t nvt_idx,
 XiveNVT *nvt);
 int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
   XiveNVT *nvt, uint8_t word_number);
-void xive_router_notify(XiveNotifier *xn, uint32_t lisn);
+void xive_router_notify(XiveNotifier *xn, uint32_t lisn, bool pq_checked);
 
 /*
  * XIVE Presenter
diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index 77cacdd6c623..bcfe9dc54e3b 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -373,6 +373,34 @@ static int pnv_xive_get_eas(XiveRouter *xrtr, uint8_t blk, 
uint32_t idx,
 return pnv_xive_vst_read(xive, VST_TSEL_IVT, blk, idx, eas);
 }
 
+static int pnv_xive_get_pq(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
+   uint8_t *pq)
+{
+PnvXive *xive = PNV_XIVE(xrtr);
+
+if (pnv_xive_block_id(xive) != blk) {
+xive_error(xive, "VST: EAS %x is remote !?", XIVE_EAS(blk, idx));
+return -1;
+}
+
+*pq = xive_source_esb_get(&xive->ipi_source, idx);
+return 0;
+}
+
+static int pnv_xive_set_pq(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
+   uint8_t *pq)
+{
+PnvXive *xive = PNV_XIVE(xrtr);
+
+if (pnv_xive_block_id(xive) != blk) {
+xive_error(xive, "VST: EAS %x is remote !?", XIVE_EAS(blk, idx));
+return -1;
+}
+
+*pq = xive_source_esb_set(&xive->ipi_source, idx, *pq);
+return 0;
+}
+
 /*
  * One bit per thread id. The first register PC_THREAD_EN_REG0 covers
  * the first cores 0-15 (normal) of the chip or 0-7 (fused). The
@@ -469,12 +497,12 @@ static PnvXive *pnv_xive_tm_get_xive(PowerPCCPU *cpu)
  * event notification to the Router. This is required on a multichip
  * system.
  */
-static void pnv_xive_notify(XiveNotifier *xn, uint32_t srcno)
+static void pnv_xive_notify(XiveNotifier *xn, uint32_t srcno, bool pq_checked)
 {
 PnvXive *xive = PNV_XIVE(xn);
 uint8_t blk = pnv_xive_block_id(xive);
 
-xive_router_notify(xn, XIVE_EAS(blk, srcno));
+xive_router_notify(xn, XIVE_EAS(blk, srcno), pq_checked);
 }
 
 /*
@@ -1316,7 +1344,8 @@ static void pnv_xive_ic_hw_trigger(PnvXive *xive, hwaddr 
addr, uint64_t val)
 blk = XIVE_EAS_BLOCK(val);
 idx = XIVE_EAS_INDEX(val);
 
-xive_router_notify(XIVE_NOTIFIER(xive), XIVE_EAS(blk, idx));
+xive_router_notify(XIVE_NOTIFIER(xive), XIVE_EAS(blk, idx),
+   !!(val & XIVE_TRIGGER_PQ));
 }
 
 static void pnv_xive_ic_notify_write(void *opaque, hwaddr addr, uint64_t val,
@@ -1944,6 +1973,8 @@ static void pnv_xive_class_init(ObjectClass *klass, void 
*data)
 device_class_set_props(dc, pnv_xive_properties);
 
 xrc->get_eas = pnv_xive_get_eas;
+xrc->get_pq = pnv_xive_get_pq;
+xrc->set_pq = pnv_xive_set_pq;
 xrc->get_end = pnv_xive_get_end;
 xrc->write_end = pnv_xive_write_end;
 xrc->get_nvt = pnv_xive_get_nvt;
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index b8825577f719..894be4b49ba4 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -927,7 +927,7 @@ static void xive_source_notify(XiveSource 

[PATCH v2 1/2] ppc/xive: export PQ routines

2020-04-01 Thread Cédric Le Goater
Signed-off-by: Cédric Le Goater 
Reviewed-by: Greg Kurz 
---
 include/hw/ppc/xive.h| 4 
 hw/intc/spapr_xive_kvm.c | 8 
 hw/intc/xive.c   | 6 +++---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 705cf48176fc..112fb6fb6dbe 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -255,6 +255,10 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource 
*xsrc, int srcno)
 #define XIVE_ESB_QUEUED   (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
 #define XIVE_ESB_OFF  XIVE_ESB_VAL_Q
 
+bool xive_esb_trigger(uint8_t *pq);
+bool xive_esb_eoi(uint8_t *pq);
+uint8_t xive_esb_set(uint8_t *pq, uint8_t value);
+
 /*
  * "magic" Event State Buffer (ESB) MMIO offsets.
  *
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index edb7ee0e74f1..43f4d56b958c 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -308,7 +308,7 @@ static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, 
uint32_t offset)
 return xive_esb_rw(xsrc, srcno, offset, 0, 0) & 0x3;
 }
 
-static void xive_esb_trigger(XiveSource *xsrc, int srcno)
+static void kvmppc_xive_esb_trigger(XiveSource *xsrc, int srcno)
 {
 uint64_t *addr = xsrc->esb_mmap + xive_source_esb_page(xsrc, srcno);
 
@@ -331,7 +331,7 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, 
uint32_t offset,
 offset == XIVE_ESB_LOAD_EOI) {
 xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
 if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
-xive_esb_trigger(xsrc, srcno);
+kvmppc_xive_esb_trigger(xsrc, srcno);
 }
 return 0;
 } else {
@@ -375,7 +375,7 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, 
int val)
 }
 }
 
-xive_esb_trigger(xsrc, srcno);
+kvmppc_xive_esb_trigger(xsrc, srcno);
 }
 
 /*
@@ -544,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, 
int running,
  * generate a trigger.
  */
 if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
-xive_esb_trigger(xsrc, i);
+kvmppc_xive_esb_trigger(xsrc, i);
 }
 }
 
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index d6183f8ae40a..b8825577f719 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -789,7 +789,7 @@ void xive_tctx_destroy(XiveTCTX *tctx)
  * XIVE ESB helpers
  */
 
-static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
+uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
 {
 uint8_t old_pq = *pq & 0x3;
 
@@ -799,7 +799,7 @@ static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
 return old_pq;
 }
 
-static bool xive_esb_trigger(uint8_t *pq)
+bool xive_esb_trigger(uint8_t *pq)
 {
 uint8_t old_pq = *pq & 0x3;
 
@@ -819,7 +819,7 @@ static bool xive_esb_trigger(uint8_t *pq)
 }
 }
 
-static bool xive_esb_eoi(uint8_t *pq)
+bool xive_esb_eoi(uint8_t *pq)
 {
 uint8_t old_pq = *pq & 0x3;
 
-- 
2.21.1




Re: QEMU participation to Google Season of Docs

2020-04-01 Thread Paolo Bonzini
On 01/04/20 18:37, Philippe Mathieu-Daudé wrote:
> 
> * Refactor the open source project's existing documentation to
>   provide an improved user experience or a more accessible
>   information architecture.

This kind of project would indeed be very suitable for QEMU.  Stefan,
perhaps you could help by providing the text for our Summer of Code
submission?

Thanks,

Paolo




Re: [PATCH 2/2] ppc/xive: Add support for PQ state bits offload

2020-04-01 Thread Cédric Le Goater
>  static void pnv_xive_ic_notify_write(void *opaque, hwaddr addr, uint64_t val,
> @@ -1939,6 +1968,8 @@ static void pnv_xive_class_init(ObjectClass *klass, 
> void *data)
>  device_class_set_props(dc, pnv_xive_properties);
>  
>  xrc->get_eas = pnv_xive_get_eas;
> +xrc->get_pq = pnv_xive_get_pq;
> +xrc->set_pq = pnv_xive_set_pq;
>  xrc->get_end = pnv_xive_get_end;
>  xrc->write_end = pnv_xive_write_end;
>  xrc->get_nvt = pnv_xive_get_nvt;
> @@ -1967,7 +1998,8 @@ static const TypeInfo pnv_xive_info = {
>   *
>   * Trigger all threads 0
>   */
> -static void pnv_xive_lsi_notify(XiveNotifier *xn, uint32_t srcno)
> +static void pnv_xive_lsi_notify(XiveNotifier *xn, uint32_t srcno,
> +bool pq_checked)


oops. I haven't sent this yet. So this patch won't apply.

Sending a v2.

C. 



QEMU participation to Google Season of Docs

2020-04-01 Thread Philippe Mathieu-Daudé

Hi,

Google recently announced their 'Season of Docs' project:
https://developers.google.com/season-of-docs

The QEMU project seems to fit all the requirements.

Who is interested in [co-]mentoring?

Relevant links:
https://developers.google.com/season-of-docs/docs/admin-guide
https://developers.google.com/season-of-docs/docs/timeline

[Following is extracted from the previous links:]

Example projects:

* Build a documentation site on a platform to be decided
  by the technical writer and open source mentor, and publish
  an initial set of basic documents on the site. Examples of
  platforms include:

  - A static site generator such as Hugo, Jekyll, Sphinx, ...

* Refactor the open source project's existing documentation to
  provide an improved user experience or a more accessible
  information architecture.

* Write a conceptual overview of, or introduction to, a product
  or feature. Often a team creates their technical documentation
  from the bottom up, with the result that there's a lot of
  detail but it's hard to understand the product as a whole. A
  technical writer can fix this.

* Create a tutorial for a high-profile use case.

* Create a set of focused how-to guides for specific tasks.

* Create a contributor’s guide that includes basic information
  about getting started as a contributor to the open source
  project, as well as any rules around licence agreements,
  processes for pull requests and reviews, building the project,
  and so on.

Previous experience with similar programs, such as Google Summer
of Code or others: If you or any of your mentors have taken part
in Google Summer of Code or a similar program, mention this in
your application. Describe your achievements in that program.
Explain how this experience may influence the way you work in
Season of Docs.

The 2020 season of Season of Docs is limited to a maximum of
50 technical writing projects in total.
As a guideline, we expect to accept a maximum of 2 projects
per organization, so that we don't end up with too many
accepted projects. However, if the free selection process
doesn't fill all the slots, the Google program administrators
may allocate additional slots to some organizations.




Re: Question about scsi device hotplug (e.g scsi-hd)

2020-04-01 Thread Paolo Bonzini
On 01/04/20 17:09, Stefan Hajnoczi wrote:
>> What do you think about it?
>
> Maybe aio_disable_external() is needed to postpone device emulation
> until after realize has finished?
> 
> Virtqueue kick ioeventfds are marked "external" and won't be processed
> while external events are disabled.  See also
> virtio_queue_aio_set_host_notifier_handler() ->
> aio_set_event_notifier().

Yes, I think Stefan is right.
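
Something along these lines, I suppose (only a sketch to illustrate the
idea, not code from the tree; how the right AioContext is obtained in
the hotplug path is hand-waved here):

    /* Postpone "external" handlers, e.g. virtqueue kick ioeventfds,
     * while the device is being realized, so the guest cannot drive
     * the device before realize has finished.
     */
    static void realize_with_external_disabled(DeviceState *dev,
                                               AioContext *ctx, Error **errp)
    {
        aio_disable_external(ctx);
        object_property_set_bool(OBJECT(dev), true, "realized", errp);
        aio_enable_external(ctx);
    }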

Paolo



signature.asc
Description: OpenPGP digital signature


Re: [PATCH] Compress lines for immediate return

2020-04-01 Thread Simran Singhal
On Wed, Apr 1, 2020 at 9:15 PM Eric Blake  wrote:

> On 4/1/20 9:49 AM, Simran Singhal wrote:
> > Hello Philippe
> >
> > On Wed, Apr 1, 2020 at 7:26 PM Philippe Mathieu-Daudé wrote:
> >
> >> Hi Simran,
> >>
> >> On 4/1/20 2:11 PM, Simran Singhal wrote:
> >>> Compress two lines into a single line if immediate return statement is
> >> found.
> >>
> >> How did you find these changes? Manual audit, some tool?
> >>
> >
> > I wrote coccinelle script to do these changes.
> >
>
> Then is it worth checking in your script to scripts/coccinelle/ to let
> it be something we can repeatedly rerun in the future to catch more
> instances?  Even if you don't go that far, mentioning the exact rune you
> used makes it easier to reproduce the patch, or even backport its
> effects to a different branch.
>

Ok, I'll resend the patch with the commit message changed to include the
script used to make the change.

Thanks
Simran


>
> --
> Eric Blake, Principal Software Engineer
> Red Hat, Inc.   +1-919-301-3226
> Virtualization:  qemu.org | libvirt.org
>
>


Re: [PATCH v3 1/1] vl/s390x: fixup ram sizes for compat machines

2020-04-01 Thread Igor Mammedov
On Wed,  1 Apr 2020 08:37:54 -0400
Christian Borntraeger  wrote:

> Older QEMU versions did fixup the ram size to match what can be reported
> via sclp. We need to mimic this behaviour for machine types 4.2 and
> older to not fail on inbound migration for memory sizes that do not fit.
> Old machines with proper aligned memory sizes are not affected.
> 
> Alignment table:
>  VM size (<=) | Alignment
> --
>   1020M   | 1M
>   2040M   | 2M
>   4080M   | 4M
>   8160M   | 8M
>  16320M   |16M
>  32640M   |32M
>  65280M   |64M
> 130560M   |   128M
> 261120M   |   256M
> 522240M   |   512M
>1044480M   | 1G
>2088960M   | 2G
>4177920M   | 4G
>8355840M   | 8G
> 
> Suggested action is to replace unaligned -m value with a suitable
> aligned one or if a change to a newer machine type is possible, use a
> machine version >= 5.0.
> 
> A future version might remove the compatibility handling.
> 
> For machine types >= 5.0 we can simply use an increment size of 1M and
> use the full range of increment number which allows for all possible
> memory sizes. The old limitation of having a maximum of 1020 increments
> was added for standby memory, which we no longer support. With that we
> can now support even weird memory sizes like 10001234 MB.
> 
> As we no longer fixup maxram_size as well, make other users use ram_size
> instead. Keep using maxram_size when setting the maximum ram size in KVM,
> as that will come in handy in the future when supporting memory hotplug
> (in contrast, storage keys and storage attributes for hotplugged memory
> will have to be migrated per RAM block in the future).
> 
> Fixes: 3a12fc61af5c ("390x/s390-virtio-ccw: use memdev for RAM")
> Reported-by: Lukáš Doktor 
> Cc: Igor Mammedov 
> Cc: Dr. David Alan Gilbert 
> Signed-off-by: David Hildenbrand 
> Signed-off-by: Christian Borntraeger 

Acked-by: Igor Mammedov 

minor nit below if you have to respin

> ---
>  hw/s390x/s390-skeys.c|  2 +-
>  hw/s390x/s390-stattrib-kvm.c |  4 ++--
>  hw/s390x/s390-virtio-ccw.c   | 21 +
>  hw/s390x/sclp.c  | 17 +
>  include/hw/boards.h  |  7 +++
>  softmmu/vl.c |  3 +++
>  6 files changed, 39 insertions(+), 15 deletions(-)
> 
> diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
> index 5da6e5292f..a9a4ae7b39 100644
> --- a/hw/s390x/s390-skeys.c
> +++ b/hw/s390x/s390-skeys.c
> @@ -176,7 +176,7 @@ static void qemu_s390_skeys_init(Object *obj)
>  QEMUS390SKeysState *skeys = QEMU_S390_SKEYS(obj);
>  MachineState *machine = MACHINE(qdev_get_machine());
>  
> -skeys->key_count = machine->maxram_size / TARGET_PAGE_SIZE;
> +skeys->key_count = machine->ram_size / TARGET_PAGE_SIZE;
>  skeys->keydata = g_malloc0(skeys->key_count);
>  }
>  
> diff --git a/hw/s390x/s390-stattrib-kvm.c b/hw/s390x/s390-stattrib-kvm.c
> index c7e1f35524..f89d8d9d16 100644
> --- a/hw/s390x/s390-stattrib-kvm.c
> +++ b/hw/s390x/s390-stattrib-kvm.c
> @@ -85,7 +85,7 @@ static int kvm_s390_stattrib_set_stattr(S390StAttribState 
> *sa,
>  {
>  KVMS390StAttribState *sas = KVM_S390_STATTRIB(sa);
>  MachineState *machine = MACHINE(qdev_get_machine());
> -unsigned long max = machine->maxram_size / TARGET_PAGE_SIZE;
> +unsigned long max = machine->ram_size / TARGET_PAGE_SIZE;
>  
>  if (start_gfn + count > max) {
>  error_report("Out of memory bounds when setting storage attributes");
> @@ -104,7 +104,7 @@ static void 
> kvm_s390_stattrib_synchronize(S390StAttribState *sa)
>  {
>  KVMS390StAttribState *sas = KVM_S390_STATTRIB(sa);
>  MachineState *machine = MACHINE(qdev_get_machine());
> -unsigned long max = machine->maxram_size / TARGET_PAGE_SIZE;
> +unsigned long max = machine->ram_size / TARGET_PAGE_SIZE;
>  /* We do not need to reach the maximum buffer size allowed */
>  unsigned long cx, len = KVM_S390_SKEYS_MAX / 2;
>  int r;
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index 3cf19c99f3..61a8a0e693 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -27,6 +27,7 @@
>  #include "qemu/ctype.h"
>  #include "qemu/error-report.h"
>  #include "qemu/option.h"
> +#include "qemu/qemu-print.h"
>  #include "s390-pci-bus.h"
>  #include "sysemu/reset.h"
>  #include "hw/s390x/storage-keys.h"
> @@ -579,6 +580,25 @@ static void s390_nmi(NMIState *n, int cpu_index, Error 
> **errp)
>  s390_cpu_restart(S390_CPU(cs));
>  }
>  
> +static ram_addr_t s390_fixup_ram_size(ram_addr_t sz)
> +{
> +/* same logic as in sclp.c */
> +int increment_size = 20;
> +ram_addr_t newsz;
> +
> +while ((sz >> increment_size) > MAX_STORAGE_INCREMENTS) {
> +increment_size++;
> +}
> +newsz = sz >> increment_size << increment_size;
> +
> +if (sz != newsz) {
> +qemu_printf("Ram size %" PRIu64 "MB was fixed 

[PATCH v2 4/4] target/ppc: Add support for Radix partition-scoped translation

2020-04-01 Thread Cédric Le Goater
The Radix tree translation model currently supports process-scoped
translation for the PowerNV machine (Hypervisor mode) and for the
pSeries machine (Guest mode). Guests running under an emulated
Hypervisor (PowerNV machine) require a new type of Radix translation,
called partition-scoped, which is missing today.

The Radix tree translation is a 2-step process. The first step,
process-scoped translation, converts an effective Address to a guest
real address, and the second step, partition-scoped translation,
converts a guest real address to a host real address.
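
As a rough illustration of how the two steps compose (a simplified,
hand-written sketch: the two helpers below stand in for the real
ppc_radix64_process_scoped_xlate() and ppc_radix64_partition_scoped_xlate()
routines, with most arguments dropped):

    static int process_scoped_xlate(PowerPCCPU *cpu, vaddr eaddr,
                                    hwaddr *g_raddr);
    static int partition_scoped_xlate(PowerPCCPU *cpu, hwaddr g_raddr,
                                      hwaddr *raddr);

    static int radix_2step_xlate_sketch(PowerPCCPU *cpu, vaddr eaddr,
                                        bool relocation, hwaddr *raddr)
    {
        hwaddr g_raddr = eaddr;    /* real mode: the EA is the guest RA */

        if (relocation) {
            /* step 1: effective address -> guest real address */
            if (process_scoped_xlate(cpu, eaddr, &g_raddr)) {
                return 1;
            }
        }
        /* step 2: guest real address -> host real address */
        return partition_scoped_xlate(cpu, g_raddr, raddr);
    }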

There are different cases to cover:

* Hypervisor real mode access: no Radix translation.

* Hypervisor or host application access (quadrant 0 and 3) with
  relocation on: process-scoped translation.

* Guest OS real mode access: only partition-scoped translation.

* Guest OS real or guest application access (quadrant 0 and 3) with
  relocation on: both process-scoped translation and partition-scoped
  translations.

* Hypervisor access in quadrant 1 and 2 with relocation on: both
  process-scoped translation and partition-scoped translations.

The radix tree partition-scoped translation is performed using tables
pointed to by the first double-word of the Partition Table Entries and
process-scoped translation uses tables pointed to by the Process Table
Entries (second double-word of the Partition Table Entries).

Both the partition-scoped and process-scoped translation processes are
identical, and thus the radix tree traversal code is largely reused.
However, errors in partition-scoped translations generate hypervisor
exceptions.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/cpu.h |   3 +
 target/ppc/excp_helper.c |   3 +-
 target/ppc/mmu-radix64.c | 172 ---
 3 files changed, 164 insertions(+), 14 deletions(-)

diff --git a/target/ppc/cpu.h b/target/ppc/cpu.h
index f4a5304d4356..6b6dd7e483f1 100644
--- a/target/ppc/cpu.h
+++ b/target/ppc/cpu.h
@@ -463,6 +463,9 @@ typedef struct ppc_v3_pate_t {
 #define DSISR_AMR0x0020
 /* Unsupported Radix Tree Configuration */
 #define DSISR_R_BADCONFIG0x0008
+#define DSISR_ATOMIC_RC  0x0004
+/* Unable to translate address of (guest) pde or process/page table entry */
+#define DSISR_PRTABLE_FAULT  0x0002
 
 /* SRR1 error code fields */
 
diff --git a/target/ppc/excp_helper.c b/target/ppc/excp_helper.c
index 1acc3786de0e..f05297966472 100644
--- a/target/ppc/excp_helper.c
+++ b/target/ppc/excp_helper.c
@@ -506,9 +506,10 @@ static inline void powerpc_excp(PowerPCCPU *cpu, int 
excp_model, int excp)
 case POWERPC_EXCP_ISEG:  /* Instruction segment exception*/
 case POWERPC_EXCP_TRACE: /* Trace exception  */
 break;
+case POWERPC_EXCP_HISI:  /* Hypervisor instruction storage exception */
+msr |= env->error_code;
 case POWERPC_EXCP_HDECR: /* Hypervisor decrementer exception */
 case POWERPC_EXCP_HDSI:  /* Hypervisor data storage exception*/
-case POWERPC_EXCP_HISI:  /* Hypervisor instruction storage exception */
 case POWERPC_EXCP_HDSEG: /* Hypervisor data segment exception*/
 case POWERPC_EXCP_HISEG: /* Hypervisor instruction segment exception */
 case POWERPC_EXCP_SDOOR_HV:  /* Hypervisor Doorbell interrupt*/
diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index dd07c6598d9f..03f0bb04ed72 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -112,6 +112,32 @@ static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, 
vaddr eaddr,
 }
 }
 
+static void ppc_radix64_raise_hsi(PowerPCCPU *cpu, int rwx, vaddr eaddr,
+  hwaddr g_raddr, uint32_t cause,
+  bool cause_excp)
+{
+CPUState *cs = CPU(cpu);
+CPUPPCState *env = &cpu->env;
+
+if (!cause_excp) {
+return;
+}
+
+if (rwx == 2) { /* H Instruction Storage Interrupt */
+cs->exception_index = POWERPC_EXCP_HISI;
+env->spr[SPR_ASDR] = g_raddr;
+env->error_code = cause;
+} else { /* H Data Storage Interrupt */
+cs->exception_index = POWERPC_EXCP_HDSI;
+if (rwx == 1) { /* Write -> Store */
+cause |= DSISR_ISSTORE;
+}
+env->spr[SPR_HDSISR] = cause;
+env->spr[SPR_HDAR] = eaddr;
+env->spr[SPR_ASDR] = g_raddr;
+env->error_code = 0;
+}
+}
 
 static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
int *fault_cause, int *prot,
@@ -249,6 +275,37 @@ static bool validate_pate(PowerPCCPU *cpu, uint64_t lpid, 
ppc_v3_pate_t *pate)
 return true;
 }
 
+static int ppc_radix64_partition_scoped_xlate(PowerPCCPU *cpu, int rwx,
+  vaddr eaddr, hwaddr g_raddr,
+   

[PATCH v2 3/4] target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped translation

2020-04-01 Thread Cédric Le Goater
The ppc_radix64_walk_tree() routine walks through the nested radix
tables to look for a PTE.

Split it in two and introduce a new routine ppc_radix64_next_level()
which we will use for partition-scoped Radix translation when
translating the process tree addresses. Also use an 'AddressSpace *as'
parameter instead of a 'PowerPCCPU *cpu'.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/mmu-radix64.c | 58 +++-
 1 file changed, 39 insertions(+), 19 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 29fee6529332..dd07c6598d9f 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -172,44 +172,64 @@ static void ppc_radix64_set_rc(PowerPCCPU *cpu, int rwx, 
uint64_t pte,
 }
 }
 
-static uint64_t ppc_radix64_walk_tree(PowerPCCPU *cpu, vaddr eaddr,
-  uint64_t base_addr, uint64_t nls,
-  hwaddr *raddr, int *psize,
-  int *fault_cause, hwaddr *pte_addr)
+static uint64_t ppc_radix64_next_level(AddressSpace *as, vaddr eaddr,
+   uint64_t *pte_addr, uint64_t *nls,
+   int *psize, int *fault_cause)
 {
-CPUState *cs = CPU(cpu);
 uint64_t index, pde;
 
-if (nls < 5) { /* Directory maps less than 2**5 entries */
+if (*nls < 5) { /* Directory maps less than 2**5 entries */
 *fault_cause |= DSISR_R_BADCONFIG;
 return 0;
 }
 
 /* Read page  entry from guest address space */
-index = eaddr >> (*psize - nls); /* Shift */
-index &= ((1UL << nls) - 1); /* Mask */
-pde = ldq_phys(cs->as, base_addr + (index * sizeof(pde)));
-if (!(pde & R_PTE_VALID)) { /* Invalid Entry */
+pde = ldq_phys(as, *pte_addr);
+if (!(pde & R_PTE_VALID)) { /* Invalid Entry */
 *fault_cause |= DSISR_NOPTE;
 return 0;
 }
 
-*psize -= nls;
+*psize -= *nls;
+if (!(pde & R_PTE_LEAF)) { /* Prepare for next iteration */
+*nls = pde & R_PDE_NLS;
+index = eaddr >> (*psize - *nls);   /* Shift */
+index &= ((1UL << *nls) - 1);   /* Mask */
+*pte_addr = (pde & R_PDE_NLB) + (index * sizeof(pde));
+}
+return pde;
+}
+
+static uint64_t ppc_radix64_walk_tree(AddressSpace *as, vaddr eaddr,
+  uint64_t base_addr, uint64_t nls,
+  hwaddr *raddr, int *psize,
+  int *fault_cause, hwaddr *pte_addr)
+{
+uint64_t index, pde;
+
+if (nls < 5) { /* Directory maps less than 2**5 entries */
+*fault_cause |= DSISR_R_BADCONFIG;
+return 0;
+}
+
+index = eaddr >> (*psize - nls);/* Shift */
+index &= ((1UL << nls) - 1);   /* Mask */
+*pte_addr = base_addr + (index * sizeof(pde));
+do {
+pde = ppc_radix64_next_level(as, eaddr, pte_addr, &nls, psize,
+ fault_cause);
+} while ((pde & R_PTE_VALID) && !(pde & R_PTE_LEAF));
 
-/* Check if Leaf Entry -> Page Table Entry -> Stop the Search */
-if (pde & R_PTE_LEAF) {
+/* Did we find a valid leaf? */
+if ((pde & R_PTE_VALID) && (pde & R_PTE_LEAF)) {
 uint64_t rpn = pde & R_PTE_RPN;
 uint64_t mask = (1UL << *psize) - 1;
 
 /* Or high bits of rpn and low bits to ea to form whole real addr */
 *raddr = (rpn & ~mask) | (eaddr & mask);
-*pte_addr = base_addr + (index * sizeof(pde));
-return pde;
 }
 
-/* Next Level of Radix Tree */
-return ppc_radix64_walk_tree(cpu, eaddr, pde & R_PDE_NLB, pde & R_PDE_NLS,
- raddr, psize, fault_cause, pte_addr);
+return pde;
 }
 
 static bool validate_pate(PowerPCCPU *cpu, uint64_t lpid, ppc_v3_pate_t *pate)
@@ -253,7 +273,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU 
*cpu, int rwx,
 
 /* Walk Radix Tree from Process Table Entry to Convert EA to RA */
 *g_page_size = PRTBE_R_GET_RTS(prtbe0);
-pte = ppc_radix64_walk_tree(cpu, eaddr & R_EADDR_MASK,
+pte = ppc_radix64_walk_tree(cs->as, eaddr & R_EADDR_MASK,
 prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
g_raddr, g_page_size, &fault_cause, &pte_addr);
 
-- 
2.21.1




[PATCH v2 2/4] target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped' bool

2020-04-01 Thread Cédric Le Goater
This prepares ground for partition-scoped Radix translation.

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
Reviewed-by: Greg Kurz 
---
 target/ppc/mmu-radix64.c | 11 ++-
 1 file changed, 6 insertions(+), 5 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index 410376fbeb65..29fee6529332 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -114,7 +114,8 @@ static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, 
vaddr eaddr,
 
 
 static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int rwx, uint64_t pte,
-   int *fault_cause, int *prot)
+   int *fault_cause, int *prot,
+   bool partition_scoped)
 {
 CPUPPCState *env = >env;
 const int need_prot[] = { PAGE_READ, PAGE_WRITE, PAGE_EXEC };
@@ -130,11 +131,11 @@ static bool ppc_radix64_check_prot(PowerPCCPU *cpu, int 
rwx, uint64_t pte,
 }
 
 /* Determine permissions allowed by Encoded Access Authority */
-if ((pte & R_PTE_EAA_PRIV) && msr_pr) { /* Insufficient Privilege */
+if (!partition_scoped && (pte & R_PTE_EAA_PRIV) && msr_pr) {
 *prot = 0;
-} else if (msr_pr || (pte & R_PTE_EAA_PRIV)) {
+} else if (msr_pr || (pte & R_PTE_EAA_PRIV) || partition_scoped) {
 *prot = ppc_radix64_get_prot_eaa(pte);
-} else { /* !msr_pr && !(pte & R_PTE_EAA_PRIV) */
+} else { /* !msr_pr && !(pte & R_PTE_EAA_PRIV) && !partition_scoped */
 *prot = ppc_radix64_get_prot_eaa(pte);
 *prot &= ppc_radix64_get_prot_amr(cpu); /* Least combined permissions 
*/
 }
@@ -257,7 +258,7 @@ static int ppc_radix64_process_scoped_xlate(PowerPCCPU 
*cpu, int rwx,
g_raddr, g_page_size, &fault_cause, &pte_addr);
 
 if (!(pte & R_PTE_VALID) ||
-ppc_radix64_check_prot(cpu, rwx, pte, &fault_cause, g_prot)) {
+ppc_radix64_check_prot(cpu, rwx, pte, &fault_cause, g_prot, false)) {
 /* No valid pte or access denied due to protection */
 ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause, cause_excp);
 return 1;
-- 
2.21.1




[PATCH v2 1/4] target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation

2020-04-01 Thread Cédric Le Goater
This moves code under a new ppc_radix64_xlate() routine shared by
the MMU Radix page fault handler and the 'get_phys_page_debug' PPC
callback. The difference is that 'get_phys_page_debug' does not
generate exceptions.

The specific part of process-scoped Radix translation is moved under
ppc_radix64_process_scoped_xlate() in preparation for the future support
for partition-scoped Radix translation. Routines raising the exceptions
now take a 'cause_excp' bool to cover the 'get_phys_page_debug' case.

It should be functionally equivalent.
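
In other words, with the obvious local variables, the two callers end up
looking roughly like this (a sketch of the call sites, not the actual
hunks below; passing rwx = 0 on the debug path is an assumption):

    /* MMU fault handler: failures raise the appropriate interrupt */
    ret = ppc_radix64_xlate(cpu, eaddr, rwx, relocation,
                            &raddr, &psize, &prot, true);

    /* ppc_radix64_get_phys_page_debug(): only probe, never raise */
    ret = ppc_radix64_xlate(cpu, eaddr, 0, relocation,
                            &raddr, &psize, &prot, false);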

Signed-off-by: Suraj Jitindar Singh 
Signed-off-by: Cédric Le Goater 
---
 target/ppc/mmu-radix64.c | 223 ++-
 1 file changed, 125 insertions(+), 98 deletions(-)

diff --git a/target/ppc/mmu-radix64.c b/target/ppc/mmu-radix64.c
index d2422d1c54c9..410376fbeb65 100644
--- a/target/ppc/mmu-radix64.c
+++ b/target/ppc/mmu-radix64.c
@@ -69,11 +69,16 @@ static bool 
ppc_radix64_get_fully_qualified_addr(CPUPPCState *env, vaddr eaddr,
 return true;
 }
 
-static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr)
+static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int rwx, vaddr eaddr,
+   bool cause_excp)
 {
 CPUState *cs = CPU(cpu);
CPUPPCState *env = &cpu->env;
 
+if (!cause_excp) {
+return;
+}
+
 if (rwx == 2) { /* Instruction Segment Interrupt */
 cs->exception_index = POWERPC_EXCP_ISEG;
 } else { /* Data Segment Interrupt */
@@ -84,11 +89,15 @@ static void ppc_radix64_raise_segi(PowerPCCPU *cpu, int 
rwx, vaddr eaddr)
 }
 
 static void ppc_radix64_raise_si(PowerPCCPU *cpu, int rwx, vaddr eaddr,
-uint32_t cause)
+ uint32_t cause, bool cause_excp)
 {
 CPUState *cs = CPU(cpu);
CPUPPCState *env = &cpu->env;
 
+if (!cause_excp) {
+return;
+}
+
 if (rwx == 2) { /* Instruction Storage Interrupt */
 cs->exception_index = POWERPC_EXCP_ISI;
 env->error_code = cause;
@@ -219,17 +228,118 @@ static bool validate_pate(PowerPCCPU *cpu, uint64_t 
lpid, ppc_v3_pate_t *pate)
 return true;
 }
 
+static int ppc_radix64_process_scoped_xlate(PowerPCCPU *cpu, int rwx,
+vaddr eaddr, uint64_t pid,
+ppc_v3_pate_t pate, hwaddr 
*g_raddr,
+int *g_prot, int *g_page_size,
+bool cause_excp)
+{
+CPUState *cs = CPU(cpu);
+uint64_t offset, size, prtbe_addr, prtbe0, pte;
+int fault_cause = 0;
+hwaddr pte_addr;
+
+/* Index Process Table by PID to Find Corresponding Process Table Entry */
+offset = pid * sizeof(struct prtb_entry);
+size = 1ULL << ((pate.dw1 & PATE1_R_PRTS) + 12);
+if (offset >= size) {
+/* offset exceeds size of the process table */
+ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE, cause_excp);
+return 1;
+}
+prtbe_addr = (pate.dw1 & PATE1_R_PRTB) + offset;
+prtbe0 = ldq_phys(cs->as, prtbe_addr);
+
+/* Walk Radix Tree from Process Table Entry to Convert EA to RA */
+*g_page_size = PRTBE_R_GET_RTS(prtbe0);
+pte = ppc_radix64_walk_tree(cpu, eaddr & R_EADDR_MASK,
+prtbe0 & PRTBE_R_RPDB, prtbe0 & PRTBE_R_RPDS,
+g_raddr, g_page_size, &fault_cause, &pte_addr);
+
+if (!(pte & R_PTE_VALID) ||
+ppc_radix64_check_prot(cpu, rwx, pte, &fault_cause, g_prot)) {
+/* No valid pte or access denied due to protection */
+ppc_radix64_raise_si(cpu, rwx, eaddr, fault_cause, cause_excp);
+return 1;
+}
+
+ppc_radix64_set_rc(cpu, rwx, pte, pte_addr, g_prot);
+
+return 0;
+}
+
+static int ppc_radix64_xlate(PowerPCCPU *cpu, vaddr eaddr, int rwx,
+ bool relocation,
+ hwaddr *raddr, int *psizep, int *protp,
+ bool cause_excp)
+{
+uint64_t lpid = 0, pid = 0;
+ppc_v3_pate_t pate;
+int psize, prot;
+hwaddr g_raddr;
+
+/* Virtual Mode Access - get the fully qualified address */
+if (!ppc_radix64_get_fully_qualified_addr(&cpu->env, eaddr, &lpid, &pid)) {
+ppc_radix64_raise_segi(cpu, rwx, eaddr, cause_excp);
+return 1;
+}
+
+/* Get Process Table */
+if (cpu->vhyp) {
+PPCVirtualHypervisorClass *vhc;
+vhc = PPC_VIRTUAL_HYPERVISOR_GET_CLASS(cpu->vhyp);
+vhc->get_pate(cpu->vhyp, &pate);
+} else {
+if (!ppc64_v3_get_pate(cpu, lpid, &pate)) {
+ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_NOPTE, cause_excp);
+return 1;
+}
+if (!validate_pate(cpu, lpid, &pate)) {
+ppc_radix64_raise_si(cpu, rwx, eaddr, DSISR_R_BADCONFIG,
+ cause_excp);
+return 1;
+}
+/* We don't support guest mode yet */
+if (lpid != 

[PATCH v2 0/4] target/ppc: Add support for Radix partition-scoped translation

2020-04-01 Thread Cédric Le Goater
Hello,

The Radix tree translation model currently supports process-scoped
translation for the PowerNV machine (Hypervisor mode) and for the
pSeries machine (Guest mode). Guests running under an emulated
Hypervisor (PowerNV machine) require a new type of Radix translation,
called partition-scoped, which is missing today.

The Radix tree translation is a 2-step process. The first step,
process-scoped translation, converts an effective Address to a guest
real address, and the second step, partition-scoped translation,
converts a guest real address to a host real address.

There are different cases to cover:

* Hypervisor real mode access: no Radix translation.

* Hypervisor or host application access (quadrant 0 and 3) with
  relocation on: process-scoped translation.

* Guest OS real mode access: only partition-scoped translation.

* Guest OS real or guest application access (quadrant 0 and 3) with
  relocation on: both process-scoped translation and partition-scoped
  translations.

* Hypervisor access in quadrant 1 and 2 with relocation on: both
  process-scoped translation and partition-scoped translations.

The radix tree partition-scoped translation is performed using tables
pointed to by the first double-word of the Partition Table Entries and
process-scoped translation uses tables pointed to by the Process Table
Entries (second double-word of the Partition Table Entries).

Both the partition-scoped and process-scoped translation processes are
identical, and thus the radix tree traversal code is largely reused.
However, errors in partition-scoped translations generate hypervisor
exceptions.

Based on work from Suraj Jitindar Singh 

Thanks,

C.

Changes since v1:

 - removed checks (cpu->vhyp && lpid == 0)
 - changed ppc_radix64_walk_tree() and ppc_radix64_next_level() to use
   an 'AddressSpace *'
 - moved call to ppc_radix64_get_fully_qualified_addr() under
   ppc_radix64_xlate()
 - reworked the prototype of the routines raising the exceptions to
   take a 'cause_excp' bool.
 - re-introduced an extra test on nls in ppc_radix64_walk_tree()
 
Cédric Le Goater (4):
  target/ppc: Introduce ppc_radix64_xlate() for Radix tree translation
  target/ppc: Extend ppc_radix64_check_prot() with a 'partition_scoped'
bool
  target/ppc: Rework ppc_radix64_walk_tree() for partition-scoped
translation
  target/ppc: Add support for Radix partition-scoped translation

 target/ppc/cpu.h |   3 +
 target/ppc/excp_helper.c |   3 +-
 target/ppc/mmu-radix64.c | 434 ---
 3 files changed, 319 insertions(+), 121 deletions(-)

-- 
2.21.1




Re: [PATCH 1/2] ppc/xive: export PQ routines

2020-04-01 Thread Greg Kurz
On Wed,  1 Apr 2020 17:45:35 +0200
Cédric Le Goater  wrote:

> Signed-off-by: Cédric Le Goater 
> ---
>  include/hw/ppc/xive.h| 4 
>  hw/intc/spapr_xive_kvm.c | 8 
>  hw/intc/xive.c   | 6 +++---
>  3 files changed, 11 insertions(+), 7 deletions(-)
> 
> diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
> index 59ac075db080..d4e7c1f9217f 100644
> --- a/include/hw/ppc/xive.h
> +++ b/include/hw/ppc/xive.h
> @@ -255,6 +255,10 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource 
> *xsrc, int srcno)
>  #define XIVE_ESB_QUEUED   (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
>  #define XIVE_ESB_OFF  XIVE_ESB_VAL_Q
>  
> +bool xive_esb_trigger(uint8_t *pq);
> +bool xive_esb_eoi(uint8_t *pq);
> +uint8_t xive_esb_set(uint8_t *pq, uint8_t value);
> +
>  /*
>   * "magic" Event State Buffer (ESB) MMIO offsets.
>   *
> diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
> index edb7ee0e74f1..43f4d56b958c 100644
> --- a/hw/intc/spapr_xive_kvm.c
> +++ b/hw/intc/spapr_xive_kvm.c
> @@ -308,7 +308,7 @@ static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, 
> uint32_t offset)
>  return xive_esb_rw(xsrc, srcno, offset, 0, 0) & 0x3;
>  }
>  
> -static void xive_esb_trigger(XiveSource *xsrc, int srcno)
> +static void kvmppc_xive_esb_trigger(XiveSource *xsrc, int srcno)

And good riddance to the duplicate name, for a better gdb experience. :)

Reviewed-by: Greg Kurz 

>  {
>  uint64_t *addr = xsrc->esb_mmap + xive_source_esb_page(xsrc, srcno);
>  
> @@ -331,7 +331,7 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, 
> uint32_t offset,
>  offset == XIVE_ESB_LOAD_EOI) {
>  xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
>  if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
> -xive_esb_trigger(xsrc, srcno);
> +kvmppc_xive_esb_trigger(xsrc, srcno);
>  }
>  return 0;
>  } else {
> @@ -375,7 +375,7 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, 
> int val)
>  }
>  }
>  
> -xive_esb_trigger(xsrc, srcno);
> +kvmppc_xive_esb_trigger(xsrc, srcno);
>  }
>  
>  /*
> @@ -544,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void 
> *opaque, int running,
>   * generate a trigger.
>   */
>  if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
> -xive_esb_trigger(xsrc, i);
> +kvmppc_xive_esb_trigger(xsrc, i);
>  }
>  }
>  
> diff --git a/hw/intc/xive.c b/hw/intc/xive.c
> index b9656cd7556c..56ce3ed93e29 100644
> --- a/hw/intc/xive.c
> +++ b/hw/intc/xive.c
> @@ -811,7 +811,7 @@ void xive_tctx_destroy(XiveTCTX *tctx)
>   * XIVE ESB helpers
>   */
>  
> -static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
> +uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
>  {
>  uint8_t old_pq = *pq & 0x3;
>  
> @@ -821,7 +821,7 @@ static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
>  return old_pq;
>  }
>  
> -static bool xive_esb_trigger(uint8_t *pq)
> +bool xive_esb_trigger(uint8_t *pq)
>  {
>  uint8_t old_pq = *pq & 0x3;
>  
> @@ -841,7 +841,7 @@ static bool xive_esb_trigger(uint8_t *pq)
>  }
>  }
>  
> -static bool xive_esb_eoi(uint8_t *pq)
> +bool xive_esb_eoi(uint8_t *pq)
>  {
>  uint8_t old_pq = *pq & 0x3;
>  




[PATCH] lockable: Replace locks with lock guard macros

2020-04-01 Thread Simran Singhal
Replace manual lock()/unlock() calls with lock guard macros
(QEMU_LOCK_GUARD/WITH_QEMU_LOCK_GUARD).

Signed-off-by: Simran Singhal 
---
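For reviewers not yet using qemu/lockable.h, the shape of the conversion
is roughly the following (hand-written example; the lock, function and
variable names below are made up and do not appear in the patch):

    /* requires "qemu/osdep.h" and "qemu/lockable.h" */
    static QemuMutex table_mutex;

    /* before: every exit path must unlock (or jump to an unlock label) */
    static int lookup_before(int key)
    {
        int ret = -ENOENT;

        qemu_mutex_lock(&table_mutex);
        /* ... search the table, set ret ... */
        qemu_mutex_unlock(&table_mutex);
        return ret;
    }

    /* after: the guard releases the lock automatically when the scope
     * is left, so early returns no longer need explicit unlocking
     */
    static int lookup_after(int key)
    {
        QEMU_LOCK_GUARD(&table_mutex);
        /* ... search the table ... */
        return -ENOENT;
    }

WITH_QEMU_LOCK_GUARD(&table_mutex) { ... } is the block-scoped variant
used below in rdma_backend.c.
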
 hw/hyperv/hyperv.c | 15 ++---
 hw/rdma/rdma_backend.c | 50 +-
 hw/rdma/rdma_rm.c  |  3 +--
 hw/rdma/rdma_utils.c   | 15 +
 4 files changed, 39 insertions(+), 44 deletions(-)

diff --git a/hw/hyperv/hyperv.c b/hw/hyperv/hyperv.c
index 8ca3706f5b..4ddafe1de1 100644
--- a/hw/hyperv/hyperv.c
+++ b/hw/hyperv/hyperv.c
@@ -15,6 +15,7 @@
 #include "sysemu/kvm.h"
 #include "qemu/bitops.h"
 #include "qemu/error-report.h"
+#include "qemu/lockable.h"
 #include "qemu/queue.h"
 #include "qemu/rcu.h"
 #include "qemu/rcu_queue.h"
@@ -491,7 +492,7 @@ int hyperv_set_msg_handler(uint32_t conn_id, HvMsgHandler 
handler, void *data)
 int ret;
 MsgHandler *mh;
 
-qemu_mutex_lock(&handlers_mutex);
+QEMU_LOCK_GUARD(&handlers_mutex);
 QLIST_FOREACH(mh, _handlers, link) {
 if (mh->conn_id == conn_id) {
 if (handler) {
@@ -501,7 +502,7 @@ int hyperv_set_msg_handler(uint32_t conn_id, HvMsgHandler 
handler, void *data)
 g_free_rcu(mh, rcu);
 ret = 0;
 }
-goto unlock;
+return ret;
 }
 }
 
@@ -515,8 +516,7 @@ int hyperv_set_msg_handler(uint32_t conn_id, HvMsgHandler 
handler, void *data)
 } else {
 ret = -ENOENT;
 }
-unlock:
-qemu_mutex_unlock(&handlers_mutex);
+
 return ret;
 }
 
@@ -565,7 +565,7 @@ static int set_event_flag_handler(uint32_t conn_id, 
EventNotifier *notifier)
 int ret;
 EventFlagHandler *handler;
 
-qemu_mutex_lock(&handlers_mutex);
+QEMU_LOCK_GUARD(&handlers_mutex);
 QLIST_FOREACH(handler, _flag_handlers, link) {
 if (handler->conn_id == conn_id) {
 if (notifier) {
@@ -575,7 +575,7 @@ static int set_event_flag_handler(uint32_t conn_id, 
EventNotifier *notifier)
 g_free_rcu(handler, rcu);
 ret = 0;
 }
-goto unlock;
+return ret;
 }
 }
 
@@ -588,8 +588,7 @@ static int set_event_flag_handler(uint32_t conn_id, 
EventNotifier *notifier)
 } else {
 ret = -ENOENT;
 }
-unlock:
-qemu_mutex_unlock(&handlers_mutex);
+
 return ret;
 }
 
diff --git a/hw/rdma/rdma_backend.c b/hw/rdma/rdma_backend.c
index 3dd39fe1a7..db7e5c8be5 100644
--- a/hw/rdma/rdma_backend.c
+++ b/hw/rdma/rdma_backend.c
@@ -95,36 +95,36 @@ static int rdma_poll_cq(RdmaDeviceResources *rdma_dev_res, 
struct ibv_cq *ibcq)
 struct ibv_wc wc[2];
 RdmaProtectedGSList *cqe_ctx_list;
 
-qemu_mutex_lock(&rdma_dev_res->lock);
-do {
-ne = ibv_poll_cq(ibcq, ARRAY_SIZE(wc), wc);
+WITH_QEMU_LOCK_GUARD(&rdma_dev_res->lock) {
+do {
+ne = ibv_poll_cq(ibcq, ARRAY_SIZE(wc), wc);
 
-trace_rdma_poll_cq(ne, ibcq);
+trace_rdma_poll_cq(ne, ibcq);
 
-for (i = 0; i < ne; i++) {
-bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, wc[i].wr_id);
-if (unlikely(!bctx)) {
-rdma_error_report("No matching ctx for req %"PRId64,
-  wc[i].wr_id);
-continue;
-}
+for (i = 0; i < ne; i++) {
+bctx = rdma_rm_get_cqe_ctx(rdma_dev_res, wc[i].wr_id);
+if (unlikely(!bctx)) {
+rdma_error_report("No matching ctx for req %"PRId64,
+  wc[i].wr_id);
+continue;
+}
 
-comp_handler(bctx->up_ctx, [i]);
+comp_handler(bctx->up_ctx, [i]);
 
-if (bctx->backend_qp) {
-cqe_ctx_list = &bctx->backend_qp->cqe_ctx_list;
-} else {
-cqe_ctx_list = &bctx->backend_srq->cqe_ctx_list;
-}
+if (bctx->backend_qp) {
+cqe_ctx_list = &bctx->backend_qp->cqe_ctx_list;
+} else {
+cqe_ctx_list = &bctx->backend_srq->cqe_ctx_list;
+}
 
-rdma_protected_gslist_remove_int32(cqe_ctx_list, wc[i].wr_id);
-rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
-g_free(bctx);
-}
-total_ne += ne;
-} while (ne > 0);
-atomic_sub(&rdma_dev_res->stats.missing_cqe, total_ne);
-qemu_mutex_unlock(&rdma_dev_res->lock);
+rdma_protected_gslist_remove_int32(cqe_ctx_list, wc[i].wr_id);
+rdma_rm_dealloc_cqe_ctx(rdma_dev_res, wc[i].wr_id);
+g_free(bctx);
+}
+total_ne += ne;
+} while (ne > 0);
+atomic_sub(&rdma_dev_res->stats.missing_cqe, total_ne);
+}
 
 if (ne < 0) {
 rdma_error_report("ibv_poll_cq fail, rc=%d, errno=%d", ne, errno);
diff --git a/hw/rdma/rdma_rm.c b/hw/rdma/rdma_rm.c
index 7e9ea283c9..60957f88db 100644
--- a/hw/rdma/rdma_rm.c
+++ b/hw/rdma/rdma_rm.c
@@ -147,14 +147,13 @@ static inline void rdma_res_tbl_dealloc(RdmaRmResTbl 

Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer
> > Is really nobody else able to reproduce this (somebody already tried to
> > reproduce)?
> 
> I can get hangs, but that's for job_completed(), not for starting the
> job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> like a different case to me.

Please can you post the command line args of your VM? I use something like

./x86_64-softmmu/qemu-system-x86_64 -chardev 
'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 
'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 
-object 'iothread,id=iothread-virtioscsi0' -device 
'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 
'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
 -device 
'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
 -machine "type=pc,accel=kvm"

Do you also run "stress-ng -d 5" inside the VM?




Re: [Qemu-devel] [PULL 06/25] xen: create xenstore areas for XenDevice-s

2020-04-01 Thread Philippe Mathieu-Daudé

Hi Anthony, Paul.

Cc'ing Markus too.

On 1/14/19 2:51 PM, Anthony PERARD wrote:

From: Paul Durrant 

This patch adds a new source module, xen-bus-helper.c, which builds on
basic libxenstore primitives to provide functions to create (setting
permissions appropriately) and destroy xenstore areas, and functions to
'printf' and 'scanf' nodes therein. The main xen-bus code then uses
these primitives [1] to initialize and destroy the frontend and backend
areas for a XenDevice during realize and unrealize respectively.

The 'xen-block' implementation is extended with a 'get_name' method that
returns the VBD number. This number is required to 'name' the xenstore
areas.

NOTE: An exit handler is also added to make sure the xenstore areas are
   cleaned up if QEMU terminates without devices being unrealized.

[1] The 'scanf' functions are actually not yet needed, but they will be
 needed by code delivered in subsequent patches.

Signed-off-by: Paul Durrant 
Reviewed-by: Anthony Perard 
Signed-off-by: Anthony PERARD 
---
  hw/block/xen-block.c|   9 +
  hw/xen/Makefile.objs|   2 +-
  hw/xen/trace-events |  12 +-
  hw/xen/xen-bus-helper.c | 150 +++
  hw/xen/xen-bus.c| 321 +++-
  include/hw/xen/xen-bus-helper.h |  39 
  include/hw/xen/xen-bus.h|  12 ++
  7 files changed, 540 insertions(+), 5 deletions(-)
  create mode 100644 hw/xen/xen-bus-helper.c
  create mode 100644 include/hw/xen/xen-bus-helper.h


[...]

+static void xen_device_exit(Notifier *n, void *data)
+{
+XenDevice *xendev = container_of(n, XenDevice, exit);
+
+xen_device_unrealize(DEVICE(xendev), &error_abort);
  }
  
  static void xen_device_realize(DeviceState *dev, Error **errp)

  {
  XenDevice *xendev = XEN_DEVICE(dev);
  XenDeviceClass *xendev_class = XEN_DEVICE_GET_CLASS(xendev);
+XenBus *xenbus = XEN_BUS(qdev_get_parent_bus(DEVICE(xendev)));
  const char *type = object_get_typename(OBJECT(xendev));
  Error *local_err = NULL;
  
-trace_xen_device_realize(type);

+if (xendev->frontend_id == DOMID_INVALID) {
+xendev->frontend_id = xen_domid;
+}
+
+if (xendev->frontend_id >= DOMID_FIRST_RESERVED) {
+error_setg(errp, "invalid frontend-id");
+goto unrealize;
+}
+
+if (!xendev_class->get_name) {
+error_setg(errp, "get_name method not implemented");
+goto unrealize;
+}
+
+xendev->name = xendev_class->get_name(xendev, &local_err);
+if (local_err) {
+error_propagate_prepend(errp, local_err,
+"failed to get device name: ");
+goto unrealize;
+}
+
+trace_xen_device_realize(type, xendev->name);
+
+xen_device_backend_create(xendev, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto unrealize;
+}
+
+xen_device_frontend_create(xendev, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+goto unrealize;
+}
  
  if (xendev_class->realize) {

  xendev_class->realize(xendev, &local_err);
@@ -72,18 +364,43 @@ static void xen_device_realize(DeviceState *dev, Error 
**errp)
  }
  }
  
+xen_device_backend_printf(xendev, "frontend", "%s",

+  xendev->frontend_path);
+xen_device_backend_printf(xendev, "frontend-id", "%u",
+  xendev->frontend_id);
+xen_device_backend_printf(xendev, "online", "%u", 1);
+xen_device_backend_printf(xendev, "hotplug-status", "connected");
+
+xen_device_backend_set_state(xendev, XenbusStateInitWait);
+
+xen_device_frontend_printf(xendev, "backend", "%s",
+   xendev->backend_path);
+xen_device_frontend_printf(xendev, "backend-id", "%u",
+   xenbus->backend_id);
+
+xen_device_frontend_set_state(xendev, XenbusStateInitialising);
+
+xendev->exit.notify = xen_device_exit;
+qemu_add_exit_notifier(&xendev->exit);
  return;
  
  unrealize:

  xen_device_unrealize(dev, &error_abort);


It seems that if unrealize() fails, the error stored in &local_err is never
reported. Not sure if this can be improved, though.



  }
  
+static Property xen_device_props[] = {

+DEFINE_PROP_UINT16("frontend-id", XenDevice, frontend_id,
+   DOMID_INVALID),
+DEFINE_PROP_END_OF_LIST()
+};
+
  static void xen_device_class_init(ObjectClass *class, void *data)
  {
  DeviceClass *dev_class = DEVICE_CLASS(class);
  
  dev_class->realize = xen_device_realize;

  dev_class->unrealize = xen_device_unrealize;
+dev_class->props = xen_device_props;
  dev_class->bus_type = TYPE_XEN_BUS;
  }
  

[...]




Re: [PATCH 1/6] scripts/coccinelle: add error-use-after-free.cocci

2020-04-01 Thread Peter Maydell
On Wed, 1 Apr 2020 at 15:44, Markus Armbruster  wrote:
> Peter Maydell  writes:
> > On Wed, 1 Apr 2020 at 06:07, Markus Armbruster  wrote:
> > But then as a coccinelle script author I need to know which of
> > the options I needed are standard, which are for-this-script-only,
> > and which are just 'workflow'.
>
> If you're capable of writing a Coccinelle script that actually does what
> you want, I bet you dollars to donuts that you can decide which options
> actually affect the patch in comparably no time whatsoever ;)

I use this thing maybe once a month at most, more likely once
every three months, and the documentation is notoriously
impenetrable. I really really don't want to have to start looking in it
and guessing about how the original author ran the script, when
they could have just told me.

> If you prefer to bother your reader with your personal choices, that's
> between you and your reviewers.  Myself, I prefer less noise around the
> signal.

> If you got Coccinelle installed and know the very basics, then the
> incantation in the script should suffice to use the script, and the
> incantation in the commit message should suffice to reproduce the patch.

So I need to now look in the git log for the script to find the commit
message? Why not just put the command in the file and save steps?

> Example:
>
> commit 4e20c1becba3fd2e8e71a2663cefb9627fd2a6e0
> Author: Markus Armbruster 
> Date:   Thu Dec 13 18:51:54 2018 +0100
>
> block: Replace qdict_put() by qdict_put_obj() where appropriate
>
> Patch created mechanically by rerunning:
>
>   $  spatch --sp-file scripts/coccinelle/qobject.cocci \
> --macro-file scripts/cocci-macro-file.h \
> --dir block --in-place

Yep, that command line would be great to see in the script file.

> scripts/coccinelle/qobject.cocci has no usage comment.  I doubt it needs
> one, but I'd certainly tolerate something like

// Usage:
// spatch --sp-file scripts/coccinelle/qobject.cocci \
//--macro-file scripts/cocci-macro-file.h \
//FILES ...

I think that should be about the minimum. I think every
.cocci file should say how it was used or is supposed to be used.
The least-effort way for the author of the script to do that is to
simply give the command line they used to run it.

> >   That's more work for the author *and* more work for the
> > reader than just "put the command line you used into the script
> > as a comment" -- so who's it benefiting?
>
> Anyone with basic Coccinelle proficiency benefits slightly from the
> reduction of noise.

How 'basic' is basic? I think that being specific is useful for
anybody who's at my level or lower (ie, can write a script, doesn't
do so often enough to be able to write a script or run spatch
without looking at documentation and cribbing from other scripts
as examples). How many people do we have at a higher level
than that for whom this is noise? 2? 3? And people who do
know Coccinelle well should have no difficulty in quickly
looking at a command line and mentally filtering out the options
that they don't feel they need.

thanks
-- PMM



Re: [PATCH v16 4/4] iotests: 287: add qcow2 compression type test

2020-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2020 17:37, Denis Plotnikov wrote:

The test checks that the qcow2 requirements for the compression type
feature are fulfilled, and that the zstd compression type works.

Signed-off-by: Denis Plotnikov 
---
  tests/qemu-iotests/287 | 162 +
  tests/qemu-iotests/287.out |  70 
  tests/qemu-iotests/group   |   1 +
  3 files changed, 233 insertions(+)
  create mode 100755 tests/qemu-iotests/287
  create mode 100644 tests/qemu-iotests/287.out

diff --git a/tests/qemu-iotests/287 b/tests/qemu-iotests/287
new file mode 100755
index 00..699dccd72c
--- /dev/null
+++ b/tests/qemu-iotests/287
@@ -0,0 +1,162 @@
+#!/usr/bin/env bash
+#
+# Test case for an image using zstd compression
+#
+# Copyright (c) 2020 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+# creator
+owner=dplotni...@virtuozzo.com
+
+seq="$(basename $0)"
+echo "QA output created by $seq"
+
+status=1   # failure is the default!
+
+# standard environment
+. ./common.rc
+. ./common.filter
+
+# This tests qcow2-specific low-level functionality
+_supported_fmt qcow2
+_supported_proto file
+_supported_os Linux
+
+COMPR_IMG=$TEST_IMG.compressed
+RAND_FILE=$TEST_DIR/rand_data


Always quote file paths, so you don't have to worry about whitespace.


+
+_cleanup()
+{
+   _cleanup_test_img
+   rm -f $COMPR_IMG
+   rm -f $RAND_FILE
+}


[..]

 data should be identical

+echo
+echo "=== Testing incompressible cluster processing with zstd ==="
+echo
+
+dd if=/dev/urandom of=$RAND_FILE bs=1M count=1
+
+_make_test_img 64M
+# fill the image with likely incompressible and compressible clusters
+$QEMU_IO -c "write -c -s "$RAND_FILE" 0 1M " "$TEST_IMG" | _filter_qemu_io


Hmm, the quotes around RAND_FILE here don't make things better: if the
RAND_FILE variable has whitespace inside, the argument after -c will be
split into two arguments.

What we need here is support for quotes inside the string argument for
qemu-io. That's a separate thing to do. Let's just not worry about
whitespace here for now.


With the following squashed in:
Reviewed-by: Vladimir Sementsov-Ogievskiy 

--- a/tests/qemu-iotests/287
+++ b/tests/qemu-iotests/287
@@ -35,14 +35,14 @@ _supported_fmt qcow2
 _supported_proto file
 _supported_os Linux

-COMPR_IMG=$TEST_IMG.compressed
-RAND_FILE=$TEST_DIR/rand_data
+COMPR_IMG="$TEST_IMG.compressed"
+RAND_FILE="$TEST_DIR/rand_data"

 _cleanup()
 {
 _cleanup_test_img
-rm -f $COMPR_IMG
-rm -f $RAND_FILE
+rm -f "$COMPR_IMG"
+rm -f "$RAND_FILE"
 }
 trap "_cleanup; exit \$status" 0 1 2 3 15

@@ -146,11 +146,14 @@ echo
 echo "=== Testing incompressible cluster processing with zstd ==="
 echo

-dd if=/dev/urandom of=$RAND_FILE bs=1M count=1
+dd if=/dev/urandom of="$RAND_FILE" bs=1M count=1

 _make_test_img 64M
 # fill the image with likely incompressible and compressible clusters
-$QEMU_IO -c "write -c -s "$RAND_FILE" 0 1M " "$TEST_IMG" | _filter_qemu_io
+# TODO: if the RAND_FILE variable contains whitespace, the following will fail.
+# We need to support some kind of quoting to make file paths with
+# whitespace possible for the -s option
+$QEMU_IO -c "write -c -s $RAND_FILE 0 1M " "$TEST_IMG" | _filter_qemu_io
 $QEMU_IO -c "write -c -P 0xFA 1M 1M " "$TEST_IMG" | _filter_qemu_io
 $QEMU_IMG convert -O $IMGFMT -c -o compression_type=zstd \
   "$TEST_IMG" "$COMPR_IMG"


--
Best regards,
Vladimir



Re: [PATCH] ppc/pnv: Introduce common PNV_SETFIELD() and PNV_GETFIELD() macros

2020-04-01 Thread Greg Kurz
On Wed,  1 Apr 2020 17:26:33 +0200
Cédric Le Goater  wrote:

> Most of QEMU definitions of the register fields of the PowerNV machine
> come from skiboot and the models duplicate a set of macros for this
> purpose. Make them common under the pnv_utils.h file.
> 
> Signed-off-by: Cédric Le Goater 
> ---

Reviewed-by: Greg Kurz 

>  include/hw/pci-host/pnv_phb3_regs.h | 16 --
>  include/hw/ppc/pnv_utils.h  | 29 +++
>  hw/intc/pnv_xive.c  | 76 -
>  hw/pci-host/pnv_phb3.c  | 32 ++--
>  hw/pci-host/pnv_phb3_msi.c  | 24 -
>  hw/pci-host/pnv_phb4.c  | 51 ---
>  6 files changed, 108 insertions(+), 120 deletions(-)
>  create mode 100644 include/hw/ppc/pnv_utils.h
> 
> diff --git a/include/hw/pci-host/pnv_phb3_regs.h 
> b/include/hw/pci-host/pnv_phb3_regs.h
> index a174ef1f7045..38f8ce9d7406 100644
> --- a/include/hw/pci-host/pnv_phb3_regs.h
> +++ b/include/hw/pci-host/pnv_phb3_regs.h
> @@ -12,22 +12,6 @@
>  
>  #include "qemu/host-utils.h"
>  
> -/*
> - * QEMU version of the GETFIELD/SETFIELD macros
> - *
> - * These are common with the PnvXive model.
> - */
> -static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
> -{
> -return (word & mask) >> ctz64(mask);
> -}
> -
> -static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
> -uint64_t value)
> -{
> -return (word & ~mask) | ((value << ctz64(mask)) & mask);
> -}
> -
>  /*
>   * PBCQ XSCOM registers
>   */
> diff --git a/include/hw/ppc/pnv_utils.h b/include/hw/ppc/pnv_utils.h
> new file mode 100644
> index ..8521e13b5149
> --- /dev/null
> +++ b/include/hw/ppc/pnv_utils.h
> @@ -0,0 +1,29 @@
> +/*
> + * QEMU PowerPC PowerNV utilities
> + *
> + * Copyright (c) 2020, IBM Corporation.
> + *
> + * This code is licensed under the GPL version 2 or later. See the
> + * COPYING file in the top-level directory.
> + */
> +
> +#ifndef PPC_PNV_UTILS_H
> +#define PPC_PNV_UTILS_H
> +
> +/*
> + * QEMU version of the GETFIELD/SETFIELD macros used in skiboot to
> + * define the register fields.
> + */
> +
> +static inline uint64_t PNV_GETFIELD(uint64_t mask, uint64_t word)
> +{
> +return (word & mask) >> ctz64(mask);
> +}
> +
> +static inline uint64_t PNV_SETFIELD(uint64_t mask, uint64_t word,
> +uint64_t value)
> +{
> +return (word & ~mask) | ((value << ctz64(mask)) & mask);
> +}
> +
> +#endif /* PPC_PNV_UTILS_H */
> diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
> index aeda488bd113..77cacdd6c623 100644
> --- a/hw/intc/pnv_xive.c
> +++ b/hw/intc/pnv_xive.c
> @@ -21,6 +21,7 @@
>  #include "hw/ppc/pnv_core.h"
>  #include "hw/ppc/pnv_xscom.h"
>  #include "hw/ppc/pnv_xive.h"
> +#include "hw/ppc/pnv_utils.h" /* SETFIELD() and GETFIELD() macros */
>  #include "hw/ppc/xive_regs.h"
>  #include "hw/qdev-properties.h"
>  #include "hw/ppc/ppc.h"
> @@ -65,26 +66,6 @@ static const XiveVstInfo vst_infos[] = {
>  qemu_log_mask(LOG_GUEST_ERROR, "XIVE[%x] - " fmt "\n",  \
>(xive)->chip->chip_id, ## __VA_ARGS__);
>  
> -/*
> - * QEMU version of the GETFIELD/SETFIELD macros
> - *
> - * TODO: It might be better to use the existing extract64() and
> - * deposit64() but this means that all the register definitions will
> - * change and become incompatible with the ones found in skiboot.
> - *
> - * Keep it as it is for now until we find a common ground.
> - */
> -static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
> -{
> -return (word & mask) >> ctz64(mask);
> -}
> -
> -static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
> -uint64_t value)
> -{
> -return (word & ~mask) | ((value << ctz64(mask)) & mask);
> -}
> -
>  /*
>   * When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID
>   * field overrides the hardwired chip ID in the Powerbus operations
> @@ -96,7 +77,7 @@ static uint8_t pnv_xive_block_id(PnvXive *xive)
>  uint64_t cfg_val = xive->regs[PC_TCTXT_CFG >> 3];
>  
>  if (cfg_val & PC_TCTXT_CHIPID_OVERRIDE) {
> -blk = GETFIELD(PC_TCTXT_CHIPID, cfg_val);
> +blk = PNV_GETFIELD(PC_TCTXT_CHIPID, cfg_val);
>  }
>  
>  return blk;
> @@ -145,7 +126,7 @@ static uint64_t pnv_xive_vst_addr_direct(PnvXive *xive, 
> uint32_t type,
>  {
>  const XiveVstInfo *info = &vst_infos[type];
>  uint64_t vst_addr = vsd & VSD_ADDRESS_MASK;
> -uint64_t vst_tsize = 1ull << (GETFIELD(VSD_TSIZE, vsd) + 12);
> +uint64_t vst_tsize = 1ull << (PNV_GETFIELD(VSD_TSIZE, vsd) + 12);
>  uint32_t idx_max;
>  
>  idx_max = vst_tsize / info->size - 1;
> @@ -180,7 +161,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, 
> uint32_t type,
>  return 0;
>  }
>  
> -page_shift = GETFIELD(VSD_TSIZE, vsd) + 12;
> +page_shift = PNV_GETFIELD(VSD_TSIZE, vsd) + 12;
>  
>  if (!pnv_xive_vst_page_size_allowed(page_shift)) 

Re: [PATCH v7 00/41] target/arm: Implement ARMv8.1-VHE

2020-04-01 Thread Jonathan Cameron
On Wed, 1 Apr 2020 11:45:22 +0100
Jonathan Cameron  wrote:

> On Tue, 31 Mar 2020 11:59:13 -0700
> Richard Henderson  wrote:
> 
> > On 3/31/20 8:33 AM, Jonathan Cameron wrote:  
> > > Just wondering if there are any known issues with this?
> > 
> > Nope.  It works for me.
> > Can you give us any more details.
> >   
> 
> Unfortunately not a lot more to add.
> 
> I ran some sanity checks that it wasn't something else looking like
> an issue in these patches.
> 
> All with 5.6 kernel and 5.0.0 rc0 qemu
> 
> 1) sve=off but VHE still on. failed.
> 2) sve=off + VH bit not set. fine but obviously no VHE.
> (dance with SVE required because of kernel checks for SVE before
> allowing no VHE kvm).
> 3) above tests run on mainline qemu just after VHE patches applied
> (just in case we have a regression from some other change).  No change.
> 4) EDK2 for the guest.  Synchronous exception. (works fine with no VHE)
> 0x00..05F9B2208

This one may be something since fixed in edk2.  I did a fresh build
of the current tree and it goes away.

> 
> I do get an additional error sometimes such as the ld.so one here.
> 
> [   16.539375] Run /sbin/init as init process
> Inconsistency detected by ld.so: rtld.c: 721: init_tls: Assertion `i
> == GL(dl_tls_max_dtv_idx)' failed! [   17.780596] Kernel panic - not
> syncing: Attempted to kill init! exitcode=0x7f00 [   17.847709]
> CPU: 0 PID: 1 Comm: init Not tainted 5.6.0 #356 [   17.897260]
> Hardware name: linux,dummy-virt (DT) [   17.940007] Call trace:
> [   17.962297]  dump_backtrace+0x0/0x190
> [   17.993897]  show_stack+0x1c/0x28
> [   18.022382]  dump_stack+0xb4/0xfc
> [   18.050781]  panic+0x160/0x35c
> [   18.077469]  do_exit+0x9a4/0xa08
> [   18.105510]  do_group_exit+0x48/0xa8
> [   18.136073]  __arm64_sys_exit_group+0x1c/0x20
> [   18.173677]  el0_svc_common.constprop.0+0x70/0x168
> [   18.218659]  do_el0_svc+0x28/0x88
> [   18.247154]  el0_sync_handler+0x10c/0x180
> [   18.281379]  el0_sync+0x140/0x180
> [   18.310684] Kernel Offset: 0x2a67dcc0 from 0x80001000
> [   18.362474] PHYS_OFFSET: 0xf794
> [   18.398314] CPU features: 0x40012,20c0a238
> [   18.433416] Memory Limit: none
> [   18.463648] ---[ end Kernel panic - not syncing: Attempted to kill
> init! exitcode=0x7f00 ]---

Seems I can get away with an initrd, but not a qcow2 based disk image.

Not that this necessarily helps much with working out what is going
wrong!

Jonathan

> 
> 
> Jonathan
> 
> 
> > 
> > r~  
> 





Re: bdrv_drained_begin deadlock with io-threads

2020-04-01 Thread Dietmar Maurer


> On April 1, 2020 5:37 PM Dietmar Maurer  wrote:
> 
>  
> > Is really nobody else able to reproduce this (somebody already tried to 
> > > reproduce)?
> > 
> > I can get hangs, but that's for job_completed(), not for starting the
> > job. Also, my hangs have a non-empty bs->tracked_requests, so it looks
> > like a different case to me.
> 
> Please can you post the command line args of your VM? I use something like
> 
> ./x86_64-softmmu/qemu-system-x86_64 -chardev 
> 'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 
> 'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 
> -object 'iothread,id=iothread-virtioscsi0' -device 
> 'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 
> 'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
>  -device 
> 'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
>  -machine "type=pc,accel=kvm"

BTW, I get a segfault if I start the above VM without "accel=kvm":

gdb --args ./x86_64-softmmu/qemu-system-x86_64 -chardev 
'socket,id=qmp,path=/var/run/qemu-server/101.qmp,server,nowait' -mon 
'chardev=qmp,mode=control' -pidfile /var/run/qemu-server/101.pid  -m 1024 
-object 'iothread,id=iothread-virtioscsi0' -device 
'virtio-scsi-pci,id=virtioscsi0,iothread=iothread-virtioscsi0' -drive 
'file=/backup/disk3/debian-buster.raw,if=none,id=drive-scsi0,format=raw,cache=none,aio=native,detect-zeroes=on'
 -device 
'scsi-hd,bus=virtioscsi0.0,channel=0,scsi-id=0,lun=0,drive=drive-scsi0,id=scsi0'
 -machine "type=pc"

after a few seconds:

Thread 3 "qemu-system-x86" received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fffe857e700 (LWP 22257)]
0x5587c130 in do_tb_phys_invalidate (tb=tb@entry=0x7fffa7b40500 
, 
rm_from_page_list=rm_from_page_list@entry=true)
at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1483
1483    atomic_set(&tcg_ctx->tb_phys_invalidate_count,
(gdb) bt
#0  0x5587c130 in do_tb_phys_invalidate
(tb=tb@entry=0x7fffa7b40500 , 
rm_from_page_list=rm_from_page_list@entry=true)
at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1483
#1  0x5587c53b in tb_phys_invalidate__locked (tb=0x7fffa7b40500 
)
at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1960
#2  0x5587c53b in tb_invalidate_phys_page_range__locked
(pages=pages@entry=0x7fffe780d400, p=0x7fff651066a0, 
start=start@entry=1072709632, end=end@entry=1072713728, 
retaddr=retaddr@entry=0) at 
/home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:1960
#3  0x5587dad1 in tb_invalidate_phys_range (start=1072709632, 
end=1072771072)
at /home/dietmar/pve5-devel/mirror_qemu/accel/tcg/translate-all.c:2036
#4  0x55801c12 in invalidate_and_set_dirty
(mr=, addr=, length=)
at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3036
#5  0x558072df in address_space_unmap
(as=, buffer=, len=, 
is_write=, access_len=65536)
at /home/dietmar/pve5-devel/mirror_qemu/exec.c:3571
#6  0x55967ff6 in dma_memory_unmap
(access_len=, dir=, len=, 
buffer=, as=) at 
/home/dietmar/pve5-devel/mirror_qemu/include/sysemu/dma.h:145
#7  0x55967ff6 in dma_blk_unmap (dbs=dbs@entry=0x7fffe7839220) at 
dma-helpers.c:104
#8  0x55968394 in dma_complete (ret=0, dbs=0x7fffe7839220) at 
dma-helpers.c:116
#9  0x55968394 in dma_blk_cb (opaque=0x7fffe7839220, ret=0) at 
dma-helpers.c:136
#10 0x55bac78e in blk_aio_complete (acb=0x7fffe783da00) at 
block/block-backend.c:1339
#11 0x55c7280b in coroutine_trampoline (i0=, 
i1=)
at util/coroutine-ucontext.c:115
#12 0x76176b50 in __correctly_grouped_prefixwc
(begin=0x7fffa7b40240  L"\x3ff0497b", end=0x12 
, thousands=0 L'\000', 
grouping=0x7fffa7b40590  "\001") at grouping.c:171
#13 0x in  ()


It runs fine without iothreads.

But I guess this is a totally different problem?




Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Markus Armbruster
Alex Bennée  writes:

> Vladimir Sementsov-Ogievskiy  writes:
>
>> Side question:
>>
>> Can we somehow implement a possibility to reliably identify file and line 
>> number
>> where error is set by error message?
>>
>> It's where debug of error-bugs always starts: try to imagine which parts of 
>> the error
>> message are "%s", and how to grep for it in the code, keeping in mind also,
>> that the error message may be split into several lines..
>>
>> Put file:line into each error? Seems too noisy for users.. A lot of errors 
>> are not
>> bugs: the user does something wrong, sees the error, and understands what
>> he is doing wrong.  It's not usual practice to print file:line in every
>> message shown to the user.
>
> I tend to use __func__ for these things as the result is usually easily
> grep-able.

Putting __func__ in error messages makes them both greppable and ugly.
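For what it's worth, the pattern being discussed is simply something like
this (function and message invented purely for illustration):

    #include "qapi/error.h"

    /* Made-up example, only to show __func__ in the message. */
    static void frobnicate_device(const char *name, Error **errp)
    {
        error_setg(errp, "%s: device %s not found", __func__, name);
    }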




RE: RFC: use VFIO over a UNIX domain socket to implement device offloading

2020-04-01 Thread Thanos Makatos
> On Thu, Mar 26, 2020 at 09:47:38AM +, Thanos Makatos wrote:
> > Build MUSER with vfio-over-socket:
> >
> > git clone --single-branch --branch vfio-over-socket
> g...@github.com:tmakatos/muser.git
> > cd muser/
> > git submodule update --init
> > make
> >
> > Run device emulation, e.g.
> >
> > ./build/dbg/samples/gpio-pci-idio-16 -s 
> >
> > Where  is an available IOMMU group, essentially the device ID, which
> must not
> > previously exist in /dev/vfio/.
> >
> > Run QEMU using the vfio wrapper library and specifying the MUSER device:
> >
> > LD_PRELOAD=muser/build/dbg/libvfio/libvfio.so qemu-system-x86_64
> \
> > ... \
> > -device vfio-pci,sysfsdev=/dev/vfio/ \
> > -object memory-backend-file,id=ram-node0,prealloc=yes,mem-
> path=mem,share=yes,size=1073741824 \
> > -numa node,nodeid=0,cpus=0,memdev=ram-node0
> >
> > Bear in mind that since this is just a PoC lots of things can break, e.g. 
> > some
> > system call not intercepted etc.
> 
> Cool, I had a quick look at libvfio and how the transport integrates
> into libmuser.  The integration on the libmuser side is nice and small.
> 
> It seems likely that there will be several different implementations of
> the vfio-over-socket device side (server):
> 1. libmuser
> 2. A Rust equivalent to libmuser
> 3. Maybe a native QEMU implementation for multi-process QEMU (I think JJ
>has been investigating this?)
> 
> In order to interoperate we'll need to maintain a protocol
> specification.  Maybe you and JJ could put that together and CC the vfio,
> rust-vmm, and QEMU communities for discussion?

Sure, I can start by drafting a design doc and share it.

> It should cover the UNIX domain socket connection semantics (does a
> listen socket only accept 1 connection at a time?  What happens when the
> client disconnects?  What happens when the server disconnects?), how
> VFIO structs are exchanged, any vfio-over-socket specific protocol
> messages, etc.  Basically everything needed to write an implementation
> (although it's not necessary to copy the VFIO struct definitions from
> the kernel headers into the spec or even document their semantics if
> they are identical to kernel VFIO).
> 
> The next step beyond the LD_PRELOAD library is a native vfio-over-socket
> client implementation in QEMU.  There is a prototype here:
> https://github.com/elmarco/qemu/blob/wip/vfio-user/hw/vfio/libvfio-
> user.c
> 
> If there are any volunteers for working on that then this would be a
> good time to discuss it.
> 
> Finally, has anyone looked at CrosVM's out-of-process device model?  I
> wonder if it has any features we should consider...
> 
> Looks like a great start to vfio-over-socket!



[PATCH 2/2] ppc/xive: Add support for PQ state bits offload

2020-04-01 Thread Cédric Le Goater
The trigger message coming from a HW source contains a special bit
informing the XIVE interrupt controller that the PQ bits have been
checked at the source or not. Depending on the value, the IC can
perform the check and the state transition locally using its own PQ
state bits.

The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the
PowerNV machine; sPAPR is not concerned by such a complex configuration.

Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/xive.h  |  8 +--
 hw/intc/pnv_xive.c | 40 +++
 hw/intc/xive.c | 48 --
 hw/pci-host/pnv_phb4.c |  9 ++--
 hw/ppc/pnv_psi.c   |  8 +--
 5 files changed, 96 insertions(+), 17 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index d4e7c1f9217f..6184b713e1a6 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -160,7 +160,7 @@ typedef struct XiveNotifier XiveNotifier;
 
 typedef struct XiveNotifierClass {
 InterfaceClass parent;
-void (*notify)(XiveNotifier *xn, uint32_t lisn);
+void (*notify)(XiveNotifier *xn, uint32_t lisn, bool pq_checked);
 } XiveNotifierClass;
 
 /*
@@ -354,6 +354,10 @@ typedef struct XiveRouterClass {
 /* XIVE table accessors */
 int (*get_eas)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
XiveEAS *eas);
+int (*get_pq)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+  uint8_t *pq);
+int (*set_pq)(XiveRouter *xrtr, uint8_t eas_blk, uint32_t eas_idx,
+  uint8_t *pq);
 int (*get_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
XiveEND *end);
 int (*write_end)(XiveRouter *xrtr, uint8_t end_blk, uint32_t end_idx,
@@ -375,7 +379,7 @@ int xive_router_get_nvt(XiveRouter *xrtr, uint8_t nvt_blk, 
uint32_t nvt_idx,
 XiveNVT *nvt);
 int xive_router_write_nvt(XiveRouter *xrtr, uint8_t nvt_blk, uint32_t nvt_idx,
   XiveNVT *nvt, uint8_t word_number);
-void xive_router_notify(XiveNotifier *xn, uint32_t lisn);
+void xive_router_notify(XiveNotifier *xn, uint32_t lisn, bool pq_checked);
 
 /*
  * XIVE Presenter
diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index 13361cb688b2..36eaa08f93d5 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -373,6 +373,34 @@ static int pnv_xive_get_eas(XiveRouter *xrtr, uint8_t blk, 
uint32_t idx,
 return pnv_xive_vst_read(xive, VST_TSEL_IVT, blk, idx, eas);
 }
 
+static int pnv_xive_get_pq(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
+   uint8_t *pq)
+{
+PnvXive *xive = PNV_XIVE(xrtr);
+
+if (pnv_xive_block_id(xive) != blk) {
+xive_error(xive, "VST: EAS %x is remote !?", XIVE_EAS(blk, idx));
+return -1;
+}
+
+*pq = xive_source_esb_get(&xive->ipi_source, idx);
+return 0;
+}
+
+static int pnv_xive_set_pq(XiveRouter *xrtr, uint8_t blk, uint32_t idx,
+   uint8_t *pq)
+{
+PnvXive *xive = PNV_XIVE(xrtr);
+
+if (pnv_xive_block_id(xive) != blk) {
+xive_error(xive, "VST: EAS %x is remote !?", XIVE_EAS(blk, idx));
+return -1;
+}
+
+*pq = xive_source_esb_set(&xive->ipi_source, idx, *pq);
+return 0;
+}
+
 /*
  * One bit per thread id. The first register PC_THREAD_EN_REG0 covers
  * the first cores 0-15 (normal) of the chip or 0-7 (fused). The
@@ -469,12 +497,12 @@ static PnvXive *pnv_xive_tm_get_xive(PowerPCCPU *cpu)
  * event notification to the Router. This is required on a multichip
  * system.
  */
-static void pnv_xive_notify(XiveNotifier *xn, uint32_t srcno)
+static void pnv_xive_notify(XiveNotifier *xn, uint32_t srcno, bool pq_checked)
 {
 PnvXive *xive = PNV_XIVE(xn);
 uint8_t blk = pnv_xive_block_id(xive);
 
-xive_router_notify(xn, XIVE_EAS(blk, srcno));
+xive_router_notify(xn, XIVE_EAS(blk, srcno), pq_checked);
 }
 
 /*
@@ -1316,7 +1344,8 @@ static void pnv_xive_ic_hw_trigger(PnvXive *xive, hwaddr 
addr, uint64_t val)
 blk = XIVE_EAS_BLOCK(val);
 idx = XIVE_EAS_INDEX(val);
 
-xive_router_notify(XIVE_NOTIFIER(xive), XIVE_EAS(blk, idx));
+xive_router_notify(XIVE_NOTIFIER(xive), XIVE_EAS(blk, idx),
+   !!(val & XIVE_TRIGGER_PQ));
 }
 
 static void pnv_xive_ic_notify_write(void *opaque, hwaddr addr, uint64_t val,
@@ -1939,6 +1968,8 @@ static void pnv_xive_class_init(ObjectClass *klass, void 
*data)
 device_class_set_props(dc, pnv_xive_properties);
 
 xrc->get_eas = pnv_xive_get_eas;
+xrc->get_pq = pnv_xive_get_pq;
+xrc->set_pq = pnv_xive_set_pq;
 xrc->get_end = pnv_xive_get_end;
 xrc->write_end = pnv_xive_write_end;
 xrc->get_nvt = pnv_xive_get_nvt;
@@ -1967,7 +1998,8 @@ static const TypeInfo pnv_xive_info = {
  *
  * Trigger all threads 0
  */
-static void pnv_xive_lsi_notify(XiveNotifier *xn, uint32_t srcno)
+static void 

[PATCH 0/2] ppc/xive: Add support for PQ state bits offload

2020-04-01 Thread Cédric Le Goater
Hello,

When the XIVE router unit receives a trigger message coming from a HW
source, it contains a special bit informing the XIVE interrupt
controller that the PQ bits have been checked at the source or
not. Depending on the value, the IC can perform the check and the
state transition locally using its own PQ state bits.

The following changes add new accessors to the XiveRouter required to
query and update the PQ state bits. This only applies to the
PowerNV machine; sPAPR is not concerned by such a complex configuration.
We will use it for upcoming features offloading event coalescing on
the interrupt controller.

Thanks,

C.

Cédric Le Goater (2):
  ppc/xive: export PQ routines
  ppc/xive: Add support for PQ state bits offload

 include/hw/ppc/xive.h| 12 +++--
 hw/intc/pnv_xive.c   | 40 ++---
 hw/intc/spapr_xive_kvm.c |  8 +++---
 hw/intc/xive.c   | 54 
 hw/pci-host/pnv_phb4.c   |  9 +--
 hw/ppc/pnv_psi.c |  8 --
 6 files changed, 107 insertions(+), 24 deletions(-)

-- 
2.21.1




[PATCH 1/2] ppc/xive: export PQ routines

2020-04-01 Thread Cédric Le Goater
Signed-off-by: Cédric Le Goater 
---
 include/hw/ppc/xive.h| 4 
 hw/intc/spapr_xive_kvm.c | 8 
 hw/intc/xive.c   | 6 +++---
 3 files changed, 11 insertions(+), 7 deletions(-)

diff --git a/include/hw/ppc/xive.h b/include/hw/ppc/xive.h
index 59ac075db080..d4e7c1f9217f 100644
--- a/include/hw/ppc/xive.h
+++ b/include/hw/ppc/xive.h
@@ -255,6 +255,10 @@ static inline hwaddr xive_source_esb_mgmt(XiveSource 
*xsrc, int srcno)
 #define XIVE_ESB_QUEUED   (XIVE_ESB_VAL_P | XIVE_ESB_VAL_Q)
 #define XIVE_ESB_OFF  XIVE_ESB_VAL_Q
 
+bool xive_esb_trigger(uint8_t *pq);
+bool xive_esb_eoi(uint8_t *pq);
+uint8_t xive_esb_set(uint8_t *pq, uint8_t value);
+
 /*
  * "magic" Event State Buffer (ESB) MMIO offsets.
  *
diff --git a/hw/intc/spapr_xive_kvm.c b/hw/intc/spapr_xive_kvm.c
index edb7ee0e74f1..43f4d56b958c 100644
--- a/hw/intc/spapr_xive_kvm.c
+++ b/hw/intc/spapr_xive_kvm.c
@@ -308,7 +308,7 @@ static uint8_t xive_esb_read(XiveSource *xsrc, int srcno, 
uint32_t offset)
 return xive_esb_rw(xsrc, srcno, offset, 0, 0) & 0x3;
 }
 
-static void xive_esb_trigger(XiveSource *xsrc, int srcno)
+static void kvmppc_xive_esb_trigger(XiveSource *xsrc, int srcno)
 {
 uint64_t *addr = xsrc->esb_mmap + xive_source_esb_page(xsrc, srcno);
 
@@ -331,7 +331,7 @@ uint64_t kvmppc_xive_esb_rw(XiveSource *xsrc, int srcno, 
uint32_t offset,
 offset == XIVE_ESB_LOAD_EOI) {
 xive_esb_read(xsrc, srcno, XIVE_ESB_SET_PQ_00);
 if (xsrc->status[srcno] & XIVE_STATUS_ASSERTED) {
-xive_esb_trigger(xsrc, srcno);
+kvmppc_xive_esb_trigger(xsrc, srcno);
 }
 return 0;
 } else {
@@ -375,7 +375,7 @@ void kvmppc_xive_source_set_irq(void *opaque, int srcno, 
int val)
 }
 }
 
-xive_esb_trigger(xsrc, srcno);
+kvmppc_xive_esb_trigger(xsrc, srcno);
 }
 
 /*
@@ -544,7 +544,7 @@ static void kvmppc_xive_change_state_handler(void *opaque, 
int running,
  * generate a trigger.
  */
 if (pq == XIVE_ESB_RESET && old_pq == XIVE_ESB_QUEUED) {
-xive_esb_trigger(xsrc, i);
+kvmppc_xive_esb_trigger(xsrc, i);
 }
 }
 
diff --git a/hw/intc/xive.c b/hw/intc/xive.c
index b9656cd7556c..56ce3ed93e29 100644
--- a/hw/intc/xive.c
+++ b/hw/intc/xive.c
@@ -811,7 +811,7 @@ void xive_tctx_destroy(XiveTCTX *tctx)
  * XIVE ESB helpers
  */
 
-static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
+uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
 {
 uint8_t old_pq = *pq & 0x3;
 
@@ -821,7 +821,7 @@ static uint8_t xive_esb_set(uint8_t *pq, uint8_t value)
 return old_pq;
 }
 
-static bool xive_esb_trigger(uint8_t *pq)
+bool xive_esb_trigger(uint8_t *pq)
 {
 uint8_t old_pq = *pq & 0x3;
 
@@ -841,7 +841,7 @@ static bool xive_esb_trigger(uint8_t *pq)
 }
 }
 
-static bool xive_esb_eoi(uint8_t *pq)
+bool xive_esb_eoi(uint8_t *pq)
 {
 uint8_t old_pq = *pq & 0x3;
 
-- 
2.21.1




Re: [PATCH] Compress lines for immediate return

2020-04-01 Thread Eric Blake

On 4/1/20 9:49 AM, Simran Singhal wrote:

Hello Philippe

On Wed, Apr 1, 2020 at 7:26 PM Philippe Mathieu-Daudé 
wrote:


Hi Simran,

On 4/1/20 2:11 PM, Simran Singhal wrote:

Compress two lines into a single line if immediate return statement is

found.

How did you find these changes? Manual audit, some tool?



I wrote coccinelle script to do these changes.



Then is it worth checking in your script to scripts/coccinelle/ to let 
it be something we can repeatedly rerun in the future to catch more 
instances?  Even if you don't go that far, mentioning the exact rune you 
used makes it easier to reproduce the patch, or even backport its 
effects to a different branch.
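For readers who haven't seen the patch body, the shape of the change is
roughly this (names invented for illustration):

    static int frob_compute(void);   /* made-up helper */

    /* Before: a temporary used only for the immediate return. */
    static int frob_before(void)
    {
        int ret;

        ret = frob_compute();
        return ret;
    }

    /* After: the assignment and the return are compressed into one line. */
    static int frob_after(void)
    {
        return frob_compute();
    }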


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v16 3/4] qcow2: add zstd cluster compression

2020-04-01 Thread Vladimir Sementsov-Ogievskiy

01.04.2020 17:37, Denis Plotnikov wrote:

zstd significantly reduces cluster compression time.
It provides better compression performance maintaining
the same level of the compression ratio in comparison with
zlib, which, at the moment, is the only compression
method available.

The performance test results:
The test compresses and decompresses a qemu qcow2 image with a freshly
installed rhel-7.6 guest.
Image cluster size: 64K. Image on disk size: 2.2G

The test was conducted with a brd disk to reduce the influence
of the disk subsystem on the test results.
The results are given in seconds.

compress cmd:
   time ./qemu-img convert -O qcow2 -c -o compression_type=[zlib|zstd]
   src.img [zlib|zstd]_compressed.img
decompress cmd
   time ./qemu-img convert -O qcow2
   [zlib|zstd]_compressed.img uncompressed.img

              compression            decompression
           zlib      zstd          zlib      zstd

real       65.5      16.3 (-75 %)   1.9      1.6 (-16 %)
user       65.0      15.8           5.3      2.5
sys         3.3       0.2           2.0      2.0

Both ZLIB and ZSTD gave the same compression ratio: 1.57
compressed image size in both cases: 1.4G

Signed-off-by: Denis Plotnikov 
QAPI part:
Acked-by: Markus Armbruster 
---
  docs/interop/qcow2.txt |   1 +
  configure  |   2 +-
  qapi/block-core.json   |   3 +-
  block/qcow2-threads.c  | 163 +
  block/qcow2.c  |   7 ++
  5 files changed, 174 insertions(+), 2 deletions(-)

diff --git a/docs/interop/qcow2.txt b/docs/interop/qcow2.txt
index 640e0eca40..18a77f737e 100644
--- a/docs/interop/qcow2.txt
+++ b/docs/interop/qcow2.txt
@@ -209,6 +209,7 @@ version 2.
  
  Available compression type values:

  0: zlib 
+1: zstd 
  
  
  === Header padding ===

diff --git a/configure b/configure
index e225a1e3ff..fdc991b010 100755
--- a/configure
+++ b/configure
@@ -1861,7 +1861,7 @@ disabled with --disable-FEATURE, default is enabled if 
available:
lzfse   support of lzfse compression library
(for reading lzfse-compressed dmg images)
zstdsupport for zstd compression library
-  (for migration compression)
+  (for migration compression and qcow2 cluster compression)
seccomp seccomp support
coroutine-pool  coroutine freelist (better performance)
glusterfs   GlusterFS backend
diff --git a/qapi/block-core.json b/qapi/block-core.json
index 1522e2983f..6fbacddab2 100644
--- a/qapi/block-core.json
+++ b/qapi/block-core.json
@@ -4293,11 +4293,12 @@
  # Compression type used in qcow2 image file
  #
  # @zlib: zlib compression, see 
+# @zstd: zstd compression, see 
  #
  # Since: 5.1
  ##
  { 'enum': 'Qcow2CompressionType',
-  'data': [ 'zlib' ] }
+  'data': [ 'zlib', { 'name': 'zstd', 'if': 'defined(CONFIG_ZSTD)' } ] }
  
  ##

  # @BlockdevCreateOptionsQcow2:
diff --git a/block/qcow2-threads.c b/block/qcow2-threads.c
index 7dbaf53489..aa133204f0 100644
--- a/block/qcow2-threads.c
+++ b/block/qcow2-threads.c
@@ -28,6 +28,11 @@
  #define ZLIB_CONST
  #include <zlib.h>
  
+#ifdef CONFIG_ZSTD

+#include <zstd.h>
+#include <zstd_errors.h>
+#endif
+
  #include "qcow2.h"
  #include "block/thread-pool.h"
  #include "crypto.h"
@@ -166,6 +171,154 @@ static ssize_t qcow2_zlib_decompress(void *dest, size_t 
dest_size,
  return ret;
  }
  
+#ifdef CONFIG_ZSTD

+
+/*
+ * qcow2_zstd_compress()
+ *
+ * Compress @src_size bytes of data using zstd compression method
+ *
+ * @dest - destination buffer, @dest_size bytes
+ * @src - source buffer, @src_size bytes
+ *
+ * Returns: compressed size on success
+ *  -ENOMEM destination buffer is not enough to store compressed data
+ *  -EIOon any other error
+ */
+static ssize_t qcow2_zstd_compress(void *dest, size_t dest_size,
+   const void *src, size_t src_size)
+{
+ssize_t ret;
+ZSTD_outBuffer output = { dest, dest_size, 0 };
+ZSTD_inBuffer input = { src, src_size, 0 };
+ZSTD_CCtx *cctx = ZSTD_createCCtx();
+
+if (!cctx) {
+return -EIO;
+}
+/*
+ * Use the zstd streamed interface for symmetry with decompression,
+ * where streaming is essential since we don't record the exact
+ * compressed size.
+ *
+ * In the loop, we try to compress all the data into one zstd frame.
+ * ZSTD_compressStream2 potentially can finish a frame earlier
+ * than the full input data is consumed. That's why we are looping
+ * until all the input data is consumed.
+ */
+while (input.pos < input.size) {
+size_t zstd_ret = 0;


dead assignment


+/*
+ * ZSTD spec: "You must continue calling ZSTD_compressStream2()
+

Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Markus Armbruster
Daniel P. Berrangé  writes:

> On Wed, Apr 01, 2020 at 11:02:11AM +0200, Markus Armbruster wrote:
>> QEMU's Error was patterned after GLib's GError.  Differences include:
>> 
>> * &error_fatal, &error_abort for convenience
>
> I think this doesn't really need to exist, and is an artifact
> of the later point "return values" where we commonly make methds
> return void.  If we adopted a non-void return value, then these
> are no longer so compelling.
>
> Consider if we didn't have &error_fatal right now, then we would
> need to
>
>    Error *local_err = NULL;
>    qemu_boot_set(boot_once, &local_err)
>    if (local_err)
>        abort();
>
> This is tedious, so we invented &error_abort to make our lives
> better
>
>    qemu_boot_set(boot_once, &error_abort)
>
>
> If we had a "bool" return value though, we would probably have just
> ended up doing:
>
>assert(qemu_boot_set(boot_once, NULL));

This assumes !defined(NDEBUG).

> or
>
>if (!qemu_boot_set(boot_once, NULL))
>abort()
>
> and would never have invented &error_fatal.

Yes, the readability improvement of &error_abort over this is only
marginal.

However, &error_abort also results in more useful stack backtraces, as
Vladimir already pointed out.

Our use of error_propagate() sabotages this advantage.  Vladimir's auto
propagation work stops that.

>> * Distinguishing different errors
>> 
>>   Where Error has ErrorClass, GError has Gquark domain, gint code.  Use
>>   of ErrorClass other than ERROR_CLASS_GENERIC_ERROR is strongly
>>   discouraged.  When we need callers to distinguish errors, we return
>>   suitable error codes separately.
>
> The GQuark is just a static string, and in most cases this ends up being
> defined per-file, or sometimes per functional group. So essentially you
> can consider it to approximately a source file in most cases. The code
> is a constant of some arbitrary type that is generally considered to be
> scoped within the context of the GQuark domain.
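(For comparison, the GError idiom described above looks roughly like this;
the domain and code below are invented, purely for illustration:)

    #include <glib.h>

    #define FROB_ERROR frob_error_quark()

    static GQuark frob_error_quark(void)
    {
        return g_quark_from_static_string("frob-error-quark");
    }

    enum { FROB_ERROR_NOT_FOUND = 1 };

    static gboolean frobnicate(const char *name, GError **errp)
    {
        if (!name) {
            g_set_error(errp, FROB_ERROR, FROB_ERROR_NOT_FOUND,
                        "device not found");
            return FALSE;
        }
        return TRUE;
    }

A caller that needs to distinguish errors can then use
g_error_matches(err, FROB_ERROR, FROB_ERROR_NOT_FOUND) instead of
string-matching the message.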
>
>> * Return value conventions
>> 
>>   Common: non-void functions return a distinct error value on failure
>>   when such a value can be defined.  Patterns:
>> 
>>   - Functions returning non-null pointers on success return null pointer
>> on failure.
>> 
>>   - Functions returning non-negative integers on success return a
>> negative error code on failure.
>> 
>>   Different: GLib discourages void functions, because these lead to
>>   awkward error checking code.  We have tons of them, and tons of
>>   awkward error checking code:
>> 
>> Error *err = NULL;
>> frobnicate(arg, &err);
>> if (err) {
>> ... recover ...
>> error_propagate(errp, err);
>> }
>
> Yeah, I really dislike this verbose style...
>
>> 
>>   instead of
>> 
>> if (!frobnicate(arg, errp))
>> ... recover ...
>> }
>
> ...so I've followed this style for any code I've written in QEMU
> where possible.
>
>> 
>>   Can also lead to pointless creation of Error objects.
>> 
>>   I consider this a design mistake.  Can we still fix it?  We have more
>>   than 2000 void functions taking an Error ** parameter...
>
> Even if we don't do full conversion, we can at least encourage the
> simpler style - previously reviewers have told me to rewrite code
> to use the more verbose style, which I resisted. So at the very
> least setting the expectations for preferred style is useful.

It's a matter of patching the big comment in error.h.

Of course, the non-preferred style will still be copied from bad
examples until we get rid of them.

>>   Transforming code that receives and checks for errors with Coccinelle
>>   shouldn't be hard.  Transforming code that returns errors seems more
>>   difficult.  We need to transform explicit and implicit return to
>>   either return true or return false, depending on what we did to the
>>   @errp parameter on the way to the return.  Hmm.
>
> Even if we only converted methods which are currently void, that
> would be a notable benefit I think.

Manual conversion of a modest set of frequently used functions with
automated conversion of its calls should be feasible.

For more, I believe we need to figure out how to automatically transform
code that returns errors.
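To make that concrete, the conversion being discussed is of this shape
(frobnicate() is a made-up example, not an existing function):

    #include <stdbool.h>
    #include "qapi/error.h"

    /* Before: void, callers have to test the Error object to detect failure. */
    static void frobnicate_old(int arg, Error **errp)
    {
        if (arg < 0) {
            error_setg(errp, "arg must not be negative");
            return;
        }
        /* ... do the work ... */
    }

    /* After: return false on failure, true on success. */
    static bool frobnicate_new(int arg, Error **errp)
    {
        if (arg < 0) {
            error_setg(errp, "arg must not be negative");
            return false;
        }
        /* ... do the work ... */
        return true;
    }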

> It is a shame we didn't just use GError from the start, but I guess
> its probably too late to consider changing that now.

If I remember correctly, error.h predates our adoption of GLib.  Water
under the bridge now.




[PATCH] ppc/pnv: Introduce common PNV_SETFIELD() and PNV_GETFIELD() macros

2020-04-01 Thread Cédric Le Goater
Most of the QEMU definitions of the register fields of the PowerNV machine
come from skiboot, and the models duplicate a set of macros for this
purpose. Make them common under the pnv_utils.h file.

Signed-off-by: Cédric Le Goater 
---
 include/hw/pci-host/pnv_phb3_regs.h | 16 --
 include/hw/ppc/pnv_utils.h  | 29 +++
 hw/intc/pnv_xive.c  | 76 -
 hw/pci-host/pnv_phb3.c  | 32 ++--
 hw/pci-host/pnv_phb3_msi.c  | 24 -
 hw/pci-host/pnv_phb4.c  | 51 ---
 6 files changed, 108 insertions(+), 120 deletions(-)
 create mode 100644 include/hw/ppc/pnv_utils.h

diff --git a/include/hw/pci-host/pnv_phb3_regs.h 
b/include/hw/pci-host/pnv_phb3_regs.h
index a174ef1f7045..38f8ce9d7406 100644
--- a/include/hw/pci-host/pnv_phb3_regs.h
+++ b/include/hw/pci-host/pnv_phb3_regs.h
@@ -12,22 +12,6 @@
 
 #include "qemu/host-utils.h"
 
-/*
- * QEMU version of the GETFIELD/SETFIELD macros
- *
- * These are common with the PnvXive model.
- */
-static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
-{
-return (word & mask) >> ctz64(mask);
-}
-
-static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
-uint64_t value)
-{
-return (word & ~mask) | ((value << ctz64(mask)) & mask);
-}
-
 /*
  * PBCQ XSCOM registers
  */
diff --git a/include/hw/ppc/pnv_utils.h b/include/hw/ppc/pnv_utils.h
new file mode 100644
index ..8521e13b5149
--- /dev/null
+++ b/include/hw/ppc/pnv_utils.h
@@ -0,0 +1,29 @@
+/*
+ * QEMU PowerPC PowerNV utilities
+ *
+ * Copyright (c) 2020, IBM Corporation.
+ *
+ * This code is licensed under the GPL version 2 or later. See the
+ * COPYING file in the top-level directory.
+ */
+
+#ifndef PPC_PNV_UTILS_H
+#define PPC_PNV_UTILS_H
+
+/*
+ * QEMU version of the GETFIELD/SETFIELD macros used in skiboot to
+ * define the register fields.
+ */
+
+static inline uint64_t PNV_GETFIELD(uint64_t mask, uint64_t word)
+{
+return (word & mask) >> ctz64(mask);
+}
+
+static inline uint64_t PNV_SETFIELD(uint64_t mask, uint64_t word,
+uint64_t value)
+{
+return (word & ~mask) | ((value << ctz64(mask)) & mask);
+}
+
+#endif /* PPC_PNV_UTILS_H */
diff --git a/hw/intc/pnv_xive.c b/hw/intc/pnv_xive.c
index aeda488bd113..77cacdd6c623 100644
--- a/hw/intc/pnv_xive.c
+++ b/hw/intc/pnv_xive.c
@@ -21,6 +21,7 @@
 #include "hw/ppc/pnv_core.h"
 #include "hw/ppc/pnv_xscom.h"
 #include "hw/ppc/pnv_xive.h"
+#include "hw/ppc/pnv_utils.h" /* SETFIELD() and GETFIELD() macros */
 #include "hw/ppc/xive_regs.h"
 #include "hw/qdev-properties.h"
 #include "hw/ppc/ppc.h"
@@ -65,26 +66,6 @@ static const XiveVstInfo vst_infos[] = {
 qemu_log_mask(LOG_GUEST_ERROR, "XIVE[%x] - " fmt "\n",  \
   (xive)->chip->chip_id, ## __VA_ARGS__);
 
-/*
- * QEMU version of the GETFIELD/SETFIELD macros
- *
- * TODO: It might be better to use the existing extract64() and
- * deposit64() but this means that all the register definitions will
- * change and become incompatible with the ones found in skiboot.
- *
- * Keep it as it is for now until we find a common ground.
- */
-static inline uint64_t GETFIELD(uint64_t mask, uint64_t word)
-{
-return (word & mask) >> ctz64(mask);
-}
-
-static inline uint64_t SETFIELD(uint64_t mask, uint64_t word,
-uint64_t value)
-{
-return (word & ~mask) | ((value << ctz64(mask)) & mask);
-}
-
 /*
  * When PC_TCTXT_CHIPID_OVERRIDE is configured, the PC_TCTXT_CHIPID
  * field overrides the hardwired chip ID in the Powerbus operations
@@ -96,7 +77,7 @@ static uint8_t pnv_xive_block_id(PnvXive *xive)
 uint64_t cfg_val = xive->regs[PC_TCTXT_CFG >> 3];
 
 if (cfg_val & PC_TCTXT_CHIPID_OVERRIDE) {
-blk = GETFIELD(PC_TCTXT_CHIPID, cfg_val);
+blk = PNV_GETFIELD(PC_TCTXT_CHIPID, cfg_val);
 }
 
 return blk;
@@ -145,7 +126,7 @@ static uint64_t pnv_xive_vst_addr_direct(PnvXive *xive, 
uint32_t type,
 {
 const XiveVstInfo *info = &vst_infos[type];
 uint64_t vst_addr = vsd & VSD_ADDRESS_MASK;
-uint64_t vst_tsize = 1ull << (GETFIELD(VSD_TSIZE, vsd) + 12);
+uint64_t vst_tsize = 1ull << (PNV_GETFIELD(VSD_TSIZE, vsd) + 12);
 uint32_t idx_max;
 
 idx_max = vst_tsize / info->size - 1;
@@ -180,7 +161,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, 
uint32_t type,
 return 0;
 }
 
-page_shift = GETFIELD(VSD_TSIZE, vsd) + 12;
+page_shift = PNV_GETFIELD(VSD_TSIZE, vsd) + 12;
 
 if (!pnv_xive_vst_page_size_allowed(page_shift)) {
 xive_error(xive, "VST: invalid %s page shift %d", info->name,
@@ -207,7 +188,7 @@ static uint64_t pnv_xive_vst_addr_indirect(PnvXive *xive, 
uint32_t type,
  * Check that the pages have a consistent size across the
  * indirect table
  */
-if (page_shift != GETFIELD(VSD_TSIZE, vsd) + 12) {
+if (page_shift != 

[Bug 1847525] Re: qemu-system-i386 eats a lot of cpu after just few hours, with sdl, gl=on

2020-04-01 Thread Alex Bennée
** Tags added: tcg

** Tags removed: tcg
** Tags added: kvm

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1847525

Title:
  qemu-system-i386 eats a lot of cpu after just few hours,  with
  sdl,gl=on

Status in QEMU:
  New

Bug description:
  I already sent this email to qemu-disc...@nongnu.org, but I can't see
  it arriving in the archives, so here is a copy.

  Hello, all!

  I use qemu-system-i386/qemu-system_x86_64 for rebuilding Slax-like live 
cd/dvd.
  Usually guests (with various self-compiled kernels and X stack with kde3 on 
top of them)
  boot up normally, but if I left them to run in GUI mode for few hours - qemu 
process on host
  started to eat more and more cpu for itself - more noticeable if I set host 
cpu to lowest possible
  frequency via trayfreq applet (1400Mhz in my case).

  Boot line a bit complicated, but I really prefer to have sound and usb inside 
VM.
  qemu-system-i386 -cdrom /dev/shm/CDROM-4.4.194_5.iso -m 1.9G -enable-kvm 
-soundhw es1370 -smp 2 -display sdl,gl=on -usb -cpu host -rtc clock=vm

  rtc clock=vm was taken from https://bugs.launchpad.net/qemu/+bug/1174654 but 
apparently not helping.
  After just 3 hours of uptime (copied line from 'top' on host)

  31943 guest 20   0 2412m 791m  38m R   51  6.7  66:36.51 qemu-
  system-i38

  I use Xorg 1.19.7 on host, with mesa git/nouveau as GL driver. But my card 
has not very big amount of VRAM - only 384Mb.
  May be this limitation is playing some role .. but 'end-user' result was 
after 1-2 day of guest uptime I run into completely frozen guest 
  (may be when qemu was hitting 100 one core usage on host some internal timer 
just made guest kernel too upset/froze?
   I was sleeping or doing other things on host  for all this time, with VM 
just supposedly running at another virtual desktop - 
  in KDE3 + built-in compositor )

  I wonder if more mainstream desktop users (on GNOME, Xfce, etc) and/or users 
of other distros (I use self-re-compiled Slackware)
  actually can see same problem?

  qemu-system-i386 --version
  QEMU emulator version 4.1.50 (v4.1.0-1188-gc6f5012ba5-dirty)
  but I saw same behavior for quite some time .. just never reported it in hope 
it will go away.

  cat /proc/cpuinfo
  processor   : 0
  vendor_id   : AuthenticAMD
  cpu family  : 21
  model   : 2
  model name  : AMD FX(tm)-4300 Quad-Core Processor
  stepping: 0
  microcode   : 0x6000852
  cpu MHz : 1399.977
  cache size  : 2048 KB
  physical id : 0
  siblings: 4
  core id : 0
  cpu cores   : 2
  apicid  : 16
  initial apicid  : 0
  fpu : yes
  fpu_exception   : yes
  cpuid level : 13
  wp  : yes
  flags   : fpu vme de pse tsc msr pae mce cx8 apic sep mtrr pge mca 
cmov pat pse36 clflush mmx fxsr sse sse2 ht syscall nx mmxext fxsr_opt pdpe1gb 
rdtscp lm constant_tsc rep_good nopl nonstop_tsc cpuid extd_apicid aperfmperf 
pni pclmulqdq monitor ssse3 fma cx16 sse4_1 sse4_2 popcnt aes xsave avx f16c 
lahf_lm cmp_legacy svm extapic cr8_legacy abm sse4a misalignsse 3dnowprefetch 
osvw ibs xop skinit wdt lwp fma4 tce nodeid_msr tbm topoext perfctr_core 
perfctr_nb cpb hw_pstate ssbd vmmcall bmi1 arat npt lbrv svm_lock nrip_save 
tsc_scale vmcb_clean flushbyasid decodeassists pausefilter pfthreshold
  bugs: fxsave_leak sysret_ss_attrs null_seg spectre_v1 spectre_v2 
spec_store_bypass
  bogomips: 7600.06
  TLB size: 1536 4K pages
  clflush size: 64
  cache_alignment : 64
  address sizes   : 48 bits physical, 48 bits virtual
  power management: ts ttp tm 100mhzsteps hwpstate cpb eff_freq_ro

  [and 3x more of the same, for 3 remaining cores]

  Gcc is Slackware 14.2's gcc 5.5.0, but I saw this with 4.9.2 too.
  This might be a 32-bit host problem. But maybe no-one has tried to run qemu 
with a GUI guest for literally days?

  Host kernel is
   uname -a
  Linux slax 5.1.12-x64 #1 SMP PREEMPT Wed Jun 19 12:31:05 MSK 2019 x86_64 AMD 
FX(tm)-4300 Quad-Core Processor AuthenticAMD GNU/Linux

  I was trying newish 5.3.2 but my compilation was not as stable as this one 
  (I tend to change few things, like max cpu count, preemption mode, numa 
support  
  for more distribution-like, yet most stable  and performant for me kernel)

  Kernel world is moving fast, so I'll try to recompile new 5.3.x too
  

  
  I guess I  should provide perf/profiler output, but for  this I need to 
recompile qemu. 
  I'll try to come back with more details soon.

  Thanks for your attention and possible feedback!

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1847525/+subscriptions



Re: Question about scsi device hotplug (e.g scsi-hd)

2020-04-01 Thread Stefan Hajnoczi
On Tue, Mar 31, 2020 at 07:16:23PM +0300, Maxim Levitsky wrote:
> Hi!
> 
> I recently investigated an interesting issue related to repeated scsi-hd 
> hotplug/hotunplug.
> The bugzilla for it is here:
> https://bugzilla.redhat.com/show_bug.cgi?id=1812399
> 
> In nutshell the issue that I think that I found and I would like to ask your 
> opinion on it,
> since I don't have enough experience to be 100% sure that I didn't miss 
> something  is this:
> 
> When a new device is hotplugged via monitor, the qdev_device_add first puts 
> the device on
> the bus where the user requested it to be, and then calls the device's 
> .realize.
> 
> However for scsi bus, each time a new request is sent from the guest, the 
> scsi adapter drivers
> (e.g virtio-scsi) call scsi_device_find to find the LUN's driver to dispatch 
> the request to,
> and scsi_device_find will return the added device as soon as it is placed on 
> the bus.
> 
> Thus between the point when the new device is placed on the bus and until the 
> end of the .realize,
> the device can be accessed by the guest when it is not yet realized or 
> partially realized as
> happens in the bugreport.
> 
> What do you think about it?

Maybe aio_disable_external() is needed to postpone device emulation
until after realize has finished?

Virtqueue kick ioeventfds are marked "external" and won't be processed
while external events are disabled.  See also
virtio_queue_aio_set_host_notifier_handler() ->
aio_set_event_notifier().
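
A minimal sketch of that idea, with placeholder names for the hotplug path
(this is not what the SCSI code does today):

    #include "qemu/osdep.h"
    #include "block/aio.h"
    #include "sysemu/block-backend.h"

    static void hotplug_with_external_events_paused(BlockBackend *blk)
    {
        AioContext *ctx = blk_get_aio_context(blk);

        aio_disable_external(ctx);  /* virtqueue kick ioeventfds not serviced */
        /* ... put the device on the bus and run its realize ... */
        aio_enable_external(ctx);   /* resume "external" event processing */
    }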

Stefan


signature.asc
Description: PGP signature


[PATCH v2 3/6] block: add max_pwrite_zeroes_fast to BlockLimits

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
The NBD spec was recently updated to clarify that max_block doesn't
relate to NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which
mirrors Qemu flag BDRV_REQ_NO_FALLBACK). To drop the restriction we
need a new limit, max_pwrite_zeroes_fast.

The default value of the new max_pwrite_zeroes_fast is zero, which means
"use max_pwrite_zeroes".  So this commit semantically changes nothing.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 include/block/block_int.h |  8 
 block/io.c| 17 -
 2 files changed, 20 insertions(+), 5 deletions(-)

diff --git a/include/block/block_int.h b/include/block/block_int.h
index 4c3587ea19..ea1018d598 100644
--- a/include/block/block_int.h
+++ b/include/block/block_int.h
@@ -624,6 +624,14 @@ typedef struct BlockLimits {
  * pwrite_zeroes_alignment. May be 0 if no inherent 32-bit limit */
 int32_t max_pwrite_zeroes;
 
+/*
+ * Maximum number of bytes that can zeroed at once if flag
+ * BDRV_REQ_NO_FALLBACK specified. Must be multiple of
+ * pwrite_zeroes_alignment.
+ * If 0, max_pwrite_zeroes is used for no-fallback case.
+ */
+int64_t max_pwrite_zeroes_fast;
+
 /* Optimal alignment for write zeroes requests in bytes. A power
  * of 2 is best but not mandatory.  Must be a multiple of
  * bl.request_alignment, and must be less than max_pwrite_zeroes
diff --git a/block/io.c b/block/io.c
index aba67f66b9..07270524a9 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1751,12 +1751,13 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 bool need_flush = false;
 int head = 0;
 int tail = 0;
-
-int max_write_zeroes = MIN_NON_ZERO(bs->bl.max_pwrite_zeroes, INT_MAX);
+int max_write_zeroes;
 int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
 bs->bl.request_alignment);
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
 
+assert(alignment % bs->bl.request_alignment == 0);
+
 if (!drv) {
 return -ENOMEDIUM;
 }
@@ -1765,12 +1766,18 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 return -ENOTSUP;
 }
 
-assert(alignment % bs->bl.request_alignment == 0);
-head = offset % alignment;
-tail = (offset + bytes) % alignment;
+if ((flags & BDRV_REQ_NO_FALLBACK) && bs->bl.max_pwrite_zeroes_fast) {
+max_write_zeroes = bs->bl.max_pwrite_zeroes_fast;
+} else {
+max_write_zeroes = bs->bl.max_pwrite_zeroes;
+}
+max_write_zeroes = MIN_NON_ZERO(max_write_zeroes, INT_MAX);
 max_write_zeroes = QEMU_ALIGN_DOWN(max_write_zeroes, alignment);
 assert(max_write_zeroes >= bs->bl.request_alignment);
 
+head = offset % alignment;
+tail = (offset + bytes) % alignment;
+
 while (bytes > 0 && !ret) {
 int num = bytes;
 
-- 
2.21.0




Re: Questionable aspects of QEMU Error's design

2020-04-01 Thread Markus Armbruster
Vladimir Sementsov-Ogievskiy  writes:

> Side question:
>
> Can we somehow implement a possibility to reliably identify file and line 
> number
> where error is set by error message?
>
> It's where debug of error-bugs always starts: try to imagine which parts of 
> the error
> message are "%s", and how to grep for it in the code, keeping in mind also,
> that the error message may be split into several lines..
>
> Put file:line into each error? Seems too noisy for users.. A lot of errors 
> are not
> bugs: the user does something wrong, sees the error, and understands what
> he is doing wrong.  It's not usual practice to print file:line in every
> message shown to the user.
>
>
> But what if we do some kind of mapping file:line <-> error code, so user will 
> see
> something like:
>
>
>Error 12345: Device drive-scsi0-0-0-0 is not found
>
> 
>
> Hmm, maybe, just add one more argument to error_setg:
>
> error_setg(errp, 12345, "Device %s is not found", device_name);
>
> - it's enough grep-able.

error_setg() already records source file and line number in the Error
object, so that error_handle_fatal(&error_abort, err) can report them.

Making the programmer pick and pass an error ID at every call site is
onerous.  More so when the error ID should be globally unique.

With GLib's domain, code, the code needs only be unique within the
domain, but you still have to define globally unique domains.
Differently onerous.

We could have -msg,debug=on make error_report_err() report the Error
object's source file and line number.  Doesn't help for the many direct
uses of error_report().  To cover those, we'd have to turn
error_report() into a macro, similar to how error_setg() works.




[PATCH v2 6/6] block/io: auto-no-fallback for write-zeroes

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
When BDRV_REQ_NO_FALLBACK is supported, the NBD driver supports a
larger request size.  Add code to try large zero requests with a
NO_FALLBACK request prior to having to split a request into chunks
according to max_pwrite_zeroes.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/io.c | 18 ++
 1 file changed, 18 insertions(+)

diff --git a/block/io.c b/block/io.c
index f8335e7212..425314a221 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1755,6 +1755,7 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 int alignment = MAX(bs->bl.pwrite_zeroes_alignment,
 bs->bl.request_alignment);
 int max_transfer = MIN_NON_ZERO(bs->bl.max_transfer, MAX_BOUNCE_BUFFER);
+bool auto_no_fallback;
 
 assert(alignment % bs->bl.request_alignment == 0);
 
@@ -1762,6 +1763,16 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 return -ENOMEDIUM;
 }
 
+if (!(flags & BDRV_REQ_NO_FALLBACK) &&
+(bs->supported_zero_flags & BDRV_REQ_NO_FALLBACK) &&
+bs->bl.max_pwrite_zeroes && bs->bl.max_pwrite_zeroes < bytes &&
+bs->bl.max_pwrite_zeroes < bs->bl.max_pwrite_zeroes_fast)
+{
+assert(drv->bdrv_co_pwrite_zeroes);
+flags |= BDRV_REQ_NO_FALLBACK;
+auto_no_fallback = true;
+}
+
 if ((flags & ~bs->supported_zero_flags) & BDRV_REQ_NO_FALLBACK) {
 return -ENOTSUP;
 }
@@ -1806,6 +1817,13 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 if (drv->bdrv_co_pwrite_zeroes) {
 ret = drv->bdrv_co_pwrite_zeroes(bs, offset, num,
  flags & bs->supported_zero_flags);
+if (ret == -ENOTSUP && auto_no_fallback) {
+flags &= ~BDRV_REQ_NO_FALLBACK;
+max_write_zeroes =
+QEMU_ALIGN_DOWN(MIN_NON_ZERO(bs->bl.max_pwrite_zeroes,
+ INT_MAX), alignment);
+continue;
+}
 if (ret != -ENOTSUP && (flags & BDRV_REQ_FUA) &&
 !(bs->supported_zero_flags & BDRV_REQ_FUA)) {
 need_flush = true;
-- 
2.21.0




[PATCH v2 2/6] block/nbd-client: drop max_block restriction from discard

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_TRIM. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/block/nbd.c b/block/nbd.c
index d4d518a780..4ac23c8f62 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1955,7 +1955,7 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 }
 
 bs->bl.request_alignment = min;
-bs->bl.max_pdiscard = max;
+bs->bl.max_pdiscard = QEMU_ALIGN_DOWN(INT_MAX, min);
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;
 
-- 
2.21.0




[PATCH v2 5/6] block/io: refactor bdrv_co_do_pwrite_zeroes head calculation

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
It's wrong to update head using num in this place, as num may be
reduced during the iteration (it seems it isn't, but that's not obvious),
and we'd end up with a wrong head value on the next iteration.

Instead, update head at the end of the iteration.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/io.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/block/io.c b/block/io.c
index 07270524a9..f8335e7212 100644
--- a/block/io.c
+++ b/block/io.c
@@ -1790,7 +1790,6 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
  * convenience, limit this request to max_transfer even if
  * we don't need to fall back to writes.  */
 num = MIN(MIN(bytes, max_transfer), alignment - head);
-head = (head + num) % alignment;
 assert(num < max_write_zeroes);
 } else if (tail && num > alignment) {
 /* Shorten the request to the last aligned sector.  */
@@ -1849,6 +1848,9 @@ static int coroutine_fn 
bdrv_co_do_pwrite_zeroes(BlockDriverState *bs,
 
 offset += num;
 bytes -= num;
+if (head) {
+head = offset % alignment;
+}
 }
 
 fail:
-- 
2.21.0




[PATCH v2 1/6] block/nbd-client: drop max_block restriction from block_status

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
NBD spec is updated, so that max_block doesn't relate to
NBD_CMD_BLOCK_STATUS. So, drop the restriction.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Eric Blake 
---
 block/nbd.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/block/nbd.c b/block/nbd.c
index 2160859f64..d4d518a780 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1320,9 +1320,7 @@ static int coroutine_fn nbd_client_co_block_status(
 NBDRequest request = {
 .type = NBD_CMD_BLOCK_STATUS,
 .from = offset,
-.len = MIN(MIN_NON_ZERO(QEMU_ALIGN_DOWN(INT_MAX,
-bs->bl.request_alignment),
-s->info.max_block),
+.len = MIN(QEMU_ALIGN_DOWN(INT_MAX, bs->bl.request_alignment),
MIN(bytes, s->info.size - offset)),
 .flags = NBD_CMD_FLAG_REQ_ONE,
 };
-- 
2.21.0




[PATCH v2 0/6] nbd: reduce max_block restrictions

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
Recent changes in the NBD protocol allow some commands to be used without
the max_block restriction. Let's drop the restrictions.

NBD change is here:
https://github.com/NetworkBlockDevice/nbd/commit/9f30fedb8699f151e7ef4ccc07e624330be3316b#diff-762fb7c670348da69cc9050ef58fe3ae

v2:

01: add Eric's r-b
02: keep INT_MAX limit for discard
03: s/max_pwrite_zeroes_no_fallback/max_pwrite_zeroes_fast/
  (a bit shorter, so that if condition fit in one line)
default is changed to be max_pwrite_zeroes,
  so blkdebug is unchanged and we need one more patch: 04
refactor max_write_zeroes calculation in bdrv_co_do_pwrite_zeroes,
  prepare to last patch
04: new, actual change for nbd driver
05: reword commit message (fix -> refactoring) as I don't see bugs here. Keep 
an r-b
06: rewording, grammar [Eric]
rebase on 03 changes

Vladimir Sementsov-Ogievskiy (6):
  block/nbd-client: drop max_block restriction from block_status
  block/nbd-client: drop max_block restriction from discard
  block: add max_pwrite_zeroes_fast to BlockLimits
  block/nbd: define new max_write_zero_fast limit
  block/io: refactor bdrv_co_do_pwrite_zeroes head calculation
  block/io: auto-no-fallback for write-zeroes

 include/block/block_int.h |  8 
 block/io.c| 39 +--
 block/nbd.c   |  7 +++
 3 files changed, 44 insertions(+), 10 deletions(-)

-- 
2.21.0




[PATCH v2 4/6] block/nbd: define new max_write_zero_fast limit

2020-04-01 Thread Vladimir Sementsov-Ogievskiy
The NBD spec was recently updated to clarify that max_block doesn't
relate to NBD_CMD_WRITE_ZEROES with NBD_CMD_FLAG_FAST_ZERO (which
mirrors Qemu flag BDRV_REQ_NO_FALLBACK).

bs->bl.max_pwrite_zeroes_fast is zero by default, which means
max_pwrite_zeroes is used. Update the nbd driver to allow larger requests with
BDRV_REQ_NO_FALLBACK.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
 block/nbd.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/block/nbd.c b/block/nbd.c
index 4ac23c8f62..b0584cf68d 100644
--- a/block/nbd.c
+++ b/block/nbd.c
@@ -1956,6 +1956,7 @@ static void nbd_refresh_limits(BlockDriverState *bs, 
Error **errp)
 
 bs->bl.request_alignment = min;
 bs->bl.max_pdiscard = QEMU_ALIGN_DOWN(INT_MAX, min);
+bs->bl.max_pwrite_zeroes_fast = bs->bl.max_pdiscard;
 bs->bl.max_pwrite_zeroes = max;
 bs->bl.max_transfer = max;
 
-- 
2.21.0




Re: [PATCH v4] block/nvme: introduce PMR support from NVMe 1.4 spec

2020-04-01 Thread Keith Busch
On Wed, Apr 01, 2020 at 03:50:05PM +0100, Stefan Hajnoczi wrote:
> On Tue, Mar 24, 2020 at 10:05:26AM -0700, Andrzej Jakowski wrote:
> > On 3/23/20 6:28 AM, Stefan Hajnoczi wrote:
> > > Excellent, thank you!
> > > 
> > > Reviewed-by: Stefan Hajnoczi 
> > 
> > Awesome, thx! Not sure about process...
> > Is this patch now staged for inclusion in QEMU?
> 
> Kevin or Max would normally merge it.
> 
> A review or ack from Keith Busch would be great, too.
> 
> Stefan

Yes, I reviewed this patch on this thread, and looks good to me:
https://lists.nongnu.org/archive/html/qemu-devel/2020-03/msg08816.html


