RE: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in packet_enqueue()

2020-03-24 Thread Zhang, Chen


> -Original Message-
> From: Derek Su 
> Sent: Wednesday, March 25, 2020 12:17 PM
> To: Zhang, Chen 
> Cc: qemu-devel@nongnu.org; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; dere...@qnap.com
> Subject: Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> packet_enqueue()
> 
> On Wed, Mar 25, 2020 at 10:05 AM, Jing-Wei Su  wrote:
> >
> > On Wed, Mar 25, 2020 at 9:37 AM, Zhang, Chen  wrote:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Jing-Wei Su 
> > > > Sent: Tuesday, March 24, 2020 10:47 AM
> > > > To: Zhang, Chen 
> > > > Cc: qemu-devel@nongnu.org; lizhij...@cn.fujitsu.com;
> > > > jasow...@redhat.com; dere...@qnap.com
> > > > Subject: Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > > packet_enqueue()
> > > >
> > > > On Tue, Mar 24, 2020 at 3:24 AM, Zhang, Chen  wrote:
> > > > >
> > > > >
> > > > >
> > > > > > -Original Message-
> > > > > > From: Derek Su 
> > > > > > Sent: Monday, March 23, 2020 1:48 AM
> > > > > > To: qemu-devel@nongnu.org
> > > > > > Cc: Zhang, Chen ;
> > > > > > lizhij...@cn.fujitsu.com; jasow...@redhat.com;
> > > > > > dere...@qnap.com
> > > > > > Subject: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > > > > packet_enqueue()
> > > > > >
> > > > > > The patch is to fix the "pkt" memory leak in packet_enqueue().
> > > > > > The allocated "pkt" needs to be freed if the colo compare
> > > > > > primary or secondary queue is too big.
> > > > >
> > > > > Hi Derek,
> > > > >
> > > > > Thank you for the patch.
> > > > > Re-thinking this issue in a broader view, just freeing the packet is
> > > > > not enough in this situation. The root cause is that the network is
> > > > > too busy to compare, so the better choice is to notify the COLO frame
> > > > > to do a checkpoint and clean up all the network queues. This may
> > > > > decrease COLO network performance, but it seems better than dropping
> > > > > lots of packets.
> > > > >
> > > > > Thanks
> > > > > Zhang Chen
> > > > >
> > > >
> > > > Hello, Zhang
> > > >
> > > > Got it.
> > > > What is the concern of the massive "drop packets"?
> > > > Does the behavior make the COLO do checkpoint periodically (~20
> > > > seconds) instead of doing immediate checkpoint when encountering
> > > > different response packets?
> > >
> > > The concern with the "drop packets" is that the guest will lose network
> > > connection with most of the real clients until the next periodic forced
> > > checkpoint. COLO is designed for dynamic checkpoint control, so I think
> > > doing a checkpoint here will help the guest resume service faster.
> > >
> >
> > I see.
> > I'll update the patch with your suggestion later.
> >
> 
> Hi, Zhang
> Here is the idea and pseudo code to handle the "drop packet".
> 
> ```
> ret = packet_enqueue
> (1) ret == 0
>   compare connection
> (2) ret == -1
>   send packet
> (3) ret == queue insertion fail
>   do checkpoint
>   send all queued primary packets
>   remove all queued secondary packets
> ```
> 
> Do you have any comment for the handling?

Looks good to me.

Thanks
Zhang Chen

> 
> Thanks
> Derek
> 
> > > >
> > > > It seems that frequent checkpoints caused by the full queue (busy
> > > > network) instead of different
> > > > response packets may harm the high speed network (10 Gbps or
> > > > higher) performance dramatically.
> > >
> > > Yes, maybe I can send a patch to let the user adjust the queue size
> > > depending on their own environment.
> > > But with a larger queue size, colo-compare will spend much more time
> > > comparing packets when the network is really busy.
> >
> > Thank you. The user-configurable queue size will be very helpful.
> >
> > Thanks.
> > Derek Su
> >
> > >
> > > Thanks
> > > Zhang Chen
> > >
> > > >
> > > > Thanks
> > > > Derek
> > > >
> > > > > >
> > > > > > Signed-off-by: Derek Su 
> > > > > > ---
> > > > > >  net/colo-compare.c | 23 +++
> > > > > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > > > > >
> > > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > > > > > 7ee17f2cf8..cdd87b2aa8 100644
> > > > > > --- a/net/colo-compare.c
> > > > > > +++ b/net/colo-compare.c
> > > > > > @@ -120,6 +120,10 @@ enum {
> > > > > >  SECONDARY_IN,
> > > > > >  };
> > > > > >
> > > > > > +static const char *colo_mode[] = {
> > > > > > +[PRIMARY_IN] = "primary",
> > > > > > +[SECONDARY_IN] = "secondary",
> > > > > > +};
> > > > > >
> > > > > >  static int compare_chr_send(CompareState *s,
> > > > > >  const uint8_t *buf, @@ -215,6
> > > > > > +219,7 @@ static int packet_enqueue(CompareState *s, int mode,
> > > > > > Connection
> > > > **con)
> > > > > >  ConnectionKey key;
> > > > > >  Packet *pkt = NULL;
> > > > > >  Connection *conn;
> > > > > > +int ret;
> > > > > >
> > > > > >  if (mode == PRIMARY_IN) {
> > > > > >  pkt = packet_new(s->pri_rs.buf, @@ -243,16 +248,18 @@
> > > > > > static int packet_enqueue(CompareState *s, int mode,
> > > > > > Connection
> > > > **con)
> > > > > >  }
> > > > 

RE: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Tian, Kevin
> From: Dr. David Alan Gilbert 
> Sent: Wednesday, March 25, 2020 4:23 AM
> 
> * Alex Williamson (alex.william...@redhat.com) wrote:
> > On Mon, 23 Mar 2020 23:01:18 -0400
> > Yan Zhao  wrote:
> >
> > > On Tue, Mar 24, 2020 at 02:51:14AM +0800, Dr. David Alan Gilbert wrote:
> > > > * Alex Williamson (alex.william...@redhat.com) wrote:
> > > > > On Mon, 23 Mar 2020 23:24:37 +0530
> > > > > Kirti Wankhede  wrote:
> > > > >
> > > > > > On 3/21/2020 12:29 AM, Alex Williamson wrote:
> > > > > > > On Sat, 21 Mar 2020 00:12:04 +0530
> > > > > > > Kirti Wankhede  wrote:
> > > > > > >
> > > > > > >> On 3/20/2020 11:31 PM, Alex Williamson wrote:
> > > > > > >>> On Fri, 20 Mar 2020 23:19:14 +0530
> > > > > > >>> Kirti Wankhede  wrote:
> > > > > > >>>
> > > > > >  On 3/20/2020 4:27 AM, Alex Williamson wrote:
> > > > > > > On Fri, 20 Mar 2020 01:46:41 +0530
> > > > > > > Kirti Wankhede  wrote:
> > > > > > >
> > > > > > >>
> > > > > > >> 
> > > > > > >>
> > > > > > >> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
> > > > > > >> +  size_t size, uint64_t pgsize,
> > > > > > >> +  u64 __user *bitmap)
> > > > > > >> +{
> > > > > > >> +struct vfio_dma *dma;
> > > > > > >> +unsigned long pgshift = __ffs(pgsize);
> > > > > > >> +unsigned int npages, bitmap_size;
> > > > > > >> +
> > > > > > >> +dma = vfio_find_dma(iommu, iova, 1);
> > > > > > >> +
> > > > > > >> +if (!dma)
> > > > > > >> +return -EINVAL;
> > > > > > >> +
> > > > > > >> +if (dma->iova != iova || dma->size != size)
> > > > > > >> +return -EINVAL;
> > > > > > >> +
> > > > > > >> +npages = dma->size >> pgshift;
> > > > > > >> +bitmap_size = DIRTY_BITMAP_BYTES(npages);
> > > > > > >> +
> > > > > > >> +/* mark all pages dirty if all pages are pinned and mapped. */
> > > > > > >> +if (dma->iommu_mapped)
> > > > > > >> +bitmap_set(dma->bitmap, 0, npages);
> > > > > > >> +
> > > > > > >> +if (copy_to_user((void __user *)bitmap, dma->bitmap,
> > > > > > >> + bitmap_size))
> > > > > > >> +return -EFAULT;
> > > > > > >
> > > > > > > We still need to reset the bitmap here, clearing and re-adding
> the
> > > > > > > pages that are still pinned.
> > > > > > >
> > > > > > >
> https://lore.kernel.org/kvm/20200319070635.2ff5d...@x1.home/
> > > > > > >
> > > > > > 
> > > > > >  I thought you agreed on my reply to it
> > > > > >  https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-
> f72b671fe...@nvidia.com/
> > > > > > 
> > > > > > > Why re-populate when there will be no change since
> > > > > > > vfio_iova_dirty_bitmap() is called holding iommu->lock? If
> > > > > > > there is any pin request while vfio_iova_dirty_bitmap() is
> > > > > > > still working, it will wait till iommu->lock is released.
> > > > > > > Bitmap will be populated when the page is pinned.
> > > > > > >>>
> > > > > > >>> As coded, dirty bits are only ever set in the bitmap, never
> cleared.
> > > > > > >>> If a page is unpinned between iterations of the user recording
> the
> > > > > > >>> dirty bitmap, it should be marked dirty in the iteration
> immediately
> > > > > > >>> after the unpinning and not marked dirty in the following
> iteration.
> > > > > > >>> That doesn't happen here.  We're reporting cumulative dirty
> pages since
> > > > > > >>> logging was enabled, we need to be reporting dirty pages since
> the user
> > > > > > >>> last retrieved the dirty bitmap.  The bitmap should be cleared
> and
> > > > > > >>> currently pinned pages re-added after copying to the user.
> Thanks,
> > > > > > >>>
> > > > > > >>
> > > > > > >> Does that mean, we have to track every iteration? do we really
> need that
> > > > > > >> tracking?
> > > > > > >>
> > > > > > >> Generally the flow is:
> > > > > > >> - vendor driver pin x pages
> > > > > > >> - Enter pre-copy-phase where vCPUs are running - user starts
> dirty pages
> > > > > > >> tracking, then user asks dirty bitmap, x pages reported dirty by
> > > > > > >> VFIO_IOMMU_DIRTY_PAGES ioctl with _GET flag
> > > > > > >> - In pre-copy phase, vendor driver pins y more pages, now
> bitmap
> > > > > > >> consists of x+y bits set
> > > > > > >> - In pre-copy phase, vendor driver unpins z pages, but bitmap is
> not
> > > > > > >> updated, so again bitmap consists of x+y bits set.
> > > > > > >> - Enter in stop-and-copy phase, vCPUs are stopped, mdev devices
> are stopped
> > > > > > >> - user asks dirty bitmap - Since here vCPU and mdev devices are
> stopped,
> > > > > > >> pages should not get dirty by guest driver or the physical 
> > > > > > >> device.
> > > > > > >> Hence, x+y dirty pages would be reported.
> > > > > > >>
> > > > > > >> I don't think we need to track 

Re: [PATCH 6/6] qga/commands-posix: fix use after free of local_err

2020-03-24 Thread Vladimir Sementsov-Ogievskiy

24.03.2020 23:03, Eric Blake wrote:

On 3/24/20 10:36 AM, Vladimir Sementsov-Ogievskiy wrote:

local_err is used several times in guest_suspend(). Setting non-NULL
local_err will crash, so let's zero it after freeing. Also fix possible
leak of local_err in final if().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qga/commands-posix.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 93474ff770..cc69b82704 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1773,6 +1773,7 @@ static void guest_suspend(SuspendMode mode, Error **errp)
  }
  error_free(local_err);
+    local_err = NULL;


Let's show this with more context.


static void guest_suspend(SuspendMode mode, Error **errp)
{
    Error *local_err = NULL;
    bool mode_supported = false;

    if (systemd_supports_mode(mode, &local_err)) {


Hmm - we have an even earlier bug that needs fixing.  Note that systemd_supports_mode() returns a 
bool AND conditionally sets errp.  But it is inconsistent: it has the following table of actions 
based on the results of run_process_child() on "systemctl status" coupled with the man 
page on "systemctl status" return values:
-1 (unable to run systemctl) -> errp set, return false
0 (unit is active) -> errp left unchanged, return false
1 (unit not failed) -> errp left unchanged, return true
2 (unused) -> errp left unchanged, return true
3 (unit not active) -> errp left unchanged, return true
4 (no such unit) -> errp left unchanged, return false
5+ (unexpected from systemctl) -> errp left unchanged, return false

But the comments in systemd_supports_mode() claim that ANY status < 4 (other 
than -1, which means we did not run systemctl) should count as the service 
existing, even though the most common status is 3.  If our comment is to be 
believed, then we should return true, not false, for status 0.

Now, back to _this_ function:


    mode_supported = true;
    systemd_suspend(mode, &local_err);


Okay - if we get here (whether from status 1-3, or with systemd_supports_mode 
fixed to support status 0-3), local_err is still unset prior to calling 
systemd_suspend(), and we are guaranteed that after the call, either we 
suspended successfully or local_err is now set.


    }

    if (!local_err) {
    return;
    }


So if returned, we succeeded at systemd_suspend, and there is nothing further 
to do; but if we get past that point, we don't know if it was 
systemd_supports_mode that failed or systemd_suspend that failed, and we don't 
know if local_err is set.


No, we know that it is set, as we check exactly this and return if it is not set.





    error_free(local_err);
+    local_err = NULL;


Yet, we blindly throw away local_err, without trying to report it.  If that's 
the case, then WHY are we passing in local_err?  Wouldn't it be better to pass 
in NULL (we really don't care about the error message), and/or fix 
systemd_suspend() to return a bool just like systemd_supports_mode, and/or fix 
systemd_supports_mode to guarantee that it sets errp when returning false?



I agree that this is a strange function and its logic is weird. But I don't
know what the logic should be. My patch is still valid: it just fixes the
obvious use-after-free and a possible leak. It doesn't fix the logic.


--
Best regards,
Vladimir



Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in packet_enqueue()

2020-03-24 Thread Derek Su
On Wed, Mar 25, 2020 at 10:05 AM, Jing-Wei Su  wrote:
>
> On Wed, Mar 25, 2020 at 9:37 AM, Zhang, Chen  wrote:
> >
> >
> >
> > > -Original Message-
> > > From: Jing-Wei Su 
> > > Sent: Tuesday, March 24, 2020 10:47 AM
> > > To: Zhang, Chen 
> > > Cc: qemu-devel@nongnu.org; lizhij...@cn.fujitsu.com;
> > > jasow...@redhat.com; dere...@qnap.com
> > > Subject: Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > packet_enqueue()
> > >
> > > On Tue, Mar 24, 2020 at 3:24 AM, Zhang, Chen  wrote:
> > > >
> > > >
> > > >
> > > > > -Original Message-
> > > > > From: Derek Su 
> > > > > Sent: Monday, March 23, 2020 1:48 AM
> > > > > To: qemu-devel@nongnu.org
> > > > > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> > > > > jasow...@redhat.com; dere...@qnap.com
> > > > > Subject: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > > > packet_enqueue()
> > > > >
> > > > > The patch is to fix the "pkt" memory leak in packet_enqueue().
> > > > > The allocated "pkt" needs to be freed if the colo compare primary or
> > > > > secondary queue is too big.
> > > >
> > > > Hi Derek,
> > > >
> > > > Thank you for the patch.
> > > > Re-thinking this issue in a broader view, just freeing the packet is
> > > > not enough in this situation. The root cause is that the network is too
> > > > busy to compare, so the better choice is to notify the COLO frame to do
> > > > a checkpoint and clean up all the network queues. This may decrease
> > > > COLO network performance, but it seems better than dropping lots of
> > > > packets.
> > > >
> > > > Thanks
> > > > Zhang Chen
> > > >
> > >
> > > Hello, Zhang
> > >
> > > Got it.
> > > What is the concern of the massive "drop packets"?
> > > Does the behavior make the COLO do checkpoint periodically (~20 seconds)
> > > instead of doing immediate checkpoint when encountering different
> > > response packets?
> >
> > The concern with the "drop packets" is that the guest will lose network
> > connection with most of the real clients until the next periodic forced
> > checkpoint. COLO is designed for dynamic checkpoint control, so I think
> > doing a checkpoint here will help the guest resume service faster.
> >
>
> I see.
> I'll update the patch with your suggestion later.
>

Hi, Zhang
Here is the idea and pseudo code to handle the "drop packet".

```
ret = packet_enqueue
(1) ret == 0
  compare connection
(2) ret == -1
  send packet
(3) ret == queue insertion fail
  do checkpoint
  send all queued primary packets
  remove all queued secondary packets
```

Do you have any comment for the handling?

Thanks
Derek

> > >
> > > It seems that frequent checkpoints caused by the full queue (busy
> > > network) instead of different
> > > response packets may harm the high speed network (10 Gbps or higher)
> > > performance dramatically.
> >
> > Yes, maybe I can send a patch to let the user adjust the queue size
> > depending on their own environment.
> > But with a larger queue size, colo-compare will spend much more time
> > comparing packets when the network is really busy.
>
> Thank you. The user-configurable queue size will be very helpful.
>
> Thanks.
> Derek Su
>
> >
> > Thanks
> > Zhang Chen
> >
> > >
> > > Thanks
> > > Derek
> > >
> > > > >
> > > > > Signed-off-by: Derek Su 
> > > > > ---
> > > > >  net/colo-compare.c | 23 +++
> > > > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > > > >
> > > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > > > > 7ee17f2cf8..cdd87b2aa8 100644
> > > > > --- a/net/colo-compare.c
> > > > > +++ b/net/colo-compare.c
> > > > > @@ -120,6 +120,10 @@ enum {
> > > > >  SECONDARY_IN,
> > > > >  };
> > > > >
> > > > > +static const char *colo_mode[] = {
> > > > > +[PRIMARY_IN] = "primary",
> > > > > +[SECONDARY_IN] = "secondary",
> > > > > +};
> > > > >
> > > > >  static int compare_chr_send(CompareState *s,
> > > > >  const uint8_t *buf, @@ -215,6 +219,7 @@
> > > > > static int packet_enqueue(CompareState *s, int mode, Connection
> > > **con)
> > > > >  ConnectionKey key;
> > > > >  Packet *pkt = NULL;
> > > > >  Connection *conn;
> > > > > +int ret;
> > > > >
> > > > >  if (mode == PRIMARY_IN) {
> > > > >  pkt = packet_new(s->pri_rs.buf, @@ -243,16 +248,18 @@
> > > > > static int packet_enqueue(CompareState *s, int mode, Connection
> > > **con)
> > > > >  }
> > > > >
> > > > >  if (mode == PRIMARY_IN) {
> > > > > -if (!colo_insert_packet(&conn->primary_list, pkt, &conn->pack)) {
> > > > > -error_report("colo compare primary queue size too big,"
> > > > > - "drop packet");
> > > > > -}
> > > > > +ret = colo_insert_packet(&conn->primary_list, pkt,
> > > > > + &conn->pack);
> > > > >  } else {
> > > > > -if (!colo_insert_packet(&conn->secondary_list, pkt, &conn->sack)) {
> > > > > -error_report("colo compare secondary queue size too big,"
> > > > > - "drop packet");
> > > > > -}
> > > > > +

[PATCH v5 0/3] redundant code: Fix warnings reported by Clang static code analyzer

2020-03-24 Thread Chen Qun
v1->v2:
- Patch1: Add John Snow review comment.
- Patch9: Move the 'dst_type' declaration to while() statement.
- Patch12: Add Philippe Mathieu-Daudé review comment.
- Patch13: Move the 'set' declaration to the for() statement.

v2->v3:
- Patch1: Add Kevin Wolf review comment.
- Patch2: Keep the 'flags' then use it(Base on Kevin's comments).
- Patch3: Add Kevin Wolf review comment.
- Patch9: Add Francisco Iglesias and Alistair Francis review comment.
- Patch10: Juan Quintela has added it to his queue, so it is dropped here.
- Patch12->Patch11: Add Philippe Mathieu-Daudé review comment.
- Patch13->Patch12: Add Philippe Mathieu-Daudé review comment.

v3->v4:
- Deleted the patches that have been merged in the v3.
- Modify "scsi/esp-pci" Patch, use g_assert with variable size.

v4->v5:
- Patch1: Add Laurent Vivier review comment and change the subject.
- Patch2: Use extract16() instead of delete bit operation statement.
- Patch3: Add Laurent Vivier review comment.

Chen Qun (3):
  scsi/esp-pci: add g_assert() for fix clang analyzer warning in
esp_pci_io_write()
  display/blizzard: use extract16() for fix clang analyzer warning in
blizzard_draw_line16_32()
  timer/exynos4210_mct: Remove redundant statement in
exynos4210_mct_write()

 hw/display/blizzard.c | 10 --
 hw/scsi/esp-pci.c |  1 +
 hw/timer/exynos4210_mct.c |  4 
 3 files changed, 5 insertions(+), 10 deletions(-)

-- 
2.23.0





[PATCH v5 2/3] display/blizzard: use extract16() for fix clang analyzer warning in blizzard_draw_line16_32()

2020-03-24 Thread Chen Qun
Clang static code analyzer show warning:
  hw/display/blizzard.c:940:9: warning: Value stored to 'data' is never read
data >>= 5;
^~
Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
---
Cc: Andrzej Zaborowski 
Cc: Peter Maydell 

v1->v2: Use the extract16() function instead of bit operations (based on
Laurent's comments).
---
 hw/display/blizzard.c | 10 --
 1 file changed, 4 insertions(+), 6 deletions(-)

diff --git a/hw/display/blizzard.c b/hw/display/blizzard.c
index 359e399c2a..105241577d 100644
--- a/hw/display/blizzard.c
+++ b/hw/display/blizzard.c
@@ -19,6 +19,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/bitops.h"
 #include "ui/console.h"
 #include "hw/display/blizzard.h"
 #include "ui/pixel_ops.h"
@@ -932,12 +933,9 @@ static void blizzard_draw_line16_32(uint32_t *dest,
 const uint16_t *end = (const void *) src + width;
 while (src < end) {
 data = *src ++;
-b = (data & 0x1f) << 3;
-data >>= 5;
-g = (data & 0x3f) << 2;
-data >>= 6;
-r = (data & 0x1f) << 3;
-data >>= 5;
+b = extract16(data, 0, 5) << 3;
+g = extract16(data, 5, 6) << 2;
+r = extract16(data, 11, 5) << 3;
 *dest++ = rgb_to_pixel32(r, g, b);
 }
 }
-- 
2.23.0





[PATCH v5 3/3] timer/exynos4210_mct: Remove redundant statement in exynos4210_mct_write()

2020-03-24 Thread Chen Qun
Clang static code analyzer show warning:
hw/timer/exynos4210_mct.c:1370:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~
hw/timer/exynos4210_mct.c:1399:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~
hw/timer/exynos4210_mct.c:1441:9: warning: Value stored to 'index' is never read
index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
^   ~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
Reviewed-by: Laurent Vivier 
---
Cc: Igor Mitsyanko 
Cc: Peter Maydell 
---
 hw/timer/exynos4210_mct.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/hw/timer/exynos4210_mct.c b/hw/timer/exynos4210_mct.c
index 944120aea5..570cf7075b 100644
--- a/hw/timer/exynos4210_mct.c
+++ b/hw/timer/exynos4210_mct.c
@@ -1367,7 +1367,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_TCNTB: case L1_TCNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
 /*
  * TCNTB is updated to internal register only after CNT expired.
@@ -1396,7 +1395,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_ICNTB: case L1_ICNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
 
 s->l_timer[lt_i].reg.wstat |= L_WSTAT_ICNTB_WRITE;
 s->l_timer[lt_i].reg.cnt[L_REG_CNT_ICNTB] = value &
@@ -1438,8 +1436,6 @@ static void exynos4210_mct_write(void *opaque, hwaddr 
offset,
 
 case L0_FRCNTB: case L1_FRCNTB:
 lt_i = GET_L_TIMER_IDX(offset);
-index = GET_L_TIMER_CNT_REG_IDX(offset, lt_i);
-
 DPRINTF("local timer[%d] FRCNTB write %llx\n", lt_i, value);
 
 s->l_timer[lt_i].reg.wstat |= L_WSTAT_FRCCNTB_WRITE;
-- 
2.23.0





[PATCH v5 1/3] scsi/esp-pci: add g_assert() for fix clang analyzer warning in esp_pci_io_write()

2020-03-24 Thread Chen Qun
Clang static code analyzer show warning:
  hw/scsi/esp-pci.c:198:9: warning: Value stored to 'size' is never read
size = 4;
^  ~

Reported-by: Euler Robot 
Signed-off-by: Chen Qun 
Reviewed-by: Laurent Vivier 
---
Cc: Paolo Bonzini 
Cc: Fam Zheng 

v1->v2:
Keep 'size = 4' and add 'g_assert(size >= 4)' after the if() statement.
(Based on Laurent's comments)
---
 hw/scsi/esp-pci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/hw/scsi/esp-pci.c b/hw/scsi/esp-pci.c
index d5a1f9e017..497a8d5901 100644
--- a/hw/scsi/esp-pci.c
+++ b/hw/scsi/esp-pci.c
@@ -197,6 +197,7 @@ static void esp_pci_io_write(void *opaque, hwaddr addr,
 addr &= ~3;
 size = 4;
 }
+g_assert(size >= 4);
 
 if (addr < 0x40) {
 /* SCSI core reg */
-- 
2.23.0





Re: Qemu master crashing on boot when using file backend for memory

2020-03-24 Thread Raphael Norwitz
On Thu, Mar 12, 2020 at 11:54:50AM +0100, Igor Mammedov wrote:
> 
> On Thu, 12 Mar 2020 01:36:48 -0400
> Raphael Norwitz  wrote:
> 
> > When I try run master qemu I am hitting a divide by zero error. It seems
> > to be coming from util/oslib-posix.c in touch_all_pages(). see line 477:
> > 
> > numpages_per_thread = numpages / memset_num_threads;
> > 
> > Poking around the crash dumps, I can see that the smp_cpus parameter
> > passed in to touch_all_pages() is 0. Going up the stack to
> > host_memory_backend_memory_complete() I see backend->prealloc_threads is
> > also 0.
> > 
> > Here’s how I am running qemu
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 \
> > -kernel /boot/vmlinuz-3.10.0-1062.el7.x86_64  \
> > -netdev user,id=net0,hostfwd=tcp::2250-:22 \
> > -device e1000e,netdev=net0 \
> > -m 1G \
> > -initrd /boot/initramfs-3.10.0-1062.el7.x86_64.img  \
> > -object 
> > memory-backend-file,id=ram-node0,prealloc=yes,mem-path=mem,share=yes,size=1G
> >  \
> > -numa node,nodeid=0,cpus=0,memdev=ram-node0 
> > 
> > I don't see this error on a slightly older qemu, as of commit 105b07f1
> > (January 27th).
> > 
> > Interestingly when I remove the memory-backend-file parameter I don’t
> > see the error, i.e. this works:
> > 
> > ./x86_64-softmmu/qemu-system-x86_64 \
> > -kernel /boot/vmlinuz-3.10.0-1062.el7.x86_64  \
> > -netdev user,id=net0,hostfwd=tcp::2250-:22 \
> > -device e1000e,netdev=net0 \
> > -m 1G \
> > -initrd /boot/initramfs-3.10.0-1062.el7.x86_64.img
> > 
> > Looking at the blame data for backends/hostmem.c I see commit ffac16fa
> > introduced some churn in this part of the code. Has anyone else seen
> > this issue? Could I be doing something wrong here?
> 
> It's a known issue; see "[PATCH] oslib-posix: initialize mutex and
> condition variable" for a fix.
> 

I'm testing on qemu master now. Looks like this patch has been merged
and I'm still seeing the same crash.



Re: [PATCH v3] hw/char/pl011: Enable TxFIFO and async transmission

2020-03-24 Thread Gavin Shan

On 3/11/20 3:09 PM, Gavin Shan wrote:

The depth of TxFIFO can be 1 or 16 depending on LCR[4]. The TxFIFO is
disabled when its depth is 1. It's nice to have TxFIFO enabled if
possible because more characters can be piled and transmitted at once,
which would have less overhead. Besides, we can be blocked because of
qemu_chr_fe_write_all(), which isn't nice.

This enables TxFIFO if possible. On the other hand, the asynchronous
transmission is enabled if needed, as we did in hw/char/cadence_uart.c

Signed-off-by: Gavin Shan 
---
v3: Use PL011() to do data type conversion
 Return G_SOURCE_REMOVE when the backend is disconnected in pl011_xmit()
 Drop parenthesis in the condition validating @size in pl011_write_fifo()
---
  hw/char/pl011.c | 105 +---
  include/hw/char/pl011.h |   3 ++
  2 files changed, 102 insertions(+), 6 deletions(-)



Marc-André, ping. Could you please review when you get a chance? Thanks in
advance :)


diff --git a/hw/char/pl011.c b/hw/char/pl011.c
index 13e784f9d9..dccb8c42b0 100644
--- a/hw/char/pl011.c
+++ b/hw/char/pl011.c
@@ -169,6 +169,73 @@ static void pl011_set_read_trigger(PL011State *s)
  s->read_trigger = 1;
  }
  
+static gboolean pl011_xmit(GIOChannel *chan, GIOCondition cond, void *opaque)

+{
+PL011State *s = PL011(opaque);
+int ret;
+
+/* Drain FIFO if there is no backend */
+if (!qemu_chr_fe_backend_connected(&s->chr)) {
+s->write_count = 0;
+s->flags &= ~PL011_FLAG_TXFF;
+s->flags |= PL011_FLAG_TXFE;
+return G_SOURCE_REMOVE;
+}
+
+/* Nothing to do */
+if (!s->write_count) {
+return FALSE;
+}
+
+ret = qemu_chr_fe_write(&s->chr, s->write_fifo, s->write_count);
+if (ret > 0) {
+s->write_count -= ret;
+memmove(s->write_fifo, s->write_fifo + ret, s->write_count);
+s->flags &= ~PL011_FLAG_TXFF;
+if (!s->write_count) {
+s->flags |= PL011_FLAG_TXFE;
+}
+}
+
+if (s->write_count) {
+s->watch_tag = qemu_chr_fe_add_watch(&s->chr, G_IO_OUT | G_IO_HUP,
+ pl011_xmit, s);
+if (!s->watch_tag) {
+s->write_count = 0;
+s->flags &= ~PL011_FLAG_TXFF;
+s->flags |= PL011_FLAG_TXFE;
+return FALSE;
+}
+}
+
+s->int_level |= PL011_INT_TX;
+pl011_update(s);
+return FALSE;
+}
+
+static void pl011_write_fifo(void *opaque, const unsigned char *buf, int size)
+{
+PL011State *s = PL011(opaque);
+int depth = (s->lcr & 0x10) ? 16 : 1;
+
+if (size >= depth - s->write_count) {
+size = depth - s->write_count;
+}
+
+if (size > 0) {
+memcpy(s->write_fifo + s->write_count, buf, size);
+s->write_count += size;
+if (s->write_count >= depth) {
+s->flags |= PL011_FLAG_TXFF;
+}
+s->flags &= ~PL011_FLAG_TXFE;
+}
+
+if (!s->watch_tag) {
+pl011_xmit(NULL, G_IO_OUT, s);
+}
+}
+
  static void pl011_write(void *opaque, hwaddr offset,
  uint64_t value, unsigned size)
  {
@@ -179,13 +246,8 @@ static void pl011_write(void *opaque, hwaddr offset,
  
  switch (offset >> 2) {

  case 0: /* UARTDR */
-/* ??? Check if transmitter is enabled.  */
  ch = value;
-/* XXX this blocks entire thread. Rewrite to use
- * qemu_chr_fe_write and background I/O callbacks */
-qemu_chr_fe_write_all(&s->chr, &ch, 1);
-s->int_level |= PL011_INT_TX;
-pl011_update(s);
+pl011_write_fifo(opaque, &ch, 1);
  break;
  case 1: /* UARTRSR/UARTECR */
  s->rsr = 0;
@@ -207,7 +269,16 @@ static void pl011_write(void *opaque, hwaddr offset,
  if ((s->lcr ^ value) & 0x10) {
  s->read_count = 0;
  s->read_pos = 0;
+
+if (s->watch_tag) {
+g_source_remove(s->watch_tag);
+s->watch_tag = 0;
+}
+s->write_count = 0;
+s->flags &= ~PL011_FLAG_TXFF;
+s->flags |= PL011_FLAG_TXFE;
  }
+
  s->lcr = value;
  pl011_set_read_trigger(s);
  break;
@@ -292,6 +363,24 @@ static const MemoryRegionOps pl011_ops = {
  .endianness = DEVICE_NATIVE_ENDIAN,
  };
  
+static bool pl011_write_fifo_needed(void *opaque)

+{
+PL011State *s = PL011(opaque);
+return s->write_count > 0;
+}
+
+static const VMStateDescription vmstate_pl011_write_fifo = {
+.name = "pl011/write_fifo",
+.version_id = 1,
+.minimum_version_id = 1,
+.needed = pl011_write_fifo_needed,
+.fields = (VMStateField[]) {
+VMSTATE_INT32(write_count, PL011State),
+VMSTATE_UINT8_ARRAY(write_fifo, PL011State, 16),
+VMSTATE_END_OF_LIST()
+}
+};
+
  static const VMStateDescription vmstate_pl011 = {
  .name = "pl011",
  .version_id = 2,
@@ -314,6 +403,10 @@ static const VMStateDescription 

Re: [PATCH v16 QEMU 14/16] vfio: Add vfio_listener_log_sync to mark dirty pages

2020-03-24 Thread Yan Zhao
On Wed, Mar 25, 2020 at 05:09:12AM +0800, Kirti Wankhede wrote:
> vfio_listener_log_sync gets list of dirty pages from container using
> VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all
> devices are stopped and saving state.
> Return early for the RAM block section of mapped MMIO region.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/common.c | 200 
> +--
>  hw/vfio/trace-events |   1 +
>  2 files changed, 196 insertions(+), 5 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 4a2f0d6a2233..6d41e1ac5c2f 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -29,6 +29,7 @@
>  #include "hw/vfio/vfio.h"
>  #include "exec/address-spaces.h"
>  #include "exec/memory.h"
> +#include "exec/ram_addr.h"
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> @@ -38,6 +39,7 @@
>  #include "sysemu/reset.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "migration/migration.h"
>  
>  VFIOGroupList vfio_group_list =
>  QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
>  };
>  
>  /*
> + * Device state interfaces
> + */
> +
> +static bool vfio_devices_are_stopped_and_saving(void)
> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> +!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> +continue;
> +} else {
> +return false;
> +}
> +}
> +}
> +return true;
> +}
> +
> +/*
>   * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
>   */
>  static int vfio_dma_unmap(VFIOContainer *container,
> @@ -408,8 +432,8 @@ static bool vfio_listener_skipped_section(MemoryRegionSection *section)
>  }
>  
>  /* Called with rcu_read_lock held.  */
> -static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
> -   bool *read_only)
> +static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
> +   ram_addr_t *ram_addr, bool *read_only)
>  {
>  MemoryRegion *mr;
>  hwaddr xlat;
> @@ -440,9 +464,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
> **vaddr,
>  return false;
>  }
>  
> -*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> -*read_only = !writable || mr->readonly;
> +if (vaddr) {
> +*vaddr = memory_region_get_ram_ptr(mr) + xlat;
> +}
>  
> +if (ram_addr) {
> +*ram_addr = memory_region_get_ram_addr(mr) + xlat;
> +}
> +
> +if (read_only) {
> +*read_only = !writable || mr->readonly;
> +}
>  return true;
>  }
>  
> @@ -467,7 +499,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>  rcu_read_lock();
>  
>  if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
> -if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
> +if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
>  goto out;
>  }
>  /*
> @@ -813,9 +845,167 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  }
>  }
>  
> +static int vfio_get_dirty_bitmap(MemoryListener *listener,
> + MemoryRegionSection *section)
> +{
> +VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
> +VFIOGuestIOMMU *giommu;
> +IOMMUTLBEntry iotlb;
> +hwaddr granularity, address_limit, iova;
> +int ret;
> +
> +if (memory_region_is_iommu(section->mr)) {
> +QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +if (MEMORY_REGION(giommu->iommu) == section->mr &&
> +giommu->n.start == section->offset_within_region) {
> +break;
> +}
> +}
> +
> +if (!giommu) {
> +return -EINVAL;
> +}
> +}
> +
> +if (memory_region_is_iommu(section->mr)) {
> +granularity = memory_region_iommu_get_min_page_size(giommu->iommu);
> +
> +address_limit = MIN(int128_get64(section->size),
> +
> memory_region_iommu_get_address_limit(giommu->iommu,
> + 
> int128_get64(section->size)));
> +} else {
> +granularity = memory_region_size(section->mr);
> +address_limit = int128_get64(section->size);
> +}
> +
> +iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +
> +RCU_READ_LOCK_GUARD();
> +
> +while (iova < address_limit) {
> +struct vfio_iommu_type1_dirty_bitmap *dbitmap;
> +struct vfio_iommu_type1_dirty_bitmap_get *range;
> +ram_addr_t start, pages;
> +uint64_t iova_xlat, size;
> +
> +if 

Re: [PATCH v16 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-24 Thread Yan Zhao
On Wed, Mar 25, 2020 at 03:32:37AM +0800, Kirti Wankhede wrote:
> DMA mapped pages, including those pinned by mdev vendor drivers, might
> get unpinned and unmapped while migration is active and device is still
> running. For example, in pre-copy phase while guest driver could access
> those pages, host device or vendor driver can dirty these mapped pages.
> Such pages should be marked dirty so as to maintain memory consistency
> for a user making use of dirty page tracking.
> 
> To get bitmap during unmap, user should allocate memory for bitmap, set
> size of allocated memory, set page size to be considered for bitmap and
> set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.
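[Editorial aside: the userspace preparation the commit message describes can be sketched roughly as follows. This is an illustrative stand-in, not the patch's code — the struct layout mirrors the patch's uapi additions locally, the flag value is a placeholder, and the ioctl call itself is omitted:]

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Placeholder value; the real flag is defined in <linux/vfio.h> by this patch. */
#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)

/* Local stand-in mirroring the vfio_bitmap struct added by the patch. */
struct vfio_bitmap_sketch {
    uint64_t pgsize;  /* page size (bytes) each bitmap bit represents */
    uint64_t size;    /* bytes allocated for the bitmap */
    uint64_t *data;   /* user-allocated bitmap memory */
};

/* One bit per page, rounded up to whole 64-bit words. */
static uint64_t bitmap_bytes(uint64_t range, uint64_t pgsize)
{
    uint64_t pages = range / pgsize;
    return (pages + 63) / 64 * 8;
}

/* Prepare the bitmap argument as the commit message describes: allocate
 * memory, record its size and page size, and set the flag on the unmap
 * request flags (the VFIO_IOMMU_UNMAP_DMA ioctl itself is not shown). */
static int prepare_unmap_bitmap(struct vfio_bitmap_sketch *b,
                                uint64_t unmap_size, uint64_t pgsize,
                                uint32_t *flags)
{
    b->pgsize = pgsize;
    b->size = bitmap_bytes(unmap_size, pgsize);
    b->data = calloc(1, b->size);
    if (!b->data)
        return -1;
    *flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
    return 0;
}
```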
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 54 
> ++---
>  include/uapi/linux/vfio.h   | 10 
>  2 files changed, 60 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 27ed069c5053..b98a8d79e13a 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -982,7 +982,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t 
> bitmap_size)
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -  struct vfio_iommu_type1_dma_unmap *unmap)
> +  struct vfio_iommu_type1_dma_unmap *unmap,
> +  struct vfio_bitmap *bitmap)
>  {
>   uint64_t mask;
>   struct vfio_dma *dma, *dma_last = NULL;
> @@ -1033,6 +1034,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>* will be returned if these conditions are not met.  The v2 interface
>* will only return success and a size of zero if there were no
>* mappings within the range.
> +  *
> +  * When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
> +  * must be for single mapping. Multiple mappings with this flag set is
> +  * not supported.
>*/
>   if (iommu->v2) {
>   dma = vfio_find_dma(iommu, unmap->iova, 1);
> @@ -1040,6 +1045,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   ret = -EINVAL;
>   goto unlock;
>   }
> +
> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> + (dma->iova != unmap->iova || dma->size != unmap->size)) {
potential NULL pointer!

And could you address the comments in v14?
How to handle DSI unmaps in vIOMMU
(https://lore.kernel.org/kvm/20200323011041.GB5456@joy-OptiPlex-7040/)

> + ret = -EINVAL;
> + goto unlock;
> + }
> +
>   dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
>   if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
>   ret = -EINVAL;
> @@ -1057,6 +1069,11 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   if (dma->task->mm != current->mm)
>   break;
>  
> + if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
> +  iommu->dirty_page_tracking)
> + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> + bitmap->pgsize, bitmap->data);
> +
>   if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
>   struct vfio_iommu_type1_dma_unmap nb_unmap;
>  
> @@ -2418,17 +2435,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   } else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
>   struct vfio_iommu_type1_dma_unmap unmap;
> - long ret;
> + struct vfio_bitmap bitmap = { 0 };
> + int ret;
>  
>   minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
>  
>   if (copy_from_user(&unmap, (void __user *)arg, minsz))
>   return -EFAULT;
>  
> - if (unmap.argsz < minsz || unmap.flags)
> + if (unmap.argsz < minsz ||
> + unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
>   return -EINVAL;
>  
> - ret = vfio_dma_do_unmap(iommu, &unmap);
> + if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
> + unsigned long pgshift;
> + uint64_t iommu_pgsize =
> +  1 << __ffs(vfio_pgsize_bitmap(iommu));
> +
> + if (unmap.argsz < (minsz + sizeof(bitmap)))
> + return -EINVAL;
> +
> + if (copy_from_user(&bitmap,
> +(void __user *)(arg + minsz),
> +sizeof(bitmap)))
> + return -EFAULT;
> +
> + /* allow only min supported pgsize */
> + if (bitmap.pgsize != iommu_pgsize)
> +   

Re: [RFC PATCH v2 1/7] vfio-ccw: Return IOINST_CC_NOT_OPERATIONAL for EIO

2020-03-24 Thread Halil Pasic
On Tue, 24 Mar 2020 18:04:30 +0100
Cornelia Huck  wrote:

> On Thu,  6 Feb 2020 22:45:03 +0100
> Eric Farman  wrote:
> 
> > From: Farhan Ali 
> > 
> > EIO is returned by vfio-ccw mediated device when the backing
> > host subchannel is not operational anymore. So return cc=3
> > back to the guest, rather than returning a unit check.
> > This way the guest can take appropriate action such as
> > issue an 'stsch'.

I believe this is not the only situation in which vfio-ccw returns
EIO, is it?

> > 
> > Signed-off-by: Farhan Ali 
> > Signed-off-by: Eric Farman 
> > ---
> > 
> > Notes:
> > v1->v2: [EF]
> >  - Add s-o-b
> >  - [Seems the discussion on v1 centered on the return code
> >set in the kernel, rather than anything that needs to
> >change here, unless I've missed something.]

Does this need to change here? If the kernel is supposed to return ENODEV
then this does not need to change.

> 
> I've stared at this and at the kernel code for some time again; and I'm
> not sure if "return -EIO == not operational" is even true. That said,
> I'm not sure a unit check is the right response, either. The only thing
> I'm sure about is that the kernel code needs some review of return
> codes and some documentation...

I could not agree more, this is semantically uapi and needs to be
properly documented.

With regards to "linux error codes" vs "ioinst cc's", here is an example
where the mapping is different:

"""
/** 
 * cio_cancel_halt_clear - Cancel running I/O by performing cancel, halt
 * and clear ordinally if subchannel is valid.  
 * @sch: subchannel on which to perform the cancel_halt_clear operation 
 * @iretry: the number of the times remained to retry the next operation
 *  
 * This should be called repeatedly since halt/clear are asynchronous   
 * operations. We do one try with cio_cancel, three tries with cio_halt,
 * 255 tries with cio_clear. The caller should initialize @iretry with  
 * the value 255 for its first call to this, and keep using the same
 * @iretry in the subsequent calls until it gets a non -EBUSY return.   
 *  
 * Returns 0 if device now idle, -ENODEV for device not operational,
 * -EBUSY if an interrupt is expected (either from halt/clear or from a 
 * status pending), and -EIO if out of retries. 
 */ 
int cio_cancel_halt_clear(struct subchannel *sch, int *iretry)   

"""
Here -ENODEV is not operational.

Regards,
Halil

> 
> > 
> >  hw/vfio/ccw.c | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/hw/vfio/ccw.c b/hw/vfio/ccw.c
> > index 50cc2ec75c..19144ecfc7 100644
> > --- a/hw/vfio/ccw.c
> > +++ b/hw/vfio/ccw.c
> > @@ -114,6 +114,7 @@ again:
> >  return IOINST_CC_BUSY;
> >  case -ENODEV:
> >  case -EACCES:
> > +case -EIO:
> >  return IOINST_CC_NOT_OPERATIONAL;
> >  case -EFAULT:
> >  default:
> 
> 




Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Yan Zhao
On Wed, Mar 25, 2020 at 05:18:52AM +0800, Kirti Wankhede wrote:
> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. Its user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
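[Editorial aside: the sizing rule described above — one bitmap per vfio_dma, with the smallest supported page size as the per-bit granularity — can be sketched as follows. This is an illustrative reimplementation mirroring the patch's DIRTY_BITMAP_BYTES() macro, not the patch's code itself:]

```c
#include <assert.h>
#include <stdint.h>

#define BITS_PER_BYTE 8ULL
#define BITS_PER_U64  64ULL

/* Mirror of the patch's DIRTY_BITMAP_BYTES(): round the page count up to
 * a whole number of u64 words, then convert bits to bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t npages)
{
    uint64_t aligned = (npages + BITS_PER_U64 - 1) / BITS_PER_U64 * BITS_PER_U64;
    return aligned / BITS_PER_BYTE;
}

/* Per-vfio_dma allocation size, using the smallest supported page size as
 * the per-bit granularity, as the commit message describes. */
static uint64_t dma_bitmap_bytes(uint64_t dma_size, uint64_t pgsize)
{
    return dirty_bitmap_bytes(dma_size / pgsize);
}
```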
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 266 
> +++-
>  1 file changed, 260 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 70aeab921d0f..874a1a7ae925 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned int    dma_avail;
>   bool            v2;
>   bool            nesting;
> + bool            dirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   bool            lock_cap;   /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,21 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)(ALIGN(n, BITS_PER_TYPE(u64)) / 
> BITS_PER_BYTE)
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, 
> which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
> +#define DIRTY_BITMAP_PAGES_MAX   (uint64_t)(INT_MAX - 1)
> +#define DIRTY_BITMAP_SIZE_MAX 
> DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> struct vfio_dma *old)
>   rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +
> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
> +{
> + uint64_t npages = dma->size / pgsize;
> +
> + if (npages > DIRTY_BITMAP_PAGES_MAX)
> + return -EINVAL;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
> +{
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> +}
> +
> +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
> +{
> + struct rb_node *p;
> +
> + if (RB_EMPTY_ROOT(&dma->pfn_list))
> + return;
> +
> + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
> +
> + bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> +}
> +
> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t 
> pgsize)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + int ret;
> +
> + ret = vfio_dma_bitmap_alloc(dma, pgsize);
> + if (ret) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + vfio_dma_bitmap_free(dma);
> + }
> + return ret;
> + }
> + vfio_dma_populate_bitmap(dma, pgsize);
> + }
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = 

Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in packet_enqueue()

2020-03-24 Thread Jing-Wei Su
Zhang, Chen  於 2020年3月25日 週三 上午9:37寫道:
>
>
>
> > -Original Message-
> > From: Jing-Wei Su 
> > Sent: Tuesday, March 24, 2020 10:47 AM
> > To: Zhang, Chen 
> > Cc: qemu-devel@nongnu.org; lizhij...@cn.fujitsu.com;
> > jasow...@redhat.com; dere...@qnap.com
> > Subject: Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > packet_enqueue()
> >
> > Zhang, Chen  於 2020年3月24日 週二 上午3:24
> > 寫道:
> > >
> > >
> > >
> > > > -Original Message-
> > > > From: Derek Su 
> > > > Sent: Monday, March 23, 2020 1:48 AM
> > > > To: qemu-devel@nongnu.org
> > > > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> > > > jasow...@redhat.com; dere...@qnap.com
> > > > Subject: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > > packet_enqueue()
> > > >
> > > > The patch is to fix the "pkt" memory leak in packet_enqueue().
> > > > The allocated "pkt" needs to be freed if the colo compare primary or
> > > > secondary queue is too big.
> > >
> > > Hi Derek,
> > >
> > > Thank you for the patch.
> > > I re-think this issue in a big view, looks just free the pkg is not 
> > > enough in
> > this situation.
> > > The root cause is network is too busy to compare, So, better choice is
> > > notify COLO frame to do a checkpoint and clean up all the network
> > > queue. This work maybe decrease COLO network performance but seams
> > better than drop lots of pkg.
> > >
> > > Thanks
> > > Zhang Chen
> > >
> >
> > Hello, Zhang
> >
> > Got it.
> > What is the concern of the massive "drop packets"?
> > Does the behavior make the COLO do checkpoint periodically (~20 seconds)
> > instead of doing immediate checkpoint when encountering different
> > response packets?
>
> The concern with "drop packets" is that the guest will lose its network
> connection to most real clients until the next periodic forced checkpoint.
> COLO is designed for dynamic checkpoint control, so I think doing a
> checkpoint here will help the guest restore service faster.
>

I see.
I'll update the patch with your suggestion later.

> >
> > It seems that frequent checkpoints caused by the full queue (busy
> > network) instead of different
> > response packets may harm the high speed network (10 Gbps or higher)
> > performance dramatically.
>
> Yes, maybe I can send a patch to let users adjust the queue size to suit
> their own environment. But with a larger queue size, colo-compare will
> spend more time comparing packets when the network is really busy.

Thank you. The user-configurable queue size will be very helpful.

Thanks.
Derek Su

>
> Thanks
> Zhang Chen
>
> >
> > Thanks
> > Derek
> >
> > > >
> > > > Signed-off-by: Derek Su 
> > > > ---
> > > >  net/colo-compare.c | 23 +++
> > > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > > >
> > > > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > > > 7ee17f2cf8..cdd87b2aa8 100644
> > > > --- a/net/colo-compare.c
> > > > +++ b/net/colo-compare.c
> > > > @@ -120,6 +120,10 @@ enum {
> > > >  SECONDARY_IN,
> > > >  };
> > > >
> > > > +static const char *colo_mode[] = {
> > > > +[PRIMARY_IN] = "primary",
> > > > +[SECONDARY_IN] = "secondary",
> > > > +};
> > > >
> > > >  static int compare_chr_send(CompareState *s,
> > > >  const uint8_t *buf, @@ -215,6 +219,7 @@
> > > > static int packet_enqueue(CompareState *s, int mode, Connection
> > **con)
> > > >  ConnectionKey key;
> > > >  Packet *pkt = NULL;
> > > >  Connection *conn;
> > > > +int ret;
> > > >
> > > >  if (mode == PRIMARY_IN) {
> > > >  pkt = packet_new(s->pri_rs.buf, @@ -243,16 +248,18 @@
> > > > static int packet_enqueue(CompareState *s, int mode, Connection
> > **con)
> > > >  }
> > > >
> > > >  if (mode == PRIMARY_IN) {
> > > > -if (!colo_insert_packet(&s->primary_list, pkt, 
> > > > &conn->pack)) {
> > > > -error_report("colo compare primary queue size too big,"
> > > > - "drop packet");
> > > > -}
> > > > +ret = colo_insert_packet(&s->primary_list, pkt,
> > > > + &conn->pack);
> > > >  } else {
> > > > -if (!colo_insert_packet(&s->secondary_list, pkt, 
> > > > &conn->sack)) {
> > > > -error_report("colo compare secondary queue size too big,"
> > > > - "drop packet");
> > > > -}
> > > > +ret = colo_insert_packet(&s->secondary_list, pkt,
> > > > + &conn->sack);
> > > >  }
> > > > +
> > > > +if (!ret) {
> > > > +error_report("colo compare %s queue size too big,"
> > > > + "drop packet", colo_mode[mode]);
> > > > +packet_destroy(pkt, NULL);
> > > > +pkt = NULL;
> > > > +}
> > > > +
> > > >  *con = conn;
> > > >
> > > >  return 0;
> > > > --
> > > > 2.17.1
> > >



[PATCH v3] migration: use "" instead of (null) for tls-authz

2020-03-24 Thread Mao Zhongyi
run:
(qemu) info migrate_parameters
announce-initial: 50 ms
...
announce-max: 550 ms
multifd-compression: none
xbzrle-cache-size: 4194304
max-postcopy-bandwidth: 0
 tls-authz: '(null)'

Migration parameter 'tls-authz' is used to provide the QOM ID
of a QAuthZ subclass instance that performs the access control
check; its default is NULL. Use "" instead of the default so
that '(null)' is not displayed. The empty string is not a valid
object ID, so looking up an object with ID "" will fail, but
that is harmless and consistent with tls_creds.

As a bonus, this patch also fixes the bad indentation on the
last line and removes the redundant 'has_tls_authz' check in
'hmp_info_migrate_parameters'.

Signed-off-by: Mao Zhongyi 
---
 migration/migration.c | 3 ++-
 monitor/hmp-cmds.c| 4 ++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 4b26110d57..c4c9aee15e 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -790,7 +790,8 @@ MigrationParameters *qmp_query_migrate_parameters(Error 
**errp)
 params->has_tls_hostname = true;
 params->tls_hostname = g_strdup(s->parameters.tls_hostname);
 params->has_tls_authz = true;
-params->tls_authz = g_strdup(s->parameters.tls_authz);
+params->tls_authz = g_strdup(s->parameters.tls_authz ?
+ s->parameters.tls_authz : "");
 params->has_max_bandwidth = true;
 params->max_bandwidth = s->parameters.max_bandwidth;
 params->has_downtime_limit = true;
diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
index a71de0e60b..dc48e6986c 100644
--- a/monitor/hmp-cmds.c
+++ b/monitor/hmp-cmds.c
@@ -459,9 +459,9 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %" PRIu64 "\n",
 MigrationParameter_str(MIGRATION_PARAMETER_MAX_POSTCOPY_BANDWIDTH),
 params->max_postcopy_bandwidth);
-monitor_printf(mon, " %s: '%s'\n",
+monitor_printf(mon, "%s: '%s'\n",
 MigrationParameter_str(MIGRATION_PARAMETER_TLS_AUTHZ),
-params->has_tls_authz ? params->tls_authz : "");
+params->tls_authz);
 }
 
 qapi_free_MigrationParameters(params);
-- 
2.17.1






RE: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in packet_enqueue()

2020-03-24 Thread Zhang, Chen


> -Original Message-
> From: Jing-Wei Su 
> Sent: Tuesday, March 24, 2020 10:47 AM
> To: Zhang, Chen 
> Cc: qemu-devel@nongnu.org; lizhij...@cn.fujitsu.com;
> jasow...@redhat.com; dere...@qnap.com
> Subject: Re: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> packet_enqueue()
> 
> Zhang, Chen  於 2020年3月24日 週二 上午3:24
> 寫道:
> >
> >
> >
> > > -Original Message-
> > > From: Derek Su 
> > > Sent: Monday, March 23, 2020 1:48 AM
> > > To: qemu-devel@nongnu.org
> > > Cc: Zhang, Chen ; lizhij...@cn.fujitsu.com;
> > > jasow...@redhat.com; dere...@qnap.com
> > > Subject: [PATCH v2 1/1] net/colo-compare.c: Fix memory leak in
> > > packet_enqueue()
> > >
> > > The patch is to fix the "pkt" memory leak in packet_enqueue().
> > > The allocated "pkt" needs to be freed if the colo compare primary or
> > > secondary queue is too big.
> >
> > Hi Derek,
> >
> > Thank you for the patch.
> > I re-think this issue in a big view, looks just free the pkg is not enough 
> > in
> this situation.
> > The root cause is network is too busy to compare, So, better choice is
> > notify COLO frame to do a checkpoint and clean up all the network
> > queue. This work maybe decrease COLO network performance but seams
> better than drop lots of pkg.
> >
> > Thanks
> > Zhang Chen
> >
> 
> Hello, Zhang
> 
> Got it.
> What is the concern of the massive "drop packets"?
> Does the behavior make the COLO do checkpoint periodically (~20 seconds)
> instead of doing immediate checkpoint when encountering different
> response packets?

The concern with "drop packets" is that the guest will lose its network
connection to most real clients until the next periodic forced checkpoint.
COLO is designed for dynamic checkpoint control, so I think doing a
checkpoint here will help the guest restore service faster.

> 
> It seems that frequent checkpoints caused by the full queue (busy
> network) instead of different
> response packets may harm the high speed network (10 Gbps or higher)
> performance dramatically.

Yes, maybe I can send a patch to let users adjust the queue size to suit
their own environment. But with a larger queue size, colo-compare will
spend more time comparing packets when the network is really busy.

Thanks
Zhang Chen   

> 
> Thanks
> Derek
> 
> > >
> > > Signed-off-by: Derek Su 
> > > ---
> > >  net/colo-compare.c | 23 +++
> > >  1 file changed, 15 insertions(+), 8 deletions(-)
> > >
> > > diff --git a/net/colo-compare.c b/net/colo-compare.c index
> > > 7ee17f2cf8..cdd87b2aa8 100644
> > > --- a/net/colo-compare.c
> > > +++ b/net/colo-compare.c
> > > @@ -120,6 +120,10 @@ enum {
> > >  SECONDARY_IN,
> > >  };
> > >
> > > +static const char *colo_mode[] = {
> > > +[PRIMARY_IN] = "primary",
> > > +[SECONDARY_IN] = "secondary",
> > > +};
> > >
> > >  static int compare_chr_send(CompareState *s,
> > >  const uint8_t *buf, @@ -215,6 +219,7 @@
> > > static int packet_enqueue(CompareState *s, int mode, Connection
> **con)
> > >  ConnectionKey key;
> > >  Packet *pkt = NULL;
> > >  Connection *conn;
> > > +int ret;
> > >
> > >  if (mode == PRIMARY_IN) {
> > >  pkt = packet_new(s->pri_rs.buf, @@ -243,16 +248,18 @@
> > > static int packet_enqueue(CompareState *s, int mode, Connection
> **con)
> > >  }
> > >
> > >  if (mode == PRIMARY_IN) {
> > > -if (!colo_insert_packet(&s->primary_list, pkt, &conn->pack)) {
> > > -error_report("colo compare primary queue size too big,"
> > > - "drop packet");
> > > -}
> > > +ret = colo_insert_packet(&s->primary_list, pkt,
> > > + &conn->pack);
> > >  } else {
> > > -if (!colo_insert_packet(&s->secondary_list, pkt, 
> > > &conn->sack)) {
> > > -error_report("colo compare secondary queue size too big,"
> > > - "drop packet");
> > > -}
> > > +ret = colo_insert_packet(&s->secondary_list, pkt,
> > > + &conn->sack);
> > >  }
> > > +
> > > +if (!ret) {
> > > +error_report("colo compare %s queue size too big,"
> > > + "drop packet", colo_mode[mode]);
> > > +packet_destroy(pkt, NULL);
> > > +pkt = NULL;
> > > +}
> > > +
> > >  *con = conn;
> > >
> > >  return 0;
> > > --
> > > 2.17.1
> >


[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-24 Thread Robert Henry
Peter: I think your intuition is right.  POPQ_RA (pop quad, passing
through the return address handle) is only called from helper_ret_protected,
and it suspiciously calls cpu_ldq_kernel_ra, which calls
cpu_mmu_index_kernel, which is only prepared for kernel-space iretq (and
of course the substring _kernel in the function name tells us that too).

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 2834.042707] CPU: 0 PID: 13537 Comm: dotnet Tainted: G  D   
5.3.0-29-generic #31-Ubuntu
  [ 2834.050591] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 
0.0.0 02/06/2015
  [ 2834.054785] RIP: 0033:0x147eaeda
  [ 2834.059017] Code: d0 00 00 00 4c 8b a7 d8 00 00 00 4c 8b af e0 00 00 00 4c 
8b b7 e8 00 00 00 4c 8b bf f0 00 00 00 48 8b bf b0 00 00 00 9d 74 02 <48> cf 48 
8d 64 24 30 5d c3 90 cc c3 66 90 55 4c 8b a7 d8 00 00 00
  [ 2834.072103] RSP: 002b:7fffc2c0 EFLAGS: 0202
  [ 2834.076507] RAX:  RBX: 1554b401af38 RCX: 
0001
  [ 2834.080832] RDX:  RSI:  RDI: 
7fffcfb0
  [ 2834.085010] RBP: 7fffd730 R08:  R09: 
7fffd1b0
  [ 2834.089184] R10: 15331dd5 R11: 153ad8d0 R12: 
0002
  [ 2834.093350] R13: 0001 R14: 0001 R15: 
1554b401d388
  [ 2834.097309] FS:  14fa5740 GS:  
  [ 2834.101131] Modules linked in: isofs nls_iso8859_1 dm_multipath 
scsi_dh_rdac scsi_dh_emc scsi_dh_alua ppdev input_leds serio_raw parport_pc 
parport sch_fq_codel ip_tables x_tables autofs4 btrfs zstd_compress raid10 
raid456 async_raid6_recov async_memcpy async_pq async_xor async_tx xor raid6_pq 
libcrc32c raid1 raid0 multipath linear crct10dif_pclmul crc32_pclmul 
ghash_clmulni_intel aesni_intel aes_x86_64 crypto_simd cryptd glue_helper 
virtio_net psmouse net_failover failover virtio_blk floppy
  [ 2834.122539] CR2: 7fffc2c0
  [ 2834.126867] ---[ end trace dfae51f1d9432708 ]---
  [ 2834.131239] RIP: 0033:0x14d793262eda
  [ 2834.135715] Code: Bad RIP value.
  [ 2834.140243] RSP: 002b:7ffddb4e2980 EFLAGS: 0202
  [ 2834.144615] RAX:  RBX: 14d6f402acb8 RCX: 
0002
  [ 2834.148943] RDX: 01cd6950 RSI:  RDI: 
7ffddb4e3670
  [ 2834.153335] RBP: 7ffddb4e3df0 R08: 0001 R09: 
7ffddb4e3870
  [ 2834.157774] R10: 14d793da9dd5 R11: 14d793e258d0 R12: 
0002
  [ 2834.162132] R13: 0001 R14: 0001 R15: 
14d6f402d040
  [ 2834.166239] FS:  14fa5740() GS:97213ba0() 
knlGS:
  [ 2834.170529] CS:  0033 DS:  ES:  CR0: 80050033
  [ 2834.174751] CR2: 14d793262eb0 CR3: 3613 CR4: 
007406f0
  [ 2834.178892] PKRU: 5554

  I run the application from a shell with `ulimit -s unlimited`
  (unlimited stack to size).

  The application creates a number of threads, and those threads make a
  lot of calls to sigaltstack() and mprotect(); see the relevant source
  for dotnet here
  
https://github.com/dotnet/runtime/blob/15ec69e47b4dc56098e6058a11ccb6ae4d5d4fa1/src/coreclr/src/pal/src/thread/thread.cpp#L2467

  using strace -f on the app shows that no 

Re: [PATCH v4 0/2] introduction of migration_version attribute for VFIO live migration

2020-03-24 Thread Yan Zhao
On Tue, Mar 24, 2020 at 10:49:54PM +0800, Alex Williamson wrote:
> On Tue, 24 Mar 2020 09:23:31 +
> "Dr. David Alan Gilbert"  wrote:
> 
> > * Yan Zhao (yan.y.z...@intel.com) wrote:
> > > On Tue, Mar 24, 2020 at 05:29:59AM +0800, Alex Williamson wrote:  
> > > > On Mon, 3 Jun 2019 20:34:22 -0400
> > > > Yan Zhao  wrote:
> > > >   
> > > > > On Tue, Jun 04, 2019 at 03:29:32AM +0800, Alex Williamson wrote:  
> > > > > > On Thu, 30 May 2019 20:44:38 -0400
> > > > > > Yan Zhao  wrote:
> > > > > > 
> > > > > > > This patchset introduces a migration_version attribute under 
> > > > > > > sysfs of VFIO
> > > > > > > Mediated devices.
> > > > > > > 
> > > > > > > This migration_version attribute is used to check migration 
> > > > > > > compatibility
> > > > > > > between two mdev devices of the same mdev type.
> > > > > > > 
> > > > > > > Patch 1 defines migration_version attribute in
> > > > > > > Documentation/vfio-mediated-device.txt
> > > > > > > 
> > > > > > > Patch 2 uses GVT as an example to show how to expose 
> > > > > > > migration_version
> > > > > > > attribute and check migration compatibility in vendor driver.
> > > > > > 
> > > > > > Thanks for iterating through this, it looks like we've settled on
> > > > > > something reasonable, but now what?  This is one piece of the 
> > > > > > puzzle to
> > > > > > supporting mdev migration, but I don't think it makes sense to 
> > > > > > commit
> > > > > > this upstream on its own without also defining the remainder of how 
> > > > > > we
> > > > > > actually do migration, preferably with more than one working
> > > > > > implementation and at least prototyped, if not final, QEMU support. 
> > > > > >  I
> > > > > > hope that was the intent, and maybe it's now time to look at the 
> > > > > > next
> > > > > > piece of the puzzle.  Thanks,
> > > > > > 
> > > > > > Alex
> > > > > 
> > > > > Got it. 
> > > > > Also thank you and all for discussing and guiding all along:)
> > > > > We'll move to the next episode now.  
> > > > 
> > > > Hi Yan,
> > > > 
> > > > As we're hopefully moving towards a migration API, would it make sense
> > > > to refresh this series at the same time?  I think we're still expecting
> > > > a vendor driver implementing Kirti's migration API to also implement
> > > > this sysfs interface for compatibility verification.  Thanks,
> > > >  
> > > Hi Alex
> > > Got it!
> > > Thanks for reminding of this. And as now we have vfio-pci implementing
> > > vendor ops to allow live migration of pass-through devices, is it
> > > necessary to implement similar sysfs node for those devices?
> > > or do you think just PCI IDs of those devices are enough for libvirt to
> > > know device compatibility ?  
> > 
> > Wasn't the problem that we'd have to know how to check for things like:
> >   a) Whether different firmware versions in the device were actually
> > compatible
> >   b) Whether minor hardware differences were compatible - e.g. some
> > hardware might let you migrate to the next version of hardware up.
> 
> Yes, minor changes in hardware or firmware that may not be represented
> in the device ID or hardware revision.  Also the version is as much for
> indicating the compatibility of the vendor defined migration protocol
> as it is for the hardware itself.  I certainly wouldn't be so bold as
> to create a protocol that is guaranteed compatible forever.  We'll need
> to expose the same sysfs attribute in some standard location for
> non-mdev devices.  I assume vfio-pci would provide the vendor ops some
> mechanism to expose these in a standard namespace of sysfs attributes
> under the device itself.  Perhaps that indicates we need to link the
> mdev type version under the mdev device as well to make this
> transparent to userspace tools like libvirt.  Thanks,
>
Got it. Will do it.
Thanks!

Yan
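The compatibility probe discussed above — read the source device's migration_version string, then attempt to write it to the destination's attribute, with a write failure meaning "incompatible" — can be sketched as a userspace helper. This is a hypothetical illustration, not libvirt's actual code; the paths and error semantics are assumptions based on the interface described in this thread:

```python
import os

def is_migration_compatible(src_dev, dst_dev):
    """Probe mdev migration compatibility via the migration_version
    attribute: read the source's opaque version string and try writing
    it to the destination; a write error means the vendor driver
    rejected the version as incompatible."""
    with open(os.path.join(src_dev, 'migration_version')) as f:
        version = f.read().strip()
    try:
        with open(os.path.join(dst_dev, 'migration_version'), 'w') as f:
            f.write(version)
    except OSError:
        return False
    return True
```

A tool would call this with the two devices' sysfs directories before attempting migration, treating the version string itself as opaque.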



[Bug 1868116] Re: QEMU monitor no longer works

2020-03-24 Thread Egmont Koblinger
Thanks for this investigation so far!

We've opened an upstream VTE issue at
https://gitlab.gnome.org/GNOME/vte/issues/222 .

We'd appreciate it if QEMU developers joined us there. Apparently QEMU uses
the "commit" signal in a way that it was not meant to be used, and thus
it's unclear what the best solution would be.

** Bug watch added: gitlab.gnome.org/GNOME/vte/issues #222
   https://gitlab.gnome.org/GNOME/vte/issues/222

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1868116

Title:
  QEMU monitor no longer works

Status in QEMU:
  New
Status in qemu package in Ubuntu:
  Triaged
Status in vte2.91 package in Ubuntu:
  New

Bug description:
  Repro:
  VTE
  $ meson _build && ninja -C _build && ninja -C _build install

  qemu:
  $ ../configure --python=/usr/bin/python3 --disable-werror --disable-user 
--disable-linux-user --disable-docs --disable-guest-agent --disable-sdl 
--enable-gtk --disable-vnc --disable-xen --disable-brlapi --disable-fdt 
--disable-hax --disable-vde --disable-netmap --disable-rbd --disable-libiscsi 
--disable-libnfs --disable-smartcard --disable-libusb --disable-usb-redir 
--disable-seccomp --disable-glusterfs --disable-tpm --disable-numa 
--disable-opengl --disable-virglrenderer --disable-xfsctl --disable-vxhs 
--disable-slirp --disable-blobs --target-list=x86_64-softmmu --disable-rdma 
--disable-pvrdma --disable-attr --disable-vhost-net --disable-vhost-vsock 
--disable-vhost-scsi --disable-vhost-crypto --disable-vhost-user 
--disable-spice --disable-qom-cast-debug --disable-vxhs --disable-bochs 
--disable-cloop --disable-dmg --disable-qcow1 --disable-vdi --disable-vvfat 
--disable-qed --disable-parallels --disable-sheepdog --disable-avx2 
--disable-nettle --disable-gnutls --disable-capstone --disable-tools 
--disable-libpmem --disable-iconv --disable-cap-ng
  $ make

  Test:
  $ LD_LIBRARY_PATH=/usr/local/lib/x86_64-linux-gnu/:$LD_LIBRARY_PATH 
./build/x86_64-softmmu/qemu-system-x86_64 -enable-kvm --drive 
media=cdrom,file=http://archive.ubuntu.com/ubuntu/dists/bionic/main/installer-amd64/current/images/netboot/mini.iso
  - switch to monitor with CTRL+ALT+2
  - try to enter something

  Affects the head of both upstream git repos.

  
  --- original bug ---

  It was observed that the QEMU console (normally accessible using
  Ctrl+Alt+2) accepts no input, so it can't be used. This is
  problematic because there are cases where it's required to send
  commands to the guest, or key combinations that the host would grab
  (such as Ctrl-Alt-F1 or Alt-F4).

  ProblemType: Bug
  DistroRelease: Ubuntu 20.04
  Package: qemu 1:4.2-3ubuntu2
  Uname: Linux 5.6.0-rc6+ x86_64
  ApportVersion: 2.20.11-0ubuntu20
  Architecture: amd64
  CurrentDesktop: XFCE
  Date: Thu Mar 19 12:16:31 2020
  Dependencies:

  InstallationDate: Installed on 2017-06-13 (1009 days ago)
  InstallationMedia: Xubuntu 17.04 "Zesty Zapus" - Release amd64 (20170412)
  KvmCmdLine:
   COMMAND STAT  EUID  RUID PIDPPID %CPU COMMAND
   qemu-system-x86 Sl+   1000  1000   34275   25235 29.2 qemu-system-x86_64 -m 
4G -cpu Skylake-Client -device virtio-vga,virgl=true,xres=1280,yres=720 -accel 
kvm -device nec-usb-xhci -serial vc -serial stdio -hda 
/home/usuario/Sistemas/androidx86.img -display gtk,gl=on -device usb-audio
   kvm-nx-lpage-re S0 0   34284   2  0.0 [kvm-nx-lpage-re]
   kvm-pit/34275   S0 0   34286   2  0.0 [kvm-pit/34275]
  MachineType: LENOVO 80UG
  ProcKernelCmdLine: BOOT_IMAGE=/boot/vmlinuz-5.6.0-rc6+ 
root=UUID=6b4ae5c0-c78c-49a6-a1ba-029192618a7a ro quiet ro kvm.ignore_msrs=1 
kvm.report_ignored_msrs=0 kvm.halt_poll_ns=0 kvm.halt_poll_ns_grow=0 
i915.enable_gvt=1 i915.fastboot=1 cgroup_enable=memory swapaccount=1 
zswap.enabled=1 zswap.zpool=z3fold 
resume=UUID=a82e38a0-8d20-49dd-9cbd-de7216b589fc log_buf_len=16M 
usbhid.quirks=0x0079:0x0006:0x10 config_scsi_mq_default=y 
scsi_mod.use_blk_mq=1 mtrr_gran_size=64M mtrr_chunk_size=64M nbd.nbds_max=2 
nbd.max_part=63
  SourcePackage: qemu
  UpgradeStatus: Upgraded to focal on 2019-12-22 (87 days ago)
  dmi.bios.date: 08/09/2018
  dmi.bios.vendor: LENOVO
  dmi.bios.version: 0XCN45WW
  dmi.board.asset.tag: NO Asset Tag
  dmi.board.name: Toronto 4A2
  dmi.board.vendor: LENOVO
  dmi.board.version: SDK0J40679 WIN
  dmi.chassis.asset.tag: NO Asset Tag
  dmi.chassis.type: 10
  dmi.chassis.vendor: LENOVO
  dmi.chassis.version: Lenovo ideapad 310-14ISK
  dmi.modalias: 
dmi:bvnLENOVO:bvr0XCN45WW:bd08/09/2018:svnLENOVO:pn80UG:pvrLenovoideapad310-14ISK:rvnLENOVO:rnToronto4A2:rvrSDK0J40679WIN:cvnLENOVO:ct10:cvrLenovoideapad310-14ISK:
  dmi.product.family: IDEAPAD
  dmi.product.name: 80UG
  dmi.product.sku: LENOVO_MT_80UG_BU_idea_FM_Lenovo ideapad 310-14ISK
  dmi.product.version: Lenovo ideapad 310-14ISK
  dmi.sys.vendor: LENOVO
  mtime.conffile..etc.apport.crashdb.conf: 2019-08-29T08:39:36.787240


Re: [PATCH v16 QEMU 00/16] Add migration support for VFIO devices

2020-03-24 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/1585084154-29461-1-git-send-email-kwankh...@nvidia.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  x86_64-softmmu/hw/vfio/pci-quirks.o
  CC  aarch64-softmmu/hw/intc/exynos4210_combiner.o
/tmp/qemu-test/src/hw/vfio/common.c: In function 'vfio_listerner_log_sync':
/tmp/qemu-test/src/hw/vfio/common.c:945:66: error: 'giommu' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 
memory_region_iommu_get_address_limit(giommu->iommu,
  ^
/tmp/qemu-test/src/hw/vfio/common.c:923:21: note: 'giommu' was declared here
 VFIOGuestIOMMU *giommu;
 ^
cc1: all warnings being treated as errors
make[1]: *** [hw/vfio/common.o] Error 1
make[1]: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/hw/intc/omap_intc.o
  CC  aarch64-softmmu/hw/intc/bcm2835_ic.o
---
  CC  aarch64-softmmu/hw/vfio/amd-xgbe.o
  CC  aarch64-softmmu/hw/virtio/virtio.o
  CC  aarch64-softmmu/hw/virtio/vhost.o
make: *** [x86_64-softmmu/all] Error 2
make: *** Waiting for unfinished jobs
  CC  aarch64-softmmu/hw/virtio/vhost-backend.o
  CC  aarch64-softmmu/hw/virtio/vhost-user.o
---
  CC  aarch64-softmmu/hw/virtio/virtio-iommu.o
  CC  aarch64-softmmu/hw/virtio/vhost-vsock.o
/tmp/qemu-test/src/hw/vfio/common.c: In function 'vfio_listerner_log_sync':
/tmp/qemu-test/src/hw/vfio/common.c:945:66: error: 'giommu' may be used 
uninitialized in this function [-Werror=maybe-uninitialized]
 
memory_region_iommu_get_address_limit(giommu->iommu,
  ^
/tmp/qemu-test/src/hw/vfio/common.c:923:21: note: 'giommu' was declared here
 VFIOGuestIOMMU *giommu;
 ^
cc1: all warnings being treated as errors
make[1]: *** [hw/vfio/common.o] Error 1
make[1]: *** Waiting for unfinished jobs
make: *** [aarch64-softmmu/all] Error 2
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 664, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=c9b01bcc7fc04e2d8f5e74bf460f0d7a', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-lne31pn7/src/docker-src.2020-03-24-19.33.46.14149:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=c9b01bcc7fc04e2d8f5e74bf460f0d7a
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-lne31pn7/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    3m5.634s
user    0m8.335s


The full log is available at
http://patchew.org/logs/1585084154-29461-1-git-send-email-kwankh...@nvidia.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH 0/6] dwc-hsotg (aka dwc2) USB host controller emulation

2020-03-24 Thread Paul Zimmerman
Thanks Gerd. I will switch over to using tracepoints, wait a few days to
see if there are any more comments, then resubmit.

Thanks,
Paul

On Mon, Mar 23, 2020 at 4:10 AM Gerd Hoffmann  wrote:

>   Hi,
>
> > 1) I have used printf-based debug statements while developing the
> >code, and have not implemented any tracing statements. I'm not
> >sure if that is considered acceptable for new code?
>
> Please use tracepoints.  I'd suggest to use the "log" trace backend
> which comes very close to printf-debugging; effectively all trace
> points are turned into runtime-switchable printf's.
>
> Mixing (temporary) debug printfs and tracepoints works.
>
> > 2) I have imported the register description file from the Linux
> >kernel. This file is licensed GPL-2 only, is this OK?
>
> Yes.  There even is a script to keep things in sync and apply some
> tweaks like replacing linux kernel types with standard C types
> (s/u32/uint32_t/ etc).
>
> See scripts/update-linux-headers.sh
>
> You might consider hooking up your file there, but probably this is
> overkill given that the register descriptions are unlikely to see
> frequent updates.
>
> > 3) The emulation does not respect the max-packet size when
> >transferring packets. Since the dwc-hsotg controller only has
> >one root port, and the Qemu USB hub is only full-speed, that
> >means every device connected has to run at full speed. That
> >makes mass-storage devices in particular run very slowly. Using
> >transfers greater than max-packet size alleviates this. Is this
> >OK? I think the EHCI emulation does the same thing, since its
> >transfers seem to run at greater than real world transfer rates.
>
> I don't think ehci uses larger packets.  I think it simply does more
> transfers than physical hardware would be able to do.
>
> uhci is pretty strict here, it counts bytes transferred and simply stops
> processing queues when it has transferred enough data for the current
> frame.  On the next frame timer tick it resumes work.  There is a
> bandwidth= property to tweak the transfer limit, you can use that to
> make uhci emulation run at the speed you want ;)
>
> ehci and xhci simply don't count bytes and don't have a limit, they go
> process queues as long as there is work to do (and they don't have to
> wait for host block I/O).
>
> > 4) I have only implemented host mode for now. Would there be any
> >benefit to implementing gadget mode as well? It seems it could
> >be useful to emulate gadget devices in Qemu, but I am not sure
> >if Qemu currently offers any support for that?
>
> No, there isn't any gadget support yet.
>
> cheers,
>   Gerd
>
>
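The per-frame byte accounting Gerd describes for uhci can be sketched as a toy scheduler. This is purely illustrative — the budget value and names are made up, not uhci's actual limits or QEMU's implementation:

```python
FRAME_BUDGET = 1280  # bytes per 1 ms frame; an illustrative number only

def process_frame(queue, budget=FRAME_BUDGET):
    """Consume transfers until the per-frame byte budget is exhausted.

    Remaining transfers stay on the queue and resume on the next
    frame timer tick, which caps emulated throughput near real-world
    rates (the behavior uhci emulation enforces).
    """
    done = []
    while queue and budget >= len(queue[0]):
        pkt = queue.pop(0)
        budget -= len(pkt)
        done.append(pkt)
    return done

# Three 600-byte packets: only two fit in one frame, the third waits.
q = [b'x' * 600] * 3
assert len(process_frame(q)) == 2 and len(q) == 1
```

ehci/xhci, by contrast, would simply drain the whole queue with no budget check.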


[PATCH v9 11/14] iotests: add script_initialize

2020-03-24 Thread John Snow
Like script_main, but doesn't require a single point of entry.
Replace all existing initialization sections with this drop-in replacement.

This brings debug support to all existing script-style iotests.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/149|  3 +-
 tests/qemu-iotests/194|  4 +-
 tests/qemu-iotests/202|  4 +-
 tests/qemu-iotests/203|  4 +-
 tests/qemu-iotests/206|  2 +-
 tests/qemu-iotests/207|  6 ++-
 tests/qemu-iotests/208|  2 +-
 tests/qemu-iotests/209|  2 +-
 tests/qemu-iotests/210|  6 ++-
 tests/qemu-iotests/211|  6 ++-
 tests/qemu-iotests/212|  6 ++-
 tests/qemu-iotests/213|  6 ++-
 tests/qemu-iotests/216|  4 +-
 tests/qemu-iotests/218|  2 +-
 tests/qemu-iotests/219|  2 +-
 tests/qemu-iotests/222|  7 ++--
 tests/qemu-iotests/224|  4 +-
 tests/qemu-iotests/228|  6 ++-
 tests/qemu-iotests/234|  4 +-
 tests/qemu-iotests/235|  4 +-
 tests/qemu-iotests/236|  2 +-
 tests/qemu-iotests/237|  2 +-
 tests/qemu-iotests/238|  2 +
 tests/qemu-iotests/242|  2 +-
 tests/qemu-iotests/246|  2 +-
 tests/qemu-iotests/248|  2 +-
 tests/qemu-iotests/254|  2 +-
 tests/qemu-iotests/255|  2 +-
 tests/qemu-iotests/256|  2 +-
 tests/qemu-iotests/258|  7 ++--
 tests/qemu-iotests/260|  4 +-
 tests/qemu-iotests/262|  4 +-
 tests/qemu-iotests/264|  4 +-
 tests/qemu-iotests/277|  2 +
 tests/qemu-iotests/280|  8 ++--
 tests/qemu-iotests/283|  4 +-
 tests/qemu-iotests/iotests.py | 75 +++
 37 files changed, 129 insertions(+), 81 deletions(-)

diff --git a/tests/qemu-iotests/149 b/tests/qemu-iotests/149
index b4a21bf7b7..852768f80a 100755
--- a/tests/qemu-iotests/149
+++ b/tests/qemu-iotests/149
@@ -382,8 +382,7 @@ def test_once(config, qemu_img=False):
 
 
 # Obviously we only work with the luks image format
-iotests.verify_image_format(supported_fmts=['luks'])
-iotests.verify_platform()
+iotests.script_initialize(supported_fmts=['luks'])
 
 # We need sudo in order to run cryptsetup to create
 # dm-crypt devices. This is safe to use on any
diff --git a/tests/qemu-iotests/194 b/tests/qemu-iotests/194
index 9dc1bd3510..8b1f720af4 100755
--- a/tests/qemu-iotests/194
+++ b/tests/qemu-iotests/194
@@ -21,8 +21,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2', 'qed', 'raw'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2', 'qed', 'raw'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('source.img') as source_img_path, \
  iotests.FilePath('dest.img') as dest_img_path, \
diff --git a/tests/qemu-iotests/202 b/tests/qemu-iotests/202
index 920a8683ef..e3900a44d1 100755
--- a/tests/qemu-iotests/202
+++ b/tests/qemu-iotests/202
@@ -24,8 +24,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('disk0.img') as disk0_img_path, \
  iotests.FilePath('disk1.img') as disk1_img_path, \
diff --git a/tests/qemu-iotests/203 b/tests/qemu-iotests/203
index 49eff5d405..4b4bd3307d 100755
--- a/tests/qemu-iotests/203
+++ b/tests/qemu-iotests/203
@@ -24,8 +24,8 @@
 
 import iotests
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
-iotests.verify_platform(['linux'])
+iotests.script_initialize(supported_fmts=['qcow2'],
+  supported_platforms=['linux'])
 
 with iotests.FilePath('disk0.img') as disk0_img_path, \
  iotests.FilePath('disk1.img') as disk1_img_path, \
diff --git a/tests/qemu-iotests/206 b/tests/qemu-iotests/206
index e2b50ae24d..f42432a838 100755
--- a/tests/qemu-iotests/206
+++ b/tests/qemu-iotests/206
@@ -23,7 +23,7 @@
 import iotests
 from iotests import imgfmt
 
-iotests.verify_image_format(supported_fmts=['qcow2'])
+iotests.script_initialize(supported_fmts=['qcow2'])
 
 with iotests.FilePath('t.qcow2') as disk_path, \
  iotests.FilePath('t.qcow2.base') as backing_path, \
diff --git a/tests/qemu-iotests/207 b/tests/qemu-iotests/207
index 3d9c1208ca..a6621410da 100755
--- a/tests/qemu-iotests/207
+++ b/tests/qemu-iotests/207
@@ -24,8 +24,10 @@ import iotests
 import subprocess
 import re
 
-iotests.verify_image_format(supported_fmts=['raw'])
-iotests.verify_protocol(supported=['ssh'])
+iotests.script_initialize(
+supported_fmts=['raw'],
+supported_protocols=['ssh'],
+)
 
 def filter_hash(qmsg):
 def _filter(key, value):
diff --git a/tests/qemu-iotests/208 b/tests/qemu-iotests/208
index 1c3fc8c7fd..6cb642f821 100755
--- a/tests/qemu-iotests/208
+++ b/tests/qemu-iotests/208
@@ -22,7 +22,7 @@
 
 import iotests
 

[PATCH v9 12/14] iotest 258: use script_main

2020-03-24 Thread John Snow
Since this one is nicely factored to use a single entry point,
use script_main to run the tests.

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/258 | 11 ---
 1 file changed, 4 insertions(+), 7 deletions(-)

diff --git a/tests/qemu-iotests/258 b/tests/qemu-iotests/258
index a65151dda6..e305a1502f 100755
--- a/tests/qemu-iotests/258
+++ b/tests/qemu-iotests/258
@@ -23,12 +23,6 @@ import iotests
 from iotests import log, qemu_img, qemu_io_silent, \
 filter_qmp_testfiles, filter_qmp_imgfmt
 
-# Need backing file and change-backing-file support
-iotests.script_initialize(
-supported_fmts=['qcow2', 'qed'],
-supported_platforms=['linux'],
-)
-
 # Returns a node for blockdev-add
 def node(node_name, path, backing=None, fmt=None, throttle=None):
 if fmt is None:
@@ -161,4 +155,7 @@ def main():
 test_concurrent_finish(False)
 
 if __name__ == '__main__':
-main()
+# Need backing file and change-backing-file support
+iotests.script_main(main,
+supported_fmts=['qcow2', 'qed'],
+supported_platforms=['linux'])
-- 
2.21.1




[PATCH v9 08/14] iotests: touch up log function signature

2020-03-24 Thread John Snow
Representing nested, recursive data structures in mypy is notoriously
difficult; the best we can reliably do right now is denote the atom
types as "Any" while describing the general shape of the data.

Regardless, this fully annotates the log() function.

Typing notes:

TypeVar is a type variable that can optionally be constrained by a
sequence of possible types. This variable is bound per-invocation, such
that the signature for filters=() requires that its callables take e.g. a
str and return a str.
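The per-invocation binding described above can be shown with a standalone toy (this is an illustration of the pattern, not the iotests.py code itself; `apply_filters` is a hypothetical name):

```python
from typing import Any, Callable, Dict, Iterable, List, TypeVar

# Msg is bound per call site: if msg is a str, every filter must be
# Callable[[str], str]; if msg is a dict, Callable[[dict], dict].
Msg = TypeVar('Msg', Dict[str, Any], List[Any], str)

def apply_filters(msg: Msg,
                  filters: Iterable[Callable[[Msg], Msg]] = ()) -> Msg:
    for flt in filters:
        msg = flt(msg)
    return msg

# With msg: str, mypy checks each filter as Callable[[str], str].
print(apply_filters('secret=hunter2',
                    [lambda s: s.replace('hunter2', '***')]))
```

Passing a str message with a dict-to-dict filter would be rejected by mypy, which is exactly the constraint the log() signature wants.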

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index c93c6b4557..3a049ece5b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -28,6 +28,7 @@
 import struct
 import subprocess
 import sys
+from typing import (Any, Callable, Dict, Iterable, List, Optional, TypeVar)
 import unittest
 
 # pylint: disable=import-error, wrong-import-position
@@ -353,9 +354,16 @@ def _filter(_key, value):
 return value
 return filter_qmp(qmsg, _filter)
 
-def log(msg, filters=(), indent=None):
-'''Logs either a string message or a JSON serializable message (like QMP).
-If indent is provided, JSON serializable messages are pretty-printed.'''
+
+Msg = TypeVar('Msg', Dict[str, Any], List[Any], str)
+
+def log(msg: Msg,
+filters: Iterable[Callable[[Msg], Msg]] = (),
+indent: Optional[int] = None) -> None:
+"""
+Logs either a string message or a JSON serializable message (like QMP).
+If indent is provided, JSON serializable messages are pretty-printed.
+"""
 for flt in filters:
 msg = flt(msg)
 if isinstance(msg, (dict, list)):
-- 
2.21.1




[PATCH v9 05/14] iotests: add pylintrc file

2020-03-24 Thread John Snow
This allows others to get repeatable results with pylint. If you run
`pylint iotests.py`, you should see a 100% pass.

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/pylintrc | 22 ++
 1 file changed, 22 insertions(+)
 create mode 100644 tests/qemu-iotests/pylintrc

diff --git a/tests/qemu-iotests/pylintrc b/tests/qemu-iotests/pylintrc
new file mode 100644
index 00..8720b6a0de
--- /dev/null
+++ b/tests/qemu-iotests/pylintrc
@@ -0,0 +1,22 @@
+[MESSAGES CONTROL]
+
+# Disable the message, report, category or checker with the given id(s). You
+# can either give multiple identifiers separated by comma (,) or put this
+# option multiple times (only on the command line, not in the configuration
+# file where it should appear only once). You can also use "--disable=all" to
+# disable everything first and then reenable specific checks. For example, if
+# you want to run only the similarities checker, you can use "--disable=all
+# --enable=similarities". If you want to run only the classes checker, but have
+# no Warning level messages displayed, use "--disable=all --enable=classes
+# --disable=W".
+disable=invalid-name,
+no-else-return,
+too-many-lines,
+too-few-public-methods,
+too-many-arguments,
+too-many-locals,
+too-many-branches,
+too-many-public-methods,
+# These are temporary, and should be removed:
+missing-docstring,
+line-too-long,
-- 
2.21.1




[PATCH v9 03/14] iotests: ignore import warnings from pylint

2020-03-24 Thread John Snow
The right way to solve this is to come up with a virtual environment
infrastructure that sets all the paths correctly, and/or to create
installable python modules that can be imported normally.

That's hard, so just silence this error for now.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7f486e6c4b..0eccca88e0 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -30,6 +30,7 @@
 from collections import OrderedDict
 import faulthandler
 
+# pylint: disable=import-error, wrong-import-position
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qtest
 
-- 
2.21.1




[PATCH v9 07/14] iotests: drop pre-Python 3.4 compatibility code

2020-03-24 Thread John Snow
We no longer need to accommodate 3.4, drop this code.
(The lines were > 79 chars and it stood out.)

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 2a0e22a3db..c93c6b4557 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -359,12 +359,9 @@ def log(msg, filters=(), indent=None):
 for flt in filters:
 msg = flt(msg)
 if isinstance(msg, (dict, list)):
-# Python < 3.4 needs to know not to add whitespace when 
pretty-printing:
-separators = (', ', ': ') if indent is None else (',', ': ')
 # Don't sort if it's already sorted
 do_sort = not isinstance(msg, OrderedDict)
-print(json.dumps(msg, sort_keys=do_sort,
- indent=indent, separators=separators))
+print(json.dumps(msg, sort_keys=do_sort, indent=indent))
 else:
 print(msg)
 
-- 
2.21.1




[PATCH v9 10/14] iotests: add hmp helper with logging

2020-03-24 Thread John Snow
Just a mild cleanup while I was here.

Although we now have universal qmp logging on or off, many existing
callers to hmp functions don't expect that output to be logged, which
causes quite a few changes in the test output.

For now, just offer a use_log parameter.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 30 +-
 1 file changed, 17 insertions(+), 13 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index e12d6e533e..4faee06f14 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -540,25 +540,29 @@ def add_incoming(self, addr):
 self._args.append(addr)
 return self
 
-def pause_drive(self, drive, event=None):
-'''Pause drive r/w operations'''
+def hmp(self, command_line: str, use_log: bool = False):
+cmd = 'human-monitor-command'
+kwargs = {'command-line': command_line}
+if use_log:
+return self.qmp_log(cmd, **kwargs)
+else:
+return self.qmp(cmd, **kwargs)
+
+def pause_drive(self, drive: str, event: Optional[str] = None) -> None:
+"""Pause drive r/w operations"""
 if not event:
 self.pause_drive(drive, "read_aio")
 self.pause_drive(drive, "write_aio")
 return
-self.qmp('human-monitor-command',
- command_line='qemu-io %s "break %s bp_%s"'
- % (drive, event, drive))
+self.hmp(f'qemu-io {drive} "break {event} bp_{drive}"')
 
-def resume_drive(self, drive):
-self.qmp('human-monitor-command',
- command_line='qemu-io %s "remove_break bp_%s"'
- % (drive, drive))
+def resume_drive(self, drive: str) -> None:
+"""Resume drive r/w operations"""
+self.hmp(f'qemu-io {drive} "remove_break bp_{drive}"')
 
-def hmp_qemu_io(self, drive, cmd):
-'''Write to a given drive using an HMP command'''
-return self.qmp('human-monitor-command',
-command_line='qemu-io %s "%s"' % (drive, cmd))
+def hmp_qemu_io(self, drive: str, cmd: str, use_log: bool = False) -> None:
+"""Write to a given drive using an HMP command"""
+return self.hmp(f'qemu-io {drive} "{cmd}"', use_log=use_log)
 
 def flatten_qmp_object(self, obj, output=None, basestr=''):
 if output is None:
-- 
2.21.1




[PATCH v9 14/14] iotests: use python logging for iotests.log()

2020-03-24 Thread John Snow
We can turn logging on/off globally instead of per-function.

Remove use_log from run_job, and use python logging to turn on
diffable output when we run through a script entry point.

iotest 245 changes output order due to buffering.


An extended note on python logging:

A NullHandler is added to `qemu.iotests` to stop output from being
generated if this code is used as a library without configuring logging.
A NullHandler is only needed at the root, so a duplicate handler is not
needed for `qemu.iotests.diff_io`.

When logging is not configured, messages at the 'WARNING' levels or
above are printed with default settings. The NullHandler stops this from
occurring, which is considered good hygiene for code used as a library.

See https://docs.python.org/3/howto/logging.html#library-config

When logging is actually enabled (always at the behest of an explicit
call by a client script), a root logger is implicitly created at the
root, which allows messages to propagate upwards and be handled/emitted
from the root logger with default settings.

When we want iotest logging, we attach a handler to the
qemu.iotests.diff_io logger and disable propagation to avoid possible
double-printing.

For more information on python logging infrastructure, I highly
recommend downloading the pip package `logging_tree`, which provides
convenient visualizations of the hierarchical logging configuration
under different circumstances.

See https://pypi.org/project/logging_tree/ for more information.
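The handler arrangement this note describes can be reproduced in a standalone sketch. Logger names follow the commit message; `activate_logging` here is a simplified stand-in for the iotests helper, not its actual implementation:

```python
import logging

# Library side: a NullHandler at the package root means WARNING-and-up
# records are found by a handler during propagation, so Python's
# last-resort stderr output never fires for unconfigured clients.
logging.getLogger('qemu.iotests').addHandler(logging.NullHandler())

test_logger = logging.getLogger('qemu.iotests.diff_io')
test_logger.warning('silently dropped: client never configured logging')

def activate_logging():
    """Client side: opt in to diffable test output by attaching a real
    handler and disabling propagation to avoid double-printing."""
    handler = logging.StreamHandler()
    handler.setFormatter(logging.Formatter('%(message)s'))
    test_logger.addHandler(handler)
    test_logger.setLevel(logging.INFO)
    test_logger.propagate = False

activate_logging()
test_logger.info('emitted: a handler is now attached')
```

Running `logging_tree.printout()` before and after `activate_logging()` makes the two configurations easy to compare.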

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/030|  4 +--
 tests/qemu-iotests/155|  2 +-
 tests/qemu-iotests/245|  1 +
 tests/qemu-iotests/245.out| 24 
 tests/qemu-iotests/iotests.py | 52 +++
 5 files changed, 45 insertions(+), 38 deletions(-)

diff --git a/tests/qemu-iotests/030 b/tests/qemu-iotests/030
index aa911d266a..104e3cee1b 100755
--- a/tests/qemu-iotests/030
+++ b/tests/qemu-iotests/030
@@ -411,8 +411,8 @@ class TestParallelOps(iotests.QMPTestCase):
 result = self.vm.qmp('block-job-set-speed', device='drive0', speed=0)
 self.assert_qmp(result, 'return', {})
 
-self.vm.run_job(job='drive0', auto_dismiss=True, use_log=False)
-self.vm.run_job(job='node4', auto_dismiss=True, use_log=False)
+self.vm.run_job(job='drive0', auto_dismiss=True)
+self.vm.run_job(job='node4', auto_dismiss=True)
 self.assert_no_active_block_jobs()
 
 # Test a block-stream and a block-commit job in parallel
diff --git a/tests/qemu-iotests/155 b/tests/qemu-iotests/155
index 571bce9de4..cb371d4649 100755
--- a/tests/qemu-iotests/155
+++ b/tests/qemu-iotests/155
@@ -188,7 +188,7 @@ class MirrorBaseClass(BaseClass):
 
 self.assert_qmp(result, 'return', {})
 
-self.vm.run_job('mirror-job', use_log=False, auto_finalize=False,
+self.vm.run_job('mirror-job', auto_finalize=False,
 pre_finalize=self.openBacking, auto_dismiss=True)
 
 def testFull(self):
diff --git a/tests/qemu-iotests/245 b/tests/qemu-iotests/245
index 1001275a44..4f5f0bb901 100755
--- a/tests/qemu-iotests/245
+++ b/tests/qemu-iotests/245
@@ -1027,5 +1027,6 @@ class TestBlockdevReopen(iotests.QMPTestCase):
 self.run_test_iothreads(None, 'iothread0')
 
 if __name__ == '__main__':
+iotests.activate_logging()
 iotests.main(supported_fmts=["qcow2"],
  supported_protocols=["file"])
diff --git a/tests/qemu-iotests/245.out b/tests/qemu-iotests/245.out
index 682b93394d..4b33dcaf5c 100644
--- a/tests/qemu-iotests/245.out
+++ b/tests/qemu-iotests/245.out
@@ -1,17 +1,17 @@
+{"execute": "job-finalize", "arguments": {"id": "commit0"}}
+{"return": {}}
+{"data": {"id": "commit0", "type": "commit"}, "event": "BLOCK_JOB_PENDING", 
"timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "commit0", "len": 3145728, "offset": 3145728, "speed": 0, 
"type": "commit"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-finalize", "arguments": {"id": "stream0"}}
+{"return": {}}
+{"data": {"id": "stream0", "type": "stream"}, "event": "BLOCK_JOB_PENDING", 
"timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "stream0", "len": 3145728, "offset": 3145728, "speed": 0, 
"type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
+{"execute": "job-finalize", "arguments": {"id": "stream0"}}
+{"return": {}}
+{"data": {"id": "stream0", "type": "stream"}, "event": "BLOCK_JOB_PENDING", 
"timestamp": {"microseconds": "USECS", "seconds": "SECS"}}
+{"data": {"device": "stream0", "len": 3145728, "offset": 3145728, "speed": 0, 
"type": "stream"}, "event": "BLOCK_JOB_COMPLETED", "timestamp": 
{"microseconds": "USECS", "seconds": "SECS"}}
 .
 --
 

[PATCH v9 13/14] iotests: Mark verify functions as private

2020-03-24 Thread John Snow
Mark the verify functions as "private" with a leading underscore, to
discourage their use.

(Also, make pending patches not yet using the new entry points fail in a
very obvious way.)

Signed-off-by: John Snow 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 20 ++--
 1 file changed, 10 insertions(+), 10 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index fbca0f2a40..a356dd1b45 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -1000,7 +1000,7 @@ def case_notrun(reason):
 open('%s/%s.casenotrun' % (output_dir, seq), 'a').write(
 '[case not run] ' + reason + '\n')
 
-def verify_image_format(supported_fmts=(), unsupported_fmts=()):
+def _verify_image_format(supported_fmts=(), unsupported_fmts=()):
 assert not (supported_fmts and unsupported_fmts)
 
 if 'generic' in supported_fmts and \
@@ -1014,7 +1014,7 @@ def verify_image_format(supported_fmts=(), 
unsupported_fmts=()):
 if not_sup or (imgfmt in unsupported_fmts):
 notrun('not suitable for this image format: %s' % imgfmt)
 
-def verify_protocol(supported=(), unsupported=()):
+def _verify_protocol(supported=(), unsupported=()):
 assert not (supported and unsupported)
 
 if 'generic' in supported:
@@ -1024,7 +1024,7 @@ def verify_protocol(supported=(), unsupported=()):
 if not_sup or (imgproto in unsupported):
 notrun('not suitable for this protocol: %s' % imgproto)
 
-def verify_platform(supported=(), unsupported=()):
+def _verify_platform(supported=(), unsupported=()):
 if any((sys.platform.startswith(x) for x in unsupported)):
 notrun('not suitable for this OS: %s' % sys.platform)
 
@@ -1032,11 +1032,11 @@ def verify_platform(supported=(), unsupported=()):
 if not any((sys.platform.startswith(x) for x in supported)):
 notrun('not suitable for this OS: %s' % sys.platform)
 
-def verify_cache_mode(supported_cache_modes=()):
+def _verify_cache_mode(supported_cache_modes=()):
 if supported_cache_modes and (cachemode not in supported_cache_modes):
 notrun('not suitable for this cache mode: %s' % cachemode)
 
-def verify_aio_mode(supported_aio_modes=()):
+def _verify_aio_mode(supported_aio_modes=()):
 if supported_aio_modes and (aiomode not in supported_aio_modes):
 notrun('not suitable for this aio mode: %s' % aiomode)
 
@@ -1163,11 +1163,11 @@ def execute_setup_common(supported_fmts: Collection[str] = (),
 sys.stderr.write('Please run this test via the "check" script\n')
 sys.exit(os.EX_USAGE)
 
-verify_image_format(supported_fmts, unsupported_fmts)
-verify_protocol(supported_protocols, unsupported_protocols)
-verify_platform(supported=supported_platforms)
-verify_cache_mode(supported_cache_modes)
-verify_aio_mode(supported_aio_modes)
+_verify_image_format(supported_fmts, unsupported_fmts)
+_verify_protocol(supported_protocols, unsupported_protocols)
+_verify_platform(supported=supported_platforms)
+_verify_cache_mode(supported_cache_modes)
+_verify_aio_mode(supported_aio_modes)
 
 debug = '-d' in sys.argv
 if debug:
-- 
2.21.1




[PATCH v9 06/14] iotests: alphabetize standard imports

2020-03-24 Thread John Snow
I had to fix a merge conflict, so do this tiny harmless thing while I'm
here.

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 18 +-
 1 file changed, 9 insertions(+), 9 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 20da488ad6..2a0e22a3db 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -16,19 +16,19 @@
 # along with this program.  If not, see .
 #
 
+import atexit
+from collections import OrderedDict
+import faulthandler
+import io
+import json
+import logging
 import os
 import re
+import signal
+import struct
 import subprocess
-import unittest
 import sys
-import struct
-import json
-import signal
-import logging
-import atexit
-import io
-from collections import OrderedDict
-import faulthandler
+import unittest
 
 # pylint: disable=import-error, wrong-import-position
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
-- 
2.21.1




[PATCH v9 09/14] iotests: limit line length to 79 chars

2020-03-24 Thread John Snow
79 is the PEP8 recommendation. This recommendation works well for
reading patch diffs in TUI email clients.
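The rewrapping in the diff below leans on Python's implicit concatenation of adjacent string literals; a minimal sketch of why splitting a long regex this way is safe (the pattern is illustrative, not the exact one from iotests.py):

```python
import re

# Adjacent string literals are concatenated at compile time, so a long
# pattern can be split across lines without changing its meaning.
split_re = re.compile(r"[0-9]+ ops; "
                      r"[0-9:./ sec]+ "
                      r"\([0-9./]+ ops/sec\)")

one_line_re = re.compile(r"[0-9]+ ops; [0-9:./ sec]+ \([0-9./]+ ops/sec\)")

print(split_re.pattern == one_line_re.pattern)  # -> True
```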

Signed-off-by: John Snow 
---
 tests/qemu-iotests/iotests.py | 64 +++
 tests/qemu-iotests/pylintrc   |  6 +++-
 2 files changed, 47 insertions(+), 23 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 3a049ece5b..e12d6e533e 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -80,9 +80,11 @@
 def qemu_img(*args):
 '''Run qemu-img and return the exit code'''
 devnull = open('/dev/null', 'r+')
-exitcode = subprocess.call(qemu_img_args + list(args), stdin=devnull, stdout=devnull)
+exitcode = subprocess.call(qemu_img_args + list(args),
+   stdin=devnull, stdout=devnull)
 if exitcode < 0:
-sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+sys.stderr.write('qemu-img received signal %i: %s\n'
+ % (-exitcode, ' '.join(qemu_img_args + list(args))))
 return exitcode
 
 def ordered_qmp(qmsg, conv_keys=True):
@@ -121,7 +123,8 @@ def qemu_img_verbose(*args):
 '''Run qemu-img without suppressing its output and return the exit code'''
 exitcode = subprocess.call(qemu_img_args + list(args))
 if exitcode < 0:
-sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+sys.stderr.write('qemu-img received signal %i: %s\n'
+ % (-exitcode, ' '.join(qemu_img_args + list(args))))
 return exitcode
 
 def qemu_img_pipe(*args):
@@ -132,7 +135,8 @@ def qemu_img_pipe(*args):
 universal_newlines=True)
 exitcode = subp.wait()
 if exitcode < 0:
-sys.stderr.write('qemu-img received signal %i: %s\n' % (-exitcode, ' '.join(qemu_img_args + list(args))))
+sys.stderr.write('qemu-img received signal %i: %s\n'
+ % (-exitcode, ' '.join(qemu_img_args + list(args))))
 return subp.communicate()[0]
 
 def qemu_img_log(*args):
@@ -162,7 +166,8 @@ def qemu_io(*args):
 universal_newlines=True)
 exitcode = subp.wait()
 if exitcode < 0:
-sys.stderr.write('qemu-io received signal %i: %s\n' % (-exitcode, ' '.join(args)))
+sys.stderr.write('qemu-io received signal %i: %s\n'
+ % (-exitcode, ' '.join(args)))
 return subp.communicate()[0]
 
 def qemu_io_log(*args):
@@ -284,10 +289,13 @@ def filter_test_dir(msg):
 def filter_win32(msg):
 return win32_re.sub("", msg)
 
-qemu_io_re = re.compile(r"[0-9]* ops; [0-9\/:. sec]* \([0-9\/.inf]* [EPTGMKiBbytes]*\/sec and [0-9\/.inf]* ops\/sec\)")
+qemu_io_re = re.compile(r"[0-9]* ops; [0-9\/:. sec]* "
+r"\([0-9\/.inf]* [EPTGMKiBbytes]*\/sec "
+r"and [0-9\/.inf]* ops\/sec\)")
 def filter_qemu_io(msg):
 msg = filter_win32(msg)
-return qemu_io_re.sub("X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)", msg)
+return qemu_io_re.sub("X ops; XX:XX:XX.X "
+  "(XXX YYY/sec and XXX ops/sec)", msg)
 
 chown_re = re.compile(r"chown [0-9]+:[0-9]+")
 def filter_chown(msg):
@@ -339,7 +347,9 @@ def filter_img_info(output, filename):
 line = line.replace(filename, 'TEST_IMG') \
.replace(imgfmt, 'IMGFMT')
 line = re.sub('iters: [0-9]+', 'iters: XXX', line)
-line = re.sub('uuid: [-a-f0-9]+', 'uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX', line)
+line = re.sub('uuid: [-a-f0-9]+',
+  'uuid: XXXXXXXX-XXXX-XXXX-XXXX-XXXXXXXXXXXX',
+  line)
 line = re.sub('cid: [0-9]+', 'cid: XX', line)
 lines.append(line)
 return '\n'.join(lines)
@@ -537,11 +547,13 @@ def pause_drive(self, drive, event=None):
 self.pause_drive(drive, "write_aio")
 return
 self.qmp('human-monitor-command',
- command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+ command_line='qemu-io %s "break %s bp_%s"'
+ % (drive, event, drive))
 
 def resume_drive(self, drive):
 self.qmp('human-monitor-command',
- command_line='qemu-io %s "remove_break bp_%s"' % (drive, drive))
+ command_line='qemu-io %s "remove_break bp_%s"'
+ % (drive, drive))
 
 def hmp_qemu_io(self, drive, cmd):
 '''Write to a given drive using an HMP command'''
@@ -801,16 +813,18 @@ def dictpath(self, d, path):
 idx = int(idx)
 
 if not isinstance(d, dict) or component not in d:
-self.fail('failed path traversal for "%s" in "%s"' % (path, str(d)))
+self.fail(f'failed path traversal for "{path}" in "{d}"')
 d = d[component]
 
 if m:

[PATCH v9 02/14] iotests: don't use 'format' for drive_add

2020-03-24 Thread John Snow
It shadows (with a different type) the built-in format.
Use something else.
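A hedged sketch of the hazard (the names are hypothetical, not from the patch): once a parameter is named `format`, the built-in `format()` is unreachable inside the function body, and calling it raises a TypeError because the local string is not callable.

```python
# Hypothetical sketch: a parameter named "format" shadows the built-in,
# so format() inside the function resolves to the string parameter.
def add_drive_bad(path, format='raw'):
    try:
        # This is the *parameter* (a str), not builtins.format -> TypeError.
        label = format(len(path), '04d')
    except TypeError:
        label = 'shadowed'
    return label

def add_drive_good(path, img_format='raw'):
    # Renaming the parameter keeps the built-in format() usable.
    return format(len(path), '04d')

print(add_drive_bad('img'))   # -> shadowed
print(add_drive_good('img'))  # -> 0003
```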

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/055| 3 ++-
 tests/qemu-iotests/iotests.py | 6 +++---
 2 files changed, 5 insertions(+), 4 deletions(-)

diff --git a/tests/qemu-iotests/055 b/tests/qemu-iotests/055
index 82b9f5f47d..4175fff5e4 100755
--- a/tests/qemu-iotests/055
+++ b/tests/qemu-iotests/055
@@ -469,7 +469,8 @@ class TestDriveCompression(iotests.QMPTestCase):
 qemu_img('create', '-f', fmt, blockdev_target_img,
  str(TestDriveCompression.image_len), *args)
 if attach_target:
-self.vm.add_drive(blockdev_target_img, format=fmt, interface="none")
+self.vm.add_drive(blockdev_target_img,
+  img_format=fmt, interface="none")
 
 self.vm.launch()
 
diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 886ae962ae..7f486e6c4b 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -485,21 +485,21 @@ def add_drive_raw(self, opts):
 self._args.append(opts)
 return self
 
-def add_drive(self, path, opts='', interface='virtio', format=imgfmt):
+def add_drive(self, path, opts='', interface='virtio', img_format=imgfmt):
 '''Add a virtio-blk drive to the VM'''
 options = ['if=%s' % interface,
'id=drive%d' % self._num_drives]
 
 if path is not None:
 options.append('file=%s' % path)
-options.append('format=%s' % format)
+options.append('format=%s' % img_format)
 options.append('cache=%s' % cachemode)
 options.append('aio=%s' % aiomode)
 
 if opts:
 options.append(opts)
 
-if format == 'luks' and 'key-secret' not in opts:
+if img_format == 'luks' and 'key-secret' not in opts:
 # default luks support
 if luks_default_secret_object not in self._args:
 self.add_object(luks_default_secret_object)
-- 
2.21.1




[PATCH v9 04/14] iotests: replace mutable list default args

2020-03-24 Thread John Snow
It's bad hygiene: if we modify this list, it will be modified across all
invocations.

(Remaining bad usages are fixed in a subsequent patch which changes the
function signature anyway.)
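The pitfall being fixed can be shown in a few lines (a standalone sketch, not code from iotests.py): a mutable default is evaluated once at function definition, so state leaks across calls, whereas an immutable tuple default avoids the sharing.

```python
# Hypothetical sketch of the mutable-default pitfall: the list default is
# created once, at definition time, and shared by every call.
def collect_bad(item, acc=[]):
    acc.append(item)
    return acc

def collect_good(item, acc=()):
    # Tuples are immutable; build a fresh list on each call instead.
    return list(acc) + [item]

print(collect_bad('a'))   # -> ['a']
print(collect_bad('b'))   # -> ['a', 'b']  (state leaked between calls)
print(collect_good('a'))  # -> ['a']
print(collect_good('b'))  # -> ['b']
```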

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 24 
 1 file changed, 12 insertions(+), 12 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 0eccca88e0..20da488ad6 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -139,7 +139,7 @@ def qemu_img_log(*args):
 log(result, filters=[filter_testfiles])
 return result
 
-def img_info_log(filename, filter_path=None, imgopts=False, extra_args=[]):
+def img_info_log(filename, filter_path=None, imgopts=False, extra_args=()):
 args = ['info']
 if imgopts:
 args.append('--image-opts')
@@ -353,7 +353,7 @@ def _filter(_key, value):
 return value
 return filter_qmp(qmsg, _filter)
 
-def log(msg, filters=[], indent=None):
+def log(msg, filters=(), indent=None):
 '''Logs either a string message or a JSON serializable message (like QMP).
 If indent is provided, JSON serializable messages are pretty-printed.'''
 for flt in filters:
@@ -569,7 +569,7 @@ def get_qmp_events_filtered(self, wait=60.0):
 result.append(filter_qmp_event(ev))
 return result
 
-def qmp_log(self, cmd, filters=[], indent=None, **kwargs):
+def qmp_log(self, cmd, filters=(), indent=None, **kwargs):
 full_cmd = OrderedDict((
 ("execute", cmd),
 ("arguments", ordered_qmp(kwargs))
@@ -973,7 +973,7 @@ def case_notrun(reason):
 open('%s/%s.casenotrun' % (output_dir, seq), 'a').write(
 '[case not run] ' + reason + '\n')
 
-def verify_image_format(supported_fmts=[], unsupported_fmts=[]):
+def verify_image_format(supported_fmts=(), unsupported_fmts=()):
 assert not (supported_fmts and unsupported_fmts)
 
 if 'generic' in supported_fmts and \
@@ -987,7 +987,7 @@ def verify_image_format(supported_fmts=[], unsupported_fmts=[]):
 if not_sup or (imgfmt in unsupported_fmts):
 notrun('not suitable for this image format: %s' % imgfmt)
 
-def verify_protocol(supported=[], unsupported=[]):
+def verify_protocol(supported=(), unsupported=()):
 assert not (supported and unsupported)
 
 if 'generic' in supported:
@@ -1006,11 +1006,11 @@ def verify_platform(supported=None, unsupported=None):
 if not any((sys.platform.startswith(x) for x in supported)):
 notrun('not suitable for this OS: %s' % sys.platform)
 
-def verify_cache_mode(supported_cache_modes=[]):
+def verify_cache_mode(supported_cache_modes=()):
 if supported_cache_modes and (cachemode not in supported_cache_modes):
 notrun('not suitable for this cache mode: %s' % cachemode)
 
-def verify_aio_mode(supported_aio_modes=[]):
+def verify_aio_mode(supported_aio_modes=()):
 if supported_aio_modes and (aiomode not in supported_aio_modes):
 notrun('not suitable for this aio mode: %s' % aiomode)
 
@@ -1050,7 +1050,7 @@ def supported_formats(read_only=False):
 
 return supported_formats.formats[read_only]
 
-def skip_if_unsupported(required_formats=[], read_only=False):
+def skip_if_unsupported(required_formats=(), read_only=False):
 '''Skip Test Decorator
Runs the test if all the required formats are whitelisted'''
 def skip_test_decorator(func):
@@ -1101,11 +1101,11 @@ def execute_unittest(output, verbosity, debug):
 sys.stderr.write(out)
 
 def execute_test(test_function=None,
- supported_fmts=[],
+ supported_fmts=(),
  supported_platforms=None,
- supported_cache_modes=[], supported_aio_modes={},
- unsupported_fmts=[], supported_protocols=[],
- unsupported_protocols=[]):
+ supported_cache_modes=(), supported_aio_modes=(),
+ unsupported_fmts=(), supported_protocols=(),
+ unsupported_protocols=()):
 """Run either unittest or script-style tests."""
 
 # We are using TEST_DIR and QEMU_DEFAULT_MACHINE as proxies to
-- 
2.21.1




[PATCH v9 01/14] iotests: do a light delinting

2020-03-24 Thread John Snow
This doesn't fix everything in here, but it does help clean up the
pylint report considerably.

This should be 100% style changes only; the intent is to make pylint
more useful by working on establishing a baseline for iotests that we
can gate against in the future.

Signed-off-by: John Snow 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Max Reitz 
---
 tests/qemu-iotests/iotests.py | 83 ++-
 1 file changed, 43 insertions(+), 40 deletions(-)

diff --git a/tests/qemu-iotests/iotests.py b/tests/qemu-iotests/iotests.py
index 7bc4934cd2..886ae962ae 100644
--- a/tests/qemu-iotests/iotests.py
+++ b/tests/qemu-iotests/iotests.py
@@ -16,11 +16,9 @@
 # along with this program.  If not, see .
 #
 
-import errno
 import os
 import re
 import subprocess
-import string
 import unittest
 import sys
 import struct
@@ -35,7 +33,7 @@
 sys.path.append(os.path.join(os.path.dirname(__file__), '..', '..', 'python'))
 from qemu import qtest
 
-assert sys.version_info >= (3,6)
+assert sys.version_info >= (3, 6)
 
 faulthandler.enable()
 
@@ -141,11 +139,11 @@ def qemu_img_log(*args):
 return result
 
 def img_info_log(filename, filter_path=None, imgopts=False, extra_args=[]):
-args = [ 'info' ]
+args = ['info']
 if imgopts:
 args.append('--image-opts')
 else:
-args += [ '-f', imgfmt ]
+args += ['-f', imgfmt]
 args += extra_args
 args.append(filename)
 
@@ -224,7 +222,7 @@ def cmd(self, cmd):
 # quit command is in close(), '\n' is added automatically
 assert '\n' not in cmd
 cmd = cmd.strip()
-assert cmd != 'q' and cmd != 'quit'
+assert cmd not in ('q', 'quit')
 self._p.stdin.write(cmd + '\n')
 self._p.stdin.flush()
 return self._read_output()
@@ -246,10 +244,8 @@ def qemu_nbd_early_pipe(*args):
 sys.stderr.write('qemu-nbd received signal %i: %s\n' %
  (-exitcode,
  ' '.join(qemu_nbd_args + ['--fork'] + list(args))))
-if exitcode == 0:
-return exitcode, ''
-else:
-return exitcode, subp.communicate()[0]
+
+return exitcode, subp.communicate()[0] if exitcode else ''
 
 def qemu_nbd_popen(*args):
 '''Run qemu-nbd in daemon mode and return the parent's exit code'''
@@ -313,7 +309,7 @@ def filter_qmp(qmsg, filter_fn):
 items = qmsg.items()
 
 for k, v in items:
-if isinstance(v, list) or isinstance(v, dict):
+if isinstance(v, (dict, list)):
 qmsg[k] = filter_qmp(v, filter_fn)
 else:
 qmsg[k] = filter_fn(k, v)
@@ -324,7 +320,7 @@ def filter_testfiles(msg):
 return msg.replace(prefix, 'TEST_DIR/PID-')
 
 def filter_qmp_testfiles(qmsg):
-def _filter(key, value):
+def _filter(_key, value):
 if is_str(value):
 return filter_testfiles(value)
 return value
@@ -350,7 +346,7 @@ def filter_imgfmt(msg):
 return msg.replace(imgfmt, 'IMGFMT')
 
 def filter_qmp_imgfmt(qmsg):
-def _filter(key, value):
+def _filter(_key, value):
 if is_str(value):
 return filter_imgfmt(value)
 return value
@@ -361,7 +357,7 @@ def log(msg, filters=[], indent=None):
 If indent is provided, JSON serializable messages are pretty-printed.'''
 for flt in filters:
 msg = flt(msg)
-if isinstance(msg, dict) or isinstance(msg, list):
+if isinstance(msg, (dict, list)):
 # Python < 3.4 needs to know not to add whitespace when pretty-printing:
 separators = (', ', ': ') if indent is None else (',', ': ')
 # Don't sort if it's already sorted
@@ -372,14 +368,14 @@ def log(msg, filters=[], indent=None):
 print(msg)
 
 class Timeout:
-def __init__(self, seconds, errmsg = "Timeout"):
+def __init__(self, seconds, errmsg="Timeout"):
 self.seconds = seconds
 self.errmsg = errmsg
 def __enter__(self):
 signal.signal(signal.SIGALRM, self.timeout)
 signal.setitimer(signal.ITIMER_REAL, self.seconds)
 return self
-def __exit__(self, type, value, traceback):
+def __exit__(self, exc_type, value, traceback):
 signal.setitimer(signal.ITIMER_REAL, 0)
 return False
 def timeout(self, signum, frame):
@@ -388,7 +384,7 @@ def timeout(self, signum, frame):
 def file_pattern(name):
 return "{0}-{1}".format(os.getpid(), name)
 
-class FilePaths(object):
+class FilePaths:
 """
 FilePaths is an auto-generated filename that cleans itself up.
 
@@ -535,11 +531,11 @@ def pause_drive(self, drive, event=None):
 self.pause_drive(drive, "write_aio")
 return
 self.qmp('human-monitor-command',
-command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
+ command_line='qemu-io %s "break %s bp_%s"' % (drive, event, drive))
 
 def resume_drive(self, drive):
 

[PATCH v9 00/14] iotests: use python logging

2020-03-24 Thread John Snow
This series uses python logging to enable output conditionally on
iotests.log(). We unify an initialization call (which also enables
debugging output for those tests with -d) and then make the switch
inside of iotests.

It will help alleviate the need to create logged/unlogged versions
of all the various helpers we have made.

Also, I got lost and accidentally delinted iotests while I was here.
Sorry about that. By version 9, it's now the overwhelming focus of
this series. No good deed, etc.

V9:

001/14:[] [-C] 'iotests: do a light delinting'
002/14:[] [--] 'iotests: don't use 'format' for drive_add'
003/14:[] [-C] 'iotests: ignore import warnings from pylint'
004/14:[] [--] 'iotests: replace mutable list default args'
005/14:[] [--] 'iotests: add pylintrc file'
006/14:[down]  'iotests: alphabetize standard imports'
007/14:[down]  'iotests: drop pre-Python 3.4 compatibility code'
008/14:[down]  'iotests: touch up log function signature'
009/14:[] [--] 'iotests: limit line length to 79 chars'
010/14:[down]  'iotests: add hmp helper with logging'
011/14:[0004] [FC] 'iotests: add script_initialize'
012/14:[] [--] 'iotest 258: use script_main'
013/14:[] [--] 'iotests: Mark verify functions as private'
014/14:[0001] [FC] 'iotests: use python logging for iotests.log()'

006: New.
007: Split from old patch.
008: Split from old patch; enhanced a little to justify its own patch.
010: New, pulled in from bitmap-populate series. Helps line length.
011: Reflow columns for long `typing` import list. (Kept RB.)
014: New blank line. (Kept RB.)

V8:
- Split out the little drop of Python 3.4 code. (Phil)
- Change line continuation styles (QEMU Memorial Choir)
- Rebase changes; remove use_log from more places, adjust test output.

V7:
- All delinting patches are now entirely front-loaded.
- Redid delinting to avoid "correcting" no-else-return statements.
- Moved more mutable list corrections into patch 4, to make it standalone.
- Moved pylintrc up to patch 5. Disabled no-else-return.
- Added patch 6 to require line length checks.
  (Some python 3.4 compatibility code is removed as a consequence.)
- Patch 7 changes slightly as a result of patch 4 changes.
- Added some logging explainer into patch 10.
  (Patch changes slightly because of patch 6.)

V6:
 - It's been so long since V5, let's just look at it anew.

John Snow (14):
  iotests: do a light delinting
  iotests: don't use 'format' for drive_add
  iotests: ignore import warnings from pylint
  iotests: replace mutable list default args
  iotests: add pylintrc file
  iotests: alphabetize standard imports
  iotests: drop pre-Python 3.4 compatibility code
  iotests: touch up log function signature
  iotests: limit line length to 79 chars
  iotests: add hmp helper with logging
  iotests: add script_initialize
  iotest 258: use script_main
  iotests: Mark verify functions as private
  iotests: use python logging for iotests.log()

 tests/qemu-iotests/030|   4 +-
 tests/qemu-iotests/055|   3 +-
 tests/qemu-iotests/149|   3 +-
 tests/qemu-iotests/155|   2 +-
 tests/qemu-iotests/194|   4 +-
 tests/qemu-iotests/202|   4 +-
 tests/qemu-iotests/203|   4 +-
 tests/qemu-iotests/206|   2 +-
 tests/qemu-iotests/207|   6 +-
 tests/qemu-iotests/208|   2 +-
 tests/qemu-iotests/209|   2 +-
 tests/qemu-iotests/210|   6 +-
 tests/qemu-iotests/211|   6 +-
 tests/qemu-iotests/212|   6 +-
 tests/qemu-iotests/213|   6 +-
 tests/qemu-iotests/216|   4 +-
 tests/qemu-iotests/218|   2 +-
 tests/qemu-iotests/219|   2 +-
 tests/qemu-iotests/222|   7 +-
 tests/qemu-iotests/224|   4 +-
 tests/qemu-iotests/228|   6 +-
 tests/qemu-iotests/234|   4 +-
 tests/qemu-iotests/235|   4 +-
 tests/qemu-iotests/236|   2 +-
 tests/qemu-iotests/237|   2 +-
 tests/qemu-iotests/238|   2 +
 tests/qemu-iotests/242|   2 +-
 tests/qemu-iotests/245|   1 +
 tests/qemu-iotests/245.out|  24 +--
 tests/qemu-iotests/246|   2 +-
 tests/qemu-iotests/248|   2 +-
 tests/qemu-iotests/254|   2 +-
 tests/qemu-iotests/255|   2 +-
 tests/qemu-iotests/256|   2 +-
 tests/qemu-iotests/258|  10 +-
 tests/qemu-iotests/260|   4 +-
 tests/qemu-iotests/262|   4 +-
 tests/qemu-iotests/264|   4 +-
 tests/qemu-iotests/277|   2 +
 tests/qemu-iotests/280|   8 +-
 tests/qemu-iotests/283|   4 +-
 tests/qemu-iotests/iotests.py | 356 --
 tests/qemu-iotests/pylintrc   |  26 +++
 43 files changed, 333 insertions(+), 221 deletions(-)
 create mode 100644 tests/qemu-iotests/pylintrc

-- 
2.21.1




Re: Potential missing checks

2020-03-24 Thread Mansour Ahmadi
Thanks for the explanation.


On Tue, Mar 24, 2020 at 5:17 PM Peter Maydell 
wrote:

> On Tue, 24 Mar 2020 at 20:39, Mansour Ahmadi  wrote:
> >
> > Thank you for looking into this, Peter. I agree that static analysis has
> false positives; that's why I called them potential. Basically, they are
> found based on code similarity so I might be wrong and I need a second
> opinion from QEMU developers. I appreciate your effort.
>
> The thing is, you're making us do all the work here. That's
> not very useful to us. It's doubly unuseful when there's
> a strong chance that when we do do the work of looking
> at the code it turns out that there's no problem.
>
> "I did some static analysis, and I looked at the
> results, and I dug through the QEMU code, and it
> does seem to me that this could well be a bug" is
> definitely useful. "I did some static analysis using
> only analysis techniques that have a pretty
> low false positive rate, and here is a selection of
> the results" is also useful. But "I just ran the
> code through an analyser that produces lots of
> false positives and then I didn't do any further
> human examination of the results" is of much less
> utility to the project, I'm afraid.
>
> > For the first case, I noticed a check on offset (if (offset)) before
> negating it and passing to stream function here.
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L1748
> >
> > Similar scenario happened here WITHOUT the check:
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L2731-L2733
>
> So, this is in the disassembler. The difference is
> just whether we print out a register+offset memory
> reference where the offset happens to be zero
> as "[reg, #0]" or just "[reg]", and the no-special-case-0
> is for encodings which are always pc-relative.
> So even if it was a missing check the results are
> entirely harmless, since anybody reading the disassembly
> will understand the #0 fine.
>
> Secondly, this code is imported from binutils,
> so we usually don't worry too much about fixing
> up minor bugs in it.
>
> Finally, I went and checked the Arm specs, and for
> the kinds of PC-relative load/store that the second
> example is handling the specified disassembly format
> does mandate the "pc, #0" (whereas the other example
> is correctly skipping it for 0-immediates because
> in those insns the offset is optional in disassembly).
>
> So the code is correct as it stands.
>
> thanks
> -- PMM
>


Re: Potential missing checks

2020-03-24 Thread Mansour Ahmadi
Thank you for looking into this, Peter. I agree that static analysis has
false positives; that's why I called them potential. Basically, they are
found based on code similarity so I might be wrong and I need a second
opinion from QEMU developers. I appreciate your effort.

For the first case, I noticed a check on offset (if (offset)) before
negating it and passing to stream function here.
https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L1748

Similar scenario happened here WITHOUT the check:
https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L2731-L2733

So I wonder whether a check on offset is really missed.

Thank you!
Mansour



On Tue, Mar 24, 2020 at 5:24 AM Peter Maydell 
wrote:

> On Mon, 23 Mar 2020 at 22:04, Mansour Ahmadi  wrote:
> >
> > Hi QEMU developers,
> >
> > I noticed the following two potential missing checks by static analysis
> and detecting inconsistencies on the source code of QEMU. here is the
> result:
>
> Hi. Can you provide more details of your analysis, please? "Maybe
> there's an issue
> at this line" is not terribly helpful, especially if one has to follow
> a bunch of URLs
> to even find out which code is being discussed. All static analysers are
> prone
> to false positives, and so the value is in analysing the possible issues,
> not
> in simply dumping raw output with no details onto the mailing list.
>
> > 1)
> > Missing check on offset:
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L2728-L2733
> >
> > While it is checked here:
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L1748-L1752
>
> What in particular do you think should be being checked that is not?
>
> > 2)
> > Missing check on bmds->dirty_bitmap:
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/migration/block.c#L377-L378
> >
> > While it is checked here:
> >
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/migration/block.c#L363-L365
>
> This one looks correct to me -- the second case is the error handling
> path for "failure halfway through creating the list of dirty bitmaps",
> and so it must handle "this one wasn't created yet". The first
> case will only run on data structures where set_dirty_tracking()
> succeeded, and so we know that there can't be any NULL pointers.
> Why do you think it is incorrect?
>
> thanks
> -- PMM
>


Re: [PULL v2 0/5] Linux user for 5.0 patches

2020-03-24 Thread Laurent Vivier
On 24/03/2020 at 14:14, Peter Maydell wrote:
> On Tue, 24 Mar 2020 at 12:32, Laurent Vivier  wrote:
>> OK, I think there is an existing problem in the build dependencies.
>>
>> Do you use enable all targets ("configure" without parameters)?
>> Do you run make with "all" or "x86_64-linux-user/all"?
> 
> This config is
> '../../configure' '--cc=ccache gcc' '--enable-debug' '--static'
> '--disable-system' '--disable-gnutls'
> and it is an incremental build, so just
> 
> make --output-sync -C build/all-linux-static -j8
> make --output-sync -C build/all-linux-static check V=1 -j8
> make --output-sync -C ~/linaro/linux-user-test-0.3/ test
> make --output-sync -C build/all-linux-static check-tcg
> 
> (it's step 3 that fails here).
> 

The problem is introduced by the change I made to be able to bisect
while we move syscall_nr.h from source dir to build dir (as said by
Richard):

4d6a835dea47 ("linux-user: introduce parameters to generate syscall_nr.h")

There is also a new problem introduced by:

5f29856b852d ("linux-user, configure: improve syscall_nr.h dependencies
checking")

that doesn't scan arch variant (it scans ppc64-linux-user but not
ppc64le-linux-user).

The best solution I can propose is to simply remove the piece of code
I've added in configure and let the user do a "make clean" if the
build fails because of the move of syscall_nr.h from the source dir to
the build dir.

Any idea?

Thanks,
Laurent



[ANNOUNCE] QEMU 5.0.0-rc0 is now available

2020-03-24 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
first release candidate for the QEMU 5.0 release.  This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-5.0.0-rc0.tar.xz
  http://download.qemu-project.org/qemu-5.0.0-rc0.tar.xz.sig

You can help improve the quality of the QEMU 5.0 release by testing this
release and reporting bugs on Launchpad:

  https://bugs.launchpad.net/qemu/

The release plan, as well as documented known issues for release
candidates, is available at:

  http://wiki.qemu.org/Planning/5.0

Please add entries to the ChangeLog for the 5.0 release below:

  http://wiki.qemu.org/ChangeLog/5.0

Thank you to everyone involved!




Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Kirti Wankhede




On 3/25/2020 2:15 AM, Alex Williamson wrote:

On Tue, 24 Mar 2020 14:37:16 -0600
Alex Williamson  wrote:


On Wed, 25 Mar 2020 01:02:36 +0530
Kirti Wankhede  wrote:


VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active
- Stop dirty pages tracking.
- Get dirty pages bitmap. It is the user space application's responsibility
   to copy the content of dirty pages from source to destination during
   migration.

To prevent DoS attack, memory for bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering smallest supported page
size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled.

Bitmap is populated for already pinned pages when bitmap is allocated for
a vfio_dma with the smallest supported page size. Update bitmap from
pinning functions when tracking is enabled. When user application queries
bitmap, check if requested page size is same as page size used to
populated bitmap. If it is equal, copy bitmap, but if not equal, return
error.
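The sizing cap described here can be sanity-checked with a quick arithmetic sketch; the numbers below mirror the comment in the diff (bit count kept under INT_MAX for bitmap_set()'s signed-int path) and are illustrative, not kernel code:

```python
# Back-of-the-envelope check of the dirty-bitmap sizing: one bit per page,
# capped so the total bit count fits in a signed 32-bit integer.
BITS_PER_BYTE = 8
PAGE_SIZE_4K = 1 << 12
INT_MAX = (1 << 31) - 1

max_pages = INT_MAX - 1                        # DIRTY_BITMAP_PAGES_MAX
max_bitmap_bytes = max_pages // BITS_PER_BYTE  # bitmap memory needed
max_tracked_bytes = max_pages * PAGE_SIZE_4K   # guest memory covered

print(round(max_bitmap_bytes / (1 << 20)))   # -> 256  (MiB of bitmap)
print(round(max_tracked_bytes / (1 << 40)))  # -> 8    (TiB at 4K pages)
```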

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  drivers/vfio/vfio_iommu_type1.c | 265 +++-
  1 file changed, 259 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..27ed069c5053 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
	unsigned int	dma_avail;
	bool		v2;
	bool		nesting;
+	bool		dirty_page_tracking;
  };
  
  struct vfio_domain {

@@ -91,6 +92,7 @@ struct vfio_dma {
	bool		lock_cap;	/* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
  };
  
  struct vfio_group {

@@ -125,7 +127,21 @@ struct vfio_regions {
  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)   \
	(!list_empty(&iommu->domain_list))
  
+#define DIRTY_BITMAP_BYTES(n)	(ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)

+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX (uint64_t)(INT_MAX - 1)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
  static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
  
  /*

   * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
	rb_erase(&old->node, &iommu->dma_list);
  }
  
+

+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+
+   if (npages > DIRTY_BITMAP_PAGES_MAX)
+   return -EINVAL;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_dma *dma)
+{
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
+{
+   struct rb_node *p;
+
+   if (RB_EMPTY_ROOT(>pfn_list))
+   return;
+
+   for (p = rb_first(>pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
+
+   bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+   return ret;
+   }
+   vfio_dma_populate_bitmap(dma, pgsize);
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(>dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+ 

[PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Kirti Wankhede
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active.
- Stop dirty pages tracking.
- Get dirty pages bitmap. It is the user space application's responsibility
  to copy the content of dirty pages from source to destination during
  migration.

To prevent DoS attacks, memory for the bitmap is allocated per vfio_dma
structure. Bitmap size is calculated considering the smallest supported page
size. The bitmap is allocated for all vfio_dmas when dirty logging is enabled.

The bitmap is populated for already pinned pages when it is allocated for a
vfio_dma, using the smallest supported page size. The bitmap is updated from
the pinning functions when tracking is enabled. When a user application
queries the bitmap, check whether the requested page size matches the page
size used to populate the bitmap: if it is equal, copy the bitmap; otherwise,
return an error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 266 +++-
 1 file changed, 260 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..874a1a7ae925 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
 };
 
 struct vfio_group {
@@ -125,7 +127,21 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
	(!list_empty(&iommu->domain_list))
 
+#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX (uint64_t)(INT_MAX - 1)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
struct vfio_dma *old)
	rb_erase(&old->node, &iommu->dma_list);
 }
 
+
+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+
+   if (npages > DIRTY_BITMAP_PAGES_MAX)
+   return -EINVAL;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_dma *dma)
+{
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
+{
+   struct rb_node *p;
+
+   if (RB_EMPTY_ROOT(&dma->pfn_list))
+   return;
+
+   for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
+
+   bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(p,
+   struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+   return ret;
+   }
+   vfio_dma_populate_bitmap(dma, pgsize);
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+}
+
 /*
  * Helper Functions for host iova-pfn list
  */
@@ -567,6 +654,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
 

[PATCH v16 QEMU 16/16] vfio: Make vfio-pci device migration capable

2020-03-24 Thread Kirti Wankhede
If the device is not a failover primary device, call vfio_migration_probe()
and vfio_migration_finalize() for the vfio-pci device to enable migration
for VFIO PCI devices that support it.
Removed the vfio_pci_vmstate structure.
Removed the migration blocker from the VFIO PCI device specific structure
and use the migration blocker from the generic VFIO device structure.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 32 +++-
 hw/vfio/pci.h |  1 -
 2 files changed, 11 insertions(+), 22 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 8deb11e87ef7..c70f153d431a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2916,22 +2916,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
-if (!pdev->failover_pair_id) {
-error_setg(>migration_blocker,
-"VFIO device doesn't support migration");
-ret = migrate_add_blocker(vdev->migration_blocker, &err);
-if (ret) {
-error_propagate(errp, err);
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-return;
-}
-}
-
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = _pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
 vdev->vbasedev.dev = DEVICE(vdev);
+vdev->vbasedev.device_state = 0;
 
 tmp = g_strdup_printf("%s/iommu_group", vdev->vbasedev.sysfsdev);
 len = readlink(tmp, group_path, sizeof(group_path));
@@ -3195,6 +3184,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 }
 }
 
+if (!pdev->failover_pair_id) {
+ret = vfio_migration_probe(&vdev->vbasedev, errp);
+if (ret) {
+error_report("%s: Failed to setup for migration",
+ vdev->vbasedev.name);
+}
+}
+
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
@@ -3209,11 +3206,6 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-vdev->migration_blocker = NULL;
-}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3225,10 +3217,7 @@ static void vfio_instance_finalize(Object *obj)
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-}
+
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3256,6 +3245,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 }
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
+vfio_migration_finalize(&vdev->vbasedev);
 }
 
 static void vfio_pci_reset(DeviceState *dev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index 0da7a20a7ec2..b148c937ef72 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,7 +168,6 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
-Error *migration_blocker;
 Notifier irqchip_change_notifier;
 } VFIOPCIDevice;
 
-- 
2.7.0




[PATCH v16 QEMU 08/16] vfio: Register SaveVMHandlers for VFIO device

2020-03-24 Thread Kirti Wankhede
Define flags to be used as delimiters in the migration file stream.
Added .save_setup and .save_cleanup functions. The migration region is
mapped and unmapped in these functions at the source during the saving or
pre-copy phase.
Set the VFIO device state depending on the VM's state. During live migration,
the VM is running when .save_setup is called, so the _SAVING | _RUNNING state
is set for the VFIO device. During save-restore, the VM is paused, so only the
_SAVING state is set.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 76 
 hw/vfio/trace-events |  2 ++
 2 files changed, 78 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 22ded9d28cf3..033f76526e49 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,6 +8,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include 
 
 #include "sysemu/runstate.h"
@@ -24,6 +25,17 @@
 #include "pci.h"
 #include "trace.h"
 
+/*
+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10     => emulated (virtual) function IO
+ * 0x0000     => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE      (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
 static void vfio_migration_region_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -126,6 +138,69 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t mask,
 return 0;
 }
 
+/* ---------------------------------------------------------------------- */
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.index,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_SAVING);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}
+
+/*
+ * Save migration region size. This is used to verify migration region size
+ * is greater than or equal to migration region size at destination
+ */
+qemu_put_be64(f, migration->region.size);
+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+trace_vfio_save_setup(vbasedev->name);
+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* ---------------------------------------------------------------------- */
+
 static void vfio_vmstate_change(void *opaque, int running, RunState state)
 {
 VFIODevice *vbasedev = opaque;
@@ -191,6 +266,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+register_savevm_live("vfio", -1, 1, &savevm_vfio_handlers, vbasedev);
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 69503228f20e..4bb43f18f315 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -149,3 +149,5 @@ vfio_migration_probe(char *name, uint32_t index) " (%s) 
Region %d"
 vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(char *name, int running, const char *reason, uint32_t 
dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(char *name, int state) " (%s) state %d"
+vfio_save_setup(char *name) " (%s)"
+vfio_save_cleanup(char *name) " (%s)"
-- 
2.7.0




[PATCH v16 QEMU 14/16] vfio: Add vfio_listener_log_sync to mark dirty pages

2020-03-24 Thread Kirti Wankhede
vfio_listener_log_sync gets the list of dirty pages from the container using
the VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and marks those pages dirty when all
devices are stopped and saving state.
Return early for the RAM block section of a mapped MMIO region.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c | 200 +--
 hw/vfio/trace-events |   1 +
 2 files changed, 196 insertions(+), 5 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 4a2f0d6a2233..6d41e1ac5c2f 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -29,6 +29,7 @@
 #include "hw/vfio/vfio.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
+#include "exec/ram_addr.h"
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -38,6 +39,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/migration.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
 };
 
 /*
+ * Device state interfaces
+ */
+
+static bool vfio_devices_are_stopped_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+/*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
@@ -408,8 +432,8 @@ static bool 
vfio_listener_skipped_section(MemoryRegionSection *section)
 }
 
 /* Called with rcu_read_lock held.  */
-static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void **vaddr,
-   bool *read_only)
+static bool vfio_get_xlat_addr(IOMMUTLBEntry *iotlb, void **vaddr,
+   ram_addr_t *ram_addr, bool *read_only)
 {
 MemoryRegion *mr;
 hwaddr xlat;
@@ -440,9 +464,17 @@ static bool vfio_get_vaddr(IOMMUTLBEntry *iotlb, void 
**vaddr,
 return false;
 }
 
-*vaddr = memory_region_get_ram_ptr(mr) + xlat;
-*read_only = !writable || mr->readonly;
+if (vaddr) {
+*vaddr = memory_region_get_ram_ptr(mr) + xlat;
+}
 
+if (ram_addr) {
+*ram_addr = memory_region_get_ram_addr(mr) + xlat;
+}
+
+if (read_only) {
+*read_only = !writable || mr->readonly;
+}
 return true;
 }
 
@@ -467,7 +499,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
 rcu_read_lock();
 
 if ((iotlb->perm & IOMMU_RW) != IOMMU_NONE) {
-if (!vfio_get_vaddr(iotlb, &vaddr, &read_only)) {
+if (!vfio_get_xlat_addr(iotlb, &vaddr, NULL, &read_only)) {
 goto out;
 }
 /*
@@ -813,9 +845,167 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 }
 
+static int vfio_get_dirty_bitmap(MemoryListener *listener,
+ MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+VFIOGuestIOMMU *giommu;
+IOMMUTLBEntry iotlb;
+hwaddr granularity, address_limit, iova;
+int ret;
+
+if (memory_region_is_iommu(section->mr)) {
+QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+if (MEMORY_REGION(giommu->iommu) == section->mr &&
+giommu->n.start == section->offset_within_region) {
+break;
+}
+}
+
+if (!giommu) {
+return -EINVAL;
+}
+}
+
+if (memory_region_is_iommu(section->mr)) {
+granularity = memory_region_iommu_get_min_page_size(giommu->iommu);
+
+address_limit = MIN(int128_get64(section->size),
+memory_region_iommu_get_address_limit(giommu->iommu,
+ int128_get64(section->size)));
+} else {
+granularity = memory_region_size(section->mr);
+address_limit = int128_get64(section->size);
+}
+
+iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+
+RCU_READ_LOCK_GUARD();
+
+while (iova < address_limit) {
+struct vfio_iommu_type1_dirty_bitmap *dbitmap;
+struct vfio_iommu_type1_dirty_bitmap_get *range;
+ram_addr_t start, pages;
+uint64_t iova_xlat, size;
+
+if (memory_region_is_iommu(section->mr)) {
+iotlb = address_space_get_iotlb_entry(container->space->as, iova,
+ true, MEMTXATTRS_UNSPECIFIED);
+if ((iotlb.target_as == NULL) || (iotlb.addr_mask == 0)) {
+if ((iova + granularity) < iova) {
+break;
+}
+iova += 

[PATCH v16 QEMU 07/16] vfio: Add migration state change notifier

2020-03-24 Thread Kirti Wankhede
Added a migration state change notifier to get notifications on migration
state changes. These states are translated to VFIO device states and conveyed
to the vendor driver.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 29 +
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 31 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index af9443c275fb..22ded9d28cf3 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -154,6 +154,27 @@ static void vfio_vmstate_change(void *opaque, int running, 
RunState state)
 }
 }
 
+static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+{
+MigrationState *s = data;
+VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state);
+int ret;
+
+trace_vfio_migration_state_notifier(vbasedev->name, s->state);
+
+switch (s->state) {
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_FAILED:
+ret = vfio_migration_set_state(vbasedev,
+  ~(VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING),
+  VFIO_DEVICE_STATE_RUNNING);
+if (ret) {
+error_report("%s: Failed to set state RUNNING", vbasedev->name);
+}
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -173,6 +194,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
+vbasedev->migration_state.notify = vfio_migration_state_notifier;
+add_migration_state_change_notifier(&vbasedev->migration_state);
+
 return 0;
 }
 
@@ -211,6 +235,11 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+
+if (vbasedev->migration_state.notify) {
+remove_migration_state_change_notifier(&vbasedev->migration_state);
+}
+
 if (vbasedev->vm_state) {
 qemu_del_vm_change_state_handler(vbasedev->vm_state);
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 3d15bacd031a..69503228f20e 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,3 +148,4 @@ vfio_display_edid_write_error(void) ""
 vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
 vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(char *name, int running, const char *reason, uint32_t 
dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_state_notifier(char *name, int state) " (%s) state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 3d18eb146b33..28f55f66d019 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -123,6 +123,7 @@ typedef struct VFIODevice {
 VMChangeStateEntry *vm_state;
 uint32_t device_state;
 int vm_running;
+Notifier migration_state;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.7.0




[PATCH v16 QEMU 11/16] iommu: add callback to get address limit IOMMU supports

2020-03-24 Thread Kirti Wankhede
Add an optional method to get the address limit the IOMMU supports.

Signed-off-by: Kirti Wankhede 
---
 hw/i386/intel_iommu.c |  9 +
 include/exec/memory.h | 19 +++
 memory.c  | 11 +++
 3 files changed, 39 insertions(+)

diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
index df7ad254ac15..d0b88c20c31e 100644
--- a/hw/i386/intel_iommu.c
+++ b/hw/i386/intel_iommu.c
@@ -3577,6 +3577,14 @@ static void vtd_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 return;
 }
 
+static hwaddr vtd_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr)
+{
+VTDAddressSpace *vtd_as = container_of(iommu_mr, VTDAddressSpace, iommu);
+IntelIOMMUState *s = vtd_as->iommu_state;
+
+return VTD_ADDRESS_SIZE(s->aw_bits) - 1;
+}
+
 /* Do the initialization. It will also be called when reset, so pay
  * attention when adding new initialization stuff.
  */
@@ -3878,6 +3886,7 @@ static void 
vtd_iommu_memory_region_class_init(ObjectClass *klass,
 imrc->translate = vtd_iommu_translate;
 imrc->notify_flag_changed = vtd_iommu_notify_flag_changed;
 imrc->replay = vtd_iommu_replay;
+imrc->get_address_limit = vtd_iommu_get_address_limit;
 }
 
 static const TypeInfo vtd_iommu_memory_region_info = {
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 1614d9a02c0c..f7d92bf6e6a9 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -355,6 +355,17 @@ typedef struct IOMMUMemoryRegionClass {
  * @iommu: the IOMMUMemoryRegion
  */
 int (*num_indexes)(IOMMUMemoryRegion *iommu);
+
+/*
+ * Return address limit this IOMMU supports.
+ *
+ * Optional method: if this method is not provided, then
+ * memory_region_iommu_get_address_limit() will return the limit passed as
+ * its input argument.
+ *
+ * @iommu: the IOMMUMemoryRegion
+ */
+hwaddr (*get_address_limit)(IOMMUMemoryRegion *iommu);
 } IOMMUMemoryRegionClass;
 
 typedef struct CoalescedMemoryRange CoalescedMemoryRange;
@@ -1364,6 +1375,14 @@ int memory_region_iommu_attrs_to_index(IOMMUMemoryRegion 
*iommu_mr,
 int memory_region_iommu_num_indexes(IOMMUMemoryRegion *iommu_mr);
 
 /**
+ * memory_region_iommu_get_address_limit : return the maximum address limit
+ * that this IOMMU supports.
+ *
+ * @iommu_mr: the memory region
+ */
+hwaddr memory_region_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr,
+ hwaddr limit);
+/**
  * memory_region_name: get a memory region's name
  *
  * Returns the string that was used to initialize the memory region.
diff --git a/memory.c b/memory.c
index 601b74990620..acb7546971c3 100644
--- a/memory.c
+++ b/memory.c
@@ -1887,6 +1887,17 @@ void memory_region_iommu_replay(IOMMUMemoryRegion 
*iommu_mr, IOMMUNotifier *n)
 }
 }
 
+hwaddr memory_region_iommu_get_address_limit(IOMMUMemoryRegion *iommu_mr,
+ hwaddr limit)
+{
+IOMMUMemoryRegionClass *imrc = IOMMU_MEMORY_REGION_GET_CLASS(iommu_mr);
+
+if (imrc->get_address_limit) {
+return imrc->get_address_limit(iommu_mr);
+}
+return limit;
+}
+
 void memory_region_unregister_iommu_notifier(MemoryRegion *mr,
  IOMMUNotifier *n)
 {
-- 
2.7.0




[PATCH v16 QEMU 13/16] vfio: Add function to start and stop dirty pages tracking

2020-03-24 Thread Kirti Wankhede
Call VFIO_IOMMU_DIRTY_PAGES ioctl to start and stop dirty pages tracking
for VFIO devices.

Signed-off-by: Kirti Wankhede 
---
 hw/vfio/migration.c | 36 
 1 file changed, 36 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ab295d25620e..1827b7cfb316 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -9,6 +9,7 @@
 
 #include "qemu/osdep.h"
 #include "qemu/main-loop.h"
+#include 
 #include 
 
 #include "sysemu/runstate.h"
@@ -296,6 +297,32 @@ static int vfio_load_device_config_state(QEMUFile *f, void 
*opaque)
 return qemu_file_get_error(f);
 }
 
+static int vfio_start_dirty_page_tracking(VFIODevice *vbasedev, bool start)
+{
+int ret;
+VFIOContainer *container = vbasedev->group->container;
+struct vfio_iommu_type1_dirty_bitmap dirty = {
+.argsz = sizeof(dirty),
+};
+
+if (start) {
+if (vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_START;
+} else {
+return 0;
+}
+} else {
+dirty.flags = VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_DIRTY_PAGES, &dirty);
+if (ret) {
+error_report("Failed to set dirty tracking flag 0x%x errno: %d",
+ dirty.flags, errno);
+}
+return ret;
+}
+
 /* ---------------------------------------------------------------------- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -330,6 +357,11 @@ static int vfio_save_setup(QEMUFile *f, void *opaque)
  */
 qemu_put_be64(f, migration->region.size);
 
+ret = vfio_start_dirty_page_tracking(vbasedev, true);
+if (ret) {
+return ret;
+}
+
 qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
 
 ret = qemu_file_get_error(f);
@@ -346,6 +378,8 @@ static void vfio_save_cleanup(void *opaque)
 VFIODevice *vbasedev = opaque;
 VFIOMigration *migration = vbasedev->migration;
 
+vfio_start_dirty_page_tracking(vbasedev, false);
+
 if (migration->region.mmaps) {
 vfio_region_unmap(>region);
 }
@@ -669,6 +703,8 @@ static void vfio_migration_state_notifier(Notifier 
*notifier, void *data)
 if (ret) {
 error_report("%s: Failed to set state RUNNING", vbasedev->name);
 }
+
+vfio_start_dirty_page_tracking(vbasedev, false);
 }
 }
 
-- 
2.7.0




[PATCH v16 QEMU 15/16] vfio: Add ioctl to get dirty pages bitmap during dma unmap.

2020-03-24 Thread Kirti Wankhede
With vIOMMU, an IO virtual address range can get unmapped during the pre-copy
phase of migration. In that case, the unmap ioctl should return the pages
pinned in that range, and QEMU should find their corresponding guest physical
addresses and report those dirty.

Note: This patch is not yet tested. I'm trying to see how I can test this code
path.

Suggested-by: Alex Williamson 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c  | 83 ---
 include/hw/vfio/vfio-common.h |  1 +
 2 files changed, 80 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 6d41e1ac5c2f..e0f91841bc82 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -311,11 +311,77 @@ static bool vfio_devices_are_stopped_and_saving(void)
 return true;
 }
 
+static bool vfio_devices_are_running_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, _group_list, next) {
+QLIST_FOREACH(vbasedev, >device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+static int vfio_dma_unmap_bitmap(VFIOContainer *container,
+ hwaddr iova, ram_addr_t size,
+ IOMMUTLBEntry *iotlb)
+{
+struct vfio_iommu_type1_dma_unmap *unmap;
+struct vfio_bitmap *bitmap;
+uint64_t pages = TARGET_PAGE_ALIGN(size) >> TARGET_PAGE_BITS;
+int ret;
+
+unmap = g_malloc0(sizeof(*unmap) + sizeof(*bitmap));
+if (!unmap) {
+return -ENOMEM;
+}
+
+unmap->argsz = sizeof(*unmap) + sizeof(*bitmap);
+unmap->flags |= VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP;
+bitmap = (struct vfio_bitmap *)&unmap->data;
+
+/*
+ * cpu_physical_memory_set_dirty_lebitmap() expects pages in bitmap of
+ * TARGET_PAGE_SIZE to mark those dirty. Hence set bitmap_pgsize to
+ * TARGET_PAGE_SIZE.
+ */
+
+bitmap->pgsize = TARGET_PAGE_SIZE;
+bitmap->size = ROUND_UP(pages, 64) / 8;
+bitmap->data = g_malloc0(bitmap->size);
+if (!bitmap->data) {
+error_report("UNMAP: Error allocating bitmap of size 0x%llx",
+ bitmap->size);
+g_free(unmap);
+return -ENOMEM;
+}
+
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, unmap);
+if (!ret) {
+cpu_physical_memory_set_dirty_lebitmap((uint64_t *)bitmap->data,
+iotlb->translated_addr, pages);
+} else {
+error_report("VFIO_UNMAP_DMA with DIRTY_BITMAP : %d", -errno);
+}
+
+g_free(bitmap->data);
+g_free(unmap);
+return ret;
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
-  hwaddr iova, ram_addr_t size)
+  hwaddr iova, ram_addr_t size,
+  IOMMUTLBEntry *iotlb)
 {
 struct vfio_iommu_type1_dma_unmap unmap = {
 .argsz = sizeof(unmap),
@@ -324,6 +390,11 @@ static int vfio_dma_unmap(VFIOContainer *container,
 .size = size,
 };
 
+if (iotlb && container->dirty_pages_supported &&
+vfio_devices_are_running_and_saving()) {
+return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
+}
+
 while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
 /*
  * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -371,7 +442,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
iova,
  * the VGA ROM space.
  */
 if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, ) == 0 ||
-(errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
  ioctl(container->fd, VFIO_IOMMU_MAP_DMA, ) == 0)) {
 return 0;
 }
@@ -519,7 +590,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
IOMMUTLBEntry *iotlb)
  iotlb->addr_mask + 1, vaddr, ret);
 }
 } else {
-ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
+ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, iotlb);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
@@ -822,7 +893,7 @@ static void vfio_listener_region_del(MemoryListener 
*listener,
 }
 
 if (try_unmap) {
-ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
@@ -1479,6 +1550,7 @@ static int 

[PATCH v16 QEMU 06/16] vfio: Add VM state change handler to know state of VM

2020-03-24 Thread Kirti Wankhede
The VM state change handler gets called on a change in the VM's state, and is
used to set the VFIO device state to _RUNNING.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 87 +++
 hw/vfio/trace-events  |  2 +
 include/hw/vfio/vfio-common.h |  4 ++
 3 files changed, 93 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index a078dcf1dd8f..af9443c275fb 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include 
 
+#include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
@@ -74,6 +75,85 @@ err:
 return ret;
 }
 
+static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
+uint32_t value)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint32_t device_state;
+int ret;
+
+ret = pread(vbasedev->fd, &device_state, sizeof(device_state),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+  device_state));
+if (ret < 0) {
+error_report("%s: Failed to read device state %d %s",
+ vbasedev->name, ret, strerror(errno));
+return ret;
+}
+
+device_state = (device_state & mask) | value;
+
+if (!VFIO_DEVICE_STATE_VALID(device_state)) {
+return -EINVAL;
+}
+
+ret = pwrite(vbasedev->fd, &device_state, sizeof(device_state),
+ region->fd_offset + offsetof(struct vfio_device_migration_info,
+  device_state));
+if (ret < 0) {
+error_report("%s: Failed to set device state %d %s",
+ vbasedev->name, ret, strerror(errno));
+
+ret = pread(vbasedev->fd, &device_state, sizeof(device_state),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+device_state));
+if (ret < 0) {
+error_report("%s: On failure, failed to read device state %d %s",
+vbasedev->name, ret, strerror(errno));
+return ret;
+}
+
+if (VFIO_DEVICE_STATE_IS_ERROR(device_state)) {
+error_report("%s: Device is in error state 0x%x",
+ vbasedev->name, device_state);
+return -EFAULT;
+}
+}
+
+vbasedev->device_state = device_state;
+trace_vfio_migration_set_state(vbasedev->name, device_state);
+return 0;
+}
+
+static void vfio_vmstate_change(void *opaque, int running, RunState state)
+{
+VFIODevice *vbasedev = opaque;
+
+if ((vbasedev->vm_running != running)) {
+int ret;
+uint32_t value = 0, mask = 0;
+
+if (running) {
+value = VFIO_DEVICE_STATE_RUNNING;
+if (vbasedev->device_state & VFIO_DEVICE_STATE_RESUMING) {
+mask = ~VFIO_DEVICE_STATE_RESUMING;
+}
+} else {
+mask = ~VFIO_DEVICE_STATE_RUNNING;
+}
+
+ret = vfio_migration_set_state(vbasedev, mask, value);
+if (ret) {
+error_report("%s: Failed to set device state 0x%x",
+ vbasedev->name, value & mask);
+}
+vbasedev->vm_running = running;
+trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
+  value & mask);
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -90,6 +170,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
+  vbasedev);
+
 return 0;
 }
 
@@ -128,6 +211,10 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+if (vbasedev->vm_state) {
+qemu_del_vm_change_state_handler(vbasedev->vm_state);
+}
+
 if (vbasedev->migration_blocker) {
 migrate_del_blocker(vbasedev->migration_blocker);
 error_free(vbasedev->migration_blocker);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 191a726a1312..3d15bacd031a 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -146,3 +146,5 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index d4b268641173..3d18eb146b33 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
 #ifdef CONFIG_LINUX
 #include <linux/vfio.h>
 #endif

[PATCH v16 QEMU 10/16] vfio: Add load state functions to SaveVMHandlers

2020-03-24 Thread Kirti Wankhede
Sequence during _RESUMING device state:
While data for this device is available, repeat the steps below:
a. read data_offset, the offset at which the user application should write data.
b. write data of data_size to the migration region at data_offset.
c. write data_size, which indicates to the vendor driver that the data has
   been written to the staging buffer.

To the user, the data is opaque. The user should write data in the same order
as it was received.
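As an illustration only (not part of this patch), the _RESUMING steps above can be sketched against an in-memory stand-in for the migration region. `struct mig_region` and `resume_write_chunk` are hypothetical names for this sketch; in the real interface the data_offset/data_size fields live in struct vfio_device_migration_info and are accessed with pread()/pwrite() on the device fd:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical in-memory stand-in for the migration region. */
struct mig_region {
    uint64_t data_offset;     /* step a: where the user must write data */
    uint64_t data_size;       /* step c: how much data was written */
    uint8_t  staging[256];    /* vendor driver's staging buffer */
};

/* One _RESUMING iteration: read data_offset, write the chunk there, then
 * write data_size to tell the vendor driver the data is staged. */
static int resume_write_chunk(struct mig_region *r,
                              const uint8_t *buf, uint64_t len)
{
    uint64_t off = r->data_offset;               /* step a */

    if (off + len > sizeof(r->staging)) {
        return -1;                               /* chunk does not fit */
    }
    memcpy(r->staging + off, buf, len);          /* step b */
    r->data_size = len;                          /* step c */
    return 0;
}
```

The ordering matters: data_size is written last, so the driver only consumes the chunk once it is fully staged.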

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 179 +++
 hw/vfio/trace-events |   3 +
 2 files changed, 182 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index ecbeed5182c2..ab295d25620e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -269,6 +269,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
 return qemu_file_get_error(f);
 }
 
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+uint64_t data;
+
+if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+int ret;
+
+ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+if (ret) {
+error_report("%s: Failed to load device config space",
+ vbasedev->name);
+return ret;
+}
+}
+
+data = qemu_get_be64(f);
+if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+error_report("%s: Failed loading device config space, "
+ "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
+return -EINVAL;
+}
+
+trace_vfio_load_device_config_state(vbasedev->name);
+return qemu_file_get_error(f);
+}
+
 /* -- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -434,12 +461,164 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 return ret;
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+
+if (migration->region.mmaps) {
+ret = vfio_region_mmap(&migration->region);
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, ~0, VFIO_DEVICE_STATE_RESUMING);
+if (ret) {
+error_report("%s: Failed to set state RESUMING", vbasedev->name);
+}
+return ret;
+}
+
+static int vfio_load_cleanup(void *opaque)
+{
+vfio_save_cleanup(opaque);
+return 0;
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+uint64_t data, data_size;
+
+data = qemu_get_be64(f);
+while (data != VFIO_MIG_FLAG_END_OF_STATE) {
+
+trace_vfio_load_state(vbasedev->name, data);
+
+switch (data) {
+case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
+{
+ret = vfio_load_device_config_state(f, opaque);
+if (ret) {
+return ret;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_SETUP_STATE:
+{
+uint64_t region_size = qemu_get_be64(f);
+
+if (migration->region.size < region_size) {
+error_report("%s: SETUP STATE: migration region too small, "
+ "0x%"PRIx64 " < 0x%"PRIx64, vbasedev->name,
+ migration->region.size, region_size);
+return -EINVAL;
+}
+
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {
+return ret;
+} else {
+error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_DATA_STATE:
+{
+VFIORegion *region = &migration->region;
+void *buf = NULL;
+bool buffer_mmaped = false;
+uint64_t data_offset = 0;
+
+data_size = qemu_get_be64(f);
+if (data_size == 0) {
+break;
+}
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset +
+offsetof(struct vfio_device_migration_info,
+data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s: Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);

[PATCH v16 QEMU 04/16] vfio: Add save and load functions for VFIO PCI devices

2020-03-24 Thread Kirti Wankhede
These functions save and restore PCI device specific data, i.e. the config
space of the PCI device.
Save and restore were tested with the MSI and MSI-X interrupt types.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 163 ++
 include/hw/vfio/vfio-common.h |   2 +
 2 files changed, 165 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 6c77c12e44b9..8deb11e87ef7 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1632,6 +1633,50 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 }
 }
 
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+   PCI_BASE_ADDRESS_MEM_MASK);
+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;
+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
 VFIOBAR *bar = &vdev->bars[nr];
@@ -2414,11 +2459,129 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
 return OBJECT(vdev);
 }
 
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_HI, 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32), 2);
+qemu_put_be32(f, msi_data);
+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore PCI BAR configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+pci_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar = qemu_get_be32(f);
+
+vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
+}
+
+ret = vfio_bars_validate(vdev);
+if (ret) {
+return ret;
+}
+
+interrupt_type = qemu_get_be32(f);
+
+if (interrupt_type == VFIO_INT_MSI) {
+/* restore msi configuration */
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS, 2);

[PATCH v16 QEMU 09/16] vfio: Add save state functions to SaveVMHandlers

2020-03-24 Thread Kirti Wankhede
Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state, i.e. the pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through the steps below.
- read data_offset - this directs the kernel driver to write data to the
  staging buffer.
- read data_size - the amount of data in bytes written by the vendor driver
  in the migration region.
- read data_size bytes of data from data_offset in the migration region.
- write the data packet to the file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state, i.e. the stop-and-copy phase:
a. read the config space of the device and save it to the migration file
   stream. This doesn't need to come from the vendor driver. Any other special
   config state from the driver can be saved as data in a following iteration.
b. read pending_bytes. If pending_bytes > 0, go through the steps below.
c. read data_offset - this directs the kernel driver to write data to the
   staging buffer.
d. read data_size - the amount of data in bytes written by the vendor driver
   in the migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. write the data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. write {VFIO_MIG_FLAG_END_OF_STATE}

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next step.
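For illustration only (not part of this patch), one saving iteration can be sketched against an in-memory stand-in for the migration region; `struct mig_region` and `save_read_chunk` are hypothetical names, and the real fields are read with pread() from struct vfio_device_migration_info:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of the migration region during _SAVING. */
struct mig_region {
    uint64_t pending_bytes;   /* data still to be saved by the driver */
    uint64_t data_offset;     /* where the driver staged the data */
    uint64_t data_size;       /* how much data is currently staged */
    uint8_t  staging[256];
};

/* Steps b-e of one iteration: returns the number of bytes copied into
 * 'out', or 0 when nothing is pending (or 'out' is too small). */
static uint64_t save_read_chunk(struct mig_region *r,
                                uint8_t *out, size_t cap)
{
    if (r->pending_bytes == 0 || r->data_size > cap) {
        return 0;
    }
    memcpy(out, r->staging + r->data_offset, r->data_size);  /* step e */
    r->pending_bytes -= r->data_size;  /* the vendor driver updates this */
    return r->data_size;
}
```

The caller loops on this (step g) until pending_bytes reaches zero, then emits VFIO_MIG_FLAG_END_OF_STATE (step h).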

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 245 +-
 hw/vfio/trace-events  |   6 ++
 include/hw/vfio/vfio-common.h |   1 +
 3 files changed, 251 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 033f76526e49..ecbeed5182c2 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -138,6 +138,137 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t mask,
 return 0;
 }
 
+static void *find_data_region(VFIORegion *region,
+  uint64_t data_offset,
+  uint64_t data_size)
+{
+void *ptr = NULL;
+int i;
+
+for (i = 0; i < region->nr_mmaps; i++) {
+if ((data_offset >= region->mmaps[i].offset) &&
+(data_offset < region->mmaps[i].offset + region->mmaps[i].size) &&
+(data_size <= region->mmaps[i].size)) {
+ptr = region->mmaps[i].mmap + (data_offset -
+   region->mmaps[i].offset);
+break;
+}
+}
+return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, data_size = 0;
+int ret;
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s: Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_size));
+if (ret != sizeof(data_size)) {
+error_report("%s: Failed to get migration buffer data size %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (data_size > 0) {
+void *buf = NULL;
+bool buffer_mmaped;
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);
+}
+
+buffer_mmaped = (buf != NULL) ? true : false;
+
+if (!buffer_mmaped) {
+buf = g_try_malloc0(data_size);
+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+
+ret = pread(vbasedev->fd, buf, data_size,
+region->fd_offset + data_offset);
+if (ret != data_size) {
+error_report("%s: Failed to get migration data %d",
+ vbasedev->name, ret);
+g_free(buf);
+return -EINVAL;
+}
+}
+
+qemu_put_be64(f, data_size);
+qemu_put_buffer(f, buf, data_size);
+
+if (!buffer_mmaped) {
+g_free(buf);
+}
+} else {
+qemu_put_be64(f, data_size);
+}
+
+trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+   migration->pending_bytes);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+return data_size;
+}
+
+static int 

[PATCH v16 QEMU 12/16] memory: Set DIRTY_MEMORY_MIGRATION when IOMMU is enabled

2020-03-24 Thread Kirti Wankhede
Signed-off-by: Kirti Wankhede 
---
 memory.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/memory.c b/memory.c
index acb7546971c3..285ca2ed6dd9 100644
--- a/memory.c
+++ b/memory.c
@@ -1788,7 +1788,7 @@ bool memory_region_is_ram_device(MemoryRegion *mr)
 uint8_t memory_region_get_dirty_log_mask(MemoryRegion *mr)
 {
 uint8_t mask = mr->dirty_log_mask;
-if (global_dirty_log && mr->ram_block) {
+if (global_dirty_log && (mr->ram_block || memory_region_is_iommu(mr))) {
 mask |= (1 << DIRTY_MEMORY_MIGRATION);
 }
 return mask;
-- 
2.7.0




[PATCH v16 QEMU 05/16] vfio: Add migration region initialization and finalize function

2020-03-24 Thread Kirti Wankhede
- Migration functions are implemented for VFIO_DEVICE_TYPE_PCI devices in this
  patch series.
- Whether a VFIO device supports migration is decided based on the migration
  region query. If the migration region query and the migration region
  initialization are successful, then migration is supported; otherwise
  migration is blocked.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/Makefile.objs |   2 +-
 hw/vfio/migration.c   | 138 ++
 hw/vfio/trace-events  |   3 +
 include/hw/vfio/vfio-common.h |   9 +++
 4 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index 9bb1c09e8477..8b296c889ed9 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += common.o spapr.o
+obj-y += common.o spapr.o migration.o
 obj-$(CONFIG_VFIO_PCI) += pci.o pci-quirks.o display.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_VFIO_PLATFORM) += platform.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index ..a078dcf1dd8f
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,138 @@
+/*
+ * Migration support for VFIO devices
+ *
+ * Copyright NVIDIA, Inc. 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "cpu.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "migration/register.h"
+#include "migration/blocker.h"
+#include "migration/misc.h"
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/ram_addr.h"
+#include "pci.h"
+#include "trace.h"
+
+static void vfio_migration_region_exit(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+
+if (!migration) {
+return;
+}
+
+if (migration->region.size) {
+vfio_region_exit(&migration->region);
+vfio_region_finalize(&migration->region);
+}
+}
+
+static int vfio_migration_region_init(VFIODevice *vbasedev, int index)
+{
+VFIOMigration *migration = vbasedev->migration;
+Object *obj = NULL;
+int ret = -EINVAL;
+
+if (!vbasedev->ops->vfio_get_object) {
+return ret;
+}
+
+obj = vbasedev->ops->vfio_get_object(vbasedev);
+if (!obj) {
+return ret;
+}
+
+ret = vfio_region_setup(obj, vbasedev, &migration->region, index,
+"migration");
+if (ret) {
+error_report("%s: Failed to setup VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+if (!migration->region.size) {
+ret = -EINVAL;
+error_report("%s: Invalid region size of VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+return 0;
+
+err:
+vfio_migration_region_exit(vbasedev);
+return ret;
+}
+
+static int vfio_migration_init(VFIODevice *vbasedev,
+   struct vfio_region_info *info)
+{
+int ret;
+
+vbasedev->migration = g_new0(VFIOMigration, 1);
+
+ret = vfio_migration_region_init(vbasedev, info->index);
+if (ret) {
+error_report("%s: Failed to initialise migration region",
+ vbasedev->name);
+g_free(vbasedev->migration);
+vbasedev->migration = NULL;
+return ret;
+}
+
+return 0;
+}
+
+/* -- */
+
+int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
+{
+struct vfio_region_info *info;
+Error *local_err = NULL;
+int ret;
+
+ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
+   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+if (ret) {
+goto add_blocker;
+}
+
+ret = vfio_migration_init(vbasedev, info);
+if (ret) {
+goto add_blocker;
+}
+
+trace_vfio_migration_probe(vbasedev->name, info->index);
+return 0;
+
+add_blocker:
+error_setg(&vbasedev->migration_blocker,
+   "VFIO device doesn't support migration");
+ret = migrate_add_blocker(vbasedev->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(vbasedev->migration_blocker);
+}
+return ret;
+}
+
+void vfio_migration_finalize(VFIODevice *vbasedev)
+{
+if (vbasedev->migration_blocker) {
+migrate_del_blocker(vbasedev->migration_blocker);
+error_free(vbasedev->migration_blocker);
+}
+
+vfio_migration_region_exit(vbasedev);
+g_free(vbasedev->migration);
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8cdc27946cb8..191a726a1312 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -143,3 +143,6 @@ vfio_display_edid_link_up(void) ""
 

[PATCH v16 QEMU 00/16] Add migration support for VFIO devices

2020-03-24 Thread Kirti Wankhede
Hi,

This patch set adds migration support for VFIO devices in QEMU.

This patch set includes the patches below:
Patch 1:
- Define KABI for VFIO device migration support: the device state and the
  newly added ioctl definitions to get the dirty pages bitmap. This is a
  placeholder patch.

Patch 2-4:
- A few code refactors
- Added save and restore functions for PCI configuration space

Patch 5-10:
- Generic migration functionality for VFIO device.
  * This patch set adds functionality only for PCI devices, but can be
extended to other VFIO devices.
  * Added all the basic functions required for pre-copy, stop-and-copy and
resume phases of migration.
  * Added state change notifier and from that notifier function, VFIO
device's state changed is conveyed to VFIO device driver.
  * During save setup phase and resume/load setup phase, migration region
is queried and is used to read/write VFIO device data.
  * .save_live_pending and .save_live_iterate are implemented to use QEMU's
functionality of iteration during pre-copy phase.
  * In .save_live_complete_precopy, that is, in the stop-and-copy phase,
iterate reading data from the VFIO device driver until the pending bytes
returned by the driver reach zero.

Patch 11-12
- Add a helper function, for migration with vIOMMU enabled, to get the address
  limit the IOMMU supports.
- Set DIRTY_MEMORY_MIGRATION flag in dirty log mask for migration with vIOMMU
  enabled.

Patch 13-14:
- Add function to start and stop dirty pages tracking.
- Add vfio_listerner_log_sync to mark dirty pages. The dirty pages bitmap is
  queried per container. All pages pinned by the vendor driver through the
  vfio_pin_pages external API have to be marked as dirty during migration.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by the vendor driver can also be written by the
  device. As of now there is no device which has hardware support for
  dirty page tracking, so all pages which are pinned by the vendor driver
  should be considered dirty.
  In QEMU, marking pages dirty is only done when the device is in the
  stop-and-copy phase, because if pages are marked dirty during the pre-copy
  phase and their content is transferred from source to destination, there is
  no way to know which pages were dirtied again after the copy until the
  device stops. To avoid repeated copies of the same content, pinned pages
  are marked dirty only during the stop-and-copy phase.
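The policy described for patches 13-14 can be sketched as a toy dirty-bitmap sync; `sync_dirty`, `NPAGES` and the data layout are hypothetical names for this sketch, not QEMU's actual bitmap API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy dirty bitmap: pinned pages are reported dirty only once the device
 * is stopped (stop-and-copy), mirroring the policy described above. */
#define NPAGES 8

static void sync_dirty(const bool pinned[NPAGES], bool device_stopped,
                       uint8_t *bitmap /* one bit per page */)
{
    if (!device_stopped) {
        return;  /* pre-copy: skip, to avoid re-copying the same content */
    }
    for (size_t i = 0; i < NPAGES; i++) {
        if (pinned[i]) {
            bitmap[i / 8] |= 1u << (i % 8);
        }
    }
}
```

During pre-copy the sync is a no-op; at stop-and-copy every pinned page is reported dirty exactly once.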

Patch 15:
- With vIOMMU, IO virtual address range can get unmapped while in pre-copy
  phase of migration. In that case, unmap ioctl should return pages pinned
  in that range and QEMU should report corresponding guest physical pages
  dirty.

Patch 16:
- Make VFIO PCI device migration capable. If migration region is not provided by
  driver, migration is blocked.

Still TODO:
Since there is no device which has hardware support for system memory
dirty bitmap tracking, right now there is no other API from the vendor driver
to the VFIO IOMMU module to report dirty pages. In the future, when such
hardware support is implemented, an API will be required in the kernel so that
the vendor driver can report dirty pages to the VFIO module during migration
phases.

Below is the flow of state change for live migration where states in brackets
represent VM state, migration state and VFIO device state as:
(VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)

Live migration save path:
QEMU normal running state
(RUNNING, _NONE, _RUNNING)
|
migrate_init spawns migration_thread.
(RUNNING, _SETUP, _RUNNING|_SAVING)
Migration thread then calls each device's .save_setup()
|
(RUNNING, _ACTIVE, _RUNNING|_SAVING)
If device is active, get pending bytes by .save_live_pending()
if pending bytes >= threshold_size,  call save_live_iterate()
Data of VFIO device for pre-copy phase is copied.
Iterate till pending bytes converge and are less than threshold
|
On migration completion, vCPUs stop and .save_live_complete_precopy is called
for each active device. The VFIO device is then transitioned into
the _SAVING state.
(FINISH_MIGRATE, _DEVICE, _SAVING)
For the VFIO device, iterate in .save_live_complete_precopy until
pending data is 0.
(FINISH_MIGRATE, _DEVICE, _STOPPED)
|
(FINISH_MIGRATE, _COMPLETED, _STOPPED)
Migration thread schedules the cleanup bottom half and exits.

Live migration resume path:
Incoming migration calls .load_setup for each device
(RESTORE_VM, _ACTIVE, _STOPPED)
|
For each device, .load_state is called for that device section data
|
At the end, .load_cleanup is called for each device and vCPUs are started.
|
(RUNNING, _NONE, _RUNNING)

Note that:
- Migration post copy is not supported.
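The device_state combinations used in the flow above follow the bit layout in the proposed vfio.h (bit 0 _RUNNING, bit 1 _SAVING, bit 2 _RESUMING). As a sketch only, with shortened stand-in names for the real VFIO_DEVICE_STATE_* macros:

```c
#include <assert.h>
#include <stdint.h>

/* Stand-ins for VFIO_DEVICE_STATE_* from the proposed header. */
#define STATE_STOP      0u          /* 000b */
#define STATE_RUNNING   (1u << 0)   /* bit 0 */
#define STATE_SAVING    (1u << 1)   /* bit 1 */
#define STATE_RESUMING  (1u << 2)   /* bit 2 */

/* A state is writable only if _RESUMING is not combined with other bits;
 * this rejects 101b and 111b (invalid) as well as 110b (the error
 * encoding, which is reported by the driver, never written). */
static int state_is_valid(uint32_t s)
{
    return !((s & STATE_RESUMING) && (s & (STATE_SAVING | STATE_RUNNING)));
}
```

So pre-copy is _RUNNING|_SAVING (011b), stop-and-copy is _SAVING alone (010b), and resume is _RESUMING alone (100b), matching the bracketed states in the diagram.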

v9 -> v16
- KABI almost finalised on kernel patches.
- Added support for migration with vIOMMU 

[PATCH v16 QEMU 02/16] vfio: Add function to unmap VFIO region

2020-03-24 Thread Kirti Wankhede
This function will be used for the migration region.
The migration region is mmapped when migration starts and will be unmapped
when migration completes.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/common.c  | 20 
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 22 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 0b3593b3c0c4..4a2f0d6a2233 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -983,6 +983,26 @@ int vfio_region_mmap(VFIORegion *region)
 return 0;
 }
 
+void vfio_region_unmap(VFIORegion *region)
+{
+int i;
+
+if (!region->mem) {
+return;
+}
+
+for (i = 0; i < region->nr_mmaps; i++) {
+trace_vfio_region_unmap(memory_region_name(&region->mmaps[i].mem),
+region->mmaps[i].offset,
+region->mmaps[i].offset +
+region->mmaps[i].size - 1);
+memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
+munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+object_unparent(OBJECT(&region->mmaps[i].mem));
+region->mmaps[i].mmap = NULL;
+}
+}
+
 void vfio_region_exit(VFIORegion *region)
 {
 int i;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a33ffd..8cdc27946cb8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -111,6 +111,7 @@ vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Reg
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps 
enabled: %d"
+vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Region %s unmap [0x%lx - 0x%lx]"
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd564209ac71..8d7a0fbb1046 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -171,6 +171,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
   int index, const char *name);
 int vfio_region_mmap(VFIORegion *region);
 void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
-- 
2.7.0




[PATCH v16 QEMU 03/16] vfio: Add vfio_get_object callback to VFIODeviceOps

2020-03-24 Thread Kirti Wankhede
Hook vfio_get_object callback for PCI devices.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Suggested-by: Cornelia Huck 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/pci.c | 8 
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 5e75a95129ac..6c77c12e44b9 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2407,10 +2407,18 @@ static void vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
 }
 }
 
+static Object *vfio_pci_get_object(VFIODevice *vbasedev)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+return OBJECT(vdev);
+}
+
 static VFIODeviceOps vfio_pci_ops = {
 .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
 .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
 .vfio_eoi = vfio_intx_eoi,
+.vfio_get_object = vfio_pci_get_object,
 };
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8d7a0fbb1046..74261feaeac9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -119,6 +119,7 @@ struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
 void (*vfio_eoi)(VFIODevice *vdev);
+Object *(*vfio_get_object)(VFIODevice *vdev);
 };
 
 typedef struct VFIOGroup {
-- 
2.7.0




[PATCH v16 QEMU 01/16] vfio: KABI for migration interface - Kernel header placeholder

2020-03-24 Thread Kirti Wankhede
Kernel header patches are being reviewed along with the kernel-side changes.
This patch is only a placeholder.
Link to the kernel patch set:
https://lists.gnu.org/archive/html/qemu-devel/2020-03/msg07429.html

This patch includes all the changes to vfio.h from the above patch set.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 linux-headers/linux/vfio.h | 297 -
 1 file changed, 295 insertions(+), 2 deletions(-)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370d2928..78cadee85ac6 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *device state. After successful transition to a given state, the
+ *vendor driver should return success on write(device_state, state)
+ *system call. If the device state transition fails, the vendor driver
+ *should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *   that is, if write(device_state, state) returns an error, read
+ *   device_state again to determine the current state of the device from
+ *   the vendor driver.
+ *  - The vendor driver should return previous state of the device unless
+ *the vendor driver has encountered an internal error, in which case
+ *the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *device from VFIO_DEVICE_STATE_ERROR state. If the device is
+ *indicated to be in a valid device state by reading device_state, the
+ *user application may attempt to transition the device to any valid
+ *state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *it indicates the _STOP state. When the device state is changed to
+ *_STOP, driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *driver should start gathering device state information that will be
+ *provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *the driver should prepare to resume the device. Data provided through
+ *the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +------- _RESUMING
+ *  |+------ _SAVING
+ *  ||+----- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *              _RESUMING  _RUNNING    Pre-copy    Stop-and-copy   _STOP
+ *                (100b)     (001b)      (011b)        (010b)      (000b)
+ * 0. Running or default state
+ *                             |
+ *
+ * 1. Normal Shutdown (optional)
+ *                             |------------------------------------->|
+ *
+ * 2. Save the state or suspend
+ *                             |------------------------->|---------->|
+ *
+ * 3. Save the state during live migration
+ * 

[PATCH] hw/net/allwinner-sun8i-emac.c: Fix REG_ADDR_HIGH/LOW reads

2020-03-24 Thread Peter Maydell
Coverity points out (CID 1421926) that the read code for
REG_ADDR_HIGH reads off the end of the buffer, because it does a
32-bit read from byte 4 of a 6-byte buffer.

The code also has an endianness issue for both REG_ADDR_HIGH and
REG_ADDR_LOW, because it will do the wrong thing on a big-endian
host.

Rewrite the read code to use ldl_le_p() and lduw_le_p() to fix this;
the write code is not incorrect, but for consistency we make it use
stl_le_p() and stw_le_p().

Signed-off-by: Peter Maydell 
---
 hw/net/allwinner-sun8i-emac.c | 12 
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/hw/net/allwinner-sun8i-emac.c b/hw/net/allwinner-sun8i-emac.c
index 3fc5e346401..fc67a1be70a 100644
--- a/hw/net/allwinner-sun8i-emac.c
+++ b/hw/net/allwinner-sun8i-emac.c
@@ -611,10 +611,10 @@ static uint64_t allwinner_sun8i_emac_read(void *opaque, 
hwaddr offset,
 value = s->mii_data;
 break;
 case REG_ADDR_HIGH: /* MAC Address High */
-value = *(((uint32_t *) (s->conf.macaddr.a)) + 1);
+value = lduw_le_p(s->conf.macaddr.a + 4);
 break;
 case REG_ADDR_LOW:  /* MAC Address Low */
-value = *(uint32_t *) (s->conf.macaddr.a);
+value = ldl_le_p(s->conf.macaddr.a);
 break;
 case REG_TX_DMA_STA:/* Transmit DMA Status */
 break;
@@ -728,14 +728,10 @@ static void allwinner_sun8i_emac_write(void *opaque, 
hwaddr offset,
 s->mii_data = value;
 break;
 case REG_ADDR_HIGH: /* MAC Address High */
-s->conf.macaddr.a[4] = (value & 0xff);
-s->conf.macaddr.a[5] = (value & 0xff00) >> 8;
+stw_le_p(s->conf.macaddr.a + 4, value);
 break;
 case REG_ADDR_LOW:  /* MAC Address Low */
-s->conf.macaddr.a[0] = (value & 0xff);
-s->conf.macaddr.a[1] = (value & 0xff00) >> 8;
-s->conf.macaddr.a[2] = (value & 0xff0000) >> 16;
-s->conf.macaddr.a[3] = (value & 0xff000000) >> 24;
+stl_le_p(s->conf.macaddr.a, value);
 break;
 case REG_TX_DMA_STA:/* Transmit DMA Status */
 case REG_TX_CUR_DESC:   /* Transmit Current Descriptor */
-- 
2.20.1




Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-24 Thread Linus Walleij
On Tue, Mar 24, 2020 at 7:48 PM Theodore Y. Ts'o  wrote:
> On Tue, Mar 24, 2020 at 09:29:58AM +, Peter Maydell wrote:
> >
> > On the contrary, that would be a much better interface for QEMU.
> > We always know when we're doing an open-syscall on behalf
> > of the guest, and it would be trivial to make the fcntl() call then.
> > That would ensure that we don't accidentally get the
> > '32-bit semantics' on file descriptors QEMU opens for its own
> > purposes, and wouldn't leave us open to the risk in future that
> > setting the PER_LINUX32 flag for all of QEMU causes
> > unexpected extra behaviour in future kernels that would be correct
> > for the guest binary but wrong/broken for QEMU's own internals.
>
> If using a flag set by fcntl is better for qemu, then by all means
> let's go with that instead of using a personality flag/number.
>
> Linus, do you have what you need to do a respin of the patch?

Absolutely, I'm a bit occupied this week but I will try to get to it
early next week!

Thanks a lot for the directions here, it's highly valuable.

Yours,
Linus Walleij



Re: Potential missing checks

2020-03-24 Thread Peter Maydell
On Tue, 24 Mar 2020 at 20:39, Mansour Ahmadi  wrote:
>
> Thank you for looking into this, Peter. I agree that static analysis has 
> false positives; that's why I called them potential. Basically, they are 
> found based on code similarity so I might be wrong and I need a second 
> opinion from QEMU developers. I appreciate your effort.

The thing is, you're making us do all the work here. That's
not very useful to us. It's doubly unuseful when there's
a strong chance that when we do do the work of looking
at the code it turns out that there's no problem.

"I did some static analysis, and I looked at the
results, and I dug through the QEMU code, and it
does seem to me that this could well be a bug" is
definitely useful. "I did some static analysis using
only analysis techniques that have a pretty
low false positive rate, and here is a selection of
the results" is also useful. But "I just ran the
code through an analyser that produces lots of
false positives and then I didn't do any further
human examination of the results" is of much less
utility to the project, I'm afraid.

> For the first case, I noticed a check on offset (if (offset)) before negating 
> it and passing to stream function here.
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L1748
>
> Similar scenario happened here WITHOUT the check:
> https://github.com/qemu/qemu/blob/c532b954d96f96d361ca31308f75f1b95bd4df76/disas/arm.c#L2731-L2733

So, this is in the disassembler. The difference is
just whether we print out a register+offset memory
reference where the offset happens to be zero
as "[reg, #0]" or just "[reg]", and the no-special-case-0
is for encodings which are always pc-relative.
So even if it was a missing check the results are
entirely harmless, since anybody reading the disassembly
will understand the #0 fine.

Secondly, this code is imported from binutils,
so we usually don't worry too much about fixing
up minor bugs in it.

Finally, I went and checked the Arm specs, and for
the kinds of PC-relative load/store that the second
example is handling the specified disassembly format
does mandate the "pc, #0" (whereas the other example
is correctly skipping it for 0-immediates because
in those insns the offset is optional in disassembly).

So the code is correct as it stands.

thanks
-- PMM



Re: [PULL 0/2] Ide patches

2020-03-24 Thread John Snow



On 3/24/20 3:55 PM, John Snow wrote:
> The following changes since commit 736cf607e40674776d752acc201f565723e86045:
> 
>   Update version for v5.0.0-rc0 release (2020-03-24 17:50:00 +)
> 
> are available in the Git repository at:
> 
>   https://github.com/jnsnow/qemu.git tags/ide-pull-request
> 
> for you to fetch changes up to 51058b3b3bcbe62506cf191fca1c0d679bb80f2b:
> 
>   hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs() (2020-03-24 
> 15:52:16 -0400)
> 
> 
> Pull request: IDE
> 
> Admittedly the first one is not a crisis fix; but I think it's low-risk to
> include for rc1.
> 
> The second one is yours, and will shush coverity.
> 
> 
> 
> Peter Maydell (1):
>   hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()
> 
> Sven Schnelle (1):
>   fdc/i8257: implement verify transfer mode
> 
>  include/hw/isa/isa.h |  1 -
>  hw/block/fdc.c   | 61 +---
>  hw/dma/i8257.c   | 20 ++-
>  hw/ide/sii3112.c |  8 +++---
>  4 files changed, 35 insertions(+), 55 deletions(-)
> 

NACK. Mark Cave-Ayland is sending additional fixes.

--js




[PATCH for-5.0 2/3] via-ide: use qdev gpio rather than qemu_allocate_irqs()

2020-03-24 Thread Mark Cave-Ayland
This prevents the memory allocated by qemu_allocate_irqs() from being
leaked, a leak which can in some cases be spotted by Coverity (CID 1421984).

Signed-off-by: Mark Cave-Ayland 
---
 hw/ide/via.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 2a55b7fbc6..be09912b33 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -160,6 +160,7 @@ static void via_ide_reset(DeviceState *dev)
 static void via_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
+DeviceState *ds = DEVICE(dev);
 uint8_t *pci_conf = dev->config;
 int i;
 
@@ -187,9 +188,10 @@ static void via_ide_realize(PCIDevice *dev, Error **errp)
 bmdma_setup_bar(d);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, &d->bmdma_bar);
 
+qdev_init_gpio_in(ds, via_ide_set_irq, 2);
 for (i = 0; i < 2; i++) {
-ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(d), i, 2);
-ide_init2(&d->bus[i], qemu_allocate_irq(via_ide_set_irq, d, i));
+ide_bus_new(&d->bus[i], sizeof(d->bus[i]), ds, i, 2);
+ide_init2(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 d->bmdma[i].bus = &d->bus[i];
-- 
2.20.1




[PATCH for-5.0 3/3] cmd646-ide: use qdev gpio rather than qemu_allocate_irqs()

2020-03-24 Thread Mark Cave-Ayland
This prevents the memory allocated by qemu_allocate_irqs() from being
leaked, a leak which can in some cases be spotted by Coverity (CID 1421984).

Signed-off-by: Mark Cave-Ayland 
---
 hw/ide/cmd646.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/hw/ide/cmd646.c b/hw/ide/cmd646.c
index 699f25824d..c254631485 100644
--- a/hw/ide/cmd646.c
+++ b/hw/ide/cmd646.c
@@ -249,8 +249,8 @@ static void cmd646_pci_config_write(PCIDevice *d, uint32_t 
addr, uint32_t val,
 static void pci_cmd646_ide_realize(PCIDevice *dev, Error **errp)
 {
 PCIIDEState *d = PCI_IDE(dev);
+DeviceState *ds = DEVICE(dev);
 uint8_t *pci_conf = dev->config;
-qemu_irq *irq;
 int i;
 
 pci_conf[PCI_CLASS_PROG] = 0x8f;
@@ -291,16 +291,15 @@ static void pci_cmd646_ide_realize(PCIDevice *dev, Error 
**errp)
 /* TODO: RST# value should be 0 */
 pci_conf[PCI_INTERRUPT_PIN] = 0x01; // interrupt on pin 1
 
-irq = qemu_allocate_irqs(cmd646_set_irq, d, 2);
+qdev_init_gpio_in(ds, cmd646_set_irq, 2);
 for (i = 0; i < 2; i++) {
-ide_bus_new(&d->bus[i], sizeof(d->bus[i]), DEVICE(dev), i, 2);
-ide_init2(&d->bus[i], irq[i]);
+ide_bus_new(&d->bus[i], sizeof(d->bus[i]), ds, i, 2);
+ide_init2(&d->bus[i], qdev_get_gpio_in(ds, i));
 
 bmdma_init(&d->bus[i], &d->bmdma[i], d);
 d->bmdma[i].bus = &d->bus[i];
 ide_register_restart_cb(&d->bus[i]);
 }
-g_free(irq);
 }
 
 static void pci_cmd646_ide_exitfn(PCIDevice *dev)
-- 
2.20.1




Re: [PATCH v5 07/18] s390x: protvirt: Inhibit balloon when switching to protected mode

2020-03-24 Thread Brijesh Singh


On 3/20/20 1:43 PM, Halil Pasic wrote:
> On Thu, 19 Mar 2020 18:31:11 +0100
> David Hildenbrand  wrote:
>
>> [...]
>>
 I asked this question already to Michael (cc) via a different
 channel, but hare is it again:

 Why does the balloon driver not support VIRTIO_F_IOMMU_PLATFORM? It
 is absolutely not clear to me. The introducing commit mentioned
 that it "bypasses DMA". I fail to see that.

 At least the communication via the SG mechanism should work
 perfectly fine with an IOMMU enabled. So I assume it boils down to
 the pages that we inflate/deflate not being referenced via IOVA?
>>> AFAIU the IOVA/GPA stuff is not the problem here. You have said it
>>> yourself, the SG mechanism would work for balloon out of the box, as
>>> it does for the other virtio devices. 
>>>
>>> But VIRTIO_F_ACCESS_PLATFORM (aka VIRTIO_F_IOMMU_PLATFORM)  not
>>> presented means according to Michael that the device has full access
>>> to the entire guest RAM. If VIRTIO_F_ACCESS_PLATFORM is negotiated
>>> this may or may not be the case.
>> So you say
>>
>> "The virtio specification tells that the device is to present
>> VIRTIO_F_ACCESS_PLATFORM (a.k.a. VIRTIO_F_IOMMU_PLATFORM) when the
>> device "can only access certain memory addresses with said access
>> specified and/or granted by the platform"."
>>
>> So, AFAIU, *any* virtio device (hypervisor side) has to present this
>> flag when PV is enabled. 
> Yes, and balloon says bye bye when running in PV mode is only a secondary
> objective. I've compiled some references:
>
> "To summarize, the necessary conditions for a hack along these lines
> (using DMA API without VIRTIO_F_ACCESS_PLATFORM) are that we detect that:
>
>   - secure guest mode is enabled - so we know that since we don't share
> most memory regular virtio code won't
> work, even though the buggy hypervisor didn't set 
> VIRTIO_F_ACCESS_PLATFORM" 
> (Michael Tsirkin, https://lkml.org/lkml/2020/2/20/1021)
> I.e.: PV but !VIRTIO_F_ACCESS_PLATFORM \implies bugy hypervisor
>
>
> "If VIRTIO_F_ACCESS_PLATFORM is set then things just work.  If
> VIRTIO_F_ACCESS_PLATFORM is clear device is supposed to have access to
> all of memory.  You can argue in various ways but it's easier to just
> declare a behaviour that violates this a bug."
> (Michael Tsirkin, https://lkml.org/lkml/2020/2/21/1626)
> This one is about all memory guest, and not just the buffers transfered
> via the virtqueue, which surprised me a bit at the beginning. But balloon
> actually needs this.
>
> "A device SHOULD offer VIRTIO_F_ACCESS_PLATFORM if its access to memory
> is through bus addresses distinct from and translated by the platform to
> physical addresses used by the driver, and/or if it can only access
> certain memory addresses with said access specified and/or granted by
> the platform. A device MAY fail to operate further if
> VIRTIO_F_ACCESS_PLATFORM is not accepted. "
> (https://docs.oasis-open.org/virtio/virtio/v1.1/cs01/virtio-v1.1-cs01.html#x1-4120002)
>
>
>> In that regard, your patch makes perfect sense
>> (although I am not sure it's a good idea to overwrite these feature
>> bits
>> - maybe they should be activated on the cmdline permanently instead
>> when PV is to be used? (or enable )).
> I didn't understand the last part. I believe conserving the user
> specified value when not running in PV mode is better than the hard
> overwrite I did here. I wanted a discussion starter.
>
> I think the other option (with respect to let QEMU manage this for user,
> i.e. what I try to do here) is to fence the conversion if virtio devices
> that do not offer VIRTIO_F_ACCESS_PLATFORM are attached; and disallow
> hotplug of such devices at some point during the conversion.
>
> I believe that alternative is even uglier.
>
> IMHO we don't want the end user to fiddle with iommu_platform, because
> all the 'benefit' he gets from that is possibility to make a mistake.
> For example, I got an internal bug report saying virtio is broken with
> PV, which boiled down to an overlooked auto generated NIC, which of
> course had iommu_platform (VIRTIO_F_ACCESS_PLATFORM) not set.
>
>>> The actual problem is that the 

[PATCH for-5.0 0/3] ide: fix potential memory leaks (plus one via-ide bugfix)

2020-03-24 Thread Mark Cave-Ayland
This was supposed to be a simple patchset to switch via-ide and cmd646-ide
over to use qdev gpio in the same way as Peter's patch did for sil3112, but
at the same time I spotted a silly mistake in my last set of via-ide
patches which is included as patch 1.

I'm not sure exactly why Coverity CID 1421984 isn't triggered by the
via-ide and cmd646-ide devices, however given the simplicity of the fix it
seems worth doing just to keep everything the same and ensure it won't
suddenly appear in future.

The via-ide changes were tested using the instructions provided by Zoltan
for MIPS fulong2e and PPC pegasos2, whilst the cmd646 change was tested
using one of my SPARC64 Linux test images.

Signed-off-by: Mark Cave-Ayland 


Mark Cave-Ayland (3):
  via-ide: don't use PCI level for legacy IRQs
  via-ide: use qdev gpio rather than qemu_allocate_irqs()
  cmd646-ide: use qdev gpio rather than qemu_allocate_irqs()

 hw/ide/cmd646.c | 9 -
 hw/ide/via.c| 7 ---
 2 files changed, 8 insertions(+), 8 deletions(-)

-- 
2.20.1




[PATCH for-5.0 1/3] via-ide: don't use PCI level for legacy IRQs

2020-03-24 Thread Mark Cave-Ayland
The PCI level calculation was accidentally left in when rebasing from a
previous patchset. Since both IRQs are driven separately, the value
being passed into the IRQ handler should be used directly.

Signed-off-by: Mark Cave-Ayland 
---
 hw/ide/via.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/hw/ide/via.c b/hw/ide/via.c
index 8de4945cc1..2a55b7fbc6 100644
--- a/hw/ide/via.c
+++ b/hw/ide/via.c
@@ -112,7 +112,6 @@ static void via_ide_set_irq(void *opaque, int n, int level)
 d->config[0x70 + n * 8] &= ~0x80;
 }
 
-level = (d->config[0x70] & 0x80) || (d->config[0x78] & 0x80);
 qemu_set_irq(isa_get_irq(NULL, 14 + n), level);
 }
 
-- 
2.20.1




Re: [PATCH] hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

2020-03-24 Thread John Snow



On 3/24/20 4:43 PM, Mark Cave-Ayland wrote:
> On 23/03/2020 15:17, Peter Maydell wrote:
> 
>> Coverity points out (CID 1421984) that we are leaking the
>> memory returned by qemu_allocate_irqs(). We can avoid this
>> leak by switching to using qdev_init_gpio_in(); the base
>> class finalize will free the irqs that this allocates under
>> the hood.
>>
>> Signed-off-by: Peter Maydell 
>> ---
>> This is how the 'use qdev gpio' approach to fixing the leak looks.
>> Disclaimer: I have only tested this with "make check", nothing more.
>>
>>  hw/ide/sii3112.c | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>>
>> diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
>> index 06605d7af2b..2ae6f5d9df6 100644
>> --- a/hw/ide/sii3112.c
>> +++ b/hw/ide/sii3112.c
>> @@ -251,8 +251,8 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
>> **errp)
>>  {
>>  SiI3112PCIState *d = SII3112_PCI(dev);
>>  PCIIDEState *s = PCI_IDE(dev);
>> +DeviceState *ds = DEVICE(dev);
>>  MemoryRegion *mr;
>> -qemu_irq *irq;
>>  int i;
>>  
>>  pci_config_set_interrupt_pin(dev->config, 1);
>> @@ -280,10 +280,10 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
>> **errp)
>>  memory_region_init_alias(mr, OBJECT(d), "sii3112.bar4", &d->mmio, 0, 16);
>>  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, mr);
>>  
>> -irq = qemu_allocate_irqs(sii3112_set_irq, d, 2);
>> +qdev_init_gpio_in(ds, sii3112_set_irq, 2);
>>  for (i = 0; i < 2; i++) {
>>  ide_bus_new(&s->bus[i], sizeof(s->bus[i]), DEVICE(dev), i, 1);
>> -ide_init2(&s->bus[i], irq[i]);
>> +ide_init2(&s->bus[i], qdev_get_gpio_in(ds, i));
>>  
>>  bmdma_init(&s->bus[i], &s->bmdma[i], s);
>>  s->bmdma[i].bus = &s->bus[i];
> 
> Looks like there is similar use of qemu_allocate_irqs() in via-ide and 
> cmd646-ide,
> and also reviewing my latest via-ide changes I spotted a silly mistake which 
> was
> obviously left in from a previous experimental version.
> 
> I'm not sure why Coverity doesn't pick up these other occurrences, however 
> I'll send
> along a patchset for this shortly.
> 

OK;

I will rescind my PR and will re-send it with your patches included.

--js




[Bug 1866892] Re: guest OS catches a page fault bug when running dotnet

2020-03-24 Thread Robert Henry
I've stepped/nexted from the helper_iret_protected, going deep into the
bowels of the TLB, MMU and page table engine.  None of which I
understand. The helper_ret_protected faults in the first POPQ_RA.  I'll
investigate the value of sp at the time of the POPQ_RA.

Here's the POPQ_RA in i386/seg_helper.c:2140

sp = env->regs[R_ESP];
ssp = env->segs[R_SS].base;
new_eflags = 0; /* avoid warning */
#ifdef TARGET_X86_64
if (shift == 2) {
POPQ_RA(sp, new_eip, retaddr);
POPQ_RA(sp, new_cs, retaddr);
new_cs &= 0x;
if (is_iret) {
POPQ_RA(sp, new_eflags, retaddr);
}

and here's the stack.  Note some of the logical intermediate frames are
optimized out due to -O3 and inlining. (The value of env->error_code is 1.)

0  0x55a370c0 in raise_interrupt2
(env=env@entry=0x566ef200, intno=14, is_int=is_int@entry=0, 
error_code=1, next_eip_addend=next_eip_addend@entry=0, 
retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#1  0x55a377f9 in raise_exception_err_ra
(env=env@entry=0x566ef200, exception_index=, 
error_code=, retaddr=retaddr@entry=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:127
#2  0x55a37d69 in x86_cpu_tlb_fill
(cs=0x566e69a0, addr=140727872411616, size=, 
access_type=MMU_DATA_LOAD, mmu_idx=0, probe=, 
retaddr=140736367565663) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/excp_helper.c:697
#3  0x55952295 in tlb_fill
(cpu=0x566e69a0, addr=140727872411616, size=8, 
access_type=MMU_DATA_LOAD, mmu_idx=0, retaddr=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1017
#4  0x55956320 in load_helper
(full_load=0x55956140 , code_read=false, op=MO_64, 
retaddr=93825010692608, oi=48, addr=140727872411616, env=0x566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/include/exec/cpu-all.h:426
#5  0x55956320 in helper_le_ldq_mmu
(env=env@entry=0x566ef200, addr=addr@entry=140727872411616, 
oi=oi@entry=48, retaddr=retaddr@entry=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1688
#6  0x55956dc0 in cpu_load_helper
(full_load=0x55956140 , op=MO_64, 
retaddr=140736367565663, mmu_idx=, addr=140727872411616, 
env=0x566ef200) at 
/mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1752
#7  0x55956dc0 in cpu_ldq_mmuidx_ra
(env=env@entry=0x566ef200, addr=addr@entry=140727872411616, 
mmu_idx=, ra=ra@entry=140736367565663)
--Type  for more, q to quit, c to continue without paging--
at /mnt/robhenry/qemu_robhenry_amd64/qemu/accel/tcg/cputlb.c:1799
#8  0x55a4ff09 in helper_ret_protected
(env=env@entry=0x566ef200, shift=shift@entry=2, 
is_iret=is_iret@entry=1, addend=addend@entry=0, retaddr=140736367565663)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2140
#9  0x55a50ff5 in helper_iret_protected (env=0x566ef200, shift=2, 
next_eip=-999377888)
at /mnt/robhenry/qemu_robhenry_amd64/qemu/target/i386/seg_helper.c:2363
#10 0x7fffbd321b5f in code_gen_buffer ()

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1866892

Title:
  guest OS catches a page  fault bug when running dotnet

Status in QEMU:
  New

Bug description:
  The linux guest OS catches a page fault bug when running the dotnet
  application.

  host = metal = x86_64
  host OS = ubuntu 19.10
  qemu emulation, without KVM, with "tiny code generator" tcg; no plugins; 
built from head/master
  guest emulation = x86_64
  guest OS = ubuntu 19.10
  guest app = dotnet, running any program

  qemu sha=7bc4d1980f95387c4cc921d7a066217ff4e42b70 (head/master Mar 10,
  2020)

  qemu invocation is:

  qemu/build/x86_64-softmmu/qemu-system-x86_64 \
-m size=4096 \
-smp cpus=1 \
-machine type=pc-i440fx-5.0,accel=tcg \
-cpu Skylake-Server-v1 \
-nographic \
-bios OVMF-pure-efi.fd \
-drive if=none,id=hd0,file=ubuntu-19.10-server-cloudimg-amd64.img \
-device virtio-blk,drive=hd0 \
-drive if=none,id=cloud,file=linux_cloud_config.img \
-device virtio-blk,drive=cloud \
-netdev user,id=user0,hostfwd=tcp::2223-:22 \
-device virtio-net,netdev=user0

  
  Here's the guest kernel console output:

  
  [ 2834.005449] BUG: unable to handle page fault for address: 7fffc2c0
  [ 2834.009895] #PF: supervisor read access in user mode
  [ 2834.013872] #PF: error_code(0x0001) - permissions violation
  [ 2834.018025] IDT: 0xfe00 (limit=0xfff) GDT: 0xfe001000 
(limit=0x7f)
  [ 2834.022242] LDTR: NULL
  [ 2834.026306] TR: 0x40 -- base=0xfe003000 limit=0x206f
  [ 2834.030395] PGD 8000360d0067 P4D 8000360d0067 PUD 36105067 PMD 
36193067 PTE 800076d8e867
  [ 2834.038672] Oops: 0001 [#4] SMP PTI
  [ 

Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Alex Williamson
On Tue, 24 Mar 2020 14:37:16 -0600
Alex Williamson  wrote:

> On Wed, 25 Mar 2020 01:02:36 +0530
> Kirti Wankhede  wrote:
> 
> > VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> > - Start dirty pages tracking while migration is active
> > - Stop dirty pages tracking.
> > - Get dirty pages bitmap. It's the user space application's responsibility to
> >   copy content of dirty pages from source to destination during migration.
> > 
> > To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> > structure. Bitmap size is calculated considering smallest supported page
> > size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> > 
> > Bitmap is populated for already pinned pages when bitmap is allocated for
> > a vfio_dma with the smallest supported page size. Update bitmap from
> > pinning functions when tracking is enabled. When user application queries
> > bitmap, check if requested page size is same as page size used to
> > populated bitmap. If it is equal, copy bitmap, but if not equal, return
> > error.
> > 
> > Signed-off-by: Kirti Wankhede 
> > Reviewed-by: Neo Jia 
> > ---
> >  drivers/vfio/vfio_iommu_type1.c | 265 
> > +++-
> >  1 file changed, 259 insertions(+), 6 deletions(-)
> > 
> > diff --git a/drivers/vfio/vfio_iommu_type1.c 
> > b/drivers/vfio/vfio_iommu_type1.c
> > index 70aeab921d0f..27ed069c5053 100644
> > --- a/drivers/vfio/vfio_iommu_type1.c
> > +++ b/drivers/vfio/vfio_iommu_type1.c
> > @@ -71,6 +71,7 @@ struct vfio_iommu {
> > unsigned intdma_avail;
> > boolv2;
> > boolnesting;
> > +   booldirty_page_tracking;
> >  };
> >  
> >  struct vfio_domain {
> > @@ -91,6 +92,7 @@ struct vfio_dma {
> > boollock_cap;   /* capable(CAP_IPC_LOCK) */
> > struct task_struct  *task;
> > struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> > +   unsigned long   *bitmap;
> >  };
> >  
> >  struct vfio_group {
> > @@ -125,7 +127,21 @@ struct vfio_regions {
> >  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
> > (!list_empty(&iommu->domain_list))
> >  
> > +#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / 
> > BITS_PER_BYTE)
> > +
> > +/*
> > + * Input argument of number of bits to bitmap_set() is unsigned integer, 
> > which
> > + * further casts to signed integer for unaligned multi-bit operation,
> > + * __bitmap_set().
> > + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 
> > bits/byte,
> > + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> > + * system.
> > + */
> > +#define DIRTY_BITMAP_PAGES_MAX (uint64_t)(INT_MAX - 1)
> > +#define DIRTY_BITMAP_SIZE_MAX   
> > DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> > +
> >  static int put_pfn(unsigned long pfn, int prot);
> > +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
> >  
> >  /*
> >   * This code handles mapping and unmapping of user data buffers
> > @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, 
> > struct vfio_dma *old)
> > rb_erase(&old->node, &iommu->dma_list);
> >  }
> >  
> > +
> > +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
> > +{
> > +   uint64_t npages = dma->size / pgsize;
> > +
> > +   if (npages > DIRTY_BITMAP_PAGES_MAX)
> > +   return -EINVAL;
> > +
> > +   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> > +   if (!dma->bitmap)
> > +   return -ENOMEM;
> > +
> > +   return 0;
> > +}
> > +
> > +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
> > +{
> > +   kfree(dma->bitmap);
> > +   dma->bitmap = NULL;
> > +}
> > +
> > +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
> > +{
> > +   struct rb_node *p;
> > +
> > +   if (RB_EMPTY_ROOT(&dma->pfn_list))
> > +   return;
> > +
> > +   for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> > +   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
> > +
> > +   bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
> > +   }
> > +}
> > +
> > +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t 
> > pgsize)
> > +{
> > +   struct rb_node *n = rb_first(&iommu->dma_list);
> > +
> > +   for (; n; n = rb_next(n)) {
> > +   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> > +   int ret;
> > +
> > +   ret = vfio_dma_bitmap_alloc(dma, pgsize);
> > +   if (ret) {
> > +   struct rb_node *p = rb_prev(n);
> > +
> > +   for (; p; p = rb_prev(p)) {
> > +   struct vfio_dma *dma = rb_entry(n,
> > +   struct vfio_dma, node);
> > +
> > +   vfio_dma_bitmap_free(dma);
> > +   }
> > +   return ret;
> > +   }
> > 

Re: [PATCH] hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

2020-03-24 Thread Mark Cave-Ayland
On 23/03/2020 15:17, Peter Maydell wrote:

> Coverity points out (CID 1421984) that we are leaking the
> memory returned by qemu_allocate_irqs(). We can avoid this
> leak by switching to using qdev_init_gpio_in(); the base
> class finalize will free the irqs that this allocates under
> the hood.
> 
> Signed-off-by: Peter Maydell 
> ---
> This is how the 'use qdev gpio' approach to fixing the leak looks.
> Disclaimer: I have only tested this with "make check", nothing more.
> 
>  hw/ide/sii3112.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
> index 06605d7af2b..2ae6f5d9df6 100644
> --- a/hw/ide/sii3112.c
> +++ b/hw/ide/sii3112.c
> @@ -251,8 +251,8 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
> **errp)
>  {
>  SiI3112PCIState *d = SII3112_PCI(dev);
>  PCIIDEState *s = PCI_IDE(dev);
> +DeviceState *ds = DEVICE(dev);
>  MemoryRegion *mr;
> -qemu_irq *irq;
>  int i;
>  
>  pci_config_set_interrupt_pin(dev->config, 1);
> @@ -280,10 +280,10 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
> **errp)
>  memory_region_init_alias(mr, OBJECT(d), "sii3112.bar4", &d->mmio, 0, 16);
>  pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, mr);
>  
> -irq = qemu_allocate_irqs(sii3112_set_irq, d, 2);
> +qdev_init_gpio_in(ds, sii3112_set_irq, 2);
>  for (i = 0; i < 2; i++) {
>  ide_bus_new(&s->bus[i], sizeof(s->bus[i]), DEVICE(dev), i, 1);
> -ide_init2(&s->bus[i], irq[i]);
> +ide_init2(&s->bus[i], qdev_get_gpio_in(ds, i));
>  
>  bmdma_init(&s->bus[i], &s->bmdma[i], s);
>  s->bmdma[i].bus = &s->bus[i];

Looks like there is similar use of qemu_allocate_irqs() in via-ide and 
cmd646-ide,
and also reviewing my latest via-ide changes I spotted a silly mistake which was
obviously left in from a previous experimental version.

I'm not sure why Coverity doesn't pick up these other occurrences, however I'll 
send
along a patchset for this shortly.


ATB,

Mark.



Re: [PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Alex Williamson
On Wed, 25 Mar 2020 01:02:36 +0530
Kirti Wankhede  wrote:

> VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
> - Start dirty pages tracking while migration is active
> - Stop dirty pages tracking.
> - Get dirty pages bitmap. It's the user space application's responsibility to
>   copy content of dirty pages from source to destination during migration.
> 
> To prevent DoS attack, memory for bitmap is allocated per vfio_dma
> structure. Bitmap size is calculated considering smallest supported page
> size. Bitmap is allocated for all vfio_dmas when dirty logging is enabled
> 
> Bitmap is populated for already pinned pages when bitmap is allocated for
> a vfio_dma with the smallest supported page size. Update bitmap from
> pinning functions when tracking is enabled. When user application queries
> bitmap, check if requested page size is same as page size used to
> populated bitmap. If it is equal, copy bitmap, but if not equal, return
> error.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 265 +++-
>  1 file changed, 259 insertions(+), 6 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 70aeab921d0f..27ed069c5053 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -71,6 +71,7 @@ struct vfio_iommu {
>   unsigned intdma_avail;
>   boolv2;
>   boolnesting;
> + booldirty_page_tracking;
>  };
>  
>  struct vfio_domain {
> @@ -91,6 +92,7 @@ struct vfio_dma {
>   boollock_cap;   /* capable(CAP_IPC_LOCK) */
>   struct task_struct  *task;
>   struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
> + unsigned long   *bitmap;
>  };
>  
>  struct vfio_group {
> @@ -125,7 +127,21 @@ struct vfio_regions {
>  #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)  \
>   (!list_empty(&iommu->domain_list))
>  
> +#define DIRTY_BITMAP_BYTES(n)   (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
> +
> +/*
> + * Input argument of number of bits to bitmap_set() is unsigned integer, which
> + * further casts to signed integer for unaligned multi-bit operation,
> + * __bitmap_set().
> + * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
> + * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
> + * system.
> + */
> +#define DIRTY_BITMAP_PAGES_MAX   (uint64_t)(INT_MAX - 1)
> +#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
> +
>  static int put_pfn(unsigned long pfn, int prot);
> +static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
>  
>  /*
>   * This code handles mapping and unmapping of user data buffers
> @@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
>   rb_erase(&old->node, &iommu->dma_list);
>  }
>  
> +
> +static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
> +{
> + uint64_t npages = dma->size / pgsize;
> +
> + if (npages > DIRTY_BITMAP_PAGES_MAX)
> + return -EINVAL;
> +
> + dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
> + if (!dma->bitmap)
> + return -ENOMEM;
> +
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free(struct vfio_dma *dma)
> +{
> + kfree(dma->bitmap);
> + dma->bitmap = NULL;
> +}
> +
> +static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
> +{
> + struct rb_node *p;
> +
> + if (RB_EMPTY_ROOT(&dma->pfn_list))
> + return;
> +
> + for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
> + struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
> +
> + bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
> + }
> +}
> +
> +static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {
> + struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
> + int ret;
> +
> + ret = vfio_dma_bitmap_alloc(dma, pgsize);
> + if (ret) {
> + struct rb_node *p = rb_prev(n);
> +
> + for (; p; p = rb_prev(p)) {
> + struct vfio_dma *dma = rb_entry(n,
> + struct vfio_dma, node);
> +
> + vfio_dma_bitmap_free(dma);
> + }
> + return ret;
> + }
> + vfio_dma_populate_bitmap(dma, pgsize);
> + }
> + return 0;
> +}
> +
> +static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
> +{
> + struct rb_node *n = rb_first(&iommu->dma_list);
> +
> + for (; n; n = rb_next(n)) {

Re: [PATCH v15 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Dr. David Alan Gilbert
* Alex Williamson (alex.william...@redhat.com) wrote:
> On Mon, 23 Mar 2020 23:01:18 -0400
> Yan Zhao  wrote:
> 
> > On Tue, Mar 24, 2020 at 02:51:14AM +0800, Dr. David Alan Gilbert wrote:
> > > * Alex Williamson (alex.william...@redhat.com) wrote:  
> > > > On Mon, 23 Mar 2020 23:24:37 +0530
> > > > Kirti Wankhede  wrote:
> > > >   
> > > > > On 3/21/2020 12:29 AM, Alex Williamson wrote:  
> > > > > > On Sat, 21 Mar 2020 00:12:04 +0530
> > > > > > Kirti Wankhede  wrote:
> > > > > > 
> > > > > >> On 3/20/2020 11:31 PM, Alex Williamson wrote:
> > > > > >>> On Fri, 20 Mar 2020 23:19:14 +0530
> > > > > >>> Kirti Wankhede  wrote:
> > > > > >>>
> > > > >  On 3/20/2020 4:27 AM, Alex Williamson wrote:
> > > > > > On Fri, 20 Mar 2020 01:46:41 +0530
> > > > > > Kirti Wankhede  wrote:
> > > > > >   
> > > > > >>
> > > > > >> 
> > > > > >>
> > > > > > >> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
> > > > > >> +size_t size, uint64_t pgsize,
> > > > > >> +u64 __user *bitmap)
> > > > > >> +{
> > > > > >> +  struct vfio_dma *dma;
> > > > > >> +  unsigned long pgshift = __ffs(pgsize);
> > > > > >> +  unsigned int npages, bitmap_size;
> > > > > >> +
> > > > > >> +  dma = vfio_find_dma(iommu, iova, 1);
> > > > > >> +
> > > > > >> +  if (!dma)
> > > > > >> +  return -EINVAL;
> > > > > >> +
> > > > > >> +  if (dma->iova != iova || dma->size != size)
> > > > > >> +  return -EINVAL;
> > > > > >> +
> > > > > >> +  npages = dma->size >> pgshift;
> > > > > >> +  bitmap_size = DIRTY_BITMAP_BYTES(npages);
> > > > > >> +
> > > > > >> +  /* mark all pages dirty if all pages are pinned and mapped. */
> > > > > >> +  if (dma->iommu_mapped)
> > > > > >> +  bitmap_set(dma->bitmap, 0, npages);
> > > > > >> +
> > > > > >> +  if (copy_to_user((void __user *)bitmap, dma->bitmap, bitmap_size))
> > > > > >> +  return -EFAULT;
> > > > > >
> > > > > > We still need to reset the bitmap here, clearing and re-adding 
> > > > > > the
> > > > > > pages that are still pinned.
> > > > > >
> > > > > > https://lore.kernel.org/kvm/20200319070635.2ff5d...@x1.home/
> > > > > >   
> > > > > 
> > > > >  I thought you agreed on my reply to it
> > > > >  https://lore.kernel.org/kvm/31621b70-02a9-2ea5-045f-f72b671fe...@nvidia.com/
> > > > >    
> > > > > > Why re-populate when there will be no change since
> > > > > > vfio_iova_dirty_bitmap() is called holding iommu->lock? If 
> > > > >  there is any
> > > > > > pin request while vfio_iova_dirty_bitmap() is still 
> > > > >  working, it will
> > > > > > wait till iommu->lock is released. Bitmap will be populated 
> > > > >  when page is
> > > > > > pinned.
> > > > > >>>
> > > > > >>> As coded, dirty bits are only ever set in the bitmap, never 
> > > > > >>> cleared.
> > > > > >>> If a page is unpinned between iterations of the user recording the
> > > > > >>> dirty bitmap, it should be marked dirty in the iteration 
> > > > > >>> immediately
> > > > > >>> after the unpinning and not marked dirty in the following 
> > > > > >>> iteration.
> > > > > >>> That doesn't happen here.  We're reporting cumulative dirty pages 
> > > > > >>> since
> > > > > >>> logging was enabled, we need to be reporting dirty pages since 
> > > > > >>> the user
> > > > > >>> last retrieved the dirty bitmap.  The bitmap should be cleared and
> > > > > >>> currently pinned pages re-added after copying to the user.  
> > > > > >>> Thanks,
> > > > > >>>
> > > > > >>
> > > > > >> Does that mean, we have to track every iteration? do we really 
> > > > > >> need that
> > > > > >> tracking?
> > > > > >>
> > > > > >> Generally the flow is:
> > > > > >> - vendor driver pin x pages
> > > > > >> - Enter pre-copy-phase where vCPUs are running - user starts dirty 
> > > > > >> pages
> > > > > >> tracking, then user asks dirty bitmap, x pages reported dirty by
> > > > > >> VFIO_IOMMU_DIRTY_PAGES ioctl with _GET flag
> > > > > >> - In pre-copy phase, vendor driver pins y more pages, now bitmap
> > > > > >> consists of x+y bits set
> > > > > >> - In pre-copy phase, vendor driver unpins z pages, but bitmap is 
> > > > > >> not
> > > > > >> updated, so again bitmap consists of x+y bits set.
> > > > > >> - Enter in stop-and-copy phase, vCPUs are stopped, mdev devices 
> > > > > >> are stopped
> > > > > >> - user asks dirty bitmap - Since here vCPU and mdev devices are 
> > > > > >> stopped,
> > > > > >> pages should not get dirty by guest driver or the physical device.
> > > > > >> Hence, x+y dirty pages would be reported.
> > > > > >>
> > > > > >> I 

[PATCH v16 Kernel 7/7] vfio: Selective dirty page tracking if IOMMU backed device pins pages

2020-03-24 Thread Kirti Wankhede
Added a check such that only singleton IOMMU groups can pin pages.
From the point when the vendor driver pins any pages, consider the IOMMU
group's dirty page scope to be limited to pinned pages.

To avoid walking the list often, added a pinned_page_dirty_scope flag to
indicate whether the dirty page scope of all the vfio_groups, for each
vfio_domain in the domain_list, is limited to pinned pages. This flag is
updated on the first pinned-pages request for an IOMMU group and on
attaching/detaching a group.
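The container-level flag described above is simply the conjunction of the per-group flags, recomputed only on the events listed rather than on every dirty-bitmap query. A hedged standalone sketch of that walk (simplified local types with an `_sk` suffix, not the kernel's structures, which hang groups off per-domain lists):

```c
#include <stdbool.h>
#include <stddef.h>

/* Illustrative stand-ins for the kernel structures (assumptions, not the
 * real vfio_iommu/vfio_group layout): each group carries its own
 * pinned_page_dirty_scope flag, and the container caches the AND of them. */
struct group_sk {
    bool pinned_page_dirty_scope;
    struct group_sk *next;
};

struct iommu_sk {
    bool pinned_page_dirty_scope;
    struct group_sk *groups;    /* stand-in for the per-domain group lists */
};

/* Recompute the cached container flag: false as soon as any group's dirty
 * page scope is not limited to pinned pages, true only if all groups agree. */
static void update_pinned_page_dirty_scope_sk(struct iommu_sk *iommu)
{
    struct group_sk *g;

    for (g = iommu->groups; g; g = g->next) {
        if (!g->pinned_page_dirty_scope) {
            iommu->pinned_page_dirty_scope = false;
            return;
        }
    }
    iommu->pinned_page_dirty_scope = true;
}
```

Calling this only on first-pin and attach/detach events keeps dirty-bitmap queries O(1) with respect to the number of groups, which is the optimization the flag provides.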

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio.c | 13 --
 drivers/vfio/vfio_iommu_type1.c | 94 +++--
 include/linux/vfio.h|  4 +-
 3 files changed, 104 insertions(+), 7 deletions(-)

diff --git a/drivers/vfio/vfio.c b/drivers/vfio/vfio.c
index 210fcf426643..311b5e4e111e 100644
--- a/drivers/vfio/vfio.c
+++ b/drivers/vfio/vfio.c
@@ -85,6 +85,7 @@ struct vfio_group {
atomic_topened;
wait_queue_head_t   container_q;
boolnoiommu;
+   unsigned intdev_counter;
struct kvm  *kvm;
struct blocking_notifier_head   notifier;
 };
@@ -555,6 +556,7 @@ struct vfio_device *vfio_group_create_device(struct vfio_group *group,
 
mutex_lock(&group->device_lock);
list_add(&device->group_next, &group->device_list);
+   group->dev_counter++;
mutex_unlock(&group->device_lock);
 
return device;
@@ -567,6 +569,7 @@ static void vfio_device_release(struct kref *kref)
struct vfio_group *group = device->group;
 
list_del(&device->group_next);
+   group->dev_counter--;
mutex_unlock(&group->device_lock);
 
dev_set_drvdata(device->dev, NULL);
@@ -1933,6 +1936,9 @@ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
if (!group)
return -ENODEV;
 
+   if (group->dev_counter > 1)
+   return -EINVAL;
+
ret = vfio_group_add_container_user(group);
if (ret)
goto err_pin_pages;
@@ -1940,7 +1946,8 @@ int vfio_pin_pages(struct device *dev, unsigned long *user_pfn, int npage,
container = group->container;
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
-   ret = driver->ops->pin_pages(container->iommu_data, user_pfn,
+   ret = driver->ops->pin_pages(container->iommu_data,
+group->iommu_group, user_pfn,
 npage, prot, phys_pfn);
else
ret = -ENOTTY;
@@ -2038,8 +2045,8 @@ int vfio_group_pin_pages(struct vfio_group *group,
driver = container->iommu_driver;
if (likely(driver && driver->ops->pin_pages))
ret = driver->ops->pin_pages(container->iommu_data,
-user_iova_pfn, npage,
-prot, phys_pfn);
+group->iommu_group, user_iova_pfn,
+npage, prot, phys_pfn);
else
ret = -ENOTTY;
 
diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 5b233ded7a9a..a6b1f6930e6a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -72,6 +72,7 @@ struct vfio_iommu {
boolv2;
boolnesting;
booldirty_page_tracking;
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_domain {
@@ -99,6 +100,7 @@ struct vfio_group {
struct iommu_group  *iommu_group;
struct list_headnext;
boolmdev_group; /* An mdev group */
+   boolpinned_page_dirty_scope;
 };
 
 struct vfio_iova {
@@ -143,6 +145,10 @@ struct vfio_regions {
 static int put_pfn(unsigned long pfn, int prot);
 static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
+static struct vfio_group *vfio_iommu_find_iommu_group(struct vfio_iommu *iommu,
+  struct iommu_group *iommu_group);
+
+static void update_pinned_page_dirty_scope(struct vfio_iommu *iommu);
 /*
  * This code handles mapping and unmapping of user data buffers
  * into DMA'ble space using the IOMMU
@@ -589,11 +595,13 @@ static int vfio_unpin_page_external(struct vfio_dma *dma, dma_addr_t iova,
 }
 
 static int vfio_iommu_type1_pin_pages(void *iommu_data,
+ struct iommu_group *iommu_group,
  unsigned long *user_pfn,
  int npage, int prot,
  unsigned long *phys_pfn)
 {
struct vfio_iommu *iommu = iommu_data;
+   struct vfio_group *group;
int i, j, ret;
  

[PATCH v16 Kernel 4/7] vfio iommu: Implementation of ioctl for dirty pages tracking.

2020-03-24 Thread Kirti Wankhede
VFIO_IOMMU_DIRTY_PAGES ioctl performs three operations:
- Start dirty pages tracking while migration is active.
- Stop dirty pages tracking.
- Get dirty pages bitmap. It is the user space application's responsibility
  to copy the content of dirty pages from source to destination during
  migration.

To prevent a DoS attack, memory for the bitmap is allocated per vfio_dma
structure. The bitmap size is calculated considering the smallest supported
page size. The bitmap is allocated for all vfio_dmas when dirty logging is
enabled.

The bitmap is populated for already pinned pages when it is allocated for
a vfio_dma, using the smallest supported page size. The bitmap is updated
from the pinning functions when tracking is enabled. When the user
application queries the bitmap, check whether the requested page size is
the same as the page size used to populate the bitmap: if equal, copy the
bitmap; if not, return an error.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 265 +++-
 1 file changed, 259 insertions(+), 6 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 70aeab921d0f..27ed069c5053 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -71,6 +71,7 @@ struct vfio_iommu {
unsigned intdma_avail;
boolv2;
boolnesting;
+   booldirty_page_tracking;
 };
 
 struct vfio_domain {
@@ -91,6 +92,7 @@ struct vfio_dma {
boollock_cap;   /* capable(CAP_IPC_LOCK) */
struct task_struct  *task;
struct rb_root  pfn_list;   /* Ex-user pinned pfn list */
+   unsigned long   *bitmap;
 };
 
 struct vfio_group {
@@ -125,7 +127,21 @@ struct vfio_regions {
 #define IS_IOMMU_CAP_DOMAIN_IN_CONTAINER(iommu)\
(!list_empty(&iommu->domain_list))
 
+#define DIRTY_BITMAP_BYTES(n)  (ALIGN(n, BITS_PER_TYPE(u64)) / BITS_PER_BYTE)
+
+/*
+ * Input argument of number of bits to bitmap_set() is unsigned integer, which
+ * further casts to signed integer for unaligned multi-bit operation,
+ * __bitmap_set().
+ * Then maximum bitmap size supported is 2^31 bits divided by 2^3 bits/byte,
+ * that is 2^28 (256 MB) which maps to 2^31 * 2^12 = 2^43 (8TB) on 4K page
+ * system.
+ */
+#define DIRTY_BITMAP_PAGES_MAX (uint64_t)(INT_MAX - 1)
+#define DIRTY_BITMAP_SIZE_MAX   DIRTY_BITMAP_BYTES(DIRTY_BITMAP_PAGES_MAX)
+
 static int put_pfn(unsigned long pfn, int prot);
+static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu);
 
 /*
  * This code handles mapping and unmapping of user data buffers
@@ -175,6 +191,77 @@ static void vfio_unlink_dma(struct vfio_iommu *iommu, struct vfio_dma *old)
	rb_erase(&old->node, &iommu->dma_list);
 }
 
+
+static int vfio_dma_bitmap_alloc(struct vfio_dma *dma, uint64_t pgsize)
+{
+   uint64_t npages = dma->size / pgsize;
+
+   if (npages > DIRTY_BITMAP_PAGES_MAX)
+   return -EINVAL;
+
+   dma->bitmap = kvzalloc(DIRTY_BITMAP_BYTES(npages), GFP_KERNEL);
+   if (!dma->bitmap)
+   return -ENOMEM;
+
+   return 0;
+}
+
+static void vfio_dma_bitmap_free(struct vfio_dma *dma)
+{
+   kfree(dma->bitmap);
+   dma->bitmap = NULL;
+}
+
+static void vfio_dma_populate_bitmap(struct vfio_dma *dma, uint64_t pgsize)
+{
+   struct rb_node *p;
+
+   if (RB_EMPTY_ROOT(&dma->pfn_list))
+   return;
+
+   for (p = rb_first(&dma->pfn_list); p; p = rb_next(p)) {
+   struct vfio_pfn *vpfn = rb_entry(p, struct vfio_pfn, node);
+
+   bitmap_set(dma->bitmap, (vpfn->iova - dma->iova) / pgsize, 1);
+   }
+}
+
+static int vfio_dma_bitmap_alloc_all(struct vfio_iommu *iommu, uint64_t pgsize)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+   int ret;
+
+   ret = vfio_dma_bitmap_alloc(dma, pgsize);
+   if (ret) {
+   struct rb_node *p = rb_prev(n);
+
+   for (; p; p = rb_prev(p)) {
+   struct vfio_dma *dma = rb_entry(n,
+   struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+   return ret;
+   }
+   vfio_dma_populate_bitmap(dma, pgsize);
+   }
+   return 0;
+}
+
+static void vfio_dma_bitmap_free_all(struct vfio_iommu *iommu)
+{
+   struct rb_node *n = rb_first(&iommu->dma_list);
+
+   for (; n; n = rb_next(n)) {
+   struct vfio_dma *dma = rb_entry(n, struct vfio_dma, node);
+
+   vfio_dma_bitmap_free(dma);
+   }
+}
+
 /*
  * Helper Functions for host iova-pfn list
  */
@@ -567,6 +654,18 @@ static int vfio_iommu_type1_pin_pages(void *iommu_data,
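The sizing comment in the patch above (2^31 bits of bitmap → 2^28 bytes, describing 2^43 bytes of guest memory at 4K pages) can be checked with a standalone sketch of the same arithmetic. All `_SK` names below are local stand-ins for the kernel macros, not the kernel definitions themselves:

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

/* Local stand-ins for the kernel's BITS_PER_BYTE / BITS_PER_TYPE(u64) /
 * ALIGN macros (illustrative assumptions, not kernel code). */
#define BITS_PER_BYTE_SK 8
#define BITS_PER_U64_SK  64
#define ALIGN_SK(x, a)   ((((x) + ((a) - 1)) / (a)) * (a))

/* DIRTY_BITMAP_BYTES(n): one bit per page, rounded up to a whole u64 word,
 * converted to bytes. */
static uint64_t dirty_bitmap_bytes(uint64_t npages)
{
    return ALIGN_SK(npages, (uint64_t)BITS_PER_U64_SK) / BITS_PER_BYTE_SK;
}

/* bitmap_set() takes an unsigned int bit count that is later treated as a
 * signed int in __bitmap_set(), hence the INT_MAX - 1 page cap. */
static const uint64_t dirty_bitmap_pages_max = (uint64_t)(INT_MAX - 1);
```

At the cap, `dirty_bitmap_bytes(1ULL << 31)` is 2^28 bytes (256 MB), and 2^31 pages of 4K each is 2^43 bytes (8 TB), matching the comment in the patch.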
 

[PATCH v16 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking.

2020-03-24 Thread Kirti Wankhede
The IOMMU container maintains a list of all pages pinned by the vfio_pin_pages API.
All pages pinned by a vendor driver through this API should be considered
dirty during migration. When the container consists of an IOMMU capable device
and all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirty pages tracking and to get a bitmap of all
dirtied pages for a requested IO virtual address range.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 56 +++
 1 file changed, 56 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 8641e022c3b0..0018721fb744 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -996,6 +996,12 @@ struct vfio_iommu_type1_dma_map {
 
 #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
 
+struct vfio_bitmap {
+   __u64        pgsize;    /* page size for bitmap in bytes */
+   __u64        size;      /* in bytes */
+   __u64 __user *data;     /* one bit per page */
+};
+
 /**
  * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
  * struct vfio_dma_unmap)
@@ -1022,6 +1028,56 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking.
+ * Caller should set flag depending on which operation to perform, details as
+ * below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are dirtied or
+ * potentially dirtied by device.
+ * Dirty pages are tracked until tracking is stopped by user application by
+ * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking dirtied pages.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must provide data[] as the structure
+ * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range and
+ * pgsize. IOVA range must match to that used in original mapping call. This
+ * interface supports to get bitmap of smallest supported pgsize only and can
+ * be modified in future to get bitmap of specified pgsize.
+ * User must allocate memory for bitmap and set size of allocated memory in
+ * bitmap.size field. One bit is used to represent one page consecutively
+ * starting from iova offset. User should provide page size in bitmap.pgsize
+ * field. Bit set in bitmap indicates page at that offset from iova is
+ * dirty. Caller must set argsz including size of structure
+ * vfio_iommu_type1_dirty_bitmap_get.
+ *
+ * Only one of the flags _START, _STOP and _GET may be specified at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32argsz;
+   __u32flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START  (1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP   (1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP (1 << 2)
+   __u8 data[];
+};
+
+struct vfio_iommu_type1_dirty_bitmap_get {
+   __u64  iova;/* IO virtual address */
+   __u64  size;/* Size of iova range */
+   struct vfio_bitmap bitmap;
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0
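The GET_BITMAP usage described in the comment above can be sketched from userspace. This is a hedged sketch: the structures are local copies mirroring the uAPI layout in the patch (normally they come from <linux/vfio.h>), the `_sk` names and the helper are invented for illustration, and the actual ioctl() on the container fd is left as a comment since it needs a live VFIO setup:

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirrors of the uAPI structures from the patch (assumed layout,
 * for illustration only). */
struct vfio_bitmap_sk {
    uint64_t pgsize;            /* page size for bitmap, in bytes */
    uint64_t size;              /* bitmap size, in bytes */
    uint64_t *data;             /* one bit per page */
};

struct vfio_dirty_get_sk {
    uint32_t argsz;
    uint32_t flags;
    uint64_t iova;              /* must match the original mapping */
    uint64_t size;              /* must match the original mapping */
    struct vfio_bitmap_sk bitmap;
};

#define FLAG_GET_BITMAP_SK (1u << 2)   /* VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP */

/* Prepare a GET_BITMAP request for one mapping: one bit per page, the bit
 * count rounded up to whole u64 words, as the kernel side expects. */
static struct vfio_dirty_get_sk prepare_get_bitmap(uint64_t iova, uint64_t size,
                                                   uint64_t pgsize)
{
    struct vfio_dirty_get_sk req;
    uint64_t npages = size / pgsize;
    uint64_t bytes = (npages + 63) / 64 * 8;

    memset(&req, 0, sizeof(req));
    req.argsz = sizeof(req);
    req.flags = FLAG_GET_BITMAP_SK;
    req.iova = iova;
    req.size = size;
    req.bitmap.pgsize = pgsize;
    req.bitmap.size = bytes;
    req.bitmap.data = calloc(1, bytes);
    /* ioctl(container_fd, VFIO_IOMMU_DIRTY_PAGES, &req) would go here. */
    return req;
}
```

Note that per the comment, the iova/size must match an existing mapping exactly and pgsize must be the smallest supported page size, or the kernel returns -EINVAL.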




[PATCH v16 Kernel 5/7] vfio iommu: Update UNMAP_DMA ioctl to get dirty bitmap before unmap

2020-03-24 Thread Kirti Wankhede
DMA mapped pages, including those pinned by mdev vendor drivers, might
get unpinned and unmapped while migration is active and device is still
running. For example, in pre-copy phase while guest driver could access
those pages, host device or vendor driver can dirty these mapped pages.
Such pages should be marked dirty so as to maintain memory consistency
for a user making use of dirty page tracking.

To get bitmap during unmap, user should allocate memory for bitmap, set
size of allocated memory, set page size to be considered for bitmap and
set flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 54 ++---
 include/uapi/linux/vfio.h   | 10 
 2 files changed, 60 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 27ed069c5053..b98a8d79e13a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -982,7 +982,8 @@ static int verify_bitmap_size(uint64_t npages, uint64_t bitmap_size)
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+struct vfio_bitmap *bitmap)
 {
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -1033,6 +1034,10 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 * will be returned if these conditions are not met.  The v2 interface
 * will only return success and a size of zero if there were no
 * mappings within the range.
+*
+* When VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP flag is set, unmap request
+* must be for single mapping. Multiple mappings with this flag set is
+* not supported.
 */
if (iommu->v2) {
dma = vfio_find_dma(iommu, unmap->iova, 1);
@@ -1040,6 +1045,13 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
ret = -EINVAL;
goto unlock;
}
+
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+   (dma->iova != unmap->iova || dma->size != unmap->size)) {
+   ret = -EINVAL;
+   goto unlock;
+   }
+
dma = vfio_find_dma(iommu, unmap->iova + unmap->size - 1, 0);
if (dma && dma->iova + dma->size != unmap->iova + unmap->size) {
ret = -EINVAL;
@@ -1057,6 +1069,11 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
if (dma->task->mm != current->mm)
break;
 
+   if ((unmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) &&
+iommu->dirty_page_tracking)
+   vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+   bitmap->pgsize, bitmap->data);
+
if (!RB_EMPTY_ROOT(&dma->pfn_list)) {
struct vfio_iommu_type1_dma_unmap nb_unmap;
 
@@ -2418,17 +2435,46 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
} else if (cmd == VFIO_IOMMU_UNMAP_DMA) {
struct vfio_iommu_type1_dma_unmap unmap;
-   long ret;
+   struct vfio_bitmap bitmap = { 0 };
+   int ret;
 
minsz = offsetofend(struct vfio_iommu_type1_dma_unmap, size);
 
if (copy_from_user(&unmap, (void __user *)arg, minsz))
return -EFAULT;
 
-   if (unmap.argsz < minsz || unmap.flags)
+   if (unmap.argsz < minsz ||
+   unmap.flags & ~VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)
return -EINVAL;
 
-   ret = vfio_dma_do_unmap(iommu, &unmap);
+   if (unmap.flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP) {
+   unsigned long pgshift;
+   uint64_t iommu_pgsize =
+1 << __ffs(vfio_pgsize_bitmap(iommu));
+
+   if (unmap.argsz < (minsz + sizeof(bitmap)))
+   return -EINVAL;
+
+   if (copy_from_user(&bitmap,
+  (void __user *)(arg + minsz),
+  sizeof(bitmap)))
+   return -EFAULT;
+
+   /* allow only min supported pgsize */
+   if (bitmap.pgsize != iommu_pgsize)
+   return -EINVAL;
+   if (!access_ok((void __user *)bitmap.data, bitmap.size))
+   return -EINVAL;
+
+   pgshift = __ffs(bitmap.pgsize);
+   ret = verify_bitmap_size(unmap.size >> pgshift,
+  
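The validation steps in the ioctl path above (argsz check, minimum page size check, bitmap size check) can be condensed into a hedged standalone sketch. The function name and error constant are local stand-ins, not the kernel's helpers:

```c
#include <stdint.h>

#define EINVAL_SK 22    /* local stand-in for -EINVAL */

/* Sketch of the checks the UNMAP_DMA path performs when
 * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set: the user's bitmap must use
 * the smallest supported IOMMU page size, and must be large enough to hold
 * one bit per page of the unmap range, rounded up to whole u64 words. */
static int check_unmap_bitmap(uint64_t unmap_size, uint64_t user_pgsize,
                              uint64_t user_bitmap_bytes,
                              uint64_t iommu_min_pgsize)
{
    uint64_t npages, need;

    /* Only the minimum supported page size is accepted for now. */
    if (user_pgsize != iommu_min_pgsize)
        return -EINVAL_SK;

    npages = unmap_size / user_pgsize;
    need = (npages + 63) / 64 * 8;      /* bits -> u64-aligned bytes */
    if (user_bitmap_bytes < need)
        return -EINVAL_SK;

    return 0;
}
```

For a 1 MB unmap at 4K pages this requires a 32-byte bitmap (256 bits rounded to four u64 words).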

[PATCH v16 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages

2020-03-24 Thread Kirti Wankhede
vfio_pfn.ref_count is always updated while holding iommu->lock, so using an
atomic variable is overkill.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Eric Auger 
---
 drivers/vfio/vfio_iommu_type1.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9fdfae1cb17a..70aeab921d0f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -112,7 +112,7 @@ struct vfio_pfn {
struct rb_node  node;
dma_addr_t  iova;   /* Device address */
unsigned long   pfn;/* Host pfn */
-   atomic_tref_count;
+   unsigned intref_count;
 };
 
 struct vfio_regions {
@@ -233,7 +233,7 @@ static int vfio_add_to_pfn_list(struct vfio_dma *dma, dma_addr_t iova,
 
vpfn->iova = iova;
vpfn->pfn = pfn;
-   atomic_set(&vpfn->ref_count, 1);
+   vpfn->ref_count = 1;
vfio_link_pfn(dma, vpfn);
return 0;
 }
@@ -251,7 +251,7 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct vfio_dma *dma,
struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
 
if (vpfn)
-   atomic_inc(&vpfn->ref_count);
+   vpfn->ref_count++;
return vpfn;
 }
 
@@ -259,7 +259,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)
 {
int ret = 0;
 
-   if (atomic_dec_and_test(&vpfn->ref_count)) {
+   vpfn->ref_count--;
+   if (!vpfn->ref_count) {
ret = put_pfn(vpfn->pfn, dma->prot);
vfio_remove_from_pfn_list(dma, vpfn);
}
-- 
2.7.0




[PATCH v16 Kernel 6/7] vfio iommu: Adds flag to indicate dirty pages tracking capability support

2020-03-24 Thread Kirti Wankhede
The flag VFIO_IOMMU_INFO_DIRTY_PGS in VFIO_IOMMU_GET_INFO indicates that the
driver supports dirty pages tracking.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 3 ++-
 include/uapi/linux/vfio.h   | 5 +++--
 2 files changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index b98a8d79e13a..5b233ded7a9a 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -2388,7 +2388,8 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
info.cap_offset = 0; /* output, no-recopy necessary */
}
 
-   info.flags = VFIO_IOMMU_INFO_PGSIZES;
+   info.flags = VFIO_IOMMU_INFO_PGSIZES |
+VFIO_IOMMU_INFO_DIRTY_PGS;
 
info.iova_pgsizes = vfio_pgsize_bitmap(iommu);
 
diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 7c888041136f..39bd734a5064 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -948,8 +948,9 @@ struct vfio_device_ioeventfd {
 struct vfio_iommu_type1_info {
__u32   argsz;
__u32   flags;
-#define VFIO_IOMMU_INFO_PGSIZES (1 << 0)   /* supported page sizes info */
-#define VFIO_IOMMU_INFO_CAPS   (1 << 1)/* Info supports caps */
+#define VFIO_IOMMU_INFO_PGSIZES   (1 << 0) /* supported page sizes info */
+#define VFIO_IOMMU_INFO_CAPS  (1 << 1) /* Info supports caps */
+#define VFIO_IOMMU_INFO_DIRTY_PGS (1 << 2) /* supports dirty page tracking */
__u64   iova_pgsizes;   /* Bitmap of supported page sizes */
__u32   cap_offset; /* Offset within info struct of first cap */
 };
-- 
2.7.0




[PATCH v16 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-24 Thread Kirti Wankhede
- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
  0th offset of migration region to get/set VFIO device related
  information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 228 ++
 1 file changed, 228 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..8641e022c3b0 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,233 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *device state. After successful transition to a given state, the
+ *vendor driver should return success on write(device_state, state)
+ *system call. If the device state transition fails, the vendor driver
+ *should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *   that is, if write(device_state, state) returns an error, read
+ *   device_state again to determine the current state of the device from
+ *   the vendor driver.
+ *  - The vendor driver should return previous state of the device unless
+ *the vendor driver has encountered an internal error, in which case
+ *the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *device from VFIO_DEVICE_STATE_ERROR state. If the device is
+ *indicated to be in a valid device state by reading device_state, the
+ *user application may attempt to transition the device to any valid
+ *state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *it indicates the _STOP state. When the device state is changed to
+ *_STOP, driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *driver should start gathering device state information that will be
+ *provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *the driver should prepare to resume the device. Data provided through
+ *the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +--- _RESUMING
+ *  |+-- _SAVING
+ *  ||+- _RUNNING
+ *  |||
+ *  000b => Device Stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *  _RESUMING  _RUNNINGPre-copyStop-and-copy   _STOP
+ *(100b) (001b) (011b)(010b)   (000b)
+ * 0. Running or default state
+ * |
+ *
+ * 1. Normal Shutdown (optional)
+ * |->|
+ *
+ * 2. Save the state or suspend
+ * |->|-->|
+ *
+ * 3. 

[PATCH v16 Kernel 0/7] KABIs to support migration for VFIO devices

2020-03-24 Thread Kirti Wankhede
Hi,

This patch set adds:
* New IOCTL VFIO_IOMMU_DIRTY_PAGES to get dirty pages bitmap with
  respect to the IOMMU container rather than per device. All pages pinned by
  a vendor driver through the vfio_pin_pages external API have to be marked
  as dirty during migration. When an IOMMU capable device is present in the
  container and all pages are pinned and mapped, then all pages are marked
  dirty.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by vendor driver can also be written by
  device. As of now there is no device which has hardware support for
  dirty page tracking. So all pages which are pinned should be considered
  as dirty.
  This ioctl is also used to start/stop dirty pages tracking for pinned and
  unpinned pages while migration is active.

* Updated IOCTL VFIO_IOMMU_UNMAP_DMA to get dirty pages bitmap before
  unmapping an IO virtual address range.
  With vIOMMU, during the pre-copy phase of migration, while CPUs are still
  running, an IO virtual address unmap can happen while the device still
  keeps references to guest pfns. Those pages should be reported as dirty
  before unmap, so that the VFIO user space application can copy the
  content of those pages from source to destination.

* Patch 7 detects whether an IOMMU capable device driver is smart enough
  to report pages to be marked dirty by pinning pages using the
  vfio_pin_pages() API.


Yet TODO:
Since there is no device with hardware support for system memory dirty
bitmap tracking, right now there is no other API from the vendor driver
to the VFIO IOMMU module to report dirty pages. In future, when such
hardware support is implemented, an API will be required so that the
vendor driver can report dirty pages to the VFIO module during migration
phases.

Adding revision history from the previous QEMU patch set to understand the
KABI changes done so far

v15 -> v16
- Minor edits and nit picks (Auger Eric)
- On copying bitmap to user, re-populated bitmap only for pinned pages,
  excluding unmapped pages and CPU dirtied pages.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v14 -> v15
- Minor edits and nit picks.
- In the verification of user allocated bitmap memory, added check of
  maximum size.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v13 -> v14
- Added struct vfio_bitmap to kabi. updated structure
  vfio_iommu_type1_dirty_bitmap_get and vfio_iommu_type1_dma_unmap.
- All small changes suggested by Alex.
- Patches are on tag: next-20200318 and 1-3 patches from Yan's series
  https://lkml.org/lkml/2020/3/12/1255

v12 -> v13
- Changed bitmap allocation in vfio_iommu_type1 to per vfio_dma
- Changed VFIO_IOMMU_DIRTY_PAGES ioctl behaviour to be per vfio_dma range.
- Changed vfio_iommu_type1_dirty_bitmap structure to have separate data
  field.

v11 -> v12
- Changed bitmap allocation in vfio_iommu_type1.
- Remove atomicity of ref_count.
- Updated comments for migration device state structure about error
  reporting.
- Nit picks from v11 reviews

v10 -> v11
- Fix pin pages API to free vpfn if it is marked as unpinned tracking page.
- Added proposal to detect if IOMMU capable device calls external pin pages
  API to mark pages dirty.
- Nit picks from v10 reviews

v9 -> v10:
- Updated existing VFIO_IOMMU_UNMAP_DMA ioctl to get dirty pages bitmap
  during unmap while migration is active
- Added flag in VFIO_IOMMU_GET_INFO to indicate the driver supports dirty
  page tracking.
- If iommu_mapped, mark all pages dirty.
- Added unpinned pages tracking while migration is active.
- Updated comments for migration device state structure with bit
  combination table and state transition details.

v8 -> v9:
- Split patch set in 2 sets, Kernel and QEMU.
- Dirty pages bitmap is queried from the IOMMU container rather than
  per-device from the vendor driver. Added 2 ioctls to achieve this.

v7 -> v8:
- Updated comments for KABI
- Added BAR address validation check during PCI device's config space load
  as suggested by Dr. David Alan Gilbert.
- Changed vfio_migration_set_state() to set or clear device state flags.
- Some nit fixes.

v6 -> v7:
- Fix build failures.

v5 -> v6:
- Fix build failure.

v4 -> v5:
- Added descriptive comment about the sequence of access of members of
  structure vfio_device_migration_info to be followed, based on Alex's
  suggestion
- Updated get dirty pages sequence.
- As per Cornelia Huck's suggestion, added callbacks to VFIODeviceOps to
  get_object, save_config and load_config.
- Fixed multiple nit picks.
- Tested live migration with multiple vfio device assigned to a VM.

v3 -> v4:
- Added one more bit for _RESUMING flag to be set explicitly.
- data_offset field is read-only for user space application.
- data_size is read on every iteration before reading data from the
  migration region; this removes the assumption that data extends to the
  end of the migration region.
- If vendor driver supports mappable sparsed region, 

Re: [PATCH 6/6] qga/commands-posix: fix use after free of local_err

2020-03-24 Thread Eric Blake

On 3/24/20 10:36 AM, Vladimir Sementsov-Ogievskiy wrote:

local_err is used several times in guest_suspend(). Setting non-NULL
local_err will crash, so let's zero it after freeing. Also fix possible
leak of local_err in final if().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  qga/commands-posix.c | 3 +++
  1 file changed, 3 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 93474ff770..cc69b82704 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -1773,6 +1773,7 @@ static void guest_suspend(SuspendMode mode, Error **errp)
  }
  
  error_free(local_err);

+local_err = NULL;


Let's show this with more context.


static void guest_suspend(SuspendMode mode, Error **errp)
{
Error *local_err = NULL;
bool mode_supported = false;

if (systemd_supports_mode(mode, &local_err)) {


Hmm - we have an even earlier bug that needs fixing.  Note that 
systemd_supports_mode() returns a bool AND conditionally sets errp.  But 
it is inconsistent: it has the following table of actions based on the 
results of run_process_child() on "systemctl status" coupled with the 
man page on "systemctl status" return values:

-1 (unable to run systemctl) -> errp set, return false
0 (unit is active) -> errp left unchanged, return false
1 (unit not failed) -> errp left unchanged, return true
2 (unused) -> errp left unchanged, return true
3 (unit not active) -> errp left unchanged, return true
4 (no such unit) -> errp left unchanged, return false
5+ (unexpected from systemctl) -> errp left unchanged, return false

But the comments in systemd_supports_mode() claim that ANY status < 4 
(other than -1, which means we did not run systemctl) should count as 
the service existing, even though the most common status is 3.  If our 
comment is to be believed, then we should return true, not false, for 
status 0.


Now, back to _this_ function:


mode_supported = true;
systemd_suspend(mode, &local_err);


Okay - if we get here (whether from status 1-3, or with 
systemd_supports_mode fixed to support status 0-3), local_err is still 
unset prior to calling systemd_suspend(), and we are guaranteed that 
after the call, either we suspended successfully or local_err is now set.



}

if (!local_err) {
return;
}


So if returned, we succeeded at systemd_suspend, and there is nothing 
further to do; but if we get past that point, we don't know if it was 
systemd_supports_mode that failed or systemd_suspend that failed, and we 
don't know if local_err is set.




error_free(local_err);
+local_err = NULL;


Yet, we blindly throw away local_err, without trying to report it.  If 
that's the case, then WHY are we passing in local_err?  Wouldn't it be 
better to pass in NULL (we really don't care about the error message), 
and/or fix systemd_suspend() to return a bool just like 
systemd_supports_mode, and/or fix systemd_supports_mode to guarantee 
that it sets errp when returning false?


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [PATCH v1] mips/mips_malta: Allow more than 2G RAM

2020-03-24 Thread Aleksandar Markovic
18:38 Mon, 23.03.2020, Aurelien Jarno wrote:
>
> Hi,
>
> Sorry for the delay, I just want to give some more details about the
> Debian.
>
> On 2020-03-14 10:09, Philippe Mathieu-Daudé wrote:
> > IIUC today all distributions supporting MIPS ports are building their
MIPS
> > packages on QEMU instances because it is faster than the native MIPS
> > hardware they have.
>
> Actually Debian requires that packages are built on real hardware. We
> have a mix of Loongson 3 and Octeon 3 based build daemons. They all have
> 8GiB of RAM.
>
> > Since one (or two?) years, some binaries (Linux kernel? QEMU?) are
failing
> > to link because the amount of guest memory is restricted to 2GB
(probably
> > advance of linker techniques, now linkers use more memory).
>
> The problem happens with big packages (e.g. ceph which is a dependency
> of QEMU). The problem is not the physical memory issue, but the virtual
> address space, which is limited to 2GB for 32-bit processes. That's why
> we do not have the issue for the 64-bit ports.
>
> > YunQiang, is this why you suggested this change?
> >
> > See:
> > -
https://www.mail-archive.com/debian-mips@lists.debian.org/msg10912.html
> > -
https://alioth-lists.debian.net/pipermail/pkg-rust-maintainers/2019-January/004844.html
> >
> > I believe most of the QEMU Malta board users don't care it is a Malta
board,
> > they only care it is a fast emulated MIPS machine.
> > Unfortunately it is the default board.
> >
> > However 32-bit MIPS port is being dropped on Debian:
> > https://lists.debian.org/debian-mips/2019/07/msg00010.html
>
> The 32-bit big endian port has been dropped after the Buster (10)
> release and won't be available for the Bullseye release (11). The
> 32-bit little endian port is still available, but it's difficult to keep
> it alive given the 2GB limit.
>
> > Maybe we can sync with the Malta users, ask them to switch to the Boston
> > machines to build 64-bit packages, then later reduce the Malta board to
1GB.
> > (The Boston board is more recent, but was not available at the time
users
> > started to use QEMU to build 64-bit packages).
> >
> > Might it be easier starting introducing a malta-5.0 machine restricted
to
> > 1GB?
>
> In any case having an easy way to simulate machines with more than 2GB
> of RAM in QEMU would be great.
>

In my company, we do have both Octeon (don't know at this moment what
version) and Boston systems.

Boston seems to me a very good candidate for enabling RAM > 2GB. I never
saw it physically, since it is assigned to a different department, but just
anecdotally I heard that it is designed as a desktop (or even server)
machine, and, therefore, it almost certainly supports > 2GB.

Given the current circumstances of remote work for most of us, and limited
movement, it may be somewhat difficult for me to access it, but it is not
impossible.

Please take everything I said in this email with a grain of salt, since it
is based more on hallway chats than on facts.

I'll try to get more info, hopefully soon.

Yours,
Aleksandar


> Cheers,
> Aurelien
>
> --
> Aurelien Jarno  GPG: 4096R/1DDD8C9B
> aurel...@aurel32.net http://www.aurel32.net
>


[PULL 2/2] hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

2020-03-24 Thread John Snow
From: Peter Maydell 

Coverity points out (CID 1421984) that we are leaking the
memory returned by qemu_allocate_irqs(). We can avoid this
leak by switching to using qdev_init_gpio_in(); the base
class finalize will free the irqs that this allocates under
the hood.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: John Snow 
Tested-by: BALATON Zoltan 
Message-id: 20200323151715.29454-1-peter.mayd...@linaro.org
[Maintainer edit: replace `DEVICE(dev)` by `ds` --js]
Signed-off-by: John Snow 
---
 hw/ide/sii3112.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/hw/ide/sii3112.c b/hw/ide/sii3112.c
index 06605d7af2..d69079c3d9 100644
--- a/hw/ide/sii3112.c
+++ b/hw/ide/sii3112.c
@@ -251,8 +251,8 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
 {
 SiI3112PCIState *d = SII3112_PCI(dev);
 PCIIDEState *s = PCI_IDE(dev);
+DeviceState *ds = DEVICE(dev);
 MemoryRegion *mr;
-qemu_irq *irq;
 int i;
 
 pci_config_set_interrupt_pin(dev->config, 1);
@@ -280,10 +280,10 @@ static void sii3112_pci_realize(PCIDevice *dev, Error 
**errp)
memory_region_init_alias(mr, OBJECT(d), "sii3112.bar4", &d->mmio, 0, 16);
 pci_register_bar(dev, 4, PCI_BASE_ADDRESS_SPACE_IO, mr);
 
-irq = qemu_allocate_irqs(sii3112_set_irq, d, 2);
+qdev_init_gpio_in(ds, sii3112_set_irq, 2);
 for (i = 0; i < 2; i++) {
-ide_bus_new(&s->bus[i], sizeof(s->bus[i]), DEVICE(dev), i, 1);
-ide_init2(&s->bus[i], irq[i]);
+ide_bus_new(&s->bus[i], sizeof(s->bus[i]), ds, i, 1);
+ide_init2(&s->bus[i], qdev_get_gpio_in(ds, i));
 
bmdma_init(&s->bus[i], &s->bmdma[i], s);
s->bmdma[i].bus = &s->bus[i];
-- 
2.21.1




[PULL 1/2] fdc/i8257: implement verify transfer mode

2020-03-24 Thread John Snow
From: Sven Schnelle 

While working on the Tulip driver I tried to write some Teledisk images to
a floppy image, which didn't work. It turned out that Teledisk checks the written
data by issuing a READ command to the FDC but running the DMA controller
in VERIFY mode. As we ignored the DMA request in that case, the DMA transfer
never finished, and Teledisk reported an error.

The i8257 spec says about verify transfers:

3) DMA verify, which does not actually involve the transfer of data. When an
8257 channel is in the DMA verify mode, it will respond the same as described
for transfer operations, except that no memory or I/O read/write control signals
will be generated.

Hervé proposed to remove all the dma_mode_ok stuff from fdc to have a clearer
boundary between DMA and FDC, so this patch also does that.

Suggested-by: Hervé Poussineau 
Signed-off-by: Sven Schnelle 
Reviewed-by: Hervé Poussineau 
Signed-off-by: John Snow 
---
 include/hw/isa/isa.h |  1 -
 hw/block/fdc.c   | 61 +---
 hw/dma/i8257.c   | 20 ++-
 3 files changed, 31 insertions(+), 51 deletions(-)

diff --git a/include/hw/isa/isa.h b/include/hw/isa/isa.h
index e9ac1f1205..59a4d4b50a 100644
--- a/include/hw/isa/isa.h
+++ b/include/hw/isa/isa.h
@@ -56,7 +56,6 @@ typedef int (*IsaDmaTransferHandler)(void *opaque, int nchan, 
int pos,
 typedef struct IsaDmaClass {
 InterfaceClass parent;
 
-IsaDmaTransferMode (*get_transfer_mode)(IsaDma *obj, int nchan);
 bool (*has_autoinitialization)(IsaDma *obj, int nchan);
 int (*read_memory)(IsaDma *obj, int nchan, void *buf, int pos, int len);
 int (*write_memory)(IsaDma *obj, int nchan, void *buf, int pos, int len);
diff --git a/hw/block/fdc.c b/hw/block/fdc.c
index 22e954e0dc..33bc9e2f92 100644
--- a/hw/block/fdc.c
+++ b/hw/block/fdc.c
@@ -1714,53 +1714,28 @@ static void fdctrl_start_transfer(FDCtrl *fdctrl, int 
direction)
 }
 fdctrl->eot = fdctrl->fifo[6];
 if (fdctrl->dor & FD_DOR_DMAEN) {
-IsaDmaTransferMode dma_mode;
+/* DMA transfer is enabled. */
 IsaDmaClass *k = ISADMA_GET_CLASS(fdctrl->dma);
-bool dma_mode_ok;
-/* DMA transfer are enabled. Check if DMA channel is well programmed */
-dma_mode = k->get_transfer_mode(fdctrl->dma, fdctrl->dma_chann);
-FLOPPY_DPRINTF("dma_mode=%d direction=%d (%d - %d)\n",
-   dma_mode, direction,
-   (128 << fdctrl->fifo[5]) *
+
+FLOPPY_DPRINTF("direction=%d (%d - %d)\n",
+   direction, (128 << fdctrl->fifo[5]) *
(cur_drv->last_sect - ks + 1), fdctrl->data_len);
-switch (direction) {
-case FD_DIR_SCANE:
-case FD_DIR_SCANL:
-case FD_DIR_SCANH:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_VERIFY);
-break;
-case FD_DIR_WRITE:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_WRITE);
-break;
-case FD_DIR_READ:
-dma_mode_ok = (dma_mode == ISADMA_TRANSFER_READ);
-break;
-case FD_DIR_VERIFY:
-dma_mode_ok = true;
-break;
-default:
-dma_mode_ok = false;
-break;
-}
-if (dma_mode_ok) {
-/* No access is allowed until DMA transfer has completed */
-fdctrl->msr &= ~FD_MSR_RQM;
-if (direction != FD_DIR_VERIFY) {
-/* Now, we just have to wait for the DMA controller to
- * recall us...
- */
-k->hold_DREQ(fdctrl->dma, fdctrl->dma_chann);
-k->schedule(fdctrl->dma);
-} else {
-/* Start transfer */
-fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
-fdctrl->data_len);
-}
-return;
+
+/* No access is allowed until DMA transfer has completed */
+fdctrl->msr &= ~FD_MSR_RQM;
+if (direction != FD_DIR_VERIFY) {
+/*
+ * Now, we just have to wait for the DMA controller to
+ * recall us...
+ */
+k->hold_DREQ(fdctrl->dma, fdctrl->dma_chann);
+k->schedule(fdctrl->dma);
 } else {
-FLOPPY_DPRINTF("bad dma_mode=%d direction=%d\n", dma_mode,
-   direction);
+/* Start transfer */
+fdctrl_transfer_handler(fdctrl, fdctrl->dma_chann, 0,
+fdctrl->data_len);
 }
+return;
 }
 FLOPPY_DPRINTF("start non-DMA transfer\n");
 fdctrl->msr |= FD_MSR_NONDMA | FD_MSR_RQM;
diff --git a/hw/dma/i8257.c b/hw/dma/i8257.c
index ef15c06d77..1b3435ab58 100644
--- a/hw/dma/i8257.c
+++ b/hw/dma/i8257.c
@@ -292,12 +292,6 @@ static uint64_t i8257_read_cont(void *opaque, hwaddr 
nport, unsigned size)
 return val;
 }
 
-static IsaDmaTransferMode 

[PULL 0/2] Ide patches

2020-03-24 Thread John Snow
The following changes since commit 736cf607e40674776d752acc201f565723e86045:

  Update version for v5.0.0-rc0 release (2020-03-24 17:50:00 +)

are available in the Git repository at:

  https://github.com/jnsnow/qemu.git tags/ide-pull-request

for you to fetch changes up to 51058b3b3bcbe62506cf191fca1c0d679bb80f2b:

  hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs() (2020-03-24 
15:52:16 -0400)


Pull request: IDE

Admittedly the first one is not a crisis fix; but I think it's low-risk to
include for rc1.

The second one is yours, and will shush coverity.



Peter Maydell (1):
  hw/ide/sii3112: Use qdev gpio rather than qemu_allocate_irqs()

Sven Schnelle (1):
  fdc/i8257: implement verify transfer mode

 include/hw/isa/isa.h |  1 -
 hw/block/fdc.c   | 61 +---
 hw/dma/i8257.c   | 20 ++-
 hw/ide/sii3112.c |  8 +++---
 4 files changed, 35 insertions(+), 55 deletions(-)

-- 
2.21.1




Re: [PATCH v15 Kernel 3/7] vfio iommu: Add ioctl definition for dirty pages tracking.

2020-03-24 Thread Kirti Wankhede




On 3/24/2020 2:41 AM, Auger Eric wrote:

Hi Kirti,

On 3/19/20 9:16 PM, Kirti Wankhede wrote:

IOMMU container maintains a list of all pages pinned by vfio_pin_pages API.
All pages pinned by vendor driver through this API should be considered as
dirty during migration. When container consists of IOMMU capable device and
all pages are pinned and mapped, then all pages are marked dirty.
Added support to start/stop dirtied pages tracking and to get bitmap of all
dirtied pages for requested IO virtual address range.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  include/uapi/linux/vfio.h | 55 +++
  1 file changed, 55 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index d0021467af53..8138f94cac15 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -995,6 +995,12 @@ struct vfio_iommu_type1_dma_map {
  
  #define VFIO_IOMMU_MAP_DMA _IO(VFIO_TYPE, VFIO_BASE + 13)
  
+struct vfio_bitmap {

+   __u64 pgsize;   /* page size for bitmap */

in bytes as well


Added.


+   __u64 size;  /* in bytes */
+   __u64 __user *data; /* one bit per page */
+};
+
  /**
   * VFIO_IOMMU_UNMAP_DMA - _IOWR(VFIO_TYPE, VFIO_BASE + 14,
   *struct vfio_dma_unmap)
@@ -1021,6 +1027,55 @@ struct vfio_iommu_type1_dma_unmap {
  #define VFIO_IOMMU_ENABLE _IO(VFIO_TYPE, VFIO_BASE + 15)
  #define VFIO_IOMMU_DISABLE_IO(VFIO_TYPE, VFIO_BASE + 16)
  
+/**

+ * VFIO_IOMMU_DIRTY_PAGES - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ * IOCTL is used for dirty pages tracking. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap.

nit: This may become outdated when adding new fields. argz use mode is
documented at the beginning of the file.



Ok.


  Caller set flag depend on which

+ * operation to perform, details as below:
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_START set, indicates
+ * migration is active and IOMMU module should track pages which are dirtied or
+ * potentially dirtied by device.
+ * Dirty pages are tracked until tracking is stopped by user application by
+ * setting VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP flag.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP set, indicates
+ * IOMMU should stop tracking dirtied pages.
+ *
+ * When IOCTL is called with VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP flag set,
+ * IOCTL returns dirty pages bitmap for IOMMU container during migration for
+ * given IOVA range. User must provide data[] as the structure
+ * vfio_iommu_type1_dirty_bitmap_get through which user provides IOVA range

I think the fact the IOVA range must match the vfio dma_size must be
documented.


Added.


  and

+ * pgsize. This interface supports to get bitmap of smallest supported pgsize
+ * only and can be modified in future to get bitmap of specified pgsize.
+ * User must allocate memory for bitmap, zero the bitmap memory and set size
+ * of allocated memory in bitmap_size field. One bit is used to represent one
+ * page consecutively starting from iova offset. User should provide page size
+ * in 'pgsize'. Bit set in bitmap indicates page at that offset from iova is
+ * dirty. Caller must set argsz including size of structure
+ * vfio_iommu_type1_dirty_bitmap_get.

nit: ditto


I think this is still needed here because vfio_bitmap is only used in 
case of this particular flag.


Thanks,
Kirti


+ *
+ * Only one of the flags _START, STOP and _GET may be specified at a time.
+ *
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32 argsz;
+   __u32 flags;
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_START  (1 << 0)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_STOP   (1 << 1)
+#define VFIO_IOMMU_DIRTY_PAGES_FLAG_GET_BITMAP (1 << 2)
+   __u8 data[];
+};
+
+struct vfio_iommu_type1_dirty_bitmap_get {
+   __u64  iova;/* IO virtual address */
+   __u64  size;/* Size of iova range */
+   struct vfio_bitmap bitmap;
+};
+
+#define VFIO_IOMMU_DIRTY_PAGES _IO(VFIO_TYPE, VFIO_BASE + 17)
+
  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
  
  /*



Thanks

Eric





[PATCH-for-5.0] qga-posix: Avoid crashing process when failing to allocate memory

2020-03-24 Thread Philippe Mathieu-Daudé
Similarly to commit 807e2b6fce0 for Windows, kindly return a
QMP error message instead of crashing the whole process.

Cc: qemu-sta...@nongnu.org
Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=1594054
Reported-by: Fakhri Zulkifli 
Signed-off-by: Philippe Mathieu-Daudé 
---
 qga/commands-posix.c | 8 +++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 93474ff770..8f127788e6 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -493,7 +493,13 @@ struct GuestFileRead *qmp_guest_file_read(int64_t handle, 
bool has_count,
 gfh->state = RW_STATE_NEW;
 }
 
-buf = g_malloc0(count+1);
+buf = g_try_malloc0(count + 1);
+if (!buf) {
+error_setg(errp,
+   "failed to allocate sufficient memory "
+   "to complete the requested service");
+return NULL;
+}
 read_count = fread(buf, 1, count, fh);
 if (ferror(fh)) {
 error_setg_errno(errp, errno, "failed to read file");
-- 
2.21.1




Re: [PATCH 5/6] migration/ram: fix use after free of local_err

2020-03-24 Thread Dr. David Alan Gilbert
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> local_err is used again in migration_bitmap_sync_precopy() after
> precopy_notify(), so we must zero it. Otherwise try to set
> non-NULL local_err will crash.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> ---
>  migration/ram.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/ram.c b/migration/ram.c
> index c12cfdbe26..04f13feb2e 100644
> --- a/migration/ram.c
> +++ b/migration/ram.c
> @@ -980,6 +980,7 @@ static void migration_bitmap_sync_precopy(RAMState *rs)
>   */
>  if (precopy_notify(PRECOPY_NOTIFY_BEFORE_BITMAP_SYNC, &local_err)) {
>  error_report_err(local_err);
> +local_err = NULL;

Reviewed-by: Dr. David Alan Gilbert 

and queued.


>  }
>  
>  migration_bitmap_sync(rs);
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH 4/6] migration/colo: fix use after free of local_err

2020-03-24 Thread Dr. David Alan Gilbert
* Vladimir Sementsov-Ogievskiy (vsement...@virtuozzo.com) wrote:
> local_err is used again in secondary_vm_do_failover() after
> replication_stop_all(), so we must zero it. Otherwise try to set
> non-NULL local_err will crash.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Reviewed-by: Dr. David Alan Gilbert 

I'll queue this

> ---
>  migration/colo.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/migration/colo.c b/migration/colo.c
> index 44942c4e23..a54ac84f41 100644
> --- a/migration/colo.c
> +++ b/migration/colo.c
> @@ -93,6 +93,7 @@ static void secondary_vm_do_failover(void)
>  replication_stop_all(true, &local_err);
>  if (local_err) {
>  error_report_err(local_err);
> +local_err = NULL;
>  }
>  
>  /* Notify all filters of all NIC to do checkpoint */
> -- 
> 2.21.0
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH v15 Kernel 2/7] vfio iommu: Remove atomicity of ref_count of pinned pages

2020-03-24 Thread Kirti Wankhede




On 3/24/2020 2:00 AM, Auger Eric wrote:

Hi Kirti,

On 3/19/20 9:16 PM, Kirti Wankhede wrote:

vfio_pfn.ref_count is always updated by holding iommu->lock, using atomic
variable is overkill.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 

Reviewed-by: Eric Auger 



Thanks.

Kirti.


Thanks

Eric

---
  drivers/vfio/vfio_iommu_type1.c | 9 +
  1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 9fdfae1cb17a..70aeab921d0f 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -112,7 +112,7 @@ struct vfio_pfn {
struct rb_node  node;
dma_addr_t  iova;   /* Device address */
unsigned long   pfn;/* Host pfn */
-   atomic_t ref_count;
+   unsigned int ref_count;
  };
  
  struct vfio_regions {

@@ -233,7 +233,7 @@ static int vfio_add_to_pfn_list(struct vfio_dma *dma, 
dma_addr_t iova,
  
  	vpfn->iova = iova;

vpfn->pfn = pfn;
-   atomic_set(&vpfn->ref_count, 1);
+   vpfn->ref_count = 1;
vfio_link_pfn(dma, vpfn);
return 0;
  }
@@ -251,7 +251,7 @@ static struct vfio_pfn *vfio_iova_get_vfio_pfn(struct 
vfio_dma *dma,
struct vfio_pfn *vpfn = vfio_find_vpfn(dma, iova);
  
  	if (vpfn)

-   atomic_inc(&vpfn->ref_count);
+   vpfn->ref_count++;
return vpfn;
  }
  
@@ -259,7 +259,8 @@ static int vfio_iova_put_vfio_pfn(struct vfio_dma *dma, struct vfio_pfn *vpfn)

  {
int ret = 0;
  
-	if (atomic_dec_and_test(&vpfn->ref_count)) {

+   vpfn->ref_count--;
+   if (!vpfn->ref_count) {
ret = put_pfn(vpfn->pfn, dma->prot);
vfio_remove_from_pfn_list(dma, vpfn);
}







Re: [PATCH v2 4/4] sheepdog: Consistently set bdrv_has_zero_init_truncate

2020-03-24 Thread John Snow



On 3/24/20 1:42 PM, Eric Blake wrote:
> block_int.h claims that .bdrv_has_zero_init must return 0 if
> .bdrv_has_zero_init_truncate does likewise; but this is violated if
> only the former callback is provided if .bdrv_co_truncate also exists.
> When adding the latter callback, it was mistakenly added to only one
> of the three possible sheepdog instantiations.
> 
> Fixes: 1dcaf527
> Signed-off-by: Eric Blake 
> ---
>  block/sheepdog.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/block/sheepdog.c b/block/sheepdog.c
> index cfa84338a2d6..522c16a93676 100644
> --- a/block/sheepdog.c
> +++ b/block/sheepdog.c
> @@ -3269,6 +3269,7 @@ static BlockDriver bdrv_sheepdog_tcp = {
>  .bdrv_co_create   = sd_co_create,
>  .bdrv_co_create_opts  = sd_co_create_opts,
>  .bdrv_has_zero_init   = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init_truncate  = bdrv_has_zero_init_1,
>  .bdrv_getlength   = sd_getlength,
>  .bdrv_get_allocated_file_size = sd_get_allocated_file_size,
>  .bdrv_co_truncate = sd_co_truncate,
> @@ -3307,6 +3308,7 @@ static BlockDriver bdrv_sheepdog_unix = {
>  .bdrv_co_create   = sd_co_create,
>  .bdrv_co_create_opts  = sd_co_create_opts,
>  .bdrv_has_zero_init   = bdrv_has_zero_init_1,
> +.bdrv_has_zero_init_truncate  = bdrv_has_zero_init_1,
>  .bdrv_getlength   = sd_getlength,
>  .bdrv_get_allocated_file_size = sd_get_allocated_file_size,
>  .bdrv_co_truncate = sd_co_truncate,
> 

Reviewed-by: John Snow 




Re: [PATCH v15 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-24 Thread Kirti Wankhede




On 3/24/2020 2:00 AM, Auger Eric wrote:

Hi Kirti,

On 3/19/20 9:16 PM, Kirti Wankhede wrote:

- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
   0th offset of migration region to get/set VFIO device related
   information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 


Please forgive me, I have just discovered v15 was available.

Hereafter, you will find the 2 main points I found difficult to
understand when reading the documentation.


---
  include/uapi/linux/vfio.h | 227 ++
  1 file changed, 227 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..d0021467af53 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK  (0x)
  #define VFIO_REGION_TYPE_GFX(1)
  #define VFIO_REGION_TYPE_CCW  (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
  
  /* sub-types for VFIO_REGION_TYPE_PCI_* */
  
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {

  /* sub-types for VFIO_REGION_TYPE_CCW */
  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD (1)
  
Re: [PATCH v14 Kernel 1/7] vfio: KABI for migration interface for device state

2020-03-24 Thread Kirti Wankhede




On 3/23/2020 5:15 PM, Auger Eric wrote:

Hi Kirti,

On 3/18/20 8:41 PM, Kirti Wankhede wrote:

- Defined MIGRATION region type and sub-type.

- Defined vfio_device_migration_info structure which will be placed at the
   0th offset of migration region to get/set VFIO device related
   information. Defined members of structure and usage on read/write access.

- Defined device states and state transition details.

- Defined sequence to be followed while saving and resuming VFIO device.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
  include/uapi/linux/vfio.h | 227 ++
  1 file changed, 227 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..d0021467af53 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
  #define VFIO_REGION_TYPE_PCI_VENDOR_MASK  (0xffff)
  #define VFIO_REGION_TYPE_GFX              (1)
  #define VFIO_REGION_TYPE_CCW              (2)
+#define VFIO_REGION_TYPE_MIGRATION        (3)
  
  /* sub-types for VFIO_REGION_TYPE_PCI_* */
  
@@ -379,6 +380,232 @@ struct vfio_region_gfx_edid {

  /* sub-types for VFIO_REGION_TYPE_CCW */
  #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD (1)
  
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * The structure vfio_device_migration_info is placed at the 0th offset of
+ * the VFIO_REGION_SUBTYPE_MIGRATION region to get and set VFIO device related
+ * migration information. Field accesses from this structure are only supported
+ * at their native width and alignment. Otherwise, the result is undefined and
+ * vendor drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  - The user application writes to this field to inform the vendor driver
+ *    about the device state to be transitioned to.
+ *  - The vendor driver should take the necessary actions to change the
+ *    device state. After successful transition to a given state, the
+ *    vendor driver should return success on the write(device_state, state)
+ *    system call. If the device state transition fails, the vendor driver
+ *    should return an appropriate -errno for the fault condition.
+ *  - On the user application side, if the device state transition fails,
+ *    that is, if write(device_state, state) returns an error, read
+ *    device_state again to determine the current state of the device from
+ *    the vendor driver.
+ *  - The vendor driver should return the previous state of the device unless
+ *    the vendor driver has encountered an internal error, in which case
+ *    the vendor driver may report the device_state VFIO_DEVICE_STATE_ERROR.
+ *  - The user application must use the device reset ioctl to recover the
+ *    device from the VFIO_DEVICE_STATE_ERROR state. If the device is
+ *    indicated to be in a valid device state by reading device_state, the
+ *    user application may attempt to transition the device to any valid
+ *    state reachable from the current state or terminate itself.
+ *
+ *  device_state consists of 3 bits:
+ *  - If bit 0 is set, it indicates the _RUNNING state. If bit 0 is clear,
+ *    it indicates the _STOP state. When the device state is changed to
+ *    _STOP, the driver should stop the device before write() returns.
+ *  - If bit 1 is set, it indicates the _SAVING state, which means that the
+ *    driver should start gathering device state information that will be
+ *    provided to the VFIO user application to save the device's state.
+ *  - If bit 2 is set, it indicates the _RESUMING state, which means that
+ *    the driver should prepare to resume the device. Data provided through
+ *    the migration region should be used to resume the device.
+ *  Bits 3 - 31 are reserved for future use. To preserve them, the user
+ *  application should perform a read-modify-write operation on this
+ *  field when modifying the specified bits.
+ *
+ *  +--- _RESUMING
+ *  |+-- _SAVING
+ *  ||+- _RUNNING
+ *  |||
+ *  000b => Device stopped, not saving or resuming
+ *  001b => Device running, which is the default state
+ *  010b => Stop the device & save the device state, stop-and-copy state
+ *  011b => Device running and save the device state, pre-copy state
+ *  100b => Device stopped and the device state is resuming
+ *  101b => Invalid state
+ *  110b => Error state
+ *  111b => Invalid state
+ *
+ * State transitions:
+ *
+ *              _RESUMING   _RUNNING    Pre-copy    Stop-and-copy   _STOP
+ *                (100b)     (001b)      (011b)        (010b)       (000b)
+ * 0. Running or default state
+ *                             |
+ *
+ * 1. Normal Shutdown (optional)
+ *                             |------------------------------------->|
+ *
+ * 2. 

Re: [PATCH for-5.0] vl.c: fix migration failure for 3.1 and older machine types

2020-03-24 Thread Dr. David Alan Gilbert
* Igor Mammedov (imamm...@redhat.com) wrote:
> On Wed,  4 Mar 2020 12:27:48 -0500
> Igor Mammedov  wrote:
> 
> > Migration from QEMU(v4.0) fails when using 3.1 or older machine
> > type. For example if one attempts to migrate
> > QEMU-2.12 started as
> >   qemu-system-ppc64 -nodefaults -M pseries-2.12 -m 4096 -mem-path /tmp/
> > to current master, it will fail with
> >   qemu-system-ppc64: Unknown ramblock "ppc_spapr.ram", cannot accept migration
> >   qemu-system-ppc64: error while loading state for instance 0x0 of device 'ram'
> >   qemu-system-ppc64: load of migration failed: Invalid argument
> > 
> > This is caused by commit 900c0ba373, which switches main RAM allocation
> > to memory backends, combined with the fact that in 3.1 and older QEMU,
> > backends used the full[***] QOM path as the memory region name instead
> > of the backend's name. That was changed after 3.1 to use prefix-less
> > names by default (fa0cb34d22) for new machine types.
> > *** This effectively makes the main RAM memory region names defined by
> > MachineClass::default_ram_id get altered with an '/objects/' prefix,
> > so migration fails because old QEMU sends the prefix-less name while
> > new QEMU expects the name with the prefix when using 3.1 and older
> > machine types.
> > 
> > Fix it by forcing the implicit[1] memory backend to always use
> > prefix-less names for its memory region, by setting the
> >   'x-use-canonical-path-for-ramblock-id'
> > property to false.
> > 
> > 1) i.e. the memory backend created by the compat glue, which maps
> > -m/-mem-path/-mem-prealloc/the default RAM size into the
> > appropriate backend type/options to match the old CLI format.
> > 
> > Fixes: 900c0ba373
> > Signed-off-by: Igor Mammedov 
> > Reported-by: Lukáš Doktor 
> 
> 
> ping,
> 
> so we don't forget to merge it

I'm queueing this.

> > ---
> > CC: ldok...@redhat.com
> > CC: marcandre.lur...@redhat.com
> > CC: dgilb...@redhat.com
> > ---
> >  softmmu/vl.c | 3 +++
> >  1 file changed, 3 insertions(+)
> > 
> > diff --git a/softmmu/vl.c b/softmmu/vl.c
> > index 5549f4b619..1101b1cb41 100644
> > --- a/softmmu/vl.c
> > +++ b/softmmu/vl.c
> > @@ -2800,6 +2800,9 @@ static void create_default_memdev(MachineState *ms, const char *path)
> >      object_property_set_int(obj, ms->ram_size, "size", &error_fatal);
> >      object_property_add_child(object_get_objects_root(), mc->default_ram_id,
> >                                obj, &error_fatal);
> > +    /* Ensure backend's memory region name is equal to mc->default_ram_id */
> > +    object_property_set_bool(obj, false, "x-use-canonical-path-for-ramblock-id",
> > +                             &error_fatal);
> >      user_creatable_complete(USER_CREATABLE(obj), &error_fatal);
> >      object_unref(obj);
> >      object_property_set_str(OBJECT(ms), mc->default_ram_id, "memory-backend",
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH-for-5.0] tools/virtiofsd/passthrough_ll: Fix double close()

2020-03-24 Thread Dr. David Alan Gilbert
* Philippe Mathieu-Daudé (phi...@redhat.com) wrote:
> On 3/21/20 1:06 PM, Philippe Mathieu-Daudé wrote:
> > On success, the fdopendir() call closes fd. Later on the error
> > path we try to close an already-closed fd. This can lead to
> > use-after-free. Fix by only closing the fd if the fdopendir()
> > call failed.
> > 
> > Cc: qemu-sta...@nongnu.org
> > Fixes: 7c6b66027 (Import passthrough_ll from libfuse fuse-3.8.0)
> 
> libfuse is correct, the bug was introduced in commit b39bce121b, so:
> 
> Fixes: b39bce121b (add dirp_map to hide lo_dirp pointers)

Queued with that tweak

> > Reported-by: Coverity (CID 1421933 USE_AFTER_FREE)
> > Suggested-by: Peter Maydell 
> > Signed-off-by: Philippe Mathieu-Daudé 
> > ---
> >   tools/virtiofsd/passthrough_ll.c | 3 +--
> >   1 file changed, 1 insertion(+), 2 deletions(-)
> > 
> > diff --git a/tools/virtiofsd/passthrough_ll.c b/tools/virtiofsd/passthrough_ll.c
> > index 4f259aac70..4c35c95b25 100644
> > --- a/tools/virtiofsd/passthrough_ll.c
> > +++ b/tools/virtiofsd/passthrough_ll.c
> > @@ -1520,8 +1520,7 @@ out_err:
> >      if (d) {
> >          if (d->dp) {
> >              closedir(d->dp);
> > -        }
> > -        if (fd != -1) {
> > +        } else if (fd != -1) {
> >              close(fd);
> >          }
> >          free(d);
> > 
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] hmp/vnc: Fix info vnc list leak

2020-03-24 Thread Dr. David Alan Gilbert
* Dr. David Alan Gilbert (git) (dgilb...@redhat.com) wrote:
> From: "Dr. David Alan Gilbert" 
> 
> We're iterating the list, and then freeing the iteration pointer rather
> than the list head.
> 
> Fixes: 0a9667ecdb6d ("hmp: Update info vnc")
> Reported-by: Coverity (CID 1421932)
> Signed-off-by: Dr. David Alan Gilbert 

Queued

> ---
>  monitor/hmp-cmds.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/monitor/hmp-cmds.c b/monitor/hmp-cmds.c
> index a00248527c..1d473e809c 100644
> --- a/monitor/hmp-cmds.c
> +++ b/monitor/hmp-cmds.c
> @@ -527,10 +527,11 @@ static void hmp_info_vnc_servers(Monitor *mon, VncServerInfo2List *server)
>  
>  void hmp_info_vnc(Monitor *mon, const QDict *qdict)
>  {
> -    VncInfo2List *info2l;
> +    VncInfo2List *info2l, *info2l_head;
>      Error *err = NULL;
>  
>      info2l = qmp_query_vnc_servers();
> +    info2l_head = info2l;
>      if (err) {
>          hmp_handle_error(mon, err);
>          return;
> @@ -559,7 +560,7 @@ void hmp_info_vnc(Monitor *mon, const QDict *qdict)
>          info2l = info2l->next;
>      }
>  
> -    qapi_free_VncInfo2List(info2l);
> +    qapi_free_VncInfo2List(info2l_head);
>  
>  }
>  #endif
> -- 
> 2.25.1
> 
> 
--
Dr. David Alan Gilbert / dgilb...@redhat.com / Manchester, UK




Re: [PATCH] ext4: Give 32bit personalities 32bit hashes

2020-03-24 Thread Theodore Y. Ts'o
On Tue, Mar 24, 2020 at 09:29:58AM +, Peter Maydell wrote:
> 
> On the contrary, that would be a much better interface for QEMU.
> We always know when we're doing an open-syscall on behalf
> of the guest, and it would be trivial to make the fcntl() call then.
> That would ensure that we don't accidentally get the
> '32-bit semantics' on file descriptors QEMU opens for its own
> purposes, and wouldn't leave us open to the risk in future that
> setting the PER_LINUX32 flag for all of QEMU causes
> unexpected extra behaviour in future kernels that would be correct
> for the guest binary but wrong/broken for QEMU's own internals.

If using a flag set by fcntl is better for qemu, then by all means
let's go with that instead of using a personality flag/number.

Linus, do you have what you need to do a respin of the patch?

 - Ted


