Re: [PATCH v2 19/20] nvme: make lba data size configurable

2019-11-12 Thread Klaus Birkelund
On Tue, Nov 12, 2019 at 03:24:00PM +, Beata Michalska wrote:
> Hi Klaus,
> 
> On Tue, 15 Oct 2019 at 11:50, Klaus Jensen  wrote:
> >  #define DEFINE_NVME_NS_PROPERTIES(_state, _props) \
> > -DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0)
> > +DEFINE_PROP_UINT32("nsid", _state, _props.nsid, 0), \
> > +DEFINE_PROP_UINT8("lbads", _state, _props.lbads, 9)
> >
> Could we actually use BDRV_SECTOR_BITS instead of magic numbers?
> 
 
Yes, better. Fixed in two places.
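
For anyone following along, here is a minimal sketch of why 9 is the natural default: LBADS is the base-2 logarithm of the LBA data size, so 9 means 512-byte blocks, which is the value BDRV_SECTOR_BITS carries in QEMU. The names SKETCH_SECTOR_BITS and lba_size_from_lbads are hypothetical and not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Same value as QEMU's BDRV_SECTOR_BITS; repeated so the sketch stands alone. */
#define SKETCH_SECTOR_BITS 9

/* Hypothetical helper: LBADS is log2 of the LBA data size in bytes. */
uint32_t lba_size_from_lbads(uint8_t lbads)
{
    assert(lbads < 32);            /* keep the shift well defined */
    return UINT32_C(1) << lbads;
}
```

Calling lba_size_from_lbads(SKETCH_SECTOR_BITS) yields 512, and a namespace configured with lbads=12 would use 4096-byte blocks.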



Re: [PATCH v1 0/2] TCG plugin doc updates

2019-11-12 Thread no-reply
Patchew URL: 
https://patchew.org/QEMU/20191112164051.16404-1-alex.ben...@linaro.org/



Hi,

This series failed the docker-mingw@fedora build test. Please find the testing
commands and their output below. If you have Docker installed, you can probably
reproduce it locally.

=== TEST SCRIPT BEGIN ===
#! /bin/bash
export ARCH=x86_64
make docker-image-fedora V=1 NETWORK=1
time make docker-test-mingw@fedora J=14 NETWORK=1
=== TEST SCRIPT END ===

  CC  util/thread-pool.o
  CC  util/qemu-timer.o

Warning, treated as error:
/tmp/qemu-test/src/docs/devel/index.rst:13:toctree contains reference to nonexisting document 'plugins'
  CC  util/main-loop.o
  CC  util/aio-win32.o
---
  CC  util/error.o
  CC  util/qemu-error.o
  CC  util/qemu-print.o
make: *** [Makefile:1018: docs/devel/index.html] Error 2
make: *** Waiting for unfinished jobs
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=56318e0bd4904ba6866c9fbf90a5f3b3', '-u', 
'1001', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-gd8azaub/src/docker-src.2019-11-13-02.05.56.22302:/var/tmp/qemu:z,ro',
 'qemu:fedora', '/var/tmp/qemu/run', 'test-mingw']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=56318e0bd4904ba6866c9fbf90a5f3b3
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-gd8azaub/src'
make: *** [docker-run-test-mingw@fedora] Error 2

real    1m52.187s
user    0m8.358s


The full log is available at
http://patchew.org/logs/20191112164051.16404-1-alex.ben...@linaro.org/testing.docker-mingw@fedora/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: virtio,iommu_platform=on

2019-11-12 Thread Alexey Kardashevskiy



On 13/11/2019 17:23, Alexey Kardashevskiy wrote:
> 
> 
> On 13/11/2019 16:54, Michael Roth wrote:
>> Quoting Alexey Kardashevskiy (2019-11-11 21:53:49)
>>> Hi!
>>>
>>> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
>>> problems, one of them is SLOF does SCSI bus scan, then it stops the
>>> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
>>> stopped using the devices) and when this happens, I see unassigned
>>> memory access (see below) which happens because disabling busmaster
>>> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
>>> when this happens, the device does not come back even if SLOF re-enables it.
>>>
>>> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
>>> hardly a right fix.
>>
>> I hit the same issue enabling IOMMU for virtio-blk using this branch:
>>
>>   https://github.com/mdroth/SLOF/commits/virtio-iommu
>>
>> I just sent a tentative fix for QEMU as:
>>
>>   "virtio-pci: disable vring processing when bus-mastering is disabled"
>>
>> It's an RFC since piggy-backing off dev->broken seems a bit hacky, but
>> it seems to fix the issue at least.


btw this fixes my issue with disabling bus master as well.


>>
>> BTW, the SLOF branch above needs cleanup, but it's booting guests okay
>> and I was planning to post this week. Where are you at on yours? Maybe
>> we should sync up...
> 
> 
> Mine is here: github.com:aik/SLOF.git  virtio-iommu
> 
> Still have to debug a lot, right now virtio-net does not work :-/
> 
> Want to take over? :)
> 
> 

-- 
Alexey



Re: virtio,iommu_platform=on

2019-11-12 Thread Alexey Kardashevskiy



On 13/11/2019 16:54, Michael Roth wrote:
> Quoting Alexey Kardashevskiy (2019-11-11 21:53:49)
>> Hi!
>>
>> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
>> problems, one of them is SLOF does SCSI bus scan, then it stops the
>> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
>> stopped using the devices) and when this happens, I see unassigned
>> memory access (see below) which happens because disabling busmaster
>> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
>> when this happens, the device does not come back even if SLOF re-enables it.
>>
>> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
>> hardly a right fix.
> 
> I hit the same issue enabling IOMMU for virtio-blk using this branch:
> 
>   https://github.com/mdroth/SLOF/commits/virtio-iommu
> 
> I just sent a tentative fix for QEMU as:
> 
>   "virtio-pci: disable vring processing when bus-mastering is disabled"
> 
> It's an RFC since piggy-backing off dev->broken seems a bit hacky, but
> it seems to fix the issue at least.
> 
> BTW, the SLOF branch above needs cleanup, but it's booting guests okay
> and I was planning to post this week. Where are you at on yours? Maybe
> we should sync up...


Mine is here: github.com:aik/SLOF.git  virtio-iommu

Still have to debug a lot, right now virtio-net does not work :-/

Want to take over? :)


-- 
Alexey



Re: [PATCH v2 04/20] nvme: populate the mandatory subnqn and ver fields

2019-11-12 Thread Klaus Birkelund
On Tue, Nov 12, 2019 at 03:04:45PM +, Beata Michalska wrote:
> Hi Klaus
> 
> On Tue, 15 Oct 2019 at 11:42, Klaus Jensen  wrote:
> > +n->bar.vs = 0x00010201;
> 
> Very minor:
> 
> The version number is being set twice in the patch series already.
> And it is being set in two places.
> It might be worth making a #define out of it so that only one
> place needs to be changed.
> 

I think you are right. I'll do that.
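
A rough sketch of what such a define could look like, together with a decode of the value. The macro name NVME_SPEC_VER and the helper names are hypothetical; only the value 0x00010201 comes from the patch. The field layout (major in bits 31:16, minor in 15:8, tertiary in 7:0) follows the NVMe VS register definition.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical name; the thread only agrees that a single #define is wanted. */
#define NVME_SPEC_VER 0x00010201u

/* VS register layout: major in bits 31:16, minor in 15:8, tertiary in 7:0. */
unsigned nvme_ver_major(uint32_t vs)    { return vs >> 16; }
unsigned nvme_ver_minor(uint32_t vs)    { return (vs >> 8) & 0xff; }
unsigned nvme_ver_tertiary(uint32_t vs) { return vs & 0xff; }

/* Worked example: the constant decodes to version 1.2.1. */
void nvme_ver_example(void)
{
    assert(nvme_ver_major(NVME_SPEC_VER) == 1);
    assert(nvme_ver_minor(NVME_SPEC_VER) == 2);
    assert(nvme_ver_tertiary(NVME_SPEC_VER) == 1);
}
```

With one constant, both n->bar.vs and id->ver (the latter through cpu_to_le32) could be set from the same place.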



Re: [PATCH v2 06/20] nvme: add support for the abort command

2019-11-12 Thread Klaus Birkelund
On Tue, Nov 12, 2019 at 03:04:38PM +, Beata Michalska wrote:
> Hi Klaus
> 

Hi Beata,

Thank you very much for your thorough reviews! I'll start going through
them one by one :) You might have seen that I've posted a v3, but I will
make sure to consolidate between v2 and v3!

> On Tue, 15 Oct 2019 at 11:41, Klaus Jensen  wrote:
> >
> > Required for compliance with NVMe revision 1.2.1. See NVM Express 1.2.1,
> > Section 5.1 ("Abort command").
> >
> > The Abort command is a best effort command; for now, the device always
> > fails to abort the given command.
> >
> > Signed-off-by: Klaus Jensen 
> > ---
> >  hw/block/nvme.c | 16 
> >  1 file changed, 16 insertions(+)
> >
> > diff --git a/hw/block/nvme.c b/hw/block/nvme.c
> > index daa2367b0863..84e4f2ea7a15 100644
> > --- a/hw/block/nvme.c
> > +++ b/hw/block/nvme.c
> > @@ -741,6 +741,18 @@ static uint16_t nvme_identify(NvmeCtrl *n, NvmeCmd 
> > *cmd)
> >  }
> >  }
> >
> > +static uint16_t nvme_abort(NvmeCtrl *n, NvmeCmd *cmd, NvmeRequest *req)
> > +{
> > +uint16_t sqid = le32_to_cpu(cmd->cdw10) & 0xffff;
> > +
> > +req->cqe.result = 1;
> > +if (nvme_check_sqid(n, sqid)) {
> > +return NVME_INVALID_FIELD | NVME_DNR;
> > +}
> > +
> Shouldn't we validate the CID as well ?
> 

According to the specification it is "implementation specific if/when a
controller chooses to complete the command when the command to abort is
not found".

I'm interpreting this to mean that, yes, an invalid command identifier
could be given in the command, but this implementation does not care
about that.

I still think the controller should check the validity of the submission
queue identifier though. It is a general invariant that the sqid should
be valid.

> > +return NVME_SUCCESS;
> > +}
> > +
> >  static inline void nvme_set_timestamp(NvmeCtrl *n, uint64_t ts)
> >  {
> >  trace_nvme_setfeat_timestamp(ts);
> > @@ -859,6 +871,7 @@ static uint16_t nvme_set_feature(NvmeCtrl *n, NvmeCmd 
> > *cmd, NvmeRequest *req)
> >  trace_nvme_err_invalid_setfeat(dw10);
> >  return NVME_INVALID_FIELD | NVME_DNR;
> >  }
> > +
> >  return NVME_SUCCESS;
> >  }
> >
> > @@ -875,6 +888,8 @@ static uint16_t nvme_admin_cmd(NvmeCtrl *n, NvmeCmd 
> > *cmd, NvmeRequest *req)
> >  return nvme_create_cq(n, cmd);
> >  case NVME_ADM_CMD_IDENTIFY:
> >  return nvme_identify(n, cmd);
> > +case NVME_ADM_CMD_ABORT:
> > +return nvme_abort(n, cmd, req);
> >  case NVME_ADM_CMD_SET_FEATURES:
> >  return nvme_set_feature(n, cmd, req);
> >  case NVME_ADM_CMD_GET_FEATURES:
> > @@ -1388,6 +1403,7 @@ static void nvme_realize(PCIDevice *pci_dev, Error 
> > **errp)
> >  id->ieee[2] = 0xb3;
> >  id->ver = cpu_to_le32(0x00010201);
> >  id->oacs = cpu_to_le16(0);
> > +id->acl = 3;
> So we are setting the max number of concurrent commands
> but there is no logic to enforce that and wrap up with the
> status suggested by specification.
> 

That is true, but because the controller always completes the Abort
command immediately this cannot happen. If the controller did try to
abort executing commands, the Abort command would need to linger in the
controller state until a completion queue entry is posted for the
command to be aborted before the completion queue entry can be posted
for the Abort command. This takes up resources in the controller and is
the reason for the Abort Command Limit.

You could argue that we should set ACL to 0 then, but the specification
recommends a value of 3 and I do not see any harm in conveying a
"reasonable", though inconsequential, value.
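
To make the behaviour discussed above concrete, here is a minimal standalone sketch, not the QEMU code. It assumes the NVMe layout of Abort CDW10 (SQID in bits 15:0, CID in bits 31:16) and that bit 0 of completion dword 0 set to 1 means "command not aborted". The names sketch_abort and ABORT_NOT_ABORTED are hypothetical, and the range check is a simplified stand-in for nvme_check_sqid().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Abort command, CDW10: SQID in bits 15:0, CID in bits 31:16. */
uint16_t abort_sqid(uint32_t cdw10) { return cdw10 & 0xffff; }
uint16_t abort_cid(uint32_t cdw10)  { return cdw10 >> 16; }

/* Hypothetical name: bit 0 of completion dword 0 set means "not aborted". */
#define ABORT_NOT_ABORTED 1u

/*
 * Returns 0 on success and -1 for an out-of-range SQID (stand-in for
 * NVME_INVALID_FIELD | NVME_DNR). The CID is deliberately never looked up,
 * so the command completes immediately and always reports "not aborted".
 */
int sketch_abort(uint32_t cdw10, uint16_t num_sq, uint32_t *result)
{
    assert(result != NULL);
    *result = ABORT_NOT_ABORTED;
    if (abort_sqid(cdw10) >= num_sq) {
        return -1;
    }
    return 0;
}
```

Because nothing is ever left pending, at most zero Abort commands are outstanding at any time, which is why the ACL value is never actually exercised.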



Re: virtio,iommu_platform=on

2019-11-12 Thread Michael Roth
Quoting Alexey Kardashevskiy (2019-11-11 21:53:49)
> Hi!
> 
> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
> problems, one of them is SLOF does SCSI bus scan, then it stops the
> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
> stopped using the devices) and when this happens, I see unassigned
> memory access (see below) which happens because disabling busmaster
> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
> when this happens, the device does not come back even if SLOF re-enables it.
> 
> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
> hardly a right fix.

I hit the same issue enabling IOMMU for virtio-blk using this branch:

  https://github.com/mdroth/SLOF/commits/virtio-iommu

I just sent a tentative fix for QEMU as:

  "virtio-pci: disable vring processing when bus-mastering is disabled"

It's an RFC since piggy-backing off dev->broken seems a bit hacky, but
it seems to fix the issue at least.

BTW, the SLOF branch above needs cleanup, but it's booting guests okay
and I was planning to post this week. Where are you at on yours? Maybe
we should sync up...



[PATCH RFC] virtio-pci: disable vring processing when bus-mastering is disabled

2019-11-12 Thread Michael Roth
Currently the SLOF firmware for pseries guests will disable/re-enable
a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
register after the initial probe/feature negotiation, as it tends to
work with a single device at a time at various stages like probing
and running block/network bootloaders without doing a full reset
in-between.

In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
corresponding IOMMU memory region, so DMA accesses (including to vring
fields like idx/flags) will no longer undergo the necessary
translation. Normally we wouldn't expect this to happen since it would
be misbehavior on the driver side to continue driving DMA requests.

However, in the case of pseries, with iommu_platform=on, we trigger the
following sequence when tearing down the virtio-blk dataplane ioeventfd
in response to the guest unsetting PCI_COMMAND_MASTER:

  #2  0x55922651 in virtqueue_map_desc (vdev=vdev@entry=0x56dbcfb0, 
p_num_sg=p_num_sg@entry=0x7fffe657e1a8, addr=addr@entry=0x7fffe657e240, 
iov=iov@entry=0x7fffe6580240, max_num_sg=max_num_sg@entry=1024, 
is_write=is_write@entry=false, pa=0, sz=0)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
  #3  0x55922a89 in virtqueue_pop (vq=vq@entry=0x56dc8660, 
sz=sz@entry=184)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
  #4  0x558d3eca in virtio_blk_get_request (vq=0x56dc8660, 
s=0x56dbcfb0)
  at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
  #5  0x558d3eca in virtio_blk_handle_vq (s=0x56dbcfb0, 
vq=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
  #6  0x5591dd66 in virtio_queue_notify_aio_vq 
(vq=vq@entry=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
  #7  0x5591ecef in virtio_queue_notify_aio_vq (vq=0x56dc8660)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
  #8  0x5591ecef in virtio_queue_host_notifier_aio_poll 
(opaque=0x56dc86c8)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
  #9  0x55d02164 in run_poll_handlers_once 
(ctx=ctx@entry=0x5688bfc0, timeout=timeout@entry=0x7fffe65844a8)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:520
  #10 0x55d02d1b in try_poll_mode (timeout=0x7fffe65844a8, 
ctx=0x5688bfc0)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:607
  #11 0x55d02d1b in aio_poll (ctx=ctx@entry=0x5688bfc0, 
blocking=blocking@entry=true)
  at /home/mdroth/w/qemu.git/util/aio-posix.c:639
  #12 0x55d0004d in aio_wait_bh_oneshot (ctx=0x5688bfc0, 
cb=cb@entry=0x558d5130 , 
opaque=opaque@entry=0x56de86f0)
  at /home/mdroth/w/qemu.git/util/aio-wait.c:71
  #13 0x558d59bf in virtio_blk_data_plane_stop (vdev=)
  at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
  #14 0x55b906a1 in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x56dbcf38)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
  #15 0x55b90dbb in virtio_bus_stop_ioeventfd 
(bus=bus@entry=0x56dbcf38)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
  #16 0x55b92a8e in virtio_pci_stop_ioeventfd (proxy=0x56db4e40)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
  #17 0x55b92a8e in virtio_write_config (pci_dev=0x56db4e40, 
address=, val=1048832, len=)
  at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613
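
As a reading aid for frame #17: val=1048832 is 0x00100100. The sketch below assumes this is the usual 4-byte write at config offset 0x04 (the address is elided in this trace; the backtrace in Alexey's report shows address=0x4 with val=0x100100), so the low 16 bits are PCI_COMMAND. The bit values are the standard PCI ones; the helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Standard PCI command register bits (config offset 0x04, low 16 bits). */
#define PCI_COMMAND_IO      0x0001
#define PCI_COMMAND_MEMORY  0x0002
#define PCI_COMMAND_MASTER  0x0004

/* A 4-byte write at 0x04 carries command (low half) and status (high half). */
uint16_t command_from_dword(uint32_t val) { return val & 0xffff; }

bool bus_master_enabled(uint32_t val)
{
    return (command_from_dword(val) & PCI_COMMAND_MASTER) != 0;
}

bool decoding_enabled(uint32_t val)
{
    return (command_from_dword(val) &
            (PCI_COMMAND_IO | PCI_COMMAND_MEMORY)) != 0;
}

/* Worked example with the value from the backtrace. */
void frame17_example(void)
{
    uint32_t val = 1048832;            /* == 0x00100100 */
    assert(val == 0x00100100);
    assert(!bus_master_enabled(val));  /* MASTER cleared */
    assert(!decoding_enabled(val));    /* IO and MEMORY cleared too */
}
```

So the guest write clears IO, MEMORY and MASTER together, which matches the "clearing MMIO|IO|BUSMASTER" description in the original report.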

I.e. the calling code is only scheduling a one-shot BH for
virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
an additional virtqueue entry before we get there. This is likely due
to the following check in virtio_queue_host_notifier_aio_poll:

  static bool virtio_queue_host_notifier_aio_poll(void *opaque)
  {
  EventNotifier *n = opaque;
  VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
  bool progress;

  if (!vq->vring.desc || virtio_queue_empty(vq)) {
  return false;
  }

  progress = virtio_queue_notify_aio_vq(vq);

namely the call to virtio_queue_empty(). In this case, since no new
requests have actually been issued, shadow_avail_idx == last_avail_idx,
so we actually try to access the vring via vring_avail_idx() to get
the latest non-shadowed idx:

  int virtio_queue_empty(VirtQueue *vq)
  {
  bool empty;
  ...

  if (vq->shadow_avail_idx != vq->last_avail_idx) {
  return 0;
  }

  rcu_read_lock();
  empty = vring_avail_idx(vq) == vq->last_avail_idx;
  rcu_read_unlock();
  return empty;

but since the IOMMU region has been disabled we get a bogus value (0
usually), which causes virtio_queue_empty() to falsely report that
there are entries to be processed, which causes errors such as:

  "virtio: zero sized buffers are not allowed"

or

  "virtio-blk missing headers"

and puts the device in an error state.
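
The failure mode can be reproduced in isolation with a toy model. This is only a sketch under stated assumptions: SketchVQ and its fields are hypothetical stand-ins for the real VirtQueue, and the disabled IOMMU region is modelled as a read that returns 0, following the "0 usually" observation above.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical, stripped-down stand-in for the VirtQueue fields involved. */
typedef struct {
    uint16_t shadow_avail_idx;
    uint16_t last_avail_idx;
    bool     iommu_enabled;     /* models the IOMMU memory region state */
    uint16_t guest_avail_idx;   /* what the guest actually wrote */
} SketchVQ;

/* Models vring_avail_idx(): with the region disabled the read yields 0. */
uint16_t sketch_vring_avail_idx(SketchVQ *vq)
{
    vq->shadow_avail_idx = vq->iommu_enabled ? vq->guest_avail_idx : 0;
    return vq->shadow_avail_idx;
}

/* Same shape as the virtio_queue_empty() excerpt quoted above. */
bool sketch_queue_empty(SketchVQ *vq)
{
    if (vq->shadow_avail_idx != vq->last_avail_idx) {
        return false;
    }
    return sketch_vring_avail_idx(vq) == vq->last_avail_idx;
}

void queue_empty_example(void)
{
    SketchVQ vq = {
        .shadow_avail_idx = 5, .last_avail_idx = 5,
        .iommu_enabled = true, .guest_avail_idx = 5,
    };
    assert(sketch_queue_empty(&vq));   /* idle queue, translation works */

    vq.iommu_enabled = false;          /* guest cleared PCI_COMMAND_MASTER */
    assert(!sketch_queue_empty(&vq));  /* bogus 0 != 5: looks non-empty */
}
```

Once the bogus index is read, the poll handler believes there is work to do and goes on to pop a descriptor, which is where the errors quoted above come from.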

This patch works around the issue by introducing virtio_set_disabled(),
which piggy-backs off the vdev->broken flag we 

Re: virtio,iommu_platform=on

2019-11-12 Thread Alexey Kardashevskiy



On 12/11/2019 20:06, Laszlo Ersek wrote:
> On 11/12/19 04:53, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
>> problems, one of them is SLOF does SCSI bus scan, then it stops the
>> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
>> stopped using the devices) and when this happens, I see unassigned
>> memory access (see below) which happens because disabling busmaster
>> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
>> when this happens, the device does not come back even if SLOF re-enables it.
>>
>> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
>> hardly a right fix.
>>
>> Is this something expected? Thanks,
> 
> Can you perform a virtio reset (write 0 to the virtio-scsi-pci device's
> virtio status register) in SLOF, before clearing PCI_COMMAND?


The device stops working in SLOF, even if I never clear the bus master
bit. Weird...


> 
> Thanks,
> Laszlo
> 
> 
>>
>>
>> Here is the exact command line:
>>
>> /home/aik/pbuild/qemu-garrison2-ppc64/ppc64-softmmu/qemu-system-ppc64 \
>>
>> -nodefaults \
>>
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>
>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>
>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>
>> -nographic \
>>
>> -vga none \
>>
>> -enable-kvm \
>> -m 2G \
>>
>> -device
>> virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on
>> \
>> -drive id=DRIVE0,if=none,file=img/u1804-64le.qcow2,format=qcow2 \
>>
>> -device scsi-disk,id=scsi-disk0,drive=DRIVE0 \
>>
>> -snapshot \
>>
>> -smp 1 \
>>
>> -machine pseries \
>>
>> -L /home/aik/t/qemu-ppc64-bios/ \
>>
>> -trace events=qemu_trace_events \
>>
>> -d guest_errors \
>>
>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh59518 \
>>
>> -mon chardev=SOCKET0,mode=control
>>
>>
>>
>> Here is the backtrace:
>>
>> Thread 5 "qemu-system-ppc" hit Breakpoint 8, unassigned_mem_accepts
>> (opaque=0x0, addr=0x5802, size=0x2, is_write=0x0, attrs=...) at /home/
>> aik/p/qemu/memory.c:1275
>> 1275return false;
>> #0  unassigned_mem_accepts (opaque=0x0, addr=0x5802, size=0x2,
>> is_write=0x0, attrs=...) at /home/aik/p/qemu/memory.c:1275
>> #1  0x100a8ac8 in memory_region_access_valid (mr=0x1105c230
>> , addr=0x5802, size=0x2, is_write=0x0, attrs=...) at
>> /home/aik/p/qemu/memory.c:1377
>> #2  0x100a8c88 in memory_region_dispatch_read (mr=0x1105c230
>> , addr=0x5802, pval=0x7550d410, op=MO_16,
>> attrs=...) at /home/aik/p/qemu/memory.c:1418
>> #3  0x1001cad4 in address_space_lduw_internal_cached_slow
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0,
>> endian=DEVICE_LITTLE_ENDIAN) at /home/aik/p/qemu/memory_ldst.inc.c:211
>> #4  0x1001cc84 in address_space_lduw_le_cached_slow
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
>> /home/aik/p/qemu/memory_ldst.inc.c:249
>> #5  0x1019bd80 in address_space_lduw_le_cached
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
>> /home/aik/p/qemu/include/exec/memory_ldst_cached.inc.h:56
>> #6  0x1019c10c in lduw_le_phys_cached (cache=0x7fff68036fa0,
>> addr=0x2) at /home/aik/p/qemu/include/exec/memory_ldst_phys.inc.h:91
>> #7  0x1019d86c in virtio_lduw_phys_cached (vdev=0x118b9110,
>> cache=0x7fff68036fa0, pa=0x2) at
>> /home/aik/p/qemu/include/hw/virtio/virtio-access.h:166
>> #8  0x1019e648 in vring_avail_idx (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:302
>> #9  0x1019f5bc in virtio_queue_split_empty (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:581
>> #10 0x1019f838 in virtio_queue_empty (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:612
>> #11 0x101a8fa8 in virtio_queue_host_notifier_aio_poll
>> (opaque=0x118c2798) at /home/aik/p/qemu/hw/virtio/virtio.c:3389
>> #12 0x1092679c in run_poll_handlers_once (ctx=0x11212e40,
>> timeout=0x7550d7d8) at /home/aik/p/qemu/util/aio-posix.c:520
>> #13 0x10926aec in try_poll_mode (ctx=0x11212e40,
>> timeout=0x7550d7d8) at /home/aik/p/qemu/util/aio-posix.c:607
>> #14 0x10926c2c in aio_poll (ctx=0x11212e40, blocking=0x1) at
>> /home/aik/p/qemu/util/aio-posix.c:639
>> #15 0x1091fe0c in aio_wait_bh_oneshot (ctx=0x11212e40,
>> cb=0x1016f35c , opaque=0x118b9110) at
>> /home/aik/p/qemu/util/aio-wait.c:71
>> #16 0x1016fa60 in virtio_scsi_dataplane_stop (vdev=0x118b9110)
>> at /home/aik/p/qemu/hw/scsi/virtio-scsi-dataplane.c:211
>> #17 0x10684740 in virtio_bus_stop_ioeventfd (bus=0x118b9098) at
>> /home/aik/p/qemu/hw/virtio/virtio-bus.c:245
>> #18 0x10688290 in virtio_pci_stop_ioeventfd (proxy=0x118b0fa0)
>> at /home/aik/p/qemu/hw/virtio/virtio-pci.c:292
>> #19 0x106891e8 in virtio_write_config (pci_dev=0x118b0fa0,
>> address=0x4, val=0x100100, len=0x4) at
>> /home/aik/p/qemu/hw/virtio/virtio-pci.c:613
>> #20 

Re: virtio,iommu_platform=on

2019-11-12 Thread Alexey Kardashevskiy



On 12/11/2019 18:08, Michael S. Tsirkin wrote:
> On Tue, Nov 12, 2019 at 02:53:49PM +1100, Alexey Kardashevskiy wrote:
>> Hi!
>>
>> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
>> problems, one of them is SLOF does SCSI bus scan, then it stops the
>> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
>> stopped using the devices) and when this happens, I see unassigned
>> memory access (see below) which happens because disabling busmaster
>> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
>> when this happens, the device does not come back even if SLOF re-enables it.
> 
> In fact clearing bus master should disable ring access even
> without the IOMMU.
> Once you do this you should not wait for rings to be processed,
> it is safe to assume they won't be touched again and just
> free up any buffers that have not been used.
> 
> Why don't you see this without IOMMU?

Because without IOMMU, virtio can always access rings, it does not need
bus master address space for that.


> It's a bug I think, probably there to work around buggy guests.
> 
> So pls fix this in SLOF and then hopefully we can drop the
> work arounds and have clearing bus master actually block DMA.


Laszlo suggested writing 0 to the status register, but this does not seem to
help with either ioeventfd=true or ioeventfd=false. It looks like
setting/clearing the busmaster bit confuses the memory region caches in QEMU's
virtio. I am not sure in which direction to keep digging; any suggestions? Thanks,



> 
>> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
>> hardly a right fix.
>>
>> Is this something expected? Thanks,
>>
>>
>> Here is the exact command line:
>>
>> /home/aik/pbuild/qemu-garrison2-ppc64/ppc64-softmmu/qemu-system-ppc64 \
>>
>> -nodefaults \
>>
>> -chardev stdio,id=STDIO0,signal=off,mux=on \
>>
>> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
>>
>> -mon id=MON0,chardev=STDIO0,mode=readline \
>>
>> -nographic \
>>
>> -vga none \
>>
>> -enable-kvm \
>> -m 2G \
>>
>> -device
>> virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on
>> \
>> -drive id=DRIVE0,if=none,file=img/u1804-64le.qcow2,format=qcow2 \
>>
>> -device scsi-disk,id=scsi-disk0,drive=DRIVE0 \
>>
>> -snapshot \
>>
>> -smp 1 \
>>
>> -machine pseries \
>>
>> -L /home/aik/t/qemu-ppc64-bios/ \
>>
>> -trace events=qemu_trace_events \
>>
>> -d guest_errors \
>>
>> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh59518 \
>>
>> -mon chardev=SOCKET0,mode=control
>>
>>
>>
>> Here is the backtrace:
>>
>> Thread 5 "qemu-system-ppc" hit Breakpoint 8, unassigned_mem_accepts
>> (opaque=0x0, addr=0x5802, size=0x2, is_write=0x0, attrs=...) at /home/
>> aik/p/qemu/memory.c:1275
>> 1275return false;
>> #0  unassigned_mem_accepts (opaque=0x0, addr=0x5802, size=0x2,
>> is_write=0x0, attrs=...) at /home/aik/p/qemu/memory.c:1275
>> #1  0x100a8ac8 in memory_region_access_valid (mr=0x1105c230
>> , addr=0x5802, size=0x2, is_write=0x0, attrs=...) at
>> /home/aik/p/qemu/memory.c:1377
>> #2  0x100a8c88 in memory_region_dispatch_read (mr=0x1105c230
>> , addr=0x5802, pval=0x7550d410, op=MO_16,
>> attrs=...) at /home/aik/p/qemu/memory.c:1418
>> #3  0x1001cad4 in address_space_lduw_internal_cached_slow
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0,
>> endian=DEVICE_LITTLE_ENDIAN) at /home/aik/p/qemu/memory_ldst.inc.c:211
>> #4  0x1001cc84 in address_space_lduw_le_cached_slow
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
>> /home/aik/p/qemu/memory_ldst.inc.c:249
>> #5  0x1019bd80 in address_space_lduw_le_cached
>> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
>> /home/aik/p/qemu/include/exec/memory_ldst_cached.inc.h:56
>> #6  0x1019c10c in lduw_le_phys_cached (cache=0x7fff68036fa0,
>> addr=0x2) at /home/aik/p/qemu/include/exec/memory_ldst_phys.inc.h:91
>> #7  0x1019d86c in virtio_lduw_phys_cached (vdev=0x118b9110,
>> cache=0x7fff68036fa0, pa=0x2) at
>> /home/aik/p/qemu/include/hw/virtio/virtio-access.h:166
>> #8  0x1019e648 in vring_avail_idx (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:302
>> #9  0x1019f5bc in virtio_queue_split_empty (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:581
>> #10 0x1019f838 in virtio_queue_empty (vq=0x118c2720) at
>> /home/aik/p/qemu/hw/virtio/virtio.c:612
>> #11 0x101a8fa8 in virtio_queue_host_notifier_aio_poll
>> (opaque=0x118c2798) at /home/aik/p/qemu/hw/virtio/virtio.c:3389
>> #12 0x1092679c in run_poll_handlers_once (ctx=0x11212e40,
>> timeout=0x7550d7d8) at /home/aik/p/qemu/util/aio-posix.c:520
>> #13 0x10926aec in try_poll_mode (ctx=0x11212e40,
>> timeout=0x7550d7d8) at /home/aik/p/qemu/util/aio-posix.c:607
>> #14 0x10926c2c in aio_poll (ctx=0x11212e40, blocking=0x1) at
>> /home/aik/p/qemu/util/aio-posix.c:639
>> #15 0x1091fe0c in aio_wait_bh_oneshot 

Re: [PATCH v9 QEMU 14/15] vfio: Add ioctl to get dirty pages bitmap during dma unmap.

2019-11-12 Thread Yan Zhao
On Wed, Nov 13, 2019 at 01:05:23AM +0800, Kirti Wankhede wrote:
> With vIOMMU, an IO virtual address range can get unmapped during the pre-copy
> phase of migration. In that case, the unmap ioctl should return the pages
> pinned in that range, and QEMU should find the corresponding guest physical
> addresses and report them as dirty.
> 
> Note: This patch is not yet tested. I'm trying to see how I can test this code
> path.
> 
> Suggested-by: Alex Williamson 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/common.c | 65 
> 
>  1 file changed, 61 insertions(+), 4 deletions(-)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index 66f1c64bf074..dc5768219d44 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -311,11 +311,30 @@ static bool vfio_devices_are_stopped_and_saving(void)
>  return true;
>  }
>  
> +static bool vfio_devices_are_running_and_saving(void)
> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> +(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> +continue;
> +} else {
> +return false;
> +}
> +}
> +}
> +return true;
> +}
> +
>  /*
>   * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
>   */
>  static int vfio_dma_unmap(VFIOContainer *container,
> -  hwaddr iova, ram_addr_t size)
> +  hwaddr iova, ram_addr_t size,
> +  VFIOGuestIOMMU *giommu)
>  {
>  struct vfio_iommu_type1_dma_unmap unmap = {
>  .argsz = sizeof(unmap),
> @@ -324,6 +343,44 @@ static int vfio_dma_unmap(VFIOContainer *container,
>  .size = size,
>  };
>  
> +if (giommu && vfio_devices_are_running_and_saving()) {
> +int ret;
> +uint64_t bitmap_size;
> +struct vfio_iommu_type1_dma_unmap_bitmap unmap_bitmap = {
> +.argsz = sizeof(unmap_bitmap),
> +.flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP,
> +.iova = iova,
> +.size = size,
> +};
> +
> +bitmap_size = BITS_TO_LONGS(size >> TARGET_PAGE_BITS) *
> +  sizeof(uint64_t);
> +
> +unmap_bitmap.bitmap = g_try_malloc0(bitmap_size);
> +if (!unmap_bitmap.bitmap) {
> +error_report("%s: Error allocating bitmap buffer of size 0x%lx",
> + __func__, bitmap_size);
> +return -ENOMEM;
> +}
> +
> +unmap_bitmap.bitmap_size = bitmap_size;
> +
> +ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA_GET_BITMAP,
> +&unmap_bitmap);

By the time vfio_dma_unmap is reached, the IOVAs being unmapped can no longer
be translated by the vIOMMU, because the shadow page tables have already been
updated. So unless IOTLB entries have already been generated and IOTLB
invalidation is delayed until after the unmap notification, IOVA to GPA
translation will fail.
> +
> +if (!ret) {
> +cpu_physical_memory_set_dirty_lebitmap(
> +(uint64_t *)unmap_bitmap.bitmap,
> +giommu->iommu_offset + giommu->n.start,
> +bitmap_size >> TARGET_PAGE_BITS);
Also, why can IOVAs be used directly here?

Thanks
Yan
> +} else {
> +error_report("VFIO_IOMMU_GET_DIRTY_BITMAP: %d %d", ret, errno);
> +}
> +
> +g_free(unmap_bitmap.bitmap);
> +return ret;
> +}
> +
>  while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>  /*
>   * The type1 backend has an off-by-one bug in the kernel 
> (71a7d3d78e3c
> @@ -371,7 +428,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr 
> iova,
>   * the VGA ROM space.
>   */
>  if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
> -(errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
> +(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
>   ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
>  return 0;
>  }
> @@ -511,7 +568,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, 
> IOMMUTLBEntry *iotlb)
>   iotlb->addr_mask + 1, vaddr, ret);
>  }
>  } else {
> -ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
> +ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, giommu);
>  if (ret) {
>  error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
>   "0x%"HWADDR_PRIx") = %d (%m)",
> @@ -814,7 +871,7 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  }
>  
>  if (try_unmap) {
> -ret = vfio_dma_unmap(container, 

Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-12 Thread Yan Zhao
On Wed, Nov 13, 2019 at 06:30:05AM +0800, Alex Williamson wrote:
> On Tue, 12 Nov 2019 22:33:36 +0530
> Kirti Wankhede  wrote:
> 
> > - Defined MIGRATION region type and sub-type.
> > - Used 3 bits to define VFIO device states.
> > Bit 0 => _RUNNING
> > Bit 1 => _SAVING
> > Bit 2 => _RESUMING
> > Combination of these bits defines VFIO device's state during migration
> > _RUNNING => Normal VFIO device running state. When its reset, it
> > indicates _STOPPED state. when device is changed to
> > _STOPPED, driver should stop device before write()
> > returns.
> > _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> >   start saving state of device i.e. pre-copy state
> > _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
> 
> s/should/must/
> 
> > save device state,i.e. stop-n-copy state
> > _RESUMING => VFIO device resuming state.
> > _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states
> 
> A table might be useful here and in the uapi header to indicate valid
> states:
> 
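> | _RESUMING | _SAVING | _RUNNING | Description
> +-----------+---------+----------+--------------------------------------------
> |     0     |    0    |     0    | Stopped, not saving or resuming (a)
> +-----------+---------+----------+--------------------------------------------
> |     0     |    0    |     1    | Running, default state
> +-----------+---------+----------+--------------------------------------------
> |     0     |    1    |     0    | Stopped, migration interface in save mode
> +-----------+---------+----------+--------------------------------------------
> |     0     |    1    |     1    | Running, save mode interface, iterative
> +-----------+---------+----------+--------------------------------------------
> |     1     |    0    |     0    | Stopped, migration resume interface active
> +-----------+---------+----------+--------------------------------------------
> |     1     |    0    |     1    | Invalid (b)
> +-----------+---------+----------+--------------------------------------------
> |     1     |    1    |     0    | Invalid (c)
> +-----------+---------+----------+--------------------------------------------
> |     1     |    1    |     1    | Invalid (d)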
> 
> I think we need to consider whether we define (a) as generally
> available, for instance we might want to use it for diagnostics or a
> fatal error condition outside of migration.
> 
> Are there hidden assumptions between state transitions here or are
> there specific next possible state diagrams that we need to include as
> well?
> 
> I'm curious if Intel agrees with the states marked invalid with their
> push for post-copy support.
> 
Hi Alex and Kirti,
Actually, for postcopy, I think we need an extra POSTCOPY state
anyway. The reasons are as follows:
- On the target side, the _RESUMING state is set at the beginning of
  migration. It cannot also be used to tell the device that it is
  currently in the postcopy state, where device DMAs are to be trapped and
  pre-faulted.
  We also cannot use the state (_RESUMING + _RUNNING) as an indicator of
  postcopy, because before the device and VM run on the target side, some
  device state is already loaded (e.g. page tables, pending workloads), and
  the target side can do pre-pagefaults in that period, before the target VM
  is up.
- On the source side, after the device is stopped, postcopy needs to save
  device state only (as opposed to device state + remaining dirty
  pages in precopy). The state (!_RUNNING + _SAVING) again cannot
  differentiate precopy from postcopy.
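
To make the table above concrete, here is a minimal sketch of a validity
check plus the read-modify-write update the patch description asks for. The
MIG_STATE_* names and both helpers are made up for illustration and are not
the uapi definitions; the bit positions are the ones listed in the quoted
commit message, and state (a) is treated as valid, which is still under
discussion.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit assignments as listed in the quoted patch description. */
#define MIG_STATE_RUNNING   (1u << 0)
#define MIG_STATE_SAVING    (1u << 1)
#define MIG_STATE_RESUMING  (1u << 2)
#define MIG_STATE_MASK \
    (MIG_STATE_RUNNING | MIG_STATE_SAVING | MIG_STATE_RESUMING)

/* Hypothetical helper: true for the five rows not marked invalid. */
static bool mig_state_is_valid(uint32_t device_state)
{
    uint32_t s = device_state & MIG_STATE_MASK;

    /* _RESUMING may not be combined with _RUNNING or _SAVING. */
    if ((s & MIG_STATE_RESUMING) &&
        (s & (MIG_STATE_RUNNING | MIG_STATE_SAVING))) {
        return false;
    }
    return true;
}

/* Read-modify-write: replace bits 0-2, leave reserved bits 3-31 alone. */
static uint32_t mig_state_update(uint32_t old_value, uint32_t new_state)
{
    return (old_value & ~MIG_STATE_MASK) | (new_state & MIG_STATE_MASK);
}
```

A POSTCOPY indication as proposed here would need either a new bit or a
redefinition of one of the invalid rows; the sketch does not try to model
that.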

> > Bits 3 - 31 are reserved for future use. User should perform
> > read-modify-write operation on this field.
> > - Defined vfio_device_migration_info structure which will be placed at 0th
> >   offset of migration region to get/set VFIO device related information.
> >   Defined members of structure and usage on read/write access:
> > * device_state: (read/write)
> > To convey VFIO device state to be transitioned to. Only 3 bits are
> > used as of now, Bits 3 - 31 are reserved for future use.
> > * pending bytes: (read only)
> > To get pending bytes yet to be migrated for VFIO device.
> > * data_offset: (read only)
> > To get data offset in migration region from where data exist
> > during _SAVING and from where data should be written by user space
> > application during _RESUMING state.
> > * data_size: (read/write)
> > To get and set size in bytes of data copied in migration region
> > during _SAVING and _RESUMING state.
> > 
> > Migration region looks like:
> >  --
> > |vfio_device_migration_info|data section  |
> > |  | ///  |
> >  --
> >  ^ 

Re: [PATCH v9 QEMU 13/15] vfio: Add vfio_listener_log_sync to mark dirty pages

2019-11-12 Thread Yan Zhao
On Wed, Nov 13, 2019 at 01:05:22AM +0800, Kirti Wankhede wrote:
> vfio_listener_log_sync gets list of dirty pages from container using
> VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and mark those pages dirty when all
> devices are stopped and saving state.
> Return early for the RAM block section of mapped MMIO region.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  hw/vfio/common.c | 103 
> +++
>  hw/vfio/trace-events |   1 +
>  2 files changed, 104 insertions(+)
> 
> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
> index ade9839c28a3..66f1c64bf074 100644
> --- a/hw/vfio/common.c
> +++ b/hw/vfio/common.c
> @@ -29,6 +29,7 @@
>  #include "hw/vfio/vfio.h"
>  #include "exec/address-spaces.h"
>  #include "exec/memory.h"
> +#include "exec/ram_addr.h"
>  #include "hw/hw.h"
>  #include "qemu/error-report.h"
>  #include "qemu/main-loop.h"
> @@ -38,6 +39,7 @@
>  #include "sysemu/reset.h"
>  #include "trace.h"
>  #include "qapi/error.h"
> +#include "migration/migration.h"
>  
>  VFIOGroupList vfio_group_list =
>  QLIST_HEAD_INITIALIZER(vfio_group_list);
> @@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
>  };
>  
>  /*
> + * Device state interfaces
> + */
> +
> +static bool vfio_devices_are_stopped_and_saving(void)
> +{
> +VFIOGroup *group;
> +VFIODevice *vbasedev;
> +
> +QLIST_FOREACH(group, &vfio_group_list, next) {
> +QLIST_FOREACH(vbasedev, &group->device_list, next) {
> +if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
> +!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
> +continue;
> +} else {
> +return false;
> +}
> +}
> +}
> +return true;
> +}
> +
> +/*
>   * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
>   */
>  static int vfio_dma_unmap(VFIOContainer *container,
> @@ -813,9 +837,88 @@ static void vfio_listener_region_del(MemoryListener 
> *listener,
>  }
>  }
>  
> +static int vfio_get_dirty_bitmap(VFIOContainer *container,
> + MemoryRegionSection *section)
> +{
> +struct vfio_iommu_type1_dirty_bitmap range;
> +uint64_t bitmap_size;
> +int ret;
> +
> +range.argsz = sizeof(range);
> +
> +if (memory_region_is_iommu(section->mr)) {
> +VFIOGuestIOMMU *giommu;
> +IOMMUTLBEntry iotlb;
> +
> +QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
> +if (MEMORY_REGION(giommu->iommu) == section->mr &&
> +giommu->n.start == section->offset_within_region) {
> +break;
> +}
> +}
> +
> +if (!giommu) {
> +return -EINVAL;
> +}
> +
> +iotlb = address_space_get_iotlb_entry(container->space->as,
> +   
> TARGET_PAGE_ALIGN(section->offset_within_address_space),
> +   true, MEMTXATTRS_UNSPECIFIED);
> +range.iova = iotlb.iova + giommu->iommu_offset;
> +range.size = iotlb.addr_mask + 1;
> +} else {
> +range.iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
> +range.size = int128_get64(section->size);
> +}
> +
> +bitmap_size = BITS_TO_LONGS(range.size >> TARGET_PAGE_BITS) *
> + 
> sizeof(uint64_t);
> +
> +range.bitmap = g_try_malloc0(bitmap_size);
> +if (!range.bitmap) {
> +error_report("%s: Error allocating bitmap buffer of size 0x%lx",
> + __func__, bitmap_size);
> +return -ENOMEM;
> +}
> +
> +range.bitmap_size = bitmap_size;
> +
> +ret = ioctl(container->fd, VFIO_IOMMU_GET_DIRTY_BITMAP, &range);
> +
From the implementation of the VFIO_IOMMU_GET_DIRTY_BITMAP ioctl,
this range.bitmap is indexed by iova, right?
So if vIOMMU is on, why can cpu_physical_memory_set_dirty_lebitmap be
called directly here without any vIOMMU translation?
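
To spell out the concern, a small sketch of how I read the bitmap layout. It
assumes bit i covers the page at iova_base + i * page_size, which is my
reading of the ioctl and not something confirmed in this thread; the helper
name is made up.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical helper: walk a one-bit-per-page bitmap and collect the
 * IOVA of every dirty page. Assumes bit i maps to
 * iova_base + (i << page_bits).
 */
static size_t collect_dirty_iovas(const uint64_t *bitmap, size_t npages,
                                  uint64_t iova_base, unsigned page_bits,
                                  uint64_t *out, size_t out_cap)
{
    size_t n = 0;

    for (size_t i = 0; i < npages && n < out_cap; i++) {
        if (bitmap[i / 64] & (UINT64_C(1) << (i % 64))) {
            out[n++] = iova_base + ((uint64_t)i << page_bits);
        }
    }
    return n;
}
```

With a vIOMMU, each address produced this way is still an IOVA, so it would
need to be translated to a guest physical address before the RAM page is
marked dirty. That translation step is what seems to be missing.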

> +if (!ret) {
> +cpu_physical_memory_set_dirty_lebitmap((uint64_t *)range.bitmap,
> +   
> TARGET_PAGE_ALIGN(section->offset_within_address_space),
> +   bitmap_size >> TARGET_PAGE_BITS);
> +} else {
> +error_report("VFIO_IOMMU_GET_DIRTY_BITMAP: %d %d", ret, errno);
> +}
> +
> +trace_vfio_get_dirty_bitmap(container->fd, range.iova, range.size,
> +bitmap_size);
> +
> +g_free(range.bitmap);
> +return ret;
> +}
> +
> +static void vfio_listerner_log_sync(MemoryListener *listener,
> +MemoryRegionSection *section)
> +{
> +VFIOContainer *container = container_of(listener, VFIOContainer, 
> listener);
> +
> +if (memory_region_is_ram_device(section->mr)) {
> +return;
> +}
> +
What about devices that need to sync the dirty bitmap in the RUNNING and
SAVING state?
> +if (vfio_devices_are_stopped_and_saving()) {
> +
> +vfio_get_dirty_bitmap(container, 

Re: [PATCH v14 03/11] tests: Add test for QAPI builtin type time

2019-11-12 Thread Tao Xu

On 11/13/2019 4:15 AM, Eduardo Habkost wrote:

On Fri, Nov 08, 2019 at 09:05:52AM +0100, Markus Armbruster wrote:

Tao Xu  writes:


On 11/7/2019 9:31 PM, Eduardo Habkost wrote:

On Thu, Nov 07, 2019 at 02:24:52PM +0800, Tao Xu wrote:

On 11/7/2019 4:53 AM, Eduardo Habkost wrote:

On Mon, Oct 28, 2019 at 03:52:12PM +0800, Tao Xu wrote:

Add tests for time input such as zero, around limit of precision,
signed upper limit, actual upper limit, beyond limits, time suffixes,
and etc.

Signed-off-by: Tao Xu 
---

[...]

+/* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
+qdict = keyval_parse("time1=9223372036854774784," /* 7ffffffffffffc00 */
+ "time2=9223372036854775295", /* 7ffffffffffffdff */
+ NULL, &error_abort);
+v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
+qobject_unref(qdict);
+visit_start_struct(v, NULL, NULL, 0, &error_abort);
+visit_type_time(v, "time1", &time, &error_abort);
+g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
+visit_type_time(v, "time2", &time, &error_abort);
+g_assert_cmphex(time, ==, 0x7ffffffffffffc00);


I'm confused by this test case and the one below[1].  Are these
known bugs?  Shouldn't we document them as known bugs?


Because do_strtosz() or do_strtomul() actually parse with strtod(), the
precision is 53 bits, so in these cases 7ffffffffffffdff and
fffffffffffffbff are rounded.


My questions remain: why isn't this being treated like a bug?


Hi Markus,

I am confused about the code here too. In do_strtosz(), the
upper limit is

val * mul >= 0xfffffffffffffc00

So some values near the 53-bit limit may be rounded. Is that a bug?


No, but the design is surprising, and the functions lack written
contracts, except for the do_strtosz() helper, which has one that sucks.

qemu_strtosz() & friends are designed to accept fraction * unit
multiplier.  Example: 1.5M means 1.5 * 1024 * 1024 with qemu_strtosz()
and qemu_strtosz_MiB(), and 1.5 * 1000 * 1000 with
qemu_strtosz_metric().  Whether supporting fractions is a good idea is
debatable, but it's what we've got.

The implementation limits the numeric part to the precision of double,
i.e. 53 bits.  "8PiB should be enough for anybody."
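
The effect is easy to see outside QEMU. The sketch below is not
qemu_strtosz(); it only routes a decimal string through strtod() and casts
the result back, which is the step that loses the low bits. The function
name is made up.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical stand-in for "parse the numeric part as a double". */
static uint64_t parse_via_double(const char *s)
{
    return (uint64_t)strtod(s, NULL);
}
```

On a host with IEEE 754 doubles and a correctly rounding strtod(), both
"9223372036854774784" and "9223372036854775295" should come back as
0x7ffffffffffffc00, since representable doubles are 1024 apart just below
2^63.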

Switching it from double to long double raises the limit to the
precision of long double.  At least 64 bit on common hosts, but hosts
exist where it's the same 53 bits.  Do we support any such hosts?  If
yes, then we'd make the precision depend on the host, which feels like a
bad idea.

A possible alternative is to parse the numeric part both as a double and
as a 64 bit unsigned integer, then use whatever consumes more
characters.  This enables providing full 64 bits unless you actually use
a fraction.
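
A rough sketch of that alternative, to show it is not much code. This is
not a patch against cutils.c: the function name, the signature and the
error handling are all assumptions, and unit suffix parsing is left to the
caller, who passes the multiplier in.

```c
#include <ctype.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

/*
 * Hypothetical helper: parse the numeric part as both an unsigned 64-bit
 * integer and a double, and use whichever parse consumed more characters.
 * Returns 0 on success, -EINVAL or -ERANGE on failure.
 */
static int parse_scaled_u64(const char *s, uint64_t mul,
                            uint64_t *result, const char **endptr)
{
    char *end_int, *end_dbl;
    unsigned long long ival;
    double dval;
    int int_errno;

    while (isspace((unsigned char)*s)) {
        s++;
    }
    if (*s == '-') {
        return -EINVAL;             /* negative input is not accepted */
    }

    errno = 0;
    ival = strtoull(s, &end_int, 10);
    int_errno = errno;
    dval = strtod(s, &end_dbl);

    if (end_int == s && end_dbl == s) {
        return -EINVAL;             /* nothing parseable */
    }

    if (end_int >= end_dbl) {
        /* Plain integer: keep all 64 bits, check the multiplication. */
        if (int_errno == ERANGE || (mul != 0 && ival > UINT64_MAX / mul)) {
            return -ERANGE;
        }
        *result = (uint64_t)ival * mul;
        *endptr = end_int;
        return 0;
    }

    /* Fraction or exponent present: fall back to double precision. */
    dval *= (double)mul;
    if (!(dval >= 0.0 && dval < 18446744073709551616.0)) {
        return -ERANGE;             /* out of range, infinity or NaN */
    }
    *result = (uint64_t)dval;
    *endptr = end_dbl;
    return 0;
}
```

With this, "9223372036854775295" takes the integer path and keeps every
bit, while "1.5" with a multiplier of 1024 takes the double path.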



This sounds like the right thing to do, if user input is an
integer and the code in the other end is consuming an integer.



As far as I remember, the only problem we've ever had with the 53 bits
limit is developer confusion :)



Developer confusion, I can deal with.  However, exposing this
behavior on external interfaces is a bug to me.

I don't know how serious the bug is because I don't know which
interfaces are affected by it.  Do we have a list?


Patches welcome.


My first goal is to get the maintainers of that code to recognize
it as a bug.  Then I hope this will motivate somebody else to fix
it.  :)



Hi Eduardo,

If it is a bug, could the fix be merged during rc1-rc3? I have made
2 patches and would like to submit them before HMAT (the HMAT series is big,
so submitting them together may be slow).


Tao



[PATCH v2 4/4] watchdog/aspeed: Fix AST2600 frequency behaviour

2019-11-12 Thread Joel Stanley
The AST2600 control register sneakily changed the meaning of bit 4
without anyone noticing. It no longer controls the 1MHz vs APB clock
select, and instead always runs at 1MHz.

The AST2500 was always 1MHz too, but it retained bit 4, making it read
only. We can model both using the same fixed 1MHz calculation.

Fixes: 6b2b2a703cad ("hw: wdt_aspeed: Add AST2600 support")
Reviewed-by: Cédric Le Goater 
Signed-off-by: Joel Stanley 
---
v2: Fix Fixes line in commit message
---
 hw/watchdog/wdt_aspeed.c | 21 +
 include/hw/watchdog/wdt_aspeed.h |  1 +
 2 files changed, 18 insertions(+), 4 deletions(-)

diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
index 8787c5ad0f97..9a8a2200fd8e 100644
--- a/hw/watchdog/wdt_aspeed.c
+++ b/hw/watchdog/wdt_aspeed.c
@@ -93,11 +93,11 @@ static uint64_t aspeed_wdt_read(void *opaque, hwaddr 
offset, unsigned size)
 
 }
 
-static void aspeed_wdt_reload(AspeedWDTState *s, bool pclk)
+static void aspeed_wdt_reload(AspeedWDTState *s)
 {
 uint64_t reload;
 
-if (pclk) {
+if (!(s->regs[WDT_CTRL] & WDT_CTRL_1MHZ_CLK)) {
 reload = muldiv64(s->regs[WDT_RELOAD_VALUE], NANOSECONDS_PER_SECOND,
   s->pclk_freq);
 } else {
@@ -109,6 +109,16 @@ static void aspeed_wdt_reload(AspeedWDTState *s, bool pclk)
 }
 }
 
+static void aspeed_wdt_reload_1mhz(AspeedWDTState *s)
+{
+uint64_t reload = s->regs[WDT_RELOAD_VALUE] * 1000ULL;
+
+if (aspeed_wdt_is_enabled(s)) {
+timer_mod(s->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + reload);
+}
+}
+
+
 static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data,
  unsigned size)
 {
@@ -130,13 +140,13 @@ static void aspeed_wdt_write(void *opaque, hwaddr offset, 
uint64_t data,
 case WDT_RESTART:
         if ((data & 0xFFFF) == WDT_RESTART_MAGIC) {
 s->regs[WDT_STATUS] = s->regs[WDT_RELOAD_VALUE];
-aspeed_wdt_reload(s, !(s->regs[WDT_CTRL] & WDT_CTRL_1MHZ_CLK));
+awc->wdt_reload(s);
 }
 break;
 case WDT_CTRL:
 if (enable && !aspeed_wdt_is_enabled(s)) {
 s->regs[WDT_CTRL] = data;
-aspeed_wdt_reload(s, !(data & WDT_CTRL_1MHZ_CLK));
+awc->wdt_reload(s);
 } else if (!enable && aspeed_wdt_is_enabled(s)) {
 s->regs[WDT_CTRL] = data;
 timer_del(s->timer);
@@ -283,6 +293,7 @@ static void aspeed_2400_wdt_class_init(ObjectClass *klass, 
void *data)
 awc->offset = 0x20;
 awc->ext_pulse_width_mask = 0xff;
 awc->reset_ctrl_reg = SCU_RESET_CONTROL1;
+awc->wdt_reload = aspeed_wdt_reload;
 }
 
 static const TypeInfo aspeed_2400_wdt_info = {
@@ -317,6 +328,7 @@ static void aspeed_2500_wdt_class_init(ObjectClass *klass, 
void *data)
 awc->ext_pulse_width_mask = 0xf;
 awc->reset_ctrl_reg = SCU_RESET_CONTROL1;
 awc->reset_pulse = aspeed_2500_wdt_reset_pulse;
+awc->wdt_reload = aspeed_wdt_reload_1mhz;
 }
 
 static const TypeInfo aspeed_2500_wdt_info = {
@@ -336,6 +348,7 @@ static void aspeed_2600_wdt_class_init(ObjectClass *klass, 
void *data)
 awc->ext_pulse_width_mask = 0xf; /* TODO */
 awc->reset_ctrl_reg = AST2600_SCU_RESET_CONTROL1;
 awc->reset_pulse = aspeed_2500_wdt_reset_pulse;
+awc->wdt_reload = aspeed_wdt_reload_1mhz;
 }
 
 static const TypeInfo aspeed_2600_wdt_info = {
diff --git a/include/hw/watchdog/wdt_aspeed.h b/include/hw/watchdog/wdt_aspeed.h
index dfedd7662dd1..819c22993a6e 100644
--- a/include/hw/watchdog/wdt_aspeed.h
+++ b/include/hw/watchdog/wdt_aspeed.h
@@ -47,6 +47,7 @@ typedef struct AspeedWDTClass {
 uint32_t ext_pulse_width_mask;
 uint32_t reset_ctrl_reg;
 void (*reset_pulse)(AspeedWDTState *s, uint32_t property);
+void (*wdt_reload)(AspeedWDTState *s);
 }  AspeedWDTClass;
 
 #endif /* WDT_ASPEED_H */
-- 
2.24.0
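
For reference, the two timeout calculations in the patch above can be
compared in isolation. This is a standalone sketch, not the device model:
the function names are made up, and plain 64-bit arithmetic stands in for
muldiv64().

```c
#include <stdint.h>

#define NANOSECONDS_PER_SECOND 1000000000ULL

/* Fixed 1MHz tick: each count of the reload value is 1000 ns. */
static uint64_t wdt_timeout_ns_1mhz(uint32_t reload_value)
{
    return (uint64_t)reload_value * 1000ULL;
}

/* APB clock tick: scale the reload value by the peripheral clock rate. */
static uint64_t wdt_timeout_ns_pclk(uint32_t reload_value, uint32_t pclk_hz)
{
    return (uint64_t)reload_value * NANOSECONDS_PER_SECOND / pclk_hz;
}
```

The product in the second helper cannot overflow 64 bits because the reload
value is at most 32 bits wide. With pclk_hz set to 1000000 the two helpers
agree, which is the behaviour the AST2500 and AST2600 now always get.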




[PATCH v2 1/4] aspeed/sdmc: Make ast2600 default 1G

2019-11-12 Thread Joel Stanley
Most boards have this much.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Joel Stanley 
---
 hw/misc/aspeed_sdmc.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/misc/aspeed_sdmc.c b/hw/misc/aspeed_sdmc.c
index f3a63a2e01db..2df3244b53c8 100644
--- a/hw/misc/aspeed_sdmc.c
+++ b/hw/misc/aspeed_sdmc.c
@@ -208,10 +208,10 @@ static int ast2600_rambits(AspeedSDMCState *s)
 }
 
 /* use a common default */
-warn_report("Invalid RAM size 0x%" PRIx64 ". Using default 512M",
+warn_report("Invalid RAM size 0x%" PRIx64 ". Using default 1024M",
 s->ram_size);
-s->ram_size = 512 << 20;
-return ASPEED_SDMC_AST2600_512MB;
+s->ram_size = 1024 << 20;
+return ASPEED_SDMC_AST2600_1024MB;
 }
 
 static void aspeed_sdmc_reset(DeviceState *dev)
-- 
2.24.0




[PATCH v2 3/4] watchdog/aspeed: Improve watchdog timeout message

2019-11-12 Thread Joel Stanley
Users benefit from knowing which watchdog timer has expired. The address
of the watchdog's registers unambiguously indicates which has expired,
so log that.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Joel Stanley 
---
v2: Use HWADDR_PRIx
---
 hw/watchdog/wdt_aspeed.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
index 145be6f99ce2..8787c5ad0f97 100644
--- a/hw/watchdog/wdt_aspeed.c
+++ b/hw/watchdog/wdt_aspeed.c
@@ -219,7 +219,8 @@ static void aspeed_wdt_timer_expired(void *dev)
 return;
 }
 
-qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
+qemu_log_mask(CPU_LOG_RESET, "Watchdog timer %" HWADDR_PRIx " expired.\n",
+s->iomem.addr);
 watchdog_perform_action();
 timer_del(s->timer);
 }
-- 
2.24.0




[PATCH v2 2/4] aspeed/scu: Fix W1C behavior

2019-11-12 Thread Joel Stanley
This models the clock write one to clear registers, and fixes up some
incorrect behavior in all of the write to clear registers.

There was also a typo in one of the register definitions.

Reviewed-by: Cédric Le Goater 
Signed-off-by: Joel Stanley 
---
 hw/misc/aspeed_scu.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
index 717509bc5460..aac4645f8c3c 100644
--- a/hw/misc/aspeed_scu.c
+++ b/hw/misc/aspeed_scu.c
@@ -98,7 +98,7 @@
 #define AST2600_CLK_STOP_CTRL TO_REG(0x80)
 #define AST2600_CLK_STOP_CTRL_CLR TO_REG(0x84)
 #define AST2600_CLK_STOP_CTRL2 TO_REG(0x90)
-#define AST2600_CLK_STOP_CTR2L_CLR TO_REG(0x94)
+#define AST2600_CLK_STOP_CTRL2_CLR TO_REG(0x94)
 #define AST2600_SDRAM_HANDSHAKE   TO_REG(0x100)
 #define AST2600_HPLL_PARAMTO_REG(0x200)
 #define AST2600_HPLL_EXT  TO_REG(0x204)
@@ -532,11 +532,12 @@ static uint64_t aspeed_ast2600_scu_read(void *opaque, 
hwaddr offset,
 return s->regs[reg];
 }
 
-static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t 
data,
+static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t 
data64,
  unsigned size)
 {
 AspeedSCUState *s = ASPEED_SCU(opaque);
 int reg = TO_REG(offset);
+uint32_t data = data64;
 
 if (reg >= ASPEED_AST2600_SCU_NR_REGS) {
 qemu_log_mask(LOG_GUEST_ERROR,
@@ -563,15 +564,19 @@ static void aspeed_ast2600_scu_write(void *opaque, hwaddr 
offset, uint64_t data,
 /* fall through */
 case AST2600_SYS_RST_CTRL:
 case AST2600_SYS_RST_CTRL2:
+case AST2600_CLK_STOP_CTRL:
+case AST2600_CLK_STOP_CTRL2:
 /* W1S (Write 1 to set) registers */
 s->regs[reg] |= data;
 return;
 case AST2600_SYS_RST_CTRL_CLR:
 case AST2600_SYS_RST_CTRL2_CLR:
+case AST2600_CLK_STOP_CTRL_CLR:
+case AST2600_CLK_STOP_CTRL2_CLR:
 case AST2600_HW_STRAP1_CLR:
 case AST2600_HW_STRAP2_CLR:
 /* W1C (Write 1 to clear) registers */
-s->regs[reg] &= ~data;
+s->regs[reg - 1] &= ~data;
 return;
 
 case AST2600_RNG_DATA:
-- 
2.24.0
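
The set/clear pairing the patch above fixes can be shown with a two
register toy model. The enum and function names are made up; the only
thing taken from the patch is that a write to the clear register acts on
the register one slot before it.

```c
#include <stdint.h>

/* Toy layout: a control register followed by its clear alias. */
enum { REG_CTRL = 0, REG_CTRL_CLR = 1, NR_REGS = 2 };

static void scu_write(uint32_t *regs, int reg, uint32_t data)
{
    switch (reg) {
    case REG_CTRL:
        /* W1S: every 1 bit written sets the matching bit. */
        regs[reg] |= data;
        break;
    case REG_CTRL_CLR:
        /* W1C: every 1 bit written clears it in the paired register. */
        regs[reg - 1] &= ~data;
        break;
    }
}
```

Before the fix the W1C case cleared bits in the clear register's own slot,
so the paired control register never changed.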




[PATCH v2 0/4] arm/aspeed: Watchdog and SDRAM fixes

2019-11-12 Thread Joel Stanley
Three of these are fixes for ast2600 models that I found when testing
master. The fourth is a usability improvement that is helpful when
diagnosing why a watchdog is biting.

v2 fixes some review comments from Cédric and adds his r-b.


Joel Stanley (4):
  aspeed/sdmc: Make ast2600 default 1G
  aspeed/scu: Fix W1C behavior
  watchdog/aspeed: Improve watchdog timeout message
  watchdog/aspeed: Fix AST2600 frequency behaviour

 hw/misc/aspeed_scu.c | 11 ---
 hw/misc/aspeed_sdmc.c|  6 +++---
 hw/watchdog/wdt_aspeed.c | 24 +++-
 include/hw/watchdog/wdt_aspeed.h |  1 +
 4 files changed, 31 insertions(+), 11 deletions(-)

-- 
2.24.0




Re: [PATCH 4/4] watchdog/aspeed: Fix AST2600 frequency behaviour

2019-11-12 Thread Joel Stanley
On Tue, 12 Nov 2019 at 07:56, Cédric Le Goater  wrote:
>
> On 12/11/2019 07:40, Joel Stanley wrote:
> > The AST2600 control register sneakily changed the meaning of bit 4
> > without anyone noticing. It no longer controls the 1MHz vs APB clock
> > select, and instead always runs at 1MHz.
> >
> > The AST2500 was always 1MHz too, but it retained bit 4, making it read
> > only. We can model both using the same fixed 1MHz calculation.
> >
> > Fixes: ea29711f467f ("watchdog/aspeed: Fix AST2600 control reg behaviour")
>
> which commit is that ^ ? Did you mean :
>
> Fixes: 6b2b2a703cad ("hw: wdt_aspeed: Add AST2600 support")

Yes. Thanks for catching that.



Re: [PATCH 00/55] Patch Round-up for stable 4.1.1, freeze on 2019-11-12

2019-11-12 Thread Michael Roth
Quoting Michael Roth (2019-11-12 12:05:14)
> Quoting Michael Roth (2019-11-05 14:51:48)
> > Hi everyone,
> > 
> > The following new patches are queued for QEMU stable v4.1.1:
> > 
> >   https://github.com/mdroth/qemu/commits/stable-4.1-staging
> > 
> > The release is tentatively planned for 2019-11-14:
> > 
> >   https://wiki.qemu.org/Planning/4.1
> > 
> > Please note that the original release date was planned for 2019-11-21,
> > but was moved up to address a number of qcow2 corruption issues:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07144.html
> > 
> > Fixes for the XFS issues noted in the thread are still pending, but will
> > hopefully be qemu.git master in time for 4.1.1 freeze and the
> > currently-scheduled release date for 4.2.0-rc1.
> > 
> > The list of still-pending patchsets being tracked for inclusion are:
> > 
> >   qcow2: Fix data corruption on XFS
> > https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg00073.html
> > (PULL pending)
> >   qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK
> > https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07718.html
> >   qcow2-bitmap: Fix uint64_t left-shift overflow
> > https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07989.html
> > 
> > Please respond here or CC qemu-sta...@nongnu.org on any additional patches
> > you think should be included in the release.
> 
> The following additional patches have been pushed to the staging tree:
> 
>   tests: make filemonitor test more robust to event ordering
>   block: posix: Always allocate the first block
>   file-posix: Handle undetectable alignment
>   block/file-posix: Let post-EOF fallocate serialize
>   block: Add bdrv_co_get_self_request()
>   block: Make wait/mark serialising requests public
>   block/io: refactor padding
>   util/iov: improve qemu_iovec_is_zero
>   util/iov: introduce qemu_iovec_init_extended
>   qcow2-bitmap: Fix uint64_t left-shift overflow
>   iotests: Add peek_file* functions
>   iotests: Add test for 4G+ compressed qcow2 write
>   qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK

The following additional patches have been pushed to the staging tree:

  mirror: Keep mirror_top_bs drained after dropping permissions
  block/create: Do not abort if a block driver is not available
  vhost: Fix memory region section comparison
  memory: Provide an equality function for MemoryRegionSections
  memory: Align MemoryRegionSections fields

> 
> Thank you for the suggestions.
> 
> > 
> > Thanks!
> > 
> > 
> > Adrian Moreno (1):
> >   vhost-user: save features if the char dev is closed
> > 
> > Alberto Garcia (1):
> >   qcow2: Fix the calculation of the maximum L2 cache size
> > 
> > Anthony PERARD (1):
> >   xen-bus: Fix backend state transition on device reset
> > 
> > Aurelien Jarno (1):
> >   target/alpha: fix tlb_fill trap_arg2 value for instruction fetch
> > 
> > Christophe Lyon (1):
> >   target/arm: Allow reading flags from FPSCR for M-profile
> > 
> > David Hildenbrand (1):
> >   s390x/tcg: Fix VERIM with 32/64 bit elements
> > 
> > Eduardo Habkost (1):
> >   pc: Don't make die-id mandatory unless necessary
> > 
> > Fan Yang (1):
> >   COLO-compare: Fix incorrect `if` logic
> > 
> > Hikaru Nishida (1):
> >   ui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)
> > 
> > Igor Mammedov (1):
> >   x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp 
> > dies' is not set
> > 
> > Johannes Berg (1):
> >   libvhost-user: fix SLAVE_SEND_FD handling
> > 
> > John Snow (2):
> >   Revert "ide/ahci: Check for -ECANCELED in aio callbacks"
> >   iotests: add testing shim for script-style python tests
> > 
> > Kevin Wolf (4):
> >   coroutine: Add qemu_co_mutex_assert_locked()
> >   qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
> >   block/snapshot: Restrict set of snapshot nodes
> >   iotests: Test internal snapshots with -blockdev
> > 
> > Markus Armbruster (1):
> >   pr-manager: Fix invalid g_free() crash bug
> > 
> > Matthew Rosato (1):
> >   s390: PCI: fix IOMMU region init
> > 
> > Max Filippov (1):
> >   target/xtensa: regenerate and re-import test_mmuhifi_c3 core
> > 
> > Max Reitz (16):
> >   block/file-posix: Reduce xfsctl() use
> >   iotests: Test reverse sub-cluster qcow2 writes
> >   vpc: Return 0 from vpc_co_create() on success
> >   iotests: Add supported protocols to execute_test()
> >   iotests: Restrict file Python tests to file
> >   iotests: Restrict nbd Python tests to nbd
> >   iotests: Test blockdev-create for vpc
> >   curl: Keep pointer to the CURLState in CURLSocket
> >   curl: Keep *socket until the end of curl_sock_cb()
> >   curl: Check completion in curl_multi_do()
> >   curl: Pass CURLSocket to curl_multi_do()
> >   curl: Report only ready sockets
> >   curl: Handle success in multi_check_completion

Re: [virtio-dev] Re: guest / host buffer sharing ...

2019-11-12 Thread Gurchetan Singh
On Tue, Nov 12, 2019 at 5:56 AM Liam Girdwood
 wrote:
>
> On Mon, 2019-11-11 at 16:54 -0800, Gurchetan Singh wrote:
> > On Tue, Nov 5, 2019 at 2:55 AM Gerd Hoffmann 
> > wrote:
> > > Each buffer also has some properties to carry metadata, some fixed
> > > (id, size, application), but
> > > also allow free form (name = value, framebuffers would have
> > > width/height/stride/format for example).
> >
> > Sounds a lot like the recently added DMA_BUF_SET_NAME ioctls:
> >
> > https://patchwork.freedesktop.org/patch/310349/
> >
> > For virtio-wayland + virtio-vdec, the problem is sharing -- not
> > allocation.
> >
>
> Audio also needs to share buffers with firmware running on DSPs.
>
> > As the buffer reaches a kernel boundary, its properties devolve into
> > [fd, size].  Userspace can typically handle sharing metadata.  The
> > issue is the guest dma-buf fd doesn't mean anything on the host.
> >
> > One scenario could be:
> >
> > 1) Guest userspace (say, gralloc) allocates using virtio-gpu.  When
> > allocating, we call uuidgen() and then pass that via RESOURCE_CREATE
> > hypercall to the host.
> > 2) When exporting the dma-buf, we call DMA_BUF_SET_NAME (the buffer
> > name will be "virtgpu-buffer-${UUID}").
> > 3) When importing, virtio-{vdec, video} reads the dma-buf name in
> > userspace, and calls fd to handle.  The name is sent to the host via
> > a
> > hypercall, giving host virtio-{vdec, video} enough information to
> > identify the buffer.
> >
> > This solution is entirely userspace -- we can probably come up with
> > something in kernel space [generate_random_uuid()] if need be.  We
> > only need two universal IDs: {device ID, buffer ID}.
> >
>
> I need something where I can take a guest buffer and then convert it to
> physical scatter gather page list. I can then either pass the SG page
> list to the DSP firmware (for DMAC IP programming) or have the host
> driver program the DMAC directly using the page list (who programs DMAC
> depends on DSP architecture).

So you need the HW address space from a guest allocation?  Would your
allocation hypercalls use something like the virtio_gpu_mem_entry
(virtio_gpu.h) and the draft virtio_video_mem_entry?

struct virtio_gpu_mem_entry {
__le64 addr;
__le32 length;
__le32 padding;
};

/* VIRTIO_GPU_CMD_RESOURCE_ATTACH_BACKING */
struct virtio_gpu_resource_attach_backing {
struct virtio_gpu_ctrl_hdr hdr;
__le32 resource_id;
__le32 nr_entries;
/* followed by nr_entries struct virtio_gpu_mem_entry items */
};

struct virtio_video_mem_entry {
__le64 addr;
__le32 length;
__u8 padding[4];
};

struct virtio_video_resource_attach_backing {
struct virtio_video_ctrl_hdr hdr;
__le32 resource_id;
__le32 nr_entries;
};
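
As a rough illustration of what the host side would do with such a list,
here is a sketch that sums the backing size of a set of entries. The struct
is a host-endian stand-in with the same field layout, and the helper name
is made up; real code would convert from little-endian first.

```c
#include <stddef.h>
#include <stdint.h>

/* Host-endian stand-in for the __le64/__le32 entry layout above. */
struct mem_entry {
    uint64_t addr;
    uint32_t length;
    uint32_t padding;
};

/* Hypothetical helper: total bytes described by a scatter-gather list. */
static uint64_t sg_total_bytes(const struct mem_entry *ents,
                               size_t nr_entries)
{
    uint64_t total = 0;

    for (size_t i = 0; i < nr_entries; i++) {
        total += ents[i].length;
    }
    return total;
}
```

Each entry is 16 bytes, so a list of nr_entries items can follow the fixed
command header as a flat array.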

>
> DSP FW has no access to userspace so we would need some additional API
> on top of DMA_BUF_SET_NAME etc to get physical hardware pages ?

The dma-buf api currently can share guest memory sg-lists.

>
> Liam
>
>
>
> -
> To unsubscribe, e-mail: virtio-dev-unsubscr...@lists.oasis-open.org
> For additional commands, e-mail: virtio-dev-h...@lists.oasis-open.org
>



RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

2019-11-12 Thread Taylor Simpson
I had discussions with several people at the KVM Forum, and I’ve been thinking 
about how to divide up the code for community review.  Here is my proposal for 
the steps.

  1.  linux-user changes + linux-user/hexagon + skeleton of target/hexagon
      This is the minimum amount needed to build and run a very simple
      program.  I have an assembly program that prints “Hello” and exits.  It
      is constructed to use very few instructions, which can be added by
      brute force in the Hexagon back end.
  2.  Add the code that is imported from the Hexagon simulator and the qemu
      helper generator
      This will allow the scalar ISA to be executed.  This will grow the set
      of programs that can execute, but there will still be limitations.  In
      particular, there can be no packets, which means the C library won’t
      work, so we have to build with -nostdlib.
  3.  Add support for packet semantics
      At this point, we will be able to execute full programs linked with the
      C library.  This will include the check-tcg tests.
  4.  Add support for the wide vector extensions
  5.  Add the helper overrides for performance optimization
      Some of these will be written by hand, and we’ll work with rev.ng to
      integrate their flex/bison generator.

I would love some feedback on this proposal.  Hopefully, that is enough detail 
so that people can comment.  If anything isn’t clear, please ask questions.


Thanks,
Taylor


From: Qemu-devel  On Behalf 
Of Taylor Simpson
Sent: Tuesday, November 5, 2019 10:33 AM
To: Aleksandar Markovic 
Cc: Alessandro Di Federico ; ni...@rev.ng; qemu-devel@nongnu.org; 
Niccolò Izzo 
Subject: RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

Hi Aleksandar,

Thank you – We’re glad you enjoyed the talk.

One point of clarification on SIMD in Hexagon.  What we refer to as the 
“scalar” core does have some SIMD operations.  Register pairs are 8 bytes, and 
there are several SIMD instructions.  The example we showed in the talk 
included a VADDH instruction.  It treats the register pair as 4 half-words and 
does a vector add.  Then there are the Hexagon Vector eXtensions (HVX) 
instructions that operate on 128-byte vectors.  There is a wide variety of 
instructions in this set.  As you mentioned, some of them are pure SIMD and 
others are very complex.

For the helper generator, the vast majority of these are implemented with 
helpers.  There are only 2 vector instructions in the scalar core that have a 
TCG override, and all of the HVX instructions are implemented with helpers.  If 
you are interested in a deeper dive, see below.

Alessandro and Niccolo can comment on the flex/bison implementation.

Thanks,
Taylor


Now for the deeper dive in case anyone is interested.  Look at the genptr.c 
file in target/hexagon.

The first vector instruction with an override is A6_vminub_RdP.  It 
does a byte-wise comparison of two register pairs and sets a predicate register 
indicating whether the byte in the left or right operand is greater.  Here is 
the TCG code.
#define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
{ \
TCGv BYTE = tcg_temp_new(); \
TCGv left = tcg_temp_new(); \
TCGv right = tcg_temp_new(); \
TCGv tmp = tcg_temp_new(); \
int i; \
tcg_gen_movi_tl(PeV, 0); \
tcg_gen_movi_i64(RddV, 0); \
for (i = 0; i < 8; i++) { \
fGETUBYTE(i, RttV); \
tcg_gen_mov_tl(left, BYTE); \
fGETUBYTE(i, RssV); \
tcg_gen_mov_tl(right, BYTE); \
tcg_gen_setcond_tl(TCG_COND_GT, tmp, left, right); \
fSETBIT(i, PeV, tmp); \
fMIN(tmp, left, right); \
fSETBYTE(i, RddV, tmp); \
} \
tcg_temp_free(BYTE); \
tcg_temp_free(left); \
tcg_temp_free(right); \
tcg_temp_free(tmp); \
}
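
For readers who don't have the macros in their head, here is a plain C
model of what I understand the override to compute. It is not taken from
the Hexagon sources: the function name is made up, and it assumes that
fGETUBYTE(i, x) extracts unsigned byte i counting from the least
significant end and that fMIN is an unsigned minimum on those bytes.

```c
#include <stdint.h>

/*
 * Hypothetical reference model of A6_vminub_RdP: per byte lane, record
 * whether the Rtt byte is greater than the Rss byte and keep the smaller
 * of the two.
 */
static uint64_t vminub_model(uint64_t rtt, uint64_t rss, uint8_t *pred)
{
    uint64_t rdd = 0;
    uint8_t p = 0;

    for (int i = 0; i < 8; i++) {
        uint8_t left = (uint8_t)(rtt >> (8 * i));
        uint8_t right = (uint8_t)(rss >> (8 * i));

        if (left > right) {
            p |= (uint8_t)(1u << i);        /* predicate bit i */
        }
        rdd |= (uint64_t)(left < right ? left : right) << (8 * i);
    }
    *pred = p;
    return rdd;
}
```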

The second instruction is S2_vsplatrb.  It takes the byte from the operand and 
replicates it 4 times into the destination register.  Here is the TCG code.
#define fWRAP_S2_vsplatrb(GENHLPR, SHORTCODE) \
{ \
TCGv tmp = tcg_temp_new(); \
int i; \
tcg_gen_movi_tl(RdV, 0); \
tcg_gen_andi_tl(tmp, RsV, 0xff); \
for (i = 0; i < 4; i++) { \
tcg_gen_shli_tl(RdV, RdV, 8); \
tcg_gen_or_tl(RdV, RdV, tmp); \
} \
tcg_temp_free(tmp); \
}
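
The same operation in plain C, as a sketch for comparison with the TCG
sequence. The function name is made up.

```c
#include <stdint.h>

/* Hypothetical reference model of S2_vsplatrb: splat the low byte of Rs. */
static uint32_t vsplatrb_model(uint32_t rs)
{
    uint32_t byte = rs & 0xff;
    uint32_t rd = 0;

    for (int i = 0; i < 4; i++) {
        rd = (rd << 8) | byte;      /* shift in one more copy */
    }
    return rd;
}
```

The loop mirrors the shift and OR pair in the macro, so each iteration
corresponds to one tcg_gen_shli_tl/tcg_gen_or_tl pair.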


From: Aleksandar Markovic <aleksandar.m.m...@gmail.com>
Sent: Monday, November 4, 2019 6:05 PM
To: Taylor Simpson <tsimp...@quicinc.com>
Cc: qemu-devel@nongnu.org; Alessandro Di Federico <a...@rev.ng>; ni...@rev.ng;
Niccolò Izzo <izzonicc...@gmail.com>
Subject: Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available



On Friday, October 25, 2019, Taylor Simpson <tsimp...@quicinc.com> wrote:
We would like to inform you that we will be doing a talk at the KVM Forum next 
week on QEMU for Qualcomm Hexagon.  Alessandro Di Federico, Niccolo Izzo, and I 
have been working independently on implementations of the Hexagon target.  We 
plan to 

Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup

2019-11-12 Thread no-reply
Patchew URL: https://patchew.org/QEMU/20191112113012.71136-1-...@redhat.com/



Hi,

This series failed the docker-quick@centos7 build test. Please find the testing 
commands and
their output below. If you have Docker installed, you can probably reproduce it
locally.

=== TEST SCRIPT BEGIN ===
#!/bin/bash
make docker-image-centos7 V=1 NETWORK=1
time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
=== TEST SCRIPT END ===

  TEST    iotest-qcow2: 268
Failures: 141
Failed 1 of 108 iotests
make: *** [check-tests/check-block.sh] Error 1
Traceback (most recent call last):
  File "./tests/docker/docker.py", line 662, in 
sys.exit(main())
---
raise CalledProcessError(retcode, cmd)
subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
'--label', 'com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417', '-u', 
'1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
'-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', '-e', 
'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
'/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
'/var/tmp/patchew-tester-tmp-6a9_8q0n/src/docker-src.2019-11-12-17.38.46.26027:/var/tmp/qemu:z,ro',
 'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
status 2.
filter=--filter=label=com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417
make[1]: *** [docker-run] Error 1
make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-6a9_8q0n/src'
make: *** [docker-run-test-quick@centos7] Error 2

real    10m57.839s
user    0m8.062s


The full log is available at
http://patchew.org/logs/20191112113012.71136-1-...@redhat.com/testing.docker-quick@centos7/?type=message.
---
Email generated automatically by Patchew [https://patchew.org/].
Please send your feedback to patchew-de...@redhat.com

Re: [PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 22:33:38 +0530
Kirti Wankhede  wrote:

> With vIOMMU, during pre-copy phase of migration, while CPUs are still
> running, IO virtual address unmap can happen while device still keeping
> reference of guest pfns. Those pages should be reported as dirty before
> unmap, so that VFIO user space application can copy content of those pages
> from source to destination.
> 
> IOCTL defination added here add bitmap pointer, size and flag. If flag

definition, adds

> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set and bitmap memory is allocated
> and bitmap_size of set, then ioctl will create bitmap of pinned pages and

s/of/is/

> then unmap those.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  include/uapi/linux/vfio.h | 33 +
>  1 file changed, 33 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 6fd3822aa610..72fd297baf52 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -925,6 +925,39 @@ struct vfio_iommu_type1_dirty_bitmap {
>  
>  #define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
>  
> +/**
> + * VFIO_IOMMU_UNMAP_DMA_GET_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
> + * struct vfio_iommu_type1_dma_unmap_bitmap)
> + *
> + * Unmap IO virtual addresses using the provided struct
> + * vfio_iommu_type1_dma_unmap_bitmap.  Caller sets argsz.
> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
> + * before unmapping IO virtual addresses. If this flag is not set, only IO
> + * virtual address are unmapped without creating pinned pages bitmap, that
> + * is, behave same as VFIO_IOMMU_UNMAP_DMA ioctl.
> + * User should allocate memory to get bitmap and should set size of allocated
> + * memory in bitmap_size field. One bit in bitmap is used to represent per page
> + * consecutively starting from iova offset. Bit set indicates page at that
> + * offset from iova is dirty.
> + * The actual unmapped size is returned in the size field and bitmap of pages
> + * in the range of unmapped size is returned in bitmap if flag
> + * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set.
> + *
> + * No guarantee is made to the user that arbitrary unmaps of iova or size
> + * different from those used in the original mapping call will succeed.
> + */
> +struct vfio_iommu_type1_dma_unmap_bitmap {
> + __u32 argsz;
> + __u32 flags;
> +#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
> + __u64 iova;          /* IO virtual address */
> + __u64 size;          /* Size of mapping (bytes) */
> + __u64 bitmap_size;   /* in bytes */
> + void __user *bitmap; /* one bit per page */
> +};
> +
> +#define VFIO_IOMMU_UNMAP_DMA_GET_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 18)
> +

Why not extend VFIO_IOMMU_UNMAP_DMA to support this rather than add an
ioctl that duplicates the functionality and extends it??  Otherwise
same comments as previous, in fact it's too bad we can't use this ioctl
for both, but a DONT_UNMAP flag on the UNMAP_DMA ioctl seems a bit
absurd.

I suspect we also want a flags bit in VFIO_IOMMU_GET_INFO to indicate
these capabilities are supported.

Maybe for both ioctls we also want to define it as the user's
responsibility to zero the bitmap, requiring the kernel to only set
bits as necessary.  Thanks,

Alex

>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*




Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 22:33:36 +0530
Kirti Wankhede  wrote:

> - Defined MIGRATION region type and sub-type.
> - Used 3 bits to define VFIO device states.
> Bit 0 => _RUNNING
> Bit 1 => _SAVING
> Bit 2 => _RESUMING
> Combination of these bits defines VFIO device's state during migration
> _RUNNING => Normal VFIO device running state. When its reset, it
>   indicates _STOPPED state. when device is changed to
>   _STOPPED, driver should stop device before write()
>   returns.
> _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
>   start saving state of device i.e. pre-copy state
> _SAVING  => vCPUs are stopped, VFIO device should be stopped, and

s/should/must/

> save device state,i.e. stop-n-copy state
> _RESUMING => VFIO device resuming state.
> _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states

A table might be useful here and in the uapi header to indicate valid
states:

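| _RESUMING | _SAVING | _RUNNING | Description
+-----------+---------+----------+------------------------------------------
|     0     |    0    |     0    | Stopped, not saving or resuming (a)
+-----------+---------+----------+------------------------------------------
|     0     |    0    |     1    | Running, default state
+-----------+---------+----------+------------------------------------------
|     0     |    1    |     0    | Stopped, migration interface in save mode
+-----------+---------+----------+------------------------------------------
|     0     |    1    |     1    | Running, save mode interface, iterative
+-----------+---------+----------+------------------------------------------
|     1     |    0    |     0    | Stopped, migration resume interface active
+-----------+---------+----------+------------------------------------------
|     1     |    0    |     1    | Invalid (b)
+-----------+---------+----------+------------------------------------------
|     1     |    1    |     0    | Invalid (c)
+-----------+---------+----------+------------------------------------------
|     1     |    1    |     1    | Invalid (d)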

I think we need to consider whether we define (a) as generally
available, for instance we might want to use it for diagnostics or a
fatal error condition outside of migration.

Are there hidden assumptions between state transitions here or are
there specific next possible state diagrams that we need to include as
well?

I'm curious if Intel agrees with the states marked invalid with their
push for post-copy support.
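
The validity table above, together with the read-modify-write requirement in the quoted description, can be sketched in C as follows. This is only an illustration under stated assumptions: the `DEV_STATE_*` names and both helper functions are hypothetical, and only the bit positions (0, 1, 2) come from the patch description.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; only the bit positions are taken from the patch. */
#define DEV_STATE_RUNNING   (1u << 0)
#define DEV_STATE_SAVING    (1u << 1)
#define DEV_STATE_RESUMING  (1u << 2)
#define DEV_STATE_MASK      (DEV_STATE_RUNNING | DEV_STATE_SAVING | \
                             DEV_STATE_RESUMING)

/*
 * Valid per the table: every combination without _RESUMING, plus
 * _RESUMING on its own. _RESUMING combined with any other bit is invalid.
 */
static inline bool dev_state_valid(uint32_t device_state)
{
    uint32_t s = device_state & DEV_STATE_MASK;

    return !(s & DEV_STATE_RESUMING) || s == DEV_STATE_RESUMING;
}

/* Read-modify-write: replace the three state bits, keep reserved bits 3-31. */
static inline uint32_t dev_state_update(uint32_t old, uint32_t new_state)
{
    return (old & ~DEV_STATE_MASK) | (new_state & DEV_STATE_MASK);
}
```

Whether state (a), all bits clear, should be generally usable is exactly the open question raised above; the sketch simply treats it as valid.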

> Bits 3 - 31 are reserved for future use. User should perform
> read-modify-write operation on this field.
> - Defined vfio_device_migration_info structure which will be placed at 0th
>   offset of migration region to get/set VFIO device related information.
>   Defined members of structure and usage on read/write access:
> * device_state: (read/write)
> To convey VFIO device state to be transitioned to. Only 3 bits are
>   used as of now, Bits 3 - 31 are reserved for future use.
> * pending bytes: (read only)
> To get pending bytes yet to be migrated for VFIO device.
> * data_offset: (read only)
> To get data offset in migration region from where data exist
>   during _SAVING and from where data should be written by user space
>   application during _RESUMING state.
> * data_size: (read/write)
> To get and set size in bytes of data copied in migration region
>   during _SAVING and _RESUMING state.
> 
> Migration region looks like:
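>  -----------------------------------------------------------------
> |vfio_device_migration_info|    data section                      |
> |                          |     ///////////////////////////////  |
>  -----------------------------------------------------------------
>  ^                          ^
>  offset 0-trapped part      data_offset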
> 
> Structure vfio_device_migration_info is always followed by data section
> in the region, so data_offset will always be non-0. Offset from where data
> to be copied is decided by kernel driver, data section can be trapped or
> mapped depending on how kernel driver defines data section.
> Data section partition can be defined as mapped by sparse mmap capability.
> If mmapped, then data_offset should be page aligned, where as initial
> section which contain vfio_device_migration_info structure might not end
> at offset which is page aligned.
> Vendor driver should decide whether to partition data section and how to
> partition the data section. Vendor driver should return data_offset
> accordingly.
> 
> For user application, data is opaque. User should write data in the same
> order as received.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  include/uapi/linux/vfio.h | 108 
> ++
>  1 file changed, 108 insertions(+)
> 
> diff 

Re: [PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 22:33:40 +0530
Kirti Wankhede  wrote:

> If pages are pinned by external interface for requested IO virtual address
> range, bitmap of such pages is created and then that range is unmapped.
> To get bitmap during unmap, user should set flag
> VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
> bitmap_size should be set. If flag is not set, then it behaves same as
> VFIO_IOMMU_UNMAP_DMA ioctl.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 71 
> +++--
>  1 file changed, 69 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index ac176e672857..d6b988452ba6 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -926,7 +926,8 @@ static int vfio_iova_get_dirty_bitmap(struct vfio_iommu *iommu,
>  }
>  
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
> -  struct vfio_iommu_type1_dma_unmap *unmap)
> +  struct vfio_iommu_type1_dma_unmap *unmap,
> +  unsigned long *bitmap)
>  {
>   uint64_t mask;
>   struct vfio_dma *dma, *dma_last = NULL;
> @@ -1026,6 +1027,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   _unmap);
>   goto again;
>   }
> +
> + if (bitmap) {
> + vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
> +unmap->iova, bitmap);
> + }
> +
>   unmapped += dma->size;
>   vfio_remove_dma(iommu, dma);
>   }
> @@ -1039,6 +1046,43 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>   return ret;
>  }
>  
> +static int vfio_dma_do_unmap_bitmap(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_dma_unmap_bitmap *unmap_bitmap)
> +{
> + struct vfio_iommu_type1_dma_unmap unmap;
> + unsigned long *bitmap = NULL;
> + int ret;
> +
> + /* check bitmap size */
> + if ((unmap_bitmap->flags | VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)) {

It's required to enforce other flag bits are zero or else we can never
guarantee we can use them in the future without breaking existing
userspace, but I'd really rather extend the existing ioctl.

Should we provide any optimization to indicate to the user that dirty
bits were set?  Thanks,

Alex
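
The flag-checking requirement described above can be sketched as follows. This is a minimal userspace illustration, not the kernel code: `check_unmap_flags` and the `UNMAP_*` names are hypothetical, and only the bit value (1 << 0) is taken from the patch. Note also that the quoted test ORs the flag into `flags`, which is always non-zero; a bitwise AND is presumably what was intended.

```c
#include <errno.h>
#include <stdint.h>

#define UNMAP_FLAG_GET_DIRTY_BITMAP (1u << 0)   /* same bit as in the patch */
#define UNMAP_SUPPORTED_FLAGS       UNMAP_FLAG_GET_DIRTY_BITMAP

/*
 * Returns 0 if the flags are acceptable, -EINVAL if any unknown bit is set.
 * Rejecting unknown bits now is what keeps them free for future use.
 * *want_bitmap is set to 1 when the caller asked for the dirty bitmap.
 */
static inline int check_unmap_flags(uint32_t flags, int *want_bitmap)
{
    if (flags & ~UNMAP_SUPPORTED_FLAGS)
        return -EINVAL;

    *want_bitmap = (flags & UNMAP_FLAG_GET_DIRTY_BITMAP) != 0;
    return 0;
}
```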

> + ret = verify_bitmap_size(unmap_bitmap->size >> PAGE_SHIFT,
> +  unmap_bitmap->bitmap_size);
> + if (ret)
> + return ret;
> +
> + /* one bit per page */
> + bitmap = bitmap_zalloc(unmap_bitmap->size >> PAGE_SHIFT,
> + GFP_KERNEL);
> + if (!bitmap)
> + return -ENOMEM;
> + }
> +
> + unmap.iova = unmap_bitmap->iova;
> + unmap.size = unmap_bitmap->size;
> + ret = vfio_dma_do_unmap(iommu, &unmap, bitmap);
> + if (!ret)
> + unmap_bitmap->size = unmap.size;
> +
> + if (bitmap) {
> + if (!ret && copy_to_user(unmap_bitmap->bitmap, bitmap,
> +  unmap_bitmap->bitmap_size))
> + ret = -EFAULT;
> + bitmap_free(bitmap);
> + }
> +
> + return ret;
> +}
> +
>  static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
> unsigned long pfn, long npage, int prot)
>  {
> @@ -2366,7 +2410,7 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   if (unmap.argsz < minsz || unmap.flags)
>   return -EINVAL;
>  
> - ret = vfio_dma_do_unmap(iommu, &unmap);
> + ret = vfio_dma_do_unmap(iommu, &unmap, NULL);
>   if (ret)
>   return ret;
>  
> @@ -2389,6 +2433,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>   return -EINVAL;
>  
>   return vfio_iova_get_dirty_bitmap(iommu, &range);
> + } else if (cmd == VFIO_IOMMU_UNMAP_DMA_GET_BITMAP) {
> + struct vfio_iommu_type1_dma_unmap_bitmap unmap_bitmap;
> + long ret;
> +
> + /* Supported for v2 version only */
> + if (!iommu->v2)
> + return -EACCES;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_dma_unmap_bitmap,
> + bitmap);
> +
> + if (copy_from_user(&unmap_bitmap, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (unmap_bitmap.argsz < minsz)
> + return -EINVAL;
> +
> + ret = vfio_dma_do_unmap_bitmap(iommu, &unmap_bitmap);
> + if (ret)
> + return ret;
> +
> + return copy_to_user((void __user *)arg, &unmap_bitmap, minsz) ?
> +   

Re: [PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 22:33:37 +0530
Kirti Wankhede  wrote:

> All pages pinned by vendor driver through vfio_pin_pages API should be
> considered as dirty during migration. IOMMU container maintains a list of
> all such pinned pages. Added an ioctl defination to get bitmap of such

definition

> pinned pages for requested IO virtual address range.

Additionally, all mapped pages are considered dirty when physically
mapped through to an IOMMU, modulo we discussed devices opting in to
per page pinning to indicate finer granularity with a TBD mechanism to
figure out if any non-opt-in devices remain.

> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  include/uapi/linux/vfio.h | 23 +++
>  1 file changed, 23 insertions(+)
> 
> diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
> index 35b09427ad9f..6fd3822aa610 100644
> --- a/include/uapi/linux/vfio.h
> +++ b/include/uapi/linux/vfio.h
> @@ -902,6 +902,29 @@ struct vfio_iommu_type1_dma_unmap {
>  #define VFIO_IOMMU_ENABLE    _IO(VFIO_TYPE, VFIO_BASE + 15)
>  #define VFIO_IOMMU_DISABLE   _IO(VFIO_TYPE, VFIO_BASE + 16)
>  
> +/**
> + * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
> + * struct vfio_iommu_type1_dirty_bitmap)
> + *
> + * IOCTL to get dirty pages bitmap for IOMMU container during migration.
> + * Get dirty pages bitmap of given IO virtual addresses range using
> + * struct vfio_iommu_type1_dirty_bitmap. Caller sets argsz, which is size of
> + * struct vfio_iommu_type1_dirty_bitmap. User should allocate memory to get
> + * bitmap and should set size of allocated memory in bitmap_size field.
> + * One bit is used to represent per page consecutively starting from iova
> + * offset. Bit set indicates page at that offset from iova is dirty.
> + */
> +struct vfio_iommu_type1_dirty_bitmap {
> + __u32 argsz;
> + __u32 flags;
> + __u64 iova;          /* IO virtual address */
> + __u64 size;          /* Size of iova range */
> + __u64 bitmap_size;   /* in bytes */

This seems redundant.  We can calculate the size of the bitmap based on
the iova size.

> + void __user *bitmap; /* one bit per page */

Should we define that as a __u64* to (a) help with the size
calculation, and (b) assure that we can use 8-byte ops on it?

However, who defines page size?  Is it necessarily the processor page
size?  A physical IOMMU may support page sizes other than the CPU page
size.  It might be more important to indicate the expected page size
than the bitmap size.  Thanks,

Alex
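
To illustrate the two points above (the bitmap size follows from the iova range, and the page size has to be defined somewhere), here is a small sketch. `dirty_bitmap_bytes` is a hypothetical helper, not part of the patch; it assumes the bitmap is made of 64-bit words and takes the page size as an explicit shift because the interface does not yet say which page size applies.

```c
#include <stdint.h>

/*
 * Bytes needed for a one-bit-per-page bitmap covering `size` bytes of
 * iova space, rounded up to whole 64-bit words. A partial trailing page
 * is counted as a page (the quoted patch rounds down with
 * size >> PAGE_SHIFT instead).
 */
static inline uint64_t dirty_bitmap_bytes(uint64_t size, unsigned int page_shift)
{
    uint64_t page_size = 1ull << page_shift;
    uint64_t npages = size / page_size + (size % page_size != 0);
    uint64_t nwords = npages / 64 + (npages % 64 != 0);

    return nwords * 8;
}
```

For example, a 1 GiB range with 4 KiB pages needs 262144 bits, that is 32768 bytes.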

> +};
> +
> +#define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
> +
>  /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
>  
>  /*




Re: [PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap.

2019-11-12 Thread Alex Williamson
On Tue, 12 Nov 2019 22:33:39 +0530
Kirti Wankhede  wrote:

> IOMMU container maintains list of external pinned pages. Bitmap of pinned
> pages for input IO virtual address range is created and returned.
> IO virtual address range should be from a single mapping created by
> map request. Input bitmap_size is validated by calculating the size of
> requested range.
> This ioctl returns bitmap of dirty pages, its user space application
> responsibility to copy content of dirty pages from source to destination
> during migration.
> 
> Signed-off-by: Kirti Wankhede 
> Reviewed-by: Neo Jia 
> ---
>  drivers/vfio/vfio_iommu_type1.c | 92 
> +
>  1 file changed, 92 insertions(+)
> 
> diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
> index 2ada8e6cdb88..ac176e672857 100644
> --- a/drivers/vfio/vfio_iommu_type1.c
> +++ b/drivers/vfio/vfio_iommu_type1.c
> @@ -850,6 +850,81 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu *iommu)
>   return bitmap;
>  }
>  
> +/*
> + * start_iova is the reference from where bitmaping started. This is called
> + * from DMA_UNMAP where start_iova can be different than iova

Why not simply call this with a pointer to the bitmap relative to the
start of the iova?

> + */
> +
> +static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
> +   size_t size, dma_addr_t start_iova,
> +   unsigned long *bitmap)
> +{
> + struct vfio_dma *dma;
> + dma_addr_t temp_iova = iova;
> +
> + dma = vfio_find_dma(iommu, iova, size);
> + if (!dma)

The UAPI did not define that the user can only ask for the dirty bitmap
across a mapped range.

> + return -EINVAL;
> +
> + /*
> +  * Range should be from a single mapping created by map request.
> +  */

The UAPI also did not specify this as a requirement.

> +
> + if ((iova < dma->iova) ||
> + ((dma->iova + dma->size) < (iova + size)))
> + return -EINVAL;

Nor this.

So the actual implemented UAPI is that the user must call this over
some portion of, but not exceeding a single previously mapped DMA
range.  Why so restrictive?

> +
> + while (temp_iova < iova + size) {
> + struct vfio_pfn *vpfn = NULL;
> +
> + vpfn = vfio_find_vpfn(dma, temp_iova);
> + if (vpfn)
> + __bitmap_set(bitmap, vpfn->iova - start_iova, 1);
> +
> + temp_iova += PAGE_SIZE;

Seems like walking the rb tree would be far more efficient.  Also, if
dma->iommu_mapped, mark all pages dirty until we figure out how to
avoid it.
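
The efficiency argument can be sketched in userspace C as below. This is not kernel code: a sorted array of pinned iovas stands in for an in-order walk of the rb tree, and `mark_pinned`, `set_bit64` and `PAGE_SHIFT_SKETCH` are hypothetical names with an assumed 4 KiB page size.

```c
#include <stddef.h>
#include <stdint.h>

#define PAGE_SHIFT_SKETCH 12   /* assumed 4 KiB pages, for illustration only */

/* Set bit `nr` in a bitmap made of 64-bit words. */
static inline void set_bit64(uint64_t *bitmap, uint64_t nr)
{
    bitmap[nr / 64] |= 1ull << (nr % 64);
}

/*
 * pinned[] holds page-aligned iovas in ascending order, standing in for an
 * in-order walk of the tree. Only pinned entries are visited, so the cost
 * scales with the number of pinned pages rather than with the size of the
 * requested range. Returns the number of bits set.
 */
static inline size_t mark_pinned(const uint64_t *pinned, size_t npinned,
                                 uint64_t iova, uint64_t size,
                                 uint64_t *bitmap)
{
    size_t nset = 0;

    for (size_t i = 0; i < npinned; i++) {
        if (pinned[i] < iova)
            continue;               /* before the requested range */
        if (pinned[i] - iova >= size)
            break;                  /* sorted: nothing further can match */
        set_bit64(bitmap, (pinned[i] - iova) >> PAGE_SHIFT_SKETCH);
        nset++;
    }
    return nset;
}
```

A real tree walk would additionally seek to the first entry at or after iova instead of skipping entries linearly.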

> + }
> +
> + return 0;
> +}
> +
> +static int verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
> +{
> + unsigned long bsize = ALIGN(npages, BITS_PER_LONG) / 8;
> +
> + if ((bitmap_size == 0) || (bitmap_size < bsize))
> + return -EINVAL;
> + return 0;
> +}
> +
> +static int vfio_iova_get_dirty_bitmap(struct vfio_iommu *iommu,
> + struct vfio_iommu_type1_dirty_bitmap *range)
> +{
> + unsigned long *bitmap;
> + int ret;
> +
> + ret = verify_bitmap_size(range->size >> PAGE_SHIFT, range->bitmap_size);
> + if (ret)
> + return ret;
> +
> + /* one bit per page */
> + bitmap = bitmap_zalloc(range->size >> PAGE_SHIFT, GFP_KERNEL);

This creates a DoS vector, we need to be able to directly use the user
bitmap or chunk words into it using a confined size (ex. a user can
call with args 0 to UINT64_MAX). Thanks,

Alex
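
One way to read the "chunk words into it using a confined size" suggestion is sketched below in userspace C. Everything here is an assumption made for illustration: `fill_bitmap_chunked`, `page_dirty_fn`, `every_third_page` and the chunk size are hypothetical, and `memcpy` stands in for `copy_to_user()`.

```c
#include <stdint.h>
#include <string.h>

#define CHUNK_WORDS 64u                  /* fixed 512-byte scratch buffer */
#define CHUNK_BITS  (CHUNK_WORDS * 64u)

/* Hypothetical predicate standing in for the pinned-page lookup. */
typedef int (*page_dirty_fn)(uint64_t page_index);

/* Example predicate used only to exercise the sketch. */
static inline int every_third_page(uint64_t page_index)
{
    return page_index % 3 == 0;
}

/*
 * Fill user_bitmap (npages bits, rounded up to 64-bit words) chunk by
 * chunk. Scratch memory is bounded by CHUNK_WORDS regardless of npages,
 * so a huge requested range cannot force a huge allocation.
 */
static inline void fill_bitmap_chunked(uint64_t npages, page_dirty_fn dirty,
                                       uint64_t *user_bitmap)
{
    uint64_t scratch[CHUNK_WORDS];

    for (uint64_t base = 0; base < npages; base += CHUNK_BITS) {
        uint64_t left = npages - base;
        uint64_t nbits = left < CHUNK_BITS ? left : CHUNK_BITS;
        uint64_t nwords = (nbits + 63) / 64;

        memset(scratch, 0, sizeof(scratch));
        for (uint64_t i = 0; i < nbits; i++) {
            if (dirty(base + i))
                scratch[i / 64] |= 1ull << (i % 64);
        }
        /* Stand-in for copy_to_user() of this chunk only. */
        memcpy(user_bitmap + base / 64, scratch, nwords * sizeof(uint64_t));
    }
}
```

The sketch only bounds memory use; bounding the time spent per call for very large ranges would need a separate limit.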

> + if (!bitmap)
> + return -ENOMEM;
> +
> + mutex_lock(&iommu->lock);
> + ret = vfio_iova_dirty_bitmap(iommu, range->iova, range->size,
> +  range->iova, bitmap);
> + mutex_unlock(&iommu->lock);
> +
> + if (!ret) {
> + if (copy_to_user(range->bitmap, bitmap, range->bitmap_size))
> + ret = -EFAULT;
> + }
> +
> + bitmap_free(bitmap);
> + return ret;
> +}
> +
>  static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
>struct vfio_iommu_type1_dma_unmap *unmap)
>  {
> @@ -2297,6 +2372,23 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
>  
>   return copy_to_user((void __user *)arg, , minsz) ?
>   -EFAULT : 0;
> + } else if (cmd == VFIO_IOMMU_GET_DIRTY_BITMAP) {
> + struct vfio_iommu_type1_dirty_bitmap range;
> +
> + /* Supported for v2 version only */
> + if (!iommu->v2)
> + return -EACCES;
> +
> + minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
> + bitmap);
> +
> + if (copy_from_user(&range, (void __user *)arg, minsz))
> + return -EFAULT;
> +
> + if (range.argsz < minsz)
> + return -EINVAL;
> +
> + return 

[ANNOUNCE] QEMU 4.2.0-rc1 is now available

2019-11-12 Thread Michael Roth
Hello,

On behalf of the QEMU Team, I'd like to announce the availability of the
second release candidate for the QEMU 4.2 release.  This release is meant
for testing purposes and should not be used in a production environment.

  http://download.qemu-project.org/qemu-4.2.0-rc1.tar.xz
  http://download.qemu-project.org/qemu-4.2.0-rc1.tar.xz.sig

You can help improve the quality of the QEMU 4.2 release by testing this
release and reporting bugs on Launchpad:

  https://bugs.launchpad.net/qemu/

The release plan, as well as documented known issues for release
candidates, are available at:

  http://wiki.qemu.org/Planning/4.2

Please add entries to the ChangeLog for the 4.2 release below:

  http://wiki.qemu.org/ChangeLog/4.2

Thank you to everyone involved!

Changes since rc0:

aa464db69b: Update version for v4.2.0-rc1 release (Peter Maydell)
0f1f2d4596: linux-user: remove host stime() syscall (Laurent Vivier)
c0cb880153: linux-user: fix missing break (Laurent Vivier)
c49a41b0b9: target/microblaze: Plug temp leak around eval_cond_jmp() (Edgar E. Iglesias)
f91c60f0ca: target/microblaze: Plug temp leaks with delay slot setup (Edgar E. Iglesias)
a633801526: target/microblaze: Plug temp leaks for loads/stores (Edgar E. Iglesias)
3fb356cc86: tcg plugins: expose an API version concept (Alex Bennée)
05273a43af: .travis.yml: don't run make check with multiple jobs (Alex Bennée)
5b4b4865f4: tests/vm: support sites with sha512 checksums (Alex Bennée)
860eacec58: tests: only run ipmi-bt-test if CONFIG_LINUX (Alex Bennée)
2548b4a7d3: tests/vm: update netbsd to version 8.1 (Gerd Hoffmann)
00963aca8b: tests/vm: use console_consume for netbsd (Gerd Hoffmann)
6c4f0416be: tests/vm: add console_consume helper (Gerd Hoffmann)
af093bc937: tests/vm: netbsd autoinstall, using serial console (Gerd Hoffmann)
5c62979ed5: ivshmem-server: Terminate also on SIGINT (Jan Kiszka)
0602a6166d: ivshmem-server: Clean up shmem on shutdown (Jan Kiszka)
88ed5db16c: numa: Add missing \n to error message (Greg Kurz)
d55e937d3e: qom: Fix error message in object_class_property_add() (Greg Kurz)
32eb2da326: Makefile: install bios-microvm like other binary blobs (Bruce Rogers)
cb974c95df: tcg/LICENSE: Remove out of date claim about TCG subdirectory licensing (Peter Maydell)
2552e30cba: tcg/ppc/tcg-target.opc.h: Add copyright/license (Peter Maydell)
2029bf7e52: tcg/i386/tcg-target.opc.h: Add copyright/license (Peter Maydell)
97105f2921: tcg/aarch64/tcg-target.opc.h: Add copyright/license (Peter Maydell)
45c078f163: hw/arm/boot: Set NSACR.{CP11, CP10} in dummy SMC setup routine (Clement Deschamps)
894d354fd8: Remove unassigned_access CPU hook (Peter Maydell)
af2a580f7e: ptimer: Remove old ptimer_init_with_bh() API (Peter Maydell)
623ef637a2: configure: Check bzip2 is available (Philippe Mathieu-Daudé)
05dfa22b5b: configure: Only decompress EDK2 blobs for X86/ARM targets (Philippe Mathieu-Daudé)
84b2c7e59a: tests/migration: Print some debug on bad status (Dr. David Alan Gilbert)
611aa4d00d: MAINTAINERS: slirp: Remove myself as maintainer (Jan Kiszka)
741309136e: cpu-plug-test: fix leaks (Marc-André Lureau)
36524a1a3d: qtest: fix qtest_qmp_device_add leak (Marc-André Lureau)
c744cf7879: dp8393x: fix dp8393x_receive() (Laurent Vivier)
af9f0be36c: dp8393x: put the DMA buffer in the state structure (Laurent Vivier)
1dfe2b91dc: usb-host: add option to allow all resets. (Gerd Hoffmann)




Re: [PATCH 00/55] Patch Round-up for stable 4.1.1, freeze on 2019-11-12

2019-11-12 Thread Bruce Rogers
On Tue, 2019-11-05 at 14:51 -0600, Michael Roth wrote:
> Hi everyone,
> 
> The following new patches are queued for QEMU stable v4.1.1:
> 
>   https://github.com/mdroth/qemu/commits/stable-4.1-staging
> 
> The release is tentatively planned for 2019-11-14:
> 
>   https://wiki.qemu.org/Planning/4.1
> 
> Please note that the original release date was planned for 2019-11-21,
> but was moved up to address a number of qcow2 corruption issues:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07144.html
> 
> Fixes for the XFS issues noted in the thread are still pending, but
> will
> hopefully be qemu.git master in time for 4.1.1 freeze and the
> currently-scheduled release date for 4.2.0-rc1.
> 
> The list of still-pending patchsets being tracked for inclusion are:
> 
>   qcow2: Fix data corruption on XFS
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg00073.html
> (PULL pending)
>   qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07718.html
>   qcow2-bitmap: Fix uint64_t left-shift overflow
> 
> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07989.html
> 
> Please respond here or CC qemu-sta...@nongnu.org on any additional
> patches
> you think should be included in the release.
> 
> Thanks!
> 
> 
> Adrian Moreno (1):
>   vhost-user: save features if the char dev is closed
> 
> Alberto Garcia (1):
>   qcow2: Fix the calculation of the maximum L2 cache size
> 
> Anthony PERARD (1):
>   xen-bus: Fix backend state transition on device reset
> 
> Aurelien Jarno (1):
>   target/alpha: fix tlb_fill trap_arg2 value for instruction
> fetch
> 
> Christophe Lyon (1):
>   target/arm: Allow reading flags from FPSCR for M-profile
> 
> David Hildenbrand (1):
>   s390x/tcg: Fix VERIM with 32/64 bit elements
> 
> Eduardo Habkost (1):
>   pc: Don't make die-id mandatory unless necessary
> 
> Fan Yang (1):
>   COLO-compare: Fix incorrect `if` logic
> 
> Hikaru Nishida (1):
>   ui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)
> 
> Igor Mammedov (1):
>   x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' is not set
> 
> Johannes Berg (1):
>   libvhost-user: fix SLAVE_SEND_FD handling
> 
> John Snow (2):
>   Revert "ide/ahci: Check for -ECANCELED in aio callbacks"
>   iotests: add testing shim for script-style python tests
> 
> Kevin Wolf (4):
>   coroutine: Add qemu_co_mutex_assert_locked()
>   qcow2: Fix corruption bug in
> qcow2_detect_metadata_preallocation()
>   block/snapshot: Restrict set of snapshot nodes
>   iotests: Test internal snapshots with -blockdev
> 
> Markus Armbruster (1):
>   pr-manager: Fix invalid g_free() crash bug
> 
> Matthew Rosato (1):
>   s390: PCI: fix IOMMU region init
> 
> Max Filippov (1):
>   target/xtensa: regenerate and re-import test_mmuhifi_c3 core
> 
> Max Reitz (16):
>   block/file-posix: Reduce xfsctl() use
>   iotests: Test reverse sub-cluster qcow2 writes
>   vpc: Return 0 from vpc_co_create() on success
>   iotests: Add supported protocols to execute_test()
>   iotests: Restrict file Python tests to file
>   iotests: Restrict nbd Python tests to nbd
>   iotests: Test blockdev-create for vpc
>   curl: Keep pointer to the CURLState in CURLSocket
>   curl: Keep *socket until the end of curl_sock_cb()
>   curl: Check completion in curl_multi_do()
>   curl: Pass CURLSocket to curl_multi_do()
>   curl: Report only ready sockets
>   curl: Handle success in multi_check_completion
>   qcow2: Limit total allocation range to INT_MAX
>   iotests: Test large write request to qcow2 file
>   mirror: Do not dereference invalid pointers
> 
> Maxim Levitsky (1):
>   block/qcow2: Fix corruption introduced by commit 8ac0f15f335
> 
> Michael Roth (2):
>   make-release: pull in edk2 submodules so we can build it from
> tarballs
>   roms/Makefile.edk2: don't pull in submodules when building from
> tarball
> 
> Michael S. Tsirkin (1):
>   virtio: new post_load hook
> 
> Mikhail Sennikovsky (1):
>   virtio-net: prevent offloads reset on migration
> 
> Paolo Bonzini (2):
>   dma-helpers: ensure AIO callback is invoked after cancellation
>   scsi: lsi: exit infinite loop while executing script (CVE-2019-12068)
> 
> Paul Durrant (1):
>   xen-bus: check whether the frontend is active during device
> reset...
> 
> Peter Lieven (1):
>   block/nfs: tear down aio before nfs_close
> 
> Peter Maydell (3):
>   target/arm: Free TCG temps in trans_VMOV_64_sp()
>   target/arm: Don't abort on M-profile exception return in linux-
> user mode
>   hw/arm/boot.c: Set NSACR.{CP11,CP10} for NS kernel boots
> 
> Philippe Mathieu-Daudé (1):
>   virtio-blk: Cancel the pending BH when the dataplane is reset
> 
> Sergio Lopez (1):
>   

Re: [PATCH for 4.2 v1 1/1] riscv/virt: Increase flash size

2019-11-12 Thread Alistair Francis
On Mon, Nov 11, 2019 at 7:30 AM Bin Meng  wrote:
>
> On Thu, Nov 7, 2019 at 8:54 AM Alistair Francis
>  wrote:
> >
> > Coreboot developers have requested that they have at least 32MB of flash
> > to load binaries. We currently have 32MB of flash, but it is split in
> > two to allow loading two flash binaries. Let's increase the flash size
> > from 32MB to 64MB to ensure we have a single region that is 32MB.
> >
> > No QEMU release has include flash in the RISC-V virt machine, so this
> > isn't a breaking change.
> >
> > Signed-off-by: Alistair Francis 
> > ---
> >  hw/riscv/virt.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
>
> Reviewed-by: Bin Meng 

Thanks!

Ping! I really want this in 4.2. Otherwise we are stuck with a
compatibility issue.

Alistair



Re: [PATCH 1/5] MAINTAINERS: Add a section on git infrastructure

2019-11-12 Thread Aleksandar Markovic
On Tuesday, November 12, 2019, Alex Bennée  wrote:

>
> Aleksandar Markovic  writes:
>
> > From: Aleksandar Markovic 
> >
> > There should be a patient person maintaining gory details of
> > git-related files, and there is no better person for that role
> > than Philippe. Alex should be the reviewer for some relations
> > with gitdm.
>
> I'm not sure about this. The .gitignore files are best updated by people
> responsible for the various parts of the tree. Once out-of-tree builds
> become standard we should be able to eliminate them all together. As far
> as .mailmap is concerned I think people are quite capable of updating it
> themselves without putting the changes through a maintainer tree.
>
>
Thank you Alex for your feedback.

People here are mainly concerned about the substance of their contribution,
and don't know or don't care about .mailmap file - as evidenced by many
items Philippe had to add to that file. The essence of this patch was not
to force people to go to the maintainer approval, but that the maintainer
from time to time refreshes the file, if needed.

But, OK, if you have such reservations that you mentioned, I am going to
remove this patch in v2. We leave all these files listed in this patch
unmaintained.

Yours,
Aleksandar



> >
> > Signed-off-by: Aleksandar Markovic 
> > ---
> >  MAINTAINERS | 17 +
> >  1 file changed, 17 insertions(+)
> >
> > diff --git a/MAINTAINERS b/MAINTAINERS
> > index 4964fbb..be43ccb 100644
> > --- a/MAINTAINERS
> > +++ b/MAINTAINERS
> > @@ -2664,6 +2664,23 @@ M: Daniel P. Berrange 
> >  S: Odd Fixes
> >  F: scripts/git-submodule.sh
> >
> > +GIT infrastructure
> > +M: Philippe Mathieu-Daudé 
> > +R: Alex Bennée 
> > +S: Maintained
> > +F: .mailmap
> > +F: scripts/git.orderfile
> > +F: .gitignore
> > +F: tests/fp/.gitignore
> > +F: tests/fp/berkeley-softfloat-3/.gitignore
> > +F: tests/fp/berkeley-testfloat-3/.gitignore
> > +F: tests/migration/.gitignore
> > +F: tests/multiboot/.gitignore
> > +F: tests/qemu-iotests/.gitignore
> > +F: tests/tcg/.gitignore
> > +F: tests/uefi-test-tools/.gitignore
> > +F: ui/keycodemapdb/tests/.gitignore
> > +
> >  Sphinx documentation configuration and build machinery
> >  M: Peter Maydell 
> >  S: Maintained
>
>
> --
> Alex Bennée
>
>


Re: [PATCH v14 03/11] tests: Add test for QAPI builtin type time

2019-11-12 Thread Eduardo Habkost
On Fri, Nov 08, 2019 at 09:05:52AM +0100, Markus Armbruster wrote:
> Tao Xu  writes:
> 
> > On 11/7/2019 9:31 PM, Eduardo Habkost wrote:
> >> On Thu, Nov 07, 2019 at 02:24:52PM +0800, Tao Xu wrote:
> >>> On 11/7/2019 4:53 AM, Eduardo Habkost wrote:
>  On Mon, Oct 28, 2019 at 03:52:12PM +0800, Tao Xu wrote:
> > Add tests for time input such as zero, around limit of precision,
> > signed upper limit, actual upper limit, beyond limits, time suffixes,
> > and etc.
> >
> > Signed-off-by: Tao Xu 
> > ---
>  [...]
> > > +/* Close to signed upper limit 0x7ffffffffffffc00 (53 msbs set) */
> > > +qdict = keyval_parse("time1=9223372036854774784," /* 7ffffffffffffc00 */
> > > + "time2=9223372036854775295", /* 7ffffffffffffdff */
> > > + NULL, &error_abort);
> > > +v = qobject_input_visitor_new_keyval(QOBJECT(qdict));
> > > +qobject_unref(qdict);
> > > +visit_start_struct(v, NULL, NULL, 0, &error_abort);
> > > +visit_type_time(v, "time1", &time, &error_abort);
> > > +g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
> > > +visit_type_time(v, "time2", &time, &error_abort);
> > > +g_assert_cmphex(time, ==, 0x7ffffffffffffc00);
> 
>  I'm confused by this test case and the one below[1].  Are these
>  known bugs?  Shouldn't we document them as known bugs?
> >>>
> >>> do_strtosz() and do_strtomul() actually parse with strtod(), so the
> >>> precision is 53 bits; in these cases, 7ffffffffffffdff and
> >>> fffffffffffffbff are rounded.
> >>
> >> My questions remain: why isn't this being treated like a bug?
> >>
> > Hi Markus,
> >
> > I am confused about the code here too. Because in do_strtosz(), the
> > upper limit is
> >
> > val * mul >= 0xfffffffffffffc00
> >
> > So some data near 53 bit may be rounded. Is there a bug?
> 
> No, but the design is surprising, and the functions lack written
> contracts, except for the do_strtosz() helper, which has one that sucks.
> 
> qemu_strtosz() & friends are designed to accept fraction * unit
> multiplier.  Example: 1.5M means 1.5 * 1024 * 1024 with qemu_strtosz()
> and qemu_strtosz_MiB(), and 1.5 * 1000 * 1000 with
> qemu_strtosz_metric().  Whether supporting fractions is a good idea is
> debatable, but it's what we've got.
> 
> The implementation limits the numeric part to the precision of double,
> i.e. 53 bits.  "8PiB should be enough for anybody."
> 
> Switching it from double to long double raises the limit to the
> precision of long double.  At least 64 bit on common hosts, but hosts
> exist where it's the same 53 bits.  Do we support any such hosts?  If
> yes, then we'd make the precision depend on the host, which feels like a
> bad idea.
> 
> A possible alternative is to parse the numeric part both as a double and
> as a 64 bit unsigned integer, then use whatever consumes more
> characters.  This enables providing full 64 bits unless you actually use
> a fraction.
> 

This sounds like the right thing to do, if user input is an
integer and the code in the other end is consuming an integer.
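
A minimal sketch of the alternative Markus describes above: parse the
numeric part once as a 64-bit unsigned integer and once as a double, then
keep whichever parse consumes more characters. ParsedNumber and
parse_numeric() are hypothetical names made up for this illustration, not
QEMU code, and the handling of signs and errors is an assumption.

```c
#include <assert.h>
#include <ctype.h>
#include <errno.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch; not a QEMU structure. */
typedef struct {
    bool is_integer;    /* true: .u is exact; false: .d holds the value */
    uint64_t u;
    double d;
    const char *end;    /* first character after the numeric part */
} ParsedNumber;

/*
 * Parse the numeric part of @s twice and keep whichever parse consumes
 * more characters. Ties go to the integer parse, so plain integers keep
 * all 64 bits. Returns 0 on success or a negative errno value.
 */
static int parse_numeric(const char *s, ParsedNumber *out)
{
    char *end_d, *end_u;
    unsigned long long u;
    double d;
    int err_u;

    assert(s != NULL && out != NULL);

    while (isspace((unsigned char)*s)) {
        s++;
    }
    if (*s == '-') {
        return -EINVAL;         /* strtoull() would silently wrap */
    }

    errno = 0;
    u = strtoull(s, &end_u, 10);
    err_u = errno;
    d = strtod(s, &end_d);

    if (end_u == s && end_d == s) {
        return -EINVAL;         /* no number at all */
    }

    if (end_u >= end_d) {
        /* Only an integer was written: keep full precision. */
        if (err_u == ERANGE) {
            return -ERANGE;
        }
        out->is_integer = true;
        out->u = u;
        out->d = (double)u;
        out->end = end_u;
        return 0;
    }

    /* A fraction or exponent was used: precision is that of double. */
    if (!(d >= 0.0 && d < 18446744073709551616.0)) {
        return -ERANGE;         /* also rejects NaN and infinity */
    }
    out->is_integer = false;
    out->u = (uint64_t)d;
    out->d = d;
    out->end = end_d;
    return 0;
}
```

With this shape, "9223372036854775295" takes the integer path and keeps
every bit, while "1.5M" takes the double path and leaves end pointing at
the "M" suffix for the caller to apply the unit multiplier.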


> As far as I remember, the only problem we've ever had with the 53 bits
> limit is developer confusion :)
> 

Developer confusion, I can deal with.  However, exposing this
behavior on external interfaces is a bug to me.

I don't know how serious the bug is because I don't know which
interfaces are affected by it.  Do we have a list?

> Patches welcome.

My first goal is to get the maintainers of that code to recognize
it as a bug.  Then I hope this will motivate somebody else to fix
it.  :)

-- 
Eduardo




Re: [PATCH] WHPX: refactor load library

2019-11-12 Thread Eduardo Habkost
On Tue, Nov 12, 2019 at 06:42:00PM +, Sunil Muthuswamy wrote:
> 
> 
> > -Original Message-
> > From: Sunil Muthuswamy
> > Sent: Friday, November 8, 2019 12:32 PM
> > To: 'Paolo Bonzini' ; 'Richard Henderson' 
> > ; 'Eduardo Habkost' ; 'Stefan
> > Weil' 
> > Cc: 'qemu-devel@nongnu.org' ; Justin Terry (VM) 
> > 
> > Subject: [PATCH] WHPX: refactor load library
> > 
> > This refactors the load library of WHV libraries to make it more
> > modular. It makes a helper routine that can be called on demand.
> > This allows future expansion of load library/functions to support
> > functionality that is dependent on some feature being available.
> > 
> > Signed-off-by: Sunil Muthuswamy 
> > ---
> 
> Can I possibly get some eyes on this?

I'd be glad to queue the patch if we get a Reviewed-by line from
somebody who understands Windows and WHPX.  Maybe Justin?

Sunil, Justin, would you like to be listed as maintainers or
designated reviewers for the WHPX code in QEMU?

-- 
Eduardo




Re: [PATCH v1 3/5] Add use of RCU for qemu_logfile.

2019-11-12 Thread Robert Foley
On Tue, 12 Nov 2019 at 12:36, Alex Bennée  wrote:
>
>
> >  }
> > +atomic_rcu_set(&qemu_logfile, logfile);
> >  }
> > -qemu_mutex_unlock(&qemu_logfile_mutex);
> > +logfile = qemu_logfile;
>
> Isn't this read outside of the protection of both rcu_read_lock() and
> the mutex? Could that race?

This assignment is under the mutex.  This change moved the mutex
unlock towards the end of the function, just a few lines below the
call_rcu().

> > +
> >  if (qemu_logfile &&
> >  (is_daemonized() ? logfilename == NULL : !qemu_loglevel)) {
> > -qemu_log_close();
> > +atomic_rcu_set(&qemu_logfile, NULL);
> > +call_rcu(logfile, qemu_logfile_free, rcu);
>
> I wonder if we can re-arrange the logic here so it's less confusing? For
> example the NULL qemu_loglevel can be detected at the start and we
> should be able just to squash the current log and reset and go. I'm not
> sure about the daemonize/stdout check.
>
> >  }
> > +qemu_mutex_unlock(&qemu_logfile_mutex);
> >  }
> >

Absolutely, the logic that was here originally can be improved.  We
found it confusing also.
We could move things around a bit and change it to something like this
to help clarify.

    bool need_to_open_file = false;
    QemuLogFile *logfile;
    /*
     * In all cases we only log if qemu_loglevel is set.
     * And:
     * If not daemonized we will always log either to stderr
     *   or to a file (if there is a logfilename).
     * If we are daemonized,
     *   we will only log if there is a logfilename.
     */
    if (qemu_loglevel && (!is_daemonized() || logfilename)) {
        need_to_open_file = true;
    }
    g_assert(qemu_logfile_mutex.initialized);
    qemu_mutex_lock(&qemu_logfile_mutex);

    if (qemu_logfile && !need_to_open_file) {
        /* Close file. */
        logfile = qemu_logfile;
        atomic_rcu_set(&qemu_logfile, NULL);
        call_rcu(logfile, qemu_logfile_free, rcu);
    } else if (!qemu_logfile && need_to_open_file) {
        logfile = g_new0(QemuLogFile, 1);
        /* __snip__ existing patch logic for opening the qemu_logfile
         * will be inserted here. */
    }
    qemu_mutex_unlock(&qemu_logfile_mutex);
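
One way to check a rearrangement like this is to compare the new predicate
against the close condition from the quoted hunk over all input
combinations. The sketch below is standalone C, not QEMU code; both helper
names are made up, and it only models the close condition visible in the
quote, not the snipped open logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Close condition as written in the quoted hunk of qemu_set_log(). */
static bool quoted_should_close(bool daemonized, const char *logfilename,
                                int loglevel)
{
    return daemonized ? logfilename == NULL : !loglevel;
}

/* Predicate from the proposed rearrangement. */
static bool proposed_need_file(bool daemonized, const char *logfilename,
                               int loglevel)
{
    return loglevel && (!daemonized || logfilename);
}

/*
 * Count the input combinations where "close" is not simply the inverse
 * of "need a file", i.e. where the two formulations behave differently.
 */
static int count_disagreements(void)
{
    static const char *const names[] = { NULL, "qemu.log" };
    int count = 0;

    for (int d = 0; d <= 1; d++) {
        for (int n = 0; n <= 1; n++) {
            for (int level = 0; level <= 1; level++) {
                bool close_old = quoted_should_close(d, names[n], level);
                bool need_new = proposed_need_file(d, names[n], level);

                if (close_old == need_new) {
                    count++;
                }
            }
        }
    }
    return count;
}
```

In this model the two formulations differ in exactly one combination:
daemonized, a logfilename set, and qemu_loglevel of zero. The quoted
condition keeps the file open there, while the rearranged predicate closes
it, which matches the stated intent that logging only happens when
qemu_loglevel is set.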

> >  {
> >  char *pidstr;
> > +
> >  g_free(logfilename);
>
> nit: stray newline

Will remove the newline.

Thanks,
Rob

On Tue, 12 Nov 2019 at 12:36, Alex Bennée  wrote:
>
>
> Robert Foley  writes:
>
> > This now allows changing the logfile while logging is active,
> > and also solves the issue of a seg fault while changing the logfile.
> >
> > Any read access to the qemu_logfile handle will use
> > the rcu_read_lock()/unlock() around the use of the handle.
> > To fetch the handle we will use atomic_rcu_read().
> > We also in many cases do a check for validity of the
> > logfile handle before using it to deal with the case where the
> > file is closed and set to NULL.
> >
> > The cases where we write to the qemu_logfile will use atomic_rcu_set().
> > Writers will also use call_rcu() with a newly added qemu_logfile_free
> > function for freeing/closing when readers have finished.
> >
> > Signed-off-by: Robert Foley 
> > ---
> > v1
> > - Changes for review comments.
> > - Minor changes to definition of QemuLogFile.
> > - Changed qemu_log_separate() to fix unbalanced and
> >   remove qemu_log_enabled() check.
> > - changed qemu_log_lock() to include else.
> > - make qemu_logfile_free static.
> > - use g_assert(logfile) in qemu_logfile_free.
> > - Relocated unlock out of if/else in qemu_log_close(), and
> >   in qemu_set_log().
> > ---
> >  include/qemu/log.h | 42 ++
> >  util/log.c | 73 +-
> >  include/exec/log.h | 33 ++---
> >  tcg/tcg.c  | 12 ++--
> >  4 files changed, 128 insertions(+), 32 deletions(-)
> >
> > diff --git a/include/qemu/log.h b/include/qemu/log.h
> > index a7c5b01571..528e1f9dd7 100644
> > --- a/include/qemu/log.h
> > +++ b/include/qemu/log.h
> > @@ -3,9 +3,16 @@
> >
> >  /* A small part of this API is split into its own header */
> >  #include "qemu/log-for-trace.h"
> > +#include "qemu/rcu.h"
> > +
> > +typedef struct QemuLogFile {
> > +struct rcu_head rcu;
> > +FILE *fd;
> > +} QemuLogFile;
> >
> >  /* Private global variable, don't use */
> > -extern FILE *qemu_logfile;
> > +extern QemuLogFile *qemu_logfile;
> > +
> >
> >  /*
> >   * The new API:
> > @@ -25,7 +32,16 @@ static inline bool qemu_log_enabled(void)
> >   */
> >  static inline bool qemu_log_separate(void)
> >  {
> > -return qemu_logfile != NULL && qemu_logfile != stderr;
> > +QemuLogFile *logfile;
> > +bool res = false;
> > +
> > +rcu_read_lock();
> > > +logfile = atomic_rcu_read(&qemu_logfile);
> > +if (logfile && logfile->fd != stderr) {
> > +res = true;
> > +}
> > +rcu_read_unlock();
> > +return res;
> >  }
> >
> >  #define CPU_LOG_TB_OUT_ASM (1 << 0)
> > @@ -55,14 +71,23 @@ static inline bool qemu_log_separate(void)

Re: [PATCH v4 15/20] fuzz: add fuzzer skeleton

2019-11-12 Thread Alexander Bulekov

On 11/7/19 7:55 AM, Stefan Hajnoczi wrote:

On Wed, Oct 30, 2019 at 02:50:00PM +, Oleinik, Alexander wrote:

diff --git a/tests/fuzz/fuzz.c b/tests/fuzz/fuzz.c
new file mode 100644
index 00..0e38f81c48
--- /dev/null
+++ b/tests/fuzz/fuzz.c
@@ -0,0 +1,177 @@
+/*
+ * fuzzing driver
+ *
+ * Copyright Red Hat Inc., 2019
+ *
+ * Authors:
+ *  Alexander Bulekov   


Bulekov instead of Oleinik?
Yes I changed my last name and the approval from the court finally came 
through last week :)

I'll make sure it's consistent across v5.


+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+
+#include <stdio.h>
+#include <stdlib.h>


stdio.h and stdlib.h are already included by qemu/osdep.h.


+/* Executed for each fuzzing-input */
+int LLVMFuzzerTestOneInput(const unsigned char *Data, size_t Size)
+{
+if (fuzz_target->fuzz) {


Will this ever be NULL?

I'll remove the check


+fuzz_target->fuzz(fuzz_qts, Data, Size);
+}
+return 0;
+}
+
+/* Executed once, prior to fuzzing */
+int LLVMFuzzerInitialize(int *argc, char ***argv, char ***envp)
+{
+
+char *target_name;
+
+/* Initialize qgraph and modules */
+qos_graph_init();
+module_call_init(MODULE_INIT_FUZZ_TARGET);
+module_call_init(MODULE_INIT_QOM);
+module_call_init(MODULE_INIT_LIBQOS);
+
+if (*argc <= 1) {
+usage(**argv);
+}
+
+/* Identify the fuzz target */
+target_name = (*argv)[1];
+if (!strstr(target_name, "--fuzz-target=")) {
+usage(**argv);
+}
+
+target_name += strlen("--fuzz-target=");
+
+fuzz_target = fuzz_get_target(target_name);
+if (!fuzz_target) {
+usage(**argv);
+}
+
+fuzz_qts = qtest_setup();
+
+if (!fuzz_target) {


This is dead code since fuzz_target was already checked above.  Please
remove this if statement.


+fprintf(stderr, "Error: Fuzz fuzz_target name %s not found\n",
+target_name);
+usage(**argv);
+}
+
+if (fuzz_target->pre_vm_init) {
+fuzz_target->pre_vm_init();
+}
+
+/* Run QEMU's softmmu main with the fuzz-target dependent arguments */
+char *init_cmdline = fuzz_target->get_init_cmdline(fuzz_target);


Where is init_cmdline freed or should this be const char *?


+wordexp_t result;
+wordexp(init_cmdline, &result, 0);


What is the purpose of word expansion here?
The fuzz target devs can specify arguments in a single string and not 
worry about calculating the argc and **argv - we take care of it for them.
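
As a rough illustration of that idea, the sketch below splits a single
command-line string into argc/argv with POSIX wordexp() and passes the
result to a callback. with_argv() and count_args() are hypothetical helpers
for this example only, and the WRDE_NOCMD flag is a choice made here, not
something the patch uses.

```c
#include <assert.h>
#include <stddef.h>
#include <wordexp.h>

/*
 * Split @cmdline into shell-style words and hand them to @fn as
 * argc/argv. Returns fn's result, or -1 if the expansion fails.
 */
static int with_argv(const char *cmdline, int (*fn)(int argc, char **argv))
{
    wordexp_t words;
    int ret;

    assert(cmdline != NULL && fn != NULL);

    /* WRDE_NOCMD: treat $(...) command substitution as an error. */
    if (wordexp(cmdline, &words, WRDE_NOCMD) != 0) {
        return -1;
    }

    ret = fn((int)words.we_wordc, words.we_wordv);
    wordfree(&words);   /* releases we_wordv and the words in it */
    return ret;
}

/* Example callback: just report how many arguments were produced. */
static int count_args(int argc, char **argv)
{
    (void)argv;
    return argc;
}
```

Quoting works as in a shell, so a string such as
"qemu -machine q35 -append 'console=ttyS0 quiet'" yields five arguments.
The sketch frees the words as soon as the callback returns; a caller whose
consumer keeps pointers into argv would have to keep the wordexp_t alive
instead.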



+
+qemu_init(result.we_wordc, result.we_wordv, NULL);
+
+if (fuzz_target->pre_fuzz) {
+fuzz_target->pre_fuzz(fuzz_qts);
+}
+
+return 0;
+}
diff --git a/tests/fuzz/fuzz.h b/tests/fuzz/fuzz.h
new file mode 100644
index 00..b569b622d7
--- /dev/null
+++ b/tests/fuzz/fuzz.h
@@ -0,0 +1,66 @@
+/*
+ * fuzzing driver
+ *
+ * Copyright Red Hat Inc., 2019
+ *
+ * Authors:
+ *  Alexander Bulekov   
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ *
+ */
+
+#ifndef FUZZER_H_
+#define FUZZER_H_
+
+#include "qemu/osdep.h"
+#include "qemu/units.h"
+#include "qapi/error.h"
+#include "exec/memory.h"
+#include "tests/libqtest.h"
+
+


Some documentation would be nice:


...

Does the caller need to call g_free() on the returned string?  Please
document this.

...

s/to to/to/

...

Please also mention that QEMU has been initialized at this point.


...

"makes a copy of *target" -> does this mean the argument type can be
const FuzzTarget *target?



Thanks - I made changes to address these.
-Alex

--
===
I recently changed my last name from Oleinik to Bulekov
===



RE: [PATCH v2] WHPX: support for xcr0

2019-11-12 Thread Sunil Muthuswamy



> -Original Message-
> From: Sunil Muthuswamy
> Sent: Thursday, November 7, 2019 11:49 AM
> To: Paolo Bonzini ; Richard Henderson 
> ; Eduardo Habkost 
> Cc: qemu-devel@nongnu.org
> Subject: [PATCH v2] WHPX: support for xcr0
> 
> Support for xcr0 to be able to enable xsave/xrstor. This by itself
> is not sufficient to enable xsave/xrstor. WHPX XSAVE API's also
> needs to be hooked up.
> 
> Signed-off-by: Sunil Muthuswamy 
> ---
> You will need the Windows 10 SDK for RS5 (build 17763) or above to
> to be able to compile this patch because of the definition of the
> XCR0 register.
> 
> Changes since v1:
> - Added a sign-off line in the patch.
> 

Is it possible to get some eyes on this?



RE: [PATCH] WHPX: refactor load library

2019-11-12 Thread Sunil Muthuswamy



> -Original Message-
> From: Sunil Muthuswamy
> Sent: Friday, November 8, 2019 12:32 PM
> To: 'Paolo Bonzini' ; 'Richard Henderson' 
> ; 'Eduardo Habkost' ; 'Stefan
> Weil' 
> Cc: 'qemu-devel@nongnu.org' ; Justin Terry (VM) 
> 
> Subject: [PATCH] WHPX: refactor load library
> 
> This refactors the load library of WHV libraries to make it more
> modular. It makes a helper routine that can be called on demand.
> This allows future expansion of load library/functions to support
> functionality that is dependent on some feature being available.
> 
> Signed-off-by: Sunil Muthuswamy 
> ---

Can I possibly get some eyes on this?




Re: [PULL 0/2] Linux user for 4.2 patches

2019-11-12 Thread Peter Maydell
On Tue, 12 Nov 2019 at 16:18, Laurent Vivier  wrote:
>
> The following changes since commit 2a7e7c3e103a5c29af7c583390c243d85a2527e8:
>
>   Merge remote-tracking branch 
> 'remotes/stsquad/tags/pull-testing-and-tcg-121119-1' into staging (2019-11-12 
> 14:51:00 +)
>
> are available in the Git repository at:
>
>   git://github.com/vivier/qemu.git tags/linux-user-for-4.2-pull-request
>
> for you to fetch changes up to 0f1f2d4596aee037d3ccbcf10592466daa54107f:
>
>   linux-user: remove host stime() syscall (2019-11-12 17:05:57 +0100)
>
> 
> Fix CID 1407221 and stime()
>
> 
>
> Laurent Vivier (2):
>   linux-user: fix missing break
>   linux-user: remove host stime() syscall
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



Re: [PATCH v1 5/5] Fix double free issue in qemu_set_log_filename().

2019-11-12 Thread Alex Bennée


Robert Foley  writes:

> After freeing the logfilename, we set logfilename to NULL, in case of an
> error which returns without setting logfilename.
>
> Signed-off-by: Robert Foley 

As this fixes an existing bug I would put this at the start of the
series. Otherwise:

Reviewed-by: Alex Bennée 

> ---
> v1
> - This is new in the patch v1.
> ---
>  util/log.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/util/log.c b/util/log.c
> index 802b8de42e..1eed74788c 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -148,6 +148,7 @@ void qemu_set_log_filename(const char *filename, Error 
> **errp)
>  char *pidstr;
>
>  g_free(logfilename);
> +logfilename = NULL;
>
>  pidstr = strstr(filename, "%");
>  if (pidstr) {
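
A standalone sketch of why the one-line change matters, using plain
malloc()/free() as a stand-in for g_free() and a made-up validation rule
for the early error return. set_log_filename() here is a toy, not the QEMU
function.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

static char *logfilename;   /* stand-in for the global in util/log.c */

/*
 * Toy "set filename" routine with an early error return.
 * Returns 0 on success, -1 if @filename is rejected or allocation fails.
 */
static int set_log_filename(const char *filename)
{
    const char *pct;
    size_t len;

    assert(filename != NULL);

    free(logfilename);
    logfilename = NULL;     /* without this, the error return below would
                               leave a dangling pointer that the next call
                               frees a second time */

    pct = strchr(filename, '%');
    if (pct && pct[1] != 'd') {
        return -1;          /* illustrative validation rule only */
    }

    len = strlen(filename) + 1;
    logfilename = malloc(len);
    if (!logfilename) {
        return -1;
    }
    memcpy(logfilename, filename, len);
    return 0;
}
```

Calling the function with a valid name, then a rejected name, then a valid
name again is the sequence that would free the same pointer twice if the
assignment to NULL were missing.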


--
Alex Bennée



Re: [PATCH v1 4/5] Added tests for close and change of logfile.

2019-11-12 Thread Alex Bennée


Robert Foley  writes:

> One test ensures that the logfile handle is still valid even if
> the logfile is changed during logging.
> The other test validates that the logfile handle remains valid under
> the logfile lock even if the logfile is closed.
>
> Signed-off-by: Robert Foley 

Reviewed-by: Alex Bennée 

> --
> v1
> - Changes for first round of code review comments.
> - Added in use of g_autofree, removed the g_free()s.
> - Added in use of logfile2 and changed sequence of asserts.
> ---
>  tests/test-logging.c | 80 
>  1 file changed, 80 insertions(+)
>
> diff --git a/tests/test-logging.c b/tests/test-logging.c
> index a12585f70a..1e646f045d 100644
> --- a/tests/test-logging.c
> +++ b/tests/test-logging.c
> @@ -108,6 +108,82 @@ static void test_parse_path(gconstpointer data)
> >  error_free_or_abort(&err);
>  }
>
> +static void test_logfile_write(gconstpointer data)
> +{
> +QemuLogFile *logfile;
> +QemuLogFile *logfile2;
> +gchar const *dir = data;
> +Error *err = NULL;
> +g_autofree gchar *file_path;
> +g_autofree gchar *file_path1;
> +FILE *orig_fd;
> +
> +/*
> + * Before starting test, set log flags, to ensure the file gets
> + * opened below with the call to qemu_set_log_filename().
> + * In cases where a logging backend other than log is used,
> + * this is needed.
> + */
> +qemu_set_log(CPU_LOG_TB_OUT_ASM);
> +file_path = g_build_filename(dir, "qemu_test_log_write0.log", NULL);
> +file_path1 = g_build_filename(dir, "qemu_test_log_write1.log", NULL);
> +
> +/*
> + * Test that even if an open file handle is changed,
> + * our handle remains valid due to RCU.
> + */
> > +qemu_set_log_filename(file_path, &err);
> > +g_assert(!err);
> > +rcu_read_lock();
> > +logfile = atomic_rcu_read(&qemu_logfile);
> > +orig_fd = logfile->fd;
> > +g_assert(logfile && logfile->fd);
> > +fprintf(logfile->fd, "%s 1st write to file\n", __func__);
> > +fflush(logfile->fd);
> > +
> > +/* Change the logfile and ensure that the handle is still valid. */
> > +qemu_set_log_filename(file_path1, &err);
> > +g_assert(!err);
> > +logfile2 = atomic_rcu_read(&qemu_logfile);
> +g_assert(logfile->fd == orig_fd);
> +g_assert(logfile2->fd != logfile->fd);
> +fprintf(logfile->fd, "%s 2nd write to file\n", __func__);
> +fflush(logfile->fd);
> +rcu_read_unlock();
> +}
> +
> +static void test_logfile_lock(gconstpointer data)
> +{
> +FILE *logfile;
> +gchar const *dir = data;
> +Error *err = NULL;
> +g_autofree gchar *file_path;
> +
> +file_path = g_build_filename(dir, "qemu_test_logfile_lock0.log", NULL);
> +
> +/*
> + * Test the use of the logfile lock, such
> + * that even if an open file handle is closed,
> + * our handle remains valid for use due to RCU.
> + */
> > +qemu_set_log_filename(file_path, &err);
> +logfile = qemu_log_lock();
> +g_assert(logfile);
> +fprintf(logfile, "%s 1st write to file\n", __func__);
> +fflush(logfile);
> +
> +/*
> + * Initiate a close file and make sure our handle remains
> + * valid since we still have the logfile lock.
> + */
> +qemu_log_close();
> +fprintf(logfile, "%s 2nd write to file\n", __func__);
> +fflush(logfile);
> +qemu_log_unlock(logfile);
> +
> +g_assert(!err);
> +}
> +
>  /* Remove a directory and all its entries (non-recursive). */
>  static void rmdir_full(gchar const *root)
>  {
> @@ -134,6 +210,10 @@ int main(int argc, char **argv)
>
>  g_test_add_func("/logging/parse_range", test_parse_range);
>  g_test_add_data_func("/logging/parse_path", tmp_path, test_parse_path);
> +g_test_add_data_func("/logging/logfile_write_path",
> + tmp_path, test_logfile_write);
> +g_test_add_data_func("/logging/logfile_lock_path",
> + tmp_path, test_logfile_lock);
>
>  rc = g_test_run();


--
Alex Bennée



Re: [PATCH v7 8/8] Acceptance test: add "boot_linux" tests

2019-11-12 Thread Philippe Mathieu-Daudé

On 11/4/19 4:13 PM, Cleber Rosa wrote:

This acceptance test, validates that a full blown Linux guest can
successfully boot in QEMU.  In this specific case, the guest chosen is
Fedora version 31.

  * x86_64, pc and q35 machine types, with and without kvm as an
accellerator


typo "accelerator"



  * aarch64 and virt machine type, with and without kvm as an
accellerator


Ditto.



  * ppc64 and pseries machine type

  * s390x and s390-ccw-virtio machine type

The method for checking the successful boot is based on "cloudinit"
and its "phone home" feature.  The guest is given an ISO image
with the location of the phone home server, and the information to
post (the instance ID).  Upon receiving the correct information,
from the guest, the test is considered to have PASSed.

This test is currently limited to user mode networking only, and
instructs the guest to connect to the "router" address that is hard
coded in QEMU.

To create the cloudinit ISO image that will be used to configure the
guest, the pycdlib library is also required and has been added as
requirement to the virtual environment created by "check-venv".

The console output is read by a separate thread, by means of the
Avocado datadrainer utility module.

Signed-off-by: Cleber Rosa 
---





Re: [PATCH v7 4/8] Acceptance tests: use relative location for tests

2019-11-12 Thread Philippe Mathieu-Daudé

On 11/11/19 11:11 PM, Cleber Rosa wrote:

On Mon, Nov 04, 2019 at 07:26:23PM +0100, Philippe Mathieu-Daudé wrote:

On 11/4/19 4:13 PM, Cleber Rosa wrote:

An Avocado Test ID[1] is composed by a number of components, but it
starts with the Test Name, usually a file system location that was
given to the loader.

Because the source directory is being given as a prefix to the
"tests/acceptance" directory containing the acceptance tests, the test
names will needlessly include the directory the user is using to host
the QEMU sources (and/or build tree).

Let's remove the source dir (or a build dir) from the path given to
the test loader.  This should give more constant names, and when using
result servers and databases, it should give the same test names
across executions from different people or from different directories.


Can we strip the full path to directory and only keep the filename in the
database? (Thinking about out-of-tree tests).



Yes, absolutely, but this needs to be done on the Avocado side.  TBH,
I have ideas to make this go even further, such as:

  1) the stripping of the "test_" prefix of the test method

  2) replace the full path to a directory with tests for a "test suite"
 alias (default to the directory name itself)

  3) test suite alias will be persisted on test result such as reports
     or machine, but omitted from the human UI

  4) full path to directory, exact version of test files (git hash) will
 all be metadata and not part of the test ID

Roughly speaking, something which is listed like this currently:

   $ avocado list tests/acceptance/
   INSTRUMENTED 
tests/acceptance/boot_linux_console.py:BootLinuxConsole.test_x86_64_pc
   ...

When executed, would be shown as:

   JOB ID : fb885e9c3e7dc50534ec380a7e988cbf94233f07
   JOB LOG: 
/home/cleber/avocado/job-results/job-2019-11-11T17.07-fb885e9/job.log
(1/1) acceptance/boot_linux_console.py:BootLinuxConsole.x86_64_pc: PASS 
(2.17 s)


For the particular use case of QEMU, we can also strip the "acceptance/" 
part (and eventually ".py").



   RESULTS: PASS 1 | ERROR 0 | FAIL 0 | SKIP 0 | WARN 0 | INTERRUPT 0 | 
CANCEL 0
   JOB TIME   : 2.35 s

How does that sound?

- Cleber.






Re: [PATCH v7 3/8] Acceptance tests: use avocado tags for machine type

2019-11-12 Thread Philippe Mathieu-Daudé

On 11/12/19 2:59 AM, Cleber Rosa wrote:

On Fri, Nov 08, 2019 at 02:20:45PM +0100, Philippe Mathieu-Daudé wrote:

On 11/4/19 4:13 PM, Cleber Rosa wrote:

   """
-self.vm.set_machine('none')
   self.vm.add_args('-S')
   self.vm.launch()
diff --git a/tests/acceptance/linux_initrd.py b/tests/acceptance/linux_initrd.py
index c61d9826a4..3a0ff7b098 100644
--- a/tests/acceptance/linux_initrd.py
+++ b/tests/acceptance/linux_initrd.py
@@ -20,6 +20,7 @@ class LinuxInitrd(Test):
   Checks QEMU evaluates correctly the initrd file passed as -initrd option.
   :avocado: tags=arch:x86_64
+:avocado: tags=machine:pc


For some tests we can run on multiple machines (here q35), I was tempted to
use multiple tags. How could I do that now?



I missed this comment: you can add many tag values here to *classify*
the test as being "q35 machine type capable".

But, Avocado will only run a test multiple times with a varianter
plugin active.  In that case, a "machine" *parameter* with different
values will be passed to the tests.  This tag value is being used
as a default value for the parameter, so it has a lower precedence.

We have a pending task[1] to create an initial CIT file for arch and
machine types.

CC'ing Jan Richter, who is supposed to start working on it soon.

- Cleber.

[1] - 
https://trello.com/c/1wvzcxHY/105-create-cit-parameter-for-acceptance-tests


Good news, thanks for the trello link.




Re: [PATCH v7 3/8] Acceptance tests: use avocado tags for machine type

2019-11-12 Thread Philippe Mathieu-Daudé

On 11/11/19 10:49 PM, Cleber Rosa wrote:

On Fri, Nov 08, 2019 at 02:20:45PM +0100, Philippe Mathieu-Daudé wrote:

@@ -310,7 +302,7 @@ class BootLinuxConsole(Test):
   def test_arm_emcraft_sf2(self):
   """
   :avocado: tags=arch:arm
-:avocado: tags=machine:emcraft_sf2
+:avocado: tags=machine:emcraft-sf2


Maybe add a comment about this change, "Since avocado 72(?) we can ... so
use ..."



You mean on this specific test docstring?  I'm confused if there's a


No! Just in the commit description :)


special reason for doing it here, or if you're suggesting adding a
similar comment to all tag entries which make use of the extended
character set (which I think would be too verbose, repetitive, and hard
to maintain).


diff --git a/tests/acceptance/cpu_queries.py b/tests/acceptance/cpu_queries.py
index af47d2795a..293dccb89a 100644
--- a/tests/acceptance/cpu_queries.py
+++ b/tests/acceptance/cpu_queries.py
@@ -20,8 +20,8 @@ class QueryCPUModelExpansion(Test):
   def test(self):
   """
   :avocado: tags=arch:x86_64
+:avocado: tags=machine:none


Not to confuse with None :)



Yep! :)

- Cleber.






Re: [PATCH 00/55] Patch Round-up for stable 4.1.1, freeze on 2019-11-12

2019-11-12 Thread Michael Roth
Quoting Michael Roth (2019-11-05 14:51:48)
> Hi everyone,
> 
> The following new patches are queued for QEMU stable v4.1.1:
> 
>   https://github.com/mdroth/qemu/commits/stable-4.1-staging
> 
> The release is tentatively planned for 2019-11-14:
> 
>   https://wiki.qemu.org/Planning/4.1
> 
> Please note that the original release date was planned for 2019-11-21,
> but was moved up to address a number of qcow2 corruption issues:
> 
>   https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07144.html
> 
> Fixes for the XFS issues noted in the thread are still pending, but will
> hopefully be qemu.git master in time for 4.1.1 freeze and the
> currently-scheduled release date for 4.2.0-rc1.
> 
> The list of still-pending patchsets being tracked for inclusion are:
> 
>   qcow2: Fix data corruption on XFS
> https://lists.gnu.org/archive/html/qemu-devel/2019-11/msg00073.html
> (PULL pending)
>   qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK
> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07718.html
>   qcow2-bitmap: Fix uint64_t left-shift overflow
> https://lists.gnu.org/archive/html/qemu-devel/2019-10/msg07989.html
> 
> Please respond here or CC qemu-sta...@nongnu.org on any additional patches
> you think should be included in the release.

The following additional patches have been pushed to the staging tree:

  tests: make filemonitor test more robust to event ordering
  block: posix: Always allocate the first block
  file-posix: Handle undetectable alignment
  block/file-posix: Let post-EOF fallocate serialize
  block: Add bdrv_co_get_self_request()
  block: Make wait/mark serialising requests public
  block/io: refactor padding
  util/iov: improve qemu_iovec_is_zero
  util/iov: introduce qemu_iovec_init_extended
  qcow2-bitmap: Fix uint64_t left-shift overflow
  iotests: Add peek_file* functions
  iotests: Add test for 4G+ compressed qcow2 write
  qcow2: Fix QCOW2_COMPRESSED_SECTOR_MASK

Thank you for the suggestions.

> 
> Thanks!
> 
> 
> Adrian Moreno (1):
>   vhost-user: save features if the char dev is closed
> 
> Alberto Garcia (1):
>   qcow2: Fix the calculation of the maximum L2 cache size
> 
> Anthony PERARD (1):
>   xen-bus: Fix backend state transition on device reset
> 
> Aurelien Jarno (1):
>   target/alpha: fix tlb_fill trap_arg2 value for instruction fetch
> 
> Christophe Lyon (1):
>   target/arm: Allow reading flags from FPSCR for M-profile
> 
> David Hildenbrand (1):
>   s390x/tcg: Fix VERIM with 32/64 bit elements
> 
> Eduardo Habkost (1):
>   pc: Don't make die-id mandatory unless necessary
> 
> Fan Yang (1):
>   COLO-compare: Fix incorrect `if` logic
> 
> Hikaru Nishida (1):
>   ui: Fix hanging up Cocoa display on macOS 10.15 (Catalina)
> 
> Igor Mammedov (1):
>   x86: do not advertise die-id in query-hotpluggbale-cpus if '-smp dies' 
> is not set
> 
> Johannes Berg (1):
>   libvhost-user: fix SLAVE_SEND_FD handling
> 
> John Snow (2):
>   Revert "ide/ahci: Check for -ECANCELED in aio callbacks"
>   iotests: add testing shim for script-style python tests
> 
> Kevin Wolf (4):
>   coroutine: Add qemu_co_mutex_assert_locked()
>   qcow2: Fix corruption bug in qcow2_detect_metadata_preallocation()
>   block/snapshot: Restrict set of snapshot nodes
>   iotests: Test internal snapshots with -blockdev
> 
> Markus Armbruster (1):
>   pr-manager: Fix invalid g_free() crash bug
> 
> Matthew Rosato (1):
>   s390: PCI: fix IOMMU region init
> 
> Max Filippov (1):
>   target/xtensa: regenerate and re-import test_mmuhifi_c3 core
> 
> Max Reitz (16):
>   block/file-posix: Reduce xfsctl() use
>   iotests: Test reverse sub-cluster qcow2 writes
>   vpc: Return 0 from vpc_co_create() on success
>   iotests: Add supported protocols to execute_test()
>   iotests: Restrict file Python tests to file
>   iotests: Restrict nbd Python tests to nbd
>   iotests: Test blockdev-create for vpc
>   curl: Keep pointer to the CURLState in CURLSocket
>   curl: Keep *socket until the end of curl_sock_cb()
>   curl: Check completion in curl_multi_do()
>   curl: Pass CURLSocket to curl_multi_do()
>   curl: Report only ready sockets
>   curl: Handle success in multi_check_completion
>   qcow2: Limit total allocation range to INT_MAX
>   iotests: Test large write request to qcow2 file
>   mirror: Do not dereference invalid pointers
> 
> Maxim Levitsky (1):
>   block/qcow2: Fix corruption introduced by commit 8ac0f15f335
> 
> Michael Roth (2):
>   make-release: pull in edk2 submodules so we can build it from tarballs
>   roms/Makefile.edk2: don't pull in submodules when building from tarball
> 
> Michael S. Tsirkin (1):
>   virtio: new post_load hook
> 
> Mikhail Sennikovsky (1):
>   virtio-net: prevent offloads reset on migration
> 
> Paolo Bonzini (2):
>   dma-helpers: ensure 

Re: [PATCH] microvm: fix memory leak in microvm_fix_kernel_cmdline

2019-11-12 Thread Paolo Bonzini
On 12/11/19 17:34, Sergio Lopez wrote:
> In microvm_fix_kernel_cmdline(), fw_cfg_modify_string() is duplicating
> cmdline instead of taking ownership of it. Free it afterwards to avoid
> leaking it.
> 
> Reported-by: Coverity (CID 1407218)
> Suggested-by: Peter Maydell 
> Signed-off-by: Sergio Lopez 
> ---
>  hw/i386/microvm.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
> index 8aacd6c8d1..def37e60f7 100644
> --- a/hw/i386/microvm.c
> +++ b/hw/i386/microvm.c
> @@ -331,6 +331,8 @@ static void microvm_fix_kernel_cmdline(MachineState 
> *machine)
>  
>  fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 
> 1);
>  fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
> +
> +g_free(cmdline);
>  }
>  
>  static void microvm_machine_state_init(MachineState *machine)
> 
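
The ownership rule behind this fix, sketched in standalone C: when the
callee stores its own copy of a string, the caller still owns the buffer it
built and has to free it. StringSlot, slot_set_copy() and store_cmdline()
are hypothetical names for this illustration and do not mirror the fw_cfg
API.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical store that keeps its own copy of the string it is given. */
typedef struct {
    char *value;
} StringSlot;

/* Duplicate @s into @slot, replacing any previous copy. */
static int slot_set_copy(StringSlot *slot, const char *s)
{
    size_t len;
    char *copy;

    assert(slot != NULL && s != NULL);

    len = strlen(s) + 1;
    copy = malloc(len);
    if (!copy) {
        return -1;
    }
    memcpy(copy, s, len);
    free(slot->value);
    slot->value = copy;
    return 0;
}

/* Build a temporary command line, store a copy, then free the temporary. */
static int store_cmdline(StringSlot *slot, const char *base, const char *extra)
{
    size_t len = strlen(base) + strlen(extra) + 2;  /* space + NUL */
    char *cmdline = malloc(len);
    int ret;

    if (!cmdline) {
        return -1;
    }
    snprintf(cmdline, len, "%s %s", base, extra);

    ret = slot_set_copy(slot, cmdline);
    free(cmdline);      /* the slot duplicated it, so this buffer is ours */
    return ret;
}
```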

Queued, thanks.

Paolo




[PATCH v9 QEMU 12/15] vfio: Add load state functions to SaveVMHandlers

2019-11-12 Thread Kirti Wankhede
Sequence during the _RESUMING device state:
While data for this device is available, repeat the steps below:
a. Read data_offset, the offset at which the user application should write data.
b. Write data of data_size to the migration region, starting at data_offset.
c. Write data_size, which indicates to the vendor driver that data has been
   written to the staging buffer.

For the user, data is opaque. The user should write data in the same order as
it was received.
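
The three steps above can be sketched against a mock region. This is only an illustration: struct mock_region and mock_resume_write() are hypothetical names, and the real layout is the one defined by struct vfio_device_migration_info, which is not reproduced here.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical stand-in for the migration region. The real region is
 * accessed with pread()/pwrite() or through an mmap, not a plain struct.
 */
struct mock_region {
    uint64_t data_offset;   /* where the vendor driver wants the data */
    uint64_t data_size;     /* written last: signals "data is staged" */
    uint8_t  bytes[256];    /* staging area addressed by data_offset */
};

/* Returns 0 on success, -1 if the chunk does not fit in the region. */
static int mock_resume_write(struct mock_region *r,
                             const void *chunk, uint64_t len)
{
    /* a. read data_offset */
    uint64_t off = r->data_offset;

    if (off > sizeof(r->bytes) || len > sizeof(r->bytes) - off) {
        return -1;
    }

    /* b. write the data at data_offset */
    memcpy(r->bytes + off, chunk, len);

    /* c. write data_size so the driver knows the buffer is staged */
    r->data_size = len;
    return 0;
}
```

A caller would invoke mock_resume_write() once per chunk, in the order the chunks were received.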

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 170 +++
 hw/vfio/trace-events |   3 +
 2 files changed, 173 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index f890e864e174..16e12586fe8b 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -251,6 +251,33 @@ static int vfio_save_device_config_state(QEMUFile *f, void *opaque)
 return qemu_file_get_error(f);
 }
 
+static int vfio_load_device_config_state(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+uint64_t data;
+
+if (vbasedev->ops && vbasedev->ops->vfio_load_config) {
+int ret;
+
+ret = vbasedev->ops->vfio_load_config(vbasedev, f);
+if (ret) {
+error_report("%s: Failed to load device config space",
+ vbasedev->name);
+return ret;
+}
+}
+
+data = qemu_get_be64(f);
+if (data != VFIO_MIG_FLAG_END_OF_STATE) {
+error_report("%s: Failed loading device config space, "
+ "end flag incorrect 0x%"PRIx64, vbasedev->name, data);
+return -EINVAL;
+}
+
+trace_vfio_load_device_config_state(vbasedev->name);
+return qemu_file_get_error(f);
+}
+
 /* -- */
 
 static int vfio_save_setup(QEMUFile *f, void *opaque)
@@ -410,12 +437,155 @@ static int vfio_save_complete_precopy(QEMUFile *f, void *opaque)
 return ret;
 }
 
+static int vfio_load_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+
+if (migration->region.mmaps) {
+ret = vfio_region_mmap(&migration->region);
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.nr,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RESUMING, 0);
+if (ret) {
+error_report("%s: Failed to set state RESUMING", vbasedev->name);
+}
+return ret;
+}
+
+static int vfio_load_cleanup(void *opaque)
+{
+vfio_save_cleanup(opaque);
+return 0;
+}
+
+static int vfio_load_state(QEMUFile *f, void *opaque, int version_id)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret = 0;
+uint64_t data, data_size;
+
+data = qemu_get_be64(f);
+while (data != VFIO_MIG_FLAG_END_OF_STATE) {
+
+trace_vfio_load_state(vbasedev->name, data);
+
+switch (data) {
+case VFIO_MIG_FLAG_DEV_CONFIG_STATE:
+{
+ret = vfio_load_device_config_state(f, opaque);
+if (ret) {
+return ret;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_SETUP_STATE:
+{
+data = qemu_get_be64(f);
+if (data == VFIO_MIG_FLAG_END_OF_STATE) {
+return ret;
+} else {
+error_report("%s: SETUP STATE: EOS not found 0x%"PRIx64,
+ vbasedev->name, data);
+return -EINVAL;
+}
+break;
+}
+case VFIO_MIG_FLAG_DEV_DATA_STATE:
+{
+VFIORegion *region = &migration->region;
+void *buf = NULL;
+bool buffer_mmaped = false;
+uint64_t data_offset = 0;
+
+data_size = qemu_get_be64(f);
+if (data_size == 0) {
+break;
+}
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset +
+offsetof(struct vfio_device_migration_info,
+data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s:Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);
+}
+
+buffer_mmaped = (buf != NULL) ? true : false;
+
+if (!buffer_mmaped) {
+buf = g_try_malloc0(data_size);
+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+}
+
+qemu_get_buffer(f, buf, 

[PATCH v9 QEMU 15/15] vfio: Make vfio-pci device migration capable.

2019-11-12 Thread Kirti Wankhede
If the device is not a failover primary device, call vfio_migration_probe()
and vfio_migration_finalize() for the vfio-pci device to enable migration for
VFIO PCI devices that support migration.
Remove the vfio_pci_vmstate structure.
Remove the migration blocker from the VFIO PCI device specific structure and
use the migration blocker from the generic VFIO device structure.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 30 +++---
 hw/vfio/pci.h |  1 -
 2 files changed, 11 insertions(+), 20 deletions(-)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 2c22cca0c3be..3d2ebc7abfdc 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2909,21 +2909,11 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 return;
 }
 
-if (!pdev->failover_pair_id) {
-error_setg(&vdev->migration_blocker,
-"VFIO device doesn't support migration");
-ret = migrate_add_blocker(vdev->migration_blocker, &err);
-if (err) {
-error_propagate(errp, err);
-error_free(vdev->migration_blocker);
-return;
-}
-}
-
 vdev->vbasedev.name = g_path_get_basename(vdev->vbasedev.sysfsdev);
 vdev->vbasedev.ops = &vfio_pci_ops;
 vdev->vbasedev.type = VFIO_DEVICE_TYPE_PCI;
 vdev->vbasedev.dev = DEVICE(vdev);
+vdev->vbasedev.device_state = 0;
 
 tmp = g_strdup_printf("%s/iommu_group", vdev->vbasedev.sysfsdev);
 len = readlink(tmp, group_path, sizeof(group_path));
@@ -3184,6 +3174,14 @@ static void vfio_realize(PCIDevice *pdev, Error **errp)
 }
 }
 
+if (!pdev->failover_pair_id) {
+ret = vfio_migration_probe(&vdev->vbasedev, errp);
+if (ret) {
+error_report("%s: Failed to setup for migration",
+ vdev->vbasedev.name);
+}
+}
+
 vfio_register_err_notifier(vdev);
 vfio_register_req_notifier(vdev);
 vfio_setup_resetfn_quirk(vdev);
@@ -3196,10 +3194,6 @@ out_teardown:
 vfio_bars_exit(vdev);
 error:
 error_prepend(errp, VFIO_MSG_PREFIX, vdev->vbasedev.name);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-}
 }
 
 static void vfio_instance_finalize(Object *obj)
@@ -3207,14 +3201,11 @@ static void vfio_instance_finalize(Object *obj)
 VFIOPCIDevice *vdev = PCI_VFIO(obj);
 VFIOGroup *group = vdev->vbasedev.group;
 
+vdev->vbasedev.device_state = 0;
 vfio_display_finalize(vdev);
 vfio_bars_finalize(vdev);
 g_free(vdev->emulated_config_bits);
 g_free(vdev->rom);
-if (vdev->migration_blocker) {
-migrate_del_blocker(vdev->migration_blocker);
-error_free(vdev->migration_blocker);
-}
 /*
  * XXX Leaking igd_opregion is not an oversight, we can't remove the
  * fw_cfg entry therefore leaking this allocation seems like the safest
@@ -3239,6 +3230,7 @@ static void vfio_exitfn(PCIDevice *pdev)
 }
 vfio_teardown_msi(vdev);
 vfio_bars_exit(vdev);
+vfio_migration_finalize(&vdev->vbasedev);
 }
 
 static void vfio_pci_reset(DeviceState *dev)
diff --git a/hw/vfio/pci.h b/hw/vfio/pci.h
index b329d50338b5..834a90d64686 100644
--- a/hw/vfio/pci.h
+++ b/hw/vfio/pci.h
@@ -168,7 +168,6 @@ typedef struct VFIOPCIDevice {
 bool no_vfio_ioeventfd;
 bool enable_ramfb;
 VFIODisplay *dpy;
-Error *migration_blocker;
 } VFIOPCIDevice;
 
 uint32_t vfio_pci_read_config(PCIDevice *pdev, uint32_t addr, int len);
-- 
2.7.0




[PATCH v9 QEMU 09/15] vfio: Add migration state change notifier

2019-11-12 Thread Kirti Wankhede
Added migration state change notifier to get notification on migration state
change. These states are translated to VFIO device state and conveyed to vendor
driver.
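
A minimal sketch of that translation follows. The enum names and bit values are assumptions made for illustration; they are not copied from the kernel header, and on_migration_status() is a hypothetical helper.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed device state bits, for illustration only. */
enum {
    DEV_STATE_RUNNING  = 1 << 0,
    DEV_STATE_SAVING   = 1 << 1,
    DEV_STATE_RESUMING = 1 << 2,
};

/* Hypothetical subset of migration statuses. */
enum mig_status {
    MIG_ACTIVE,
    MIG_CANCELLING,
    MIG_CANCELLED,
    MIG_FAILED,
    MIG_COMPLETED,
};

/*
 * On cancel or failure the device goes back to plain RUNNING: set the
 * RUNNING bit and clear both SAVING and RESUMING. Every other status
 * leaves the device state untouched.
 */
static uint32_t on_migration_status(uint32_t dev_state, enum mig_status s)
{
    switch (s) {
    case MIG_CANCELLING:
    case MIG_CANCELLED:
    case MIG_FAILED:
        return (dev_state | DEV_STATE_RUNNING) &
               ~(uint32_t)(DEV_STATE_SAVING | DEV_STATE_RESUMING);
    default:
        return dev_state;
    }
}
```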

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 28 
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 30 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 28981a759e6c..7e7aeb58647e 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -136,6 +136,26 @@ static void vfio_vmstate_change(void *opaque, int running, RunState state)
 }
 }
 
+static void vfio_migration_state_notifier(Notifier *notifier, void *data)
+{
+MigrationState *s = data;
+VFIODevice *vbasedev = container_of(notifier, VFIODevice, migration_state);
+int ret;
+
+trace_vfio_migration_state_notifier(vbasedev->name, s->state);
+
+switch (s->state) {
+case MIGRATION_STATUS_CANCELLING:
+case MIGRATION_STATUS_CANCELLED:
+case MIGRATION_STATUS_FAILED:
+ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_RUNNING,
+   VFIO_DEVICE_STATE_SAVING | VFIO_DEVICE_STATE_RESUMING);
+if (ret) {
+error_report("%s: Failed to set state RUNNING", vbasedev->name);
+}
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -154,6 +174,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
+vbasedev->migration_state.notify = vfio_migration_state_notifier;
+add_migration_state_change_notifier(&vbasedev->migration_state);
+
 return 0;
 }
 
@@ -192,6 +215,11 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+
+if (vbasedev->migration_state.notify) {
+remove_migration_state_change_notifier(&vbasedev->migration_state);
+}
+
 if (vbasedev->vm_state) {
 qemu_del_vm_change_state_handler(vbasedev->vm_state);
 }
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 3d15bacd031a..69503228f20e 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -148,3 +148,4 @@ vfio_display_edid_write_error(void) ""
 vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
 vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
+vfio_migration_state_notifier(char *name, int state) " (%s) state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 6573acd6738e..bd280396d702 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -124,6 +124,7 @@ typedef struct VFIODevice {
 uint32_t device_state;
 VMChangeStateEntry *vm_state;
 int vm_running;
+Notifier migration_state;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.7.0




[PATCH v9 QEMU 06/15] vfio: Add save and load functions for VFIO PCI devices

2019-11-12 Thread Kirti Wankhede
These functions save and restore PCI device specific data - config
space of PCI device.
Tested save and restore with MSI and MSIX type.
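
On load, the restored BAR values are checked for alignment before they are used. A rough sketch of that check is below; bar_addr_is_aligned() and MEM_BAR_FLAG_MASK are hypothetical names, and the sketch assumes a memory BAR whose size is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The low four bits of a memory BAR carry type and prefetch flags. */
#define MEM_BAR_FLAG_MASK 0xfu

/*
 * Combine the low and high halves of a (possibly 64-bit) memory BAR and
 * check that the resulting address is aligned to the BAR size.
 * A size of 0 means the BAR is unimplemented and is always accepted.
 */
static bool bar_addr_is_aligned(uint32_t lo, uint32_t hi, uint64_t size)
{
    uint64_t addr;

    if (size == 0) {
        return true;
    }

    /* Assumption of this sketch: BAR sizes are powers of two. */
    assert((size & (size - 1)) == 0);

    addr = ((uint64_t)hi << 32) | (lo & ~MEM_BAR_FLAG_MASK);
    return (addr & (size - 1)) == 0;
}
```

For a 32-bit BAR the caller passes hi = 0, which mirrors how addr_hi stays 0 in the patch unless the BAR type is 64-bit.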

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/pci.c | 168 ++
 include/hw/vfio/vfio-common.h |   2 +
 2 files changed, 170 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index 4ae02e71622a..2c22cca0c3be 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -41,6 +41,7 @@
 #include "trace.h"
 #include "qapi/error.h"
 #include "migration/blocker.h"
+#include "migration/qemu-file.h"
 
 #define TYPE_VFIO_PCI "vfio-pci"
 #define PCI_VFIO(obj)OBJECT_CHECK(VFIOPCIDevice, obj, TYPE_VFIO_PCI)
@@ -1620,6 +1621,55 @@ static void vfio_bars_prepare(VFIOPCIDevice *vdev)
 }
 }
 
+static int vfio_bar_validate(VFIOPCIDevice *vdev, int nr)
+{
+PCIDevice *pdev = &vdev->pdev;
+VFIOBAR *bar = &vdev->bars[nr];
+uint64_t addr;
+uint32_t addr_lo, addr_hi = 0;
+
+/* Skip unimplemented BARs and the upper half of 64bit BARS. */
+if (!bar->size) {
+return 0;
+}
+
+/* skip IO BAR */
+if (bar->ioport) {
+return 0;
+}
+
+addr_lo = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + nr * 4, 4);
+
+addr_lo = addr_lo & (bar->ioport ? PCI_BASE_ADDRESS_IO_MASK :
+   PCI_BASE_ADDRESS_MEM_MASK);
+if (bar->type == PCI_BASE_ADDRESS_MEM_TYPE_64) {
+addr_hi = pci_default_read_config(pdev,
+ PCI_BASE_ADDRESS_0 + (nr + 1) * 4, 4);
+}
+
+addr = ((uint64_t)addr_hi << 32) | addr_lo;
+
+if (!QEMU_IS_ALIGNED(addr, bar->size)) {
+return -EINVAL;
+}
+
+return 0;
+}
+
+static int vfio_bars_validate(VFIOPCIDevice *vdev)
+{
+int i, ret;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+ret = vfio_bar_validate(vdev, i);
+if (ret) {
+error_report("vfio: BAR address %d validation failed", i);
+return ret;
+}
+}
+return 0;
+}
+
 static void vfio_bar_register(VFIOPCIDevice *vdev, int nr)
 {
 VFIOBAR *bar = &vdev->bars[nr];
@@ -2402,11 +2452,129 @@ static Object *vfio_pci_get_object(VFIODevice *vbasedev)
 return OBJECT(vdev);
 }
 
+static void vfio_pci_save_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint16_t pci_cmd;
+int i;
+
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar;
+
+bar = pci_default_read_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, 4);
+qemu_put_be32(f, bar);
+}
+
+qemu_put_be32(f, vdev->interrupt);
+if (vdev->interrupt == VFIO_INT_MSI) {
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+bool msi_64bit;
+
+msi_flags = pci_default_read_config(pdev, pdev->msi_cap + PCI_MSI_FLAGS,
+2);
+msi_64bit = (msi_flags & PCI_MSI_FLAGS_64BIT);
+
+msi_addr_lo = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_LO, 4);
+qemu_put_be32(f, msi_addr_lo);
+
+if (msi_64bit) {
+msi_addr_hi = pci_default_read_config(pdev,
+ pdev->msi_cap + PCI_MSI_ADDRESS_HI,
+ 4);
+}
+qemu_put_be32(f, msi_addr_hi);
+
+msi_data = pci_default_read_config(pdev,
+pdev->msi_cap + (msi_64bit ? PCI_MSI_DATA_64 : PCI_MSI_DATA_32),
+2);
+qemu_put_be32(f, msi_data);
+} else if (vdev->interrupt == VFIO_INT_MSIX) {
+uint16_t offset;
+
+/* save enable bit and maskall bit */
+offset = pci_default_read_config(pdev,
+   pdev->msix_cap + PCI_MSIX_FLAGS + 1, 2);
+qemu_put_be16(f, offset);
+msix_save(pdev, f);
+}
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+qemu_put_be16(f, pci_cmd);
+}
+
+static int vfio_pci_load_config(VFIODevice *vbasedev, QEMUFile *f)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+PCIDevice *pdev = &vdev->pdev;
+uint32_t interrupt_type;
+uint32_t msi_flags, msi_addr_lo, msi_addr_hi = 0, msi_data;
+uint16_t pci_cmd;
+bool msi_64bit;
+int i, ret;
+
+/* restore pci bar configuration */
+pci_cmd = pci_default_read_config(pdev, PCI_COMMAND, 2);
+vfio_pci_write_config(pdev, PCI_COMMAND,
+pci_cmd & ~(PCI_COMMAND_IO | PCI_COMMAND_MEMORY), 2);
+for (i = 0; i < PCI_ROM_SLOT; i++) {
+uint32_t bar = qemu_get_be32(f);
+
+vfio_pci_write_config(pdev, PCI_BASE_ADDRESS_0 + i * 4, bar, 4);
+}
+
+ret = vfio_bars_validate(vdev);
+if (ret) {
+return ret;
+}
+
+interrupt_type = qemu_get_be32(f);
+
+if (interrupt_type == VFIO_INT_MSI) {
+

[PATCH v9 QEMU 13/15] vfio: Add vfio_listener_log_sync to mark dirty pages

2019-11-12 Thread Kirti Wankhede
vfio_listener_log_sync gets the list of dirty pages from the container using
the VFIO_IOMMU_GET_DIRTY_BITMAP ioctl and marks those pages dirty when all
devices are stopped and saving state.
Return early for the RAM block section of a mapped MMIO region.
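
The two conditions involved can be sketched in isolation: the "all devices stopped and saving" predicate, and the size of a bitmap with one bit per page. The names, the bit values and the 4 KiB page size below are assumptions for illustration, not the patch's definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed device state bits and page size, for illustration only. */
enum { DEV_STATE_RUNNING = 1 << 0, DEV_STATE_SAVING = 1 << 1 };
#define SKETCH_PAGE_BITS 12

/*
 * True only if every device has SAVING set and RUNNING clear.
 * An empty device list yields true, as the nested loops in the patch do.
 */
static bool all_stopped_and_saving(const uint32_t *states, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        bool saving = (states[i] & DEV_STATE_SAVING) != 0;
        bool running = (states[i] & DEV_STATE_RUNNING) != 0;

        if (!saving || running) {
            return false;
        }
    }
    return true;
}

/* Bytes needed for a bitmap with one bit per page, in 64-bit words. */
static size_t dirty_bitmap_bytes(uint64_t range_size)
{
    uint64_t pages = range_size >> SKETCH_PAGE_BITS;

    return (size_t)((pages + 63) / 64) * sizeof(uint64_t);
}
```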

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c | 103 +++
 hw/vfio/trace-events |   1 +
 2 files changed, 104 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index ade9839c28a3..66f1c64bf074 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -29,6 +29,7 @@
 #include "hw/vfio/vfio.h"
 #include "exec/address-spaces.h"
 #include "exec/memory.h"
+#include "exec/ram_addr.h"
 #include "hw/hw.h"
 #include "qemu/error-report.h"
 #include "qemu/main-loop.h"
@@ -38,6 +39,7 @@
 #include "sysemu/reset.h"
 #include "trace.h"
 #include "qapi/error.h"
+#include "migration/migration.h"
 
 VFIOGroupList vfio_group_list =
 QLIST_HEAD_INITIALIZER(vfio_group_list);
@@ -288,6 +290,28 @@ const MemoryRegionOps vfio_region_ops = {
 };
 
 /*
+ * Device state interfaces
+ */
+
+static bool vfio_devices_are_stopped_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+!(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
+/*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
@@ -813,9 +837,88 @@ static void vfio_listener_region_del(MemoryListener *listener,
 }
 }
 
+static int vfio_get_dirty_bitmap(VFIOContainer *container,
+ MemoryRegionSection *section)
+{
+struct vfio_iommu_type1_dirty_bitmap range;
+uint64_t bitmap_size;
+int ret;
+
+range.argsz = sizeof(range);
+
+if (memory_region_is_iommu(section->mr)) {
+VFIOGuestIOMMU *giommu;
+IOMMUTLBEntry iotlb;
+
+QLIST_FOREACH(giommu, &container->giommu_list, giommu_next) {
+if (MEMORY_REGION(giommu->iommu) == section->mr &&
+giommu->n.start == section->offset_within_region) {
+break;
+}
+}
+
+if (!giommu) {
+return -EINVAL;
+}
+
+iotlb = address_space_get_iotlb_entry(container->space->as,
+   TARGET_PAGE_ALIGN(section->offset_within_address_space),
+   true, MEMTXATTRS_UNSPECIFIED);
+range.iova = iotlb.iova + giommu->iommu_offset;
+range.size = iotlb.addr_mask + 1;
+} else {
+range.iova = TARGET_PAGE_ALIGN(section->offset_within_address_space);
+range.size = int128_get64(section->size);
+}
+
+bitmap_size = BITS_TO_LONGS(range.size >> TARGET_PAGE_BITS) *
+ sizeof(uint64_t);
+
+range.bitmap = g_try_malloc0(bitmap_size);
+if (!range.bitmap) {
+error_report("%s: Error allocating bitmap buffer of size 0x%lx",
+ __func__, bitmap_size);
+return -ENOMEM;
+}
+
+range.bitmap_size = bitmap_size;
+
+ret = ioctl(container->fd, VFIO_IOMMU_GET_DIRTY_BITMAP, &range);
+
+if (!ret) {
+cpu_physical_memory_set_dirty_lebitmap((uint64_t *)range.bitmap,
+   TARGET_PAGE_ALIGN(section->offset_within_address_space),
+   bitmap_size >> TARGET_PAGE_BITS);
+} else {
+error_report("VFIO_IOMMU_GET_DIRTY_BITMAP: %d %d", ret, errno);
+}
+
+trace_vfio_get_dirty_bitmap(container->fd, range.iova, range.size,
+bitmap_size);
+
+g_free(range.bitmap);
+return ret;
+}
+
+static void vfio_listerner_log_sync(MemoryListener *listener,
+MemoryRegionSection *section)
+{
+VFIOContainer *container = container_of(listener, VFIOContainer, listener);
+
+if (memory_region_is_ram_device(section->mr)) {
+return;
+}
+
+if (vfio_devices_are_stopped_and_saving()) {
+
+vfio_get_dirty_bitmap(container, section);
+}
+}
+
 static const MemoryListener vfio_memory_listener = {
 .region_add = vfio_listener_region_add,
 .region_del = vfio_listener_region_del,
+.log_sync = vfio_listerner_log_sync,
 };
 
 static void vfio_listener_release(VFIOContainer *container)
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index ac065b559f4e..0dd1f2ffe648 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -160,3 +160,4 @@ vfio_save_complete_precopy(char *name) " (%s)"
 vfio_load_device_config_state(char *name) " (%s)"
 vfio_load_state(char *name, uint64_t data) " (%s) data 0x%"PRIx64
 vfio_load_state_device_data(char *name, uint64_t data_offset, uint64_t 

[PATCH v9 QEMU 05/15] vfio: Add vfio_get_object callback to VFIODeviceOps

2019-11-12 Thread Kirti Wankhede
Hook vfio_get_object callback for PCI devices.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Suggested-by: Cornelia Huck 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/pci.c | 8 
 include/hw/vfio/vfio-common.h | 1 +
 2 files changed, 9 insertions(+)

diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
index e6569a796850..4ae02e71622a 100644
--- a/hw/vfio/pci.c
+++ b/hw/vfio/pci.c
@@ -2395,10 +2395,18 @@ static void vfio_pci_compute_needs_reset(VFIODevice *vbasedev)
 }
 }
 
+static Object *vfio_pci_get_object(VFIODevice *vbasedev)
+{
+VFIOPCIDevice *vdev = container_of(vbasedev, VFIOPCIDevice, vbasedev);
+
+return OBJECT(vdev);
+}
+
 static VFIODeviceOps vfio_pci_ops = {
 .vfio_compute_needs_reset = vfio_pci_compute_needs_reset,
 .vfio_hot_reset_multi = vfio_pci_hot_reset_multi,
 .vfio_eoi = vfio_intx_eoi,
+.vfio_get_object = vfio_pci_get_object,
 };
 
 int vfio_populate_vga(VFIOPCIDevice *vdev, Error **errp)
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 8d7a0fbb1046..74261feaeac9 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -119,6 +119,7 @@ struct VFIODeviceOps {
 void (*vfio_compute_needs_reset)(VFIODevice *vdev);
 int (*vfio_hot_reset_multi)(VFIODevice *vdev);
 void (*vfio_eoi)(VFIODevice *vdev);
+Object *(*vfio_get_object)(VFIODevice *vdev);
 };
 
 typedef struct VFIOGroup {
-- 
2.7.0




Re: [PULL v1 0/3] MicroBlaze fixes

2019-11-12 Thread Peter Maydell
On Tue, 12 Nov 2019 at 16:04, Edgar E. Iglesias
 wrote:
>
> From: "Edgar E. Iglesias" 
>
> The following changes since commit 039e285e095c20a88e623b927654b161aaf9d914:
>
>   Merge remote-tracking branch 
> 'remotes/vivier2/tags/trivial-branch-pull-request' into staging (2019-11-12 
> 12:09:19 +)
>
> are available in the Git repository at:
>
>   g...@github.com:edgarigl/qemu.git 
> tags/edgar/xilinx-next-2019-11-12.for-upstream
>
> for you to fetch changes up to c49a41b0b9e6c77e24ac2be4d95c54d62bc7b092:
>
>   target/microblaze: Plug temp leak around eval_cond_jmp() (2019-11-12 
> 16:35:26 +0100)
>
> 
> For upstream
>
> 
> Edgar E. Iglesias (3):
>   target/microblaze: Plug temp leaks for loads/stores
>   target/microblaze: Plug temp leaks with delay slot setup
>   target/microblaze: Plug temp leak around eval_cond_jmp()
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

-- PMM



[PATCH v9 QEMU 10/15] vfio: Register SaveVMHandlers for VFIO device

2019-11-12 Thread Kirti Wankhede
Define flags to be used as delimiters in the migration file stream.
Add .save_setup and .save_cleanup functions. The migration region is mapped
and unmapped from these functions at the source during the saving or pre-copy
phase.
Set the VFIO device state depending on the VM's state. During live migration,
the VM is running when .save_setup is called, so the _SAVING | _RUNNING state
is set for the VFIO device. During save-restore, the VM is paused, so the
_SAVING state is set for the VFIO device.
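
As an illustration of the delimiter idea, the setup section written by .save_setup is just a start marker followed by an end marker, both as big-endian 64-bit values. The sketch below uses hypothetical marker values (MARK_SETUP, MARK_END) and hypothetical helpers; it does not use the QEMUFile API.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical marker values; not the constants defined by the patch. */
#define MARK_END   0xffffffff00000001ULL
#define MARK_SETUP 0xffffffff00000003ULL

/* Store a 64-bit value in big-endian byte order. */
static void put_be64(uint8_t *p, uint64_t v)
{
    for (int i = 0; i < 8; i++) {
        p[i] = (uint8_t)(v >> (56 - 8 * i));
    }
}

/* Load a 64-bit value stored in big-endian byte order. */
static uint64_t get_be64(const uint8_t *p)
{
    uint64_t v = 0;

    for (int i = 0; i < 8; i++) {
        v = (v << 8) | p[i];
    }
    return v;
}

/* Write an empty setup section; returns the number of bytes written. */
static size_t write_setup_section(uint8_t *buf)
{
    put_be64(buf, MARK_SETUP);
    put_be64(buf + 8, MARK_END);
    return 16;
}

/* Returns 0 if buf holds a well-formed setup section, -1 otherwise. */
static int check_setup_section(const uint8_t *buf, size_t len)
{
    if (len < 16 || get_be64(buf) != MARK_SETUP ||
        get_be64(buf + 8) != MARK_END) {
        return -1;
    }
    return 0;
}
```

The reader rejects the section if the end marker is missing, which is the role the end-of-state flag plays on the load side.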

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 70 
 hw/vfio/trace-events |  2 ++
 2 files changed, 72 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 7e7aeb58647e..48aac6d29876 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -8,6 +8,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 #include 
 
 #include "sysemu/runstate.h"
@@ -24,6 +25,17 @@
 #include "pci.h"
 #include "trace.h"
 
+/*
+ * Flags used as delimiter:
+ * 0xffffffff => MSB 32-bit all 1s
+ * 0xef10 => emulated (virtual) function IO
+ * 0x0000 => 16-bits reserved for flags
+ */
+#define VFIO_MIG_FLAG_END_OF_STATE  (0xffffffffef100001ULL)
+#define VFIO_MIG_FLAG_DEV_CONFIG_STATE  (0xffffffffef100002ULL)
+#define VFIO_MIG_FLAG_DEV_SETUP_STATE   (0xffffffffef100003ULL)
+#define VFIO_MIG_FLAG_DEV_DATA_STATE    (0xffffffffef100004ULL)
+
 static void vfio_migration_region_exit(VFIODevice *vbasedev)
 {
 VFIOMigration *migration = vbasedev->migration;
@@ -108,6 +120,63 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t set_flags,
 return 0;
 }
 
+/* -- */
+
+static int vfio_save_setup(QEMUFile *f, void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+int ret;
+
+qemu_put_be64(f, VFIO_MIG_FLAG_DEV_SETUP_STATE);
+
+if (migration->region.mmaps) {
+qemu_mutex_lock_iothread();
+ret = vfio_region_mmap(&migration->region);
+qemu_mutex_unlock_iothread();
+if (ret) {
+error_report("%s: Failed to mmap VFIO migration region %d: %s",
+ vbasedev->name, migration->region.index,
+ strerror(-ret));
+return ret;
+}
+}
+
+ret = vfio_migration_set_state(vbasedev, VFIO_DEVICE_STATE_SAVING, 0);
+if (ret) {
+error_report("%s: Failed to set state SAVING", vbasedev->name);
+return ret;
+}
+
+qemu_put_be64(f, VFIO_MIG_FLAG_END_OF_STATE);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+trace_vfio_save_setup(vbasedev->name);
+return 0;
+}
+
+static void vfio_save_cleanup(void *opaque)
+{
+VFIODevice *vbasedev = opaque;
+VFIOMigration *migration = vbasedev->migration;
+
+if (migration->region.mmaps) {
+vfio_region_unmap(&migration->region);
+}
+trace_vfio_save_cleanup(vbasedev->name);
+}
+
+static SaveVMHandlers savevm_vfio_handlers = {
+.save_setup = vfio_save_setup,
+.save_cleanup = vfio_save_cleanup,
+};
+
+/* -- */
+
 static void vfio_vmstate_change(void *opaque, int running, RunState state)
 {
 VFIODevice *vbasedev = opaque;
@@ -171,6 +240,7 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+register_savevm_live("vfio", -1, 1, &savevm_vfio_handlers, vbasedev);
 vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
   vbasedev);
 
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 69503228f20e..4bb43f18f315 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -149,3 +149,5 @@ vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
 vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
 vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
 vfio_migration_state_notifier(char *name, int state) " (%s) state %d"
+vfio_save_setup(char *name) " (%s)"
+vfio_save_cleanup(char *name) " (%s)"
-- 
2.7.0




[PATCH v9 QEMU 04/15] vfio: Add function to unmap VFIO region

2019-11-12 Thread Kirti Wankhede
This function will be used for migration region.
Migration region is mmaped when migration starts and will be unmapped when
migration is complete.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
Reviewed-by: Cornelia Huck 
---
 hw/vfio/common.c  | 20 
 hw/vfio/trace-events  |  1 +
 include/hw/vfio/vfio-common.h |  1 +
 3 files changed, 22 insertions(+)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 5ca11488d676..ade9839c28a3 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -983,6 +983,26 @@ int vfio_region_mmap(VFIORegion *region)
 return 0;
 }
 
+void vfio_region_unmap(VFIORegion *region)
+{
+int i;
+
+if (!region->mem) {
+return;
+}
+
+for (i = 0; i < region->nr_mmaps; i++) {
+trace_vfio_region_unmap(memory_region_name(&region->mmaps[i].mem),
+region->mmaps[i].offset,
+region->mmaps[i].offset +
+region->mmaps[i].size - 1);
+memory_region_del_subregion(region->mem, &region->mmaps[i].mem);
+munmap(region->mmaps[i].mmap, region->mmaps[i].size);
+object_unparent(OBJECT(&region->mmaps[i].mem));
+region->mmaps[i].mmap = NULL;
+}
+}
+
 void vfio_region_exit(VFIORegion *region)
 {
 int i;
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index b1ef55a33ffd..8cdc27946cb8 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -111,6 +111,7 @@ vfio_region_mmap(const char *name, unsigned long offset, unsigned long end) "Reg
 vfio_region_exit(const char *name, int index) "Device %s, region %d"
 vfio_region_finalize(const char *name, int index) "Device %s, region %d"
 vfio_region_mmaps_set_enabled(const char *name, bool enabled) "Region %s mmaps enabled: %d"
+vfio_region_unmap(const char *name, unsigned long offset, unsigned long end) "Region %s unmap [0x%lx - 0x%lx]"
 vfio_region_sparse_mmap_header(const char *name, int index, int nr_areas) "Device %s region %d: %d sparse mmap entries"
 vfio_region_sparse_mmap_entry(int i, unsigned long start, unsigned long end) "sparse entry %d [0x%lx - 0x%lx]"
 vfio_get_dev_region(const char *name, int index, uint32_t type, uint32_t subtype) "%s index %d, %08x/%0x8"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index fd564209ac71..8d7a0fbb1046 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -171,6 +171,7 @@ int vfio_region_setup(Object *obj, VFIODevice *vbasedev, VFIORegion *region,
   int index, const char *name);
 int vfio_region_mmap(VFIORegion *region);
 void vfio_region_mmaps_set_enabled(VFIORegion *region, bool enabled);
+void vfio_region_unmap(VFIORegion *region);
 void vfio_region_exit(VFIORegion *region);
 void vfio_region_finalize(VFIORegion *region);
 void vfio_reset_handler(void *opaque);
-- 
2.7.0




[PATCH v9 QEMU 14/15] vfio: Add ioctl to get dirty pages bitmap during dma unmap.

2019-11-12 Thread Kirti Wankhede
With vIOMMU, an IO virtual address range can get unmapped during the pre-copy
phase of migration. In that case, the unmap ioctl should return the pages
pinned in that range, and QEMU should find the corresponding guest physical
addresses and report them as dirty.
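
The reporting step amounts to walking the returned bitmap and turning each set bit into a page address. A small sketch under stated assumptions: collect_dirty_addrs() is a hypothetical helper, pages are assumed to be 4 KiB, and bit i of word i/64 is assumed to cover page i.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_BITS 12   /* assumption: 4 KiB pages */

/*
 * Walk a dirty bitmap and record the address of each dirty page,
 * counting from 'base'. Returns the number of dirty pages found;
 * at most out_cap addresses are stored in 'out'.
 */
static size_t collect_dirty_addrs(const uint64_t *bitmap, uint64_t pages,
                                  uint64_t base, uint64_t *out,
                                  size_t out_cap)
{
    size_t found = 0;

    for (uint64_t i = 0; i < pages; i++) {
        if (bitmap[i / 64] & (1ULL << (i % 64))) {
            if (found < out_cap) {
                out[found] = base + (i << SKETCH_PAGE_BITS);
            }
            found++;
        }
    }
    return found;
}
```

Here 'base' stands for the guest physical address that corresponds to the start of the unmapped range.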

Note: This patch is not yet tested. I'm trying to see how I can test this code
path.

Suggested-by: Alex Williamson 
Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/common.c | 65 
 1 file changed, 61 insertions(+), 4 deletions(-)

diff --git a/hw/vfio/common.c b/hw/vfio/common.c
index 66f1c64bf074..dc5768219d44 100644
--- a/hw/vfio/common.c
+++ b/hw/vfio/common.c
@@ -311,11 +311,30 @@ static bool vfio_devices_are_stopped_and_saving(void)
 return true;
 }
 
+static bool vfio_devices_are_running_and_saving(void)
+{
+VFIOGroup *group;
+VFIODevice *vbasedev;
+
+QLIST_FOREACH(group, &vfio_group_list, next) {
+QLIST_FOREACH(vbasedev, &group->device_list, next) {
+if ((vbasedev->device_state & VFIO_DEVICE_STATE_SAVING) &&
+(vbasedev->device_state & VFIO_DEVICE_STATE_RUNNING)) {
+continue;
+} else {
+return false;
+}
+}
+}
+return true;
+}
+
 /*
  * DMA - Mapping and unmapping for the "type1" IOMMU interface used on x86
  */
 static int vfio_dma_unmap(VFIOContainer *container,
-  hwaddr iova, ram_addr_t size)
+  hwaddr iova, ram_addr_t size,
+  VFIOGuestIOMMU *giommu)
 {
 struct vfio_iommu_type1_dma_unmap unmap = {
 .argsz = sizeof(unmap),
@@ -324,6 +343,44 @@ static int vfio_dma_unmap(VFIOContainer *container,
 .size = size,
 };
 
+if (giommu && vfio_devices_are_running_and_saving()) {
+int ret;
+uint64_t bitmap_size;
+struct vfio_iommu_type1_dma_unmap_bitmap unmap_bitmap = {
+.argsz = sizeof(unmap_bitmap),
+.flags = VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP,
+.iova = iova,
+.size = size,
+};
+
+bitmap_size = BITS_TO_LONGS(size >> TARGET_PAGE_BITS) *
+  sizeof(uint64_t);
+
+unmap_bitmap.bitmap = g_try_malloc0(bitmap_size);
+if (!unmap_bitmap.bitmap) {
+error_report("%s: Error allocating bitmap buffer of size 0x%lx",
+ __func__, bitmap_size);
+return -ENOMEM;
+}
+
+unmap_bitmap.bitmap_size = bitmap_size;
+
+ret = ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA_GET_BITMAP,
+&unmap_bitmap);
+
+if (!ret) {
+cpu_physical_memory_set_dirty_lebitmap(
+(uint64_t *)unmap_bitmap.bitmap,
+giommu->iommu_offset + giommu->n.start,
+bitmap_size >> TARGET_PAGE_BITS);
+} else {
+error_report("VFIO_IOMMU_GET_DIRTY_BITMAP: %d %d", ret, errno);
+}
+
+g_free(unmap_bitmap.bitmap);
+return ret;
+}
+
 while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
 /*
  * The type1 backend has an off-by-one bug in the kernel (71a7d3d78e3c
@@ -371,7 +428,7 @@ static int vfio_dma_map(VFIOContainer *container, hwaddr iova,
  * the VGA ROM space.
  */
 if (ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0 ||
-(errno == EBUSY && vfio_dma_unmap(container, iova, size) == 0 &&
+(errno == EBUSY && vfio_dma_unmap(container, iova, size, NULL) == 0 &&
  ioctl(container->fd, VFIO_IOMMU_MAP_DMA, &map) == 0)) {
 return 0;
 }
@@ -511,7 +568,7 @@ static void vfio_iommu_map_notify(IOMMUNotifier *n, IOMMUTLBEntry *iotlb)
  iotlb->addr_mask + 1, vaddr, ret);
 }
 } else {
-ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1);
+ret = vfio_dma_unmap(container, iova, iotlb->addr_mask + 1, giommu);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
@@ -814,7 +871,7 @@ static void vfio_listener_region_del(MemoryListener *listener,
 }
 
 if (try_unmap) {
-ret = vfio_dma_unmap(container, iova, int128_get64(llsize));
+ret = vfio_dma_unmap(container, iova, int128_get64(llsize), NULL);
 if (ret) {
 error_report("vfio_dma_unmap(%p, 0x%"HWADDR_PRIx", "
  "0x%"HWADDR_PRIx") = %d (%m)",
-- 
2.7.0




[PATCH v9 QEMU 08/15] vfio: Add VM state change handler to know state of VM

2019-11-12 Thread Kirti Wankhede
VM state change handler gets called on change in VM's state. This is used to set
VFIO device state to _RUNNING.
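
The flag arithmetic behind this handler can be sketched on its own. The bit values and function names below are assumptions for illustration; the sketch only models how set and clear masks combine, not the write to the migration region.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed device state bits, for illustration only. */
enum {
    DEV_STATE_RUNNING  = 1 << 0,
    DEV_STATE_SAVING   = 1 << 1,
    DEV_STATE_RESUMING = 1 << 2,
};

/* Apply set and clear masks; a bit must not appear in both. */
static uint32_t next_device_state(uint32_t cur, uint32_t set_flags,
                                  uint32_t clear_flags)
{
    assert(!(set_flags & clear_flags));
    return (cur | set_flags) & ~clear_flags;
}

/*
 * VM started: set RUNNING, and leave RESUMING if it was set.
 * VM stopped: clear RUNNING and keep everything else.
 */
static uint32_t on_vm_state_change(uint32_t cur, bool running)
{
    uint32_t set_flags = 0, clear_flags = 0;

    if (running) {
        set_flags = DEV_STATE_RUNNING;
        if (cur & DEV_STATE_RESUMING) {
            clear_flags = DEV_STATE_RESUMING;
        }
    } else {
        clear_flags = DEV_STATE_RUNNING;
    }
    return next_device_state(cur, set_flags, clear_flags);
}
```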

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c   | 69 +++
 hw/vfio/trace-events  |  2 ++
 include/hw/vfio/vfio-common.h |  4 +++
 3 files changed, 75 insertions(+)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index c17bd1b0b934..28981a759e6c 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -10,6 +10,7 @@
 #include "qemu/osdep.h"
 #include <linux/vfio.h>
 
+#include "sysemu/runstate.h"
 #include "hw/vfio/vfio-common.h"
 #include "cpu.h"
 #include "migration/migration.h"
@@ -74,6 +75,67 @@ err:
 return ret;
 }
 
+static int vfio_migration_set_state(VFIODevice *vbasedev, uint32_t set_flags,
+uint32_t clear_flags)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint32_t device_state;
+int ret = 0;
+
+/* the same flag must not be both set and cleared */
+assert(!(set_flags & clear_flags));
+
+device_state = (vbasedev->device_state | set_flags) & ~clear_flags;
+
+switch (device_state & VFIO_DEVICE_STATE_MASK) {
+case VFIO_DEVICE_STATE_INVALID_CASE1:
+case VFIO_DEVICE_STATE_INVALID_CASE2:
+return -EINVAL;
+}
+
+ret = pwrite(vbasedev->fd, &device_state, sizeof(device_state),
+ region->fd_offset + offsetof(struct vfio_device_migration_info,
+  device_state));
+if (ret < 0) {
+error_report("%s: Failed to set device state %d %s",
+ vbasedev->name, ret, strerror(errno));
+return ret;
+}
+
+vbasedev->device_state = device_state;
+trace_vfio_migration_set_state(vbasedev->name, device_state);
+return 0;
+}
+
+static void vfio_vmstate_change(void *opaque, int running, RunState state)
+{
+VFIODevice *vbasedev = opaque;
+
+if ((vbasedev->vm_running != running)) {
+int ret;
+uint32_t set_flags = 0, clear_flags = 0;
+
+if (running) {
+set_flags = VFIO_DEVICE_STATE_RUNNING;
+if (vbasedev->device_state & VFIO_DEVICE_STATE_RESUMING) {
+clear_flags = VFIO_DEVICE_STATE_RESUMING;
+}
+} else {
+clear_flags = VFIO_DEVICE_STATE_RUNNING;
+}
+
+ret = vfio_migration_set_state(vbasedev, set_flags, clear_flags);
+if (ret) {
+error_report("%s: Failed to set device state 0x%x",
+ vbasedev->name, set_flags & ~clear_flags);
+}
+vbasedev->vm_running = running;
+trace_vfio_vmstate_change(vbasedev->name, running, RunState_str(state),
+  set_flags & ~clear_flags);
+}
+}
+
 static int vfio_migration_init(VFIODevice *vbasedev,
struct vfio_region_info *info)
 {
@@ -89,6 +151,9 @@ static int vfio_migration_init(VFIODevice *vbasedev,
 return ret;
 }
 
+vbasedev->vm_state = qemu_add_vm_change_state_handler(vfio_vmstate_change,
+  vbasedev);
+
 return 0;
 }
 
@@ -127,6 +192,10 @@ add_blocker:
 
 void vfio_migration_finalize(VFIODevice *vbasedev)
 {
+if (vbasedev->vm_state) {
+qemu_del_vm_change_state_handler(vbasedev->vm_state);
+}
+
 if (vbasedev->migration_blocker) {
 migrate_del_blocker(vbasedev->migration_blocker);
 error_free(vbasedev->migration_blocker);
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 191a726a1312..3d15bacd031a 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -146,3 +146,5 @@ vfio_display_edid_write_error(void) ""
 
 # migration.c
 vfio_migration_probe(char *name, uint32_t index) " (%s) Region %d"
+vfio_migration_set_state(char *name, uint32_t state) " (%s) state %d"
+vfio_vmstate_change(char *name, int running, const char *reason, uint32_t dev_state) " (%s) running %d reason %s device state %d"
diff --git a/include/hw/vfio/vfio-common.h b/include/hw/vfio/vfio-common.h
index 927511897a44..6573acd6738e 100644
--- a/include/hw/vfio/vfio-common.h
+++ b/include/hw/vfio/vfio-common.h
@@ -29,6 +29,7 @@
 #ifdef CONFIG_LINUX
 #include <linux/vfio.h>
 #endif
+#include "sysemu/sysemu.h"
 
 #define VFIO_MSG_PREFIX "vfio %s: "
 
@@ -120,6 +121,9 @@ typedef struct VFIODevice {
 unsigned int flags;
 VFIOMigration *migration;
 Error *migration_blocker;
+uint32_t device_state;
+VMChangeStateEntry *vm_state;
+int vm_running;
 } VFIODevice;
 
 struct VFIODeviceOps {
-- 
2.7.0




[PATCH v9 Qemu 00/15] Add migration support for VFIO devices

2019-11-12 Thread Kirti Wankhede
Hi,

This patch set adds migration support for VFIO devices in QEMU.

The patches are organized as follows:
Patch 1-3:
- Define the KABI for VFIO device migration support (device state) and the
  newly added ioctl definitions to get the dirty pages bitmap. These 3 patches
  are the same as the first 2 patches in the kernel patch set.

Patch 4-6:
- A few code refactors
- Add save and restore functions for PCI configuration space

Patch 7-12:
- Generic migration functionality for VFIO devices.
  * This patch set adds functionality only for PCI devices, but it can be
    extended to other VFIO devices.
  * Add all the basic functions required for the pre-copy, stop-and-copy and
    resume phases of migration.
  * Add a VM state change notifier; from that notifier function, the change
    in VFIO device state is conveyed to the VFIO device driver.
  * During the save setup phase and the resume/load setup phase, the migration
    region is queried and is used to read/write VFIO device data.
  * .save_live_pending and .save_live_iterate are implemented to use QEMU's
    iteration functionality during the pre-copy phase.
  * In .save_live_complete_precopy, that is, in the stop-and-copy phase,
    data is read from the VFIO device driver iteratively until the pending
    bytes returned by the driver reach zero.

Patch 13:
- Add vfio_listerner_log_sync to mark dirty pages. The dirty pages bitmap is
  queried per container. All pages pinned by the vendor driver through the
  vfio_pin_pages external API have to be marked as dirty during migration.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by the vendor driver can also be written by the
  device. As of now there is no device which has hardware support for
  dirty page tracking, so all pages which are pinned by the vendor driver
  should be considered dirty.
  In QEMU, pages are marked dirty only when the device is in the stop-and-copy
  phase. If pages were marked dirty during the pre-copy phase and their content
  transferred from source to destination, there would be no way to know which
  pages were dirtied again between that copy and the point where the device
  stops. To avoid repeatedly copying the same content, pinned pages are marked
  dirty only during the stop-and-copy phase.

Patch 14:
- With vIOMMU, IO virtual address range can get unmapped while in pre-copy
  phase of migration. In that case, unmap ioctl should return pages pinned
  in that range and QEMU should report corresponding guest physical pages
  dirty.

Patch 15:
- Make VFIO PCI device migration capable. If migration region is not provided by
  driver, migration is blocked.

Yet TODO:
Since there is no device which has hardware support for system memory
dirty bitmap tracking, right now there is no other API from the vendor driver
to the VFIO IOMMU module to report dirty pages. In future, when such hardware
support is implemented, an API will be required in the kernel so that the
vendor driver can report dirty pages to the VFIO module during migration phases.

Below is the flow of state change for live migration where states in brackets
represent VM state, migration state and VFIO device state as:
(VM state, MIGRATION_STATUS, VFIO_DEVICE_STATE)

Live migration save path:
QEMU normal running state
(RUNNING, _NONE, _RUNNING)
|
migrate_init spawns migration_thread.
(RUNNING, _SETUP, _RUNNING|_SAVING)
Migration thread then calls each device's .save_setup()
|
(RUNNING, _ACTIVE, _RUNNING|_SAVING)
If device is active, get pending bytes by .save_live_pending()
if pending bytes >= threshold_size, call .save_live_iterate()
Data of VFIO device for pre-copy phase is copied.
Iterate till pending bytes converge and are less than threshold
|
On migration completion, the vCPUs are stopped and .save_live_complete_precopy
is called for each active device. The VFIO device is then transitioned to the
_SAVING state.
(FINISH_MIGRATE, _DEVICE, _SAVING)
For VFIO device, iterate in .save_live_complete_precopy until
pending data is 0.
(FINISH_MIGRATE, _DEVICE, _STOPPED)
|
(FINISH_MIGRATE, _COMPLETED, _STOPPED)
Migration thread schedules cleanup bottom half and exits
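
The pre-copy part of the save path can be pictured with a toy model. This is
only a sketch under stated assumptions, not QEMU code: ex_precopy_rounds and
all of its parameters are hypothetical names, "redirtied" models new device
data produced while the guest keeps running, and max_rounds is a guard that
exists only so the toy loop always terminates.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Toy model of the pre-copy loop. "pending" stands in for the value
 * reported by .save_live_pending and "per_iter" for the bytes moved by
 * one .save_live_iterate call. All names here are illustrative.
 */
static unsigned ex_precopy_rounds(uint64_t pending, uint64_t per_iter,
                                  uint64_t redirtied, uint64_t threshold,
                                  unsigned max_rounds)
{
    unsigned rounds = 0;

    /* Keep iterating while the pending amount is at or above the threshold. */
    while (pending >= threshold && rounds < max_rounds) {
        uint64_t sent = pending < per_iter ? pending : per_iter;

        pending -= sent;       /* data copied in this iteration */
        pending += redirtied;  /* data produced again in the meantime */
        rounds++;
    }
    return rounds;
}
```

If the device produces data as fast as it is copied, the loop in this model
never converges on its own, which is why the sketch carries an explicit cap.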

Live migration resume path:
Incoming migration calls .load_setup for each device
(RESTORE_VM, _ACTIVE, _STOPPED)
|
For each device, .load_state is called for that device section data
|
At the end, .load_cleanup is called for each device and vCPUs are started.
|
(RUNNING, _NONE, _RUNNING)

Note that:
- Migration post copy is not supported.

v8 -> v9:
- Split patch set in 2 sets, Kernel and QEMU sets.
- Dirty pages bitmap is queried from the IOMMU container rather than from
  the vendor driver per device. Added 2 ioctls to achieve this.

v7 -> v8:
- Updated comments for KABI
- Added BAR address validation check during PCI device's config space load as
  

[PATCH v9 QEMU 11/15] vfio: Add save state functions to SaveVMHandlers

2019-11-12 Thread Kirti Wankhede
Added .save_live_pending, .save_live_iterate and .save_live_complete_precopy
functions. These functions handle the pre-copy and stop-and-copy phases.

In _SAVING|_RUNNING device state or pre-copy phase:
- read pending_bytes. If pending_bytes > 0, go through below steps.
- read data_offset - this read signals the kernel driver to write data to the
  staging buffer.
- read data_size - amount of data in bytes written by vendor driver in
  migration region.
- read data_size bytes of data from data_offset in the migration region.
- Write data packet to file stream as below:
{VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data,
VFIO_MIG_FLAG_END_OF_STATE }

In _SAVING device state or stop-and-copy phase:
a. read config space of device and save to migration file stream. This
   doesn't need to be from vendor driver. Any other special config state
   from driver can be saved as data in following iteration.
b. read pending_bytes. If pending_bytes > 0, go through below steps.
c. read data_offset - this read signals the kernel driver to write data to the
   staging buffer.
d. read data_size - amount of data in bytes written by vendor driver in
   migration region.
e. read data_size bytes of data from data_offset in the migration region.
f. Write data packet as below:
   {VFIO_MIG_FLAG_DEV_DATA_STATE, data_size, actual data}
g. iterate through steps b to f while (pending_bytes > 0)
h. Write {VFIO_MIG_FLAG_END_OF_STATE}
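
Steps b to h above can be sketched as a loop against a mock device. This is a
minimal sketch, not the code from this patch: struct ex_device, struct
ex_stream, the ex_* helpers and both marker values are hypothetical, the
config space save of step a is left out, and the stream stores host-endian
values where the patch itself uses qemu_put_be64().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical marker values; the real VFIO_MIG_FLAG_* values are not shown. */
#define EX_FLAG_DEV_DATA_STATE 0x1ULL
#define EX_FLAG_END_OF_STATE   0x2ULL

/* Mock device: 'pending' bytes left, handed out at most 'chunk' at a time. */
struct ex_device {
    const uint8_t *state;
    size_t pending;
    size_t chunk;
    size_t pos;
};

/* Mock migration stream backed by a fixed buffer. */
struct ex_stream {
    uint8_t buf[256];
    size_t len;
};

static void ex_put_buf(struct ex_stream *s, const void *p, size_t n)
{
    assert(s->len + n <= sizeof(s->buf));
    memcpy(s->buf + s->len, p, n);
    s->len += n;
}

static void ex_put_u64(struct ex_stream *s, uint64_t v)
{
    ex_put_buf(s, &v, sizeof(v));
}

/* Returns the number of data packets written before the end marker. */
static size_t ex_save_complete(struct ex_device *d, struct ex_stream *s)
{
    size_t packets = 0;

    while (d->pending > 0 && d->chunk > 0) {          /* steps b and g */
        size_t data_size = d->pending < d->chunk ?    /* steps c and d */
                           d->pending : d->chunk;

        ex_put_u64(s, EX_FLAG_DEV_DATA_STATE);        /* step f */
        ex_put_u64(s, data_size);
        ex_put_buf(s, d->state + d->pos, data_size);  /* step e */

        d->pos += data_size;
        d->pending -= data_size;
        packets++;
    }
    ex_put_u64(s, EX_FLAG_END_OF_STATE);              /* step h */
    return packets;
}
```

With 10 bytes of device state and a 4 byte staging buffer, the mock emits
three data packets followed by one end-of-state marker.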

When the data region is mapped, it is the user's responsibility to read
data_size bytes of data from data_offset before moving to the next steps.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/migration.c  | 245 ++-
 hw/vfio/trace-events |   6 ++
 2 files changed, 250 insertions(+), 1 deletion(-)

diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
index 48aac6d29876..f890e864e174 100644
--- a/hw/vfio/migration.c
+++ b/hw/vfio/migration.c
@@ -120,6 +120,137 @@ static int vfio_migration_set_state(VFIODevice *vbasedev, 
uint32_t set_flags,
 return 0;
 }
 
+static void *find_data_region(VFIORegion *region,
+  uint64_t data_offset,
+  uint64_t data_size)
+{
+void *ptr = NULL;
+int i;
+
+for (i = 0; i < region->nr_mmaps; i++) {
+if ((data_offset >= region->mmaps[i].offset) &&
+(data_offset < region->mmaps[i].offset + region->mmaps[i].size) &&
+(data_size <= region->mmaps[i].size)) {
+ptr = region->mmaps[i].mmap + (data_offset -
+   region->mmaps[i].offset);
+break;
+}
+}
+return ptr;
+}
+
+static int vfio_save_buffer(QEMUFile *f, VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+VFIORegion *region = &migration->region;
+uint64_t data_offset = 0, data_size = 0;
+int ret;
+
+ret = pread(vbasedev->fd, &data_offset, sizeof(data_offset),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_offset));
+if (ret != sizeof(data_offset)) {
+error_report("%s: Failed to get migration buffer data offset %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+ret = pread(vbasedev->fd, &data_size, sizeof(data_size),
+region->fd_offset + offsetof(struct vfio_device_migration_info,
+ data_size));
+if (ret != sizeof(data_size)) {
+error_report("%s: Failed to get migration buffer data size %d",
+ vbasedev->name, ret);
+return -EINVAL;
+}
+
+if (data_size > 0) {
+void *buf = NULL;
+bool buffer_mmaped;
+
+if (region->mmaps) {
+buf = find_data_region(region, data_offset, data_size);
+}
+
+buffer_mmaped = (buf != NULL) ? true : false;
+
+if (!buffer_mmaped) {
+buf = g_try_malloc0(data_size);
+if (!buf) {
+error_report("%s: Error allocating buffer ", __func__);
+return -ENOMEM;
+}
+
+ret = pread(vbasedev->fd, buf, data_size,
+region->fd_offset + data_offset);
+if (ret != data_size) {
+error_report("%s: Failed to get migration data %d",
+ vbasedev->name, ret);
+g_free(buf);
+return -EINVAL;
+}
+}
+
+qemu_put_be64(f, data_size);
+qemu_put_buffer(f, buf, data_size);
+
+if (!buffer_mmaped) {
+g_free(buf);
+}
+} else {
+qemu_put_be64(f, data_size);
+}
+
+trace_vfio_save_buffer(vbasedev->name, data_offset, data_size,
+   migration->pending_bytes);
+
+ret = qemu_file_get_error(f);
+if (ret) {
+return ret;
+}
+
+return data_size;
+}
+
+static int vfio_update_pending(VFIODevice *vbasedev)
+{
+VFIOMigration 

[PATCH v9 QEMU 01/15] vfio: KABI for migration interface for device state

2019-11-12 Thread Kirti Wankhede
- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
    Bit 0 => _RUNNING
    Bit 1 => _SAVING
    Bit 2 => _RESUMING
  Combination of these bits defines VFIO device's state during migration:
    _RUNNING => Normal VFIO device running state. When this bit is reset
                (cleared), it indicates _STOPPED state. When the device is
                changed to _STOPPED, the driver should stop the device before
                write() returns.
    _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
                          should start saving device state, i.e. pre-copy state
    _SAVING  => vCPUs are stopped, VFIO device should be stopped and should
                save device state, i.e. stop-and-copy state
    _RESUMING => VFIO device resuming state.
    _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states
  Bits 3 - 31 are reserved for future use. User should perform
  read-modify-write operation on this field.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access:
    * device_state: (read/write)
        To convey VFIO device state to be transitioned to. Only 3 bits are
        used as of now, Bits 3 - 31 are reserved for future use.
    * pending bytes: (read only)
        To get pending bytes yet to be migrated for VFIO device.
    * data_offset: (read only)
        To get data offset in migration region from where data exists
        during _SAVING and from where data should be written by user space
        application during _RESUMING state.
    * data_size: (read/write)
        To get and set size in bytes of data copied in migration region
        during _SAVING and _RESUMING state.
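
The bit layout and the read-modify-write rule above can be sketched in a few
lines. This is only an illustration: the EX_* macro names and both helper
functions are hypothetical, the bit positions are the ones listed in this
message, and only the two combinations named above are treated as invalid.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Illustrative names; bit positions follow the commit message. */
#define EX_STATE_RUNNING  (1u << 0)
#define EX_STATE_SAVING   (1u << 1)
#define EX_STATE_RESUMING (1u << 2)
#define EX_STATE_MASK     (EX_STATE_RUNNING | EX_STATE_SAVING | \
                           EX_STATE_RESUMING)

/* Returns false for the two combinations listed as invalid states. */
static bool ex_state_is_valid(uint32_t state)
{
    uint32_t s = state & EX_STATE_MASK;

    if (s == (EX_STATE_SAVING | EX_STATE_RESUMING)) {
        return false;
    }
    if (s == (EX_STATE_RUNNING | EX_STATE_RESUMING)) {
        return false;
    }
    return true;
}

/*
 * Read-modify-write: start from the current value so that the reserved
 * bits 3 - 31 are carried over unchanged.
 */
static uint32_t ex_next_state(uint32_t cur, uint32_t set, uint32_t clear)
{
    return (cur | set) & ~clear;
}
```

For example, moving from pre-copy (_SAVING | _RUNNING) to stop-and-copy
(_SAVING) is ex_next_state(cur, 0, EX_STATE_RUNNING) in this sketch.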

Migration region looks like:
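 ------------------------------------------------------------------
|vfio_device_migration_info|    data section                      |
|                          |     ///////////////////////////////  |
 ------------------------------------------------------------------
 ^                              ^
 offset 0-trapped part        data_offset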

Structure vfio_device_migration_info is always followed by the data section
in the region, so data_offset will always be non-0. The offset from where data
is to be copied is decided by the kernel driver; the data section can be
trapped or mapped depending on how the kernel driver defines the data section.
The data section partition can be defined as mapped by the sparse mmap
capability. If mmapped, then data_offset should be page aligned, whereas the
initial section which contains the vfio_device_migration_info structure might
not end at an offset which is page aligned.
The vendor driver should decide whether to partition the data section and how
to partition it. The vendor driver should return data_offset accordingly.
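
The constraints on data_offset described above can be written down as a small
check. This is a sketch under assumptions: struct ex_migration_info is a
hypothetical stand-in for vfio_device_migration_info (its exact field layout
is assumed here), and ex_data_offset_ok is not part of this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for the structure placed at offset 0. */
struct ex_migration_info {
    uint32_t device_state;
    uint32_t reserved;
    uint64_t pending_bytes;
    uint64_t data_offset;
    uint64_t data_size;
};

/*
 * The data section follows the info structure, so a usable data_offset
 * lies past the structure and inside the region. When the data section
 * is mmapped, data_offset is also expected to be page aligned.
 */
static bool ex_data_offset_ok(uint64_t data_offset, uint64_t region_size,
                              uint64_t page_size, bool mmapped)
{
    if (data_offset < sizeof(struct ex_migration_info)) {
        return false;
    }
    if (data_offset >= region_size) {
        return false;
    }
    if (mmapped && (data_offset % page_size) != 0) {
        return false;
    }
    return true;
}
```

A trapped data section may start right after the structure, while a mapped
one starts at the next page boundary at the earliest.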

For user application, data is opaque. User should write data in the same
order as received.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 linux-headers/linux/vfio.h | 108 +
 1 file changed, 108 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index fb10370d2928..597b3d4bf45e 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,113 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  To indicate vendor driver the state VFIO device should be transitioned
+ *  to. If device state transition fails, write on this field return error.
+ *  It consists of 3 bits:
+ *  - If bit 0 set, indicates _RUNNING state. When it is reset, that indicates
+ *    _STOPPED state. When device is changed to _STOPPED, driver should stop
+ *    device before write() returns.
+ *  - If bit 1 set, indicates _SAVING state. When set, that indicates driver
+ *    should start gathering device state information which will be provided
+ *    to VFIO user space application to save device's state.
+ *  - If bit 2 set, indicates _RESUMING state. When set, that indicates
+ *    prepare to resume device, data provided through migration region
+ 

[PATCH v9 QEMU 07/15] vfio: Add migration region initialization and finalize function

2019-11-12 Thread Kirti Wankhede
- Migration functions are implemented for VFIO_DEVICE_TYPE_PCI device in this
  patch series.
- Whether a VFIO device supports migration is decided based on the migration
  region query. If the migration region query and the migration region
  initialization both succeed, migration is supported; otherwise migration is
  blocked.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 hw/vfio/Makefile.objs |   2 +-
 hw/vfio/migration.c   | 137 ++
 hw/vfio/trace-events  |   3 +
 include/hw/vfio/vfio-common.h |  10 +++
 4 files changed, 151 insertions(+), 1 deletion(-)
 create mode 100644 hw/vfio/migration.c

diff --git a/hw/vfio/Makefile.objs b/hw/vfio/Makefile.objs
index abad8b818c9b..36033d1437c5 100644
--- a/hw/vfio/Makefile.objs
+++ b/hw/vfio/Makefile.objs
@@ -1,4 +1,4 @@
-obj-y += common.o spapr.o
+obj-y += common.o spapr.o migration.o
 obj-$(CONFIG_VFIO_PCI) += pci.o pci-quirks.o display.o
 obj-$(CONFIG_VFIO_CCW) += ccw.o
 obj-$(CONFIG_VFIO_PLATFORM) += platform.o
diff --git a/hw/vfio/migration.c b/hw/vfio/migration.c
new file mode 100644
index ..c17bd1b0b934
--- /dev/null
+++ b/hw/vfio/migration.c
@@ -0,0 +1,137 @@
+/*
+ * Migration support for VFIO devices
+ *
+ * Copyright NVIDIA, Inc. 2019
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2. See
+ * the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include <linux/vfio.h>
+
+#include "hw/vfio/vfio-common.h"
+#include "cpu.h"
+#include "migration/migration.h"
+#include "migration/qemu-file.h"
+#include "migration/register.h"
+#include "migration/blocker.h"
+#include "migration/misc.h"
+#include "qapi/error.h"
+#include "exec/ramlist.h"
+#include "exec/ram_addr.h"
+#include "pci.h"
+#include "trace.h"
+
+static void vfio_migration_region_exit(VFIODevice *vbasedev)
+{
+VFIOMigration *migration = vbasedev->migration;
+
+if (!migration) {
+return;
+}
+
+if (migration->region.size) {
+vfio_region_exit(&migration->region);
+vfio_region_finalize(&migration->region);
+}
+}
+
+static int vfio_migration_region_init(VFIODevice *vbasedev, int index)
+{
+VFIOMigration *migration = vbasedev->migration;
+Object *obj = NULL;
+int ret = -EINVAL;
+
+if (!vbasedev->ops || !vbasedev->ops->vfio_get_object) {
+return ret;
+}
+
+obj = vbasedev->ops->vfio_get_object(vbasedev);
+if (!obj) {
+return ret;
+}
+
+ret = vfio_region_setup(obj, vbasedev, &migration->region, index,
+"migration");
+if (ret) {
+error_report("%s: Failed to setup VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+if (!migration->region.size) {
+ret = -EINVAL;
+error_report("%s: Invalid region size of VFIO migration region %d: %s",
+ vbasedev->name, index, strerror(-ret));
+goto err;
+}
+
+return 0;
+
+err:
+vfio_migration_region_exit(vbasedev);
+return ret;
+}
+
+static int vfio_migration_init(VFIODevice *vbasedev,
+   struct vfio_region_info *info)
+{
+int ret;
+
+vbasedev->migration = g_new0(VFIOMigration, 1);
+
+ret = vfio_migration_region_init(vbasedev, info->index);
+if (ret) {
+error_report("%s: Failed to initialise migration region",
+ vbasedev->name);
+g_free(vbasedev->migration);
+return ret;
+}
+
+return 0;
+}
+
+/* -- */
+
+int vfio_migration_probe(VFIODevice *vbasedev, Error **errp)
+{
+struct vfio_region_info *info;
+Error *local_err = NULL;
+int ret;
+
+ret = vfio_get_dev_region_info(vbasedev, VFIO_REGION_TYPE_MIGRATION,
+   VFIO_REGION_SUBTYPE_MIGRATION, &info);
+if (ret) {
+goto add_blocker;
+}
+
+ret = vfio_migration_init(vbasedev, info);
+if (ret) {
+goto add_blocker;
+}
+
+trace_vfio_migration_probe(vbasedev->name, info->index);
+return 0;
+
+add_blocker:
+error_setg(&vbasedev->migration_blocker,
+   "VFIO device doesn't support migration");
+ret = migrate_add_blocker(vbasedev->migration_blocker, &local_err);
+if (local_err) {
+error_propagate(errp, local_err);
+error_free(vbasedev->migration_blocker);
+}
+return ret;
+}
+
+void vfio_migration_finalize(VFIODevice *vbasedev)
+{
+if (vbasedev->migration_blocker) {
+migrate_del_blocker(vbasedev->migration_blocker);
+error_free(vbasedev->migration_blocker);
+}
+
+vfio_migration_region_exit(vbasedev);
+g_free(vbasedev->migration);
+}
diff --git a/hw/vfio/trace-events b/hw/vfio/trace-events
index 8cdc27946cb8..191a726a1312 100644
--- a/hw/vfio/trace-events
+++ b/hw/vfio/trace-events
@@ -143,3 +143,6 @@ vfio_display_edid_link_up(void) ""
 vfio_display_edid_link_down(void) ""
 

Re: [PATCH v4 12/20] libqtest: add in-process qtest.c tx/rx handlers

2019-11-12 Thread Alexander Bulekov

On 11/6/19 11:56 AM, Stefan Hajnoczi wrote:

On Wed, Oct 30, 2019 at 02:49:58PM +, Oleinik, Alexander wrote:

From: Alexander Oleinik 

Signed-off-by: Alexander Oleinik 
---
There's a particularly ugly line here:
qtest_client_set_tx_handler(qts,
 (void (*)(QTestState *s, const char*, size_t)) send);


Please typedef the function pointer to avoid repetition:

   typedef void (*QTestSendFn)(QTestState *s, const char *buf, size_t len);

And then introduce a wrapper function for type-safety:

   /* A type-safe wrapper for s->send() */
   static void send_wrapper(QTestState *s, const char *buf, size_t len)
   {
   s->send(s, buf, len);
   }

   ...

   qts->send = send;
   qtest_client_set_tx_handler(qts, send_wrapper);

Does this solve the issue?
So there should be two pointers qts->send and qts->ops->send? Otherwise 
qtest_client_set_tx_handler simply overwrites qts->send with the 
send_wrapper.


What I'm worried about is having to cast a
(void (*)(void *s, const char*, size_t)) to a
(void (*)(QTestState *s, const char*, size_t)).
I don't think this is defined according to the standard. If we add a 
secondary send function pointer to qts (void (*)(void *s, const char*, 
size_t)), then I think it's no longer an issue, which I think is what you 
suggest above.
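
A rough sketch of the two-pointer arrangement, to check that I understand the
suggestion. Everything here is hypothetical (ExState, ExSendFn, ExRawSendFn,
ex_record and ex_init are stand-ins, not the libqtest names), and the len
argument is already dropped as discussed further down. The point is that no
function pointer is ever cast to an incompatible type: the typed wrapper
calls the void * backend directly.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct ExState ExState;

/* Type-safe signature used by callers. */
typedef void (*ExSendFn)(ExState *s, const char *buf);
/* Backend signature taking an opaque pointer, as qtest.c-style code does. */
typedef void (*ExRawSendFn)(void *opaque, const char *buf);

struct ExState {
    ExSendFn send;         /* what the client code calls */
    ExRawSendFn raw_send;  /* secondary pointer to the void * backend */
    void *opaque;
    char last[64];         /* records the last message, for the demo only */
};

/* Wrapper: adapts the typed call to the opaque backend without any cast. */
static void ex_send_wrapper(ExState *s, const char *buf)
{
    s->raw_send(s->opaque, buf);
}

/* Demo backend: copies the message into the buffer passed as opaque. */
static void ex_record(void *opaque, const char *buf)
{
    char *dst = opaque;

    strncpy(dst, buf, 63);
    dst[63] = '\0';
}

static void ex_init(ExState *s)
{
    s->last[0] = '\0';
    s->opaque = s->last;
    s->raw_send = ex_record;
    s->send = ex_send_wrapper;
}
```

With this layout, setting the tx handler to the wrapper does not overwrite
the backend pointer, since the two live in separate fields.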



By the way, I also wonder whether the size_t len arguments are necessary
since const char *buf is a NUL-terminated C string.  The string should
be enough since the length can be calculated from it.

I'll change it.


diff --git a/qtest.c b/qtest.c
index 9fbfa0f08f..f817a5d789 100644
--- a/qtest.c
+++ b/qtest.c
@@ -812,6 +812,6 @@ void qtest_server_inproc_recv(void *dummy, const char *buf, 
size_t size)
  g_string_append(gstr, buf);
  if (gstr->str[gstr->len - 1] == '\n') {
  qtest_process_inbuf(NULL, gstr);
-g_string_free(gstr, true);
+g_string_truncate(gstr, 0);


Ah, a fix for the bug in an earlier commit.  Please squash it.


diff --git a/tests/libqtest.c b/tests/libqtest.c
index ff3153daf2..6143af33da 100644
--- a/tests/libqtest.c
+++ b/tests/libqtest.c
@@ -71,6 +71,7 @@ static void qtest_client_set_tx_handler(QTestState *s,
  static void qtest_client_set_rx_handler(QTestState *s,
  GString * (*recv)(QTestState *));
  
+static GString *recv_str;


Can this be a QTestState field?




--
===
I recently changed my last name from Oleinik to Bulekov
===



[PATCH v9 Kernel 4/5] vfio iommu: Implementation of ioctl to get dirty pages bitmap.

2019-11-12 Thread Kirti Wankhede
The IOMMU container maintains a list of externally pinned pages. A bitmap of
pinned pages for the input IO virtual address range is created and returned.
The IO virtual address range should be from a single mapping created by a
map request. The input bitmap_size is validated by calculating the size of
the requested range.
This ioctl returns a bitmap of dirty pages; it is the user space application's
responsibility to copy the content of dirty pages from source to destination
during migration.
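
From the user space side, sizing the bitmap and reading it back can be
sketched as follows. This is an illustration, not part of the patch: the
ex_* helpers are hypothetical, the size calculation mirrors the
ALIGN(npages, BITS_PER_LONG) / 8 check in verify_bitmap_size(), and bits are
indexed by page number from the start of the range, following the "one bit
per page" description.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>

#define EX_BITS_PER_LONG ((uint64_t)(sizeof(unsigned long) * CHAR_BIT))

/* Minimum bitmap size in bytes for a range, rounded up to whole longs. */
static uint64_t ex_bitmap_bytes(uint64_t range_size, uint64_t page_size)
{
    uint64_t npages = range_size / page_size;
    uint64_t aligned = (npages + EX_BITS_PER_LONG - 1) /
                       EX_BITS_PER_LONG * EX_BITS_PER_LONG;

    return aligned / CHAR_BIT;
}

/* Returns 1 if the page containing 'iova' is marked in the bitmap. */
static int ex_page_is_dirty(const unsigned long *bitmap, uint64_t iova,
                            uint64_t start_iova, uint64_t page_size)
{
    uint64_t bit = (iova - start_iova) / page_size;

    return (int)((bitmap[bit / EX_BITS_PER_LONG] >>
                  (bit % EX_BITS_PER_LONG)) & 1UL);
}
```

A 1 MiB range with 4 KiB pages covers 256 pages, so the caller would pass a
bitmap of at least 32 bytes.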

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 92 +
 1 file changed, 92 insertions(+)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index 2ada8e6cdb88..ac176e672857 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -850,6 +850,81 @@ static unsigned long vfio_pgsize_bitmap(struct vfio_iommu 
*iommu)
return bitmap;
 }
 
+/*
+ * start_iova is the reference from where bitmapping started. This is called
+ * from DMA_UNMAP where start_iova can be different than iova
+ */
+
+static int vfio_iova_dirty_bitmap(struct vfio_iommu *iommu, dma_addr_t iova,
+ size_t size, dma_addr_t start_iova,
+ unsigned long *bitmap)
+{
+   struct vfio_dma *dma;
+   dma_addr_t temp_iova = iova;
+
+   dma = vfio_find_dma(iommu, iova, size);
+   if (!dma)
+   return -EINVAL;
+
+   /*
+* Range should be from a single mapping created by map request.
+*/
+
+   if ((iova < dma->iova) ||
+   ((dma->iova + dma->size) < (iova + size)))
+   return -EINVAL;
+
+   while (temp_iova < iova + size) {
+   struct vfio_pfn *vpfn = NULL;
+
+   vpfn = vfio_find_vpfn(dma, temp_iova);
+   if (vpfn)
+   __bitmap_set(bitmap, vpfn->iova - start_iova, 1);
+
+   temp_iova += PAGE_SIZE;
+   }
+
+   return 0;
+}
+
+static int verify_bitmap_size(unsigned long npages, unsigned long bitmap_size)
+{
+   unsigned long bsize = ALIGN(npages, BITS_PER_LONG) / 8;
+
+   if ((bitmap_size == 0) || (bitmap_size < bsize))
+   return -EINVAL;
+   return 0;
+}
+
+static int vfio_iova_get_dirty_bitmap(struct vfio_iommu *iommu,
+   struct vfio_iommu_type1_dirty_bitmap *range)
+{
+   unsigned long *bitmap;
+   int ret;
+
+   ret = verify_bitmap_size(range->size >> PAGE_SHIFT, range->bitmap_size);
+   if (ret)
+   return ret;
+
+   /* one bit per page */
+   bitmap = bitmap_zalloc(range->size >> PAGE_SHIFT, GFP_KERNEL);
+   if (!bitmap)
+   return -ENOMEM;
+
+   mutex_lock(&iommu->lock);
+   ret = vfio_iova_dirty_bitmap(iommu, range->iova, range->size,
+range->iova, bitmap);
+   mutex_unlock(&iommu->lock);
+
+   if (!ret) {
+   if (copy_to_user(range->bitmap, bitmap, range->bitmap_size))
+   ret = -EFAULT;
+   }
+
+   bitmap_free(bitmap);
+   return ret;
+}
+
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
 struct vfio_iommu_type1_dma_unmap *unmap)
 {
@@ -2297,6 +2372,23 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
 
return copy_to_user((void __user *)arg, &unmap, minsz) ?
-EFAULT : 0;
+   } else if (cmd == VFIO_IOMMU_GET_DIRTY_BITMAP) {
+   struct vfio_iommu_type1_dirty_bitmap range;
+
+   /* Supported for v2 version only */
+   if (!iommu->v2)
+   return -EACCES;
+
+   minsz = offsetofend(struct vfio_iommu_type1_dirty_bitmap,
+   bitmap);
+
+   if (copy_from_user(&range, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (range.argsz < minsz)
+   return -EINVAL;
+
+   return vfio_iova_get_dirty_bitmap(iommu, &range);
}
 
return -ENOTTY;
-- 
2.7.0




[PATCH v9 QEMU 03/15] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap

2019-11-12 Thread Kirti Wankhede
With vIOMMU, during the pre-copy phase of migration, while CPUs are still
running, an IO virtual address unmap can happen while the device still holds
references to guest pfns. Those pages should be reported as dirty before the
unmap, so that the VFIO user space application can copy the content of those
pages from source to destination.

The ioctl definition added here adds a bitmap pointer, size and flag. If the
flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set, bitmap memory is allocated
and bitmap_size is set, then the ioctl will create a bitmap of pinned pages
and then unmap those.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 linux-headers/linux/vfio.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 2b00c732f313..520e952e3daf 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -925,6 +925,39 @@ struct vfio_iommu_type1_dirty_bitmap {
 
 #define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_UNMAP_DMA_GET_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
+ struct vfio_iommu_type1_dma_unmap_bitmap)
+ *
+ * Unmap IO virtual addresses using the provided struct
+ * vfio_iommu_type1_dma_unmap_bitmap.  Caller sets argsz.
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get dirty bitmap
+ * before unmapping IO virtual addresses. If this flag is not set, only IO
+ * virtual address are unmapped, that is, behave same as VFIO_IOMMU_UNMAP_DMA
+ * ioctl.
+ * User should allocate memory to get bitmap and should set size of allocated
+ * memory in bitmap_size field. One bit is used to represent per page
+ * consecutively starting from iova offset. Bit set indicates page at that
+ * offset from iova is dirty.
+ * The actual unmapped size is returned in the size field and bitmap of pages
+ * in the range of unmapped size is returned in bitmap if flag
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set.
+ *
+ * No guarantee is made to the user that arbitrary unmaps of iova or size
+ * different from those used in the original mapping call will succeed.
+ */
+struct vfio_iommu_type1_dma_unmap_bitmap {
+   __u32argsz;
+   __u32flags;
+#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
+   __u64iova;/* IO virtual address */
+   __u64size;/* Size of mapping (bytes) */
+   __u64bitmap_size; /* in bytes */
+   void*bitmap;  /* one bit per page */
+};
+
+#define VFIO_IOMMU_UNMAP_DMA_GET_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




[PATCH v9 Kernel 5/5] vfio iommu: Implementation of ioctl to get dirty bitmap before unmap

2019-11-12 Thread Kirti Wankhede
If pages are pinned by external interface for requested IO virtual address
range, bitmap of such pages is created and then that range is unmapped.
To get bitmap during unmap, user should set flag
VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP, bitmap memory should be allocated and
bitmap_size should be set. If flag is not set, then it behaves same as
VFIO_IOMMU_UNMAP_DMA ioctl.
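
What the caller is expected to prepare before issuing the ioctl can be
sketched like this. It is only a sketch under assumptions: struct
ex_unmap_bitmap is a local mirror of the structure proposed in this series
(not a header that exists today), ex_prepare_unmap is a hypothetical helper,
and the actual ioctl() call is left out so the example stays self-contained.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Local mirror of the proposed request layout, for illustration only. */
struct ex_unmap_bitmap {
    uint32_t argsz;
    uint32_t flags;
    uint64_t iova;         /* IO virtual address */
    uint64_t size;         /* size of mapping (bytes) */
    uint64_t bitmap_size;  /* in bytes */
    void *bitmap;          /* one bit per page */
};

#define EX_UNMAP_FLAG_GET_DIRTY_BITMAP (1u << 0)

/* Fills in a request; returns 0 on success, -1 on bad input or no memory. */
static int ex_prepare_unmap(struct ex_unmap_bitmap *req, uint64_t iova,
                            uint64_t size, uint64_t page_size,
                            int want_bitmap)
{
    const uint64_t bits_per_long = sizeof(unsigned long) * CHAR_BIT;
    uint64_t npages = size / page_size;

    memset(req, 0, sizeof(*req));
    req->argsz = (uint32_t)sizeof(*req);
    req->iova = iova;
    req->size = size;

    if (want_bitmap) {
        uint64_t nlongs = (npages + bits_per_long - 1) / bits_per_long;

        if (nlongs == 0) {
            return -1;  /* a zero bitmap_size is rejected on the kernel side */
        }
        /* Zeroed bitmap, rounded up to a whole number of longs. */
        req->bitmap = calloc((size_t)nlongs, sizeof(unsigned long));
        if (!req->bitmap) {
            return -1;
        }
        req->bitmap_size = nlongs * sizeof(unsigned long);
        req->flags |= EX_UNMAP_FLAG_GET_DIRTY_BITMAP;
    }
    return 0;
}
```

Without the flag, the request carries no bitmap and is meant to behave like a
plain VFIO_IOMMU_UNMAP_DMA call. The caller owns the bitmap and frees it
after consuming the result.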

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 drivers/vfio/vfio_iommu_type1.c | 71 +++--
 1 file changed, 69 insertions(+), 2 deletions(-)

diff --git a/drivers/vfio/vfio_iommu_type1.c b/drivers/vfio/vfio_iommu_type1.c
index ac176e672857..d6b988452ba6 100644
--- a/drivers/vfio/vfio_iommu_type1.c
+++ b/drivers/vfio/vfio_iommu_type1.c
@@ -926,7 +926,8 @@ static int vfio_iova_get_dirty_bitmap(struct vfio_iommu 
*iommu,
 }
 
 static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
-struct vfio_iommu_type1_dma_unmap *unmap)
+struct vfio_iommu_type1_dma_unmap *unmap,
+unsigned long *bitmap)
 {
uint64_t mask;
struct vfio_dma *dma, *dma_last = NULL;
@@ -1026,6 +1027,12 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
&nb_unmap);
goto again;
}
+
+   if (bitmap) {
+   vfio_iova_dirty_bitmap(iommu, dma->iova, dma->size,
+  unmap->iova, bitmap);
+   }
+
unmapped += dma->size;
vfio_remove_dma(iommu, dma);
}
@@ -1039,6 +1046,43 @@ static int vfio_dma_do_unmap(struct vfio_iommu *iommu,
return ret;
 }
 
+static int vfio_dma_do_unmap_bitmap(struct vfio_iommu *iommu,
+   struct vfio_iommu_type1_dma_unmap_bitmap *unmap_bitmap)
+{
+   struct vfio_iommu_type1_dma_unmap unmap;
+   unsigned long *bitmap = NULL;
+   int ret;
+
+   /* check bitmap size */
+   if ((unmap_bitmap->flags & VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP)) {
+   ret = verify_bitmap_size(unmap_bitmap->size >> PAGE_SHIFT,
+unmap_bitmap->bitmap_size);
+   if (ret)
+   return ret;
+
+   /* one bit per page */
+   bitmap = bitmap_zalloc(unmap_bitmap->size >> PAGE_SHIFT,
+   GFP_KERNEL);
+   if (!bitmap)
+   return -ENOMEM;
+   }
+
+   unmap.iova = unmap_bitmap->iova;
+   unmap.size = unmap_bitmap->size;
+   ret = vfio_dma_do_unmap(iommu, &unmap, bitmap);
+   if (!ret)
+   unmap_bitmap->size = unmap.size;
+
+   if (bitmap) {
+   if (!ret && copy_to_user(unmap_bitmap->bitmap, bitmap,
+unmap_bitmap->bitmap_size))
+   ret = -EFAULT;
+   bitmap_free(bitmap);
+   }
+
+   return ret;
+}
+
 static int vfio_iommu_map(struct vfio_iommu *iommu, dma_addr_t iova,
  unsigned long pfn, long npage, int prot)
 {
@@ -2366,7 +2410,7 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
if (unmap.argsz < minsz || unmap.flags)
return -EINVAL;
 
-   ret = vfio_dma_do_unmap(iommu, &unmap);
+   ret = vfio_dma_do_unmap(iommu, &unmap, NULL);
if (ret)
return ret;
 
@@ -2389,6 +2433,29 @@ static long vfio_iommu_type1_ioctl(void *iommu_data,
return -EINVAL;
 
return vfio_iova_get_dirty_bitmap(iommu, );
+   } else if (cmd == VFIO_IOMMU_UNMAP_DMA_GET_BITMAP) {
+   struct vfio_iommu_type1_dma_unmap_bitmap unmap_bitmap;
+   long ret;
+
+   /* Supported for v2 version only */
+   if (!iommu->v2)
+   return -EACCES;
+
+   minsz = offsetofend(struct vfio_iommu_type1_dma_unmap_bitmap,
+   bitmap);
+
+   if (copy_from_user(&unmap_bitmap, (void __user *)arg, minsz))
+   return -EFAULT;
+
+   if (unmap_bitmap.argsz < minsz)
+   return -EINVAL;
+
+   ret = vfio_dma_do_unmap_bitmap(iommu, &unmap_bitmap);
+   if (ret)
+   return ret;
+
+   return copy_to_user((void __user *)arg, &unmap_bitmap, minsz) ?
+   -EFAULT : 0;
}
 
return -ENOTTY;
-- 
2.7.0




[PATCH v9 Kernel 0/5] Add KABIs to support migration for VFIO devices

2019-11-12 Thread Kirti Wankhede
Hi Alex,

To keep the kernel and QEMU patches in sync, this patch set keeps the v9
version number. Up to v8, the KABI was discussed in the QEMU patch series [1].
As discussed in earlier mails and in person at KVM Forum, this patch set
adds:
* New IOCTL VFIO_IOMMU_GET_DIRTY_BITMAP to get the dirty pages bitmap per
  IOMMU container rather than per device. All pages pinned by the vendor
  driver through the vfio_pin_pages external API have to be marked as
  dirty during migration.
  When there are CPU writes, CPU dirty page tracking can identify dirtied
  pages, but any page pinned by the vendor driver can also be written by
  the device. As of now, no device has hardware support for dirty page
  tracking, so all pages pinned by the vendor driver should be considered
  dirty.
* New IOCTL VFIO_IOMMU_UNMAP_DMA_GET_BITMAP to get the dirty pages bitmap
  before unmapping an IO virtual address range.
  With vIOMMU, during the pre-copy phase of migration, while CPUs are still
  running, an IO virtual address unmap can happen while the device still
  holds references to guest pfns. Those pages should be reported as dirty
  before the unmap, so that the VFIO user space application can copy their
  content from source to destination.
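
As a rough illustration of how a user space consumer might walk a returned
bitmap, the sketch below visits every dirty page and reports its IO virtual
address. The function and callback names are hypothetical, and the bit
layout (bit n of 64-bit word n/64, least significant bit first) is an
assumption rather than something this series specifies.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical callback type: called once per dirty page. */
typedef void (*dirty_page_fn)(uint64_t iova, void *opaque);

/* Example callback: adds each reported iova to a running sum. */
static inline void accumulate_iova(uint64_t iova, void *opaque)
{
    *(uint64_t *)opaque += iova;
}

/*
 * Walk `npages` bits of `bitmap`; bit i describes the page at
 * base_iova + (i << page_shift). Returns the number of dirty pages.
 * Assumed layout: bit i lives in word i / 64 at position i % 64.
 */
static inline size_t for_each_dirty_page(const uint64_t *bitmap,
                                         uint64_t npages,
                                         uint64_t base_iova,
                                         unsigned page_shift,
                                         dirty_page_fn fn, void *opaque)
{
    size_t count = 0;

    for (uint64_t page = 0; page < npages; page++) {
        if (bitmap[page / 64] & (UINT64_C(1) << (page % 64))) {
            if (fn) {
                fn(base_iova + (page << page_shift), opaque);
            }
            count++;
        }
    }
    return count;
}
```

A migration loop would typically pass a callback that queues the page for
copying to the destination.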

Yet TODO:
Since no device has hardware support for system memory dirty bitmap
tracking, there is currently no API for the vendor driver to report dirty
pages to the VFIO IOMMU module. In the future, when such hardware support
is implemented, an API will be required so that the vendor driver can
report dirty pages to the VFIO module during the migration phases.

[1] https://www.mail-archive.com/qemu-devel@nongnu.org/msg640400.html

Adding the revision history from the previous QEMU patch set to help
understand the KABI changes made so far:

v8 -> v9:
- Split patch set in 2 sets, Kernel and QEMU.
- Dirty pages bitmap is queried from the IOMMU container rather than per
  device from the vendor driver. Added 2 ioctls to achieve this.

v7 -> v8:
- Updated comments for KABI
- Added BAR address validation check during PCI device's config space load
  as suggested by Dr. David Alan Gilbert.
- Changed vfio_migration_set_state() to set or clear device state flags.
- Some nit fixes.

v6 -> v7:
- Fix build failures.

v5 -> v6:
- Fix build failure.

v4 -> v5:
- Added descriptive comment about the sequence in which members of
  structure vfio_device_migration_info should be accessed, based on Alex's
  suggestion.
- Updated get dirty pages sequence.
- As per Cornelia Huck's suggestion, added callbacks to VFIODeviceOps to
  get_object, save_config and load_config.
- Fixed multiple nit picks.
- Tested live migration with multiple vfio device assigned to a VM.

v3 -> v4:
- Added one more bit for _RESUMING flag to be set explicitly.
- data_offset field is read-only for user space application.
- data_size is read on every iteration before reading data from the
  migration region; this removes the assumption that data extends to the
  end of the migration region.
- If the vendor driver supports mappable sparse regions, map those regions
  during the setup state of save/load, and unmap them from the cleanup
  routines.
- Handled a race condition that caused data corruption in the migration
  region while saving device state, by adding a mutex and serializing the
  save_buffer and get_dirty_pages routines.
- Skip calling the get_dirty_pages routine for mapped MMIO regions of the
  device.
- Added trace events.
- Split into multiple functional patches.

v2 -> v3:
- Removed enum of VFIO device states. Defined VFIO device state with 2
  bits.
- Re-structured vfio_device_migration_info to keep it minimal and defined
  action on read and write access on its members.

v1 -> v2:
- Defined MIGRATION region type and sub-type which should be used with
  region type capability.
- Re-structured vfio_device_migration_info. This structure will be placed
  at 0th offset of migration region.
- Replaced ioctl with read/write for trapped part of migration region.
- Added support for both types of access, trapped or mmapped, for the data
  section of the region.
- Moved PCI device functions to pci file.
- Added iteration to get dirty page bitmap until bitmap for all requested
  pages are copied.

Thanks,
Kirti

Kirti Wankhede (5):
  vfio: KABI for migration interface for device state
  vfio iommu: Add ioctl defination to get dirty pages bitmap.
  vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap
  vfio iommu: Implementation of ioctl to get dirty pages bitmap.
  vfio iommu: Implementation of ioctl to get dirty bitmap before unmap

 drivers/vfio/vfio_iommu_type1.c | 163 ++-
 include/uapi/linux/vfio.h   | 164 
 2 files changed, 325 insertions(+), 2 deletions(-)

-- 
2.7.0




[PATCH v9 Kernel 3/5] vfio iommu: Add ioctl defination to unmap IOVA and return dirty bitmap

2019-11-12 Thread Kirti Wankhede
With vIOMMU, during the pre-copy phase of migration, while CPUs are still
running, an IO virtual address unmap can happen while the device still
holds references to guest pfns. Those pages should be reported as dirty
before the unmap, so that the VFIO user space application can copy their
content from source to destination.

The IOCTL definition added here adds a bitmap pointer, size and flag. If the
flag VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set, bitmap memory is allocated
and bitmap_size is set, then the ioctl creates a bitmap of pinned pages and
then unmaps them.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 33 +
 1 file changed, 33 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 6fd3822aa610..72fd297baf52 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -925,6 +925,39 @@ struct vfio_iommu_type1_dirty_bitmap {
 
 #define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
 
+/**
+ * VFIO_IOMMU_UNMAP_DMA_GET_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 18,
+ *   struct vfio_iommu_type1_dma_unmap_bitmap)
+ *
+ * Unmap IO virtual addresses using the provided struct
+ * vfio_iommu_type1_dma_unmap_bitmap.  Caller sets argsz.
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP should be set to get the dirty bitmap
+ * before unmapping IO virtual addresses. If this flag is not set, the IO
+ * virtual addresses are unmapped without creating a bitmap of pinned pages,
+ * that is, the ioctl behaves the same as VFIO_IOMMU_UNMAP_DMA.
+ * The user should allocate memory for the bitmap and set the size of the
+ * allocated memory in the bitmap_size field. Each bit in the bitmap represents
+ * one page, consecutively starting from the iova offset. A set bit indicates
+ * that the page at that offset from iova is dirty.
+ * The actual unmapped size is returned in the size field, and the bitmap of
+ * pages in the unmapped range is returned in bitmap if the flag
+ * VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP is set.
+ *
+ * No guarantee is made to the user that arbitrary unmaps of iova or size
+ * different from those used in the original mapping call will succeed.
+ */
+struct vfio_iommu_type1_dma_unmap_bitmap {
+   __u32        argsz;
+   __u32        flags;
+#define VFIO_DMA_UNMAP_FLAG_GET_DIRTY_BITMAP (1 << 0)
+   __u64        iova;        /* IO virtual address */
+   __u64        size;        /* Size of mapping (bytes) */
+   __u64        bitmap_size; /* in bytes */
+   void __user *bitmap;      /* one bit per page */
+};
+
+#define VFIO_IOMMU_UNMAP_DMA_GET_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 18)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




[PATCH v9 QEMU 02/15] vfio iommu: Add ioctl defination to get dirty pages bitmap.

2019-11-12 Thread Kirti Wankhede
All pages pinned by the vendor driver through the vfio_pin_pages API should
be considered dirty during migration. The IOMMU container maintains a list
of all such pinned pages. Added an ioctl definition to get a bitmap of such
pinned pages for the requested IO virtual address range.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 linux-headers/linux/vfio.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/linux-headers/linux/vfio.h b/linux-headers/linux/vfio.h
index 597b3d4bf45e..2b00c732f313 100644
--- a/linux-headers/linux/vfio.h
+++ b/linux-headers/linux/vfio.h
@@ -902,6 +902,29 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ *
+ * IOCTL to get dirty pages bitmap for IOMMU container during migration.
+ * Get dirty pages bitmap of given IO virtual addresses range using
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. User should allocate memory to get
+ * bitmap and should set size of allocated memory in bitmap_size field.
+ * Each bit represents one page, consecutively starting from the iova offset.
+ * A set bit indicates that the page at that offset from iova is dirty.
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32   argsz;
+   __u32   flags;
+   __u64   iova;          /* IO virtual address */
+   __u64   size;          /* Size of iova range */
+   __u64   bitmap_size;   /* in bytes */
+   void   *bitmap;        /* one bit per page */
+};
+
+#define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




[PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-12 Thread Kirti Wankhede
- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
    Bit 0 => _RUNNING
    Bit 1 => _SAVING
    Bit 2 => _RESUMING
    Combination of these bits defines the VFIO device's state during
    migration:
    _RUNNING => Normal VFIO device running state. When this bit is cleared,
                it indicates _STOPPED state. When the device is changed to
                _STOPPED, the driver should stop the device before write()
                returns.
    _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
                          starts saving device state, i.e. pre-copy state.
    _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
                device state is saved, i.e. stop-and-copy state.
    _RESUMING => VFIO device resuming state.
    _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states.
    Bits 3 - 31 are reserved for future use. User should perform a
    read-modify-write operation on this field.
- Defined vfio_device_migration_info structure which will be placed at 0th
  offset of migration region to get/set VFIO device related information.
  Defined members of structure and usage on read/write access:
    * device_state: (read/write)
        To convey the VFIO device state to be transitioned to. Only 3 bits
        are used as of now; bits 3 - 31 are reserved for future use.
    * pending bytes: (read only)
        To get pending bytes yet to be migrated for the VFIO device.
    * data_offset: (read only)
        To get the data offset in the migration region from where data
        exists during _SAVING and from where data should be written by the
        user space application during _RESUMING state.
    * data_size: (read/write)
        To get and set the size in bytes of data copied in the migration
        region during _SAVING and _RESUMING state.

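Migration region looks like:
 ------------------------------------------------------------------
|vfio_device_migration_info|    data section                      |
|                          |     ///////////////////////////////  |
 ------------------------------------------------------------------
 ^                          ^
 offset 0-trapped part      data_offset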

Structure vfio_device_migration_info is always followed by the data section
in the region, so data_offset will always be non-0. The offset from which
data is to be copied is decided by the kernel driver. The data section can
be trapped or mapped, depending on how the kernel driver defines it.
A data section partition can be defined as mapped by the sparse mmap
capability. If mmapped, data_offset should be page aligned, whereas the
initial section which contains the vfio_device_migration_info structure
might not end at a page-aligned offset.
The vendor driver should decide whether and how to partition the data
section, and should return data_offset accordingly.

For the user application, data is opaque. The user should write data in the
same order as received.
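
The device_state encoding described above can be sketched as a small
validity check. The macro and function names below are placeholders for
illustration only (the real definitions live in the header added by this
patch), and treating all three bits set as invalid is an assumption that
follows from the two invalid combinations listed.

```c
#include <stdbool.h>
#include <stdint.h>

/* Placeholder names for the three state bits described above. */
#define DEV_STATE_RUNNING   (1u << 0)
#define DEV_STATE_SAVING    (1u << 1)
#define DEV_STATE_RESUMING  (1u << 2)
#define DEV_STATE_MASK      (DEV_STATE_RUNNING | DEV_STATE_SAVING | \
                             DEV_STATE_RESUMING)

/*
 * Returns true if the low 3 bits form one of the valid states:
 * 0 (_STOPPED), _RUNNING, _SAVING, _SAVING | _RUNNING or _RESUMING.
 * _RESUMING combined with _RUNNING or _SAVING is invalid.
 */
static inline bool dev_state_valid(uint32_t state)
{
    uint32_t s = state & DEV_STATE_MASK;

    if ((s & DEV_STATE_RESUMING) &&
        (s & (DEV_STATE_RUNNING | DEV_STATE_SAVING))) {
        return false;
    }
    return true;
}

/*
 * Read-modify-write: replace only the 3 state bits and preserve the
 * reserved bits 3 - 31 as read from the device.
 */
static inline uint32_t dev_state_update(uint32_t old, uint32_t new_bits)
{
    return (old & ~DEV_STATE_MASK) | (new_bits & DEV_STATE_MASK);
}
```

A user space application would read device_state, pass it through something
like dev_state_update(), and write the result back.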

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 108 ++
 1 file changed, 108 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 9e843a147ead..35b09427ad9f 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -305,6 +305,7 @@ struct vfio_region_info_cap_type {
 #define VFIO_REGION_TYPE_PCI_VENDOR_MASK   (0x)
 #define VFIO_REGION_TYPE_GFX(1)
 #define VFIO_REGION_TYPE_CCW   (2)
+#define VFIO_REGION_TYPE_MIGRATION  (3)
 
 /* sub-types for VFIO_REGION_TYPE_PCI_* */
 
@@ -379,6 +380,113 @@ struct vfio_region_gfx_edid {
 /* sub-types for VFIO_REGION_TYPE_CCW */
 #define VFIO_REGION_SUBTYPE_CCW_ASYNC_CMD  (1)
 
+/* sub-types for VFIO_REGION_TYPE_MIGRATION */
+#define VFIO_REGION_SUBTYPE_MIGRATION   (1)
+
+/*
+ * Structure vfio_device_migration_info is placed at 0th offset of
+ * VFIO_REGION_SUBTYPE_MIGRATION region to get/set VFIO device related migration
+ * information. Field accesses from this structure are only supported at their
+ * native width and alignment, otherwise the result is undefined and vendor
+ * drivers should return an error.
+ *
+ * device_state: (read/write)
+ *  To indicate vendor driver the state VFIO device should be transitioned
+ *  to. If device state transition fails, write on this field return error.
+ *  It consists of 3 bits:
+ *  - If bit 0 set, indicates _RUNNING state. When it is reset, that indicates
+ *    _STOPPED state. When device is changed to _STOPPED, driver should stop
+ *    device before write() returns.
+ *  - If bit 1 set, indicates _SAVING state. When set, that indicates driver
+ *    should start gathering device state information which will be provided
+ *    to VFIO user space application to save device's state.
+ *  - If bit 2 set, indicates _RESUMING state. When set, that indicates
+ *prepare to resume 

[PATCH v9 Kernel 2/5] vfio iommu: Add ioctl defination to get dirty pages bitmap.

2019-11-12 Thread Kirti Wankhede
All pages pinned by the vendor driver through the vfio_pin_pages API should
be considered dirty during migration. The IOMMU container maintains a list
of all such pinned pages. Added an ioctl definition to get a bitmap of such
pinned pages for the requested IO virtual address range.

Signed-off-by: Kirti Wankhede 
Reviewed-by: Neo Jia 
---
 include/uapi/linux/vfio.h | 23 +++
 1 file changed, 23 insertions(+)

diff --git a/include/uapi/linux/vfio.h b/include/uapi/linux/vfio.h
index 35b09427ad9f..6fd3822aa610 100644
--- a/include/uapi/linux/vfio.h
+++ b/include/uapi/linux/vfio.h
@@ -902,6 +902,29 @@ struct vfio_iommu_type1_dma_unmap {
 #define VFIO_IOMMU_ENABLE  _IO(VFIO_TYPE, VFIO_BASE + 15)
 #define VFIO_IOMMU_DISABLE _IO(VFIO_TYPE, VFIO_BASE + 16)
 
+/**
+ * VFIO_IOMMU_GET_DIRTY_BITMAP - _IOWR(VFIO_TYPE, VFIO_BASE + 17,
+ * struct vfio_iommu_type1_dirty_bitmap)
+ *
+ * IOCTL to get dirty pages bitmap for IOMMU container during migration.
+ * Get dirty pages bitmap of given IO virtual addresses range using
+ * struct vfio_iommu_type1_dirty_bitmap. Caller sets argsz, which is size of
+ * struct vfio_iommu_type1_dirty_bitmap. User should allocate memory to get
+ * bitmap and should set size of allocated memory in bitmap_size field.
+ * Each bit represents one page, consecutively starting from the iova offset.
+ * A set bit indicates that the page at that offset from iova is dirty.
+ */
+struct vfio_iommu_type1_dirty_bitmap {
+   __u32        argsz;
+   __u32        flags;
+   __u64        iova;          /* IO virtual address */
+   __u64        size;          /* Size of iova range */
+   __u64        bitmap_size;   /* in bytes */
+   void __user *bitmap;        /* one bit per page */
+};
+
+#define VFIO_IOMMU_GET_DIRTY_BITMAP _IO(VFIO_TYPE, VFIO_BASE + 17)
+
 /*  Additional API for SPAPR TCE (Server POWERPC) IOMMU  */
 
 /*
-- 
2.7.0




Re: [PATCH v1 3/5] Add use of RCU for qemu_logfile.

2019-11-12 Thread Alex Bennée


Robert Foley  writes:

> This now allows changing the logfile while logging is active,
> and also solves the issue of a seg fault while changing the logfile.
>
> Any read access to the qemu_logfile handle will use
> the rcu_read_lock()/unlock() around the use of the handle.
> To fetch the handle we will use atomic_rcu_read().
> We also in many cases do a check for validity of the
> logfile handle before using it to deal with the case where the
> file is closed and set to NULL.
>
> The cases where we write to the qemu_logfile will use atomic_rcu_set().
> Writers will also use call_rcu() with a newly added qemu_logfile_free
> function for freeing/closing when readers have finished.
>
> Signed-off-by: Robert Foley 
> ---
> v1
> - Changes for review comments.
> - Minor changes to definition of QemuLogFile.
> - Changed qemu_log_separate() to fix unbalanced and
>   remove qemu_log_enabled() check.
> - changed qemu_log_lock() to include else.
> - make qemu_logfile_free static.
> - use g_assert(logfile) in qemu_logfile_free.
> - Relocated unlock out of if/else in qemu_log_close(), and
>   in qemu_set_log().
> ---
>  include/qemu/log.h | 42 ++
>  util/log.c | 73 +-
>  include/exec/log.h | 33 ++---
>  tcg/tcg.c  | 12 ++--
>  4 files changed, 128 insertions(+), 32 deletions(-)
>
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index a7c5b01571..528e1f9dd7 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -3,9 +3,16 @@
>
>  /* A small part of this API is split into its own header */
>  #include "qemu/log-for-trace.h"
> +#include "qemu/rcu.h"
> +
> +typedef struct QemuLogFile {
> +struct rcu_head rcu;
> +FILE *fd;
> +} QemuLogFile;
>
>  /* Private global variable, don't use */
> -extern FILE *qemu_logfile;
> +extern QemuLogFile *qemu_logfile;
> +
>
>  /*
>   * The new API:
> @@ -25,7 +32,16 @@ static inline bool qemu_log_enabled(void)
>   */
>  static inline bool qemu_log_separate(void)
>  {
> -return qemu_logfile != NULL && qemu_logfile != stderr;
> +QemuLogFile *logfile;
> +bool res = false;
> +
> +rcu_read_lock();
> +logfile = atomic_rcu_read(&qemu_logfile);
> +if (logfile && logfile->fd != stderr) {
> +res = true;
> +}
> +rcu_read_unlock();
> +return res;
>  }
>
>  #define CPU_LOG_TB_OUT_ASM (1 << 0)
> @@ -55,14 +71,23 @@ static inline bool qemu_log_separate(void)
>
>  static inline FILE *qemu_log_lock(void)
>  {
> -qemu_flockfile(qemu_logfile);
> -return logfile->fd;
> +QemuLogFile *logfile;
> +rcu_read_lock();
> +logfile = atomic_rcu_read(&qemu_logfile);
> +if (logfile) {
> +qemu_flockfile(logfile->fd);
> +return logfile->fd;
> +} else {
> +rcu_read_unlock();
> +return NULL;
> +}
>  }
>
>  static inline void qemu_log_unlock(FILE *fd)
>  {
>  if (fd) {
>  qemu_funlockfile(fd);
> +rcu_read_unlock();
>  }
>  }
>
> @@ -73,9 +98,14 @@ static inline void qemu_log_unlock(FILE *fd)
>  static inline void GCC_FMT_ATTR(1, 0)
>  qemu_log_vprintf(const char *fmt, va_list va)
>  {
> -if (qemu_logfile) {
> -vfprintf(qemu_logfile, fmt, va);
> +QemuLogFile *logfile;
> +
> +rcu_read_lock();
> +logfile = atomic_rcu_read(&qemu_logfile);
> +if (logfile) {
> +vfprintf(logfile->fd, fmt, va);
>  }
> +rcu_read_unlock();
>  }
>
>  /* log only if a bit is set on the current loglevel mask:
> diff --git a/util/log.c b/util/log.c
> index c25643dc99..802b8de42e 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -28,7 +28,7 @@
>
>  static char *logfilename;
>  static QemuMutex qemu_logfile_mutex;
> -FILE *qemu_logfile;
> +QemuLogFile *qemu_logfile;
>  int qemu_loglevel;
>  static int log_append = 0;
>  static GArray *debug_regions;
> @@ -37,10 +37,14 @@ static GArray *debug_regions;
>  int qemu_log(const char *fmt, ...)
>  {
>  int ret = 0;
> -if (qemu_logfile) {
> +QemuLogFile *logfile;
> +
> +rcu_read_lock();
> +logfile = atomic_rcu_read(&qemu_logfile);
> +if (logfile) {
>  va_list ap;
>  va_start(ap, fmt);
> -ret = vfprintf(qemu_logfile, fmt, ap);
> +ret = vfprintf(logfile->fd, fmt, ap);
>  va_end(ap);
>
>  /* Don't pass back error results.  */
> @@ -48,6 +52,7 @@ int qemu_log(const char *fmt, ...)
>  ret = 0;
>  }
>  }
> +rcu_read_unlock();
>  return ret;
>  }
>
> @@ -56,11 +61,23 @@ static void __attribute__((__constructor__)) qemu_logfile_init(void)
>  qemu_mutex_init(&qemu_logfile_mutex);
>  }
>
> +static void qemu_logfile_free(QemuLogFile *logfile)
> +{
> +g_assert(logfile);
> +
> +if (logfile->fd != stderr) {
> +fclose(logfile->fd);
> +}
> +g_free(logfile);
> +}
> +
>  static bool log_uses_own_buffers;
>
>  /* enable or disable low levels log */
>  void qemu_set_log(int log_flags)
>  

Re: [PATCH v1 2/5] qemu_log_lock/unlock now preserves the qemu_logfile handle.

2019-11-12 Thread Alex Bennée


Robert Foley  writes:

> qemu_log_lock() now returns a handle and qemu_log_unlock() receives a
> handle to unlock.  This allows for changing the handle during logging
> and ensures the lock() and unlock() are for the same file.
>
> Signed-off-by: Robert Foley 
> ---
> v1
> - Moved this up in the patch sequence to be
>   before adding RCU for qemu_logfile.
> ---
>  include/qemu/log.h|  9 ++---
>  accel/tcg/cpu-exec.c  |  4 ++--
>  accel/tcg/translate-all.c |  4 ++--
>  accel/tcg/translator.c|  4 ++--
>  exec.c|  4 ++--
>  hw/net/can/can_sja1000.c  |  4 ++--
>  net/can/can_socketcan.c   |  5 ++---
>  target/cris/translate.c   |  4 ++--
>  target/i386/translate.c   |  5 +++--
>  target/lm32/translate.c   |  4 ++--
>  target/microblaze/translate.c |  4 ++--
>  target/nios2/translate.c  |  4 ++--
>  target/tilegx/translate.c |  7 ---
>  target/unicore32/translate.c  |  4 ++--

A bit messier than I'd like but I guess that's unavoidable. It does
nicely show who's left to convert to the common translator loop ;-)

Reviewed-by: Alex Bennée 

>  tcg/tcg.c | 16 
>  15 files changed, 43 insertions(+), 39 deletions(-)
>
> diff --git a/include/qemu/log.h b/include/qemu/log.h
> index a91105b2ad..a7c5b01571 100644
> --- a/include/qemu/log.h
> +++ b/include/qemu/log.h
> @@ -53,14 +53,17 @@ static inline bool qemu_log_separate(void)
>   * qemu_loglevel is never set when qemu_logfile is unset.
>   */
>
> -static inline void qemu_log_lock(void)
> +static inline FILE *qemu_log_lock(void)
>  {
>  qemu_flockfile(qemu_logfile);
> +return logfile->fd;
>  }
>
> -static inline void qemu_log_unlock(void)
> +static inline void qemu_log_unlock(FILE *fd)
>  {
> -qemu_funlockfile(qemu_logfile);
> +if (fd) {
> +qemu_funlockfile(fd);
> +}
>  }
>
>  /* Logging functions: */
> diff --git a/net/can/can_socketcan.c b/net/can/can_socketcan.c
> index 8a6ffad40c..29bfacd4f8 100644
> --- a/net/can/can_socketcan.c
> +++ b/net/can/can_socketcan.c
> @@ -76,8 +76,7 @@ QEMU_BUILD_BUG_ON(offsetof(qemu_can_frame, data)
>  static void can_host_socketcan_display_msg(struct qemu_can_frame *msg)
>  {
>  int i;
> -
> -qemu_log_lock();
> +FILE *logfile = qemu_log_lock();
>  qemu_log("[cansocketcan]: %03X [%01d] %s %s",
>   msg->can_id & QEMU_CAN_EFF_MASK,
>   msg->can_dlc,
> @@ -89,7 +88,7 @@ static void can_host_socketcan_display_msg(struct qemu_can_frame *msg)
>  }
>  qemu_log("\n");
>  qemu_log_flush();
> -qemu_log_unlock();
> +qemu_log_unlock(logfile);
>  }
>
>  static void can_host_socketcan_read(void *opaque)
> diff --git a/accel/tcg/cpu-exec.c b/accel/tcg/cpu-exec.c
> index c01f59c743..62068d10c3 100644
> --- a/accel/tcg/cpu-exec.c
> +++ b/accel/tcg/cpu-exec.c
> @@ -156,7 +156,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
>  #if defined(DEBUG_DISAS)
>  if (qemu_loglevel_mask(CPU_LOG_TB_CPU)
>  && qemu_log_in_addr_range(itb->pc)) {
> -qemu_log_lock();
> +FILE *logfile = qemu_log_lock();
>  int flags = 0;
>  if (qemu_loglevel_mask(CPU_LOG_TB_FPU)) {
>  flags |= CPU_DUMP_FPU;
> @@ -165,7 +165,7 @@ static inline tcg_target_ulong cpu_tb_exec(CPUState *cpu, TranslationBlock *itb)
>  flags |= CPU_DUMP_CCOP;
>  #endif
>  log_cpu_state(cpu, flags);
> -qemu_log_unlock();
> +qemu_log_unlock(logfile);
>  }
>  #endif /* DEBUG_DISAS */
>
> diff --git a/accel/tcg/translate-all.c b/accel/tcg/translate-all.c
> index 9f48da9472..bb325a2bc4 100644
> --- a/accel/tcg/translate-all.c
> +++ b/accel/tcg/translate-all.c
> @@ -1804,7 +1804,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  #ifdef DEBUG_DISAS
>  if (qemu_loglevel_mask(CPU_LOG_TB_OUT_ASM) &&
>  qemu_log_in_addr_range(tb->pc)) {
> -qemu_log_lock();
> +FILE *logfile = qemu_log_lock();
>  qemu_log("OUT: [size=%d]\n", gen_code_size);
>  if (tcg_ctx->data_gen_ptr) {
>  size_t code_size = tcg_ctx->data_gen_ptr - tb->tc.ptr;
> @@ -1829,7 +1829,7 @@ TranslationBlock *tb_gen_code(CPUState *cpu,
>  }
>  qemu_log("\n");
>  qemu_log_flush();
> -qemu_log_unlock();
> +qemu_log_unlock(logfile);
>  }
>  #endif
>
> diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
> index f977682be7..603d17ff83 100644
> --- a/accel/tcg/translator.c
> +++ b/accel/tcg/translator.c
> @@ -138,11 +138,11 @@ void translator_loop(const TranslatorOps *ops, DisasContextBase *db,
>  #ifdef DEBUG_DISAS
>  if (qemu_loglevel_mask(CPU_LOG_TB_IN_ASM)
>  && qemu_log_in_addr_range(db->pc_first)) {
> -qemu_log_lock();
> +FILE *logfile = qemu_log_lock();
>  qemu_log("\n");
>  ops->disas_log(db, cpu);
>  qemu_log("\n");
> -   

Re: [PATCH v1 1/5] Add a mutex to guarantee single writer to qemu_logfile handle.

2019-11-12 Thread Alex Bennée


Robert Foley  writes:

> Also added qemu_logfile_init() for initializing the logfile mutex.
>
> Signed-off-by: Robert Foley 

Reviewed-by: Alex Bennée 

> ---
> v1
> - changed qemu_logfile_init() to use __constructor__.
> ---
>  util/log.c | 14 ++
>  1 file changed, 14 insertions(+)
>
> diff --git a/util/log.c b/util/log.c
> index 1ca13059ee..c25643dc99 100644
> --- a/util/log.c
> +++ b/util/log.c
> @@ -24,8 +24,10 @@
>  #include "qapi/error.h"
>  #include "qemu/cutils.h"
>  #include "trace/control.h"
> +#include "qemu/thread.h"
>
>  static char *logfilename;
> +static QemuMutex qemu_logfile_mutex;
>  FILE *qemu_logfile;
>  int qemu_loglevel;
>  static int log_append = 0;
> @@ -49,6 +51,11 @@ int qemu_log(const char *fmt, ...)
>  return ret;
>  }
>
> +static void __attribute__((__constructor__)) qemu_logfile_init(void)
> +{
> +qemu_mutex_init(&qemu_logfile_mutex);
> +}
> +
>  static bool log_uses_own_buffers;
>
>  /* enable or disable low levels log */
> @@ -58,6 +65,9 @@ void qemu_set_log(int log_flags)
>  #ifdef CONFIG_TRACE_LOG
>  qemu_loglevel |= LOG_TRACE;
>  #endif
> +
> +g_assert(qemu_logfile_mutex.initialized);
> +qemu_mutex_lock(&qemu_logfile_mutex);
>  if (!qemu_logfile &&
>  (is_daemonized() ? logfilename != NULL : qemu_loglevel)) {
>  if (logfilename) {
> @@ -93,6 +103,7 @@ void qemu_set_log(int log_flags)
>  log_append = 1;
>  }
>  }
> +qemu_mutex_unlock(&qemu_logfile_mutex);
>  if (qemu_logfile &&
>  (is_daemonized() ? logfilename == NULL : !qemu_loglevel)) {
>  qemu_log_close();
> @@ -230,12 +241,15 @@ void qemu_log_flush(void)
>  /* Close the log file */
>  void qemu_log_close(void)
>  {
> +g_assert(qemu_logfile_mutex.initialized);
> +qemu_mutex_lock(&qemu_logfile_mutex);
>  if (qemu_logfile) {
>  if (qemu_logfile != stderr) {
>  fclose(qemu_logfile);
>  }
>  qemu_logfile = NULL;
>  }
> +qemu_mutex_unlock(&qemu_logfile_mutex);
>  }
>
>  const QEMULogItem qemu_log_items[] = {


--
Alex Bennée



Re: [PATCH v1 1/2] docs/devel: rename plugins.rst to tcg-plugins.rst

2019-11-12 Thread Peter Maydell
On Tue, 12 Nov 2019 at 16:42, Alex Bennée  wrote:
>
> This makes it a bit clearer what this is about.
>
> Signed-off-by: Alex Bennée 
> ---
>  MAINTAINERS | 1 +
>  docs/devel/{plugins.rst => tcg-plugins.rst} | 0
>  2 files changed, 1 insertion(+)
>  rename docs/devel/{plugins.rst => tcg-plugins.rst} (100%)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index ff8d0d29f4b..b160d817208 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2369,6 +2369,7 @@ F: tcg/
>  TCG Plugins
>  M: Alex Bennée 
>  S: Maintained
> +F: docs/devel/tcg-plugins.rst
>  F: plugins/
>  F: tests/plugin
>
> diff --git a/docs/devel/plugins.rst b/docs/devel/tcg-plugins.rst
> similarity index 100%
> rename from docs/devel/plugins.rst
> rename to docs/devel/tcg-plugins.rst
> -

Don't you also need to update the reference
to 'plugins' in docs/devel/index.rst ?

thanks
-- PMM



Re: [PATCH v2] s390x: Properly fetch the short psw on diag308 subc 0/1

2019-11-12 Thread David Hildenbrand



> On 12.11.2019 at 17:58, Cornelia Huck wrote:
> 
> On Mon, 11 Nov 2019 10:28:08 -0500
> Janosch Frank  wrote:
> 
>> We need to actually fetch the cpu mask and set it. As we invert the
>> short psw indication in the mask, SIE will report a specification
>> exception, if it wasn't present in the reset psw.
>> 
>> Signed-off-by: Janosch Frank 
>> ---
>> target/s390x/cpu.c | 12 ++--
>> target/s390x/cpu.h |  1 +
>> 2 files changed, 11 insertions(+), 2 deletions(-)
> 
> So, is this change -rc material, or should it go in during the next
> release? I'm a bit confused here.

IMHO, this is not urgent and can wait.
> 
> [Also, does this need a change in the tcg code, or is that something
> that should just be done eventually? Sorry, drowning a bit in mails
> here...]

We're missing many checks when loading/running a new PSW for TCG, not just this
scenario. So this should be done at some point but is not urgent at all.




Re: [PATCH v1 2/2] docs/devel: update tcg-plugins.rst with API versioning details

2019-11-12 Thread Peter Maydell
On Tue, 12 Nov 2019 at 16:41, Alex Bennée  wrote:
>
> Signed-off-by: Alex Bennée 
> ---
>  docs/devel/tcg-plugins.rst | 16 
>  1 file changed, 16 insertions(+)
>
> diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
> index b18fb6729e3..8d619fd44ef 100644
> --- a/docs/devel/tcg-plugins.rst
> +++ b/docs/devel/tcg-plugins.rst
> @@ -25,6 +25,22 @@ process. However the project reserves the right to change or break the
>  API should it need to do so. The best way to avoid this is to submit
>  your plugin upstream so they can be updated if/when the API changes.
>
> +API versioning
> +--------------
> +
> +All plugins need to declare a symbol which exports the plugin API
> +version they were built against. This is can be done simply by:

either "is" or "can be", but not both :-)

> +
> +::
> +QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +The core code will refuse to load a plugin that doesn't export a
> +`qemu_plugin_version` symbol.

It also refuses to load a plugin which exports a qemu_plugin_version
specifying a version which the core code doesn't support, right?
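
A sketch of the kind of check being asked about, assuming the loader simply
compares the plugin's exported version against the minimum and current API
versions mentioned in the quoted text below. The function name is
hypothetical and this is not taken from the actual loader code.

```c
#include <stdbool.h>

/*
 * Hypothetical check: accept a plugin only if the version it was built
 * against lies within [min_supported, current] of the running QEMU.
 */
static inline bool plugin_version_ok(int plugin_version,
                                     int min_supported, int current)
{
    return plugin_version >= min_supported && plugin_version <= current;
}
```

If the check works this way, the documentation could say so explicitly.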

> Additionally the `qemu_info_t` structure
> +which is passed to the `qemu_plugin_install` method of a plugin will
> +detail the minimum and current API versions supported by QEMU. The API
> +version will be incremented if new APIs are added. The minimum API
> +version will be incremented if existing APIs are changed or removed.
> +
>

thanks
-- PMM
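
For readers following the versioning discussion, here is a minimal sketch of the load-time check being described. It is not QEMU code: `host_api_range` and `plugin_version_acceptable` are hypothetical names, and the range check assumes the behaviour raised in the review (a plugin is refused both when it exports no version symbol and when its version falls outside what the host supports).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the API versions a host supports. */
struct host_api_range {
    int min; /* oldest plugin API version still accepted */
    int cur; /* newest plugin API version implemented */
};

/*
 * Decide whether a plugin may be loaded. `exported` points at the
 * plugin's version symbol, or is NULL when the plugin exports none.
 */
static bool plugin_version_acceptable(const int *exported,
                                      struct host_api_range host)
{
    if (exported == NULL) {
        return false; /* no version symbol: refuse to load */
    }
    if (*exported < host.min) {
        return false; /* built against an API that was changed or removed */
    }
    if (*exported > host.cur) {
        return false; /* expects APIs this host does not provide */
    }
    return true;
}
```

Under this model, bumping `cur` when APIs are added keeps older plugins loadable, while bumping `min` when APIs change or disappear rejects them.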



Re: [PATCH v2] s390x: Properly fetch the short psw on diag308 subc 0/1

2019-11-12 Thread Cornelia Huck
On Mon, 11 Nov 2019 10:28:08 -0500
Janosch Frank  wrote:

> We need to actually fetch the cpu mask and set it. As we invert the
> short psw indication in the mask, SIE will report a specification
> exception, if it wasn't present in the reset psw.
> 
> Signed-off-by: Janosch Frank 
> ---
>  target/s390x/cpu.c | 12 ++--
>  target/s390x/cpu.h |  1 +
>  2 files changed, 11 insertions(+), 2 deletions(-)

So, is this change -rc material, or should it go in during the next
release? I'm a bit confused here.

[Also, does this need a change in the tcg code, or is that something
that should just be done eventually? Sorry, drowning a bit in mails
here...]




Re: [PATCH 1/5] MAINTAINERS: Add a section on git infrastructure

2019-11-12 Thread Alex Bennée


Aleksandar Markovic  writes:

> From: Aleksandar Markovic 
>
> There should be a patient person maintaining gory details of
> git-related files, and there is no better person for that role
> than Philippe. Alex should be the reviewer for some relations
> with gitdm.

I'm not sure about this. The .gitignore files are best updated by people
responsible for the various parts of the tree. Once out-of-tree builds
become standard we should be able to eliminate them altogether. As far
as .mailmap is concerned I think people are quite capable of updating it
themselves without putting the changes through a maintainer tree.

>
> Signed-off-by: Aleksandar Markovic 
> ---
>  MAINTAINERS | 17 +
>  1 file changed, 17 insertions(+)
>
> diff --git a/MAINTAINERS b/MAINTAINERS
> index 4964fbb..be43ccb 100644
> --- a/MAINTAINERS
> +++ b/MAINTAINERS
> @@ -2664,6 +2664,23 @@ M: Daniel P. Berrange 
>  S: Odd Fixes
>  F: scripts/git-submodule.sh
>
> +GIT infrastructure
> +M: Philippe Mathieu-Daudé 
> +R: Alex Bennée 
> +S: Maintained
> +F: .mailmap
> +F: scripts/git.orderfile
> +F: .gitignore
> +F: tests/fp/.gitignore
> +F: tests/fp/berkeley-softfloat-3/.gitignore
> +F: tests/fp/berkeley-testfloat-3/.gitignore
> +F: tests/migration/.gitignore
> +F: tests/multiboot/.gitignore
> +F: tests/qemu-iotests/.gitignore
> +F: tests/tcg/.gitignore
> +F: tests/uefi-test-tools/.gitignore
> +F: ui/keycodemapdb/tests/.gitignore
> +
>  Sphinx documentation configuration and build machinery
>  M: Peter Maydell 
>  S: Maintained


--
Alex Bennée



Re: [PATCH v4 01/20] softmmu: split off vl.c:main() into main.c

2019-11-12 Thread Alexander Bulekov

On 11/5/19 11:41 AM, Darren Kenny wrote:

On Wed, Oct 30, 2019 at 02:49:48PM +, Oleinik, Alexander wrote:

From: Alexander Oleinik 

A program might rely on functions implemented in vl.c, but implement its
own main(). By placing main into a separate source file, there are no
complaints about duplicate main()s when linking against vl.o. For
example, the virtual-device fuzzer uses a main() provided by libfuzzer,
and needs to perform some initialization before running the softmmu
initialization. Now, main simply calls three vl.c functions which
handle the guest initialization, main loop and cleanup.

Signed-off-by: Alexander Oleinik 
---
Makefile    |  1 +
Makefile.objs   |  2 ++
include/sysemu/sysemu.h |  4 
main.c  | 52 +
vl.c    | 36 +++-
5 files changed, 68 insertions(+), 27 deletions(-)
create mode 100644 main.c

diff --git a/Makefile b/Makefile
index 0e994a275d..d2b2ecd3c4 100644
--- a/Makefile
+++ b/Makefile
@@ -474,6 +474,7 @@ $(SOFTMMU_ALL_RULES): $(crypto-obj-y)
$(SOFTMMU_ALL_RULES): $(io-obj-y)
$(SOFTMMU_ALL_RULES): config-all-devices.mak
$(SOFTMMU_ALL_RULES): $(edk2-decompressed)
+$(SOFTMMU_ALL_RULES): $(softmmu-main-y)

.PHONY: $(TARGET_DIRS_RULES)
# The $(TARGET_DIRS_RULES) are of the form SUBDIR/GOAL, so that
diff --git a/Makefile.objs b/Makefile.objs
index 11ba1a36bd..9ff9b0c6f9 100644
--- a/Makefile.objs
+++ b/Makefile.objs
@@ -86,6 +86,8 @@ common-obj-$(CONFIG_FDT) += device_tree.o
# qapi

common-obj-y += qapi/
+
+softmmu-main-y = main.o
endif

###
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index 44f18eb739..03f9838b81 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -114,6 +114,10 @@ QemuOpts *qemu_get_machine_opts(void);

bool defaults_enabled(void);

+void main_loop(void);
+void qemu_init(int argc, char **argv, char **envp);
+void qemu_cleanup(void);
+
extern QemuOptsList qemu_legacy_drive_opts;
extern QemuOptsList qemu_common_drive_opts;
extern QemuOptsList qemu_drive_opts;
diff --git a/main.c b/main.c
new file mode 100644
index 0000000000..ecd6389424
--- /dev/null
+++ b/main.c
@@ -0,0 +1,52 @@
+/*
+ * QEMU System Emulator
+ *
+ * Copyright (c) 2003-2008 Fabrice Bellard
+ *
+ * Permission is hereby granted, free of charge, to any person obtaining a copy
+ * of this software and associated documentation files (the "Software"), to deal
+ * in the Software without restriction, including without limitation the rights
+ * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
+ * copies of the Software, and to permit persons to whom the Software is
+ * furnished to do so, subject to the following conditions:
+ *
+ * The above copyright notice and this permission notice shall be included in
+ * all copies or substantial portions of the Software.
+ *
+ * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
+ * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
+ * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
+ * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
+ * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
+ * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
+ * THE SOFTWARE.
+ */
+
+#include "qemu/osdep.h"
+#include "sysemu/sysemu.h"
+
+#ifdef CONFIG_SDL
+#if defined(__APPLE__) || defined(main)
+#include <SDL.h>
+int main(int argc, char **argv)
+{
+    return qemu_main(argc, argv, NULL);
+}
+#undef main
+#define main qemu_main


This /looks/ wrong, you're defining a function main(), and then
immediately #undef and #define main again.

Maybe this could be written differently, or add a comment here as to
why you need to do this.


+#endif
+#endif /* CONFIG_SDL */
+
+#ifdef CONFIG_COCOA
+#undef main
+#define main qemu_main
+#endif /* CONFIG_COCOA */


I don't really know the combinations that might exist, but it looks
like if CONFIG_SDL is not defined, then we're redefining main() to be
qemu_main() - so what main() function will actually be used here?


I tried to copy this straight from vl.c. It seems that this was
originally added for reasons similar to those behind this patch - like
libfuzzer, SDL has its own main function, and I'm guessing it's
similar for cocoa. With some preprocessor flags, the result looks like:

int SDL_main(int argc, char **argv)
{
return qemu_main(argc, argv,
((void *)0)
);
}
int qemu_main(int argc, char **argv, char **envp)
{
qemu_init(argc, argv, envp);
main_loop();
qemu_cleanup();
return 0;
}

So it looks like this is there since SDL expects main to have two args. 
Maybe this is something that can be solved by adding separate main-sdl.c 
and main-cocoa.c files 
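
As a rough illustration of the split being discussed, the sketch below shows a thin entry point that only sequences three phases, so that another program (a fuzzer, say) can supply its own main() and call the phases itself. Everything here is hypothetical: the `demo_*` functions stand in for the init, main loop and cleanup functions, and `demo_steps` exists only to make the call order observable.

```c
#include <stddef.h>

/* Records the order of calls; used only for illustration. */
static int demo_steps;

/* Hypothetical stand-in for guest initialization. */
static void demo_init(int argc, char **argv, char **envp)
{
    (void)argc;
    (void)argv;
    (void)envp;
    demo_steps = demo_steps * 10 + 1;
}

/* Hypothetical stand-in for the main loop. */
static void demo_main_loop(void)
{
    demo_steps = demo_steps * 10 + 2;
}

/* Hypothetical stand-in for cleanup. */
static void demo_cleanup(void)
{
    demo_steps = demo_steps * 10 + 3;
}

/* The default entry point does nothing but sequence the three phases. */
static int demo_entry(int argc, char **argv, char **envp)
{
    demo_init(argc, argv, envp);
    demo_main_loop();
    demo_cleanup();
    return 0;
}
```

Because the entry point holds no logic of its own, per-frontend wrappers (the separate files suggested above) would only need to differ in how they reach it.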

[PATCH v1 1/2] docs/devel: rename plugins.rst to tcg-plugins.rst

2019-11-12 Thread Alex Bennée
This makes it a bit clearer what this is about.

Signed-off-by: Alex Bennée 
---
 MAINTAINERS | 1 +
 docs/devel/{plugins.rst => tcg-plugins.rst} | 0
 2 files changed, 1 insertion(+)
 rename docs/devel/{plugins.rst => tcg-plugins.rst} (100%)

diff --git a/MAINTAINERS b/MAINTAINERS
index ff8d0d29f4b..b160d817208 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2369,6 +2369,7 @@ F: tcg/
 TCG Plugins
 M: Alex Bennée 
 S: Maintained
+F: docs/devel/tcg-plugins.rst
 F: plugins/
 F: tests/plugin
 
diff --git a/docs/devel/plugins.rst b/docs/devel/tcg-plugins.rst
similarity index 100%
rename from docs/devel/plugins.rst
rename to docs/devel/tcg-plugins.rst
-- 
2.20.1




[PATCH v1 2/2] docs/devel: update tcg-plugins.rst with API versioning details

2019-11-12 Thread Alex Bennée
Signed-off-by: Alex Bennée 
---
 docs/devel/tcg-plugins.rst | 16 
 1 file changed, 16 insertions(+)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index b18fb6729e3..8d619fd44ef 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -25,6 +25,22 @@ process. However the project reserves the right to change or break the
 API should it need to do so. The best way to avoid this is to submit
 your plugin upstream so they can be updated if/when the API changes.
 
+API versioning
+--------------
+
+All plugins need to declare a symbol which exports the plugin API
+version they were built against. This is can be done simply by:
+
+::
+QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+The core code will refuse to load a plugin that doesn't export a
+`qemu_plugin_version` symbol. Additionally the `qemu_info_t` structure
+which is passed to the `qemu_plugin_install` method of a plugin will
+detail the minimum and current API versions supported by QEMU. The API
+version will be incremented if new APIs are added. The minimum API
+version will be incremented if existing APIs are changed or removed.
+
 
 Exposure of QEMU internals
 --
-- 
2.20.1




[PATCH v1 0/2] TCG plugin doc updates

2019-11-12 Thread Alex Bennée
Hi,

A few minor tweaks to the TCG plugin documentation.

Alex Bennée (2):
  docs/devel: rename plugins.rst to tcg-plugins.rst
  docs/devel: update tcg-plugins.rst with API versioning details

 MAINTAINERS |  1 +
 docs/devel/{plugins.rst => tcg-plugins.rst} | 16 
 2 files changed, 17 insertions(+)
 rename docs/devel/{plugins.rst => tcg-plugins.rst} (87%)

-- 
2.20.1




[PATCH] microvm: fix memory leak in microvm_fix_kernel_cmdline

2019-11-12 Thread Sergio Lopez
In microvm_fix_kernel_cmdline(), fw_cfg_modify_string() is duplicating
cmdline instead of taking ownership of it. Free it afterwards to avoid
leaking it.

Reported-by: Coverity (CID 1407218)
Suggested-by: Peter Maydell 
Signed-off-by: Sergio Lopez 
---
 hw/i386/microvm.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/hw/i386/microvm.c b/hw/i386/microvm.c
index 8aacd6c8d1..def37e60f7 100644
--- a/hw/i386/microvm.c
+++ b/hw/i386/microvm.c
@@ -331,6 +331,8 @@ static void microvm_fix_kernel_cmdline(MachineState *machine)
 
 fw_cfg_modify_i32(x86ms->fw_cfg, FW_CFG_CMDLINE_SIZE, strlen(cmdline) + 1);
 fw_cfg_modify_string(x86ms->fw_cfg, FW_CFG_CMDLINE_DATA, cmdline);
+
+g_free(cmdline);
 }
 
 static void microvm_machine_state_init(MachineState *machine)
-- 
2.23.0
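
The ownership rule behind this fix can be sketched in plain C. This is not QEMU code: `string_slot`, `slot_set_copy` and `slot_set_with_suffix` are hypothetical, and malloc/free stand in for the GLib allocators. The point is that when a setter stores a private copy, the caller still owns the string it passed in and has to free it.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical store that keeps its own copy of a string. */
struct string_slot {
    char *value;
};

/*
 * Replace the stored string with a private copy of `s`.
 * The caller keeps ownership of `s`. Returns 0 on success, -1 on failure.
 */
static int slot_set_copy(struct string_slot *slot, const char *s)
{
    size_t len = strlen(s) + 1;
    char *copy = malloc(len);

    if (copy == NULL) {
        return -1;
    }
    memcpy(copy, s, len);
    free(slot->value);
    slot->value = copy;
    return 0;
}

/* Build a temporary string, hand a copy to the slot, free the temporary. */
static int slot_set_with_suffix(struct string_slot *slot,
                                const char *base, const char *suffix)
{
    size_t len = strlen(base) + strlen(suffix) + 1;
    char *tmp = malloc(len);
    int ret;

    if (tmp == NULL) {
        return -1;
    }
    strcpy(tmp, base);
    strcat(tmp, suffix);
    ret = slot_set_copy(slot, tmp);
    free(tmp); /* the slot owns its copy; without this, tmp would leak */
    return ret;
}
```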




Re: [PULL 0/8] testing and tcg plugin api ver

2019-11-12 Thread Alex Bennée


Peter Maydell  writes:

> On Tue, 12 Nov 2019 at 14:50, Alex Bennée  wrote:
>>
>> The following changes since commit 039e285e095c20a88e623b927654b161aaf9d914:
>>
>>   Merge remote-tracking branch 
>> 'remotes/vivier2/tags/trivial-branch-pull-request' into staging (2019-11-12 
>> 12:09:19 +)
>>
>> are available in the Git repository at:
>>
>>   https://github.com/stsquad/qemu.git tags/pull-testing-and-tcg-121119-1
>>
>> for you to fetch changes up to 3fb356cc86461a14450802e14fa79e8436dbbf31:
>>
>>   tcg plugins: expose an API version concept (2019-11-12 14:32:55 +)
>>
>> 
>> Testing and plugins for rc1
>>
>>   - add plugin API versioning
>>   - tests/vm add netbsd autoinstall
>>   - disable ipmi-bt-test for non-Linux
>>   - single-thread make check
>
>
> Applied, thanks.
>
> Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
> for any user-visible changes.

Yes.. I'll cook something up.

>
> PS: just noticed, but shouldn't the plugin-version change
> have needed an update to the docs ?
>
> thanks
> -- PMM


--
Alex Bennée



[PULL 1/2] linux-user: fix missing break

2019-11-12 Thread Laurent Vivier
Reported by Coverity (CID 1407221)
Fixes: a2d866827bd8 ("linux-user: Support for NETLINK socket options")
cc: Josh Kunz 
Signed-off-by: Laurent Vivier 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20191112105055.32269-1-laur...@vivier.eu>
---
 linux-user/syscall.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ab9d933e53af..4e97bcf1e5a9 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -2632,6 +2632,7 @@ static abi_long do_getsockopt(int sockfd, int level, int optname,
 default:
 goto unimplemented;
 }
+break;
 #endif /* SOL_NETLINK */
 default:
 unimplemented:
-- 
2.21.0
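
A small stand-alone sketch of the control flow this patch repairs. The enum values and `handle_option` are hypothetical and only mirror the shape of the code: a `break` inside a nested switch leaves the inner switch only, so the outer case needs its own `break` to avoid falling through into the `default:` label.

```c
/* Hypothetical option levels and names, for illustration only. */
enum { LEVEL_A = 1, LEVEL_B = 2 };
enum { OPT_KNOWN = 10 };

/* Returns 0 for a handled option, -1 for an unimplemented one. */
static int handle_option(int level, int optname)
{
    int ret = -1;

    switch (level) {
    case LEVEL_A:
        switch (optname) {
        case OPT_KNOWN:
            ret = 0;
            break; /* leaves the inner switch only */
        default:
            goto unimplemented;
        }
        break; /* leaves the outer switch; without it we fall into default */
    default:
    unimplemented:
        ret = -1;
        break;
    }
    return ret;
}
```

With the outer `break` removed, the handled case would continue into the `unimplemented` path and report failure.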




[PULL 0/2] Linux user for 4.2 patches

2019-11-12 Thread Laurent Vivier
The following changes since commit 2a7e7c3e103a5c29af7c583390c243d85a2527e8:

  Merge remote-tracking branch 
'remotes/stsquad/tags/pull-testing-and-tcg-121119-1' into staging (2019-11-12 
14:51:00 +)

are available in the Git repository at:

  git://github.com/vivier/qemu.git tags/linux-user-for-4.2-pull-request

for you to fetch changes up to 0f1f2d4596aee037d3ccbcf10592466daa54107f:

  linux-user: remove host stime() syscall (2019-11-12 17:05:57 +0100)


Fix CID 1407221 and stime()



Laurent Vivier (2):
  linux-user: fix missing break
  linux-user: remove host stime() syscall

 linux-user/syscall.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

-- 
2.21.0




[PULL 2/2] linux-user: remove host stime() syscall

2019-11-12 Thread Laurent Vivier
stime() has been withdrawn from glibc
(12cbde1dae6f "Use clock_settime to implement stime; withdraw stime.")

Implement the target stime() syscall using host
clock_settime(CLOCK_REALTIME, ...) as it is done internally in glibc.

Tested qemu-ppc/x86_64 with:

#include <stdio.h>
#include <time.h>

int main(void)
{
    time_t t;
    int ret;

    /* date -u -d"2019-11-12T15:11:00" "+%s" */
    t = 1573571460;
    ret = stime(&t);
    printf("ret %d\n", ret);
    return 0;
}

# date; ./stime; date
Tue Nov 12 14:18:32 UTC 2019
ret 0
Tue Nov 12 15:11:00 UTC 2019

Buglink: https://bugs.launchpad.net/qemu/+bug/1852115
Reported-by: Cole Robinson 
Signed-off-by: Laurent Vivier 
Reviewed-by: Peter Maydell 
Message-Id: <20191112142556.6335-1-laur...@vivier.eu>
---
 linux-user/syscall.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 4e97bcf1e5a9..ce399a55f0db 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -7764,10 +7764,12 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
 #ifdef TARGET_NR_stime /* not on alpha */
 case TARGET_NR_stime:
 {
-time_t host_time;
-if (get_user_sal(host_time, arg1))
+struct timespec ts;
+ts.tv_nsec = 0;
+if (get_user_sal(ts.tv_sec, arg1)) {
 return -TARGET_EFAULT;
-return get_errno(stime(&host_time));
+}
+return get_errno(clock_settime(CLOCK_REALTIME, &ts));
 }
 #endif
 #ifdef TARGET_NR_alarm /* not on alpha */
-- 
2.21.0
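
For reference, a host-side sketch of the replacement described in the commit message. The function names are hypothetical and this is not the QEMU code path: it only shows how a seconds-only time value maps onto the `struct timespec` that clock_settime() expects. Actually setting the clock requires the appropriate privilege, so only the conversion helper is easy to exercise.

```c
#define _POSIX_C_SOURCE 200809L
#include <time.h>

/* Build the timespec that corresponds to a whole number of seconds. */
static struct timespec seconds_to_timespec(time_t seconds)
{
    struct timespec ts;

    ts.tv_sec = seconds;
    ts.tv_nsec = 0; /* stime() only has one-second resolution */
    return ts;
}

/*
 * Hypothetical stime() replacement: sets the realtime clock.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int set_time_seconds(const time_t *t)
{
    struct timespec ts = seconds_to_timespec(*t);

    return clock_settime(CLOCK_REALTIME, &ts);
}
```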




Re: [PATCH] linux-user: remove host stime() syscall

2019-11-12 Thread Laurent Vivier
Le 12/11/2019 à 15:25, Laurent Vivier a écrit :
> stime() has been withdrawn from glibc
> (12cbde1dae6f "Use clock_settime to implement stime; withdraw stime.")
> 
> Implement the target stime() syscall using host
> clock_settime(CLOCK_REALTIME, ...) as it is done internally in glibc.
> 
> Tested qemu-ppc/x86_64 with:
> 
>   #include <stdio.h>
>   #include <time.h>
> 
>   int main(void)
>   {
>       time_t t;
>       int ret;
> 
>       /* date -u -d"2019-11-12T15:11:00" "+%s" */
>       t = 1573571460;
>       ret = stime(&t);
>       printf("ret %d\n", ret);
>       return 0;
>   }
> 
> # date; ./stime; date
> Tue Nov 12 14:18:32 UTC 2019
> ret 0
> Tue Nov 12 15:11:00 UTC 2019
> 
> Buglink: https://bugs.launchpad.net/qemu/+bug/1852115
> Reported-by: Cole Robinson 
> Signed-off-by: Laurent Vivier 
> ---
>  linux-user/syscall.c | 8 +---
>  1 file changed, 5 insertions(+), 3 deletions(-)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index ab9d933e53af..c4dcdc94b10c 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -7763,10 +7763,12 @@ static abi_long do_syscall1(void *cpu_env, int num, abi_long arg1,
>  #ifdef TARGET_NR_stime /* not on alpha */
>  case TARGET_NR_stime:
>  {
> -time_t host_time;
> -if (get_user_sal(host_time, arg1))
> +struct timespec ts;
> +ts.tv_nsec = 0;
> +if (get_user_sal(ts.tv_sec, arg1)) {
>  return -TARGET_EFAULT;
> -return get_errno(stime(&host_time));
> +}
> +return get_errno(clock_settime(CLOCK_REALTIME, &ts));
>  }
>  #endif
>  #ifdef TARGET_NR_alarm /* not on alpha */
> 

Applied to my linux-user branch.

Thanks,
Laurent



Re: [PATCH] linux-user: fix missing break

2019-11-12 Thread Laurent Vivier
Le 12/11/2019 à 11:50, Laurent Vivier a écrit :
> Reported by Coverity (CID 1407221)
> Fixes: a2d866827bd8 ("linux-user: Support for NETLINK socket options")
> cc: Josh Kunz 
> Signed-off-by: Laurent Vivier 
> ---
>  linux-user/syscall.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/linux-user/syscall.c b/linux-user/syscall.c
> index ab9d933e53af..4e97bcf1e5a9 100644
> --- a/linux-user/syscall.c
> +++ b/linux-user/syscall.c
> @@ -2632,6 +2632,7 @@ static abi_long do_getsockopt(int sockfd, int level, int optname,
>  default:
>  goto unimplemented;
>  }
> +break;
>  #endif /* SOL_NETLINK */
>  default:
>  unimplemented:
> 

Applied to my linux-user branch.

Thanks,
Laurent



[PULL v1 3/3] target/microblaze: Plug temp leak around eval_cond_jmp()

2019-11-12 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Plug temp leak around eval_cond_jmp().

Reviewed-by: Luc Michel 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Signed-off-by: Edgar E. Iglesias 
---
 target/microblaze/translate.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index 7b4b66a622..bdc7d5326a 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -1681,7 +1681,10 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int max_insns)
 dc->tb_flags &= ~D_FLAG;
 /* If it is a direct jump, try direct chaining.  */
 if (dc->jmp == JMP_INDIRECT) {
-eval_cond_jmp(dc, env_btarget, tcg_const_i64(dc->pc));
+TCGv_i64 tmp_pc = tcg_const_i64(dc->pc);
+eval_cond_jmp(dc, env_btarget, tmp_pc);
+tcg_temp_free_i64(tmp_pc);
+
 dc->is_jmp = DISAS_JUMP;
 } else if (dc->jmp == JMP_DIRECT) {
 t_sync_flags(dc);
-- 
2.20.1
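
The leak pattern these patches remove can be modelled without TCG. The sketch below is purely illustrative: a tiny fixed pool stands in for the temporaries, and every name in it is hypothetical. Creating a temporary inline as a call argument leaves no handle to release afterwards, whereas naming it first allows an explicit free.

```c
enum { POOL_SIZE = 8 };

static int pool[POOL_SIZE];      /* values held by live temporaries */
static int pool_used[POOL_SIZE]; /* 1 when the slot is allocated */
static int live_temps;           /* number of temporaries not yet freed */

/* Allocate a temporary holding `value`; returns a handle or -1. */
static int temp_const(int value)
{
    for (int i = 0; i < POOL_SIZE; i++) {
        if (!pool_used[i]) {
            pool_used[i] = 1;
            pool[i] = value;
            live_temps++;
            return i;
        }
    }
    return -1;
}

/* Release a temporary previously returned by temp_const(). */
static void temp_free(int handle)
{
    if (handle >= 0 && handle < POOL_SIZE && pool_used[handle]) {
        pool_used[handle] = 0;
        live_temps--;
    }
}

/* Some consumer that reads the temporary but does not take ownership. */
static int consume(int handle)
{
    if (handle < 0 || handle >= POOL_SIZE || !pool_used[handle]) {
        return 0;
    }
    return pool[handle] + 1;
}

/* Leaky form: the temporary is created inline and never released. */
static int leaky_call(int v)
{
    return consume(temp_const(v));
}

/* Fixed form: keep the handle, use it, then free it. */
static int clean_call(int v)
{
    int tmp = temp_const(v);
    int result = consume(tmp);

    temp_free(tmp);
    return result;
}
```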




Re: [PATCH 2/2] iotests: Test multiple blockdev-snapshot calls

2019-11-12 Thread Peter Krempa
On Fri, Nov 08, 2019 at 09:53:12 +0100, Kevin Wolf wrote:
> Test that doing a second blockdev-snapshot doesn't make the first
> overlay's backing file go away.
> 
> Signed-off-by: Kevin Wolf 
> ---
>  tests/qemu-iotests/273 |  76 +
>  tests/qemu-iotests/273.out | 337 +
>  tests/qemu-iotests/group   |   1 +
>  3 files changed, 414 insertions(+)
>  create mode 100755 tests/qemu-iotests/273
>  create mode 100644 tests/qemu-iotests/273.out

Didn't apply cleanly for me.

> 
> diff --git a/tests/qemu-iotests/273 b/tests/qemu-iotests/273
> new file mode 100755
> index 0000000000..60076de7ce
> --- /dev/null
> +++ b/tests/qemu-iotests/273
> @@ -0,0 +1,76 @@
> +#!/usr/bin/env bash
> +#
> +# Test large write to a qcow2 image

Cut?


Rest looks good

Reviewed-by: Peter Krempa 

> +#
> +# Copyright (C) 2019 Red Hat, Inc.
> +#
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 2 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +#
> +
> +seq=$(basename "$0")
> +echo "QA output created by $seq"
> +
> +status=1 # failure is the default!
> +
> +_cleanup()
> +{
> +_cleanup_test_img
> +}
> +trap "_cleanup; exit \$status" 0 1 2 3 15
> +
> +# get standard environment, filters and checks
> +. ./common.rc
> +. ./common.filter
> +
> +# This is a qcow2 regression test
> +_supported_fmt qcow2
> +_supported_proto file
> +_supported_os Linux
> +
> +do_run_qemu()
> +{
> +echo Testing: "$@"
> +$QEMU -nographic -qmp-pretty stdio -nodefaults "$@"

-qmp-pretty, that's useful

> +echo
> +}
> +
> +run_qemu()
> +{
> +do_run_qemu "$@" 2>&1 | _filter_testdir | _filter_qemu | _filter_qmp |
> +_filter_generated_node_ids | _filter_imgfmt | _filter_actual_image_size
> +}
> +
> +TEST_IMG="$TEST_IMG.base" _make_test_img 64M
> +TEST_IMG="$TEST_IMG.mid" _make_test_img -b "$TEST_IMG.base"
> +_make_test_img -b "$TEST_IMG.mid"
> +
> +run_qemu \
> +-blockdev file,node-name=base,filename="$TEST_IMG.base" \
> + -blockdev file,node-name=midf,filename="$TEST_IMG.mid" \
> + -blockdev '{"driver":"qcow2","node-name":"mid","file":"midf","backing":null}' \
> + -blockdev file,node-name=topf,filename="$TEST_IMG" \
> + -blockdev '{"driver":"qcow2","file":"topf","node-name":"top","backing":null}' \
> +<<EOF
> +{"execute":"qmp_capabilities"}
> +{"execute":"blockdev-snapshot","arguments":{"node":"base","overlay":"mid"}}
> +{"execute":"blockdev-snapshot","arguments":{"node":"mid","overlay":"top"}}
> +{"execute":"query-named-block-nodes"}
> +{"execute":"x-debug-query-block-graph"}

Oh, this too!

> +{"execute":"quit"}
> +EOF
> +
> +# success, all done
> +echo "*** done"
> +rm -f $seq.full
> +status=0




[PULL v1 2/3] target/microblaze: Plug temp leaks with delay slot setup

2019-11-12 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Plug temp leaks with delay slot setup.

Reviewed-by: Luc Michel 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Reviewed-by: Richard Henderson 
Signed-off-by: Edgar E. Iglesias 
---
 target/microblaze/translate.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index c8442b18e1..7b4b66a622 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -1177,6 +1177,17 @@ static void eval_cond_jmp(DisasContext *dc, TCGv_i64 pc_true, TCGv_i64 pc_false)
 tcg_temp_free_i64(tmp_zero);
 }
 
+static void dec_setup_dslot(DisasContext *dc)
+{
+TCGv_i32 tmp = tcg_const_i32(dc->type_b && (dc->tb_flags & IMM_FLAG));
+
+dc->delayed_branch = 2;
+dc->tb_flags |= D_FLAG;
+
+tcg_gen_st_i32(tmp, cpu_env, offsetof(CPUMBState, bimm));
+tcg_temp_free_i32(tmp);
+}
+
 static void dec_bcc(DisasContext *dc)
 {
 unsigned int cc;
@@ -1188,10 +1199,7 @@ static void dec_bcc(DisasContext *dc)
 
 dc->delayed_branch = 1;
 if (dslot) {
-dc->delayed_branch = 2;
-dc->tb_flags |= D_FLAG;
-tcg_gen_st_i32(tcg_const_i32(dc->type_b && (dc->tb_flags & IMM_FLAG)),
-  cpu_env, offsetof(CPUMBState, bimm));
+dec_setup_dslot(dc);
 }
 
 if (dec_alu_op_b_is_small_imm(dc)) {
@@ -1250,10 +1258,7 @@ static void dec_br(DisasContext *dc)
 
 dc->delayed_branch = 1;
 if (dslot) {
-dc->delayed_branch = 2;
-dc->tb_flags |= D_FLAG;
-tcg_gen_st_i32(tcg_const_i32(dc->type_b && (dc->tb_flags & IMM_FLAG)),
-  cpu_env, offsetof(CPUMBState, bimm));
+dec_setup_dslot(dc);
 }
 if (link && dc->rd)
 tcg_gen_movi_i32(cpu_R[dc->rd], dc->pc);
@@ -1355,10 +1360,7 @@ static void dec_rts(DisasContext *dc)
 return;
 }
 
-dc->delayed_branch = 2;
-dc->tb_flags |= D_FLAG;
-tcg_gen_st_i32(tcg_const_i32(dc->type_b && (dc->tb_flags & IMM_FLAG)),
-  cpu_env, offsetof(CPUMBState, bimm));
+dec_setup_dslot(dc);
 
 if (i_bit) {
 LOG_DIS("rtid ir=%x\n", dc->ir);
-- 
2.20.1




[PULL v1 0/3] MicroBlaze fixes

2019-11-12 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

The following changes since commit 039e285e095c20a88e623b927654b161aaf9d914:

  Merge remote-tracking branch 
'remotes/vivier2/tags/trivial-branch-pull-request' into staging (2019-11-12 
12:09:19 +)

are available in the Git repository at:

  g...@github.com:edgarigl/qemu.git 
tags/edgar/xilinx-next-2019-11-12.for-upstream

for you to fetch changes up to c49a41b0b9e6c77e24ac2be4d95c54d62bc7b092:

  target/microblaze: Plug temp leak around eval_cond_jmp() (2019-11-12 16:35:26 
+0100)


For upstream


Edgar E. Iglesias (3):
  target/microblaze: Plug temp leaks for loads/stores
  target/microblaze: Plug temp leaks with delay slot setup
  target/microblaze: Plug temp leak around eval_cond_jmp()

 target/microblaze/translate.c | 77 +--
 1 file changed, 38 insertions(+), 39 deletions(-)

-- 
2.20.1




[PULL v1 1/3] target/microblaze: Plug temp leaks for loads/stores

2019-11-12 Thread Edgar E. Iglesias
From: "Edgar E. Iglesias" 

Simplify the endian reversal of the address, which also plugs TCG temp
leaks for loads/stores.

Suggested-by: Richard Henderson 
Reviewed-by: Richard Henderson 
Reviewed-by: Luc Michel 
Reviewed-by: Alistair Francis 
Signed-off-by: Edgar E. Iglesias 
---
 target/microblaze/translate.c | 46 +++
 1 file changed, 20 insertions(+), 26 deletions(-)

diff --git a/target/microblaze/translate.c b/target/microblaze/translate.c
index 761f535357..c8442b18e1 100644
--- a/target/microblaze/translate.c
+++ b/target/microblaze/translate.c
@@ -962,17 +962,7 @@ static void dec_load(DisasContext *dc)
 switch (size) {
 case 1:
 {
-/* 00 -> 11
-   01 -> 10
-   10 -> 10
-   11 -> 00 */
-TCGv low = tcg_temp_new();
-
-tcg_gen_andi_tl(low, addr, 3);
-tcg_gen_sub_tl(low, tcg_const_tl(3), low);
-tcg_gen_andi_tl(addr, addr, ~3);
-tcg_gen_or_tl(addr, addr, low);
-tcg_temp_free(low);
+tcg_gen_xori_tl(addr, addr, 3);
 break;
 }
 
@@ -1006,9 +996,16 @@ static void dec_load(DisasContext *dc)
 tcg_gen_qemu_ld_i32(v, addr, mem_index, mop);
 
 if ((dc->cpu->env.pvr.regs[2] & PVR2_UNALIGNED_EXC_MASK) && size > 1) {
+TCGv_i32 t0 = tcg_const_i32(0);
+TCGv_i32 treg = tcg_const_i32(dc->rd);
+TCGv_i32 tsize = tcg_const_i32(size - 1);
+
 tcg_gen_movi_i64(cpu_SR[SR_PC], dc->pc);
-gen_helper_memalign(cpu_env, addr, tcg_const_i32(dc->rd),
-tcg_const_i32(0), tcg_const_i32(size - 1));
+gen_helper_memalign(cpu_env, addr, treg, t0, tsize);
+
+tcg_temp_free_i32(t0);
+tcg_temp_free_i32(treg);
+tcg_temp_free_i32(tsize);
 }
 
 if (ex) {
@@ -1095,17 +1092,7 @@ static void dec_store(DisasContext *dc)
 switch (size) {
 case 1:
 {
-/* 00 -> 11
-   01 -> 10
-   10 -> 10
-   11 -> 00 */
-TCGv low = tcg_temp_new();
-
-tcg_gen_andi_tl(low, addr, 3);
-tcg_gen_sub_tl(low, tcg_const_tl(3), low);
-tcg_gen_andi_tl(addr, addr, ~3);
-tcg_gen_or_tl(addr, addr, low);
-tcg_temp_free(low);
+tcg_gen_xori_tl(addr, addr, 3);
 break;
 }
 
@@ -1124,6 +1111,10 @@ static void dec_store(DisasContext *dc)
 
 /* Verify alignment if needed.  */
 if ((dc->cpu->env.pvr.regs[2] & PVR2_UNALIGNED_EXC_MASK) && size > 1) {
+TCGv_i32 t1 = tcg_const_i32(1);
+TCGv_i32 treg = tcg_const_i32(dc->rd);
+TCGv_i32 tsize = tcg_const_i32(size - 1);
+
 tcg_gen_movi_i64(cpu_SR[SR_PC], dc->pc);
 /* FIXME: if the alignment is wrong, we should restore the value
  *in memory. One possible way to achieve this is to probe
@@ -1131,8 +1122,11 @@ static void dec_store(DisasContext *dc)
  *the alignment checks in between the probe and the mem
  *access.
  */
-gen_helper_memalign(cpu_env, addr, tcg_const_i32(dc->rd),
-tcg_const_i32(1), tcg_const_i32(size - 1));
+gen_helper_memalign(cpu_env, addr, treg, t1, tsize);
+
+tcg_temp_free_i32(t1);
+tcg_temp_free_i32(treg);
+tcg_temp_free_i32(tsize);
 }
 
 if (ex) {
-- 
2.20.1
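
The byte-lane simplification in this patch rests on a small identity: for a two-bit value x, 3 - x equals x ^ 3, because subtracting from an all-ones pattern is the same as complementing those bits. The sketch below, with hypothetical function names and plain integers in place of TCG ops, puts the removed and-sub-and-or sequence next to the single XOR so the two can be compared. It also suggests the third row of the removed comment (`10 -> 10`) was a typo for `10 -> 01`.

```c
#include <stdint.h>

/* Removed sequence: keep the upper bits, replace the low two with 3 - low. */
static uint32_t reverse_byte_lane_old(uint32_t addr)
{
    uint32_t low = addr & 3;

    low = 3 - low;
    addr &= ~(uint32_t)3;
    return addr | low;
}

/* Replacement: flip the low two bits with a single XOR. */
static uint32_t reverse_byte_lane_new(uint32_t addr)
{
    return addr ^ 3;
}
```

Besides being shorter, the XOR form needs no scratch value, which is why the temporaries from the old sequence disappear.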




Re: [PATCH 1/2] block: Remove 'backing': null from bs->{explicit_,}options

2019-11-12 Thread Peter Krempa
On Fri, Nov 08, 2019 at 09:53:11 +0100, Kevin Wolf wrote:
> bs->options and bs->explicit_options shouldn't contain any options for
> child nodes. bdrv_open_inherited() takes care to remove any options that
> match a child name after opening the image and the same is done when
> reopening.
> 
> However, we miss the case of 'backing': null, which is a child option,
> but results in no child being created. This means that a 'backing': null
> remains in bs->options and bs->explicit_options.
> 
> A typical use for 'backing': null is in live snapshots: blockdev-add for
> the qcow2 overlay makes sure not to open the backing file (because it is

Note that we also use '"backing": null' as a terminator for the last
image in the chain if the user configures the chain manually.

This is kind-of a protection from opening the backing file from the
header if it was misconfigured somehow. I think this functionality
should be kept despite probably not making practical sense.

In my testing this scenario worked properly.

> already opened and blockdev-snapshot will attach it). After doing a
> blockdev-snapshot, bs->options and bs->explicit_options become
> inconsistent with the actual state (bs has a backing file now, but the
> options still say null). On the next occasion that the image is
> reopened, e.g. switching it from read-write to read-only when another
> snapshot is taken, the option will take effect again and the node
> incorrectly loses its backing file.
> 
> Fix bdrv_open_inherited() to remove the 'backing' option from
> bs->options and bs->explicit_options even for the case where it
> specifies that no backing file is wanted.
> 
> Reported-by: Peter Krempa 
> Signed-off-by: Kevin Wolf 

The fix looks sane-enough to me and works as expected, but since I'm not
familiar enough with this code I'm comfortable only with a:

Tested-by: Peter Krempa 

> ---
>  block.c | 2 ++
>  1 file changed, 2 insertions(+)




Re: [PULL 0/8] testing and tcg plugin api ver

2019-11-12 Thread Peter Maydell
On Tue, 12 Nov 2019 at 14:50, Alex Bennée  wrote:
>
> The following changes since commit 039e285e095c20a88e623b927654b161aaf9d914:
>
>   Merge remote-tracking branch 
> 'remotes/vivier2/tags/trivial-branch-pull-request' into staging (2019-11-12 
> 12:09:19 +)
>
> are available in the Git repository at:
>
>   https://github.com/stsquad/qemu.git tags/pull-testing-and-tcg-121119-1
>
> for you to fetch changes up to 3fb356cc86461a14450802e14fa79e8436dbbf31:
>
>   tcg plugins: expose an API version concept (2019-11-12 14:32:55 +)
>
> 
> Testing and plugins for rc1
>
>   - add plugin API versioning
>   - tests/vm add netbsd autoinstall
>   - disable ipmi-bt-test for non-Linux
>   - single-thread make check


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/4.2
for any user-visible changes.

PS: just noticed, but shouldn't the plugin-version change
have needed an update to the docs ?

thanks
-- PMM



Re: [RFC v5 000/126] error: auto propagated local_err

2019-11-12 Thread Vladimir Sementsov-Ogievskiy
12.11.2019 16:46, Cornelia Huck wrote:
> On Fri, 8 Nov 2019 22:57:25 +0400
> Marc-André Lureau  wrote:
> 
>> Hi
>>
>> On Fri, Nov 8, 2019 at 7:31 PM Vladimir Sementsov-Ogievskiy
>>  wrote:
>>>
>>> Finally, what is the plan?
>>>
>>> Markus what do you think?
>>>
>>> Now a lot of patches are reviewed, but a lot are not.
>>>
>>> Is there any hope that all patches will be reviewed? Should I resend the
>>> whole series, or maybe reduce it to reviewed subsystems only?
>>
>> I don't think we have well established rules for whole-tree cleanups
>> like this. In the past, several cleanup series got lost.
> 
> Yes, it is always problematic if a series touches a lot of different
> subsystems.
> 
>>
>> It will take ages to get every subsystem maintainer to review the
>> patches. Most likely, since they are quite systematic, there isn't
>> much to say and it is easy to miss something that has some hidden
>> ramifications. Perhaps whole-tree cleanups should require at least 2
>> reviewers to bypass the subsystem maintainer review? But my past
>> experience with this kind of exercise doesn't encourage me, and
>> probably I am not the only one.
> 
> It's not just the reviews; it's easy to miss compile problems on less
> mainstream architectures (and even easier to miss functional problems
> there, although they are probably less likely with automated rework.)
> CI can probably help, but that's something for the future.
> 
> Anyway, I've now gotten around to that series; spotted one problem in
> s390x code, I think.
> 
> One thing that's helpful for such a large series is a git branch that
> makes it easy to give the series a quick go. (You can use patchew, but
> it takes time before it gets all mails, so just pushing it somewhere
> and letting people know is a good idea anyway.)
> 

Thanks for the review!

The series is posted here:

https://src.openvz.org/users/vsementsov/repos/qemu/browse

https://src.openvz.org/scm/~vsementsov/qemu.git #tag up-auto-local-err-v5


-- 
Best regards,
Vladimir


Re: [RFC v5 023/126] hw/vfio/ap: drop local_err from vfio_ap_realize

2019-11-12 Thread Vladimir Sementsov-Ogievskiy
12.11.2019 16:06, Cornelia Huck wrote:
> On Fri, 11 Oct 2019 19:04:09 +0300
> Vladimir Sementsov-Ogievskiy  wrote:
> 
>> No reason for local_err here, use errp directly instead.
>>
>> Signed-off-by: Vladimir Sementsov-Ogievskiy 
>> ---
>>   hw/vfio/ap.c | 16 +++-
>>   1 file changed, 3 insertions(+), 13 deletions(-)
>>
>> diff --git a/hw/vfio/ap.c b/hw/vfio/ap.c
>> index da6a20669d..db816e1860 100644
>> --- a/hw/vfio/ap.c
>> +++ b/hw/vfio/ap.c
>> @@ -87,16 +87,14 @@ static VFIOGroup *vfio_ap_get_group(VFIOAPDevice *vapdev, Error **errp)
>>   
>>   static void vfio_ap_realize(DeviceState *dev, Error **errp)
>>   {
>> -int ret;
>>   char *mdevid;
>> -Error *local_err = NULL;
>>   VFIOGroup *vfio_group;
>>   APDevice *apdev = AP_DEVICE(dev);
>>   VFIOAPDevice *vapdev = VFIO_AP_DEVICE(apdev);
>>   
>> -vfio_group = vfio_ap_get_group(vapdev, &local_err);
>> +vfio_group = vfio_ap_get_group(vapdev, errp);
>>   if (!vfio_group) {
>> -goto out_err;
>> +return;
>>   }
>>   
>>   vapdev->vdev.ops = &vfio_ap_ops;
>> @@ -113,18 +111,10 @@ static void vfio_ap_realize(DeviceState *dev, Error **errp)
>>*/
>>   vapdev->vdev.balloon_allowed = true;
>>   
>> -ret = vfio_get_device(vfio_group, mdevid, &vapdev->vdev, &local_err);
>> -if (ret) {
>> -goto out_get_dev_err;
>> -}
>> -
>> -return;
>> +vfio_get_device(vfio_group, mdevid, &vapdev->vdev, errp);
> 
> This looks wrong; you still need to check for the outcome of
> vfio_get_device().

Oops, agree, will fix.
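
For illustration, a minimal standalone sketch of the pattern the fix needs:
pass errp straight through, but still test the return value so that the
cleanup path runs only on failure. This is not the actual follow-up patch.
The Error struct, fake_get_device(), fake_put_group() and realize_sketch()
are hypothetical stand-ins for the QEMU types and helpers, and the sketch
returns bool (the real realize function is void) only to make the outcome
observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for QEMU's Error object; NOT the real type. */
typedef struct Error {
    const char *msg;
} Error;

/* Counts cleanup calls so each path's effect can be observed. */
static int put_group_calls;

/*
 * Hypothetical helper modelled on a "return 0 on success, negative on
 * failure and set *errp" convention.
 */
static int fake_get_device(bool should_fail, Error **errp)
{
    static Error err = { "device lookup failed" };

    if (should_fail) {
        if (errp) {
            *errp = &err;
        }
        return -1;
    }
    return 0;
}

/* Hypothetical cleanup helper. */
static void fake_put_group(void)
{
    put_group_calls++;
}

/* Returns true when "realize" succeeded, false after cleanup on error. */
static bool realize_sketch(bool should_fail, Error **errp)
{
    int ret;

    /* No local_err: the callee reports directly through errp ... */
    ret = fake_get_device(should_fail, errp);
    if (ret) {
        /* ... but the result is still checked before falling through. */
        goto out_get_dev_err;
    }

    return true;

out_get_dev_err:
    fake_put_group();
    return false;
}
```

Without the check on ret, the success path would fall through into the
cleanup calls as well, which is the problem pointed out above.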


> 
>>   
>> -out_get_dev_err:
>>   vfio_ap_put_device(vapdev);
>>   vfio_put_group(vfio_group);
>> -out_err:
>> -error_propagate(errp, local_err);
>>   }
>>   
>>   static void vfio_ap_unrealize(DeviceState *dev, Error **errp)
> 


-- 
Best regards,
Vladimir


