Re: [PATCH V4] target/i386/kvm: Add Hyper-V direct tlb flush support

2019-11-13 Thread Roman Kagan
On Wed, Nov 13, 2019 at 10:29:00AM +0100, Vitaly Kuznetsov wrote:
> Roman Kagan  writes:
> > On Tue, Nov 12, 2019 at 11:34:27AM +0800, lantianyu1...@gmail.com wrote:
> >> From: Tianyu Lan 
> >> 
> >> Hyper-V direct tlb flush targets KVM on Hyper-V guest.
> >> Enable direct TLB flush for its guests meaning that TLB
> >> flush hypercalls are handled by Level 0 hypervisor (Hyper-V)
> >> bypassing KVM in Level 1. Due to the different ABI for hypercall
> >> parameters between Hyper-V and KVM, KVM capabilities should be
> >> hidden when enable Hyper-V direct tlb flush otherwise KVM
> >> hypercalls may be intercepted by Hyper-V. Add new parameter
> >> "hv-direct-tlbflush". Check expose_kvm and Hyper-V tlb flush
> >> capability status before enabling the feature.
> >> 
> >> Signed-off-by: Tianyu Lan 
> >> ---
> >> Change since v3:
> >>- Fix logic of Hyper-V passthrough mode with direct
> >>tlb flush.
> >> 
> >> Change sicne v2:
> >>- Update new feature description and name.
> >>- Change failure print log.
> >> 
> >> Change since v1:
> >>- Add direct tlb flush's Hyper-V property and use
> >>hv_cpuid_check_and_set() to check the dependency of tlbflush
> >>feature.
> >>- Make new feature work with Hyper-V passthrough mode.
> >> ---
> >>  docs/hyperv.txt   | 10 ++
> >>  target/i386/cpu.c |  2 ++
> >>  target/i386/cpu.h |  1 +
> >>  target/i386/kvm.c | 24 
> >>  4 files changed, 37 insertions(+)
> >> 
> >> diff --git a/docs/hyperv.txt b/docs/hyperv.txt
> >> index 8fdf25c829..140a5c7e44 100644
> >> --- a/docs/hyperv.txt
> >> +++ b/docs/hyperv.txt
> >> @@ -184,6 +184,16 @@ enabled.
> >>  
> >>  Requires: hv-vpindex, hv-synic, hv-time, hv-stimer
> >>  
> >> +3.18. hv-direct-tlbflush
> >> +===
> >> +Enable direct TLB flush for KVM when it is running as a nested
> >> +hypervisor on top Hyper-V. When enabled, TLB flush hypercalls from L2
> >> +guests are being passed through to L0 (Hyper-V) for handling. Due to ABI
> >> +differences between Hyper-V and KVM hypercalls, L2 guests will not be
> >> +able to issue KVM hypercalls (as those could be mishanled by L0
> >> +Hyper-V), this requires KVM hypervisor signature to be hidden.
> >
> > On a second thought, I wonder if this is the only conflict we have.
> >
> > In KVM, kvm_emulate_hypercall, when sees Hyper-V hypercalls enabled,
> > just calls kvm_hv_hypercall and returns.  I.e. once the userspace
> > enables Hyper-V hypercalls (which QEMU does when any of hv_* flags is
> > given), KVM treats *all* hypercalls as Hyper-V ones and handles *no* KVM
> > hypercalls.
> 
> Yes, but only after guest enables Hyper-V hypercalls by writing to
> HV_X64_MSR_HYPERCALL. E.g. if you run a Linux guest and add a couple
> hv_* flags on the QEMU command line the guest will still be able to use
> KVM hypercalls normally becase Linux won't enable Hyper-V hypercall
> page.

Ah, you're right.  There's no conflict indeed, the guest makes
deliberate choice which hypercall ABI to use.

Then QEMU (or KVM on its own?) should only activate this flag in evmcs
if it sees that the guest has enabled Hyper-V hypercalls.  No need to
hide KVM signature.

Roman.



Re: virtio,iommu_platform=on

2019-11-13 Thread Michael S. Tsirkin
On Wed, Nov 13, 2019 at 03:44:28PM +1100, Alexey Kardashevskiy wrote:
> 
> 
> On 12/11/2019 18:08, Michael S. Tsirkin wrote:
> > On Tue, Nov 12, 2019 at 02:53:49PM +1100, Alexey Kardashevskiy wrote:
> >> Hi!
> >>
> >> I am enabling IOMMU for virtio in the pseries firmware (SLOF) and seeing
> >> problems, one of them is SLOF does SCSI bus scan, then it stops the
> >> virtio-scsi by clearing MMIO|IO|BUSMASTER from PCI_COMMAND (as SLOF
> >> stopped using the devices) and when this happens, I see unassigned
> >> memory access (see below) which happens because disabling busmaster
> >> disables IOMMU and QEMU cannot access the rings to do some shutdown. And
> >> when this happens, the device does not come back even if SLOF re-enables 
> >> it.
> > 
> > In fact clearing bus master should disable ring access even
> > without the IOMMU.
> > Once you do this you should not wait for rings to be processed,
> > it is safe to assume they won't be touched again and just
> > free up any buffers that have not been used.
> > 
> > Why don't you see this without IOMMU?
> 
> Because without IOMMU, virtio can always access rings, it does not need
> bus master address space for that.

Right and that's a bug in virtio scsi. E.g. virtio net checks
bus mastering before each access.
Which is all well and good, but we can't just break the world
so I guess we first need to fix SLOF, and then add
a compat property. And maybe keep it broken for
legacy ...

> 
> > It's a bug I think, probably there to work around buggy guests.
> > 
> > So pls fix this in SLOF and then hopefully we can drop the
> > work arounds and have clearing bus master actually block DMA.
> 
> 
> Laszlo suggested writing 0 to the status but this does not seem helping,
> with both ioeventfd=true/false. It looks like setting/clearing busmaster
> bit confused memory region caches in QEMU's virtio. I am confused which
> direction to keep digging to, any suggestions? Thanks,
> 

to clarify you reset after setting bus master? right?


> 
> > 
> >> Hacking SLOF to not clear BUSMASTER makes virtio-scsi work but it is
> >> hardly a right fix.
> >>
> >> Is this something expected? Thanks,
> >>
> >>
> >> Here is the exact command line:
> >>
> >> /home/aik/pbuild/qemu-garrison2-ppc64/ppc64-softmmu/qemu-system-ppc64 \
> >>
> >> -nodefaults \
> >>
> >> -chardev stdio,id=STDIO0,signal=off,mux=on \
> >>
> >> -device spapr-vty,id=svty0,reg=0x71000110,chardev=STDIO0 \
> >>
> >> -mon id=MON0,chardev=STDIO0,mode=readline \
> >>
> >> -nographic \
> >>
> >> -vga none \
> >>
> >> -enable-kvm \
> >> -m 2G \
> >>
> >> -device
> >> virtio-scsi-pci,id=vscsi0,iommu_platform=on,disable-modern=off,disable-legacy=on
> >> \
> >> -drive id=DRIVE0,if=none,file=img/u1804-64le.qcow2,format=qcow2 \
> >>
> >> -device scsi-disk,id=scsi-disk0,drive=DRIVE0 \
> >>
> >> -snapshot \
> >>
> >> -smp 1 \
> >>
> >> -machine pseries \
> >>
> >> -L /home/aik/t/qemu-ppc64-bios/ \
> >>
> >> -trace events=qemu_trace_events \
> >>
> >> -d guest_errors \
> >>
> >> -chardev socket,id=SOCKET0,server,nowait,path=qemu.mon.ssh59518 \
> >>
> >> -mon chardev=SOCKET0,mode=control
> >>
> >>
> >>
> >> Here is the backtrace:
> >>
> >> Thread 5 "qemu-system-ppc" hit Breakpoint 8, unassigned_mem_accepts
> >> (opaque=0x0, addr=0x5802, size=0x2, is_write=0x0, attrs=...) at /home/
> >> aik/p/qemu/memory.c:1275
> >> 1275return false;
> >> #0  unassigned_mem_accepts (opaque=0x0, addr=0x5802, size=0x2,
> >> is_write=0x0, attrs=...) at /home/aik/p/qemu/memory.c:1275
> >> #1  0x100a8ac8 in memory_region_access_valid (mr=0x1105c230
> >> , addr=0x5802, size=0x2, is_write=0x0, attrs=...) at
> >> /home/aik/p/qemu/memory.c:1377
> >> #2  0x100a8c88 in memory_region_dispatch_read (mr=0x1105c230
> >> , addr=0x5802, pval=0x7550d410, op=MO_16,
> >> attrs=...) at /home/aik/p/qemu/memory.c:1418
> >> #3  0x1001cad4 in address_space_lduw_internal_cached_slow
> >> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0,
> >> endian=DEVICE_LITTLE_ENDIAN) at /home/aik/p/qemu/memory_ldst.inc.c:211
> >> #4  0x1001cc84 in address_space_lduw_le_cached_slow
> >> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
> >> /home/aik/p/qemu/memory_ldst.inc.c:249
> >> #5  0x1019bd80 in address_space_lduw_le_cached
> >> (cache=0x7fff68036fa0, addr=0x2, attrs=..., result=0x0) at
> >> /home/aik/p/qemu/include/exec/memory_ldst_cached.inc.h:56
> >> #6  0x1019c10c in lduw_le_phys_cached (cache=0x7fff68036fa0,
> >> addr=0x2) at /home/aik/p/qemu/include/exec/memory_ldst_phys.inc.h:91
> >> #7  0x1019d86c in virtio_lduw_phys_cached (vdev=0x118b9110,
> >> cache=0x7fff68036fa0, pa=0x2) at
> >> /home/aik/p/qemu/include/hw/virtio/virtio-access.h:166
> >> #8  0x1019e648 in vring_avail_idx (vq=0x118c2720) at
> >> /home/aik/p/qemu/hw/virtio/virtio.c:302
> >> #9  0x1019f5bc in virtio_queue_split_empty (vq=0x118c2720) at
> >> /home/aik/p/qemu/hw/virtio/virtio.c:581
> >> #10 0x1019f838 in 

[Bug 1734792] Re: linux-user mode does not support memfd_create syscall

2019-11-13 Thread Laurent Vivier
9bdfa4d23f43 linux-user: add memfd_create


** Changed in: qemu
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1734792

Title:
  linux-user mode does not support memfd_create syscall

Status in QEMU:
  Fix Committed

Bug description:
  qemu-x86_66 GIT HEAD fails on a userspace application that requires
  memfd_create with:

  "qemu: Unsupported syscall: 319".

  memfd_create support needs to be implemented in QEMU.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1734792/+subscriptions



[Bug 1852115] Re: qemu --static user build fails with fedora rawhide glibc-2.30.9000

2019-11-13 Thread Laurent Vivier
Fixed by 0f1f2d4596ae ("linux-user: remove host stime() syscall")


** Changed in: qemu
   Status: Confirmed => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1852115

Title:
  qemu --static user build fails with fedora rawhide glibc-2.30.9000

Status in QEMU:
  Fix Committed

Bug description:
  Building qemu latest git 654efcb511d on fedora rawhide fails with this
  configure line:

  ./configure \
--static \
--disable-system \
--enable-linux-user \
--disable-werror \
--disable-tools \
--disable-capstone

  make fails with:

  /usr/bin/ld: linux-user/syscall.o: in function `do_syscall1':
  /root/qemu.git/linux-user/syscall.c:7769: undefined reference to `stime'
  collect2: error: ld returned 1 exit status

  Seems related to this glibc change:
  
https://sourceware.org/git/?p=glibc.git;a=commit;h=12cbde1dae6fa4a9a792b64564c7e0debf7544cc

  ...

  +* The obsolete function stime is no longer available to newly linked
  +  binaries and it has been removed from  header.  This function
  +  has been deprecated in favor of clock_settime.
  +

  # rpm -q glibc
  glibc-2.30.9000-17.fc32.x86_64

  
  FWIW there's some other messages but I don't think they are fatal:

  /usr/bin/ld: 
/usr/lib/gcc/x86_64-redhat-linux/9/../../../../lib64/libglib-2.0.a(gutils.c.o): 
in function `g_get_user_database_entry':
  (.text+0x267): warning: Using 'getpwuid' in statically linked applications 
requires at runtime the shared libraries from the glibc version used for linking
  /usr/bin/ld: (.text+0xe0): warning: Using 'getpwnam_r' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking
  /usr/bin/ld: (.text+0x11e): warning: Using 'getpwuid_r' in statically linked 
applications requires at runtime the shared libraries from the glibc version 
used for linking

  
  Also, --disable-capstone is required to avoid this error, but it is 
pre-existing, not sure if it's a bug, if so I can file a separate one:

LINKaarch64-linux-user/qemu-aarch64
  /usr/bin/ld: cannot find -lcapstone
  collect2: error: ld returned 1 exit status

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1852115/+subscriptions



Re: QEMU for Qualcomm Hexagon - KVM Forum talk and code available

2019-11-13 Thread Alex Bennée


Taylor Simpson  writes:

> I had discussions with several people at the KVM Forum, and I’ve been 
> thinking about how to divide up the code for community review.  Here is my 
> proposal for the steps.
>
>   1.  linux-user changes + linux-user/hexagon + skeleton of target/hexagon
> This is the minimum amount to build and run a very simple program.  I
>   have an assembly program that prints “Hello” and exits.  It is
>   constructed to use very few instructions that can be added brute
>   force in the Hexagon back end.

I'm hoping most of the linux-user changes are in the hexagon runloop?
There has been quite a bit of work splitting up and cleaning up the
#ifdef mess in linux-user over the last few years.

>   2.  Add the code that is imported from the Hexagon simulator and the qemu 
> helper generator
> This will allow the scalar ISA to be executed.  This will grow the set
> of programs that could execute, but there will still be limitations.
> In particular, there can be no packets which means the C library won’t
> work .  We have to build with -nostdlib

You could run -nostdlib system TCG tests (hello and memory) but that
would require modelling some sort of hardware and assumes you have a
simple serial port or semihosting solution. That said a bunch of the
MIPS tests are linux-user and -nostdlib so that isn't a major problem in
getting some of the tests running.

When you say code imported from the hexagon simulator I was under the
impression you were generating code from the instruction description.
Otherwise you'll need to be very clear about your licensing grants.

>   3.  Add support for packet semantics
> At this point, we will be able to execute full programs linked with
> the C library.  This will include the check-tcg tests.

I think the interesting question is if the roll-back semantics of the
hexagon are something we might need for other emulated architectures or
is a particularly specific solution for Hexagon (I'm guessing the later).

>   4.  Add support for the wide vector extensions
>   5.  Add the helper overrides for performance optimization
> Some of these will be written by hand, and we’ll work with rev.ng to
>   integrate their flex/bison generator.

One thing to nail down will be will we include the generated code in the
source tree with a tool to regenerate (much like we do for
linux-headers) or if we want to add the dependency and regenerate each
time from scratch. I don't see including flex/bison as a dependency
being a major issue (in fact we have it in our docker images so I guess
something uses it). However it might be trickier depending on
libclang which was also being discussed.

>
> I would love some feedback on this proposal.  Hopefully, that is enough 
> detail so that people can comment.  If anything isn’t clear, please ask 
> questions.
>
>
> Thanks,
> Taylor
>
>
> From: Qemu-devel  On 
> Behalf Of Taylor Simpson
> Sent: Tuesday, November 5, 2019 10:33 AM
> To: Aleksandar Markovic 
> Cc: Alessandro Di Federico ; ni...@rev.ng; 
> qemu-devel@nongnu.org; Niccolò Izzo 
> Subject: RE: QEMU for Qualcomm Hexagon - KVM Forum talk and code available
>
> Hi Aleksandar,
>
> Thank you – We’re glad you enjoyed the talk.
>
> One point of clarification on SIMD in Hexagon.  What we refer to as the 
> “scalar” core does have some SIMD operations.  Register pairs are 8 bytes, 
> and there are several SIMD instructions.  The example we showed in the talk 
> included a VADDH instruction.  It treats the register pair as 4 half-words 
> and does a vector add.  Then there are the Hexagon Vector eXtensions (HVX) 
> instructions that operate on 128-byte vectors.  There is a wide variety of 
> instructions in this set.  As you mentioned, some of them are pure SIMD and 
> others are very complex.
>
> For the helper generator, the vast majority of these are implemented with 
> helpers.  There are only 2 vector instructions in the scalar core that have a 
> TCG override, and all of the HVX instructions are implemented with helpers.  
> If you are interested in a deeper dive, see below.
>
> Alessandro and Niccolo can comment on the flex/bison implementation.
>
> Thanks,
> Taylor
>
>
> Now for the deeper dive in case anyone is interested.  Look at the genptr.c 
> file in target/hexagon.
>
> The first vector instruction that is with an override is A6_vminub_RdP.  It 
> does a byte-wise comparison of two register pairs and sets a predicate 
> register indicating whether the byte in the left or right operand is greater. 
>  Here is the TCG code.
> #define fWRAP_A6_vminub_RdP(GENHLPR, SHORTCODE) \
> { \
> TCGv BYTE = tcg_temp_new(); \
> TCGv left = tcg_temp_new(); \
> TCGv right = tcg_temp_new(); \
> TCGv tmp = tcg_temp_new(); \
> int i; \
> tcg_gen_movi_tl(PeV, 0); \
> tcg_gen_movi_i64(RddV, 0); \
> for (i = 0; i < 8; i++) { \
> fGETUBYTE(i, RttV); \
> tcg_gen_mov_tl(left, BYTE); \
> fGETUBYTE(i, RssV); \
> tcg_gen_mov_tl(right, BYTE); \

[PATCH v1 4/5] docs/devel: convert multi-thread-tcg to a .rst document

2019-11-13 Thread Alex Bennée
Signed-off-by: Alex Bennée 
---
 docs/devel/index.rst  |  1 +
 ...ti-thread-tcg.txt => multi-thread-tcg.rst} | 28 ---
 2 files changed, 19 insertions(+), 10 deletions(-)
 rename docs/devel/{multi-thread-tcg.txt => multi-thread-tcg.rst} (96%)

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index c86a3cdff2f..3e6624ec604 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -22,4 +22,5 @@ Contents:
decodetree
secure-coding-practices
tcg
+   multi-thread-tcg
tcg-plugins
diff --git a/docs/devel/multi-thread-tcg.txt b/docs/devel/multi-thread-tcg.rst
similarity index 96%
rename from docs/devel/multi-thread-tcg.txt
rename to docs/devel/multi-thread-tcg.rst
index 782bebc28b4..4e914bacc0c 100644
--- a/docs/devel/multi-thread-tcg.txt
+++ b/docs/devel/multi-thread-tcg.rst
@@ -1,7 +1,10 @@
-Copyright (c) 2015-2016 Linaro Ltd.
+.. Copyright (c) 2015-2016 Linaro Ltd.
+.. This work is licensed under the terms of the GNU GPL, version 2 or
+.. later. See the COPYING file in the top-level directory.
 
-This work is licensed under the terms of the GNU GPL, version 2 or
-later. See the COPYING file in the top-level directory.
+==
+Multi-threaded TCG
+==
 
 Introduction
 
@@ -40,7 +43,7 @@ Main Run Loop
 Even when there is no code being generated there are a number of
 structures associated with the hot-path through the main run-loop.
 These are associated with looking up the next translation block to
-execute. These include:
+execute. These include::
 
 tb_jmp_cache (per-vCPU, cache of recent jumps)
 tb_ctx.htable (global hash table, phys address->tb lookup)
@@ -61,7 +64,9 @@ have their block-to-block jumps patched.
 Global TCG State
 
 
-### User-mode emulation
+User-mode emulation
+~~~
+
 We need to protect the entire code generation cycle including any post
 generation patching of the translated code. This also implies a shared
 translation buffer which contains code running on all cores. Any
@@ -78,7 +83,9 @@ patching.
 
 Code generation is serialised with mmap_lock().
 
-### !User-mode emulation
+System emulation
+
+
 Each vCPU has its own TCG context and associated TCG region, thereby
 requiring no locking.
 
@@ -125,10 +132,11 @@ linked list of all Translation Blocks in that page (see 
page_next).
 Both the jump patching and the page cache involve linked lists that
 the invalidated TranslationBlock needs to be removed from.
 
-DESIGN REQUIREMENT: Safely handle invalidation of TBs
-  - safely patch/revert direct jumps
-  - remove central PageDesc lookup entries
-  - ensure lookup caches/hashes are safely updated
+DESIGN REQUIREMENTS:
+  - Safely handle invalidation of TBs
+ - safely patch/revert direct jumps
+ - remove central PageDesc lookup entries
+ - ensure lookup caches/hashes are safely updated
 
 (Current solution)
 
-- 
2.20.1




Re: [PATCH] virtio: fix IO request length in virtio SCSI/block #PSBM-78839

2019-11-13 Thread Denis Plotnikov


On 06.11.2019 15:03, Michael S. Tsirkin wrote:
> On Thu, Oct 24, 2019 at 11:34:34AM +, Denis Lunev wrote:
>> On 10/24/19 12:28 AM, Michael S. Tsirkin wrote:
>>> On Fri, Oct 18, 2019 at 02:55:47PM +0300, Denis Plotnikov wrote:
 From: "Denis V. Lunev" 

 Linux guests submit IO requests no longer than PAGE_SIZE * max_seg
 field reported by SCSI controler. Thus typical sequential read with
 1 MB size results in the following pattern of the IO from the guest:
8,16   115754 2.766095122  2071  D   R 2095104 + 1008 [dd]
8,16   115755 2.766108785  2071  D   R 2096112 + 1008 [dd]
8,16   115756 2.766113486  2071  D   R 2097120 + 32 [dd]
8,16   115757 2.767668961 0  C   R 2095104 + 1008 [0]
8,16   115758 2.768534315 0  C   R 2096112 + 1008 [0]
8,16   115759 2.768539782 0  C   R 2097120 + 32 [0]
 The IO was generated by
dd if=/dev/sda of=/dev/null bs=1024 iflag=direct

 This effectively means that on rotational disks we will observe 3 IOPS
 for each 2 MBs processed. This definitely negatively affects both
 guest and host IO performance.

 The cure is relatively simple - we should report lengthy scatter-gather
 ability of the SCSI controller. Fortunately the situation here is very
 good. VirtIO transport layer can accomodate 1024 items in one request
 while we are using only 128. This situation is present since almost
 very beginning. 2 items are dedicated for request metadata thus we
 should publish VIRTQUEUE_MAX_SIZE - 2 as max_seg.

 The following pattern is observed after the patch:
8,16   1 9921 2.662721340  2063  D   R 2095104 + 1024 [dd]
8,16   1 9922 2.662737585  2063  D   R 2096128 + 1024 [dd]
8,16   1 9923 2.665188167 0  C   R 2095104 + 1024 [0]
8,16   1 9924 2.665198777 0  C   R 2096128 + 1024 [0]
 which is much better.

 The dark side of this patch is that we are tweaking guest visible
 parameter, though this should be relatively safe as above transport
 layer support is present in QEMU/host Linux for a very long time.
 The patch adds configurable property for VirtIO SCSI with a new default
 and hardcode option for VirtBlock which does not provide good
 configurable framework.

 Unfortunately the commit can not be applied as is. For the real cure we
 need guest to be fixed to accomodate that queue length, which is done
 only in the latest 4.14 kernel. Thus we are going to expose the property
 and tweak it on machine type level.

 The problem with the old kernels is that they have
 max_segments <= virtqueue_size restriction which cause the guest
 crashing in the case of violation.
>>> This isn't just in the guests: virtio spec also seems to imply this,
>>> or at least be vague on this point.
>>>
>>> So I think it'll need a feature bit.
>>> Doing that in a safe way will also allow being compatible with old guests.
>>>
>>> The only downside is it's a bit more work as we need to
>>> spec this out and add guest support.
>>>
 To fix the case described above in the old kernels we can increase
 virtqueue_size to 256 and max_segments to 254. The pitfall here is
 that seabios allows the virtqueue_size-s < 128, however, the seabios
 patch extending that value to 256 is pending.
>>> And the fix here is just to limit large vq size to virtio 1.0.
>>> In that mode it's fine I think:
>>>
>>>
>>> /* check if the queue is available */
>>> if (vp->use_modern) {
>>> num = vp_read(>common, virtio_pci_common_cfg, queue_size);
>>> if (num > MAX_QUEUE_NUM) {
>>> vp_write(>common, virtio_pci_common_cfg, queue_size,
>>>  MAX_QUEUE_NUM);
>>> num = vp_read(>common, virtio_pci_common_cfg, queue_size);
>>> }
>>> } else {
>>> num = vp_read(>legacy, virtio_pci_legacy, queue_num);
>>> }
>> you mean to put the code like this into virtio_pci_realize() inside QEMU?
>>
>> If no, can you pls clarify which component should be touched.
>>
>> Den
> I mean:
>   - add an API to change the default queue size
>   - add a validate features callback, in there check and for modern
> flag set in features increase the queue size
>
> maybe all this is too much work, we could block this
> for transitional devices, but your patch does not do it,
> you need to check that legacy is enabled not that modern
> is not disabled.
To develop the idea of how to adjust queue size further I'd like to 
summarize what we have:

1. Variatly of gusts without(?) queue size limitations which can support 
queue sizes up to MAX(1024)

2. seabios setups with two possible max queue size limitations: 128 and 
256 (recently commited)

3. non-sebios setups with unknown max queue-size limitations

Taking into account that queue size may be 

[PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Gerd Hoffmann
Specifically port->driver.lchs needs clearing, otherwise seabios will
try interpret whatever random crap happens to be there as disk geometry,
which may or may not break boot depending on how lucky you are.

Signed-off-by: Gerd Hoffmann 
---
 src/hw/ahci.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/src/hw/ahci.c b/src/hw/ahci.c
index 97a072a1ca81..d45b4307ec68 100644
--- a/src/hw/ahci.c
+++ b/src/hw/ahci.c
@@ -345,6 +345,7 @@ ahci_port_alloc(struct ahci_ctrl_s *ctrl, u32 pnr)
 warn_noalloc();
 return NULL;
 }
+memset(port, 0, sizeof(*port));
 port->pnr = pnr;
 port->ctrl = ctrl;
 port->list = memalign_tmp(1024, 1024);
-- 
2.18.1




Re: [PATCH V4] target/i386/kvm: Add Hyper-V direct tlb flush support

2019-11-13 Thread Vitaly Kuznetsov
Roman Kagan  writes:

> On Wed, Nov 13, 2019 at 10:29:00AM +0100, Vitaly Kuznetsov wrote:
>> Roman Kagan  writes:
>> > On Tue, Nov 12, 2019 at 11:34:27AM +0800, lantianyu1...@gmail.com wrote:
>> >> From: Tianyu Lan 
>> >> 
>> >> Hyper-V direct tlb flush targets KVM on Hyper-V guest.
>> >> Enable direct TLB flush for its guests meaning that TLB
>> >> flush hypercalls are handled by Level 0 hypervisor (Hyper-V)
>> >> bypassing KVM in Level 1. Due to the different ABI for hypercall
>> >> parameters between Hyper-V and KVM, KVM capabilities should be
>> >> hidden when enable Hyper-V direct tlb flush otherwise KVM
>> >> hypercalls may be intercepted by Hyper-V. Add new parameter
>> >> "hv-direct-tlbflush". Check expose_kvm and Hyper-V tlb flush
>> >> capability status before enabling the feature.
>> >> 
>> >> Signed-off-by: Tianyu Lan 
>> >> ---
>> >> Change since v3:
>> >>- Fix logic of Hyper-V passthrough mode with direct
>> >>tlb flush.
>> >> 
>> >> Change sicne v2:
>> >>- Update new feature description and name.
>> >>- Change failure print log.
>> >> 
>> >> Change since v1:
>> >>- Add direct tlb flush's Hyper-V property and use
>> >>hv_cpuid_check_and_set() to check the dependency of tlbflush
>> >>feature.
>> >>- Make new feature work with Hyper-V passthrough mode.
>> >> ---
>> >>  docs/hyperv.txt   | 10 ++
>> >>  target/i386/cpu.c |  2 ++
>> >>  target/i386/cpu.h |  1 +
>> >>  target/i386/kvm.c | 24 
>> >>  4 files changed, 37 insertions(+)
>> >> 
>> >> diff --git a/docs/hyperv.txt b/docs/hyperv.txt
>> >> index 8fdf25c829..140a5c7e44 100644
>> >> --- a/docs/hyperv.txt
>> >> +++ b/docs/hyperv.txt
>> >> @@ -184,6 +184,16 @@ enabled.
>> >>  
>> >>  Requires: hv-vpindex, hv-synic, hv-time, hv-stimer
>> >>  
>> >> +3.18. hv-direct-tlbflush
>> >> +===
>> >> +Enable direct TLB flush for KVM when it is running as a nested
>> >> +hypervisor on top Hyper-V. When enabled, TLB flush hypercalls from L2
>> >> +guests are being passed through to L0 (Hyper-V) for handling. Due to ABI
>> >> +differences between Hyper-V and KVM hypercalls, L2 guests will not be
>> >> +able to issue KVM hypercalls (as those could be mishanled by L0
>> >> +Hyper-V), this requires KVM hypervisor signature to be hidden.
>> >
>> > On a second thought, I wonder if this is the only conflict we have.
>> >
>> > In KVM, kvm_emulate_hypercall, when sees Hyper-V hypercalls enabled,
>> > just calls kvm_hv_hypercall and returns.  I.e. once the userspace
>> > enables Hyper-V hypercalls (which QEMU does when any of hv_* flags is
>> > given), KVM treats *all* hypercalls as Hyper-V ones and handles *no* KVM
>> > hypercalls.
>> 
>> Yes, but only after guest enables Hyper-V hypercalls by writing to
>> HV_X64_MSR_HYPERCALL. E.g. if you run a Linux guest and add a couple
>> hv_* flags on the QEMU command line the guest will still be able to use
>> KVM hypercalls normally becase Linux won't enable Hyper-V hypercall
>> page.
>
> Ah, you're right.  There's no conflict indeed, the guest makes
> deliberate choice which hypercall ABI to use.
>
> Then QEMU (or KVM on its own?) should only activate this flag in evmcs
> if it sees that the guest has enabled Hyper-V hypercalls.

That was my suggestion as well when KVM patches were submitted, but if I
remember correctly Tianyu said that if we don't enable 'direct tlb
flush' flag in eVMCS on first VMLAUNCH, underlying Hyper-V won't give us
a second chance so we can't enadle it after guest writes to
HV_X64_MSR_HYPERCALL. This is a very unfortunate design/implementation.

-- 
Vitaly




Re: [RFC PATCH 18/18] qemu-storage-daemon: Add --monitor option

2019-11-13 Thread Kevin Wolf
Am 12.11.2019 um 15:25 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> 
> > This adds and parses the --monitor option, so that a QMP monitor can be
> > used in the storage daemon. The monitor offers commands defined in the
> > QAPI schema at storage-daemon/qapi/qapi-schema.json.
> 
> I feel we should explain the module sharing between the two QAPI
> schemata here.  It's not exactly trivial, as we'll see below.
> 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  storage-daemon/qapi/qapi-schema.json | 15 
> >  qemu-storage-daemon.c| 34 
> >  Makefile | 30 
> >  Makefile.objs|  4 ++--
> >  monitor/Makefile.objs|  2 ++
> >  qapi/Makefile.objs   |  5 
> >  qom/Makefile.objs|  1 +
> >  scripts/qapi/gen.py  |  5 
> >  storage-daemon/Makefile.objs |  1 +
> >  storage-daemon/qapi/Makefile.objs|  1 +
> >  10 files changed, 96 insertions(+), 2 deletions(-)
> >  create mode 100644 storage-daemon/qapi/qapi-schema.json
> >  create mode 100644 storage-daemon/Makefile.objs
> >  create mode 100644 storage-daemon/qapi/Makefile.objs
> >
> > diff --git a/storage-daemon/qapi/qapi-schema.json 
> > b/storage-daemon/qapi/qapi-schema.json
> > new file mode 100644
> > index 00..58c561ebea
> > --- /dev/null
> > +++ b/storage-daemon/qapi/qapi-schema.json
> > @@ -0,0 +1,15 @@
> > +# -*- Mode: Python -*-
> > +
> > +{ 'include': '../../qapi/pragma.json' }
> > +
> > +{ 'include': '../../qapi/block.json' }
> > +{ 'include': '../../qapi/block-core.json' }
> > +{ 'include': '../../qapi/char.json' }
> > +{ 'include': '../../qapi/common.json' }
> > +{ 'include': '../../qapi/crypto.json' }
> > +{ 'include': '../../qapi/introspect.json' }
> > +{ 'include': '../../qapi/job.json' }
> > +{ 'include': '../../qapi/monitor.json' }
> > +{ 'include': '../../qapi/qom.json' }
> > +{ 'include': '../../qapi/sockets.json' }
> > +{ 'include': '../../qapi/transaction.json' }
> > diff --git a/qemu-storage-daemon.c b/qemu-storage-daemon.c
> > index 46e0a6ea56..4939e6b41f 100644
> > --- a/qemu-storage-daemon.c
> > +++ b/qemu-storage-daemon.c
> > @@ -28,12 +28,16 @@
> >  #include "block/nbd.h"
> >  #include "chardev/char.h"
> >  #include "crypto/init.h"
> > +#include "monitor/monitor.h"
> > +#include "monitor/monitor-internal.h"
> >  
> >  #include "qapi/error.h"
> >  #include "qapi/qapi-commands-block.h"
> >  #include "qapi/qapi-commands-block-core.h"
> > +#include "qapi/qapi-commands-monitor.h"
> 
> Aren't these three redundant with ...
> 
> >  #include "qapi/qapi-visit-block.h"
> >  #include "qapi/qapi-visit-block-core.h"
> > +#include "qapi/qmp/qstring.h"
> >  #include "qapi/qobject-input-visitor.h"
> >  
> >  #include "qemu-common.h"
> > @@ -46,6 +50,8 @@
> >  #include "qemu/option.h"
> >  #include "qom/object_interfaces.h"
> >  
> > +#include "storage-daemon/qapi/qapi-commands.h"
> > +
> 
> ... this one now?

Looks like it.

> >  #include "sysemu/runstate.h"
> >  #include "trace/control.h"
> >  
> > @@ -58,6 +64,11 @@ void qemu_system_killed(int signal, pid_t pid)
> >  exit_requested = true;
> >  }
> >  
> > +void qmp_quit(Error **errp)
> > +{
> > +exit_requested = true;
> > +}
> > +
> >  static void help(void)
> >  {
> >  printf(
> > @@ -101,6 +112,7 @@ enum {
> >  OPTION_OBJECT = 256,
> >  OPTION_BLOCKDEV,
> >  OPTION_CHARDEV,
> > +OPTION_MONITOR,
> >  OPTION_NBD_SERVER,
> >  OPTION_EXPORT,
> >  };
> 
> Recommend to keep these sorted alphabetically.
> 
> > @@ -116,6 +128,17 @@ static QemuOptsList qemu_object_opts = {
> >  },
> >  };
> >  
> > +static void init_qmp_commands(void)
> > +{
> > +qmp_init_marshal(_commands);
> > +qmp_register_command(_commands, "query-qmp-schema",
> > + qmp_query_qmp_schema, QCO_ALLOW_PRECONFIG);
> > +
> > +QTAILQ_INIT(_cap_negotiation_commands);
> > +qmp_register_command(_cap_negotiation_commands, "qmp_capabilities",
> > + qmp_marshal_qmp_capabilities, 
> > QCO_ALLOW_PRECONFIG);
> > +}
> 
> Duplicates monitor_init_qmp_commands() less two 'gen': false commands
> that don't exist in the storage daemon.
> 
> Hmm, could we let commands.py generate command registration even for
> 'gen': false commands?

Possibly. Are you telling me to do it or is it just an idea for a later
cleanup?

> > +
> >  static void init_export(BlockExport *export, Error **errp)
> >  {
> >  switch (export->type) {
> > @@ -138,6 +161,7 @@ static int process_options(int argc, char *argv[], 
> > Error **errp)
> >  {"object", required_argument, 0, OPTION_OBJECT},
> >  {"blockdev", required_argument, 0, OPTION_BLOCKDEV},
> >  {"chardev", required_argument, 0, OPTION_CHARDEV},
> > +{"monitor", required_argument, 0, OPTION_MONITOR},
> >  {"nbd-server", required_argument, 0, 

[PATCH v1 2/5] docs/devel: rename plugins.rst to tcg-plugins.rst

2019-11-13 Thread Alex Bennée
This makes it a bit clearer what this is about.

Signed-off-by: Alex Bennée 
---
 docs/devel/index.rst| 2 +-
 docs/devel/{plugins.rst => tcg-plugins.rst} | 0
 MAINTAINERS | 1 +
 3 files changed, 2 insertions(+), 1 deletion(-)
 rename docs/devel/{plugins.rst => tcg-plugins.rst} (100%)

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 2ff058bae38..c86a3cdff2f 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -22,4 +22,4 @@ Contents:
decodetree
secure-coding-practices
tcg
-   plugins
+   tcg-plugins
diff --git a/docs/devel/plugins.rst b/docs/devel/tcg-plugins.rst
similarity index 100%
rename from docs/devel/plugins.rst
rename to docs/devel/tcg-plugins.rst
diff --git a/MAINTAINERS b/MAINTAINERS
index ff8d0d29f4b..b160d817208 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2369,6 +2369,7 @@ F: tcg/
 TCG Plugins
 M: Alex Bennée 
 S: Maintained
+F: docs/devel/tcg-plugins.rst
 F: plugins/
 F: tests/plugin
 
-- 
2.20.1




Re: [PATCH v2 3/4] watchdog/aspeed: Improve watchdog timeout message

2019-11-13 Thread Alex Bennée


Joel Stanley  writes:

> Users benefit from knowing which watchdog timer has expired. The address
> of the watchdog's registers unambiguously indicates which has expired,
> so log that.
>
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Joel Stanley 
> ---
> v2: Use HWADDR_PRIx
> ---
>  hw/watchdog/wdt_aspeed.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
> index 145be6f99ce2..8787c5ad0f97 100644
> --- a/hw/watchdog/wdt_aspeed.c
> +++ b/hw/watchdog/wdt_aspeed.c
> @@ -219,7 +219,8 @@ static void aspeed_wdt_timer_expired(void *dev)
>  return;
>  }
>
> -qemu_log_mask(CPU_LOG_RESET, "Watchdog timer expired.\n");
> +qemu_log_mask(CPU_LOG_RESET, "Watchdog timer %" HWADDR_PRIx " 
> expired.\n",
> +s->iomem.addr);

nit: spacing off

Reviewed-by: Alex Bennée 

>  watchdog_perform_action();
>  timer_del(s->timer);
>  }


--
Alex Bennée



Re: [PATCH] qmp: Reset mon->commands on CHR_EVENT_CLOSED

2019-11-13 Thread Markus Armbruster
Jason Andryuk  writes:

> Currently, mon->commands is uninitialized until CHR_EVENT_OPENED where
> it is set to _cap_negotiation_commands.  After capability
> negotiation, it is set to _commands.  If the chardev is closed,
> CHR_EVENT_CLOSED, mon->commands remains as _commands.  Only once the
> chardev is re-opened with CHR_EVENT_OPENED, is it reset to
> _cap_negotiation_commands.
>
> monitor_qapi_event_emit compares mon->commands to
> _cap_negotiation_commands, and skips sending events when they are
> equal.

The check's purpose is to ensure we don't send events in capabilities
negotiation mode, i.e. between connect and a successful qmp_capabilities
command.

> In the case of a closed chardev, QMP events are still sent down
> to the closed chardev which needs to drop them.

True, but I wonder how this can happen.  Can you explain?

> Set mon->commands to _cap_negotiation_commands for CHR_EVENT_CLOSED
> to stop sending events.  Setting for the CHR_EVENT_OPENED case remains
> since that is how mon->commands is set for a newly opened connections.
>
> Signed-off-by: Jason Andryuk 
> ---
>  monitor/qmp.c | 1 +
>  1 file changed, 1 insertion(+)
>
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 9d9e5d8b27..5e2073c5eb 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -333,6 +333,7 @@ static void monitor_qmp_event(void *opaque, int event)
   case CHR_EVENT_CLOSED:
   /*
* Note: this is only useful when the output of the chardev
* backend is still open.  For example, when the backend is
* stdio, it's possible that stdout is still open when stdin
>   * is closed.
>   */
>  monitor_qmp_cleanup_queues(mon);
> +mon->commands = _cap_negotiation_commands;
>  json_message_parser_destroy(>parser);
>  json_message_parser_init(>parser, handle_qmp_command,
>   mon, NULL);

Patchew reports this loses SHUTDOWN events.  Local testing confirms it
does.  Simplified reproducer:

$ for ((i=0; i<100; i++)); do printf '{"execute": 
"qmp_capabilities"}\n{"execute": "quit"}' | upstream-qemu -S -display none -qmp 
stdio; done | grep -c SHUTDOWN
84

In this test, the SHUTDOWN event was lost 16 out of 100 times.

Its emission obviously races with the assignment you add.

The comment preceding it provides a clue: after CHR_EVENT_CLOSED, we
know the input direction is gone, and we duly clean up that part of the
monitor.  But the output direction may or may not be gone, so we don't
touch it.  Your assignment touches it.  This is not the correct spot.
I can't tell you offhand where the correct spot is.  Perhaps Marc-André
can.




Re: [PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Laszlo Ersek
On 11/13/19 10:18, Gerd Hoffmann wrote:
> Specifically port->driver.lchs needs clearing, otherwise seabios will

s/driver/drive/

> try interpret whatever random crap happens to be there as disk geometry,
> which may or may not break boot depending on how lucky you are.
> 
> Signed-off-by: Gerd Hoffmann 
> ---
>  src/hw/ahci.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/src/hw/ahci.c b/src/hw/ahci.c
> index 97a072a1ca81..d45b4307ec68 100644
> --- a/src/hw/ahci.c
> +++ b/src/hw/ahci.c
> @@ -345,6 +345,7 @@ ahci_port_alloc(struct ahci_ctrl_s *ctrl, u32 pnr)
>  warn_noalloc();
>  return NULL;
>  }
> +memset(port, 0, sizeof(*port));
>  port->pnr = pnr;
>  port->ctrl = ctrl;
>  port->list = memalign_tmp(1024, 1024);
> 

Reviewed-by: Laszlo Ersek 




Re: [PATCH v2 05/15] stubs: add stubs for io_uring interface

2019-11-13 Thread Stefan Hajnoczi
On Mon, Nov 11, 2019 at 12:13:47PM +0100, Kevin Wolf wrote:
> Am 25.10.2019 um 18:04 hat Stefan Hajnoczi geschrieben:
> > From: Aarushi Mehta 
> > 
> > Signed-off-by: Aarushi Mehta 
> > Signed-off-by: Stefan Hajnoczi 
> 
> This commit message needs to answer at least where these stubs are
> actually used. Aren't all callers protected with #ifdef
> CONFIG_LINUX_IO_URING? (And if they aren't, why is abort() okay?)

Okay, will look into this.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 03/15] block/block: add BDRV flag for io_uring

2019-11-13 Thread Stefan Hajnoczi
On Mon, Nov 11, 2019 at 05:25:04PM +0100, Max Reitz wrote:
> On 11.11.19 11:57, Kevin Wolf wrote:
> > Am 25.10.2019 um 18:04 hat Stefan Hajnoczi geschrieben:
> >> From: Aarushi Mehta 
> >>
> >> Signed-off-by: Aarushi Mehta 
> >> Reviewed-by: Maxim Levitsky 
> >> Signed-off-by: Stefan Hajnoczi 
> >> ---
> >>  include/block/block.h | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/include/block/block.h b/include/block/block.h
> >> index 89606bd9f8..bdb48dcd1b 100644
> >> --- a/include/block/block.h
> >> +++ b/include/block/block.h
> >> @@ -126,6 +126,7 @@ typedef struct HDGeometry {
> >>ignoring the format layer */
> >>  #define BDRV_O_NO_IO   0x1 /* don't initialize for I/O */
> >>  #define BDRV_O_AUTO_RDONLY 0x2 /* degrade to read-only if opening 
> >> read-write fails */
> >> +#define BDRV_O_IO_URING0x4 /* use io_uring instead of the thread 
> >> pool */
> > 
> > Why do we need a new (global!) flag rather than a driver-specific option
> > for file-posix? This looks wrong to me, the flag has no sensible meaning
> > for any other driver.
> > 
> > (Yes, BDRV_O_NATIVE_AIO is wrong, too, but compatibility means we can't
> > remove it easily.)
> 
> To add to that, Kevin and me had a short talk on IRC, and it seemed like
> the reason would be that it’s easier to use this way than an option
> would be.
> 
> To me, the problem seems to be that it’s only easier to use because of
> the option inheritance we have in the block layer (so you can set this
> flag on a qcow2 node and its file node will have it, too).  But
> naturally this inheritance is a bit of magic (because qemu has to guess
> where you probably want the flag to end up), so I’d too rather avoid
> adding to it.
> 
> OTOH, I wonder whether ease of use is really that important here.  Would
> people who want to use io_uring really care about a command line that’s
> going to be a bit more complicated than just
> -drive file=foo.img,format=$imgfmt,aio=io_uring,if=$interface?  (Namely
> file.aio in this case, or maybe even a full-on block graph definition.)

Thanks for the idea.  I agree it's cleaner to avoid a new global flag.

I'll take a look at fixing this and let you know if I hit any problems.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH v2 1/4] aspeed/sdmc: Make ast2600 default 1G

2019-11-13 Thread Alex Bennée


Joel Stanley  writes:

> Most boards have this much.
>
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Joel Stanley 
> ---
>  hw/misc/aspeed_sdmc.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/aspeed_sdmc.c b/hw/misc/aspeed_sdmc.c
> index f3a63a2e01db..2df3244b53c8 100644
> --- a/hw/misc/aspeed_sdmc.c
> +++ b/hw/misc/aspeed_sdmc.c
> @@ -208,10 +208,10 @@ static int ast2600_rambits(AspeedSDMCState *s)
>  }
>
>  /* use a common default */
> -warn_report("Invalid RAM size 0x%" PRIx64 ". Using default 512M",
> +warn_report("Invalid RAM size 0x%" PRIx64 ". Using default 1024M",
>  s->ram_size);
> -s->ram_size = 512 << 20;
> -return ASPEED_SDMC_AST2600_512MB;
> +s->ram_size = 1024 << 20;

FWIW units.h has some nice #defines to wrap this stuff:

 s->ram_size = 1024 * MiB

Not a blocker though:

Reviewed-by: Alex Bennée 


> +return ASPEED_SDMC_AST2600_1024MB;
>  }
>
>  static void aspeed_sdmc_reset(DeviceState *dev)


--
Alex Bennée



About MONITOR/MWAIT in i386 CPU model

2019-11-13 Thread Tao Xu

Hi Eduardo,

After kvm use "-overcommit cpu-pm=on" to expose MONITOR/MWAIT
(commit id 6f131f13e68d648a8e4f083c667ab1acd88ce4cd), the MONITOR/MWAIT 
feature in CPU model (phenom core2duo coreduo n270 Opteron_G3 EPYC 
Snowridge Denverton) may be unused. For example, when we boot a guest 
with Denverton cpu model, guest cannot detect MONITOR and boot with no 
warning. Should we remove this feature from some CPU model?


Tested by Guo, Xuelian 

Tao Xu




[Bug 1811758] Re: virtio-rng backend should use getentropy() syscall when available

2019-11-13 Thread Laurent Vivier
rng-builtin is the new RNG default backend for virtio-rng and is based
on getrandom().

0198c2621a1e virtio-rng: change default backend to rng-builtin
5f7655f6ef15 virtio-rng: Keep the default backend out of VirtIORNGConf
6c4e9d487fea rng-builtin: add an RNG backend that uses qemu_guest_getrandom()

** Changed in: qemu
   Status: New => Fix Committed

-- 
You received this bug notification because you are a member of qemu-
devel-ml, which is subscribed to QEMU.
https://bugs.launchpad.net/bugs/1811758

Title:
  virtio-rng backend should use getentropy() syscall when available

Status in QEMU:
  Fix Committed

Bug description:
  According to https://wiki.qemu.org/Features/VirtIORNG the default
  backend for `virtio-rng-pci` is `/dev/random`.  Alternately, the user
  can point it to a different backend file, like `/dev/urandom`.

  However, both of these files have suboptimal behavior in one way or
  another, as documented in `random(7)`.  Instead, the default behavior
  should be to pull the requested octets from the `getrandom()` system
  call, if available, called with no flags set.

To manage notifications about this bug go to:
https://bugs.launchpad.net/qemu/+bug/1811758/+subscriptions



[PATCH v1 5/5] .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

2019-11-13 Thread Alex Bennée
The older clangs are still struggling to build and run everything
withing the 50 minute timeout so lets lighten the load a bit more. We
still have coverage for GCC and hopefully no obscure 32 bit guest only
breakages slip through the cracks.

Signed-off-by: Alex Bennée 
---
 .travis.yml | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index b9a026c8eeb..c09b6a00143 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -79,7 +79,7 @@ env:
 - BASE_CONFIG="--disable-docs --disable-tools"
 - TEST_CMD="make check V=1"
 # This is broadly a list of "mainline" softmmu targets which have support 
across the major distros
-- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
+- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
 - CCACHE_SLOPPINESS="include_file_ctime,include_file_mtime"
 - CCACHE_MAXSIZE=1G
 
-- 
2.20.1




[PATCH v1 1/5] tests/vm: make --interactive (and therefore DEBUG=1) unconditional

2019-11-13 Thread Alex Bennée
While the concept of only dropping to ssh if a test fails is nice it
is more useful for this to be unconditional. You usually just want to
get the build up and running and then noodle around debugging or
attempting to replicate.

Cc: Peter Maydell 
Signed-off-by: Alex Bennée 

---
v2
  - fix spelling
---
 tests/vm/basevm.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
index 91a9226026d..0b8c1b26576 100755
--- a/tests/vm/basevm.py
+++ b/tests/vm/basevm.py
@@ -403,7 +403,7 @@ def main(vmcls):
 exitcode = 0
 if vm.ssh(*cmd) != 0:
 exitcode = 3
-if exitcode != 0 and args.interactive:
+if args.interactive:
 vm.ssh()
 
 if not args.snapshot:
-- 
2.20.1




[PATCH for 4.2-rc2 v1 0/5] misc doc and testing fixes

2019-11-13 Thread Alex Bennée
Hi,

As we approach release I'm just bundling up my random fixes for 4.2
into one tree for each release candidate. This has a minor fix for
debugging vm builds and a number of minor documentation fixes
including moving the MTTCG docs across to rst format. The final patch
is a tweak to Travis to drop the 32 bit targets from
MAIN_SOFTMMU_TARGETS. This may be too controversial although the 32
bit targets still get built and tested under GCC.

Please review.

Alex Bennée (5):
  tests/vm: make --interactive (and therefore DEBUG=1) unconditional
  docs/devel: rename plugins.rst to tcg-plugins.rst
  docs/devel: update tcg-plugins.rst with API versioning details
  docs/devel: convert multi-thread-tcg to a .rst document
  .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

 docs/devel/index.rst  |  3 +-
 ...ti-thread-tcg.txt => multi-thread-tcg.rst} | 28 ---
 docs/devel/{plugins.rst => tcg-plugins.rst}   | 27 ++
 .travis.yml   |  2 +-
 MAINTAINERS   |  1 +
 tests/vm/basevm.py|  2 +-
 6 files changed, 44 insertions(+), 19 deletions(-)
 rename docs/devel/{multi-thread-tcg.txt => multi-thread-tcg.rst} (96%)
 rename docs/devel/{plugins.rst => tcg-plugins.rst} (83%)

-- 
2.20.1




Re: [PATCH v3 0/8] blockdev: avoid acquiring AioContext lock twice at do_drive_backup and do_blockdev_backup

2019-11-13 Thread Sergio Lopez


no-re...@patchew.org writes:

> Patchew URL: https://patchew.org/QEMU/20191112113012.71136-1-...@redhat.com/
>
>
>
> Hi,
>
> This series failed the docker-quick@centos7 build test. Please find the 
> testing commands and
> their output below. If you have Docker installed, you can probably reproduce 
> it
> locally.
>
> === TEST SCRIPT BEGIN ===
> #!/bin/bash
> make docker-image-centos7 V=1 NETWORK=1
> time make docker-test-quick@centos7 SHOW_ENV=1 J=14 NETWORK=1
> === TEST SCRIPT END ===
>
>   TESTiotest-qcow2: 268
> Failures: 141

Hm... 141 didn't fail in my test machine. I'm going to have a look.

Sergio.

> Failed 1 of 108 iotests
> make: *** [check-tests/check-block.sh] Error 1
> Traceback (most recent call last):
>   File "./tests/docker/docker.py", line 662, in 
> sys.exit(main())
> ---
> raise CalledProcessError(retcode, cmd)
> subprocess.CalledProcessError: Command '['sudo', '-n', 'docker', 'run', 
> '--label', 'com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417', '-u', 
> '1003', '--security-opt', 'seccomp=unconfined', '--rm', '-e', 'TARGET_LIST=', 
> '-e', 'EXTRA_CONFIGURE_OPTS=', '-e', 'V=', '-e', 'J=14', '-e', 'DEBUG=', 
> '-e', 'SHOW_ENV=1', '-e', 'CCACHE_DIR=/var/tmp/ccache', '-v', 
> '/home/patchew2/.cache/qemu-docker-ccache:/var/tmp/ccache:z', '-v', 
> '/var/tmp/patchew-tester-tmp-6a9_8q0n/src/docker-src.2019-11-12-17.38.46.26027:/var/tmp/qemu:z,ro',
>  'qemu:centos7', '/var/tmp/qemu/run', 'test-quick']' returned non-zero exit 
> status 2.
> filter=--filter=label=com.qemu.instance.uuid=5e0a4e7f97154a93b182d709969b9417
> make[1]: *** [docker-run] Error 1
> make[1]: Leaving directory `/var/tmp/patchew-tester-tmp-6a9_8q0n/src'
> make: *** [docker-run-test-quick@centos7] Error 2
>
> real10m57.839s
> user0m8.062s
>
>
> The full log is available at
> http://patchew.org/logs/20191112113012.71136-1-...@redhat.com/testing.docker-quick@centos7/?type=message.
> ---
> Email generated automatically by Patchew [https://patchew.org/].
> Please send your feedback to patchew-de...@redhat.com




Re: [PATCH 2/2] iotests: Test multiple blockdev-snapshot calls

2019-11-13 Thread Kevin Wolf
Am 08.11.2019 um 15:32 hat Alberto Garcia geschrieben:
> On Fri 08 Nov 2019 09:53:12 AM CET, Kevin Wolf wrote:
> > +# Test large write to a qcow2 image
> 
> This doesn't belong here I guess :)

Yes, fixed.

> I wonder if this test could go in 245 instead.

The headline for 245 is "Test cases for the QMP 'x-blockdev-reopen'
command", so I don't think so?

We end up testing the reopen code, but only as a side effect of
snapshotting, not through x-blockdev-reopen.

Kevin




Re: [PATCH 2/2] iotests: Test multiple blockdev-snapshot calls

2019-11-13 Thread Kevin Wolf
Am 12.11.2019 um 17:07 hat Peter Krempa geschrieben:
> On Fri, Nov 08, 2019 at 09:53:12 +0100, Kevin Wolf wrote:
> > Test that doing a second blockdev-snapshot doesn't make the first
> > overlay's backing file go away.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >  tests/qemu-iotests/273 |  76 +
> >  tests/qemu-iotests/273.out | 337 +
> >  tests/qemu-iotests/group   |   1 +
> >  3 files changed, 414 insertions(+)
> >  create mode 100755 tests/qemu-iotests/273
> >  create mode 100644 tests/qemu-iotests/273.out
> 
> Didn't apply cleanly for me.
> 
> > 
> > diff --git a/tests/qemu-iotests/273 b/tests/qemu-iotests/273
> > new file mode 100755
> > index 00..60076de7ce
> > --- /dev/null
> > +++ b/tests/qemu-iotests/273
> > @@ -0,0 +1,76 @@
> > +#!/usr/bin/env bash
> > +#
> > +# Test large write to a qcow2 image
> 
> Cut?

Indeed. I'll change it like this:

# Test multiple blockdev-snapshot calls with 'backing': null

> Rest looks good
> 
> Reviewed-by: Peter Krempa 

Thanks!

Kevin




How to extend QEMU's vhost-user tests after implementing vhost-user-blk device backend

2019-11-13 Thread Coiby Xu
Hi,

I've implemented vhost-user-blk device backend by following
https://wiki.qemu.org/Google_Summer_of_Code_2019#vhost-user-blk_device_backend.
But I'm not sure what kind of tests I should write or to extend to
take advantage of implemented vhost-user-blk device backend. The
existing two tests related to vhost user are tests/vhost-user-bridge.c
and tests/vhost-user-test.c both of which act as vhost user server to
test QEMU's implementation of vhost user client. Am I supposed to
extend these two tests? Could you elaborate on the final step "Extend
QEMU's vhost-user tests to take advantage of your vhost-user-blk
device backend"?

Thank you!
--
Best regards,
Coiby



Re: [PATCH RFC] virtio-pci: disable vring processing when bus-mastering is disabled

2019-11-13 Thread Michael S. Tsirkin
On Tue, Nov 12, 2019 at 11:43:01PM -0600, Michael Roth wrote:
> Currently the SLOF firmware for pseries guests will disable/re-enable
> a PCI device multiple times via IO/MEM/MASTER bits of PCI_COMMAND
> register after the initial probe/feature negotiation, as it tends to
> work with a single device at a time at various stages like probing
> and running block/network bootloaders without doing a full reset
> in-between.
> 
> In QEMU, when PCI_COMMAND_MASTER is disabled we disable the
> corresponding IOMMU memory region, so DMA accesses (including to vring
> fields like idx/flags) will no longer undergo the necessary
> translation. Normally we wouldn't expect this to happen since it would
> be misbehavior on the driver side to continue driving DMA requests.
> 
> However, in the case of pseries, with iommu_platform=on, we trigger the
> following sequence when tearing down the virtio-blk dataplane ioeventfd
> in response to the guest unsetting PCI_COMMAND_MASTER:
> 
>   #2  0x55922651 in virtqueue_map_desc 
> (vdev=vdev@entry=0x56dbcfb0, p_num_sg=p_num_sg@entry=0x7fffe657e1a8, 
> addr=addr@entry=0x7fffe657e240, iov=iov@entry=0x7fffe6580240, 
> max_num_sg=max_num_sg@entry=1024, is_write=is_write@entry=false, pa=0, sz=0)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:757
>   #3  0x55922a89 in virtqueue_pop (vq=vq@entry=0x56dc8660, 
> sz=sz@entry=184)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:950
>   #4  0x558d3eca in virtio_blk_get_request (vq=0x56dc8660, 
> s=0x56dbcfb0)
>   at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:255
>   #5  0x558d3eca in virtio_blk_handle_vq (s=0x56dbcfb0, 
> vq=0x56dc8660)
>   at /home/mdroth/w/qemu.git/hw/block/virtio-blk.c:776
>   #6  0x5591dd66 in virtio_queue_notify_aio_vq 
> (vq=vq@entry=0x56dc8660)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1550
>   #7  0x5591ecef in virtio_queue_notify_aio_vq (vq=0x56dc8660)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:1546
>   #8  0x5591ecef in virtio_queue_host_notifier_aio_poll 
> (opaque=0x56dc86c8)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio.c:2527
>   #9  0x55d02164 in run_poll_handlers_once 
> (ctx=ctx@entry=0x5688bfc0, timeout=timeout@entry=0x7fffe65844a8)
>   at /home/mdroth/w/qemu.git/util/aio-posix.c:520
>   #10 0x55d02d1b in try_poll_mode (timeout=0x7fffe65844a8, 
> ctx=0x5688bfc0)
>   at /home/mdroth/w/qemu.git/util/aio-posix.c:607
>   #11 0x55d02d1b in aio_poll (ctx=ctx@entry=0x5688bfc0, 
> blocking=blocking@entry=true)
>   at /home/mdroth/w/qemu.git/util/aio-posix.c:639
>   #12 0x55d0004d in aio_wait_bh_oneshot (ctx=0x5688bfc0, 
> cb=cb@entry=0x558d5130 , 
> opaque=opaque@entry=0x56de86f0)
>   at /home/mdroth/w/qemu.git/util/aio-wait.c:71
>   #13 0x558d59bf in virtio_blk_data_plane_stop (vdev=)
>   at /home/mdroth/w/qemu.git/hw/block/dataplane/virtio-blk.c:288
>   #14 0x55b906a1 in virtio_bus_stop_ioeventfd 
> (bus=bus@entry=0x56dbcf38)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:245
>   #15 0x55b90dbb in virtio_bus_stop_ioeventfd 
> (bus=bus@entry=0x56dbcf38)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio-bus.c:237
>   #16 0x55b92a8e in virtio_pci_stop_ioeventfd (proxy=0x56db4e40)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:292
>   #17 0x55b92a8e in virtio_write_config (pci_dev=0x56db4e40, 
> address=, val=1048832, len=)
>   at /home/mdroth/w/qemu.git/hw/virtio/virtio-pci.c:613
> 
> I.e. the calling code is only scheduling a one-shot BH for
> virtio_blk_data_plane_stop_bh, but somehow we end up trying to process
> an additional virtqueue entry before we get there. This is likely due
> to the following check in virtio_queue_host_notifier_aio_poll:
> 
>   static bool virtio_queue_host_notifier_aio_poll(void *opaque)
>   {
>   EventNotifier *n = opaque;
>   VirtQueue *vq = container_of(n, VirtQueue, host_notifier);
>   bool progress;
> 
>   if (!vq->vring.desc || virtio_queue_empty(vq)) {
>   return false;
>   }
> 
>   progress = virtio_queue_notify_aio_vq(vq);
> 
> namely the call to virtio_queue_empty(). In this case, since no new
> requests have actually been issued, shadow_avail_idx == last_avail_idx,
> so we actually try to access the vring via vring_avail_idx() to get
> the latest non-shadowed idx:
> 
>   int virtio_queue_empty(VirtQueue *vq)
>   {
>   bool empty;
>   ...
> 
>   if (vq->shadow_avail_idx != vq->last_avail_idx) {
>   return 0;
>   }
> 
>   rcu_read_lock();
>   empty = vring_avail_idx(vq) == vq->last_avail_idx;
>   rcu_read_unlock();
>   return empty;
> 
> but since the IOMMU region has been disabled we get a bogus value (0
> usually), which causes virtio_queue_empty() to falsely report that
> there are entries to be processed, 

Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-13 Thread Cornelia Huck
On Tue, 12 Nov 2019 15:30:05 -0700
Alex Williamson  wrote:

> On Tue, 12 Nov 2019 22:33:36 +0530
> Kirti Wankhede  wrote:
> 
> > - Defined MIGRATION region type and sub-type.
> > - Used 3 bits to define VFIO device states.
> > Bit 0 => _RUNNING
> > Bit 1 => _SAVING
> > Bit 2 => _RESUMING
> > Combination of these bits defines VFIO device's state during migration
> > _RUNNING => Normal VFIO device running state. When its reset, it
> > indicates _STOPPED state. when device is changed to
> > _STOPPED, driver should stop device before write()
> > returns.
> > _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> >   start saving state of device i.e. pre-copy state
> > _SAVING  => vCPUs are stopped, VFIO device should be stopped, and  
> 
> s/should/must/
> 
> > save device state,i.e. stop-n-copy state
> > _RESUMING => VFIO device resuming state.
> > _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states  
> 
> A table might be useful here and in the uapi header to indicate valid
> states:

I like that.

> 
> | _RESUMING | _SAVING | _RUNNING | Description
> +---+-+--+--
> | 0 |0| 0| Stopped, not saving or resuming (a)
> +---+-+--+--
> | 0 |0| 1| Running, default state
> +---+-+--+--
> | 0 |1| 0| Stopped, migration interface in save mode
> +---+-+--+--
> | 0 |1| 1| Running, save mode interface, iterative
> +---+-+--+--
> | 1 |0| 0| Stopped, migration resume interface active
> +---+-+--+--
> | 1 |0| 1| Invalid (b)
> +---+-+--+--
> | 1 |1| 0| Invalid (c)
> +---+-+--+--
> | 1 |1| 1| Invalid (d)
> 
> I think we need to consider whether we define (a) as generally
> available, for instance we might want to use it for diagnostics or a
> fatal error condition outside of migration.
> 
> Are there hidden assumptions between state transitions here or are
> there specific next possible state diagrams that we need to include as
> well?

Some kind of state-change diagram might be useful in addition to the
textual description anyway. Let me try, just to make sure I understand
this correctly:

1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1
2) 0/0/1 ---(tell driver to stop)---> 0/0/0
3) 0/1/1 ---(tell driver to stop)---> 0/1/0
4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0
5) 1/0/0 ---(driver is ready)---> 0/0/1
6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1

Not sure about the usefulness of 2). Also, is 4) the only way to
trigger resuming? And is the change in 5) performed by the driver, or
by userspace?

Are any other state transitions valid?

(...)

> > + * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy 
> > phase
> > + * and for _SAVING device state or stop-and-copy phase:
> > + * a. read pending_bytes. If pending_bytes > 0, go through below steps.
> > + * b. read data_offset, indicates kernel driver to write data to staging 
> > buffer.
> > + *Kernel driver should return this read operation only after writing 
> > data to
> > + *staging buffer is done.  
> 
> "staging buffer" implies a vendor driver implementation, perhaps we
> could just state that data is available from (region + data_offset) to
> (region + data_offset + data_size) upon return of this read operation.
> 
> > + * c. read data_size, amount of data in bytes written by vendor driver in
> > + *migration region.
> > + * d. read data_size bytes of data from data_offset in the migration 
> > region.
> > + * e. process data.
> > + * f. Loop through a to e. Next read on pending_bytes indicates that read 
> > data
> > + *operation from migration region for previous iteration is done.  
> 
> I think this indicate that step (f) should be to read pending_bytes, the
> read sequence is not complete until this step.  Optionally the user can
> then proceed to step (b).  There are no read side-effects of (a) afaict.
> 
> Is the use required to reach pending_bytes == 0 before changing
> device_state, particularly transitioning to !_RUNNING?  Presumably the
> user can exit this sequence at any time by clearing _SAVING.

That would be transition 6) above (abort saving and continue). I think
it makes sense not to forbid this.

> 
> > + *
> > + * Sequence to be followed 

Re: [PATCH v9 Qemu 00/15] Add migration support for VFIO devices

2019-11-13 Thread Cornelia Huck
On Tue, 12 Nov 2019 22:35:09 +0530
Kirti Wankhede  wrote:

> Hi,
> 
> This Patch set adds migration support for VFIO devices in QEMU.
> 
> This Patch set include patches as below:
> Patch 1-3:
> - Define KABI for VFIO device for migration support for device state and newly
>   added ioctl definations to get dirty pages bitmap. These 3 patches are same 
> as
>   the first 2 patches in kernel patch set.

Meta: Might make sense to replace these three patches with a
placeholder for a linux-headers update, as we're reviewing this on the
kernel side anyway.




Re: [PATCH] enable translating statx syscalls on more arches

2019-11-13 Thread Laurent Vivier
Le 13/11/2019 à 10:55, Aleksandar Markovic a écrit :
> 
> 
> On Tuesday, November 12, 2019, Andrew Kelley  > wrote:
> 
> ping
> 
> 
> 
> Hello, Andrew.
> 
> I just want to advise you to send, if possible, a new version of the
> patch, addressing whatt said in his review. It is better if you send it
> sooner rather than later, given the stage of our dev cycle.
> 
> For MIPS parts:
> 
> Reviewed-by: Aleksandar Markovic  >
> 

The i386 part breaks the build because it defines TARGET_NR_arch_prctl
and in linux-user/syscall.c we have:

#ifdef TARGET_NR_arch_prctl
case TARGET_NR_arch_prctl:
#if defined(TARGET_I386) && !defined(TARGET_ABI32)
return do_arch_prctl(cpu_env, arg1, arg2);
#else
#error unreachable
#endif
#endif

Thanks,
Laurent




Re: [PATCH v2 00/15] io_uring: add Linux io_uring AIO engine

2019-11-13 Thread Stefan Hajnoczi
On Mon, Nov 04, 2019 at 11:32:33AM +0100, Stefan Hajnoczi wrote:
> On Fri, Oct 25, 2019 at 06:04:29PM +0200, Stefan Hajnoczi wrote:
> > v11:
> >  * Drop fd registration because it breaks QEMU's file locking and will need 
> > to
> >be resolved in a separate patch series
> >  * Drop line-wrapping changes that accidentally broke several qemu-iotests
> > 
> > v10:
> >  * Dropped kernel submission queue polling, it requires root and has 
> > additional
> >limitations.  It should be benchmarked and considered for inclusion 
> > later,
> >maybe even together with kernel side changes.
> >  * Add io_uring_register_files() return value to trace_luring_fd_register()
> >  * Fix indentation in luring_fd_unregister()
> >  * Set s->fd_reg.fd_array to NULL after g_free() to avoid dangling pointers
> >  * Simplify fd registration code
> >  * Add luring_fd_unregister() and call it from file-posix.c to prevent
> >fd leaks
> >  * Add trace_luring_fd_unregister() trace event
> >  * Add missing space to qemu-img command-line documentation
> >  * Update MAINTAINERS file [Julia]
> >  * Rename MAX_EVENTS to MAX_ENTRIES [Julia]
> >  * Define ioq_submit() before callers so the prototype isn't necessary 
> > [Julia]
> >  * Declare variables at the beginning of the block in luring_init() [Julia]
> > 
> > This patch series is based on Aarushi Mehta's v9 patch series written for
> > Google Summer of Code 2019:
> > 
> >   https://lists.gnu.org/archive/html/qemu-devel/2019-08/msg00179.html
> > 
> > It adds a new AIO engine that uses the new Linux io_uring API.  This is the
> > successor to Linux AIO with a number of improvements:
> > 1. Both O_DIRECT and buffered I/O work
> > 2. fdatasync(2) is supported (no need for a separate thread pool!)
> > 3. True async behavior so the syscall doesn't block (Linux AIO got there to 
> > some degree...)
> > 4. Advanced performance optimizations are available (file registration, 
> > memory
> >buffer registration, completion polling, submission polling).
> > 
> > Since Aarushi has been busy, I have taken up this patch series.  Booting a
> > guest works with -drive aio=io_uring and -drive aio=io_uring,cache=none 
> > with a
> > raw file on XFS.
> > 
> > I currently recommend using -drive aio=io_uring only with host block devices
> > (like NVMe devices).  As of Linux v5.4-rc1 I still hit kernel bugs when 
> > using
> > image files on ext4 or XFS.
> > 
> > Aarushi Mehta (15):
> >   configure: permit use of io_uring
> >   qapi/block-core: add option for io_uring
> >   block/block: add BDRV flag for io_uring
> >   block/io_uring: implements interfaces for io_uring
> >   stubs: add stubs for io_uring interface
> >   util/async: add aio interfaces for io_uring
> >   blockdev: adds bdrv_parse_aio to use io_uring
> >   block/file-posix.c: extend to use io_uring
> >   block: add trace events for io_uring
> >   block/io_uring: adds userspace completion polling
> >   qemu-io: adds option to use aio engine
> >   qemu-img: adds option to use aio engine for benchmarking
> >   qemu-nbd: adds option for aio engines
> >   tests/qemu-iotests: enable testing with aio options
> >   tests/qemu-iotests: use AIOMODE with various tests
> > 
> >  MAINTAINERS   |   9 +
> >  qapi/block-core.json  |   4 +-
> >  configure |  27 +++
> >  block/Makefile.objs   |   3 +
> >  stubs/Makefile.objs   |   1 +
> >  include/block/aio.h   |  16 +-
> >  include/block/block.h |   2 +
> >  include/block/raw-aio.h   |  12 +
> >  block.c   |  22 ++
> >  block/file-posix.c|  99 ++--
> >  block/io_uring.c  | 433 ++
> >  blockdev.c|  12 +-
> >  qemu-img.c|  11 +-
> >  qemu-io.c |  25 +-
> >  qemu-nbd.c|  12 +-
> >  stubs/io_uring.c  |  32 +++
> >  util/async.c  |  36 +++
> >  block/trace-events|  12 +
> >  qemu-img-cmds.hx  |   4 +-
> >  qemu-img.texi |   5 +-
> >  qemu-nbd.texi |   4 +-
> >  tests/qemu-iotests/028|   2 +-
> >  tests/qemu-iotests/058|   2 +-
> >  tests/qemu-iotests/089|   4 +-
> >  tests/qemu-iotests/091|   4 +-
> >  tests/qemu-iotests/109|   2 +-
> >  tests/qemu-iotests/147|   5 +-
> >  tests/qemu-iotests/181|   8 +-
> >  tests/qemu-iotests/183|   4 +-
> >  tests/qemu-iotests/185|  10 +-
> >  tests/qemu-iotests/200|   2 +-
> >  tests/qemu-iotests/201|   8 +-
> >  tests/qemu-iotests/check  |  15 +-
> >  tests/qemu-iotests/common.rc  |  14 ++
> >  tests/qemu-iotests/iotests.py |  12 +-
> >  35 files changed, 797 insertions(+), 76 deletions(-)
> >  create mode 100644 block/io_uring.c
> >  create mode 100644 stubs/io_uring.c
> 
> Fixed up commit description as requested by Markus.
> 
> Thanks, applied to my 

[PATCH v1 3/5] docs/devel: update tcg-plugins.rst with API versioning details

2019-11-13 Thread Alex Bennée
While we are at it fix up the quoted code sections with the inline ::
approach.

Signed-off-by: Alex Bennée 

---
v2
  - fix grammar
  - mention we also will fail to load outside the range
  - clean-up code sections
---
 docs/devel/tcg-plugins.rst | 27 +--
 1 file changed, 21 insertions(+), 6 deletions(-)

diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
index b18fb6729e3..718eef00f22 100644
--- a/docs/devel/tcg-plugins.rst
+++ b/docs/devel/tcg-plugins.rst
@@ -25,6 +25,23 @@ process. However the project reserves the right to change or 
break the
 API should it need to do so. The best way to avoid this is to submit
 your plugin upstream so they can be updated if/when the API changes.
 
+API versioning
+--
+
+All plugins need to declare a symbol which exports the plugin API
+version they were built against. This can be done simply by::
+
+  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
+
+The core code will refuse to load a plugin that doesn't export a
+`qemu_plugin_version` symbol or if plugin version is outside of QEMU's
+supported range of API versions.
+
+Additionally the `qemu_info_t` structure which is passed to the
+`qemu_plugin_install` method of a plugin will detail the minimum and
+current API versions supported by QEMU. The API version will be
+incremented if new APIs are added. The minimum API version will be
+incremented if existing APIs are changed or removed.
 
 Exposure of QEMU internals
 --
@@ -40,16 +57,14 @@ instructions and events are opaque to the plugins 
themselves.
 Usage
 =
 
-The QEMU binary needs to be compiled for plugin support:
+The QEMU binary needs to be compiled for plugin support::
 
-::
-configure --enable-plugins
+  configure --enable-plugins
 
 Once built a program can be run with multiple plugins loaded each with
-their own arguments:
+their own arguments::
 
-::
-$QEMU $OTHER_QEMU_ARGS \
+  $QEMU $OTHER_QEMU_ARGS \
   -plugin tests/plugin/libhowvec.so,arg=inline,arg=hint \
   -plugin tests/plugin/libhotblocks.so
 
-- 
2.20.1




Re: [PATCH v2 2/4] aspeed/scu: Fix W1C behavior

2019-11-13 Thread Alex Bennée


Joel Stanley  writes:

> This models the clock write one to clear registers, and fixes up some
> incorrect behavior in all of the write to clear registers.
>
> There was also a typo in one of the register definitions.
>
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Joel Stanley 
> ---
>  hw/misc/aspeed_scu.c | 11 ---
>  1 file changed, 8 insertions(+), 3 deletions(-)
>
> diff --git a/hw/misc/aspeed_scu.c b/hw/misc/aspeed_scu.c
> index 717509bc5460..aac4645f8c3c 100644
> --- a/hw/misc/aspeed_scu.c
> +++ b/hw/misc/aspeed_scu.c
> @@ -98,7 +98,7 @@
>  #define AST2600_CLK_STOP_CTRL TO_REG(0x80)
>  #define AST2600_CLK_STOP_CTRL_CLR TO_REG(0x84)
>  #define AST2600_CLK_STOP_CTRL2 TO_REG(0x90)
> -#define AST2600_CLK_STOP_CTR2L_CLR TO_REG(0x94)
> +#define AST2600_CLK_STOP_CTRL2_CLR TO_REG(0x94)
>  #define AST2600_SDRAM_HANDSHAKE   TO_REG(0x100)
>  #define AST2600_HPLL_PARAMTO_REG(0x200)
>  #define AST2600_HPLL_EXT  TO_REG(0x204)
> @@ -532,11 +532,12 @@ static uint64_t aspeed_ast2600_scu_read(void *opaque, 
> hwaddr offset,
>  return s->regs[reg];
>  }
>
> -static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t 
> data,
> +static void aspeed_ast2600_scu_write(void *opaque, hwaddr offset, uint64_t 
> data64,
>   unsigned size)
>  {
>  AspeedSCUState *s = ASPEED_SCU(opaque);
>  int reg = TO_REG(offset);
> +uint32_t data = data64;

Does it make much difference silently truncating to 32 bit here vs in
the actual set further down? AFAICT most _write functions just deal with
it at the final set.

>
>  if (reg >= ASPEED_AST2600_SCU_NR_REGS) {
>  qemu_log_mask(LOG_GUEST_ERROR,
> @@ -563,15 +564,19 @@ static void aspeed_ast2600_scu_write(void *opaque, 
> hwaddr offset, uint64_t data,
>  /* fall through */
>  case AST2600_SYS_RST_CTRL:
>  case AST2600_SYS_RST_CTRL2:
> +case AST2600_CLK_STOP_CTRL:
> +case AST2600_CLK_STOP_CTRL2:
>  /* W1S (Write 1 to set) registers */
>  s->regs[reg] |= data;
>  return;
>  case AST2600_SYS_RST_CTRL_CLR:
>  case AST2600_SYS_RST_CTRL2_CLR:
> +case AST2600_CLK_STOP_CTRL_CLR:
> +case AST2600_CLK_STOP_CTRL2_CLR:
>  case AST2600_HW_STRAP1_CLR:
>  case AST2600_HW_STRAP2_CLR:
>  /* W1C (Write 1 to clear) registers */
> -s->regs[reg] &= ~data;
> +s->regs[reg - 1] &= ~data;

It might be worth expanding the W1C comment just to mention the
alignment of _CLR vs _CTRL registers.

Otherwise:

Reviewed-by: Alex Bennée 


>  return;
>
>  case AST2600_RNG_DATA:


--
Alex Bennée



Re: [PATCH v2 4/4] watchdog/aspeed: Fix AST2600 frequency behaviour

2019-11-13 Thread Alex Bennée


Joel Stanley  writes:

> The AST2600 control register sneakily changed the meaning of bit 4
> without anyone noticing. It no longer controls the 1MHz vs APB clock
> select, and instead always runs at 1MHz.
>
> The AST2500 was always 1MHz too, but it retained bit 4, making it read
> only. We can model both using the same fixed 1MHz calculation.
>
> Fixes: 6b2b2a703cad ("hw: wdt_aspeed: Add AST2600 support")
> Reviewed-by: Cédric Le Goater 
> Signed-off-by: Joel Stanley 

Reviewed-by: Alex Bennée 

> ---
> v2: Fix Fixes line in commit message
> ---
>  hw/watchdog/wdt_aspeed.c | 21 +
>  include/hw/watchdog/wdt_aspeed.h |  1 +
>  2 files changed, 18 insertions(+), 4 deletions(-)
>
> diff --git a/hw/watchdog/wdt_aspeed.c b/hw/watchdog/wdt_aspeed.c
> index 8787c5ad0f97..9a8a2200fd8e 100644
> --- a/hw/watchdog/wdt_aspeed.c
> +++ b/hw/watchdog/wdt_aspeed.c
> @@ -93,11 +93,11 @@ static uint64_t aspeed_wdt_read(void *opaque, hwaddr 
> offset, unsigned size)
>
>  }
>
> -static void aspeed_wdt_reload(AspeedWDTState *s, bool pclk)
> +static void aspeed_wdt_reload(AspeedWDTState *s)
>  {
>  uint64_t reload;
>
> -if (pclk) {
> +if (!(s->regs[WDT_CTRL] & WDT_CTRL_1MHZ_CLK)) {
>  reload = muldiv64(s->regs[WDT_RELOAD_VALUE], NANOSECONDS_PER_SECOND,
>s->pclk_freq);
>  } else {
> @@ -109,6 +109,16 @@ static void aspeed_wdt_reload(AspeedWDTState *s, bool 
> pclk)
>  }
>  }
>
> +static void aspeed_wdt_reload_1mhz(AspeedWDTState *s)
> +{
> +uint64_t reload = s->regs[WDT_RELOAD_VALUE] * 1000ULL;
> +
> +if (aspeed_wdt_is_enabled(s)) {
> +timer_mod(s->timer, qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL) + reload);
> +}
> +}
> +
> +
>  static void aspeed_wdt_write(void *opaque, hwaddr offset, uint64_t data,
>   unsigned size)
>  {
> @@ -130,13 +140,13 @@ static void aspeed_wdt_write(void *opaque, hwaddr 
> offset, uint64_t data,
>  case WDT_RESTART:
>  if ((data & 0x) == WDT_RESTART_MAGIC) {
>  s->regs[WDT_STATUS] = s->regs[WDT_RELOAD_VALUE];
> -aspeed_wdt_reload(s, !(s->regs[WDT_CTRL] & WDT_CTRL_1MHZ_CLK));
> +awc->wdt_reload(s);
>  }
>  break;
>  case WDT_CTRL:
>  if (enable && !aspeed_wdt_is_enabled(s)) {
>  s->regs[WDT_CTRL] = data;
> -aspeed_wdt_reload(s, !(data & WDT_CTRL_1MHZ_CLK));
> +awc->wdt_reload(s);
>  } else if (!enable && aspeed_wdt_is_enabled(s)) {
>  s->regs[WDT_CTRL] = data;
>  timer_del(s->timer);
> @@ -283,6 +293,7 @@ static void aspeed_2400_wdt_class_init(ObjectClass 
> *klass, void *data)
>  awc->offset = 0x20;
>  awc->ext_pulse_width_mask = 0xff;
>  awc->reset_ctrl_reg = SCU_RESET_CONTROL1;
> +awc->wdt_reload = aspeed_wdt_reload;
>  }
>
>  static const TypeInfo aspeed_2400_wdt_info = {
> @@ -317,6 +328,7 @@ static void aspeed_2500_wdt_class_init(ObjectClass 
> *klass, void *data)
>  awc->ext_pulse_width_mask = 0xf;
>  awc->reset_ctrl_reg = SCU_RESET_CONTROL1;
>  awc->reset_pulse = aspeed_2500_wdt_reset_pulse;
> +awc->wdt_reload = aspeed_wdt_reload_1mhz;
>  }
>
>  static const TypeInfo aspeed_2500_wdt_info = {
> @@ -336,6 +348,7 @@ static void aspeed_2600_wdt_class_init(ObjectClass 
> *klass, void *data)
>  awc->ext_pulse_width_mask = 0xf; /* TODO */
>  awc->reset_ctrl_reg = AST2600_SCU_RESET_CONTROL1;
>  awc->reset_pulse = aspeed_2500_wdt_reset_pulse;
> +awc->wdt_reload = aspeed_wdt_reload_1mhz;
>  }
>
>  static const TypeInfo aspeed_2600_wdt_info = {
> diff --git a/include/hw/watchdog/wdt_aspeed.h 
> b/include/hw/watchdog/wdt_aspeed.h
> index dfedd7662dd1..819c22993a6e 100644
> --- a/include/hw/watchdog/wdt_aspeed.h
> +++ b/include/hw/watchdog/wdt_aspeed.h
> @@ -47,6 +47,7 @@ typedef struct AspeedWDTClass {
>  uint32_t ext_pulse_width_mask;
>  uint32_t reset_ctrl_reg;
>  void (*reset_pulse)(AspeedWDTState *s, uint32_t property);
> +void (*wdt_reload)(AspeedWDTState *s);
>  }  AspeedWDTClass;
>
>  #endif /* WDT_ASPEED_H */


--
Alex Bennée



Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread

2019-11-13 Thread Daniel P . Berrangé
On Wed, Nov 13, 2019 at 11:04:58AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:51 AM, Daniel P. Berrangé wrote:
> > On Wed, Nov 13, 2019 at 10:38:06AM -0500, Jag Raman wrote:
> > > 
> > > 
> > > On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
> > > > On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
> > > > > qemu_thread_cancel() added to destroy a given running thread.
> > > > > This will be needed in the following patches.
> > > > > 
> > > > > Signed-off-by: John G Johnson 
> > > > > Signed-off-by: Jagannathan Raman 
> > > > > Signed-off-by: Elena Ufimtseva 
> > > > > ---
> > > > >include/qemu/thread.h|  1 +
> > > > >util/qemu-thread-posix.c | 10 ++
> > > > >2 files changed, 11 insertions(+)
> > > > 
> > > > Is this still needed?  I thought previous discussion concluded that
> > > > thread cancellation is hard to get right and it's not actually used by
> > > > this series?
> > > 
> > > Hi Stefan,
> > > 
> > > This is used in PATCH 41/49.
> > 
> > I don't believe the cancellation usage in that patch is safe :-)
> 
> Thanks for the feedback, we will address that.
> 
> May I please ask why it is not safe? Any clarification will help us to
> find a better alternative.

I put some comments inline in the patch 41 explaining my thoughts.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object

2019-11-13 Thread Jag Raman




On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:

On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:

Collect the VMSD from remote process on the source and save
it to the channel leading to the destination

Signed-off-by: Elena Ufimtseva 
Signed-off-by: John G Johnson 
Signed-off-by: Jagannathan Raman 
---
  New patch in v4

  hw/proxy/qemu-proxy.c | 132 ++
  include/hw/proxy/qemu-proxy.h |   2 +
  include/io/mpqemu-link.h  |   1 +
  3 files changed, 135 insertions(+)

diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
index 623a6c5..ce72e6a 100644
--- a/hw/proxy/qemu-proxy.c
+++ b/hw/proxy/qemu-proxy.c
@@ -52,6 +52,14 @@
  #include "util/event_notifier-posix.c"
  #include "hw/boards.h"
  #include "include/qemu/log.h"
+#include "io/channel.h"
+#include "migration/qemu-file-types.h"
+#include "qapi/error.h"
+#include "io/channel-util.h"
+#include "migration/qemu-file-channel.h"
+#include "migration/qemu-file.h"
+#include "migration/migration.h"
+#include "migration/vmstate.h"
  
  QEMUTimer *hb_timer;

  static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
@@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
  static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
  
+#define PAGE_SIZE getpagesize()

+uint8_t *mig_data;
+
  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
  {
  /* TODO: Add proper handler. */
@@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
  dev->mem_init = false;
  }
  
+typedef struct {

+QEMUFile *rem;
+PCIProxyDev *dev;
+} proxy_mig_data;
+
+static void *proxy_mig_out(void *opaque)
+{
+proxy_mig_data *data = opaque;
+PCIProxyDev *dev = data->dev;
+uint8_t byte;
+uint64_t data_size = PAGE_SIZE;
+
+mig_data = g_malloc(data_size);
+
+while (true) {
+byte = qemu_get_byte(data->rem);


There is a pretty large set of APIs hiding behind the qemu_get_byte
call, which does not give me confidence that...


+mig_data[dev->migsize++] = byte;
+if (dev->migsize == data_size) {
+data_size += PAGE_SIZE;
+mig_data = g_realloc(mig_data, data_size);
+}
+}
+
+return NULL;
+}
+
+static int proxy_pre_save(void *opaque)
+{
+PCIProxyDev *pdev = opaque;
+proxy_mig_data *mig_data;
+QEMUFile *f_remote;
+MPQemuMsg msg = {0};
+QemuThread thread;
+Error *err = NULL;
+QIOChannel *ioc;
+uint64_t size;
+int fd[2];
+
+if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
+return -1;
+}
+
+ioc = qio_channel_new_fd(fd[0], );
+if (err) {
+error_report_err(err);
+return -1;
+}
+
+qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
+
+f_remote = qemu_fopen_channel_input(ioc);
+
+pdev->migsize = 0;
+
+mig_data = g_malloc0(sizeof(proxy_mig_data));
+mig_data->rem = f_remote;
+mig_data->dev = pdev;
+
+qemu_thread_create(, "Proxy MIG_OUT", proxy_mig_out, mig_data,
+   QEMU_THREAD_DETACHED);
+
+msg.cmd = START_MIG_OUT;
+msg.bytestream = 0;
+msg.num_fds = 2;
+msg.fds[0] = fd[1];
+msg.fds[1] = GET_REMOTE_WAIT;
+
+mpqemu_msg_send(pdev->mpqemu_link, , pdev->mpqemu_link->com);
+size = wait_for_remote(msg.fds[1]);
+PUT_REMOTE_WAIT(msg.fds[1]);
+
+assert(size != ULLONG_MAX);
+
+/*
+ * migsize is being update by a separate thread. Using volatile to
+ * instruct the compiler to fetch the value of this variable from
+ * memory during every read
+ */
+while (*((volatile uint64_t *)>migsize) < size) {
+}
+
+qemu_thread_cancel();


this is a safe way to stop the thread executing without
resulting in memory being leaked.

In addition thread cancellation is asynchronous, so the thread
may still be using the QEMUFile object while


+qemu_fclose(f_remote);


The above "wait_for_remote()" call waits for the remote process to
finish with Migration, and return the size of the VMSD.

It should be safe to cancel the thread and close the file, once the
remote process is done sending the VMSD and we have read "size" bytes
from it, is it not?

Thank you very much!
--
Jag



..this is closing it. This feels like it is a crash danger.



+close(fd[1]);
+
+return 0;
+}


Regards,
Daniel





[PATCH for 5.0 1/6] linux-user: Add support for enabling/disabling rtc features using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h   |  9 +
 linux-user/syscall.c  |  1 +
 linux-user/syscall_defs.h | 10 ++
 3 files changed, 20 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index c6b9d6a..97741c7 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -69,6 +69,15 @@
  IOCTL(KDSETLED, 0, TYPE_INT)
  IOCTL_SPECIAL(KDSIGACCEPT, 0, do_ioctl_kdsigaccept, TYPE_INT)
 
+ IOCTL(RTC_AIE_ON, 0, TYPE_NULL)
+ IOCTL(RTC_AIE_OFF, 0, TYPE_NULL)
+ IOCTL(RTC_UIE_ON, 0, TYPE_NULL)
+ IOCTL(RTC_UIE_OFF, 0, TYPE_NULL)
+ IOCTL(RTC_PIE_ON, 0, TYPE_NULL)
+ IOCTL(RTC_PIE_OFF, 0, TYPE_NULL)
+ IOCTL(RTC_WIE_ON, 0, TYPE_NULL)
+ IOCTL(RTC_WIE_OFF, 0, TYPE_NULL)
+
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
  IOCTL(BLKRRPART, 0, TYPE_NULL)
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index ce399a5..74c3c08 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -107,6 +107,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "linux_loop.h"
 #include "uname.h"
 
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 98c2119..f91579a 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -763,6 +763,16 @@ struct target_pollfd {
 #define TARGET_KDSETLED0x4B32  /* set led state [lights, not flags] */
 #define TARGET_KDSIGACCEPT 0x4B4E
 
+/* real time clock ioctls */
+#define TARGET_RTC_AIE_ON   TARGET_IO('p', 0x01)
+#define TARGET_RTC_AIE_OFF  TARGET_IO('p', 0x02)
+#define TARGET_RTC_UIE_ON   TARGET_IO('p', 0x03)
+#define TARGET_RTC_UIE_OFF  TARGET_IO('p', 0x04)
+#define TARGET_RTC_PIE_ON   TARGET_IO('p', 0x05)
+#define TARGET_RTC_PIE_OFF  TARGET_IO('p', 0x06)
+#define TARGET_RTC_WIE_ON   TARGET_IO('p', 0x0f)
+#define TARGET_RTC_WIE_OFF  TARGET_IO('p', 0x10)
+
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
 #define TARGET_FIOGETOWN   TARGET_IOR('f', 123, int)
-- 
2.7.4




Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-13 Thread Alex Williamson
On Wed, 13 Nov 2019 11:24:17 +0100
Cornelia Huck  wrote:

> On Tue, 12 Nov 2019 15:30:05 -0700
> Alex Williamson  wrote:
> 
> > On Tue, 12 Nov 2019 22:33:36 +0530
> > Kirti Wankhede  wrote:
> >   
> > > - Defined MIGRATION region type and sub-type.
> > > - Used 3 bits to define VFIO device states.
> > > Bit 0 => _RUNNING
> > > Bit 1 => _SAVING
> > > Bit 2 => _RESUMING
> > > Combination of these bits defines VFIO device's state during migration
> > > _RUNNING => Normal VFIO device running state. When its reset, it
> > >   indicates _STOPPED state. when device is changed to
> > >   _STOPPED, driver should stop device before write()
> > >   returns.
> > > _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
> > >   start saving state of device i.e. pre-copy state
> > > _SAVING  => vCPUs are stopped, VFIO device should be stopped, and
> > 
> > s/should/must/
> >   
> > > save device state,i.e. stop-n-copy state
> > > _RESUMING => VFIO device resuming state.
> > > _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states
> > 
> > A table might be useful here and in the uapi header to indicate valid
> > states:  
> 
> I like that.
> 
> > 
> > | _RESUMING | _SAVING | _RUNNING | Description
> > +---+-+--+--
> > | 0 |0| 0| Stopped, not saving or resuming (a)
> > +---+-+--+--
> > | 0 |0| 1| Running, default state
> > +---+-+--+--
> > | 0 |1| 0| Stopped, migration interface in save mode
> > +---+-+--+--
> > | 0 |1| 1| Running, save mode interface, iterative
> > +---+-+--+--
> > | 1 |0| 0| Stopped, migration resume interface 
> > active
> > +---+-+--+--
> > | 1 |0| 1| Invalid (b)
> > +---+-+--+--
> > | 1 |1| 0| Invalid (c)
> > +---+-+--+--
> > | 1 |1| 1| Invalid (d)
> > 
> > I think we need to consider whether we define (a) as generally
> > available, for instance we might want to use it for diagnostics or a
> > fatal error condition outside of migration.
> > 
> > Are there hidden assumptions between state transitions here or are
> > there specific next possible state diagrams that we need to include as
> > well?  
> 
> Some kind of state-change diagram might be useful in addition to the
> textual description anyway. Let me try, just to make sure I understand
> this correctly:
> 
> 1) 0/0/1 ---(trigger driver to start gathering state info)---> 0/1/1
> 2) 0/0/1 ---(tell driver to stop)---> 0/0/0
> 3) 0/1/1 ---(tell driver to stop)---> 0/1/0
> 4) 0/0/1 ---(tell driver to resume with provided info)---> 1/0/0

I think this is to switch into resuming mode, the data will follow

> 5) 1/0/0 ---(driver is ready)---> 0/0/1
> 6) 0/1/1 ---(tell driver to stop saving)---> 0/0/1

I think also:

0/0/1 --> 0/1/0 If user chooses to go directly to stop and copy

0/0/0 and 0/0/1 should be reachable from any state, though I could see
that a vendor driver could fail transition from 1/0/0 -> 0/0/1 if the
received state is incomplete.  Somehow though a user always needs to
return the device to the initial state, so how does device_state
interact with the reset ioctl?  Would this automatically manipulate
device_state back to 0/0/1?
 
> Not sure about the usefulness of 2). Also, is 4) the only way to
> trigger resuming? And is the change in 5) performed by the driver, or
> by userspace?
> 
> Are any other state transitions valid?
> 
> (...)
> 
> > > + * Sequence to be followed for _SAVING|_RUNNING device state or pre-copy 
> > > phase
> > > + * and for _SAVING device state or stop-and-copy phase:
> > > + * a. read pending_bytes. If pending_bytes > 0, go through below steps.
> > > + * b. read data_offset, indicates kernel driver to write data to staging 
> > > buffer.
> > > + *Kernel driver should return this read operation only after writing 
> > > data to
> > > + *staging buffer is done.
> > 
> > "staging buffer" implies a vendor driver implementation, perhaps we
> > could just state that data is available from (region + data_offset) to
> > (region + data_offset + data_size) upon return of this read operation.
> >   
> > > + * c. read data_size, amount of data in bytes written by vendor driver in
> > > + *migration region.
> > > + * d. read data_size bytes of data from data_offset in the migration 
> > > 

Re: [PATCH v2 1/5] MAINTAINERS: Add a section on UI translation

2019-11-13 Thread Philippe Mathieu-Daudé

Hi Aleksandar,

On 11/13/19 2:47 PM, Aleksandar Markovic wrote:

From: Aleksandar Markovic 

There should be a person who will quickly evaluate new UI
translation, and find a way to update existing ones should
something changes in UI.


I appreciate your trust, but I'm afraid I know next to nothing about 
po/. I don't use QEMU's GUI myself: mostly command line, and via libvirt 
from time to time.


These files are about language translations, maybe it is easier to let 
them unmaintained and have patches go via the trivial tree?




Signed-off-by: Aleksandar Markovic 
---
  MAINTAINERS | 5 +
  1 file changed, 5 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index 363e72a..fd9ba32 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2714,6 +2714,11 @@ M: Daniel P. Berrange 
  S: Odd Fixes
  F: scripts/git-submodule.sh
  
+UI translations

+M: Aleksandar Markovic 
+R: Philippe Mathieu-Daudé 
+F: po/*.po
+
  Sphinx documentation configuration and build machinery
  M: Peter Maydell 
  S: Maintained






Re: [EXTERNAL]Re: [PATCH v1 5/5] .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

2019-11-13 Thread Alex Bennée


Aleksandar Markovic  writes:

>> From: Philippe Mathieu-Daudé 
>> > -- 
>> > MAIN_SOFTMMU_TARGETS="aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
>> > +- 
>> > MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
>>
>> Aleksandar, since you mostly test 32-bit MIPS, are you OK we keep
>> mips-softmmu and drop mips64-softmmu here? Another job (acceptance-test)
>> builds the mips64el-softmmu.
>
> Philippe, thanks for bringing this to my attention. Yes, 32-bit mips
> targets are important to us, but, what can we do, time constraints are
> time constraints, so I agree with Alex change, please go ahead, Alex.
> We can test 32-bit mips targets via other acceptance tests (those that
> can run longer, so-called "slow" group), and perhaps we can extend
> them to test more 32-bit mips systems.

To be clear both gcc and clang have rules that test:

- CONFIG="--disable-user --target-list-exclude=${MAIN_SOFTMMU_TARGETS}"

So the main targets which are reducing their coverage are:

- CONFIG="--enable-debug --target-list=${MAIN_SOFTMMU_TARGETS}"

- CONFIG="--enable-modules --target-list=${MAIN_SOFTMMU_TARGETS}"

- CONFIG="--target-list=${MAIN_SOFTMMU_TARGETS} "
- CACHE_NAME="${TRAVIS_BRANCH}-linux-clang-sanitize"
  compiler: clang
  before_script:
- ./configure ${CONFIG} --extra-cflags="-fsanitize=undefined -Werror" 
|| { cat config.log && exit 1; }

- CONFIG="--enable-gprof --enable-gcov --disable-pie 
--target-list=${MAIN_SOFTMMU_TARGETS}"

and the MacOSX 9.4 build:
# MacOSX builds
- env:
- CONFIG="--target-list=${MAIN_SOFTMMU_TARGETS}"
  os: osx
  osx_image: xcode9.4
  compiler: clang

The Xcode 10.3 build is already a reduced list:
- 
CONFIG="--target-list=i386-softmmu,ppc-softmmu,ppc64-softmmu,m68k-softmmu,x86_64-softmmu"


>
> Thanks to everybody,
> Aleksandar


--
Alex Bennée



Re: [PATCH V4] target/i386/kvm: Add Hyper-V direct tlb flush support

2019-11-13 Thread Vitaly Kuznetsov
Roman Kagan  writes:

> On Tue, Nov 12, 2019 at 11:34:27AM +0800, lantianyu1...@gmail.com wrote:
>> From: Tianyu Lan 
>> 
>> Hyper-V direct tlb flush targets KVM on Hyper-V guest.
>> Enable direct TLB flush for its guests meaning that TLB
>> flush hypercalls are handled by Level 0 hypervisor (Hyper-V)
>> bypassing KVM in Level 1. Due to the different ABI for hypercall
>> parameters between Hyper-V and KVM, KVM capabilities should be
>> hidden when enable Hyper-V direct tlb flush otherwise KVM
>> hypercalls may be intercepted by Hyper-V. Add new parameter
>> "hv-direct-tlbflush". Check expose_kvm and Hyper-V tlb flush
>> capability status before enabling the feature.
>> 
>> Signed-off-by: Tianyu Lan 
>> ---
>> Change since v3:
>>- Fix logic of Hyper-V passthrough mode with direct
>>tlb flush.
>> 
>> Change sicne v2:
>>- Update new feature description and name.
>>- Change failure print log.
>> 
>> Change since v1:
>>- Add direct tlb flush's Hyper-V property and use
>>hv_cpuid_check_and_set() to check the dependency of tlbflush
>>feature.
>>- Make new feature work with Hyper-V passthrough mode.
>> ---
>>  docs/hyperv.txt   | 10 ++
>>  target/i386/cpu.c |  2 ++
>>  target/i386/cpu.h |  1 +
>>  target/i386/kvm.c | 24 
>>  4 files changed, 37 insertions(+)
>> 
>> diff --git a/docs/hyperv.txt b/docs/hyperv.txt
>> index 8fdf25c829..140a5c7e44 100644
>> --- a/docs/hyperv.txt
>> +++ b/docs/hyperv.txt
>> @@ -184,6 +184,16 @@ enabled.
>>  
>>  Requires: hv-vpindex, hv-synic, hv-time, hv-stimer
>>  
>> +3.18. hv-direct-tlbflush
>> +===
>> +Enable direct TLB flush for KVM when it is running as a nested
>> +hypervisor on top Hyper-V. When enabled, TLB flush hypercalls from L2
>> +guests are being passed through to L0 (Hyper-V) for handling. Due to ABI
>> +differences between Hyper-V and KVM hypercalls, L2 guests will not be
>> +able to issue KVM hypercalls (as those could be mishanled by L0
>> +Hyper-V), this requires KVM hypervisor signature to be hidden.
>
> On a second thought, I wonder if this is the only conflict we have.
>
> In KVM, kvm_emulate_hypercall, when sees Hyper-V hypercalls enabled,
> just calls kvm_hv_hypercall and returns.  I.e. once the userspace
> enables Hyper-V hypercalls (which QEMU does when any of hv_* flags is
> given), KVM treats *all* hypercalls as Hyper-V ones and handles *no* KVM
> hypercalls.

Yes, but only after guest enables Hyper-V hypercalls by writing to
HV_X64_MSR_HYPERCALL. E.g. if you run a Linux guest and add a couple
hv_* flags on the QEMU command line the guest will still be able to use
KVM hypercalls normally becase Linux won't enable Hyper-V hypercall
page.

>
> So, if hiding the KVM hypervisor signature is the only way to prevent the
> guest from issuing KVM hypercalls (need to double-check), then, I'm
> afraid, we just need to require it as soon as any Hyper-V feature is
> enabled.
>

If we do that we're going to break a lot of setups in the wild which run
Linux guests with hv_* flags (e.g. just to keep configuration the same
for Windows/Linux or by mistake/misunderstanding).

When Hyper-V enlightenments are enabled, KVM signature moves to
0x4100 so if a guest is still able to find it -- then it knows
what's going on. I'd suggest we maintain the status quo.

-- 
Vitaly




[PATCH] enable translating statx syscalls on more arches

2019-11-13 Thread Aleksandar Markovic
On Tuesday, November 12, 2019, Andrew Kelley  wrote:

> ping
>
>

Hello, Andrew.

I just want to advise you to send, if possible, a new version of the patch,
addressing whatt said in his review. It is better if you send it sooner
rather than later, given the stage of our dev cycle.

For MIPS parts:

Reviewed-by: Aleksandar Markovic 



> On 10/16/19 5:01 PM, Andrew Kelley wrote:
> > Signed-off-by: Andrew Kelley 
> > ---
> >  linux-user/aarch64/syscall_nr.h | 13 ++
> >  linux-user/arm/syscall_nr.h | 38 
> >  linux-user/i386/syscall_nr.h| 43 
> >  linux-user/mips/cpu_loop.c  |  6 +
> >  linux-user/ppc/syscall_nr.h | 44 +
> >  5 files changed, 144 insertions(+)
> >
> > diff --git a/linux-user/aarch64/syscall_nr.h
> > b/linux-user/aarch64/syscall_nr.h
> > index f00ffd7fb8..4e8d0bbb15 100644
> > --- a/linux-user/aarch64/syscall_nr.h
> > +++ b/linux-user/aarch64/syscall_nr.h
> > @@ -276,5 +276,18 @@
> >  #define TARGET_NR_membarrier 283
> >  #define TARGET_NR_mlock2 284
> >  #define TARGET_NR_copy_file_range 285
> > +#define TARGET_NR_preadv2 286
> > +#define TARGET_NR_pwritev2 287
> > +#define TARGET_NR_pkey_mprotect 288
> > +#define TARGET_NR_pkey_alloc 289
> > +#define TARGET_NR_pkey_free 290
> > +#define TARGET_NR_statx 291
> > +#define TARGET_NR_io_pgetevents 292
> > +#define TARGET_NR_rseq 293
> > +#define TARGET_NR_kexec_file_load 294
> > +#define TARGET_NR_pidfd_send_signal 424
> > +#define TARGET_NR_io_uring_setup 425
> > +#define TARGET_NR_io_uring_enter 426
> > +#define TARGET_NR_io_uring_register 427
> >
> >  #endif
> > diff --git a/linux-user/arm/syscall_nr.h b/linux-user/arm/syscall_nr.h
> > index e7eda0d766..20afa3992a 100644
> > --- a/linux-user/arm/syscall_nr.h
> > +++ b/linux-user/arm/syscall_nr.h
> > @@ -400,4 +400,42 @@
> >  #define TARGET_NR_membarrier   (389)
> >  #define TARGET_NR_mlock2   (390)
> >
> > +#define TARGET_NR_copy_file_range  (391)
> > +#define TARGET_NR_preadv2  (392)
> > +#define TARGET_NR_pwritev2 (393)
> > +#define TARGET_NR_pkey_mprotect(394)
> > +#define TARGET_NR_pkey_alloc   (395)
> > +#define TARGET_NR_pkey_free(396)
> > +#define TARGET_NR_statx(397)
> > +#define TARGET_NR_rseq (398)
> > +#define TARGET_NR_io_pgetevents(399)
> > +#define TARGET_NR_migrate_pages(400)
> > +
> > +#define TARGET_NR_kexec_file_load  (401)
> > +#define TARGET_NR_clock_gettime64  (403)
> > +#define TARGET_NR_clock_settime64  (404)
> > +#define TARGET_NR_clock_adjtime64  (405)
> > +#define TARGET_NR_clock_getres_time64  (406)
> > +#define TARGET_NR_clock_nanosleep_time64   (407)
> > +#define TARGET_NR_timer_gettime64  (408)
> > +#define TARGET_NR_timer_settime64  (409)
> > +#define TARGET_NR_timerfd_gettime64(410)
> > +
> > +#define TARGET_NR_timerfd_settime64(411)
> > +#define TARGET_NR_utimensat_time64 (412)
> > +#define TARGET_NR_pselect6_time64  (413)
> > +#define TARGET_NR_ppoll_time64 (414)
> > +#define TARGET_NR_io_pgetevents_time64 (416)
> > +#define TARGET_NR_recvmmsg_time64  (417)
> > +#define TARGET_NR_mq_timedsend_time64  (418)
> > +#define TARGET_NR_mq_timedreceive_time64   (419)
> > +#define TARGET_NR_semtimedop_time64(420)
> > +
> > +#define TARGET_NR_rt_sigtimedwait_time64   (421)
> > +#define TARGET_NR_futex_time64 (422)
> > +#define TARGET_NR_sched_rr_get_interval_time64 (423)
> > +#define TARGET_NR_pidfd_send_signal(424)
> > +#define TARGET_NR_io_uring_setup   (425)
> > +#define TARGET_NR_io_uring_enter   (426)
> > +#define TARGET_NR_io_uring_register(427)
> >  #endif
> > diff --git a/linux-user/i386/syscall_nr.h b/linux-user/i386/syscall_nr.h
> > index 3234ec21c6..e641674daf 100644
> > --- a/linux-user/i386/syscall_nr.h
> > +++ b/linux-user/i386/syscall_nr.h
> > @@ -383,5 +383,48 @@
> >  #define TARGET_NR_membarrier375
> >  #define TARGET_NR_mlock2376
> >  #define TARGET_NR_copy_file_range   377
> > +#define TARGET_NR_preadv2 378
> > +#define TARGET_NR_pwritev2 379
> > +#define TARGET_NR_pkey_mprotect 380
> > +#define TARGET_NR_pkey_alloc 381
> > +#define TARGET_NR_pkey_free 382
> > +#define TARGET_NR_statx 383
> > +#define TARGET_NR_arch_prctl 384
> > +#define TARGET_NR_io_pgetevents 385
> > +#define TARGET_NR_rseq 386
> > +#define TARGET_NR_semget 393
> > +#define TARGET_NR_semctl 394
> > +#define TARGET_NR_shmget 395
> > +#define TARGET_NR_shmctl 396
> > +#define TARGET_NR_shmat 397
> > +#define TARGET_NR_shmdt 398
> > +#define TARGET_NR_msgget 399

Re: [PATCH v1 5/5] .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

2019-11-13 Thread Philippe Mathieu-Daudé

On 11/13/19 12:59 PM, Alex Bennée wrote:

The older clangs are still struggling to build and run everything
withing the 50 minute timeout so lets lighten the load a bit more. We
still have coverage for GCC and hopefully no obscure 32 bit guest only
breakages slip through the cracks.

Signed-off-by: Alex Bennée 
---
  .travis.yml | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/.travis.yml b/.travis.yml
index b9a026c8eeb..c09b6a00143 100644
--- a/.travis.yml
+++ b/.travis.yml
@@ -79,7 +79,7 @@ env:
  - BASE_CONFIG="--disable-docs --disable-tools"
  - TEST_CMD="make check V=1"
  # This is broadly a list of "mainline" softmmu targets which have support 
across the major distros
-- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
+- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"


Aleksandar, since you mostly test 32-bit MIPS, are you OK we keep 
mips-softmmu and drop mips64-softmmu here? Another job (acceptance-test) 
builds the mips64el-softmmu.



  - CCACHE_SLOPPINESS="include_file_ctime,include_file_mtime"
  - CCACHE_MAXSIZE=1G
  






Call for agenda for 2019-11-19 KVM call

2019-11-13 Thread Juan Quintela



Hi

Please, send any topic that you are interested in covering.

At the end of Monday I will send an email with the agenda or the
cancellation of the call, so hurry up.

After discussions on the QEMU Summit, we are going to have always open a
KVM call where you can add topics.

 Call details:

By popular demand, a google calendar public entry with it

  
https://www.google.com/calendar/embed?src=dG9iMXRqcXAzN3Y4ZXZwNzRoMHE4a3BqcXNAZ3JvdXAuY2FsZW5kYXIuZ29vZ2xlLmNvbQ

(Let me know if you have any problems with the calendar entry.  I just
gave up about getting right at the same time CEST, CET, EDT and DST).

If you need phone number details,  contact me privately

Thanks, Juan. 




Re: [PATCH] WHPX: refactor load library

2019-11-13 Thread Philippe Mathieu-Daudé

On 11/12/19 8:47 PM, Eduardo Habkost wrote:

On Tue, Nov 12, 2019 at 06:42:00PM +, Sunil Muthuswamy wrote:




-Original Message-
From: Sunil Muthuswamy
Sent: Friday, November 8, 2019 12:32 PM
To: 'Paolo Bonzini' ; 'Richard Henderson' ; 
'Eduardo Habkost' ; 'Stefan
Weil' 
Cc: 'qemu-devel@nongnu.org' ; Justin Terry (VM) 

Subject: [PATCH] WHPX: refactor load library

This refactors the load library of WHV libraries to make it more
modular. It makes a helper routine that can be called on demand.
This allows future expansion of load library/functions to support
functionality that is depenedent on some feature being available.

Signed-off-by: Sunil Muthuswamy 
---


Can I possibly get some eyes on this?


I'd be glad to queue the patch if we get a Reviewed-by line from
somebody who understands Windows and WHPX.  Maybe Justin?


Can we wait for approval from the Microsoft legal department first?
So we can start testing WHPX builds, and reduce the possibilities to 
introduce regressions.


Testing is ready, we are waiting for Microsoft to merge, see:
https://www.mail-archive.com/qemu-devel@nongnu.org/msg646351.html



Sunil, Justin, would you like to be listed as maintainers or
designated reviewers for the WHPX code in QEMU?


Great idea!




Re: [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device

2019-11-13 Thread Jag Raman




On 11/13/2019 11:33 AM, Stefan Hajnoczi wrote:

On Thu, Oct 24, 2019 at 05:08:52AM -0400, Jagannathan Raman wrote:

+static void remote_ram_destructor(MemoryRegion *mr)
+{
+qemu_ram_free(mr->ram_block);
+}
+
+static void remote_ram_init_from_fd(MemoryRegion *mr, int fd, uint64_t size,
+ram_addr_t offset, Error **errp)
+{
+char *name = g_strdup_printf("%d", fd);
+
+memory_region_init(mr, NULL, name, size);
+mr->ram = true;
+mr->terminates = true;
+mr->destructor = NULL;
+mr->align = 0;
+mr->ram_block = qemu_ram_alloc_from_fd(size, mr, RAM_SHARED, fd, offset,
+   errp);
+mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
+
+g_free(name);
+}


This is not specific to remote/memory.c and could be shared in case
something else in QEMU wants to initialize from an fd.


+
+void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
+{
+sync_sysmem_msg_t *sysmem_info = >data1.sync_sysmem;


A possible security issue with MPQemuMsg: was the message size
validatedb before we access msg->data1.sync_sysmem?

If not, then we might access uninitialized data.  I didn't see if there
is a single place in the code that always zeroes msg, but I think the
answer is no.  Accessing uninitialized data could expose the old
contents of the stack/heap to the other process.  Information leaks like
this can be used to defeat address-space randomization because the other
process may learn about our memory layout if there are memory addresses
in the uninitialized data.


Thanks for the feedback. Will do.

--
Jag







[PATCH for 5.0 0/6] linux-user: Add support for real time clock ioctls

2019-11-13 Thread Filip Bozuta
Add ioctls for all rtc features that are currently supported in linux kernell.

Filip Bozuta (6):
  linux-user: Add support for enabling/disabling rtc features using
ioctls
  linux-user: Add set and read for rtc time and alarm using ioctls
  linux-user: Add read and set for rtc periodic interrupt and epoch
using ioctls
  linux-user: Add get and set for rtc wakeup alarm using ioctls
  linux-user: Add get and set for rtc pll correction using ioctls
  linux-user: Add rtc voltage low detector read and clear using ioctls

 linux-user/ioctls.h| 23 +++
 linux-user/syscall.c   |  1 +
 linux-user/syscall_defs.h  | 36 
 linux-user/syscall_types.h | 25 +
 4 files changed, 85 insertions(+)

-- 
2.7.4




[PATCH for 5.0 5/6] linux-user: Add get and set for rtc pll correction using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h|  2 ++
 linux-user/syscall_defs.h  | 14 ++
 linux-user/syscall_types.h |  9 +
 3 files changed, 25 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 5830315..eea65e1 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -87,6 +87,8 @@
  IOCTL(RTC_EPOCH_SET, IOC_W, MK_PTR(TYPE_ULONG))
  IOCTL(RTC_WKALM_SET, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_wkalrm)))
  IOCTL(RTC_WKALM_RD, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_wkalrm)))
+ IOCTL(RTC_PLL_GET, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_pll_info)))
+ IOCTL(RTC_PLL_SET, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_pll_info)))
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 3a0eb6b..367e9bd 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -763,6 +763,16 @@ struct target_pollfd {
 #define TARGET_KDSETLED0x4B32  /* set led state [lights, not flags] */
 #define TARGET_KDSIGACCEPT 0x4B4E
 
+struct target_rtc_pll_info {
+int pll_ctrl;
+int pll_value;
+int pll_max;
+int pll_min;
+int pll_posmult;
+int pll_negmult;
+abi_long pll_clock;
+};
+
 /* real time clock ioctls */
 #define TARGET_RTC_AIE_ON   TARGET_IO('p', 0x01)
 #define TARGET_RTC_AIE_OFF  TARGET_IO('p', 0x02)
@@ -782,6 +792,10 @@ struct target_pollfd {
 #define TARGET_RTC_EPOCH_SETTARGET_IOW('p', 0x0e, abi_ulong)
 #define TARGET_RTC_WKALM_SETTARGET_IOW('p', 0x0f, struct rtc_wkalrm)
 #define TARGET_RTC_WKALM_RD TARGET_IOR('p', 0x10, struct rtc_wkalrm)
+#define TARGET_RTC_PLL_GET  TARGET_IOR('p', 0x11,  
\
+   struct target_rtc_pll_info)
+#define TARGET_RTC_PLL_SET  TARGET_IOW('p', 0x12,  
\
+   struct target_rtc_pll_info)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h
index 820bc8e..baf43ee 100644
--- a/linux-user/syscall_types.h
+++ b/linux-user/syscall_types.h
@@ -271,6 +271,15 @@ STRUCT(rtc_wkalrm,
TYPE_CHAR, /* pending */
MK_STRUCT(STRUCT_rtc_time)) /* time */
 
+STRUCT(rtc_pll_info,
+   TYPE_INT, /* pll_ctrl */
+   TYPE_INT, /* pll_value */
+   TYPE_INT, /* pll_max */
+   TYPE_INT, /* pll_min */
+   TYPE_INT, /* pll_posmult */
+   TYPE_INT, /* pll_negmult */
+   TYPE_ULONG) /* pll_clock */
+
 STRUCT(blkpg_ioctl_arg,
TYPE_INT, /* op */
TYPE_INT, /* flags */
-- 
2.7.4




Re: [PATCH] WHPX: refactor load library

2019-11-13 Thread Eduardo Habkost
On Wed, Nov 13, 2019 at 05:35:59PM +0100, Philippe Mathieu-Daudé wrote:
> On 11/12/19 8:47 PM, Eduardo Habkost wrote:
> > On Tue, Nov 12, 2019 at 06:42:00PM +, Sunil Muthuswamy wrote:
> > > 
> > > 
> > > > -Original Message-
> > > > From: Sunil Muthuswamy
> > > > Sent: Friday, November 8, 2019 12:32 PM
> > > > To: 'Paolo Bonzini' ; 'Richard Henderson' 
> > > > ; 'Eduardo Habkost' ; 'Stefan
> > > > Weil' 
> > > > Cc: 'qemu-devel@nongnu.org' ; Justin Terry (VM) 
> > > > 
> > > > Subject: [PATCH] WHPX: refactor load library
> > > > 
> > > > This refactors the load library of WHV libraries to make it more
> > > > modular. It makes a helper routine that can be called on demand.
> > > > This allows future expansion of load library/functions to support
> > > > functionality that is depenedent on some feature being available.
> > > > 
> > > > Signed-off-by: Sunil Muthuswamy 
> > > > ---
> > > 
> > > Can I possibly get some eyes on this?
> > 
> > I'd be glad to queue the patch if we get a Reviewed-by line from
> > somebody who understands Windows and WHPX.  Maybe Justin?
> 
> Can we wait for approval from the Microsoft legal department first?
> So we can start testing WHPX builds, and reduce the possibilities to
> introduce regressions.
> 
> Testing is ready, we are waiting for Microsoft to merge, see:
> https://www.mail-archive.com/qemu-devel@nongnu.org/msg646351.html

Making it easier for other people to test WHPX would be nice.
But in case this is not sorted out soon, I don't see a reason to
not merge WHPX changes if they are reviewed and approved by the
main author of that code (Justin).

-- 
Eduardo




Re: [EXTERNAL]Re: [PATCH v1 5/5] .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

2019-11-13 Thread Aleksandar Markovic
> From: Philippe Mathieu-Daudé 
> > -- 
> > MAIN_SOFTMMU_TARGETS="aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
> > +- 
> > MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
> 
> Aleksandar, since you mostly test 32-bit MIPS, are you OK we keep
> mips-softmmu and drop mips64-softmmu here? Another job (acceptance-test)
> builds the mips64el-softmmu.

Philippe, thanks for bringing this to my attention. Yes, 32-bit mips targets 
are important to us, but, what can we do, time constraints are time 
constraints, so I agree with Alex change, please go ahead, Alex. We can test 
32-bit mips targets via other acceptance tests (those that can run longer, 
so-called "slow" group), and perhaps we can extend them to test more 32-bit 
mips systems.

Thanks to everybody,
Aleksandar


Re: [RFC v4 PATCH 45/49] multi-process/mig: Synchronize runstate of remote process

2019-11-13 Thread Jag Raman




On 11/11/2019 11:17 AM, Stefan Hajnoczi wrote:

On Thu, Oct 24, 2019 at 05:09:26AM -0400, Jagannathan Raman wrote:

@@ -656,6 +657,19 @@ static void init_proxy(PCIDevice *dev, char *command, bool 
need_spawn, Error **e
  }
  }
  
+static void proxy_vm_state_change(void *opaque, int running, RunState state)

+{
+PCIProxyDev *dev = opaque;
+MPQemuMsg msg = { 0 };
+
+msg.cmd = RUNSTATE_SET;
+msg.bytestream = 0;
+msg.size = sizeof(msg.data1);
+msg.data1.runstate.state = state;
+
+mpqemu_msg_send(dev->mpqemu_link, , dev->mpqemu_link->com);
+}


Changing vm state is a barrier operation - devices must not dirty memory
afterwards.  This function doesn't have barrier semantics, it sends off
the message without waiting for the remote process to finish processing
it.  This means there is a race condition where QEMU has changes the vm
state but devices could still dirty memory.  Please wait for a reply to
prevent this.


Got it, thanks! Will do.

--
Jag



Stefan





[PATCH for 5.0 2/6] linux-user: Add set and read for rtc time and alarm using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h|  4 
 linux-user/syscall_defs.h  |  4 
 linux-user/syscall_types.h | 11 +++
 3 files changed, 19 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 97741c7..ff1cdd2 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -77,6 +77,10 @@
  IOCTL(RTC_PIE_OFF, 0, TYPE_NULL)
  IOCTL(RTC_WIE_ON, 0, TYPE_NULL)
  IOCTL(RTC_WIE_OFF, 0, TYPE_NULL)
+ IOCTL(RTC_ALM_SET, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
+ IOCTL(RTC_ALM_READ, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
+ IOCTL(RTC_RD_TIME, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
+ IOCTL(RTC_SET_TIME, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index f91579a..2298547 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -772,6 +772,10 @@ struct target_pollfd {
 #define TARGET_RTC_PIE_OFF  TARGET_IO('p', 0x06)
 #define TARGET_RTC_WIE_ON   TARGET_IO('p', 0x0f)
 #define TARGET_RTC_WIE_OFF  TARGET_IO('p', 0x10)
+#define TARGET_RTC_ALM_SET  TARGET_IOW('p', 0x07, struct rtc_time)
+#define TARGET_RTC_ALM_READ TARGET_IOR('p', 0x08, struct rtc_time)
+#define TARGET_RTC_RD_TIME  TARGET_IOR('p', 0x09, struct rtc_time)
+#define TARGET_RTC_SET_TIME TARGET_IOW('p', 0x0a, struct rtc_time)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h
index 4e36983..a35072a 100644
--- a/linux-user/syscall_types.h
+++ b/linux-user/syscall_types.h
@@ -255,6 +255,17 @@ STRUCT(blkpg_partition,
MK_ARRAY(TYPE_CHAR, BLKPG_DEVNAMELTH), /* devname */
MK_ARRAY(TYPE_CHAR, BLKPG_VOLNAMELTH)) /* volname */
 
+STRUCT(rtc_time,
+   TYPE_INT, /* tm_sec */
+   TYPE_INT, /* tm_min */
+   TYPE_INT, /* tm_hour */
+   TYPE_INT, /* tm_mday */
+   TYPE_INT, /* tm_mon */
+   TYPE_INT, /* tm_year */
+   TYPE_INT, /* tm_wday */
+   TYPE_INT, /* tm_yday */
+   TYPE_INT) /* tm_isdst */
+
 STRUCT(blkpg_ioctl_arg,
TYPE_INT, /* op */
TYPE_INT, /* flags */
-- 
2.7.4




Re: [PATCH v2] monitor/qmp: resume monitor when clearing its queue

2019-11-13 Thread Markus Armbruster
Wolfgang Bumiller  writes:

> When a monitor's queue is filled up in handle_qmp_command()
> it gets suspended. It's the dispatcher bh's job currently to
> resume the monitor, which it does after processing an event
> from the queue. However, it is possible for a
> CHR_EVENT_CLOSED event to be processed before before the bh
> is scheduled, which will clear the queue without resuming
> the monitor, thereby preventing the dispatcher from reaching
> the resume() call.
> Any new connections to the qmp socket will be accept()ed and
> show the greeting, but will not respond to any messages sent
> afterwards (as they will not be read from the
> still-suspended socket).
> Fix this by resuming the monitor when clearing a queue which
> was filled up.
>
> Signed-off-by: Wolfgang Bumiller 
> ---
> Changes to v1:
>   * Update commit message to include the resulting symptoms.
>   * Moved the resume code from `monitor_qmp_cleanup_req_queue_locked` to
> `monitor_qmp_cleanup_queues` to avoid an unnecessary resume when
> destroying the monitor (as the `_locked` version is also used by
> `monitor_data_destroy()`.
>   * Renamed `monitor_qmp_cleanup_queues` to
> `monitor_qmp_cleanup_queues_and_resume` to reflect the change and be
> verbose about it for potential future users of the function.
> Currently the only user is `monitor_qmp_event()` in the
> `CHR_EVENT_CLOSED` case, which is exactly the problematic case currently.
>
> Sorry for the deleay :|

Same to you (my sorry excuse is KVM Forum).  Now we need to hurry to get
this fix into 4.2.  Let's try.

>  monitor/qmp.c | 24 ++--
>  1 file changed, 22 insertions(+), 2 deletions(-)
>
> diff --git a/monitor/qmp.c b/monitor/qmp.c
> index 9d9e5d8b27..df689aa95e 100644
> --- a/monitor/qmp.c
> +++ b/monitor/qmp.c
> @@ -75,10 +75,30 @@ static void 
> monitor_qmp_cleanup_req_queue_locked(MonitorQMP *mon)
>  }
>  }
>  
> -static void monitor_qmp_cleanup_queues(MonitorQMP *mon)
> +static void monitor_qmp_cleanup_queues_and_resume(MonitorQMP *mon)

Let's rename to _cleanup_queue_and resume().  The plural is a remnant
from when we also had a response queue.  Gone since commit 27656018d86.

>  {
>  qemu_mutex_lock(>qmp_queue_lock);
> +
> +/*
> + * Same condition as in monitor_qmp_bh_dispatcher(), but before removing 
> an
> + * element from the queue (hence no `- 1`), also, the queue should not be
> + * empty either, otherwise the monitor hasn't been suspended yet (or was
> + * already resumed).
> + */

Comment lines should be wrapped around colum 70.

> +bool need_resume = (!qmp_oob_enabled(mon) && mon->qmp_requests->length > 
> 0)
> +|| mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;

Now let me digest the comment.  Here's condition in
monitor_qmp_bh_dispatcher():

need_resume = !qmp_oob_enabled(mon) ||
mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX - 1;

"Same but before removing" is

   bool need_resume = !qmp_oob_enabled(mon)
   || mon->qmp_requests->length == QMP_REQ_QUEUE_LEN_MAX;

That leaves the "also" part.  It's been too long since v1, I don't
remember a thing, and I'm too dense today to understand without more
help.  Can you help me some more?

> +
>  monitor_qmp_cleanup_req_queue_locked(mon);
> +
> +if (need_resume) {
> +/*
> + * Pairs with the monitor_suspend() in handle_qmp_command() in case 
> the
> + * queue gets cleared from a CH_EVENT_CLOSED event before the 
> dispatch
> + * bh got scheduled.
> + */

CHR_EVENT_CLOSED

Suggest:

   /*
* handle_qmp_command() suspened the monitor because the
* request queue filled up, to be resumed when the queue has
* space again.  We just emptied it; resume the monitor.
*/

If we want to record the issue that made us fix the missing resume, we
can add:

* Without this, the monitor would remain suspended forever
* when we get here while the monitor is suspended.  An
* unfortunately times CHR_EVENT_CLOSED can do the trick.

Also update handle_qmp_command()'s comment:

/*
 * Suspend the monitor when we can't queue more requests after
 * this one.  Dequeuing in monitor_qmp_bh_dispatcher() or
 * monitor_qmp_cleanup_queue_and_resume() will resume it.
 * Note that when OOB is disabled, we queue at most one command,
 * for backward compatibility.
 */

> +monitor_resume(>common);
> +}
> +
>  qemu_mutex_unlock(>qmp_queue_lock);
>  }
>  
> @@ -332,7 +352,7 @@ static void monitor_qmp_event(void *opaque, int event)
>   * stdio, it's possible that stdout is still open when stdin
>   * is closed.
>   */
> -monitor_qmp_cleanup_queues(mon);
> +monitor_qmp_cleanup_queues_and_resume(mon);
>  json_message_parser_destroy(>parser);
>  json_message_parser_init(>parser, handle_qmp_command,
> 

Re: [PATCH v1 2/5] docs/devel: rename plugins.rst to tcg-plugins.rst

2019-11-13 Thread Paolo Bonzini
> What about (other patch):
> 
>   plugins/ -> tcg/plugins/
> 
>>   F: tests/plugin
> 
> tests/plugin/ -> tests/tcg/plugins/

tests/tcg is for code that runs on the target in general, but perhaps
tests/tcg-plugins would make sense.

Paolo




Re: [SeaBIOS] Re: [PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Sam Eiderman
Links to latest commits from archive.
You can see all changes in the cover letter.

[SeaBIOS] [PATCH v4 0/5] Add Qemu to SeaBIOS LCHS interface
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/VLNFBEERTWLEUO6LM5BYLBNVIFCTP46M/
[SeaBIOS] [PATCH v4 1/5] geometry: Read LCHS from fw_cfg
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/B3IPD3HH4UPDYJWFE4KX3HXUCNW5GPEW/
[SeaBIOS] [PATCH v4 2/5] boot: Reorder functions in boot.c
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/YDVU3WIGOSKZ2RQSMR5UVQNZ66K4IG65/
[SeaBIOS] [PATCH v4 3/5] boot: Build ata and scsi paths in function
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/RY33DRZZ3UK3UMQ3Q6BY2KUCHRRW4MRK/
[SeaBIOS] [PATCH v4 4/5] geometry: Add boot_lchs_find_*() utility functions
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/DAJOULWFK24DX4DY3VS6WWOOQNWW3GSG/
[SeaBIOS] [PATCH v4 5/5] geometry: Apply LCHS values for boot devices
https://mail.coreboot.org/hyperkitty/list/seab...@seabios.org/message/UUCTPPJ4PTS5CUTCFLOH3YOEXGC6HQ4T/

Sam

On Wed, Nov 13, 2019 at 6:35 PM Sam Eiderman  wrote:
>
> Sure,
>
> There are two issues here.
>
> The first issue is that my commits which applied to seabios master:
>
> * 9caa19b - geometry: Apply LCHS values for boot devices
> * cb56f61 - config: Add toggle for bootdevice information
> * ad29109 - geometry: Add boot_lchs_find_*() utility functions
> * b3d2120 - boot: Reorder functions in boot.c
> * 7c66a43 - geometry: Read LCHS from fw_cfg
>
> Are not from the latest version which was submitted to the mailing list (v4)
> * fw_cfg key name has changed
> * The value and of the key has changed from binary (v1) to textual (v4)
> * Other fixes and variable name changes.
>
> So these commits need to be reverted and reapplied with the latest version 
> (v4)
>
> The second issue is that my commits, (in v4 too) will require this fix
> that Gerd added ([PATCH] ahci: zero-initialize port struct) since they
> change how SeaBIOS uses lchs.
>
> Previously, before any of my commits, drive.lchs could contain "random
> crap" since it was always set before being used in
> setup_translation().
>
> After my patches, get_translation() invokes overriden_lchs_supplied()
> which checks: "return drive->lchs.cylinder || drive->lchs.head ||
> drive->lchs.sector;"
> So there is an assumption that "drive->lchs" is zeroed when lchs is
> not supplied for the host.
>
> This was true for all devices using "drive->lchs" (all were memset to
> 0) except ahci.
> (I used 'git grep "drive_s * drive"' to find them all).
>
> So Gerd fix is indeed needed and then all devices are covered
> (drive->lchs is memset to 0).
>
> Now only the first issue remains...
>
> Sam
>
> On Wed, Nov 13, 2019 at 6:12 PM Philippe Mathieu-Daudé
>  wrote:
> >
> > Hi Sam,
> >
> > On 11/13/19 4:03 PM, Sam Eiderman wrote:
> > > Hi,
> > >
> > > Does this fix a bug that actually happened?
> > >
> > > I just noticed that in my lchs patches I assumed that lchs struct is
> > > zeroed out in all devices (not only ahci):
> > >
> > > 9caa19be0e53 (geometry: Apply LCHS values for boot devices)
> > >
> > > Seems like this is not the case but why only ahci is affected?
> > >
> > > The list of devices is at least:
> > >
> > >  * ata
> > >  * ahci
> > >  * scsi
> > >  * esp
> > >  * lsi
> > >  * megasas
> > >  * mpt
> > >  * pvscsi
> > >  * virtio
> > >  * virtio-blk
> > >
> > > As specified in the commit message.
> > >
> > > Also Gerd it seems that my lchs patches were not committed in the
> > > latest submitted version (v4)!!!
> > > The ABI of the fw config key is completely broken.
> >
> > What do you mean? Can you be more specific?
> >



[PATCH v1] s390x: kvm-unit-tests: a PONG device for Sub Channels tests

2019-11-13 Thread Pierre Morel
The PONG device accept two commands: PONG_READ and PONG_WRITE
which allow to read from and write to an internal buffer of
1024 bytes.

The QEMU device is named ccw-pong.

Signed-off-by: Pierre Morel 
---
 hw/s390x/Makefile.objs  |   1 +
 hw/s390x/ccw-pong.c | 186 
 include/hw/s390x/pong.h |  47 
 3 files changed, 234 insertions(+)
 create mode 100644 hw/s390x/ccw-pong.c
 create mode 100644 include/hw/s390x/pong.h

diff --git a/hw/s390x/Makefile.objs b/hw/s390x/Makefile.objs
index ee91152..3a83438 100644
--- a/hw/s390x/Makefile.objs
+++ b/hw/s390x/Makefile.objs
@@ -32,6 +32,7 @@ obj-$(CONFIG_KVM) += tod-kvm.o
 obj-$(CONFIG_KVM) += s390-skeys-kvm.o
 obj-$(CONFIG_KVM) += s390-stattrib-kvm.o s390-mchk.o
 obj-y += s390-ccw.o
+obj-y += ccw-pong.o
 obj-y += ap-device.o
 obj-y += ap-bridge.o
 obj-y += s390-sei.o
diff --git a/hw/s390x/ccw-pong.c b/hw/s390x/ccw-pong.c
new file mode 100644
index 000..e7439d5
--- /dev/null
+++ b/hw/s390x/ccw-pong.c
@@ -0,0 +1,186 @@
+/*
+ * CCW PING-PONG
+ *
+ * Copyright 2019 IBM Corp.
+ * Author(s): Pierre Morel 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at
+ * your option) any later version. See the COPYING file in the top-level
+ * directory.
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/module.h"
+#include "cpu.h"
+#include "exec/address-spaces.h"
+#include "hw/s390x/css.h"
+#include "hw/s390x/css-bridge.h"
+#include "hw/qdev-properties.h"
+#include "hw/s390x/pong.h"
+
+#define PONG_BUF_SIZE 0x1000
+static char buf[PONG_BUF_SIZE] = "Hello world\n";
+
+static inline int pong_rw(CCW1 *ccw, char *p, int len, bool dir)
+{
+int ret;
+
+ret = address_space_rw(_space_memory, ccw->cda,
+   MEMTXATTRS_UNSPECIFIED,
+   (unsigned char *)buf, len, dir);
+
+return (ret == MEMTX_OK) ? -EIO : 0;
+}
+
+/* Handle READ ccw commands from guest */
+static int handle_payload_read(CcwPONGDevice *dev, CCW1 *ccw)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(dev);
+int len;
+
+if (!ccw->cda) {
+return -EFAULT;
+}
+
+if (ccw->count > PONG_BUF_SIZE) {
+len = PONG_BUF_SIZE;
+ccw_dev->sch->curr_status.scsw.count = ccw->count - PONG_BUF_SIZE;
+} else {
+len = ccw->count;
+ccw_dev->sch->curr_status.scsw.count = 0;
+}
+
+return pong_rw(ccw, buf, len, 1);
+}
+
+/*
+ * Handle WRITE ccw commands to write data to client
+ * The SCSW count is set to the number of bytes not transfered.
+ */
+static int handle_payload_write(CcwPONGDevice *dev, CCW1 *ccw)
+{
+CcwDevice *ccw_dev = CCW_DEVICE(dev);
+int len;
+
+if (!ccw->cda) {
+ccw_dev->sch->curr_status.scsw.count = ccw->count;
+return -EFAULT;
+}
+
+if (ccw->count > PONG_BUF_SIZE) {
+len = PONG_BUF_SIZE;
+ccw_dev->sch->curr_status.scsw.count = ccw->count - PONG_BUF_SIZE;
+} else {
+len = ccw->count;
+ccw_dev->sch->curr_status.scsw.count = 0;
+}
+
+return pong_rw(ccw, buf, len, 0);
+}
+
+static int pong_ccw_cb(SubchDev *sch, CCW1 ccw)
+{
+int rc = 0;
+CcwPONGDevice *dev = sch->driver_data;
+
+switch (ccw.cmd_code) {
+case PONG_WRITE:
+rc = handle_payload_write(dev, );
+break;
+case PONG_READ:
+rc = handle_payload_read(dev, );
+break;
+default:
+rc = -ENOSYS;
+break;
+}
+
+if (rc == -EIO) {
+/* I/O error, specific devices generate specific conditions */
+SCHIB *schib = >curr_status;
+
+sch->curr_status.scsw.dstat = SCSW_DSTAT_UNIT_CHECK;
+sch->sense_data[0] = 0x40;/* intervention-req */
+schib->scsw.ctrl &= ~SCSW_ACTL_START_PEND;
+schib->scsw.ctrl &= ~SCSW_CTRL_MASK_STCTL;
+schib->scsw.ctrl |= SCSW_STCTL_PRIMARY | SCSW_STCTL_SECONDARY |
+   SCSW_STCTL_ALERT | SCSW_STCTL_STATUS_PEND;
+}
+return rc;
+}
+
+static void pong_ccw_realize(DeviceState *ds, Error **errp)
+{
+uint16_t chpid;
+CcwPONGDevice *dev = CCW_PONG(ds);
+CcwDevice *cdev = CCW_DEVICE(ds);
+CCWDeviceClass *cdk = CCW_DEVICE_GET_CLASS(cdev);
+SubchDev *sch;
+Error *err = NULL;
+
+sch = css_create_sch(cdev->devno, errp);
+if (!sch) {
+return;
+}
+
+sch->driver_data = dev;
+cdev->sch = sch;
+chpid = css_find_free_chpid(sch->cssid);
+
+if (chpid > MAX_CHPID) {
+error_setg(, "No available chpid to use.");
+goto out_err;
+}
+
+sch->id.reserved = 0xff;
+sch->id.cu_type = CCW_PONG_CU_TYPE;
+css_sch_build_virtual_schib(sch, (uint8_t)chpid,
+CCW_PONG_CHPID_TYPE);
+sch->do_subchannel_work = do_subchannel_work_virtual;
+sch->ccw_cb = pong_ccw_cb;
+
+cdk->realize(cdev, );
+if (err) {
+goto out_err;
+}
+
+css_reset_sch(sch);
+return;
+
+out_err:
+error_propagate(errp, err);
+ 

Re: [PATCH v7 0/4] colo: Add support for continuous replication

2019-11-13 Thread Lukas Straub
On Fri, 25 Oct 2019 19:06:31 +0200
Lukas Straub  wrote:

> Hello Everyone,
> These Patches add support for continuous replication to colo. This means
> that after the Primary fails and the Secondary did a failover, the Secondary
> can then become Primary and resume replication to a new Secondary.
>
> Regards,
> Lukas Straub
>
> v7:
>  - clarify meaning of ip's in documentation and note that active and hidden
>images just need to be created once
>  - reverted removal of bdrv_is_root_node(top_bs) in replication and adjusted
>the top-id= parameter in documentation acordingly
>
> v6:
>  - documented the position= and insert= options
>  - renamed replication test
>  - clarified documentation by using different ip's for primary and secondary
>  - added Reviewed-by tags
>
> v5:
>  - change syntax for the position= parameter
>  - fix spelling mistake
>
> v4:
>  - fix checkpatch.pl warnings
>
> v3:
>  - add test for replication changes
>  - check if the filter to be inserted before/behind belongs to the same 
> interface
>  - fix the error message for the position= parameter
>  - rename term "after" -> "behind" and variable "insert_before" -> 
> "insert_before_flag"
>  - document the quorum node on the secondary side
>  - simplify quorum parameters in documentation
>  - remove trailing spaces in documentation
>  - clarify the testing procedure in documentation
>
> v2:
>  - fix email formating
>  - fix checkpatch.pl warnings
>  - fix patchew error
>  - clearer commit messages
>
>
> Lukas Straub (4):
>   block/replication.c: Ignore requests after failover
>   tests/test-replication.c: Add test for for secondary node continuing
> replication
>   net/filter.c: Add Options to insert filters anywhere in the filter
> list
>   colo: Update Documentation for continuous replication
>
>  block/replication.c|  35 +-
>  docs/COLO-FT.txt   | 224 +++--
>  docs/block-replication.txt |  28 +++--
>  include/net/filter.h   |   2 +
>  net/filter.c   |  92 ++-
>  qemu-options.hx|  31 -
>  tests/test-replication.c   |  52 +
>  7 files changed, 389 insertions(+), 75 deletions(-)
>

Hello Everyone,
So I guess this is ready for merging or will that have to wait until the 4.2 
release is done?

Regards,
Lukas Straub



Re: [RFC v4 PATCH 12/49] multi-process: remote process initialization

2019-11-13 Thread Stefan Hajnoczi
On Thu, Oct 24, 2019 at 05:08:53AM -0400, Jagannathan Raman wrote:
>  int main(int argc, char *argv[])
>  {
> +Error *err = NULL;
> +
>  module_call_init(MODULE_INIT_QOM);
>  
> +bdrv_init_with_whitelist();
> +
> +if (qemu_init_main_loop()) {
> +error_report_err(err);
> +return -EBUSY;
> +}
> +
> +qemu_init_cpu_loop();
> +
> +page_size_init();
> +
>  current_machine = 
> MACHINE(REMOTE_MACHINE(object_new(TYPE_REMOTE_MACHINE)));
>  
> +mpqemu_link = mpqemu_link_create();
> +if (!mpqemu_link) {
> +printf("Could not create MPQemu link\n");
> +return -1;
> +}
> +
> +mpqemu_init_channel(mpqemu_link, _link->com, STDIN_FILENO);
> +mpqemu_link_set_callback(mpqemu_link, process_msg);
> +
> +mpqemu_start_coms(mpqemu_link);

Can you use util/main-loop.c instead of an mpqemu-specific event loop?
I think that file is needed anyway because lots of QEMU code depends on
it.


signature.asc
Description: PGP signature


[PATCH for 5.0 3/6] linux-user: Add read and set for rtc periodic interrupt and epoch using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h   | 4 
 linux-user/syscall_defs.h | 4 
 2 files changed, 8 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index ff1cdd2..7d76a9f 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -81,6 +81,10 @@
  IOCTL(RTC_ALM_READ, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
  IOCTL(RTC_RD_TIME, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
  IOCTL(RTC_SET_TIME, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_time)))
+ IOCTL(RTC_IRQP_READ, IOC_R, MK_PTR(TYPE_ULONG))
+ IOCTL(RTC_IRQP_SET, IOC_W, MK_PTR(TYPE_ULONG))
+ IOCTL(RTC_EPOCH_READ, IOC_R, MK_PTR(TYPE_ULONG))
+ IOCTL(RTC_EPOCH_SET, IOC_W, MK_PTR(TYPE_ULONG))
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 2298547..e69071f 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -776,6 +776,10 @@ struct target_pollfd {
 #define TARGET_RTC_ALM_READ TARGET_IOR('p', 0x08, struct rtc_time)
 #define TARGET_RTC_RD_TIME  TARGET_IOR('p', 0x09, struct rtc_time)
 #define TARGET_RTC_SET_TIME TARGET_IOW('p', 0x0a, struct rtc_time)
+#define TARGET_RTC_IRQP_READTARGET_IOR('p', 0x0b, abi_ulong)
+#define TARGET_RTC_IRQP_SET TARGET_IOW('p', 0x0c, abi_ulong)
+#define TARGET_RTC_EPOCH_READ   TARGET_IOR('p', 0x0d, abi_ulong)
+#define TARGET_RTC_EPOCH_SETTARGET_IOW('p', 0x0e, abi_ulong)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
-- 
2.7.4




RE: [PATCH] WHPX: refactor load library

2019-11-13 Thread Sunil Muthuswamy


> Making it easier for other people to test WHPX would be nice.

Yes, we understand the concerns and I generally agree here. I am
trying to connect the different teams involved here (legal, SDK here)
and connect the dots for them, to see what can be done here.

> But in case this is not sorted out soon, I don't see a reason to
> not merge WHPX changes if they are reviewed and approved by the
> main author of that code (Justin).
> 

Justin is ready to review it, but is out for another week. Will have him
review once he is back.




Re: [RFC v4 PATCH 11/49] multi-process: setup memory manager for remote device

2019-11-13 Thread Stefan Hajnoczi
On Thu, Oct 24, 2019 at 05:08:52AM -0400, Jagannathan Raman wrote:
> +static void remote_ram_destructor(MemoryRegion *mr)
> +{
> +qemu_ram_free(mr->ram_block);
> +}
> +
> +static void remote_ram_init_from_fd(MemoryRegion *mr, int fd, uint64_t size,
> +ram_addr_t offset, Error **errp)
> +{
> +char *name = g_strdup_printf("%d", fd);
> +
> +memory_region_init(mr, NULL, name, size);
> +mr->ram = true;
> +mr->terminates = true;
> +mr->destructor = NULL;
> +mr->align = 0;
> +mr->ram_block = qemu_ram_alloc_from_fd(size, mr, RAM_SHARED, fd, offset,
> +   errp);
> +mr->dirty_log_mask = tcg_enabled() ? (1 << DIRTY_MEMORY_CODE) : 0;
> +
> +g_free(name);
> +}

This is not specific to remote/memory.c and could be shared in case
something else in QEMU wants to initialize from an fd.

> +
> +void remote_sysmem_reconfig(MPQemuMsg *msg, Error **errp)
> +{
> +sync_sysmem_msg_t *sysmem_info = >data1.sync_sysmem;

A possible security issue with MPQemuMsg: was the message size
validatedb before we access msg->data1.sync_sysmem?

If not, then we might access uninitialized data.  I didn't see if there
is a single place in the code that always zeroes msg, but I think the
answer is no.  Accessing uninitialized data could expose the old
contents of the stack/heap to the other process.  Information leaks like
this can be used to defeat address-space randomization because the other
process may learn about our memory layout if there are memory addresses
in the uninitialized data.


signature.asc
Description: PGP signature


[PATCH for 5.0 4/6] linux-user: Add get and set for rtc wakeup alarm using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h| 2 ++
 linux-user/syscall_defs.h  | 2 ++
 linux-user/syscall_types.h | 5 +
 3 files changed, 9 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index 7d76a9f..5830315 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -85,6 +85,8 @@
  IOCTL(RTC_IRQP_SET, IOC_W, MK_PTR(TYPE_ULONG))
  IOCTL(RTC_EPOCH_READ, IOC_R, MK_PTR(TYPE_ULONG))
  IOCTL(RTC_EPOCH_SET, IOC_W, MK_PTR(TYPE_ULONG))
+ IOCTL(RTC_WKALM_SET, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_wkalrm)))
+ IOCTL(RTC_WKALM_RD, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_wkalrm)))
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index e69071f..3a0eb6b 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -780,6 +780,8 @@ struct target_pollfd {
 #define TARGET_RTC_IRQP_SET TARGET_IOW('p', 0x0c, abi_ulong)
 #define TARGET_RTC_EPOCH_READ   TARGET_IOR('p', 0x0d, abi_ulong)
 #define TARGET_RTC_EPOCH_SETTARGET_IOW('p', 0x0e, abi_ulong)
+#define TARGET_RTC_WKALM_SETTARGET_IOW('p', 0x0f, struct rtc_wkalrm)
+#define TARGET_RTC_WKALM_RD TARGET_IOR('p', 0x10, struct rtc_wkalrm)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
diff --git a/linux-user/syscall_types.h b/linux-user/syscall_types.h
index a35072a..820bc8e 100644
--- a/linux-user/syscall_types.h
+++ b/linux-user/syscall_types.h
@@ -266,6 +266,11 @@ STRUCT(rtc_time,
TYPE_INT, /* tm_yday */
TYPE_INT) /* tm_isdst */
 
+STRUCT(rtc_wkalrm,
+   TYPE_CHAR, /* enabled */
+   TYPE_CHAR, /* pending */
+   MK_STRUCT(STRUCT_rtc_time)) /* time */
+
 STRUCT(blkpg_ioctl_arg,
TYPE_INT, /* op */
TYPE_INT, /* flags */
-- 
2.7.4




RE: [PATCH] WHPX: refactor load library

2019-11-13 Thread Sunil Muthuswamy


> -Original Message-
> From: Paolo Bonzini 
> Sent: Wednesday, November 13, 2019 7:00 AM
> To: Sunil Muthuswamy ; Richard Henderson 
> ; Eduardo Habkost ;
> Stefan Weil 
> Cc: qemu-devel@nongnu.org; Justin Terry (VM) 
> Subject: Re: [PATCH] WHPX: refactor load library
> 
> On 08/11/19 21:31, Sunil Muthuswamy wrote:
> >
> > +typedef enum WHPFunctionList {
> > +WINHV_PLATFORM_FNS_DEFAULT,
> > +WINHV_EMULATION_FNS_DEFAULT,
> > +} WHPFunctionList;
> >
> 
> What does "default" stand for?  I assume you have more changes to this
> function in the future.
> 
Yes, there are more functions coming, such as for XSAVE. I used "default" to 
represent
whatever is there currently, for lack of a better term.

> > + * Load the functions from the given library, using the given handle. If a
> > + * handle is provided, it is used, otherwise the library is opened. The
> > + * handle will be updated on return with the opened one.
> > + */
> > +static bool load_whp_dipatch_fns(HMODULE *handle, WHPFunctionList 
> > function_list)
> > +{
> 
> Typo, "dipatch" instead of "dispatch".
> >
> > +if (hLib) {
> > +FreeLibrary(hWinHvEmulation);
> > +}
> 
> The argument to FreeLibrary should be hLib.
> 

Thanks, will fix these in the next version.



Re: [PATCH for 5.0 0/6] linux-user: Add support for real time clock ioctls

2019-11-13 Thread Laurent Vivier
Hi Filip,

Le 13/11/2019 à 17:41, Filip Bozuta a écrit :
> Add ioctls for all rtc features that are currently supported in linux kernell.
> 
> Filip Bozuta (6):
>   linux-user: Add support for enabling/disabling rtc features using
> ioctls
>   linux-user: Add set and read for rtc time and alarm using ioctls
>   linux-user: Add read and set for rtc periodic interrupt and epoch
> using ioctls
>   linux-user: Add get and set for rtc wakeup alarm using ioctls
>   linux-user: Add get and set for rtc pll correction using ioctls
>   linux-user: Add rtc voltage low detector read and clear using ioctls
> 
>  linux-user/ioctls.h| 23 +++
>  linux-user/syscall.c   |  1 +
>  linux-user/syscall_defs.h  | 36 
>  linux-user/syscall_types.h | 25 +
>  4 files changed, 85 insertions(+)
> 

Could you add in the description of each patch the name the ioctls it
implements, their purpose (you can cut from man(rtc)) and how you
have tested them?

Thanks,
Laurent



RE: [PATCH] WHPX: refactor load library

2019-11-13 Thread Sunil Muthuswamy



> Can we wait for approval from the Microsoft legal department first?
> So we can start testing WHPX builds, and reduce the possibilities to
> introduce regressions.
> 
> Testing is ready, we are waiting for Microsoft to merge, see:
> https://nam06.safelinks.protection.outlook.com/?url=https%3A%2F%2Fwww.mail-archive.com%2Fqemu-
> devel%40nongnu.org%2Fmsg646351.htmldata=02%7C01%7Csunilmut%40microsoft.com%7C41ce65aedecb47c7bd0d08d76857937d
> %7C72f988bf86f141af91ab2d7cd011db47%7C1%7C0%7C637092597657121219sdata=tu0zZDIzlG%2F9lEU4SJi11%2B%2FX1JdHUt6PD
> 2teeYCMZ%2B8%3Dreserved=0
> 

Yes, we have escalated this to the right set of teams, including the legal
team. We are working through the processes here internally and will
update once we have something. Meanwhile, it would be good to see if
we can get these patches in.

> >
> > Sunil, Justin, would you like to be listed as maintainers or
> > designated reviewers for the WHPX code in QEMU?
> 
> Great idea!
It's a valid and good point. I am discussing this internally here and
will get back.




[PATCH] migration: Fix the re-run check of the migrate-incoming command

2019-11-13 Thread Yury Kotov
The current check sets an error but doesn't fail the command.
This may cause a problem if new connection attempt by the same URI
affects the first connection.

Signed-off-by: Yury Kotov 
---
 migration/migration.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/migration/migration.c b/migration/migration.c
index 354ad072fa..fa2005b49f 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -1784,6 +1784,7 @@ void qmp_migrate_incoming(const char *uri, Error **errp)
 }
 if (!once) {
 error_setg(errp, "The incoming migration has already been started");
+return;
 }
 
 qemu_start_incoming_migration(uri, _err);
-- 
2.24.0




[PATCH v7 2/3] qcow2: Allow writing compressed data of multiple clusters

2019-11-13 Thread Andrey Shinkevich
QEMU currently supports writing compressed data of the size equal to
one cluster. This patch allows writing QCOW2 compressed data that
exceed one cluster. Now, we split buffered data into separate clusters
and write them compressed using the existing functionality.

Suggested-by: Pavel Butsykin 
Signed-off-by: Andrey Shinkevich 
Reviewed-by: Vladimir Sementsov-Ogievskiy 
---
 block/qcow2.c | 102 ++
 1 file changed, 75 insertions(+), 27 deletions(-)

diff --git a/block/qcow2.c b/block/qcow2.c
index 7c18721..0e03a1a 100644
--- a/block/qcow2.c
+++ b/block/qcow2.c
@@ -4222,10 +4222,8 @@ fail:
 return ret;
 }
 
-/* XXX: put compressed sectors first, then all the cluster aligned
-   tables to avoid losing bytes in alignment */
 static coroutine_fn int
-qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
+qcow2_co_pwritev_compressed_task(BlockDriverState *bs,
  uint64_t offset, uint64_t bytes,
  QEMUIOVector *qiov, size_t qiov_offset)
 {
@@ -4235,32 +4233,11 @@ qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
 uint8_t *buf, *out_buf;
 uint64_t cluster_offset;
 
-if (has_data_file(bs)) {
-return -ENOTSUP;
-}
-
-if (bytes == 0) {
-/* align end of file to a sector boundary to ease reading with
-   sector based I/Os */
-int64_t len = bdrv_getlength(bs->file->bs);
-if (len < 0) {
-return len;
-}
-return bdrv_co_truncate(bs->file, len, false, PREALLOC_MODE_OFF, NULL);
-}
-
-if (offset_into_cluster(s, offset)) {
-return -EINVAL;
-}
+assert(bytes == s->cluster_size || (bytes < s->cluster_size &&
+   (offset + bytes == bs->total_sectors << BDRV_SECTOR_BITS)));
 
 buf = qemu_blockalign(bs, s->cluster_size);
-if (bytes != s->cluster_size) {
-if (bytes > s->cluster_size ||
-offset + bytes != bs->total_sectors << BDRV_SECTOR_BITS)
-{
-qemu_vfree(buf);
-return -EINVAL;
-}
+if (bytes < s->cluster_size) {
 /* Zero-pad last write if image size is not cluster aligned */
 memset(buf + bytes, 0, s->cluster_size - bytes);
 }
@@ -4309,6 +4286,77 @@ fail:
 return ret;
 }
 
+static coroutine_fn int qcow2_co_pwritev_compressed_task_entry(AioTask *task)
+{
+Qcow2AioTask *t = container_of(task, Qcow2AioTask, task);
+
+assert(!t->cluster_type && !t->l2meta);
+
+return qcow2_co_pwritev_compressed_task(t->bs, t->offset, t->bytes, 
t->qiov,
+t->qiov_offset);
+}
+
+/*
+ * XXX: put compressed sectors first, then all the cluster aligned
+ * tables to avoid losing bytes in alignment
+ */
+static coroutine_fn int
+qcow2_co_pwritev_compressed_part(BlockDriverState *bs,
+ uint64_t offset, uint64_t bytes,
+ QEMUIOVector *qiov, size_t qiov_offset)
+{
+BDRVQcow2State *s = bs->opaque;
+AioTaskPool *aio = NULL;
+int ret = 0;
+
+if (has_data_file(bs)) {
+return -ENOTSUP;
+}
+
+if (bytes == 0) {
+/*
+ * align end of file to a sector boundary to ease reading with
+ * sector based I/Os
+ */
+int64_t len = bdrv_getlength(bs->file->bs);
+if (len < 0) {
+return len;
+}
+return bdrv_co_truncate(bs->file, len, false, PREALLOC_MODE_OFF, NULL);
+}
+
+if (offset_into_cluster(s, offset)) {
+return -EINVAL;
+}
+
+while (bytes && aio_task_pool_status(aio) == 0) {
+uint64_t chunk_size = MIN(bytes, s->cluster_size);
+
+if (!aio && chunk_size != bytes) {
+aio = aio_task_pool_new(QCOW2_MAX_WORKERS);
+}
+
+ret = qcow2_add_task(bs, aio, qcow2_co_pwritev_compressed_task_entry,
+ 0, 0, offset, chunk_size, qiov, qiov_offset, 
NULL);
+if (ret < 0) {
+break;
+}
+qiov_offset += chunk_size;
+offset += chunk_size;
+bytes -= chunk_size;
+}
+
+if (aio) {
+aio_task_pool_wait_all(aio);
+if (ret == 0) {
+ret = aio_task_pool_status(aio);
+}
+g_free(aio);
+}
+
+return ret;
+}
+
 static int coroutine_fn
 qcow2_co_preadv_compressed(BlockDriverState *bs,
uint64_t file_cluster_offset,
-- 
1.8.3.1




[PATCH v7 1/3] block: introduce compress filter driver

2019-11-13 Thread Andrey Shinkevich
Allow writing all the data compressed through the filter driver.
The written data will be aligned by the cluster size.
Based on the QEMU current implementation, that data can be written to
unallocated clusters only. May be used for a backup job.

Suggested-by: Max Reitz 
Signed-off-by: Andrey Shinkevich 
---
 block/Makefile.objs |   1 +
 block/filter-compress.c | 201 
 qapi/block-core.json|  10 ++-
 3 files changed, 208 insertions(+), 4 deletions(-)
 create mode 100644 block/filter-compress.c

diff --git a/block/Makefile.objs b/block/Makefile.objs
index e394fe0..330529b 100644
--- a/block/Makefile.objs
+++ b/block/Makefile.objs
@@ -43,6 +43,7 @@ block-obj-y += crypto.o
 
 block-obj-y += aio_task.o
 block-obj-y += backup-top.o
+block-obj-y += filter-compress.o
 
 common-obj-y += stream.o
 
diff --git a/block/filter-compress.c b/block/filter-compress.c
new file mode 100644
index 000..64b1ee5
--- /dev/null
+++ b/block/filter-compress.c
@@ -0,0 +1,201 @@
+/*
+ * Compress filter block driver
+ *
+ * Copyright (c) 2019 Virtuozzo International GmbH
+ *
+ * Author:
+ *   Andrey Shinkevich 
+ *   (based on block/copy-on-read.c by Max Reitz)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License as
+ * published by the Free Software Foundation; either version 2 or
+ * (at your option) any later version of the License.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "block/block_int.h"
+#include "qemu/module.h"
+
+
+static int compress_open(BlockDriverState *bs, QDict *options, int flags,
+ Error **errp)
+{
+bs->backing = bdrv_open_child(NULL, options, "file", bs, _file, 
false,
+  errp);
+if (!bs->backing) {
+return -EINVAL;
+}
+
+bs->supported_write_flags = BDRV_REQ_WRITE_UNCHANGED |
+BDRV_REQ_WRITE_COMPRESSED |
+(BDRV_REQ_FUA & bs->backing->bs->supported_write_flags);
+
+bs->supported_zero_flags = BDRV_REQ_WRITE_UNCHANGED |
+((BDRV_REQ_FUA | BDRV_REQ_MAY_UNMAP | BDRV_REQ_NO_FALLBACK) &
+bs->backing->bs->supported_zero_flags);
+
+return 0;
+}
+
+
+#define PERM_PASSTHROUGH (BLK_PERM_CONSISTENT_READ \
+  | BLK_PERM_WRITE \
+  | BLK_PERM_RESIZE)
+#define PERM_UNCHANGED (BLK_PERM_ALL & ~PERM_PASSTHROUGH)
+
+static void compress_child_perm(BlockDriverState *bs, BdrvChild *c,
+const BdrvChildRole *role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
+{
+*nperm = perm & PERM_PASSTHROUGH;
+*nshared = (shared & PERM_PASSTHROUGH) | PERM_UNCHANGED;
+
+/*
+ * We must not request write permissions for an inactive node, the child
+ * cannot provide it.
+ */
+if (!(bs->open_flags & BDRV_O_INACTIVE)) {
+*nperm |= BLK_PERM_WRITE_UNCHANGED;
+}
+}
+
+
+static int64_t compress_getlength(BlockDriverState *bs)
+{
+return bdrv_getlength(bs->backing->bs);
+}
+
+
+static int coroutine_fn compress_co_truncate(BlockDriverState *bs,
+ int64_t offset, bool exact,
+ PreallocMode prealloc,
+ Error **errp)
+{
+return bdrv_co_truncate(bs->backing, offset, exact, prealloc, errp);
+}
+
+
+static int coroutine_fn compress_co_preadv_part(BlockDriverState *bs,
+uint64_t offset, uint64_t 
bytes,
+QEMUIOVector *qiov,
+size_t qiov_offset,
+int flags)
+{
+return bdrv_co_preadv_part(bs->backing, offset, bytes, qiov, qiov_offset,
+   flags);
+}
+
+
+static int coroutine_fn compress_co_pwritev_part(BlockDriverState *bs,
+ uint64_t offset,
+ uint64_t bytes,
+ QEMUIOVector *qiov,
+ size_t qiov_offset, int flags)
+{
+return bdrv_co_pwritev_part(bs->backing, offset, bytes, qiov, qiov_offset,
+flags | BDRV_REQ_WRITE_COMPRESSED);
+}
+
+
+static int coroutine_fn 

Re: [EXTERNAL]Re: [PATCH v1 5/5] .travis.yml: drop 32 bit systems from MAIN_SOFTMMU_TARGETS

2019-11-13 Thread Philippe Mathieu-Daudé

On 11/13/19 6:38 PM, Aleksandar Markovic wrote:

From: Philippe Mathieu-Daudé 

-- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,arm-softmmu,i386-softmmu,mips-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"
+- 
MAIN_SOFTMMU_TARGETS="aarch64-softmmu,mips64-softmmu,ppc64-softmmu,riscv64-softmmu,s390x-softmmu,x86_64-softmmu"


Aleksandar, since you mostly test 32-bit MIPS, are you OK we keep
mips-softmmu and drop mips64-softmmu here? Another job (acceptance-test)
builds the mips64el-softmmu.


Philippe, thanks for bringing this to my attention. Yes, 32-bit mips targets are 
important to us, but, what can we do, time constraints are time constraints, so I agree 
with Alex change, please go ahead, Alex. We can test 32-bit mips targets via other 
acceptance tests (those that can run longer, so-called "slow" group), and 
perhaps we can extend them to test more 32-bit mips systems.


OK, let's keep mips64 as suggested Alex then.

Reviewed-by: Philippe Mathieu-Daudé 




Re: [PATCH v9 Kernel 1/5] vfio: KABI for migration interface for device state

2019-11-13 Thread Kirti Wankhede




On 11/13/2019 8:53 AM, Yan Zhao wrote:

On Wed, Nov 13, 2019 at 06:30:05AM +0800, Alex Williamson wrote:

On Tue, 12 Nov 2019 22:33:36 +0530
Kirti Wankhede  wrote:


- Defined MIGRATION region type and sub-type.
- Used 3 bits to define VFIO device states.
 Bit 0 => _RUNNING
 Bit 1 => _SAVING
 Bit 2 => _RESUMING
 Combination of these bits defines VFIO device's state during migration
 _RUNNING => Normal VFIO device running state. When its reset, it
indicates _STOPPED state. when device is changed to
_STOPPED, driver should stop device before write()
returns.
 _SAVING | _RUNNING => vCPUs are running, VFIO device is running but
   start saving state of device i.e. pre-copy state
 _SAVING  => vCPUs are stopped, VFIO device should be stopped, and


s/should/must/


 save device state,i.e. stop-n-copy state
 _RESUMING => VFIO device resuming state.
 _SAVING | _RESUMING and _RUNNING | _RESUMING => Invalid states


A table might be useful here and in the uapi header to indicate valid
states:

| _RESUMING | _SAVING | _RUNNING | Description
+---+-+--+--
| 0 |0| 0| Stopped, not saving or resuming (a)
+---+-+--+--
| 0 |0| 1| Running, default state
+---+-+--+--
| 0 |1| 0| Stopped, migration interface in save mode
+---+-+--+--
| 0 |1| 1| Running, save mode interface, iterative
+---+-+--+--
| 1 |0| 0| Stopped, migration resume interface active
+---+-+--+--
| 1 |0| 1| Invalid (b)
+---+-+--+--
| 1 |1| 0| Invalid (c)
+---+-+--+--
| 1 |1| 1| Invalid (d)

I think we need to consider whether we define (a) as generally
available, for instance we might want to use it for diagnostics or a
fatal error condition outside of migration.



We have to set it as init state. I'll add this it.


Are there hidden assumptions between state transitions here or are
there specific next possible state diagrams that we need to include as
well?

I'm curious if Intel agrees with the states marked invalid with their
push for post-copy support.


hi Alex and Kirti,
Actually, for postcopy, I think we anyway need an extra POSTCOPY state
introduced. Reasons as below:
- in the target side, _RSESUMING state is set in the beginning of
   migration. It cannot be used as a state to inform device of that
   currently it's in postcopy state and device DMAs are to be trapped and
   pre-faulted.
   we also cannot use state (_RESUMING + _RUNNING) as an indicator of
   postcopy, because before device & vm running in target side, some device
   state are already loaded (e.g. page tables, pending workloads),
   target side can do pre-pagefault at that period before target vm up.
- in the source side, after device is stopped, postcopy needs saving
   device state only (as compared to device state + remaining dirty
   pages in precopy). state (!_RUNNING + _SAVING) here again cannot
   differentiate precopy and postcopy here.


 Bits 3 - 31 are reserved for future use. User should perform
 read-modify-write operation on this field.
- Defined vfio_device_migration_info structure which will be placed at 0th
   offset of migration region to get/set VFIO device related information.
   Defined members of structure and usage on read/write access:
 * device_state: (read/write)
 To convey VFIO device state to be transitioned to. Only 3 bits are
used as of now, Bits 3 - 31 are reserved for future use.
 * pending bytes: (read only)
 To get pending bytes yet to be migrated for VFIO device.
 * data_offset: (read only)
 To get data offset in migration region from where data exist
during _SAVING and from where data should be written by user space
application during _RESUMING state.
 * data_size: (read/write)
 To get and set size in bytes of data copied in migration region
during _SAVING and _RESUMING state.

Migration region looks like:
  --
|vfio_device_migration_info|data section  |
|  | ///  |
  --
  ^  ^
  offset 0-trapped part

Re: [PATCH v1 2/5] docs/devel: rename plugins.rst to tcg-plugins.rst

2019-11-13 Thread Philippe Mathieu-Daudé

On 11/13/19 12:59 PM, Alex Bennée wrote:

This makes it a bit clearer what this is about.

Signed-off-by: Alex Bennée 
---
  docs/devel/index.rst| 2 +-
  docs/devel/{plugins.rst => tcg-plugins.rst} | 0
  MAINTAINERS | 1 +
  3 files changed, 2 insertions(+), 1 deletion(-)
  rename docs/devel/{plugins.rst => tcg-plugins.rst} (100%)

diff --git a/docs/devel/index.rst b/docs/devel/index.rst
index 2ff058bae38..c86a3cdff2f 100644
--- a/docs/devel/index.rst
+++ b/docs/devel/index.rst
@@ -22,4 +22,4 @@ Contents:
 decodetree
 secure-coding-practices
 tcg
-   plugins
+   tcg-plugins
diff --git a/docs/devel/plugins.rst b/docs/devel/tcg-plugins.rst
similarity index 100%
rename from docs/devel/plugins.rst
rename to docs/devel/tcg-plugins.rst
diff --git a/MAINTAINERS b/MAINTAINERS
index ff8d0d29f4b..b160d817208 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -2369,6 +2369,7 @@ F: tcg/
  TCG Plugins
  M: Alex Bennée 
  S: Maintained
+F: docs/devel/tcg-plugins.rst
  F: plugins/


What about (other patch):

  plugins/ -> tcg/plugins/


  F: tests/plugin


tests/plugin/ -> tests/tcg/plugins/

Anyway for this patch:
Reviewed-by: Philippe Mathieu-Daudé 




Re: [SeaBIOS] Re: [PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Sam Eiderman
Sure,

There are two issues here.

The first issue is that my commits which applied to seabios master:

* 9caa19b - geometry: Apply LCHS values for boot devices
* cb56f61 - config: Add toggle for bootdevice information
* ad29109 - geometry: Add boot_lchs_find_*() utility functions
* b3d2120 - boot: Reorder functions in boot.c
* 7c66a43 - geometry: Read LCHS from fw_cfg

Are not from the latest version which was submitted to the mailing list (v4)
* fw_cfg key name has changed
* The value and of the key has changed from binary (v1) to textual (v4)
* Other fixes and variable name changes.

So these commits need to be reverted and reapplied with the latest version (v4)

The second issue is that my commits, (in v4 too) will require this fix
that Gerd added ([PATCH] ahci: zero-initialize port struct) since they
change how SeaBIOS uses lchs.

Previously, before any of my commits, drive.lchs could contain "random
crap" since it was always set before being used in
setup_translation().

After my patches, get_translation() invokes overriden_lchs_supplied()
which checks: "return drive->lchs.cylinder || drive->lchs.head ||
drive->lchs.sector;"
So there is an assumption that "drive->lchs" is zeroed when lchs is
not supplied for the host.

This was true for all devices using "drive->lchs" (all were memset to
0) except ahci.
(I used 'git grep "drive_s * drive"' to find them all).

So Gerd fix is indeed needed and then all devices are covered
(drive->lchs is memset to 0).

Now only the first issue remains...

Sam

On Wed, Nov 13, 2019 at 6:12 PM Philippe Mathieu-Daudé
 wrote:
>
> Hi Sam,
>
> On 11/13/19 4:03 PM, Sam Eiderman wrote:
> > Hi,
> >
> > Does this fix a bug that actually happened?
> >
> > I just noticed that in my lchs patches I assumed that lchs struct is
> > zeroed out in all devices (not only ahci):
> >
> > 9caa19be0e53 (geometry: Apply LCHS values for boot devices)
> >
> > Seems like this is not the case but why only ahci is affected?
> >
> > The list of devices is at least:
> >
> >  * ata
> >  * ahci
> >  * scsi
> >  * esp
> >  * lsi
> >  * megasas
> >  * mpt
> >  * pvscsi
> >  * virtio
> >  * virtio-blk
> >
> > As specified in the commit message.
> >
> > Also Gerd it seems that my lchs patches were not committed in the
> > latest submitted version (v4)!!!
> > The ABI of the fw config key is completely broken.
>
> What do you mean? Can you be more specific?
>



[PATCH for 5.0 6/6] linux-user: Add rtc voltage low detector read and clear using ioctls

2019-11-13 Thread Filip Bozuta
Signed-off-by: Filip Bozuta 
---
 linux-user/ioctls.h   | 2 ++
 linux-user/syscall_defs.h | 2 ++
 2 files changed, 4 insertions(+)

diff --git a/linux-user/ioctls.h b/linux-user/ioctls.h
index eea65e1..330e502 100644
--- a/linux-user/ioctls.h
+++ b/linux-user/ioctls.h
@@ -89,6 +89,8 @@
  IOCTL(RTC_WKALM_RD, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_wkalrm)))
  IOCTL(RTC_PLL_GET, IOC_R, MK_PTR(MK_STRUCT(STRUCT_rtc_pll_info)))
  IOCTL(RTC_PLL_SET, IOC_W, MK_PTR(MK_STRUCT(STRUCT_rtc_pll_info)))
+ IOCTL(RTC_VL_READ, IOC_R, TYPE_INT)
+ IOCTL(RTC_VL_CLR, 0, TYPE_NULL)
 
  IOCTL(BLKROSET, IOC_W, MK_PTR(TYPE_INT))
  IOCTL(BLKROGET, IOC_R, MK_PTR(TYPE_INT))
diff --git a/linux-user/syscall_defs.h b/linux-user/syscall_defs.h
index 367e9bd..6ad827b 100644
--- a/linux-user/syscall_defs.h
+++ b/linux-user/syscall_defs.h
@@ -796,6 +796,8 @@ struct target_rtc_pll_info {
struct target_rtc_pll_info)
 #define TARGET_RTC_PLL_SET  TARGET_IOW('p', 0x12,  
\
struct target_rtc_pll_info)
+#define TARGET_RTC_VL_READ  TARGET_IOR('p', 0x13, int)
+#define TARGET_RTC_VL_CLR   TARGET_IO('p', 0x14)
 
 #if defined(TARGET_ALPHA) || defined(TARGET_MIPS) || defined(TARGET_SH4) ||
\
defined(TARGET_XTENSA)
-- 
2.7.4




Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object

2019-11-13 Thread Daniel P . Berrangé
On Wed, Nov 13, 2019 at 11:32:09AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:50 AM, Daniel P. Berrangé wrote:
> > On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
> > > Collect the VMSD from remote process on the source and save
> > > it to the channel leading to the destination
> > > 
> > > Signed-off-by: Elena Ufimtseva 
> > > Signed-off-by: John G Johnson 
> > > Signed-off-by: Jagannathan Raman 
> > > ---
> > >   New patch in v4
> > > 
> > >   hw/proxy/qemu-proxy.c | 132 
> > > ++
> > >   include/hw/proxy/qemu-proxy.h |   2 +
> > >   include/io/mpqemu-link.h  |   1 +
> > >   3 files changed, 135 insertions(+)
> > > 
> > > diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> > > index 623a6c5..ce72e6a 100644
> > > --- a/hw/proxy/qemu-proxy.c
> > > +++ b/hw/proxy/qemu-proxy.c
> > > @@ -52,6 +52,14 @@
> > >   #include "util/event_notifier-posix.c"
> > >   #include "hw/boards.h"
> > >   #include "include/qemu/log.h"
> > > +#include "io/channel.h"
> > > +#include "migration/qemu-file-types.h"
> > > +#include "qapi/error.h"
> > > +#include "io/channel-util.h"
> > > +#include "migration/qemu-file-channel.h"
> > > +#include "migration/qemu-file.h"
> > > +#include "migration/migration.h"
> > > +#include "migration/vmstate.h"
> > >   QEMUTimer *hb_timer;
> > >   static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
> > > @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
> > >   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
> > >   static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
> > > +#define PAGE_SIZE getpagesize()
> > > +uint8_t *mig_data;
> > > +
> > >   static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
> > >   {
> > >   /* TODO: Add proper handler. */
> > > @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
> > >   dev->mem_init = false;
> > >   }
> > > +typedef struct {
> > > +QEMUFile *rem;
> > > +PCIProxyDev *dev;
> > > +} proxy_mig_data;
> > > +
> > > +static void *proxy_mig_out(void *opaque)
> > > +{
> > > +proxy_mig_data *data = opaque;
> > > +PCIProxyDev *dev = data->dev;
> > > +uint8_t byte;
> > > +uint64_t data_size = PAGE_SIZE;
> > > +
> > > +mig_data = g_malloc(data_size);
> > > +
> > > +while (true) {
> > > +byte = qemu_get_byte(data->rem);
> > 
> > There is a pretty large set of APIs hiding behind the qemu_get_byte
> > call, which does not give me confidence that...
> > 
> > > +mig_data[dev->migsize++] = byte;
> > > +if (dev->migsize == data_size) {
> > > +data_size += PAGE_SIZE;
> > > +mig_data = g_realloc(mig_data, data_size);
> > > +}
> > > +}
> > > +
> > > +return NULL;
> > > +}
> > > +
> > > +static int proxy_pre_save(void *opaque)
> > > +{
> > > +PCIProxyDev *pdev = opaque;
> > > +proxy_mig_data *mig_data;
> > > +QEMUFile *f_remote;
> > > +MPQemuMsg msg = {0};
> > > +QemuThread thread;
> > > +Error *err = NULL;
> > > +QIOChannel *ioc;
> > > +uint64_t size;
> > > +int fd[2];
> > > +
> > > +if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> > > +return -1;
> > > +}
> > > +
> > > +ioc = qio_channel_new_fd(fd[0], );
> > > +if (err) {
> > > +error_report_err(err);
> > > +return -1;
> > > +}
> > > +
> > > +qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
> > > +
> > > +f_remote = qemu_fopen_channel_input(ioc);
> > > +
> > > +pdev->migsize = 0;
> > > +
> > > +mig_data = g_malloc0(sizeof(proxy_mig_data));
> > > +mig_data->rem = f_remote;
> > > +mig_data->dev = pdev;
> > > +
> > > +qemu_thread_create(, "Proxy MIG_OUT", proxy_mig_out, mig_data,
> > > +   QEMU_THREAD_DETACHED);
> > > +
> > > +msg.cmd = START_MIG_OUT;
> > > +msg.bytestream = 0;
> > > +msg.num_fds = 2;
> > > +msg.fds[0] = fd[1];
> > > +msg.fds[1] = GET_REMOTE_WAIT;
> > > +
> > > +mpqemu_msg_send(pdev->mpqemu_link, , pdev->mpqemu_link->com);
> > > +size = wait_for_remote(msg.fds[1]);
> > > +PUT_REMOTE_WAIT(msg.fds[1]);
> > > +
> > > +assert(size != ULLONG_MAX);
> > > +
> > > +/*
> > > + * migsize is being update by a separate thread. Using volatile to
> > > + * instruct the compiler to fetch the value of this variable from
> > > + * memory during every read
> > > + */
> > > +while (*((volatile uint64_t *)>migsize) < size) {
> > > +}
> > > +
> > > +qemu_thread_cancel();
> > 
> > this is a safe way to stop the thread executing without
> > resulting in memory being leaked.
> > 
> > In addition thread cancellation is asynchronous, so the thread
> > may still be using the QEMUFile object while
> > 
> > > +qemu_fclose(f_remote);
> 
> The above "wait_for_remote()" call waits for the remote process to
> finish with Migration, and return the size of 

[PATCH v7 0/3] qcow2: advanced compression options

2019-11-13 Thread Andrey Shinkevich
The compression filter driver is introduced as suggested by Max.
A sample usage of the filter can be found in the test #214.
Now, multiple clusters can be written compressed.
It is useful for the backup job.

v7:
  01: The 'zip_' prefix for the compression filter functions replaced with
'compress_' one.
  02: .bdrv_co_preadv/pwritev (without _part) removed from the filter.
  03: .bdrv_refresh_limits amended.
  04: .bdrv_get_info added.
  05: In qapi/block-core.json, @compress: Since 5.0 was set.

  Discussed in the email thread with the message ID
  <1573488277-794975-1-git-send-email-andrey.shinkev...@virtuozzo.com>

Andrey Shinkevich (3):
  block: introduce compress filter driver
  qcow2: Allow writing compressed data of multiple clusters
  tests/qemu-iotests: add case to write compressed data of multiple
clusters

 block/Makefile.objs|   1 +
 block/filter-compress.c| 201 +
 block/qcow2.c  | 102 +--
 qapi/block-core.json   |  10 ++-
 tests/qemu-iotests/214 |  43 ++
 tests/qemu-iotests/214.out |  14 
 6 files changed, 340 insertions(+), 31 deletions(-)
 create mode 100644 block/filter-compress.c

-- 
1.8.3.1




[PATCH v7 3/3] tests/qemu-iotests: add case to write compressed data of multiple clusters

2019-11-13 Thread Andrey Shinkevich
Add the case to the iotest #214 that checks possibility of writing
compressed data of more than one cluster size. The test case involves
the compress filter driver showing a sample usage of that.

Signed-off-by: Andrey Shinkevich 
---
 tests/qemu-iotests/214 | 43 +++
 tests/qemu-iotests/214.out | 14 ++
 2 files changed, 57 insertions(+)

diff --git a/tests/qemu-iotests/214 b/tests/qemu-iotests/214
index 21ec8a2..5012112 100755
--- a/tests/qemu-iotests/214
+++ b/tests/qemu-iotests/214
@@ -89,6 +89,49 @@ _check_test_img -r all
 $QEMU_IO -c "read  -P 0x11  0 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | 
_filter_testdir
 $QEMU_IO -c "read  -P 0x22 4M 4M" "$TEST_IMG" 2>&1 | _filter_qemu_io | 
_filter_testdir
 
+echo
+echo "=== Write compressed data of multiple clusters ==="
+echo
+cluster_size=0x1
+_make_test_img 2M -o cluster_size=$cluster_size
+
+echo "Write uncompressed data:"
+let data_size="8 * $cluster_size"
+$QEMU_IO -c "write -P 0xaa 0 $data_size" "$TEST_IMG" \
+ 2>&1 | _filter_qemu_io | _filter_testdir
+sizeA=$($QEMU_IMG info --output=json "$TEST_IMG" |
+sed -n '/"actual-size":/ s/[^0-9]//gp')
+
+_make_test_img 2M -o cluster_size=$cluster_size
+echo "Write compressed data:"
+let data_size="3 * $cluster_size + ($cluster_size / 2)"
+# Set compress=on. That will align the written data
+# by the cluster size and will write them compressed.
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
+$QEMU_IO -c "write -P 0xbb 0 $data_size" --image-opts \
+ 
"driver=compress,file.driver=$IMGFMT,file.file.driver=file,file.file.filename=$TEST_IMG"
 \
+ 2>&1 | _filter_qemu_io | _filter_testdir
+
+let offset="4 * $cluster_size"
+QEMU_IO_OPTIONS=$QEMU_IO_OPTIONS_NO_FMT \
+$QEMU_IO -c "write -P 0xcc $offset $data_size" "json:{\
+'driver': 'compress',
+'file': {'driver': '$IMGFMT',
+ 'file': {'driver': 'file',
+  'filename': '$TEST_IMG'}}}" | \
+  _filter_qemu_io | _filter_testdir
+
+sizeB=$($QEMU_IMG info --output=json "$TEST_IMG" |
+sed -n '/"actual-size":/ s/[^0-9]//gp')
+
+if [ $sizeA -le $sizeB ]
+then
+echo "Compression ERROR"
+fi
+
+$QEMU_IMG check --output=json "$TEST_IMG" |
+  sed -n 's/,$//; /"compressed-clusters":/ s/^ *//p'
+
 # success, all done
 echo '*** done'
 rm -f $seq.full
diff --git a/tests/qemu-iotests/214.out b/tests/qemu-iotests/214.out
index 0fcd8dc..4a2ec33 100644
--- a/tests/qemu-iotests/214.out
+++ b/tests/qemu-iotests/214.out
@@ -32,4 +32,18 @@ read 4194304/4194304 bytes at offset 0
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
 read 4194304/4194304 bytes at offset 4194304
 4 MiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+
+=== Write compressed data of multiple clusters ===
+
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Write uncompressed data:
+wrote 524288/524288 bytes at offset 0
+512 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=2097152
+Write compressed data:
+wrote 229376/229376 bytes at offset 0
+224 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+wrote 229376/229376 bytes at offset 262144
+224 KiB, X ops; XX:XX:XX.X (XXX YYY/sec and XXX ops/sec)
+"compressed-clusters": 8
 *** done
-- 
1.8.3.1




[PATCH V2] WHPX: refactor load library

2019-11-13 Thread Sunil Muthuswamy
This refactors the load library of WHV libraries to make it more
modular. It makes a helper routine that can be called on demand.
This allows future expansion of load library/functions to support
functionality that is dependent on some feature being available.

Signed-off-by: Sunil Muthuswamy 
---
Changes since v1:
- Fixed typo of load_whp_dispatch_fns
- Fixed free of the right handle

 target/i386/whp-dispatch.h |  4 +++
 target/i386/whpx-all.c | 85 +++---
 2 files changed, 62 insertions(+), 27 deletions(-)

diff --git a/target/i386/whp-dispatch.h b/target/i386/whp-dispatch.h
index 23791fbb47..87d049ceab 100644
--- a/target/i386/whp-dispatch.h
+++ b/target/i386/whp-dispatch.h
@@ -50,5 +50,9 @@ extern struct WHPDispatch whp_dispatch;
 
 bool init_whp_dispatch(void);
 
+typedef enum WHPFunctionList {
+WINHV_PLATFORM_FNS_DEFAULT,
+WINHV_EMULATION_FNS_DEFAULT,
+} WHPFunctionList;
 
 #endif /* WHP_DISPATCH_H */
diff --git a/target/i386/whpx-all.c b/target/i386/whpx-all.c
index ed95105eae..f3c61fa5d8 100644
--- a/target/i386/whpx-all.c
+++ b/target/i386/whpx-all.c
@@ -1356,6 +1356,58 @@ static void whpx_handle_interrupt(CPUState *cpu, int 
mask)
 }
 }
 
+/*
+ * Load the functions from the given library, using the given handle. If a
+ * handle is provided, it is used, otherwise the library is opened. The
+ * handle will be updated on return with the opened one.
+ */
+static bool load_whp_dispatch_fns(HMODULE *handle,
+WHPFunctionList function_list)
+{
+HMODULE hLib = *handle;
+
+#define WINHV_PLATFORM_DLL "WinHvPlatform.dll"
+#define WINHV_EMULATION_DLL "WinHvEmulation.dll"
+#define WHP_LOAD_FIELD(return_type, function_name, signature) \
+whp_dispatch.function_name = \
+(function_name ## _t)GetProcAddress(hLib, #function_name); \
+if (!whp_dispatch.function_name) { \
+error_report("Could not load function %s", #function_name); \
+goto error; \
+} \
+
+#define WHP_LOAD_LIB(lib_name, handle_lib) \
+if (!handle_lib) { \
+handle_lib = LoadLibrary(lib_name); \
+if (!handle_lib) { \
+error_report("Could not load library %s.", lib_name); \
+goto error; \
+} \
+} \
+
+switch (function_list) {
+case WINHV_PLATFORM_FNS_DEFAULT:
+WHP_LOAD_LIB(WINHV_PLATFORM_DLL, hLib)
+LIST_WINHVPLATFORM_FUNCTIONS(WHP_LOAD_FIELD)
+break;
+
+case WINHV_EMULATION_FNS_DEFAULT:
+WHP_LOAD_LIB(WINHV_EMULATION_DLL, hLib)
+LIST_WINHVEMULATION_FUNCTIONS(WHP_LOAD_FIELD)
+break;
+}
+
+*handle = hLib;
+return true;
+
+error:
+if (hLib) {
+FreeLibrary(hLib);
+}
+
+return false;
+}
+
 /*
  * Partition support
  */
@@ -1491,51 +1543,30 @@ static void whpx_type_init(void)
 
 bool init_whp_dispatch(void)
 {
-const char *lib_name;
-HMODULE hLib;
-
 if (whp_dispatch_initialized) {
 return true;
 }
 
-#define WHP_LOAD_FIELD(return_type, function_name, signature) \
-whp_dispatch.function_name = \
-(function_name ## _t)GetProcAddress(hLib, #function_name); \
-if (!whp_dispatch.function_name) { \
-error_report("Could not load function %s from library %s.", \
- #function_name, lib_name); \
-goto error; \
-} \
-
-lib_name = "WinHvPlatform.dll";
-hWinHvPlatform = LoadLibrary(lib_name);
-if (!hWinHvPlatform) {
-error_report("Could not load library %s.", lib_name);
+if (!load_whp_dispatch_fns(, WINHV_PLATFORM_FNS_DEFAULT)) {
 goto error;
 }
-hLib = hWinHvPlatform;
-LIST_WINHVPLATFORM_FUNCTIONS(WHP_LOAD_FIELD)
 
-lib_name = "WinHvEmulation.dll";
-hWinHvEmulation = LoadLibrary(lib_name);
-if (!hWinHvEmulation) {
-error_report("Could not load library %s.", lib_name);
+if (!load_whp_dispatch_fns(, WINHV_EMULATION_FNS_DEFAULT)) 
{
 goto error;
 }
-hLib = hWinHvEmulation;
-LIST_WINHVEMULATION_FUNCTIONS(WHP_LOAD_FIELD)
 
 whp_dispatch_initialized = true;
-return true;
-
-error:
 
+return true;
+error:
 if (hWinHvPlatform) {
 FreeLibrary(hWinHvPlatform);
 }
+
 if (hWinHvEmulation) {
 FreeLibrary(hWinHvEmulation);
 }
+
 return false;
 }
 
-- 
2.16.4




Re: [PATCH] qmp: Reset mon->commands on CHR_EVENT_CLOSED

2019-11-13 Thread Jason Andryuk
On Wed, Nov 13, 2019 at 8:18 AM Marc-André Lureau
 wrote:
>
> Hi
>
> On Wed, Nov 13, 2019 at 4:41 PM Markus Armbruster  wrote:
> >
> > Jason Andryuk  writes:
> >
> > > Currently, mon->commands is uninitialized until CHR_EVENT_OPENED where
> > > it is set to _cap_negotiation_commands.  After capability
> > > negotiation, it is set to _commands.  If the chardev is closed,
> > > CHR_EVENT_CLOSED, mon->commands remains as _commands.  Only once the
> > > chardev is re-opened with CHR_EVENT_OPENED, is it reset to
> > > _cap_negotiation_commands.
> > >
> > > monitor_qapi_event_emit compares mon->commands to
> > > _cap_negotiation_commands, and skips sending events when they are
> > > equal.
> >
> > The check's purpose is to ensure we don't send events in capabilities
> > negotiation mode, i.e. between connect and a successful qmp_capabilities
> > command.
> >
> > > In the case of a closed chardev, QMP events are still sent down
> > > to the closed chardev which needs to drop them.
> >
> > True, but I wonder how this can happen.  Can you explain?

I was working on QMP over Xen's argo inter-vm communication for the
OpenXT project.  We had a lock up because we weren't properly dropping
QMP events in the chardev, even though we called CHR_EVENT_CLOSED.  We
fixed the argo chardev to drop messages like other chardevs, but my
investigation found QMP was still sending events even through it was
closed.

Xen's libxl opens a UNIX domain QMP socket for the duration of issuing
a command and then closes it.  OpenXT does the same over Argo.  In the
case of connecting and disconnecting, QMP events continue to be
generated after the first connection, even though there is nothing
connected for an unbounded length of time.  I was just trying to
optimize that scenario by skipping QMP events when disconnected.

> > > Set mon->commands to _cap_negotiation_commands for CHR_EVENT_CLOSED
> > > to stop sending events.  Setting for the CHR_EVENT_OPENED case remains
> > > since that is how mon->commands is set for a newly opened connections.
> > >
> > > Signed-off-by: Jason Andryuk 
> > > ---
> > >  monitor/qmp.c | 1 +
> > >  1 file changed, 1 insertion(+)
> > >
> > > diff --git a/monitor/qmp.c b/monitor/qmp.c
> > > index 9d9e5d8b27..5e2073c5eb 100644
> > > --- a/monitor/qmp.c
> > > +++ b/monitor/qmp.c
> > > @@ -333,6 +333,7 @@ static void monitor_qmp_event(void *opaque, int event)
> >case CHR_EVENT_CLOSED:
> >/*
> > * Note: this is only useful when the output of the chardev
> > * backend is still open.  For example, when the backend is
> > * stdio, it's possible that stdout is still open when stdin
> > >   * is closed.
> > >   */
> > >  monitor_qmp_cleanup_queues(mon);
> > > +mon->commands = _cap_negotiation_commands;
> > >  json_message_parser_destroy(>parser);
> > >  json_message_parser_init(>parser, handle_qmp_command,
> > >   mon, NULL);
> >
> > Patchew reports this loses SHUTDOWN events.  Local testing confirms it
> > does.  Simplified reproducer:
> >
> > $ for ((i=0; i<100; i++)); do printf '{"execute": 
> > "qmp_capabilities"}\n{"execute": "quit"}' | upstream-qemu -S -display none 
> > -qmp stdio; done | grep -c SHUTDOWN
> > 84
> >
> > In this test, the SHUTDOWN event was lost 16 out of 100 times.
> >
> > Its emission obviously races with the assignment you add.
> >
> > The comment preceding it provides a clue: after CHR_EVENT_CLOSED, we
> > know the input direction is gone, and we duly clean up that part of the
> > monitor.  But the output direction may or may not be gone, so we don't
> > touch it.  Your assignment touches it.  This is not the correct spot.
> > I can't tell you offhand where the correct spot is.  Perhaps Marc-André
> > can.
>
> The same "closed" event is used to signal read-disconnected,
> write-disconnected and hup.
>
> To implement half-closed signaling, we would need more events/state
> (probably change the chardev API to use input and output streams).
> Tbh, I am not sure it's really worth at this point, unless we have a
> "visible" bug to fix.

Yes, I agree.

Thanks for your time looking at this.

Regards,
Jason



[PATCH v2 5/5] MAINTAINERS: Add two files to Malta section

2019-11-13 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Add two files that were recently introduced in a refactoring,
that Malta emulation relies on. They are added by this patch
to Malta section, but they are not added to the general MIPS
section, since they are really not MIPS-specific, and there
may be some non-MIPS hardware using them in future.

Signed-off-by: Aleksandar Markovic 
---
 MAINTAINERS | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index ba9ca98..f8a1646 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -959,8 +959,10 @@ M: Philippe Mathieu-Daudé 
 R: Hervé Poussineau 
 R: Aurelien Jarno 
 S: Maintained
+F: hw/isa/piix4.c
 F: hw/mips/mips_malta.c
 F: hw/mips/gt64xxx_pci.c
+F: include/hw/southbridge/piix.h
 F: tests/acceptance/linux_ssh_mips_malta.py
 
 Mipssim
-- 
2.7.4




[PATCH v2 3/5] MAINTAINERS: Adjust maintainership for Malta board

2019-11-13 Thread Aleksandar Markovic
From: Aleksandar Markovic 

Change the maintainership for Malta board to improve its quality.

Acked-by: Aurelien Jarno 
Signed-off-by: Aleksandar Markovic 
---
 MAINTAINERS | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 3bf2144..6afec32 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -955,8 +955,9 @@ F: hw/display/jazz_led.c
 F: hw/dma/rc4030.c
 
 Malta
-M: Aurelien Jarno 
-R: Aleksandar Rikalo 
+M: Philippe Mathieu-Daudé 
+R: Hervé Poussineau 
+R: Aurelien Jarno 
 S: Maintained
 F: hw/mips/mips_malta.c
 F: hw/mips/gt64xxx_pci.c
-- 
2.7.4




Re: [RFC PATCH 01/18] qemu-storage-daemon: Add barebone tool

2019-11-13 Thread Kevin Wolf
Am 24.10.2019 um 15:50 hat Eric Blake geschrieben:
> On 10/17/19 8:01 AM, Kevin Wolf wrote:
> > This adds a new binary qemu-storage-daemon that doesn't yet do more than
> > some typical initialisation for tools and parsing the basic command
> > options --version, --help and --trace.
> > 
> > Signed-off-by: Kevin Wolf 
> > ---
> >   configure |   2 +-
> >   qemu-storage-daemon.c | 141 ++
> >   Makefile  |   1 +
> >   3 files changed, 143 insertions(+), 1 deletion(-)
> >   create mode 100644 qemu-storage-daemon.c
> > 
> > diff --git a/configure b/configure
> 
> > +++ b/qemu-storage-daemon.c
> > @@ -0,0 +1,141 @@
> > +/*
> > + * QEMU storage daemon
> > + *
> > + * Copyright (c) 2019 Kevin Wolf 
> > + *
> > + * Permission is hereby granted, free of charge, to any person obtaining a 
> > copy
> > + * of this software and associated documentation files (the "Software"), 
> > to deal
> > + * in the Software without restriction, including without limitation the 
> > rights
> > + * to use, copy, modify, merge, publish, distribute, sublicense, and/or 
> > sell
> > + * copies of the Software, and to permit persons to whom the Software is
> > + * furnished to do so, subject to the following conditions:
> > + *
> > + * The above copyright notice and this permission notice shall be included 
> > in
> > + * all copies or substantial portions of the Software.
> > + *
> > + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS 
> > OR
> > + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> > + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> > + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR 
> > OTHER
> > + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> > FROM,
> > + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS 
> > IN
> > + * THE SOFTWARE.
> 
> Is there an intent to license this binary as BSD (by restricting sources
> that can be linked in), or is it going to end up as GPLv2+ for other
> reasons? If the latter, would it be better to license this file GPLv2+?

The binary will be GPLv2 either way. Maybe even GPLv2+, not sure about
that.

This file copies quite a bit from qemu-img.c and vl.c, so I'm using the
same license as there. Of course, the "bug" in my patch is that I also
need to keep not only the license text, but also the actual copyright
line from there. Will fix.

> 
> > + */
> > +
> > +#include "qemu/osdep.h"
> > +
> > +#include "block/block.h"
> > +#include "crypto/init.h"
> > +
> > +#include "qapi/error.h"
> > +#include "qemu-common.h"
> > +#include "qemu-version.h"
> > +#include "qemu/config-file.h"
> > +#include "qemu/error-report.h"
> > +#include "qemu/log.h"
> > +#include "qemu/main-loop.h"
> > +#include "qemu/module.h"
> > +
> > +#include "trace/control.h"
> > +
> > +#include 
> 
> Shouldn't system headers appear right after qemu/osdep.h?

I wasn't aware of this, but CODING_STYLE.rst agrees with you.

> > +
> > +static void help(void)
> > +{
> > +printf(
> > +"Usage: %s [options]\n"
> > +"QEMU storage daemon\n"
> > +"\n"
> > +"  -h, --help display this help and exit\n"
> > +"  -T, --trace [[enable=]][,events=][,file=]\n"
> > +" specify tracing options\n"
> > +"  -V, --version  output version information and exit\n"
> > +"\n"
> > +QEMU_HELP_BOTTOM "\n",
> > +error_get_progname());
> > +}
> > +
> > +static int process_options(int argc, char *argv[], Error **errp)
> > +{
> > +int c;
> > +char *trace_file = NULL;
> > +int ret = -EINVAL;
> > +
> > +static const struct option long_options[] = {
> > +{"help", no_argument, 0, 'h'},
> > +{"version", no_argument, 0, 'V'},
> > +{"trace", required_argument, NULL, 'T'},
> 
> I find it harder to maintain lists of options (which will get longer over
> time) when the order of the struct...
> 
> > +{0, 0, 0, 0}
> > +};
> > +
> > +while ((c = getopt_long(argc, argv, ":hT:V", long_options, NULL)) != 
> > -1) {
> 
> ...the order of the short options...
> 
> > +switch (c) {
> > +case '?':
> > +error_setg(errp, "Unknown option '%s'", argv[optind - 1]);
> > +goto out;
> > +case ':':
> > +error_setg(errp, "Missing option argument for '%s'",
> > +   argv[optind - 1]);
> > +goto out;
> > +case 'h':
> > +help();
> > +exit(EXIT_SUCCESS);
> > +case 'V':
> > +printf("qemu-storage-daemon version "
> > +   QEMU_FULL_VERSION "\n" QEMU_COPYRIGHT "\n");
> > +exit(EXIT_SUCCESS);
> > +case 'T':
> > +g_free(trace_file);
> > +trace_file = trace_opt_parse(optarg);
> > +break;
> 
> ...and the order of the case statements are not identical. Case-insensitive
> 

Re: [RFC PATCH v2 03/26] qcow2: Process QCOW2_CLUSTER_ZERO_ALLOC clusters in handle_copied()

2019-11-13 Thread Alberto Garcia
On Wed 30 Oct 2019 03:24:08 PM CET, Max Reitz wrote:
>>  static void calculate_l2_meta(BlockDriverState *bs, uint64_t host_offset,
>>uint64_t guest_offset, uint64_t bytes,
>> -  QCowL2Meta **m, bool keep_old)
>> +  uint64_t *l2_slice, QCowL2Meta **m, bool 
>> keep_old)
>>  {
>>  BDRVQcow2State *s = bs->opaque;
>> -unsigned cow_start_from = 0;
>> +int l2_index = offset_to_l2_slice_index(s, guest_offset);
>> +uint64_t l2_entry;
>> +unsigned cow_start_from, cow_end_to;
>>  unsigned cow_start_to = offset_into_cluster(s, guest_offset);
>>  unsigned cow_end_from = cow_start_to + bytes;
>> -unsigned cow_end_to = ROUND_UP(cow_end_from, s->cluster_size);
>>  unsigned nb_clusters = size_to_clusters(s, cow_end_from);
>>  QCowL2Meta *old_m = *m;
>> +QCow2ClusterType type;
>> +
>> +/* Return if there's no COW (all clusters are normal and we keep them) 
>> */
>> +if (keep_old) {
>> +int i;
>> +for (i = 0; i < nb_clusters; i++) {
>> +l2_entry = be64_to_cpu(l2_slice[l2_index + i]);
>
> I’d assert somewhere that l2_index + nb_clusters - 1 won’t overflow.
>
>> +if (qcow2_get_cluster_type(bs, l2_entry) != 
>> QCOW2_CLUSTER_NORMAL) {
>
> Wouldn’t cluster_needs_cow() be better?

The semantics of cluster_needs_cow() change in this patch (which also
updates the documentation). But I should maybe change the name instead.

>> +break;
>> +}
>> +}
>> +if (i == nb_clusters) {
>> +return;
>> +}
>> +}
>
> So I understand we always need to create an L2Meta structure in all
> other cases because we at least need to turn those clusters into
> normal clusters?  (Even if they’re already allocated, as in the case
> of allocated zero clusters.)

That's correct.

>> -/* Returns true if writing to a cluster requires COW */
>> +/* Returns true if the cluster is unallocated or has refcount > 1 */
>>  static bool cluster_needs_cow(BlockDriverState *bs, uint64_t l2_entry)
>>  {
>>  switch (qcow2_get_cluster_type(bs, l2_entry)) {
>>  case QCOW2_CLUSTER_NORMAL:
>> +case QCOW2_CLUSTER_ZERO_ALLOC:
>>  if (l2_entry & QCOW_OFLAG_COPIED) {
>>  return false;
>
> Don’t zero-allocated clusters need COW always?  (Because the at the
> given host offset isn’t guaranteed to be zero.)

Yeah, hence the semantics change I described earlier. I should probably
call it cluster_needs_new_allocation() or something like that, which is
what this means now ("true if unallocated or refcount > 1").

>> - * Returns the number of contiguous clusters that can be used for an 
>> allocating
>> - * write, but require COW to be performed (this includes yet unallocated 
>> space,
>> - * which must copy from the backing file)
>> + * Returns the number of contiguous clusters that can be written to
>> + * using one single write request, starting from @l2_index.
>> + * At most @nb_clusters are checked.
>> + *
>> + * If @want_cow is true this counts clusters that are either
>> + * unallocated, or allocated but with refcount > 1.
>
> +(So they need to be newly allocated and COWed)?

Yes. Which in this context is the same as "newly allocated" I guess,
because every newly allocated cluster requires COW.

> (Or is the past participle of COW COWn?  Or maybe CedOW?)

:-))

>> + * If @want_cow is false this counts clusters that are already
>> + * allocated and can be written to using their current locations
>
> s/using their current locations/in-place/?

Ok.

>> @@ -1475,13 +1489,14 @@ static int handle_alloc(BlockDriverState *bs, 
>> uint64_t guest_offset,
>>  *bytes = MIN(*bytes, nb_bytes - offset_into_cluster(s, guest_offset));
>>  assert(*bytes != 0);
>>  
>> -calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes,
>> -  m, keep_old_clusters);
>> +calculate_l2_meta(bs, alloc_cluster_offset, guest_offset, *bytes, 
>> l2_slice,
>> +  m, false);
>>  
>> -return 1;
>> +ret = 1;
>>  
>> -fail:
>> -if (*m && (*m)->nb_clusters > 0) {
>> +out:
>> +qcow2_cache_put(s->l2_table_cache, (void **) _slice);
>
> Is this a bug fix?

No, we call qcow2_cache_put() later because calculate_l2_meta() needs
l2_slice.

Berto



[PATCH 13/16] xen: convert "-machine igd-passthru" to an accelerator property

2019-11-13 Thread Paolo Bonzini
The first machine property to fall is Xen's Intel integrated graphics
passthrough.  The "-machine igd-passthru" option does not set anymore
a property on the machine object, but desugars to a GlobalProperty on
accelerator objects.

The setter is very simple, since the value ends up in a
global variable, so this patch also provides an example before the more
complicated cases that follow it.

Signed-off-by: Paolo Bonzini 
---
 hw/core/machine.c   | 20 
 hw/xen/xen-common.c | 16 
 include/hw/boards.h |  1 -
 qemu-options.hx |  9 +
 vl.c| 14 --
 5 files changed, 25 insertions(+), 35 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 45ddfb6..d7a0356 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -412,20 +412,6 @@ static void machine_set_graphics(Object *obj, bool value, 
Error **errp)
 ms->enable_graphics = value;
 }
 
-static bool machine_get_igd_gfx_passthru(Object *obj, Error **errp)
-{
-MachineState *ms = MACHINE(obj);
-
-return ms->igd_gfx_passthru;
-}
-
-static void machine_set_igd_gfx_passthru(Object *obj, bool value, Error **errp)
-{
-MachineState *ms = MACHINE(obj);
-
-ms->igd_gfx_passthru = value;
-}
-
 static char *machine_get_firmware(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -862,12 +848,6 @@ static void machine_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "graphics",
 "Set on/off to enable/disable graphics emulation", _abort);
 
-object_class_property_add_bool(oc, "igd-passthru",
-machine_get_igd_gfx_passthru, machine_set_igd_gfx_passthru,
-_abort);
-object_class_property_set_description(oc, "igd-passthru",
-"Set on/off to enable/disable igd passthrou", _abort);
-
 object_class_property_add_str(oc, "firmware",
 machine_get_firmware, machine_set_firmware,
 _abort);
diff --git a/hw/xen/xen-common.c b/hw/xen/xen-common.c
index 5284b0d..6cba30c 100644
--- a/hw/xen/xen-common.c
+++ b/hw/xen/xen-common.c
@@ -124,6 +124,16 @@ static void xen_change_state_handler(void *opaque, int 
running,
 }
 }
 
+static bool xen_get_igd_gfx_passthru(Object *obj, Error **errp)
+{
+return has_igd_gfx_passthru;
+}
+
+static void xen_set_igd_gfx_passthru(Object *obj, bool value, Error **errp)
+{
+has_igd_gfx_passthru = value;
+}
+
 static void xen_setup_post(MachineState *ms, AccelState *accel)
 {
 int rc;
@@ -177,6 +187,12 @@ static void xen_accel_class_init(ObjectClass *oc, void 
*data)
 ac->compat_props = g_ptr_array_new();
 
 compat_props_add(ac->compat_props, compat, G_N_ELEMENTS(compat));
+
+object_class_property_add_bool(oc, "igd-passthru",
+xen_get_igd_gfx_passthru, xen_set_igd_gfx_passthru,
+_abort);
+object_class_property_set_description(oc, "igd-passthru",
+"Set on/off to enable/disable igd passthrou", _abort);
 }
 
 #define TYPE_XEN_ACCEL ACCEL_CLASS_NAME("xen")
diff --git a/include/hw/boards.h b/include/hw/boards.h
index 36fcbda..cdcf481 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -287,7 +287,6 @@ struct MachineState {
 bool mem_merge;
 bool usb;
 bool usb_disabled;
-bool igd_gfx_passthru;
 char *firmware;
 bool iommu;
 bool suppress_vmdesc;
diff --git a/qemu-options.hx b/qemu-options.hx
index 3931f90..5b43a83 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -37,7 +37,6 @@ DEF("machine", HAS_ARG, QEMU_OPTION_machine, \
 "kvm_shadow_mem=size of KVM shadow MMU in bytes\n"
 "dump-guest-core=on|off include guest memory in a core 
dump (default=on)\n"
 "mem-merge=on|off controls memory merge support (default: 
on)\n"
-"igd-passthru=on|off controls IGD GFX passthrough support 
(default=off)\n"
 "aes-key-wrap=on|off controls support for AES key wrapping 
(default=on)\n"
 "dea-key-wrap=on|off controls support for DEA key wrapping 
(default=on)\n"
 "suppress-vmdesc=on|off disables self-describing migration 
(default=off)\n"
@@ -71,8 +70,6 @@ more than one accelerator specified, the next one is used if 
the previous one
 fails to initialize.
 @item kernel_irqchip=on|off
 Controls in-kernel irqchip support for the chosen accelerator when available.
-@item gfx_passthru=on|off
-Enables IGD GFX passthrough support for the chosen machine when available.
 @item vmport=on|off|auto
 Enables emulation of VMWare IO port, for vmmouse etc. auto says to select the
 value based on accel. For accel=xen the default is off otherwise the default
@@ -118,8 +115,9 @@ Select CPU model (@code{-cpu help} for list and additional 
feature selection)
 ETEXI
 
 DEF("accel", HAS_ARG, QEMU_OPTION_accel,
-"-accel [accel=]accelerator[,thread=single|multi]\n"
+"-accel [accel=]accelerator[,prop[=value][,...]]\n"
 "select 

Re: [PATCH] spapr/kvm: Set default cpu model for all machine classes

2019-11-13 Thread Greg Kurz
On Wed, 13 Nov 2019 15:43:44 +0100
Jiri Denemark  wrote:

> Hi David.
> 
> On Wed, Oct 30, 2019 at 17:32:43 +0100, David Gibson wrote:
> > We have to set the default model of all machine classes, not just for the
> > active one. Otherwise, "query-machines" will indicate the wrong CPU model
> > ("qemu-s390x-cpu" instead of "host-s390x-cpu") as "default-cpu-type".
> > 
> > s390x already fixed this in de60a92e "s390x/kvm: Set default cpu model for
> > all machine classes".  This patch applies a similar fix for the pseries-*
> > machine types on ppc64.
> > 
> > Doing a
> > {"execute":"query-machines"}
> > under KVM now results in
> > {
> >   "hotpluggable-cpus": true,
> >   "name": "pseries-4.2",
> >   "numa-mem-supported": true,
> >   "default-cpu-type": "host-powerpc64-cpu",
> >   "is-default": true,
> >   "cpu-max": 1024,
> >   "deprecated": false,
> >   "alias": "pseries"
> > },
> > {
> >   "hotpluggable-cpus": true,
> >   "name": "pseries-4.1",
> >   "numa-mem-supported": true,
> >   "default-cpu-type": "host-powerpc64-cpu",
> >   "cpu-max": 1024,
> >   "deprecated": false
> > },
> > ...
> > 
> > Libvirt probes all machines via "-machine none,accel=kvm:tcg" and will
> > currently see the wrong CPU model under KVM.
> 
> Will this patch make it into 4.2.0?
> 

David is away until the 19th of November, which is the release date
of rc2 according to the planning [*]. Then we have rc3 the 26th, and
final release (or rc4) the 3rd of December, so it should be ok.

If David cannot make it for some reason, I guess Laurent Vivier or
myself can send a PR with this patch, as already suggested by David
in the past.

Cheers,

--
Greg

[*] https://wiki.qemu.org/Planning/4.2

> Jirka
> 
> 




Re: [PATCH v1 3/5] Add use of RCU for qemu_logfile.

2019-11-13 Thread Alex Bennée


Robert Foley  writes:

> On Tue, 12 Nov 2019 at 12:36, Alex Bennée  wrote:
>>
>>
>> >  }
>> > +atomic_rcu_set(_logfile, logfile);
>> >  }
>> > -qemu_mutex_unlock(_logfile_mutex);
>> > +logfile = qemu_logfile;
>>
>> Isn't this read outside of the protection of both rcu_read_lock() and
>> the mutex? Could that race?
>
> This assignment is under the mutex.  This change moved the mutex
> unlock towards the end of the function, just a few lines below the
> call_rcu().

Ahh I see now, I went +- blind ;-)

>
>> > +
>> >  if (qemu_logfile &&
>> >  (is_daemonized() ? logfilename == NULL : !qemu_loglevel)) {
>> > -qemu_log_close();
>> > +atomic_rcu_set(_logfile, NULL);
>> > +call_rcu(logfile, qemu_logfile_free, rcu);
>>
>> I wonder if we can re-arrange the logic here so it's lets confusing? For
>> example the NULL qemu_loglevel can be detected at the start and we
>> should be able just to squash the current log and reset and go. I'm not
>> sure about the damonize/stdout check.
>>
>> >  }
>> > +qemu_mutex_unlock(_logfile_mutex);
>> >  }
>> >
>
> Absolutely, the logic that was here originally can be improved.  We
> found it confusing also.
> We could move things around a bit and change it to something like this
> to help clarify.
>
> bool need_to_open_file = false;
> /*
>  * In all cases we only log if qemu_loglevel is set.
>  * And:
>  * If not daemonized we will always log either to stderr
>  *   or to a file (if there is a logfilename).
>  * If we are daemonized,
>  *   we will only log if there is a logfilename.
>  */
> if (qemu_loglevel && (!is_daemonized() || logfilename) {
> need_to_open_file = true;
> }
> g_assert(qemu_logfile_mutex.initialized);
> qemu_mutex_lock(_logfile_mutex);
>
> if(qemu_logfile && !need_to_open_file) {
> /* Close file. */
> QemuLogFile *logfile = qemu_logfile;
> atomic_rcu_set(_logfile, NULL);
> call_rcu(logfile, qemu_logfile_free, rcu);
> } else if(!qemu_logfile && need_to_open_file) {
> logfile = g_new0(QemuLogFile, 1);
>__snip__ existing patch logic for opening the qemu_logfile will
> be inserted here.
> }
> qemu_mutex_unlock(_logfile_mutex);

That reads a lot better thanks. It's probably worth doing the clean-up in
a separate patch so it is easier to see the mutex's when added.

--
Alex Bennée



Re: [SeaBIOS] Re: [PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Sam Eiderman
Sorry the correct list (which includes ahci) is:

git grep "drive_s * drive"

src/hw/ahci.h:struct drive_s drive;
src/hw/ata.h:struct drive_s drive;
src/hw/esp-scsi.c:struct drive_s drive;
src/hw/lsi-scsi.c:struct drive_s drive;
src/hw/megasas.c:struct drive_s drive;
src/hw/mpt-scsi.c:struct drive_s drive;
src/hw/nvme-int.h:struct drive_s drive;
src/hw/pvscsi.c:struct drive_s drive;
src/hw/sdcard.c:struct drive_s drive;
src/hw/usb-msc.c:struct drive_s drive;
src/hw/usb-uas.c:struct drive_s drive;
src/hw/virtio-blk.c:struct drive_s drive;
src/hw/virtio-scsi.c:struct drive_s drive;

Went over this list now, seems only ahci_port_alloc was missing
memset(0) so after this patch all of the devices that contain lchs are
covered.

But we still need to revert my lchs patches and reapply them with the
newest version (v4)

Sam

On Wed, Nov 13, 2019 at 5:18 PM Sam Eiderman  wrote:
>
> We should make sure drive.lchs is zeroed out for the following devices:
>
> git grep "drive_s drive"
>
> src/hw/ata.h:struct drive_s drive;
> src/hw/esp-scsi.c:struct drive_s drive;
> src/hw/lsi-scsi.c:struct drive_s drive;
> src/hw/megasas.c:struct drive_s drive;
> src/hw/mpt-scsi.c:struct drive_s drive;
> src/hw/nvme-int.h:struct drive_s drive;
> src/hw/pvscsi.c:struct drive_s drive;
> src/hw/sdcard.c:struct drive_s drive;
> src/hw/usb-msc.c:struct drive_s drive;
> src/hw/usb-uas.c:struct drive_s drive;
> src/hw/virtio-blk.c:struct drive_s drive;
> src/hw/virtio-scsi.c:struct drive_s drive;
>
> Sam
>
>
> On Wed, Nov 13, 2019 at 5:03 PM Sam Eiderman  wrote:
> >
> > Hi,
> >
> > Does this fix a bug that actually happened?
> >
> > I just noticed that in my lchs patches I assumed that lchs struct is
> > zeroed out in all devices (not only ahci):
> >
> > 9caa19be0e53 (geometry: Apply LCHS values for boot devices)
> >
> > Seems like this is not the case but why only ahci is affected?
> >
> > The list of devices is at least:
> >
> > * ata
> > * ahci
> > * scsi
> > * esp
> > * lsi
> > * megasas
> > * mpt
> > * pvscsi
> > * virtio
> > * virtio-blk
> >
> > As specified in the commit message.
> >
> > Also Gerd it seems that my lchs patches were not committed in the
> > latest submitted version (v4)!!!
> > The ABI of the fw config key is completely broken.
> >
> >
> > On Wed, Nov 13, 2019 at 4:02 PM Gerd Hoffmann  wrote:
> > >
> > > On Wed, Nov 13, 2019 at 10:35:03AM +0100, Laszlo Ersek wrote:
> > > > On 11/13/19 10:18, Gerd Hoffmann wrote:
> > > > > Specifically port->driver.lchs needs clearing, otherwise seabios will
> > > >
> > > > s/driver/drive/
> > >
> > > > Reviewed-by: Laszlo Ersek 
> > >
> > > Typo fixed & pushed.
> > >
> > > thanks,
> > >   Gerd
> > > ___
> > > SeaBIOS mailing list -- seab...@seabios.org
> > > To unsubscribe send an email to seabios-le...@seabios.org



Re: [RFC v4 PATCH 02/49] multi-process: util: Add qemu_thread_cancel() to cancel running thread

2019-11-13 Thread Daniel P . Berrangé
On Wed, Nov 13, 2019 at 10:38:06AM -0500, Jag Raman wrote:
> 
> 
> On 11/13/2019 10:30 AM, Stefan Hajnoczi wrote:
> > On Thu, Oct 24, 2019 at 05:08:43AM -0400, Jagannathan Raman wrote:
> > > qemu_thread_cancel() added to destroy a given running thread.
> > > This will be needed in the following patches.
> > > 
> > > Signed-off-by: John G Johnson 
> > > Signed-off-by: Jagannathan Raman 
> > > Signed-off-by: Elena Ufimtseva 
> > > ---
> > >   include/qemu/thread.h|  1 +
> > >   util/qemu-thread-posix.c | 10 ++
> > >   2 files changed, 11 insertions(+)
> > 
> > Is this still needed?  I thought previous discussion concluded that
> > thread cancellation is hard to get right and it's not actually used by
> > this series?
> 
> Hi Stefan,
> 
> This is used in PATCH 41/49.

I don't believe the cancellation usage in that patch is safe :-)

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [RFC v4 PATCH 41/49] multi-process/mig: Enable VMSD save in the Proxy object

2019-11-13 Thread Daniel P . Berrangé
On Thu, Oct 24, 2019 at 05:09:22AM -0400, Jagannathan Raman wrote:
> Collect the VMSD from remote process on the source and save
> it to the channel leading to the destination
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  New patch in v4
> 
>  hw/proxy/qemu-proxy.c | 132 
> ++
>  include/hw/proxy/qemu-proxy.h |   2 +
>  include/io/mpqemu-link.h  |   1 +
>  3 files changed, 135 insertions(+)
> 
> diff --git a/hw/proxy/qemu-proxy.c b/hw/proxy/qemu-proxy.c
> index 623a6c5..ce72e6a 100644
> --- a/hw/proxy/qemu-proxy.c
> +++ b/hw/proxy/qemu-proxy.c
> @@ -52,6 +52,14 @@
>  #include "util/event_notifier-posix.c"
>  #include "hw/boards.h"
>  #include "include/qemu/log.h"
> +#include "io/channel.h"
> +#include "migration/qemu-file-types.h"
> +#include "qapi/error.h"
> +#include "io/channel-util.h"
> +#include "migration/qemu-file-channel.h"
> +#include "migration/qemu-file.h"
> +#include "migration/migration.h"
> +#include "migration/vmstate.h"
>  
>  QEMUTimer *hb_timer;
>  static void pci_proxy_dev_realize(PCIDevice *dev, Error **errp);
> @@ -62,6 +70,9 @@ static void stop_heartbeat_timer(void);
>  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx);
>  static void broadcast_msg(MPQemuMsg *msg, bool need_reply);
>  
> +#define PAGE_SIZE getpagesize()
> +uint8_t *mig_data;
> +
>  static void childsig_handler(int sig, siginfo_t *siginfo, void *ctx)
>  {
>  /* TODO: Add proper handler. */
> @@ -357,14 +368,135 @@ static void pci_proxy_dev_inst_init(Object *obj)
>  dev->mem_init = false;
>  }
>  
> +typedef struct {
> +QEMUFile *rem;
> +PCIProxyDev *dev;
> +} proxy_mig_data;
> +
> +static void *proxy_mig_out(void *opaque)
> +{
> +proxy_mig_data *data = opaque;
> +PCIProxyDev *dev = data->dev;
> +uint8_t byte;
> +uint64_t data_size = PAGE_SIZE;
> +
> +mig_data = g_malloc(data_size);
> +
> +while (true) {
> +byte = qemu_get_byte(data->rem);

There is a pretty large set of APIs hiding behind the qemu_get_byte
call, which does not give me confidence that...

> +mig_data[dev->migsize++] = byte;
> +if (dev->migsize == data_size) {
> +data_size += PAGE_SIZE;
> +mig_data = g_realloc(mig_data, data_size);
> +}
> +}
> +
> +return NULL;
> +}
> +
> +static int proxy_pre_save(void *opaque)
> +{
> +PCIProxyDev *pdev = opaque;
> +proxy_mig_data *mig_data;
> +QEMUFile *f_remote;
> +MPQemuMsg msg = {0};
> +QemuThread thread;
> +Error *err = NULL;
> +QIOChannel *ioc;
> +uint64_t size;
> +int fd[2];
> +
> +if (socketpair(AF_UNIX, SOCK_STREAM, 0, fd)) {
> +return -1;
> +}
> +
> +ioc = qio_channel_new_fd(fd[0], );
> +if (err) {
> +error_report_err(err);
> +return -1;
> +}
> +
> +qio_channel_set_name(QIO_CHANNEL(ioc), "PCIProxyDevice-mig");
> +
> +f_remote = qemu_fopen_channel_input(ioc);
> +
> +pdev->migsize = 0;
> +
> +mig_data = g_malloc0(sizeof(proxy_mig_data));
> +mig_data->rem = f_remote;
> +mig_data->dev = pdev;
> +
> +qemu_thread_create(, "Proxy MIG_OUT", proxy_mig_out, mig_data,
> +   QEMU_THREAD_DETACHED);
> +
> +msg.cmd = START_MIG_OUT;
> +msg.bytestream = 0;
> +msg.num_fds = 2;
> +msg.fds[0] = fd[1];
> +msg.fds[1] = GET_REMOTE_WAIT;
> +
> +mpqemu_msg_send(pdev->mpqemu_link, , pdev->mpqemu_link->com);
> +size = wait_for_remote(msg.fds[1]);
> +PUT_REMOTE_WAIT(msg.fds[1]);
> +
> +assert(size != ULLONG_MAX);
> +
> +/*
> + * migsize is being update by a separate thread. Using volatile to
> + * instruct the compiler to fetch the value of this variable from
> + * memory during every read
> + */
> +while (*((volatile uint64_t *)>migsize) < size) {
> +}
> +
> +qemu_thread_cancel();

this is a safe way to stop the thread executing without
resulting in memory being leaked.

In addition thread cancellation is asynchronous, so the thread
may still be using the QEMUFile object while

> +qemu_fclose(f_remote);

..this is closing it. This feels like it is a crash danger.


> +close(fd[1]);
> +
> +return 0;
> +}

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|




Re: [PATCH v2 1/2] nbd: Don't send oversize strings

2019-11-13 Thread Eric Blake

On 10/15/19 11:16 AM, Vladimir Sementsov-Ogievskiy wrote:


@@ -1561,6 +1569,8 @@ NBDExport *nbd_export_new(BlockDriverState *bs, uint64_t 
dev_offset,
    exp->export_bitmap = bm;
    exp->export_bitmap_context = g_strdup_printf("qemu:dirty-bitmap:%s",
     bitmap);
+    /* See BME_MAX_NAME_SIZE in block/qcow2-bitmap.c */


Hmm. BME_MAX_NAME_SIZE is checked only when creating persistent bitmaps. But 
for non-persistent
name length is actually unlimited. So, we should either limit all bitmap names 
to 1023 (hope,
this will not break existing scenarios) or error out here (or earlier) instead 
of assertion.


I'm seriously doubting that any existing scenarios try to use a name 
that long. If no one was relying on a long name (especially since it was 
inconsistent between persistent being limited to qcow2 constraints and 
non-persistent having no limit), we can consider it as a bug-fix rather 
than something needing a deprecation period.




I'm leaning towards limiting ALL bitmaps to the same length (as we've already 
debated the idea of being able to convert an existing bitmap from transient to 
persistent).


Agreed, but ..





We also may want QEMU_BUILD_BUG_ON(NBD_MAX_STRING_SIZE < BME_MAX_NAME_SIZE + 
sizeof("qemu:dirty-bitmap:") - 1)


Except that BME_MAX_NAME_SIZE is not (currently) in a public .h file.



.. I think, than it should be new BLOCK_DIRTY_BITMAP_MAX_NAME_SIZE.. And we'll 
have to note it in qapi doc..
Should this change go through deprecation? Or we consider non-persistent 
bitmaps as something not really useful?


I'm preparing a v3 patch that just goes ahead and adds the limit on 
bitmap names everywhere, as a separate patch.


--
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3226
Virtualization:  qemu.org | libvirt.org




Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object

2019-11-13 Thread Stefan Hajnoczi
On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:
> +#ifndef MPQEMU_LINK_H
> +#define MPQEMU_LINK_H
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include 
> +#include 

These are already included by "qemu/osdep.h".

> +#include 

Is  needed?

> +
> +#include "qom/object.h"
> +#include "qemu/thread.h"
> +
> +#define TYPE_MPQEMU_LINK "mpqemu-link"
> +#define MPQEMU_LINK(obj) \
> +OBJECT_CHECK(MPQemuLinkState, (obj), TYPE_MPQEMU_LINK)
> +
> +#define REMOTE_MAX_FDS 8
> +
> +#define MPQEMU_MSG_HDR_SIZE offsetof(MPQemuMsg, data1.u64)
> +
> +/**
> + * mpqemu_cmd_t:
> + * CONF_READPCI config. space read
> + * CONF_WRITE   PCI config. space write
> + *
> + * proc_cmd_t enum type to specify the command to be executed on the remote
> + * device.
> + */
> +typedef enum {
> +INIT = 0,
> +CONF_READ,
> +CONF_WRITE,
> +MAX,
> +} mpqemu_cmd_t;

Please allow for future non-PCI devices by clearly naming PCI-specific
commands and including a bus type in the initialization messages.

> diff --git a/io/mpqemu-link.c b/io/mpqemu-link.c
> new file mode 100644
> index 000..b39f4d0
> --- /dev/null
> +++ b/io/mpqemu-link.c
> @@ -0,0 +1,309 @@
> +/*
> + * Communication channel between QEMU and remote device process
> + *
> + * Copyright 2019, Oracle and/or its affiliates.
> + *
> + * Permission is hereby granted, free of charge, to any person obtaining a 
> copy
> + * of this software and associated documentation files (the "Software"), to 
> deal
> + * in the Software without restriction, including without limitation the 
> rights
> + * to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
> + * copies of the Software, and to permit persons to whom the Software is
> + * furnished to do so, subject to the following conditions:
> + *
> + * The above copyright notice and this permission notice shall be included in
> + * all copies or substantial portions of the Software.
> + *
> + * THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
> + * IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
> + * FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL
> + * THE AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
> + * LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING 
> FROM,
> + * OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
> + * THE SOFTWARE.
> + */
> +
> +#include "qemu/osdep.h"
> +#include "qemu-common.h"
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 

Many of these are already included by "qemu/osdep.h".  Some of them
shouldn't be used directly because QEMU or glib have abstractions that
hide the platform-specific differences (e.g. pthread, poll).

> +MPQemuLinkState *mpqemu_link_create(void)
> +{
> +return MPQEMU_LINK(object_new(TYPE_MPQEMU_LINK));
> +}

I'm not sure what the purpose of this object is.  mpqemu_link_create()
suggests the objects will be created internally instead of via -object
mpqemu-link,..., which is unusual.

mpqemu_msg_send() and mpqemu_msg_recv() seem to be the main functions
but they do not even use their MPQemuLinkState *s argument.

> +void mpqemu_start_coms(MPQemuLinkState *s)
> +{
> +
> +g_assert(g_source_attach(>com->gsrc, s->ctx));
> +
> +g_main_loop_run(s->loop);
> +}

There is already IOThread if you need an event loop thread.  But does
this need to be its own thread?  The communication should be
asynchronous and therefore it can run in the main event loop or any
existing IOThread.


signature.asc
Description: PGP signature


Re: [RFC v4 PATCH 33/49] multi-process: perform device reset in the remote process

2019-11-13 Thread Jag Raman




On 11/11/2019 11:19 AM, Stefan Hajnoczi wrote:

On Thu, Oct 24, 2019 at 05:09:14AM -0400, Jagannathan Raman wrote:

+void proxy_device_reset(DeviceState *dev)
+{
+PCIProxyDev *pdev = PCI_PROXY_DEV(dev);
+MPQemuMsg msg;
+
+memset(, 0, sizeof(MPQemuMsg));
+
+msg.bytestream = 0;
+msg.size = sizeof(msg.data1);
+msg.cmd = DEVICE_RESET;
+
+mpqemu_msg_send(pdev->mpqemu_link, , pdev->mpqemu_link->com);
+}


Device reset must wait for the remote process to finish reset, otherwise
the remote device could still be running after proxy_device_reset()
returns from sending the message.


Thanks for feedback. We will wait for the reset to complete.

--
Jag



Stefan





Re: [PATCH v1 1/5] tests/vm: make --interactive (and therefore DEBUG=1) unconditional

2019-11-13 Thread Thomas Huth
On 13/11/2019 12.59, Alex Bennée wrote:
> While the concept of only dropping to ssh if a test fails is nice it
> is more useful for this to be unconditional. You usually just want to
> get the build up and running and then noodle around debugging or
> attempting to replicate.
> 
> Cc: Peter Maydell 
> Signed-off-by: Alex Bennée 
> 
> ---
> v2
>   - fix spelling
> ---
>  tests/vm/basevm.py | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/tests/vm/basevm.py b/tests/vm/basevm.py
> index 91a9226026d..0b8c1b26576 100755
> --- a/tests/vm/basevm.py
> +++ b/tests/vm/basevm.py
> @@ -403,7 +403,7 @@ def main(vmcls):
>  exitcode = 0
>  if vm.ssh(*cmd) != 0:
>  exitcode = 3
> -if exitcode != 0 and args.interactive:
> +if args.interactive:
>  vm.ssh()
>  
>  if not args.snapshot:
> 

Reviewed-by: Thomas Huth 




[PATCH 14/16] kvm: convert "-machine kvm_shadow_mem" to an accelerator property

2019-11-13 Thread Paolo Bonzini
Signed-off-by: Paolo Bonzini 
---
 accel/kvm/kvm-all.c | 43 +++
 hw/core/machine.c   | 39 ---
 include/hw/boards.h |  2 --
 qemu-options.hx |  6 +++---
 target/i386/kvm.c   |  2 +-
 vl.c|  4 
 6 files changed, 51 insertions(+), 45 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 140b0bd..c016319 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -41,6 +41,7 @@
 #include "hw/irq.h"
 #include "sysemu/sev.h"
 #include "sysemu/balloon.h"
+#include "qapi/visitor.h"
 
 #include "hw/boards.h"
 
@@ -92,6 +93,7 @@ struct KVMState
 int max_nested_state_len;
 int many_ioeventfds;
 int intx_set_mask;
+int kvm_shadow_mem;
 bool sync_mmu;
 bool manual_dirty_log_protect;
 /* The man page (and posix) say ioctl numbers are signed int, but
@@ -2922,6 +2924,40 @@ static bool kvm_accel_has_memory(MachineState *ms, 
AddressSpace *as,
 return false;
 }
 
+static void kvm_get_kvm_shadow_mem(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+int64_t value = s->kvm_shadow_mem;
+
+visit_type_int(v, name, , errp);
+}
+
+static void kvm_set_kvm_shadow_mem(Object *obj, Visitor *v,
+   const char *name, void *opaque,
+   Error **errp)
+{
+KVMState *s = KVM_STATE(obj);
+Error *error = NULL;
+int64_t value;
+
+visit_type_int(v, name, , );
+if (error) {
+error_propagate(errp, error);
+return;
+}
+
+s->kvm_shadow_mem = value;
+}
+
+static void kvm_accel_instance_init(Object *obj)
+{
+KVMState *s = KVM_STATE(obj);
+
+s->kvm_shadow_mem = -1;
+}
+
 static void kvm_accel_class_init(ObjectClass *oc, void *data)
 {
 AccelClass *ac = ACCEL_CLASS(oc);
@@ -2929,11 +2965,18 @@ static void kvm_accel_class_init(ObjectClass *oc, void 
*data)
 ac->init_machine = kvm_init;
 ac->has_memory = kvm_accel_has_memory;
 ac->allowed = _allowed;
+
+object_class_property_add(oc, "kvm-shadow-mem", "int",
+kvm_get_kvm_shadow_mem, kvm_set_kvm_shadow_mem,
+NULL, NULL, _abort);
+object_class_property_set_description(oc, "kvm-shadow-mem",
+"KVM shadow MMU size", _abort);
 }
 
 static const TypeInfo kvm_accel_type = {
 .name = TYPE_KVM_ACCEL,
 .parent = TYPE_ACCEL,
+.instance_init = kvm_accel_instance_init,
 .class_init = kvm_accel_class_init,
 .instance_size = sizeof(KVMState),
 };
diff --git a/hw/core/machine.c b/hw/core/machine.c
index d7a0356..bfb5f6f 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -211,33 +211,6 @@ static void machine_set_kernel_irqchip(Object *obj, 
Visitor *v,
 }
 }
 
-static void machine_get_kvm_shadow_mem(Object *obj, Visitor *v,
-   const char *name, void *opaque,
-   Error **errp)
-{
-MachineState *ms = MACHINE(obj);
-int64_t value = ms->kvm_shadow_mem;
-
-visit_type_int(v, name, , errp);
-}
-
-static void machine_set_kvm_shadow_mem(Object *obj, Visitor *v,
-   const char *name, void *opaque,
-   Error **errp)
-{
-MachineState *ms = MACHINE(obj);
-Error *error = NULL;
-int64_t value;
-
-visit_type_int(v, name, , );
-if (error) {
-error_propagate(errp, error);
-return;
-}
-
-ms->kvm_shadow_mem = value;
-}
-
 static char *machine_get_kernel(Object *obj, Error **errp)
 {
 MachineState *ms = MACHINE(obj);
@@ -785,12 +758,6 @@ static void machine_class_init(ObjectClass *oc, void *data)
 object_class_property_set_description(oc, "kernel-irqchip",
 "Configure KVM in-kernel irqchip", _abort);
 
-object_class_property_add(oc, "kvm-shadow-mem", "int",
-machine_get_kvm_shadow_mem, machine_set_kvm_shadow_mem,
-NULL, NULL, _abort);
-object_class_property_set_description(oc, "kvm-shadow-mem",
-"KVM shadow MMU size", _abort);
-
 object_class_property_add_str(oc, "kernel",
 machine_get_kernel, machine_set_kernel, _abort);
 object_class_property_set_description(oc, "kernel",
@@ -892,7 +859,6 @@ static void machine_initfn(Object *obj)
 
 ms->kernel_irqchip_allowed = true;
 ms->kernel_irqchip_split = mc->default_kernel_irqchip_split;
-ms->kvm_shadow_mem = -1;
 ms->dump_guest_core = true;
 ms->mem_merge = true;
 ms->enable_graphics = true;
@@ -963,11 +929,6 @@ bool machine_kernel_irqchip_split(MachineState *machine)
 return machine->kernel_irqchip_split;
 }
 
-int machine_kvm_shadow_mem(MachineState *machine)
-{
-return machine->kvm_shadow_mem;
-}
-
 int machine_phandle_start(MachineState *machine)
 {
 return machine->phandle_start;
diff --git a/include/hw/boards.h 

[PATCH] target/i386: add PSCHANGE_NO bit for the ARCH_CAPABILITIES MSR

2019-11-13 Thread Paolo Bonzini
This is required to disable ITLB multihit mitigations in nested
hypervisors.

Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a624163ac2..2f60df37c4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1204,7 +1204,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = 
{
 .type = MSR_FEATURE_WORD,
 .feat_names = {
 "rdctl-no", "ibrs-all", "rsba", "skip-l1dfl-vmentry",
-"ssb-no", "mds-no", NULL, NULL,
+"ssb-no", "mds-no", "pschange-mc-no", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-- 
2.21.0




[PATCH] target/i386: add PSCHANGE_NO bit for the ARCH_CAPABILITIES MSR

2019-11-13 Thread Paolo Bonzini
From: speck for Paolo Bonzini 

This is required to disable ITLB multihit mitigations in nested
hypervisors.

Signed-off-by: Paolo Bonzini 
---
 target/i386/cpu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index a624163ac2..2f60df37c4 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -1204,7 +1204,7 @@ static FeatureWordInfo feature_word_info[FEATURE_WORDS] = 
{
 .type = MSR_FEATURE_WORD,
 .feat_names = {
 "rdctl-no", "ibrs-all", "rsba", "skip-l1dfl-vmentry",
-"ssb-no", "mds-no", NULL, NULL,
+"ssb-no", "mds-no", "pschange-mc-no", NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
 NULL, NULL, NULL, NULL,
-- 
2.21.0




Re: FW: [PATCH v1 3/5] docs/devel: update tcg-plugins.rst with API versioning details

2019-11-13 Thread Robert Foley
On Wed, 13 Nov 2019 at 07:00, Alex Bennée  wrote:
>
> While we are at it fix up the quoted code sections with the inline ::
> approach.
>
> Signed-off-by: Alex Bennée 
>
> ---
> v2
>   - fix grammar
>   - mention we also will fail to load outside the range
>   - clean-up code sections
> ---
>  docs/devel/tcg-plugins.rst | 27 +--
>  1 file changed, 21 insertions(+), 6 deletions(-)
>
> diff --git a/docs/devel/tcg-plugins.rst b/docs/devel/tcg-plugins.rst
> index b18fb6729e3..718eef00f22 100644
> --- a/docs/devel/tcg-plugins.rst
> +++ b/docs/devel/tcg-plugins.rst
> @@ -25,6 +25,23 @@ process. However the project reserves the right to change 
> or break the
>  API should it need to do so. The best way to avoid this is to submit
>  your plugin upstream so they can be updated if/when the API changes.
>
> +API versioning
> +--
> +
> +All plugins need to declare a symbol which exports the plugin API
> +version they were built against. This can be done simply by::
> +
> +  QEMU_PLUGIN_EXPORT int qemu_plugin_version = QEMU_PLUGIN_VERSION;
> +
> +The core code will refuse to load a plugin that doesn't export a
> +`qemu_plugin_version` symbol or if plugin version is outside of QEMU's
> +supported range of API versions.
> +
> +Additionally the `qemu_info_t` structure which is passed to the
> +`qemu_plugin_install` method of a plugin will detail the minimum and
> +current API versions supported by QEMU. The API version will be
> +incremented if new APIs are added. The minimum API version will be
> +incremented if existing APIs are changed or removed.
>
>  Exposure of QEMU internals
>  --
> @@ -40,16 +57,14 @@ instructions and events are opaque to the plugins 
> themselves.
>  Usage
>  =
>
> -The QEMU binary needs to be compiled for plugin support:
> +The QEMU binary needs to be compiled for plugin support::
>
> -::
> -configure --enable-plugins
> +  configure --enable-plugins
>
>  Once built a program can be run with multiple plugins loaded each with
> -their own arguments:
> +their own arguments::
>
> -::
> -$QEMU $OTHER_QEMU_ARGS \
> +  $QEMU $OTHER_QEMU_ARGS \
>-plugin tests/plugin/libhowvec.so,arg=inline,arg=hint \
>-plugin tests/plugin/libhotblocks.so
>
> --
> 2.20.1
>
>
Reviewed-by: Robert Foley 



Re: [PATCH] spapr/kvm: Set default cpu model for all machine classes

2019-11-13 Thread Peter Maydell
On Wed, 13 Nov 2019 at 15:10, Greg Kurz  wrote:
> David is away until the 19th of November, which is the release date
> of rc2 according to the planning [*]. Then we have rc3 the 26th, and
> final release (or rc4) the 3rd of December, so it should be ok.

Please don't actively plan to delay putting changes in
until later release candidates. The release process
involves steadily winding up the bar of whether it's
worth putting in and hopefully reducing the volume
of changes between rcs. In an ideal world rc3 would
have very few changes, and then there would be no
changes at all between rc3 and the final release.

thanks
-- PMM



Re: [RFC v4 PATCH 07/49] multi-process: define mpqemu-link object

2019-11-13 Thread Jag Raman




On 11/11/2019 11:41 AM, Stefan Hajnoczi wrote:

On Thu, Oct 24, 2019 at 05:08:48AM -0400, Jagannathan Raman wrote:

+int mpqemu_msg_recv(MPQemuLinkState *s, MPQemuMsg *msg, MPQemuChannel *chan)
+{
+int rc;
+uint8_t *data;
+union {
+char control[CMSG_SPACE(REMOTE_MAX_FDS * sizeof(int))];
+struct cmsghdr align;
+} u;
+struct msghdr hdr;
+struct cmsghdr *chdr;
+size_t fdsize;
+int sock = chan->sock;
+QemuMutex *lock = >recv_lock;
+
+struct iovec iov = {
+.iov_base = (char *) msg,
+.iov_len = MPQEMU_MSG_HDR_SIZE,
+};
+
+memset(, 0, sizeof(hdr));
+memset(, 0, sizeof(u));
+
+hdr.msg_iov = 
+hdr.msg_iovlen = 1;
+hdr.msg_control = 
+hdr.msg_controllen = sizeof(u);
+
+qemu_mutex_lock(lock);
+
+do {
+rc = recvmsg(sock, , 0);
+} while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+
+if (rc < 0) {
+qemu_log_mask(LOG_REMOTE_DEBUG, "%s - recvmsg rc is %d, errno is %d,"
+  " sock %d\n", __func__, rc, errno, sock);
+qemu_mutex_unlock(lock);
+return rc;
+}
+
+msg->num_fds = 0;
+for (chdr = CMSG_FIRSTHDR(); chdr != NULL;
+ chdr = CMSG_NXTHDR(, chdr)) {
+if ((chdr->cmsg_level == SOL_SOCKET) &&
+(chdr->cmsg_type == SCM_RIGHTS)) {
+fdsize = chdr->cmsg_len - CMSG_LEN(0);
+msg->num_fds = fdsize / sizeof(int);
+if (msg->num_fds > REMOTE_MAX_FDS) {
+/*
+ * TODO: Security issue detected. Sender never sends more
+ * than REMOTE_MAX_FDS. This condition should be signaled to
+ * the admin
+ */
+qemu_log_mask(LOG_REMOTE_DEBUG, "%s: Max FDs exceeded\n", 
__func__);
+return -ERANGE;
+}
+
+memcpy(msg->fds, CMSG_DATA(chdr), fdsize);
+break;
+}
+}
+
+if (msg->size && msg->bytestream) {
+msg->data2 = calloc(1, msg->size);
+data = msg->data2;
+} else {
+data = (uint8_t *)>data1;
+}
+
+if (msg->size) {
+do {
+rc = read(sock, data, msg->size);
+} while (rc < 0 && (errno == EINTR || errno == EAGAIN));
+}
+
+qemu_mutex_unlock(lock);
+
+return rc;
+}


This code is still insecure.  Until the communication between processes
is made secure this series does not meet its goal of providing process
isolation.

1. An attacker can overflow msg->data1 easily by setting msg->size but
not msg->bytestream.


We will add a check to ensure that msg->size is less than msg->data1 if
msg->bytestream is not set.


2. An attacker can allocate data2, all mpqemu_msg_recv() callers
need to free it to prevent memory leaks.


We will address this memory leak.


3. mpqemu_msg_recv() callers generally do not validate untrusted msg
fields.  All the code needs to be audited.


mpqemu_msg_recv() callers validate the num_fds field. But we will add
more fields for validation by the callers.

Thanks!
--
Jag



Stefan





Re: [SeaBIOS] Re: [PATCH] ahci: zero-initialize port struct

2019-11-13 Thread Philippe Mathieu-Daudé

Hi Sam,

On 11/13/19 4:03 PM, Sam Eiderman wrote:

Hi,

Does this fix a bug that actually happened?

I just noticed that in my lchs patches I assumed that lchs struct is
zeroed out in all devices (not only ahci):

9caa19be0e53 (geometry: Apply LCHS values for boot devices)

Seems like this is not the case but why only ahci is affected?

The list of devices is at least:

 * ata
 * ahci
 * scsi
 * esp
 * lsi
 * megasas
 * mpt
 * pvscsi
 * virtio
 * virtio-blk

As specified in the commit message.

Also Gerd it seems that my lchs patches were not committed in the
latest submitted version (v4)!!!
The ABI of the fw config key is completely broken.


What do you mean? Can you be more specific?




Re: [RFC v4 PATCH 10/49] multi-process: setup a machine object for remote device process

2019-11-13 Thread Stefan Hajnoczi
On Thu, Oct 24, 2019 at 05:08:51AM -0400, Jagannathan Raman wrote:
> +static NotifierList machine_init_done_notifiers =
> +NOTIFIER_LIST_INITIALIZER(machine_init_done_notifiers);
> +
> +bool machine_init_done;
> +
> +void qemu_add_machine_init_done_notifier(Notifier *notify)
> +{
> +notifier_list_add(_init_done_notifiers, notify);
> +if (machine_init_done) {
> +notify->notify(notify, NULL);
> +}
> +}
> +
> +void qemu_remove_machine_init_done_notifier(Notifier *notify)
> +{
> +notifier_remove(notify);
> +}
> +
> +void qemu_run_machine_init_done_notifiers(void)
> +{
> +machine_init_done = true;
> +notifier_list_notify(_init_done_notifiers, NULL);
> +}

qemu_add_machine_init_done_notifier() is already defined in vl.c.
Please share the implementation instead of duplicating it into the
remote program.

> +
> +static void remote_machine_init(Object *obj)
> +{
> +RemMachineState *s = REMOTE_MACHINE(obj);
> +RemPCIHost *rem_host;
> +MemoryRegion *system_memory, *system_io, *pci_memory;
> +
> +Error *error_abort = NULL;
> +
> +qemu_mutex_init(_list.mutex);

Please keep global initialization separate from RemMachineState (e.g. do
it in main() or a function called by main()).  This function should only
initialize RemMachineState.

> +
> +object_property_add_child(object_get_root(), "machine", obj, 
> _abort);
> +if (error_abort) {
> +error_report_err(error_abort);
> +}
> +
> +memory_map_init();

This is global init, please move it elsewhere.

> +
> +system_memory = get_system_memory();
> +system_io = get_system_io();
> +
> +pci_memory = g_new(MemoryRegion, 1);
> +memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
> +
> +rem_host = REMOTE_HOST_DEVICE(qdev_create(NULL, 
> TYPE_REMOTE_HOST_DEVICE));
> +
> +rem_host->mr_pci_mem = pci_memory;
> +rem_host->mr_sys_mem = system_memory;
> +rem_host->mr_sys_io = system_io;
> +
> +s->host = rem_host;

Both s and rem_host are QOM objects.  There should be a child property
relationship between them here.  It will ensure that rem_host is cleaned
up when s is cleaned up.  Please use that instead of a regular C
pointer.


signature.asc
Description: PGP signature


  1   2   3   >