date:20210910

Re: [PATCH v5 1/4] spapr: move NUMA associativity init to machine reset

2021-09-10 Thread David Gibson

On Fri, Sep 10, 2021 at 04:57:14PM -0300, Daniel Henrique Barboza wrote:
65;6402;1c> 
> 
> On 9/7/21 6:23 AM, David Gibson wrote:
> > On Tue, Sep 07, 2021 at 09:10:13AM +0200, Greg Kurz wrote:
> > > On Tue, 7 Sep 2021 10:37:27 +1000
> > > David Gibson  wrote:
> > > 
> > > > On Mon, Sep 06, 2021 at 09:25:24PM -0300, Daniel Henrique Barboza wrote:
> > > > > At this moment we only support one form of NUMA affinity, FORM1. This
> > > > > allows us to init the internal structures during machine_init(), and
> > > > > given that NUMA distances won't change during the guest lifetime we
> > > > > don't need to bother with that again.
> > > > > 
> > > > > We're about to introduce FORM2, a new NUMA affinity mode for pSeries
> > > > > guests. This means that we'll only be certain about the affinity mode
> > > > > being used after client architecture support. This also means that the
> > > > > guest can switch affinity modes in machine reset.
> > > > > 
> > > > > Let's prepare the ground for the FORM2 support by moving the NUMA
> > > > > internal data init from machine_init() to machine_reset(). Change the
> > > > > name to spapr_numa_associativity_reset() to make it clearer that this 
> > > > > is
> > > > > a function that can be called multiple times during the guest 
> > > > > lifecycle.
> > > > > We're also simplifying its current API since this method will be 
> > > > > called
> > > > > during CAS time (do_client_architecture_support()) later on and 
> > > > > there's no
> > > > > MachineState pointer already solved there.
> > > > > 
> > > > > Signed-off-by: Daniel Henrique Barboza 
> > > > 
> > > > Applied to ppc-for-6.2, thanks.
> > > > 
> > > 
> > > Even if already applied :
> > > 
> > > Reviewed-by: Greg Kurz 
> > 
> > Added, thanks.
> 
> 
> I'm afraid this patch was deprecated by the new patch series I just
> posted.

Ok, I've removed the old patch from ppc-for-6.2.

> 
> 
> Thanks,
> 
> 
> Daniel
> 
> > 
> > > > > ---
> > > > >   hw/ppc/spapr.c  | 6 +++---
> > > > >   hw/ppc/spapr_numa.c | 4 ++--
> > > > >   include/hw/ppc/spapr_numa.h | 9 +
> > > > >   3 files changed, 6 insertions(+), 13 deletions(-)
> > > > > 
> > > > > diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
> > > > > index d39fd4e644..8e1ff6cd10 100644
> > > > > --- a/hw/ppc/spapr.c
> > > > > +++ b/hw/ppc/spapr.c
> > > > > @@ -1621,6 +1621,9 @@ static void spapr_machine_reset(MachineState 
> > > > > *machine)
> > > > >*/
> > > > >   spapr_irq_reset(spapr, _fatal);
> > > > > +/* Reset numa_assoc_array */
> > > > > +spapr_numa_associativity_reset(spapr);
> > > > > +
> > > > >   /*
> > > > >* There is no CAS under qtest. Simulate one to please the code 
> > > > > that
> > > > >* depends on spapr->ov5_cas. This is especially needed to test 
> > > > > device
> > > > > @@ -2808,9 +2811,6 @@ static void spapr_machine_init(MachineState 
> > > > > *machine)
> > > > >   spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
> > > > > -/* Init numa_assoc_array */
> > > > > -spapr_numa_associativity_init(spapr, machine);
> > > > > -
> > > > >   if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
> > > > >   ppc_type_check_compat(machine->cpu_type, 
> > > > > CPU_POWERPC_LOGICAL_3_00, 0,
> > > > > spapr->max_compat_pvr)) {
> > > > > diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
> > > > > index 779f18b994..9ee4b479fe 100644
> > > > > --- a/hw/ppc/spapr_numa.c
> > > > > +++ b/hw/ppc/spapr_numa.c
> > > > > @@ -155,10 +155,10 @@ static void 
> > > > > spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
> > > > >   }
> > > > > -void spapr_numa_associativity_init(SpaprMachineState *spapr,
> > > > > -   MachineState *machine)
> > > > > +void spapr_numa_associativity_reset(SpaprMachineState *spapr)
> > > > >   {
> > > > >   SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
> > > > > +MachineState *machine = MACHINE(spapr);
> > > > >   int nb_numa_nodes = machine->numa_state->num_nodes;
> > > > >   int i, j, max_nodes_with_gpus;
> > > > >   bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
> > > > > diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
> > > > > index 6f9f02d3de..0e457bba57 100644
> > > > > --- a/include/hw/ppc/spapr_numa.h
> > > > > +++ b/include/hw/ppc/spapr_numa.h
> > > > > @@ -16,14 +16,7 @@
> > > > >   #include "hw/boards.h"
> > > > >   #include "hw/ppc/spapr.h"
> > > > > -/*
> > > > > - * Having both SpaprMachineState and MachineState as arguments
> > > > > - * feels odd, but it will spare a MACHINE() call inside the
> > > > > - * function. spapr_machine_init() is the only caller for it, and
> > > > > - * it has both pointers resolved already.
> > > > > - */
> > > > > -void spapr_numa_associativity_init(SpaprMachineState *spapr,
> > > > > -   MachineState *machine);
> >

Re: [PATCH] hw/i386/acpi-build: Fix a typo

2021-09-10 Thread Ani Sinha

On Sat, Sep 11, 2021 at 1:03 AM Philippe Mathieu-Daudé 
wrote:

> On 9/10/21 8:54 PM, Volker Rümelin wrote:
> >> Fix 'hotplugabble' -> 'hotpluggabble' typo.
> >
> > I'm convinced that the correct spelling is hotpluggable. Only the
> > consonant g is doubled.
>
> Lol I missed this part, thanks :>

Oops my apologies. I also did not notice the double b.

>
>

Re: [RFC 00/10] hw/mos6522: VIA timer emulation fixes and improvements

2021-09-10 Thread Finn Thain

On Fri, 10 Sep 2021, Mark Cave-Ayland wrote:

> On 01/09/2021 09:06, Mark Cave-Ayland wrote:
> 
> > I'll have a go at some basic timer measurements using your patch to 
> > see what sort of numbers I get for the latency here. Obviously QEMU 
> > doesn't guarantee response times but over 20ms does seem high.
> 
> I was able to spend some time today looking at this and came up with 
> https://github.com/mcayland/qemu/tree/via-hacks with the aim of starting 
> with your patches to reduce the calls to the timer update functions (to 
> reduce jitter) and fixing up the places where mos6522_update_irq() isn't 
> called.
> 

What kind of guest was that? What impact does jitter have on that guest? 
Was the jitter measurement increased, decreased or unchanged by this patch 
series?

> That seemed okay, but the part I'm struggling with at the moment is 
> removing the re-assertion of the timer interrupt if the timer has 
> expired when any of the registers are read i.e.
> 
> diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
> index 3c33cbebde..9884d7e178 100644
> --- a/hw/misc/mos6522.c
> +++ b/hw/misc/mos6522.c
> @@ -229,16 +229,7 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned
> size)
>  {
>  MOS6522State *s = opaque;
>  uint32_t val;
> -int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);
> 
> -if (now >= s->timers[0].next_irq_time) {
> -mos6522_timer1_update(s, >timers[0], now);
> -s->ifr |= T1_INT;
> -}
> -if (now >= s->timers[1].next_irq_time) {
> -mos6522_timer2_update(s, >timers[1], now);
> -s->ifr |= T2_INT;
> -}
>  switch (addr) {
>  case VIA_REG_B:
>  val = s->b;
> 
> If I apply this then I see a hang in about roughly 1 in 10 boots. Poking 
> it with a debugger shows that the VIA1 timer interrupts are constantly 
> being fired as IER and IFR have the timer 1 bit set, but it is being 
> filtered out by SR being set to 0x2700 on the CPU.
> 
> The existing code above isn't correct since T1_INT should only be 
> asserted by the expiry of the timer 1 via mos6522_timer1(), so I'm 
> wondering if this is tickling a CPU bug somewhere? In what circumstances 
> could interrupts be disabled via SR but not enabled again?
> 

The code you're patching here was part of Laurent's commit cd8843ff25 
("mos6522: fix T1 and T2 timers"). You've mentioned that elsewhere in this 
thread. My response is the same as before: this patch series removes that 
code so it's moot.

Please test the patch series I sent, unmodified. If you find a problem 
with my code, please do report it here. I believe that you will see no 
hangs at all.

Re: [PATCH 1/2] vfio/pci: Fix vfio-pci sub-page MMIO BAR mmaping in live migration

2021-09-10 Thread Alex Williamson

On Fri, 10 Sep 2021 16:33:12 +0800
Kunkun Jiang  wrote:

> Hi Alex,
> 
> On 2021/9/9 4:45, Alex Williamson wrote:
> > On Fri, 3 Sep 2021 17:36:10 +0800
> > Kunkun Jiang  wrote:
> >  
> >> We expand MemoryRegions of vfio-pci sub-page MMIO BARs to
> >> vfio_pci_write_config to improve IO performance.
> >> The MemoryRegions of destination VM will not be expanded
> >> successful in live migration, because their addresses have
> >> been updated in vmstate_load_state (vfio_pci_load_config).  
> > What's the call path through vfio_pci_write_config() that you're
> > relying on to get triggered to enable this and why wouldn't we just
> > walk all sub-page BARs in vfio_pci_load_config() to resolve the issue
> > then?  It's my understanding that we do this update in write-config
> > because it's required that the VM sizes the BAR before using it, which
> > is not the case when we resume from migration.  Thanks,  
> Let's take an example:
> 
> AArch64
> host page granularity: 64KB
> PCI device: *Bar2 size 32KB* [mem 0x800020-0x800020 64bit pref]
> 
> When enable Command register bit 1(Memory Space), the code flow is
> as follows:
> 
> vfio_pci_write_config (addr: 4 val: 2 len: 2)
>      // record the old address of each bar, 0x
>      old_addr[bar] = pdev->io_regions[bar].addr;
>      pci_default_write_config
>      pci_update_mappings
>      new_addr = pci_bar_address    // 0x800020
>      r->addr = new_addr;
>      memory_region_addr_subregion_overlap
>      ...
> *vfio_listener_region_add*
>      alignment check of the ram section address and size 
> fail, return
> *kvm_region_add*
>      kvm_set_phys_mem
>      alignment check of the ram section address and 
> size fail, return
> 
>      // old_addr[bar] != pdev->io_regions[bar].addr &&
>      // 0 < vdev->bars[bar].region.size < qemu_real_host_page_size
>      vfio_sub_page_bar_update_mapping
> *bar size = qemu_real_host_page_size*
>      ...
>      vfio_listener_region_add
>      map success
>      kvm_region_add
>      kvm_set_phys_mem
>      map success
> 
> In live migration, only pci config data is sent to the destination VM.
> Therefore, we need to update the bar's size before destination VM
> using it.
> 
> In vfio_pci_load_config, the code flow is as follows:
> 
> vfio_pci_load_config
>      vmstate_load_state
>      *get_pci_config_device*
>      pci_update_mappings
>      ...
>      // bar's addr is updated(0x800020), but bar's size 
> is still 32KB, so map failed
>      vfio_pci_write_config
>      // bar's addr will not be changed, so 
> vfio_sub_page_bar_update_mapping won't be called
> 
> My idea is that removing the check 'old_addr[bar] != 
> pdev->io_regions[bar].addr' doesn't
> affect the previous process. There's also a bar size check. In 
> vfio_sub_page_bar_update_mapping,
> it will check if bar is mapped and page aligned.
> 1) If bar's addr is 0x, it will not pass the 
> vfio_sub_page_bar_update_mapping check.
> 2) If bar's size has been updated, it will not pass the bar size check 
> in vfio_pci_write_config.

The bar size check in vfio_pci_write_config() only tests if the vfio
region is >0 and bars[bar].region.size == qemu_real_host_page_size) once we setup
the sub-page support, I'm not convinced that's true.

So yes, sub-page-update can reject invalid addresses and we already
rely on it to do so, but the code being removed avoids that redundant
writes to the BAR won't trigger redundant MemoryRegion manipulation.
Maybe those are harmless, but that's not your argument for allowing it.

OTOH, why wouldn't vfio_pci_load_config() iterate sub-page BARs and try
to update them at that time?  Thanks,

Alex


> > 
> >> Remove the restriction on base address change in
> >> vfio_pci_write_config for correct mmapping sub-page MMIO
> >> BARs. Accroding to my analysis, the remaining parameter
> >> verification is enough.
> >>
> >> Fixes: c5e2fb3ce4d (vfio: Add save and load functions for VFIO PCI devices)
> >> Reported-by: Nianyao Tang 
> >> Reported-by: Qixin Gan 
> >> Signed-off-by: Kunkun Jiang 
> >> ---
> >>   hw/vfio/pci.c | 8 +---
> >>   1 file changed, 1 insertion(+), 7 deletions(-)
> >>
> >> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> >> index e1ea1d8a23..891b211ddf 100644
> >> --- a/hw/vfio/pci.c
> >> +++ b/hw/vfio/pci.c
> >> @@ -1189,18 +1189,12 @@ void vfio_pci_write_config(PCIDevice *pdev,
> >>   }
> >>   } else if (ranges_overlap(addr, len, PCI_BASE_ADDRESS_0, 24) ||
> >>   range_covers_byte(addr, len, PCI_COMMAND)) {
> >> -pcibus_t old_addr[PCI_NUM_REGIONS - 1];
> >>   int bar;
> >>   
> >> -for (bar = 0; bar < PCI_ROM_SLOT; bar++) {
> >> -old_addr[bar] = pdev->io_regions[bar].addr;
> >> -}
> >> -
> >>

Re: [PULL 00/42] bsd-user updates to run hello world

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/10/21 10:30 PM, i...@bsdimp.com wrote:
> From: Warner Losh 
> 
> The following changes since commit a61c30b8c8c3c8619847cfaa289233cc696f5689:
> 
>   Merge remote-tracking branch 'remotes/mjt/tags/patch-fetch' into staging 
> (2021-09-07 10:15:48 +0100)
> 
> are available in the Git repository at:
> 
>   https://gitlab.com/bsdimp/qemu.git tags/pull-bsd-user-20210910

Well done, chapeau!

> for you to fetch changes up to be04f210f954bed8663943a94ece50c2ca410231:
> 
>   bsd-user: Update mapping to handle reserved and starting conditions 
> (2021-09-10 14:13:06 -0600)
> 
> 
> This series of patches gets me to the point that I can run "Hello World" on 
> i386
> and x86_64. This is for static binaries only, that are relatively small, but
> it's better than the 100% instant mmap failre that is the current state of all
> things bsd-user in upstream qemu. Future patch sets will refine this, add
> the missing system calls, fix bugs preventing more sophisticated programms
> from running and add a bunch of new architecture support.
> 
> There's three large themes in these patches, though the changes that
> represent them are interrelated making it hard to separate out further.
> 1. Reorganization to support multiple OS and architectures (though I've only
>tested FreeBSD, other BSDs might not even compile yet).
> 2. Diff reduction with the bsd-user fork for several files. These diffs 
> include
>changes that borrowed from linux-user as well as changes to make things 
> work
>on FreeBSD. The records keeping when this was done, however, was poor at
>best, so many of the specific borrowings are going unacknowledged here, 
> apart
>from this general ack. These diffs also include some minor code shuffling.
>Some of the changes are done specifically to make it easier to rebase
>the bsd-user fork's changes when these land in the tree (a number of 
> changes
>have been pushed there to make this more possible).
> 3. Filling in the missing pieces to make things work. There's many changes to
>elfload to make it load things in the right places, to find the interpreter
>better, etc. There's changes to mmap.c to make the mappings work better and
>there's changes to main.c that were inspired, at least, by now-ancient 
> changes
>to linux-user's main.c.
> 
> I ran checkpatch.pl on this, and there's 350-odd errors it identifies (the 
> vast
> majoirty come from BSD's fetish for tabs), so there will need to be a V2 to 
> fix
> this at the very least. In addition, the change set is big (about 
> +~4.5k/-~2.5k
> lines), so I anticipate some iteration as well just based on its sheer
> size. I've tried to keep each set small to make it easy to review in 
> isolation,
> but I've also allowed some interrelated ones to get a little bigger than I'd
> normally like. I've not done the customary documentation of the expected
> checkpatch.pl output because it is large, and because I wanted to get review
> of the other parts rolling to get this project unstuck. Future versions of the
> patch will document the expected output.
> 
> In addition, I noticed a number of places where I could modernize to make the
> code match things like linux-user better. I've resisted the urge to do these 
> at
> this time, since it would complicate merging the other ~30k lines of diff that
> remains after this batch. Future batches should generally be smaller once this
> one has landed since they are, by and large, either a bunch of new files to
> support armv7, aarch64, riscv64, mips, mipsel, mips64, ppc, ppc64 and ppc64le,
> or are adding system calls, which can be done individually or small groups. 
> I've
> removed sparc and sparc64 support as they've been removed from FreeBSD and
> have been near totally busted for years.
> 
> Stacey Son did the bulk of this work originally, but since I had to move 
> things
> around so much and/or retool that work in non-trivial ways, I've kept myself 
> as
> author, and added his signed-off-by line. I'm unsure of the qemu standard
> practice for this, but am happy to learn if this is too far outside its 
> current
> mainstream. For a while Sean Bruno did the merges from upstream, and he's
> credited using his signed-off-by in appropriate places, though for this patch
> set there's only a few. I've tried to ensure that others who have work in
> individual patches that I've aggregated together also are reflected in their
> signed-off-by. Given the chaotic stat of the upstream repo for its early
> history, this may be the best that can be reconstructed at this late date. 
> Most
> of these files are 'foundational' so have existed from the e

Re: simple serial device emulation

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/10/21 9:35 PM, Hinko Kocevar wrote:
> I have an emulated MMIO area holding couple of registers that deal with
> serial UART. Very simple access to the Tx and Rx registers from the
> userspace point of view involves polling for a bit in one register and
> then writing another; when there is room for another character. When the
> guest app does write to a MMIO Tx register, as expected, io_writex() is
> invoked and my handler is invoked. At the moment it does not do much.
> I'm thinking now that the character needs to be fed to the serial device
> instance or something.
> 
> Where should I look for suitable examples in the qemu code? I reckon
> that other machines exist that do the similar. I found lots of
> serial_mm_init() and sysbus_mmio_map() uses around serial port instances
> but I'm not sure how to couple my "serial ops" to the "bus" or SerialMM
> (if that is the way to go).

Your device is a "character device frontend". See the API in
include/chardev/char-fe.h. Frontends can be connected to various
backends. The simplest backend is the standard input/output
(named 'stdio').

To be useful your frontend have to implement some handlers:
IOEventHandler, IOCanReadHandler, IOReadHandler. The frontend
register these handlers by calling qemu_chr_fe_set_handlers()
(usually in the DeviceRealize() handler).

The backends will interact with your device via this API.

I recommend you to look at the hw/char/digic-uart.c model which is
quite simple, it returns the last char received, and only transmit
one char per I/O.

Then for a more complete (and up to date) model you can look at
hw/char/goldfish_tty.c, it uses a FIFO to receive chars, but still
transmit one char at a time.

Finally the hw/char/serial.c is probably the most complete models,
with 2 FIFOs (RX & TX) and try to respect timings.

Regards,

Phil.

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Paolo Bonzini

Il ven 10 set 2021, 22:21 Sean Christopherson  ha
scritto:

> > It's also possible that QEMU handles failure, but the kernel does two
> > passes; then QEMU can just do two passes.  The kernel will overall do
> four
> > passes, but:
> >
> > 1) the second (SECS pinned by children in the same vEPC) would be cheaper
> > than a full second pass
>
> The problem is that this would require a list_head (or temp allocations)
> to track
> the SECS pages that failed the first time 'round.  For vEPC destruction,
> the kernel
> can use sgx_epc_page.list because it can take the pages off the
> active/allocated
> list, but that's not an option in this case because the
> presumably-upcoming EPC
> cgroup needs to keep pages on the list to handle OOM.
>

Good point, so yeah: let's go for a ioctl that does full removal, returning
the number of failures. I will try and cobble up a patch unless Kai beats
me to it.

Thanks for the quick discussion!

Paolo


> The kernel's ioctl/syscall/whatever could return the number of pages that
> were
> not freed, or maybe just -EAGAIN, and userspace could use that to know it
> needs
> to do another reset to free everything.
>
> My thought for QEMU was to do (bad pseudocode):
>
> /* Retry to EREMOVE pinned SECS pages if necessary. */
> ret = ioctl(SGX_VEPC_RESET, ...);
> if (ret)
> ret = ioctl(SGX_VEPC_RESET, ...);
>
> /*
>  * Tag the VM as needed yet another round of resets to ERMOVE SECS
> pages
>  * that were pinned across vEPC sections.
>  */
> vm->sgx_epc_final_reset_needed = !!ret;
>
> > 2) the fourth would actually do nothing, because there would be no pages
> > failing the EREMOV'al.
> >
> > A hypothetical other SGX client that only uses one vEPC will do the right
> > thing with a single pass.
>
>

Re: [PULL 00/42] bsd-user updates to run hello world

2021-09-10 Thread Warner Losh

On Thu, Sep 9, 2021 at 10:31 AM Philippe Mathieu-Daudé 
wrote:

> On 9/9/21 5:12 PM, Warner Losh wrote:
> >
> >
> > On Thu, Sep 9, 2021, 9:01 AM Peter Maydell  > > wrote:
> >
> > On Tue, 7 Sept 2021 at 22:56,  > > wrote:
> > >
> > > From: Warner Losh mailto:i...@bsdimp.com>>
> > >
> > > The following changes since commit
> > f214d8e0150766c31172e16ef4b17674f549d852:
> > >
> > >   Merge remote-tracking branch
> > 'remotes/pmaydell/tags/pull-target-arm-20210826' into staging
> > (2021-08-26 18:03:57 +0100)
> > >
> > > are available in the Git repository at:
> > >
> > >   https://gitlab.com/bsdimp/qemu.git
> >  tags/bsd-user-pull-20210907-tag
> > >
> > > for you to fetch changes up to
> > dc96376e46a52ac63a27ea185c3f0a6fd54e3c82:
> > >
> > >   bsd-user: Update mapping to handle reserved and starting
> > conditions (2021-09-07 08:26:53 -0600)
> >
> > >  slirp |2 +-
> >
> > A bogus submodule update seems to have crept in here...
> >
> >
> > So I need to fix this and resubmit?
>
> Yes, but since there is no change in most of the commits, you don't
> need to repost the whole, once pushed the new tag, you can just
> post the the cover letter (which triggers Peter's scripts) and the
> fixed "[PULL 07/42] bsd-user: move arch specific defines out of
> elfload.c" which updated the submodule.
>

Done. Thanks!

Warner

Re: [PULL 00/42] bsd-user updates to run hello world

2021-09-10 Thread Warner Losh

On Thu, Sep 9, 2021 at 10:29 AM Philippe Mathieu-Daudé 
wrote:

> On 9/7/21 11:52 PM, i...@bsdimp.com wrote:
> > From: Warner Losh 
> >
> > The following changes since commit
> f214d8e0150766c31172e16ef4b17674f549d852:
> >
> >   Merge remote-tracking branch
> 'remotes/pmaydell/tags/pull-target-arm-20210826' into staging (2021-08-26
> 18:03:57 +0100)
> >
> > are available in the Git repository at:
> >
> >   https://gitlab.com/bsdimp/qemu.git tags/bsd-user-pull-20210907-tag
> >
> > for you to fetch changes up to dc96376e46a52ac63a27ea185c3f0a6fd54e3c82:
> >
> >   bsd-user: Update mapping to handle reserved and starting conditions
> (2021-09-07 08:26:53 -0600)
> >
> > 
> >
> > This series of patches gets me to the point that I can run "Hello World"
> on i386
> > and x86_64. This is for static binaries only, that are relatively small,
> but
> > it's better than the 100% instant mmap failre that is the current state
> of all
> > things bsd-user in upstream qemu. Future patch sets will refine this, add
> > the missing system calls, fix bugs preventing more sophisticated
> programms
> > from running and add a bunch of new architecture support.
> >
> > There's three large themes in these patches, though the changes that
> > represent them are interrelated making it hard to separate out further.
> > 1. Reorganization to support multiple OS and architectures (though I've
> only
> >tested FreeBSD, other BSDs might not even compile yet).
> > 2. Diff reduction with the bsd-user fork for several files. These diffs
> include
> >changes that borrowed from linux-user as well as changes to make
> things work
> >on FreeBSD. The records keeping when this was done, however, was poor
> at
> >best, so many of the specific borrowings are going unacknowledged
> here, apart
> >from this general ack. These diffs also include some minor code
> shuffling.
> >Some of the changes are done specifically to make it easier to rebase
> >the bsd-user fork's changes when these land in the tree (a number of
> changes
> >have been pushed there to make this more possible).
> > 3. Filling in the missing pieces to make things work. There's many
> changes to
> >elfload to make it load things in the right places, to find the
> interpreter
> >better, etc. There's changes to mmap.c to make the mappings work
> better and
> >there's changes to main.c that were inspired, at least, by
> now-ancient changes
> >to linux-user's main.c.
> >
> > I ran checkpatch.pl on this, and there's 350-odd errors it identifies
> (the vast
> > majoirty come from BSD's fetish for tabs), so there will need to be a V2
> to fix
> > this at the very least. In addition, the change set is big (about
> +~4.5k/-~2.5k
> > lines), so I anticipate some iteration as well just based on its sheer
> > size. I've tried to keep each set small to make it easy to review in
> isolation,
> > but I've also allowed some interrelated ones to get a little bigger than
> I'd
> > normally like. I've not done the customary documentation of the expected
> > checkpatch.pl output because it is large, and because I wanted to get
> review
> > of the other parts rolling to get this project unstuck. Future versions
> of the
> > patch will document the expected output.
> >
> > In addition, I noticed a number of places where I could modernize to
> make the
> > code match things like linux-user better. I've resisted the urge to do
> these at
> > this time, since it would complicate merging the other ~30k lines of
> diff that
> > remains after this batch. Future batches should generally be smaller
> once this
> > one has landed since they are, by and large, either a bunch of new files
> to
> > support armv7, aarch64, riscv64, mips, mipsel, mips64, ppc, ppc64 and
> ppc64le,
> > or are adding system calls, which can be done individually or small
> groups. I've
> > removed sparc and sparc64 support as they've been removed from FreeBSD
> and
> > have been near totally busted for years.
> >
> > Stacey Son did the bulk of this work originally, but since I had to move
> things
> > around so much and/or retool that work in non-trivial ways, I've kept
> myself as
> > author, and added his signed-off-by line. I'm unsure of the qemu standard
> > practice for this, but am happy to learn if this is too far outside its
> current
> > mainstream. For a while Sean Bruno did the merges from upstream, and he's
> > credited using his signed-off-by in appropriate places, though for this
> patch
> > set there's only a few. I've tried to ensure that others who have work in
> > individual patches that I've aggregated together also are reflected in
> their
> > signed-off-by. Given the chaotic stat of the upstream repo for its early
> > history, this may be the best that can be reconstructed at this late
> date. Most
> > of these files are 'foundational' so have existed from the earliest days
> when
> > record keeping wasn't

[PULL 00/42] bsd-user updates to run hello world

2021-09-10 Thread imp

From: Warner Losh 

The following changes since commit a61c30b8c8c3c8619847cfaa289233cc696f5689:

  Merge remote-tracking branch 'remotes/mjt/tags/patch-fetch' into staging 
(2021-09-07 10:15:48 +0100)

are available in the Git repository at:

  https://gitlab.com/bsdimp/qemu.git tags/pull-bsd-user-20210910

for you to fetch changes up to be04f210f954bed8663943a94ece50c2ca410231:

  bsd-user: Update mapping to handle reserved and starting conditions 
(2021-09-10 14:13:06 -0600)


This series of patches gets me to the point that I can run "Hello World" on i386
and x86_64. This is for static binaries only, that are relatively small, but
it's better than the 100% instant mmap failre that is the current state of all
things bsd-user in upstream qemu. Future patch sets will refine this, add
the missing system calls, fix bugs preventing more sophisticated programms
from running and add a bunch of new architecture support.

There's three large themes in these patches, though the changes that
represent them are interrelated making it hard to separate out further.
1. Reorganization to support multiple OS and architectures (though I've only
   tested FreeBSD, other BSDs might not even compile yet).
2. Diff reduction with the bsd-user fork for several files. These diffs include
   changes that borrowed from linux-user as well as changes to make things work
   on FreeBSD. The records keeping when this was done, however, was poor at
   best, so many of the specific borrowings are going unacknowledged here, apart
   from this general ack. These diffs also include some minor code shuffling.
   Some of the changes are done specifically to make it easier to rebase
   the bsd-user fork's changes when these land in the tree (a number of changes
   have been pushed there to make this more possible).
3. Filling in the missing pieces to make things work. There's many changes to
   elfload to make it load things in the right places, to find the interpreter
   better, etc. There's changes to mmap.c to make the mappings work better and
   there's changes to main.c that were inspired, at least, by now-ancient 
changes
   to linux-user's main.c.

I ran checkpatch.pl on this, and there's 350-odd errors it identifies (the vast
majoirty come from BSD's fetish for tabs), so there will need to be a V2 to fix
this at the very least. In addition, the change set is big (about +~4.5k/-~2.5k
lines), so I anticipate some iteration as well just based on its sheer
size. I've tried to keep each set small to make it easy to review in isolation,
but I've also allowed some interrelated ones to get a little bigger than I'd
normally like. I've not done the customary documentation of the expected
checkpatch.pl output because it is large, and because I wanted to get review
of the other parts rolling to get this project unstuck. Future versions of the
patch will document the expected output.

In addition, I noticed a number of places where I could modernize to make the
code match things like linux-user better. I've resisted the urge to do these at
this time, since it would complicate merging the other ~30k lines of diff that
remains after this batch. Future batches should generally be smaller once this
one has landed since they are, by and large, either a bunch of new files to
support armv7, aarch64, riscv64, mips, mipsel, mips64, ppc, ppc64 and ppc64le,
or are adding system calls, which can be done individually or small groups. I've
removed sparc and sparc64 support as they've been removed from FreeBSD and
have been near totally busted for years.

Stacey Son did the bulk of this work originally, but since I had to move things
around so much and/or retool that work in non-trivial ways, I've kept myself as
author, and added his signed-off-by line. I'm unsure of the qemu standard
practice for this, but am happy to learn if this is too far outside its current
mainstream. For a while Sean Bruno did the merges from upstream, and he's
credited using his signed-off-by in appropriate places, though for this patch
set there's only a few. I've tried to ensure that others who have work in
individual patches that I've aggregated together also are reflected in their
signed-off-by. Given the chaotic stat of the upstream repo for its early
history, this may be the best that can be reconstructed at this late date. Most
of these files are 'foundational' so have existed from the earliest days when
record keeping wasn't quite what I'd wish for in hindsight. There was only
really one change that I could easily cherry-pick (Colin's), so I did that.


Colin Percival (1):
  bsd-user: Add '-0 argv0' option to bsd-user/main.c

Warner Losh (41):
  bsd-user: remove sparc and sparc64
  bsd-user: add copyright header to elfload.c
  bsd-user: Add Stacey's copyright to main.c
  bsd-user: add license to bsdload.c
  bsd-user: style nits: bsdload.c

[PULL 07/42] bsd-user: move arch specific defines out of elfload.c

2021-09-10 Thread imp

From: Warner Losh 

Move the architecture specific defines to target_arch_elf.h and delete
them from elfload.c. Only retain ifdefs appropriate for i386 and x86_64.
Add the copyright/license comments, and guard ifdefs.

Signed-off-by: Warner Losh 
Reviewed-by: Richard Henderson 
---
 bsd-user/elfload.c| 81 +--
 bsd-user/i386/target_arch_elf.h   | 81 +++
 bsd-user/x86_64/target_arch_elf.h | 67 +
 3 files changed, 150 insertions(+), 79 deletions(-)
 create mode 100644 bsd-user/i386/target_arch_elf.h
 create mode 100644 bsd-user/x86_64/target_arch_elf.h

diff --git a/bsd-user/elfload.c b/bsd-user/elfload.c
index fffa24f041..639673f5b7 100644
--- a/bsd-user/elfload.c
+++ b/bsd-user/elfload.c
@@ -23,6 +23,8 @@
 #include "disas/disas.h"
 #include "qemu/path.h"
 
+#include "target_arch_elf.h"
+
 /* from personality.h */
 
 /*
@@ -93,85 +95,6 @@ enum {
 #define ELIBBAD 80
 #endif
 
-#ifdef TARGET_I386
-
-#define ELF_PLATFORM get_elf_platform()
-
-static const char *get_elf_platform(void)
-{
-static char elf_platform[] = "i386";
-int family = object_property_get_int(OBJECT(thread_cpu), "family", NULL);
-if (family > 6)
-family = 6;
-if (family >= 3)
-elf_platform[1] = '0' + family;
-return elf_platform;
-}
-
-#define ELF_HWCAP get_elf_hwcap()
-
-static uint32_t get_elf_hwcap(void)
-{
-X86CPU *cpu = X86_CPU(thread_cpu);
-
-return cpu->env.features[FEAT_1_EDX];
-}
-
-#ifdef TARGET_X86_64
-#define ELF_START_MMAP 0x2ab000ULL
-#define elf_check_arch(x) (((x) == ELF_ARCH))
-
-#define ELF_CLASS  ELFCLASS64
-#define ELF_DATA   ELFDATA2LSB
-#define ELF_ARCH   EM_X86_64
-
-static inline void init_thread(struct target_pt_regs *regs, struct image_info 
*infop)
-{
-regs->rax = 0;
-regs->rsp = infop->start_stack;
-regs->rip = infop->entry;
-if (bsd_type == target_freebsd) {
-regs->rdi = infop->start_stack;
-}
-}
-
-#else /* !TARGET_X86_64 */
-
-#define ELF_START_MMAP 0x8000
-
-/*
- * This is used to ensure we don't load something for the wrong architecture.
- */
-#define elf_check_arch(x) (((x) == EM_386) || ((x) == EM_486))
-
-/*
- * These are used to set parameters in the core dumps.
- */
-#define ELF_CLASS   ELFCLASS32
-#define ELF_DATAELFDATA2LSB
-#define ELF_ARCHEM_386
-
-static inline void init_thread(struct target_pt_regs *regs, struct image_info 
*infop)
-{
-regs->esp = infop->start_stack;
-regs->eip = infop->entry;
-
-/* SVR4/i386 ABI (pages 3-31, 3-32) says that when the program
-   starts %edx contains a pointer to a function which might be
-   registered using `atexit'.  This provides a mean for the
-   dynamic linker to call DT_FINI functions for shared libraries
-   that have been loaded before the code runs.
-
-   A value of 0 tells we have no such handler.  */
-regs->edx = 0;
-}
-#endif /* !TARGET_X86_64 */
-
-#define USE_ELF_CORE_DUMP
-#define ELF_EXEC_PAGESIZE   4096
-
-#endif
-
 #ifndef ELF_PLATFORM
 #define ELF_PLATFORM (NULL)
 #endif
diff --git a/bsd-user/i386/target_arch_elf.h b/bsd-user/i386/target_arch_elf.h
new file mode 100644
index 00..84f61bd930
--- /dev/null
+++ b/bsd-user/i386/target_arch_elf.h
@@ -0,0 +1,81 @@
+/*
+ *  i386 ELF definitions
+ *
+ *  Copyright (c) 2013 Stacey D. Son
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+#ifndef _TARGET_ARCH_ELF_H_
+#define _TARGET_ARCH_ELF_H_
+
+#define ELF_PLATFORM get_elf_platform()
+
+static const char *get_elf_platform(void)
+{
+static char elf_platform[] = "i386";
+int family = object_property_get_int(OBJECT(thread_cpu), "family", NULL);
+if (family > 6) {
+family = 6;
+}
+if (family >= 3) {
+elf_platform[1] = '0' + family;
+}
+return elf_platform;
+}
+
+#define ELF_HWCAP get_elf_hwcap()
+
+static uint32_t get_elf_hwcap(void)
+{
+X86CPU *cpu = X86_CPU(thread_cpu);
+
+return cpu->env.features[FEAT_1_EDX];
+}
+
+#define ELF_START_MMAP 0x8000
+
+/*
+ * This is used to ensure we don't load something for the wrong architecture.
+ */
+#define elf_check_arch(x) (((x) == EM_386) || ((x) == EM_486))
+
+/*
+ * These are used to set parameters in the core dumps.
+ */
+#define ELF_CLASS   ELFCLASS32
+#define ELF_DATAELFDATA2LSB

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Sean Christopherson

On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 10/09/21 19:34, Sean Christopherson wrote:
> > On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> > > On 10/09/21 17:34, Sean Christopherson wrote:
> > > > The only other option that comes to mind is a dedicated ioctl().
> > > 
> > > If it is not too restrictive to do it for the whole mapping at once,
> > > that would be fine.
> > 
> > Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple 
> > EPC
> > sections, e.g. for a vNUMA setup, resetting each section individually will 
> > fail
> > if the guest did an unclean RESET and a given enclaves has EPC pages from 
> > multiple
> > sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1], and 
> > all
> > those children need to be removed before the SECS can be removed.  Yay SGX!
> > 
> > There are two options: 1) QEMU has to handle "failure", or 2) the kernel 
> > provides
> > an ioctl() that takes multiple vEPC fds and handles the SECS dependencies.  
> > #1 is
> > probably the least awful option.  For #2, in addition to the kernel having 
> > to deal
> > with multiple fds, it would also need a second list_head object in each 
> > page so
> > that it could track which pages failed to be removed.  Using the existing 
> > list_head
> > would work for now, but it won't work if/when an EPC cgroup is added.
> > 
> > Note, for #1, QEMU would have to potentially do three passes.
> > 
> >1. Remove child pages for a given vEPC.
> >2. Remove SECS for a given vEPC that were pinned by children in the same 
> > vEPC.
> >3. Remove SECS for all vEPC that were pinned by children in different 
> > vEPC.
> 
> It's also possible that QEMU handles failure, but the kernel does two
> passes; then QEMU can just do two passes.  The kernel will overall do four
> passes, but:
> 
> 1) the second (SECS pinned by children in the same vEPC) would be cheaper
> than a full second pass

The problem is that this would require a list_head (or temp allocations) to 
track
the SECS pages that failed the first time 'round.  For vEPC destruction, the 
kernel
can use sgx_epc_page.list because it can take the pages off the active/allocated
list, but that's not an option in this case because the presumably-upcoming EPC
cgroup needs to keep pages on the list to handle OOM.

The kernel's ioctl/syscall/whatever could return the number of pages that were
not freed, or maybe just -EAGAIN, and userspace could use that to know it needs
to do another reset to free everything.

My thought for QEMU was to do (bad pseudocode):

/* Retry to EREMOVE pinned SECS pages if necessary. */
ret = ioctl(SGX_VEPC_RESET, ...);
if (ret)
ret = ioctl(SGX_VEPC_RESET, ...);

/*
 * Tag the VM as needed yet another round of resets to ERMOVE SECS pages
 * that were pinned across vEPC sections.
 */
vm->sgx_epc_final_reset_needed = !!ret;

> 2) the fourth would actually do nothing, because there would be no pages
> failing the EREMOV'al.
> 
> A hypothetical other SGX client that only uses one vEPC will do the right
> thing with a single pass.

Re: [PATCH v5 1/4] spapr: move NUMA associativity init to machine reset

2021-09-10 Thread Daniel Henrique Barboza





On 9/7/21 6:23 AM, David Gibson wrote:

On Tue, Sep 07, 2021 at 09:10:13AM +0200, Greg Kurz wrote:

On Tue, 7 Sep 2021 10:37:27 +1000
David Gibson  wrote:


On Mon, Sep 06, 2021 at 09:25:24PM -0300, Daniel Henrique Barboza wrote:

At this moment we only support one form of NUMA affinity, FORM1. This
allows us to init the internal structures during machine_init(), and
given that NUMA distances won't change during the guest lifetime we
don't need to bother with that again.

We're about to introduce FORM2, a new NUMA affinity mode for pSeries
guests. This means that we'll only be certain about the affinity mode
being used after client architecture support. This also means that the
guest can switch affinity modes in machine reset.

Let's prepare the ground for the FORM2 support by moving the NUMA
internal data init from machine_init() to machine_reset(). Change the
name to spapr_numa_associativity_reset() to make it clearer that this is
a function that can be called multiple times during the guest lifecycle.
We're also simplifying its current API since this method will be called
during CAS time (do_client_architecture_support()) later on and there's no
MachineState pointer already solved there.

Signed-off-by: Daniel Henrique Barboza 


Applied to ppc-for-6.2, thanks.



Even if already applied :

Reviewed-by: Greg Kurz 


Added, thanks.



I'm afraid this patch was deprecated by the new patch series I just posted.


Thanks,


Daniel




---
  hw/ppc/spapr.c  | 6 +++---
  hw/ppc/spapr_numa.c | 4 ++--
  include/hw/ppc/spapr_numa.h | 9 +
  3 files changed, 6 insertions(+), 13 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d39fd4e644..8e1ff6cd10 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1621,6 +1621,9 @@ static void spapr_machine_reset(MachineState *machine)
   */
  spapr_irq_reset(spapr, _fatal);
  
+/* Reset numa_assoc_array */

+spapr_numa_associativity_reset(spapr);
+
  /*
   * There is no CAS under qtest. Simulate one to please the code that
   * depends on spapr->ov5_cas. This is especially needed to test device
@@ -2808,9 +2811,6 @@ static void spapr_machine_init(MachineState *machine)
  
  spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
  
-/* Init numa_assoc_array */

-spapr_numa_associativity_init(spapr, machine);
-
  if ((!kvm_enabled() || kvmppc_has_cap_mmu_radix()) &&
  ppc_type_check_compat(machine->cpu_type, CPU_POWERPC_LOGICAL_3_00, 0,
spapr->max_compat_pvr)) {
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 779f18b994..9ee4b479fe 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -155,10 +155,10 @@ static void 
spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
  
  }
  
-void spapr_numa_associativity_init(SpaprMachineState *spapr,

-   MachineState *machine)
+void spapr_numa_associativity_reset(SpaprMachineState *spapr)
  {
  SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
+MachineState *machine = MACHINE(spapr);
  int nb_numa_nodes = machine->numa_state->num_nodes;
  int i, j, max_nodes_with_gpus;
  bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
diff --git a/include/hw/ppc/spapr_numa.h b/include/hw/ppc/spapr_numa.h
index 6f9f02d3de..0e457bba57 100644
--- a/include/hw/ppc/spapr_numa.h
+++ b/include/hw/ppc/spapr_numa.h
@@ -16,14 +16,7 @@
  #include "hw/boards.h"
  #include "hw/ppc/spapr.h"
  
-/*

- * Having both SpaprMachineState and MachineState as arguments
- * feels odd, but it will spare a MACHINE() call inside the
- * function. spapr_machine_init() is the only caller for it, and
- * it has both pointers resolved already.
- */
-void spapr_numa_associativity_init(SpaprMachineState *spapr,
-   MachineState *machine);


Nice additional cleanup to the signature, thanks.


+void spapr_numa_associativity_reset(SpaprMachineState *spapr);
  void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas);
  void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
 int offset, int nodeid);

[PATCH v6 6/6] spapr_numa.c: FORM2 NUMA affinity support

2021-09-10 Thread Daniel Henrique Barboza

The main feature of FORM2 affinity support is the separation of NUMA
distances from ibm,associativity information. This allows for a more
flexible and straightforward NUMA distance assignment without relying on
complex associations between several levels of NUMA via
ibm,associativity matches. Another feature is its extensibility. This base
support contains the facilities for NUMA distance assignment, but in the
future more facilities will be added for latency, performance, bandwidth
and so on.

This patch implements the base FORM2 affinity support as follows:

- the use of FORM2 associativity is indicated by using bit 2 of byte 5
of ibm,architecture-vec-5. A FORM2 aware guest can choose to use FORM1
or FORM2 affinity. Setting both forms will default to FORM2. We're not
advertising FORM2 for pseries-6.1 and older machine versions to prevent
guest visible changes in those;

- ibm,associativity-reference-points has a new semantic. Instead of
being used to calculate distances via NUMA levels, it's now used to
indicate the primary domain index in the ibm,associativity domain of
each resource. In our case it's set to {0x4}, matching the position
where we already place logical_domain_id;

- two new RTAS DT artifacts are introduced: ibm,numa-lookup-index-table
and ibm,numa-distance-table. The index table is used to list all the
NUMA logical domains of the platform, in ascending order, and allows for
spartial NUMA configurations (although QEMU ATM doesn't support that).
ibm,numa-distance-table is an array that contains all the distances from
the first NUMA node to all other nodes, then the second NUMA node
distances to all other nodes and so on;

- get_max_dist_ref_points() and get_numa_assoc_size() now checks for
OV5_FORM2_AFFINITY and returns FORM2 values if the guest selected FORM2
affinity during CAS.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c  |   8 ++
 hw/ppc/spapr_numa.c | 151 +++-
 include/hw/ppc/spapr.h  |   2 +
 include/hw/ppc/spapr_ovec.h |   1 +
 4 files changed, 159 insertions(+), 3 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 0703a26093..23aba5ae2d 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -2766,6 +2766,11 @@ static void spapr_machine_init(MachineState *machine)
 
 spapr_ovec_set(spapr->ov5, OV5_FORM1_AFFINITY);
 
+/* Do not advertise FORM2 NUMA support for pseries-6.1 and older */
+if (!smc->pre_6_2_numa_affinity) {
+spapr_ovec_set(spapr->ov5, OV5_FORM2_AFFINITY);
+}
+
 /* advertise support for dedicated HP event source to guests */
 if (spapr->use_hotplug_event_source) {
 spapr_ovec_set(spapr->ov5, OV5_HP_EVT);
@@ -4681,8 +4686,11 @@ DEFINE_SPAPR_MACHINE(6_2, "6.2", true);
  */
 static void spapr_machine_6_1_class_options(MachineClass *mc)
 {
+SpaprMachineClass *smc = SPAPR_MACHINE_CLASS(mc);
+
 spapr_machine_6_2_class_options(mc);
 compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
+smc->pre_6_2_numa_affinity = true;
 }
 
 DEFINE_SPAPR_MACHINE(6_1, "6.1", false);
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 0ade63c2d3..dd52b8921c 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -31,12 +31,22 @@
 #define FORM1_DIST_REF_POINTS4
 #define FORM1_NUMA_ASSOC_SIZE(FORM1_DIST_REF_POINTS + 1)
 
+/*
+ * FORM2 NUMA affinity has a single associativity domain, giving
+ * us a assoc size of 2.
+ */
+#define FORM2_DIST_REF_POINTS1
+#define FORM2_NUMA_ASSOC_SIZE(FORM2_DIST_REF_POINTS + 1)
+
 /*
  * Retrieves max_dist_ref_points of the current NUMA affinity.
  */
 static int get_max_dist_ref_points(SpaprMachineState *spapr)
 {
-/* No FORM2 affinity implemented yet */
+if (spapr_ovec_test(spapr->ov5_cas, OV5_FORM2_AFFINITY)) {
+return FORM2_DIST_REF_POINTS;
+}
+
 return FORM1_DIST_REF_POINTS;
 }
 
@@ -45,7 +55,10 @@ static int get_max_dist_ref_points(SpaprMachineState *spapr)
  */
 static int get_numa_assoc_size(SpaprMachineState *spapr)
 {
-/* No FORM2 affinity implemented yet */
+if (spapr_ovec_test(spapr->ov5_cas, OV5_FORM2_AFFINITY)) {
+return FORM2_NUMA_ASSOC_SIZE;
+}
+
 return FORM1_NUMA_ASSOC_SIZE;
 }
 
@@ -305,10 +318,39 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 spapr_numa_define_FORM1_domains(spapr);
 }
 
+/*
+ * Init NUMA FORM2 machine state data
+ */
+static void spapr_numa_FORM2_affinity_init(SpaprMachineState *spapr)
+{
+int i;
+
+/*
+ * For all resources but CPUs, FORM2 associativity arrays will
+ * be a size 2 array with the following format:
+ *
+ * ibm,associativity = {1, numa_id}
+ *
+ * CPUs will write an additional 'vcpu_id' on top of the arrays
+ * being initialized here. 'numa_id' is represented by the
+ * index 'i' of the loop.
+ *
+ * Given that this initialization is also valid for GPU associativity
+ *

[PATCH v6 3/6] spapr: introduce spapr_numa_associativity_reset()

2021-09-10 Thread Daniel Henrique Barboza

Introducing a new NUMA affinity, FORM2, requires a new mechanism to
switch between affinity modes after CAS. Also, we want FORM2 data
structures and functions to be completely separated from the existing
FORM1 code, allowing us to avoid adding new code that inherits the
existing complexity of FORM1.

At the same time, it's also desirable to minimize the amount of changes
made in write_dt() functions that are used to write ibm,associativity of
the resources, RTAS artifacts and h_home_node_associativity. These
functions can work in the same way in both NUMA affinity modes, as long
as we use a similar data structure and parametrize it properly depending
on the affinity mode selected.

This patch introduces spapr_numa_associativity_reset() to start this
process. This function will be used to switch to the chosen NUMA
affinity after CAS and after migrating the guest. To do that, the
existing 'numa_assoc_array' is renamed to 'FORM1_assoc_array' and will
hold FORM1 data that is populated at associativity_init().
'numa_assoc_array' is now a pointer that can be switched between the
existing affinity arrays. We don't have FORM2 data structures yet, so
'numa_assoc_array' will always point to 'FORM1_assoc_array'.

We also take the precaution of pointing 'numa_assoc_array' to
'FORM1_assoc_array' in associativity_init() time, before CAS, to not
change FORM1 availability for existing guests.

A small change in spapr_numa_write_associativity_dt() is made to reflect
the fact that 'numa_assoc_array' is now a pointer and we must be
explicit with the size being written in the DT.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c  | 14 +
 hw/ppc/spapr_hcall.c|  7 +++
 hw/ppc/spapr_numa.c | 42 +
 include/hw/ppc/spapr.h  |  3 ++-
 include/hw/ppc/spapr_numa.h |  1 +
 5 files changed, 57 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index d39fd4e644..5afbb76cab 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1786,6 +1786,20 @@ static int spapr_post_load(void *opaque, int version_id)
 return err;
 }
 
+/*
+ * NUMA affinity selection is made in CAS time. There is no reliable
+ * way of telling whether the guest already went through CAS before
+ * migration due to how spapr_ov5_cas_needed works: a FORM1 guest can
+ * be migrated with ov5_cas empty regardless of going through CAS
+ * first.
+ *
+ * One solution is to call numa_associativity_reset(). The downside
+ * is that a guest migrated before CAS will reset it again when going
+ * through it, but since it's a lightweight operation it's worth being
+ * a little redundant to be safe.
+ */
+ spapr_numa_associativity_reset(spapr);
+
 return err;
 }
 
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 0e9a5b2e40..82ab92ddba 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -17,6 +17,7 @@
 #include "kvm_ppc.h"
 #include "hw/ppc/fdt.h"
 #include "hw/ppc/spapr_ovec.h"
+#include "hw/ppc/spapr_numa.h"
 #include "mmu-book3s-v3.h"
 #include "hw/mem/memory-device.h"
 
@@ -1197,6 +1198,12 @@ target_ulong do_client_architecture_support(PowerPCCPU 
*cpu,
 spapr->cas_pre_isa3_guest = !spapr_ovec_test(ov1_guest, OV1_PPC_3_00);
 spapr_ovec_cleanup(ov1_guest);
 
+/*
+ * Reset numa_assoc_array now that we know which NUMA affinity
+ * the guest will use.
+ */
+spapr_numa_associativity_reset(spapr);
+
 /*
  * Ensure the guest asks for an interrupt mode we support;
  * otherwise terminate the boot.
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index fb6059550f..327952ba9e 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -97,7 +97,7 @@ static void spapr_numa_define_FORM1_domains(SpaprMachineState 
*spapr)
  */
 for (i = 1; i < nb_numa_nodes; i++) {
 for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
-spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
+spapr->FORM1_assoc_array[i][j] = cpu_to_be32(i);
 }
 }
 
@@ -149,8 +149,8 @@ static void 
spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
  * and going up to 0x1.
  */
 for (i = n_level; i > 0; i--) {
-assoc_src = spapr->numa_assoc_array[src][i];
-spapr->numa_assoc_array[dst][i] = assoc_src;
+assoc_src = spapr->FORM1_assoc_array[src][i];
+spapr->FORM1_assoc_array[dst][i] = assoc_src;
 }
 }
 }
@@ -167,6 +167,11 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 int nb_numa_nodes = machine->numa_state->num_nodes;
 int i, j, max_nodes_with_gpus;
 
+/* init FORM1_assoc_array */
+for (i = 0; i < MAX_NODES + NVGPU_MAX_NUM; i++) {
+spapr->FORM1_assoc_array[i] = g_new0(uint32_t, NUMA_ASSOC_SIZE);
+}
+
 /*
  * For all associativity arrays: first

[PATCH v6 5/6] spapr: move FORM1 verifications to post CAS

2021-09-10 Thread Daniel Henrique Barboza

FORM2 NUMA affinity is prepared to deal with empty (memory/cpu less)
NUMA nodes. This is used by the DAX KMEM driver to locate a PAPR SCM
device that has a different latency than the original NUMA node from the
regular memory. FORM2 is also enable to deal with asymmetric NUMA
distances gracefully, something that our FORM1 implementation doesn't
do.

Move these FORM1 verifications to a new function and wait until after
CAS, when we're sure that we're sticking with FORM1, to enforce them.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr.c  | 35 +--
 hw/ppc/spapr_hcall.c|  2 +-
 hw/ppc/spapr_numa.c | 55 -
 include/hw/ppc/spapr_numa.h |  3 +-
 4 files changed, 52 insertions(+), 43 deletions(-)

diff --git a/hw/ppc/spapr.c b/hw/ppc/spapr.c
index 5afbb76cab..0703a26093 100644
--- a/hw/ppc/spapr.c
+++ b/hw/ppc/spapr.c
@@ -1798,7 +1798,7 @@ static int spapr_post_load(void *opaque, int version_id)
  * through it, but since it's a lightweight operation it's worth being
  * a little redundant to be safe.
  */
- spapr_numa_associativity_reset(spapr);
+ spapr_numa_associativity_reset(spapr, false);
 
 return err;
 }
@@ -2787,39 +2787,6 @@ static void spapr_machine_init(MachineState *machine)
 /* init CPUs */
 spapr_init_cpus(spapr);
 
-/*
- * check we don't have a memory-less/cpu-less NUMA node
- * Firmware relies on the existing memory/cpu topology to provide the
- * NUMA topology to the kernel.
- * And the linux kernel needs to know the NUMA topology at start
- * to be able to hotplug CPUs later.
- */
-if (machine->numa_state->num_nodes) {
-for (i = 0; i < machine->numa_state->num_nodes; ++i) {
-/* check for memory-less node */
-if (machine->numa_state->nodes[i].node_mem == 0) {
-CPUState *cs;
-int found = 0;
-/* check for cpu-less node */
-CPU_FOREACH(cs) {
-PowerPCCPU *cpu = POWERPC_CPU(cs);
-if (cpu->node_id == i) {
-found = 1;
-break;
-}
-}
-/* memory-less and cpu-less node */
-if (!found) {
-error_report(
-   "Memory-less/cpu-less nodes are not supported (node 
%d)",
- i);
-exit(1);
-}
-}
-}
-
-}
-
 spapr->gpu_numa_id = spapr_numa_initial_nvgpu_numa_id(machine);
 
 /* Init numa_assoc_array */
diff --git a/hw/ppc/spapr_hcall.c b/hw/ppc/spapr_hcall.c
index 82ab92ddba..2dc22e2dc7 100644
--- a/hw/ppc/spapr_hcall.c
+++ b/hw/ppc/spapr_hcall.c
@@ -1202,7 +1202,7 @@ target_ulong do_client_architecture_support(PowerPCCPU 
*cpu,
  * Reset numa_assoc_array now that we know which NUMA affinity
  * the guest will use.
  */
-spapr_numa_associativity_reset(spapr);
+spapr_numa_associativity_reset(spapr, true);
 
 /*
  * Ensure the guest asks for an interrupt mode we support;
diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 7ad4b6582b..0ade63c2d3 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -198,6 +198,48 @@ static void 
spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
 
 }
 
+static void spapr_numa_FORM1_affinity_check(MachineState *machine)
+{
+int i;
+
+/*
+ * Check we don't have a memory-less/cpu-less NUMA node
+ * Firmware relies on the existing memory/cpu topology to provide the
+ * NUMA topology to the kernel.
+ * And the linux kernel needs to know the NUMA topology at start
+ * to be able to hotplug CPUs later.
+ */
+if (machine->numa_state->num_nodes) {
+for (i = 0; i < machine->numa_state->num_nodes; ++i) {
+/* check for memory-less node */
+if (machine->numa_state->nodes[i].node_mem == 0) {
+CPUState *cs;
+int found = 0;
+/* check for cpu-less node */
+CPU_FOREACH(cs) {
+PowerPCCPU *cpu = POWERPC_CPU(cs);
+if (cpu->node_id == i) {
+found = 1;
+break;
+}
+}
+/* memory-less and cpu-less node */
+if (!found) {
+error_report(
+"Memory-less/cpu-less nodes are not supported with FORM1 NUMA (node %d)", i);
+exit(EXIT_FAILURE);
+}
+}
+}
+}
+
+if (!spapr_numa_is_symmetrical(machine)) {
+error_report(
+"Asymmetrical NUMA topologies aren't supported in the pSeries machine using 
FORM1 NUMA");
+exit(EXIT_FAILURE);
+}
+}
+
 /*
  * Set NUMA machine state data based on FORM1 affinity semantics.
  */
@@ -260,12 +302,6 @@ static void

[PATCH v6 4/6] spapr_numa.c: parametrize FORM1 macros

2021-09-10 Thread Daniel Henrique Barboza

The next preliminary step to introduce NUMA FORM2 affinity is to make
the existing code independent of FORM1 macros and values, i.e.
MAX_DISTANCE_REF_POINTS, NUMA_ASSOC_SIZE and VCPU_ASSOC_SIZE. This patch
accomplishes that by doing the following:

- move the NUMA related macros from spapr.h to spapr_numa.c where they
are used. spapr.h gets instead a 'NUMA_NODES_MAX_NUM' macro that is used
to refer to the maximum number of NUMA nodes, including GPU nodes, that
the machine can support;

- MAX_DISTANCE_REF_POINTS and NUMA_ASSOC_SIZE are renamed to
FORM1_DIST_REF_POINTS and FORM1_NUMA_ASSOC_SIZE. These FORM1 specific
macros are used in FORM1 init functions;

- code that uses MAX_DISTANCE_REF_POINTS now retrieves the
max_dist_ref_points value using get_max_dist_ref_points().
NUMA_ASSOC_SIZE is replaced by get_numa_assoc_size() and VCPU_ASSOC_SIZE
is replaced by get_vcpu_assoc_size(). These functions are used by the
generic device tree functions and h_home_node_associativity() and will
allow them to switch between FORM1 and FORM2 without changing their core
logic.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_numa.c| 93 +++---
 include/hw/ppc/spapr.h | 22 +++---
 2 files changed, 74 insertions(+), 41 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 327952ba9e..7ad4b6582b 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -19,6 +19,47 @@
 /* Moved from hw/ppc/spapr_pci_nvlink2.c */
 #define SPAPR_GPU_NUMA_ID   (cpu_to_be32(1))
 
+/*
+ * NUMA FORM1 macros. FORM1_DIST_REF_POINTS was taken from
+ * MAX_DISTANCE_REF_POINTS in arch/powerpc/mm/numa.h from Linux
+ * kernel source. It represents the amount of associativity domains
+ * for non-CPU resources.
+ *
+ * FORM1_NUMA_ASSOC_SIZE is the base array size of an ibm,associativity
+ * array for any non-CPU resource.
+ */
+#define FORM1_DIST_REF_POINTS4
+#define FORM1_NUMA_ASSOC_SIZE(FORM1_DIST_REF_POINTS + 1)
+
+/*
+ * Retrieves max_dist_ref_points of the current NUMA affinity.
+ */
+static int get_max_dist_ref_points(SpaprMachineState *spapr)
+{
+/* No FORM2 affinity implemented yet */
+return FORM1_DIST_REF_POINTS;
+}
+
+/*
+ * Retrieves numa_assoc_size of the current NUMA affinity.
+ */
+static int get_numa_assoc_size(SpaprMachineState *spapr)
+{
+/* No FORM2 affinity implemented yet */
+return FORM1_NUMA_ASSOC_SIZE;
+}
+
+/*
+ * Retrieves vcpu_assoc_size of the current NUMA affinity.
+ *
+ * vcpu_assoc_size is the size of ibm,associativity array
+ * for CPUs, which has an extra element (vcpu_id) in the end.
+ */
+static int get_vcpu_assoc_size(SpaprMachineState *spapr)
+{
+return get_numa_assoc_size(spapr) + 1;
+}
+
 static bool spapr_numa_is_symmetrical(MachineState *ms)
 {
 int src, dst;
@@ -96,7 +137,7 @@ static void 
spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
  * considered a match with associativity domains of node 0.
  */
 for (i = 1; i < nb_numa_nodes; i++) {
-for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
+for (j = 1; j < FORM1_DIST_REF_POINTS; j++) {
 spapr->FORM1_assoc_array[i][j] = cpu_to_be32(i);
 }
 }
@@ -134,7 +175,7 @@ static void 
spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
  *
  * The Linux kernel will assume that the distance between src and
  * dst, in this case of no match, is 10 (local distance) doubled
- * for each NUMA it didn't match. We have MAX_DISTANCE_REF_POINTS
+ * for each NUMA it didn't match. We have FORM1_DIST_REF_POINTS
  * levels (4), so this gives us 10*2*2*2*2 = 160.
  *
  * This logic can be seen in the Linux kernel source code, as of
@@ -168,13 +209,13 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 int i, j, max_nodes_with_gpus;
 
 /* init FORM1_assoc_array */
-for (i = 0; i < MAX_NODES + NVGPU_MAX_NUM; i++) {
-spapr->FORM1_assoc_array[i] = g_new0(uint32_t, NUMA_ASSOC_SIZE);
+for (i = 0; i < NUMA_NODES_MAX_NUM; i++) {
+spapr->FORM1_assoc_array[i] = g_new0(uint32_t, FORM1_NUMA_ASSOC_SIZE);
 }
 
 /*
  * For all associativity arrays: first position is the size,
- * position MAX_DISTANCE_REF_POINTS is always the numa_id,
+ * position FORM1_DIST_REF_POINTS is always the numa_id,
  * represented by the index 'i'.
  *
  * This will break on sparse NUMA setups, when/if QEMU starts
@@ -182,8 +223,8 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
  * 'i' will be a valid node_id set by the user.
  */
 for (i = 0; i < nb_numa_nodes; i++) {
-spapr->FORM1_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
-spapr->FORM1_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
+spapr->FORM1_assoc_array[i][0] = cpu_to_be32(FORM1_DIST_REF_POINTS);
+

[PATCH v6 2/6] spapr_numa.c: scrap 'legacy_numa' concept

2021-09-10 Thread Daniel Henrique Barboza

When first introduced, 'legacy_numa' was a way to refer to guests that
either wouldn't be affected by associativity domain calculations, namely
the ones with only 1 NUMA node, and pre 5.2 guests that shouldn't be
affected by it because it would be an userspace change. Calling these
cases 'legacy_numa' was a convenient way to label these cases.

We're about to introduce a new NUMA affinity, FORM2, and this concept
of 'legacy_numa' is now a bit misleading because, although it is called
'legacy' it is in fact a FORM1 exclusive contraint.

This patch removes spapr_machine_using_legacy_numa() and open code the
conditions in each caller. While we're at it, move the chunk inside
spapr_numa_FORM1_affinity_init() that sets all numa_assoc_array domains
with 'node_id' to spapr_numa_define_FORM1_domains(). This chunk was
being executed if !pre_5_2_numa_associativity and num_nodes => 1, the
same conditions in which spapr_numa_define_FORM1_domains() is called
shortly after.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_numa.c | 46 +++--
 1 file changed, 19 insertions(+), 27 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 786def7c73..fb6059550f 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -19,15 +19,6 @@
 /* Moved from hw/ppc/spapr_pci_nvlink2.c */
 #define SPAPR_GPU_NUMA_ID   (cpu_to_be32(1))
 
-static bool spapr_machine_using_legacy_numa(SpaprMachineState *spapr)
-{
-MachineState *machine = MACHINE(spapr);
-SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(machine);
-
-return smc->pre_5_2_numa_associativity ||
-   machine->numa_state->num_nodes <= 1;
-}
-
 static bool spapr_numa_is_symmetrical(MachineState *ms)
 {
 int src, dst;
@@ -97,7 +88,18 @@ static void 
spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
 MachineState *ms = MACHINE(spapr);
 NodeInfo *numa_info = ms->numa_state->nodes;
 int nb_numa_nodes = ms->numa_state->num_nodes;
-int src, dst, i;
+int src, dst, i, j;
+
+/*
+ * Fill all associativity domains of non-zero NUMA nodes with
+ * node_id. This is required because the default value (0) is
+ * considered a match with associativity domains of node 0.
+ */
+for (i = 1; i < nb_numa_nodes; i++) {
+for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
+spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
+}
+}
 
 for (src = 0; src < nb_numa_nodes; src++) {
 for (dst = src; dst < nb_numa_nodes; dst++) {
@@ -164,7 +166,6 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 int nb_numa_nodes = machine->numa_state->num_nodes;
 int i, j, max_nodes_with_gpus;
-bool using_legacy_numa = spapr_machine_using_legacy_numa(spapr);
 
 /*
  * For all associativity arrays: first position is the size,
@@ -178,17 +179,6 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 for (i = 0; i < nb_numa_nodes; i++) {
 spapr->numa_assoc_array[i][0] = cpu_to_be32(MAX_DISTANCE_REF_POINTS);
 spapr->numa_assoc_array[i][MAX_DISTANCE_REF_POINTS] = cpu_to_be32(i);
-
-/*
- * Fill all associativity domains of non-zero NUMA nodes with
- * node_id. This is required because the default value (0) is
- * considered a match with associativity domains of node 0.
- */
-if (!using_legacy_numa && i != 0) {
-for (j = 1; j < MAX_DISTANCE_REF_POINTS; j++) {
-spapr->numa_assoc_array[i][j] = cpu_to_be32(i);
-}
-}
 }
 
 /*
@@ -214,11 +204,13 @@ static void 
spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
 }
 
 /*
- * Legacy NUMA guests (pseries-5.1 and older, or guests with only
- * 1 NUMA node) will not benefit from anything we're going to do
- * after this point.
+ * Guests pseries-5.1 and older uses zeroed associativity domains,
+ * i.e. no domain definition based on NUMA distance input.
+ *
+ * Same thing with guests that have only one NUMA node.
  */
-if (using_legacy_numa) {
+if (smc->pre_5_2_numa_associativity ||
+machine->numa_state->num_nodes <= 1) {
 return;
 }
 
@@ -334,7 +326,7 @@ static void 
spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
 cpu_to_be32(maxdomain)
 };
 
-if (spapr_machine_using_legacy_numa(spapr)) {
+if (smc->pre_5_2_numa_associativity) {
 uint32_t legacy_refpoints[] = {
 cpu_to_be32(0x4),
 cpu_to_be32(0x4),
-- 
2.31.1

[PATCH v6 1/6] spapr_numa.c: split FORM1 code into helpers

2021-09-10 Thread Daniel Henrique Barboza

The upcoming FORM2 NUMA affinity will support asymmetric NUMA topologies
and doesn't need be concerned with all the legacy support for older
pseries FORM1 guests.

We're also not going to calculate associativity domains based on numa
distance (via spapr_numa_define_associativity_domains) since the
distances will be written directly into new DT properties.

Let's split FORM1 code into its own functions to allow for easier
insertion of FORM2 logic later on.

Signed-off-by: Daniel Henrique Barboza 
---
 hw/ppc/spapr_numa.c | 35 +--
 1 file changed, 25 insertions(+), 10 deletions(-)

diff --git a/hw/ppc/spapr_numa.c b/hw/ppc/spapr_numa.c
index 779f18b994..786def7c73 100644
--- a/hw/ppc/spapr_numa.c
+++ b/hw/ppc/spapr_numa.c
@@ -92,7 +92,7 @@ static uint8_t spapr_numa_get_numa_level(uint8_t distance)
 return 0;
 }
 
-static void spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
+static void spapr_numa_define_FORM1_domains(SpaprMachineState *spapr)
 {
 MachineState *ms = MACHINE(spapr);
 NodeInfo *numa_info = ms->numa_state->nodes;
@@ -155,8 +155,11 @@ static void 
spapr_numa_define_associativity_domains(SpaprMachineState *spapr)
 
 }
 
-void spapr_numa_associativity_init(SpaprMachineState *spapr,
-   MachineState *machine)
+/*
+ * Set NUMA machine state data based on FORM1 affinity semantics.
+ */
+static void spapr_numa_FORM1_affinity_init(SpaprMachineState *spapr,
+   MachineState *machine)
 {
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
 int nb_numa_nodes = machine->numa_state->num_nodes;
@@ -225,7 +228,13 @@ void spapr_numa_associativity_init(SpaprMachineState 
*spapr,
 exit(EXIT_FAILURE);
 }
 
-spapr_numa_define_associativity_domains(spapr);
+spapr_numa_define_FORM1_domains(spapr);
+}
+
+void spapr_numa_associativity_init(SpaprMachineState *spapr,
+   MachineState *machine)
+{
+spapr_numa_FORM1_affinity_init(spapr, machine);
 }
 
 void spapr_numa_write_associativity_dt(SpaprMachineState *spapr, void *fdt,
@@ -302,12 +311,8 @@ int spapr_numa_write_assoc_lookup_arrays(SpaprMachineState 
*spapr, void *fdt,
 return ret;
 }
 
-/*
- * Helper that writes ibm,associativity-reference-points and
- * max-associativity-domains in the RTAS pointed by @rtas
- * in the DT @fdt.
- */
-void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+static void spapr_numa_FORM1_write_rtas_dt(SpaprMachineState *spapr,
+   void *fdt, int rtas)
 {
 MachineState *ms = MACHINE(spapr);
 SpaprMachineClass *smc = SPAPR_MACHINE_GET_CLASS(spapr);
@@ -365,6 +370,16 @@ void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, 
void *fdt, int rtas)
  maxdomains, sizeof(maxdomains)));
 }
 
+/*
+ * Helper that writes ibm,associativity-reference-points and
+ * max-associativity-domains in the RTAS pointed by @rtas
+ * in the DT @fdt.
+ */
+void spapr_numa_write_rtas_dt(SpaprMachineState *spapr, void *fdt, int rtas)
+{
+spapr_numa_FORM1_write_rtas_dt(spapr, fdt, rtas);
+}
+
 static target_ulong h_home_node_associativity(PowerPCCPU *cpu,
   SpaprMachineState *spapr,
   target_ulong opcode,
-- 
2.31.1

[PATCH v6 0/6] pSeries FORM2 affinity support

2021-09-10 Thread Daniel Henrique Barboza

Hi,

In this version there was significant design changes after the
v5 review. Only patches 1 and 5 were present in the last version.

changes from v5:
- patch order was changed to make all the preliminary work without
adding FORM2 code;
- FORM1 and FORM2 data now co-exists. Both are being initialized in
spapr_numa_associativity_init() in two static arrays called
'FORM1_assoc_array' and 'FORM2_assoc_array'. 'numa_assoc_array' is now a
pointer that toggles between those 2;
- spapr_numa_associativity_reset() switches the NUMA affinity data to be
used. It is not a replace for associativity_init() as it was in v5;
- 'legacy_numa' concept was removed;
- FORM2 affinity init() is now completely separated from FORM1;
- FORM2 ibm,associativity array only contains size and numa_id for non-CPU
resources, and an extra vcpu_id for CPUs;
- FORM2 reference-points is { 1 };
- FORM2 maxdomain has size = 2;
- several other changes to accomodate the new design of having to deal with
2 different data structures, while minimizing changes in the write_dt()
functions 


Daniel Henrique Barboza (6):
  spapr_numa.c: split FORM1 code into helpers
  spapr_numa.c: scrap 'legacy_numa' concept
  spapr: introduce spapr_numa_associativity_reset()
  spapr_numa.c: parametrize FORM1 macros
  spapr: move FORM1 verifications to post CAS
  spapr_numa.c: FORM2 NUMA affinity support

 hw/ppc/spapr.c  |  55 +++---
 hw/ppc/spapr_hcall.c|   7 +
 hw/ppc/spapr_numa.c | 382 ++--
 include/hw/ppc/spapr.h  |  25 +--
 include/hw/ppc/spapr_numa.h |   2 +
 include/hw/ppc/spapr_ovec.h |   1 +
 6 files changed, 362 insertions(+), 110 deletions(-)

-- 
2.31.1

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Paolo Bonzini


On 10/09/21 19:34, Sean Christopherson wrote:

On Fri, Sep 10, 2021, Paolo Bonzini wrote:

On 10/09/21 17:34, Sean Christopherson wrote:

The only other option that comes to mind is a dedicated ioctl().


If it is not too restrictive to do it for the whole mapping at once,
that would be fine.


Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple EPC
sections, e.g. for a vNUMA setup, resetting each section individually will fail
if the guest did an unclean RESET and a given enclaves has EPC pages from 
multiple
sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1], and all
those children need to be removed before the SECS can be removed.  Yay SGX!

There are two options: 1) QEMU has to handle "failure", or 2) the kernel 
provides
an ioctl() that takes multiple vEPC fds and handles the SECS dependencies.  #1 
is
probably the least awful option.  For #2, in addition to the kernel having to 
deal
with multiple fds, it would also need a second list_head object in each page so
that it could track which pages failed to be removed.  Using the existing 
list_head
would work for now, but it won't work if/when an EPC cgroup is added.

Note, for #1, QEMU would have to potentially do three passes.

   1. Remove child pages for a given vEPC.
   2. Remove SECS for a given vEPC that were pinned by children in the same 
vEPC.
   3. Remove SECS for all vEPC that were pinned by children in different vEPC.


It's also possible that QEMU handles failure, but the kernel does two 
passes; then QEMU can just do two passes.  The kernel will overall do 
four passes, but:


1) the second (SECS pinned by children in the same vEPC) would be 
cheaper than a full second pass


2) the fourth would actually do nothing, because there would be no pages 
failing the EREMOV'al.


A hypothetical other SGX client that only uses one vEPC will do the 
right thing with a single pass.


Paolo

simple serial device emulation

2021-09-10 Thread Hinko Kocevar

I have an emulated MMIO area holding couple of registers that deal with
serial UART. Very simple access to the Tx and Rx registers from the
userspace point of view involves polling for a bit in one register and then
writing another; when there is room for another character. When the guest
app does write to a MMIO Tx register, as expected, io_writex() is invoked
and my handler is invoked. At the moment it does not do much. I'm thinking
now that the character needs to be fed to the serial device instance or
something.

Where should I look for suitable examples in the qemu code? I reckon that
other machines exist that do the similar. I found lots of serial_mm_init()
and sysbus_mmio_map() uses around serial port instances but I'm not sure
how to couple my "serial ops" to the "bus" or SerialMM (if that is the way
to go).

Thanks!
//hinko

-- 
.. the more I see the less I believe.., AE AoR

Re: [PATCH] hw/i386/acpi-build: Fix a typo

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/10/21 8:54 PM, Volker Rümelin wrote:
>> Fix 'hotplugabble' -> 'hotpluggabble' typo.
> 
> I'm convinced that the correct spelling is hotpluggable. Only the
> consonant g is doubled.

Lol I missed this part, thanks :>

Re: Adding IO memory region to mipssim

2021-09-10 Thread Hinko Kocevar

Got it Phil, thank you very much! I need to educate myself on the subject
of TLB and MMU for mips.
//hinko

On Fri, Sep 10, 2021 at 5:30 PM Philippe Mathieu-Daudé 
wrote:

> On 9/10/21 3:21 PM, Hinko Kocevar wrote:
> > I'm trying to add an I/O memory region to mipssim machine to emulate a
> > MMIO region used by the u-boot loaded as BIOS image. I can confirm that
> > the machine starts and loads the BIOS, starts execution but hangs due to
> > unhandled IO access as described below.
> >
> > The region should be at 0xB881, of size 0x1.
> >
> > I've added these lines of code to mispsim.c mips_mipssim_init():
> >
> > my_state *s = g_malloc0(sizeof(my_state));
> > memory_region_init_io(>mmio, NULL, _ops, s,
> >  "mips_mipssim.foo", 0x1);
> > memory_region_add_subregion(address_space_mem, 0xB881LL,
> >mmio);
>
> You need to map your device at its physical address, not the virtual
> one.
>
> > All goes well, the machine starts, and I can see the newly added region
> > in qemu monitor info mtree output like so:
> >
> > b881-b881 (prio 0, i/o): mips_mipssim.foo
> >
> > With some tracing enabled I see this error:
> >
> >  Invalid access at addr 0x18810104, size 4, region '(null)', reason:
> > rejected
> >
> > I know the u-boot is making request to 0xB8810104 and not 0x18810104. I
>
> U-boot accessed the virtual address which is resolved into the physical
> one (where your device should be mapped).
>
> > also can see 0xB8810104 address being handed to io_writex(), but
> > mr_offset becomes 0x18810104 here:
> >
> >   mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> >
> > What is going on?
> >
> > FWIW, I can place my emulated memory region at 0x18810104, but would
> > like to understand the behavior above.
>
> Yes, this is the correct address to map it.
>
> Maybe this helps:
> https://training.mips.com/basic_mips/PDF/Memory_Map.pdf
>
> Regards,
>
> Phil.
>


-- 
.. the more I see the less I believe.., AE AoR

Re: Implementing isochronous transfers in hw/hcd-ohci.c

2021-09-10 Thread Programmingkid




> On Sep 10, 2021, at 7:51 AM, BALATON Zoltan  wrote:
> 
> On Fri, 10 Sep 2021, Howard Spoelstra wrote:
>> On Fri, Sep 10, 2021 at 7:07 AM Gerd Hoffmann  wrote:
>> 
>>> On Thu, Sep 09, 2021 at 05:06:17PM -0400, Programmingkid wrote:
 Hi Gerd,
 
 Howard and I were talking about USB audio problems with Mac OS guests.
>>> We think the issue might be with frames being sent to the USB audio card
>>> too soon. My guess is only one frame is suppose to be transmitted every 1
>>> millisecond. I was also reading the todo notes in the file hw/hcd-ohci.c.
>>> This is what it says:
 
 * TODO:
 *  o Isochronous transfers
 *  o Allocate bandwidth in frames properly
 *  o Disable timers when nothing needs to be done, or remove timer usage
 *all together.
 *  o BIOS work to boot from USB storage
 */
 
 Do you think implementing isochronous transfers would fix the audio
>>> problems Mac OS guest are experiencing?
>>> 
>>> Most likely yes, audio devices typically use iso endpints.
>>> 
>>> take care,
>>>  Gerd
>>> 
>> 
>> Hi,
>> 
>> Below I pasted the first lines mentioning isochronous traffic from a pcap
>> file when running fedora12 with the usb-audio device and the first lines
>> from a pcap file running Mac OS 9.2 with the usb-audio device
>> 
>> Fedora:
>> 91 56.715001 host 0.5.1 USB 256 URB_ISOCHRONOUS out
>> 92 56.715018 0.5.1 host USB 64 URB_ISOCHRONOUS out
>> 
>> MacOS:
>> 143 56.031989 host 0.16.1 USB 256 URB_ISOCHRONOUS out
>> 144 56.032026 0.16.1 host USB 64 URB_ISOCHRONOUS out
>> 
>> The usb-audio device works for the fedora guest, so would this not indicate
>> that the iso endpoints are already working?
>> 
>> The usb-audio device also work (for a limited amount of time) when running
>> MacOS. Looking at USB logging in the Mac OS guest, to me it seems the MacOS
>> side runs into timing issues when packages drift too far apart. It then
>> finally gives up trying to keep the stream open.
> 
> I was also trying to find why the usb-audio device does not work with MorphOS 
> but I could not figure it out. Now I have two machines (mac99 and pegasos2) 
> that can boot MorphOS but usb-audio does not work with either so maybe it's 
> not because of the USB controller. I've found there is a debug property that 
> enables some logging: -device usb-audio,debug=1 but that did not reveal much 
> more. It looks like MorphOS tries to query the device but replies come very 
> slow (not sure if that's normal or a problem) then just gives up after a 
> while. Maybe you can try comparing what Fedora and other OSes query as it may 
> be we're missing some info in USB descriptors that other drivers than Linux's 
> rely on but that's just a guess I haven't tested with Linux guest on pegasos2 
> or mac99 yet.
> 
> Regards,
> BALATON Zoltan

Thank for the info everyone. When I try to use the USB-Audio device in Mac OS 
10.4.11, the operating system doesn't use it. It does show up in the System 
Profiler application. In Mac OS 9.2 the sound from it is perfect sounding, but 
the operating system crashes after less than a minute. In Mac OS 10.2.3 the 
operating system does set it as its output device but it does not work.

To find out what is wrong we would probably have to build the USB audio drivers 
in Mac OS X and enable debug output to see what is wrong. They are open source 
and I have done this in the past. As for Mac OS 9, I'm not sure how to debug 
its driver.

Re: [PATCH v2 0/3] The HMP/QMP interfaces in Qemu SGX

2021-09-10 Thread Paolo Bonzini

Il ven 10 set 2021, 16:22 Daniel P. Berrangé  ha
scritto:

> > Queued 1-2, thanks.
>
> I had just posted a bunch of comments on patch 1 ...
>

Sorry, I had already queued it a few hours earlier and just noticed I
hadn't sent out the message.

I have some updates to the main series too, so I might just pick up your
suggestions and send out everything next week or on Sunday.

Paolo


>
> Regards,
> Daniel
> --
> |: https://berrange.com  -o-
> https://www.flickr.com/photos/dberrange :|
> |: https://libvirt.org -o-
> https://fstop138.berrange.com :|
> |: https://entangle-photo.org-o-
> https://www.instagram.com/dberrange :|
>
>

Re: [PATCH] hw/i386/acpi-build: Fix a typo

2021-09-10 Thread Volker Rümelin


Fix 'hotplugabble' -> 'hotpluggabble' typo.


I'm convinced that the correct spelling is hotpluggable. Only the 
consonant g is doubled.


With best regards
Volker


Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/i386/acpi-build.c | 6 +++---
  1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index d1f5fa3b5a5..478263e12c9 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -1916,7 +1916,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
  PCMachineState *pcms = PC_MACHINE(machine);
  int nb_numa_nodes = machine->numa_state->num_nodes;
  NodeInfo *numa_info = machine->numa_state->nodes;
-ram_addr_t hotplugabble_address_space_size =
+ram_addr_t hotpluggabble_address_space_size =
  object_property_get_int(OBJECT(pcms), PC_MACHINE_DEVMEM_REGION_SIZE,
  NULL);
  
@@ -2022,10 +2022,10 @@ build_srat(GArray *table_data, BIOSLinker *linker, MachineState *machine)

   * Memory devices may override proximity set by this entry,
   * providing _PXM method if necessary.
   */
-if (hotplugabble_address_space_size) {
+if (hotpluggabble_address_space_size) {
  numamem = acpi_data_push(table_data, sizeof *numamem);
  build_srat_memory(numamem, machine->device_memory->base,
-  hotplugabble_address_space_size, nb_numa_nodes - 1,
+  hotpluggabble_address_space_size, nb_numa_nodes - 1,
MEM_AFFINITY_HOTPLUGGABLE | MEM_AFFINITY_ENABLED);
  }

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Sean Christopherson

On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 10/09/21 17:34, Sean Christopherson wrote:
> > > Yang explained to me (offlist) that this is needed because Windows fails 
> > > to
> > > reboot without it.  We would need a way to ask Linux to reinitialize the
> > > vEPC, that doesn't involve munmap/mmap; this could be for example
> > > fallocate(FALLOC_FL_ZERO_RANGE).
> > > 
> > > What do you all think?
> > 
> > Mechanically, FALLOC_FL_ZERO_RANGE could work, but it's definitely a 
> > stretch of
> > the semantics as the underlying memory would not actually be zeroed.
> 
> The contents are not visible to anyone, so they might well be zero
> (not entirely serious, but also not entirely unserious).

Yeah, it wouldn't be a sticking point, just odd.

> > The only other option that comes to mind is a dedicated ioctl().
> 
> If it is not too restrictive to do it for the whole mapping at once,
> that would be fine.

Oooh, rats.  That reminds me of a complication.  If QEMU creates multiple EPC
sections, e.g. for a vNUMA setup, resetting each section individually will fail
if the guest did an unclean RESET and a given enclaves has EPC pages from 
multiple
sections.  E.g. an SECS in vEPC[X] can have children in vEPC[0..N-1], and all
those children need to be removed before the SECS can be removed.  Yay SGX!

> If it makes sense to do it for a range, however,
> the effort of defining a ioctl() API is probably not worth it when
> fallocate() is available.
> 
> Either way, I suppose it would be just something like
> 
>   /* for fallocate; otherwise just use xa_for_each */
>   size = min_t(unsigned long, size, -start);
>   end = (start + size - 1) >> PAGE_SHIFT;
>   start >>= PAGE_SHIFT;
> 
>   /*
>* Removing in two passes lets us remove SECS pages as well,
>* since they can only be EREMOVE'd after all their child pages.
>* An EREMOVE failure on the second pass means that the SECS
>* page still has children on another instance.  Since it is
>* still in the xarray, bury our head in the sand and ignore
>* it; sgx_vepc_release() will deal with it.
>*/
>   LIST_HEAD(secs_pages);
> xa_for_each_range(>page_array, index, entry, start, end) {
> if (!sgx_vepc_free_page(entry))
> list_add_tail(_page->list, _pages);
> }
> 
> list_for_each_entry_safe(epc_page, tmp, _pages, list) {
>   list_del(_page->list);
> sgx_vepc_free_page(epc_page);
> }
> So another advantage of the ioctl would be e.g. being able to return the
> number of successfully EREMOVEd pages.  But since QEMU would not do
> anything with the return value and _also_ bury its head in the sand,
> that would only be useful if you guys have other uses in mind.

There are two options: 1) QEMU has to handle "failure", or 2) the kernel 
provides
an ioctl() that takes multiple vEPC fds and handles the SECS dependencies.  #1 
is
probably the least awful option.  For #2, in addition to the kernel having to 
deal
with multiple fds, it would also need a second list_head object in each page so
that it could track which pages failed to be removed.  Using the existing 
list_head
would work for now, but it won't work if/when an EPC cgroup is added.

Note, for #1, QEMU would have to potentially do three passes.

  1. Remove child pages for a given vEPC.
  2. Remove SECS for a given vEPC that were pinned by children in the same vEPC.
  3. Remove SECS for all vEPC that were pinned by children in different vEPC.

Re: [RFC 00/10] hw/mos6522: VIA timer emulation fixes and improvements

2021-09-10 Thread Mark Cave-Ayland


On 01/09/2021 09:06, Mark Cave-Ayland wrote:

I'll have a go at some basic timer measurements using your patch to see what sort of 
numbers I get for the latency here. Obviously QEMU doesn't guarantee response times 
but over 20ms does seem high.


I was able to spend some time today looking at this and came up with 
https://github.com/mcayland/qemu/tree/via-hacks with the aim of starting with your 
patches to reduce the calls to the timer update functions (to reduce jitter) and 
fixing up the places where mos6522_update_irq() isn't called.


That seemed okay, but the part I'm struggling with at the moment is removing the 
re-assertion of the timer interrupt if the timer has expired when any of the 
registers are read i.e.


diff --git a/hw/misc/mos6522.c b/hw/misc/mos6522.c
index 3c33cbebde..9884d7e178 100644
--- a/hw/misc/mos6522.c
+++ b/hw/misc/mos6522.c
@@ -229,16 +229,7 @@ uint64_t mos6522_read(void *opaque, hwaddr addr, unsigned 
size)
 {
 MOS6522State *s = opaque;
 uint32_t val;
-int64_t now = qemu_clock_get_ns(QEMU_CLOCK_VIRTUAL);

-if (now >= s->timers[0].next_irq_time) {
-mos6522_timer1_update(s, >timers[0], now);
-s->ifr |= T1_INT;
-}
-if (now >= s->timers[1].next_irq_time) {
-mos6522_timer2_update(s, >timers[1], now);
-s->ifr |= T2_INT;
-}
 switch (addr) {
 case VIA_REG_B:
 val = s->b;

If I apply this then I see a hang in about roughly 1 in 10 boots. Poking it with a 
debugger shows that the VIA1 timer interrupts are constantly being fired as IER and 
IFR have the timer 1 bit set, but it is being filtered out by SR being set to 0x2700 
on the CPU.


The existing code above isn't correct since T1_INT should only be asserted by the 
expiry of the timer 1 via mos6522_timer1(), so I'm wondering if this is tickling a 
CPU bug somewhere? In what circumstances could interrupts be disabled via SR but not 
enabled again?



ATB,

Mark.

Re: [PATCH v2 1/2] elf2dmp: Check curl_easy_setopt() return value

2021-09-10 Thread Viktor Prutyanov

Hi,

On Fri, 10 Sep 2021 19:06:55 +0200
Philippe Mathieu-Daudé  wrote:

> From: Peter Maydell 
> 
> Coverity points out that we aren't checking the return value
> from curl_easy_setopt().
> 
> Fixes: Coverity CID 1458895
> Inspired-by: Peter Maydell 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Informal T-b tag on
> https://lore.kernel.org/qemu-devel/20210909004313.1dadb24e@192.168.1.7/
> Tested-by: Viktor Prutyanov 
> 
> v1 from Peter:
> https://lore.kernel.org/qemu-devel/20210901143910.17112-2-peter.mayd...@linaro.org/
> ---
>  contrib/elf2dmp/download.c | 22 ++
>  1 file changed, 10 insertions(+), 12 deletions(-)
> 
> diff --git a/contrib/elf2dmp/download.c b/contrib/elf2dmp/download.c
> index d09e607431f..bd7650a7a27 100644
> --- a/contrib/elf2dmp/download.c
> +++ b/contrib/elf2dmp/download.c
> @@ -25,21 +25,19 @@ int download_url(const char *name, const char
> *url) goto out_curl;
>  }
>  
> -curl_easy_setopt(curl, CURLOPT_URL, url);
> -curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL);
> -curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);
> -curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
> -curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0);
> -
> -if (curl_easy_perform(curl) != CURLE_OK) {
> -err = 1;
> -fclose(file);
> +if (curl_easy_setopt(curl, CURLOPT_URL, url) != CURLE_OK
> +|| curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL)
> != CURLE_OK
> +|| curl_easy_setopt(curl, CURLOPT_WRITEDATA, file) !=
> CURLE_OK
> +|| curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1) !=
> CURLE_OK
> +|| curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0) !=
> CURLE_OK
> +|| curl_easy_perform(curl) != CURLE_OK) {
>  unlink(name);
> -goto out_curl;
> +fclose(file);
> +err = 1;
> +} else {
> +err = fclose(file);
>  }
>  
> -err = fclose(file);
> -
>  out_curl:
>  curl_easy_cleanup(curl);
>  

Reviewed-by: Viktor Prutyanov 
Tested-by: Viktor Prutyanov 

-- 
Viktor Prutyanov

Re: [PATCH v2 2/2] elf2dmp: Fail cleanly if PDB file specifies zero block_size

2021-09-10 Thread Viktor Prutyanov

Hi,

On Fri, 10 Sep 2021 19:06:56 +0200
Philippe Mathieu-Daudé  wrote:

> From: Peter Maydell 
> 
> Coverity points out that if the PDB file we're trying to read
> has a header specifying a block_size of zero then we will
> end up trying to divide by zero in pdb_ds_read_file().
> Check for this and fail cleanly instead.
> 
> Fixes: Coverity CID 1458869
> Signed-off-by: Peter Maydell 
> Reviewed-by: Viktor Prutyanov 
> Message-Id: <20210901143910.17112-3-peter.mayd...@linaro.org>
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
> Informal T-b tag on
> https://lore.kernel.org/qemu-devel/20210909004313.1dadb24e@192.168.1.7/
> Tested-by: Viktor Prutyanov 
> ---
>  contrib/elf2dmp/pdb.c | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/contrib/elf2dmp/pdb.c b/contrib/elf2dmp/pdb.c
> index b3a65470680..adcfa7e154c 100644
> --- a/contrib/elf2dmp/pdb.c
> +++ b/contrib/elf2dmp/pdb.c
> @@ -215,6 +215,10 @@ out_symbols:
>  
>  static int pdb_reader_ds_init(struct pdb_reader *r, PDB_DS_HEADER
> *hdr) {
> +if (hdr->block_size == 0) {
> +return 1;
> +}
> +
>  memset(r->file_used, 0, sizeof(r->file_used));
>  r->ds.header = hdr;
>  r->ds.toc = pdb_ds_read(hdr, (uint32_t *)((uint8_t *)hdr +

Tested-by: Viktor Prutyanov 

-- 
Viktor Prutyanov

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Paolo Bonzini


On 10/09/21 17:34, Sean Christopherson wrote:

Yang explained to me (offlist) that this is needed because Windows fails to
reboot without it.  We would need a way to ask Linux to reinitialize the
vEPC, that doesn't involve munmap/mmap; this could be for example
fallocate(FALLOC_FL_ZERO_RANGE).

What do you all think?


Mechanically, FALLOC_FL_ZERO_RANGE could work, but it's definitely a stretch of
the semantics as the underlying memory would not actually be zeroed.


The contents are not visible to anyone, so they might well be zero
(not entirely serious, but also not entirely unserious).


The only other option that comes to mind is a dedicated ioctl().


If it is not too restrictive to do it for the whole mapping at once,
that would be fine.  If it makes sense to do it for a range, however,
the effort of defining a ioctl() API is probably not worth it when
fallocate() is available.

Either way, I suppose it would be just something like

/* for fallocate; otherwise just use xa_for_each */
size = min_t(unsigned long, size, -start);
end = (start + size - 1) >> PAGE_SHIFT;
start >>= PAGE_SHIFT;

/*
 * Removing in two passes lets us remove SECS pages as well,
 * since they can only be EREMOVE'd after all their child pages.
 * An EREMOVE failure on the second pass means that the SECS
 * page still has children on another instance.  Since it is
 * still in the xarray, bury our head in the sand and ignore
 * it; sgx_vepc_release() will deal with it.
 */
LIST_HEAD(secs_pages);
xa_for_each_range(>page_array, index, entry, start, end) {
if (!sgx_vepc_free_page(entry))
list_add_tail(_page->list, _pages);
}

list_for_each_entry_safe(epc_page, tmp, _pages, list) {
list_del(_page->list);
sgx_vepc_free_page(epc_page);
}
 
So another advantage of the ioctl would be e.g. being able to return the

number of successfully EREMOVEd pages.  But since QEMU would not do
anything with the return value and _also_ bury its head in the sand,
that would only be useful if you guys have other uses in mind.

Paolo

Re: [PATCH RFC server v2 01/11] vfio-user: build library

2021-09-10 Thread Jag Raman



> On Sep 10, 2021, at 11:20 AM, Philippe Mathieu-Daudé  
> wrote:
> 
> On 8/27/21 7:53 PM, Jagannathan Raman wrote:
>> add the libvfio-user library as a submodule. build it as a cmake
>> subproject.
>> 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> configure| 11 +++
>> meson.build  | 28 
>> .gitmodules  |  3 +++
>> MAINTAINERS  |  7 +++
>> hw/remote/meson.build|  2 ++
>> subprojects/libvfio-user |  1 +
>> 6 files changed, 52 insertions(+)
>> create mode 16 subprojects/libvfio-user
> 
>> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
>> new file mode 16
>> index 000..647c934
>> --- /dev/null
>> +++ b/subprojects/libvfio-user
>> @@ -0,0 +1 @@
>> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
>> 
> 
> Could we point to a sha1 of a released tag instead?

OK, will do.

--
Jag

>

[PATCH v2 2/2] elf2dmp: Fail cleanly if PDB file specifies zero block_size

2021-09-10 Thread Philippe Mathieu-Daudé

From: Peter Maydell 

Coverity points out that if the PDB file we're trying to read
has a header specifying a block_size of zero then we will
end up trying to divide by zero in pdb_ds_read_file().
Check for this and fail cleanly instead.

Fixes: Coverity CID 1458869
Signed-off-by: Peter Maydell 
Reviewed-by: Viktor Prutyanov 
Message-Id: <20210901143910.17112-3-peter.mayd...@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé 
---
Informal T-b tag on
https://lore.kernel.org/qemu-devel/20210909004313.1dadb24e@192.168.1.7/
Tested-by: Viktor Prutyanov 
---
 contrib/elf2dmp/pdb.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/contrib/elf2dmp/pdb.c b/contrib/elf2dmp/pdb.c
index b3a65470680..adcfa7e154c 100644
--- a/contrib/elf2dmp/pdb.c
+++ b/contrib/elf2dmp/pdb.c
@@ -215,6 +215,10 @@ out_symbols:
 
 static int pdb_reader_ds_init(struct pdb_reader *r, PDB_DS_HEADER *hdr)
 {
+if (hdr->block_size == 0) {
+return 1;
+}
+
 memset(r->file_used, 0, sizeof(r->file_used));
 r->ds.header = hdr;
 r->ds.toc = pdb_ds_read(hdr, (uint32_t *)((uint8_t *)hdr +
-- 
2.31.1

[PATCH v2 0/2] elf2dmp: Fix minor Coverity nits

2021-09-10 Thread Philippe Mathieu-Daudé

This is a respin of Peter Maydell series, with slightly different
logic on the first patch. Quoting v1 cover [*]:

  Coverity complains about a couple of minor issues in elf2dmp:
   * we weren't checking the return value from curl_easy_setopt()
   * we might divide by zero if presented with a corrupt PDB file

  This patchseries fixes those.

Viktor Prutyanov tested the patch but didn't provided a formal
Tested-by tag, so I haven't included it in the patches.

[*] 
https://lore.kernel.org/qemu-devel/20210901143910.17112-1-peter.mayd...@linaro.org/

Peter Maydell (2):
  elf2dmp: Check curl_easy_setopt() return value
  elf2dmp: Fail cleanly if PDB file specifies zero block_size

 contrib/elf2dmp/download.c | 22 ++
 contrib/elf2dmp/pdb.c  |  4 
 2 files changed, 14 insertions(+), 12 deletions(-)

-- 
2.31.1

[PATCH v2 1/2] elf2dmp: Check curl_easy_setopt() return value

2021-09-10 Thread Philippe Mathieu-Daudé

From: Peter Maydell 

Coverity points out that we aren't checking the return value
from curl_easy_setopt().

Fixes: Coverity CID 1458895
Inspired-by: Peter Maydell 
Signed-off-by: Philippe Mathieu-Daudé 
---
Informal T-b tag on
https://lore.kernel.org/qemu-devel/20210909004313.1dadb24e@192.168.1.7/
Tested-by: Viktor Prutyanov 

v1 from Peter:
https://lore.kernel.org/qemu-devel/20210901143910.17112-2-peter.mayd...@linaro.org/
---
 contrib/elf2dmp/download.c | 22 ++
 1 file changed, 10 insertions(+), 12 deletions(-)

diff --git a/contrib/elf2dmp/download.c b/contrib/elf2dmp/download.c
index d09e607431f..bd7650a7a27 100644
--- a/contrib/elf2dmp/download.c
+++ b/contrib/elf2dmp/download.c
@@ -25,21 +25,19 @@ int download_url(const char *name, const char *url)
 goto out_curl;
 }
 
-curl_easy_setopt(curl, CURLOPT_URL, url);
-curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL);
-curl_easy_setopt(curl, CURLOPT_WRITEDATA, file);
-curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1);
-curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0);
-
-if (curl_easy_perform(curl) != CURLE_OK) {
-err = 1;
-fclose(file);
+if (curl_easy_setopt(curl, CURLOPT_URL, url) != CURLE_OK
+|| curl_easy_setopt(curl, CURLOPT_WRITEFUNCTION, NULL) != CURLE_OK
+|| curl_easy_setopt(curl, CURLOPT_WRITEDATA, file) != CURLE_OK
+|| curl_easy_setopt(curl, CURLOPT_FOLLOWLOCATION, 1) != CURLE_OK
+|| curl_easy_setopt(curl, CURLOPT_NOPROGRESS, 0) != CURLE_OK
+|| curl_easy_perform(curl) != CURLE_OK) {
 unlink(name);
-goto out_curl;
+fclose(file);
+err = 1;
+} else {
+err = fclose(file);
 }
 
-err = fclose(file);
-
 out_curl:
 curl_easy_cleanup(curl);
 
-- 
2.31.1

Re: [PULL 0/6] Vga 20210910 patches

2021-09-10 Thread Peter Maydell

On Fri, 10 Sept 2021 at 14:19, Gerd Hoffmann  wrote:
>
> The following changes since commit bd662023e683850c085e98c8ff8297142c2dd9f2:
>
>   Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210908' 
> into staging (2021-09-08 11:06:17 +0100)
>
> are available in the Git repository at:
>
>   git://git.kraxel.org/qemu tags/vga-20210910-pull-request
>
> for you to fetch changes up to 6335c0b56819a5d1219ea84a11a732d0861542db:
>
>   qxl: fix pre-save logic (2021-09-10 12:23:12 +0200)
>
> 
> virtio-gpu + ui: fence syncronization.
> qxl: unbreak live migration.
>
> 

Hi; this fails to build on the ppc64 system:

../../ui/egl-helpers.c:79:6: error: no previous prototype for
'egl_dmabuf_create_sync' [-Werror=missing-prototypes]
   79 | void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
  |  ^~
../../ui/egl-helpers.c:95:6: error: no previous prototype for
'egl_dmabuf_create_fence' [-Werror=missing-prototypes]
   95 | void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
  |  ^~~


The prototype is hidden behind CONFIG_GBM, but the definition is not.

Then the callsites fail:

../../ui/gtk-gl-area.c: In function 'gd_gl_area_draw':
../../ui/gtk-gl-area.c:77:9: error: implicit declaration of function
'egl_dmabuf_create_sync' [-Werror=implicit-function-declaration]
   77 | egl_dmabuf_create_sync(dmabuf);
  | ^~
../../ui/gtk-gl-area.c:77:9: error: nested extern declaration of
'egl_dmabuf_create_sync' [-Werror=nested-externs]
../../ui/gtk-gl-area.c:81:9: error: implicit declaration of function
'egl_dmabuf_create_fence' [-Werror=implicit-function-declaration]
   81 | egl_dmabuf_create_fence(dmabuf);
  | ^~~
../../ui/gtk-gl-area.c:81:9: error: nested extern declaration of
'egl_dmabuf_create_fence' [-Werror=nested-externs]


../../ui/gtk-egl.c: In function 'gd_egl_draw':
../../ui/gtk-egl.c:100:9: error: implicit declaration of function
'egl_dmabuf_create_fence' [-Werror=implicit-function-declaration]
  100 | egl_dmabuf_create_fence(dmabuf);
  | ^~~
../../ui/gtk-egl.c:100:9: error: nested extern declaration of
'egl_dmabuf_create_fence' [-Werror=nested-externs]
../../ui/gtk-egl.c: In function 'gd_egl_scanout_flush':
../../ui/gtk-egl.c:301:9: error: implicit declaration of function
'egl_dmabuf_create_sync' [-Werror=implicit-function-declaration]
  301 | egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
  | ^~
../../ui/gtk-egl.c:301:9: error: nested extern declaration of
'egl_dmabuf_create_sync' [-Werror=nested-externs]


You can probably repro this on any system which has the opengl
libraries installed but not libgbm.

-- PMM

Re: [PATCH RFC server v2 08/11] vfio-user: handle PCI BAR accesses

2021-09-10 Thread Jag Raman



> On Sep 9, 2021, at 3:37 AM, Stefan Hajnoczi  wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:27PM -0400, Jagannathan Raman wrote:
>> +/**
>> + * VFU_OBJECT_BAR_HANDLER - macro for defining handlers for PCI BARs.
>> + *
>> + * To create handler for BAR number 2, VFU_OBJECT_BAR_HANDLER(2) would
>> + * define vfu_object_bar2_handler
>> + */
>> +#define VFU_OBJECT_BAR_HANDLER(BAR_NO)  
>>\
>> +static ssize_t vfu_object_bar##BAR_NO##_handler(vfu_ctx_t *vfu_ctx, 
>>\
>> +char * const buf, size_t count, 
>>\
>> +loff_t offset, const bool is_write) 
>>\
>> +{   
>>\
>> +VfuObject *o = vfu_get_private(vfu_ctx);
>>\
>> +hwaddr addr = (hwaddr)(pci_get_long(o->pci_dev->config +
>>\
>> +PCI_BASE_ADDRESS_0 +
>>\
>> +(4 * BAR_NO)) + offset);
>>\
> 
> Does this handle 64-bit BARs?

It presently only handles 32-bit BARs. We’ll add support for 64-bit BARs in the 
next rev
of this series.

> 
>> +/**
>> + * vfu_object_register_bars - Identify active BAR regions of pdev and setup
>> + *callbacks to handle read/write accesses
>> + */
>> +static void vfu_object_register_bars(vfu_ctx_t *vfu_ctx, PCIDevice *pdev)
>> +{
>> +uint32_t orig_val, new_val;
>> +int i, size;
>> +
>> +for (i = 0; i < PCI_NUM_REGIONS; i++) {
>> +orig_val = pci_default_read_config(pdev,
>> +   PCI_BASE_ADDRESS_0 + (4 * i), 4);
> 
> Same question as in an earlier patch: should we call pdev->read_config()?

Sure, will do.

--
Jag

Re: applied? Re: [PATCH v1 0/2] Update NVMM support to recent changes, [PATCH v1 1/2] Only check CONFIG_NVMM when NEED_CPU_H is defined, [PATCH v1 2/2] Fix nvmm_ram_block_added() function arguments

2021-09-10 Thread Paolo Bonzini


On 07/09/21 18:02, Reinoud Zandijk wrote:

ping?


Queued, thanks.

Paolo


On Sun, Aug 29, 2021 at 05:39:07PM +0100, Peter Maydell wrote:

On Sun, 29 Aug 2021 at 17:06, Reinoud Zandijk  wrote:


Hi :)

Have these patches been applied? How can I easily check it without manually
checking if they are there in a git pullup? Am I notified normally when
patches are applied?


Generally when a submaintainer picks up a patchset they'll send
an email reply to the patch cover letter to say they've done so.
At that point it should get into upstream git eventually.
Until that happens it's the initial submitter's responsibility to
'ping' the patch after a few weeks if nobody's replied to it
or reviewed it.

Ccing Paolo who seems to have taken the initial nvmm patches
through his tree.

thanks
-- PMM

Re: [PATCH RFC server v2 06/11] vfio-user: handle PCI config space accesses

2021-09-10 Thread Jag Raman



> On Sep 9, 2021, at 3:27 AM, Stefan Hajnoczi  wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:25PM -0400, Jagannathan Raman wrote:
>> +static ssize_t vfu_object_cfg_access(vfu_ctx_t *vfu_ctx, char * const buf,
>> + size_t count, loff_t offset,
>> + const bool is_write)
>> +{
>> +VfuObject *o = vfu_get_private(vfu_ctx);
>> +uint32_t pci_access_width = sizeof(uint32_t);
>> +size_t bytes = count;
>> +uint32_t val = 0;
>> +char *ptr = buf;
>> +int len;
>> +
>> +while (bytes > 0) {
>> +len = (bytes > pci_access_width) ? pci_access_width : bytes;
>> +if (is_write) {
>> +memcpy(, ptr, len);
>> +pci_default_write_config(PCI_DEVICE(o->pci_dev),
>> + offset, val, len);
>> +trace_vfu_cfg_write(offset, val);
>> +} else {
>> +val = pci_default_read_config(PCI_DEVICE(o->pci_dev),
>> +  offset, len);
>> +memcpy(ptr, , len);
> 
> pci_default_read_config() returns a host-endian 32-bit value. This code
> looks wrong because it copies different bytes on big- and little-endian
> hosts.

I’ll collect more details on this one, trying to wrap my head around it.

Concerning config space writes, it doesn’t look like we need to
perform any conversion as the store operation automatically happens
in the CPU’s native format when we do something like the following:
PCIDevice->config[addr] = val;

Concerning config read, I observed that pci_default_read_config()
performs le32_to_cpu() conversion. May be we should not rely on
it doing the conversion.

> 
>> +trace_vfu_cfg_read(offset, val);
>> +}
> 
> Why call pci_default_read/write_config() directly instead of
> pci_dev->config_read/write()?

That makes sense - we should be calling pci_dev->config_read/write().

After performing pci_dev->config_read(), I’ll convert it to the CPU’s
endianness format using le32_to_cpu(). On big-endian systems,
it would re-order the bytes, and on little-endian systems it would
be a no-op.

--
Jag

Re: [PATCH 1/2] tests: add migrate-during-backup

2021-09-10 Thread Vladimir Sementsov-Ogievskiy


10.09.2021 17:18, Hanna Reitz wrote:

On 10.09.21 13:00, Vladimir Sementsov-Ogievskiy wrote:

Add a simple test which tries to run migration during backup.
bdrv_inactivate_all() should fail. But due to bug (see next commit with
fix) it doesn't, nodes are inactivated and continued backup crashes
on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  .../qemu-iotests/tests/migrate-during-backup  | 87 +++
  .../tests/migrate-during-backup.out   |  5 ++
  2 files changed, 92 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/migrate-during-backup
  create mode 100644 tests/qemu-iotests/tests/migrate-during-backup.out

diff --git a/tests/qemu-iotests/tests/migrate-during-backup 
b/tests/qemu-iotests/tests/migrate-during-backup
new file mode 100755
index 00..c3b7f1983d
--- /dev/null
+++ b/tests/qemu-iotests/tests/migrate-during-backup
@@ -0,0 +1,87 @@
+#!/usr/bin/env python3
+# group: migration disabled
+#
+# Copyright (c) 2021 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img_create, qemu_io
+
+
+disk_a = os.path.join(iotests.test_dir, 'disk_a')
+disk_b = os.path.join(iotests.test_dir, 'disk_b')
+size = '1M'
+mig_file = os.path.join(iotests.test_dir, 'mig_file')
+mig_cmd = 'exec: cat > ' + mig_file
+
+
+class TestMigrateDuringBackup(iotests.QMPTestCase):
+    def tearDown(self):
+    self.vm.shutdown()
+    os.remove(disk_a)
+    os.remove(disk_b)
+    os.remove(mig_file)
+
+    def setUp(self):
+    qemu_img_create('-f', iotests.imgfmt, disk_a, size)
+    qemu_img_create('-f', iotests.imgfmt, disk_b, size)
+    qemu_io('-c', f'write 0 {size}', disk_a)
+
+    self.vm = iotests.VM().add_drive(disk_a)
+    self.vm.launch()
+    result = self.vm.qmp('blockdev-add', {
+    'node-name': 'target',
+    'driver': iotests.imgfmt,
+    'file': {
+    'driver': 'file',
+    'filename': disk_b
+    }
+    })
+    self.assert_qmp(result, 'return', {})
+
+    def test_migrate(self):
+    result = self.vm.qmp('blockdev-backup', device='drive0',
+ target='target', sync='full',
+ speed=1, x_perf={
+ 'max-workers': 1,
+ 'max-chunk': 64 * 1024
+ })
+    self.assert_qmp(result, 'return', {})
+
+    result = self.vm.qmp('job-pause', id='drive0')
+    self.assert_qmp(result, 'return', {})
+
+    result = self.vm.qmp('migrate-set-capabilities',
+ capabilities=[{'capability': 'events',
+    'state': True}])
+    self.assert_qmp(result, 'return', {})
+    result = self.vm.qmp('migrate', uri=mig_cmd)
+    self.assert_qmp(result, 'return', {})
+
+    self.vm.events_wait((('MIGRATION', {'data': {'status': 'completed'}}),
+ ('MIGRATION', {'data': {'status': 'failed'}})))


So the migration failing is the result we expect here, right? Perhaps we should 
then have a loop that waits for MIGRATION events, and breaks on both 
status=completed and status=failed, but logs an error if the migration 
completes unexpectedly.

While I’ll give a

Reviewed-by: Hanna Reitz 

either way, I’d like to know your opinion on this still.



If migration finishes with "completed" backup will crash (current behavior).
So, if we just check that nothing crashes, we are OK with both completed and 
failed results.

If think more about that all...

Actually, nothing wrong if both migration and backup succeeded. It becomes 
possible if we don't inactivate backup target node. And actually we don't need 
that for migration.

On the other hand, in Qemu we are generaly OK with reading from inactivated node. But 
that's wrong: if I understand correctly, "inactive" means that file may be 
modified by external program (for example, by Qemu process on target for shared 
migration). In this perspective, we can't do migration during backup.

On one more hand, for non-shared migration it's absolutely possible to support 
migration during backup: you just need not stop source immediately after

Re: [PATCH 1/4] target/arm: Add TB flag for "MVE insns not predicated"

2021-09-10 Thread Peter Maydell

On Tue, 7 Sept 2021 at 15:11, Richard Henderson
 wrote:
> >   * gen_update_fp_context() -- this function gets called for pretty
> > much every FP/MVE insn (as part of vfp_access_check), and it
> > can in rare cases update the FPSCR.LTPSIZE and the VPR. I guess
> > this means we really do need to end the TB
> > if (MVE && s->v7m_new_fp_ctxt_needed) (ie the comment in
> > gen_update_fp_context "We don't need to arrange to end the TB,
> > because [we don't cache FPSCR in TB flags]" is no longer true).
> > That seems likely to be painful because some of the insns that
> > do a vfp_access_check do complicated things with the TB exits
> > (eg WLSTP, LETP) that are going to override a naive setting of
> > is_jmp in gen_update_fp_context()...
>
> Well, we could let gen_goto_tb see that is_jmp is already UPDATE_NOCHAIN, and 
> suppress the
> goto_tb in that case.  That would seem to take care of any entry into 
> gen_jmp(_tb).

We actually already have code that sets is_jmp (to DISAS_UPDATE_EXIT)
from gen_preserve_fp_state() -- we do that if we're using icount,
setting DISAS_UPDATE_EXIT to force this to be the last insn in the TB.

Do icount IO instructions need to avoid a possible goto_tb ?
I suppose we do want them to go back to the main loop.

This suggests that we need to make gen_goto_tb() look at
the is_jmp value anyway, regardless of this series.

-- PMM

Re: [qemu-web PATCH] Fix link to Windows page in Wiki

2021-09-10 Thread Paolo Bonzini


On 25/08/21 18:43, Helge Konetzka wrote:
Furthermore I would like to propose to change the instructions for 
Native builds with MSYS2 on Wiki Windows page.


Please remove the section which copies system binaries to match the 
expected file names!


Instead define variables for configure (gcc-ar and gcc-ranlib are 
existing copies of x86_64-w64-mingw32-gcc-ar and 
x86_64-w64-mingw32-gcc-ranlib) and add strip to enable make install:


AR=gcc-ar NM=nm OBJCOPY=objcopy RANLIB=gcc-ranlib WINDRES=windres 
STRIP=strip \

./configure --cross-prefix=x86_64-w64-mingw32- --enable-gtk --enable-sdl


Do you even need anything but "./configure"? (possibly AR=gcc-ar NM=nm 
at the beginning)?


I applied the webpage fix in the meanwhile, thanks!

Paolo

Re: [qemu-web PATCH] contribute: ask not to submit merge requests

2021-09-10 Thread Peter Maydell

On Fri, 10 Sept 2021 at 16:34, Daniel P. Berrangé  wrote:
> I think it might be worth leaving a warning here despite adding it
> to the main contribute index page, as people might land directly on
> this page and not see the other page's warning.
>
> >  Do NOT report security issues (or other bugs, too) as "private" bugs in the
> >  bug tracker.  QEMU has a [security process](../security-process) for issues
> >  that should be reported in a non-public way instead.

Isn't that text for the old launchpad bug tracker? The gitlab UI equivalent
to launchpad "private" bugs is that there's a tickybox for "This issue is
confidential". (At least accidental confidential gitlab bugs won't have
the same "vanishes into a black hole that nobody looks at" property that
LP private bugs did.)

-- PMM

Re: [qemu-web PATCH 0/6] Small header/footer layout changes

2021-09-10 Thread Paolo Bonzini


On 08/09/21 14:28, Daniel P. Berrangé wrote:

I previously sent a large series to more fully re-design the
website, especially the front page

   https://lists.gnu.org/archive/html/qemu-devel/2020-10/msg08205.html

Paolo had some feedback on that which I want to take into account
but playing with CSS / layout always takes me way too long. I
figured it could still be beneficial to take some of the simpler
patches in that series.

So essentially this small series is the part of that large series
that changes the header and footer. The changes to the front page
body content are cut out until I can get them working better.

In a slight change from the previous posting for the "edit page"
link at the bottom, I've now used an icon to represent it and
moved its position, such that the layout is more visually
pleasant.

The slight downside with only taking the header/footer changes
is that some of the links I removed from the footer, would have
been added in the page body of the front page instead. I think
that's probably ok not to have them regardless though, as they
are just a single jump away in an obvious place from the navbar
header.


Yeah, some of them may be useful for easier access to contribution 
information and documentation, but it's okay to remove them for now.


I merged this part, thanks!

Paolo

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Sean Christopherson

On Fri, Sep 10, 2021, Paolo Bonzini wrote:
> On 19/07/21 13:21, Yang Zhong wrote:
> > +void sgx_memory_backend_reset(HostMemoryBackend *backend, int fd,
> > +  Error **errp)
> > +{
> > +MemoryRegion *mr = >mr;
> > +
> > +mr->enabled = false;
> > +
> > +/* destroy the old memory region if it exist */
> > +if (fd > 0 && mr->destructor) {
> > +mr->destructor(mr);
> > +}
> > +
> > +sgx_epc_backend_memory_alloc(backend, errp);
> > +}
> > +
> 
> Jarkko, Sean, Kai,
> 
> this I think is problematic because it has a race window while /dev/sgx_vepc
> is closed and then reopened.  First, the vEPC space could be exhausted by
> somebody doing another mmap in the meanwhile.  Second, somebody might (for
> whatever reason) remove /dev/sgx_vepc while QEMU runs.

Yep, both error cases are possible.

> Yang explained to me (offlist) that this is needed because Windows fails to
> reboot without it.  We would need a way to ask Linux to reinitialize the
> vEPC, that doesn't involve munmap/mmap; this could be for example
> fallocate(FALLOC_FL_ZERO_RANGE).
> 
> What do you all think?

Mechanically, FALLOC_FL_ZERO_RANGE could work, but it's definitely a stretch of
the semantics as the underlying memory would not actually be zeroed.

The only other option that comes to mind is a dedicated ioctl().

Re: [qemu-web PATCH] contribute: ask not to submit merge requests

2021-09-10 Thread Daniel P . Berrangé

On Fri, Sep 10, 2021 at 05:29:33PM +0200, Paolo Bonzini wrote:
> Currently, the bug reporting page has a paragraph about not sending patches
> on the bug tracker, with a link to the patch submission guidelines.
> 
> However, now that the tracker is hosted on Gitlab it is more likely that
> prospective contributors will submit a merge request linked to an issue,
> rather than attaching a patch to the issue.  Update the language and,
> since it is not limited to bug reports, move it to the main contribute
> page.
> 
> Signed-off-by: Paolo Bonzini 
> ---
>  contribute.md  | 3 ++-
>  contribute/report-a-bug.md | 2 --
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/contribute.md b/contribute.md
> index cbb476d..4802452 100644
> --- a/contribute.md
> +++ b/contribute.md
> @@ -15,7 +15,8 @@ permalink: /contribute/
>  
>  * Read developer documentation: [Getting started for 
> developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers),
>[Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ),
> -  [How to submit a 
> patch](https://wiki.qemu.org/Contribute/SubmitAPatch),
>[Improve the 
> website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)
>  
> +Please do not submit merge requests on GitLab; patches are sent to the 
> mailing list according to QEMU's [patch submissions 
> guidelines](https://wiki.qemu.org/Contribute/SubmitAPatch).
> +
>  Contributing to QEMU is subject to the [QEMU Code of 
> Conduct](https://qemu.org/docs/master/devel/code-of-conduct.html).
> diff --git a/contribute/report-a-bug.md b/contribute/report-a-bug.md
> index 922f699..807daf8 100644
> --- a/contribute/report-a-bug.md
> +++ b/contribute/report-a-bug.md
> @@ -18,8 +18,6 @@ When submitting a bug report, please try to do the 
> following:
>  
>  * Include information about the host and guest (operating system, version, 
> 32/64-bit).
>  
> -* Do not contribute patches on the bug tracker; send patches to the mailing 
> list. Follow QEMU's [guidelines about submitting 
> patches](https://wiki.qemu.org/Contribute/SubmitAPatch).
> -

I think it might be worth leaving a warning here despite adding it
to the main contribute index page, as people might land directly on
this page and not see the other page's warning.

>  Do NOT report security issues (or other bugs, too) as "private" bugs in the
>  bug tracker.  QEMU has a [security process](../security-process) for issues
>  that should be reported in a non-public way instead.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: Adding IO memory region to mipssim

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/10/21 3:21 PM, Hinko Kocevar wrote:
> I'm trying to add an I/O memory region to mipssim machine to emulate a
> MMIO region used by the u-boot loaded as BIOS image. I can confirm that
> the machine starts and loads the BIOS, starts execution but hangs due to
> unhandled IO access as described below.
> 
> The region should be at 0xB881, of size 0x1.
> 
> I've added these lines of code to mispsim.c mips_mipssim_init():
> 
>     my_state *s = g_malloc0(sizeof(my_state));
>     memory_region_init_io(>mmio, NULL, _ops, s,
>                          "mips_mipssim.foo", 0x1);
>     memory_region_add_subregion(address_space_mem, 0xB881LL, >mmio);

You need to map your device at its physical address, not the virtual
one.

> All goes well, the machine starts, and I can see the newly added region
> in qemu monitor info mtree output like so:
> 
>     b881-b881 (prio 0, i/o): mips_mipssim.foo
> 
> With some tracing enabled I see this error:
> 
>  Invalid access at addr 0x18810104, size 4, region '(null)', reason:
> rejected
> 
> I know the u-boot is making request to 0xB8810104 and not 0x18810104. I

U-boot accessed the virtual address which is resolved into the physical
one (where your device should be mapped).

> also can see 0xB8810104 address being handed to io_writex(), but
> mr_offset becomes 0x18810104 here:
> 
>   mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;
> 
> What is going on?
> 
> FWIW, I can place my emulated memory region at 0x18810104, but would
> like to understand the behavior above.

Yes, this is the correct address to map it.

Maybe this helps:
https://training.mips.com/basic_mips/PDF/Memory_Map.pdf

Regards,

Phil.

[qemu-web PATCH] contribute: ask not to submit merge requests

2021-09-10 Thread Paolo Bonzini

Currently, the bug reporting page has a paragraph about not sending patches
on the bug tracker, with a link to the patch submission guidelines.

However, now that the tracker is hosted on Gitlab it is more likely that
prospective contributors will submit a merge request linked to an issue,
rather than attaching a patch to the issue.  Update the language and,
since it is not limited to bug reports, move it to the main contribute
page.

Signed-off-by: Paolo Bonzini 
---
 contribute.md  | 3 ++-
 contribute/report-a-bug.md | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/contribute.md b/contribute.md
index cbb476d..4802452 100644
--- a/contribute.md
+++ b/contribute.md
@@ -15,7 +15,8 @@ permalink: /contribute/
 
 * Read developer documentation: [Getting started for 
developers](https://wiki.qemu.org/Documentation/GettingStartedDevelopers),
   [Contributor FAQ](https://wiki.qemu.org/Contribute/FAQ),
-  [How to submit a 
patch](https://wiki.qemu.org/Contribute/SubmitAPatch),
   [Improve the 
website](https://www.qemu.org/2017/02/04/the-new-qemu-website-is-up/)
 
+Please do not submit merge requests on GitLab; patches are sent to the mailing 
list according to QEMU's [patch submissions 
guidelines](https://wiki.qemu.org/Contribute/SubmitAPatch).
+
 Contributing to QEMU is subject to the [QEMU Code of 
Conduct](https://qemu.org/docs/master/devel/code-of-conduct.html).
diff --git a/contribute/report-a-bug.md b/contribute/report-a-bug.md
index 922f699..807daf8 100644
--- a/contribute/report-a-bug.md
+++ b/contribute/report-a-bug.md
@@ -18,8 +18,6 @@ When submitting a bug report, please try to do the following:
 
 * Include information about the host and guest (operating system, version, 
32/64-bit).
 
-* Do not contribute patches on the bug tracker; send patches to the mailing 
list. Follow QEMU's [guidelines about submitting 
patches](https://wiki.qemu.org/Contribute/SubmitAPatch).
-
 Do NOT report security issues (or other bugs, too) as "private" bugs in the
 bug tracker.  QEMU has a [security process](../security-process) for issues
 that should be reported in a non-public way instead.
-- 
2.31.1

Re: [PATCH RFC server v2 01/11] vfio-user: build library

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/8/21 2:25 PM, Stefan Hajnoczi wrote:
> On Fri, Aug 27, 2021 at 01:53:20PM -0400, Jagannathan Raman wrote:

>> diff --git a/.gitmodules b/.gitmodules
>> index 08b1b48..cfeea7c 100644
>> --- a/.gitmodules
>> +++ b/.gitmodules
>> @@ -64,3 +64,6 @@
>>  [submodule "roms/vbootrom"]
>>  path = roms/vbootrom
>>  url = https://gitlab.com/qemu-project/vbootrom.git
>> +[submodule "subprojects/libvfio-user"]
>> +path = subprojects/libvfio-user
>> +url = https://github.com/nutanix/libvfio-user.git
> 
> Once this is merged I'll set up a
> gitlab.com/qemu-project/libvfio-user.git mirror. This ensures that no
> matter what happens with upstream libvfio-user.git, the source code that
> QEMU builds against will remain archived/available.

Can we do it the other way around? When the series is OK to be merged,
setup the https://gitlab.com/qemu-project/libvfio-user.git mirror and
have the submodule point to it?

Re: [PATCH RFC server v2 01/11] vfio-user: build library

2021-09-10 Thread Philippe Mathieu-Daudé

On 8/27/21 7:53 PM, Jagannathan Raman wrote:
> add the libvfio-user library as a submodule. build it as a cmake
> subproject.
> 
> Signed-off-by: Elena Ufimtseva 
> Signed-off-by: John G Johnson 
> Signed-off-by: Jagannathan Raman 
> ---
>  configure| 11 +++
>  meson.build  | 28 
>  .gitmodules  |  3 +++
>  MAINTAINERS  |  7 +++
>  hw/remote/meson.build|  2 ++
>  subprojects/libvfio-user |  1 +
>  6 files changed, 52 insertions(+)
>  create mode 16 subprojects/libvfio-user

> diff --git a/subprojects/libvfio-user b/subprojects/libvfio-user
> new file mode 16
> index 000..647c934
> --- /dev/null
> +++ b/subprojects/libvfio-user
> @@ -0,0 +1 @@
> +Subproject commit 647c9341d2e06266a710ddd075f69c95dd3b8446
> 

Could we point to a sha1 of a released tag instead?

Re: [PULL 0/3] Input 20210910 patches

2021-09-10 Thread Peter Maydell

On Fri, 10 Sept 2021 at 11:26, Gerd Hoffmann  wrote:
>
> The following changes since commit bd662023e683850c085e98c8ff8297142c2dd9f2:
>
>   Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210908' 
> into staging (2021-09-08 11:06:17 +0100)
>
> are available in the Git repository at:
>
>   git://git.kraxel.org/qemu tags/input-20210910-pull-request
>
> for you to fetch changes up to 4e9bddcbaa74e2463f0a79350fea5311c9890982:
>
>   ps2: migration support for command reply queue (2021-09-10 07:32:32 +0200)
>
> 
> input: ps2 fixes.
>


Applied, thanks.

Please update the changelog at https://wiki.qemu.org/ChangeLog/6.2
for any user-visible changes.

-- PMM

Re: [PATCH v4 23/33] sgx-epc: Add the reset interface for sgx-epc virt device

2021-09-10 Thread Paolo Bonzini


On 19/07/21 13:21, Yang Zhong wrote:

+static void sgx_epc_del_subregion(DeviceState *dev)
+{
+PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
+SGXEPCState *sgx_epc = >sgx_epc;
+SGXEPCDevice *epc = SGX_EPC(dev);
+
+/* del subregion and related operations */
+memory_region_del_subregion(_epc->mr,
+host_memory_backend_get_memory(epc->hostmem));
+host_memory_backend_set_mapped(epc->hostmem, false);
+g_free(sgx_epc->sections);
+sgx_epc->sections = NULL;
+
+/* multiple epc devices, only zero the first time */
+if (epc_num == sgx_epc->nr_sections) {
+sgx_epc->size = 0;
+sgx_epc->nr_sections = 0;
+}
+}


Can it just invoke a method on the memory devices, without reusing the 
code in sgx_epc_realize?  In particular why is it necessary to update 
sgx_epc->size and sgx_epc->nr_sections?


Paolo

Re: [PATCH v4 22/33] hostmem-epc: Add the reset interface for EPC backend reset

2021-09-10 Thread Paolo Bonzini


On 19/07/21 13:21, Yang Zhong wrote:

+void sgx_memory_backend_reset(HostMemoryBackend *backend, int fd,
+  Error **errp)
+{
+MemoryRegion *mr = >mr;
+
+mr->enabled = false;
+
+/* destroy the old memory region if it exist */
+if (fd > 0 && mr->destructor) {
+mr->destructor(mr);
+}
+
+sgx_epc_backend_memory_alloc(backend, errp);
+}
+


Jarkko, Sean, Kai,

this I think is problematic because it has a race window while 
/dev/sgx_vepc is closed and then reopened.  First, the vEPC space could 
be exhausted by somebody doing another mmap in the meanwhile.  Second, 
somebody might (for whatever reason) remove /dev/sgx_vepc while QEMU runs.


Yang explained to me (offlist) that this is needed because Windows fails 
to reboot without it.  We would need a way to ask Linux to reinitialize 
the vEPC, that doesn't involve munmap/mmap; this could be for example 
fallocate(FALLOC_FL_ZERO_RANGE).


What do you all think?

Paolo

Re: [PATCH v3 6/6] tests/qapi-schema: Test cases for aliases

2021-09-10 Thread Kevin Wolf

Am 06.09.2021 um 17:28 hat Markus Armbruster geschrieben:
> Kevin Wolf  writes:
> > +/* Can still specify the real member name with alias support */
> > +v = visitor_input_test_init(data, "{ 'foo': 42 }");
> > +visit_type_AliasStruct1(v, NULL, , _abort);
> > +g_assert_cmpint(tmp->foo, ==, 42);
> > +qapi_free_AliasStruct1(tmp);
> > +
> > +/* The alias is a working alternative */
> > +v = visitor_input_test_init(data, "{ 'bar': 42 }");
> > +visit_type_AliasStruct1(v, NULL, , _abort);
> > +g_assert_cmpint(tmp->foo, ==, 42);
> > +qapi_free_AliasStruct1(tmp);
> > +
> > +/* But you can't use both at the same time */
> > +v = visitor_input_test_init(data, "{ 'foo': 5, 'bar': 42 }");
> > +visit_type_AliasStruct1(v, NULL, , );
> > +error_free_or_abort();
> 
> I double-checked this reports "Value for parameter foo was already given
> through an alias", as it should.
> 
> Pointing to what exactly is giving values to foo already would be nice.
> In this case, 'foo' is obvious, but 'bar' is not.  This is not a demand.

We have the name, so we could print it, but it could be in a different
StackObject. I'm not sure if we have a good way to identify a parent
StackObject, and without it the message could be very confusing.

If you have a good idea what the message should look like, I can make an
attempt.

> > +/* You can't use more than one option at the same time */
> > +v = visitor_input_test_init(data, "{ 'foo': 42, 'nested': { 'foo': 42 
> > } }");
> > +visit_type_AliasStruct3(v, NULL, , );
> > +error_free_or_abort();
> 
> "Parameter 'foo' is unexpected".  Say what?  It *is* expected, it just
> clashes with 'nested.foo'.
> 
> I figure this is what happens:
> 
> * visit_type_AliasStruct3()
> 
>   - visit_start_struct()
> 
>   - visit_type_AliasStruct3_members()
> 
> • visit_type_AliasStruct1() for member @nested.
> 
>   This consumes consumes input nested.foo.
> 
>   - visit_check_struct()
> 
> Error: input foo has not been consumed.
> 
> Any ideas on how to report this error more clearly?

It's a result of the logic that wildcard aliases are silently ignored
when they aren't needed. The reason why I included this is that it would
allow you to have two members with the same name in the object
containing the alias and in the aliased object without conflicting as
long as both are given.

Never skipping wildcard aliases makes the code simpler and results in
the expected error message here. So I'll do that for v4.

Note that parsing something like '--chardev type=socket,addr.type=unix,
path=/tmp/sock,id=c' now depends on the order in the generated code. If
the top level 'type' weren't parsed and removed from the input first,
visiting 'addr.type' would now detect a conflict. For union types, we
know that 'type' is always parsed first, so it's not a problem, but in
the general case you need to be careful with the order.

> > +
> > +v = visitor_input_test_init(data, "{ 'tag': 'eins', 'foo': 6, 'bar': 9 
> > }");
> > +visit_type_AliasFlatUnion(v, NULL, , );
> > +error_free_or_abort();
> 
> "Value for parameter foo was already given through an alias".  Good,
> except I'm getting a feeling "already" may be confusing.  It's "already"
> only in the sense that we already got the value via alias, which is an
> implementation detail.  It may or may not be given already in the
> input.  Here it's not: 'bar' follows 'foo'.
> 
> What about "is also given through an alias"?

Sounds good.

> Positive tests look good to me, except they neglect to use any of the
> types using the alias features in QMP.  I think we need something like
> the appended incremental patch.

I'm squashing it in.

Kevin

Re: [PATCH RFC server v2 04/11] vfio-user: find and init PCI device

2021-09-10 Thread Jag Raman



> On Sep 8, 2021, at 8:43 AM, Stefan Hajnoczi  wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:23PM -0400, Jagannathan Raman wrote:
>> @@ -96,6 +102,28 @@ static void vfu_object_machine_done(Notifier *notifier, 
>> void *data)
>>strerror(errno));
>> return;
>> }
>> +
>> +dev = qdev_find_recursive(sysbus_get_default(), o->devid);
>> +if (dev == NULL) {
>> +error_setg(_abort, "vfu: Device %s not found", o->devid);
>> +return;
>> +}
>> +
>> +if (!object_dynamic_cast(OBJECT(dev), TYPE_PCI_DEVICE)) {
>> +error_setg(_abort, "vfu: %s not a PCI devices", o->devid);
>> +return;
>> +}
>> +
>> +o->pci_dev = PCI_DEVICE(dev);
>> +
>> +ret = vfu_pci_init(o->vfu_ctx, VFU_PCI_TYPE_CONVENTIONAL,
>> +   PCI_HEADER_TYPE_NORMAL, 0);
> 
> What is needed to support PCI Express?

I think we could check if o->pci_dev supports QEMU_PCI_CAP_EXPRESS,
and based on that choose if we should use
VFU_PCI_TYPE_CONVENTIONAL or VFU_PCI_TYPE_EXPRESS.

pci_is_express() is already doing that, although it’s a private function
now. It’s a good time to export it.

--
Jag

[PATCH] hw/nvme: Return error for fused operations

2021-09-10 Thread Pankaj Raghav

Currently, FUSED operations are not supported by QEMU. As per the 1.4 SPEC,
controller should abort the command that requested a fused operation with 
an INVALID FIELD error code if they are not supported.


Signed-off-by: Pankaj Raghav 
---
 hw/nvme/ctrl.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/hw/nvme/ctrl.c b/hw/nvme/ctrl.c
index dc0e7b0030..d15a80a054 100644
--- a/hw/nvme/ctrl.c
+++ b/hw/nvme/ctrl.c
@@ -3893,6 +3893,10 @@ static uint16_t nvme_io_cmd(NvmeCtrl *n, NvmeRequest 
*req)
 return ns->status;
 }
 
+if (NVME_CMD_FLAGS_FUSE(req->cmd.flags)) {
+return NVME_INVALID_FIELD;
+}
+
 req->ns = ns;
 
 switch (req->cmd.opcode) {
-- 
2.25.1

Re: [PATCH RFC server v2 03/11] vfio-user: instantiate vfio-user context

2021-09-10 Thread Jag Raman



> On Sep 8, 2021, at 8:40 AM, Stefan Hajnoczi  wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:22PM -0400, Jagannathan Raman wrote:
>> create a context with the vfio-user library to run a PCI device
>> 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> hw/remote/vfio-user-obj.c | 29 +
>> 1 file changed, 29 insertions(+)
>> 
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> index 4a1e297..99d3dd1 100644
>> --- a/hw/remote/vfio-user-obj.c
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -27,11 +27,17 @@
>> #include "qemu/osdep.h"
>> #include "qemu-common.h"
>> 
>> +#include 
> 
> qemu/osdep.h already includes 
> 
>> +
>> #include "qom/object.h"
>> #include "qom/object_interfaces.h"
>> #include "qemu/error-report.h"
>> #include "trace.h"
>> #include "sysemu/runstate.h"
>> +#include "qemu/notify.h"
>> +#include "qapi/error.h"
>> +#include "sysemu/sysemu.h"
>> +#include "libvfio-user.h"
>> 
>> #define TYPE_VFU_OBJECT "vfio-user"
>> OBJECT_DECLARE_TYPE(VfuObject, VfuObjectClass, VFU_OBJECT)
>> @@ -51,6 +57,10 @@ struct VfuObject {
>> 
>> char *socket;
>> char *devid;
>> +
>> +Notifier machine_done;
>> +
>> +vfu_ctx_t *vfu_ctx;
>> };
>> 
>> static void vfu_object_set_socket(Object *obj, const char *str, Error **errp)
>> @@ -75,9 +85,23 @@ static void vfu_object_set_devid(Object *obj, const char 
>> *str, Error **errp)
>> trace_vfu_prop("devid", str);
>> }
>> 
>> +static void vfu_object_machine_done(Notifier *notifier, void *data)
> 
> Please document the reason for using a machine init done notifier.

OK, will do.

> 
>> +{
>> +VfuObject *o = container_of(notifier, VfuObject, machine_done);
>> +
>> +o->vfu_ctx = vfu_create_ctx(VFU_TRANS_SOCK, o->socket, 0,
>> +o, VFU_DEV_TYPE_PCI);
>> +if (o->vfu_ctx == NULL) {
>> +error_setg(_abort, "vfu: Failed to create context - %s",
>> +   strerror(errno));
>> +return;
>> +}
>> +}
>> +
>> static void vfu_object_init(Object *obj)
>> {
>> VfuObjectClass *k = VFU_OBJECT_GET_CLASS(obj);
>> +VfuObject *o = VFU_OBJECT(obj);
>> 
>> if (!object_dynamic_cast(OBJECT(current_machine), TYPE_REMOTE_MACHINE)) {
>> error_report("vfu: %s only compatible with %s machine",
>> @@ -92,6 +116,9 @@ static void vfu_object_init(Object *obj)
>> }
>> 
>> k->nr_devs++;
>> +
>> +o->machine_done.notify = vfu_object_machine_done;
>> +qemu_add_machine_init_done_notifier(>machine_done);
>> }
>> 
>> static void vfu_object_finalize(Object *obj)
>> @@ -101,6 +128,8 @@ static void vfu_object_finalize(Object *obj)
>> 
>> k->nr_devs--;
>> 
>> +vfu_destroy_ctx(o->vfu_ctx);
> 
> Will this function ever be called before vfu_object_machine_done() is
> called? In that case vfu_ctx isn't initialized.

There are some case where vfu_object_finalize() could be called before
vfu_object_machine_done() executes. In that case o->vfu_ctx would be
NULL - we didn’t account for that before.

vfu_destroy_ctx() does check for NULL - however, we’ll add a check
here as well in case vfu_destroy_ctx() changes in the future.

--
Jag

Re: [PATCH] softmmu: fix watchpoint processing in icount mode

2021-09-10 Thread Richard Henderson


On 9/10/21 3:46 PM, David Hildenbrand wrote:

On 10.09.21 15:34, Richard Henderson wrote:

On 9/10/21 1:15 PM, David Hildenbrand wrote:

On 07.09.21 13:30, Pavel Dovgalyuk wrote:

Watchpoint processing code restores vCPU state twice:
in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state.
Normally it does not affect anything, but in icount mode instruction
counter is incremented twice and becomes incorrect.
This patch eliminates unneeded CPU state restore.

Signed-off-by: Pavel Dovgalyuk 
---
   softmmu/physmem.c |    5 +
   1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 23e77cb771..4025dfab11 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -941,14 +941,11 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, 
vaddr len,
   if (wp->flags & BP_STOP_BEFORE_ACCESS) {
   cpu->exception_index = EXCP_DEBUG;
   mmap_unlock();
-    cpu_loop_exit_restore(cpu, ra);
+    cpu_loop_exit(cpu);
   } else {
   /* Force execution of one insn next time.  */
   cpu->cflags_next_tb = 1 | curr_cflags(cpu);
   mmap_unlock();
-    if (ra) {
-    cpu_restore_state(cpu, ra, true);
-    }
   cpu_loop_exit_noexc(cpu);
   }
   }




I'm not an expert on that code, but it looks good to me.

Maybe we could have added a comment above the tb_check_watchpoint() call to 
highlight that
the restore will happen in there.


Hmm.  Curious.

Looking at tb_check_watchpoint, I have trouble seeing how it could be correct.
Watchpoints can happen at any memory reference within the TB.  We should be 
rolling back
to the cpu state at the memory reference (cpu_retore_state) and not the cpu 
state at the
start of the TB (cpu_restore_state_from_tb).


cpu_restore_state() ends up calling cpu_restore_state_from_tb() with essentially
the same parameters or what am I missing?


Whoops, yes.  I must have been thinking of a different function.


I'm also not sure why we're invalidating tb's.  Why does watchpoint hit imply 
that we
should want to ditch the TB?  If we want different behaviour from the next 
execution, we
should be adjusting cflags.


It goes back to

commit 06d55cc19ac84e799d2df8c750049e51798b00a4
Author: aliguori 
Date:   Tue Nov 18 20:24:06 2008 +

     Restore pc on watchpoint hits (Jan Kiszka)
     In order to provide accurate information about the triggering
     instruction, this patch adds the required bits to restore the pc if the
     access happened inside a TB. With the BP_STOP_BEFORE_ACCESS flag, the
     watchpoint user can control if the debug trap should be issued on or
     after the accessing instruction.
     Signed-off-by: Jan Kiszka 
     Signed-off-by: Anthony Liguori 


*trying to rememebr what we do on watchpoints* I think we want to
make sure that we end up with a single-instruction TB, right? So we
want to make sure to remove the old one.


When the watchpoint needs to trigger after the insn, we do indeed want to execute a single 
insn, which we do with the cflags there in the patch context.  But when we want to stop 
before the insn, we're already done -- so what was the invalidate supposed to achieve?


(Then of course there's the problem that Phillipe filed (#245) in which we set cflags as 
per above, then take an interrupt before using it, then wind up with garbage.  Ho hum.)



r~

r~

[PATCH v9 9/9] tests/data/acpi/virt: Update IORT files for ITS

2021-09-10 Thread Shashi Mallela

Updated expected IORT files applicable with latest GICv3
ITS changes.

Full diff of new file disassembly:

/*
 * Intel ACPI Component Architecture
 * AML/ASL+ Disassembler version 20180629 (64-bit version)
 * Copyright (c) 2000 - 2018 Intel Corporation
 *
 * Disassembly of tests/data/acpi/virt/IORT.pxb, Tue Jun 29 17:35:38 2021
 *
 * ACPI Data Table [IORT]
 *
 * Format: [HexOffset DecimalOffset ByteLength]  FieldName : FieldValue
 */

[000h    4]Signature : "IORT"[IO Remapping Table]
[004h 0004   4] Table Length : 007C
[008h 0008   1] Revision : 00
[009h 0009   1] Checksum : 07
[00Ah 0010   6]   Oem ID : "BOCHS "
[010h 0016   8] Oem Table ID : "BXPC"
[018h 0024   4] Oem Revision : 0001
[01Ch 0028   4]  Asl Compiler ID : "BXPC"
[020h 0032   4]Asl Compiler Revision : 0001

[024h 0036   4]   Node Count : 0002
[028h 0040   4]  Node Offset : 0030
[02Ch 0044   4] Reserved : 

[030h 0048   1] Type : 00
[031h 0049   2]   Length : 0018
[033h 0051   1] Revision : 00
[034h 0052   4] Reserved : 
[038h 0056   4]Mapping Count : 
[03Ch 0060   4]   Mapping Offset : 

[040h 0064   4] ItsCount : 0001
[044h 0068   4]  Identifiers : 

[048h 0072   1] Type : 02
[049h 0073   2]   Length : 0034
[04Bh 0075   1] Revision : 00
[04Ch 0076   4] Reserved : 
[050h 0080   4]Mapping Count : 0001
[054h 0084   4]   Mapping Offset : 0020

[058h 0088   8]Memory Properties : [IORT Memory Access Properties]
[058h 0088   4]  Cache Coherency : 0001
[05Ch 0092   1]Hints (decoded below) : 00
   Transient : 0
  Write Allocate : 0
   Read Allocate : 0
Override : 0
[05Dh 0093   2] Reserved : 
[05Fh 0095   1] Memory Flags (decoded below) : 03
   Coherency : 1
Device Attribute : 1
[060h 0096   4]ATS Attribute : 
[064h 0100   4]   PCI Segment Number : 
[068h 0104   1]Memory Size Limit : 00
[069h 0105   3] Reserved : 00

[068h 0104   4]   Input base : 
[06Ch 0108   4] ID Count : 
[070h 0112   4]  Output Base : 
[074h 0116   4] Output Reference : 0030
[078h 0120   4]Flags (decoded below) : 
  Single Mapping : 0

Raw Table Data: Length 124 (0x7C)

: 49 4F 52 54 7C 00 00 00 00 07 42 4F 43 48 53 20  // IORT|.BOCHS
0010: 42 58 50 43 20 20 20 20 01 00 00 00 42 58 50 43  // BXPCBXPC
0020: 01 00 00 00 02 00 00 00 30 00 00 00 00 00 00 00  // 0...
0030: 00 18 00 00 00 00 00 00 00 00 00 00 00 00 00 00  // 
0040: 01 00 00 00 00 00 00 00 02 34 00 00 00 00 00 00  // .4..
0050: 01 00 00 00 20 00 00 00 01 00 00 00 00 00 00 03  //  ...
0060: 00 00 00 00 00 00 00 00 00 00 00 00 FF FF 00 00  // 
0070: 00 00 00 00 30 00 00 00 00 00 00 00  // 0...

Signed-off-by: Shashi Mallela 
Acked-by: Igor Mammedov 
Reviewed-by: Peter Maydell 
---
 tests/data/acpi/virt/IORT   | Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.memhp | Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.numamem   | Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.pxb   | Bin 0 -> 124 bytes
 tests/qtest/bios-tables-test-allowed-diff.h |   4 
 5 files changed, 4 deletions(-)

diff --git a/tests/data/acpi/virt/IORT b/tests/data/acpi/virt/IORT
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..521acefe9ba66706c5607321a82d330586f3f280
 100644
GIT binary patch
literal 124
zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0#
QRGb+i3L*dhhtM#y0PN=p0RR91

literal 0
HcmV?d1

diff --git a/tests/data/acpi/virt/IORT.memhp b/tests/data/acpi/virt/IORT.memhp
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..521acefe9ba66706c5607321a82d330586f3f280
 100644
GIT binary patch
literal 124
zcmebD4+^Pa00MR=e`k+i1*eDrX9XZ&1PX!JAesq?4S*O7Bw!2(4Uz`|CKCt^;wu0#
QRGb+i3L*dhhtM#y0PN=p0RR91

literal 0
HcmV?d1

diff --git a/tests/data/acpi/virt/IORT.numamem 
b/tests/data/acpi/virt/IORT.numamem
index 
e69de29bb2d1d6434b8b29ae775ad8c2e48c5391..521acefe9ba66706c5607321a82d330586f3f280
 100644
GIT binary patch
literal 124

[PATCH v9 8/9] hw/arm/virt: add ITS support in virt GIC

2021-09-10 Thread Shashi Mallela

Included creation of ITS as part of virt platform GIC
initialization. This Emulated ITS model now co-exists with kvm
ITS and is enabled in absence of kvm irq kernel support in a
platform.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
---
 hw/arm/virt.c | 29 +++--
 include/hw/arm/virt.h |  2 ++
 target/arm/kvm_arm.h  |  4 ++--
 3 files changed, 31 insertions(+), 4 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 73e9c6bb7c..1d59f0e59f 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -584,6 +584,12 @@ static void create_its(VirtMachineState *vms)
 const char *itsclass = its_class_name();
 DeviceState *dev;
 
+if (!strcmp(itsclass, "arm-gicv3-its")) {
+if (!vms->tcg_its) {
+itsclass = NULL;
+}
+}
+
 if (!itsclass) {
 /* Do nothing if not supported */
 return;
@@ -621,7 +627,7 @@ static void create_v2m(VirtMachineState *vms)
 vms->msi_controller = VIRT_MSI_CTRL_GICV2M;
 }
 
-static void create_gic(VirtMachineState *vms)
+static void create_gic(VirtMachineState *vms, MemoryRegion *mem)
 {
 MachineState *ms = MACHINE(vms);
 /* We create a standalone GIC */
@@ -655,6 +661,14 @@ static void create_gic(VirtMachineState *vms)
  nb_redist_regions);
 qdev_prop_set_uint32(vms->gic, "redist-region-count[0]", 
redist0_count);
 
+if (!kvm_irqchip_in_kernel()) {
+if (vms->tcg_its) {
+object_property_set_link(OBJECT(vms->gic), "sysmem",
+ OBJECT(mem), _fatal);
+qdev_prop_set_bit(vms->gic, "has-lpi", true);
+}
+}
+
 if (nb_redist_regions == 2) {
 uint32_t redist1_capacity =
 vms->memmap[VIRT_HIGH_GIC_REDIST2].size / 
GICV3_REDIST_SIZE;
@@ -2039,7 +2053,7 @@ static void machvirt_init(MachineState *machine)
 
 virt_flash_fdt(vms, sysmem, secure_sysmem ?: sysmem);
 
-create_gic(vms);
+create_gic(vms, sysmem);
 
 virt_cpu_post_init(vms, sysmem);
 
@@ -2742,6 +2756,12 @@ static void virt_instance_init(Object *obj)
 } else {
 /* Default allows ITS instantiation */
 vms->its = true;
+
+if (vmc->no_tcg_its) {
+vms->tcg_its = false;
+} else {
+vms->tcg_its = true;
+}
 }
 
 /* Default disallows iommu instantiation */
@@ -2791,8 +2811,13 @@ DEFINE_VIRT_MACHINE_AS_LATEST(6, 2)
 
 static void virt_machine_6_1_options(MachineClass *mc)
 {
+VirtMachineClass *vmc = VIRT_MACHINE_CLASS(OBJECT_CLASS(mc));
+
 virt_machine_6_2_options(mc);
 compat_props_add(mc->compat_props, hw_compat_6_1, hw_compat_6_1_len);
+
+/* qemu ITS was introduced with 6.2 */
+vmc->no_tcg_its = true;
 }
 DEFINE_VIRT_MACHINE(6, 1)
 
diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index 9661c46699..b461b8d261 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -120,6 +120,7 @@ struct VirtMachineClass {
 MachineClass parent;
 bool disallow_affinity_adjustment;
 bool no_its;
+bool no_tcg_its;
 bool no_pmu;
 bool claim_edge_triggered_timers;
 bool smbios_old_sys_ver;
@@ -141,6 +142,7 @@ struct VirtMachineState {
 bool highmem;
 bool highmem_ecam;
 bool its;
+bool tcg_its;
 bool virt;
 bool ras;
 bool mte;
diff --git a/target/arm/kvm_arm.h b/target/arm/kvm_arm.h
index 34f8daa377..0613454975 100644
--- a/target/arm/kvm_arm.h
+++ b/target/arm/kvm_arm.h
@@ -525,8 +525,8 @@ static inline const char *its_class_name(void)
 /* KVM implementation requires this capability */
 return kvm_direct_msi_enabled() ? "arm-its-kvm" : NULL;
 } else {
-/* Software emulation is not implemented yet */
-return NULL;
+/* Software emulation based model */
+return "arm-gicv3-its";
 }
 }
 
-- 
2.27.0

[PATCH v9 7/9] tests/data/acpi/virt: Add IORT files for ITS

2021-09-10 Thread Shashi Mallela

Added expected IORT files applicable with latest GICv3
ITS changes.Temporarily differences in these files are
okay.

Signed-off-by: Shashi Mallela 
Acked-by: Igor Mammedov 
Reviewed-by: Peter Maydell 
---
 tests/data/acpi/virt/IORT   | 0
 tests/data/acpi/virt/IORT.memhp | 0
 tests/data/acpi/virt/IORT.numamem   | 0
 tests/data/acpi/virt/IORT.pxb   | 0
 tests/qtest/bios-tables-test-allowed-diff.h | 4 
 5 files changed, 4 insertions(+)
 create mode 100644 tests/data/acpi/virt/IORT
 create mode 100644 tests/data/acpi/virt/IORT.memhp
 create mode 100644 tests/data/acpi/virt/IORT.numamem
 create mode 100644 tests/data/acpi/virt/IORT.pxb

diff --git a/tests/data/acpi/virt/IORT b/tests/data/acpi/virt/IORT
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/IORT.memhp b/tests/data/acpi/virt/IORT.memhp
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/IORT.numamem 
b/tests/data/acpi/virt/IORT.numamem
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/data/acpi/virt/IORT.pxb b/tests/data/acpi/virt/IORT.pxb
new file mode 100644
index 00..e69de29bb2
diff --git a/tests/qtest/bios-tables-test-allowed-diff.h 
b/tests/qtest/bios-tables-test-allowed-diff.h
index dfb8523c8b..2ef211df59 100644
--- a/tests/qtest/bios-tables-test-allowed-diff.h
+++ b/tests/qtest/bios-tables-test-allowed-diff.h
@@ -1 +1,5 @@
 /* List of comma-separated changed AML files to ignore */
+"tests/data/acpi/virt/IORT",
+"tests/data/acpi/virt/IORT.memhp",
+"tests/data/acpi/virt/IORT.numamem",
+"tests/data/acpi/virt/IORT.pxb",
-- 
2.27.0

[PATCH v9 6/9] hw/intc: GICv3 redistributor ITS processing

2021-09-10 Thread Shashi Mallela

Implemented lpi processing at redistributor to get lpi config info
from lpi configuration table,determine priority,set pending state in
lpi pending table and forward the lpi to cpuif.Added logic to invoke
redistributor lpi processing with translated LPI which set/clear LPI
from ITS device as part of ITS INT,CLEAR,DISCARD command and
GITS_TRANSLATER processing.

Signed-off-by: Shashi Mallela 
Tested-by: Neil Armstrong 
Reviewed-by: Peter Maydell 
---
 hw/intc/arm_gicv3.c|  14 +++
 hw/intc/arm_gicv3_common.c |   1 +
 hw/intc/arm_gicv3_cpuif.c  |   7 +-
 hw/intc/arm_gicv3_its.c|  23 +
 hw/intc/arm_gicv3_redist.c | 141 +
 hw/intc/gicv3_internal.h   |   9 ++
 include/hw/intc/arm_gicv3_common.h |   7 ++
 7 files changed, 200 insertions(+), 2 deletions(-)

diff --git a/hw/intc/arm_gicv3.c b/hw/intc/arm_gicv3.c
index d63f8af604..3f24707838 100644
--- a/hw/intc/arm_gicv3.c
+++ b/hw/intc/arm_gicv3.c
@@ -165,6 +165,16 @@ static void gicv3_redist_update_noirqset(GICv3CPUState *cs)
 cs->hppi.grp = gicv3_irq_group(cs->gic, cs, cs->hppi.irq);
 }
 
+if ((cs->gicr_ctlr & GICR_CTLR_ENABLE_LPIS) && cs->gic->lpi_enable &&
+(cs->hpplpi.prio != 0xff)) {
+if (irqbetter(cs, cs->hpplpi.irq, cs->hpplpi.prio)) {
+cs->hppi.irq = cs->hpplpi.irq;
+cs->hppi.prio = cs->hpplpi.prio;
+cs->hppi.grp = cs->hpplpi.grp;
+seenbetter = true;
+}
+}
+
 /* If the best interrupt we just found would preempt whatever
  * was the previous best interrupt before this update, then
  * we know it's definitely the best one now.
@@ -339,9 +349,13 @@ static void gicv3_set_irq(void *opaque, int irq, int level)
 
 static void arm_gicv3_post_load(GICv3State *s)
 {
+int i;
 /* Recalculate our cached idea of the current highest priority
  * pending interrupt, but don't set IRQ or FIQ lines.
  */
+for (i = 0; i < s->num_cpu; i++) {
+gicv3_redist_update_lpi(>cpu[i]);
+}
 gicv3_full_update_noirqset(s);
 /* Repopulate the cache of GICv3CPUState pointers for target CPUs */
 gicv3_cache_all_target_cpustates(s);
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 53dea2a775..223db16fec 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -435,6 +435,7 @@ static void arm_gicv3_common_reset(DeviceState *dev)
 memset(cs->gicr_ipriorityr, 0, sizeof(cs->gicr_ipriorityr));
 
 cs->hppi.prio = 0xff;
+cs->hpplpi.prio = 0xff;
 
 /* State in the CPU interface must *not* be reset here, because it
  * is part of the CPU's reset domain, not the GIC device's.
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index a032d505f5..462a35f66e 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -899,10 +899,12 @@ static void icc_activate_irq(GICv3CPUState *cs, int irq)
 cs->gicr_iactiver0 = deposit32(cs->gicr_iactiver0, irq, 1, 1);
 cs->gicr_ipendr0 = deposit32(cs->gicr_ipendr0, irq, 1, 0);
 gicv3_redist_update(cs);
-} else {
+} else if (irq < GICV3_LPI_INTID_START) {
 gicv3_gicd_active_set(cs->gic, irq);
 gicv3_gicd_pending_clear(cs->gic, irq);
 gicv3_update(cs->gic, irq, 1);
+} else {
+gicv3_redist_lpi_pending(cs, irq, 0);
 }
 }
 
@@ -1318,7 +1320,8 @@ static void icc_eoir_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
 trace_gicv3_icc_eoir_write(is_eoir0 ? 0 : 1,
gicv3_redist_affid(cs), value);
 
-if (irq >= cs->gic->num_irq) {
+if ((irq >= cs->gic->num_irq) &&
+!(cs->gic->lpi_enable && (irq >= GICV3_LPI_INTID_START))) {
 /* This handles two cases:
  * 1. If software writes the ID of a spurious interrupt [ie 1020-1023]
  * to the GICC_EOIR, the GIC ignores that write.
diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 03c6800997..efb1b5ecab 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -233,6 +233,7 @@ static bool process_its_cmd(GICv3ITSState *s, uint64_t 
value, uint32_t offset,
 uint64_t cte = 0;
 bool cte_valid = false;
 bool result = false;
+uint64_t rdbase;
 
 if (cmd == NONE) {
 devid = offset;
@@ -293,6 +294,18 @@ static bool process_its_cmd(GICv3ITSState *s, uint64_t 
value, uint32_t offset,
  * Current implementation only supports rdbase == procnum
  * Hence rdbase physical address is ignored
  */
+rdbase = (cte & GITS_CTE_RDBASE_PROCNUM_MASK) >> 1U;
+
+if (rdbase > s->gicv3->num_cpu) {
+return result;
+}
+
+if ((cmd == CLEAR) || (cmd == DISCARD)) {
+gicv3_redist_process_lpi(>gicv3->cpu[rdbase], pIntid, 0);
+} else {
+gicv3_redist_process_lpi(>gicv3->cpu[rdbase], pIntid, 1);
+}
+
 if (cmd ==

[PATCH v9 3/9] hw/intc: GICv3 ITS command queue framework

2021-09-10 Thread Shashi Mallela

Added functionality to trigger ITS command queue processing on
write to CWRITE register and process each command queue entry to
identify the command type and handle commands like MAPD,MAPC,SYNC.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
Reviewed-by: Eric Auger 
Tested-by: Neil Armstrong 
---
 hw/intc/arm_gicv3_its.c  | 319 +++
 hw/intc/gicv3_internal.h |  40 +
 2 files changed, 359 insertions(+)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 8234939ccc..fcd152271a 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -50,6 +50,318 @@ static uint64_t baser_base_addr(uint64_t value, uint32_t 
page_sz)
 return result;
 }
 
+static bool update_cte(GICv3ITSState *s, uint16_t icid, bool valid,
+   uint64_t rdbase)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint64_t value;
+uint64_t l2t_addr;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t max_l2_entries;
+uint64_t cte = 0;
+MemTxResult res = MEMTX_OK;
+
+if (!s->ct.valid) {
+return true;
+}
+
+if (valid) {
+/* add mapping entry to collection table */
+cte = (valid & TABLE_ENTRY_VALID_MASK) | (rdbase << 1ULL);
+}
+
+/*
+ * The specification defines the format of level 1 entries of a
+ * 2-level table, but the format of level 2 entries and the format
+ * of flat-mapped tables is IMPDEF.
+ */
+if (s->ct.indirect) {
+l2t_id = icid / (s->ct.page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ s->ct.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, );
+
+if (res != MEMTX_OK) {
+return false;
+}
+
+valid_l2t = (value & L2_TABLE_VALID_MASK) != 0;
+
+if (valid_l2t) {
+max_l2_entries = s->ct.page_sz / s->ct.entry_sz;
+
+l2t_addr = value & ((1ULL << 51) - 1);
+
+address_space_stq_le(as, l2t_addr +
+ ((icid % max_l2_entries) * GITS_CTE_SIZE),
+ cte, MEMTXATTRS_UNSPECIFIED, );
+}
+} else {
+/* Flat level table */
+address_space_stq_le(as, s->ct.base_addr + (icid * GITS_CTE_SIZE),
+ cte, MEMTXATTRS_UNSPECIFIED, );
+}
+if (res != MEMTX_OK) {
+return false;
+} else {
+return true;
+}
+}
+
+static bool process_mapc(GICv3ITSState *s, uint32_t offset)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint16_t icid;
+uint64_t rdbase;
+bool valid;
+MemTxResult res = MEMTX_OK;
+bool result = false;
+uint64_t value;
+
+offset += NUM_BYTES_IN_DW;
+offset += NUM_BYTES_IN_DW;
+
+value = address_space_ldq_le(as, s->cq.base_addr + offset,
+ MEMTXATTRS_UNSPECIFIED, );
+
+if (res != MEMTX_OK) {
+return result;
+}
+
+icid = value & ICID_MASK;
+
+rdbase = (value & R_MAPC_RDBASE_MASK) >> R_MAPC_RDBASE_SHIFT;
+rdbase &= RDBASE_PROCNUM_MASK;
+
+valid = (value & CMD_FIELD_VALID_MASK);
+
+if ((icid > s->ct.maxids.max_collids) || (rdbase > s->gicv3->num_cpu)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "ITS MAPC: invalid collection table attributes "
+  "icid %d rdbase %lu\n",  icid, rdbase);
+/*
+ * in this implementation, in case of error
+ * we ignore this command and move onto the next
+ * command in the queue
+ */
+} else {
+result = update_cte(s, icid, valid, rdbase);
+}
+
+return result;
+}
+
+static bool update_dte(GICv3ITSState *s, uint32_t devid, bool valid,
+   uint8_t size, uint64_t itt_addr)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint64_t value;
+uint64_t l2t_addr;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t max_l2_entries;
+uint64_t dte = 0;
+MemTxResult res = MEMTX_OK;
+
+if (s->dt.valid) {
+if (valid) {
+/* add mapping entry to device table */
+dte = (valid & TABLE_ENTRY_VALID_MASK) |
+  ((size & SIZE_MASK) << 1U) |
+  (itt_addr << GITS_DTE_ITTADDR_SHIFT);
+}
+} else {
+return true;
+}
+
+/*
+ * The specification defines the format of level 1 entries of a
+ * 2-level table, but the format of level 2 entries and the format
+ * of flat-mapped tables is IMPDEF.
+ */
+if (s->dt.indirect) {
+l2t_id = devid / (s->dt.page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ s->dt.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, );
+
+if (res != MEMTX_OK) {
+

[PATCH v9 4/9] hw/intc: GICv3 ITS Command processing

2021-09-10 Thread Shashi Mallela

Added ITS command queue handling for MAPTI,MAPI commands,handled ITS
translation which triggers an LPI via INT command as well as write
to GITS_TRANSLATER register,defined enum to differentiate between ITS
command interrupt trigger and GITS_TRANSLATER based interrupt trigger.
Each of these commands make use of other functionalities implemented to
get device table entry,collection table entry or interrupt translation
table entry required for their processing.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
---
 hw/intc/arm_gicv3_its.c| 365 -
 hw/intc/gicv3_internal.h   |  12 +
 include/hw/intc/arm_gicv3_common.h |   2 +
 3 files changed, 378 insertions(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index fcd152271a..03c6800997 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -29,6 +29,22 @@ struct GICv3ITSClass {
 void (*parent_reset)(DeviceState *dev);
 };
 
+/*
+ * This is an internal enum used to distinguish between LPI triggered
+ * via command queue and LPI triggered via gits_translater write.
+ */
+typedef enum ItsCmdType {
+NONE = 0, /* internal indication for GITS_TRANSLATER write */
+CLEAR = 1,
+DISCARD = 2,
+INT = 3,
+} ItsCmdType;
+
+typedef struct {
+uint32_t iteh;
+uint64_t itel;
+} IteEntry;
+
 static uint64_t baser_base_addr(uint64_t value, uint32_t page_sz)
 {
 uint64_t result = 0;
@@ -50,6 +66,329 @@ static uint64_t baser_base_addr(uint64_t value, uint32_t 
page_sz)
 return result;
 }
 
+static bool get_cte(GICv3ITSState *s, uint16_t icid, uint64_t *cte,
+MemTxResult *res)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint64_t l2t_addr;
+uint64_t value;
+bool valid_l2t;
+uint32_t l2t_id;
+uint32_t max_l2_entries;
+
+if (s->ct.indirect) {
+l2t_id = icid / (s->ct.page_sz / L1TABLE_ENTRY_SIZE);
+
+value = address_space_ldq_le(as,
+ s->ct.base_addr +
+ (l2t_id * L1TABLE_ENTRY_SIZE),
+ MEMTXATTRS_UNSPECIFIED, res);
+
+if (*res == MEMTX_OK) {
+valid_l2t = (value & L2_TABLE_VALID_MASK) != 0;
+
+if (valid_l2t) {
+max_l2_entries = s->ct.page_sz / s->ct.entry_sz;
+
+l2t_addr = value & ((1ULL << 51) - 1);
+
+*cte =  address_space_ldq_le(as, l2t_addr +
+((icid % max_l2_entries) * GITS_CTE_SIZE),
+MEMTXATTRS_UNSPECIFIED, res);
+   }
+   }
+} else {
+/* Flat level table */
+*cte =  address_space_ldq_le(as, s->ct.base_addr +
+ (icid * GITS_CTE_SIZE),
+  MEMTXATTRS_UNSPECIFIED, res);
+}
+
+return (*cte & TABLE_ENTRY_VALID_MASK) != 0;
+}
+
+static bool update_ite(GICv3ITSState *s, uint32_t eventid, uint64_t dte,
+   IteEntry ite)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint64_t itt_addr;
+MemTxResult res = MEMTX_OK;
+
+itt_addr = (dte & GITS_DTE_ITTADDR_MASK) >> GITS_DTE_ITTADDR_SHIFT;
+itt_addr <<= ITTADDR_SHIFT; /* 256 byte aligned */
+
+address_space_stq_le(as, itt_addr + (eventid * (sizeof(uint64_t) +
+ sizeof(uint32_t))), ite.itel, MEMTXATTRS_UNSPECIFIED,
+ );
+
+if (res == MEMTX_OK) {
+address_space_stl_le(as, itt_addr + (eventid * (sizeof(uint64_t) +
+ sizeof(uint32_t))) + sizeof(uint32_t), ite.iteh,
+ MEMTXATTRS_UNSPECIFIED, );
+}
+if (res != MEMTX_OK) {
+return false;
+} else {
+return true;
+}
+}
+
+static bool get_ite(GICv3ITSState *s, uint32_t eventid, uint64_t dte,
+uint16_t *icid, uint32_t *pIntid, MemTxResult *res)
+{
+AddressSpace *as = >gicv3->dma_as;
+uint64_t itt_addr;
+bool status = false;
+IteEntry ite = {};
+
+itt_addr = (dte & GITS_DTE_ITTADDR_MASK) >> GITS_DTE_ITTADDR_SHIFT;
+itt_addr <<= ITTADDR_SHIFT; /* 256 byte aligned */
+
+ite.itel = address_space_ldq_le(as, itt_addr +
+(eventid * (sizeof(uint64_t) +
+sizeof(uint32_t))), MEMTXATTRS_UNSPECIFIED,
+res);
+
+if (*res == MEMTX_OK) {
+ite.iteh = address_space_ldl_le(as, itt_addr +
+(eventid * (sizeof(uint64_t) +
+sizeof(uint32_t))) + sizeof(uint32_t),
+MEMTXATTRS_UNSPECIFIED, res);
+
+if (*res == MEMTX_OK) {
+if (ite.itel & TABLE_ENTRY_VALID_MASK) {
+if ((ite.itel >> ITE_ENTRY_INTTYPE_SHIFT) &
+GITS_TYPE_PHYSICAL) {
+

[PATCH v9 5/9] hw/intc: GICv3 ITS Feature enablement

2021-09-10 Thread Shashi Mallela

Added properties to enable ITS feature and define qemu system
address space memory in gicv3 common,setup distributor and
redistributor registers to indicate LPI support.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
Tested-by: Neil Armstrong 
---
 hw/intc/arm_gicv3_common.c | 12 
 hw/intc/arm_gicv3_dist.c   |  5 -
 hw/intc/arm_gicv3_redist.c | 12 +---
 hw/intc/gicv3_internal.h   |  2 ++
 include/hw/intc/arm_gicv3_common.h |  1 +
 5 files changed, 28 insertions(+), 4 deletions(-)

diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 58ef65f589..53dea2a775 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -345,6 +345,11 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
 return;
 }
 
+if (s->lpi_enable && !s->dma) {
+error_setg(errp, "Redist-ITS: Guest 'sysmem' reference link not set");
+return;
+}
+
 s->cpu = g_new0(GICv3CPUState, s->num_cpu);
 
 for (i = 0; i < s->num_cpu; i++) {
@@ -381,6 +386,10 @@ static void arm_gicv3_common_realize(DeviceState *dev, 
Error **errp)
 (1 << 24) |
 (i << 8) |
 (last << 4);
+
+if (s->lpi_enable) {
+s->cpu[i].gicr_typer |= GICR_TYPER_PLPIS;
+}
 }
 }
 
@@ -494,9 +503,12 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("num-cpu", GICv3State, num_cpu, 1),
 DEFINE_PROP_UINT32("num-irq", GICv3State, num_irq, 32),
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
+DEFINE_PROP_BOOL("has-lpi", GICv3State, lpi_enable, 0),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
 DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
   redist_region_count, qdev_prop_uint32, uint32_t),
+DEFINE_PROP_LINK("sysmem", GICv3State, dma, TYPE_MEMORY_REGION,
+ MemoryRegion *),
 DEFINE_PROP_END_OF_LIST(),
 };
 
diff --git a/hw/intc/arm_gicv3_dist.c b/hw/intc/arm_gicv3_dist.c
index 5beb7c4235..4164500ea9 100644
--- a/hw/intc/arm_gicv3_dist.c
+++ b/hw/intc/arm_gicv3_dist.c
@@ -384,7 +384,9 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
  * A3V == 1 (non-zero values of Affinity level 3 supported)
  * IDbits == 0xf (we support 16-bit interrupt identifiers)
  * DVIS == 0 (Direct virtual LPI injection not supported)
- * LPIS == 0 (LPIs not supported)
+ * LPIS == 1 (LPIs are supported if affinity routing is enabled)
+ * num_LPIs == 0b0 (bits [15:11],Number of LPIs as indicated
+ *  by GICD_TYPER.IDbits)
  * MBIS == 0 (message-based SPIs not supported)
  * SecurityExtn == 1 if security extns supported
  * CPUNumber == 0 since for us ARE is always 1
@@ -399,6 +401,7 @@ static bool gicd_readl(GICv3State *s, hwaddr offset,
 bool sec_extn = !(s->gicd_ctlr & GICD_CTLR_DS);
 
 *data = (1 << 25) | (1 << 24) | (sec_extn << 10) |
+(s->lpi_enable << GICD_TYPER_LPIS_SHIFT) |
 (0xf << 19) | itlinesnumber;
 return true;
 }
diff --git a/hw/intc/arm_gicv3_redist.c b/hw/intc/arm_gicv3_redist.c
index 53da703ed8..2108abfe9c 100644
--- a/hw/intc/arm_gicv3_redist.c
+++ b/hw/intc/arm_gicv3_redist.c
@@ -248,10 +248,16 @@ static MemTxResult gicr_writel(GICv3CPUState *cs, hwaddr 
offset,
 case GICR_CTLR:
 /* For our implementation, GICR_TYPER.DPGS is 0 and so all
  * the DPG bits are RAZ/WI. We don't do anything asynchronously,
- * so UWP and RWP are RAZ/WI. And GICR_TYPER.LPIS is 0 (we don't
- * implement LPIs) so Enable_LPIs is RES0. So there are no writable
- * bits for us.
+ * so UWP and RWP are RAZ/WI. GICR_TYPER.LPIS is 1 (we
+ * implement LPIs) so Enable_LPIs is programmable.
  */
+if (cs->gicr_typer & GICR_TYPER_PLPIS) {
+if (value & GICR_CTLR_ENABLE_LPIS) {
+cs->gicr_ctlr |= GICR_CTLR_ENABLE_LPIS;
+} else {
+cs->gicr_ctlr &= ~GICR_CTLR_ENABLE_LPIS;
+}
+}
 return MEMTX_OK;
 case GICR_STATUSR:
 /* RAZ/WI for our implementation */
diff --git a/hw/intc/gicv3_internal.h b/hw/intc/gicv3_internal.h
index 1966444790..530d1c1789 100644
--- a/hw/intc/gicv3_internal.h
+++ b/hw/intc/gicv3_internal.h
@@ -68,6 +68,8 @@
 #define GICD_CTLR_E1NWF (1U << 7)
 #define GICD_CTLR_RWP   (1U << 31)
 
+#define GICD_TYPER_LPIS_SHIFT  17
+
 /* 16 bits EventId */
 #define GICD_TYPER_IDBITS0xf
 
diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index 0715b0bc2a..c1348cc60a 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -221,6 +221,7 @@ struct GICv3State {
 uint32_t num_cpu;
 uint32_t num_irq;

[PATCH v9 2/9] hw/intc: GICv3 ITS register definitions added

2021-09-10 Thread Shashi Mallela

Defined descriptors for ITS device table,collection table and ITS
command queue entities.Implemented register read/write functions,
extract ITS table parameters and command queue parameters,extended
gicv3 common to capture qemu address space(which host the ITS table
platform memories required for subsequent ITS processing) and
initialize the same in ITS device.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
Reviewed-by: Eric Auger 
Tested-by: Neil Armstrong 
---
 hw/intc/arm_gicv3_its.c| 376 +
 hw/intc/gicv3_internal.h   |  29 ++
 include/hw/intc/arm_gicv3_common.h |   3 +
 include/hw/intc/arm_gicv3_its_common.h |  23 ++
 4 files changed, 431 insertions(+)

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
index 83ece7c4c1..8234939ccc 100644
--- a/hw/intc/arm_gicv3_its.c
+++ b/hw/intc/arm_gicv3_its.c
@@ -29,6 +29,160 @@ struct GICv3ITSClass {
 void (*parent_reset)(DeviceState *dev);
 };
 
+static uint64_t baser_base_addr(uint64_t value, uint32_t page_sz)
+{
+uint64_t result = 0;
+
+switch (page_sz) {
+case GITS_PAGE_SIZE_4K:
+case GITS_PAGE_SIZE_16K:
+result = FIELD_EX64(value, GITS_BASER, PHYADDR) << 12;
+break;
+
+case GITS_PAGE_SIZE_64K:
+result = FIELD_EX64(value, GITS_BASER, PHYADDRL_64K) << 16;
+result |= FIELD_EX64(value, GITS_BASER, PHYADDRH_64K) << 48;
+break;
+
+default:
+break;
+}
+return result;
+}
+
+/*
+ * This function extracts the ITS Device and Collection table specific
+ * parameters (like base_addr, size etc) from GITS_BASER register.
+ * It is called during ITS enable and also during post_load migration
+ */
+static void extract_table_params(GICv3ITSState *s)
+{
+uint16_t num_pages = 0;
+uint8_t  page_sz_type;
+uint8_t type;
+uint32_t page_sz = 0;
+uint64_t value;
+
+for (int i = 0; i < 8; i++) {
+value = s->baser[i];
+
+if (!value) {
+continue;
+}
+
+page_sz_type = FIELD_EX64(value, GITS_BASER, PAGESIZE);
+
+switch (page_sz_type) {
+case 0:
+page_sz = GITS_PAGE_SIZE_4K;
+break;
+
+case 1:
+page_sz = GITS_PAGE_SIZE_16K;
+break;
+
+case 2:
+case 3:
+page_sz = GITS_PAGE_SIZE_64K;
+break;
+
+default:
+g_assert_not_reached();
+}
+
+num_pages = FIELD_EX64(value, GITS_BASER, SIZE) + 1;
+
+type = FIELD_EX64(value, GITS_BASER, TYPE);
+
+switch (type) {
+
+case GITS_BASER_TYPE_DEVICE:
+memset(>dt, 0 , sizeof(s->dt));
+s->dt.valid = FIELD_EX64(value, GITS_BASER, VALID);
+
+if (!s->dt.valid) {
+return;
+}
+
+s->dt.page_sz = page_sz;
+s->dt.indirect = FIELD_EX64(value, GITS_BASER, INDIRECT);
+s->dt.entry_sz = FIELD_EX64(value, GITS_BASER, ENTRYSIZE);
+
+if (!s->dt.indirect) {
+s->dt.max_entries = (num_pages * page_sz) / s->dt.entry_sz;
+} else {
+s->dt.max_entries = (((num_pages * page_sz) /
+ L1TABLE_ENTRY_SIZE) *
+ (page_sz / s->dt.entry_sz));
+}
+
+s->dt.maxids.max_devids = (1UL << (FIELD_EX64(s->typer, GITS_TYPER,
+   DEVBITS) + 1));
+
+s->dt.base_addr = baser_base_addr(value, page_sz);
+
+break;
+
+case GITS_BASER_TYPE_COLLECTION:
+memset(>ct, 0 , sizeof(s->ct));
+s->ct.valid = FIELD_EX64(value, GITS_BASER, VALID);
+
+/*
+ * GITS_TYPER.HCC is 0 for this implementation
+ * hence writes are discarded if ct.valid is 0
+ */
+if (!s->ct.valid) {
+return;
+}
+
+s->ct.page_sz = page_sz;
+s->ct.indirect = FIELD_EX64(value, GITS_BASER, INDIRECT);
+s->ct.entry_sz = FIELD_EX64(value, GITS_BASER, ENTRYSIZE);
+
+if (!s->ct.indirect) {
+s->ct.max_entries = (num_pages * page_sz) / s->ct.entry_sz;
+} else {
+s->ct.max_entries = (((num_pages * page_sz) /
+ L1TABLE_ENTRY_SIZE) *
+ (page_sz / s->ct.entry_sz));
+}
+
+if (FIELD_EX64(s->typer, GITS_TYPER, CIL)) {
+s->ct.maxids.max_collids = (1UL << (FIELD_EX64(s->typer,
+GITS_TYPER, CIDBITS) + 1));
+} else {
+/* 16-bit CollectionId supported when CIL == 0 */
+s->ct.maxids.max_collids = (1UL << 16);
+}
+
+s->ct.base_addr = baser_base_addr(value, page_sz);
+
+break;
+
+default:
+break;
+}
+}

[PATCH v9 0/9] GICv3 LPI and ITS feature implementation

2021-09-10 Thread Shashi Mallela

This patchset implements qemu device model for enabling physical
LPI support and ITS functionality in GIC as per GICv3 specification.
Both flat table and 2 level tables are implemented.The ITS commands
for adding/deleting ITS table entries,trigerring LPI interrupts are
implemented.Translated LPI interrupt ids are processed by redistributor
to determine priority and set pending state appropriately before
forwarding the same to cpu interface.
The ITS feature support has been added to virt platform,wherein the
emulated functionality co-exists with kvm kernel functionality.

Changes in v9:
 - removed sbsa-ref patch due to inconclusive ongoing discussion
   regarding ITS placement and version in sbsa-ref platform.This will be
   taken up as a separate functionality later
 - updated its enable code as per latest virt 6.2
 - updated its code to replace usage of MEMTX_ with bool
 - All kvm_unit_tests PASS
 - Verified Linux Boot functionality

Shashi Mallela (9):
  hw/intc: GICv3 ITS initial framework
  hw/intc: GICv3 ITS register definitions added
  hw/intc: GICv3 ITS command queue framework
  hw/intc: GICv3 ITS Command processing
  hw/intc: GICv3 ITS Feature enablement
  hw/intc: GICv3 redistributor ITS processing
  tests/data/acpi/virt: Add IORT files for ITS
  hw/arm/virt: add ITS support in virt GIC
  tests/data/acpi/virt: Update IORT files for ITS

 hw/arm/virt.c  |   29 +-
 hw/intc/arm_gicv3.c|   14 +
 hw/intc/arm_gicv3_common.c |   13 +
 hw/intc/arm_gicv3_cpuif.c  |7 +-
 hw/intc/arm_gicv3_dist.c   |5 +-
 hw/intc/arm_gicv3_its.c| 1322 
 hw/intc/arm_gicv3_its_common.c |7 +-
 hw/intc/arm_gicv3_its_kvm.c|2 +-
 hw/intc/arm_gicv3_redist.c |  153 ++-
 hw/intc/gicv3_internal.h   |  188 +++-
 hw/intc/meson.build|1 +
 include/hw/arm/virt.h  |2 +
 include/hw/intc/arm_gicv3_common.h |   13 +
 include/hw/intc/arm_gicv3_its_common.h |   32 +-
 target/arm/kvm_arm.h   |4 +-
 tests/data/acpi/virt/IORT  |  Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.memhp|  Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.numamem  |  Bin 0 -> 124 bytes
 tests/data/acpi/virt/IORT.pxb  |  Bin 0 -> 124 bytes
 19 files changed, 1768 insertions(+), 24 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_its.c
 create mode 100644 tests/data/acpi/virt/IORT
 create mode 100644 tests/data/acpi/virt/IORT.memhp
 create mode 100644 tests/data/acpi/virt/IORT.numamem
 create mode 100644 tests/data/acpi/virt/IORT.pxb

--
2.27.0

[PATCH v9 1/9] hw/intc: GICv3 ITS initial framework

2021-09-10 Thread Shashi Mallela

Added register definitions relevant to ITS,implemented overall
ITS device framework with stubs for ITS control and translater
regions read/write,extended ITS common to handle mmio init between
existing kvm device and newer qemu device.

Signed-off-by: Shashi Mallela 
Reviewed-by: Peter Maydell 
Reviewed-by: Eric Auger 
Tested-by: Neil Armstrong 
---
 hw/intc/arm_gicv3_its.c| 241 +
 hw/intc/arm_gicv3_its_common.c |   7 +-
 hw/intc/arm_gicv3_its_kvm.c|   2 +-
 hw/intc/gicv3_internal.h   |  96 +-
 hw/intc/meson.build|   1 +
 include/hw/intc/arm_gicv3_its_common.h |   9 +-
 6 files changed, 342 insertions(+), 14 deletions(-)
 create mode 100644 hw/intc/arm_gicv3_its.c

diff --git a/hw/intc/arm_gicv3_its.c b/hw/intc/arm_gicv3_its.c
new file mode 100644
index 00..83ece7c4c1
--- /dev/null
+++ b/hw/intc/arm_gicv3_its.c
@@ -0,0 +1,241 @@
+/*
+ * ITS emulation for a GICv3-based system
+ *
+ * Copyright Linaro.org 2021
+ *
+ * Authors:
+ *  Shashi Mallela 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or (at your
+ * option) any later version.  See the COPYING file in the top-level directory.
+ *
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "hw/qdev-properties.h"
+#include "hw/intc/arm_gicv3_its_common.h"
+#include "gicv3_internal.h"
+#include "qom/object.h"
+#include "qapi/error.h"
+
+typedef struct GICv3ITSClass GICv3ITSClass;
+/* This is reusing the GICv3ITSState typedef from ARM_GICV3_ITS_COMMON */
+DECLARE_OBJ_CHECKERS(GICv3ITSState, GICv3ITSClass,
+ ARM_GICV3_ITS, TYPE_ARM_GICV3_ITS)
+
+struct GICv3ITSClass {
+GICv3ITSCommonClass parent_class;
+void (*parent_reset)(DeviceState *dev);
+};
+
+static MemTxResult gicv3_its_translation_write(void *opaque, hwaddr offset,
+   uint64_t data, unsigned size,
+   MemTxAttrs attrs)
+{
+return MEMTX_OK;
+}
+
+static bool its_writel(GICv3ITSState *s, hwaddr offset,
+  uint64_t value, MemTxAttrs attrs)
+{
+bool result = true;
+
+return result;
+}
+
+static bool its_readl(GICv3ITSState *s, hwaddr offset,
+ uint64_t *data, MemTxAttrs attrs)
+{
+bool result = true;
+
+return result;
+}
+
+static bool its_writell(GICv3ITSState *s, hwaddr offset,
+   uint64_t value, MemTxAttrs attrs)
+{
+bool result = true;
+
+return result;
+}
+
+static bool its_readll(GICv3ITSState *s, hwaddr offset,
+  uint64_t *data, MemTxAttrs attrs)
+{
+bool result = true;
+
+return result;
+}
+
+static MemTxResult gicv3_its_read(void *opaque, hwaddr offset, uint64_t *data,
+  unsigned size, MemTxAttrs attrs)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+bool result;
+
+switch (size) {
+case 4:
+result = its_readl(s, offset, data, attrs);
+break;
+case 8:
+result = its_readll(s, offset, data, attrs);
+break;
+default:
+result = false;
+break;
+}
+
+if (!result) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid guest read at offset " TARGET_FMT_plx
+  "size %u\n", __func__, offset, size);
+/*
+ * The spec requires that reserved registers are RAZ/WI;
+ * so use false returns from leaf functions as a way to
+ * trigger the guest-error logging but don't return it to
+ * the caller, or we'll cause a spurious guest data abort.
+ */
+*data = 0;
+}
+return MEMTX_OK;
+}
+
+static MemTxResult gicv3_its_write(void *opaque, hwaddr offset, uint64_t data,
+   unsigned size, MemTxAttrs attrs)
+{
+GICv3ITSState *s = (GICv3ITSState *)opaque;
+bool result;
+
+switch (size) {
+case 4:
+result = its_writel(s, offset, data, attrs);
+break;
+case 8:
+result = its_writell(s, offset, data, attrs);
+break;
+default:
+result = false;
+break;
+}
+
+if (!result) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: invalid guest write at offset " TARGET_FMT_plx
+  "size %u\n", __func__, offset, size);
+/*
+ * The spec requires that reserved registers are RAZ/WI;
+ * so use false returns from leaf functions as a way to
+ * trigger the guest-error logging but don't return it to
+ * the caller, or we'll cause a spurious guest data abort.
+ */
+}
+return MEMTX_OK;
+}
+
+static const MemoryRegionOps gicv3_its_control_ops = {
+.read_with_attrs = gicv3_its_read,
+.write_with_attrs = gicv3_its_write,
+.valid.min_access_size = 4,
+.valid.max_access_size = 8,
+.impl.min_access_size = 4,
+

Re: [PATCH v2] block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()

2021-09-10 Thread Hanna Reitz


On 10.09.21 14:45, Stefano Garzarella wrote:

In mirror_iteration() we call mirror_wait_on_conflicts() with
`self` parameter set to NULL.

Starting from commit d44dae1a7c we dereference `self` pointer in
mirror_wait_on_conflicts() without checks if it is not NULL.

Backtrace:
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  mirror_wait_on_conflicts (self=0x0, s=, offset=, 
bytes=)
   at ../block/mirror.c:172
   172  self->waiting_for_op = op;
   [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
   (gdb) bt
   #0  mirror_wait_on_conflicts (self=0x0, s=, offset=, 
bytes=)
   at ../block/mirror.c:172
   #1  0x5610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=) at ../block/mirror.c:491
   #2  0x5610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at 
../job.c:917
   #3  0x5610c5f046c6 in coroutine_trampoline (i0=, 
i1=)
   at ../util/coroutine-ucontext.c:173
   #4  0x7f0909975820 in ?? () at 
../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
   from /usr/lib64/libc.so.6

Buglink: https://bugzilla.redhat.com/show_bug.cgi?id=2001404
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in 
mirror_wait_on_conflicts")
Signed-off-by: Stefano Garzarella 
---
v2:
- moved "if (op->waiting_for_op) {continue;}" code into "if (self)" too.
   [Vladimir]
---
  block/mirror.c | 25 -
  1 file changed, 16 insertions(+), 9 deletions(-)


Thanks, applied to my block branch:

https://github.com/XanClic/qemu/commits/block

Hanna

Re: [PATCH v2 0/3] The HMP/QMP interfaces in Qemu SGX

2021-09-10 Thread Daniel P . Berrangé

On Fri, Sep 10, 2021 at 04:15:21PM +0200, Paolo Bonzini wrote:
> On 10/09/21 12:22, Yang Zhong wrote:
> > This patchset supply HMP/QMP interfaces to monitor and Libvirt, with
> > those interfaces, we can check the SGX info from VM side or check
> > host SGX capabilities from Libvirt side.
> > 
> > This patchset is splitted from below link(from patch26 to patch30):
> > https://patchew.org/QEMU/20210719112136.57018-1-yang.zh...@intel.com/
> > 
> > The rest patches are being pulled by Paolo's below link and this new
> > patchset is based on it.
> > https://gitlab.com/bonzini/qemu.git tags/for-upstream
> 
> Queued 1-2, thanks.

I had just posted a bunch of comments on patch 1 ...


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/2] tests: add migrate-during-backup

2021-09-10 Thread Hanna Reitz


On 10.09.21 13:00, Vladimir Sementsov-Ogievskiy wrote:

Add a simple test which tries to run migration during backup.
bdrv_inactivate_all() should fail. But due to bug (see next commit with
fix) it doesn't, nodes are inactivated and continued backup crashes
on assertion "assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  .../qemu-iotests/tests/migrate-during-backup  | 87 +++
  .../tests/migrate-during-backup.out   |  5 ++
  2 files changed, 92 insertions(+)
  create mode 100755 tests/qemu-iotests/tests/migrate-during-backup
  create mode 100644 tests/qemu-iotests/tests/migrate-during-backup.out

diff --git a/tests/qemu-iotests/tests/migrate-during-backup 
b/tests/qemu-iotests/tests/migrate-during-backup
new file mode 100755
index 00..c3b7f1983d
--- /dev/null
+++ b/tests/qemu-iotests/tests/migrate-during-backup
@@ -0,0 +1,87 @@
+#!/usr/bin/env python3
+# group: migration disabled
+#
+# Copyright (c) 2021 Virtuozzo International GmbH
+#
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 2 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see .
+#
+
+import os
+import iotests
+from iotests import qemu_img_create, qemu_io
+
+
+disk_a = os.path.join(iotests.test_dir, 'disk_a')
+disk_b = os.path.join(iotests.test_dir, 'disk_b')
+size = '1M'
+mig_file = os.path.join(iotests.test_dir, 'mig_file')
+mig_cmd = 'exec: cat > ' + mig_file
+
+
+class TestMigrateDuringBackup(iotests.QMPTestCase):
+def tearDown(self):
+self.vm.shutdown()
+os.remove(disk_a)
+os.remove(disk_b)
+os.remove(mig_file)
+
+def setUp(self):
+qemu_img_create('-f', iotests.imgfmt, disk_a, size)
+qemu_img_create('-f', iotests.imgfmt, disk_b, size)
+qemu_io('-c', f'write 0 {size}', disk_a)
+
+self.vm = iotests.VM().add_drive(disk_a)
+self.vm.launch()
+result = self.vm.qmp('blockdev-add', {
+'node-name': 'target',
+'driver': iotests.imgfmt,
+'file': {
+'driver': 'file',
+'filename': disk_b
+}
+})
+self.assert_qmp(result, 'return', {})
+
+def test_migrate(self):
+result = self.vm.qmp('blockdev-backup', device='drive0',
+ target='target', sync='full',
+ speed=1, x_perf={
+ 'max-workers': 1,
+ 'max-chunk': 64 * 1024
+ })
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('job-pause', id='drive0')
+self.assert_qmp(result, 'return', {})
+
+result = self.vm.qmp('migrate-set-capabilities',
+ capabilities=[{'capability': 'events',
+'state': True}])
+self.assert_qmp(result, 'return', {})
+result = self.vm.qmp('migrate', uri=mig_cmd)
+self.assert_qmp(result, 'return', {})
+
+self.vm.events_wait((('MIGRATION', {'data': {'status': 'completed'}}),
+ ('MIGRATION', {'data': {'status': 'failed'}})))


So the migration failing is the result we expect here, right? Perhaps we 
should then have a loop that waits for MIGRATION events, and breaks on 
both status=completed and status=failed, but logs an error if the 
migration completes unexpectedly.


While I’ll give a

Reviewed-by: Hanna Reitz 

either way, I’d like to know your opinion on this still.

Hanna


+
+result = self.vm.qmp('block-job-set-speed', device='drive0',
+ speed=0)
+self.assert_qmp(result, 'return', {})
+result = self.vm.qmp('job-resume', id='drive0')
+self.assert_qmp(result, 'return', {})
+
+
+if __name__ == '__main__':
+iotests.main(supported_fmts=['qcow2'],
+ supported_protocols=['file'])
diff --git a/tests/qemu-iotests/tests/migrate-during-backup.out 
b/tests/qemu-iotests/tests/migrate-during-backup.out
new file mode 100644
index 00..ae1213e6f8
--- /dev/null
+++ b/tests/qemu-iotests/tests/migrate-during-backup.out
@@ -0,0 +1,5 @@
+.
+--
+Ran 1 tests
+
+OK

Re: [PATCH 2/2] block: bdrv_inactivate_recurse(): check for permissions and fix crash

2021-09-10 Thread Hanna Reitz


On 10.09.21 13:01, Vladimir Sementsov-Ogievskiy wrote:

We must not inactivate child when parent has write permissions on
it.

Calling .bdrv_inactivate() doesn't help: actually only qcow2 has this
handler and it is used to flush caches, not for permission
manipulations.


I guess we could ask whether block jobs should implement 
.bdrv_inactivate() to cancel themselves, but I believe it’s indeed 
better to have the migration fail and thus force the user to manually 
cancel the job (should that be what they want).



So, let's simply check cumulative parent permissions before
inactivating the node.

This commit fixes a crash when we do migration during backup: prior to
the commit nothing prevents all nodes inactivation at migration finish
and following backup write to the target crashes on assertion
"assert(!(bs->open_flags & BDRV_O_INACTIVE));" in
bdrv_co_write_req_prepare().

After the commit, we rely on the fact that copy-before-write filter
keeps write permission on target node to be able to write to it. So
inactivation fails and migration fails as expected.

Corresponding test now passes, so, enable it.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
---
  block.c| 8 
  tests/qemu-iotests/tests/migrate-during-backup | 2 +-
  2 files changed, 9 insertions(+), 1 deletion(-)


Reviewed-by: Hanna Reitz

Re: [PATCH v2 0/3] The HMP/QMP interfaces in Qemu SGX

2021-09-10 Thread Paolo Bonzini


On 10/09/21 12:22, Yang Zhong wrote:

This patchset supply HMP/QMP interfaces to monitor and Libvirt, with
those interfaces, we can check the SGX info from VM side or check
host SGX capabilities from Libvirt side.

This patchset is splitted from below link(from patch26 to patch30):
https://patchew.org/QEMU/20210719112136.57018-1-yang.zh...@intel.com/

The rest patches are being pulled by Paolo's below link and this new
patchset is based on it.
https://gitlab.com/bonzini/qemu.git tags/for-upstream


Queued 1-2, thanks.

For patch 3, I would like to learn more about the whole reset part of 
the series.  Would it be possible to handle it in the kernel without 
having to reopen the vEPC device (there's a possible race where you 
can't reopen and it's a huge mess)?


Paolo

Re: [PATCH RFC server v2 02/11] vfio-user: define vfio-user object

2021-09-10 Thread Jag Raman



> On Sep 8, 2021, at 8:37 AM, Stefan Hajnoczi  wrote:
> 
> On Fri, Aug 27, 2021 at 01:53:21PM -0400, Jagannathan Raman wrote:
>> Define vfio-user object which is remote process server for QEMU. Setup
>> object initialization functions and properties necessary to instantiate
>> the object
>> 
>> Signed-off-by: Elena Ufimtseva 
>> Signed-off-by: John G Johnson 
>> Signed-off-by: Jagannathan Raman 
>> ---
>> qapi/qom.json |  20 ++-
>> hw/remote/vfio-user-obj.c | 145 
>> ++
>> MAINTAINERS   |   1 +
>> hw/remote/meson.build |   1 +
>> hw/remote/trace-events|   3 +
>> 5 files changed, 168 insertions(+), 2 deletions(-)
>> create mode 100644 hw/remote/vfio-user-obj.c
>> 
>> diff --git a/qapi/qom.json b/qapi/qom.json
>> index a25616b..3e941ee 100644
>> --- a/qapi/qom.json
>> +++ b/qapi/qom.json
>> @@ -689,6 +689,20 @@
>>   'data': { 'fd': 'str', 'devid': 'str' } }
>> 
>> ##
>> +# @VfioUserProperties:
>> +#
>> +# Properties for vfio-user objects.
>> +#
>> +# @socket: path to be used as socket by the libvfiouser library
>> +#
>> +# @devid: the id of the device to be associated with the file descriptor
>> +#
>> +# Since: 6.0
>> +##
>> +{ 'struct': 'VfioUserProperties',
>> +  'data': { 'socket': 'str', 'devid': 'str' } }
> 
> Please use 'SocketAddress' for socket instead of 'str'. That way file
> descriptor passing is easy to support and additional socket address
> families can be supported in the future.

OK, will do.

> 
>> +
>> +##
>> # @RngProperties:
>> #
>> # Properties for objects of classes derived from rng.
>> @@ -812,7 +826,8 @@
>> 'tls-creds-psk',
>> 'tls-creds-x509',
>> 'tls-cipher-suites',
>> -'x-remote-object'
>> +'x-remote-object',
>> +'vfio-user'
>>   ] }
>> 
>> ##
>> @@ -868,7 +883,8 @@
>>   'tls-creds-psk':  'TlsCredsPskProperties',
>>   'tls-creds-x509': 'TlsCredsX509Properties',
>>   'tls-cipher-suites':  'TlsCredsProperties',
>> -  'x-remote-object':'RemoteObjectProperties'
>> +  'x-remote-object':'RemoteObjectProperties',
>> +  'vfio-user':  'VfioUserProperties'
> 
> "vfio-user" doesn't communicate whether this is a client or server. Is
> "vfio-user-server" clearer?

“vfio-user-server” sounds better.

> 
>>   } }
>> 
>> ##
>> diff --git a/hw/remote/vfio-user-obj.c b/hw/remote/vfio-user-obj.c
>> new file mode 100644
>> index 000..4a1e297
>> --- /dev/null
>> +++ b/hw/remote/vfio-user-obj.c
>> @@ -0,0 +1,145 @@
>> +/**
>> + * QEMU vfio-user server object
>> + *
>> + * Copyright © 2021 Oracle and/or its affiliates.
>> + *
>> + * This work is licensed under the terms of the GNU GPL-v2, version 2 or 
>> later.
>> + *
>> + * See the COPYING file in the top-level directory.
>> + *
>> + */
>> +
>> +/**
>> + * Usage: add options:
>> + * -machine x-remote
>> + * -device ,id=
>> + * -object vfio-user,id=,socket=,devid=
> 
> I suggest renaming devid= to device= or pci-device= (similar to drive=
> and netdev=) for consistency and to avoid confusion with PCI Device IDs.

OK, will do.

> 
>> diff --git a/hw/remote/meson.build b/hw/remote/meson.build
>> index fb35fb8..cd44dfc 100644
>> --- a/hw/remote/meson.build
>> +++ b/hw/remote/meson.build
>> @@ -6,6 +6,7 @@ remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
>> files('message.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('remote-obj.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('proxy.c'))
>> remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: files('iohub.c'))
>> +remote_ss.add(when: 'CONFIG_MULTIPROCESS', if_true: 
>> files('vfio-user-obj.c'))
> 
> If you use CONFIG_VFIO_USER_SERVER then it's easier to separate mpqemu
> from vfio-user. Sharing CONFIG_MULTIPROCESS could become messy later.

OK, got it.

--
Jag

Re: [PATCH RFC server v2 00/11] vfio-user server in QEMU

2021-09-10 Thread Jag Raman



> On Sep 9, 2021, at 4:17 AM, Stefan Hajnoczi  wrote:
> 
> Hi Jag,
> I have finished reviewing these patches and left comments. I didn't take
> a look at the libvfio-user's implementation.

Thank you for you comments, Stefan - we’ll get cracking on them. :)

--
Jag

> 
> Stefan

Re: [PATCH v10 03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic)

2021-09-10 Thread Richard Henderson


On 9/10/21 3:47 PM, Philipp Tomsich wrote:
Just wondering regarding the UXL-comment: the clzw instruction will be an illegal encoding 
for RV32 (the w-form instructions are present on RV64 only), so it should never be 
encountered in a RV32 instruction stream.


Correct.


Did you mean that clz (the instruction operating on xlen-registers) would have 
ctx->w
set for RV32 executing on RV64 ... or am I missing something fundamental?


Yes.

Or, as some test patches I was planning to post this weekend, replacing "w" as boolean 
with an "operation length" (ol) as an enum of MXL_RV*, so that we can represent "d" 
operations on RV128 with the same mechanism.



r~

Re: [PATCH 1/5] docs/devel: document expectations for QAPI data modelling for QMP

2021-09-10 Thread Daniel P . Berrangé

On Fri, Sep 10, 2021 at 03:45:11PM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > On Thu, Sep 09, 2021 at 11:33:20AM +0200, Markus Armbruster wrote:
> 
> [...]
> 
> >> There are many existing long lines in this file, so I'm not flagging
> >> yours, except for this one, because it increases the maximum.
> >
> > This line is at exactly 80 characters so checkstyle is happy with it.
> > We don't have any requirement for a differenet line limit for docs
> > afair ?
> 
> We don't.  I'm asking for a favour.
> 
> Humans tend to have trouble following long lines with our eyes (I sure
> do).  Typographic manuals suggest to limit columns to roughly 60
> characters for exactly that reason[1].
> 
> 70 is a reasonable compromise between legibility and "writability".  For
> code, we use 80-90, because there the actual width should still be fine
> due to indentation.
> 
> checkpatch.pl doesn't differentiate betwen code and prose.
> 
> This file is already hard to read for me.  Please consider not making it
> harder.

Ok, no worries, I just didn't understand the rationale.

Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH v10 03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic)

2021-09-10 Thread Philipp Tomsich

On Fri, 10 Sept 2021 at 16:40, Richard Henderson <
richard.hender...@linaro.org> wrote:

> On 9/10/21 3:36 PM, Philipp Tomsich wrote:
> > Richard,
> >
> > Did you have a chance to consider what to do with clzw?
> > I would prefer to avoid the extra extension instructions and change the
> implementation
> > (and would update the commit message to provide more context), but if
> you insist on
> > setting 'ctx->w' I'll just have the extra extensions emitted than delay
> this series further…
>
> I don't mind not setting ctx->w, but bear in mind that UXL is going to
> automatically set
> this flag when executing RV32 on RV64.  That's why I have written a tcg
> patch set to
> eliminate unnecessary sign-extensions.
>

Ok, thanks!  Updated patches follow, once all test workloads have run…

Just wondering regarding the UXL-comment: the clzw instruction will be an
illegal encoding for RV32 (the w-form instructions are present on RV64
only), so it should never be encountered in a RV32 instruction stream.  Did
you mean that clz (the instruction operating on xlen-registers) would have
ctx->w set for RV32 executing on RV64 ... or am I missing something
fundamental?

Philipp.

Re: [PATCH] softmmu: fix watchpoint processing in icount mode

2021-09-10 Thread David Hildenbrand


On 10.09.21 15:34, Richard Henderson wrote:

On 9/10/21 1:15 PM, David Hildenbrand wrote:

On 07.09.21 13:30, Pavel Dovgalyuk wrote:

Watchpoint processing code restores vCPU state twice:
in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state.
Normally it does not affect anything, but in icount mode instruction
counter is incremented twice and becomes incorrect.
This patch eliminates unneeded CPU state restore.

Signed-off-by: Pavel Dovgalyuk 
---
   softmmu/physmem.c |    5 +
   1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 23e77cb771..4025dfab11 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -941,14 +941,11 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, 
vaddr len,
   if (wp->flags & BP_STOP_BEFORE_ACCESS) {
   cpu->exception_index = EXCP_DEBUG;
   mmap_unlock();
-    cpu_loop_exit_restore(cpu, ra);
+    cpu_loop_exit(cpu);
   } else {
   /* Force execution of one insn next time.  */
   cpu->cflags_next_tb = 1 | curr_cflags(cpu);
   mmap_unlock();
-    if (ra) {
-    cpu_restore_state(cpu, ra, true);
-    }
   cpu_loop_exit_noexc(cpu);
   }
   }




I'm not an expert on that code, but it looks good to me.

Maybe we could have added a comment above the tb_check_watchpoint() call to 
highlight that
the restore will happen in there.


Hmm.  Curious.

Looking at tb_check_watchpoint, I have trouble seeing how it could be correct.
Watchpoints can happen at any memory reference within the TB.  We should be 
rolling back
to the cpu state at the memory reference (cpu_retore_state) and not the cpu 
state at the
start of the TB (cpu_restore_state_from_tb).


cpu_restore_state() ends up calling cpu_restore_state_from_tb() with essentially
the same parameters or what am I missing?

So AFAIU this patch shouldn't change the situation -- but valid point that the
current behavior might be bogus.



I'm also not sure why we're invalidating tb's.  Why does watchpoint hit imply 
that we
should want to ditch the TB?  If we want different behaviour from the next 
execution, we
should be adjusting cflags.


It goes back to

commit 06d55cc19ac84e799d2df8c750049e51798b00a4
Author: aliguori 
Date:   Tue Nov 18 20:24:06 2008 +

Restore pc on watchpoint hits (Jan Kiszka)

In order to provide accurate information about the triggering

instruction, this patch adds the required bits to restore the pc if the
access happened inside a TB. With the BP_STOP_BEFORE_ACCESS flag, the
watchpoint user can control if the debug trap should be issued on or
after the accessing instruction.

Signed-off-by: Jan Kiszka 

Signed-off-by: Anthony Liguori 


*trying to rememebr what we do on watchpoints* I think we want to
make sure that we end up with a single-instruction TB, right? So we
want to make sure to remove the old one.


--
Thanks,

David / dhildenb

Re: [PATCH v2 1/3] monitor: Add HMP and QMP interfaces

2021-09-10 Thread Daniel P . Berrangé

On Fri, Sep 10, 2021 at 06:22:56PM +0800, Yang Zhong wrote:
> The QMP and HMP interfaces can be used by monitor or QMP tools to retrieve
> the SGX information from VM side when SGX is enabled on Intel platform.
> 
> Signed-off-by: Yang Zhong 
> ---
>  hmp-commands-info.hx | 15 +
>  hw/i386/sgx.c| 29 
>  include/hw/i386/sgx.h| 11 +
>  include/monitor/hmp-target.h |  1 +
>  qapi/misc-target.json| 43 
>  target/i386/monitor.c| 36 ++
>  tests/qtest/qmp-cmd-test.c   |  1 +
>  7 files changed, 136 insertions(+)
>  create mode 100644 include/hw/i386/sgx.h

> diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
> index 02fa6487c3..8a32d62d7e 100644
> --- a/hw/i386/sgx.c
> +++ b/hw/i386/sgx.c
> @@ -17,6 +17,35 @@
>  #include "monitor/qdev.h"
>  #include "qapi/error.h"
>  #include "exec/address-spaces.h"
> +#include "hw/i386/sgx.h"
> +
> +SGXInfo *sgx_get_info(void)

I'd suggest this should have an 'Error **errp'

> +{
> +SGXInfo *info = NULL;
> +X86MachineState *x86ms;
> +PCMachineState *pcms =
> +(PCMachineState *)object_dynamic_cast(qdev_get_machine(),
> +  TYPE_PC_MACHINE);
> +if (!pcms) {

  error_setg(errp, "SGX is only available for x86 PC machines");

> +return NULL;
> +}
> +
> +x86ms = X86_MACHINE(pcms);
> +if (!x86ms->sgx_epc_list) {

  error_setg(errp "SGX is not enabled on this machine");

> +return NULL;
> +}
> +
> +SGXEPCState *sgx_epc = >sgx_epc;
> +info = g_new0(SGXInfo, 1);
> +
> +info->sgx = true;
> +info->sgx1 = true;
> +info->sgx2 = true;
> +info->flc = true;
> +info->section_size = sgx_epc->size;
> +
> +return info;
> +}



> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 119211f0b0..0f1b48b4f8 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -35,6 +35,7 @@
>  #include "qapi/qapi-commands-misc-target.h"
>  #include "qapi/qapi-commands-misc.h"
>  #include "hw/i386/pc.h"
> +#include "hw/i386/sgx.h"
>  
>  /* Perform linear address sign extension */
>  static hwaddr addr_canonical(CPUArchState *env, hwaddr addr)
> @@ -763,3 +764,38 @@ qmp_query_sev_attestation_report(const char *mnonce, 
> Error **errp)
>  {
>  return sev_get_attestation_report(mnonce, errp);
>  }
> +
> +SGXInfo *qmp_query_sgx(Error **errp)
> +{
> +SGXInfo *info;
> +
> +info = sgx_get_info();

Pass errp into this

> +if (!info) {
> +error_setg(errp, "SGX features are not available");

And get rid of this.

> +return NULL;
> +}
> +
> +return info;
> +}
> +
> +void hmp_info_sgx(Monitor *mon, const QDict *qdict)
> +{

  g_autoptr(Error) err = NULL

> +SGXInfo *info = qmp_query_sgx(NULL);

Pass in  not NULL;

Also  declare it with  'g_autoptr(SGXInfo) info = ...'

And then

   if (err) {
  monitor_printf(mon, "%s\n", error_get_pretty(err);
  return;
   }

> +
> +if (info && info->sgx) {
> +monitor_printf(mon, "SGX support: %s\n",
> +   info->sgx ? "enabled" : "disabled");
> +monitor_printf(mon, "SGX1 support: %s\n",
> +   info->sgx1 ? "enabled" : "disabled");
> +monitor_printf(mon, "SGX2 support: %s\n",
> +   info->sgx2 ? "enabled" : "disabled");
> +monitor_printf(mon, "FLC support: %s\n",
> +   info->flc ? "enabled" : "disabled");
> +monitor_printf(mon, "size: %" PRIu64 "\n",
> +   info->section_size);
> +} else {
> +monitor_printf(mon, "SGX is not enabled\n");
> +}

Now you can remove the if/else and just print the result

> +
> +qapi_free_SGXInfo(info);

This can be dropped with the g_autoptr usage


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

Re: [PATCH 1/5] docs/devel: document expectations for QAPI data modelling for QMP

2021-09-10 Thread Markus Armbruster

Daniel P. Berrangé  writes:

> On Thu, Sep 09, 2021 at 11:33:20AM +0200, Markus Armbruster wrote:

[...]

>> There are many existing long lines in this file, so I'm not flagging
>> yours, except for this one, because it increases the maximum.
>
> This line is at exactly 80 characters so checkstyle is happy with it.
> We don't have any requirement for a differenet line limit for docs
> afair ?

We don't.  I'm asking for a favour.

Humans tend to have trouble following long lines with our eyes (I sure
do).  Typographic manuals suggest to limit columns to roughly 60
characters for exactly that reason[1].

70 is a reasonable compromise between legibility and "writability".  For
code, we use 80-90, because there the actual width should still be fine
due to indentation.

checkpatch.pl doesn't differentiate betwen code and prose.

This file is already hard to read for me.  Please consider not making it
harder.

[1] https://en.wikipedia.org/wiki/Column_(typography)#Typographic_style

Re: [PATCH v10 03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic)

2021-09-10 Thread Richard Henderson


On 9/10/21 3:36 PM, Philipp Tomsich wrote:

Richard,

Did you have a chance to consider what to do with clzw?
I would prefer to avoid the extra extension instructions and change the implementation 
(and would update the commit message to provide more context), but if you insist on 
setting 'ctx->w' I'll just have the extra extensions emitted than delay this series further…


I don't mind not setting ctx->w, but bear in mind that UXL is going to automatically set 
this flag when executing RV32 on RV64.  That's why I have written a tcg patch set to 
eliminate unnecessary sign-extensions.


Do swap out EXT_ZERO for EXT_NONE in the gen_unary call if you're not going to make use of 
any kind of source extension.



r~

Re: [PATCH v10 03/16] target/riscv: clwz must ignore high bits (use shift-left & changed logic)

2021-09-10 Thread Philipp Tomsich

Richard,

Did you have a chance to consider what to do with clzw?
I would prefer to avoid the extra extension instructions and change the
implementation (and would update the commit message to provide more
context), but if you insist on setting 'ctx->w' I'll just have the extra
extensions emitted than delay this series further…

Thanks,
Philipp.

On Sun, 5 Sept 2021 at 12:01, Philipp Tomsich 
wrote:

> On Sun 5. Sep 2021 at 11:11, Richard Henderson
>  wrote:
> >
> > On 9/4/21 10:35 PM, Philipp Tomsich wrote:
> > > Assume clzw being executed on a register that is not sign-extended,
> such
> > > as for the following sequence that uses (1ULL << 63) | 392 as the
> operand
> > > to clzw:
> > >   bseti   a2, zero, 63
> > >   addia2, a2, 392
> > >   clzwa3, a2
> > > The correct result of clzw would be 23, but the current implementation
> > > returns -32 (as it performs a 64bit clz, which results in 0 leading
> zero
> > > bits, and then subtracts 32).
> > >
> > > Fix this by changing the implementation to:
> > >   1. shift the original register up by 32
> > >   2. performs a target-length (64bit) clz
> > >   3. return 32 if no bits are set
> > >
> > > Signed-off-by: Philipp Tomsich 
> > > ---
> > >
> > > Changes in v10:
> > > - New patch, fixing correctnes for clzw called on a register with
> undefined
> > >(as in: not properly sign-extended) upper bits.
> >
> > But we have
> >
> >  return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
> >
> > should *not* be undefined.  Ah, what's missing is
> >
> >  ctx->w = true;
> >
> > within trans_clzw to cause the extend to take effect.
> >
> > There are a few other "w" functions that are missing that set, though
> they use EXT_NONE so
> > there is no visible bug, it would probably be best to set w anyway.
>
>
> I had originally considered that (it would have to be ctx->w = true;
> and EXT_SIGN),
> but that has the side-effect of performing an extension on the result
> of the clzw as
> well (i.e. bot get_gpr and set_gpr are extended).
>
> However, this is not what clzw does: it only ignores the upper bits
> and then produces
> a result in the value-range [0..32]. An extension on the result of
> either a clz or clzw
> is never needed (we have been fighting that problem in GCC and had to
> use patterns
> for the combiner), so I don't want to introduce this inefficiency in qemu.
>
> If you have EXT_SIGN (or _ZERO), we will end up with 2 additional
> extensions (one
> on the argument, one on the result) in addition to the 2 other tcg
> operations that we
> need (i.e. clzi/subi for the extending case, shli/clzi for the proposed
> fix).
>
> So I believe that this commit is the best fix:
> 1. It doesn't mark this as a w-form (i.e. ignores the high-bits on the
> input _and_
> extends the output), as it only treats the input as 32bit, but the
> output is xlen.
> 2. It doesn't introduce any unnecessary extensions, but handles the
> case in-place.
>
> Philipp.
>

Re: [PATCH] softmmu: fix watchpoint processing in icount mode

2021-09-10 Thread Richard Henderson


On 9/10/21 1:15 PM, David Hildenbrand wrote:

On 07.09.21 13:30, Pavel Dovgalyuk wrote:

Watchpoint processing code restores vCPU state twice:
in tb_check_watchpoint and in cpu_loop_exit_restore/cpu_restore_state.
Normally it does not affect anything, but in icount mode instruction
counter is incremented twice and becomes incorrect.
This patch eliminates unneeded CPU state restore.

Signed-off-by: Pavel Dovgalyuk 
---
  softmmu/physmem.c |    5 +
  1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 23e77cb771..4025dfab11 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -941,14 +941,11 @@ void cpu_check_watchpoint(CPUState *cpu, vaddr addr, 
vaddr len,
  if (wp->flags & BP_STOP_BEFORE_ACCESS) {
  cpu->exception_index = EXCP_DEBUG;
  mmap_unlock();
-    cpu_loop_exit_restore(cpu, ra);
+    cpu_loop_exit(cpu);
  } else {
  /* Force execution of one insn next time.  */
  cpu->cflags_next_tb = 1 | curr_cflags(cpu);
  mmap_unlock();
-    if (ra) {
-    cpu_restore_state(cpu, ra, true);
-    }
  cpu_loop_exit_noexc(cpu);
  }
  }




I'm not an expert on that code, but it looks good to me.

Maybe we could have added a comment above the tb_check_watchpoint() call to highlight that 
the restore will happen in there.


Hmm.  Curious.

Looking at tb_check_watchpoint, I have trouble seeing how it could be correct. 
Watchpoints can happen at any memory reference within the TB.  We should be rolling back 
to the cpu state at the memory reference (cpu_retore_state) and not the cpu state at the 
start of the TB (cpu_restore_state_from_tb).


I'm also not sure why we're invalidating tb's.  Why does watchpoint hit imply that we 
should want to ditch the TB?  If we want different behaviour from the next execution, we 
should be adjusting cflags.



r~

Re: [PATCH v2] block/mirror: fix NULL pointer dereference in mirror_wait_on_conflicts()

2021-09-10 Thread Vladimir Sementsov-Ogievskiy


10.09.2021 15:45, Stefano Garzarella wrote:

In mirror_iteration() we call mirror_wait_on_conflicts() with
`self` parameter set to NULL.

Starting from commit d44dae1a7c we dereference `self` pointer in
mirror_wait_on_conflicts() without checks if it is not NULL.

Backtrace:
   Program terminated with signal SIGSEGV, Segmentation fault.
   #0  mirror_wait_on_conflicts (self=0x0, s=, offset=, 
bytes=)
   at ../block/mirror.c:172
   172  self->waiting_for_op = op;
   [Current thread is 1 (Thread 0x7f0908931ec0 (LWP 380249))]
   (gdb) bt
   #0  mirror_wait_on_conflicts (self=0x0, s=, offset=, 
bytes=)
   at ../block/mirror.c:172
   #1  0x5610c5d9d631 in mirror_run (job=0x5610c76a2c00, errp=) at ../block/mirror.c:491
   #2  0x5610c5d58726 in job_co_entry (opaque=0x5610c76a2c00) at 
../job.c:917
   #3  0x5610c5f046c6 in coroutine_trampoline (i0=, 
i1=)
   at ../util/coroutine-ucontext.c:173
   #4  0x7f0909975820 in ?? () at 
../sysdeps/unix/sysv/linux/x86_64/__start_context.S:91
   from /usr/lib64/libc.so.6

Buglink:https://bugzilla.redhat.com/show_bug.cgi?id=2001404
Fixes: d44dae1a7c ("block/mirror: fix active mirror dead-lock in 
mirror_wait_on_conflicts")
Signed-off-by: Stefano Garzarella


Reviewed-by: Vladimir Sementsov-Ogievskiy 

--
Best regards,
Vladimir

Re: SMMU Stage 2 translation in QEMU

2021-09-10 Thread shashi . mallela

So that would be the driver code running in guest OS because i see
tables being setup by arm-smmu driver code in linux,which is similar to
what happens with ITS(table base addresses programmed in registers by
linux driver).

On Fri, 2021-09-10 at 13:54 +0100, Peter Maydell wrote:
> On Fri, 10 Sept 2021 at 13:39,  wrote:
> > I am referring to the latter,"purely emulated QEMU with an emulated
> > SMMU that handles accesses to emulated devices"
> 
> In that case, the stage 2 tables are set up by the guest
> code (running at emulated EL2), just as they would be if
> it were running on real hardware.
> 
> -- PMM

[PULL 3/6] ui: Create sync objects and fences only for blobs

2021-09-10 Thread Gerd Hoffmann

From: Vivek Kasireddy 

Create sync objects and fences only for dmabufs that are blobs. Once a
fence is created (after glFlush) and is signalled,
graphic_hw_gl_flushed() will be called and virtio-gpu cmd processing
will be resumed.

Cc: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
Message-Id: <20210901211014.2800391-4-vivek.kasire...@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/console.h|  1 +
 include/ui/egl-helpers.h|  1 +
 include/ui/gtk.h|  1 +
 hw/display/virtio-gpu-udmabuf.c |  1 +
 ui/gtk-egl.c| 20 
 ui/gtk-gl-area.c| 20 
 ui/gtk.c| 13 +
 7 files changed, 57 insertions(+)

diff --git a/include/ui/console.h b/include/ui/console.h
index 45ec1291743b..244664d727a4 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -170,6 +170,7 @@ typedef struct QemuDmaBuf {
 bool  y0_top;
 void  *sync;
 int   fence_fd;
+bool  allow_fences;
 } QemuDmaBuf;
 
 typedef struct DisplayState DisplayState;
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index 2c3ba92b53e1..2fb6e0dd6b87 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -19,6 +19,7 @@ typedef struct egl_fb {
 GLuint texture;
 GLuint framebuffer;
 bool delete_texture;
+QemuDmaBuf *dmabuf;
 } egl_fb;
 
 void egl_fb_destroy(egl_fb *fb);
diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 8e98a79ac813..43854f350907 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -155,6 +155,7 @@ extern bool gtk_use_gl_area;
 /* ui/gtk.c */
 void gd_update_windowsize(VirtualConsole *vc);
 int gd_monitor_update_interval(GtkWidget *widget);
+void gd_hw_gl_flushed(void *vc);
 
 /* ui/gtk-egl.c */
 void gd_egl_init(VirtualConsole *vc);
diff --git a/hw/display/virtio-gpu-udmabuf.c b/hw/display/virtio-gpu-udmabuf.c
index 3c01a415e71b..c6f7f587847f 100644
--- a/hw/display/virtio-gpu-udmabuf.c
+++ b/hw/display/virtio-gpu-udmabuf.c
@@ -185,6 +185,7 @@ static VGPUDMABuf
 dmabuf->buf.stride = fb->stride;
 dmabuf->buf.fourcc = qemu_pixman_to_drm_format(fb->format);
 dmabuf->buf.fd = res->dmabuf_fd;
+dmabuf->buf.allow_fences = true;
 
 dmabuf->scanout_id = scanout_id;
 QTAILQ_INSERT_HEAD(>dmabuf.bufs, dmabuf, next);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index b671181272d5..2c68696d9fab 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -12,6 +12,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 
 #include "trace.h"
 
@@ -63,6 +64,7 @@ void gd_egl_draw(VirtualConsole *vc)
 {
 GdkWindow *window;
 int ww, wh;
+QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
 
 if (!vc->gfx.gls) {
 return;
@@ -94,6 +96,14 @@ void gd_egl_draw(VirtualConsole *vc)
 }
 
 glFlush();
+if (dmabuf) {
+egl_dmabuf_create_fence(dmabuf);
+if (dmabuf->fence_fd > 0) {
+qemu_set_fd_handler(dmabuf->fence_fd, gd_hw_gl_flushed, NULL, vc);
+return;
+}
+graphic_hw_gl_block(vc->gfx.dcl.con, false);
+}
 graphic_hw_gl_flushed(vc->gfx.dcl.con);
 }
 
@@ -209,6 +219,8 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
QemuDmaBuf *dmabuf)
 {
 #ifdef CONFIG_GBM
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+
 egl_dmabuf_import_texture(dmabuf);
 if (!dmabuf->texture) {
 return;
@@ -217,6 +229,10 @@ void gd_egl_scanout_dmabuf(DisplayChangeListener *dcl,
 gd_egl_scanout_texture(dcl, dmabuf->texture,
false, dmabuf->width, dmabuf->height,
0, 0, dmabuf->width, dmabuf->height);
+
+if (dmabuf->allow_fences) {
+vc->gfx.guest_fb.dmabuf = dmabuf;
+}
 #endif
 }
 
@@ -281,6 +297,10 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 egl_fb_blit(>gfx.win_fb, >gfx.guest_fb, !vc->gfx.y0_top);
 }
 
+if (vc->gfx.guest_fb.dmabuf) {
+egl_dmabuf_create_sync(vc->gfx.guest_fb.dmabuf);
+}
+
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
diff --git a/ui/gtk-gl-area.c b/ui/gtk-gl-area.c
index dd5783fec784..1654941dc982 100644
--- a/ui/gtk-gl-area.c
+++ b/ui/gtk-gl-area.c
@@ -8,6 +8,7 @@
  */
 
 #include "qemu/osdep.h"
+#include "qemu/main-loop.h"
 
 #include "trace.h"
 
@@ -38,6 +39,7 @@ static void gtk_gl_area_set_scanout_mode(VirtualConsole *vc, 
bool scanout)
 void gd_gl_area_draw(VirtualConsole *vc)
 {
 int ww, wh, y1, y2;
+QemuDmaBuf *dmabuf = vc->gfx.guest_fb.dmabuf;
 
 if (!vc->gfx.gls) {
 return;
@@ -71,7 +73,18 @@ void gd_gl_area_draw(VirtualConsole *vc)
 surface_gl_render_texture(vc->gfx.gls, vc->gfx.ds);
 }
 
+if (dmabuf) {
+egl_dmabuf_create_sync(dmabuf);
+}
 glFlush();
+if (dmabuf) {
+egl_dmabuf_create_fence(dmabuf);
+if (dmabuf->fence_fd > 0) {
+

Adding IO memory region to mipssim

2021-09-10 Thread Hinko Kocevar

I'm trying to add an I/O memory region to mipssim machine to emulate a MMIO
region used by the u-boot loaded as BIOS image. I can confirm that the
machine starts and loads the BIOS, starts execution but hangs due to
unhandled IO access as described below.

The region should be at 0xB881, of size 0x1.

I've added these lines of code to mispsim.c mips_mipssim_init():

my_state *s = g_malloc0(sizeof(my_state));
memory_region_init_io(>mmio, NULL, _ops, s,
 "mips_mipssim.foo", 0x1);
memory_region_add_subregion(address_space_mem, 0xB881LL, >mmio);

All goes well, the machine starts, and I can see the newly added region in
qemu monitor info mtree output like so:

b881-b881 (prio 0, i/o): mips_mipssim.foo

With some tracing enabled I see this error:

 Invalid access at addr 0x18810104, size 4, region '(null)', reason:
rejected

I know the u-boot is making request to 0xB8810104 and not 0x18810104. I
also can see 0xB8810104 address being handed to io_writex(), but mr_offset
becomes 0x18810104 here:

  mr_offset = (iotlbentry->addr & TARGET_PAGE_MASK) + addr;

What is going on?

FWIW, I can place my emulated memory region at 0x18810104, but would like
to understand the behavior above.

Thanks!
//hinko

[PULL 6/6] qxl: fix pre-save logic

2021-09-10 Thread Gerd Hoffmann

Oops.  Logic is backwards.

Fixes: 39b8a183e2f3 ("qxl: remove assert in qxl_pre_save.")
Resolves: https://gitlab.com/qemu-project/qemu/-/issues/610
Resolves: https://bugzilla.redhat.com//show_bug.cgi?id=2002907
Signed-off-by: Gerd Hoffmann 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Marc-André Lureau 
Message-Id: <20210910094203.3582378-1-kra...@redhat.com>
---
 hw/display/qxl.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/qxl.c b/hw/display/qxl.c
index 43482d4364ba..29c80b4289b7 100644
--- a/hw/display/qxl.c
+++ b/hw/display/qxl.c
@@ -2252,7 +2252,7 @@ static int qxl_pre_save(void *opaque)
 } else {
 d->last_release_offset = (uint8_t *)d->last_release - ram_start;
 }
-if (d->last_release_offset < d->vga.vram_size) {
+if (d->last_release_offset >= d->vga.vram_size) {
 return 1;
 }
 
-- 
2.31.1

[PULL 5/6] virtio-gpu: Add gl_flushed callback

2021-09-10 Thread Gerd Hoffmann

From: Vivek Kasireddy 

Adding this callback provides a way to resume the processing of
cmds in fenceq and cmdq that were not processed because the UI
was waiting on a fence and blocked cmd processing.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
Message-Id: <20210901211014.2800391-6-vivek.kasire...@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 hw/display/virtio-gpu.c | 32 ++--
 1 file changed, 30 insertions(+), 2 deletions(-)

diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
index 72da5bf5002c..182e0868b09d 100644
--- a/hw/display/virtio-gpu.c
+++ b/hw/display/virtio-gpu.c
@@ -985,8 +985,10 @@ void virtio_gpu_simple_process_cmd(VirtIOGPU *g,
 break;
 }
 if (!cmd->finished) {
-virtio_gpu_ctrl_response_nodata(g, cmd, cmd->error ? cmd->error :
-VIRTIO_GPU_RESP_OK_NODATA);
+if (!g->parent_obj.renderer_blocked) {
+virtio_gpu_ctrl_response_nodata(g, cmd, cmd->error ? cmd->error :
+VIRTIO_GPU_RESP_OK_NODATA);
+}
 }
 }
 
@@ -1042,6 +1044,30 @@ void virtio_gpu_process_cmdq(VirtIOGPU *g)
 g->processing_cmdq = false;
 }
 
+static void virtio_gpu_process_fenceq(VirtIOGPU *g)
+{
+struct virtio_gpu_ctrl_command *cmd, *tmp;
+
+QTAILQ_FOREACH_SAFE(cmd, >fenceq, next, tmp) {
+trace_virtio_gpu_fence_resp(cmd->cmd_hdr.fence_id);
+virtio_gpu_ctrl_response_nodata(g, cmd, VIRTIO_GPU_RESP_OK_NODATA);
+QTAILQ_REMOVE(>fenceq, cmd, next);
+g_free(cmd);
+g->inflight--;
+if (virtio_gpu_stats_enabled(g->parent_obj.conf)) {
+fprintf(stderr, "inflight: %3d (-)\r", g->inflight);
+}
+}
+}
+
+static void virtio_gpu_handle_gl_flushed(VirtIOGPUBase *b)
+{
+VirtIOGPU *g = container_of(b, VirtIOGPU, parent_obj);
+
+virtio_gpu_process_fenceq(g);
+virtio_gpu_process_cmdq(g);
+}
+
 static void virtio_gpu_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
 {
 VirtIOGPU *g = VIRTIO_GPU(vdev);
@@ -1400,10 +1426,12 @@ static void virtio_gpu_class_init(ObjectClass *klass, 
void *data)
 DeviceClass *dc = DEVICE_CLASS(klass);
 VirtioDeviceClass *vdc = VIRTIO_DEVICE_CLASS(klass);
 VirtIOGPUClass *vgc = VIRTIO_GPU_CLASS(klass);
+VirtIOGPUBaseClass *vgbc = >parent;
 
 vgc->handle_ctrl = virtio_gpu_handle_ctrl;
 vgc->process_cmd = virtio_gpu_simple_process_cmd;
 vgc->update_cursor_data = virtio_gpu_update_cursor_data;
+vgbc->gl_flushed = virtio_gpu_handle_gl_flushed;
 
 vdc->realize = virtio_gpu_device_realize;
 vdc->reset = virtio_gpu_reset;
-- 
2.31.1

[PULL 1/6] ui/gtk: Create a common release_dmabuf helper

2021-09-10 Thread Gerd Hoffmann

From: Vivek Kasireddy 

Since the texture release mechanism is same for both gtk-egl
and gtk-glarea, move the helper from gtk-egl to common gtk
code so that it can be shared by both gtk backends.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
Message-Id: <20210901211014.2800391-2-vivek.kasire...@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/gtk.h |  2 --
 ui/gtk-egl.c |  8 
 ui/gtk.c | 11 ++-
 3 files changed, 10 insertions(+), 11 deletions(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 7835ef1a7108..8e98a79ac813 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -181,8 +181,6 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener *dcl,
   uint32_t hot_x, uint32_t hot_y);
 void gd_egl_cursor_position(DisplayChangeListener *dcl,
 uint32_t pos_x, uint32_t pos_y);
-void gd_egl_release_dmabuf(DisplayChangeListener *dcl,
-   QemuDmaBuf *dmabuf);
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gtk_egl_init(DisplayGLMode mode);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 2a2e6d3a17d4..b671181272d5 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -249,14 +249,6 @@ void gd_egl_cursor_position(DisplayChangeListener *dcl,
 vc->gfx.cursor_y = pos_y * vc->gfx.scale_y;
 }
 
-void gd_egl_release_dmabuf(DisplayChangeListener *dcl,
-   QemuDmaBuf *dmabuf)
-{
-#ifdef CONFIG_GBM
-egl_dmabuf_release_texture(dmabuf);
-#endif
-}
-
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h)
 {
diff --git a/ui/gtk.c b/ui/gtk.c
index cfb0728d1fb4..784a2f6c749c 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -575,6 +575,14 @@ static bool gd_has_dmabuf(DisplayChangeListener *dcl)
 return vc->gfx.has_dmabuf;
 }
 
+static void gd_gl_release_dmabuf(DisplayChangeListener *dcl,
+ QemuDmaBuf *dmabuf)
+{
+#ifdef CONFIG_GBM
+egl_dmabuf_release_texture(dmabuf);
+#endif
+}
+
 /** DisplayState Callbacks (opengl version) **/
 
 static const DisplayChangeListenerOps dcl_gl_area_ops = {
@@ -593,6 +601,7 @@ static const DisplayChangeListenerOps dcl_gl_area_ops = {
 .dpy_gl_scanout_disable  = gd_gl_area_scanout_disable,
 .dpy_gl_update   = gd_gl_area_scanout_flush,
 .dpy_gl_scanout_dmabuf   = gd_gl_area_scanout_dmabuf,
+.dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
 
@@ -615,8 +624,8 @@ static const DisplayChangeListenerOps dcl_egl_ops = {
 .dpy_gl_scanout_dmabuf   = gd_egl_scanout_dmabuf,
 .dpy_gl_cursor_dmabuf= gd_egl_cursor_dmabuf,
 .dpy_gl_cursor_position  = gd_egl_cursor_position,
-.dpy_gl_release_dmabuf   = gd_egl_release_dmabuf,
 .dpy_gl_update   = gd_egl_scanout_flush,
+.dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
 
-- 
2.31.1

[PULL 4/6] ui/gtk-egl: Wait for the draw signal for dmabuf blobs

2021-09-10 Thread Gerd Hoffmann

From: Vivek Kasireddy 

Instead of immediately drawing and submitting, queue and wait
for the draw signal if the dmabuf submitted is a blob.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
Message-Id: <20210901211014.2800391-5-vivek.kasire...@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/gtk.h |  2 ++
 ui/gtk-egl.c | 15 +++
 ui/gtk.c |  2 +-
 3 files changed, 18 insertions(+), 1 deletion(-)

diff --git a/include/ui/gtk.h b/include/ui/gtk.h
index 43854f350907..7d22affd381a 100644
--- a/include/ui/gtk.h
+++ b/include/ui/gtk.h
@@ -182,6 +182,8 @@ void gd_egl_cursor_dmabuf(DisplayChangeListener *dcl,
   uint32_t hot_x, uint32_t hot_y);
 void gd_egl_cursor_position(DisplayChangeListener *dcl,
 uint32_t pos_x, uint32_t pos_y);
+void gd_egl_flush(DisplayChangeListener *dcl,
+  uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gd_egl_scanout_flush(DisplayChangeListener *dcl,
   uint32_t x, uint32_t y, uint32_t w, uint32_t h);
 void gtk_egl_init(DisplayGLMode mode);
diff --git a/ui/gtk-egl.c b/ui/gtk-egl.c
index 2c68696d9fab..737e7b90d47b 100644
--- a/ui/gtk-egl.c
+++ b/ui/gtk-egl.c
@@ -304,6 +304,21 @@ void gd_egl_scanout_flush(DisplayChangeListener *dcl,
 eglSwapBuffers(qemu_egl_display, vc->gfx.esurface);
 }
 
+void gd_egl_flush(DisplayChangeListener *dcl,
+  uint32_t x, uint32_t y, uint32_t w, uint32_t h)
+{
+VirtualConsole *vc = container_of(dcl, VirtualConsole, gfx.dcl);
+GtkWidget *area = vc->gfx.drawing_area;
+
+if (vc->gfx.guest_fb.dmabuf) {
+graphic_hw_gl_block(vc->gfx.dcl.con, true);
+gtk_widget_queue_draw_area(area, x, y, w, h);
+return;
+}
+
+gd_egl_scanout_flush(>gfx.dcl, x, y, w, h);
+}
+
 void gtk_egl_init(DisplayGLMode mode)
 {
 GdkDisplay *gdk_display = gdk_display_get_default();
diff --git a/ui/gtk.c b/ui/gtk.c
index 5105c0a33ff1..b0564d80c191 100644
--- a/ui/gtk.c
+++ b/ui/gtk.c
@@ -637,7 +637,7 @@ static const DisplayChangeListenerOps dcl_egl_ops = {
 .dpy_gl_scanout_dmabuf   = gd_egl_scanout_dmabuf,
 .dpy_gl_cursor_dmabuf= gd_egl_cursor_dmabuf,
 .dpy_gl_cursor_position  = gd_egl_cursor_position,
-.dpy_gl_update   = gd_egl_scanout_flush,
+.dpy_gl_update   = gd_egl_flush,
 .dpy_gl_release_dmabuf   = gd_gl_release_dmabuf,
 .dpy_has_dmabuf  = gd_has_dmabuf,
 };
-- 
2.31.1

[PULL 0/6] Vga 20210910 patches

2021-09-10 Thread Gerd Hoffmann

The following changes since commit bd662023e683850c085e98c8ff8297142c2dd9f2:

  Merge remote-tracking branch 'remotes/mcayland/tags/qemu-openbios-20210908' 
into staging (2021-09-08 11:06:17 +0100)

are available in the Git repository at:

  git://git.kraxel.org/qemu tags/vga-20210910-pull-request

for you to fetch changes up to 6335c0b56819a5d1219ea84a11a732d0861542db:

  qxl: fix pre-save logic (2021-09-10 12:23:12 +0200)


virtio-gpu + ui: fence syncronization.
qxl: unbreak live migration.



Gerd Hoffmann (1):
  qxl: fix pre-save logic

Vivek Kasireddy (5):
  ui/gtk: Create a common release_dmabuf helper
  ui/egl: Add egl helpers to help with synchronization
  ui: Create sync objects and fences only for blobs
  ui/gtk-egl: Wait for the draw signal for dmabuf blobs
  virtio-gpu: Add gl_flushed callback

 include/ui/console.h|  3 +++
 include/ui/egl-helpers.h|  3 +++
 include/ui/gtk.h|  5 ++--
 hw/display/qxl.c|  2 +-
 hw/display/virtio-gpu-udmabuf.c |  1 +
 hw/display/virtio-gpu.c | 32 ++--
 ui/egl-helpers.c| 26 
 ui/gtk-egl.c| 43 +++--
 ui/gtk-gl-area.c| 20 +++
 ui/gtk.c| 26 ++--
 10 files changed, 146 insertions(+), 15 deletions(-)

-- 
2.31.1

[PULL 2/6] ui/egl: Add egl helpers to help with synchronization

2021-09-10 Thread Gerd Hoffmann

From: Vivek Kasireddy 

These egl helpers would be used for creating and waiting on
a sync object.

Cc: Gerd Hoffmann 
Reviewed-by: Gerd Hoffmann 
Signed-off-by: Vivek Kasireddy 
Message-Id: <20210901211014.2800391-3-vivek.kasire...@intel.com>
Signed-off-by: Gerd Hoffmann 
---
 include/ui/console.h |  2 ++
 include/ui/egl-helpers.h |  2 ++
 ui/egl-helpers.c | 26 ++
 3 files changed, 30 insertions(+)

diff --git a/include/ui/console.h b/include/ui/console.h
index 3be21497a2e8..45ec1291743b 100644
--- a/include/ui/console.h
+++ b/include/ui/console.h
@@ -168,6 +168,8 @@ typedef struct QemuDmaBuf {
 uint64_t  modifier;
 uint32_t  texture;
 bool  y0_top;
+void  *sync;
+int   fence_fd;
 } QemuDmaBuf;
 
 typedef struct DisplayState DisplayState;
diff --git a/include/ui/egl-helpers.h b/include/ui/egl-helpers.h
index f1bf8f97fc33..2c3ba92b53e1 100644
--- a/include/ui/egl-helpers.h
+++ b/include/ui/egl-helpers.h
@@ -45,6 +45,8 @@ int egl_get_fd_for_texture(uint32_t tex_id, EGLint *stride, 
EGLint *fourcc,
 
 void egl_dmabuf_import_texture(QemuDmaBuf *dmabuf);
 void egl_dmabuf_release_texture(QemuDmaBuf *dmabuf);
+void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf);
+void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf);
 
 #endif
 
diff --git a/ui/egl-helpers.c b/ui/egl-helpers.c
index 6d0cb2b5cb93..d8986b0a7f9c 100644
--- a/ui/egl-helpers.c
+++ b/ui/egl-helpers.c
@@ -76,6 +76,32 @@ void egl_fb_setup_for_tex(egl_fb *fb, int width, int height,
   GL_TEXTURE_2D, fb->texture, 0);
 }
 
+void egl_dmabuf_create_sync(QemuDmaBuf *dmabuf)
+{
+EGLSyncKHR sync;
+
+if (epoxy_has_egl_extension(qemu_egl_display,
+"EGL_KHR_fence_sync") &&
+epoxy_has_egl_extension(qemu_egl_display,
+"EGL_ANDROID_native_fence_sync")) {
+sync = eglCreateSyncKHR(qemu_egl_display,
+EGL_SYNC_NATIVE_FENCE_ANDROID, NULL);
+if (sync != EGL_NO_SYNC_KHR) {
+dmabuf->sync = sync;
+}
+}
+}
+
+void egl_dmabuf_create_fence(QemuDmaBuf *dmabuf)
+{
+if (dmabuf->sync) {
+dmabuf->fence_fd = eglDupNativeFenceFDANDROID(qemu_egl_display,
+  dmabuf->sync);
+eglDestroySyncKHR(qemu_egl_display, dmabuf->sync);
+dmabuf->sync = NULL;
+}
+}
+
 void egl_fb_setup_new_tex(egl_fb *fb, int width, int height)
 {
 GLuint texture;
-- 
2.31.1

Re: [PULL 0/7] Migration.next patches

2021-09-10 Thread Li, Zhijian




on 2021/9/10 20:55, Philippe Mathieu-Daudé wrote:

BTW: Does QEMU provide any mean to set http(s)_proxy to building vm ? 
Currently, i have to
hack the code like:

-self.ssh_root_check("pkg install -y %s\n" % " ".join(self.pkgs))
+self.ssh_root_check("setenv HTTP_PROXY http://myproxy; setenv HTTPS_PROXY 
http://myproxy; pkg install -y %s\n" % " ".join(self.pkgs))

This is supported since commit b08ba163aaa ("tests/vm: send proxy
environment variables over ssh"). Maybe we only pass lower case
variables and should consider upper case too?


Great, I'm glad to know this. Thank you.
Lower case variables also work well on FreeBSD, so it's sufficient i think.


Thanks
Zhijian

Re: [PATCH 1/4] target/arm: Add TB flag for "MVE insns not predicated"

2021-09-10 Thread Richard Henderson


On 9/10/21 11:30 AM, Peter Maydell wrote:

If gen_lookup_tb() and DISAS_UPDATE_EXIT are the same, maybe we
should get rid of gen_lookup_tb() entirely ?


Yes, I think we should do just that.


r~

Re: [PULL 0/7] Migration.next patches

2021-09-10 Thread Philippe Mathieu-Daudé

On 9/10/21 7:27 AM, lizhij...@fujitsu.com wrote:
> On 10/09/2021 13:20, Li Zhijian wrote:
>> On 10/09/2021 00:10, Juan Quintela wrote:
>>> "Li, Zhijian"  wrote:
 on 2021/9/9 21:42, Peter Maydell wrote:
> On Thu, 9 Sept 2021 at 11:36, Juan Quintela  wrote:
> Fails to build, FreeBSD:
>
> ../src/migration/rdma.c:1146:23: error: use of undeclared identifier
> 'IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE'
>   int advice = wr ? IBV_ADVISE_MR_ADVICE_PREFETCH_WRITE :
>     ^
> ../src/migration/rdma.c:1147:18: error: use of undeclared identifier
> 'IBV_ADVISE_MR_ADVICE_PREFETCH'
>    IBV_ADVISE_MR_ADVICE_PREFETCH;
>    ^
> ../src/migration/rdma.c:1150:11: warning: implicit declaration of
> function 'ibv_advise_mr' is invalid in C99
> [-Wimplicit-function-declaration]
>   ret = ibv_advise_mr(pd, advice,
>     ^
> ../src/migration/rdma.c:1151:25: error: use of undeclared identifier
> 'IBV_ADVISE_MR_FLAG_FLUSH'
>   IBV_ADVISE_MR_FLAG_FLUSH, _list, 1);
>   ^
>
 it's introduced by [PULL 4/7] migration/rdma: advise prefetch write for 
 ODP region
 where it calls a ibv_advise_mr(). i have checked the latest FreeBSD, it 
 didn't ship with this API
 May i know if just FressBSD reports this failure? if so, i just need 
 filtering out FreeBSD only
>>> Second try.  I can't see an example where they search for:
>>> a symbol on the header file
>>>    and
>>> a function in a library
>>>
>>> so I assume that if you have the symbols, you have the function.
>>>
>>> How do you see it?
>>>
>>> Trying to compile it on vm-build-freebsd, but not being very sucessfull
>>> so far.
> 
> BTW: Does QEMU provide any mean to set http(s)_proxy to building vm ? 
> Currently, i have to
> hack the code like:
> 
> -self.ssh_root_check("pkg install -y %s\n" % " ".join(self.pkgs))
> +self.ssh_root_check("setenv HTTP_PROXY http://myproxy; setenv 
> HTTPS_PROXY http://myproxy; pkg install -y %s\n" % " ".join(self.pkgs))

This is supported since commit b08ba163aaa ("tests/vm: send proxy
environment variables over ssh"). Maybe we only pass lower case
variables and should consider upper case too?

> 
> 
> Thanks
> Zhijian

Re: SMMU Stage 2 translation in QEMU

2021-09-10 Thread Peter Maydell

On Fri, 10 Sept 2021 at 13:39,  wrote:
>
> I am referring to the latter,"purely emulated QEMU with an emulated
> SMMU that handles accesses to emulated devices"

In that case, the stage 2 tables are set up by the guest
code (running at emulated EL2), just as they would be if
it were running on real hardware.

-- PMM

Re: [PATCH 1/5] docs/devel: document expectations for QAPI data modelling for QMP

2021-09-10 Thread Daniel P . Berrangé

On Thu, Sep 09, 2021 at 11:33:20AM +0200, Markus Armbruster wrote:
> Daniel P. Berrangé  writes:
> 
> > Traditionally we have required that newly added QMP commands will model
> > any returned data using fine grained QAPI types. This is good for
> > commands that are intended to be consumed by machines, where clear data
> > representation is very important. Commands that don't satisfy this have
> > generally been added to HMP only.
> >
> > In effect the decision of whether to add a new command to QMP vs HMP has
> > been used as a proxy for the decision of whether the cost of designing a
> > fine grained QAPI type is justified by the potential benefits.
> >
> > As a result the commands present in QMP and HMP are non-overlapping
> > sets, although HMP comamnds can be accessed indirectly via the QMP
> > command 'human-monitor-command'.
> >
> > One of the downsides of 'human-monitor-command' is that the QEMU monitor
> > APIs remain tied into various internal parts of the QEMU code. For
> > example any exclusively HMP command will need to use 'monitor_printf'
> > to get data out. It would be desirable to be able to fully isolate the
> > monitor implementation from QEMU internals, however, this is only
> > possible if all commands are exclusively based on QAPI with direct
> > QMP exposure.
> >
> > The way to achieve this desired end goal is to finese the requirements
> > for QMP command design. For cases where the output of a command is only
> > intended for human consumption, it is reasonable to want to simplify
> > the implementation by returning a plain string containing formatted
> > data instead of designing a fine grained QAPI data type. This can be
> > permitted if-and-only-if the command is exposed under the 'x-' name
> > prefix. This indicates that the command data format is liable to
> > future change and that it is not following QAPI design best practice.
> >
> > The poster child example for this would be the 'info registers' HMP
> > command which returns printf formatted data representing CPU state.
> > This information varies enourmously across target architectures and
> > changes relatively frequently as new CPU features are implemented.
> > It is there as debugging data for human operators, and any machine
> > usage would treat it as an opaque blob. It is thus reasonable to
> > expose this in QMP as 'x-query-registers' returning a 'str' field.
> >
> > Signed-off-by: Daniel P. Berrangé 
> > ---
> >  docs/devel/writing-qmp-commands.rst | 25 +
> >  1 file changed, 25 insertions(+)


> > +QAPI types. As a general guide, a caller of the QMP command should never 
> > need
> > +to parse individual returned data fields. If a field appears to need 
> > parsing,
> > +them it should be split into separate fields corresponding to each distinct
> > +data item. This should be the common case for any new QMP command that is
> > +intended to be used by machines, as opposed to exclusively human operators.
> > +
> > +Some QMP commands, however, are only intended as adhoc debugging aids for 
> > human
> > +operators. While they may return large amounts of formatted data, it is not
> > +expected that machines will need to parse the result. The overhead of 
> > defining
> > +a fine grained QAPI type for the data may not be justified by the potential
> > +benefit. In such cases, it is permitted to have a command return a simple 
> > string
> 
> There are many existing long lines in this file, so I'm not flagging
> yours, except for this one, because it increases the maximum.

This line is at exactly 80 characters so checkstyle is happy with it.
We don't have any requirement for a differenet line limit for docs
afair ?


Regards,
Daniel
-- 
|: https://berrange.com  -o-https://www.flickr.com/photos/dberrange :|
|: https://libvirt.org -o-https://fstop138.berrange.com :|
|: https://entangle-photo.org-o-https://www.instagram.com/dberrange :|

1 2 3 >

1 - 100 of 204 matches

Mail list logo