Re: [RFC] KVM / QEMU: Introduce Interface for Querying APICv Info

2022-05-19 Thread Suravee Suthikulpanit

On 5/20/22 12:26 PM, Chao Gao wrote:

On Fri, May 20, 2022 at 10:30:40AM +0700, Suthikulpanit, Suravee wrote:

Hi All,

Currently, we don't have a good way to check whether APICv is active on a VM.
For AMD SVM AVIC, users normally have to either check for trace points or use
"perf kvm stat live" to catch AVIC-related #VMEXITs.

For KVM, I would like to propose introducing a new IOCTL interface
(i.e. KVM_GET_APICV_INFO) through which user-space tools (e.g. the QEMU
monitor) can query run-time APICv information for the VM and its vCPUs,
such as the APICv inhibit reason flags.

For QEMU, we can leverage the "info lapic" command and append the APICv
information after all the LAPIC register information:

For example:

- Begin Snippet -
(qemu) info lapic 0
dumping local APIC state for CPU 0

LVT0 0x00010700 active-hi edge  masked  ExtINT (vec 0)
LVT1 0x0400 active-hi edge  NMI
LVTPC    0x0001 active-hi edge  masked  Fixed  (vec 0)
LVTERR   0x00fe active-hi edge  Fixed  (vec 254)
LVTTHMR  0x0001 active-hi edge  masked  Fixed  (vec 0)
LVTT 0x000400ee active-hi edge tsc-deadline Fixed  (vec 238)
TimerDCR=0x0 (divide by 2) initial_count = 0 current_count = 0
SPIV 0x01ff APIC enabled, focus=off, spurious vec 255
ICR  0x00fd physical edge de-assert no-shorthand
ICR2 0x0005 cpu 5 (X2APIC ID)
ESR  0x
ISR  (none)
IRR  (none)

APR 0x00 TPR 0x00 DFR 0x0f LDR 0x00 PPR 0x00

APICV   vm inhibit: 0x10 <-- HERE
APICV vcpu inhibit: 0 <-- HERE

-- End Snippet --

Alternatively, we could add an APICv-specific info command (e.g. "info apicv").


I think this information can be added to kvm per-vm/vcpu debugfs. Then no
qemu change is needed.


I suggested the KVM debugfs approach in the past, but it was pointed out that it might be better to have a
proper interface and leverage the QEMU monitor. The debugfs files would be difficult to use with a large number of VMs,
since we would need to locate each QEMU PID and search under /sys/kernel/debug/kvm/, although it would be easy
to write a shell script to read the information from these files.


With an IOCTL interface, other user-space tools and libraries can also query
this information.

We can also have both :)

Best Regards,
Suravee



Re: [RFC] KVM / QEMU: Introduce Interface for Querying APICv Info

2022-05-19 Thread Chao Gao
On Fri, May 20, 2022 at 10:30:40AM +0700, Suthikulpanit, Suravee wrote:
>Hi All,
>
>Currently, we don't have a good way to check whether APICV is active on a VM.
>Normally, For AMD SVM AVIC, users either have to check for trace point, or 
>using
>"perf kvm stat live" to catch AVIC-related #VMEXIT.
>
>For KVM, I would like to propose introducing a new IOCTL interface (i.e. 
>KVM_GET_APICV_INFO),
>where user-space tools (e.g. QEMU monitor) can query run-time information of 
>APICv for VM and vCPUs
>such as APICv inhibit reason flags.
>
>For QEMU, we can leverage the "info lapic" command, and append the APICV 
>information after
>all LAPIC register information:
>
>For example:
>
>- Begin Snippet -
>(qemu) info lapic 0
>dumping local APIC state for CPU 0
>
>LVT0 0x00010700 active-hi edge  masked  ExtINT (vec 0)
>LVT1 0x0400 active-hi edge  NMI
>LVTPC    0x0001 active-hi edge  masked  Fixed  (vec 0)
>LVTERR   0x00fe active-hi edge  Fixed  (vec 254)
>LVTTHMR  0x0001 active-hi edge  masked  Fixed  (vec 0)
>LVTT 0x000400ee active-hi edge tsc-deadline Fixed  (vec 238)
>TimerDCR=0x0 (divide by 2) initial_count = 0 current_count = 0
>SPIV 0x01ff APIC enabled, focus=off, spurious vec 255
>ICR  0x00fd physical edge de-assert no-shorthand
>ICR2 0x0005 cpu 5 (X2APIC ID)
>ESR  0x
>ISR  (none)
>IRR  (none)
>
>APR 0x00 TPR 0x00 DFR 0x0f LDR 0x00 PPR 0x00
>
>APICV   vm inhibit: 0x10 <-- HERE
>APICV vcpu inhibit: 0 <-- HERE
>
>-- End Snippet --
>
>Alternatively, we could add an APICv-specific info command (e.g. "info apicv").

I think this information can be added to kvm per-vm/vcpu debugfs. Then no
qemu change is needed.



Re: [External] Re: [PATCH] hw/pci/pcie.c: Fix invalid PCI_EXP_LNKCAP setting

2022-05-19 Thread Wenliang Wang
Since PCI_EXP_LNKCAP is never masked when loading, this patch does affect
cross-version migration. It seems we need machine-type compat handling to
deal with that. What do you suggest, Michael?


On 5/20/22 12:49 AM, Michael S. Tsirkin wrote:

On Thu, May 19, 2022 at 10:45:59PM +0800, Wenliang Wang wrote:

pcie_cap_fill_slot_lnk() wrongly sets PCI_EXP_LNKCAP when the slot speed
and width are not set, causing a strange downstream port link cap
(Speed unknown, Width x0) and PCIe device native hotplug errors on Linux:

[3.545654] pcieport :02:00.0: pciehp: link training error: status 0x2000
[3.547143] pcieport :02:00.0: pciehp: Failed to check link status

Do not touch PCI_EXP_LNKCAP when speed=0 or width=0, as pcie_cap_v1_fill()
already does the default setting for us.

Signed-off-by: Wenliang Wang 



Do we need a machine type compat dance for this?
Can you check whether this affects cross-version
migration, please?


---
  hw/pci/pcie.c | 5 +
  1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 68a62da..c82e7fc 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -92,6 +92,11 @@ static void pcie_cap_fill_slot_lnk(PCIDevice *dev)
  return;
  }
  
+/* Use default LNKCAP setting */
+if (s->speed == 0 || s->width == 0) {
+return;
+}
+
  /* Clear and fill LNKCAP from what was configured above */
  pci_long_test_and_clear_mask(exp_cap + PCI_EXP_LNKCAP,
   PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS);
--
2.7.4






[RFC] KVM / QEMU: Introduce Interface for Querying APICv Info

2022-05-19 Thread Suthikulpanit, Suravee

Hi All,

Currently, we don't have a good way to check whether APICv is active on a VM.
For AMD SVM AVIC, users normally have to either check for trace points or use
"perf kvm stat live" to catch AVIC-related #VMEXITs.

For KVM, I would like to propose introducing a new IOCTL interface
(i.e. KVM_GET_APICV_INFO) through which user-space tools (e.g. the QEMU
monitor) can query run-time APICv information for the VM and its vCPUs,
such as the APICv inhibit reason flags.

For QEMU, we can leverage the "info lapic" command and append the APICv
information after all the LAPIC register information:

For example:

- Begin Snippet -
(qemu) info lapic 0
dumping local APIC state for CPU 0

LVT0 0x00010700 active-hi edge  masked  ExtINT (vec 0)
LVT1 0x0400 active-hi edge  NMI
LVTPC    0x0001 active-hi edge  masked  Fixed  (vec 0)
LVTERR   0x00fe active-hi edge  Fixed  (vec 254)
LVTTHMR  0x0001 active-hi edge  masked  Fixed  (vec 0)
LVTT 0x000400ee active-hi edge tsc-deadline Fixed  (vec 238)
TimerDCR=0x0 (divide by 2) initial_count = 0 current_count = 0
SPIV 0x01ff APIC enabled, focus=off, spurious vec 255
ICR  0x00fd physical edge de-assert no-shorthand
ICR2 0x0005 cpu 5 (X2APIC ID)
ESR  0x
ISR  (none)
IRR  (none)

APR 0x00 TPR 0x00 DFR 0x0f LDR 0x00 PPR 0x00

APICV   vm inhibit: 0x10 <-- HERE
APICV vcpu inhibit: 0 <-- HERE

-- End Snippet --

Alternatively, we could add an APICv-specific info command (e.g. "info apicv").

Any suggestions are much appreciated.

Best Regards,
Suravee



[PATCH v5] qga: add guest-get-diskstats command for Linux guests

2022-05-19 Thread luzhipeng
Add a new 'guest-get-diskstats' command to report disk I/O statistics
for Linux guests. This can be useful for monitoring I/O flow or diagnosing
I/O faults without having to enter the guest.

Signed-off-by: luzhipeng 
Reviewed-by: Marc-André Lureau 
---
 Changes v4->v5: fix typo and adjust field order in qapi-schema
 Changes v3->v4: 
https://patchew.org/QEMU/20220515095437.1291-1-luzhip...@cestc.cn/
 Changes v2->v3: bugfix for memory leak 
 Changes v1->v2: 
v1:https://patchew.org/QEMU/20220512011930.214-1-luzhip...@cestc.cn/
 
 qga/commands-posix.c | 123 +++
 qga/commands-win32.c |   6 +++
 qga/qapi-schema.json |  86 ++
 3 files changed, 215 insertions(+)

diff --git a/qga/commands-posix.c b/qga/commands-posix.c
index 69f209af87..12b50b7124 100644
--- a/qga/commands-posix.c
+++ b/qga/commands-posix.c
@@ -2783,6 +2783,122 @@ GuestMemoryBlockInfo *qmp_guest_get_memory_block_info(Error **errp)
 return info;
 }
 
+#define MAX_NAME_LEN 128
+static GuestDiskStatsInfoList *guest_get_diskstats(Error **errp)
+{
+#ifdef CONFIG_LINUX
+GuestDiskStatsInfoList *head = NULL, **tail = &head;
+const char *diskstats = "/proc/diskstats";
+FILE *fp;
+size_t n;
+char *line = NULL;
+
+fp = fopen(diskstats, "r");
+if (fp  == NULL) {
+error_setg_errno(errp, errno, "open(\"%s\")", diskstats);
+return NULL;
+}
+
+while (getline(&line, &n, fp) != -1) {
+g_autofree GuestDiskStatsInfo *diskstatinfo = NULL;
+g_autofree GuestDiskStats *diskstat = NULL;
+char dev_name[MAX_NAME_LEN];
+unsigned int ios_pgr, tot_ticks, rq_ticks, wr_ticks, dc_ticks, fl_ticks;
+unsigned long rd_ios, rd_merges_or_rd_sec, rd_ticks_or_wr_sec, wr_ios;
+unsigned long wr_merges, rd_sec_or_wr_ios, wr_sec;
+unsigned long dc_ios, dc_merges, dc_sec, fl_ios;
+unsigned int major, minor;
+int i;
+
+i = sscanf(line, "%u %u %s %lu %lu %lu"
+   " %lu %lu %lu %lu %u %u %u %u"
+   " %lu %lu %lu %u %lu %u",
+   &major, &minor, dev_name,
+   &rd_ios, &rd_merges_or_rd_sec, &rd_sec_or_wr_ios,
+   &rd_ticks_or_wr_sec, &wr_ios, &wr_merges, &wr_sec,
+   &wr_ticks, &ios_pgr, &tot_ticks, &rq_ticks,
+   &dc_ios, &dc_merges, &dc_sec, &dc_ticks,
+   &fl_ios, &fl_ticks);
+
+if (i < 7) {
+continue;
+}
+
+diskstatinfo = g_new0(GuestDiskStatsInfo, 1);
+diskstatinfo->name = g_strdup(dev_name);
+diskstatinfo->major = major;
+diskstatinfo->minor = minor;
+
+diskstat = g_new0(GuestDiskStats, 1);
+if (i == 7) {
+diskstat->has_read_ios = true;
+diskstat->read_ios = rd_ios;
+diskstat->has_read_sectors = true;
+diskstat->read_sectors = rd_merges_or_rd_sec;
+diskstat->has_write_ios = true;
+diskstat->write_ios = rd_sec_or_wr_ios;
+diskstat->has_write_sectors = true;
+diskstat->write_sectors = rd_ticks_or_wr_sec;
+}
+if (i >= 14) {
+diskstat->has_read_ios = true;
+diskstat->read_ios = rd_ios;
+diskstat->has_read_sectors = true;
+diskstat->read_sectors = rd_sec_or_wr_ios;
+diskstat->has_read_merges = true;
+diskstat->read_merges = rd_merges_or_rd_sec;
+diskstat->has_read_ticks = true;
+diskstat->read_ticks = rd_ticks_or_wr_sec;
+diskstat->has_write_ios = true;
+diskstat->write_ios = wr_ios;
+diskstat->has_write_sectors = true;
+diskstat->write_sectors = wr_sec;
+diskstat->has_write_merges = true;
+diskstat->write_merges = wr_merges;
+diskstat->has_write_ticks = true;
+diskstat->write_ticks = wr_ticks;
+diskstat->has_ios_pgr = true;
+diskstat->ios_pgr = ios_pgr;
+diskstat->has_total_ticks = true;
+diskstat->total_ticks = tot_ticks;
+diskstat->has_weight_ticks = true;
+diskstat->weight_ticks = rq_ticks;
+}
+if (i >= 18) {
+diskstat->has_discard_ios = true;
+diskstat->discard_ios = dc_ios;
+diskstat->has_discard_merges = true;
+diskstat->discard_merges = dc_merges;
+diskstat->has_discard_sectors = true;
+diskstat->discard_sectors = dc_sec;
+diskstat->has_discard_ticks = true;
+diskstat->discard_ticks = dc_ticks;
+}
+if (i >= 20) {
+diskstat->has_flush_ios = true;
+diskstat->flush_ios = fl_ios;
+diskstat->has_flush_ticks = true;
+diskstat->flush_ticks = fl_ticks;
+}
+
+diskstatinfo->stats = g_steal_pointer(&diskstat);
+QAPI_LIST_APPEND(tail, diskstatinfo);
+diskstatinfo = NULL;
+}
+free(line);
+fclose(fp);
+return head;

Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache

2022-05-19 Thread Nicholas Piggin
Excerpts from Richard Henderson's message of May 20, 2022 4:31 am:
> On 5/19/22 07:11, Nicholas Piggin wrote:
>> dcache writeback and icache invalidate is not required when icache is
>> coherent, a shorter fixed-length sequence can be used which just has to
>> flush and re-fetch instructions that were in-flight.
>> 
>> Signed-off-by: Nicholas Piggin 
>> ---
>> 
>> I haven't been able to measure a significant performance difference
>> with this, qemu isn't flushing large ranges frequently so the old sequence
>> is not that slow.
> 
> Yeah, we should be flushing smallish regions (< 1-4k), as we generate 
> TranslationBlocks. 
> And hopefully the translation cache is large enough that we spend more time 
> executing 
> blocks than re-compiling them.  ;-)
> 
> 
>> +++ b/include/qemu/cacheflush.h
>> @@ -28,6 +28,10 @@ static inline void flush_idcache_range(uintptr_t rx, 
>> uintptr_t rw, size_t len)
>>   
>>   #else
>>   
>> +#if defined(__powerpc__)
>> +extern bool have_coherent_icache;
>> +#endif
> 
> Ug.  I'm undecided where to put this.  I'm tempted to say...
> 
>> --- a/util/cacheflush.c
>> +++ b/util/cacheflush.c
>> @@ -108,7 +108,16 @@ void flush_idcache_range(uintptr_t rx, uintptr_t rw, 
>> size_t len)
> 
> ... here in cacheflush.c, with a comment that the variable is defined and 
> initialized in 
> cacheinfo.c.
> 
> I'm even more tempted to merge the two files to put all of the 
> machine-specific cache data 
> in the same place, then this variable can be static.  There's even an 
> existing TODO 
> comment in cacheflush.c for aarch64.

That might be nice. Do you want me to look at doing that first?

>>   b = rw & ~(dsize - 1);
>> +
>> +if (have_coherent_icache) {
>> +asm volatile ("sync" : : : "memory");
>> +asm volatile ("icbi 0,%0" : : "r"(b) : "memory");
>> +asm volatile ("isync" : : : "memory");
>> +return;
>> +}
> 
> Where can I find definitive rules on this?

In processor manuals (I don't know if there are any notes about this in 
the ISA, I would be tempted to say there should be since many processors
implement it).

POWER9 UM, 4.6.2.2 Instruction Cache Block Invalidate (icbi) 

https://ibm.ent.box.com/s/tmklq90ze7aj8f4n32er1mu3sy9u8k3k

> Note that rx may not equal rw, and that we've got two virtual mappings for 
> the same 
> memory, one for "data" that is read-write and one for "execute" that is 
> read-execute. 
> (This split is enabled only for --enable-debug-tcg builds on linux, to make 
> sure we don't 
> regress apple m1, which requires the split all of the time.)
> 
> In particular, you're flushing one icache line with the dcache address, and 
> that you're 
> not flushing any of the other lines.  Is the coherent icache thing really 
> that we may 
> simply skip the dcache flush step, but must still flush all of the icache 
> lines?

Yeah it's just a funny sequence the processor implements. It treats icbi 
almost as a no-op except that it sets a flag such that the next isync 
will flush and refetch the pipeline. It doesn't do any cache flushing.

> Without docs, "icache snoop" to me would imply that we only need the two 
> barriers and no 
> flushes at all, just to make sure all memory writes complete before any new 
> instructions 
> are executed.  This would be like the two AArch64 bits, IDC and DIC, which 
> indicate that 
> the two caches are coherent to Point of Unification, which leaves us with 
> just the 
> Instruction Sequence Barrier at the end of the function.
> 
> 
>> +bool have_coherent_icache = false;
> 
> scripts/checkpatch.pl should complain this is initialized to 0.
> 
> 
>>   static void arch_cache_info(int *isize, int *dsize)
>>   {
>> +#  ifdef PPC_FEATURE_ICACHE_SNOOP
>> +unsigned long hwcap = qemu_getauxval(AT_HWCAP);
>> +#  endif
>> +
>>   if (*isize == 0) {
>>   *isize = qemu_getauxval(AT_ICACHEBSIZE);
>>   }
>>   if (*dsize == 0) {
>>   *dsize = qemu_getauxval(AT_DCACHEBSIZE);
>>   }
>> +
>> +#  ifdef PPC_FEATURE_ICACHE_SNOOP
>> +have_coherent_icache = (hwcap & PPC_FEATURE_ICACHE_SNOOP) != 0;
>> +#  endif
> 
> Better with only one ifdef, moving this second hunk up.

Will clean those bits up, thanks.

> It would be nice if there were some kernel documentation for this...

arm64 has kernel docs for hwcaps... powerpc probably should as well.
Good point, I might do a patch for that.

Thanks,
Nick



Re: [PATCH 2/5] machine.py: add default pseries params in machine.py

2022-05-19 Thread John Snow
On Mon, May 16, 2022, 12:53 PM Daniel Henrique Barboza <
danielhb...@gmail.com> wrote:

> pSeries guests set a handful of machine capabilities on by default, all
> of them related to security mitigations, that aren't always available in
> the host.
>
> This means that, as is today, running avocado in a Power9 server without
> the proper firmware support, and with --disable-tcg, this error will
> occur:
>
>  (1/1) tests/avocado/info_usernet.py:InfoUsernet.test_hostfwd: ERROR:
> ConnectError:
> Failed to establish session: EOFError\n  Exit code: 1\n  (...)
> (...)
> Command: ./qemu-system-ppc64 -display none -vga none (...)
> Output: qemu-system-ppc64: warning: netdev vnet has no peer
> qemu-system-ppc64: Requested safe cache capability level not supported by
> KVM
> Try appending -machine cap-cfpc=broken
>
> info_usernet.py happens to trigger this error first, but all tests would
> fail in this configuration because the host does not support the default
> 'cap-cfpc' capability.
>
> A similar situation was already fixed a couple of years ago by Greg Kurz
> (commit 63d57c8f91d0) but it was focused on TCG warnings for these same
> capabilities and running C qtests. This commit ended up preventing the
> problem we're facing with avocado when running qtests with KVM support.
>
> This patch does a similar approach by amending machine.py to disable
> these security capabilities in case we're running a pseries guest. The
> change is made in the _launch() callback to be sure that we're already
> committed into launching the guest. It's also worth noticing that we're
> relying on self._machine being set accordingly (i.e. via tag:machine),
> which is currently the case for all ppc64 related avocado tests.
>
> Signed-off-by: Daniel Henrique Barboza 
> ---
>  python/qemu/machine/machine.py | 13 +
>  1 file changed, 13 insertions(+)
>
> diff --git a/python/qemu/machine/machine.py
> b/python/qemu/machine/machine.py
> index 07ac5a710b..12e5e37bff 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -51,6 +51,11 @@
>
>
>  LOG = logging.getLogger(__name__)
> +PSERIES_DEFAULT_CAPABILITIES = ("cap-cfpc=broken,"
> +"cap-sbbc=broken,"
> +"cap-ibs=broken,"
> +"cap-ccf-assist=off,"
> +"cap-fwnmi=off")
>
>
>  class QEMUMachineError(Exception):
> @@ -447,6 +452,14 @@ def _launch(self) -> None:
>  """
>  Launch the VM and establish a QMP connection
>  """
> +
> +# pseries needs extra machine options to disable Spectre/Meltdown
> +# KVM related capabilities that might not be available in the
> +# host.
> +if "qemu-system-ppc64" in self._binary:
> +if self._machine is None or "pseries" in self._machine:
> +self._args.extend(['-machine',
> PSERIES_DEFAULT_CAPABILITIES])
> +
>  self._pre_launch()
>  LOG.debug('VM launch command: %r', ' '.join(self._qemu_full_args))
>
> --
> 2.32.0
>

Hm, okay.

I have plans to try and factor the machine appliance out and into an
upstream package in the near future, so I want to avoid more hardcoding of
defaults.

Does avocado have a subclass of QEMUMachine where it might be more
appropriate to stick this bandaid? Can we make one?

(I don't think iotests runs into this problem because we always use
machine:none there, I think. VM tests might have a similar problem though,
and then it'd be reasonable to want the bandaid here in machine.py ...
well, boo. okay.)

My verdict is that it's a bandaid, but I'll accept it if the avocado folks
agree to it and I'll sort it out later when I do my rewrite.

I don't think I have access to a power9 machine to test this with either,
so I might want a tested-by from someone who does.

--js

>


Re: The fate of iotest 297

2022-05-19 Thread John Snow
On Thu, May 19, 2022, 4:25 AM Daniel P. Berrangé 
wrote:

> On Thu, May 19, 2022 at 09:54:56AM +0200, Kevin Wolf wrote:
> > Am 18.05.2022 um 20:21 hat John Snow geschrieben:
> > > To wire it up to "make check" by *default*, I believe I need to expand
> the
> > > configure script to poll for certain requisites and then create some
> > > wrapper script of some kind that only engages the python tests if the
> > > requisites were met ... and I lose some control over the mypy/pylint
> > > versioning windows. I have to tolerate a wider versioning, or it'll
> never
> > > get run in practice.
> > >
> > > I have some reluctance to doing this, because pylint and mypy change so
> > > frequently that I don't want "make check" to fail spuriously in the
> future.
> > >
> > > (In practice, these failures occur 100% of the time when I am on
> vacation.)
> >
> > So we seem to agree that it's something that we do expect to fail from
> > time to time. Maybe this is how I could express my point better: If it's
> > a hard failure, it should fail as early as possible - i.e. ideally
> > before the developer sends a patch, but certainly before failing a pull
> > request.
>
> At least with pylint we can make an explicit list of which lint
> checks we want to run, so we should not get new failures when a
> new pylint is released. If there are rare cases where we none
> the less see a new failure from a new release, then so be it,
> whoever hits it first can send a patch. IOW, I think we should
> just enable pylint all the time with a fixed list of tests we
> care about. Over time we can enable more of its checks when
> desired.
>

Yeh, this might help a bit. If we use system packages by default, we'll
also generally avoid using bleeding edge packages and I'll (generally)
catch those myself via check-tox before people run into them organically.


> I don't know enough about mypy to know if it can provide similar
> level of control. Possibly the answer for "should we run it by default"
> will be different for pylint vs mypy.
>

Yeah, we can probably do different things. mypy is actually much more
stable than pylint IMO, it's probably actually okay to just let that one
behave as-is.

(I know I have a fix for 0.950 in my recent rfc series, but anecdotally I
feel mypy changes behavior a lot less often than pylint. isort and flake8
have basically never ever broken on update for me, either.)

Still, none of this is all that different from the case where
> new GCC or CLang are released and developers find new warnings
> have arrived. People just send patches when they hit this.
> Given python is a core part of QEMU's dev tooling, I think it
> is reasonable to expect developers to cope with this for python
> too, as long as the frequency of problems is not unreasonably
> high.
>

To some extent, though it's still a bummer to get warnings and errors that
have nothing to do with your changes. I have made sure I test a wide matrix
to the best of my ability, so it should be fine. I guess I'm just super
conservative about it ...

(Well, and even when I had the check-tox test set to allow failure, the
yellow exclamation mark still annoyed people. I'm just keen to avoid more
nastygrams.)


> > > That said ... maybe I can add a controlled venv version of
> "check-python"
> > > and just have a --disable-check-python or something that spec files
> can opt
> > > into. Maybe that will work well enough?
> > >
> > > i.e. maybe configure can check for the presence of pip, the python venv
> > > module (debian doesn't ship it standard...), and PyPI connectivity and
> if
> > > so, enables the test. Otherwise, we skip it.
> >
> > I think this should work. If detecting the right environment is hard, I
> > don't think there is even a requirement to do so. You can make
> > --enable-check-python the default and if people don't want it, they can
> > explicitly disable it. (I understand that until you run 'make check', it
> > doesn't make a difference anyway, so pure users would never have to
> > change the option, right?)
>
> I think it should just be the default too. Contributors have to accept
> that python is a core part of our project and we expect such code to
> pass various python quality control tests, on the wide variety of OS
> platforms we run on.
>

I meant that I'd have the default be "auto", but if you're arguing for the
default to be "on", I suppose I could. I have a weak preference for keeping
the min requisites for a no-option configure set small. This should be
trivial to change in either direction, though.

The requisites aren't steep: you just need python and the venv stdlib
module. If you have python, you meet that requisite on every platform
except debian/ubuntu, which ships venv separately. In practice, it probably
will be enabled for most people by default.


> > > Got it. I'll see what I can come up with that checks the boxes for
> > > everyone, thanks for clarifying yours.
> > >
> > > I want to make everything "just work" but I'm also 

Re: [PATCH v4] fcntl: Add 32bit filesystem mode

2022-05-19 Thread Linus Walleij
On Thu, May 19, 2022 at 4:23 PM Icenowy Zheng  wrote:
> 在 2020-11-18星期三的 00:39 +0100,Linus Walleij写道:

> > It was brought to my attention that this bug from 2018 was
> > still unresolved: 32 bit emulators like QEMU were given
> > 64 bit hashes when running 32 bit emulation on 64 bit systems.
>
> Sorry for replying to such an old mail, but I found that using 32-bit file
> syscalls in 32-bit QEMU user on 64-bit hosts is still broken today,
> and google sent me here.

Yeah the bug was 2 years old when I started patching it and now it
is 4 years old...

> This mail does not get any reply according to linux-ext4 patchwork, so
> could I ping it?

I suppose, I think the patch is authored according to the maintainer
requirements, but I'm happy to revise and resend it if it no longer
applies.

Arnd and others suggested to maybe use F_SETFL instead:
https://lore.kernel.org/linux-fsdevel/CAK8P3a2SN2zeK=dj01Br-m86rJmK8mOyH=ghaidwspgkaet...@mail.gmail.com/

I am happy to do it either way but need some input from the
maintainer (Ted). Maybe someone else on the fsdevel list wants to
chime in? Or maybe any FS maintainer can simply apply this?

Yours,
Linus Walleij



Re: [PATCH] hw/riscv: virt: Avoid double FDT platform node

2022-05-19 Thread Dylan Reid
On Thu, May 19, 2022 at 08:34:06PM +0530, Anup Patel wrote:
> On Fri, May 13, 2022 at 1:34 AM Dylan Reid  wrote:
> >
> > When starting the virt machine with `-machine virt,aia=aplic-imsic`,
> > both the imsic and aplic init code will add platform fdt nodes by
> > calling `platform_bus_add_all_fdt_nodes`. This leads to an error at
> > startup:
> > ```
> > qemu_fdt_add_subnode: Failed to create subnode /platform@400: 
> > FDT_ERR_EXISTS
> > ```
> >
> > The call from `create_fdt_imsic` is not needed as an imsic is currently
> > always combined with an aplic that will create the nodes.
> >
> > Fixes: 3029fab64309 ("hw/riscv: virt: Add support for generating platform 
> > FDT entries")
> > Signed-off-by: Dylan Reid 
> > ---
> >  hw/riscv/virt.c | 5 -
> >  1 file changed, 5 deletions(-)
> >
> > diff --git a/hw/riscv/virt.c b/hw/riscv/virt.c
> > index 3326f4db96..d625f776a6 100644
> > --- a/hw/riscv/virt.c
> > +++ b/hw/riscv/virt.c
> > @@ -561,11 +561,6 @@ static void create_fdt_imsic(RISCVVirtState *s, const 
> > MemMapEntry *memmap,
> >  }
> >  qemu_fdt_setprop_cell(mc->fdt, imsic_name, "phandle", *msi_m_phandle);
> >
> > -platform_bus_add_all_fdt_nodes(mc->fdt, imsic_name,
> > -   memmap[VIRT_PLATFORM_BUS].base,
> > -   memmap[VIRT_PLATFORM_BUS].size,
> > -   VIRT_PLATFORM_BUS_IRQ);
> > -
> 
> This patch only fixes for the case where there is only one socket.
> 
> I had sent out a similar fix which also handles the multi-socket case.
> https://lore.kernel.org/all/20220511144528.393530-9-apa...@ventanamicro.com/

Thanks Anup, that looks good to me.

> 
> Regards,
> Anup
> 
> 
> >  g_free(imsic_name);
> >
> >  /* S-level IMSIC node */
> > --
> > 2.30.2
> >
> >



Re: [PULL 00/22] target-arm queue

2022-05-19 Thread Richard Henderson

On 5/19/22 10:36, Peter Maydell wrote:

target-arm queue: mostly patches from me this time round.
Nothing too exciting.

-- PMM

The following changes since commit 78ac2eebbab9150edf5d0d00e3648f5ebb599001:

   Merge tag 'artist-cursor-fix-final-pull-request' of 
https://github.com/hdeller/qemu-hppa into staging (2022-05-18 09:32:15 -0700)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20220519

for you to fetch changes up to fab8ad39fb75a0d9f097db67b2a33754e88e:

   target/arm: Use FIELD definitions for CPACR, CPTR_ELx (2022-05-19 18:34:10 
+0100)


target-arm queue:
  * Implement FEAT_S2FWB
  * Implement FEAT_IDST
  * Drop unsupported_encoding() macro
  * hw/intc/arm_gicv3: Use correct number of priority bits for the CPU
  * Fix aarch64 debug register names
  * hw/adc/zynq-xadc: Use qemu_irq typedef
  * target/arm/helper.c: Delete stray obsolete comment
  * Make number of counters in PMCR follow the CPU
  * hw/arm/virt: Fix dtb nits
  * ptimer: Rename PTIMER_POLICY_DEFAULT to PTIMER_POLICY_LEGACY
  * target/arm: Fix PAuth keys access checks for disabled SEL2
  * Enable FEAT_HCX for -cpu max
  * Use FIELD definitions for CPACR, CPTR_ELx


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Chris Howard (1):
   Fix aarch64 debug register names.

Florian Lugou (1):
   target/arm: Fix PAuth keys access checks for disabled SEL2

Peter Maydell (17):
   target/arm: Postpone interpretation of stage 2 descriptor attribute bits
   target/arm: Factor out FWB=0 specific part of combine_cacheattrs()
   target/arm: Implement FEAT_S2FWB
   target/arm: Enable FEAT_S2FWB for -cpu max
   target/arm: Implement FEAT_IDST
   target/arm: Drop unsupported_encoding() macro
   hw/intc/arm_gicv3_cpuif: Handle CPUs that don't specify GICv3 parameters
   hw/intc/arm_gicv3: report correct PRIbits field in ICV_CTLR_EL1
   hw/intc/arm_gicv3_kvm.c: Stop using GIC_MIN_BPR constant
   hw/intc/arm_gicv3: Support configurable number of physical priority bits
   hw/intc/arm_gicv3: Use correct number of priority bits for the CPU
   hw/intc/arm_gicv3: Provide ich_num_aprs()
   target/arm/helper.c: Delete stray obsolete comment
   target/arm: Make number of counters in PMCR follow the CPU
   hw/arm/virt: Fix incorrect non-secure flash dtb node name
   hw/arm/virt: Drop #size-cells and #address-cells from gpio-keys dtb node
   ptimer: Rename PTIMER_POLICY_DEFAULT to PTIMER_POLICY_LEGACY

Philippe Mathieu-Daudé (1):
   hw/adc/zynq-xadc: Use qemu_irq typedef

Richard Henderson (2):
   target/arm: Enable FEAT_HCX for -cpu max
   target/arm: Use FIELD definitions for CPACR, CPTR_ELx

  docs/system/arm/emulation.rst  |   2 +
  include/hw/adc/zynq-xadc.h |   3 +-
  include/hw/intc/arm_gicv3_common.h |   8 +-
  include/hw/ptimer.h|  16 +-
  target/arm/cpregs.h|  24 +++
  target/arm/cpu.h   |  76 +++-
  target/arm/internals.h |  11 +-
  target/arm/translate-a64.h |   9 -
  hw/adc/zynq-xadc.c |   4 +-
  hw/arm/boot.c  |   2 +-
  hw/arm/musicpal.c  |   2 +-
  hw/arm/virt.c  |   4 +-
  hw/core/machine.c  |   4 +-
  hw/dma/xilinx_axidma.c |   2 +-
  hw/dma/xlnx_csu_dma.c  |   2 +-
  hw/intc/arm_gicv3_common.c |   5 +
  hw/intc/arm_gicv3_cpuif.c  | 225 +---
  hw/intc/arm_gicv3_kvm.c|  16 +-
  hw/m68k/mcf5206.c  |   2 +-
  hw/m68k/mcf5208.c  |   2 +-
  hw/net/can/xlnx-zynqmp-can.c   |   2 +-
  hw/net/fsl_etsec/etsec.c   |   2 +-
  hw/net/lan9118.c   |   2 +-
  hw/rtc/exynos4210_rtc.c|   4 +-
  hw/timer/allwinner-a10-pit.c   |   2 +-
  hw/timer/altera_timer.c|   2 +-
  hw/timer/arm_timer.c   |   2 +-
  hw/timer/digic-timer.c |   2 +-
  hw/timer/etraxfs_timer.c   |   6 +-
  hw/timer/exynos4210_mct.c  |   6 +-
  hw/timer/exynos4210_pwm.c  |   2 +-
  hw/timer/grlib_gptimer.c   |   2 +-
  hw/timer/imx_epit.c|   4 +-
  hw/timer/imx_gpt.c |   2 +-
  hw/timer/mss-timer.c   |   2 +-
  hw/timer/sh_timer.c|   2 +-
  hw/timer/slavio_timer.c|   2 +-
  hw/timer/xilinx_timer.c|   2 +-
  target/arm/cpu.c   |  11 +-
  target/arm/cpu64.c |  30 
  target/arm/cpu_tcg.c   |   6 +
  target/arm/helper.c| 348 -
  target/arm/kvm64.c |  12 ++
  target/arm/op_helper.c

[PATCH v2 12/12] target/ppc: declare vmsumsh[ms] helper with call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move vmsumshm and vmsumshs to decodetree, declare both helpers with
TCG_CALL_NO_RWG, and drop the unused env argument of vmsumshm.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 4 ++--
 target/ppc/insn32.decode| 2 ++
 target/ppc/int_helper.c | 5 ++---
 target/ppc/translate/vmx-impl.c.inc | 3 ++-
 target/ppc/translate/vmx-ops.c.inc  | 1 -
 5 files changed, 8 insertions(+), 7 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 223b4c941a..3206ce5694 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -244,8 +244,8 @@ DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_5(VMSUMUHS, TCG_CALL_NO_RWG, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_4(VMSUMSHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_5(VMSUMSHS, TCG_CALL_NO_RWG, void, env, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_4(vmladduhm, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_2(mtvscr, TCG_CALL_NO_RWG, void, env, i32)
 DEF_HELPER_FLAGS_1(mfvscr, TCG_CALL_NO_RWG, i32, env)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 43ea03c3e7..f001c02a8c 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -601,6 +601,8 @@ VMULLD  000100 . . . 00111001001@VX
 
 VMSUMUBM000100 . . . . 100100   @VA
 VMSUMMBM000100 . . . . 100101   @VA
+VMSUMSHM000100 . . . . 101000   @VA
+VMSUMSHS000100 . . . . 101001   @VA
 VMSUMUHM000100 . . . . 100110   @VA
 VMSUMUHS000100 . . . . 100111   @VA
 
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9285a1c2a1..b9dd15d607 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -890,8 +890,7 @@ void helper_VMSUMMBM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 }
 }
 
-void helper_vmsumshm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
- ppc_avr_t *b, ppc_avr_t *c)
+void helper_VMSUMSHM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 int32_t prod[8];
 int i;
@@ -905,7 +904,7 @@ void helper_vmsumshm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
 }
 }
 
-void helper_vmsumshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
+void helper_VMSUMSHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
  ppc_avr_t *b, ppc_avr_t *c)
 {
 int32_t prod[8];
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index da81296b96..d7524c3204 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2587,9 +2587,9 @@ static bool trans_VSEL(DisasContext *ctx, arg_VA *a)
 return true;
 }
 
-GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
 TRANS_FLAGS(ALTIVEC, VMSUMUBM, do_va_helper, gen_helper_VMSUMUBM)
 TRANS_FLAGS(ALTIVEC, VMSUMMBM, do_va_helper, gen_helper_VMSUMMBM)
+TRANS_FLAGS(ALTIVEC, VMSUMSHM, do_va_helper, gen_helper_VMSUMSHM)
 TRANS_FLAGS(ALTIVEC, VMSUMUHM, do_va_helper, gen_helper_VMSUMUHM)
 
 static bool do_va_env_helper(DisasContext *ctx, arg_VA *a,
@@ -2612,6 +2612,7 @@ static bool do_va_env_helper(DisasContext *ctx, arg_VA *a,
 }
 
 TRANS_FLAGS(ALTIVEC, VMSUMUHS, do_va_env_helper, gen_helper_VMSUMUHS)
+TRANS_FLAGS(ALTIVEC, VMSUMSHS, do_va_env_helper, gen_helper_VMSUMSHS)
 
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 15b3e06410..d7cc57868e 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -224,7 +224,6 @@ GEN_VXFORM_UIMM(vctsxs, 5, 15),
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)   \
 GEN_HANDLER(name0##_##name1, 0x04, opc2, 0xFF, 0x, PPC_ALTIVEC)
 GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16),
-GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20),
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23),
 
 GEN_VXFORM_DUAL(vclzb, vpopcntb, 1, 28, PPC_NONE, PPC2_ALTIVEC_207),
-- 
2.25.1




[PATCH v2 05/12] target/ppc: Use TCG_CALL_NO_RWG_SE in fsel helper

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

fsel doesn't change FPSCR and CR1 is handled by gen_set_cr1_from_fpscr,
so helper_fsel doesn't need the env argument and can be declared with
TCG_CALL_NO_RWG_SE. We also take this opportunity to move the insn to
decodetree.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/fpu_helper.c| 15 +++
 target/ppc/helper.h|  2 +-
 target/ppc/insn32.decode   |  7 +++
 target/ppc/translate/fp-impl.c.inc | 30 --
 target/ppc/translate/fp-ops.c.inc  |  1 -
 5 files changed, 43 insertions(+), 12 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index f6c8318a71..b4d6f6ed4c 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -916,18 +916,17 @@ float64 helper_frsqrtes(CPUPPCState *env, float64 arg)
 }
 
 /* fsel - fsel. */
-uint64_t helper_fsel(CPUPPCState *env, uint64_t arg1, uint64_t arg2,
- uint64_t arg3)
+uint64_t helper_FSEL(uint64_t a, uint64_t b, uint64_t c)
 {
-CPU_DoubleU farg1;
+CPU_DoubleU fa;
 
-farg1.ll = arg1;
+fa.ll = a;
 
-if ((!float64_is_neg(farg1.d) || float64_is_zero(farg1.d)) &&
-!float64_is_any_nan(farg1.d)) {
-return arg2;
+if ((!float64_is_neg(fa.d) || float64_is_zero(fa.d)) &&
+!float64_is_any_nan(fa.d)) {
+return c;
 } else {
-return arg3;
+return b;
 }
 }
 
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index ba70d2133b..4a7cbdf922 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -120,7 +120,7 @@ DEF_HELPER_2(fre, i64, env, i64)
 DEF_HELPER_2(fres, i64, env, i64)
 DEF_HELPER_2(frsqrte, i64, env, i64)
 DEF_HELPER_2(frsqrtes, i64, env, i64)
-DEF_HELPER_4(fsel, i64, env, i64, i64, i64)
+DEF_HELPER_FLAGS_3(FSEL, TCG_CALL_NO_RWG_SE, i64, i64, i64, i64)
 
 DEF_HELPER_FLAGS_2(ftdiv, TCG_CALL_NO_RWG_SE, i32, i64, i64)
 DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 39372fe673..1d0b55bde3 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -17,6 +17,9 @@
 # License along with this library; if not, see .
 #
 
+  frt fra frb frc rc:bool
+@A  .. frt:5 fra:5 frb:5 frc:5 . rc:1   
+
   rt ra si:int64_t
 @D  .. rt:5 ra:5 si:s16 
 
@@ -308,6 +311,10 @@ STFDU   110111 . .. ... @D
 STFDX   01 . ..  1011010111 -   @X
 STFDUX  01 . ..  100111 -   @X
 
+### Floating-Point Select Instruction
+
+FSEL11 . . . . 10111 .  @A
+
 ### Move To/From System Register Instructions
 
 SETBC   01 . . - 011000 -   @X_bi
diff --git a/target/ppc/translate/fp-impl.c.inc b/target/ppc/translate/fp-impl.c.inc
index cfb27bd020..f9b58b844e 100644
--- a/target/ppc/translate/fp-impl.c.inc
+++ b/target/ppc/translate/fp-impl.c.inc
@@ -222,8 +222,34 @@ static void gen_frsqrtes(DisasContext *ctx)
 tcg_temp_free_i64(t1);
 }
 
-/* fsel */
-_GEN_FLOAT_ACB(sel, 0x3F, 0x17, 0, PPC_FLOAT_FSEL);
+static bool trans_FSEL(DisasContext *ctx, arg_A *a)
+{
+TCGv_i64 t0, t1, t2;
+
+REQUIRE_INSNS_FLAGS(ctx, FLOAT_FSEL);
+REQUIRE_FPU(ctx);
+
+t0 = tcg_temp_new_i64();
+t1 = tcg_temp_new_i64();
+t2 = tcg_temp_new_i64();
+
+get_fpr(t0, a->fra);
+get_fpr(t1, a->frb);
+get_fpr(t2, a->frc);
+
+gen_helper_FSEL(t0, t0, t1, t2);
+set_fpr(a->frt, t0);
+if (a->rc) {
+gen_set_cr1_from_fpscr(ctx);
+}
+
+tcg_temp_free_i64(t0);
+tcg_temp_free_i64(t1);
+tcg_temp_free_i64(t2);
+
+return true;
+}
+
 /* fsub - fsubs */
 GEN_FLOAT_AB(sub, 0x14, 0x07C0, 1, PPC_FLOAT);
 /* Optional: */
diff --git a/target/ppc/translate/fp-ops.c.inc b/target/ppc/translate/fp-ops.c.inc
index 4260635a12..0538ab2d2d 100644
--- a/target/ppc/translate/fp-ops.c.inc
+++ b/target/ppc/translate/fp-ops.c.inc
@@ -24,7 +24,6 @@ GEN_FLOAT_AC(mul, 0x19, 0xF800, 1, PPC_FLOAT),
 GEN_FLOAT_BS(re, 0x3F, 0x18, 1, PPC_FLOAT_EXT),
 GEN_FLOAT_BS(res, 0x3B, 0x18, 1, PPC_FLOAT_FRES),
 GEN_FLOAT_BS(rsqrte, 0x3F, 0x1A, 1, PPC_FLOAT_FRSQRTE),
-_GEN_FLOAT_ACB(sel, sel, 0x3F, 0x17, 0, 0, PPC_FLOAT_FSEL),
 GEN_FLOAT_AB(sub, 0x14, 0x07C0, 1, PPC_FLOAT),
 GEN_FLOAT_ACB(madd, 0x1D, 1, PPC_FLOAT),
 GEN_FLOAT_ACB(msub, 0x1C, 1, PPC_FLOAT),
-- 
2.25.1




[PATCH v2 09/12] target/ppc: introduce do_va_helper

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/translate/vmx-impl.c.inc | 32 +
 1 file changed, 5 insertions(+), 27 deletions(-)

diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 764ac45409..e66301c007 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2553,20 +2553,17 @@ static void gen_vmladduhm(DisasContext *ctx)
 tcg_temp_free_ptr(rd);
 }
 
-static bool trans_VPERM(DisasContext *ctx, arg_VA *a)
+static bool do_va_helper(DisasContext *ctx, arg_VA *a,
+void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
 {
 TCGv_ptr vrt, vra, vrb, vrc;
-
-REQUIRE_INSNS_FLAGS(ctx, ALTIVEC);
 REQUIRE_VECTOR(ctx);
 
 vrt = gen_avr_ptr(a->vrt);
 vra = gen_avr_ptr(a->vra);
 vrb = gen_avr_ptr(a->vrb);
 vrc = gen_avr_ptr(a->rc);
-
-gen_helper_VPERM(vrt, vra, vrb, vrc);
-
+gen_helper(vrt, vra, vrb, vrc);
 tcg_temp_free_ptr(vrt);
 tcg_temp_free_ptr(vra);
 tcg_temp_free_ptr(vrb);
@@ -2575,27 +2572,8 @@ static bool trans_VPERM(DisasContext *ctx, arg_VA *a)
 return true;
 }
 
-static bool trans_VPERMR(DisasContext *ctx, arg_VA *a)
-{
-TCGv_ptr vrt, vra, vrb, vrc;
-
-REQUIRE_INSNS_FLAGS2(ctx, ISA300);
-REQUIRE_VECTOR(ctx);
-
-vrt = gen_avr_ptr(a->vrt);
-vra = gen_avr_ptr(a->vra);
-vrb = gen_avr_ptr(a->vrb);
-vrc = gen_avr_ptr(a->rc);
-
-gen_helper_VPERMR(vrt, vra, vrb, vrc);
-
-tcg_temp_free_ptr(vrt);
-tcg_temp_free_ptr(vra);
-tcg_temp_free_ptr(vrb);
-tcg_temp_free_ptr(vrc);
-
-return true;
-}
+TRANS_FLAGS(ALTIVEC, VPERM, do_va_helper, gen_helper_VPERM)
+TRANS_FLAGS2(ISA300, VPERMR, do_va_helper, gen_helper_VPERMR)
 
 static bool trans_VSEL(DisasContext *ctx, arg_VA *a)
 {
-- 
2.25.1




[PATCH v2 11/12] target/ppc: declare vmsumuh[ms] helper with call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move vmsumuhm and vmsumuhs to decodetree, declare both helpers with
TCG_CALL_NO_RWG, and drop the unused env argument of vmsumuhm.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h |  4 ++--
 target/ppc/insn32.decode|  2 ++
 target/ppc/int_helper.c |  5 ++---
 target/ppc/translate/vmx-impl.c.inc | 24 ++--
 target/ppc/translate/vmx-ops.c.inc  |  1 -
 5 files changed, 28 insertions(+), 8 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index efbbd34feb..223b4c941a 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -242,8 +242,8 @@ DEF_HELPER_4(vpkudum, void, env, avr, avr, avr)
 DEF_HELPER_FLAGS_3(vpkpx, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_5(vmhaddshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmhraddshs, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vmsumuhm, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vmsumuhs, void, env, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_4(VMSUMUHM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_5(VMSUMUHS, TCG_CALL_NO_RWG, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumshm, void, env, avr, avr, avr, avr)
 DEF_HELPER_5(vmsumshs, void, env, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_4(vmladduhm, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index fdb8d76456..43ea03c3e7 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -601,6 +601,8 @@ VMULLD  000100 . . . 00111001001@VX
 
 VMSUMUBM000100 . . . . 100100   @VA
 VMSUMMBM000100 . . . . 100101   @VA
+VMSUMUHM000100 . . . . 100110   @VA
+VMSUMUHS000100 . . . . 100111   @VA
 
 VMSUMCUD000100 . . . . 010111   @VA
 VMSUMUDM000100 . . . . 100011   @VA
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 85a7442103..9285a1c2a1 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -942,8 +942,7 @@ void helper_VMSUMUBM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 }
 }
 
-void helper_vmsumuhm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
- ppc_avr_t *b, ppc_avr_t *c)
+void helper_VMSUMUHM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 uint32_t prod[8];
 int i;
@@ -957,7 +956,7 @@ void helper_vmsumuhm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
 }
 }
 
-void helper_vmsumuhs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
+void helper_VMSUMUHS(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
  ppc_avr_t *b, ppc_avr_t *c)
 {
 uint32_t prod[8];
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index 4cbd724641..da81296b96 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2587,11 +2587,31 @@ static bool trans_VSEL(DisasContext *ctx, arg_VA *a)
 return true;
 }
 
-GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19)
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
-
 TRANS_FLAGS(ALTIVEC, VMSUMUBM, do_va_helper, gen_helper_VMSUMUBM)
 TRANS_FLAGS(ALTIVEC, VMSUMMBM, do_va_helper, gen_helper_VMSUMMBM)
+TRANS_FLAGS(ALTIVEC, VMSUMUHM, do_va_helper, gen_helper_VMSUMUHM)
+
+static bool do_va_env_helper(DisasContext *ctx, arg_VA *a,
+void (*gen_helper)(TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr, TCGv_ptr))
+{
+TCGv_ptr vrt, vra, vrb, vrc;
+REQUIRE_VECTOR(ctx);
+
+vrt = gen_avr_ptr(a->vrt);
+vra = gen_avr_ptr(a->vra);
+vrb = gen_avr_ptr(a->vrb);
+vrc = gen_avr_ptr(a->rc);
+gen_helper(cpu_env, vrt, vra, vrb, vrc);
+tcg_temp_free_ptr(vrt);
+tcg_temp_free_ptr(vra);
+tcg_temp_free_ptr(vrb);
+tcg_temp_free_ptr(vrc);
+
+return true;
+}
+
+TRANS_FLAGS(ALTIVEC, VMSUMUHS, do_va_env_helper, gen_helper_VMSUMUHS)
 
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index 5b85322c06..15b3e06410 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -224,7 +224,6 @@ GEN_VXFORM_UIMM(vctsxs, 5, 15),
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)   \
 GEN_HANDLER(name0##_##name1, 0x04, opc2, 0xFF, 0x, PPC_ALTIVEC)
 GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16),
-GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19),
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20),
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23),
 
-- 
2.25.1




[PATCH v2 04/12] target/ppc: use TCG_CALL_NO_RWG in VSX helpers without env

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Helpers of VSX instructions that do not take cpu_env as an argument
cannot access globals, so they can be declared with TCG_CALL_NO_RWG.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 11e41af020..ba70d2133b 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -533,10 +533,10 @@ DEF_HELPER_FLAGS_5(XXPERMX, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, tl)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
 DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
 DEF_HELPER_FLAGS_5(XXEVAL, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
-DEF_HELPER_5(XXBLENDVB, void, vsr, vsr, vsr, vsr, i32)
-DEF_HELPER_5(XXBLENDVH, void, vsr, vsr, vsr, vsr, i32)
-DEF_HELPER_5(XXBLENDVW, void, vsr, vsr, vsr, vsr, i32)
-DEF_HELPER_5(XXBLENDVD, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_FLAGS_5(XXBLENDVB, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_FLAGS_5(XXBLENDVH, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_FLAGS_5(XXBLENDVW, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
+DEF_HELPER_FLAGS_5(XXBLENDVD, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
 
 DEF_HELPER_2(efscfsi, i32, env, i32)
 DEF_HELPER_2(efscfui, i32, env, i32)
-- 
2.25.1




[PATCH v2 08/12] target/ppc: declare xxextractuw and xxinsertw helpers with call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move xxextractuw and xxinsertw to decodetree, declare both helpers with
TCG_CALL_NO_RWG, and drop the unused env argument.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h |  4 +-
 target/ppc/insn32.decode|  9 -
 target/ppc/int_helper.c |  6 +--
 target/ppc/translate/vsx-impl.c.inc | 63 +
 target/ppc/translate/vsx-ops.c.inc  |  2 -
 5 files changed, 39 insertions(+), 45 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index f96d7f2fcf..69e1d3e327 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -527,9 +527,9 @@ DEF_HELPER_FLAGS_2(XXGENPCVDM_be_exp, TCG_CALL_NO_RWG, void, vsr, avr)
 DEF_HELPER_FLAGS_2(XXGENPCVDM_be_comp, TCG_CALL_NO_RWG, void, vsr, avr)
 DEF_HELPER_FLAGS_2(XXGENPCVDM_le_exp, TCG_CALL_NO_RWG, void, vsr, avr)
 DEF_HELPER_FLAGS_2(XXGENPCVDM_le_comp, TCG_CALL_NO_RWG, void, vsr, avr)
-DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
+DEF_HELPER_FLAGS_3(XXEXTRACTUW, TCG_CALL_NO_RWG, void, vsr, vsr, i32)
 DEF_HELPER_FLAGS_5(XXPERMX, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, tl)
-DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
+DEF_HELPER_FLAGS_3(XXINSERTW, TCG_CALL_NO_RWG, void, vsr, vsr, i32)
 DEF_HELPER_FLAGS_2(XVXSIGSP, TCG_CALL_NO_RWG, void, vsr, vsr)
 DEF_HELPER_FLAGS_5(XXEVAL, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_FLAGS_5(XXBLENDVB, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 483349ff6d..435cf1320c 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -161,8 +161,10 @@
 xt xb
 @XX2.. . . . . ..xt=%xx_xt 
xb=%xx_xb
 
-_uim2   xt xb uim:uint8_t
-@XX2_uim2   .. . ... uim:2 . . ..   _uim2 xt=%xx_xt xb=%xx_xb
+_uim xt xb uim:uint8_t
+@XX2_uim2   .. . ... uim:2 . . ..   _uim xt=%xx_xt xb=%xx_xb
+
+@XX2_uim4   .. . . uim:4 . . .. _uim xt=%xx_xt xb=%xx_xb
 
 _bf_xb  bf xb
 @XX2_bf_xb  .. bf:3 .. . . . . ._bf_xb xb=%xx_xb
@@ -666,6 +668,9 @@ XXSPLTW 00 . ---.. . 010100100 . .  @XX2_uim2
 
 ## VSX Permute Instructions
 
+XXEXTRACTUW 00 . -  . 010100101 ..  @XX2_uim4
+XXINSERTW   00 . -  . 010110101 ..  @XX2_uim4
+
 XXPERM  00 . . . 00011010 ...   @XX3
 XXPERMR 00 . . . 00111010 ...   @XX3
 XXPERMDI00 . . . 0 .. 01010 ... @XX3_dm
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 8c1674510b..9a361ad241 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -1647,8 +1647,7 @@ VSTRI(VSTRIHL, H, 8, true)
 VSTRI(VSTRIHR, H, 8, false)
 #undef VSTRI
 
-void helper_xxextractuw(CPUPPCState *env, ppc_vsr_t *xt,
-ppc_vsr_t *xb, uint32_t index)
+void helper_XXEXTRACTUW(ppc_vsr_t *xt, ppc_vsr_t *xb, uint32_t index)
 {
 ppc_vsr_t t = { };
 size_t es = sizeof(uint32_t);
@@ -1663,8 +1662,7 @@ void helper_xxextractuw(CPUPPCState *env, ppc_vsr_t *xt,
 *xt = t;
 }
 
-void helper_xxinsertw(CPUPPCState *env, ppc_vsr_t *xt,
-  ppc_vsr_t *xb, uint32_t index)
+void helper_XXINSERTW(ppc_vsr_t *xt, ppc_vsr_t *xb, uint32_t index)
 {
 ppc_vsr_t t = *xt;
 size_t es = sizeof(uint32_t);
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 70cc97b0db..f980fb6f58 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1589,7 +1589,7 @@ static bool trans_XXSEL(DisasContext *ctx, arg_XX4 *a)
 return true;
 }
 
-static bool trans_XXSPLTW(DisasContext *ctx, arg_XX2_uim2 *a)
+static bool trans_XXSPLTW(DisasContext *ctx, arg_XX2_uim *a)
 {
 int tofs, bofs;
 
@@ -1799,42 +1799,35 @@ static void gen_xxsldwi(DisasContext *ctx)
 tcg_temp_free_i64(xtl);
 }
 
-#define VSX_EXTRACT_INSERT(name)\
-static void gen_##name(DisasContext *ctx)   \
-{   \
-TCGv_ptr xt, xb;\
-TCGv_i32 t0;\
-TCGv_i64 t1;\
-uint8_t uimm = UIMM4(ctx->opcode);  \
-\
-if (unlikely(!ctx->vsx_enabled)) {  \
-gen_exception(ctx, POWERPC_EXCP_VSXU);  \
-return; \
-}   \
-xt = gen_vsr_ptr(xT(ctx->opcode));  \
-xb = 

[PATCH v2 10/12] target/ppc: declare vmsum[um]bm helpers with call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move vmsumubm and vmsummbm to decodetree, declare both helpers with
TCG_CALL_NO_RWG, and drop the unused env argument.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 4 ++--
 target/ppc/insn32.decode| 3 +++
 target/ppc/int_helper.c | 6 ++
 target/ppc/translate/vmx-impl.c.inc | 5 -
 target/ppc/translate/vmx-ops.c.inc  | 2 --
 5 files changed, 11 insertions(+), 9 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 69e1d3e327..efbbd34feb 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -223,8 +223,8 @@ DEF_HELPER_FLAGS_2(vupkhsw, TCG_CALL_NO_RWG, void, avr, avr)
 DEF_HELPER_FLAGS_2(vupklsb, TCG_CALL_NO_RWG, void, avr, avr)
 DEF_HELPER_FLAGS_2(vupklsh, TCG_CALL_NO_RWG, void, avr, avr)
 DEF_HELPER_FLAGS_2(vupklsw, TCG_CALL_NO_RWG, void, avr, avr)
-DEF_HELPER_5(vmsumubm, void, env, avr, avr, avr, avr)
-DEF_HELPER_5(vmsummbm, void, env, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_4(VMSUMUBM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
+DEF_HELPER_FLAGS_4(VMSUMMBM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_4(VPERM, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 DEF_HELPER_FLAGS_4(VPERMR, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 DEF_HELPER_4(vpkshss, void, env, avr, avr, avr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 435cf1320c..fdb8d76456 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -599,6 +599,9 @@ VMULLD  000100 . . . 00111001001@VX
 
 ## Vector Multiply-Sum Instructions
 
+VMSUMUBM000100 . . . . 100100   @VA
+VMSUMMBM000100 . . . . 100101   @VA
+
 VMSUMCUD000100 . . . . 010111   @VA
 VMSUMUDM000100 . . . . 100011   @VA
 
diff --git a/target/ppc/int_helper.c b/target/ppc/int_helper.c
index 9a361ad241..85a7442103 100644
--- a/target/ppc/int_helper.c
+++ b/target/ppc/int_helper.c
@@ -875,8 +875,7 @@ VMRG(w, u32, VsrW)
 #undef VMRG_DO
 #undef VMRG
 
-void helper_vmsummbm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
- ppc_avr_t *b, ppc_avr_t *c)
+void helper_VMSUMMBM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 int32_t prod[16];
 int i;
@@ -928,8 +927,7 @@ void helper_vmsumshs(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
 }
 }
 
-void helper_vmsumubm(CPUPPCState *env, ppc_avr_t *r, ppc_avr_t *a,
- ppc_avr_t *b, ppc_avr_t *c)
+void helper_VMSUMUBM(ppc_avr_t *r, ppc_avr_t *a, ppc_avr_t *b, ppc_avr_t *c)
 {
 uint16_t prod[16];
 int i;
diff --git a/target/ppc/translate/vmx-impl.c.inc b/target/ppc/translate/vmx-impl.c.inc
index e66301c007..4cbd724641 100644
--- a/target/ppc/translate/vmx-impl.c.inc
+++ b/target/ppc/translate/vmx-impl.c.inc
@@ -2587,9 +2587,12 @@ static bool trans_VSEL(DisasContext *ctx, arg_VA *a)
 return true;
 }
 
-GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18)
 GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19)
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20)
+
+TRANS_FLAGS(ALTIVEC, VMSUMUBM, do_va_helper, gen_helper_VMSUMUBM)
+TRANS_FLAGS(ALTIVEC, VMSUMMBM, do_va_helper, gen_helper_VMSUMMBM)
+
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23)
 
 GEN_VXFORM_NOA(vclzb, 1, 28)
diff --git a/target/ppc/translate/vmx-ops.c.inc b/target/ppc/translate/vmx-ops.c.inc
index d960648d52..5b85322c06 100644
--- a/target/ppc/translate/vmx-ops.c.inc
+++ b/target/ppc/translate/vmx-ops.c.inc
@@ -221,11 +221,9 @@ GEN_VXFORM_UIMM(vcfsx, 5, 13),
 GEN_VXFORM_UIMM(vctuxs, 5, 14),
 GEN_VXFORM_UIMM(vctsxs, 5, 15),
 
-
 #define GEN_VAFORM_PAIRED(name0, name1, opc2)   \
 GEN_HANDLER(name0##_##name1, 0x04, opc2, 0xFF, 0x, PPC_ALTIVEC)
 GEN_VAFORM_PAIRED(vmhaddshs, vmhraddshs, 16),
-GEN_VAFORM_PAIRED(vmsumubm, vmsummbm, 18),
 GEN_VAFORM_PAIRED(vmsumuhm, vmsumuhs, 19),
 GEN_VAFORM_PAIRED(vmsumshm, vmsumshs, 20),
 GEN_VAFORM_PAIRED(vmaddfp, vnmsubfp, 23),
-- 
2.25.1




[PATCH v2 03/12] target/ppc: use TCG_CALL_NO_RWG in BCD helpers

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Helpers of BCD instructions only access the VSRs supplied by the
TCGv_ptr arguments; no globals are accessed.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 30 +++---
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index a5d066ff2d..11e41af020 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -327,21 +327,21 @@ DEF_HELPER_FLAGS_3(vshasigmaw, TCG_CALL_NO_RWG, void, avr, avr, i32)
 DEF_HELPER_FLAGS_3(vshasigmad, TCG_CALL_NO_RWG, void, avr, avr, i32)
 DEF_HELPER_FLAGS_4(vpermxor, TCG_CALL_NO_RWG, void, avr, avr, avr, avr)
 
-DEF_HELPER_4(bcdadd, i32, avr, avr, avr, i32)
-DEF_HELPER_4(bcdsub, i32, avr, avr, avr, i32)
-DEF_HELPER_3(bcdcfn, i32, avr, avr, i32)
-DEF_HELPER_3(bcdctn, i32, avr, avr, i32)
-DEF_HELPER_3(bcdcfz, i32, avr, avr, i32)
-DEF_HELPER_3(bcdctz, i32, avr, avr, i32)
-DEF_HELPER_3(bcdcfsq, i32, avr, avr, i32)
-DEF_HELPER_3(bcdctsq, i32, avr, avr, i32)
-DEF_HELPER_4(bcdcpsgn, i32, avr, avr, avr, i32)
-DEF_HELPER_3(bcdsetsgn, i32, avr, avr, i32)
-DEF_HELPER_4(bcds, i32, avr, avr, avr, i32)
-DEF_HELPER_4(bcdus, i32, avr, avr, avr, i32)
-DEF_HELPER_4(bcdsr, i32, avr, avr, avr, i32)
-DEF_HELPER_4(bcdtrunc, i32, avr, avr, avr, i32)
-DEF_HELPER_4(bcdutrunc, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdadd, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdsub, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdcfn, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdctn, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdcfz, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdctz, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdcfsq, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdctsq, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdcpsgn, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_3(bcdsetsgn, TCG_CALL_NO_RWG, i32, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcds, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdus, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdsr, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdtrunc, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
+DEF_HELPER_FLAGS_4(bcdutrunc, TCG_CALL_NO_RWG, i32, avr, avr, avr, i32)
 
 DEF_HELPER_4(xsadddp, void, env, vsr, vsr, vsr)
 DEF_HELPER_5(xsaddqp, void, env, i32, vsr, vsr, vsr)
-- 
2.25.1




[PATCH v2 01/12] target/ppc: declare darn32/darn64 helpers with TCG_CALL_NO_RWG_SE

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index aa6773c4a5..718ab6bc7b 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -59,8 +59,8 @@ DEF_HELPER_FLAGS_2(cmpeqb, TCG_CALL_NO_RWG_SE, i32, tl, tl)
 DEF_HELPER_FLAGS_1(popcntw, TCG_CALL_NO_RWG_SE, tl, tl)
 DEF_HELPER_FLAGS_2(bpermd, TCG_CALL_NO_RWG_SE, i64, i64, i64)
 DEF_HELPER_3(srad, tl, env, tl, tl)
-DEF_HELPER_0(darn32, tl)
-DEF_HELPER_0(darn64, tl)
+DEF_HELPER_FLAGS_0(darn32, TCG_CALL_NO_RWG_SE, tl)
+DEF_HELPER_FLAGS_0(darn64, TCG_CALL_NO_RWG_SE, tl)
 #endif
 
 DEF_HELPER_FLAGS_1(cntlsw32, TCG_CALL_NO_RWG_SE, i32, i32)
-- 
2.25.1




[PATCH v2 06/12] target/ppc: implement xscvspdpn with helper_todouble

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move xscvspdpn to decodetree, drop helper_xscvspdpn and use
helper_todouble directly.

Signed-off-by: Matheus Ferst 
---
 target/ppc/fpu_helper.c |  5 -
 target/ppc/helper.h |  1 -
 target/ppc/insn32.decode|  1 +
 target/ppc/translate/vsx-impl.c.inc | 26 +-
 target/ppc/translate/vsx-ops.c.inc  |  1 -
 5 files changed, 26 insertions(+), 8 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index b4d6f6ed4c..9bde333006 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -2875,11 +2875,6 @@ uint64_t helper_xscvdpspn(CPUPPCState *env, uint64_t xb)
 return (result << 32) | result;
 }
 
-uint64_t helper_xscvspdpn(CPUPPCState *env, uint64_t xb)
-{
-return helper_todouble(xb >> 32);
-}
-
 /*
  * VSX_CVT_FP_TO_INT - VSX floating point to integer conversion
  *   op- instruction mnemonic
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 4a7cbdf922..5cee55176b 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -395,7 +395,6 @@ DEF_HELPER_3(XSCVSQQP, void, env, vsr, vsr)
 DEF_HELPER_3(xscvhpdp, void, env, vsr, vsr)
 DEF_HELPER_4(xscvsdqp, void, env, i32, vsr, vsr)
 DEF_HELPER_3(xscvspdp, void, env, vsr, vsr)
-DEF_HELPER_2(xscvspdpn, i64, env, i64)
 DEF_HELPER_3(xscvdpsxds, void, env, vsr, vsr)
 DEF_HELPER_3(xscvdpsxws, void, env, vsr, vsr)
 DEF_HELPER_3(xscvdpuxds, void, env, vsr, vsr)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index 1d0b55bde3..d4c2615b1a 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -708,6 +708,7 @@ XSCVUQQP11 . 00011 . 1101000100 -   @X_tb
 XSCVSQQP11 . 01011 . 1101000100 -   @X_tb
 XVCVBF16SPN 00 . 1 . 111011011 ..   @XX2
 XVCVSPBF16  00 . 10001 . 111011011 ..   @XX2
+XSCVSPDPN   00 . - . 101001011 ..   @XX2
 
 ## VSX Vector Test Least-Significant Bit by Byte Instruction
 
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index 3692740736..cc0601a14e 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -1045,7 +1045,31 @@ GEN_VSX_HELPER_R2(xscvqpuwz, 0x04, 0x1A, 0x01, PPC2_ISA300)
 GEN_VSX_HELPER_X2(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300)
 GEN_VSX_HELPER_R2(xscvsdqp, 0x04, 0x1A, 0x0A, PPC2_ISA300)
 GEN_VSX_HELPER_X2(xscvspdp, 0x12, 0x14, 0, PPC2_VSX)
-GEN_VSX_HELPER_XT_XB_ENV(xscvspdpn, 0x16, 0x14, 0, PPC2_VSX207)
+
+bool trans_XSCVSPDPN(DisasContext *ctx, arg_XX2 *a)
+{
+TCGv_i64 t;
+TCGv_i32 b;
+
+REQUIRE_INSNS_FLAGS2(ctx, VSX207);
+REQUIRE_VSX(ctx);
+
+t = tcg_temp_new_i64();
+b = tcg_temp_new_i32();
+
+tcg_gen_ld_i32(b, cpu_env, offsetof(CPUPPCState, vsr[a->xb].VsrW(0)));
+
+gen_helper_todouble(t, b);
+
+set_cpu_vsr(a->xt, t, true);
+set_cpu_vsr(a->xt, tcg_constant_i64(0), false);
+
+tcg_temp_free_i64(t);
+tcg_temp_free_i32(b);
+
+return true;
+}
+
 GEN_VSX_HELPER_X2(xscvdpsxds, 0x10, 0x15, 0, PPC2_VSX)
 GEN_VSX_HELPER_X2(xscvdpsxws, 0x10, 0x05, 0, PPC2_VSX)
 GEN_VSX_HELPER_X2(xscvdpuxds, 0x10, 0x14, 0, PPC2_VSX)
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index b8fd116728..52d7ab30cd 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -200,7 +200,6 @@ GEN_XX2FORM(xscvdpspn, 0x16, 0x10, PPC2_VSX207),
 GEN_XX2FORM_EO(xscvhpdp, 0x16, 0x15, 0x10, PPC2_ISA300),
 GEN_VSX_XFORM_300_EO(xscvsdqp, 0x04, 0x1A, 0x0A, 0x0001),
 GEN_XX2FORM(xscvspdp, 0x12, 0x14, PPC2_VSX),
-GEN_XX2FORM(xscvspdpn, 0x16, 0x14, PPC2_VSX207),
 GEN_XX2FORM(xscvdpsxds, 0x10, 0x15, PPC2_VSX),
 GEN_XX2FORM(xscvdpsxws, 0x10, 0x05, PPC2_VSX),
 GEN_XX2FORM(xscvdpuxds, 0x10, 0x14, PPC2_VSX),
-- 
2.25.1




[PATCH v2 07/12] target/ppc: declare xvxsigsp helper with call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Move xvxsigsp to decodetree, declare helper_xvxsigsp with
TCG_CALL_NO_RWG, and drop the unused env argument.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/fpu_helper.c |  2 +-
 target/ppc/helper.h |  2 +-
 target/ppc/insn32.decode|  4 
 target/ppc/translate/vsx-impl.c.inc | 18 +-
 target/ppc/translate/vsx-ops.c.inc  |  1 -
 5 files changed, 23 insertions(+), 4 deletions(-)

diff --git a/target/ppc/fpu_helper.c b/target/ppc/fpu_helper.c
index 9bde333006..8826e10074 100644
--- a/target/ppc/fpu_helper.c
+++ b/target/ppc/fpu_helper.c
@@ -3193,7 +3193,7 @@ uint64_t helper_xsrsp(CPUPPCState *env, uint64_t xb)
 return xt;
 }
 
-void helper_xvxsigsp(CPUPPCState *env, ppc_vsr_t *xt, ppc_vsr_t *xb)
+void helper_XVXSIGSP(ppc_vsr_t *xt, ppc_vsr_t *xb)
 {
 ppc_vsr_t t = { };
 uint32_t exp, i, fraction;
diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 5cee55176b..f96d7f2fcf 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -530,7 +530,7 @@ DEF_HELPER_FLAGS_2(XXGENPCVDM_le_comp, TCG_CALL_NO_RWG, void, vsr, avr)
 DEF_HELPER_4(xxextractuw, void, env, vsr, vsr, i32)
 DEF_HELPER_FLAGS_5(XXPERMX, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, tl)
 DEF_HELPER_4(xxinsertw, void, env, vsr, vsr, i32)
-DEF_HELPER_3(xvxsigsp, void, env, vsr, vsr)
+DEF_HELPER_FLAGS_2(XVXSIGSP, TCG_CALL_NO_RWG, void, vsr, vsr)
 DEF_HELPER_FLAGS_5(XXEVAL, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_FLAGS_5(XXBLENDVB, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
 DEF_HELPER_FLAGS_5(XXBLENDVH, TCG_CALL_NO_RWG, void, vsr, vsr, vsr, vsr, i32)
diff --git a/target/ppc/insn32.decode b/target/ppc/insn32.decode
index d4c2615b1a..483349ff6d 100644
--- a/target/ppc/insn32.decode
+++ b/target/ppc/insn32.decode
@@ -710,6 +710,10 @@ XVCVBF16SPN 00 . 1 . 111011011 ..   @XX2
 XVCVSPBF16  00 . 10001 . 111011011 ..   @XX2
 XSCVSPDPN   00 . - . 101001011 ..   @XX2
 
+## VSX Binary Floating-Point Math Support Instructions
+
+XVXSIGSP00 . 01001 . 111011011 ..   @XX2
+
 ## VSX Vector Test Least-Significant Bit by Byte Instruction
 
 XVTLSBB 00 ... -- 00010 . 111011011 . - @XX2_bf_xb
diff --git a/target/ppc/translate/vsx-impl.c.inc b/target/ppc/translate/vsx-impl.c.inc
index cc0601a14e..70cc97b0db 100644
--- a/target/ppc/translate/vsx-impl.c.inc
+++ b/target/ppc/translate/vsx-impl.c.inc
@@ -2155,7 +2155,23 @@ static void gen_xvxexpdp(DisasContext *ctx)
 tcg_temp_free_i64(xbl);
 }
 
-GEN_VSX_HELPER_X2(xvxsigsp, 0x00, 0x04, 0, PPC2_ISA300)
+static bool trans_XVXSIGSP(DisasContext *ctx, arg_XX2 *a)
+{
+TCGv_ptr t, b;
+
+REQUIRE_INSNS_FLAGS2(ctx, ISA300);
+REQUIRE_VSX(ctx);
+
+t = gen_vsr_ptr(a->xt);
+b = gen_vsr_ptr(a->xb);
+
+gen_helper_XVXSIGSP(t, b);
+
+tcg_temp_free_ptr(t);
+tcg_temp_free_ptr(b);
+
+return true;
+}
 
 static void gen_xvxsigdp(DisasContext *ctx)
 {
diff --git a/target/ppc/translate/vsx-ops.c.inc b/target/ppc/translate/vsx-ops.c.inc
index 52d7ab30cd..4524c5b02a 100644
--- a/target/ppc/translate/vsx-ops.c.inc
+++ b/target/ppc/translate/vsx-ops.c.inc
@@ -156,7 +156,6 @@ GEN_XX3FORM(xviexpdp, 0x00, 0x1F, PPC2_ISA300),
 GEN_XX2FORM_EO(xvxexpdp, 0x16, 0x1D, 0x00, PPC2_ISA300),
 GEN_XX2FORM_EO(xvxsigdp, 0x16, 0x1D, 0x01, PPC2_ISA300),
 GEN_XX2FORM_EO(xvxexpsp, 0x16, 0x1D, 0x08, PPC2_ISA300),
-GEN_XX2FORM_EO(xvxsigsp, 0x16, 0x1D, 0x09, PPC2_ISA300),
 
 /* DCMX  =  bit[25] << 6 | bit[29] << 5 | bit[11:15] */
 #define GEN_XX2FORM_DCMX(name, opc2, opc3, fl2) \
-- 
2.25.1




[PATCH v2 02/12] target/ppc: use TCG_CALL_NO_RWG in vector helpers without env

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

Helpers of vector instructions without cpu_env as an argument cannot
access globals.

Reviewed-by: Richard Henderson 
Signed-off-by: Matheus Ferst 
---
 target/ppc/helper.h | 162 ++--
 1 file changed, 81 insertions(+), 81 deletions(-)

diff --git a/target/ppc/helper.h b/target/ppc/helper.h
index 718ab6bc7b..a5d066ff2d 100644
--- a/target/ppc/helper.h
+++ b/target/ppc/helper.h
@@ -133,15 +133,15 @@ DEF_HELPER_FLAGS_1(ftsqrt, TCG_CALL_NO_RWG_SE, i32, i64)
 #define dh_ctype_vsr ppc_vsr_t *
 #define dh_typecode_vsr dh_typecode_ptr
 
-DEF_HELPER_3(vavgub, void, avr, avr, avr)
-DEF_HELPER_3(vavguh, void, avr, avr, avr)
-DEF_HELPER_3(vavguw, void, avr, avr, avr)
-DEF_HELPER_3(vabsdub, void, avr, avr, avr)
-DEF_HELPER_3(vabsduh, void, avr, avr, avr)
-DEF_HELPER_3(vabsduw, void, avr, avr, avr)
-DEF_HELPER_3(vavgsb, void, avr, avr, avr)
-DEF_HELPER_3(vavgsh, void, avr, avr, avr)
-DEF_HELPER_3(vavgsw, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavgub, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavguh, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavguw, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vabsdub, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vabsduh, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vabsduw, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavgsb, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavgsh, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vavgsw, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_4(vcmpeqfp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp, void, env, avr, avr, avr)
@@ -153,12 +153,12 @@ DEF_HELPER_4(vcmpeqfp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgefp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpgtfp_dot, void, env, avr, avr, avr)
 DEF_HELPER_4(vcmpbfp_dot, void, env, avr, avr, avr)
-DEF_HELPER_3(vmrglb, void, avr, avr, avr)
-DEF_HELPER_3(vmrglh, void, avr, avr, avr)
-DEF_HELPER_3(vmrglw, void, avr, avr, avr)
-DEF_HELPER_3(vmrghb, void, avr, avr, avr)
-DEF_HELPER_3(vmrghh, void, avr, avr, avr)
-DEF_HELPER_3(vmrghw, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrglb, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrglh, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrglw, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrghb, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrghh, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vmrghw, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULESB, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULESH, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULESW, TCG_CALL_NO_RWG, void, avr, avr, avr)
@@ -171,15 +171,15 @@ DEF_HELPER_FLAGS_3(VMULOSW, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUB, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUH, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_3(VMULOUW, TCG_CALL_NO_RWG, void, avr, avr, avr)
-DEF_HELPER_3(vslo, void, avr, avr, avr)
-DEF_HELPER_3(vsro, void, avr, avr, avr)
-DEF_HELPER_3(vsrv, void, avr, avr, avr)
-DEF_HELPER_3(vslv, void, avr, avr, avr)
-DEF_HELPER_3(vaddcuw, void, avr, avr, avr)
-DEF_HELPER_2(vprtybw, void, avr, avr)
-DEF_HELPER_2(vprtybd, void, avr, avr)
-DEF_HELPER_2(vprtybq, void, avr, avr)
-DEF_HELPER_3(vsubcuw, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vslo, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vsro, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vsrv, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vslv, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_3(vaddcuw, TCG_CALL_NO_RWG, void, avr, avr, avr)
+DEF_HELPER_FLAGS_2(vprtybw, TCG_CALL_NO_RWG, void, avr, avr)
+DEF_HELPER_FLAGS_2(vprtybd, TCG_CALL_NO_RWG, void, avr, avr)
+DEF_HELPER_FLAGS_2(vprtybq, TCG_CALL_NO_RWG, void, avr, avr)
+DEF_HELPER_FLAGS_3(vsubcuw, TCG_CALL_NO_RWG, void, avr, avr, avr)
 DEF_HELPER_FLAGS_5(vaddsbs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vaddshs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vaddsws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
@@ -192,19 +192,19 @@ DEF_HELPER_FLAGS_5(vadduws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vsububs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vsubuhs, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
 DEF_HELPER_FLAGS_5(vsubuws, TCG_CALL_NO_RWG, void, avr, avr, avr, avr, i32)
-DEF_HELPER_3(vadduqm, void, avr, avr, avr)
-DEF_HELPER_4(vaddecuq, void, avr, avr, avr, avr)
-DEF_HELPER_4(vaddeuqm, void, avr, avr, avr, avr)
-DEF_HELPER_3(vaddcuq, void, avr, avr, avr)
-DEF_HELPER_3(vsubuqm, void, avr, avr, avr)
-DEF_HELPER_4(vsubecuq, void, avr, avr, avr, avr)
-DEF_HELPER_4(vsubeuqm, void, avr, avr, avr, avr)
-DEF_HELPER_3(vsubcuq, void, avr, avr, avr)

[PATCH v2 00/12] Change helper declarations to use call flags

2022-05-19 Thread matheus . ferst
From: Matheus Ferst 

In our "PowerISA Vector/VSX instruction batch" patch series, rth noted[1]
that helpers that only access vector registers should be declared with
DEF_HELPER_FLAGS_* and TCG_CALL_NO_RWG. We fixed helpers in that series,
but there are older helpers that could use the same optimization.

Guided by the presence of env as the first argument, in patches 1~4 we
change helpers that do not have access to the cpu_env pointer to modify
any globals. Then, we change other helpers that receive cpu_env but do
not use it and apply the same fix, taking the opportunity to move them
to decodetree.

[1] https://lists.gnu.org/archive/html/qemu-ppc/2022-02/msg00568.html

Patches without review: 06.

v2:
 - darn32/darn64 helpers declared with TCG_CALL_NO_RWG_SE;
 - xscvspdpn implemented with helper_todouble, dropped helper_XSCVSPDPN;
 - vmsumuhs and vmsumshs helpers declared with TCG_CALL_NO_RWG;
 - Link to v1: https://lists.gnu.org/archive/html/qemu-ppc/2022-05/msg00287.html

Matheus Ferst (12):
  target/ppc: declare darn32/darn64 helpers with TCG_CALL_NO_RWG_SE
  target/ppc: use TCG_CALL_NO_RWG in vector helpers without env
  target/ppc: use TCG_CALL_NO_RWG in BCD helpers
  target/ppc: use TCG_CALL_NO_RWG in VSX helpers without env
  target/ppc: Use TCG_CALL_NO_RWG_SE in fsel helper
  target/ppc: implement xscvspdpn with helper_todouble
  target/ppc: declare xvxsigsp helper with call flags
  target/ppc: declare xxextractuw and xxinsertw helpers with call flags
  target/ppc: introduce do_va_helper
  target/ppc: declare vmsum[um]bm helpers with call flags
  target/ppc: declare vmsumuh[ms] helper with call flags
  target/ppc: declare vmsumsh[ms] helper with call flags

 target/ppc/fpu_helper.c |  22 +--
 target/ppc/helper.h | 225 ++--
 target/ppc/insn32.decode|  28 +++-
 target/ppc/int_helper.c |  22 +--
 target/ppc/translate/fp-impl.c.inc  |  30 +++-
 target/ppc/translate/fp-ops.c.inc   |   1 -
 target/ppc/translate/vmx-impl.c.inc |  62 
 target/ppc/translate/vmx-ops.c.inc  |   4 -
 target/ppc/translate/vsx-impl.c.inc | 107 -
 target/ppc/translate/vsx-ops.c.inc  |   4 -
 10 files changed, 284 insertions(+), 221 deletions(-)

-- 
2.25.1




[RFC PATCH v8 21/21] vdpa: Add x-cvq-svq

2022-05-19 Thread Eugenio Pérez
This isolates the shadow CVQ in its own virtqueue group.

Signed-off-by: Eugenio Pérez 
---
 qapi/net.json|   8 ++-
 net/vhost-vdpa.c | 134 ---
 2 files changed, 133 insertions(+), 9 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index cd7a1b32fe..f5b047ae15 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -447,9 +447,12 @@
 #
 # @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
 # (default: false)
+# @x-cvq-svq: Start device with (experimental) shadow virtqueue in its own
+# virtqueue group. (Since 7.1)
+# (default: false)
 #
 # Features:
-# @unstable: Member @x-svq is experimental.
+# @unstable: Members @x-svq and @x-cvq-svq are experimental.
 #
 # Since: 5.1
 ##
@@ -457,7 +460,8 @@
   'data': {
 '*vhostdev': 'str',
 '*queues':   'int',
-'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
+'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] },
+'*x-cvq-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index ef8c82f92e..ad006a2bf3 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -70,6 +70,30 @@ const int vdpa_feature_bits[] = {
 VHOST_INVALID_FEATURE_BIT
 };
 
+/** Supported device specific feature bits with SVQ */
+static const uint64_t vdpa_svq_device_features =
+BIT_ULL(VIRTIO_NET_F_CSUM) |
+BIT_ULL(VIRTIO_NET_F_GUEST_CSUM) |
+BIT_ULL(VIRTIO_NET_F_CTRL_GUEST_OFFLOADS) |
+BIT_ULL(VIRTIO_NET_F_MTU) |
+BIT_ULL(VIRTIO_NET_F_MAC) |
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO4) |
+BIT_ULL(VIRTIO_NET_F_GUEST_TSO6) |
+BIT_ULL(VIRTIO_NET_F_GUEST_ECN) |
+BIT_ULL(VIRTIO_NET_F_GUEST_UFO) |
+BIT_ULL(VIRTIO_NET_F_HOST_TSO4) |
+BIT_ULL(VIRTIO_NET_F_HOST_TSO6) |
+BIT_ULL(VIRTIO_NET_F_HOST_ECN) |
+BIT_ULL(VIRTIO_NET_F_HOST_UFO) |
+BIT_ULL(VIRTIO_NET_F_MRG_RXBUF) |
+BIT_ULL(VIRTIO_NET_F_STATUS) |
+BIT_ULL(VIRTIO_NET_F_CTRL_VQ) |
+BIT_ULL(VIRTIO_NET_F_MQ) |
+BIT_ULL(VIRTIO_F_ANY_LAYOUT) |
+BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR) |
+BIT_ULL(VIRTIO_NET_F_RSC_EXT) |
+BIT_ULL(VIRTIO_NET_F_STANDBY);
+
 VHostNetState *vhost_vdpa_get_vhost_net(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
@@ -352,6 +376,17 @@ static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
 return ret;
 }
 
+static int vhost_vdpa_get_backend_features(int fd, uint64_t *features,
+   Error **errp)
+{
+int ret = ioctl(fd, VHOST_GET_BACKEND_FEATURES, features);
+if (ret) {
+error_setg_errno(errp, errno,
+"Fail to query backend features from vhost-vDPA device");
+}
+return ret;
+}
+
 static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
   int *has_cvq, Error **errp)
 {
@@ -385,16 +420,56 @@ static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
 return 1;
 }
 
+/**
+ * Check that the vdpa device supports using ASID 1 for the CVQ group
+ *
+ * @vdpa_device_fd: Vdpa device fd
+ * @queue_pairs: Queue pairs
+ * @errp: Error
+ */
+static int vhost_vdpa_check_cvq_svq(int vdpa_device_fd, int queue_pairs,
+Error **errp)
+{
+uint64_t backend_features;
+unsigned num_as;
+int r;
+
+r = vhost_vdpa_get_backend_features(vdpa_device_fd, &backend_features,
+errp);
+if (unlikely(r)) {
+return -1;
+}
+
+if (unlikely(!(backend_features & VHOST_BACKEND_F_IOTLB_ASID))) {
+error_setg(errp, "Device without IOTLB_ASID feature");
+return -1;
+}
+
+r = ioctl(vdpa_device_fd, VHOST_VDPA_GET_AS_NUM, &num_as);
+if (unlikely(r)) {
+error_setg_errno(errp, errno,
+ "Cannot retrieve number of supported ASs");
+return -1;
+}
+if (unlikely(num_as < 2)) {
+error_setg(errp, "Insufficient number of ASs (%u, min: 2)", num_as);
+return -1;
+}
+
+return 0;
+}
+
 int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp)
 {
 const NetdevVhostVDPAOptions *opts;
+struct vhost_vdpa_iova_range iova_range;
 uint64_t features;
 int vdpa_device_fd;
 g_autofree NetClientState **ncs = NULL;
 NetClientState *nc;
 int queue_pairs, r, i, has_cvq = 0;
 g_autoptr(VhostIOVATree) iova_tree = NULL;
+ERRP_GUARD();
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
opts = &netdev->u.vhost_vdpa;
@@ -419,14 +494,35 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 qemu_close(vdpa_device_fd);
 return queue_pairs;
 }
-if (opts->x_svq) {
-struct vhost_vdpa_iova_range iova_range;
+if (opts->x_cvq_svq || opts->x_svq) {
+vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+
+

[RFC PATCH v8 18/21] vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs

2022-05-19 Thread Eugenio Pérez
Knowing the device features is needed for CVQ SVQ, so SVQ knows whether
it can handle all commands or not. Extract this logic from
vhost_vdpa_get_max_queue_pairs so we can reuse it.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 30 --
 1 file changed, 20 insertions(+), 10 deletions(-)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index a66f73ff63..8960b8db74 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -325,20 +325,24 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 return nc;
 }
 
-static int vhost_vdpa_get_max_queue_pairs(int fd, int *has_cvq, Error **errp)
+static int vhost_vdpa_get_features(int fd, uint64_t *features, Error **errp)
+{
+int ret = ioctl(fd, VHOST_GET_FEATURES, features);
+if (ret) {
+error_setg_errno(errp, errno,
+ "Fail to query features from vhost-vDPA device");
+}
+return ret;
+}
+
+static int vhost_vdpa_get_max_queue_pairs(int fd, uint64_t features,
+  int *has_cvq, Error **errp)
 {
 unsigned long config_size = offsetof(struct vhost_vdpa_config, buf);
 g_autofree struct vhost_vdpa_config *config = NULL;
 __virtio16 *max_queue_pairs;
-uint64_t features;
 int ret;
 
-ret = ioctl(fd, VHOST_GET_FEATURES, &features);
-if (ret) {
-error_setg(errp, "Fail to query features from vhost-vDPA device");
-return ret;
-}
-
 if (features & (1 << VIRTIO_NET_F_CTRL_VQ)) {
 *has_cvq = 1;
 } else {
@@ -368,10 +372,11 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 NetClientState *peer, Error **errp)
 {
 const NetdevVhostVDPAOptions *opts;
+uint64_t features;
 int vdpa_device_fd;
 g_autofree NetClientState **ncs = NULL;
 NetClientState *nc;
-int queue_pairs, i, has_cvq = 0;
+int queue_pairs, r, i, has_cvq = 0;
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
opts = &netdev->u.vhost_vdpa;
@@ -385,7 +390,12 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 return -errno;
 }
 
-queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd,
+r = vhost_vdpa_get_features(vdpa_device_fd, &features, errp);
+if (r) {
+return r;
+}
+
+queue_pairs = vhost_vdpa_get_max_queue_pairs(vdpa_device_fd, features,
 &has_cvq, errp);
 if (queue_pairs < 0) {
 qemu_close(vdpa_device_fd);
-- 
2.27.0




[RFC PATCH v8 20/21] vdpa: Add x-svq to NetdevVhostVDPAOptions

2022-05-19 Thread Eugenio Pérez
This finally offers the possibility to enable SVQ from the command line.
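For context, a command line using the new option might look like the following sketch (the device path and ids are illustrative only, not taken from the patch):

```shell
# Hypothetical example: a vhost-vdpa netdev with the experimental
# shadow virtqueue enabled (x-svq is the option added below).
qemu-system-x86_64 \
    -netdev vhost-vdpa,id=vdpa0,vhostdev=/dev/vhost-vdpa-0,x-svq=on \
    -device virtio-net-pci,netdev=vdpa0
```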

Signed-off-by: Eugenio Pérez 
---
 qapi/net.json|  9 -
 net/vhost-vdpa.c | 38 +++---
 2 files changed, 43 insertions(+), 4 deletions(-)

diff --git a/qapi/net.json b/qapi/net.json
index d6f7cfd4d6..cd7a1b32fe 100644
--- a/qapi/net.json
+++ b/qapi/net.json
@@ -445,12 +445,19 @@
 # @queues: number of queues to be created for multiqueue vhost-vdpa
 #  (default: 1)
 #
+# @x-svq: Start device with (experimental) shadow virtqueue. (Since 7.1)
+# (default: false)
+#
+# Features:
+# @unstable: Member @x-svq is experimental.
+#
 # Since: 5.1
 ##
 { 'struct': 'NetdevVhostVDPAOptions',
   'data': {
 '*vhostdev': 'str',
-'*queues':   'int' } }
+'*queues':   'int',
+'*x-svq':{'type': 'bool', 'features' : [ 'unstable'] } } }
 
 ##
 # @NetdevVmnetHostOptions:
diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 8960b8db74..ef8c82f92e 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -129,6 +129,7 @@ static void vhost_vdpa_cleanup(NetClientState *nc)
 {
 VhostVDPAState *s = DO_UPCAST(VhostVDPAState, nc, nc);
 
+g_clear_pointer(&s->vhost_vdpa.iova_tree, vhost_iova_tree_release);
 if (s->vhost_net) {
 vhost_net_cleanup(s->vhost_net);
 g_free(s->vhost_net);
@@ -188,6 +189,14 @@ static NetClientInfo net_vhost_vdpa_info = {
 .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_get_iova_range(int fd,
+ struct vhost_vdpa_iova_range *iova_range)
+{
+int ret = ioctl(fd, VHOST_VDPA_GET_IOVA_RANGE, iova_range);
+
+return ret < 0 ? -errno : 0;
+}
+
 static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
 struct vhost_dev *dev)
 {
@@ -295,7 +304,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
int vdpa_device_fd,
int queue_pair_index,
int nvqs,
-   bool is_datapath)
+   bool is_datapath,
+   bool svq,
+   VhostIOVATree *iova_tree)
 {
 NetClientState *nc = NULL;
 VhostVDPAState *s;
@@ -313,12 +324,18 @@ static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+s->vhost_vdpa.shadow_vqs_enabled = svq;
+s->vhost_vdpa.iova_tree = iova_tree ? vhost_iova_tree_acquire(iova_tree) :
+  NULL;
 if (!is_datapath) {
s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
 s->vhost_vdpa.svq_copy_descs = true;
 }
ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
+if (iova_tree) {
+vhost_iova_tree_release(iova_tree);
+}
 qemu_del_net_client(nc);
 return NULL;
 }
@@ -377,6 +394,7 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 g_autofree NetClientState **ncs = NULL;
 NetClientState *nc;
 int queue_pairs, r, i, has_cvq = 0;
+g_autoptr(VhostIOVATree) iova_tree = NULL;
 
 assert(netdev->type == NET_CLIENT_DRIVER_VHOST_VDPA);
opts = &netdev->u.vhost_vdpa;
@@ -401,19 +419,31 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char *name,
 qemu_close(vdpa_device_fd);
 return queue_pairs;
 }
+if (opts->x_svq) {
+struct vhost_vdpa_iova_range iova_range;
+
+if (has_cvq) {
+error_setg(errp, "vdpa svq does not work with cvq");
+goto err_svq;
+}
+vhost_vdpa_get_iova_range(vdpa_device_fd, &iova_range);
+iova_tree = vhost_iova_tree_new(iova_range.first, iova_range.last);
+}
 
 ncs = g_malloc0(sizeof(*ncs) * queue_pairs);
 
 for (i = 0; i < queue_pairs; i++) {
 ncs[i] = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
- vdpa_device_fd, i, 2, true);
+ vdpa_device_fd, i, 2, true, opts->x_svq,
+ iova_tree);
 if (!ncs[i])
 goto err;
 }
 
 if (has_cvq) {
 nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
- vdpa_device_fd, i, 1, false);
+ vdpa_device_fd, i, 1, false, opts->x_svq,
+ iova_tree);
 if (!nc)
 goto err;
 }
@@ -426,6 +456,8 @@ err:
 qemu_del_net_client(ncs[i]);
 }
 }
+
+err_svq:
 qemu_close(vdpa_device_fd);
 
 return -1;
-- 
2.27.0




[RFC PATCH v8 12/21] vdpa: delay set_vring_ready after DRIVER_OK

2022-05-19 Thread Eugenio Pérez
To restore the device at the destination of a live migration we send the
commands through the control virtqueue. For a device to read the CVQ it
must have received the DRIVER_OK status bit.

However this opens a window where the device could start receiving
packets in rx queue 0 before it receives the RSS configuration. To avoid
that, we will not send vring_enable until all configuration is used by
the device.

As a first step, reverse the DRIVER_OK and SET_VRING_ENABLE steps.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-vdpa.c | 20 +++-
 1 file changed, 15 insertions(+), 5 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 31b3d4d013..13e5e2a061 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -748,13 +748,18 @@ static int vhost_vdpa_get_vq_index(struct vhost_dev *dev, int idx)
 return idx;
 }
 
+/**
+ * Set all vrings of the device ready
+ *
+ * @dev: Vhost device
+ */
 static int vhost_vdpa_set_vring_ready(struct vhost_dev *dev)
 {
 int i;
 trace_vhost_vdpa_set_vring_ready(dev);
-for (i = 0; i < dev->nvqs; ++i) {
+for (i = 0; i < dev->vq_index_end; ++i) {
 struct vhost_vring_state state = {
-.index = dev->vq_index + i,
+.index = i,
 .num = 1,
 };
vhost_vdpa_call(dev, VHOST_VDPA_SET_VRING_ENABLE, &state);
@@ -1117,7 +1122,6 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 if (unlikely(!ok)) {
 return -1;
 }
-vhost_vdpa_set_vring_ready(dev);
 } else {
 ok = vhost_vdpa_svqs_stop(dev);
 if (unlikely(!ok)) {
@@ -1131,16 +1135,22 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 }
 
 if (started) {
+int r;
memory_listener_register(&dev->listener, &address_space_memory);
-return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+r = vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
+if (unlikely(r)) {
+return r;
+}
+vhost_vdpa_set_vring_ready(dev);
 } else {
 vhost_vdpa_reset_device(dev);
 vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
VIRTIO_CONFIG_S_DRIVER);
memory_listener_unregister(&dev->listener);
 
-return 0;
 }
+
+return 0;
 }
 
 static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
-- 
2.27.0




[RFC PATCH v8 19/21] vhost: Add reference counting to vhost_iova_tree

2022-05-19 Thread Eugenio Pérez
Now that different vqs can have different ASIDs, it's easier to track
them using reference counters.

The glib version QEMU requires still does not provide them, so we've
copied g_rc_box; the implementation can be converted to glib's one when
the minimum version is raised.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-iova-tree.h |  5 +++--
 hw/virtio/vhost-iova-tree.c | 21 +++--
 2 files changed, 22 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 1ffcdc5b57..bacd17d99c 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -16,8 +16,9 @@
 typedef struct VhostIOVATree VhostIOVATree;
 
 VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, uint64_t iova_last);
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
-G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree);
+void vhost_iova_tree_release(VhostIOVATree *iova_tree);
+G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_release);
 
 const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
const DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 1a59894385..208476b3db 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -28,6 +28,9 @@ struct VhostIOVATree {
 
 /* IOVA address to qemu memory maps. */
 IOVATree *iova_taddr_map;
+
+/* Reference count */
+size_t refcnt;
 };
 
 /**
@@ -44,14 +47,28 @@ VhostIOVATree *vhost_iova_tree_new(hwaddr iova_first, hwaddr iova_last)
 tree->iova_last = iova_last;
 
 tree->iova_taddr_map = iova_tree_new();
+tree->refcnt = 1;
 return tree;
 }
 
 /**
- * Delete an iova tree
+ * Increases the reference count of the iova tree
+ */
+VhostIOVATree *vhost_iova_tree_acquire(VhostIOVATree *iova_tree)
+{
+++iova_tree->refcnt;
+return iova_tree;
+}
+
+/**
+ * Decrease reference counter of iova tree, freeing if it reaches 0
  */
-void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
+void vhost_iova_tree_release(VhostIOVATree *iova_tree)
 {
+if (--iova_tree->refcnt) {
+return;
+}
+
 iova_tree_destroy(iova_tree->iova_taddr_map);
 g_free(iova_tree);
 }
-- 
2.27.0




[RFC PATCH v8 15/21] vhost: add vhost_svq_poll

2022-05-19 Thread Eugenio Pérez
It allows the Shadow Control VirtQueue to wait for the device to use the
commands that restore the net device state after a live migration.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  1 +
 hw/virtio/vhost-shadow-virtqueue.c | 57 +++---
 2 files changed, 54 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h b/hw/virtio/vhost-shadow-virtqueue.h
index 3c55fe2641..20ca59e9a7 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -124,6 +124,7 @@ bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
 int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
  size_t out_num, size_t in_num);
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c b/hw/virtio/vhost-shadow-virtqueue.c
index c535c99905..831ffb71e5 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -10,6 +10,8 @@
 #include "qemu/osdep.h"
 #include "hw/virtio/vhost-shadow-virtqueue.h"
 
+#include 
+
 #include "qemu/error-report.h"
 #include "qapi/error.h"
 #include "qemu/main-loop.h"
@@ -583,10 +585,11 @@ static bool vhost_svq_unmap_elem(VhostShadowVirtqueue *svq, SVQElement *svq_elem
 return true;
 }
 
-static void vhost_svq_flush(VhostShadowVirtqueue *svq,
-bool check_for_avail_queue)
+static size_t vhost_svq_flush(VhostShadowVirtqueue *svq,
+  bool check_for_avail_queue)
 {
 VirtQueue *vq = svq->vq;
+size_t ret = 0;
 
 /* Forward as many used buffers as possible. */
 do {
@@ -604,7 +607,7 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 if (svq->copy_descs) {
 bool ok = vhost_svq_unmap_elem(svq, svq_elem, len, true);
 if (unlikely(!ok)) {
-return;
+return ret;
 }
 }
 
@@ -621,10 +624,12 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 i, svq->vring.num);
 virtqueue_fill(vq, elem, len, i);
 virtqueue_flush(vq, i);
-return;
+return ret + 1;
 }
 virtqueue_fill(vq, elem, len, i++);
 }
+
+ret++;
 }
 
 if (i > 0) {
@@ -640,6 +645,50 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 vhost_handle_guest_kick(svq);
 }
 } while (!vhost_svq_enable_notification(svq));
+
+return ret;
+}
+
+/**
+ * Poll the SVQ for device used buffers.
+ *
+ * This function races with the main event loop SVQ polling, so extra
+ * synchronization is needed.
+ *
+ * Return the number of descriptors read from the device.
+ */
+ssize_t vhost_svq_poll(VhostShadowVirtqueue *svq)
+{
+int fd = event_notifier_get_fd(&svq->hdev_call);
+GPollFD poll_fd = {
+.fd = fd,
+.events = G_IO_IN,
+};
+assert(fd >= 0);
+int r = g_poll(&poll_fd, 1, -1);
+
+if (unlikely(r < 0)) {
+error_report("Cannot poll device call fd "G_POLLFD_FORMAT": (%d) %s",
+ poll_fd.fd, errno, g_strerror(errno));
+return -errno;
+}
+
+if (r == 0) {
+return 0;
+}
+
+if (unlikely(poll_fd.revents & ~(G_IO_IN))) {
+error_report(
+"Error polling device call fd "G_POLLFD_FORMAT": revents=%d",
+poll_fd.fd, poll_fd.revents);
+return -1;
+}
+
+/*
+ * Max return value of vhost_svq_flush is (uint16_t)-1, so it's safe to
+ * convert to ssize_t.
+ */
+return vhost_svq_flush(svq, false);
 }
 
 /**
-- 
2.27.0




[RFC PATCH v8 11/21] vhost: Update kernel headers

2022-05-19 Thread Eugenio Pérez
Signed-off-by: Eugenio Pérez 
---
 include/standard-headers/linux/vhost_types.h | 11 -
 linux-headers/linux/vhost.h  | 25 
 2 files changed, 30 insertions(+), 6 deletions(-)

diff --git a/include/standard-headers/linux/vhost_types.h b/include/standard-headers/linux/vhost_types.h
index 0bd2684a2a..ce78551b0f 100644
--- a/include/standard-headers/linux/vhost_types.h
+++ b/include/standard-headers/linux/vhost_types.h
@@ -87,7 +87,7 @@ struct vhost_msg {
 
 struct vhost_msg_v2 {
uint32_t type;
-   uint32_t reserved;
+   uint32_t asid;
union {
struct vhost_iotlb_msg iotlb;
uint8_t padding[64];
@@ -153,4 +153,13 @@ struct vhost_vdpa_iova_range {
 /* vhost-net should add virtio_net_hdr for RX, and strip for TX packets. */
 #define VHOST_NET_F_VIRTIO_NET_HDR 27
 
+/* Use message type V2 */
+#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
+/* IOTLB can accept batching hints */
+#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
+/* IOTLB can accept address space identifier through V2 type of IOTLB
+ * message
+ */
+#define VHOST_BACKEND_F_IOTLB_ASID  0x3
+
 #endif
diff --git a/linux-headers/linux/vhost.h b/linux-headers/linux/vhost.h
index 5d99e7c242..d42eb46efd 100644
--- a/linux-headers/linux/vhost.h
+++ b/linux-headers/linux/vhost.h
@@ -89,11 +89,6 @@
 
 /* Set or get vhost backend capability */
 
-/* Use message type V2 */
-#define VHOST_BACKEND_F_IOTLB_MSG_V2 0x1
-/* IOTLB can accept batching hints */
-#define VHOST_BACKEND_F_IOTLB_BATCH  0x2
-
 #define VHOST_SET_BACKEND_FEATURES _IOW(VHOST_VIRTIO, 0x25, __u64)
 #define VHOST_GET_BACKEND_FEATURES _IOR(VHOST_VIRTIO, 0x26, __u64)
 
@@ -154,6 +149,26 @@
 /* Get the config size */
 #define VHOST_VDPA_GET_CONFIG_SIZE _IOR(VHOST_VIRTIO, 0x79, __u32)
 
+/* Get the number of virtqueue groups. */
+#define VHOST_VDPA_GET_GROUP_NUM   _IOR(VHOST_VIRTIO, 0x7A, unsigned int)
+
+/* Get the number of address spaces. */
+#define VHOST_VDPA_GET_AS_NUM  _IOR(VHOST_VIRTIO, 0x7B, unsigned int)
+
+/* Get the group for a virtqueue: read index, write group in num,
+ * The virtqueue index is stored in the index field of
+ * vhost_vring_state. The group for this specific virtqueue is
+ * returned via num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_GET_VRING_GROUP _IOWR(VHOST_VIRTIO, 0x7C,   \
+ struct vhost_vring_state)
+/* Set the ASID for a virtqueue group. The group index is stored in
+ * the index field of vhost_vring_state, the ASID associated with this
+ * group is stored at num field of vhost_vring_state.
+ */
+#define VHOST_VDPA_SET_GROUP_ASID  _IOW(VHOST_VIRTIO, 0x7D, \
+struct vhost_vring_state)
+
 /* Get the count of all virtqueues */
 #define VHOST_VDPA_GET_VQS_COUNT   _IOR(VHOST_VIRTIO, 0x80, __u32)
 
-- 
2.27.0




[RFC PATCH v8 16/21] vdpa: Add vhost_vdpa_start_control_svq

2022-05-19 Thread Eugenio Pérez
As a first step, we only enable CVQ before the other queues. Future
patches add state restore.

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 61 
 1 file changed, 61 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index 174fec5e77..a66f73ff63 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -188,6 +188,66 @@ static NetClientInfo net_vhost_vdpa_info = {
 .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static int vhost_vdpa_start_control_svq(VhostShadowVirtqueue *svq,
+struct vhost_dev *dev)
+{
+struct vhost_vring_state state = {
+.index = virtio_get_queue_index(svq->vq),
+.num = 1,
+};
+struct vhost_vdpa *v = dev->opaque;
+VirtIONet *n = VIRTIO_NET(dev->vdev);
+uint64_t features = dev->vdev->host_features;
+int r;
+size_t num = 0;
+
+assert(dev->vhost_ops->backend_type == VHOST_BACKEND_TYPE_VDPA);
+
+r = ioctl(v->device_fd, VHOST_VDPA_SET_VRING_ENABLE, &state);
+if (r < 0) {
+return -errno;
+}
+
+if (features & BIT_ULL(VIRTIO_NET_F_CTRL_MAC_ADDR)) {
+const struct virtio_net_ctrl_hdr ctrl = {
+.class = VIRTIO_NET_CTRL_MAC,
+.cmd = VIRTIO_NET_CTRL_MAC_ADDR_SET,
+};
+uint8_t mac[6];
+virtio_net_ctrl_ack ack;
+const struct iovec data[] = {
+{
+.iov_base = (void *)&ctrl,
+.iov_len = sizeof(ctrl),
+},{
+.iov_base = mac,
+.iov_len = sizeof(mac),
+},{
+.iov_base = &ack,
+.iov_len = sizeof(ack),
+}
+};
+
+memcpy(mac, n->mac, sizeof(mac));
+r = vhost_svq_inject(svq, data, 2, 1);
+if (unlikely(r)) {
+return r;
+}
+num++;
+}
+
+while (num) {
+/*
+ * We can call vhost_svq_poll here because BQL protects calls to run.
+ */
+size_t used = vhost_svq_poll(svq);
+assert(used <= num);
+num -= used;
+}
+
+return 0;
+}
+
 static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
const VirtQueueElement *elem)
 {
@@ -226,6 +286,7 @@ static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
 
 static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
 .used_elem_handler = vhost_vdpa_net_handle_ctrl,
+.start = vhost_vdpa_start_control_svq,
 };
 
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
-- 
2.27.0




[RFC PATCH v8 17/21] vdpa: Add asid attribute to vdpa device

2022-05-19 Thread Eugenio Pérez
We can configure an ASID per group, but we still use ASID 0 for every vdpa
device. Multiple ASID support for CVQ will be introduced in the next
patches.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost.h |  1 +
 hw/net/vhost_net.c|  1 +
 hw/virtio/vhost-vdpa.c| 71 +++
 hw/virtio/trace-events|  9 ++---
 4 files changed, 72 insertions(+), 10 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index cebec1d817..eadaf055f0 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -84,6 +84,7 @@ struct vhost_dev {
 int vq_index_end;
 /* if non-zero, minimum required value for max_queues */
 int num_queues;
+uint32_t address_space_id;
 /* Must be a vq group different than any other vhost dev */
 bool independent_vq_group;
 uint64_t features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index 1c2386c01c..4d79d622f7 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -348,6 +348,7 @@ int vhost_net_start(VirtIODevice *dev, NetClientState *ncs,
 }
 
 net = get_vhost_net(peer);
+net->dev.address_space_id = !!cvq_idx;
 net->dev.independent_vq_group = !!cvq_idx;
 vhost_net_set_vq_index(net, i * 2, index_end);
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 52dd8baa8d..0208e36589 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -79,14 +79,18 @@ static int vhost_vdpa_dma_map(struct vhost_vdpa *v, hwaddr 
iova, hwaddr size,
 int ret = 0;
 
 msg.type = v->msg_type;
+if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+msg.asid = v->dev->address_space_id;
+}
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.uaddr = (uint64_t)(uintptr_t)vaddr;
 msg.iotlb.perm = readonly ? VHOST_ACCESS_RO : VHOST_ACCESS_RW;
 msg.iotlb.type = VHOST_IOTLB_UPDATE;
 
-   trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.iotlb.iova, msg.iotlb.size,
-msg.iotlb.uaddr, msg.iotlb.perm, msg.iotlb.type);
+trace_vhost_vdpa_dma_map(v, fd, msg.type, msg.asid, msg.iotlb.iova,
+ msg.iotlb.size, msg.iotlb.uaddr, msg.iotlb.perm,
+ msg.iotlb.type);
 
if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
 error_report("failed to write, fd=%d, errno=%d (%s)",
@@ -104,12 +108,15 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, 
hwaddr iova,
 int fd = v->device_fd;
 int ret = 0;
 
+if (v->dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+msg.asid = v->dev->address_space_id;
+}
 msg.type = v->msg_type;
 msg.iotlb.iova = iova;
 msg.iotlb.size = size;
 msg.iotlb.type = VHOST_IOTLB_INVALIDATE;
 
-trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.iotlb.iova,
+trace_vhost_vdpa_dma_unmap(v, fd, msg.type, msg.asid, msg.iotlb.iova,
msg.iotlb.size, msg.iotlb.type);
 
if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
@@ -123,13 +130,19 @@ static int vhost_vdpa_dma_unmap(struct vhost_vdpa *v, 
hwaddr iova,
 
 static void vhost_vdpa_listener_begin_batch(struct vhost_vdpa *v)
 {
+struct vhost_dev *dev = v->dev;
 int fd = v->device_fd;
 struct vhost_msg_v2 msg = {
 .type = v->msg_type,
 .iotlb.type = VHOST_IOTLB_BATCH_BEGIN,
 };
 
-trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.iotlb.type);
+if (dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+msg.asid = v->dev->address_space_id;
+}
+
+trace_vhost_vdpa_listener_begin_batch(v, fd, msg.type, msg.asid,
+  msg.iotlb.type);
if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
 error_report("failed to write, fd=%d, errno=%d (%s)",
  fd, errno, strerror(errno));
@@ -161,10 +174,14 @@ static void vhost_vdpa_listener_commit(MemoryListener 
*listener)
 return;
 }
 
+if (dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) {
+msg.asid = v->dev->address_space_id;
+}
+
 msg.type = v->msg_type;
 msg.iotlb.type = VHOST_IOTLB_BATCH_END;
-
-trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.iotlb.type);
+trace_vhost_vdpa_listener_commit(v, fd, msg.type, msg.asid,
+ msg.iotlb.type);
if (write(fd, &msg, sizeof(msg)) != sizeof(msg)) {
 error_report("failed to write, fd=%d, errno=%d (%s)",
  fd, errno, strerror(errno));
@@ -1183,10 +1200,48 @@ call_err:
 return false;
 }
 
+static int vhost_vdpa_set_vq_group_address_space_id(struct vhost_dev *dev,
+struct vhost_vring_state *asid)
+{
+trace_vhost_vdpa_set_vq_group_address_space_id(dev, asid->index, 
asid->num);
+return vhost_vdpa_call(dev, VHOST_VDPA_SET_GROUP_ASID, asid);
+}
+
+static int 

[RFC PATCH v8 08/21] vhost: Add SVQElement

2022-05-19 Thread Eugenio Pérez
This allows SVQ to add metadata to the different queue elements.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  8 --
 hw/virtio/vhost-shadow-virtqueue.c | 46 --
 2 files changed, 31 insertions(+), 23 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 50f45153c0..e06ac52158 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,10 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef struct SVQElement {
+VirtQueueElement elem;
+} SVQElement;
+
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
  const VirtQueueElement *elem);
 
@@ -64,10 +68,10 @@ typedef struct VhostShadowVirtqueue {
 VhostIOVATree *iova_tree;
 
 /* Map for use the guest's descriptors */
-VirtQueueElement **ring_id_maps;
+SVQElement **ring_id_maps;
 
 /* Next VirtQueue element that guest made available */
-VirtQueueElement *next_guest_avail_elem;
+SVQElement *next_guest_avail_elem;
 
 /*
  * Backup next field for each descriptor so we can recover securely, not
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 2d5d27d29c..044005ba89 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -171,9 +171,10 @@ static bool 
vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
 return true;
 }
 
-static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
-VirtQueueElement *elem, unsigned *head)
+static bool vhost_svq_add_split(VhostShadowVirtqueue *svq, SVQElement 
*svq_elem,
+unsigned *head)
 {
+const VirtQueueElement *elem = &svq_elem->elem;
 unsigned avail_idx;
 vring_avail_t *avail = svq->vring.avail;
 bool ok;
@@ -222,7 +223,7 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
  * takes ownership of the element: In case of failure, it is free and the SVQ
  * is considered broken.
  */
-static bool vhost_svq_add(VhostShadowVirtqueue *svq, VirtQueueElement *elem)
+static bool vhost_svq_add(VhostShadowVirtqueue *svq, SVQElement *elem)
 {
 unsigned qemu_head;
bool ok = vhost_svq_add_split(svq, elem, &qemu_head);
@@ -272,19 +273,21 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 virtio_queue_set_notification(svq->vq, false);
 
 while (true) {
+SVQElement *svq_elem;
 VirtQueueElement *elem;
 bool ok;
 
 if (svq->next_guest_avail_elem) {
-elem = g_steal_pointer(&svq->next_guest_avail_elem);
+svq_elem = g_steal_pointer(&svq->next_guest_avail_elem);
 } else {
-elem = virtqueue_pop(svq->vq, sizeof(*elem));
+svq_elem = virtqueue_pop(svq->vq, sizeof(*svq_elem));
 }
 
-if (!elem) {
+if (!svq_elem) {
 break;
 }
 
+elem = &svq_elem->elem;
 if (elem->out_num + elem->in_num > vhost_svq_available_slots(svq)) 
{
 /*
  * This condition is possible since a contiguous buffer in GPA
@@ -297,11 +300,11 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
  * queue the current guest descriptor and ignore further kicks
  * until some elements are used.
  */
-svq->next_guest_avail_elem = elem;
+svq->next_guest_avail_elem = svq_elem;
 return;
 }
 
-ok = vhost_svq_add(svq, elem);
+ok = vhost_svq_add(svq, svq_elem);
 if (unlikely(!ok)) {
 /* VQ is broken, just return and ignore any other kicks */
 return;
@@ -368,8 +371,7 @@ static uint16_t vhost_svq_last_desc_of_chain(const 
VhostShadowVirtqueue *svq,
 return i;
 }
 
-static VirtQueueElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq,
-   uint32_t *len)
+static SVQElement *vhost_svq_get_buf(VhostShadowVirtqueue *svq, uint32_t *len)
 {
 const vring_used_t *used = svq->vring.used;
 vring_used_elem_t used_elem;
@@ -399,8 +401,8 @@ static VirtQueueElement 
*vhost_svq_get_buf(VhostShadowVirtqueue *svq,
 return NULL;
 }
 
-num = svq->ring_id_maps[used_elem.id]->in_num +
-  svq->ring_id_maps[used_elem.id]->out_num;
+num = svq->ring_id_maps[used_elem.id]->elem.in_num +
+  svq->ring_id_maps[used_elem.id]->elem.out_num;
 last_used_chain = vhost_svq_last_desc_of_chain(svq, num, used_elem.id);
 svq->desc_next[last_used_chain] = svq->free_head;
 svq->free_head = used_elem.id;
@@ -421,11 +423,13 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 vhost_svq_disable_notification(svq);
 

[RFC PATCH v8 13/21] vhost: Add ShadowVirtQueueStart operation

2022-05-19 Thread Eugenio Pérez
It allows running commands at SVQ start.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  4 
 hw/virtio/vhost-vdpa.c | 14 ++
 2 files changed, 18 insertions(+)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 8fe0367944..3c55fe2641 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -28,10 +28,14 @@ typedef struct SVQElement {
 bool not_from_guest;
 } SVQElement;
 
+typedef struct VhostShadowVirtqueue VhostShadowVirtqueue;
+typedef int (*ShadowVirtQueueStart)(VhostShadowVirtqueue *svq,
+struct vhost_dev *dev);
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
  const VirtQueueElement *elem);
 
 typedef struct VhostShadowVirtqueueOps {
+ShadowVirtQueueStart start;
 VirtQueueElementCallback used_elem_handler;
 } VhostShadowVirtqueueOps;
 
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 13e5e2a061..eec6d544e9 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1141,6 +1141,20 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
 if (unlikely(r)) {
 return r;
 }
+
+if (v->shadow_vqs_enabled) {
+for (unsigned i = 0; i < v->shadow_vqs->len; ++i) {
+VhostShadowVirtqueue *svq = g_ptr_array_index(v->shadow_vqs,
+  i);
+if (svq->ops && svq->ops->start) {
+r = svq->ops->start(svq, dev);
+if (unlikely(r)) {
+return r;
+}
+}
+}
+}
+
 vhost_vdpa_set_vring_ready(dev);
 } else {
 vhost_vdpa_reset_device(dev);
-- 
2.27.0




[RFC PATCH v8 14/21] vhost: Make possible to check for device exclusive vq group

2022-05-19 Thread Eugenio Pérez
CVQ needs to be in its own group, not shared with any data vq. Enable
checking for this here, before introducing address space id concepts.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/vhost.h |  2 +
 hw/net/vhost_net.c|  4 +-
 hw/virtio/vhost-vdpa.c| 79 ++-
 hw/virtio/trace-events|  1 +
 4 files changed, 84 insertions(+), 2 deletions(-)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index b291fe4e24..cebec1d817 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -84,6 +84,8 @@ struct vhost_dev {
 int vq_index_end;
 /* if non-zero, minimum required value for max_queues */
 int num_queues;
+/* Must be a vq group different than any other vhost dev */
+bool independent_vq_group;
 uint64_t features;
 uint64_t acked_features;
 uint64_t backend_features;
diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
index ccac5b7a64..1c2386c01c 100644
--- a/hw/net/vhost_net.c
+++ b/hw/net/vhost_net.c
@@ -339,14 +339,16 @@ int vhost_net_start(VirtIODevice *dev, NetClientState 
*ncs,
 }
 
 for (i = 0; i < nvhosts; i++) {
+bool cvq_idx = i >= data_queue_pairs;
 
-if (i < data_queue_pairs) {
+if (!cvq_idx) {
 peer = qemu_get_peer(ncs, i);
 } else { /* Control Virtqueue */
 peer = qemu_get_peer(ncs, n->max_queue_pairs);
 }
 
 net = get_vhost_net(peer);
+net->dev.independent_vq_group = !!cvq_idx;
 vhost_net_set_vq_index(net, i * 2, index_end);
 
 /* Suppress the masking guest notifiers on vhost user
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index eec6d544e9..52dd8baa8d 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -685,7 +685,8 @@ static int vhost_vdpa_set_backend_cap(struct vhost_dev *dev)
 {
 uint64_t features;
 uint64_t f = 0x1ULL << VHOST_BACKEND_F_IOTLB_MSG_V2 |
-0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH;
+0x1ULL << VHOST_BACKEND_F_IOTLB_BATCH |
+0x1ULL << VHOST_BACKEND_F_IOTLB_ASID;
 int r;
 
if (vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES, &features)) {
@@ -1110,6 +,78 @@ static bool vhost_vdpa_svqs_stop(struct vhost_dev *dev)
 return true;
 }
 
+static int vhost_vdpa_get_vring_group(struct vhost_dev *dev,
+  struct vhost_vring_state *state)
+{
+int ret = vhost_vdpa_call(dev, VHOST_VDPA_GET_VRING_GROUP, state);
+trace_vhost_vdpa_get_vring_group(dev, state->index, state->num);
+return ret;
+}
+
+static bool vhost_dev_is_independent_group(struct vhost_dev *dev)
+{
+struct vhost_vdpa *v = dev->opaque;
+struct vhost_vring_state this_vq_group = {
+.index = dev->vq_index,
+};
+int ret;
+
+if (!(dev->backend_cap & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID))) {
+return true;
+}
+
+if (!v->shadow_vqs_enabled) {
+return true;
+}
+
+ret = vhost_vdpa_get_vring_group(dev, &this_vq_group);
+if (unlikely(ret)) {
+goto call_err;
+}
+
+for (int i = 1; i < dev->nvqs; ++i) {
+struct vhost_vring_state vq_group = {
+.index = dev->vq_index + i,
+};
+
+ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+if (unlikely(ret)) {
+goto call_err;
+}
+if (unlikely(vq_group.num != this_vq_group.num)) {
+error_report("VQ %d group is different than VQ %d one",
+ this_vq_group.index, vq_group.index);
+return false;
+}
+}
+
+for (int i = 0; i < dev->vq_index_end; ++i) {
+struct vhost_vring_state vq_group = {
+.index = i,
+};
+
+if (dev->vq_index <= i && i < dev->vq_index + dev->nvqs) {
+continue;
+}
+
+ret = vhost_vdpa_get_vring_group(dev, &vq_group);
+if (unlikely(ret)) {
+goto call_err;
+}
+if (unlikely(vq_group.num == this_vq_group.num)) {
+error_report("VQ %d group is the same as VQ %d one",
+ this_vq_group.index, vq_group.index);
+return false;
+}
+}
+
+return true;
+
+call_err:
+error_report("Can't read vq group, errno=%d (%s)", ret, g_strerror(-ret));
+return false;
+}
+
 static int vhost_vdpa_dev_start(struct vhost_dev *dev, bool started)
 {
 struct vhost_vdpa *v = dev->opaque;
@@ -1118,6 +1191,10 @@ static int vhost_vdpa_dev_start(struct vhost_dev *dev, 
bool started)
 
 if (started) {
 vhost_vdpa_host_notifiers_init(dev);
+if (dev->independent_vq_group &&
+!vhost_dev_is_independent_group(dev)) {
+return -1;
+}
 ok = vhost_vdpa_svqs_start(dev);
 if (unlikely(!ok)) {
 return -1;
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index ab8e095b73..ffb8eb26e7 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -46,6 +46,7 @@ 

[RFC PATCH v8 10/21] vhost: Add vhost_svq_inject

2022-05-19 Thread Eugenio Pérez
This allows qemu to inject buffers into the device without the guest's
knowledge.

This will be used to inject net CVQ messages to restore state in the
destination.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  5 +++
 hw/virtio/vhost-shadow-virtqueue.c | 72 +-
 2 files changed, 65 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 79cb2d301f..8fe0367944 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -23,6 +23,9 @@ typedef struct SVQElement {
 
 /* Length of in buffer */
 size_t in_len;
+
+/* Buffer has been injected by QEMU, not by the guest */
+bool not_from_guest;
 } SVQElement;
 
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
@@ -115,6 +118,8 @@ typedef struct VhostShadowVirtqueue {
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
 
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+ size_t out_num, size_t in_num);
 void vhost_svq_set_svq_kick_fd(VhostShadowVirtqueue *svq, int svq_kick_fd);
 void vhost_svq_set_svq_call_fd(VhostShadowVirtqueue *svq, int call_fd);
 void vhost_svq_get_vring_addr(const VhostShadowVirtqueue *svq,
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 5a8feb1cbc..c535c99905 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -312,6 +312,43 @@ static void vhost_svq_kick(VhostShadowVirtqueue *svq)
event_notifier_set(&svq->hdev_kick);
 }
 
+/**
+ * Inject a chain of buffers to the device
+ *
+ * @svq: Shadow VirtQueue
+ * @iov: Descriptors buffer
+ * @out_num: Number of out elements
+ * @in_num: Number of in elements
+ */
+int vhost_svq_inject(VhostShadowVirtqueue *svq, const struct iovec *iov,
+ size_t out_num, size_t in_num)
+{
+SVQElement *svq_elem;
+uint16_t num_slots = (in_num ? 1 : 0) + (out_num ? 1 : 0);
+
+/*
+ * Injecting buffers into an SVQ that does not copy descriptors is not
+ * supported. All vhost_svq_inject calls are controlled by qemu, so we
+ * won't hit these assertions.
+ */
+assert(svq->copy_descs);
+assert(num_slots > 0);
+
+if (unlikely(svq->next_guest_avail_elem)) {
+error_report("Injecting in a full queue");
+return -ENOMEM;
+}
+
+svq_elem = virtqueue_alloc_element(sizeof(*svq_elem), out_num, in_num);
+iov_copy(svq_elem->elem.in_sg, in_num, iov + out_num, in_num, 0, SIZE_MAX);
+iov_copy(svq_elem->elem.out_sg, out_num, iov, out_num, 0, SIZE_MAX);
+svq_elem->not_from_guest = true;
+vhost_svq_add(svq, svq_elem);
+vhost_svq_kick(svq);
+
+return 0;
+}
+
 /**
  * Forward available buffers.
  *
@@ -350,6 +387,7 @@ static void vhost_handle_guest_kick(VhostShadowVirtqueue 
*svq)
 break;
 }
 
+svq_elem->not_from_guest = false;
 elem = _elem->elem;
 needed_slots = svq->copy_descs ? 1 : elem->out_num + elem->in_num;
 if (needed_slots > vhost_svq_available_slots(svq)) {
@@ -575,19 +613,24 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 svq->ops->used_elem_handler(svq->vdev, elem);
 }
 
-if (unlikely(i >= svq->vring.num)) {
-qemu_log_mask(LOG_GUEST_ERROR,
- "More than %u used buffers obtained in a %u size SVQ",
- i, svq->vring.num);
-virtqueue_fill(vq, elem, len, i);
-virtqueue_flush(vq, i);
-return;
+if (!svq_elem->not_from_guest) {
+if (unlikely(i >= svq->vring.num)) {
+qemu_log_mask(
+LOG_GUEST_ERROR,
+"More than %u used buffers obtained in a %u size SVQ",
+i, svq->vring.num);
+virtqueue_fill(vq, elem, len, i);
+virtqueue_flush(vq, i);
+return;
+}
+virtqueue_fill(vq, elem, len, i++);
 }
-virtqueue_fill(vq, elem, len, i++);
 }
 
-virtqueue_flush(vq, i);
-event_notifier_set(&svq->svq_call);
+if (i > 0) {
+virtqueue_flush(vq, i);
+event_notifier_set(&svq->svq_call);
+}
 
 if (check_for_avail_queue && svq->next_guest_avail_elem) {
 /*
@@ -755,7 +798,10 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 if (svq->copy_descs) {
 vhost_svq_unmap_elem(svq, svq_elem, 0, false);
 }
-virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
+
+if (!svq_elem->not_from_guest) {
+virtqueue_detach_element(svq->vq, &svq_elem->elem, 0);
+}
 }
 }
 
@@ -764,7 +810,9 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
 

[RFC PATCH v8 07/21] vhost: move descriptor translation to vhost_svq_vring_write_descs

2022-05-19 Thread Eugenio Pérez
It's done for both in and out descriptors, so it's better placed here.

Acked-by: Jason Wang 
Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.c | 38 +-
 1 file changed, 27 insertions(+), 11 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index a6a8e403ea..2d5d27d29c 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -122,17 +122,35 @@ static bool vhost_svq_translate_addr(const 
VhostShadowVirtqueue *svq,
 return true;
 }
 
-static void vhost_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
-const struct iovec *iovec, size_t num,
-bool more_descs, bool write)
+/**
+ * Write descriptors to SVQ vring
+ *
+ * @svq: The shadow virtqueue
+ * @sg: Cache for hwaddr
+ * @iovec: The iovec from the guest
+ * @num: iovec length
+ * @more_descs: True if more descriptors come in the chain
+ * @write: True if they are device-writable (in) descriptors
+ *
+ * Returns true on success, false otherwise (an error is reported).
+ */
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+const struct iovec *iovec, size_t num,
+bool more_descs, bool write)
 {
 uint16_t i = svq->free_head, last = svq->free_head;
 unsigned n;
 uint16_t flags = write ? cpu_to_le16(VRING_DESC_F_WRITE) : 0;
 vring_desc_t *descs = svq->vring.desc;
+bool ok;
 
 if (num == 0) {
-return;
+return true;
+}
+
+ok = vhost_svq_translate_addr(svq, sg, iovec, num);
+if (unlikely(!ok)) {
+return false;
 }
 
 for (n = 0; n < num; n++) {
@@ -150,6 +168,7 @@ static void vhost_vring_write_descs(VhostShadowVirtqueue 
*svq, hwaddr *sg,
 }
 
 svq->free_head = le16_to_cpu(svq->desc_next[last]);
+return true;
 }
 
 static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
@@ -169,21 +188,18 @@ static bool vhost_svq_add_split(VhostShadowVirtqueue *svq,
 return false;
 }
 
-ok = vhost_svq_translate_addr(svq, sgs, elem->out_sg, elem->out_num);
+ok = vhost_svq_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
+ elem->in_num > 0, false);
 if (unlikely(!ok)) {
 return false;
 }
-vhost_vring_write_descs(svq, sgs, elem->out_sg, elem->out_num,
-elem->in_num > 0, false);
-
 
-ok = vhost_svq_translate_addr(svq, sgs, elem->in_sg, elem->in_num);
+ok = vhost_svq_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, 
false,
+ true);
 if (unlikely(!ok)) {
 return false;
 }
 
-vhost_vring_write_descs(svq, sgs, elem->in_sg, elem->in_num, false, true);
-
 /*
  * Put the entry in the available array (but don't update avail->idx until
  * they do sync).
-- 
2.27.0




[RFC PATCH v8 02/21] vhost: Add custom used buffer callback

2022-05-19 Thread Eugenio Pérez
The callback allows SVQ users to inspect the VirtQueue requests and
responses. QEMU can use this to synchronize the virtio device model state,
allowing it to be migrated with minimal changes to the migration code.

In the case of networking, this will be used to inspect control
virtqueue messages.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h | 16 +++-
 include/hw/virtio/vhost-vdpa.h |  2 ++
 hw/virtio/vhost-shadow-virtqueue.c |  9 -
 hw/virtio/vhost-vdpa.c |  3 ++-
 4 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index c132c994e9..6593f07db3 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -15,6 +15,13 @@
 #include "standard-headers/linux/vhost_types.h"
 #include "hw/virtio/vhost-iova-tree.h"
 
+typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
+ const VirtQueueElement *elem);
+
+typedef struct VhostShadowVirtqueueOps {
+VirtQueueElementCallback used_elem_handler;
+} VhostShadowVirtqueueOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
 /* Shadow vring */
@@ -59,6 +66,12 @@ typedef struct VhostShadowVirtqueue {
  */
 uint16_t *desc_next;
 
+/* Optional callbacks */
+const VhostShadowVirtqueueOps *ops;
+
+/* Optional custom used virtqueue element handler */
+VirtQueueElementCallback used_elem_cb;
+
 /* Next head to expose to the device */
 uint16_t shadow_avail_idx;
 
@@ -85,7 +98,8 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, VirtIODevice 
*vdev,
  VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+const VhostShadowVirtqueueOps *ops);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index a29dbb3f53..f1ba46a860 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -17,6 +17,7 @@
 #include "hw/virtio/vhost-iova-tree.h"
 #include "hw/virtio/virtio.h"
 #include "standard-headers/linux/vhost_types.h"
+#include "hw/virtio/vhost-shadow-virtqueue.h"
 
 typedef struct VhostVDPAHostNotifier {
 MemoryRegion mr;
@@ -35,6 +36,7 @@ typedef struct vhost_vdpa {
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
 GPtrArray *shadow_vqs;
+const VhostShadowVirtqueueOps *shadow_vq_ops;
 struct vhost_dev *dev;
 VhostVDPAHostNotifier notifier[VIRTIO_QUEUE_MAX];
 } VhostVDPA;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 56c96ebd13..167db8be45 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -410,6 +410,10 @@ static void vhost_svq_flush(VhostShadowVirtqueue *svq,
 break;
 }
 
+if (svq->ops && svq->ops->used_elem_handler) {
+svq->ops->used_elem_handler(svq->vdev, elem);
+}
+
 if (unlikely(i >= svq->vring.num)) {
 qemu_log_mask(LOG_GUEST_ERROR,
  "More than %u used buffers obtained in a %u size SVQ",
@@ -607,12 +611,14 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  * shadow methods and file descriptors.
  *
  * @iova_tree: Tree to perform descriptors translations
+ * @ops: SVQ operations hooks
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree)
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
+const VhostShadowVirtqueueOps *ops)
 {
 g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
 int r;
@@ -634,6 +640,7 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree 
*iova_tree)
event_notifier_init_fd(&svq->svq_kick, VHOST_FILE_UNBIND);
event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
svq->iova_tree = iova_tree;
+svq->ops = ops;
return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 66f054a12c..7677b337e6 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -418,7 +418,8 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, 
struct vhost_vdpa *v,
 
 shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
 for (unsigned n = 0; n < hdev->nvqs; ++n) {
-g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree);
+g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
+

[RFC PATCH v8 05/21] vhost: Add vhost_iova_tree_find

2022-05-19 Thread Eugenio Pérez
Just a simple wrapper so we can find DMAMap entries based on the iova.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-iova-tree.h |  2 ++
 hw/virtio/vhost-iova-tree.c | 14 ++
 2 files changed, 16 insertions(+)

diff --git a/hw/virtio/vhost-iova-tree.h b/hw/virtio/vhost-iova-tree.h
index 6a4f24e0f9..1ffcdc5b57 100644
--- a/hw/virtio/vhost-iova-tree.h
+++ b/hw/virtio/vhost-iova-tree.h
@@ -19,6 +19,8 @@ VhostIOVATree *vhost_iova_tree_new(uint64_t iova_first, 
uint64_t iova_last);
 void vhost_iova_tree_delete(VhostIOVATree *iova_tree);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostIOVATree, vhost_iova_tree_delete);
 
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+   const DMAMap *map);
 const DMAMap *vhost_iova_tree_find_iova(const VhostIOVATree *iova_tree,
 const DMAMap *map);
 int vhost_iova_tree_map_alloc(VhostIOVATree *iova_tree, DMAMap *map);
diff --git a/hw/virtio/vhost-iova-tree.c b/hw/virtio/vhost-iova-tree.c
index 67bf6d57ab..1a59894385 100644
--- a/hw/virtio/vhost-iova-tree.c
+++ b/hw/virtio/vhost-iova-tree.c
@@ -56,6 +56,20 @@ void vhost_iova_tree_delete(VhostIOVATree *iova_tree)
 g_free(iova_tree);
 }
 
+/**
+ * Find a mapping in the tree that matches map
+ *
+ * @iova_tree  The iova tree
+ * @mapThe map
+ *
+ * Return a matching map that contains argument map or NULL
+ */
+const DMAMap *vhost_iova_tree_find(const VhostIOVATree *iova_tree,
+   const DMAMap *map)
+{
+return iova_tree_find(iova_tree->iova_taddr_map, map);
+}
+
 /**
  * Find the IOVA address stored from a memory address
  *
-- 
2.27.0




[RFC PATCH v8 06/21] vdpa: Add map/unmap operation callback to SVQ

2022-05-19 Thread Eugenio Pérez
Net Shadow Control VirtQueue will use them to map buffers outside of the
guest's address space.

These are needed for other features like indirect descriptors. They can
also be used to map SVQ vrings: that is currently done outside of
vhost-shadow-virtqueue.c, which is a duplication.

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h | 21 +++--
 hw/virtio/vhost-shadow-virtqueue.c |  8 +++-
 hw/virtio/vhost-vdpa.c | 20 +++-
 3 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index 6593f07db3..50f45153c0 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -22,6 +22,15 @@ typedef struct VhostShadowVirtqueueOps {
 VirtQueueElementCallback used_elem_handler;
 } VhostShadowVirtqueueOps;
 
+typedef int (*vhost_svq_map_op)(hwaddr iova, hwaddr size, void *vaddr,
+bool readonly, void *opaque);
+typedef int (*vhost_svq_unmap_op)(hwaddr iova, hwaddr size, void *opaque);
+
+typedef struct VhostShadowVirtqueueMapOps {
+vhost_svq_map_op map;
+vhost_svq_unmap_op unmap;
+} VhostShadowVirtqueueMapOps;
+
 /* Shadow virtqueue to relay notifications */
 typedef struct VhostShadowVirtqueue {
 /* Shadow vring */
@@ -69,6 +78,12 @@ typedef struct VhostShadowVirtqueue {
 /* Optional callbacks */
 const VhostShadowVirtqueueOps *ops;
 
+/* Device memory mapping callbacks */
+const VhostShadowVirtqueueMapOps *map_ops;
+
+/* Device memory mapping callbacks opaque */
+void *map_ops_opaque;
+
 /* Optional custom used virtqueue element handler */
 VirtQueueElementCallback used_elem_cb;
 
@@ -98,8 +113,10 @@ void vhost_svq_start(VhostShadowVirtqueue *svq, 
VirtIODevice *vdev,
  VirtQueue *vq);
 void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
-VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-const VhostShadowVirtqueueOps *ops);
+VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
+const VhostShadowVirtqueueOps *ops,
+const VhostShadowVirtqueueMapOps *map_ops,
+void *map_ops_opaque);
 
 void vhost_svq_free(gpointer vq);
 G_DEFINE_AUTOPTR_CLEANUP_FUNC(VhostShadowVirtqueue, vhost_svq_free);
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 167db8be45..a6a8e403ea 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -612,13 +612,17 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq)
  *
  * @iova_tree: Tree to perform descriptors translations
  * @ops: SVQ operations hooks
+ * @map_ops: SVQ mapping operation hooks
+ * @map_ops_opaque: Opaque data to pass to mapping operations
  *
  * Returns the new virtqueue or NULL.
  *
  * In case of error, reason is reported through error_report.
  */
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_tree,
-const VhostShadowVirtqueueOps *ops)
+const VhostShadowVirtqueueOps *ops,
+const VhostShadowVirtqueueMapOps *map_ops,
+void *map_ops_opaque)
 {
 g_autofree VhostShadowVirtqueue *svq = g_new0(VhostShadowVirtqueue, 1);
 int r;
@@ -641,6 +645,8 @@ VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree 
*iova_tree,
 event_notifier_set_handler(&svq->hdev_call, vhost_svq_handle_call);
 svq->iova_tree = iova_tree;
 svq->ops = ops;
+svq->map_ops = map_ops;
+svq->map_ops_opaque = map_ops_opaque;
 return g_steal_pointer(&svq);
 
 err_init_hdev_call:
diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 7677b337e6..e6ef944e23 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -392,6 +392,22 @@ static int vhost_vdpa_get_dev_features(struct vhost_dev 
*dev,
 return ret;
 }
 
+static int vhost_vdpa_svq_map(hwaddr iova, hwaddr size, void *vaddr,
+  bool readonly, void *opaque)
+{
+return vhost_vdpa_dma_map(opaque, iova, size, vaddr, readonly);
+}
+
+static int vhost_vdpa_svq_unmap(hwaddr iova, hwaddr size, void *opaque)
+{
+return vhost_vdpa_dma_unmap(opaque, iova, size);
+}
+
+static const VhostShadowVirtqueueMapOps vhost_vdpa_svq_map_ops = {
+.map = vhost_vdpa_svq_map,
+.unmap = vhost_vdpa_svq_unmap,
+};
+
 static int vhost_vdpa_init_svq(struct vhost_dev *hdev, struct vhost_vdpa *v,
Error **errp)
 {
@@ -419,7 +435,9 @@ static int vhost_vdpa_init_svq(struct vhost_dev *hdev, 
struct vhost_vdpa *v,
 shadow_vqs = g_ptr_array_new_full(hdev->nvqs, vhost_svq_free);
 for (unsigned n = 0; n < hdev->nvqs; ++n) {
 g_autoptr(VhostShadowVirtqueue) svq = vhost_svq_new(v->iova_tree,
- 

[RFC PATCH v8 09/21] vhost: Add svq copy desc mode

2022-05-19 Thread Eugenio Pérez
Enable SVQ to forward descriptors not by translating their addresses to
qemu's IOVA, but by copying them to a region outside of the guest.

The virtio-net control VQ will use this mode, so we don't need to map all
the guest's memory every time there is a change, but only the messages.
Conversely, CVQ will only have access to control messages.  This leads to
less messing with memory listeners.

We could also try to send only the required translation per message, but
this presents a problem when many control messages occupy the same
guest memory region.

Lastly, this allows us to inject messages from QEMU to the device in a
simple manner.  CVQ should be used rarely and with small messages, so all
the drawbacks should be acceptable.
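The copy-descriptor path described above boils down to flattening a guest descriptor chain into one page-aligned bounce buffer, zero-padded to the allocation size. Below is a minimal standalone sketch of that linearization step, modeled on the patch's vhost_svq_alloc_buffer; the helper name and the plain aligned_alloc are illustrative, not the patch's API (QEMU uses qemu_memalign):

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/uio.h>

#define BOUNCE_ALIGN 4096
#define ROUND_UP(n, a) (((n) + (a) - 1) / (a) * (a))

/*
 * Copy a guest descriptor chain (iovec) into a single page-aligned bounce
 * buffer, zero-padding up to the rounded-up allocation size, so the device
 * sees one linear buffer instead of the chain.
 */
static void *linearize_iov(const struct iovec *iov, size_t num, size_t *len)
{
    size_t i, off = 0;
    size_t buf_size;
    char *base;

    *len = 0;
    for (i = 0; i < num; i++) {
        *len += iov[i].iov_len;
    }
    buf_size = ROUND_UP(*len, BOUNCE_ALIGN);

    base = aligned_alloc(BOUNCE_ALIGN, buf_size);
    if (!base) {
        return NULL;
    }
    for (i = 0; i < num; i++) {
        memcpy(base + off, iov[i].iov_base, iov[i].iov_len);
        off += iov[i].iov_len;
    }
    /* Pad the tail so the device never reads stale heap contents. */
    memset(base + *len, 0, buf_size - *len);
    return base;
}
```

A chain of "abc" + "de" comes out as one 4 KiB buffer starting with "abcde" followed by zeros.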

Signed-off-by: Eugenio Pérez 
---
 hw/virtio/vhost-shadow-virtqueue.h |  10 ++
 include/hw/virtio/vhost-vdpa.h |   1 +
 hw/virtio/vhost-shadow-virtqueue.c | 174 +++--
 hw/virtio/vhost-vdpa.c |   1 +
 net/vhost-vdpa.c   |   1 +
 5 files changed, 175 insertions(+), 12 deletions(-)

diff --git a/hw/virtio/vhost-shadow-virtqueue.h 
b/hw/virtio/vhost-shadow-virtqueue.h
index e06ac52158..79cb2d301f 100644
--- a/hw/virtio/vhost-shadow-virtqueue.h
+++ b/hw/virtio/vhost-shadow-virtqueue.h
@@ -17,6 +17,12 @@
 
 typedef struct SVQElement {
 VirtQueueElement elem;
+
+/* SVQ IOVA address of in buffer and out buffer if cloned */
+hwaddr in_iova, out_iova;
+
+/* Length of in buffer */
+size_t in_len;
 } SVQElement;
 
 typedef void (*VirtQueueElementCallback)(VirtIODevice *vdev,
@@ -102,6 +108,9 @@ typedef struct VhostShadowVirtqueue {
 
 /* Next head to consume from the device */
 uint16_t last_used_idx;
+
+/* Copy each descriptor to QEMU iova */
+bool copy_descs;
 } VhostShadowVirtqueue;
 
 bool vhost_svq_valid_features(uint64_t features, Error **errp);
@@ -119,6 +128,7 @@ void vhost_svq_stop(VhostShadowVirtqueue *svq);
 
 VhostShadowVirtqueue *vhost_svq_new(VhostIOVATree *iova_map,
 const VhostShadowVirtqueueOps *ops,
+bool copy_descs,
 const VhostShadowVirtqueueMapOps *map_ops,
 void *map_ops_opaque);
 
diff --git a/include/hw/virtio/vhost-vdpa.h b/include/hw/virtio/vhost-vdpa.h
index f1ba46a860..dc2884eea4 100644
--- a/include/hw/virtio/vhost-vdpa.h
+++ b/include/hw/virtio/vhost-vdpa.h
@@ -33,6 +33,7 @@ typedef struct vhost_vdpa {
 struct vhost_vdpa_iova_range iova_range;
 uint64_t acked_features;
 bool shadow_vqs_enabled;
+bool svq_copy_descs;
 /* IOVA mapping used by the Shadow Virtqueue */
 VhostIOVATree *iova_tree;
 GPtrArray *shadow_vqs;
diff --git a/hw/virtio/vhost-shadow-virtqueue.c 
b/hw/virtio/vhost-shadow-virtqueue.c
index 044005ba89..5a8feb1cbc 100644
--- a/hw/virtio/vhost-shadow-virtqueue.c
+++ b/hw/virtio/vhost-shadow-virtqueue.c
@@ -16,6 +16,7 @@
 #include "qemu/log.h"
 #include "qemu/memalign.h"
 #include "linux-headers/linux/vhost.h"
+#include "qemu/iov.h"
 
 /**
  * Validate the transport device features that both guests can use with the SVQ
@@ -70,6 +71,30 @@ static uint16_t vhost_svq_available_slots(const 
VhostShadowVirtqueue *svq)
 return svq->vring.num - (svq->shadow_avail_idx - svq->shadow_used_idx);
 }
 
+static void vhost_svq_alloc_buffer(void **base, size_t *len,
+   const struct iovec *iov, size_t num,
+   bool write)
+{
+*len = iov_size(iov, num);
+size_t buf_size = ROUND_UP(*len, 4096);
+
+if (!num) {
+return;
+}
+
+/*
+ * Linearize element. If the guest had a descriptor chain, we expose a
+ * single buffer to the device.
+ */
+*base = qemu_memalign(4096, buf_size);
+if (!write) {
+iov_to_buf(iov, num, 0, *base, *len);
+memset(*base + *len, 0, buf_size - *len);
+} else {
+memset(*base, 0, *len);
+}
+}
+
 /**
  * Translate addresses between the qemu's virtual address and the SVQ IOVA
  *
@@ -126,7 +151,9 @@ static bool vhost_svq_translate_addr(const 
VhostShadowVirtqueue *svq,
  * Write descriptors to SVQ vring
  *
  * @svq: The shadow virtqueue
+ * @svq_elem: The shadow virtqueue element
  * @sg: Cache for hwaddr
+ * @descs_len: Total written buffer length if svq->copy_descs.
  * @iovec: The iovec from the guest
  * @num: iovec length
  * @more_descs: True if more descriptors come in the chain
@@ -134,7 +161,9 @@ static bool vhost_svq_translate_addr(const 
VhostShadowVirtqueue *svq,
  *
  * Return true if success, false otherwise and print error.
  */
-static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq, hwaddr *sg,
+static bool vhost_svq_vring_write_descs(VhostShadowVirtqueue *svq,
+SVQElement *svq_elem, hwaddr *sg,
+size_t *descs_len,
 const struct 

[RFC PATCH v8 04/21] virtio: Make virtqueue_alloc_element non-static

2022-05-19 Thread Eugenio Pérez
So SVQ can allocate elements by calling it.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/virtio.h | 1 +
 hw/virtio/virtio.c | 2 +-
 2 files changed, 2 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..5ca29e8757 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -198,6 +198,7 @@ void virtqueue_fill(VirtQueue *vq, const VirtQueueElement 
*elem,
 unsigned int len, unsigned int idx);
 
 void virtqueue_map(VirtIODevice *vdev, VirtQueueElement *elem);
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num);
 void *virtqueue_pop(VirtQueue *vq, size_t sz);
 unsigned int virtqueue_drop_all(VirtQueue *vq);
 void *qemu_get_virtqueue_element(VirtIODevice *vdev, QEMUFile *f, size_t sz);
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..b0929ba86c 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1426,7 +1426,7 @@ void virtqueue_map(VirtIODevice *vdev, VirtQueueElement 
*elem)
 false);
 }
 
-static void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned 
in_num)
+void *virtqueue_alloc_element(size_t sz, unsigned out_num, unsigned in_num)
 {
 VirtQueueElement *elem;
 size_t in_addr_ofs = QEMU_ALIGN_UP(sz, __alignof__(elem->in_addr[0]));
-- 
2.27.0




[RFC PATCH v8 03/21] vdpa: control virtqueue support on shadow virtqueue

2022-05-19 Thread Eugenio Pérez
Introduce control virtqueue support for the vDPA shadow virtqueue. This
is needed for advanced networking features like multiqueue.

To demonstrate command handling, VIRTIO_NET_F_CTRL_MACADDR and
VIRTIO_NET_CTRL_MQ are implemented. If the vDPA device is started with
SVQ support and the virtio-net driver changes the MAC or the number of
queues, the virtio-net device model will be updated with the new values.

Other CVQ commands could be added here straightforwardly, but they have
not been tested.
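The handler this patch adds only peeks at the control header's class byte, scattered across the guest's out buffers, before deciding whether a command should be mirrored into the device model. A small self-contained sketch of that peek, with a simplified iov_to_buf (the real QEMU helper also takes an offset argument; the struct and helper names here are illustrative, and the class constants follow the virtio spec's MAC/MQ values):

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/uio.h>

struct virtio_net_ctrl_hdr_sketch {
    uint8_t class;
    uint8_t cmd;
};

/* Minimal iov_to_buf: copy up to 'bytes' from an iovec chain into a flat
 * buffer, returning how many bytes were actually copied. */
static size_t iov_to_buf_sketch(const struct iovec *iov, unsigned num,
                                void *buf, size_t bytes)
{
    size_t done = 0;
    for (unsigned i = 0; i < num && done < bytes; i++) {
        size_t n = iov[i].iov_len;
        if (n > bytes - done) {
            n = bytes - done;
        }
        memcpy((char *)buf + done, iov[i].iov_base, n);
        done += n;
    }
    return done;
}

/* Peek only the class byte, as the vhost-vdpa CVQ handler does, and accept
 * just the MAC and MQ classes (1 and 4 in the virtio spec). */
static int cvq_cmd_interesting(const struct iovec *out_sg, unsigned out_num)
{
    struct virtio_net_ctrl_hdr_sketch ctrl;
    size_t s = iov_to_buf_sketch(out_sg, out_num, &ctrl, sizeof(ctrl.class));
    if (s != sizeof(ctrl.class)) {
        return 0; /* header missing or truncated: ignore the command */
    }
    return ctrl.class == 1 /* VIRTIO_NET_CTRL_MAC */ ||
           ctrl.class == 4 /* VIRTIO_NET_CTRL_MQ */;
}
```

Note the copy works even when the class and cmd bytes land in different iovec entries, which a guest is free to do.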

Signed-off-by: Eugenio Pérez 
---
 net/vhost-vdpa.c | 44 
 1 file changed, 44 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index df1e69ee72..ef12fc284c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -11,6 +11,7 @@
 
 #include "qemu/osdep.h"
 #include "clients.h"
+#include "hw/virtio/virtio-net.h"
 #include "net/vhost_net.h"
 #include "net/vhost-vdpa.h"
 #include "hw/virtio/vhost-vdpa.h"
@@ -187,6 +188,46 @@ static NetClientInfo net_vhost_vdpa_info = {
 .check_peer_type = vhost_vdpa_check_peer_type,
 };
 
+static void vhost_vdpa_net_handle_ctrl(VirtIODevice *vdev,
+   const VirtQueueElement *elem)
+{
+struct virtio_net_ctrl_hdr ctrl;
+virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
+size_t s;
+struct iovec in = {
+.iov_base = &status,
+.iov_len = sizeof(status),
+};
+
+s = iov_to_buf(elem->out_sg, elem->out_num, 0, &ctrl, sizeof(ctrl.class));
+if (s != sizeof(ctrl.class)) {
+return;
+}
+
+switch (ctrl.class) {
+case VIRTIO_NET_CTRL_MAC_ADDR_SET:
+case VIRTIO_NET_CTRL_MQ:
+break;
+default:
+return;
+};
+
+s = iov_to_buf(elem->in_sg, elem->in_num, 0, &status, sizeof(status));
+if (s != sizeof(status) || status != VIRTIO_NET_OK) {
+return;
+}
+
+status = VIRTIO_NET_ERR;
+virtio_net_handle_ctrl_iov(vdev, &in, 1, elem->out_sg, elem->out_num);
+if (status != VIRTIO_NET_OK) {
+error_report("Bad CVQ processing in model");
+}
+}
+
+static const VhostShadowVirtqueueOps vhost_vdpa_net_svq_ops = {
+.used_elem_handler = vhost_vdpa_net_handle_ctrl,
+};
+
 static NetClientState *net_vhost_vdpa_init(NetClientState *peer,
const char *device,
const char *name,
@@ -211,6 +252,9 @@ static NetClientState *net_vhost_vdpa_init(NetClientState 
*peer,
 
 s->vhost_vdpa.device_fd = vdpa_device_fd;
 s->vhost_vdpa.index = queue_pair_index;
+if (!is_datapath) {
+s->vhost_vdpa.shadow_vq_ops = &vhost_vdpa_net_svq_ops;
+}
 ret = vhost_vdpa_add(nc, (void *)&s->vhost_vdpa, queue_pair_index, nvqs);
 if (ret) {
 qemu_del_net_client(nc);
-- 
2.27.0




[RFC PATCH v8 01/21] virtio-net: Expose ctrl virtqueue logic

2022-05-19 Thread Eugenio Pérez
This allows external vhost-net devices to modify the state of the
VirtIO device model once the vhost-vdpa device has acknowledged the
control commands.

Signed-off-by: Eugenio Pérez 
---
 include/hw/virtio/virtio-net.h |  4 ++
 hw/net/virtio-net.c| 84 --
 2 files changed, 53 insertions(+), 35 deletions(-)

diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
index eb87032627..cd31b7f67d 100644
--- a/include/hw/virtio/virtio-net.h
+++ b/include/hw/virtio/virtio-net.h
@@ -218,6 +218,10 @@ struct VirtIONet {
 struct EBPFRSSContext ebpf_rss;
 };
 
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+const struct iovec *in_sg, size_t in_num,
+const struct iovec *out_sg,
+unsigned out_num);
 void virtio_net_set_netclient_name(VirtIONet *n, const char *name,
const char *type);
 
diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
index 7ad948ee7c..0e350154ec 100644
--- a/hw/net/virtio-net.c
+++ b/hw/net/virtio-net.c
@@ -1434,57 +1434,71 @@ static int virtio_net_handle_mq(VirtIONet *n, uint8_t 
cmd,
 return VIRTIO_NET_OK;
 }
 
-static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+unsigned virtio_net_handle_ctrl_iov(VirtIODevice *vdev,
+const struct iovec *in_sg, size_t in_num,
+const struct iovec *out_sg,
+unsigned out_num)
 {
 VirtIONet *n = VIRTIO_NET(vdev);
 struct virtio_net_ctrl_hdr ctrl;
 virtio_net_ctrl_ack status = VIRTIO_NET_ERR;
-VirtQueueElement *elem;
 size_t s;
 struct iovec *iov, *iov2;
-unsigned int iov_cnt;
+
+if (iov_size(in_sg, in_num) < sizeof(status) ||
+iov_size(out_sg, out_num) < sizeof(ctrl)) {
+virtio_error(vdev, "virtio-net ctrl missing headers");
+return 0;
+}
+
+iov2 = iov = g_memdup2(out_sg, sizeof(struct iovec) * out_num);
+s = iov_to_buf(iov, out_num, 0, &ctrl, sizeof(ctrl));
+iov_discard_front(&iov, &out_num, sizeof(ctrl));
+if (s != sizeof(ctrl)) {
+status = VIRTIO_NET_ERR;
+} else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
+status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
+status = virtio_net_handle_mac(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
+status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
+status = virtio_net_handle_announce(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
+status = virtio_net_handle_mq(n, ctrl.cmd, iov, out_num);
+} else if (ctrl.class == VIRTIO_NET_CTRL_GUEST_OFFLOADS) {
+status = virtio_net_handle_offloads(n, ctrl.cmd, iov, out_num);
+}
+
+s = iov_from_buf(in_sg, in_num, 0, &status, sizeof(status));
+assert(s == sizeof(status));
+
+g_free(iov2);
+return sizeof(status);
+}
+
+static void virtio_net_handle_ctrl(VirtIODevice *vdev, VirtQueue *vq)
+{
+VirtQueueElement *elem;
 
 for (;;) {
+unsigned written;
 elem = virtqueue_pop(vq, sizeof(VirtQueueElement));
 if (!elem) {
 break;
 }
-if (iov_size(elem->in_sg, elem->in_num) < sizeof(status) ||
-iov_size(elem->out_sg, elem->out_num) < sizeof(ctrl)) {
-virtio_error(vdev, "virtio-net ctrl missing headers");
+
+written = virtio_net_handle_ctrl_iov(vdev, elem->in_sg, elem->in_num,
+ elem->out_sg, elem->out_num);
+if (written > 0) {
+virtqueue_push(vq, elem, written);
+virtio_notify(vdev, vq);
+g_free(elem);
+} else {
 virtqueue_detach_element(vq, elem, 0);
 g_free(elem);
 break;
 }
-
-iov_cnt = elem->out_num;
-iov2 = iov = g_memdup2(elem->out_sg,
-   sizeof(struct iovec) * elem->out_num);
-s = iov_to_buf(iov, iov_cnt, 0, &ctrl, sizeof(ctrl));
-iov_discard_front(&iov, &iov_cnt, sizeof(ctrl));
-if (s != sizeof(ctrl)) {
-status = VIRTIO_NET_ERR;
-} else if (ctrl.class == VIRTIO_NET_CTRL_RX) {
-status = virtio_net_handle_rx_mode(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_MAC) {
-status = virtio_net_handle_mac(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_VLAN) {
-status = virtio_net_handle_vlan_table(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_ANNOUNCE) {
-status = virtio_net_handle_announce(n, ctrl.cmd, iov, iov_cnt);
-} else if (ctrl.class == VIRTIO_NET_CTRL_MQ) {
-

[RFC PATCH v8 00/21] Net Control VQ support with asid in vDPA SVQ

2022-05-19 Thread Eugenio Pérez
Control virtqueue is used by networking device for accepting various
commands from the driver. It's a must to support multiqueue and other
configurations.

Shadow VirtQueue (SVQ) already makes migration of virtqueue state
possible, effectively intercepting the virtqueues so qemu can track what
regions of memory are dirty because of device action and need migration.
However, this does not cover the networking device state seen by the
driver, which is changed by CVQ messages such as MAC address updates.

To solve that, this series uses the SVQ infrastructure to intercept the
networking control messages used by the device. This way, qemu is able to
update the VirtIONet device model and to migrate it.

However, intercepting all queues would slow down device data forwarding.
To avoid that, only the CVQ needs to be intercepted all the time. This is
achieved using the ASID infrastructure, which allows different
translations for different virtqueues. The most up-to-date kernel part of
ASID is proposed at [1].

You can run qemu in two modes after applying this series: intercepting
only the cvq with x-cvq-svq=on, or intercepting all the virtqueues by
adding x-svq=on to the cmdline:

-netdev 
type=vhost-vdpa,vhostdev=/dev/vhost-vdpa-0,id=vhost-vdpa0,x-cvq-svq=on,x-svq=on

The first three patches enable the update of the virtio-net device model
for each CVQ message acknowledged by the device.

Patches 5 to 9 enable an individual SVQ to copy the buffers to QEMU's VA.
This allows simplifying the memory mapping, instead of mapping all the
guest's memory as in the data virtqueues.

Patch 10 allows injecting control messages into the device. This makes it
possible to set device state both at QEMU startup and at the live
migration destination. In the future, this may also be used to emulate
_F_ANNOUNCE.

Patch 11 updates kernel headers, but it assigns random numbers to the
needed ioctls because they are not yet accepted in the kernel.

Patches 12-16 enable setting the features of the net device model on the
vdpa device at device start.

The last ones enable the separate ASID and SVQ.

Comments are welcome.

TODO:
* Fall back on the regular CVQ if QEMU cannot isolate it in its own ASID
  for any reason, blocking migration. This is tricky, since it can leave
  the VM unable to migrate anymore, so some way of blocking it must be
  used.
* Review failure paths; some have TODO notes, others don't.

Changes from rfc v7:
* Don't map all guest space in ASID 1 but copy all the buffers. No need for
  more memory listeners.
* Move net backend start callback to SVQ.
* Wait for device CVQ commands used by the device at SVQ start, avoiding races.
* Changed ioctls, but they're provisional anyway.
* Reorder commits so refactor and code adding ones are closer to usage.
* Usual cleaning: better tracing, doc, patches messages, ...

Changes from rfc v6:
* Fix bad iotlb updates order when batching was enabled
* Add reference counting to iova_tree so cleaning is simpler.

Changes from rfc v5:
* Fixes bad calculation of the cvq end group when MQ is not acked by the guest.

Changes from rfc v4:
* Add missing tracing
* Add multiqueue support
* Use already sent version for replacing g_memdup
* Care with memory management

Changes from rfc v3:
* Fix bad returning of descriptors to SVQ list.

Changes from rfc v2:
* Fix use-after-free.

Changes from rfc v1:
* Rebase to latest master.
* Configure ASID instead of assuming cvq asid != data vqs asid.
* Update device model so (MAC) state can be migrated too.

[1] https://lkml.kernel.org/kvm/20220224212314.1326-1-gda...@xilinx.com/

Eugenio Pérez (21):
  virtio-net: Expose ctrl virtqueue logic
  vhost: Add custom used buffer callback
  vdpa: control virtqueue support on shadow virtqueue
  virtio: Make virtqueue_alloc_element non-static
  vhost: Add vhost_iova_tree_find
  vdpa: Add map/unmap operation callback to SVQ
  vhost: move descriptor translation to vhost_svq_vring_write_descs
  vhost: Add SVQElement
  vhost: Add svq copy desc mode
  vhost: Add vhost_svq_inject
  vhost: Update kernel headers
  vdpa: delay set_vring_ready after DRIVER_OK
  vhost: Add ShadowVirtQueueStart operation
  vhost: Make possible to check for device exclusive vq group
  vhost: add vhost_svq_poll
  vdpa: Add vhost_vdpa_start_control_svq
  vdpa: Add asid attribute to vdpa device
  vdpa: Extract get features part from vhost_vdpa_get_max_queue_pairs
  vhost: Add reference counting to vhost_iova_tree
  vdpa: Add x-svq to NetdevVhostVDPAOptions
  vdpa: Add x-cvq-svq

 qapi/net.json|  13 +-
 hw/virtio/vhost-iova-tree.h  |   7 +-
 hw/virtio/vhost-shadow-virtqueue.h   |  61 ++-
 include/hw/virtio/vhost-vdpa.h   |   3 +
 include/hw/virtio/vhost.h|   3 +
 include/hw/virtio/virtio-net.h   |   4 +
 include/hw/virtio/virtio.h   |   1 +
 include/standard-headers/linux/vhost_types.h |  11 +-
 linux-headers/linux/vhost.h  |  25 +-
 hw/net/vhost_net.c   |   5 

Re: Accelerating non-standard disk types

2022-05-19 Thread Raphael Norwitz
On Tue, May 17, 2022 at 03:53:52PM +0200, Paolo Bonzini wrote:
> On 5/16/22 19:38, Raphael Norwitz wrote:
> > [1] Keep using the SCSI translation in QEMU but back vDisks with a
> > vhost-user-scsi or vhost-user-blk backend device.
> > [2] Implement SATA and IDE emulation with vfio-user (likely with an SPDK
> > client?).
> > [3] We've also been looking at your libblkio library. From your
> > description in
> > https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg06146.html it
> > sounds like it may definitely play a role here, and possibly provide the
> > necessary abstractions to back I/O from these emulated disks to any
> > backends we may want?
> 
> First of all: have you benchmarked it?  How much time is spent on MMIO vs.
> disk I/O?
>

Good point - we haven’t benchmarked the emulation, exit and translation
overheads - it is very possible speeding up disk I/O may not have a huge
impact. We would definitely benchmark this before exploring any of the
options seriously, but as you rightly note, performance is not the only
motivation here.

> Of the options above, the most interesting to me is to implement a
> vhost-user-blk/vhost-user-scsi backend in QEMU, similar to the NVMe one,
> that would translate I/O submissions to virtqueue (including polling and the
> like) and could be used with SATA.
>

We were certainly eyeing [1] as the most viable in the immediate future.
That said, since a vhost-user-blk driver has been added to libblkio, [3]
also sounds like a strong option. Do you see any long term benefit to
translating SATA/IDE submissions to virtqueues in a world where libblkio
is to be adopted?

> For IDE specifically, I'm not sure how much it can be sped up since it has
> only 1 in-flight operation.  I think using KVM coalesced I/O could provide
> an interesting boost (assuming instant or near-instant reply from the
> backend).  If all you're interested in however is not really performance,
> but rather having a single "connection" to your back end, vhost-user is
> certainly an option.
> 

Interesting - I will take a look at KVM coalesced I/O.

You’re totally right though, performance is not our main interest for
these disk types. I should have emphasized offload rather than
acceleration and performance. We would prefer to QA and support as few
data paths as possible, and a vhost-user offload mechanism would allow
us to use the same path for all I/O. I imagine other QEMU users who
offload to backends like SPDK and use SATA/IDE disk types may feel
similarly?

> Paolo

Re: Accelerating non-standard disk types

2022-05-19 Thread Raphael Norwitz
On Tue, May 17, 2022 at 04:29:17PM +0100, Stefan Hajnoczi wrote:
> On Mon, May 16, 2022 at 05:38:31PM +, Raphael Norwitz wrote:
> > Hey Stefan,
> > 
> > We've been thinking about ways to accelerate other disk types such as
> > SATA and IDE rather than translating to SCSI and using QEMU's iSCSI
> > driver, with existing and more performant backends such as SPDK. We
> > think there are some options worth exploring:
> > 
> > [1] Keep using the SCSI translation in QEMU but back vDisks with a
> > vhost-user-scsi or vhost-user-blk backend device.
> 
> If I understand correctly the idea is to have a QEMU Block Driver that
> connects to SPDK using vhost-user-scsi/blk?
>

Yes - the idea would be to introduce logic to translate SATA/IDE to SCSI
or block requests and send them via vhost-user-{scsi/blk} to SPDK or any
other vhost-user backend. Our thought is that this is doable today
whereas we may have to wait for QEMU to formally adopt libblkio before
proceeding with [3], and depending on timelines it may make sense to
implement [1] and then switch over to [3] later. Thoughts?

> > [2] Implement SATA and IDE emulation with vfio-user (likely with an SPDK
> > client?).
> 
> This is definitely the option with the lowest overhead. I'm not sure if
> implementing SATA and IDE emulation in SPDK is worth the effort for
> saving the last few cycles.
>

Agreed - it’s probably not worth exploring because of the amount of work
involved. One good argument would be that it may be better for security
in the multiprocess QEMU world, but to me that does not seem strong
enough to justify the work involved so I suggest we drop option [2].

> > [3] We've also been looking at your libblkio library. From your
> > description in
> > https://lists.gnu.org/archive/html/qemu-devel/2021-04/msg06146.html it
> > sounds like it may definitely play a role here, and possibly provide the
> > nessesary abstractions to back I/O from these emulated disks to any
> > backends we may want?
> 
> Kevin Wolf has contributed a vhost-user-blk driver for libblkio. This
> lets you achieve #1 using QEMU's libblkio Block Driver. The guest still
> sees IDE or SATA but instead of translating to iSCSI the I/O requests
> are sent over vhost-user-blk.
> 
> I suggest joining the libblkio chat and we can discuss how to set this
> up (the QEMU libblkio BlockDriver is not yet in qemu.git):
> https://matrix.to/#/#libblkio:matrix.org

Great - I have joined and will follow up there.

> 
> > We are planning to start a review of these options internally to survey
> > tradeoffs, potential timelines and practicality for these approaches. We
> > were also considering putting a submission together for KVM forum
> > describing our findings. Would you see any value in that?
> 
> I think it's always interesting to see performance results. I wonder if
> you have more cutting-edge optimizations or performance results you want
> to share at KVM Forum because IDE and SATA are more legacy/niche
> nowadays?
>

I realize I over-emphasized performance in my question - our larger goal
here is to align the data path for all disk types. We have some hope
that SATA can be sped up a bit, but it’s entirely possible that the MMIO
overhead will far outweigh any disk I/O improvements. Our thought was to
present a “Roadmap for supporting offload alternate disk types”, but
with your and Paolo’s response it seems like there isn’t enough material
to warrant a KVM talk and we should rather invest time in prototyping
and evaluating solutions.

> Stefan



Re: [PATCH] util: optimise flush_idcache_range when the ppc host has coherent icache

2022-05-19 Thread Richard Henderson

On 5/19/22 07:11, Nicholas Piggin wrote:

dcache writeback and icache invalidate are not required when the icache is
coherent; a shorter fixed-length sequence can be used which just has to
flush and re-fetch instructions that were in flight.

Signed-off-by: Nicholas Piggin 
---

I haven't been able to measure a significant performance difference
with this, qemu isn't flushing large ranges frequently so the old sequence
is not that slow.


Yeah, we should be flushing smallish regions (< 1-4k), as we generate TranslationBlocks. 
And hopefully the translation cache is large enough that we spend more time executing 
blocks than re-compiling them.  ;-)




+++ b/include/qemu/cacheflush.h
@@ -28,6 +28,10 @@ static inline void flush_idcache_range(uintptr_t rx, 
uintptr_t rw, size_t len)
  
  #else
  
+#if defined(__powerpc__)

+extern bool have_coherent_icache;
+#endif


Ug.  I'm undecided where to put this.  I'm tempted to say...


--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -108,7 +108,16 @@ void flush_idcache_range(uintptr_t rx, uintptr_t rw, 
size_t len)


... here in cacheflush.c, with a comment that the variable is defined and initialized in 
cacheinfo.c.


I'm even more tempted to merge the two files to put all of the machine-specific cache data 
in the same place, then this variable can be static.  There's even an existing TODO 
comment in cacheflush.c for aarch64.




  b = rw & ~(dsize - 1);
+
+if (have_coherent_icache) {
+asm volatile ("sync" : : : "memory");
+asm volatile ("icbi 0,%0" : : "r"(b) : "memory");
+asm volatile ("isync" : : : "memory");
+return;
+}


Where can I find definitive rules on this?

Note that rx may not equal rw, and that we've got two virtual mappings for the same 
memory, one for "data" that is read-write and one for "execute" that is read-execute. 
(This split is enabled only for --enable-debug-tcg builds on linux, to make sure we don't 
regress apple m1, which requires the split all of the time.)


In particular, you're flushing one icache line with the dcache address, and you're 
not flushing any of the other lines.  Is the coherent icache thing really that we may 
simply skip the dcache flush step, but must still flush all of the icache lines?


Without docs, "icache snoop" to me would imply that we only need the two barriers and no 
flushes at all, just to make sure all memory writes complete before any new instructions 
are executed.  This would be like the two AArch64 bits, IDC and DIC, which indicate that 
the two caches are coherent to the Point of Unification, which leaves us with just the 
Instruction Synchronization Barrier at the end of the function.




+bool have_coherent_icache = false;


scripts/checkpatch.pl should complain this is initialized to 0.



  static void arch_cache_info(int *isize, int *dsize)
  {
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+unsigned long hwcap = qemu_getauxval(AT_HWCAP);
+#  endif
+
  if (*isize == 0) {
  *isize = qemu_getauxval(AT_ICACHEBSIZE);
  }
  if (*dsize == 0) {
  *dsize = qemu_getauxval(AT_DCACHEBSIZE);
  }
+
+#  ifdef PPC_FEATURE_ICACHE_SNOOP
+have_coherent_icache = (hwcap & PPC_FEATURE_ICACHE_SNOOP) != 0;
+#  endif


Better with only one ifdef, moving this second hunk up.

It would be nice if there were some kernel documentation for this...


r~



Re: [RFC 0/3] Introduce a new Qemu machine for RISC-V

2022-05-19 Thread Atish Kumar Patra
On Wed, May 18, 2022 at 3:46 AM Peter Maydell  wrote:
>
> On Wed, 18 May 2022 at 09:25, Daniel P. Berrangé  wrote:
> > The fact that RISC-V ecosystem is so young & has relatively few
> > users, and even fewer expecting  long term stability, is precisely
> > why we should just modify the existing 'virt' machine now rather
> > than introducing a new 'virt-pcie'. We can afford to have the
> > limited incompatibility in the short term given the small userbase.
> > We went through this same exercise with aarch64 virt machine and
> > despite the short term disruption, it was a good success IMHO to
> > get it switched from MMIO to PCI, instead of having two machines
> > in parallel long term.
>
> The aarch64 virt board does still carry around the mmio devices,
> though...it's just that we have pci as well now.
>
> Personally I don't think that switching to a new machine type
> is likely to help escape from the "bloat" problem, which arises
> from two conflicting desires:
>
>  (1) people want this kind of board to be nice and small and
>  simple, with a minimal set of devices
>  (2) everybody has their own "but this specific one device is
>  really important and it should be in the minimal set"
>  (watchdog? acpi? ability to power the machine on and off?
>  second UART? i2c? etc etc etc)
>

Both ACPI and device tree support should be there anyway.
An MMIO-based reset will probably be needed as well (I listed that earlier
with the mandatory MMIO devices).

AFAIK everything else can be PCIe based, which the new board will mandate.
It must strictly enforce the rules about what can be added to it. The bar
to allow new MMIO devices must be very high, and they must have a wide
range of usage. This will make life easier for the entire ecosystem as
well. AFAIK, libvirt uses only PCIe devices to build VMs.

I understand that is probably a big ask, but if odd MMIO devices sneak
into this platform, that defeats the purpose.
On the other hand, having a flag day for virt machines creates a lot of
incompatibility for the users until everyone transitions.
The transition also has to happen based on the QEMU version, as the virt
machine doesn't have any versioning right now.

Do we make users' lives difficult by having a flag day based on the
QEMU version, or take on the additional responsibility of maintaining
another board?
I hope the new board will continue to be small so the maintenance
burden is not too much. Personally, I feel the latter approach will
cause minimal inconvenience for everybody, but I am okay with whatever
is decided by the community.



> So either your 'minimal' board is only serving a small subset
> of the users who want a minimal board; or else it's not as
> minimal as any of them would like; or else it acquires a
> growing set of -machine options to turn various devices on
> and off...
>
> -- PMM



Re: [PATCH 28/35] acpi: pvpanic-isa: use AcpiDevAmlIfClass:build_dev_aml to provide device's AML

2022-05-19 Thread Igor Mammedov
On Mon, 16 May 2022 16:46:29 -0400
"Michael S. Tsirkin"  wrote:

> On Mon, May 16, 2022 at 11:26:03AM -0400, Igor Mammedov wrote:
> > ... and clean up no-longer-needed conditionals in the DSDT build code.
> > pvpanic-isa AML will be fetched and included when the ISA bridge
> > builds its own AML code (including attached devices).
> > 
> > Expected AML change:
> >    the device under the separate _SB.PCI0.ISA scope is moved directly
> >    under the Device(ISA) node.
> > 
> > Signed-off-by: Igor Mammedov 
> > ---
> >  include/hw/misc/pvpanic.h |  9 -
> >  hw/i386/acpi-build.c  | 37 --
> >  hw/misc/pvpanic-isa.c | 42 +++
> >  3 files changed, 42 insertions(+), 46 deletions(-)
> > 
> > diff --git a/include/hw/misc/pvpanic.h b/include/hw/misc/pvpanic.h
> > index 7f16cc9b16..e520566ab0 100644
> > --- a/include/hw/misc/pvpanic.h
> > +++ b/include/hw/misc/pvpanic.h
> > @@ -33,13 +33,4 @@ struct PVPanicState {
> >  
> >  void pvpanic_setup_io(PVPanicState *s, DeviceState *dev, unsigned size);
> >  
> > -static inline uint16_t pvpanic_port(void)
> > -{
> > -Object *o = object_resolve_path_type("", TYPE_PVPANIC_ISA_DEVICE, 
> > NULL);
> > -if (!o) {
> > -return 0;
> > -}
> > -return object_property_get_uint(o, PVPANIC_IOPORT_PROP, NULL);
> > -}
> > -
> >  #endif
> > diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
> > index 517818cd9f..a42f41f373 100644
> > --- a/hw/i386/acpi-build.c
> > +++ b/hw/i386/acpi-build.c
> > @@ -30,7 +30,6 @@
> >  #include "hw/pci/pci.h"
> >  #include "hw/core/cpu.h"
> >  #include "target/i386/cpu.h"
> > -#include "hw/misc/pvpanic.h"
> >  #include "hw/timer/hpet.h"
> >  #include "hw/acpi/acpi-defs.h"
> >  #include "hw/acpi/acpi.h"
> > @@ -117,7 +116,6 @@ typedef struct AcpiMiscInfo {
> >  #endif
> >  const unsigned char *dsdt_code;
> >  unsigned dsdt_size;
> > -uint16_t pvpanic_port;
> >  } AcpiMiscInfo;
> >  
> >  typedef struct AcpiBuildPciBusHotplugState {
> > @@ -302,7 +300,6 @@ static void acpi_get_misc_info(AcpiMiscInfo *info)
> >  #ifdef CONFIG_TPM
> >  info->tpm_version = tpm_get_version(tpm_find());
> >  #endif
> > -info->pvpanic_port = pvpanic_port();
> >  }
> >  
> >  /*
> > @@ -1749,40 +1746,6 @@ build_dsdt(GArray *table_data, BIOSLinker *linker,
> >  aml_append(dsdt, scope);
> >  }
> >  
> > -if (misc->pvpanic_port) {
> > -scope = aml_scope("\\_SB.PCI0.ISA");
> > -
> > -dev = aml_device("PEVT");
> > -aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0001")));
> > -
> > -crs = aml_resource_template();
> > -aml_append(crs,
> > -aml_io(AML_DECODE16, misc->pvpanic_port, misc->pvpanic_port, 
> > 1, 1)
> > -);
> > -aml_append(dev, aml_name_decl("_CRS", crs));
> > -
> > -aml_append(dev, aml_operation_region("PEOR", AML_SYSTEM_IO,
> > -  aml_int(misc->pvpanic_port), 
> > 1));
> > -field = aml_field("PEOR", AML_BYTE_ACC, AML_NOLOCK, AML_PRESERVE);
> > -aml_append(field, aml_named_field("PEPT", 8));
> > -aml_append(dev, field);
> > -
> > -/* device present, functioning, decoding, shown in UI */
> > -aml_append(dev, aml_name_decl("_STA", aml_int(0xF)));
> > -
> > -method = aml_method("RDPT", 0, AML_NOTSERIALIZED);
> > -aml_append(method, aml_store(aml_name("PEPT"), aml_local(0)));
> > -aml_append(method, aml_return(aml_local(0)));
> > -aml_append(dev, method);
> > -
> > -method = aml_method("WRPT", 1, AML_NOTSERIALIZED);
> > -aml_append(method, aml_store(aml_arg(0), aml_name("PEPT")));
> > -aml_append(dev, method);
> > -
> > -aml_append(scope, dev);
> > -aml_append(dsdt, scope);
> > -}
> > -
> >  sb_scope = aml_scope("\\_SB");
> >  {
> >  Object *pci_host;
> > diff --git a/hw/misc/pvpanic-isa.c b/hw/misc/pvpanic-isa.c
> > index b84d4d458d..ccec50f61b 100644
> > --- a/hw/misc/pvpanic-isa.c
> > +++ b/hw/misc/pvpanic-isa.c
> > @@ -22,6 +22,7 @@
> >  #include "qom/object.h"
> >  #include "hw/isa/isa.h"
> >  #include "standard-headers/linux/pvpanic.h"
> > +#include "hw/acpi/acpi_aml_interface.h"
> >  
> >  OBJECT_DECLARE_SIMPLE_TYPE(PVPanicISAState, PVPANIC_ISA_DEVICE)
> >  
> > @@ -63,6 +64,41 @@ static void pvpanic_isa_realizefn(DeviceState *dev, 
> > Error **errp)
> >  isa_register_ioport(d, &s->mr, s->ioport);
> >  }
> >  
> > +static void build_pvpanic_isa_aml(AcpiDevAmlIf *adev, Aml *scope)
> > +{
> > +Aml *crs, *field, *method;
> > +PVPanicISAState *s = PVPANIC_ISA_DEVICE(adev);
> > +Aml *dev = aml_device("PEVT");
> > +
> > +aml_append(dev, aml_name_decl("_HID", aml_string("QEMU0001")));
> > +
> > +crs = aml_resource_template();
> > +aml_append(crs,
> > +aml_io(AML_DECODE16, s->ioport, s->ioport, 1, 1)
> > +);
> > +aml_append(dev, aml_name_decl("_CRS", 

[PULL 21/22] target/arm: Enable FEAT_HCX for -cpu max

2022-05-19 Thread Peter Maydell
From: Richard Henderson 

This feature adds a new register, HCRX_EL2, which controls
many of the newer AArch64 features.  So far the register is
effectively RES0, because none of the new features are done.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20220517054850.177016-2-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/cpu.h| 20 ++
 target/arm/cpu64.c  |  1 +
 target/arm/helper.c | 50 +
 3 files changed, 71 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 3dc79f121b5..fac526a4905 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -362,6 +362,7 @@ typedef struct CPUArchState {
 uint32_t pmsav5_data_ap; /* PMSAv5 MPU data access permissions */
 uint32_t pmsav5_insn_ap; /* PMSAv5 MPU insn access permissions */
 uint64_t hcr_el2; /* Hypervisor configuration register */
+uint64_t hcrx_el2; /* Extended Hypervisor configuration register */
 uint64_t scr_el3; /* Secure configuration register.  */
 union { /* Fault status registers.  */
 struct {
@@ -1545,6 +1546,19 @@ static inline void xpsr_write(CPUARMState *env, uint32_t 
val, uint32_t mask)
#define HCR_TWEDEN    (1ULL << 59)
#define HCR_TWEDEL    MAKE_64BIT_MASK(60, 4)
 
+#define HCRX_ENAS0    (1ULL << 0)
+#define HCRX_ENALS    (1ULL << 1)
+#define HCRX_ENASR    (1ULL << 2)
+#define HCRX_FNXS     (1ULL << 3)
+#define HCRX_FGTNXS   (1ULL << 4)
+#define HCRX_SMPME    (1ULL << 5)
+#define HCRX_TALLINT  (1ULL << 6)
+#define HCRX_VINMI    (1ULL << 7)
+#define HCRX_VFNMI    (1ULL << 8)
+#define HCRX_CMOW     (1ULL << 9)
+#define HCRX_MCE2     (1ULL << 10)
+#define HCRX_MSCEN    (1ULL << 11)
+
 #define HPFAR_NS  (1ULL << 63)
 
#define SCR_NS        (1U << 0)
@@ -2312,6 +2326,7 @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
  * Not included here is HCR_RW.
  */
 uint64_t arm_hcr_el2_eff(CPUARMState *env);
+uint64_t arm_hcrx_el2_eff(CPUARMState *env);
 
 /* Return true if the specified exception level is running in AArch64 state. */
 static inline bool arm_el_is_aa64(CPUARMState *env, int el)
@@ -3933,6 +3948,11 @@ static inline bool isar_feature_aa64_ats1e1(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, PAN) >= 2;
 }
 
+static inline bool isar_feature_aa64_hcx(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, HCX) != 0;
+}
+
 static inline bool isar_feature_aa64_uao(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, UAO) != 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index a752b648568..3ff9219ca3b 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -934,6 +934,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR1, LO, 1);   /* FEAT_LOR */
 t = FIELD_DP64(t, ID_AA64MMFR1, PAN, 2);  /* FEAT_PAN2 */
 t = FIELD_DP64(t, ID_AA64MMFR1, XNX, 1);  /* FEAT_XNX */
+t = FIELD_DP64(t, ID_AA64MMFR1, HCX, 1);  /* FEAT_HCX */
 cpu->isar.id_aa64mmfr1 = t;
 
 t = cpu->isar.id_aa64mmfr2;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index fdd51e5e754..7d983d7fffb 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5288,6 +5288,52 @@ uint64_t arm_hcr_el2_eff(CPUARMState *env)
 return ret;
 }
 
+static void hcrx_write(CPUARMState *env, const ARMCPRegInfo *ri,
+   uint64_t value)
+{
+uint64_t valid_mask = 0;
+
+/* No features adding bits to HCRX are implemented. */
+
+/* Clear RES0 bits.  */
+env->cp15.hcrx_el2 = value & valid_mask;
+}
+
+static CPAccessResult access_hxen(CPUARMState *env, const ARMCPRegInfo *ri,
+  bool isread)
+{
+if (arm_current_el(env) < 3
+&& arm_feature(env, ARM_FEATURE_EL3)
+&& !(env->cp15.scr_el3 & SCR_HXEN)) {
+return CP_ACCESS_TRAP_EL3;
+}
+return CP_ACCESS_OK;
+}
+
+static const ARMCPRegInfo hcrx_el2_reginfo = {
+.name = "HCRX_EL2", .state = ARM_CP_STATE_AA64,
+.opc0 = 3, .opc1 = 4, .crn = 1, .crm = 2, .opc2 = 2,
+.access = PL2_RW, .writefn = hcrx_write, .accessfn = access_hxen,
+.fieldoffset = offsetof(CPUARMState, cp15.hcrx_el2),
+};
+
+/* Return the effective value of HCRX_EL2.  */
+uint64_t arm_hcrx_el2_eff(CPUARMState *env)
+{
+/*
+ * The bits in this register behave as 0 for all purposes other than
+ * direct reads of the register if:
+ *   - EL2 is not enabled in the current security state,
+ *   - SCR_EL3.HXEn is 0.
+ */
+if (!arm_is_el2_enabled(env)
+|| (arm_feature(env, ARM_FEATURE_EL3)
+&& !(env->cp15.scr_el3 & SCR_HXEN))) {
+return 0;
+}
+return env->cp15.hcrx_el2;
+}
+
 static void cptr_el2_write(CPUARMState *env, const ARMCPRegInfo *ri,
uint64_t value)
 {
@@ -8405,6 +8451,10 @@ void 

[PULL 16/22] target/arm: Make number of counters in PMCR follow the CPU

2022-05-19 Thread Peter Maydell
Currently we give all the v7-and-up CPUs a PMU with 4 counters.  This
means that we don't provide the 6 counters that are required by the
Arm BSA (Base System Architecture) specification if the CPU supports
the Virtualization extensions.

Instead of having a single PMCR_NUM_COUNTERS, make each CPU type
specify the PMCR reset value (obtained from the appropriate TRM), and
use the 'N' field of that value to define the number of counters
provided.

This means that we now supply 6 counters instead of 4 for:
 Cortex-A9, Cortex-A15, Cortex-A53, Cortex-A57, Cortex-A72,
 Cortex-A76, Neoverse-N1, '-cpu max'
This CPU goes from 4 to 8 counters:
 A64FX
These CPUs remain with 4 counters:
 Cortex-A7, Cortex-A8
This CPU goes down from 4 to 3 counters:
 Cortex-R5
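As a sanity check, the counter counts above can be recomputed from the quoted reset values, since PMCR.N occupies bits [15:11]. A quick sketch (register values taken from the patch; field layout per the Arm ARM):

```python
PMCRN_SHIFT = 11
PMCRN_MASK = 0x1F << PMCRN_SHIFT

def pmu_num_counters(reset_pmcr_el0: int) -> int:
    # Mirrors the patched pmu_num_counters(): PMCR.N is bits [15:11].
    return (reset_pmcr_el0 & PMCRN_MASK) >> PMCRN_SHIFT

# Reset values from the patch:
assert pmu_num_counters(0x41013000) == 6  # Cortex-A57
assert pmu_num_counters(0x410b3000) == 6  # Cortex-A76
assert pmu_num_counters(0x46014040) == 8  # A64FX
```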

Note that because we now use the PMCR reset value of the specific
implementation, we no longer set the LC bit out of reset.  This has
an UNKNOWN value out of reset for all cores with any AArch32 support,
so guest software should be setting it anyway if it wants it.

This change was originally landed in commit f7fb73b8cdd3f7 (during
the 6.0 release cycle) but was then reverted by commit
21c2dd77a6aa517 before that release because it did not work with KVM.
This version fixes that by creating the scratch vCPU in
kvm_arm_get_host_cpu_features() with the KVM_ARM_VCPU_PMU_V3 feature
if KVM supports it, and then only asking KVM for the PMCR_EL0 value
if the vCPU has a PMU.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
[PMM: Added the correct value for a64fx]
Message-id: 20220513122852.4063586-1-peter.mayd...@linaro.org
---
 target/arm/cpu.h   |  1 +
 target/arm/internals.h |  4 +++-
 target/arm/cpu64.c | 11 +++
 target/arm/cpu_tcg.c   |  6 ++
 target/arm/helper.c| 25 ++---
 target/arm/kvm64.c | 12 
 6 files changed, 47 insertions(+), 12 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a42464eb57a..3dc79f121b5 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -965,6 +965,7 @@ struct ArchCPU {
 uint64_t id_aa64dfr0;
 uint64_t id_aa64dfr1;
 uint64_t id_aa64zfr0;
+uint64_t reset_pmcr_el0;
 } isar;
 uint64_t midr;
 uint32_t revidr;
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 9b354eea7e4..b654bee4682 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1304,7 +1304,9 @@ enum MVEECIState {
 
 static inline uint32_t pmu_num_counters(CPUARMState *env)
 {
-  return (env->cp15.c9_pmcr & PMCRN_MASK) >> PMCRN_SHIFT;
+ARMCPU *cpu = env_archcpu(env);
+
+return (cpu->isar.reset_pmcr_el0 & PMCRN_MASK) >> PMCRN_SHIFT;
 }
 
 /* Bits allowed to be set/cleared for PMCNTEN* and PMINTEN* */
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 7628f4fa39d..a752b648568 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -79,6 +79,7 @@ static void aarch64_a57_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = 0x00011120;
 cpu->isar.id_aa64mmfr0 = 0x1124;
 cpu->isar.dbgdidr = 0x3516d000;
+cpu->isar.reset_pmcr_el0 = 0x41013000;
 cpu->clidr = 0x0a200023;
 cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
 cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
@@ -133,6 +134,7 @@ static void aarch64_a53_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = 0x00011120;
 cpu->isar.id_aa64mmfr0 = 0x1122; /* 40 bit physical addr */
 cpu->isar.dbgdidr = 0x3516d000;
+cpu->isar.reset_pmcr_el0 = 0x41033000;
 cpu->clidr = 0x0a200023;
 cpu->ccsidr[0] = 0x700fe01a; /* 32KB L1 dcache */
 cpu->ccsidr[1] = 0x201fe00a; /* 32KB L1 icache */
@@ -185,6 +187,7 @@ static void aarch64_a72_initfn(Object *obj)
 cpu->isar.id_aa64isar0 = 0x00011120;
 cpu->isar.id_aa64mmfr0 = 0x1124;
 cpu->isar.dbgdidr = 0x3516d000;
+cpu->isar.reset_pmcr_el0 = 0x41023000;
 cpu->clidr = 0x0a200023;
 cpu->ccsidr[0] = 0x701fe00a; /* 32KB L1 dcache */
 cpu->ccsidr[1] = 0x201fe012; /* 48KB L1 icache */
@@ -261,6 +264,9 @@ static void aarch64_a76_initfn(Object *obj)
 cpu->isar.mvfr0 = 0x10110222;
 cpu->isar.mvfr1 = 0x1321;
 cpu->isar.mvfr2 = 0x0043;
+
+/* From D5.1 AArch64 PMU register summary */
+cpu->isar.reset_pmcr_el0 = 0x410b3000;
 }
 
 static void aarch64_neoverse_n1_initfn(Object *obj)
@@ -327,6 +333,9 @@ static void aarch64_neoverse_n1_initfn(Object *obj)
 cpu->isar.mvfr0 = 0x10110222;
 cpu->isar.mvfr1 = 0x1321;
 cpu->isar.mvfr2 = 0x0043;
+
+/* From D5.1 AArch64 PMU register summary */
+cpu->isar.reset_pmcr_el0 = 0x410c3000;
 }
 
 void arm_cpu_sve_finalize(ARMCPU *cpu, Error **errp)
@@ -1022,6 +1031,8 @@ static void aarch64_a64fx_initfn(Object *obj)
 set_bit(1, cpu->sve_vq_supported); /* 256bit */
 set_bit(3, cpu->sve_vq_supported); /* 512bit */
 
+cpu->isar.reset_pmcr_el0 = 0x46014040;
+
 /* TODO:  Add A64FX specific HPC extension registers */
 }
 
diff --git a/target/arm/cpu_tcg.c 

[PULL 20/22] target/arm: Fix PAuth keys access checks for disabled SEL2

2022-05-19 Thread Peter Maydell
From: Florian Lugou 

As per the description of the HCR_EL2.APK field in the ARMv8 ARM,
Pointer Authentication key accesses should only be trapped to Secure
EL2 if it is enabled.

Signed-off-by: Florian Lugou 
Reviewed-by: Richard Henderson 
Message-id: 20220517145242.1215271-1-florian.lu...@provenrun.com
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index aa7a8e05721..fdd51e5e754 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6768,7 +6768,7 @@ static CPAccessResult access_pauth(CPUARMState *env, 
const ARMCPRegInfo *ri,
 int el = arm_current_el(env);
 
 if (el < 2 &&
-arm_feature(env, ARM_FEATURE_EL2) &&
+arm_is_el2_enabled(env) &&
 !(arm_hcr_el2_eff(env) & HCR_APK)) {
 return CP_ACCESS_TRAP_EL2;
 }
-- 
2.25.1




[PULL 22/22] target/arm: Use FIELD definitions for CPACR, CPTR_ELx

2022-05-19 Thread Peter Maydell
From: Richard Henderson 

We had a few CPTR_* bits defined, but missed quite a few.
Complete all of the fields up to ARMv9.2.
Use FIELD_EX64 instead of manual extract32.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
Message-id: 20220517054850.177016-3-richard.hender...@linaro.org
Signed-off-by: Peter Maydell 
---
 target/arm/cpu.h| 44 +++-
 hw/arm/boot.c   |  2 +-
 target/arm/cpu.c| 11 ++---
 target/arm/helper.c | 54 ++---
 4 files changed, 75 insertions(+), 36 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index fac526a4905..c1865ad5dad 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1261,11 +1261,45 @@ void pmu_init(ARMCPU *cpu);
 #define SCTLR_SPINTMASK (1ULL << 62) /* FEAT_NMI */
 #define SCTLR_TIDCP   (1ULL << 63) /* FEAT_TIDCP1 */
 
-#define CPTR_TCPAC    (1U << 31)
-#define CPTR_TTA      (1U << 20)
-#define CPTR_TFP      (1U << 10)
-#define CPTR_TZ       (1U << 8)   /* CPTR_EL2 */
-#define CPTR_EZ       (1U << 8)   /* CPTR_EL3 */
+/* Bit definitions for CPACR (AArch32 only) */
+FIELD(CPACR, CP10, 20, 2)
+FIELD(CPACR, CP11, 22, 2)
+FIELD(CPACR, TRCDIS, 28, 1)    /* matches CPACR_EL1.TTA */
+FIELD(CPACR, D32DIS, 30, 1)    /* up to v7; RAZ in v8 */
+FIELD(CPACR, ASEDIS, 31, 1)
+
+/* Bit definitions for CPACR_EL1 (AArch64 only) */
+FIELD(CPACR_EL1, ZEN, 16, 2)
+FIELD(CPACR_EL1, FPEN, 20, 2)
+FIELD(CPACR_EL1, SMEN, 24, 2)
+FIELD(CPACR_EL1, TTA, 28, 1)   /* matches CPACR.TRCDIS */
+
+/* Bit definitions for HCPTR (AArch32 only) */
+FIELD(HCPTR, TCP10, 10, 1)
+FIELD(HCPTR, TCP11, 11, 1)
+FIELD(HCPTR, TASE, 15, 1)
+FIELD(HCPTR, TTA, 20, 1)
+FIELD(HCPTR, TAM, 30, 1)   /* matches CPTR_EL2.TAM */
+FIELD(HCPTR, TCPAC, 31, 1) /* matches CPTR_EL2.TCPAC */
+
+/* Bit definitions for CPTR_EL2 (AArch64 only) */
+FIELD(CPTR_EL2, TZ, 8, 1)  /* !E2H */
+FIELD(CPTR_EL2, TFP, 10, 1)    /* !E2H, matches HCPTR.TCP10 */
+FIELD(CPTR_EL2, TSM, 12, 1)    /* !E2H */
+FIELD(CPTR_EL2, ZEN, 16, 2)    /* E2H */
+FIELD(CPTR_EL2, FPEN, 20, 2)   /* E2H */
+FIELD(CPTR_EL2, SMEN, 24, 2)   /* E2H */
+FIELD(CPTR_EL2, TTA, 28, 1)
+FIELD(CPTR_EL2, TAM, 30, 1)    /* matches HCPTR.TAM */
+FIELD(CPTR_EL2, TCPAC, 31, 1)  /* matches HCPTR.TCPAC */
+
+/* Bit definitions for CPTR_EL3 (AArch64 only) */
+FIELD(CPTR_EL3, EZ, 8, 1)
+FIELD(CPTR_EL3, TFP, 10, 1)
+FIELD(CPTR_EL3, ESM, 12, 1)
+FIELD(CPTR_EL3, TTA, 20, 1)
+FIELD(CPTR_EL3, TAM, 30, 1)
+FIELD(CPTR_EL3, TCPAC, 31, 1)
 
#define MDCR_EPMAD    (1U << 21)
#define MDCR_EDAD     (1U << 20)
diff --git a/hw/arm/boot.c b/hw/arm/boot.c
index a47f38dfc90..a8de33fd647 100644
--- a/hw/arm/boot.c
+++ b/hw/arm/boot.c
@@ -761,7 +761,7 @@ static void do_cpu_reset(void *opaque)
 env->cp15.scr_el3 |= SCR_ATA;
 }
 if (cpu_isar_feature(aa64_sve, cpu)) {
-env->cp15.cptr_el[3] |= CPTR_EZ;
+env->cp15.cptr_el[3] |= R_CPTR_EL3_EZ_MASK;
 }
 /* AArch64 kernels never boot in secure mode */
 assert(!info->secure_boot);
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 029f644768b..d2bd74c2ed4 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -201,9 +201,11 @@ static void arm_cpu_reset(DeviceState *dev)
 /* Trap on btype=3 for PACIxSP. */
 env->cp15.sctlr_el[1] |= SCTLR_BT0;
 /* and to the FP/Neon instructions */
-env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 2, 3);
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR_EL1, FPEN, 3);
 /* and to the SVE instructions */
-env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 16, 2, 3);
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR_EL1, ZEN, 3);
 /* with reasonable vector length */
 if (cpu_isar_feature(aa64_sve, cpu)) {
 env->vfp.zcr_el[1] =
@@ -252,7 +254,10 @@ static void arm_cpu_reset(DeviceState *dev)
 } else {
 #if defined(CONFIG_USER_ONLY)
 /* Userspace expects access to cp10 and cp11 for FP/Neon */
-env->cp15.cpacr_el1 = deposit64(env->cp15.cpacr_el1, 20, 4, 0xf);
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR, CP10, 3);
+env->cp15.cpacr_el1 = FIELD_DP64(env->cp15.cpacr_el1,
+ CPACR, CP11, 3);
 #endif
 }
 
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 7d983d7fffb..40da63913c9 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -766,11 +766,14 @@ static void cpacr_write(CPUARMState *env, const 
ARMCPRegInfo *ri,
  */
 if (cpu_isar_feature(aa32_vfp_simd, env_archcpu(env))) {
 /* VFP coprocessor: cp10 & cp11 [23:20] */
-mask |= (1 << 31) | (1 << 30) | 

[PULL 11/22] hw/intc/arm_gicv3: Use correct number of priority bits for the CPU

2022-05-19 Thread Peter Maydell
Make the GICv3 set its number of bits of physical priority from the
implementation-specific value provided in the CPU state struct, in
the same way we already do for virtual priority bits.  Because this
would be a migration compatibility break, we provide a property
force-8-bit-prio which is enabled for 7.0 and earlier versioned board
models to retain the legacy "always use 8 bits" behaviour.
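The selection logic this adds to gicv3_init_cpuif() is small; a sketch of its effect, mirroring `cs->pribits = cpu->gic_pribits ?: 5;` (where 5 is the fallback for CPU models that leave gic_pribits unset):

```python
def effective_pribits(force_8bit_prio: bool, gic_pribits: int) -> int:
    # The compat property wins; otherwise use the CPU-provided value,
    # falling back to 5 when the CPU leaves gic_pribits at 0.
    if force_8bit_prio:
        return 8
    return gic_pribits if gic_pribits else 5

assert effective_pribits(True, 5) == 8   # 7.0-and-earlier machine types
assert effective_pribits(False, 5) == 5  # CPU that declares 5 bits
assert effective_pribits(False, 0) == 5  # CPU did not set gic_pribits
```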

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-6-peter.mayd...@linaro.org
Message-id: 20220506162129.2896966-5-peter.mayd...@linaro.org
---
 include/hw/intc/arm_gicv3_common.h |  1 +
 target/arm/cpu.h   |  1 +
 hw/core/machine.c  |  4 +++-
 hw/intc/arm_gicv3_common.c |  5 +
 hw/intc/arm_gicv3_cpuif.c  | 15 +++
 target/arm/cpu64.c |  6 ++
 6 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h 
b/include/hw/intc/arm_gicv3_common.h
index 46677ec345c..ab5182a28a2 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -248,6 +248,7 @@ struct GICv3State {
 uint32_t revision;
 bool lpi_enable;
 bool security_extn;
+bool force_8bit_prio;
 bool irq_reset_nonsecure;
 bool gicd_no_migration_shift_bug;
 
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index a99b430e54e..a42464eb57a 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1002,6 +1002,7 @@ struct ArchCPU {
 int gic_num_lrs; /* number of list registers */
 int gic_vpribits; /* number of virtual priority bits */
 int gic_vprebits; /* number of virtual preemption bits */
+int gic_pribits; /* number of physical priority bits */
 
 /* Whether the cfgend input is high (i.e. this CPU should reset into
  * big-endian mode).  This setting isn't used directly: instead it modifies
diff --git a/hw/core/machine.c b/hw/core/machine.c
index b03d9192baf..bb0dc8f6a93 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -41,7 +41,9 @@
 #include "hw/virtio/virtio-pci.h"
 #include "qom/object_interfaces.h"
 
-GlobalProperty hw_compat_7_0[] = {};
+GlobalProperty hw_compat_7_0[] = {
+{ "arm-gicv3-common", "force-8-bit-prio", "on" },
+};
 const size_t hw_compat_7_0_len = G_N_ELEMENTS(hw_compat_7_0);
 
 GlobalProperty hw_compat_6_2[] = {
diff --git a/hw/intc/arm_gicv3_common.c b/hw/intc/arm_gicv3_common.c
index 5634c6fc788..351843db4aa 100644
--- a/hw/intc/arm_gicv3_common.c
+++ b/hw/intc/arm_gicv3_common.c
@@ -563,6 +563,11 @@ static Property arm_gicv3_common_properties[] = {
 DEFINE_PROP_UINT32("revision", GICv3State, revision, 3),
 DEFINE_PROP_BOOL("has-lpi", GICv3State, lpi_enable, 0),
 DEFINE_PROP_BOOL("has-security-extensions", GICv3State, security_extn, 0),
+/*
+ * Compatibility property: force 8 bits of physical priority, even
+ * if the CPU being emulated should have fewer.
+ */
+DEFINE_PROP_BOOL("force-8-bit-prio", GICv3State, force_8bit_prio, 0),
 DEFINE_PROP_ARRAY("redist-region-count", GICv3State, nb_redist_regions,
   redist_region_count, qdev_prop_uint32, uint32_t),
 DEFINE_PROP_LINK("sysmem", GICv3State, dma, TYPE_MEMORY_REGION,
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 69a15f7a444..66e06b787c7 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -2798,6 +2798,7 @@ void gicv3_init_cpuif(GICv3State *s)
  *  cpu->gic_num_lrs
  *  cpu->gic_vpribits
  *  cpu->gic_vprebits
+ *  cpu->gic_pribits
  */
 
 /* Note that we can't just use the GICv3CPUState as an opaque pointer
@@ -2810,11 +2811,17 @@ void gicv3_init_cpuif(GICv3State *s)
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
 
 /*
- * For the moment, retain the existing behaviour of 8 priority bits;
- * in a following commit we will take this from the CPU state,
- * as we do for the virtual priority bits.
+ * The CPU implementation specifies the number of supported
+ * bits of physical priority. For backwards compatibility
+ * of migration, we have a compat property that forces use
+ * of 8 priority bits regardless of what the CPU really has.
  */
-cs->pribits = 8;
+if (s->force_8bit_prio) {
+cs->pribits = 8;
+} else {
+cs->pribits = cpu->gic_pribits ?: 5;
+}
+
 /*
  * The GICv3 has separate ID register fields for virtual priority
  * and preemption bit values, but only a single ID register field
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 804a54922cb..7628f4fa39d 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -87,6 +87,7 @@ static void aarch64_a57_initfn(Object *obj)
 cpu->gic_num_lrs = 4;
 cpu->gic_vpribits = 5;
 cpu->gic_vprebits = 5;
+cpu->gic_pribits = 5;
 

[PULL 17/22] hw/arm/virt: Fix incorrect non-secure flash dtb node name

2022-05-19 Thread Peter Maydell
In the virt board with secure=on we put two nodes in the dtb
for flash devices: one for the secure-only flash, and one
for the non-secure flash. We get the reg properties for these
correct, but in the DT node name, which by convention includes
the base address of devices, we used the wrong address. Fix it.
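The device-tree convention is that a node's unit-address (the part after '@') matches the base address in its first reg entry. A sketch of the corrected naming (the 64 MiB flash size used in the example is only an illustrative assumption):

```python
def flash_node_names(flashbase: int, flashsize: int) -> list:
    # The secure flash sits at flashbase; the non-secure bank starts
    # immediately above it, so its node name must use that address,
    # not flashbase again.
    return ["/secflash@%x" % flashbase,
            "/flash@%x" % (flashbase + flashsize)]

assert flash_node_names(0x0, 0x4000000) == ["/secflash@0", "/flash@4000000"]
```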

Spotted by dtc, which will complain
Warning (unique_unit_address): /flash@0: duplicate unit-address (also used in 
node /secflash@0)
if you dump the dtb from QEMU with -machine dumpdtb=file.dtb
and then decompile it with dtc.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220513131316.4081539-2-peter.mayd...@linaro.org
---
 hw/arm/virt.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 1a45f44435e..587e885a98c 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -1195,7 +1195,7 @@ static void virt_flash_fdt(VirtMachineState *vms,
 qemu_fdt_setprop_string(ms->fdt, nodename, "secure-status", "okay");
 g_free(nodename);
 
-nodename = g_strdup_printf("/flash@%" PRIx64, flashbase);
+nodename = g_strdup_printf("/flash@%" PRIx64, flashbase + flashsize);
 qemu_fdt_add_subnode(ms->fdt, nodename);
 qemu_fdt_setprop_string(ms->fdt, nodename, "compatible", "cfi-flash");
 qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
-- 
2.25.1




[PULL 12/22] hw/intc/arm_gicv3: Provide ich_num_aprs()

2022-05-19 Thread Peter Maydell
We previously open-coded the expression for the number of virtual APR
registers and the assertion that it was not going to cause us to
overflow the cs->ich_apr[] array.  Factor this out into a new
ich_num_aprs() function, for consistency with the icc_num_aprs()
function we just added for the physical APR handling.
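For reference, the factored-out computation is just a power of two in the virtual preemption bits; a sketch:

```python
def ich_num_aprs(vprebits: int) -> int:
    # Number of virtual Active Priority Registers: 1 << (vprebits - 5).
    aprmax = 1 << (vprebits - 5)
    assert aprmax in (1, 2, 4)  # matches the "(1, 2, or 4)" comment
    return aprmax

assert ich_num_aprs(5) == 1
assert ich_num_aprs(6) == 2
assert ich_num_aprs(7) == 4
```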

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-7-peter.mayd...@linaro.org
Message-id: 20220506162129.2896966-6-peter.mayd...@linaro.org
---
 hw/intc/arm_gicv3_cpuif.c | 16 ++--
 1 file changed, 10 insertions(+), 6 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 66e06b787c7..8867e2e496f 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -49,6 +49,14 @@ static inline int icv_min_vbpr(GICv3CPUState *cs)
 return 7 - cs->vprebits;
 }
 
+static inline int ich_num_aprs(GICv3CPUState *cs)
+{
+/* Return the number of virtual APR registers (1, 2, or 4) */
+int aprmax = 1 << (cs->vprebits - 5);
+assert(aprmax <= ARRAY_SIZE(cs->ich_apr[0]));
+return aprmax;
+}
+
 /* Simple accessor functions for LR fields */
 static uint32_t ich_lr_vintid(uint64_t lr)
 {
@@ -145,9 +153,7 @@ static int ich_highest_active_virt_prio(GICv3CPUState *cs)
  * in the ICH Active Priority Registers.
  */
 int i;
-int aprmax = 1 << (cs->vprebits - 5);
-
-assert(aprmax <= ARRAY_SIZE(cs->ich_apr[0]));
+int aprmax = ich_num_aprs(cs);
 
 for (i = 0; i < aprmax; i++) {
 uint32_t apr = cs->ich_apr[GICV3_G0][i] |
@@ -1333,9 +1339,7 @@ static int icv_drop_prio(GICv3CPUState *cs)
  * 32 bits are actually relevant.
  */
 int i;
-int aprmax = 1 << (cs->vprebits - 5);
-
-assert(aprmax <= ARRAY_SIZE(cs->ich_apr[0]));
+int aprmax = ich_num_aprs(cs);
 
 for (i = 0; i < aprmax; i++) {
uint64_t *papr0 = &cs->ich_apr[GICV3_G0][i];
-- 
2.25.1




[PULL 15/22] target/arm/helper.c: Delete stray obsolete comment

2022-05-19 Thread Peter Maydell
In commit 88ce6c6ee85d we switched from directly fishing the number
of breakpoints and watchpoints out of the ID register fields to
abstracting out functions to do this job, but we forgot to delete the
now-obsolete comment in define_debug_regs() about the relation
between the ID field value and the actual number of breakpoints and
watchpoints.  Delete the obsolete comment.

Reported-by: CHRIS HOWARD 
Signed-off-by: Peter Maydell 
Reviewed-by: Alex Bennée 
Reviewed-by: Richard Henderson 
Message-id: 20220513131801.4082712-1-peter.mayd...@linaro.org
---
 target/arm/helper.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 91f78c91cea..d4db21dc92c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6540,7 +6540,6 @@ static void define_debug_regs(ARMCPU *cpu)
 define_one_arm_cp_reg(cpu, );
 }
 
-/* Note that all these register fields hold "number of Xs minus 1". */
 brps = arm_num_brps(cpu);
 wrps = arm_num_wrps(cpu);
 ctx_cmps = arm_num_ctx_cmps(cpu);
-- 
2.25.1




[PULL 13/22] Fix aarch64 debug register names.

2022-05-19 Thread Peter Maydell
From: Chris Howard 

Give all the debug registers their correct names including the
index, rather than having multiple registers all with the
same name string, which is confusing when viewed over the
gdbstub interface.

Signed-off-by: CHRIS HOWARD 
Reviewed-by: Richard Henderson 
Message-id: 4127d8ca-d54a-47c7-a039-0db7361e3...@web.de
[PMM: expanded commit message]
Signed-off-by: Peter Maydell 
---
 target/arm/helper.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 073d6509c8c..91f78c91cea 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6554,14 +6554,16 @@ static void define_debug_regs(ARMCPU *cpu)
 }
 
 for (i = 0; i < brps; i++) {
+char *dbgbvr_el1_name = g_strdup_printf("DBGBVR%d_EL1", i);
+char *dbgbcr_el1_name = g_strdup_printf("DBGBCR%d_EL1", i);
 ARMCPRegInfo dbgregs[] = {
-{ .name = "DBGBVR", .state = ARM_CP_STATE_BOTH,
+{ .name = dbgbvr_el1_name, .state = ARM_CP_STATE_BOTH,
   .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 4,
   .access = PL1_RW, .accessfn = access_tda,
   .fieldoffset = offsetof(CPUARMState, cp15.dbgbvr[i]),
   .writefn = dbgbvr_write, .raw_writefn = raw_write
 },
-{ .name = "DBGBCR", .state = ARM_CP_STATE_BOTH,
+{ .name = dbgbcr_el1_name, .state = ARM_CP_STATE_BOTH,
   .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 5,
   .access = PL1_RW, .accessfn = access_tda,
   .fieldoffset = offsetof(CPUARMState, cp15.dbgbcr[i]),
@@ -6569,17 +6571,21 @@ static void define_debug_regs(ARMCPU *cpu)
 },
 };
 define_arm_cp_regs(cpu, dbgregs);
+g_free(dbgbvr_el1_name);
+g_free(dbgbcr_el1_name);
 }
 
 for (i = 0; i < wrps; i++) {
+char *dbgwvr_el1_name = g_strdup_printf("DBGWVR%d_EL1", i);
+char *dbgwcr_el1_name = g_strdup_printf("DBGWCR%d_EL1", i);
 ARMCPRegInfo dbgregs[] = {
-{ .name = "DBGWVR", .state = ARM_CP_STATE_BOTH,
+{ .name = dbgwvr_el1_name, .state = ARM_CP_STATE_BOTH,
   .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 6,
   .access = PL1_RW, .accessfn = access_tda,
   .fieldoffset = offsetof(CPUARMState, cp15.dbgwvr[i]),
   .writefn = dbgwvr_write, .raw_writefn = raw_write
 },
-{ .name = "DBGWCR", .state = ARM_CP_STATE_BOTH,
+{ .name = dbgwcr_el1_name, .state = ARM_CP_STATE_BOTH,
   .cp = 14, .opc0 = 2, .opc1 = 0, .crn = 0, .crm = i, .opc2 = 7,
   .access = PL1_RW, .accessfn = access_tda,
   .fieldoffset = offsetof(CPUARMState, cp15.dbgwcr[i]),
@@ -6587,6 +6593,8 @@ static void define_debug_regs(ARMCPU *cpu)
 },
 };
 define_arm_cp_regs(cpu, dbgregs);
+g_free(dbgwvr_el1_name);
+g_free(dbgwcr_el1_name);
 }
 }
 
-- 
2.25.1




[PULL 10/22] hw/intc/arm_gicv3: Support configurable number of physical priority bits

2022-05-19 Thread Peter Maydell
The GICv3 code has always supported a configurable number of virtual
priority and preemption bits, but our implementation currently
hardcodes the number of physical priority bits at 8.  This is not
what most hardware implementations provide; for instance the
Cortex-A53 provides only 5 bits of physical priority.

Make the number of physical priority/preemption bits driven by fields
in the GICv3CPUState, the way that we already do for virtual
priority/preemption bits.  We set cs->pribits to 8, so there is no
behavioural change in this commit.  A following commit will add the
machinery for CPUs to set this to the correct value for their
implementation.

Note that changing the number of priority bits would be a migration
compatibility break, because the semantics of the icc_apr[][] array
changes.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-5-peter.mayd...@linaro.org
Message-id: 20220506162129.2896966-4-peter.mayd...@linaro.org
---
 include/hw/intc/arm_gicv3_common.h |   7 +-
 hw/intc/arm_gicv3_cpuif.c  | 182 -
 2 files changed, 130 insertions(+), 59 deletions(-)

diff --git a/include/hw/intc/arm_gicv3_common.h b/include/hw/intc/arm_gicv3_common.h
index 4e416100559..46677ec345c 100644
--- a/include/hw/intc/arm_gicv3_common.h
+++ b/include/hw/intc/arm_gicv3_common.h
@@ -51,11 +51,6 @@
 /* Maximum number of list registers (architectural limit) */
 #define GICV3_LR_MAX 16
 
-/* Minimum BPR for Secure, or when security not enabled */
-#define GIC_MIN_BPR 0
-/* Minimum BPR for Nonsecure when security is enabled */
-#define GIC_MIN_BPR_NS (GIC_MIN_BPR + 1)
-
 /* For some distributor fields we want to model the array of 32-bit
  * register values which hold various bitmaps corresponding to enabled,
  * pending, etc bits. These macros and functions facilitate that; the
@@ -206,6 +201,8 @@ struct GICv3CPUState {
 int num_list_regs;
 int vpribits; /* number of virtual priority bits */
 int vprebits; /* number of virtual preemption bits */
+int pribits; /* number of physical priority bits */
+int prebits; /* number of physical preemption bits */
 
 /* Current highest priority pending interrupt for this CPU.
  * This is cached information that can be recalculated from the
diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index ebf269b73a4..69a15f7a444 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -787,6 +787,36 @@ static uint64_t icv_iar_read(CPUARMState *env, const ARMCPRegInfo *ri)
 return intid;
 }
 
+static uint32_t icc_fullprio_mask(GICv3CPUState *cs)
+{
+/*
+ * Return a mask word which clears the unimplemented priority bits
+ * from a priority value for a physical interrupt. (Not to be confused
+ * with the group priority, whose mask depends on the value of BPR
+ * for the interrupt group.)
+ */
+return ~0U << (8 - cs->pribits);
+}
+
+static inline int icc_min_bpr(GICv3CPUState *cs)
+{
+/* The minimum BPR for the physical interface. */
+return 7 - cs->prebits;
+}
+
+static inline int icc_min_bpr_ns(GICv3CPUState *cs)
+{
+return icc_min_bpr(cs) + 1;
+}
+
+static inline int icc_num_aprs(GICv3CPUState *cs)
+{
+/* Return the number of APR registers (1, 2, or 4) */
+int aprmax = 1 << MAX(cs->prebits - 5, 0);
+assert(aprmax <= ARRAY_SIZE(cs->icc_apr[0]));
+return aprmax;
+}
+
 static int icc_highest_active_prio(GICv3CPUState *cs)
 {
 /* Calculate the current running priority based on the set bits
@@ -794,14 +824,14 @@ static int icc_highest_active_prio(GICv3CPUState *cs)
  */
 int i;
 
-for (i = 0; i < ARRAY_SIZE(cs->icc_apr[0]); i++) {
+for (i = 0; i < icc_num_aprs(cs); i++) {
 uint32_t apr = cs->icc_apr[GICV3_G0][i] |
 cs->icc_apr[GICV3_G1][i] | cs->icc_apr[GICV3_G1NS][i];
 
 if (!apr) {
 continue;
 }
-return (i * 32 + ctz32(apr)) << (GIC_MIN_BPR + 1);
+return (i * 32 + ctz32(apr)) << (icc_min_bpr(cs) + 1);
 }
 /* No current active interrupts: return idle priority */
 return 0xff;
@@ -980,7 +1010,7 @@ static void icc_pmr_write(CPUARMState *env, const ARMCPRegInfo *ri,
 
 trace_gicv3_icc_pmr_write(gicv3_redist_affid(cs), value);
 
-value &= 0xff;
+value &= icc_fullprio_mask(cs);
 
 if (arm_feature(env, ARM_FEATURE_EL3) && !arm_is_secure(env) &&
 (env->cp15.scr_el3 & SCR_FIQ)) {
@@ -1004,7 +1034,7 @@ static void icc_activate_irq(GICv3CPUState *cs, int irq)
  */
 uint32_t mask = icc_gprio_mask(cs, cs->hppi.grp);
 int prio = cs->hppi.prio & mask;
-int aprbit = prio >> 1;
+int aprbit = prio >> (8 - cs->prebits);
 int regno = aprbit / 32;
 int regbit = aprbit % 32;
 
@@ -1162,7 +1192,7 @@ static void icc_drop_prio(GICv3CPUState *cs, int grp)
  */
 int i;
 
-for (i = 0; i < ARRAY_SIZE(cs->icc_apr[grp]); i++) {
for (i = 0; i < icc_num_aprs(cs); i++) {

[PULL 08/22] hw/intc/arm_gicv3: report correct PRIbits field in ICV_CTLR_EL1

2022-05-19 Thread Peter Maydell
As noted in the comment, the PRIbits field in ICV_CTLR_EL1 is
supposed to match the ICH_VTR_EL2 PRIbits setting; that is, it is the
virtual priority bit setting, not the physical priority bit setting.
(For QEMU currently we always implement 8 bits of physical priority,
so the PRIbits field was previously 7, since it is defined to be
"priority bits - 1".)

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-3-peter.mayd...@linaro.org
Message-id: 20220506162129.2896966-2-peter.mayd...@linaro.org
---
 hw/intc/arm_gicv3_cpuif.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index df2f8583564..ebf269b73a4 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -657,7 +657,7 @@ static uint64_t icv_ctlr_read(CPUARMState *env, const ARMCPRegInfo *ri)
  * should match the ones reported in ich_vtr_read().
  */
 value = ICC_CTLR_EL1_A3V | (1 << ICC_CTLR_EL1_IDBITS_SHIFT) |
-(7 << ICC_CTLR_EL1_PRIBITS_SHIFT);
+((cs->vpribits - 1) << ICC_CTLR_EL1_PRIBITS_SHIFT);
 
 if (cs->ich_vmcr_el2 & ICH_VMCR_EL2_VEOIM) {
 value |= ICC_CTLR_EL1_EOIMODE;
-- 
2.25.1




[PULL 09/22] hw/intc/arm_gicv3_kvm.c: Stop using GIC_MIN_BPR constant

2022-05-19 Thread Peter Maydell
The GIC_MIN_BPR constant defines the minimum BPR value that the TCG
emulated GICv3 supports.  We're currently using this also as the
value we reset the KVM GICv3 ICC_BPR registers to, but this is only
right by accident.

We want to make the emulated GICv3 use a configurable number of
priority bits, which means that GIC_MIN_BPR will no longer be a
constant.  Replace the uses in the KVM reset code with literal 0,
plus a comment explaining why this is reasonable.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-4-peter.mayd...@linaro.org
Message-id: 20220506162129.2896966-3-peter.mayd...@linaro.org
---
 hw/intc/arm_gicv3_kvm.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/hw/intc/arm_gicv3_kvm.c b/hw/intc/arm_gicv3_kvm.c
index 2922c516e56..3ca643ecba4 100644
--- a/hw/intc/arm_gicv3_kvm.c
+++ b/hw/intc/arm_gicv3_kvm.c
@@ -673,9 +673,19 @@ static void arm_gicv3_icc_reset(CPUARMState *env, const ARMCPRegInfo *ri)
 s = c->gic;
 
 c->icc_pmr_el1 = 0;
-c->icc_bpr[GICV3_G0] = GIC_MIN_BPR;
-c->icc_bpr[GICV3_G1] = GIC_MIN_BPR;
-c->icc_bpr[GICV3_G1NS] = GIC_MIN_BPR;
+/*
+ * Architecturally the reset value of the ICC_BPR registers
+ * is UNKNOWN. We set them all to 0 here; when the kernel
+ * uses these values to program the ICH_VMCR_EL2 fields that
+ * determine the guest-visible ICC_BPR register values, the
+ * hardware's "writing a value less than the minimum sets
+ * the field to the minimum value" behaviour will result in
+ * them effectively resetting to the correct minimum value
+ * for the host GIC.
+ */
+c->icc_bpr[GICV3_G0] = 0;
+c->icc_bpr[GICV3_G1] = 0;
+c->icc_bpr[GICV3_G1NS] = 0;
 
 c->icc_sre_el1 = 0x7;
 memset(c->icc_apr, 0, sizeof(c->icc_apr));
-- 
2.25.1




[PULL 19/22] ptimer: Rename PTIMER_POLICY_DEFAULT to PTIMER_POLICY_LEGACY

2022-05-19 Thread Peter Maydell
The traditional ptimer behaviour includes a collection of weird edge
case behaviours.  In 2016 we improved the ptimer implementation to
fix these and generally make the behaviour more flexible, with
ptimers opting in to the new behaviour by passing an appropriate set
of policy flags to ptimer_init().  For backwards-compatibility, we
defined PTIMER_POLICY_DEFAULT (which sets no flags) to give the old
weird behaviour.

This turns out to be a poor choice of name, because people writing
new devices which use ptimers are misled into thinking that the
default is probably a sensible choice of flags, when in fact it is
almost always not what you want.  Rename PTIMER_POLICY_DEFAULT to
PTIMER_POLICY_LEGACY and beef up the comment to more clearly say that
new devices should not be using it.

The code-change part of this commit was produced by
  sed -i -e 's/PTIMER_POLICY_DEFAULT/PTIMER_POLICY_LEGACY/g' $(git grep -l PTIMER_POLICY_DEFAULT)
with the exception of a test name string change in
tests/unit/ptimer-test.c which was added manually.

Signed-off-by: Peter Maydell 
Reviewed-by: Francisco Iglesias 
Reviewed-by: Richard Henderson 
Message-id: 20220516103058.162280-1-peter.mayd...@linaro.org
---
 include/hw/ptimer.h  | 16 
 hw/arm/musicpal.c|  2 +-
 hw/dma/xilinx_axidma.c   |  2 +-
 hw/dma/xlnx_csu_dma.c|  2 +-
 hw/m68k/mcf5206.c|  2 +-
 hw/m68k/mcf5208.c|  2 +-
 hw/net/can/xlnx-zynqmp-can.c |  2 +-
 hw/net/fsl_etsec/etsec.c |  2 +-
 hw/net/lan9118.c |  2 +-
 hw/rtc/exynos4210_rtc.c  |  4 ++--
 hw/timer/allwinner-a10-pit.c |  2 +-
 hw/timer/altera_timer.c  |  2 +-
 hw/timer/arm_timer.c |  2 +-
 hw/timer/digic-timer.c   |  2 +-
 hw/timer/etraxfs_timer.c |  6 +++---
 hw/timer/exynos4210_mct.c|  6 +++---
 hw/timer/exynos4210_pwm.c|  2 +-
 hw/timer/grlib_gptimer.c |  2 +-
 hw/timer/imx_epit.c  |  4 ++--
 hw/timer/imx_gpt.c   |  2 +-
 hw/timer/mss-timer.c |  2 +-
 hw/timer/sh_timer.c  |  2 +-
 hw/timer/slavio_timer.c  |  2 +-
 hw/timer/xilinx_timer.c  |  2 +-
 tests/unit/ptimer-test.c |  6 +++---
 25 files changed, 44 insertions(+), 36 deletions(-)

diff --git a/include/hw/ptimer.h b/include/hw/ptimer.h
index c443218475b..4dc02b0de47 100644
--- a/include/hw/ptimer.h
+++ b/include/hw/ptimer.h
@@ -33,9 +33,17 @@
  * to stderr when the guest attempts to enable the timer.
  */
 
-/* The default ptimer policy retains backward compatibility with the legacy
- * timers. Custom policies are adjusting the default one. Consider providing
- * a correct policy for your timer.
+/*
+ * The 'legacy' ptimer policy retains backward compatibility with the
+ * traditional ptimer behaviour from before policy flags were introduced.
+ * It has several weird behaviours which don't match typical hardware
+ * timer behaviour. For a new device using ptimers, you should not
+ * use PTIMER_POLICY_LEGACY, but instead check the actual behaviour
+ * that you need and specify the right set of policy flags to get that.
+ *
+ * If you are overhauling an existing device that uses PTIMER_POLICY_LEGACY
+ * and are in a position to check or test the real hardware behaviour,
+ * consider updating it to specify the right policy flags.
  *
  * The rough edges of the default policy:
  *  - Starting to run with a period = 0 emits error message and stops the
@@ -54,7 +62,7 @@
  *since the last period, effectively restarting the timer with a
  *counter = counter value at the moment of change (.i.e. one less).
  */
-#define PTIMER_POLICY_DEFAULT   0
+#define PTIMER_POLICY_LEGACY0
 
 /* Periodic timer counter stays with "0" for a one period before wrapping
  * around.  */
diff --git a/hw/arm/musicpal.c b/hw/arm/musicpal.c
index 7c840fb4283..b65c020115a 100644
--- a/hw/arm/musicpal.c
+++ b/hw/arm/musicpal.c
@@ -464,7 +464,7 @@ static void mv88w8618_timer_init(SysBusDevice *dev, mv88w8618_timer_state *s,
 sysbus_init_irq(dev, &s->irq);
 s->freq = freq;
 
-s->ptimer = ptimer_init(mv88w8618_timer_tick, s, PTIMER_POLICY_DEFAULT);
+s->ptimer = ptimer_init(mv88w8618_timer_tick, s, PTIMER_POLICY_LEGACY);
 }
 
 static uint64_t mv88w8618_pit_read(void *opaque, hwaddr offset,
diff --git a/hw/dma/xilinx_axidma.c b/hw/dma/xilinx_axidma.c
index bc383f53cca..cbb8f0f1696 100644
--- a/hw/dma/xilinx_axidma.c
+++ b/hw/dma/xilinx_axidma.c
@@ -552,7 +552,7 @@ static void xilinx_axidma_realize(DeviceState *dev, Error **errp)
 
 st->dma = s;
 st->nr = i;
-st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_DEFAULT);
+st->ptimer = ptimer_init(timer_hit, st, PTIMER_POLICY_LEGACY);
 ptimer_transaction_begin(st->ptimer);
 ptimer_set_freq(st->ptimer, s->freqhz);
 ptimer_transaction_commit(st->ptimer);
diff --git a/hw/dma/xlnx_csu_dma.c b/hw/dma/xlnx_csu_dma.c
index 60ada3286b4..1ce52ea5a2b 100644
--- 

[PULL 04/22] target/arm: Enable FEAT_S2FWB for -cpu max

2022-05-19 Thread Peter Maydell
Enable the FEAT_S2FWB for -cpu max. Since FEAT_S2FWB requires that
CLIDR_EL1.{LoUU,LoUIS} are zero, we explicitly squash these (the
inherited CLIDR_EL1 value from the Cortex-A57 has them as 1).

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220505183950.2781801-5-peter.mayd...@linaro.org
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpu64.c| 11 +++
 2 files changed, 12 insertions(+)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 8ed466bf68e..8f25502ced7 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -52,6 +52,7 @@ the following architecture extensions:
 - FEAT_RAS (Reliability, availability, and serviceability)
 - FEAT_RDM (Advanced SIMD rounding double multiply accumulate instructions)
 - FEAT_RNG (Random number generator)
+- FEAT_S2FWB (Stage 2 forced Write-Back)
 - FEAT_SB (Speculation Barrier)
 - FEAT_SEL2 (Secure EL2)
 - FEAT_SHA1 (SHA1 instructions)
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 04427e073f1..e83c013e1fe 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -812,6 +812,7 @@ static void aarch64_max_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
 uint64_t t;
+uint32_t u;
 
 if (kvm_enabled() || hvf_enabled()) {
 /* With KVM or HVF, '-cpu max' is identical to '-cpu host' */
@@ -842,6 +843,15 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, MIDR_EL1, REVISION, 0);
 cpu->midr = t;
 
+/*
+ * We're going to set FEAT_S2FWB, which mandates that CLIDR_EL1.{LoUU,LoUIS}
+ * are zero.
+ */
+u = cpu->clidr;
+u = FIELD_DP32(u, CLIDR_EL1, LOUIS, 0);
+u = FIELD_DP32(u, CLIDR_EL1, LOUU, 0);
+cpu->clidr = u;
+
 t = cpu->isar.id_aa64isar0;
 t = FIELD_DP64(t, ID_AA64ISAR0, AES, 2);  /* FEAT_PMULL */
 t = FIELD_DP64(t, ID_AA64ISAR0, SHA1, 1); /* FEAT_SHA1 */
@@ -918,6 +928,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR2, IESB, 1); /* FEAT_IESB */
 t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1);  /* FEAT_LVA */
 t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1);   /* FEAT_TTST */
+t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1);  /* FEAT_S2FWB */
 t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1);  /* FEAT_TTL */
 t = FIELD_DP64(t, ID_AA64MMFR2, BBM, 2);  /* FEAT_BBM at level 2 */
 cpu->isar.id_aa64mmfr2 = t;
-- 
2.25.1




[PULL 14/22] hw/adc/zynq-xadc: Use qemu_irq typedef

2022-05-19 Thread Peter Maydell
From: Philippe Mathieu-Daudé 

Except hw/core/irq.c which implements the forward-declared opaque
qemu_irq structure, hw/adc/zynq-xadc.{c,h} are the only files not
using the typedef. Fix this single exception.

Signed-off-by: Philippe Mathieu-Daudé 
Reviewed-by: Bernhard Beschow 
Message-id: 20220509202035.50335-1-philippe.mathieu.da...@gmail.com
Signed-off-by: Peter Maydell 
---
 include/hw/adc/zynq-xadc.h | 3 +--
 hw/adc/zynq-xadc.c | 4 ++--
 2 files changed, 3 insertions(+), 4 deletions(-)

diff --git a/include/hw/adc/zynq-xadc.h b/include/hw/adc/zynq-xadc.h
index 2017b7a8037..c10cc4c379c 100644
--- a/include/hw/adc/zynq-xadc.h
+++ b/include/hw/adc/zynq-xadc.h
@@ -39,8 +39,7 @@ struct ZynqXADCState {
 uint16_t xadc_dfifo[ZYNQ_XADC_FIFO_DEPTH];
 uint16_t xadc_dfifo_entries;
 
-struct IRQState *qemu_irq;
-
+qemu_irq irq;
 };
 
 #endif /* ZYNQ_XADC_H */
diff --git a/hw/adc/zynq-xadc.c b/hw/adc/zynq-xadc.c
index cfc7bab0651..032e19cbd0a 100644
--- a/hw/adc/zynq-xadc.c
+++ b/hw/adc/zynq-xadc.c
@@ -86,7 +86,7 @@ static void zynq_xadc_update_ints(ZynqXADCState *s)
 s->regs[INT_STS] |= INT_DFIFO_GTH;
 }
 
-qemu_set_irq(s->qemu_irq, !!(s->regs[INT_STS] & ~s->regs[INT_MASK]));
+qemu_set_irq(s->irq, !!(s->regs[INT_STS] & ~s->regs[INT_MASK]));
 }
 
 static void zynq_xadc_reset(DeviceState *d)
@@ -262,7 +262,7 @@ static void zynq_xadc_init(Object *obj)
 memory_region_init_io(&s->iomem, obj, &xadc_ops, s, "zynq-xadc",
   ZYNQ_XADC_MMIO_SIZE);
 sysbus_init_mmio(sbd, &s->iomem);
-sysbus_init_irq(sbd, &s->qemu_irq);
+sysbus_init_irq(sbd, &s->irq);
 }
 
 static const VMStateDescription vmstate_zynq_xadc = {
-- 
2.25.1




[PULL 06/22] target/arm: Drop unsupported_encoding() macro

2022-05-19 Thread Peter Maydell
The unsupported_encoding() macro logs a LOG_UNIMP message and then
generates code to raise the usual exception for an unallocated
encoding.  Back when we were still implementing the A64 decoder this
was helpful for flagging up when guest code was using something we
hadn't yet implemented.  Now we completely cover the A64 instruction
set it is barely used.  The only remaining uses are for five
instructions whose semantics are "UNDEF, unless being run under
external halting debug":
 * HLT (when not being used for semihosting)
 * DCPS1, DCPS2, DCPS3
 * DRPS

QEMU doesn't implement external halting debug, so for us the UNDEF is
the architecturally correct behaviour (because it's not possible to
execute these instructions with halting debug enabled).  The
LOG_UNIMP doesn't serve a useful purpose; replace these uses of
unsupported_encoding() with unallocated_encoding(), and delete the
macro.

Signed-off-by: Peter Maydell 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Message-id: 20220509160443.3561604-1-peter.mayd...@linaro.org
---
 target/arm/translate-a64.h | 9 -
 target/arm/translate-a64.c | 8 
 2 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/target/arm/translate-a64.h b/target/arm/translate-a64.h
index 38884158aab..f2e8ee0ee1f 100644
--- a/target/arm/translate-a64.h
+++ b/target/arm/translate-a64.h
@@ -18,15 +18,6 @@
 #ifndef TARGET_ARM_TRANSLATE_A64_H
 #define TARGET_ARM_TRANSLATE_A64_H
 
-#define unsupported_encoding(s, insn)\
-do { \
-qemu_log_mask(LOG_UNIMP, \
-  "%s:%d: unsupported instruction encoding 0x%08x "  \
-  "at pc=%016" PRIx64 "\n",  \
-  __FILE__, __LINE__, insn, s->pc_curr); \
-unallocated_encoding(s); \
-} while (0)
-
 TCGv_i64 new_tmp_a64(DisasContext *s);
 TCGv_i64 new_tmp_a64_local(DisasContext *s);
 TCGv_i64 new_tmp_a64_zero(DisasContext *s);
diff --git a/target/arm/translate-a64.c b/target/arm/translate-a64.c
index 176a3c83ba2..f5025453078 100644
--- a/target/arm/translate-a64.c
+++ b/target/arm/translate-a64.c
@@ -2127,13 +2127,13 @@ static void disas_exc(DisasContext *s, uint32_t insn)
  * with our 32-bit semihosting).
  */
 if (s->current_el == 0) {
-unsupported_encoding(s, insn);
+unallocated_encoding(s);
 break;
 }
 #endif
 gen_exception_internal_insn(s, s->pc_curr, EXCP_SEMIHOST);
 } else {
-unsupported_encoding(s, insn);
+unallocated_encoding(s);
 }
 break;
 case 5:
@@ -2142,7 +2142,7 @@ static void disas_exc(DisasContext *s, uint32_t insn)
 break;
 }
 /* DCPS1, DCPS2, DCPS3 */
-unsupported_encoding(s, insn);
+unallocated_encoding(s);
 break;
 default:
 unallocated_encoding(s);
@@ -2307,7 +2307,7 @@ static void disas_uncond_b_reg(DisasContext *s, uint32_t insn)
 if (op3 != 0 || op4 != 0 || rn != 0x1f) {
 goto do_unallocated;
 } else {
-unsupported_encoding(s, insn);
+unallocated_encoding(s);
 }
 return;
 
-- 
2.25.1




[PULL 02/22] target/arm: Factor out FWB=0 specific part of combine_cacheattrs()

2022-05-19 Thread Peter Maydell
Factor out the part of combine_cacheattrs() that is specific to
handling HCR_EL2.FWB == 0.  This is the part where we combine the
memory type and cacheability attributes.

The "force Outer Shareable for Device or Normal Inner-NC Outer-NC"
logic remains in combine_cacheattrs() because it holds regardless
(this is the equivalent of the pseudocode EffectiveShareability()
function).

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220505183950.2781801-3-peter.mayd...@linaro.org
---
 target/arm/helper.c | 88 +
 1 file changed, 50 insertions(+), 38 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 93c58ad29ab..a2a96358410 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12578,6 +12578,46 @@ static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
 }
 }
 
+/*
+ * Combine the memory type and cacheability attributes of
+ * s1 and s2 for the HCR_EL2.FWB == 0 case, returning the
+ * combined attributes in MAIR_EL1 format.
+ */
+static uint8_t combined_attrs_nofwb(CPUARMState *env,
+ARMCacheAttrs s1, ARMCacheAttrs s2)
+{
+uint8_t s1lo, s2lo, s1hi, s2hi, s2_mair_attrs, ret_attrs;
+
+s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
+
+s1lo = extract32(s1.attrs, 0, 4);
+s2lo = extract32(s2_mair_attrs, 0, 4);
+s1hi = extract32(s1.attrs, 4, 4);
+s2hi = extract32(s2_mair_attrs, 4, 4);
+
+/* Combine memory type and cacheability attributes */
+if (s1hi == 0 || s2hi == 0) {
+/* Device has precedence over normal */
+if (s1lo == 0 || s2lo == 0) {
+/* nGnRnE has precedence over anything */
+ret_attrs = 0;
+} else if (s1lo == 4 || s2lo == 4) {
+/* non-Reordering has precedence over Reordering */
+ret_attrs = 4;  /* nGnRE */
+} else if (s1lo == 8 || s2lo == 8) {
+/* non-Gathering has precedence over Gathering */
+ret_attrs = 8;  /* nGRE */
+} else {
+ret_attrs = 0xc; /* GRE */
+}
+} else { /* Normal memory */
+/* Outer/inner cacheability combine independently */
+ret_attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
+  | combine_cacheattr_nibble(s1lo, s2lo);
+}
+return ret_attrs;
+}
+
 /* Combine S1 and S2 cacheability/shareability attributes, per D4.5.4
  * and CombineS1S2Desc()
  *
@@ -12588,26 +12628,17 @@ static uint8_t combine_cacheattr_nibble(uint8_t s1, uint8_t s2)
 static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 ARMCacheAttrs s1, ARMCacheAttrs s2)
 {
-uint8_t s1lo, s2lo, s1hi, s2hi;
 ARMCacheAttrs ret;
 bool tagged = false;
-uint8_t s2_mair_attrs;
 
 assert(s2.is_s2_format && !s1.is_s2_format);
 ret.is_s2_format = false;
 
-s2_mair_attrs = convert_stage2_attrs(env, s2.attrs);
-
 if (s1.attrs == 0xf0) {
 tagged = true;
 s1.attrs = 0xff;
 }
 
-s1lo = extract32(s1.attrs, 0, 4);
-s2lo = extract32(s2_mair_attrs, 0, 4);
-s1hi = extract32(s1.attrs, 4, 4);
-s2hi = extract32(s2_mair_attrs, 4, 4);
-
 /* Combine shareability attributes (table D4-43) */
 if (s1.shareability == 2 || s2.shareability == 2) {
 /* if either are outer-shareable, the result is outer-shareable */
@@ -12621,37 +12652,18 @@ static ARMCacheAttrs combine_cacheattrs(CPUARMState *env,
 }
 
 /* Combine memory type and cacheability attributes */
-if (s1hi == 0 || s2hi == 0) {
-/* Device has precedence over normal */
-if (s1lo == 0 || s2lo == 0) {
-/* nGnRnE has precedence over anything */
-ret.attrs = 0;
-} else if (s1lo == 4 || s2lo == 4) {
-/* non-Reordering has precedence over Reordering */
-ret.attrs = 4;  /* nGnRE */
-} else if (s1lo == 8 || s2lo == 8) {
-/* non-Gathering has precedence over Gathering */
-ret.attrs = 8;  /* nGRE */
-} else {
-ret.attrs = 0xc; /* GRE */
-}
+ret.attrs = combined_attrs_nofwb(env, s1, s2);
 
-/* Any location for which the resultant memory type is any
- * type of Device memory is always treated as Outer Shareable.
- */
+/*
+ * Any location for which the resultant memory type is any
+ * type of Device memory is always treated as Outer Shareable.
+ * Any location for which the resultant memory type is Normal
+ * Inner Non-cacheable, Outer Non-cacheable is always treated
+ * as Outer Shareable.
+ * TODO: FEAT_XS adds another value (0x40) also meaning iNCoNC
+ */
+if ((ret.attrs & 0xf0) == 0 || ret.attrs == 0x44) {
 ret.shareability = 2;
-} else { /* Normal memory */
-/* Outer/inner cacheability combine independently */
-ret.attrs = combine_cacheattr_nibble(s1hi, s2hi) << 4
-  | combine_cacheattr_nibble(s1lo, s2lo);

[PULL 18/22] hw/arm/virt: Drop #size-cells and #address-cells from gpio-keys dtb node

2022-05-19 Thread Peter Maydell
The virt board generates a gpio-keys node in the dtb, but it
incorrectly gives this node #size-cells and #address-cells
properties. If you dump the dtb with 'machine dumpdtb=file.dtb'
and run it through dtc, dtc will warn about this:

Warning (avoid_unnecessary_addr_size): /gpio-keys: unnecessary #address-cells/#size-cells without "ranges" or child "reg" property

Remove the bogus properties.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220513131316.4081539-3-peter.mayd...@linaro.org
---
 hw/arm/virt.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 587e885a98c..097238faa7a 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -925,8 +925,6 @@ static void create_gpio_keys(char *fdt, DeviceState *pl061_dev,
 
 qemu_fdt_add_subnode(fdt, "/gpio-keys");
 qemu_fdt_setprop_string(fdt, "/gpio-keys", "compatible", "gpio-keys");
-qemu_fdt_setprop_cell(fdt, "/gpio-keys", "#size-cells", 0);
-qemu_fdt_setprop_cell(fdt, "/gpio-keys", "#address-cells", 1);
 
 qemu_fdt_add_subnode(fdt, "/gpio-keys/poweroff");
 qemu_fdt_setprop_string(fdt, "/gpio-keys/poweroff",
-- 
2.25.1




[PULL 03/22] target/arm: Implement FEAT_S2FWB

2022-05-19 Thread Peter Maydell
Implement the handling of FEAT_S2FWB; the meat of this is in the new
combined_attrs_fwb() function which combines S1 and S2 attributes
when HCR_EL2.FWB is set.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220505183950.2781801-4-peter.mayd...@linaro.org
---
 target/arm/cpu.h|  5 +++
 target/arm/helper.c | 84 +++--
 2 files changed, 86 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 18ca61e8e25..98efc638bbc 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3941,6 +3941,11 @@ static inline bool isar_feature_aa64_st(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, ST) != 0;
 }
 
+static inline bool isar_feature_aa64_fwb(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, FWB) != 0;
+}
+
 static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index a2a96358410..073d6509c8c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5161,6 +5161,9 @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
 if (cpu_isar_feature(aa64_scxtnum, cpu)) {
 valid_mask |= HCR_ENSCXT;
 }
+if (cpu_isar_feature(aa64_fwb, cpu)) {
+valid_mask |= HCR_FWB;
+}
 }
 
 /* Clear RES0 bits.  */
@@ -5172,8 +5175,10 @@ static void do_hcr_write(CPUARMState *env, uint64_t value, uint64_t valid_mask)
  * HCR_PTW forbids certain page-table setups
  * HCR_DC disables stage1 and enables stage2 translation
  * HCR_DCT enables tagging on (disabled) stage1 translation
+ * HCR_FWB changes the interpretation of stage2 descriptor bits
  */
-if ((env->cp15.hcr_el2 ^ value) & (HCR_VM | HCR_PTW | HCR_DC | HCR_DCT)) {
+if ((env->cp15.hcr_el2 ^ value) &
+(HCR_VM | HCR_PTW | HCR_DC | HCR_DCT | HCR_FWB)) {
 tlb_flush(CPU(cpu));
 }
 env->cp15.hcr_el2 = value;
@@ -10731,9 +10736,15 @@ static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
  * attributes are therefore only Device if stage 2 specifies Device.
  * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
  * ie when cacheattrs.attrs bits [3:2] are 0b00.
+ * With HCR_EL2.FWB == 1 this is when descriptor bit [4] is 0, ie
+ * when cacheattrs.attrs bit [2] is 0.
  */
 assert(cacheattrs.is_s2_format);
-return (cacheattrs.attrs & 0xc) == 0;
+if (arm_hcr_el2_eff(env) & HCR_FWB) {
+return (cacheattrs.attrs & 0x4) == 0;
+} else {
+return (cacheattrs.attrs & 0xc) == 0;
+}
 }
 
 /* Translate a S1 pagetable walk through S2 if needed.  */
@@ -12618,6 +12629,69 @@ static uint8_t combined_attrs_nofwb(CPUARMState *env,
 return ret_attrs;
 }
 
+static uint8_t force_cacheattr_nibble_wb(uint8_t attr)
+{
+/*
+ * Given the 4 bits specifying the outer or inner cacheability
+ * in MAIR format, return a value specifying Normal Write-Back,
+ * with the allocation and transient hints taken from the input
+ * if the input specified some kind of cacheable attribute.
+ */
+if (attr == 0 || attr == 4) {
+/*
+ * 0 == an UNPREDICTABLE encoding
+ * 4 == Non-cacheable
+ * Either way, force Write-Back RW allocate non-transient
+ */
+return 0xf;
+}
+/* Change WriteThrough to WriteBack, keep allocation and transient hints */
+return attr | 4;
+}
+
+/*
+ * Combine the memory type and cacheability attributes of
+ * s1 and s2 for the HCR_EL2.FWB == 1 case, returning the
+ * combined attributes in MAIR_EL1 format.
+ */
+static uint8_t combined_attrs_fwb(CPUARMState *env,
+  ARMCacheAttrs s1, ARMCacheAttrs s2)
+{
+switch (s2.attrs) {
+case 7:
+/* Use stage 1 attributes */
+return s1.attrs;
+case 6:
+/*
+ * Force Normal Write-Back. Note that if S1 is Normal cacheable
+ * then we take the allocation hints from it; otherwise it is
+ * RW allocate, non-transient.
+ */
+if ((s1.attrs & 0xf0) == 0) {
+/* S1 is Device */
+return 0xff;
+}
+/* Need to check the Inner and Outer nibbles separately */
+return force_cacheattr_nibble_wb(s1.attrs & 0xf) |
+force_cacheattr_nibble_wb(s1.attrs >> 4) << 4;
+case 5:
+/* If S1 attrs are Device, use them; otherwise Normal Non-cacheable */
+if ((s1.attrs & 0xf0) == 0) {
+return s1.attrs;
+}
+return 0x44;
+case 0 ... 3:
+/* Force Device, of subtype specified by S2 */
+return s2.attrs << 2;
+default:
+/*
+ * RESERVED values (including RES0 descriptor bit [5] being nonzero);
+ * arbitrarily force Device.
+  

[PULL 01/22] target/arm: Postpone interpretation of stage 2 descriptor attribute bits

2022-05-19 Thread Peter Maydell
In the original Arm v8 two-stage translation, both stage 1 and stage
2 specify memory attributes (memory type, cacheability,
shareability); these are then combined to produce the overall memory
attributes for the whole stage 1+2 access.  In QEMU we implement this
by having get_phys_addr() fill in an ARMCacheAttrs struct, and we
convert both the stage 1 and stage 2 attribute bit formats to the
same encoding (an 8-bit attribute value matching the MAIR_EL1 fields,
plus a 2-bit shareability value).

The new FEAT_S2FWB feature allows the guest to enable a different
interpretation of the attribute bits in the stage 2 descriptors.
These bits can now be used to control details of how the stage 1 and
2 attributes should be combined (for instance they can say "always
use the stage 1 attributes" or "ignore the stage 1 attributes and
always be Device memory").  This means we need to pass the raw bit
information for stage 2 down to the function which combines the stage
1 and stage 2 information.

Add a field to ARMCacheAttrs that indicates whether the attrs field
should be interpreted as MAIR format, or as the raw stage 2 attribute
bits from the descriptor, and store the appropriate values when
filling in cacheattrs.

We only need to interpret the attrs field in a few places:
 * in do_ats_write(), where we know to expect a MAIR value
   (there is no ATS instruction to do a stage-2-only walk)
 * in S1_ptw_translate(), where we want to know whether the
   combined S1 + S2 attributes indicate Device memory that
   should provoke a fault
 * in combine_cacheattrs(), which does the S1 + S2 combining
Update those places accordingly.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220505183950.2781801-2-peter.mayd...@linaro.org
---
 target/arm/internals.h |  7 ++-
 target/arm/helper.c| 42 --
 2 files changed, 42 insertions(+), 7 deletions(-)

diff --git a/target/arm/internals.h b/target/arm/internals.h
index 6ca0e957468..9b354eea7e4 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -1149,8 +1149,13 @@ bool pmsav8_mpu_lookup(CPUARMState *env, uint32_t address,
 
 /* Cacheability and shareability attributes for a memory access */
 typedef struct ARMCacheAttrs {
-unsigned int attrs:8; /* as in the MAIR register encoding */
+/*
+ * If is_s2_format is true, attrs is the S2 descriptor bits [5:2]
+ * Otherwise, attrs is the same as the MAIR_EL1 8-bit format
+ */
+unsigned int attrs:8;
 unsigned int shareability:2; /* as in the SH field of the VMSAv8-64 PTEs */
+bool is_s2_format:1;
 } ARMCacheAttrs;
 
 bool get_phys_addr(CPUARMState *env, target_ulong address,
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 432bd819195..93c58ad29ab 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -3187,6 +3187,12 @@ static uint64_t do_ats_write(CPUARMState *env, uint64_t value,
 ret = get_phys_addr(env, value, access_type, mmu_idx, &phys_addr, &attrs,
 &prot, &page_size, &fi, &cacheattrs);
 
+/*
+ * ATS operations only do S1 or S1+S2 translations, so we never
+ * have to deal with the ARMCacheAttrs format for S2 only.
+ */
+assert(!cacheattrs.is_s2_format);
+
 if (ret) {
 /*
  * Some kinds of translation fault must cause exceptions rather
@@ -10717,6 +10723,19 @@ static bool get_level1_table_address(CPUARMState *env, ARMMMUIdx mmu_idx,
 return true;
 }
 
+static bool ptw_attrs_are_device(CPUARMState *env, ARMCacheAttrs cacheattrs)
+{
+/*
+ * For an S1 page table walk, the stage 1 attributes are always
+ * some form of "this is Normal memory". The combined S1+S2
+ * attributes are therefore only Device if stage 2 specifies Device.
+ * With HCR_EL2.FWB == 0 this is when descriptor bits [5:4] are 0b00,
+ * ie when cacheattrs.attrs bits [3:2] are 0b00.
+ */
+assert(cacheattrs.is_s2_format);
+return (cacheattrs.attrs & 0xc) == 0;
+}
+
 /* Translate a S1 pagetable walk through S2 if needed.  */
 static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
hwaddr addr, bool *is_secure,
@@ -10745,7 +10764,7 @@ static hwaddr S1_ptw_translate(CPUARMState *env, ARMMMUIdx mmu_idx,
 return ~0;
 }
 if ((arm_hcr_el2_eff(env) & HCR_PTW) &&
-(cacheattrs.attrs & 0xf0) == 0) {
+ptw_attrs_are_device(env, cacheattrs)) {
 /*
  * PTW set and S1 walk touched S2 Device memory:
  * generate Permission fault.
@@ -11817,12 +11836,14 @@ static bool get_phys_addr_lpae(CPUARMState *env, uint64_t address,
 }
 
 if (mmu_idx == ARMMMUIdx_Stage2 || mmu_idx == ARMMMUIdx_Stage2_S) {
-cacheattrs->attrs = convert_stage2_attrs(env, extract32(attrs, 0, 4));
+cacheattrs->is_s2_format = true;
+cacheattrs->attrs = extract32(attrs, 0, 4);
 } else {
 /* Index into MAIR registers for cache attributes 

[PULL 07/22] hw/intc/arm_gicv3_cpuif: Handle CPUs that don't specify GICv3 parameters

2022-05-19 Thread Peter Maydell
We allow a GICv3 to be connected to any CPU, but we don't do anything
to handle the case where the CPU type doesn't in hardware have a
GICv3 CPU interface and so the various GIC configuration fields
(gic_num_lrs, vprebits, vpribits) are not specified.

The current behaviour is that we will add the EL1 CPU interface
registers, but will not put in the EL2 CPU interface registers, even
if the CPU has EL2, which will leave the GIC in a broken state and
probably result in the guest crashing as it tries to set it up.  This
only affects the virt board when using the cortex-a15 or cortex-a7
CPU types (both 32-bit) with -machine gic-version=3 (or 'max')
and -machine virtualization=on.

Instead of failing to set up the EL2 registers, if the CPU doesn't
define the GIC configuration set it to a reasonable default, matching
the standard configuration for most Arm CPUs.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220512151457.3899052-2-peter.mayd...@linaro.org
---
 hw/intc/arm_gicv3_cpuif.c | 18 +-
 1 file changed, 13 insertions(+), 5 deletions(-)

diff --git a/hw/intc/arm_gicv3_cpuif.c b/hw/intc/arm_gicv3_cpuif.c
index 9efba798f82..df2f8583564 100644
--- a/hw/intc/arm_gicv3_cpuif.c
+++ b/hw/intc/arm_gicv3_cpuif.c
@@ -2755,6 +2755,15 @@ void gicv3_init_cpuif(GICv3State *s)
 ARMCPU *cpu = ARM_CPU(qemu_get_cpu(i));
GICv3CPUState *cs = &s->cpu[i];
 
+/*
+ * If the CPU doesn't define a GICv3 configuration, probably because
+ * in real hardware it doesn't have one, then we use default values
+ * matching the one used by most Arm CPUs. This applies to:
+ *  cpu->gic_num_lrs
+ *  cpu->gic_vpribits
+ *  cpu->gic_vprebits
+ */
+
 /* Note that we can't just use the GICv3CPUState as an opaque pointer
  * in define_arm_cp_regs_with_opaque(), because when we're called back
  * it might be with code translated by CPU 0 but run by CPU 1, in
@@ -2763,13 +2772,12 @@ void gicv3_init_cpuif(GICv3State *s)
  * get back to the GICv3CPUState from the CPUARMState.
  */
 define_arm_cp_regs(cpu, gicv3_cpuif_reginfo);
-if (arm_feature(&cpu->env, ARM_FEATURE_EL2)
-&& cpu->gic_num_lrs) {
+if (arm_feature(&cpu->env, ARM_FEATURE_EL2)) {
 int j;
 
-cs->num_list_regs = cpu->gic_num_lrs;
-cs->vpribits = cpu->gic_vpribits;
-cs->vprebits = cpu->gic_vprebits;
+cs->num_list_regs = cpu->gic_num_lrs ?: 4;
+cs->vpribits = cpu->gic_vpribits ?: 5;
+cs->vprebits = cpu->gic_vprebits ?: 5;
 
 /* Check against architectural constraints: getting these
  * wrong would be a bug in the CPU code defining these,
-- 
2.25.1




[PULL 05/22] target/arm: Implement FEAT_IDST

2022-05-19 Thread Peter Maydell
The Armv8.4 feature FEAT_IDST specifies that exceptions generated by
read accesses to the feature ID space should report a syndrome code
of 0x18 (EC_SYSTEMREGISTERTRAP) rather than 0x00 (EC_UNCATEGORIZED).
The feature ID space is defined to be:
 op0 == 3, op1 == {0,1,3}, CRn == 0, CRm == {0-7}, op2 == {0-7}

In our implementation we might return the EC_UNCATEGORIZED syndrome
value for a system register access in four cases:
 * no reginfo struct in the hashtable
 * cp_access_ok() fails (ie ri->access doesn't permit the access)
 * ri->accessfn returns CP_ACCESS_TRAP_UNCATEGORIZED at runtime
 * ri->type includes ARM_CP_RAISES_EXC, and the readfn raises
   an UNDEF exception at runtime

We have very few regdefs that set ARM_CP_RAISES_EXC, and none of
them are in the feature ID space. (In the unlikely event that any
are added in future they would need to take care of setting the
correct syndrome themselves.) This patch deals with the other
three cases, and enables FEAT_IDST for AArch64 -cpu max.

Signed-off-by: Peter Maydell 
Reviewed-by: Richard Henderson 
Message-id: 20220509155457.3560724-1-peter.mayd...@linaro.org
---
 docs/system/arm/emulation.rst |  1 +
 target/arm/cpregs.h   | 24 
 target/arm/cpu.h  |  5 +
 target/arm/cpu64.c|  1 +
 target/arm/op_helper.c|  9 +
 target/arm/translate-a64.c| 28 ++--
 6 files changed, 66 insertions(+), 2 deletions(-)

diff --git a/docs/system/arm/emulation.rst b/docs/system/arm/emulation.rst
index 8f25502ced7..3e95bba0d24 100644
--- a/docs/system/arm/emulation.rst
+++ b/docs/system/arm/emulation.rst
@@ -31,6 +31,7 @@ the following architecture extensions:
 - FEAT_FlagM2 (Enhancements to flag manipulation instructions)
 - FEAT_HPDS (Hierarchical permission disables)
 - FEAT_I8MM (AArch64 Int8 matrix multiplication instructions)
+- FEAT_IDST (ID space trap handling)
 - FEAT_IESB (Implicit error synchronization event)
 - FEAT_JSCVT (JavaScript conversion instructions)
 - FEAT_LOR (Limited ordering regions)
diff --git a/target/arm/cpregs.h b/target/arm/cpregs.h
index db03d6a7e13..d9b678c2f17 100644
--- a/target/arm/cpregs.h
+++ b/target/arm/cpregs.h
@@ -461,4 +461,28 @@ static inline bool cp_access_ok(int current_el,
 /* Raw read of a coprocessor register (as needed for migration, etc) */
 uint64_t read_raw_cp_reg(CPUARMState *env, const ARMCPRegInfo *ri);
 
+/*
+ * Return true if the cp register encoding is in the "feature ID space" as
+ * defined by FEAT_IDST (and thus should be reported with ESR_ELx.EC
+ * as EC_SYSTEMREGISTERTRAP rather than EC_UNCATEGORIZED).
+ */
+static inline bool arm_cpreg_encoding_in_idspace(uint8_t opc0, uint8_t opc1,
+ uint8_t opc2,
+ uint8_t crn, uint8_t crm)
+{
+return opc0 == 3 && (opc1 == 0 || opc1 == 1 || opc1 == 3) &&
+crn == 0 && crm < 8;
+}
+
+/*
+ * As arm_cpreg_encoding_in_idspace(), but take the encoding from an
+ * ARMCPRegInfo.
+ */
+static inline bool arm_cpreg_in_idspace(const ARMCPRegInfo *ri)
+{
+return ri->state == ARM_CP_STATE_AA64 &&
+arm_cpreg_encoding_in_idspace(ri->opc0, ri->opc1, ri->opc2,
+  ri->crn, ri->crm);
+}
+
 #endif /* TARGET_ARM_CPREGS_H */
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 98efc638bbc..a99b430e54e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -3946,6 +3946,11 @@ static inline bool isar_feature_aa64_fwb(const ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, FWB) != 0;
 }
 
+static inline bool isar_feature_aa64_ids(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64mmfr2, ID_AA64MMFR2, IDS) != 0;
+}
+
 static inline bool isar_feature_aa64_bti(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64pfr1, ID_AA64PFR1, BT) != 0;
diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index e83c013e1fe..804a54922cb 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -928,6 +928,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64MMFR2, IESB, 1); /* FEAT_IESB */
 t = FIELD_DP64(t, ID_AA64MMFR2, VARANGE, 1);  /* FEAT_LVA */
 t = FIELD_DP64(t, ID_AA64MMFR2, ST, 1);   /* FEAT_TTST */
+t = FIELD_DP64(t, ID_AA64MMFR2, IDS, 1);  /* FEAT_IDST */
 t = FIELD_DP64(t, ID_AA64MMFR2, FWB, 1);  /* FEAT_S2FWB */
 t = FIELD_DP64(t, ID_AA64MMFR2, TTL, 1);  /* FEAT_TTL */
 t = FIELD_DP64(t, ID_AA64MMFR2, BBM, 2);  /* FEAT_BBM at level 2 */
diff --git a/target/arm/op_helper.c b/target/arm/op_helper.c
index 390b6578a89..c4bd6688702 100644
--- a/target/arm/op_helper.c
+++ b/target/arm/op_helper.c
@@ -631,6 +631,7 @@ uint32_t HELPER(mrs_banked)(CPUARMState *env, uint32_t tgtmode, uint32_t regno)
void HELPER(access_check_cp_reg)(CPUARMState *env, void *rip, uint32_t syndrome,
  uint32_t isread)
 {
+ 

[PULL 00/22] target-arm queue

2022-05-19 Thread Peter Maydell
target-arm queue: mostly patches from me this time round.
Nothing too exciting.

-- PMM

The following changes since commit 78ac2eebbab9150edf5d0d00e3648f5ebb599001:

  Merge tag 'artist-cursor-fix-final-pull-request' of https://github.com/hdeller/qemu-hppa into staging (2022-05-18 09:32:15 -0700)

are available in the Git repository at:

  https://git.linaro.org/people/pmaydell/qemu-arm.git tags/pull-target-arm-20220519

for you to fetch changes up to fab8ad39fb75a0d9f097db67b2a33754e88e:

  target/arm: Use FIELD definitions for CPACR, CPTR_ELx (2022-05-19 18:34:10 +0100)


target-arm queue:
 * Implement FEAT_S2FWB
 * Implement FEAT_IDST
 * Drop unsupported_encoding() macro
 * hw/intc/arm_gicv3: Use correct number of priority bits for the CPU
 * Fix aarch64 debug register names
 * hw/adc/zynq-xadc: Use qemu_irq typedef
 * target/arm/helper.c: Delete stray obsolete comment
 * Make number of counters in PMCR follow the CPU
 * hw/arm/virt: Fix dtb nits
 * ptimer: Rename PTIMER_POLICY_DEFAULT to PTIMER_POLICY_LEGACY
 * target/arm: Fix PAuth keys access checks for disabled SEL2
 * Enable FEAT_HCX for -cpu max
 * Use FIELD definitions for CPACR, CPTR_ELx


Chris Howard (1):
  Fix aarch64 debug register names.

Florian Lugou (1):
  target/arm: Fix PAuth keys access checks for disabled SEL2

Peter Maydell (17):
  target/arm: Postpone interpretation of stage 2 descriptor attribute bits
  target/arm: Factor out FWB=0 specific part of combine_cacheattrs()
  target/arm: Implement FEAT_S2FWB
  target/arm: Enable FEAT_S2FWB for -cpu max
  target/arm: Implement FEAT_IDST
  target/arm: Drop unsupported_encoding() macro
  hw/intc/arm_gicv3_cpuif: Handle CPUs that don't specify GICv3 parameters
  hw/intc/arm_gicv3: report correct PRIbits field in ICV_CTLR_EL1
  hw/intc/arm_gicv3_kvm.c: Stop using GIC_MIN_BPR constant
  hw/intc/arm_gicv3: Support configurable number of physical priority bits
  hw/intc/arm_gicv3: Use correct number of priority bits for the CPU
  hw/intc/arm_gicv3: Provide ich_num_aprs()
  target/arm/helper.c: Delete stray obsolete comment
  target/arm: Make number of counters in PMCR follow the CPU
  hw/arm/virt: Fix incorrect non-secure flash dtb node name
  hw/arm/virt: Drop #size-cells and #address-cells from gpio-keys dtb node
  ptimer: Rename PTIMER_POLICY_DEFAULT to PTIMER_POLICY_LEGACY

Philippe Mathieu-Daudé (1):
  hw/adc/zynq-xadc: Use qemu_irq typedef

Richard Henderson (2):
  target/arm: Enable FEAT_HCX for -cpu max
  target/arm: Use FIELD definitions for CPACR, CPTR_ELx

 docs/system/arm/emulation.rst  |   2 +
 include/hw/adc/zynq-xadc.h |   3 +-
 include/hw/intc/arm_gicv3_common.h |   8 +-
 include/hw/ptimer.h|  16 +-
 target/arm/cpregs.h|  24 +++
 target/arm/cpu.h   |  76 +++-
 target/arm/internals.h |  11 +-
 target/arm/translate-a64.h |   9 -
 hw/adc/zynq-xadc.c |   4 +-
 hw/arm/boot.c  |   2 +-
 hw/arm/musicpal.c  |   2 +-
 hw/arm/virt.c  |   4 +-
 hw/core/machine.c  |   4 +-
 hw/dma/xilinx_axidma.c |   2 +-
 hw/dma/xlnx_csu_dma.c  |   2 +-
 hw/intc/arm_gicv3_common.c |   5 +
 hw/intc/arm_gicv3_cpuif.c  | 225 +---
 hw/intc/arm_gicv3_kvm.c|  16 +-
 hw/m68k/mcf5206.c  |   2 +-
 hw/m68k/mcf5208.c  |   2 +-
 hw/net/can/xlnx-zynqmp-can.c   |   2 +-
 hw/net/fsl_etsec/etsec.c   |   2 +-
 hw/net/lan9118.c   |   2 +-
 hw/rtc/exynos4210_rtc.c|   4 +-
 hw/timer/allwinner-a10-pit.c   |   2 +-
 hw/timer/altera_timer.c|   2 +-
 hw/timer/arm_timer.c   |   2 +-
 hw/timer/digic-timer.c |   2 +-
 hw/timer/etraxfs_timer.c   |   6 +-
 hw/timer/exynos4210_mct.c  |   6 +-
 hw/timer/exynos4210_pwm.c  |   2 +-
 hw/timer/grlib_gptimer.c   |   2 +-
 hw/timer/imx_epit.c|   4 +-
 hw/timer/imx_gpt.c |   2 +-
 hw/timer/mss-timer.c   |   2 +-
 hw/timer/sh_timer.c|   2 +-
 hw/timer/slavio_timer.c|   2 +-
 hw/timer/xilinx_timer.c|   2 +-
 target/arm/cpu.c   |  11 +-
 target/arm/cpu64.c |  30 
 target/arm/cpu_tcg.c   |   6 +
 target/arm/helper.c| 348 -
 target/arm/kvm64.c |  12 ++
 target/arm/op_helper.c |   9 +
 target/arm/translate-a64.c |  36 +++-
 tests/unit/ptimer-test.c   |   6 +-
 46 files changed, 697 insertions(+), 228 deletions(-)



Re: [PATCH] contrib/elf2dmp: add ELF dump header checking

2022-05-19 Thread Richard Henderson

On 5/19/22 09:48, Viktor Prutyanov wrote:

+if (ehdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
+eprintf("Invalid ELF class or byte order, must be 64-bit LE\n");
+return false;
+}


You could check EI_VERSION == EV_CURRENT too.
You should check e_machine == EM_X86_64.


+if (!ehdr->e_phnum) {
+eprintf("Invalid number of ELF program headers\n");
+return false;
+}


In init_states(), you appear to assume this number is exactly 1.


r~



Re: [PATCH v3 00/15] Misc cleanups

2022-05-19 Thread Marc-André Lureau
Hi

Before I send a v4 and hopefully final version, could somebody review
those patches:

- include: move qemu_*_exec_dir() to cutils
- osdep: export qemu_open_cloexec()

- qga: replace qemu_open_old() with qemu_open_cloexec()
- test/qga: use G_TEST_DIR to locate os-release test file

(Paolo sort of acked the v1, but not quite rigorously)

thanks!

On Fri, May 13, 2022 at 8:08 PM  wrote:
>
> From: Marc-André Lureau 
>
> Hi,
>
> v3:
> - changed error_report_err() back to g_critical()
> - added "qga: make build_fs_mount_list() return a bool"
> - replaced g_clear_pointer() usage by open-coded version
> - dropped needless g_autoptr(GError) in tests
> - rebased, (dropped "include: adjust header guards after renaming")
> - some commit message rewording
> - added r-b tags
>
> v2:
> - drop "compiler.h: add QEMU_{BEGIN,END}_IGNORE_INITIALIZER_OVERRIDES",
>   "qobject/json-lexer: disable -Winitializer-overrides warnings" &
>   "qapi/error: add g_autoptr(Error) support" and adjust related code.
> - add "test/qga: use g_auto wherever sensible"
> - add r-b tags
>
> Marc-André Lureau (15):
>   include: move qemu_*_exec_dir() to cutils
>   util/win32: simplify qemu_get_local_state_dir()
>   tests: make libqmp buildable for win32
>   qga: flatten safe_open_or_create()
>   osdep: export qemu_open_cloexec()
>   qga: use qemu_open_cloexec() for safe_open_or_create()
>   qga: throw an Error in ga_channel_open()
>   qga: replace qemu_open_old() with qemu_open_cloexec()
>   qga: make build_fs_mount_list() return a bool
>   test/qga: use G_TEST_DIR to locate os-release test file
>   qga/wixl: prefer variables over environment
>   qga/wixl: require Mingw_bin
>   qga/wixl: simplify some pre-processing
>   qga/wixl: replace QEMU_GA_MSI_MINGW_BIN_PATH with glib bindir
>   test/qga: use g_auto wherever sensible
>
>  configure|   9 +-
>  include/qemu/cutils.h|   7 ++
>  include/qemu/osdep.h |   9 +-
>  meson.build  |   5 +-
>  qemu-io.c|   1 +
>  qga/channel-posix.c  |  55 +
>  qga/commands-posix.c | 164 +--
>  qga/installer/qemu-ga.wxs|  83 +-
>  qga/meson.build  |  11 +-
>  storage-daemon/qemu-storage-daemon.c |   1 +
>  tests/qtest/fuzz/fuzz.c  |   1 +
>  tests/qtest/libqmp.c |  34 +-
>  tests/qtest/libqmp.h |   2 +
>  tests/unit/test-qga.c| 130 -
>  util/cutils.c| 108 ++
>  util/osdep.c |  10 +-
>  util/oslib-posix.c   |  81 -
>  util/oslib-win32.c   |  53 +
>  18 files changed, 358 insertions(+), 406 deletions(-)
>
> --
> 2.36.1
>




[PATCH] hw/pci/pcie.c: Fix invalid PCI_EXP_LNKCAP setting

2022-05-19 Thread Wenliang Wang
pcie_cap_fill_slot_lnk() wrongly sets PCI_EXP_LNKCAP when the slot speed
and width are not set, causing a strange downstream port link cap
(Speed unknown, Width x0) and PCIe native hotplug errors on Linux:

[3.545654] pcieport 0000:02:00.0: pciehp: link training error: status 0x2000
[3.547143] pcieport 0000:02:00.0: pciehp: Failed to check link status

We do not touch PCI_EXP_LNKCAP when speed=0 or width=0, as pcie_cap_v1_fill()
already does the default setting for us.

Signed-off-by: Wenliang Wang 
---
 hw/pci/pcie.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
index 68a62da..c82e7fc 100644
--- a/hw/pci/pcie.c
+++ b/hw/pci/pcie.c
@@ -92,6 +92,11 @@ static void pcie_cap_fill_slot_lnk(PCIDevice *dev)
 return;
 }
 
+/* Use default LNKCAP setting */
+if (s->speed == 0 || s->width == 0) {
+return;
+}
+
 /* Clear and fill LNKCAP from what was configured above */
 pci_long_test_and_clear_mask(exp_cap + PCI_EXP_LNKCAP,
  PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS);
-- 
2.7.4




Re: [PATCH] hw/pci/pcie.c: Fix invalid PCI_EXP_LNKCAP setting

2022-05-19 Thread Michael S. Tsirkin
On Thu, May 19, 2022 at 10:45:59PM +0800, Wenliang Wang wrote:
> pcie_cap_fill_slot_lnk() wrongly sets PCI_EXP_LNKCAP when the slot speed
> and width are not set, causing a strange downstream port link cap
> (Speed unknown, Width x0) and PCIe native hotplug errors on Linux:
> 
> [3.545654] pcieport 0000:02:00.0: pciehp: link training error: status
> 0x2000
> [3.547143] pcieport 0000:02:00.0: pciehp: Failed to check link status
> 
> We do not touch PCI_EXP_LNKCAP when speed=0 or width=0, as pcie_cap_v1_fill()
> already does the default setting for us.
> 
> Signed-off-by: Wenliang Wang 


do we need machine type compat dance with this?
can you check whether this affects cross version
migration please?

> ---
>  hw/pci/pcie.c | 5 +
>  1 file changed, 5 insertions(+)
> 
> diff --git a/hw/pci/pcie.c b/hw/pci/pcie.c
> index 68a62da..c82e7fc 100644
> --- a/hw/pci/pcie.c
> +++ b/hw/pci/pcie.c
> @@ -92,6 +92,11 @@ static void pcie_cap_fill_slot_lnk(PCIDevice *dev)
>  return;
>  }
>  
> +/* Use default LNKCAP setting */
> +if (s->speed == 0 || s->width == 0) {
> +return;
> +}
> +
>  /* Clear and fill LNKCAP from what was configured above */
>  pci_long_test_and_clear_mask(exp_cap + PCI_EXP_LNKCAP,
>   PCI_EXP_LNKCAP_MLW | PCI_EXP_LNKCAP_SLS);
> -- 
> 2.7.4




[PATCH] contrib/elf2dmp: add ELF dump header checking

2022-05-19 Thread Viktor Prutyanov
Add ELF header checking to prevent processing an input file that is not
a QEMU guest memory dump, or not even an ELF file.

Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1013

Signed-off-by: Viktor Prutyanov 
---
 contrib/elf2dmp/qemu_elf.c | 38 ++
 1 file changed, 38 insertions(+)

diff --git a/contrib/elf2dmp/qemu_elf.c b/contrib/elf2dmp/qemu_elf.c
index b601b6d7ba..941b573f17 100644
--- a/contrib/elf2dmp/qemu_elf.c
+++ b/contrib/elf2dmp/qemu_elf.c
@@ -118,6 +118,39 @@ static void exit_states(QEMU_Elf *qe)
 free(qe->state);
 }
 
+static bool check_ehdr(QEMU_Elf *qe)
+{
+Elf64_Ehdr *ehdr = qe->map;
+
+if (sizeof(Elf64_Ehdr) > qe->size) {
+eprintf("Invalid input dump file size\n");
+return false;
+}
+
+if (memcmp(ehdr->e_ident, ELFMAG, SELFMAG)) {
+eprintf("Invalid ELF signature, input file is not ELF\n");
+return false;
+}
+
+if (ehdr->e_ident[EI_CLASS] != ELFCLASS64 ||
+ehdr->e_ident[EI_DATA] != ELFDATA2LSB) {
+eprintf("Invalid ELF class or byte order, must be 64-bit LE\n");
+return false;
+}
+
+if (ehdr->e_type != ET_CORE) {
+eprintf("Invalid ELF type, must be core file\n");
+return false;
+}
+
+if (!ehdr->e_phnum) {
+eprintf("Invalid number of ELF program headers\n");
+return false;
+}
+
+return true;
+}
+
 int QEMU_Elf_init(QEMU_Elf *qe, const char *filename)
 {
 GError *gerr = NULL;
@@ -133,6 +166,11 @@ int QEMU_Elf_init(QEMU_Elf *qe, const char *filename)
 qe->map = g_mapped_file_get_contents(qe->gmf);
 qe->size = g_mapped_file_get_length(qe->gmf);
 
+if (!check_ehdr(qe)) {
+err = 1;
+goto out_unmap;
+}
+
 if (init_states(qe)) {
 eprintf("Failed to extract QEMU CPU states\n");
 err = 1;
-- 
2.35.1




Re: [PATCH v2 3/7] target/arm: Do not use aarch64_sve_zcr_get_valid_len in reset

2022-05-19 Thread Richard Henderson

On 5/19/22 03:40, Peter Maydell wrote:

Not all the code that looks at the sve vector length
goes through sve_zcr_len_for_el(), though. In particular,
this is setting up ZCR_EL1 for usermode, and all
the code under linux-user/ that wants to know the vector
length does it with "env->vfp.zcr_el[1] & 0xf".


Oops, yes.  Linux-user should be checking ZCR_LEN from env->hflags.


Incidentally, do_prctl_set_vl() also sets zcr_el[1] and
it doesn't call aarch64_sve_zcr_get_valid_len(). Should it,
or is it doing an equivalent check anyway?


I think this got missed when we introduced the set of valid lengths -- it's still assuming 
all lengths less than maximum are valid.


I'll add a couple of cleanup patches for this.

r~




[PATCH v3 3/3] ui: Remove deprecated options "-sdl" and "-curses"

2022-05-19 Thread Thomas Huth
We have "-sdl" and "-curses", but no "-gtk" and no "-cocoa" ...
these old-style options are more confusing than helpful nowadays.
Now that the deprecation period is over, let's remove them, so we
get a cleaner interface (where "-display" is the only way to select
the user interface).

Reviewed-by: Daniel P. Berrangé 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 10 --
 docs/about/removed-features.rst | 10 ++
 softmmu/vl.c| 19 ---
 qemu-options.hx | 24 ++--
 4 files changed, 12 insertions(+), 51 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index 562a133f18..e19bcba242 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -81,16 +81,6 @@ the process listing. This is replaced by the new ``password-secret``
 option which lets the password be securely provided on the command
 line using a ``secret`` object instance.
 
-``-sdl`` (since 6.2)
-
-
-Use ``-display sdl`` instead.
-
-``-curses`` (since 6.2)
-'''
-
-Use ``-display curses`` instead.
-
 ``-watchdog`` (since 6.2)
 '
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index 4c9e001c35..c7b9dadd5d 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -386,6 +386,16 @@ Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
 
 Use ``-display sdl,grab-mod=rctrl`` instead.
 
+``-sdl`` (removed in 7.1)
+'
+
+Use ``-display sdl`` instead.
+
+``-curses`` (removed in 7.1)
+
+
+Use ``-display curses`` instead.
+
 
 QEMU Machine Protocol (QMP) commands
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 484e9d9921..4c1e94b00e 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2800,16 +2800,6 @@ void qemu_init(int argc, char **argv, char **envp)
 nographic = true;
 dpy.type = DISPLAY_TYPE_NONE;
 break;
-case QEMU_OPTION_curses:
-warn_report("-curses is deprecated, "
-"use -display curses instead.");
-#ifdef CONFIG_CURSES
-dpy.type = DISPLAY_TYPE_CURSES;
-#else
-error_report("curses or iconv support is disabled");
-exit(1);
-#endif
-break;
 case QEMU_OPTION_portrait:
 graphic_rotate = 90;
 break;
@@ -3176,15 +3166,6 @@ void qemu_init(int argc, char **argv, char **envp)
 dpy.has_full_screen = true;
 dpy.full_screen = true;
 break;
-case QEMU_OPTION_sdl:
-warn_report("-sdl is deprecated, use -display sdl instead.");
-#ifdef CONFIG_SDL
-dpy.type = DISPLAY_TYPE_SDL;
-break;
-#else
-error_report("SDL support is disabled");
-exit(1);
-#endif
 case QEMU_OPTION_pidfile:
 pid_file = optarg;
 break;
diff --git a/qemu-options.hx b/qemu-options.hx
index 726e437a97..60cf188da4 100644
--- a/qemu-options.hx
+++ b/qemu-options.hx
@@ -1981,9 +1981,8 @@ DEF("display", HAS_ARG, QEMU_OPTION_display,
 , QEMU_ARCH_ALL)
 SRST
 ``-display type``
-Select type of display to use. This option is a replacement for the
-old style -sdl/-curses/... options. Use ``-display help`` to list
-the available display types. Valid values for type are
+Select type of display to use. Use ``-display help`` to list the available
+display types. Valid values for type are
 
 ``spice-app[,gl=on|off]``
 Start QEMU as a Spice server and launch the default Spice client
@@ -2085,25 +2084,6 @@ SRST
 Use C-a h for help on switching between the console and monitor.
 ERST
 
-DEF("curses", 0, QEMU_OPTION_curses,
-"-curses shorthand for -display curses\n",
-QEMU_ARCH_ALL)
-SRST
-``-curses``
-Normally, if QEMU is compiled with graphical window support, it
-displays output such as guest graphics, guest console, and the QEMU
-monitor in a window. With this option, QEMU can display the VGA
-output when in text mode using a curses/ncurses interface. Nothing
-is displayed in graphical mode.
-ERST
-
-DEF("sdl", 0, QEMU_OPTION_sdl,
-"-sdl            shorthand for -display sdl\n", QEMU_ARCH_ALL)
-SRST
-``-sdl``
-Enable SDL.
-ERST
-
 #ifdef CONFIG_SPICE
 DEF("spice", HAS_ARG, QEMU_OPTION_spice,
 "-spice [port=port][,tls-port=secured-port][,x509-dir=]\n"
-- 
2.27.0




[PATCH v3 1/3] ui: Remove deprecated parameters of the "-display sdl" option

2022-05-19 Thread Thomas Huth
Dropping these deprecated parameters simplifies further refactoring
(e.g. QAPIfication is easier without underscores in the name).

Reviewed-by: Daniel P. Berrangé 
Reviewed-by: Markus Armbruster 
Signed-off-by: Thomas Huth 
---
 docs/about/deprecated.rst   | 16 -
 docs/about/removed-features.rst | 17 ++
 softmmu/vl.c| 41 +
 qemu-options.hx | 32 ++---
 4 files changed, 20 insertions(+), 86 deletions(-)

diff --git a/docs/about/deprecated.rst b/docs/about/deprecated.rst
index a92ae0f162..562a133f18 100644
--- a/docs/about/deprecated.rst
+++ b/docs/about/deprecated.rst
@@ -81,22 +81,6 @@ the process listing. This is replaced by the new ``password-secret``
 option which lets the password be securely provided on the command
 line using a ``secret`` object instance.
 
-``-display sdl,window_close=...`` (since 6.1)
-'
-
-Use ``-display sdl,window-close=...`` instead (i.e. with a minus instead of
-an underscore between "window" and "close").
-
-``-alt-grab`` and ``-display sdl,alt_grab=on`` (since 6.2)
-''
-
-Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
-
-``-ctrl-grab`` and ``-display sdl,ctrl_grab=on`` (since 6.2)
-
-
-Use ``-display sdl,grab-mod=rctrl`` instead.
-
 ``-sdl`` (since 6.2)
 
 
diff --git a/docs/about/removed-features.rst b/docs/about/removed-features.rst
index eb76974347..4c9e001c35 100644
--- a/docs/about/removed-features.rst
+++ b/docs/about/removed-features.rst
@@ -370,6 +370,23 @@ The ``opened=on`` option in the command line or QMP ``object-add`` either had
 no effect (if ``opened`` was the last option) or caused errors.  The property
 is therefore useless and should simply be removed.
 
+``-display sdl,window_close=...`` (removed in 7.1)
+''
+
+Use ``-display sdl,window-close=...`` instead (i.e. with a minus instead of
+an underscore between "window" and "close").
+
+``-alt-grab`` and ``-display sdl,alt_grab=on`` (removed in 7.1)
+'''
+
+Use ``-display sdl,grab-mod=lshift-lctrl-lalt`` instead.
+
+``-ctrl-grab`` and ``-display sdl,ctrl_grab=on`` (removed in 7.1)
+'
+
+Use ``-display sdl,grab-mod=rctrl`` instead.
+
+
 QEMU Machine Protocol (QMP) commands
 
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 84a31eba76..57ab9d5322 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1079,32 +1079,7 @@ static void parse_display(const char *p)
 } else {
 goto invalid_sdl_args;
 }
-} else if (strstart(opts, ",alt_grab=", &nextopt)) {
-opts = nextopt;
-if (strstart(opts, "on", &nextopt)) {
-alt_grab = 1;
-} else if (strstart(opts, "off", &nextopt)) {
-alt_grab = 0;
-} else {
-goto invalid_sdl_args;
-}
-warn_report("alt_grab is deprecated, use grab-mod instead.");
-} else if (strstart(opts, ",ctrl_grab=", &nextopt)) {
-opts = nextopt;
-if (strstart(opts, "on", &nextopt)) {
-ctrl_grab = 1;
-} else if (strstart(opts, "off", &nextopt)) {
-ctrl_grab = 0;
-} else {
-goto invalid_sdl_args;
-}
-warn_report("ctrl_grab is deprecated, use grab-mod instead.");
-} else if (strstart(opts, ",window_close=", &nextopt) ||
-   strstart(opts, ",window-close=", &nextopt)) {
-if (strstart(opts, ",window_close=", NULL)) {
-warn_report("window_close with an underscore is deprecated,"
-" please use window-close instead.");
-}
+} else if (strstart(opts, ",window-close=", &nextopt)) {
 opts = nextopt;
 dpy.has_window_close = true;
 if (strstart(opts, "on", &nextopt)) {
@@ -1962,10 +1937,6 @@ static void qemu_create_early_backends(void)
 const bool use_gtk = false;
 #endif
 
-if ((alt_grab || ctrl_grab) && !use_sdl) {
-error_report("-alt-grab and -ctrl-grab are only valid "
- "for SDL, ignoring option");
-}
 if (dpy.has_window_close && !use_gtk && !use_sdl) {
 error_report("window-close is only valid for GTK and SDL, "
  "ignoring option");
@@ -3273,16 +3244,6 @@ void qemu_init(int argc, char **argv, char **envp)
 dpy.has_full_screen = true;
 dpy.full_screen = true;
 break;
-case QEMU_OPTION_alt_grab:
-

[PATCH v3 0/3] ui: Remove deprecated sdl parameters and switch to QAPI parser

2022-05-19 Thread Thomas Huth
The "-display sdl" option still uses a hand-crafted parser for its
parameters since some of them used underscores, which are disliked in QAPI.
Now that they've been deprecated and the deprecation period is over, we
can remove the problematic parameters and switch to use the QAPI parser
instead.

While we're at it, also remove the deprecated "-sdl" and "-curses" options.

v3:
 - Fixed some texts according to Markus' and Eric's suggestions
 - Renamed the GrabMod enum to HotKeyMod (to not confuse it so easily
   with GrabToggleKeys)
   
v2:
 - Rebase to current master branch to resolve conflicts in docs/about/*.rst
 - Use an enum for the grab-mod parameter instead of a unconstrained string

Thomas Huth (3):
  ui: Remove deprecated parameters of the "-display sdl" option
  ui: Switch "-display sdl" to use the QAPI parser
  ui: Remove deprecated options "-sdl" and "-curses"

 docs/about/deprecated.rst   |  26 ---
 docs/about/removed-features.rst |  27 +++
 qapi/ui.json|  26 ++-
 include/sysemu/sysemu.h |   2 -
 softmmu/globals.c   |   2 -
 softmmu/vl.c| 128 +---
 ui/sdl2.c   |  10 +++
 qemu-options.hx |  56 +-
 8 files changed, 67 insertions(+), 210 deletions(-)

-- 
2.27.0




[PATCH v6 0/8] KVM: mm: fd-based approach for supporting KVM guest private memory

2022-05-19 Thread Chao Peng
This is the v6 of this series which tries to implement the fd-based KVM
guest private memory. The patches are based on latest kvm/queue branch
commit:

  2764011106d0 (kvm/queue) KVM: VMX: Include MKTME KeyID bits in shadow_zero_check
 
and Sean's below patch:

  KVM: x86/mmu: Add RET_PF_CONTINUE to eliminate bool+int* "returns"
  https://lkml.org/lkml/2022/4/22/1598

Introduction

In general this patch series introduces a fd-based memslot which provides
guest memory through a memory file descriptor fd[offset,size] instead of
hva/size. The fd can be created from a supported memory filesystem
like tmpfs/hugetlbfs etc., which we refer to as the memory backing store.
KVM and the memory backing store exchange callbacks when such a memslot
gets created. At runtime KVM will call into callbacks provided by the
backing store to get the pfn with the fd+offset. The memory backing store
will also call into KVM callbacks when userspace fallocates/punches a hole
on the fd, to notify KVM to map/unmap secondary MMU page tables.

Compared to the existing hva-based memslots, this new type of memslot
allows guest memory to be unmapped from host userspace such as QEMU and
even the kernel itself, therefore reducing the attack surface and
preventing bugs.

Based on this fd-based memslot, we can build guest private memory that
is going to be used in confidential computing environments such as Intel
TDX and AMD SEV. When supported, the memory backing store can provide
more enforcement on the fd and KVM can use a single memslot to hold both
the private and shared part of the guest memory. 

mm extension
-
Introduces a new MFD_INACCESSIBLE flag for memfd_create(); a file created
with this flag cannot be read(), written or mmap()ed etc. via normal
MMU operations. The file content can only be used with the newly
introduced memfile_notifier extension.

The memfile_notifier extension provides two sets of callbacks for KVM to
interact with the memory backing store:
  - memfile_notifier_ops: callbacks for memory backing store to notify
KVM when memory gets allocated/invalidated.
  - backing store callbacks: callbacks for KVM to call into memory backing
store to request memory pages for guest private memory.

The memfile_notifier extension also provides APIs for memory backing
store to register/unregister itself and to trigger the notifier when the
bookmarked memory gets fallocated/invalidated.

memslot extension
-
Add the private fd and the fd offset to the existing 'shared' memslot so
that both private and shared guest memory can live in one single memslot.
A page in the memslot is either private or shared. A page is private only
when it's already allocated in the backing store fd; in all other cases
it's treated as shared, including pages already mapped as shared as well
as pages that have not been mapped yet. This means the memory backing
store is the single source of truth about which pages are private.
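
For illustration, a userspace sketch of describing such a dual
(private+shared) slot. The struct layout is mirrored from patch 4 of this
series; KVM_MEM_PRIVATE and the '_ext' struct are not in mainline uapi
headers, so they are reproduced locally here as assumptions:

```c
#include <stdint.h>
#include <string.h>

/* Layout mirrored from patch 4 of this series (assumed, not mainline). */
struct kvm_userspace_memory_region {
	uint32_t slot;
	uint32_t flags;
	uint64_t guest_phys_addr;
	uint64_t memory_size;
	uint64_t userspace_addr;
};

struct kvm_userspace_memory_region_ext {
	struct kvm_userspace_memory_region region;
	uint64_t private_offset;
	uint32_t private_fd;
	uint32_t pad1;
	uint64_t pad2[14];
};

#define KVM_MEM_PRIVATE (1UL << 2)

/* Describe a slot whose private half lives in `fd` at `offset`, while
 * shared accesses still go through `hva` (userspace_addr). */
static void init_private_slot(struct kvm_userspace_memory_region_ext *ext,
			      uint32_t slot, uint64_t gpa, uint64_t size,
			      uint64_t hva, int fd, uint64_t offset)
{
	memset(ext, 0, sizeof(*ext));
	ext->region.slot = slot;
	ext->region.flags = KVM_MEM_PRIVATE;
	ext->region.guest_phys_addr = gpa;
	ext->region.memory_size = size;
	ext->region.userspace_addr = hva;
	ext->private_fd = (uint32_t)fd;
	ext->private_offset = offset;
	/* then: ioctl(vm_fd, KVM_SET_USER_MEMORY_REGION, ext); */
}
```

The `_ext` struct stays binary-compatible with the plain region struct, so
the same ioctl number can consume both.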

Private memory map/unmap and conversion
---
Userspace's map/unmap operations are done by fallocate() ioctl on the
backing store fd.
  - map: default fallocate() with mode=0.
  - unmap: fallocate() with FALLOC_FL_PUNCH_HOLE.
The map/unmap will trigger above memfile_notifier_ops to let KVM map/unmap
secondary MMU page tables.
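
A minimal sketch of this map/unmap sequence, exercised on an ordinary
memfd as a stand-in (MFD_INACCESSIBLE is not merged yet; the fallocate()
calls themselves are the same as described above):

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <sys/mman.h>
#include <unistd.h>

/* "map" then "unmap" a range of a memfd via fallocate(); returns 0 on
 * success, -1 on failure. */
int demo_map_unmap(void)
{
	int fd = memfd_create("guest-private", MFD_CLOEXEC);
	if (fd < 0)
		return -1;

	/* map: default fallocate() with mode = 0 allocates backing pages */
	if (fallocate(fd, 0, 0, 2 * 1024 * 1024) < 0)
		goto err;

	/* unmap: punching a hole releases the range again */
	if (fallocate(fd, FALLOC_FL_PUNCH_HOLE | FALLOC_FL_KEEP_SIZE,
		      0, 1024 * 1024) < 0)
		goto err;

	close(fd);
	return 0;
err:
	close(fd);
	return -1;
}
```

With the series applied, each fallocate() call additionally triggers the
memfile_notifier_ops so KVM can update the secondary MMU.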

Test

Testing the new functionality of this patchset requires the TDX patchset.
Since the TDX patchset has not been merged yet, I did two kinds of tests:

-  Selftest on normal VM from Vishal
   https://lkml.org/lkml/2022/5/10/2045
   The selftest has been ported to this patchset and you can find it in
   repo: https://github.com/chao-p/linux/tree/privmem-v6

-  Private memory functional test on the latest TDX code
   The patches were rebased onto the latest TDX code and the new
   functionality tested. See the repos below:
   Linux: https://github.com/chao-p/linux/commits/privmem-v6-tdx
   QEMU: https://github.com/chao-p/qemu/tree/privmem-v6

An example QEMU command line for TDX test:
-object tdx-guest,id=tdx \
-object memory-backend-memfd-private,id=ram1,size=2G \
-machine 
q35,kvm-type=tdx,pic=no,kernel_irqchip=split,memory-encryption=tdx,memory-backend=ram1

What's missing
--
  - The accounting for long-term pinned memory in the backing store is
    not included since I haven't come up with a good solution yet.
  - Batch invalidation notification for shmem is not ready, as it is
    still a bit tricky to do cleanly.

Changelog
--
v6:
  - Re-organized patches for both the mm and KVM parts.
  - Added flags for memfile_notifier so its consumers can state their
features and memory backing store can check against these flags.
  - Put a backing store reference in the memfile_notifier and move pfn_ops
into backing store.
  - Only support boot-time backing store registration.
  - Overall KVM part improvement suggested by Sean and some others.
v5:
  - Removed userspace visible F_SEAL_INACCESSIBLE, instead using an
in-kernel flag (SHM_F_INACCESSIBLE for shmem). Private fd can only
be 

[PATCH v6 8/8] memfd_create.2: Describe MFD_INACCESSIBLE flag

2022-05-19 Thread Chao Peng
Signed-off-by: Chao Peng 
---
 man2/memfd_create.2 | 13 +
 1 file changed, 13 insertions(+)

diff --git a/man2/memfd_create.2 b/man2/memfd_create.2
index 89e9c4136..2698222ae 100644
--- a/man2/memfd_create.2
+++ b/man2/memfd_create.2
@@ -101,6 +101,19 @@ meaning that no other seals can be set on the file.
 .\" FIXME Why is the MFD_ALLOW_SEALING behavior not simply the default?
 .\" Is it worth adding some text explaining this?
 .TP
+.BR MFD_INACCESSIBLE
+Disallow userspace access through ordinary MMU accesses via
+.BR read (2),
+.BR write (2)
+and
+.BR mmap (2).
+The file size cannot be changed once initialized.
+This flag cannot coexist with
+.B MFD_ALLOW_SEALING
+and when this flag is set, the initial set of seals will be
+.B F_SEAL_SEAL,
+meaning that no other seals can be set on the file.
+.TP
 .BR MFD_HUGETLB " (since Linux 4.14)"
 .\" commit 749df87bd7bee5a79cef073f5d032ddb2b211de8
 The anonymous file will be created in the hugetlbfs filesystem using
-- 
2.17.1




[PATCH v6 7/8] KVM: Enable and expose KVM_MEM_PRIVATE

2022-05-19 Thread Chao Peng
Register private memslot to fd-based memory backing store and handle the
memfile notifiers to zap the existing mappings.

Currently the registration happens at memslot creation time and the
initial support does not include page migration/swap.

KVM_MEM_PRIVATE is not exposed by default; architecture code can turn
it on by implementing kvm_arch_private_mem_supported().

A 'kvm' reference is added to the memslot structure since in the
memfile_notifier callbacks we can only obtain a memslot reference,
while kvm is needed to do the zapping. The zapping itself reuses code
from the existing mmu notifier handling.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 include/linux/kvm_host.h |  10 ++-
 virt/kvm/kvm_main.c  | 132 ---
 2 files changed, 131 insertions(+), 11 deletions(-)

diff --git a/include/linux/kvm_host.h b/include/linux/kvm_host.h
index b0a7910505ed..00efb4b96bc7 100644
--- a/include/linux/kvm_host.h
+++ b/include/linux/kvm_host.h
@@ -246,7 +246,7 @@ bool kvm_setup_async_pf(struct kvm_vcpu *vcpu, gpa_t 
cr2_or_gpa,
 int kvm_async_pf_wakeup_all(struct kvm_vcpu *vcpu);
 #endif
 
-#ifdef KVM_ARCH_WANT_MMU_NOTIFIER
+#if defined(KVM_ARCH_WANT_MMU_NOTIFIER) || defined(CONFIG_MEMFILE_NOTIFIER)
 struct kvm_gfn_range {
struct kvm_memory_slot *slot;
gfn_t start;
@@ -577,6 +577,7 @@ struct kvm_memory_slot {
struct file *private_file;
loff_t private_offset;
struct memfile_notifier notifier;
+   struct kvm *kvm;
 };
 
 static inline bool kvm_slot_is_private(const struct kvm_memory_slot *slot)
@@ -769,9 +770,13 @@ struct kvm {
struct hlist_head irq_ack_notifier_list;
 #endif
 
+#if (defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)) ||\
+   defined(CONFIG_MEMFILE_NOTIFIER)
+   unsigned long mmu_notifier_seq;
+#endif
+
 #if defined(CONFIG_MMU_NOTIFIER) && defined(KVM_ARCH_WANT_MMU_NOTIFIER)
struct mmu_notifier mmu_notifier;
-   unsigned long mmu_notifier_seq;
long mmu_notifier_count;
unsigned long mmu_notifier_range_start;
unsigned long mmu_notifier_range_end;
@@ -1438,6 +1443,7 @@ bool kvm_arch_dy_has_pending_interrupt(struct kvm_vcpu 
*vcpu);
 int kvm_arch_post_init_vm(struct kvm *kvm);
 void kvm_arch_pre_destroy_vm(struct kvm *kvm);
 int kvm_arch_create_vm_debugfs(struct kvm *kvm);
+bool kvm_arch_private_mem_supported(struct kvm *kvm);
 
 #ifndef __KVM_HAVE_ARCH_VM_ALLOC
 /*
diff --git a/virt/kvm/kvm_main.c b/virt/kvm/kvm_main.c
index db9d39a2d3a6..f93ac7cdfb53 100644
--- a/virt/kvm/kvm_main.c
+++ b/virt/kvm/kvm_main.c
@@ -843,6 +843,73 @@ static int kvm_init_mmu_notifier(struct kvm *kvm)
 
 #endif /* CONFIG_MMU_NOTIFIER && KVM_ARCH_WANT_MMU_NOTIFIER */
 
+#ifdef CONFIG_HAVE_KVM_PRIVATE_MEM
+static void kvm_private_mem_notifier_handler(struct memfile_notifier *notifier,
+pgoff_t start, pgoff_t end)
+{
+   int idx;
+   struct kvm_memory_slot *slot = container_of(notifier,
+   struct kvm_memory_slot,
+   notifier);
+   struct kvm_gfn_range gfn_range = {
+   .slot   = slot,
+   .start  = start - (slot->private_offset >> PAGE_SHIFT),
+   .end= end - (slot->private_offset >> PAGE_SHIFT),
+   .may_block  = true,
+   };
+   struct kvm *kvm = slot->kvm;
+
+   gfn_range.start = slot->base_gfn + gfn_range.start;
+   gfn_range.end = slot->base_gfn + min((unsigned long)gfn_range.end, 
slot->npages);
+
+   if (WARN_ON_ONCE(gfn_range.start >= gfn_range.end))
+   return;
+
+   idx = srcu_read_lock(&kvm->srcu);
+   KVM_MMU_LOCK(kvm);
+   if (kvm_unmap_gfn_range(kvm, &gfn_range))
+   kvm_flush_remote_tlbs(kvm);
+   kvm->mmu_notifier_seq++;
+   KVM_MMU_UNLOCK(kvm);
+   srcu_read_unlock(&kvm->srcu, idx);
+}
+
+static struct memfile_notifier_ops kvm_private_mem_notifier_ops = {
+   .populate = kvm_private_mem_notifier_handler,
+   .invalidate = kvm_private_mem_notifier_handler,
+};
+
+#define KVM_MEMFILE_FLAGS MEMFILE_F_USER_INACCESSIBLE | \
+ MEMFILE_F_UNMOVABLE | \
+ MEMFILE_F_UNRECLAIMABLE
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+   slot->notifier.ops = &kvm_private_mem_notifier_ops;
+   return memfile_register_notifier(slot->private_file, KVM_MEMFILE_FLAGS,
+&slot->notifier);
+}
+
+static inline void kvm_private_mem_unregister(struct kvm_memory_slot *slot)
+{
+   memfile_unregister_notifier(&slot->notifier);
+}
+
+#else /* !CONFIG_HAVE_KVM_PRIVATE_MEM */
+
+static inline int kvm_private_mem_register(struct kvm_memory_slot *slot)
+{
+   WARN_ON_ONCE(1);
+   return -EOPNOTSUPP;
+}
+
+static inline void kvm_private_mem_unregister(struct 

[PATCH v6 5/8] KVM: Add KVM_EXIT_MEMORY_FAULT exit

2022-05-19 Thread Chao Peng
This new KVM exit allows userspace to handle memory-related errors. It
indicates that an error happened in KVM at the guest memory range
[gpa, gpa+size). The flags field includes additional information for
userspace to handle the error. Currently bit 0 is defined as 'private
memory', where '1' indicates the error happened due to a private memory
access and '0' indicates a shared memory access.

After private memory is enabled, this new exit will be used for KVM to
exit to userspace for shared memory <-> private memory conversion in
memory encryption usage.

In such usage, there are typically two kinds of memory conversion:
  - explicit conversion: happens when the guest explicitly calls into KVM
    to map a range (as private or shared); KVM then exits to userspace to
    do the map/unmap operations.
  - implicit conversion: happens in the KVM page fault handler.
    * If the fault is due to a private memory access, it causes a
      userspace exit for a shared->private conversion request when the
      page has not been allocated in the private memory backend.
    * If the fault is due to a shared memory access, it causes a
      userspace exit for a private->shared conversion request when the
      page has already been allocated in the private memory backend.
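
For illustration, a userspace-side sketch of decoding such an exit. The
constants and struct layout are mirrored from this patch, not from
mainline uapi headers, so they are defined locally here as assumptions:

```c
#include <stdint.h>

/* Mirrored from this patch (assumed, not mainline uapi): */
#define KVM_EXIT_MEMORY_FAULT        36
#define KVM_MEMORY_EXIT_FLAG_PRIVATE (1u << 0)

/* Same layout as the new 'memory' member of struct kvm_run's union. */
struct memory_fault {
	uint32_t flags;
	uint32_t padding;
	uint64_t gpa;
	uint64_t size;
};

/* Returns 1 when the fault was a private access (i.e. userspace should
 * convert [gpa, gpa+size) shared->private), 0 for private->shared. */
static int wants_private(const struct memory_fault *mf)
{
	return (mf->flags & KVM_MEMORY_EXIT_FLAG_PRIVATE) != 0;
}
```

In a real VMM this decode runs inside the KVM_RUN loop, keyed on
run->exit_reason == KVM_EXIT_MEMORY_FAULT.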

Suggested-by: Sean Christopherson 
Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 Documentation/virt/kvm/api.rst | 22 ++
 include/uapi/linux/kvm.h   |  9 +
 2 files changed, 31 insertions(+)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index b959445b64cc..2421c012278b 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -6341,6 +6341,28 @@ array field represents return values. The userspace 
should update the return
 values of SBI call before resuming the VCPU. For more details on RISC-V SBI
 spec refer, https://github.com/riscv/riscv-sbi-doc.
 
+::
+
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+  #define KVM_MEMORY_EXIT_FLAG_PRIVATE (1 << 0)
+   __u32 flags;
+   __u32 padding;
+   __u64 gpa;
+   __u64 size;
+   } memory;
+If exit reason is KVM_EXIT_MEMORY_FAULT then it indicates that the VCPU has
+encountered a memory error which is not handled by KVM kernel module and
+userspace may choose to handle it. The 'flags' field indicates the memory
+properties of the exit.
+
+ - KVM_MEMORY_EXIT_FLAG_PRIVATE - indicates the memory error is caused by a
+   private memory access when the bit is set; otherwise the memory error is
+   caused by a shared memory access (the bit is clear).
+
+'gpa' and 'size' indicate the memory range the error occurs at. The userspace
+may handle the error and return to KVM to retry the previous memory access.
+
 ::
 
/* Fix the size of the union. */
diff --git a/include/uapi/linux/kvm.h b/include/uapi/linux/kvm.h
index 28cacd3656d4..6ca864be258f 100644
--- a/include/uapi/linux/kvm.h
+++ b/include/uapi/linux/kvm.h
@@ -294,6 +294,7 @@ struct kvm_xen_exit {
 #define KVM_EXIT_X86_BUS_LOCK 33
 #define KVM_EXIT_XEN  34
 #define KVM_EXIT_RISCV_SBI35
+#define KVM_EXIT_MEMORY_FAULT 36
 
 /* For KVM_EXIT_INTERNAL_ERROR */
 /* Emulate instruction failed. */
@@ -518,6 +519,14 @@ struct kvm_run {
unsigned long args[6];
unsigned long ret[2];
} riscv_sbi;
+   /* KVM_EXIT_MEMORY_FAULT */
+   struct {
+#define KVM_MEMORY_EXIT_FLAG_PRIVATE   (1 << 0)
+   __u32 flags;
+   __u32 padding;
+   __u64 gpa;
+   __u64 size;
+   } memory;
/* Fix the size of the union. */
char padding[256];
};
-- 
2.25.1




[PATCH v6 3/8] mm/memfd: Introduce MFD_INACCESSIBLE flag

2022-05-19 Thread Chao Peng
Introduce a new memfd_create() flag indicating that the content of the
created memfd is inaccessible from userspace through ordinary MMU
access (e.g., read/write/mmap). However, the file content can still be
accessed indirectly via a different mechanism (e.g. the KVM MMU).

It provides the semantics required for KVM guest private memory support:
a file descriptor with this flag set is going to be used as the source
of guest memory in confidential computing environments such as Intel
TDX/AMD SEV, but may not be accessible from host userspace.

The flag cannot coexist with MFD_ALLOW_SEALING; future sealing is also
impossible for a memfd created with this flag.

Signed-off-by: Chao Peng 
---
 include/uapi/linux/memfd.h |  1 +
 mm/memfd.c | 15 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)

diff --git a/include/uapi/linux/memfd.h b/include/uapi/linux/memfd.h
index 7a8a26751c23..48750474b904 100644
--- a/include/uapi/linux/memfd.h
+++ b/include/uapi/linux/memfd.h
@@ -8,6 +8,7 @@
 #define MFD_CLOEXEC0x0001U
 #define MFD_ALLOW_SEALING  0x0002U
 #define MFD_HUGETLB0x0004U
+#define MFD_INACCESSIBLE   0x0008U
 
 /*
  * Huge page size encoding when MFD_HUGETLB is specified, and a huge page
diff --git a/mm/memfd.c b/mm/memfd.c
index 08f5f8304746..775541d53f1b 100644
--- a/mm/memfd.c
+++ b/mm/memfd.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 
 /*
@@ -261,7 +262,8 @@ long memfd_fcntl(struct file *file, unsigned int cmd, 
unsigned long arg)
 #define MFD_NAME_PREFIX_LEN (sizeof(MFD_NAME_PREFIX) - 1)
 #define MFD_NAME_MAX_LEN (NAME_MAX - MFD_NAME_PREFIX_LEN)
 
-#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB)
+#define MFD_ALL_FLAGS (MFD_CLOEXEC | MFD_ALLOW_SEALING | MFD_HUGETLB | \
+  MFD_INACCESSIBLE)
 
 SYSCALL_DEFINE2(memfd_create,
const char __user *, uname,
@@ -283,6 +285,10 @@ SYSCALL_DEFINE2(memfd_create,
return -EINVAL;
}
 
+   /* Disallow sealing when MFD_INACCESSIBLE is set. */
+   if (flags & MFD_INACCESSIBLE && flags & MFD_ALLOW_SEALING)
+   return -EINVAL;
+
/* length includes terminating zero */
len = strnlen_user(uname, MFD_NAME_MAX_LEN + 1);
if (len <= 0)
@@ -329,12 +335,19 @@ SYSCALL_DEFINE2(memfd_create,
if (flags & MFD_ALLOW_SEALING) {
file_seals = memfd_file_seals_ptr(file);
*file_seals &= ~F_SEAL_SEAL;
+   } else if (flags & MFD_INACCESSIBLE) {
+   error = memfile_node_set_flags(file,
+  MEMFILE_F_USER_INACCESSIBLE);
+   if (error)
+   goto err_file;
}
 
fd_install(fd, file);
kfree(name);
return fd;
 
+err_file:
+   fput(file);
 err_fd:
put_unused_fd(fd);
 err_name:
-- 
2.25.1




[PATCH v6 4/8] KVM: Extend the memslot to support fd-based private memory

2022-05-19 Thread Chao Peng
Extend the memslot definition to provide guest private memory through a
file descriptor (fd) instead of userspace_addr (hva). Such guest private
memory (fd) may never be mapped into userspace, so no userspace_addr (hva)
can be used. Instead, add two new fields (private_fd/private_offset)
which, together with the existing memory_size, represent the private
memory range. Such a memslot can still have the existing
userspace_addr (hva). When in use, a single memslot can maintain both
private memory through the private fd (private_fd/private_offset) and
shared memory through the hva (userspace_addr). A GPA is considered
private by KVM if the memslot has a private fd and the corresponding page
in the private fd is populated; otherwise, it's shared.

Since there is no userspace mapping for the private fd, we cannot rely
on get_user_pages() to get the pfn in KVM. Instead we add a new
memfile_notifier to the memslot and rely on it to get the pfn by
interacting with the memory backing store's callbacks via the fd/offset.

This new extension is indicated by a new flag KVM_MEM_PRIVATE. At
compile time, a new config HAVE_KVM_PRIVATE_MEM is added and right now
it is selected on X86_64 for Intel TDX usage.

To keep KVM simple, internally we use a binary-compatible struct
kvm_user_mem_region to handle both the normal and the '_ext' variants.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 Documentation/virt/kvm/api.rst   | 38 ++--
 arch/mips/include/asm/kvm_host.h |  2 +-
 arch/x86/include/asm/kvm_host.h  |  2 +-
 arch/x86/kvm/Kconfig |  2 ++
 arch/x86/kvm/x86.c   |  2 +-
 include/linux/kvm_host.h | 19 +++-
 include/uapi/linux/kvm.h | 24 
 virt/kvm/Kconfig |  3 +++
 virt/kvm/kvm_main.c  | 33 +--
 9 files changed, 103 insertions(+), 22 deletions(-)

diff --git a/Documentation/virt/kvm/api.rst b/Documentation/virt/kvm/api.rst
index 23baf7fce038..b959445b64cc 100644
--- a/Documentation/virt/kvm/api.rst
+++ b/Documentation/virt/kvm/api.rst
@@ -1311,7 +1311,7 @@ yet and must be cleared on entry.
 :Capability: KVM_CAP_USER_MEMORY
 :Architectures: all
 :Type: vm ioctl
-:Parameters: struct kvm_userspace_memory_region (in)
+:Parameters: struct kvm_userspace_memory_region(_ext) (in)
 :Returns: 0 on success, -1 on error
 
 ::
@@ -1324,9 +1324,18 @@ yet and must be cleared on entry.
__u64 userspace_addr; /* start of the userspace allocated memory */
   };
 
+  struct kvm_userspace_memory_region_ext {
+   struct kvm_userspace_memory_region region;
+   __u64 private_offset;
+   __u32 private_fd;
+   __u32 pad1;
+   __u64 pad2[14];
+};
+
   /* for kvm_memory_region::flags */
   #define KVM_MEM_LOG_DIRTY_PAGES  (1UL << 0)
   #define KVM_MEM_READONLY (1UL << 1)
+  #define KVM_MEM_PRIVATE  (1UL << 2)
 
 This ioctl allows the user to create, modify or delete a guest physical
 memory slot.  Bits 0-15 of "slot" specify the slot id and this value
@@ -1357,12 +1366,27 @@ It is recommended that the lower 21 bits of 
guest_phys_addr and userspace_addr
 be identical.  This allows large pages in the guest to be backed by large
 pages in the host.
 
-The flags field supports two flags: KVM_MEM_LOG_DIRTY_PAGES and
-KVM_MEM_READONLY.  The former can be set to instruct KVM to keep track of
-writes to memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to
-use it.  The latter can be set, if KVM_CAP_READONLY_MEM capability allows it,
-to make a new slot read-only.  In this case, writes to this memory will be
-posted to userspace as KVM_EXIT_MMIO exits.
+kvm_userspace_memory_region_ext includes all the kvm_userspace_memory_region
+fields. It also includes additional fields for some specific features. See
+below description of flags field for more information. It's recommended to use
+kvm_userspace_memory_region_ext in new userspace code.
+
+The flags field supports below flags:
+
+- KVM_MEM_LOG_DIRTY_PAGES can be set to instruct KVM to keep track of writes to
+  memory within the slot.  See KVM_GET_DIRTY_LOG ioctl to know how to use it.
+
+- KVM_MEM_READONLY can be set, if KVM_CAP_READONLY_MEM capability allows it, to
+  make a new slot read-only.  In this case, writes to this memory will be 
posted
+  to userspace as KVM_EXIT_MMIO exits.
+
+- KVM_MEM_PRIVATE can be set to indicate a new slot has private memory backed 
by
+  a file descriptor(fd) and the content of the private memory is invisible to
+  userspace. In this case, userspace should use private_fd/private_offset in
+  kvm_userspace_memory_region_ext to instruct KVM to provide private memory to
+  guest. Userspace should guarantee not to map the same pfn indicated by
+  private_fd/private_offset to different gfns with multiple memslots. Failure
+  to do so may result in undefined behavior.
 
 When the KVM_CAP_SYNC_MMU capability is available, changes in the backing of
 the memory region 

[PATCH v2 1/4] xlnx_dp: fix the wrong register size

2022-05-19 Thread Frederic Konrad via
The core and vblend register region sizes are wrong; they should be
0x3B0 and 0x1E0 respectively, according to:
  
https://www.xilinx.com/htmldocs/registers/ug1087/ug1087-zynq-ultrascale-registers.html.

Let's fix that and use macros when creating the mmio region.

Fixes: 58ac482a66d ("introduce xlnx-dp")
Signed-off-by: Frederic Konrad 
Reviewed-by: Edgar E. Iglesias 
---
 hw/display/xlnx_dp.c | 17 ++---
 include/hw/display/xlnx_dp.h |  9 +++--
 2 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index 9bb781e312..0378570459 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -1219,19 +1219,22 @@ static void xlnx_dp_init(Object *obj)
 SysBusDevice *sbd = SYS_BUS_DEVICE(obj);
 XlnxDPState *s = XLNX_DP(obj);
 
+memory_region_init(&s->container, obj, TYPE_XLNX_DP, DP_CONTAINER_SIZE);
+memory_region_init(>container, obj, TYPE_XLNX_DP, DP_CONTAINER_SIZE);
 
memory_region_init_io(&s->core_iomem, obj, &dp_ops, s, TYPE_XLNX_DP
-  ".core", 0x3AF);
-memory_region_add_subregion(&s->container, 0x0000, &s->core_iomem);
+  ".core", sizeof(s->core_registers));
+memory_region_add_subregion(&s->container, DP_CORE_REG_OFFSET,
+&s->core_iomem);
 
memory_region_init_io(&s->vblend_iomem, obj, &vblend_ops, s, TYPE_XLNX_DP
-  ".v_blend", 0x1DF);
-memory_region_add_subregion(&s->container, 0xA000, &s->vblend_iomem);
+  ".v_blend", sizeof(s->vblend_registers));
+memory_region_add_subregion(&s->container, DP_VBLEND_REG_OFFSET,
+&s->vblend_iomem);
 
memory_region_init_io(&s->avbufm_iomem, obj, &avbufm_ops, s, TYPE_XLNX_DP
-  ".av_buffer_manager", 0x238);
-memory_region_add_subregion(&s->container, 0xB000, &s->avbufm_iomem);
+  ".av_buffer_manager", sizeof(s->avbufm_registers));
+memory_region_add_subregion(&s->container, DP_AVBUF_REG_OFFSET,
+&s->avbufm_iomem);
 
memory_region_init_io(&s->audio_iomem, obj, &audio_ops, s, TYPE_XLNX_DP
   ".audio", sizeof(s->audio_registers));
diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
index 8ab4733bb8..1ef5a89ee7 100644
--- a/include/hw/display/xlnx_dp.h
+++ b/include/hw/display/xlnx_dp.h
@@ -39,10 +39,15 @@
 #define AUD_CHBUF_MAX_DEPTH (32 * KiB)
 #define MAX_QEMU_BUFFER_SIZE(4 * KiB)
 
-#define DP_CORE_REG_ARRAY_SIZE  (0x3AF >> 2)
+#define DP_CORE_REG_OFFSET  (0x0000)
+#define DP_CORE_REG_ARRAY_SIZE  (0x3B0 >> 2)
+#define DP_AVBUF_REG_OFFSET (0xB000)
 #define DP_AVBUF_REG_ARRAY_SIZE (0x238 >> 2)
-#define DP_VBLEND_REG_ARRAY_SIZE(0x1DF >> 2)
+#define DP_VBLEND_REG_OFFSET(0xA000)
+#define DP_VBLEND_REG_ARRAY_SIZE(0x1E0 >> 2)
+#define DP_AUDIO_REG_OFFSET (0xC000)
 #define DP_AUDIO_REG_ARRAY_SIZE (0x50 >> 2)
+#define DP_CONTAINER_SIZE   (0xC050)
 
 struct PixmanPlane {
 pixman_format_code_t format;
-- 
2.25.1




[PATCH v6 2/8] mm/shmem: Support memfile_notifier

2022-05-19 Thread Chao Peng
From: "Kirill A. Shutemov" 

Implement shmem as a memfile_notifier backing store. Essentially it
interacts with the memfile_notifier feature flags for userspace
access/page migration/page reclaiming and implements the necessary
memfile_backing_store callbacks.

Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/linux/shmem_fs.h |   2 +
 mm/shmem.c   | 120 ++-
 2 files changed, 121 insertions(+), 1 deletion(-)

diff --git a/include/linux/shmem_fs.h b/include/linux/shmem_fs.h
index ab51d3cd39bd..a8e98bdd121e 100644
--- a/include/linux/shmem_fs.h
+++ b/include/linux/shmem_fs.h
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* inode in-kernel data */
 
@@ -25,6 +26,7 @@ struct shmem_inode_info {
struct simple_xattrsxattrs; /* list of xattrs */
atomic_tstop_eviction;  /* hold when working on inode */
struct timespec64   i_crtime;   /* file creation time */
+   struct memfile_node memfile_node;   /* memfile node */
struct inodevfs_inode;
 };
 
diff --git a/mm/shmem.c b/mm/shmem.c
index 529c9ad3e926..f97ae328c87a 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -905,6 +905,24 @@ static struct folio *shmem_get_partial_folio(struct inode 
*inode, pgoff_t index)
return page ? page_folio(page) : NULL;
 }
 
+static void notify_populate(struct inode *inode, pgoff_t start, pgoff_t end)
+{
+   struct shmem_inode_info *info = SHMEM_I(inode);
+
+   memfile_notifier_populate(&info->memfile_node, start, end);
+}
+
+static void notify_invalidate(struct inode *inode, struct folio *folio,
+  pgoff_t start, pgoff_t end)
+{
+   struct shmem_inode_info *info = SHMEM_I(inode);
+
+   start = max(start, folio->index);
+   end = min(end, folio->index + folio_nr_pages(folio));
+
+   memfile_notifier_invalidate(&info->memfile_node, start, end);
+}
+
 /*
  * Remove range of pages and swap entries from page cache, and free them.
  * If !unfalloc, truncate or punch hole; if unfalloc, undo failed fallocate.
@@ -948,6 +966,8 @@ static void shmem_undo_range(struct inode *inode, loff_t 
lstart, loff_t lend,
}
index += folio_nr_pages(folio) - 1;
 
+   notify_invalidate(inode, folio, start, end);
+
if (!unfalloc || !folio_test_uptodate(folio))
truncate_inode_folio(mapping, folio);
folio_unlock(folio);
@@ -1021,6 +1041,9 @@ static void shmem_undo_range(struct inode *inode, loff_t 
lstart, loff_t lend,
index--;
break;
}
+
+   notify_invalidate(inode, folio, start, end);
+
VM_BUG_ON_FOLIO(folio_test_writeback(folio),
folio);
truncate_inode_folio(mapping, folio);
@@ -1092,6 +1115,13 @@ static int shmem_setattr(struct user_namespace 
*mnt_userns,
(newsize > oldsize && (info->seals & F_SEAL_GROW)))
return -EPERM;
 
+   if (info->memfile_node.flags & MEMFILE_F_USER_INACCESSIBLE) {
+   if (oldsize)
+   return -EPERM;
+   if (!PAGE_ALIGNED(newsize))
+   return -EINVAL;
+   }
+
if (newsize != oldsize) {
error = shmem_reacct_size(SHMEM_I(inode)->flags,
oldsize, newsize);
@@ -1340,6 +1370,8 @@ static int shmem_writepage(struct page *page, struct 
writeback_control *wbc)
goto redirty;
if (!total_swap_pages)
goto redirty;
+   if (info->memfile_node.flags & MEMFILE_F_UNRECLAIMABLE)
+   goto redirty;
 
/*
 * Our capabilities prevent regular writeback or sync from ever calling
@@ -2234,6 +2266,9 @@ static int shmem_mmap(struct file *file, struct 
vm_area_struct *vma)
if (ret)
return ret;
 
+   if (info->memfile_node.flags & MEMFILE_F_USER_INACCESSIBLE)
+   return -EPERM;
+
/* arm64 - allow memory tagging on RAM-based files */
vma->vm_flags |= VM_MTE_ALLOWED;
 
@@ -2274,6 +2309,7 @@ static struct inode *shmem_get_inode(struct super_block 
*sb, const struct inode
info->i_crtime = inode->i_mtime;
INIT_LIST_HEAD(&info->shrinklist);
INIT_LIST_HEAD(&info->swaplist);
+   memfile_node_init(&info->memfile_node);
simple_xattrs_init(&info->xattrs);
cache_no_acl(inode);
mapping_set_large_folios(inode->i_mapping);
@@ -2442,6 +2478,8 @@ shmem_write_begin(struct file *file, struct address_space 
*mapping,
   

[PATCH v3 2/3] ui: Switch "-display sdl" to use the QAPI parser

2022-05-19 Thread Thomas Huth
The "-display sdl" option still uses a hand-crafted parser for its
parameters since we didn't want to drag an interface we considered
somewhat flawed into the QAPI schema. Since the flaws are gone now,
it's time to QAPIfy.

This introduces the new "DisplaySDL" QAPI struct that is used to hold
the parameters that are unique to the SDL display. The only specific
parameter is currently "grab-mod" that is used to specify the required
modifier keys to escape from the mouse grabbing mode.
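
For example, once this patch is applied the modifier could be selected on
the command line (hypothetical invocation, shown as a config fragment):

```shell
# Use right Ctrl (plus "G") to release the mouse grab instead of the
# default modifier combination
qemu-system-x86_64 -display sdl,grab-mod=rctrl
```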

Signed-off-by: Thomas Huth 
---
 qapi/ui.json| 26 ++-
 include/sysemu/sysemu.h |  2 --
 softmmu/globals.c   |  2 --
 softmmu/vl.c| 70 +
 ui/sdl2.c   | 10 ++
 5 files changed, 36 insertions(+), 74 deletions(-)

diff --git a/qapi/ui.json b/qapi/ui.json
index 11a827d10f..413371d5e8 100644
--- a/qapi/ui.json
+++ b/qapi/ui.json
@@ -1295,6 +1295,29 @@
   '*swap-opt-cmd': 'bool'
   } }
 
+##
+# @HotKeyMod:
+#
+# Set of modifier keys that need to be held for shortcut key actions.
+#
+# Since: 7.1
+##
+{ 'enum'  : 'HotKeyMod',
+  'data'  : [ 'lctrl-lalt', 'lshift-lctrl-lalt', 'rctrl' ] }
+
+##
+# @DisplaySDL:
+#
+# SDL2 display options.
+#
+# @grab-mod:  Modifier keys that should be pressed together with the
+# "G" key to release the mouse grab.
+#
+# Since: 7.1
+##
+{ 'struct'  : 'DisplaySDL',
+  'data': { '*grab-mod'   : 'HotKeyMod' } }
+
 ##
 # @DisplayType:
 #
@@ -1374,7 +1397,8 @@
   'curses': { 'type': 'DisplayCurses', 'if': 'CONFIG_CURSES' },
   'egl-headless': { 'type': 'DisplayEGLHeadless',
 'if': { 'all': ['CONFIG_OPENGL', 'CONFIG_GBM'] } },
-  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' }
+  'dbus': { 'type': 'DisplayDBus', 'if': 'CONFIG_DBUS_DISPLAY' },
+  'sdl': { 'type': 'DisplaySDL', 'if': 'CONFIG_SDL' }
   }
 }
 
diff --git a/include/sysemu/sysemu.h b/include/sysemu/sysemu.h
index b4030acd74..812f66a31a 100644
--- a/include/sysemu/sysemu.h
+++ b/include/sysemu/sysemu.h
@@ -42,8 +42,6 @@ extern int graphic_depth;
 extern int display_opengl;
 extern const char *keyboard_layout;
 extern int win2k_install_hack;
-extern int alt_grab;
-extern int ctrl_grab;
 extern int graphic_rotate;
 extern int old_param;
 extern uint8_t *boot_splash_filedata;
diff --git a/softmmu/globals.c b/softmmu/globals.c
index 916bc12e2b..527edbefdd 100644
--- a/softmmu/globals.c
+++ b/softmmu/globals.c
@@ -50,8 +50,6 @@ QEMUOptionRom option_rom[MAX_OPTION_ROMS];
 int nb_option_roms;
 int old_param;
 const char *qemu_name;
-int alt_grab;
-int ctrl_grab;
 unsigned int nb_prom_envs;
 const char *prom_envs[MAX_PROM_ENVS];
 uint8_t *boot_splash_filedata;
diff --git a/softmmu/vl.c b/softmmu/vl.c
index 57ab9d5322..484e9d9921 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -1056,75 +1056,7 @@ static void parse_display(const char *p)
 exit(0);
 }
 
-if (strstart(p, "sdl", &opts)) {
-/*
- * sdl DisplayType needs hand-crafted parser instead of
- * parse_display_qapi() due to some options not in
- * DisplayOptions, specifically:
- *   - ctrl_grab + alt_grab
- * They can't be moved into the QAPI since they use underscores,
- * thus they will get replaced by "grab-mod" in the long term
- */
-#if defined(CONFIG_SDL)
-dpy.type = DISPLAY_TYPE_SDL;
-while (*opts) {
-const char *nextopt;
-
-if (strstart(opts, ",grab-mod=", &nextopt)) {
-opts = nextopt;
-if (strstart(opts, "lshift-lctrl-lalt", &nextopt)) {
-alt_grab = 1;
-} else if (strstart(opts, "rctrl", &nextopt)) {
-ctrl_grab = 1;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",window-close=", &nextopt)) {
-opts = nextopt;
-dpy.has_window_close = true;
-if (strstart(opts, "on", &nextopt)) {
-dpy.window_close = true;
-} else if (strstart(opts, "off", &nextopt)) {
-dpy.window_close = false;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",show-cursor=", &nextopt)) {
-opts = nextopt;
-dpy.has_show_cursor = true;
-if (strstart(opts, "on", &nextopt)) {
-dpy.show_cursor = true;
-} else if (strstart(opts, "off", &nextopt)) {
-dpy.show_cursor = false;
-} else {
-goto invalid_sdl_args;
-}
-} else if (strstart(opts, ",gl=", &nextopt)) {
-opts = nextopt;
-dpy.has_gl = true;
-if (strstart(opts, "on", &nextopt)) {
-dpy.gl = DISPLAYGL_MODE_ON;
-} else if (strstart(opts, "core", &nextopt)) {
-dpy.gl = 

[PATCH v6 1/8] mm: Introduce memfile_notifier

2022-05-19 Thread Chao Peng
This patch introduces the memfile_notifier facility so that existing
memory file subsystems (e.g. tmpfs/hugetlbfs) can provide memory pages
to a third kernel component, which makes use of memory bookmarked in the
memory file and gets notified when pages in the memory file become
allocated/invalidated.

It will be used for KVM to use a file descriptor as the guest memory
backing store and KVM will use this memfile_notifier interface to
interact with memory file subsystems. In the future there might be other
consumers (e.g. VFIO with encrypted device memory).

It consists of the following components:
 - memfile_backing_store: Each supported memory file subsystem can be
   implemented as a memory backing store which bookmarks memory and
   provides callbacks for other kernel systems (memfile_notifier
   consumers) to interact with.
 - memfile_notifier: memfile_notifier consumers define callbacks and
   associate them with a file using memfile_register_notifier().
 - memfile_node: A memfile_node is associated with the file (inode) from
   the backing store and includes feature flags and a list of registered
   memfile_notifiers to notify.

Userspace is in charge of the guest memory lifecycle: it first allocates
pages in the memory backing store and then passes the fd to KVM, which
registers the memory slot with the backing store via
memfile_register_notifier().

Co-developed-by: Kirill A. Shutemov 
Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---
 include/linux/memfile_notifier.h |  99 ++
 mm/Kconfig   |   4 +
 mm/Makefile  |   1 +
 mm/memfile_notifier.c| 137 +++
 4 files changed, 241 insertions(+)
 create mode 100644 include/linux/memfile_notifier.h
 create mode 100644 mm/memfile_notifier.c

diff --git a/include/linux/memfile_notifier.h b/include/linux/memfile_notifier.h
new file mode 100644
index ..dcb3ee6ed626
--- /dev/null
+++ b/include/linux/memfile_notifier.h
@@ -0,0 +1,99 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#ifndef _LINUX_MEMFILE_NOTIFIER_H
+#define _LINUX_MEMFILE_NOTIFIER_H
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+
+#define MEMFILE_F_USER_INACCESSIBLE BIT(0) /* memory allocated in the file is inaccessible from userspace (e.g. read/write/mmap) */
+#define MEMFILE_F_UNMOVABLE BIT(1) /* memory allocated in the file is unmovable (e.g. via page migration) */
+#define MEMFILE_F_UNRECLAIMABLE BIT(2) /* memory allocated in the file is unreclaimable (e.g. via kswapd) */
+
+#define MEMFILE_F_ALLOWED_MASK (MEMFILE_F_USER_INACCESSIBLE | \
+   MEMFILE_F_UNMOVABLE | \
+   MEMFILE_F_UNRECLAIMABLE)
+
+struct memfile_node {
+   struct list_head notifiers; /* registered memfile_notifier list on the file */
+   unsigned long flags;        /* MEMFILE_F_* flags */
+};
+
+struct memfile_backing_store {
+   struct list_head list;
+   spinlock_t lock;
+   struct memfile_node* (*lookup_memfile_node)(struct file *file);
+   int (*get_lock_pfn)(struct file *file, pgoff_t offset, pfn_t *pfn,
+   int *order);
+   void (*put_unlock_pfn)(pfn_t pfn);
+};
+
+struct memfile_notifier;
+struct memfile_notifier_ops {
+   void (*populate)(struct memfile_notifier *notifier,
+pgoff_t start, pgoff_t end);
+   void (*invalidate)(struct memfile_notifier *notifier,
+  pgoff_t start, pgoff_t end);
+};
+
+struct memfile_notifier {
+   struct list_head list;
+   struct memfile_notifier_ops *ops;
+   struct memfile_backing_store *bs;
+};
+
+static inline void memfile_node_init(struct memfile_node *node)
+{
+   INIT_LIST_HEAD(&node->notifiers);
+   node->flags = 0;
+}
+
+#ifdef CONFIG_MEMFILE_NOTIFIER
+/* APIs for backing stores */
+extern void memfile_register_backing_store(struct memfile_backing_store *bs);
+extern int memfile_node_set_flags(struct file *file, unsigned long flags);
+extern void memfile_notifier_populate(struct memfile_node *node,
+ pgoff_t start, pgoff_t end);
+extern void memfile_notifier_invalidate(struct memfile_node *node,
+   pgoff_t start, pgoff_t end);
+/* APIs for notifier consumers */
+extern int memfile_register_notifier(struct file *file, unsigned long flags,
+struct memfile_notifier *notifier);
+extern void memfile_unregister_notifier(struct memfile_notifier *notifier);
+
+#else /* !CONFIG_MEMFILE_NOTIFIER */
+static void memfile_register_backing_store(struct memfile_backing_store *bs)
+{
+}
+
+static int memfile_node_set_flags(struct file *file, unsigned long flags)
+{
+   return -EOPNOTSUPP;
+}
+
+static void memfile_notifier_populate(struct memfile_node *node,
+ pgoff_t start, pgoff_t 

[PATCH v2 2/4] xlnx_dp: Introduce a vblank signal

2022-05-19 Thread Frederic Konrad via
From: Sai Pavan Boddu 

Add a periodic timer which raises vblank at a frequency of 30Hz.

Signed-off-by: Sai Pavan Boddu 
Signed-off-by: Edgar E. Iglesias 
Changes by fkonrad:
  - Switched to transaction-based ptimer API.
  - Added the DP_INT_VBLNK_START macro.
Signed-off-by: Frederic Konrad 
---
 hw/display/xlnx_dp.c | 27 ---
 include/hw/display/xlnx_dp.h |  3 +++
 2 files changed, 27 insertions(+), 3 deletions(-)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index 0378570459..2686ca0f2e 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -114,6 +114,7 @@
 #define DP_TX_N_AUD (0x032C >> 2)
 #define DP_TX_AUDIO_EXT_DATA(n) ((0x0330 + 4 * n) >> 2)
 #define DP_INT_STATUS   (0x03A0 >> 2)
+#define DP_INT_VBLNK_START  (1 << 13)
 #define DP_INT_MASK (0x03A4 >> 2)
 #define DP_INT_EN   (0x03A8 >> 2)
 #define DP_INT_DS   (0x03AC >> 2)
@@ -274,6 +275,10 @@ static const VMStateDescription vmstate_dp = {
 }
 };
 
+#define DP_VBLANK_PTIMER_POLICY (PTIMER_POLICY_WRAP_AFTER_ONE_PERIOD | \
+ PTIMER_POLICY_CONTINUOUS_TRIGGER |\
+ PTIMER_POLICY_NO_IMMEDIATE_TRIGGER)
+
 static void xlnx_dp_update_irq(XlnxDPState *s);
 
 static uint64_t xlnx_dp_audio_read(void *opaque, hwaddr offset, unsigned size)
@@ -773,6 +778,13 @@ static void xlnx_dp_write(void *opaque, hwaddr offset, 
uint64_t value,
 break;
 case DP_TRANSMITTER_ENABLE:
 s->core_registers[offset] = value & 0x01;
+ptimer_transaction_begin(s->vblank);
+if (value & 0x1) {
+ptimer_run(s->vblank, 0);
+} else {
+ptimer_stop(s->vblank);
+}
+ptimer_transaction_commit(s->vblank);
 break;
 case DP_FORCE_SCRAMBLER_RESET:
 /*
@@ -1177,9 +1189,6 @@ static void xlnx_dp_update_display(void *opaque)
 return;
 }
 
-s->core_registers[DP_INT_STATUS] |= (1 << 13);
-xlnx_dp_update_irq(s);
-
 xlnx_dpdma_trigger_vsync_irq(s->dpdma);
 
 /*
@@ -1275,6 +1284,14 @@ static void xlnx_dp_finalize(Object *obj)
  fifo8_destroy(&s->rx_fifo);
 }
 
+static void vblank_hit(void *opaque)
+{
+XlnxDPState *s = XLNX_DP(opaque);
+
+s->core_registers[DP_INT_STATUS] |= DP_INT_VBLNK_START;
+xlnx_dp_update_irq(s);
+}
+
 static void xlnx_dp_realize(DeviceState *dev, Error **errp)
 {
 XlnxDPState *s = XLNX_DP(dev);
@@ -1309,6 +1326,10 @@ static void xlnx_dp_realize(DeviceState *dev, Error 
**errp)
);
 AUD_set_volume_out(s->amixer_output_stream, 0, 255, 255);
 xlnx_dp_audio_activate(s);
+s->vblank = ptimer_init(vblank_hit, s, DP_VBLANK_PTIMER_POLICY);
+ptimer_transaction_begin(s->vblank);
+ptimer_set_freq(s->vblank, 30);
+ptimer_transaction_commit(s->vblank);
 }
 
 static void xlnx_dp_reset(DeviceState *dev)
diff --git a/include/hw/display/xlnx_dp.h b/include/hw/display/xlnx_dp.h
index 1ef5a89ee7..e86a87f235 100644
--- a/include/hw/display/xlnx_dp.h
+++ b/include/hw/display/xlnx_dp.h
@@ -35,6 +35,7 @@
 #include "hw/dma/xlnx_dpdma.h"
 #include "audio/audio.h"
 #include "qom/object.h"
+#include "hw/ptimer.h"
 
 #define AUD_CHBUF_MAX_DEPTH (32 * KiB)
 #define MAX_QEMU_BUFFER_SIZE(4 * KiB)
@@ -107,6 +108,8 @@ struct XlnxDPState {
  */
 DPCDState *dpcd;
 I2CDDCState *edid;
+
+ptimer_state *vblank;
 };
 
 #define TYPE_XLNX_DP "xlnx.v-dp"
-- 
2.25.1




[PATCH v2 0/4] xlnx-zcu102: fix the display port.

2022-05-19 Thread Frederic Konrad via
Hi,

This patch set fixes some issues with the DisplayPort for the ZCU102:

The first patch fixes the wrong register size and thus the risk of register
overflow.

The other three add a vblank interrupt required by the Linux driver:
  - When using the VNC graphics backend and leaving it unconnected, in the best
case the gfx_update callback is called once every 3000ms, which is
insufficient for the driver.  This is fixed by providing a VBLANK interrupt
from a ptimer.
  - This requirement revealed two issues with the IRQ numbers and the
interrupt disable logic, fixed by the last two patches.

Tested by booting Petalinux with the framebuffer enabled.

Best Regards,
Fred

v1 -> v2:
  * Better use of the ptimer API by using a correct POLICY as suggested
by Peter Maydell (Patch 2).
  * Rebased on 78ac2eeb.

Frederic Konrad (2):
  xlnx_dp: fix the wrong register size
  xlnx-zynqmp: fix the irq mapping for the display port and its dma

Sai Pavan Boddu (2):
  xlnx_dp: Introduce a vblank signal
  xlnx_dp: Fix the interrupt disable logic

 hw/arm/xlnx-zynqmp.c |  4 ++--
 hw/display/xlnx_dp.c | 46 +++-
 include/hw/display/xlnx_dp.h | 12 --
 3 files changed, 47 insertions(+), 15 deletions(-)

-- 
2.25.1




[PATCH v2 3/4] xlnx_dp: Fix the interrupt disable logic

2022-05-19 Thread Frederic Konrad via
From: Sai Pavan Boddu 

Fix the interrupt disable logic: a bit set to 1 in DP_INT_MASK indicates
that the corresponding interrupt is disabled, so a write to DP_INT_DS must
OR the written value into the mask, not its complement.

Signed-off-by: Sai Pavan Boddu 
Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Frederic Konrad 
---
 hw/display/xlnx_dp.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/hw/display/xlnx_dp.c b/hw/display/xlnx_dp.c
index 2686ca0f2e..48c0a8a661 100644
--- a/hw/display/xlnx_dp.c
+++ b/hw/display/xlnx_dp.c
@@ -888,7 +888,7 @@ static void xlnx_dp_write(void *opaque, hwaddr offset, 
uint64_t value,
 xlnx_dp_update_irq(s);
 break;
 case DP_INT_DS:
-s->core_registers[DP_INT_MASK] |= ~value;
+s->core_registers[DP_INT_MASK] |= value;
 xlnx_dp_update_irq(s);
 break;
 default:
-- 
2.25.1




[PATCH v2 4/4] xlnx-zynqmp: fix the irq mapping for the display port and its dma

2022-05-19 Thread Frederic Konrad via
When the display port was initially implemented, the device driver wasn't
using interrupts.  Now that the display port driver waits for the vblank
interrupt, it has been noticed that the IRQ mapping is wrong.  So use the
values from the Linux device tree and the UltraScale+ reference manual.

Signed-off-by: Frederic Konrad 
Reviewed-by: Edgar E. Iglesias 
---
 hw/arm/xlnx-zynqmp.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/arm/xlnx-zynqmp.c b/hw/arm/xlnx-zynqmp.c
index 375309e68e..383e177a00 100644
--- a/hw/arm/xlnx-zynqmp.c
+++ b/hw/arm/xlnx-zynqmp.c
@@ -60,10 +60,10 @@
 #define SERDES_SIZE 0x2
 
 #define DP_ADDR 0xfd4a
-#define DP_IRQ  113
+#define DP_IRQ  0x77
 
 #define DPDMA_ADDR  0xfd4c
-#define DPDMA_IRQ   116
+#define DPDMA_IRQ   0x7a
 
 #define APU_ADDR0xfd5c
 #define APU_IRQ 153
-- 
2.25.1




[PATCH v5 5/6] RFC qapi/device_add: handle the rom_order_override when cold-plugging

2022-05-19 Thread Damien Hedde
rom_set_order_override() and rom_reset_order_override() were called
in qemu_create_cli_devices() to set the rom_order_override value
once and for all when creating the devices added on the CLI.

Unfortunately this won't work with QAPI commands.

Move the calls inside device_add so that it is done in both cases:
+ CLI option: -device
+ QAPI command: device_add

rom_[set|reset]_order_override() are implemented in hw/core/loader.c.
They either do nothing or call fw_cfg_[set|reset]_order_override().
The latter functions are implemented in hw/nvram/fw_cfg.c and only
change an integer value of a "global" variable.
Consequently, there are no complex side effects involved and we can
safely move them from outside the -device option loop into the inner
function.

Signed-off-by: Damien Hedde 
---

I see two other ways to handle this:

1. Adding a new option to device_add.

We could add a new boolean option (_rom_order_override_ for example)
to the QAPI command. This flag would then be forced to "true" when
handling the "-device" parameter from the CLI.
The flag default could be:
- always false
- false on hot-plug, true on cold-plug.

2. Adding a new QAPI command.

We could add one or two commands to do the
rom_[set|reset]_order_override() operation.
---
 softmmu/qdev-monitor.c | 11 +++
 softmmu/vl.c   |  2 --
 2 files changed, 11 insertions(+), 2 deletions(-)

diff --git a/softmmu/qdev-monitor.c b/softmmu/qdev-monitor.c
index d68ef883b5..7cbee2b0d8 100644
--- a/softmmu/qdev-monitor.c
+++ b/softmmu/qdev-monitor.c
@@ -43,6 +43,7 @@
 #include "hw/qdev-properties.h"
 #include "hw/clock.h"
 #include "hw/boards.h"
+#include "hw/loader.h"
 
 /*
  * Aliases were a bad idea from the start.  Let's keep them
@@ -673,6 +674,10 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 return NULL;
 }
 
+if (!is_hotplug) {
+rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
+}
+
 /* create device */
 dev = qdev_new(driver);
 
@@ -714,6 +719,9 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts,
 if (!qdev_realize(DEVICE(dev), bus, errp)) {
 goto err_del_dev;
 }
+if (!is_hotplug) {
+rom_reset_order_override();
+}
 return dev;
 
 err_del_dev:
@@ -721,6 +729,9 @@ err_del_dev:
 object_unparent(OBJECT(dev));
 object_unref(OBJECT(dev));
 }
+if (!is_hotplug) {
+rom_reset_order_override();
+}
 return NULL;
 }
 
diff --git a/softmmu/vl.c b/softmmu/vl.c
index ea15e37973..5a0d54b595 100644
--- a/softmmu/vl.c
+++ b/softmmu/vl.c
@@ -2635,7 +2635,6 @@ static void qemu_create_cli_devices(void)
 }
 
 /* init generic devices */
-rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
 qemu_opts_foreach(qemu_find_opts("device"),
   device_init_func, NULL, &error_fatal);
 QTAILQ_FOREACH(opt, &device_opts, next) {
@@ -2652,7 +2651,6 @@ static void qemu_create_cli_devices(void)
 object_unref(OBJECT(dev));
 loc_pop(&opt->loc);
 }
-rom_reset_order_override();
 }
 
 static void qemu_machine_creation_done(void)
-- 
2.36.1




[PATCH v6 6/8] KVM: Handle page fault for private memory

2022-05-19 Thread Chao Peng
For a KVM_MEM_PRIVATE memslot, a page fault can carry the information of
whether the access is private; this can be filled in by architecture
code (like TDX code). To handle a page fault for such an access, KVM
maps the page only when this private property matches the host's view
of the page, which is decided by checking whether the corresponding
page is populated in the private fd. A page is considered private when
it is populated in the private fd; otherwise it's shared.

For a successful match, the private pfn is obtained with the
memfile_notifier callbacks from the private fd, and the shared pfn is
obtained with the existing get_user_pages().

For a failed match, KVM causes a KVM_EXIT_MEMORY_FAULT exit to
userspace. Userspace can then convert the memory between private/shared
from the host's view and retry the access.

Co-developed-by: Yu Zhang 
Signed-off-by: Yu Zhang 
Signed-off-by: Chao Peng 
---
 arch/x86/kvm/mmu.h  |  1 +
 arch/x86/kvm/mmu/mmu.c  | 70 +++--
 arch/x86/kvm/mmu/mmu_internal.h | 17 
 arch/x86/kvm/mmu/mmutrace.h |  1 +
 arch/x86/kvm/mmu/paging_tmpl.h  |  5 ++-
 include/linux/kvm_host.h| 22 +++
 6 files changed, 112 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/mmu.h b/arch/x86/kvm/mmu.h
index 7e258cc94152..c84835762249 100644
--- a/arch/x86/kvm/mmu.h
+++ b/arch/x86/kvm/mmu.h
@@ -176,6 +176,7 @@ struct kvm_page_fault {
 
/* Derived from mmu and global state.  */
const bool is_tdp;
+   const bool is_private;
const bool nx_huge_page_workaround_enabled;
 
/*
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index afe18d70ece7..e18460e0d743 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2899,6 +2899,9 @@ int kvm_mmu_max_mapping_level(struct kvm *kvm,
if (max_level == PG_LEVEL_4K)
return PG_LEVEL_4K;
 
+   if (kvm_slot_is_private(slot))
+   return max_level;
+
host_level = host_pfn_mapping_level(kvm, gfn, pfn, slot);
return min(host_level, max_level);
 }
@@ -3948,10 +3951,54 @@ static bool kvm_arch_setup_async_pf(struct kvm_vcpu 
*vcpu, gpa_t cr2_or_gpa,
  kvm_vcpu_gfn_to_hva(vcpu, gfn), &arch);
 }
 
+static inline u8 order_to_level(int order)
+{
+   enum pg_level level;
+
+   for (level = KVM_MAX_HUGEPAGE_LEVEL; level > PG_LEVEL_4K; level--)
+   if (order >= page_level_shift(level) - PAGE_SHIFT)
+   return level;
+   return level;
+}
+
+static int kvm_faultin_pfn_private(struct kvm_vcpu *vcpu,
+  struct kvm_page_fault *fault)
+{
+   int order;
+   struct kvm_memory_slot *slot = fault->slot;
+   bool private_exist = !kvm_private_mem_get_pfn(slot, fault->gfn,
+ &fault->pfn, &order);
+
+   if (fault->is_private != private_exist) {
+   if (private_exist)
+   kvm_private_mem_put_pfn(slot, fault->pfn);
+
+   vcpu->run->exit_reason = KVM_EXIT_MEMORY_FAULT;
+   if (fault->is_private)
+   vcpu->run->memory.flags = KVM_MEMORY_EXIT_FLAG_PRIVATE;
+   else
+   vcpu->run->memory.flags = 0;
+   vcpu->run->memory.padding = 0;
+   vcpu->run->memory.gpa = fault->gfn << PAGE_SHIFT;
+   vcpu->run->memory.size = PAGE_SIZE;
+   return RET_PF_USER;
+   }
+
+   if (fault->is_private) {
+   fault->max_level = min(order_to_level(order), fault->max_level);
+   fault->map_writable = !(slot->flags & KVM_MEM_READONLY);
+   return RET_PF_FIXED;
+   }
+
+   /* Fault is shared, fallthrough to the standard path. */
+   return RET_PF_CONTINUE;
+}
+
 static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct kvm_page_fault *fault)
 {
struct kvm_memory_slot *slot = fault->slot;
bool async;
+   int r;
 
/*
 * Retry the page fault if the gfn hit a memslot that is being deleted
@@ -3980,6 +4027,12 @@ static int kvm_faultin_pfn(struct kvm_vcpu *vcpu, struct 
kvm_page_fault *fault)
return RET_PF_EMULATE;
}
 
+   if (kvm_slot_is_private(slot)) {
+   r = kvm_faultin_pfn_private(vcpu, fault);
+   if (r != RET_PF_CONTINUE)
+   return r == RET_PF_FIXED ? RET_PF_CONTINUE : r;
+   }
+
async = false;
fault->pfn = __gfn_to_pfn_memslot(slot, fault->gfn, false, &async,
  fault->write, &fault->map_writable,
@@ -4028,8 +4081,11 @@ static bool is_page_fault_stale(struct kvm_vcpu *vcpu,
if (!sp && kvm_test_request(KVM_REQ_MMU_FREE_OBSOLETE_ROOTS, vcpu))
return true;
 
-   return fault->slot &&
-  mmu_notifier_retry_hva(vcpu->kvm, mmu_seq, fault->hva);
+   if (fault->is_private)
+   

[PATCH v5 1/6] machine: add phase_get() and document phase_check()/advance()

2022-05-19 Thread Damien Hedde
phase_get() returns the current phase; we'll use it in the next commit.

Signed-off-by: Damien Hedde 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/qdev-core.h | 19 +++
 hw/core/qdev.c |  5 +
 2 files changed, 24 insertions(+)

diff --git a/include/hw/qdev-core.h b/include/hw/qdev-core.h
index 92c3d65208..e29c705b74 100644
--- a/include/hw/qdev-core.h
+++ b/include/hw/qdev-core.h
@@ -887,7 +887,26 @@ typedef enum MachineInitPhase {
 PHASE_MACHINE_READY,
 } MachineInitPhase;
 
+/*
+ * phase_get:
+ * Returns the current phase
+ */
+MachineInitPhase phase_get(void);
+
+/**
+ * phase_check:
+ * Test if current phase is at least @phase.
+ *
+ * Returns true if this is the case.
+ */
 extern bool phase_check(MachineInitPhase phase);
+
+/**
+ * @phase_advance:
+ * Update the current phase to @phase.
+ *
+ * Must only be used to make a single phase step.
+ */
 extern void phase_advance(MachineInitPhase phase);
 
 #endif
diff --git a/hw/core/qdev.c b/hw/core/qdev.c
index 84f3019440..632dc0a4be 100644
--- a/hw/core/qdev.c
+++ b/hw/core/qdev.c
@@ -910,6 +910,11 @@ Object *qdev_get_machine(void)
 
 static MachineInitPhase machine_phase;
 
+MachineInitPhase phase_get(void)
+{
+return machine_phase;
+}
+
 bool phase_check(MachineInitPhase phase)
 {
 return machine_phase >= phase;
-- 
2.36.1



