date:20230518

Re: [PATCH v3 3/3] pci: ROM preallocation for incoming migration

2023-05-18 Thread Michael S. Tsirkin

On Mon, May 15, 2023 at 03:52:29PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> On incoming migration we have the following sequence to load option
> ROM:
> 
> 1. On device realize we do normal load ROM from the file
> 
> 2. Then, on incoming migration we rewrite ROM from the incoming RAM
>    block. If sizes mismatch we fail, like this:
> 
> Size mismatch: :00:03.0/virtio-net-pci.rom: 0x4 != 0x8: Invalid argument
> 
> This is not ideal when we migrate to an updated distribution: we have to
> keep old ROM files in the new distribution and be careful with the romfile
> property to load the correct ROM file, which is actually loaded just to
> allocate the ROM with the correct length.
> 
> Note that the romsize property doesn't really help: if we try to specify
> it when the default romfile is larger, it fails with something like:
> 
> romfile "efi-virtio.rom" (160768 bytes) is too large for ROM size 65536
> 
> Let's just ignore the ROM file when romsize is specified and we are in the
> incoming migration state. In other words, we only need to preallocate a
> ROM of the specified size; the local ROM file is unrelated.
> 
> This way:
> 
> If romsize was specified on the source, we just use the same command line
> as on the source, and migration will work independently of local ROM files
> on the target.
> 
> If romsize was not specified on the source (and we have a mismatching local
> ROM file on the target host), we have to specify romsize on the target to
> match the source romsize. The romfile parameter may be kept the same as on
> the source or dropped; the file is not loaded anyway.
> 
> As a bonus we avoid extra reading from ROM file on target.
> 
> Note: when the romsize parameter is absent from the source command line but
> is needed on the target, it may be calculated as the size of the source's
> ROM file rounded up to a power of two (if we know which file it is), or,
> alternatively, retrieved from the source QEMU with the QMP qom-get
> command, like:
> 
>   { "execute": "qom-get",
>     "arguments": {
>       "path": "/machine/peripheral/CARD_ID/virtio-net-pci.rom[0]",
>       "property": "size" } }
> 
> Suggested-by: Michael S. Tsirkin 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 
> Reviewed-by: David Hildenbrand 
> Reviewed-by: Juan Quintela 
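The note above derives the target's romsize by rounding the source ROM file size up to a power of two. A minimal sketch of that arithmetic, assuming the size fits in 64 bits; `rom_size_pow2()` is a hypothetical helper written for illustration, not QEMU's own `pow2ceil()`. For the 160768-byte efi-virtio.rom from the error message above it yields 262144, which is why a romsize of 65536 is rejected.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: smallest power of two >= size.
 * Returns 1 for size == 0; sizes above 2^63 are rejected by the assert
 * because the result would not fit in 64 bits.
 */
static uint64_t rom_size_pow2(uint64_t size)
{
    uint64_t p = 1;

    assert(size <= (UINT64_C(1) << 63));
    while (p < size) {
        p <<= 1;    /* double until we reach or pass the file size */
    }
    return p;
}
```

Alternatively, as the commit message says, querying the source's `size` property with qom-get avoids having to know which file the source loaded.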


Breaks build here:

In function ‘pci_add_option_rom’,
    inlined from ‘pci_qdev_realize’ at ../hw/pci/pci.c:2155:5:
../hw/pci/pci.c:2395:13: error: ‘size’ may be used uninitialized [-Werror=maybe-uninitialized]
 2395 |         if (load_image_size(path, ptr, size) < 0) {
      |             ^~~~
../hw/pci/pci.c: In function ‘pci_qdev_realize’:
../hw/pci/pci.c:2312:13: note: ‘size’ was declared here
 2312 |     int64_t size;
      |             ^~~~
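The warning appears because `size` is assigned only inside the new `if (load_file || pdev->romsize == -1)` block, while a later `load_image_size(path, ptr, size)` call reads it, and GCC cannot prove the two conditions line up. A hedged sketch of one way to make every path assign the value; `pick_rom_size()` and its parameters are hypothetical stand-ins for the fields used in pci_add_option_rom(), and this is not the fix that was eventually applied.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical sketch, not QEMU code: decide the ROM size.
 * file_size stands in for get_image_size(); romsize == -1 means the
 * romsize property was not set. Returns -1 on error.
 */
static int64_t pick_rom_size(bool load_file, int64_t romsize, int64_t file_size)
{
    int64_t size = -1;  /* initialized, so no path reads an indeterminate value */

    assert(romsize == -1 || romsize > 0);
    if (load_file || romsize == -1) {
        if (file_size <= 0) {
            return -1;                 /* missing or empty ROM file */
        }
        if (romsize != -1 && file_size > romsize) {
            return -1;                 /* file does not fit the requested romsize */
        }
        size = romsize != -1 ? romsize : file_size;
    } else {
        /* Incoming migration with explicit romsize: the file is never read. */
        size = romsize;
    }
    return size;
}
```

The design choice is simply that both branches write `size`, so the compiler no longer has to reason about which branch a later read depends on.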



> ---
>  hw/pci/pci.c | 77 ++--
>  1 file changed, 45 insertions(+), 32 deletions(-)
> 
> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
> index 3a0107758c..0f0c83c02f 100644
> --- a/hw/pci/pci.c
> +++ b/hw/pci/pci.c
> @@ -36,6 +36,7 @@
>  #include "migration/vmstate.h"
>  #include "net/net.h"
>  #include "sysemu/numa.h"
> +#include "sysemu/runstate.h"
>  #include "sysemu/sysemu.h"
>  #include "hw/loader.h"
>  #include "qemu/error-report.h"
> @@ -2308,10 +2309,16 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
>  {
>  int64_t size;
>  g_autofree char *path = NULL;
> -void *ptr;
>  char name[32];
>  const VMStateDescription *vmsd;
>  
> +/*
> + * In case of incoming migration ROM will come with migration stream, no
> + * reason to load the file.  Neither we want to fail if local ROM file
> + * mismatches with specified romsize.
> + */
> +bool load_file = !runstate_check(RUN_STATE_INMIGRATE);
> +
>  if (!pdev->romfile || !strlen(pdev->romfile)) {
>  return;
>  }
> @@ -2341,32 +2348,35 @@ static void pci_add_option_rom(PCIDevice *pdev, bool is_default_rom,
>  return;
>  }
>  
> -path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
> -if (path == NULL) {
> -path = g_strdup(pdev->romfile);
> -}
> +if (load_file || pdev->romsize == -1) {
> +path = qemu_find_file(QEMU_FILE_TYPE_BIOS, pdev->romfile);
> +if (path == NULL) {
> +path = g_strdup(pdev->romfile);
> +}
>  
> -size = get_image_size(path);
> -if (size < 0) {
> -error_setg(errp, "failed to find romfile \"%s\"", pdev->romfile);
> -return;
> -} else if (size == 0) {
> -error_setg(errp, "romfile \"%s\" is empty", pdev->romfile);
> -return;
> -} else if (size > 2 * GiB) {
> -error_setg(errp, "romfile \"%s\" too large (size cannot exceed 2 GiB)",
> -   pdev->romfile);
> -return;
> -}
> -if (pdev->romsize != -1) {
> -if (size > pdev->romsize) {
> -error_setg(errp, "romfile \"%s\" (%u bytes) "
> -   "is too large for ROM size %u",
> -   pdev->romfile,

Re: [PATCH v3 5/5] vdpa: move CVQ isolation check to net_init_vhost_vdpa

2023-05-18 Thread Eugenio Perez Martin

On Thu, May 18, 2023 at 11:23 PM Michael S. Tsirkin  wrote:
>
> On Thu, May 18, 2023 at 08:36:22AM +0200, Eugenio Perez Martin wrote:
> > On Thu, May 18, 2023 at 7:50 AM Jason Wang  wrote:
> > >
> > > On Wed, May 17, 2023 at 2:30 PM Eugenio Perez Martin
> > >  wrote:
> > > >
> > > > On Wed, May 17, 2023 at 5:59 AM Jason Wang  wrote:
> > > > >
> > > > > On Tue, May 9, 2023 at 11:44 PM Eugenio Pérez wrote:
> > > > > >
> > > > > > Evaluating it at start time instead of initialization time may make the
> > > > > > guest capable of dynamically adding or removing migration blockers.
> > > > > >
> > > > > > Also, moving to initialization reduces the number of ioctls in the
> > > > > > migration, reducing failure possibilities.
> > > > > >
> > > > > > As a drawback we need to check for CVQ isolation twice: one time with no
> > > > > > MQ negotiated and another one acking it, as long as the device supports
> > > > > > it.  This is because Vring ASID / group management is based on vq
> > > > > > indexes, but we don't know the index of CVQ before negotiating MQ.
> > > > > >
> > > > > > Signed-off-by: Eugenio Pérez 
> > > > > > ---
> > > > > > v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
> > > > > > v3: Only record cvq_isolated, true if the device has cvq isolated in
> > > > > > both !MQ and MQ configurations.
> > > > > > ---
> > > > > >  net/vhost-vdpa.c | 178 +++
> > > > > >  1 file changed, 135 insertions(+), 43 deletions(-)
> > > > > >
> > > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > > index 3fb833fe76..29054b77a9 100644
> > > > > > --- a/net/vhost-vdpa.c
> > > > > > +++ b/net/vhost-vdpa.c
> > > > > > @@ -43,6 +43,10 @@ typedef struct VhostVDPAState {
> > > > > >
> > > > > >  /* The device always have SVQ enabled */
> > > > > >  bool always_svq;
> > > > > > +
> > > > > > +/* The device can isolate CVQ in its own ASID */
> > > > > > +bool cvq_isolated;
> > > > > > +
> > > > > >  bool started;
> > > > > >  } VhostVDPAState;
> > > > > >
> > > > > > @@ -362,15 +366,8 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > > > >  .check_peer_type = vhost_vdpa_check_peer_type,
> > > > > >  };
> > > > > >
> > > > > > -/**
> > > > > > - * Get vring virtqueue group
> > > > > > - *
> > > > > > - * @device_fd  vdpa device fd
> > > > > > - * @vq_index   Virtqueue index
> > > > > > - *
> > > > > > - * Return -errno in case of error, or vq group if success.
> > > > > > - */
> > > > > > -static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > > > +static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
> > > > > > +  Error **errp)
> > > > > >  {
> > > > > >  struct vhost_vring_state state = {
> > > > > >  .index = vq_index,
> > > > > > @@ -379,8 +376,7 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > > >
> > > > > >  if (unlikely(r < 0)) {
> > > > > >  r = -errno;
> > > > > > -error_report("Cannot get VQ %u group: %s", vq_index,
> > > > > > - g_strerror(errno));
> > > > > > +error_setg_errno(errp, errno, "Cannot get VQ %u group", vq_index);
> > > > > >  return r;
> > > > > >  }
> > > > > >
> > > > > > @@ -480,9 +476,9 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > > >  {
> > > > > >  VhostVDPAState *s, *s0;
> > > > > >  struct vhost_vdpa *v;
> > > > > > -uint64_t backend_features;
> > > > > >  int64_t cvq_group;
> > > > > > -int cvq_index, r;
> > > > > > +int r;
> > > > > > +Error *err = NULL;
> > > > > >
> > > > > >  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > > >
> > > > > > @@ -502,41 +498,22 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > > >  /*
> > > > > >   * If we early return in these cases SVQ will not be enabled. The migration
> > > > > >   * will be blocked as long as vhost-vdpa backends will not offer _F_LOG.
> > > > > > - *
> > > > > > - * Calling VHOST_GET_BACKEND_FEATURES as they are not available in v->dev
> > > > > > - * yet.
> > > > > >   */
> > > > > > -r = ioctl(v->device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> > > > > > -if (unlikely(r < 0)) {
> > > > > > -error_report("Cannot get vdpa backend_features: %s(%d)",
> > > > > > -g_strerror(errno), errno);
> > > > > > -return -1;
> > > > > > +if (!vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> > > > > > +return 0;
> > > > > >  }
> > > > > > -if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) ||
> > > > > > -
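The commit message's point that the CVQ index is unknown until MQ is negotiated follows from the virtio-net virtqueue layout: queues come in receive/transmit pairs and the control queue follows the last pair, so its index is 2 * N, where N is 1 unless VIRTIO_NET_F_MQ (or, per the virtio spec, VIRTIO_NET_F_RSS) is negotiated, in which case N is max_virtqueue_pairs. A small sketch of that rule; `cvq_index()` is a hypothetical helper, not a function from net/vhost-vdpa.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: index of the virtio-net control virtqueue.
 * Layout is rx0, tx0, rx1, tx1, ..., then the CVQ after the last pair.
 */
static unsigned cvq_index(bool mq_negotiated, uint16_t max_vq_pairs)
{
    unsigned pairs = mq_negotiated ? max_vq_pairs : 1;

    assert(pairs >= 1);
    return 2 * pairs;
}
```

This is why the isolation check has to query the vring group twice: once for index 2 (no MQ) and once for 2 * max_virtqueue_pairs (MQ acked), since the same device can place the CVQ at either index.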

Re: [PULL 00/12] Migration 20230518 patches

2023-05-18 Thread Richard Henderson


On 5/18/23 10:12, Juan Quintela wrote:

The following changes since commit 266ccbb27b3ec6661f22395ec2c41d854c94d761:

   Merge tag 'pull-target-arm-20230518' of https://git.linaro.org/people/pmaydell/qemu-arm into staging (2023-05-18 06:08:30 -0700)

are available in the Git repository at:

   https://gitlab.com/juan.quintela/qemu.git tags/migration-20230518-pull-request

for you to fetch changes up to ba9d2cbc01b4e33f9a97edcd77247831a333eac2:

   migration: Fix duplicated included in meson.build (2023-05-18 18:41:53 +0200)


Migration Pull request

Hi

Based on latest reviewed parts of migration:
- Disable colo (vladimir)
- Migration atomic counters (juan)

Please apply.


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/8.1 as appropriate.


r~

RE: [PATCH] multifd: Set a higher "backlog" default value for listen()

2023-05-18 Thread Wang, Wei W

On Friday, May 19, 2023 10:52 AM, Wang, Lei4 wrote:
> > We can change it to uint16_t or uint32_t, but need to see if listening
> > on a larger value is OK to everyone.
> 
> Is there any use case for >256 migration channels? If not, then I suppose
> there's no need to increase it.

People can choose to use more than 256 channels to boost performance.
If it is determined that using more than 256 channels doesn't increase
performance on any of the existing platforms, then we need to have that
reflected in the code explicitly, e.g. fail with an error message when the
user does:
migrate_set_parameter multifd-channels 512
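Failing early with an error, as suggested above, could look roughly like this sketch. It assumes the limit is the range of the uint8_t that multifd-channels is said to use later in this thread, i.e. at most UINT8_MAX (255); `check_multifd_channels()` is hypothetical and does not reproduce QEMU's actual parameter validation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/*
 * Hypothetical validation: accept 1..UINT8_MAX channels; otherwise write
 * an error message into errbuf and return false.
 */
static bool check_multifd_channels(long requested, char *errbuf, size_t errlen)
{
    assert(errbuf != NULL && errlen > 0);
    if (requested < 1 || requested > UINT8_MAX) {
        snprintf(errbuf, errlen,
                 "multifd-channels must be between 1 and %d, got %ld",
                 UINT8_MAX, requested);
        return false;
    }
    errbuf[0] = '\0';
    return true;
}
```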

> 
> >
> > The listen(2) man page mentions that the maximum length of the queue for
> > incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog,
> > and it is 4096 by default on my machine.

[PATCH] Makefile: add file entry to ctags

2023-05-18 Thread Fei Wu

It's more convenient to jump among files with --extra=+fq.

Signed-off-by: Fei Wu 
---
 Makefile | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/Makefile b/Makefile
index 3c7d67142f..ffb3bcd4f4 100644
--- a/Makefile
+++ b/Makefile
@@ -239,8 +239,8 @@ ctags:
rm -f "$(SRC_PATH)/"tags,   \
"CTAGS", "Remove old tags")
$(call quiet-command, \
-   $(find-src-path) -exec ctags\
-   -f "$(SRC_PATH)/"tags --append {} +,\
+   $(find-src-path) -exec ctags --extra=+fq\
+   -f "$(SRC_PATH)/"tags --append {} +,\
"CTAGS", "Re-index $(SRC_PATH)")
 
 .PHONY: gtags
-- 
2.25.1

Re: [PULL 00/68] i386, build system, KVM changes for 2023-05-18

2023-05-18 Thread Yang Zhong



Paolo, please help add the queued SGX fix below to this PULL request; it was
missed last time. Thanks a lot!
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00841.html
https://lists.nongnu.org/archive/html/qemu-devel/2023-04/msg00896.html

Regards,
Yang

Configure no longer works after pulling in the latest QEMU commits

2023-05-18 Thread Hao Xiang

Hi,

After pulling in the latest QEMU commits, I can no longer run the
"configure" command. Below is the error message I am seeing. I believe this
is related to the change "configure: create a python venv
unconditionally".
I am running Debian GNU/Linux 11 (bullseye) with a 5.15 kernel.
What can I do to fix or work around the issue?

Thanks, Hao

$ ~/source/qemu-community-trees/qemu/bin/debug/test$ ../../../configure --enable-debug --static --disable-gnutls
python determined to be '/usr/bin/python3'
python version: Python 3.9.2
mkvenv: Creating non-isolated virtual environment at 'pyvenv'
Skipping existing file /data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/bin/pip
Skipping existing file /data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/bin/pip3
mkvenv: checking for meson>=0.63.0
Metadata: missing: ['Author'], warnings: []
Metadata: missing: ['Home-page', 'Author'], warnings: []
Metadata: missing: ['Home-page', 'Author'], warnings: []
Not recognised as a requirement: ''
Unexpected line: quitting requirement scan: '[docs]'
Not recognised as a requirement: ''
Unexpected line: quitting requirement scan: '[jedi]'
Not recognised as a requirement: ''
Unexpected line: quitting requirement scan: "[:python_version < '3']"
Metadata: missing: ['Home-page', 'Author'], warnings: []
Not recognised as a requirement: ''
Unexpected line: quitting requirement scan: '[ARC]'
Not recognised as a requirement: ''
Unexpected line: quitting requirement scan: '[DNSSEC]'
mkvenv did not complete successfully:
Traceback (most recent call last):
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/metadata.py", line 730, in __init__
    self._data = json.loads(data)
  File "/usr/lib/python3.9/json/__init__.py", line 346, in loads
    return _default_decoder.decode(s)
  File "/usr/lib/python3.9/json/decoder.py", line 337, in decode
    obj, end = self.raw_decode(s, idx=_w(s, 0).end())
  File "/usr/lib/python3.9/json/decoder.py", line 355, in raw_decode
    raise JSONDecodeError("Expecting value", s, err.value) from None
json.decoder.JSONDecodeError: Expecting value: line 1 column 1 (char 0)

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/home/hao.xiang/source/qemu-community-trees/qemu/python/scripts/mkvenv.py", line 877, in main
    ensure(
  File "/home/hao.xiang/source/qemu-community-trees/qemu/python/scripts/mkvenv.py", line 768, in ensure
    _do_ensure(dep_specs, online, wheels_dir)
  File "/home/hao.xiang/source/qemu-community-trees/qemu/python/scripts/mkvenv.py", line 723, in _do_ensure
    dist = dist_path.get_distribution(matcher.name)
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/database.py", line 240, in get_distribution
    self._generate_cache()
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/database.py", line 167, in _generate_cache
    for dist in self._yield_distributions():
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/database.py", line 157, in _yield_distributions
    yield old_dist_class(r.path, self)
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/database.py", line 878, in __init__
    metadata = self._get_metadata(path)
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/database.py", line 958, in _get_metadata
    metadata = Metadata(path=path, scheme='legacy')
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/metadata.py", line 741, in __init__
    self.validate()
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/metadata.py", line 958, in validate
    missing, warnings = self._legacy.check(True)
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/metadata.py", line 522, in check
    self.set_metadata_version()
  File "/data00/home/hao.xiang/source/qemu-community-trees/qemu/bin/debug/test/pyvenv/share/python-wheels/distlib-0.2.8-py2.py3-none-any.whl/distlib/metadata.py", line 289, in set_metadata_version
    self._fields['Metadata-Version'] = _best_version(self._fields)
  File

Re: [PATCH] multifd: Set a higher "backlog" default value for listen()

2023-05-18 Thread Wang, Lei

On 5/19/2023 10:44, Wang, Wei W wrote:
> On Friday, May 19, 2023 9:31 AM, Wang, Lei4 wrote:
>> On 5/18/2023 17:16, Juan Quintela wrote:
>>> Lei Wang  wrote:
>>>> When destination VM is launched, the "backlog" parameter for listen()
>>>> is set to 1 as default in socket_start_incoming_migration_internal(),
>>>> which will lead to socket connection error (the queue of pending
>>>> connections is full) when "multifd" and "multifd-channels" are set
>>>> later on and a high number of channels are used. Set it to a
>>>> hard-coded higher default value 512 to fix this issue.
>>>>
>>>> Reported-by: Wei Wang 
>>>> Signed-off-by: Lei Wang 
>>>
>>> [cc'd Daniel who is the maintainer of qio]
>>>
>>> My understanding of that value is that 230 or something like that
>>> would be more than enough.  The maximum number of multifd channels is
>>> 256.
>>
>> You are right, the "multifd-channels" expects uint8_t, so 256 is enough.
>>
> 
> We can change it to uint16_t or uint32_t, but need to see if listening on a
> larger value is OK to everyone.

Is there any use case for >256 migration channels? If not, then I suppose
there's no need to increase it.

> 
> The listen(2) man page mentions that the maximum length of the queue for
> incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog,
> and it is 4096 by default on my machine.

RE: [PATCH] multifd: Set a higher "backlog" default value for listen()

2023-05-18 Thread Wang, Wei W

On Friday, May 19, 2023 9:31 AM, Wang, Lei4 wrote:
> On 5/18/2023 17:16, Juan Quintela wrote:
> > Lei Wang  wrote:
> >> When destination VM is launched, the "backlog" parameter for listen()
> >> is set to 1 as default in socket_start_incoming_migration_internal(),
> >> which will lead to socket connection error (the queue of pending
> >> connections is full) when "multifd" and "multifd-channels" are set
> >> later on and a high number of channels are used. Set it to a
> >> hard-coded higher default value 512 to fix this issue.
> >>
> >> Reported-by: Wei Wang 
> >> Signed-off-by: Lei Wang 
> >
> > [cc'd Daniel who is the maintainer of qio]
> >
> > My understanding of that value is that 230 or something like that
> > would be more than enough.  The maximum number of multifd channels is
> > 256.
> 
> You are right, the "multifd-channels" expects uint8_t, so 256 is enough.
> 

We can change it to uint16_t or uint32_t, but need to see if listening on a
larger value is OK to everyone.

The listen(2) man page mentions that the maximum length of the queue for
incomplete sockets can be set using /proc/sys/net/ipv4/tcp_max_syn_backlog,
and it is 4096 by default on my machine.
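For context on the value being discussed: per listen(2), on Linux the backlog argument bounds the queue of fully established connections waiting to be accept()ed, values above /proc/sys/net/core/somaxconn are silently capped, and tcp_max_syn_backlog governs the separate queue of incomplete connections. A minimal POSIX sketch of sizing the backlog from the expected number of channels instead of hard-coding 1 or 512; `incoming_backlog()` and `listen_for_migration()` are hypothetical, and the "one slot per multifd channel plus the main channel" rule is an assumption, not what QEMU does.

```c
#include <assert.h>
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdbool.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical rule: one slot per multifd channel, plus the main channel. */
static int incoming_backlog(bool multifd, int multifd_channels)
{
    assert(!multifd || multifd_channels > 0);
    return multifd ? multifd_channels + 1 : 1;
}

/* Listen on 127.0.0.1:port with the given backlog; returns fd or -1. */
static int listen_for_migration(unsigned short port, int backlog)
{
    struct sockaddr_in addr;
    int fd = socket(AF_INET, SOCK_STREAM, 0);

    if (fd < 0) {
        return -1;
    }
    memset(&addr, 0, sizeof(addr));
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    addr.sin_addr.s_addr = htonl(INADDR_LOOPBACK);
    if (bind(fd, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(fd, backlog) < 0) {
        close(fd);
        return -1;
    }
    return fd;
}
```

For example, `listen_for_migration(0, incoming_backlog(true, 16))` asks the kernel for an ephemeral port with a backlog of 17. Sizing from the channel count only works if the count is known before listen() is called, which is the ordering question discussed in the deferred-incoming thread.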

[PATCH v5] hw/riscv: qemu crash when NUMA nodes exceed available CPUs

2023-05-18 Thread Yin Wang

The command "qemu-system-riscv64 -machine virt -m 2G -smp 1
-numa node,mem=1G -numa node,mem=1G" triggers this problem, with the
following backtrace:
 #0  0x55b5b1a4 in riscv_numa_get_default_cpu_node_id  at ../hw/riscv/numa.c:211
 #1  0x558ce510 in machine_numa_finish_cpu_init  at ../hw/core/machine.c:1230
 #2  0x558ce9d3 in machine_run_board_init  at ../hw/core/machine.c:1346
 #3  0x55aaedc3 in qemu_init_board  at ../softmmu/vl.c:2513
 #4  0x55aaf064 in qmp_x_exit_preconfig  at ../softmmu/vl.c:2609
 #5  0x55ab1916 in qemu_init  at ../softmmu/vl.c:3617
 #6  0x5585463b in main  at ../softmmu/main.c:47
This commit fixes the issue by adding parameter checks.

Reviewed-by: Alistair Francis 
Reviewed-by: Daniel Henrique Barboza 
Reviewed-by: LIU Zhiwei 
Reviewed-by: Weiwei Li 
Signed-off-by: Yin Wang 
---
 hw/riscv/numa.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/hw/riscv/numa.c b/hw/riscv/numa.c
index 4720102561..e0414d5b1b 100644
--- a/hw/riscv/numa.c
+++ b/hw/riscv/numa.c
@@ -207,6 +207,12 @@ int64_t riscv_numa_get_default_cpu_node_id(const MachineState *ms, int idx)
 {
 int64_t nidx = 0;
 
+if (ms->numa_state->num_nodes > ms->smp.cpus) {
+error_report("Number of NUMA nodes (%d)"
+ " cannot exceed the number of available CPUs (%d).",
+ ms->numa_state->num_nodes, ms->smp.max_cpus);
+exit(EXIT_FAILURE);
+}
 if (ms->numa_state->num_nodes) {
 nidx = idx / (ms->smp.cpus / ms->numa_state->num_nodes);
 if (ms->numa_state->num_nodes <= nidx) {
-- 
2.34.1
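The crash in frame #0 is a division by zero: with -smp 1 and two NUMA nodes, `ms->smp.cpus / ms->numa_state->num_nodes` is 1 / 2 == 0 in integer arithmetic, so `idx / 0` traps, which matches numa.c:211 in the backtrace. A standalone sketch of the same mapping with the guard placed first; `default_node_id()` is hypothetical, returns -1 where the patch calls exit(), and assumes the clamp to the last node that the hunk only partly shows. Note that the posted check compares against `ms->smp.cpus` but prints `ms->smp.max_cpus`; the sketch uses a single value for both.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical mirror of the default CPU-to-node mapping: CPUs are split
 * into num_nodes equal blocks and any leftover CPUs go to the last node.
 * Returns -1 where the patch reports an error and exits.
 */
static int64_t default_node_id(int idx, unsigned cpus, unsigned num_nodes)
{
    int64_t nidx = 0;

    assert(idx >= 0);
    if (num_nodes > cpus) {
        return -1;              /* cpus / num_nodes would be 0: divide by zero */
    }
    if (num_nodes) {
        nidx = idx / (cpus / num_nodes);
        if (num_nodes <= nidx) {
            nidx = num_nodes - 1;   /* clamp leftovers to the last node */
        }
    }
    return nidx;
}
```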

RE: [PATCH v1] migration: fail the cap check if it requires the use of deferred incoming

2023-05-18 Thread Wang, Wei W

On Friday, May 19, 2023 3:20 AM, Peter Xu wrote:
> On Fri, May 19, 2023 at 12:00:26AM +0800, Wei Wang wrote:
> > qemu_start_incoming_migration needs to check the number of multifd
> > channels or postcopy ram channels to configure the backlog parameter (i.e.
> > the maximum length to which the queue of pending connections for
> > sockfd may grow) of listen(). So multifd and postcopy-preempt caps
> > require the use of deferred incoming, that is, calling
> > qemu_start_incoming_migration should be deferred via qmp or hmp
> > commands after the cap of multifd and postcopy-preempt are configured.
> >
> > Check if deferred incoming is used when enabling multifd or
> > postcopy-preempt, and fail the check with error messages if not.
> >
> > Signed-off-by: Wei Wang 
> 
> IIUC this will unfortunately break things like:
> 
>   -global migration.x-postcopy-preempt=on
> 
> where the cap is actually applied before incoming starts even with !defer so
> it should still work.

Actually the patch doesn’t check "!defer". It just checks whether incoming has
been started or not. It allows the 2 caps to be set only before incoming
starts. So I think the above should work.

> 
> Can we just make socket_start_incoming_migration_internal() listen on a
> static but larger value?

Yes, I agree with this, and that was our initial change.
This needs listen() to create a longer queue for pending connections (which
seems OK to me). We need to see Daniel's and Juan's opinions on this.
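The rule described above, that the patch checks only whether incoming migration has started rather than whether `-incoming defer` was used, can be sketched as follows. The enum and `cap_change_allowed()` are hypothetical and do not reflect QEMU's migration capability code; they only show why `-global migration.x-postcopy-preempt=on` would still pass, since it is applied before incoming starts.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical capability ids, for illustration only. */
enum mig_cap { CAP_MULTIFD, CAP_POSTCOPY_PREEMPT, CAP_OTHER };

/*
 * Caps that decide how many channels the destination must accept may only
 * change before incoming migration has started listening.
 */
static bool cap_change_allowed(enum mig_cap cap, bool incoming_started,
                               char *errbuf, size_t errlen)
{
    bool sizes_backlog = cap == CAP_MULTIFD || cap == CAP_POSTCOPY_PREEMPT;

    assert(errbuf != NULL && errlen > 0);
    errbuf[0] = '\0';
    if (sizes_backlog && incoming_started) {
        snprintf(errbuf, errlen,
                 "capability must be set before incoming migration starts");
        return false;
    }
    return true;
}
```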

[PATCH 4/7] disas/riscv.c: Support disas for Z*inx extensions

2023-05-18 Thread Weiwei Li

Support disas for Z*inx instructions only when Zfinx extension is supported.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 9e01810eef..a370bac6ef 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -4590,16 +4590,24 @@ static void format_inst(char *buf, size_t buflen, size_t tab, rv_decode *dec)
 append(buf, rv_ireg_name_sym[dec->rs2], buflen);
 break;
 case '3':
-append(buf, rv_freg_name_sym[dec->rd], buflen);
+append(buf, dec->cfg->ext_zfinx ? rv_ireg_name_sym[dec->rd] :
+  rv_freg_name_sym[dec->rd],
+   buflen);
 break;
 case '4':
-append(buf, rv_freg_name_sym[dec->rs1], buflen);
+append(buf, dec->cfg->ext_zfinx ? rv_ireg_name_sym[dec->rs1] :
+  rv_freg_name_sym[dec->rs1],
+   buflen);
 break;
 case '5':
-append(buf, rv_freg_name_sym[dec->rs2], buflen);
+append(buf, dec->cfg->ext_zfinx ? rv_ireg_name_sym[dec->rs2] :
+  rv_freg_name_sym[dec->rs2],
+   buflen);
 break;
 case '6':
-append(buf, rv_freg_name_sym[dec->rs3], buflen);
+append(buf, dec->cfg->ext_zfinx ? rv_ireg_name_sym[dec->rs3] :
+  rv_freg_name_sym[dec->rs3],
+   buflen);
 break;
 case '7':
 snprintf(tmp, sizeof(tmp), "%d", dec->rs1);
-- 
2.25.1
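The patch's `dec->cfg->ext_zfinx ? rv_ireg_name_sym[...] : rv_freg_name_sym[...]` choice reflects that under Zfinx, floating-point instructions operate on the integer register file, so an operand field that normally names f10 must be printed as a0. A self-contained sketch of that selection; the tables use the standard RISC-V ABI register names rather than being copied from disas/riscv.c, and `fp_operand_name()` is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* ABI names, indexed by register number (x0-x31). */
static const char *const ireg_names[32] = {
    "zero", "ra", "sp", "gp", "tp", "t0", "t1", "t2",
    "s0", "s1", "a0", "a1", "a2", "a3", "a4", "a5",
    "a6", "a7", "s2", "s3", "s4", "s5", "s6", "s7",
    "s8", "s9", "s10", "s11", "t3", "t4", "t5", "t6",
};

/* ABI names, indexed by register number (f0-f31). */
static const char *const freg_names[32] = {
    "ft0", "ft1", "ft2", "ft3", "ft4", "ft5", "ft6", "ft7",
    "fs0", "fs1", "fa0", "fa1", "fa2", "fa3", "fa4", "fa5",
    "fa6", "fa7", "fs2", "fs3", "fs4", "fs5", "fs6", "fs7",
    "fs8", "fs9", "fs10", "fs11", "ft8", "ft9", "ft10", "ft11",
};

/* With Zfinx, an FP operand field names an integer register. */
static const char *fp_operand_name(bool ext_zfinx, unsigned reg)
{
    assert(reg < 32);
    return ext_zfinx ? ireg_names[reg] : freg_names[reg];
}
```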

[PATCH 7/7] disas/riscv.c: Remove redundant parentheses

2023-05-18 Thread Weiwei Li

Remove redundant parentheses and fix multi-line comments.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c | 219 +-
 1 file changed, 110 insertions(+), 109 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index ee50a4ab0c..47c325c0d6 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -2386,9 +2386,9 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 {
 rv_inst inst = dec->inst;
 rv_opcode op = rv_op_illegal;
-switch (((inst >> 0) & 0b11)) {
+switch ((inst >> 0) & 0b11) {
 case 0:
-switch (((inst >> 13) & 0b111)) {
+switch ((inst >> 13) & 0b111) {
 case 0: op = rv_op_c_addi4spn; break;
 case 1:
 if (isa == rv128) {
@@ -2441,9 +2441,9 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 1:
-switch (((inst >> 13) & 0b111)) {
+switch ((inst >> 13) & 0b111) {
 case 0:
-switch (((inst >> 2) & 0b111)) {
+switch ((inst >> 2) & 0b111) {
 case 0: op = rv_op_c_nop; break;
 default: op = rv_op_c_addi; break;
 }
@@ -2457,13 +2457,13 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 break;
 case 2: op = rv_op_c_li; break;
 case 3:
-switch (((inst >> 7) & 0b1)) {
+switch ((inst >> 7) & 0b1) {
 case 2: op = rv_op_c_addi16sp; break;
 default: op = rv_op_c_lui; break;
 }
 break;
 case 4:
-switch (((inst >> 10) & 0b11)) {
+switch ((inst >> 10) & 0b11) {
 case 0:
 op = rv_op_c_srli;
 break;
@@ -2500,7 +2500,7 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 2:
-switch (((inst >> 13) & 0b111)) {
+switch ((inst >> 13) & 0b111) {
 case 0:
 op = rv_op_c_slli;
 break;
@@ -2520,17 +2520,17 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 4:
-switch (((inst >> 12) & 0b1)) {
+switch ((inst >> 12) & 0b1) {
 case 0:
-switch (((inst >> 2) & 0b1)) {
+switch ((inst >> 2) & 0b1) {
 case 0: op = rv_op_c_jr; break;
 default: op = rv_op_c_mv; break;
 }
 break;
 case 1:
-switch (((inst >> 2) & 0b1)) {
+switch ((inst >> 2) & 0b1) {
 case 0:
-switch (((inst >> 7) & 0b1)) {
+switch ((inst >> 7) & 0b1) {
 case 0: op = rv_op_c_ebreak; break;
 default: op = rv_op_c_jalr; break;
 }
@@ -2602,9 +2602,9 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 3:
-switch (((inst >> 2) & 0b1)) {
+switch ((inst >> 2) & 0b1) {
 case 0:
-switch (((inst >> 12) & 0b111)) {
+switch ((inst >> 12) & 0b111) {
 case 0: op = rv_op_lb; break;
 case 1: op = rv_op_lh; break;
 case 2: op = rv_op_lw; break;
@@ -2616,17 +2616,17 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 1:
-switch (((inst >> 12) & 0b111)) {
+switch ((inst >> 12) & 0b111) {
 case 0:
-switch (((inst >> 20) & 0b)) {
+switch ((inst >> 20) & 0b) {
 case 40: op = rv_op_vl1re8_v; break;
 case 552: op = rv_op_vl2re8_v; break;
 case 1576: op = rv_op_vl4re8_v; break;
 case 3624: op = rv_op_vl8re8_v; break;
 }
-switch (((inst >> 26) & 0b111)) {
+switch ((inst >> 26) & 0b111) {
 case 0:
-switch (((inst >> 20) & 0b1)) {
+switch ((inst >> 20) & 0b1) {
 case 0: op = rv_op_vle8_v; break;
 case 11: op = rv_op_vlm_v; break;
 case 16: op = rv_op_vle8ff_v; break;
@@ -2641,15 +2641,15 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 case 3: op = rv_op_fld; break;
 case 4: op = rv_op_flq; break;
 case 5:
-switch (((inst >> 20) & 0b)) {
+switch ((inst >> 20) & 0b) {
 case 40: op = rv_op_vl1re16_v; break;
 case 552: op = rv_op_vl2re16_v; break;
 case 1576: op = rv_op_vl4re16_v; break;
 case 3624: op = rv_op_vl8re16_v; break;
 }
-switch (((inst >>

[PATCH 3/7] disas/riscv.c: Support disas for Zcm* extensions

2023-05-18 Thread Weiwei Li

Support disas for Zcm* instructions only when related extensions
are supported.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c | 20 
 1 file changed, 12 insertions(+), 8 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 729ab684da..9e01810eef 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -2501,7 +2501,7 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 op = rv_op_c_sqsp;
 } else {
 op = rv_op_c_fsdsp;
-if (((inst >> 12) & 0b01)) {
+if (dec->cfg->ext_zcmp && ((inst >> 12) & 0b01)) {
 switch ((inst >> 8) & 0b0) {
 case 8:
 if (((inst >> 4) & 0b0) >= 4) {
@@ -2527,16 +2527,20 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 } else {
 switch ((inst >> 10) & 0b011) {
 case 0:
-if (((inst >> 2) & 0xFF) >= 32) {
-op = rv_op_cm_jalt;
-} else {
-op = rv_op_cm_jt;
+if (dec->cfg->ext_zcmt) {
+if (((inst >> 2) & 0xFF) >= 32) {
+op = rv_op_cm_jalt;
+} else {
+op = rv_op_cm_jt;
+}
 }
 break;
 case 3:
-switch ((inst >> 5) & 0b011) {
-case 1: op = rv_op_cm_mvsa01; break;
-case 3: op = rv_op_cm_mva01s; break;
+if (dec->cfg->ext_zcmp) {
+switch ((inst >> 5) & 0b011) {
+case 1: op = rv_op_cm_mvsa01; break;
+case 3: op = rv_op_cm_mva01s; break;
+}
 }
 break;
 }
-- 
2.25.1
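The new `ext_zcmt` guard wraps an existing index test: bits [9:2] of the 16-bit instruction form the jump-table index, and indices of 32 and above decode as cm.jalt while lower ones decode as cm.jt. A sketch of just that step, assuming the caller has already matched the rest of the encoding (quadrant, funct3 and bits [12:10]) as decode_inst_opcode() does; `decode_cm_jt()` and the enum are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum cm_op { CM_ILLEGAL, CM_JT, CM_JALT };

/* Classify an already-matched cm.jt/cm.jalt encoding by its table index. */
static enum cm_op decode_cm_jt(uint16_t inst, bool ext_zcmt)
{
    unsigned index = (inst >> 2) & 0xFF;   /* jump-table index, inst[9:2] */

    if (!ext_zcmt) {
        return CM_ILLEGAL;   /* leave the op illegal, like the patched decoder */
    }
    return index >= 32 ? CM_JALT : CM_JT;
}
```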

[PATCH 2/7] target/riscv: Pass RISCVCPUConfig as target_info to disassemble_info

2023-05-18 Thread Weiwei Li

Pass RISCVCPUConfig as disassemble_info.target_info to support disas
of conflicting instructions related to specific extensions.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c  |  10 ++-
 target/riscv/cpu.c |   1 +
 target/riscv/cpu.h | 114 +-
 target/riscv/cpu_cfg.h | 135 +
 4 files changed, 144 insertions(+), 116 deletions(-)
 create mode 100644 target/riscv/cpu_cfg.h

diff --git a/disas/riscv.c b/disas/riscv.c
index e61bda5674..729ab684da 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -19,7 +19,7 @@
 
 #include "qemu/osdep.h"
 #include "disas/dis-asm.h"
-
+#include "target/riscv/cpu_cfg.h"
 
 /* types */
 
@@ -967,6 +967,7 @@ typedef enum {
 /* structures */
 
 typedef struct {
+RISCVCPUConfig *cfg;
 uint64_t  pc;
 uint64_t  inst;
 int32_t   imm;
@@ -4855,11 +4856,13 @@ static void decode_inst_decompress(rv_decode *dec, rv_isa isa)
 /* disassemble instruction */
 
 static void
-disasm_inst(char *buf, size_t buflen, rv_isa isa, uint64_t pc, rv_inst inst)
+disasm_inst(char *buf, size_t buflen, rv_isa isa, uint64_t pc, rv_inst inst,
+RISCVCPUConfig *cfg)
 {
 rv_decode dec = { 0 };
 dec.pc = pc;
 dec.inst = inst;
+dec.cfg = cfg;
 decode_inst_opcode(&dec, isa);
 decode_inst_operands(&dec, isa);
 decode_inst_decompress(&dec, isa);
@@ -4914,7 +4917,8 @@ print_insn_riscv(bfd_vma memaddr, struct disassemble_info *info, rv_isa isa)
 break;
 }
 
-disasm_inst(buf, sizeof(buf), isa, memaddr, inst);
+disasm_inst(buf, sizeof(buf), isa, memaddr, inst,
+(RISCVCPUConfig *)info->target_info);
 (*info->fprintf_func)(info->stream, "%s", buf);
 
 return len;
diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index db0875fb43..4fe926cdd1 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -818,6 +818,7 @@ static void riscv_cpu_reset_hold(Object *obj)
 static void riscv_cpu_disas_set_info(CPUState *s, disassemble_info *info)
 {
 RISCVCPU *cpu = RISCV_CPU(s);
+info->target_info = &cpu->cfg;
 
 switch (riscv_cpu_mxl(&cpu->env)) {
 case MXL_RV32:
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index de7e43126a..dc1229b69c 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -27,6 +27,7 @@
 #include "qom/object.h"
 #include "qemu/int128.h"
 #include "cpu_bits.h"
+#include "cpu_cfg.h"
 #include "qapi/qapi-types-common.h"
 #include "cpu-qom.h"
 
@@ -368,119 +369,6 @@ struct CPUArchState {
 uint64_t kvm_timer_frequency;
 };
 
-/*
- * map is a 16-bit bitmap: the most significant set bit in map is the maximum
- * satp mode that is supported. It may be chosen by the user and must respect
- * what qemu implements (valid_1_10_32/64) and what the hw is capable of
- * (supported bitmap below).
- *
- * init is a 16-bit bitmap used to make sure the user selected a correct
- * configuration as per the specification.
- *
- * supported is a 16-bit bitmap used to reflect the hw capabilities.
- */
-typedef struct {
-    uint16_t map, init, supported;
-} RISCVSATPMap;
-
-struct RISCVCPUConfig {
-    bool ext_zba;
-    bool ext_zbb;
-    bool ext_zbc;
-    bool ext_zbkb;
-    bool ext_zbkc;
-    bool ext_zbkx;
-    bool ext_zbs;
-    bool ext_zca;
-    bool ext_zcb;
-    bool ext_zcd;
-    bool ext_zce;
-    bool ext_zcf;
-    bool ext_zcmp;
-    bool ext_zcmt;
-    bool ext_zk;
-    bool ext_zkn;
-    bool ext_zknd;
-    bool ext_zkne;
-    bool ext_zknh;
-    bool ext_zkr;
-    bool ext_zks;
-    bool ext_zksed;
-    bool ext_zksh;
-    bool ext_zkt;
-    bool ext_ifencei;
-    bool ext_icsr;
-    bool ext_icbom;
-    bool ext_icboz;
-    bool ext_zicond;
-    bool ext_zihintpause;
-    bool ext_smstateen;
-    bool ext_sstc;
-    bool ext_svadu;
-    bool ext_svinval;
-    bool ext_svnapot;
-    bool ext_svpbmt;
-    bool ext_zdinx;
-    bool ext_zawrs;
-    bool ext_zfh;
-    bool ext_zfhmin;
-    bool ext_zfinx;
-    bool ext_zhinx;
-    bool ext_zhinxmin;
-    bool ext_zve32f;
-    bool ext_zve64f;
-    bool ext_zve64d;
-    bool ext_zmmul;
-    bool ext_zvfh;
-    bool ext_zvfhmin;
-    bool ext_smaia;
-    bool ext_ssaia;
-    bool ext_sscofpmf;
-    bool rvv_ta_all_1s;
-    bool rvv_ma_all_1s;
-
-    uint32_t mvendorid;
-    uint64_t marchid;
-    uint64_t mimpid;
-
-    /* Vendor-specific custom extensions */
-    bool ext_xtheadba;
-    bool ext_xtheadbb;
-    bool ext_xtheadbs;
-    bool ext_xtheadcmo;
-    bool ext_xtheadcondmov;
-    bool ext_xtheadfmemidx;
-    bool ext_xtheadfmv;
-    bool ext_xtheadmac;
-    bool ext_xtheadmemidx;
-    bool ext_xtheadmempair;
-    bool ext_xtheadsync;
-    bool ext_XVentanaCondOps;
-
-    uint8_t pmu_num;
-    char *priv_spec;
-    char *user_spec;
-    char *bext_spec;
-    char *vext_spec;
-    uint16_t vlen;
-    uint16_t elen;
-    uint16_t cbom_blocksize;
-    uint16_t cboz_blocksize;
-    bool mmu;
-    bool pmp;
-    bool epmp;
-    bool debug;
-    bool

[PATCH 0/7] Add support for extension specific disas

2023-05-18 Thread Weiwei Li

Some extensions have conflicting encodings, for example:
 * Z*inx reuse the same encodings as the normal floating-point extensions.
 * Zcm* reuse some of the encodings of Zcd.
 * Custom extensions from different vendors may share the same encodings.
To resolve this problem, this patchset passes RISCVCPUConfig as
disassemble_info.target_info to support extension-specific disassembly: the
instructions of these extensions are disassembled only when the related
extension is supported.
This patchset also fixes some style problems in disas/riscv.c.

The port is available here:
https://github.com/plctlab/plct-qemu/tree/plct-disas-upstream

Weiwei Li (7):
  disas: Change type of disassemble_info.target_info to pointer
  target/riscv: Pass RISCVCPUConfig as target_info to disassemble_info
  disas/riscv.c: Support disas for Zcm* extensions
  disas/riscv.c: Support disas for Z*inx extensions
  disas/riscv.c: Remove unused decomp_rv32/64 value for vector
instructions
  disas/riscv.c: Fix lines with over 80 characters
  disas/riscv.c: Remove redundant parentheses

 disas/riscv.c           | 1206 +--
 include/disas/dis-asm.h |    2 +-
 target/riscv/cpu.c      |    1 +
 target/riscv/cpu.h      |  114 +---
 target/riscv/cpu_cfg.h  |  135 +
 5 files changed, 789 insertions(+), 669 deletions(-)
 create mode 100644 target/riscv/cpu_cfg.h

-- 
2.25.1

[PATCH 6/7] disas/riscv.c: Fix lines with over 80 characters

2023-05-18 Thread Weiwei Li

Fix lines with over 80 characters.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c | 201 +++---
 1 file changed, 140 insertions(+), 61 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index fcea5d7beb..ee50a4ab0c 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -1108,8 +1108,10 @@ static const char rv_vreg_name_sym[32][4] = {
 /* pseudo-instruction constraints */
 
 static const rvc_constraint rvcc_jal[] = { rvc_rd_eq_ra, rvc_end };
-static const rvc_constraint rvcc_jalr[] = { rvc_rd_eq_ra, rvc_imm_eq_zero, rvc_end };
-static const rvc_constraint rvcc_nop[] = { rvc_rd_eq_x0, rvc_rs1_eq_x0, rvc_imm_eq_zero, rvc_end };
+static const rvc_constraint rvcc_jalr[] = { rvc_rd_eq_ra, rvc_imm_eq_zero,
+                                            rvc_end };
+static const rvc_constraint rvcc_nop[] = { rvc_rd_eq_x0, rvc_rs1_eq_x0,
+                                           rvc_imm_eq_zero, rvc_end };
 static const rvc_constraint rvcc_mv[] = { rvc_imm_eq_zero, rvc_end };
 static const rvc_constraint rvcc_not[] = { rvc_imm_eq_n1, rvc_end };
 static const rvc_constraint rvcc_neg[] = { rvc_rs1_eq_x0, rvc_end };
@@ -1139,18 +1141,28 @@ static const rvc_constraint rvcc_bleu[] = { rvc_end };
 static const rvc_constraint rvcc_bgt[] = { rvc_end };
 static const rvc_constraint rvcc_bgtu[] = { rvc_end };
 static const rvc_constraint rvcc_j[] = { rvc_rd_eq_x0, rvc_end };
-static const rvc_constraint rvcc_ret[] = { rvc_rd_eq_x0, rvc_rs1_eq_ra, rvc_end };
-static const rvc_constraint rvcc_jr[] = { rvc_rd_eq_x0, rvc_imm_eq_zero, rvc_end };
-static const rvc_constraint rvcc_rdcycle[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc00, rvc_end };
-static const rvc_constraint rvcc_rdtime[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc01, rvc_end };
-static const rvc_constraint rvcc_rdinstret[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc02, rvc_end };
-static const rvc_constraint rvcc_rdcycleh[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc80, rvc_end };
-static const rvc_constraint rvcc_rdtimeh[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc81, rvc_end };
+static const rvc_constraint rvcc_ret[] = { rvc_rd_eq_x0, rvc_rs1_eq_ra,
+                                           rvc_end };
+static const rvc_constraint rvcc_jr[] = { rvc_rd_eq_x0, rvc_imm_eq_zero,
+                                          rvc_end };
+static const rvc_constraint rvcc_rdcycle[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc00,
+                                               rvc_end };
+static const rvc_constraint rvcc_rdtime[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc01,
+                                              rvc_end };
+static const rvc_constraint rvcc_rdinstret[] = { rvc_rs1_eq_x0,
+                                                 rvc_csr_eq_0xc02, rvc_end };
+static const rvc_constraint rvcc_rdcycleh[] = { rvc_rs1_eq_x0,
+                                                rvc_csr_eq_0xc80, rvc_end };
+static const rvc_constraint rvcc_rdtimeh[] = { rvc_rs1_eq_x0, rvc_csr_eq_0xc81,
+                                               rvc_end };
 static const rvc_constraint rvcc_rdinstreth[] = { rvc_rs1_eq_x0,
                                                   rvc_csr_eq_0xc82, rvc_end };
-static const rvc_constraint rvcc_frcsr[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x003, rvc_end };
-static const rvc_constraint rvcc_frrm[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x002, rvc_end };
-static const rvc_constraint rvcc_frflags[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x001, rvc_end };
+static const rvc_constraint rvcc_frcsr[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x003,
+                                             rvc_end };
+static const rvc_constraint rvcc_frrm[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x002,
+                                            rvc_end };
+static const rvc_constraint rvcc_frflags[] = { rvc_rs1_eq_x0, rvc_csr_eq_0x001,
+                                               rvc_end };
 static const rvc_constraint rvcc_fscsr[] = { rvc_csr_eq_0x003, rvc_end };
 static const rvc_constraint rvcc_fsrm[] = { rvc_csr_eq_0x002, rvc_end };
 static const rvc_constraint rvcc_fsflags[] = { rvc_csr_eq_0x001, rvc_end };
@@ -1552,17 +1564,23 @@ const rv_opcode_data opcode_data[] = {
     { "fmv.q.x", rv_codec_r, rv_fmt_frd_rs1, NULL, 0, 0, 0 },
     { "c.addi4spn", rv_codec_ciw_4spn, rv_fmt_rd_rs1_imm, NULL, rv_op_addi,
       rv_op_addi, rv_op_addi, rvcd_imm_nz },
-    { "c.fld", rv_codec_cl_ld, rv_fmt_frd_offset_rs1, NULL, rv_op_fld, rv_op_fld, 0 },
-    { "c.lw", rv_codec_cl_lw, rv_fmt_rd_offset_rs1, NULL, rv_op_lw, rv_op_lw, rv_op_lw },
+    { "c.fld", rv_codec_cl_ld, rv_fmt_frd_offset_rs1, NULL, rv_op_fld,
+      rv_op_fld, 0 },
+    { "c.lw", rv_codec_cl_lw, rv_fmt_rd_offset_rs1, NULL, rv_op_lw, rv_op_lw,
+      rv_op_lw },
     { "c.flw", rv_codec_cl_lw, rv_fmt_frd_offset_rs1, NULL, rv_op_flw, 0, 0 },
-    { "c.fsd", rv_codec_cs_sd, rv_fmt_frs2_offset_rs1, NULL, rv_op_fsd, rv_op_fsd, 0 },
-    { "c.sw", rv_codec_cs_sw, rv_fmt_rs2_offset_rs1, NULL, rv_op_sw, rv_op_sw, rv_op_sw },
+

[PATCH 5/7] disas/riscv.c: Remove unused decomp_rv32/64 value for vector instructions

2023-05-18 Thread Weiwei Li

Currently, the decomp_rv32 and decomp_rv64 values in opcode_data for vector
instructions are the same op index as the entry's own, and they have no
functional decomp_data. So they are functionally no different from just
leaving them as zero.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 disas/riscv.c | 740 +-
 1 file changed, 370 insertions(+), 370 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index a370bac6ef..fcea5d7beb 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -1730,376 +1730,376 @@ const rv_opcode_data opcode_data[] = {
     { "zip", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
     { "xperm4", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
     { "xperm8", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
-    { "vle8.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle8_v, rv_op_vle8_v, 0 },
-    { "vle16.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle16_v, rv_op_vle16_v, 0 },
-    { "vle32.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle32_v, rv_op_vle32_v, 0 },
-    { "vle64.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle64_v, rv_op_vle64_v, 0 },
-    { "vse8.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vse8_v, rv_op_vse8_v, 0 },
-    { "vse16.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vse16_v, rv_op_vse16_v, 0 },
-    { "vse32.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vse32_v, rv_op_vse32_v, 0 },
-    { "vse64.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vse64_v, rv_op_vse64_v, 0 },
-    { "vlm.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vlm_v, rv_op_vlm_v, 0 },
-    { "vsm.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vsm_v, rv_op_vsm_v, 0 },
-    { "vlse8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vlse8_v, rv_op_vlse8_v, 0 },
-    { "vlse16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vlse16_v, rv_op_vlse16_v, 0 },
-    { "vlse32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vlse32_v, rv_op_vlse32_v, 0 },
-    { "vlse64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vlse64_v, rv_op_vlse64_v, 0 },
-    { "vsse8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vsse8_v, rv_op_vsse8_v, 0 },
-    { "vsse16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vsse16_v, rv_op_vsse16_v, 0 },
-    { "vsse32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vsse32_v, rv_op_vsse32_v, 0 },
-    { "vsse64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_rs2_vm, NULL, rv_op_vsse64_v, rv_op_vsse64_v, 0 },
-    { "vluxei8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vluxei8_v, rv_op_vluxei8_v, 0 },
-    { "vluxei16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vluxei16_v, rv_op_vluxei16_v, 0 },
-    { "vluxei32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vluxei32_v, rv_op_vluxei32_v, 0 },
-    { "vluxei64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vluxei64_v, rv_op_vluxei64_v, 0 },
-    { "vloxei8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vloxei8_v, rv_op_vloxei8_v, 0 },
-    { "vloxei16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vloxei16_v, rv_op_vloxei16_v, 0 },
-    { "vloxei32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vloxei32_v, rv_op_vloxei32_v, 0 },
-    { "vloxei64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vloxei64_v, rv_op_vloxei64_v, 0 },
-    { "vsuxei8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsuxei8_v, rv_op_vsuxei8_v, 0 },
-    { "vsuxei16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsuxei16_v, rv_op_vsuxei16_v, 0 },
-    { "vsuxei32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsuxei32_v, rv_op_vsuxei32_v, 0 },
-    { "vsuxei64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsuxei64_v, rv_op_vsuxei64_v, 0 },
-    { "vsoxei8.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsoxei8_v, rv_op_vsoxei8_v, 0 },
-    { "vsoxei16.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsoxei16_v, rv_op_vsoxei16_v, 0 },
-    { "vsoxei32.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsoxei32_v, rv_op_vsoxei32_v, 0 },
-    { "vsoxei64.v", rv_codec_v_r, rv_fmt_ldst_vd_rs1_vs2_vm, NULL, rv_op_vsoxei64_v, rv_op_vsoxei64_v, 0 },
-    { "vle8ff.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle8ff_v, rv_op_vle8ff_v, 0 },
-    { "vle16ff.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle16ff_v, rv_op_vle16ff_v, 0 },
-    { "vle32ff.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle32ff_v, rv_op_vle32ff_v, 0 },
-    { "vle64ff.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vle64ff_v, rv_op_vle64ff_v, 0 },
-    { "vl1re8.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vl1re8_v, rv_op_vl1re8_v, 0 },
-    { "vl1re16.v", rv_codec_v_ldst, rv_fmt_ldst_vd_rs1_vm, NULL, rv_op_vl1re16_v, rv_op_vl1re16_v, 0 },
-    { "vl1re32.v",

[PATCH 1/7] disas: Change type of disassemble_info.target_info to pointer

2023-05-18 Thread Weiwei Li

Use a pointer to pass more target information to the disassembler, such as
the cpu.cfg-related information passed in the following commits.

Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 
---
 include/disas/dis-asm.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/disas/dis-asm.h b/include/disas/dis-asm.h
index 2f6f91c2ee..2324f6b1a4 100644
--- a/include/disas/dis-asm.h
+++ b/include/disas/dis-asm.h
@@ -397,7 +397,7 @@ typedef struct disassemble_info {
   char * disassembler_options;
 
   /* Field intended to be used by targets in any way they deem suitable.  */
-  int64_t target_info;
+  void *target_info;
 
   /* Options for Capstone disassembly.  */
   int cap_arch;

Re: [PATCH v2 6/8] iotests: always use a unique sub-directory per test

2023-05-18 Thread Eric Blake

On Fri, Mar 03, 2023 at 04:07:25PM +, Daniel P. Berrangé wrote:
> The current test runner is only safe against parallel execution within
> a single instance of the 'check' process, and only if -j is given a
> value greater than 2. This prevents running multiple copies of the
> 'check' process for different test scenarios.
> 
> This change switches the output / socket directories to always include
> the test name, image format and image protocol. This should allow full
> parallelism of all distinct test scenarios. eg running both qcow2 and
> raw tests at the same time, or both file and nbd tests at the same
> time.
> 
> It would be possible to allow for parallelism of the same test scenario
> by including the pid, but that would potentially let many directories
> accumulate over time on failures, so is not done.
> 
> Signed-off-by: Daniel P. Berrangé 
> ---
>  tests/qemu-iotests/testrunner.py | 20 ++--
>  1 file changed, 10 insertions(+), 10 deletions(-)

git bisect points to this commit as being the reason behind the
following regression in iotests:

$ ./check -nbd 104
...
--- /home/eblake/qemu/tests/qemu-iotests/104.out
+++ /home/eblake/qemu/build/tests/qemu-iotests/scratch/raw-nbd-104/104.out.bad
@@ -2,11 +2,11 @@
 === Check qemu-img info output ===

 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1024
-image: TEST_DIR/t.IMGFMT
+image: nbd+unix://?socket=/tmp/tmpqathrkpn/IMGFMT-nbd-104/nbd
 file format: IMGFMT
 virtual size: 1 KiB (1024 bytes)
 Formatting 'TEST_DIR/t.IMGFMT', fmt=IMGFMT size=1234
-image: TEST_DIR/t.IMGFMT
+image: nbd+unix://?socket=/tmp/tmpqathrkpn/IMGFMT-nbd-104/nbd
 file format: IMGFMT
 virtual size: 1.5 KiB (1536 bytes)
 *** done
Failures: 104

Back in 2015, in commit a231cb2726, we added a hack that turned
nbd://... into TEST_DIR/t.IMGFMT to satisfy the output matching in the
various iotests, but with our new per-test directories, that filter is
no longer firing.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

Re: [PATCH] multifd: Set a higher "backlog" default value for listen()

2023-05-18 Thread Wang, Lei

On 5/18/2023 17:16, Juan Quintela wrote:
> Lei Wang  wrote:
>> When the destination VM is launched, the "backlog" parameter for listen() is set
>> to 1 by default in socket_start_incoming_migration_internal(), which will
>> lead to a socket connection error (the queue of pending connections is full)
>> when "multifd" and "multifd-channels" are set later on and a high number of
>> channels is used. Set it to a hard-coded higher default value of 512 to fix
>> this issue.
>>
>> Reported-by: Wei Wang 
>> Signed-off-by: Lei Wang 
> 
> [cc'd Daniel, who is the maintainer of qio]
> 
> My understanding of that value is that 230 or something like that would
> be more than enough.  The maximum number of multifd channels is 256.

You are right: "multifd-channels" is a uint8_t, so 256 is enough.

> 
> Daniel, any opinion?
> 
> Later, Juan.
> 
>> ---
>>  migration/socket.c | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/migration/socket.c b/migration/socket.c
>> index 1b6f5baefb..b43a66ef7e 100644
>> --- a/migration/socket.c
>> +++ b/migration/socket.c
>> @@ -179,7 +179,7 @@ socket_start_incoming_migration_internal(SocketAddress *saddr,
>>      QIONetListener *listener = qio_net_listener_new();
>>      MigrationIncomingState *mis = migration_incoming_get_current();
>>      size_t i;
>> -    int num = 1;
>> +    int num = 512;
>>  
>>  qio_net_listener_set_name(listener, "migration-socket-listener");
>

Re: [PATCH v11 00/14] TCG code quality tracking

2023-05-18 Thread Wu, Fei

On 4/22/2023 12:42 AM, Alex Bennée wrote:
> 
> Fei Wu  writes:
> 
>> This patch series was originally written by Vanderson and Alex in 2019. I
>> (Fei Wu) rebased it on the latest upstream from:
>> https://github.com/stsquad/qemu/tree/tcg/tbstats-and-perf-v10
>> and am sending it out for review at Alex's request; I will continue to
>> address any future review comments here. As it's been a very long time and
>> there were lots of conflicts during the rebase, any problems introduced in
>> the process are my fault.
> 
> Hi Fei,
> 
> Thanks for picking this up. I can confirm that this applies cleanly to
> master and I have kicked the tyres and things still seem to work. I'm
> not sure if I can provide much review on code I wrote but a few things
> to point out:
> 
>   - there are a number of CI failures, mainly qatomic on 32 bit guests
> see https://gitlab.com/stsquad/qemu/-/pipelines/844857279/failures
> maybe we just disable time accounting for 32 bit hosts?
> 
I sent out the v12 series, which fixes some of the CI failures. qatomic is
not touched yet; the current code with CONFIG_PROFILER should have the same
issue. What is QEMU's policy on 32-bit guest support?

Besides time, there are some other uint64_t counters updated with qatomic,
such as TCGProfile.table_op_count. Should we switch them to size_t instead?

>   - we need a proper solution to the invalidation of TBs so we can
> exclude them from lists (or at least not do the attempt
> translation/fail dance). Alternatively we could page out a copy of
> the TB data to a disk file when we hit a certain hotness? How would
> this interact with the jitperf support already?
> 
>   - we should add some documentation to the manual so users don't have
> to figure it all out by trial and error at the HMP command line.
> 
Added one in docs/tb-stats.txt. More could be added to explain the fields
of the output.

>   - there may be some exit cases missed because I saw some weird TBs
> with very long IR generation times.
> 
> TB id:5 | phys:0xb5f21d00 virt:0xcf2f17721d00 flags:0x0051 1 inv/2
> | exec:1889055/0 guest inst cov:1.05%
> | trans:2 ints: g:4 op:32 op_opt:26 spills:0
> | h/g (host bytes / guest insts): 56.00
> | time to gen at 2.4GHz => code:6723.33(ns) IR:2378539.17(ns)
> 
Is it reproducible on your system? I didn't see it on mine; is it possible
that system events cause this?

>   - code motion in 9/14 should be folded into the first patch
> 
done.

BTW, I also added a few comments on the v12 series; could you please check
whether they make sense?

Thanks,
Fei.

> Even if we can't find a solution for safely dumping TBs I think the
> series without "tb-list" is still an improvement for getting rid of the
> --enable-profiler and making info JIT useful by default.
> 
> Richard,
> 
> What do you think?
>

Re: gitlab shared runner time expired

2023-05-18 Thread Eldon Stegall

On Thu, May 18, 2023 at 12:26:33PM -0700, Richard Henderson wrote:
> So, here we are again, out of runner time with 13 days left in the month.
> 
> Did we come to any resolution since last time?  Holding development for that long just
> isn't right, so I'll continue processing the hard way -- testing on private runners and
> local build machines.

Hi Richard,
We should have capacity for private runners to execute several jobs in
parallel.  Here [1] is an example of one that ran on a private runner today.

I have been thinking about suggesting a strategy of pinning jobs that lend
themselves to amd64 Linux runners to matching private runners, so that more
"shared" minutes can be spent on runners that have different capabilities.

If there is another specific arch/OS runner you have in mind, I would be
happy to make efforts towards provisioning one in our infrastructure.
In particular, it seems like a lot of people are trying to use QEMU to
support amd64 Linux on M1 Macs, so it might make sense to have a private
Apple silicon macOS runner.

Also, since we have a hardware runner with KVM capabilities, it might make
sense to carve some of those tests out for that tagged runner.

Since this is relatively new, I have personally been in an observational
period before seeking input on those goals, but let me know your thoughts.

Thanks,
Eldon

[1] https://gitlab.com/qemu-project/qemu/-/jobs/4310866300

RE: Multiple vIOMMU instance support in QEMU?

2023-05-18 Thread Tian, Kevin

> From: Jason Gunthorpe 
> Sent: Friday, May 19, 2023 4:19 AM
> 
> On Thu, May 18, 2023 at 03:45:24PM -0400, Peter Xu wrote:
> 
> > I see that Intel is already copied here (at least Yi and Kevin) so I assume
> > there're already some kind of synchronizations on multi-vIOMMU vs recent
> > works on Intel side, which is definitely nice and can avoid work conflicts.
> 
> I actually don't know that.. Intel sees multiple DMAR blocks in SW and
> they have kernel level replication of invalidation.. Intel doesn't
> have a HW fast path yet so they can rely on mediation to fix it. Thus
> I expect there is no HW replication of invalidations here. Kevin?
> 

There is no HW fast path, so a single vIOMMU instance is sufficient on Intel for now.

Re: [PATCH 6/6] block: remove bdrv_co_io_plug() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:22PM -0400, Stefan Hajnoczi wrote:
> No block driver implements .bdrv_co_io_plug() anymore. Get rid of the
> function pointers.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/block-io.h |  3 ---
>  include/block/block_int-common.h | 11 --
>  block/io.c   | 37 
>  3 files changed, 51 deletions(-)

Reviewed-by: Eric Blake 


Re: [PATCH 5/6] block/linux-aio: convert to blk_io_plug_call() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:21PM -0400, Stefan Hajnoczi wrote:
> Stop using the .bdrv_co_io_plug() API because it is not multi-queue
> block layer friendly. Use the new blk_io_plug_call() API to batch I/O
> submission instead.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/raw-aio.h |  7 ---
>  block/file-posix.c  | 28 
>  block/linux-aio.c   | 41 +++--
>  3 files changed, 11 insertions(+), 65 deletions(-)
>

Nice to see that not only is it friendlier to multi-queue, it's also
fewer lines of code.

Reviewed-by: Eric Blake 


Re: [PATCH 4/6] block/io_uring: convert to blk_io_plug_call() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:20PM -0400, Stefan Hajnoczi wrote:
> Stop using the .bdrv_co_io_plug() API because it is not multi-queue
> block layer friendly. Use the new blk_io_plug_call() API to batch I/O
> submission instead.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/raw-aio.h |  7 ---
>  block/file-posix.c  | 10 -
>  block/io_uring.c| 45 -
>  block/trace-events  |  5 ++---
>  4 files changed, 19 insertions(+), 48 deletions(-)
> 

> @@ -337,7 +325,6 @@ void luring_io_unplug(void)
>   * @type: type of request
>   *
>   * Fetches sqes from ring, adds to pending queue and preps them
> - *
>   */
>  static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,
>                              uint64_t offset, int type)
> @@ -370,14 +357,16 @@ static int luring_do_submit(int fd, LuringAIOCB *luringcb, LuringState *s,

Looks a bit like a stray hunk, but you are touching the function, so
it's okay.

Reviewed-by: Eric Blake 


Re: [PATCH 3/6] block/blkio: convert to blk_io_plug_call() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:19PM -0400, Stefan Hajnoczi wrote:
> Stop using the .bdrv_co_io_plug() API because it is not multi-queue
> block layer friendly. Use the new blk_io_plug_call() API to batch I/O
> submission instead.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/blkio.c | 40 +---
>  1 file changed, 21 insertions(+), 19 deletions(-)

Reviewed-by: Eric Blake 


Re: [PATCH 2/6] block/nvme: convert to blk_io_plug_call() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:18PM -0400, Stefan Hajnoczi wrote:
> Stop using the .bdrv_co_io_plug() API because it is not multi-queue
> block layer friendly. Use the new blk_io_plug_call() API to batch I/O
> submission instead.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  block/nvme.c | 44 
>  1 file changed, 12 insertions(+), 32 deletions(-)

Reviewed-by: Eric Blake 


Re: [PATCH 1/6] block: add blk_io_plug_call() API

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 06:10:17PM -0400, Stefan Hajnoczi wrote:
> Introduce a new API for thread-local blk_io_plug() that does not
> traverse the block graph. The goal is to make blk_io_plug() multi-queue
> friendly.
> 
> Instead of having block drivers track whether or not we're in a plugged
> section, provide an API that allows them to defer a function call until
> we're unplugged: blk_io_plug_call(fn, opaque). If blk_io_plug_call() is
> called multiple times with the same fn/opaque pair, then fn() is only
> called once at the end of the function - resulting in batching.
> 
> This patch introduces the API and changes blk_io_plug()/blk_io_unplug().
> blk_io_plug()/blk_io_unplug() no longer require a BlockBackend argument
> because the plug state is now thread-local.
> 
> Later patches convert block drivers to blk_io_plug_call() and then we
> can finally remove .bdrv_co_io_plug() once all block drivers have been
> converted.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---

> +++ b/block/plug.c
> +
> +/**
> + * blk_io_plug_call:
> + * @fn: a function pointer to be invoked
> + * @opaque: a user-defined argument to @fn()
> + *
> + * Call @fn(@opaque) immediately if not within a blk_io_plug()/blk_io_unplug()
> + * section.
> + *
> + * Otherwise defer the call until the end of the outermost
> + * blk_io_plug()/blk_io_unplug() section in this thread. If the same
> + * @fn/@opaque pair has already been deferred, it will only be called once upon
> + * blk_io_unplug() so that accumulated calls are batched into a single call.
> + *
> + * The caller must ensure that @opaque is not be freed before @fn() is invoked.

s/be //

> + */
> +void blk_io_plug_call(void (*fn)(void *), void *opaque)

Reviewed-by: Eric Blake 


[PATCH v3 2/2] tests/tcg/aarch64: add DC CVA[D]P tests

2023-05-18 Thread Zhuojia Shen

Test execution of DC CVAP and DC CVADP instructions under user mode
emulation.

Signed-off-by: Zhuojia Shen 
---
 tests/tcg/aarch64/Makefile.target | 11 ++
 tests/tcg/aarch64/dcpodp.c| 58 +++
 tests/tcg/aarch64/dcpop.c | 58 +++
 3 files changed, 127 insertions(+)
 create mode 100644 tests/tcg/aarch64/dcpodp.c
 create mode 100644 tests/tcg/aarch64/dcpop.c

diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 0315795487..3430fd3cd8 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -21,12 +21,23 @@ config-cc.mak: Makefile
$(quiet-@)( \
$(call cc-option,-march=armv8.1-a+sve,  CROSS_CC_HAS_SVE); \
 $(call cc-option,-march=armv8.1-a+sve2, CROSS_CC_HAS_SVE2); \
+   $(call cc-option,-march=armv8.2-a,  CROSS_CC_HAS_ARMV8_2); \
 $(call cc-option,-march=armv8.3-a,  CROSS_CC_HAS_ARMV8_3); \
+   $(call cc-option,-march=armv8.5-a,  CROSS_CC_HAS_ARMV8_5); \
 $(call cc-option,-mbranch-protection=standard,  CROSS_CC_HAS_ARMV8_BTI); \
 $(call cc-option,-march=armv8.5-a+memtag,   CROSS_CC_HAS_ARMV8_MTE); \
 $(call cc-option,-march=armv9-a+sme, CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
 -include config-cc.mak
 
+ifneq ($(CROSS_CC_HAS_ARMV8_2),)
+AARCH64_TESTS += dcpop
+dcpop: CFLAGS += -march=armv8.2-a
+endif
+ifneq ($(CROSS_CC_HAS_ARMV8_5),)
+AARCH64_TESTS += dcpodp
+dcpodp: CFLAGS += -march=armv8.5-a
+endif
+
 # Pauth Tests
 ifneq ($(CROSS_CC_HAS_ARMV8_3),)
 AARCH64_TESTS += pauth-1 pauth-2 pauth-4 pauth-5
diff --git a/tests/tcg/aarch64/dcpodp.c b/tests/tcg/aarch64/dcpodp.c
new file mode 100644
index 00..6f6301ac86
--- /dev/null
+++ b/tests/tcg/aarch64/dcpodp.c
@@ -0,0 +1,58 @@
+/* Test execution of DC CVADP instruction */
+
+#include <asm/hwcap.h>
+#include <sys/auxv.h>
+
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifndef HWCAP2_DCPODP
+#define HWCAP2_DCPODP (1 << 0)
+#endif
+
+bool should_fail = false;
+
+static void signal_handler(int sig, siginfo_t *si, void *data)
+{
+    ucontext_t *uc = (ucontext_t *)data;
+
+    if (should_fail) {
+        uc->uc_mcontext.pc += 4;
+    } else {
+        exit(EXIT_FAILURE);
+    }
+}
+
+static int do_dc_cvadp(void)
+{
+    struct sigaction sa = {
+        .sa_flags = SA_SIGINFO,
+        .sa_sigaction = signal_handler,
+    };
+
+    sigemptyset(&sa.sa_mask);
+    if (sigaction(SIGSEGV, &sa, NULL) < 0) {
+        perror("sigaction");
+        return EXIT_FAILURE;
+    }
+
+    asm volatile("dc cvadp, %0\n\t" :: "r"(&sa));
+
+    should_fail = true;
+    asm volatile("dc cvadp, %0\n\t" :: "r"(NULL));
+    should_fail = false;
+
+    return EXIT_SUCCESS;
+}
+
+int main(void)
+{
+    if (getauxval(AT_HWCAP2) & HWCAP2_DCPODP) {
+        return do_dc_cvadp();
+    } else {
+        printf("SKIP: no HWCAP2_DCPODP on this system\n");
+        return EXIT_SUCCESS;
+    }
+}
diff --git a/tests/tcg/aarch64/dcpop.c b/tests/tcg/aarch64/dcpop.c
new file mode 100644
index 00..0c4d32cfe7
--- /dev/null
+++ b/tests/tcg/aarch64/dcpop.c
@@ -0,0 +1,58 @@
+/* Test execution of DC CVAP instruction */
+
+#include <asm/hwcap.h>
+#include <sys/auxv.h>
+
+#include <signal.h>
+#include <stdbool.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+#ifndef HWCAP_DCPOP
+#define HWCAP_DCPOP (1 << 16)
+#endif
+
+bool should_fail = false;
+
+static void signal_handler(int sig, siginfo_t *si, void *data)
+{
+    ucontext_t *uc = (ucontext_t *)data;
+
+    if (should_fail) {
+        uc->uc_mcontext.pc += 4;
+    } else {
+        exit(EXIT_FAILURE);
+    }
+}
+
+static int do_dc_cvap(void)
+{
+    struct sigaction sa = {
+        .sa_flags = SA_SIGINFO,
+        .sa_sigaction = signal_handler,
+    };
+
+    sigemptyset(&sa.sa_mask);
+    if (sigaction(SIGSEGV, &sa, NULL) < 0) {
+        perror("sigaction");
+        return EXIT_FAILURE;
+    }
+
+    asm volatile("dc cvap, %0\n\t" :: "r"(&sa));
+
+    should_fail = true;
+    asm volatile("dc cvap, %0\n\t" :: "r"(NULL));
+    should_fail = false;
+
+    return EXIT_SUCCESS;
+}
+
+int main(void)
+{
+    if (getauxval(AT_HWCAP) & HWCAP_DCPOP) {
+        return do_dc_cvap();
+    } else {
+        printf("SKIP: no HWCAP_DCPOP on this system\n");
+        return EXIT_SUCCESS;
+    }
+}
-- 
2.40.1

[PATCH v3 1/2] target/arm: allow DC CVA[D]P in user mode emulation

2023-05-18 Thread Zhuojia Shen

DC CVAP and DC CVADP instructions can be executed in EL0 on Linux,
either directly when SCTLR_EL1.UCI == 1 or emulated by the kernel (see
user_cache_maint_handler() in arch/arm64/kernel/traps.c).

This patch enables execution of the two instructions in user mode
emulation.

Signed-off-by: Zhuojia Shen 
---
 target/arm/helper.c | 6 ++----
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 0b7fd2e7e6..d4bee43bd0 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -7405,7 +7405,6 @@ static const ARMCPRegInfo rndr_reginfo[] = {
   .access = PL0_R, .readfn = rndr_readfn },
 };
 
-#ifndef CONFIG_USER_ONLY
 static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
   uint64_t value)
 {
@@ -7420,6 +7419,7 @@ static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
 /* This won't be crossing page boundaries */
 haddr = probe_read(env, vaddr, dline_size, mem_idx, GETPC());
 if (haddr) {
+#ifndef CONFIG_USER_ONLY
 
 ram_addr_t offset;
 MemoryRegion *mr;
@@ -7430,6 +7430,7 @@ static void dccvap_writefn(CPUARMState *env, const ARMCPRegInfo *opaque,
 if (mr) {
 memory_region_writeback(mr, offset, dline_size);
 }
+#endif /*CONFIG_USER_ONLY*/
 }
 }
 
@@ -7448,7 +7449,6 @@ static const ARMCPRegInfo dcpodp_reg[] = {
   .fgt = FGT_DCCVADP,
   .accessfn = aa64_cacheop_poc_access, .writefn = dccvap_writefn },
 };
-#endif /*CONFIG_USER_ONLY*/
 
 static CPAccessResult access_aa64_tid5(CPUARMState *env, const ARMCPRegInfo *ri,
bool isread)
@@ -9092,7 +9092,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 if (cpu_isar_feature(aa64_tlbios, cpu)) {
 define_arm_cp_regs(cpu, tlbios_reginfo);
 }
-#ifndef CONFIG_USER_ONLY
 /* Data Cache clean instructions up to PoP */
 if (cpu_isar_feature(aa64_dcpop, cpu)) {
 define_one_arm_cp_reg(cpu, dcpop_reg);
@@ -9101,7 +9100,6 @@ void register_cp_regs_for_features(ARMCPU *cpu)
 define_one_arm_cp_reg(cpu, dcpodp_reg);
 }
 }
-#endif /*CONFIG_USER_ONLY*/
 
 /*
  * If full MTE is enabled, add all of the system registers.
-- 
2.40.1

[PATCH v3 0/2] target/arm: allow DC CVA[D]P in user mode emulation

2023-05-18 Thread Zhuojia Shen

This patch series enables executing DC CVAP and DC CVADP instructions in
AArch64 Linux user mode emulation and adds proper TCG tests.

Changes in v3:
- Fix typo of HWCAP2_DCPODP
- Split tests into a separate patch
- Remove unnecessary handling of SIGILL in tests
- Merge 4 tests into 2

Changes in v2:
- Fix code to deal with unmapped address
- Add tests for DC'ing unmapped address

Zhuojia Shen (2):
  target/arm: allow DC CVA[D]P in user mode emulation
  tests/tcg/aarch64: add DC CVA[D]P tests

 target/arm/helper.c               |  6 ++--
 tests/tcg/aarch64/Makefile.target | 11 ++
 tests/tcg/aarch64/dcpodp.c        | 58 +++
 tests/tcg/aarch64/dcpop.c         | 58 +++
 4 files changed, 129 insertions(+), 4 deletions(-)
 create mode 100644 tests/tcg/aarch64/dcpodp.c
 create mode 100644 tests/tcg/aarch64/dcpop.c

-- 
2.40.1

>From d5cb0b89d45703d35eb9375fd71be5ad94cb17ed Mon Sep 17 00:00:00 2001
From: Zhuojia Shen 
Date: Thu, 18 May 2023 14:24:50 -0700
Subject: [PATCH v3 2/2] tests/tcg/aarch64: add DC CVA[D]P tests

Test execution of DC CVAP and DC CVADP instructions under user mode
emulation.

Signed-off-by: Zhuojia Shen 
---
 tests/tcg/aarch64/Makefile.target | 11 ++
 tests/tcg/aarch64/dcpodp.c        | 58 +++
 tests/tcg/aarch64/dcpop.c         | 58 +++
 3 files changed, 127 insertions(+)
 create mode 100644 tests/tcg/aarch64/dcpodp.c
 create mode 100644 tests/tcg/aarch64/dcpop.c

diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index 0315795487..3430fd3cd8 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -21,12 +21,23 @@ config-cc.mak: Makefile
$(quiet-@)( \
$(call cc-option,-march=armv8.1-a+sve,  CROSS_CC_HAS_SVE); \
$(call cc-option,-march=armv8.1-a+sve2, CROSS_CC_HAS_SVE2); \
+   $(call cc-option,-march=armv8.2-a,  CROSS_CC_HAS_ARMV8_2); \
$(call cc-option,-march=armv8.3-a,  CROSS_CC_HAS_ARMV8_3); \
+   $(call cc-option,-march=armv8.5-a,  CROSS_CC_HAS_ARMV8_5); \
$(call cc-option,-mbranch-protection=standard,  CROSS_CC_HAS_ARMV8_BTI); \
$(call cc-option,-march=armv8.5-a+memtag,   CROSS_CC_HAS_ARMV8_MTE); \
$(call cc-option,-march=armv9-a+sme,    CROSS_CC_HAS_ARMV9_SME)) 3> config-cc.mak
 -include

RE: Multiple vIOMMU instance support in QEMU?

2023-05-18 Thread Tian, Kevin

> From: Jason Gunthorpe 
> Sent: Thursday, May 18, 2023 10:57 PM
> 
> On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:
> 
> > What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> > not suffice, but that's at least totally new area to me because I never
> > used >1 IOMMUs even bare metal (excluding the case where I'm aware that
> > e.g. a GPU could have its own IOMMU-like dma translator).
> 
> Even x86 systems are multi-iommu, one iommu per physical CPU socket.
> 
> I'm not sure how they model this though - Kevin do you know? Do we get
> multiple iommu instances in Linux or is all the broadcasting of
> invalidates and sharing of tables hidden?
> 

Yes, Linux supports multiple iommu instances on x86 systems.

Each iommu has its own configuration structures/caches and attached
devices. No broadcasting.

An ACPI table is used to describe the topology between IOMMUs and
devices.

If an iommu domain is attached to two devices behind two IOMMUs,
separate iotlb invalidation commands are required when the domain
mapping is changed.

Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port

2023-05-18 Thread Alex Williamson

On Thu, 18 May 2023 16:03:07 -0400
"Michael S. Tsirkin"  wrote:

> On Fri, Apr 21, 2023 at 06:06:49PM +0200, Robin Voetter wrote:
> > 
> > 
> > On 4/21/23 10:22, Michael S. Tsirkin wrote:  
> > > On Thu, Apr 20, 2023 at 05:38:39PM +0200, ro...@streamhpc.com wrote:  
> > >> From: Robin Voetter 
> > >>
> > >> The ROCm driver for Linux uses PCIe atomics to schedule work and
> > >> generally communicate between the host and the device.  This does not
> > >> currently work in QEMU with regular vfio-pci passthrough, because the
> > >> pcie-root-port does not advertise the PCIe atomic completer
> > >> capabilities.  When initializing the GPU from the Linux driver, it
> > >> queries whether the PCIe connection from the CPU to GPU supports the
> > >> required capabilities[1] in the pci_enable_atomic_ops_to_root
> > >> function[2].  Currently the only part where this fails is checking the
> > >> atomic completer capabilities (32 and 64 bits) on the root port[3].  In
> > >> this case, the driver determines that PCIe atomics are not supported at
> > >> all, and this causes ROCm programs to misbehave.  (While AMD advertises
> > >> that there is some support for ROCm without PCIe atomics, I have never
> > >> actually gotten that working...)
> > >>
> > >> This patch allows ROCm to properly function by introducing an
> > >> additional experimental property to the pcie-root-port,
> > >> x-atomic-completion.  
> > > 
> > > so what exactly makes it experimental? from this description
> > > it looks like it actually has to be enabled for things to work?  
> > 
> > I was not sure which would be appropriate, but I'm fine with making it a
> > non-experimental option.  
> 
> So I guess the real thing to do is to query this from vfio right?
> Unfortunately we don't have access to vfio when we
> are creating the root port, but I think the thing to do would
> be to check at the time when vfio is attached, and if
> atomic is set but not supported, fail attaching vfio.
> 
> Right?

We don't currently provide a way to query this in vfio, but I imagine
we could call pci_enable_atomic_ops_to_root() in the host kernel
ourselves with various sizes and expose which are supported via a
capability on the vfio-device.  I'm not sure what we do for VFs though
since that function is invalid for them (maybe worry about them later).
I also see that one of the in-kernel drivers (mlx5) tries to enable
128-bit support, so I wonder if we want separate options for 32/64-bit
and 128-bit.

QEMU device options are clearly the most straightforward path to enable
this, but would it actually make sense, perhaps in addition, to
implement the above in the kernel and then have the QEMU vfio-pci
driver enable the available completer bits in the root port during
realize?  We could probably get away with it on hotplug, but if
necessary it could be something we only do for cold-plug devices (we
also have a no-hotplug vfio-pci variant if we're concerned what happens
after the device is removed in the VM - again, we could probably get
away with clearing the bits on unplug).

I'm not entirely sure where we stand in QEMU on whether options that
can cause poor behavior should always be experimental or we allow users
to shoot themselves in the foot as they please.  Obviously it makes it
more difficult for libvirt to support such configurations, but maybe
they'd rely on the above automatic enabling rather than try to guess
themselves.  Thanks,

Alex

Re: [RFC PATCH 1/1] virtio-balloon: Add Working Set Reporting feature

2023-05-18 Thread Dr. David Alan Gilbert

* T.J. Alumbaugh (talum...@google.com) wrote:
> On Tue, May 16, 2023 at 5:03 AM Dr. David Alan Gilbert  wrote:
> >
> > * T.J. Alumbaugh (talum...@google.com) wrote:
> > >  Working Set Reporting supported in virtio-balloon.
> > >  - adds working set reporting and notification vqueues
> > >  - QMP API additions:
> > >- guest-ws property on balloon
> > >- generates QMP WS_EVENT when new reports available
> > >- ws_config, ws_request commands
> >
> > Hi,
> >   1st it's probably best to split this patch up into a few
> > separate patches; something like:
> > 1) Updating the virtio_balloon header
> > 2) the main virtio-balloon code
> > 3) Adding the qmp code
> > 4) Adding the HMP code
> > 5) The migration code
> >
> > That would make it easier for people to review
> > the bits they know.
> >
> > Also, please make sure migration works between a host
> > without this feature and one which does; I suggest
> > turning the feature off in older machine types, and
> > also just checking that it works.
> >
> 
> Thanks very much for this feedback. This makes sense to me. I had
> originally attempted to split the patch into 2 (all device changes and
> all qmp + HMP) but got compilation warnings (that became errors) on
> uncalled functions due to the default compiler settings (since some of
> the new functions in the device only exist in order to be called by
> QMP). It sounds like that's OK for the purposes of review. I'll do as
> you suggest and update with a v2 soon.

You can add __attribute__((unused)) in the earlier patch and remove
them in the later one; but that should be fairly rare.

Dave

> -T.J.
> 
> > See more comments below.
> >
> > Dave
> >
> > > Signed-off-by: T.J. Alumbaugh 
> > > ---
> > >  hmp-commands.hx   |  26 ++
> > >  hw/core/machine-hmp-cmds.c|  21 ++
> > >  hw/virtio/virtio-balloon-pci.c|   2 +
> > >  hw/virtio/virtio-balloon.c| 225 +-
> > >  include/hw/virtio/virtio-balloon.h|  17 +-
> > >  include/monitor/hmp.h |   2 +
> > >  .../standard-headers/linux/virtio_balloon.h   |  17 ++
> > >  include/sysemu/balloon.h  |   8 +-
> > >  monitor/monitor.c |   1 +
> > >  qapi/machine.json |  66 +
> > >  qapi/misc.json|  26 ++
> > >  softmmu/balloon.c |  32 ++-
> > >  12 files changed, 437 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > > index 9afbb54a51..f3548a148f 100644
> > > --- a/hmp-commands.hx
> > > +++ b/hmp-commands.hx
> > > @@ -1396,6 +1396,32 @@ SRST
> > >Request VM to change its memory allocation to *value* (in MB).
> > >  ERST
> > >
> > > +{
> > > +.name   = "ws_config",
> > > +.args_type  = "i0:i,i1:i,i2:i,refresh:i,report:i",
> > > +.params = "bin intervals 0-2, refresh and report thresholds",
> > > +.help   = "Working Set intervals, refresh/report thresholds (ms)",
> > > +.cmd= hmp_ws_config,
> > > +},
> > > +
> > > +SRST
> > > +``ws_config``
> > > +  Set the intervals (in ms), refresh, and report thresholds for WS reporting
> > > +ERST
> > > +
> > > +{
> > > +.name   = "ws_request",
> > > +.args_type  = "",
> > > +.params = "",
> > > +.help   = "Request the Working Set of the guest.",
> > > +.cmd= hmp_ws_request,
> > > +},
> > > +
> > > +SRST
> > > +``wss_request``
> >
> > Typo 'ws*s*'
> >
> > Some other comments on that:
> >   a) When you've split the hmp stuff out into a separate patch you can
> >  give an example of the command (especially ws_config) in the
> >  commit message.
> >
> >   b) Would it make sense to have a query-ws/info ws to display the last received
> >  working set info?
> >
> >   c) Some may feel 'ws' is a bit terse and want the unabbreviated
> >   version.  (Is it also general, or is it actually virtio balloon
> >   specific, ie should the name include virtio or balloon?)
> >
> >   d) You've got 3 bin intervals; is that '3' set in stone or is it
> >   likely to change in the future, in which case perhaps you want the
> >   parameters to be more flexible.  I note your migration code
> >   transfers a 'number of bins'.
> >
> > > +  Request the Working Set Size of the guest.
> > > +ERST
> > > +
> > >  {
> > >  .name   = "set_link",
> > >  .args_type  = "name:s,up:b",
> > > diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> > > index c3e55ef9e9..dd11865ddc 100644
> > > --- a/hw/core/machine-hmp-cmds.c
> > > +++ b/hw/core/machine-hmp-cmds.c
> > > @@ -237,6 +237,27 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
> > >  hmp_handle_error(mon, err);
> > >  }
> > >
> > > +void hmp_ws_request(Monitor

Re: [RFC PATCH 1/1] virtio-balloon: Add Working Set Reporting feature

2023-05-18 Thread T.J. Alumbaugh

On Tue, May 16, 2023 at 5:03 AM Dr. David Alan Gilbert  wrote:
>
> * T.J. Alumbaugh (talum...@google.com) wrote:
> >  Working Set Reporting supported in virtio-balloon.
> >  - adds working set reporting and notification vqueues
> >  - QMP API additions:
> >- guest-ws property on balloon
> >- generates QMP WS_EVENT when new reports available
> >- ws_config, ws_request commands
>
> Hi,
>   1st it's probably best to split this patch up into a few
> separate patches; something like:
> 1) Updating the virtio_balloon header
> 2) the main virtio-balloon code
> 3) Adding the qmp code
> 4) Adding the HMP code
> 5) The migration code
>
> That would make it easier for people to review
> the bits they know.
>
> Also, please make sure migration works between a host
> without this feature and one which does; I suggest
> turning the feature off in older machine types, and
> also just checking that it works.
>

Thanks very much for this feedback. This makes sense to me. I had
originally attempted to split the patch into 2 (all device changes and
all qmp + HMP) but got compilation warnings (that became errors) on
uncalled functions due to the default compiler settings (since some of
the new functions in the device only exist in order to be called by
QMP). It sounds like that's OK for the purposes of review. I'll do as
you suggest and update with a v2 soon.

-T.J.

> See more comments below.
>
> Dave
>
> > Signed-off-by: T.J. Alumbaugh 
> > ---
> >  hmp-commands.hx   |  26 ++
> >  hw/core/machine-hmp-cmds.c|  21 ++
> >  hw/virtio/virtio-balloon-pci.c|   2 +
> >  hw/virtio/virtio-balloon.c| 225 +-
> >  include/hw/virtio/virtio-balloon.h|  17 +-
> >  include/monitor/hmp.h |   2 +
> >  .../standard-headers/linux/virtio_balloon.h   |  17 ++
> >  include/sysemu/balloon.h  |   8 +-
> >  monitor/monitor.c |   1 +
> >  qapi/machine.json |  66 +
> >  qapi/misc.json|  26 ++
> >  softmmu/balloon.c |  32 ++-
> >  12 files changed, 437 insertions(+), 6 deletions(-)
> >
> > diff --git a/hmp-commands.hx b/hmp-commands.hx
> > index 9afbb54a51..f3548a148f 100644
> > --- a/hmp-commands.hx
> > +++ b/hmp-commands.hx
> > @@ -1396,6 +1396,32 @@ SRST
> >Request VM to change its memory allocation to *value* (in MB).
> >  ERST
> >
> > +{
> > +.name   = "ws_config",
> > +.args_type  = "i0:i,i1:i,i2:i,refresh:i,report:i",
> > +.params = "bin intervals 0-2, refresh and report thresholds",
> > +.help   = "Working Set intervals, refresh/report thresholds (ms)",
> > +.cmd= hmp_ws_config,
> > +},
> > +
> > +SRST
> > +``ws_config``
> > +  Set the intervals (in ms), refresh, and report thresholds for WS reporting
> > +ERST
> > +
> > +{
> > +.name   = "ws_request",
> > +.args_type  = "",
> > +.params = "",
> > +.help   = "Request the Working Set of the guest.",
> > +.cmd= hmp_ws_request,
> > +},
> > +
> > +SRST
> > +``wss_request``
>
> Typo 'ws*s*'
>
> Some other comments on that:
>   a) When you've split the hmp stuff out into a separate patch you can
>  give an example of the command (especially ws_config) in the
>  commit message.
>
>   b) Would it make sense to have a query-ws/info ws to display the last received
>  working set info?
>
>   c) Some may feel 'ws' is a bit terse and want the unabbreviated
>   version.  (Is it also general, or is it actually virtio balloon
>   specific, ie should the name include virtio or balloon?)
>
>   d) You've got 3 bin intervals; is that '3' set in stone or is it
>   likely to change in the future, in which case perhaps you want the
>   parameters to be more flexible.  I note your migration code
>   transfers a 'number of bins'.
>
> > +  Request the Working Set Size of the guest.
> > +ERST
> > +
> >  {
> >  .name   = "set_link",
> >  .args_type  = "name:s,up:b",
> > diff --git a/hw/core/machine-hmp-cmds.c b/hw/core/machine-hmp-cmds.c
> > index c3e55ef9e9..dd11865ddc 100644
> > --- a/hw/core/machine-hmp-cmds.c
> > +++ b/hw/core/machine-hmp-cmds.c
> > @@ -237,6 +237,27 @@ void hmp_balloon(Monitor *mon, const QDict *qdict)
> >  hmp_handle_error(mon, err);
> >  }
> >
> > +void hmp_ws_request(Monitor *mon, const QDict *qdict)
> > +{
> > +Error *err = NULL;
> > +
> > +qmp_ws_request(&err);
> > +hmp_handle_error(mon, err);
> > +}
> > +
> > +void hmp_ws_config(Monitor *mon, const QDict *qdict)
> > +{
> > +uint64_t i0 = qdict_get_int(qdict, "i0");
> > +uint64_t i1 = qdict_get_int(qdict, "i1");
> > +uint64_t i2 = qdict_get_int(qdict, "i2");
> > +uint64_t refresh = qdict_get_int(qdict, "refresh");
> > +

Re: [PATCH v2] cryptodev-vhost-user: add asymmetric crypto support

2023-05-18 Thread Michael S. Tsirkin

Pls do not send v2 as a reply to v1.
Start a new thread if you really want to reply to v1
with a link to the lore copy of v2.

-- 
MST

Re: [PATCH] virtio-gpu: add a FIXME for virtio_gpu_load()

2023-05-18 Thread Michael S. Tsirkin

On Mon, May 15, 2023 at 05:25:18PM +0400, marcandre.lur...@redhat.com wrote:
> From: Marc-André Lureau 
> 
> It looks like the virtio_gpu_load() does not compute and set the offset,
> the same way virtio_gpu_set_scanout() does. This probably results in
> incorrect display until the scanout/framebuffer is updated again, I
> guess we should fix it, although I haven't checked this yet.
> 
> Signed-off-by: Marc-André Lureau 

I guess it's a way to ping Gerd ;)
Better to just fix it though, no?

> ---
>  hw/display/virtio-gpu.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/hw/display/virtio-gpu.c b/hw/display/virtio-gpu.c
> index 66ac9b6cc5..66cddd94d9 100644
> --- a/hw/display/virtio-gpu.c
> +++ b/hw/display/virtio-gpu.c
> @@ -1289,6 +1289,7 @@ static int virtio_gpu_load(QEMUFile *f, void *opaque, size_t size,
>  /* load & apply scanout state */
>  vmstate_load_state(f, &vmstate_virtio_gpu_scanouts, g, 1);
>  for (i = 0; i < g->parent_obj.conf.max_outputs; i++) {
> +/* FIXME: should take scanout.r.{x,y} into account */
>  scanout = &g->parent_obj.scanout[i];
>  if (!scanout->resource_id) {
>  continue;
> -- 
> 2.40.1

Re: [PATCH 0/4] Trivial cleanups

2023-05-18 Thread Michael S. Tsirkin

On Sat, May 13, 2023 at 12:09:02PM +0200, Bernhard Beschow wrote:
> This series:
> * Removes dead code from omap_uart and i82378
> * Resolves redundant code in the i8254 timer devices
> * Replaces string literals by macro usage for TYPE_ISA_PARALLEL devices
> 
> Bernhard Beschow (4):
>   hw/timer/i8254_common: Share "iobase" property via base class
>   hw/arm/omap: Remove unused omap_uart_attach()
>   hw/char/parallel: Export TYPE_ISA_PARALLEL macro
>   hw/isa/i82378: Remove unused "io" attribute


Acked-by: Michael S. Tsirkin 

>  include/hw/arm/omap.h  | 1 -
>  include/hw/char/parallel.h | 2 ++
>  hw/char/omap_uart.c| 9 -
>  hw/char/parallel-isa.c | 2 +-
>  hw/char/parallel.c | 1 -
>  hw/i386/kvm/i8254.c| 1 -
>  hw/isa/i82378.c| 1 -
>  hw/isa/isa-superio.c   | 3 ++-
>  hw/timer/i8254.c   | 6 --
>  hw/timer/i8254_common.c| 6 ++
>  10 files changed, 11 insertions(+), 21 deletions(-)
> 
> -- 
> 2.40.1
>

Re: [PATCH v8 00/23] Consolidate PIIX south bridges

2023-05-18 Thread Michael S. Tsirkin

On Wed, May 10, 2023 at 06:39:49PM +, Bernhard Beschow wrote:
> 
> 
> On 21 April 2023 16:40:47 UTC, Bernhard Beschow wrote:
> >
> >
> >On 21 April 2023 07:15:18 UTC, "Michael S. Tsirkin" wrote:
> >>On Thu, Mar 02, 2023 at 10:21:38PM +0100, Bernhard Beschow wrote:
> >>> This series consolidates the implementations of the PIIX3 and PIIX4 south
> >>> bridges and is an extended version of [1]. The motivation is to share as much
> >>> code as possible and to bring both device models to feature parity such that
> >>> perhaps PIIX4 can become a drop-in-replacement for PIIX3 in the pc machine. This
> >>> could resolve the "Frankenstein" PIIX4-PM problem in PIIX3 discussed on this
> >>> list before.
> >>
> >>Hi!
> >>Now freeze is over, could you rebase pls?
> >>I could try to resolve the conflicts but this is so big I'd rather
> >>not take the risk of messing it up.
> >
> >Sure! Since this series is still under discussion I'd wait for the PIIX3 Xen 
> >decoupling series to land in master. This will simplify this series a bit by 
> >taking Xen out of the equation.
> 
> Could we queue the first two RTC patches already? IMO they're useful general 
> PC cleanups on their own.
> 
> Best regards,
> Bernhard


Could you please post just these two then?
Preferably rebased.

Thanks!

> >
> >Best regards,
> >Bernhard
> >
> >>
> >>> The series is structured as follows:
> >>> 
> >>> Move sub devices into the PIIX3 south bridge, like PIIX4 does already:
> >>> * hw/i386/pc: Create RTC controllers in south bridges
> >>> * hw/i386/pc: No need for rtc_state to be an out-parameter
> >>> * hw/i386/pc_piix: Allow for setting properties before realizing PIIX3 south bridge
> >>> * hw/isa/piix3: Create USB controller in host device
> >>> * hw/isa/piix3: Create power management controller in host device
> >>> * hw/isa/piix3: Move ISA bus IRQ assignments into host device
> >>> * hw/isa/piix3: Create IDE controller in host device
> >>> * hw/isa/piix3: Wire up ACPI interrupt internally
> >>> 
> >>> Make PIIX3 and PIIX4 south bridges more similar:
> >>> * hw/isa/piix3: Resolve redundant PIIX_NUM_PIC_IRQS
> >>> * hw/isa/piix3: Rename pci_piix3_props for sharing with PIIX4
> >>> * hw/isa/piix3: Rename piix3_reset() for sharing with PIIX4
> >>> * hw/isa/piix3: Drop the "3" from PIIX base class
> >>> * hw/isa/piix4: Make PIIX4's ACPI and USB functions optional
> >>> * hw/isa/piix4: Remove unused inbound ISA interrupt lines
> >>> * hw/isa/piix4: Reuse struct PIIXState from PIIX3
> >>> * hw/isa/piix4: Create the "intr" property during init() already
> >>> * hw/isa/piix4: Rename reset control operations to match PIIX3
> >>> 
> >>> This patch achieves the main goal of the series:
> >>> * hw/isa/piix3: Merge hw/isa/piix4.c
> >>> 
> >>> Perform some further consolidations which were easier to do after the merge:
> >>> * hw/isa/piix: Harmonize names of reset control memory regions
> >>> * hw/isa/piix: Rename functions to be shared for interrupt triggering
> >>> * hw/isa/piix: Consolidate IRQ triggering
> >>> * hw/isa/piix: Share PIIX3's base class with PIIX4
> >>> * hw/isa/piix: Reuse PIIX3 base class' realize method in PIIX4
> >>> 
> >>> One challenge was dealing with optional devices where Peter already gave advice
> >>> in [1] which this series implements.
> >>> 
> >>> There are still some differences in the device models:
> >>> - PIIX4 instantiates its own PIC and PIT while PIIX3 doesn't
> >>> - PIIX4 wires up the RTC IRQ itself while PIIX3 doesn't
> >>> - Different binary layout in VM state
> >>> 
> >>> v8:
> >>> - Rebase onto master
> >>> - Remove Reviewed-by tag from 'hw/isa/piix: Reuse PIIX3 base class' realize
> >>>   method in PIIX4' since it changed considerably in v7.
> >>> 
> >>> Testing done (both on top of series as well as on 'hw/isa/piix3: Drop the "3"
> >>> from PIIX base class'):
> >>> * `make check`
> >>> * `make check-avocado`
> >>> * Boot live CD:
> >>>   * `qemu-system-x86_64 -M pc -m 2G -accel kvm -cpu host -cdrom manjaro-kde-21.3.2-220704-linux515.iso`
> >>>   * `qemu-system-x86_64 -M q35 -m 2G -accel kvm -cpu host -cdrom manjaro-kde-21.3.2-220704-linux515.iso`
> >>> * `qemu-system-mips64el -M malta -kernel vmlinux-3.2.0-4-5kc-malta -hda debian_wheezy_mipsel_standard.qcow2 -append "root=/dev/sda1 console=ttyS0"`
> >>> 
> >>> v7:
> >>> - Rebase onto master
> >>> - Avoid the PIC proxy (Phil)
> >>>   The motivation for the PIC proxy was to allow for wiring up ISA interrupts in
> >>>   the south bridges. ISA interrupt wiring requires the GPIO lines to be
> >>>   populated already but pc_piix assigned the interrupts only after realizing
> >>>   PIIX3. By shifting interrupt assignment before realizing, the ISA interrupts
> >>>   are already populated during PIIX3's realize phase where the ISA interrupts
> >>>   are wired up.
> >>> - New patches:
> >>>   * hw/isa/piix4: Reuse struct PIIXState from PIIX3
>

Re: [PATCH v3 5/5] vdpa: move CVQ isolation check to net_init_vhost_vdpa

2023-05-18 Thread Michael S. Tsirkin

On Thu, May 18, 2023 at 08:36:22AM +0200, Eugenio Perez Martin wrote:
> On Thu, May 18, 2023 at 7:50 AM Jason Wang  wrote:
> >
> > On Wed, May 17, 2023 at 2:30 PM Eugenio Perez Martin  wrote:
> > >
> > > On Wed, May 17, 2023 at 5:59 AM Jason Wang  wrote:
> > > >
> > > > On Tue, May 9, 2023 at 11:44 PM Eugenio Pérez  wrote:
> > > > >
> > > > > Evaluating it at start time instead of initialization time may make the
> > > > > guest capable of dynamically adding or removing migration blockers.
> > > > >
> > > > > Also, moving to initialization reduces the number of ioctls in the
> > > > > migration, reducing failure possibilities.
> > > > >
> > > > > As a drawback we need to check for CVQ isolation twice: one time with no
> > > > > MQ negotiated and another one acking it, as long as the device supports
> > > > > it.  This is because Vring ASID / group management is based on vq
> > > > > indexes, but we don't know the index of CVQ before negotiating MQ.
> > > > >
> > > > > Signed-off-by: Eugenio Pérez 
> > > > > ---
> > > > > v2: Take out the reset of the device from vhost_vdpa_cvq_is_isolated
> > > > > v3: Only record cvq_isolated, true if the device has cvq isolated in
> > > > > both !MQ and MQ configurations.
> > > > > ---
> > > > >  net/vhost-vdpa.c | 178 +++
> > > > >  1 file changed, 135 insertions(+), 43 deletions(-)
> > > > >
> > > > > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > > > > index 3fb833fe76..29054b77a9 100644
> > > > > --- a/net/vhost-vdpa.c
> > > > > +++ b/net/vhost-vdpa.c
> > > > > @@ -43,6 +43,10 @@ typedef struct VhostVDPAState {
> > > > >
> > > > >  /* The device always have SVQ enabled */
> > > > >  bool always_svq;
> > > > > +
> > > > > +/* The device can isolate CVQ in its own ASID */
> > > > > +bool cvq_isolated;
> > > > > +
> > > > >  bool started;
> > > > >  } VhostVDPAState;
> > > > >
> > > > > @@ -362,15 +366,8 @@ static NetClientInfo net_vhost_vdpa_info = {
> > > > >  .check_peer_type = vhost_vdpa_check_peer_type,
> > > > >  };
> > > > >
> > > > > -/**
> > > > > - * Get vring virtqueue group
> > > > > - *
> > > > > - * @device_fd  vdpa device fd
> > > > > - * @vq_index   Virtqueue index
> > > > > - *
> > > > > - * Return -errno in case of error, or vq group if success.
> > > > > - */
> > > > > -static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > > +static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index,
> > > > > +  Error **errp)
> > > > >  {
> > > > >  struct vhost_vring_state state = {
> > > > >  .index = vq_index,
> > > > > @@ -379,8 +376,7 @@ static int64_t vhost_vdpa_get_vring_group(int device_fd, unsigned vq_index)
> > > > >
> > > > >  if (unlikely(r < 0)) {
> > > > >  r = -errno;
> > > > > -error_report("Cannot get VQ %u group: %s", vq_index,
> > > > > - g_strerror(errno));
> > > > > +error_setg_errno(errp, errno, "Cannot get VQ %u group", vq_index);
> > > > >  return r;
> > > > >  }
> > > > >
> > > > > @@ -480,9 +476,9 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > >  {
> > > > >  VhostVDPAState *s, *s0;
> > > > >  struct vhost_vdpa *v;
> > > > > -uint64_t backend_features;
> > > > >  int64_t cvq_group;
> > > > > -int cvq_index, r;
> > > > > +int r;
> > > > > +Error *err = NULL;
> > > > >
> > > > >  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
> > > > >
> > > > > @@ -502,41 +498,22 @@ static int vhost_vdpa_net_cvq_start(NetClientState *nc)
> > > > >  /*
> > > > >   * If we early return in these cases SVQ will not be enabled. The migration
> > > > >   * will be blocked as long as vhost-vdpa backends will not offer _F_LOG.
> > > > > - *
> > > > > - * Calling VHOST_GET_BACKEND_FEATURES as they are not available in v->dev
> > > > > - * yet.
> > > > >   */
> > > > > -r = ioctl(v->device_fd, VHOST_GET_BACKEND_FEATURES, &backend_features);
> > > > > -if (unlikely(r < 0)) {
> > > > > -error_report("Cannot get vdpa backend_features: %s(%d)",
> > > > > -g_strerror(errno), errno);
> > > > > -return -1;
> > > > > +if (!vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> > > > > +return 0;
> > > > >  }
> > > > > -if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_IOTLB_ASID)) ||
> > > > > -!vhost_vdpa_net_valid_svq_features(v->dev->features, NULL)) {
> > > > > +
> > > > > +if (!s->cvq_isolated) {
> > > > >  return 0;
> > > > >  }
> > > > >
> > > > > -/*
> > > > > - * Check if all the virtqueues of the virtio device are in a different vq
> > > > > - * than the last vq. VQ group of

Re: [PATCH 3/3] iotests: Test commit with iothreads and ongoing I/O

2023-05-18 Thread Eric Blake

On Wed, May 17, 2023 at 05:28:34PM +0200, Kevin Wolf wrote:
> This test exercises graph locking, draining, and graph modifications
> with AioContext switches a lot. Amongst others, it serves as a
> regression test for bdrv_graph_wrlock() deadlocking because it is called
> with a locked AioContext and for AioContext handling in the NBD server.
> 
> Signed-off-by: Kevin Wolf 

I've now confirmed the following setups with just './check
graph-changes-while-io -qcow2':

patch 3 alone => test fails with wrlock assertion (good, we're
catching the 8.0 regression where the new assertion failure is
tripping)

patch 1 and 3 => test fails differently than patch 3 alone (good,
we're exposing the fact that NBD had a pre-existing bug, regardless of
whether the added rwlock made it easier to spot)

patch 2 and 3 => test passes (good, patch 2 appears to have fixed this
particular bug, and when we are ready to revert patch 1 because we get
rid of AioContext locking we'll still be okay)

patch 1, 2, and 3 => test passes (good, we fixed the NBD bug, and have
regression testing in place for a scenario that previously wasn't
getting good testing)

As such, I'm happy to supply:

Tested-by: Eric Blake 


Now on to the patch itself...

> ---
>  tests/qemu-iotests/iotests.py |  4 ++
>  .../qemu-iotests/tests/graph-changes-while-io | 56 +--
>  .../tests/graph-changes-while-io.out  |  4 +-
>  3 files changed, 58 insertions(+), 6 deletions(-)
> 

> @@ -84,6 +84,54 @@ class TestGraphChangesWhileIO(QMPTestCase):
>  
>  bench_thr.join()
>  
> +def test_commit_while_io(self) -> None:
> +# Run qemu-img bench in the background
> +bench_thr = Thread(target=do_qemu_img_bench, args=(20, ))

TIL - you can create a 1-item tuple in Python.  It caught me
off-guard, but makes sense now that I've re-read it.

> +
> +# While qemu-img bench is running, repeatedly commit overlay to node0
> +while bench_thr.is_alive():
> +result = self.qsd.qmp('block-commit', {
> +'job-id': 'job0',
> +'device': 'overlay',
> +})
> +self.assert_qmp(result, 'return', {})
> +
> +result = self.qsd.qmp('block-job-cancel', {
> +'device': 'job0',
> +})
> +self.assert_qmp(result, 'return', {})
> +
> +cancelled = False
> +while not cancelled:
> +for event in self.qsd.get_qmp().get_events(wait=10.0):

The updated test took about 34 seconds on my machine; long enough that
it is rightfully not part of './check -g quick', but still reasonable
that I had no problems reproducing the issue while the test was
running.

> +if event['event'] != 'JOB_STATUS_CHANGE':
> +continue
> +if event['data']['status'] == 'null':
> +cancelled = True
> +
> +bench_thr.join()
> +

It feels a bit odd that the test is skipped during './check -nbd', yet
it IS utilizing nbd, and the fix in patch 2 is indeed in NBD code.
But that's not a flaw in the test itself, just a limitation of what
images we need in order to set up the NBD service in a way to trigger
the problem.

Reviewed-by: Eric Blake 

I'm in a spot where I can quickly queue this through my NBD tree so we
can get it backported to 8.0.1; pull request coming up, provided the
full series passes a few more tests on my end.

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.   +1-919-301-3266
Virtualization:  qemu.org | libvirt.org

[PULL v2 44/44] Hexagon (linux-user/hexagon): handle breakpoints

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

This enables LLDB to work with hexagon linux-user mode through the GDB
remote protocol.

Helped-by: Richard Henderson 
Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Message-Id: 

---
 linux-user/hexagon/cpu_loop.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/linux-user/hexagon/cpu_loop.c b/linux-user/hexagon/cpu_loop.c
index b84e25bf71..7f1499ed28 100644
--- a/linux-user/hexagon/cpu_loop.c
+++ b/linux-user/hexagon/cpu_loop.c
@@ -63,6 +63,9 @@ void cpu_loop(CPUHexagonState *env)
 case EXCP_ATOMIC:
 cpu_exec_step_atomic(cs);
 break;
+case EXCP_DEBUG:
+force_sig_fault(TARGET_SIGTRAP, TARGET_TRAP_BRKPT, 0);
+break;
 default:
 EXCP_DUMP(env, "\nqemu: unhandled CPU exception %#x - aborting\n",
  trapnr);
-- 
2.25.1

[PULL v2 42/44] Hexagon (gdbstub): fix p3:0 read and write via stub

2023-05-18 Thread Taylor Simpson

From: Brian Cain 

Signed-off-by: Brian Cain 
Co-authored-by: Sid Manning 
Signed-off-by: Sid Manning 
Co-authored-by: Matheus Tavares Bernardino 
Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Message-Id: 
<32e7de567cdae184a6781644454bbb19916c955b.1683214375.git.quic_mathb...@quicinc.com>
---
 target/hexagon/gdbstub.c | 16 
 1 file changed, 16 insertions(+)

diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
index 46083da620..a06fed9f18 100644
--- a/target/hexagon/gdbstub.c
+++ b/target/hexagon/gdbstub.c
@@ -25,6 +25,14 @@ int hexagon_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
 HexagonCPU *cpu = HEXAGON_CPU(cs);
 CPUHexagonState *env = &cpu->env;
 
+if (n == HEX_REG_P3_0_ALIASED) {
+uint32_t p3_0 = 0;
+for (int i = 0; i < NUM_PREGS; i++) {
+p3_0 = deposit32(p3_0, i * 8, 8, env->pred[i]);
+}
+return gdb_get_regl(mem_buf, p3_0);
+}
+
 if (n < TOTAL_PER_THREAD_REGS) {
 return gdb_get_regl(mem_buf, env->gpr[n]);
 }
@@ -37,6 +45,14 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 HexagonCPU *cpu = HEXAGON_CPU(cs);
 CPUHexagonState *env = &cpu->env;
 
+if (n == HEX_REG_P3_0_ALIASED) {
+uint32_t p3_0 = ldtul_p(mem_buf);
+for (int i = 0; i < NUM_PREGS; i++) {
+env->pred[i] = extract32(p3_0, i * 8, 8);
+}
+return sizeof(target_ulong);
+}
+
 if (n < TOTAL_PER_THREAD_REGS) {
 env->gpr[n] = ldtul_p(mem_buf);
 return sizeof(target_ulong);
-- 
2.25.1
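The aliased p3:0 register in this patch packs the four 8-bit predicate registers into a single 32-bit value, with P0 in bits 7:0 and P3 in bits 31:24. The standalone sketch below reproduces that read/write loop outside QEMU. It assumes NUM_PREGS is 4 (P0 to P3) and reimplements `deposit32()` and `extract32()` locally with the same semantics as QEMU's `qemu/bitops.h` helpers; `pack_p3_0()` and `unpack_p3_0()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4 /* assumption: Hexagon predicate registers P0..P3 */

/* Return the 'length'-bit field of 'value' starting at bit 'start'
 * (local copy with the semantics of QEMU's extract32). */
static inline uint32_t extract32(uint32_t value, int start, int length)
{
    assert(start >= 0 && length > 0 && length <= 32 - start);
    return (value >> start) & (~0U >> (32 - length));
}

/* Replace the 'length'-bit field of 'value' at bit 'start' with 'fieldval'
 * (local copy with the semantics of QEMU's deposit32). */
static inline uint32_t deposit32(uint32_t value, int start, int length,
                                 uint32_t fieldval)
{
    uint32_t mask;

    assert(start >= 0 && length > 0 && length <= 32 - start);
    mask = (~0U >> (32 - length)) << start;
    return (value & ~mask) | ((fieldval << start) & mask);
}

/* Read path (hypothetical): P0 lands in the least significant byte. */
static uint32_t pack_p3_0(const uint32_t pred[NUM_PREGS])
{
    uint32_t p3_0 = 0;

    for (int i = 0; i < NUM_PREGS; i++) {
        p3_0 = deposit32(p3_0, i * 8, 8, pred[i]);
    }
    return p3_0;
}

/* Write path (hypothetical): split p3:0 back into P0..P3. */
static void unpack_p3_0(uint32_t p3_0, uint32_t pred[NUM_PREGS])
{
    for (int i = 0; i < NUM_PREGS; i++) {
        pred[i] = extract32(p3_0, i * 8, 8);
    }
}
```

For example, predicates {0x01, 0xff, 0x00, 0x80} pack to 0x8000ff01, and unpacking that value gives the same four bytes back.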

[PULL v2 43/44] Hexagon (gdbstub): add HVX support

2023-05-18 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
Co-authored-by: Brian Cain 
Signed-off-by: Brian Cain 
Co-authored-by: Matheus Tavares Bernardino 
Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Brian Cain 
Message-Id: 
<17cb32f34d469f705c3cc066a3583935352ee048.1683214375.git.quic_mathb...@quicinc.com>
---
 configs/targets/hexagon-linux-user.mak |  2 +-
 target/hexagon/internal.h  |  2 +
 target/hexagon/cpu.c   |  6 ++
 target/hexagon/gdbstub.c   | 68 ++
 gdb-xml/hexagon-hvx.xml| 96 ++
 5 files changed, 173 insertions(+), 1 deletion(-)
 create mode 100644 gdb-xml/hexagon-hvx.xml

diff --git a/configs/targets/hexagon-linux-user.mak b/configs/targets/hexagon-linux-user.mak
index fd5e222d4f..2765a4c563 100644
--- a/configs/targets/hexagon-linux-user.mak
+++ b/configs/targets/hexagon-linux-user.mak
@@ -1,2 +1,2 @@
 TARGET_ARCH=hexagon
-TARGET_XML_FILES=gdb-xml/hexagon-core.xml
+TARGET_XML_FILES=gdb-xml/hexagon-core.xml gdb-xml/hexagon-hvx.xml
diff --git a/target/hexagon/internal.h b/target/hexagon/internal.h
index b1bfadc3f5..d732b6bb3c 100644
--- a/target/hexagon/internal.h
+++ b/target/hexagon/internal.h
@@ -33,6 +33,8 @@
 
 int hexagon_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int hexagon_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+int hexagon_hvx_gdb_read_register(CPUHexagonState *env, GByteArray *mem_buf, int n);
+int hexagon_hvx_gdb_write_register(CPUHexagonState *env, uint8_t *mem_buf, int n);
 
 void hexagon_debug_vreg(CPUHexagonState *env, int regnum);
 void hexagon_debug_qreg(CPUHexagonState *env, int regnum);
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 575bcc190d..f155936289 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -24,6 +24,7 @@
 #include "hw/qdev-properties.h"
 #include "fpu/softfloat-helpers.h"
 #include "tcg/tcg.h"
+#include "exec/gdbstub.h"
 
 static void hexagon_v67_cpu_init(Object *obj) { }
 static void hexagon_v68_cpu_init(Object *obj) { }
@@ -339,6 +340,11 @@ static void hexagon_cpu_realize(DeviceState *dev, Error **errp)
 return;
 }
 
+gdb_register_coprocessor(cs, hexagon_hvx_gdb_read_register,
+ hexagon_hvx_gdb_write_register,
+ NUM_VREGS + NUM_QREGS,
+ "hexagon-hvx.xml", 0);
+
 qemu_init_vcpu(cs);
 cpu_reset(cs);
 
diff --git a/target/hexagon/gdbstub.c b/target/hexagon/gdbstub.c
index a06fed9f18..54d37e006e 100644
--- a/target/hexagon/gdbstub.c
+++ b/target/hexagon/gdbstub.c
@@ -60,3 +60,71 @@ int hexagon_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 
 g_assert_not_reached();
 }
+
+static int gdb_get_vreg(CPUHexagonState *env, GByteArray *mem_buf, int n)
+{
+int total = 0;
+int i;
+for (i = 0; i < ARRAY_SIZE(env->VRegs[n].uw); i++) {
+total += gdb_get_regl(mem_buf, env->VRegs[n].uw[i]);
+}
+return total;
+}
+
+static int gdb_get_qreg(CPUHexagonState *env, GByteArray *mem_buf, int n)
+{
+int total = 0;
+int i;
+for (i = 0; i < ARRAY_SIZE(env->QRegs[n].uw); i++) {
+total += gdb_get_regl(mem_buf, env->QRegs[n].uw[i]);
+}
+return total;
+}
+
+int hexagon_hvx_gdb_read_register(CPUHexagonState *env, GByteArray *mem_buf, int n)
+{
+if (n < NUM_VREGS) {
+return gdb_get_vreg(env, mem_buf, n);
+}
+n -= NUM_VREGS;
+
+if (n < NUM_QREGS) {
+return gdb_get_qreg(env, mem_buf, n);
+}
+
+g_assert_not_reached();
+}
+
+static int gdb_put_vreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+int i;
+for (i = 0; i < ARRAY_SIZE(env->VRegs[n].uw); i++) {
+env->VRegs[n].uw[i] = ldtul_p(mem_buf);
+mem_buf += 4;
+}
+return MAX_VEC_SIZE_BYTES;
+}
+
+static int gdb_put_qreg(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+int i;
+for (i = 0; i < ARRAY_SIZE(env->QRegs[n].uw); i++) {
+env->QRegs[n].uw[i] = ldtul_p(mem_buf);
+mem_buf += 4;
+}
+return MAX_VEC_SIZE_BYTES / 8;
+}
+
+int hexagon_hvx_gdb_write_register(CPUHexagonState *env, uint8_t *mem_buf, int n)
+{
+   if (n < NUM_VREGS) {
+return gdb_put_vreg(env, mem_buf, n);
+}
+n -= NUM_VREGS;
+
+if (n < NUM_QREGS) {
+return gdb_put_qreg(env, mem_buf, n);
+}
+
+g_assert_not_reached();
+}
diff --git a/gdb-xml/hexagon-hvx.xml b/gdb-xml/hexagon-hvx.xml
new file mode 100644
index 00..5f2e220733
--- /dev/null
+++ b/gdb-xml/hexagon-hvx.xml
@@ -0,0 +1,96 @@
+
+
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
+
+
+
+
+
+
+
+  
+
+  
+
+
+  
+  
+
+
+
+
+  
+  
+  
+  
+  
+  
+
+
+
+
+  
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
+
-- 
2.25.1
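To illustrate the byte layout that `gdb_get_vreg()` and `gdb_get_qreg()` produce, here is a minimal sketch that serializes one HVX coprocessor register the same way: one 32-bit word at a time, little-endian, with indices below NUM_VREGS selecting a V register and the next NUM_QREGS indices selecting a Q register. It assumes 128-byte vectors (MAX_VEC_SIZE_BYTES = 128), 32 V registers and 4 Q registers; `HvxState`, `put_le32()` and `hvx_read_register()` are hypothetical stand-ins, not QEMU code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed HVX configuration. */
#define MAX_VEC_SIZE_BYTES 128
#define NUM_VREGS 32
#define NUM_QREGS 4

typedef struct {
    uint32_t uw[MAX_VEC_SIZE_BYTES / 4];      /* V register: 128 bytes */
} VReg;

typedef struct {
    uint32_t uw[MAX_VEC_SIZE_BYTES / 8 / 4];  /* Q register: 16 bytes */
} QReg;

typedef struct {
    VReg VRegs[NUM_VREGS];
    QReg QRegs[NUM_QREGS];
} HvxState;

/* Append one 32-bit word in little-endian byte order. */
static size_t put_le32(uint8_t *buf, uint32_t v)
{
    for (int b = 0; b < 4; b++) {
        buf[b] = (uint8_t)(v >> (8 * b));
    }
    return 4;
}

/*
 * Serialize coprocessor register n into buf. Indices [0, NUM_VREGS) are
 * V0..V31; the following NUM_QREGS indices are Q0..Q3.
 * Returns the number of bytes written.
 */
static size_t hvx_read_register(const HvxState *s, uint8_t *buf, int n)
{
    size_t total = 0;

    if (n < NUM_VREGS) {
        for (size_t i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
            total += put_le32(buf + total, s->VRegs[n].uw[i]);
        }
        return total;
    }
    n -= NUM_VREGS;
    assert(n < NUM_QREGS);
    for (size_t i = 0; i < MAX_VEC_SIZE_BYTES / 8 / 4; i++) {
        total += put_le32(buf + total, s->QRegs[n].uw[i]);
    }
    return total;
}
```

The sizes returned here (128 and 16 bytes) correspond to what `gdb_put_vreg()` and `gdb_put_qreg()` report on the write path, `MAX_VEC_SIZE_BYTES` and `MAX_VEC_SIZE_BYTES / 8`.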

[PULL v2 41/44] Hexagon: add core gdbstub xml data for LLDB

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Message-Id: 

---
 MAINTAINERS|  1 +
 configs/targets/hexagon-linux-user.mak |  1 +
 target/hexagon/cpu.c   |  3 +-
 gdb-xml/hexagon-core.xml   | 84 ++
 4 files changed, 88 insertions(+), 1 deletion(-)
 create mode 100644 gdb-xml/hexagon-core.xml

diff --git a/MAINTAINERS b/MAINTAINERS
index f757369373..2e18c3cad4 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -225,6 +225,7 @@ F: tests/tcg/hexagon/
 F: disas/hexagon.c
 F: configs/targets/hexagon-linux-user/default.mak
 F: docker/dockerfiles/debian-hexagon-cross.docker
+F: gdb-xml/hexagon*.xml
 
 Hexagon idef-parser
 M: Alessandro Di Federico 
diff --git a/configs/targets/hexagon-linux-user.mak b/configs/targets/hexagon-linux-user.mak
index 003ed0a408..fd5e222d4f 100644
--- a/configs/targets/hexagon-linux-user.mak
+++ b/configs/targets/hexagon-linux-user.mak
@@ -1 +1,2 @@
 TARGET_ARCH=hexagon
+TARGET_XML_FILES=gdb-xml/hexagon-core.xml
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index 7e127059c7..575bcc190d 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -383,8 +383,9 @@ static void hexagon_cpu_class_init(ObjectClass *c, void *data)
 cc->get_pc = hexagon_cpu_get_pc;
 cc->gdb_read_register = hexagon_gdb_read_register;
 cc->gdb_write_register = hexagon_gdb_write_register;
-cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS + NUM_VREGS + NUM_QREGS;
+cc->gdb_num_core_regs = TOTAL_PER_THREAD_REGS;
 cc->gdb_stop_before_watchpoint = true;
+cc->gdb_core_xml_file = "hexagon-core.xml";
 cc->disas_set_info = hexagon_cpu_disas_set_info;
 cc->tcg_ops = &hexagon_tcg_ops;
 }
diff --git a/gdb-xml/hexagon-core.xml b/gdb-xml/hexagon-core.xml
new file mode 100644
index 00..e181163cff
--- /dev/null
+++ b/gdb-xml/hexagon-core.xml
@@ -0,0 +1,84 @@
+
+
+
+
+
+
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+  
+
+
-- 
2.25.1

[PULL v2 40/44] gdbstub: add test for untimely stop-reply packets

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

In the previous commit, we modified gdbstub.c to only send stop-reply
packets as a response to GDB commands that accept it. Now, let's add a
test for this intended behavior. Running this test before the fix from
the previous commit fails as QEMU sends a stop-reply packet
asynchronously, when GDB was in fact waiting for an ACK.

Signed-off-by: Matheus Tavares Bernardino 
Acked-by: Alex Bennée 
Signed-off-by: Taylor Simpson 
Message-Id: 

---
 tests/guest-debug/run-test.py| 16 
 .../tcg/multiarch/system/Makefile.softmmu-target | 16 +++-
 2 files changed, 27 insertions(+), 5 deletions(-)

diff --git a/tests/guest-debug/run-test.py b/tests/guest-debug/run-test.py
index d865e46ecd..de6106a5e5 100755
--- a/tests/guest-debug/run-test.py
+++ b/tests/guest-debug/run-test.py
@@ -26,11 +26,12 @@ def get_args():
 parser.add_argument("--qargs", help="Qemu arguments for test")
 parser.add_argument("--binary", help="Binary to debug",
 required=True)
-parser.add_argument("--test", help="GDB test script",
-required=True)
+parser.add_argument("--test", help="GDB test script")
 parser.add_argument("--gdb", help="The gdb binary to use",
 default=None)
+parser.add_argument("--gdb-args", help="Additional gdb arguments")
 parser.add_argument("--output", help="A file to redirect output to")
+parser.add_argument("--stderr", help="A file to redirect stderr to")
 
 return parser.parse_args()
 
@@ -58,6 +59,10 @@ def log(output, msg):
 output = open(args.output, "w")
 else:
 output = None
+if args.stderr:
+stderr = open(args.stderr, "w")
+else:
+stderr = None
 
 socket_dir = TemporaryDirectory("qemu-gdbstub")
 socket_name = os.path.join(socket_dir.name, "gdbstub.socket")
@@ -77,6 +82,8 @@ def log(output, msg):
 
 # Now launch gdb with our test and collect the result
 gdb_cmd = "%s %s" % (args.gdb, args.binary)
+if args.gdb_args:
+gdb_cmd += " %s" % (args.gdb_args)
 # run quietly and ignore .gdbinit
 gdb_cmd += " -q -n -batch"
 # disable prompts in case of crash
@@ -84,13 +91,14 @@ def log(output, msg):
 # connect to remote
 gdb_cmd += " -ex 'target remote %s'" % (socket_name)
 # finally the test script itself
-gdb_cmd += " -x %s" % (args.test)
+if args.test:
+gdb_cmd += " -x %s" % (args.test)
 
 
 sleep(1)
 log(output, "GDB CMD: %s" % (gdb_cmd))
 
-result = subprocess.call(gdb_cmd, shell=True, stdout=output)
+result = subprocess.call(gdb_cmd, shell=True, stdout=output, stderr=stderr)
 
 # A result of greater than 128 indicates a fatal signal (likely a
 # crash due to gdb internal failure). That's a problem for GDB and
diff --git a/tests/tcg/multiarch/system/Makefile.softmmu-target b/tests/tcg/multiarch/system/Makefile.softmmu-target
index 5f432c95f3..fe40195d39 100644
--- a/tests/tcg/multiarch/system/Makefile.softmmu-target
+++ b/tests/tcg/multiarch/system/Makefile.softmmu-target
@@ -27,6 +27,20 @@ run-gdbstub-memory: memory
"-monitor none -display none -chardev file$(COMMA)path=$<.out$(COMMA)id=output $(QEMU_OPTS)" \
--bin $< --test $(MULTIARCH_SRC)/gdbstub/memory.py, \
softmmu gdbstub support)
+
+run-gdbstub-untimely-packet: hello
+   $(call run-test, $@, $(GDB_SCRIPT) \
+   --gdb $(HAVE_GDB_BIN) \
+   --gdb-args "-ex 'set debug remote 1'" \
+   --output untimely-packet.gdb.out \
+   --stderr untimely-packet.gdb.err \
+   --qemu $(QEMU) \
+   --bin $< --qargs \
+   "-monitor none -display none -chardev file$(COMMA)path=untimely-packet.out$(COMMA)id=output $(QEMU_OPTS)", \
+   "softmmu gdbstub untimely packets")
+   $(call quiet-command, \
+   (! grep -Fq 'Packet instead of Ack, ignoring it' untimely-packet.gdb.err), \
+   "GREP", "file  untimely-packet.gdb.err")
 else
 run-gdbstub-%:
$(call skip-test, "gdbstub test $*", "no guest arch support")
@@ -36,4 +50,4 @@ run-gdbstub-%:
$(call skip-test, "gdbstub test $*", "need working gdb")
 endif
 
-MULTIARCH_RUNS += run-gdbstub-memory
+MULTIARCH_RUNS += run-gdbstub-memory run-gdbstub-untimely-packet
-- 
2.25.1
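For context on what this test detects: in the GDB remote serial protocol, every packet is framed as `$payload#xx`, where `xx` is the modulo-256 sum of the payload bytes written as two hex digits, and unless no-ack mode has been negotiated the receiver answers with `+` (or `-` to request a retransmission). A stop-reply packet sent while GDB is still waiting for that `+` is what triggers the "Packet instead of Ack, ignoring it" message the Makefile rule greps for. The sketch below shows the framing and how a waiting peer might classify the next byte it receives; `rsp_checksum()`, `rsp_frame()` and `rsp_classify()` are hypothetical helpers, not QEMU's gdbstub code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* RSP checksum: sum of the payload bytes modulo 256. */
static unsigned rsp_checksum(const char *payload)
{
    unsigned sum = 0;

    for (const unsigned char *p = (const unsigned char *)payload; *p; p++) {
        sum = (sum + *p) & 0xff;
    }
    return sum;
}

/* Frame a payload as "$<payload>#<xx>"; returns snprintf's result. */
static int rsp_frame(const char *payload, char *out, size_t outlen)
{
    return snprintf(out, outlen, "$%s#%02x", payload, rsp_checksum(payload));
}

/* What the peer may send while we are waiting for an acknowledgement. */
typedef enum { RSP_ACK, RSP_NAK, RSP_PACKET, RSP_OTHER } RspReply;

static RspReply rsp_classify(char first_byte)
{
    switch (first_byte) {
    case '+':
        return RSP_ACK;     /* previous packet accepted */
    case '-':
        return RSP_NAK;     /* retransmission requested */
    case '$':
        return RSP_PACKET;  /* a whole packet arrived instead of an ACK */
    default:
        return RSP_OTHER;
    }
}
```

For example, the payload `OK` is framed as `$OK#9a`, and a stop-reply `S05` as `$S05#b8`.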

Re: [PATCH v10 8/8] apic: disable reentrancy detection for apic-msi

2023-05-18 Thread Michael Tokarev


18.05.2023 23:22, Michael S. Tsirkin wrote:

On Thu, Apr 27, 2023 at 05:10:13PM -0400, Alexander Bulekov wrote:

As the code is designed for re-entrant calls to apic-msi, mark apic-msi
as reentrancy-safe.

Signed-off-by: Alexander Bulekov 
Reviewed-by: Darren Kenny 


Acked-by: Michael S. Tsirkin 


feel free to merge with rest of patchset - who's going to
merge it btw?



https://gitlab.com/qemu-project/qemu/-/commit/50795ee051a342c681a9b45671c552fbd6274db8

Author: Alexander Bulekov 
AuthorDate: Thu Apr 27 17:10:13 2023 -0400
Commit: Thomas Huth 
CommitDate: Fri Apr 28 11:31:54 2023 +0200

FWIW

/mjt

Re: [PATCH v10 8/8] apic: disable reentrancy detection for apic-msi

2023-05-18 Thread Michael S. Tsirkin

On Thu, Apr 27, 2023 at 05:10:13PM -0400, Alexander Bulekov wrote:
> As the code is designed for re-entrant calls to apic-msi, mark apic-msi
> as reentrancy-safe.
> 
> Signed-off-by: Alexander Bulekov 
> Reviewed-by: Darren Kenny 

Acked-by: Michael S. Tsirkin 


feel free to merge with rest of patchset - who's going to
merge it btw?

> ---
>  hw/intc/apic.c | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/hw/intc/apic.c b/hw/intc/apic.c
> index 20b5a94073..ac3d47d231 100644
> --- a/hw/intc/apic.c
> +++ b/hw/intc/apic.c
> @@ -885,6 +885,13 @@ static void apic_realize(DeviceState *dev, Error **errp)
>  memory_region_init_io(&s->io_memory, OBJECT(s), &apic_io_ops, s, "apic-msi",
>APIC_SPACE_SIZE);
>  
> +/*
> + * apic-msi's apic_mem_write can call into ioapic_eoi_broadcast, which can
> + * write back to apic-msi. As such mark the apic-msi region re-entrancy
> + * safe.
> + */
> +s->io_memory.disable_reentrancy_guard = true;
> +
>  s->timer = timer_new_ns(QEMU_CLOCK_VIRTUAL, apic_timer, s);
>  local_apics[s->id] = s;
>  
> -- 
> 2.39.0
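The situation described in the patch comment can be modeled in a few lines. This is a simplified sketch, not QEMU's memory API. It assumes the guard is tracked per owning device, that a write is dropped when that device is already inside one of its own MMIO handlers, and that setting `disable_reentrancy_guard` skips the check entirely. `Device`, `Region`, `region_dispatch_write()` and `write_back_once()` are hypothetical; the last one stands in for the apic_mem_write to ioapic_eoi_broadcast to apic-msi write-back path.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified per-device MMIO re-entrancy guard. */
typedef struct Device {
    bool engaged_in_io;  /* set while one of the device's handlers runs */
    int blocked;         /* re-entrant accesses that were dropped */
} Device;

typedef struct Region Region;
typedef void (*MmioHandler)(Region *r, int depth);

struct Region {
    Device *owner;
    bool disable_reentrancy_guard;  /* the opt-out the apic-msi patch uses */
    MmioHandler write;
};

/* Dispatch an MMIO write; returns false if it was blocked as re-entrant. */
static bool region_dispatch_write(Region *r, int depth)
{
    if (!r->disable_reentrancy_guard) {
        if (r->owner->engaged_in_io) {
            r->owner->blocked++;
            return false;
        }
        r->owner->engaged_in_io = true;
    }
    r->write(r, depth);
    if (!r->disable_reentrancy_guard) {
        r->owner->engaged_in_io = false;
    }
    return true;
}

/* Handler that writes back into its own region once, like the
 * EOI broadcast path writing back to apic-msi. */
static void write_back_once(Region *r, int depth)
{
    if (depth == 0) {
        region_dispatch_write(r, depth + 1);
    }
}
```

With the guard enabled, the nested write-back is dropped (`blocked` becomes 1); with `disable_reentrancy_guard` set, it goes through, which is the behaviour the patch asks for on apic-msi.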

Re: Multiple vIOMMU instance support in QEMU?

2023-05-18 Thread Jason Gunthorpe

On Thu, May 18, 2023 at 03:45:24PM -0400, Peter Xu wrote:
> On Thu, May 18, 2023 at 11:56:46AM -0300, Jason Gunthorpe wrote:
> > On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:
> > 
> > > What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> > > not suffice, but that's at least a totally new area to me because I never
> > > used >1 IOMMUs even on bare metal (excluding the case where I'm aware that
> > > e.g. a GPU could have its own IOMMU-like dma translator).
> > 
> > Even x86 systems are multi-iommu, one iommu per physical CPU socket.
> 
> I tried to look at a 2-node system on hand and I indeed got two dmars:
> 
> [4.444788] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
> [4.459673] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
> 
> Though they do not seem to be all parallel on attaching devices.  E.g.,
> most of the devices on this host are attached to dmar1, while there're only
> two devices attached to dmar0:

Yeah, I expect it has to do with physical topology. PCIe devices
physically connected to each socket should use the socket local iommu
and the socket local caches.

i.e., it would be foolish to take an IO in socket A and then forward it to
socket B to perform IOMMU translation then forward it back to socket A
to land in memory.

> > I'm not sure how they model this though - Kevin do you know? Do we get
> > multiple iommu instances in Linux or is all the broadcasting of
> > invalidates and sharing of tables hidden?
> > 
> > > What's the system layout of your multi-vIOMMU world?  Is there still a
> > > centric vIOMMU, or multi-vIOMMUs can run fully in parallel, so that e.g. we
> > > can have DEV1,DEV2 under vIOMMU1 and DEV3,DEV4 under vIOMMU2?
> > 
> > Just like physical, each viommu is parallel and independent. Each has
> > its own caches, ASIDs, DIDs/etc and thus invalidation domains.
> > 
> > The separate caches are the motivating reason to do this, as something
> > like vCMDQ is a direct command channel for invalidations to only the
> > caches of a single IOMMU block.
> 
> From cache invalidation pov, shouldn't the best be per-device granule (like
> dev-iotlb in VT-d? No idea for ARM)?

There are many caches and different cache tag schemes in an iommu. All
of them are local to the IOMMU block.

Consider where we might have a single vDID but the devices using that
DID are spread across two physical IOMMUs. When the VM asks to
invalidate the vDID the system has to generate two physical pDID
invalidations.

This can't be done without a software mediation layer in the VMM.

The better solution is to make the pDID and vDID 1:1 so the VM itself
replicates the invalidations. The VM has better knowledge of when
replication is needed so it is overall more efficient.
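The fan-out described above can be sketched as follows. This is a hypothetical VMM-side model, not VFIO or iommufd code: one guest vDID is backed by a (physical IOMMU, pDID) pair on each IOMMU block that its devices sit behind, and the software mediation layer turns a single guest invalidation into one physical invalidation per backing entry. With the 1:1 vDID/pDID arrangement argued for here, every `VDomain` would have `count == 1`, and the guest would issue the per-vIOMMU invalidations itself.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_BINDINGS 8  /* arbitrary limit for this sketch */

/* Hypothetical: a domain ID on one physical IOMMU block. */
typedef struct {
    int iommu;  /* which physical IOMMU, e.g. a per-socket DMAR unit */
    int pdid;   /* domain ID in that IOMMU's own namespace */
} PhysDomain;

/* Hypothetical VMM record of how one guest vDID is backed. */
typedef struct {
    int vdid;
    size_t count;
    PhysDomain backing[MAX_BINDINGS];
} VDomain;

typedef void (*InvalidateFn)(int iommu, int pdid, void *opaque);

/*
 * Software mediation: one guest invalidation of d->vdid becomes one
 * physical invalidation per backing IOMMU. Returns the number issued.
 */
static size_t mediate_invalidate(const VDomain *d, InvalidateFn fn,
                                 void *opaque)
{
    assert(d->count <= MAX_BINDINGS);
    for (size_t i = 0; i < d->count; i++) {
        fn(d->backing[i].iommu, d->backing[i].pdid, opaque);
    }
    return d->count;
}

/* Example callback: only counts the physical invalidations issued. */
static void count_invalidation(int iommu, int pdid, void *opaque)
{
    (void)iommu;
    (void)pdid;
    (*(int *)opaque)++;
}
```

A vDID whose devices sit behind two physical IOMMUs costs two physical invalidations per guest request in this model, which is the extra mediation work that a 1:1 mapping avoids.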

> I see that Intel is already copied here (at least Yi and Kevin) so I assume
> there're already some kind of synchronizations on multi-vIOMMU vs recent
> works on Intel side, which is definitely nice and can avoid work conflicts.

I actually don't know that. Intel sees multiple DMAR blocks in SW and
they have kernel level replication of invalidation. Intel doesn't
have a HW fast path yet so they can rely on mediation to fix it. Thus
I expect there is no HW replication of invalidations here. Kevin?

Remember that the VFIO API hides all of this: when you change the VFIO
container, it automatically generates all the required invalidations in
the kernel.

I also heard AMD has a HW fast path and is also multi-IOMMU, but I don't
really know the details.

Jason

Re: [PATCH RFC 1/5] hw/cxl: Use define for build bug detection

2023-05-18 Thread Ira Weiny

Jonathan Cameron wrote:
> On Wed, 17 May 2023 19:45:54 -0700
> Ira Weiny  wrote:
> 
> > Magic numbers can be confusing.
> > 
> > Use the range size define for CXL.cachemem rather than a magic number.
> > Update/add spec references.
> > 
> > Signed-off-by: Ira Weiny 
> 
> I guess we should do a scrub to move all refs to 3.0 soon
> given it's horrible having a mixture of spec versions for the references.
> 
> For future specs, we should only do this when sufficient X.Y references
> have started to appear - I think that's true for r3.0 now.

For the kernel side I think Dan is taking the 'if you are updating it, then
update the spec reference' approach, but otherwise leaving it be. Since I'm
touching the code, I updated it.

I agree, it is a pain to have to look at the 2.0 spec but you can do it.

Ira

Re: [PATCH 7/8] python/qemu: allow avocado to set logging name space

2023-05-18 Thread John Snow

On Thu, May 18, 2023 at 12:20 PM Alex Bennée  wrote:
>
> Since the update to the latest version, Avocado only automatically
> collects logging under the avocado name space. Tweak the QEMUMachine
> class to allow avocado to bring logging under its name space. This
> also allows useful tricks like:
>
>   ./avocado --show avocado.qemu.machine run path/to/test
>
> if you want to quickly get the machine invocation out of a test
> without searching deeply through the logs.
>

Huh. That's kind of weird though, right? Each Python module is
intended to log to its own namespace by design; it feels like Avocado
really ought to have configuration options that allows it to collect
logging from other namespaces. I'm not against this patch, but if for
instance I wind up splitting qemu.machine out as a separate module
someday (like I did to qemu.qmp), then it feels weird to add options
specifically for fudging the logging hierarchy.

Also, what about the QMP logging? I don't suppose this will trickle
down to that level either.

Worried this is kind of incomplete.

--js

> Signed-off-by: Alex Bennée 
> ---
>  python/qemu/machine/machine.py | 42 ++
>  tests/avocado/avocado_qemu/__init__.py |  3 +-
>  2 files changed, 24 insertions(+), 21 deletions(-)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index e57c254484..402b9a0df9 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -49,10 +49,6 @@
>
>  from . import console_socket
>
> -
> -LOG = logging.getLogger(__name__)
> -
> -
>  class QEMUMachineError(Exception):
>  """
>  Exception called when an error in QEMUMachine happens.
> @@ -131,6 +127,7 @@ def __init__(self,
>   drain_console: bool = False,
>   console_log: Optional[str] = None,
>   log_dir: Optional[str] = None,
> + log_namespace: Optional[str] = None,
>   qmp_timer: Optional[float] = 30):
>  '''
>  Initialize a QEMUMachine
> @@ -164,6 +161,11 @@ def __init__(self,
>  self._sock_dir = sock_dir
>  self._log_dir = log_dir
>
> +if log_namespace:
> +self.log = logging.getLogger(log_namespace)
> +else:
> +self.log = logging.getLogger(__name__)
> +
>  self._monitor_address = monitor_address
>
>  self._console_log_path = console_log
> @@ -382,11 +384,11 @@ def _post_shutdown(self) -> None:
>  Called to cleanup the VM instance after the process has exited.
>  May also be called after a failed launch.
>  """
> -LOG.debug("Cleaning up after VM process")
> +self.log.debug("Cleaning up after VM process")
>  try:
>  self._close_qmp_connection()
>  except Exception as err:  # pylint: disable=broad-except
> -LOG.warning(
> +self.log.warning(
>  "Exception closing QMP connection: %s",
>  str(err) if str(err) else type(err).__name__
>  )
> @@ -414,7 +416,7 @@ def _post_shutdown(self) -> None:
>  command = ' '.join(self._qemu_full_args)
>  else:
>  command = ''
> -LOG.warning(msg, -int(exitcode), command)
> +self.log.warning(msg, -int(exitcode), command)
>
>  self._quit_issued = False
>  self._user_killed = False
> @@ -458,7 +460,7 @@ def _launch(self) -> None:
>  Launch the VM and establish a QMP connection
>  """
>  self._pre_launch()
> -LOG.debug('VM launch command: %r', ' '.join(self._qemu_full_args))
> +self.log.debug('VM launch command: %r', ' '.join(self._qemu_full_args))
>
>  # Cleaning up of this subprocess is guaranteed by _do_shutdown.
>  # pylint: disable=consider-using-with
> @@ -507,7 +509,7 @@ def _early_cleanup(self) -> None:
>  # for QEMU to exit, while QEMU is waiting for the socket to
>  # become writable.
>  if self._console_socket is not None:
> -LOG.debug("Closing console socket")
> +self.log.debug("Closing console socket")
>  self._console_socket.close()
>  self._console_socket = None
>
> @@ -518,7 +520,7 @@ def _hard_shutdown(self) -> None:
>  :raise subprocess.Timeout: When timeout is exceeds 60 seconds
>  waiting for the QEMU process to terminate.
>  """
> -LOG.debug("Performing hard shutdown")
> +self.log.debug("Performing hard shutdown")
>  self._early_cleanup()
>  self._subp.kill()
>  self._subp.wait(timeout=60)
> @@ -535,17 +537,17 @@ def _soft_shutdown(self, timeout: Optional[int]) -> None:
>  :raise subprocess.TimeoutExpired: When timeout is exceeded waiting for
>  the QEMU process to terminate.
>  """
> -LOG.debug("Attempting graceful termination")
> +

[PULL v2 22/44] Hexagon (target/hexagon) Short-circuit packet HVX writes

2023-05-18 Thread Taylor Simpson

In certain cases, we can avoid the overhead of writing to future_VRegs
and write directly to VRegs.  We consider HVX reads/writes when computing
ctx->need_commit.  Then, we can early-exit from gen_commit_hvx.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-14-tsimp...@quicinc.com>
---
 target/hexagon/genptr.c|  6 -
 target/hexagon/translate.c | 46 +-
 2 files changed, 50 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 5025e172cf..82a3408eb4 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -1104,7 +1104,11 @@ static void gen_log_vreg_write_pair(DisasContext *ctx, intptr_t srcoff, int num,
 
 static intptr_t get_result_qreg(DisasContext *ctx, int qnum)
 {
-return  offsetof(CPUHexagonState, future_QRegs[qnum]);
+if (ctx->need_commit) {
+return  offsetof(CPUHexagonState, future_QRegs[qnum]);
+} else {
+return  offsetof(CPUHexagonState, QRegs[qnum]);
+}
 }
 
 static void gen_vreg_load(DisasContext *ctx, intptr_t dstoff, TCGv src,
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index bcf64f725a..8e7a4377c8 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -70,6 +70,10 @@ intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
 {
 intptr_t offset;
 
+if (!ctx->need_commit) {
+return offsetof(CPUHexagonState, VRegs[regnum]);
+}
+
 /* See if it is already allocated */
 for (int i = 0; i < ctx->future_vregs_idx; i++) {
 if (ctx->future_vregs_num[i] == regnum) {
@@ -374,7 +378,7 @@ static bool need_commit(DisasContext *ctx)
 return true;
 }
 
-if (pkt->num_insns == 1) {
+if (pkt->num_insns == 1 && !pkt->pkt_has_hvx) {
 return false;
 }
 
@@ -394,6 +398,40 @@ static bool need_commit(DisasContext *ctx)
 }
 }
 
+/* Check for overlap between HVX reads and writes */
+for (int i = 0; i < ctx->vreg_log_idx; i++) {
+int vnum = ctx->vreg_log[i];
+if (test_bit(vnum, ctx->vregs_read)) {
+return true;
+}
+}
+if (!bitmap_empty(ctx->vregs_updated_tmp, NUM_VREGS)) {
+int i = find_first_bit(ctx->vregs_updated_tmp, NUM_VREGS);
+while (i < NUM_VREGS) {
+if (test_bit(i, ctx->vregs_read)) {
+return true;
+}
+i = find_next_bit(ctx->vregs_updated_tmp, NUM_VREGS, i + 1);
+}
+}
+if (!bitmap_empty(ctx->vregs_select, NUM_VREGS)) {
+int i = find_first_bit(ctx->vregs_select, NUM_VREGS);
+while (i < NUM_VREGS) {
+if (test_bit(i, ctx->vregs_read)) {
+return true;
+}
+i = find_next_bit(ctx->vregs_select, NUM_VREGS, i + 1);
+}
+}
+
+/* Check for overlap between HVX predicate reads and writes */
+for (int i = 0; i < ctx->qreg_log_idx; i++) {
+int qnum = ctx->qreg_log[i];
+if (test_bit(qnum, ctx->qregs_read)) {
+return true;
+}
+}
+
 return false;
 }
 
@@ -790,6 +828,12 @@ static void gen_commit_hvx(DisasContext *ctx)
 {
 int i;
 
+/* Early exit if not needed */
+if (!ctx->need_commit) {
+g_assert(!pkt_has_hvx_store(ctx->pkt));
+return;
+}
+
 /*
  *for (i = 0; i < ctx->vreg_log_idx; i++) {
  *int rnum = ctx->vreg_log[i];
-- 
2.25.1
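The HVX part of `need_commit()` in this patch can be summarized with per-packet register bitmaps. Instructions in a Hexagon packet read the register values from before the packet, so HVX results can be written straight to `VRegs`/`QRegs` only when no register the packet writes is also read by it; otherwise they must be staged in `future_VRegs`/`future_QRegs` and committed. The sketch below assumes 32 V and 4 Q registers, folds the logged, tmp-updated and select writes into one mask, and ignores `.new` operands and the non-HVX conditions that `need_commit()` also checks; `HvxPacketUsage` and `hvx_need_commit()` are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-packet summary: one bit per register (V0..V31, Q0..Q3). */
typedef struct {
    uint32_t vregs_read;
    uint32_t vregs_written;  /* union of logged, tmp-updated and select writes */
    uint8_t qregs_read;
    uint8_t qregs_written;
} HvxPacketUsage;

/*
 * Writes may bypass the future_* staging area only if no register the
 * packet writes is also read by it; any overlap requires a commit step.
 */
static bool hvx_need_commit(const HvxPacketUsage *u)
{
    return (u->vregs_read & u->vregs_written) != 0 ||
           (u->qregs_read & u->qregs_written) != 0;
}
```

For example, a packet that reads V1 and V2 and writes V0 can write V0 directly, while a packet that both reads and writes V3 still needs the staged commit.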

[PULL v2 31/44] Hexagon (target/hexagon) Additional instructions handled by idef-parser

2023-05-18 Thread Taylor Simpson

 Changes in v3 
Fix bugs exposed by dpmpyss_rnd_s0 instruction
Set correct size/signedness for constants
Test cases added to tests/tcg/hexagon/misc.c

 Changes in v2 
Fix bug in imm_print identified in clang build

Currently, idef-parser skips all floating point instructions.  However,
there are some floating point instructions that can be handled.

The following instructions are now parsed:
F2_sfimm_p
F2_sfimm_n
F2_dfimm_p
F2_dfimm_n
F2_dfmpyll
F2_dfmpylh

To make these instructions work, we fix some bugs in parser-helpers.c:
gen_rvalue_extend
gen_cast_op
imm_print
lexer properly sets size/signedness of constants
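One of these bugs is visible in the old `imm_print()` code in the diff: it always emitted an `int64_t` constant and passed an `int64_t` argument to a `PRIu64` conversion. The sketch below shows the general idea of choosing the cast from the operand's width and signedness and printing the bits in hex. `emit_typed_imm()` is a hypothetical helper; its 32-bit branch follows the new code, while the 64-bit branch is an assumption, since that part of the hunk is not shown here.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/*
 * Render an immediate as a C expression whose type matches the operand's
 * bit width and signedness (hypothetical helper). The bits are printed in
 * hex; a uint32_t/uint64_t argument always matches PRIx32/PRIx64.
 */
static int emit_typed_imm(char *buf, size_t len, uint64_t value,
                          int bit_width, bool is_unsigned)
{
    if (bit_width == 32) {
        return snprintf(buf, len, "((%s) 0x%" PRIx32 ")",
                        is_unsigned ? "uint32_t" : "int32_t",
                        (uint32_t)value);
    }
    /* Assumption: 64-bit immediates get an explicit ULL suffix. */
    return snprintf(buf, len, "((%s) 0x%" PRIx64 "ULL)",
                    is_unsigned ? "uint64_t" : "int64_t", value);
}
```

For example, a signed 32-bit immediate with all bits set is emitted as `((int32_t) 0xffffffff)` rather than as a 64-bit constant.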

Test cases added to tests/tcg/hexagon/fpstuff.c

Signed-off-by: Taylor Simpson 
Tested-by: Anton Johansson 
Reviewed-by: Anton Johansson 
Message-Id: <20230501203125.4025991-1-tsimp...@quicinc.com>
---
 target/hexagon/idef-parser/parser-helpers.h |  2 +-
 target/hexagon/idef-parser/parser-helpers.c | 61 ++---
 tests/tcg/hexagon/fpstuff.c | 54 ++
 tests/tcg/hexagon/misc.c| 35 
 target/hexagon/gen_idef_parser_funcs.py | 10 +++-
 target/hexagon/idef-parser/idef-parser.lex  | 37 +++--
 target/hexagon/idef-parser/idef-parser.y|  2 -
 7 files changed, 160 insertions(+), 41 deletions(-)

diff --git a/target/hexagon/idef-parser/parser-helpers.h b/target/hexagon/idef-parser/parser-helpers.h
index 1239d23a6a..7c58087169 100644
--- a/target/hexagon/idef-parser/parser-helpers.h
+++ b/target/hexagon/idef-parser/parser-helpers.h
@@ -80,7 +80,7 @@ void reg_compose(Context *c, YYLTYPE *locp, HexReg *reg, char reg_id[5]);
 
 void reg_print(Context *c, YYLTYPE *locp, HexReg *reg);
 
-void imm_print(Context *c, YYLTYPE *locp, HexImm *imm);
+void imm_print(Context *c, YYLTYPE *locp, HexValue *rvalue);
 
 void var_print(Context *c, YYLTYPE *locp, HexVar *var);
 
diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
index 6626e006f6..9550097269 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -167,8 +167,9 @@ void reg_print(Context *c, YYLTYPE *locp, HexReg *reg)
 EMIT(c, "hex_gpr[%u]", reg->id);
 }
 
-void imm_print(Context *c, YYLTYPE *locp, HexImm *imm)
+void imm_print(Context *c, YYLTYPE *locp, HexValue *rvalue)
 {
+HexImm *imm = &rvalue->imm;
 switch (imm->type) {
 case I:
 EMIT(c, "i");
@@ -177,7 +178,21 @@ void imm_print(Context *c, YYLTYPE *locp, HexImm *imm)
 EMIT(c, "%ciV", imm->id);
 break;
 case VALUE:
-EMIT(c, "((int64_t) %" PRIu64 "ULL)", (int64_t) imm->value);
+if (rvalue->bit_width == 32) {
+if (rvalue->signedness == UNSIGNED) {
+EMIT(c, "((uint32_t) 0x%" PRIx32 ")", (uint32_t) imm->value);
+}  else {
+EMIT(c, "((int32_t) 0x%" PRIx32 ")", (int32_t) imm->value);
+}
+} else if (rvalue->bit_width == 64) {
+if (rvalue->signedness == UNSIGNED) {
+EMIT(c, "((uint64_t) 0x%" PRIx64 "ULL)", (uint64_t) imm->value);
+} else {
+EMIT(c, "((int64_t) 0x%" PRIx64 "LL)", (int64_t) imm->value);
+}
+} else {
+g_assert_not_reached();
+}
 break;
 case QEMU_TMP:
 EMIT(c, "qemu_tmp_%" PRIu64, imm->index);
@@ -213,7 +228,7 @@ void rvalue_print(Context *c, YYLTYPE *locp, void *pointer)
   tmp_print(c, locp, &rvalue->tmp);
   break;
   case IMMEDIATE:
-  imm_print(c, locp, &rvalue->imm);
+  imm_print(c, locp, rvalue);
   break;
   case VARID:
   var_print(c, locp, &rvalue->var);
@@ -386,13 +401,10 @@ HexValue gen_rvalue_extend(Context *c, YYLTYPE *locp, HexValue *rvalue)
 
 if (rvalue->type == IMMEDIATE) {
 HexValue res = gen_imm_qemu_tmp(c, locp, 64, rvalue->signedness);
-bool is_unsigned = (rvalue->signedness == UNSIGNED);
-const char *sign_suffix = is_unsigned ? "u" : "";
 gen_c_int_type(c, locp, 64, rvalue->signedness);
-OUT(c, locp, " ", &res, " = ");
-OUT(c, locp, "(", sign_suffix, "int64_t) ");
-OUT(c, locp, "(", sign_suffix, "int32_t) ");
-OUT(c, locp, rvalue, ";\n");
+OUT(c, locp, " ", &res, " = (");
+gen_c_int_type(c, locp, 64, rvalue->signedness);
+OUT(c, locp, ")", rvalue, ";\n");
 return res;
 } else {
 HexValue res = gen_tmp(c, locp, 64, rvalue->signedness);
@@ -959,33 +971,18 @@ HexValue gen_cast_op(Context *c,
  unsigned target_width,
  HexSignedness signedness)
 {
+HexValue res;
 assert_signedness(c, locp, src->signedness);
 if (src->bit_width == target_width) {
-return *src;
-} else if (src->type == IMMEDIATE) {
-HexValue res = *src;
-res.bit_width = target_width;
-res.signedness = signedness;
-return

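The patched imm_print() chooses the cast for a VALUE immediate from the rvalue's bit_width and signedness. Previously it always emitted an int64_t cast. The sketch below shows that formatting rule as a standalone function. It uses snprintf() in place of the parser's EMIT() macro, and format_imm() and its parameters are hypothetical.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the VALUE case of the patched imm_print():
 * 32-bit values get an int32_t/uint32_t cast, 64-bit values get an
 * int64_t/uint64_t cast plus an LL/ULL suffix.
 */
static void format_imm(char *buf, size_t len, uint64_t value,
                       unsigned bit_width, bool is_unsigned)
{
    assert(bit_width == 32 || bit_width == 64);
    if (bit_width == 32) {
        snprintf(buf, len, "((%s) 0x%" PRIx32 ")",
                 is_unsigned ? "uint32_t" : "int32_t", (uint32_t)value);
    } else {
        snprintf(buf, len, "((%s) 0x%" PRIx64 "%s)",
                 is_unsigned ? "uint64_t" : "int64_t", value,
                 is_unsigned ? "ULL" : "LL");
    }
}
```

For example, a signed 32-bit all-ones value is emitted as ((int32_t) 0xffffffff), and an unsigned 64-bit 1 as ((uint64_t) 0x1ULL). Keeping the width and signedness in the emitted literal lets the generated C expression use the type the parser assigned to the constant.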
[PULL v2 33/44] Hexagon (target/hexagon/*.py): raise exception on reg parsing error

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

Currently, the Python scripts used for the Hexagon build do not
abort the compilation when there is an error parsing a register. Let's
make the compilation properly fail in such cases by raising an exception
instead of just printing a warning message, which might get lost in the
output.

This patch was generated with:

 git grep -l "Bad register" *hexagon* | \
 xargs sed -i "" -e 's/print("Bad register parse: "[, ]*\([^)]*\))/hex_common.bad_register(\1)/g'

Plus the bad_register() helper added to hex_common.py.

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Anton Johansson 
Tested-by: Taylor Simpson 
Reviewed-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Message-Id: 
<1f5dbd92f68fdd89e2647e4ba527a2c32cf0f070.1683217043.git.quic_mathb...@quicinc.com>
---
 target/hexagon/gen_analyze_funcs.py | 30 +-
 target/hexagon/gen_helper_funcs.py  | 14 ++---
 target/hexagon/gen_helper_protos.py |  2 +-
 target/hexagon/gen_idef_parser_funcs.py |  2 +-
 target/hexagon/gen_tcg_funcs.py | 78 -
 target/hexagon/hex_common.py|  3 +
 6 files changed, 66 insertions(+), 63 deletions(-)

diff --git a/target/hexagon/gen_analyze_funcs.py b/target/hexagon/gen_analyze_funcs.py
index d040f67001..00868cc6cb 100755
--- a/target/hexagon/gen_analyze_funcs.py
+++ b/target/hexagon/gen_analyze_funcs.py
@@ -47,7 +47,7 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
 f.write(f"ctx_log_reg_write(ctx, {regN}, {predicated});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "P":
 if regid in {"s", "t", "u", "v"}:
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
@@ -56,7 +56,7 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
 f.write(f"ctx_log_pred_write(ctx, {regN});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "C":
 if regid == "ss":
 f.write(
@@ -77,13 +77,13 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 f.write(f"const int {regN} = insn->regno[{regno}] " "+ HEX_REG_SA0;\n")
 f.write(f"ctx_log_reg_write(ctx, {regN}, {predicated});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "M":
 if regid == "u":
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
 f.write(f"ctx_log_reg_read(ctx, {regN});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "V":
 newv = "EXT_DFL"
 if hex_common.is_new_result(tag):
@@ -105,7 +105,7 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
 f.write(f"ctx_log_vreg_write(ctx, {regN}, {newv}, " f"{predicated});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "Q":
 if regid in {"d", "e", "x"}:
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
@@ -114,7 +114,7 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 f.write(f"const int {regN} = insn->regno[{regno}];\n")
 f.write(f"ctx_log_qreg_read(ctx, {regN});\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "G":
 if regid in {"dd"}:
 f.write(f"//const int {regN} = insn->regno[{regno}];\n")
@@ -125,7 +125,7 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 elif regid in {"s"}:
 f.write(f"//const int {regN} = insn->regno[{regno}];\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 elif regtype == "S":
 if regid in {"dd"}:
 f.write(f"//const int {regN} = insn->regno[{regno}];\n")
@@ -136,9 +136,9 @@ def analyze_opn_old(f, tag, regtype, regid, regno):
 elif regid in {"s"}:
 f.write(f"//const int {regN} = insn->regno[{regno}];\n")
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 else:
-print("Bad register parse: ", regtype, regid)
+hex_common.bad_register(regtype, regid)
 
 
 def analyze_opn_new(f, tag, regtype, regid, regno):
@@ -148,21 +148,21 @@ def analyze_opn_new(f, tag,

[PULL v2 19/44] Hexagon (target/hexagon) Mark registers as read during packet analysis

2023-05-18 Thread Taylor Simpson

Have gen_analyze_funcs mark the registers that are read by the
instruction.  We also mark the implicit reads using instruction
attributes.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-11-tsimp...@quicinc.com>
---
 target/hexagon/translate.h  | 36 +++
 target/hexagon/attribs_def.h.inc|  6 +++-
 target/hexagon/translate.c  | 20 +
 target/hexagon/gen_analyze_funcs.py | 44 -
 target/hexagon/hex_common.py|  6 
 5 files changed, 97 insertions(+), 15 deletions(-)

diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 4b9f21c41d..f72228859f 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -38,10 +38,12 @@ typedef struct DisasContext {
 int reg_log[REG_WRITES_MAX];
 int reg_log_idx;
 DECLARE_BITMAP(regs_written, TOTAL_PER_THREAD_REGS);
+DECLARE_BITMAP(regs_read, TOTAL_PER_THREAD_REGS);
 DECLARE_BITMAP(predicated_regs, TOTAL_PER_THREAD_REGS);
 int preg_log[PRED_WRITES_MAX];
 int preg_log_idx;
 DECLARE_BITMAP(pregs_written, NUM_PREGS);
+DECLARE_BITMAP(pregs_read, NUM_PREGS);
 uint8_t store_width[STORES_MAX];
 bool s1_store_processed;
 int future_vregs_idx;
@@ -55,8 +57,10 @@ typedef struct DisasContext {
 DECLARE_BITMAP(vregs_select, NUM_VREGS);
 DECLARE_BITMAP(predicated_future_vregs, NUM_VREGS);
 DECLARE_BITMAP(predicated_tmp_vregs, NUM_VREGS);
+DECLARE_BITMAP(vregs_read, NUM_VREGS);
 int qreg_log[NUM_QREGS];
 int qreg_log_idx;
+DECLARE_BITMAP(qregs_read, NUM_QREGS);
 bool pre_commit;
 TCGCond branch_cond;
 target_ulong branch_dest;
@@ -73,6 +77,11 @@ static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
 }
 }
 
+static inline void ctx_log_pred_read(DisasContext *ctx, int pnum)
+{
+set_bit(pnum, ctx->pregs_read);
+}
+
 static inline void ctx_log_reg_write(DisasContext *ctx, int rnum,
  bool is_predicated)
 {
@@ -99,6 +108,17 @@ static inline void ctx_log_reg_write_pair(DisasContext *ctx, int rnum,
 ctx_log_reg_write(ctx, rnum + 1, is_predicated);
 }
 
+static inline void ctx_log_reg_read(DisasContext *ctx, int rnum)
+{
+set_bit(rnum, ctx->regs_read);
+}
+
+static inline void ctx_log_reg_read_pair(DisasContext *ctx, int rnum)
+{
+ctx_log_reg_read(ctx, rnum);
+ctx_log_reg_read(ctx, rnum + 1);
+}
+
 intptr_t ctx_future_vreg_off(DisasContext *ctx, int regnum,
  int num, bool alloc_ok);
 intptr_t ctx_tmp_vreg_off(DisasContext *ctx, int regnum,
@@ -139,6 +159,17 @@ static inline void ctx_log_vreg_write_pair(DisasContext *ctx,
 ctx_log_vreg_write(ctx, rnum ^ 1, type, is_predicated);
 }
 
+static inline void ctx_log_vreg_read(DisasContext *ctx, int rnum)
+{
+set_bit(rnum, ctx->vregs_read);
+}
+
+static inline void ctx_log_vreg_read_pair(DisasContext *ctx, int rnum)
+{
+ctx_log_vreg_read(ctx, rnum ^ 0);
+ctx_log_vreg_read(ctx, rnum ^ 1);
+}
+
 static inline void ctx_log_qreg_write(DisasContext *ctx,
   int rnum)
 {
@@ -146,6 +177,11 @@ static inline void ctx_log_qreg_write(DisasContext *ctx,
 ctx->qreg_log_idx++;
 }
 
+static inline void ctx_log_qreg_read(DisasContext *ctx, int qnum)
+{
+set_bit(qnum, ctx->qregs_read);
+}
+
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_this_PC;
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 69da9776f0..21d457fa4a 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -112,6 +112,10 @@ DEF_ATTRIB(IMPLICIT_WRITES_P1, "Writes Predicate 1", "", "UREG.P1")
 DEF_ATTRIB(IMPLICIT_WRITES_P2, "Writes Predicate 1", "", "UREG.P2")
 DEF_ATTRIB(IMPLICIT_WRITES_P3, "May write Predicate 3", "", "UREG.P3")
 DEF_ATTRIB(IMPLICIT_READS_PC, "Reads the PC register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P0, "Reads the P0 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P1, "Reads the P1 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P2, "Reads the P2 register", "", "")
+DEF_ATTRIB(IMPLICIT_READS_P3, "Reads the P3 register", "", "")
 DEF_ATTRIB(IMPLICIT_WRITES_USR, "May write USR", "", "")
 DEF_ATTRIB(WRITES_PRED_REG, "Writes a predicate register", "", "")
 DEF_ATTRIB(COMMUTES, "The operation is communitive", "", "")
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 0b021b301a..e84bd34618 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -336,6 +336,21 @@ static void

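The new read-logging helpers treat register pairs differently for scalar and vector registers. ctx_log_reg_read_pair() marks rnum and rnum + 1. ctx_log_vreg_read_pair() marks rnum ^ 0 and rnum ^ 1, so an odd vector register number still maps to the even/odd pair that contains it. The sketch below illustrates the difference. It assumes a single uint64_t mask in place of the DECLARE_BITMAP read sets, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for set_bit() on a read bitmap. */
static void log_read(uint64_t *set, int rnum)
{
    assert(rnum >= 0 && rnum < 64);
    *set |= UINT64_C(1) << rnum;
}

/* Scalar pair, as in ctx_log_reg_read_pair(): rnum and rnum + 1. */
static void log_reg_read_pair(uint64_t *set, int rnum)
{
    log_read(set, rnum);
    log_read(set, rnum + 1);
}

/* Vector pair, as in ctx_log_vreg_read_pair(): rnum ^ 0 and rnum ^ 1. */
static void log_vreg_read_pair(uint64_t *set, int rnum)
{
    log_read(set, rnum ^ 0);
    log_read(set, rnum ^ 1);
}
```

For example, logging scalar pair 5 marks registers 5 and 6, whereas logging vector pair 5 marks registers 4 and 5.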
[PULL v2 17/44] Hexagon (target/hexagon) Clean up pred_written usage

2023-05-18 Thread Taylor Simpson

Only endloop instructions will conditionally write to a predicate.
When there is an endloop instruction, we preload the values into
new_pred_value.

The only place pred_written is needed is when HEX_DEBUG is on.

We remove the last use of check_for_attrib.  However, new uses will be
introduced later in this series, so we mark it with G_GNUC_UNUSED.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-9-tsimp...@quicinc.com>
---
 target/hexagon/genptr.c| 16 +---
 target/hexagon/translate.c | 53 --
 2 files changed, 23 insertions(+), 46 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index dac62b90a6..9bbaca6300 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -137,7 +137,9 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 tcg_gen_and_tl(hex_new_pred_value[pnum],
hex_new_pred_value[pnum], base_val);
 }
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+if (HEX_DEBUG) {
+tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+}
 set_bit(pnum, ctx->pregs_written);
 }
 
@@ -826,15 +828,13 @@ static void gen_endloop0(DisasContext *ctx)
 
 /*
  *if (lpcfg == 1) {
- *hex_new_pred_value[3] = 0xff;
- *hex_pred_written |= 1 << 3;
+ *p3 = 0xff;
  *}
  */
 TCGLabel *label1 = gen_new_label();
 tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
 {
-tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0xff));
 }
 gen_set_label(label1);
 
@@ -903,14 +903,12 @@ static void gen_endloop01(DisasContext *ctx)
 
 /*
  *if (lpcfg == 1) {
- *hex_new_pred_value[3] = 0xff;
- *hex_pred_written |= 1 << 3;
+ *p3 = 0xff;
  *}
  */
 tcg_gen_brcondi_tl(TCG_COND_NE, lpcfg, 1, label1);
 {
-tcg_gen_movi_tl(hex_new_pred_value[3], 0xff);
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << 3);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0xff));
 }
 gen_set_label(label1);
 
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 01f448a325..0b021b301a 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -239,7 +239,7 @@ static int read_packet_words(CPUHexagonState *env, DisasContext *ctx,
 return nwords;
 }
 
-static bool check_for_attrib(Packet *pkt, int attrib)
+static G_GNUC_UNUSED bool check_for_attrib(Packet *pkt, int attrib)
 {
 for (int i = 0; i < pkt->num_insns; i++) {
 if (GET_ATTRIB(pkt->insn[i].opcode, attrib)) {
@@ -262,11 +262,6 @@ static bool need_slot_cancelled(Packet *pkt)
 return false;
 }
 
-static bool need_pred_written(Packet *pkt)
-{
-return check_for_attrib(pkt, A_WRITES_PRED_REG);
-}
-
 static bool need_next_PC(DisasContext *ctx)
 {
 Packet *pkt = ctx->pkt;
@@ -414,7 +409,7 @@ static void gen_start_packet(DisasContext *ctx)
 tcg_gen_movi_tl(hex_gpr[HEX_REG_PC], next_PC);
 }
 }
-if (need_pred_written(pkt)) {
+if (HEX_DEBUG) {
 tcg_gen_movi_tl(hex_pred_written, 0);
 }
 
@@ -428,6 +423,17 @@ static void gen_start_packet(DisasContext *ctx)
 }
 }
 
+/*
+ * Preload the predicated pred registers into hex_new_pred_value[pred_num]
+ * Only endloop instructions conditionally write to pred registers
+ */
+if (pkt->pkt_has_endloop) {
+for (int i = 0; i < ctx->preg_log_idx; i++) {
+int pred_num = ctx->preg_log[i];
+tcg_gen_mov_tl(hex_new_pred_value[pred_num], hex_pred[pred_num]);
+}
+}
+
 /* Preload the predicated HVX registers into future_VRegs and tmp_VRegs */
 if (!bitmap_empty(ctx->predicated_future_vregs, NUM_VREGS)) {
 int i = find_first_bit(ctx->predicated_future_vregs, NUM_VREGS);
@@ -535,41 +541,14 @@ static void gen_reg_writes(DisasContext *ctx)
 
 static void gen_pred_writes(DisasContext *ctx)
 {
-int i;
-
 /* Early exit if the log is empty */
 if (!ctx->preg_log_idx) {
 return;
 }
 
-/*
- * Only endloop instructions will conditionally
- * write a predicate.  If there are no endloop
- * instructions, we can use the non-conditional
- * write of the predicates.
- */
-if (ctx->pkt->pkt_has_endloop) {
-TCGv zero = tcg_constant_tl(0);
-TCGv pred_written = tcg_temp_new();
-for (i = 0; i < ctx->preg_log_idx; i++) {
-int pred_num = ctx->preg_log[i];
-
-tcg_gen_andi_tl(pred_written, hex_pred_written, 1 << pred_num);
-tcg_gen_movcond_tl(TCG_COND_NE, hex_pred[pred_num],
-   pred_written, zero,
-

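With this patch, a packet that contains an endloop preloads the predicates it may write into hex_new_pred_value[] at packet start. If a conditional write does not fire, the old value stays in the staging slot. The commit can then copy the slot back unconditionally instead of consulting hex_pred_written.

The sketch below illustrates that idea in plain C. It assumes ordinary arrays in place of the TCG globals and models only the single "if (lpcfg == 1) p3 = 0xff" write shown in gen_endloop0. PredState and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4

/* Hypothetical plain-C stand-ins for hex_pred[] and hex_new_pred_value[]. */
typedef struct {
    uint32_t pred[NUM_PREGS];
    uint32_t new_pred_value[NUM_PREGS];
} PredState;

/* Packet start: seed the staging slot with the current value. */
static void preload(PredState *s, int pnum)
{
    s->new_pred_value[pnum] = s->pred[pnum];
}

/* Conditional write modelled on the endloop0 case: if (lpcfg == 1) p3 = 0xff. */
static void endloop_write_p3(PredState *s, int lpcfg)
{
    if (lpcfg == 1) {
        s->new_pred_value[3] = 0xff;
    }
}

/* Packet end: unconditional copy back, no pred_written mask needed. */
static void commit(PredState *s, int pnum)
{
    s->pred[pnum] = s->new_pred_value[pnum];
}
```

If the condition is false, the committed value is simply the preloaded old value.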
[PULL v2 10/44] meson.build Add CONFIG_HEXAGON_IDEF_PARSER

2023-05-18 Thread Taylor Simpson

Enable conditional compilation depending on whether idef-parser
is configured.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-2-tsimp...@quicinc.com>
---
 meson.build | 1 +
 1 file changed, 1 insertion(+)

diff --git a/meson.build b/meson.build
index d3cf48960b..b36124fdc4 100644
--- a/meson.build
+++ b/meson.build
@@ -1866,6 +1866,7 @@ endif
 config_host_data.set('CONFIG_GTK', gtk.found())
 config_host_data.set('CONFIG_VTE', vte.found())
 config_host_data.set('CONFIG_GTK_CLIPBOARD', have_gtk_clipboard)
+config_host_data.set('CONFIG_HEXAGON_IDEF_PARSER', get_option('hexagon_idef_parser'))
 config_host_data.set('CONFIG_LIBATTR', have_old_libattr)
 config_host_data.set('CONFIG_LIBCAP_NG', libcap_ng.found())
 config_host_data.set('CONFIG_EBPF', libbpf.found())
-- 
2.25.1
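Once config_host_data carries CONFIG_HEXAGON_IDEF_PARSER, C code can choose behaviour at compile time. The sketch below assumes the generated config header defines the macro when the hexagon_idef_parser option is enabled and leaves it undefined otherwise. idef_parser_enabled() is a hypothetical helper.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: reports whether this build was configured with
 * idef-parser, assuming the macro is defined only when it is enabled.
 */
static bool idef_parser_enabled(void)
{
#ifdef CONFIG_HEXAGON_IDEF_PARSER
    return true;
#else
    return false;
#endif
}
```

Compiling the same file with -DCONFIG_HEXAGON_IDEF_PARSER selects the other branch, which is one way to exercise both configurations.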

[PULL v2 35/44] Hexagon: append eflags to unknown cpu model string

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

Running qemu-hexagon with a binary that was compiled for an arch version
unknown by qemu can produce a somewhat confusing message:

  qemu-hexagon: unable to find CPU model 'unknown'

Let's give a bit more info by appending the eflags so that the message
becomes:

  qemu-hexagon: unable to find CPU model 'unknown (0x69)'

Signed-off-by: Matheus Tavares Bernardino 
Signed-off-by: Taylor Simpson 
Tested-by: Taylor Simpson 
Reviewed-by: Taylor Simpson 
Message-Id: 
<8a8d013cc619b94fd4fb577ae6a8df26cedb972b.1683225804.git.quic_mathb...@quicinc.com>
---
 linux-user/hexagon/target_elf.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/linux-user/hexagon/target_elf.h b/linux-user/hexagon/target_elf.h
index a0271a0a2a..36056fc9f0 100644
--- a/linux-user/hexagon/target_elf.h
+++ b/linux-user/hexagon/target_elf.h
@@ -20,6 +20,9 @@
 
 static inline const char *cpu_get_model(uint32_t eflags)
 {
+static char buf[32];
+int err;
+
 /* For now, treat anything newer than v5 as a v73 */
 /* FIXME - Disable instructions that are newer than the specified arch */
 if (eflags == 0x04 ||/* v5  */
@@ -39,7 +42,9 @@ static inline const char *cpu_get_model(uint32_t eflags)
) {
 return "v73";
 }
-return "unknown";
+
+err = snprintf(buf, sizeof(buf), "unknown (0x%x)", eflags);
+return err >= 0 && err < sizeof(buf) ? buf : "unknown";
 }
 
 #endif
-- 
2.25.1
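The new fallback formats the unrecognized eflags value into a static buffer. It returns the bare "unknown" string only if snprintf() fails or truncates.

The sketch below is a simplified version of that logic. The real function maps a longer list of eflags values (partly elided in the hunk above) to "v73". Here 0x73 is a hypothetical stand-in for that list, and cpu_get_model_sketch() is a hypothetical name.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdio.h>
#include <string.h>

static const char *cpu_get_model_sketch(uint32_t eflags)
{
    static char buf[32];
    int err;

    if (eflags == 0x73) {       /* hypothetical stand-in for the known list */
        return "v73";
    }
    err = snprintf(buf, sizeof(buf), "unknown (0x%" PRIx32 ")", eflags);
    /* Fall back to the bare string if formatting failed or was cut off. */
    return err >= 0 && (size_t)err < sizeof(buf) ? buf : "unknown";
}
```

Because the buffer is static, each call overwrites the previous result and the function is not reentrant. That seems acceptable for a one-off error message, but a caller should not keep the returned pointer across calls.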

[PULL v2 14/44] Hexagon (target/hexagon) Add overrides for clr[tf]new

2023-05-18 Thread Taylor Simpson

These instructions have implicit reads from p0, so we don't want
them in helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-6-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h | 16 
 target/hexagon/macros.h  |  4 
 2 files changed, 16 insertions(+), 4 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index ef17f2f18c..a1d7eabae7 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1101,6 +1101,22 @@
 gen_jump(ctx, riV); \
 } while (0)
 
+/* if (p0.new) r0 = #0 */
+#define fGEN_TCG_SA1_clrtnew(SHORTCODE) \
+do { \
+tcg_gen_movcond_tl(TCG_COND_EQ, RdV, \
+   hex_new_pred_value[0], tcg_constant_tl(0), \
+   RdV, tcg_constant_tl(0)); \
+} while (0)
+
+/* if (!p0.new) r0 = #0 */
+#define fGEN_TCG_SA1_clrfnew(SHORTCODE) \
+do { \
+tcg_gen_movcond_tl(TCG_COND_NE, RdV, \
+   hex_new_pred_value[0], tcg_constant_tl(0), \
+   RdV, tcg_constant_tl(0)); \
+} while (0)
+
 #define fGEN_TCG_J2_pause(SHORTCODE) \
 do { \
 uiV = uiV; \
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 760630de8f..b1ff40c894 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -227,12 +227,8 @@ static inline void gen_cancel(uint32_t slot)
 
 #ifdef QEMU_GENERATE
 #define fLSBNEW(PVAL)   tcg_gen_andi_tl(LSB, (PVAL), 1)
-#define fLSBNEW0tcg_gen_andi_tl(LSB, hex_new_pred_value[0], 1)
-#define fLSBNEW1tcg_gen_andi_tl(LSB, hex_new_pred_value[1], 1)
 #else
 #define fLSBNEW(PVAL)   ((PVAL) & 1)
-#define fLSBNEW0(env->new_pred_value[0] & 1)
-#define fLSBNEW1(env->new_pred_value[1] & 1)
 #endif
 
 #ifdef QEMU_GENERATE
-- 
2.25.1
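Both overrides rely on tcg_gen_movcond_tl(cond, ret, c1, c2, v1, v2), which sets ret to v1 when "c1 cond c2" holds and to v2 otherwise. The sketch below writes out what the generated code computes, in plain C on ordinary integers. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* if (p0.new) r0 = #0: TCG_COND_EQ keeps Rd only when p0.new == 0. */
static uint32_t sa1_clrtnew(uint32_t rd, uint32_t p0_new)
{
    return p0_new == 0 ? rd : 0;
}

/* if (!p0.new) r0 = #0: TCG_COND_NE keeps Rd only when p0.new != 0. */
static uint32_t sa1_clrfnew(uint32_t rd, uint32_t p0_new)
{
    return p0_new != 0 ? rd : 0;
}
```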

[PULL v2 08/44] Hexagon (target/hexagon) Add v73 scalar instructions

2023-05-18 Thread Taylor Simpson

The following instructions are added
J2_callrh
J2_jumprh

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-9-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h  | 4 
 target/hexagon/attribs_def.h.inc  | 1 +
 target/hexagon/imported/branch.idef   | 7 ++-
 target/hexagon/imported/encode_pp.def | 2 ++
 4 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 598d80d3ce..6f12f665db 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -653,6 +653,8 @@
 gen_call(ctx, riV)
 #define fGEN_TCG_J2_callr(SHORTCODE) \
 gen_callr(ctx, RsV)
+#define fGEN_TCG_J2_callrh(SHORTCODE) \
+gen_callr(ctx, RsV)
 
 #define fGEN_TCG_J2_callt(SHORTCODE) \
 gen_cond_call(ctx, PuV, TCG_COND_EQ, riV)
@@ -851,6 +853,8 @@
 gen_jump(ctx, riV)
 #define fGEN_TCG_J2_jumpr(SHORTCODE) \
 gen_jumpr(ctx, RsV)
+#define fGEN_TCG_J2_jumprh(SHORTCODE) \
+gen_jumpr(ctx, RsV)
 #define fGEN_TCG_J4_jumpseti(SHORTCODE) \
 do { \
 tcg_gen_movi_tl(RdV, UiV); \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 3bef60bef3..69da9776f0 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -89,6 +89,7 @@ DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
 DEF_ATTRIB(INDIRECT, "Absolute register jump", "", "")
 DEF_ATTRIB(CALL, "Function call instruction", "", "")
 DEF_ATTRIB(COF, "Change-of-flow instruction", "", "")
+DEF_ATTRIB(HINTED_COF, "This instruction is a hinted change-of-flow", "", "")
 DEF_ATTRIB(CONDEXEC, "May be cancelled by a predicate", "", "")
 DEF_ATTRIB(DOTNEWVALUE, "Uses a register value generated in this pkt", "", "")
 DEF_ATTRIB(NEWCMPJUMP, "Compound compare and jump", "", "")
diff --git a/target/hexagon/imported/branch.idef b/target/hexagon/imported/branch.idef
index 88f5f48cce..93e2e375a5 100644
--- a/target/hexagon/imported/branch.idef
+++ b/target/hexagon/imported/branch.idef
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -34,6 +34,9 @@ Q6INSN(J2_jump,"jump #r22:2",ATTRIBS(A_JDIR), "direct unconditional jump",
 Q6INSN(J2_jumpr,"jumpr Rs32",ATTRIBS(A_JINDIR), "indirect unconditional jump",
 {fJUMPR(RsN,RsV,COF_TYPE_JUMPR);})
 
+Q6INSN(J2_jumprh,"jumprh Rs32",ATTRIBS(A_JINDIR, A_HINTED_COF), "indirect unconditional jump",
+{fJUMPR(RsN,RsV,COF_TYPE_JUMPR);})
+
 #define OLDCOND_JUMP(TAG,OPER,OPER2,ATTRIB,DESCR,SEMANTICS) \
 Q6INSN(TAG##t,"if (Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLD(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLD(PuV)) { SEMANTICS; }}) \
 Q6INSN(TAG##f,"if (!Pu4) "OPER":nt "OPER2,ATTRIB,DESCR,{fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0); if (fLSBOLDNOT(PuV)) { SEMANTICS; }}) \
@@ -196,6 +199,8 @@ Q6INSN(J2_callrt,"if (Pu4) callr Rs32",ATTRIBS(CINDIR_STD),"indirect conditional
 Q6INSN(J2_callrf,"if (!Pu4) callr Rs32",ATTRIBS(CINDIR_STD),"indirect conditional call if false",
 {fBRANCH_SPECULATE_STALL(fLSBOLDNOT(PuV),,SPECULATE_NOT_TAKEN,12,0);if (fLSBOLDNOT(PuV)) { fCALLR(RsV); }})
 
+Q6INSN(J2_callrh,"callrh Rs32",ATTRIBS(CINDIR_STD, A_HINTED_COF), "hinted indirect unconditional call",
+{ fCALLR(RsV); })
 
 
 
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index 763f465bfd..0cd30a5e85 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -524,6 +524,7 @@ DEF_FIELD32(ICLASS_J" 110-  PP-! ",J_PT,"Predict-taken")
 
 DEF_FIELDROW_DESC32(ICLASS_J"   PP-- ","[#0] PC=(Rs), R31=return")
 DEF_ENC32(J2_callr, ICLASS_J"   101s  PP--  ")
+DEF_ENC32(J2_callrh,ICLASS_J"   110s  PP--  ")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0001  PP-- ","[#1] if (Pu) PC=(Rs), R31=return")
 DEF_ENC32(J2_callrt,ICLASS_J" 0001  000s  PPuu  ")
@@ -531,6 +532,7 @@ DEF_ENC32(J2_callrf,ICLASS_J" 0001  001s  PPuu  ")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0010  PP-- ","[#2] PC=(Rs); ")
 DEF_ENC32(J2_jumpr,  ICLASS_J" 0010  100s  PP--  ")
+DEF_ENC32(J2_jumprh, ICLASS_J" 0010  110s  PP--  ")
 DEF_ENC32(J4_hintjumpr,  ICLASS_J" 0010  101s  PP--  ")
 
 DEF_FIELDROW_DESC32(ICLASS_J" 0011  PP-- ","[#3] if (Pu) PC=(Rs) ")
-- 
2.25.1

[PULL v2 09/44] Hexagon (tests/tcg/hexagon) Add v73 scalar tests

2023-05-18 Thread Taylor Simpson

Tests added for the following instructions
J2_callrh
J2_jumprh

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-10-tsimp...@quicinc.com>
---
 tests/tcg/hexagon/v73_scalar.c| 96 +++
 tests/tcg/hexagon/Makefile.target |  2 +
 2 files changed, 98 insertions(+)
 create mode 100644 tests/tcg/hexagon/v73_scalar.c

diff --git a/tests/tcg/hexagon/v73_scalar.c b/tests/tcg/hexagon/v73_scalar.c
new file mode 100644
index 00..fee67fc531
--- /dev/null
+++ b/tests/tcg/hexagon/v73_scalar.c
@@ -0,0 +1,96 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include <stdio.h>
+#include <stdbool.h>
+#include <stdint.h>
+
+/*
+ *  Test the scalar core instructions that are new in v73
+ */
+
+int err;
+
+static void __check32(int line, uint32_t result, uint32_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%08x != 0x%08x\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check32(RES, EXP) __check32(__LINE__, RES, EXP)
+
+static void __check64(int line, uint64_t result, uint64_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check64(RES, EXP) __check64(__LINE__, RES, EXP)
+
+static bool my_func_called;
+
+static void my_func(void)
+{
+my_func_called = true;
+}
+
+static inline void callrh(void *func)
+{
+asm volatile("callrh %0\n\t"
+ : : "r"(func)
+ /* Mark the caller-save registers as clobbered */
+ : "r0", "r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8", "r9",
+   "r10", "r11", "r12", "r13", "r14", "r15", "r28",
+   "p0", "p1", "p2", "p3");
+}
+
+static void test_callrh(void)
+{
+my_func_called = false;
+callrh(&my_func);
+check32(my_func_called, true);
+}
+
+static void test_jumprh(void)
+{
+uint32_t res;
+asm ("%0 = #5\n\t"
+ "r0 = ##1f\n\t"
+ "jumprh r0\n\t"
+ "%0 = #3\n\t"
+ "jump 2f\n\t"
+ "1:\n\t"
+ "%0 = #1\n\t"
+ "2:\n\t"
+ : "=r"(res) : : "r0");
+check32(res, 1);
+}
+
+int main()
+{
+test_callrh();
+test_jumprh();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 558c056148..3172f2e4db 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -79,6 +79,7 @@ HEX_TESTS += test_vspliceb
 HEX_TESTS += v68_scalar
 HEX_TESTS += v68_hvx
 HEX_TESTS += v69_hvx
+HEX_TESTS += v73_scalar
 
 TESTS += $(HEX_TESTS)
 
@@ -98,6 +99,7 @@ v68_hvx: v68_hvx.c hvx_misc.h v6mpy_ref.c.inc
 v68_hvx: CFLAGS += -mhvx -Wno-unused-function
 v69_hvx: v69_hvx.c hvx_misc.h
 v69_hvx: CFLAGS += -mhvx -Wno-unused-function
+v73_scalar: CFLAGS += -Wno-unused-function
 
 hvx_histogram: hvx_histogram.c hvx_histogram_row.S
$(CC) $(CFLAGS) $(CROSS_CC_GUEST_CFLAGS) $^ -o $@ $(LDFLAGS)
-- 
2.25.1

[PULL v2 24/44] Hexagon (target/hexagon) Add overrides for disabled idef-parser insns

2023-05-18 Thread Taylor Simpson

The following have overrides
S2_insert
S2_insert_rp
S2_asr_r_svw_trun
A2_swiz

These instructions have semantics that write to the destination
before all the operand reads have been completed.  Therefore,
the idef-parser versions were disabled with the short-circuit patch.

Test cases added to tests/tcg/hexagon/read_write_overlap.c
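Writing the destination early is a problem when Rd and Rs name the same register. To illustrate, consider a hypothetical byte-by-byte implementation of a byte swap. This is only an illustration of the hazard, not the code idef-parser generated for A2_swiz. The override in this patch avoids the problem by emitting a single tcg_gen_bswap_tl(RdV, RsV). swiz_bytewise() is a hypothetical name.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical byte swap that updates *rd one byte at a time while still
 * reading from *rs.  If rd and rs point at the same register, later
 * iterations read bytes that earlier iterations already overwrote.
 */
static void swiz_bytewise(uint32_t *rd, const uint32_t *rs)
{
    for (int i = 0; i < 4; i++) {
        uint32_t byte = (*rs >> (8 * (3 - i))) & 0xff;
        *rd = (*rd & ~(UINT32_C(0xff) << (8 * i))) | (byte << (8 * i));
    }
}
```

With distinct registers, 0x11223344 becomes 0x44332211. When Rd and Rs alias, the low two bytes are read after they have already been overwritten, and the result is 0x11222211.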

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-16-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h   |  18 
 target/hexagon/genptr.c|  99 ++
 tests/tcg/hexagon/read_write_overlap.c | 136 +
 tests/tcg/hexagon/Makefile.target  |   1 +
 4 files changed, 254 insertions(+)
 create mode 100644 tests/tcg/hexagon/read_write_overlap.c

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 7e070c35bd..ed2c1ccc46 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1185,6 +1185,24 @@
 tcg_gen_extrl_i64_i32(RdV, tmp); \
 } while (0)
 
+#define fGEN_TCG_S2_insert(SHORTCODE) \
+do { \
+int width = uiV; \
+int offset = UiV; \
+if (width != 0) { \
+if (offset + width > 32) { \
+width = 32 - offset; \
+} \
+tcg_gen_deposit_tl(RxV, RxV, RsV, offset, width); \
+} \
+} while (0)
+#define fGEN_TCG_S2_insert_rp(SHORTCODE) \
+gen_insert_rp(ctx, RxV, RsV, RttV)
+#define fGEN_TCG_S2_asr_r_svw_trun(SHORTCODE) \
+gen_asr_r_svw_trun(ctx, RdV, RssV, RtV)
+#define fGEN_TCG_A2_swiz(SHORTCODE) \
+tcg_gen_bswap_tl(RdV, RsV)
+
 /* Floating point */
 #define fGEN_TCG_F2_conv_sf2df(SHORTCODE) \
 gen_helper_conv_sf2df(RddV, cpu_env, RsV)
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 82a3408eb4..5eb0d58659 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -1065,6 +1065,105 @@ static void gen_asl_r_r_sat(DisasContext *ctx, TCGv RdV, TCGv RsV, TCGv RtV)
 gen_set_label(done);
 }
 
+static void gen_insert_rp(DisasContext *ctx, TCGv RxV, TCGv RsV, TCGv_i64 RttV)
+{
+/*
+ * int width = fZXTN(6, 32, (fGETWORD(1, RttV)));
+ * int offset = fSXTN(7, 32, (fGETWORD(0, RttV)));
+ * size8u_t mask = ((fCONSTLL(1) << width) - 1);
+ * if (offset < 0) {
+ * RxV = 0;
+ * } else {
+ * RxV &= ~(mask << offset);
+ * RxV |= ((RsV & mask) << offset);
+ * }
+ */
+
+TCGv width = tcg_temp_new();
+TCGv offset = tcg_temp_new();
+TCGv_i64 mask = tcg_temp_new_i64();
+TCGv_i64 result = tcg_temp_new_i64();
+TCGv_i64 tmp = tcg_temp_new_i64();
+TCGv_i64 offset64 = tcg_temp_new_i64();
+TCGLabel *label = gen_new_label();
+TCGLabel *done = gen_new_label();
+
+tcg_gen_extrh_i64_i32(width, RttV);
+tcg_gen_extract_tl(width, width, 0, 6);
+tcg_gen_extrl_i64_i32(offset, RttV);
+tcg_gen_sextract_tl(offset, offset, 0, 7);
+/* Possible values for offset are -64 .. 63 */
+tcg_gen_brcondi_tl(TCG_COND_GE, offset, 0, label);
+/* For negative offsets, zero out the result */
+tcg_gen_movi_tl(RxV, 0);
+tcg_gen_br(done);
+gen_set_label(label);
+/* At this point, possible values of offset are 0 .. 63 */
+tcg_gen_ext_i32_i64(mask, width);
+tcg_gen_shl_i64(mask, tcg_constant_i64(1), mask);
+tcg_gen_subi_i64(mask, mask, 1);
+tcg_gen_extu_i32_i64(result, RxV);
+tcg_gen_ext_i32_i64(tmp, offset);
+tcg_gen_shl_i64(tmp, mask, tmp);
+tcg_gen_andc_i64(result, result, tmp);
+tcg_gen_extu_i32_i64(tmp, RsV);
+tcg_gen_and_i64(tmp, tmp, mask);
+tcg_gen_extu_i32_i64(offset64, offset);
+tcg_gen_shl_i64(tmp, tmp, offset64);
+tcg_gen_or_i64(result, result, tmp);
+tcg_gen_extrl_i64_i32(RxV, result);
+gen_set_label(done);
+}
+
+static void gen_asr_r_svw_trun(DisasContext *ctx, TCGv RdV,
+   TCGv_i64 RssV, TCGv RtV)
+{
+/*
+ * for (int i = 0; i < 2; i++) {
+ * fSETHALF(i, RdV, fGETHALF(0, ((fSXTN(7, 32, RtV) > 0) ?
+ * (fCAST4_8s(fGETWORD(i, RssV)) >> fSXTN(7, 32, RtV)) :
+ * (fCAST4_8s(fGETWORD(i, RssV)) << -fSXTN(7, 32, RtV);
+ * }
+ */
+TCGv shift_amt32 = tcg_temp_new();
+TCGv_i64 shift_amt64 = tcg_temp_new_i64();
+TCGv_i64 tmp64 = tcg_temp_new_i64();
+TCGv tmp32 = tcg_temp_new();
+TCGLabel *label = gen_new_label();
+TCGLabel *zero = gen_new_label();
+TCGLabel *done =  gen_new_label();
+
+tcg_gen_sextract_tl(shift_amt32, RtV, 0, 7);
+/* Possible values of shift_amt32 are -64 .. 63 */
+tcg_gen_brcondi_tl(TCG_COND_LE, shift_amt32, 0, label);
+/* After branch, possible values of shift_amt32 are 1 .. 63 */
+tcg_gen_ext_i32_i64(shift_amt64, shift_amt32);
+for (int i = 0; i < 2; i++) {
+tcg_gen_sextract_i64(tmp64, RssV, i * 32, 32);
+tcg_gen_sar_i64(tmp64, tmp64, shift_amt64);
+tcg_gen_extrl_i64_i32(tmp32,

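For readers checking the TCG above against the semantics comments, here is a plain-C reference model of S2_insert_rp and S2_insert. It is a sketch, not QEMU code: the function names are hypothetical, and the S2_insert model assumes both immediates are 5-bit fields (0..31). The _rp variant works in 64 bits and truncates to 32, so bits shifted past bit 31 are dropped. The immediate variant clamps the width instead. In the cases checked below, both approaches give the same result.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical reference model of S2_insert_rp (see the gen_insert_rp() comment). */
uint32_t insert_rp_model(uint32_t rx, uint32_t rs, uint64_t rtt)
{
    uint32_t width = (uint32_t)(rtt >> 32) & 0x3f;          /* fZXTN(6, 32, high word) */
    uint32_t low7 = (uint32_t)rtt & 0x7f;
    int offset = low7 >= 64 ? (int)low7 - 128 : (int)low7;  /* fSXTN(7, 32, low word) */

    if (offset < 0) {
        return 0;                                            /* negative offset: Rx = 0 */
    }
    /* 64-bit arithmetic, then truncate to 32 bits, like the TCG sequence */
    uint64_t mask = (UINT64_C(1) << width) - 1;              /* width <= 63, no UB */
    uint64_t result = rx;
    result &= ~(mask << offset);
    result |= ((uint64_t)rs & mask) << offset;
    return (uint32_t)result;
}

/* Hypothetical reference model of S2_insert, assuming 5-bit immediates. */
uint32_t insert_imm_model(uint32_t rx, uint32_t rs, unsigned width, unsigned offset)
{
    assert(width < 32 && offset < 32);
    if (width == 0) {
        return rx;                                           /* nothing to insert */
    }
    if (offset + width > 32) {
        width = 32 - offset;                                 /* clamp the field at bit 31 */
    }
    uint32_t mask = (UINT32_C(1) << width) - 1;
    return (rx & ~(mask << offset)) | ((rs & mask) << offset);
}

void insert_models_selfcheck(void)
{
    /* width 8 in the high word, offset 4 in the low word */
    assert(insert_rp_model(0xffffffffu, 0xabu, (UINT64_C(8) << 32) | 4) == 0xfffffabfu);
    /* low word 0x7f sign-extends to -1, so the result is zero */
    assert(insert_rp_model(0x12345678u, 0xffu, (UINT64_C(8) << 32) | 0x7f) == 0);
    /* bits above bit 31 are dropped by truncation (_rp) or by clamping (imm) */
    assert(insert_rp_model(0, 0xabu, (UINT64_C(8) << 32) | 28) == 0xb0000000u);
    assert(insert_imm_model(0, 0xabu, 8, 28) == 0xb0000000u);
}
```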
[PULL v2 36/44] Hexagon (iclass): update J4_hintjumpr slot constraints

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

The Hexagon PRM says that "The assembler automatically encodes
instructions in the packet in the proper order. In the binary encoding
of a packet, the instructions must be ordered from Slot 3 down to
Slot 0."

Prior to architecture version v73, the slot constraints for the
instruction "hintjr" only allowed it to be executed in slot 2.
With that in mind, consider the packet:

{
hintjr(r0)
nop
nop
if (!p0) memd(r1+#0) = r1:0
}

To satisfy the ordering rule quoted from the PRM, the assembler would
therefore move one of the nops to the first position, so that it can be
assigned to slot 3 and the subsequent hintjr to slot 2.

However, since v73, hintjr can be executed in either slot 2 or slot 3, so
there is no need to reorder that packet and the assembler encodes it
as is. When QEMU tries to execute it, however, we end up hitting a
"misaligned store" exception because both the store and the hintjr are
assigned to slot 0, while some functions, like `slot_is_predicated()`,
expect the decode machinery to assign only one instruction per slot. In
particular, that function traverses the packet until it finds the first
instruction at the desired slot, which, for slot 0, is hintjr. Since
hintjr is not predicated, we try to execute the store regardless of the
predicate. And because the predicate is false, hex_store_addr[0] and
hex_store_width[0] were never loaded, so the store determines its width
from stale values, causing it to be misaligned.

Update the slot constraints for hintjr so that QEMU can properly handle
such encodings.

Note: to avoid similar-but-not-identical issues in the future, we should
look for multiple instructions at the same slot during decoding time and
throw an invalid packet exception. That will be done in the subsequent
commit.

Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Message-Id: <0fcd8293642c6324119fbbab44741164bcbd04fb.1673616964.git.quic_mathb...@quicinc.com>
---
 target/hexagon/iclass.c | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/iclass.c b/target/hexagon/iclass.c
index 6091286993..c3f8523b27 100644
--- a/target/hexagon/iclass.c
+++ b/target/hexagon/iclass.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -51,8 +51,10 @@ SlotMask find_iclass_slots(Opcode opcode, int itype)
 return SLOTS_0;
 } else if ((opcode == J2_trap0) ||
(opcode == Y2_isync) ||
-   (opcode == J2_pause) || (opcode == J4_hintjumpr)) {
+   (opcode == J2_pause)) {
 return SLOTS_2;
+} else if (opcode == J4_hintjumpr) {
+return SLOTS_23;
 } else if (GET_ATTRIB(opcode, A_CRSLOT23)) {
 return SLOTS_23;
 } else if (GET_ATTRIB(opcode, A_RESTRICT_PREFERSLOT0)) {
-- 
2.25.1
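The ordering argument in the commit message above can be checked with a simplified, hypothetical model of in-order slot assignment. This is not QEMU's decoder: it assumes every instruction in encoding order must take a strictly lower slot than the one before it, gives each instruction the highest slot its mask allows, treats nops as legal in any slot and, for simplicity, treats the predicated store as slot-0-only, matching the slot it ends up in per the description. The MODEL_* masks are illustrative stand-ins for the SLOTS_* values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Bit i set means "may execute in slot i" (hypothetical stand-ins for SLOTS_*). */
enum {
    MODEL_SLOT0    = 0x1,
    MODEL_SLOTS_2  = 0x4,
    MODEL_SLOTS_23 = 0xc,
    MODEL_ANY_SLOT = 0xf,
};

/*
 * Walk the packet in encoding order and give each instruction the highest
 * permitted slot strictly below the previous instruction's slot
 * ("Slot 3 down to Slot 0"). Returns false if some instruction cannot be placed.
 */
bool assign_slots(const unsigned *masks, size_t n, int *slots)
{
    int prev = 4;
    for (size_t i = 0; i < n; i++) {
        int s = prev - 1;
        while (s >= 0 && !(masks[i] & (1u << s))) {
            s--;                      /* try the next lower slot */
        }
        if (s < 0) {
            return false;             /* no legal slot left for this instruction */
        }
        slots[i] = s;
        prev = s;
    }
    return true;
}

void slot_model_selfcheck(void)
{
    /* { hintjr(r0); nop; nop; if (!p0) memd(r1+#0) = r1:0 } in encoding order */
    unsigned pre_v73[4] = { MODEL_SLOTS_2, MODEL_ANY_SLOT, MODEL_ANY_SLOT, MODEL_SLOT0 };
    unsigned v73[4] = { MODEL_SLOTS_23, MODEL_ANY_SLOT, MODEL_ANY_SLOT, MODEL_SLOT0 };
    int slots[4];

    assert(!assign_slots(pre_v73, 4, slots));   /* the assembler had to reorder */
    assert(assign_slots(v73, 4, slots));        /* legal as encoded since v73 */
    assert(slots[0] == 3 && slots[1] == 2 && slots[2] == 1 && slots[3] == 0);
}
```

With SLOTS_2 the hintjr takes slot 2, which pushes the store below slot 0. Allowing slot 3 (SLOTS_23) lets the packet be placed as encoded.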

Re: [PATCH v7 1/4] qapi/qdev.json: unite DEVICE_* event data into single structure

2023-05-18 Thread Michael S. Tsirkin

On Fri, Apr 21, 2023 at 01:32:04PM +0300, Vladimir Sementsov-Ogievskiy wrote:
> DEVICE_DELETED and DEVICE_UNPLUG_GUEST_ERROR have identical data; let's
> refactor them into one structure. That also helps to add new events
> consistently.
> 
> Signed-off-by: Vladimir Sementsov-Ogievskiy 

Can QAPI maintainers please review this patchset?
It's been a month.

> ---
>  qapi/qdev.json | 39 +++
>  1 file changed, 27 insertions(+), 12 deletions(-)
> 
> diff --git a/qapi/qdev.json b/qapi/qdev.json
> index 2708fb4e99..135cd81586 100644
> --- a/qapi/qdev.json
> +++ b/qapi/qdev.json
> @@ -114,16 +114,37 @@
>  { 'command': 'device_del', 'data': {'id': 'str'} }
>  
>  ##
> -# @DEVICE_DELETED:
> +# @DeviceAndPath:
>  #
> -# Emitted whenever the device removal completion is acknowledged by the guest.
> -# At this point, it's safe to reuse the specified device ID. Device removal can
> -# be initiated by the guest or by HMP/QMP commands.
> +# In events we designate devices by both their ID (if the device has one)
> +# and QOM path.
> +#
> +# Why we need ID? User specify ID in device_add command and in command line
> +# and expects same identifier in the event data.
> +#
> +# Why we need QOM path? Some devices don't have ID and we still want to emit
> +# events for them.
> +#
> +# So, we have a bit of redundancy, as QOM path for device that has ID is
> +# always /machine/peripheral/ID. But that's hard to change keeping both
> +# simple interface for most users and universality for the generic case.
>  #
>  # @device: the device's ID if it has one
>  #
>  # @path: the device's QOM path
>  #
> +# Since: 8.0
> +##
> +{ 'struct': 'DeviceAndPath',
> +  'data': { '*device': 'str', 'path': 'str' } }
> +

Shouldn't this be Since: 8.1?


> +##
> +# @DEVICE_DELETED:
> +#
> +# Emitted whenever the device removal completion is acknowledged by the guest.
> +# At this point, it's safe to reuse the specified device ID. Device removal can
> +# be initiated by the guest or by HMP/QMP commands.
> +#
>  # Since: 1.5
>  #
>  # Example:
> @@ -134,18 +155,13 @@
>  #  "timestamp": { "seconds": 1265044230, "microseconds": 450486 } }
>  #
>  ##
> -{ 'event': 'DEVICE_DELETED',
> -  'data': { '*device': 'str', 'path': 'str' } }
> +{ 'event': 'DEVICE_DELETED', 'data': 'DeviceAndPath' }
>  
>  ##
>  # @DEVICE_UNPLUG_GUEST_ERROR:
>  #
>  # Emitted when a device hot unplug fails due to a guest reported error.
>  #
> -# @device: the device's ID if it has one
> -#
> -# @path: the device's QOM path
> -#
>  # Since: 6.2
>  #
>  # Example:
> @@ -156,5 +172,4 @@
>  #  "timestamp": { "seconds": 1615570772, "microseconds": 202844 } }
>  #
>  ##
> -{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR',
> -  'data': { '*device': 'str', 'path': 'str' } }
> +{ 'event': 'DEVICE_UNPLUG_GUEST_ERROR', 'data': 'DeviceAndPath' }
> -- 
> 2.34.1

[PULL v2 15/44] Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]

2023-05-18 Thread Taylor Simpson

With the overrides added in prior commits, this function is no longer used.
Remove references in macros.h.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-7-tsimp...@quicinc.com>
---
 target/hexagon/macros.h| 14 --
 target/hexagon/op_helper.h |  4 
 target/hexagon/op_helper.c | 17 -
 3 files changed, 35 deletions(-)

diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index b1ff40c894..995ae0e384 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -343,10 +343,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 
 #define fREAD_LR() (env->gpr[HEX_REG_LR])
 
-#define fWRITE_LR(A) log_reg_write(env, HEX_REG_LR, A)
-#define fWRITE_FP(A) log_reg_write(env, HEX_REG_FP, A)
-#define fWRITE_SP(A) log_reg_write(env, HEX_REG_SP, A)
-
 #define fREAD_SP() (env->gpr[HEX_REG_SP])
 #define fREAD_LC0 (env->gpr[HEX_REG_LC0])
 #define fREAD_LC1 (env->gpr[HEX_REG_LC1])
@@ -371,16 +367,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fBRANCH(LOC, TYPE)  fWRITE_NPC(LOC)
 #define fJUMPR(REGNO, TARGET, TYPE) fBRANCH(TARGET, COF_TYPE_JUMPR)
 #define fHINTJR(TARGET) { /* Not modelled in qemu */}
-#define fWRITE_LOOP_REGS0(START, COUNT) \
-do { \
-log_reg_write(env, HEX_REG_LC0, COUNT);  \
-log_reg_write(env, HEX_REG_SA0, START); \
-} while (0)
-#define fWRITE_LOOP_REGS1(START, COUNT) \
-do { \
-log_reg_write(env, HEX_REG_LC1, COUNT);  \
-log_reg_write(env, HEX_REG_SA1, START);\
-} while (0)
 
 #define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
 #define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index db22b54401..6bd4b07849 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -19,15 +19,11 @@
 #define HEXAGON_OP_HELPER_H
 
 /* Misc functions */
-void write_new_pc(CPUHexagonState *env, bool pkt_has_multi_cof, target_ulong addr);
-
 uint8_t mem_load1(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint16_t mem_load2(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint32_t mem_load4(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 uint64_t mem_load8(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
 
-void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val);
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
 void log_store32(CPUHexagonState *env, target_ulong addr,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 3cc71b69d9..7e9e3f305e 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -52,23 +52,6 @@ G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 do_raise_exception_err(env, excp, 0);
 }
 
-void log_reg_write(CPUHexagonState *env, int rnum,
-   target_ulong val)
-{
-HEX_DEBUG_LOG("log_reg_write[%d] = " TARGET_FMT_ld " (0x" TARGET_FMT_lx ")",
-  rnum, val, val);
-if (val == env->gpr[rnum]) {
-HEX_DEBUG_LOG(" NO CHANGE");
-}
-HEX_DEBUG_LOG("\n");
-
-env->new_value[rnum] = val;
-if (HEX_DEBUG) {
-/* Do this so HELPER(debug_commit_end) will know */
-env->reg_written[rnum] = 1;
-}
-}
-
 static void log_pred_write(CPUHexagonState *env, int pnum, target_ulong val)
 {
 HEX_DEBUG_LOG("log_pred_write[%d] = " TARGET_FMT_ld
-- 
2.25.1

[PULL v2 13/44] Hexagon (target/hexagon) Add overrides for allocframe/deallocframe

2023-05-18 Thread Taylor Simpson

These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-5-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h | 32 +++
 target/hexagon/genptr.c  | 47 
 2 files changed, 79 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 8d5e9826a0..ef17f2f18c 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -500,6 +500,38 @@
 #define fGEN_TCG_Y2_icinva(SHORTCODE) \
 do { RsV = RsV; } while (0)
 
+/*
+ * allocframe(#uiV)
+ * RxV == r29
+ */
+#define fGEN_TCG_S2_allocframe(SHORTCODE) \
+gen_allocframe(ctx, RxV, uiV)
+
+/* sub-instruction version (no RxV, so handle it manually) */
+#define fGEN_TCG_SS2_allocframe(SHORTCODE) \
+do { \
+TCGv r29 = tcg_temp_new(); \
+tcg_gen_mov_tl(r29, hex_gpr[HEX_REG_SP]); \
+gen_allocframe(ctx, r29, uiV); \
+gen_log_reg_write(ctx, HEX_REG_SP, r29); \
+} while (0)
+
+/*
+ * Rdd32 = deallocframe(Rs32):raw
+ * RddV == r31:30
+ * RsV  == r30
+ */
+#define fGEN_TCG_L2_deallocframe(SHORTCODE) \
+gen_deallocframe(ctx, RddV, RsV)
+
+/* sub-instruction version (no RddV/RsV, so handle it manually) */
+#define fGEN_TCG_SL2_deallocframe(SHORTCODE) \
+do { \
+TCGv_i64 r31_30 = tcg_temp_new_i64(); \
+gen_deallocframe(ctx, r31_30, hex_gpr[HEX_REG_FP]); \
+gen_log_reg_write_pair(ctx, HEX_REG_FP, r31_30); \
+} while (0)
+
 /*
  * dealloc_return
  * Assembler mapped to
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 6e5767ec5e..fa7b1754bd 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -709,6 +709,18 @@ static void gen_cond_callr(DisasContext *ctx,
 gen_set_label(skip);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+/* frame = ((LR << 32) | FP) ^ (FRAMEKEY << 32)) */
+static TCGv_i64 gen_frame_scramble(void)
+{
+TCGv_i64 frame = tcg_temp_new_i64();
+TCGv tmp = tcg_temp_new();
+tcg_gen_xor_tl(tmp, hex_gpr[HEX_REG_LR], hex_gpr[HEX_REG_FRAMEKEY]);
+tcg_gen_concat_i32_i64(frame, hex_gpr[HEX_REG_FP], tmp);
+return frame;
+}
+#endif
+
 /* frame ^= (int64_t)FRAMEKEY << 32 */
 static void gen_frame_unscramble(TCGv_i64 frame)
 {
@@ -725,6 +737,41 @@ static void gen_load_frame(DisasContext *ctx, TCGv_i64 frame, TCGv EA)
 tcg_gen_qemu_ld_i64(frame, EA, ctx->mem_idx, MO_TEUQ);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+/* Stack overflow check */
+static void gen_framecheck(TCGv EA, int framesize)
+{
+/* Not modelled in linux-user mode */
+/* Placeholder for system mode */
+#ifndef CONFIG_USER_ONLY
+g_assert_not_reached();
+#endif
+}
+
+static void gen_allocframe(DisasContext *ctx, TCGv r29, int framesize)
+{
+TCGv r30 = tcg_temp_new();
+TCGv_i64 frame;
+tcg_gen_addi_tl(r30, r29, -8);
+frame = gen_frame_scramble();
+gen_store8(cpu_env, r30, frame, ctx->insn->slot);
+gen_log_reg_write(ctx, HEX_REG_FP, r30);
+gen_framecheck(r30, framesize);
+tcg_gen_subi_tl(r29, r30, framesize);
+}
+
+static void gen_deallocframe(DisasContext *ctx, TCGv_i64 r31_30, TCGv r30)
+{
+TCGv r29 = tcg_temp_new();
+TCGv_i64 frame = tcg_temp_new_i64();
+gen_load_frame(ctx, frame, r30);
+gen_frame_unscramble(frame);
+tcg_gen_mov_i64(r31_30, frame);
+tcg_gen_addi_tl(r29, r30, 8);
+gen_log_reg_write(ctx, HEX_REG_SP, r29);
+}
+#endif
+
 static void gen_return(DisasContext *ctx, TCGv_i64 dst, TCGv src)
 {
 /*
-- 
2.25.1
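To see what gen_allocframe()/gen_deallocframe() compute, here is a small plain-C model of the frame round trip. It is a sketch on a hypothetical toy state (FrameModel and the model_* names are illustrative, not QEMU types). It follows the TCG above: allocframe stores ((LR ^ FRAMEKEY) << 32) | FP at SP - 8, sets FP to that address and drops SP by the frame size. deallocframe loads the doubleword at FP, XORs FRAMEKEY back out of the high word, restores r31:30 (LR:FP) and sets SP to FP + 8. Memory uses host byte order here, which is enough for a round trip but is not the target's little-endian layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical toy CPU state for illustration only. */
typedef struct {
    uint32_t sp, fp, lr, framekey;   /* r29, r30, r31, FRAMEKEY */
    uint8_t mem[256];                /* "addresses" index this array */
} FrameModel;

/* frame = ((LR ^ FRAMEKEY) << 32) | FP, as gen_frame_scramble() builds it */
uint64_t frame_scramble(const FrameModel *m)
{
    return ((uint64_t)(m->lr ^ m->framekey) << 32) | m->fp;
}

/* frame ^= (uint64_t)FRAMEKEY << 32, as gen_frame_unscramble() does */
uint64_t frame_unscramble(const FrameModel *m, uint64_t frame)
{
    return frame ^ ((uint64_t)m->framekey << 32);
}

void model_allocframe(FrameModel *m, uint32_t framesize)
{
    uint32_t r30 = m->sp - 8;
    uint64_t frame = frame_scramble(m);          /* uses the old FP and LR */

    assert(r30 + sizeof frame <= sizeof m->mem);
    memcpy(&m->mem[r30], &frame, sizeof frame);  /* host byte order */
    m->fp = r30;
    m->sp = r30 - framesize;
}

void model_deallocframe(FrameModel *m)
{
    uint32_t r30 = m->fp;
    uint64_t frame;

    assert(r30 + sizeof frame <= sizeof m->mem);
    memcpy(&frame, &m->mem[r30], sizeof frame);
    frame = frame_unscramble(m, frame);
    m->fp = (uint32_t)frame;                     /* r30 = low word  */
    m->lr = (uint32_t)(frame >> 32);             /* r31 = high word */
    m->sp = r30 + 8;
}

void frame_model_selfcheck(void)
{
    FrameModel m = { .sp = 200, .fp = 0x1111, .lr = 0x2222, .framekey = 0xdead0000u };

    model_allocframe(&m, 16);
    assert(m.fp == 192 && m.sp == 176);
    m.lr = 0x9999;                               /* the callee clobbers LR */
    model_deallocframe(&m);
    assert(m.fp == 0x1111 && m.lr == 0x2222 && m.sp == 200);
}
```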

[PULL v2 28/44] Hexagon (target/hexagon) Move pred_written to DisasContext

2023-05-18 Thread Taylor Simpson

The pred_written variable in CPUHexagonState is only used for
bookkeeping within the translation of a packet.  With recent changes
that eliminate the need to free TCGv variables, it makes more sense
for it to be transient and kept in DisasContext.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-20-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h   | 2 --
 target/hexagon/helper.h| 2 +-
 target/hexagon/translate.h | 2 +-
 target/hexagon/genptr.c| 2 +-
 target/hexagon/op_helper.c | 5 +++--
 target/hexagon/translate.c | 9 -
 6 files changed, 10 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 2b4f77fb8e..7673f9f32d 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -98,8 +98,6 @@ typedef struct CPUArchState {
 target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
-target_ulong pred_written;
-
 MemLog mem_log_stores[STORES_MAX];
 target_ulong pkt_has_store_s1;
 target_ulong dczero_addr;
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 4b750d0351..f3b298beee 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -21,7 +21,7 @@
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_RETURN, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
-DEF_HELPER_FLAGS_3(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int)
+DEF_HELPER_FLAGS_4(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_3(gather_store, void, env, i32, int)
 DEF_HELPER_1(commit_hvx_stores, void, env)
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index fdfa1b6fe3..a9f1ccee24 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -71,6 +71,7 @@ typedef struct DisasContext {
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
 TCGv new_pred_value[NUM_PREGS];
+TCGv pred_written;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -194,7 +195,6 @@ extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-extern TCGv hex_pred_written;
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
 extern TCGv hex_store_val32[STORES_MAX];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 1f69f4f922..785778759e 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -151,7 +151,7 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 tcg_gen_and_tl(pred, pred, base_val);
 }
 if (HEX_DEBUG) {
-tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
+tcg_gen_ori_tl(ctx->pred_written, ctx->pred_written, 1 << pnum);
 }
 set_bit(pnum, ctx->pregs_written);
 }
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 26fba9f5d6..f9021efc7e 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -203,7 +203,8 @@ static void print_store(CPUHexagonState *env, int slot)
 }
 
 /* This function is a handy place to set a breakpoint */
-void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
+void HELPER(debug_commit_end)(CPUHexagonState *env,
+  int pred_written, int has_st0, int has_st1)
 {
 bool reg_printed = false;
 bool pred_printed = false;
@@ -225,7 +226,7 @@ void HELPER(debug_commit_end)(CPUHexagonState *env, int has_st0, int has_st1)
 }
 
 for (i = 0; i < NUM_PREGS; i++) {
-if (env->pred_written & (1 << i)) {
+if (pred_written & (1 << i)) {
 if (!pred_printed) {
 HEX_DEBUG_LOG("Predicates written\n");
 pred_printed = true;
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 459aace921..a585cc8cfd 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -46,7 +46,6 @@ TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
 TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-TCGv hex_pred_written;
 TCGv hex_store_addr[STORES_MAX];
 TCGv hex_store_width[STORES_MAX];
 TCGv hex_store_val32[STORES_MAX];
@@ -549,7 +548,8 @@ static void gen_start_packet(DisasContext *ctx)
 }
 }
 if (HEX_DEBUG) {
-tcg_gen_movi_tl(hex_pred_written, 0);
+ctx->pred_written = tcg_temp_new();
+tcg_gen_movi_tl(ctx->pred_written, 0);
 }
 
 /* Preload the predicated registers into get_result_gpr(ctx, i) */
@@ -1007,7 +1007,8 @@ static void gen_commit_packet(DisasContext *ctx)
 tcg_constant_tl(pkt->pkt_has_store_s1 && !pkt->pkt_has_dczeroa);
 
 /* Handy place to set a breakpoint at the end of execution */
-

[PULL v2 20/44] Hexagon (target/hexagon) Short-circuit packet register writes

2023-05-18 Thread Taylor Simpson

In certain cases, we can avoid the overhead of writing to hex_new_value
and write directly to hex_gpr.  We add a need_commit field to DisasContext
indicating whether the end-of-packet commit is needed.  If it is not needed,
get_result_gpr() and get_result_gpr_pair() can return hex_gpr.

We pass ctx->need_commit to helpers when needed.

Finally, we can early-exit from gen_reg_writes during packet commit.

There are a few instructions whose semantics write to the result before
reading all the inputs.  Therefore, the idef-parser generated code is
incompatible with short-circuit.  We tell idef-parser to skip them.

For debugging purposes, we add a cpu property to turn off short-circuit.
When the short-circuit property is false, we skip the analysis and force
the end-of-packet commit.

Here's a simple example of the TCG generated for
0x004000b4:  0x7800c020 {   R0 = #0x1 }

BEFORE:
  004000b4
 movi_i32 new_r0,$0x1
 mov_i32 r0,new_r0

AFTER:
  004000b4
 movi_i32 r0,$0x1

This patch reintroduces a use of check_for_attrib, so we remove the
G_GNUC_UNUSED added earlier in this series.

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Reviewed-by: Brian Cain 
Message-Id: <20230427230012.3800327-12-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h|  1 +
 target/hexagon/gen_tcg.h|  3 +-
 target/hexagon/genptr.h |  2 +
 target/hexagon/helper.h |  2 +-
 target/hexagon/macros.h | 13 -
 target/hexagon/translate.h  |  2 +
 target/hexagon/arch.c   |  3 +-
 target/hexagon/cpu.c|  3 ++
 target/hexagon/genptr.c | 30 ---
 target/hexagon/op_helper.c  |  5 +-
 target/hexagon/translate.c  | 67 -
 target/hexagon/gen_helper_funcs.py  |  2 +
 target/hexagon/gen_helper_protos.py | 10 +++-
 target/hexagon/gen_idef_parser_funcs.py |  7 +++
 target/hexagon/gen_tcg_funcs.py |  5 ++
 target/hexagon/hex_common.py|  3 ++
 16 files changed, 128 insertions(+), 30 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 4d8981d862..631bfdbe9c 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -150,6 +150,7 @@ struct ArchCPU {
 
 bool lldb_compat;
 target_ulong lldb_stack_adjust;
+bool short_circuit;
 };
 
 #include "cpu_bits.h"
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 099a6cc47f..7e070c35bd 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -592,7 +592,8 @@
 #define fGEN_TCG_A5_ACS(SHORTCODE) \
 do { \
 gen_helper_vacsh_pred(PeV, cpu_env, RxxV, RssV, RttV); \
-gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
+gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV, \
+ tcg_constant_tl(ctx->need_commit)); \
 } while (0)
 
 #define fGEN_TCG_S2_cabacdecbin(SHORTCODE) \
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 75d0fc262d..420867f934 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -58,4 +58,6 @@ void gen_set_half(int N, TCGv result, TCGv src);
 void gen_set_half_i64(int N, TCGv_i64 result, TCGv src);
 void probe_noshuf_load(TCGv va, int s, int mi);
 
+extern const target_ulong reg_immut_masks[TOTAL_PER_THREAD_REGS];
+
 #endif
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index 73849e3d49..4b750d0351 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -29,7 +29,7 @@ DEF_HELPER_FLAGS_4(fcircadd, TCG_CALL_NO_RWG_SE, s32, s32, s32, s32, s32)
 DEF_HELPER_FLAGS_1(fbrev, TCG_CALL_NO_RWG_SE, i32, i32)
 DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 DEF_HELPER_2(sfinvsqrta, i64, env, f32)
-DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
+DEF_HELPER_5(vacsh_val, s64, env, s64, s64, s64, i32)
 DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64, s64, s64)
 DEF_HELPER_FLAGS_2(cabacdecbin_val, TCG_CALL_NO_RWG_SE, s64, s64, s64)
 DEF_HELPER_FLAGS_2(cabacdecbin_pred, TCG_CALL_NO_RWG_SE, s32, s64, s64)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 24c78fe80a..54562cccb0 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -44,8 +44,17 @@
reg_field_info[FIELD].offset)
 
 #define SET_USR_FIELD(FIELD, VAL) \
-fINSERT_BITS(env->new_value[HEX_REG_USR], reg_field_info[FIELD].width, \
- reg_field_info[FIELD].offset, (VAL))
+do { \
+if (pkt_need_commit) { \
+fINSERT_BITS(env->new_value[HEX_REG_USR], \
+reg_field_info[FIELD].width, \
+reg_field_info[FIELD].offset, (VAL)); \
+} else { \
+fINSERT_BITS(env->gpr[HEX_REG_USR], \
+reg_field_info[FIELD].width, \
+reg_field_info[FIELD].offset, (VAL)); \
+} \
+} while (0)
 #endif
 
 #ifdef

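Why the short-circuit needs an analysis step can be illustrated with a toy model (hypothetical, not QEMU's analysis). In a packet, every instruction reads its inputs before any result is visible. Writing results straight into the register file is therefore only safe if no instruction reads a register written by an earlier instruction in the same packet. The model below uses a made-up packet of "Rdst = Rsrc" moves. A register swap is the classic case where skipping the end-of-packet commit gives the wrong answer.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define MODEL_NUM_GPRS 32

/* One "Rdst = Rsrc" instruction in a packet (hypothetical toy ISA). */
typedef struct {
    int dst, src;
} Move;

/* Commit is needed if an instruction reads a GPR written earlier in the packet. */
bool packet_needs_commit(const Move *pkt, int n)
{
    for (int i = 0; i < n; i++) {
        for (int j = 0; j < i; j++) {
            if (pkt[i].src == pkt[j].dst) {
                return true;
            }
        }
    }
    return false;
}

void run_packet(uint32_t *gpr, const Move *pkt, int n, bool need_commit)
{
    uint32_t new_value[MODEL_NUM_GPRS] = { 0 };

    for (int i = 0; i < n; i++) {
        assert(pkt[i].dst >= 0 && pkt[i].dst < MODEL_NUM_GPRS);
        assert(pkt[i].src >= 0 && pkt[i].src < MODEL_NUM_GPRS);
        uint32_t v = gpr[pkt[i].src];
        if (need_commit) {
            new_value[pkt[i].dst] = v;   /* buffer until end of packet */
        } else {
            gpr[pkt[i].dst] = v;         /* short-circuit: write directly */
        }
    }
    if (need_commit) {
        for (int i = 0; i < n; i++) {    /* end-of-packet commit */
            gpr[pkt[i].dst] = new_value[pkt[i].dst];
        }
    }
}

void short_circuit_selfcheck(void)
{
    const Move swap[2] = { { 0, 1 }, { 1, 0 } };   /* { R0 = R1; R1 = R0 } */
    uint32_t a[MODEL_NUM_GPRS] = { 10, 20 };
    uint32_t b[MODEL_NUM_GPRS] = { 10, 20 };

    assert(packet_needs_commit(swap, 2));
    run_packet(a, swap, 2, true);
    assert(a[0] == 20 && a[1] == 10);    /* correct packet semantics */
    run_packet(b, swap, 2, false);
    assert(b[0] == 20 && b[1] == 20);    /* wrong: what skipping the commit does */
}
```

The same hazard exists inside a single instruction whose generated code writes its result before reading all of its inputs, which is why such instructions are excluded from idef-parser when short-circuiting.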
[PULL v2 34/44] Hexagon: list available CPUs with `-cpu help`

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

Currently, qemu-hexagon only models the v67 CPU. Nonetheless, if we try
to get this information with `-cpu help`, QEMU just exits with an error
code and no output. Let's correct that.

The code is basically a copy from target/alpha/cpu.h, but we strip the
"-hexagon-cpu" suffix before printing. This is to avoid confusing
situations like the following:

$ qemu-hexagon -cpu help

Available CPUs:
  v67-hexagon-cpu

$ qemu-hexagon -cpu v67-hexagon-cpu ./prog

qemu-hexagon: unable to find CPU model 'v67-hexagon-cpu'

Signed-off-by: Matheus Tavares Bernardino 
Signed-off-by: Taylor Simpson 
Tested-by: Taylor Simpson 
Reviewed-by: Taylor Simpson 
Message-Id: 

---
 target/hexagon/cpu.h |  3 +++
 target/hexagon/cpu.c | 20 
 2 files changed, 23 insertions(+)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index d095dc6647..bfcb1057dd 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -48,6 +48,9 @@
 #define TYPE_HEXAGON_CPU_V71 HEXAGON_CPU_TYPE_NAME("v71")
 #define TYPE_HEXAGON_CPU_V73 HEXAGON_CPU_TYPE_NAME("v73")
 
+void hexagon_cpu_list(void);
+#define cpu_list hexagon_cpu_list
+
 #define MMU_USER_IDX 0
 
 typedef struct {
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index d4dfc382ab..7e127059c7 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -31,6 +31,26 @@ static void hexagon_v69_cpu_init(Object *obj) { }
 static void hexagon_v71_cpu_init(Object *obj) { }
 static void hexagon_v73_cpu_init(Object *obj) { }
 
+static void hexagon_cpu_list_entry(gpointer data, gpointer user_data)
+{
+ObjectClass *oc = data;
+char *name = g_strdup(object_class_get_name(oc));
+if (g_str_has_suffix(name, HEXAGON_CPU_TYPE_SUFFIX)) {
+name[strlen(name) - strlen(HEXAGON_CPU_TYPE_SUFFIX)] = '\0';
+}
+qemu_printf("  %s\n", name);
+g_free(name);
+}
+
+void hexagon_cpu_list(void)
+{
+GSList *list;
+list = object_class_get_list_sorted(TYPE_HEXAGON_CPU, false);
+qemu_printf("Available CPUs:\n");
+g_slist_foreach(list, hexagon_cpu_list_entry, NULL);
+g_slist_free(list);
+}
+
 static ObjectClass *hexagon_cpu_class_by_name(const char *cpu_model)
 {
 ObjectClass *oc;
-- 
2.25.1
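The suffix stripping in hexagon_cpu_list_entry() can be sketched without glib as follows. This is an illustration, not the patch's code. The suffix string comes from the commit message and stands in for HEXAGON_CPU_TYPE_SUFFIX. The cpu_model_name() and print_cpu_list() names are hypothetical, and malloc/strcmp replace g_strdup()/g_str_has_suffix().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Suffix taken from the commit message; stands in for HEXAGON_CPU_TYPE_SUFFIX. */
#define MODEL_CPU_TYPE_SUFFIX "-hexagon-cpu"

/* Return a malloc'd copy of type_name with the suffix stripped (caller frees). */
char *cpu_model_name(const char *type_name)
{
    assert(type_name != NULL);
    size_t len = strlen(type_name);
    size_t slen = strlen(MODEL_CPU_TYPE_SUFFIX);
    char *name = malloc(len + 1);

    if (name == NULL) {
        return NULL;
    }
    memcpy(name, type_name, len + 1);
    if (len >= slen && strcmp(name + len - slen, MODEL_CPU_TYPE_SUFFIX) == 0) {
        name[len - slen] = '\0';   /* same truncation trick as the patch */
    }
    return name;
}

void print_cpu_list(const char *const *types, size_t n)
{
    printf("Available CPUs:\n");
    for (size_t i = 0; i < n; i++) {
        char *name = cpu_model_name(types[i]);
        if (name != NULL) {
            printf("  %s\n", name);
            free(name);
        }
    }
}
```

Called with { "v67-hexagon-cpu", "v73-hexagon-cpu" }, print_cpu_list() prints "Available CPUs:" followed by "  v67" and "  v73". These are the names that `-cpu` accepts.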

[PULL v2 12/44] Hexagon (target/hexagon) Add overrides for loop setup instructions

2023-05-18 Thread Taylor Simpson

These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

Signed-off-by: Taylor Simpson 
Acked-by: Richard Henderson 
Message-Id: <20230427230012.3800327-4-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h | 21 +++
 target/hexagon/genptr.c  | 44 
 2 files changed, 65 insertions(+)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index d4bd38810e..8d5e9826a0 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -665,6 +665,27 @@
 #define fGEN_TCG_J2_callrf(SHORTCODE) \
 gen_cond_callr(ctx, TCG_COND_NE, PuV, RsV)
 
+#define fGEN_TCG_J2_loop0r(SHORTCODE) \
+gen_loop0r(ctx, RsV, riV)
+#define fGEN_TCG_J2_loop1r(SHORTCODE) \
+gen_loop1r(ctx, RsV, riV)
+#define fGEN_TCG_J2_loop0i(SHORTCODE) \
+gen_loop0i(ctx, UiV, riV)
+#define fGEN_TCG_J2_loop1i(SHORTCODE) \
+gen_loop1i(ctx, UiV, riV)
+#define fGEN_TCG_J2_ploop1sr(SHORTCODE) \
+gen_ploopNsr(ctx, 1, RsV, riV)
+#define fGEN_TCG_J2_ploop1si(SHORTCODE) \
+gen_ploopNsi(ctx, 1, UiV, riV)
+#define fGEN_TCG_J2_ploop2sr(SHORTCODE) \
+gen_ploopNsr(ctx, 2, RsV, riV)
+#define fGEN_TCG_J2_ploop2si(SHORTCODE) \
+gen_ploopNsi(ctx, 2, UiV, riV)
+#define fGEN_TCG_J2_ploop3sr(SHORTCODE) \
+gen_ploopNsr(ctx, 3, RsV, riV)
+#define fGEN_TCG_J2_ploop3si(SHORTCODE) \
+gen_ploopNsi(ctx, 3, UiV, riV)
+
 #define fGEN_TCG_J2_endloop0(SHORTCODE) \
 gen_endloop0(ctx)
 #define fGEN_TCG_J2_endloop1(SHORTCODE) \
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index dd707a9dc7..6e5767ec5e 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -518,6 +518,50 @@ static void gen_compare(TCGCond cond, TCGv res, TCGv arg1, TCGv arg2)
 tcg_gen_movcond_tl(cond, res, arg1, arg2, one, zero);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+static inline void gen_loop0r(DisasContext *ctx, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC0, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA0, tcg_constant_tl(ctx->pkt->pc + riV));
+gen_set_usr_fieldi(ctx, USR_LPCFG, 0);
+}
+
+static void gen_loop0i(DisasContext *ctx, int count, int riV)
+{
+gen_loop0r(ctx, tcg_constant_tl(count), riV);
+}
+
+static inline void gen_loop1r(DisasContext *ctx, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC1, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA1, tcg_constant_tl(ctx->pkt->pc + riV));
+}
+
+static void gen_loop1i(DisasContext *ctx, int count, int riV)
+{
+gen_loop1r(ctx, tcg_constant_tl(count), riV);
+}
+
+static void gen_ploopNsr(DisasContext *ctx, int N, TCGv RsV, int riV)
+{
+fIMMEXT(riV);
+fPCALIGN(riV);
+gen_log_reg_write(ctx, HEX_REG_LC0, RsV);
+gen_log_reg_write(ctx, HEX_REG_SA0, tcg_constant_tl(ctx->pkt->pc + riV));
+gen_set_usr_fieldi(ctx, USR_LPCFG, N);
+gen_log_pred_write(ctx, 3, tcg_constant_tl(0));
+}
+
+static void gen_ploopNsi(DisasContext *ctx, int N, int count, int riV)
+{
+gen_ploopNsr(ctx, N, tcg_constant_tl(count), riV);
+}
+#endif
+
 static void gen_cond_jumpr(DisasContext *ctx, TCGv dst_pc,
TCGCond cond, TCGv pred)
 {
-- 
2.25.1
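The register effects of the loop-setup overrides above can be summarized with a plain-C model on a hypothetical toy state (LoopModel and the model_* names are illustrative). It assumes the offset has already gone through the constant extension and alignment that fIMMEXT/fPCALIGN perform, and models only what the TCG writes. loop0 sets LC0, SA0 and clears USR.LPCFG. loop1 sets only LC1 and SA1. The ploopN variants do the loop0 setup, then set LPCFG to N and clear P3.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical toy loop-register state. */
typedef struct {
    uint32_t lc0, sa0, lc1, sa1;
    uint32_t usr_lpcfg;   /* USR.LPCFG field */
    uint32_t p3;
} LoopModel;

/* `offset` is assumed to be already extended and aligned (fIMMEXT/fPCALIGN). */
void model_loop0(LoopModel *m, uint32_t pkt_pc, int32_t offset, uint32_t count)
{
    m->lc0 = count;
    m->sa0 = pkt_pc + (uint32_t)offset;   /* start address relative to packet PC */
    m->usr_lpcfg = 0;
}

void model_loop1(LoopModel *m, uint32_t pkt_pc, int32_t offset, uint32_t count)
{
    m->lc1 = count;                        /* loop1 leaves USR.LPCFG alone */
    m->sa1 = pkt_pc + (uint32_t)offset;
}

/* gen_ploopNsr(): loop0 setup, then LPCFG = N and P3 cleared */
void model_ploopN(LoopModel *m, int n, uint32_t pkt_pc, int32_t offset, uint32_t count)
{
    assert(n >= 1 && n <= 3);
    model_loop0(m, pkt_pc, offset, count);
    m->usr_lpcfg = (uint32_t)n;
    m->p3 = 0;
}
```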

[PULL v2 01/44] Hexagon (target/hexagon) Add support for v68/v69/v71/v73

2023-05-18 Thread Taylor Simpson

Add support for the ELF flags
Move target/hexagon/cpu.[ch] to be v73
Change the compiler flag used by "make check-tcg"

The decbin instruction is removed in Hexagon v73, so check the
version before trying to compile the instruction.

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-2-tsimp...@quicinc.com>
---
 configure |  2 +-
 linux-user/hexagon/target_elf.h   | 13 +
 target/hexagon/cpu.h  |  4 
 target/hexagon/cpu.c  | 14 ++
 tests/tcg/hexagon/misc.c  | 12 
 target/hexagon/README |  8 
 tests/tcg/hexagon/Makefile.target |  3 +++
 7 files changed, 43 insertions(+), 13 deletions(-)

diff --git a/configure b/configure
index 243e2e0a0d..0c3f7ba62f 100755
--- a/configure
+++ b/configure
@@ -1858,7 +1858,7 @@ fi
 : ${cross_cc_armeb="$cross_cc_arm"}
 : ${cross_cc_cflags_armeb="-mbig-endian"}
 : ${cross_cc_hexagon="hexagon-unknown-linux-musl-clang"}
-: ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}
+: ${cross_cc_cflags_hexagon="-mv73 -O2 -static"}
 : ${cross_cc_cflags_i386="-m32"}
 : ${cross_cc_cflags_ppc="-m32 -mbig-endian"}
 : ${cross_cc_cflags_ppc64="-m64 -mbig-endian"}
diff --git a/linux-user/hexagon/target_elf.h b/linux-user/hexagon/target_elf.h
index b4e9f40527..a0271a0a2a 100644
--- a/linux-user/hexagon/target_elf.h
+++ b/linux-user/hexagon/target_elf.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -20,7 +20,7 @@
 
 static inline const char *cpu_get_model(uint32_t eflags)
 {
-/* For now, treat anything newer than v5 as a v67 */
+/* For now, treat anything newer than v5 as a v73 */
 /* FIXME - Disable instructions that are newer than the specified arch */
 if (eflags == 0x04 ||/* v5  */
 eflags == 0x05 ||/* v55 */
@@ -30,9 +30,14 @@ static inline const char *cpu_get_model(uint32_t eflags)
 eflags == 0x65 ||/* v65 */
 eflags == 0x66 ||/* v66 */
 eflags == 0x67 ||/* v67 */
-eflags == 0x8067 /* v67t */
+eflags == 0x8067 ||  /* v67t */
+eflags == 0x68 ||/* v68 */
+eflags == 0x69 ||/* v69 */
+eflags == 0x71 ||/* v71 */
+eflags == 0x8071 ||  /* v71t */
+eflags == 0x73   /* v73 */
) {
-return "v67";
+return "v73";
 }
 return "unknown";
 }
diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 81b663ecfb..4d8981d862 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -43,6 +43,10 @@
 #define CPU_RESOLVING_TYPE TYPE_HEXAGON_CPU
 
 #define TYPE_HEXAGON_CPU_V67 HEXAGON_CPU_TYPE_NAME("v67")
+#define TYPE_HEXAGON_CPU_V68 HEXAGON_CPU_TYPE_NAME("v68")
+#define TYPE_HEXAGON_CPU_V69 HEXAGON_CPU_TYPE_NAME("v69")
+#define TYPE_HEXAGON_CPU_V71 HEXAGON_CPU_TYPE_NAME("v71")
+#define TYPE_HEXAGON_CPU_V73 HEXAGON_CPU_TYPE_NAME("v73")
 
 #define MMU_USER_IDX 0
 
diff --git a/target/hexagon/cpu.c b/target/hexagon/cpu.c
index ab40cfc283..c78fe25c9f 100644
--- a/target/hexagon/cpu.c
+++ b/target/hexagon/cpu.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -25,9 +25,11 @@
 #include "fpu/softfloat-helpers.h"
 #include "tcg/tcg.h"
 
-static void hexagon_v67_cpu_init(Object *obj)
-{
-}
+static void hexagon_v67_cpu_init(Object *obj) { }
+static void hexagon_v68_cpu_init(Object *obj) { }
+static void hexagon_v69_cpu_init(Object *obj) { }
+static void hexagon_v71_cpu_init(Object *obj) { }
+static void hexagon_v73_cpu_init(Object *obj) { }
 
 static ObjectClass *hexagon_cpu_class_by_name(const char *cpu_model)
 {
@@ -382,6 +384,10 @@ static const TypeInfo hexagon_cpu_type_infos[] = {
 .class_init = hexagon_cpu_class_init,
 },
 DEFINE_CPU(TYPE_HEXAGON_CPU_V67,  hexagon_v67_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V68,  hexagon_v68_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V69,  hexagon_v69_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V71,  hexagon_v71_cpu_init),
+DEFINE_CPU(TYPE_HEXAGON_CPU_V73,  hexagon_v73_cpu_init),
 };
 
 DEFINE_TYPES(hexagon_cpu_type_infos)
diff --git a/tests/tcg/hexagon/misc.c b/tests/tcg/hexagon/misc.c
index e126751e3a..4fcbb22795 100644
--- a/tests/tcg/hexagon/misc.c
+++ b/tests/tcg/hexagon/misc.c
@@ -18,6 +18,8 @@
 #include 
 #include 
 
+#define CORE_HAS_CABAC (__HEXAGON_ARCH__ <= 71)
+

[PULL v2 39/44] gdbstub: only send stop-reply packets when allowed to

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

GDB's remote serial protocol allows stop-reply messages to be sent by
the stub either as a notification packet or as a reply to a GDB command
(provided that the command accepts such a response). QEMU currently does not
implement notification packets, so it should only send stop-replies
synchronously and when requested. Nevertheless, it still issues
unsolicited stop messages through gdb_vm_state_change().

Although this behavior doesn't seem to cause problems with GDB itself
(the messages are just ignored), it can impact other debuggers that
implement the GDB remote serial protocol, like hexagon-lldb. Let's
change the gdbstub to send stop messages only as a response to a
previous GDB command that accepts such a reply.
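The gating can be modeled outside QEMU. This is a simplified sketch, not the gdbstub code. `CmdEntry`, `StubState`, `dispatch()` and `send_stop_reply()` are hypothetical names, and the reply leaves out the `thread:` part of the real `T` packet. The dispatcher copies the command's `allow_stop_reply` flag into the stub state, and sending a stop reply consumes that permission. This mirrors what `process_string_cmd()` and `handle_target_halt()` do in the diff below.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical command-table entry: may this command get a stop reply? */
typedef struct {
    const char *cmd;
    bool allow_stop_reply;
} CmdEntry;

/* Hypothetical stub state. */
typedef struct {
    bool allow_stop_reply;
    char last_packet[32];   /* last packet "sent" to the debugger */
} StubState;

/* Before running a handler, record whether it may answer with a stop reply. */
static void dispatch(StubState *s, const CmdEntry *cmd)
{
    s->allow_stop_reply = cmd->allow_stop_reply;
    /* ... the command handler would run here ... */
}

/* Returns true only if a stop reply was actually sent. */
static bool send_stop_reply(StubState *s, int signal)
{
    if (!s->allow_stop_reply) {
        return false;   /* unsolicited: drop it rather than confuse the client */
    }
    snprintf(s->last_packet, sizeof(s->last_packet), "T%02x", signal);
    s->allow_stop_reply = false;   /* at most one stop reply per command */
    return true;
}
```

With this design a stop event that arrives outside a reply-accepting command, such as one coming from a VM state change, is dropped silently. GDB ignored such packets anyway, while other clients such as hexagon-lldb no longer receive them.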

Signed-off-by: Matheus Tavares Bernardino 
Acked-by: Alex Bennée 
Signed-off-by: Taylor Simpson 
Message-Id: 

---
 gdbstub/internals.h |  5 +
 gdbstub/gdbstub.c   | 37 -
 gdbstub/softmmu.c   | 13 +++--
 gdbstub/user.c  | 24 
 4 files changed, 60 insertions(+), 19 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 94ddff4495..33d21d6488 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -65,6 +65,11 @@ typedef struct GDBState {
 GByteArray *mem_buf;
 int sstep_flags;
 int supported_sstep_flags;
+/*
+ * Whether we are allowed to send a stop reply packet at this moment.
+ * Must be set off after sending the stop reply itself.
+ */
+bool allow_stop_reply;
 } GDBState;
 
 /* lives in main gdbstub.c */
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 0760d78685..be18568d0a 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -777,6 +777,10 @@ typedef void (*GdbCmdHandler)(GArray *params, void *user_ctx);
 /*
  * cmd_startswith -> cmd is compared using startswith
  *
+ * allow_stop_reply -> true iff the gdbstub can respond to this command with a
+ *   "stop reply" packet. The list of commands that accept such response is
+ *   defined at the GDB Remote Serial Protocol documentation. see:
+ *   https://sourceware.org/gdb/onlinedocs/gdb/Stop-Reply-Packets.html#Stop-Reply-Packets.
  *
  * schema definitions:
  * Each schema parameter entry consists of 2 chars,
@@ -802,6 +806,7 @@ typedef struct GdbCmdParseEntry {
 const char *cmd;
 bool cmd_startswith;
 const char *schema;
+bool allow_stop_reply;
 } GdbCmdParseEntry;
 
 static inline int startswith(const char *string, const char *pattern)
@@ -835,6 +840,7 @@ static int process_string_cmd(void *user_ctx, const char *data,
 }
 }
 
+gdbserver_state.allow_stop_reply = cmd->allow_stop_reply;
 cmd->handler(params, user_ctx);
 return 0;
 }
@@ -1283,11 +1289,14 @@ static void handle_v_attach(GArray *params, void *user_ctx)
 gdbserver_state.g_cpu = cpu;
 gdbserver_state.c_cpu = cpu;
 
-g_string_printf(gdbserver_state.str_buf, "T%02xthread:", GDB_SIGNAL_TRAP);
-gdb_append_thread_id(cpu, gdbserver_state.str_buf);
-g_string_append_c(gdbserver_state.str_buf, ';');
+if (gdbserver_state.allow_stop_reply) {
+g_string_printf(gdbserver_state.str_buf, "T%02xthread:", GDB_SIGNAL_TRAP);
+gdb_append_thread_id(cpu, gdbserver_state.str_buf);
+g_string_append_c(gdbserver_state.str_buf, ';');
+gdbserver_state.allow_stop_reply = false;
 cleanup:
-gdb_put_strbuf();
+gdb_put_strbuf();
+}
 }
 
 static void handle_v_kill(GArray *params, void *user_ctx)
@@ -1310,12 +1319,14 @@ static const GdbCmdParseEntry gdb_v_commands_table[] = {
 .handler = handle_v_cont,
 .cmd = "Cont",
 .cmd_startswith = 1,
+.allow_stop_reply = true,
 .schema = "s0"
 },
 {
 .handler = handle_v_attach,
 .cmd = "Attach;",
 .cmd_startswith = 1,
+.allow_stop_reply = true,
 .schema = "l0"
 },
 {
@@ -1698,10 +1709,13 @@ static void handle_gen_set(GArray *params, void *user_ctx)
 
 static void handle_target_halt(GArray *params, void *user_ctx)
 {
-g_string_printf(gdbserver_state.str_buf, "T%02xthread:", GDB_SIGNAL_TRAP);
-gdb_append_thread_id(gdbserver_state.c_cpu, gdbserver_state.str_buf);
-g_string_append_c(gdbserver_state.str_buf, ';');
-gdb_put_strbuf();
+if (gdbserver_state.allow_stop_reply) {
+g_string_printf(gdbserver_state.str_buf, "T%02xthread:", GDB_SIGNAL_TRAP);
+gdb_append_thread_id(gdbserver_state.c_cpu, gdbserver_state.str_buf);
+g_string_append_c(gdbserver_state.str_buf, ';');
+gdb_put_strbuf();
+gdbserver_state.allow_stop_reply = false;
+}
 /*
  * Remove all the breakpoints when this query is issued,
  * because gdb is doing an initial connect and the state
@@ -1725,7 +1739,8 @@ static int gdb_handle_packet(const char *line_buf)
 static const GdbCmdParseEntry target_halted_cmd_desc = {

[PULL v2 37/44] Hexagon (decode): look for pkts with multiple insns at the same slot

2023-05-18 Thread Taylor Simpson

From: Matheus Tavares Bernardino 

Each slot in a packet can be assigned to at most one instruction.
Although the assembler generally ought to enforce this rule, we had better
be safe than sorry and also check the assignment, so that an "invalid
packet" exception is properly raised on wrong slot assignments.

This should also make it easier to debug possible future errors caused
by missing updates to `find_iclass_slots()` rules in
target/hexagon/iclass.c.
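The check added below amounts to duplicate detection over a bitmask. The standalone sketch that follows shows the idea. `MiniInsn` and `slots_are_unique()` are hypothetical stand-ins for QEMU's `Insn` and `has_valid_slot_assignment()`. Each slot becomes a one-bit mask, and a bit that is already set means two instructions share a slot. Endloop pseudo-instructions are skipped because they overload slot 0.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for QEMU's Insn. */
typedef struct {
    int slot;         /* Hexagon packets have slots 0..3 */
    bool is_endloop;  /* endloop overloads slot 0, so it is not counted */
} MiniInsn;

static bool slots_are_unique(const MiniInsn *insns, int n)
{
    unsigned used = 0;
    for (int i = 0; i < n; i++) {
        if (insns[i].is_endloop) {
            continue;
        }
        unsigned mask = 1u << insns[i].slot;
        if (used & mask) {
            return false;   /* second instruction in the same slot */
        }
        used |= mask;
    }
    return true;
}
```

A single integer is enough for the set because a packet has only four slots, so the check costs one AND and one OR per instruction.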

Co-authored-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Signed-off-by: Matheus Tavares Bernardino 
Reviewed-by: Taylor Simpson 
Tested-by: Taylor Simpson 
Message-Id: 

---
 target/hexagon/decode.c   | 30 +++---
 tests/tcg/hexagon/invalid-slots.c | 29 +
 tests/tcg/hexagon/Makefile.target |  7 +++
 3 files changed, 63 insertions(+), 3 deletions(-)
 create mode 100644 tests/tcg/hexagon/invalid-slots.c

diff --git a/target/hexagon/decode.c b/target/hexagon/decode.c
index 041c8de751..946c55cc71 100644
--- a/target/hexagon/decode.c
+++ b/target/hexagon/decode.c
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -797,7 +797,26 @@ static bool decode_parsebits_is_loopend(uint32_t encoding32)
 return bits == 0x2;
 }
 
-static void
+static bool has_valid_slot_assignment(Packet *pkt)
+{
+int used_slots = 0;
+for (int i = 0; i < pkt->num_insns; i++) {
+int slot_mask;
+Insn *insn = &pkt->insn[i];
+if (decode_opcode_ends_loop(insn->opcode)) {
+/* We overload slot 0 for endloop. */
+continue;
+}
+slot_mask = 1 << insn->slot;
+if (used_slots & slot_mask) {
+return false;
+}
+used_slots |= slot_mask;
+}
+return true;
+}
+
+static bool
 decode_set_slot_number(Packet *pkt)
 {
 int slot;
@@ -886,6 +905,8 @@ decode_set_slot_number(Packet *pkt)
 /* Then push it to slot0 */
 pkt->insn[slot1_iidx].slot = 0;
 }
+
+return has_valid_slot_assignment(pkt);
 }
 
 /*
@@ -961,8 +982,11 @@ int decode_packet(int max_words, const uint32_t *words, Packet *pkt,
 decode_apply_extenders(pkt);
 if (!disas_only) {
 decode_remove_extenders(pkt);
+if (!decode_set_slot_number(pkt)) {
+/* Invalid packet */
+return 0;
+}
 }
-decode_set_slot_number(pkt);
 decode_fill_newvalue_regno(pkt);
 
 if (pkt->pkt_has_hvx) {
diff --git a/tests/tcg/hexagon/invalid-slots.c b/tests/tcg/hexagon/invalid-slots.c
new file mode 100644
index 00..366ce4f42f
--- /dev/null
+++ b/tests/tcg/hexagon/invalid-slots.c
@@ -0,0 +1,29 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+char mem[8] __attribute__((aligned(8)));
+
+int main()
+{
+asm volatile(
+"r0 = #mem\n"
+/* Invalid packet (2 instructions at slot 0): */
+".word 0xa1804100\n" /* { memw(r0) = r1;  */
+".word 0x28032804\n" /*   r3 = #0; r4 = #0 }  */
+: : : "r0", "r3", "r4", "memory");
+return 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 6109a7ed10..890cceed5d 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -50,6 +50,13 @@ HEX_TESTS += vector_add_int
 HEX_TESTS += scatter_gather
 HEX_TESTS += hvx_misc
 HEX_TESTS += hvx_histogram
+HEX_TESTS += invalid-slots
+
+run-and-check-exception = $(call run-test,$2,$3 2>$2.stderr; \
+   test $$? -eq 1 && grep -q "exception $(strip $1)" $2.stderr)
+
+run-invalid-slots: invalid-slots
+   $(call run-and-check-exception, 0x15, $@, $(QEMU) $(QEMU_OPTS) $<)
 
 HEX_TESTS += test_abs
 HEX_TESTS += test_bitcnt
-- 
2.25.1

[PULL v2 29/44] Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext

2023-05-18 Thread Taylor Simpson

The pkt_has_store_s1 field is only used for bookkeeping helpers with
a load.  With recent changes that eliminate the need to free TCGv
variables, it makes more sense to make this transient.

These helpers already take the instruction slot as an argument.  We
combine the slot and pkt_has_store_s1 into a single argument called
slotval.
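The commit does not show how the slot and the flag are packed, because the body of `gen_slotval()` is cut off in this archive. The sketch below therefore assumes one plausible encoding: bit 0 carries `pkt_has_store_s1` and the remaining bits carry the slot. `pack_slotval()`, `slotval_slot()` and `slotval_has_store_s1()` are hypothetical helpers that show how a single helper argument can carry both values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* ASSUMED layout: bit 0 = pkt_has_store_s1, bits 1.. = slot. */
static uint32_t pack_slotval(uint32_t slot, bool pkt_has_store_s1)
{
    return (slot << 1) | (pkt_has_store_s1 ? 1u : 0u);
}

/* Recover the instruction slot from a packed slotval. */
static uint32_t slotval_slot(uint32_t slotval)
{
    return slotval >> 1;
}

/* Recover the "packet has a store in slot 1" flag. */
static bool slotval_has_store_s1(uint32_t slotval)
{
    return slotval & 1u;
}
```

Packing keeps the helper signatures to one extra integer, and the flag is computed at translation time instead of being stored in CPUHexagonState.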

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-21-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h|  1 -
 target/hexagon/macros.h | 16 
 target/hexagon/op_helper.h  | 12 
 target/hexagon/translate.h  |  1 -
 target/hexagon/genptr.c |  8 
 target/hexagon/op_helper.c  | 26 +++---
 target/hexagon/translate.c  |  7 ---
 target/hexagon/gen_analyze_funcs.py |  2 --
 target/hexagon/gen_helper_funcs.py  |  7 ++-
 target/hexagon/gen_tcg_funcs.py |  4 ++--
 target/hexagon/hex_common.py|  7 ---
 11 files changed, 51 insertions(+), 40 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 7673f9f32d..87e457dda9 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -99,7 +99,6 @@ typedef struct CPUArchState {
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
 MemLog mem_log_stores[STORES_MAX];
-target_ulong pkt_has_store_s1;
 target_ulong dczero_addr;
 
 float_status fp_status;
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 828874f318..5308c0848e 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -173,14 +173,14 @@
 #define MEM_STORE8(VA, DATA, SLOT) \
 MEM_STORE8_FUNC(DATA)(cpu_env, VA, DATA, SLOT)
 #else
-#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, slot, VA))
-#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, slot, VA))
-#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, slot, VA))
-#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, slot, VA))
-#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, slot, VA))
-#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, slot, VA))
-#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, slot, VA))
-#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, slot, VA))
+#define MEM_LOAD1s(VA) ((int8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD1u(VA) ((uint8_t)mem_load1(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD2s(VA) ((int16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD2u(VA) ((uint16_t)mem_load2(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD4s(VA) ((int32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD4u(VA) ((uint32_t)mem_load4(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD8s(VA) ((int64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
+#define MEM_LOAD8u(VA) ((uint64_t)mem_load8(env, pkt_has_store_s1, slot, VA))
 
 #define MEM_STORE1(VA, DATA, SLOT) log_store32(env, VA, DATA, 1, SLOT)
 #define MEM_STORE2(VA, DATA, SLOT) log_store32(env, VA, DATA, 2, SLOT)
diff --git a/target/hexagon/op_helper.h b/target/hexagon/op_helper.h
index 6bd4b07849..8f3764d15e 100644
--- a/target/hexagon/op_helper.h
+++ b/target/hexagon/op_helper.h
@@ -19,10 +19,14 @@
 #define HEXAGON_OP_HELPER_H
 
 /* Misc functions */
-uint8_t mem_load1(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint16_t mem_load2(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint32_t mem_load4(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
-uint64_t mem_load8(CPUHexagonState *env, uint32_t slot, target_ulong vaddr);
+uint8_t mem_load1(CPUHexagonState *env, bool pkt_has_store_s1,
+  uint32_t slot, target_ulong vaddr);
+uint16_t mem_load2(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
+uint32_t mem_load4(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
+uint64_t mem_load8(CPUHexagonState *env, bool pkt_has_store_s1,
+   uint32_t slot, target_ulong vaddr);
 
 void log_store64(CPUHexagonState *env, target_ulong addr,
  int64_t val, int width, int slot);
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index a9f1ccee24..9697b4de0e 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -66,7 +66,6 @@ typedef struct DisasContext {
 TCGCond branch_cond;
 target_ulong branch_dest;
 bool is_tight_loop;
-bool need_pkt_has_store_s1;
 bool short_circuit;
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 785778759e..361cc789d7 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -398,6 +398,14 @@ static inline void gen_store_conditional8(DisasContext *ctx,
 tcg_gen_movi_tl(hex_llsc_addr, ~0);
 }
 
+#ifndef CONFIG_HEXAGON_IDEF_PARSER
+static TCGv gen_slotval(DisasContext *ctx)
+{
+int

[PULL v2 07/44] Hexagon (tests/tcg/hexagon) Add v69 HVX tests

2023-05-18 Thread Taylor Simpson

The following instructions are tested
V6_vasrvuhubrndsat
V6_vasrvuhubsat
V6_vasrvwuhrndsat
V6_vasrvwuhsat
V6_vassign_tmp
V6_vcombine_tmp
V6_vmpyuhvs
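The tests compare each vector result against a scalar reference model. For `V6_vasrvuhubrndsat`, the reference built from the test's `fVROUND` and `fVSATUB` macros works as follows for one lane: mask the shift to 3 bits, add half a step for rounding, shift right, and clamp to an unsigned byte. The function name `vasr_uh_to_ub_rnd_sat()` is hypothetical. The sketch covers only this single-lane computation.

```c
#include <assert.h>
#include <stdint.h>

/* One lane of v5.ub = vasr(v5:4.uh, v6.ub):rnd:sat (reference model). */
static uint8_t vasr_uh_to_ub_rnd_sat(uint16_t val, uint8_t shift_byte)
{
    int shamt = shift_byte & 0x7;   /* the test masks the shift to 3 bits */
    /* fVROUND: add 2^(shamt-1) before shifting (nothing when shamt == 0) */
    int64_t rounded = val + (shamt > 0 ? (1LL << (shamt - 1)) : 0);
    int64_t shifted = rounded >> shamt;
    /* fVSATUB: the input is unsigned, so only the upper clamp can apply */
    return shifted > 0xff ? 0xff : (uint8_t)shifted;
}
```

For example, 0x1ff shifted by 1 rounds to 0x200 >> 1 = 0x100 and saturates to 0xff. 3 shifted by 1 rounds up to 2.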

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-8-tsimp...@quicinc.com>
---
 tests/tcg/hexagon/v69_hvx.c   | 318 ++
 tests/tcg/hexagon/Makefile.target |   3 +
 2 files changed, 321 insertions(+)
 create mode 100644 tests/tcg/hexagon/v69_hvx.c

diff --git a/tests/tcg/hexagon/v69_hvx.c b/tests/tcg/hexagon/v69_hvx.c
new file mode 100644
index 00..a0d567d142
--- /dev/null
+++ b/tests/tcg/hexagon/v69_hvx.c
@@ -0,0 +1,318 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int err;
+
+#include "hvx_misc.h"
+
+#define fVROUND(VAL, SHAMT) \
+((VAL) + (((SHAMT) > 0) ? (1LL << ((SHAMT) - 1)) : 0))
+
+#define fVSATUB(VAL) \
+((((VAL) & 0xffLL) == (VAL)) ? \
+(VAL) : \
+((((int32_t)(VAL)) < 0) ? 0 : 0xff))
+
+#define fVSATUH(VAL) \
+((((VAL) & 0xffffLL) == (VAL)) ? \
+(VAL) : \
+((((int32_t)(VAL)) < 0) ? 0 : 0xffff))
+
+static void test_vasrvuhubrndsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.ub = vasr(v5:4.uh, v6.ub):rnd:sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 2; j++) {
+int shamt;
+uint8_t byte0;
+uint8_t byte1;
+
+shamt = buffer1[i].ub[2 * j + 0] & 0x7;
+byte0 = fVSATUB(fVROUND(buffer0[2 * i + 0].uh[j], shamt) >> shamt);
+shamt = buffer1[i].ub[2 * j + 1] & 0x7;
+byte1 = fVSATUB(fVROUND(buffer0[2 * i + 1].uh[j], shamt) >> shamt);
+expect[i].uh[j] = (byte1 << 8) | (byte0 & 0xff);
+}
+}
+
+check_output_h(__LINE__, BUFSIZE / 2);
+}
+
+static void test_vasrvuhubsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.ub = vasr(v5:4.uh, v6.ub):sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 2; j++) {
+int shamt;
+uint8_t byte0;
+uint8_t byte1;
+
+shamt = buffer1[i].ub[2 * j + 0] & 0x7;
+byte0 = fVSATUB(buffer0[2 * i + 0].uh[j] >> shamt);
+shamt = buffer1[i].ub[2 * j + 1] & 0x7;
+byte1 = fVSATUB(buffer0[2 * i + 1].uh[j] >> shamt);
+expect[i].uh[j] = (byte1 << 8) | (byte0 & 0xff);
+}
+}
+
+check_output_h(__LINE__, BUFSIZE / 2);
+}
+
+static void test_vasrvwuhrndsat(void)
+{
+void *p0 = buffer0;
+void *p1 = buffer1;
+void *pout = output;
+
+memset(expect, 0xaa, sizeof(expect));
+memset(output, 0xbb, sizeof(output));
+
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm("v4 = vmem(%0 + #0)\n\t"
+"v5 = vmem(%0 + #1)\n\t"
+"v6 = vmem(%1 + #0)\n\t"
+"v5.uh = vasr(v5:4.w, v6.uh):rnd:sat\n\t"
+"vmem(%2) = v5\n\t"
+: : "r"(p0), "r"(p1), "r"(pout)
+: "v4", "v5", "v6", "memory");
+p0 += sizeof(MMVector) * 2;
+p1 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+int shamt;
+

[PULL v2 18/44] Hexagon (target/hexagon) Don't overlap dest writes with source reads

2023-05-18 Thread Taylor Simpson

When generating TCG, make sure we have read all the operand registers
before writing to the destination registers.

This is a prerequisite for short-circuiting where the source and dest
operands could be the same.
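The hazard is easier to see in plain C than in TCG. In the sketch below, the hypothetical functions `satu8_ovfl_buggy()` and `satu8_ovfl_fixed()` play the role of `gen_satu_i32_ovfl()`, with pointers standing in for TCGv operands. When `dest` and `src` alias, the buggy version overwrites the source before comparing against it, so the overflow is lost. The fixed version computes into a temporary and writes the destination last, which is the pattern the patch applies.

```c
#include <assert.h>
#include <stdint.h>

/* Saturate *src to [0, 255]; *ovfl = 1 if the value changed. */
static void satu8_ovfl_buggy(int *ovfl, int32_t *dest, const int32_t *src)
{
    *dest = *src > 255 ? 255 : (*src < 0 ? 0 : *src);
    *ovfl = (*src != *dest);   /* if dest == src, this compares the new value with itself */
}

static void satu8_ovfl_fixed(int *ovfl, int32_t *dest, const int32_t *src)
{
    int32_t tmp = *src > 255 ? 255 : (*src < 0 ? 0 : *src);  /* in case dest == src */
    *ovfl = (*src != tmp);     /* all reads of src happen before dest is written */
    *dest = tmp;
}
```

The extra move costs little in generated code. Once short-circuiting lets a destination register be written in place, correctness depends on it.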

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-10-tsimp...@quicinc.com>
---
 target/hexagon/genptr.c | 45 ++---
 1 file changed, 29 insertions(+), 16 deletions(-)

diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 9bbaca6300..3c7e0dafaf 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -971,6 +971,7 @@ static void gen_cmpi_jumpnv(DisasContext *ctx,
 /* Shift left with saturation */
 static void gen_shl_sat(DisasContext *ctx, TCGv dst, TCGv src, TCGv shift_amt)
 {
+TCGv tmp = tcg_temp_new();/* In case dst == src */
 TCGv usr = get_result_gpr(ctx, HEX_REG_USR);
 TCGv sh32 = tcg_temp_new();
 TCGv dst_sar = tcg_temp_new();
@@ -995,17 +996,17 @@ static void gen_shl_sat(DisasContext *ctx, TCGv dst, TCGv src, TCGv shift_amt)
  */
 
 tcg_gen_andi_tl(sh32, shift_amt, 31);
-tcg_gen_movcond_tl(TCG_COND_EQ, dst, sh32, shift_amt,
+tcg_gen_movcond_tl(TCG_COND_EQ, tmp, sh32, shift_amt,
src, tcg_constant_tl(0));
-tcg_gen_shl_tl(dst, dst, sh32);
-tcg_gen_sar_tl(dst_sar, dst, sh32);
+tcg_gen_shl_tl(tmp, tmp, sh32);
+tcg_gen_sar_tl(dst_sar, tmp, sh32);
 tcg_gen_movcond_tl(TCG_COND_LT, satval, src, tcg_constant_tl(0), min, max);
 
 tcg_gen_setcond_tl(TCG_COND_NE, ovf, dst_sar, src);
 tcg_gen_shli_tl(ovf, ovf, reg_field_info[USR_OVF].offset);
 tcg_gen_or_tl(usr, usr, ovf);
 
-tcg_gen_movcond_tl(TCG_COND_EQ, dst, dst_sar, src, dst, satval);
+tcg_gen_movcond_tl(TCG_COND_EQ, dst, dst_sar, src, tmp, satval);
 }
 
 static void gen_sar(TCGv dst, TCGv src, TCGv shift_amt)
@@ -1228,22 +1229,28 @@ void gen_sat_i32(TCGv dest, TCGv source, int width)
 
 void gen_sat_i32_ovfl(TCGv ovfl, TCGv dest, TCGv source, int width)
 {
-gen_sat_i32(dest, source, width);
-tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, dest);
+TCGv tmp = tcg_temp_new();/* In case dest == source */
+gen_sat_i32(tmp, source, width);
+tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_satu_i32(TCGv dest, TCGv source, int width)
 {
+TCGv tmp = tcg_temp_new();/* In case dest == source */
 TCGv max_val = tcg_constant_tl((1 << width) - 1);
 TCGv zero = tcg_constant_tl(0);
-tcg_gen_movcond_tl(TCG_COND_GTU, dest, source, max_val, max_val, source);
-tcg_gen_movcond_tl(TCG_COND_LT, dest, source, zero, zero, dest);
+tcg_gen_movcond_tl(TCG_COND_GTU, tmp, source, max_val, max_val, source);
+tcg_gen_movcond_tl(TCG_COND_LT, tmp, source, zero, zero, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_satu_i32_ovfl(TCGv ovfl, TCGv dest, TCGv source, int width)
 {
-gen_satu_i32(dest, source, width);
-tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, dest);
+TCGv tmp = tcg_temp_new();/* In case dest == source */
+gen_satu_i32(tmp, source, width);
+tcg_gen_setcond_tl(TCG_COND_NE, ovfl, source, tmp);
+tcg_gen_mov_tl(dest, tmp);
 }
 
 void gen_sat_i64(TCGv_i64 dest, TCGv_i64 source, int width)
@@ -1256,27 +1263,33 @@ void gen_sat_i64(TCGv_i64 dest, TCGv_i64 source, int width)
 
 void gen_sat_i64_ovfl(TCGv ovfl, TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64(); /* In case dest == source */
 TCGv_i64 ovfl_64;
-gen_sat_i64(dest, source, width);
+gen_sat_i64(tmp, source, width);
 ovfl_64 = tcg_temp_new_i64();
-tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, dest, source);
+tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, tmp, source);
+tcg_gen_mov_i64(dest, tmp);
 tcg_gen_trunc_i64_tl(ovfl, ovfl_64);
 }
 
 void gen_satu_i64(TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64();/* In case dest == source */
 TCGv_i64 max_val = tcg_constant_i64((1LL << width) - 1LL);
 TCGv_i64 zero = tcg_constant_i64(0);
-tcg_gen_movcond_i64(TCG_COND_GTU, dest, source, max_val, max_val, source);
-tcg_gen_movcond_i64(TCG_COND_LT, dest, source, zero, zero, dest);
+tcg_gen_movcond_i64(TCG_COND_GTU, tmp, source, max_val, max_val, source);
+tcg_gen_movcond_i64(TCG_COND_LT, tmp, source, zero, zero, tmp);
+tcg_gen_mov_i64(dest, tmp);
 }
 
 void gen_satu_i64_ovfl(TCGv ovfl, TCGv_i64 dest, TCGv_i64 source, int width)
 {
+TCGv_i64 tmp = tcg_temp_new_i64();/* In case dest == source */
 TCGv_i64 ovfl_64;
-gen_satu_i64(dest, source, width);
+gen_satu_i64(tmp, source, width);
 ovfl_64 = tcg_temp_new_i64();
-tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, dest, source);
+tcg_gen_setcond_i64(TCG_COND_NE, ovfl_64, tmp, source);
+tcg_gen_mov_i64(dest, tmp);
 tcg_gen_trunc_i64_tl(ovfl, ovfl_64);
 }
 
-- 
2.25.1

[PULL v2 26/44] Hexagon (target/hexagon) Move new_value to DisasContext

2023-05-18 Thread Taylor Simpson

The new_value array in the CPUHexagonState is only used for bookkeeping
within the translation of a packet.  With recent changes that eliminate
the need to free TCGv variables, it makes more sense for these values to
be transient and kept in DisasContext.
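The lazy allocation in `get_result_gpr()` and the reset in `gen_start_packet()` follow a simple per-packet cache pattern, sketched below with hypothetical stand-ins. `Temp` replaces a TCGv temporary, `Ctx` replaces DisasContext, and `NUM_REGS` replaces TOTAL_PER_THREAD_REGS. A register's temporary is created and zeroed on first use within a packet, reused after that, and forgotten when the next packet starts.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_REGS 4   /* hypothetical stand-in for TOTAL_PER_THREAD_REGS */

typedef struct { int id; long value; } Temp;   /* stand-in for a TCGv temp */

typedef struct {
    Temp pool[NUM_REGS];        /* at most one temp per register per packet */
    int next_temp;
    Temp *new_value[NUM_REGS];  /* NULL until the register is first written */
} Ctx;

static Temp *temp_new(Ctx *ctx)
{
    Temp *t = &ctx->pool[ctx->next_temp];
    t->id = ctx->next_temp++;
    return t;
}

/* Like gen_start_packet(): forget all temps from the previous packet. */
static void start_packet(Ctx *ctx)
{
    ctx->next_temp = 0;
    for (int i = 0; i < NUM_REGS; i++) {
        ctx->new_value[i] = NULL;
    }
}

/* Like get_result_gpr(): allocate and zero on first use, then reuse. */
static Temp *get_result(Ctx *ctx, int rnum)
{
    if (ctx->new_value[rnum] == NULL) {
        ctx->new_value[rnum] = temp_new(ctx);
        ctx->new_value[rnum]->value = 0;
    }
    return ctx->new_value[rnum];
}
```

Only registers that a packet actually writes get a temporary, instead of one TCG global per register that lives in CPUHexagonState.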

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-18-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h   |  1 -
 target/hexagon/translate.h |  2 +-
 target/hexagon/genptr.c|  6 +-
 target/hexagon/translate.c | 14 +++---
 4 files changed, 9 insertions(+), 14 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index f86c9f0131..0ef6d717d0 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -89,7 +89,6 @@ typedef struct CPUArchState {
 target_ulong stack_start;
 
 uint8_t slot_cancelled;
-target_ulong new_value[TOTAL_PER_THREAD_REGS];
 target_ulong new_value_usr;
 
 /*
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 4c17433a6f..6dde487566 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -69,6 +69,7 @@ typedef struct DisasContext {
 bool need_pkt_has_store_s1;
 bool short_circuit;
 bool has_hvx_helper;
+TCGv new_value[TOTAL_PER_THREAD_REGS];
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -190,7 +191,6 @@ extern TCGv hex_pred[NUM_PREGS];
 extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
-extern TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_pred_value[NUM_PREGS];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index bfcb962a3d..37210e6f09 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -74,7 +74,11 @@ TCGv get_result_gpr(DisasContext *ctx, int rnum)
 if (rnum == HEX_REG_USR) {
 return hex_new_value_usr;
 } else {
-return hex_new_value[rnum];
+if (ctx->new_value[rnum] == NULL) {
+ctx->new_value[rnum] = tcg_temp_new();
+tcg_gen_movi_tl(ctx->new_value[rnum], 0);
+}
+return ctx->new_value[rnum];
 }
 } else {
 return hex_gpr[rnum];
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index e73c0066dd..bca42797c0 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -44,7 +44,6 @@ TCGv hex_pred[NUM_PREGS];
 TCGv hex_this_PC;
 TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
-TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_pred_value[NUM_PREGS];
@@ -513,6 +512,9 @@ static void gen_start_packet(DisasContext *ctx)
 }
 ctx->s1_store_processed = false;
 ctx->pre_commit = true;
+for (i = 0; i < TOTAL_PER_THREAD_REGS; i++) {
+ctx->new_value[i] = NULL;
+}
 
 analyze_packet(ctx);
 
@@ -1159,7 +1161,6 @@ void gen_intermediate_code(CPUState *cs, TranslationBlock *tb, int *max_insns,
 }
 
 #define NAME_LEN   64
-static char new_value_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
 static char reg_written_names[TOTAL_PER_THREAD_REGS][NAME_LEN];
 static char new_pred_value_names[NUM_PREGS][NAME_LEN];
 static char store_addr_names[STORES_MAX][NAME_LEN];
@@ -1181,15 +1182,6 @@ void hexagon_translate_init(void)
 offsetof(CPUHexagonState, gpr[i]),
 hexagon_regnames[i]);
 
-if (i == HEX_REG_USR) {
-hex_new_value[i] = NULL;
-} else {
-snprintf(new_value_names[i], NAME_LEN, "new_%s", hexagon_regnames[i]);
-hex_new_value[i] = tcg_global_mem_new(cpu_env,
-offsetof(CPUHexagonState, new_value[i]),
-new_value_names[i]);
-}
-
 if (HEX_DEBUG) {
 snprintf(reg_written_names[i], NAME_LEN, "reg_written_%s",
  hexagon_regnames[i]);
-- 
2.25.1

[PULL v2 03/44] Hexagon (tests/tcg/hexagon) Add v68 scalar tests

2023-05-18 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-4-tsimp...@quicinc.com>
---
 tests/tcg/hexagon/v68_scalar.c| 186 ++
 tests/tcg/hexagon/Makefile.target |   2 +
 2 files changed, 188 insertions(+)
 create mode 100644 tests/tcg/hexagon/v68_scalar.c

diff --git a/tests/tcg/hexagon/v68_scalar.c b/tests/tcg/hexagon/v68_scalar.c
new file mode 100644
index 00..7a8adb1130
--- /dev/null
+++ b/tests/tcg/hexagon/v68_scalar.c
@@ -0,0 +1,186 @@
+/*
+ *  Copyright(c) 2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+
+/*
+ *  Test the scalar core instructions that are new in v68
+ */
+
+int err;
+
+static int buffer32[] = { 1, 2, 3, 4 };
+static long long buffer64[] = { 5, 6, 7, 8 };
+
+static void __check32(int line, uint32_t result, uint32_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%08x != 0x%08x\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check32(RES, EXP) __check32(__LINE__, RES, EXP)
+
+static void __check64(int line, uint64_t result, uint64_t expect)
+{
+if (result != expect) {
+printf("ERROR at line %d: 0x%016llx != 0x%016llx\n",
+   line, result, expect);
+err++;
+}
+}
+
+#define check64(RES, EXP) __check64(__LINE__, RES, EXP)
+
+static inline int loadw_aq(int *p)
+{
+int res;
+asm volatile("%0 = memw_aq(%1)\n\t"
+ : "=r"(res) : "r"(p));
+return res;
+}
+
+static void test_loadw_aq(void)
+{
+int res;
+
+res = loadw_aq(&buffer32[0]);
+check32(res, 1);
+res = loadw_aq(&buffer32[1]);
+check32(res, 2);
+}
+
+static inline long long loadd_aq(long long *p)
+{
+long long res;
+asm volatile("%0 = memd_aq(%1)\n\t"
+ : "=r"(res) : "r"(p));
+return res;
+}
+
+static void test_loadd_aq(void)
+{
+long long res;
+
+res = loadd_aq(&buffer64[2]);
+check64(res, 7);
+res = loadd_aq(&buffer64[3]);
+check64(res, 8);
+}
+
+static inline void release_at(int *p)
+{
+asm volatile("release(%0):at\n\t"
+ : : "r"(p));
+}
+
+static void test_release_at(void)
+{
+release_at(&buffer32[2]);
+check64(buffer32[2], 3);
+release_at(&buffer32[3]);
+check64(buffer32[3], 4);
+}
+
+static inline void release_st(int *p)
+{
+asm volatile("release(%0):st\n\t"
+ : : "r"(p));
+}
+
+static void test_release_st(void)
+{
+release_st(&buffer32[2]);
+check64(buffer32[2], 3);
+release_st(&buffer32[3]);
+check64(buffer32[3], 4);
+}
+
+static inline void storew_rl_at(int *p, int val)
+{
+asm volatile("memw_rl(%0):at = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_storew_rl_at(void)
+{
+storew_rl_at(&buffer32[2], 9);
+check64(buffer32[2], 9);
+storew_rl_at(&buffer32[3], 10);
+check64(buffer32[3], 10);
+}
+
+static inline void stored_rl_at(long long *p, long long val)
+{
+asm volatile("memd_rl(%0):at = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_stored_rl_at(void)
+{
+stored_rl_at(&buffer64[2], 11);
+check64(buffer64[2], 11);
+stored_rl_at(&buffer64[3], 12);
+check64(buffer64[3], 12);
+}
+
+static inline void storew_rl_st(int *p, int val)
+{
+asm volatile("memw_rl(%0):st = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_storew_rl_st(void)
+{
+storew_rl_st(&buffer32[0], 13);
+check64(buffer32[0], 13);
+storew_rl_st(&buffer32[1], 14);
+check64(buffer32[1], 14);
+}
+
+static inline void stored_rl_st(long long *p, long long val)
+{
+asm volatile("memd_rl(%0):st = %1\n\t"
+ : : "r"(p), "r"(val) : "memory");
+}
+
+static void test_stored_rl_st(void)
+{
+stored_rl_st(&buffer64[0], 15);
+check64(buffer64[0], 15);
+stored_rl_st(&buffer64[1], 15);
+check64(buffer64[1], 15);
+}
+
+int main()
+{
+test_loadw_aq();
+test_loadd_aq();
+test_release_at();
+test_release_st();
+test_storew_rl_at();
+test_stored_rl_at();
+test_storew_rl_st();
+test_stored_rl_st();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 59b1b074e9..b7529e23bc 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@

[PULL v2 38/44] Remove test_vshuff from hvx_misc tests

2023-05-18 Thread Taylor Simpson

From: Marco Liebel 

test_vshuff checks that the vshuff instruction works correctly when
both vector registers are the same. Using vshuff in this way is
undefined and will be rejected by the compiler in a future version of
the toolchain.
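
The byte-shuffle loop from the removed test_vshuff can be lifted into a standalone reference model. This is a sketch only: the name vshuff_model, the byte-array interface and the tiny vector length are assumptions for illustration, not QEMU code. It also shows why giving the same register for both operands is ill-defined: with the pass-by-value copies the removed test describes, the two outputs differ, so one register cannot hold both.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical byte-level model of vshuff, following the loop in the
 * removed test_vshuff(). v0 and v1 are the two vector operands and are
 * updated in place; n is the vector length in bytes (a power of two).
 */
static void vshuff_model(uint8_t *v0, uint8_t *v1, size_t n, uint32_t shuff)
{
    assert(n > 0 && (n & (n - 1)) == 0);
    for (size_t offset = 1; offset < n; offset <<= 1) {
        if (shuff & offset) {
            for (size_t k = 0; k < n; k++) {
                if (!(k & offset)) {
                    /* Swap byte k of v0 with byte k + offset of v1 */
                    uint8_t tmp = v0[k];
                    v0[k] = v1[k + offset];
                    v1[k + offset] = tmp;
                }
            }
        }
    }
}
```

With two separate copies of the same data ({0, 1} each) and shuff = 1, the model leaves v0 = {1, 1} and v1 = {0, 0}; the two results disagree, which is why a single register named twice has no well-defined outcome.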

Signed-off-by: Marco Liebel 
Reviewed-by: Brian Cain 
Reviewed-by: Taylor Simpson 
Tested-by: Taylor Simpson 
Signed-off-by: Taylor Simpson 
Message-Id: <20230509184231.2467626-1-quic_mlie...@quicinc.com>
---
 tests/tcg/hexagon/hvx_misc.c | 45 
 1 file changed, 45 deletions(-)

diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
index c89fe0253d..09dec8d7a1 100644
--- a/tests/tcg/hexagon/hvx_misc.c
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -342,49 +342,6 @@ static void test_vsubuwsat_dv(void)
 check_output_w(__LINE__, 2);
 }
 
-static void test_vshuff(void)
-{
-/* Test that vshuff works when the two operands are the same register */
-const uint32_t splat = 0x089be55c;
-const uint32_t shuff = 0x454fa926;
-MMVector v0, v1;
-
-memset(expect, 0x12, sizeof(MMVector));
-memset(output, 0x34, sizeof(MMVector));
-
-asm volatile("v25 = vsplat(%0)\n\t"
- "vshuff(v25, v25, %1)\n\t"
- "vmem(%2 + #0) = v25\n\t"
- : /* no outputs */
- : "r"(splat), "r"(shuff), "r"(output)
- : "v25", "memory");
-
-/*
- * The semantics of Hexagon are the operands are pass-by-value, so create
- * two copies of the vsplat result.
- */
-for (int i = 0; i < MAX_VEC_SIZE_BYTES / 4; i++) {
-v0.uw[i] = splat;
-v1.uw[i] = splat;
-}
-/* Do the vshuff operation */
-for (int offset = 1; offset < MAX_VEC_SIZE_BYTES; offset <<= 1) {
-if (shuff & offset) {
-for (int k = 0; k < MAX_VEC_SIZE_BYTES; k++) {
-if (!(k & offset)) {
-uint8_t tmp = v0.ub[k];
-v0.ub[k] = v1.ub[k + offset];
-v1.ub[k + offset] = tmp;
-}
-}
-}
-}
-/* Put the result in the expect buffer for verification */
-expect[0] = v1;
-
-check_output_b(__LINE__, 1);
-}
-
 static void test_load_tmp_predicated(void)
 {
 void *p0 = buffer0;
@@ -508,8 +465,6 @@ int main()
 test_vadduwsat();
 test_vsubuwsat_dv();
 
-test_vshuff();
-
 test_load_tmp_predicated();
 test_load_cur_predicated();
 
-- 
2.25.1

[PULL v2 16/44] Hexagon (target/hexagon) Eliminate uses of log_pred_write function

2023-05-18 Thread Taylor Simpson

These instructions have implicit writes to registers, so we don't
want them to be helpers when idef-parser is off.

The following instructions are overridden
S2_cabacdecbin
SA1_cmpeqi

Remove the log_pred_write function from op_helper.c
Remove references in macros.h
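
For reference, the removed log_pred_write() implemented the packet rule that several writes to the same predicate register within one packet are ANDed together, keeping only the low 8 bits; gen_log_pred_write() applies the same rule at translation time. A minimal plain-C model of that rule, with hypothetical struct and function names:

```c
#include <assert.h>
#include <stdint.h>

#define NUM_PREGS 4

/* Hypothetical per-packet predicate write log */
typedef struct {
    uint8_t new_pred_value[NUM_PREGS];
    uint8_t pred_written;       /* bit p set once Pp has been written */
} PredLog;

/*
 * Same rule as the removed log_pred_write(): the first write to Pp in
 * the packet assigns, later writes to Pp are ANDed in.
 */
static void pred_log_write(PredLog *plog, int pnum, uint32_t val)
{
    assert(pnum >= 0 && pnum < NUM_PREGS);
    uint8_t v = val & 0xff;
    if (plog->pred_written & (1u << pnum)) {
        plog->new_pred_value[pnum] &= v;
    } else {
        plog->new_pred_value[pnum] = v;
        plog->pred_written |= 1u << pnum;
    }
}
```

For example, writing 0x1ff and then 0x0f to P0 in one packet leaves 0x0f, the AND of the two masked values.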

Signed-off-by: Taylor Simpson 
Acked-by: Richard Henderson 
Message-Id: <20230427230012.3800327-8-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h   | 16 +++
 target/hexagon/helper.h|  2 +
 target/hexagon/macros.h|  4 --
 target/hexagon/genptr.c|  5 ++
 target/hexagon/op_helper.c | 96 --
 5 files changed, 104 insertions(+), 19 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index a1d7eabae7..099a6cc47f 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -595,6 +595,14 @@
 gen_helper_vacsh_val(RxxV, cpu_env, RxxV, RssV, RttV); \
 } while (0)
 
+#define fGEN_TCG_S2_cabacdecbin(SHORTCODE) \
+do { \
+TCGv p0 = tcg_temp_new(); \
+gen_helper_cabacdecbin_pred(p0, RssV, RttV); \
+gen_helper_cabacdecbin_val(RddV, RssV, RttV); \
+gen_log_pred_write(ctx, 0, p0); \
+} while (0)
+
 /*
  * Approximate reciprocal
  * r3,p1 = sfrecipa(r0, r1)
@@ -902,6 +910,14 @@
 #define fGEN_TCG_J4_tstbit0_fp1_jump_t(SHORTCODE) \
 gen_cmpnd_tstbit0_jmp(ctx, 1, RsV, TCG_COND_NE, riV)
 
+/* p0 = cmp.eq(r0, #7) */
+#define fGEN_TCG_SA1_cmpeqi(SHORTCODE) \
+do { \
+TCGv p0 = tcg_temp_new(); \
+gen_comparei(TCG_COND_EQ, p0, RsV, uiV); \
+gen_log_pred_write(ctx, 0, p0); \
+} while (0)
+
 #define fGEN_TCG_J2_jump(SHORTCODE) \
 gen_jump(ctx, riV)
 #define fGEN_TCG_J2_jumpr(SHORTCODE) \
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index ed7f9842f6..73849e3d49 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -31,6 +31,8 @@ DEF_HELPER_3(sfrecipa, i64, env, f32, f32)
 DEF_HELPER_2(sfinvsqrta, i64, env, f32)
 DEF_HELPER_4(vacsh_val, s64, env, s64, s64, s64)
 DEF_HELPER_FLAGS_4(vacsh_pred, TCG_CALL_NO_RWG_SE, s32, env, s64, s64, s64)
+DEF_HELPER_FLAGS_2(cabacdecbin_val, TCG_CALL_NO_RWG_SE, s64, s64, s64)
+DEF_HELPER_FLAGS_2(cabacdecbin_pred, TCG_CALL_NO_RWG_SE, s32, s64, s64)
 
 /* Floating point */
 DEF_HELPER_2(conv_sf2df, f64, env, f32)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 995ae0e384..24c78fe80a 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -371,10 +371,6 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
 #define fSET_OVERFLOW() SET_USR_FIELD(USR_OVF, 1)
 #define fSET_LPCFG(VAL) SET_USR_FIELD(USR_LPCFG, (VAL))
 #define fGET_LPCFG (GET_USR_FIELD(USR_LPCFG))
-#define fWRITE_P0(VAL) log_pred_write(env, 0, VAL)
-#define fWRITE_P1(VAL) log_pred_write(env, 1, VAL)
-#define fWRITE_P2(VAL) log_pred_write(env, 2, VAL)
-#define fWRITE_P3(VAL) log_pred_write(env, 3, VAL)
 #define fPART1(WORK) if (part1) { WORK; return; }
 #define fCAST4u(A) ((uint32_t)(A))
 #define fCAST4s(A) ((int32_t)(A))
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index fa7b1754bd..dac62b90a6 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -560,6 +560,11 @@ static void gen_ploopNsi(DisasContext *ctx, int N, int count, int riV)
 {
 gen_ploopNsr(ctx, N, tcg_constant_tl(count), riV);
 }
+
+static inline void gen_comparei(TCGCond cond, TCGv res, TCGv arg1, int arg2)
+{
+gen_compare(cond, res, arg1, tcg_constant_tl(arg2));
+}
 #endif
 
 static void gen_cond_jumpr(DisasContext *ctx, TCGv dst_pc,
diff --git a/target/hexagon/op_helper.c b/target/hexagon/op_helper.c
index 7e9e3f305e..46ccc59106 100644
--- a/target/hexagon/op_helper.c
+++ b/target/hexagon/op_helper.c
@@ -52,21 +52,6 @@ G_NORETURN void HELPER(raise_exception)(CPUHexagonState *env, uint32_t excp)
 do_raise_exception_err(env, excp, 0);
 }
 
-static void log_pred_write(CPUHexagonState *env, int pnum, target_ulong val)
-{
-HEX_DEBUG_LOG("log_pred_write[%d] = " TARGET_FMT_ld
-  " (0x" TARGET_FMT_lx ")\n",
-  pnum, val, val);
-
-/* Multiple writes to the same preg are and'ed together */
-if (env->pred_written & (1 << pnum)) {
-env->new_pred_value[pnum] &= val & 0xff;
-} else {
-env->new_pred_value[pnum] = val & 0xff;
-env->pred_written |= 1 << pnum;
-}
-}
-
 void log_store32(CPUHexagonState *env, target_ulong addr,
  target_ulong val, int width, int slot)
 {
@@ -399,6 +384,87 @@ int32_t HELPER(vacsh_pred)(CPUHexagonState *env,
 return PeV;
 }
 
+int64_t HELPER(cabacdecbin_val)(int64_t RssV, int64_t RttV)
+{
+int64_t RddV = 0;
+size4u_t state;
+size4u_t valMPS;
+size4u_t bitpos;
+size4u_t range;
+size4u_t offset;
+size4u_t rLPS;
+size4u_t rMPS;
+
+state =  fEXTRACTU_RANGE(fGETWORD(1, RttV), 5, 0);
+valMPS = fEXTRACTU_RANGE(fGETWORD(1, RttV), 8, 8);
+bitpos =

[PULL v2 04/44] Hexagon (target/hexagon) Add v68 HVX instructions

2023-05-18 Thread Taylor Simpson

The following instructions are added
V6_v6mpyvubs10_vxx
V6_v6mpyhubs10_vxx
V6_v6mpyvubs10
V6_v6mpyhubs10
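
The fGET10BIT macro that this patch adds to mmvec/macros.h reads a signed 10-bit coefficient from a 32-bit word: byte POS supplies the low 8 bits, and the 2-bit field at bit 24 + 2 * POS supplies the sign-extended top bits, so positions 0 to 2 pack three coefficients into one word. Below is a portable restatement; get10bit and sext_field are hypothetical stand-ins for the macro and for QEMU's sextract32/extract32, and restricting pos to 0..2 is an assumption based on how ext.idef uses the macro.

```c
#include <assert.h>
#include <stdint.h>

/* Sign-extend the len-bit field of val that starts at bit start */
static int32_t sext_field(uint32_t val, unsigned start, unsigned len)
{
    assert(len > 0 && len < 32 && start + len <= 32);
    uint32_t field = (val >> start) & ((1u << len) - 1);
    if (field & (1u << (len - 1))) {
        return (int32_t)field - (int32_t)(1u << len);
    }
    return (int32_t)field;
}

/* Coefficient pos: top 2 bits from bit 24 + 2 * pos, low 8 from byte pos */
static int16_t get10bit(uint32_t val, unsigned pos)
{
    assert(pos <= 2);
    int32_t hi = sext_field(val, 24 + 2 * pos, 2);
    uint32_t lo = (val >> (8 * pos)) & 0xff;
    /* Equals the macro's (hi << 8) | lo without shifting a negative value */
    return (int16_t)(hi * 256 + (int32_t)lo);
}
```

The result always lies in [-512, 511], which is why the idef code can hold each coefficient in a size2s_t.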

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-5-tsimp...@quicinc.com>
---
 target/hexagon/mmvec/macros.h|   9 +-
 target/hexagon/imported/mmvec/encode_ext.def |   8 +-
 target/hexagon/imported/mmvec/ext.idef   | 281 ++-
 3 files changed, 295 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/mmvec/macros.h b/target/hexagon/mmvec/macros.h
index 1201d778d0..a655634fd1 100644
--- a/target/hexagon/mmvec/macros.h
+++ b/target/hexagon/mmvec/macros.h
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2022 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -346,4 +346,11 @@
 #define fUARCH_NOTE_PUMP_2X()
 
 #define IV1DEAD()
+
+#define fGET10BIT(COE, VAL, POS) \
+do { \
+COE = (sextract32(VAL, 24 + 2 * POS, 2) << 8) | \
+   extract32(VAL, POS * 8, 8); \
+} while (0);
+
 #endif
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index 6fbbe2c422..b9b62fef8d 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -730,6 +730,8 @@ DEF_ENC(V6_vmaxb, ICLASS_CJ" 1 111 001 v PP 0 u 101 d") //
 DEF_ENC(V6_vsatuwuh,ICLASS_CJ" 1 111 001 v PP 0 u 110 d") //
 DEF_ENC(V6_vdealb4w, ICLASS_CJ" 1 111 001 v PP 0 u 111 d") //
 
+DEF_ENC(V6_v6mpyvubs10_vxx,ICLASS_CJ" 1 111 001 v PP 1 u 0ii x")
+DEF_ENC(V6_v6mpyhubs10_vxx,ICLASS_CJ" 1 111 001 v PP 1 u 1ii x")
 
 DEF_ENC(V6_vmpyowh_rnd, ICLASS_CJ" 1 111 010 v PP 0 u 000 d") //
 DEF_ENC(V6_vshuffeb,  ICLASS_CJ" 1 111 010 v PP 0 u 001 d") //
@@ -740,6 +742,10 @@ DEF_ENC(V6_vshufoeh,  ICLASS_CJ" 1 111 010 v PP 0 u 101 d") //
 DEF_ENC(V6_vshufoeb,  ICLASS_CJ" 1 111 010 v PP 0 u 110 d") //
 DEF_ENC(V6_vcombine, ICLASS_CJ" 1 111 010 v PP 0 u 111 d") //
 
+DEF_ENC(V6_v6mpyvubs10,  ICLASS_CJ" 1 111 010 v PP 1 u 0ii d")
+DEF_ENC(V6_v6mpyhubs10,  ICLASS_CJ" 1 111 010 v PP 1 u 1ii d")
+
+
 DEF_ENC(V6_vmpyieoh, ICLASS_CJ" 1 111 011 v PP 0 u 000 d") //
 DEF_ENC(V6_vadduwsat, ICLASS_CJ" 1 111 011 v PP 0 u 001 d") //
 DEF_ENC(V6_vsathub, ICLASS_CJ" 1 111 011 v PP 0 u 010 d") //
diff --git a/target/hexagon/imported/mmvec/ext.idef b/target/hexagon/imported/mmvec/ext.idef
index 8ca5a606e1..c0d169fd4f 100644
--- a/target/hexagon/imported/mmvec/ext.idef
+++ b/target/hexagon/imported/mmvec/ext.idef
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -116,6 +116,10 @@ ITERATOR_INSN_MPY_SLOT_LATE(WIDTH,TAG, SYNTAX2,DESCR,CODE)
 EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
 DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
 
+#define ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_VX_FWD(WIDTH,TAG,SYNTAX,DESCR,CODE) \
+EXTINSN(V6_##TAG, SYNTAX, ATTRIBS(A_EXTENSION,A_CVI,A_CVI_VX_DV),  \
+DESCR, DO_FOR_EACH_CODE(WIDTH, CODE))
+
 #define ITERATOR_INSN2_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX,SYNTAX2,DESCR,CODE) \
 ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC(WIDTH,TAG,SYNTAX2,DESCR,CODE)
 
@@ -2507,6 +2511,281 @@ EXTINSN(V6_vscattermhw , "vscatter(Rt32,Mu2,Vvv32.w).h=Vw32", ATTRIBS(A_EXTENSIO
 })
 
 
+ITERATOR_INSN_MPY_SLOT_DOUBLE_VEC_VX_FWD(32, v6mpyvubs10_vxx, "Vxx32.w+=v6mpy(Vuu32.ub,Vvv32.b,#u2):v", "",
+fHIDE(size2s_t c00;)
+fGET10BIT(c00, VvvV.v[0].uw[i], 0)
+fHIDE(size2s_t c01;)
+fGET10BIT(c01, VvvV.v[0].uw[i], 1)
+fHIDE(size2s_t c02;)
+fGET10BIT(c02, VvvV.v[0].uw[i], 2)
+
+   fHIDE(size2s_t c10;)
+fGET10BIT(c10, VvvV.v[1].uw[i], 0)
+fHIDE(size2s_t c11;)
+fGET10BIT(c11, VvvV.v[1].uw[i], 1)
+fHIDE(size2s_t c12;)
+fGET10BIT(c12, VvvV.v[1].uw[i], 2)
+
+if (uiV == 0) {
+VxxV.v[1].w[i] += fMPY16US(fGETUBYTE(3,VuuV.v[0].uw[i]), c10);
+VxxV.v[1].w[i] += fMPY16US(fGETUBYTE(2,VuuV.v[1].uw[i]), c11);
+VxxV.v[1].w[i] +=

[PULL v2 21/44] Hexagon (target/hexagon) Short-circuit packet predicate writes

2023-05-18 Thread Taylor Simpson

In certain cases, we can avoid the overhead of writing to hex_new_pred_value
and write directly to hex_pred.  We consider predicate reads/writes when
computing ctx->need_commit.  The get_result_pred() function uses this
field to decide between hex_new_pred_value and hex_pred.  Then, we can
early-exit from gen_pred_writes.
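
The predicate part of this decision can be summarised with bitmasks. Because every instruction in a packet must see the predicate values from before the packet, a predicate that the packet both reads and writes has to be buffered in hex_new_pred_value and committed at the end; otherwise the write can go straight to hex_pred. The sketch models only this predicate check (the real need_commit() also tests other hazards, such as register read/write overlap); the types and names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Where a predicate write should go */
typedef enum { DEST_NEW_PRED_VALUE, DEST_PRED } PredDest;

/* Bit p of each mask stands for predicate register Pp */
static bool preds_need_commit(uint8_t pregs_read, uint8_t pregs_written)
{
    return (pregs_read & pregs_written) != 0;
}

/* Mirrors get_result_pred(): buffer the write only when a commit is needed */
static PredDest result_pred_dest(bool need_commit)
{
    return need_commit ? DEST_NEW_PRED_VALUE : DEST_PRED;
}
```

A packet that reads P0 and writes P1 can therefore write P1 directly, while one that reads and writes P0 must buffer the new value.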

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-13-tsimp...@quicinc.com>
---
 target/hexagon/genptr.h|  1 +
 target/hexagon/genptr.c| 15 ---
 target/hexagon/translate.c | 14 +++---
 3 files changed, 24 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 420867f934..e11ccc2358 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,6 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
+TCGv get_result_pred(DisasContext *ctx, int pnum);
 void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
 void gen_set_usr_field(DisasContext *ctx, int field, TCGv val);
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 9858d7bc35..5025e172cf 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -110,8 +110,18 @@ static void gen_log_reg_write_pair(DisasContext *ctx, int rnum, TCGv_i64 val)
 gen_log_reg_write(ctx, rnum + 1, val32);
 }
 
+TCGv get_result_pred(DisasContext *ctx, int pnum)
+{
+if (ctx->need_commit) {
+return hex_new_pred_value[pnum];
+} else {
+return hex_pred[pnum];
+}
+}
+
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
 {
+TCGv pred = get_result_pred(ctx, pnum);
 TCGv base_val = tcg_temp_new();
 
 tcg_gen_andi_tl(base_val, val, 0xff);
@@ -124,10 +134,9 @@ void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val)
  * straight assignment.  Otherwise, do an and.
  */
 if (!test_bit(pnum, ctx->pregs_written)) {
-tcg_gen_mov_tl(hex_new_pred_value[pnum], base_val);
+tcg_gen_mov_tl(pred, base_val);
 } else {
-tcg_gen_and_tl(hex_new_pred_value[pnum],
-   hex_new_pred_value[pnum], base_val);
+tcg_gen_and_tl(pred, pred, base_val);
 }
 if (HEX_DEBUG) {
 tcg_gen_ori_tl(hex_pred_written, hex_pred_written, 1 << pnum);
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 6fa885cf16..bcf64f725a 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -386,6 +386,14 @@ static bool need_commit(DisasContext *ctx)
 }
 }
 
+/* Check for overlap between predicate reads and writes */
+for (int i = 0; i < ctx->preg_log_idx; i++) {
+int pnum = ctx->preg_log[i];
+if (test_bit(pnum, ctx->pregs_read)) {
+return true;
+}
+}
+
 return false;
 }
 
@@ -503,7 +511,7 @@ static void gen_start_packet(DisasContext *ctx)
  * Preload the predicated pred registers into hex_new_pred_value[pred_num]
  * Only endloop instructions conditionally write to pred registers
  */
-if (pkt->pkt_has_endloop) {
+if (ctx->need_commit && pkt->pkt_has_endloop) {
 for (int i = 0; i < ctx->preg_log_idx; i++) {
 int pred_num = ctx->preg_log[i];
 tcg_gen_mov_tl(hex_new_pred_value[pred_num], hex_pred[pred_num]);
@@ -622,8 +630,8 @@ static void gen_reg_writes(DisasContext *ctx)
 
 static void gen_pred_writes(DisasContext *ctx)
 {
-/* Early exit if the log is empty */
-if (!ctx->preg_log_idx) {
+/* Early exit if not needed or the log is empty */
+if (!ctx->need_commit || !ctx->preg_log_idx) {
 return;
 }
 
-- 
2.25.1

[PULL v2 30/44] Hexagon (target/hexagon) Move items to DisasContext

2023-05-18 Thread Taylor Simpson

The following items in the CPUHexagonState are only used for bookkeeping
within the translation of a packet.  With recent changes that eliminate
the need to free TCGv variables, it makes more sense for them to be
transient and kept in DisasContext.

The following items are moved
dczero_addr
branch_taken
this_PC

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-22-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h   |  3 ---
 target/hexagon/helper.h|  2 +-
 target/hexagon/macros.h|  6 +-
 target/hexagon/translate.h |  5 ++---
 target/hexagon/genptr.c|  6 +++---
 target/hexagon/op_helper.c |  5 ++---
 target/hexagon/translate.c | 23 +++
 target/hexagon/README  |  2 +-
 8 files changed, 21 insertions(+), 31 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 87e457dda9..d095dc6647 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -82,7 +82,6 @@ typedef struct {
 typedef struct CPUArchState {
 target_ulong gpr[TOTAL_PER_THREAD_REGS];
 target_ulong pred[NUM_PREGS];
-target_ulong branch_taken;
 
 /* For comparing with LLDB on target - see adjust_stack_ptrs function */
 target_ulong last_pc_dumped;
@@ -95,11 +94,9 @@ typedef struct CPUArchState {
  * Only used when HEX_DEBUG is on, but unconditionally included
  * to reduce recompile time when turning HEX_DEBUG on/off.
  */
-target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
 MemLog mem_log_stores[STORES_MAX];
-target_ulong dczero_addr;
 
 float_status fp_status;
 
diff --git a/target/hexagon/helper.h b/target/hexagon/helper.h
index f3b298beee..fa0ebaf7c8 100644
--- a/target/hexagon/helper.h
+++ b/target/hexagon/helper.h
@@ -21,7 +21,7 @@
 DEF_HELPER_FLAGS_2(raise_exception, TCG_CALL_NO_RETURN, noreturn, env, i32)
 DEF_HELPER_1(debug_start_packet, void, env)
 DEF_HELPER_FLAGS_3(debug_check_store_width, TCG_CALL_NO_WG, void, env, int, int)
-DEF_HELPER_FLAGS_4(debug_commit_end, TCG_CALL_NO_WG, void, env, int, int, int)
+DEF_HELPER_FLAGS_5(debug_commit_end, TCG_CALL_NO_WG, void, env, i32, int, int, int)
 DEF_HELPER_2(commit_store, void, env, int)
 DEF_HELPER_3(gather_store, void, env, i32, int)
 DEF_HELPER_1(commit_hvx_stores, void, env)
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 5308c0848e..5451b061ee 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -648,7 +648,11 @@ static inline TCGv gen_read_ireg(TCGv result, TCGv val, int shift)
reg_field_info[FIELD].offset)
 
 #ifdef QEMU_GENERATE
-#define fDCZEROA(REG) tcg_gen_mov_tl(hex_dczero_addr, (REG))
+#define fDCZEROA(REG) \
+do { \
+ctx->dczero_addr = tcg_temp_new(); \
+tcg_gen_mov_tl(ctx->dczero_addr, (REG)); \
+} while (0)
 #endif
 
 #define fBRANCH_SPECULATE_STALL(DOTNEWVAL, JUMP_COND, SPEC_DIR, HINTBITNUM, \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 9697b4de0e..4dd59c6726 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -71,6 +71,8 @@ typedef struct DisasContext {
 TCGv new_value[TOTAL_PER_THREAD_REGS];
 TCGv new_pred_value[NUM_PREGS];
 TCGv pred_written;
+TCGv branch_taken;
+TCGv dczero_addr;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -189,16 +191,13 @@ static inline void ctx_log_qreg_read(DisasContext *ctx, int qnum)
 
 extern TCGv hex_gpr[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_pred[NUM_PREGS];
-extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
-extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
 extern TCGv hex_store_val32[STORES_MAX];
 extern TCGv_i64 hex_store_val64[STORES_MAX];
-extern TCGv hex_dczero_addr;
 extern TCGv hex_llsc_addr;
 extern TCGv hex_llsc_val;
 extern TCGv_i64 hex_llsc_val_i64;
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 361cc789d7..cb2aa28a19 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -480,9 +480,9 @@ static void gen_write_new_pc_addr(DisasContext *ctx, TCGv addr,
 if (ctx->pkt->pkt_has_multi_cof) {
 /* If there are multiple branches in a packet, ignore the second one */
 tcg_gen_movcond_tl(TCG_COND_NE, hex_gpr[HEX_REG_PC],
-   hex_branch_taken, tcg_constant_tl(0),
+   ctx->branch_taken, tcg_constant_tl(0),
hex_gpr[HEX_REG_PC], addr);
-tcg_gen_movi_tl(hex_branch_taken, 1);
+tcg_gen_movi_tl(ctx->branch_taken, 1);
 } else {
 tcg_gen_mov_tl(hex_gpr[HEX_REG_PC], addr);
 }
@@ -503,7 +503,7 @@ static void gen_write_new_pc_pcrel(DisasContext *ctx, int pc_off,
 ctx->branch_cond = TCG_COND_ALWAYS;

[PULL v2 23/44] Hexagon (target/hexagon) Short-circuit more HVX single instruction packets

2023-05-18 Thread Taylor Simpson

The generated helpers for HVX use pass-by-reference, so they can't
short-circuit when the reads/writes overlap.  The instructions with
overrides are OK because they use tcg_gen_gvec_*.

We add a flag has_hvx_helper to DisasContext and extend gen_analyze_funcs
to set the flag when the instruction is an HVX instruction with a
generated helper.

We add an override for V6_vcombine so that it can be short-circuited
along with a test case in tests/tcg/hexagon/hvx_misc.c
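
The overlap that the new V6_vcombine override guards against can be reproduced with a small model. vcombine writes Vv to the low register of the pair Vdd and Vu to the high register; when the low destination is Vu itself, copying Vv first would destroy Vu, so the override saves Vu in a temporary (vtmp). The sketch uses an assumed 8-byte vector and hypothetical names; real HVX vectors are much wider.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define VEC_BYTES 8     /* hypothetical, kept small for the example */
#define NUM_VREGS 32

typedef struct { uint8_t b[VEC_BYTES]; } Vec;

/* Fill every byte of a vector with x */
static void splat_bytes(Vec *v, uint8_t x)
{
    memset(v->b, x, VEC_BYTES);
}

/* Vdd = vcombine(Vu, Vv): vr[dd] gets Vv (low), vr[dd + 1] gets Vu (high) */
static void vcombine(Vec *vr, int dd, int u, int v)
{
    assert(dd >= 0 && dd + 1 < NUM_VREGS);
    if (dd != u) {
        vr[dd] = vr[v];
        vr[dd + 1] = vr[u];
    } else {
        Vec tmp = vr[u];    /* save Vu before the low half overwrites it */
        vr[dd] = vr[v];
        vr[dd + 1] = tmp;
    }
}
```

This mirrors the added test_vcombine, where v3:2 = vcombine(v2, v3) makes the low destination v2 also the Vu source; afterwards v2 holds the old v3 and v3 holds the old v2.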

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-15-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h| 23 +++
 target/hexagon/translate.h  |  1 +
 target/hexagon/translate.c  | 17 +++--
 tests/tcg/hexagon/hvx_misc.c| 21 +
 target/hexagon/gen_analyze_funcs.py |  5 +
 5 files changed, 65 insertions(+), 2 deletions(-)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index 8dceead5e5..44bae53f8d 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -140,6 +140,29 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
  sizeof(MMVector), sizeof(MMVector)); \
 } while (0)
 
+/*
+ * Vector combine
+ *
+ * Be careful that the source and dest don't overlap
+ */
+#define fGEN_TCG_V6_vcombine(SHORTCODE) \
+do { \
+if (VddV_off != VuV_off) { \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} else { \
+intptr_t tmpoff = offsetof(CPUHexagonState, vtmp); \
+tcg_gen_gvec_mov(MO_64, tmpoff, VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), tmpoff, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} \
+} while (0)
+
 /* Vector conditional move */
 #define fGEN_TCG_VEC_CMOV(PRED) \
 do { \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 3f6fd3452c..26bcae0395 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -68,6 +68,7 @@ typedef struct DisasContext {
 bool is_tight_loop;
 bool need_pkt_has_store_s1;
 bool short_circuit;
+bool has_hvx_helper;
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index 8e7a4377c8..fe85edc1ec 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -378,8 +378,20 @@ static bool need_commit(DisasContext *ctx)
 return true;
 }
 
-if (pkt->num_insns == 1 && !pkt->pkt_has_hvx) {
-return false;
+if (pkt->num_insns == 1) {
+if (pkt->pkt_has_hvx) {
+/*
+ * The HVX instructions with generated helpers use
+ * pass-by-reference, so they need the read/write overlap
+ * check below.
+ * The HVX instructions with overrides are OK.
+ */
+if (!ctx->has_hvx_helper) {
+return false;
+}
+} else {
+return false;
+}
 }
 
 /* Check for overlap between register reads and writes */
@@ -454,6 +466,7 @@ static void analyze_packet(DisasContext *ctx)
 {
 Packet *pkt = ctx->pkt;
 ctx->need_pkt_has_store_s1 = false;
+ctx->has_hvx_helper = false;
 for (int i = 0; i < pkt->num_insns; i++) {
 Insn *insn = >insn[i];
 ctx->insn = insn;
diff --git a/tests/tcg/hexagon/hvx_misc.c b/tests/tcg/hexagon/hvx_misc.c
index d0e64e035f..c89fe0253d 100644
--- a/tests/tcg/hexagon/hvx_misc.c
+++ b/tests/tcg/hexagon/hvx_misc.c
@@ -454,6 +454,25 @@ static void test_load_cur_predicated(void)
 check_output_w(__LINE__, BUFSIZE);
 }
 
+static void test_vcombine(void)
+{
+for (int i = 0; i < BUFSIZE / 2; i++) {
+asm volatile("v2 = vsplat(%0)\n\t"
+ "v3 = vsplat(%1)\n\t"
+ "v3:2 = vcombine(v2, v3)\n\t"
+ "vmem(%2+#0) = v2\n\t"
+ "vmem(%2+#1) = v3\n\t"
+ :
+ : "r"(2 * i), "r"(2 * i + 1), "r"(&output[2 * i])
+ : "v2", "v3", "memory");
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+expect[2 * i].w[j] = 2 * i + 1;
+expect[2 * i + 1].w[j] = 2 * i;
+}
+}
+check_output_w(__LINE__, BUFSIZE);
+}
+
 int main()
 {
 init_buffers();
@@ -494,6 +513,8 @@ int main()
 test_load_tmp_predicated();
 test_load_cur_predicated();
 
+test_vcombine();
+

[PULL v2 02/44] Hexagon (target/hexagon) Add v68 scalar instructions

2023-05-18 Thread Taylor Simpson

The following instructions are added
L2_loadw_aq
L4_loadd_aq
R6_release_at_vi
R6_release_st_vi
S2_storew_rl_at_vi
S4_stored_rl_at_vi
S2_storew_rl_st_vi
S4_stored_rl_st_vi

The release instructions are no-ops in QEMU.  The others behave as
ordinary loads/stores.

The encodings of the following instructions changed some "don't care" bits:
L2_loadw_locked
L4_loadd_locked
S2_storew_locked
S4_stored_locked

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-3-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h| 18 ++
 target/hexagon/attribs_def.h.inc|  7 +++
 target/hexagon/translate.c  |  3 +++
 target/hexagon/gen_idef_parser_funcs.py |  2 ++
 target/hexagon/imported/encode_pp.def   | 19 ++-
 target/hexagon/imported/ldst.idef   | 20 +++-
 6 files changed, 63 insertions(+), 6 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 329e7a1024..598d80d3ce 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -1236,6 +1236,24 @@
 uiV = uiV; \
 } while (0)
 
+#define fGEN_TCG_L2_loadw_aq(SHORTCODE) SHORTCODE
+#define fGEN_TCG_L4_loadd_aq(SHORTCODE) SHORTCODE
+
+/* Nothing to do for these in qemu, need to suppress compiler warnings */
+#define fGEN_TCG_R6_release_at_vi(SHORTCODE) \
+do { \
+RsV = RsV; \
+} while (0)
+#define fGEN_TCG_R6_release_st_vi(SHORTCODE) \
+do { \
+RsV = RsV; \
+} while (0)
+
+#define fGEN_TCG_S2_storew_rl_at_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S4_stored_rl_at_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S2_storew_rl_st_vi(SHORTCODE)  SHORTCODE
+#define fGEN_TCG_S4_stored_rl_st_vi(SHORTCODE)  SHORTCODE
+
 #define fGEN_TCG_J2_trap0(SHORTCODE) \
 do { \
 uiV = uiV; \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 9874d1658f..0ddfb45bdf 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -52,6 +52,12 @@ DEF_ATTRIB(REGWRSIZE_4B, "Memory width is 4 bytes", "", "")
 DEF_ATTRIB(REGWRSIZE_8B, "Memory width is 8 bytes", "", "")
 DEF_ATTRIB(MEMLIKE, "Memory-like instruction", "", "")
 DEF_ATTRIB(MEMLIKE_PACKET_RULES, "follows Memory-like packet rules", "", "")
+DEF_ATTRIB(RELEASE, "Releases a lock", "", "")
+DEF_ATTRIB(ACQUIRE, "Acquires a lock", "", "")
+
+DEF_ATTRIB(RLS_INNER, "Store release inner visibility", "", "")
+DEF_ATTRIB(RLS_ALL_THREAD, "Store release among all threads", "", "")
+DEF_ATTRIB(RLS_SAME_THREAD, "Store release with the same thread", "", "")
 
 /* V6 Vector attributes */
 DEF_ATTRIB(CVI, "Executes on the HVX extension", "", "")
@@ -74,6 +80,7 @@ DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
 DEF_ATTRIB(CVI_TMP_DST, "CVI instruction that doesn't write a register", "", "")
 DEF_ATTRIB(CVI_SLOT23, "Can execute in slot 2 or slot 3 (HVX)", "", "")
 
+DEF_ATTRIB(VTCM_ALLBANK_ACCESS, "Allocates in all VTCM schedulers.", "", "")
 
 /* Change-of-flow attributes */
 DEF_ATTRIB(JUMP, "Jump-type instruction", "", "")
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index cddd7c5db4..01f448a325 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -481,6 +481,9 @@ static void mark_store_width(DisasContext *ctx)
 uint8_t width = 0;
 
 if (GET_ATTRIB(opcode, A_SCALAR_STORE)) {
+if (GET_ATTRIB(opcode, A_MEMSIZE_0B)) {
+return;
+}
 if (GET_ATTRIB(opcode, A_MEMSIZE_1B)) {
 width |= 1;
 }
diff --git a/target/hexagon/gen_idef_parser_funcs.py b/target/hexagon/gen_idef_parser_funcs.py
index afe68bdb6f..dc9e396b52 100644
--- a/target/hexagon/gen_idef_parser_funcs.py
+++ b/target/hexagon/gen_idef_parser_funcs.py
@@ -109,6 +109,8 @@ def main():
 continue
 if "A_COF" in hex_common.attribdict[tag]:
 continue
+if ( tag.startswith('R6_release_') ):
+continue
 
 regs = tagregs[tag]
 imms = tagimms[tag]
diff --git a/target/hexagon/imported/encode_pp.def b/target/hexagon/imported/encode_pp.def
index d71c04cd30..763f465bfd 100644
--- a/target/hexagon/imported/encode_pp.def
+++ b/target/hexagon/imported/encode_pp.def
@@ -1,5 +1,5 @@
 /*
- *  Copyright(c) 2019-2021 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *  Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
  *
  *  This program is free software; you can redistribute it and/or modify
  *  it under the terms of the GNU General Public License as published by
@@ -382,14 +382,23 @@ DEF_ENC32(L4_return_fnew_pt,  ICLASS_LD" 011 0 000 s PP1110vv ---d")
 DEF_ENC32(L4_return_tnew_pnt, ICLASS_LD" 011 0 000 s PP0010vv ---d")
 DEF_ENC32(L4_return_fnew_pnt, ICLASS_LD" 011 0 000

[PULL v2 25/44] Hexagon (target/hexagon) Make special new_value for USR

2023-05-18 Thread Taylor Simpson

Precursor to moving new_value from the global state to DisasContext.

USR will need to stay in the global state because some helpers will
set its value.

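SET_USR_FIELD, updated in macros.h by this patch, inserts a bit field into USR; when the packet needs a commit it now targets the dedicated new_value_usr slot instead of new_value[HEX_REG_USR]. A plain-C model of that update follows, with hypothetical names. The macro's else branch is cut off in this excerpt, so the direct update of USR in the sketch is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Replace the width-bit field of reg at offset with val (like fINSERT_BITS) */
static uint32_t insert_bits(uint32_t reg, unsigned width, unsigned offset,
                            uint32_t val)
{
    assert(width > 0 && width < 32 && offset + width <= 32);
    uint32_t mask = ((1u << width) - 1) << offset;
    return (reg & ~mask) | ((val << offset) & mask);
}

/* Hypothetical stand-in for the relevant fields of CPUHexagonState */
typedef struct {
    uint32_t usr;
    uint32_t new_value_usr;
} UsrState;

static void set_usr_field(UsrState *s, bool pkt_need_commit,
                          unsigned width, unsigned offset, uint32_t val)
{
    /* Assumption: without a commit, USR itself is updated in place */
    uint32_t *dst = pkt_need_commit ? &s->new_value_usr : &s->usr;
    *dst = insert_bits(*dst, width, offset, val);
}
```

Keeping new_value_usr as its own field is what lets USR remain in the global state once the other new_value entries move to DisasContext.
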
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-17-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h|  1 +
 target/hexagon/genptr.h |  1 +
 target/hexagon/macros.h |  2 +-
 target/hexagon/translate.h  |  1 +
 target/hexagon/genptr.c |  8 ++--
 target/hexagon/translate.c  | 22 +++---
 target/hexagon/README   |  2 +-
 target/hexagon/gen_tcg_funcs.py |  2 +-
 8 files changed, 27 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 631bfdbe9c..f86c9f0131 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -90,6 +90,7 @@ typedef struct CPUArchState {
 
 uint8_t slot_cancelled;
 target_ulong new_value[TOTAL_PER_THREAD_REGS];
+target_ulong new_value_usr;
 
 /*
  * Only used when HEX_DEBUG is on, but unconditionally included
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index e11ccc2358..a4b43c2910 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,6 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
+TCGv get_result_gpr(DisasContext *ctx, int rnum);
 TCGv get_result_pred(DisasContext *ctx, int pnum);
 void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
diff --git a/target/hexagon/macros.h b/target/hexagon/macros.h
index 54562cccb0..828874f318 100644
--- a/target/hexagon/macros.h
+++ b/target/hexagon/macros.h
@@ -46,7 +46,7 @@
 #define SET_USR_FIELD(FIELD, VAL) \
 do { \
 if (pkt_need_commit) { \
-fINSERT_BITS(env->new_value[HEX_REG_USR], \
+fINSERT_BITS(env->new_value_usr, \
 reg_field_info[FIELD].width, \
 reg_field_info[FIELD].offset, (VAL)); \
 } else { \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 26bcae0395..4c17433a6f 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -191,6 +191,7 @@ extern TCGv hex_this_PC;
 extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 extern TCGv hex_new_pred_value[NUM_PREGS];
 extern TCGv hex_pred_written;
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 5eb0d58659..bfcb962a3d 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -68,10 +68,14 @@ static inline void gen_masked_reg_write(TCGv new_val, TCGv cur_val,
 }
 }
 
-static TCGv get_result_gpr(DisasContext *ctx, int rnum)
+TCGv get_result_gpr(DisasContext *ctx, int rnum)
 {
 if (ctx->need_commit) {
-return hex_new_value[rnum];
+if (rnum == HEX_REG_USR) {
+return hex_new_value_usr;
+} else {
+return hex_new_value[rnum];
+}
 } else {
 return hex_gpr[rnum];
 }
diff --git a/target/hexagon/translate.c b/target/hexagon/translate.c
index fe85edc1ec..e73c0066dd 100644
--- a/target/hexagon/translate.c
+++ b/target/hexagon/translate.c
@@ -45,6 +45,7 @@ TCGv hex_this_PC;
 TCGv hex_slot_cancelled;
 TCGv hex_branch_taken;
 TCGv hex_new_value[TOTAL_PER_THREAD_REGS];
+TCGv hex_new_value_usr;
 TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
 TCGv hex_new_pred_value[NUM_PREGS];
 TCGv hex_pred_written;
@@ -547,12 +548,12 @@ static void gen_start_packet(DisasContext *ctx)
 tcg_gen_movi_tl(hex_pred_written, 0);
 }
 
-/* Preload the predicated registers into hex_new_value[i] */
+/* Preload the predicated registers into get_result_gpr(ctx, i) */
 if (ctx->need_commit &&
 !bitmap_empty(ctx->predicated_regs, TOTAL_PER_THREAD_REGS)) {
 int i = find_first_bit(ctx->predicated_regs, TOTAL_PER_THREAD_REGS);
 while (i < TOTAL_PER_THREAD_REGS) {
-tcg_gen_mov_tl(hex_new_value[i], hex_gpr[i]);
+tcg_gen_mov_tl(get_result_gpr(ctx, i), hex_gpr[i]);
 i = find_next_bit(ctx->predicated_regs, TOTAL_PER_THREAD_REGS,
   i + 1);
 }
@@ -667,7 +668,7 @@ static void gen_reg_writes(DisasContext *ctx)
 for (i = 0; i < ctx->reg_log_idx; i++) {
 int reg_num = ctx->reg_log[i];
 
-tcg_gen_mov_tl(hex_gpr[reg_num], hex_new_value[reg_num]);
+tcg_gen_mov_tl(hex_gpr[reg_num], get_result_gpr(ctx, reg_num));
 
 /*
  * ctx->is_tight_loop is set when SA0 points to the beginning of the TB.
@@ -1180,10 +1181,14 @@ void hexagon_translate_init(void)

[PULL v2 06/44] Hexagon (target/hexagon) Add v69 HVX instructions

2023-05-18 Thread Taylor Simpson

The following instructions are added
V6_vasrvuhubrndsat
V6_vasrvuhubsat
V6_vasrvwuhrndsat
V6_vasrvwuhsat
V6_vassign_tmp
V6_vcombine_tmp
V6_vmpyuhvs

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-7-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg_hvx.h | 12 ++
 target/hexagon/attribs_def.h.inc |  8 
 target/hexagon/imported/mmvec/encode_ext.def |  8 
 target/hexagon/imported/mmvec/ext.idef   | 40 
 4 files changed, 68 insertions(+)

diff --git a/target/hexagon/gen_tcg_hvx.h b/target/hexagon/gen_tcg_hvx.h
index d4aefe8e3f..8dceead5e5 100644
--- a/target/hexagon/gen_tcg_hvx.h
+++ b/target/hexagon/gen_tcg_hvx.h
@@ -128,6 +128,18 @@ static inline void assert_vhist_tmp(DisasContext *ctx)
 tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
  sizeof(MMVector), sizeof(MMVector))
 
+#define fGEN_TCG_V6_vassign_tmp(SHORTCODE) \
+tcg_gen_gvec_mov(MO_64, VdV_off, VuV_off, \
+ sizeof(MMVector), sizeof(MMVector))
+
+#define fGEN_TCG_V6_vcombine_tmp(SHORTCODE) \
+do { \
+tcg_gen_gvec_mov(MO_64, VddV_off, VvV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+tcg_gen_gvec_mov(MO_64, VddV_off + sizeof(MMVector), VuV_off, \
+ sizeof(MMVector), sizeof(MMVector)); \
+} while (0)
+
 /* Vector conditional move */
 #define fGEN_TCG_VEC_CMOV(PRED) \
 do { \
diff --git a/target/hexagon/attribs_def.h.inc b/target/hexagon/attribs_def.h.inc
index 0ddfb45bdf..3bef60bef3 100644
--- a/target/hexagon/attribs_def.h.inc
+++ b/target/hexagon/attribs_def.h.inc
@@ -69,11 +69,13 @@ DEF_ATTRIB(CVI_VP_VS, "Double vector permute/shft insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VX, "Multiply instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_VX_DV, "Double vector multiply insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VS, "Shift instruction executes on HVX", "", "")
+DEF_ATTRIB(CVI_VS_3SRC, "This shift needs to borrow a source register", "", "")
+DEF_ATTRIB(CVI_VS_VX, "Permute/shift and multiply insn executes on HVX", "", "")
 DEF_ATTRIB(CVI_VA, "ALU instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_VA_DV, "Double vector alu instruction executes on HVX", "", "")
 DEF_ATTRIB(CVI_4SLOT, "Consumes all the vector execution resources", "", "")
 DEF_ATTRIB(CVI_TMP, "Transient Memory Load not written to register", "", "")
+DEF_ATTRIB(CVI_REMAP, "Register Renaming not written to register file", "", "")
 DEF_ATTRIB(CVI_GATHER, "CVI Gather operation", "", "")
 DEF_ATTRIB(CVI_SCATTER, "CVI Scatter operation", "", "")
 DEF_ATTRIB(CVI_SCATTER_RELEASE, "CVI Store Release for scatter", "", "")
@@ -147,6 +149,8 @@ DEF_ATTRIB(L2FETCH, "Instruction is l2fetch type", "", "")
 DEF_ATTRIB(ICINVA, "icinva", "", "")
 DEF_ATTRIB(DCCLEANINVA, "dccleaninva", "", "")
 
+DEF_ATTRIB(NO_INTRINSIC, "Don't generate an intrinsic", "", "")
+
 /* Documentation Notes */
 DEF_ATTRIB(NOTE_CONDITIONAL, "can be conditionally executed", "", "")
 DEF_ATTRIB(NOTE_NEWVAL_SLOT0, "New-value oprnd must execute on slot 0", "", "")
@@ -155,7 +159,11 @@ DEF_ATTRIB(NOTE_NOPACKET, "solo instruction", "", "")
 DEF_ATTRIB(NOTE_AXOK, "May only be grouped with ALU32 or non-FP XTYPE.", "", "")
 DEF_ATTRIB(NOTE_LATEPRED, "The predicate can not be used as a .new", "", "")
 DEF_ATTRIB(NOTE_NVSLOT0, "Can execute only in slot 0 (ST)", "", "")
+DEF_ATTRIB(NOTE_NOVP, "Cannot be paired with a HVX permute instruction", "", "")
+DEF_ATTRIB(NOTE_VA_UNARY, "Combined with HVX ALU op (must be unary)", "", "")
 
+/* V6 MMVector Notes for Documentation */
+DEF_ATTRIB(NOTE_SHIFT_RESOURCE, "Uses the HVX shift resource.", "", "")
 /* Restrictions to make note of */
 DEF_ATTRIB(RESTRICT_NOSLOT1_STORE, "Packet must not have slot 1 store", "", "")
 DEF_ATTRIB(RESTRICT_LATEPRED, "Predicate can not be used as a .new.", "", "")
diff --git a/target/hexagon/imported/mmvec/encode_ext.def b/target/hexagon/imported/mmvec/encode_ext.def
index b9b62fef8d..402438f566 100644
--- a/target/hexagon/imported/mmvec/encode_ext.def
+++ b/target/hexagon/imported/mmvec/encode_ext.def
@@ -257,6 +257,11 @@ DEF_ENC(V6_vasruhubrndsat, ICLASS_CJ" 1 000 vvv vvttt PP 0 u 111 ddd
 DEF_ENC(V6_vasruwuhsat, ICLASS_CJ" 1 000 vvv vvttt PP 1 u 100 d") //
 DEF_ENC(V6_vasruhubsat,ICLASS_CJ" 1 000 vvv vvttt PP 1 u 101 d") //
 
+DEF_ENC(V6_vasrvuhubrndsat,"00011101000vPP0u011d")
+DEF_ENC(V6_vasrvuhubsat,"00011101000vPP0u010d")
+DEF_ENC(V6_vasrvwuhrndsat,"00011101000vPP0u001d")
+DEF_ENC(V6_vasrvwuhsat,"00011101000vPP0u000d")
+
 /***
 *
 *  Group #1, Uses Q6 Rt32
@@ -716,6 +721,7 @@ DEF_ENC(V6_vaddclbw,ICLASS_CJ" 1 111 000 v PP 1 u 001 d") //
 
 DEF_ENC(V6_vavguw,ICLASS_CJ" 1 111 000 v PP 1 u 010 d") //

[PULL v2 00/44] Hexagon update

2023-05-18 Thread Taylor Simpson

The following changes since commit 278238505d28d292927bff7683f39fb4fbca7fd1:

  Merge tag 'pull-tcg-20230511-2' of https://gitlab.com/rth7680/qemu into staging (2023-05-11 11:44:23 +0100)

are available in the Git repository at:

  https://github.com/quic/qemu tags/pull-hex-20230518-1

for you to fetch changes up to 9073bfd725440da0af44f1ee1e3bcf72e9de39b6:

  Hexagon (linux-user/hexagon): handle breakpoints (2023-05-18 12:40:52 -0700)


 Changes in v2 
Fix break in 32-bit host build

This PR can be broken down into the following parts
- Add support for new architecture versions v68/v69/v71/v73
- Short-circuit writes to temporaries when packet semantics permit this
- Move bookkeeping items from CPUHexagonState to DisasContext
- Correct '-cpu help' output and handling of unknown Hexagon versions
- Enable LLDB debugging
- Miscellaneous fixes and improvements
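To make the "short-circuit writes" item above concrete, here is a deliberately simplified C model, assuming the only condition is that no register written by the packet is also read by it (so no read can observe a value written earlier in the same packet). The real packet analysis in target/hexagon/translate.c may consider additional conditions; toy_regset, toy_insn and toy_can_short_circuit are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical: one bit per register, up to 64 registers. */
typedef uint64_t toy_regset;

struct toy_insn {
    toy_regset reads;  /* registers this instruction reads */
    toy_regset writes; /* registers this instruction writes */
};

/*
 * Conservative model: results may go straight to the register file only
 * if the union of writes in the packet does not intersect the union of
 * reads. Otherwise results must be staged and committed at packet end.
 */
static bool toy_can_short_circuit(const struct toy_insn *pkt, int n)
{
    toy_regset reads = 0, writes = 0;
    for (int i = 0; i < n; i++) {
        reads |= pkt[i].reads;
        writes |= pkt[i].writes;
    }
    return (reads & writes) == 0;
}
```

A register swap packet such as { R0 = R1; R1 = R0 } cannot be short-circuited in this model, while two instructions touching disjoint registers can.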


Brian Cain (1):
  Hexagon (gdbstub): fix p3:0 read and write via stub

Marco Liebel (1):
  Remove test_vshuff from hvx_misc tests

Matheus Tavares Bernardino (9):
  Hexagon (target/hexagon/*.py): raise exception on reg parsing error
  Hexagon: list available CPUs with `-cpu help`
  Hexagon: append eflags to unknown cpu model string
  Hexagon (iclass): update J4_hintjumpr slot constraints
  Hexagon (decode): look for pkts with multiple insns at the same slot
  gdbstub: only send stop-reply packets when allowed to
  gdbstub: add test for untimely stop-reply packets
  Hexagon: add core gdbstub xml data for LLDB
  Hexagon (linux-user/hexagon): handle breakpoints

Paolo Bonzini (1):
  target/hexagon: fix = vs. == mishap

Taylor Simpson (32):
  Hexagon (target/hexagon) Add support for v68/v69/v71/v73
  Hexagon (target/hexagon) Add v68 scalar instructions
  Hexagon (tests/tcg/hexagon) Add v68 scalar tests
  Hexagon (target/hexagon) Add v68 HVX instructions
  Hexagon (tests/tcg/hexagon) Add v68 HVX tests
  Hexagon (target/hexagon) Add v69 HVX instructions
  Hexagon (tests/tcg/hexagon) Add v69 HVX tests
  Hexagon (target/hexagon) Add v73 scalar instructions
  Hexagon (tests/tcg/hexagon) Add v73 scalar tests
  meson.build Add CONFIG_HEXAGON_IDEF_PARSER
  Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write
  Hexagon (target/hexagon) Add overrides for loop setup instructions
  Hexagon (target/hexagon) Add overrides for allocframe/deallocframe
  Hexagon (target/hexagon) Add overrides for clr[tf]new
  Hexagon (target/hexagon) Remove log_reg_write from op_helper.[ch]
  Hexagon (target/hexagon) Eliminate uses of log_pred_write function
  Hexagon (target/hexagon) Clean up pred_written usage
  Hexagon (target/hexagon) Don't overlap dest writes with source reads
  Hexagon (target/hexagon) Mark registers as read during packet analysis
  Hexagon (target/hexagon) Short-circuit packet register writes
  Hexagon (target/hexagon) Short-circuit packet predicate writes
  Hexagon (target/hexagon) Short-circuit packet HVX writes
  Hexagon (target/hexagon) Short-circuit more HVX single instruction packets
  Hexagon (target/hexagon) Add overrides for disabled idef-parser insns
  Hexagon (target/hexagon) Make special new_value for USR
  Hexagon (target/hexagon) Move new_value to DisasContext
  Hexagon (target/hexagon) Move new_pred_value to DisasContext
  Hexagon (target/hexagon) Move pred_written to DisasContext
  Hexagon (target/hexagon) Move pkt_has_store_s1 to DisasContext
  Hexagon (target/hexagon) Move items to DisasContext
  Hexagon (target/hexagon) Additional instructions handled by idef-parser
  Hexagon (gdbstub): add HVX support

 MAINTAINERS|   1 +
 configure  |   2 +-
 configs/targets/hexagon-linux-user.mak |   1 +
 meson.build|   1 +
 gdbstub/internals.h|   5 +
 linux-user/hexagon/target_elf.h|  20 +-
 target/hexagon/cpu.h   |  17 +-
 target/hexagon/gen_tcg.h   | 138 +++-
 target/hexagon/gen_tcg_hvx.h   |  35 +++
 target/hexagon/genptr.h|   6 +-
 target/hexagon/helper.h|   6 +-
 target/hexagon/idef-parser/parser-helpers.h|   2 +-
 target/hexagon/internal.h  |   2 +
 target/hexagon/macros.h|  57 ++--
 target/hexagon/mmvec/macros.h  |   9 +-
 target/hexagon/op_helper.h |  16 +-
 target/hexagon/translate.h |  52 ++-
 target/hexagon/attribs_def.h.inc   |  22 +-
 gdbstub/gdbstub.c

[PULL v2 27/44] Hexagon (target/hexagon) Move new_pred_value to DisasContext

2023-05-18 Thread Taylor Simpson

The new_pred_value array in the CPUHexagonState is only used for
bookkeeping within the translation of a packet.  With recent changes
that eliminate the need to free TCGv variables, it makes more sense for
these values to be transient and kept in DisasContext.
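The get_result_pred() change in this patch allocates ctx->new_pred_value[pnum] on first use and initializes it to 0. A plain-C model of that lazy-allocation pattern follows, assuming (hypothetically) that the per-packet array is cleared at the start of each packet; toy_temp and its pool stand in for tcg_temp_new() and are not QEMU APIs.

```c
#include <stddef.h>

#define TOY_NUM_PREGS 4

/* Hypothetical stand-in for a TCG temporary. */
typedef struct { int value; int id; } toy_temp;

struct toy_ctx {
    toy_temp pool[TOY_NUM_PREGS];
    int next_id;                             /* counts allocations */
    toy_temp *new_pred_value[TOY_NUM_PREGS]; /* NULL until first use */
};

/* Reset per-packet bookkeeping (assumed to happen at packet start). */
static void toy_start_packet(struct toy_ctx *ctx)
{
    for (int i = 0; i < TOY_NUM_PREGS; i++) {
        ctx->new_pred_value[i] = NULL;
    }
}

/* Allocate and zero the temporary only the first time it is requested. */
static toy_temp *toy_result_pred(struct toy_ctx *ctx, int pnum)
{
    if (ctx->new_pred_value[pnum] == NULL) {
        toy_temp *t = &ctx->pool[pnum];
        t->id = ctx->next_id++;
        t->value = 0;
        ctx->new_pred_value[pnum] = t;
    }
    return ctx->new_pred_value[pnum];
}
```

Repeated requests for the same predicate within a packet return the same temporary, and predicates the packet never writes cost nothing.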

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-19-tsimp...@quicinc.com>
---
 target/hexagon/cpu.h|  1 -
 target/hexagon/gen_tcg.h| 12 ++--
 target/hexagon/translate.h  |  2 +-
 target/hexagon/genptr.c | 10 +++---
 target/hexagon/idef-parser/parser-helpers.c |  2 +-
 target/hexagon/op_helper.c  |  2 +-
 target/hexagon/translate.c  | 16 ++--
 target/hexagon/gen_tcg_funcs.py |  2 +-
 8 files changed, 23 insertions(+), 24 deletions(-)

diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
index 0ef6d717d0..2b4f77fb8e 100644
--- a/target/hexagon/cpu.h
+++ b/target/hexagon/cpu.h
@@ -98,7 +98,6 @@ typedef struct CPUArchState {
 target_ulong this_PC;
 target_ulong reg_written[TOTAL_PER_THREAD_REGS];
 
-target_ulong new_pred_value[NUM_PREGS];
 target_ulong pred_written;
 
 MemLog mem_log_stores[STORES_MAX];
diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index ed2c1ccc46..d78d99d155 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -581,9 +581,9 @@
 #define fGEN_TCG_SL2_return_f(SHORTCODE) \
 gen_cond_return_subinsn(ctx, TCG_COND_NE, hex_pred[0])
 #define fGEN_TCG_SL2_return_tnew(SHORTCODE) \
-gen_cond_return_subinsn(ctx, TCG_COND_EQ, hex_new_pred_value[0])
+gen_cond_return_subinsn(ctx, TCG_COND_EQ, ctx->new_pred_value[0])
 #define fGEN_TCG_SL2_return_fnew(SHORTCODE) \
-gen_cond_return_subinsn(ctx, TCG_COND_NE, hex_new_pred_value[0])
+gen_cond_return_subinsn(ctx, TCG_COND_NE, ctx->new_pred_value[0])
 
 /*
  * Mathematical operations with more than one definition require
@@ -1122,7 +1122,7 @@
 #define fGEN_TCG_SA1_clrtnew(SHORTCODE) \
 do { \
 tcg_gen_movcond_tl(TCG_COND_EQ, RdV, \
-   hex_new_pred_value[0], tcg_constant_tl(0), \
+   ctx->new_pred_value[0], tcg_constant_tl(0), \
RdV, tcg_constant_tl(0)); \
 } while (0)
 
@@ -1130,7 +1130,7 @@
 #define fGEN_TCG_SA1_clrfnew(SHORTCODE) \
 do { \
 tcg_gen_movcond_tl(TCG_COND_NE, RdV, \
-   hex_new_pred_value[0], tcg_constant_tl(0), \
+   ctx->new_pred_value[0], tcg_constant_tl(0), \
RdV, tcg_constant_tl(0)); \
 } while (0)
 
@@ -1157,9 +1157,9 @@
 gen_cond_jumpr31(ctx, TCG_COND_NE, hex_pred[0])
 
 #define fGEN_TCG_SL2_jumpr31_tnew(SHORTCODE) \
-gen_cond_jumpr31(ctx, TCG_COND_EQ, hex_new_pred_value[0])
+gen_cond_jumpr31(ctx, TCG_COND_EQ, ctx->new_pred_value[0])
 #define fGEN_TCG_SL2_jumpr31_fnew(SHORTCODE) \
-gen_cond_jumpr31(ctx, TCG_COND_NE, hex_new_pred_value[0])
+gen_cond_jumpr31(ctx, TCG_COND_NE, ctx->new_pred_value[0])
 
 /* Count trailing zeros/ones */
 #define fGEN_TCG_S2_ct0(SHORTCODE) \
diff --git a/target/hexagon/translate.h b/target/hexagon/translate.h
index 6dde487566..fdfa1b6fe3 100644
--- a/target/hexagon/translate.h
+++ b/target/hexagon/translate.h
@@ -70,6 +70,7 @@ typedef struct DisasContext {
 bool short_circuit;
 bool has_hvx_helper;
 TCGv new_value[TOTAL_PER_THREAD_REGS];
+TCGv new_pred_value[NUM_PREGS];
 } DisasContext;
 
 static inline void ctx_log_pred_write(DisasContext *ctx, int pnum)
@@ -193,7 +194,6 @@ extern TCGv hex_slot_cancelled;
 extern TCGv hex_branch_taken;
 extern TCGv hex_new_value_usr;
 extern TCGv hex_reg_written[TOTAL_PER_THREAD_REGS];
-extern TCGv hex_new_pred_value[NUM_PREGS];
 extern TCGv hex_pred_written;
 extern TCGv hex_store_addr[STORES_MAX];
 extern TCGv hex_store_width[STORES_MAX];
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 37210e6f09..1f69f4f922 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -121,7 +121,11 @@ static void gen_log_reg_write_pair(DisasContext *ctx, int rnum, TCGv_i64 val)
 TCGv get_result_pred(DisasContext *ctx, int pnum)
 {
 if (ctx->need_commit) {
-return hex_new_pred_value[pnum];
+if (ctx->new_pred_value[pnum] == NULL) {
+ctx->new_pred_value[pnum] = tcg_temp_new();
+tcg_gen_movi_tl(ctx->new_pred_value[pnum], 0);
+}
+return ctx->new_pred_value[pnum];
 } else {
 return hex_pred[pnum];
 }
@@ -607,7 +611,7 @@ static void gen_cmpnd_cmp_jmp(DisasContext *ctx,
 gen_log_pred_write(ctx, pnum, pred);
 } else {
 TCGv pred = tcg_temp_new();
-tcg_gen_mov_tl(pred, hex_new_pred_value[pnum]);
+tcg_gen_mov_tl(pred, ctx->new_pred_value[pnum]);
 gen_cond_jump(ctx, cond2, pred, pc_off);

[PULL v2 32/44] target/hexagon: fix = vs. == mishap

2023-05-18 Thread Taylor Simpson

From: Paolo Bonzini 

 Changes in v2 
Fix yyassert's for sign and zero extends

Coverity reports a parameter that is "set but never used".  This is caused
by an assignment operator being used instead of equality.
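The bug reduces to the conditional expression in gen_extend_op(). The standalone functions below (hypothetical names) show that the assignment form always yields 64, because `dst_width = 64` evaluates to 64, which is true, while the comparison form behaves as intended.

```c
/* Buggy form: assigns 64 to dst_width, so the condition is always true. */
static unsigned bit_width_buggy(unsigned dst_width)
{
    unsigned bit_width = (dst_width = 64) ? 64 : 32;
    return bit_width;
}

/* Fixed form: compares instead of assigning. */
static unsigned bit_width_fixed(unsigned dst_width)
{
    return (dst_width == 64) ? 64 : 32;
}
```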

Co-authored-by: Taylor Simpson 
Signed-off-by: Paolo Bonzini 
Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Tested-by: Anton Johansson 
Message-Id: <20230428204411.1400931-1-tsimp...@quicinc.com>
---
 target/hexagon/idef-parser/parser-helpers.c | 2 +-
 target/hexagon/idef-parser/idef-parser.y| 4 ++--
 2 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
index 9550097269..7b5ebafec2 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -1120,7 +1120,7 @@ HexValue gen_extend_op(Context *c,
HexValue *value,
HexSignedness signedness)
 {
-unsigned bit_width = (dst_width = 64) ? 64 : 32;
+unsigned bit_width = (dst_width == 64) ? 64 : 32;
 HexValue value_m = *value;
 HexValue src_width_m = *src_width;
 
diff --git a/target/hexagon/idef-parser/idef-parser.y b/target/hexagon/idef-parser/idef-parser.y
index 5f3907eb28..5c983954ed 100644
--- a/target/hexagon/idef-parser/idef-parser.y
+++ b/target/hexagon/idef-parser/idef-parser.y
@@ -683,7 +683,7 @@ rvalue : FAIL
  yyassert(c, &@1, $5.type == IMMEDIATE &&
   $5.imm.type == VALUE,
   "SXT expects immediate values\n");
- $$ = gen_extend_op(c, &@1, &$3, $5.imm.value, &$7, SIGNED);
+ $$ = gen_extend_op(c, &@1, &$3, 64, &$7, SIGNED);
  }
| ZXT '(' rvalue ',' IMM ',' rvalue ')'
  {
@@ -691,7 +691,7 @@ rvalue : FAIL
  yyassert(c, &@1, $5.type == IMMEDIATE &&
   $5.imm.type == VALUE,
   "ZXT expects immediate values\n");
- $$ = gen_extend_op(c, &@1, &$3, $5.imm.value, &$7, UNSIGNED);
+ $$ = gen_extend_op(c, &@1, &$3, 64, &$7, UNSIGNED);
  }
| '(' rvalue ')'
  {
-- 
2.25.1

[PULL v2 11/44] Hexagon (target/hexagon) Add DisasContext arg to gen_log_reg_write

2023-05-18 Thread Taylor Simpson

Add DisasContext arg to gen_log_reg_write_pair also

Signed-off-by: Taylor Simpson 
Reviewed-by: Richard Henderson 
Message-Id: <20230427230012.3800327-3-tsimp...@quicinc.com>
---
 target/hexagon/gen_tcg.h|  2 +-
 target/hexagon/genptr.h |  2 +-
 target/hexagon/genptr.c | 10 +-
 target/hexagon/idef-parser/parser-helpers.c |  2 +-
 target/hexagon/README   |  2 +-
 target/hexagon/gen_tcg_funcs.py |  8 +---
 6 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/target/hexagon/gen_tcg.h b/target/hexagon/gen_tcg.h
index 6f12f665db..d4bd38810e 100644
--- a/target/hexagon/gen_tcg.h
+++ b/target/hexagon/gen_tcg.h
@@ -515,7 +515,7 @@
 do { \
 TCGv_i64 RddV = get_result_gpr_pair(ctx, HEX_REG_FP); \
 gen_return(ctx, RddV, hex_gpr[HEX_REG_FP]); \
-gen_log_reg_write_pair(HEX_REG_FP, RddV); \
+gen_log_reg_write_pair(ctx, HEX_REG_FP, RddV); \
 } while (0)
 
 /*
diff --git a/target/hexagon/genptr.h b/target/hexagon/genptr.h
index 76e497aa48..75d0fc262d 100644
--- a/target/hexagon/genptr.h
+++ b/target/hexagon/genptr.h
@@ -35,7 +35,7 @@ void gen_store4i(TCGv_env cpu_env, TCGv vaddr, int32_t src, uint32_t slot);
 void gen_store8i(TCGv_env cpu_env, TCGv vaddr, int64_t src, uint32_t slot);
 TCGv gen_read_reg(TCGv result, int num);
 TCGv gen_read_preg(TCGv pred, uint8_t num);
-void gen_log_reg_write(int rnum, TCGv val);
+void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val);
 void gen_log_pred_write(DisasContext *ctx, int pnum, TCGv val);
 void gen_set_usr_field(DisasContext *ctx, int field, TCGv val);
 void gen_set_usr_fieldi(DisasContext *ctx, int field, int x);
diff --git a/target/hexagon/genptr.c b/target/hexagon/genptr.c
index 244063b1d2..dd707a9dc7 100644
--- a/target/hexagon/genptr.c
+++ b/target/hexagon/genptr.c
@@ -81,7 +81,7 @@ static TCGv_i64 get_result_gpr_pair(DisasContext *ctx, int rnum)
 return result;
 }
 
-void gen_log_reg_write(int rnum, TCGv val)
+void gen_log_reg_write(DisasContext *ctx, int rnum, TCGv val)
 {
 const target_ulong reg_mask = reg_immut_masks[rnum];
 
@@ -93,7 +93,7 @@ void gen_log_reg_write(int rnum, TCGv val)
 }
 }
 
-static void gen_log_reg_write_pair(int rnum, TCGv_i64 val)
+static void gen_log_reg_write_pair(DisasContext *ctx, int rnum, TCGv_i64 val)
 {
 const target_ulong reg_mask_low = reg_immut_masks[rnum];
 const target_ulong reg_mask_high = reg_immut_masks[rnum + 1];
@@ -231,7 +231,7 @@ static inline void gen_write_ctrl_reg(DisasContext *ctx, int reg_num,
 if (reg_num == HEX_REG_P3_0_ALIASED) {
 gen_write_p3_0(ctx, val);
 } else {
-gen_log_reg_write(reg_num, val);
+gen_log_reg_write(ctx, reg_num, val);
 if (reg_num == HEX_REG_QEMU_PKT_CNT) {
 ctx->num_packets = 0;
 }
@@ -255,7 +255,7 @@ static inline void gen_write_ctrl_reg_pair(DisasContext *ctx, int reg_num,
 tcg_gen_extrh_i64_i32(val32, val);
 tcg_gen_mov_tl(result, val32);
 } else {
-gen_log_reg_write_pair(reg_num, val);
+gen_log_reg_write_pair(ctx, reg_num, val);
 if (reg_num == HEX_REG_QEMU_PKT_CNT) {
 ctx->num_packets = 0;
 ctx->num_insns = 0;
@@ -719,7 +719,7 @@ static void gen_cond_return_subinsn(DisasContext *ctx, TCGCond cond, TCGv pred)
 {
 TCGv_i64 RddV = get_result_gpr_pair(ctx, HEX_REG_FP);
 gen_cond_return(ctx, RddV, hex_gpr[HEX_REG_FP], pred, cond);
-gen_log_reg_write_pair(HEX_REG_FP, RddV);
+gen_log_reg_write_pair(ctx, HEX_REG_FP, RddV);
 }
 
 static void gen_endloop0(DisasContext *ctx)
diff --git a/target/hexagon/idef-parser/parser-helpers.c b/target/hexagon/idef-parser/parser-helpers.c
index 8734218e51..09161e394d 100644
--- a/target/hexagon/idef-parser/parser-helpers.c
+++ b/target/hexagon/idef-parser/parser-helpers.c
@@ -1318,7 +1318,7 @@ void gen_write_reg(Context *c, YYLTYPE *locp, HexValue *reg, HexValue *value)
 value_m = rvalue_materialize(c, locp, &value_m);
 OUT(c,
 locp,
-"gen_log_reg_write(", &reg->reg.id, ", ",
+"gen_log_reg_write(ctx, ", &reg->reg.id, ", ",
 &value_m, ");\n");
 }
 
diff --git a/target/hexagon/README b/target/hexagon/README
index 0f48da9328..f86850ba73 100644
--- a/target/hexagon/README
+++ b/target/hexagon/README
@@ -87,7 +87,7 @@ tcg_funcs_generated.c.inc
 TCGv RsV = hex_gpr[insn->regno[1]];
 TCGv RtV = hex_gpr[insn->regno[2]];
 gen_helper_A2_add(RdV, cpu_env, RsV, RtV);
-gen_log_reg_write(RdN, RdV);
+gen_log_reg_write(ctx, RdN, RdV);
 }
 
 helper_funcs_generated.c.inc
diff --git a/target/hexagon/gen_tcg_funcs.py b/target/hexagon/gen_tcg_funcs.py
index fcb3384480..d9ccbe63f6 100755
--- a/target/hexagon/gen_tcg_funcs.py
+++ b/target/hexagon/gen_tcg_funcs.py
@@ -387,7 +387,8 @@ def gen_helper_call_imm(f, immlett):
 
 
 def genptr_dst_write_pair(f, tag, regtype, regid):
-f.write(f"

[PULL v2 05/44] Hexagon (tests/tcg/hexagon) Add v68 HVX tests

2023-05-18 Thread Taylor Simpson

Signed-off-by: Taylor Simpson 
Reviewed-by: Anton Johansson 
Message-Id: <20230427224057.3766963-6-tsimp...@quicinc.com>
---
 tests/tcg/hexagon/v68_hvx.c   |  90 +
 tests/tcg/hexagon/v6mpy_ref.c.inc | 161 ++
 tests/tcg/hexagon/Makefile.target |   3 +
 3 files changed, 254 insertions(+)
 create mode 100644 tests/tcg/hexagon/v68_hvx.c
 create mode 100644 tests/tcg/hexagon/v6mpy_ref.c.inc

diff --git a/tests/tcg/hexagon/v68_hvx.c b/tests/tcg/hexagon/v68_hvx.c
new file mode 100644
index 00..02718722a3
--- /dev/null
+++ b/tests/tcg/hexagon/v68_hvx.c
@@ -0,0 +1,90 @@
+/*
+ *  Copyright(c) 2022-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+
+int err;
+
+#include "hvx_misc.h"
+
+MMVector v6mpy_buffer0[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+MMVector v6mpy_buffer1[BUFSIZE] __attribute__((aligned(MAX_VEC_SIZE_BYTES)));
+
+static void init_v6mpy_buffers(void)
+{
+int counter0 = 0;
+int counter1 = 17;
+for (int i = 0; i < BUFSIZE; i++) {
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+v6mpy_buffer0[i].w[j] = counter0++;
+v6mpy_buffer1[i].w[j] = counter1++;
+}
+}
+}
+
+int v6mpy_ref[BUFSIZE][MAX_VEC_SIZE_BYTES / 4] = {
+#include "v6mpy_ref.c.inc"
+};
+
+static void test_v6mpy(void)
+{
+void *p00 = buffer0;
+void *p01 = v6mpy_buffer0;
+void *p10 = buffer1;
+void *p11 = v6mpy_buffer1;
+void *pout = output;
+
+memset(expect, 0xff, sizeof(expect));
+memset(output, 0xff, sizeof(expect));
+
+for (int i = 0; i < BUFSIZE; i++) {
+asm("v2 = vmem(%0 + #0)\n\t"
+"v3 = vmem(%1 + #0)\n\t"
+"v4 = vmem(%2 + #0)\n\t"
+"v5 = vmem(%3 + #0)\n\t"
+"v5:4.w = v6mpy(v5:4.ub, v3:2.b, #1):v\n\t"
+"vmem(%4 + #0) = v4\n\t"
+: : "r"(p00), "r"(p01), "r"(p10), "r"(p11), "r"(pout)
+: "v2", "v3", "v4", "v5", "memory");
+p00 += sizeof(MMVector);
+p01 += sizeof(MMVector);
+p10 += sizeof(MMVector);
+p11 += sizeof(MMVector);
+pout += sizeof(MMVector);
+
+for (int j = 0; j < MAX_VEC_SIZE_BYTES / 4; j++) {
+expect[i].w[j] = v6mpy_ref[i][j];
+}
+}
+
+check_output_w(__LINE__, BUFSIZE);
+}
+
+int main()
+{
+init_buffers();
+init_v6mpy_buffers();
+
+test_v6mpy();
+
+puts(err ? "FAIL" : "PASS");
+return err ? 1 : 0;
+}
diff --git a/tests/tcg/hexagon/v6mpy_ref.c.inc b/tests/tcg/hexagon/v6mpy_ref.c.inc
new file mode 100644
index 00..8258cddcb1
--- /dev/null
+++ b/tests/tcg/hexagon/v6mpy_ref.c.inc
@@ -0,0 +1,161 @@
+/*
+ *  Copyright(c) 2021-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
+ *
+ *  This program is free software; you can redistribute it and/or modify
+ *  it under the terms of the GNU General Public License as published by
+ *  the Free Software Foundation; either version 2 of the License, or
+ *  (at your option) any later version.
+ *
+ *  This program is distributed in the hope that it will be useful,
+ *  but WITHOUT ANY WARRANTY; without even the implied warranty of
+ *  MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ *  GNU General Public License for more details.
+ *
+ *  You should have received a copy of the GNU General Public License
+ *  along with this program; if not, see .
+ */
+
+{ 0xee11, 0xfcca, 0xc1b3, 0xd0cc,
+  0xe215, 0xf58e, 0xaf37, 0xc310,
+  0xd919, 0xf152, 0x9fbb, 0xb854,
+  0xd31d, 0xf016, 0x933f, 0xb098,
+  0xd021, 0xf1da, 0x89c3, 0xabdc,
+  0xd025, 0xf69e, 0x8347, 0xaa20,
+  0xd329, 0xfe62, 0x7fcb, 0xab64,
+  0xd92d, 0x0926, 0x7f4f, 0xafa8,
+  },
+{ 0xe231, 0x16ea, 0x81d3, 0xb6ec,
+  0xee35, 0x27ae, 0x8757, 0xc130,
+  0xfd39, 0x3b72, 0x8fdb, 0xce74,
+  0x0f3d, 0x5236, 0x9b5f, 0xdeb8,
+  0x2441, 0x6bfa, 0xa9e3, 0xf1fc,
+  0x3c45, 0x88be, 0xbb67, 0x0840,
+  0x5749, 0xa882, 0xcfeb, 0xe684,
+  0x494d, 0x9a46, 0xb16f, 0x02c8,
+

Re: [PATCH 0/1] pcie: Allow atomic completion on PCIE root port

2023-05-18 Thread Michael S. Tsirkin

On Fri, Apr 21, 2023 at 06:06:49PM +0200, Robin Voetter wrote:
> 
> 
> On 4/21/23 10:22, Michael S. Tsirkin wrote:
> > On Thu, Apr 20, 2023 at 05:38:39PM +0200, ro...@streamhpc.com wrote:
> >> From: Robin Voetter 
> >>
> >> The ROCm driver for Linux uses PCIe atomics to schedule work and
> >> generally communicate between the host and the device.  This does not
> >> currently work in QEMU with regular vfio-pci passthrough, because the
> >> pcie-root-port does not advertise the PCIe atomic completer
> >> capabilities.  When initializing the GPU from the Linux driver, it
> >> queries whether the PCIe connection from the CPU to GPU supports the
> >> required capabilities[1] in the pci_enable_atomic_ops_to_root
> >> function[2].  Currently the only part where this fails is checking the
> >> atomic completer capabilities (32 and 64 bits) on the root port[3].  In
> >> this case, the driver determines that PCIe atomics are not supported at
> >> all, and this causes ROCm programs to misbehave.  (While AMD advertises
> >> that there is some support for ROCm without PCIe atomics, I have never
> >> actually gotten that working...)
> >>
> >> This patch allows ROCm to properly function by introducing an
> >> additional experimental property to the pcie-root-port,
> >> x-atomic-completion.
> > 
> > so what exactly makes it experimental? from this description
> > it looks like it actually has to be enabled for things to work?
> 
> I was not sure which would be appropriate, but I'm fine with making it a
> non-experimental option.

So I guess the real thing to do is to query this from vfio, right?
Unfortunately we don't have access to vfio when we
are creating the root port, but I think the thing to do would
be to check at the time when vfio is attached, and if
atomic is set but not supported, fail attaching vfio.

Right?
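The attach-time check proposed above could look roughly like the sketch below. Everything here is hypothetical: the function name, and the assumption that the host capability is available as a Device Capabilities 2 value. Which host register QEMU would actually query is the open question in this thread. The bit values match PCI_EXP_DEVCAP2_ATOMIC_COMP32/64 in Linux's pci_regs.h.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* Device Capabilities 2 AtomicOp completer bits (values as in pci_regs.h). */
#define TOY_DEVCAP2_ATOMIC_COMP32 0x00000080u
#define TOY_DEVCAP2_ATOMIC_COMP64 0x00000100u

/*
 * Hypothetical check run when a vfio device is attached: if the root port
 * was configured to advertise atomic completion but the host path lacks
 * 32- and 64-bit completer support, refuse the attach with an error.
 */
static bool toy_check_atomic_completion(bool root_port_atomic,
                                        uint32_t host_devcap2,
                                        char *err, size_t errlen)
{
    const uint32_t need = TOY_DEVCAP2_ATOMIC_COMP32 | TOY_DEVCAP2_ATOMIC_COMP64;

    if (root_port_atomic && (host_devcap2 & need) != need) {
        snprintf(err, errlen,
                 "atomic completion enabled on root port but not supported by host");
        return false;
    }
    return true;
}
```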

-- 
MST

Re: [REPOST PATCH v3 4/5] intel_iommu: allow Extended Interrupt Mode when using userspace APIC

2023-05-18 Thread Michael S. Tsirkin

On Tue, Apr 11, 2023 at 09:24:39PM +0700, Bui Quang Minh wrote:
> As userspace APIC now supports x2APIC, Intel interrupt remapping
> hardware can be set to EIM mode when userspace local APIC is used.
> 
> Signed-off-by: Bui Quang Minh 
> ---
>  hw/i386/intel_iommu.c | 11 ---
>  1 file changed, 11 deletions(-)
> 
> diff --git a/hw/i386/intel_iommu.c b/hw/i386/intel_iommu.c
> index a62896759c..fd7c16b852 100644
> --- a/hw/i386/intel_iommu.c
> +++ b/hw/i386/intel_iommu.c
> @@ -4045,17 +4045,6 @@ static bool vtd_decide_config(IntelIOMMUState *s, Error **errp)
>&& x86_iommu_ir_supported(x86_iommu) ?
>ON_OFF_AUTO_ON : ON_OFF_AUTO_OFF;
>  }
> -if (s->intr_eim == ON_OFF_AUTO_ON && !s->buggy_eim) {
> -if (!kvm_irqchip_is_split()) {
> -error_setg(errp, "eim=on requires accel=kvm,kernel-irqchip=split");
> -return false;
> -}
> -if (!kvm_enable_x2apic()) {
> -error_setg(errp, "eim=on requires support on the KVM side"
> - "(X2APIC_API, first shipped in v4.7)");
> -return false;
> -}
> -}
>  
>  /* Currently only address widths supported are 39 and 48 bits */
>  if ((s->aw_bits != VTD_HOST_AW_39BIT) &&


Paolo, I think if you ack the KVM bits, I can merge.

> -- 
> 2.25.1
> 
>

Re: Status of DAX for virtio-fs/virtiofsd?

2023-05-18 Thread Vivek Goyal

On Wed, May 17, 2023 at 12:26:18PM -0400, Stefan Hajnoczi wrote:
> On Wed, 17 May 2023 at 11:54, Alex Bennée  wrote:
> Hi Alex,
> There were two unresolved issues:
> 
> 1. How to inject SIGBUS when the guest accesses a page that's beyond
> the end-of-file.
> 2. Implementing the vhost-user messages for mapping ranges of files to
> the vhost-user frontend.
> 
> The harder problem is SIGBUS. An mmap area may be larger than the
> length of the file. Or another process could truncate the file while
> it's mmapped, causing a previously correctly sized mmap to become
> longer than the actual file. When a page beyond the end of file is
> accessed, the kernel raises SIGBUS.
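To make the SIGBUS behaviour concrete, here is a small host-side sketch in plain POSIX C, unrelated to KVM or QEMU internals. It maps two pages of a file that is only one page long and reads the first byte past end-of-file, which on Linux raises SIGBUS. Truncating the file after mmap() has the same effect. The helper name `access_past_eof_raises_sigbus()` is made up for illustration:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <setjmp.h>
#include <signal.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

static sigjmp_buf jump_env;

static void sigbus_handler(int sig)
{
    (void)sig;
    siglongjmp(jump_env, 1);  /* leave the faulting access */
}

/* Maps two pages of a one-page file and reads the first byte of the
 * second page. Returns true if that read raised SIGBUS. */
static bool access_past_eof_raises_sigbus(void)
{
    long page = sysconf(_SC_PAGESIZE);
    assert(page > 0);

    char path[] = "/tmp/sigbus-demo-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) {
        perror("mkstemp");
        return false;
    }
    unlink(path);                     /* file goes away when fd is closed */
    if (ftruncate(fd, page) < 0) {    /* file is exactly one page long */
        perror("ftruncate");
        close(fd);
        return false;
    }

    /* The mapping is longer than the file: the second page has no backing. */
    char *map = mmap(NULL, 2 * page, PROT_READ, MAP_SHARED, fd, 0);
    if (map == MAP_FAILED) {
        perror("mmap");
        close(fd);
        return false;
    }

    struct sigaction sa = { .sa_handler = sigbus_handler }, old;
    sigemptyset(&sa.sa_mask);
    sigaction(SIGBUS, &sa, &old);

    bool got_sigbus = false;
    if (sigsetjmp(jump_env, 1) == 0) {
        volatile char c = ((volatile char *)map)[page];  /* beyond EOF */
        (void)c;
    } else {
        got_sigbus = true;
    }

    sigaction(SIGBUS, &old, NULL);
    munmap(map, 2 * page);
    close(fd);
    return got_sigbus;
}
```

A userspace process can catch the signal like this; the problem described here is that when the same access comes from the guest through a DAX Window, there is no equivalent path to deliver the fault back to the guest.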
> 
> When this scenario occurs in the DAX Window, kvm.ko gets some type of
> vmexit (fault) and the code currently enters an infinite loop because
> it expects KVM memory regions to resolve faults. Since there is no
> page backing that part of the vma, the fault handling fails and the
> code loops trying to do this forever.
> 
> There needs to be a way to inject this fault back into the guest.
> However, we did not find a way to do that. We considered Machine
> Check Exceptions (MCEs), x86 interrupts, and paravirtualized
> approaches. None of them looked like a clean and sane way to do this.
> The Linux maintainers for MCEs and kvm.ko were not excited about
> supporting this.
> 
> So in the end, SIGBUS was never solved. It leads to a DoS because the
> host kernel will enter an infinite loop. We decided that until there
> is progress on SIGBUS, we can't go ahead with DAX Windows in
> production.
> 
> The easier problem is adding new vhost-user messages. It does lead to
> a fundamental change in the vhost-user protocol: the presence of the
> DAX Window means there are memory ranges that cannot be accessed via
> shared memory. Imagine Device A has a DAX Window and Device B needs to
> DMA to/from it. That doesn't work because the mmaps happen inside the
> frontend (QEMU), so Device B doesn't have access to the current
> mappings. The fundamental change to vhost-user is that virtqueue
> descriptor mapping code must now deal with the situation where guest
> addresses are absent from the shared memory regions and instead send
> vhost-user protocol messages to read/write to/from bounce buffers.
> The rest of the device backend does not require modification.
> This is a slow path, but at least it works whereas currently the I/O
> would fail because the memory is absent. Other solutions to the
> vhost-user DMA problem exist, but this is the one that Dave and I last
> discussed.
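A rough sketch of the descriptor-mapping fallback described above. It assumes a hypothetical region table (`MemRegion`) and a hypothetical slow-path callback (`FrontendReadFn`) standing in for whatever vhost-user message would ask the frontend to copy data into a bounce buffer. No real vhost-user message or libvhost-user API is implied, only the read direction is shown, and `FakeFrontend` is a demo stub playing the frontend's role:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical: a guest memory region the backend has mmapped. */
typedef struct {
    uint64_t guest_addr;
    uint64_t size;
    uint8_t *host_ptr;
} MemRegion;

/* Hypothetical slow path: ask the frontend to copy guest memory into buf
 * (a bounce buffer). Stands in for an unspecified vhost-user message. */
typedef bool (*FrontendReadFn)(void *opaque, uint64_t guest_addr,
                               void *buf, size_t len);

/* Overflow-safe check that [addr, addr + len) lies in [base, base + size). */
static bool range_inside(uint64_t base, uint64_t size, uint64_t addr, size_t len)
{
    return addr >= base && len <= size && addr - base <= size - len;
}

static const MemRegion *find_region(const MemRegion *regions, size_t n,
                                    uint64_t addr, size_t len)
{
    for (size_t i = 0; i < n; i++) {
        if (range_inside(regions[i].guest_addr, regions[i].size, addr, len)) {
            return &regions[i];
        }
    }
    return NULL;
}

/* Fast path: shared memory. Slow path: the frontend, e.g. when the
 * address falls inside a DAX Window that only the frontend has mapped. */
static bool read_guest(const MemRegion *regions, size_t n, uint64_t addr,
                       void *buf, size_t len, FrontendReadFn slow, void *opaque)
{
    const MemRegion *r = find_region(regions, n, addr, len);
    if (r) {
        memcpy(buf, r->host_ptr + (addr - r->guest_addr), len);
        return true;
    }
    return slow != NULL && slow(opaque, addr, buf, len);
}

/* Demo stub playing the frontend's role for one extra address range. */
typedef struct {
    uint64_t guest_addr;
    const uint8_t *data;
    size_t size;
    unsigned calls;
} FakeFrontend;

static bool fake_frontend_read(void *opaque, uint64_t addr, void *buf, size_t len)
{
    FakeFrontend *fe = opaque;
    fe->calls++;
    if (!range_inside(fe->guest_addr, fe->size, addr, len)) {
        return false;
    }
    memcpy(buf, fe->data + (addr - fe->guest_addr), len);
    return true;
}
```

Only the lookup changes; code that consumes `buf` afterwards is the same on both paths, which matches the point that the rest of the backend needs no modification.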
> 
> In the end, there is still work to do to make the DAX Window
> supportable. There is experimental code out there that kind of works,
> but we felt it was incomplete.

I feel that it would be good if someone could solve the vhost-user problem
first and get patches upstream. Now that virtiofsd has been removed from
qemu, someone will have to add DAX support to the Rust virtiofsd
(and make the corresponding vhost-user changes in qemu).

Once that is done, someone can look into the MCE issue.

With the vhost-user problem solved, DAX will be usable in non-shared mode.
That is, just pass the host filesystem through into the guest, where even
the host can't make modifications. That should steer us clear of the
truncation issue.

virtiofs DAX is a good piece of technology and provides a speedup in many
cases. It would be sad to see the patches lost.

Now people are posting fixes to the kernel side of DAX and there is no good
way to test them. I will try to make the old DAX branch David had work for
testing kernel changes, but I am sure at some point it will stop working,
and I don't want the virtiofs kernel DAX code to become unstable.

It would be good if somebody took up this project and made it happen.

Thanks
Vivek

> 
> To your specific questions:
> 
> >  * What VMM/daemon combinations has DAX been tested on?
> 
> Only the experimental virtio-fs Kata Containers kernels and QEMU
> builds that were available a few years ago. I don't think the code has
> been rebased.
> 
> >  * Isn't it time the vhost-user spec is updated?
> 
> I don't know if Dave ever wrote the spec for or implemented the final
> version of the vhost-user protocol messages we discussed.
> 
> >  * Is anyone picking up Dave's patches for the QEMU side of support?
> 
> Not at the moment. It would be nice to support, but someone needs the
> energy/time/focus to deal with the outstanding issues I mentioned.
> 
> If you want to work on it, feel free to include me. I can help dig up
> old discussions and give input.
> 
> Stefan
>

Re: Multiple vIOMMU instance support in QEMU?

2023-05-18 Thread Peter Xu

On Thu, May 18, 2023 at 11:56:46AM -0300, Jason Gunthorpe wrote:
> On Thu, May 18, 2023 at 10:16:24AM -0400, Peter Xu wrote:
> 
> > What you mentioned above makes sense to me from the POV that 1 vIOMMU may
> > not suffice, but that's a totally new area to me because I have never
> > used >1 IOMMUs, even on bare metal (excluding the case where I'm aware that
> > e.g. a GPU could have its own IOMMU-like DMA translator).
> 
> Even x86 systems are multi-iommu, one iommu per physical CPU socket.

I tried to look at a 2-node system on hand and I indeed got two dmars:

[4.444788] DMAR: dmar0: reg_base_addr fbffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df
[4.459673] DMAR: dmar1: reg_base_addr c7ffc000 ver 1:0 cap 8d2078c106f0466 ecap f020df

Though devices do not seem to be spread evenly across them.  E.g.,
most of the devices on this host are attached to dmar1, while there are only
two devices attached to dmar0:

80:05.2 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D IIO RAS/Control Status/Global Errors (rev 01)
80:05.0 System peripheral: Intel Corporation Xeon E7 v4/Xeon E5 v4/Xeon E3 v4/Xeon D Map/VTd_Misc/System Management (rev 01)

> 
> I'm not sure how they model this though - Kevin do you know? Do we get
> multiple iommu instances in Linux or is all the broadcasting of
> invalidates and sharing of tables hidden?
> 
> > What's the system layout of your multi-vIOMMU world?  Is there still a
> > central vIOMMU, or can multiple vIOMMUs run fully in parallel, so that e.g.
> > we can have DEV1,DEV2 under vIOMMU1 and DEV3,DEV4 under vIOMMU2?
> 
> Just like physical, each viommu is parallel and independent. Each has
> its own caches, ASIDs, DIDs/etc and thus invalidation domains.
> 
> The separated caches are the motivating reason to do this, as something
> like vCMDQ is a direct command channel for invalidations to only the
> caches of a single IOMMU block.

From the cache invalidation POV, shouldn't the best be per-device granularity
(like dev-iotlb in VT-d? No idea for ARM)?

But those are two different angles, I assume: currently dev-iotlb is still
emulated, at least in QEMU.  Having a hardware-accelerated queue is definitely
another thing.

> 
> > Is it a common hardware layout or nVidia specific?
> 
> I think it is pretty normal, you have multiple copies of the IOMMU and
> its caches for physical reasons.
> 
> The only choice is if the platform HW somehow routes invalidations to
> all IOMMUs or requires SW to route/replicate invalidates.
> 
> ARM's IP seems to be designed toward the latter so I expect it is
> going to be common on ARM.
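To illustrate the difference between the platform routing invalidations and software routing or replicating them, here is a toy model in which each IOMMU instance owns a separate IOTLB. Everything in it (`IommuInstance`, `TlbEntry`, the two invalidation helpers) is hypothetical and does not model VT-d, SMMU or vCMDQ command formats:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define TLB_SLOTS 8

/* Hypothetical cached translation, keyed by domain ID and IOVA page. */
typedef struct {
    bool valid;
    uint16_t did;
    uint64_t iova_pfn;
} TlbEntry;

/* Hypothetical IOMMU instance with its own private cache. */
typedef struct {
    TlbEntry tlb[TLB_SLOTS];
    unsigned invalidations;   /* invalidation commands this instance received */
} IommuInstance;

/* One invalidation command delivered to one instance. */
static void iommu_invalidate(IommuInstance *m, uint16_t did, uint64_t pfn)
{
    m->invalidations++;
    for (size_t i = 0; i < TLB_SLOTS; i++) {
        TlbEntry *e = &m->tlb[i];
        if (e->valid && e->did == did && e->iova_pfn == pfn) {
            e->valid = false;
        }
    }
}

/* Software replication: the caller does not know which instances may
 * cache the entry, so every instance gets the command. */
static void invalidate_broadcast(IommuInstance *ms, size_t n,
                                 uint16_t did, uint64_t pfn)
{
    for (size_t i = 0; i < n; i++) {
        iommu_invalidate(&ms[i], did, pfn);
    }
}

/* Software routing: the caller knows which instance the device sits
 * behind and sends the command only there (the per-instance command
 * channel idea behind something like vCMDQ). */
static void invalidate_routed(IommuInstance *ms, size_t n, size_t owner,
                              uint16_t did, uint64_t pfn)
{
    assert(owner < n);
    iommu_invalidate(&ms[owner], did, pfn);
}
```

The routed variant sends one command instead of n, but requires the caller to know the device-to-instance mapping; the broadcast variant needs no such knowledge but costs one command per instance.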

Thanks for the information, Jason.

I see that Intel is already copied here (at least Yi and Kevin), so I assume
there is already some coordination between the multi-vIOMMU work and recent
work on the Intel side, which is definitely nice and can avoid work conflicts.

We should probably also copy Jason Wang and mst when there's any formal
proposal.  I've got them all copied here too.

-- 
Peter Xu

Re: [PATCH v2] target/arm: allow DC CVA[D]P in user mode emulation

2023-05-18 Thread Zhuojia Shen

On 05/17/2023 12:51 PM -0700, Richard Henderson wrote:
> On 5/17/23 10:31, Zhuojia Shen wrote:
> > DC CVAP and DC CVADP instructions can be executed in EL0 on Linux,
> > either directly when SCTLR_EL1.UCI == 1 or emulated by the kernel (see
> > user_cache_maint_handler() in arch/arm64/kernel/traps.c).
> > 
> > This patch enables execution of the two instructions in user mode
> > emulation.
> > 
> > Signed-off-by: Zhuojia Shen 
> > ---
> >   target/arm/helper.c   |  6 ++--
> >   tests/tcg/aarch64/Makefile.target | 11 
> >   tests/tcg/aarch64/dcpodp-1.c  | 47 +++
> >   tests/tcg/aarch64/dcpodp-2.c  | 47 +++
> >   tests/tcg/aarch64/dcpop-1.c   | 47 +++
> >   tests/tcg/aarch64/dcpop-2.c   | 47 +++
> >   6 files changed, 201 insertions(+), 4 deletions(-)
> >   create mode 100644 tests/tcg/aarch64/dcpodp-1.c
> >   create mode 100644 tests/tcg/aarch64/dcpodp-2.c
> >   create mode 100644 tests/tcg/aarch64/dcpop-1.c
> >   create mode 100644 tests/tcg/aarch64/dcpop-2.c
> 
> I recommend splitting the tests to a second patch.

Will do.

> 
>  +++ b/tests/tcg/aarch64/dcpodp-1.c
> > @@ -0,0 +1,47 @@
> > +/* Test execution of DC CVADP instruction */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#ifndef HWCAP2_DCPODP
> > +#define HWCAP2_DCPODP (1 << 0)
> > +#endif
> > +
> > +static void signal_handler(int sig)
> > +{
> > +    exit(EXIT_FAILURE);
> > +}
> > +
> > +static int do_dc_cvadp(void)
> > +{
> > +    struct sigaction sa = {
> > +        .sa_handler = signal_handler,
> > +    };
> > +
> > +    if (sigaction(SIGILL, &sa, NULL) < 0) {
> > +        perror("sigaction");
> > +        return EXIT_FAILURE;
> > +    }
> > +    if (sigaction(SIGSEGV, &sa, NULL) < 0) {
> > +        perror("sigaction");
> > +        return EXIT_FAILURE;
> > +    }
> > +
> > +    asm volatile("dc cvadp, %0\n\t" :: "r"());
> > +
> > +    return EXIT_SUCCESS;
> > +}
> 
> ...
> 
> > diff --git a/tests/tcg/aarch64/dcpodp-2.c b/tests/tcg/aarch64/dcpodp-2.c
> > new file mode 100644
> > index 00..3245d7883d
> > --- /dev/null
> > +++ b/tests/tcg/aarch64/dcpodp-2.c
> > @@ -0,0 +1,47 @@
> > +/* Test execution of DC CVADP instruction on unmapped address */
> > +
> > +#include 
> > +#include 
> > +
> > +#include 
> > +#include 
> > +#include 
> > +
> > +#ifndef HWCAP2_DCPODP
> > +#define HWCAP2_DCPODP (1 << 0)
> > +#endif
> > +
> > +static void signal_handler(int sig)
> > +{
> > +    exit(EXIT_SUCCESS);
> > +}
> > +
> > +static int do_dc_cvadp(void)
> > +{
> > +    struct sigaction sa = {
> > +        .sa_handler = signal_handler,
> > +    };
> > +
> > +    if (sigaction(SIGILL, &sa, NULL) < 0) {
> > +        perror("sigaction");
> > +        return EXIT_FAILURE;
> > +    }
> 
> This isn't right: on SIGILL, this exits with success.
> 
> You don't actually need to register anything for SIGILL, in either test,
> because SIGILL is a fine exit for failure.  So is SIGSEGV for test 1.
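A sketch of the test structure suggested above, under stated assumptions. Only SIGSEGV gets a handler (which reports success), while SIGILL keeps its default action, so an unimplemented instruction kills the process and counts as a failure. To keep the sketch runnable on any host, the probe is a plain load from an unmapped address rather than `dc cvap`/`dc cvadp`, and it runs in a forked child so the caller can inspect the outcome. `expect_sigsegv()` and the probe functions are hypothetical names, not part of the QEMU test suite:

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/wait.h>
#include <unistd.h>

static void exit_success(int sig)
{
    (void)sig;
    _exit(EXIT_SUCCESS);   /* async-signal-safe, unlike exit() */
}

/* Runs probe(addr) in a child process in which only SIGSEGV is handled.
 * SIGILL (or anything else) keeps its default action and kills the child.
 * Returns true only if the child exited via the SIGSEGV handler. */
static bool expect_sigsegv(void (*probe)(void *), void *addr)
{
    pid_t pid = fork();
    if (pid < 0) {
        return false;
    }
    if (pid == 0) {
        struct sigaction sa = { .sa_handler = exit_success };
        sigemptyset(&sa.sa_mask);
        if (sigaction(SIGSEGV, &sa, NULL) < 0) {
            _exit(EXIT_FAILURE);
        }
        probe(addr);
        _exit(EXIT_FAILURE);   /* the access did not fault: test failed */
    }
    int status;
    if (waitpid(pid, &status, 0) < 0) {
        return false;
    }
    return WIFEXITED(status) && WEXITSTATUS(status) == EXIT_SUCCESS;
}

/* An address that was mapped and then unmapped again. */
static void *unmapped_address(void)
{
    long page = sysconf(_SC_PAGESIZE);
    void *p = mmap(NULL, page, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    assert(p != MAP_FAILED);
    munmap(p, page);
    return p;
}

/* Portable stand-in for "dc cvap"/"dc cvadp": a plain load. */
static void load_probe(void *addr)
{
    (void)*(volatile char *)addr;
}

/* Simulates an instruction the CPU (or emulator) does not implement. */
static void sigill_probe(void *addr)
{
    (void)addr;
    raise(SIGILL);
}
```

In the actual aarch64 tests, the probe body would be the `dc cvap`/`dc cvadp` instruction itself, together with whatever HWCAP2 check the real test performs.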
> 

Thanks!  Will update in the second patch.

> Also, you could merge all 4 tests and save some CI time.

Tests for CVAP and CVADP require different CFLAGS; I'll merge them into
2 tests.

> 
> 
> r~
>

gitlab shared runner time expired

2023-05-18 Thread Richard Henderson


So, here we are again, out of runner time with 13 days left in the month.

Did we come to any resolution since last time?  Holding development for that long just 
isn't right, so I'll continue processing the hard way -- testing on private runners and 
local build machines.



r~
