Re: [PATCH] hw/nvme: fix validation of ASQ and ACQ

2021-08-23 Thread Klaus Jensen
On Aug 23 19:47, Keith Busch wrote:
> On Mon, Aug 23, 2021 at 02:20:18PM +0200, Klaus Jensen wrote:
> > From: Klaus Jensen 
> > 
> > Address 0x0 is a valid address. Fix the admin submission and completion
> > queue address validation to not error out on this.
> 
> Indeed, there are environments that can use that address. It's a host error if
> the controller was enabled with invalid queue addresses anyway. The controller
> only needs to verify the lower bits are clear, which we do later.
> 
> Reviewed-by: Keith Busch 
> 

Thanks Keith,

Yeah, I noticed this with a VFIO-based driver where the IOVAs typically
start at 0x0.

And yes, I specifically refrained from adding any other sanity checks on
the addresses. I.e., we could add a check for ASQ != ACQ, but who are we
to judge ;)

Applied to nvme-next!
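
Editor's sketch of the point Keith makes above, that only the low bits of
the queue base address need checking and that 0x0 is therefore acceptable.
This is not the hw/nvme code: the helper name queue_addr_is_aligned() and
the explicit page_size parameter are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper, not taken from hw/nvme: accept any queue base
 * address whose low bits are clear, including address 0x0.
 * page_size must be a non-zero power of two.
 */
static bool queue_addr_is_aligned(uint64_t addr, uint64_t page_size)
{
    assert(page_size != 0 && (page_size & (page_size - 1)) == 0);
    return (addr & (page_size - 1)) == 0;
}
```

With a 4 KiB page size, 0x0 and 0x2000 pass while 0x1004 is rejected, so
no separate "address must be non-zero" test is needed.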




Re: [PATCH] hw/acpi/pcihp: validate bsel property of the bus before unplugging device

2021-08-23 Thread Ani Sinha



On Mon, 23 Aug 2021, Michael S. Tsirkin wrote:

> On Sat, Aug 21, 2021 at 08:35:35PM +0530, Ani Sinha wrote:
> > Bsel property of the pci bus indicates whether the bus supports acpi hotplug.
> > We need to validate the presence of this property before performing any hotplug
> > related callback operations. Currently validation of the existence of this
> > property was absent from acpi_pcihp_device_unplug_cb() function but is present
> > in other hotplug/unplug callback functions. Hence, this change adds the missing
> > check for the above function.
> >
> > Signed-off-by: Ani Sinha 
>
> I queued this but I have a general question:
> are all these errors logged with LOG_GUEST_ERROR?

I do not think they are logged that way. These logs go to stderr which can
end up in the qemu guest specific log file when qemu is run daemonized.

That being said, other platforms, for example virtio-pci, also seem to do
what we do. They use error_setg() as well under similar conditions.

> Because if not we have a security problem.
> I also note that bsel is an internal property,

yeah this I think is an issue. We can change the log so as to not say
anything about bsel. I will let Igor comment. I can send out a separate
patch to fix this.

> I am not sure we should be printing this to users,
> it might just confuse them.
>
> Same question for all the other places validating bsel.
>
> > ---
> >  hw/acpi/pcihp.c | 10 ++++++++--
> >  1 file changed, 8 insertions(+), 2 deletions(-)
> >
> > diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> > index 0fd0c1d811..9982815a87 100644
> > --- a/hw/acpi/pcihp.c
> > +++ b/hw/acpi/pcihp.c
> > @@ -372,9 +372,15 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler *hotplug_dev, AcpiPciHpState *s,
> >   DeviceState *dev, Error **errp)
> >  {
> >  PCIDevice *pdev = PCI_DEVICE(dev);
> > +int bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
> > +
> > +trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn), bsel);
> >
> > -trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn),
> > -  acpi_pcihp_get_bsel(pci_get_bus(pdev)));
> > +if (bsel < 0) {
> > +error_setg(errp, "Unsupported bus. Bus doesn't have property '"
> > +   ACPI_PCIHP_PROP_BSEL "' set");
> > +return;
> > +}
> >
> >  /*
> >   * clean up acpi-index so it could reused by another device
> > --
> > 2.25.1
>
>



Re: [PATCH] vga: don't abort when adding a duplicate isa-vga device

2021-08-23 Thread Markus Armbruster
Thomas Huth  writes:

> On 14/08/2021 01.36, Jose R. Ziviani wrote:
>> If users try to add an isa-vga device that was already registered,
>> still in command line, qemu will crash:
>> $ qemu-system-mips64el -M pica61 -device isa-vga
>> RAMBlock "vga.vram" already registered, abort!
>> Aborted (core dumped)
>> That particular board registers such device automatically, so it's
>> not obvious that a VGA device already exists. This patch changes
>> this behavior by displaying a message and ignoring that device,
>> starting qemu normally.
>> Resolves: https://gitlab.com/qemu-project/qemu/-/issues/44
>> Signed-off-by: Jose R. Ziviani 
>> ---
>>   hw/display/vga-isa.c | 9 +++++++++
>>   1 file changed, 9 insertions(+)
>> diff --git a/hw/display/vga-isa.c b/hw/display/vga-isa.c
>> index 90851e730b..69db502dde 100644
>> --- a/hw/display/vga-isa.c
>> +++ b/hw/display/vga-isa.c
>> @@ -61,6 +61,15 @@ static void vga_isa_realizefn(DeviceState *dev, Error **errp)
>>   MemoryRegion *vga_io_memory;
>>   const MemoryRegionPortio *vga_ports, *vbe_ports;
>> +/*
>> + * some machines register VGA by default, so instead of aborting
>> + * it, show a message and ignore this device.
>> + */
>> +if (qemu_ram_block_by_name("vga.vram")) {
>> +error_report("vga.vram is already registered, ignoring this device");
>> +return;
>> +}
>
> I think we should not ignore the error, but rather turn this into a
> proper error (instead of aborting).
>
> So if you replace error_report(...) with error_setg(errp, ...), the
> patch should be fine.

Agreed.

error_report() in a function with an Error **errp parameter is almost
always wrong.
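
Editor's sketch of the convention discussed above: a function that takes
an Error **errp should hand failures back to its caller rather than print
them and carry on. To stay self-contained this does not use QEMU headers.
The Error struct, set_error() and realize_device() are simplified stand-ins
invented for illustration; the real API is error_setg() from qapi/error.h.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for QEMU's Error object (assumption, not the real type). */
typedef struct Error {
    char msg[128];
} Error;

/* Simplified analogue of error_setg(): store the message in *errp. */
static void set_error(Error **errp, const char *msg)
{
    if (!errp) {
        return;                 /* caller is not interested in errors */
    }
    assert(*errp == NULL);      /* an error must not be overwritten */
    *errp = calloc(1, sizeof(**errp));
    if (*errp) {
        snprintf((*errp)->msg, sizeof((*errp)->msg), "%s", msg);
    }
}

/* Hypothetical realize function: fail cleanly instead of aborting. */
static void realize_device(int vram_already_registered, Error **errp)
{
    if (vram_already_registered) {
        set_error(errp, "vga.vram is already registered");
        return;
    }
    /* ... normal realization would continue here ... */
}
```

The caller decides what to do with the error (report it, fail device
creation, or ignore it), which is what makes the failure a proper error
rather than a silent skip.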




Re: [PATCH v7 05/15] machine: Improve the error reporting of smp parsing

2021-08-23 Thread wangyanan (Y)



On 2021/8/23 21:17, Philippe Mathieu-Daudé wrote:

On 8/23/21 2:27 PM, Yanan Wang wrote:

We have two requirements for a valid SMP configuration:
the product of "sockets * cores * threads" must represent all the
possible cpus, i.e., max_cpus, and then must include the initially
present cpus, i.e., smp_cpus.

So we only need to ensure 1) "sockets * cores * threads == maxcpus"
at first and then ensure 2) "maxcpus >= cpus". With a reasonable
order of the sanity check, we can simplify the error reporting code.
When reporting an error message we also report the exact value of
each topology member to make users easily see what's going on.

Signed-off-by: Yanan Wang 
Reviewed-by: Andrew Jones 
Reviewed-by: Pankaj Gupta 
---
  hw/core/machine.c | 22 +-
  hw/i386/pc.c  | 24 ++--
  2 files changed, 19 insertions(+), 27 deletions(-)

diff --git a/hw/core/machine.c b/hw/core/machine.c
index 85908abc77..093c0d382d 100644
--- a/hw/core/machine.c
+++ b/hw/core/machine.c
@@ -779,25 +779,21 @@ static void smp_parse(MachineState *ms, SMPConfiguration *config, Error **errp)
  maxcpus = maxcpus > 0 ? maxcpus : sockets * cores * threads;
  cpus = cpus > 0 ? cpus : maxcpus;
  
-if (sockets * cores * threads < cpus) {

-error_setg(errp, "cpu topology: "
-   "sockets (%u) * cores (%u) * threads (%u) < "
-   "smp_cpus (%u)",
-   sockets, cores, threads, cpus);
+if (sockets * cores * threads != maxcpus) {
+error_setg(errp, "Invalid CPU topology: "
+   "product of the hierarchy must match maxcpus: "
+   "sockets (%u) * cores (%u) * threads (%u) "
+   "!= maxcpus (%u)",
+   sockets, cores, threads, maxcpus);
  return;
  }

Thinking about scalability, MachineClass could have a
parse_cpu_topology() handler, and this would be the
generic one. Principally because architectures don't
use the same terms, and die/socket/core/thread arrangement
is machine specific (besides being arch-spec).
Not a problem as of today, but the way we try to handle
this generically seems over-engineered to me.

Hi Philippe,

The reason for introducing a generic implementation and avoiding
specific ones is that we thought there is little difference in parsing
logic between the specific parsers. Most of the parsing is the
automatic calculation of missing values and the related error reporting,
in which the only difference between parsers is the handling of specific
(whether arch-specific or machine-specific) parameters.

So it may be better to keep the parsing logic unified if we can easily
achieve that. And actually we can use compat stuff to handle specific
topology parameters well. See the implementation in patch #10.

There have been patches on the list introducing new specific members
(s390 related in [1] and ARM related in [2]), and each of them needs
a specific parser. However, based on the generic one we can
extend without increasing code duplication.

There is also some discussion about generic/specific parsers in [1],
which can be a reference.

[1] 
https://lore.kernel.org/qemu-devel/1626281596-31061-2-git-send-email-pmo...@linux.ibm.com/
[2] 
https://lore.kernel.org/qemu-devel/20210516103228.37792-1-wangyana...@huawei.com/


Thanks,
Yanan
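
Editor's sketch of the two-step check described in the commit message
above: first require sockets * cores * threads == maxcpus, then require
maxcpus >= cpus. The SmpCheck enum and check_smp() are hypothetical names
for illustration; the real smp_parse() reports through error_setg() and
also computes missing values, which is omitted here.

```c
#include <assert.h>

/* Hypothetical result codes, one per failing check. */
typedef enum {
    SMP_OK,
    SMP_ERR_PRODUCT,    /* sockets * cores * threads != maxcpus */
    SMP_ERR_MAXCPUS     /* maxcpus < cpus */
} SmpCheck;

/*
 * Validate a fully specified topology. Defaults are assumed to have
 * been filled in already, so every topology member is non-zero.
 */
static SmpCheck check_smp(unsigned sockets, unsigned cores,
                          unsigned threads, unsigned cpus,
                          unsigned maxcpus)
{
    assert(sockets > 0 && cores > 0 && threads > 0);

    /* 1) the hierarchy must describe every possible CPU */
    if (sockets * cores * threads != maxcpus) {
        return SMP_ERR_PRODUCT;
    }
    /* 2) the possible CPUs must include the initially present ones */
    if (maxcpus < cpus) {
        return SMP_ERR_MAXCPUS;
    }
    return SMP_OK;
}
```

Because the product check comes first, each failure maps to exactly one
message, which is what lets the error reporting code be simplified.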

[unrelated to this particular patch]






Re: [PATCH for-6.2 v5 5/5] hw/acpi/aml-build: Generate PPTT table

2021-08-23 Thread wangyanan (Y)



On 2021/8/24 7:52, Michael S. Tsirkin wrote:

On Thu, Aug 05, 2021 at 08:39:21PM +0800, Yanan Wang wrote:

From: Andrew Jones 

Add the Processor Properties Topology Table (PPTT) to expose
CPU topology information defined by users to ACPI guests.

Note, a DT-boot Linux guest with a non-flat CPU topology will
see socket and core IDs being sequential integers starting
from zero, which is different from ACPI-boot Linux guest,
e.g. with -smp 4,sockets=2,cores=2,threads=1

a DT boot produces:

  cpu:  0 package_id:  0 core_id:  0
  cpu:  1 package_id:  0 core_id:  1
  cpu:  2 package_id:  1 core_id:  0
  cpu:  3 package_id:  1 core_id:  1

an ACPI boot produces:

  cpu:  0 package_id: 36 core_id:  0
  cpu:  1 package_id: 36 core_id:  1
  cpu:  2 package_id: 96 core_id:  2
  cpu:  3 package_id: 96 core_id:  3

This is due to several reasons:

  1) DT cpu nodes do not have an equivalent field to what the PPTT
 ACPI Processor ID must be, i.e. something equal to the MADT CPU
 UID or equal to the UID of an ACPI processor container. In both
 ACPI cases those are platform dependent IDs assigned by the
 vendor.

  2) While QEMU is the vendor for a guest, if the topology specifies
 SMT (> 1 thread), then, with ACPI, it is impossible to assign a
 core-id the same value as a package-id, thus it is not possible
 to have package-id=0 and core-id=0. This is because package and
 core containers must be in the same ACPI namespace and therefore
 must have unique UIDs.

  3) ACPI processor containers are not mandatorily required for PPTT
 tables to be used and, due to the limitations of which IDs are
 selected described above in (2), they are not helpful for QEMU,
 so we don't build them with this patch. In the absence of them,
 Linux assigns its own unique IDs. The maintainers have chosen not
 to use counters from zero, but rather ACPI table offsets, which
 explains why the numbers are so much larger than with DT.

  4) When there is no SMT (threads=1) the core IDs for ACPI boot guests
 match the logical CPU IDs, because these IDs must be equal to the
 MADT CPU UID (as no processor containers are present), and QEMU
 uses the logical CPU ID for these MADT IDs.

So in summary, with QEMU as vender for the guest, we use sequential

vendor?

Yes, will fix the typo.

integers starting from zero for non-leaf nodes without valid ID flag,
so that guest will ignore them and use table offsets as unique IDs.
And we use logical CPU IDs for leaf nodes to be consistent with MADT.

Signed-off-by: Andrew Jones 
Co-developed-by: Yanan Wang 
Signed-off-by: Yanan Wang 
---
  hw/acpi/aml-build.c | 50 +
  hw/arm/virt-acpi-build.c|  8 +-
  include/hw/acpi/aml-build.h |  3 +++
  3 files changed, 60 insertions(+), 1 deletion(-)

diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index 9fa5024414..aa61c9651e 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -1946,6 +1946,56 @@ void build_processor_hierarchy_node(GArray *tbl, uint32_t flags,
  }
  }
  
+/* ACPI 6.2: 5.2.29 Processor Properties Topology Table (PPTT) */

+void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
+const char *oem_id, const char *oem_table_id)
+{
+int pptt_start = table_data->len;
+int uid = 0;
+int socket;
+
+acpi_data_push(table_data, sizeof(AcpiTableHeader));
+
+for (socket = 0; socket < ms->smp.sockets; socket++) {
+uint32_t socket_offset = table_data->len - pptt_start;
+int core;
+
+build_processor_hierarchy_node(
+table_data,
+(1 << 0), /* ACPI 6.2 - Physical package */

A bit better to be detailed:

/* Physical package - represents the boundary of a physical package */


Ok. I will change this place and below mentioned to be more detailed
as you suggested if it's preferred.

+0, socket, NULL, 0);
+
+for (core = 0; core < ms->smp.cores; core++) {
+uint32_t core_offset = table_data->len - pptt_start;
+int thread;
+
+if (ms->smp.threads > 1) {
+build_processor_hierarchy_node(table_data, 0, socket_offset,

and here:
/* Physical package - does not represent the boundary of a physical package */

Thanks,
Yanan

+   core, NULL, 0);
+
+for (thread = 0; thread < ms->smp.threads; thread++) {
+build_processor_hierarchy_node(
+table_data,
+(1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
+(1 << 2) | /* ACPI 6.3 - Processor is a Thread */
+(1 << 3),  /* ACPI 6.3 - Node is a Leaf */
+core_offset, uid++, NULL, 0);
+}
+} else {
+build_processor_hierarchy_node(
+table_data,
+(1 << 1) | 

Re: [PATCH for-6.2 v5 0/5] hw/arm/virt: Introduce cpu topology support

2021-08-23 Thread wangyanan (Y)



On 2021/8/24 7:53, Michael S. Tsirkin wrote:

On Thu, Aug 05, 2021 at 08:39:16PM +0800, Yanan Wang wrote:

Hi,

This is a new version (v5) of the series [1] that I posted to introduce
support for generating cpu topology descriptions to virt machine guest.

Once the view of an accurate virtual cpu topology is provided to guest,
with a well-designed vCPU pinning to the pCPU we may get a huge benefit,
e.g., the scheduling performance improvement. See Dario Faggioli's
research and the related performance tests in [2] for reference.

This patch series introduces cpu topology support for ARM platform.
Both cpu-map in DT and ACPI PPTT table are introduced to store the
topology information. And we only describe the topology information
to 6.2 and newer virt machines, considering compatibility.

ACPI things:

Reviewed-by: Michael S. Tsirkin 

Thanks for reviewing.

pls merge through ARM tree.

Sure, got it.

Yanan



[1] 
https://lore.kernel.org/qemu-devel/20210622093413.13360-1-wangyana...@huawei.com/
[2] 
https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse

Some tests:
1) -smp 16,sockets=2,cores=4,threads=2,maxcpus=16
lscpu:
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  16
On-line CPU(s) list: 0-15
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   2
NUMA node(s):1
Vendor ID:   ARM
Model:   2
Model name:  Cortex-A72
Stepping:r0p2
BogoMIPS:100.00
NUMA node0 CPU(s):   0-15

cat /sys/devices/system/cpu/present  -->  0-15
cat /sys/devices/system/cpu/possible -->  0-15

2) -smp 8,sockets=2,cores=4,threads=2,maxcpus=16
lscpu:
Architecture:aarch64
Byte Order:  Little Endian
CPU(s):  8
On-line CPU(s) list: 0-7
Thread(s) per core:  2
Core(s) per socket:  4
Socket(s):   1
NUMA node(s):1
Vendor ID:   ARM
Model:   2
Model name:  Cortex-A72
Stepping:r0p2
BogoMIPS:100.00
NUMA node0 CPU(s):   0-7

cat /sys/devices/system/cpu/present  -->  0-7
cat /sys/devices/system/cpu/possible -->  0-7

---

Changelogs:

v4->v5:
- drop the added -smp "expose=on|off" parameter and only describe topology
   for 6.2 and newer machines
- rebased the code on patch series [3] which has introduced some fix and
   improvement for smp parsing
- [3]: 
https://lore.kernel.org/qemu-devel/20210803080527.156556-1-wangyana...@huawei.com/

v3->v4:
- add new -smp parameter "expose=on|off" for users to enable/disable the feature
- add stricter -smp cmdline parsing rules on "expose=on" case
- move build_pptt to generic aml-build.c
- add default cluster node in the cpu-map
- rebase on top of latest upstream master
- v3: 
https://lore.kernel.org/qemu-devel/20210516102900.28036-1-wangyana...@huawei.com/

v2->v3:
- address comments from David, Philippe, and Andrew. Thanks!
- split some change into separate commits for ease of review
- adjust parsing rules of virt_smp_parse to be more strict
   (after discussion with Andrew)
- adjust author credit for the patches

v1->v2:
- Address Andrew Jones's comments
- Address Michael S. Tsirkin's comments

---

Andrew Jones (2):
   hw/arm/virt: Add cpu-map to device tree
   hw/acpi/aml-build: Generate PPTT table

Yanan Wang (3):
   hw/arm/virt: Only describe cpu topology to guest since virt 6.2
   device_tree: Add qemu_fdt_add_path
   hw/acpi/aml-build: Add Processor hierarchy node structure

  hw/acpi/aml-build.c  | 76 
  hw/arm/virt-acpi-build.c |  8 +++-
  hw/arm/virt.c| 62 -
  include/hw/acpi/aml-build.h  |  7 
  include/hw/arm/virt.h|  4 +-
  include/sysemu/device_tree.h |  1 +
  softmmu/device_tree.c| 44 -
  7 files changed, 188 insertions(+), 14 deletions(-)

--
2.19.1







Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Jason Wang
On Tue, Aug 24, 2021 at 6:37 AM Peter Xu  wrote:
>
> On Mon, Aug 23, 2021 at 06:05:07PM -0400, Michael S. Tsirkin wrote:
> > On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> > > On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > > > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > > > However that ordering may not be the ideal order.  For example, some platform
> > > > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > > > devices (e.g., vfio-pci, virtio).
> > > > >
> > > > > This patch orders the QemuOptsList of '-device's so they'll be sorted first
> > > > > before kicking off the device realizations.  This will allow the device
> > > > > realization code to be able to use APIs like pci_device_iommu_address_space()
> > > > > correctly, because those functions rely on the platform devices being realized.
> > > > >
> > > > > Now we rely on vmsd->priority which is defined as MigrationPriority to provide
> > > > > the ordering, as either VM init and migration completes will need such an
> > > > > ordering.  In the future we can move that priority information out of vmsd.
> > > > >
> > > > > Signed-off-by: Peter Xu 
> > > >
> > > > Can we be 100% sure that changing the ordering of every single
> > > > device being created won't affect guest ABI?  (I don't think we can)
> > >
> > > That's a good question, however I doubt whether there's any real-world guest
> > > ABI for that.  As a developer, I normally specify cmdline parameter in an adhoc
> > > way, so that I assume most parameters are not sensitive to ordering and I can
> > > tune the ordering as wish.  I'm not sure whether that's common for qemu users,
> > > I would expect so, but I may have missed something that I'm not aware of.
> > >
> > > Per my knowledge the only "guest ABI" change is e.g. when we specify "vfio-pci"
> > > to be before "intel-iommu": it'll be constantly broken before this patchset,
> > > while after this series it'll be working.  It's just that I don't think those
> > > "guest ABI" is necessary to be kept, and that's exactly what I want to fix with
> > > the patchset..
> > >
> > > >
> > > > How many device types in QEMU have non-default vmsd priority?
> > >
> > > Not so much; here's the list of priorities and the devices using it:
> > >
> > >    |--------------------+---------|
> > >    | priority           | devices |
> > >    |--------------------+---------|
> > >    | MIG_PRI_IOMMU      |       3 |
> > >    | MIG_PRI_PCI_BUS    |       7 |
> > >    | MIG_PRI_VIRTIO_MEM |       1 |
> > >    | MIG_PRI_GICV3_ITS  |       1 |
> > >    | MIG_PRI_GICV3      |       1 |
> > >    |--------------------+---------|
> >
> > iommu is probably ok. I think virtio mem is ok too,
> > in that it is normally created by virtio-mem-pci ...
>
> Hmm this reminded me whether virtio-mem-pci could have another devfn allocated
> after being moved..
>
> But frankly I still doubt whether we should guarantee that guest ABI on user
> not specifying addr=XXX in pci device parameters - I feel like it's a burden
> that we don't need to carry.
>
> (Btw, trying to keep the order is one thing; declare it guest ABI would be
>  another thing to me)
>
> >
> >
> >
> > > All the rest devices are using the default (0) priority.
> > >
> > > >
> > > > Can we at least ensure devices with the same priority won't be
> > > > reordered, just to be safe?  (qsort() doesn't guarantee that)
> > > >
> > > > If very few device types have non-default vmsd priority and
> > > > devices with the same priority aren't reordered, the risk of
> > > > compatibility breakage would be much smaller.
> > >
> > > I'm also wondering whether it's a good thing to break some guest ABI due to
> > > this change, if possible.
> > >
> > > Let's imagine something breaks after applied, then the only reason should be
> > > that qsort() changed the order of some same-priority devices and it's not the
> > > same as user specified any more.  Then, does it also means there's yet another
> > > ordering requirement that we didn't even notice?
> > >
> > > I doubt whether that'll even happen (or I think there'll be report already, as
> > > in qemu man page there's no requirement on parameter ordering).  In all cases,
> > > instead of "keeping the same priority devices in the same order as the user has
> > > specified", IMHO we should make the broken devices to have different priorities
> > > so the ordering will be guaranteed by qemu internal, rather than how user
> > > specified it.
> >
> > Well giving user control of guest ABI is a reasonable thing to do,
> > it is realize order that users do not really care about.
>
> 

Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Jason Wang
On Tue, Aug 24, 2021 at 3:18 AM Peter Xu  wrote:
>
> On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > However that ordering may not be the ideal order.  For example, some platform
> > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > devices (e.g., vfio-pci, virtio).
> > >
> > > This patch orders the QemuOptsList of '-device's so they'll be sorted first
> > > before kicking off the device realizations.  This will allow the device
> > > realization code to be able to use APIs like pci_device_iommu_address_space()
> > > correctly, because those functions rely on the platform devices being realized.
> > >
> > > Now we rely on vmsd->priority which is defined as MigrationPriority to provide
> > > the ordering, as either VM init and migration completes will need such an
> > > ordering.  In the future we can move that priority information out of vmsd.
> > >
> > > Signed-off-by: Peter Xu 
> >
> > Can we be 100% sure that changing the ordering of every single
> > device being created won't affect guest ABI?  (I don't think we can)
>
> That's a good question, however I doubt whether there's any real-world guest
> ABI for that.  As a developer, I normally specify cmdline parameter in an adhoc
> way, so that I assume most parameters are not sensitive to ordering and I can
> tune the ordering as wish.  I'm not sure whether that's common for qemu users,
> I would expect so, but I may have missed something that I'm not aware of.
>
> Per my knowledge the only "guest ABI" change is e.g. when we specify "vfio-pci"
> to be before "intel-iommu": it'll be constantly broken before this patchset,
> while after this series it'll be working.  It's just that I don't think those
> "guest ABI" is necessary to be kept, and that's exactly what I want to fix with
> the patchset..

Yes, and I wonder if we limit this to new machine types, we don't even
need to care about ABI stuff.

Thanks

>
> >
> > How many device types in QEMU have non-default vmsd priority?
>
> Not so much; here's the list of priorities and the devices using it:
>
>    |--------------------+---------|
>    | priority           | devices |
>    |--------------------+---------|
>    | MIG_PRI_IOMMU      |       3 |
>    | MIG_PRI_PCI_BUS    |       7 |
>    | MIG_PRI_VIRTIO_MEM |       1 |
>    | MIG_PRI_GICV3_ITS  |       1 |
>    | MIG_PRI_GICV3      |       1 |
>    |--------------------+---------|
>
> All the rest devices are using the default (0) priority.
>
> >
> > Can we at least ensure devices with the same priority won't be
> > reordered, just to be safe?  (qsort() doesn't guarantee that)
> >
> > If very few device types have non-default vmsd priority and
> > devices with the same priority aren't reordered, the risk of
> > compatibility breakage would be much smaller.
>
> I'm also wondering whether it's a good thing to break some guest ABI due to
> this change, if possible.
>
> Let's imagine something breaks after applied, then the only reason should be
> that qsort() changed the order of some same-priority devices and it's not the
> same as user specified any more.  Then, does it also means there's yet another
> ordering requirement that we didn't even notice?
>
> I doubt whether that'll even happen (or I think there'll be report already, as
> in qemu man page there's no requirement on parameter ordering).  In all cases,
> instead of "keeping the same priority devices in the same order as the user has
> specified", IMHO we should make the broken devices to have different priorities
> so the ordering will be guaranteed by qemu internal, rather than how user
> specified it.
>
> From that pov, maybe this patchset would be great if it can be accepted and
> applied in early stage of a release? So we can figure out what's missing and
> fix them within the same release.  However again I still doubt whether there's
> any user that will break in a bad way.
>
> Thanks,
>
> --
> Peter Xu
>




[PATCH v3 4/4] hw/arm/virt: Add PL330 DMA controller and connect with SMMU v3

2021-08-23 Thread Li, Chunming
From: LCM 

Add PL330 DMA controller to test SMMU v3 connection and function.
The default SID for PL330 is 1; we also tested other values, and they work well.

Signed-off-by: Chunming Li 
Signed-off-by: Renwei Liu 
---
 hw/arm/virt.c | 92 ++-
 include/hw/arm/virt.h |  1 +
 2 files changed, 92 insertions(+), 1 deletion(-)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index c3fd30e07..8180e4a33 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -143,6 +143,7 @@ static const MemMapEntry base_memmap[] = {
 [VIRT_GIC_REDIST] =  { 0x080A0000, 0x00F60000 },
 [VIRT_UART] =        { 0x09000000, 0x00001000 },
 [VIRT_RTC] =         { 0x09010000, 0x00001000 },
+[VIRT_DMA] =         { 0x09011000, 0x00001000 },
 [VIRT_FW_CFG] =      { 0x09020000, 0x00000018 },
 [VIRT_GPIO] =        { 0x09030000, 0x00001000 },
 [VIRT_SECURE_UART] = { 0x09040000, 0x00001000 },
@@ -188,6 +189,7 @@ static const int a15irqmap[] = {
 [VIRT_GPIO] = 7,
 [VIRT_SECURE_UART] = 8,
 [VIRT_ACPI_GED] = 9,
+[VIRT_DMA] = 10,
 [VIRT_MMIO] = 16, /* ...to 16 + NUM_VIRTIO_TRANSPORTS - 1 */
 [VIRT_GIC_V2M] = 48, /* ...to 48 + NUM_GICV2M_SPIS - 1 */
 [VIRT_SMMU] = 74,/* ...to 74 + NUM_SMMU_IRQS - 1 */
@@ -205,7 +207,7 @@ static const char *valid_cpus[] = {
 };
 
 static const uint16_t smmuv3_sidmap[] = {
-
+[VIRT_DMA] = 1,
 };
 
 static bool cpu_type_valid(const char *cpu)
@@ -793,6 +795,92 @@ static void create_uart(const VirtMachineState *vms, int uart,
 g_free(nodename);
 }
 
+static void create_dma(const VirtMachineState *vms)
+{
+int i;
+char *nodename;
+hwaddr base = vms->memmap[VIRT_DMA].base;
+hwaddr size = vms->memmap[VIRT_DMA].size;
+int irq = vms->irqmap[VIRT_DMA];
+int sid = vms->sidmap[VIRT_DMA];
+const char compat[] = "arm,pl330\0arm,primecell";
+const char irq_names[] = "abort\0dma0\0dma1\0dma2\0dma3\0dma4\0dma5\0dma6\0dma7";
+DeviceState *dev;
+MachineState *ms = MACHINE(vms);
+SysBusDevice *busdev;
+DeviceState *smmuv3_dev;
+SMMUState *smmuv3_sys;
+Object *smmuv3_memory;
+
+dev = qdev_new("pl330");
+
+if (vms->iommu == VIRT_IOMMU_SMMUV3 && vms->iommu_phandle) {
+smmuv3_dev = vms->smmuv3;
+smmuv3_sys = ARM_SMMU(smmuv3_dev);
+g_autofree char *memname = g_strdup_printf("%s-peri-%d[0]",
+   smmuv3_sys->mrtypename,
+   sid);
+
+smmuv3_memory = object_property_get_link(OBJECT(smmuv3_dev),
+memname, &error_abort);
+
+object_property_set_link(OBJECT(dev), "memory",
+ OBJECT(smmuv3_memory),
+ &error_fatal);
+} else {
+object_property_set_link(OBJECT(dev), "memory",
+ OBJECT(get_system_memory()),
+ &error_fatal);
+}
+
+qdev_prop_set_uint8(dev, "num_chnls",  8);
+qdev_prop_set_uint8(dev, "num_periph_req",  4);
+qdev_prop_set_uint8(dev, "num_events",  16);
+qdev_prop_set_uint8(dev, "data_width",  64);
+qdev_prop_set_uint8(dev, "wr_cap",  8);
+qdev_prop_set_uint8(dev, "wr_q_dep",  16);
+qdev_prop_set_uint8(dev, "rd_cap",  8);
+qdev_prop_set_uint8(dev, "rd_q_dep",  16);
+qdev_prop_set_uint16(dev, "data_buffer_dep",  256);
+
+busdev = SYS_BUS_DEVICE(dev);
+sysbus_realize_and_unref(busdev, &error_fatal);
+sysbus_mmio_map(busdev, 0, base);
+
+for (i = 0; i < 9; ++i) {
+sysbus_connect_irq(busdev, i, qdev_get_gpio_in(vms->gic, irq + i));
+}
+
+nodename = g_strdup_printf("/pl330@%" PRIx64, base);
+qemu_fdt_add_subnode(ms->fdt, nodename);
+qemu_fdt_setprop(ms->fdt, nodename, "compatible", compat, sizeof(compat));
+qemu_fdt_setprop_sized_cells(ms->fdt, nodename, "reg",
+ 2, base, 2, size);
+qemu_fdt_setprop_cells(ms->fdt, nodename, "interrupts",
+GIC_FDT_IRQ_TYPE_SPI, irq, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 1, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 2, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 3, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 4, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 5, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 6, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 7, GIC_FDT_IRQ_FLAGS_LEVEL_HI,
+GIC_FDT_IRQ_TYPE_SPI, irq + 8, GIC_FDT_IRQ_FLAGS_LEVEL_HI);
+
+qemu_fdt_setprop(ms->fdt, nodename, "interrupt-names", irq_names,
+ sizeof(irq_names));
+
+qemu_fdt_setprop_cell(ms->fdt, nodename, "clocks", vms->clock_phandle);
+

[PATCH v3 3/4] Update SMMU v3 creation to support non PCI/PCIe device connection

2021-08-23 Thread Li, Chunming
From: LCM 

  . Add sid-map property to store non PCI/PCIe devices SID
  . Create IOMMU memory regions for non PCI/PCIe devices based on their SID
  . Update SID getting strategy for PCI/PCIe and non PCI/PCIe devices

Signed-off-by: Chunming Li 
Signed-off-by: Renwei Liu 
---
 hw/arm/smmuv3.c  | 46 
 include/hw/arm/smmu-common.h |  7 +-
 include/hw/arm/smmuv3.h  |  2 ++
 3 files changed, 54 insertions(+), 1 deletion(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 01b60bee4..11d7fe842 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -32,6 +32,7 @@
 #include "hw/arm/smmuv3.h"
 #include "smmuv3-internal.h"
 #include "smmu-internal.h"
+#include "hw/qdev-properties.h"
 
 /**
  * smmuv3_trigger_irq - pulse @irq if enabled and update
@@ -1430,6 +1431,19 @@ static void smmu_reset(DeviceState *dev)
 smmuv3_init_regs(s);
 }
 
+static SMMUDevice *smmu_find_peri_sdev(SMMUState *s, uint16_t sid)
+{
+SMMUDevice *sdev;
+
+QLIST_FOREACH(sdev, &s->peri_sdev_list, next) {
+if (smmu_get_sid(sdev) == sid) {
+return sdev;
+}
+}
+
+return NULL;
+}
+
 static void smmu_realize(DeviceState *d, Error **errp)
 {
 SMMUState *sys = ARM_SMMU(d);
@@ -1437,6 +1451,9 @@ static void smmu_realize(DeviceState *d, Error **errp)
 SMMUv3Class *c = ARM_SMMUV3_GET_CLASS(s);
 SysBusDevice *dev = SYS_BUS_DEVICE(d);
 Error *local_err = NULL;
+SMMUDevice *sdev;
+char *name = NULL;
+uint16_t sid = 0;
 
 c->parent_realize(d, &local_err);
 if (local_err) {
@@ -1454,6 +1471,28 @@ static void smmu_realize(DeviceState *d, Error **errp)
 sysbus_init_mmio(dev, &sys->iomem);
 
 smmu_init_irq(s, dev);
+
+/* Create IOMMU memory region for peripheral devices based on their SID */
+for (int i = 0; i < s->num_sid; i++) {
+sid = s->sid_map[i];
+sdev = smmu_find_peri_sdev(sys, sid);
+if (sdev) {
+continue;
+}
+
+sdev = g_new0(SMMUDevice, 1);
+sdev->smmu = sys;
+sdev->bus = NULL;
+sdev->devfn = sid;
+
+name = g_strdup_printf("%s-peri-%d", sys->mrtypename, sid);
+memory_region_init_iommu(&sdev->iommu, sizeof(sdev->iommu),
+ sys->mrtypename,
+ OBJECT(sys), name, 1ULL << SMMU_MAX_VA_BITS);
+
+QLIST_INSERT_HEAD(&sys->peri_sdev_list, sdev, next);
+g_free(name);
+}
 }
 
 static const VMStateDescription vmstate_smmuv3_queue = {
@@ -1506,6 +1545,12 @@ static void smmuv3_instance_init(Object *obj)
 /* Nothing much to do here as of now */
 }
 
+static Property smmuv3_properties[] = {
+DEFINE_PROP_ARRAY("sid-map", SMMUv3State, num_sid, sid_map,
+  qdev_prop_uint16, uint16_t),
+DEFINE_PROP_END_OF_LIST(),
+};
+
 static void smmuv3_class_init(ObjectClass *klass, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(klass);
@@ -1515,6 +1560,7 @@ static void smmuv3_class_init(ObjectClass *klass, void *data)
 device_class_set_parent_reset(dc, smmu_reset, &c->parent_reset);
 c->parent_realize = dc->realize;
 dc->realize = smmu_realize;
+device_class_set_props(dc, smmuv3_properties);
 }
 
 static int smmuv3_notify_flag_changed(IOMMUMemoryRegion *iommu,
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 706be3c6d..95cd12a4b 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -117,6 +117,7 @@ struct SMMUState {
 QLIST_HEAD(, SMMUDevice) devices_with_notifiers;
 uint8_t bus_num;
 PCIBus *primary_bus;
+QLIST_HEAD(, SMMUDevice) peri_sdev_list;
 };
 
 struct SMMUBaseClass {
@@ -138,7 +139,11 @@ SMMUPciBus *smmu_find_smmu_pcibus(SMMUState *s, uint8_t bus_num);
 /* Return the stream ID of an SMMU device */
 static inline uint16_t smmu_get_sid(SMMUDevice *sdev)
 {
-return PCI_BUILD_BDF(pci_bus_num(sdev->bus), sdev->devfn);
+if (sdev->bus == NULL) {
+return sdev->devfn;
+} else {
+return PCI_BUILD_BDF(pci_bus_num(sdev->bus), sdev->devfn);
+}
 }
 
 /**
diff --git a/include/hw/arm/smmuv3.h b/include/hw/arm/smmuv3.h
index c641e6073..32ba84990 100644
--- a/include/hw/arm/smmuv3.h
+++ b/include/hw/arm/smmuv3.h
@@ -39,6 +39,8 @@ struct SMMUv3State {
 uint32_t features;
 uint8_t sid_size;
 uint8_t sid_split;
+uint32_t num_sid;
+uint16_t *sid_map;
 
 uint32_t idr[6];
 uint32_t iidr;
-- 





[PATCH v3 2/4] hw/arm/smmuv3: Update implementation of CFGI commands based on device SID

2021-08-23 Thread Li, Chunming
From: LCM 

The "smmu_iommu_mr" function cannot look up the MR by SID for non-PCI/PCIe
devices, so replace "smmuv3_flush_config" with "g_hash_table_foreach_remove"
keyed on the device SID.
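
The foreach-remove approach described above can be sketched without GLib as
a predicate applied to every cached entry. This is a plain-C illustration,
not the patch's code: the array-based table and all "sketch_" names are
hypothetical, and the inclusive range check is an assumption about how a
single-SID invalidation (start == end) would be matched.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical cache entry; the real config cache is a GHashTable. */
struct sketch_cfg_entry {
    uint32_t sid;
    int valid;
};

struct sketch_sid_range {
    uint32_t start;
    uint32_t end;   /* inclusive (assumption of this sketch) */
};

/* Predicate in the style of a remove callback: nonzero means "drop it". */
static int sketch_sid_in_range(const struct sketch_cfg_entry *e,
                               const struct sketch_sid_range *r)
{
    return e->sid >= r->start && e->sid <= r->end;
}

/* Invalidate every valid entry the predicate selects; return the count. */
static size_t sketch_invalidate(struct sketch_cfg_entry *tbl, size_t n,
                                const struct sketch_sid_range *r)
{
    size_t removed = 0;

    for (size_t i = 0; i < n; i++) {
        if (tbl[i].valid && sketch_sid_in_range(&tbl[i], r)) {
            tbl[i].valid = 0;
            removed++;
        }
    }
    return removed;
}
```

Because the match is on the SID rather than on a memory region pointer, the
same path serves PCI/PCIe and non-PCI devices alike.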

Signed-off-by: Chunming Li 
Signed-off-by: Renwei Liu 
---
 hw/arm/smmuv3.c  | 35 ++-
 include/hw/arm/smmu-common.h |  5 -
 2 files changed, 14 insertions(+), 26 deletions(-)

diff --git a/hw/arm/smmuv3.c b/hw/arm/smmuv3.c
index 11d7fe842..9f3f13fb8 100644
--- a/hw/arm/smmuv3.c
+++ b/hw/arm/smmuv3.c
@@ -613,14 +613,6 @@ static SMMUTransCfg *smmuv3_get_config(SMMUDevice *sdev, SMMUEventInfo *event)
 return cfg;
 }
 
-static void smmuv3_flush_config(SMMUDevice *sdev)
-{
-SMMUv3State *s = sdev->smmu;
-SMMUState *bc = &s->smmu_state;
-
-trace_smmuv3_config_cache_inv(smmu_get_sid(sdev));
-g_hash_table_remove(bc->configs, sdev);
-}
 
 static IOMMUTLBEntry smmuv3_translate(IOMMUMemoryRegion *mr, hwaddr addr,
   IOMMUAccessFlags flag, int iommu_idx)
@@ -964,22 +956,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_STE:
 {
 uint32_t sid = CMD_SID();
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
+SMMUSIDRange sid_range;
 
 if (CMD_SSEC()) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
+sid_range.start = sid;
+sid_range.end = sid;
 trace_smmuv3_cmdq_cfgi_ste(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
-
+g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
+&sid_range);
 break;
 }
 case SMMU_CMD_CFGI_STE_RANGE: /* same as SMMU_CMD_CFGI_ALL */
@@ -1006,21 +994,18 @@ static int smmuv3_cmdq_consume(SMMUv3State *s)
 case SMMU_CMD_CFGI_CD_ALL:
 {
 uint32_t sid = CMD_SID();
-IOMMUMemoryRegion *mr = smmu_iommu_mr(bs, sid);
-SMMUDevice *sdev;
+SMMUSIDRange sid_range;
 
 if (CMD_SSEC()) {
 cmd_error = SMMU_CERROR_ILL;
 break;
 }
 
-if (!mr) {
-break;
-}
-
+sid_range.start = sid;
+sid_range.end = sid;
 trace_smmuv3_cmdq_cfgi_cd(sid);
-sdev = container_of(mr, SMMUDevice, iommu);
-smmuv3_flush_config(sdev);
+g_hash_table_foreach_remove(bs->configs, smmuv3_invalidate_ste,
+&sid_range);
 break;
 }
 case SMMU_CMD_TLBI_NH_ASID:
diff --git a/include/hw/arm/smmu-common.h b/include/hw/arm/smmu-common.h
index 95cd12a4b..d016455d8 100644
--- a/include/hw/arm/smmu-common.h
+++ b/include/hw/arm/smmu-common.h
@@ -159,7 +159,10 @@ int smmu_ptw(SMMUTransCfg *cfg, dma_addr_t iova, IOMMUAccessFlags perm,
  */
 SMMUTransTableInfo *select_tt(SMMUTransCfg *cfg, dma_addr_t iova);
 
-/* Return the iommu mr associated to @sid, or NULL if none */
+/**
+ * Return the iommu mr associated to @sid, or NULL if none
+ * Only for PCI device, check smmu_find_peri_sdev for non PCI/PCIe device
+ */
 IOMMUMemoryRegion *smmu_iommu_mr(SMMUState *s, uint32_t sid);
 
 #define SMMU_IOTLB_MAX_SIZE 256
-- 





Re: [PATCH] hw/nvme: fix validation of ASQ and ACQ

2021-08-23 Thread Keith Busch
On Mon, Aug 23, 2021 at 02:20:18PM +0200, Klaus Jensen wrote:
> From: Klaus Jensen 
> 
> Address 0x0 is a valid address. Fix the admin submission and completion
> queue address validation to not error out on this.

Indeed, there are environments that can use that address. It's a host error if
the controller was enabled with invalid queue addresses anyway. The controller
only needs to verify the lower bits are clear, which we do later.
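
As a minimal sketch of the remaining check: the function name and the
page_size parameter below are illustrative assumptions, not the hw/nvme
code. The point is only that an alignment test on the low bits accepts
address 0x0 naturally.

```c
#include <stdint.h>

/*
 * Return 1 if addr is aligned to page_size (which must be a power of two),
 * else 0. Address 0x0 passes, since it is aligned to every page size.
 */
static int sketch_queue_addr_ok(uint64_t addr, uint64_t page_size)
{
    return (addr & (page_size - 1)) == 0;
}
```

With a 4 KiB page size, 0x0 and 0x2000 are accepted and 0x2004 is rejected.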

Reviewed-by: Keith Busch 



Re: [PATCH for-6.2 v5 0/5] hw/arm/virt: Introduce cpu topology support

2021-08-23 Thread Michael S. Tsirkin
On Thu, Aug 05, 2021 at 08:39:16PM +0800, Yanan Wang wrote:
> Hi,
> 
> This is a new version (v5) of the series [1] that I posted to introduce
> support for generating cpu topology descriptions to virt machine guest.
> 
> Once the view of an accurate virtual cpu topology is provided to guest,
> with a well-designed vCPU pinning to the pCPU we may get a huge benefit,
> e.g., the scheduling performance improvement. See Dario Faggioli's
> research and the related performance tests in [2] for reference.
> 
> This patch series introduces cpu topology support for ARM platform.
> Both cpu-map in DT and ACPI PPTT table are introduced to store the
> topology information. And we only describe the topology information
> to 6.2 and newer virt machines, considering compatibility.

ACPI things:

Reviewed-by: Michael S. Tsirkin 

pls merge through ARM tree.

> [1] 
> https://lore.kernel.org/qemu-devel/20210622093413.13360-1-wangyana...@huawei.com/
> [2] 
> https://kvmforum2020.sched.com/event/eE1y/virtual-topology-for-virtual-machines-friend-or-foe-dario-faggioli-suse
> 
> Some tests:
> 1) -smp 16,sockets=2,cores=4,threads=2,maxcpus=16
> lscpu:
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              16
> On-line CPU(s) list: 0-15
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           2
> NUMA node(s):        1
> Vendor ID:           ARM
> Model:               2
> Model name:          Cortex-A72
> Stepping:            r0p2
> BogoMIPS:            100.00
> NUMA node0 CPU(s):   0-15
> 
> cat /sys/devices/system/cpu/present  -->  0-15
> cat /sys/devices/system/cpu/possible -->  0-15
> 
> 2) -smp 8,sockets=2,cores=4,threads=2,maxcpus=16
> lscpu:
> Architecture:        aarch64
> Byte Order:          Little Endian
> CPU(s):              8
> On-line CPU(s) list: 0-7
> Thread(s) per core:  2
> Core(s) per socket:  4
> Socket(s):           1
> NUMA node(s):        1
> Vendor ID:           ARM
> Model:               2
> Model name:          Cortex-A72
> Stepping:            r0p2
> BogoMIPS:            100.00
> NUMA node0 CPU(s):   0-7
> 
> cat /sys/devices/system/cpu/present  -->  0-7
> cat /sys/devices/system/cpu/possible -->  0-7
> 
> ---
> 
> Changelogs:
> 
> v4->v5:
> - drop the added -smp "expose=on|off" parameter and only describe topology
>   for 6.2 and newer machines
> - rebased the code on patch series [3] which has introduced some fix and
>   improvement for smp parsing
> - [3]: 
> https://lore.kernel.org/qemu-devel/20210803080527.156556-1-wangyana...@huawei.com/
> 
> v3->v4:
> - add new -smp parameter "expose=on|off" for users to enable/disable the 
> feature
> - add stricter -smp cmdline parsing rules on "expose=on" case
> - move build_pptt to generic aml-build.c
> - add default cluster node in the cpu-map
> - rebase on top of latest upstream master
> - v3: 
> https://lore.kernel.org/qemu-devel/20210516102900.28036-1-wangyana...@huawei.com/
> 
> v2->v3:
> - address comments from David, Philippe, and Andrew. Thanks!
> - split some change into separate commits for ease of review
> - adjust parsing rules of virt_smp_parse to be more strict
>   (after discussion with Andrew)
> - adjust author credit for the patches
> 
> v1->v2:
> - Address Andrew Jones's comments
> - Address Michael S. Tsirkin's comments
> 
> ---
> 
> Andrew Jones (2):
>   hw/arm/virt: Add cpu-map to device tree
>   hw/acpi/aml-build: Generate PPTT table
> 
> Yanan Wang (3):
>   hw/arm/virt: Only describe cpu topology to guest since virt 6.2
>   device_tree: Add qemu_fdt_add_path
>   hw/acpi/aml-build: Add Processor hierarchy node structure
> 
>  hw/acpi/aml-build.c  | 76 
>  hw/arm/virt-acpi-build.c |  8 +++-
>  hw/arm/virt.c| 62 -
>  include/hw/acpi/aml-build.h  |  7 
>  include/hw/arm/virt.h|  4 +-
>  include/sysemu/device_tree.h |  1 +
>  softmmu/device_tree.c| 44 -
>  7 files changed, 188 insertions(+), 14 deletions(-)
> 
> -- 
> 2.19.1




Re: [PATCH for-6.2 v5 5/5] hw/acpi/aml-build: Generate PPTT table

2021-08-23 Thread Michael S. Tsirkin
On Thu, Aug 05, 2021 at 08:39:21PM +0800, Yanan Wang wrote:
> From: Andrew Jones 
> 
> Add the Processor Properties Topology Table (PPTT) to expose
> CPU topology information defined by users to ACPI guests.
> 
> Note, a DT-boot Linux guest with a non-flat CPU topology will
> see socket and core IDs being sequential integers starting
> from zero, which is different from ACPI-boot Linux guest,
> e.g. with -smp 4,sockets=2,cores=2,threads=1
> 
> a DT boot produces:
> 
>  cpu:  0 package_id:  0 core_id:  0
>  cpu:  1 package_id:  0 core_id:  1
>  cpu:  2 package_id:  1 core_id:  0
>  cpu:  3 package_id:  1 core_id:  1
> 
> an ACPI boot produces:
> 
>  cpu:  0 package_id: 36 core_id:  0
>  cpu:  1 package_id: 36 core_id:  1
>  cpu:  2 package_id: 96 core_id:  2
>  cpu:  3 package_id: 96 core_id:  3
> 
> This is due to several reasons:
> 
>  1) DT cpu nodes do not have an equivalent field to what the PPTT
> ACPI Processor ID must be, i.e. something equal to the MADT CPU
> UID or equal to the UID of an ACPI processor container. In both
> ACPI cases those are platform dependent IDs assigned by the
> vendor.
> 
>  2) While QEMU is the vendor for a guest, if the topology specifies
> SMT (> 1 thread), then, with ACPI, it is impossible to assign a
> core-id the same value as a package-id, thus it is not possible
> to have package-id=0 and core-id=0. This is because package and
> core containers must be in the same ACPI namespace and therefore
> must have unique UIDs.
> 
>  3) ACPI processor containers are not mandatorily required for PPTT
> tables to be used and, due to the limitations of which IDs are
> selected described above in (2), they are not helpful for QEMU,
> so we don't build them with this patch. In the absence of them,
> Linux assigns its own unique IDs. The maintainers have chosen not
> to use counters from zero, but rather ACPI table offsets, which
> explains why the numbers are so much larger than with DT.
> 
>  4) When there is no SMT (threads=1) the core IDs for ACPI boot guests
> match the logical CPU IDs, because these IDs must be equal to the
> MADT CPU UID (as no processor containers are present), and QEMU
> uses the logical CPU ID for these MADT IDs.
> 
> So in summary, with QEMU as vender for the guest, we use sequential

vendor?

> integers starting from zero for non-leaf nodes without valid ID flag,
> so that guest will ignore them and use table offsets as unique IDs.
> And we use logical CPU IDs for leaf nodes to be consistent with MADT.
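
To make the ID scheme in the summary concrete, here is a small sketch of the
nested socket/core/thread walk. It is not the patch's build_pptt(): the
"sketch_" names are hypothetical, and the threads == 1 case (one leaf per
core, attached directly to the socket) is an assumption, since the quoted
hunk is cut off before that branch.

```c
#include <stdint.h>

/* Hypothetical topology description (mirrors the -smp sockets/cores/threads). */
struct sketch_topo {
    unsigned sockets;
    unsigned cores;
    unsigned threads;
};

/* Number of processor hierarchy nodes the nested loops would emit. */
static unsigned sketch_pptt_node_count(const struct sketch_topo *t)
{
    unsigned count = 0;

    for (unsigned s = 0; s < t->sockets; s++) {
        count++;                        /* socket node, no valid ID flag */
        for (unsigned c = 0; c < t->cores; c++) {
            if (t->threads > 1) {
                count++;                /* core container, no valid ID flag */
                count += t->threads;    /* one leaf per thread */
            } else {
                count++;                /* the core itself is the leaf */
            }
        }
    }
    return count;
}

/* Leaf ID: a running counter in loop order, i.e. the logical CPU index. */
static unsigned sketch_leaf_uid(const struct sketch_topo *t, unsigned socket,
                                unsigned core, unsigned thread)
{
    return (socket * t->cores + core) * t->threads + thread;
}
```

For -smp 4,sockets=2,cores=2,threads=1 this gives leaf IDs 0..3 in CPU
order, which is consistent with the core_id column of the ACPI boot output
quoted above.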
> 
> Signed-off-by: Andrew Jones 
> Co-developed-by: Yanan Wang 
> Signed-off-by: Yanan Wang 
> ---
>  hw/acpi/aml-build.c | 50 +
>  hw/arm/virt-acpi-build.c|  8 +-
>  include/hw/acpi/aml-build.h |  3 +++
>  3 files changed, 60 insertions(+), 1 deletion(-)
> 
> diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
> index 9fa5024414..aa61c9651e 100644
> --- a/hw/acpi/aml-build.c
> +++ b/hw/acpi/aml-build.c
> @@ -1946,6 +1946,56 @@ void build_processor_hierarchy_node(GArray *tbl, 
> uint32_t flags,
>  }
>  }
>  
> +/* ACPI 6.2: 5.2.29 Processor Properties Topology Table (PPTT) */
> +void build_pptt(GArray *table_data, BIOSLinker *linker, MachineState *ms,
> +const char *oem_id, const char *oem_table_id)
> +{
> +int pptt_start = table_data->len;
> +int uid = 0;
> +int socket;
> +
> +acpi_data_push(table_data, sizeof(AcpiTableHeader));
> +
> +for (socket = 0; socket < ms->smp.sockets; socket++) {
> +uint32_t socket_offset = table_data->len - pptt_start;
> +int core;
> +
> +build_processor_hierarchy_node(
> +table_data,
> +(1 << 0), /* ACPI 6.2 - Physical package */

A bit better to be detailed:

/* Physical package - represents the boundary of a physical package */


> +0, socket, NULL, 0);
> +
> +for (core = 0; core < ms->smp.cores; core++) {
> +uint32_t core_offset = table_data->len - pptt_start;
> +int thread;
> +
> +if (ms->smp.threads > 1) {
> +build_processor_hierarchy_node(table_data, 0, socket_offset,

and here:
/* Physical package - does not represent the boundary of a physical package */


> +   core, NULL, 0);
> +
> +for (thread = 0; thread < ms->smp.threads; thread++) {
> +build_processor_hierarchy_node(
> +table_data,
> +(1 << 1) | /* ACPI 6.2 - ACPI Processor ID valid */
> +(1 << 2) | /* ACPI 6.3 - Processor is a Thread */
> +(1 << 3),  /* ACPI 6.3 - Node is a Leaf */
> +core_offset, uid++, NULL, 0);
> +}
> +} else {
> +build_processor_hierarchy_node(
> +table_data,
> +(1 << 1) | /* ACPI 

Re: [PATCH v2 1/1] virtio: failover: define the default device to use in case of error

2021-08-23 Thread Michael S. Tsirkin
On Mon, Aug 09, 2021 at 07:13:42PM +0200, Laurent Vivier wrote:
> If the guest driver doesn't support the STANDBY feature, by default
> we keep the virtio-net device and don't hotplug the VFIO device,
> but in some cases, user can prefer to use the VFIO device rather
> than the virtio-net one. We can't unplug the virtio-net device
> (because on migration it is expected on the destination side)
> but we can force the guest driver to be disabled. Then, we can
> hotplug the VFIO device that will be unplugged before the migration
> like in the normal failover migration but without the failover device.
> 
> This patch adds a new property to virtio-net device: "failover-default".
> 
> By default, "failover-default" is set to true and thus the default NIC
> to use if the failover cannot be enabled is the virtio-net device
> (this is what is done until now with the virtio-net failover).
> 
> If "failover-default" is set to false, in case of error, the virtio-net
> device is not the default anymore and the failover primary device
> is used instead.
> 
> If the STANDBY feature is supported by guest and host, the virtio-net
> failover acts as usual.
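
The behaviour described in the paragraphs above reduces to one boolean
decision. The sketch below restates the condition from the quoted hunk with
plain parameters; the function and parameter names are illustrative, not
QEMU identifiers.

```c
#include <stdbool.h>

/*
 * Decide whether the primary (VFIO) device gets plugged.
 *   failover          - the "failover" property is enabled
 *   has_standby       - guest negotiated VIRTIO_NET_F_STANDBY
 *   has_version_1     - guest negotiated VIRTIO_F_VERSION_1
 *   failover_default  - the proposed "failover-default" property
 */
static bool sketch_plug_primary(bool failover, bool has_standby,
                                bool has_version_1, bool failover_default)
{
    if (!failover) {
        return false;
    }
    return has_standby || (has_version_1 && !failover_default);
}
```

Note that a legacy guest (no VERSION_1) without STANDBY never gets the
primary, whatever failover-default is set to.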
> 
> Signed-off-by: Laurent Vivier 

Three things I dislike here. First this is limited to 1.0.
OTOH this is all about legacy guests without STANDBY,
would be nicer to support legacy.
Second: the reason we don't want both
virtio and VFIO is because their mac addresses match.
This tends to confuse guest tools.
I don't see this solved here.

Proposal: management supplies an extra dummy mac.
This mac is used with virtio and its link is down.
Link state reporting is also optional but
it has been there for many years.
If link state reporting is disabled then maybe do not
expose VFIO after all.

Third thing is option name. Does not hint at the fact that
for legacy guests we do not get failover at all.
Let's try to be more explicit please.



> ---
>  include/hw/virtio/virtio-net.h |  1 +
>  hw/net/virtio-net.c| 49 +-
>  2 files changed, 44 insertions(+), 6 deletions(-)
> 
> diff --git a/include/hw/virtio/virtio-net.h b/include/hw/virtio/virtio-net.h
> index 824a69c23f06..ab77930a327e 100644
> --- a/include/hw/virtio/virtio-net.h
> +++ b/include/hw/virtio/virtio-net.h
> @@ -208,6 +208,7 @@ struct VirtIONet {
>  /* primary failover device is hidden*/
>  bool failover_primary_hidden;
>  bool failover;
> +bool failover_default;
>  DeviceListener primary_listener;
>  Notifier migration_state;
>  VirtioNetRssData rss_data;
> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> index 16d20cdee52a..972c03232a96 100644
> --- a/hw/net/virtio-net.c
> +++ b/hw/net/virtio-net.c
> @@ -935,12 +935,23 @@ static void virtio_net_set_features(VirtIODevice *vdev, 
> uint64_t features)
>  memset(n->vlans, 0xff, MAX_VLAN >> 3);
>  }
>  
> -if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY)) {
> -qapi_event_send_failover_negotiated(n->netclient_name);
> -qatomic_set(&n->failover_primary_hidden, false);
> -failover_add_primary(n, &err);
> -if (err) {
> -warn_report_err(err);
> +/*
> + * if the virtio-net driver has the STANDBY feature, we can plug the 
> primary
> + * if not but is not the default failover device,
> + * we need to plug the primary alone and the virtio-net driver will
> + * be disabled in the validate_features() function but 
> validate_features()
> + * is only available with virtio 1.0 spec
> + */
> +if (n->failover) {
> +if (virtio_has_feature(features, VIRTIO_NET_F_STANDBY) ||
> +   (virtio_has_feature(features, VIRTIO_F_VERSION_1) &&
> +!n->failover_default)) {
> +qapi_event_send_failover_negotiated(n->netclient_name);
> +qatomic_set(&n->failover_primary_hidden, false);
> +failover_add_primary(n, &err);
> +if (err) {
> +warn_report_err(err);
> +}
>  }
>  }
>  }
> @@ -3625,9 +3636,34 @@ static Property virtio_net_properties[] = {
>  DEFINE_PROP_INT32("speed", VirtIONet, net_conf.speed, SPEED_UNKNOWN),
>  DEFINE_PROP_STRING("duplex", VirtIONet, net_conf.duplex_str),
>  DEFINE_PROP_BOOL("failover", VirtIONet, failover, false),
> +DEFINE_PROP_BOOL("failover-default", VirtIONet, failover_default, true),
>  DEFINE_PROP_END_OF_LIST(),
>  };
>  
> +/* validate_features() is only available with VIRTIO_F_VERSION_1 */
> +static int failover_validate_features(VirtIODevice *vdev)
> +{
> +VirtIONet *n = VIRTIO_NET(vdev);
> +
> +/*
> + * If the guest driver doesn't support the STANDBY feature, by default
> + * we keep the virtio-net device and don't hotplug the VFIO device,
> + * but in some cases, user can prefer to use the VFIO device rather
> + * than the virtio-net one. We can't unplug the virtio-net device
> + * (because on migration it is expected on the destination side)
> 

Re: [PATCH 0/3] gdbstub: add support for switchable endianness

2021-08-23 Thread Changbin Du
On Mon, Aug 23, 2021 at 04:30:05PM +0100, Peter Maydell wrote:
> On Mon, 23 Aug 2021 at 16:21, Philippe Mathieu-Daudé  
> wrote:
> >
> > On 8/23/21 4:20 PM, Changbin Du wrote:
> > > To resolve the issue to debug switchable targets, this series introduces
> > > basic infrastructure for gdbstub and enable support for ARM and RISC-V
> > > targets.
> > >
> > > For example, now there is no problem to debug a big-endian aarch64 
> > > target
> > > on x86 host.
> > >
> > >   $ qemu-system-aarch64 -gdb tcp::1234,endianness=big ...
> >
> > I don't understand why you need all that.
> > Maybe you aren't using gdb-multiarch?
> >
> > You can install it or start it via QEMU Debian Docker image:
> >
> > $ docker run -it --rm -v /tmp:/tmp -u $UID --network=host \
> >   registry.gitlab.com/qemu-project/qemu/qemu/debian10 \
> >   gdb-multiarch -q \
> > --ex 'set architecture aarch64' \
> > --ex 'set endian big'
> > The target architecture is assumed to be aarch64
> > The target is assumed to be big endian
> > (gdb) target remote 172.17.0.1:1234
> 
> I don't think that will help, because an AArch64 CPU (at least
> in the boards we model) will always start up in little-endian,
> and our gdbstub will always transfer register data etc in
> little-endian order, because gdb cannot cope with a target that
> isn't always the same endianness. Fixing this requires gdb
Yes, that's the problem.

> changes to be more capable of handling dynamic target changes
> (this would also help with eg debugging across 32<->64 bit switches);
> as I understand it that gdb work would be pretty significant,
> and at least for aarch64 pretty much nobody cares about
> big-endian, so nobody's got round to doing it yet.
> 
Mostly we do not care about dynamic target changes, because nearly every OS
sets its endianness mode with its first instruction. Dynamic changes are also
hard for gdb, since the byte order of the debugging info in the ELF is fixed.
And currently the GDB remote protocol does not support querying endianness info
from the remote.

So usually we needn't change the byte order during a debug session; we just want
the qemu gdbstub to send data, and handle the data it receives, in the right
byte order. This patch does that with the user's help, via the 'endianness='
option.
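
As a rough illustration of what "sending data in the right byte order" means
for a register value: the helper below is a hypothetical sketch, not code
from this series. It encodes a 64-bit register as the hex byte string a
remote debugger would read, with the byte order picked by a flag.

```c
#include <stdint.h>

/*
 * Encode a 64-bit register as 16 hex digits in the requested byte order.
 * out must have room for 17 bytes (16 digits plus the terminating NUL).
 */
static void sketch_reg_to_hex(uint64_t val, int big_endian, char out[17])
{
    static const char digits[] = "0123456789abcdef";

    for (int i = 0; i < 8; i++) {
        /* Big endian emits the most significant byte first. */
        int shift = big_endian ? (7 - i) * 8 : i * 8;
        uint8_t byte = (uint8_t)(val >> shift);

        out[2 * i] = digits[byte >> 4];
        out[2 * i + 1] = digits[byte & 0xf];
    }
    out[16] = '\0';
}
```

For the value 0x0102030405060708 the big-endian string is
"0102030405060708" and the little-endian one is "0807060504030201". If the
stub and gdb disagree on which of the two is in use, every register and
address is misread.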

> Our target/ppc/gdbstub.c code takes a different tack: it
> always sends register data in the same order the CPU is
> currently in, which has a different set of cases when it
> goes wrong.
>
Yes, I tried to do this before. But as I said above, GDB is unable to handle
dynamic target changes. Maybe we can offer this behaviour as
'endianness=default'? Anyway, this requires each target to provide an interface
to determine the current byte order.

> thanks
> -- PMM

-- 
Cheers,
Changbin Du



Re: [PATCH] hw/acpi/pcihp: validate bsel property of the bus before unplugging device

2021-08-23 Thread Michael S. Tsirkin
On Sat, Aug 21, 2021 at 08:35:35PM +0530, Ani Sinha wrote:
> Bsel property of the pci bus indicates whether the bus supports acpi hotplug.
> We need to validate the presence of this property before performing any 
> hotplug
> related callback operations. Currently validation of the existence of this
> property was absent from acpi_pcihp_device_unplug_cb() function but is present
> in other hotplug/unplug callback functions. Hence, this change adds the 
> missing
> check for the above function.
> 
> Signed-off-by: Ani Sinha 

I queued this but I have a general question:
are all these errors logged with LOG_GUEST_ERROR?
Because if not we have a security problem.
I also note that bsel is an internal property,
I am not sure we should be printing this to users,
it might just confuse them.

Same question for all the other places validating bsel.

> ---
>  hw/acpi/pcihp.c | 10 --
>  1 file changed, 8 insertions(+), 2 deletions(-)
> 
> diff --git a/hw/acpi/pcihp.c b/hw/acpi/pcihp.c
> index 0fd0c1d811..9982815a87 100644
> --- a/hw/acpi/pcihp.c
> +++ b/hw/acpi/pcihp.c
> @@ -372,9 +372,15 @@ void acpi_pcihp_device_unplug_cb(HotplugHandler 
> *hotplug_dev, AcpiPciHpState *s,
>   DeviceState *dev, Error **errp)
>  {
>  PCIDevice *pdev = PCI_DEVICE(dev);
> +int bsel = acpi_pcihp_get_bsel(pci_get_bus(pdev));
> +
> +trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn), bsel);
>  
> -trace_acpi_pci_unplug(PCI_SLOT(pdev->devfn),
> -  acpi_pcihp_get_bsel(pci_get_bus(pdev)));
> +if (bsel < 0) {
> +error_setg(errp, "Unsupported bus. Bus doesn't have property '"
> +   ACPI_PCIHP_PROP_BSEL "' set");
> +return;
> +}
>  
>  /*
>   * clean up acpi-index so it could reused by another device
> -- 
> 2.25.1




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 05:56:23PM -0400, Eduardo Habkost wrote:
> I don't have any other example, but I assume address assignment
> based on ordering is a common pattern in device code.
> 
> I would take a very close and careful look at the devices with
> non-default vmsd priority.  If you can prove that the 13 device
> types with non-default priority are all order-insensitive, a
> custom sort function as you describe might be safe.

Besides virtio-mem-pci, there'll also be a similar devfn issue with all
MIG_PRI_PCI_BUS devices, as they'll be allocated just like other pci devices.
Say, the two cmdlines below will generate different pci topologies too:

  $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
   -device pcie-root-port,chassis=1 \
   -device virtio-net-pci

And:

  $ qemu-system-x86_64 -device pcie-root-port,chassis=0 \
   -device virtio-net-pci \
   -device pcie-root-port,chassis=1

This cannot be solved by keeping priority==0 ordering.

On second thought, I think I was initially wrong to see migration priority and
device realization as the same problem.

For example, for live migration we require PCI_BUS to be migrated earlier than
MIG_PRI_IOMMU, because the IOMMU relies on the bus number to find address
spaces.  However that's definitely not a requirement for device realization;
say, realizing a vIOMMU after the pci buses is fine (bus assigned during bios).

I've probably mixed up the two ideas (though they really look alike!).  Sorry
about that.

Since the only ordering constraint so far is IOMMU vs all the rest of devices,
I'll introduce a new priority mechanism and only make sure vIOMMUs are realized
earlier.  That'll also avoid other implications on pci devfn allocations.

Will rework a new version tomorrow.  Thanks a lot for all the comments,

-- 
Peter Xu




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 05:54:44PM -0400, Michael S. Tsirkin wrote:
> > I can use a custom sort to replace qsort() to guarantee that.
> You don't have to do that. Simply use the device position on the command
> line for comparisons when priority is the same.
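
The suggestion above can be sketched as a qsort() comparator with a
tie-break. The struct and field names are hypothetical, and "higher priority
realizes first" is an assumption of the sketch. Since qsort() is not
guaranteed to be stable, falling back to the command-line position gives a
total order and keeps equal-priority devices in the order the user wrote
them.

```c
#include <stdlib.h>

/* Hypothetical per-device record built while parsing -device options. */
struct sketch_dev_opt {
    int priority;     /* higher realizes earlier (assumption) */
    int cmdline_pos;  /* index of the -device option on the command line */
};

static int sketch_cmp(const void *a, const void *b)
{
    const struct sketch_dev_opt *x = a;
    const struct sketch_dev_opt *y = b;

    if (x->priority != y->priority) {
        return x->priority > y->priority ? -1 : 1;
    }
    /* Same priority: preserve command-line order. */
    return (x->cmdline_pos > y->cmdline_pos) -
           (x->cmdline_pos < y->cmdline_pos);
}

static void sketch_sort(struct sketch_dev_opt *v, size_t n)
{
    qsort(v, n, sizeof(*v), sketch_cmp);
}
```

With priorities {0, 1, 0, 1} at positions 0..3, the resulting order of
positions is 1, 3, 0, 2.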

Indeed. :) Thanks,

-- 
Peter Xu




Re: [PATCH v4 01/14] target/riscv: Add x-zba, x-zbb, x-zbc and x-zbs properties

2021-08-23 Thread Alistair Francis
On Tue, Aug 24, 2021 at 4:12 AM Philipp Tomsich
 wrote:
>
> The bitmanipulation ISA extensions will be ratified as individual
> small extension packages instead of a large B-extension.  The first
> new instructions through the door (these have completed public review)
> are Zb[abcs].
>
> This adds new 'x-zba', 'x-zbb', 'x-zbc' and 'x-zbs' properties for
> these in target/riscv/cpu.[ch].
>
> Signed-off-by: Philipp Tomsich 
> Reviewed-by: Richard Henderson 

Reviewed-by: Alistair Francis 

Alistair

> ---
>
> (no changes since v3)
>
> Changes in v3:
> - Split off removal of 'x-b' property and 'ext_b' field into a separate
>   patch to ensure bisectability.
>
>  target/riscv/cpu.c | 4 
>  target/riscv/cpu.h | 4 
>  2 files changed, 8 insertions(+)
>
> diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> index 991a6bb760..c7bc1f9f44 100644
> --- a/target/riscv/cpu.c
> +++ b/target/riscv/cpu.c
> @@ -585,6 +585,10 @@ static Property riscv_cpu_properties[] = {
>  DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
>  /* This is experimental so mark with 'x-' */
>  DEFINE_PROP_BOOL("x-b", RISCVCPU, cfg.ext_b, false),
> +DEFINE_PROP_BOOL("x-zba", RISCVCPU, cfg.ext_zba, false),
> +DEFINE_PROP_BOOL("x-zbb", RISCVCPU, cfg.ext_zbb, false),
> +DEFINE_PROP_BOOL("x-zbc", RISCVCPU, cfg.ext_zbc, false),
> +DEFINE_PROP_BOOL("x-zbs", RISCVCPU, cfg.ext_zbs, false),
>  DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
>  DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false),
>  DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
> diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
> index bf1c899c00..7c4cd8ea89 100644
> --- a/target/riscv/cpu.h
> +++ b/target/riscv/cpu.h
> @@ -293,6 +293,10 @@ struct RISCVCPU {
>  bool ext_u;
>  bool ext_h;
>  bool ext_v;
> +bool ext_zba;
> +bool ext_zbb;
> +bool ext_zbc;
> +bool ext_zbs;
>  bool ext_counters;
>  bool ext_ifencei;
>  bool ext_icsr;
> --
> 2.25.1
>
>



Re: [PATCH 0/3] gdbstub: add support for switchable endianness

2021-08-23 Thread Changbin Du
On Mon, Aug 23, 2021 at 05:21:07PM +0200, Philippe Mathieu-Daudé wrote:
> On 8/23/21 4:20 PM, Changbin Du wrote:
> > To resolve the issue to debug switchable targets, this series introduces
> > basic infrastructure for gdbstub and enable support for ARM and RISC-V
> > targets.
> > 
> > For example, now there is no problem to debug a big-endian aarch64 target
> > on x86 host.
> > 
> >   $ qemu-system-aarch64 -gdb tcp::1234,endianness=big ...
> 
> I don't understand why you need all that.
> Maybe you aren't using gdb-multiarch?
>
Nope, my gdb supports all architectures.

> You can install it or start it via QEMU Debian Docker image:
> 
> $ docker run -it --rm -v /tmp:/tmp -u $UID --network=host \
>   registry.gitlab.com/qemu-project/qemu/qemu/debian10 \
>   gdb-multiarch -q \
> --ex 'set architecture aarch64' \
> --ex 'set endian big'
> The target architecture is assumed to be aarch64
> The target is assumed to be big endian
> (gdb) target remote 172.17.0.1:1234
> (gdb)
>
gdb has no problem reading the endianness and arch info from the ELF file. The
problem is how the qemu gdbstub handles the byte order of the data it receives.
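
To make the byte-order point concrete, here is a small Python sketch. It is
not QEMU or gdb code: the register value 0x40000000 and the helper name
decode_register are assumptions chosen only for illustration. It shows how
one 8-byte register payload decodes to two different addresses depending on
which byte order the receiving side assumes.

```python
def decode_register(payload: bytes, byteorder: str) -> int:
    """Interpret a raw register payload using the given byte order."""
    return int.from_bytes(payload, byteorder)


# Hypothetical 64-bit pc value, serialized in little-endian byte order.
pc = 0x40000000
payload = pc.to_bytes(8, "little")

# The same bytes, read back with matching and with mismatched byte order.
as_little = decode_register(payload, "little")
as_big = decode_register(payload, "big")

print(hex(as_little))  # 0x40000000
print(hex(as_big))     # 0x4000000000
```

If the two sides disagree on byte order, the debugger ends up with an address
that does not exist in the guest, so memory reads and breakpoints fail.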

Now let's try to debug a big-endian aarch64 linux kernel.

1) start qemu with '-gdb tcp::1234'

$ gdb-multiarch vmlinux
(gdb) target remote :1234
Remote debugging using :1234
0x0040 in ?? ()
=> 0x0040:  Cannot access memory at address 0x40
(gdb) ni
Cannot access memory at address 0x40
(gdb) show architecture 
The target architecture is set to "auto" (currently "aarch64").
(gdb) show endian 
The target endianness is set automatically (currently big endian).

As you can see, it can't work, not to mention adding breakpoints.

2) start qemu with '-gdb tcp::1234,endianness=big'

$ gdb-multiarch vmlinux
(gdb) target remote :1234
Remote debugging using :1234
0x4000 in ?? ()
=> 0x4000:  c0 00 00 58 ldr x0, 0x4018
(gdb) ni
0x4004 in ?? ()
=> 0x4004:  e1 03 1f aa mov x1, xzr
(gdb) b start_kernel
Breakpoint 1 at 0x800011130ee8 (2 locations)
(gdb) c
Continuing.

Thread 1 hit Breakpoint 1, 0x800011130ee8 in start_kernel ()
=> 0x800011130ee8 : 5f 24 03 d5 bti c
(gdb) bt
#0  0x800011130ee8 in start_kernel ()
#1  0x8000111303c8 in __primary_switched () at arch/arm64/kernel/head.S:467
Backtrace stopped: previous frame identical to this frame (corrupt stack?)

okay, now it works fine.

-- 
Cheers,
Changbin Du



[PATCH v6 0/5] python/aqmp: AQMP TUI

2021-08-23 Thread G S Niteesh Babu
Gitlab: https://gitlab.com/niteesh.gs/qemu/-/commits/aqmp-tui-prototype-v6
Based-on: <20210803182941.504537-1-js...@redhat.com> [v3,00/25] python:
introduce Asynchronous QMP package
CI: https://gitlab.com/niteesh.gs/qemu/-/pipelines/358117062

Updates since v5:

1) Moved all docstrings under init to class
2) Reworked the format_json function
3) Renamed the has_tui_handler function to has_handler_type
4) Added OSError to be considered for retrying
5) Reworked the editor to add messages to the end of the history stack.
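
For reviewers who want to try item 2 in isolation, here is a minimal
standalone sketch of the reworked format_json. The logic follows the function
in patch 2/5; the docstring is shortened and the sample inputs are assumptions
made up for this example.

```python
import json


def format_json(msg: str) -> str:
    """Collapse a (possibly invalid) multi-line JSON message to one line."""
    try:
        # Valid JSON: round-trip through the json module.
        return json.dumps(json.loads(msg))
    except json.decoder.JSONDecodeError:
        # Invalid JSON: drop newlines and squeeze runs of spaces.
        words = msg.replace('\n', '').split(' ')
        return ' '.join(filter(None, words))


# Valid JSON (although not a valid QMP message, since it is not an object).
valid = format_json('[ 1,\n  true,\n  3 ]')
# Invalid JSON: note the doubled comma.
invalid = format_json('{ "execute":\n   "stop",, }')

print(valid)    # [1, true, 3]
print(invalid)  # { "execute": "stop",, }
```

The fallback path keeps the user's text recognizable instead of rejecting it,
so the error message can be shown next to what was actually typed.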

G S Niteesh Babu (5):
  python: Add dependencies for AQMP TUI
  python/aqmp-tui: Add AQMP TUI
  python: Add entry point for aqmp-tui
  python: add optional pygments dependency
  python/aqmp-tui: Add syntax highlighting

 python/Pipfile.lock  |  20 ++
 python/qemu/aqmp/aqmp_tui.py | 652 +++
 python/setup.cfg |  27 +-
 3 files changed, 698 insertions(+), 1 deletion(-)
 create mode 100644 python/qemu/aqmp/aqmp_tui.py

-- 
2.17.1




[PATCH v6 2/5] python/aqmp-tui: Add AQMP TUI

2021-08-23 Thread G S Niteesh Babu
Added AQMP TUI.

Implements the following basic features:
1) Command transmission/reception.
2) Shows events asynchronously.
3) Shows server status in the bottom status bar.
4) Automatic retries on disconnects and error conditions.

Also added type annotations and necessary pylint/mypy configurations.

Signed-off-by: G S Niteesh Babu 
---
 python/qemu/aqmp/aqmp_tui.py | 620 +++
 python/setup.cfg |  13 +-
 2 files changed, 632 insertions(+), 1 deletion(-)
 create mode 100644 python/qemu/aqmp/aqmp_tui.py

diff --git a/python/qemu/aqmp/aqmp_tui.py b/python/qemu/aqmp/aqmp_tui.py
new file mode 100644
index 00..ac533541d2
--- /dev/null
+++ b/python/qemu/aqmp/aqmp_tui.py
@@ -0,0 +1,620 @@
+# Copyright (c) 2021
+#
+# Authors:
+#  Niteesh Babu G S 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or
+# later.  See the COPYING file in the top-level directory.
+"""
+AQMP TUI
+
+AQMP TUI is an asynchronous interface built on top of the AQMP library.
+It is the successor of QMP-shell and is brought in as a replacement for it.
+
+Example Usage: aqmp-tui 
+Full Usage: aqmp-tui --help
+"""
+
+import argparse
+import asyncio
+import json
+import logging
+from logging import Handler, LogRecord
+import signal
+from typing import (
+    List,
+    Optional,
+    Tuple,
+    Type,
+    Union,
+    cast,
+)
+
+import urwid
+import urwid_readline
+
+from ..qmp import QEMUMonitorProtocol, QMPBadPortError
+from .error import ProtocolError
+from .message import DeserializationError, Message, UnexpectedTypeError
+from .protocol import ConnectError, Runstate
+from .qmp_client import ExecInterruptedError, QMPClient
+from .util import create_task, pretty_traceback
+
+
+# The name of the signal that is used to update the history list
+UPDATE_MSG: str = 'UPDATE_MSG'
+
+
+def format_json(msg: str) -> str:
+    """
+    Formats valid/invalid multi-line JSON message into a single-line message.
+
+    Formatting is first tried using the standard json module. If that fails
+    due to a decoding error then a simple string manipulation is done to
+    achieve a single line JSON string.
+
+    Converting into single line is more aesthetically pleasing when looking
+    along with error messages.
+
+    Eg:
+    Input:
+          [ 1,
+            true,
+            3 ]
+    The above input is not a valid QMP message and produces the following error
+    "QMP message is not a JSON object."
+    When displaying this in TUI in multiline mode we get
+
+        [ 1,
+          true,
+          3 ]: QMP message is not a JSON object.
+
+    whereas in singleline mode we get the following
+
+        [1, true, 3]: QMP message is not a JSON object.
+
+    The single line mode is more aesthetically pleasing.
+
+    :param msg:
+        The message to be formatted into single line.
+
+    :return: Formatted singleline message.
+    """
+    try:
+        msg = json.loads(msg)
+        return str(json.dumps(msg))
+    except json.decoder.JSONDecodeError:
+        msg = msg.replace('\n', '')
+        words = msg.split(' ')
+        words = list(filter(None, words))
+        return ' '.join(words)
+
+
+def has_handler_type(logger: logging.Logger,
+                     handler_type: Type[Handler]) -> bool:
+    """
+    The Logger class has no interface to check if a certain type of handler is
+    installed or not. So we provide an interface to do so.
+
+    :param logger:
+        Logger object
+    :param handler_type:
+        The type of the handler to be checked.
+
+    :return: returns True if a handler of type `handler_type` is installed.
+    """
+    for handler in logger.handlers:
+        if isinstance(handler, handler_type):
+            return True
+    return False
+
+
+class App(QMPClient):
+    """
+    Implements the AQMP TUI.
+
+    Initializes the widgets and starts the urwid event loop.
+
+    :param address:
+        Address of the server to connect to.
+    :param num_retries:
+        The number of times to retry before stopping to reconnect.
+    :param retry_delay:
+        The delay(sec) before each retry
+    """
+    def __init__(self, address: Union[str, Tuple[str, int]], num_retries: int,
+                 retry_delay: Optional[int]) -> None:
+        urwid.register_signal(type(self), UPDATE_MSG)
+        self.window = Window(self)
+        self.address = address
+        self.aloop: Optional[asyncio.AbstractEventLoop] = None
+        self.num_retries = num_retries
+        self.retry_delay = retry_delay if retry_delay else 2
+        self.retry: bool = False
+        self.exiting: bool = False
+        super().__init__()
+
+    def add_to_history(self, msg: str, level: Optional[str] = None) -> None:
+        """
+        Appends the msg to the history list.
+
+        :param msg:
+            The raw message to be appended in string type.
+        """
+        urwid.emit_signal(self, UPDATE_MSG, msg, level)
+
+    def _cb_outbound(self, msg: Message) -> Message:
+

[PATCH v6 3/5] python: Add entry point for aqmp-tui

2021-08-23 Thread G S Niteesh Babu
Add an entry point for aqmp-tui. This will allow it to be run from
the command line using "aqmp-tui localhost:1234".
More options available in the TUI can be found using "aqmp-tui -h".

Signed-off-by: G S Niteesh Babu 
---
 python/setup.cfg | 1 +
 1 file changed, 1 insertion(+)

diff --git a/python/setup.cfg b/python/setup.cfg
index e9ceaea637..0850c7a10f 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -66,6 +66,7 @@ console_scripts =
     qom-fuse = qemu.qmp.qom_fuse:QOMFuse.entry_point [fuse]
     qemu-ga-client = qemu.qmp.qemu_ga_client:main
     qmp-shell = qemu.qmp.qmp_shell:main
+    aqmp-tui = qemu.aqmp.aqmp_tui:main [tui]
 
 [flake8]
 extend-ignore = E722  # Prefer pylint's bare-except checks to flake8's
-- 
2.17.1




[PATCH v6 4/5] python: add optional pygments dependency

2021-08-23 Thread G S Niteesh Babu
Added pygments as an optional dependency for AQMP TUI.
This is required for the upcoming syntax highlighting feature
in AQMP TUI.
The dependency has also been added in the devel optional group.

Added mypy 'ignore_missing_imports' for pygments since it does
not have any type stubs.

Signed-off-by: G S Niteesh Babu 
---
 python/Pipfile.lock | 8 
 python/setup.cfg| 5 +
 2 files changed, 13 insertions(+)

diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index da7a4ee164..d2a7dbd88b 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -200,6 +200,14 @@
 ],
 "version": "==2.0.0"
 },
+        "pygments": {
+            "hashes": [
+                "sha256:a18f47b506a429f6f4b9df81bb02beab9ca21d0a5fee38ed15aef65f0545519f",
+                "sha256:d66e804411278594d764fc69ec36ec13d9ae9147193a1740cd34d272ca383b8e"
+            ],
+            "markers": "python_version >= '3.5'",
+            "version": "==2.9.0"
+        },
         "pylint": {
             "hashes": [
                 "sha256:082a6d461b54f90eea49ca90fff4ee8b6e45e8029e5dbd72f6107ef84f3779c0",
diff --git a/python/setup.cfg b/python/setup.cfg
index 0850c7a10f..435f86384a 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -46,6 +46,7 @@ devel =
     tox >= 3.18.0
     urwid >= 2.1.2
     urwid-readline >= 0.13
+    Pygments >= 2.9.0
 
 # Provides qom-fuse functionality
 fuse =
@@ -55,6 +56,7 @@ fuse =
 tui =
     urwid >= 2.1.2
     urwid-readline >= 0.13
+    Pygments >= 2.9.0
 
 [options.entry_points]
 console_scripts =
@@ -97,6 +99,9 @@ ignore_missing_imports = True
 [mypy-urwid_readline]
 ignore_missing_imports = True
 
+[mypy-pygments]
+ignore_missing_imports = True
+
 [pylint.messages control]
 # Disable the message, report, category or checker with the given id(s). You
 # can either give multiple identifiers separated by comma (,) or put this
-- 
2.17.1




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:05:07PM -0400, Michael S. Tsirkin wrote:
> On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> > On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > > However that ordering may not be the ideal order.  For example, some platform
> > > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > > devices (e.g., vfio-pci, virtio).
> > > > 
> > > > This patch orders the QemuOptsList of '-device's so they'll be sorted first
> > > > before kicking off the device realizations.  This will allow the device
> > > > realization code to be able to use APIs like pci_device_iommu_address_space()
> > > > correctly, because those functions rely on the platform devices being realized.
> > > > 
> > > > Now we rely on vmsd->priority which is defined as MigrationPriority to provide
> > > > the ordering, as both VM init and migration completion will need such an
> > > > ordering.  In the future we can move that priority information out of vmsd.
> > > > 
> > > > Signed-off-by: Peter Xu 
> > > 
> > > Can we be 100% sure that changing the ordering of every single
> > > device being created won't affect guest ABI?  (I don't think we can)
> > 
> > > That's a good question, however I doubt whether there's any real-world guest
> > > ABI for that.  As a developer, I normally specify cmdline parameters in an ad hoc
> > > way, so I assume most parameters are not sensitive to ordering and I can
> > > tune the ordering as I wish.  I'm not sure whether that's common for qemu users,
> > > I would expect so, but I may have missed something that I'm not aware of.
> > > 
> > > To my knowledge the only "guest ABI" change is e.g. when we specify "vfio-pci"
> > > to be before "intel-iommu": it'll be constantly broken before this patchset,
> > > while after this series it'll be working.  It's just that I don't think that
> > > "guest ABI" needs to be kept, and that's exactly what I want to fix with
> > > the patchset.
> > 
> > > 
> > > How many device types in QEMU have non-default vmsd priority?
> > 
> > Not so much; here's the list of priorities and the devices using it:
> > 
> >|+-|
> >| priority   | devices |
> >|+-|
> >| MIG_PRI_IOMMU  |   3 |
> >| MIG_PRI_PCI_BUS|   7 |
> >| MIG_PRI_VIRTIO_MEM |   1 |
> >| MIG_PRI_GICV3_ITS  |   1 |
> >| MIG_PRI_GICV3  |   1 |
> >|+-|
> 
> iommu is probably ok. I think virtio mem is ok too,
> in that it is normally created by virtio-mem-pci ...

Hmm, this reminds me: could virtio-mem-pci get a different devfn allocated
after being moved?

But frankly I still doubt whether we should guarantee guest ABI when the user
does not specify addr=XXX in the pci device parameters - I feel like it's a
burden that we don't need to carry.

(Btw, trying to keep the order is one thing; declaring it guest ABI would be
 another thing to me)

> 
> 
> 
> > All the rest devices are using the default (0) priority.
> > 
> > > 
> > > Can we at least ensure devices with the same priority won't be
> > > reordered, just to be safe?  (qsort() doesn't guarantee that)
> > > 
> > > If very few device types have non-default vmsd priority and
> > > devices with the same priority aren't reordered, the risk of
> > > compatibility breakage would be much smaller.
> > 
> > I'm also wondering whether it's a good thing to break some guest ABI due to
> > this change, if possible.
> > 
> > Let's imagine something breaks after this is applied; then the only reason should
> > be that qsort() changed the order of some same-priority devices and it's not the
> > same as the user specified any more.  Then, does it also mean there's yet another
> > ordering requirement that we didn't even notice?
> > 
> > I doubt whether that'll even happen (or I think there'd be a report already, as
> > the qemu man page has no requirement on parameter ordering).  In all cases,
> > instead of "keeping the same priority devices in the same order as the user has
> > specified", IMHO we should make the broken devices have different priorities
> > so the ordering will be guaranteed by qemu internally, rather than by how the
> > user specified it.
> 
> Well giving user control of guest ABI is a reasonable thing to do,
> it is realize order that users do not really care about.

Makes sense.

> 
> I guess we could move pci slot allocation out of realize
> so it does not depend on realize order?

Yes that sounds like another approach, but it seems to require more changes.

Thanks,

-- 
Peter Xu




[PATCH v6 1/5] python: Add dependencies for AQMP TUI

2021-08-23 Thread G S Niteesh Babu
Added dependencies for the upcoming AQMP TUI under the optional
'tui' group.

The same dependencies have also been added under the devel group
since no workaround has been found for optional groups to imply
other optional groups.

Signed-off-by: G S Niteesh Babu 
---
 python/Pipfile.lock | 12 
 python/setup.cfg|  8 
 2 files changed, 20 insertions(+)

diff --git a/python/Pipfile.lock b/python/Pipfile.lock
index 457f5c3fe8..da7a4ee164 100644
--- a/python/Pipfile.lock
+++ b/python/Pipfile.lock
@@ -289,6 +289,18 @@
 "markers": "python_version < '3.8'",
 "version": "==3.10.0.0"
 },
+        "urwid": {
+            "hashes": [
+                "sha256:588bee9c1cb208d0906a9f73c613d2bd32c3ed3702012f51efe318a3f2127eae"
+            ],
+            "version": "==2.1.2"
+        },
+        "urwid-readline": {
+            "hashes": [
+                "sha256:018020cbc864bb5ed87be17dc26b069eae2755cb29f3a9c569aac3bded1efaf4"
+            ],
+            "version": "==0.13"
+        },
         "virtualenv": {
             "hashes": [
                 "sha256:14fdf849f80dbb29a4eb6caa9875d476ee2a5cf76a5f5415fa2f1606010ab467",
diff --git a/python/setup.cfg b/python/setup.cfg
index 152c683f41..589a90be21 100644
--- a/python/setup.cfg
+++ b/python/setup.cfg
@@ -44,11 +44,18 @@ devel =
     mypy >= 0.770
     pylint >= 2.8.0
     tox >= 3.18.0
+    urwid >= 2.1.2
+    urwid-readline >= 0.13
 
 # Provides qom-fuse functionality
 fuse =
     fusepy >= 2.0.4
 
+# AQMP TUI dependencies
+tui =
+    urwid >= 2.1.2
+    urwid-readline >= 0.13
+
 [options.entry_points]
 console_scripts =
     qom = qemu.qmp.qom:main
@@ -132,5 +139,6 @@ allowlist_externals = make
 deps =
     .[devel]
     .[fuse]  # Workaround to trigger tox venv rebuild
+    .[tui]   # Workaround to trigger tox venv rebuild
 commands =
     make check
-- 
2.17.1




[PATCH v6 5/5] python/aqmp-tui: Add syntax highlighting

2021-08-23 Thread G S Niteesh Babu
Add syntax highlighting for the incoming and outgoing QMP messages.
This is achieved using the pygments module which was added in a
previous commit.

The current implementation is a really simple one which doesn't
allow for any configuration. In future this has to be improved
to allow for easier theme config using an external config of
some sort.

Signed-off-by: G S Niteesh Babu 
---
 python/qemu/aqmp/aqmp_tui.py | 36 ++--
 1 file changed, 34 insertions(+), 2 deletions(-)

diff --git a/python/qemu/aqmp/aqmp_tui.py b/python/qemu/aqmp/aqmp_tui.py
index ac533541d2..a2929f771c 100644
--- a/python/qemu/aqmp/aqmp_tui.py
+++ b/python/qemu/aqmp/aqmp_tui.py
@@ -30,6 +30,8 @@
     cast,
 )
 
+from pygments import lexers
+from pygments import token as Token
 import urwid
 import urwid_readline
 
@@ -45,6 +47,22 @@
 UPDATE_MSG: str = 'UPDATE_MSG'
 
 
+palette = [
+    (Token.Punctuation, '', '', '', 'h15,bold', 'g7'),
+    (Token.Text, '', '', '', '', 'g7'),
+    (Token.Name.Tag, '', '', '', 'bold,#f88', 'g7'),
+    (Token.Literal.Number.Integer, '', '', '', '#fa0', 'g7'),
+    (Token.Literal.String.Double, '', '', '', '#6f6', 'g7'),
+    (Token.Keyword.Constant, '', '', '', '#6af', 'g7'),
+    ('DEBUG', '', '', '', '#ddf', 'g7'),
+    ('INFO', '', '', '', 'g100', 'g7'),
+    ('WARNING', '', '', '', '#ff6', 'g7'),
+    ('ERROR', '', '', '', '#a00', 'g7'),
+    ('CRITICAL', '', '', '', '#a00', 'g7'),
+    ('background', '', 'black', '', '', 'g7'),
+]
+
+
 def format_json(msg: str) -> str:
     """
     Formats valid/invalid multi-line JSON message into a single-line message.
@@ -353,6 +371,9 @@ def run(self, debug: bool = False) -> None:
         :param debug:
             Enables/Disables asyncio event loop debugging
         """
+        screen = urwid.raw_display.Screen()
+        screen.set_terminal_properties(256)
+
         self.aloop = asyncio.get_event_loop()
         self.aloop.set_debug(debug)
 
@@ -364,6 +385,8 @@ def run(self, debug: bool = False) -> None:
         event_loop = urwid.AsyncioEventLoop(loop=self.aloop)
         main_loop = urwid.MainLoop(urwid.AttrMap(self.window, 'background'),
                                    unhandled_input=self.unhandled_input,
+                                   screen=screen,
+                                   palette=palette,
                                    handle_mouse=True,
                                    event_loop=event_loop)
 
@@ -487,7 +510,8 @@ def __init__(self, parent: App) -> None:
         self.history = urwid.SimpleFocusListWalker([])
         super().__init__(self.history)
 
-    def add_to_history(self, history: str) -> None:
+    def add_to_history(self,
+                       history: Union[str, List[Tuple[str, str]]]) -> None:
         """
         Appends a message to the list and set the focus to the last appended
         message.
@@ -531,10 +555,18 @@ def cb_add_to_history(self, msg: str, level: Optional[str] = None) -> None:
 
         :param msg:
             The message to be appended to the history box.
+        :param level:
+            The log level of the message, if it is a log message.
         """
+        formatted = []
         if level:
             msg = f'[{level}]: {msg}'
-        self.history.add_to_history(msg)
+            formatted.append((level, msg))
+        else:
+            lexer = lexers.JsonLexer()  # pylint: disable=no-member
+            for token in lexer.get_tokens(msg):
+                formatted.append(token)
+        self.history.add_to_history(formatted)
 
 
 class Window(urwid.Frame):
 
 
 class Window(urwid.Frame):
-- 
2.17.1




Re: [RFC PATCH v2 0/5] physmem: Have flatview API check bus permission from MemTxAttrs argument

2021-08-23 Thread Alexander Bulekov
On 210823 1650, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> > On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé wrote:
> > >
> > > This series aims to kill a recent class of bug, the infamous
> > > "DMA reentrancy" issues found by Alexander while fuzzing.
> > >
> > > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> > >
> > > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> > >
> > > If a transaction permission is not allowed (for example access
> > > to non-RAM device), we return the specific MEMTX_BUS_ERROR.
> > >
> > > Permissions are checked after the flatview is resolved, and
> > > before the access is done, in a new function: flatview_access_allowed().
> > 
> > So I'm not going to say 'no' to this, because we have a real
> > recursive-device-handling problem and I don't have a better
> > idea to hand, but the thing about this is that we end up with
> > behaviour which is not what the real hardware does. I'm not
> > aware of any DMA device which has this kind of "can only DMA
> > to/from RAM, and aborts on access to a device" behaviour...
> 
> Sorry for not being familiar with the context - is there more info regarding
> the problem to fix?  I'm looking at the links mentioned in the old series:
> 
> https://lore.kernel.org/qemu-devel/20200903110831.353476-12-phi...@redhat.com/
> https://bugs.launchpad.net/qemu/+bug/1886362
> https://bugs.launchpad.net/qemu/+bug/1888606
> 
> They seem all marked as fixed now.

Here are some that should still reproduce:
https://gitlab.com/qemu-project/qemu/-/issues/542
https://gitlab.com/qemu-project/qemu/-/issues/540
https://gitlab.com/qemu-project/qemu/-/issues/541
https://gitlab.com/qemu-project/qemu/-/issues/62
https://lore.kernel.org/qemu-devel/20210218140629.373646-1-ppan...@redhat.com/ 
(CVE-2021-20255)

There's also this one, that I don't think I ever created a bug report
for (working on it now):
https://bugs.chromium.org/p/oss-fuzz/issues/detail?id=33247
-Alex

> 
> Thanks,
> 
> -- 
> Peter Xu
> 



Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Michael S. Tsirkin
On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > QEMU creates -device objects in order as specified by the user's cmdline.
> > > However that ordering may not be the ideal order.  For example, some platform
> > > devices (vIOMMUs) may want to be created earlier than most of the rest
> > > devices (e.g., vfio-pci, virtio).
> > > 
> > > This patch orders the QemuOptsList of '-device's so they'll be sorted first
> > > before kicking off the device realizations.  This will allow the device
> > > realization code to be able to use APIs like pci_device_iommu_address_space()
> > > correctly, because those functions rely on the platform devices being realized.
> > > 
> > > Now we rely on vmsd->priority which is defined as MigrationPriority to provide
> > > the ordering, as both VM init and migration completion will need such an
> > > ordering.  In the future we can move that priority information out of vmsd.
> > > 
> > > Signed-off-by: Peter Xu 
> > 
> > Can we be 100% sure that changing the ordering of every single
> > device being created won't affect guest ABI?  (I don't think we can)
> 
> That's a good question, however I doubt whether there's any real-world guest
> ABI for that.  As a developer, I normally specify cmdline parameters in an ad hoc
> way, so I assume most parameters are not sensitive to ordering and I can
> tune the ordering as I wish.  I'm not sure whether that's common for qemu users,
> I would expect so, but I may have missed something that I'm not aware of.
> 
> To my knowledge the only "guest ABI" change is e.g. when we specify "vfio-pci"
> to be before "intel-iommu": it'll be constantly broken before this patchset,
> while after this series it'll be working.  It's just that I don't think that
> "guest ABI" needs to be kept, and that's exactly what I want to fix with
> the patchset.
> 
> > 
> > How many device types in QEMU have non-default vmsd priority?
> 
> Not so much; here's the list of priorities and the devices using it:
> 
>    |--------------------+---------|
>    | priority           | devices |
>    |--------------------+---------|
>    | MIG_PRI_IOMMU      |       3 |
>    | MIG_PRI_PCI_BUS    |       7 |
>    | MIG_PRI_VIRTIO_MEM |       1 |
>    | MIG_PRI_GICV3_ITS  |       1 |
>    | MIG_PRI_GICV3      |       1 |
>    |--------------------+---------|

iommu is probably ok. I think virtio mem is ok too,
in that it is normally created by virtio-mem-pci ...



> All the rest devices are using the default (0) priority.
> 
> > 
> > Can we at least ensure devices with the same priority won't be
> > reordered, just to be safe?  (qsort() doesn't guarantee that)
> > 
> > If very few device types have non-default vmsd priority and
> > devices with the same priority aren't reordered, the risk of
> > compatibility breakage would be much smaller.
> 
> I'm also wondering whether it's a good thing to break some guest ABI due to
> this change, if possible.
> 
> Let's imagine something breaks after this is applied; then the only reason should
> be that qsort() changed the order of some same-priority devices and it's not the
> same as the user specified any more.  Then, does it also mean there's yet another
> ordering requirement that we didn't even notice?
> 
> I doubt whether that'll even happen (or I think there'd be a report already, as
> the qemu man page has no requirement on parameter ordering).  In all cases,
> instead of "keeping the same priority devices in the same order as the user has
> specified", IMHO we should make the broken devices have different priorities
> so the ordering will be guaranteed by qemu internally, rather than by how the
> user specified it.

Well giving user control of guest ABI is a reasonable thing to do,
it is realize order that users do not really care about.

I guess we could move pci slot allocation out of realize
so it does not depend on realize order?


> From that pov, maybe this patchset would be great if it can be accepted and
> applied in early stage of a release? So we can figure out what's missing and
> fix them within the same release.  However again I still doubt whether there's
> any user that will break in a bad way.
> 
> Thanks,
> 
> -- 
> Peter Xu




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Eduardo Habkost
On Mon, Aug 23, 2021 at 05:31:46PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 05:07:03PM -0400, Eduardo Habkost wrote:
> > To give just one example:
> > 
> > $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci -device e1000e -monitor stdio | tail -n 20
> >   Bus  0, device   4, function 0:
> > Ethernet controller: PCI device 1af4:1000
> >   PCI subsystem 1af4:0001
> >   IRQ 0, pin A
> >   BAR0: I/O at 0x [0x001e].
> >   BAR1: 32 bit memory at 0x [0x0ffe].
> >   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> >   Bus  0, device   5, function 0:
> > Ethernet controller: PCI device 8086:10d3
> >   PCI subsystem 8086:
> >   IRQ 0, pin A
> >   BAR0: 32 bit memory at 0x [0x0001fffe].
> >   BAR1: 32 bit memory at 0x [0x0001fffe].
> >   BAR2: I/O at 0x [0x001e].
> >   BAR3: 32 bit memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> > (qemu) quit
> > $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device virtio-net-pci -monitor stdio | tail -n 20
> >   Bus  0, device   4, function 0:
> > Ethernet controller: PCI device 8086:10d3
> >   PCI subsystem 8086:
> >   IRQ 0, pin A
> >   BAR0: 32 bit memory at 0x [0x0001fffe].
> >   BAR1: 32 bit memory at 0x [0x0001fffe].
> >   BAR2: I/O at 0x [0x001e].
> >   BAR3: 32 bit memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> >   Bus  0, device   5, function 0:
> > Ethernet controller: PCI device 1af4:1000
> >   PCI subsystem 1af4:0001
> >   IRQ 0, pin A
> >   BAR0: I/O at 0x [0x001e].
> >   BAR1: 32 bit memory at 0x [0x0ffe].
> >   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> > (qemu) quit
> > 
> > 
> > If the order of the -device arguments changes, the devices are assigned to
> > different PCI slots.
> 
> Thanks for the example.
> 
> Initially I thought about this and didn't think it an issue (because serious
> users will always specify addr=XXX for -device; I thought libvirt always does
> that), but I do remember that guest OS could identify its hardware config with
> devfn number, so nmcli may mess up its config with before/after this change
> indeed..
> 
> I can use a custom sort to replace qsort() to guarantee that.
> 
> Do you have other examples in mind that I may have overlooked, especially ones
> I may not be able to fix by a custom sort that only moves priority>=1 devices?

I don't have any other example, but I assume address assignment
based on ordering is a common pattern in device code.

I would take a very close and careful look at the devices with
non-default vmsd priority.  If you can prove that the 13 device
types with non-default priority are all order-insensitive, a
custom sort function as you describe might be safe.

-- 
Eduardo




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Michael S. Tsirkin
On Mon, Aug 23, 2021 at 05:31:46PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 05:07:03PM -0400, Eduardo Habkost wrote:
> > To give just one example:
> > 
> > $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci -device e1000e -monitor stdio | tail -n 20
> >   Bus  0, device   4, function 0:
> > Ethernet controller: PCI device 1af4:1000
> >   PCI subsystem 1af4:0001
> >   IRQ 0, pin A
> >   BAR0: I/O at 0x [0x001e].
> >   BAR1: 32 bit memory at 0x [0x0ffe].
> >   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> >   Bus  0, device   5, function 0:
> > Ethernet controller: PCI device 8086:10d3
> >   PCI subsystem 8086:
> >   IRQ 0, pin A
> >   BAR0: 32 bit memory at 0x [0x0001fffe].
> >   BAR1: 32 bit memory at 0x [0x0001fffe].
> >   BAR2: I/O at 0x [0x001e].
> >   BAR3: 32 bit memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> > (qemu) quit
> > $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device virtio-net-pci -monitor stdio | tail -n 20
> >   Bus  0, device   4, function 0:
> > Ethernet controller: PCI device 8086:10d3
> >   PCI subsystem 8086:
> >   IRQ 0, pin A
> >   BAR0: 32 bit memory at 0x [0x0001fffe].
> >   BAR1: 32 bit memory at 0x [0x0001fffe].
> >   BAR2: I/O at 0x [0x001e].
> >   BAR3: 32 bit memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> >   Bus  0, device   5, function 0:
> > Ethernet controller: PCI device 1af4:1000
> >   PCI subsystem 1af4:0001
> >   IRQ 0, pin A
> >   BAR0: I/O at 0x [0x001e].
> >   BAR1: 32 bit memory at 0x [0x0ffe].
> >   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
> >   BAR6: 32 bit memory at 0x [0x0003fffe].
> >   id ""
> > (qemu) quit
> > 
> > 
> > If the order of the -device arguments changes, the devices are assigned to
> > different PCI slots.
> 
> Thanks for the example.
> 
> Initially I thought about this and didn't think it an issue (because serious
> users will always specify addr=XXX for -device; I thought libvirt always does
> that), but I do remember that guest OS could identify its hardware config with
> devfn number, so nmcli may mess up its config with before/after this change
> indeed..
> 
> I can use a custom sort to replace qsort() to guarantee that.


You don't have to do that. Simply use the device position on the command
line for comparisons when priority is the same.
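
A minimal sketch of that tie-break, written in Python rather than QEMU's C
so it can be run standalone. The DeviceOpt type, the realize_order helper and
the priority values are assumptions made up for illustration; the only idea
taken from this thread is "sort by priority, then by command-line position".

```python
from typing import List, NamedTuple


class DeviceOpt(NamedTuple):
    position: int   # index of the -device option on the command line
    driver: str
    priority: int   # hypothetical stand-in for vmsd->priority


def realize_order(opts: List[DeviceOpt]) -> List[DeviceOpt]:
    # Higher priority first (an assumption for this sketch); equal
    # priorities fall back to command-line position, so the result does
    # not depend on whether the underlying sort is stable.
    return sorted(opts, key=lambda o: (-o.priority, o.position))


cmdline = ["vfio-pci", "e1000e", "intel-iommu", "virtio-net-pci"]
priorities = {"intel-iommu": 1}  # assumed: everything else defaults to 0
opts = [DeviceOpt(i, d, priorities.get(d, 0)) for i, d in enumerate(cmdline)]

order = [o.driver for o in realize_order(opts)]
print(order)
# ['intel-iommu', 'vfio-pci', 'e1000e', 'virtio-net-pci']
```

Because the position is part of the comparison key, no two entries ever
compare equal, which is what makes the same approach safe with qsort().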


> Do you have other examples in mind that I may have overlooked, especially
> ones I may not be able to fix with a custom sort that only moves
> priority>=1 devices?
> 
> Thanks,

> -- 
> Peter Xu




Re: [PATCH V6 00/27] Live Update

2021-08-23 Thread Steven Sistare
Hi Zheng, testing aarch64 is on our todo list. We will run this case and try to 
reproduce the failure.  Thanks for the report.

- Steve

On 8/21/2021 4:54 AM, Zheng Chuan wrote:
> Hi, Steve
> 
> It seems the VM gets stuck after cpr-load in an AArch64 environment?
> 
> My AArch64 environment and test steps:
> 1. linux kernel: 5.14-rc6
> 2. QEMU version: v6.1.0-rc2 (with your patchset applied), configured with
> `../configure --target-list=aarch64-softmmu --disable-werror --enable-kvm`
> 4. Steps to live update:
> # ./build/aarch64-softmmu/qemu-system-aarch64 -machine 
> virt,accel=kvm,gic-version=3,memfd-alloc=on -nodefaults -cpu host -m 2G -smp 
> 1 -drive 
> file=/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw,if=pflash,format=raw,readonly=on
> -drive file=,format=qcow2,if=none,id=drive_image1
> -device virtio-blk-pci,id=image1,drive=drive_image1 -vnc :10 -device
> virtio-gpu,id=video0 -device piix3-usb-uhci,id=usb -device
> usb-tablet,id=input0,bus=usb.0,port=1 -device
> usb-kbd,id=input1,bus=usb.0,port=2 -monitor stdio
> (qemu) cpr-save /tmp/qemu.save restart
> (qemu) cpr-exec ./build/aarch64-softmmu/qemu-system-aarch64 -machine 
> virt,accel=kvm,gic-version=3,memfd-alloc=on -nodefaults -cpu host -m 2G -smp 
> 1 -drive 
> file=/usr/share/edk2/aarch64/QEMU_EFI-pflash.raw,if=pflash,format=raw,readonly=on
> -drive file=,format=qcow2,if=none,id=drive_image1
> -device virtio-blk-pci,id=image1,drive=drive_image1 -vnc :10 -device
> virtio-gpu,id=video0 -device piix3-usb-uhci,id=usb -device
> usb-tablet,id=input0,bus=usb.0,port=1 -device
> usb-kbd,id=input1,bus=usb.0,port=2 -monitor stdio -S
> (qemu) QEMU 6.0.92 monitor - type 'help' for more information
> (qemu) cpr-load /tmp/qemu.save
> 
> Did I miss something?
> 
> On 2021/8/7 5:43, Steve Sistare wrote:
>> Provide the cpr-save, cpr-exec, and cpr-load commands for live update.
>> These save and restore VM state, with minimal guest pause time, so that
>> qemu may be updated to a new version in between.
>>
>> cpr-save stops the VM and saves vmstate to an ordinary file.  It supports
>> any type of guest image and block device, but the caller must not modify
>> guest block devices between cpr-save and cpr-load.  It supports two modes:
>> reboot and restart.
>>
>> In reboot mode, the caller invokes cpr-save and then terminates qemu.
>> The caller may then update the host kernel and system software and reboot.
>> The caller resumes the guest by running qemu with the same arguments as the
>> original process and invoking cpr-load.  To use this mode, guest ram must be
>> mapped to a persistent shared memory file such as /dev/dax0.0, or /dev/shm
>> PKRAM as proposed in 
>> https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yzn...@oracle.com.
>>
>> The reboot mode supports vfio devices if the caller first suspends the
>> guest, such as by issuing guest-suspend-ram to the qemu guest agent.  The
>> guest drivers' suspend methods flush outstanding requests and re-initialize
>> the devices, and thus there is no device state to save and restore.
>>
>> Restart mode preserves the guest VM across a restart of the qemu process.
>> After cpr-save, the caller passes qemu command-line arguments to cpr-exec,
>> which directly exec's the new qemu binary.  The arguments must include -S
>> so new qemu starts in a paused state and waits for the cpr-load command.
>> The restart mode supports vfio devices by preserving the vfio container,
>> group, device, and event descriptors across the qemu re-exec, and by
>> updating DMA mapping virtual addresses using VFIO_DMA_UNMAP_FLAG_VADDR and
>> VFIO_DMA_MAP_FLAG_VADDR as defined in 
>> https://lore.kernel.org/kvm/1611939252-7240-1-git-send-email-steven.sist...@oracle.com/
>> and integrated in Linux kernel 5.12.
>>
>> To use the restart mode, qemu must be started with the memfd-alloc option,
>> which allocates guest ram using memfd_create.  The memfd's are saved to
>> the environment and kept open across exec, after which they are found from
>> the environment and re-mmap'd.  Hence guest ram is preserved in place,
>> albeit with new virtual addresses in the qemu process.
>>
>> The caller resumes the guest by invoking cpr-load, which loads state from
>> the file. If the VM was running at cpr-save time, then VM execution resumes.
>> If the VM was suspended at cpr-save time (reboot mode), then the caller must
>> issue a system_wakeup command to resume.
>>
>> The first patches add reboot mode:
>>   - memory: qemu_check_ram_volatile
>>   - migration: fix populate_vfio_info
>>   - migration: qemu file wrappers
>>   - migration: simplify savevm
>>   - vl: start on wakeup request
>>   - cpr: reboot mode
>>   - cpr: reboot HMP interfaces
>>
>> The next patches add restart mode:
>>   - memory: flat section iterator
>>   - oslib: qemu_clear_cloexec
>>   - machine: memfd-alloc option
>>   - qapi: list utility functions
>>   - vl: helper to request re-exec
>>   - cpr: preserve extra state
>>   - cpr: restart mode
>>   - cpr: restart HMP 

Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 05:07:03PM -0400, Eduardo Habkost wrote:
> To give just one example:
> 
> $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci 
> -device e1000e -monitor stdio | tail -n 20
>   Bus  0, device   4, function 0:
> Ethernet controller: PCI device 1af4:1000
>   PCI subsystem 1af4:0001
>   IRQ 0, pin A
>   BAR0: I/O at 0x [0x001e].
>   BAR1: 32 bit memory at 0x [0x0ffe].
>   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
>   BAR6: 32 bit memory at 0x [0x0003fffe].
>   id ""
>   Bus  0, device   5, function 0:
> Ethernet controller: PCI device 8086:10d3
>   PCI subsystem 8086:
>   IRQ 0, pin A
>   BAR0: 32 bit memory at 0x [0x0001fffe].
>   BAR1: 32 bit memory at 0x [0x0001fffe].
>   BAR2: I/O at 0x [0x001e].
>   BAR3: 32 bit memory at 0x [0x3ffe].
>   BAR6: 32 bit memory at 0x [0x0003fffe].
>   id ""
> (qemu) quit
> $ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device 
> virtio-net-pci -monitor stdio | tail -n 20
>   Bus  0, device   4, function 0:
> Ethernet controller: PCI device 8086:10d3
>   PCI subsystem 8086:
>   IRQ 0, pin A
>   BAR0: 32 bit memory at 0x [0x0001fffe].
>   BAR1: 32 bit memory at 0x [0x0001fffe].
>   BAR2: I/O at 0x [0x001e].
>   BAR3: 32 bit memory at 0x [0x3ffe].
>   BAR6: 32 bit memory at 0x [0x0003fffe].
>   id ""
>   Bus  0, device   5, function 0:
> Ethernet controller: PCI device 1af4:1000
>   PCI subsystem 1af4:0001
>   IRQ 0, pin A
>   BAR0: I/O at 0x [0x001e].
>   BAR1: 32 bit memory at 0x [0x0ffe].
>   BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
>   BAR6: 32 bit memory at 0x [0x0003fffe].
>   id ""
> (qemu) quit
> 
> 
> If the order of the -device arguments changes, the devices are assigned to
> different PCI slots.

Thanks for the example.

Initially I thought about this and didn't think it was an issue (because
serious users will always specify addr=XXX for -device; I thought libvirt
always does that), but I do remember that a guest OS can identify its
hardware config by devfn number, so nmcli may indeed mess up its config
across this change.

I can use a custom sort to replace qsort() to guarantee that.
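
As a rough sketch of such a custom sort (the PrioDevice type and the function
name are hypothetical, not code from the series), a stable insertion sort only
moves higher-priority entries forward and leaves ties in their original order:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a -device entry. */
typedef struct {
    int id;        /* position on the command line */
    int priority;  /* higher value is realized earlier */
} PrioDevice;

/* Stable insertion sort, descending by priority. */
static void stable_sort_by_priority(PrioDevice *devs, size_t n)
{
    assert(devs != NULL || n == 0);
    for (size_t i = 1; i < n; i++) {
        PrioDevice cur = devs[i];
        size_t j = i;

        /* Shift only strictly lower-priority entries; ties stay in place. */
        while (j > 0 && devs[j - 1].priority < cur.priority) {
            devs[j] = devs[j - 1];
            j--;
        }
        devs[j] = cur;
    }
}
```

Insertion sort is quadratic, which should not matter for the number of
-device options on a typical command line.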

Do you have other examples in mind that I may have overlooked, especially
ones I may not be able to fix with a custom sort that only moves priority>=1
devices?

Thanks,

-- 
Peter Xu




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Eduardo Habkost
On Mon, Aug 23, 2021 at 03:18:51PM -0400, Peter Xu wrote:
> On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> > On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > > QEMU creates -device objects in the order specified on the user's
> > > cmdline.  However, that ordering may not be the ideal order.  For
> > > example, some platform devices (vIOMMUs) may want to be created earlier
> > > than most other devices (e.g., vfio-pci, virtio).
> > > 
> > > This patch sorts the QemuOptsList of '-device' options before kicking
> > > off the device realizations.  This allows the device realization code
> > > to use APIs like pci_device_iommu_address_space() correctly, because
> > > those functions rely on the platform devices being realized.
> > > 
> > > For now we rely on vmsd->priority, which is defined as
> > > MigrationPriority, to provide the ordering, since both VM init and
> > > migration completion need such an ordering.  In the future we can move
> > > that priority information out of vmsd.
> > > 
> > > Signed-off-by: Peter Xu 
> > 
> > Can we be 100% sure that changing the ordering of every single
> > device being created won't affect guest ABI?  (I don't think we can)
> 
> That's a good question, but I doubt there's any real-world guest ABI that
> depends on it.  As a developer, I normally specify cmdline parameters in
> an ad hoc way, so I assume most parameters are not sensitive to ordering
> and I can change the ordering as I wish.  I'm not sure whether that's
> common for qemu users; I would expect so, but I may have missed something
> that I'm not aware of.

To give just one example:

$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device virtio-net-pci 
-device e1000e -monitor stdio | tail -n 20
  Bus  0, device   4, function 0:
Ethernet controller: PCI device 1af4:1000
  PCI subsystem 1af4:0001
  IRQ 0, pin A
  BAR0: I/O at 0x [0x001e].
  BAR1: 32 bit memory at 0x [0x0ffe].
  BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
  BAR6: 32 bit memory at 0x [0x0003fffe].
  id ""
  Bus  0, device   5, function 0:
Ethernet controller: PCI device 8086:10d3
  PCI subsystem 8086:
  IRQ 0, pin A
  BAR0: 32 bit memory at 0x [0x0001fffe].
  BAR1: 32 bit memory at 0x [0x0001fffe].
  BAR2: I/O at 0x [0x001e].
  BAR3: 32 bit memory at 0x [0x3ffe].
  BAR6: 32 bit memory at 0x [0x0003fffe].
  id ""
(qemu) quit
$ (echo 'info pci';echo quit;) | qemu-system-x86_64 -device e1000e -device 
virtio-net-pci -monitor stdio | tail -n 20
  Bus  0, device   4, function 0:
Ethernet controller: PCI device 8086:10d3
  PCI subsystem 8086:
  IRQ 0, pin A
  BAR0: 32 bit memory at 0x [0x0001fffe].
  BAR1: 32 bit memory at 0x [0x0001fffe].
  BAR2: I/O at 0x [0x001e].
  BAR3: 32 bit memory at 0x [0x3ffe].
  BAR6: 32 bit memory at 0x [0x0003fffe].
  id ""
  Bus  0, device   5, function 0:
Ethernet controller: PCI device 1af4:1000
  PCI subsystem 1af4:0001
  IRQ 0, pin A
  BAR0: I/O at 0x [0x001e].
  BAR1: 32 bit memory at 0x [0x0ffe].
  BAR4: 64 bit prefetchable memory at 0x [0x3ffe].
  BAR6: 32 bit memory at 0x [0x0003fffe].
  id ""
(qemu) quit


If the order of the -device arguments changes, the devices are assigned to
different PCI slots.


> 
> To my knowledge the only "guest ABI" change is, e.g., when we specify
> "vfio-pci" before "intel-iommu": that is always broken before this
> patchset, while after this series it works.  I just don't think that kind
> of "guest ABI" needs to be kept, and that's exactly what I want to fix
> with the patchset.

If the only ordering changes caused by this patch were intentional and affected
only configurations that are known to be broken (like vfio-pci vs intel-iommu),
I would agree.

However, if we are reordering every single -device option in an unspecified way
(like qsort() does when elements compare as equal), we are probably breaking
guest ABI and creating a completely different machine (like in the PCI example 
above).


> 
> > 
> > How many device types in QEMU have non-default vmsd priority?
> 
> Not so much; here's the list of priorities and the devices using it:
> 
>    |--------------------+---------|
>    | priority           | devices |
>    |--------------------+---------|
>    | MIG_PRI_IOMMU      |       3 |
>    | MIG_PRI_PCI_BUS    |       7 |
>    | MIG_PRI_VIRTIO_MEM |       1 |
>    | MIG_PRI_GICV3_ITS  |       1 |
>    | MIG_PRI_GICV3      |       1 |
>

Re: [RFC PATCH v2 0/5] physmem: Have flaview API check bus permission from MemTxAttrs argument

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 08:10:50PM +0100, Peter Maydell wrote:
> On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé  
> wrote:
> >
> > This series aims to kill a recent class of bugs, the infamous
> > "DMA reentrancy" issues found by Alexander while fuzzing.
> >
> > Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
> >
> > - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> > - MEMTXPERM_UNRESTRICTED (allow list approach)
> > - MEMTXPERM_RAM_DEVICE (example of deny list approach)
> >
> > If a transaction is not permitted (for example, an access to a
> > non-RAM device), we return the specific MEMTX_BUS_ERROR.
> >
> > Permissions are checked after the flatview is resolved and before
> > the access is done, in a new function: flatview_access_allowed().
> 
> So I'm not going to say 'no' to this, because we have a real
> recursive-device-handling problem and I don't have a better
> idea to hand, but the thing about this is that we end up with
> behaviour which is not what the real hardware does. I'm not
> aware of any DMA device which has this kind of "can only DMA
> to/from RAM, and aborts on access to a device" behaviour...
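
The check described in the quoted cover letter might look roughly like the
following stand-alone model. The enum names are stand-ins for the MEMTXPERM_*
values and MEMTX_BUS_ERROR mentioned above, their numeric values are
placeholders, and the rule that RAM_DEVICE only permits RAM targets is an
assumption drawn from the cover letter, not from the patches themselves:

```c
#include <assert.h>
#include <stdbool.h>

/* Stand-ins for MEMTXPERM_UNSPECIFIED / _UNRESTRICTED / _RAM_DEVICE. */
typedef enum {
    BUS_PERM_UNSPECIFIED,
    BUS_PERM_UNRESTRICTED,
    BUS_PERM_RAM_DEVICE,
} BusPerm;

/* Stand-ins for a successful transaction and MEMTX_BUS_ERROR. */
typedef enum {
    TX_OK,
    TX_BUS_ERROR,
} TxResult;

/* Decide, after the target region is known, whether the access may proceed. */
static TxResult access_check(BusPerm perm, bool target_is_ram)
{
    assert(perm >= BUS_PERM_UNSPECIFIED && perm <= BUS_PERM_RAM_DEVICE);
    switch (perm) {
    case BUS_PERM_UNSPECIFIED:   /* current default, unchanged behavior */
    case BUS_PERM_UNRESTRICTED:  /* explicitly allowed everywhere */
        return TX_OK;
    case BUS_PERM_RAM_DEVICE:    /* only RAM-backed targets are allowed */
        return target_is_ram ? TX_OK : TX_BUS_ERROR;
    }
    return TX_BUS_ERROR;
}
```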

Sorry for not being familiar with the context - is there more info regarding
the problem to fix?  I'm looking at the links mentioned in the old series:

https://lore.kernel.org/qemu-devel/20200903110831.353476-12-phi...@redhat.com/
https://bugs.launchpad.net/qemu/+bug/1886362
https://bugs.launchpad.net/qemu/+bug/1888606

They all seem to be marked as fixed now.

Thanks,

-- 
Peter Xu




[PATCH v5 24/24] target/riscv: Use {get,dest}_gpr for RVV

2021-08-23 Thread Richard Henderson
Remove gen_get_gpr, as the function becomes unused.

Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 13 ++---
 target/riscv/insn_trans/trans_rvv.c.inc | 74 +++--
 2 files changed, 26 insertions(+), 61 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index e44254e878..e356fc6c46 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -232,11 +232,6 @@ static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
 g_assert_not_reached();
 }
 
-static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
-{
-tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
-}
-
 static TCGv dest_gpr(DisasContext *ctx, int reg_num)
 {
 if (reg_num == 0 || ctx->w) {
@@ -637,9 +632,11 @@ void riscv_translate_init(void)
 {
 int i;
 
-/* cpu_gpr[0] is a placeholder for the zero register. Do not use it. */
-/* Use the gen_set_gpr and gen_get_gpr helper functions when accessing */
-/* registers, unless you specifically block reads/writes to reg 0 */
+/*
+ * cpu_gpr[0] is a placeholder for the zero register. Do not use it.
+ * Use the gen_set_gpr and get_gpr helper functions when accessing regs,
+ * unless you specifically block reads/writes to reg 0.
+ */
 cpu_gpr[0] = NULL;
 
 for (i = 1; i < 32; i++) {
diff --git a/target/riscv/insn_trans/trans_rvv.c.inc b/target/riscv/insn_trans/trans_rvv.c.inc
index de580c493c..fa451938f1 100644
--- a/target/riscv/insn_trans/trans_rvv.c.inc
+++ b/target/riscv/insn_trans/trans_rvv.c.inc
@@ -27,27 +27,22 @@ static bool trans_vsetvl(DisasContext *ctx, arg_vsetvl *a)
 return false;
 }
 
-s2 = tcg_temp_new();
-dst = tcg_temp_new();
+s2 = get_gpr(ctx, a->rs2, EXT_ZERO);
+dst = dest_gpr(ctx, a->rd);
 
 /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
 if (a->rs1 == 0) {
 /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
 s1 = tcg_constant_tl(RV_VLEN_MAX);
 } else {
-s1 = tcg_temp_new();
-gen_get_gpr(ctx, s1, a->rs1);
+s1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 }
-gen_get_gpr(ctx, s2, a->rs2);
 gen_helper_vsetvl(dst, cpu_env, s1, s2);
 gen_set_gpr(ctx, a->rd, dst);
+
 tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
 lookup_and_goto_ptr(ctx);
 ctx->base.is_jmp = DISAS_NORETURN;
-
-tcg_temp_free(s1);
-tcg_temp_free(s2);
-tcg_temp_free(dst);
 return true;
 }
 
@@ -60,23 +55,20 @@ static bool trans_vsetvli(DisasContext *ctx, arg_vsetvli *a)
 }
 
 s2 = tcg_constant_tl(a->zimm);
-dst = tcg_temp_new();
+dst = dest_gpr(ctx, a->rd);
 
 /* Using x0 as the rs1 register specifier, encodes an infinite AVL */
 if (a->rs1 == 0) {
 /* As the mask is at least one bit, RV_VLEN_MAX is >= VLMAX */
 s1 = tcg_constant_tl(RV_VLEN_MAX);
 } else {
-s1 = tcg_temp_new();
-gen_get_gpr(ctx, s1, a->rs1);
+s1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 }
 gen_helper_vsetvl(dst, cpu_env, s1, s2);
 gen_set_gpr(ctx, a->rd, dst);
+
 gen_goto_tb(ctx, 0, ctx->pc_succ_insn);
 ctx->base.is_jmp = DISAS_NORETURN;
-
-tcg_temp_free(s1);
-tcg_temp_free(dst);
 return true;
 }
 
@@ -173,7 +165,7 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
-base = tcg_temp_new();
+base = get_gpr(s, rs1, EXT_NONE);
 
 /*
  * As simd_desc supports at most 256 bytes, and in this implementation,
@@ -184,7 +176,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
  */
 desc = tcg_constant_i32(simd_desc(s->vlen / 8, s->vlen / 8, data));
 
-gen_get_gpr(s, base, rs1);
 tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
 
@@ -192,7 +183,6 @@ static bool ldst_us_trans(uint32_t vd, uint32_t rs1, uint32_t data,
 
 tcg_temp_free_ptr(dest);
 tcg_temp_free_ptr(mask);
-tcg_temp_free(base);
 gen_set_label(over);
 return true;
 }
@@ -330,12 +320,10 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
 
 dest = tcg_temp_new_ptr();
 mask = tcg_temp_new_ptr();
-base = tcg_temp_new();
-stride = tcg_temp_new();
+base = get_gpr(s, rs1, EXT_NONE);
+stride = get_gpr(s, rs2, EXT_NONE);
 desc = tcg_constant_i32(simd_desc(s->vlen / 8, s->vlen / 8, data));
 
-gen_get_gpr(s, base, rs1);
-gen_get_gpr(s, stride, rs2);
 tcg_gen_addi_ptr(dest, cpu_env, vreg_ofs(s, vd));
 tcg_gen_addi_ptr(mask, cpu_env, vreg_ofs(s, 0));
 
@@ -343,8 +331,6 @@ static bool ldst_stride_trans(uint32_t vd, uint32_t rs1, uint32_t rs2,
 
 tcg_temp_free_ptr(dest);
 tcg_temp_free_ptr(mask);
-tcg_temp_free(base);
-tcg_temp_free(stride);
 gen_set_label(over);
 return true;
 }
@@ -458,10 +444,9 @@ static bool 

[PATCH v5 22/24] target/riscv: Use {get,dest}_gpr for RVD

2021-08-23 Thread Richard Henderson
Reviewed-by: Bin Meng 
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvd.c.inc | 125 
 1 file changed, 60 insertions(+), 65 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvd.c.inc b/target/riscv/insn_trans/trans_rvd.c.inc
index 11b9b3f90b..db9ae15755 100644
--- a/target/riscv/insn_trans/trans_rvd.c.inc
+++ b/target/riscv/insn_trans/trans_rvd.c.inc
@@ -20,30 +20,40 @@
 
 static bool trans_fld(DisasContext *ctx, arg_fld *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEQ);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], addr, ctx->mem_idx, MO_TEQ);
 
 mark_fs_dirty(ctx);
-tcg_temp_free(t0);
 return true;
 }
 
 static bool trans_fsd(DisasContext *ctx, arg_fsd *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEQ);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEQ);
 
-tcg_temp_free(t0);
 return true;
 }
 
@@ -252,11 +262,10 @@ static bool trans_feq_d(DisasContext *ctx, arg_feq_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_feq_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_feq_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -265,11 +274,10 @@ static bool trans_flt_d(DisasContext *ctx, arg_flt_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_flt_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_flt_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -278,11 +286,10 @@ static bool trans_fle_d(DisasContext *ctx, arg_fle_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_fle_d(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_helper_fle_d(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -291,10 +298,10 @@ static bool trans_fclass_d(DisasContext *ctx, arg_fclass_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_helper_fclass_d(t0, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_fclass_d(dest, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -303,12 +310,11 @@ static bool trans_fcvt_w_d(DisasContext *ctx, arg_fcvt_w_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_w_d(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_w_d(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -317,12 +323,11 @@ static bool trans_fcvt_wu_d(DisasContext *ctx, arg_fcvt_wu_d *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_wu_d(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_wu_d(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -331,12 +336,10 @@ static bool trans_fcvt_d_w(DisasContext *ctx, arg_fcvt_d_w *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVD);
 
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
+TCGv src = get_gpr(ctx, a->rs1, EXT_SIGN);
 
 gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_d_w(cpu_fpr[a->rd], cpu_env, t0);
-tcg_temp_free(t0);
+gen_helper_fcvt_d_w(cpu_fpr[a->rd], cpu_env, src);
 
 mark_fs_dirty(ctx);
 return true;
@@ -347,12 +350,10 @@ static bool trans_fcvt_d_wu(DisasContext *ctx, arg_fcvt_d_wu *a)
 REQUIRE_FPU;
 

[PATCH v5 18/24] target/riscv: Reorg csr instructions

2021-08-23 Thread Richard Henderson
Introduce csrr and csrw helpers, for read-only and write-only insns.

Note that we do not properly implement this in riscv_csrrw, in that
we cannot distinguish a true read-only access (rs1 == 0) from a zero
write_mask that comes from another source register; the latter should
still raise an exception for read-only registers.

Only issue gen_io_start for CF_USE_ICOUNT.
Use ctx->zero for csrrc.
Use get_gpr and dest_gpr.

Reviewed-by: Bin Meng 
Signed-off-by: Richard Henderson 
---
 target/riscv/helper.h   |   6 +-
 target/riscv/op_helper.c|  18 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 172 +---
 3 files changed, 131 insertions(+), 65 deletions(-)

diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 415e37bc37..460eee9988 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -65,9 +65,9 @@ DEF_HELPER_FLAGS_2(gorc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(gorcw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
 /* Special functions */
-DEF_HELPER_3(csrrw, tl, env, tl, tl)
-DEF_HELPER_4(csrrs, tl, env, tl, tl, tl)
-DEF_HELPER_4(csrrc, tl, env, tl, tl, tl)
+DEF_HELPER_2(csrr, tl, env, int)
+DEF_HELPER_3(csrw, void, env, int, tl)
+DEF_HELPER_4(csrrw, tl, env, int, tl, tl)
 #ifndef CONFIG_USER_ONLY
 DEF_HELPER_2(sret, tl, env, tl)
 DEF_HELPER_2(mret, tl, env, tl)
diff --git a/target/riscv/op_helper.c b/target/riscv/op_helper.c
index 3c48e739ac..ee7c24efe7 100644
--- a/target/riscv/op_helper.c
+++ b/target/riscv/op_helper.c
@@ -37,11 +37,10 @@ void helper_raise_exception(CPURISCVState *env, uint32_t exception)
 riscv_raise_exception(env, exception, 0);
 }
 
-target_ulong helper_csrrw(CPURISCVState *env, target_ulong src,
-target_ulong csr)
+target_ulong helper_csrr(CPURISCVState *env, int csr)
 {
 target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, src, -1);
+RISCVException ret = riscv_csrrw(env, csr, &val, 0, 0);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
@@ -49,23 +48,20 @@ target_ulong helper_csrrw(CPURISCVState *env, target_ulong src,
 return val;
 }
 
-target_ulong helper_csrrs(CPURISCVState *env, target_ulong src,
-target_ulong csr, target_ulong rs1_pass)
+void helper_csrw(CPURISCVState *env, int csr, target_ulong src)
 {
-target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, -1, rs1_pass ? src : 0);
+RISCVException ret = riscv_csrrw(env, csr, NULL, src, -1);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
 }
-return val;
 }
 
-target_ulong helper_csrrc(CPURISCVState *env, target_ulong src,
-target_ulong csr, target_ulong rs1_pass)
+target_ulong helper_csrrw(CPURISCVState *env, int csr,
+  target_ulong src, target_ulong write_mask)
 {
 target_ulong val = 0;
-RISCVException ret = riscv_csrrw(env, csr, &val, 0, rs1_pass ? src : 0);
+RISCVException ret = riscv_csrrw(env, csr, &val, src, write_mask);
 
 if (ret != RISCV_EXCP_NONE) {
 riscv_raise_exception(env, ret, GETPC());
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index 76454fb7e2..920ae0edb3 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -426,80 +426,150 @@ static bool trans_fence_i(DisasContext *ctx, arg_fence_i *a)
 return true;
 }
 
-#define RISCV_OP_CSR_PRE do {\
-source1 = tcg_temp_new(); \
-csr_store = tcg_temp_new(); \
-dest = tcg_temp_new(); \
-rs1_pass = tcg_temp_new(); \
-gen_get_gpr(ctx, source1, a->rs1); \
-tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next); \
-tcg_gen_movi_tl(rs1_pass, a->rs1); \
-tcg_gen_movi_tl(csr_store, a->csr); \
-gen_io_start();\
-} while (0)
+static bool do_csr_post(DisasContext *ctx)
+{
+/* We may have changed important cpu state -- exit to main loop. */
+tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn);
+exit_tb(ctx);
+ctx->base.is_jmp = DISAS_NORETURN;
+return true;
+}
 
-#define RISCV_OP_CSR_POST do {\
-gen_set_gpr(ctx, a->rd, dest); \
-tcg_gen_movi_tl(cpu_pc, ctx->pc_succ_insn); \
-exit_tb(ctx); \
-ctx->base.is_jmp = DISAS_NORETURN; \
-tcg_temp_free(source1); \
-tcg_temp_free(csr_store); \
-tcg_temp_free(dest); \
-tcg_temp_free(rs1_pass); \
-} while (0)
+static bool do_csrr(DisasContext *ctx, int rd, int rc)
+{
+TCGv dest = dest_gpr(ctx, rd);
+TCGv_i32 csr = tcg_constant_i32(rc);
 
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrr(dest, cpu_env, csr);
+gen_set_gpr(ctx, rd, dest);
+return do_csr_post(ctx);
+}
+
+static bool do_csrw(DisasContext *ctx, int rc, TCGv src)
+{
+TCGv_i32 csr = tcg_constant_i32(rc);
+
+if (tb_cflags(ctx->base.tb) & CF_USE_ICOUNT) {
+gen_io_start();
+}
+gen_helper_csrw(cpu_env, csr, src);
+return do_csr_post(ctx);
+}
+
+static bool 

[PATCH v5 20/24] target/riscv: Use gen_shift_imm_fn for slli_uw

2021-08-23 Thread Richard Henderson
Always use tcg_gen_deposit_z_tl; the special case for
shamt >= 32 is handled there.

Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvb.c.inc | 19 ++-
 1 file changed, 6 insertions(+), 13 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index b97c3ca5da..b72e76255c 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -635,21 +635,14 @@ static bool trans_add_uw(DisasContext *ctx, arg_add_uw *a)
 return gen_arith(ctx, a, EXT_NONE, gen_add_uw);
 }
 
+static void gen_slli_uw(TCGv dest, TCGv src, target_long shamt)
+{
+tcg_gen_deposit_z_tl(dest, src, shamt, MIN(32, TARGET_LONG_BITS - shamt));
+}
+
 static bool trans_slli_uw(DisasContext *ctx, arg_slli_uw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-
-TCGv source1 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-
-if (a->shamt < 32) {
-tcg_gen_deposit_z_tl(source1, source1, a->shamt, 32);
-} else {
-tcg_gen_shli_tl(source1, source1, a->shamt);
-}
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-return true;
+return gen_shift_imm_fn(ctx, a, EXT_NONE, gen_slli_uw);
 }
-- 
2.25.1
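
To see why a single deposit covers both cases, here is a plain C model for a
64-bit target. The function names are hypothetical and deposit_z64 only mimics
the "take the low len bits, place them at ofs in a zeroed word" behaviour; it
is not the TCG implementation:

```c
#include <assert.h>
#include <stdint.h>

/* Place the low 'len' bits of src at bit offset 'ofs' in a zeroed word. */
static uint64_t deposit_z64(uint64_t src, unsigned ofs, unsigned len)
{
    uint64_t mask;

    assert(len >= 1 && ofs + len <= 64);
    mask = (len == 64) ? UINT64_MAX : ((UINT64_C(1) << len) - 1);
    return (src & mask) << ofs;
}

/* slli.uw as in the patch: len = MIN(32, 64 - shamt). */
static uint64_t slli_uw_model(uint64_t src, unsigned shamt)
{
    unsigned len = (64 - shamt) < 32 ? (64 - shamt) : 32;

    return deposit_z64(src, shamt, len);
}

/* Reference semantics: zero-extend the low 32 bits, then shift left. */
static uint64_t slli_uw_reference(uint64_t src, unsigned shamt)
{
    return (uint64_t)(uint32_t)src << shamt;
}
```

For shamt >= 32 the field shrinks to 64 - shamt bits, which is exactly the
part of the zero-extended word that survives the shift, so the two functions
agree for every shamt from 0 to 63.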




[PATCH v5 19/24] target/riscv: Use {get,dest}_gpr for RVA

2021-08-23 Thread Richard Henderson
Reviewed-by: Bin Meng 
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rva.c.inc | 47 ++---
 1 file changed, 19 insertions(+), 28 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rva.c.inc b/target/riscv/insn_trans/trans_rva.c.inc
index 3cc3c3b073..6ea07d89b0 100644
--- a/target/riscv/insn_trans/trans_rva.c.inc
+++ b/target/riscv/insn_trans/trans_rva.c.inc
@@ -18,11 +18,10 @@
  * this program.  If not, see .
  */
 
-static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
+static bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-/* Put addr in load_res, data in load_val.  */
-gen_get_gpr(ctx, src1, a->rs1);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
+
 if (a->rl) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
 }
@@ -30,33 +29,33 @@ static inline bool gen_lr(DisasContext *ctx, arg_atomic *a, MemOp mop)
 if (a->aq) {
 tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
 }
+
+/* Put addr in load_res, data in load_val.  */
 tcg_gen_mov_tl(load_res, src1);
 gen_set_gpr(ctx, a->rd, load_val);
 
-tcg_temp_free(src1);
 return true;
 }
 
-static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
+static bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-TCGv src2 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
+TCGv dest, src1, src2;
 TCGLabel *l1 = gen_new_label();
 TCGLabel *l2 = gen_new_label();
 
-gen_get_gpr(ctx, src1, a->rs1);
+src1 = get_gpr(ctx, a->rs1, EXT_ZERO);
 tcg_gen_brcond_tl(TCG_COND_NE, load_res, src1, l1);
 
-gen_get_gpr(ctx, src2, a->rs2);
 /*
  * Note that the TCG atomic primitives are SC,
  * so we can ignore AQ/RL along this path.
  */
-tcg_gen_atomic_cmpxchg_tl(src1, load_res, load_val, src2,
+dest = dest_gpr(ctx, a->rd);
+src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+tcg_gen_atomic_cmpxchg_tl(dest, load_res, load_val, src2,
   ctx->mem_idx, mop);
-tcg_gen_setcond_tl(TCG_COND_NE, dat, src1, load_val);
-gen_set_gpr(ctx, a->rd, dat);
+tcg_gen_setcond_tl(TCG_COND_NE, dest, dest, load_val);
+gen_set_gpr(ctx, a->rd, dest);
 tcg_gen_br(l2);
 
 gen_set_label(l1);
@@ -65,8 +64,7 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
  * provide the memory barrier implied by AQ/RL.
  */
 tcg_gen_mb(TCG_MO_ALL + a->aq * TCG_BAR_LDAQ + a->rl * TCG_BAR_STRL);
-tcg_gen_movi_tl(dat, 1);
-gen_set_gpr(ctx, a->rd, dat);
+gen_set_gpr(ctx, a->rd, tcg_constant_tl(1));
 
 gen_set_label(l2);
 /*
@@ -75,9 +73,6 @@ static inline bool gen_sc(DisasContext *ctx, arg_atomic *a, MemOp mop)
  */
 tcg_gen_movi_tl(load_res, -1);
 
-tcg_temp_free(dat);
-tcg_temp_free(src1);
-tcg_temp_free(src2);
 return true;
 }
 
@@ -85,17 +80,13 @@ static bool gen_amo(DisasContext *ctx, arg_atomic *a,
 void(*func)(TCGv, TCGv, TCGv, TCGArg, MemOp),
 MemOp mop)
 {
-TCGv src1 = tcg_temp_new();
-TCGv src2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
 
-gen_get_gpr(ctx, src1, a->rs1);
-gen_get_gpr(ctx, src2, a->rs2);
+func(dest, src1, src2, ctx->mem_idx, mop);
 
-(*func)(src2, src1, src2, ctx->mem_idx, mop);
-
-gen_set_gpr(ctx, a->rd, src2);
-tcg_temp_free(src1);
-tcg_temp_free(src2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-- 
2.25.1




[PATCH v5 17/24] target/riscv: Fix hgeie, hgeip

2021-08-23 Thread Richard Henderson
We failed to write into *val for these read functions;
replace them with read_zero.  Only warn about the unsupported
non-zero GEILEN when a non-zero value is actually written.

Signed-off-by: Richard Henderson 
---
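Illustration only, not part of the patch: a minimal sketch in plain C of
the failure mode described in the commit message. A read handler that
reports success without storing through its out-pointer leaves the caller
with whatever the destination held before. All names here (csr_read_fn,
read_broken, read_zero_model, csr_read, write_warns) are hypothetical
stand-ins and do not reproduce QEMU's actual CSR API.

```c
#include <stdint.h>

/* Hypothetical handler type: returns 0 on success, result goes to *val. */
typedef int (*csr_read_fn)(uint64_t *val);

/* Buggy shape: reports success but never stores through val. */
static inline int read_broken(uint64_t *val)
{
    (void)val;
    return 0;
}

/* Fixed shape: always stores a defined value before returning. */
static inline int read_zero_model(uint64_t *val)
{
    *val = 0;
    return 0;
}

/*
 * Hypothetical caller.  The destination is pre-seeded with "stale" so
 * that a handler which forgets to write becomes visible in the result.
 */
static inline uint64_t csr_read(csr_read_fn fn, uint64_t stale)
{
    uint64_t v = stale;
    fn(&v);
    return v;
}

/* Write side: returns 1 when an "unimplemented" warning would be logged. */
static inline int write_warns(uint64_t val)
{
    return val != 0;
}
```

With the broken handler the caller sees the stale value; with the
read-zero handler it always sees 0, and writing 0 no longer warns.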
 target/riscv/csr.c | 26 --
 1 file changed, 8 insertions(+), 18 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index d900f96dc1..905860dbb2 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -1124,17 +1124,12 @@ static RISCVException write_hcounteren(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
-static RISCVException read_hgeie(CPURISCVState *env, int csrno,
- target_ulong *val)
-{
-qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
-return RISCV_EXCP_NONE;
-}
-
 static RISCVException write_hgeie(CPURISCVState *env, int csrno,
   target_ulong val)
 {
-qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
+if (val) {
+qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
+}
 return RISCV_EXCP_NONE;
 }
 
@@ -1165,17 +1160,12 @@ static RISCVException write_htinst(CPURISCVState *env, int csrno,
 return RISCV_EXCP_NONE;
 }
 
-static RISCVException read_hgeip(CPURISCVState *env, int csrno,
- target_ulong *val)
-{
-qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
-return RISCV_EXCP_NONE;
-}
-
 static RISCVException write_hgeip(CPURISCVState *env, int csrno,
   target_ulong val)
 {
-qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
+if (val) {
+qemu_log_mask(LOG_UNIMP, "No support for a non-zero GEILEN.");
+}
 return RISCV_EXCP_NONE;
 }
 
@@ -1599,10 +1589,10 @@ riscv_csr_operations csr_ops[CSR_TABLE_SIZE] = {
 [CSR_HIP]         = { "hip",         hmode,   NULL,   NULL,     rmw_hip           },
 [CSR_HIE]         = { "hie",         hmode,   read_hie,         write_hie         },
 [CSR_HCOUNTEREN]  = { "hcounteren",  hmode,   read_hcounteren,  write_hcounteren  },
-[CSR_HGEIE]       = { "hgeie",       hmode,   read_hgeie,       write_hgeie       },
+[CSR_HGEIE]       = { "hgeie",       hmode,   read_zero,        write_hgeie       },
 [CSR_HTVAL]       = { "htval",       hmode,   read_htval,       write_htval       },
 [CSR_HTINST]      = { "htinst",      hmode,   read_htinst,      write_htinst      },
-[CSR_HGEIP]       = { "hgeip",       hmode,   read_hgeip,       write_hgeip       },
+[CSR_HGEIP]       = { "hgeip",       hmode,   read_zero,        write_hgeip       },
 [CSR_HGATP]       = { "hgatp",       hmode,   read_hgatp,       write_hgatp       },
 [CSR_HTIMEDELTA]  = { "htimedelta",  hmode,   read_htimedelta,  write_htimedelta  },
 [CSR_HTIMEDELTAH] = { "htimedeltah", hmode32, read_htimedeltah, write_htimedeltah },
-- 
2.25.1




[PATCH v5 13/24] target/riscv: Use extracts for sraiw and srliw

2021-08-23 Thread Richard Henderson
These operations can be done in one instruction on some hosts.

Signed-off-by: Richard Henderson 
---
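Illustration only, not part of the patch: a sketch in plain C of why a
32-bit immediate shift can be expressed as a single bit-field extract.
Shifting the low 32 bits right by shamt is the same as extracting the
32 - shamt bits that start at bit shamt. The *_model helpers below are
hypothetical stand-ins for the TCG extract/sextract operations, and
sext32 stands in for the sign-extending writeback implied by ctx->w; a
64-bit target and shamt in 0..31 are assumed.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits to 64 bits. */
static inline int64_t sext32(uint64_t v)
{
    return (int64_t)(int32_t)(uint32_t)v;
}

/* Unsigned extract of bits [pos, pos + len); needs 0 < len, pos + len <= 64. */
static inline uint64_t extract64_model(uint64_t v, unsigned pos, unsigned len)
{
    return (v >> pos) & (~0ULL >> (64 - len));
}

/* Signed extract: same field, sign-extended from its top bit. */
static inline int64_t sextract64_model(uint64_t v, unsigned pos, unsigned len)
{
    uint64_t field = extract64_model(v, pos, len);
    uint64_t sign = 1ULL << (len - 1);
    return (int64_t)((field ^ sign) - sign);
}

/* Portable arithmetic shift right for a signed 64-bit value. */
static inline int64_t asr64(int64_t v, unsigned n)
{
    return v < 0 ? ~(~v >> n) : v >> n;
}

/* Old shape: extend the input first, then shift. */
static inline int64_t srliw_shift(uint64_t x, unsigned sh)
{
    return sext32((x & 0xffffffffULL) >> sh);
}

static inline int64_t sraiw_shift(uint64_t x, unsigned sh)
{
    return sext32((uint64_t)asr64(sext32(x), sh));
}

/* New shape: one extract on the unextended input. */
static inline int64_t srliw_extract(uint64_t x, unsigned sh)
{
    return sext32(extract64_model(x, sh, 32 - sh));
}

static inline int64_t sraiw_extract(uint64_t x, unsigned sh)
{
    return sext32((uint64_t)sextract64_model(x, sh, 32 - sh));
}
```

Both shapes agree for every shift amount from 0 to 31, whatever the
upper 32 input bits hold, which is why the input extension can be
dropped.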
 target/riscv/insn_trans/trans_rvi.c.inc | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index e4726e618c..9e8d99be51 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -347,18 +347,28 @@ static bool trans_slliw(DisasContext *ctx, arg_slliw *a)
 return gen_shift_imm_fn(ctx, a, EXT_NONE, tcg_gen_shli_tl);
 }
 
+static void gen_srliw(TCGv dst, TCGv src, target_long shamt)
+{
+tcg_gen_extract_tl(dst, src, shamt, 32 - shamt);
+}
+
 static bool trans_srliw(DisasContext *ctx, arg_srliw *a)
 {
 REQUIRE_64BIT(ctx);
 ctx->w = true;
-return gen_shift_imm_fn(ctx, a, EXT_ZERO, tcg_gen_shri_tl);
+return gen_shift_imm_fn(ctx, a, EXT_NONE, gen_srliw);
+}
+
+static void gen_sraiw(TCGv dst, TCGv src, target_long shamt)
+{
+tcg_gen_sextract_tl(dst, src, shamt, 32 - shamt);
 }
 
 static bool trans_sraiw(DisasContext *ctx, arg_sraiw *a)
 {
 REQUIRE_64BIT(ctx);
 ctx->w = true;
-return gen_shift_imm_fn(ctx, a, EXT_SIGN, tcg_gen_sari_tl);
+return gen_shift_imm_fn(ctx, a, EXT_NONE, gen_sraiw);
 }
 
 static bool trans_addw(DisasContext *ctx, arg_addw *a)
-- 
2.25.1




[PATCH v5 23/24] target/riscv: Tidy trans_rvh.c.inc

2021-08-23 Thread Richard Henderson
Exit early if check_access fails.
Split out do_hlv, do_hsv, do_hlvx subroutines.
Use dest_gpr, get_gpr in the new subroutines.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
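Illustration only, not part of the patch: a small plain-C model of the
"exit early if check_access fails" shape. The check reports whether code
generation should continue, and the shared helper emits the memory access
only on success, while still returning true because the instruction was
decoded. The struct, enum, and function names are hypothetical and only
mirror the control flow, not QEMU's translator API.

```c
#include <stdbool.h>

enum excp_model {
    EXCP_NONE_M = 0,
    EXCP_ILLEGAL_INST_M,
    EXCP_VIRT_INST_FAULT_M
};

/* Hypothetical stand-in for the translation context. */
struct ctx_model {
    bool hlsx;               /* access permitted */
    bool virt_enabled;       /* selects which exception is raised */
    enum excp_model raised;  /* exception recorded by the check */
    int loads_emitted;       /* counts emitted memory operations */
};

/* Returns false after recording an exception, true otherwise. */
static inline bool check_access_model(struct ctx_model *ctx)
{
    if (!ctx->hlsx) {
        ctx->raised = ctx->virt_enabled ? EXCP_VIRT_INST_FAULT_M
                                        : EXCP_ILLEGAL_INST_M;
        return false;
    }
    return true;
}

/* Shared helper: emit the load only when the check passed. */
static inline bool do_hlv_model(struct ctx_model *ctx)
{
    if (check_access_model(ctx)) {
        ctx->loads_emitted++;
    }
    return true;
}
```

In this model a denied access records exactly one exception and emits no
load, where a void check followed by unconditional code generation would
emit both.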
 target/riscv/insn32.decode  |   1 +
 target/riscv/insn_trans/trans_rvh.c.inc | 266 +---
 2 files changed, 57 insertions(+), 210 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f09f8d5faf..2cd921d51c 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -42,6 +42,7 @@
 &j    imm rd
 &r    rd rs1 rs2
 &r2   rd rs1
+&r2_s rs1 rs2
 &s    imm rs1 rs2
 &u    imm rd
 &shift     shamt rs1 rd
diff --git a/target/riscv/insn_trans/trans_rvh.c.inc b/target/riscv/insn_trans/trans_rvh.c.inc
index 585eb1d87e..ecbf77ff9c 100644
--- a/target/riscv/insn_trans/trans_rvh.c.inc
+++ b/target/riscv/insn_trans/trans_rvh.c.inc
@@ -17,281 +17,139 @@
  */
 
 #ifndef CONFIG_USER_ONLY
-static void check_access(DisasContext *ctx) {
+static bool check_access(DisasContext *ctx)
+{
 if (!ctx->hlsx) {
 if (ctx->virt_enabled) {
 generate_exception(ctx, RISCV_EXCP_VIRT_INSTRUCTION_FAULT);
 } else {
 generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
 }
+return false;
 }
+return true;
 }
 #endif
 
+static bool do_hlv(DisasContext *ctx, arg_r2 *a, MemOp mop)
+{
+#ifdef CONFIG_USER_ONLY
+return false;
+#else
+if (check_access(ctx)) {
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+tcg_gen_qemu_ld_tl(dest, addr, mem_idx, mop);
+gen_set_gpr(ctx, a->rd, dest);
+}
+return true;
+#endif
+}
+
 static bool trans_hlv_b(DisasContext *ctx, arg_hlv_b *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_SB);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_SB);
 }
 
 static bool trans_hlv_h(DisasContext *ctx, arg_hlv_h *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TESW);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_TESW);
 }
 
 static bool trans_hlv_w(DisasContext *ctx, arg_hlv_w *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TESL);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_TESL);
 }
 
 static bool trans_hlv_bu(DisasContext *ctx, arg_hlv_bu *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_UB);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
-return false;
-#endif
+return do_hlv(ctx, a, MO_UB);
 }
 
 static bool trans_hlv_hu(DisasContext *ctx, arg_hlv_hu *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
+return do_hlv(ctx, a, MO_TEUW);
+}
 
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK, MO_TEUW);
-gen_set_gpr(ctx, a->rd, t1);
-
-tcg_temp_free(t0);
-tcg_temp_free(t1);
-return true;
-#else
+static bool do_hsv(DisasContext *ctx, arg_r2_s *a, MemOp mop)
+{
+#ifdef CONFIG_USER_ONLY
 return false;
+#else
+if (check_access(ctx)) {
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
+int mem_idx = ctx->mem_idx | TB_FLAGS_PRIV_HYP_ACCESS_MASK;
+tcg_gen_qemu_st_tl(data, addr, mem_idx, mop);
+}
+return true;
 #endif
 }
 
 static bool trans_hsv_b(DisasContext *ctx, arg_hsv_b *a)
 {
 REQUIRE_EXT(ctx, RVH);
-#ifndef CONFIG_USER_ONLY
-TCGv t0 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
-
-check_access(ctx);
-
-gen_get_gpr(ctx, t0, a->rs1);
-gen_get_gpr(ctx, dat, a->rs2);
-
-tcg_gen_qemu_st_tl(dat, 

[PATCH v5 09/24] target/riscv: Move gen_* helpers for RVM

2021-08-23 Thread Richard Henderson
Move these helpers near their use by the trans_*
functions within insn_trans/trans_rvm.c.inc.

Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
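Illustration only, not part of the patch: the comments in the moved
gen_div and gen_rem helpers describe a select-then-divide scheme that
avoids host traps. A plain-C sketch of that scheme is given here,
assuming a 64-bit target; div_model and rem_model are hypothetical names,
and the conditional expressions stand in for the movcond operations.

```c
#include <stdint.h>

/*
 * Signed division with the special cases folded into the operands:
 *   overflow (min / -1) -> divide min by 1, giving min
 *   divide by zero      -> divide -1 by 1, giving -1
 */
static inline int64_t div_model(int64_t s1, int64_t s2)
{
    int overflow = (s1 == INT64_MIN) && (s2 == -1);
    int64_t t2 = overflow ? 1 : s2;

    int64_t t1 = (s2 == 0) ? -1 : s1;
    t2 = (s2 == 0) ? 1 : t2;

    return t1 / t2;
}

/*
 * Signed remainder:
 *   overflow       -> take 0 % -1, giving 0
 *   divide by zero -> compute with divisor 1, then select the dividend
 */
static inline int64_t rem_model(int64_t s1, int64_t s2)
{
    int overflow = (s1 == INT64_MIN) && (s2 == -1);
    int64_t t1 = overflow ? 0 : s1;
    int64_t t2 = (s2 == 0) ? 1 : s2;
    int64_t r = t1 % t2;

    return (s2 == 0) ? s1 : r;
}
```

Neither function ever executes a host division that could trap, yet both
return the results the comments call for in the overflow and
divide-by-zero cases.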
 target/riscv/translate.c| 127 
 target/riscv/insn_trans/trans_rvm.c.inc | 127 
 2 files changed, 127 insertions(+), 127 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 1855eacbac..7fbacfa6ee 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -249,133 +249,6 @@ static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
 }
 }
 
-static void gen_mulhsu(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv rl = tcg_temp_new();
-TCGv rh = tcg_temp_new();
-
-tcg_gen_mulu2_tl(rl, rh, arg1, arg2);
-/* fix up for one negative */
-tcg_gen_sari_tl(rl, arg1, TARGET_LONG_BITS - 1);
-tcg_gen_and_tl(rl, rl, arg2);
-tcg_gen_sub_tl(ret, rh, rl);
-
-tcg_temp_free(rl);
-tcg_temp_free(rh);
-}
-
-static void gen_div(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, mone, min;
-
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-mone = tcg_constant_tl(-1);
-min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
-
-/*
- * If overflow, set temp2 to 1, else source2.
- * This produces the required result of min.
- */
-tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
-tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
-tcg_gen_and_tl(temp1, temp1, temp2);
-tcg_gen_movcond_tl(TCG_COND_NE, temp2, temp1, zero, one, source2);
-
-/*
- * If div by zero, set temp1 to -1 and temp2 to 1 to
- * produce the required result of -1.
- */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp1, source2, zero, mone, source1);
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, one, temp2);
-
-tcg_gen_div_tl(ret, temp1, temp2);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, max;
-
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-max = tcg_constant_tl(~0);
-
-/*
- * If div by zero, set temp1 to max and temp2 to 1 to
- * produce the required result of max.
- */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp1, source2, zero, max, source1);
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, one, source2);
-tcg_gen_divu_tl(ret, temp1, temp2);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp1, temp2, zero, one, mone, min;
-
-temp1 = tcg_temp_new();
-temp2 = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-mone = tcg_constant_tl(-1);
-min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
-
-/*
- * If overflow, set temp1 to 0, else source1.
- * This avoids a possible host trap, and produces the required result of 0.
- */
-tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
-tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
-tcg_gen_and_tl(temp1, temp1, temp2);
-tcg_gen_movcond_tl(TCG_COND_NE, temp1, temp1, zero, zero, source1);
-
-/*
- * If div by zero, set temp2 to 1, else source2.
- * This avoids a possible host trap, but produces an incorrect result.
- */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, one, source2);
-
-tcg_gen_rem_tl(temp1, temp1, temp2);
-
-/* If div by zero, the required result is the original dividend. */
-tcg_gen_movcond_tl(TCG_COND_EQ, ret, source2, zero, source1, temp1);
-
-tcg_temp_free(temp1);
-tcg_temp_free(temp2);
-}
-
-static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
-{
-TCGv temp, zero, one;
-
-temp = tcg_temp_new();
-zero = tcg_constant_tl(0);
-one = tcg_constant_tl(1);
-
-/*
- * If div by zero, set temp to 1, else source2.
- * This avoids a possible host trap, but produces an incorrect result.
- */
-tcg_gen_movcond_tl(TCG_COND_EQ, temp, source2, zero, one, source2);
-
-tcg_gen_remu_tl(temp, source1, temp);
-
-/* If div by zero, the required result is the original dividend. */
-tcg_gen_movcond_tl(TCG_COND_EQ, ret, source2, zero, source1, temp);
-
-tcg_temp_free(temp);
-}
-
 static void gen_jal(DisasContext *ctx, int rd, target_ulong imm)
 {
 target_ulong next_pc;
diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 80552be7a3..b89a85ad3a 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -39,6 +39,21 @@ static bool trans_mulh(DisasContext *ctx, arg_mulh *a)
 return gen_arith(ctx, a, EXT_NONE, 

[PATCH v5 21/24] target/riscv: Use {get,dest}_gpr for RVF

2021-08-23 Thread Richard Henderson
Reviewed-by: Bin Meng 
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvf.c.inc | 146 
 1 file changed, 70 insertions(+), 76 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvf.c.inc b/target/riscv/insn_trans/trans_rvf.c.inc
index fb9f7f9c00..bddbd418d9 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -25,32 +25,43 @@
 
 static bool trans_flw(DisasContext *ctx, arg_flw *a)
 {
+TCGv_i64 dest;
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
 
-tcg_gen_qemu_ld_i64(cpu_fpr[a->rd], t0, ctx->mem_idx, MO_TEUL);
-gen_nanbox_s(cpu_fpr[a->rd], cpu_fpr[a->rd]);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+dest = cpu_fpr[a->rd];
+tcg_gen_qemu_ld_i64(dest, addr, ctx->mem_idx, MO_TEUL);
+gen_nanbox_s(dest, dest);
 
-tcg_temp_free(t0);
 mark_fs_dirty(ctx);
 return true;
 }
 
 static bool trans_fsw(DisasContext *ctx, arg_fsw *a)
 {
+TCGv addr;
+
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
 
-tcg_gen_addi_tl(t0, t0, a->imm);
+addr = get_gpr(ctx, a->rs1, EXT_NONE);
+if (a->imm) {
+TCGv temp = tcg_temp_new();
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
 
-tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], t0, ctx->mem_idx, MO_TEUL);
+tcg_gen_qemu_st_i64(cpu_fpr[a->rs2], addr, ctx->mem_idx, MO_TEUL);
 
-tcg_temp_free(t0);
 return true;
 }
 
@@ -271,12 +282,11 @@ static bool trans_fcvt_w_s(DisasContext *ctx, arg_fcvt_w_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_w_s(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_w_s(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -285,12 +295,11 @@ static bool trans_fcvt_wu_s(DisasContext *ctx, arg_fcvt_wu_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-gen_set_rm(ctx, a->rm);
-gen_helper_fcvt_wu_s(t0, cpu_env, cpu_fpr[a->rs1]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = dest_gpr(ctx, a->rd);
 
+gen_set_rm(ctx, a->rm);
+gen_helper_fcvt_wu_s(dest, cpu_env, cpu_fpr[a->rs1]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -300,17 +309,15 @@ static bool trans_fmv_x_w(DisasContext *ctx, arg_fmv_x_w *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
 
 #if defined(TARGET_RISCV64)
-tcg_gen_ext32s_tl(t0, cpu_fpr[a->rs1]);
+tcg_gen_ext32s_tl(dest, cpu_fpr[a->rs1]);
 #else
-tcg_gen_extrl_i64_i32(t0, cpu_fpr[a->rs1]);
+tcg_gen_extrl_i64_i32(dest, cpu_fpr[a->rs1]);
 #endif
 
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
-
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -318,10 +325,11 @@ static bool trans_feq_s(DisasContext *ctx, arg_feq_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_feq_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_feq_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -329,10 +337,11 @@ static bool trans_flt_s(DisasContext *ctx, arg_flt_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_flt_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_flt_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -340,10 +349,11 @@ static bool trans_fle_s(DisasContext *ctx, arg_fle_s *a)
 {
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
-TCGv t0 = tcg_temp_new();
-gen_helper_fle_s(t0, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+
+TCGv dest = dest_gpr(ctx, a->rd);
+
+gen_helper_fle_s(dest, cpu_env, cpu_fpr[a->rs1], cpu_fpr[a->rs2]);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -352,13 +362,10 @@ static bool trans_fclass_s(DisasContext *ctx, arg_fclass_s *a)
 REQUIRE_FPU;
 REQUIRE_EXT(ctx, RVF);
 
-TCGv t0 = tcg_temp_new();
-
-gen_helper_fclass_s(t0, cpu_fpr[a->rs1]);
-
-gen_set_gpr(ctx, a->rd, t0);
-tcg_temp_free(t0);
+TCGv dest = 

[PATCH v5 14/24] target/riscv: Use get_gpr in branches

2021-08-23 Thread Richard Henderson
Narrow the scope of t0 in trans_jalr.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 25 ++---
 1 file changed, 10 insertions(+), 15 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index 9e8d99be51..a5249e71c2 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -54,24 +54,25 @@ static bool trans_jal(DisasContext *ctx, arg_jal *a)
 
 static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 {
-/* no chaining with JALR */
 TCGLabel *misaligned = NULL;
-TCGv t0 = tcg_temp_new();
 
-
-gen_get_gpr(ctx, cpu_pc, a->rs1);
-tcg_gen_addi_tl(cpu_pc, cpu_pc, a->imm);
+tcg_gen_addi_tl(cpu_pc, get_gpr(ctx, a->rs1, EXT_NONE), a->imm);
 tcg_gen_andi_tl(cpu_pc, cpu_pc, (target_ulong)-2);
 
 if (!has_ext(ctx, RVC)) {
+TCGv t0 = tcg_temp_new();
+
 misaligned = gen_new_label();
 tcg_gen_andi_tl(t0, cpu_pc, 0x2);
 tcg_gen_brcondi_tl(TCG_COND_NE, t0, 0x0, misaligned);
+tcg_temp_free(t0);
 }
 
 if (a->rd != 0) {
 tcg_gen_movi_tl(cpu_gpr[a->rd], ctx->pc_succ_insn);
 }
+
+/* No chaining with JALR. */
 lookup_and_goto_ptr(ctx);
 
 if (misaligned) {
@@ -80,21 +81,18 @@ static bool trans_jalr(DisasContext *ctx, arg_jalr *a)
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
-tcg_temp_free(t0);
 return true;
 }
 
 static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)
 {
 TCGLabel *l = gen_new_label();
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+TCGv src1 = get_gpr(ctx, a->rs1, EXT_SIGN);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_SIGN);
 
-tcg_gen_brcond_tl(cond, source1, source2, l);
+tcg_gen_brcond_tl(cond, src1, src2, l);
 gen_goto_tb(ctx, 1, ctx->pc_succ_insn);
+
 gen_set_label(l); /* branch taken */
 
 if (!has_ext(ctx, RVC) && ((ctx->base.pc_next + a->imm) & 0x3)) {
@@ -105,9 +103,6 @@ static bool gen_branch(DisasContext *ctx, arg_b *a, TCGCond cond)
 }
 ctx->base.is_jmp = DISAS_NORETURN;
 
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-
 return true;
 }
 
-- 
2.25.1




[PATCH v5 10/24] target/riscv: Move gen_* helpers for RVB

2021-08-23 Thread Richard Henderson
Move these helpers near their use by the trans_*
functions within insn_trans/trans_rvb.c.inc.

Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 233 ---
 target/riscv/insn_trans/trans_rvb.c.inc | 234 
 2 files changed, 234 insertions(+), 233 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 7fbacfa6ee..09853530c4 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -380,229 +380,6 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
 return true;
 }
 
-static void gen_pack(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_deposit_tl(ret, arg1, arg2,
-   TARGET_LONG_BITS / 2,
-   TARGET_LONG_BITS / 2);
-}
-
-static void gen_packu(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_shri_tl(t, arg1, TARGET_LONG_BITS / 2);
-tcg_gen_deposit_tl(ret, arg2, t, 0, TARGET_LONG_BITS / 2);
-tcg_temp_free(t);
-}
-
-static void gen_packh(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_ext8u_tl(t, arg2);
-tcg_gen_deposit_tl(ret, arg1, t, 8, TARGET_LONG_BITS - 8);
-tcg_temp_free(t);
-}
-
-static void gen_sbop_mask(TCGv ret, TCGv shamt)
-{
-tcg_gen_movi_tl(ret, 1);
-tcg_gen_shl_tl(ret, ret, shamt);
-}
-
-static void gen_bset(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_or_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_bclr(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_andc_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_binv(TCGv ret, TCGv arg1, TCGv shamt)
-{
-TCGv t = tcg_temp_new();
-
-gen_sbop_mask(t, shamt);
-tcg_gen_xor_tl(ret, arg1, t);
-
-tcg_temp_free(t);
-}
-
-static void gen_bext(TCGv ret, TCGv arg1, TCGv shamt)
-{
-tcg_gen_shr_tl(ret, arg1, shamt);
-tcg_gen_andi_tl(ret, ret, 1);
-}
-
-static void gen_slo(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shl_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
-static void gen_sro(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shr_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
-static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
-{
-TCGv source1 = tcg_temp_new();
-TCGv source2;
-
-gen_get_gpr(ctx, source1, a->rs1);
-
-if (a->shamt == (TARGET_LONG_BITS - 8)) {
-/* rev8, byte swaps */
-tcg_gen_bswap_tl(source1, source1);
-} else {
-source2 = tcg_temp_new();
-tcg_gen_movi_tl(source2, a->shamt);
-gen_helper_grev(source1, source1, source2);
-tcg_temp_free(source2);
-}
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-return true;
-}
-
-#define GEN_SHADD(SHAMT)   \
-static void gen_sh##SHAMT##add(TCGv ret, TCGv arg1, TCGv arg2) \
-{  \
-TCGv t = tcg_temp_new();   \
-   \
-tcg_gen_shli_tl(t, arg1, SHAMT);   \
-tcg_gen_add_tl(ret, t, arg2);  \
-   \
-tcg_temp_free(t);  \
-}
-
-GEN_SHADD(1)
-GEN_SHADD(2)
-GEN_SHADD(3)
-
-static void gen_ctzw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ori_tl(ret, arg1, (target_ulong)MAKE_64BIT_MASK(32, 32));
-tcg_gen_ctzi_tl(ret, ret, 64);
-}
-
-static void gen_clzw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(ret, arg1);
-tcg_gen_clzi_tl(ret, ret, 64);
-tcg_gen_subi_tl(ret, ret, 32);
-}
-
-static void gen_cpopw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(arg1, arg1);
-tcg_gen_ctpop_tl(ret, arg1);
-}
-
-static void gen_packw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_ext16s_tl(t, arg2);
-tcg_gen_deposit_tl(ret, arg1, t, 16, 48);
-tcg_temp_free(t);
-}
-
-static void gen_packuw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = tcg_temp_new();
-tcg_gen_shri_tl(t, arg1, 16);
-tcg_gen_deposit_tl(ret, arg2, t, 0, 16);
-tcg_gen_ext32s_tl(ret, ret);
-tcg_temp_free(t);
-}
-
-static void gen_rorw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv_i32 t1 = tcg_temp_new_i32();
-TCGv_i32 t2 = tcg_temp_new_i32();
-
-/* truncate to 32-bits */
-tcg_gen_trunc_tl_i32(t1, arg1);
-tcg_gen_trunc_tl_i32(t2, arg2);
-
-tcg_gen_rotr_i32(t1, t1, t2);
-
-/* sign-extend 64-bits */
-tcg_gen_ext_i32_tl(ret, t1);
-
-tcg_temp_free_i32(t1);
-tcg_temp_free_i32(t2);
-}
-
-static void gen_rolw(TCGv ret, 

[PATCH v5 11/24] target/riscv: Add DisasExtend to gen_unary

2021-08-23 Thread Richard Henderson
Use ctx->w for cpopw, which is the only one that can
re-use the generic algorithm for the narrow operation.

Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
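Illustration only, not part of the patch: a plain-C sketch of why the
population count is the one narrow unary operation that can reuse the
full-width algorithm on a zero-extended input, while count-leading-zeros
and count-trailing-zeros each need a fixup. The helper names are
hypothetical; the fixups mirror the gen_clzw and gen_ctzw helpers shown
in this series, assuming a 64-bit target.

```c
#include <stdint.h>

/* Full-width reference operations; clz and ctz return 64 for zero input. */
static inline unsigned popcount64_model(uint64_t v)
{
    unsigned n = 0;
    while (v) {
        n += (unsigned)(v & 1);
        v >>= 1;
    }
    return n;
}

static inline unsigned clz64_model(uint64_t v)
{
    unsigned n = 0;
    if (v == 0) {
        return 64;
    }
    while (!(v >> 63)) {
        v <<= 1;
        n++;
    }
    return n;
}

static inline unsigned ctz64_model(uint64_t v)
{
    unsigned n = 0;
    if (v == 0) {
        return 64;
    }
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/* cpopw: zero-extend, then the generic operation is already correct. */
static inline unsigned cpopw_model(uint64_t x)
{
    return popcount64_model(x & 0xffffffffULL);
}

/* clzw: zero-extend, count over 64 bits, then subtract the 32 extra zeros. */
static inline unsigned clzw_model(uint64_t x)
{
    return clz64_model(x & 0xffffffffULL) - 32;
}

/* ctzw: force the upper 32 bits to ones so a zero low half yields 32. */
static inline unsigned ctzw_model(uint64_t x)
{
    return ctz64_model(x | 0xffffffff00000000ULL);
}
```

Zero-extension adds only zero bits, which a population count ignores but
a leading-zero count includes, hence the subtraction in clzw_model.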
 target/riscv/translate.c| 14 ++
 target/riscv/insn_trans/trans_rvb.c.inc | 24 +---
 2 files changed, 15 insertions(+), 23 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 09853530c4..785e9e58cc 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -478,17 +478,15 @@ static bool gen_shiftiw(DisasContext *ctx, arg_shift *a,
 return true;
 }
 
-static bool gen_unary(DisasContext *ctx, arg_r2 *a,
-  void(*func)(TCGv, TCGv))
+static bool gen_unary(DisasContext *ctx, arg_r2 *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv))
 {
-TCGv source = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source, a->rs1);
+func(dest, src1);
 
-(*func)(source, source);
-
-gen_set_gpr(ctx, a->rd, source);
-tcg_temp_free(source);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 73f088be23..e255678fff 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -26,7 +26,7 @@ static void gen_clz(TCGv ret, TCGv arg1)
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_clz);
+return gen_unary(ctx, a, EXT_ZERO, gen_clz);
 }
 
 static void gen_ctz(TCGv ret, TCGv arg1)
@@ -37,13 +37,13 @@ static void gen_ctz(TCGv ret, TCGv arg1)
 static bool trans_ctz(DisasContext *ctx, arg_ctz *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_ctz);
+return gen_unary(ctx, a, EXT_ZERO, gen_ctz);
 }
 
 static bool trans_cpop(DisasContext *ctx, arg_cpop *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ctpop_tl);
+return gen_unary(ctx, a, EXT_ZERO, tcg_gen_ctpop_tl);
 }
 
 static bool trans_andn(DisasContext *ctx, arg_andn *a)
@@ -132,13 +132,13 @@ static bool trans_maxu(DisasContext *ctx, arg_maxu *a)
 static bool trans_sext_b(DisasContext *ctx, arg_sext_b *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ext8s_tl);
+return gen_unary(ctx, a, EXT_NONE, tcg_gen_ext8s_tl);
 }
 
 static bool trans_sext_h(DisasContext *ctx, arg_sext_h *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, tcg_gen_ext16s_tl);
+return gen_unary(ctx, a, EXT_NONE, tcg_gen_ext16s_tl);
 }
 
 static void gen_sbop_mask(TCGv ret, TCGv shamt)
@@ -366,7 +366,6 @@ GEN_TRANS_SHADD(3)
 
 static void gen_clzw(TCGv ret, TCGv arg1)
 {
-tcg_gen_ext32u_tl(ret, arg1);
 tcg_gen_clzi_tl(ret, ret, 64);
 tcg_gen_subi_tl(ret, ret, 32);
 }
@@ -375,7 +374,7 @@ static bool trans_clzw(DisasContext *ctx, arg_clzw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_clzw);
+return gen_unary(ctx, a, EXT_ZERO, gen_clzw);
 }
 
 static void gen_ctzw(TCGv ret, TCGv arg1)
@@ -388,20 +387,15 @@ static bool trans_ctzw(DisasContext *ctx, arg_ctzw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_ctzw);
-}
-
-static void gen_cpopw(TCGv ret, TCGv arg1)
-{
-tcg_gen_ext32u_tl(arg1, arg1);
-tcg_gen_ctpop_tl(ret, arg1);
+return gen_unary(ctx, a, EXT_NONE, gen_ctzw);
 }
 
 static bool trans_cpopw(DisasContext *ctx, arg_cpopw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVB);
-return gen_unary(ctx, a, gen_cpopw);
+ctx->w = true;
+return gen_unary(ctx, a, EXT_ZERO, tcg_gen_ctpop_tl);
 }
 
 static void gen_packw(TCGv ret, TCGv arg1, TCGv arg2)
-- 
2.25.1




[PATCH v5 16/24] target/riscv: Fix rmw_sip, rmw_vsip, rmw_hsip vs write-only operation

2021-08-23 Thread Richard Henderson
We distinguish write-only by passing ret_value as NULL.

Signed-off-by: Richard Henderson 
---
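Illustration only, not part of the patch: a plain-C sketch of a
read-modify-write helper whose result pointer is optional. A wrapper that
post-processes the result must guard the dereference, because a
write-only caller passes NULL. The names rmw_core and rmw_filtered and
the explicit reg parameter are hypothetical and only model the pattern,
not QEMU's rmw_* signatures.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Core read-modify-write on *reg.  The old value is reported through
 * ret_value only when the caller asked for it.
 */
static inline int rmw_core(uint64_t *reg, uint64_t *ret_value,
                           uint64_t new_value, uint64_t write_mask)
{
    uint64_t old = *reg;

    *reg = (old & ~write_mask) | (new_value & write_mask);
    if (ret_value) {
        *ret_value = old;
    }
    return 0;
}

/*
 * Wrapper that exposes only the bits in visible_mask.  The NULL check
 * is what makes the write-only case safe.
 */
static inline int rmw_filtered(uint64_t *reg, uint64_t *ret_value,
                               uint64_t new_value, uint64_t write_mask,
                               uint64_t visible_mask)
{
    int ret = rmw_core(reg, ret_value, new_value,
                       write_mask & visible_mask);

    if (ret_value) {
        *ret_value &= visible_mask;
    }
    return ret;
}
```

A read passes a valid pointer with a zero write mask; a write-only
access passes NULL and the wrapper skips the masking step.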
 target/riscv/csr.c | 23 +++
 1 file changed, 15 insertions(+), 8 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 9a4ed18ac5..d900f96dc1 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -937,9 +937,12 @@ static RISCVException rmw_vsip(CPURISCVState *env, int csrno,
 /* Shift the S bits to their VS bit location in mip */
 int ret = rmw_mip(env, 0, ret_value, new_value << 1,
   (write_mask << 1) & vsip_writable_mask & env->hideleg);
-*ret_value &= VS_MODE_INTERRUPTS;
-/* Shift the VS bits to their S bit location in vsip */
-*ret_value >>= 1;
+
+if (ret_value) {
+*ret_value &= VS_MODE_INTERRUPTS;
+/* Shift the VS bits to their S bit location in vsip */
+*ret_value >>= 1;
+}
 return ret;
 }
 
@@ -956,7 +959,9 @@ static RISCVException rmw_sip(CPURISCVState *env, int csrno,
   write_mask & env->mideleg & sip_writable_mask);
 }
 
-*ret_value &= env->mideleg;
+if (ret_value) {
+*ret_value &= env->mideleg;
+}
 return ret;
 }
 
@@ -1072,8 +1077,9 @@ static RISCVException rmw_hvip(CPURISCVState *env, int csrno,
 int ret = rmw_mip(env, 0, ret_value, new_value,
   write_mask & hvip_writable_mask);
 
-*ret_value &= hvip_writable_mask;
-
+if (ret_value) {
+*ret_value &= hvip_writable_mask;
+}
 return ret;
 }
 
@@ -1084,8 +1090,9 @@ static RISCVException rmw_hip(CPURISCVState *env, int csrno,
 int ret = rmw_mip(env, 0, ret_value, new_value,
   write_mask & hip_writable_mask);
 
-*ret_value &= hip_writable_mask;
-
+if (ret_value) {
+*ret_value &= hip_writable_mask;
+}
 return ret;
 }
 
-- 
2.25.1




[PATCH v5 06/24] target/riscv: Add DisasExtend to gen_arith*

2021-08-23 Thread Richard Henderson
Most arithmetic does not require extending the inputs.
Exceptions include division, comparison and minmax.

Begin using ctx->w, which allows elimination of gen_addw,
gen_subw, gen_mulw.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
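Illustration only, not part of the patch: a plain-C sketch of why add,
subtract and multiply need no input extension when only the low 32 bits
of the result are kept, while division does. The low 32 result bits of
the first three depend only on the low 32 bits of each input. All
function names are hypothetical; sext32 stands in for the sign-extending
writeback selected by ctx->w, and the divisor is assumed to be non-zero.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits to 64 bits. */
static inline int64_t sext32(uint64_t v)
{
    return (int64_t)(int32_t)(uint32_t)v;
}

/* Full-width operation on unextended inputs, narrowed at writeback. */
static inline int64_t addw_model(uint64_t a, uint64_t b)
{
    return sext32(a + b);
}

static inline int64_t subw_model(uint64_t a, uint64_t b)
{
    return sext32(a - b);
}

static inline int64_t mulw_model(uint64_t a, uint64_t b)
{
    return sext32(a * b);
}

/* Division on unextended inputs: upper input bits leak into the result. */
static inline uint32_t divuw_low32_no_ext(uint64_t a, uint64_t b)
{
    return (uint32_t)(a / b);
}

/* Division on zero-extended inputs: the intended 32-bit quotient. */
static inline uint32_t divuw_zext(uint64_t a, uint64_t b)
{
    return (uint32_t)a / (uint32_t)b;
}
```

Garbage in the upper input halves leaves the add, subtract and multiply
results unchanged, whereas the two division variants disagree as soon as
an upper input bit is set.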
 target/riscv/translate.c| 69 +++--
 target/riscv/insn_trans/trans_rvb.c.inc | 30 +--
 target/riscv/insn_trans/trans_rvi.c.inc | 39 --
 target/riscv/insn_trans/trans_rvm.c.inc | 16 +++---
 4 files changed, 64 insertions(+), 90 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index d7552dc377..7dd2839288 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -230,7 +230,7 @@ static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
 tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
 }
 
-static TCGv __attribute__((unused)) dest_gpr(DisasContext *ctx, int reg_num)
+static TCGv dest_gpr(DisasContext *ctx, int reg_num)
 {
 if (reg_num == 0 || ctx->w) {
 return temp_new(ctx);
@@ -482,57 +482,31 @@ static int ex_rvc_shifti(DisasContext *ctx, int imm)
 /* Include the auto-generated decoder for 32 bit insn */
 #include "decode-insn32.c.inc"
 
-static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a,
+static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a, DisasExtend ext,
  void (*func)(TCGv, TCGv, target_long))
 {
-TCGv source1;
-source1 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
+func(dest, src1, a->imm);
 
-(*func)(source1, source1, a->imm);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a,
+static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = tcg_constant_tl(a->imm);
 
-gen_get_gpr(ctx, source1, a->rs1);
-tcg_gen_movi_tl(source2, a->imm);
+func(dest, src1, src2);
 
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static void gen_addw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_add_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
-static void gen_subw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_sub_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
-static void gen_mulw(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_mul_tl(ret, arg1, arg2);
-tcg_gen_ext32s_tl(ret, ret);
-}
-
 static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
 void(*func)(TCGv, TCGv, TCGv))
 {
@@ -798,21 +772,16 @@ static void gen_add_uw(TCGv ret, TCGv arg1, TCGv arg2)
 tcg_gen_add_tl(ret, arg1, arg2);
 }
 
-static bool gen_arith(DisasContext *ctx, arg_r *a,
-  void(*func)(TCGv, TCGv, TCGv))
+static bool gen_arith(DisasContext *ctx, arg_r *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = get_gpr(ctx, a->rs2, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+func(dest, src1, src2);
 
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 260e15b47d..217a7d1f26 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -38,61 +38,61 @@ static bool trans_cpop(DisasContext *ctx, arg_cpop *a)
 static bool trans_andn(DisasContext *ctx, arg_andn *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_andc_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_andc_tl);
 }
 
 static bool trans_orn(DisasContext *ctx, arg_orn *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_orc_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_orc_tl);
 }
 
 static bool trans_xnor(DisasContext *ctx, arg_xnor *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, tcg_gen_eqv_tl);
+return gen_arith(ctx, a, EXT_NONE, tcg_gen_eqv_tl);
 }
 
 static bool trans_pack(DisasContext *ctx, arg_pack *a)
 {
 REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_pack);
+return 

[PATCH v5 15/24] target/riscv: Use {get, dest}_gpr for integer load/store

2021-08-23 Thread Richard Henderson
Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/insn_trans/trans_rvi.c.inc | 36 +
 1 file changed, 19 insertions(+), 17 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvi.c.inc b/target/riscv/insn_trans/trans_rvi.c.inc
index a5249e71c2..76454fb7e2 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -138,15 +138,17 @@ static bool trans_bgeu(DisasContext *ctx, arg_bgeu *a)
 
 static bool gen_load(DisasContext *ctx, arg_lb *a, MemOp memop)
 {
-TCGv t0 = tcg_temp_new();
-TCGv t1 = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
 
-tcg_gen_qemu_ld_tl(t1, t0, ctx->mem_idx, memop);
-gen_set_gpr(ctx, a->rd, t1);
-tcg_temp_free(t0);
-tcg_temp_free(t1);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_ld_tl(dest, addr, ctx->mem_idx, memop);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
@@ -177,19 +179,19 @@ static bool trans_lhu(DisasContext *ctx, arg_lhu *a)
 
 static bool gen_store(DisasContext *ctx, arg_sb *a, MemOp memop)
 {
-TCGv t0 = tcg_temp_new();
-TCGv dat = tcg_temp_new();
-gen_get_gpr(ctx, t0, a->rs1);
-tcg_gen_addi_tl(t0, t0, a->imm);
-gen_get_gpr(ctx, dat, a->rs2);
+TCGv addr = get_gpr(ctx, a->rs1, EXT_NONE);
+TCGv data = get_gpr(ctx, a->rs2, EXT_NONE);
 
-tcg_gen_qemu_st_tl(dat, t0, ctx->mem_idx, memop);
-tcg_temp_free(t0);
-tcg_temp_free(dat);
+if (a->imm) {
+TCGv temp = temp_new(ctx);
+tcg_gen_addi_tl(temp, addr, a->imm);
+addr = temp;
+}
+
+tcg_gen_qemu_st_tl(data, addr, ctx->mem_idx, memop);
 return true;
 }
 
-
 static bool trans_sb(DisasContext *ctx, arg_sb *a)
 {
 return gen_store(ctx, a, MO_SB);
-- 
2.25.1




[PATCH v5 08/24] target/riscv: Use gen_arith for mulh and mulhu

2021-08-23 Thread Richard Henderson
Split out gen_mulh and gen_mulhu and use the common helper.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
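Note (not part of the patch): a minimal sketch of what the two new helpers
compute, assuming XLEN is 64 and a compiler that provides the __int128
extension (GCC and Clang do). In the patch, tcg_gen_muls2_tl and
tcg_gen_mulu2_tl produce both halves of the product and the low half is
written to a discarded temporary. The names mulh64 and mulhu64 are
hypothetical.

```c
#include <stdint.h>

/* High 64 bits of the signed 128-bit product (models mulh on RV64). */
static int64_t mulh64(int64_t s1, int64_t s2)
{
    __int128 prod = (__int128)s1 * s2;

    /* Go through the unsigned type so the right shift is well defined. */
    return (int64_t)(uint64_t)((unsigned __int128)prod >> 64);
}

/* High 64 bits of the unsigned 128-bit product (models mulhu on RV64). */
static uint64_t mulhu64(uint64_t s1, uint64_t s2)
{
    unsigned __int128 prod = (unsigned __int128)s1 * s2;

    return (uint64_t)(prod >> 64);
}
```

For example, mulh64(-1, 1) is -1 because the 128-bit product is -1 and its
upper half is all ones, while mulhu64 on the same bit patterns treats the
first operand as 2^64 - 1 and returns 0.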
 target/riscv/insn_trans/trans_rvm.c.inc | 40 +++--
 1 file changed, 18 insertions(+), 22 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 3d93b24c25..80552be7a3 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -25,20 +25,18 @@ static bool trans_mul(DisasContext *ctx, arg_mul *a)
 return gen_arith(ctx, a, EXT_NONE, tcg_gen_mul_tl);
 }
 
+static void gen_mulh(TCGv ret, TCGv s1, TCGv s2)
+{
+TCGv discard = tcg_temp_new();
+
+tcg_gen_muls2_tl(discard, ret, s1, s2);
+tcg_temp_free(discard);
+}
+
 static bool trans_mulh(DisasContext *ctx, arg_mulh *a)
 {
 REQUIRE_EXT(ctx, RVM);
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_muls2_tl(source2, source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
+return gen_arith(ctx, a, EXT_NONE, gen_mulh);
 }
 
 static bool trans_mulhsu(DisasContext *ctx, arg_mulhsu *a)
@@ -47,20 +45,18 @@ static bool trans_mulhsu(DisasContext *ctx, arg_mulhsu *a)
 return gen_arith(ctx, a, EXT_NONE, gen_mulhsu);
 }
 
+static void gen_mulhu(TCGv ret, TCGv s1, TCGv s2)
+{
+TCGv discard = tcg_temp_new();
+
+tcg_gen_mulu2_tl(discard, ret, s1, s2);
+tcg_temp_free(discard);
+}
+
 static bool trans_mulhu(DisasContext *ctx, arg_mulhu *a)
 {
 REQUIRE_EXT(ctx, RVM);
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_mulu2_tl(source2, source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
+return gen_arith(ctx, a, EXT_NONE, gen_mulhu);
 }
 
 static bool trans_div(DisasContext *ctx, arg_div *a)
-- 
2.25.1




[PATCH v5 12/24] target/riscv: Use DisasExtend in shift operations

2021-08-23 Thread Richard Henderson
These operations are greatly simplified by ctx->w, which allows
us to fold gen_shiftw into gen_shift.  Split gen_shifti into
gen_shift_imm_{fn,tl} like we do for gen_arith_imm_{fn,tl}.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
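Note (not part of the patch): a minimal sketch in plain C of the two rules
the new helpers apply, assuming XLEN is 64. A register shift amount is
masked to the operation length minus one, and an immediate shift amount at
or beyond the operation length is rejected as an illegal encoding. Only
the left shift is modelled; model_oper_len, model_sll and model_slli are
hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirrors oper_len(): 32 for W-form instructions, else XLEN (64 here). */
static int model_oper_len(bool w)
{
    return w ? 32 : 64;
}

/* Register shift: the amount is masked before shifting. */
static int64_t model_sll(bool w, uint64_t rs1, uint64_t rs2)
{
    unsigned sh = (unsigned)(rs2 & (uint64_t)(model_oper_len(w) - 1));
    uint64_t res = rs1 << sh;

    /* W forms write back the sign-extended low 32 bits. */
    return w ? (int64_t)(int32_t)(uint32_t)res : (int64_t)res;
}

/* Immediate shift: returns false when shamt is out of range. */
static bool model_slli(bool w, uint64_t rs1, int shamt, int64_t *rd)
{
    uint64_t res;

    if (shamt < 0 || shamt >= model_oper_len(w)) {
        return false;
    }
    res = rs1 << shamt;
    *rd = w ? (int64_t)(int32_t)(uint32_t)res : (int64_t)res;
    return true;
}
```

A shift amount of 33 therefore behaves like 1 for the W form but like 33
for the full-width form, and an immediate of 32 is valid only when
ctx->w is clear.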
 target/riscv/translate.c| 110 +---
 target/riscv/insn_trans/trans_rvb.c.inc | 129 +++-
 target/riscv/insn_trans/trans_rvi.c.inc |  88 
 3 files changed, 125 insertions(+), 202 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 785e9e58cc..e44254e878 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -99,6 +99,13 @@ static inline bool is_32bit(DisasContext *ctx)
 }
 #endif
 
+/* The word size for this operation. */
+static inline int oper_len(DisasContext *ctx)
+{
+return ctx->w ? 32 : TARGET_LONG_BITS;
+}
+
+
 /*
  * RISC-V requires NaN-boxing of narrower width floating point values.
  * This applies when a 32-bit value is assigned to a 64-bit FP register.
@@ -393,88 +400,58 @@ static bool gen_arith(DisasContext *ctx, arg_r *a, DisasExtend ext,
 return true;
 }
 
-static bool gen_shift(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift_imm_fn(DisasContext *ctx, arg_shift *a, DisasExtend ext,
+ void (*func)(TCGv, TCGv, target_long))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest, src1;
+int max_len = oper_len(ctx);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-
-tcg_gen_andi_tl(source2, source2, TARGET_LONG_BITS - 1);
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
-static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
-{
-DisasContext *ctx = container_of(dcbase, DisasContext, base);
-CPUState *cpu = ctx->cs;
-CPURISCVState *env = cpu->env_ptr;
-
-return cpu_ldl_code(env, pc);
-}
-
-static bool gen_shifti(DisasContext *ctx, arg_shift *a,
-   void(*func)(TCGv, TCGv, TCGv))
-{
-if (a->shamt >= TARGET_LONG_BITS) {
+if (a->shamt >= max_len) {
 return false;
 }
 
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+dest = dest_gpr(ctx, a->rd);
+src1 = get_gpr(ctx, a->rs1, ext);
 
-gen_get_gpr(ctx, source1, a->rs1);
+func(dest, src1, a->shamt);
 
-tcg_gen_movi_tl(source2, a->shamt);
-(*func)(source1, source1, source2);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_shiftw(DisasContext *ctx, arg_r *a,
-   void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift_imm_tl(DisasContext *ctx, arg_shift *a, DisasExtend ext,
+ void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest, src1, src2;
+int max_len = oper_len(ctx);
 
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
+if (a->shamt >= max_len) {
+return false;
+}
 
-tcg_gen_andi_tl(source2, source2, 31);
-(*func)(source1, source1, source2);
-tcg_gen_ext32s_tl(source1, source1);
+dest = dest_gpr(ctx, a->rd);
+src1 = get_gpr(ctx, a->rs1, ext);
+src2 = tcg_constant_tl(a->shamt);
 
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+func(dest, src1, src2);
+
+gen_set_gpr(ctx, a->rd, dest);
 return true;
 }
 
-static bool gen_shiftiw(DisasContext *ctx, arg_shift *a,
-void(*func)(TCGv, TCGv, TCGv))
+static bool gen_shift(DisasContext *ctx, arg_r *a, DisasExtend ext,
+  void (*func)(TCGv, TCGv, TCGv))
 {
-TCGv source1 = tcg_temp_new();
-TCGv source2 = tcg_temp_new();
+TCGv dest = dest_gpr(ctx, a->rd);
+TCGv src1 = get_gpr(ctx, a->rs1, ext);
+TCGv src2 = get_gpr(ctx, a->rs2, EXT_NONE);
+TCGv ext2 = tcg_temp_new();
 
-gen_get_gpr(ctx, source1, a->rs1);
-tcg_gen_movi_tl(source2, a->shamt);
+tcg_gen_andi_tl(ext2, src2, oper_len(ctx) - 1);
+func(dest, src1, ext2);
 
-(*func)(source1, source1, source2);
-tcg_gen_ext32s_tl(source1, source1);
-
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
+gen_set_gpr(ctx, a->rd, dest);
+tcg_temp_free(ext2);
 return true;
 }
 
@@ -490,6 +467,15 @@ static bool gen_unary(DisasContext *ctx, arg_r2 *a, DisasExtend ext,
 return true;
 }
 
+static uint32_t opcode_at(DisasContextBase *dcbase, target_ulong pc)
+{
+DisasContext *ctx = container_of(dcbase, DisasContext, base);
+CPUState *cpu = ctx->cs;
+CPURISCVState *env = 

[PATCH v5 07/24] target/riscv: Remove gen_arith_div*

2021-08-23 Thread Richard Henderson
Use ctx->w and the enhanced gen_arith function.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
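Note (not part of the patch): a minimal sketch in plain C of why divw uses
EXT_SIGN and divuw uses EXT_ZERO, assuming XLEN is 64. Both extend the low
32 bits of each input, divide at full width, and sign-extend the low 32
bits of the quotient on write-back. The function names are hypothetical,
and division by zero is handled with an explicit branch here rather than
the movcond sequence used by gen_div and gen_divu.

```c
#include <stdint.h>

/* Sign-extend the low 32 bits of a 64-bit value. */
static int64_t sext32(uint64_t v)
{
    return (int64_t)(int32_t)(uint32_t)v;
}

/* divw: inputs sign-extended, result sign-extended from 32 bits. */
static int64_t model_divw(uint64_t rs1, uint64_t rs2)
{
    int64_t a = sext32(rs1);
    int64_t b = sext32(rs2);

    if (b == 0) {
        return -1;              /* division by zero yields all ones */
    }
    /* INT32_MIN / -1 cannot overflow at 64 bits; truncation wraps it. */
    return sext32((uint64_t)(a / b));
}

/* divuw: inputs zero-extended, result sign-extended from 32 bits. */
static int64_t model_divuw(uint64_t rs1, uint64_t rs2)
{
    uint64_t a = (uint32_t)rs1;
    uint64_t b = (uint32_t)rs2;

    if (b == 0) {
        return -1;              /* all ones after sign extension */
    }
    return sext32(a / b);
}
```

The same register contents give different quotients: the low word
0xfffffff8 is -8 for model_divw but 4294967288 for model_divuw.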
 target/riscv/translate.c| 42 -
 target/riscv/insn_trans/trans_rvm.c.inc | 16 +-
 2 files changed, 8 insertions(+), 50 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 7dd2839288..1855eacbac 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -507,48 +507,6 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a, DisasExtend ext,
 return true;
 }
 
-static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
-{
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-tcg_gen_ext32s_tl(source1, source1);
-tcg_gen_ext32s_tl(source2, source2);
-
-(*func)(source1, source1, source2);
-
-tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
-static bool gen_arith_div_uw(DisasContext *ctx, arg_r *a,
-void(*func)(TCGv, TCGv, TCGv))
-{
-TCGv source1, source2;
-source1 = tcg_temp_new();
-source2 = tcg_temp_new();
-
-gen_get_gpr(ctx, source1, a->rs1);
-gen_get_gpr(ctx, source2, a->rs2);
-tcg_gen_ext32u_tl(source1, source1);
-tcg_gen_ext32u_tl(source2, source2);
-
-(*func)(source1, source1, source2);
-
-tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(ctx, a->rd, source1);
-tcg_temp_free(source1);
-tcg_temp_free(source2);
-return true;
-}
-
 static void gen_pack(TCGv ret, TCGv arg1, TCGv arg2)
 {
 tcg_gen_deposit_tl(ret, arg1, arg2,
diff --git a/target/riscv/insn_trans/trans_rvm.c.inc b/target/riscv/insn_trans/trans_rvm.c.inc
index 013b3f7009..3d93b24c25 100644
--- a/target/riscv/insn_trans/trans_rvm.c.inc
+++ b/target/riscv/insn_trans/trans_rvm.c.inc
@@ -99,30 +99,30 @@ static bool trans_divw(DisasContext *ctx, arg_divw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_w(ctx, a, &gen_div);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_SIGN, gen_div);
 }
 
 static bool trans_divuw(DisasContext *ctx, arg_divuw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_uw(ctx, a, &gen_divu);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_ZERO, gen_divu);
 }
 
 static bool trans_remw(DisasContext *ctx, arg_remw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_w(ctx, a, &gen_rem);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_SIGN, gen_rem);
 }
 
 static bool trans_remuw(DisasContext *ctx, arg_remuw *a)
 {
 REQUIRE_64BIT(ctx);
 REQUIRE_EXT(ctx, RVM);
-
-return gen_arith_div_uw(ctx, a, &gen_remu);
+ctx->w = true;
+return gen_arith(ctx, a, EXT_ZERO, gen_remu);
 }
-- 
2.25.1




[PATCH v5 04/24] target/riscv: Add DisasContext to gen_get_gpr, gen_set_gpr

2021-08-23 Thread Richard Henderson
We will require the context to handle RV64 word operations.

Reviewed-by: Alistair Francis 
Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 58 -
 target/riscv/insn_trans/trans_rva.c.inc | 18 
 target/riscv/insn_trans/trans_rvb.c.inc |  4 +-
 target/riscv/insn_trans/trans_rvd.c.inc | 32 +++---
 target/riscv/insn_trans/trans_rvf.c.inc | 32 +++---
 target/riscv/insn_trans/trans_rvh.c.inc | 52 +++---
 target/riscv/insn_trans/trans_rvi.c.inc | 44 +--
 target/riscv/insn_trans/trans_rvm.c.inc | 12 ++---
 target/riscv/insn_trans/trans_rvv.c.inc | 36 +++
 9 files changed, 144 insertions(+), 144 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 147b9c2f68..ce4c56c179 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -175,7 +175,7 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 /* Wrapper for getting reg values - need to check of reg is zero since
  * cpu_gpr[0] is not actually allocated
  */
-static inline void gen_get_gpr(TCGv t, int reg_num)
+static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
 {
 if (reg_num == 0) {
 tcg_gen_movi_tl(t, 0);
@@ -189,7 +189,7 @@ static inline void gen_get_gpr(TCGv t, int reg_num)
  * since we usually avoid calling the OP_TYPE_gen function if we see a write to
  * $zero
  */
-static inline void gen_set_gpr(int reg_num_dst, TCGv t)
+static void gen_set_gpr(DisasContext *ctx, int reg_num_dst, TCGv t)
 {
 if (reg_num_dst != 0) {
 tcg_gen_mov_tl(cpu_gpr[reg_num_dst], t);
@@ -435,11 +435,11 @@ static bool gen_arith_imm_fn(DisasContext *ctx, arg_i *a,
 TCGv source1;
 source1 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 
 (*func)(source1, source1, a->imm);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 return true;
 }
@@ -451,12 +451,12 @@ static bool gen_arith_imm_tl(DisasContext *ctx, arg_i *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 tcg_gen_movi_tl(source2, a->imm);
 
 (*func)(source1, source1, source2);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -487,15 +487,15 @@ static bool gen_arith_div_w(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 tcg_gen_ext32s_tl(source1, source1);
 tcg_gen_ext32s_tl(source2, source2);
 
 (*func)(source1, source1, source2);
 
 tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -508,15 +508,15 @@ static bool gen_arith_div_uw(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 tcg_gen_ext32u_tl(source1, source1);
 tcg_gen_ext32u_tl(source2, source2);
 
 (*func)(source1, source1, source2);
 
 tcg_gen_ext32s_tl(source1, source1);
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -606,7 +606,7 @@ static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
 TCGv source1 = tcg_temp_new();
 TCGv source2;
 
-gen_get_gpr(source1, a->rs1);
+gen_get_gpr(ctx, source1, a->rs1);
 
 if (a->shamt == (TARGET_LONG_BITS - 8)) {
 /* rev8, byte swaps */
@@ -618,7 +618,7 @@ static bool gen_grevi(DisasContext *ctx, arg_grevi *a)
 tcg_temp_free(source2);
 }
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 return true;
 }
@@ -752,12 +752,12 @@ static bool gen_arith(DisasContext *ctx, arg_r *a,
 source1 = tcg_temp_new();
 source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+gen_get_gpr(ctx, source1, a->rs1);
+gen_get_gpr(ctx, source2, a->rs2);
 
 (*func)(source1, source1, source2);
 
-gen_set_gpr(a->rd, source1);
+gen_set_gpr(ctx, a->rd, source1);
 tcg_temp_free(source1);
 tcg_temp_free(source2);
 return true;
@@ -769,13 +769,13 @@ static bool gen_shift(DisasContext *ctx, arg_r *a,
 TCGv source1 = tcg_temp_new();
 TCGv source2 = tcg_temp_new();
 
-gen_get_gpr(source1, a->rs1);
-gen_get_gpr(source2, a->rs2);
+  

[PATCH v5 02/24] tests/tcg/riscv64: Add test for division

2021-08-23 Thread Richard Henderson
Tested-by: Bin Meng 
Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 tests/tcg/riscv64/test-div.c  | 58 +++
 tests/tcg/riscv64/Makefile.target |  5 +++
 2 files changed, 63 insertions(+)
 create mode 100644 tests/tcg/riscv64/test-div.c
 create mode 100644 tests/tcg/riscv64/Makefile.target

diff --git a/tests/tcg/riscv64/test-div.c b/tests/tcg/riscv64/test-div.c
new file mode 100644
index 00..a90480be3f
--- /dev/null
+++ b/tests/tcg/riscv64/test-div.c
@@ -0,0 +1,58 @@
+#include <assert.h>
+#include <limits.h>
+
+struct TestS {
+long x, y, q, r;
+};
+
+static struct TestS test_s[] = {
+{ 4, 2, 2, 0 }, /* normal cases */
+{ 9, 7, 1, 2 },
+{ 0, 0, -1, 0 },/* div by zero cases */
+{ 9, 0, -1, 9 },
+{ LONG_MIN, -1, LONG_MIN, 0 },  /* overflow case */
+};
+
+struct TestU {
+unsigned long x, y, q, r;
+};
+
+static struct TestU test_u[] = {
+{ 4, 2, 2, 0 }, /* normal cases */
+{ 9, 7, 1, 2 },
+{ 0, 0, ULONG_MAX, 0 }, /* div by zero cases */
+{ 9, 0, ULONG_MAX, 9 },
+};
+
+#define ARRAY_SIZE(X)  (sizeof(X) / sizeof(*(X)))
+
+int main (void)
+{
+int i;
+
+for (i = 0; i < ARRAY_SIZE(test_s); i++) {
+long q, r;
+
+asm("div %0, %2, %3\n\t"
+"rem %1, %2, %3"
+: "=&r" (q), "=r" (r)
+: "r" (test_s[i].x), "r" (test_s[i].y));
+
+assert(q == test_s[i].q);
+assert(r == test_s[i].r);
+}
+
+for (i = 0; i < ARRAY_SIZE(test_u); i++) {
+unsigned long q, r;
+
+asm("divu %0, %2, %3\n\t"
+"remu %1, %2, %3"
+: "=&r" (q), "=r" (r)
+: "r" (test_u[i].x), "r" (test_u[i].y));
+
+assert(q == test_u[i].q);
+assert(r == test_u[i].r);
+}
+
+return 0;
+}
diff --git a/tests/tcg/riscv64/Makefile.target b/tests/tcg/riscv64/Makefile.target
new file mode 100644
index 00..d41bf6d60d
--- /dev/null
+++ b/tests/tcg/riscv64/Makefile.target
@@ -0,0 +1,5 @@
+# -*- Mode: makefile -*-
+# RISC-V specific tweaks
+
+VPATH += $(SRC_PATH)/tests/tcg/riscv64
+TESTS += test-div
-- 
2.25.1




[PATCH v5 03/24] target/riscv: Clean up division helpers

2021-08-23 Thread Richard Henderson
Utilize the condition in the movcond more; this allows some of
the setcond that were feeding into movcond to be removed.
Do not write into source1 and source2.  Re-name "condN" to "tempN"
and use the temporaries for more than holding conditions.

Tested-by: Bin Meng 
Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
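Note (not part of the patch): a minimal sketch in plain C of the operand
substitution the cleaned-up gen_div and gen_divu perform, assuming XLEN is
64. Instead of branching, the operands are swapped for safe values so that
a single host division yields the architected result. The helper names are
hypothetical; select_eq stands in for a movcond with TCG_COND_EQ.

```c
#include <stdint.h>

/* Stand-in for movcond with TCG_COND_EQ: pick v_true when c1 == c2. */
static int64_t select_eq(int64_t c1, int64_t c2,
                         int64_t v_true, int64_t v_false)
{
    return c1 == c2 ? v_true : v_false;
}

static int64_t model_div(int64_t source1, int64_t source2)
{
    int64_t temp1, temp2;

    /* If overflow, set temp2 to 1, else source2: min / 1 == min. */
    temp1 = (source1 == INT64_MIN) & (source2 == -1);
    temp2 = temp1 != 0 ? 1 : source2;

    /* If div by zero, use -1 / 1 to produce the required -1. */
    temp1 = select_eq(source2, 0, -1, source1);
    temp2 = select_eq(source2, 0, 1, temp2);

    return temp1 / temp2;
}

static uint64_t model_divu(uint64_t source1, uint64_t source2)
{
    /* If div by zero, use max / 1 to produce the required max. */
    uint64_t temp1 = source2 == 0 ? UINT64_MAX : source1;
    uint64_t temp2 = source2 == 0 ? 1 : source2;

    return temp1 / temp2;
}
```

The inputs are never modified, matching the "do not write into source1
and source2" point in the commit message. The quotients agree with the
expected values in the test table from patch 02/24.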
 target/riscv/translate.c | 160 ---
 1 file changed, 84 insertions(+), 76 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 20a55c92fb..147b9c2f68 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -213,106 +213,114 @@ static void gen_mulhsu(TCGv ret, TCGv arg1, TCGv arg2)
 
 static void gen_div(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, cond2, zeroreg, resultopt1;
+TCGv temp1, temp2, zero, one, mone, min;
+
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+mone = tcg_constant_tl(-1);
+min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
+
 /*
- * Handle by altering args to tcg_gen_div to produce req'd results:
- * For overflow: want source1 in source1 and 1 in source2
- * For div by zero: want -1 in source1 and 1 in source2 -> -1 result
+ * If overflow, set temp2 to 1, else source2.
+ * This produces the required result of min.
  */
-cond1 = tcg_temp_new();
-cond2 = tcg_temp_new();
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+tcg_gen_setcond_tl(TCG_COND_EQ, temp1, source1, min);
+tcg_gen_setcond_tl(TCG_COND_EQ, temp2, source2, mone);
+tcg_gen_and_tl(temp1, temp1, temp2);
+tcg_gen_movcond_tl(TCG_COND_NE, temp2, temp1, zero, one, source2);
 
-tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, (target_ulong)(~0L));
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source1,
-((target_ulong)1) << (TARGET_LONG_BITS - 1));
-tcg_gen_and_tl(cond1, cond1, cond2); /* cond1 = overflow */
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, 0); /* cond2 = div 0 */
-/* if div by zero, set source1 to -1, otherwise don't change */
-tcg_gen_movcond_tl(TCG_COND_EQ, source1, cond2, zeroreg, source1,
-resultopt1);
-/* if overflow or div by zero, set source2 to 1, else don't change */
-tcg_gen_or_tl(cond1, cond1, cond2);
-tcg_gen_movi_tl(resultopt1, (target_ulong)1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source2, cond1, zeroreg, source2,
-resultopt1);
-tcg_gen_div_tl(ret, source1, source2);
+/*
+ * If div by zero, set temp1 to -1 and temp2 to 1 to
+ * produce the required result of -1.
+ */
+tcg_gen_movcond_tl(TCG_COND_EQ, temp1, source2, zero, mone, source1);
+tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, one, temp2);
 
-tcg_temp_free(cond1);
-tcg_temp_free(cond2);
-tcg_temp_free(resultopt1);
+tcg_gen_div_tl(ret, temp1, temp2);
+
+tcg_temp_free(temp1);
+tcg_temp_free(temp2);
 }
 
 static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, zeroreg, resultopt1;
-cond1 = tcg_temp_new();
+TCGv temp1, temp2, zero, one, max;
 
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+max = tcg_constant_tl(~0);
 
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source2, 0);
-tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source1, cond1, zeroreg, source1,
-resultopt1);
-tcg_gen_movi_tl(resultopt1, (target_ulong)1);
-tcg_gen_movcond_tl(TCG_COND_EQ, source2, cond1, zeroreg, source2,
-resultopt1);
-tcg_gen_divu_tl(ret, source1, source2);
+/*
+ * If div by zero, set temp1 to max and temp2 to 1 to
+ * produce the required result of max.
+ */
+tcg_gen_movcond_tl(TCG_COND_EQ, temp1, source2, zero, max, source1);
+tcg_gen_movcond_tl(TCG_COND_EQ, temp2, source2, zero, one, source2);
+tcg_gen_divu_tl(ret, temp1, temp2);
 
-tcg_temp_free(cond1);
-tcg_temp_free(resultopt1);
+tcg_temp_free(temp1);
+tcg_temp_free(temp2);
 }
 
 static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 {
-TCGv cond1, cond2, zeroreg, resultopt1;
+TCGv temp1, temp2, zero, one, mone, min;
 
-cond1 = tcg_temp_new();
-cond2 = tcg_temp_new();
-zeroreg = tcg_constant_tl(0);
-resultopt1 = tcg_temp_new();
+temp1 = tcg_temp_new();
+temp2 = tcg_temp_new();
+zero = tcg_constant_tl(0);
+one = tcg_constant_tl(1);
+mone = tcg_constant_tl(-1);
+min = tcg_constant_tl(1ull << (TARGET_LONG_BITS - 1));
 
-tcg_gen_movi_tl(resultopt1, 1L);
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond2, source2, (target_ulong)-1);
-tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, 

[PATCH v5 05/24] target/riscv: Introduce DisasExtend and new helpers

2021-08-23 Thread Richard Henderson
Introduce get_gpr, dest_gpr, temp_new -- new helpers that do not force
tcg globals into temps, returning a constant 0 for $zero as source and
a new temp for $zero as destination.

Introduce ctx->w for simplifying word operations, such as addw.

Reviewed-by: Bin Meng 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
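Note (not part of the patch): a minimal sketch in plain C of the contract
the new helpers provide, assuming XLEN is 64. Reading $zero yields the
constant 0, a write aimed at $zero lands in a throwaway slot, and when the
w flag is set the write-back sign-extends the low 32 bits. The Model type
and every function name here are hypothetical stand-ins for DisasContext,
get_gpr, dest_gpr and gen_set_gpr.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    int64_t gpr[32];    /* gpr[0] is never used as storage */
    int64_t scratch;    /* sink for $zero and for W-form results */
    bool w;             /* set for RV64 word operations */
} Model;

/* Like get_gpr with EXT_NONE: $zero reads as the constant 0. */
static int64_t read_gpr(const Model *m, int reg_num)
{
    assert(reg_num >= 0 && reg_num < 32);
    return reg_num == 0 ? 0 : m->gpr[reg_num];
}

/* Like dest_gpr: a temporary for $zero or W forms, else the register. */
static int64_t *dest_slot(Model *m, int reg_num)
{
    assert(reg_num >= 0 && reg_num < 32);
    if (reg_num == 0 || m->w) {
        return &m->scratch;
    }
    return &m->gpr[reg_num];
}

/* Like gen_set_gpr: drop writes to $zero, sign-extend for W forms. */
static void write_back(Model *m, int reg_num, int64_t v)
{
    if (reg_num != 0) {
        m->gpr[reg_num] = m->w ? (int64_t)(int32_t)(uint32_t)(uint64_t)v : v;
    }
}

/* add when m->w is false, addw when it is true. */
static void model_add(Model *m, int rd, int rs1, int rs2)
{
    int64_t *dest = dest_slot(m, rd);

    *dest = (int64_t)((uint64_t)read_gpr(m, rs1) +
                      (uint64_t)read_gpr(m, rs2));
    write_back(m, rd, *dest);
}
```

With gpr[1] = 0x7fffffff and gpr[2] = 1, model_add stores 0x80000000 when
w is false and -2147483648 when w is true, and a destination of 0 leaves
the register file untouched in both cases.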
 target/riscv/translate.c | 101 ---
 1 file changed, 83 insertions(+), 18 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index ce4c56c179..d7552dc377 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -39,15 +39,25 @@ static TCGv load_val;
 
 #include "exec/gen-icount.h"
 
+/*
+ * If an operation is being performed on less than TARGET_LONG_BITS,
+ * it may require the inputs to be sign- or zero-extended; which will
+ * depend on the exact operation being performed.
+ */
+typedef enum {
+EXT_NONE,
+EXT_SIGN,
+EXT_ZERO,
+} DisasExtend;
+
 typedef struct DisasContext {
 DisasContextBase base;
 /* pc_succ_insn points to the instruction following base.pc_next */
 target_ulong pc_succ_insn;
 target_ulong priv_ver;
-bool virt_enabled;
+target_ulong misa;
 uint32_t opcode;
 uint32_t mstatus_fs;
-target_ulong misa;
 uint32_t mem_idx;
 /* Remember the rounding mode encoded in the previous fp instruction,
which we have already installed into env->fp_status.  Or -1 for
@@ -55,6 +65,8 @@ typedef struct DisasContext {
to any system register, which includes CSR_FRM, so we do not have
to reset this known value.  */
 int frm;
+bool w;
+bool virt_enabled;
 bool ext_ifencei;
 bool hlsx;
 /* vector extension */
@@ -64,7 +76,11 @@ typedef struct DisasContext {
 uint16_t vlen;
 uint16_t mlen;
 bool vl_eq_vlmax;
+uint8_t ntemp;
 CPUState *cs;
+TCGv zero;
+/* Space for 3 operands plus 1 extra for address computation. */
+TCGv temp[4];
 } DisasContext;
 
 static inline bool has_ext(DisasContext *ctx, uint32_t ext)
@@ -172,27 +188,64 @@ static void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
 }
 }
 
-/* Wrapper for getting reg values - need to check of reg is zero since
- * cpu_gpr[0] is not actually allocated
+/*
+ * Wrappers for getting reg values.
+ *
+ * The $zero register does not have cpu_gpr[0] allocated -- we supply the
+ * constant zero as a source, and an uninitialized sink as destination.
+ *
+ * Further, we may provide an extension for word operations.
  */
-static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
+static TCGv temp_new(DisasContext *ctx)
 {
-if (reg_num == 0) {
-tcg_gen_movi_tl(t, 0);
-} else {
-tcg_gen_mov_tl(t, cpu_gpr[reg_num]);
-}
+assert(ctx->ntemp < ARRAY_SIZE(ctx->temp));
+return ctx->temp[ctx->ntemp++] = tcg_temp_new();
 }
 
-/* Wrapper for setting reg values - need to check of reg is zero since
- * cpu_gpr[0] is not actually allocated. this is more for safety purposes,
- * since we usually avoid calling the OP_TYPE_gen function if we see a write to
- * $zero
- */
-static void gen_set_gpr(DisasContext *ctx, int reg_num_dst, TCGv t)
+static TCGv get_gpr(DisasContext *ctx, int reg_num, DisasExtend ext)
 {
-if (reg_num_dst != 0) {
-tcg_gen_mov_tl(cpu_gpr[reg_num_dst], t);
+TCGv t;
+
+if (reg_num == 0) {
+return ctx->zero;
+}
+
+switch (ctx->w ? ext : EXT_NONE) {
+case EXT_NONE:
+return cpu_gpr[reg_num];
+case EXT_SIGN:
+t = temp_new(ctx);
+tcg_gen_ext32s_tl(t, cpu_gpr[reg_num]);
+return t;
+case EXT_ZERO:
+t = temp_new(ctx);
+tcg_gen_ext32u_tl(t, cpu_gpr[reg_num]);
+return t;
+}
+g_assert_not_reached();
+}
+
+static void gen_get_gpr(DisasContext *ctx, TCGv t, int reg_num)
+{
+tcg_gen_mov_tl(t, get_gpr(ctx, reg_num, EXT_NONE));
+}
+
+static TCGv __attribute__((unused)) dest_gpr(DisasContext *ctx, int reg_num)
+{
+if (reg_num == 0 || ctx->w) {
+return temp_new(ctx);
+}
+return cpu_gpr[reg_num];
+}
+
+static void gen_set_gpr(DisasContext *ctx, int reg_num, TCGv t)
+{
+if (reg_num != 0) {
+if (ctx->w) {
+tcg_gen_ext32s_tl(cpu_gpr[reg_num], t);
+} else {
+tcg_gen_mov_tl(cpu_gpr[reg_num], t);
+}
 }
 }
 
@@ -940,6 +993,11 @@ static void riscv_tr_init_disas_context(DisasContextBase *dcbase, CPUState *cs)
 ctx->mlen = 1 << (ctx->sew  + 3 - ctx->lmul);
 ctx->vl_eq_vlmax = FIELD_EX32(tb_flags, TB_FLAGS, VL_EQ_VLMAX);
 ctx->cs = cs;
+ctx->w = false;
+ctx->ntemp = 0;
+memset(ctx->temp, 0, sizeof(ctx->temp));
+
+ctx->zero = tcg_constant_tl(0);
 }
 
 static void riscv_tr_tb_start(DisasContextBase *db, CPUState *cpu)
@@ -961,6 +1019,13 @@ static void riscv_tr_translate_insn(DisasContextBase *dcbase, CPUState *cpu)
 
 decode_opc(env, ctx, opcode16);
 ctx->base.pc_next = 

[PATCH v5 01/24] target/riscv: Use tcg_constant_*

2021-08-23 Thread Richard Henderson
Replace uses of tcg_const_* whose allocation and free sit close together
with tcg_constant_*, which does not need to be freed.

Reviewed-by: Bin Meng 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Alistair Francis 
Signed-off-by: Richard Henderson 
---
 target/riscv/translate.c| 36 --
 target/riscv/insn_trans/trans_rvf.c.inc |  3 +-
 target/riscv/insn_trans/trans_rvv.c.inc | 65 +
 3 files changed, 34 insertions(+), 70 deletions(-)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 6983be5723..20a55c92fb 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -104,20 +104,16 @@ static void gen_nanbox_s(TCGv_i64 out, TCGv_i64 in)
  */
 static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
 {
-TCGv_i64 t_max = tcg_const_i64(0xffffffff00000000ull);
-TCGv_i64 t_nan = tcg_const_i64(0xffffffff7fc00000ull);
+TCGv_i64 t_max = tcg_constant_i64(0xffffffff00000000ull);
+TCGv_i64 t_nan = tcg_constant_i64(0xffffffff7fc00000ull);
 
 tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
-tcg_temp_free_i64(t_max);
-tcg_temp_free_i64(t_nan);
 }
 
 static void generate_exception(DisasContext *ctx, int excp)
 {
 tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
-TCGv_i32 helper_tmp = tcg_const_i32(excp);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(excp));
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
@@ -125,17 +121,13 @@ static void generate_exception_mtval(DisasContext *ctx, int excp)
 {
 tcg_gen_movi_tl(cpu_pc, ctx->base.pc_next);
 tcg_gen_st_tl(cpu_pc, cpu_env, offsetof(CPURISCVState, badaddr));
-TCGv_i32 helper_tmp = tcg_const_i32(excp);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(excp));
 ctx->base.is_jmp = DISAS_NORETURN;
 }
 
 static void gen_exception_debug(void)
 {
-TCGv_i32 helper_tmp = tcg_const_i32(EXCP_DEBUG);
-gen_helper_raise_exception(cpu_env, helper_tmp);
-tcg_temp_free_i32(helper_tmp);
+gen_helper_raise_exception(cpu_env, tcg_constant_i32(EXCP_DEBUG));
 }
 
 /* Wrapper around tcg_gen_exit_tb that handles single stepping */
@@ -229,7 +221,7 @@ static void gen_div(TCGv ret, TCGv source1, TCGv source2)
  */
 cond1 = tcg_temp_new();
 cond2 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, (target_ulong)-1);
@@ -250,7 +242,6 @@ static void gen_div(TCGv ret, TCGv source1, TCGv source2)
 
 tcg_temp_free(cond1);
 tcg_temp_free(cond2);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -259,7 +250,7 @@ static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 TCGv cond1, zeroreg, resultopt1;
 cond1 = tcg_temp_new();
 
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_setcondi_tl(TCG_COND_EQ, cond1, source2, 0);
@@ -272,7 +263,6 @@ static void gen_divu(TCGv ret, TCGv source1, TCGv source2)
 tcg_gen_divu_tl(ret, source1, source2);
 
 tcg_temp_free(cond1);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -282,7 +272,7 @@ static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 
 cond1 = tcg_temp_new();
 cond2 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, 1L);
@@ -302,7 +292,6 @@ static void gen_rem(TCGv ret, TCGv source1, TCGv source2)
 
 tcg_temp_free(cond1);
 tcg_temp_free(cond2);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -310,7 +299,7 @@ static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
 {
 TCGv cond1, zeroreg, resultopt1;
 cond1 = tcg_temp_new();
-zeroreg = tcg_const_tl(0);
+zeroreg = tcg_constant_tl(0);
 resultopt1 = tcg_temp_new();
 
 tcg_gen_movi_tl(resultopt1, (target_ulong)1);
@@ -323,7 +312,6 @@ static void gen_remu(TCGv ret, TCGv source1, TCGv source2)
 source1);
 
 tcg_temp_free(cond1);
-tcg_temp_free(zeroreg);
 tcg_temp_free(resultopt1);
 }
 
@@ -384,15 +372,11 @@ static inline void mark_fs_dirty(DisasContext *ctx) { }
 
 static void gen_set_rm(DisasContext *ctx, int rm)
 {
-TCGv_i32 t0;
-
 if (ctx->frm == rm) {
 return;
 }
 ctx->frm = rm;
-t0 = tcg_const_i32(rm);
-gen_helper_set_rounding_mode(cpu_env, t0);
-tcg_temp_free_i32(t0);
+gen_helper_set_rounding_mode(cpu_env, tcg_constant_i32(rm));
 }
 
 static int ex_plus_1(DisasContext *ctx, int nf)
diff --git a/target/riscv/insn_trans/trans_rvf.c.inc 
b/target/riscv/insn_trans/trans_rvf.c.inc
index db1c0c9974..89f78701e7 100644
--- a/target/riscv/insn_trans/trans_rvf.c.inc
+++ b/target/riscv/insn_trans/trans_rvf.c.inc
@@ -200,12 +200,11 @@ static bool 

[PATCH v5 00/24] target/riscv: Use tcg_constant_*

2021-08-23 Thread Richard Henderson
Replace use of tcg_const_*, which makes a copy into a temp which must
be freed, with direct use of the constant.  Reorg handling of $zero,
with different accessors for source and destination.  Reorg handling
of csrs, passing the actual write_mask instead of a regno.  Use more
helpers for RVH expansion.

Patches lacking review:
  13-target-riscv-Use-extracts-for-sraiw-and-srliw.patch  (new)
  16-target-riscv-Fix-rmw_sip-rmw_vsip-rmw_hsip-vs-wri.patch  (new)
  17-target-riscv-Fix-hgeie-hgeip.patch   (new)
  20-target-riscv-Use-gen_shift_imm_fn-for-slli_uw.patch
  24-target-riscv-Use-get-dest-_gpr-for-RVV.patch

Changes for v5:
  * Use extract for sraiw, srliw.
  * Fix some broken csr helpers.

Changes for v4:
  * Add a test for division, primarily checking the edge cases.
  * Dropped the greviw patch, since grev has been dropped from Zbb 1.0.0.

Changes for v3:
  * Fix an introduced remainder bug (bin meng),
and remove one extra movcond from rem/remu.
  * Do not zero DisasContext on allocation (bin meng).

Changes for v2:
  * Retain the requirement to call gen_set_gpr.
  * Add DisasExtend as an argument to get_gpr, and ctx->w as a member
of DisasContext.  This should help in implementing UXL, where we
should be able to set ctx->w for all insns, but there is certainly
more required for that.
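
For readers unfamiliar with the division edge cases mentioned under the v4
changes, here is a minimal Python sketch of the RISC-V M-extension results
for division by zero and signed overflow. It is only an illustration of the
architectural semantics, not the contents of tests/tcg/riscv64/test-div.c;
the function names and the xlen parameter are made up for this example.

```python
def riscv_div(a: int, b: int, xlen: int = 64) -> int:
    """Signed DIV: a and b are signed xlen-bit values."""
    min_int = -(1 << (xlen - 1))
    if b == 0:
        return -1                      # division by zero: all bits set
    if a == min_int and b == -1:
        return min_int                 # signed overflow: quotient is the dividend
    q = abs(a) // abs(b)               # round toward zero, unlike Python's //
    return -q if (a < 0) != (b < 0) else q


def riscv_rem(a: int, b: int, xlen: int = 64) -> int:
    """Signed REM: the result takes the sign of the dividend."""
    min_int = -(1 << (xlen - 1))
    if b == 0:
        return a                       # remainder by zero: the dividend
    if a == min_int and b == -1:
        return 0                       # signed overflow: remainder is zero
    r = abs(a) % abs(b)
    return -r if a < 0 else r


def riscv_divu(a: int, b: int, xlen: int = 64) -> int:
    """Unsigned DIVU: a and b are unsigned xlen-bit values."""
    if b == 0:
        return (1 << xlen) - 1         # division by zero: all bits set
    return a // b


def riscv_remu(a: int, b: int, xlen: int = 64) -> int:
    """Unsigned REMU."""
    return a if b == 0 else a % b
```

None of these cases trap on RISC-V, which is why the translator has to
select the special results explicitly instead of relying on the host
division instruction.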


r~


Richard Henderson (24):
  target/riscv: Use tcg_constant_*
  tests/tcg/riscv64: Add test for division
  target/riscv: Clean up division helpers
  target/riscv: Add DisasContext to gen_get_gpr, gen_set_gpr
  target/riscv: Introduce DisasExtend and new helpers
  target/riscv: Add DisasExtend to gen_arith*
  target/riscv: Remove gen_arith_div*
  target/riscv: Use gen_arith for mulh and mulhu
  target/riscv: Move gen_* helpers for RVM
  target/riscv: Move gen_* helpers for RVB
  target/riscv: Add DisasExtend to gen_unary
  target/riscv: Use DisasExtend in shift operations
  target/riscv: Use extracts for sraiw and srliw
  target/riscv: Use get_gpr in branches
  target/riscv: Use {get,dest}_gpr for integer load/store
  target/riscv: Fix rmw_sip, rmw_vsip, rmw_hsip vs write-only operation
  target/riscv: Fix hgeie, hgeip
  target/riscv: Reorg csr instructions
  target/riscv: Use {get,dest}_gpr for RVA
  target/riscv: Use gen_shift_imm_fn for slli_uw
  target/riscv: Use {get,dest}_gpr for RVF
  target/riscv: Use {get,dest}_gpr for RVD
  target/riscv: Tidy trans_rvh.c.inc
  target/riscv: Use {get,dest}_gpr for RVV

 target/riscv/helper.h   |   6 +-
 target/riscv/insn32.decode  |   1 +
 target/riscv/csr.c  |  49 +-
 target/riscv/op_helper.c|  18 +-
 target/riscv/translate.c| 701 ++--
 tests/tcg/riscv64/test-div.c|  58 ++
 target/riscv/insn_trans/trans_rva.c.inc |  51 +-
 target/riscv/insn_trans/trans_rvb.c.inc | 366 ++---
 target/riscv/insn_trans/trans_rvd.c.inc | 127 +++--
 target/riscv/insn_trans/trans_rvf.c.inc | 149 +++--
 target/riscv/insn_trans/trans_rvh.c.inc | 266 ++---
 target/riscv/insn_trans/trans_rvi.c.inc | 370 +++--
 target/riscv/insn_trans/trans_rvm.c.inc | 191 +--
 target/riscv/insn_trans/trans_rvv.c.inc | 151 ++---
 tests/tcg/riscv64/Makefile.target   |   5 +
 15 files changed, 1154 insertions(+), 1355 deletions(-)
 create mode 100644 tests/tcg/riscv64/test-div.c
 create mode 100644 tests/tcg/riscv64/Makefile.target

-- 
2.25.1




Re: [PATCH v4 15/21] target/riscv: Reorg csr instructions

2021-08-23 Thread Richard Henderson

On 8/22/21 9:54 PM, Bin Meng wrote:

On Sat, Aug 21, 2021 at 1:43 AM Richard Henderson
 wrote:


Introduce csrr and csrw helpers, for read-only and write-only insns.

Note that we do not properly implement this in riscv_csrrw, in that
we cannot distinguish true read-only (rs1 == 0) from any other zero
write_mask another source register -- this should still raise an
exception for read-only registers.

Only issue gen_io_start for CF_USE_ICOUNT.
Use ctx->zero for csrrc.
Use get_gpr and dest_gpr.

Reviewed-by: Bin Meng 
Signed-off-by: Richard Henderson 
---
  target/riscv/helper.h   |   6 +-
  target/riscv/op_helper.c|  18 +--
  target/riscv/insn_trans/trans_rvi.c.inc | 172 +---
  3 files changed, 131 insertions(+), 65 deletions(-)



When testing Linux kernel boot, there was a segmentation fault in the
helper_csrw() path, where the ret_value pointer is now NULL and some CSR
write ops do not check ret_value.


Thanks.  It would be really nice to get an acceptance test in...


r~




Re: Testing a microcontroller emulation by loading the binary on incomplete Flash emulation

2021-08-23 Thread Gautam Bhat
On Sun, Aug 22, 2021 at 10:18 PM Peter Maydell  wrote:
>
> On Sun, 22 Aug 2021 at 15:37, Gautam Bhat  wrote:
> >
> > Hi,
> >
> > I am to implement a very simple microcontroller for my understanding
> > of Qemu development. This microcontroller runs its code from
> > programmable flash which is bit-, byte- and word addressable. To do
> > some initial tests of my nascent microcontroller implementation I
> > would like to load a very simple program binary. Is there a way to
> > load this binary and start execution without emulating Flash
> > controller and memory?
>
> Just create a plain old RAM memory region, and then load the
> guest binary into it with the 'generic loader' (which can
> take an ELF file or a raw binary).
>
> -- PMM

Thanks. I will check it out.



Re: [PATCH V6 19/27] vfio-pci: cpr part 1 (fd and dma)

2021-08-23 Thread Steven Sistare
On 8/10/2021 1:06 PM, Alex Williamson wrote:
> On Fri,  6 Aug 2021 14:43:53 -0700
> Steve Sistare  wrote:
> 
>> Enable vfio-pci devices to be saved and restored across an exec restart
>> of qemu.
>>
>> At vfio creation time, save the value of vfio container, group, and device
>> descriptors in cpr state.
>>
>> In cpr-save and cpr-exec, suspend the use of virtual addresses in DMA
>> mappings with VFIO_DMA_UNMAP_FLAG_VADDR, because guest ram will be remapped
>> at a different VA after exec.  DMA to already-mapped pages continues.  Save
>> the msi message area as part of vfio-pci vmstate, save the interrupt and
>> notifier eventfd's in cpr state, and clear the close-on-exec flag for the
>> vfio descriptors.  The flag is not cleared earlier because the descriptors
>> should not persist across miscellaneous fork and exec calls that may be
>> performed during normal operation.
>>
>> On qemu restart, vfio_realize() finds the descriptor env vars, uses
>> the descriptors, and notes that the device is being reused.  Device and
>> iommu state is already configured, so operations in vfio_realize that
>> would modify the configuration are skipped for a reused device, including
>> vfio ioctl's and writes to PCI configuration space.  The result is that
>> vfio_realize constructs qemu data structures that reflect the current
>> state of the device.  However, the reconstruction is not complete until
>> cpr-load is called. cpr-load loads the msi data and finds eventfds in cpr
>> state.  It rebuilds vector data structures and attaches the interrupts to
>> the new KVM instance.  cpr-load then walks the flattened ranges of the
>> vfio_address_spaces and calls VFIO_DMA_MAP_FLAG_VADDR to inform the kernel
>> of the new VA's.  Lastly, it starts the VM and suppresses vfio device reset.
>>
>> This functionality is delivered by 3 patches for clarity.  Part 1 handles
>> device file descriptors and DMA.  Part 2 adds eventfd and MSI/MSI-X vector
>> support.  Part 3 adds INTX support.
>>
>> Signed-off-by: Steve Sistare 
>> ---
>>  MAINTAINERS   |   1 +
>>  hw/pci/pci.c  |   4 ++
>>  hw/vfio/common.c  |  69 --
>>  hw/vfio/cpr.c | 160 
>> ++
>>  hw/vfio/meson.build   |   1 +
>>  hw/vfio/pci.c |  57 +++
>>  hw/vfio/trace-events  |   1 +
>>  include/hw/pci/pci.h  |   1 +
>>  include/hw/vfio/vfio-common.h |   5 ++
>>  include/migration/cpr.h   |   3 +
>>  linux-headers/linux/vfio.h|   6 ++
>>  migration/cpr.c   |  10 ++-
>>  migration/target.c|  14 
>>  13 files changed, 325 insertions(+), 7 deletions(-)
>>  create mode 100644 hw/vfio/cpr.c
>>
>> diff --git a/MAINTAINERS b/MAINTAINERS
>> index a9d2ed8..3132965 100644
>> --- a/MAINTAINERS
>> +++ b/MAINTAINERS
>> @@ -2904,6 +2904,7 @@ CPR
>>  M: Steve Sistare 
>>  M: Mark Kanda 
>>  S: Maintained
>> +F: hw/vfio/cpr.c
>>  F: include/migration/cpr.h
>>  F: migration/cpr.c
>>  F: qapi/cpr.json
>> diff --git a/hw/pci/pci.c b/hw/pci/pci.c
>> index 59408a3..b9c6ca1 100644
>> --- a/hw/pci/pci.c
>> +++ b/hw/pci/pci.c
>> @@ -307,6 +307,10 @@ static void pci_do_device_reset(PCIDevice *dev)
>>  {
>>  int r;
>>  
>> +if (dev->reused) {
>> +return;
>> +}
>> +
>>  pci_device_deassert_intx(dev);
>>  assert(dev->irq_state == 0);
>>  
>> diff --git a/hw/vfio/common.c b/hw/vfio/common.c
>> index 7918c0d..872a1ac 100644
>> --- a/hw/vfio/common.c
>> +++ b/hw/vfio/common.c
>> @@ -31,6 +31,7 @@
>>  #include "exec/memory.h"
>>  #include "exec/ram_addr.h"
>>  #include "hw/hw.h"
>> +#include "migration/cpr.h"
>>  #include "qemu/error-report.h"
>>  #include "qemu/main-loop.h"
>>  #include "qemu/range.h"
>> @@ -464,6 +465,10 @@ static int vfio_dma_unmap(VFIOContainer *container,
>>  return vfio_dma_unmap_bitmap(container, iova, size, iotlb);
>>  }
>>  
>> +if (container->reused) {
>> +return 0;
>> +}
>> +
>>  while (ioctl(container->fd, VFIO_IOMMU_UNMAP_DMA, &unmap)) {
>>  /*
>>   * The type1 backend has an off-by-one bug in the kernel 
>> (71a7d3d78e3c
>> @@ -501,6 +506,10 @@ static int vfio_dma_map(VFIOContainer *container, 
>> hwaddr iova,
>>  .size = size,
>>  };
>>  
>> +if (container->reused) {
>> +return 0;
>> +}
>> +
>>  if (!readonly) {
>>  map.flags |= VFIO_DMA_MAP_FLAG_WRITE;
>>  }
>> @@ -1872,6 +1881,10 @@ static int vfio_init_container(VFIOContainer 
>> *container, int group_fd,
>>  if (iommu_type < 0) {
>>  return iommu_type;
>>  }
>> +if (container->reused) {
>> +container->iommu_type = iommu_type;
>> +return 0;
>> +}
>>  
> 
> I'd like to see more comments throughout, but particularly where we're
> dumping out of functions for reused containers, groups, and devices.
> For instance map/unmap we're assuming we'll reach the same IOMMU
> mapping state we 

Re: [PATCH] softmmu/physmem: Improve guest memory allocation failure error message

2021-08-23 Thread David Hildenbrand

On 23.08.21 12:34, Philippe Mathieu-Daudé wrote:

On 8/23/21 12:24 PM, David Hildenbrand wrote:

On 23.08.21 12:12, Philippe Mathieu-Daudé wrote:

On 8/23/21 11:29 AM, David Hildenbrand wrote:

On 23.08.21 11:23, Peter Maydell wrote:

On Mon, 23 Aug 2021 at 09:40, David Hildenbrand 
wrote:

Not opposed to printing the size, although I doubt that it will really
stop similar questions/problems getting raised.


The case that triggered this was somebody thinking
-m took a byte count, so very likely that an error message
saying "you tried to allocate 38TB" would have made their
mistake clear in a way that just "allocation failed" did not.
It also means that if a future user asks us for help then
we can look at the error message and immediately tell them
the problem, rather than going "hmm, what are all the possible
ways that allocation might have failed" and going off down
rabbitholes like VM overcommit settings...


We've had similar issues recently where Linux memory overcommit handling
rejected the allocation -- and the user was well aware about the actual
size. You won't be able to catch such reports, because people don't
understand how Linux memory overcommit handling works or was configured.

"I have 3 GiB of free memory, why can't I create a 3 GiB VM". "I have 3
GiB of RAM, why can't I create a 3 GiB VM even if it won't make use of
all 3 GiB of memory".

Thus my comment, it will only stop very basic usage issues. And I agree
that looking at the error *might* help. It didn't help for the cases I
just described, because we need much more system information to make a
guess what the user error actually is.


Is it possible to get the maximal overcommitable amount on Linux?


Not reliably I think.

In the "always" mode, there is none.

In the "guess"/"estimate" mode, the kernel takes a guess (currently
implemented as checking if the mmap size <= total RAM + total SWAP).
 Committable = MemTotal + SwapTotal

In the "never" mode:
 Committable = CommitLimit - Committed_AS
However, the value gets further reduced for !root applications by
/proc/sys/vm/admin_reserve_kbytes.

Replicating these calculations in user space would be suboptimal IMHO.
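
To make the formulas above concrete, here is a rough Python sketch that
applies them to /proc/meminfo-style text. It only mirrors the description in
this mail: it ignores admin_reserve_kbytes and any kernel-version
differences, and the function names and sample numbers are invented for
illustration.

```python
from typing import Dict, Optional

# Made-up sample in /proc/meminfo format (values are in kB).
SAMPLE_MEMINFO = """\
MemTotal:        3145728 kB
SwapTotal:       1048576 kB
CommitLimit:     2621440 kB
Committed_AS:    1572864 kB
"""


def parse_meminfo(text: str) -> Dict[str, int]:
    """Return the numeric fields of /proc/meminfo-style text, keyed by name."""
    info: Dict[str, int] = {}
    for line in text.splitlines():
        key, sep, rest = line.partition(":")
        if not sep:
            continue
        fields = rest.split()
        if fields and fields[0].isdigit():
            info[key.strip()] = int(fields[0])
    return info


def committable_kb(info: Dict[str, int], mode: str) -> Optional[int]:
    """Estimate per the thread; None means there is no limit."""
    if mode == "always":
        return None
    if mode == "guess":
        return info["MemTotal"] + info["SwapTotal"]
    if mode == "never":
        return info["CommitLimit"] - info["Committed_AS"]
    raise ValueError(f"unknown overcommit mode: {mode}")
```

On a real system the text would come from reading /proc/meminfo, and the
mode from /proc/sys/vm/overcommit_memory (0 = guess, 1 = always, 2 = never).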


What about simply giving a hint about memory overcommit and display
a link to documentation with longer description about how to check
and figure out this issue?


That would be highly OS-specific -- for example, there is no memory
overcommit under Windows. Sure, we could add a Linux-specific hint
pointing to documentation. But I'm not sure most end users stumbling
into such an error+hint would be able to make sense of memory overcommit
details (not to mention knowing what it even is) :)


You can run into memory allocation issues with many applications. Let me
give you a simple example:


t480s: ~  $ dd if=/dev/zero of=/dev/null ibs=100G
dd: memory exhausted by input buffer of size 107374182400 bytes (100 GiB)

So indicating the size of the failing allocation might be just good 
enough. For the other parts it's usually just "the way the OS was 
configured, it does not think it can allow this allocation".
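
As a sketch of what "indicating the size of the failing allocation" could
look like, the following Python helper formats a byte count the way the dd
message above does. The helper name and the exact wording are assumptions
for illustration, not the message QEMU prints.

```python
def format_alloc_size(nbytes: int) -> str:
    """Render a byte count as 'N bytes (X unit)' using binary prefixes."""
    units = ["bytes", "KiB", "MiB", "GiB", "TiB", "PiB"]
    value = float(nbytes)
    idx = 0
    while value >= 1024 and idx < len(units) - 1:
        value /= 1024
        idx += 1
    if idx == 0:
        return f"{nbytes} bytes"
    # Up to four significant digits, without trailing zeros.
    return f"{nbytes} bytes ({value:.4g} {units[idx]})"


def alloc_failure_message(nbytes: int) -> str:
    """Hypothetical error text that includes the requested size."""
    return f"failed to allocate guest memory of size {format_alloc_size(nbytes)}"
```

Printing both the raw byte count and the scaled value makes a "-m given in
bytes" mistake visible at a glance.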


--
Thanks,

David / dhildenb




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 02:49:12PM -0400, Eduardo Habkost wrote:
> On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> > QEMU creates -device objects in order as specified by the user's cmdline.
> > However that ordering may not be the ideal order.  For example, some 
> > platform
> > devices (vIOMMUs) may want to be created earlier than most of the rest
> > devices (e.g., vfio-pci, virtio).
> > 
> > This patch orders the QemuOptsList of '-device's so they'll be sorted first
> > before kicking off the device realizations.  This will allow the device
> > realization code to be able to use APIs like 
> > pci_device_iommu_address_space()
> > correctly, because those functions rely on the platfrom devices being 
> > realized.
> > 
> > Now we rely on vmsd->priority which is defined as MigrationPriority to 
> > provide
> > the ordering, as either VM init and migration completes will need such an
> > ordering.  In the future we can move that priority information out of vmsd.
> > 
> > Signed-off-by: Peter Xu 
> 
> Can we be 100% sure that changing the ordering of every single
> device being created won't affect guest ABI?  (I don't think we can)

That's a good question; however, I doubt there's any real-world guest ABI that
depends on it.  As a developer, I normally specify cmdline parameters in an
ad hoc way, so I assume most parameters are not sensitive to ordering and that
I can reorder them as I wish.  I'm not sure whether that's common for qemu
users; I would expect so, but I may have missed something.

To my knowledge, the only "guest ABI" change is e.g. when we specify "vfio-pci"
before "intel-iommu": that is always broken before this patchset, while after
this series it works.  I just don't think that kind of "guest ABI" needs to be
kept, and that's exactly what I want to fix with the patchset.

> 
> How many device types in QEMU have non-default vmsd priority?

Not many; here's the list of priorities and the number of devices using each:

   |--------------------+---------|
   | priority           | devices |
   |--------------------+---------|
   | MIG_PRI_IOMMU      |       3 |
   | MIG_PRI_PCI_BUS    |       7 |
   | MIG_PRI_VIRTIO_MEM |       1 |
   | MIG_PRI_GICV3_ITS  |       1 |
   | MIG_PRI_GICV3      |       1 |
   |--------------------+---------|

All the remaining devices use the default (0) priority.

> 
> Can we at least ensure devices with the same priority won't be
> reordered, just to be safe?  (qsort() doesn't guarantee that)
> 
> If very few device types have non-default vmsd priority and
> devices with the same priority aren't reordered, the risk of
> compatibility breakage would be much smaller.

I'm also wondering whether it might actually be a good thing for this change
to break some guest ABI, if that is possible at all.

Let's imagine something breaks after this is applied; then the only reason
should be that qsort() changed the order of some same-priority devices, so it
is no longer the order the user specified.  Doesn't that also mean there's yet
another ordering requirement that we didn't even notice?

I doubt that'll even happen (otherwise I think there would be reports already,
as the qemu man page states no requirement on parameter ordering).  In any
case, instead of "keeping same-priority devices in the order the user
specified", IMHO we should give the broken devices different priorities so the
ordering is guaranteed by qemu internally, rather than by how the user
specified it.

From that pov, maybe this patchset would be best accepted and applied early
in a release cycle?  That way we can figure out what's missing and fix it
within the same release.  However, again, I still doubt there's any user that
will break in a bad way.

Thanks,

-- 
Peter Xu




Re: [RFC PATCH v2 0/5] physmem: Have flaview API check bus permission from MemTxAttrs argument

2021-08-23 Thread Peter Maydell
On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé  wrote:
>
> This series aim to kill a recent class of bug, the infamous
> "DMA reentrancy" issues found by Alexander while fuzzing.
>
> Introduce the 'bus_perm' field in MemTxAttrs, defining 3 bits:
>
> - MEMTXPERM_UNSPECIFIED (current default, unchanged behavior)
> - MEMTXPERM_UNRESTRICTED (allow list approach)
> - MEMTXPERM_RAM_DEVICE (example of deny list approach)
>
> If a transaction permission is not allowed (for example access
> to non-RAM device), we return the specific MEMTX_BUS_ERROR.
>
> Permissions are checked in after the flatview is resolved, and
> before the access is done, in a new function: flatview_access_allowed().

So I'm not going to say 'no' to this, because we have a real
recursive-device-handling problem and I don't have a better
idea to hand, but the thing about this is that we end up with
behaviour which is not what the real hardware does. I'm not
aware of any DMA device which has this kind of "can only DMA
to/from RAM, and aborts on access to a device" behaviour...

-- PMM



Re: [RFC PATCH v2 5/5] softmmu/physmem: Have flaview API check MemTxAttrs::bus_perm field

2021-08-23 Thread David Hildenbrand

On 23.08.21 18:41, Philippe Mathieu-Daudé wrote:

Check bus permission in flatview_access_allowed() before
running any bus transaction.

There is not change for the default case (MEMTXPERM_UNSPECIFIED).


s/not/no/



The MEMTXPERM_UNRESTRICTED case works as an allow list. Devices
using it won't be checked by flatview_access_allowed().


Well, and MEMTXPERM_UNSPECIFIED. Another indication that the split 
should better be avoided.




The only deny list equivalent is MEMTXPERM_RAM_DEVICE: devices
using this flag will reject transactions and set the optional
MemTxResult to MEMTX_BUS_ERROR.

Signed-off-by: Philippe Mathieu-Daudé 
---
  softmmu/physmem.c | 17 -
  1 file changed, 16 insertions(+), 1 deletion(-)

diff --git a/softmmu/physmem.c b/softmmu/physmem.c
index 0d31a2f4199..329542dba75 100644
--- a/softmmu/physmem.c
+++ b/softmmu/physmem.c
@@ -2772,7 +2772,22 @@ static inline bool flatview_access_allowed(MemoryRegion 
*mr, MemTxAttrs attrs,
 hwaddr addr, hwaddr len,
 MemTxResult *result)
  {
-return true;
+if (unlikely(attrs.bus_perm == MEMTXPERM_RAM_DEVICE)) {
+if (memory_region_is_ram(mr) || memory_region_is_ram_device(mr)) {
+return true;
+}


I'm a bit confused why it's called MEMTXPERM_RAM_DEVICE, yet we also 
allow access to !memory_region_is_ram_device(mr).


Can we find a more expressive name?

Also, I wonder if we'd actually want to have "flags" instead of pure 
values. Like having


#define MEMTXPERM_RAM           1
#define MEMTXPERM_RAM_DEVICE    2

and map them cleanly to the two similar, but different types of memory 
backends.
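
To illustrate the "flags" idea, here is a small Python model of a
permission check where each bit names one kind of memory backend. This is
only a sketch of the suggestion: the class, the region-kind strings and the
treatment of 0 as "no restriction" are assumptions, not code from the series.

```python
from enum import IntFlag


class MemTxPerm(IntFlag):
    """Hypothetical flag encoding: one bit per permitted backend type."""
    UNRESTRICTED = 0
    RAM = 1
    RAM_DEVICE = 2


# Which flag a region kind requires; anything else (e.g. MMIO) has no flag.
_KIND_TO_FLAG = {
    "ram": MemTxPerm.RAM,
    "ram_device": MemTxPerm.RAM_DEVICE,
}


def access_allowed(perm: MemTxPerm, region_kind: str) -> bool:
    """Return True if a transaction with 'perm' may touch 'region_kind'."""
    if perm == MemTxPerm.UNRESTRICTED:
        return True                    # no bits set: nothing is restricted
    required = _KIND_TO_FLAG.get(region_kind)
    if required is None:
        return False                   # restricted access to a non-RAM region
    return bool(perm & required)
```

With this encoding, a device that may touch both backend types would pass
MemTxPerm.RAM | MemTxPerm.RAM_DEVICE, which matches what the patch currently
permits under the single MEMTXPERM_RAM_DEVICE value.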




+qemu_log_mask(LOG_GUEST_ERROR,
+  "Invalid access to non-RAM device at "
+  "addr 0x%" HWADDR_PRIX ", size %" HWADDR_PRIu ", "
+  "region '%s'\n", addr, len, memory_region_name(mr));
+if (result) {
+*result |= MEMTX_BUS_ERROR;
+}
+return false;
+} else {
+/* MEMTXPERM_UNRESTRICTED and MEMTXPERM_UNSPECIFIED cases */
+return true;
+}
  }
  
  /* Called within RCU critical section.  */




Do we have any target user examples / prototypes?

--
Thanks,

David / dhildenb




Re: [RFC PATCH v2 2/5] hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR

2021-08-23 Thread Peter Maydell
On Mon, 23 Aug 2021 at 17:42, Philippe Mathieu-Daudé  wrote:
>
> We are going to introduce more MemTxResult bits, so it is
> safer to check for !MEMTX_OK rather than MEMTX_ERROR.
>
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Maydell 

but note that these MEMTX_* aren't from the memory transaction
API functions; they're just being used by gicd_readl() and
friends as a way to indicate a success/failure so that the
actual MemoryRegionOps read/write fns like gicv3_dist_read()
can log a guest error. Arguably this is a bit of a misuse of
the MEMTX_* constants and perhaps we should have gicd_readl etc
return a bool instead.
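
The motivation in the commit message can be shown with a tiny Python model
of the result bits. MEMTX_OK, MEMTX_ERROR and MEMTX_DECODE_ERROR use the
values from QEMU's memattrs.h; the bit chosen for MEMTX_BUS_ERROR below is an
assumption, since the series defines it elsewhere. The two helper names are
invented.

```python
MEMTX_OK = 0
MEMTX_ERROR = 1 << 0
MEMTX_DECODE_ERROR = 1 << 1
MEMTX_BUS_ERROR = 1 << 2   # assumed bit position, for illustration only


def failed_eq_error(result: int) -> bool:
    """Check of the form 'r == MEMTX_ERROR': misses other failure bits."""
    return result == MEMTX_ERROR


def failed_not_ok(result: int) -> bool:
    """Check of the form 'r != MEMTX_OK': catches any failure bit."""
    return result != MEMTX_OK
```

A bool return from gicd_readl() and friends, as suggested above, would
sidestep the question entirely because there would be no bits to compare.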

thanks
-- PMM



Re: [RFC PATCH v2 3/5] exec/memattrs: Introduce MemTxAttrs::bus_perm field

2021-08-23 Thread David Hildenbrand

On 23.08.21 20:41, Peter Xu wrote:

On Mon, Aug 23, 2021 at 06:41:55PM +0200, Philippe Mathieu-Daudé wrote:

+/* Permission to restrict bus memory accesses. See MemTxAttrs::bus_perm */
+enum {
+MEMTXPERM_UNSPECIFIED   = 0,
+MEMTXPERM_UNRESTRICTED  = 1,
+MEMTXPERM_RAM_DEVICE= 2,
+};


Is there a difference between UNSPECIFIED and UNRESTRICTED?

If no, should we merge them?



I'd assume MEMTXPERM_UNSPECIFIED has to be treated like 
MEMTXPERM_UNRESTRICTED, so I'd also think we should just squash them.


--
Thanks,

David / dhildenb




Re: [RFC PATCH v2 4/5] softmmu/physmem: Introduce flatview_access_allowed() to check bus perms

2021-08-23 Thread David Hildenbrand

On 23.08.21 20:43, Peter Xu wrote:

On Mon, Aug 23, 2021 at 06:41:56PM +0200, Philippe Mathieu-Daudé wrote:

Introduce flatview_access_allowed() to check bus permission
before running any bus transaction. For now this is a simple
stub.

Signed-off-by: Philippe Mathieu-Daudé 


Shall we squash this patch into the next one?  It helps explain better on why
it's needed.  Thanks,



I'd even go one step further and squash 3-5 into a single one.

--
Thanks,

David / dhildenb




Re: [RFC PATCH v2 2/5] hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR

2021-08-23 Thread David Hildenbrand

On 23.08.21 18:41, Philippe Mathieu-Daudé wrote:

We are going to introduce more MemTxResult bits, so it is
safer to check for !MEMTX_OK rather than MEMTX_ERROR.

Signed-off-by: Philippe Mathieu-Daudé 
---


Reviewed-by: David Hildenbrand 

--
Thanks,

David / dhildenb




Re: [PATCH v5 2/5] python/aqmp-tui: Add AQMP TUI

2021-08-23 Thread John Snow
On Mon, Aug 23, 2021 at 12:31 PM G S Niteesh Babu 
wrote:

> Added AQMP TUI.
>
> Implements the follwing basic features:
> 1) Command transmission/reception.
> 2) Shows events asynchronously.
> 3) Shows server status in the bottom status bar.
> 4) Automatic retries on disconnects and error conditions.
>
> Also added type annotations and necessary pylint/mypy configurations.
>
> Signed-off-by: G S Niteesh Babu 
> ---
>  python/qemu/aqmp/aqmp_tui.py | 637 +++
>  python/setup.cfg |  13 +-
>  2 files changed, 649 insertions(+), 1 deletion(-)
>  create mode 100644 python/qemu/aqmp/aqmp_tui.py
>
> diff --git a/python/qemu/aqmp/aqmp_tui.py b/python/qemu/aqmp/aqmp_tui.py
> new file mode 100644
> index 00..d3180e38bf
> --- /dev/null
> +++ b/python/qemu/aqmp/aqmp_tui.py
> @@ -0,0 +1,637 @@
> +# Copyright (c) 2021
> +#
> +# Authors:
> +#  Niteesh Babu G S 
> +#
> +# This work is licensed under the terms of the GNU GPL, version 2 or
> +# later.  See the COPYING file in the top-level directory.
> +"""
> +AQMP TUI
> +
> +AQMP TUI is an asynchronous interface built on top the of the AQMP
> library.
> +It is the successor of QMP-shell and is bought-in as a replacement for it.
> +
> +Example Usage: aqmp-tui 
> +Full Usage: aqmp-tui --help
> +"""
> +
> +import argparse
> +import asyncio
> +import logging
> +from logging import Handler, LogRecord
> +import signal
> +from typing import (
> +List,
> +Optional,
> +Tuple,
> +Type,
> +Union,
> +cast,
> +)
> +
> +import urwid
> +import urwid_readline
> +
> +from ..qmp import QEMUMonitorProtocol, QMPBadPortError
> +from .error import ProtocolError
> +from .message import DeserializationError, Message, UnexpectedTypeError
> +from .protocol import ConnectError, Runstate
> +from .qmp_client import ExecInterruptedError, QMPClient
> +from .util import create_task, pretty_traceback
> +
> +
> +# The name of the signal that is used to update the history list
> +UPDATE_MSG: str = 'UPDATE_MSG'
> +
> +
> +def format_json(msg: str) -> str:
> +"""
> +Formats given multi-line JSON message into a single-line message.
> +Converting into single line is more asthetically pleasing when looking
> +along with error messages.
> +
> +Eg:
> +Input:
> +  [ 1,
> +true,
> +3 ]
> +The above input is not a valid QMP message and produces the following
> error
> +"QMP message is not a JSON object."
> +When displaying this in TUI in multiline mode we get
> +
> +[ 1,
> +  true,
> +  3 ]: QMP message is not a JSON object.
> +
> +whereas in singleline mode we get the following
> +
> +[1, true, 3]: QMP message is not a JSON object.
> +
> +The single line mode is more asthetically pleasing.
> +
> +:param msg:
> +The message to formatted into single line.
> +
> +:return: Formatted singleline message.
> +
> +NOTE: We cannot use the JSON module here because it is only capable of
> +format valid JSON messages. But here the goal is to also format
> invalid
> +JSON messages.
> +"""
> +msg = msg.replace('\n', '')
> +words = msg.split(' ')
> +words = [word for word in words if word != '']
>

try list(filter(None, words)) -- it's a little easier to read.
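
For reference, a minimal sketch of format_json() with that suggestion
applied; the behaviour should be the same as the quoted version, only the
filtering step changes.

```python
def format_json(msg: str) -> str:
    """Collapse a multi-line (possibly invalid) JSON message to one line."""
    msg = msg.replace('\n', '')
    # filter(None, ...) drops the empty strings left by runs of spaces.
    words = list(filter(None, msg.split(' ')))
    return ' '.join(words)
```

Note that spaces just inside brackets survive, so the docstring's example
input comes out as "[ 1, true, 3 ]" rather than "[1, true, 3]".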


> +return ' '.join(words)
> +
> +
> +def has_tui_handler(logger: logging.Logger,
> +handler_type: Type[Handler]) -> bool:
>

maybe has_handler_type(...), since you wrote something a bit more generic
than just checking for the TUI handler.


> +"""
> +The Logger class has no interface to check if a certain type of
> handler is
> +installed or not. So we provide an interface to do so.
> +
> +:param logger:
> +Logger object
> +:param handler_type:
> +The type of the handler to be checked.
> +
> +:return: returns True if handler of type `handler_type` is installed
> else
> + False.
>

If you wanted to fit this on one line, the "else False" is implied and
could be omitted.


> +"""
> +handlers = logger.handlers
> +for handler in handlers:
>

You could combine these lines if you wanted: for handler in
logger.handlers: ...
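
Taken together with the rename suggested earlier, the helper could look
roughly like this. It is a sketch: has_handler_type is the proposed name, not
something already in the tree, and using any() is just one way to write the
loop.

```python
import logging
from logging import Handler
from typing import Type


def has_handler_type(logger: logging.Logger,
                     handler_type: Type[Handler]) -> bool:
    """
    Check whether a handler of the given type is installed on a logger.

    :param logger: Logger object.
    :param handler_type: The type of the handler to be checked.

    :return: True if a handler of type `handler_type` is installed.
    """
    return any(isinstance(handler, handler_type)
               for handler in logger.handlers)
```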


> +if isinstance(handler, handler_type):
> +return True
> +return False
> +
> +
> +class App(QMPClient):
> +"""
> +Implements the AQMP TUI.
> +
> +Initializes the widgets and starts the urwid event loop.
> +"""
> +def __init__(self, address: Union[str, Tuple[str, int]], num_retries:
> int,
> + retry_delay: Optional[int]) -> None:
> +"""
> +Initializes the TUI.
> +
> +:param address:
> +Address of the server to connect to.
> +:param num_retries:
> +The number of times to retry before stopping to reconnect.
> +:param retry_delay:
> +The delay(sec) before each retry
> +"""
>

Here and elsewhere, the init 

Re: [RFC PATCH v2 1/5] softmmu/physmem: Simplify flatview_write and address_space_access_valid

2021-08-23 Thread David Hildenbrand

On 23.08.21 18:41, Philippe Mathieu-Daudé wrote:

Remove unuseful local 'result' variables.

Signed-off-by: Philippe Mathieu-Daudé 
---


Reviewed-by: David Hildenbrand 


--
Thanks,

David / dhildenb




Re: [RFC PATCH v2 2/5] hw/intc/arm_gicv3: Check for !MEMTX_OK instead of MEMTX_ERROR

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:41:54PM +0200, Philippe Mathieu-Daudé wrote:
> We are going to introduce more MemTxResult bits, so it is
> safer to check for !MEMTX_OK rather than MEMTX_ERROR.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Xu 

-- 
Peter Xu




Re: [PATCH 4/4] vl: Prioritize realizations of devices

2021-08-23 Thread Eduardo Habkost
On Wed, Aug 18, 2021 at 03:43:18PM -0400, Peter Xu wrote:
> QEMU creates -device objects in order as specified by the user's cmdline.
> However that ordering may not be the ideal order.  For example, some platform
> devices (vIOMMUs) may want to be created earlier than most of the other
> devices (e.g., vfio-pci, virtio).
> 
> This patch sorts the QemuOptsList of '-device's before kicking off the device
> realizations.  This allows the device realization code to use APIs like
> pci_device_iommu_address_space() correctly, because those functions rely on
> the platform devices being realized.
> 
> For now we rely on vmsd->priority, which is defined as MigrationPriority, to
> provide the ordering, as both VM init and migration completion need such an
> ordering.  In the future we can move that priority information out of vmsd.
> 
> Signed-off-by: Peter Xu 

Can we be 100% sure that changing the ordering of every single
device being created won't affect guest ABI?  (I don't think we can)

How many device types in QEMU have non-default vmsd priority?

Can we at least ensure devices with the same priority won't be
reordered, just to be safe?  (qsort() doesn't guarantee that)

If very few device types have non-default vmsd priority and
devices with the same priority aren't reordered, the risk of
compatibility breakage would be much smaller.
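
To illustrate the last point, here is one possible way to get a stable ordering out of qsort(): record the original position and use it as a tie-breaker. This is only a sketch; the DevEntry type, its fields, and the "higher priority goes first" convention are assumptions, not what the patch does:

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical record: one entry per -device option. */
typedef struct {
    const char *name;
    int priority;   /* assumed: higher priority is realized earlier */
    size_t seq;     /* original command-line position */
} DevEntry;

static int dev_entry_cmp(const void *a, const void *b)
{
    const DevEntry *x = a, *y = b;

    if (x->priority != y->priority) {
        return x->priority > y->priority ? -1 : 1;
    }
    /* Tie-break on original position so equal priorities keep their order. */
    return (x->seq > y->seq) - (x->seq < y->seq);
}

static void sort_devices(DevEntry *devs, size_t n)
{
    /* Remember where each entry started before qsort() shuffles them. */
    for (size_t i = 0; i < n; i++) {
        devs[i].seq = i;
    }
    qsort(devs, n, sizeof(devs[0]), dev_entry_cmp);
}
```

Because no two entries ever compare equal, the result no longer depends on how the libc implements qsort().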

-- 
Eduardo




[PATCH v5 14/14] disas/riscv: Add Zb[abcs] instructions

2021-08-23 Thread Philipp Tomsich
With the addition of Zb[abcs], we also need to add disassembler
support for these new instructions.

Signed-off-by: Philipp Tomsich 

---

(no changes since v2)

Changes in v2:
- Fix missing ';' from last-minute whitespace cleanups.
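
As an aside for reviewers: the opcode enum and the name table are matched purely by position, so a duplicated or swapped row silently mislabels an instruction. A toy sketch (hypothetical names, not the layout used in disas/riscv.c) of how designated initializers and a size check can make that harder to get wrong:

```c
#include <stddef.h>

/* Toy subset; the real enum and table are much larger. */
typedef enum {
    op_illegal = 0,
    op_clzw = 1,
    op_ctzw = 2,
    op_cpopw = 3,
    op_count
} toy_op;

/* Each row names its enumerator, so the order in the source is irrelevant. */
static const char *const toy_opcode_names[] = {
    [op_illegal] = "illegal",
    [op_clzw] = "clzw",
    [op_ctzw] = "ctzw",
    [op_cpopw] = "cpopw",
};

/* Fails to compile if the table length does not match the enum. */
_Static_assert(sizeof(toy_opcode_names) / sizeof(toy_opcode_names[0]) == op_count,
               "opcode table and enum are out of sync");

static const char *toy_op_name(toy_op op)
{
    return (size_t)op < op_count ? toy_opcode_names[op] : "illegal";
}
```

The size check only catches a missing trailing entry; a gap in the middle would still show up as a NULL name, so this is a mitigation rather than a guarantee.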

 disas/riscv.c | 157 +-
 1 file changed, 154 insertions(+), 3 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 278d9be924..793ad14c27 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -478,6 +478,49 @@ typedef enum {
 rv_op_fsflags = 316,
 rv_op_fsrmi = 317,
 rv_op_fsflagsi = 318,
+rv_op_bseti = 319,
+rv_op_bclri = 320,
+rv_op_binvi = 321,
+rv_op_bexti = 322,
+rv_op_rori = 323,
+rv_op_clz = 324,
+rv_op_ctz = 325,
+rv_op_cpop = 326,
+rv_op_sext_h = 327,
+rv_op_sext_b = 328,
+rv_op_xnor = 329,
+rv_op_orn = 330,
+rv_op_andn = 331,
+rv_op_rol = 332,
+rv_op_ror = 333,
+rv_op_sh1add = 334,
+rv_op_sh2add = 335,
+rv_op_sh3add = 336,
+rv_op_sh1add_uw = 337,
+rv_op_sh2add_uw = 338,
+rv_op_sh3add_uw = 339,
+rv_op_clmul = 340,
+rv_op_clmulr = 341,
+rv_op_clmulh = 342,
+rv_op_min = 343,
+rv_op_minu = 344,
+rv_op_max = 345,
+rv_op_maxu = 346,
+rv_op_clzw = 347,
+rv_op_ctzw = 348,
+rv_op_cpopw = 349,
+rv_op_slli_uw = 350,
+rv_op_add_uw = 351,
+rv_op_rolw = 352,
+rv_op_rorw = 353,
+rv_op_rev8 = 354,
+rv_op_zext_h = 355,
+rv_op_roriw = 356,
+rv_op_orc_b = 357,
+rv_op_bset = 358,
+rv_op_bclr = 359,
+rv_op_binv = 360,
+rv_op_bext = 361,
 } rv_op;
 
 /* structures */
@@ -1117,6 +1160,49 @@ const rv_opcode_data opcode_data[] = {
 { "fsflags", rv_codec_i_csr, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
 { "fsrmi", rv_codec_i_csr, rv_fmt_rd_zimm, NULL, 0, 0, 0 },
 { "fsflagsi", rv_codec_i_csr, rv_fmt_rd_zimm, NULL, 0, 0, 0 },
+{ "bseti", rv_codec_i_sh7, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "bclri", rv_codec_i_sh7, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "binvi", rv_codec_i_sh7, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "bexti", rv_codec_i_sh7, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "rori", rv_codec_i_sh7, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "clz", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "ctz", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "cpop", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "sext.h", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "sext.b", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "xnor", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "orn", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "andn", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "rol", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "ror", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh1add", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh2add", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh3add", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh1add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh2add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "sh3add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "clmul", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "clmulr", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "clmulh", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "min", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "minu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "max", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "maxu", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "clzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "ctzw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "cpopw", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "slli.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "add.uw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "rolw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "rorw", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "rev8", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "zext.h", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "roriw", rv_codec_i_sh5, rv_fmt_rd_rs1_imm, NULL, 0, 0, 0 },
+{ "orc.b", rv_codec_r, rv_fmt_rd_rs1, NULL, 0, 0, 0 },
+{ "bset", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "bclr", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "binv", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
+{ "bext", rv_codec_r, rv_fmt_rd_rs1_rs2, NULL, 0, 0, 0 },
 };
 
 /* CSR names */
@@ -1507,7 +1593,20 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 case 0: op = rv_op_addi; break;
 case 1:
 switch (((inst >> 27) & 0b11111)) {
-case 0: op = rv_op_slli; break;
+case 0b00000: op = rv_op_slli; break;
+case 0b00101: op = rv_op_bseti; 

Re: [RFC PATCH v2 5/5] softmmu/physmem: Have flatview API check MemTxAttrs::bus_perm field

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:41:57PM +0200, Philippe Mathieu-Daudé wrote:
> @@ -2772,7 +2772,22 @@ static inline bool flatview_access_allowed(MemoryRegion *mr, MemTxAttrs attrs,
> hwaddr addr, hwaddr len,
> MemTxResult *result)
>  {
> -return true;
> +if (unlikely(attrs.bus_perm == MEMTXPERM_RAM_DEVICE)) {
> +if (memory_region_is_ram(mr) || memory_region_is_ram_device(mr)) {

memory_region_is_ram() should be enough ("ram_device" is only set if "ram" is
set)?  Thanks,

-- 
Peter Xu




[PATCH v5 09/14] target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci

2021-08-23 Thread Philipp Tomsich
The 1.0.0 version of Zbb does not contain gorc/gorci.  Instead, an
orc.b instruction (equivalent to the orc.b pseudo-instruction built on
gorci from pre-0.93 draft-B) is available, mainly targeting
string-processing workloads.

This commit adds the new orc.b instruction and removes gorc/gorci.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 

Change orc.b to implementation suggested by Richard Henderson

---

(no changes since v3)

Changes in v3:
- Moved orc.b and gorc/gorci changes into separate commit.
- Using the simpler orc.b implementation suggested by Richard Henderson
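
For reference when testing, a byte-at-a-time model of orc.b: each byte of the result is 0xff if any bit of the corresponding source byte is set, and 0x00 otherwise. This is a plain C sketch for a 64-bit register (the function name is made up), not the TCG sequence used in the patch:

```c
#include <stdint.h>

/*
 * Reference model of orc.b on a 64-bit register: every non-zero byte
 * becomes 0xff, every zero byte stays 0x00.
 */
static uint64_t orc_b_ref(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        uint64_t byte = (x >> (8 * i)) & 0xff;

        if (byte != 0) {
            r |= (uint64_t)0xff << (8 * i);
        }
    }
    return r;
}
```

A slow model like this is handy for cross-checking a bit-parallel implementation. Inputs where a 0x01 byte sits directly above a zero byte (e.g. 0x0100) are worth including, since subtract-based zero-byte tricks can propagate a borrow into the neighbouring byte.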

 target/riscv/bitmanip_helper.c  | 26 --
 target/riscv/helper.h   |  2 --
 target/riscv/insn32.decode  |  6 +
 target/riscv/insn_trans/trans_rvb.c.inc | 35 +++--
 target/riscv/translate.c|  6 -
 5 files changed, 16 insertions(+), 59 deletions(-)

diff --git a/target/riscv/bitmanip_helper.c b/target/riscv/bitmanip_helper.c
index 73be5a81c7..bb48388fcd 100644
--- a/target/riscv/bitmanip_helper.c
+++ b/target/riscv/bitmanip_helper.c
@@ -64,32 +64,6 @@ target_ulong HELPER(grevw)(target_ulong rs1, target_ulong rs2)
 return do_grev(rs1, rs2, 32);
 }
 
-static target_ulong do_gorc(target_ulong rs1,
-target_ulong rs2,
-int bits)
-{
-target_ulong x = rs1;
-int i, shift;
-
-for (i = 0, shift = 1; shift < bits; i++, shift <<= 1) {
-if (rs2 & shift) {
-x |= do_swap(x, adjacent_masks[i], shift);
-}
-}
-
-return x;
-}
-
-target_ulong HELPER(gorc)(target_ulong rs1, target_ulong rs2)
-{
-return do_gorc(rs1, rs2, TARGET_LONG_BITS);
-}
-
-target_ulong HELPER(gorcw)(target_ulong rs1, target_ulong rs2)
-{
-return do_gorc(rs1, rs2, 32);
-}
-
 target_ulong HELPER(clmul)(target_ulong rs1, target_ulong rs2)
 {
 target_ulong result = 0;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index c559c860a7..80561e8866 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -61,8 +61,6 @@ DEF_HELPER_FLAGS_1(fclass_d, TCG_CALL_NO_RWG_SE, tl, i64)
 /* Bitmanip */
 DEF_HELPER_FLAGS_2(grev, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(grevw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_2(gorc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_2(gorcw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmul, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index faa56836d8..8bcb602455 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -680,6 +680,7 @@ max101 .. 110 . 0110011 @r
 maxu   101 .. 111 . 0110011 @r
 min101 .. 100 . 0110011 @r
 minu   101 .. 101 . 0110011 @r
+orc_b  001010 000111 . 101 . 0010011 @r2
 orn010 .. 110 . 0110011 @r
 rol011 .. 001 . 0110011 @r
 ror011 .. 101 . 0110011 @r
@@ -701,19 +702,14 @@ pack   100 .. 100 . 0110011 @r
 packu  0100100 .. 100 . 0110011 @r
 packh  100 .. 111 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
-gorc   0010100 .. 101 . 0110011 @r
-
 grevi  01101. ... 101 . 0010011 @sh
-gorci  00101. ... 101 . 0010011 @sh
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
-gorcw  0010100 .. 101 . 0111011 @r
 
 greviw 0110100 .. 101 . 0011011 @sh5
-gorciw 0010100 .. 101 . 0011011 @sh5
 
 # *** RV32 Zbc Standard Extension ***
 clmul  101 .. 001 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index fcdf8a2b90..93f726e174 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -215,18 +215,27 @@ static bool trans_grevi(DisasContext *ctx, arg_grevi *a)
 return gen_grevi(ctx, a);
 }
 
-static bool trans_gorc(DisasContext *ctx, arg_gorc *a)
+static void gen_orc_b(TCGv ret, TCGv source1)
 {
-REQUIRE_EXT(ctx, RVB);
-return gen_shift(ctx, a, gen_helper_gorc);
+TCGv  tmp = tcg_temp_new();
+
+/* Set msb in each byte if the byte was zero. */
+tcg_gen_subi_tl(tmp, source1, dup_const(MO_8, 0x01));
+tcg_gen_andc_tl(tmp, tmp, source1);
+tcg_gen_andi_tl(tmp, tmp, dup_const(MO_8, 0x80));
+
+/* Replicate the msb of each byte across the byte. */
+tcg_gen_shri_tl(tmp, tmp, 7);
+tcg_gen_muli_tl(ret, tmp, 0xff);
 }
 
-static bool trans_gorci(DisasContext *ctx, arg_gorci *a)
+static bool 

Re: [RFC PATCH v2 1/5] softmmu/physmem: Simplify flatview_write and address_space_access_valid

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:41:53PM +0200, Philippe Mathieu-Daudé wrote:
> Remove unneeded local 'result' variables.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Reviewed-by: Peter Xu 

-- 
Peter Xu




[PATCH v5 08/14] target/riscv: Reassign instructions to the Zbb-extension

2021-08-23 Thread Philipp Tomsich
This reassigns the instructions that are part of Zbb into it, with the
notable exceptions of the instructions (rev8, zext.w and orc.b) that
changed due to gorci, grevi and pack not being part of Zb[abcs].

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- The changes to the Zbb instructions (i.e. use the REQUIRE_ZBB macro)
  are now in a separate commit.
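
The gating pattern used by the REQUIRE_* macros can be sketched in isolation as follows. The Toy* types and names are simplified, hypothetical stand-ins for the CPU config and DisasContext; the assumption is that a translator returning false leads to the instruction being treated as illegal:

```c
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the CPU config and context. */
typedef struct { bool ext_zbb; } ToyCfg;
typedef struct { ToyCfg cfg; int emitted; } ToyCtx;

/* Returns false from the *calling* function when the extension is off. */
#define TOY_REQUIRE_ZBB(ctx) do {   \
    if (!(ctx)->cfg.ext_zbb) {      \
        return false;               \
    }                               \
} while (0)

static bool toy_trans_clz(ToyCtx *ctx)
{
    TOY_REQUIRE_ZBB(ctx);
    ctx->emitted++;     /* stands in for generating the TCG ops */
    return true;
}
```

The do { ... } while (0) wrapper makes the macro behave like a single statement, so it stays safe inside an unbraced if/else.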

 target/riscv/insn32.decode  | 40 ++--
 target/riscv/insn_trans/trans_rvb.c.inc | 50 ++---
 2 files changed, 49 insertions(+), 41 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0471c8..faa56836d8 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -671,45 +671,47 @@ sh2add_uw  001 .. 100 . 0111011 @r
 sh3add_uw  001 .. 110 . 0111011 @r
 slli_uw1  001 . 0011011 @sh
 
-# *** RV32B Standard Extension ***
+# *** RV32 Zbb Standard Extension ***
+andn   010 .. 111 . 0110011 @r
 clz011000 00 . 001 . 0010011 @r2
-ctz011000 01 . 001 . 0010011 @r2
 cpop   011000 10 . 001 . 0010011 @r2
+ctz011000 01 . 001 . 0010011 @r2
+max101 .. 110 . 0110011 @r
+maxu   101 .. 111 . 0110011 @r
+min101 .. 100 . 0110011 @r
+minu   101 .. 101 . 0110011 @r
+orn010 .. 110 . 0110011 @r
+rol011 .. 001 . 0110011 @r
+ror011 .. 101 . 0110011 @r
+rori   01100  101 . 0010011 @sh
 sext_b 011000 000100 . 001 . 0010011 @r2
 sext_h 011000 000101 . 001 . 0010011 @r2
-
-andn   010 .. 111 . 0110011 @r
-orn010 .. 110 . 0110011 @r
 xnor   010 .. 100 . 0110011 @r
+
+# *** RV64 Zbb Standard Extension (in addition to RV32 Zbb) ***
+clzw   011 0 . 001 . 0011011 @r2
+ctzw   011 1 . 001 . 0011011 @r2
+cpopw  011 00010 . 001 . 0011011 @r2
+rolw   011 .. 001 . 0111011 @r
+roriw  011 .. 101 . 0011011 @sh5
+rorw   011 .. 101 . 0111011 @r
+
+# *** RV32B Standard Extension ***
 pack   100 .. 100 . 0110011 @r
 packu  0100100 .. 100 . 0110011 @r
 packh  100 .. 111 . 0110011 @r
-min101 .. 100 . 0110011 @r
-minu   101 .. 101 . 0110011 @r
-max101 .. 110 . 0110011 @r
-maxu   101 .. 111 . 0110011 @r
-ror011 .. 101 . 0110011 @r
-rol011 .. 001 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
 gorc   0010100 .. 101 . 0110011 @r
 
-rori   01100. ... 101 . 0010011 @sh
 grevi  01101. ... 101 . 0010011 @sh
 gorci  00101. ... 101 . 0010011 @sh
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
-clzw   011 0 . 001 . 0011011 @r2
-ctzw   011 1 . 001 . 0011011 @r2
-cpopw  011 00010 . 001 . 0011011 @r2
-
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-rorw   011 .. 101 . 0111011 @r
-rolw   011 .. 001 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
 gorcw  0010100 .. 101 . 0111011 @r
 
-roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 89700117c4..fcdf8a2b90 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -1,5 +1,5 @@
 /*
- * RISC-V translation routines for the Zb[acs] Standard Extension.
+ * RISC-V translation routines for the Zb[abcs] Standard Extension.
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
@@ -24,6 +24,12 @@
 }\
 } while (0)
 
+#define REQUIRE_ZBB(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_zbb) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_ZBC(ctx) do {\
 if (!RISCV_CPU(ctx->cs)->cfg.ext_zbc) {  \
 return false;\
@@ -38,37 +44,37 @@
 
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBB(ctx);
 return gen_unary(ctx, a, gen_clz);
 }
 
 static 

Re: [RFC PATCH v2 4/5] softmmu/physmem: Introduce flatview_access_allowed() to check bus perms

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:41:56PM +0200, Philippe Mathieu-Daudé wrote:
> Introduce flatview_access_allowed() to check bus permission
> before running any bus transaction. For now this is a simple
> stub.
> 
> Signed-off-by: Philippe Mathieu-Daudé 

Shall we squash this patch into the next one?  It helps explain better on why
it's needed.  Thanks,

-- 
Peter Xu




[PATCH v5 11/14] target/riscv: Add rev8 instruction, removing grev/grevi

2021-08-23 Thread Philipp Tomsich
The 1.0.0 version of Zbb does not contain grev/grevi.  Instead, a
rev8 instruction (equivalent to the rev8 pseudo-instruction built on
grevi from pre-0.93 draft-B) is available.

This commit adds the new rev8 instruction and removes grev/grevi.

Note that there is no W-form of this instruction (both a
sign-extending and zero-extending 32-bit version can easily be
synthesized by following rev8 with either a srai or srli instruction
on RV64) and that the opcode encodings for rev8 in RV32 and RV64 are
different.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v4)

Changes in v4:
- reorder trans_rev8* functions to be sequential
- rename rev8 to rev8_32 in decoder

Changes in v3:
- rev8-addition & grevi*-removal moved to a separate commit
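
To make the "no W-form needed" remark concrete, a small C model of rev8 on RV64 and of the two synthesized 32-bit variants (rev8 followed by srli or srai by 32). The function names are made up for illustration:

```c
#include <stdint.h>

/* Byte-reverse a 64-bit value (models rev8 on RV64). */
static uint64_t rev8_64(uint64_t x)
{
    uint64_t r = 0;

    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* rev8 followed by srli 32: byte-swapped low word, zero-extended. */
static uint64_t rev8_w_zext(uint64_t x)
{
    return rev8_64(x) >> 32;
}

/* rev8 followed by srai 32: byte-swapped low word, sign-extended. */
static uint64_t rev8_w_sext(uint64_t x)
{
    uint64_t lo = rev8_64(x) >> 32;

    /* Portable sign extension from bit 31 without signed shifts. */
    return (lo ^ 0x80000000u) - 0x80000000u;
}
```

After the full 64-bit reversal the swapped low word ends up in the upper half, which is why a single 32-bit right shift is enough to finish the job.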

 target/riscv/bitmanip_helper.c  | 40 -
 target/riscv/helper.h   |  2 --
 target/riscv/insn32.decode  | 12 
 target/riscv/insn_trans/trans_rvb.c.inc | 34 ++---
 target/riscv/translate.c| 28 -
 5 files changed, 16 insertions(+), 100 deletions(-)

diff --git a/target/riscv/bitmanip_helper.c b/target/riscv/bitmanip_helper.c
index bb48388fcd..f1b5e5549f 100644
--- a/target/riscv/bitmanip_helper.c
+++ b/target/riscv/bitmanip_helper.c
@@ -24,46 +24,6 @@
 #include "exec/helper-proto.h"
 #include "tcg/tcg.h"
 
-static const uint64_t adjacent_masks[] = {
-dup_const(MO_8, 0x55),
-dup_const(MO_8, 0x33),
-dup_const(MO_8, 0x0f),
-dup_const(MO_16, 0xff),
-dup_const(MO_32, 0xffff),
-UINT32_MAX
-};
-
-static inline target_ulong do_swap(target_ulong x, uint64_t mask, int shift)
-{
-return ((x & mask) << shift) | ((x & ~mask) >> shift);
-}
-
-static target_ulong do_grev(target_ulong rs1,
-target_ulong rs2,
-int bits)
-{
-target_ulong x = rs1;
-int i, shift;
-
-for (i = 0, shift = 1; shift < bits; i++, shift <<= 1) {
-if (rs2 & shift) {
-x = do_swap(x, adjacent_masks[i], shift);
-}
-}
-
-return x;
-}
-
-target_ulong HELPER(grev)(target_ulong rs1, target_ulong rs2)
-{
-return do_grev(rs1, rs2, TARGET_LONG_BITS);
-}
-
-target_ulong HELPER(grevw)(target_ulong rs1, target_ulong rs2)
-{
-return do_grev(rs1, rs2, 32);
-}
-
 target_ulong HELPER(clmul)(target_ulong rs1, target_ulong rs2)
 {
 target_ulong result = 0;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 80561e8866..ae2e94542c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -59,8 +59,6 @@ DEF_HELPER_FLAGS_2(fcvt_d_lu, TCG_CALL_NO_RWG, i64, env, tl)
 DEF_HELPER_FLAGS_1(fclass_d, TCG_CALL_NO_RWG_SE, tl, i64)
 
 /* Bitmanip */
-DEF_HELPER_FLAGS_2(grev, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_2(grevw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmul, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8bcb602455..017eb50a49 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -682,6 +682,9 @@ min101 .. 100 . 0110011 @r
 minu   101 .. 101 . 0110011 @r
 orc_b  001010 000111 . 101 . 0010011 @r2
 orn010 .. 110 . 0110011 @r
+# The encoding for rev8 differs between RV32 and RV64. 
+# rev8_32 denotes the RV32 variant.
+rev8_32011010 011000 . 101 . 0010011 @r2
 rol011 .. 001 . 0110011 @r
 ror011 .. 101 . 0110011 @r
 rori   01100  101 . 0010011 @sh
@@ -693,6 +696,10 @@ xnor   010 .. 100 . 0110011 @r
 clzw   011 0 . 001 . 0011011 @r2
 ctzw   011 1 . 001 . 0011011 @r2
 cpopw  011 00010 . 001 . 0011011 @r2
+# The encoding for rev8 differs between RV32 and RV64.
+# When executing on RV64, the encoding used in RV32 is an illegal
+# instruction, so we use different handler functions to differentiate.
+rev8_64011010 111000 . 101 . 0010011 @r2
 rolw   011 .. 001 . 0111011 @r
 roriw  011 .. 101 . 0011011 @sh5
 rorw   011 .. 101 . 0111011 @r
@@ -701,15 +708,10 @@ rorw   011 .. 101 . 0111011 @r
 pack   100 .. 100 . 0110011 @r
 packu  0100100 .. 100 . 0110011 @r
 packh  100 .. 111 . 0110011 @r
-grev   0110100 .. 101 . 0110011 @r
-grevi  01101. ... 101 . 0010011 @sh
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-grevw  0110100 .. 101 . 0111011 @r
-
-greviw 0110100 .. 101 . 0011011 @sh5
 
 # *** RV32 Zbc Standard Extension ***
 

[PATCH v5 12/14] target/riscv: Add zext.h instructions to Zbb, removing pack/packu/packh

2021-08-23 Thread Philipp Tomsich
The 1.0.0 version of Zbb does not contain pack/packu/packh. However, a
zext.h instruction (built on pack/packh from pre-0.93 draft-B) is
available.

This commit adds zext.h and removes the pack* instructions.

Note that the encodings for zext.h are different between RV32 and
RV64, which is handled through REQUIRE_32BIT.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v4)

Changes in v4:
- Renamed RV32 variant to zext_h_32.
- Reordered trans_zext_h_{32,64} to be next to each other.

Changes in v3:
- Moved zext.h-addition & pack*-removal to a separate commit.
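
A rough model of the relationship between zext.h and the old pack instruction, for RV32. pack32() below is modelled on the gen_pack() helper this patch removes (low half of rs1 kept, low half of rs2 deposited into the upper half); the function names are illustrative only:

```c
#include <stdint.h>

/* Draft-B style pack on RV32: rs2's low half goes above rs1's low half. */
static uint32_t pack32(uint32_t rs1, uint32_t rs2)
{
    return (rs1 & 0xffffu) | (rs2 << 16);
}

/* zext.h: zero-extend the least significant halfword. */
static uint32_t zext_h_32(uint32_t rs1)
{
    return rs1 & 0xffffu;
}

static uint64_t zext_h_64(uint64_t rs1)
{
    return rs1 & 0xffffu;
}
```

Packing with a zero second operand gives exactly the zero-extended halfword, which is consistent with zext.h reusing a pack-style encoding.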

 target/riscv/insn32.decode  | 12 ---
 target/riscv/insn_trans/trans_rvb.c.inc | 46 -
 target/riscv/translate.c| 40 -
 3 files changed, 21 insertions(+), 77 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 017eb50a49..abf794095a 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -691,6 +691,9 @@ rori   01100  101 . 0010011 @sh
 sext_b 011000 000100 . 001 . 0010011 @r2
 sext_h 011000 000101 . 001 . 0010011 @r2
 xnor   010 .. 100 . 0110011 @r
+# The encoding for zext.h differs between RV32 and RV64.
+# zext_h_32 denotes the RV32 variant.
+zext_h_32  100 0 . 100 . 0110011 @r2
 
 # *** RV64 Zbb Standard Extension (in addition to RV32 Zbb) ***
 clzw   011 0 . 001 . 0011011 @r2
@@ -703,15 +706,14 @@ rev8_64011010 111000 . 101 . 0010011 @r2
 rolw   011 .. 001 . 0111011 @r
 roriw  011 .. 101 . 0011011 @sh5
 rorw   011 .. 101 . 0111011 @r
+# The encoding for zext.h differs between RV32 and RV64.
+# When executing on RV64, the encoding used in RV32 is an illegal
+# instruction, so we use different handler functions to differentiate.
+zext_h_64  100 0 . 100 . 0111011 @r2
 
 # *** RV32B Standard Extension ***
-pack   100 .. 100 . 0110011 @r
-packu  0100100 .. 100 . 0110011 @r
-packh  100 .. 111 . 0110011 @r
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
-packw  100 .. 100 . 0111011 @r
-packuw 0100100 .. 100 . 0111011 @r
 
 # *** RV32 Zbc Standard Extension ***
 clmul  101 .. 001 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 57929025ea..899f3ecb85 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -78,24 +78,6 @@ static bool trans_xnor(DisasContext *ctx, arg_xnor *a)
 return gen_arith(ctx, a, tcg_gen_eqv_tl);
 }
 
-static bool trans_pack(DisasContext *ctx, arg_pack *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_pack);
-}
-
-static bool trans_packu(DisasContext *ctx, arg_packu *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_packu);
-}
-
-static bool trans_packh(DisasContext *ctx, arg_packh *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_packh);
-}
-
 static bool trans_min(DisasContext *ctx, arg_min *a)
 {
 REQUIRE_ZBB(ctx);
@@ -233,6 +215,20 @@ static bool trans_orc_b(DisasContext *ctx, arg_orc_b *a)
 return gen_unary(ctx, a, &gen_orc_b);
 }
 
+static bool trans_zext_h_32(DisasContext *ctx, arg_zext_h_32 *a)
+{
+REQUIRE_32BIT(ctx);
+REQUIRE_ZBB(ctx);
+return gen_unary(ctx, a, &tcg_gen_ext16u_tl);
+}
+
+static bool trans_zext_h_64(DisasContext *ctx, arg_zext_h_64 *a)
+{
+REQUIRE_64BIT(ctx);
+REQUIRE_ZBB(ctx);
+return gen_unary(ctx, a, &tcg_gen_ext16u_tl);
+}
+
 
 #define GEN_TRANS_SHADD(SHAMT) \
 static bool trans_sh##SHAMT##add(DisasContext *ctx, arg_sh##SHAMT##add *a) \
@@ -266,20 +262,6 @@ static bool trans_cpopw(DisasContext *ctx, arg_cpopw *a)
 return gen_unary(ctx, a, gen_cpopw);
 }
 
-static bool trans_packw(DisasContext *ctx, arg_packw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_packw);
-}
-
-static bool trans_packuw(DisasContext *ctx, arg_packuw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_arith(ctx, a, gen_packuw);
-}
-
 static bool trans_rorw(DisasContext *ctx, arg_rorw *a)
 {
 REQUIRE_64BIT(ctx);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index c2a1df2f01..0e2698bfb3 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -536,29 +536,6 @@ static bool gen_arith_div_uw(DisasContext *ctx, arg_r *a,
 return true;
 }
 
-static void gen_pack(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_deposit_tl(ret, arg1, arg2,
-   TARGET_LONG_BITS / 2,
-   TARGET_LONG_BITS / 2);
-}
-
-static void gen_packu(TCGv ret, TCGv arg1, TCGv arg2)
-{
-TCGv t = 

[PATCH v5 07/14] target/riscv: Add instructions of the Zbc-extension

2021-08-23 Thread Philipp Tomsich
The following instructions are part of Zbc:
 - clmul
 - clmulh
 - clmulr

Note that these instructions were already defined in the pre-0.93 and
the 0.93 draft-B proposals, but had been omitted in the earlier
addition of draft-B to QEMU.

Signed-off-by: Philipp Tomsich 
---

Changes in v5:
- Introduce gen_clmulh (as suggested by Richard H) and use to simplify
  trans_clmulh().

Changes in v3:
- This adds the Zbc instructions as a separate commit.
- Uses a helper for clmul/clmulr instead of inlining the calculation of
  the result (addressing a comment from Richard Henderson).
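
For anyone wanting to sanity-check the three variants, a standalone sketch of carry-less multiplication on fixed 64-bit operands. It follows the same shift-and-xor loops as the helpers in this patch, with clmulh written out as well; the *64 function names are made up:

```c
#include <stdint.h>

/* Carry-less multiply: low 64 bits of the 127-bit product. */
static uint64_t clmul64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r ^= a << i;
        }
    }
    return r;
}

/* High 64 bits of the product (bits 64..127). */
static uint64_t clmulh64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    /* Start at 1: bit 0 of b contributes nothing above bit 63. */
    for (int i = 1; i < 64; i++) {
        if ((b >> i) & 1) {
            r ^= a >> (64 - i);
        }
    }
    return r;
}

/* "Reversed" variant: bits 63..126 of the product. */
static uint64_t clmulr64(uint64_t a, uint64_t b)
{
    uint64_t r = 0;

    for (int i = 0; i < 64; i++) {
        if ((b >> i) & 1) {
            r ^= a >> (63 - i);
        }
    }
    return r;
}
```

Since clmulr is the product shifted right by 63, it can also be expressed as (clmulh << 1) | (clmul >> 63), which gives a cheap consistency check between the three.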

 target/riscv/bitmanip_helper.c  | 27 +
 target/riscv/helper.h   |  2 ++
 target/riscv/insn32.decode  |  5 +
 target/riscv/insn_trans/trans_rvb.c.inc | 27 -
 target/riscv/translate.c|  6 ++
 5 files changed, 66 insertions(+), 1 deletion(-)

diff --git a/target/riscv/bitmanip_helper.c b/target/riscv/bitmanip_helper.c
index 5b2f795d03..73be5a81c7 100644
--- a/target/riscv/bitmanip_helper.c
+++ b/target/riscv/bitmanip_helper.c
@@ -3,6 +3,7 @@
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
+ * Copyright (c) 2021 Philipp Tomsich, philipp.toms...@vrull.eu
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -88,3 +89,29 @@ target_ulong HELPER(gorcw)(target_ulong rs1, target_ulong rs2)
 {
 return do_gorc(rs1, rs2, 32);
 }
+
+target_ulong HELPER(clmul)(target_ulong rs1, target_ulong rs2)
+{
+target_ulong result = 0;
+
+for (int i = 0; i < TARGET_LONG_BITS; i++) {
+if ((rs2 >> i) & 1) {
+result ^= (rs1 << i);
+}
+}
+
+return result;
+}
+
+target_ulong HELPER(clmulr)(target_ulong rs1, target_ulong rs2)
+{
+target_ulong result = 0;
+
+for (int i = 0; i < TARGET_LONG_BITS; i++) {
+if ((rs2 >> i) & 1) {
+result ^= (rs1 >> (TARGET_LONG_BITS - i - 1));
+}
+}
+
+return result;
+}
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 415e37bc37..c559c860a7 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -63,6 +63,8 @@ DEF_HELPER_FLAGS_2(grev, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(grevw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(gorc, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(gorcw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
+DEF_HELPER_FLAGS_2(clmul, TCG_CALL_NO_RWG_SE, tl, tl, tl)
+DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
 /* Special functions */
 DEF_HELPER_3(csrrw, tl, env, tl, tl)
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 1166e7f648..0471c8 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -713,6 +713,11 @@ roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
 
+# *** RV32 Zbc Standard Extension ***
+clmul  101 .. 001 . 0110011 @r
+clmulh 101 .. 011 . 0110011 @r
+clmulr 101 .. 010 . 0110011 @r
+
 # *** RV32 Zbs Standard Extension ***
 bclr   0100100 .. 001 . 0110011 @r
 bclri  01001. ... 001 . 0010011 @sh
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 21d713df27..89700117c4 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -1,5 +1,5 @@
 /*
- * RISC-V translation routines for the RVB draft Zb[as] Standard Extension.
+ * RISC-V translation routines for the Zb[acs] Standard Extension.
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
@@ -24,6 +24,12 @@
 }\
 } while (0)
 
+#define REQUIRE_ZBC(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_zbc) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_ZBS(ctx) do {\
 if (!RISCV_CPU(ctx->cs)->cfg.ext_zbs) {  \
 return false;\
@@ -357,3 +363,22 @@ static bool trans_slli_uw(DisasContext *ctx, arg_slli_uw *a)
 tcg_temp_free(source1);
 return true;
 }
+
+static bool trans_clmul(DisasContext *ctx, arg_clmul *a)
+{
+REQUIRE_ZBC(ctx);
+return gen_arith(ctx, a, gen_helper_clmul);
+}
+
+
+static bool trans_clmulh(DisasContext *ctx, arg_clmulh *a)
+{
+REQUIRE_ZBC(ctx);
+return gen_arith(ctx, a, gen_clmulh);
+}
+
+static bool trans_clmulr(DisasContext *ctx, arg_clmulr *a)
+{
+REQUIRE_ZBC(ctx);
+return gen_arith(ctx, a, gen_helper_clmulr);
+}
diff --git a/target/riscv/translate.c 

[PATCH v5 02/14] target/riscv: Reassign instructions to the Zba-extension

2021-08-23 Thread Philipp Tomsich
The following instructions are part of Zba:
 - add.uw (RV64 only)
 - sh[123]add (RV32 and RV64)
 - sh[123]add.uw (RV64-only)
 - slli.uw (RV64-only)

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- The changes to the Zba instructions (i.e. the REQUIRE_ZBA macro
  and its use for qualifying the Zba instructions) are moved into
  a separate commit.
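
As a quick reference for the instructions listed above, their RV64 semantics can be modelled in a few lines of C. This is a sketch only; the helper names are made up and the shift amount is passed as a plain argument rather than being encoded in the opcode:

```c
#include <stdint.h>

/* Zero-extend the low 32 bits of a 64-bit register value. */
static uint64_t zext32(uint64_t x)
{
    return x & 0xffffffffu;
}

/* sh[123]add: rd = rs2 + (rs1 << shamt), with shamt in 1..3. */
static uint64_t sh_add(uint64_t rs1, uint64_t rs2, unsigned shamt)
{
    return rs2 + (rs1 << shamt);
}

/* sh[123]add.uw: as above, but rs1 is first zero-extended from 32 bits. */
static uint64_t sh_add_uw(uint64_t rs1, uint64_t rs2, unsigned shamt)
{
    return rs2 + (zext32(rs1) << shamt);
}

/* add.uw: rd = rs2 + zext32(rs1). */
static uint64_t add_uw(uint64_t rs1, uint64_t rs2)
{
    return rs2 + zext32(rs1);
}

/* slli.uw: rd = zext32(rs1) << shamt. */
static uint64_t slli_uw(uint64_t rs1, unsigned shamt)
{
    return zext32(rs1) << shamt;
}
```

The .uw forms exist to handle unsigned 32-bit indices on RV64 without a separate zero-extension step before the address computation.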

 target/riscv/insn32.decode  | 20 
 target/riscv/insn_trans/trans_rvb.c.inc | 17 -
 2 files changed, 24 insertions(+), 13 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index f09f8d5faf..68b163b72d 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -659,6 +659,18 @@ vamomaxd_v  10100 . . . . 111 . 010 @r_wdvm
 vamominud_v 11000 . . . . 111 . 010 @r_wdvm
 vamomaxud_v 11100 . . . . 111 . 010 @r_wdvm
 
+# *** RV32 Zba Standard Extension ***
+sh1add 001 .. 010 . 0110011 @r
+sh2add 001 .. 100 . 0110011 @r
+sh3add 001 .. 110 . 0110011 @r
+
+# *** RV64 Zba Standard Extension (in addition to RV32 Zba) ***
+add_uw 100 .. 000 . 0111011 @r
+sh1add_uw  001 .. 010 . 0111011 @r
+sh2add_uw  001 .. 100 . 0111011 @r
+sh3add_uw  001 .. 110 . 0111011 @r
+slli_uw1  001 . 0011011 @sh
+
 # *** RV32B Standard Extension ***
 clz011000 00 . 001 . 0010011 @r2
 ctz011000 01 . 001 . 0010011 @r2
@@ -686,9 +698,6 @@ ror011 .. 101 . 0110011 @r
 rol011 .. 001 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
 gorc   0010100 .. 101 . 0110011 @r
-sh1add 001 .. 010 . 0110011 @r
-sh2add 001 .. 100 . 0110011 @r
-sh3add 001 .. 110 . 0110011 @r
 
 bseti  00101. ... 001 . 0010011 @sh
 bclri  01001. ... 001 . 0010011 @sh
@@ -717,10 +726,6 @@ rorw   011 .. 101 . 0111011 @r
 rolw   011 .. 001 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
 gorcw  0010100 .. 101 . 0111011 @r
-sh1add_uw  001 .. 010 . 0111011 @r
-sh2add_uw  001 .. 100 . 0111011 @r
-sh3add_uw  001 .. 110 . 0111011 @r
-add_uw 100 .. 000 . 0111011 @r
 
 bsetiw 0010100 .. 001 . 0011011 @sh5
 bclriw 0100100 .. 001 . 0011011 @sh5
@@ -731,4 +736,3 @@ roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
 
-slli_uw1. ... 001 . 0011011 @sh
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc b/target/riscv/insn_trans/trans_rvb.c.inc
index 9e81f6e3de..3cdd70a2b9 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -1,8 +1,9 @@
 /*
- * RISC-V translation routines for the RVB Standard Extension.
+ * RISC-V translation routines for the RVB draft and Zba Standard Extension.
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
+ * Copyright (c) 2021 Philipp Tomsich, philipp.toms...@vrull.eu
  *
  * This program is free software; you can redistribute it and/or modify it
  * under the terms and conditions of the GNU General Public License,
@@ -17,6 +18,12 @@
  * this program.  If not, see <http://www.gnu.org/licenses/>.
  */
 
+#define REQUIRE_ZBA(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_zba) {  \
+return false;\
+}\
+} while (0)
+
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
 REQUIRE_EXT(ctx, RVB);
@@ -229,7 +236,7 @@ static bool trans_gorci(DisasContext *ctx, arg_gorci *a)
 #define GEN_TRANS_SHADD(SHAMT) \
 static bool trans_sh##SHAMT##add(DisasContext *ctx, arg_sh##SHAMT##add *a) \
 {  \
-REQUIRE_EXT(ctx, RVB); \
+REQUIRE_ZBA(ctx);  \
 return gen_arith(ctx, a, gen_sh##SHAMT##add);  \
 }
 
@@ -403,7 +410,7 @@ static bool trans_sh##SHAMT##add_uw(DisasContext *ctx,  
  \
 arg_sh##SHAMT##add_uw *a) \
 { \
 REQUIRE_64BIT(ctx);   \
-REQUIRE_EXT(ctx, RVB);\
+REQUIRE_ZBA(ctx);   

[PATCH v5 13/14] target/riscv: Remove RVB (replaced by Zb[abcs])

2021-08-23 Thread Philipp Tomsich
With everything classified as Zb[abcs] and pre-0.93 draft-B
instructions that are not part of Zb[abcs] removed, we can remove the
remaining support code for RVB.

Note that RVB has been retired for good and misa.B will mean neither
'some' nor 'all of' Zb*:
  https://lists.riscv.org/g/tech-bitmanip/message/532

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Removing RVB moved into a separate commit at the tail-end of the series.

 target/riscv/cpu.c | 27 ---
 target/riscv/cpu.h |  3 ---
 target/riscv/insn32.decode |  4 
 3 files changed, 34 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index c7bc1f9f44..93bd8f7802 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -127,11 +127,6 @@ static void set_priv_version(CPURISCVState *env, int 
priv_ver)
 env->priv_ver = priv_ver;
 }
 
-static void set_bext_version(CPURISCVState *env, int bext_ver)
-{
-env->bext_ver = bext_ver;
-}
-
 static void set_vext_version(CPURISCVState *env, int vext_ver)
 {
 env->vext_ver = vext_ver;
@@ -393,7 +388,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 CPURISCVState *env = &cpu->env;
 RISCVCPUClass *mcc = RISCV_CPU_GET_CLASS(dev);
 int priv_version = PRIV_VERSION_1_11_0;
-int bext_version = BEXT_VERSION_0_93_0;
 int vext_version = VEXT_VERSION_0_07_1;
 target_ulong target_misa = env->misa;
 Error *local_err = NULL;
@@ -418,7 +412,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 }
 
 set_priv_version(env, priv_version);
-set_bext_version(env, bext_version);
 set_vext_version(env, vext_version);
 
 if (cpu->cfg.mmu) {
@@ -496,24 +489,6 @@ static void riscv_cpu_realize(DeviceState *dev, Error 
**errp)
 if (cpu->cfg.ext_h) {
 target_misa |= RVH;
 }
-if (cpu->cfg.ext_b) {
-target_misa |= RVB;
-
-if (cpu->cfg.bext_spec) {
-if (!g_strcmp0(cpu->cfg.bext_spec, "v0.93")) {
-bext_version = BEXT_VERSION_0_93_0;
-} else {
-error_setg(errp,
-   "Unsupported bitmanip spec version '%s'",
-   cpu->cfg.bext_spec);
-return;
-}
-} else {
-qemu_log("bitmanip version is not specified, "
- "use the default value v0.93\n");
-}
-set_bext_version(env, bext_version);
-}
 if (cpu->cfg.ext_v) {
 target_misa |= RVV;
 if (!is_power_of_2(cpu->cfg.vlen)) {
@@ -584,7 +559,6 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("s", RISCVCPU, cfg.ext_s, true),
 DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
 /* This is experimental so mark with 'x-' */
-DEFINE_PROP_BOOL("x-b", RISCVCPU, cfg.ext_b, false),
 DEFINE_PROP_BOOL("x-zba", RISCVCPU, cfg.ext_zba, false),
 DEFINE_PROP_BOOL("x-zbb", RISCVCPU, cfg.ext_zbb, false),
 DEFINE_PROP_BOOL("x-zbc", RISCVCPU, cfg.ext_zbc, false),
@@ -595,7 +569,6 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
 DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
 DEFINE_PROP_STRING("priv_spec", RISCVCPU, cfg.priv_spec),
-DEFINE_PROP_STRING("bext_spec", RISCVCPU, cfg.bext_spec),
 DEFINE_PROP_STRING("vext_spec", RISCVCPU, cfg.vext_spec),
 DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
 DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 7c4cd8ea89..77e8b06106 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -67,7 +67,6 @@
 #define RVS RV('S')
 #define RVU RV('U')
 #define RVH RV('H')
-#define RVB RV('B')
 
 /* S extension denotes that Supervisor mode exists, however it is possible
to have a core that support S mode but does not have an MMU and there
@@ -83,7 +82,6 @@ enum {
 #define PRIV_VERSION_1_10_0 0x00011000
 #define PRIV_VERSION_1_11_0 0x00011100
 
-#define BEXT_VERSION_0_93_0 0x9300
 #define VEXT_VERSION_0_07_1 0x0701
 
 enum {
@@ -288,7 +286,6 @@ struct RISCVCPU {
 bool ext_f;
 bool ext_d;
 bool ext_c;
-bool ext_b;
 bool ext_s;
 bool ext_u;
 bool ext_h;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index abf794095a..0f6020ccb1 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -711,10 +711,6 @@ rorw   011 .. 101 . 0111011 @r
 # instruction, so we use different handler functions to differentiate.
 zext_h_64  100 0 . 100 . 0111011 @r2
 
-# *** RV32B Standard Extension ***
-
-# *** RV64B Standard Extension (in addition to RV32B) ***
-
 # *** RV32 Zbc Standard Extension ***
 clmul  101 .. 001 . 0110011 @r
 

[PATCH v5 06/14] target/riscv: Reassign instructions to the Zbs-extension

2021-08-23 Thread Philipp Tomsich
The following instructions are part of Zbs:
 - b{set,clr,ext,inv}
 - b{set,clr,ext,inv}i

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- The changes to the Zbs instructions (i.e. the REQUIRE_ZBS macro and
  its use for qualifying the Zbs instructions) are moved into a
  separate commit.

 target/riscv/insn32.decode  | 17 +
 target/riscv/insn_trans/trans_rvb.c.inc | 24 +++-
 2 files changed, 24 insertions(+), 17 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 7e38477553..1166e7f648 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -688,19 +688,11 @@ min101 .. 100 . 0110011 @r
 minu   101 .. 101 . 0110011 @r
 max101 .. 110 . 0110011 @r
 maxu   101 .. 111 . 0110011 @r
-bset   0010100 .. 001 . 0110011 @r
-bclr   0100100 .. 001 . 0110011 @r
-binv   0110100 .. 001 . 0110011 @r
-bext   0100100 .. 101 . 0110011 @r
 ror011 .. 101 . 0110011 @r
 rol011 .. 001 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
 gorc   0010100 .. 101 . 0110011 @r
 
-bseti  00101. ... 001 . 0010011 @sh
-bclri  01001. ... 001 . 0010011 @sh
-binvi  01101. ... 001 . 0010011 @sh
-bexti  01001. ... 101 . 0010011 @sh
 rori   01100. ... 101 . 0010011 @sh
 grevi  01101. ... 101 . 0010011 @sh
 gorci  00101. ... 101 . 0010011 @sh
@@ -721,3 +713,12 @@ roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
 
+# *** RV32 Zbs Standard Extension ***
+bclr   0100100 .. 001 . 0110011 @r
+bclri  01001. ... 001 . 0010011 @sh
+bext   0100100 .. 101 . 0110011 @r
+bexti  01001. ... 101 . 0010011 @sh
+binv   0110100 .. 001 . 0110011 @r
+binvi  01101. ... 001 . 0010011 @sh
+bset   0010100 .. 001 . 0110011 @r
+bseti  00101. ... 001 . 0010011 @sh
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index ac706349f5..21d713df27 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -1,5 +1,5 @@
 /*
- * RISC-V translation routines for the RVB draft and Zba Standard Extension.
+ * RISC-V translation routines for the RVB draft Zb[as] Standard Extension.
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
@@ -24,6 +24,12 @@
 }\
 } while (0)
 
+#define REQUIRE_ZBS(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_zbs) {  \
+return false;\
+}\
+} while (0)
+
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
 REQUIRE_EXT(ctx, RVB);
@@ -116,49 +122,49 @@ static bool trans_sext_h(DisasContext *ctx, arg_sext_h *a)
 
 static bool trans_bset(DisasContext *ctx, arg_bset *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shift(ctx, a, gen_bset);
 }
 
 static bool trans_bseti(DisasContext *ctx, arg_bseti *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shifti(ctx, a, gen_bset);
 }
 
 static bool trans_bclr(DisasContext *ctx, arg_bclr *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shift(ctx, a, gen_bclr);
 }
 
 static bool trans_bclri(DisasContext *ctx, arg_bclri *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shifti(ctx, a, gen_bclr);
 }
 
 static bool trans_binv(DisasContext *ctx, arg_binv *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shift(ctx, a, gen_binv);
 }
 
 static bool trans_binvi(DisasContext *ctx, arg_binvi *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shifti(ctx, a, gen_binv);
 }
 
 static bool trans_bext(DisasContext *ctx, arg_bext *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shift(ctx, a, gen_bext);
 }
 
 static bool trans_bexti(DisasContext *ctx, arg_bexti *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBS(ctx);
 return gen_shifti(ctx, a, gen_bext);
 }
 
-- 
2.25.1




[PATCH v5 03/14] target/riscv: slli.uw is only a valid encoding if shamt fits in 64 bits

2021-08-23 Thread Philipp Tomsich
For RV64, the shamt field in slli.uw is 6 bits wide. While the encoding
space currently reserves a wider shamt-field (for use in a future RV128
ISA), setting the additional bit to 1 will not map to slli.uw for RV64
and needs to be treated as an illegal instruction.

Note that this encoding being reserved for a future RV128 does not imply
that no other instructions for RV64-only could be added in this encoding
space in the future.

As the implementation is separate from the gen_shifti helpers, we keep
it that way and add the check for the shamt-width here.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Instead of defining a new decoding format, we treat slli.uw as if it
  had a 7bit-wide field for shamt (the 7th bit is reserved for RV128)
  and check for validity of the encoding in C code.

 target/riscv/insn_trans/trans_rvb.c.inc | 9 +
 1 file changed, 9 insertions(+)

diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index 3cdd70a2b9..dcc7b6893d 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -430,6 +430,15 @@ static bool trans_slli_uw(DisasContext *ctx, arg_slli_uw 
*a)
 REQUIRE_64BIT(ctx);
 REQUIRE_ZBA(ctx);
 
+/*
+ * The shamt field is only 6 bits for RV64 (with the 7th bit
+ * remaining reserved for RV128).  If the reserved bit is set
+ * on RV64, the encoding is illegal.
+ */
+if (a->shamt >= TARGET_LONG_BITS) {
+return false;
+}
+
 TCGv source1 = tcg_temp_new();
 gen_get_gpr(source1, a->rs1);
 
-- 
2.25.1
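
For readers following along, the shamt check added in this patch can be
modelled outside of QEMU. The following is only a sketch under stated
assumptions: it assumes an RV64 target, and DEMO_XLEN and
demo_slli_uw() are hypothetical names that do not exist in QEMU. It
models the decoded 7-bit shamt field and the architectural result of
slli.uw (zero-extend the low 32 bits of rs1, then shift left).

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumption: we model an RV64 target, so XLEN is 64. */
#define DEMO_XLEN 64

/*
 * Hypothetical model of slli.uw on RV64.
 * shamt7 is the 7-bit field as decoded; bit 6 is reserved for RV128.
 * Returns false for an illegal encoding (mirroring the "return false"
 * in trans_slli_uw), true otherwise with the result stored in *rd.
 */
static bool demo_slli_uw(uint64_t rs1, unsigned shamt7, uint64_t *rd)
{
    if (shamt7 >= DEMO_XLEN) {
        return false;           /* reserved bit set: illegal on RV64 */
    }
    /* Zero-extend the low 32 bits of rs1, then shift left. */
    *rd = (uint64_t)(uint32_t)rs1 << shamt7;
    return true;
}
```

With this model a shamt of 63 is still accepted, while 64 (only the
reserved 7th bit set) is rejected.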




[PATCH v5 04/14] target/riscv: Remove the W-form instructions from Zbs

2021-08-23 Thread Philipp Tomsich
Zbs 1.0.0 (just as the 0.93 draft-B before) does not provide for W-form
instructions for Zbs (single-bit instructions).  Remove them.

Note that these instructions had already been removed for the 0.93
version of the draft-B extension and have not been present in the
binutils patches circulating in January 2021.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Remove the W-form instructions from Zbs in a separate commit.

 target/riscv/insn32.decode  |  7 
 target/riscv/insn_trans/trans_rvb.c.inc | 49 -
 2 files changed, 56 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 68b163b72d..9abdbcb799 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -716,10 +716,6 @@ cpopw  011 00010 . 001 . 0011011 @r2
 
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-bsetw  0010100 .. 001 . 0111011 @r
-bclrw  0100100 .. 001 . 0111011 @r
-binvw  0110100 .. 001 . 0111011 @r
-bextw  0100100 .. 101 . 0111011 @r
 slow   001 .. 001 . 0111011 @r
 srow   001 .. 101 . 0111011 @r
 rorw   011 .. 101 . 0111011 @r
@@ -727,9 +723,6 @@ rolw   011 .. 001 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
 gorcw  0010100 .. 101 . 0111011 @r
 
-bsetiw 0010100 .. 001 . 0011011 @sh5
-bclriw 0100100 .. 001 . 0011011 @sh5
-binviw 0110100 .. 001 . 0011011 @sh5
 sloiw  001 .. 001 . 0011011 @sh5
 sroiw  001 .. 101 . 0011011 @sh5
 roriw  011 .. 101 . 0011011 @sh5
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index dcc7b6893d..975492d45c 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -279,55 +279,6 @@ static bool trans_packuw(DisasContext *ctx, arg_packuw *a)
 return gen_arith(ctx, a, gen_packuw);
 }
 
-static bool trans_bsetw(DisasContext *ctx, arg_bsetw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_bset);
-}
-
-static bool trans_bsetiw(DisasContext *ctx, arg_bsetiw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftiw(ctx, a, gen_bset);
-}
-
-static bool trans_bclrw(DisasContext *ctx, arg_bclrw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_bclr);
-}
-
-static bool trans_bclriw(DisasContext *ctx, arg_bclriw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftiw(ctx, a, gen_bclr);
-}
-
-static bool trans_binvw(DisasContext *ctx, arg_binvw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_binv);
-}
-
-static bool trans_binviw(DisasContext *ctx, arg_binviw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftiw(ctx, a, gen_binv);
-}
-
-static bool trans_bextw(DisasContext *ctx, arg_bextw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_bext);
-}
-
 static bool trans_slow(DisasContext *ctx, arg_slow *a)
 {
 REQUIRE_64BIT(ctx);
-- 
2.25.1




[PATCH v5 10/14] target/riscv: Add a REQUIRE_32BIT macro

2021-08-23 Thread Philipp Tomsich
With the changes to Zb[abcs], there are some encodings that are
different in RV64 and RV32 (e.g., for rev8 and zext.h). For these,
we'll need a helper macro allowing us to select on RV32, as well.

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Moved the REQUIRE_32BIT macro into a separate commit.

 target/riscv/translate.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index bdb47905f6..9b726ce9c4 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -417,6 +417,12 @@ EX_SH(12)
 }  \
 } while (0)
 
+#define REQUIRE_32BIT(ctx) do { \
+if (!is_32bit(ctx)) {   \
+return false;   \
+}   \
+} while (0)
+
 #define REQUIRE_64BIT(ctx) do { \
 if (is_32bit(ctx)) {\
 return false;   \
-- 
2.25.1
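
The guard-macro pattern used by REQUIRE_32BIT and REQUIRE_64BIT can be
illustrated in isolation. This is a sketch only: DemoCtx,
demo_is_32bit() and the demo_trans_*() functions are hypothetical
stand-ins for DisasContext, is_32bit() and the real trans_* routines.
The do { } while (0) wrapper lets the macro be used like a statement,
and the early "return false" makes the decoder treat the encoding as
not matched.

```c
#include <stdbool.h>

/* Hypothetical stand-in for DisasContext: only tracks XLEN. */
typedef struct {
    int xlen;
} DemoCtx;

static bool demo_is_32bit(const DemoCtx *ctx)
{
    return ctx->xlen == 32;
}

/* Bail out of the calling function unless we translate for RV32. */
#define DEMO_REQUIRE_32BIT(ctx) do { \
    if (!demo_is_32bit(ctx)) {       \
        return false;                \
    }                                \
} while (0)

/* Bail out of the calling function unless we translate for RV64. */
#define DEMO_REQUIRE_64BIT(ctx) do { \
    if (demo_is_32bit(ctx)) {        \
        return false;                \
    }                                \
} while (0)

/* Hypothetical handler for an encoding that is valid on RV32 only. */
static bool demo_trans_rv32_only(const DemoCtx *ctx)
{
    DEMO_REQUIRE_32BIT(ctx);
    return true;
}

/* Hypothetical handler for an encoding that is valid on RV64 only. */
static bool demo_trans_rv64_only(const DemoCtx *ctx)
{
    DEMO_REQUIRE_64BIT(ctx);
    return true;
}
```

Two handlers guarded this way can share a mnemonic (as with rev8 and
zext.h) while only one of them accepts the instruction for a given
XLEN.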




[PATCH v5 00/14] target/riscv: Update QEMU for Zb[abcs] 1.0.0

2021-08-23 Thread Philipp Tomsich


The Zb[abcs] extensions have completed public review and are nearing
ratification. These individual extensions are one part of what was
previously thought of as the "BitManip" (B) extension, leaving the
final details of future Zb* extensions open as they will undergo
further public discourse.

This series updates the earlier support for the B extension by
 - removing those instructions that are not included in Zb[abcs]
 - splitting this into 4 separate extensions that can be independently
   enabled: Zba (addressing), Zbb (basic bit-manip), Zbc (carryless
   multiplication), Zbs (single-bit operations)
 - updating to the 1.0.0 version (e.g. w-forms of rev8 and Zbs
   instructions are not included in Zb[abcs])

For the latest version of the public review specification
(incorporating some editorial fixes and corrections from the review
period), refer to:
  
https://github.com/riscv/riscv-bitmanip/releases/download/1.0.0/bitmanip-1.0.0-31-g2af7256.pdf


Changes in v5:
- Introduce gen_clmulh (as suggested by Richard H) and use to simplify
  trans_clmulh().

Changes in v4:
- Drop rewrite of slli.uw (to match formal specification), as it would
  remove an optimization.
- reorder trans_rev8* functions to be sequential
- rename rev8 to rev8_32 in decoder
- Renamed RV32 variant to zext_h_32.
- Reordered trans_zext_h_{32,64} to be next to each other.

Changes in v3:
- Split off removal of 'x-b' property and 'ext_b' field into a separate
  patch to ensure bisectability.
- The changes to the Zba instructions (i.e. the REQUIRE_ZBA macro
  and its use for qualifying the Zba instructions) are moved into
  a separate commit.
- Instead of defining a new decoding format, we treat slli.uw as if it
  had a 7bit-wide field for shamt (the 7th bit is reserved for RV128)
  and check for validity of the encoding in C code.
- Remove the W-form instructions from Zbs in a separate commit.
- Remove shift-one instructions in a separate commit.
- The changes to the Zbs instructions (i.e. the REQUIRE_ZBS macro and
  its use for qualifying the Zbs instructions) are moved into a
  separate commit.
- This adds the Zbc instructions as a separate commit.
- Uses a helper for clmul/clmulr instead of inlining the calculation of
  the result (addressing a comment from Richard Henderson).
- The changes to the Zbb instructions (i.e. use the REQUIRE_ZBB macro)
  are now in a separate commit.
- Moved orc.b and gorc/gorci changes into separate commit.
- Using the simpler orc.b implementation suggested by Richard Henderson
- Moved the REQUIRE_32BIT macro into a separate commit.
- rev8-addition & grevi*-removal moved to a separate commit
- Moved zext.h-addition & pack*-removal to a separate commit.
- Removing RVB moved into a separate commit at the tail-end of the series.

Changes in v2:
- Fix missing ';' from last-minute whitespace cleanups.

Philipp Tomsich (14):
  target/riscv: Add x-zba, x-zbb, x-zbc and x-zbs properties
  target/riscv: Reassign instructions to the Zba-extension
  target/riscv: slli.uw is only a valid encoding if shamt fits in 64
    bits
  target/riscv: Remove the W-form instructions from Zbs
  target/riscv: Remove shift-one instructions (proposed Zbo in pre-0.93
draft-B)
  target/riscv: Reassign instructions to the Zbs-extension
  target/riscv: Add instructions of the Zbc-extension
  target/riscv: Reassign instructions to the Zbb-extension
  target/riscv: Add orc.b instruction for Zbb, removing gorc/gorci
  target/riscv: Add a REQUIRE_32BIT macro
  target/riscv: Add rev8 instruction, removing grev/grevi
  target/riscv: Add zext.h instructions to Zbb, removing
pack/packu/packh
  target/riscv: Remove RVB (replaced by Zb[abcs])
  disas/riscv: Add Zb[abcs] instructions

 disas/riscv.c   | 157 ++-
 target/riscv/bitmanip_helper.c  |  65 +
 target/riscv/cpu.c  |  31 +--
 target/riscv/cpu.h  |   7 +-
 target/riscv/helper.h   |   6 +-
 target/riscv/insn32.decode  | 115 
 target/riscv/insn_trans/trans_rvb.c.inc | 333 +---
 target/riscv/translate.c| 100 +--
 8 files changed, 366 insertions(+), 448 deletions(-)

-- 
2.25.1




[PATCH v5 05/14] target/riscv: Remove shift-one instructions (proposed Zbo in pre-0.93 draft-B)

2021-08-23 Thread Philipp Tomsich
The Zb[abcs] ratification package does not include the proposed
shift-one instructions. There currently is no clear plan as to whether
these (or variants of them) will be ratified as Zbo (or a different
extension) or what the timeframe for such a decision could be.
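
For context, the semantics of the removed shift-one operations can be
sketched in plain C, following the not/shift/not pattern of the
gen_slo and gen_sro helpers deleted in the diff of this patch. This is
an illustration only: demo_slo() and demo_sro() are hypothetical names,
a 64-bit register width is assumed, and masking the shift amount to 6
bits is an assumption about what the surrounding shift wrappers do.

```c
#include <stdint.h>

/* Shift left, filling the vacated low bits with ones (assumes XLEN=64). */
static uint64_t demo_slo(uint64_t x, unsigned shamt)
{
    shamt &= 63;                /* assumption: shamt is masked to 6 bits */
    return ~(~x << shamt);
}

/* Shift right, filling the vacated high bits with ones (assumes XLEN=64). */
static uint64_t demo_sro(uint64_t x, unsigned shamt)
{
    shamt &= 63;                /* assumption: shamt is masked to 6 bits */
    return ~(~x >> shamt);
}
```

Inverting before and after an ordinary shift turns the shifted-in
zeroes into ones, which is why no dedicated TCG operation was needed.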

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Remove shift-one instructions in a separate commit.

 target/riscv/insn32.decode  |  8 
 target/riscv/insn_trans/trans_rvb.c.inc | 52 -
 target/riscv/translate.c| 14 ---
 3 files changed, 74 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 9abdbcb799..7e38477553 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -692,8 +692,6 @@ bset   0010100 .. 001 . 0110011 @r
 bclr   0100100 .. 001 . 0110011 @r
 binv   0110100 .. 001 . 0110011 @r
 bext   0100100 .. 101 . 0110011 @r
-slo001 .. 001 . 0110011 @r
-sro001 .. 101 . 0110011 @r
 ror011 .. 101 . 0110011 @r
 rol011 .. 001 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
@@ -703,8 +701,6 @@ bseti  00101. ... 001 . 0010011 @sh
 bclri  01001. ... 001 . 0010011 @sh
 binvi  01101. ... 001 . 0010011 @sh
 bexti  01001. ... 101 . 0010011 @sh
-sloi   00100. ... 001 . 0010011 @sh
-sroi   00100. ... 101 . 0010011 @sh
 rori   01100. ... 101 . 0010011 @sh
 grevi  01101. ... 101 . 0010011 @sh
 gorci  00101. ... 101 . 0010011 @sh
@@ -716,15 +712,11 @@ cpopw  011 00010 . 001 . 0011011 @r2
 
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-slow   001 .. 001 . 0111011 @r
-srow   001 .. 101 . 0111011 @r
 rorw   011 .. 101 . 0111011 @r
 rolw   011 .. 001 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
 gorcw  0010100 .. 101 . 0111011 @r
 
-sloiw  001 .. 001 . 0011011 @sh5
-sroiw  001 .. 101 . 0011011 @sh5
 roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index 975492d45c..ac706349f5 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -162,30 +162,6 @@ static bool trans_bexti(DisasContext *ctx, arg_bexti *a)
 return gen_shifti(ctx, a, gen_bext);
 }
 
-static bool trans_slo(DisasContext *ctx, arg_slo *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_shift(ctx, a, gen_slo);
-}
-
-static bool trans_sloi(DisasContext *ctx, arg_sloi *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_shifti(ctx, a, gen_slo);
-}
-
-static bool trans_sro(DisasContext *ctx, arg_sro *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_shift(ctx, a, gen_sro);
-}
-
-static bool trans_sroi(DisasContext *ctx, arg_sroi *a)
-{
-REQUIRE_EXT(ctx, RVB);
-return gen_shifti(ctx, a, gen_sro);
-}
-
 static bool trans_ror(DisasContext *ctx, arg_ror *a)
 {
 REQUIRE_EXT(ctx, RVB);
@@ -279,34 +255,6 @@ static bool trans_packuw(DisasContext *ctx, arg_packuw *a)
 return gen_arith(ctx, a, gen_packuw);
 }
 
-static bool trans_slow(DisasContext *ctx, arg_slow *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_slo);
-}
-
-static bool trans_sloiw(DisasContext *ctx, arg_sloiw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftiw(ctx, a, gen_slo);
-}
-
-static bool trans_srow(DisasContext *ctx, arg_srow *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftw(ctx, a, gen_sro);
-}
-
-static bool trans_sroiw(DisasContext *ctx, arg_sroiw *a)
-{
-REQUIRE_64BIT(ctx);
-REQUIRE_EXT(ctx, RVB);
-return gen_shiftiw(ctx, a, gen_sro);
-}
-
 static bool trans_rorw(DisasContext *ctx, arg_rorw *a)
 {
 REQUIRE_64BIT(ctx);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index 6983be5723..fc22ae82d0 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -595,20 +595,6 @@ static void gen_bext(TCGv ret, TCGv arg1, TCGv shamt)
 tcg_gen_andi_tl(ret, ret, 1);
 }
 
-static void gen_slo(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shl_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
-static void gen_sro(TCGv ret, TCGv arg1, TCGv arg2)
-{
-tcg_gen_not_tl(ret, arg1);
-tcg_gen_shr_tl(ret, ret, arg2);
-tcg_gen_not_tl(ret, ret);
-}
-
 static bool 

[PATCH v5 01/14] target/riscv: Add x-zba, x-zbb, x-zbc and x-zbs properties

2021-08-23 Thread Philipp Tomsich
The bit-manipulation ISA extensions will be ratified as individual
small extension packages instead of a large B-extension.  The first
new instructions through the door (these have completed public review)
are Zb[abcs].

This adds new 'x-zba', 'x-zbb', 'x-zbc' and 'x-zbs' properties for
these in target/riscv/cpu.[ch].

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- Split off removal of 'x-b' property and 'ext_b' field into a separate
  patch to ensure bisectability.

 target/riscv/cpu.c | 4 
 target/riscv/cpu.h | 4 
 2 files changed, 8 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 991a6bb760..c7bc1f9f44 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -585,6 +585,10 @@ static Property riscv_cpu_properties[] = {
 DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
 /* This is experimental so mark with 'x-' */
 DEFINE_PROP_BOOL("x-b", RISCVCPU, cfg.ext_b, false),
+DEFINE_PROP_BOOL("x-zba", RISCVCPU, cfg.ext_zba, false),
+DEFINE_PROP_BOOL("x-zbb", RISCVCPU, cfg.ext_zbb, false),
+DEFINE_PROP_BOOL("x-zbc", RISCVCPU, cfg.ext_zbc, false),
+DEFINE_PROP_BOOL("x-zbs", RISCVCPU, cfg.ext_zbs, false),
 DEFINE_PROP_BOOL("x-h", RISCVCPU, cfg.ext_h, false),
 DEFINE_PROP_BOOL("x-v", RISCVCPU, cfg.ext_v, false),
 DEFINE_PROP_BOOL("Counters", RISCVCPU, cfg.ext_counters, true),
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index bf1c899c00..7c4cd8ea89 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -293,6 +293,10 @@ struct RISCVCPU {
 bool ext_u;
 bool ext_h;
 bool ext_v;
+bool ext_zba;
+bool ext_zbb;
+bool ext_zbc;
+bool ext_zbs;
 bool ext_counters;
 bool ext_ifencei;
 bool ext_icsr;
-- 
2.25.1




Re: [RFC PATCH v2 3/5] exec/memattrs: Introduce MemTxAttrs::bus_perm field

2021-08-23 Thread Peter Xu
On Mon, Aug 23, 2021 at 06:41:55PM +0200, Philippe Mathieu-Daudé wrote:
> +/* Permission to restrict bus memory accesses. See MemTxAttrs::bus_perm */
> +enum {
> +MEMTXPERM_UNSPECIFIED   = 0,
> +MEMTXPERM_UNRESTRICTED  = 1,
> +MEMTXPERM_RAM_DEVICE= 2,
> +};

Is there a difference between UNSPECIFIED and UNRESTRICTED?

If no, should we merge them?

-- 
Peter Xu




Re: [PATCH v2 2/3] hw/usb/hcd-xhci-pci: Abort if setting link property failed

2021-08-23 Thread Eduardo Habkost
+Markus

On Thu, Aug 19, 2021 at 07:15:46PM +0200, Philippe Mathieu-Daudé wrote:
> Do not ignore eventual error if we failed at setting the 'host'
> property of the TYPE_XHCI model.
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  hw/usb/hcd-xhci-pci.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/hw/usb/hcd-xhci-pci.c b/hw/usb/hcd-xhci-pci.c
> index e934b1a5b1f..71f6629ccde 100644
> --- a/hw/usb/hcd-xhci-pci.c
> +++ b/hw/usb/hcd-xhci-pci.c
> @@ -115,7 +115,7 @@ static void usb_xhci_pci_realize(struct PCIDevice *dev, 
> Error **errp)
>  dev->config[PCI_CACHE_LINE_SIZE] = 0x10;
>  dev->config[0x60] = 0x30; /* release number */
>  
> -object_property_set_link(OBJECT(&s->xhci), "host", OBJECT(s), NULL);
> +object_property_set_link(OBJECT(&s->xhci), "host", OBJECT(s), 
> &error_fatal);

If this fails, it's due to programmer error, isn't it?  Shouldn't we
use &error_abort in that case?

>  s->xhci.intr_update = xhci_pci_intr_update;
>  s->xhci.intr_raise = xhci_pci_intr_raise;
>  if (!qdev_realize(DEVICE(&s->xhci), NULL, errp)) {
> -- 
> 2.31.1
> 

-- 
Eduardo




Re: [PATCH v4 07/14] target/riscv: Add instructions of the Zbc-extension

2021-08-23 Thread Richard Henderson

On 8/23/21 11:11 AM, Philipp Tomsich wrote:

+/* ... then shift the result 1 bit to the right. */
+TCGv dst = tcg_temp_new();
+gen_get_gpr(dst, a->rd);
+tcg_gen_shri_tl(dst, dst, 1);
+gen_set_gpr(a->rd, dst);
+tcg_temp_free(dst);


Missed review changes from v3:

static void gen_clmulh(TCGv dst, TCGv src1, TCGv src2)
{
gen_helper_clmulr(dst, src1, src2);
tcg_gen_shri_tl(dst, dst, 1);
}


r~
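
The identity behind the suggested gen_clmulh (clmulh equals clmulr
shifted right by one bit) can be checked with a small standalone model.
This is a sketch, not QEMU code: the demo_* names are hypothetical, a
64-bit XLEN is assumed, and the loops follow the bit-serial definitions
of the carry-less multiply instructions rather than the QEMU helpers.

```c
#include <stdint.h>

#define DEMO_XLEN 64    /* assumption: RV64 */

/* Low XLEN bits of the carry-less product. */
static uint64_t demo_clmul(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < DEMO_XLEN; i++) {
        if ((b >> i) & 1) {
            r ^= a << i;
        }
    }
    return r;
}

/* Bits [2*XLEN-2 : XLEN-1] of the carry-less product. */
static uint64_t demo_clmulr(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 0; i < DEMO_XLEN; i++) {
        if ((b >> i) & 1) {
            r ^= a >> (DEMO_XLEN - i - 1);
        }
    }
    return r;
}

/* Bits [2*XLEN-1 : XLEN] of the carry-less product, computed directly. */
static uint64_t demo_clmulh_direct(uint64_t a, uint64_t b)
{
    uint64_t r = 0;
    for (int i = 1; i < DEMO_XLEN; i++) {   /* i = 0 contributes nothing */
        if ((b >> i) & 1) {
            r ^= a >> (DEMO_XLEN - i);
        }
    }
    return r;
}

/* The same result derived from clmulr, as in the suggestion above. */
static uint64_t demo_clmulh_via_clmulr(uint64_t a, uint64_t b)
{
    return demo_clmulr(a, b) >> 1;
}
```

The shortcut works because the carry-less product of two XLEN-bit
values has at most 2*XLEN-1 significant bits, so the top bit of the
upper half is always zero.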



[PATCH v4 08/14] target/riscv: Reassign instructions to the Zbb-extension

2021-08-23 Thread Philipp Tomsich
This reassigns the instructions that are part of Zbb into it, with the
notable exceptions of the instructions (rev8, zext.h and orc.b) that
changed due to gorci, grevi and pack not being part of Zb[abcs].

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

(no changes since v3)

Changes in v3:
- The changes to the Zbb instructions (i.e. use the REQUIRE_ZBB macro)
  are now in a separate commit.

 target/riscv/insn32.decode  | 40 ++--
 target/riscv/insn_trans/trans_rvb.c.inc | 50 ++---
 2 files changed, 49 insertions(+), 41 deletions(-)

diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 0471c8..faa56836d8 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -671,45 +671,47 @@ sh2add_uw  001 .. 100 . 0111011 @r
 sh3add_uw  001 .. 110 . 0111011 @r
 slli_uw1  001 . 0011011 @sh
 
-# *** RV32B Standard Extension ***
+# *** RV32 Zbb Standard Extension ***
+andn   010 .. 111 . 0110011 @r
 clz011000 00 . 001 . 0010011 @r2
-ctz011000 01 . 001 . 0010011 @r2
 cpop   011000 10 . 001 . 0010011 @r2
+ctz011000 01 . 001 . 0010011 @r2
+max101 .. 110 . 0110011 @r
+maxu   101 .. 111 . 0110011 @r
+min101 .. 100 . 0110011 @r
+minu   101 .. 101 . 0110011 @r
+orn010 .. 110 . 0110011 @r
+rol011 .. 001 . 0110011 @r
+ror011 .. 101 . 0110011 @r
+rori   01100  101 . 0010011 @sh
 sext_b 011000 000100 . 001 . 0010011 @r2
 sext_h 011000 000101 . 001 . 0010011 @r2
-
-andn   010 .. 111 . 0110011 @r
-orn010 .. 110 . 0110011 @r
 xnor   010 .. 100 . 0110011 @r
+
+# *** RV64 Zbb Standard Extension (in addition to RV32 Zbb) ***
+clzw   011 0 . 001 . 0011011 @r2
+ctzw   011 1 . 001 . 0011011 @r2
+cpopw  011 00010 . 001 . 0011011 @r2
+rolw   011 .. 001 . 0111011 @r
+roriw  011 .. 101 . 0011011 @sh5
+rorw   011 .. 101 . 0111011 @r
+
+# *** RV32B Standard Extension ***
 pack   100 .. 100 . 0110011 @r
 packu  0100100 .. 100 . 0110011 @r
 packh  100 .. 111 . 0110011 @r
-min101 .. 100 . 0110011 @r
-minu   101 .. 101 . 0110011 @r
-max101 .. 110 . 0110011 @r
-maxu   101 .. 111 . 0110011 @r
-ror011 .. 101 . 0110011 @r
-rol011 .. 001 . 0110011 @r
 grev   0110100 .. 101 . 0110011 @r
 gorc   0010100 .. 101 . 0110011 @r
 
-rori   01100. ... 101 . 0010011 @sh
 grevi  01101. ... 101 . 0010011 @sh
 gorci  00101. ... 101 . 0010011 @sh
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
-clzw   011 0 . 001 . 0011011 @r2
-ctzw   011 1 . 001 . 0011011 @r2
-cpopw  011 00010 . 001 . 0011011 @r2
-
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-rorw   011 .. 101 . 0111011 @r
-rolw   011 .. 001 . 0111011 @r
 grevw  0110100 .. 101 . 0111011 @r
 gorcw  0010100 .. 101 . 0111011 @r
 
-roriw  011 .. 101 . 0011011 @sh5
 greviw 0110100 .. 101 . 0011011 @sh5
 gorciw 0010100 .. 101 . 0011011 @sh5
 
diff --git a/target/riscv/insn_trans/trans_rvb.c.inc 
b/target/riscv/insn_trans/trans_rvb.c.inc
index 92c31ea1e6..0aa7e9abf7 100644
--- a/target/riscv/insn_trans/trans_rvb.c.inc
+++ b/target/riscv/insn_trans/trans_rvb.c.inc
@@ -1,5 +1,5 @@
 /*
- * RISC-V translation routines for the Zb[acs] Standard Extension.
+ * RISC-V translation routines for the Zb[abcs] Standard Extension.
  *
  * Copyright (c) 2020 Kito Cheng, kito.ch...@sifive.com
  * Copyright (c) 2020 Frank Chang, frank.ch...@sifive.com
@@ -24,6 +24,12 @@
 }\
 } while (0)
 
+#define REQUIRE_ZBB(ctx) do {\
+if (!RISCV_CPU(ctx->cs)->cfg.ext_zbb) {  \
+return false;\
+}\
+} while (0)
+
 #define REQUIRE_ZBC(ctx) do {\
 if (!RISCV_CPU(ctx->cs)->cfg.ext_zbc) {  \
 return false;\
@@ -38,37 +44,37 @@
 
 static bool trans_clz(DisasContext *ctx, arg_clz *a)
 {
-REQUIRE_EXT(ctx, RVB);
+REQUIRE_ZBB(ctx);
 return gen_unary(ctx, a, gen_clz);
 }
 
 static 

[PATCH v4 11/14] target/riscv: Add rev8 instruction, removing grev/grevi

2021-08-23 Thread Philipp Tomsich
The 1.0.0 version of Zbb does not contain grev/grevi.  Instead, a
rev8 instruction (equivalent to the rev8 pseudo-instruction built on
grevi from pre-0.93 draft-B) is available.

This commit adds the new rev8 instruction and removes grev/grevi.

Note that there is no W-form of this instruction (both a
sign-extending and zero-extending 32-bit version can easily be
synthesized by following rev8 with either a srai or srli instruction
on RV64) and that the opcode encodings for rev8 in RV32 and RV64 are
different.
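
To make the note about the missing W-form concrete, here is a small
model of the synthesis in plain C. It is a sketch only: the demo_*
names are hypothetical, RV64 is assumed, and the functions model the
architectural results of rev8 followed by srli or srai by 32 rather
than any QEMU code.

```c
#include <stdint.h>

/* Model of rev8 on RV64: reverse the order of all eight bytes. */
static uint64_t demo_rev8_64(uint64_t x)
{
    uint64_t r = 0;
    for (int i = 0; i < 8; i++) {
        r = (r << 8) | (x & 0xff);
        x >>= 8;
    }
    return r;
}

/* rev8 followed by "srli rd, rd, 32": zero-extended 32-bit byte swap. */
static uint64_t demo_rev8w_zext(uint64_t x)
{
    return demo_rev8_64(x) >> 32;
}

/* rev8 followed by "srai rd, rd, 32": sign-extended 32-bit byte swap. */
static uint64_t demo_rev8w_sext(uint64_t x)
{
    uint64_t hi = demo_rev8_64(x) >> 32;
    /* Portable sign extension from bit 31, avoiding signed shifts. */
    return (hi ^ 0x80000000ULL) - 0x80000000ULL;
}
```

After rev8 the bytes of the low word sit in the upper half of the
register in swapped order, so a 32-bit right shift moves them back
down and the choice of srli or srai selects the extension.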

Signed-off-by: Philipp Tomsich 
Reviewed-by: Richard Henderson 
---

Changes in v4:
- reorder trans_rev8* functions to be sequential
- rename rev8 to rev8_32 in decoder

Changes in v3:
- rev8-addition & grevi*-removal moved to a separate commit

 target/riscv/bitmanip_helper.c  | 40 -
 target/riscv/helper.h   |  2 --
 target/riscv/insn32.decode  | 12 
 target/riscv/insn_trans/trans_rvb.c.inc | 34 ++---
 target/riscv/translate.c| 28 -
 5 files changed, 16 insertions(+), 100 deletions(-)

diff --git a/target/riscv/bitmanip_helper.c b/target/riscv/bitmanip_helper.c
index bb48388fcd..f1b5e5549f 100644
--- a/target/riscv/bitmanip_helper.c
+++ b/target/riscv/bitmanip_helper.c
@@ -24,46 +24,6 @@
 #include "exec/helper-proto.h"
 #include "tcg/tcg.h"
 
-static const uint64_t adjacent_masks[] = {
-dup_const(MO_8, 0x55),
-dup_const(MO_8, 0x33),
-dup_const(MO_8, 0x0f),
-dup_const(MO_16, 0xff),
-dup_const(MO_32, 0xffff),
-UINT32_MAX
-};
-
-static inline target_ulong do_swap(target_ulong x, uint64_t mask, int shift)
-{
-return ((x & mask) << shift) | ((x & ~mask) >> shift);
-}
-
-static target_ulong do_grev(target_ulong rs1,
-target_ulong rs2,
-int bits)
-{
-target_ulong x = rs1;
-int i, shift;
-
-for (i = 0, shift = 1; shift < bits; i++, shift <<= 1) {
-if (rs2 & shift) {
-x = do_swap(x, adjacent_masks[i], shift);
-}
-}
-
-return x;
-}
-
-target_ulong HELPER(grev)(target_ulong rs1, target_ulong rs2)
-{
-return do_grev(rs1, rs2, TARGET_LONG_BITS);
-}
-
-target_ulong HELPER(grevw)(target_ulong rs1, target_ulong rs2)
-{
-return do_grev(rs1, rs2, 32);
-}
-
 target_ulong HELPER(clmul)(target_ulong rs1, target_ulong rs2)
 {
 target_ulong result = 0;
diff --git a/target/riscv/helper.h b/target/riscv/helper.h
index 80561e8866..ae2e94542c 100644
--- a/target/riscv/helper.h
+++ b/target/riscv/helper.h
@@ -59,8 +59,6 @@ DEF_HELPER_FLAGS_2(fcvt_d_lu, TCG_CALL_NO_RWG, i64, env, tl)
 DEF_HELPER_FLAGS_1(fclass_d, TCG_CALL_NO_RWG_SE, tl, i64)
 
 /* Bitmanip */
-DEF_HELPER_FLAGS_2(grev, TCG_CALL_NO_RWG_SE, tl, tl, tl)
-DEF_HELPER_FLAGS_2(grevw, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmul, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 DEF_HELPER_FLAGS_2(clmulr, TCG_CALL_NO_RWG_SE, tl, tl, tl)
 
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 8bcb602455..017eb50a49 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -682,6 +682,9 @@ min101 .. 100 . 0110011 @r
 minu   101 .. 101 . 0110011 @r
 orc_b  001010 000111 . 101 . 0010011 @r2
 orn010 .. 110 . 0110011 @r
+# The encoding for rev8 differs between RV32 and RV64. 
+# rev8_32 denotes the RV32 variant.
+rev8_32011010 011000 . 101 . 0010011 @r2
 rol011 .. 001 . 0110011 @r
 ror011 .. 101 . 0110011 @r
 rori   01100  101 . 0010011 @sh
@@ -693,6 +696,10 @@ xnor   010 .. 100 . 0110011 @r
 clzw   011 0 . 001 . 0011011 @r2
 ctzw   011 1 . 001 . 0011011 @r2
 cpopw  011 00010 . 001 . 0011011 @r2
+# The encoding for rev8 differs between RV32 and RV64.
+# When executing on RV64, the encoding used in RV32 is an illegal
+# instruction, so we use different handler functions to differentiate.
+rev8_64011010 111000 . 101 . 0010011 @r2
 rolw   011 .. 001 . 0111011 @r
 roriw  011 .. 101 . 0011011 @sh5
 rorw   011 .. 101 . 0111011 @r
@@ -701,15 +708,10 @@ rorw   011 .. 101 . 0111011 @r
 pack   100 .. 100 . 0110011 @r
 packu  0100100 .. 100 . 0110011 @r
 packh  100 .. 111 . 0110011 @r
-grev   0110100 .. 101 . 0110011 @r
-grevi  01101. ... 101 . 0010011 @sh
 
 # *** RV64B Standard Extension (in addition to RV32B) ***
 packw  100 .. 100 . 0111011 @r
 packuw 0100100 .. 100 . 0111011 @r
-grevw  0110100 .. 101 . 0111011 @r
-
-greviw 0110100 .. 101 . 0011011 @sh5
 
 # *** RV32 Zbc Standard Extension ***
 clmul  101 
