Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ

2023-02-21 Thread Eugenio Perez Martin
On Wed, Feb 22, 2023 at 5:01 AM Jason Wang  wrote:
>
>
> On 2023/2/8 17:42, Eugenio Pérez wrote:
> > Devices with CVQ need to migrate state beyond vq state.  Leaving this
> > to future series.
>
>
> I may miss something but what is missed to support CVQ/MQ?
>

What is missing is restoring, before the data vqs start, all the device
state that was set through CVQ on the migration source (MAC, MQ, ...).
We don't have a reliable way to keep the data vqs from starting until
that state has been restored [1].

Thanks!

[1] https://lists.gnu.org/archive/html/qemu-devel/2023-01/msg02652.html
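
For illustration, the kind of restore step being described could look roughly
like this (a sketch only; vhost_vdpa_net_cvq_cmd() and the exact call sites are
hypothetical, not existing QEMU API):

/*
 * Sketch only: on the destination, replay the state the source had
 * programmed through CVQ before the data vqs are started.
 * vhost_vdpa_net_cvq_cmd() is a hypothetical helper that sends one
 * control command through the shadow CVQ and waits for its ack.
 */
static int vhost_vdpa_net_restore_cvq_state(VhostVDPAState *s,
                                            const VirtIONet *n)
{
    uint16_t mq = cpu_to_le16(n->curr_queue_pairs);
    int r;

    /* Re-program the MAC address. */
    r = vhost_vdpa_net_cvq_cmd(s, VIRTIO_NET_CTRL_MAC,
                               VIRTIO_NET_CTRL_MAC_ADDR_SET,
                               n->mac, sizeof(n->mac));
    if (r < 0) {
        return r;
    }

    /* Re-program the number of queue pairs (MQ). */
    return vhost_vdpa_net_cvq_cmd(s, VIRTIO_NET_CTRL_MQ,
                                  VIRTIO_NET_CTRL_MQ_VQ_PAIRS_SET,
                                  &mq, sizeof(mq));
}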

> Thanks
>
>
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> >   net/vhost-vdpa.c | 6 ++
> >   1 file changed, 6 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index bca13f97fd..309861e56c 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const 
> > char *name,
> >   }
> >
> >   if (has_cvq) {
> > +VhostVDPAState *s;
> > +
> >   nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
> >vdpa_device_fd, i, 1, false,
> >opts->x_svq, iova_range);
> >   if (!nc)
> >   goto err;
> > +
> > +s = DO_UPCAST(VhostVDPAState, nc, nc);
> > +error_setg(&s->vhost_vdpa.dev->migration_blocker,
> > +   "net vdpa cannot migrate with MQ feature");
> >   }
> >
> >   return 0;
>




Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier

2023-02-21 Thread Eugenio Perez Martin
On Wed, Feb 22, 2023 at 4:56 AM Jason Wang  wrote:
>
>
> On 2023/2/8 17:42, Eugenio Pérez wrote:
> > This allows net to restart the device backend to configure SVQ on it.
> >
> > Ideally, these changes should not be net specific. However, the vdpa net
> > backend is the one with enough knowledge to configure everything, for a
> > few reasons:
> > * Queues might need to be shadowed or not depending on their kind (control
> >vs data).
> > * Queues need to share the same map translations (iova tree).
> >
> > Because of that it is cleaner to restart the whole net backend and
> > configure again as expected, similar to how vhost-kernel moves between
> > userspace and passthrough.
> >
> > If more kinds of devices need dynamic switching to SVQ we can create a
> > callback struct like VhostOps and move most of the code there.
> > VhostOps cannot be reused since all vdpa backends share them, and to
> > personalize just for networking would be too heavy.
> >
> > Signed-off-by: Eugenio Pérez 
> > ---
> > v3:
> > * Add TODO to use the resume operation in the future.
> > * Use migration_in_setup and migration_has_failed instead of a
> >complicated switch case.
> > ---
> >   net/vhost-vdpa.c | 76 
> >   1 file changed, 76 insertions(+)
> >
> > diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
> > index dd686b4514..bca13f97fd 100644
> > --- a/net/vhost-vdpa.c
> > +++ b/net/vhost-vdpa.c
> > @@ -26,12 +26,14 @@
> >   #include 
> >   #include "standard-headers/linux/virtio_net.h"
> >   #include "monitor/monitor.h"
> > +#include "migration/misc.h"
> >   #include "hw/virtio/vhost.h"
> >
> >   /* Todo:need to add the multiqueue support here */
> >   typedef struct VhostVDPAState {
> >   NetClientState nc;
> >   struct vhost_vdpa vhost_vdpa;
> > +Notifier migration_state;
> >   VHostNetState *vhost_net;
> >
> >   /* Control commands shadow buffers */
> > @@ -241,10 +243,79 @@ static VhostVDPAState 
> > *vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
> >   return DO_UPCAST(VhostVDPAState, nc, nc0);
> >   }
> >
> > +static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool 
> > enable)
> > +{
> > +struct vhost_vdpa *v = &s->vhost_vdpa;
> > +VirtIONet *n;
> > +VirtIODevice *vdev;
> > +int data_queue_pairs, cvq, r;
> > +NetClientState *peer;
> > +
> > +/* We are only called on the first data vqs and only if x-svq is not 
> > set */
> > +if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
> > +return;
> > +}
> > +
> > +vdev = v->dev->vdev;
> > +n = VIRTIO_NET(vdev);
>
>
> Let's tweak the code to move those initialization to the beginning of
> the function.
>

Sure.

>
> > +if (!n->vhost_started) {
> > +return;
> > +}
>
>
> What happens if the vhost is started during the live migration?
>

This is solved in v3 by also checking the migration state in
vhost_vdpa_net_data_start_first [1]. However, that created a few more
complications and more complex code, as Si-Wei points out.

Recent changes due to virtio reset make it easier to move all of this
code to hw/virtio/vhost-vdpa.c, where the different kinds of vDPA
devices can share it. I'll send a new version that way.

>
> > +
> > +data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
> > +cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
> > +  n->max_ncs - n->max_queue_pairs : 0;
> > +/*
> > + * TODO: vhost_net_stop does suspend, get_base and reset. We can be 
> > smarter
> > + * in the future and resume the device if read-only operations between
> > + * suspend and reset goes wrong.
> > + */
> > +vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +
> > +peer = s->nc.peer;
> > +for (int i = 0; i < data_queue_pairs + cvq; i++) {
> > +VhostVDPAState *vdpa_state;
> > +NetClientState *nc;
> > +
> > +if (i < data_queue_pairs) {
> > +nc = qemu_get_peer(peer, i);
> > +} else {
> > +nc = qemu_get_peer(peer, n->max_queue_pairs);
> > +}
> > +
> > +vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
> > +vdpa_state->vhost_vdpa.shadow_data = enable;
> > +
> > +if (i < data_queue_pairs) {
> > +/* Do not override CVQ shadow_vqs_enabled */
> > +vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
> > +}
>
>
> I wonder what happens if the number of queue pairs is changed during
> live migration? Should we assign all qps in this case?
>

Migration is blocked in this series if the device has the CVQ feature.

Thanks!

[1] 
https://patchwork.kernel.org/project/qemu-devel/patch/20230215173850.298832-9-epere...@redhat.com/

> Thanks
>
>
> > +}
> > +
> > +r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
> > +if (unlikely(r < 0)) {
> > +error_report("unable to start vhost net: %s(%d)", g_strerror(-r), 
> > -r);
> > +}
> > +}
> > +

Re: [PATCH v3 21/24] gdbstub: only compile gdbstub twice for whole build

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

-# and BSD?
-specific_ss.add(when: 'CONFIG_LINUX_USER', if_true: files('user-target.c'))
+# The user-target is specialised by the guest
+specific_ss.add(when: 'CONFIG_USER_ONLY', if_true: files('user-target.c'))


Looks like this should be folded into patch 10.


r~



Re: [PATCH v3 10/24] gdbstub: move chunks of user code into own files

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

+# and BSD?
+specific_ss.add(when: 'CONFIG_LINUX_USER', if_true: files('user-target.c'))


Certainly, and BSD too.  It had been compiled there before you moved it.


r~



Re: [PATCH v3 5/6] meson: prefer 'sphinx-build' to 'sphinx-build-3'

2023-02-21 Thread Markus Armbruster
John Snow  writes:

> On Tue, Feb 21, 2023, 1:50 AM Markus Armbruster  wrote:
>
>> John Snow  writes:
>>
>> > Once upon a time, "sphinx-build" on certain RPM platforms invoked
>> > specifically a Python 2.x version, while "sphinx-build-3" was a distro
>> > shim for the Python 3.x version.
>> >
>> > These days, none of our supported platforms utilize a 2.x version, so it
>> > should be safe to search for 'sphinx-build' prior to 'sphinx-build-3',
>> > which will prefer pip/venv installed versions of sphinx if they're
>> > available.
>> >
>> > This adds an extremely convenient ability to test document building
>> > ability in QEMU across multiple versions of Sphinx for the purposes of
>> > compatibility testing.
>> >
>> > Signed-off-by: John Snow 
>> > ---
>> >  docs/meson.build | 2 +-
>> >  1 file changed, 1 insertion(+), 1 deletion(-)
>> >
>> > diff --git a/docs/meson.build b/docs/meson.build
>> > index 9136fed3b73..906034f9a87 100644
>> > --- a/docs/meson.build
>> > +++ b/docs/meson.build
>> > @@ -1,5 +1,5 @@
>> >  if get_option('sphinx_build') == ''
>> > -  sphinx_build = find_program(['sphinx-build-3', 'sphinx-build'],
>> > +  sphinx_build = find_program(['sphinx-build', 'sphinx-build-3'],
>> >required: get_option('docs'))
>> >  else
>> >sphinx_build = find_program(get_option('sphinx_build'),
>>
>> Do we still need to check for sphinx-build-3?  Or asked differently, is
>> there any supported build host that provides only sphinx-build-3?
>>
>
> Yes, modern Fedora still uses "sphinx-build-3" as the name in /usr/bin for
> the rpm-packaged version of sphinx.

For what it's worth, python3-sphinx-5.0.2-2.fc37.noarch provides

/usr/bin/sphinx-build
/usr/bin/sphinx-build-3
/usr/bin/sphinx-build-3.11

where the latter two are symbolic links to the first.  No need to check
for sphinx-build-3 here.

> It's just that the only platforms where "sphinx-build" is the 2.x version
> are platforms on which we want to drop 3.6 support anyway, so it's OK to
> invert the search priority in the context of this series.
>
> (All pip/pypi versions use "sphinx-build" as the binary name. In effect,
> this patch means we prefer pip/pypi versions if they're in your $PATH.)




Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs in CPUID.04H

2023-02-21 Thread Zhao Liu
Hi Xiaoyao,

Thanks, I've spent some time thinking about it here.

On Mon, Feb 20, 2023 at 02:59:20PM +0800, Xiaoyao Li wrote:
> Date: Mon, 20 Feb 2023 14:59:20 +0800
> From: Xiaoyao Li 
> Subject: Re: [PATCH RESEND 04/18] i386/cpu: Fix number of addressable IDs
>  in CPUID.04H
> 
> On 2/13/2023 5:36 PM, Zhao Liu wrote:
> > For i-cache and d-cache, the maximum IDs for CPUs sharing cache (
> > CPUID.04H.00H:EAX[bits 25:14] and CPUID.04H.01H:EAX[bits 25:14]) are
> > both 0, which means i-cache and d-cache are shared at the SMT level.
> > This is correct if there's a single thread per core, but is wrong for the
> > hyper-threading case (one core contains multiple threads), since the
> > i-cache and d-cache are shared at the core level rather than the SMT level.
> > 
> > Therefore, in order to be compatible with both multi-threaded and
> > single-threaded situations, we should set the i-cache and d-cache to be
> > shared at the core level by default.
> 
> It's true for a VM only when the exact HW topology is configured for the VM,
> i.e., two virtual LPs of one virtual CORE are pinned to two physical LPs
> of one physical CORE.

Yes, in this case the host and guest have the same topology, and their
topologies can match.

> Otherwise it's incorrect for VM.

My understanding here is that what we do in QEMU is create a
self-consistent CPU topology and cache topology for the guest.

If the VM topology is self-consistent and emulated to be almost
identical to a real machine, then the emulation in QEMU is correct,
right? ;-)

> 
> For example, given a VM of 4 threads and 2 cores: if the 4 threads are not
> pinned to 4 physical LPs of 2 CORES, it's likely that each vcpu runs on an LP
> of a different physical core.

Thanks for bringing this up, this is worth discussing.

I looked into it and found that the specific scheduling policy for the
vCPUs actually depends on the host settings. For example, (IIUC) if the
host enables core scheduling, then the host will schedule the vCPUs on
SMT threads of the same core.

Also, to explore the original purpose of the "per thread" i/d cache
topology, I have retraced its history.

The related commit should be from '09, namely 400281a (set CPUID bits
to present cores and threads topology). That commit added the
multithreading cache topology to CPUID.04H. In particular, it set the
L2 cache level to per core, but it did not change the level of
L1 (i/d cache), that is, L1 still remains per thread.

I think that is the problem: L1 should also be per core in the
multithreading case. (So the fix in this patch is worth it?)

Another thing we can refer to is that AMD's i/d cache topology is per
core rather than per thread (a different CPUID leaf than Intel's): in
encode_cache_cpuid8000001d() (target/i386/cpu.c), the i/d caches and L2
are encoded as core level in EAX. They presumably chose per core to
emulate the L1 topology of the real machine as well.

So, I guess this example is "unintentionally" benefiting from the
"per thread" level of i/d cache.

What do you think?

> So no vcpu shares L1i/L1d cache at core level.

Yes. The host's scheduling is not guaranteed, and workload balancing
policies in various scenarios, as well as some security mitigations, may
break the delicate balance we have carefully set up.

Perhaps another way is to also add a new option, "x-l1-cache-topo" (like
[1] did for L2), that can adjust the i/d cache level from core to SMT to
benefit cases like this.

[1]: https://lists.gnu.org/archive/html/qemu-devel/2023-02/msg03201.html
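
For reference, a minimal sketch of what "shared at the core level" means for
the CPUID.04H encoding (illustrative only, not the actual QEMU code;
EAX[25:14] holds the maximum number of addressable IDs for logical processors
sharing the cache, minus one):

#include <stdint.h>

/* Illustrative only: encode CPUID.04H:EAX[25:14] for an L1 cache that is
 * shared by all SMT threads of a core (core level) instead of by a single
 * thread (SMT level, which would encode 0). */
static uint32_t encode_l1_sharing_core_level(unsigned int threads_per_core)
{
    uint32_t addressable_ids = threads_per_core;   /* IDs sharing the cache */

    return ((addressable_ids - 1) & 0xfff) << 14;  /* field is 12 bits wide */
}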

Thanks,
Zhao




Re: [PATCH 3/5] bulk: Replace [g_]assert(0) -> g_assert_not_reached()

2023-02-21 Thread Richard Henderson

On 2/21/23 18:06, Thomas Huth wrote:

  int postcopy_wake_shared(struct PostCopyFD *pcfd,
   uint64_t client_addr,
   RAMBlock *rb)
  {
-    assert(0);
-    return -1;
+    g_assert_not_reached();
  }
  #endif


If we ever reconsider allowing compilation with G_DISABLE_ASSERT again,


... and we shouldn't [1] ...


this will fail to compile since the return is missing now, so this is kind of 
ugly ... would it make sense to replace this with g_assert_true(0) instead? Or 
use abort() directly?


With g_assert_true(0), definitely not.
That is a testing-only item which can be disabled at runtime.

With abort(), no, since g_assert_not_reached() prints file:line.
Indeed, I was suggesting the opposite -- to replace abort() without error_report() with 
g_assert_not_reached().



r~


[1] Allowing G_DISABLE_ASSERT and/or NDEBUG would only require that we invent
qemu-specific replacements that either (1) do exactly the same thing or (2) interact with
__builtin_unreachable() or __builtin_trap(), so that we tell the compiler exactly what's
going on with the expressions and flow control.
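
For illustration, one shape such a qemu-specific replacement could take (a
sketch only; the macro name is hypothetical and not an existing QEMU API):

#include <stdio.h>
#include <stdlib.h>

/* Hypothetical sketch: behaves like g_assert_not_reached() in normal builds,
 * and tells the compiler the path is truly unreachable when assertions are
 * compiled out, so no "missing return" warnings appear. */
#if defined(NDEBUG) || defined(G_DISABLE_ASSERT)
#define qemu_not_reached()  __builtin_unreachable()
#else
#define qemu_not_reached()                                           \
    do {                                                             \
        fprintf(stderr, "%s:%d: code should not be reached\n",       \
                __FILE__, __LINE__);                                 \
        abort();                                                     \
    } while (0)
#endif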





Re: [RFC v5 0/3] migration: reduce time of loading non-iterable vmstate

2023-02-21 Thread Chuang Xu

Hi, Peter

On 2023/2/22 4:36 AM, Peter Xu wrote:

On 2023/2/21 11:38 AM, Chuang Xu wrote:

I think we need a memory_region_transaction_commit_force() to force-commit
some transactions when loading vmstate. This function is designed like this:

/*
 * memory_region_transaction_commit_force() is designed to
 * force the mr transaction to be committed in the process
 * of loading vmstate.
 */
void memory_region_transaction_commit_force(void)

I would call this memory_region_transaction_do_commit(), and I don't think
the manipulation of memory_region_transaction_depth is needed here since we
don't release BQL during the whole process, so changing that depth isn't
needed at all to me.

So, I think we can...


{
     AddressSpace *as;
     unsigned int memory_region_transaction_depth_copy =
memory_region_transaction_depth;

     /*
  * Temporarily replace memory_region_transaction_depth with 0 to
prevent
  * memory_region_transaction_commit_force() and
address_space_to_flatview()
  * call each other recursively.
  */
     memory_region_transaction_depth = 0;

... drop here ...


Note that, as I mentioned in the comment, we temporarily replace this value
to prevent commit() and address_space_to_flatview() from calling each other
recursively and eventually overflowing the stack.

Part of the coredump call stack is attached here:

#8  0x558de5a998b5 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1131
#9  0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#10 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#11 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#12 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#13 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#14 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#15 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#16 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#17 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#18 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#19 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130
#20 address_space_get_flatview (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:810
#21 0x558de5a9a199 in address_space_update_ioeventfds (as=as@entry=0x558de6516060 
) at ../softmmu/memory.c:836
#22 0x558de5a99900 in memory_region_transaction_do_commit () at 
../softmmu/memory.c:1137
#23 memory_region_transaction_do_commit () at ../softmmu/memory.c:1125
#24 0x558de5a99dfd in address_space_to_flatview (as=0x558de6516060 
) at 
/data00/migration/qemu-open/include/exec/memory.h:1130

So I think we need to change the depth here.




     assert(qemu_mutex_iothread_locked());


     if (memory_region_update_pending) {
     flatviews_reset();

     MEMORY_LISTENER_CALL_GLOBAL(begin, Forward);

     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
     address_space_set_flatview(as);
     address_space_update_ioeventfds(as);
     }
     memory_region_update_pending = false;
     ioeventfd_update_pending = false;
     MEMORY_LISTENER_CALL_GLOBAL(commit, Forward);
     } else if (ioeventfd_update_pending) {
     QTAILQ_FOREACH(as, &address_spaces, address_spaces_link) {
     address_space_update_ioeventfds(as);
     }
     ioeventfd_update_pending = false;
     }

     /* recover memory_region_transaction_depth */
     memory_region_transaction_depth =
memory_region_transaction_depth_copy;

... drop here ...


}

... then call this new memory_region_transaction_do_commit() in
memory_region_transaction_commit().

void memory_region_transaction_commit(void)
{
 AddressSpace *as;

 assert(memory_region_transaction_depth);
 --memory_region_transaction_depth;
 memory_region_transaction_do_commit();
}

Then...


Now there are two options to use this function:
1. call it in address_space_to_flatview():

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
     /*
  * Before using any flatview, check whether we're during a memory
  * region transaction. If so, force commit the memory region
transaction.
  */
     if (memory_region_transaction_in_progress())

Here we need to add the condition that the BQL is held, or threads running
here without the BQL will trigger the assertion in
memory_region_transaction_commit_force().
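
Putting the two remarks together, option 1 would look roughly like this (a
sketch; memory_region_transaction_in_progress() and
memory_region_transaction_do_commit() are the helpers proposed in this thread,
not existing API):

static inline FlatView *address_space_to_flatview(AddressSpace *as)
{
    /*
     * If we are in the middle of a memory region transaction and we hold
     * the BQL (e.g. in the migration load thread), commit the pending
     * changes before using the flatview.
     */
    if (qemu_mutex_iothread_locked() &&
        memory_region_transaction_in_progress()) {
        memory_region_transaction_do_commit();
    }
    return qatomic_rcu_read(&as->current_map);
}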

I'm not sure whether this condition is sufficient, at least for the mr access
in the load thread it is sufficient (because the load thread will hold the BQL

Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage

2023-02-21 Thread Gavin Shan

On 2/22/23 10:58 AM, Peter Xu wrote:

On Wed, Feb 22, 2023 at 10:44:07AM +1100, Gavin Shan wrote:

Peter, could you please give me some hints to understand the atomic
and non-atomic update here? OK, I will drop this part of the changes in the
next revision, with the assumption that we have atomic update supported for
ARM64.


See commit f39b7d2b96.  Please don't remove the change in this patch.

The comment was just something I thought about when reading, not something
I suggested to change.

If we were to remove it, we'd need to remove the whole chunk, not just your
changes here.  Still, please take it with a grain of salt until someone can
help confirm, because I may be missing something else here.

In short: before we know anything solidly, your current code is exactly
correct, AFAICT.



Thanks, Peter. I think that's all for later. I will keep the changes, with
your r-b, in the next revision :)

Thanks,
Gavin




Re: [PATCH v10 1/9] mm: Introduce memfd_restricted system call to create restricted user memory

2023-02-21 Thread Alexey Kardashevskiy

On 14/1/23 08:54, Sean Christopherson wrote:

On Fri, Dec 02, 2022, Chao Peng wrote:

The system call is currently wired up for x86 arch.


Building on other architectures (except for arm64 for some reason) yields:

   CALL/.../scripts/checksyscalls.sh
   :1565:2: warning: #warning syscall memfd_restricted not implemented 
[-Wcpp]

Do we care?  It's the only such warning, which makes me think we either need to
wire this up for all architectures, or explicitly document that it's 
unsupported.


Signed-off-by: Kirill A. Shutemov 
Signed-off-by: Chao Peng 
---


...


diff --git a/include/linux/restrictedmem.h b/include/linux/restrictedmem.h
new file mode 100644
index ..c2700c5daa43
--- /dev/null
+++ b/include/linux/restrictedmem.h
@@ -0,0 +1,71 @@
+/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
+#ifndef _LINUX_RESTRICTEDMEM_H


Missing

  #define _LINUX_RESTRICTEDMEM_H

which causes fireworks if restrictedmem.h is included more than once.


+#include 
+#include 
+#include 


...


+static inline int restrictedmem_get_page(struct file *file, pgoff_t offset,
+struct page **pagep, int *order)
+{
+   return -1;


This should be a proper -errno, though in the current incarnation of things it's
a moot point because no stub is needed.  KVM can (and should) easily provide its
own stub for this one.


+}
+
+static inline bool file_is_restrictedmem(struct file *file)
+{
+   return false;
+}
+
+static inline void restrictedmem_error_page(struct page *page,
+   struct address_space *mapping)
+{
+}
+
+#endif /* CONFIG_RESTRICTEDMEM */
+
+#endif /* _LINUX_RESTRICTEDMEM_H */


...


diff --git a/mm/restrictedmem.c b/mm/restrictedmem.c
new file mode 100644
index ..56953c204e5c
--- /dev/null
+++ b/mm/restrictedmem.c
@@ -0,0 +1,318 @@
+// SPDX-License-Identifier: GPL-2.0
+#include "linux/sbitmap.h"
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+struct restrictedmem_data {


Any objection to simply calling this "restrictedmem"?  And then using either "rm"
or "rmem" for local variable names?  I kept reading "data" as the underlying data
being written to the page, as opposed to the metadata describing the restrictedmem
instance.


+   struct mutex lock;
+   struct file *memfd;
+   struct list_head notifiers;
+};
+
+static void restrictedmem_invalidate_start(struct restrictedmem_data *data,
+  pgoff_t start, pgoff_t end)
+{
+   struct restrictedmem_notifier *notifier;
+
+   mutex_lock(&data->lock);


This can be a r/w semaphore instead of a mutex, that way punching holes at 
multiple
points in the file can at least run the notifiers in parallel.  The actual 
allocation
by shmem will still be serialized, but I think it's worth the simple 
optimization
since zapping and flushing in KVM may be somewhat slow.
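
A minimal sketch of the r/w semaphore variant, also picking up the rename
suggested above (invalidation paths take the lock for read, bind/unbind would
take it for write):

#include <linux/rwsem.h>
#include <linux/list.h>
#include <linux/pagemap.h>

struct restrictedmem {
    struct rw_semaphore lock;   /* was: struct mutex lock */
    struct file *memfd;
    struct list_head notifiers;
};

static void restrictedmem_invalidate_start(struct restrictedmem *rmem,
                                           pgoff_t start, pgoff_t end)
{
    struct restrictedmem_notifier *notifier;

    /* Hole punchers at different offsets can run notifiers concurrently. */
    down_read(&rmem->lock);
    list_for_each_entry(notifier, &rmem->notifiers, list)
        notifier->ops->invalidate_start(notifier, start, end);
    up_read(&rmem->lock);
}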


+   list_for_each_entry(notifier, &data->notifiers, list) {
+   notifier->ops->invalidate_start(notifier, start, end);


Two major design issues that we overlooked long ago:

   1. Blindly invoking notifiers will not scale.  E.g. if userspace configures a
  VM with a large number of convertible memslots that are all backed by a
  single large restrictedmem instance, then converting a single page will
  result in a linear walk through all memslots.  I don't expect anyone to
  actually do something silly like that, but I also never expected there to 
be
  a legitimate usecase for thousands of memslots.

   2. This approach fails to provide the ability for KVM to ensure a guest has
  exclusive access to a page.  As discussed in the past, the kernel can rely
  on hardware (and maybe ARM's pKVM implementation?) for those guarantees, 
but
  only for SNP and TDX VMs.  For VMs where userspace is trusted to some 
extent,
  e.g. SEV, there is value in ensuring a 1:1 association.

  And probably more importantly, relying on hardware for SNP and TDX yields 
a
  poor ABI and complicates KVM's internals.  If the kernel doesn't 
guarantee a
  page is exclusive to a guest, i.e. if userspace can hand out the same page
  from a restrictedmem instance to multiple VMs, then failure will occur 
only
  when KVM tries to assign the page to the second VM.  That will happen deep
  in KVM, which means KVM needs to gracefully handle such errors, and it 
means
  that KVM's ABI effectively allows plumbing garbage into its memslots.

Rather than use a simple list of notifiers, this appears to be yet another
opportunity to use an xarray.  Supporting sharing of restrictedmem will be
non-trivial, but IMO we should punt that to the future since it's still unclear
exactly how sharing will work.

An xarray will solve #1 by notifying only the consumers (memslots) that are 
bound
to the affected range.
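
As an illustration of #1, a rough sketch of range-based notification with an
xarray keyed by page offset (assuming each binding is stored over its bound
range with xa_store_range() at bind time):

#include <linux/xarray.h>

/* Sketch only: walk just the notifiers whose bound range overlaps
 * [start, end), instead of every registered notifier. */
static void restrictedmem_invalidate_range(struct xarray *bindings,
                                           pgoff_t start, pgoff_t end)
{
    struct restrictedmem_notifier *notifier;
    unsigned long index;

    xa_for_each_range(bindings, index, notifier, start, end - 1)
        notifier->ops->invalidate_start(notifier, start, end);
}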

And for #2, it's relatively straightforward (knock wood) to detect 

Re: [PATCH v1 5/6] hw/arm/virt: Enable backup bitmap for dirty ring

2023-02-21 Thread Gavin Shan

On 2/22/23 3:27 AM, Peter Maydell wrote:

On Mon, 13 Feb 2023 at 00:40, Gavin Shan  wrote:


When KVM device "kvm-arm-gicv3" or "arm-its-kvm" is used, we have to
enable the backup bitmap for the dirty ring. Otherwise, the migration
will fail because those two devices are using the backup bitmap to track
dirty guest memory, corresponding to various hardware tables.

Signed-off-by: Gavin Shan 
Reviewed-by: Juan Quintela 
---
  hw/arm/virt.c| 26 ++
  target/arm/kvm64.c   | 25 +
  target/arm/kvm_arm.h | 12 
  3 files changed, 63 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index 75f28947de..ea9bd98a65 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -2024,6 +2024,8 @@ static void machvirt_init(MachineState *machine)
  VirtMachineClass *vmc = VIRT_MACHINE_GET_CLASS(machine);
  MachineClass *mc = MACHINE_GET_CLASS(machine);
  const CPUArchIdList *possible_cpus;
+const char *gictype = NULL;
+const char *itsclass = NULL;
  MemoryRegion *sysmem = get_system_memory();
  MemoryRegion *secure_sysmem = NULL;
  MemoryRegion *tag_sysmem = NULL;
@@ -2071,6 +2073,30 @@ static void machvirt_init(MachineState *machine)
   */
  finalize_gic_version(vms);

+/*
+ * When "kvm-arm-gicv3" or "arm-its-kvm" is used, the backup dirty
+ * bitmap has to be enabled for KVM dirty ring, before any memory
+ * slot is added. Otherwise, the migration will fail with the dirty
+ * ring.
+ */
+if (kvm_enabled()) {
+if (vms->gic_version != VIRT_GIC_VERSION_2) {
+gictype = gicv3_class_name();
+}
+
+if (vms->gic_version != VIRT_GIC_VERSION_2 && vms->its) {
+itsclass = its_class_name();
+}
+
+if (((gictype && !strcmp(gictype, "kvm-arm-gicv3")) ||
+ (itsclass && !strcmp(itsclass, "arm-its-kvm"))) &&
+!kvm_arm_enable_dirty_ring_with_bitmap()) {
+error_report("Failed to enable the backup bitmap for "
+ "KVM dirty ring");
+exit(1);
+}
+}


Why does this need to be board-specific code? Is there
some way we can just do the right thing automatically?
Why does the GIC/ITS matter?

The kernel should already know whether we have asked it
to do something that needs this extra extension, so
I think we ought to be able in the generic "enable the
dirty ring" code say "if the kernel says we need this
extra thing, also enable this extra thing". Or if that's
too early, we can do the extra part in a generic hook a
bit later.

In the future there might be other things, presumably,
that need the backup bitmap, so it would be more future
proof not to need to also change QEMU to add extra
logic checks that duplicate the logic the kernel already has.



When the dirty ring is enabled, a per-vcpu buffer is used to track the dirty
pages. The prerequisite for using the per-vcpu buffer is a running VCPU
context. There are two cases where no running VCPU context exists and the
backup bitmap extension is needed, as far as we know so far: (a) save/restore
of the GICv3 tables; (b) save/restore of the ITS tables. These two cases are
related to the KVM devices "kvm-arm-gicv3" and "arm-its-kvm", which are only
needed by the virt machine at present. So we don't need the backup bitmap
extension for other boards.

The host kernel always exports the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP
for ARM64. The capability isn't exported for x86 because we don't need it
there. The generic path to enable the extension would be in kvm_init(). I
think the extension can be enabled unconditionally if it has been exported by
the host kernel. It means there will be unnecessary overhead to synchronize
the backup bitmap when the aforementioned KVM devices aren't used, but the
overhead should be very small and acceptable. The only concern is that the
host kernel has to allocate the backup bitmap, which then goes completely
unused. Please let me know your thoughts, Peter.


qemu_init
  qemu_create_machine
  :
  configure_accelerators
    do_configure_accelerator
      accel_init_machine
        kvm_init            // where the dirty ring is enabled; would be the
          kvm_arch_init     // best place to enable the backup bitmap extension,
          :                 // regardless of whether 'kvm-arm-gicv3' or
          :                 // 'arm-its-kvm' is used
  qmp_x_exit_preconfig
    qemu_init_board
      machine_run_board_init
        machvirt_init       // memory slots are added here, and this is where
        :                   // we know whether the KVM devices are used
        :
  accel_setup_post          // no backend for KVM yet, xen only

Note that the capability KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP can't be enabled if
the dirty ring isn't enabled or memory slots have been added.
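
For reference, the generic enablement described above would amount to a few
lines in kvm_init(), something like this sketch (enable the cap whenever the
kernel advertises it and the dirty ring is in use, before any memslot is
added):

/* Sketch: right after the dirty ring is enabled in kvm_init(). */
if (s->kvm_dirty_ring_size &&
    kvm_check_extension(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP)) {
    ret = kvm_vm_enable_cap(s, KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP, 0);
    if (ret) {
        error_report("Enabling KVM_CAP_DIRTY_LOG_RING_WITH_BITMAP failed: %s",
                     strerror(-ret));
    }
}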

Thanks,
Gavin




[PATCH] plugin: fix clearing of plugin_mem_cbs before TB exit

2023-02-21 Thread Emilio Cota
Currently we are wrongly accessing plugin_tb->mem_helper at
translation time from plugin_gen_disable_mem_helpers, which is
called before generating a TB exit, e.g. with exit_tb.

Recall that it is only during TB finalisation, i.e. when we go over
the TB post-translation to inject or remove plugin instrumentation,
that plugin_tb->mem_helper is set.
plugin_mem_cbs when calling plugin_gen_disable_mem_helpers since
mem_helper is always false. Therefore a guest instruction that uses
helpers and emits an explicit TB exit results in plugin_mem_cbs being
set upon exiting, which is caught by an assertion as reported in
the reopening of issue #1381 and replicated below.

Fix this by (1) adding an insertion point before exiting a TB
("before_exit"), and (2) deciding whether or not to emit the
clearing of plugin_mem_cbs at this newly-added insertion point
during TB finalisation.

While at it, suffix plugin_gen_disable_mem_helpers with _before_exit
to make its intent more clear.

- Before:
$ ./qemu-system-riscv32 -M spike -nographic -plugin 
contrib/plugins/libexeclog.so -d plugin,in_asm,op
IN:
Priv: 3; Virt: 0
0x1000:  0297  auipc   t0,0
# 0x1000
0x1004:  02828613  addia2,t0,40
0x1008:  f1402573  csrrs   a0,mhartid,zero

OP:
 ld_i32 tmp1,env,$0xfff0
 brcond_i32 tmp1,$0x0,lt,$L0

  1000 
 mov_i64 tmp2,$0x7ff9940152e0
 ld_i32 tmp1,env,$0xef80
 call plugin(0x7ff9edbcb6f0),$0x11,$0,tmp1,tmp2
 mov_i32 x5/t0,$0x1000

  1004 
 mov_i64 tmp2,$0x7ff9940153e0
 ld_i32 tmp1,env,$0xef80
 call plugin(0x7ff9edbcb6f0),$0x11,$0,tmp1,tmp2
 add_i32 x12/a2,x5/t0,$0x28

  1008 f1402573
 mov_i64 tmp2,$0x7ff9940136c0
 st_i64 tmp2,env,$0xef68
 mov_i64 tmp2,$0x7ff994015530
 ld_i32 tmp1,env,$0xef80
 call plugin(0x7ff9edbcb6f0),$0x11,$0,tmp1,tmp2 <-- sets plugin_mem_cbs
 call csrr,$0x0,$1,x10/a0,env,$0xf14  <-- helper that might access memory
 mov_i32 pc,$0x100c
 exit_tb $0x0  <-- exit_tb right after the helper; missing clearing of 
plugin_mem_cbs
 mov_i64 tmp2,$0x0
 st_i64 tmp2,env,$0xef68 <-- after_insn clearing: dead due to 
exit_tb above
 set_label $L0
 exit_tb $0x7ff9a443 <-- again, missing clearing (doesn't matter due to $L0 
label)

0, 0x1000, 0x297, "0297  auipc   t0,0   
 # 0x1000"
0, 0x1004, 0x2828613, "02828613  addia2,t0,40"
**
ERROR:../accel/tcg/cpu-exec.c:983:cpu_exec_loop: assertion failed: 
(cpu->plugin_mem_cbs == ((void *)0))
Bail out! ERROR:../accel/tcg/cpu-exec.c:983:cpu_exec_loop: assertion failed: 
(cpu->plugin_mem_cbs == ((void *)0))
Aborted (core dumped)

- After:
$ ./qemu-system-riscv32 -M spike -nographic -plugin 
contrib/plugins/libexeclog.so -d plugin,in_asm,op
(snip)
 call plugin(0x7f19bc9e36f0),$0x11,$0,tmp1,tmp2 <-- sets plugin_mem_cbs
 call csrr,$0x0,$1,x10/a0,env,$0xf14
 mov_i32 pc,$0x100c
 mov_i64 tmp2,$0x0
 st_i64 tmp2,env,$0xef68 <-- before_exit clearing of plugin_mem_cbs
 exit_tb $0x0
 mov_i64 tmp2,$0x0
 st_i64 tmp2,env,$0xef68 <-- after_insn clearing (dead)
 set_label $L0
 mov_i64 tmp2,$0x0
 st_i64 tmp2,env,$0xef68 <-- before_exit clearing (doesn't matter 
due to $L0)
 exit_tb $0x7f38c843
[...]

Fixes: #1381
Signed-off-by: Emilio Cota 
---
 accel/tcg/plugin-gen.c| 44 ---
 include/exec/plugin-gen.h |  4 ++--
 include/qemu/plugin.h |  3 ---
 tcg/tcg-op.c  |  6 +++---
 4 files changed, 28 insertions(+), 29 deletions(-)

diff --git a/accel/tcg/plugin-gen.c b/accel/tcg/plugin-gen.c
index 17a686bd9e..b4fc171b8e 100644
--- a/accel/tcg/plugin-gen.c
+++ b/accel/tcg/plugin-gen.c
@@ -67,6 +67,7 @@ enum plugin_gen_from {
 PLUGIN_GEN_FROM_INSN,
 PLUGIN_GEN_FROM_MEM,
 PLUGIN_GEN_AFTER_INSN,
+PLUGIN_GEN_BEFORE_EXIT,
 PLUGIN_GEN_N_FROMS,
 };
 
@@ -177,6 +178,7 @@ static void plugin_gen_empty_callback(enum plugin_gen_from 
from)
 {
 switch (from) {
 case PLUGIN_GEN_AFTER_INSN:
+case PLUGIN_GEN_BEFORE_EXIT:
 gen_wrapped(from, PLUGIN_GEN_DISABLE_MEM_HELPER,
 gen_empty_mem_helper);
 break;
@@ -575,7 +577,7 @@ static void inject_mem_helper(TCGOp *begin_op, GArray *arr)
  * that we can read them at run-time (i.e. when the helper executes).
  * This run-time access is performed from qemu_plugin_vcpu_mem_cb.
  *
- * Note that plugin_gen_disable_mem_helpers undoes (2). Since it
+ * Note that inject_mem_disable_helper undoes (2). Since it
  * is possible that the code we generate after the instruction is
  * dead, we also add checks before generating tb_exit etc.
  */
@@ -600,7 +602,6 @@ static void inject_mem_enable_helper(struct qemu_plugin_tb 
*ptb,
 rm_ops(begin_op);
 return;
 }
-ptb->mem_helper = true;
 
 arr = 

Re: [PATCH v2 13/13] vdpa: return VHOST_F_LOG_ALL in vhost-vdpa devices

2023-02-21 Thread Jason Wang



On 2023/2/8 17:42, Eugenio Pérez wrote:

vhost-vdpa devices can return this feature now that blockers have been
set in case some features are not met.

Expose VHOST_F_LOG_ALL only in that case.

Signed-off-by: Eugenio Pérez 
---



Acked-by: Jason Wang 

Thanks



  hw/virtio/vhost-vdpa.c | 3 +--
  1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 13a86a2bb1..5fddc77c5c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -1319,10 +1319,9 @@ static int vhost_vdpa_set_vring_call(struct vhost_dev 
*dev,
  static int vhost_vdpa_get_features(struct vhost_dev *dev,
   uint64_t *features)
  {
-struct vhost_vdpa *v = dev->opaque;
  int ret = vhost_vdpa_get_dev_features(dev, features);
  
-if (ret == 0 && v->shadow_vqs_enabled) {

+if (ret == 0) {
  /* Add SVQ logging capabilities */
  *features |= BIT_ULL(VHOST_F_LOG_ALL);
  }





Re: [PATCH 3/5] bulk: Replace [g_]assert(0) -> g_assert_not_reached()

2023-02-21 Thread Thomas Huth

On 22/02/2023 00.25, Philippe Mathieu-Daudé wrote:

In order to avoid warnings such as the ones addressed in commit c0a6665c3c
("target/i386: Remove compilation errors when -Werror=maybe-uninitialized"),
replace all assert(0) and g_assert(0) calls by g_assert_not_reached().

Remove any code following g_assert_not_reached().

See previous commit for rationale.

Signed-off-by: Philippe Mathieu-Daudé 
---

diff --git a/docs/spin/aio_notify_accept.promela 
b/docs/spin/aio_notify_accept.promela
index 9cef2c955d..f929d30328 100644
--- a/docs/spin/aio_notify_accept.promela
+++ b/docs/spin/aio_notify_accept.promela
@@ -118,7 +118,7 @@ accept_if_req_not_eventually_false:
  if
  :: req -> goto accept_if_req_not_eventually_false;
  fi;
-assert(0);
+g_assert_not_reached();
  }


This does not look like C code ... is it safe to replace the statement here?


diff --git a/docs/spin/aio_notify_bug.promela b/docs/spin/aio_notify_bug.promela
index b3bfca1ca4..ce6f5177ed 100644
--- a/docs/spin/aio_notify_bug.promela
+++ b/docs/spin/aio_notify_bug.promela
@@ -106,7 +106,7 @@ accept_if_req_not_eventually_false:
  if
  :: req -> goto accept_if_req_not_eventually_false;
  fi;
-assert(0);
+g_assert_not_reached();
  }


dito


diff --git a/migration/postcopy-ram.c b/migration/postcopy-ram.c
index f54f44d899..59c8032a21 100644
--- a/migration/postcopy-ram.c
+++ b/migration/postcopy-ram.c
@@ -1347,49 +1347,42 @@ int postcopy_ram_incoming_init(MigrationIncomingState 
*mis)
  
  int postcopy_ram_incoming_cleanup(MigrationIncomingState *mis)

  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_ram_prepare_discard(MigrationIncomingState *mis)

  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_request_shared_page(struct PostCopyFD *pcfd, RAMBlock *rb,

   uint64_t client_addr, uint64_t rb_offset)
  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_ram_incoming_setup(MigrationIncomingState *mis)

  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_place_page(MigrationIncomingState *mis, void *host, void *from,

  RAMBlock *rb)
  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_place_page_zero(MigrationIncomingState *mis, void *host,

  RAMBlock *rb)
  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  
  int postcopy_wake_shared(struct PostCopyFD *pcfd,

   uint64_t client_addr,
   RAMBlock *rb)
  {
-assert(0);
-return -1;
+g_assert_not_reached();
  }
  #endif


If we ever reconsider allowing compilation with G_DISABLE_ASSERT again, this
will fail to compile since the return is missing now, so this is kind of
ugly ... would it make sense to replace this with g_assert_true(0) instead?
Or use abort() directly?


 Thomas




Re: [PATCH v2 11/13] vdpa: block migration if dev does not have _F_SUSPEND

2023-02-21 Thread Jason Wang



On 2023/2/8 17:42, Eugenio Pérez wrote:

Next patches enable devices to be migrated even if vdpa netdev has not
been started with x-svq. However, not all devices are migratable, so we
need to block migration if we detect that.

Block vhost-vdpa device migration if it does not offer _F_SUSPEND and it
has not been started with x-svq.

Signed-off-by: Eugenio Pérez 
---
  hw/virtio/vhost-vdpa.c | 21 +
  1 file changed, 21 insertions(+)

diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
index 84a6b9690b..9d30cf9b3c 100644
--- a/hw/virtio/vhost-vdpa.c
+++ b/hw/virtio/vhost-vdpa.c
@@ -442,6 +442,27 @@ static int vhost_vdpa_init(struct vhost_dev *dev, void 
*opaque, Error **errp)
  return 0;
  }
  
+/*

+ * If dev->shadow_vqs_enabled at initialization that means the device has
+ * been started with x-svq=on, so don't block migration
+ */
+if (dev->migration_blocker == NULL && !v->shadow_vqs_enabled) {
+uint64_t backend_features;
+
+/* We don't have dev->backend_features yet */
+ret = vhost_vdpa_call(dev, VHOST_GET_BACKEND_FEATURES,
+  &backend_features);
+if (unlikely(ret)) {
+error_setg_errno(errp, -ret, "Could not get backend features");
+return ret;
+}
+
+if (!(backend_features & BIT_ULL(VHOST_BACKEND_F_SUSPEND))) {
+error_setg(&dev->migration_blocker,
+"vhost-vdpa backend lacks VHOST_BACKEND_F_SUSPEND feature.");
+}



I wonder why not let the device decide? For a networking device, we can
probably live without suspend.


Thanks



+}
+
  /*
   * Similar to VFIO, we end up pinning all guest memory and have to
   * disable discarding of RAM.





Re: [PATCH v2 09/13] vdpa net: block migration if the device has CVQ

2023-02-21 Thread Jason Wang



On 2023/2/8 17:42, Eugenio Pérez wrote:

Devices with CVQ need to migrate state beyond vq state.  Leaving this
to future series.



I may miss something but what is missed to support CVQ/MQ?

Thanks




Signed-off-by: Eugenio Pérez 
---
  net/vhost-vdpa.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index bca13f97fd..309861e56c 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -955,11 +955,17 @@ int net_init_vhost_vdpa(const Netdev *netdev, const char 
*name,
  }
  
  if (has_cvq) {

+VhostVDPAState *s;
+
  nc = net_vhost_vdpa_init(peer, TYPE_VHOST_VDPA, name,
   vdpa_device_fd, i, 1, false,
   opts->x_svq, iova_range);
  if (!nc)
  goto err;
+
+s = DO_UPCAST(VhostVDPAState, nc, nc);
+error_setg(&s->vhost_vdpa.dev->migration_blocker,
+   "net vdpa cannot migrate with MQ feature");
  }
  
  return 0;





Re: [PATCH v2 07/13] vdpa: add vdpa net migration state notifier

2023-02-21 Thread Jason Wang



On 2023/2/8 17:42, Eugenio Pérez wrote:

This allows net to restart the device backend to configure SVQ on it.

Ideally, these changes should not be net specific. However, the vdpa net
backend is the one with enough knowledge to configure everything, for a
few reasons:
* Queues might need to be shadowed or not depending on their kind (control
   vs data).
* Queues need to share the same map translations (iova tree).

Because of that it is cleaner to restart the whole net backend and
configure again as expected, similar to how vhost-kernel moves between
userspace and passthrough.

If more kinds of devices need dynamic switching to SVQ we can create a
callback struct like VhostOps and move most of the code there.
VhostOps cannot be reused since all vdpa backends share them, and to
personalize just for networking would be too heavy.

Signed-off-by: Eugenio Pérez 
---
v3:
* Add TODO to use the resume operation in the future.
* Use migration_in_setup and migration_has_failed instead of a
   complicated switch case.
---
  net/vhost-vdpa.c | 76 
  1 file changed, 76 insertions(+)

diff --git a/net/vhost-vdpa.c b/net/vhost-vdpa.c
index dd686b4514..bca13f97fd 100644
--- a/net/vhost-vdpa.c
+++ b/net/vhost-vdpa.c
@@ -26,12 +26,14 @@
  #include 
  #include "standard-headers/linux/virtio_net.h"
  #include "monitor/monitor.h"
+#include "migration/misc.h"
  #include "hw/virtio/vhost.h"
  
  /* Todo:need to add the multiqueue support here */

  typedef struct VhostVDPAState {
  NetClientState nc;
  struct vhost_vdpa vhost_vdpa;
+Notifier migration_state;
  VHostNetState *vhost_net;
  
  /* Control commands shadow buffers */

@@ -241,10 +243,79 @@ static VhostVDPAState 
*vhost_vdpa_net_first_nc_vdpa(VhostVDPAState *s)
  return DO_UPCAST(VhostVDPAState, nc, nc0);
  }
  
+static void vhost_vdpa_net_log_global_enable(VhostVDPAState *s, bool enable)

+{
+struct vhost_vdpa *v = &s->vhost_vdpa;
+VirtIONet *n;
+VirtIODevice *vdev;
+int data_queue_pairs, cvq, r;
+NetClientState *peer;
+
+/* We are only called on the first data vqs and only if x-svq is not set */
+if (s->vhost_vdpa.shadow_vqs_enabled == enable) {
+return;
+}
+
+vdev = v->dev->vdev;
+n = VIRTIO_NET(vdev);



Let's tweak the code to move those initialization to the beginning of 
the function.




+if (!n->vhost_started) {
+return;
+}



What happens if the vhost is started during the live migration?



+
+data_queue_pairs = n->multiqueue ? n->max_queue_pairs : 1;
+cvq = virtio_vdev_has_feature(vdev, VIRTIO_NET_F_CTRL_VQ) ?
+  n->max_ncs - n->max_queue_pairs : 0;
+/*
+ * TODO: vhost_net_stop does suspend, get_base and reset. We can be smarter
+ * in the future and resume the device if read-only operations between
+ * suspend and reset goes wrong.
+ */
+vhost_net_stop(vdev, n->nic->ncs, data_queue_pairs, cvq);
+
+peer = s->nc.peer;
+for (int i = 0; i < data_queue_pairs + cvq; i++) {
+VhostVDPAState *vdpa_state;
+NetClientState *nc;
+
+if (i < data_queue_pairs) {
+nc = qemu_get_peer(peer, i);
+} else {
+nc = qemu_get_peer(peer, n->max_queue_pairs);
+}
+
+vdpa_state = DO_UPCAST(VhostVDPAState, nc, nc);
+vdpa_state->vhost_vdpa.shadow_data = enable;
+
+if (i < data_queue_pairs) {
+/* Do not override CVQ shadow_vqs_enabled */
+vdpa_state->vhost_vdpa.shadow_vqs_enabled = enable;
+}



I wonder what happens if the number of queue pairs is changed during 
live migration? Should we assign all qps in this case?


Thanks



+}
+
+r = vhost_net_start(vdev, n->nic->ncs, data_queue_pairs, cvq);
+if (unlikely(r < 0)) {
+error_report("unable to start vhost net: %s(%d)", g_strerror(-r), -r);
+}
+}
+
+static void vdpa_net_migration_state_notifier(Notifier *notifier, void *data)
+{
+MigrationState *migration = data;
+VhostVDPAState *s = container_of(notifier, VhostVDPAState,
+ migration_state);
+
+if (migration_in_setup(migration)) {
+vhost_vdpa_net_log_global_enable(s, true);
+} else if (migration_has_failed(migration)) {
+vhost_vdpa_net_log_global_enable(s, false);
+}
+}
+
  static void vhost_vdpa_net_data_start_first(VhostVDPAState *s)
  {
  struct vhost_vdpa *v = >vhost_vdpa;
  
+add_migration_state_change_notifier(&s->migration_state);

  if (v->shadow_vqs_enabled) {
  v->iova_tree = vhost_iova_tree_new(v->iova_range.first,
 v->iova_range.last);
@@ -278,6 +349,10 @@ static void vhost_vdpa_net_client_stop(NetClientState *nc)
  
  assert(nc->info->type == NET_CLIENT_DRIVER_VHOST_VDPA);
  
+if (s->vhost_vdpa.index == 0) {

+remove_migration_state_change_notifier(&s->migration_state);
+}
+
  

Re: [PATCH 2/5] scripts/checkpatch.pl: Do not allow assert(0)

2023-02-21 Thread Thomas Huth

On 22/02/2023 00.25, Philippe Mathieu-Daudé wrote:

Since commit 262a69f428 ("osdep.h: Prohibit disabling assert()
in supported builds") we cannot build QEMU with NDEBUG (or
G_DISABLE_ASSERT) defined, thus 'assert(0)' always aborts QEMU.

However some static analyzers / compilers don't notice that NDEBUG
can't be defined and emit warnings if code is used after an
'assert(0)' call. See for example commit c0a6665c3c ("target/i386:
Remove compilation errors when -Werror=maybe-uninitialized").


commit c0a6665c3c only uses g_assert_not_reached(), so that looks like a bad
example for your assert(0) case?


 Thomas




Re: [PATCH v2 04/13] vdpa: move vhost reset after get vring base

2023-02-21 Thread Jason Wang
On Tue, Feb 21, 2023 at 3:08 PM Eugenio Perez Martin
 wrote:
>
> On Tue, Feb 21, 2023 at 6:36 AM Jason Wang  wrote:
> >
> >
> > On 2023/2/8 17:42, Eugenio Pérez wrote:
> > > The function vhost.c:vhost_dev_stop calls the vhost operation
> > > vhost_dev_start(false). In the case of vdpa it totally resets and wipes
> > > the device, making the fetching of the vring base (virtqueue state)
> > > totally useless.
> > >
> > > The kernel backend does not use the vhost_dev_start vhost op callback, but
> > > vhost-user does. A patch to make vhost_user_dev_start more similar to vdpa
> > > is desirable, but it can be added on top.
> > >
> > > Signed-off-by: Eugenio Pérez 
> > > ---
> > >   include/hw/virtio/vhost-backend.h |  4 
> > >   hw/virtio/vhost-vdpa.c| 22 --
> > >   hw/virtio/vhost.c |  3 +++
> > >   3 files changed, 23 insertions(+), 6 deletions(-)
> > >
> > > diff --git a/include/hw/virtio/vhost-backend.h 
> > > b/include/hw/virtio/vhost-backend.h
> > > index c5ab49051e..ec3fbae58d 100644
> > > --- a/include/hw/virtio/vhost-backend.h
> > > +++ b/include/hw/virtio/vhost-backend.h
> > > @@ -130,6 +130,9 @@ typedef bool (*vhost_force_iommu_op)(struct vhost_dev 
> > > *dev);
> > >
> > >   typedef int (*vhost_set_config_call_op)(struct vhost_dev *dev,
> > >  int fd);
> > > +
> > > +typedef void (*vhost_reset_status_op)(struct vhost_dev *dev);
> > > +
> > >   typedef struct VhostOps {
> > >   VhostBackendType backend_type;
> > >   vhost_backend_init vhost_backend_init;
> > > @@ -177,6 +180,7 @@ typedef struct VhostOps {
> > >   vhost_get_device_id_op vhost_get_device_id;
> > >   vhost_force_iommu_op vhost_force_iommu;
> > >   vhost_set_config_call_op vhost_set_config_call;
> > > +vhost_reset_status_op vhost_reset_status;
> > >   } VhostOps;
> > >
> > >   int vhost_backend_update_device_iotlb(struct vhost_dev *dev,
> > > diff --git a/hw/virtio/vhost-vdpa.c b/hw/virtio/vhost-vdpa.c
> > > index cbbe92ffe8..26e38a6aab 100644
> > > --- a/hw/virtio/vhost-vdpa.c
> > > +++ b/hw/virtio/vhost-vdpa.c
> > > @@ -1152,14 +1152,23 @@ static int vhost_vdpa_dev_start(struct vhost_dev 
> > > *dev, bool started)
> > >   if (started) {
> > >   memory_listener_register(&v->listener, &address_space_memory);
> > >   return vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_DRIVER_OK);
> > > -} else {
> > > -vhost_vdpa_reset_device(dev);
> > > -vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > -   VIRTIO_CONFIG_S_DRIVER);
> > > -memory_listener_unregister(&v->listener);
> > > +}
> > >
> > > -return 0;
> > > +return 0;
> > > +}
> > > +
> > > +static void vhost_vdpa_reset_status(struct vhost_dev *dev)
> > > +{
> > > +struct vhost_vdpa *v = dev->opaque;
> > > +
> > > +if (dev->vq_index + dev->nvqs != dev->vq_index_end) {
> > > +return;
> > >   }
> > > +
> > > +vhost_vdpa_reset_device(dev);
> > > +vhost_vdpa_add_status(dev, VIRTIO_CONFIG_S_ACKNOWLEDGE |
> > > +VIRTIO_CONFIG_S_DRIVER);
> > > +memory_listener_unregister(&v->listener);
> > >   }
> > >
> > >   static int vhost_vdpa_set_log_base(struct vhost_dev *dev, uint64_t base,
> > > @@ -1346,4 +1355,5 @@ const VhostOps vdpa_ops = {
> > >   .vhost_vq_get_addr = vhost_vdpa_vq_get_addr,
> > >   .vhost_force_iommu = vhost_vdpa_force_iommu,
> > >   .vhost_set_config_call = vhost_vdpa_set_config_call,
> > > +.vhost_reset_status = vhost_vdpa_reset_status,
> > >   };
> > > diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
> > > index eb8c4c378c..a266396576 100644
> > > --- a/hw/virtio/vhost.c
> > > +++ b/hw/virtio/vhost.c
> > > @@ -2049,6 +2049,9 @@ void vhost_dev_stop(struct vhost_dev *hdev, 
> > > VirtIODevice *vdev, bool vrings)
> > >hdev->vqs + i,
> > >hdev->vq_index + i);
> > >   }
> > > +if (hdev->vhost_ops->vhost_reset_status) {
> > > +hdev->vhost_ops->vhost_reset_status(hdev);
> > > +}
> >
> >
> > This looks racy: if we don't suspend/reset the device, the device can move
> > last_avail_idx even after get_vring_base()?
> >
> > Instead of doing things like this, should we fallback to
> > virtio_queue_restore_last_avail_idx() in this case?
> >
>
> Right, we can track if the device is suspended / SVQ and then return
> an error in vring_get_base if it is not. Would that work?

When we don't support suspend, yes.

Thanks
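
For illustration, the check being discussed could look roughly like this (a
sketch only; the 'suspended' flag is hypothetical here):

static int vhost_vdpa_get_vring_base(struct vhost_dev *dev,
                                     struct vhost_vring_state *ring)
{
    struct vhost_vdpa *v = dev->opaque;

    if (v->shadow_vqs_enabled) {
        /* SVQ already tracks the last avail index in the guest's vring. */
        ring->num = virtio_queue_get_last_avail_idx(dev->vdev, ring->index);
        return 0;
    }

    if (!v->suspended) {
        /* Hypothetical flag: without suspend the value would be racy,
         * so let vhost fall back to the index it last knew about. */
        return -1;
    }

    return vhost_vdpa_call(dev, VHOST_GET_VRING_BASE, ring);
}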

>
> Thanks!
>
> > Thanks
> >
> >
> > >
> > >   if (vhost_dev_has_iommu(hdev)) {
> > >   if (hdev->vhost_ops->vhost_set_iotlb_callback) {
> >
>
>




Re: [PATCH v8 0/8] Introduce igb

2023-02-21 Thread Jason Wang
On Mon, Feb 20, 2023 at 3:04 PM Akihiko Odaki  wrote:
>
> On 2023/02/20 16:01, Jason Wang wrote:
> >
> > On 2023/2/6 20:30, Akihiko Odaki wrote:
> >> Hi Jason,
> >>
> >> Let me remind that every patches in this series now has Reviewed-by:
> >> or Acked-by: tag though I forgot to include tags the prior versions of
> >> this series received to the latest version:
> >
> >
> > No worries, I can do that.
> >
> > But when I try, it doesn't apply cleanly on master, are there any
> > dependence I missed?
> >
> > # git am *.eml
> > Applying: pcie: Introduce pcie_sriov_num_vfs
> > Applying: e1000: Split header files
> > error: patch failed: hw/net/e1000_regs.h:470
> > error: hw/net/e1000_regs.h: patch does not apply
> > error: patch failed: hw/net/e1000x_common.c:29
> > error: hw/net/e1000x_common.c: patch does not apply
> > Patch failed at 0002 e1000: Split header files
> > hint: Use 'git am --show-current-patch' to see the failed patch
> > When you have resolved this problem, run "git am --continue".
> > If you prefer to skip this patch, run "git am --skip" instead.
> > To restore the original branch and stop patching, run "git am --abort".
>
> It is Based-on: <20230201033539.30049-1-akihiko.od...@daynix.com>.
> ([PATCH v5 00/29] e1000x cleanups (preliminary for IGB))
>
> Please apply the series first.

The e1000x cleanups apply cleanly, but when I try to apply the igb series, I get:

# git am *.eml
Applying: pcie: Introduce pcie_sriov_num_vfs
Applying: e1000: Split header files
Applying: Intrdocue igb device emulation
Applying: tests/qtest/e1000e-test: Fabricate ethernet header
Applying: tests/qtest/libqos/e1000e: Export macreg functions
Applying: igb: Introduce qtest for igb device
error: patch failed: tests/qtest/meson.build:256
error: tests/qtest/meson.build: patch does not apply
Patch failed at 0006 igb: Introduce qtest for igb device
hint: Use 'git am --show-current-patch' to see the failed patch
When you have resolved this problem, run "git am --continue".
If you prefer to skip this patch, run "git am --skip" instead.
To restore the original branch and stop patching, run "git am --abort".

The patches seem to require a rebase.

Thanks

>
> Regards,
> Akihiko Odaki
>
> >
> > Thanks
> >
> >
> >>
> >> "Introduce igb"
> >> https://lore.kernel.org/qemu-devel/dbbp189mb143365704198dc9a0684dea595...@dbbp189mb1433.eurp189.prod.outlook.com/
> >>
> >> "docs/system/devices/igb: Add igb documentation"
> >> https://lore.kernel.org/qemu-devel/741a0975-9f7a-b4bc-9651-cf45f03d1...@kaod.org/
> >>
> >> Regards,
> >> Akihiko Odaki
> >>
> >> On 2023/02/04 13:36, Akihiko Odaki wrote:
> >>> Based-on: <20230201033539.30049-1-akihiko.od...@daynix.com>
> >>> ([PATCH v5 00/29] e1000x cleanups (preliminary for IGB))
> >>>
> >>> igb is a family of Intel's gigabit ethernet controllers. This series
> >>> implements
> >>> 82576 emulation in particular. You can see the last patch for the
> >>> documentation.
> >>>
> >>> Note that there is another effort to bring 82576 emulation. This
> >>> series was
> >>> developed independently by Sriram Yagnaraman.
> >>> https://lists.gnu.org/archive/html/qemu-devel/2022-12/msg04670.html
> >>>
> >>> V7 -> V8:
> >>> - Removed obsolete patch
> >>>"hw/net/net_tx_pkt: Introduce net_tx_pkt_get_eth_hdr" (Cédric Le
> >>> Goater)
> >>>
> >>> V6 -> V7:
> >>> - Reordered statements in igb_receive_internal() so that checksum
> >>> will be
> >>>calculated only once and it will be more close to
> >>> e1000e_receive_internal().
> >>>
> >>> V5 -> V6:
> >>> - Rebased.
> >>> - Renamed "test" to "packet" in tests/qtest/e1000e-test.c.
> >>> - Fixed Rx logic so that a Rx pool without enough space won't prevent
> >>> other
> >>>pools from receiving, based on Sriram Yagnaraman's work.
> >>>
> >>> V4 -> V5:
> >>> - Rebased.
> >>> - Squashed patches to copy from e1000e code and modify it.
> >>> - Listed the implemented features.
> >>> - Added a check for interrupts availability on PF.
> >>> - Fixed the declaration of igb_receive_internal(). (Sriram Yagnaraman)
> >>>
> >>> V3 -> V4:
> >>> - Rebased.
> >>> - Corrected PCIDevice specified for DMA.
> >>>
> >>> V2 -> V3:
> >>> - Rebased.
> >>> - Fixed PCIDevice reference in hw/net/igbvf.c.
> >>> - Fixed TX packet switching when VM loopback is enabled.
> >>> - Fixed VMDq enablement check.
> >>> - Fixed RX descriptor length parser.
> >>> - Fixed the definitions of RQDPC readers.
> >>> - Implemented VLAN VM filter.
> >>> - Implemented VT_CTL.Def_PL.
> >>> - Implemented the combination of VMDq and RSS.
> >>> - Noted that igb is tested with Windows HLK.
> >>>
> >>> V1 -> V2:
> >>> - Spun off e1000e general improvements to a distinct series.
> >>> - Restored vnet_hdr offload as there seems nothing preventing from that.
> >>>
> >>> Akihiko Odaki (8):
> >>>pcie: Introduce pcie_sriov_num_vfs
> >>>e1000: Split header files
> >>>Intrdocue igb device emulation
> >>>tests/qtest/e1000e-test: Fabricate ethernet header
> >>>tests/qtest/libqos/e1000e: Export macreg 

Re: [PATCH v4 6/9] hw/i386/pc: Initialize ram_memory variable directly

2023-02-21 Thread Xiaoyao Li

On 2/14/2023 12:20 AM, Bernhard Beschow wrote:

Going through pc_memory_init() seems quite complicated for a simple
assignment.



...


diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index 5bde4533cc..00ba725656 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -143,6 +143,7 @@ static void pc_init1(MachineState *machine,
  if (xen_enabled()) {
   xen_hvm_init_pc(pcms, &ram_memory);
  } else {
+ram_memory = machine->ram;
  if (!pcms->max_ram_below_4g) {
  pcms->max_ram_below_4g = 0xe000; /* default: 3.5G */
  }
@@ -205,8 +206,7 @@ static void pc_init1(MachineState *machine,
  
  /* allocate ram and load rom/bios */

  if (!xen_enabled()) {
-pc_memory_init(pcms, system_memory,
-   rom_memory, &ram_memory, hole64_size);


IMHO, it seems more proper to put

+ram_memory = machine->ram;

here rather than above.


+pc_memory_init(pcms, system_memory, rom_memory, hole64_size);
  } else {
  pc_system_flash_cleanup_unused(pcms);
  if (machine->kernel_filename != NULL) {





[PATCH v3 06/25] target/arm: Update SCR and HCR for RME

2023-02-21 Thread Richard Henderson
Define the missing SCR and HCR bits, allow SCR_NSE and {SCR,HCR}_GPF
to be set, and invalidate TLBs when NSE changes.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h|  5 +++--
 target/arm/helper.c | 10 --
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index b046f96e4e..230241cf93 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -1650,7 +1650,7 @@ static inline void xpsr_write(CPUARMState *env, uint32_t 
val, uint32_t mask)
 #define HCR_TERR  (1ULL << 36)
 #define HCR_TEA   (1ULL << 37)
 #define HCR_MIOCNCE   (1ULL << 38)
-/* RES0 bit 39 */
+#define HCR_TME   (1ULL << 39)
 #define HCR_APK   (1ULL << 40)
 #define HCR_API   (1ULL << 41)
 #define HCR_NV(1ULL << 42)
@@ -1659,7 +1659,7 @@ static inline void xpsr_write(CPUARMState *env, uint32_t 
val, uint32_t mask)
 #define HCR_NV2   (1ULL << 45)
 #define HCR_FWB   (1ULL << 46)
 #define HCR_FIEN  (1ULL << 47)
-/* RES0 bit 48 */
+#define HCR_GPF   (1ULL << 48)
 #define HCR_TID4  (1ULL << 49)
 #define HCR_TICAB (1ULL << 50)
 #define HCR_AMVOFFEN  (1ULL << 51)
@@ -1724,6 +1724,7 @@ static inline void xpsr_write(CPUARMState *env, uint32_t 
val, uint32_t mask)
 #define SCR_TRNDR (1ULL << 40)
 #define SCR_ENTP2 (1ULL << 41)
 #define SCR_GPF   (1ULL << 48)
+#define SCR_NSE   (1ULL << 62)
 
 #define HSTR_TTEE (1 << 16)
 #define HSTR_TJDBX (1 << 17)
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 37d9267fb4..3650234c73 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1875,6 +1875,9 @@ static void scr_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
 if (cpu_isar_feature(aa64_fgt, cpu)) {
 valid_mask |= SCR_FGTEN;
 }
+if (cpu_isar_feature(aa64_rme, cpu)) {
+valid_mask |= SCR_NSE | SCR_GPF;
+}
 } else {
 valid_mask &= ~(SCR_RW | SCR_ST);
 if (cpu_isar_feature(aa32_ras, cpu)) {
@@ -1904,10 +1907,10 @@ static void scr_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
 env->cp15.scr_el3 = value;
 
 /*
- * If SCR_EL3.NS changes, i.e. arm_is_secure_below_el3, then
+ * If SCR_EL3.{NS,NSE} changes, i.e. change of security state,
  * we must invalidate all TLBs below EL3.
  */
-if (changed & SCR_NS) {
+if (changed & (SCR_NS | SCR_NSE)) {
 tlb_flush_by_mmuidx(env_cpu(env), (ARMMMUIdxBit_E10_0 |
ARMMMUIdxBit_E20_0 |
ARMMMUIdxBit_E10_1 |
@@ -5655,6 +5658,9 @@ static void do_hcr_write(CPUARMState *env, uint64_t 
value, uint64_t valid_mask)
 if (cpu_isar_feature(aa64_fwb, cpu)) {
 valid_mask |= HCR_FWB;
 }
+if (cpu_isar_feature(aa64_rme, cpu)) {
+valid_mask |= HCR_GPF;
+}
 }
 
 if (cpu_isar_feature(any_evt, cpu)) {
-- 
2.34.1




[PATCH v3 10/25] include/exec/memattrs: Add two bits of space to MemTxAttrs

2023-02-21 Thread Richard Henderson
We will need 2 bits to represent ARMSecuritySpace.

Do not attempt to replace or widen secure, even though it
logically overlaps the new field -- there are uses within
e.g. hw/block/pflash_cfi01.c, which don't know anything
specific about ARM.
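
As a standalone illustration of why keeping both fields is harmless, here is
a minimal stand-in for MemTxAttrs (the struct and helper names are invented
for the example, not the real definitions):

    typedef struct {
        unsigned int secure:1;  /* what generic devices such as pflash keep testing */
        unsigned int space:2;   /* what RME-aware ARM code will test */
    } Attrs;

    /* Generic code keeps working unchanged: it only ever looks at .secure. */
    static int device_sees_secure(Attrs a) { return a.secure; }

    /* ARM code can distinguish all four security spaces. */
    static int access_is_realm(Attrs a) { return a.space == 3 /* Realm */; }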

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 include/exec/memattrs.h | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/exec/memattrs.h b/include/exec/memattrs.h
index 9fb98bc1ef..d04170aa27 100644
--- a/include/exec/memattrs.h
+++ b/include/exec/memattrs.h
@@ -29,10 +29,17 @@ typedef struct MemTxAttrs {
  * "didn't specify" if necessary.
  */
 unsigned int unspecified:1;
-/* ARM/AMBA: TrustZone Secure access
+/*
+ * ARM/AMBA: TrustZone Secure access
  * x86: System Management Mode access
  */
 unsigned int secure:1;
+/*
+ * ARM: ArmSecuritySpace.  This partially overlaps secure, but it is
+ * easier to have both fields to assist code that does not understand
+ * ARMv9 RME, or no specific knowledge of ARM at all (e.g. pflash).
+ */
+unsigned int space:2;
 /* Memory access is usermode (unprivileged) */
 unsigned int user:1;
 /*
-- 
2.34.1




[PATCH v3 15/25] target/arm: NSTable is RES0 for the RME EL3 regime

2023-02-21 Thread Richard Henderson
Test in_space instead of in_secure so that we don't switch
out of Root space.  Handle the output space change immediately,
rather than trying to combine the NSTable and NS bits later.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 27 ++-
 1 file changed, 14 insertions(+), 13 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index a77db3dd43..8b3deb0884 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1238,7 +1238,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 {
 ARMCPU *cpu = env_archcpu(env);
 ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
-bool is_secure = ptw->in_secure;
 int32_t level;
 ARMVAParameters param;
 uint64_t ttbr;
@@ -1254,7 +1253,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 uint64_t descaddrmask;
 bool aarch64 = arm_el_is_aa64(env, el);
 uint64_t descriptor, new_descriptor;
-bool nstable;
 
 /* TODO: This code does not support shareability levels. */
 if (aarch64) {
@@ -1415,20 +1413,19 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 descaddrmask = MAKE_64BIT_MASK(0, 40);
 }
 descaddrmask &= ~indexmask_grainsize;
-
-/*
- * Secure accesses start with the page table in secure memory and
- * can be downgraded to non-secure at any step. Non-secure accesses
- * remain non-secure. We implement this by just ORing in the NSTable/NS
- * bits at each step.
- */
-tableattrs = is_secure ? 0 : (1 << 4);
+tableattrs = 0;
 
  next_level:
 descaddr |= (address >> (stride * (4 - level))) & indexmask;
 descaddr &= ~7ULL;
-nstable = extract32(tableattrs, 4, 1);
-if (nstable && ptw->in_secure) {
+
+/*
+ * Process the NSTable bit from the previous level.  This changes
+ * the table address space and the output space from Secure to
+ * NonSecure.  With RME, the EL3 translation regime does not change
+ * from Root to NonSecure.
+ */
+if (extract32(tableattrs, 4, 1) && ptw->in_space == ARMSS_Secure) {
 /*
  * Stage2_S -> Stage2 or Phys_S -> Phys_NS
  * Assert the relative order of the secure/non-secure indexes.
@@ -1437,7 +1434,11 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
 ptw->in_ptw_idx += 1;
 ptw->in_secure = false;
+ptw->in_space = ARMSS_NonSecure;
+result->f.attrs.secure = false;
+result->f.attrs.space = ARMSS_NonSecure;
 }
+
 if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
 goto do_fault;
 }
@@ -1540,7 +1541,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
  */
 attrs = new_descriptor & (MAKE_64BIT_MASK(2, 10) | MAKE_64BIT_MASK(50, 
14));
 if (!regime_is_stage2(mmu_idx)) {
-attrs |= nstable << 5; /* NS */
+attrs |= !ptw->in_secure << 5; /* NS */
 if (!param.hpd) {
 attrs |= extract64(tableattrs, 0, 2) << 53; /* XN, PXN */
 /*
-- 
2.34.1




[PATCH NOTFORMERGE v3 24/25] target/arm: Enable RME for -cpu max

2023-02-21 Thread Richard Henderson
Add a cpu property to set GPCCR_EL3.L0GPTSZ, for testing
various possible configurations.
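
A usage sketch for the new property (the rest of the command line is
illustrative only):

    qemu-system-aarch64 -M virt -cpu max,l0gptsz=34 ...

Values other than 30, 34, 36 and 39 are rejected by the setter below.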

Signed-off-by: Richard Henderson 
---
 target/arm/cpu64.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/target/arm/cpu64.c b/target/arm/cpu64.c
index 4066950da1..70c173ee3d 100644
--- a/target/arm/cpu64.c
+++ b/target/arm/cpu64.c
@@ -672,6 +672,40 @@ void arm_cpu_lpa2_finalize(ARMCPU *cpu, Error **errp)
 cpu->isar.id_aa64mmfr0 = t;
 }
 
+static void cpu_max_set_l0gptsz(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+uint32_t value;
+
+if (!visit_type_uint32(v, name, , errp)) {
+return;
+}
+
+/* Encode the value for the GPCCR_EL3 field. */
+switch (value) {
+case 30:
+case 34:
+case 36:
+case 39:
+cpu->reset_l0gptsz = value - 30;
+break;
+default:
+error_setg(errp, "invalid value for l0gptsz");
+error_append_hint(errp, "valid values are 30, 34, 36, 39\n");
+break;
+}
+}
+
+static void cpu_max_get_l0gptsz(Object *obj, Visitor *v, const char *name,
+void *opaque, Error **errp)
+{
+ARMCPU *cpu = ARM_CPU(obj);
+uint32_t value = cpu->reset_l0gptsz + 30;
+
+visit_type_uint32(v, name, , errp);
+}
+
 static void aarch64_a57_initfn(Object *obj)
 {
 ARMCPU *cpu = ARM_CPU(obj);
@@ -1200,6 +1234,7 @@ static void aarch64_max_initfn(Object *obj)
 t = FIELD_DP64(t, ID_AA64PFR0, SVE, 1);
 t = FIELD_DP64(t, ID_AA64PFR0, SEL2, 1);  /* FEAT_SEL2 */
 t = FIELD_DP64(t, ID_AA64PFR0, DIT, 1);   /* FEAT_DIT */
+t = FIELD_DP64(t, ID_AA64PFR0, RME, 1);   /* FEAT_RME */
 t = FIELD_DP64(t, ID_AA64PFR0, CSV2, 2);  /* FEAT_CSV2_2 */
 t = FIELD_DP64(t, ID_AA64PFR0, CSV3, 1);  /* FEAT_CSV3 */
 cpu->isar.id_aa64pfr0 = t;
@@ -1301,6 +1336,8 @@ static void aarch64_max_initfn(Object *obj)
 object_property_add(obj, "sve-max-vq", "uint32", cpu_max_get_sve_max_vq,
 cpu_max_set_sve_max_vq, NULL, NULL);
 qdev_property_add_static(DEVICE(obj), _cpu_lpa2_property);
+object_property_add(obj, "l0gptsz", "uint32", cpu_max_get_l0gptsz,
+cpu_max_set_l0gptsz, NULL, NULL);
 }
 
 static const ARMCPUInfo aarch64_cpus[] = {
-- 
2.34.1




[PATCH v3 07/25] target/arm: SCR_EL3.NS may be RES1

2023-02-21 Thread Richard Henderson
With RME, SEL2 must also be present to support secure state.
The NS bit is RES1 if SEL2 is not present.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 3650234c73..ae8b3f6a48 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -1856,6 +1856,9 @@ static void scr_write(CPUARMState *env, const 
ARMCPRegInfo *ri, uint64_t value)
 }
 if (cpu_isar_feature(aa64_sel2, cpu)) {
 valid_mask |= SCR_EEL2;
+} else if (cpu_isar_feature(aa64_rme, cpu)) {
+/* With RME and without SEL2, NS is RES1 (R_GSWWH, I_DJJQJ). */
+value |= SCR_NS;
 }
 if (cpu_isar_feature(aa64_mte, cpu)) {
 valid_mask |= SCR_ATA;
-- 
2.34.1




[PATCH v3 04/25] target/arm: Rewrite check_s2_mmu_setup

2023-02-21 Thread Richard Henderson
Integrate neighboring code from get_phys_addr_lpae which computed
starting level, as it is easier to validate when doing both at the
same time.  Mirror the checks at the start of AArch{64,32}.S2Walk,
especially S2InvalidSL and S2InconsistentSL.

This reverts 49ba115bb74, which was incorrect -- there is nothing
in the ARM pseudocode that depends on TxSZ, i.e. outputsize; the
pseudocode is consistent in referencing PAMax.

Fixes: 49ba115bb74 ("target/arm: Pass outputsize down to check_s2_mmu_setup")
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 173 ++-
 1 file changed, 97 insertions(+), 76 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 2b125fff44..6fb72fb086 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1077,70 +1077,119 @@ static ARMVAParameters aa32_va_parameters(CPUARMState 
*env, uint32_t va,
  * check_s2_mmu_setup
  * @cpu:ARMCPU
  * @is_aa64:True if the translation regime is in AArch64 state
- * @startlevel: Suggested starting level
- * @inputsize:  Bitsize of IPAs
+ * @tcr:VTCR_EL2 or VSTCR_EL2
+ * @ds: Effective value of TCR.DS.
+ * @iasize: Bitsize of IPAs
  * @stride: Page-table stride (See the ARM ARM)
  *
- * Returns true if the suggested S2 translation parameters are OK and
- * false otherwise.
+ * Decode the starting level of the S2 lookup, returning INT_MIN if
+ * the configuration is invalid.
  */
-static bool check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, int level,
-   int inputsize, int stride, int outputsize)
+static int check_s2_mmu_setup(ARMCPU *cpu, bool is_aa64, uint64_t tcr,
+  bool ds, int iasize, int stride)
 {
-const int grainsize = stride + 3;
-int startsizecheck;
-
-/*
- * Negative levels are usually not allowed...
- * Except for FEAT_LPA2, 4k page table, 52-bit address space, which
- * begins with level -1.  Note that previous feature tests will have
- * eliminated this combination if it is not enabled.
- */
-if (level < (inputsize == 52 && stride == 9 ? -1 : 0)) {
-return false;
-}
-
-startsizecheck = inputsize - ((3 - level) * stride + grainsize);
-if (startsizecheck < 1 || startsizecheck > stride + 4) {
-return false;
-}
+int sl0, sl2, startlevel, granulebits, levels;
+int s1_min_iasize, s1_max_iasize;
 
+sl0 = extract32(tcr, 6, 2);
 if (is_aa64) {
+/*
+ * AArch64.S2InvalidTxSZ: While we checked tsz_oob near the top of
+ * get_phys_addr_lpae, that used aa64_va_parameters which apply
+ * to aarch64.  If Stage1 is aarch32, the min_txsz is larger.
+ * See AArch64.S2MinTxSZ, where min_tsz is 24, translated to
+ * inputsize is 64 - 24 = 40.
+ */
+if (iasize < 40 && !arm_el_is_aa64(>env, 1)) {
+goto fail;
+}
+
+/*
+ * AArch64.S2InvalidSL: Interpretation of SL depends on the page size,
+ * so interleave AArch64.S2StartLevel.
+ */
 switch (stride) {
-case 13: /* 64KB Pages.  */
-if (level == 0 || (level == 1 && outputsize <= 42)) {
-return false;
+case 9: /* 4KB */
+/* SL2 is RES0 unless DS=1 & 4KB granule. */
+sl2 = extract64(tcr, 33, 1);
+if (ds && sl2) {
+if (sl0 != 0) {
+goto fail;
+}
+startlevel = -1;
+} else {
+startlevel = 2 - sl0;
+switch (sl0) {
+case 2:
+if (arm_pamax(cpu) < 44) {
+goto fail;
+}
+break;
+case 3:
+if (!cpu_isar_feature(aa64_st, cpu)) {
+goto fail;
+}
+startlevel = 3;
+break;
+}
 }
 break;
-case 11: /* 16KB Pages.  */
-if (level == 0 || (level == 1 && outputsize <= 40)) {
-return false;
+case 11: /* 16KB */
+switch (sl0) {
+case 2:
+if (arm_pamax(cpu) < 42) {
+goto fail;
+}
+break;
+case 3:
+if (!ds) {
+goto fail;
+}
+break;
 }
+startlevel = 3 - sl0;
 break;
-case 9: /* 4KB Pages.  */
-if (level == 0 && outputsize <= 42) {
-return false;
+case 13: /* 64KB */
+switch (sl0) {
+case 2:
+if (arm_pamax(cpu) < 44) {
+goto fail;
+}
+break;
+case 3:
+goto fail;
 }
+startlevel = 3 - sl0;
 break;

[PATCH v3 02/25] target/arm: Stub arm_hcr_el2_eff for m-profile

2023-02-21 Thread Richard Henderson
M-profile doesn't have HCR_EL2.  While we could test features
before each call, zero is a generally safe return value to
disable the code in the caller.  This test is required to
avoid an assert in arm_is_secure_below_el3.

Signed-off-by: Richard Henderson 
---
 target/arm/helper.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/helper.c b/target/arm/helper.c
index 07d4100365..37d9267fb4 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -5788,6 +5788,9 @@ uint64_t arm_hcr_el2_eff_secstate(CPUARMState *env, bool 
secure)
 
 uint64_t arm_hcr_el2_eff(CPUARMState *env)
 {
+if (arm_feature(env, ARM_FEATURE_M)) {
+return 0;
+}
 return arm_hcr_el2_eff_secstate(env, arm_is_secure_below_el3(env));
 }
 
-- 
2.34.1




[PATCH v3 18/25] target/arm: Use get_phys_addr_with_struct in S1_ptw_translate

2023-02-21 Thread Richard Henderson
Do not provide a fast-path for physical addresses,
as those will need to be validated for GPC.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 35 ++-
 1 file changed, 14 insertions(+), 21 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index fc4c1ccf54..8a31af60c9 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -232,29 +232,22 @@ static bool S1_ptw_translate(CPUARMState *env, 
S1Translate *ptw,
  * From gdbstub, do not use softmmu so that we don't modify the
  * state of the cpu at all, including softmmu tlb contents.
  */
-if (regime_is_stage2(s2_mmu_idx)) {
-S1Translate s2ptw = {
-.in_mmu_idx = s2_mmu_idx,
-.in_ptw_idx = arm_space_to_phys(space),
-.in_space = space,
-.in_secure = is_secure,
-.in_debug = true,
-};
-GetPhysAddrResult s2 = { };
+S1Translate s2ptw = {
+.in_mmu_idx = s2_mmu_idx,
+.in_ptw_idx = arm_space_to_phys(space),
+.in_space = space,
+.in_secure = is_secure,
+.in_debug = true,
+};
+GetPhysAddrResult s2 = { };
 
-if (get_phys_addr_lpae(env, , addr, MMU_DATA_LOAD,
-   false, , fi)) {
-goto fail;
-}
-ptw->out_phys = s2.f.phys_addr;
-pte_attrs = s2.cacheattrs.attrs;
-pte_secure = s2.f.attrs.secure;
-} else {
-/* Regime is physical. */
-ptw->out_phys = addr;
-pte_attrs = 0;
-pte_secure = is_secure;
+if (get_phys_addr_with_struct(env, , addr,
+  MMU_DATA_LOAD, , fi)) {
+goto fail;
 }
+ptw->out_phys = s2.f.phys_addr;
+pte_attrs = s2.cacheattrs.attrs;
+pte_secure = s2.f.attrs.secure;
 ptw->out_host = NULL;
 ptw->out_rw = false;
 } else {
-- 
2.34.1




[PATCH v3 05/25] target/arm: Add isar_feature_aa64_rme

2023-02-21 Thread Richard Henderson
Add the missing field for ID_AA64PFR0, and the predicate.
Disable it if EL3 is forced off by the board or command-line.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h | 6 ++
 target/arm/cpu.c | 4 
 2 files changed, 10 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index cb4e405f04..b046f96e4e 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2190,6 +2190,7 @@ FIELD(ID_AA64PFR0, SEL2, 36, 4)
 FIELD(ID_AA64PFR0, MPAM, 40, 4)
 FIELD(ID_AA64PFR0, AMU, 44, 4)
 FIELD(ID_AA64PFR0, DIT, 48, 4)
+FIELD(ID_AA64PFR0, RME, 52, 4)
 FIELD(ID_AA64PFR0, CSV2, 56, 4)
 FIELD(ID_AA64PFR0, CSV3, 60, 4)
 
@@ -3808,6 +3809,11 @@ static inline bool isar_feature_aa64_sel2(const 
ARMISARegisters *id)
 return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, SEL2) != 0;
 }
 
+static inline bool isar_feature_aa64_rme(const ARMISARegisters *id)
+{
+return FIELD_EX64(id->id_aa64pfr0, ID_AA64PFR0, RME) != 0;
+}
+
 static inline bool isar_feature_aa64_vh(const ARMISARegisters *id)
 {
 return FIELD_EX64(id->id_aa64mmfr1, ID_AA64MMFR1, VH) != 0;
diff --git a/target/arm/cpu.c b/target/arm/cpu.c
index 876ab8f3bf..83685ed247 100644
--- a/target/arm/cpu.c
+++ b/target/arm/cpu.c
@@ -1947,6 +1947,10 @@ static void arm_cpu_realizefn(DeviceState *dev, Error 
**errp)
 cpu->isar.id_dfr0 = FIELD_DP32(cpu->isar.id_dfr0, ID_DFR0, COPSDBG, 0);
 cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
ID_AA64PFR0, EL3, 0);
+
+/* Disable the realm management extension, which requires EL3. */
+cpu->isar.id_aa64pfr0 = FIELD_DP64(cpu->isar.id_aa64pfr0,
+   ID_AA64PFR0, RME, 0);
 }
 
 if (!cpu->has_el2) {
-- 
2.34.1




[PATCH v3 14/25] target/arm: Pipe ARMSecuritySpace through ptw.c

2023-02-21 Thread Richard Henderson
Add input and output space members to S1Translate.
Set and adjust them in S1_ptw_translate, and the
various points at which we drop secure state.
Initialize the space in get_phys_addr; for now
leave get_phys_addr_with_secure considering only
secure vs non-secure spaces.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 98 ++--
 1 file changed, 78 insertions(+), 20 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 9f608b12b2..a77db3dd43 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -19,11 +19,13 @@
 typedef struct S1Translate {
 ARMMMUIdx in_mmu_idx;
 ARMMMUIdx in_ptw_idx;
+ARMSecuritySpace in_space;
 bool in_secure;
 bool in_debug;
 bool out_secure;
 bool out_rw;
 bool out_be;
+ARMSecuritySpace out_space;
 hwaddr out_virt;
 hwaddr out_phys;
 void *out_host;
@@ -216,6 +218,7 @@ static bool S2_attrs_are_device(uint64_t hcr, uint8_t attrs)
 static bool S1_ptw_translate(CPUARMState *env, S1Translate *ptw,
  hwaddr addr, ARMMMUFaultInfo *fi)
 {
+ARMSecuritySpace space = ptw->in_space;
 bool is_secure = ptw->in_secure;
 ARMMMUIdx mmu_idx = ptw->in_mmu_idx;
 ARMMMUIdx s2_mmu_idx = ptw->in_ptw_idx;
@@ -232,7 +235,8 @@ static bool S1_ptw_translate(CPUARMState *env, S1Translate 
*ptw,
 if (regime_is_stage2(s2_mmu_idx)) {
 S1Translate s2ptw = {
 .in_mmu_idx = s2_mmu_idx,
-.in_ptw_idx = is_secure ? ARMMMUIdx_Phys_S : ARMMMUIdx_Phys_NS,
+.in_ptw_idx = arm_space_to_phys(space),
+.in_space = space,
 .in_secure = is_secure,
 .in_debug = true,
 };
@@ -290,10 +294,17 @@ static bool S1_ptw_translate(CPUARMState *env, 
S1Translate *ptw,
 }
 
 /* Check if page table walk is to secure or non-secure PA space. */
-ptw->out_secure = (is_secure
-   && !(pte_secure
+if (is_secure) {
+bool out_secure = !(pte_secure
 ? env->cp15.vstcr_el2 & VSTCR_SW
-: env->cp15.vtcr_el2 & VTCR_NSW));
+: env->cp15.vtcr_el2 & VTCR_NSW);
+if (!out_secure) {
+is_secure = false;
+space = ARMSS_NonSecure;
+}
+}
+ptw->out_secure = is_secure;
+ptw->out_space = space;
 ptw->out_be = regime_translation_big_endian(env, mmu_idx);
 return true;
 
@@ -324,7 +335,10 @@ static uint32_t arm_ldl_ptw(CPUARMState *env, S1Translate 
*ptw,
 }
 } else {
 /* Page tables are in MMIO. */
-MemTxAttrs attrs = { .secure = ptw->out_secure };
+MemTxAttrs attrs = {
+.secure = ptw->out_secure,
+.space = ptw->out_space,
+};
 AddressSpace *as = arm_addressspace(cs, attrs);
 MemTxResult result = MEMTX_OK;
 
@@ -367,7 +381,10 @@ static uint64_t arm_ldq_ptw(CPUARMState *env, S1Translate 
*ptw,
 #endif
 } else {
 /* Page tables are in MMIO. */
-MemTxAttrs attrs = { .secure = ptw->out_secure };
+MemTxAttrs attrs = {
+.secure = ptw->out_secure,
+.space = ptw->out_space,
+};
 AddressSpace *as = arm_addressspace(cs, attrs);
 MemTxResult result = MEMTX_OK;
 
@@ -873,6 +890,7 @@ static bool get_phys_addr_v6(CPUARMState *env, S1Translate 
*ptw,
  * regime, because the attribute will already be non-secure.
  */
 result->f.attrs.secure = false;
+result->f.attrs.space = ARMSS_NonSecure;
 }
 result->f.phys_addr = phys_addr;
 return false;
@@ -1577,6 +1595,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
  * regime, because the attribute will already be non-secure.
  */
 result->f.attrs.secure = false;
+result->f.attrs.space = ARMSS_NonSecure;
 }
 
 /* When in aarch64 mode, and BTI is enabled, remember GP in the TLB.  */
@@ -2361,6 +2380,7 @@ static bool get_phys_addr_pmsav8(CPUARMState *env, 
uint32_t address,
  */
 if (sattrs.ns) {
 result->f.attrs.secure = false;
+result->f.attrs.space = ARMSS_NonSecure;
 } else if (!secure) {
 /*
  * NS access to S memory must fault.
@@ -2710,6 +2730,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 bool is_secure = ptw->in_secure;
 bool ret, ipa_secure, s2walk_secure;
 ARMCacheAttrs cacheattrs1;
+ARMSecuritySpace ipa_space, s2walk_space;
 bool is_el0;
 uint64_t hcr;
 
@@ -2722,20 +2743,24 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 
 ipa = result->f.phys_addr;
 ipa_secure = result->f.attrs.secure;
+ipa_space = result->f.attrs.space;
 if (is_secure) {
 /* Select TCR based on the NS bit 

[PATCH v3 12/25] target/arm: Introduce ARMMMUIdx_Phys_{Realm,Root}

2023-02-21 Thread Richard Henderson
With FEAT_RME, there are four physical address spaces.
For now, just define the symbols, and mention them in
the same spots as the other Phys indexes in ptw.c.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu-param.h |  2 +-
 target/arm/cpu.h   | 23 +--
 target/arm/ptw.c   | 10 --
 3 files changed, 30 insertions(+), 5 deletions(-)

diff --git a/target/arm/cpu-param.h b/target/arm/cpu-param.h
index 53cac9c89b..8dfd7a0bb6 100644
--- a/target/arm/cpu-param.h
+++ b/target/arm/cpu-param.h
@@ -47,6 +47,6 @@
 bool guarded;
 #endif
 
-#define NB_MMU_MODES 12
+#define NB_MMU_MODES 14
 
 #endif
diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index c5fc475cf8..05fd6e61aa 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2865,8 +2865,10 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_Stage2= 9 | ARM_MMU_IDX_A,
 
 /* TLBs with 1-1 mapping to the physical address spaces. */
-ARMMMUIdx_Phys_S= 10 | ARM_MMU_IDX_A,
-ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_S = 10 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_NS= 11 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_Root  = 12 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_Realm = 13 | ARM_MMU_IDX_A,
 
 /*
  * These are not allocated TLBs and are used only for AT system
@@ -2930,6 +2932,23 @@ typedef enum ARMASIdx {
 ARMASIdx_TagS = 3,
 } ARMASIdx;
 
+static inline ARMMMUIdx arm_space_to_phys(ARMSecuritySpace space)
+{
+/* Assert the relative order of the physical mmu indexes. */
+QEMU_BUILD_BUG_ON(ARMSS_Secure != 0);
+QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS != ARMMMUIdx_Phys_S + ARMSS_NonSecure);
+QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Root != ARMMMUIdx_Phys_S + ARMSS_Root);
+QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_Realm != ARMMMUIdx_Phys_S + ARMSS_Realm);
+
+return ARMMMUIdx_Phys_S + space;
+}
+
+static inline ARMSecuritySpace arm_phys_to_space(ARMMMUIdx idx)
+{
+assert(idx >= ARMMMUIdx_Phys_S && idx <= ARMMMUIdx_Phys_Realm);
+return idx - ARMMMUIdx_Phys_S;
+}
+
 static inline bool arm_v7m_csselr_razwi(ARMCPU *cpu)
 {
 /* If all the CLIDR.Ctypem bits are 0 there are no caches, and
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 5ed5bb5039..5a0c5edc88 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -182,8 +182,10 @@ static bool regime_translation_disabled(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 case ARMMMUIdx_E3:
 break;
 
-case ARMMMUIdx_Phys_NS:
 case ARMMMUIdx_Phys_S:
+case ARMMMUIdx_Phys_NS:
+case ARMMMUIdx_Phys_Root:
+case ARMMMUIdx_Phys_Realm:
 /* No translation for physical address spaces. */
 return true;
 
@@ -2632,8 +2634,10 @@ static bool get_phys_addr_disabled(CPUARMState *env, 
target_ulong address,
 switch (mmu_idx) {
 case ARMMMUIdx_Stage2:
 case ARMMMUIdx_Stage2_S:
-case ARMMMUIdx_Phys_NS:
 case ARMMMUIdx_Phys_S:
+case ARMMMUIdx_Phys_NS:
+case ARMMMUIdx_Phys_Root:
+case ARMMMUIdx_Phys_Realm:
 break;
 
 default:
@@ -2830,6 +2834,8 @@ static bool get_phys_addr_with_struct(CPUARMState *env, 
S1Translate *ptw,
 switch (mmu_idx) {
 case ARMMMUIdx_Phys_S:
 case ARMMMUIdx_Phys_NS:
+case ARMMMUIdx_Phys_Root:
+case ARMMMUIdx_Phys_Realm:
 /* Checking Phys early avoids special casing later vs regime_el. */
 return get_phys_addr_disabled(env, address, access_type, mmu_idx,
   is_secure, result, fi);
-- 
2.34.1




[PATCH v3 13/25] target/arm: Remove __attribute__((nonnull)) from ptw.c

2023-02-21 Thread Richard Henderson
This was added in 7e98e21c098 as part of a reorg in which
one of the arguments had been legally NULL, and this caught
actual instances.  Now that the reorg is complete, this
serves little purpose.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 5a0c5edc88..9f608b12b2 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -32,15 +32,13 @@ typedef struct S1Translate {
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
uint64_t address,
MMUAccessType access_type, bool s1_is_el0,
-   GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
-__attribute__((nonnull));
+   GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
 
 static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
   target_ulong address,
   MMUAccessType access_type,
   GetPhysAddrResult *result,
-  ARMMMUFaultInfo *fi)
-__attribute__((nonnull));
+  ARMMMUFaultInfo *fi);
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 static const uint8_t pamax_map[] = {
-- 
2.34.1




[PATCH v3 03/25] target/arm: Diagnose incorrect usage of arm_is_secure subroutines

2023-02-21 Thread Richard Henderson
In several places we use arm_is_secure_below_el3 and
arm_is_el3_or_mon separately from arm_is_secure.
These functions make no sense for m-profile, and
would indicate prior incorrect feature testing.

Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 7a2f804aeb..cb4e405f04 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2389,7 +2389,8 @@ static inline int arm_feature(CPUARMState *env, int 
feature)
 void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
 
 #if !defined(CONFIG_USER_ONLY)
-/* Return true if exception levels below EL3 are in secure state,
+/*
+ * Return true if exception levels below EL3 are in secure state,
  * or would be following an exception return to that level.
  * Unlike arm_is_secure() (which is always a question about the
  * _current_ state of the CPU) this doesn't care about the current
@@ -2397,6 +2398,7 @@ void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
  */
 static inline bool arm_is_secure_below_el3(CPUARMState *env)
 {
+assert(!arm_feature(env, ARM_FEATURE_M));
 if (arm_feature(env, ARM_FEATURE_EL3)) {
 return !(env->cp15.scr_el3 & SCR_NS);
 } else {
@@ -2410,6 +2412,7 @@ static inline bool arm_is_secure_below_el3(CPUARMState 
*env)
 /* Return true if the CPU is AArch64 EL3 or AArch32 Mon */
 static inline bool arm_is_el3_or_mon(CPUARMState *env)
 {
+assert(!arm_feature(env, ARM_FEATURE_M));
 if (arm_feature(env, ARM_FEATURE_EL3)) {
 if (is_a64(env) && extract32(env->pstate, 2, 2) == 3) {
 /* CPU currently in AArch64 state and EL3 */
-- 
2.34.1




[PATCH NOTFORMERGE v3 25/25] hw/arm/virt: Add some memory for Realm Management Monitor

2023-02-21 Thread Richard Henderson
This is arbitrary, but used by the Huawei TF-A test code.
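
A usage sketch (everything besides rmm=on is illustrative):

    qemu-system-aarch64 -M virt,rmm=on -cpu max ...

With the property left at its default (off), create_rmm_ram() is not
called and nothing is mapped at the VIRT_RMM_MEM slot.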

Signed-off-by: Richard Henderson 
---
 include/hw/arm/virt.h |  2 ++
 hw/arm/virt.c | 43 +++
 2 files changed, 45 insertions(+)

diff --git a/include/hw/arm/virt.h b/include/hw/arm/virt.h
index e1ddbea96b..5c0c8a67e4 100644
--- a/include/hw/arm/virt.h
+++ b/include/hw/arm/virt.h
@@ -86,6 +86,7 @@ enum {
 VIRT_ACPI_GED,
 VIRT_NVDIMM_ACPI,
 VIRT_PVTIME,
+VIRT_RMM_MEM,
 VIRT_LOWMEMMAP_LAST,
 };
 
@@ -159,6 +160,7 @@ struct VirtMachineState {
 bool virt;
 bool ras;
 bool mte;
+bool rmm;
 bool dtb_randomness;
 OnOffAuto acpi;
 VirtGICType gic_version;
diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..067f16cd77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -159,6 +159,7 @@ static const MemMapEntry base_memmap[] = {
 /* ...repeating for a total of NUM_VIRTIO_TRANSPORTS, each of that size */
 [VIRT_PLATFORM_BUS] =   { 0x0c00, 0x0200 },
 [VIRT_SECURE_MEM] = { 0x0e00, 0x0100 },
+[VIRT_RMM_MEM] ={ 0x0f00, 0x0010 },
 [VIRT_PCIE_MMIO] =  { 0x1000, 0x2eff },
 [VIRT_PCIE_PIO] =   { 0x3eff, 0x0001 },
 [VIRT_PCIE_ECAM] =  { 0x3f00, 0x0100 },
@@ -1602,6 +1603,25 @@ static void create_secure_ram(VirtMachineState *vms,
 g_free(nodename);
 }
 
+static void create_rmm_ram(VirtMachineState *vms,
+   MemoryRegion *sysmem,
+   MemoryRegion *tag_sysmem)
+{
+MemoryRegion *rmm_ram = g_new(MemoryRegion, 1);
+hwaddr base = vms->memmap[VIRT_RMM_MEM].base;
+hwaddr size = vms->memmap[VIRT_RMM_MEM].size;
+
+memory_region_init_ram(rmm_ram, NULL, "virt.rmm-ram", size,
+   _fatal);
+memory_region_add_subregion(sysmem, base, rmm_ram);
+
+/* do not fill in fdt to hide rmm from normal world guest */
+
+if (tag_sysmem) {
+create_tag_ram(tag_sysmem, base, size, "mach-virt.rmm-tag");
+}
+}
+
 static void *machvirt_dtb(const struct arm_boot_info *binfo, int *fdt_size)
 {
 const VirtMachineState *board = container_of(binfo, VirtMachineState,
@@ -2283,6 +2303,10 @@ static void machvirt_init(MachineState *machine)
machine->ram_size, "mach-virt.tag");
 }
 
+if (vms->rmm) {
+create_rmm_ram(vms, sysmem, tag_sysmem);
+}
+
 vms->highmem_ecam &= (!firmware_loaded || aarch64);
 
 create_rtc(vms);
@@ -2562,6 +2586,20 @@ static void virt_set_mte(Object *obj, bool value, Error 
**errp)
 vms->mte = value;
 }
 
+static bool virt_get_rmm(Object *obj, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+return vms->rmm;
+}
+
+static void virt_set_rmm(Object *obj, bool value, Error **errp)
+{
+VirtMachineState *vms = VIRT_MACHINE(obj);
+
+vms->rmm = value;
+}
+
 static char *virt_get_gic_version(Object *obj, Error **errp)
 {
 VirtMachineState *vms = VIRT_MACHINE(obj);
@@ -3115,6 +3153,11 @@ static void virt_machine_class_init(ObjectClass *oc, 
void *data)
   "guest CPU which implements the ARM "
   "Memory Tagging Extension");
 
+object_class_property_add_bool(oc, "rmm", virt_get_rmm, virt_set_rmm);
+object_class_property_set_description(oc, "rmm",
+  "Set on/off to enable/disable ram "
+  "for the Realm Management Monitor");
+
 object_class_property_add_bool(oc, "its", virt_get_its,
virt_set_its);
 object_class_property_set_description(oc, "its",
-- 
2.34.1




[PATCH v3 17/25] target/arm: Handle no-execute for Realm and Root regimes

2023-02-21 Thread Richard Henderson
While Root and Realm may read and write data from other spaces,
neither may execute from other pa spaces.

This is enforced at stage 1 for the EL3, EL2 and EL2&0 regimes, and at stage 2 for EL1&0.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 52 ++--
 1 file changed, 46 insertions(+), 6 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 61c1227578..fc4c1ccf54 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -907,7 +907,7 @@ do_fault:
  * @xn:  XN (execute-never) bits
  * @s1_is_el0: true if this is S2 of an S1+2 walk for EL0
  */
-static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+static int get_S2prot_noexecute(int s2ap)
 {
 int prot = 0;
 
@@ -917,6 +917,12 @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, 
bool s1_is_el0)
 if (s2ap & 2) {
 prot |= PAGE_WRITE;
 }
+return prot;
+}
+
+static int get_S2prot(CPUARMState *env, int s2ap, int xn, bool s1_is_el0)
+{
+int prot = get_S2prot_noexecute(s2ap);
 
 if (cpu_isar_feature(any_tts2uxn, env_archcpu(env))) {
 switch (xn) {
@@ -982,9 +988,39 @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, 
bool is_aa64,
 }
 }
 
-if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
-(env->cp15.scr_el3 & SCR_SIF)) {
-return prot_rw;
+if (in_pa != out_pa) {
+switch (in_pa) {
+case ARMSS_Root:
+/*
+ * R_ZWRVD: permission fault for insn fetched from non-Root,
+ * I_WWBFB: SIF has no effect in EL3.
+ */
+return prot_rw;
+case ARMSS_Realm:
+/*
+ * R_PKTDS: permission fault for insn fetched from non-Realm,
+ * for Realm EL2 or EL2&0.  The corresponding fault for EL1&0
+ * happens during any stage2 translation.
+ */
+switch (mmu_idx) {
+case ARMMMUIdx_E2:
+case ARMMMUIdx_E20_0:
+case ARMMMUIdx_E20_2:
+case ARMMMUIdx_E20_2_PAN:
+return prot_rw;
+default:
+break;
+}
+break;
+case ARMSS_Secure:
+if (env->cp15.scr_el3 & SCR_SIF) {
+return prot_rw;
+}
+break;
+default:
+/* Input NonSecure must have output NonSecure. */
+g_assert_not_reached();
+}
 }
 
 /* TODO have_wxn should be replaced with
@@ -1561,12 +1597,16 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 /*
  * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
  * The bit remains ignored for other security states.
+ * R_YMCSL: Executing an insn fetched from non-Realm causes
+ * a stage2 permission fault.
  */
 if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
 out_space = ARMSS_NonSecure;
+result->f.prot = get_S2prot_noexecute(ap);
+} else {
+xn = extract64(attrs, 53, 2);
+result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
 }
-xn = extract64(attrs, 53, 2);
-result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
 } else {
 int nse, ns = extract32(attrs, 5, 1);
 switch (out_space) {
-- 
2.34.1




[PATCH v3 01/25] target/arm: Handle m-profile in arm_is_secure

2023-02-21 Thread Richard Henderson
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 12b1082537..7a2f804aeb 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2426,6 +2426,9 @@ static inline bool arm_is_el3_or_mon(CPUARMState *env)
 /* Return true if the processor is in secure state */
 static inline bool arm_is_secure(CPUARMState *env)
 {
+if (arm_feature(env, ARM_FEATURE_M)) {
+return env->v7m.secure;
+}
 if (arm_is_el3_or_mon(env)) {
 return true;
 }
-- 
2.34.1




[PATCH v3 22/25] target/arm: Implement GPC exceptions

2023-02-21 Thread Richard Henderson
Handle GPC Fault types in arm_deliver_fault, reporting as
either a GPC exception at EL3, or falling through to insn
or data aborts at various exception levels.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h|  1 +
 target/arm/internals.h  | 27 
 target/arm/helper.c |  5 +++
 target/arm/tlb_helper.c | 96 +++--
 4 files changed, 126 insertions(+), 3 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 05fd6e61aa..b189efadf8 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -57,6 +57,7 @@
 #define EXCP_UNALIGNED  22   /* v7M UNALIGNED UsageFault */
 #define EXCP_DIVBYZERO  23   /* v7M DIVBYZERO UsageFault */
 #define EXCP_VSERR  24
+#define EXCP_GPC25   /* v9 Granule Protection Check Fault */
 /* NB: add new EXCP_ defines to the array in arm_log_exception() too */
 
 #define ARMV7M_EXCP_RESET   1
diff --git a/target/arm/internals.h b/target/arm/internals.h
index 759b70c646..5e88649fea 100644
--- a/target/arm/internals.h
+++ b/target/arm/internals.h
@@ -352,14 +352,27 @@ typedef enum ARMFaultType {
 ARMFault_ICacheMaint,
 ARMFault_QEMU_NSCExec, /* v8M: NS executing in S memory */
 ARMFault_QEMU_SFault, /* v8M: SecureFault INVTRAN, INVEP or AUVIOL */
+ARMFault_GPCFOnWalk,
+ARMFault_GPCFOnOutput,
 } ARMFaultType;
 
+typedef enum ARMGPCF {
+GPCF_None,
+GPCF_AddressSize,
+GPCF_Walk,
+GPCF_EABT,
+GPCF_Fail,
+} ARMGPCF;
+
 /**
  * ARMMMUFaultInfo: Information describing an ARM MMU Fault
  * @type: Type of fault
+ * @gpcf: Subtype of ARMFault_GPCFOn{Walk,Output}.
  * @level: Table walk level (for translation, access flag and permission 
faults)
  * @domain: Domain of the fault address (for non-LPAE CPUs only)
  * @s2addr: Address that caused a fault at stage 2
+ * @paddr: physical address that caused a fault for gpc
+ * @paddr_space: physical address space that caused a fault for gpc
  * @stage2: True if we faulted at stage 2
  * @s1ptw: True if we faulted at stage 2 while doing a stage 1 page-table walk
  * @s1ns: True if we faulted on a non-secure IPA while in secure state
@@ -368,7 +381,10 @@ typedef enum ARMFaultType {
 typedef struct ARMMMUFaultInfo ARMMMUFaultInfo;
 struct ARMMMUFaultInfo {
 ARMFaultType type;
+ARMGPCF gpcf;
 target_ulong s2addr;
+target_ulong paddr;
+ARMSecuritySpace paddr_space;
 int level;
 int domain;
 bool stage2;
@@ -542,6 +558,17 @@ static inline uint32_t arm_fi_to_lfsc(ARMMMUFaultInfo *fi)
 case ARMFault_Exclusive:
 fsc = 0x35;
 break;
+case ARMFault_GPCFOnWalk:
+assert(fi->level >= -1 && fi->level <= 3);
+if (fi->level < 0) {
+fsc = 0b100011;
+} else {
+fsc = 0b100100 | fi->level;
+}
+break;
+case ARMFault_GPCFOnOutput:
+fsc = 0b101000;
+break;
 default:
 /* Other faults can't occur in a context that requires a
  * long-format status code.
diff --git a/target/arm/helper.c b/target/arm/helper.c
index 9e1c1ed6d8..dc97dc120b 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -10238,6 +10238,7 @@ void arm_log_exception(CPUState *cs)
 [EXCP_UNALIGNED] = "v7M UNALIGNED UsageFault",
 [EXCP_DIVBYZERO] = "v7M DIVBYZERO UsageFault",
 [EXCP_VSERR] = "Virtual SERR",
+[EXCP_GPC] = "Granule Protection Check",
 };
 
 if (idx >= 0 && idx < ARRAY_SIZE(excnames)) {
@@ -10966,6 +10967,10 @@ static void arm_cpu_do_interrupt_aarch64(CPUState *cs)
 }
 
 switch (cs->exception_index) {
+case EXCP_GPC:
+qemu_log_mask(CPU_LOG_INT, "...with MFAR 0x%" PRIx64 "\n",
+  env->cp15.mfar_el3);
+/* fall through */
 case EXCP_PREFETCH_ABORT:
 case EXCP_DATA_ABORT:
 /*
diff --git a/target/arm/tlb_helper.c b/target/arm/tlb_helper.c
index 60abcbebe6..aa03d3f8dc 100644
--- a/target/arm/tlb_helper.c
+++ b/target/arm/tlb_helper.c
@@ -109,17 +109,106 @@ static uint32_t compute_fsr_fsc(CPUARMState *env, 
ARMMMUFaultInfo *fi,
 return fsr;
 }
 
+static bool report_as_gpc_exception(ARMCPU *cpu, int current_el,
+ARMMMUFaultInfo *fi)
+{
+bool ret;
+
+switch (fi->gpcf) {
+case GPCF_None:
+return false;
+case GPCF_AddressSize:
+case GPCF_Walk:
+case GPCF_EABT:
+/* R_PYTGX: GPT faults are reported as GPC. */
+ret = true;
+break;
+case GPCF_Fail:
+/*
+ * R_BLYPM: A GPF at EL3 is reported as insn or data abort.
+ * R_VBZMW, R_LXHQR: A GPF at EL[0-2] is reported as a GPC
+ * if SCR_EL3.GPF is set, otherwise an insn or data abort.
+ */
+ret = (cpu->env.cp15.scr_el3 & SCR_GPF) && current_el != 3;
+break;
+default:
+g_assert_not_reached();
+}
+
+assert(cpu_isar_feature(aa64_rme, cpu));
+

[PATCH v3 09/25] target/arm: Introduce ARMSecuritySpace

2023-02-21 Thread Richard Henderson
Introduce both the enumeration and functions to retrieve
the current state, and state outside of EL3.
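
A self-contained sketch of the encoding noted in the enum comment below:
apart from Root, the value is the concatenation NSE:NS
(space_from_bits() is an illustrative helper, not part of the patch):

    typedef enum {
        ARMSS_Secure    = 0,  /* NSE=0 NS=0 */
        ARMSS_NonSecure = 1,  /* NSE=0 NS=1 */
        ARMSS_Root      = 2,  /* NSE=1 NS=0, EL3 with RME */
        ARMSS_Realm     = 3,  /* NSE=1 NS=1 */
    } ARMSecuritySpace;

    static ARMSecuritySpace space_from_bits(int nse, int ns)
    {
        return (ARMSecuritySpace)((nse << 1) | ns);
    }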

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 89 ++---
 target/arm/helper.c | 60 ++
 2 files changed, 127 insertions(+), 22 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 8d18d98350..203a3e0046 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2409,25 +2409,53 @@ static inline int arm_feature(CPUARMState *env, int 
feature)
 
 void arm_cpu_finalize_features(ARMCPU *cpu, Error **errp);
 
-#if !defined(CONFIG_USER_ONLY)
 /*
+ * ARM v9 security states.
+ * The ordering of the enumeration corresponds to the low 2 bits
+ * of the GPI value, and (except for Root) the concat of NSE:NS.
+ */
+
+typedef enum ARMSecuritySpace {
+ARMSS_Secure = 0,
+ARMSS_NonSecure  = 1,
+ARMSS_Root   = 2,
+ARMSS_Realm  = 3,
+} ARMSecuritySpace;
+
+/* Return true if @space is secure, in the pre-v9 sense. */
+static inline bool arm_space_is_secure(ARMSecuritySpace space)
+{
+return space == ARMSS_Secure || space == ARMSS_Root;
+}
+
+/* Return the ARMSecuritySpace for @secure, assuming !RME or EL[0-2]. */
+static inline ARMSecuritySpace arm_secure_to_space(bool secure)
+{
+return secure ? ARMSS_Secure : ARMSS_NonSecure;
+}
+
+#if !defined(CONFIG_USER_ONLY)
+/**
+ * arm_security_space_below_el3:
+ * @env: cpu context
+ *
+ * Return the security space of exception levels below EL3, following
+ * an exception return to those levels.  Unlike arm_security_space,
+ * this doesn't care about the current EL.
+ */
+ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env);
+
+/**
+ * arm_is_secure_below_el3:
+ * @env: cpu context
+ *
  * Return true if exception levels below EL3 are in secure state,
- * or would be following an exception return to that level.
- * Unlike arm_is_secure() (which is always a question about the
- * _current_ state of the CPU) this doesn't care about the current
- * EL or mode.
+ * or would be following an exception return to those levels.
  */
 static inline bool arm_is_secure_below_el3(CPUARMState *env)
 {
-assert(!arm_feature(env, ARM_FEATURE_M));
-if (arm_feature(env, ARM_FEATURE_EL3)) {
-return !(env->cp15.scr_el3 & SCR_NS);
-} else {
-/* If EL3 is not supported then the secure state is implementation
- * defined, in which case QEMU defaults to non-secure.
- */
-return false;
-}
+ARMSecuritySpace ss = arm_security_space_below_el3(env);
+return ss == ARMSS_Secure;
 }
 
 /* Return true if the CPU is AArch64 EL3 or AArch32 Mon */
@@ -2447,16 +2475,23 @@ static inline bool arm_is_el3_or_mon(CPUARMState *env)
 return false;
 }
 
-/* Return true if the processor is in secure state */
+/**
+ * arm_security_space:
+ * @env: cpu context
+ *
+ * Return the current security space of the cpu.
+ */
+ARMSecuritySpace arm_security_space(CPUARMState *env);
+
+/**
+ * arm_is_secure:
+ * @env: cpu context
+ *
+ * Return true if the processor is in secure state.
+ */
 static inline bool arm_is_secure(CPUARMState *env)
 {
-if (arm_feature(env, ARM_FEATURE_M)) {
-return env->v7m.secure;
-}
-if (arm_is_el3_or_mon(env)) {
-return true;
-}
-return arm_is_secure_below_el3(env);
+return arm_space_is_secure(arm_security_space(env));
 }
 
 /*
@@ -2475,11 +2510,21 @@ static inline bool arm_is_el2_enabled(CPUARMState *env)
 }
 
 #else
+static inline ARMSecuritySpace arm_security_space_below_el3(CPUARMState *env)
+{
+return ARMSS_NonSecure;
+}
+
 static inline bool arm_is_secure_below_el3(CPUARMState *env)
 {
 return false;
 }
 
+static inline ARMSecuritySpace arm_security_space(CPUARMState *env)
+{
+return ARMSS_NonSecure;
+}
+
 static inline bool arm_is_secure(CPUARMState *env)
 {
 return false;
diff --git a/target/arm/helper.c b/target/arm/helper.c
index eff109f83c..9e1c1ed6d8 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -12538,3 +12538,63 @@ void aarch64_sve_change_el(CPUARMState *env, int 
old_el,
 }
 }
 #endif
+
+#ifndef CONFIG_USER_ONLY
+ARMSecuritySpace arm_security_space(CPUARMState *env)
+{
+if (arm_feature(env, ARM_FEATURE_M)) {
+return arm_secure_to_space(env->v7m.secure);
+}
+
+/*
+ * If EL3 is not supported then the secure state is implementation
+ * defined, in which case QEMU defaults to non-secure.
+ */
+if (!arm_feature(env, ARM_FEATURE_EL3)) {
+return ARMSS_NonSecure;
+}
+
+/* Check for AArch64 EL3 or AArch32 Mon. */
+if (is_a64(env)) {
+if (extract32(env->pstate, 2, 2) == 3) {
+if (cpu_isar_feature(aa64_rme, env_archcpu(env))) {
+return ARMSS_Root;
+} else {
+return ARMSS_Secure;
+}
+}
+} else {
+if ((env->uncached_cpsr & CPSR_M) == ARM_CPU_MODE_MON) {

[PATCH v3 16/25] target/arm: Handle Block and Page bits for security space

2023-02-21 Thread Richard Henderson
With Realm security state, bit 55 of a block or page descriptor during
the stage2 walk becomes the NS bit; during the stage1 walk the NS bit
(bit 5) is RES0.  With Root security state, bit 11 of the block or page
descriptor during the stage1 walk becomes the NSE bit.

Rather than collecting an NS bit and applying it later, compute the
output pa space from the input pa space and unconditionally assign.
This means that we no longer need to adjust the output space earlier
for the NSTable bit.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 91 ++--
 1 file changed, 73 insertions(+), 18 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 8b3deb0884..61c1227578 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -954,12 +954,14 @@ static int get_S2prot(CPUARMState *env, int s2ap, int xn, 
bool s1_is_el0)
  * @mmu_idx: MMU index indicating required translation regime
  * @is_aa64: TRUE if AArch64
  * @ap:  The 2-bit simple AP (AP[2:1])
- * @ns:  NS (non-secure) bit
  * @xn:  XN (execute-never) bit
  * @pxn: PXN (privileged execute-never) bit
+ * @in_pa:   The original input pa space
+ * @out_pa:  The output pa space, modified by NSTable, NS, and NSE
  */
 static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, bool is_aa64,
-  int ap, int ns, int xn, int pxn)
+  int ap, int xn, int pxn,
+  ARMSecuritySpace in_pa, ARMSecuritySpace out_pa)
 {
 bool is_user = regime_is_user(env, mmu_idx);
 int prot_rw, user_rw;
@@ -980,7 +982,8 @@ static int get_S1prot(CPUARMState *env, ARMMMUIdx mmu_idx, 
bool is_aa64,
 }
 }
 
-if (ns && arm_is_secure(env) && (env->cp15.scr_el3 & SCR_SIF)) {
+if (out_pa == ARMSS_NonSecure && in_pa == ARMSS_Secure &&
+(env->cp15.scr_el3 & SCR_SIF)) {
 return prot_rw;
 }
 
@@ -1248,11 +1251,12 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 int32_t stride;
 int addrsize, inputsize, outputsize;
 uint64_t tcr = regime_tcr(env, mmu_idx);
-int ap, ns, xn, pxn;
+int ap, xn, pxn;
 uint32_t el = regime_el(env, mmu_idx);
 uint64_t descaddrmask;
 bool aarch64 = arm_el_is_aa64(env, el);
 uint64_t descriptor, new_descriptor;
+ARMSecuritySpace out_space;
 
 /* TODO: This code does not support shareability levels. */
 if (aarch64) {
@@ -1435,8 +1439,6 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 ptw->in_ptw_idx += 1;
 ptw->in_secure = false;
 ptw->in_space = ARMSS_NonSecure;
-result->f.attrs.secure = false;
-result->f.attrs.space = ARMSS_NonSecure;
 }
 
 if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
@@ -1554,15 +1556,75 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 }
 
 ap = extract32(attrs, 6, 2);
+out_space = ptw->in_space;
 if (regime_is_stage2(mmu_idx)) {
-ns = mmu_idx == ARMMMUIdx_Stage2;
+/*
+ * R_GYNXY: For stage2 in Realm security state, bit 55 is NS.
+ * The bit remains ignored for other security states.
+ */
+if (out_space == ARMSS_Realm && extract64(attrs, 55, 1)) {
+out_space = ARMSS_NonSecure;
+}
 xn = extract64(attrs, 53, 2);
 result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
 } else {
-ns = extract32(attrs, 5, 1);
+int nse, ns = extract32(attrs, 5, 1);
+switch (out_space) {
+case ARMSS_Root:
+/*
+ * R_GVZML: Bit 11 becomes the NSE field in the EL3 regime.
+ * R_XTYPW: NSE and NS together select the output pa space.
+ */
+nse = extract32(attrs, 11, 1);
+out_space = (nse << 1) | ns;
+if (out_space == ARMSS_Secure &&
+!cpu_isar_feature(aa64_sel2, cpu)) {
+out_space = ARMSS_NonSecure;
+}
+break;
+case ARMSS_Secure:
+if (ns) {
+out_space = ARMSS_NonSecure;
+}
+break;
+case ARMSS_Realm:
+switch (mmu_idx) {
+case ARMMMUIdx_Stage1_E0:
+case ARMMMUIdx_Stage1_E1:
+case ARMMMUIdx_Stage1_E1_PAN:
+/* I_CZPRF: For Realm EL1&0 stage1, NS bit is RES0. */
+break;
+case ARMMMUIdx_E2:
+case ARMMMUIdx_E20_0:
+case ARMMMUIdx_E20_2:
+case ARMMMUIdx_E20_2_PAN:
+/*
+ * R_LYKFZ, R_WGRZN: For Realm EL2 and EL2&1,
+ * NS changes the output to non-secure space.
+ */
+if (ns) {
+out_space = ARMSS_NonSecure;
+}
+break;
+default:
+g_assert_not_reached();
+}
+break;
+case ARMSS_NonSecure:
+/* 

[PATCH v3 21/25] target/arm: Add GPC syndrome

2023-02-21 Thread Richard Henderson
The function takes the fields as filled in by
the Arm ARM pseudocode for TakeGPCException.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/syndrome.h | 10 ++
 1 file changed, 10 insertions(+)

diff --git a/target/arm/syndrome.h b/target/arm/syndrome.h
index d27d1bc31f..62254d0e51 100644
--- a/target/arm/syndrome.h
+++ b/target/arm/syndrome.h
@@ -50,6 +50,7 @@ enum arm_exception_class {
 EC_SVEACCESSTRAP  = 0x19,
 EC_ERETTRAP   = 0x1a,
 EC_SMETRAP= 0x1d,
+EC_GPC= 0x1e,
 EC_INSNABORT  = 0x20,
 EC_INSNABORT_SAME_EL  = 0x21,
 EC_PCALIGNMENT= 0x22,
@@ -247,6 +248,15 @@ static inline uint32_t syn_bxjtrap(int cv, int cond, int 
rm)
 (cv << 24) | (cond << 20) | rm;
 }
 
+static inline uint32_t syn_gpc(int s2ptw, int ind, int gpcsc,
+   int cm, int s1ptw, int wnr, int fsc)
+{
+/* TODO: FEAT_NV2 adds VNCR */
+return (EC_GPC << ARM_EL_EC_SHIFT) | ARM_EL_IL | (s2ptw << 21)
+| (ind << 20) | (gpcsc << 14) | (cm << 8) | (s1ptw << 7)
+| (wnr << 6) | fsc;
+}
+
 static inline uint32_t syn_insn_abort(int same_el, int ea, int s1ptw, int fsc)
 {
 return (EC_INSNABORT << ARM_EL_EC_SHIFT) | (same_el << ARM_EL_EC_SHIFT)
-- 
2.34.1




[PATCH v3 08/25] target/arm: Add RME cpregs

2023-02-21 Thread Richard Henderson
This includes GPCCR, GPTBR, MFAR, the TLB flush insns PAALL, PAALLOS,
RPALOS, RPAOS, and the cache flush insns CIPAPA and CIGDPAPA.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h| 19 +++
 target/arm/helper.c | 83 +
 2 files changed, 102 insertions(+)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 230241cf93..8d18d98350 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -541,6 +541,11 @@ typedef struct CPUArchState {
 uint64_t fgt_read[2]; /* HFGRTR, HDFGRTR */
 uint64_t fgt_write[2]; /* HFGWTR, HDFGWTR */
 uint64_t fgt_exec[1]; /* HFGITR */
+
+/* RME registers */
+uint64_t gpccr_el3;
+uint64_t gptbr_el3;
+uint64_t mfar_el3;
 } cp15;
 
 struct {
@@ -1043,6 +1048,7 @@ struct ArchCPU {
 uint64_t reset_cbar;
 uint32_t reset_auxcr;
 bool reset_hivecs;
+uint8_t reset_l0gptsz;
 
 /*
  * Intermediate values used during property parsing.
@@ -2336,6 +2342,19 @@ FIELD(MVFR1, SIMDFMAC, 28, 4)
 FIELD(MVFR2, SIMDMISC, 0, 4)
 FIELD(MVFR2, FPMISC, 4, 4)
 
+FIELD(GPCCR, PPS, 0, 3)
+FIELD(GPCCR, IRGN, 8, 2)
+FIELD(GPCCR, ORGN, 10, 2)
+FIELD(GPCCR, SH, 12, 2)
+FIELD(GPCCR, PGS, 14, 2)
+FIELD(GPCCR, GPC, 16, 1)
+FIELD(GPCCR, GPCP, 17, 1)
+FIELD(GPCCR, L0GPTSZ, 20, 4)
+
+FIELD(MFAR, FPA, 12, 40)
+FIELD(MFAR, NSE, 62, 1)
+FIELD(MFAR, NS, 63, 1)
+
 QEMU_BUILD_BUG_ON(ARRAY_SIZE(((ARMCPU *)0)->ccsidr) <= 
R_V7M_CSSELR_INDEX_MASK);
 
 /* If adding a feature bit which corresponds to a Linux ELF
diff --git a/target/arm/helper.c b/target/arm/helper.c
index ae8b3f6a48..eff109f83c 100644
--- a/target/arm/helper.c
+++ b/target/arm/helper.c
@@ -6935,6 +6935,83 @@ static const ARMCPRegInfo sme_reginfo[] = {
   .access = PL2_RW, .accessfn = access_esm,
   .type = ARM_CP_CONST, .resetvalue = 0 },
 };
+
+static void tlbi_aa64_paall_write(CPUARMState *env, const ARMCPRegInfo *ri,
+  uint64_t value)
+{
+CPUState *cs = env_cpu(env);
+
+tlb_flush(cs);
+}
+
+static void gpccr_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+/* L0GPTSZ is RO; other bits not mentioned are RES0. */
+uint64_t rw_mask = R_GPCCR_PPS_MASK | R_GPCCR_IRGN_MASK |
+R_GPCCR_ORGN_MASK | R_GPCCR_SH_MASK | R_GPCCR_PGS_MASK |
+R_GPCCR_GPC_MASK | R_GPCCR_GPCP_MASK;
+
+env->cp15.gpccr_el3 = (value & rw_mask) | (env->cp15.gpccr_el3 & ~rw_mask);
+}
+
+static void gpccr_reset(CPUARMState *env, const ARMCPRegInfo *ri)
+{
+env->cp15.gpccr_el3 = FIELD_DP64(0, GPCCR, L0GPTSZ,
+ env_archcpu(env)->reset_l0gptsz);
+}
+
+static void tlbi_aa64_paallos_write(CPUARMState *env, const ARMCPRegInfo *ri,
+uint64_t value)
+{
+CPUState *cs = env_cpu(env);
+
+tlb_flush_all_cpus_synced(cs);
+}
+
+static const ARMCPRegInfo rme_reginfo[] = {
+{ .name = "GPCCR_EL3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 6,
+  .access = PL3_RW, .writefn = gpccr_write, .resetfn = gpccr_reset,
+  .fieldoffset = offsetof(CPUARMState, cp15.gpccr_el3) },
+{ .name = "GPTBR_EL3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 6, .crn = 2, .crm = 1, .opc2 = 4,
+  .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.gptbr_el3) },
+{ .name = "MFAR_EL3", .state = ARM_CP_STATE_AA64,
+  .opc0 = 3, .opc1 = 6, .crn = 6, .crm = 0, .opc2 = 5,
+  .access = PL3_RW, .fieldoffset = offsetof(CPUARMState, cp15.mfar_el3) },
+{ .name = "TLBI_PAALL", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 7, .opc2 = 4,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_paall_write },
+{ .name = "TLBI_PAALLOS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 1, .opc2 = 4,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_paallos_write },
+/*
+ * QEMU does not have a way to invalidate by physical address, thus
+ * invalidating a range of physical addresses is accomplished by
+ * flushing all tlb entries in the outer sharable domain,
+ * just like PAALLOS.
+ */
+{ .name = "TLBI_RPALOS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 7,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_paallos_write },
+{ .name = "TLBI_RPAOS", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 8, .crm = 4, .opc2 = 3,
+  .access = PL3_W, .type = ARM_CP_NO_RAW,
+  .writefn = tlbi_aa64_paallos_write },
+{ .name = "DC_CIPAPA", .state = ARM_CP_STATE_AA64,
+  .opc0 = 1, .opc1 = 6, .crn = 7, .crm = 14, .opc2 = 1,
+  .access = PL3_W, .type = ARM_CP_NOP },
+};
+
+static const ARMCPRegInfo rme_mte_reginfo[] = {
+{ .name = "DC_CIGDPAPA", .state = ARM_CP_STATE_AA64,
+  

[PATCH v3 20/25] target/arm: Use get_phys_addr_with_struct for stage2

2023-02-21 Thread Richard Henderson
This fixes a bug in which we failed to initialize
the result attributes properly after the memset.

Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 11 +--
 1 file changed, 1 insertion(+), 10 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 6fa3d33a4e..7e1aa34d24 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -37,10 +37,6 @@ typedef struct S1Translate {
 void *out_host;
 } S1Translate;
 
-static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-   uint64_t address, MMUAccessType access_type,
-   GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
-
 static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
   target_ulong address,
   MMUAccessType access_type,
@@ -2859,12 +2855,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 cacheattrs1 = result->cacheattrs;
 memset(result, 0, sizeof(*result));
 
-if (arm_feature(env, ARM_FEATURE_PMSA)) {
-ret = get_phys_addr_pmsav8(env, ipa, access_type,
-   ptw->in_mmu_idx, is_secure, result, fi);
-} else {
-ret = get_phys_addr_lpae(env, ptw, ipa, access_type, result, fi);
-}
+ret = get_phys_addr_with_struct(env, ptw, ipa, access_type, result, fi);
 fi->s2addr = ipa;
 
 /* Combine the S1 and S2 perms.  */
-- 
2.34.1




[PATCH v3 19/25] target/arm: Move s1_is_el0 into S1Translate

2023-02-21 Thread Richard Henderson
Instead of passing this to get_phys_addr_lpae, stash it
in the S1Translate structure.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 27 ---
 1 file changed, 12 insertions(+), 15 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 8a31af60c9..6fa3d33a4e 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -22,6 +22,12 @@ typedef struct S1Translate {
 ARMSecuritySpace in_space;
 bool in_secure;
 bool in_debug;
+/*
+ * If this is stage 2 of a stage 1+2 page table walk, then this must
+ * be true if stage 1 is an EL0 access; otherwise this is ignored.
+ * Stage 2 is indicated by in_mmu_idx set to ARMMMUIdx_Stage2{,_S}.
+ */
+bool in_s1_is_el0;
 bool out_secure;
 bool out_rw;
 bool out_be;
@@ -32,8 +38,7 @@ typedef struct S1Translate {
 } S1Translate;
 
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
-   uint64_t address,
-   MMUAccessType access_type, bool s1_is_el0,
+   uint64_t address, MMUAccessType access_type,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi);
 
 static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
@@ -1255,17 +1260,12 @@ static int check_s2_mmu_setup(ARMCPU *cpu, bool 
is_aa64, uint64_t tcr,
  * @ptw: Current and next stage parameters for the walk.
  * @address: virtual address to get physical address for
  * @access_type: MMU_DATA_LOAD, MMU_DATA_STORE or MMU_INST_FETCH
- * @s1_is_el0: if @ptw->in_mmu_idx is ARMMMUIdx_Stage2
- * (so this is a stage 2 page table walk),
- * must be true if this is stage 2 of a stage 1+2
- * walk for an EL0 access. If @mmu_idx is anything else,
- * @s1_is_el0 is ignored.
  * @result: set on translation success,
  * @fi: set to fault info if the translation fails
  */
 static bool get_phys_addr_lpae(CPUARMState *env, S1Translate *ptw,
uint64_t address,
-   MMUAccessType access_type, bool s1_is_el0,
+   MMUAccessType access_type,
GetPhysAddrResult *result, ARMMMUFaultInfo *fi)
 {
 ARMCPU *cpu = env_archcpu(env);
@@ -1598,7 +1598,7 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 result->f.prot = get_S2prot_noexecute(ap);
 } else {
 xn = extract64(attrs, 53, 2);
-result->f.prot = get_S2prot(env, ap, xn, s1_is_el0);
+result->f.prot = get_S2prot(env, ap, xn, ptw->in_s1_is_el0);
 }
 } else {
 int nse, ns = extract32(attrs, 5, 1);
@@ -2820,7 +2820,6 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 bool ret, ipa_secure, s2walk_secure;
 ARMCacheAttrs cacheattrs1;
 ARMSecuritySpace ipa_space, s2walk_space;
-bool is_el0;
 uint64_t hcr;
 
 ret = get_phys_addr_with_struct(env, ptw, address, access_type, result, 
fi);
@@ -2845,7 +2844,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 s2walk_space = ipa_space;
 }
 
-is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
+ptw->in_s1_is_el0 = ptw->in_mmu_idx == ARMMMUIdx_Stage1_E0;
 ptw->in_mmu_idx = s2walk_secure ? ARMMMUIdx_Stage2_S : ARMMMUIdx_Stage2;
 ptw->in_ptw_idx = arm_space_to_phys(s2walk_space);
 ptw->in_secure = s2walk_secure;
@@ -2864,8 +2863,7 @@ static bool get_phys_addr_twostage(CPUARMState *env, 
S1Translate *ptw,
 ret = get_phys_addr_pmsav8(env, ipa, access_type,
ptw->in_mmu_idx, is_secure, result, fi);
 } else {
-ret = get_phys_addr_lpae(env, ptw, ipa, access_type,
- is_el0, result, fi);
+ret = get_phys_addr_lpae(env, ptw, ipa, access_type, result, fi);
 }
 fi->s2addr = ipa;
 
@@ -3041,8 +3039,7 @@ static bool get_phys_addr_with_struct(CPUARMState *env, 
S1Translate *ptw,
 }
 
 if (regime_using_lpae_format(env, mmu_idx)) {
-return get_phys_addr_lpae(env, ptw, address, access_type, false,
-  result, fi);
+return get_phys_addr_lpae(env, ptw, address, access_type, result, fi);
 } else if (arm_feature(env, ARM_FEATURE_V7) ||
regime_sctlr(env, mmu_idx) & SCTLR_XP) {
 return get_phys_addr_v6(env, ptw, address, access_type, result, fi);
-- 
2.34.1




[PATCH v3 23/25] target/arm: Implement the granule protection check

2023-02-21 Thread Richard Henderson
Place the check at the end of get_phys_addr_with_struct,
so that we check all physical results.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/ptw.c | 249 +++
 1 file changed, 232 insertions(+), 17 deletions(-)

diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 7e1aa34d24..8fa4849aaa 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -37,11 +37,17 @@ typedef struct S1Translate {
 void *out_host;
 } S1Translate;
 
-static bool get_phys_addr_with_struct(CPUARMState *env, S1Translate *ptw,
-  target_ulong address,
-  MMUAccessType access_type,
-  GetPhysAddrResult *result,
-  ARMMMUFaultInfo *fi);
+static bool get_phys_addr_nogpc(CPUARMState *env, S1Translate *ptw,
+target_ulong address,
+MMUAccessType access_type,
+GetPhysAddrResult *result,
+ARMMMUFaultInfo *fi);
+
+static bool get_phys_addr_gpc(CPUARMState *env, S1Translate *ptw,
+  target_ulong address,
+  MMUAccessType access_type,
+  GetPhysAddrResult *result,
+  ARMMMUFaultInfo *fi);
 
 /* This mapping is common between ID_AA64MMFR0.PARANGE and TCR_ELx.{I}PS. */
 static const uint8_t pamax_map[] = {
@@ -197,6 +203,197 @@ static bool regime_translation_disabled(CPUARMState *env, 
ARMMMUIdx mmu_idx,
 return (regime_sctlr(env, mmu_idx) & SCTLR_M) == 0;
 }
 
+static bool granule_protection_check(CPUARMState *env, uint64_t paddress,
+ ARMSecuritySpace pspace,
+ ARMMMUFaultInfo *fi)
+{
+MemTxAttrs attrs = {
+.secure = true,
+.space = ARMSS_Root,
+};
+ARMCPU *cpu = env_archcpu(env);
+uint64_t gpccr = env->cp15.gpccr_el3;
+unsigned pps, pgs, l0gptsz, level = 0;
+uint64_t tableaddr, pps_mask, align, entry, index;
+AddressSpace *as;
+MemTxResult result;
+int gpi;
+
+if (!FIELD_EX64(gpccr, GPCCR, GPC)) {
+return true;
+}
+
+/*
+ * GPC Priority 1 (R_GMGRR):
+ * R_JWCSM: If the configuration of GPCCR_EL3 is invalid,
+ * the access fails as GPT walk fault at level 0.
+ */
+
+/*
+ * Configuration of PPS to a value exceeding the implemented
+ * physical address size is invalid.
+ */
+pps = FIELD_EX64(gpccr, GPCCR, PPS);
+if (pps > FIELD_EX64(cpu->isar.id_aa64mmfr0, ID_AA64MMFR0, PARANGE)) {
+goto fault_walk;
+}
+pps = pamax_map[pps];
+pps_mask = MAKE_64BIT_MASK(0, pps);
+
+switch (FIELD_EX64(gpccr, GPCCR, SH)) {
+case 0b10: /* outer shareable */
+break;
+case 0b00: /* non-shareable */
+case 0b11: /* inner shareable */
+/* Inner and Outer non-cacheable requires Outer shareable. */
+if (FIELD_EX64(gpccr, GPCCR, ORGN) == 0 &&
+FIELD_EX64(gpccr, GPCCR, IRGN) == 0) {
+goto fault_walk;
+}
+break;
+default:   /* reserved */
+goto fault_walk;
+}
+
+switch (FIELD_EX64(gpccr, GPCCR, PGS)) {
+case 0b00: /* 4KB */
+pgs = 12;
+break;
+case 0b01: /* 64KB */
+pgs = 16;
+break;
+case 0b10: /* 16KB */
+pgs = 14;
+break;
+default: /* reserved */
+goto fault_walk;
+}
+
+/* Note this field is read-only and fixed at reset. */
+l0gptsz = 30 + FIELD_EX64(gpccr, GPCCR, L0GPTSZ);
+
+/*
+ * GPC Priority 2: Secure, Realm or Root address exceeds PPS.
+ * R_CPDSB: A NonSecure physical address input exceeding PPS
+ * does not experience any fault.
+ */
+if (paddress & ~pps_mask) {
+if (pspace == ARMSS_NonSecure) {
+return true;
+}
+goto fault_size;
+}
+
+/* GPC Priority 3: the base address of GPTBR_EL3 exceeds PPS. */
+tableaddr = env->cp15.gptbr_el3 << 12;
+if (tableaddr & ~pps_mask) {
+goto fault_size;
+}
+
+/*
+ * BADDR is aligned per a function of PPS and L0GPTSZ.
+ * These bits of GPTBR_EL3 are RES0, but are not a configuration error,
+ * unlike the RES0 bits of the GPT entries (R_XNKFZ).
+ */
+align = MAX(pps - l0gptsz + 3, 12);
+align = MAKE_64BIT_MASK(0, align);
+tableaddr &= ~align;
+
+as = arm_addressspace(env_cpu(env), attrs);
+
+/* Level 0 lookup. */
+index = extract64(paddress, l0gptsz, pps - l0gptsz);
+tableaddr += index * 8;
+entry = address_space_ldq_le(as, tableaddr, attrs, &result);
+if (result != MEMTX_OK) {
+goto fault_eabt;
+}
+
+switch (extract32(entry, 0, 4)) {
+case 1: /* block descriptor */
+if (entry >> 8) {
+goto fault_walk; /* RES0 bits not 0 */
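
For reference, a standalone sketch (not QEMU code; pps = 48 and an L0GPTSZ
field of 0 are made-up example values) of how the level-0 lookup above derives
its index, table size and base alignment from GPCCR_EL3:

#include <stdint.h>
#include <inttypes.h>
#include <stdio.h>

#define MAX(a, b) ((a) > (b) ? (a) : (b))

int main(void)
{
    unsigned pps = 48;                    /* hypothetical: GPCCR_EL3.PPS mapped through pamax_map */
    unsigned l0gptsz = 30 + 0;            /* 30 + GPCCR_EL3.L0GPTSZ, i.e. 1GB covered per L0 entry */
    uint64_t paddress = 0x8040000000ull;  /* hypothetical physical address */

    /* One 8-byte GPT descriptor per L0GPTSZ-sized slice of the PA space. */
    uint64_t index = (paddress >> l0gptsz) & ((1ull << (pps - l0gptsz)) - 1);
    uint64_t l0_table_bytes = (1ull << (pps - l0gptsz)) * 8;

    /* GPTBR_EL3 base alignment: the low MAX(pps - l0gptsz + 3, 12) bits are masked off. */
    unsigned align_bits = MAX(pps - l0gptsz + 3, 12);

    printf("L0 index %" PRIu64 ", L0 table %" PRIu64 " bytes, base aligned to 2^%u\n",
           index, l0_table_bytes, align_bits);
    return 0;
}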

[PATCH v3 00/25] target/arm: Implement FEAT_RME

2023-02-21 Thread Richard Henderson
This is based on mainline, without any extra ARMv9-A dependencies
which are still under development.  This is good enough to pass
all of the tests within

https://github.com/Huawei/Huawei_CCA_QEMU

Changes for v3:
  * Incorporate fix for m-profile arm_cpu_get_phys_page_attrs_debug,
since it has conflicts with the rest of the patch set.
  * Revert accidental change in S1_ptw_translate remapping
ARMFault_GPCFOnOutput to ARMFault_GPCFOnWalk.
  * Remove __attribute__((nonnull)) early.
  * Rename get_phys_addr_{inner,outer} -> get_phys_addr_{nogpc,gpc}.

Changes for v2:
  * Drop "Fix pmsav8 stage2 secure parameter".
  * Incorporate review feedback.
  * Mark last two patches as "NOTFORMERGE".


r~


Richard Henderson (25):
  target/arm: Handle m-profile in arm_is_secure
  target/arm: Stub arm_hcr_el2_eff for m-profile
  target/arm: Diagnose incorrect usage of arm_is_secure subroutines
  target/arm: Rewrite check_s2_mmu_setup
  target/arm: Add isar_feature_aa64_rme
  target/arm: Update SCR and HCR for RME
  target/arm: SCR_EL3.NS may be RES1
  target/arm: Add RME cpregs
  target/arm: Introduce ARMSecuritySpace
  include/exec/memattrs: Add two bits of space to MemTxAttrs
  target/arm: Adjust the order of Phys and Stage2 ARMMMUIdx
  target/arm: Introduce ARMMMUIdx_Phys_{Realm,Root}
  target/arm: Remove __attribute__((nonnull)) from ptw.c
  target/arm: Pipe ARMSecuritySpace through ptw.c
  target/arm: NSTable is RES0 for the RME EL3 regime
  target/arm: Handle Block and Page bits for security space
  target/arm: Handle no-execute for Realm and Root regimes
  target/arm: Use get_phys_addr_with_struct in S1_ptw_translate
  target/arm: Move s1_is_el0 into S1Translate
  target/arm: Use get_phys_addr_with_struct for stage2
  target/arm: Add GPC syndrome
  target/arm: Implement GPC exceptions
  target/arm: Implement the granule protection check
  target/arm: Enable RME for -cpu max
  hw/arm/virt: Add some memory for Realm Management Monitor

 include/exec/memattrs.h |   9 +-
 include/hw/arm/virt.h   |   2 +
 target/arm/cpu-param.h  |   2 +-
 target/arm/cpu.h| 149 ++--
 target/arm/internals.h  |  27 ++
 target/arm/syndrome.h   |  10 +
 hw/arm/virt.c   |  43 +++
 target/arm/cpu.c|   4 +
 target/arm/cpu64.c  |  37 ++
 target/arm/helper.c | 164 -
 target/arm/ptw.c| 753 ++--
 target/arm/tlb_helper.c |  96 -
 12 files changed, 1073 insertions(+), 223 deletions(-)

-- 
2.34.1




[PATCH v3 11/25] target/arm: Adjust the order of Phys and Stage2 ARMMMUIdx

2023-02-21 Thread Richard Henderson
It will be helpful to have ARMMMUIdx_Phys_* to be in the same
relative order as ARMSecuritySpace enumerators. This requires
the adjustment to the nstable check. While there, check for being
in secure state rather than rely on clearing the low bit making
no change to non-secure state.

Reviewed-by: Peter Maydell 
Signed-off-by: Richard Henderson 
---
 target/arm/cpu.h | 12 ++--
 target/arm/ptw.c | 12 +---
 2 files changed, 11 insertions(+), 13 deletions(-)

diff --git a/target/arm/cpu.h b/target/arm/cpu.h
index 203a3e0046..c5fc475cf8 100644
--- a/target/arm/cpu.h
+++ b/target/arm/cpu.h
@@ -2855,18 +2855,18 @@ typedef enum ARMMMUIdx {
 ARMMMUIdx_E2= 6 | ARM_MMU_IDX_A,
 ARMMMUIdx_E3= 7 | ARM_MMU_IDX_A,
 
-/* TLBs with 1-1 mapping to the physical address spaces. */
-ARMMMUIdx_Phys_NS   = 8 | ARM_MMU_IDX_A,
-ARMMMUIdx_Phys_S= 9 | ARM_MMU_IDX_A,
-
 /*
  * Used for second stage of an S12 page table walk, or for descriptor
  * loads during first stage of an S1 page table walk.  Note that both
  * are in use simultaneously for SecureEL2: the security state for
  * the S2 ptw is selected by the NS bit from the S1 ptw.
  */
-ARMMMUIdx_Stage2= 10 | ARM_MMU_IDX_A,
-ARMMMUIdx_Stage2_S  = 11 | ARM_MMU_IDX_A,
+ARMMMUIdx_Stage2_S  = 8 | ARM_MMU_IDX_A,
+ARMMMUIdx_Stage2= 9 | ARM_MMU_IDX_A,
+
+/* TLBs with 1-1 mapping to the physical address spaces. */
+ARMMMUIdx_Phys_S= 10 | ARM_MMU_IDX_A,
+ARMMMUIdx_Phys_NS   = 11 | ARM_MMU_IDX_A,
 
 /*
  * These are not allocated TLBs and are used only for AT system
diff --git a/target/arm/ptw.c b/target/arm/ptw.c
index 6fb72fb086..5ed5bb5039 100644
--- a/target/arm/ptw.c
+++ b/target/arm/ptw.c
@@ -1410,16 +1410,14 @@ static bool get_phys_addr_lpae(CPUARMState *env, 
S1Translate *ptw,
 descaddr |= (address >> (stride * (4 - level))) & indexmask;
 descaddr &= ~7ULL;
 nstable = extract32(tableattrs, 4, 1);
-if (nstable) {
+if (nstable && ptw->in_secure) {
 /*
  * Stage2_S -> Stage2 or Phys_S -> Phys_NS
- * Assert that the non-secure idx are even, and relative order.
+ * Assert the relative order of the secure/non-secure indexes.
  */
-QEMU_BUILD_BUG_ON((ARMMMUIdx_Phys_NS & 1) != 0);
-QEMU_BUILD_BUG_ON((ARMMMUIdx_Stage2 & 1) != 0);
-QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_NS + 1 != ARMMMUIdx_Phys_S);
-QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2 + 1 != ARMMMUIdx_Stage2_S);
-ptw->in_ptw_idx &= ~1;
+QEMU_BUILD_BUG_ON(ARMMMUIdx_Phys_S + 1 != ARMMMUIdx_Phys_NS);
+QEMU_BUILD_BUG_ON(ARMMMUIdx_Stage2_S + 1 != ARMMMUIdx_Stage2);
+ptw->in_ptw_idx += 1;
 ptw->in_secure = false;
 }
 if (!S1_ptw_translate(env, ptw, descaddr, fi)) {
-- 
2.34.1
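
As an aside, a minimal standalone sketch (tag bits dropped, not QEMU code) of
the property the reordering provides, which is what the new "+= 1" in the
nstable handling relies on instead of clearing bit 0:

#include <assert.h>

enum { Stage2_S = 8, Stage2 = 9, Phys_S = 10, Phys_NS = 11 };

static int to_nonsecure(int idx)
{
    return idx + 1;     /* Stage2_S -> Stage2, Phys_S -> Phys_NS */
}

int main(void)
{
    assert(to_nonsecure(Stage2_S) == Stage2);
    assert(to_nonsecure(Phys_S) == Phys_NS);
    return 0;
}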




[PULL v2 4/8] cpus: Make {start,end}_exclusive() recursive

2023-02-21 Thread Richard Henderson
From: Ilya Leoshkevich 

Currently dying to one of the core_dump_signal()s deadlocks, because
dump_core_and_abort() calls start_exclusive() two times: first via
stop_all_tasks(), and then via preexit_cleanup() ->
qemu_plugin_user_exit().

There are a number of ways to solve this: resume after dumping core;
check cpu_in_exclusive_context() in qemu_plugin_user_exit(); or make
{start,end}_exclusive() recursive. Pick the last option, since it's
the most straightforward one.

Fixes: da91c1920242 ("linux-user: Clean up when exiting due to a signal")
Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230214140829.45392-3-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 include/hw/core/cpu.h |  4 ++--
 cpus-common.c | 12 ++--
 2 files changed, 12 insertions(+), 4 deletions(-)

diff --git a/include/hw/core/cpu.h b/include/hw/core/cpu.h
index 2417597236..671f041bec 100644
--- a/include/hw/core/cpu.h
+++ b/include/hw/core/cpu.h
@@ -349,7 +349,7 @@ struct CPUState {
 bool unplug;
 bool crash_occurred;
 bool exit_request;
-bool in_exclusive_context;
+int exclusive_context_count;
 uint32_t cflags_next_tb;
 /* updates protected by BQL */
 uint32_t interrupt_request;
@@ -758,7 +758,7 @@ void async_safe_run_on_cpu(CPUState *cpu, run_on_cpu_func 
func, run_on_cpu_data
  */
 static inline bool cpu_in_exclusive_context(const CPUState *cpu)
 {
-return cpu->in_exclusive_context;
+return cpu->exclusive_context_count;
 }
 
 /**
diff --git a/cpus-common.c b/cpus-common.c
index 793364dc0e..39f355de98 100644
--- a/cpus-common.c
+++ b/cpus-common.c
@@ -192,6 +192,11 @@ void start_exclusive(void)
 CPUState *other_cpu;
 int running_cpus;
 
+if (current_cpu->exclusive_context_count) {
+current_cpu->exclusive_context_count++;
+return;
+}
+
 qemu_mutex_lock(&qemu_cpu_list_lock);
 exclusive_idle();
 
@@ -219,13 +224,16 @@ void start_exclusive(void)
  */
 qemu_mutex_unlock(&qemu_cpu_list_lock);
 
-current_cpu->in_exclusive_context = true;
+current_cpu->exclusive_context_count = 1;
 }
 
 /* Finish an exclusive operation.  */
 void end_exclusive(void)
 {
-current_cpu->in_exclusive_context = false;
+current_cpu->exclusive_context_count--;
+if (current_cpu->exclusive_context_count) {
+return;
+}
 
 qemu_mutex_lock(&qemu_cpu_list_lock);
 qatomic_set(&pending_cpus, 0);
-- 
2.34.1
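
For reference, a standalone mock (not the QEMU implementation) showing how the
counter lets the nested start/end pairs taken on the core-dump path balance
out, with only the outermost end_exclusive() actually releasing the section:

#include <assert.h>
#include <stdbool.h>

static int exclusive_context_count;
static bool exclusive_held;

static void start_exclusive(void)
{
    if (exclusive_context_count++) {
        return;                 /* already exclusive: just nest */
    }
    exclusive_held = true;      /* stand-in for stopping the other vCPUs */
}

static void end_exclusive(void)
{
    if (--exclusive_context_count) {
        return;                 /* still nested */
    }
    exclusive_held = false;     /* stand-in for releasing the other vCPUs */
}

int main(void)
{
    start_exclusive();          /* e.g. stop_all_tasks() */
    start_exclusive();          /* e.g. qemu_plugin_user_exit() */
    end_exclusive();
    assert(exclusive_held);     /* inner pair did not release */
    end_exclusive();
    assert(!exclusive_held);
    return 0;
}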




[PULL v2 1/8] accel/tcg: Allow the second page of an instruction to be MMIO

2023-02-21 Thread Richard Henderson
If an instruction straddles a page boundary, and the first page
was ram, but the second page was MMIO, we would abort.  Handle
this as if both pages are MMIO, by setting the ram_addr_t for
the first page to -1.

Reported-by: Sid Manning 
Reported-by: Jørgen Hansen 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Richard Henderson 
---
 accel/tcg/translator.c | 12 ++--
 1 file changed, 10 insertions(+), 2 deletions(-)

diff --git a/accel/tcg/translator.c b/accel/tcg/translator.c
index ef5193c67e..1cf404ced0 100644
--- a/accel/tcg/translator.c
+++ b/accel/tcg/translator.c
@@ -176,8 +176,16 @@ static void *translator_access(CPUArchState *env, 
DisasContextBase *db,
 if (host == NULL) {
 tb_page_addr_t phys_page =
 get_page_addr_code_hostp(env, base, &db->host_addr[1]);
-/* We cannot handle MMIO as second page. */
-assert(phys_page != -1);
+
+/*
+ * If the second page is MMIO, treat as if the first page
+ * was MMIO as well, so that we do not cache the TB.
+ */
+if (unlikely(phys_page == -1)) {
+tb_set_page_addr0(tb, -1);
+return NULL;
+}
+
 tb_set_page_addr1(tb, phys_page);
 #ifdef CONFIG_USER_ONLY
 page_protect(end);
-- 
2.34.1




[PULL v2 5/8] linux-user/microblaze: Handle privileged exception

2023-02-21 Thread Richard Henderson
From: Ilya Leoshkevich 

Follow what kernel's full_exception() is doing.

Reviewed-by: Richard Henderson 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230214140829.45392-4-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 linux-user/microblaze/cpu_loop.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/linux-user/microblaze/cpu_loop.c b/linux-user/microblaze/cpu_loop.c
index 5ccf9e942e..212e62d0a6 100644
--- a/linux-user/microblaze/cpu_loop.c
+++ b/linux-user/microblaze/cpu_loop.c
@@ -25,8 +25,8 @@
 
 void cpu_loop(CPUMBState *env)
 {
+int trapnr, ret, si_code, sig;
 CPUState *cs = env_cpu(env);
-int trapnr, ret, si_code;
 
 while (1) {
 cpu_exec_start(cs);
@@ -76,6 +76,7 @@ void cpu_loop(CPUMBState *env)
 env->iflags &= ~(IMM_FLAG | D_FLAG);
 switch (env->esr & 31) {
 case ESR_EC_DIVZERO:
+sig = TARGET_SIGFPE;
 si_code = TARGET_FPE_INTDIV;
 break;
 case ESR_EC_FPU:
@@ -84,6 +85,7 @@ void cpu_loop(CPUMBState *env)
  * if there's no recognized bit set.  Possibly this
  * implies that si_code is 0, but follow the structure.
  */
+sig = TARGET_SIGFPE;
 si_code = env->fsr;
 if (si_code & FSR_IO) {
 si_code = TARGET_FPE_FLTINV;
@@ -97,13 +99,17 @@ void cpu_loop(CPUMBState *env)
 si_code = TARGET_FPE_FLTRES;
 }
 break;
+case ESR_EC_PRIVINSN:
+sig = SIGILL;
+si_code = ILL_PRVOPC;
+break;
 default:
 fprintf(stderr, "Unhandled hw-exception: 0x%x\n",
 env->esr & ESR_EC_MASK);
 cpu_dump_state(cs, stderr, 0);
 exit(EXIT_FAILURE);
 }
-force_sig_fault(TARGET_SIGFPE, si_code, env->pc);
+force_sig_fault(sig, si_code, env->pc);
 break;
 
 case EXCP_DEBUG:
-- 
2.34.1




[PULL v2 2/8] linux-user/sparc: Raise SIGILL for all unhandled software traps

2023-02-21 Thread Richard Henderson
The linux kernel's trap tables vector all unassigned trap
numbers to BAD_TRAP, which then raises SIGILL.

Tested-by: Ilya Leoshkevich 
Reported-by: Ilya Leoshkevich 
Signed-off-by: Richard Henderson 
---
 linux-user/sparc/cpu_loop.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/linux-user/sparc/cpu_loop.c b/linux-user/sparc/cpu_loop.c
index 434c90a55f..c120c42278 100644
--- a/linux-user/sparc/cpu_loop.c
+++ b/linux-user/sparc/cpu_loop.c
@@ -248,6 +248,14 @@ void cpu_loop (CPUSPARCState *env)
 cpu_exec_step_atomic(cs);
 break;
 default:
+/*
+ * Most software trap numbers vector to BAD_TRAP.
+ * Handle anything not explicitly matched above.
+ */
+if (trapnr >= TT_TRAP && trapnr <= TT_TRAP + 0x7f) {
+force_sig_fault(TARGET_SIGILL, ILL_ILLTRP, env->pc);
+break;
+}
 fprintf(stderr, "Unhandled trap: 0x%x\n", trapnr);
 cpu_dump_state(cs, stderr, 0);
 exit(EXIT_FAILURE);
-- 
2.34.1




[PULL v2 7/8] util/cacheflush: fix cache on windows-arm64

2023-02-21 Thread Richard Henderson
From: Pierrick Bouvier 

ctr_el0 access is privileged on this platform and fails as an illegal
instruction.

Windows does not offer a way to flush data cache from userspace, and
only FlushInstructionCache is available in Windows API.

The generic implementation of flush_idcache_range uses,
__builtin___clear_cache, which already use the FlushInstructionCache
function. So we rely on that.

Signed-off-by: Pierrick Bouvier 
Reviewed-by: Richard Henderson 
Message-Id: <20230221153006.20300-2-pierrick.bouv...@linaro.org>
Signed-off-by: Richard Henderson 
---
 util/cacheflush.c | 14 +++---
 1 file changed, 11 insertions(+), 3 deletions(-)

diff --git a/util/cacheflush.c b/util/cacheflush.c
index 2c2c73e085..06c2333a60 100644
--- a/util/cacheflush.c
+++ b/util/cacheflush.c
@@ -121,8 +121,12 @@ static void sys_cache_info(int *isize, int *dsize)
 static bool have_coherent_icache;
 #endif
 
-#if defined(__aarch64__) && !defined(CONFIG_DARWIN)
-/* Apple does not expose CTR_EL0, so we must use system interfaces. */
+#if defined(__aarch64__) && !defined(CONFIG_DARWIN) && !defined(CONFIG_WIN32)
+/*
+ * Apple does not expose CTR_EL0, so we must use system interfaces.
+ * Windows neither, but we use a generic implementation of flush_idcache_range
+ * in this case.
+ */
 static uint64_t save_ctr_el0;
 static void arch_cache_info(int *isize, int *dsize)
 {
@@ -225,7 +229,11 @@ static void __attribute__((constructor)) 
init_cache_info(void)
 
 /* Caches are coherent and do not require flushing; symbol inline. */
 
-#elif defined(__aarch64__)
+#elif defined(__aarch64__) && !defined(CONFIG_WIN32)
+/*
+ * For Windows, we use generic implementation of flush_idcache_range, that
+ * performs a call to FlushInstructionCache, through __builtin___clear_cache.
+ */
 
 #ifdef CONFIG_DARWIN
 /* Apple does not expose CTR_EL0, so we must use system interfaces. */
-- 
2.34.1
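
For illustration, a minimal standalone sketch of the generic fallback being
relied on here; it assumes a GCC/Clang-style toolchain where
__builtin___clear_cache is available, and per the commit message that builtin
ends up calling FlushInstructionCache on mingw:

#include <stddef.h>

static void flush_idcache_range_generic(char *start, size_t len)
{
    /* No CTR_EL0 access needed: let the compiler runtime do the flush. */
    __builtin___clear_cache(start, start + len);
}

int main(void)
{
    static char buf[64];
    flush_idcache_range_generic(buf, sizeof(buf));
    return 0;
}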




[PULL v2 6/8] target/microblaze: Add gdbstub xml

2023-02-21 Thread Richard Henderson
Mirroring the upstream gdb xml files, the two stack boundary
registers are separated out.

Reviewed-by: Edgar E. Iglesias 
Signed-off-by: Richard Henderson 
---
 target/microblaze/cpu.h |  2 +
 target/microblaze/cpu.c |  7 ++-
 target/microblaze/gdbstub.c | 51 +++-
 configs/targets/microblaze-linux-user.mak   |  1 +
 configs/targets/microblaze-softmmu.mak  |  1 +
 configs/targets/microblazeel-linux-user.mak |  1 +
 configs/targets/microblazeel-softmmu.mak|  1 +
 gdb-xml/microblaze-core.xml | 67 +
 gdb-xml/microblaze-stack-protect.xml| 12 
 9 files changed, 128 insertions(+), 15 deletions(-)
 create mode 100644 gdb-xml/microblaze-core.xml
 create mode 100644 gdb-xml/microblaze-stack-protect.xml

diff --git a/target/microblaze/cpu.h b/target/microblaze/cpu.h
index 1e84dd8f47..e541fbb0b3 100644
--- a/target/microblaze/cpu.h
+++ b/target/microblaze/cpu.h
@@ -367,6 +367,8 @@ hwaddr mb_cpu_get_phys_page_attrs_debug(CPUState *cpu, 
vaddr addr,
 MemTxAttrs *attrs);
 int mb_cpu_gdb_read_register(CPUState *cpu, GByteArray *buf, int reg);
 int mb_cpu_gdb_write_register(CPUState *cpu, uint8_t *buf, int reg);
+int mb_cpu_gdb_read_stack_protect(CPUArchState *cpu, GByteArray *buf, int reg);
+int mb_cpu_gdb_write_stack_protect(CPUArchState *cpu, uint8_t *buf, int reg);
 
 static inline uint32_t mb_cpu_read_msr(const CPUMBState *env)
 {
diff --git a/target/microblaze/cpu.c b/target/microblaze/cpu.c
index 817681f9b2..a2d2f5c340 100644
--- a/target/microblaze/cpu.c
+++ b/target/microblaze/cpu.c
@@ -28,6 +28,7 @@
 #include "qemu/module.h"
 #include "hw/qdev-properties.h"
 #include "exec/exec-all.h"
+#include "exec/gdbstub.h"
 #include "fpu/softfloat-helpers.h"
 
 static const struct {
@@ -294,6 +295,9 @@ static void mb_cpu_initfn(Object *obj)
 CPUMBState *env = &cpu->env;
 
 cpu_set_cpustate_pointers(cpu);
+gdb_register_coprocessor(CPU(cpu), mb_cpu_gdb_read_stack_protect,
+ mb_cpu_gdb_write_stack_protect, 2,
+ "microblaze-stack-protect.xml", 0);
 
 set_float_rounding_mode(float_round_nearest_even, &env->fp_status);
 
@@ -422,7 +426,8 @@ static void mb_cpu_class_init(ObjectClass *oc, void *data)
 cc->sysemu_ops = &mb_sysemu_ops;
 #endif
 device_class_set_props(dc, mb_properties);
-cc->gdb_num_core_regs = 32 + 27;
+cc->gdb_num_core_regs = 32 + 25;
+cc->gdb_core_xml_file = "microblaze-core.xml";
 
 cc->disas_set_info = mb_disas_set_info;
 cc->tcg_ops = &mb_tcg_ops;
diff --git a/target/microblaze/gdbstub.c b/target/microblaze/gdbstub.c
index 2e6e070051..8143fcae88 100644
--- a/target/microblaze/gdbstub.c
+++ b/target/microblaze/gdbstub.c
@@ -39,8 +39,11 @@ enum {
 GDB_PVR0  = 32 + 6,
 GDB_PVR11 = 32 + 17,
 GDB_EDR   = 32 + 18,
-GDB_SLR   = 32 + 25,
-GDB_SHR   = 32 + 26,
+};
+
+enum {
+GDB_SP_SHL,
+GDB_SP_SHR,
 };
 
 int mb_cpu_gdb_read_register(CPUState *cs, GByteArray *mem_buf, int n)
@@ -83,12 +86,6 @@ int mb_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 case GDB_EDR:
 val = env->edr;
 break;
-case GDB_SLR:
-val = env->slr;
-break;
-case GDB_SHR:
-val = env->shr;
-break;
 default:
 /* Other SRegs aren't modeled, so report a value of 0 */
 val = 0;
@@ -97,6 +94,23 @@ int mb_cpu_gdb_read_register(CPUState *cs, GByteArray 
*mem_buf, int n)
 return gdb_get_reg32(mem_buf, val);
 }
 
+int mb_cpu_gdb_read_stack_protect(CPUMBState *env, GByteArray *mem_buf, int n)
+{
+uint32_t val;
+
+switch (n) {
+case GDB_SP_SHL:
+val = env->slr;
+break;
+case GDB_SP_SHR:
+val = env->shr;
+break;
+default:
+return 0;
+}
+return gdb_get_reg32(mem_buf, val);
+}
+
 int mb_cpu_gdb_write_register(CPUState *cs, uint8_t *mem_buf, int n)
 {
 MicroBlazeCPU *cpu = MICROBLAZE_CPU(cs);
@@ -135,12 +149,21 @@ int mb_cpu_gdb_write_register(CPUState *cs, uint8_t 
*mem_buf, int n)
 case GDB_EDR:
 env->edr = tmp;
 break;
-case GDB_SLR:
-env->slr = tmp;
-break;
-case GDB_SHR:
-env->shr = tmp;
-break;
+}
+return 4;
+}
+
+int mb_cpu_gdb_write_stack_protect(CPUMBState *env, uint8_t *mem_buf, int n)
+{
+switch (n) {
+case GDB_SP_SHL:
+env->slr = ldl_p(mem_buf);
+break;
+case GDB_SP_SHR:
+env->shr = ldl_p(mem_buf);
+break;
+default:
+return 0;
 }
 return 4;
 }
diff --git a/configs/targets/microblaze-linux-user.mak 
b/configs/targets/microblaze-linux-user.mak
index 4249a37f65..0a2322c249 100644
--- a/configs/targets/microblaze-linux-user.mak
+++ b/configs/targets/microblaze-linux-user.mak
@@ -3,3 +3,4 @@ TARGET_SYSTBL_ABI=common
 TARGET_SYSTBL=syscall.tbl
 TARGET_BIG_ENDIAN=y
 TARGET_HAS_BFLT=y

Re: [PATCH] target/riscv/vector_helper.c: create vext_set_tail_elems_1s()

2023-02-21 Thread liweiwei



On 2023/2/22 02:45, Daniel Henrique Barboza wrote:

Commit 752614cab8e6 ("target/riscv: rvv: Add tail agnostic for vector
load / store instructions") added code to set the tail elements to 1 in
the end of vext_ldst_stride(), vext_ldst_us(), vext_ldst_index() and
vext_ldff(). Aside from an env->vl versus an evl value being used in the
first loop, the code is being repeated 4 times.

Create a helper to avoid code repetition in all those functions.
Arguments that are used in the callers (nf, esz and max_elems) are
passed as arguments. All other values are being derived inside the
helper.

Signed-off-by: Daniel Henrique Barboza 


LGTM.

Reviewed-by: Weiwei Li 


---
  target/riscv/vector_helper.c | 86 +---
  1 file changed, 30 insertions(+), 56 deletions(-)

diff --git a/target/riscv/vector_helper.c b/target/riscv/vector_helper.c
index 00de879787..7d2e3978f1 100644
--- a/target/riscv/vector_helper.c
+++ b/target/riscv/vector_helper.c
@@ -267,6 +267,28 @@ GEN_VEXT_ST_ELEM(ste_h, int16_t, H2, stw)
  GEN_VEXT_ST_ELEM(ste_w, int32_t, H4, stl)
  GEN_VEXT_ST_ELEM(ste_d, int64_t, H8, stq)
  
+static void vext_set_tail_elems_1s(CPURISCVState *env, target_ulong vl,

+   void *vd, uint32_t desc, uint32_t nf,
+   uint32_t esz, uint32_t max_elems)
+{
+uint32_t total_elems = vext_get_total_elems(env, desc, esz);
+uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3;


By the way, env_archcpu(env)->cfg in vector_helper.c can also be replaced 
by cpu_get_cfg().


Regards,

Weiwei Li


+uint32_t vta = vext_vta(desc);
+uint32_t registers_used;
+int k;
+
+for (k = 0; k < nf; ++k) {
+vext_set_elems_1s(vd, vta, (k * max_elems + vl) * esz,
+  (k * max_elems + max_elems) * esz);
+}
+
+if (nf * max_elems % total_elems != 0) {
+registers_used = ((nf * max_elems) * esz + (vlenb - 1)) / vlenb;
+vext_set_elems_1s(vd, vta, (nf * max_elems) * esz,
+  registers_used * vlenb);
+}
+}
+
  /*
   *** stride: access vector element from strided memory
   */
@@ -281,8 +303,6 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
  uint32_t nf = vext_nf(desc);
  uint32_t max_elems = vext_max_elems(desc, log2_esz);
  uint32_t esz = 1 << log2_esz;
-uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-uint32_t vta = vext_vta(desc);
  uint32_t vma = vext_vma(desc);
  
  for (i = env->vstart; i < env->vl; i++, env->vstart++) {

@@ -301,18 +321,8 @@ vext_ldst_stride(void *vd, void *v0, target_ulong base,
  }
  }
  env->vstart = 0;
-/* set tail elements to 1s */
-for (k = 0; k < nf; ++k) {
-vext_set_elems_1s(vd, vta, (k * max_elems + env->vl) * esz,
-  (k * max_elems + max_elems) * esz);
-}
-if (nf * max_elems % total_elems != 0) {
-uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
-uint32_t registers_used =
-((nf * max_elems) * esz + (vlenb - 1)) / vlenb;
-vext_set_elems_1s(vd, vta, (nf * max_elems) * esz,
-  registers_used * vlenb);
-}
+
+vext_set_tail_elems_1s(env, env->vl, vd, desc, nf, esz, max_elems);
  }
  
  #define GEN_VEXT_LD_STRIDE(NAME, ETYPE, LOAD_FN)\

@@ -359,8 +369,6 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
  uint32_t nf = vext_nf(desc);
  uint32_t max_elems = vext_max_elems(desc, log2_esz);
  uint32_t esz = 1 << log2_esz;
-uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-uint32_t vta = vext_vta(desc);
  
  /* load bytes from guest memory */

  for (i = env->vstart; i < evl; i++, env->vstart++) {
@@ -372,18 +380,8 @@ vext_ldst_us(void *vd, target_ulong base, CPURISCVState 
*env, uint32_t desc,
  }
  }
  env->vstart = 0;
-/* set tail elements to 1s */
-for (k = 0; k < nf; ++k) {
-vext_set_elems_1s(vd, vta, (k * max_elems + evl) * esz,
-  (k * max_elems + max_elems) * esz);
-}
-if (nf * max_elems % total_elems != 0) {
-uint32_t vlenb = env_archcpu(env)->cfg.vlen >> 3;
-uint32_t registers_used =
-((nf * max_elems) * esz + (vlenb - 1)) / vlenb;
-vext_set_elems_1s(vd, vta, (nf * max_elems) * esz,
-  registers_used * vlenb);
-}
+
+vext_set_tail_elems_1s(env, evl, vd, desc, nf, esz, max_elems);
  }
  
  /*

@@ -484,8 +482,6 @@ vext_ldst_index(void *vd, void *v0, target_ulong base,
  uint32_t vm = vext_vm(desc);
  uint32_t max_elems = vext_max_elems(desc, log2_esz);
  uint32_t esz = 1 << log2_esz;
-uint32_t total_elems = vext_get_total_elems(env, desc, esz);
-uint32_t vta = vext_vta(desc);
  uint32_t vma = vext_vma(desc);
  
  /* load bytes from guest memory */

@@ -505,18 +501,8 @@ vext_ldst_index(void *vd, void *v0, 
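
For reference, a standalone sketch of the tail computation the helper factors
out, using made-up values (VLEN = 128 bits, nf = 3, max_elems = 2, esz = 4);
it only illustrates the "registers_used" branch taken when the segments do not
fill whole vector registers:

#include <stdio.h>

int main(void)
{
    unsigned vlen = 128;                  /* hypothetical VLEN in bits */
    unsigned vlenb = vlen >> 3;           /* ... in bytes: 16 */
    unsigned nf = 3, max_elems = 2, esz = 4;

    unsigned used_bytes = nf * max_elems * esz;                  /* 24 */
    unsigned registers_used = (used_bytes + vlenb - 1) / vlenb;  /* ceil(24/16) = 2 */

    /* Tail of the last partially used register gets set to all-ones. */
    printf("set 1s over bytes [%u, %u)\n", used_bytes, registers_used * vlenb);
    return 0;
}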

[PULL v2 8/8] sysemu/os-win32: fix setjmp/longjmp on windows-arm64

2023-02-21 Thread Richard Henderson
From: Pierrick Bouvier 

Windows implementation of setjmp/longjmp is done in
C:/WINDOWS/system32/ucrtbase.dll. Alas, on arm64, it seems to *always*
perform stack unwinding, which crashes from generated code.

By using alternative implementation built in mingw, we avoid doing stack
unwinding and this fixes crash when calling longjmp.

Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Pierrick Bouvier 
Acked-by: Richard Henderson 
Message-Id: <20230221153006.20300-3-pierrick.bouv...@linaro.org>
Signed-off-by: Richard Henderson 
---
 include/sysemu/os-win32.h | 28 
 meson.build   | 21 +
 2 files changed, 45 insertions(+), 4 deletions(-)

diff --git a/include/sysemu/os-win32.h b/include/sysemu/os-win32.h
index 5b38c7bd04..97d0243aee 100644
--- a/include/sysemu/os-win32.h
+++ b/include/sysemu/os-win32.h
@@ -51,14 +51,34 @@ typedef struct sockaddr_un {
 extern "C" {
 #endif
 
-#if defined(_WIN64)
-/* On w64, setjmp is implemented by _setjmp which needs a second parameter.
+#if defined(__aarch64__)
+/*
+ * On windows-arm64, setjmp is available in only one variant, and longjmp 
always
+ * does stack unwinding. This crashes with generated code.
+ * Thus, we use another implementation of setjmp (not windows one), coming from
+ * mingw, which never performs stack unwinding.
+ */
+#undef setjmp
+#undef longjmp
+/*
+ * These functions are not declared in setjmp.h because __aarch64__ defines
+ * setjmp to _setjmpex instead. However, they are still defined in 
libmingwex.a,
+ * which gets linked automatically.
+ */
+extern int __mingw_setjmp(jmp_buf);
+extern void __attribute__((noreturn)) __mingw_longjmp(jmp_buf, int);
+#define setjmp(env) __mingw_setjmp(env)
+#define longjmp(env, val) __mingw_longjmp(env, val)
+#elif defined(_WIN64)
+/*
+ * On windows-x64, setjmp is implemented by _setjmp which needs a second 
parameter.
  * If this parameter is NULL, longjump does no stack unwinding.
  * That is what we need for QEMU. Passing the value of register rsp (default)
- * lets longjmp try a stack unwinding which will crash with generated code. */
+ * lets longjmp try a stack unwinding which will crash with generated code.
+ */
 # undef setjmp
 # define setjmp(env) _setjmp(env, NULL)
-#endif
+#endif /* __aarch64__ */
 /* QEMU uses sigsetjmp()/siglongjmp() as the portable way to specify
  * "longjmp and don't touch the signal masks". Since we know that the
  * savemask parameter will always be zero we can safely define these
diff --git a/meson.build b/meson.build
index bc7e5b1d15..6a139e7085 100644
--- a/meson.build
+++ b/meson.build
@@ -2466,6 +2466,27 @@ if targetos == 'windows'
 }''', name: '_lock_file and _unlock_file'))
 endif
 
+if targetos == 'windows'
+  mingw_has_setjmp_longjmp = cc.links('''
+#include 
+int main(void) {
+  /*
+   * These functions are not available in setjmp header, but may be
+   * available at link time, from libmingwex.a.
+   */
+  extern int __mingw_setjmp(jmp_buf);
+  extern void __attribute__((noreturn)) __mingw_longjmp(jmp_buf, int);
+  jmp_buf env;
+  __mingw_setjmp(env);
+  __mingw_longjmp(env, 0);
+}
+  ''', name: 'mingw setjmp and longjmp')
+
+  if cpu == 'aarch64' and not mingw_has_setjmp_longjmp
+error('mingw must provide setjmp/longjmp for windows-arm64')
+  endif
+endif
+
 
 # Target configuration #
 
-- 
2.34.1
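
For illustration, a standalone sketch of the setjmp/longjmp control flow that
must not unwind; fake_generated_code() is a stand-in for a translation block,
not QEMU code. With the ucrtbase implementation on windows-arm64 the longjmp
would try SEH unwinding through frames with no unwind info and crash, while
the mingw variants simply restore the saved registers:

#include <setjmp.h>
#include <stdio.h>

static jmp_buf env;

static void fake_generated_code(void)
{
    /* stand-in for a TCG translation block raising an exception */
    longjmp(env, 1);
}

int main(void)
{
    if (setjmp(env) == 0) {
        fake_generated_code();
    } else {
        printf("back in the exec loop\n");
    }
    return 0;
}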




[PULL v2 3/8] linux-user: Always exit from exclusive state in fork_end()

2023-02-21 Thread Richard Henderson
From: Ilya Leoshkevich 

fork()ed processes currently start with
current_cpu->in_exclusive_context set, which is, strictly speaking, not
correct, but does not cause problems (even assertion failures).

With one of the next patches, the code begins to rely on this value, so
fix it by always calling end_exclusive() in fork_end().

Reviewed-by: Richard Henderson 
Reviewed-by: Alex Bennée 
Signed-off-by: Ilya Leoshkevich 
Message-Id: <20230214140829.45392-2-...@linux.ibm.com>
Signed-off-by: Richard Henderson 
---
 linux-user/main.c| 10 ++
 linux-user/syscall.c |  1 +
 2 files changed, 7 insertions(+), 4 deletions(-)

diff --git a/linux-user/main.c b/linux-user/main.c
index 4290651c3c..4ff30ff980 100644
--- a/linux-user/main.c
+++ b/linux-user/main.c
@@ -161,13 +161,15 @@ void fork_end(int child)
 }
 qemu_init_cpu_list();
 gdbserver_fork(thread_cpu);
-/* qemu_init_cpu_list() takes care of reinitializing the
- * exclusive state, so we don't need to end_exclusive() here.
- */
 } else {
 cpu_list_unlock();
-end_exclusive();
 }
+/*
+ * qemu_init_cpu_list() reinitialized the child exclusive state, but we
+ * also need to keep current_cpu consistent, so call end_exclusive() for
+ * both child and parent.
+ */
+end_exclusive();
 }
 
 __thread CPUState *thread_cpu;
diff --git a/linux-user/syscall.c b/linux-user/syscall.c
index 1e868e9b0e..a6c426d73c 100644
--- a/linux-user/syscall.c
+++ b/linux-user/syscall.c
@@ -6752,6 +6752,7 @@ static int do_fork(CPUArchState *env, unsigned int flags, 
abi_ulong newsp,
 cpu_clone_regs_parent(env, flags);
 fork_end(0);
 }
+g_assert(!cpu_in_exclusive_context(cpu));
 }
 return ret;
 }
-- 
2.34.1




[PULL v2 0/8] tcg patch queue

2023-02-21 Thread Richard Henderson
The following changes since commit 79b677d658d3d35e1e776826ac4abb28cdce69b8:

  Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into staging 
(2023-02-21 11:28:31 +)

are available in the Git repository at:

  https://gitlab.com/rth7680/qemu.git tags/pull-tcg-20230221

for you to fetch changes up to dbd672c87f19949bb62bfb1fb3a97b9729fd7560:

  sysemu/os-win32: fix setjmp/longjmp on windows-arm64 (2023-02-21 13:45:48 
-1000)


tcg: Allow first half of insn in ram, and second half in mmio
linux-user/sparc: SIGILL for unknown trap vectors
linux-user/microblaze: SIGILL for privileged insns
linux-user: Fix deadlock while exiting due to signal
target/microblaze: Add gdbstub xml
util: Adjust cacheflush for windows-arm64
include/sysemu/os-win32: Adjust setjmp/longjmp for windows-arm64


Ilya Leoshkevich (3):
  linux-user: Always exit from exclusive state in fork_end()
  cpus: Make {start,end}_exclusive() recursive
  linux-user/microblaze: Handle privileged exception

Pierrick Bouvier (2):
  util/cacheflush: fix cache on windows-arm64
  sysemu/os-win32: fix setjmp/longjmp on windows-arm64

Richard Henderson (3):
  accel/tcg: Allow the second page of an instruction to be MMIO
  linux-user/sparc: Raise SIGILL for all unhandled software traps
  target/microblaze: Add gdbstub xml

 include/hw/core/cpu.h   |  4 +-
 include/sysemu/os-win32.h   | 28 ++--
 target/microblaze/cpu.h |  2 +
 accel/tcg/translator.c  | 12 +-
 cpus-common.c   | 12 +-
 linux-user/main.c   | 10 +++--
 linux-user/microblaze/cpu_loop.c| 10 -
 linux-user/sparc/cpu_loop.c |  8 
 linux-user/syscall.c|  1 +
 target/microblaze/cpu.c |  7 ++-
 target/microblaze/gdbstub.c | 51 --
 util/cacheflush.c   | 14 --
 configs/targets/microblaze-linux-user.mak   |  1 +
 configs/targets/microblaze-softmmu.mak  |  1 +
 configs/targets/microblazeel-linux-user.mak |  1 +
 configs/targets/microblazeel-softmmu.mak|  1 +
 gdb-xml/microblaze-core.xml | 67 +
 gdb-xml/microblaze-stack-protect.xml| 12 ++
 meson.build | 21 +
 19 files changed, 229 insertions(+), 34 deletions(-)
 create mode 100644 gdb-xml/microblaze-core.xml
 create mode 100644 gdb-xml/microblaze-stack-protect.xml



Re: [PATCH 2/6] hw/cxl: rename mailbox return code type from ret_code to CXLRetCode

2023-02-21 Thread Ira Weiny
Jonathan Cameron wrote:
> This enum typedef used to be local to one file, so having a generic
> name wasn't a big problem even if it wasn't compliant with QEMU naming
> conventions.  Now it is in cxl_device.h to support use outside of
> cxl-mailbox-utils.c rename it.

Same comment as 1/6 but still.

Reviewed-by: Ira Weiny 

> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 62 ++---
>  include/hw/cxl/cxl_device.h |  2 +-
>  2 files changed, 32 insertions(+), 32 deletions(-)
> 



Re: [PATCH 1/6] hw/cxl: Move enum ret_code definition to cxl_device.h

2023-02-21 Thread Ira Weiny
Jonathan Cameron wrote:
> Needs tidy up and rename to something more generic now it is
> in a header.

I'm not opposed to this change and patch 2 but I don't see where
CXLRetCode is being used outside of cxl-mailbox-utils.c in this series.

Despite that reservation I think this is a good clarification.

Reviewed-by: Ira Weiny 

> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 28 
>  include/hw/cxl/cxl_device.h | 28 
>  2 files changed, 28 insertions(+), 28 deletions(-)
> 



Re: [PATCH 6/6] hw/cxl: Add clear poison mailbox command support.

2023-02-21 Thread Ira Weiny
Jonathan Cameron wrote:
> Current implementation is very simple so many of the corner
> cases do not exist (e.g. fragmenting larger poison list entries)
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 77 +
>  hw/mem/cxl_type3.c  | 36 +
>  include/hw/cxl/cxl_device.h |  1 +
>  3 files changed, 114 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 7d3f7bcd3a..f56c76b205 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -65,6 +65,7 @@ enum {
>  MEDIA_AND_POISON = 0x43,
>  #define GET_POISON_LIST0x0
>  #define INJECT_POISON  0x1
> +#define CLEAR_POISON   0x2
>  };
>  
>  struct cxl_cmd;
> @@ -474,6 +475,80 @@ static CXLRetCode cmd_media_inject_poison(struct cxl_cmd 
> *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +static CXLRetCode cmd_media_clear_poison(struct cxl_cmd *cmd,
> + CXLDeviceState *cxl_dstate,
> + uint16_t *len)
> +{
> +CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +CXLPoisonList *poison_list = &ct3d->poison_list;
> +CXLType3Class *cvc = CXL_TYPE3_GET_CLASS(ct3d);
> +struct clear_poison_pl {
> +uint64_t dpa;
> +uint8_t data[64];
> +};
> +CXLPoison *ent;
> +
> +struct clear_poison_pl *in = (void *)cmd->payload;
> +
> +if (in->dpa + 64 > cxl_dstate->mem_size) {
> +return CXL_MBOX_INVALID_PA;
> +}
> +
> +QLIST_FOREACH(ent, poison_list, node) {
> +/*
> + * Test for contained in entry. Simpler than general case
> + * as clearing 64 bytes and entries 64 byte aligned
> + */
> +if ((in->dpa < ent->start) || (in->dpa >= ent->start + ent->length)) 
> {
> +continue;
> +}
> +/* Do accounting early as we know one will go away */
> +ct3d->poison_list_cnt--;
> +if (in->dpa > ent->start) {
> +CXLPoison *frag;
> +if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {

Isn't this always impossible because poison_list_cnt was just decremented?

I wonder if the early accounting is correct with this check.

> +cxl_set_poison_list_overflowed(ct3d);
> +break;
> +}
> +frag = g_new0(CXLPoison, 1);
> +
> +frag->start = ent->start;
> +frag->length = in->dpa - ent->start;
> +frag->type = ent->type;
> +
> +QLIST_INSERT_HEAD(poison_list, frag, node);
> +ct3d->poison_list_cnt++;
> +}
> +if (in->dpa + 64 < ent->start + ent->length) {
> +CXLPoison *frag;
> +
> +if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
> +cxl_set_poison_list_overflowed(ct3d);
> +break;
> +}
> +
> +frag = g_new0(CXLPoison, 1);
> +
> +frag->start = in->dpa + 64;
> +frag->length = ent->start + ent->length - frag->start;
> +frag->type = ent->type;
> +QLIST_INSERT_HEAD(poison_list, frag, node);
> +ct3d->poison_list_cnt++;
> +}
> +/* Any fragments have been added, free original entry */
> +QLIST_REMOVE(ent, node);

Seems safer to decrement here and check limit prior to adding the
fragments above.

> +g_free(ent);
> +break;
> +}
> +/* Clearing a region with no poison is not an error so always do so */
> +if (cvc->set_cacheline)

Per Qemu coding you need '{'.  But is this check needed? ...

> +if (!cvc->set_cacheline(ct3d, in->dpa, in->data)) {
> +return CXL_MBOX_INTERNAL_ERROR;
> +}
> +
> +return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -505,6 +580,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>  cmd_media_get_poison_list, 16, 0 },
>  [MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
>  cmd_media_inject_poison, 8, 0 },
> +[MEDIA_AND_POISON][CLEAR_POISON] = { "MEDIA_AND_POISON_CLEAR_POISON",
> +cmd_media_clear_poison, 72, 0 },
>  };
>  
>  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> diff --git a/hw/mem/cxl_type3.c b/hw/mem/cxl_type3.c
> index 3585f78b4e..8adc725edc 100644
> --- a/hw/mem/cxl_type3.c
> +++ b/hw/mem/cxl_type3.c
> @@ -925,6 +925,41 @@ static void set_lsa(CXLType3Dev *ct3d, const void *buf, 
> uint64_t size,
>   */
>  }
>  
> +static bool set_cacheline(CXLType3Dev *ct3d, uint64_t dpa_offset, uint8_t 
> *data)
> +{
> +MemoryRegion *vmr = NULL, *pmr = NULL;
> +AddressSpace *as;
> +
> +if (ct3d->hostvmem) {
> +vmr = host_memory_backend_get_memory(ct3d->hostvmem);
> +}
> +if (ct3d->hostpmem) {
> 

Re: [PATCH 5/6] hw/cxl: Add poison injection via the mailbox.

2023-02-21 Thread Ira Weiny
Jonathan Cameron wrote:
> Very simple implementation to allow testing of corresponding
> kernel code. Note that for now we track each 64 byte section
> independently.  Whilst a valid implementation choice, it may
> make sense to fuse entries so as to prove out more complex
> corners of the kernel code.
> 
> Signed-off-by: Jonathan Cameron 
> ---
>  hw/cxl/cxl-mailbox-utils.c | 40 ++
>  1 file changed, 40 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index cf3cfb10a1..7d3f7bcd3a 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -64,6 +64,7 @@ enum {
>  #define SET_LSA   0x3
>  MEDIA_AND_POISON = 0x43,
>  #define GET_POISON_LIST0x0
> +#define INJECT_POISON  0x1
>  };
>  
>  struct cxl_cmd;
> @@ -436,6 +437,43 @@ static CXLRetCode cmd_media_get_poison_list(struct 
> cxl_cmd *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +static CXLRetCode cmd_media_inject_poison(struct cxl_cmd *cmd,
> +  CXLDeviceState *cxl_dstate,
> +  uint16_t *len)
> +{
> +CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +CXLPoisonList *poison_list = &ct3d->poison_list;
> +CXLPoison *ent;
> +struct inject_poison_pl {
> +uint64_t dpa;
> +};
> +struct inject_poison_pl *in = (void *)cmd->payload;
> +CXLPoison *p;
> +
> +QLIST_FOREACH(ent, poison_list, node) {
> +if (ent->start == in->dpa && ent->length == 64) {

How does this interact with the QMP inject poison?  Should this be
checking the range of the entries?

Ira

> +return CXL_MBOX_SUCCESS;
> +}
> +}
> +
> +if (ct3d->poison_list_cnt == CXL_POISON_LIST_LIMIT) {
> +return CXL_MBOX_INJECT_POISON_LIMIT;
> +}
> +p = g_new0(CXLPoison, 1);
> +
> +p->length = 64;
> +p->start = in->dpa;
> +p->type = CXL_POISON_TYPE_INJECTED;
> +
> +/*
> + * Possible todo: Merge with existing entry if next to it and if same 
> type
> + */
> +QLIST_INSERT_HEAD(poison_list, p, node);
> +ct3d->poison_list_cnt++;
> +
> +return CXL_MBOX_SUCCESS;
> +}
> +
>  #define IMMEDIATE_CONFIG_CHANGE (1 << 1)
>  #define IMMEDIATE_DATA_CHANGE (1 << 2)
>  #define IMMEDIATE_POLICY_CHANGE (1 << 3)
> @@ -465,6 +503,8 @@ static struct cxl_cmd cxl_cmd_set[256][256] = {
>  ~0, IMMEDIATE_CONFIG_CHANGE | IMMEDIATE_DATA_CHANGE },
>  [MEDIA_AND_POISON][GET_POISON_LIST] = { 
> "MEDIA_AND_POISON_GET_POISON_LIST",
>  cmd_media_get_poison_list, 16, 0 },
> +[MEDIA_AND_POISON][INJECT_POISON] = { "MEDIA_AND_POISON_INJECT_POISON",
> +cmd_media_inject_poison, 8, 0 },
>  };
>  
>  void cxl_process_mailbox(CXLDeviceState *cxl_dstate)
> -- 
> 2.37.2
> 





Re: [PATCH v3 23/24] include: split target_long definition from cpu-defs

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

While we will continue to include this via cpu-defs it is useful to be
able to define this separately for 32 and 64 bit versions of an
otherwise target independent compilation unit.

Signed-off-by: Alex Bennée
---
  include/exec/cpu-defs.h| 19 +
  include/exec/target_long.h | 42 ++
  2 files changed, 43 insertions(+), 18 deletions(-)
  create mode 100644 include/exec/target_long.h


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 4/6] hw/cxl: QMP based poison injection support

2023-02-21 Thread Ira Weiny
Jonathan Cameron wrote:
> Inject poison using qmp command cxl-inject-poison to add an entry to the
> poison list.
> 
> For now, the poison is not returned on CXL.mem reads, but only via the
> mailbox command Get Poison List.
> 
> See CXL rev 3.0, sec 8.2.9.8.4.1 Get Poison list (Opcode 4300h)
> 
> Kernel patches to use this interface here:
> https://lore.kernel.org/linux-cxl/cover.1665606782.git.alison.schofi...@intel.com/
> 
> To inject poison using qmp (telnet to the qmp port)
> { "execute": "qmp_capabilities" }
> 
> { "execute": "cxl-inject-poison",
> "arguments": {
>  "path": "/machine/peripheral/cxl-pmem0",
>  "start": 2048,
>  "length": 256
> }
> }
> 
> Adjusted to select a device on your machine.
> 
> Note that the poison list supported is kept short enough to avoid the
> complexity of state machine that is needed to handle the MORE flag.
> 
> Signed-off-by: Jonathan Cameron 
> 
> ---
> v3:
> Improve QMP documentation.
> 
> v2:
> Moved to QMP to allow for single command.
> Update reference in coverletter
> Added specific setting of type for this approach to injection.
> Drop the unnecessary ct3d class get_poison_list callback.
> Block overlapping regions from being injected
> Handle list overflow
> Use Ira's utility function to get the timestamps
> ---
>  hw/cxl/cxl-mailbox-utils.c  | 82 +
>  hw/mem/cxl_type3.c  | 56 +
>  hw/mem/cxl_type3_stubs.c|  3 ++
>  hw/mem/meson.build  |  2 +
>  include/hw/cxl/cxl_device.h | 20 +
>  qapi/cxl.json   | 16 
>  6 files changed, 179 insertions(+)
> 
> diff --git a/hw/cxl/cxl-mailbox-utils.c b/hw/cxl/cxl-mailbox-utils.c
> index 580366ed2f..cf3cfb10a1 100644
> --- a/hw/cxl/cxl-mailbox-utils.c
> +++ b/hw/cxl/cxl-mailbox-utils.c
> @@ -62,6 +62,8 @@ enum {
>  #define GET_PARTITION_INFO 0x0
>  #define GET_LSA   0x2
>  #define SET_LSA   0x3
> +MEDIA_AND_POISON = 0x43,
> +#define GET_POISON_LIST0x0
>  };
>  
>  struct cxl_cmd;
> @@ -267,6 +269,8 @@ static CXLRetCode cmd_identify_memory_device(struct 
> cxl_cmd *cmd,
>  id->persistent_capacity = cxl_dstate->pmem_size / 
> CXL_CAPACITY_MULTIPLIER;
>  id->volatile_capacity = cxl_dstate->vmem_size / CXL_CAPACITY_MULTIPLIER;
>  id->lsa_size = cvc->get_lsa_size(ct3d);
> +id->poison_list_max_mer[1] = 0x1; /* 256 poison records */

Using st24_le_p() would be more robust I think.

> +id->inject_poison_limit = 0; /* No limit - so limited by main poison 
> record limit */
>  
>  *len = sizeof(*id);
>  return CXL_MBOX_SUCCESS;
> @@ -356,6 +360,82 @@ static CXLRetCode cmd_ccls_set_lsa(struct cxl_cmd *cmd,
>  return CXL_MBOX_SUCCESS;
>  }
>  
> +/*
> + * This is very inefficient, but good enough for now!
> + * Also the payload will always fit, so no need to handle the MORE flag and
> + * make this stateful. We may want to allow longer poison lists to aid
> + * testing that kernel functionality.
> + */
> +static CXLRetCode cmd_media_get_poison_list(struct cxl_cmd *cmd,
> +CXLDeviceState *cxl_dstate,
> +uint16_t *len)
> +{
> +struct get_poison_list_pl {
> +uint64_t pa;
> +uint64_t length;
> +} QEMU_PACKED;
> +
> +struct get_poison_list_out_pl {
> +uint8_t flags;
> +uint8_t rsvd1;
> +uint64_t overflow_timestamp;
> +uint16_t count;
> +uint8_t rsvd2[0x14];
> +struct {
> +uint64_t addr;
> +uint32_t length;
> +uint32_t resv;
> +} QEMU_PACKED records[];
> +} QEMU_PACKED;
> +
> +struct get_poison_list_pl *in = (void *)cmd->payload;
> +struct get_poison_list_out_pl *out = (void *)cmd->payload;
> +CXLType3Dev *ct3d = container_of(cxl_dstate, CXLType3Dev, cxl_dstate);
> +uint16_t record_count = 0, i = 0;
> +uint64_t query_start = in->pa;

Should we verify Bits[5:0] are 0?

> +uint64_t query_length = in->length;

Isn't in->length in units of 64bytes?  Does that get converted somewhere?

> +CXLPoisonList *poison_list = &ct3d->poison_list;
> +CXLPoison *ent;
> +uint16_t out_pl_len;
> +
> +QLIST_FOREACH(ent, poison_list, node) {
> +/* Check for no overlap */
> +if (ent->start >= query_start + query_length ||
> +ent->start + ent->length <= query_start) {
> +continue;
> +}
> +record_count++;
> +}
> +out_pl_len = sizeof(*out) + record_count * sizeof(out->records[0]);
> +assert(out_pl_len <= CXL_MAILBOX_MAX_PAYLOAD_SIZE);
> +
> +memset(out, 0, out_pl_len);
> +QLIST_FOREACH(ent, poison_list, node) {
> +uint64_t start, stop;
> +
> +/* Check for no overlap */
> +if (ent->start >= query_start + query_length ||
> +ent->start + ent->length <= query_start) {
> +continue;
> +
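
For reference, a standalone sketch of the half-open overlap test used twice
above, written as a predicate with hypothetical ranges:

#include <stdbool.h>
#include <stdint.h>
#include <assert.h>

static bool ranges_overlap(uint64_t a_start, uint64_t a_len,
                           uint64_t b_start, uint64_t b_len)
{
    /* no overlap when one range ends at or before the other begins */
    return !(a_start >= b_start + b_len || a_start + a_len <= b_start);
}

int main(void)
{
    assert(ranges_overlap(0x1000, 0x100, 0x10c0, 0x40));   /* partial overlap */
    assert(!ranges_overlap(0x1000, 0x100, 0x1100, 0x40));  /* adjacent, no overlap */
    return 0;
}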

Re: [PATCH v3 22/24] testing: probe gdb for supported architectures ahead of time

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

Currently when we encounter a gdb that is old or not built with
multiarch in mind we fail rather messily. Try and improve the
situation by probing ahead of time and setting
HOST_GDB_SUPPORTS_ARCH=y in the relevant tcg configs. We can then skip
and give a more meaningful message if we don't run the test.

[AJB: we still miss some arches, for example gdb uses s390 which fails
when we look for s390x. Not sure what the best way to deal with that
is? Maybe define a gdb_arch as we probe each target?]


I think we need to have a complete gdb -> qemu mapping.
Seems like this would be fairly easy in python...


r~



Re: [PATCH v3 18/24] gdbstub: don't use target_ulong while handling registers

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

This is a hangover from the original code. addr is misleading as it is
only really a register id. While len will never exceed
MAX_PACKET_LENGTH I've used size_t as that is what strlen returns.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 

---
v3
   - fix commit message
   - use unsigned for regid


Don't use unsigned, which you did here:


  static void handle_write_all_regs(GArray *params, void *user_ctx)
  {
-target_ulong addr, len;
+unsigned int reg_id;


but not here


  static void handle_read_all_regs(GArray *params, void *user_ctx)
  {
-target_ulong addr, len;
+int reg_id;


because the comparison,


+for (reg_id = 0; reg_id < gdbserver_state.g_cpu->gdb_num_g_regs; reg_id++) 
{


is against signed:

include/hw/core/cpu.h:377:int gdb_num_g_regs;


r~



Re: [PATCH v3 11/24] gdbstub: rationalise signal mapping in softmmu

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

We don't really need a table for mapping two symbols.

Signed-off-by: Alex Bennée
Suggested-by: Richard Henderson
---
  gdbstub/softmmu.c | 19 +++
  1 file changed, 7 insertions(+), 12 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH v3 09/24] gdbstub: move chunk of softmmu functionality to own file

2023-02-21 Thread Richard Henderson

On 2/21/23 12:52, Alex Bennée wrote:

This is mostly code motion but a number of things needed to be done
for this minimal patch set:

   - move shared structures to internals.h
   - splitting some functions into user and softmmu versions
   - fixing a few casting issues to keep softmmu common

More CONFIG_USER_ONLY stuff will be handled in a following patches.

Reviewed-by: Fabiano Rosas
Signed-off-by: Alex Bennée

---
v3
   - rebase fixes
   - move extern to internals.h
---
  gdbstub/internals.h  |  43 -
  gdbstub/gdbstub.c| 421 +--
  gdbstub/softmmu.c| 415 ++
  gdbstub/trace-events |   4 +-
  4 files changed, 470 insertions(+), 413 deletions(-)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 5/5] hw: Remove mentions of NDEBUG

2023-02-21 Thread Richard Henderson

On 2/21/23 13:25, Philippe Mathieu-Daudé wrote:

Since commit 262a69f428 ("osdep.h: Prohibit disabling
assert() in supported builds") 'NDEBUG' can not be defined.

Signed-off-by: Philippe Mathieu-Daudé 
---
  hw/scsi/mptsas.c   | 2 --
  hw/virtio/virtio.c | 2 --
  2 files changed, 4 deletions(-)


Reviewed-by: Richard Henderson 

r~



diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
index c485da792c..5b373d3ed6 100644
--- a/hw/scsi/mptsas.c
+++ b/hw/scsi/mptsas.c
@@ -1240,8 +1240,6 @@ static void *mptsas_load_request(QEMUFile *f, SCSIRequest 
*sreq)
  n = qemu_get_be32(f);
  /* TODO: add a way for SCSIBusInfo's load_request to fail,
   * and fail migration instead of asserting here.
- * This is just one thing (there are probably more) that must be
- * fixed before we can allow NDEBUG compilation.
   */
  assert(n >= 0);
  
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c

index f35178f5fc..c6b3e3fb08 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1898,8 +1898,6 @@ void *qemu_get_virtqueue_element(VirtIODevice *vdev, 
QEMUFile *f, size_t sz)
  
  /* TODO: teach all callers that this can fail, and return failure instead

   * of asserting here.
- * This is just one thing (there are probably more) that must be
- * fixed before we can allow NDEBUG compilation.
   */
  assert(ARRAY_SIZE(data.in_addr) >= data.in_num);
  assert(ARRAY_SIZE(data.out_addr) >= data.out_num);





Re: [PATCH 4/5] block/vvfat: Remove pointless check of NDEBUG

2023-02-21 Thread Richard Henderson

On 2/21/23 13:25, Philippe Mathieu-Daudé wrote:

Since commit 262a69f428 ("osdep.h: Prohibit disabling
assert() in supported builds") 'NDEBUG' can not be defined,
so '#ifndef NDEBUG' is dead code. Remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
  block/vvfat.c | 3 ---
  1 file changed, 3 deletions(-)


Reviewed-by: Richard Henderson 

r~



diff --git a/block/vvfat.c b/block/vvfat.c
index d7d775bd2c..fd45e86416 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2784,13 +2784,10 @@ static int handle_commits(BDRVVVFATState* s)
  fail = -2;
  break;
  case ACTION_WRITEOUT: {
-#ifndef NDEBUG
-/* these variables are only used by assert() below */
  direntry_t* entry = array_get(&(s->directory),
  commit->param.writeout.dir_index);
  uint32_t begin = begin_of_direntry(entry);
  mapping_t* mapping = find_mapping_for_cluster(s, begin);
-#endif
  
  assert(mapping);

  assert(mapping->begin == begin);





Re: [PATCH 2/5] scripts/checkpatch.pl: Do not allow assert(0)

2023-02-21 Thread Richard Henderson

On 2/21/23 13:25, Philippe Mathieu-Daudé wrote:

Since commit 262a69f428 ("osdep.h: Prohibit disabling assert()
in supported builds") we can not build QEMU with NDEBUG (or
G_DISABLE_ASSERT) defined, thus 'assert(0)' always aborts QEMU.

However some static analyzers / compilers don't notice NDEBUG
can't be defined and emit warnings if code is used after an
'assert(0)' call. See for example commit c0a6665c3c ("target/i386:
Remove compilation errors when -Werror=maybe-uninitialized").

Apparently such compiler isn't as clever with G_DISABLE_ASSERT,
so we can silent these warnings by using g_assert_not_reached()
which is easier to read anyway.

In order to avoid these annoying warnings, add a checkpatch rule
to prohibit 'assert(0)'. Suggest using g_assert_not_reached()
instead. For example when reverting the previous patch we get:

   ERROR: use g_assert_not_reached() instead of assert(0)
   #21: FILE: target/ppc/dfp_helper.c:124:
   +assert(0); /* cannot get here */

   ERROR: use g_assert_not_reached() instead of assert(0)
   #30: FILE: target/ppc/dfp_helper.c:141:
   +assert(0); /* cannot get here */

   total: 2 errors, 0 warnings, 16 lines checked

Signed-off-by: Philippe Mathieu-Daudé
---
  scripts/checkpatch.pl | 3 +++
  1 file changed, 3 insertions(+)


Reviewed-by: Richard Henderson 

r~



Re: [PATCH 3/5] bulk: Replace [g_]assert(0) -> g_assert_not_reached()

2023-02-21 Thread Richard Henderson

On 2/21/23 13:25, Philippe Mathieu-Daudé wrote:

In order to avoid warnings such as commit c0a6665c3c ("target/i386:
Remove compilation errors when -Werror=maybe-uninitialized"),
replace all assert(0) and g_assert(0) by g_assert_not_reached().

Remove any code following g_assert_not_reached().

See previous commit for rationale.

Signed-off-by: Philippe Mathieu-Daudé 
---
  docs/spin/aio_notify_accept.promela |   6 +-
  docs/spin/aio_notify_bug.promela|   6 +-


C only.  Otherwise,
Reviewed-by: Richard Henderson 


r~



Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage

2023-02-21 Thread Peter Xu
Hi, Gavin,

On Wed, Feb 22, 2023 at 10:44:07AM +1100, Gavin Shan wrote:
> Peter, could you please give some hints for me to understand the atomic
> and non-atomic update here? Ok, I will drop this part of changes in next
> revision with the assumption that we have atomic update supported for
> ARM64.

See commit f39b7d2b96.  Please don't remove the change in this patch.

The comment was just something I thought about when reading, not something
I suggested to change.

If we remove it, we'll need to remove the whole chunk, not just your changes
here.  Still, please take it with a grain of salt until someone can help
confirm, because I may be missing something else here.

In short: before we know anything solidly, your current code is exactly
correct, AFAICT.

Thanks,

-- 
Peter Xu




Re: [PATCH v4 2/4] sysemu/os-win32: fix setjmp/longjmp on windows-arm64

2023-02-21 Thread Richard Henderson

On 2/21/23 05:30, Pierrick Bouvier wrote:

The Windows implementation of setjmp/longjmp is done in
C:/WINDOWS/system32/ucrtbase.dll. Alas, on arm64, it seems to *always*
perform stack unwinding, which crashes when called from generated code.

By using the alternative implementation built into mingw, we avoid the stack
unwinding and this fixes the crash when calling longjmp.

Signed-off-by: Pierrick Bouvier 
Acked-by: Richard Henderson 
---
  include/sysemu/os-win32.h | 28 
  meson.build   | 21 +
  2 files changed, 45 insertions(+), 4 deletions(-)


Queueing this to tcg-next.


r~
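For readers who want the shape of the change without the full diff, a minimal
sketch of the described override follows; the exact mingw symbol names and
declarations are assumptions here, not a quote of the actual patch:

#include <setjmp.h>

#if defined(_WIN32) && defined(__aarch64__)
/*
 * Assumed shape only: bypass ucrtbase's setjmp/longjmp (which unwind
 * the stack on arm64) and use the mingw-provided implementations,
 * which do not unwind.
 */
#undef setjmp
#undef longjmp
int __mingw_setjmp(jmp_buf env);
void __attribute__((noreturn)) __mingw_longjmp(jmp_buf env, int val);
#define setjmp(env)        __mingw_setjmp(env)
#define longjmp(env, val)  __mingw_longjmp(env, val)
#endif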



Re: [PATCH v4 1/4] util/cacheflush: fix cache on windows-arm64

2023-02-21 Thread Richard Henderson

On 2/21/23 05:30, Pierrick Bouvier wrote:

ctr_el0 access is privileged on this platform and fails as an illegal
instruction.

Windows does not offer a way to flush the data cache from userspace, and
only FlushInstructionCache is available in the Windows API.

The generic implementation of flush_idcache_range uses
__builtin___clear_cache, which already uses the FlushInstructionCache
function. So we rely on that.

Signed-off-by: Pierrick Bouvier 
Reviewed-by: Richard Henderson 
---
  util/cacheflush.c | 14 +++---
  1 file changed, 11 insertions(+), 3 deletions(-)


Queueing this to tcg-next.


r~
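As a rough sketch of the generic fallback the commit message relies on (the
exact code in util/cacheflush.c may differ; the split rx/rw handling below is
an assumption), the idea is simply to defer to the compiler builtin, which on
Windows ends up calling FlushInstructionCache:

#include <stddef.h>
#include <stdint.h>

/* Generic fallback: let the compiler builtin handle cache maintenance. */
static void flush_idcache_range(uintptr_t rx, uintptr_t rw, size_t len)
{
    if (rw != rx) {
        /* Separate writable mapping: clear it as well. */
        __builtin___clear_cache((char *)rw, (char *)rw + len);
    }
    __builtin___clear_cache((char *)rx, (char *)rx + len);
}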



Re: [PATCH v1 3/6] kvm: Synchronize the backup bitmap in the last stage

2023-02-21 Thread Gavin Shan

On 2/22/23 4:46 AM, Peter Xu wrote:

On Mon, Feb 13, 2023 at 08:39:22AM +0800, Gavin Shan wrote:

In the last stage of live migration or memory slot removal, the
backup bitmap needs to be synchronized when it has been enabled.

Signed-off-by: Gavin Shan 
---
  accel/kvm/kvm-all.c  | 11 +++
  include/sysemu/kvm_int.h |  1 +
  2 files changed, 12 insertions(+)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 01a6a026af..b5e12de522 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1352,6 +1352,10 @@ static void kvm_set_phys_mem(KVMMemoryListener *kml,
   */
  if (kvm_state->kvm_dirty_ring_size) {
  kvm_dirty_ring_reap_locked(kvm_state, NULL);
+if (kvm_state->kvm_dirty_ring_with_bitmap) {
+kvm_slot_sync_dirty_pages(mem);
+kvm_slot_get_dirty_log(kvm_state, mem);
+}
  } else {
  kvm_slot_get_dirty_log(kvm_state, mem);
  }


IIUC after the memory atomic update changes land in QEMU, we may not need
this sync at all.

My understanding is that we sync dirty log here only because of non-atomic
updates happening in the past and we may lose dirty bits unexpectedly.
Maybe Paolo knows.

But that needs some more justification and history digging, so definitely
more suitable to leave it for later and separate discussion.

Reviewed-by: Peter Xu 



Peter, could you please give some hints for me to understand the atomic
and non-atomic update here? Ok, I will drop this part of changes in next
revision with the assumption that we have atomic update supported for
ARM64.

Thanks,
Gavin




@@ -1573,6 +1577,12 @@ static void kvm_log_sync_global(MemoryListener *l, bool 
last_stage)
  mem = &kml->slots[i];
  if (mem->memory_size && mem->flags & KVM_MEM_LOG_DIRTY_PAGES) {
  kvm_slot_sync_dirty_pages(mem);
+
+if (s->kvm_dirty_ring_with_bitmap && last_stage &&
+kvm_slot_get_dirty_log(s, mem)) {
+kvm_slot_sync_dirty_pages(mem);
+}
+
  /*
   * This is not needed by KVM_GET_DIRTY_LOG because the
   * ioctl will unconditionally overwrite the whole region.
@@ -3701,6 +3711,7 @@ static void kvm_accel_instance_init(Object *obj)
  s->kernel_irqchip_split = ON_OFF_AUTO_AUTO;
  /* KVM dirty ring is by default off */
  s->kvm_dirty_ring_size = 0;
+s->kvm_dirty_ring_with_bitmap = false;
  s->notify_vmexit = NOTIFY_VMEXIT_OPTION_RUN;
  s->notify_window = 0;
  }
diff --git a/include/sysemu/kvm_int.h b/include/sysemu/kvm_int.h
index 60b520a13e..fdd5b1bde0 100644
--- a/include/sysemu/kvm_int.h
+++ b/include/sysemu/kvm_int.h
@@ -115,6 +115,7 @@ struct KVMState
  } *as;
  uint64_t kvm_dirty_ring_bytes;  /* Size of the per-vcpu dirty ring */
  uint32_t kvm_dirty_ring_size;   /* Number of dirty GFNs per ring */
+bool kvm_dirty_ring_with_bitmap;
  struct KVMDirtyRingReaper reaper;
  NotifyVmexitOption notify_vmexit;
  uint32_t notify_window;
--
2.23.0








Re: [PATCH v4 4/4] target/ppc: fix warning with clang-15

2023-02-21 Thread Richard Henderson

On 2/21/23 12:30, Philippe Mathieu-Daudé wrote:

On 21/2/23 16:30, Pierrick Bouvier wrote:

When compiling for windows-arm64 using clang-15, it reports a sometimes
uninitialized variable. This seems to be a false positive, as a default
case guards the switch expressions, preventing an uninitialized value from
being returned, but clang seems unhappy with the assert(0) definition.


$ git grep 'assert(0)' | wc -l
   96

Should we mass-update and forbid 'assert(0)' adding a check in
scripts/checkpatch.pl? Otherwise we'll keep getting similar clang
warnings...


I just think assert(0) produces a less clean error message, so on that basis yes, we 
should replace them all.  Perhaps abort() as well, unless there's an error_report 
immediately preceding.


The fact that assert(0) was seen to fall through is a system header bug.  I see we have a 
workaround in include/qemu/osdep.h for __MINGW32__, but I guess this doesn't trigger for 
arm64?  Pierrick, would you mind testing a change there?



r~
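For context, the include/qemu/osdep.h workaround mentioned above is roughly
the following (quoted from memory, so treat the exact form as an assumption);
it redirects assert() to g_assert() because mingw's assert() is not marked
noreturn:

#ifdef __MINGW32__
/* mingw's assert() is not noreturn, so route it through glib's, which is. */
#undef assert
#define assert(x)  g_assert(x)
#endif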



Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket from spanning multiple NUMA nodes

2023-02-21 Thread Gavin Shan

On 2/22/23 10:31 AM, Philippe Mathieu-Daudé wrote:

On 22/2/23 00:12, Gavin Shan wrote:

On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 10:21, Gavin Shan wrote:

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing multiple CPUs in one socket from being associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and the virt machine, but the issue isn't reproducible
on x86 machines. So I think it's a machine- or architecture-specific issue.
However, I believe RISC-V should have a similar issue because
linux/drivers/base/arch_topology.c is shared by ARM64 and RISC-V.
x86 doesn't use that driver to populate its CPU topology.


Oh, I haven't thought about the other archs, I meant this seem a generic
issue which affects all (ARM) machines, so why restrict to the (ARM)
virt machine?



[Ccing Igor for comments]

Well, virt machine is the only concern to us for now. You're right that all 
ARM64 and ARM machines
need this check and limitation. So the check needs to be done in the generic 
path. The best way
I can figure out is like something below. The idea is to introduce a switch to 
'struct NumaState'
and do the check in the generic path. The switch is turned on by individual 
machines. Please let me
know if you have better ideas


Can't this be done generically in machine_numa_finish_cpu_init()
-> numa_validate_initiator()?



Yes, machine_numa_finish_cpu_init() is better place, before 
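
A rough sketch of what such a generic check could look like, assuming it is
called from machine_numa_finish_cpu_init(); the function name and error
wording below are assumptions, only the CpuInstanceProperties fields come
from the patch above:

static void numa_validate_socket_mapping(MachineState *ms)
{
    const CPUArchIdList *cpus = ms->possible_cpus;
    int i, j;

    for (i = 0; i < cpus->len; i++) {
        for (j = i + 1; j < cpus->len; j++) {
            const CpuInstanceProperties *a = &cpus->cpus[i].props;
            const CpuInstanceProperties *b = &cpus->cpus[j].props;

            /* Two CPUs sharing a socket must not be split across nodes. */
            if (a->has_socket_id && a->has_node_id &&
                b->has_socket_id && b->has_node_id &&
                a->socket_id == b->socket_id &&
                a->node_id != b->node_id) {
                error_report("CPU %d and CPU %d share socket %" PRId64
                             " but are mapped to NUMA nodes %" PRId64
                             " and %" PRId64,
                             i, j, a->socket_id, a->node_id, b->node_id);
                exit(1);
            }
        }
    }
}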

Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket from spanning multiple NUMA nodes

2023-02-21 Thread Philippe Mathieu-Daudé

On 22/2/23 00:12, Gavin Shan wrote:

On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 10:21, Gavin Shan wrote:

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910

   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing multiple CPUs in one socket from being associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = 
mc->possible_cpu_arch_ids(ms);

+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have 
been "

+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState 
*machine)

  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and the virt machine, but the issue isn't
reproducible on x86 machines. So I think it's a machine- or
architecture-specific issue. However, I believe RISC-V should have a
similar issue because linux/drivers/base/arch_topology.c is shared by
ARM64 and RISC-V.

x86 doesn't use that driver to populate its CPU topology.


Oh, I haven't thought about the other archs, I meant this seem a generic
issue which affects all (ARM) machines, so why restrict to the (ARM)
virt machine?



[Ccing Igor for comments]

Well, virt machine is the only concern to us for now. You're right that 
all ARM64 and ARM machines
need this check and limitation. So the check needs to be done in the 
generic path. The best way
I can figure out is like something below. The idea is to introduce a 
switch to 'struct NumaState'
and do the check in the generic path. The switch is turned on by 
individual machines. Please let me

know if you have better ideas


Can't this be done generically in machine_numa_finish_cpu_init()
-> numa_validate_initiator()?

- Add 'bool struct NumaState::has_strict_socket_mapping', which is 
'false' by default until
   machine specific 

[PATCH 2/5] scripts/checkpatch.pl: Do not allow assert(0)

2023-02-21 Thread Philippe Mathieu-Daudé
Since commit 262a69f428 ("osdep.h: Prohibit disabling assert()
in supported builds") we can not build QEMU with NDEBUG (or
G_DISABLE_ASSERT) defined, thus 'assert(0)' always aborts QEMU.

However some static analyzers / compilers don't notice that NDEBUG
can't be defined and emit warnings if code is used after an
'assert(0)' call. See for example commit c0a6665c3c ("target/i386:
Remove compilation errors when -Werror=maybe-uninitialized").

Apparently such compilers aren't as clever with G_DISABLE_ASSERT,
so we can silence these warnings by using g_assert_not_reached(),
which is easier to read anyway.

In order to avoid these annoying warnings, add a checkpatch rule
to prohibit 'assert(0)'. Suggest using g_assert_not_reached()
instead. For example when reverting the previous patch we get:

  ERROR: use g_assert_not_reached() instead of assert(0)
  #21: FILE: target/ppc/dfp_helper.c:124:
  +assert(0); /* cannot get here */

  ERROR: use g_assert_not_reached() instead of assert(0)
  #30: FILE: target/ppc/dfp_helper.c:141:
  +assert(0); /* cannot get here */

  total: 2 errors, 0 warnings, 16 lines checked

Signed-off-by: Philippe Mathieu-Daudé 
---
 scripts/checkpatch.pl | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/scripts/checkpatch.pl b/scripts/checkpatch.pl
index 6ecabfb2b5..d768171dcf 100755
--- a/scripts/checkpatch.pl
+++ b/scripts/checkpatch.pl
@@ -2982,6 +2982,9 @@ sub process {
if ($line =~ /\bsysconf\(_SC_PAGESIZE\)/) {
ERROR("use qemu_real_host_page_size() instead of 
sysconf(_SC_PAGESIZE)\n" . $herecurr);
}
+   if ($line =~ /\b(g_)?assert\(0\)/) {
+   ERROR("use g_assert_not_reached() instead of 
assert(0)\n" . $herecurr);
+   }
my $non_exit_glib_asserts = qr{g_assert_cmpstr|
g_assert_cmpint|
g_assert_cmpuint|
-- 
2.38.1




[PATCH 1/5] target/ppc: fix warning with clang-15

2023-02-21 Thread Philippe Mathieu-Daudé
From: Pierrick Bouvier 

When compiling for windows-arm64 using clang-15, it reports a sometimes
uninitialized variable. This seems to be a false positive, as a default
case guards the switch expressions, preventing an uninitialized value from
being returned, but clang seems unhappy with the assert(0) definition.

Changing the code to g_assert_not_reached() fixes the warning.

Signed-off-by: Pierrick Bouvier 
Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Message-Id: <20230221153006.20300-5-pierrick.bouv...@linaro.org>
Signed-off-by: Philippe Mathieu-Daudé 
---
 target/ppc/dfp_helper.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/target/ppc/dfp_helper.c b/target/ppc/dfp_helper.c
index cc024316d5..5967ea07a9 100644
--- a/target/ppc/dfp_helper.c
+++ b/target/ppc/dfp_helper.c
@@ -121,7 +121,7 @@ static void dfp_set_round_mode_from_immediate(uint8_t r, 
uint8_t rmc,
 case 3: /* use FPSCR rounding mode */
 return;
 default:
-assert(0); /* cannot get here */
+g_assert_not_reached();
 }
 } else { /* r == 1 */
 switch (rmc & 3) {
@@ -138,7 +138,7 @@ static void dfp_set_round_mode_from_immediate(uint8_t r, 
uint8_t rmc,
 rnd = DEC_ROUND_HALF_DOWN;
 break;
 default:
-assert(0); /* cannot get here */
+g_assert_not_reached();
 }
 }
 decContextSetRounding(&dfp->context, rnd);
-- 
2.38.1




[PATCH 0/5] bulk: Replace assert(0) -> g_assert_not_reached()

2023-02-21 Thread Philippe Mathieu-Daudé
Save contributors from posting a patch each time clang
produces a -Werror=maybe-uninitialized warning on
assert(0). Replace with g_assert_not_reached() and
prohibit '[g_]assert(0)'. Remove NDEBUG.

Philippe Mathieu-Daudé (4):
  scripts/checkpatch.pl: Do not allow assert(0)
  bulk: Replace [g_]assert(0) -> g_assert_not_reached()
  block/vvfat: Remove pointless check of NDEBUG
  hw: Remove mentions of NDEBUG

Pierrick Bouvier (1):
  target/ppc: fix warning with clang-15

 block/vvfat.c   |   3 -
 docs/spin/aio_notify_accept.promela |   6 +-
 docs/spin/aio_notify_bug.promela|   6 +-
 hw/acpi/aml-build.c |   3 +-
 hw/arm/highbank.c   |   2 +-
 hw/char/avr_usart.c |   2 +-
 hw/core/numa.c  |   2 +-
 hw/net/i82596.c |   2 +-
 hw/scsi/mptsas.c|   2 -
 hw/virtio/virtio.c  |   2 -
 hw/watchdog/watchdog.c  |   2 +-
 migration/migration-hmp-cmds.c  |   2 +-
 migration/postcopy-ram.c|  21 ++
 migration/ram.c |   8 +--
 qobject/qlit.c  |   2 +-
 qobject/qnum.c  |  12 ++--
 scripts/checkpatch.pl   |   3 +
 softmmu/rtc.c   |   2 +-
 target/mips/sysemu/physaddr.c   |   3 +-
 target/mips/tcg/msa_helper.c| 104 ++--
 target/ppc/dfp_helper.c |  12 ++--
 target/ppc/mmu_helper.c |   2 +-
 tests/qtest/ipmi-bt-test.c  |   2 +-
 tests/qtest/ipmi-kcs-test.c |   4 +-
 tests/qtest/rtl8139-test.c  |   2 +-
 25 files changed, 96 insertions(+), 115 deletions(-)

-- 
2.38.1




[PATCH 5/5] hw: Remove mentions of NDEBUG

2023-02-21 Thread Philippe Mathieu-Daudé
Since commit 262a69f428 ("osdep.h: Prohibit disabling
assert() in supported builds") 'NDEBUG' can not be defined.

Signed-off-by: Philippe Mathieu-Daudé 
---
 hw/scsi/mptsas.c   | 2 --
 hw/virtio/virtio.c | 2 --
 2 files changed, 4 deletions(-)

diff --git a/hw/scsi/mptsas.c b/hw/scsi/mptsas.c
index c485da792c..5b373d3ed6 100644
--- a/hw/scsi/mptsas.c
+++ b/hw/scsi/mptsas.c
@@ -1240,8 +1240,6 @@ static void *mptsas_load_request(QEMUFile *f, SCSIRequest 
*sreq)
 n = qemu_get_be32(f);
 /* TODO: add a way for SCSIBusInfo's load_request to fail,
  * and fail migration instead of asserting here.
- * This is just one thing (there are probably more) that must be
- * fixed before we can allow NDEBUG compilation.
  */
 assert(n >= 0);
 
diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index f35178f5fc..c6b3e3fb08 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -1898,8 +1898,6 @@ void *qemu_get_virtqueue_element(VirtIODevice *vdev, 
QEMUFile *f, size_t sz)
 
 /* TODO: teach all callers that this can fail, and return failure instead
  * of asserting here.
- * This is just one thing (there are probably more) that must be
- * fixed before we can allow NDEBUG compilation.
  */
 assert(ARRAY_SIZE(data.in_addr) >= data.in_num);
 assert(ARRAY_SIZE(data.out_addr) >= data.out_num);
-- 
2.38.1




[PATCH 4/5] block/vvfat: Remove pointless check of NDEBUG

2023-02-21 Thread Philippe Mathieu-Daudé
Since commit 262a69f428 ("osdep.h: Prohibit disabling
assert() in supported builds") 'NDEBUG' can not be defined,
so '#ifndef NDEBUG' is dead code. Remove it.

Signed-off-by: Philippe Mathieu-Daudé 
---
 block/vvfat.c | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/block/vvfat.c b/block/vvfat.c
index d7d775bd2c..fd45e86416 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -2784,13 +2784,10 @@ static int handle_commits(BDRVVVFATState* s)
 fail = -2;
 break;
 case ACTION_WRITEOUT: {
-#ifndef NDEBUG
-/* these variables are only used by assert() below */
 direntry_t* entry = array_get(&(s->directory),
 commit->param.writeout.dir_index);
 uint32_t begin = begin_of_direntry(entry);
 mapping_t* mapping = find_mapping_for_cluster(s, begin);
-#endif
 
 assert(mapping);
 assert(mapping->begin == begin);
-- 
2.38.1




[PATCH 3/5] bulk: Replace [g_]assert(0) -> g_assert_not_reached()

2023-02-21 Thread Philippe Mathieu-Daudé
In order to avoid warnings such as those fixed by commit c0a6665c3c
("target/i386: Remove compilation errors when -Werror=maybe-uninitialized"),
replace all assert(0) and g_assert(0) with g_assert_not_reached().

Remove any code following g_assert_not_reached().

See previous commit for rationale.

Signed-off-by: Philippe Mathieu-Daudé 
---
 docs/spin/aio_notify_accept.promela |   6 +-
 docs/spin/aio_notify_bug.promela|   6 +-
 hw/acpi/aml-build.c |   3 +-
 hw/arm/highbank.c   |   2 +-
 hw/char/avr_usart.c |   2 +-
 hw/core/numa.c  |   2 +-
 hw/net/i82596.c |   2 +-
 hw/watchdog/watchdog.c  |   2 +-
 migration/migration-hmp-cmds.c  |   2 +-
 migration/postcopy-ram.c|  21 ++
 migration/ram.c |   8 +--
 qobject/qlit.c  |   2 +-
 qobject/qnum.c  |  12 ++--
 softmmu/rtc.c   |   2 +-
 target/mips/sysemu/physaddr.c   |   3 +-
 target/mips/tcg/msa_helper.c| 104 ++--
 target/ppc/dfp_helper.c |   8 +--
 target/ppc/mmu_helper.c |   2 +-
 tests/qtest/ipmi-bt-test.c  |   2 +-
 tests/qtest/ipmi-kcs-test.c |   4 +-
 tests/qtest/rtl8139-test.c  |   2 +-
 21 files changed, 91 insertions(+), 106 deletions(-)

diff --git a/docs/spin/aio_notify_accept.promela 
b/docs/spin/aio_notify_accept.promela
index 9cef2c955d..f929d30328 100644
--- a/docs/spin/aio_notify_accept.promela
+++ b/docs/spin/aio_notify_accept.promela
@@ -118,7 +118,7 @@ accept_if_req_not_eventually_false:
 if
 :: req -> goto accept_if_req_not_eventually_false;
 fi;
-assert(0);
+g_assert_not_reached();
 }
 
 #else
@@ -141,12 +141,12 @@ accept_if_event_not_eventually_true:
 :: !event && notifier_done  -> do :: true -> skip; od;
 :: !event && !notifier_done -> goto 
accept_if_event_not_eventually_true;
 fi;
-assert(0);
+g_assert_not_reached();
 
 accept_if_event_not_eventually_false:
 if
 :: event -> goto accept_if_event_not_eventually_false;
 fi;
-assert(0);
+g_assert_not_reached();
 }
 #endif
diff --git a/docs/spin/aio_notify_bug.promela b/docs/spin/aio_notify_bug.promela
index b3bfca1ca4..ce6f5177ed 100644
--- a/docs/spin/aio_notify_bug.promela
+++ b/docs/spin/aio_notify_bug.promela
@@ -106,7 +106,7 @@ accept_if_req_not_eventually_false:
 if
 :: req -> goto accept_if_req_not_eventually_false;
 fi;
-assert(0);
+g_assert_not_reached();
 }
 
 #else
@@ -129,12 +129,12 @@ accept_if_event_not_eventually_true:
 :: !event && notifier_done  -> do :: true -> skip; od;
 :: !event && !notifier_done -> goto 
accept_if_event_not_eventually_true;
 fi;
-assert(0);
+g_assert_not_reached();
 
 accept_if_event_not_eventually_false:
 if
 :: event -> goto accept_if_event_not_eventually_false;
 fi;
-assert(0);
+g_assert_not_reached();
 }
 #endif
diff --git a/hw/acpi/aml-build.c b/hw/acpi/aml-build.c
index ea331a20d1..97dfdcdd2f 100644
--- a/hw/acpi/aml-build.c
+++ b/hw/acpi/aml-build.c
@@ -534,8 +534,7 @@ void aml_append(Aml *parent_ctx, Aml *child)
 case AML_NO_OPCODE:
 break;
 default:
-assert(0);
-break;
+g_assert_not_reached();
 }
 build_append_array(parent_ctx->buf, buf);
 build_free_array(buf);
diff --git a/hw/arm/highbank.c b/hw/arm/highbank.c
index f12aacea6b..fc212195ca 100644
--- a/hw/arm/highbank.c
+++ b/hw/arm/highbank.c
@@ -198,7 +198,7 @@ static void calxeda_init(MachineState *machine, enum 
cxmachines machine_id)
 machine->cpu_type = ARM_CPU_TYPE_NAME("cortex-a15");
 break;
 default:
-assert(0);
+g_assert_not_reached();
 }
 
 for (n = 0; n < smp_cpus; n++) {
diff --git a/hw/char/avr_usart.c b/hw/char/avr_usart.c
index 5bcf9db0b7..e738a2ca97 100644
--- a/hw/char/avr_usart.c
+++ b/hw/char/avr_usart.c
@@ -86,7 +86,7 @@ static void update_char_mask(AVRUsartState *usart)
 usart->char_mask = 0b;
 break;
 default:
-assert(0);
+g_assert_not_reached();
 }
 }
 
diff --git a/hw/core/numa.c b/hw/core/numa.c
index d8d36b16d8..26ef02792a 100644
--- a/hw/core/numa.c
+++ b/hw/core/numa.c
@@ -381,7 +381,7 @@ void parse_numa_hmat_lb(NumaState *numa_state, 
NumaHmatLBOptions *node,
 }
 lb_data.data = node->bandwidth;
 } else {
-assert(0);
+g_assert_not_reached();
 }
 
 g_array_append_val(hmat_lb->list, lb_data);
diff --git a/hw/net/i82596.c b/hw/net/i82596.c
index ec21e2699a..eda0f586fb 100644
--- a/hw/net/i82596.c
+++ b/hw/net/i82596.c
@@ -285,7 +285,7 @@ static void command_loop(I82596State *s)
 case CmdDump:
 case CmdDiagnose:
 printf("FIXME Command %d !!\n", cmd & 7);
-assert(0);
+g_assert_not_reached();
 }
 
 

Re: [PATCH v1 2/6] migration: Add last stage indicator to global dirty log synchronization

2023-02-21 Thread Gavin Shan

On 2/22/23 4:36 AM, Peter Xu wrote:

On Mon, Feb 13, 2023 at 08:39:21AM +0800, Gavin Shan wrote:

The global dirty log synchronization is used when KVM and dirty ring
are enabled. There is a particularity for ARM64 where the backup
bitmap is used to track dirty pages in non-running-vcpu situations.
It means the dirty ring works with the combination of ring buffer
and backup bitmap. The dirty bits in the backup bitmap need to be
collected in the last stage of live migration.

In order to identify the last stage of live migration and pass it
down, an extra parameter is added to the relevant functions and
callbacks. This last stage indicator isn't used until the dirty
ring is enabled in the subsequent patches.

No functional change intended.

Signed-off-by: Gavin Shan 


Reviewed-by: Peter Xu 

One trivial thing to mention below.


---
  accel/kvm/kvm-all.c   |  2 +-
  include/exec/memory.h |  5 +++--
  migration/dirtyrate.c |  4 ++--
  migration/ram.c   | 20 ++--
  softmmu/memory.c  | 10 +-
  5 files changed, 21 insertions(+), 20 deletions(-)

diff --git a/accel/kvm/kvm-all.c b/accel/kvm/kvm-all.c
index 9b26582655..01a6a026af 100644
--- a/accel/kvm/kvm-all.c
+++ b/accel/kvm/kvm-all.c
@@ -1554,7 +1554,7 @@ static void kvm_log_sync(MemoryListener *listener,
  kvm_slots_unlock();
  }
  
-static void kvm_log_sync_global(MemoryListener *l)

+static void kvm_log_sync_global(MemoryListener *l, bool last_stage)
  {
  KVMMemoryListener *kml = container_of(l, KVMMemoryListener, listener);
  KVMState *s = kvm_state;
diff --git a/include/exec/memory.h b/include/exec/memory.h
index 2e602a2fad..75b2fd9f48 100644
--- a/include/exec/memory.h
+++ b/include/exec/memory.h
@@ -929,8 +929,9 @@ struct MemoryListener {
   * its @log_sync must be NULL.  Vice versa.
   *
   * @listener: The #MemoryListener.
+ * @last_stage: The last stage to synchronize the log during migration


IMHO it may be important to mention the vcpu status here that the caller
guarantees to call the last_stage==true only once, only after all vcpus are
stopped (and vcpus will not be started again if migration succeeded).



Yes, I will update the comments in next revision accordingly.


   */
-void (*log_sync_global)(MemoryListener *listener);
+void (*log_sync_global)(MemoryListener *listener, bool last_stage);


Thanks,
Gavin
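
For what it's worth, the requested clarification could read roughly as
follows (suggested wording only, not the actual patch text):

 * @last_stage: whether this is the last stage of the dirty log sync.
 *  The caller guarantees it is set to true at most once, and only after
 *  all vCPUs have been stopped; the vCPUs will not run again if the
 *  migration succeeds.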




Re: [PATCH v1 1/6] linux-headers: Update for dirty ring

2023-02-21 Thread Gavin Shan

On 2/22/23 3:30 AM, Peter Maydell wrote:

On Mon, 13 Feb 2023 at 00:39, Gavin Shan  wrote:


Signed-off-by: Gavin Shan 
---
  linux-headers/asm-arm64/kvm.h | 1 +
  linux-headers/linux/kvm.h | 2 ++
  2 files changed, 3 insertions(+)


For this to be a non-RFC patch, this needs to be a proper
sync of the headers against an upstream kernel tree.
(By-hand tweaks are fine for purposes of working on
and getting patchsets reviewed.)



Yes, I vaguely remember there is a script to synchronize the Linux header
files, which is './scripts/update-linux-headers.sh'. I think I need to run
the following command to update?

  # ./scripts/update-linux-headers.sh  

Thanks,
Gavin




Re: [PATCH] hw/arm/virt: Prevent CPUs in one socket from spanning multiple NUMA nodes

2023-02-21 Thread Gavin Shan

On 2/21/23 9:21 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 10:21, Gavin Shan wrote:

On 2/21/23 8:15 PM, Philippe Mathieu-Daudé wrote:

On 21/2/23 09:53, Gavin Shan wrote:

Linux kernel guest reports warning when two CPUs in one socket have
been associated with different NUMA nodes, using the following command
lines.

   -smp 6,maxcpus=6,sockets=2,clusters=1,cores=3,threads=1 \
   -numa node,nodeid=0,cpus=0-1,memdev=ram0    \
   -numa node,nodeid=1,cpus=2-3,memdev=ram1    \
   -numa node,nodeid=2,cpus=4-5,memdev=ram2    \

   [ cut here ]
   WARNING: CPU: 0 PID: 1 at kernel/sched/topology.c:2271 
build_sched_domains+0x284/0x910
   Modules linked in:
   CPU: 0 PID: 1 Comm: swapper/0 Not tainted 5.14.0-268.el9.aarch64 #1
   pstate: 0045 (nzcv daif +PAN -UAO -TCO -DIT -SSBS BTYPE=--)
   pc : build_sched_domains+0x284/0x910
   lr : build_sched_domains+0x184/0x910
   sp : 8804bd50
   x29: 8804bd50 x28: 0002 x27: 
   x26: 89cf9a80 x25:  x24: 89cbf840
   x23: 80325000 x22: 005df800 x21: 8a4ce508
   x20:  x19: 80324440 x18: 0014
   x17: 388925c0 x16: 5386a066 x15: 9c10cc2e
   x14: 01c0 x13: 0001 x12: 7fffb1a0
   x11: 7fffb180 x10: 8a4ce508 x9 : 0041
   x8 : 8a4ce500 x7 : 8a4cf920 x6 : 0001
   x5 : 0001 x4 : 0007 x3 : 0002
   x2 : 1000 x1 : 8a4cf928 x0 : 0001
   Call trace:
    build_sched_domains+0x284/0x910
    sched_init_domains+0xac/0xe0
    sched_init_smp+0x48/0xc8
    kernel_init_freeable+0x140/0x1ac
    kernel_init+0x28/0x140
    ret_from_fork+0x10/0x20

Fix it by preventing multiple CPUs in one socket from being associated with
different NUMA nodes.

Reported-by: Yihuang Yu 
Signed-off-by: Gavin Shan 
---
  hw/arm/virt.c | 37 +
  1 file changed, 37 insertions(+)

diff --git a/hw/arm/virt.c b/hw/arm/virt.c
index ac626b3bef..e0af267c77 100644
--- a/hw/arm/virt.c
+++ b/hw/arm/virt.c
@@ -230,6 +230,39 @@ static bool cpu_type_valid(const char *cpu)
  return false;
  }
+static bool numa_state_valid(MachineState *ms)
+{
+    MachineClass *mc = MACHINE_GET_CLASS(ms);
+    NumaState *state = ms->numa_state;
+    const CPUArchIdList *possible_cpus = mc->possible_cpu_arch_ids(ms);
+    const CPUArchId *cpus = possible_cpus->cpus;
+    int len = possible_cpus->len, i, j;
+
+    if (!state || state->num_nodes <= 1 || len <= 1) {
+    return true;
+    }
+
+    for (i = 0; i < len; i++) {
+    for (j = i + 1; j < len; j++) {
+    if (cpus[i].props.has_socket_id &&
+    cpus[i].props.has_node_id &&
+    cpus[j].props.has_socket_id &&
+    cpus[j].props.has_node_id &&
+    cpus[i].props.socket_id == cpus[j].props.socket_id &&
+    cpus[i].props.node_id != cpus[j].props.node_id) {
+    error_report("CPU-%d and CPU-%d in socket-%ld have been "
+ "associated with node-%ld and node-%ld",
+ i, j, cpus[i].props.socket_id,
+ cpus[i].props.node_id,
+ cpus[j].props.node_id);
+    return false;
+    }
+    }
+    }
+
+    return true;
+}
+
  static void create_randomness(MachineState *ms, const char *node)
  {
  struct {
@@ -2040,6 +2073,10 @@ static void machvirt_init(MachineState *machine)
  exit(1);
  }
+    if (!numa_state_valid(machine)) {
+    exit(1);
+    }


Why restrict to the virt machine?



We tried x86 machines and the virt machine, but the issue isn't reproducible
on x86 machines. So I think it's a machine- or architecture-specific issue.
However, I believe RISC-V should have a similar issue because
linux/drivers/base/arch_topology.c is shared by ARM64 and RISC-V.
x86 doesn't use that driver to populate its CPU topology.


Oh, I haven't thought about the other archs, I meant this seem a generic
issue which affects all (ARM) machines, so why restrict to the (ARM)
virt machine?



[Ccing Igor for comments]

Well, virt machine is the only concern to us for now. You're right that all 
ARM64 and ARM machines
need this check and limitation. So the check needs to be done in the generic 
path. The best way
I can figure out is like something below. The idea is to introduce a switch to 
'struct NumaState'
and do the check in the generic path. The switch is turned on by individual 
machines. Please let me
know if you have better ideas

- Add 'bool struct NumaState::has_strict_socket_mapping', which is 'false' by 
default until
  machine specific initialization function calls helper 
set_numa_strict_socket_mapping(), for
  example in hw/arm/virt.c::virt_instance_init().

- In 

[PATCH v3 17/24] gdbstub: fix address type of gdb_set_cpu_pc

2023-02-21 Thread Alex Bennée
The underlying call uses vaddr and the comms API uses unsigned long
long which will always fit. We don't need to deal in target_ulong
here.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 
---
 gdbstub/gdbstub.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index e107aa065c..4814e8fbf3 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -535,7 +535,7 @@ static void gdb_process_breakpoint_remove_all(GDBProcess *p)
 }
 
 
-static void gdb_set_cpu_pc(target_ulong pc)
+static void gdb_set_cpu_pc(vaddr pc)
 {
 CPUState *cpu = gdbserver_state.c_cpu;
 
@@ -1289,7 +1289,7 @@ static void handle_file_io(GArray *params, void *user_ctx)
 static void handle_step(GArray *params, void *user_ctx)
 {
 if (params->len) {
-gdb_set_cpu_pc((target_ulong)get_param(params, 0)->val_ull);
+gdb_set_cpu_pc(get_param(params, 0)->val_ull);
 }
 
 cpu_single_step(gdbserver_state.c_cpu, gdbserver_state.sstep_flags);
-- 
2.39.1




[PATCH v3 20/24] gdbstub: move syscall handling to new file

2023-02-21 Thread Alex Bennée
Our GDB syscall support is the last chunk of code that needs target
specific support so move it to a new file. We take the opportunity to
move the syscall state into its own singleton instance and add in a
few helpers for the main gdbstub to interact with the module.

I also moved the gdb_exit() declaration into syscalls.h as it feels
pretty related and most of the callers of it treat it as such.

Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 
---
 gdbstub/internals.h|   8 +-
 include/exec/gdbstub.h | 102 -
 include/gdbstub/syscalls.h | 124 
 gdbstub/gdbstub.c  | 177 +-
 gdbstub/softmmu.c  |   7 +-
 gdbstub/syscalls.c | 227 +
 gdbstub/user.c |   1 +
 linux-user/exit.c  |   2 +-
 semihosting/arm-compat-semi.c  |   1 +
 semihosting/guestfd.c  |   2 +-
 semihosting/syscalls.c |   2 +-
 softmmu/runstate.c |   2 +-
 target/m68k/m68k-semi.c|   2 +-
 target/mips/tcg/sysemu/mips-semi.c |   2 +-
 target/nios2/nios2-semi.c  |   2 +-
 gdbstub/meson.build|   4 +
 16 files changed, 377 insertions(+), 288 deletions(-)
 create mode 100644 include/gdbstub/syscalls.h
 create mode 100644 gdbstub/syscalls.c

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 5f2e24c4f3..fe82facaeb 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -59,8 +59,6 @@ typedef struct GDBState {
 bool multiprocess;
 GDBProcess *processes;
 int process_num;
-char syscall_buf[256];
-gdb_syscall_complete_cb current_syscall_cb;
 GString *str_buf;
 GByteArray *mem_buf;
 int sstep_flags;
@@ -189,6 +187,12 @@ void gdb_handle_query_attached(GArray *params, void 
*user_ctx); /* both */
 void gdb_handle_query_qemu_phy_mem_mode(GArray *params, void *user_ctx);
 void gdb_handle_set_qemu_phy_mem_mode(GArray *params, void *user_ctx);
 
+/* syscall handling */
+void gdb_handle_file_io(GArray *params, void *user_ctx);
+bool gdb_handled_syscall(void);
+void gdb_disable_syscalls(void);
+void gdb_syscall_reset(void);
+
 /*
  * Break/Watch point support - there is an implementation for softmmu
  * and user mode.
diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index bb8a3928dd..7d743fe1e9 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -10,98 +10,6 @@
 #define GDB_WATCHPOINT_READ  3
 #define GDB_WATCHPOINT_ACCESS4
 
-/* For gdb file i/o remote protocol open flags. */
-#define GDB_O_RDONLY  0
-#define GDB_O_WRONLY  1
-#define GDB_O_RDWR2
-#define GDB_O_APPEND  8
-#define GDB_O_CREAT   0x200
-#define GDB_O_TRUNC   0x400
-#define GDB_O_EXCL0x800
-
-/* For gdb file i/o remote protocol errno values */
-#define GDB_EPERM   1
-#define GDB_ENOENT  2
-#define GDB_EINTR   4
-#define GDB_EBADF   9
-#define GDB_EACCES 13
-#define GDB_EFAULT 14
-#define GDB_EBUSY  16
-#define GDB_EEXIST 17
-#define GDB_ENODEV 19
-#define GDB_ENOTDIR20
-#define GDB_EISDIR 21
-#define GDB_EINVAL 22
-#define GDB_ENFILE 23
-#define GDB_EMFILE 24
-#define GDB_EFBIG  27
-#define GDB_ENOSPC 28
-#define GDB_ESPIPE 29
-#define GDB_EROFS  30
-#define GDB_ENAMETOOLONG   91
-#define GDB_EUNKNOWN   
-
-/* For gdb file i/o remote protocol lseek whence. */
-#define GDB_SEEK_SET  0
-#define GDB_SEEK_CUR  1
-#define GDB_SEEK_END  2
-
-/* For gdb file i/o stat/fstat. */
-typedef uint32_t gdb_mode_t;
-typedef uint32_t gdb_time_t;
-
-struct gdb_stat {
-  uint32_tgdb_st_dev; /* device */
-  uint32_tgdb_st_ino; /* inode */
-  gdb_mode_t  gdb_st_mode;/* protection */
-  uint32_tgdb_st_nlink;   /* number of hard links */
-  uint32_tgdb_st_uid; /* user ID of owner */
-  uint32_tgdb_st_gid; /* group ID of owner */
-  uint32_tgdb_st_rdev;/* device type (if inode device) */
-  uint64_tgdb_st_size;/* total size, in bytes */
-  uint64_tgdb_st_blksize; /* blocksize for filesystem I/O */
-  uint64_tgdb_st_blocks;  /* number of blocks allocated */
-  gdb_time_t  gdb_st_atime;   /* time of last access */
-  gdb_time_t  gdb_st_mtime;   /* time of last modification */
-  gdb_time_t  gdb_st_ctime;   /* time of last change */
-} QEMU_PACKED;
-
-struct gdb_timeval {
-  gdb_time_t tv_sec;  /* second */
-  uint64_t tv_usec;   /* microsecond */
-} QEMU_PACKED;
-
-typedef void (*gdb_syscall_complete_cb)(CPUState *cpu, uint64_t ret, int err);
-
-/**
- * gdb_do_syscall:
- * @cb: function to call when the system call has completed
- * @fmt: gdb syscall format string
- * ...: list of arguments to interpolate into @fmt
- *
- * Send a GDB syscall request. This function will return immediately;
- * the callback function will be called later when the 

[PATCH v3 16/24] gdbstub: specialise stub_can_reverse

2023-02-21 Thread Alex Bennée
Currently we only support replay for softmmu mode so it is a constant
false for user-mode.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 

---
v3
  - rename gdb_stub_can_revers -> gdb_can_reverse
---
 gdbstub/internals.h |  1 +
 gdbstub/gdbstub.c   | 13 ++---
 gdbstub/softmmu.c   |  5 +
 gdbstub/user.c  |  5 +
 4 files changed, 13 insertions(+), 11 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 90069a9415..5f2e24c4f3 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -128,6 +128,7 @@ CPUState *gdb_first_attached_cpu(void);
 void gdb_append_thread_id(CPUState *cpu, GString *buf);
 int gdb_get_cpu_index(CPUState *cpu);
 unsigned int gdb_get_max_cpus(void); /* both */
+bool gdb_can_reverse(void); /* softmmu, stub for user */
 
 void gdb_create_default_process(GDBState *s);
 
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index f9950200b8..e107aa065c 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -113,15 +113,6 @@ int use_gdb_syscalls(void)
 return gdb_syscall_mode == GDB_SYS_ENABLED;
 }
 
-static bool stub_can_reverse(void)
-{
-#ifdef CONFIG_USER_ONLY
-return false;
-#else
-return replay_mode == REPLAY_MODE_PLAY;
-#endif
-}
-
 /* writes 2*len+1 bytes in buf */
 void gdb_memtohex(GString *buf, const uint8_t *mem, int len)
 {
@@ -1307,7 +1298,7 @@ static void handle_step(GArray *params, void *user_ctx)
 
 static void handle_backward(GArray *params, void *user_ctx)
 {
-if (!stub_can_reverse()) {
+if (!gdb_can_reverse()) {
 gdb_put_packet("E22");
 }
 if (params->len == 1) {
@@ -1558,7 +1549,7 @@ static void handle_query_supported(GArray *params, void 
*user_ctx)
 g_string_append(gdbserver_state.str_buf, ";qXfer:features:read+");
 }
 
-if (stub_can_reverse()) {
+if (gdb_can_reverse()) {
 g_string_append(gdbserver_state.str_buf,
 ";ReverseStep+;ReverseContinue+");
 }
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index 65aa2018a7..5363ff066d 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -443,6 +443,11 @@ unsigned int gdb_get_max_cpus(void)
 return ms->smp.max_cpus;
 }
 
+bool gdb_can_reverse(void)
+{
+return replay_mode == REPLAY_MODE_PLAY;
+}
+
 /*
  * Softmmu specific command helpers
  */
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 15ff3ab08d..a716ad05b2 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -404,6 +404,11 @@ unsigned int gdb_get_max_cpus(void)
 return max_cpus;
 }
 
+/* replay not supported for user-mode */
+bool gdb_can_reverse(void)
+{
+return false;
+}
 
 /*
  * Break/Watch point helpers
-- 
2.39.1




[PATCH v3 18/24] gdbstub: don't use target_ulong while handling registers

2023-02-21 Thread Alex Bennée
This is a hangover from the original code. addr is misleading as it is
only really a register id. While len will never exceed
MAX_PACKET_LENGTH I've used size_t as that is what strlen returns.

Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 

---
v3
  - fix commit message
  - use unsigned for regid
---
 gdbstub/gdbstub.c | 17 ++---
 1 file changed, 10 insertions(+), 7 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 4814e8fbf3..7ae3ff52b3 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -1192,7 +1192,8 @@ static void handle_read_mem(GArray *params, void 
*user_ctx)
 
 static void handle_write_all_regs(GArray *params, void *user_ctx)
 {
-target_ulong addr, len;
+unsigned int reg_id;
+size_t len;
 uint8_t *registers;
 int reg_size;
 
@@ -1204,9 +1205,10 @@ static void handle_write_all_regs(GArray *params, void 
*user_ctx)
 len = strlen(get_param(params, 0)->data) / 2;
 gdb_hextomem(gdbserver_state.mem_buf, get_param(params, 0)->data, len);
 registers = gdbserver_state.mem_buf->data;
-for (addr = 0; addr < gdbserver_state.g_cpu->gdb_num_g_regs && len > 0;
- addr++) {
-reg_size = gdb_write_register(gdbserver_state.g_cpu, registers, addr);
+for (reg_id = 0;
+ reg_id < gdbserver_state.g_cpu->gdb_num_g_regs && len > 0;
+ reg_id++) {
+reg_size = gdb_write_register(gdbserver_state.g_cpu, registers, 
reg_id);
 len -= reg_size;
 registers += reg_size;
 }
@@ -1215,15 +1217,16 @@ static void handle_write_all_regs(GArray *params, void 
*user_ctx)
 
 static void handle_read_all_regs(GArray *params, void *user_ctx)
 {
-target_ulong addr, len;
+int reg_id;
+size_t len;
 
 cpu_synchronize_state(gdbserver_state.g_cpu);
 g_byte_array_set_size(gdbserver_state.mem_buf, 0);
 len = 0;
-for (addr = 0; addr < gdbserver_state.g_cpu->gdb_num_g_regs; addr++) {
+for (reg_id = 0; reg_id < gdbserver_state.g_cpu->gdb_num_g_regs; reg_id++) 
{
 len += gdb_read_register(gdbserver_state.g_cpu,
  gdbserver_state.mem_buf,
- addr);
+ reg_id);
 }
 g_assert(len == gdbserver_state.mem_buf->len);
 
-- 
2.39.1




[PATCH v3 14/24] gdbstub: specialise target_memory_rw_debug

2023-02-21 Thread Alex Bennée
The two implementations are different enough to encourage having a
specialisation and we can move some of the softmmu only stuff out of
gdbstub.

Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 
---
 gdbstub/internals.h | 19 
 gdbstub/gdbstub.c   | 73 +++--
 gdbstub/softmmu.c   | 51 +++
 gdbstub/user.c  | 15 ++
 4 files changed, 96 insertions(+), 62 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 3875c6877e..9de995ce3a 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -183,6 +183,10 @@ void gdb_handle_query_xfer_auxv(GArray *params, void 
*user_ctx); /*user */
 
 void gdb_handle_query_attached(GArray *params, void *user_ctx); /* both */
 
+/* softmmu only */
+void gdb_handle_query_qemu_phy_mem_mode(GArray *params, void *user_ctx);
+void gdb_handle_set_qemu_phy_mem_mode(GArray *params, void *user_ctx);
+
 /*
  * Break/Watch point support - there is an implementation for softmmu
  * and user mode.
@@ -192,4 +196,19 @@ int gdb_breakpoint_insert(CPUState *cs, int type, hwaddr 
addr, hwaddr len);
 int gdb_breakpoint_remove(CPUState *cs, int type, hwaddr addr, hwaddr len);
 void gdb_breakpoint_remove_all(CPUState *cs);
 
+/**
+ * gdb_target_memory_rw_debug() - handle debug access to memory
+ * @cs: CPUState
+ * @addr: nominal address, could be an entire physical address
+ * @buf: data
+ * @len: length of access
+ * @is_write: is it a write operation
+ *
+ * This function is specialised depending on the mode we are running
+ * in. For softmmu guests we can switch the interpretation of the
+ * address to a physical address.
+ */
+int gdb_target_memory_rw_debug(CPUState *cs, hwaddr addr,
+   uint8_t *buf, int len, bool is_write);
+
 #endif /* GDBSTUB_INTERNALS_H */
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 0d90685c72..91021859a1 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -46,33 +46,6 @@
 
 #include "internals.h"
 
-#ifndef CONFIG_USER_ONLY
-static int phy_memory_mode;
-#endif
-
-static inline int target_memory_rw_debug(CPUState *cpu, target_ulong addr,
- uint8_t *buf, int len, bool is_write)
-{
-CPUClass *cc;
-
-#ifndef CONFIG_USER_ONLY
-if (phy_memory_mode) {
-if (is_write) {
-cpu_physical_memory_write(addr, buf, len);
-} else {
-cpu_physical_memory_read(addr, buf, len);
-}
-return 0;
-}
-#endif
-
-cc = CPU_GET_CLASS(cpu);
-if (cc->memory_rw_debug) {
-return cc->memory_rw_debug(cpu, addr, buf, len, is_write);
-}
-return cpu_memory_rw_debug(cpu, addr, buf, len, is_write);
-}
-
 typedef struct GDBRegisterState {
 int base_reg;
 int num_regs;
@@ -1194,11 +1167,11 @@ static void handle_write_mem(GArray *params, void 
*user_ctx)
 }
 
 gdb_hextomem(gdbserver_state.mem_buf, get_param(params, 2)->data,
- get_param(params, 1)->val_ull);
-if (target_memory_rw_debug(gdbserver_state.g_cpu,
-   get_param(params, 0)->val_ull,
-   gdbserver_state.mem_buf->data,
-   gdbserver_state.mem_buf->len, true)) {
+ get_param(params, 1)->val_ull);
+if (gdb_target_memory_rw_debug(gdbserver_state.g_cpu,
+   get_param(params, 0)->val_ull,
+   gdbserver_state.mem_buf->data,
+   gdbserver_state.mem_buf->len, true)) {
 gdb_put_packet("E14");
 return;
 }
@@ -1222,10 +1195,10 @@ static void handle_read_mem(GArray *params, void 
*user_ctx)
 g_byte_array_set_size(gdbserver_state.mem_buf,
   get_param(params, 1)->val_ull);
 
-if (target_memory_rw_debug(gdbserver_state.g_cpu,
-   get_param(params, 0)->val_ull,
-   gdbserver_state.mem_buf->data,
-   gdbserver_state.mem_buf->len, false)) {
+if (gdb_target_memory_rw_debug(gdbserver_state.g_cpu,
+   get_param(params, 0)->val_ull,
+   gdbserver_state.mem_buf->data,
+   gdbserver_state.mem_buf->len, false)) {
 gdb_put_packet("E14");
 return;
 }
@@ -1675,30 +1648,6 @@ static void handle_query_qemu_supported(GArray *params, 
void *user_ctx)
 gdb_put_strbuf();
 }
 
-#ifndef CONFIG_USER_ONLY
-static void handle_query_qemu_phy_mem_mode(GArray *params,
-   void *user_ctx)
-{
-g_string_printf(gdbserver_state.str_buf, "%d", phy_memory_mode);
-gdb_put_strbuf();
-}
-
-static void handle_set_qemu_phy_mem_mode(GArray *params, void *user_ctx)
-{
-if (!params->len) {
-gdb_put_packet("E22");
-return;
-}
-
-if (!get_param(params, 

[PATCH v3 19/24] gdbstub: move register helpers into standalone include

2023-02-21 Thread Alex Bennée
These inline helpers are all used by target specific code so move them
out of the general header so we don't needlessly pollute the rest of
the API with target specific stuff.

Note we have to include cpu.h in semihosting as it was relying on a
side effect before.

Reviewed-by: Taylor Simpson 
Reviewed-by: Philippe Mathieu-Daudé 
Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 

---
v3
  - update xtensa's import-core script as well
---
 include/exec/gdbstub.h |  86 -
 include/gdbstub/helpers.h  | 103 +
 semihosting/syscalls.c |   1 +
 target/alpha/gdbstub.c |   2 +-
 target/arm/gdbstub.c   |   1 +
 target/arm/gdbstub64.c |   2 +-
 target/arm/helper-a64.c|   2 +-
 target/arm/m_helper.c  |   1 +
 target/avr/gdbstub.c   |   2 +-
 target/cris/gdbstub.c  |   2 +-
 target/hexagon/gdbstub.c   |   2 +-
 target/hppa/gdbstub.c  |   2 +-
 target/i386/gdbstub.c  |   2 +-
 target/i386/whpx/whpx-all.c|   2 +-
 target/loongarch/gdbstub.c |   1 +
 target/m68k/gdbstub.c  |   2 +-
 target/m68k/helper.c   |   1 +
 target/m68k/m68k-semi.c|   1 +
 target/microblaze/gdbstub.c|   2 +-
 target/mips/gdbstub.c  |   2 +-
 target/mips/tcg/sysemu/mips-semi.c |   1 +
 target/nios2/cpu.c |   2 +-
 target/nios2/nios2-semi.c  |   1 +
 target/openrisc/gdbstub.c  |   2 +-
 target/openrisc/interrupt.c|   2 +-
 target/openrisc/mmu.c  |   2 +-
 target/ppc/cpu_init.c  |   2 +-
 target/ppc/gdbstub.c   |   1 +
 target/riscv/gdbstub.c |   1 +
 target/rx/gdbstub.c|   2 +-
 target/s390x/gdbstub.c |   1 +
 target/s390x/helper.c  |   2 +-
 target/sh4/gdbstub.c   |   2 +-
 target/sparc/gdbstub.c |   2 +-
 target/tricore/gdbstub.c   |   2 +-
 target/xtensa/core-dc232b.c|   2 +-
 target/xtensa/core-dc233c.c|   2 +-
 target/xtensa/core-de212.c |   2 +-
 target/xtensa/core-de233_fpu.c |   2 +-
 target/xtensa/core-dsp3400.c   |   2 +-
 target/xtensa/core-fsf.c   |   2 +-
 target/xtensa/core-lx106.c |   2 +-
 target/xtensa/core-sample_controller.c |   2 +-
 target/xtensa/core-test_kc705_be.c |   2 +-
 target/xtensa/core-test_mmuhifi_c3.c   |   2 +-
 target/xtensa/gdbstub.c|   2 +-
 target/xtensa/helper.c |   2 +-
 target/xtensa/import_core.sh   |   2 +-
 48 files changed, 149 insertions(+), 121 deletions(-)
 create mode 100644 include/gdbstub/helpers.h

diff --git a/include/exec/gdbstub.h b/include/exec/gdbstub.h
index 8fff5450ed..bb8a3928dd 100644
--- a/include/exec/gdbstub.h
+++ b/include/exec/gdbstub.h
@@ -110,92 +110,6 @@ void gdb_register_coprocessor(CPUState *cpu,
   gdb_get_reg_cb get_reg, gdb_set_reg_cb set_reg,
   int num_regs, const char *xml, int g_pos);
 
-#ifdef NEED_CPU_H
-#include "cpu.h"
-
-/*
- * The GDB remote protocol transfers values in target byte order. As
- * the gdbstub may be batching up several register values we always
- * append to the array.
- */
-
-static inline int gdb_get_reg8(GByteArray *buf, uint8_t val)
-{
-g_byte_array_append(buf, &val, 1);
-return 1;
-}
-
-static inline int gdb_get_reg16(GByteArray *buf, uint16_t val)
-{
-uint16_t to_word = tswap16(val);
-g_byte_array_append(buf, (uint8_t *) &to_word, 2);
-return 2;
-}
-
-static inline int gdb_get_reg32(GByteArray *buf, uint32_t val)
-{
-uint32_t to_long = tswap32(val);
-g_byte_array_append(buf, (uint8_t *) &to_long, 4);
-return 4;
-}
-
-static inline int gdb_get_reg64(GByteArray *buf, uint64_t val)
-{
-uint64_t to_quad = tswap64(val);
-g_byte_array_append(buf, (uint8_t *) &to_quad, 8);
-return 8;
-}
-
-static inline int gdb_get_reg128(GByteArray *buf, uint64_t val_hi,
- uint64_t val_lo)
-{
-uint64_t to_quad;
-#if TARGET_BIG_ENDIAN
-to_quad = tswap64(val_hi);
-g_byte_array_append(buf, (uint8_t *) &to_quad, 8);
-to_quad = tswap64(val_lo);
-g_byte_array_append(buf, (uint8_t *) &to_quad, 8);
-#else
-to_quad = tswap64(val_lo);
-g_byte_array_append(buf, (uint8_t *) &to_quad, 8);
-to_quad = tswap64(val_hi);
-g_byte_array_append(buf, (uint8_t *) &to_quad, 8);
-#endif
-return 16;
-}
-
-static inline int gdb_get_zeroes(GByteArray *array, size_t len)
-{
-guint oldlen = array->len;
-g_byte_array_set_size(array, oldlen + len);
-memset(array->data + oldlen, 0, len);
-
-return len;
-}
-
-/**
- * gdb_get_reg_ptr: get pointer to start of last 

[PATCH v3 21/24] gdbstub: only compile gdbstub twice for whole build

2023-02-21 Thread Alex Bennée
Now we have removed any target specific bits from the core gdbstub
code we only need to build it twice. We have to jump a few meson hoops
to manually define the CONFIG_USER_ONLY symbol but it seems to work.

Signed-off-by: Alex Bennée 

---
v3
  - also include user and softmmu bits in the library
---
 gdbstub/gdbstub.c |  3 +--
 gdbstub/user-target.c |  2 +-
 gdbstub/meson.build   | 32 ++--
 3 files changed, 28 insertions(+), 9 deletions(-)

diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index a8b321710c..791cb79bf6 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -39,7 +39,6 @@
 
 #include "sysemu/hw_accel.h"
 #include "sysemu/runstate.h"
-#include "exec/exec-all.h"
 #include "exec/tb-flush.h"
 #include "exec/hwaddr.h"
 #include "sysemu/replay.h"
@@ -1611,7 +1610,7 @@ static const GdbCmdParseEntry gdb_gen_query_table[] = {
 .cmd_startswith = 1,
 .schema = "s:l,l0"
 },
-#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX_USER)
+#if defined(CONFIG_USER_ONLY) && defined(CONFIG_LINUX)
 {
 .handler = gdb_handle_query_xfer_auxv,
 .cmd = "Xfer:auxv:read::",
diff --git a/gdbstub/user-target.c b/gdbstub/user-target.c
index e1d9650c1e..a0c10d6e12 100644
--- a/gdbstub/user-target.c
+++ b/gdbstub/user-target.c
@@ -233,7 +233,7 @@ static inline int target_memory_rw_debug(CPUState *cpu, 
target_ulong addr,
 }
 
 
-#if defined(CONFIG_LINUX_USER)
+#if defined(CONFIG_LINUX)
 void gdb_handle_query_xfer_auxv(GArray *params, void *user_ctx)
 {
 TaskState *ts;
diff --git a/gdbstub/meson.build b/gdbstub/meson.build
index 56c40c25ef..6abf067afc 100644
--- a/gdbstub/meson.build
+++ b/gdbstub/meson.build
@@ -4,13 +4,33 @@
 # types such as hwaddr.
 #
 
-specific_ss.add(files('gdbstub.c'))
+# We need to build the core gdb code via a library to be able to tweak
+# cflags so:
+
+gdb_user_ss = ss.source_set()
+gdb_softmmu_ss = ss.source_set()
+
+# We build two versions of gdbstub, one for each mode
+gdb_user_ss.add(files('gdbstub.c', 'user.c'))
+gdb_softmmu_ss.add(files('gdbstub.c', 'softmmu.c'))
+
+gdb_user_ss = gdb_user_ss.apply(config_host, strict: false)
+gdb_softmmu_ss = gdb_softmmu_ss.apply(config_host, strict: false)
+
+libgdb_user = static_library('gdb_user', gdb_user_ss.sources(),
+ name_suffix: 'fa',
+ c_args: '-DCONFIG_USER_ONLY')
+
+libgdb_softmmu = static_library('gdb_softmmu', gdb_softmmu_ss.sources(),
+name_suffix: 'fa')
+
+gdb_user = declare_dependency(link_whole: libgdb_user)
+user_ss.add(gdb_user)
+gdb_softmmu = declare_dependency(link_whole: libgdb_softmmu)
+softmmu_ss.add(gdb_softmmu)
 
 # These have to built to the target ABI
 specific_ss.add(files('syscalls.c'))
 
-softmmu_ss.add(files('softmmu.c'))
-user_ss.add(files('user.c'))
-
-# and BSD?
-specific_ss.add(when: 'CONFIG_LINUX_USER', if_true: files('user-target.c'))
+# The user-target is specialised by the guest
+specific_ss.add(when: 'CONFIG_USER_ONLY', if_true: files('user-target.c'))
-- 
2.39.1




[PATCH v3 22/24] testing: probe gdb for supported architectures ahead of time

2023-02-21 Thread Alex Bennée
Currently when we encounter a gdb that is old or not built with
multiarch in mind we fail rather messily. Try and improve the
situation by probing ahead of time and setting
HOST_GDB_SUPPORTS_ARCH=y in the relevant tcg configs. We can then skip
and give a more meaningful message if we don't run the test.

[AJB: we still miss some arches; for example gdb uses s390, which
fails when we look for s390x. Not sure what the best way to deal with
that is; maybe define a gdb_arch as we probe each target?]

Signed-off-by: Alex Bennée 
Cc: Richard Henderson 
Cc: Paolo Bonzini 
---
 configure |  8 +
 scripts/probe-gdb-support.sh  | 36 +++
 tests/tcg/aarch64/Makefile.target |  2 +-
 tests/tcg/multiarch/Makefile.target   |  5 +++
 .../multiarch/system/Makefile.softmmu-target  |  6 +++-
 tests/tcg/s390x/Makefile.target   |  2 +-
 6 files changed, 56 insertions(+), 3 deletions(-)
 create mode 100755 scripts/probe-gdb-support.sh

diff --git a/configure b/configure
index cf6db3d551..366a1d8dd2 100755
--- a/configure
+++ b/configure
@@ -226,6 +226,7 @@ stack_protector=""
 safe_stack=""
 use_containers="yes"
 gdb_bin=$(command -v "gdb-multiarch" || command -v "gdb")
+gdb_arches=""
 
 if test -e "$source_path/.git"
 then
@@ -2344,6 +2345,7 @@ if test -n "$gdb_bin"; then
 gdb_version=$($gdb_bin --version | head -n 1)
 if version_ge ${gdb_version##* } 9.1; then
 echo "HAVE_GDB_BIN=$gdb_bin" >> $config_host_mak
+gdb_arches=$("$source_path/scripts/probe-gdb-support.sh" $gdb_bin)
 else
 gdb_bin=""
 fi
@@ -2467,6 +2469,12 @@ for target in $target_list; do
   write_target_makefile "build-tcg-tests-$target" >> "$config_target_mak"
   echo "BUILD_STATIC=$build_static" >> "$config_target_mak"
   echo "QEMU=$PWD/$qemu" >> "$config_target_mak"
+
+  # will GDB work with these binaries?
+  if test "${gdb_arches#*$arch}" != "$gdb_arches"; then
+  echo "HOST_GDB_SUPPORTS_ARCH=y" >> "$config_target_mak"
+  fi
+
   echo "run-tcg-tests-$target: $qemu\$(EXESUF)" >> Makefile.prereqs
   tcg_tests_targets="$tcg_tests_targets $target"
   fi
diff --git a/scripts/probe-gdb-support.sh b/scripts/probe-gdb-support.sh
new file mode 100755
index 00..2b09a00a5b
--- /dev/null
+++ b/scripts/probe-gdb-support.sh
@@ -0,0 +1,36 @@
+#!/bin/sh
+
+# Probe gdb for supported architectures.
+#
+# This is required to support testing of the gdbstub, as it's hard to
+# handle errors gracefully during the test. Instead this script, when
+# passed a GDB binary, will probe its architecture support and return a
+# string of supported arches, stripped of guff.
+#
+# Copyright 2023 Linaro Ltd
+#
+# Author: Alex Bennée 
+#
+# This work is licensed under the terms of the GNU GPL, version 2 or later.
+# See the COPYING file in the top-level directory.
+#
+# SPDX-License-Identifier: GPL-2.0-or-later
+
+if test -z "$1"; then
+  echo "Usage: $0 /path/to/gdb"
+  exit 1
+fi
+
+# Start gdb with a set-architecture and capture the set of valid
+# options.
+
+valid_args=$($1 -ex "set architecture" -ex "quit" 2>&1 >/dev/null)
+
+# Strip off the preamble
+raw_arches=$(echo "${valid_args}" | sed "s/.*Valid arguments are \(.*\)/\1/")
+
+# Split into lines, strip everything after :foo and return final
+# "clean" list of supported arches.
+final_arches=$(echo "${raw_arches}" | tr ', ' '\n' | sed "s/:.*//" | sort | uniq)
+
+echo "$final_arches"
diff --git a/tests/tcg/aarch64/Makefile.target b/tests/tcg/aarch64/Makefile.target
index db122ab4ff..9e91a20b0d 100644
--- a/tests/tcg/aarch64/Makefile.target
+++ b/tests/tcg/aarch64/Makefile.target
@@ -81,7 +81,7 @@ sha512-vector: sha512.c
 
 TESTS += sha512-vector
 
-ifneq ($(HAVE_GDB_BIN),)
+ifeq ($(HOST_GDB_SUPPORTS_ARCH),y)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
 run-gdbstub-sysregs: sysregs
diff --git a/tests/tcg/multiarch/Makefile.target b/tests/tcg/multiarch/Makefile.target
index ae8b3d7268..373db69648 100644
--- a/tests/tcg/multiarch/Makefile.target
+++ b/tests/tcg/multiarch/Makefile.target
@@ -64,6 +64,7 @@ run-test-mmap-%: test-mmap
$(call run-test, test-mmap-$*, $(QEMU) -p $* $<, $< ($* byte pages))
 
 ifneq ($(HAVE_GDB_BIN),)
+ifeq ($(HOST_GDB_SUPPORTS_ARCH),y)
 GDB_SCRIPT=$(SRC_PATH)/tests/guest-debug/run-test.py
 
 run-gdbstub-sha1: sha1
@@ -87,6 +88,10 @@ run-gdbstub-thread-breakpoint: testthread
--bin $< --test $(MULTIARCH_SRC)/gdbstub/test-thread-breakpoint.py, \
hitting a breakpoint on non-main thread)
 
+else
+run-gdbstub-%:
+   $(call skip-test, "gdbstub test $*", "no guest arch support")
+endif
 else
 run-gdbstub-%:
$(call skip-test, "gdbstub test $*", "need working gdb")
diff --git a/tests/tcg/multiarch/system/Makefile.softmmu-target b/tests/tcg/multiarch/system/Makefile.softmmu-target
index 368b64d531..5f432c95f3 100644
--- a/tests/tcg/multiarch/system/Makefile.softmmu-target
+++ b/tests/tcg/multiarch/system/Makefile.softmmu-target

[PATCH v3 12/24] gdbstub: abstract target specific details from gdb_put_packet_binary

2023-02-21 Thread Alex Bennée
We unfortunately handle the checking of packet acknowledgement
differently for user and softmmu modes. Abstract the user-mode
handling behind gdb_got_immediate_ack, with a stub for softmmu.

Reviewed-by: Richard Henderson 
Reviewed-by: Philippe Mathieu-Daudé 
Signed-off-by: Alex Bennée 
---
 gdbstub/internals.h | 15 +++
 gdbstub/gdbstub.c   | 10 ++
 gdbstub/softmmu.c   |  8 
 gdbstub/user.c  | 19 +++
 4 files changed, 44 insertions(+), 8 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index a3dd629ee8..3912d0de38 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -108,6 +108,21 @@ void gdb_memtohex(GString *buf, const uint8_t *mem, int len);
 void gdb_memtox(GString *buf, const char *mem, int len);
 void gdb_read_byte(uint8_t ch);
 
+/*
+ * Packet acknowledgement - we handle this slightly differently
+ * between user and softmmu mode, mainly to deal with the differences
+ * between the flexible chardev and the direct fd approaches.
+ *
+ * We currently don't support a negotiated QStartNoAckMode
+ */
+
+/**
+ * gdb_got_immediate_ack() - check ok to continue
+ *
+ * Returns true to continue, false to re-transmit for user only, the
+ * softmmu stub always returns true.
+ */
+bool gdb_got_immediate_ack(void);
 /* utility helpers */
 CPUState *gdb_first_attached_cpu(void);
 void gdb_append_thread_id(CPUState *cpu, GString *buf);
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 4bf99783a6..76c24b7cb6 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -239,15 +239,9 @@ int gdb_put_packet_binary(const char *buf, int len, bool dump)
 gdb_put_buffer(gdbserver_state.last_packet->data,
gdbserver_state.last_packet->len);
 
-#ifdef CONFIG_USER_ONLY
-i = gdb_get_char();
-if (i < 0)
-return -1;
-if (i == '+')
+if (gdb_got_immediate_ack()) {
 break;
-#else
-break;
-#endif
+}
 }
 return 0;
 }
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index 79674b8bea..0232e62ea4 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -55,6 +55,14 @@ int gdb_get_cpu_index(CPUState *cpu)
 return cpu->cpu_index + 1;
 }
 
+/*
+ * We check the status of the last message in the chardev receive code
+ */
+bool gdb_got_immediate_ack(void)
+{
+return true;
+}
+
 /*
  * GDB Connection management. For system emulation we do all of this
  * via our existing Chardev infrastructure which allows us to support
diff --git a/gdbstub/user.c b/gdbstub/user.c
index 1c9e070e57..33a0701cea 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -54,6 +54,25 @@ int gdb_get_char(void)
 return ch;
 }
 
+bool gdb_got_immediate_ack(void)
+{
+int i;
+
+i = gdb_get_char();
+if (i < 0) {
+/* no response, continue anyway */
+return true;
+}
+
+if (i == '+') {
+/* received correctly, continue */
+return true;
+}
+
+/* anything else, including '-' then try again */
+return false;
+}
+
 void gdb_put_buffer(const uint8_t *buf, int len)
 {
 int ret;
-- 
2.39.1




[PATCH v3 23/24] include: split target_long definition from cpu-defs

2023-02-21 Thread Alex Bennée
While we will continue to include this via cpu-defs, it is useful to
be able to define it separately for 32- and 64-bit versions of an
otherwise target-independent compilation unit.
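
For illustration only, a hypothetical compilation unit that consumes the
new header directly; it assumes the build system passes
-DTARGET_LONG_BITS=32 or =64 on the command line, and the function name
and printf usage are invented for the example:

    #include <stdio.h>
    #include <inttypes.h>
    #include "exec/target_long.h"   /* needs TARGET_LONG_BITS predefined */

    /* prints a guest virtual address at the width of this build */
    static void dump_vaddr(target_ulong addr)
    {
        printf("vaddr=0x" TARGET_FMT_lx "\n", addr);
    }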

Signed-off-by: Alex Bennée 
---
 include/exec/cpu-defs.h| 19 +
 include/exec/target_long.h | 42 ++
 2 files changed, 43 insertions(+), 18 deletions(-)
 create mode 100644 include/exec/target_long.h

diff --git a/include/exec/cpu-defs.h b/include/exec/cpu-defs.h
index 21309cf567..98605dfba2 100644
--- a/include/exec/cpu-defs.h
+++ b/include/exec/cpu-defs.h
@@ -58,24 +58,7 @@
 # define TARGET_TB_PCREL 0
 #endif
 
-#define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8)
-
-/* target_ulong is the type of a virtual address */
-#if TARGET_LONG_SIZE == 4
-typedef int32_t target_long;
-typedef uint32_t target_ulong;
-#define TARGET_FMT_lx "%08x"
-#define TARGET_FMT_ld "%d"
-#define TARGET_FMT_lu "%u"
-#elif TARGET_LONG_SIZE == 8
-typedef int64_t target_long;
-typedef uint64_t target_ulong;
-#define TARGET_FMT_lx "%016" PRIx64
-#define TARGET_FMT_ld "%" PRId64
-#define TARGET_FMT_lu "%" PRIu64
-#else
-#error TARGET_LONG_SIZE undefined
-#endif
+#include "exec/target_long.h"
 
 #if !defined(CONFIG_USER_ONLY) && defined(CONFIG_TCG)
 
diff --git a/include/exec/target_long.h b/include/exec/target_long.h
new file mode 100644
index 00..93c9472971
--- /dev/null
+++ b/include/exec/target_long.h
@@ -0,0 +1,42 @@
+/*
+ * Target Long Definitions
+ *
+ * Copyright (c) 2003 Fabrice Bellard
+ * Copyright (c) 2023 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef _TARGET_LONG_H_
+#define _TARGET_LONG_H_
+
+/*
+ * Usually this should only be included via cpu-defs.h however for
+ * certain cases where we want to build only two versions of a binary
+ * object we can include directly. However the build-system must
+ * ensure TARGET_LONG_BITS is defined directly.
+ */
+#ifndef TARGET_LONG_BITS
+#error TARGET_LONG_BITS not defined
+#endif
+
+#define TARGET_LONG_SIZE (TARGET_LONG_BITS / 8)
+
+/* target_ulong is the type of a virtual address */
+#if TARGET_LONG_SIZE == 4
+typedef int32_t target_long;
+typedef uint32_t target_ulong;
+#define TARGET_FMT_lx "%08x"
+#define TARGET_FMT_ld "%d"
+#define TARGET_FMT_lu "%u"
+#elif TARGET_LONG_SIZE == 8
+typedef int64_t target_long;
+typedef uint64_t target_ulong;
+#define TARGET_FMT_lx "%016" PRIx64
+#define TARGET_FMT_ld "%" PRId64
+#define TARGET_FMT_lu "%" PRIu64
+#else
+#error TARGET_LONG_SIZE undefined
+#endif
+
+#endif /* _TARGET_LONG_H_ */
-- 
2.39.1




[PATCH v3 11/24] gdbstub: rationalise signal mapping in softmmu

2023-02-21 Thread Alex Bennée
We don't really need a table for mapping two symbols.

Signed-off-by: Alex Bennée 
Suggested-by: Richard Henderson 
---
 gdbstub/softmmu.c | 19 +++
 1 file changed, 7 insertions(+), 12 deletions(-)

diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index 864ecee38f..79674b8bea 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -499,21 +499,16 @@ enum {
 TARGET_SIGTRAP = 5
 };
 
-static int gdb_signal_table[] = {
--1,
--1,
-TARGET_SIGINT,
--1,
--1,
-TARGET_SIGTRAP
-};
-
 int gdb_signal_to_target (int sig)
 {
-if (sig < ARRAY_SIZE (gdb_signal_table))
-return gdb_signal_table[sig];
-else
+switch (sig) {
+case 2:
+return TARGET_SIGINT;
+case 5:
+return TARGET_SIGTRAP;
+default:
 return -1;
+}
 }
 
 /*
-- 
2.39.1




[PATCH v3 15/24] gdbstub: introduce gdb_get_max_cpus

2023-02-21 Thread Alex Bennée
This is needed for handling vcont packets, as the way of calculating
the maximum number of CPUs changes between user and softmmu mode.

Reviewed-by: Richard Henderson 
Signed-off-by: Alex Bennée 

---
v3
  - rm out of date comment
---
 gdbstub/internals.h |  1 +
 gdbstub/gdbstub.c   | 11 +--
 gdbstub/softmmu.c   |  9 +
 gdbstub/user.c  | 17 +
 4 files changed, 28 insertions(+), 10 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index 9de995ce3a..90069a9415 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -127,6 +127,7 @@ bool gdb_got_immediate_ack(void);
 CPUState *gdb_first_attached_cpu(void);
 void gdb_append_thread_id(CPUState *cpu, GString *buf);
 int gdb_get_cpu_index(CPUState *cpu);
+unsigned int gdb_get_max_cpus(void); /* both */
 
 void gdb_create_default_process(GDBState *s);
 
diff --git a/gdbstub/gdbstub.c b/gdbstub/gdbstub.c
index 91021859a1..f9950200b8 100644
--- a/gdbstub/gdbstub.c
+++ b/gdbstub/gdbstub.c
@@ -624,16 +624,7 @@ static int gdb_handle_vcont(const char *p)
 GDBProcess *process;
 CPUState *cpu;
 GDBThreadIdKind kind;
-#ifdef CONFIG_USER_ONLY
-int max_cpus = 1; /* global variable max_cpus exists only in system mode */
-
-CPU_FOREACH(cpu) {
-max_cpus = max_cpus <= cpu->cpu_index ? cpu->cpu_index + 1 : max_cpus;
-}
-#else
-MachineState *ms = MACHINE(qdev_get_machine());
-unsigned int max_cpus = ms->smp.max_cpus;
-#endif
+unsigned int max_cpus = gdb_get_max_cpus();
 /* uninitialised CPUs stay 0 */
 newstates = g_new0(char, max_cpus);
 
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index d9b9ba0a32..65aa2018a7 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -433,6 +433,15 @@ int gdb_target_memory_rw_debug(CPUState *cpu, hwaddr addr,
 return cpu_memory_rw_debug(cpu, addr, buf, len, is_write);
 }
 
+/*
+ * cpu helpers
+ */
+
+unsigned int gdb_get_max_cpus(void)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+return ms->smp.max_cpus;
+}
 
 /*
  * Softmmu specific command helpers
diff --git a/gdbstub/user.c b/gdbstub/user.c
index b956c0e297..15ff3ab08d 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -388,6 +388,23 @@ int gdb_target_memory_rw_debug(CPUState *cpu, hwaddr addr,
 return cpu_memory_rw_debug(cpu, addr, buf, len, is_write);
 }
 
+/*
+ * cpu helpers
+ */
+
+unsigned int gdb_get_max_cpus(void)
+{
+CPUState *cpu;
+unsigned int max_cpus = 1;
+
+CPU_FOREACH(cpu) {
+max_cpus = max_cpus <= cpu->cpu_index ? cpu->cpu_index + 1 : max_cpus;
+}
+
+return max_cpus;
+}
+
+
 /*
  * Break/Watch point helpers
  */
-- 
2.39.1




[PATCH v3 24/24] gdbstub: split out softmmu/user specifics for syscall handling

2023-02-21 Thread Alex Bennée
Most of the syscall code is config-agnostic aside from the size of
target_ulong. In preparation for the next patch, move the final bits
of specialisation into the appropriate user and softmmu helpers.
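
For context (not part of this patch), a hedged sketch of a typical caller;
the helper name and arguments are invented, but the format-string
convention matches the gdb_do_syscall() interface refactored below:

    /* ask the attached gdb to perform a host write() for the guest;
     * 'cb' runs when the "here are the results" F packet comes back */
    static void example_host_write(gdb_syscall_complete_cb cb,
                                   target_ulong fd, target_ulong addr,
                                   target_ulong len)
    {
        gdb_do_syscall(cb, "write,%x,%x,%x", fd, addr, len);
    }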

Signed-off-by: Alex Bennée 
---
 gdbstub/internals.h |  5 +
 gdbstub/softmmu.c   | 24 
 gdbstub/syscalls.c  | 32 +++-
 gdbstub/user.c  | 24 
 4 files changed, 64 insertions(+), 21 deletions(-)

diff --git a/gdbstub/internals.h b/gdbstub/internals.h
index fe82facaeb..dce7c4f66f 100644
--- a/gdbstub/internals.h
+++ b/gdbstub/internals.h
@@ -193,6 +193,11 @@ bool gdb_handled_syscall(void);
 void gdb_disable_syscalls(void);
 void gdb_syscall_reset(void);
 
+/* user/softmmu specific signal handling */
+void gdb_pre_syscall_handling(void);
+bool gdb_send_syscall_now(void);
+void gdb_post_syscall_handling(void);
+
 /*
  * Break/Watch point support - there is an implementation for softmmu
  * and user mode.
diff --git a/gdbstub/softmmu.c b/gdbstub/softmmu.c
index b7e3829ca0..8f3c8ef449 100644
--- a/gdbstub/softmmu.c
+++ b/gdbstub/softmmu.c
@@ -101,6 +101,30 @@ static void gdb_chr_event(void *opaque, QEMUChrEvent event)
 }
 }
 
+/*
+ * In softmmu mode we stop the VM and wait to send the syscall packet
+ * until notification that the CPU has stopped. This must be done
+ * because if the packet is sent now the reply from the syscall
+ * request could be received while the CPU is still in the running
+ * state, which can cause packets to be dropped and state transition
+ * 'T' packets to be sent while the syscall is still being processed.
+ */
+
+void gdb_pre_syscall_handling(void)
+{
+vm_stop(RUN_STATE_DEBUG);
+}
+
+bool gdb_send_syscall_now(void)
+{
+return false;
+}
+
+void gdb_post_syscall_handling(void)
+{
+qemu_cpu_kick(gdbserver_state.c_cpu);
+}
+
 static void gdb_vm_state_change(void *opaque, bool running, RunState state)
 {
 CPUState *cpu = gdbserver_state.c_cpu;
diff --git a/gdbstub/syscalls.c b/gdbstub/syscalls.c
index 1b63a1d197..24eee38136 100644
--- a/gdbstub/syscalls.c
+++ b/gdbstub/syscalls.c
@@ -102,9 +102,10 @@ void gdb_do_syscallv(gdb_syscall_complete_cb cb, const char *fmt, va_list va)
 }
 
 gdbserver_syscall_state.current_syscall_cb = cb;
-#ifndef CONFIG_USER_ONLY
-vm_stop(RUN_STATE_DEBUG);
-#endif
+
+/* user/softmmu specific handling */
+gdb_pre_syscall_handling();
+
 p = &gdbserver_syscall_state.syscall_buf[0];
 p_end = &gdbserver_syscall_state.syscall_buf[sizeof(gdbserver_syscall_state.syscall_buf)];
 *(p++) = 'F';
@@ -138,24 +139,13 @@ void gdb_do_syscallv(gdb_syscall_complete_cb cb, const char *fmt, va_list va)
 }
 }
 *p = 0;
-#ifdef CONFIG_USER_ONLY
-gdb_put_packet(gdbserver_syscall_state.syscall_buf);
-/* Return control to gdb for it to process the syscall request.
- * Since the protocol requires that gdb hands control back to us
- * using a "here are the results" F packet, we don't need to check
- * gdb_handlesig's return value (which is the signal to deliver if
- * execution was resumed via a continue packet).
- */
-gdb_handlesig(gdbserver_state.c_cpu, 0);
-#else
-/* In this case wait to send the syscall packet until notification that
-   the CPU has stopped.  This must be done because if the packet is sent
-   now the reply from the syscall request could be received while the CPU
-   is still in the running state, which can cause packets to be dropped
-   and state transition 'T' packets to be sent while the syscall is still
-   being processed.  */
-qemu_cpu_kick(gdbserver_state.c_cpu);
-#endif
+
+if (gdb_send_syscall_now()) { /* true only for *-user */
+gdb_put_packet(gdbserver_syscall_state.syscall_buf);
+}
+
+/* user/softmmu specific handling */
+gdb_post_syscall_handling();
 }
 
 void gdb_do_syscall(gdb_syscall_complete_cb cb, const char *fmt, ...)
diff --git a/gdbstub/user.c b/gdbstub/user.c
index cc7eeb9afb..a5227e23cf 100644
--- a/gdbstub/user.c
+++ b/gdbstub/user.c
@@ -467,3 +467,27 @@ void gdb_breakpoint_remove_all(CPUState *cs)
 {
 cpu_breakpoint_remove_all(cs, BP_GDB);
 }
+
+/*
+ * For user-mode syscall support we send the system call immediately
+ * and then return control to gdb for it to process the syscall request.
+ * Since the protocol requires that gdb hands control back to us
+ * using a "here are the results" F packet, we don't need to check
+ * gdb_handlesig's return value (which is the signal to deliver if
+ * execution was resumed via a continue packet).
+ */
+
+void gdb_pre_syscall_handling(void)
+{
+return;
+}
+
+bool gdb_send_syscall_now(void)
+{
+return true;
+}
+
+void gdb_post_syscall_handling(void)
+{
+gdb_handlesig(gdbserver_state.c_cpu, 0);
+}
-- 
2.39.1



