date:20220726

Re: [QEMU PATCH v2 4/6] nvdimm: Implement ACPI NVDIMM Label Methods

2022-07-26 Thread Robert Hoo

On Thu, 2022-07-21 at 10:58 +0200, Igor Mammedov wrote:
[...]
Thanks, Igor, for the review.
> > > The patch is too intrusive and my hunch is that it breaks
> > > ABI and needs a bunch of compat knobs to work properly, and
> > > that is something I'd like to avoid unless there is no other way
> > > around the problem.
> > 
> > Is the ABI you mentioned here the "struct NvdimmMthdIn{}" stuff?
> > And do the compat knobs refer to the related functions' input/output
> > params?
> 
> ABI is the set of structures through which guest and QEMU pass
> information to each other. And knobs in this case would be compat
> variable[s] to keep old behavior in place for old machine types.

My humble opinion:
The changes of the compat variable(s) here don't break the ABI: the ABI
between guest and host/QEMU is the ACPI spec, which we neither change nor
deviate from; actually we're implementing it.
E.g. with these patches, an old guest can boot up with no differences or
changes.
> 
> > My thought is that eventually, sooner or later, more ACPI methods
> > will be implemented on request, although for now we can play the
> > trick of wrapping new methods over the pipe of the old _DSM
> > implementation.
> > Though this changes the existing struct NvdimmDsmIn {} a little, it
> > paves the way for the future; and actually the change is more an
> > extension or generalization that does not fundamentally change the
> > framework.
> > 
> > In short, my point is that the change/generalization/extension will
> > be inevitable, even if not at present.
> 
> Expanding ABI (the interface between host and guest) has 2 drawbacks
>  * it exposes more attack surface of the VMM to a hostile guest
>    and raises the chances that a vulnerability would slip through
>    review/testing

This patch doesn't increase the attack surface, I think.

>  * migration wise, QEMU has to support any ABI for years,
>    and not only the latest and greatest interface but also old
>    ones, to keep guests started on older QEMU working across
>    migration. So any ABI change should be considered very
>    carefully before being implemented, otherwise it all
>    quickly snowballs into an unsupportable mess of compat
>    variables smeared across host/guest.
>    Reducing the exposed ABI and the constant need to expand it
>    was the reason why we moved ACPI code from firmware
>    into QEMU, so we could describe hardware without the costs
>    associated with maintaining ABI.

Yeah, migration is the only broken thing. With this patch the guest ACPI
tables change, so live migration of a guest between new and old QEMU
versions will have problems. But I think this is not the only example of
this kind of problem. How about other similar cases?

In fact, our point of contention is this document,
https://www.qemu.org/docs/master/specs/acpi_nvdimm.html, and whether or
not this patch should change the implementation protocol it describes.
The protocol was designed for _DSM only. Unless we're never going to
support any other ACPI methods, it should be updated. _LS{I,R,W} are ACPI
methods; we can play the trick in this special case, but definitely not
next time.

I suggest doing it now; nevertheless, you maintainers make the final
decision.

> 
> There might be need to extend ABI eventually, but not in this case.
> 
> > > I was skeptical about this approach during v1 review and
> > > now I'm pretty much sure it's over-engineered and we can
> > > just repack data we receive from existing label _DSM functions
> > > to provide _LS{I,R,W} like it was suggested in v1.
> > > It will be much simpler and affect only AML side without
> > > complicating ABI and without any compat cruft and will work
> > > with ping-pong migration without any issues.  
> > 
> > Ostensibly it may look simpler, but actually it is not, I think. The
> > AML "common pipe" NCAL() is already complex; it packs all the _DSM
> > and NFIT() function logic, and packing new stuff in/through it will
> > be bug-prone.
> > This time we can avoid touching it, as the new ACPI methods that
> > deprecate the old _DSM are functionally almost the same.
> > But how about next time? Are we always going to pack new methods'
> > logic into NCAL()?
> > My point is that we should implement new methods on their own. Of
> > course, as a general programming rule, we can/should abstract common
> > routines, but not pack them into one large function.
> > > 
> > >   
[...]

Re: [PATCH 07/16] virtio-net: support queue reset

2022-07-26 Thread Jason Wang

On Tue, Jul 26, 2022 at 3:02 PM Kangjie Xu  wrote:
>
>
> On 2022/7/26 11:43, Jason Wang wrote:
> >
> > On 2022/7/18 19:17, Kangjie Xu wrote:
> >> From: Xuan Zhuo 
> >>
> >> virtio-net implements queue reset. Queued packets in the corresponding
> >> queue pair are flushed or purged.
> >>
> >> Queue reset is currently only implemented for non-vhosts.
> >>
> >> Signed-off-by: Xuan Zhuo 
> >> ---
> >>   hw/net/virtio-net.c | 15 +++
> >>   1 file changed, 15 insertions(+)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 7ad948ee7c..8396e21a67 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -531,6 +531,19 @@ static RxFilterInfo
> >> *virtio_net_query_rxfilter(NetClientState *nc)
> >>   return info;
> >>   }
> >>   +static void virtio_net_queue_reset(VirtIODevice *vdev, uint32_t
> >> queue_index)
> >> +{
> >> +VirtIONet *n = VIRTIO_NET(vdev);
> >> +NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(queue_index));
> >> +
> >> +if (!nc->peer) {
> >> +return;
> >> +}
> >> +
> >> +qemu_flush_or_purge_queued_packets(nc->peer, true);
> >> +assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> >
> >
> > Let's try to reuse this function in virtio_net_reset().
> >
> Yeah, I'll fix it.
>
> Thanks.
>
> >
> >> +}
> >> +
> >>   static void virtio_net_reset(VirtIODevice *vdev)
> >>   {
> >>   VirtIONet *n = VIRTIO_NET(vdev);
> >> @@ -741,6 +754,7 @@ static uint64_t
> >> virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> >>   }
> >> if (!get_vhost_net(nc->peer)) {
> >> +virtio_add_feature(&features, VIRTIO_F_RING_RESET);
> >
> >
> > This breaks migration compatibility.
> >
> > We probably need:
> >
> > 1) a new command line parameter
> > 2) make it disabled for pre-7.2 machine
> >
> > Thanks
> >
> >
> Sorry, I don't get what "pre-7.2 machine" means. Could you
> please explain it?

I meant that for pre-7.2 machine types, we should make queue reset off by default.

Otherwise we break migration compatibility.
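
A minimal sketch of the "off by default for old machine types" rule, written as a self-contained toy model rather than QEMU code. The type MachineVersion, the function queue_reset_offered() and the 7.2 cut-over constant are hypothetical names chosen for illustration; the real mechanism would presumably be a device property plus a compat entry for older machine types.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical model of a machine-type version, e.g. 7.1 -> {7, 1}. */
typedef struct {
    int major;
    int minor;
} MachineVersion;

/* Assumed first machine version in which the feature defaults to on. */
static const MachineVersion QUEUE_RESET_INTRO = { 7, 2 };

static bool machine_older_than(MachineVersion m, MachineVersion ref)
{
    return m.major < ref.major ||
           (m.major == ref.major && m.minor < ref.minor);
}

/*
 * Decide whether queue reset is offered to the guest.
 * An explicit user setting (the "new command line parameter") wins;
 * otherwise the default depends on the machine type.
 */
static bool queue_reset_offered(MachineVersion m, bool user_set,
                                bool user_value)
{
    assert(m.major >= 0 && m.minor >= 0);
    if (user_set) {
        return user_value;
    }
    return !machine_older_than(m, QUEUE_RESET_INTRO);
}
```

With this rule a guest started on a 7.1 machine type sees the same feature bits on the old and the new binary, which is what keeps migration between them working.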

Thanks

>
> Thanks
>
> >>   return features;
> >>   }
> >>   @@ -3766,6 +3780,7 @@ static void virtio_net_class_init(ObjectClass
> >> *klass, void *data)
> >>   vdc->set_features = virtio_net_set_features;
> >>   vdc->bad_features = virtio_net_bad_features;
> >>   vdc->reset = virtio_net_reset;
> >> +vdc->queue_reset = virtio_net_queue_reset;
> >>   vdc->set_status = virtio_net_set_status;
> >>   vdc->guest_notifier_mask = virtio_net_guest_notifier_mask;
> >>   vdc->guest_notifier_pending = virtio_net_guest_notifier_pending;
>

Re: [PATCH 14/16] virtio-net: support queue_enable for vhost-user

2022-07-26 Thread Jason Wang

On Tue, Jul 26, 2022 at 2:54 PM Kangjie Xu  wrote:
>
>
> On 2022/7/26 12:25, Jason Wang wrote:
> >
> > On 2022/7/18 19:17, Kangjie Xu wrote:
> >> Support queue enable in vhost-user scenario. It will be called when
> >> a vq reset operation is performed and the vq will be restarted.
> >>
> >> It should be noted that we can restart the vq when the vhost has
> >> already started. When launching a new vhost-user device, the vhost
> >> is not started and all vqs are not initialized until
> >> VIRTIO_PCI_COMMON_STATUS is written. Thus, we should use vhost_started
> >> to differentiate the two cases: vq reset and device start.
> >>
> >> Signed-off-by: Kangjie Xu 
> >> Signed-off-by: Xuan Zhuo 
> >> ---
> >>   hw/net/virtio-net.c | 20 
> >>   1 file changed, 20 insertions(+)
> >>
> >> diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 8396e21a67..2c26e2ef73 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -544,6 +544,25 @@ static void virtio_net_queue_reset(VirtIODevice
> >> *vdev, uint32_t queue_index)
> >>   assert(!virtio_net_get_subqueue(nc)->async_tx.elem);
> >>   }
> >>   +static void virtio_net_queue_enable(VirtIODevice *vdev, uint32_t
> >> queue_index)
> >> +{
> >> +VirtIONet *n = VIRTIO_NET(vdev);
> >> +NetClientState *nc = qemu_get_subqueue(n->nic, vq2q(queue_index));
> >> +int r;
> >> +
> >> +if (!nc->peer || !vdev->vhost_started) {
> >> +return;
> >> +}
> >> +
> >> +if (nc->peer->info->type == NET_CLIENT_DRIVER_VHOST_USER) {
> >> +r = vhost_virtqueue_restart(vdev, nc, queue_index);
> >> +if (r < 0) {
> >> +error_report("unable to restart vhost net virtqueue: %d, "
> >> +"when resetting the queue", queue_index);
> >> +}
> >> +}
> >> +}
> >
> >
> > So we don't check queue_enable in vhost_dev_start(), does this mean
> > the queue_enable is actually meaningless (since the virtqueue is
> > already started there)?
> >
> > And another issue is
> >
> > peer_attach/peer_detach() "abuse" vhost_set_vring_enable(). This
> > probably means we need to invent new type of request instead of
> > re-using VHOST_USER_SET_VRING_ENABLE.
> >
> > Thanks
> >
> >
> 1. Yes, we don't need queue_enable in vhost_dev_start(); queue_enable is
> only useful when restarting the queue. I named it queue_enable()
> simply because it is called when VIRTIO_PCI_COMMON_Q_ENABLE is written.
> Would it look better if we renamed it to "queue_reenable"?

I think the right approach is probably:

1) when VERSION_1 is negotiated, only start the virtqueue when queue_enable is 1
2) when VERSION_1 is not negotiated, start virtqueue when DRIVER_OK
(vhost_dev_start())

?
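
The two rules above can be sketched as a single predicate. This is a toy model under stated assumptions, not the vhost code: VqStartState and vq_should_start() are hypothetical names, and the three flags stand for VERSION_1 being negotiated, DRIVER_OK being set, and the driver having written 1 to queue_enable for this virtqueue.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical snapshot of the state that matters for one virtqueue. */
typedef struct {
    bool version_1;     /* VERSION_1 was negotiated */
    bool driver_ok;     /* DRIVER_OK has been set */
    bool queue_enable;  /* driver wrote 1 to queue_enable for this vq */
} VqStartState;

/* Returns true when the virtqueue should be started. */
static bool vq_should_start(const VqStartState *s)
{
    assert(s != NULL);
    if (s->version_1) {
        /* Rule 1: modern device, follow queue_enable only. */
        return s->queue_enable;
    }
    /* Rule 2: legacy device, start at DRIVER_OK (vhost_dev_start()). */
    return s->driver_ok;
}
```

The sketch encodes the rules exactly as worded; whether the VERSION_1 case should additionally require DRIVER_OK is left open here.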

>
> 2. I think inventing a new type of vhost-protocol message can be a good
> choice. However, I don't know much about the vhost protocol. If we want
> to add a new message to the vhost protocol, then apart from the
> documentation and the code in QEMU, do we need to submit patches to other
> projects, e.g. virtio-spec?

Probably not since vhost-user doesn't belong to the spec currently.
The doc in qemu should be sufficient.

Thanks

>
> Thanks
>
> >> +
> >>   static void virtio_net_reset(VirtIODevice *vdev)
> >>   {
> >>   VirtIONet *n = VIRTIO_NET(vdev);
> >> @@ -3781,6 +3800,7 @@ static void virtio_net_class_init(ObjectClass
> >> *klass, void *data)
> >>   vdc->bad_features = virtio_net_bad_features;
> >>   vdc->reset = virtio_net_reset;
> >>   vdc->queue_reset = virtio_net_queue_reset;
> >> +vdc->queue_enable = virtio_net_queue_enable;
> >>   vdc->set_status = virtio_net_set_status;
> >>   vdc->guest_notifier_mask = virtio_net_guest_notifier_mask;
> >>   vdc->guest_notifier_pending = virtio_net_guest_notifier_pending;
>

Re: [PATCH 08/16] vhost: add op to enable or disable a single vring

2022-07-26 Thread Jason Wang

On Tue, Jul 26, 2022 at 2:39 PM Kangjie Xu  wrote:
>
>
> On 2022/7/26 11:49, Jason Wang wrote:
> >
> > On 2022/7/18 19:17, Kangjie Xu wrote:
> >> VhostOps lacks an interface to set the enable status of a single
> >> vring, since vhost_set_vring_enable_op manipulates all virtqueues
> >> in a device.
> >>
> >> Resetting a single vq will rely on this interface. It requires a
> >> reply to indicate that the reset operation is finished, so the
> >> parameter, wait_for_reply, is added.
> >
> >
> > The wait reply seems to be an implementation-specific thing. Can we
> > hide it?
> >
> > Thanks
> >
> I do not hide wait_for_reply here because when stopping the queue, a
> reply is needed to ensure that the message has been processed and queue
> has been disabled.

This needs to be done at vhost-backend level instead of the general vhost code.

E.g. vhost-kernel and vDPA use ioctl(), which is synchronous.
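
A rough sketch of what hiding the wait at the backend level could look like, as a self-contained toy model. ToyOps, ToyDev and the toy_* functions are hypothetical; they are not the VhostOps interface. The assumption that the message-based backend waits only when disabling a queue is taken from the description quoted above, not from any settled design.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_NVQS 4

typedef struct ToyDev ToyDev;

/* Hypothetical ops table: note there is no wait_for_reply parameter. */
typedef struct {
    int (*set_single_vring_enable)(ToyDev *dev, int index, bool enable);
} ToyOps;

struct ToyDev {
    const ToyOps *ops;
    bool enabled[TOY_NVQS];
    int replies_waited;   /* times the backend blocked for an ack */
};

/* ioctl()-style backend: the call is synchronous, nothing to wait for. */
static int toy_kernel_set(ToyDev *dev, int index, bool enable)
{
    if (index < 0 || index >= TOY_NVQS) {
        return -1;
    }
    dev->enabled[index] = enable;
    return 0;
}

/* Message-style backend: decides internally when an ack is needed. */
static int toy_user_set(ToyDev *dev, int index, bool enable)
{
    if (index < 0 || index >= TOY_NVQS) {
        return -1;
    }
    dev->enabled[index] = enable;
    if (!enable) {
        dev->replies_waited++;   /* stands in for blocking on a reply */
    }
    return 0;
}

static const ToyOps toy_kernel_ops = { toy_kernel_set };
static const ToyOps toy_user_ops = { toy_user_set };

/* Generic code: the same call for every backend. */
static int toy_reset_vq(ToyDev *dev, int index)
{
    assert(dev != NULL && dev->ops != NULL);
    return dev->ops->set_single_vring_enable(dev, index, false);
}
```

The generic caller never mentions replies; each backend keeps its own notion of "done" behind the op.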

>
> When restarting the queue, we do not need a reply, which is the same as
> what qemu does in vhost_dev_start().
>
> So I add this parameter to distinguish the two cases.
>
> I think one alternative implementation is to add an interface in
> VhostOps: queue_reset(). In this way the details can be hidden. What do you
> think of this solution? Does it look better?

Let me ask it differently: in which cases can we call this function
with wait_for_reply = false?

Thanks

>
> Thanks
>
> >
> >>
> >> Signed-off-by: Kangjie Xu 
> >> Signed-off-by: Xuan Zhuo 
> >> ---
> >>   include/hw/virtio/vhost-backend.h | 4 
> >>   1 file changed, 4 insertions(+)
> >>
> >> diff --git a/include/hw/virtio/vhost-backend.h
> >> b/include/hw/virtio/vhost-backend.h
> >> index eab46d7f0b..7bddd1e9a0 100644
> >> --- a/include/hw/virtio/vhost-backend.h
> >> +++ b/include/hw/virtio/vhost-backend.h
> >> @@ -81,6 +81,9 @@ typedef int (*vhost_set_backend_cap_op)(struct
> >> vhost_dev *dev);
> >>   typedef int (*vhost_set_owner_op)(struct vhost_dev *dev);
> >>   typedef int (*vhost_reset_device_op)(struct vhost_dev *dev);
> >>   typedef int (*vhost_get_vq_index_op)(struct vhost_dev *dev, int idx);
> >> +typedef int (*vhost_set_single_vring_enable_op)(struct vhost_dev *dev,
> >> +int index, int enable,
> >> +bool wait_for_reply);
> >>   typedef int (*vhost_set_vring_enable_op)(struct vhost_dev *dev,
> >>int enable);
> >>   typedef bool (*vhost_requires_shm_log_op)(struct vhost_dev *dev);
> >> @@ -155,6 +158,7 @@ typedef struct VhostOps {
> >>   vhost_set_owner_op vhost_set_owner;
> >>   vhost_reset_device_op vhost_reset_device;
> >>   vhost_get_vq_index_op vhost_get_vq_index;
> >> +vhost_set_single_vring_enable_op vhost_set_single_vring_enable;
> >>   vhost_set_vring_enable_op vhost_set_vring_enable;
> >>   vhost_requires_shm_log_op vhost_requires_shm_log;
> >>   vhost_migration_done_op vhost_migration_done;
>

Re: [PATCH 16/16] vhost-net: vq reset feature bit support

2022-07-26 Thread Jason Wang

On Tue, Jul 26, 2022 at 2:24 PM Kangjie Xu  wrote:
>
>
> On 2022/7/26 12:28, Jason Wang wrote:
> >
> > On 2022/7/18 19:17, Kangjie Xu wrote:
> >> Add support for negotiation of vq reset feature bit.
> >>
> >> Signed-off-by: Kangjie Xu 
> >> Signed-off-by: Xuan Zhuo 
> >
> >
> > I'd suggest adding support for vhost-net kernel as well. It looks much
> > easier than vhost-user (I guess a stop/start would do the trick).
> >
> > Thanks
> >
> >
> Yeah, we've planned to support it in the future.

If possible, I suggest implementing it in this series. It should be
easier since the current kernel already supports it (via SET_BACKEND).

Thanks

>
> Thanks
>
> >> ---
> >>   hw/net/vhost_net.c  | 1 +
> >>   hw/net/virtio-net.c | 3 ++-
> >>   2 files changed, 3 insertions(+), 1 deletion(-)
> >>
> >> diff --git a/hw/net/vhost_net.c b/hw/net/vhost_net.c
> >> index 4f5f034c11..de910f6466 100644
> >> --- a/hw/net/vhost_net.c
> >> +++ b/hw/net/vhost_net.c
> >> @@ -73,6 +73,7 @@ static const int user_feature_bits[] = {
> >>   VIRTIO_NET_F_MTU,
> >>   VIRTIO_F_IOMMU_PLATFORM,
> >>   VIRTIO_F_RING_PACKED,
> >> +VIRTIO_F_RING_RESET,
> >>   VIRTIO_NET_F_RSS,
> >>   VIRTIO_NET_F_HASH_REPORT,
> >>   diff --git a/hw/net/virtio-net.c b/hw/net/virtio-net.c
> >> index 0747ffe71c..a8b299067a 100644
> >> --- a/hw/net/virtio-net.c
> >> +++ b/hw/net/virtio-net.c
> >> @@ -757,6 +757,8 @@ static uint64_t
> >> virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> >> virtio_add_feature(&features, VIRTIO_NET_F_MAC);
> >>   +virtio_add_feature(&features, VIRTIO_F_RING_RESET);
> >> +
> >>   if (!peer_has_vnet_hdr(n)) {
> >>   virtio_clear_feature(&features, VIRTIO_NET_F_CSUM);
> >>   virtio_clear_feature(&features, VIRTIO_NET_F_HOST_TSO4);
> >> @@ -777,7 +779,6 @@ static uint64_t
> >> virtio_net_get_features(VirtIODevice *vdev, uint64_t features,
> >>   }
> >> if (!get_vhost_net(nc->peer)) {
> >> -virtio_add_feature(&features, VIRTIO_F_RING_RESET);
> >>   return features;
> >>   }
>

Re: [PATCH 09/16] vhost-user: enable/disable a single vring

2022-07-26 Thread Jason Wang

On Tue, Jul 26, 2022 at 1:27 PM Kangjie Xu  wrote:
>
>
> On 2022/7/26 12:07, Jason Wang wrote:
> >
> > On 2022/7/18 19:17, Kangjie Xu wrote:
> >> Implement the vhost_set_single_vring_enable, which is to enable or
> >> disable a single vring.
> >>
> >> The parameter wait_for_reply is added to help for some cases such as
> >> vq reset.
> >>
> >> Meanwhile, vhost_user_set_vring_enable() is refactored.
> >>
> >> Signed-off-by: Kangjie Xu 
> >> Signed-off-by: Xuan Zhuo 
> >> ---
> >>   hw/virtio/vhost-user.c | 55 --
> >>   1 file changed, 48 insertions(+), 7 deletions(-)
> >>
> >> diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
> >> index 75b8df21a4..5a80a415f0 100644
> >> --- a/hw/virtio/vhost-user.c
> >> +++ b/hw/virtio/vhost-user.c
> >> @@ -267,6 +267,8 @@ struct scrub_regions {
> >>   int fd_idx;
> >>   };
> >>   +static int enforce_reply(struct vhost_dev *dev, const VhostUserMsg
> >> *msg);
> >> +
> >>   static bool ioeventfd_enabled(void)
> >>   {
> >>   return !kvm_enabled() || kvm_eventfds_enabled();
> >> @@ -1198,6 +1200,49 @@ static int vhost_user_set_vring_base(struct
> >> vhost_dev *dev,
> >>   return vhost_set_vring(dev, VHOST_USER_SET_VRING_BASE, ring);
> >>   }
> >>   +
> >> +static int vhost_user_set_single_vring_enable(struct vhost_dev *dev,
> >> +  int index,
> >> +  int enable,
> >> +  bool wait_for_reply)
> >> +{
> >> +int ret;
> >> +
> >> +if (index < dev->vq_index || index >= dev->vq_index + dev->nvqs) {
> >> +return -EINVAL;
> >> +}
> >> +
> >> +struct vhost_vring_state state = {
> >> +.index = index,
> >> +.num   = enable,
> >> +};
> >> +
> >> +VhostUserMsg msg = {
> >> +.hdr.request = VHOST_USER_SET_VRING_ENABLE,
> >> +.hdr.flags = VHOST_USER_VERSION,
> >> +.payload.state = state,
> >> +.hdr.size = sizeof(msg.payload.state),
> >> +};
> >> +
> >> +bool reply_supported = virtio_has_feature(dev->protocol_features,
> >> + VHOST_USER_PROTOCOL_F_REPLY_ACK);
> >> +
> >> +if (reply_supported && wait_for_reply) {
> >> +msg.hdr.flags |= VHOST_USER_NEED_REPLY_MASK;
> >> +}
> >
> >
> > Do we need to fail if !reply_supported && wait_for_reply?
> >
> > Thanks
> >
> >
> I guess you mean "should we fail if VHOST_USER_PROTOCOL_F_REPLY_ACK
> feature is not supported?".
>
> The implementation here is similar to that in vhost_user_set_vring_addr().
>
> If this feature is not supported, it will call enforce_reply(), then
> call vhost_user_get_features() to get a reply.

Ok, so you meant we can then fall back to VHOST_USER_GET_FEATURES? I
wonder how robust this is, e.g. is the backend required to process
vhost-user requests in order?

Thanks

>
> Since the messages will be processed sequentially in DPDK, success of
> enforce_reply() means the previous message VHOST_USER_SET_VRING_ENABLE
> has been processed.
>
> Thanks
>
> >
> >> +
> >> +ret = vhost_user_write(dev, &msg, NULL, 0);
> >> +if (ret < 0) {
> >> +return ret;
> >> +}
> >> +
> >> +if (wait_for_reply) {
> >> +return enforce_reply(dev, &msg);
> >> +}
> >> +
> >> +return ret;
> >> +}
> >> +
> >>   static int vhost_user_set_vring_enable(struct vhost_dev *dev, int
> >> enable)
> >>   {
> >>   int i;
> >> @@ -1207,13 +1252,8 @@ static int vhost_user_set_vring_enable(struct
> >> vhost_dev *dev, int enable)
> >>   }
> >> for (i = 0; i < dev->nvqs; ++i) {
> >> -int ret;
> >> -struct vhost_vring_state state = {
> >> -.index = dev->vq_index + i,
> >> -.num   = enable,
> >> -};
> >> -
> >> -ret = vhost_set_vring(dev, VHOST_USER_SET_VRING_ENABLE,
> >> &state);
> >> +int ret = vhost_user_set_single_vring_enable(dev,
> >> dev->vq_index + i,
> >> + enable, false);
> >>   if (ret < 0) {
> >>   /*
> >>* Restoring the previous state is likely infeasible,
> >> as well as
> >> @@ -2627,6 +2667,7 @@ const VhostOps user_ops = {
> >>   .vhost_set_owner = vhost_user_set_owner,
> >>   .vhost_reset_device = vhost_user_reset_device,
> >>   .vhost_get_vq_index = vhost_user_get_vq_index,
> >> +.vhost_set_single_vring_enable =
> >> vhost_user_set_single_vring_enable,
> >>   .vhost_set_vring_enable = vhost_user_set_vring_enable,
> >>   .vhost_requires_shm_log = vhost_user_requires_shm_log,
> >>   .vhost_migration_done = vhost_user_migration_done,
>

Re: [PATCH] target/riscv: Ensure opcode is saved for every instruction

2022-07-26 Thread Anup Patel

On Wed, Jul 27, 2022 at 9:24 AM Richard Henderson
 wrote:
>
> On 7/26/22 20:25, Anup Patel wrote:
> > We should call decode_save_opc() for every decoded instruction
> > because generating transformed instruction upon guest page faults
> > expects opcode to be available. Without this, hypervisor sees
> > transformed instruction as zero in htinst CSR for guest MMIO
> > emulation which makes MMIO emulation in hypervisor slow and
> > also breaks nested virtualization.
>
> Then just add decode_save_opc to load/store insns, not everything including 
> plain
> arithmetic...

We would have to add it for float load/store, atomics, and HLV/HSV as
well. Basically we would end up adding it for everything except integer
and float arithmetic.

I see that decode_save_opc() only saves the opcode in an array
through tcg_set_insn_start_param(). Which brings me to the
question: how much are we really saving by distributing the
decode_save_opc() calls?

If we distribute the decode_save_opc() calls, the code becomes
fragile against future changes, and we will miss adding decode_save_opc()
for some new extensions.
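
To illustrate the fragility argument, here is a small self-contained sketch of the centralized variant, where the opcode is recorded once in the decode loop before any per-instruction handler runs. It is a toy model, not the TCG translator: ToyTranslator, ToyHandler and toy_translate() are hypothetical names, and the instruction words used with it are arbitrary examples.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAX_INSNS 8

/* Hypothetical translator state: one saved opcode slot per instruction. */
typedef struct {
    uint32_t saved_opc[TOY_MAX_INSNS];
    size_t count;
} ToyTranslator;

typedef void (*ToyHandler)(ToyTranslator *t, uint32_t opc);

/* Stand-in for a per-instruction handler; it does not save anything. */
static void toy_handle_any(ToyTranslator *t, uint32_t opc)
{
    (void)t;
    (void)opc;
}

/*
 * Central save: every decoded instruction is recorded here, so a handler
 * added later for a new extension cannot forget to do it.
 */
static void toy_translate(ToyTranslator *t, const uint32_t *insns, size_t n,
                          ToyHandler handler)
{
    for (size_t i = 0; i < n; i++) {
        assert(t->count < TOY_MAX_INSNS);
        t->saved_opc[t->count++] = insns[i];
        handler(t, insns[i]);
    }
}
```

The cost of the central save in this model is one array store per instruction, whereas the distributed variant relies on every handler author remembering the call.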

Regards,
Anup

>
>
> r~
>
>
> >
> > Fixes: a9814e3e08d2 ("target/riscv: Minimize the calls to decode_save_opc")
> > Signed-off-by: Anup Patel 
> > ---
> >   target/riscv/insn_trans/trans_privileged.c.inc |  4 
> >   target/riscv/insn_trans/trans_rvh.c.inc|  2 --
> >   target/riscv/insn_trans/trans_rvi.c.inc|  2 --
> >   target/riscv/translate.c   | 10 --
> >   4 files changed, 4 insertions(+), 14 deletions(-)
> >
> > diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
> > b/target/riscv/insn_trans/trans_privileged.c.inc
> > index 46f96ad74d..53613682e8 100644
> > --- a/target/riscv/insn_trans/trans_privileged.c.inc
> > +++ b/target/riscv/insn_trans/trans_privileged.c.inc
> > @@ -75,7 +75,6 @@ static bool trans_sret(DisasContext *ctx, arg_sret *a)
> >   {
> >   #ifndef CONFIG_USER_ONLY
> >   if (has_ext(ctx, RVS)) {
> > -decode_save_opc(ctx);
> >   gen_helper_sret(cpu_pc, cpu_env);
> >   tcg_gen_exit_tb(NULL, 0); /* no chaining */
> >   ctx->base.is_jmp = DISAS_NORETURN;
> > @@ -91,7 +90,6 @@ static bool trans_sret(DisasContext *ctx, arg_sret *a)
> >   static bool trans_mret(DisasContext *ctx, arg_mret *a)
> >   {
> >   #ifndef CONFIG_USER_ONLY
> > -decode_save_opc(ctx);
> >   gen_helper_mret(cpu_pc, cpu_env);
> >   tcg_gen_exit_tb(NULL, 0); /* no chaining */
> >   ctx->base.is_jmp = DISAS_NORETURN;
> > @@ -104,7 +102,6 @@ static bool trans_mret(DisasContext *ctx, arg_mret *a)
> >   static bool trans_wfi(DisasContext *ctx, arg_wfi *a)
> >   {
> >   #ifndef CONFIG_USER_ONLY
> > -decode_save_opc(ctx);
> >   gen_set_pc_imm(ctx, ctx->pc_succ_insn);
> >   gen_helper_wfi(cpu_env);
> >   return true;
> > @@ -116,7 +113,6 @@ static bool trans_wfi(DisasContext *ctx, arg_wfi *a)
> >   static bool trans_sfence_vma(DisasContext *ctx, arg_sfence_vma *a)
> >   {
> >   #ifndef CONFIG_USER_ONLY
> > -decode_save_opc(ctx);
> >   gen_helper_tlb_flush(cpu_env);
> >   return true;
> >   #endif
> > diff --git a/target/riscv/insn_trans/trans_rvh.c.inc 
> > b/target/riscv/insn_trans/trans_rvh.c.inc
> > index 4f8aecddc7..cebcb3f8f6 100644
> > --- a/target/riscv/insn_trans/trans_rvh.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvh.c.inc
> > @@ -169,7 +169,6 @@ static bool trans_hfence_gvma(DisasContext *ctx, 
> > arg_sfence_vma *a)
> >   {
> >   REQUIRE_EXT(ctx, RVH);
> >   #ifndef CONFIG_USER_ONLY
> > -decode_save_opc(ctx);
> >   gen_helper_hyp_gvma_tlb_flush(cpu_env);
> >   return true;
> >   #endif
> > @@ -180,7 +179,6 @@ static bool trans_hfence_vvma(DisasContext *ctx, 
> > arg_sfence_vma *a)
> >   {
> >   REQUIRE_EXT(ctx, RVH);
> >   #ifndef CONFIG_USER_ONLY
> > -decode_save_opc(ctx);
> >   gen_helper_hyp_tlb_flush(cpu_env);
> >   return true;
> >   #endif
> > diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
> > b/target/riscv/insn_trans/trans_rvi.c.inc
> > index c49dbec0eb..1f318ffbef 100644
> > --- a/target/riscv/insn_trans/trans_rvi.c.inc
> > +++ b/target/riscv/insn_trans/trans_rvi.c.inc
> > @@ -834,8 +834,6 @@ static bool trans_fence_i(DisasContext *ctx, 
> > arg_fence_i *a)
> >
> >   static bool do_csr_post(DisasContext *ctx)
> >   {
> > -/* The helper may raise ILLEGAL_INSN -- record binv for unwind. */
> > -decode_save_opc(ctx);
> >   /* We may have changed important cpu state -- exit to main loop. */
> >   gen_set_pc_imm(ctx, ctx->pc_succ_insn);
> >   tcg_gen_exit_tb(NULL, 0);
> > diff --git a/target/riscv/translate.c b/target/riscv/translate.c
> > index a79d0cd95b..5425d19846 100644
> > --- a/target/riscv/translate.c
> > +++ b/target/riscv/translate.c
> > @@ -207,10 +207,10 @@ static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 
> > in)
> >   tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
> >   }
> >
> > -static void

Re: [PATCH] target/riscv: Ensure opcode is saved for every instruction

2022-07-26 Thread Richard Henderson


On 7/26/22 20:25, Anup Patel wrote:
> We should call decode_save_opc() for every decoded instruction
> because generating transformed instruction upon guest page faults
> expects opcode to be available. Without this, hypervisor sees
> transformed instruction as zero in htinst CSR for guest MMIO
> emulation which makes MMIO emulation in hypervisor slow and
> also breaks nested virtualization.

Then just add decode_save_opc to load/store insns, not everything including plain
arithmetic...



r~





[PATCH] target/riscv: Ensure opcode is saved for every instruction

2022-07-26 Thread Anup Patel

We should call decode_save_opc() for every decoded instruction
because generating transformed instruction upon guest page faults
expects opcode to be available. Without this, hypervisor sees
transformed instruction as zero in htinst CSR for guest MMIO
emulation which makes MMIO emulation in hypervisor slow and
also breaks nested virtualization.

Fixes: a9814e3e08d2 ("target/riscv: Minimize the calls to decode_save_opc")
Signed-off-by: Anup Patel 
---
 target/riscv/insn_trans/trans_privileged.c.inc |  4 
 target/riscv/insn_trans/trans_rvh.c.inc|  2 --
 target/riscv/insn_trans/trans_rvi.c.inc|  2 --
 target/riscv/translate.c   | 10 --
 4 files changed, 4 insertions(+), 14 deletions(-)

diff --git a/target/riscv/insn_trans/trans_privileged.c.inc 
b/target/riscv/insn_trans/trans_privileged.c.inc
index 46f96ad74d..53613682e8 100644
--- a/target/riscv/insn_trans/trans_privileged.c.inc
+++ b/target/riscv/insn_trans/trans_privileged.c.inc
@@ -75,7 +75,6 @@ static bool trans_sret(DisasContext *ctx, arg_sret *a)
 {
 #ifndef CONFIG_USER_ONLY
 if (has_ext(ctx, RVS)) {
-decode_save_opc(ctx);
 gen_helper_sret(cpu_pc, cpu_env);
 tcg_gen_exit_tb(NULL, 0); /* no chaining */
 ctx->base.is_jmp = DISAS_NORETURN;
@@ -91,7 +90,6 @@ static bool trans_sret(DisasContext *ctx, arg_sret *a)
 static bool trans_mret(DisasContext *ctx, arg_mret *a)
 {
 #ifndef CONFIG_USER_ONLY
-decode_save_opc(ctx);
 gen_helper_mret(cpu_pc, cpu_env);
 tcg_gen_exit_tb(NULL, 0); /* no chaining */
 ctx->base.is_jmp = DISAS_NORETURN;
@@ -104,7 +102,6 @@ static bool trans_mret(DisasContext *ctx, arg_mret *a)
 static bool trans_wfi(DisasContext *ctx, arg_wfi *a)
 {
 #ifndef CONFIG_USER_ONLY
-decode_save_opc(ctx);
 gen_set_pc_imm(ctx, ctx->pc_succ_insn);
 gen_helper_wfi(cpu_env);
 return true;
@@ -116,7 +113,6 @@ static bool trans_wfi(DisasContext *ctx, arg_wfi *a)
 static bool trans_sfence_vma(DisasContext *ctx, arg_sfence_vma *a)
 {
 #ifndef CONFIG_USER_ONLY
-decode_save_opc(ctx);
 gen_helper_tlb_flush(cpu_env);
 return true;
 #endif
diff --git a/target/riscv/insn_trans/trans_rvh.c.inc 
b/target/riscv/insn_trans/trans_rvh.c.inc
index 4f8aecddc7..cebcb3f8f6 100644
--- a/target/riscv/insn_trans/trans_rvh.c.inc
+++ b/target/riscv/insn_trans/trans_rvh.c.inc
@@ -169,7 +169,6 @@ static bool trans_hfence_gvma(DisasContext *ctx, 
arg_sfence_vma *a)
 {
 REQUIRE_EXT(ctx, RVH);
 #ifndef CONFIG_USER_ONLY
-decode_save_opc(ctx);
 gen_helper_hyp_gvma_tlb_flush(cpu_env);
 return true;
 #endif
@@ -180,7 +179,6 @@ static bool trans_hfence_vvma(DisasContext *ctx, 
arg_sfence_vma *a)
 {
 REQUIRE_EXT(ctx, RVH);
 #ifndef CONFIG_USER_ONLY
-decode_save_opc(ctx);
 gen_helper_hyp_tlb_flush(cpu_env);
 return true;
 #endif
diff --git a/target/riscv/insn_trans/trans_rvi.c.inc 
b/target/riscv/insn_trans/trans_rvi.c.inc
index c49dbec0eb..1f318ffbef 100644
--- a/target/riscv/insn_trans/trans_rvi.c.inc
+++ b/target/riscv/insn_trans/trans_rvi.c.inc
@@ -834,8 +834,6 @@ static bool trans_fence_i(DisasContext *ctx, arg_fence_i *a)
 
 static bool do_csr_post(DisasContext *ctx)
 {
-/* The helper may raise ILLEGAL_INSN -- record binv for unwind. */
-decode_save_opc(ctx);
 /* We may have changed important cpu state -- exit to main loop. */
 gen_set_pc_imm(ctx, ctx->pc_succ_insn);
 tcg_gen_exit_tb(NULL, 0);
diff --git a/target/riscv/translate.c b/target/riscv/translate.c
index a79d0cd95b..5425d19846 100644
--- a/target/riscv/translate.c
+++ b/target/riscv/translate.c
@@ -207,10 +207,10 @@ static void gen_check_nanbox_s(TCGv_i64 out, TCGv_i64 in)
 tcg_gen_movcond_i64(TCG_COND_GEU, out, in, t_max, in, t_nan);
 }
 
-static void decode_save_opc(DisasContext *ctx)
+static void decode_save_opc(DisasContext *ctx, target_ulong opc)
 {
 assert(ctx->insn_start != NULL);
-tcg_set_insn_start_param(ctx->insn_start, 1, ctx->opcode);
+tcg_set_insn_start_param(ctx->insn_start, 1, opc);
 ctx->insn_start = NULL;
 }
 
@@ -240,8 +240,6 @@ static void generate_exception(DisasContext *ctx, int excp)
 
 static void gen_exception_illegal(DisasContext *ctx)
 {
-tcg_gen_st_i32(tcg_constant_i32(ctx->opcode), cpu_env,
-   offsetof(CPURISCVState, bins));
 generate_exception(ctx, RISCV_EXCP_ILLEGAL_INST);
 }
 
@@ -643,8 +641,6 @@ static void gen_set_rm(DisasContext *ctx, int rm)
 return;
 }
 
-/* The helper may raise ILLEGAL_INSN -- record binv for unwind. */
-decode_save_opc(ctx);
 gen_helper_set_rounding_mode(cpu_env, tcg_constant_i32(rm));
 }
 
@@ -1055,6 +1051,7 @@ static void decode_opc(CPURISCVState *env, DisasContext 
*ctx, uint16_t opcode)
 
 /* Check for compressed insn */
 if (extract16(opcode, 0, 2) != 3) {
+decode_save_opc(ctx, opcode);
 if (!has_ext(ctx, RVC)) {
 gen_exception_illegal(ctx);
 } else {
@@ -1071,6 +1068,7 @@

Re: [PATCH] hw/microblaze: pass random seed to fdt

2022-07-26 Thread Richard Henderson


On 7/26/22 18:49, Jason A. Donenfeld wrote:

Hi Edgar,

On Thu, Jul 21, 2022 at 8:43 PM Edgar E. Iglesias
 wrote:

Ah OK, Paolo, it would be great if you would take this via your tree!


It looks like Paolo never did this. So you might want to queue this
somewhere, or bug him to take it, or something. I don't know how this
works with 7.1-rc0 just being tagged, but I assume this means this has
to wait until 7.2


Yes, it has missed the window by over a week now: soft freeze.
You really should have kept all of these in one thread.


r~

Re: [PATCH] hw/nvme: Add iothread support

2022-07-26 Thread Jinhao Fan

at 2:07 AM, Keith Busch  wrote:

> MSI-x uses MMIO for masking, so there's no need for an NVMe specific way to
> mask these interrupts. You can just use the generic PCIe methods to clear the
> vector's enable bit. But no NVMe driver that I know of is making use of these
> either, though it should be possible to make linux start doing that.

I believe we need to handle MSI-x masking in the NVMe controller after we
switch to irqfd. Before that QEMU’s MSI-x emulation logic helps us handle
masked interrupts. But with irqfd, we bypass QEMU’s MSI-x and let the kernel
send the interrupt directly. Therefore qemu-nvme needs to do some
bookkeeping about which interrupt vectors are masked.
msix_set_vector_notifiers helps us do that.
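
The bookkeeping described above can be sketched in plain C. This is only an illustration under stated assumptions: VecState, vec_mask(), vec_unmask() and vec_raise() are hypothetical names, the delivered[] counter stands in for signalling an irqfd, and none of this is QEMU's actual MSI-X or hw/nvme code. The idea is that once interrupts bypass the generic emulation, the device has to remember which vectors are masked and replay anything that arrived in the meantime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define NVEC 64 /* hypothetical number of vectors in this sketch */

/* Hypothetical per-controller state; not QEMU's real structures. */
typedef struct {
    uint64_t masked;          /* bit n set: vector n is currently masked */
    uint64_t pending;         /* bit n set: an interrupt arrived while masked */
    unsigned delivered[NVEC]; /* stands in for signalling the irqfd */
} VecState;

/* To be called when the guest masks vector v. */
static void vec_mask(VecState *s, unsigned v)
{
    assert(v < NVEC);
    s->masked |= UINT64_C(1) << v;
}

/* To be called when the guest unmasks vector v; replays a deferred interrupt. */
static void vec_unmask(VecState *s, unsigned v)
{
    assert(v < NVEC);
    s->masked &= ~(UINT64_C(1) << v);
    if (s->pending & (UINT64_C(1) << v)) {
        s->pending &= ~(UINT64_C(1) << v);
        s->delivered[v]++;
    }
}

/* Device wants to raise vector v; returns false if it had to be deferred. */
static bool vec_raise(VecState *s, unsigned v)
{
    assert(v < NVEC);
    if (s->masked & (UINT64_C(1) << v)) {
        s->pending |= UINT64_C(1) << v;
        return false;
    }
    s->delivered[v]++;
    return true;
}
```

In a real device the mask and unmask entry points would presumably be driven by the notifier callbacks registered through msix_set_vector_notifiers; their exact signatures are not reproduced here.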

Re: [PATCH] target/sh4: Honor QEMU_LOG_FILENAME with QEMU_LOG=cpu

2022-07-26 Thread Yoshinori Sato

On Mon, 25 Jul 2022 23:28:54 +0900,
Ilya Leoshkevich wrote:
> 
> When using QEMU_LOG=cpu on sh4, QEMU_LOG_FILENAME is partially ignored.
> Fix by using qemu_fprintf() instead of qemu_printf() in the respective
> places.
> 
> Fixes: 90c84c560067 ("qom/cpu: Simplify how CPUClass:cpu_dump_state() prints")
> Signed-off-by: Ilya Leoshkevich 
> ---
>  target/sh4/translate.c | 14 +++---
>  1 file changed, 7 insertions(+), 7 deletions(-)
> 
> diff --git a/target/sh4/translate.c b/target/sh4/translate.c
> index f1b190e7cf..9aadaf52cd 100644
> --- a/target/sh4/translate.c
> +++ b/target/sh4/translate.c
> @@ -171,16 +171,16 @@ void superh_cpu_dump_state(CPUState *cs, FILE *f, int 
> flags)
>  qemu_fprintf(f, "sgr=0x%08x dbr=0x%08x delayed_pc=0x%08x fpul=0x%08x\n",
>   env->sgr, env->dbr, env->delayed_pc, env->fpul);
>  for (i = 0; i < 24; i += 4) {
> -qemu_printf("r%d=0x%08x r%d=0x%08x r%d=0x%08x r%d=0x%08x\n",
> - i, env->gregs[i], i + 1, env->gregs[i + 1],
> - i + 2, env->gregs[i + 2], i + 3, env->gregs[i + 3]);
> +qemu_fprintf(f, "r%d=0x%08x r%d=0x%08x r%d=0x%08x r%d=0x%08x\n",
> + i, env->gregs[i], i + 1, env->gregs[i + 1],
> + i + 2, env->gregs[i + 2], i + 3, env->gregs[i + 3]);
>  }
>  if (env->flags & DELAY_SLOT) {
> -qemu_printf("in delay slot (delayed_pc=0x%08x)\n",
> - env->delayed_pc);
> +qemu_fprintf(f, "in delay slot (delayed_pc=0x%08x)\n",
> + env->delayed_pc);
>  } else if (env->flags & DELAY_SLOT_CONDITIONAL) {
> -qemu_printf("in conditional delay slot (delayed_pc=0x%08x)\n",
> - env->delayed_pc);
> +qemu_fprintf(f, "in conditional delay slot (delayed_pc=0x%08x)\n",
> + env->delayed_pc);
>  } else if (env->flags & DELAY_SLOT_RTE) {
>  qemu_fprintf(f, "in rte delay slot (delayed_pc=0x%08x)\n",
>   env->delayed_pc);
> -- 
> 2.35.3
> 

Reviewed-by: Yoshinori Sato

-- 
Yosinori Sato
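
For readers following along, the kind of bug the patch fixes can be reproduced with ordinary stdio. This is a sketch under assumptions: dump_mixed(), dump_consistent() and count_lines() are hypothetical helpers, and plain fprintf()/printf() stand in for qemu_fprintf()/qemu_printf(). When only some lines honour the FILE * the caller passed, the log file ends up with part of the dump.

```c
#include <assert.h>
#include <stdio.h>

/* Buggy pattern: the header honours f, the register lines go to stdout. */
static void dump_mixed(FILE *f, const unsigned *regs, int n)
{
    assert(f != NULL);
    fprintf(f, "pc=0x%08x\n", regs[0]);
    for (int i = 1; i < n; i++) {
        printf("r%d=0x%08x\n", i, regs[i]);
    }
}

/* Fixed pattern: every line goes to the stream the caller chose. */
static void dump_consistent(FILE *f, const unsigned *regs, int n)
{
    assert(f != NULL);
    fprintf(f, "pc=0x%08x\n", regs[0]);
    for (int i = 1; i < n; i++) {
        fprintf(f, "r%d=0x%08x\n", i, regs[i]);
    }
}

/* Checking helper: number of newline-terminated lines written to f. */
static int count_lines(FILE *f)
{
    int c, lines = 0;
    rewind(f);
    while ((c = fgetc(f)) != EOF) {
        lines += (c == '\n');
    }
    return lines;
}
```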

Re: [PATCH] target/riscv: fix csr check for cycle{h}, instret{h}, time{h}, hpmcounter3~31{h}

2022-07-26 Thread Weiwei Li




On 2022/7/27 7:34 AM, Atish Patra wrote:

On Wed, Jul 20, 2022 at 9:32 PM Alistair Francis  wrote:

On Sat, Jul 2, 2022 at 11:42 PM Weiwei Li  wrote:

- improve the field extraction process

This part is already improved with the PMU series.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg895143.html

Yeah, I have replied on this patch a few days ago.



- add stand-alone check for mcounteren when in less-privileged mode
- add check for scounteren when 'S' is enabled and current priv is PRV_U


These two parts are fine. I am resending the remaining patches for the
PMU series.
Can you please rebase your patch on top of it and resend it?


Ok. I'll rebase and resend it later.

Regards,

Weiwei Li




Signed-off-by: Weiwei Li 
Signed-off-by: Junqiang Wang 

+ Atish

Alistair


---
  target/riscv/csr.c | 76 ++
  1 file changed, 22 insertions(+), 54 deletions(-)

diff --git a/target/riscv/csr.c b/target/riscv/csr.c
index 6dbe9b541f..a4719cbf35 100644
--- a/target/riscv/csr.c
+++ b/target/riscv/csr.c
@@ -72,66 +72,34 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
  #if !defined(CONFIG_USER_ONLY)
  CPUState *cs = env_cpu(env);
  RISCVCPU *cpu = RISCV_CPU(cs);
+uint32_t field = 0;

  if (!cpu->cfg.ext_counters) {
  /* The Counters extensions is not enabled */
  return RISCV_EXCP_ILLEGAL_INST;
  }

-if (riscv_cpu_virt_enabled(env)) {
-switch (csrno) {
-case CSR_CYCLE:
-if (!get_field(env->hcounteren, COUNTEREN_CY) &&
-get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_TIME:
-if (!get_field(env->hcounteren, COUNTEREN_TM) &&
-get_field(env->mcounteren, COUNTEREN_TM)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_INSTRET:
-if (!get_field(env->hcounteren, COUNTEREN_IR) &&
-get_field(env->mcounteren, COUNTEREN_IR)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_HPMCOUNTER3...CSR_HPMCOUNTER31:
-if (!get_field(env->hcounteren, 1 << (csrno - CSR_HPMCOUNTER3)) &&
-get_field(env->mcounteren, 1 << (csrno - CSR_HPMCOUNTER3))) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-}
-if (riscv_cpu_mxl(env) == MXL_RV32) {
-switch (csrno) {
-case CSR_CYCLEH:
-if (!get_field(env->hcounteren, COUNTEREN_CY) &&
-get_field(env->mcounteren, COUNTEREN_CY)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_TIMEH:
-if (!get_field(env->hcounteren, COUNTEREN_TM) &&
-get_field(env->mcounteren, COUNTEREN_TM)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_INSTRETH:
-if (!get_field(env->hcounteren, COUNTEREN_IR) &&
-get_field(env->mcounteren, COUNTEREN_IR)) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-case CSR_HPMCOUNTER3H...CSR_HPMCOUNTER31H:
-if (!get_field(env->hcounteren, 1 << (csrno - CSR_HPMCOUNTER3H)) 
&&
-get_field(env->mcounteren, 1 << (csrno - 
CSR_HPMCOUNTER3H))) {
-return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
-}
-break;
-}
+if (csrno <= CSR_HPMCOUNTER31 && csrno >= CSR_CYCLE) {
+field = 1 << (csrno - CSR_CYCLE);
+} else if (riscv_cpu_mxl(env) == MXL_RV32 && csrno <= CSR_HPMCOUNTER31H &&
+   csrno >= CSR_CYCLEH) {
+field = 1 << (csrno - CSR_CYCLEH);
+}
+
+if (env->priv < PRV_M && !get_field(env->mcounteren, field)) {
+return RISCV_EXCP_ILLEGAL_INST;
+}
+
+if (riscv_cpu_virt_enabled(env) && !get_field(env->hcounteren, field)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+}
+
+if (riscv_has_ext(env, RVS) && env->priv == PRV_U &&
+!get_field(env->scounteren, field)) {
+if (riscv_cpu_virt_enabled(env)) {
+return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
+} else {
+return RISCV_EXCP_ILLEGAL_INST;
  }
  }
  #endif
--
2.17.1
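
The field computation and check order in the quoted patch can be tried out in isolation. The following is a minimal sketch, assuming a simplified state struct (struct ctr_state and the enum names are hypothetical, not QEMU's CPURISCVState); the CSR numbers are the ones from the RISC-V privileged specification. It also illustrates why indexing from CSR_CYCLE matters: the removed code shifted by (csrno - CSR_HPMCOUNTER3), which for hpmcounter3 appears to select bit 0 rather than bit 3.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* CSR numbers from the RISC-V privileged specification. */
#define CSR_CYCLE          0xC00
#define CSR_HPMCOUNTER31   0xC1F
#define CSR_CYCLEH         0xC80
#define CSR_HPMCOUNTER31H  0xC9F

enum { PRV_U = 0, PRV_S = 1, PRV_M = 3 };
enum access { ACCESS_OK, ILLEGAL_INST, VIRT_INST_FAULT };

/* Hypothetical, simplified view of the state the check consults. */
struct ctr_state {
    int priv;       /* current privilege level */
    bool virt;      /* virtualization mode enabled (V=1) */
    bool has_s;     /* S extension present */
    bool rv32;      /* the *H CSRs exist only on RV32 */
    uint32_t mcounteren, hcounteren, scounteren;
};

/* Bit in the *counteren registers guarding csrno, or 0 if not a counter. */
static uint32_t counteren_bit(int csrno, bool rv32)
{
    if (csrno >= CSR_CYCLE && csrno <= CSR_HPMCOUNTER31) {
        return UINT32_C(1) << (csrno - CSR_CYCLE);
    }
    if (rv32 && csrno >= CSR_CYCLEH && csrno <= CSR_HPMCOUNTER31H) {
        return UINT32_C(1) << (csrno - CSR_CYCLEH);
    }
    return 0;
}

/* Same order of checks as the patch: mcounteren, hcounteren, scounteren. */
static enum access ctr_check(const struct ctr_state *s, int csrno)
{
    uint32_t bit = counteren_bit(csrno, s->rv32);

    assert(bit != 0); /* this sketch expects counter CSRs only */
    if (s->priv < PRV_M && !(s->mcounteren & bit)) {
        return ILLEGAL_INST;
    }
    if (s->virt && !(s->hcounteren & bit)) {
        return VIRT_INST_FAULT;
    }
    if (s->has_s && s->priv == PRV_U && !(s->scounteren & bit)) {
        return s->virt ? VIRT_INST_FAULT : ILLEGAL_INST;
    }
    return ACCESS_OK;
}
```

The sketch uses an unsigned constant for the shift so that bit 31 (hpmcounter31) is well defined; whether the real code needs the same care is left to the patch author.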

Re: [PATCH] hw/microblaze: pass random seed to fdt

2022-07-26 Thread Jason A. Donenfeld

Hi Edgar,

On Thu, Jul 21, 2022 at 8:43 PM Edgar E. Iglesias
 wrote:
> Ah OK, Paolo, it would be great if you would take this via your tree!

It looks like Paolo never did this. So you might want to queue this
somewhere, or bug him to take it, or something. I don't know how this
works with 7.1-rc0 just being tagged, but I assume this means this has
to wait until 7.2

Jason

Re: [PATCH v3] target/s390x: support PRNO_TRNG instruction

2022-07-26 Thread Jason A. Donenfeld

Hey David,

On Wed, Jul 20, 2022 at 08:41:48PM +0200, David Hildenbrand wrote:
> I did not review the doc in detail once again, maybe I get to that later
> this week.

Did you ever get around to merging this patch? Is it in some tree
somewhere?

Jason

Re: [PATCH] linux-user: Don't assume 0 is not a valid host timer_t value

2022-07-26 Thread Jon Alduan

Hello Peter,

I can say so far, your patch solved the issue! Great thanks for that!

Regarding the libc version:
From my WSL2 Ubuntu 21.04 x86_64:
$ ls -l /lib32/libc*
-rwxr-xr-x 1 root root 2042632 Mar 31  2021 /lib32/libc-2.33.so

My gcc version 10 does use the same libc version.
As already mentioned, I can also reproduce this on a VM with Ubuntu 20.04
and libc-2.31.
In addition, this issue was originally reproduced with our own
buildroot RootFS containing libc-2.28.

As you see, the libcs are not that old. What about the virtual environment?
I could not check this hypothesis, but I hope to do so soon.

Thank you again and best regards
Jon

On Mon, 25 Jul 2022 at 14:45, Peter Maydell wrote:

> On Mon, 25 Jul 2022 at 12:13, Daniel P. Berrangé 
> wrote:
> >
> > On Mon, Jul 25, 2022 at 12:00:35PM +0100, Peter Maydell wrote:
> > > For handling guest POSIX timers, we currently use an array
> > > g_posix_timers[], whose entries are a host timer_t value, or 0 for
> > > "this slot is unused".  When the guest calls the timer_create syscall
> > > we look through the array for a slot containing 0, and use that for
> > > the new timer.
> > >
> > > This scheme assumes that host timer_t values can never be zero.  This
> > > is unfortunately not a valid assumption -- for some host libc
> > > versions, timer_t values are simply indexes starting at 0.  When
> > > using this kind of host libc, the effect is that the first and second
> > > timers end up sharing a slot, and so when the guest tries to operate
> > > on the first timer it changes the second timer instead.
> >
> > For sake of historical record, could you mention here which specific
> > libc impl / version highlights the problem.
>
> Jon, which host libc are you seeing this with?
>
> thanks
> -- PMM
>


-- 
j.A
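
The slot collision described in the quoted commit message is easy to reproduce outside QEMU. Below is a minimal sketch, assuming a host whose timer ids are small integers starting at 0; host_timer_t, create_zero_marked(), create_flagged() and lookup_flagged() are hypothetical names. Tracking occupancy in a separate flag is one possible way around the problem and is not necessarily what the patch does.

```c
#include <assert.h>
#include <stdbool.h>

#define MAX_TIMERS 4

/* Stand-in for a host timer_t whose values are indexes starting at 0. */
typedef int host_timer_t;

/* Scheme from the commit message: the value 0 doubles as "slot unused". */
static host_timer_t zero_slots[MAX_TIMERS];

static int create_zero_marked(host_timer_t host_id)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (zero_slots[i] == 0) {
            zero_slots[i] = host_id; /* storing id 0 leaves the slot "free" */
            return i;
        }
    }
    return -1;
}

/* Alternative: keep occupancy separate from the stored host value. */
static host_timer_t flag_slots[MAX_TIMERS];
static bool flag_used[MAX_TIMERS];

static int create_flagged(host_timer_t host_id)
{
    for (int i = 0; i < MAX_TIMERS; i++) {
        if (!flag_used[i]) {
            flag_used[i] = true;
            flag_slots[i] = host_id;
            return i;
        }
    }
    return -1;
}

static host_timer_t lookup_flagged(int slot)
{
    assert(slot >= 0 && slot < MAX_TIMERS && flag_used[slot]);
    return flag_slots[slot];
}
```

With the zero-marked scheme, the first timer (host id 0) and the second timer (host id 1) both land in slot 0, which matches the symptom of the guest operating on the wrong timer.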

[PATCH v2 1/1] monitor: Support specified vCPU registers

2022-07-26 Thread zhenwei pi

Currently we have to dump the registers of all vCPUs and then pick out
the one we are interested in. To make this use case faster, allow the
user to specify the id of the vCPU whose registers should be queried.

Run a VM with 16 vCPU, use bcc tool to track the latency of
'hmp_info_registers':
'info registers -a' uses about 3ms;
'info registers 12' uses about 150us.

Cc: Darren Kenny 
Signed-off-by: zhenwei pi 
---
 hmp-commands-info.hx |  7 ---
 monitor/misc.c   | 18 ++
 2 files changed, 22 insertions(+), 3 deletions(-)

diff --git a/hmp-commands-info.hx b/hmp-commands-info.hx
index 3ffa24bd67..7a00b4ded3 100644
--- a/hmp-commands-info.hx
+++ b/hmp-commands-info.hx
@@ -100,9 +100,10 @@ ERST
 
 {
 .name   = "registers",
-.args_type  = "cpustate_all:-a",
-.params = "[-a]",
-.help   = "show the cpu registers (-a: all - show register info 
for all cpus)",
+.args_type  = "cpustate_all:-a,vcpu:i?",
+.params = "[-a|vcpu]",
+.help   = "show the cpu registers (-a: all - show register info 
for all cpus;"
+  " vcpu: specific vCPU to query)",
 .cmd= hmp_info_registers,
 },
 
diff --git a/monitor/misc.c b/monitor/misc.c
index 3d2312ba8d..8e1d4840f2 100644
--- a/monitor/misc.c
+++ b/monitor/misc.c
@@ -307,6 +307,7 @@ int monitor_get_cpu_index(Monitor *mon)
 static void hmp_info_registers(Monitor *mon, const QDict *qdict)
 {
 bool all_cpus = qdict_get_try_bool(qdict, "cpustate_all", false);
+int vcpu = qdict_get_try_int(qdict, "vcpu", -1);
 CPUState *cs;
 
 if (all_cpus) {
@@ -314,6 +315,23 @@ static void hmp_info_registers(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "\nCPU#%d\n", cs->cpu_index);
 cpu_dump_state(cs, NULL, CPU_DUMP_FPU);
 }
+} else if (vcpu >= 0) {
+CPUState *target_cs = NULL;
+
+CPU_FOREACH(cs) {
+if (cs->cpu_index == vcpu) {
+target_cs = cs;
+break;
+}
+}
+
+if (!target_cs) {
+monitor_printf(mon, "CPU#%d not available\n", vcpu);
+return;
+}
+
+monitor_printf(mon, "\nCPU#%d\n", target_cs->cpu_index);
+cpu_dump_state(target_cs, NULL, CPU_DUMP_FPU);
 } else {
 cs = mon_get_cpu(mon);
 
-- 
2.20.1

[PATCH v2 0/1] monitor: Support specified vCPU registers

2022-07-26 Thread zhenwei pi

v1 -> v2:
- Typo fix in commit message.
- Suggested by Darren, use '[-a|vcpu]' instead of '[-a] [vcpu]',
  because only one of these may be specified at a time.

v1:
- Support specified vCPU registers for monitor command.

Zhenwei Pi (1):
  monitor: Support specified vCPU registers

 hmp-commands-info.hx |  7 ---
 monitor/misc.c   | 18 ++
 2 files changed, 22 insertions(+), 3 deletions(-)

-- 
2.20.1

Re: [PATCH v1 1/1] migration: add remaining params->has_* = true in migration_instance_init()

2022-07-26 Thread Leonardo Bras Soares Passos

Please include:

Fixes: 69ef1f36b0 ("migration: define 'tls-creds' and 'tls-hostname'
migration parameters")
Fixes: 1d58872a91 ("migration: do not wait for free thread")
Fixes: d2f1d29b95 ("migration: add support for a "tls-authz" migration
parameter")

On Mon, Jul 25, 2022 at 10:02 PM Leonardo Bras  wrote:
>
> Some of params->has_* = true are missing in migration_instance_init, this
> causes migrate_params_check() to skip some tests, allowing some
> unsupported scenarios.
>
> Fix this by adding all missing params->has_* = true in
> migration_instance_init().
>
> Signed-off-by: Leonardo Bras 
> ---
>  migration/migration.c | 4 
>  1 file changed, 4 insertions(+)
>
> diff --git a/migration/migration.c b/migration/migration.c
> index e03f698a3c..82fbe0cf55 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -4451,6 +4451,7 @@ static void migration_instance_init(Object *obj)
>  /* Set has_* up only for parameter checks */
>  params->has_compress_level = true;
>  params->has_compress_threads = true;
> +params->has_compress_wait_thread = true;
>  params->has_decompress_threads = true;
>  params->has_throttle_trigger_threshold = true;
>  params->has_cpu_throttle_initial = true;
> @@ -4471,6 +4472,9 @@ static void migration_instance_init(Object *obj)
>  params->has_announce_max = true;
>  params->has_announce_rounds = true;
>  params->has_announce_step = true;
> +params->has_tls_creds = true;
> +params->has_tls_hostname = true;
> +params->has_tls_authz = true;
>
>  qemu_sem_init(&ms->postcopy_pause_sem, 0);
>  qemu_sem_init(&ms->postcopy_pause_rp_sem, 0);
> --
> 2.37.1
>
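
The failure mode described in the quoted commit message can be shown with a two-field miniature. This is a sketch under assumptions: struct params, params_init() and params_check() are hypothetical and much simpler than the real MigrationParameters and migrate_params_check(); the valid range 0..9 is made up for the example. A validator that only looks at fields whose has_* flag is set silently accepts a bad value when the flag was never set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a parameters struct with an optional field. */
struct params {
    bool has_level;
    int level;
};

/* Initialise the struct; set_has models whether init sets the has_* flag. */
static void params_init(struct params *p, bool set_has)
{
    assert(p != NULL);
    p->level = 0;
    p->has_level = set_has;
}

/* Validate only fields marked present; returns true if acceptable. */
static bool params_check(const struct params *p)
{
    if (p->has_level && (p->level < 0 || p->level > 9)) {
        return false;
    }
    return true;
}
```

For example, after params_init(&p, false) an out-of-range p.level = 42 still passes params_check(), while the same value is rejected once the flag is set at init time.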

Re: [PATCH] target/riscv: fix csr check for cycle{h}, instret{h}, time{h}, hpmcounter3~31{h}

2022-07-26 Thread Atish Patra

On Wed, Jul 20, 2022 at 9:32 PM Alistair Francis  wrote:
>
> On Sat, Jul 2, 2022 at 11:42 PM Weiwei Li  wrote:
> >
> > - improve the field extraction process

This part is already improved with the PMU series.

https://www.mail-archive.com/qemu-devel@nongnu.org/msg895143.html

> > - add stand-alone check for mcounteren when in less-privileged mode
> > - add check for scounteren when 'S' is enabled and current priv is PRV_U
> >

These two parts are fine. I am resending the remaining patches for the
PMU series.
Can you please rebase your patch on top of it and resend it?

> > Signed-off-by: Weiwei Li 
> > Signed-off-by: Junqiang Wang 
>
> + Atish
>
> Alistair
>
> > ---
> >  target/riscv/csr.c | 76 ++
> >  1 file changed, 22 insertions(+), 54 deletions(-)
> >
> > diff --git a/target/riscv/csr.c b/target/riscv/csr.c
> > index 6dbe9b541f..a4719cbf35 100644
> > --- a/target/riscv/csr.c
> > +++ b/target/riscv/csr.c
> > @@ -72,66 +72,34 @@ static RISCVException ctr(CPURISCVState *env, int csrno)
> >  #if !defined(CONFIG_USER_ONLY)
> >  CPUState *cs = env_cpu(env);
> >  RISCVCPU *cpu = RISCV_CPU(cs);
> > +uint32_t field = 0;
> >
> >  if (!cpu->cfg.ext_counters) {
> >  /* The Counters extensions is not enabled */
> >  return RISCV_EXCP_ILLEGAL_INST;
> >  }
> >
> > -if (riscv_cpu_virt_enabled(env)) {
> > -switch (csrno) {
> > -case CSR_CYCLE:
> > -if (!get_field(env->hcounteren, COUNTEREN_CY) &&
> > -get_field(env->mcounteren, COUNTEREN_CY)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_TIME:
> > -if (!get_field(env->hcounteren, COUNTEREN_TM) &&
> > -get_field(env->mcounteren, COUNTEREN_TM)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_INSTRET:
> > -if (!get_field(env->hcounteren, COUNTEREN_IR) &&
> > -get_field(env->mcounteren, COUNTEREN_IR)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_HPMCOUNTER3...CSR_HPMCOUNTER31:
> > -if (!get_field(env->hcounteren, 1 << (csrno - 
> > CSR_HPMCOUNTER3)) &&
> > -get_field(env->mcounteren, 1 << (csrno - 
> > CSR_HPMCOUNTER3))) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -}
> > -if (riscv_cpu_mxl(env) == MXL_RV32) {
> > -switch (csrno) {
> > -case CSR_CYCLEH:
> > -if (!get_field(env->hcounteren, COUNTEREN_CY) &&
> > -get_field(env->mcounteren, COUNTEREN_CY)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_TIMEH:
> > -if (!get_field(env->hcounteren, COUNTEREN_TM) &&
> > -get_field(env->mcounteren, COUNTEREN_TM)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_INSTRETH:
> > -if (!get_field(env->hcounteren, COUNTEREN_IR) &&
> > -get_field(env->mcounteren, COUNTEREN_IR)) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -case CSR_HPMCOUNTER3H...CSR_HPMCOUNTER31H:
> > -if (!get_field(env->hcounteren, 1 << (csrno - 
> > CSR_HPMCOUNTER3H)) &&
> > -get_field(env->mcounteren, 1 << (csrno - 
> > CSR_HPMCOUNTER3H))) {
> > -return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > -}
> > -break;
> > -}
> > +if (csrno <= CSR_HPMCOUNTER31 && csrno >= CSR_CYCLE) {
> > +field = 1 << (csrno - CSR_CYCLE);
> > +} else if (riscv_cpu_mxl(env) == MXL_RV32 && csrno <= 
> > CSR_HPMCOUNTER31H &&
> > +   csrno >= CSR_CYCLEH) {
> > +field = 1 << (csrno - CSR_CYCLEH);
> > +}
> > +
> > +if (env->priv < PRV_M && !get_field(env->mcounteren, field)) {
> > +return RISCV_EXCP_ILLEGAL_INST;
> > +}
> > +
> > +if (riscv_cpu_virt_enabled(env) && !get_field(env->hcounteren, field)) 
> > {
> > +return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > +}
> > +
> > +if (riscv_has_ext(env, RVS) && env->priv == PRV_U &&
> > +!get_field(env->scounteren, field)) {
> > +if (riscv_cpu_virt_enabled(env)) {
> > +return RISCV_EXCP_VIRT_INSTRUCTION_FAULT;
> > +} else {
> > +return RISCV_EXCP_ILLEGAL_INST;
> >  }
> >  }
> >  #endif
> > --
> > 2.17.1
> >
> >



-- 
Regards,
Atish

Re: [PATCH 2/2] pci: Sanity check mask argument to pci_set_*_by_mask()

2022-07-26 Thread Richard Henderson


On 7/26/22 09:32, Peter Maydell wrote:

Coverity complains that in functions like pci_set_word_by_mask()
we might end up shifting by more than 31 bits. This is true,
but only if the caller passes in a zero mask. Help Coverity out
by asserting that the mask argument is valid.

Fixes: CID 1487168

Signed-off-by: Peter Maydell 
---
Note that only 1 of these 4 functions is used, and that only
in 2 places in the codebase. In both cases the mask argument
is a compile-time constant.
---
  include/hw/pci/pci.h | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)

diff --git a/include/hw/pci/pci.h b/include/hw/pci/pci.h
index c79144bc5ef..0392b947b8b 100644
--- a/include/hw/pci/pci.h
+++ b/include/hw/pci/pci.h
@@ -688,7 +688,10 @@ static inline void
  pci_set_byte_by_mask(uint8_t *config, uint8_t mask, uint8_t reg)
  {
  uint8_t val = pci_get_byte(config);
-uint8_t rval = reg << ctz32(mask);
+uint8_t rval;
+
+assert(mask & 0xff);


Why the and, especially considering the uint8_t type?


@@ -696,7 +699,10 @@ static inline void
  pci_set_word_by_mask(uint8_t *config, uint16_t mask, uint16_t reg)
  {
  uint16_t val = pci_get_word(config);
-uint16_t rval = reg << ctz32(mask);
+uint16_t rval;
+
+assert(mask & 0xffff);


Similarly.

Otherwise,
Reviewed-by: Richard Henderson 


r~
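
The shift issue under discussion can be sketched standalone. Assumptions: ctz32_sketch() and set_field_by_mask() are hypothetical stand-ins (the real helpers operate on the config space buffer), and __builtin_ctz is the GCC/Clang builtin. A count-trailing-zeros helper that returns 32 for a zero input would lead to a shift by 32, so asserting a non-zero mask rules that out. Since the mask parameter already has the narrow type, a plain assert(mask) seems to express the same thing as masking with 0xff or 0xffff first, which looks like the point of the review question.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zeros; returns 32 for 0, the case behind the warning. */
static int ctz32_sketch(uint32_t v)
{
    return v ? __builtin_ctz(v) : 32;
}

/* Replace the bits of val selected by mask with reg, shifted into place. */
static uint16_t set_field_by_mask(uint16_t val, uint16_t mask, uint16_t reg)
{
    uint16_t rval;

    assert(mask); /* a zero mask would mean shifting by 32 */
    rval = (uint16_t)(reg << ctz32_sketch(mask));
    return (uint16_t)((val & ~mask) | (rval & mask));
}
```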

Re: [PATCH 1/2] pci: Remove unused pci_get_*_by_mask() functions

2022-07-26 Thread Richard Henderson


On 7/26/22 09:32, Peter Maydell wrote:

The helper functions pci_get_{byte,word,long,quad}_by_mask()
were added in 2012 in commit c9f50cea70a1596. In the decade
since we have never added a single use of them.

The helpers clearly aren't that helpful, so drop them
rather than carrying around dead code.

Signed-off-by: Peter Maydell
---
  include/hw/pci/pci.h | 28 
  1 file changed, 28 deletions(-)


Reviewed-by: Richard Henderson 

r~

Re: [PATCH 00/11] QOM'ify PIIX3 southbridge

2022-07-26 Thread BB




On 26 July 2022 16:53:03 CEST, "Michael S. Tsirkin" wrote:
>On Wed, Jul 13, 2022 at 10:17:24AM +0200, Bernhard Beschow wrote:
>> Similar to PIIX4 this series QOM'ifies internal device creation for PIIX3.
>> This reduces the delta between the implementations of PIIX3 and PIIX4 and
>> therefore might allow to merge both implementations in the future.
>> 
>> There were two challenges in this series:
>> 
>> First, QEMU considers the ACPI and USB functions to be optional in PIIX3.
>> When instantiating those with object_initialize_child(), they need to be
>> unparented in the realize function to prevent an assertion (see respective
>> commit messages).
>> 
>> Second, the PIC used to be instantiated outside of the southbridge while
>> some sub functions require a PIC with populated qemu_irqs. This has been
>> solved by introducing a proxy PIC which furthermore allows PIIX3 to be
>> agnostic towards the virtualization technology used (KVM, TCG, Xen).
>
>Thanks!
>I think it's best to merge this after the 7.1 release.
>I'll tag this but if possible pls also ping me after the release
>to make sure I don't forget. Thanks!

Sure!
I'm extending the scope of this series to go all the way to consolidate the 
piix 3 + 4 southbridges which is why I didn't post a v2 yet. The extended 
series will also address Peter's comments.

Thanks,
Bernhard

P.S.:
I've got a working POC where PIIX4 rather than PIIX3 is used in the "pc" 
machine which also supports KVM acceleration: 
https://github.com/shentok/qemu/commits/pc-piix4

>
>> Testing done:
>> * make check
>> * make check-avocado
>> * Boot live CD:
>>   * qemu-system-x86_64 -M pc -m 2G -accel kvm -cpu host -cdrom 
>> manjaro-kde-21.3.2-220704-linux515.iso
>>   * qemu-system-x86_64 -M q35 -m 2G -accel kvm -cpu host -cdrom 
>> manjaro-kde-21.3.2-220704-linux515.iso
>> 
>> Bernhard Beschow (11):
>>   hw/i386/pc: QOM'ify DMA creation
>>   hw/i386/pc_piix: Allow for setting properties before realizing PIIX3
>> southbridge
>>   hw/isa/piix3: QOM'ify USB controller creation
>>   hw/isa/piix3: QOM'ify ACPI controller creation
>>   hw/i386/pc: QOM'ify RTC creation
>>   hw/i386/pc: No need for rtc_state to be an out-parameter
>>   hw/intc/i8259: Introduce i8259 proxy "isa-pic"
>>   hw/isa/piix3: QOM'ify ISA PIC creation
>>   hw/isa/piix3: QOM'ify IDE controller creation
>>   hw/isa/piix3: Wire up ACPI interrupt internally
>>   hw/isa/piix3: Remove extra ';' outside of functions
>> 
>>  hw/i386/Kconfig   |  1 -
>>  hw/i386/pc.c  | 17 ---
>>  hw/i386/pc_piix.c | 70 -
>>  hw/i386/pc_q35.c  |  3 +-
>>  hw/intc/i8259.c   | 27 +++
>>  hw/isa/Kconfig|  1 +
>>  hw/isa/lpc_ich9.c | 11 +
>>  hw/isa/piix3.c| 84 ---
>>  include/hw/i386/ich9.h|  2 +
>>  include/hw/i386/pc.h  |  2 +-
>>  include/hw/intc/i8259.h   | 14 ++
>>  include/hw/southbridge/piix.h | 16 ++-
>>  12 files changed, 201 insertions(+), 47 deletions(-)
>> 
>> -- 
>> 2.37.1
>> 
>

Re: [PATCH v10 04/12] target/riscv: pmu: Make number of counters configurable

2022-07-26 Thread Atish Patra

On Tue, Jul 5, 2022 at 1:20 AM Weiwei Li  wrote:
>
>
> On 2022/7/5 3:51 PM, Atish Kumar Patra wrote:
> > On Mon, Jul 4, 2022 at 5:38 PM Weiwei Li  wrote:
> >>
> >> On 2022/7/4 11:26 PM, Weiwei Li wrote:
> >>> On 2022/6/21 7:15 AM, Atish Patra wrote:
>  The RISC-V privilege specification provides flexibility to implement
>  any number of counters from 29 programmable counters. However, the QEMU
>  implements all the counters.
> 
>  Make it configurable through pmu config parameter which now will
>  indicate
>  how many programmable counters should be implemented by the cpu.
> 
>  Reviewed-by: Bin Meng 
>  Reviewed-by: Alistair Francis 
>  Signed-off-by: Atish Patra 
>  Signed-off-by: Atish Patra 
>  ---
> target/riscv/cpu.c |  3 +-
> target/riscv/cpu.h |  2 +-
> target/riscv/csr.c | 94 ++
> 3 files changed, 63 insertions(+), 36 deletions(-)
> 
>  diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
>  index 1b57b3c43980..d12c6dc630ca 100644
>  --- a/target/riscv/cpu.c
>  +++ b/target/riscv/cpu.c
>  @@ -851,7 +851,6 @@ static void riscv_cpu_init(Object *obj)
> {
> RISCVCPU *cpu = RISCV_CPU(obj);
> -cpu->cfg.ext_pmu = true;
> cpu->cfg.ext_ifencei = true;
> cpu->cfg.ext_icsr = true;
> cpu->cfg.mmu = true;
>  @@ -879,7 +878,7 @@ static Property riscv_cpu_extensions[] = {
> DEFINE_PROP_BOOL("u", RISCVCPU, cfg.ext_u, true),
> DEFINE_PROP_BOOL("v", RISCVCPU, cfg.ext_v, false),
> DEFINE_PROP_BOOL("h", RISCVCPU, cfg.ext_h, true),
>  -DEFINE_PROP_BOOL("pmu", RISCVCPU, cfg.ext_pmu, true),
>  +DEFINE_PROP_UINT8("pmu-num", RISCVCPU, cfg.pmu_num, 16),
> >>> I think it's better to add a check that cfg.pmu_num is <= 29.
> >>>
> >> OK, I find this check in the following patch.
> DEFINE_PROP_BOOL("Zifencei", RISCVCPU, cfg.ext_ifencei, true),
> DEFINE_PROP_BOOL("Zicsr", RISCVCPU, cfg.ext_icsr, true),
> DEFINE_PROP_BOOL("Zfh", RISCVCPU, cfg.ext_zfh, false),
>  diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
>  index 252c30a55d78..ffee54ea5c27 100644
>  --- a/target/riscv/cpu.h
>  +++ b/target/riscv/cpu.h
>  @@ -397,7 +397,6 @@ struct RISCVCPUConfig {
> bool ext_zksed;
> bool ext_zksh;
> bool ext_zkt;
>  -bool ext_pmu;
> bool ext_ifencei;
> bool ext_icsr;
> bool ext_svinval;
>  @@ -421,6 +420,7 @@ struct RISCVCPUConfig {
> /* Vendor-specific custom extensions */
> bool ext_XVentanaCondOps;
> +uint8_t pmu_num;
> char *priv_spec;
> char *user_spec;
> char *bext_spec;
>  diff --git a/target/riscv/csr.c b/target/riscv/csr.c
>  index 0ca05c77883c..b4a8e15f498f 100644
>  --- a/target/riscv/csr.c
>  +++ b/target/riscv/csr.c
>  @@ -73,9 +73,17 @@ static RISCVException ctr(CPURISCVState *env, int
>  csrno)
> CPUState *cs = env_cpu(env);
> RISCVCPU *cpu = RISCV_CPU(cs);
> int ctr_index;
>  +int base_csrno = CSR_HPMCOUNTER3;
>  +bool rv32 = riscv_cpu_mxl(env) == MXL_RV32 ? true : false;
> -if (!cpu->cfg.ext_pmu) {
>  -/* The PMU extension is not enabled */
>  +if (rv32 && csrno >= CSR_CYCLEH) {
>  +/* Offset for RV32 hpmcounternh counters */
>  +base_csrno += 0x80;
>  +}
>  +ctr_index = csrno - base_csrno;
>  +
>  +if (!cpu->cfg.pmu_num || ctr_index >= (cpu->cfg.pmu_num)) {
>  +/* No counter is enabled in PMU or the counter is out of
>  range */
> >>> I seems unnecessary to add '!cpu->cfg.pmu_num ' here, 'ctr_index >=
> >>> (cpu->cfg.pmu_num)' is true
> > The check is improved in the following patches as well.
> >
> Do you mean 'if ((!cpu->cfg.pmu_num || !(cpu->pmu_avail_ctrs &
> ctr_mask)))'  in patch 9 ?
>
> In this condition, '!cpu->cfg.pmu_num' seems unnecessary too.
>

Yes. I will remove it.
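
To spell out why the extra test is redundant, here is a small sketch. Assumptions: the two function names are hypothetical, the counter index is treated as non-negative (which the thread says holds once the base_csrno problem is fixed), and pmu_num is limited to 29 as suggested earlier. With those assumptions, "index >= pmu_num" is already true for every index when pmu_num is zero.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Range check without the explicit zero test. */
static bool out_of_range_short(unsigned ctr_index, uint8_t pmu_num)
{
    assert(pmu_num <= 29);
    return ctr_index >= pmu_num;
}

/* Range check with the explicit zero test, as in the quoted patch. */
static bool out_of_range_long(unsigned ctr_index, uint8_t pmu_num)
{
    assert(pmu_num <= 29);
    return !pmu_num || ctr_index >= pmu_num;
}
```

Both functions agree for every non-negative index, so dropping the zero test should not change behaviour under these assumptions.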

> Regards,
>
> Weiwei Li
>
> >> Typo.  I -> It
> >>> when cpu->cfg.pmu_num is zero if the problem for base_csrno is fixed.
> >>>
> >>> Regards,
> >>>
> >>> Weiwei Li
> >>>
> return RISCV_EXCP_ILLEGAL_INST;
> }
> @@ -103,7 +111,7 @@ static RISCVException ctr(CPURISCVState *env,
>  int csrno)
> }
> break;
> }
>  -if (riscv_cpu_mxl(env) == MXL_RV32) {
>  +if (rv32) {
> switch (csrno) {
> case CSR_CYCLEH:
> if (!get_field(env->mcounteren, COUNTEREN_CY)) {
>  @@ -158,7 +166,7 @@ static RISCVException ctr(CPURISCVState *env, int
>  csrno)
> }
> break;
> }
>  -

Re: [PULL 00/16] pc,virtio: fixes

2022-07-26 Thread Richard Henderson


On 7/26/22 12:40, Michael S. Tsirkin wrote:

The following changes since commit d1c912b816844aa045082595eba796b5a025dbc4:

   Merge tag 'linux-user-for-7.1-pull-request' of 
https://gitlab.com/laurent_vivier/qemu into staging (2022-07-26 13:29:26 +0100)

are available in the Git repository at:

   git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 0522be9a0c0094088ccef7aab352c57f483ca250:

   hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP 
(2022-07-26 15:33:29 -0400)


pc,virtio: fixes

Several fixes. From now on, regression fixes only.

Signed-off-by: Michael S. Tsirkin 


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Jean-Philippe Brucker (1):
   hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and 
UNMAP

Joao Martins (11):
   hw/i386: add 4g boundary start to X86MachineState
   i386/pc: create pci-host qdev prior to pc_memory_init()
   i386/pc: pass pci_hole64_size to pc_memory_init()
   i386/pc: factor out above-4g end to an helper
   i386/pc: factor out cxl range end to helper
   i386/pc: factor out cxl range start to helper
   i386/pc: handle unitialized mr in pc_get_cxl_range_end()
   i386/pc: factor out device_memory base/size to helper
   i386/pc: bounds check phys-bits against max used GPA
   i386/pc: relocate 4g start to 1T where applicable
   i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type

Jonathan Cameron (3):
   hw/machine: Clear out left over CXL related pointer from move of state 
handling to machines.
   hw/i386/pc: Always place CXL Memory Regions after device_memory
   hw/cxl: Fix size of constant in interleave granularity function.

Robert Hoo (1):
   acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()

  include/hw/boards.h|   1 -
  include/hw/cxl/cxl_component.h |   2 +-
  include/hw/i386/pc.h   |   4 +-
  include/hw/i386/x86.h  |   3 +
  include/hw/mem/nvdimm.h|   8 --
  include/hw/pci-host/i440fx.h   |   3 +-
  hw/acpi/nvdimm.c   |  35 ---
  hw/i386/acpi-build.c   |   2 +-
  hw/i386/pc.c   | 209 -
  hw/i386/pc_piix.c  |  15 ++-
  hw/i386/pc_q35.c   |  15 ++-
  hw/i386/sgx.c  |   2 +-
  hw/i386/x86.c  |   1 +
  hw/pci-host/i440fx.c   |   5 +-
  hw/virtio/virtio-iommu.c   |  47 +
  hw/acpi/trace-events   |  13 +++
  16 files changed, 258 insertions(+), 107 deletions(-)

Re: [PATCH v10 11/12] hw/riscv: virt: Add PMU DT node to the device tree

2022-07-26 Thread Atish Patra

On Thu, Jul 14, 2022 at 3:27 AM Heiko Stübner  wrote:
>
> Hi Atish,
>
> Am Dienstag, 21. Juni 2022, 01:16:01 CEST schrieb Atish Patra:
> > Qemu virt machine can support few cache events and cycle/instret counters.
> > It also supports counter overflow for these events.
> >
> > Add a DT node so that OpenSBI/Linux kernel is aware of the virt machine
> > capabilities. There are some dummy nodes added for testing as well.
> >
> > Acked-by: Alistair Francis 
> > Signed-off-by: Atish Patra 
> > Signed-off-by: Atish Patra 
> > ---
>
> > +static void create_fdt_socket_pmu(RISCVVirtState *s,
> > +  int socket, uint32_t *phandle,
> > +  uint32_t *intc_phandles)
> > +{
> > +int cpu;
> > +char *pmu_name;
> > +uint32_t *pmu_cells;
> > +MachineState *mc = MACHINE(s);
> > +RISCVCPU hart = s->soc[socket].harts[0];
> > +
> > +pmu_cells = g_new0(uint32_t, s->soc[socket].num_harts * 2);
> > +
> > +for (cpu = 0; cpu < s->soc[socket].num_harts; cpu++) {
> > +pmu_cells[cpu * 2 + 0] = cpu_to_be32(intc_phandles[cpu]);
> > +pmu_cells[cpu * 2 + 1] = cpu_to_be32(IRQ_PMU_OVF);
> > +}
> > +
> > +pmu_name = g_strdup_printf("/soc/pmu");
> > +qemu_fdt_add_subnode(mc->fdt, pmu_name);
> > +qemu_fdt_setprop_string(mc->fdt, pmu_name, "compatible", "riscv,pmu");
>
> Where is the binding document for this?
>
> As the comment below states the "riscv,event-to-mhpmcounters" property
> is opensbi-specific and gets removed in the opensbi stage, but that still
> leaves the pmu node in it and from the version I found, Rob wasn't overly
> happy with the compatible [0]. Did this get addressed?
>

This is an OpenSBI-specific binding.
https://github.com/riscv-software-src/opensbi/blob/master/docs/pmu_support.md

The Linux kernel doesn't use the binding anymore. Earlier versions of the
patches relied on the DT binding, but it was removed based on the
feedback.

OpenSBI should delete the node and the interrupt-extended property
deletion is necessary at this point.

>
> Thanks
> Heiko
>
>
> [0] https://lore.kernel.org/all/yxhpqfpxh1vzn...@robh.at.kernel.org/
>
>
>
> > +riscv_pmu_generate_fdt_node(mc->fdt, hart.cfg.pmu_num, pmu_name);
> > +
> > +g_free(pmu_name);
> > +g_free(pmu_cells);
> > +}
> > +
> >  static void create_fdt_sockets(RISCVVirtState *s, const MemMapEntry 
> > *memmap,
> > bool is_32_bit, uint32_t *phandle,
> > uint32_t *irq_mmio_phandle,
> > @@ -759,6 +786,7 @@ static void create_fdt_sockets(RISCVVirtState *s, const 
> > MemMapEntry *memmap,
> >  &intc_phandles[phandle_pos]);
> >  }
> >  }
> > +create_fdt_socket_pmu(s, socket, phandle, intc_phandles);
> >  }
> >
> >  if (s->aia_type == VIRT_AIA_TYPE_APLIC_IMSIC) {
> > diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
> > index 7d9e2aca12a9..69bbd9fff4e1 100644
> > --- a/target/riscv/cpu.c
> > +++ b/target/riscv/cpu.c
> > @@ -1110,6 +1110,7 @@ static void riscv_isa_string_ext(RISCVCPU *cpu, char 
> > **isa_str, int max_str_len)
> >  ISA_EDATA_ENTRY(zve64f, ext_zve64f),
> >  ISA_EDATA_ENTRY(zhinx, ext_zhinx),
> >  ISA_EDATA_ENTRY(zhinxmin, ext_zhinxmin),
> > +ISA_EDATA_ENTRY(sscofpmf, ext_sscofpmf),
> >  ISA_EDATA_ENTRY(svinval, ext_svinval),
> >  ISA_EDATA_ENTRY(svnapot, ext_svnapot),
> >  ISA_EDATA_ENTRY(svpbmt, ext_svpbmt),
> > diff --git a/target/riscv/pmu.c b/target/riscv/pmu.c
> > index 34096941c0ce..59feb3c243dd 100644
> > --- a/target/riscv/pmu.c
> > +++ b/target/riscv/pmu.c
> > @@ -20,11 +20,68 @@
> >  #include "cpu.h"
> >  #include "pmu.h"
> >  #include "sysemu/cpu-timers.h"
> > +#include "sysemu/device_tree.h"
> >
> >  #define RISCV_TIMEBASE_FREQ 1000000000 /* 1Ghz */
> >  #define MAKE_32BIT_MASK(shift, length) \
> >  (((uint32_t)(~0UL) >> (32 - (length))) << (shift))
> >
> > +/**
> > + * To keep it simple, any event can be mapped to any programmable counters 
> > in
> > + * QEMU. The generic cycle & instruction count events can also be monitored
> > + * using programmable counters. In that case, mcycle & minstret must 
> > continue
> > + * to provide the correct value as well. Heterogeneous PMU per hart is not
> > + * supported yet. Thus, number of counters are same across all harts.
> > + */
> > +void riscv_pmu_generate_fdt_node(void *fdt, int num_ctrs, char *pmu_name)
> > +{
> > +uint32_t fdt_event_ctr_map[20] = {};
> > +uint32_t cmask;
> > +
> > +/* All the programmable counters can map to any event */
> > +cmask = MAKE_32BIT_MASK(3, num_ctrs);
> > +
> > +   /**
> > +* The event encoding is specified in the SBI specification
> > +* Event idx is a 20bits wide number encoded as follows:
> > +* event_idx[19:16] = type
> > +* event_idx[15:0] = code
> > +* The code field in cache events are encoded as follows:
> > +* event_idx.code[15:3] =
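
A minimal sketch of the two bit manipulations the quoted comment describes, assuming only what is visible above: the MAKE_32BIT_MASK definition from the patch, and a 20-bit event index with the type in bits 19:16 and the code in bits 15:0. The names make_32bit_mask, event_idx_type and event_idx_code are hypothetical helpers for illustration, not QEMU or SBI API.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Function form of the MAKE_32BIT_MASK macro from the quoted patch.
 * length must be in 1..32 and shift below 32, otherwise the shifts
 * would be undefined behaviour.
 */
static uint32_t make_32bit_mask(unsigned shift, unsigned length)
{
    assert(length >= 1 && length <= 32 && shift < 32);
    return (UINT32_MAX >> (32 - length)) << shift;
}

/* Hypothetical helper: bits 19:16 of the event index hold the type. */
static uint32_t event_idx_type(uint32_t event_idx)
{
    return (event_idx >> 16) & 0xF;
}

/* Hypothetical helper: bits 15:0 of the event index hold the code. */
static uint32_t event_idx_code(uint32_t event_idx)
{
    return event_idx & 0xFFFF;
}
```

For example, with 16 programmable counters, make_32bit_mask(3, 16) sets bits 3 through 18 and leaves bits 0 to 2 clear, giving 0x7FFF8.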

Re: [PATCH] Hexagon (tests/tcg/hexagon) add compiler options to EXTRA_CFLAGS

2022-07-26 Thread Philippe Mathieu-Daudé via


Hi Taylor,

On 26/7/22 21:17, Taylor Simpson wrote:

The cross_cc_cflags_hexagon in configure are not getting passed to
the Hexagon cross compiler.  Set EXTRA_CFLAGS in
tests/tcg/hexagon/Makefile.target.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
  tests/tcg/hexagon/Makefile.target | 1 +
  1 file changed, 1 insertion(+)

diff --git a/tests/tcg/hexagon/Makefile.target 
b/tests/tcg/hexagon/Makefile.target
index 23b9870534..627bf58fe6 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -20,6 +20,7 @@ EXTRA_RUNS =
  
  CFLAGS += -Wno-incompatible-pointer-types -Wno-undefined-internal

  CFLAGS += -fno-unroll-loops
+EXTRA_CFLAGS += -mv67 -O2
  
  HEX_SRC=$(SRC_PATH)/tests/tcg/hexagon

  VPATH += $(HEX_SRC)


My understanding of Richard's suggestion is something like:

-- >8 --
@@ -45,6 +45,8 @@ HEX_TESTS += overflow

 TESTS += $(HEX_TESTS)

+$(filter-out usr, $(HEX_TESTS)): CFLAGS += -mv67 -O2
+
 # This test has to be compiled for the -mv67t target
 usr: usr.c
$(CC) $(CFLAGS) -mv67t -O2 -Wno-inline-asm 
-Wno-expansion-to-defined $< -o $@ $(LDFLAGS)

---

Possibly, to keep the same style in the file:
-- >8 --
@@ -46,6 +46,5 @@ HEX_TESTS += overflow
 TESTS += $(HEX_TESTS)

 # This test has to be compiled for the -mv67t target
-usr: usr.c
-   $(CC) $(CFLAGS) -mv67t -O2 -Wno-inline-asm 
-Wno-expansion-to-defined $< -o $@ $(LDFLAGS)

+usr: CFLAGS += -mv67t -O2 -Wno-inline-asm -Wno-expansion-to-defined
---

Regards,

Phil.

Re: hexagon docker test failure

2022-07-26 Thread Philippe Mathieu-Daudé via


(Cc'ing Paolo for commit cd362defb)

On 26/7/22 19:23, Taylor Simpson wrote:



-Original Message-
From: Richard Henderson 
Sent: Tuesday, July 26, 2022 10:41 AM
To: Taylor Simpson 
Cc: qemu-devel 
Subject: hexagon docker test failure

Hi Taylor,

One of your recent hexagon testsuite changes is incompatible with the
docker image that we're using:

tests/tcg/hexagon/multi_result.c:79:16: error: invalid instruction

asm volatile("%0,p0 = vminub(%2, %3)\n\t"

 ^

:1:2: note: instantiated into assembly here

  r3:2,p0 = vminub(r1:0, r3:2)

  ^

1 error generated.


Can we try again to update debian-hexagon-cross?  I recall that last time
there was a compiler bug that prevented forward progress.  Perhaps that has
been fixed in the interim?

I'm willing to accept such an update in the next week before rc1, but if we
can't manage that we'll need to disable the failing test(s?).  Thanks in
advance,


r~


Hi Richard,

Some of the tests require the -mv67 flag to be passed to the compiler because 
they have instructions that are new in v67.

This patch
commit cd362defbbd09cbbc08b3bb465141542887b8cef
Author: Paolo Bonzini 
Date:   Fri May 27 16:35:48 2022 +0100

 tests/tcg: merge configure.sh back into main configure script

Moved this line from tests/tcg/configure.sh to the main configure script
: ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}


However, those flags aren't actually passed to the compiler any more - at least 
by default.

The gitlab builder is passing
https://gitlab.com/qemu-project/qemu/-/jobs/2771528066
So, there must be something in $MAKE_CHECK_ARGS

I use the following when I run
make EXTRA_CFLAGS='-mv67 -O2' check-tcg


So, we probably don't need a new docker image.  Do other targets have their 
cross_cc_cflags?  Please advise.

Thanks,
Taylor

Re: [PATCH 2/2] pci: Sanity check mask argument to pci_set_*_by_mask()

2022-07-26 Thread Philippe Mathieu-Daudé via


On 26/7/22 18:32, Peter Maydell wrote:

Coverity complains that in functions like pci_set_word_by_mask()
we might end up shifting by more than 31 bits. This is true,
but only if the caller passes in a zero mask. Help Coverity out
by asserting that the mask argument is valid.

Fixes: CID 1487168

Signed-off-by: Peter Maydell 
---
Note that only 1 of these 4 functions is used, and that only
in 2 places in the codebase. In both cases the mask argument
is a compile-time constant.
---
  include/hw/pci/pci.h | 20 
  1 file changed, 16 insertions(+), 4 deletions(-)


Reviewed-by: Philippe Mathieu-Daudé
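
For readers following along, here is a minimal sketch of why a zero mask is a problem, assuming the helper shifts the new value left by the number of trailing zero bits in the mask. The names ctz_nonzero and set_word_by_mask are hypothetical stand-ins that work on a plain value; this is not the code from include/hw/pci/pci.h.

```c
#include <assert.h>
#include <stdint.h>

/* Count trailing zero bits; the caller must pass a non-zero value. */
static unsigned ctz_nonzero(uint32_t v)
{
    unsigned n = 0;

    assert(v != 0);
    while (!(v & 1)) {
        v >>= 1;
        n++;
    }
    return n;
}

/*
 * Hypothetical stand-in for a set-by-mask helper: replace the field
 * selected by mask inside val with reg, leaving the other bits alone.
 */
static uint16_t set_word_by_mask(uint16_t val, uint16_t mask, uint16_t reg)
{
    uint32_t rval;

    /* A zero mask selects no field, so there is no valid shift count. */
    assert(mask);
    rval = (uint32_t)reg << ctz_nonzero(mask);
    return (uint16_t)((~mask & val) | (mask & rval));
}
```

With a non-zero 16-bit mask the shift count is at most 15, which is the property the assertion makes visible to a static analyser.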

[PATCH v7 13/15] block: Manipulate bs->file / bs->backing pointers in .attach/.detach

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

bs->file and bs->backing are a kind of duplication of part of
bs->children. But a very useful duplication, so let's not drop them at
all:)

We should manage bs->file and bs->backing in same place, where we
manage bs->children, to keep them in sync.

Moreover, generic I/O paths are not prepared for a BdrvChild without a
bs, so it's doubly good to clear bs->file / bs->backing when we detach
the child.

Detach is simple: if we detach bs->file or bs->backing child, just
set corresponding field to NULL.

Attach is a bit more complicated. But we can still determine precisely
whether to set one of bs->file / bs->backing:

- if role is BDRV_CHILD_COW, we definitely deal with bs->backing
- else, if role is BDRV_CHILD_FILTERED (it must also be
  BDRV_CHILD_PRIMARY), it's a filtered child. Use
  bs->drv->filtered_child_is_backing to choose the pointer field to
  modify.
- else, if role is BDRV_CHILD_PRIMARY, we deal with bs->file
- in all other cases, it's neither bs->backing nor bs->file. It's some
  other child and we shouldn't care
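
The decision list above can be sketched as a small pure function. This is only an illustration under stated assumptions: the ROLE_* bit values and the ChildField enum are hypothetical, and field_for_role is not a function in block.c (the real logic is in bdrv_child_cb_attach() in the diff below).

```c
#include <stdbool.h>

/* Hypothetical role bits; the real BDRV_CHILD_* values are in QEMU headers. */
enum {
    ROLE_FILTERED = 1 << 0,
    ROLE_COW      = 1 << 1,
    ROLE_PRIMARY  = 1 << 2,
};

/* Which pointer field of the parent node, if any, tracks the child. */
typedef enum {
    FIELD_NONE,
    FIELD_FILE,
    FIELD_BACKING,
} ChildField;

static ChildField field_for_role(unsigned role, bool filtered_child_is_backing)
{
    if (role & ROLE_COW) {
        /* A COW child is always bs->backing. */
        return FIELD_BACKING;
    }
    if (role & ROLE_FILTERED) {
        /* A filtered child goes where the driver says it goes. */
        return filtered_child_is_backing ? FIELD_BACKING : FIELD_FILE;
    }
    if (role & ROLE_PRIMARY) {
        return FIELD_FILE;
    }
    /* Any other child is neither bs->file nor bs->backing. */
    return FIELD_NONE;
}
```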

OK. This change brings one more good thing: we can (and should) get rid
of all indirect pointers in the block-graph-change transactions:

bdrv_attach_child_common() stores BdrvChild** into transaction to clear
it on abort.

bdrv_attach_child_common() has two callers: bdrv_attach_child_noperm()
just passes this feature through, and bdrv_root_attach_child() doesn't
need the feature.

Look at bdrv_attach_child_noperm() callers:
  - bdrv_attach_child() doesn't need the feature
  - bdrv_set_file_or_backing_noperm() uses the feature to manage
bs->file and bs->backing, we don't want it anymore
  - bdrv_append() uses the feature to manage bs->backing, again we
don't want it anymore

So, we should drop this stuff! Great!

We could probably keep BdrvChild** argument to keep the int return
value, but it seems not worth the complexity.

Finally, we now set .file / .backing automatically in generic code and
want to restrict setting them by hand outside of .attach/.detach.
So, this patch cleans up all remaining places where they were set.
To find such places I use:

  git grep '\->file ='
  git grep '\->backing ='
  git grep '&.*\<backing\>'
  git grep '&.*\<file\>'

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c  | 234 ++-
 block/raw-format.c   |   4 +-
 block/snapshot-access.c  |   6 +-
 block/snapshot.c |   1 -
 include/block/block_int-common.h |  15 +-
 tests/unit/test-bdrv-drain.c |  10 +-
 6 files changed, 126 insertions(+), 144 deletions(-)

diff --git a/block.c b/block.c
index 121affb045..3343dc5649 100644
--- a/block.c
+++ b/block.c
@@ -1440,9 +1440,39 @@ static void bdrv_child_cb_attach(BdrvChild *child)
 
 assert_bdrv_graph_writable(bs);
 QLIST_INSERT_HEAD(&bs->children, child, next);
-
-if (child->role & BDRV_CHILD_COW) {
+if (bs->drv->is_filter || (child->role & BDRV_CHILD_FILTERED)) {
+/*
+ * Here we handle filters and block/raw-format.c when it behave like
+ * filter. They generally have a single PRIMARY child, which is also 
the
+ * FILTERED child, and that they may have multiple more children, which
+ * are neither PRIMARY nor FILTERED. And never we have a COW child 
here.
+ * So bs->file will be the PRIMARY child, unless the PRIMARY child goes
+ * into bs->backing on exceptional cases; and bs->backing will be
+ * nothing else.
+ */
+assert(!(child->role & BDRV_CHILD_COW));
+if (child->role & BDRV_CHILD_PRIMARY) {
+assert(child->role & BDRV_CHILD_FILTERED);
+assert(!bs->backing);
+assert(!bs->file);
+
+if (bs->drv->filtered_child_is_backing) {
+bs->backing = child;
+} else {
+bs->file = child;
+}
+} else {
+assert(!(child->role & BDRV_CHILD_FILTERED));
+}
+} else if (child->role & BDRV_CHILD_COW) {
+assert(bs->drv->supports_backing);
+assert(!(child->role & BDRV_CHILD_PRIMARY));
+assert(!bs->backing);
+bs->backing = child;
 bdrv_backing_attach(child);
+} else if (child->role & BDRV_CHILD_PRIMARY) {
+assert(!bs->file);
+bs->file = child;
 }
 
 bdrv_apply_subtree_drain(child, bs);
@@ -1460,6 +1490,12 @@ static void bdrv_child_cb_detach(BdrvChild *child)
 
 assert_bdrv_graph_writable(bs);
 QLIST_REMOVE(child, next);
+if (child == bs->backing) {
+assert(child != bs->file);
+bs->backing = NULL;
+} else if (child == bs->file) {
+bs->file = NULL;
+}
 }
 
 static int bdrv_child_cb_update_filename(BdrvChild *c, BlockDriverState *base,
@@ -1665,7 +1701,7 @@ open_failed:
 bs->drv = NULL;
 if (bs->file != NULL) {
 bdrv_unref_child(bs, bs->file);
-bs->file = NULL;
+assert(!bs->file);
 }
 g_free(bs->opaque);

[PATCH v7 10/15] Revert "block: Let replace_child_tran keep indirect pointer"

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

That's a preparation to previously reverted
"block: Let replace_child_noperm free children". Drop it too, we don't
need it for a new approach.

This reverts commit 82b54cf51656bf3cd5ed1ac549e8a1085a0e3290.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c | 81 +++--
 1 file changed, 10 insertions(+), 71 deletions(-)

diff --git a/block.c b/block.c
index 87e2f23d13..1b9e0a2d79 100644
--- a/block.c
+++ b/block.c
@@ -2336,7 +2336,6 @@ static int bdrv_drv_set_perm(BlockDriverState *bs, 
uint64_t perm,
 
 typedef struct BdrvReplaceChildState {
 BdrvChild *child;
-BdrvChild **childp;
 BlockDriverState *old_bs;
 } BdrvReplaceChildState;
 
@@ -2354,29 +2353,7 @@ static void bdrv_replace_child_abort(void *opaque)
 BlockDriverState *new_bs = s->child->bs;
 
 GLOBAL_STATE_CODE();
-/*
- * old_bs reference is transparently moved from @s to s->child.
- *
- * Pass &s->child here instead of s->childp, because:
- * (1) s->old_bs must be non-NULL, so bdrv_replace_child_noperm() will not
- * modify the BdrvChild * pointer we indirectly pass to it, i.e. it
- * will not modify s->child.  From that perspective, it does not matter
- * whether we pass s->childp or &s->child.
- * (TODO: Right now, bdrv_replace_child_noperm() never modifies that
- * pointer anyway (though it will in the future), so at this point it
- * absolutely does not matter whether we pass s->childp or &s->child.)
- * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
- * it here.
- * (3) If new_bs is NULL, *s->childp will have been NULLed by
- * bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
- * must not pass a NULL *s->childp here.
- * (TODO: In its current state, bdrv_replace_child_noperm() will not
- * have NULLed *s->childp, so this does not apply yet.  It will in the
- * future.)
- *
- * So whether new_bs was NULL or not, we cannot pass s->childp here; and in
- * any case, there is no reason to pass it anyway.
- */
+/* old_bs reference is transparently moved from @s to @s->child */
 bdrv_replace_child_noperm(&s->child, s->old_bs);
 bdrv_unref(new_bs);
 }
@@ -2393,32 +2370,22 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  * Note: real unref of old_bs is done only on commit.
  *
  * The function doesn't update permissions, caller is responsible for this.
- *
- * Note that if new_bs == NULL, @childp is stored in a state object attached
- * to @tran, so that the old child can be reinstated in the abort handler.
- * Therefore, if @new_bs can be NULL, @childp must stay valid until the
- * transaction is committed or aborted.
- *
- * (TODO: The reinstating does not happen yet, but it will once
- * bdrv_replace_child_noperm() NULLs *childp when new_bs is NULL.)
  */
-static void bdrv_replace_child_tran(BdrvChild **childp,
-BlockDriverState *new_bs,
+static void bdrv_replace_child_tran(BdrvChild *child, BlockDriverState *new_bs,
 Transaction *tran)
 {
 BdrvReplaceChildState *s = g_new(BdrvReplaceChildState, 1);
 *s = (BdrvReplaceChildState) {
-.child = *childp,
-.childp = new_bs == NULL ? childp : NULL,
-.old_bs = (*childp)->bs,
+.child = child,
+.old_bs = child->bs,
 };
 tran_add(tran, &bdrv_replace_child_drv, s);
 
 if (new_bs) {
 bdrv_ref(new_bs);
 }
-bdrv_replace_child_noperm(childp, new_bs);
-/* old_bs reference is transparently moved from *childp to @s */
+bdrv_replace_child_noperm(&child, new_bs);
+/* old_bs reference is transparently moved from @child to @s */
 }
 
 /*
@@ -5043,7 +5010,6 @@ static bool should_update_child(BdrvChild *c, 
BlockDriverState *to)
 
 typedef struct BdrvRemoveFilterOrCowChild {
 BdrvChild *child;
-BlockDriverState *bs;
 bool is_backing;
 } BdrvRemoveFilterOrCowChild;
 
@@ -5073,19 +5039,10 @@ static void bdrv_remove_filter_or_cow_child_commit(void 
*opaque)
 bdrv_child_free(s->child);
 }
 
-static void bdrv_remove_filter_or_cow_child_clean(void *opaque)
-{
-BdrvRemoveFilterOrCowChild *s = opaque;
-
-/* Drop the bs reference after the transaction is done */
-bdrv_unref(s->bs);
-g_free(s);
-}
-
 static TransactionActionDrv bdrv_remove_filter_or_cow_child_drv = {
 .abort = bdrv_remove_filter_or_cow_child_abort,
 .commit = bdrv_remove_filter_or_cow_child_commit,
-.clean = bdrv_remove_filter_or_cow_child_clean,
+.clean = g_free,
 };
 
 /*
@@ -5103,11 +5060,6 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
 return;
 }
 
-/*
- * Keep a reference to @bs so @childp will stay valid throughout the
- * transaction (required by bdrv_replace_child_tran())
- */
-bdrv_ref(bs);
 if (child ==

[PATCH v7 08/15] block/snapshot: stress that we fallback to primary child

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Actually what we chose is a primary child. Let's stress it in the code.

We are going to drop indirect pointer logic here in future. Actually
this commit simplifies the future work: we drop use of indirection in
the assertion now.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block/snapshot.c | 30 ++
 1 file changed, 10 insertions(+), 20 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index d6f53c3065..75e8d3a937 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -161,21 +161,14 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState 
*bs,
 static BdrvChild **bdrv_snapshot_fallback_ptr(BlockDriverState *bs)
 {
 BdrvChild **fallback;
-BdrvChild *child;
+BdrvChild *child = bdrv_primary_child(bs);
 
-/*
- * The only BdrvChild pointers that are safe to modify (and which
- * we can thus return a reference to) are bs->file and
- * bs->backing.
- */
-fallback = &bs->file;
-if (!*fallback && bs->drv && bs->drv->is_filter) {
-fallback = &bs->backing;
-}
-
-if (!*fallback) {
+/* We allow fallback only to primary child */
+if (!child) {
 return NULL;
 }
+fallback = (child == bs->file ? &bs->file : &bs->backing);
+assert(*fallback == child);
 
 /*
  * Check that there are no other children that would need to be
@@ -309,15 +302,12 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 }
 
 /*
- * fallback_ptr is &bs->file or &bs->backing.  *fallback_ptr
- * was closed above and set to NULL, but the .bdrv_open() call
- * has opened it again, because we set the respective option
- * (with the qdict_put_str() call above).
- * Assert that .bdrv_open() has attached some child on
- * *fallback_ptr, and that it has attached the one we wanted
- * it to (i.e., fallback_bs).
+ * fallback was a primary child. It was closed above and set to NULL,
+ * but the .bdrv_open() call has opened it again, because we set the
+ * respective option (with the qdict_put_str() call above).
+ * Assert that .bdrv_open() has attached the right BDS as primary 
child.
  */
-assert(*fallback_ptr && fallback_bs == (*fallback_ptr)->bs);
+assert(bdrv_primary_bs(bs) == fallback_bs);
 bdrv_unref(fallback_bs);
 return ret;
 }
-- 
2.25.1

[PATCH v7 04/15] test-bdrv-graph-mod: update test_parallel_perm_update test case

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

test_parallel_perm_update() does three things that we are going to
restrict in the near future:

1. It updates bs->file field by hand. bs->file will be managed
   automatically by generic code (together with bs->children list).

   Let's better refactor our "tricky" bds to have own state where one
   of children is linked as "selected".
   This also looks less "tricky", so avoid using this word.

2. It creates FILTERED children that are not PRIMARY. Except for tests,
   all FILTERED children in the QEMU block layer are always PRIMARY as
   well.  We are going to formalize this rule, so let's better use DATA
   children here.

3. It creates more than one FILTERED child, which is already disallowed
   by BDRV_CHILD_FILTERED's description.

While here, update the picture to better correspond to the test
code.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 tests/unit/test-bdrv-graph-mod.c | 80 +++-
 1 file changed, 49 insertions(+), 31 deletions(-)

diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index a6e3bb79be..e2f1355af1 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -241,13 +241,26 @@ static void test_parallel_exclusive_write(void)
 bdrv_unref(top);
 }
 
-static void write_to_file_perms(BlockDriverState *bs, BdrvChild *c,
- BdrvChildRole role,
- BlockReopenQueue *reopen_queue,
- uint64_t perm, uint64_t shared,
- uint64_t *nperm, uint64_t *nshared)
+/*
+ * write-to-selected node may have several DATA children, one of them may be
+ * "selected". Exclusive write permission is taken on selected child.
+ *
+ * We don't realize write handler itself, as we need only to test how 
permission
+ * update works.
+ */
+typedef struct BDRVWriteToSelectedState {
+BdrvChild *selected;
+} BDRVWriteToSelectedState;
+
+static void write_to_selected_perms(BlockDriverState *bs, BdrvChild *c,
+BdrvChildRole role,
+BlockReopenQueue *reopen_queue,
+uint64_t perm, uint64_t shared,
+uint64_t *nperm, uint64_t *nshared)
 {
-if (bs->file && c == bs->file) {
+BDRVWriteToSelectedState *s = bs->opaque;
+
+if (s->selected && c == s->selected) {
 *nperm = BLK_PERM_WRITE;
 *nshared = BLK_PERM_ALL & ~BLK_PERM_WRITE;
 } else {
@@ -256,9 +269,10 @@ static void write_to_file_perms(BlockDriverState *bs, 
BdrvChild *c,
 }
 }
 
-static BlockDriver bdrv_write_to_file = {
-.format_name = "tricky-perm",
-.bdrv_child_perm = write_to_file_perms,
+static BlockDriver bdrv_write_to_selected = {
+.format_name = "write-to-selected",
+.instance_size = sizeof(BDRVWriteToSelectedState),
+.bdrv_child_perm = write_to_selected_perms,
 };
 
 
@@ -266,15 +280,18 @@ static BlockDriver bdrv_write_to_file = {
  * The following test shows that topological-sort order is required for
  * permission update, simple DFS is not enough.
  *
- * Consider the block driver which has two filter children: one active
- * with exclusive write access and one inactive with no specific
- * permissions.
+ * Consider the block driver (write-to-selected) which has two children: one is
+ * selected so we have exclusive write access to it and for the other one we
+ * don't need any specific permissions.
  *
  * And, these two children has a common base child, like this:
+ *   (additional "top" on top is used in test just because the only public
+ *function to update permission should get a specific child to update.
+ *Making bdrv_refresh_perms() public just for this test isn't worth it)
  *
- * ┌─────┐     ┌──────┐
- * │ fl2 │ ◀── │ top  │
- * └─────┘     └──────┘
+ * ┌─────┐     ┌───────────────────┐     ┌─────┐
+ * │ fl2 │ ◀── │ write-to-selected │ ◀── │ top │
+ * └─────┘     └───────────────────┘     └─────┘
  *   │   │
  *   │   │ w
  *   │   ▼
@@ -290,14 +307,14 @@ static BlockDriver bdrv_write_to_file = {
  *
  * So, exclusive write is propagated.
  *
- * Assume, we want to make fl2 active instead of fl1.
- * So, we set some option for top driver and do permission update.
+ * Assume, we want to select fl2 instead of fl1.
+ * So, we set some option for write-to-selected driver and do permission 
update.
  *
  * With simple DFS, if permission update goes first through
- * top->fl1->base branch it will succeed: it firstly drop exclusive write
- * permissions and than apply them for another BdrvChildren.
- * But if permission update goes first through top->fl2->base branch it
- * will fail, as when we try to update fl2->base child, old not yet
+ * write-to-selected -> fl1 -> base branch it will succeed: it firstly drop
+ * exclusive write permissions and than apply them for

Re: [PATCH] s390x/cpumodel: add stfl197 processor-activity-instrumentation extension 1

2022-07-26 Thread David Hildenbrand

On 26.07.22 21:48, Christian Borntraeger wrote:
> Add stfle 197 (processor-activity-instrumentation extension 1) to the
> gen16 default model and fence it off for 7.0 and older.

QEMU is already in soft-freeze. I assume you want to get this still into
7.1. (decision not in my hands :) )

Anyhow, whether or not a re-target to the next release is required:

Reviewed-by: David Hildenbrand 

> 
> Signed-off-by: Christian Borntraeger 
> ---
>  hw/s390x/s390-virtio-ccw.c  | 1 +
>  target/s390x/cpu_features_def.h.inc | 1 +
>  target/s390x/gen-features.c | 2 ++
>  3 files changed, 4 insertions(+)
> 
> diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
> index cc3097bfee80..6268aa5d0888 100644
> --- a/hw/s390x/s390-virtio-ccw.c
> +++ b/hw/s390x/s390-virtio-ccw.c
> @@ -806,6 +806,7 @@ static void ccw_machine_7_0_instance_options(MachineState 
> *machine)
>  static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_0 };
>  
>  ccw_machine_7_1_instance_options(machine);
> +s390_cpudef_featoff_greater(16, 1, S390_FEAT_PAIE);
>  s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
>  }
>  
> diff --git a/target/s390x/cpu_features_def.h.inc 
> b/target/s390x/cpu_features_def.h.inc
> index 3603e5fb12c6..e3cfe637354b 100644
> --- a/target/s390x/cpu_features_def.h.inc
> +++ b/target/s390x/cpu_features_def.h.inc
> @@ -114,6 +114,7 @@ DEF_FEAT(VECTOR_PACKED_DECIMAL_ENH2, "vxpdeh2", STFL, 
> 192, "Vector-Packed-Decima
>  DEF_FEAT(BEAR_ENH, "beareh", STFL, 193, "BEAR-enhancement facility")
>  DEF_FEAT(RDP, "rdp", STFL, 194, "Reset-DAT-protection facility")
>  DEF_FEAT(PAI, "pai", STFL, 196, "Processor-Activity-Instrumentation 
> facility")
> +DEF_FEAT(PAIE, "paie", STFL, 197, "Processor-Activity-Instrumentation 
> extension-1")
>  
>  /* Features exposed via SCLP SCCB Byte 80 - 98  (bit numbers relative to 
> byte-80) */
>  DEF_FEAT(SIE_GSLS, "gsls", SCLP_CONF_CHAR, 40, "SIE: 
> Guest-storage-limit-suppression facility")
> diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
> index ad140184b903..1558c5262616 100644
> --- a/target/s390x/gen-features.c
> +++ b/target/s390x/gen-features.c
> @@ -575,6 +575,7 @@ static uint16_t full_GEN16_GA1[] = {
>  S390_FEAT_BEAR_ENH,
>  S390_FEAT_RDP,
>  S390_FEAT_PAI,
> +S390_FEAT_PAIE,
>  };
>  
>  
> @@ -669,6 +670,7 @@ static uint16_t default_GEN16_GA1[] = {
>  S390_FEAT_BEAR_ENH,
>  S390_FEAT_RDP,
>  S390_FEAT_PAI,
> +S390_FEAT_PAIE,
>  };
>  
>  /* QEMU (CPU model) features */


-- 
Thanks,

David / dhildenb

Re: [PATCH v3 for 7.2 00/21] virtio-gpio and various virtio cleanups

2022-07-26 Thread Michael S. Tsirkin

On Tue, Jul 26, 2022 at 08:21:29PM +0100, Alex Bennée wrote:
> Hi,
> 
> After much slogging through the vhost-user code I've gotten the
> virtio-gpio device working again. The core change in pushing the
> responsibility for VHOST_USER_F_PROTOCOL_FEATURES down to the
> vhost-user layer (which knows it needs it). We still need to account
> for that in virtio-gpio because the result of the negotiating protocol
> features is the vrings start disabled so the stub needs to explicitly
> enable them. I did consider pushing this behaviour explicitly into
> vhost_dev_start but that would have required un-picking it from
> vhost-net (which is the only other device which uses protocol features
> AFAICT - but is a measure more complex in it's setup).
> 
> As last time there are a whole series of clean-ups and doc tweaks. I
> don't know if any are trivial enough to sneak into later RCs but it
> shouldn't be a problem to wait until the tree re-opens.

Right. Still, I think some are fixes we should merge now.
I am thinking of patches 5, 7, 8 and 9, plus 6 if it makes backporting
much easier. WDYT? If you agree, please separate the bugfixes into a
series I can apply. Thanks!

> There is a remaining issue that a --enable-sanitizers build fails for
> qos-test due to leaks. It shows up as a leak from:
> 
>   Direct leak of 240 byte(s) in 1 object(s) allocated from:
>   #0 0x7fc5a3f2a037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
>   #1 0x7fc5a2e5cda0 in g_malloc0 ../../../glib/gmem.c:136
>   #2 0x55ce773cc728 in virtio_device_realize ../../hw/virtio/virtio.c:3691
>   #3 0x55ce7784ed7e in device_set_realized ../../hw/core/qdev.c:553
>   #4 0x55ce77862d0c in property_set_bool ../../qom/object.c:2273
> 
> I'm not entirely sure what the allocation is because it gets inlined
> in the virtio_device_realize call. Perhaps it's the QOM object itself
> which is never gracefully torn down at the end of the test?
> 
> However when I attempted to bisect I found master was broken as well.
> For example in my arm/aarch64-softmmu build we see 5 failures:
> 
> Summary of Failures:
> 
>    3/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test          ERROR  96.15s   killed by signal 6 SIGABRT
>    9/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/qos-test                ERROR  32.50s   killed by signal 6 SIGABRT
>   11/48 qemu:qtest+qtest-arm / qtest-arm/qos-test                        ERROR  26.93s   killed by signal 6 SIGABRT
>   20/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/device-introspect-test  ERROR   5.17s   killed by signal 6 SIGABRT
>   45/48 qemu:qtest+qtest-arm / qtest-arm/device-introspect-test          ERROR   4.97s   killed by signal 6 SIGABRT
> 
> Of which the qos-tests are the only new ones. I suspect something must
> be preventing the other stuff being exercised in our CI system.
> 
> Alex Bennée (19):
>   include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE
>   include/hw: document vhost_dev feature life-cycle
>   hw/virtio: fix some coding style issues
>   hw/virtio: log potentially buggy guest drivers
>   block/vhost-user-blk-server: don't expose
> VHOST_USER_F_PROTOCOL_FEATURES
>   hw/virtio: incorporate backend features in features
>   hw/virtio: gracefully handle unset vhost_dev vdev
>   hw/virtio: handle un-configured shutdown in virtio-pci
>   hw/virtio: fix vhost_user_read tracepoint
>   hw/virtio: add some vhost-user trace events
>   tests/qtest: pass stdout/stderr down to subtests
>   tests/qtest: add a timeout for subprocess_run_one_test
>   tests/qtest: use qos_printf instead of g_test_message
>   tests/qtest: catch unhandled vhost-user messages
>   tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES
>   tests/qtest: add assert to catch bad features
>   tests/qtest: implement stub for VHOST_USER_GET_CONFIG
>   tests/qtest: add a get_features op to vhost-user-test
>   tests/qtest: enable tests for virtio-gpio
> 
> Viresh Kumar (2):
>   hw/virtio: add boilerplate for vhost-user-gpio device
>   hw/virtio: add vhost-user-gpio-pci boilerplate
> 
>  include/hw/virtio/vhost-user-gpio.h  |  35 +++
>  include/hw/virtio/vhost.h|   3 +
>  include/hw/virtio/virtio.h   |   7 +-
>  tests/qtest/libqos/virtio-gpio.h |  35 +++
>  block/export/vhost-user-blk-server.c |   3 +-
>  hw/virtio/vhost-user-gpio-pci.c  |  69 +
>

[PULL 11/16] i386/pc: handle unitialized mr in pc_get_cxl_range_end()

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Remove the pc_get_cxl_range_end() dependency on the CXL memory region,
and replace it with a calculation that does not require the CXL host_mr
to determine the start of the CXL range.

This is in preparation for allowing pc_pci_hole64_start() to be called
early in pc_memory_init(): handle the CXL memory region end even when its
underlying memory region isn't yet initialized.
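
As a rough illustration of the arithmetic in the diff below, here is a
minimal standalone sketch. It does not use QEMU types: cxl_range_end(),
round_up() and the window_sizes array are hypothetical stand-ins for
pc_get_cxl_range_end(), ROUND_UP() and the fixed_windows list.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MiB (1024ULL * 1024ULL)

/* Round n up to the next multiple of d (d must be non-zero). */
static uint64_t round_up(uint64_t n, uint64_t d)
{
    return ((n + d - 1) / d) * d;
}

/*
 * Hypothetical model of the new calculation: the end is derived from
 * the range start and the configured window sizes only, so no memory
 * region has to be initialized beforehand.
 */
static uint64_t cxl_range_end(uint64_t cxl_range_start,
                              const uint64_t *window_sizes,
                              size_t n_windows)
{
    uint64_t end = cxl_range_start + MiB;

    if (n_windows > 0) {
        /* Fixed windows start at a 256 MiB aligned address. */
        end = round_up(end, 256 * MiB);
        for (size_t i = 0; i < n_windows; i++) {
            end += window_sizes[i];
        }
    }
    return end;
}
```

With no fixed windows the result is simply the range start plus 1 MiB;
with windows, the sizes are accumulated on top of the aligned address.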

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Message-Id: <20220719170014.27028-8-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
Acked-by: Igor Mammedov 
---
 hw/i386/pc.c | 18 --
 1 file changed, 8 insertions(+), 10 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 9e1a067c41..611eb197da 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -843,17 +843,15 @@ static uint64_t pc_get_cxl_range_start(PCMachineState 
*pcms)
 
 static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 {
-uint64_t start = 0;
+uint64_t start = pc_get_cxl_range_start(pcms) + MiB;
 
-if (pcms->cxl_devices_state.host_mr.addr) {
-start = pcms->cxl_devices_state.host_mr.addr +
-memory_region_size(&pcms->cxl_devices_state.host_mr);
-if (pcms->cxl_devices_state.fixed_windows) {
-GList *it;
-for (it = pcms->cxl_devices_state.fixed_windows; it; it = 
it->next) {
-CXLFixedWindow *fw = it->data;
-start = fw->mr.addr + memory_region_size(&fw->mr);
-}
+if (pcms->cxl_devices_state.fixed_windows) {
+GList *it;
+
+start = ROUND_UP(start, 256 * MiB);
+for (it = pcms->cxl_devices_state.fixed_windows; it; it = it->next) {
+CXLFixedWindow *fw = it->data;
+start += fw->size;
 }
 }
 
-- 
MST

[PULL 08/16] i386/pc: factor out above-4g end to an helper

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

A couple of places duplicate the calculation of where RAM above the
4G boundary ends. Move them all to a helper function.
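
The shape of the helper can be sketched outside QEMU as follows. This is
only an illustration under assumptions: ToyMachine and its fields are
hypothetical, and sgx_epc_end stands in for whatever
sgx_epc_above_4g_end() returns in the real code.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, flattened stand-in for the machine state. */
typedef struct {
    uint64_t above_4g_mem_start;
    uint64_t above_4g_mem_size;
    uint64_t sgx_epc_size;
    uint64_t sgx_epc_end;   /* assumed end address of the EPC region */
} ToyMachine;

/* One place that decides where the memory above 4G ends. */
static uint64_t toy_above_4g_end(const ToyMachine *m)
{
    if (m->sgx_epc_size != 0) {
        return m->sgx_epc_end;
    }
    return m->above_4g_mem_start + m->above_4g_mem_size;
}
```

Callers that previously open-coded the if/else chain can then call the
helper and apply their own rounding on top of its result.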

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
Message-Id: <20220719170014.27028-5-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 29 ++---
 1 file changed, 14 insertions(+), 15 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index f4d5b25fdd..d1e20ccb27 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -814,6 +814,17 @@ void xen_load_linux(PCMachineState *pcms)
 #define PC_ROM_ALIGN   0x800
 #define PC_ROM_SIZE(PC_ROM_MAX - PC_ROM_MIN_VGA)
 
+static hwaddr pc_above_4g_end(PCMachineState *pcms)
+{
+X86MachineState *x86ms = X86_MACHINE(pcms);
+
+if (pcms->sgx_epc.size != 0) {
+return sgx_epc_above_4g_end(&pcms->sgx_epc);
+}
+
+return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -891,15 +902,8 @@ void pc_memory_init(PCMachineState *pcms,
 exit(EXIT_FAILURE);
 }
 
-if (pcms->sgx_epc.size != 0) {
-machine->device_memory->base = 
sgx_epc_above_4g_end(&pcms->sgx_epc);
-} else {
-machine->device_memory->base =
-x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
-}
-
 machine->device_memory->base =
-ROUND_UP(machine->device_memory->base, 1 * GiB);
+ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
 
 if (pcmc->enforce_aligned_dimm) {
 /* size device region assuming 1G page max alignment per slot */
@@ -926,10 +930,8 @@ void pc_memory_init(PCMachineState *pcms,
 if (pcmc->has_reserved_memory && machine->device_memory->base) {
 cxl_base = machine->device_memory->base
 + memory_region_size(&machine->device_memory->mr);
-} else if (pcms->sgx_epc.size != 0) {
-cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-cxl_base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+cxl_base = pc_above_4g_end(pcms);
 }
 
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
@@ -1016,7 +1018,6 @@ uint64_t pc_pci_hole64_start(void)
 PCMachineState *pcms = PC_MACHINE(qdev_get_machine());
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *ms = MACHINE(pcms);
-X86MachineState *x86ms = X86_MACHINE(pcms);
 uint64_t hole64_start = 0;
 
 if (pcms->cxl_devices_state.host_mr.addr) {
@@ -1034,10 +1035,8 @@ uint64_t pc_pci_hole64_start(void)
 if (!pcmc->broken_reserved_end) {
 hole64_start += memory_region_size(&ms->device_memory->mr);
 }
-} else if (pcms->sgx_epc.size != 0) {
-hole64_start = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-hole64_start = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
+hole64_start = pc_above_4g_end(pcms);
 }
 
 return ROUND_UP(hole64_start, 1 * GiB);
-- 
MST

[PULL 09/16] i386/pc: factor out cxl range end to helper

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Move the calculation of the CXL memory region end to a separate helper.

This is in preparation for a future change that removes the CXL range
dependency on the CXL memory region, with the goal of allowing
pc_pci_hole64_start() to be called before any memory regions are
initialized.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-6-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 31 +--
 1 file changed, 21 insertions(+), 10 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d1e20ccb27..cb27309e76 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,6 +825,25 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
+static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
+{
+uint64_t start = 0;
+
+if (pcms->cxl_devices_state.host_mr.addr) {
+start = pcms->cxl_devices_state.host_mr.addr +
+memory_region_size(&pcms->cxl_devices_state.host_mr);
+if (pcms->cxl_devices_state.fixed_windows) {
+GList *it;
+for (it = pcms->cxl_devices_state.fixed_windows; it; it = 
it->next) {
+CXLFixedWindow *fw = it->data;
+start = fw->mr.addr + memory_region_size(&fw->mr);
+}
+}
+}
+
+return start;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -1020,16 +1039,8 @@ uint64_t pc_pci_hole64_start(void)
 MachineState *ms = MACHINE(pcms);
 uint64_t hole64_start = 0;
 
-if (pcms->cxl_devices_state.host_mr.addr) {
-hole64_start = pcms->cxl_devices_state.host_mr.addr +
-memory_region_size(&pcms->cxl_devices_state.host_mr);
-if (pcms->cxl_devices_state.fixed_windows) {
-GList *it;
-for (it = pcms->cxl_devices_state.fixed_windows; it; it = 
it->next) {
-CXLFixedWindow *fw = it->data;
-hole64_start = fw->mr.addr + memory_region_size(&fw->mr);
-}
-}
+if (pcms->cxl_devices_state.is_enabled) {
+hole64_start = pc_get_cxl_range_end(pcms);
 } else if (pcmc->has_reserved_memory && ms->device_memory->base) {
 hole64_start = ms->device_memory->base;
 if (!pcmc->broken_reserved_end) {
-- 
MST

[PATCH v7 11/15] Revert "block: Restructure remove_file_or_backing_child()"

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

That commit was a preparation for the previously reverted
"block: Let replace_child_noperm free children". Drop it too; we don't
need it for the new approach.

This reverts commit 562bda8bb41879eeda0bd484dd3d55134579b28e.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 1b9e0a2d79..5eefadf768 100644
--- a/block.c
+++ b/block.c
@@ -5053,33 +5053,30 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
   Transaction *tran)
 {
-BdrvChild **childp;
 BdrvRemoveFilterOrCowChild *s;
 
+assert(child == bs->backing || child == bs->file);
+
 if (!child) {
 return;
 }
 
-if (child == bs->backing) {
-childp = &bs->backing;
-} else if (child == bs->file) {
-childp = &bs->file;
-} else {
-g_assert_not_reached();
-}
-
 if (child->bs) {
-bdrv_replace_child_tran(*childp, NULL, tran);
+bdrv_replace_child_tran(child, NULL, tran);
 }
 
 s = g_new(BdrvRemoveFilterOrCowChild, 1);
 *s = (BdrvRemoveFilterOrCowChild) {
 .child = child,
-.is_backing = (childp == &bs->backing),
+.is_backing = (child == bs->backing),
 };
 tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, s);
 
-*childp = NULL;
+if (s->is_backing) {
+bs->backing = NULL;
+} else {
+bs->file = NULL;
+}
 }
 
 /*
-- 
2.25.1

[PATCH v7 15/15] block: refactor bdrv_remove_file_or_backing_child to bdrv_remove_child

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Now the function can remove any child, so give it a more generic name.
Drop the assertions and the bs argument, which becomes unused. The
function will be reused in a later commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c | 27 +--
 1 file changed, 9 insertions(+), 18 deletions(-)

diff --git a/block.c b/block.c
index 3343dc5649..55cff270f8 100644
--- a/block.c
+++ b/block.c
@@ -92,9 +92,7 @@ static bool bdrv_recurse_has_child(BlockDriverState *bs,
 
 static void bdrv_replace_child_noperm(BdrvChild *child,
   BlockDriverState *new_bs);
-static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
-  BdrvChild *child,
-  Transaction *tran);
+static void bdrv_remove_child(BdrvChild *child, Transaction *tran);
 static void bdrv_remove_filter_or_cow_child(BlockDriverState *bs,
 Transaction *tran);
 
@@ -3337,7 +3335,7 @@ static int 
bdrv_set_file_or_backing_noperm(BlockDriverState *parent_bs,
 
 if (child) {
 bdrv_unset_inherits_from(parent_bs, child, tran);
-bdrv_remove_file_or_backing_child(parent_bs, child, tran);
+bdrv_remove_child(child, tran);
 }
 
 if (!child_bs) {
@@ -5021,26 +5019,19 @@ static bool should_update_child(BdrvChild *c, 
BlockDriverState *to)
 return ret;
 }
 
-static void bdrv_remove_filter_or_cow_child_commit(void *opaque)
+static void bdrv_remove_child_commit(void *opaque)
 {
 GLOBAL_STATE_CODE();
 bdrv_child_free(opaque);
 }
 
-static TransactionActionDrv bdrv_remove_filter_or_cow_child_drv = {
-.commit = bdrv_remove_filter_or_cow_child_commit,
+static TransactionActionDrv bdrv_remove_child_drv = {
+.commit = bdrv_remove_child_commit,
 };
 
-/*
- * A function to remove backing or file child of @bs.
- * Function doesn't update permissions, caller is responsible for this.
- */
-static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
-  BdrvChild *child,
-  Transaction *tran)
+/* Function doesn't update permissions, caller is responsible for this. */
+static void bdrv_remove_child(BdrvChild *child, Transaction *tran)
 {
-assert(child == bs->backing || child == bs->file);
-
 if (!child) {
 return;
 }
@@ -5049,7 +5040,7 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
 bdrv_replace_child_tran(child, NULL, tran);
 }
 
-tran_add(tran, &bdrv_remove_filter_or_cow_child_drv, child);
+tran_add(tran, &bdrv_remove_child_drv, child);
 }
 
 /*
@@ -5060,7 +5051,7 @@ static void 
bdrv_remove_file_or_backing_child(BlockDriverState *bs,
 static void bdrv_remove_filter_or_cow_child(BlockDriverState *bs,
 Transaction *tran)
 {
-bdrv_remove_file_or_backing_child(bs, bdrv_filter_or_cow_child(bs), tran);
+bdrv_remove_child(bdrv_filter_or_cow_child(bs), tran);
 }
 
 static int bdrv_replace_node_noperm(BlockDriverState *from,
-- 
2.25.1

[PULL 04/16] hw/cxl: Fix size of constant in interleave granularity function.

2022-07-26 Thread Michael S. Tsirkin

From: Jonathan Cameron 

Whilst the interleave granularity is always small enough that this isn't
a real problem (much less than 4GiB) let's change the constant
to ULL to fix the coverity warning.
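
A minimal standalone sketch of why the suffix matters, assuming a 64-bit
return type like QEMU's hwaddr (toy_hwaddr and toy_decode_ig() are
hypothetical names): with a plain `1`, the shift is evaluated in int and
only widened afterwards, so a result that does not fit in an int would
overflow before the conversion. With `1ULL` the shift itself is done in
64 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for a 64-bit address type. */
typedef uint64_t toy_hwaddr;

static inline toy_hwaddr toy_decode_ig(int ig)
{
    /* The ULL suffix makes the whole shift a 64-bit operation. */
    return 1ULL << (ig + 8);
}
```

For the small granularity encodings the commit message mentions, both
forms give the same value; the sketch only shows that the 64-bit form
stays well defined for larger shift counts too.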

Reported-by: Peter Maydell 
Fixes: 829de299d1 ("hw/cxl/component: Add utils for interleave parameter 
encoding/decoding")
Fixes: Coverity CID 1488868
Signed-off-by: Jonathan Cameron 
Message-Id: <20220701132300.2264-4-jonathan.came...@huawei.com>
Acked-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/cxl/cxl_component.h | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/include/hw/cxl/cxl_component.h b/include/hw/cxl/cxl_component.h
index 70b5018156..94ec2f07d7 100644
--- a/include/hw/cxl/cxl_component.h
+++ b/include/hw/cxl/cxl_component.h
@@ -215,7 +215,7 @@ uint8_t cxl_interleave_granularity_enc(uint64_t gran, Error 
**errp);
 
 static inline hwaddr cxl_decode_ig(int ig)
 {
-return 1 << (ig + 8);
+return 1ULL << (ig + 8);
 }
 
 CXLComponentState *cxl_get_hb_cstate(PCIHostState *hb);
-- 
MST

[PATCH v7 14/15] block/snapshot: drop indirection around bdrv_snapshot_fallback_ptr

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Now that the indirection is not actually used, we can safely reduce it
to a simple pointer. For consistency, do a bit of refactoring to get rid
of the _ptr suffixes that have become meaningless.
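
The fallback rule that the function in the diff below implements can be
modelled without the block layer. The sketch is hypothetical throughout:
ToyChild, ToyNode and the ROLE_* flags only mimic BdrvChild,
BlockDriverState and the BDRV_CHILD_* roles, and the list is a plain
singly linked list rather than a QLIST.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical role flags, loosely modelled on BDRV_CHILD_*. */
enum {
    ROLE_DATA     = 1 << 0,
    ROLE_METADATA = 1 << 1,
    ROLE_FILTERED = 1 << 2,
    ROLE_COW      = 1 << 3,
};

typedef struct ToyChild {
    unsigned role;
    struct ToyChild *next;
} ToyChild;

typedef struct {
    ToyChild *primary;   /* may be NULL */
    ToyChild *children;  /* head of the child list */
} ToyNode;

/*
 * Return the primary child directly (no pointer-to-pointer), or NULL
 * if another child would also need to be snapshotted.
 */
static ToyChild *toy_snapshot_fallback_child(const ToyNode *node)
{
    ToyChild *fallback = node->primary;
    ToyChild *child;

    if (!fallback) {
        return NULL;
    }
    for (child = node->children; child; child = child->next) {
        if ((child->role & (ROLE_DATA | ROLE_METADATA | ROLE_FILTERED)) &&
            child != fallback) {
            return NULL;
        }
    }
    return fallback;
}
```

Returning the child itself is enough here because the caller only reads
its fields and no longer needs to overwrite the parent's pointer.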

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block/snapshot.c | 38 --
 1 file changed, 16 insertions(+), 22 deletions(-)

diff --git a/block/snapshot.c b/block/snapshot.c
index f3971ac2bd..e22ac3eac6 100644
--- a/block/snapshot.c
+++ b/block/snapshot.c
@@ -151,34 +151,29 @@ bool bdrv_snapshot_find_by_id_and_name(BlockDriverState 
*bs,
 }
 
 /**
- * Return a pointer to the child BDS pointer to which we can fall
+ * Return a pointer to child of given BDS to which we can fall
  * back if the given BDS does not support snapshots.
  * Return NULL if there is no BDS to (safely) fall back to.
- *
- * We need to return an indirect pointer because bdrv_snapshot_goto()
- * has to modify the BdrvChild pointer.
  */
-static BdrvChild **bdrv_snapshot_fallback_ptr(BlockDriverState *bs)
+static BdrvChild *bdrv_snapshot_fallback_child(BlockDriverState *bs)
 {
-BdrvChild **fallback;
-BdrvChild *child = bdrv_primary_child(bs);
+BdrvChild *fallback = bdrv_primary_child(bs);
+BdrvChild *child;
 
 /* We allow fallback only to primary child */
-if (!child) {
+if (!fallback) {
 return NULL;
 }
-fallback = (child == bs->file ? &bs->file : &bs->backing);
-assert(*fallback == child);
 
 /*
  * Check that there are no other children that would need to be
  * snapshotted.  If there are, it is not safe to fall back to
- * *fallback.
+ * fallback.
  */
 QLIST_FOREACH(child, &bs->children, next) {
 if (child->role & (BDRV_CHILD_DATA | BDRV_CHILD_METADATA |
BDRV_CHILD_FILTERED) &&
-child != *fallback)
+child != fallback)
 {
 return NULL;
 }
@@ -189,8 +184,7 @@ static BdrvChild 
**bdrv_snapshot_fallback_ptr(BlockDriverState *bs)
 
 static BlockDriverState *bdrv_snapshot_fallback(BlockDriverState *bs)
 {
-BdrvChild **child_ptr = bdrv_snapshot_fallback_ptr(bs);
-return child_ptr ? (*child_ptr)->bs : NULL;
+return child_bs(bdrv_snapshot_fallback_child(bs));
 }
 
 int bdrv_can_snapshot(BlockDriverState *bs)
@@ -237,7 +231,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
Error **errp)
 {
 BlockDriver *drv = bs->drv;
-BdrvChild **fallback_ptr;
+BdrvChild *fallback;
 int ret, open_ret;
 
 GLOBAL_STATE_CODE();
@@ -260,13 +254,13 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 return ret;
 }
 
-fallback_ptr = bdrv_snapshot_fallback_ptr(bs);
-if (fallback_ptr) {
+fallback = bdrv_snapshot_fallback_child(bs);
+if (fallback) {
 QDict *options;
 QDict *file_options;
 Error *local_err = NULL;
-BlockDriverState *fallback_bs = (*fallback_ptr)->bs;
-char *subqdict_prefix = g_strdup_printf("%s.", (*fallback_ptr)->name);
+BlockDriverState *fallback_bs = fallback->bs;
+char *subqdict_prefix = g_strdup_printf("%s.", fallback->name);
 
 options = qdict_clone_shallow(bs->options);
 
@@ -277,8 +271,8 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 qobject_unref(file_options);
 g_free(subqdict_prefix);
 
-/* Force .bdrv_open() below to re-attach fallback_bs on *fallback_ptr 
*/
-qdict_put_str(options, (*fallback_ptr)->name,
+/* Force .bdrv_open() below to re-attach fallback_bs on fallback */
+qdict_put_str(options, fallback->name,
   bdrv_get_node_name(fallback_bs));
 
 /* Now close bs, apply the snapshot on fallback_bs, and re-open bs */
@@ -287,7 +281,7 @@ int bdrv_snapshot_goto(BlockDriverState *bs,
 }
 
 /* .bdrv_open() will re-attach it */
-bdrv_unref_child(bs, *fallback_ptr);
+bdrv_unref_child(bs, fallback);
 
 ret = bdrv_snapshot_goto(fallback_bs, snapshot_id, errp);
 open_ret = drv->bdrv_open(bs, options, bs->open_flags, &local_err);
-- 
2.25.1

[PULL 06/16] i386/pc: create pci-host qdev prior to pc_memory_init()

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

At the start of pc_memory_init() we usually pass a range of
0..UINT64_MAX as pci_memory, when really it's 2G (i440fx) or
32G (q35). To get the real user value, we need the pci-host
property for the default pci_hole64_size. To get that,
create the qdev prior to memory init, so that better estimates
of the max used/phys addr can be made.

This is in preparation for checking that host-phys-bits are
sufficient, and for taking pci-hole64-size into account when
relocating ram-above-4g to 1T (on AMD platforms).

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
Message-Id: <20220719170014.27028-3-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/pci-host/i440fx.h | 3 ++-
 hw/i386/pc_piix.c| 7 +--
 hw/i386/pc_q35.c | 6 +++---
 hw/pci-host/i440fx.c | 5 ++---
 4 files changed, 12 insertions(+), 9 deletions(-)

diff --git a/include/hw/pci-host/i440fx.h b/include/hw/pci-host/i440fx.h
index 52518dbf08..d02bf1ed6b 100644
--- a/include/hw/pci-host/i440fx.h
+++ b/include/hw/pci-host/i440fx.h
@@ -35,7 +35,8 @@ struct PCII440FXState {
 
 #define TYPE_IGD_PASSTHROUGH_I440FX_PCI_DEVICE "igd-passthrough-i440FX"
 
-PCIBus *i440fx_init(const char *host_type, const char *pci_type,
+PCIBus *i440fx_init(const char *pci_type,
+DeviceState *dev,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index fbf9465318..b8b3ce3408 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+DeviceState *i440fx_host;
 
 /*
  * Calculate ram split, for memory below and above 4G.  It's a bit
@@ -164,9 +165,11 @@ static void pc_init1(MachineState *machine,
 pci_memory = g_new(MemoryRegion, 1);
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
+i440fx_host = qdev_new(host_type);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
+i440fx_host = NULL;
 }
 
 pc_guest_info_init(pcms);
@@ -200,8 +203,8 @@ static void pc_init1(MachineState *machine,
 const char *type = xen_enabled() ? TYPE_PIIX3_XEN_DEVICE
  : TYPE_PIIX3_DEVICE;
 
-pci_bus = i440fx_init(host_type,
-  pci_type,
+pci_bus = i440fx_init(pci_type,
+  i440fx_host,
   system_memory, system_io, machine->ram_size,
   x86ms->below_4g_mem_size,
   x86ms->above_4g_mem_size,
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 12cc76aaf8..f4d23b1469 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -203,12 +203,12 @@ static void pc_q35_init(MachineState *machine)
 pcms->smbios_entry_point_type);
 }
 
-/* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
-
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+/* allocate ram and load rom/bios */
+pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
+
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
  OBJECT(ram_memory), NULL);
diff --git a/hw/pci-host/i440fx.c b/hw/pci-host/i440fx.c
index 1c5ad5f918..d5426ef4a5 100644
--- a/hw/pci-host/i440fx.c
+++ b/hw/pci-host/i440fx.c
@@ -237,7 +237,8 @@ static void i440fx_realize(PCIDevice *dev, Error **errp)
 }
 }
 
-PCIBus *i440fx_init(const char *host_type, const char *pci_type,
+PCIBus *i440fx_init(const char *pci_type,
+DeviceState *dev,
 MemoryRegion *address_space_mem,
 MemoryRegion *address_space_io,
 ram_addr_t ram_size,
@@ -246,7 +247,6 @@ PCIBus *i440fx_init(const char *host_type, const char 
*pci_type,
 MemoryRegion *pci_address_space,
 MemoryRegion *ram_memory)
 {
-DeviceState *dev;
 PCIBus *b;
 PCIDevice *d;
 PCIHostState *s;
@@ -254,7 +254,6 @@ PCIBus *i440fx_init(const char *host_type, const char 
*pci_type,
 unsigned i;
 I440FXState *i440fx;
 
-dev = qdev_new(host_type);
 s = PCI_HOST_BRIDGE(dev);
 b = pci_root_bus_new(dev, NULL, pci_address_space,
  address_space_io, 0, TYPE_PCI_BUS);
-- 
MST

[PATCH v7 05/15] tests-bdrv-drain: bdrv_replace_test driver: declare supports_backing

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

We do add a COW child to the node. In the future we are going to forbid
adding a COW child to a node that doesn't support backing. So, fix it
here now.

Don't worry about setting bs->backing itself: in a later commit we'll
update the block layer to automatically set/unset this field in generic
code.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 tests/unit/test-bdrv-drain.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/unit/test-bdrv-drain.c b/tests/unit/test-bdrv-drain.c
index 36be84ae55..23d425a494 100644
--- a/tests/unit/test-bdrv-drain.c
+++ b/tests/unit/test-bdrv-drain.c
@@ -1948,6 +1948,7 @@ static void coroutine_fn 
bdrv_replace_test_co_drain_end(BlockDriverState *bs)
 static BlockDriver bdrv_replace_test = {
 .format_name= "replace_test",
 .instance_size  = sizeof(BDRVReplaceTestState),
+.supports_backing   = true,
 
 .bdrv_close = bdrv_replace_test_close,
 .bdrv_co_preadv = bdrv_replace_test_co_preadv,
-- 
2.25.1

[PATCH v7 12/15] Revert "block: Pass BdrvChild ** to replace_child_noperm"

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

That commit was a preparation for the previously reverted
"block: Let replace_child_noperm free children". Drop it too; we don't
need it for the new approach.

This reverts commit be64bbb0149748f3999c49b13976aafb8330ea86.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c | 23 +++
 1 file changed, 11 insertions(+), 12 deletions(-)

diff --git a/block.c b/block.c
index 5eefadf768..121affb045 100644
--- a/block.c
+++ b/block.c
@@ -90,7 +90,7 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
 
-static void bdrv_replace_child_noperm(BdrvChild **child,
+static void bdrv_replace_child_noperm(BdrvChild *child,
   BlockDriverState *new_bs);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
@@ -2354,7 +2354,7 @@ static void bdrv_replace_child_abort(void *opaque)
 
 GLOBAL_STATE_CODE();
 /* old_bs reference is transparently moved from @s to @s->child */
-bdrv_replace_child_noperm(&s->child, s->old_bs);
+bdrv_replace_child_noperm(s->child, s->old_bs);
 bdrv_unref(new_bs);
 }
 
@@ -2384,7 +2384,7 @@ static void bdrv_replace_child_tran(BdrvChild *child, 
BlockDriverState *new_bs,
 if (new_bs) {
 bdrv_ref(new_bs);
 }
-bdrv_replace_child_noperm(&child, new_bs);
+bdrv_replace_child_noperm(child, new_bs);
 /* old_bs reference is transparently moved from @child to @s */
 }
 
@@ -2766,10 +2766,9 @@ uint64_t bdrv_qapi_perm_to_blk_perm(BlockPermission 
qapi_perm)
 return permissions[qapi_perm];
 }
 
-static void bdrv_replace_child_noperm(BdrvChild **childp,
+static void bdrv_replace_child_noperm(BdrvChild *child,
   BlockDriverState *new_bs)
 {
-BdrvChild *child = *childp;
 BlockDriverState *old_bs = child->bs;
 int new_bs_quiesce_counter;
 int drain_saldo;
@@ -2867,7 +2866,7 @@ static void bdrv_attach_child_common_abort(void *opaque)
 BlockDriverState *bs = child->bs;
 
 GLOBAL_STATE_CODE();
-bdrv_replace_child_noperm(s->child, NULL);
+bdrv_replace_child_noperm(child, NULL);
 
 if (bdrv_get_aio_context(bs) != s->old_child_ctx) {
 bdrv_try_set_aio_context(bs, s->old_child_ctx, &error_abort);
@@ -2968,7 +2967,7 @@ static int bdrv_attach_child_common(BlockDriverState 
*child_bs,
 }
 
 bdrv_ref(child_bs);
-bdrv_replace_child_noperm(&new_child, child_bs);
+bdrv_replace_child_noperm(new_child, child_bs);
 
 *child = new_child;
 
@@ -3024,13 +3023,13 @@ static int bdrv_attach_child_noperm(BlockDriverState 
*parent_bs,
 return 0;
 }
 
-static void bdrv_detach_child(BdrvChild **childp)
+static void bdrv_detach_child(BdrvChild *child)
 {
-BlockDriverState *old_bs = (*childp)->bs;
+BlockDriverState *old_bs = child->bs;
 
 GLOBAL_STATE_CODE();
-bdrv_replace_child_noperm(childp, NULL);
-bdrv_child_free(*childp);
+bdrv_replace_child_noperm(child, NULL);
+bdrv_child_free(child);
 
 if (old_bs) {
 /*
@@ -3142,7 +3141,7 @@ void bdrv_root_unref_child(BdrvChild *child)
 GLOBAL_STATE_CODE();
 
 child_bs = child->bs;
-bdrv_detach_child(&child);
+bdrv_detach_child(child);
 bdrv_unref(child_bs);
 }
 
-- 
2.25.1

[PULL 05/16] hw/i386: add 4g boundary start to X86MachineState

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Rather than hardcoding the 4G boundary everywhere, introduce an
X86MachineState field @above_4g_mem_start and use it
accordingly.

This is in preparation for relocating ram-above-4g to
dynamically start at 1T on AMD platforms.

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
Message-Id: <20220719170014.27028-2-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/i386/x86.h |  3 +++
 hw/i386/acpi-build.c  |  2 +-
 hw/i386/pc.c  | 11 ++-
 hw/i386/sgx.c |  2 +-
 hw/i386/x86.c |  1 +
 5 files changed, 12 insertions(+), 7 deletions(-)

diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index 6bdf1f6ab2..62fa5774f8 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -56,6 +56,9 @@ struct X86MachineState {
 /* RAM information (sizes, addresses, configuration): */
 ram_addr_t below_4g_mem_size, above_4g_mem_size;
 
+/* Start address of the initial RAM above 4G */
+uint64_t above_4g_mem_start;
+
 /* CPU and apic information: */
 bool apic_xrupt_override;
 unsigned pci_irq_mask;
diff --git a/hw/i386/acpi-build.c b/hw/i386/acpi-build.c
index cad6f5ac41..0355bd3dda 100644
--- a/hw/i386/acpi-build.c
+++ b/hw/i386/acpi-build.c
@@ -2024,7 +2024,7 @@ build_srat(GArray *table_data, BIOSLinker *linker, 
MachineState *machine)
 build_srat_memory(table_data, mem_base, mem_len, i - 1,
   MEM_AFFINITY_ENABLED);
 }
-mem_base = 1ULL << 32;
+mem_base = x86ms->above_4g_mem_start;
 mem_len = next_base - x86ms->below_4g_mem_size;
 next_base = mem_base + mem_len;
 }
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 46ab1dcb47..13b68307be 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -850,9 +850,10 @@ void pc_memory_init(PCMachineState *pcms,
  machine->ram,
  x86ms->below_4g_mem_size,
  x86ms->above_4g_mem_size);
-memory_region_add_subregion(system_memory, 0x100000000ULL,
+memory_region_add_subregion(system_memory, x86ms->above_4g_mem_start,
 ram_above_4g);
-e820_add_entry(0x100000000ULL, x86ms->above_4g_mem_size, E820_RAM);
+e820_add_entry(x86ms->above_4g_mem_start, x86ms->above_4g_mem_size,
+   E820_RAM);
 }
 
 if (pcms->sgx_epc.size != 0) {
@@ -893,7 +894,7 @@ void pc_memory_init(PCMachineState *pcms,
 machine->device_memory->base = 
sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
 machine->device_memory->base =
-0x100000000ULL + x86ms->above_4g_mem_size;
+x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 machine->device_memory->base =
@@ -927,7 +928,7 @@ void pc_memory_init(PCMachineState *pcms,
 } else if (pcms->sgx_epc.size != 0) {
 cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-cxl_base = 0x100000000ULL + x86ms->above_4g_mem_size;
+cxl_base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
@@ -1035,7 +1036,7 @@ uint64_t pc_pci_hole64_start(void)
 } else if (pcms->sgx_epc.size != 0) {
 hole64_start = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-hole64_start = 0x100000000ULL + x86ms->above_4g_mem_size;
+hole64_start = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
 return ROUND_UP(hole64_start, 1 * GiB);
diff --git a/hw/i386/sgx.c b/hw/i386/sgx.c
index a44d66ba2a..09d9c7c73d 100644
--- a/hw/i386/sgx.c
+++ b/hw/i386/sgx.c
@@ -295,7 +295,7 @@ void pc_machine_init_sgx_epc(PCMachineState *pcms)
 return;
 }
 
-sgx_epc->base = 0x100000000ULL + x86ms->above_4g_mem_size;
+sgx_epc->base = x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 
 memory_region_init(&sgx_epc->mr, OBJECT(pcms), "sgx-epc", UINT64_MAX);
 memory_region_add_subregion(get_system_memory(), sgx_epc->base,
diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index ecea25d249..050eedc0c8 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1391,6 +1391,7 @@ static void x86_machine_initfn(Object *obj)
 x86ms->oem_id = g_strndup(ACPI_BUILD_APPNAME6, 6);
 x86ms->oem_table_id = g_strndup(ACPI_BUILD_APPNAME8, 8);
 x86ms->bus_lock_ratelimit = 0;
+x86ms->above_4g_mem_start = 4 * GiB;
 }
 
 static void x86_machine_class_init(ObjectClass *oc, void *data)
-- 
MST

[PATCH v7 09/15] Revert "block: Let replace_child_noperm free children"

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

We are going to reimplement this behavior (clear bs->file / bs->backing
pointers automatically when child->bs is cleared) in a nicer way, see
the later commit
"block: Manipulate bs->file / bs->backing pointers in .attach/.detach".

With this revert we bring back a problem that was fixed by b0a9f6fed3d8.
Still, the problem was mostly theoretical: we have no concrete bugs
fixed by b0a9f6fed3d8 and no specific test for it. Some accidental
failures of iotests are probably related.

Alternatively, we could merge this and the following three reverts into
the final "block: Manipulate ..." commit to avoid any kind of regression.
But in this case, having separate, clean revert commits seems better.

This reverts commit b0a9f6fed3d80de610dcd04a7e66f9f30a04174f.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c | 102 +---
 1 file changed, 23 insertions(+), 79 deletions(-)

diff --git a/block.c b/block.c
index 4e38fc45c0..87e2f23d13 100644
--- a/block.c
+++ b/block.c
@@ -90,10 +90,8 @@ static BlockDriverState *bdrv_open_inherit(const char 
*filename,
 static bool bdrv_recurse_has_child(BlockDriverState *bs,
BlockDriverState *child);
 
-static void bdrv_child_free(BdrvChild *child);
 static void bdrv_replace_child_noperm(BdrvChild **child,
-  BlockDriverState *new_bs,
-  bool free_empty_child);
+  BlockDriverState *new_bs);
 static void bdrv_remove_file_or_backing_child(BlockDriverState *bs,
   BdrvChild *child,
   Transaction *tran);
@@ -2340,7 +2338,6 @@ typedef struct BdrvReplaceChildState {
 BdrvChild *child;
 BdrvChild **childp;
 BlockDriverState *old_bs;
-bool free_empty_child;
 } BdrvReplaceChildState;
 
 static void bdrv_replace_child_commit(void *opaque)
@@ -2348,9 +2345,6 @@ static void bdrv_replace_child_commit(void *opaque)
 BdrvReplaceChildState *s = opaque;
 GLOBAL_STATE_CODE();
 
-if (s->free_empty_child && !s->child->bs) {
-bdrv_child_free(s->child);
-}
 bdrv_unref(s->old_bs);
 }
 
@@ -2368,26 +2362,22 @@ static void bdrv_replace_child_abort(void *opaque)
  * modify the BdrvChild * pointer we indirectly pass to it, i.e. it
  * will not modify s->child.  From that perspective, it does not matter
  * whether we pass s->childp or &s->child.
+ * (TODO: Right now, bdrv_replace_child_noperm() never modifies that
+ * pointer anyway (though it will in the future), so at this point it
+ * absolutely does not matter whether we pass s->childp or &s->child.)
  * (2) If new_bs is not NULL, s->childp will be NULL.  We then cannot use
  * it here.
  * (3) If new_bs is NULL, *s->childp will have been NULLed by
  * bdrv_replace_child_tran()'s bdrv_replace_child_noperm() call, and we
  * must not pass a NULL *s->childp here.
+ * (TODO: In its current state, bdrv_replace_child_noperm() will not
+ * have NULLed *s->childp, so this does not apply yet.  It will in the
+ * future.)
  *
  * So whether new_bs was NULL or not, we cannot pass s->childp here; and in
  * any case, there is no reason to pass it anyway.
  */
-bdrv_replace_child_noperm(&s->child, s->old_bs, true);
-/*
- * The child was pre-existing, so s->old_bs must be non-NULL, and
- * s->child thus must not have been freed
- */
-assert(s->child != NULL);
-if (!new_bs) {
-/* As described above, *s->childp was cleared, so restore it */
-assert(s->childp != NULL);
-*s->childp = s->child;
-}
+bdrv_replace_child_noperm(&s->child, s->old_bs);
 bdrv_unref(new_bs);
 }
 
@@ -2404,44 +2394,30 @@ static TransactionActionDrv bdrv_replace_child_drv = {
  *
  * The function doesn't update permissions, caller is responsible for this.
  *
- * (*childp)->bs must not be NULL.
- *
  * Note that if new_bs == NULL, @childp is stored in a state object attached
  * to @tran, so that the old child can be reinstated in the abort handler.
  * Therefore, if @new_bs can be NULL, @childp must stay valid until the
  * transaction is committed or aborted.
  *
- * If @free_empty_child is true and @new_bs is NULL, the BdrvChild is
- * freed (on commit).  @free_empty_child should only be false if the
- * caller will free the BDrvChild themselves (which may be important
- * if this is in turn called in another transactional context).
+ * (TODO: The reinstating does not happen yet, but it will once
+ * bdrv_replace_child_noperm() NULLs *childp when new_bs is NULL.)
  */
 static void bdrv_replace_child_tran(BdrvChild **childp,
 BlockDriverState *new_bs,
-Transaction *tran,
-

[PATCH v7 03/15] block/blklogwrites: don't care to remove bs->file child on failure

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

We don't need to remove bs->file; the generic layer takes care of it. No
other driver removes bs->file by hand on failure.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block/blklogwrites.c | 4 
 1 file changed, 4 deletions(-)

diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index 12b4c3c8cf..cef9efe55d 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -254,10 +254,6 @@ fail_log:
 s->log_file = NULL;
 }
 fail:
-if (ret < 0) {
-bdrv_unref_child(bs, bs->file);
-bs->file = NULL;
-}
 qemu_opts_del(opts);
 return ret;
 }
-- 
2.25.1

[PATCH v7 06/15] test-bdrv-graph-mod: fix filters to be filters

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

bdrv_pass_through is used as a filter, and even all the node variables
are named accordingly. We want to append it, so it should be a
backing-child-based filter like mirror_top.
So, in test_update_perm_tree, the first child should be DATA, as we don't
want filters with two filtered children.

bdrv_exclusive_writer is used as a filter once, so it should be a filter
everywhere. We want to append it, so it should be a backing-child-based
filter too.

Make all FILTERED children PRIMARY as well. We are going to enforce
this rule by assertion soon.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 include/block/block_int-common.h |  5 +++--
 tests/unit/test-bdrv-graph-mod.c | 24 +---
 2 files changed, 20 insertions(+), 9 deletions(-)

diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 9d91ccbcbf..d68adc6ff3 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -122,8 +122,9 @@ struct BlockDriver {
 /*
  * Only make sense for filter drivers, for others must be false.
  * If true, filtered child is bs->backing. Otherwise it's bs->file.
- * Only two internal filters use bs->backing as filtered child and has this
- * field set to true: mirror_top and commit_top.
+ * Two internal filters use bs->backing as filtered child and has this
+ * field set to true: mirror_top and commit_top. There also two such test
+ * filters in tests/unit/test-bdrv-graph-mod.c.
  *
  * Never create any more such filters!
  *
diff --git a/tests/unit/test-bdrv-graph-mod.c b/tests/unit/test-bdrv-graph-mod.c
index e2f1355af1..c522591531 100644
--- a/tests/unit/test-bdrv-graph-mod.c
+++ b/tests/unit/test-bdrv-graph-mod.c
@@ -26,6 +26,8 @@
 
 static BlockDriver bdrv_pass_through = {
 .format_name = "pass-through",
+.is_filter = true,
+.filtered_child_is_backing = true,
 .bdrv_child_perm = bdrv_default_perms,
 };
 
@@ -57,6 +59,8 @@ static void exclusive_write_perms(BlockDriverState *bs, 
BdrvChild *c,
 
 static BlockDriver bdrv_exclusive_writer = {
 .format_name = "exclusive-writer",
+.is_filter = true,
+.filtered_child_is_backing = true,
 .bdrv_child_perm = exclusive_write_perms,
 };
 
@@ -134,7 +138,7 @@ static void test_update_perm_tree(void)
 blk_insert_bs(root, bs, &error_abort);
 
 bdrv_attach_child(filter, bs, "child", &child_of_bds,
-  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, &error_abort);
+  BDRV_CHILD_DATA, &error_abort);
 
 ret = bdrv_append(filter, bs, NULL);
 g_assert_cmpint(ret, <, 0);
@@ -228,11 +232,14 @@ static void test_parallel_exclusive_write(void)
  */
 bdrv_ref(base);
 
-bdrv_attach_child(top, fl1, "backing", &child_of_bds, BDRV_CHILD_DATA,
+bdrv_attach_child(top, fl1, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
-bdrv_attach_child(fl1, base, "backing", &child_of_bds, BDRV_CHILD_FILTERED,
+bdrv_attach_child(fl1, base, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
-bdrv_attach_child(fl2, base, "backing", &child_of_bds, BDRV_CHILD_FILTERED,
+bdrv_attach_child(fl2, base, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
 
 bdrv_replace_node(fl1, fl2, &error_abort);
@@ -344,9 +351,11 @@ static void test_parallel_perm_update(void)
   BDRV_CHILD_DATA, &error_abort);
 c_fl2 = bdrv_attach_child(ws, fl2, "second", &child_of_bds,
   BDRV_CHILD_DATA, &error_abort);
-bdrv_attach_child(fl1, base, "backing", &child_of_bds, BDRV_CHILD_FILTERED,
+bdrv_attach_child(fl1, base, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
-bdrv_attach_child(fl2, base, "backing", &child_of_bds, BDRV_CHILD_FILTERED,
+bdrv_attach_child(fl2, base, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
 
 /* Select fl1 as first child to be active */
@@ -397,7 +406,8 @@ static void test_append_greedy_filter(void)
 BlockDriverState *base = no_perm_node("base");
 BlockDriverState *fl = exclusive_writer_node("fl1");
 
-bdrv_attach_child(top, base, "backing", &child_of_bds, BDRV_CHILD_COW,
+bdrv_attach_child(top, base, "backing", &child_of_bds,
+  BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
   &error_abort);
 
 bdrv_append(fl, base, &error_abort);
-- 
2.25.1

[PULL 03/16] hw/i386/pc: Always place CXL Memory Regions after device_memory

2022-07-26 Thread Michael S. Tsirkin

From: Jonathan Cameron 

Previously broken_reserved_end was taken into account, but Igor Mammedov
identified that this could lead to a clash between potential RAM being
mapped in the region and CXL usage. Hence always add the size of the
device_memory memory region.  This only affects the case where the
broken_reserved_end flag was set.

Fixes: 6e4e3ae936e6 ("hw/cxl/component: Implement host bridge MMIO (8.2.5, 
table 142)")
Reported-by: Igor Mammedov 
Signed-off-by: Jonathan Cameron 
Message-Id: <20220701132300.2264-3-jonathan.came...@huawei.com>
Acked-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index d2b5823ffb..46ab1dcb47 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -922,10 +922,8 @@ void pc_memory_init(PCMachineState *pcms,
 hwaddr cxl_size = MiB;
 
 if (pcmc->has_reserved_memory && machine->device_memory->base) {
-cxl_base = machine->device_memory->base;
-if (!pcmc->broken_reserved_end) {
-cxl_base += memory_region_size(&machine->device_memory->mr);
-}
+cxl_base = machine->device_memory->base
++ memory_region_size(&machine->device_memory->mr);
 } else if (pcms->sgx_epc.size != 0) {
 cxl_base = sgx_epc_above_4g_end(&pcms->sgx_epc);
 } else {
-- 
MST

[PATCH v7 07/15] block: document connection between child roles and bs->backing/bs->file

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Make the informal rules formal. In a later commit we'll add the
corresponding assertions.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 include/block/block-common.h | 39 
 1 file changed, 39 insertions(+)

diff --git a/include/block/block-common.h b/include/block/block-common.h
index fdb7306e78..fda67a7c38 100644
--- a/include/block/block-common.h
+++ b/include/block/block-common.h
@@ -313,6 +313,45 @@ enum {
  *
  * At least one of DATA, METADATA, FILTERED, or COW must be set for
  * every child.
+ *
+ *
+ * = Connection with bs->children, bs->file and bs->backing fields =
+ *
+ * 1. Filters
+ *
+ * Filter drivers have drv->is_filter = true.
+ *
+ * Filter node has exactly one FILTERED|PRIMARY child, and may have other
+ * children which must not have these bits (one example is the
+ * copy-before-write filter, which also has its target DATA child).
+ *
+ * Filter nodes never have COW children.
+ *
+ * For most filters, the filtered child is linked in bs->file, bs->backing is
+ * NULL.  For some filters (as an exception), it is the other way around; those
+ * drivers will have drv->filtered_child_is_backing set to true (see that
+ * field’s documentation for what drivers this concerns)
+ *
+ * 2. "raw" driver (block/raw-format.c)
+ *
+ * Formally it's not a filter (drv->is_filter = false)
+ *
+ * bs->backing is always NULL
+ *
+ * Only has one child, linked in bs->file. Its role is either FILTERED|PRIMARY
+ * (like filter) or DATA|PRIMARY depending on options.
+ *
+ * 3. Other drivers
+ *
+ * Don't have any FILTERED children.
+ *
+ * May have at most one COW child. In this case it's linked in bs->backing.
+ * Otherwise bs->backing is NULL. COW child is never PRIMARY.
+ *
+ * May have at most one PRIMARY child. In this case it's linked in bs->file.
+ * Otherwise bs->file is NULL.
+ *
+ * May also have some other children that don't have the PRIMARY or COW bit 
set.
  */
 enum BdrvChildRoleBits {
 /*
-- 
2.25.1
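
The rules documented in the patch above can be sketched as a small checker over a node's child roles. This is only an illustration: the `SK_*` bit values and the `roles_ok()` helper are made up for this sketch and are not the assertions the series adds later, and the special "raw" driver case (rule 2) is deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Sketch-only role bits; names and values are assumptions. */
enum {
    SK_DATA     = 1 << 0,
    SK_METADATA = 1 << 1,
    SK_FILTERED = 1 << 2,
    SK_COW      = 1 << 3,
    SK_PRIMARY  = 1 << 4,
};

/* Check the roles of all children of one node against rules 1 and 3. */
static bool roles_ok(bool is_filter, const unsigned *roles, size_t n)
{
    size_t filtered = 0, cow = 0, primary = 0;

    for (size_t i = 0; i < n; i++) {
        unsigned r = roles[i];

        /* Every child needs at least one of DATA, METADATA, FILTERED, COW. */
        if (!(r & (SK_DATA | SK_METADATA | SK_FILTERED | SK_COW))) {
            return false;
        }
        /* A COW child is never PRIMARY. */
        if ((r & SK_COW) && (r & SK_PRIMARY)) {
            return false;
        }
        /* In a filter, FILTERED and PRIMARY only appear together. */
        if (is_filter && !!(r & SK_FILTERED) != !!(r & SK_PRIMARY)) {
            return false;
        }
        filtered += !!(r & SK_FILTERED);
        cow += !!(r & SK_COW);
        primary += !!(r & SK_PRIMARY);
    }

    if (is_filter) {
        /* Exactly one FILTERED|PRIMARY child, and no COW children. */
        return filtered == 1 && primary == 1 && cow == 0;
    }
    /* Other drivers: no FILTERED, at most one COW, at most one PRIMARY. */
    return filtered == 0 && cow <= 1 && primary <= 1;
}
```

A copy-before-write style filter would pass with one FILTERED|PRIMARY child plus a DATA target, while a node with two filtered children would be rejected.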

[PATCH v7 for-7.2 00/15] block: cleanup backing and file handling

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Hi all!

This is the first part of
"[PATCH v5 00/45] Transactional block-graph modifying API",
updated and fully reviewed by Hanna.

v7: add r-bs and rebase on master

Vladimir Sementsov-Ogievskiy (15):
  block: BlockDriver: add .filtered_child_is_backing field
  block: introduce bdrv_open_file_child() helper
  block/blklogwrites: don't care to remove bs->file child on failure
  test-bdrv-graph-mod: update test_parallel_perm_update test case
  tests-bdrv-drain: bdrv_replace_test driver: declare supports_backing
  test-bdrv-graph-mod: fix filters to be filters
  block: document connection between child roles and
bs->backing/bs->file
  block/snapshot: stress that we fallback to primary child
  Revert "block: Let replace_child_noperm free children"
  Revert "block: Let replace_child_tran keep indirect pointer"
  Revert "block: Restructure remove_file_or_backing_child()"
  Revert "block: Pass BdrvChild ** to replace_child_noperm"
  block: Manipulate bs->file / bs->backing pointers in .attach/.detach
  block/snapshot: drop indirection around bdrv_snapshot_fallback_ptr
  block: refactor bdrv_remove_file_or_backing_child to bdrv_remove_child

 block.c| 435 ++---
 block/blkdebug.c   |   9 +-
 block/blklogwrites.c   |  11 +-
 block/blkreplay.c  |   7 +-
 block/blkverify.c  |   9 +-
 block/bochs.c  |   7 +-
 block/cloop.c  |   7 +-
 block/commit.c |   1 +
 block/copy-before-write.c  |   9 +-
 block/copy-on-read.c   |   9 +-
 block/crypto.c |  11 +-
 block/dmg.c|   7 +-
 block/filter-compress.c|   8 +-
 block/mirror.c |   1 +
 block/parallels.c  |   7 +-
 block/preallocate.c|   9 +-
 block/qcow.c   |   6 +-
 block/qcow2.c  |   8 +-
 block/qed.c|   8 +-
 block/raw-format.c |   4 +-
 block/replication.c|   8 +-
 block/snapshot-access.c|   6 +-
 block/snapshot.c   |  59 ++--
 block/throttle.c   |   8 +-
 block/vdi.c|   7 +-
 block/vhdx.c   |   7 +-
 block/vmdk.c   |   7 +-
 block/vpc.c|   7 +-
 include/block/block-common.h   |  39 +++
 include/block/block-global-state.h |   3 +
 include/block/block_int-common.h   |  29 +-
 tests/unit/test-bdrv-drain.c   |  11 +-
 tests/unit/test-bdrv-graph-mod.c   | 104 ---
 33 files changed, 389 insertions(+), 479 deletions(-)

-- 
2.25.1

[PULL 01/16] acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()

2022-07-26 Thread Michael S. Tsirkin

From: Robert Hoo 

Signed-off-by: Robert Hoo 
Reviewed-by: Jingqi Liu 
Message-Id: <20220704085852.330005-1-robert...@linux.intel.com>
Reviewed-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/mem/nvdimm.h |  8 
 hw/acpi/nvdimm.c| 35 ---
 hw/acpi/trace-events| 13 +
 3 files changed, 29 insertions(+), 27 deletions(-)

diff --git a/include/hw/mem/nvdimm.h b/include/hw/mem/nvdimm.h
index cf8f59be44..acf887c83d 100644
--- a/include/hw/mem/nvdimm.h
+++ b/include/hw/mem/nvdimm.h
@@ -29,14 +29,6 @@
 #include "hw/acpi/aml-build.h"
 #include "qom/object.h"
 
-#define NVDIMM_DEBUG 0
-#define nvdimm_debug(fmt, ...)\
-do {  \
-if (NVDIMM_DEBUG) {   \
-fprintf(stderr, "nvdimm: " fmt, ## __VA_ARGS__);  \
-} \
-} while (0)
-
 /*
  * The minimum label data size is required by NVDIMM Namespace
  * specification, see the chapter 2 Namespaces:
diff --git a/hw/acpi/nvdimm.c b/hw/acpi/nvdimm.c
index 5f85b16327..31e46df0bd 100644
--- a/hw/acpi/nvdimm.c
+++ b/hw/acpi/nvdimm.c
@@ -35,6 +35,7 @@
 #include "hw/nvram/fw_cfg.h"
 #include "hw/mem/nvdimm.h"
 #include "qemu/nvdimm-utils.h"
+#include "trace.h"
 
 /*
  * define Byte Addressable Persistent Memory (PM) Region according to
@@ -550,8 +551,8 @@ static void nvdimm_dsm_func_read_fit(NVDIMMState *state, 
NvdimmDsmIn *in,
 
 fit = fit_buf->fit;
 
-nvdimm_debug("Read FIT: offset 0x%x FIT size 0x%x Dirty %s.\n",
- read_fit->offset, fit->len, fit_buf->dirty ? "Yes" : "No");
+trace_acpi_nvdimm_read_fit(read_fit->offset, fit->len,
+   fit_buf->dirty ? "Yes" : "No");
 
 if (read_fit->offset > fit->len) {
 func_ret_status = NVDIMM_DSM_RET_STATUS_INVALID;
@@ -658,7 +659,7 @@ static void nvdimm_dsm_label_size(NVDIMMDevice *nvdimm, 
hwaddr dsm_mem_addr)
 label_size = nvdimm->label_size;
 mxfer = nvdimm_get_max_xfer_label_size();
 
-nvdimm_debug("label_size 0x%x, max_xfer 0x%x.\n", label_size, mxfer);
+trace_acpi_nvdimm_label_info(label_size, mxfer);
 
 label_size_out.func_ret_status = 
cpu_to_le32(NVDIMM_DSM_RET_STATUS_SUCCESS);
 label_size_out.label_size = cpu_to_le32(label_size);
@@ -674,20 +675,18 @@ static uint32_t nvdimm_rw_label_data_check(NVDIMMDevice 
*nvdimm,
 uint32_t ret = NVDIMM_DSM_RET_STATUS_INVALID;
 
 if (offset + length < offset) {
-nvdimm_debug("offset 0x%x + length 0x%x is overflow.\n", offset,
- length);
+trace_acpi_nvdimm_label_overflow(offset, length);
 return ret;
 }
 
 if (nvdimm->label_size < offset + length) {
-nvdimm_debug("position 0x%x is beyond label data (len = %" PRIx64 
").\n",
- offset + length, nvdimm->label_size);
+trace_acpi_nvdimm_label_oversize(offset + length, nvdimm->label_size);
 return ret;
 }
 
 if (length > nvdimm_get_max_xfer_label_size()) {
-nvdimm_debug("length (0x%x) is larger than max_xfer (0x%x).\n",
- length, nvdimm_get_max_xfer_label_size());
+trace_acpi_nvdimm_label_xfer_exceed(length,
+nvdimm_get_max_xfer_label_size());
 return ret;
 }
 
@@ -710,8 +709,8 @@ static void nvdimm_dsm_get_label_data(NVDIMMDevice *nvdimm, 
NvdimmDsmIn *in,
 get_label_data->offset = le32_to_cpu(get_label_data->offset);
 get_label_data->length = le32_to_cpu(get_label_data->length);
 
-nvdimm_debug("Read Label Data: offset 0x%x length 0x%x.\n",
- get_label_data->offset, get_label_data->length);
+trace_acpi_nvdimm_read_label(get_label_data->offset,
+ get_label_data->length);
 
 status = nvdimm_rw_label_data_check(nvdimm, get_label_data->offset,
 get_label_data->length);
@@ -749,8 +748,8 @@ static void nvdimm_dsm_set_label_data(NVDIMMDevice *nvdimm, 
NvdimmDsmIn *in,
 set_label_data->offset = le32_to_cpu(set_label_data->offset);
 set_label_data->length = le32_to_cpu(set_label_data->length);
 
-nvdimm_debug("Write Label Data: offset 0x%x length 0x%x.\n",
- set_label_data->offset, set_label_data->length);
+trace_acpi_nvdimm_write_label(set_label_data->offset,
+  set_label_data->length);
 
 status = nvdimm_rw_label_data_check(nvdimm, set_label_data->offset,
 set_label_data->length);
@@ -821,7 +820,7 @@ static void nvdimm_dsm_device(NvdimmDsmIn *in, hwaddr 
dsm_mem_addr)
 static uint64_t
 nvdimm_dsm_read(void *opaque, hwaddr addr, unsigned size)
 {
-nvdimm_debug("BUG: we never read _DSM IO Port.\n");
+trace_acpi_nvdimm_read_io_port();

[PATCH v7 02/15] block: introduce bdrv_open_file_child() helper

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Almost all drivers call bdrv_open_child() similarly. Let's create a
helper for this.

The only drivers that call bdrv_open_child() to set bs->file and are
not updated are raw-format and snapshot-access:
raw-format sometimes wants to have a filtered child but
doesn't set drv->is_filter to true.
snapshot-access wants only DATA | PRIMARY.

Possibly we should implement a drv->is_filter_func() handler, to consider
raw-format a filter when it works as one. But that's another story.

Note also that we reduce the number of assignments to bs->file in the
code: this helps us restrict modification of this field in a later commit.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block.c| 21 +
 block/blkdebug.c   |  9 +++--
 block/blklogwrites.c   |  7 ++-
 block/blkreplay.c  |  7 ++-
 block/blkverify.c  |  9 +++--
 block/bochs.c  |  7 +++
 block/cloop.c  |  7 +++
 block/copy-before-write.c  |  9 -
 block/copy-on-read.c   |  9 -
 block/crypto.c | 11 ++-
 block/dmg.c|  7 +++
 block/filter-compress.c|  8 +++-
 block/parallels.c  |  7 +++
 block/preallocate.c|  9 -
 block/qcow.c   |  6 ++
 block/qcow2.c  |  8 
 block/qed.c|  8 
 block/replication.c|  8 +++-
 block/throttle.c   |  8 +++-
 block/vdi.c|  7 +++
 block/vhdx.c   |  7 +++
 block/vmdk.c   |  7 +++
 block/vpc.c|  7 +++
 include/block/block-global-state.h |  3 +++
 24 files changed, 95 insertions(+), 101 deletions(-)

diff --git a/block.c b/block.c
index bc85f46eed..4e38fc45c0 100644
--- a/block.c
+++ b/block.c
@@ -3668,6 +3668,27 @@ BdrvChild *bdrv_open_child(const char *filename,
  errp);
 }
 
+/*
+ * Wrapper on bdrv_open_child() for most popular case: open primary child of 
bs.
+ */
+int bdrv_open_file_child(const char *filename,
+ QDict *options, const char *bdref_key,
+ BlockDriverState *parent, Error **errp)
+{
+BdrvChildRole role;
+
+/* commit_top and mirror_top don't use this function */
+assert(!parent->drv->filtered_child_is_backing);
+
+role = parent->drv->is_filter ?
+(BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY) : BDRV_CHILD_IMAGE;
+
+parent->file = bdrv_open_child(filename, options, bdref_key, parent,
+   &child_of_bds, role, false, errp);
+
+return parent->file ? 0 : -EINVAL;
+}
+
 /*
  * TODO Future callers may need to specify parent/child_class in order for
  * option inheritance to work. Existing callers use it for the root node.
diff --git a/block/blkdebug.c b/block/blkdebug.c
index bbf2948703..5fcfc8ac6f 100644
--- a/block/blkdebug.c
+++ b/block/blkdebug.c
@@ -503,12 +503,9 @@ static int blkdebug_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* Open the image file */
-bs->file = bdrv_open_child(qemu_opt_get(opts, "x-image"), options, "image",
-   bs, &child_of_bds,
-   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-   false, errp);
-if (!bs->file) {
-ret = -EINVAL;
+ret = bdrv_open_file_child(qemu_opt_get(opts, "x-image"), options, "image",
+   bs, errp);
+if (ret < 0) {
 goto out;
 }
 
diff --git a/block/blklogwrites.c b/block/blklogwrites.c
index e3c6c4039c..12b4c3c8cf 100644
--- a/block/blklogwrites.c
+++ b/block/blklogwrites.c
@@ -155,11 +155,8 @@ static int blk_log_writes_open(BlockDriverState *bs, QDict 
*options, int flags,
 }
 
 /* Open the file */
-bs->file = bdrv_open_child(NULL, options, "file", bs, &child_of_bds,
-   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY, false,
-   errp);
-if (!bs->file) {
-ret = -EINVAL;
+ret = bdrv_open_file_child(NULL, options, "file", bs, errp);
+if (ret < 0) {
 goto fail;
 }
 
diff --git a/block/blkreplay.c b/block/blkreplay.c
index dcbe780ddb..76a0b8d12a 100644
--- a/block/blkreplay.c
+++ b/block/blkreplay.c
@@ -26,11 +26,8 @@ static int blkreplay_open(BlockDriverState *bs, QDict 
*options, int flags,
 int ret;
 
 /* Open the image file */
-bs->file = bdrv_open_child(NULL, options, "image", bs, &child_of_bds,
-   BDRV_CHILD_FILTERED | BDRV_CHILD_PRIMARY,
-   false, errp);
-if (!bs->file) {
-ret = -EINVAL;
+ret = bdrv_open_file_child(NULL, options, "image", bs, errp);
+if (ret < 0) {
 goto

[PATCH v7 01/15] block: BlockDriver: add .filtered_child_is_backing field

2022-07-26 Thread Vladimir Sementsov-Ogievskiy

Unfortunately, not all filters use the .file child as their filtered
child. The two exceptions are mirror_top and commit_top. Happily, both
are private filters. The bad thing is that this inconsistency is
observable through the QMP commands query-block /
query-named-block-nodes. So whether we could just change mirror_top and
commit_top to use the file child, like all other filter drivers, is an
open question. Probably we could do that with some kind of deprecation
period, but how would we warn users during it?

For now, let's just add a field so that we can distinguish them in
generic code; it will be used in later commits.

Signed-off-by: Vladimir Sementsov-Ogievskiy 
Reviewed-by: Hanna Reitz 
---
 block/commit.c   |  1 +
 block/mirror.c   |  1 +
 include/block/block_int-common.h | 13 +
 3 files changed, 15 insertions(+)

diff --git a/block/commit.c b/block/commit.c
index 38571510cb..e210e86bac 100644
--- a/block/commit.c
+++ b/block/commit.c
@@ -238,6 +238,7 @@ static BlockDriver bdrv_commit_top = {
 .bdrv_child_perm= bdrv_commit_top_child_perm,
 
 .is_filter  = true,
+.filtered_child_is_backing  = true,
 };
 
 void commit_start(const char *job_id, BlockDriverState *bs,
diff --git a/block/mirror.c b/block/mirror.c
index 3c4ab1159d..b808e8bdc2 100644
--- a/block/mirror.c
+++ b/block/mirror.c
@@ -1578,6 +1578,7 @@ static BlockDriver bdrv_mirror_top = {
 .bdrv_child_perm= bdrv_mirror_top_child_perm,
 
 .is_filter  = true,
+.filtered_child_is_backing  = true,
 };
 
 static BlockJob *mirror_start_job(
diff --git a/include/block/block_int-common.h b/include/block/block_int-common.h
index 8947abab76..9d91ccbcbf 100644
--- a/include/block/block_int-common.h
+++ b/include/block/block_int-common.h
@@ -119,6 +119,19 @@ struct BlockDriver {
  * (And this filtered child must then be bs->file or bs->backing.)
  */
 bool is_filter;
+/*
+ * Only make sense for filter drivers, for others must be false.
+ * If true, filtered child is bs->backing. Otherwise it's bs->file.
+ * Only two internal filters use bs->backing as filtered child and has this
+ * field set to true: mirror_top and commit_top.
+ *
+ * Never create any more such filters!
+ *
+ * TODO: imagine how to deprecate this behavior and make all filters work
+ * similarly using bs->file as filtered child.
+ */
+bool filtered_child_is_backing;
+
 /*
  * Set to true if the BlockDriver is a format driver.  Format nodes
  * generally do not expect their children to be other format nodes
-- 
2.25.1
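
As a rough illustration of how generic code might use the new field, the sketch below picks a filter's filtered child based on the flag. All types and the `sk_filtered_child()` helper are hypothetical simplifications invented for this example; they are not QEMU's BlockDriverState or BdrvChild, and the series itself may use the field differently.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified, hypothetical stand-ins for the real block-layer types. */
typedef struct SkChild {
    const char *name;
} SkChild;

typedef struct SkDriver {
    bool is_filter;
    bool filtered_child_is_backing;
} SkDriver;

typedef struct SkNode {
    const SkDriver *drv;
    SkChild *file;
    SkChild *backing;
} SkNode;

/* Return the filtered child of a filter node, or NULL for non-filters. */
static SkChild *sk_filtered_child(const SkNode *bs)
{
    if (!bs->drv->is_filter) {
        /* The flag only makes sense for filters; otherwise it is false. */
        assert(!bs->drv->filtered_child_is_backing);
        return NULL;
    }
    return bs->drv->filtered_child_is_backing ? bs->backing : bs->file;
}
```

A mirror_top-like driver would resolve to bs->backing here, while every other filter resolves to bs->file.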

[PATCH] s390x/cpumodel: add stfl197 processor-activity-instrumentation extension 1

2022-07-26 Thread Christian Borntraeger

Add stfle 197 (processor-activity-instrumentation extension 1) to the
gen16 default model and fence it off for 7.0 and older.

Signed-off-by: Christian Borntraeger 
---
 hw/s390x/s390-virtio-ccw.c  | 1 +
 target/s390x/cpu_features_def.h.inc | 1 +
 target/s390x/gen-features.c | 2 ++
 3 files changed, 4 insertions(+)

diff --git a/hw/s390x/s390-virtio-ccw.c b/hw/s390x/s390-virtio-ccw.c
index cc3097bfee80..6268aa5d0888 100644
--- a/hw/s390x/s390-virtio-ccw.c
+++ b/hw/s390x/s390-virtio-ccw.c
@@ -806,6 +806,7 @@ static void ccw_machine_7_0_instance_options(MachineState 
*machine)
 static const S390FeatInit qemu_cpu_feat = { S390_FEAT_LIST_QEMU_V7_0 };
 
 ccw_machine_7_1_instance_options(machine);
+s390_cpudef_featoff_greater(16, 1, S390_FEAT_PAIE);
 s390_set_qemu_cpu_model(0x8561, 15, 1, qemu_cpu_feat);
 }
 
diff --git a/target/s390x/cpu_features_def.h.inc 
b/target/s390x/cpu_features_def.h.inc
index 3603e5fb12c6..e3cfe637354b 100644
--- a/target/s390x/cpu_features_def.h.inc
+++ b/target/s390x/cpu_features_def.h.inc
@@ -114,6 +114,7 @@ DEF_FEAT(VECTOR_PACKED_DECIMAL_ENH2, "vxpdeh2", STFL, 192, 
"Vector-Packed-Decima
 DEF_FEAT(BEAR_ENH, "beareh", STFL, 193, "BEAR-enhancement facility")
 DEF_FEAT(RDP, "rdp", STFL, 194, "Reset-DAT-protection facility")
 DEF_FEAT(PAI, "pai", STFL, 196, "Processor-Activity-Instrumentation facility")
+DEF_FEAT(PAIE, "paie", STFL, 197, "Processor-Activity-Instrumentation 
extension-1")
 
 /* Features exposed via SCLP SCCB Byte 80 - 98  (bit numbers relative to 
byte-80) */
 DEF_FEAT(SIE_GSLS, "gsls", SCLP_CONF_CHAR, 40, "SIE: 
Guest-storage-limit-suppression facility")
diff --git a/target/s390x/gen-features.c b/target/s390x/gen-features.c
index ad140184b903..1558c5262616 100644
--- a/target/s390x/gen-features.c
+++ b/target/s390x/gen-features.c
@@ -575,6 +575,7 @@ static uint16_t full_GEN16_GA1[] = {
 S390_FEAT_BEAR_ENH,
 S390_FEAT_RDP,
 S390_FEAT_PAI,
+S390_FEAT_PAIE,
 };
 
 
@@ -669,6 +670,7 @@ static uint16_t default_GEN16_GA1[] = {
 S390_FEAT_BEAR_ENH,
 S390_FEAT_RDP,
 S390_FEAT_PAI,
+S390_FEAT_PAIE,
 };
 
 /* QEMU (CPU model) features */
-- 
2.36.1

[PATCH v3 12/21] hw/virtio: add vhost-user-gpio-pci boilerplate

2022-07-26 Thread Alex Bennée

From: Viresh Kumar 

This allows us to instantiate a vhost-user-gpio device as part of a PCI
bus. It is mostly boilerplate which looks pretty similar to the
vhost-user-fs-pci device.

Signed-off-by: Viresh Kumar 
Reviewed-by: Alex Bennée 
Message-Id: 
<5f560cab92d0d789b1c94295ec74b9952907d69d.1641987128.git.viresh.ku...@linaro.org>
Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost-user-gpio-pci.c | 69 +
 hw/virtio/meson.build   |  1 +
 2 files changed, 70 insertions(+)
 create mode 100644 hw/virtio/vhost-user-gpio-pci.c

diff --git a/hw/virtio/vhost-user-gpio-pci.c b/hw/virtio/vhost-user-gpio-pci.c
new file mode 100644
index 00..b3028a24a1
--- /dev/null
+++ b/hw/virtio/vhost-user-gpio-pci.c
@@ -0,0 +1,69 @@
+/*
+ * Vhost-user gpio virtio device PCI glue
+ *
+ * Copyright (c) 2022 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/vhost-user-gpio.h"
+#include "hw/virtio/virtio-pci.h"
+
+struct VHostUserGPIOPCI {
+VirtIOPCIProxy parent_obj;
+VHostUserGPIO vdev;
+};
+
+typedef struct VHostUserGPIOPCI VHostUserGPIOPCI;
+
+#define TYPE_VHOST_USER_GPIO_PCI "vhost-user-gpio-pci-base"
+
+DECLARE_INSTANCE_CHECKER(VHostUserGPIOPCI, VHOST_USER_GPIO_PCI,
+ TYPE_VHOST_USER_GPIO_PCI)
+
+static void vhost_user_gpio_pci_realize(VirtIOPCIProxy *vpci_dev, Error **errp)
+{
+VHostUserGPIOPCI *dev = VHOST_USER_GPIO_PCI(vpci_dev);
+DeviceState *vdev = DEVICE(&dev->vdev);
+
+vpci_dev->nvectors = 1;
+qdev_realize(vdev, BUS(&vpci_dev->bus), errp);
+}
+
+static void vhost_user_gpio_pci_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+VirtioPCIClass *k = VIRTIO_PCI_CLASS(klass);
+PCIDeviceClass *pcidev_k = PCI_DEVICE_CLASS(klass);
+k->realize = vhost_user_gpio_pci_realize;
+set_bit(DEVICE_CATEGORY_INPUT, dc->categories);
+pcidev_k->vendor_id = PCI_VENDOR_ID_REDHAT_QUMRANET;
+pcidev_k->device_id = 0; /* Set by virtio-pci based on virtio id */
+pcidev_k->revision = 0x00;
+pcidev_k->class_id = PCI_CLASS_COMMUNICATION_OTHER;
+}
+
+static void vhost_user_gpio_pci_instance_init(Object *obj)
+{
+VHostUserGPIOPCI *dev = VHOST_USER_GPIO_PCI(obj);
+
+virtio_instance_init_common(obj, &dev->vdev, sizeof(dev->vdev),
+TYPE_VHOST_USER_GPIO);
+}
+
+static const VirtioPCIDeviceTypeInfo vhost_user_gpio_pci_info = {
+.base_name = TYPE_VHOST_USER_GPIO_PCI,
+.non_transitional_name = "vhost-user-gpio-pci",
+.instance_size = sizeof(VHostUserGPIOPCI),
+.instance_init = vhost_user_gpio_pci_instance_init,
+.class_init = vhost_user_gpio_pci_class_init,
+};
+
+static void vhost_user_gpio_pci_register(void)
+{
+virtio_pci_types_register(&vhost_user_gpio_pci_info);
+}
+
+type_init(vhost_user_gpio_pci_register);
diff --git a/hw/virtio/meson.build b/hw/virtio/meson.build
index 33c8e71fab..c14e3db10a 100644
--- a/hw/virtio/meson.build
+++ b/hw/virtio/meson.build
@@ -30,6 +30,7 @@ virtio_ss.add(when: 'CONFIG_VIRTIO_MEM', if_true: 
files('virtio-mem.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_I2C', if_true: 
files('vhost-user-i2c.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_RNG', if_true: 
files('vhost-user-rng.c'))
 virtio_ss.add(when: 'CONFIG_VHOST_USER_GPIO', if_true: 
files('vhost-user-gpio.c'))
+virtio_ss.add(when: ['CONFIG_VIRTIO_PCI', 'CONFIG_VHOST_USER_GPIO'], if_true: 
files('vhost-user-gpio-pci.c'))
 
 virtio_pci_ss = ss.source_set()
 virtio_pci_ss.add(when: 'CONFIG_VHOST_VSOCK', if_true: 
files('vhost-vsock-pci.c'))
-- 
2.30.2

[PATCH v3 16/21] tests/qtest: catch unhandled vhost-user messages

2022-07-26 Thread Alex Bennée

We don't need to act on every message, but let's document the ones we
are expecting to consume so future tests don't get confused about
unhandled bits.

Signed-off-by: Alex Bennée 

---
v1
  - drop g_test_fail() when we get unexpected result, that just hangs
---
 tests/qtest/vhost-user-test.c | 40 +++
 1 file changed, 40 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 968113d591..d0fa034601 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -358,12 +358,41 @@ static void chr_read(void *opaque, const uint8_t *buf, 
int size)
 }
 break;
 
+case VHOST_USER_SET_OWNER:
+/*
+ * We don't need to do anything here, the remote is just
+ * letting us know it is in charge. Just log it.
+ */
+qos_printf("set_owner: start of session\n");
+break;
+
 case VHOST_USER_GET_PROTOCOL_FEATURES:
 if (s->vu_ops->get_protocol_features) {
 s->vu_ops->get_protocol_features(s, chr, &msg);
 }
 break;
 
+case VHOST_USER_SET_PROTOCOL_FEATURES:
+/*
+ * We did set VHOST_USER_F_PROTOCOL_FEATURES so its valid for
+ * the remote end to send this. There is no handshake reply so
+ * just log the details for debugging.
+ */
+qos_printf("set_protocol_features: 0x%"PRIx64 "\n", msg.payload.u64);
+break;
+
+/*
+ * A real vhost-user backend would actually set the size and
+ * address of the vrings but we can simply report them.
+ */
+case VHOST_USER_SET_VRING_NUM:
+qos_printf("set_vring_num: %d/%d\n",
+   msg.payload.state.index, msg.payload.state.num);
+break;
+case VHOST_USER_SET_VRING_ADDR:
+qos_printf("set_vring_addr:\n");
+break;
+
 case VHOST_USER_GET_VRING_BASE:
 /* send back vring base to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
@@ -428,7 +457,18 @@ static void chr_read(void *opaque, const uint8_t *buf, int 
size)
 qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
 break;
 
+case VHOST_USER_SET_VRING_ENABLE:
+/*
+ * Another case we ignore as we don't need to respond. With a
+ * fully functioning vhost-user we would enable/disable the
+ * vring monitoring.
+ */
+qos_printf("set_vring(%d)=%s\n", msg.payload.state.index,
+   msg.payload.state.num ? "enabled" : "disabled");
+break;
+
 default:
+qos_printf("vhost-user: un-handled message: %d\n", msg.request);
 break;
 }
 
-- 
2.30.2

[PULL 15/16] i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

The added enforcement is only relevant in the case of AMD, where the
range right before 1TB is restricted and cannot be DMA mapped
by the kernel, consequently leading to IOMMU INVALID_DEVICE_REQUEST
or possibly other kinds of IOMMU events in the AMD IOMMU.

However, there's a case where it may make sense to disable the
IOVA relocation/validation: when migrating from a
non-amd-1tb-aware qemu to one that supports it.

Relocating RAM regions to after the 1Tb hole has consequences for
guest ABI because we are changing the memory mapping, so make
sure that only new machine types enforce it, but not older ones.

Signed-off-by: Joao Martins 
Acked-by: Dr. David Alan Gilbert 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-12-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h | 1 +
 hw/i386/pc.c | 6 --
 hw/i386/pc_piix.c| 1 +
 hw/i386/pc_q35.c | 1 +
 4 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 01938fce4c..8435733bd6 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -118,6 +118,7 @@ struct PCMachineClass {
 bool has_reserved_memory;
 bool enforce_aligned_dimm;
 bool broken_reserved_end;
+bool enforce_amd_1tb_hole;
 
 /* generate legacy CPU hotplug AML */
 bool legacy_cpu_hotplug;
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 1c5c9e17c6..7280c02ce3 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -951,9 +951,10 @@ void pc_memory_init(PCMachineState *pcms,
 /*
  * The HyperTransport range close to the 1T boundary is unique to AMD
  * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
- * to above 1T to AMD vCPUs only.
+ * to above 1T to AMD vCPUs only. @enforce_amd_1tb_hole is only false in
+ * older machine types (<= 7.0) for compatibility purposes.
  */
-if (IS_AMD_CPU(&cpu->env)) {
+if (IS_AMD_CPU(&cpu->env) && pcmc->enforce_amd_1tb_hole) {
 /* Bail out if max possible address does not cross HT range */
 if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
 x86ms->above_4g_mem_start = AMD_ABOVE_1TB_START;
@@ -1902,6 +1903,7 @@ static void pc_machine_class_init(ObjectClass *oc, void *data)
 pcmc->has_reserved_memory = true;
 pcmc->kvmclock_enabled = true;
 pcmc->enforce_aligned_dimm = true;
+pcmc->enforce_amd_1tb_hole = true;
 /* BIOS ACPI tables: 128K. Other BIOS datastructures: less than 4K reported
  * to be used at the moment, 32K should be enough for a while.  */
 pcmc->acpi_data_size = 0x20000 + 0x8000;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index aa191d405a..a5c65c1c35 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -451,6 +451,7 @@ static void pc_i440fx_7_0_machine_options(MachineClass *m)
 m->alias = NULL;
 m->is_default = false;
 pcmc->legacy_no_rng_seed = true;
+pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index 307910b33c..3a35193ff7 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -387,6 +387,7 @@ static void pc_q35_7_0_machine_options(MachineClass *m)
 pc_q35_7_1_machine_options(m);
 m->alias = NULL;
 pcmc->legacy_no_rng_seed = true;
+pcmc->enforce_amd_1tb_hole = false;
 compat_props_add(m->compat_props, hw_compat_7_0, hw_compat_7_0_len);
 compat_props_add(m->compat_props, pc_compat_7_0, pc_compat_7_0_len);
 }
-- 
MST
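
For readers following along, the gating this patch adds can be sketched in isolation. This is only a minimal illustration under stated assumptions, not the QEMU code: demo_above_4g_start() and its parameters are hypothetical names, and the constants assume the HyperTransport range FD_0000_0000h to FF_FFFF_FFFFh from the AMD IOMMU spec.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed HyperTransport reserved range (per the AMD IOMMU spec). */
#define DEMO_AMD_HT_START        0xfd00000000ULL
#define DEMO_AMD_HT_END          0xffffffffffULL
#define DEMO_AMD_ABOVE_1TB_START (DEMO_AMD_HT_END + 1)
#define DEMO_4G                  0x100000000ULL

/*
 * Hypothetical helper: pick where ram-above-4g starts.
 * Relocation to 1Tb happens only for AMD vCPUs, only when the machine
 * type enforces the 1Tb hole, and only when the highest used GPA would
 * otherwise reach the HT range.
 */
static uint64_t demo_above_4g_start(bool is_amd_cpu,
                                    bool enforce_amd_1tb_hole,
                                    uint64_t max_used_gpa)
{
    if (is_amd_cpu && enforce_amd_1tb_hole &&
        max_used_gpa >= DEMO_AMD_HT_START) {
        return DEMO_AMD_ABOVE_1TB_START;
    }
    return DEMO_4G;
}
```

With enforce_amd_1tb_hole set to false, as the 7.0 machine options do, the sketch always returns 4G, so the memory layout of older machine types stays unchanged.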

[PULL 12/16] i386/pc: factor out device_memory base/size to helper

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Move obtaining hole64_start from the device_memory memory region
base/size into a helper alongside the corresponding getters in
pc_memory_init() when the hotplug range is uninitialized. While doing
that, remove the memory-region-based logic from this newly added helper.

This is the final step that allows pc_pci_hole64_start() to be callable
at the beginning of pc_memory_init() before any memory regions are
initialized.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-9-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 48 
 1 file changed, 32 insertions(+), 16 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 611eb197da..ebc27e4cb7 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,15 +825,36 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
-static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+static void pc_get_device_memory_range(PCMachineState *pcms,
+   hwaddr *base,
+   ram_addr_t *device_mem_size)
 {
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *machine = MACHINE(pcms);
-hwaddr cxl_base;
+ram_addr_t size;
+hwaddr addr;
 
-if (pcmc->has_reserved_memory && machine->device_memory->base) {
-cxl_base = machine->device_memory->base
-+ memory_region_size(&machine->device_memory->mr);
+size = machine->maxram_size - machine->ram_size;
+addr = ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
+
+if (pcmc->enforce_aligned_dimm) {
+/* size device region assuming 1G page max alignment per slot */
+size += (1 * GiB) * machine->ram_slots;
+}
+
+*base = addr;
+*device_mem_size = size;
+}
+
+static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+hwaddr cxl_base;
+ram_addr_t size;
+
+if (pcmc->has_reserved_memory) {
+pc_get_device_memory_range(pcms, &cxl_base, &size);
+cxl_base += size;
 } else {
 cxl_base = pc_above_4g_end(pcms);
 }
@@ -920,7 +941,7 @@ void pc_memory_init(PCMachineState *pcms,
 /* initialize device memory address space */
 if (pcmc->has_reserved_memory &&
 (machine->ram_size < machine->maxram_size)) {
-ram_addr_t device_mem_size = machine->maxram_size - machine->ram_size;
+ram_addr_t device_mem_size;
 
 if (machine->ram_slots > ACPI_MAX_RAM_SLOTS) {
 error_report("unsupported amount of memory slots: %"PRIu64,
@@ -935,13 +956,7 @@ void pc_memory_init(PCMachineState *pcms,
 exit(EXIT_FAILURE);
 }
 
-machine->device_memory->base =
-ROUND_UP(pc_above_4g_end(pcms), 1 * GiB);
-
-if (pcmc->enforce_aligned_dimm) {
-/* size device region assuming 1G page max alignment per slot */
-device_mem_size += (1 * GiB) * machine->ram_slots;
-}
+pc_get_device_memory_range(pcms, &machine->device_memory->base, &device_mem_size);
 
 if ((machine->device_memory->base + device_mem_size) <
 device_mem_size) {
@@ -1046,13 +1061,14 @@ uint64_t pc_pci_hole64_start(void)
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 MachineState *ms = MACHINE(pcms);
 uint64_t hole64_start = 0;
+ram_addr_t size = 0;
 
 if (pcms->cxl_devices_state.is_enabled) {
 hole64_start = pc_get_cxl_range_end(pcms);
-} else if (pcmc->has_reserved_memory && ms->device_memory->base) {
-hole64_start = ms->device_memory->base;
+} else if (pcmc->has_reserved_memory && (ms->ram_size < ms->maxram_size)) {
+pc_get_device_memory_range(pcms, &hole64_start, &size);
 if (!pcmc->broken_reserved_end) {
-hole64_start += memory_region_size(&ms->device_memory->mr);
+hole64_start += size;
 }
 } else {
 hole64_start = pc_above_4g_end(pcms);
-- 
MST
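
The arithmetic that the new helper centralises can be sketched on its own. This is a simplified, self-contained illustration rather than the QEMU function: the demo_ names are hypothetical, and plain integers stand in for the machine state fields that the patch reads.

```c
#include <stdbool.h>
#include <stdint.h>

#define DEMO_GIB (1ULL << 30)

/* Round x up to a multiple of align (align must be a power of two). */
static uint64_t demo_round_up(uint64_t x, uint64_t align)
{
    return (x + align - 1) & ~(align - 1);
}

/*
 * Hypothetical stand-in for the helper: compute base and size of the
 * device (hotplug) memory region from plain numbers, without touching
 * any memory region object.
 */
static void demo_device_memory_range(uint64_t above_4g_end,
                                     uint64_t ram_size,
                                     uint64_t maxram_size,
                                     uint64_t ram_slots,
                                     bool enforce_aligned_dimm,
                                     uint64_t *base,
                                     uint64_t *size)
{
    uint64_t sz = maxram_size - ram_size;

    if (enforce_aligned_dimm) {
        /* reserve room for 1G alignment of each slot */
        sz += DEMO_GIB * ram_slots;
    }

    *base = demo_round_up(above_4g_end, DEMO_GIB);
    *size = sz;
}
```

Because the result depends only on sizes known before any region is created, a caller can evaluate it early, which is the property the commit message relies on.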

[PATCH v3 13/21] tests/qtest: pass stdout/stderr down to subtests

2022-07-26 Thread Alex Bennée

When trying to work out what the virtio-net-tests were doing it was
hard because the g_test_trap_subprocess redirects all output to
/dev/null. Lift this restriction by using the appropriate flags so you
can see something similar to what the vhost-user-blk tests show when
running.

Signed-off-by: Alex Bennée 
Acked-by: Thomas Huth 
Message-Id: <20220407150042.2338562-1-alex.ben...@linaro.org>

---
v2
  - keep dumping of CLI behind the g_test_verbose flag
---
 tests/qtest/qos-test.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index f97d0a08fd..7e1c8fc579 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -185,7 +185,8 @@ static void run_one_test(const void *arg)
 static void subprocess_run_one_test(const void *arg)
 {
 const gchar *path = arg;
-g_test_trap_subprocess(path, 0, 0);
+g_test_trap_subprocess(path, 0,
+   G_TEST_SUBPROCESS_INHERIT_STDOUT | G_TEST_SUBPROCESS_INHERIT_STDERR);
 g_test_trap_assert_passed();
 }
 
-- 
2.30.2

[PULL 14/16] i386/pc: relocate 4g start to 1T where applicable

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

It is assumed that the whole GPA space is available to be DMA
addressable, within a given address space limit, except for a
tiny region before the 4G. Since Linux v5.4, VFIO validates
whether the selected GPA is indeed valid i.e. not reserved by
IOMMU on behalf of some specific devices or platform-defined
restrictions, and thus failing the ioctl(VFIO_DMA_MAP) with
 -EINVAL.

AMD systems with an IOMMU are examples of such platforms and
particularly may only have these ranges as allowed:

 0000000000000000 - 00000000fedfffff (0      .. 3.982G)
 00000000fef00000 - 000000fcffffffff (3.983G .. 1011.9G)
 0000010000000000 - ffffffffffffffff (1Tb    .. 16Pb[*])

We already account for the 4G hole, albeit if the guest is big
enough we will fail to allocate a guest with  >1010G due to the
~12G hole at the 1Tb boundary, reserved for HyperTransport (HT).

[*] there is another reserved region unrelated to HT that exists
in the 256T boundary in Fam 17h according to Errata #1286,
documented also in "Open-Source Register Reference for AMD Family
17h Processors (PUB)"

When creating the region above 4G, take into account that on AMD
platforms the HyperTransport range is reserved and hence it
cannot be used as GPAs either. In those cases, rather than
establishing the start of ram-above-4g to be 4G, relocate instead
to 1Tb. See AMD IOMMU spec, section 2.1.2 "IOMMU Logical
Topology", for more information on the underlying restriction of
IOVAs.

After accounting for the 1Tb hole on AMD hosts, mtree should
look like:

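0000000000000000-000000007fffffff (prio 0, i/o):
 alias ram-below-4g @pc.ram 0000000000000000-000000007fffffff
0000010000000000-000001ff7fffffff (prio 0, i/o):
 alias ram-above-4g @pc.ram 0000000080000000-000000ffffffffff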

If the relocation is done or the address space covers it, we
also add the HT e820 range as reserved.

Default phys-bits on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough
to address 1Tb (0xff ffff ffff). On AMD platforms, if a
ram-above-4g relocation is attempted and the CPU wasn't configured
with a big enough phys-bits, an error message will be printed
due to the maxphysaddr vs maxusedaddr check previously added.

Suggested-by: Igor Mammedov 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-11-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 54 
 1 file changed, 54 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 56d8c179ea..1c5c9e17c6 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -891,6 +891,40 @@ static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
 return pc_pci_hole64_start() + pci_hole64_size - 1;
 }
 
+/*
+ * AMD systems with an IOMMU have an additional hole close to the
+ * 1Tb, which are special GPAs that cannot be DMA mapped. Depending
+ * on kernel version, VFIO may or may not let you DMA map those ranges.
+ * Starting Linux v5.4 we validate it, and can't create guests on AMD machines
+ * with certain memory sizes. It's also wrong to use those IOVA ranges
+ * in detriment of leading to IOMMU INVALID_DEVICE_REQUEST or worse.
+ * The ranges reserved for Hyper-Transport are:
+ *
+ * FD_0000_0000h - FF_FFFF_FFFFh
+ *
+ * The ranges represent the following:
+ *
+ * Base Address   Top Address  Use
+ *
+ * FD_0000_0000h FD_F7FF_FFFFh Reserved interrupt address space
+ * FD_F800_0000h FD_F8FF_FFFFh Interrupt/EOI IntCtl
+ * FD_F900_0000h FD_F90F_FFFFh Legacy PIC IACK
+ * FD_F910_0000h FD_F91F_FFFFh System Management
+ * FD_F920_0000h FD_FAFF_FFFFh Reserved Page Tables
+ * FD_FB00_0000h FD_FBFF_FFFFh Address Translation
+ * FD_FC00_0000h FD_FDFF_FFFFh I/O Space
+ * FD_FE00_0000h FD_FFFF_FFFFh Configuration
+ * FE_0000_0000h FE_1FFF_FFFFh Extended Configuration/Device Messages
+ * FE_2000_0000h FF_FFFF_FFFFh Reserved
+ *
+ * See AMD IOMMU spec, section 2.1.2 "IOMMU Logical Topology",
+ * Table 3: Special Address Controls (GPA) for more information.
+ */
+#define AMD_HT_START 0xfd00000000UL
+#define AMD_HT_END   0xffffffffffUL
+#define AMD_ABOVE_1TB_START  (AMD_HT_END + 1)
+#define AMD_HT_SIZE  (AMD_ABOVE_1TB_START - AMD_HT_START)
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -914,6 +948,26 @@ void pc_memory_init(PCMachineState *pcms,
 
 linux_boot = (machine->kernel_filename != NULL);
 
+/*
+ * The HyperTransport range close to the 1T boundary is unique to AMD
+ * hosts with IOMMUs enabled. Restrict the ram-above-4g relocation
+ * to above 1T to AMD vCPUs only.
+ */
+if (IS_AMD_CPU(&cpu->env)) {
+/* Bail out if max possible address does not cross HT range */
+if (pc_max_used_gpa(pcms, pci_hole64_size) >= AMD_HT_START) {
+x86ms->above_4g_mem_start =

[PULL 13/16] i386/pc: bounds check phys-bits against max used GPA

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Calculate max *used* GPA against the CPU maximum possible address
and error out if the former surpasses the latter. This ensures
max used GPA is reachable by configured phys-bits. Default phys-bits
on Qemu is TCG_PHYS_ADDR_BITS (40) which is enough for the CPU to
address 1Tb (0xff ffff ffff) or 1010G (0xfc ffff ffff) in AMD hosts
with IOMMU.

This is preparation for AMD guests with >1010G, which will want to
relocate ram-above-4g to after 1Tb instead of 4G.

Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-10-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index ebc27e4cb7..56d8c179ea 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -879,6 +879,18 @@ static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 return start;
 }
 
+static hwaddr pc_max_used_gpa(PCMachineState *pcms, uint64_t pci_hole64_size)
+{
+X86CPU *cpu = X86_CPU(first_cpu);
+
+/* 32-bit systems don't have hole64 thus return max CPU address */
+if (cpu->phys_bits <= 32) {
+return ((hwaddr)1 << cpu->phys_bits) - 1;
+}
+
+return pc_pci_hole64_start() + pci_hole64_size - 1;
+}
+
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
@@ -893,13 +905,28 @@ void pc_memory_init(PCMachineState *pcms,
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
 X86MachineState *x86ms = X86_MACHINE(pcms);
+hwaddr maxphysaddr, maxusedaddr;
 hwaddr cxl_base, cxl_resv_end = 0;
+X86CPU *cpu = X86_CPU(first_cpu);
 
 assert(machine->ram_size == x86ms->below_4g_mem_size +
 x86ms->above_4g_mem_size);
 
 linux_boot = (machine->kernel_filename != NULL);
 
+/*
+ * phys-bits is required to be appropriately configured
+ * to make sure max used GPA is reachable.
+ */
+maxusedaddr = pc_max_used_gpa(pcms, pci_hole64_size);
+maxphysaddr = ((hwaddr)1 << cpu->phys_bits) - 1;
+if (maxphysaddr < maxusedaddr) {
+error_report("Address space limit 0x%"PRIx64" < 0x%"PRIx64
+ " phys-bits too low (%u)",
+ maxphysaddr, maxusedaddr, cpu->phys_bits);
+exit(EXIT_FAILURE);
+}
+
 /*
  * Split single memory region and use aliases to address portions of it,
  * done for backwards compatibility with older qemus.
-- 
MST
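
The bounds check itself is small enough to sketch standalone. This is a minimal illustration with hypothetical demo_ names, assuming phys_bits stays in the range 1..63 so the shift is well defined; it is not the QEMU code, which takes the values from the CPU and machine state.

```c
#include <stdbool.h>
#include <stdint.h>

/* Highest address reachable with the given number of physical bits. */
static uint64_t demo_max_phys_addr(unsigned phys_bits)
{
    /* assumption: 1 <= phys_bits <= 63 */
    return ((uint64_t)1 << phys_bits) - 1;
}

/* True when every used guest physical address is reachable. */
static bool demo_phys_bits_sufficient(unsigned phys_bits,
                                      uint64_t max_used_gpa)
{
    return demo_max_phys_addr(phys_bits) >= max_used_gpa;
}
```

With 40 bits the limit is 0xff ffff ffff, so a layout whose last used byte sits just below 1Tb passes, while one that extends past 1Tb needs a larger phys-bits setting.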

[PULL 10/16] i386/pc: factor out cxl range start to helper

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Factor out the calculation of the base address of the memory region.
It will be used later on for the cxl range end counterpart calculation
and as well in pc_memory_init() CXL memory region initialization, thus
avoiding duplication.

Cc: Jonathan Cameron 
Signed-off-by: Joao Martins 
Acked-by: Igor Mammedov 
Message-Id: <20220719170014.27028-7-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/i386/pc.c | 24 +---
 1 file changed, 17 insertions(+), 7 deletions(-)

diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index cb27309e76..9e1a067c41 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -825,6 +825,22 @@ static hwaddr pc_above_4g_end(PCMachineState *pcms)
 return x86ms->above_4g_mem_start + x86ms->above_4g_mem_size;
 }
 
+static uint64_t pc_get_cxl_range_start(PCMachineState *pcms)
+{
+PCMachineClass *pcmc = PC_MACHINE_GET_CLASS(pcms);
+MachineState *machine = MACHINE(pcms);
+hwaddr cxl_base;
+
+if (pcmc->has_reserved_memory && machine->device_memory->base) {
+cxl_base = machine->device_memory->base
++ memory_region_size(&machine->device_memory->mr);
+} else {
+cxl_base = pc_above_4g_end(pcms);
+}
+
+return cxl_base;
+}
+
 static uint64_t pc_get_cxl_range_end(PCMachineState *pcms)
 {
 uint64_t start = 0;
@@ -946,13 +962,7 @@ void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *mr = &pcms->cxl_devices_state.host_mr;
 hwaddr cxl_size = MiB;
 
-if (pcmc->has_reserved_memory && machine->device_memory->base) {
-cxl_base = machine->device_memory->base
-+ memory_region_size(&machine->device_memory->mr);
-} else {
-cxl_base = pc_above_4g_end(pcms);
-}
-
+cxl_base = pc_get_cxl_range_start(pcms);
 e820_add_entry(cxl_base, cxl_size, E820_RESERVED);
 memory_region_init(mr, OBJECT(machine), "cxl_host_reg", cxl_size);
 memory_region_add_subregion(system_memory, cxl_base, mr);
-- 
MST

[PATCH v3 20/21] tests/qtest: add a get_features op to vhost-user-test

2022-07-26 Thread Alex Bennée

As we expand this test for more virtio devices we will need to support
different feature sets. Add a mandatory op field to fetch the list of
features needed for the test itself.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 37 +--
 1 file changed, 27 insertions(+), 10 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d546721f5d..28b4cf28ec 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -171,10 +171,11 @@ struct vhost_user_ops {
 const char *chr_opts);
 
 /* VHOST-USER commands. */
+uint64_t (*get_features)(TestServer *s);
 void (*set_features)(TestServer *s, CharBackend *chr,
-VhostUserMsg *msg);
+ VhostUserMsg *msg);
 void (*get_protocol_features)(TestServer *s,
-CharBackend *chr, VhostUserMsg *msg);
+  CharBackend *chr, VhostUserMsg *msg);
 };
 
 static const char *init_hugepagefs(void);
@@ -338,20 +339,22 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 
 switch (msg.request) {
 case VHOST_USER_GET_FEATURES:
+/* Mandatory for tests to define get_features */
+g_assert(s->vu_ops->get_features);
+
 /* send back features to qemu */
 msg.flags |= VHOST_USER_REPLY_MASK;
 msg.size = sizeof(m.payload.u64);
-msg.payload.u64 = 0x1ULL << VHOST_F_LOG_ALL |
-0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
-if (s->queues > 1) {
-msg.payload.u64 |= 0x1ULL << VIRTIO_NET_F_MQ;
-}
+
 if (s->test_flags >= TEST_FLAGS_BAD) {
 msg.payload.u64 = 0;
 s->test_flags = TEST_FLAGS_END;
+} else {
+msg.payload.u64 = s->vu_ops->get_features(s);
 }
-p = (uint8_t *) &msg;
-qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+
+qemu_chr_fe_write_all(chr, (uint8_t *) &msg,
+  VHOST_USER_HDR_SIZE + msg.size);
 break;
 
 case VHOST_USER_SET_FEATURES:
@@ -990,8 +993,21 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 wait_for_rings_started(s, s->queues * 2);
 }
 
+
+static uint64_t vu_net_get_features(TestServer *s)
+{
+uint64_t features = 0x1ULL << VHOST_F_LOG_ALL |
+0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES;
+
+if (s->queues > 1) {
+features |= 0x1ULL << VIRTIO_NET_F_MQ;
+}
+
+return features;
+}
+
 static void vu_net_set_features(TestServer *s, CharBackend *chr,
-VhostUserMsg *msg)
+VhostUserMsg *msg)
 {
 g_assert(msg->payload.u64 & (0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES));
 if (s->test_flags == TEST_FLAGS_DISCONNECT) {
@@ -1020,6 +1036,7 @@ static struct vhost_user_ops g_vu_net_ops = {
 
 .append_opts = append_vhost_net_opts,
 
+.get_features = vu_net_get_features,
 .set_features = vu_net_set_features,
 .get_protocol_features = vu_net_get_protocol_features,
 };
-- 
2.30.2

[PATCH v3 17/21] tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES

2022-07-26 Thread Alex Bennée

checkpatch.pl warns that non-plain asserts should be avoided so
convert the check to a plain g_assert.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index d0fa034601..db18e0b664 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -980,8 +980,7 @@ static void test_multiqueue(void *obj, void *arg, QGuestAllocator *alloc)
 static void vu_net_set_features(TestServer *s, CharBackend *chr,
 VhostUserMsg *msg)
 {
-g_assert_cmpint(msg->payload.u64 &
-(0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES), !=, 0ULL);
+g_assert(msg->payload.u64 & (0x1ULL << VHOST_USER_F_PROTOCOL_FEATURES));
 if (s->test_flags == TEST_FLAGS_DISCONNECT) {
 qemu_chr_fe_disconnect(chr);
 s->test_flags = TEST_FLAGS_BAD;
-- 
2.30.2

[PULL 02/16] hw/machine: Clear out left over CXL related pointer from move of state handling to machines.

2022-07-26 Thread Michael S. Tsirkin

From: Jonathan Cameron 

This got left behind in the move of the CXL setup code from core
files to the machines that support it.

Link: https://gitlab.com/qemu-project/qemu/-/commit/1ebf9001fb2701e3c00b401334c8f3900a46adaa
Signed-off-by: Jonathan Cameron 
Message-Id: <20220701132300.2264-2-jonathan.came...@huawei.com>
Acked-by: Igor Mammedov 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/boards.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/hw/boards.h b/include/hw/boards.h
index d94edcef28..7b416c9787 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -360,7 +360,6 @@ struct MachineState {
 CpuTopology smp;
 struct NVDIMMState *nvdimms_state;
 struct NumaState *numa_state;
-CXLFixedMemoryWindowOptionsList *cfmws_list;
 };
 
 #define DEFINE_MACHINE(namestr, machine_initfn) \
-- 
MST

[PULL 07/16] i386/pc: pass pci_hole64_size to pc_memory_init()

2022-07-26 Thread Michael S. Tsirkin

From: Joao Martins 

Use the pre-initialized pci-host qdev and fetch the
pci-hole64-size into pc_memory_init() newly added argument.
Use PCI_HOST_PROP_PCI_HOLE64_SIZE pci-host property for
fetching pci-hole64-size.

This is in preparation to determine that host-phys-bits are
enough and for pci-hole64-size to be considered to relocate
ram-above-4g to be at 1T (on AMD platforms).

Signed-off-by: Joao Martins 
Reviewed-by: Igor Mammedov 
Message-Id: <20220719170014.27028-4-joao.m.mart...@oracle.com>
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 include/hw/i386/pc.h |  3 ++-
 hw/i386/pc.c |  3 ++-
 hw/i386/pc_piix.c|  7 ++-
 hw/i386/pc_q35.c | 10 +-
 4 files changed, 19 insertions(+), 4 deletions(-)

diff --git a/include/hw/i386/pc.h b/include/hw/i386/pc.h
index 2a8ffbcfa8..01938fce4c 100644
--- a/include/hw/i386/pc.h
+++ b/include/hw/i386/pc.h
@@ -162,7 +162,8 @@ void xen_load_linux(PCMachineState *pcms);
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory);
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size);
 uint64_t pc_pci_hole64_start(void);
 DeviceState *pc_vga_init(ISABus *isa_bus, PCIBus *pci_bus);
 void pc_basic_device_init(struct PCMachineState *pcms,
diff --git a/hw/i386/pc.c b/hw/i386/pc.c
index 13b68307be..f4d5b25fdd 100644
--- a/hw/i386/pc.c
+++ b/hw/i386/pc.c
@@ -817,7 +817,8 @@ void xen_load_linux(PCMachineState *pcms)
 void pc_memory_init(PCMachineState *pcms,
 MemoryRegion *system_memory,
 MemoryRegion *rom_memory,
-MemoryRegion **ram_memory)
+MemoryRegion **ram_memory,
+uint64_t pci_hole64_size)
 {
 int linux_boot, i;
 MemoryRegion *option_rom_mr;
diff --git a/hw/i386/pc_piix.c b/hw/i386/pc_piix.c
index b8b3ce3408..aa191d405a 100644
--- a/hw/i386/pc_piix.c
+++ b/hw/i386/pc_piix.c
@@ -91,6 +91,7 @@ static void pc_init1(MachineState *machine,
 MemoryRegion *pci_memory;
 MemoryRegion *rom_memory;
 ram_addr_t lowmem;
+uint64_t hole64_size;
 DeviceState *i440fx_host;
 
 /*
@@ -166,10 +167,14 @@ static void pc_init1(MachineState *machine,
 memory_region_init(pci_memory, NULL, "pci", UINT64_MAX);
 rom_memory = pci_memory;
 i440fx_host = qdev_new(host_type);
+hole64_size = object_property_get_uint(OBJECT(i440fx_host),
+   PCI_HOST_PROP_PCI_HOLE64_SIZE,
+   &error_abort);
 } else {
 pci_memory = NULL;
 rom_memory = system_memory;
 i440fx_host = NULL;
+hole64_size = 0;
 }
 
 pc_guest_info_init(pcms);
@@ -186,7 +191,7 @@ static void pc_init1(MachineState *machine,
 /* allocate ram and load rom/bios */
 if (!xen_enabled()) {
 pc_memory_init(pcms, system_memory,
-   rom_memory, &ram_memory);
+   rom_memory, &ram_memory, hole64_size);
 } else {
 pc_system_flash_cleanup_unused(pcms);
 if (machine->kernel_filename != NULL) {
diff --git a/hw/i386/pc_q35.c b/hw/i386/pc_q35.c
index f4d23b1469..307910b33c 100644
--- a/hw/i386/pc_q35.c
+++ b/hw/i386/pc_q35.c
@@ -138,6 +138,7 @@ static void pc_q35_init(MachineState *machine)
 MachineClass *mc = MACHINE_GET_CLASS(machine);
 bool acpi_pcihp;
 bool keep_pci_slot_hpc;
+uint64_t pci_hole64_size = 0;
 
 /* Check whether RAM fits below 4G (leaving 1/2 GByte for IO memory
  * and 256 Mbytes for PCI Express Enhanced Configuration Access Mapping
@@ -206,8 +207,15 @@ static void pc_q35_init(MachineState *machine)
 /* create pci host bus */
 q35_host = Q35_HOST_DEVICE(qdev_new(TYPE_Q35_HOST_DEVICE));
 
+if (pcmc->pci_enabled) {
+pci_hole64_size = object_property_get_uint(OBJECT(q35_host),
+   PCI_HOST_PROP_PCI_HOLE64_SIZE,
+   &error_abort);
+}
+
 /* allocate ram and load rom/bios */
-pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory);
+pc_memory_init(pcms, get_system_memory(), rom_memory, &ram_memory,
+   pci_hole64_size);
 
 object_property_add_child(qdev_get_machine(), "q35", OBJECT(q35_host));
 object_property_set_link(OBJECT(q35_host), MCH_HOST_PROP_RAM_MEM,
-- 
MST

Re: [RFC 1/2] hw/ppc/ppc440_uc: Initialize length passed to cpu_physical_memory_map()

2022-07-26 Thread Richard Henderson


On 7/26/22 11:23, Peter Maydell wrote:

In dcr_write_dma(), there is code that uses cpu_physical_memory_map()
to implement a DMA transfer.  That function takes a 'plen' argument,
which points to a hwaddr which is used for both input and output: the
caller must set it to the size of the range it wants to map, and on
return it is updated to the actual length mapped. The dcr_write_dma()
code fails to initialize rlen and wlen, so will end up mapping an
unpredictable amount of memory.

Initialize the length values correctly, and check that we managed to
map the entire range before using the fast-path memmove().

This was spotted by Coverity, which points out that we never
initialized the variables before using them.

Fixes: Coverity CID 1487137
Signed-off-by: Peter Maydell
---
This seems totally broken, so I presume we just don't have any
guest code that actually exercises this...
---
  hw/ppc/ppc440_uc.c | 5 -
  1 file changed, 4 insertions(+), 1 deletion(-)


Reviewed-by: Richard Henderson 

r~
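
The in/out length convention described in the quoted commit message is easy to get wrong, so here is a small standalone sketch of it. Everything here is hypothetical: demo_map() is a toy stand-in that may map less than requested, not cpu_physical_memory_map(), and demo_dma_copy() only shows the caller-side pattern of initialising the length and checking what came back.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define DEMO_WINDOW 16   /* toy limit on how much one call can map */

static uint8_t demo_ram[64];

/*
 * Toy mapper: *plen is in/out. On entry it holds the requested length,
 * on return the length actually mapped (possibly shorter).
 */
static void *demo_map(size_t addr, size_t *plen)
{
    if (addr >= sizeof(demo_ram)) {
        *plen = 0;
        return NULL;
    }

    size_t avail = sizeof(demo_ram) - addr;
    if (*plen > avail) {
        *plen = avail;
    }
    if (*plen > DEMO_WINDOW) {
        *plen = DEMO_WINDOW;
    }
    return &demo_ram[addr];
}

/* Returns true if the fast path was safe to use. */
static bool demo_dma_copy(size_t src, size_t dst, size_t len)
{
    /* Initialise both lengths before mapping: they are inputs too. */
    size_t rlen = len, wlen = len;
    void *rptr = demo_map(src, &rlen);
    void *wptr = demo_map(dst, &wlen);

    /* Only copy directly when both ranges were mapped in full. */
    if (rptr && wptr && rlen == len && wlen == len) {
        memmove(wptr, rptr, len);
        return true;
    }
    return false;   /* a real caller would fall back to a slow path */
}
```

Leaving rlen and wlen uninitialised would hand an arbitrary request size to the mapper, which is the failure mode Coverity flagged.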

[PATCH v3 11/21] hw/virtio: add boilerplate for vhost-user-gpio device

2022-07-26 Thread Alex Bennée

From: Viresh Kumar 

This creates the QEMU side of the vhost-user-gpio device which connects
to the remote daemon. It is based on the vhost-user-i2c code.

Signed-off-by: Viresh Kumar 
Reviewed-by: Alex Bennée 
Message-Id: <5390324a748194a21bc99b1538e19761a8c64092.1641987128.git.viresh.ku...@linaro.org>
[AJB: fixes for qtest, tweaks to feature bits]
Signed-off-by: Alex Bennée 
Cc: Vincent Whitchurch 

---
v2
  - set VIRTIO_F_VERSION_1
  - set VHOST_USER_F_PROTOCOL_FEATURES
  - terminate feature_bits with VHOST_INVALID_FEATURE_BIT
  - ensure vdev->backend_features set
  - ensure vhost_dev.acked_features set
v3
  - break out vhost_dev structure for code flow reasons
  - use the vhost-user-blk connection lifecycle code
  - follow ->parent_obj style for VHostUserGPIO object
  - add all feature bits supported by the rust-vmm backend
  - clean-up errp propagation to avoid local_err and use ERRP_GUARD
  - s/vhost_dev->features/vdev->guest_features/ when calling vhost_ack_features
  - drop VHOST_USER_F_PROTOCOL_FEATURES definition (pushed to vhost-user)
  - explicitly call vhost_set_vring_enable due to properly negotiated VHOST_USER_F_PROTOCOL_FEATURES
---
 include/hw/virtio/vhost-user-gpio.h |  35 +++
 hw/virtio/vhost-user-gpio.c | 414 
 hw/virtio/Kconfig   |   5 +
 hw/virtio/meson.build   |   1 +
 hw/virtio/trace-events  |   5 +
 5 files changed, 460 insertions(+)
 create mode 100644 include/hw/virtio/vhost-user-gpio.h
 create mode 100644 hw/virtio/vhost-user-gpio.c

diff --git a/include/hw/virtio/vhost-user-gpio.h b/include/hw/virtio/vhost-user-gpio.h
new file mode 100644
index 0000000000..4fe9aeecc0
--- /dev/null
+++ b/include/hw/virtio/vhost-user-gpio.h
@@ -0,0 +1,35 @@
+/*
+ * Vhost-user GPIO virtio device
+ *
+ * Copyright (c) 2021 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef _QEMU_VHOST_USER_GPIO_H
+#define _QEMU_VHOST_USER_GPIO_H
+
+#include "hw/virtio/virtio.h"
+#include "hw/virtio/vhost.h"
+#include "hw/virtio/vhost-user.h"
+#include "standard-headers/linux/virtio_gpio.h"
+#include "chardev/char-fe.h"
+
+#define TYPE_VHOST_USER_GPIO "vhost-user-gpio-device"
+OBJECT_DECLARE_SIMPLE_TYPE(VHostUserGPIO, VHOST_USER_GPIO);
+
+struct VHostUserGPIO {
+/*< private >*/
+VirtIODevice parent_obj;
+CharBackend chardev;
+struct virtio_gpio_config config;
+struct vhost_virtqueue *vhost_vq;
+struct vhost_dev vhost_dev;
+VhostUserState vhost_user;
+VirtQueue *command_vq;
+VirtQueue *interrupt_vq;
+bool connected;
+/*< public >*/
+};
+
+#endif /* _QEMU_VHOST_USER_GPIO_H */
diff --git a/hw/virtio/vhost-user-gpio.c b/hw/virtio/vhost-user-gpio.c
new file mode 100644
index 0000000000..d487966c7e
--- /dev/null
+++ b/hw/virtio/vhost-user-gpio.c
@@ -0,0 +1,414 @@
+/*
+ * Vhost-user GPIO virtio device
+ *
+ * Copyright (c) 2022 Viresh Kumar 
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/qdev-properties.h"
+#include "hw/virtio/virtio-bus.h"
+#include "hw/virtio/vhost-user-gpio.h"
+#include "qemu/error-report.h"
+#include "standard-headers/linux/virtio_ids.h"
+#include "trace.h"
+
+#define REALIZE_CONNECTION_RETRIES 3
+
+/* Features required from VirtIO */
+static const int feature_bits[] = {
+VIRTIO_F_VERSION_1,
+VIRTIO_F_NOTIFY_ON_EMPTY,
+VIRTIO_RING_F_INDIRECT_DESC,
+VIRTIO_RING_F_EVENT_IDX,
+VIRTIO_GPIO_F_IRQ,
+VHOST_INVALID_FEATURE_BIT
+};
+
+static void vu_gpio_get_config(VirtIODevice *vdev, uint8_t *config)
+{
+VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
+
+memcpy(config, &gpio->config, sizeof(gpio->config));
+}
+
+static int vu_gpio_config_notifier(struct vhost_dev *dev)
+{
+VHostUserGPIO *gpio = VHOST_USER_GPIO(dev->vdev);
+
+memcpy(dev->vdev->config, &gpio->config, sizeof(gpio->config));
+virtio_notify_config(dev->vdev);
+
+return 0;
+}
+
+const VhostDevConfigOps gpio_ops = {
+.vhost_dev_config_notifier = vu_gpio_config_notifier,
+};
+
+static int vu_gpio_start(VirtIODevice *vdev)
+{
+BusState *qbus = BUS(qdev_get_parent_bus(DEVICE(vdev)));
+VirtioBusClass *k = VIRTIO_BUS_GET_CLASS(qbus);
+VHostUserGPIO *gpio = VHOST_USER_GPIO(vdev);
+struct vhost_dev *vhost_dev = &gpio->vhost_dev;
+int ret, i;
+
+if (!k->set_guest_notifiers) {
+error_report("binding does not support guest notifiers");
+return -ENOSYS;
+}
+
+ret = vhost_dev_enable_notifiers(vhost_dev, vdev);
+if (ret < 0) {
+error_report("Error enabling host notifiers: %d", ret);
+return ret;
+}
+
+ret = k->set_guest_notifiers(qbus->parent, vhost_dev->nvqs, true);
+if (ret < 0) {
+error_report("Error binding guest notifier: %d", ret);
+goto err_host_notifiers;
+}
+
+/*
+ * Before we start up we need to ensure we have the final feature
+ * set needed for the vhost configuration. The backend may also
+

[PATCH v3 08/21] hw/virtio: handle un-configured shutdown in virtio-pci

2022-07-26 Thread Alex Bennée

The assert() protecting against leakage is a little aggressive and
causes needless crashes if a device is shut down without having been
configured. In this case no notifiers are leaked because none have
been assigned.

Signed-off-by: Alex Bennée 
---
 hw/virtio/virtio-pci.c | 9 +++--
 1 file changed, 7 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/virtio-pci.c b/hw/virtio/virtio-pci.c
index 45327f0b31..5ce61f9b45 100644
--- a/hw/virtio/virtio-pci.c
+++ b/hw/virtio/virtio-pci.c
@@ -996,9 +996,14 @@ static int virtio_pci_set_guest_notifiers(DeviceState *d, int nvqs, bool assign)
 
 nvqs = MIN(nvqs, VIRTIO_QUEUE_MAX);
 
-/* When deassigning, pass a consistent nvqs value
- * to avoid leaking notifiers.
+/*
+ * When deassigning, pass a consistent nvqs value to avoid leaking
+ * notifiers. But first check we've actually been configured, exit
+ * early if we haven't.
  */
+if (!assign && !proxy->nvqs_with_notifiers) {
+return 0;
+}
 assert(assign || nvqs == proxy->nvqs_with_notifiers);
 
 proxy->nvqs_with_notifiers = nvqs;
-- 
2.30.2
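
The ordering of the two checks in the hunk above can be illustrated with a small standalone model. This is only a sketch: struct proxy_sketch and set_guest_notifiers_sketch() are hypothetical stand-ins for VirtIOPCIProxy and virtio_pci_set_guest_notifiers(), and everything other than the guard and the assert is left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, cut-down stand-in for VirtIOPCIProxy. */
struct proxy_sketch {
    int nvqs_with_notifiers;    /* stays 0 until notifiers are assigned */
};

/* Models only the guard and the assert from the patch. */
static int set_guest_notifiers_sketch(struct proxy_sketch *proxy,
                                      int nvqs, bool assign)
{
    /* Deassign on a device that was never configured: nothing to release. */
    if (!assign && !proxy->nvqs_with_notifiers) {
        return 0;
    }

    /* A genuine deassign must pass the same nvqs that was assigned. */
    assert(assign || nvqs == proxy->nvqs_with_notifiers);

    proxy->nvqs_with_notifiers = nvqs;
    return 0;
}
```

Without the early return, the first call on a freshly created proxy with assign == false and a non-zero nvqs would trip the assert, which is the crash the commit message describes.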

[PULL 16/16] hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP

2022-07-26 Thread Michael S. Tsirkin

From: Jean-Philippe Brucker 

Currently we only enforce power-of-two mappings (required by the QEMU
notifier) for UNMAP requests. A MAP request not aligned on a
power-of-two may be successfully handled by VFIO, and then the
corresponding UNMAP notify will fail because it will attempt to split
that mapping. Ensure MAP and UNMAP notifications are consistent.

Fixes: dde3f08b5cab ("virtio-iommu: Handle non power of 2 range invalidations")
Reported-by: Tina Zhang 
Signed-off-by: Jean-Philippe Brucker 
Message-Id: <20220718135636.338264-1-jean-phili...@linaro.org>
Tested-by: Tina Zhang 
Reviewed-by: Michael S. Tsirkin 
Signed-off-by: Michael S. Tsirkin 
---
 hw/virtio/virtio-iommu.c | 47 
 1 file changed, 28 insertions(+), 19 deletions(-)

diff --git a/hw/virtio/virtio-iommu.c b/hw/virtio/virtio-iommu.c
index 281152d338..62e07ec2e4 100644
--- a/hw/virtio/virtio-iommu.c
+++ b/hw/virtio/virtio-iommu.c
@@ -197,6 +197,32 @@ static gint interval_cmp(gconstpointer a, gconstpointer b, gpointer user_data)
 }
 }
 
+static void virtio_iommu_notify_map_unmap(IOMMUMemoryRegion *mr,
+  IOMMUTLBEvent *event,
+  hwaddr virt_start, hwaddr virt_end)
+{
+uint64_t delta = virt_end - virt_start;
+
+event->entry.iova = virt_start;
+event->entry.addr_mask = delta;
+
+if (delta == UINT64_MAX) {
+memory_region_notify_iommu(mr, 0, *event);
+}
+
+while (virt_start != virt_end + 1) {
+uint64_t mask = dma_aligned_pow2_mask(virt_start, virt_end, 64);
+
+event->entry.addr_mask = mask;
+event->entry.iova = virt_start;
+memory_region_notify_iommu(mr, 0, *event);
+virt_start += mask + 1;
+if (event->entry.perm != IOMMU_NONE) {
+event->entry.translated_addr += mask + 1;
+}
+}
+}
+
 static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start,
 hwaddr virt_end, hwaddr paddr,
 uint32_t flags)
@@ -215,19 +241,16 @@ static void virtio_iommu_notify_map(IOMMUMemoryRegion *mr, hwaddr virt_start,
 
 event.type = IOMMU_NOTIFIER_MAP;
 event.entry.target_as = &address_space_memory;
-event.entry.addr_mask = virt_end - virt_start;
-event.entry.iova = virt_start;
 event.entry.perm = perm;
 event.entry.translated_addr = paddr;
 
-memory_region_notify_iommu(mr, 0, event);
+virtio_iommu_notify_map_unmap(mr, &event, virt_start, virt_end);
 }
 
 static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
   hwaddr virt_end)
 {
 IOMMUTLBEvent event;
-uint64_t delta = virt_end - virt_start;
 
 if (!(mr->iommu_notify_flags & IOMMU_NOTIFIER_UNMAP)) {
 return;
@@ -239,22 +262,8 @@ static void virtio_iommu_notify_unmap(IOMMUMemoryRegion *mr, hwaddr virt_start,
 event.entry.target_as = &address_space_memory;
 event.entry.perm = IOMMU_NONE;
 event.entry.translated_addr = 0;
-event.entry.addr_mask = delta;
-event.entry.iova = virt_start;
 
-if (delta == UINT64_MAX) {
-memory_region_notify_iommu(mr, 0, event);
-}
-
-
-while (virt_start != virt_end + 1) {
-uint64_t mask = dma_aligned_pow2_mask(virt_start, virt_end, 64);
-
-event.entry.addr_mask = mask;
-event.entry.iova = virt_start;
-memory_region_notify_iommu(mr, 0, event);
-virt_start += mask + 1;
-}
+virtio_iommu_notify_map_unmap(mr, &event, virt_start, virt_end);
 }
 
 static gboolean virtio_iommu_notify_unmap_cb(gpointer key, gpointer value,
-- 
MST
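
The splitting loop that this patch shares between MAP and UNMAP can be modelled in isolation. The sketch below is an illustration under stated assumptions, not the QEMU code: aligned_pow2_mask_sketch() is a hypothetical stand-in for dma_aligned_pow2_mask() (its real implementation is not shown in this thread), struct chunk_sketch replaces the IOMMUTLBEvent entry, and start <= end is assumed.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of one notification: base address and size - 1. */
struct chunk_sketch {
    uint64_t iova;
    uint64_t addr_mask;
};

/*
 * Hypothetical stand-in for dma_aligned_pow2_mask(): mask (size - 1) of
 * the largest power-of-two block that starts at 'start', is aligned to
 * its own size and does not run past 'end' (inclusive).
 */
static uint64_t aligned_pow2_mask_sketch(uint64_t start, uint64_t end)
{
    uint64_t mask = 0;

    while (mask != UINT64_MAX) {
        uint64_t next = (mask << 1) | 1;

        /* Stop once the block would be misaligned or overrun the range. */
        if ((start & next) != 0 || next > end - start) {
            break;
        }
        mask = next;
    }
    return mask;
}

/* Splits [start, end] into aligned power-of-two chunks; returns the count. */
static size_t split_range_sketch(uint64_t start, uint64_t end,
                                 struct chunk_sketch *out, size_t max)
{
    size_t n = 0;

    /* The whole 64-bit space is reported as a single chunk. */
    if (end - start == UINT64_MAX) {
        if (max > 0) {
            out[0].iova = start;
            out[0].addr_mask = UINT64_MAX;
        }
        return 1;
    }

    while (start != end + 1) {
        uint64_t mask = aligned_pow2_mask_sketch(start, end);

        if (n < max) {
            out[n].iova = start;
            out[n].addr_mask = mask;
        }
        n++;
        start += mask + 1;    /* unsigned wrap-around is intended */
    }
    return n;
}
```

For the range 0x1000..0x3fff this yields a 4 KiB chunk at 0x1000 followed by an 8 KiB chunk at 0x2000. In the patch itself, the MAP path additionally advances translated_addr by the same step so that each chunk keeps its matching physical address.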

[PATCH v3 14/21] tests/qtest: add a timeout for subprocess_run_one_test

2022-07-26 Thread Alex Bennée

Hangs have been observed in the tests and currently we don't timeout
if a subprocess hangs. Rectify that.

Signed-off-by: Alex Bennée 

---
v3
  - expand timeout to 180 at Thomas' suggestion
---
 tests/qtest/qos-test.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index 7e1c8fc579..ac23def284 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -185,7 +185,7 @@ static void run_one_test(const void *arg)
 static void subprocess_run_one_test(const void *arg)
 {
 const gchar *path = arg;
-g_test_trap_subprocess(path, 0,
+g_test_trap_subprocess(path, 180 * G_USEC_PER_SEC,
G_TEST_SUBPROCESS_INHERIT_STDOUT | 
G_TEST_SUBPROCESS_INHERIT_STDERR);
 g_test_trap_assert_passed();
 }
-- 
2.30.2

[PATCH v3 06/21] hw/virtio: incorporate backend features in features

2022-07-26 Thread Alex Bennée

There are some extra bits used over a vhost-user connection which are
hidden from the device itself. We need to set them here to ensure we
enable things like the protocol extensions.

Currently net/vhost-user.c has its own inscrutable way of persisting
this data but it really should live in the core vhost_user code.

Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost-user.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 55fce18480..a96a586ebf 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -1461,7 +1461,14 @@ static int vhost_user_set_features(struct vhost_dev *dev,
  */
 bool log_enabled = features & (0x1ULL << VHOST_F_LOG_ALL);
 
-return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES, features,
+/*
+ * We need to include any extra backend only feature bits that
+ * might be needed by our device. Currently this includes the
+ * VHOST_USER_F_PROTOCOL_FEATURES bit for enabling protocol
+ * features.
+ */
+return vhost_user_set_u64(dev, VHOST_USER_SET_FEATURES,
+  features | dev->backend_features,
   log_enabled);
 }
 
-- 
2.30.2
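
As a rough illustration of the change above, the value sent with VHOST_USER_SET_FEATURES becomes the union of the negotiated device features and the backend-only bits. The helper name below is hypothetical; the bit number 30 is taken from the VIRTIO_F_BAD_FEATURE discussion elsewhere in this series.

```c
#include <stdint.h>

/* Bit 30 is re-used on the vhost-user wire for protocol features. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30

/*
 * Hypothetical helper: combine the features negotiated with the guest
 * and the backend-only bits that the guest never sees.
 */
static uint64_t features_to_send_sketch(uint64_t features,
                                        uint64_t backend_features)
{
    return features | backend_features;
}
```

With backend_features holding 1ULL << VHOST_USER_F_PROTOCOL_FEATURES, the backend still sees bit 30 set even though the guest-visible feature word has it cleared.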

[PATCH v3 10/21] hw/virtio: add some vhost-user trace events

2022-07-26 Thread Alex Bennée

These are useful for tracing the lifetime of vhost-user connections.

Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost.c  | 6 ++
 hw/virtio/trace-events | 4 
 2 files changed, 10 insertions(+)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index f758f177bb..5185c15295 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1477,6 +1477,8 @@ void vhost_dev_cleanup(struct vhost_dev *hdev)
 {
 int i;
 
+trace_vhost_dev_cleanup(hdev);
+
 for (i = 0; i < hdev->nvqs; ++i) {
 vhost_virtqueue_cleanup(hdev->vqs + i);
 }
@@ -1783,6 +1785,8 @@ int vhost_dev_start(struct vhost_dev *hdev, VirtIODevice *vdev)
 /* should only be called after backend is connected */
 assert(hdev->vhost_ops);
 
+trace_vhost_dev_start(hdev, vdev->name);
+
 vdev->vhost_started = true;
 hdev->started = true;
 hdev->vdev = vdev;
@@ -1869,6 +1873,8 @@ void vhost_dev_stop(struct vhost_dev *hdev, VirtIODevice *vdev)
 /* should only be called after backend is connected */
 assert(hdev->vhost_ops);
 
+trace_vhost_dev_stop(hdev, vdev->name);
+
 if (hdev->vhost_ops->vhost_dev_start) {
 hdev->vhost_ops->vhost_dev_start(hdev, false);
 }
diff --git a/hw/virtio/trace-events b/hw/virtio/trace-events
index 20af2e7ebd..887ca7afa8 100644
--- a/hw/virtio/trace-events
+++ b/hw/virtio/trace-events
@@ -8,6 +8,10 @@ vhost_region_add_section_aligned(const char *name, uint64_t gpa, uint64_t size,
 vhost_section(const char *name) "%s"
 vhost_reject_section(const char *name, int d) "%s:%d"
 vhost_iotlb_miss(void *dev, int step) "%p step %d"
+vhost_dev_cleanup(void *dev) "%p"
+vhost_dev_start(void *dev, const char *name) "%p:%s"
+vhost_dev_stop(void *dev, const char *name) "%p:%s"
+
 
 # vhost-user.c
 vhost_user_postcopy_end_entry(void) ""
-- 
2.30.2

[PULL 00/16] pc,virtio: fixes

2022-07-26 Thread Michael S. Tsirkin

The following changes since commit d1c912b816844aa045082595eba796b5a025dbc4:

  Merge tag 'linux-user-for-7.1-pull-request' of https://gitlab.com/laurent_vivier/qemu into staging (2022-07-26 13:29:26 +0100)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/virt/kvm/mst/qemu.git tags/for_upstream

for you to fetch changes up to 0522be9a0c0094088ccef7aab352c57f483ca250:

  hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP (2022-07-26 15:33:29 -0400)


pc,virtio: fixes

Several fixes. From now on, regression fixes only.

Signed-off-by: Michael S. Tsirkin 


Jean-Philippe Brucker (1):
  hw/virtio/virtio-iommu: Enforce power-of-two notify for both MAP and UNMAP

Joao Martins (11):
  hw/i386: add 4g boundary start to X86MachineState
  i386/pc: create pci-host qdev prior to pc_memory_init()
  i386/pc: pass pci_hole64_size to pc_memory_init()
  i386/pc: factor out above-4g end to an helper
  i386/pc: factor out cxl range end to helper
  i386/pc: factor out cxl range start to helper
  i386/pc: handle unitialized mr in pc_get_cxl_range_end()
  i386/pc: factor out device_memory base/size to helper
  i386/pc: bounds check phys-bits against max used GPA
  i386/pc: relocate 4g start to 1T where applicable
  i386/pc: restrict AMD only enforcing of 1Tb hole to new machine type

Jonathan Cameron (3):
  hw/machine: Clear out left over CXL related pointer from move of state handling to machines.
  hw/i386/pc: Always place CXL Memory Regions after device_memory
  hw/cxl: Fix size of constant in interleave granularity function.

Robert Hoo (1):
  acpi/nvdimm: Define trace events for NVDIMM and substitute nvdimm_debug()

 include/hw/boards.h|   1 -
 include/hw/cxl/cxl_component.h |   2 +-
 include/hw/i386/pc.h   |   4 +-
 include/hw/i386/x86.h  |   3 +
 include/hw/mem/nvdimm.h|   8 --
 include/hw/pci-host/i440fx.h   |   3 +-
 hw/acpi/nvdimm.c   |  35 ---
 hw/i386/acpi-build.c   |   2 +-
 hw/i386/pc.c   | 209 -
 hw/i386/pc_piix.c  |  15 ++-
 hw/i386/pc_q35.c   |  15 ++-
 hw/i386/sgx.c  |   2 +-
 hw/i386/x86.c  |   1 +
 hw/pci-host/i440fx.c   |   5 +-
 hw/virtio/virtio-iommu.c   |  47 +
 hw/acpi/trace-events   |  13 +++
 16 files changed, 258 insertions(+), 107 deletions(-)

[PATCH v3 18/21] tests/qtest: add assert to catch bad features

2022-07-26 Thread Alex Bennée

No device driver (which is what the qvirtio_ access functions
represent) should be setting UNUSED(30) in the feature space. Although
existing libqos users mask it out, let's ensure nothing sneaks through.

Signed-off-by: Alex Bennée 
---
 tests/qtest/libqos/virtio.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/tests/qtest/libqos/virtio.c b/tests/qtest/libqos/virtio.c
index 09ec09b655..03056e5187 100644
--- a/tests/qtest/libqos/virtio.c
+++ b/tests/qtest/libqos/virtio.c
@@ -101,6 +101,8 @@ uint64_t qvirtio_get_features(QVirtioDevice *d)
 
 void qvirtio_set_features(QVirtioDevice *d, uint64_t features)
 {
+g_assert(!(features & QVIRTIO_F_BAD_FEATURE));
+
 d->features = features;
 d->bus->set_features(d, features);
 
-- 
2.30.2
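
Seen from the driver side, the rule the new assert enforces is simply "clear bit 30 before calling set_features". A minimal sketch, assuming a mask of 1ULL << 30 as in libqos; the two helper names are hypothetical:

```c
#include <stdbool.h>
#include <stdint.h>

/* Mask for the UNUSED(30) bit that no driver should negotiate. */
#define BAD_FEATURE_MASK_SKETCH (1ULL << 30)

/* What a well-behaved libqos user does before setting features. */
static uint64_t mask_driver_features_sketch(uint64_t offered)
{
    return offered & ~BAD_FEATURE_MASK_SKETCH;
}

/* The condition that the new g_assert() checks. */
static bool features_are_sane_sketch(uint64_t features)
{
    return !(features & BAD_FEATURE_MASK_SKETCH);
}
```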

[PATCH v3 19/21] tests/qtest: implement stub for VHOST_USER_GET_CONFIG

2022-07-26 Thread Alex Bennée

We don't implement the full solution because frankly none of the tests
need to at the moment. We may end up re-implementing libvhostuser in
the end.

Signed-off-by: Alex Bennée 
---
 tests/qtest/vhost-user-test.c | 13 +
 1 file changed, 13 insertions(+)

diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index db18e0b664..d546721f5d 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -79,6 +79,8 @@ typedef enum VhostUserRequest {
 VHOST_USER_SET_PROTOCOL_FEATURES = 16,
 VHOST_USER_GET_QUEUE_NUM = 17,
 VHOST_USER_SET_VRING_ENABLE = 18,
+VHOST_USER_GET_CONFIG = 24,
+VHOST_USER_SET_CONFIG = 25,
 VHOST_USER_MAX
 } VhostUserRequest;
 
@@ -372,6 +374,17 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 }
 break;
 
+case VHOST_USER_GET_CONFIG:
+/*
+ * Treat GET_CONFIG as a NOP and just reply and let the guest
+ * consider we have updated its memory. Tests currently don't
+ * require working configs.
+ */
+msg.flags |= VHOST_USER_REPLY_MASK;
+p = (uint8_t *) &msg;
+qemu_chr_fe_write_all(chr, p, VHOST_USER_HDR_SIZE + msg.size);
+break;
+
 case VHOST_USER_SET_PROTOCOL_FEATURES:
 /*
  * We did set VHOST_USER_F_PROTOCOL_FEATURES so its valid for
-- 
2.30.2
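
The stub's reply handling amounts to two steps: mark the received message as a reply and send back the header plus the unchanged payload. The sketch below models that with a hypothetical message layout (three 32-bit header fields followed by the payload); the real VhostUserMsg and VHOST_USER_HDR_SIZE definitions live in vhost-user-test.c and are not reproduced here.

```c
#include <stddef.h>
#include <stdint.h>

#define REPLY_MASK_SKETCH (0x1 << 2)    /* same value as VHOST_USER_REPLY_MASK */

/* Hypothetical layout: request, flags and size form the header. */
struct msg_sketch {
    uint32_t request;
    uint32_t flags;
    uint32_t size;          /* payload size in bytes */
    uint8_t payload[64];
};

#define HDR_SIZE_SKETCH offsetof(struct msg_sketch, payload)

/* Marks the message as a reply and returns how many bytes to write back. */
static size_t prepare_nop_reply_sketch(struct msg_sketch *msg)
{
    msg->flags |= REPLY_MASK_SKETCH;
    return HDR_SIZE_SKETCH + msg->size;
}
```

Because the payload is echoed back untouched, the caller is told that its config space was "read" without the stub ever modelling any config contents.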

[PATCH v3 05/21] block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES

2022-07-26 Thread Alex Bennée

This bit is unused in actual VirtIO feature negotiation and should
only appear in the vhost-user messages between master and slave.

[AJB: experiment, this doesn't break the tests but I'm not super
confident of the range of tests]

Signed-off-by: Alex Bennée 
---
 block/export/vhost-user-blk-server.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/block/export/vhost-user-blk-server.c b/block/export/vhost-user-blk-server.c
index 3409d9e02e..d31436006d 100644
--- a/block/export/vhost-user-blk-server.c
+++ b/block/export/vhost-user-blk-server.c
@@ -125,8 +125,7 @@ static uint64_t vu_blk_get_features(VuDev *dev)
1ull << VIRTIO_BLK_F_MQ |
1ull << VIRTIO_F_VERSION_1 |
1ull << VIRTIO_RING_F_INDIRECT_DESC |
-   1ull << VIRTIO_RING_F_EVENT_IDX |
-   1ull << VHOST_USER_F_PROTOCOL_FEATURES;
+   1ull << VIRTIO_RING_F_EVENT_IDX ;
 
 if (!vexp->handler.writable) {
 features |= 1ull << VIRTIO_BLK_F_RO;
-- 
2.30.2

[PATCH v3 21/21] tests/qtest: enable tests for virtio-gpio

2022-07-26 Thread Alex Bennée

We don't have a virtio-gpio implementation in QEMU and only
support a vhost-user backend. The QEMU side of the code is minimal so
it should be enough to instantiate the device and pass some vhost-user
messages over the control socket. To do this we hook into the existing
vhost-user-test code and just add the bits required for gpio.

Signed-off-by: Alex Bennée 
Cc: Viresh Kumar 
Cc: Paolo Bonzini 
Cc: Eric Auger 
Message-Id: <20220408155704.2777166-1-alex.ben...@linaro.org>

---
v2
  - add more of the missing boilerplate
  - don't request LOG_SHMD
  - use get_features op
  - report VIRTIO_F_VERSION_1
  - more comments
---
 tests/qtest/libqos/virtio-gpio.h |  35 +++
 tests/qtest/libqos/virtio-gpio.c | 171 +++
 tests/qtest/libqos/virtio.c  |   2 +-
 tests/qtest/vhost-user-test.c|  66 
 tests/qtest/libqos/meson.build   |   1 +
 5 files changed, 274 insertions(+), 1 deletion(-)
 create mode 100644 tests/qtest/libqos/virtio-gpio.h
 create mode 100644 tests/qtest/libqos/virtio-gpio.c

diff --git a/tests/qtest/libqos/virtio-gpio.h b/tests/qtest/libqos/virtio-gpio.h
new file mode 100644
index 00..f11d41bd19
--- /dev/null
+++ b/tests/qtest/libqos/virtio-gpio.h
@@ -0,0 +1,35 @@
+/*
+ * virtio-gpio structures
+ *
+ * Copyright (c) 2022 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#ifndef TESTS_LIBQOS_VIRTIO_GPIO_H
+#define TESTS_LIBQOS_VIRTIO_GPIO_H
+
+#include "qgraph.h"
+#include "virtio.h"
+#include "virtio-pci.h"
+
+typedef struct QVhostUserGPIO QVhostUserGPIO;
+typedef struct QVhostUserGPIOPCI QVhostUserGPIOPCI;
+typedef struct QVhostUserGPIODevice QVhostUserGPIODevice;
+
+struct QVhostUserGPIO {
+QVirtioDevice *vdev;
+QVirtQueue **queues;
+};
+
+struct QVhostUserGPIOPCI {
+QVirtioPCIDevice pci_vdev;
+QVhostUserGPIO gpio;
+};
+
+struct QVhostUserGPIODevice {
+QOSGraphObject obj;
+QVhostUserGPIO gpio;
+};
+
+#endif
diff --git a/tests/qtest/libqos/virtio-gpio.c b/tests/qtest/libqos/virtio-gpio.c
new file mode 100644
index 00..762aa6695b
--- /dev/null
+++ b/tests/qtest/libqos/virtio-gpio.c
@@ -0,0 +1,171 @@
+/*
+ * virtio-gpio nodes for testing
+ *
+ * Copyright (c) 2022 Linaro Ltd
+ *
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ */
+
+#include "qemu/osdep.h"
+#include "standard-headers/linux/virtio_config.h"
+#include "../libqtest.h"
+#include "qemu/module.h"
+#include "qgraph.h"
+#include "virtio-gpio.h"
+
+static QGuestAllocator *alloc;
+
+static void virtio_gpio_cleanup(QVhostUserGPIO *gpio)
+{
+QVirtioDevice *vdev = gpio->vdev;
+int i;
+
+for (i = 0; i < 2; i++) {
+qvirtqueue_cleanup(vdev->bus, gpio->queues[i], alloc);
+}
+g_free(gpio->queues);
+}
+
+/*
+ * This handles the VirtIO setup from the point of view of the driver
+ * frontend and therefore doesn't present any vhost specific features
+ * and in fact masks off the re-used bit.
+ */
+static void virtio_gpio_setup(QVhostUserGPIO *gpio)
+{
+QVirtioDevice *vdev = gpio->vdev;
+uint64_t features;
+int i;
+
+features = qvirtio_get_features(vdev);
+features &= ~QVIRTIO_F_BAD_FEATURE;
+qvirtio_set_features(vdev, features);
+
+gpio->queues = g_new(QVirtQueue *, 2);
+for (i = 0; i < 2; i++) {
+gpio->queues[i] = qvirtqueue_setup(vdev, alloc, i);
+}
+qvirtio_set_driver_ok(vdev);
+}
+
+static void *qvirtio_gpio_get_driver(QVhostUserGPIO *v_gpio,
+ const char *interface)
+{
+if (!g_strcmp0(interface, "vhost-user-gpio")) {
+return v_gpio;
+}
+if (!g_strcmp0(interface, "virtio")) {
+return v_gpio->vdev;
+}
+
+g_assert_not_reached();
+}
+
+static void *qvirtio_gpio_device_get_driver(void *object,
+const char *interface)
+{
+QVhostUserGPIODevice *v_gpio = object;
+return qvirtio_gpio_get_driver(&v_gpio->gpio, interface);
+}
+
+/* virtio-gpio (mmio) */
+static void qvirtio_gpio_device_destructor(QOSGraphObject *obj)
+{
+QVhostUserGPIODevice *gpio_dev = (QVhostUserGPIODevice *) obj;
+virtio_gpio_cleanup(&gpio_dev->gpio);
+}
+
+static void qvirtio_gpio_device_start_hw(QOSGraphObject *obj)
+{
+QVhostUserGPIODevice *gpio_dev = (QVhostUserGPIODevice *) obj;
+virtio_gpio_setup(&gpio_dev->gpio);
+}
+
+static void *virtio_gpio_device_create(void *virtio_dev,
+   QGuestAllocator *t_alloc,
+   void *addr)
+{
+QVhostUserGPIODevice *virtio_device = g_new0(QVhostUserGPIODevice, 1);
+QVhostUserGPIO *interface = &virtio_device->gpio;
+
+interface->vdev = virtio_dev;
+alloc = t_alloc;
+
+virtio_device->obj.get_driver = qvirtio_gpio_device_get_driver;
+virtio_device->obj.start_hw = qvirtio_gpio_device_start_hw;
+virtio_device->obj.destructor = qvirtio_gpio_device_destructor;
+
+return &virtio_device->obj;
+}
+
+/* virtio-gpio-pci */
+static void

[PATCH v3 02/21] include/hw: document vhost_dev feature life-cycle

2022-07-26 Thread Alex Bennée

Try and explicitly document the various states of feature bits as
related to the vhost_dev structure. Importantly the backend_features
can advertise things like VHOST_USER_F_PROTOCOL_FEATURES which is
never exposed to the driver and is only present in the vhost-user
feature negotiation.

Signed-off-by: Alex Bennée 
---
 include/hw/virtio/vhost.h | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/include/hw/virtio/vhost.h b/include/hw/virtio/vhost.h
index a346f23d13..586c5457e2 100644
--- a/include/hw/virtio/vhost.h
+++ b/include/hw/virtio/vhost.h
@@ -86,8 +86,11 @@ struct vhost_dev {
 /* if non-zero, minimum required value for max_queues */
 int num_queues;
 uint64_t features;
+/** @acked_features: final set of negotiated features */
 uint64_t acked_features;
+/** @backend_features: backend specific feature bits */
 uint64_t backend_features;
+/** @protocol_features: final negotiated protocol features */
 uint64_t protocol_features;
 uint64_t max_queues;
 uint64_t backend_cap;
-- 
2.30.2

[PATCH v3 07/21] hw/virtio: gracefully handle unset vhost_dev vdev

2022-07-26 Thread Alex Bennée

I've noticed asserts firing because we query the status of vdev after
a vhost connection is closed down. Rather than faulting on the NULL
indirect just quietly reply false.

Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost.c | 10 +++---
 1 file changed, 7 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index 0827d631c0..f758f177bb 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -306,7 +306,7 @@ static inline void vhost_dev_log_resize(struct vhost_dev *dev, uint64_t size)
 dev->log_size = size;
 }
 
-static int vhost_dev_has_iommu(struct vhost_dev *dev)
+static bool vhost_dev_has_iommu(struct vhost_dev *dev)
 {
 VirtIODevice *vdev = dev->vdev;
 
@@ -316,8 +316,12 @@ static int vhost_dev_has_iommu(struct vhost_dev *dev)
  * does not have IOMMU, there's no need to enable this feature
  * which may cause unnecessary IOTLB miss/update transactions.
  */
-return virtio_bus_device_iommu_enabled(vdev) &&
-   virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+if (vdev) {
+return virtio_bus_device_iommu_enabled(vdev) &&
+virtio_host_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM);
+} else {
+return false;
+}
 }
 
 static void *vhost_memory_map(struct vhost_dev *dev, hwaddr addr,
-- 
2.30.2
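
The shape of the fix is a plain NULL guard in front of the two feature checks. A self-contained sketch follows; the structs and field names are hypothetical reductions of vhost_dev and VirtIODevice, with two booleans standing in for virtio_bus_device_iommu_enabled() and virtio_host_has_feature().

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical reduction of VirtIODevice to the two facts queried. */
struct vdev_sketch {
    bool bus_iommu_enabled;
    bool has_iommu_platform;
};

/* Hypothetical reduction of struct vhost_dev. */
struct vhost_dev_sketch {
    struct vdev_sketch *vdev;    /* NULL once the connection is torn down */
};

static bool dev_has_iommu_sketch(const struct vhost_dev_sketch *dev)
{
    const struct vdev_sketch *vdev = dev->vdev;

    /* No device attached any more: quietly report "no IOMMU". */
    if (!vdev) {
        return false;
    }
    return vdev->bus_iommu_enabled && vdev->has_iommu_platform;
}
```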

[PATCH v3 09/21] hw/virtio: fix vhost_user_read tracepoint

2022-07-26 Thread Alex Bennée

As reads happen in the callback we were never seeing them. We only
really care about the header so move the tracepoint to when the header
is complete.

Fixes: 6ca6d8ee9d (hw/virtio: add vhost_user_[read|write] trace points)
Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost-user.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index a96a586ebf..b7c13e7e16 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -296,6 +296,8 @@ static int vhost_user_read_header(struct vhost_dev *dev, VhostUserMsg *msg)
 return -EPROTO;
 }
 
+trace_vhost_user_read(msg->hdr.request, msg->hdr.flags);
+
 return 0;
 }
 
@@ -545,8 +547,6 @@ static int vhost_user_set_log_base(struct vhost_dev *dev, uint64_t base,
 }
 }
 
-trace_vhost_user_read(msg.hdr.request, msg.hdr.flags);
-
 return 0;
 }
 
-- 
2.30.2

[PATCH v3 for 7.2 00/21] virtio-gpio and various virtio cleanups

2022-07-26 Thread Alex Bennée

Hi,

After much slogging through the vhost-user code I've gotten the
virtio-gpio device working again. The core change is pushing the
responsibility for VHOST_USER_F_PROTOCOL_FEATURES down to the
vhost-user layer (which knows it needs it). We still need to account
for that in virtio-gpio because the result of negotiating protocol
features is that the vrings start disabled, so the stub needs to
explicitly enable them. I did consider pushing this behaviour
explicitly into vhost_dev_start but that would have required
un-picking it from vhost-net (which is the only other device which
uses protocol features AFAICT - but is a measure more complex in its
setup).

As last time there are a whole series of clean-ups and doc tweaks. I
don't know if any are trivial enough to sneak into later RCs but it
shouldn't be a problem to wait until the tree re-opens.

There is a remaining issue that a --enable-sanitizers build fails for
qos-test due to leaks. It shows up as a leak from:

  Direct leak of 240 byte(s) in 1 object(s) allocated from:
  #0 0x7fc5a3f2a037 in __interceptor_calloc ../../../../src/libsanitizer/asan/asan_malloc_linux.cpp:154
  #1 0x7fc5a2e5cda0 in g_malloc0 ../../../glib/gmem.c:136
  #2 0x55ce773cc728 in virtio_device_realize ../../hw/virtio/virtio.c:3691
  #3 0x55ce7784ed7e in device_set_realized ../../hw/core/qdev.c:553
  #4 0x55ce77862d0c in property_set_bool ../../qom/object.c:2273

I'm not entirely sure what the allocation is because it gets inlined
in the virtio_device_realize call. Perhaps it's the QOM object itself
which is never gracefully torn down at the end of the test?

However when I attempted to bisect I found master was broken as well.
For example in my arm/aarch64-softmmu build we see 5 failures:

Summary of Failures:

   3/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/migration-test          ERROR  96.15s   killed by signal 6 SIGABRT
   9/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/qos-test                ERROR  32.50s   killed by signal 6 SIGABRT
  11/48 qemu:qtest+qtest-arm / qtest-arm/qos-test                        ERROR  26.93s   killed by signal 6 SIGABRT
  20/48 qemu:qtest+qtest-aarch64 / qtest-aarch64/device-introspect-test  ERROR   5.17s   killed by signal 6 SIGABRT
  45/48 qemu:qtest+qtest-arm / qtest-arm/device-introspect-test          ERROR   4.97s   killed by signal 6 SIGABRT

Of which the qos-tests are the only new ones. I suspect something must
be preventing the other stuff being exercised in our CI system.

Alex Bennée (19):
  include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE
  include/hw: document vhost_dev feature life-cycle
  hw/virtio: fix some coding style issues
  hw/virtio: log potentially buggy guest drivers
  block/vhost-user-blk-server: don't expose VHOST_USER_F_PROTOCOL_FEATURES
  hw/virtio: incorporate backend features in features
  hw/virtio: gracefully handle unset vhost_dev vdev
  hw/virtio: handle un-configured shutdown in virtio-pci
  hw/virtio: fix vhost_user_read tracepoint
  hw/virtio: add some vhost-user trace events
  tests/qtest: pass stdout/stderr down to subtests
  tests/qtest: add a timeout for subprocess_run_one_test
  tests/qtest: use qos_printf instead of g_test_message
  tests/qtest: catch unhandled vhost-user messages
  tests/qtest: plain g_assert for VHOST_USER_F_PROTOCOL_FEATURES
  tests/qtest: add assert to catch bad features
  tests/qtest: implement stub for VHOST_USER_GET_CONFIG
  tests/qtest: add a get_features op to vhost-user-test
  tests/qtest: enable tests for virtio-gpio

Viresh Kumar (2):
  hw/virtio: add boilerplate for vhost-user-gpio device
  hw/virtio: add vhost-user-gpio-pci boilerplate

 include/hw/virtio/vhost-user-gpio.h  |  35 +++
 include/hw/virtio/vhost.h|   3 +
 include/hw/virtio/virtio.h   |   7 +-
 tests/qtest/libqos/virtio-gpio.h |  35 +++
 block/export/vhost-user-blk-server.c |   3 +-
 hw/virtio/vhost-user-gpio-pci.c  |  69 +
 hw/virtio/vhost-user-gpio.c  | 414 +++
 hw/virtio/vhost-user.c   |  20 +-
 hw/virtio/vhost.c|  16 +-
 hw/virtio/virtio-pci.c   |   9 +-
 hw/virtio/virtio.c   |   7 +
 tests/qtest/libqos/virtio-gpio.c | 171 +++
 tests/qtest/libqos/virtio.c  |   4 +-
 tests/qtest/qos-test.c   |   8 +-
 tests/qtest/vhost-user-test.c| 172 +--

[PATCH v3 04/21] hw/virtio: log potentially buggy guest drivers

2022-07-26 Thread Alex Bennée

If the guest driver attempts to use the UNUSED(30) bit it is
potentially buggy as 6.3 Legacy Interface: Reserved Feature Bits
states it "SHOULD NOT be negotiated". For now just log this guest
error.

Signed-off-by: Alex Bennée 
---
 hw/virtio/virtio.c | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/hw/virtio/virtio.c b/hw/virtio/virtio.c
index 5d607aeaa0..97a6307c0f 100644
--- a/hw/virtio/virtio.c
+++ b/hw/virtio/virtio.c
@@ -2980,6 +2980,13 @@ int virtio_set_features(VirtIODevice *vdev, uint64_t val)
 if (vdev->status & VIRTIO_CONFIG_S_FEATURES_OK) {
 return -EINVAL;
 }
+
+if (val & (1ull << VIRTIO_F_BAD_FEATURE)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: guest driver for %s has enabled UNUSED(30) feature bit!\n",
+  __func__, vdev->name);
+}
+
 ret = virtio_set_features_nocheck(vdev, val);
 if (virtio_vdev_has_feature(vdev, VIRTIO_RING_F_EVENT_IDX)) {
 /* VIRTIO_RING_F_EVENT_IDX changes the size of the caches.  */
-- 
2.30.2

[PATCH v3 15/21] tests/qtest: use qos_printf instead of g_test_message

2022-07-26 Thread Alex Bennée

The vhost-user tests respawn qos-test as a standalone process. As a
result the gtester framework squashes all messages coming out of it
which makes it hard to debug. As the test does not care about asserting
certain messages just convert the tests to use the direct qos_printf.

Signed-off-by: Alex Bennée 
---
 tests/qtest/qos-test.c|  5 +
 tests/qtest/vhost-user-test.c | 13 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/tests/qtest/qos-test.c b/tests/qtest/qos-test.c
index ac23def284..d0bdf06fad 100644
--- a/tests/qtest/qos-test.c
+++ b/tests/qtest/qos-test.c
@@ -320,6 +320,11 @@ static void walk_path(QOSGraphNode *orig_path, int len)
 int main(int argc, char **argv, char** envp)
 {
 g_test_init(&argc, &argv, NULL);
+
+if (g_test_subprocess()) {
+qos_printf("qos_test running single test in subprocess\n");
+}
+
 if (g_test_verbose()) {
 qos_printf("ENVIRONMENT VARIABLES: {\n");
 for (char **env = envp; *env != 0; env++) {
diff --git a/tests/qtest/vhost-user-test.c b/tests/qtest/vhost-user-test.c
index 8bf390be20..968113d591 100644
--- a/tests/qtest/vhost-user-test.c
+++ b/tests/qtest/vhost-user-test.c
@@ -26,6 +26,7 @@
 #include "libqos/virtio-pci.h"
 
 #include "libqos/malloc-pc.h"
+#include "libqos/qgraph_internal.h"
 #include "hw/virtio/virtio-net.h"
 
 #include "standard-headers/linux/vhost_types.h"
@@ -316,7 +317,7 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 }
 
 if (size != VHOST_USER_HDR_SIZE) {
-g_test_message("Wrong message size received %d", size);
+qos_printf("%s: Wrong message size received %d\n", __func__, size);
 return;
 }
 
@@ -327,8 +328,8 @@ static void chr_read(void *opaque, const uint8_t *buf, int size)
 p += VHOST_USER_HDR_SIZE;
 size = qemu_chr_fe_read_all(chr, p, msg.size);
 if (size != msg.size) {
-g_test_message("Wrong message size received %d != %d",
-   size, msg.size);
+qos_printf("%s: Wrong message size received %d != %d\n",
+   __func__, size, msg.size);
 return;
 }
 }
@@ -450,7 +451,7 @@ static const char *init_hugepagefs(void)
 }
 
 if (access(path, R_OK | W_OK | X_OK)) {
-g_test_message("access on path (%s): %s", path, strerror(errno));
+qos_printf("access on path (%s): %s", path, strerror(errno));
 g_test_fail();
 return NULL;
 }
@@ -460,13 +461,13 @@ static const char *init_hugepagefs(void)
 } while (ret != 0 && errno == EINTR);
 
 if (ret != 0) {
-g_test_message("statfs on path (%s): %s", path, strerror(errno));
+qos_printf("statfs on path (%s): %s", path, strerror(errno));
 g_test_fail();
 return NULL;
 }
 
 if (fs.f_type != HUGETLBFS_MAGIC) {
-g_test_message("Warning: path not on HugeTLBFS: %s", path);
+qos_printf("Warning: path not on HugeTLBFS: %s", path);
 g_test_fail();
 return NULL;
 }
-- 
2.30.2

[PATCH v3 03/21] hw/virtio: fix some coding style issues

2022-07-26 Thread Alex Bennée

Signed-off-by: Alex Bennée 
---
 hw/virtio/vhost-user.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/hw/virtio/vhost-user.c b/hw/virtio/vhost-user.c
index 75b8df21a4..55fce18480 100644
--- a/hw/virtio/vhost-user.c
+++ b/hw/virtio/vhost-user.c
@@ -200,7 +200,7 @@ typedef struct {
 VhostUserRequest request;
 
 #define VHOST_USER_VERSION_MASK (0x3)
-#define VHOST_USER_REPLY_MASK   (0x1<<2)
+#define VHOST_USER_REPLY_MASK   (0x1 << 2)
 #define VHOST_USER_NEED_REPLY_MASK  (0x1 << 3)
 uint32_t flags;
 uint32_t size; /* the following payload size */
@@ -208,7 +208,7 @@ typedef struct {
 
 typedef union {
 #define VHOST_USER_VRING_IDX_MASK   (0xff)
-#define VHOST_USER_VRING_NOFD_MASK  (0x1<<8)
+#define VHOST_USER_VRING_NOFD_MASK  (0x1 << 8)
 uint64_t u64;
 struct vhost_vring_state state;
 struct vhost_vring_addr addr;
@@ -248,7 +248,8 @@ struct vhost_user {
 size_t region_rb_len;
 /* RAMBlock associated with a given region */
 RAMBlock **region_rb;
-/* The offset from the start of the RAMBlock to the start of the
+/*
+ * The offset from the start of the RAMBlock to the start of the
  * vhost region.
  */
 ram_addr_t*region_rb_offset;
-- 
2.30.2

[PATCH v3 01/21] include/hw/virtio: more comment for VIRTIO_F_BAD_FEATURE

2022-07-26 Thread Alex Bennée

When debugging a new vhost user you may be surprised to see
VHOST_USER_F_PROTOCOL getting squashed in the maze of
backend_features, acked_features and guest_features. Expand the
description here to help the next poor soul trying to work through
this.

Signed-off-by: Alex Bennée 

---
v3
  - s/vhost/vhost-user/
---
 include/hw/virtio/virtio.h | 7 ++-
 1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/include/hw/virtio/virtio.h b/include/hw/virtio/virtio.h
index db1c0ddf6b..9bb2485415 100644
--- a/include/hw/virtio/virtio.h
+++ b/include/hw/virtio/virtio.h
@@ -24,7 +24,12 @@
 #include "qom/object.h"
 #include "hw/virtio/vhost.h"
 
-/* A guest should never accept this.  It implies negotiation is broken. */
+/*
+ * A guest should never accept this. It implies negotiation is broken
+ * between the driver frontend and the device. This bit is re-used for
+ * vhost-user to advertise VHOST_USER_F_PROTOCOL_FEATURES between QEMU
+ * and a vhost-user backend.
+ */
 #define VIRTIO_F_BAD_FEATURE   30
 
 #define VIRTIO_LEGACY_FEATURES ((0x1ULL << VIRTIO_F_BAD_FEATURE) | \
-- 
2.30.2
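
As a rough illustration of the bit reuse that the expanded comment describes, the sketch below shows how one bit number can carry two meanings depending on who is talking. The helpers `has_feature()` and `guest_visible_features()` are hypothetical and are not QEMU functions; the value 30 for `VHOST_USER_F_PROTOCOL_FEATURES` is assumed from the statement that the bit is re-used. This is not QEMU's actual negotiation path, only a picture of why the bit can appear to vanish from the guest-facing feature set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit number taken from the patch above. */
#define VIRTIO_F_BAD_FEATURE 30
/* Assumption: vhost-user reuses the same bit number between QEMU and backend. */
#define VHOST_USER_F_PROTOCOL_FEATURES 30

/* Hypothetical helper: test one feature bit in a 64-bit feature word. */
static bool has_feature(uint64_t features, unsigned int bit)
{
    assert(bit < 64);
    return (features >> bit) & 1;
}

/*
 * Hypothetical helper: the features a guest is allowed to see.
 * Bit 30 is stripped because, on the guest side, accepting it would
 * mean that negotiation is broken.
 */
static uint64_t guest_visible_features(uint64_t backend_features)
{
    return backend_features & ~(1ULL << VIRTIO_F_BAD_FEATURE);
}
```

With a backend word that has bit 30 set, `has_feature()` reports the protocol-features bit on the backend side, while the result of `guest_visible_features()` no longer carries it.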

[PATCH] Hexagon (tests/tcg/hexagon) add compiler options to EXTRA_CFLAGS

2022-07-26 Thread Taylor Simpson

The cross_cc_cflags_hexagon flags set in configure are not getting
passed to the Hexagon cross compiler.  Set EXTRA_CFLAGS in
tests/tcg/hexagon/Makefile.target instead.

Suggested-by: Richard Henderson 
Signed-off-by: Taylor Simpson 
---
 tests/tcg/hexagon/Makefile.target | 1 +
 1 file changed, 1 insertion(+)

diff --git a/tests/tcg/hexagon/Makefile.target b/tests/tcg/hexagon/Makefile.target
index 23b9870534..627bf58fe6 100644
--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -20,6 +20,7 @@ EXTRA_RUNS =
 
 CFLAGS += -Wno-incompatible-pointer-types -Wno-undefined-internal
 CFLAGS += -fno-unroll-loops
+EXTRA_CFLAGS += -mv67 -O2
 
 HEX_SRC=$(SRC_PATH)/tests/tcg/hexagon
 VPATH += $(HEX_SRC)
-- 
2.17.1

Re: [PATCH 0/2] block/parallels: Fix buffer-based write call

2022-07-26 Thread Vladimir Sementsov-Ogievskiy


On 7/14/22 16:27, Hanna Reitz wrote:

Hi,

While reviewing Stefan’s libblkio driver series, I’ve noticed that
block/parallels.c contains a call to bdrv_co_pwritev() that doesn’t pass
a QEMUIOVector object but a plain buffer instead.  That seems wrong and
also pretty dangerous, so change it to a bdrv_co_pwrite() call (as I
assume it should be), and add a regression test demonstrating the
problem.


Hanna Reitz (2):
   block/parallels: Fix buffer-based write call
   iotests/131: Add parallels regression test

  block/parallels.c  |  4 ++--
  tests/qemu-iotests/131 | 35 ++-
  tests/qemu-iotests/131.out | 13 +
  3 files changed, 49 insertions(+), 3 deletions(-)



Thanks, applied to my block branch.

--
Best regards,
Vladimir

Re: [PULL 0/9] target-arm queue

2022-07-26 Thread Richard Henderson


On 7/26/22 08:20, Peter Maydell wrote:

A last lot of bug fixes before rc0...

thanks
-- PMM

The following changes since commit 0d0275c31f00b71b49eb80bbdca2cfe244cf80fb:

   Merge tag 'net-pull-request' of https://github.com/jasowang/qemu into 
staging (2022-07-26 10:31:02 +0100)

are available in the Git repository at:

   https://git.linaro.org/people/pmaydell/qemu-arm.git 
tags/pull-target-arm-20220726

for you to fetch changes up to 5865d99fe88d8c8fa437c18c6b63fb2a8165634f:

   hw/display/bcm2835_fb: Fix framebuffer allocation address (2022-07-26 
14:09:44 +0100)


target-arm queue:
  * Update Coverity component definitions
  * target/arm: Add MO_128 entry to pred_esz_masks[]
  * configure: Fix portability issues
  * hw/display/bcm2835_fb: Fix framebuffer allocation address


Applied, thanks.  Please update https://wiki.qemu.org/ChangeLog/7.1 as 
appropriate.


r~





Alan Jian (1):
   hw/display/bcm2835_fb: Fix framebuffer allocation address

Peter Maydell (8):
   scripts/coverity-scan/COMPONENTS.md: Add loongarch component
   scripts/coverity-scan/COMPONENTS.md: Update slirp component info
   target/arm: Add MO_128 entry to pred_esz_masks[]
   configure: Add missing POSIX-required space
   configure: Add braces to clarify intent of $emu[[:space:]]
   configure: Don't use bash-specific string-replacement syntax
   configure: Drop dead code attempting to use -msmall-data on alpha hosts
   configure: Avoid '==' bashism

  configure   | 20 +++-
  target/arm/cpu.h|  2 +-
  hw/display/bcm2835_fb.c |  3 +--
  target/arm/translate-sve.c  |  5 +++--
  scripts/coverity-scan/COMPONENTS.md |  7 +--
  5 files changed, 17 insertions(+), 20 deletions(-)

Re: [RFC 1/2] hw/ppc/ppc440_uc: Initialize length passed to cpu_physical_memory_map()

2022-07-26 Thread Peter Maydell

On Tue, 26 Jul 2022 at 19:23, Peter Maydell  wrote:
>
> In dcr_write_dma(), there is code that uses cpu_physical_memory_map()
> to implement a DMA transfer.  That function takes a 'plen' argument,
> which points to a hwaddr which is used for both input and output: the
> caller must set it to the size of the range it wants to map, and on
> return it is updated to the actual length mapped. The dcr_write_dma()
> code fails to initialize rlen and wlen, so will end up mapping an
> unpredictable amount of memory.
>
> Initialize the length values correctly, and check that we managed to
> map the entire range before using the fast-path memmove().
>
> This was spotted by Coverity, which points out that we never
> initialized the variables before using them.
>
> Fixes: Coverity CID 1487137

Also CID 1487150 (it reports the wlen and rlen issues separately).

> Signed-off-by: Peter Maydell 

-- PMM

[RFC 1/2] hw/ppc/ppc440_uc: Initialize length passed to cpu_physical_memory_map()

2022-07-26 Thread Peter Maydell

In dcr_write_dma(), there is code that uses cpu_physical_memory_map()
to implement a DMA transfer.  That function takes a 'plen' argument,
which points to a hwaddr which is used for both input and output: the
caller must set it to the size of the range it wants to map, and on
return it is updated to the actual length mapped. The dcr_write_dma()
code fails to initialize rlen and wlen, so will end up mapping an
unpredictable amount of memory.

Initialize the length values correctly, and check that we managed to
map the entire range before using the fast-path memmove().

This was spotted by Coverity, which points out that we never
initialized the variables before using them.

Fixes: Coverity CID 1487137
Signed-off-by: Peter Maydell 
---
This seems totally broken, so I presume we just don't have any
guest code that actually exercises this...
---
 hw/ppc/ppc440_uc.c | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index a1ecf6dd1c2..11fdb88c220 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -904,14 +904,17 @@ static void dcr_write_dma(void *opaque, int dcrn, uint32_t val)
 int width, i, sidx, didx;
 uint8_t *rptr, *wptr;
 hwaddr rlen, wlen;
+hwaddr xferlen;
 
 sidx = didx = 0;
 width = 1 << ((val & DMA0_CR_PW) >> 25);
+xferlen = count * width;
+wlen = rlen = xferlen;
 rptr = cpu_physical_memory_map(dma->ch[chnl].sa, &rlen,
false);
 wptr = cpu_physical_memory_map(dma->ch[chnl].da, &wlen,
true);
-if (rptr && wptr) {
+if (rptr && rlen == xferlen && wptr && wlen == xferlen) {
 if (!(val & DMA0_CR_DEC) &&
 val & DMA0_CR_SAI && val & DMA0_CR_DAI) {
 /* optimise common case */
-- 
2.25.1
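
The in/out length convention that the commit message describes can be sketched in isolation. The code below is a minimal model, not QEMU code: `mock_map()` is a hypothetical stand-in for a mapping call such as cpu_physical_memory_map(), backed by a small static array, and `can_fastpath()` is a hypothetical caller. It shows the two points the patch makes: the length must be initialized before the call, and the returned length must be compared against the requested one before a bulk copy is attempted.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_RAM_SIZE 64
static uint8_t mock_ram[MOCK_RAM_SIZE];

/*
 * Hypothetical stand-in for a mapping call with an in/out length:
 * on entry *plen is the size the caller wants, on return it is the
 * size actually mapped (possibly smaller). Returns NULL on failure.
 */
static uint8_t *mock_map(size_t addr, size_t *plen)
{
    assert(plen != NULL);
    if (addr >= MOCK_RAM_SIZE) {
        *plen = 0;
        return NULL;
    }
    if (*plen > MOCK_RAM_SIZE - addr) {
        *plen = MOCK_RAM_SIZE - addr;   /* partial mapping */
    }
    return &mock_ram[addr];
}

/* Returns 1 only if the whole range was mapped, so a bulk copy is safe. */
static int can_fastpath(size_t addr, size_t xferlen)
{
    size_t len = xferlen;               /* initialize before the call */
    uint8_t *p = mock_map(addr, &len);

    return p != NULL && len == xferlen;
}
```

In this model a request that runs past the end of the backing array still returns a non-NULL pointer, which is why checking the pointer alone is not enough.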

[RFC 0/2] Fix Coverity and other errors in ppc440_uc DMA

2022-07-26 Thread Peter Maydell

This patchset is mainly trying to fix a problem that Coverity spotted
in the dcr_write_dma() function in hw/ppc/ppc440_uc.c, where the code
is not correctly using the cpu_physical_memory_map() function.
While I was fixing that I noticed a second problem in this code,
where it doesn't have a fallback for when cpu_physical_memory_map()
says "I couldn't map that for you".

I've marked these patches as RFC, partly because I don't have any
guest that would exercise the code changes[*], and partly because
I don't have any documentation of the hardware to tell me how it
should behave, so patch 2 in particular has some FIXMEs. I also
notice that the code doesn't update any of the registers like the
count or source/base addresses when the DMA transfer happens, which
seems odd, but perhaps the real hardware does work like that.

I think we should probably take patch 1 (which is a fairly minimal
fix of the use-of-uninitialized-data problem), but patch 2 is a bit
more unfinished.

[*] The commit 3c409c1927efde2fc that added this code says it's used
by AmigaOS.

thanks
-- PMM

Peter Maydell (2):
  hw/ppc/ppc440_uc: Initialize length passed to
cpu_physical_memory_map()
  hw/ppc/ppc440_uc: Handle mapping failure in DMA engine

 hw/ppc/ppc440_uc.c | 34 +-
 1 file changed, 33 insertions(+), 1 deletion(-)

-- 
2.25.1

[RFC 2/2] hw/ppc/ppc440_uc: Handle mapping failure in DMA engine

2022-07-26 Thread Peter Maydell

Currently the code for doing DMA in dcr_write_dma() has no fallback
for when its calls to cpu_physical_memory_map() fail. Add handling
for this situation by using address_space_read() and
address_space_write() to do the data transfers.

Signed-off-by: Peter Maydell 
---
I believe this to be equivalent to the fastpath code.  However, as
the comments note, I don't know what the intended behaviour on a DMA
memory access error is, because I couldn't find a datasheet for this
hardware.  I am also a bit suspicious that the current code does not
seem to update any of the count, source or destination addresses
after the memory transfer: is that really how the hardware behaves?
---
 hw/ppc/ppc440_uc.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/hw/ppc/ppc440_uc.c b/hw/ppc/ppc440_uc.c
index 11fdb88c220..0879f180a14 100644
--- a/hw/ppc/ppc440_uc.c
+++ b/hw/ppc/ppc440_uc.c
@@ -905,6 +905,7 @@ static void dcr_write_dma(void *opaque, int dcrn, uint32_t val)
 uint8_t *rptr, *wptr;
 hwaddr rlen, wlen;
 hwaddr xferlen;
+bool fastpathed = false;
 
 sidx = didx = 0;
 width = 1 << ((val & DMA0_CR_PW) >> 25);
@@ -915,6 +916,7 @@ static void dcr_write_dma(void *opaque, int dcrn, uint32_t val)
 wptr = cpu_physical_memory_map(dma->ch[chnl].da, &wlen,
true);
 if (rptr && rlen == xferlen && wptr && wlen == xferlen) {
+fastpathed = true;
 if (!(val & DMA0_CR_DEC) &&
 val & DMA0_CR_SAI && val & DMA0_CR_DAI) {
 /* optimise common case */
@@ -940,6 +942,33 @@ static void dcr_write_dma(void *opaque, int dcrn, uint32_t val)
 if (rptr) {
 cpu_physical_memory_unmap(rptr, rlen, 0, sidx);
 }
+if (!fastpathed) {
+/* Fast-path failed, do each access one at a time */
+for (sidx = didx = i = 0; i < count; i++) {
+uint8_t buf[8];
+assert(width <= sizeof(buf));
+if (address_space_read(&address_space_memory,
+   dma->ch[chnl].sa + sidx,
+   MEMTXATTRS_UNSPECIFIED,
+   buf, width) != MEMTX_OK) {
+/* FIXME: model correct behaviour on errors */
+break;
+}
+if (address_space_write(&address_space_memory,
+dma->ch[chnl].da + didx,
+MEMTXATTRS_UNSPECIFIED,
+buf, width) != MEMTX_OK) {
+/* FIXME: model correct behaviour on errors */
+break;
+}
+if (val & DMA0_CR_SAI) {
+sidx += width;
+}
+if (val & DMA0_CR_DAI) {
+didx += width;
+}
+}
+}
 }
 }
 break;
-- 
2.25.1
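
The element-at-a-time fallback in this patch can be modelled without any QEMU types. The sketch below is an illustration under stated assumptions: `mock_read()` and `mock_write()` are hypothetical bounds-checked accessors over a static array standing in for address_space_read() and address_space_write(), and `slow_copy()` is a hypothetical name for the loop. As in the patch, the loop stops at the first failed access, and the source and destination offsets advance only when the corresponding increment flag is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MOCK_MEM_SIZE 32
static uint8_t mock_mem[MOCK_MEM_SIZE];

/* Hypothetical bounds-checked read standing in for a bus read. */
static bool mock_read(size_t addr, uint8_t *buf, size_t len)
{
    if (addr > MOCK_MEM_SIZE || len > MOCK_MEM_SIZE - addr) {
        return false;
    }
    memcpy(buf, &mock_mem[addr], len);
    return true;
}

/* Hypothetical bounds-checked write standing in for a bus write. */
static bool mock_write(size_t addr, const uint8_t *buf, size_t len)
{
    if (addr > MOCK_MEM_SIZE || len > MOCK_MEM_SIZE - addr) {
        return false;
    }
    memcpy(&mock_mem[addr], buf, len);
    return true;
}

/*
 * Copy 'count' elements of 'width' bytes one at a time. Returns the
 * number of elements transferred before the first failed access.
 */
static size_t slow_copy(size_t sa, size_t da, size_t count, size_t width,
                        bool src_inc, bool dst_inc)
{
    uint8_t buf[8];
    size_t sidx = 0, didx = 0, i;

    assert(width <= sizeof(buf));
    for (i = 0; i < count; i++) {
        if (!mock_read(sa + sidx, buf, width) ||
            !mock_write(da + didx, buf, width)) {
            break;                      /* stop on the first error */
        }
        if (src_inc) {
            sidx += width;
        }
        if (dst_inc) {
            didx += width;
        }
    }
    return i;
}
```

Returning the number of completed elements is a choice made for this sketch; the patch itself leaves the error behaviour as a FIXME.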

Re: hexagon docker test failure

2022-07-26 Thread Richard Henderson


On 7/26/22 11:00, Taylor Simpson wrote:

So, instead of putting those in CFLAGS, put them in EXTRA_CFLAGS.

--- a/tests/tcg/hexagon/Makefile.target
+++ b/tests/tcg/hexagon/Makefile.target
@@ -20,6 +20,7 @@ EXTRA_RUNS =
  
  CFLAGS += -Wno-incompatible-pointer-types -Wno-undefined-internal

  CFLAGS += -fno-unroll-loops
+EXTRA_CFLAGS += -mv67 -O2
  
  HEX_SRC=$(SRC_PATH)/tests/tcg/hexagon

  VPATH += $(HEX_SRC)


Ah, ok.


Do I understand correctly that putting the flags in Makefile.target is the 
proper way and cross_cc_cflags is obsolete?


cross_cc_cflags is intended to handle using one compiler for multiple targets, e.g. arm vs 
armbe.


Which is not what you're attempting to do; you're trying to test a particular isa. 
Compare tests/tcg/aarch64/Makefile.target:


bti-1 bti-3: CFLAGS += -mbranch-protection=standard

pauth-%: CFLAGS += -march=armv8.3-a

mte-%: CFLAGS += -march=armv8.5-a+memtag


where we set specific isa extensions for specific tests.


r~

RE: hexagon docker test failure

2022-07-26 Thread Brian Cain

> -----Original Message-----
> From: Qemu-devel 
> On Behalf Of Richard Henderson
> Sent: Tuesday, July 26, 2022 12:42 PM
> To: Taylor Simpson 
> Cc: qemu-devel 
> Subject: Re: hexagon docker test failure
> 
> 
> On 7/26/22 10:23, Taylor Simpson wrote:
> >
> >> -----Original Message-----
> >> From: Richard Henderson 
> >> Sent: Tuesday, July 26, 2022 10:41 AM
> >> To: Taylor Simpson 
> >> Cc: qemu-devel 
> >> Subject: hexagon docker test failure
> >>
> >> Hi Taylor,
> >>
> >> One of your recent hexagon testsuite changes is incompatible with the
> >> docker image that we're using:
> >>
> >> tests/tcg/hexagon/multi_result.c:79:16: error: invalid instruction
> >>
> >> asm volatile("%0,p0 = vminub(%2, %3)\n\t"
> >>
> >>  ^
> >>
> >> :1:2: note: instantiated into assembly here
> >>
> >>   r3:2,p0 = vminub(r1:0, r3:2)
> >>
> >>   ^
> >>
> >> 1 error generated.
> >>
> >>
> >> Can we try again to update debian-hexagon-cross?  I recall that last time
> >> there was a compiler bug that prevented forward progress.  Perhaps that
> has
> >> been fixed in the interim?
> >>
> >> I'm willing to accept such an update in the next week before rc1, but if we
> >> can't manage that we'll need to disable the failing test(s?).  Thanks in
> >> advance,
> >>
> >>
> >> r~
> >
> > Hi Richard,
> >
> > Some of the tests require the -mv67 flag to be passed to the compiler
> because they have instructions that are new in v67.
> >
> > This patch
> > commit cd362defbbd09cbbc08b3bb465141542887b8cef
> > Author: Paolo Bonzini 
> > Date:   Fri May 27 16:35:48 2022 +0100
> >
> >  tests/tcg: merge configure.sh back into main configure script
> >
> > Moved this line from tests/tcg/configure.sh to the main configure script
> > : ${cross_cc_cflags_hexagon="-mv67 -O2 -static"}
> >
> >
> > However, those flags aren't actually passed to the compiler any more - at
> least by default.
> >
> > The gitlab builder is passing
> > https://gitlab.com/qemu-project/qemu/-/jobs/2771528066
> > So, there must be something in $MAKE_CHECK_ARGS
> >
> > I use the following when I run
> > make EXTRA_CFLAGS='-mv67 -O2' check-tcg
> >
> >
> > So, we probably don't need a new docker image.  Do other targets have their
> cross_cc_cflags?  Please advise.
> 
> Oooh, that's easy.  Just modify CFLAGS in tests/tcg/hexagon/Makefile.target.
> 
> I've just tested
> 
> --- a/tests/tcg/hexagon/Makefile.target
> 
> +++ b/tests/tcg/hexagon/Makefile.target
> 
> @@ -19,7 +19,7 @@
> 
>   EXTRA_RUNS =
> 
> 
> 
>   CFLAGS += -Wno-incompatible-pointer-types -Wno-undefined-internal
> 
> -CFLAGS += -fno-unroll-loops
> 
> +CFLAGS += -fno-unroll-loops -mv67 -O2
> 
> 
> 
>   HEX_SRC=$(SRC_PATH)/tests/tcg/hexagon
> 
>   VPATH += $(HEX_SRC)
> 
> 
> and it now builds, but I see a runtime error:
> 
> timeout --foreground 90  /home/rth/qemu/bld/qemu-hexagon  misc >
> misc.out
> 
> make[1]: *** [../Makefile.target:158: run-misc] Error 1
> 
> $ cat ./tests/tcg/hexagon-linux-user/misc.out
> 
> ERROR: 0x0007 != 0x001f
> 
> FAIL
> 
> 
> Any ideas there?

I don't think I've seen this one fail before.  We can analyze the failure 
and/or bisect to see if it was introduced by a particular commit.

-Brian

Re: [PATCH] hw/nvme: Add iothread support

2022-07-26 Thread Keith Busch

On Tue, Jul 26, 2022 at 11:32:57PM +0800, Jinhao Fan wrote:
> at 10:45 PM, Keith Busch  wrote:
> 
> > On Tue, Jul 26, 2022 at 04:55:54PM +0800, Jinhao Fan wrote:
> >> Hi Klaus and Keith,
> >> 
> >> I just added support for interrupt masking. How can I test interrupt
> >> masking?
> > 
> > Are you asking about MSI masking? I don't think any drivers are using the
> > feature, so you'd need to modify one to test it. I can give you some 
> > pointers
> > if you are asking about MSI.
> 
> Thanks Keith,
> 
> Do I understand correctly: qemu-nvme only supports MSI-X and pin-based
> interrupts. So MSI masking here is equivalent to MSI-X masking.

It looks like qemu's nvme only supports MSI-X. I could have sworn it used to
support MSI, but I must be thinking of an unofficial fork.

I was mainly asking about MSI masking as it relates to the nvme controller's
IVMS/IVMC registers. I don't think any driver is making use of these, and those
don't apply to MSI-X; just MSI and legacy.

At the PCIe level, masking MSI vectors is in Config space. Writing to Config
space is too slow to do per-interrupt, so NVMe created the IVMS/IVMC registers
to get around that.

> If the above is correct, I think I am asking about MSI masking.
> 
> BTW, a double check on ctrl.c seems to show that we only support disabling
> interrupt at CQ creation time, which is recorded in the cq->irq_enabled.
> This is different from my prior understanding that interrupts can be
> disabled at runtime by a call like Linux irq_save(). Therefore I doubt
> whether qemu-nvme supported "interrupt masking" before. How do you
> understand qemu-nvme’s interrupt masking support?

MSI-X uses MMIO for masking, so there's no need for an NVMe specific way to
mask these interrupts. You can just use the generic PCIe methods to set the
vector's mask bit. No NVMe driver that I know of makes use of these
either, though it should be possible to make Linux start doing that.

The CQ irq_enabled field is there so the user can create a pure polling queue.
That's a fixed property of the queue that can't be re-enabled without
destroying and recreating.

The linux irq_save only disables local CPU interrupts from most sources. All
pci devices are unaware of this and can still send their interrupt messages to
the CPU, but the CPU won't context switch to the irq handler until after
irqrestore is called.
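
The IVMS/IVMC mechanism mentioned above can be pictured as a pair of registers that set and clear bits in one shared mask. The sketch below is a minimal model assuming write-1-to-set and write-1-to-clear semantics, where writing a 0 bit has no effect; the struct and function names are hypothetical and this is not the hw/nvme implementation.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a controller's interrupt mask state. */
struct intr_mask_model {
    uint32_t mask;   /* bit n set => vector n is masked */
};

/* Write to the "mask set" register: each 1 bit masks that vector. */
static void write_mask_set(struct intr_mask_model *m, uint32_t val)
{
    m->mask |= val;
}

/* Write to the "mask clear" register: each 1 bit unmasks that vector. */
static void write_mask_clear(struct intr_mask_model *m, uint32_t val)
{
    m->mask &= ~val;
}

/* Query whether a given vector is currently masked. */
static bool vector_masked(const struct intr_mask_model *m, unsigned int vector)
{
    assert(vector < 32);
    return (m->mask >> vector) & 1;
}
```

Because each write only touches the bits that are 1, a driver can mask or unmask a single vector with one register write and no read-modify-write cycle.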

Re: [PATCH V9 41/46] python/machine: QEMUMachine reopen_qmp_connection

2022-07-26 Thread John Snow

On Tue, Jul 26, 2022 at 12:12 PM Steve Sistare
 wrote:
>
> Provide reopen_qmp_connection() to reopen a closed monitor connection.
> This is needed by cpr, because qemu exec closes the monitor socket.
>
> Signed-off-by: Steve Sistare 
> ---
>  python/qemu/machine/machine.py | 9 +
>  1 file changed, 9 insertions(+)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index d05950e..60b934d 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -491,6 +491,15 @@ def _close_qmp_connection(self) -> None:
>  finally:
>  self._qmp_connection = None
>
> +def reopen_qmp_connection(self):

    def reopen_qmp_connection(self) -> None:
        """Close and re-open the QMP connection."""
        ...

> +self._close_qmp_connection()
> +self._qmp_connection = QEMUMonitorProtocol(
> +self._monitor_address,
> +server=True,
> +nickname=self._name
> +)
> +self._qmp.accept(self._qmp_timer)
> +
>  def _early_cleanup(self) -> None:
>  """
>  Perform any cleanup that needs to happen before the VM exits.
> --
> 1.8.3.1
>

With applied fixup:

Reviewed-by: John Snow

Re: [PATCH V9 40/46] python/machine: QEMUMachine full_args

2022-07-26 Thread John Snow

On Tue, Jul 26, 2022 at 12:12 PM Steve Sistare
 wrote:
>
> Provide full_args() to return all command-line arguments used to start a
> vm, some of which are not otherwise visible to QEMUMachine clients.  This
> is needed by the cpr test, which must start a vm, then pass all qemu
> command-line arguments when setting the cpr-exec-args migration parameter.
>
> Signed-off-by: Steve Sistare 
> ---
>  python/qemu/machine/machine.py | 5 +
>  1 file changed, 5 insertions(+)
>
> diff --git a/python/qemu/machine/machine.py b/python/qemu/machine/machine.py
> index 37191f4..d05950e 100644
> --- a/python/qemu/machine/machine.py
> +++ b/python/qemu/machine/machine.py
> @@ -332,6 +332,11 @@ def args(self) -> List[str]:
>  """Returns the list of arguments given to the QEMU binary."""
>  return self._args
>
> +@property
> +def full_args(self) -> List[str]:
> +"""Returns the full list of arguments used to launch QEMU."""
> +return list(self._qemu_full_args)
> +
>  def _pre_launch(self) -> None:
>  if self._console_set:
>  self._remove_files.append(self._console_address)
> --
> 1.8.3.1
>

Reviewed-by: John Snow 
Acked-by: John Snow
