date:20231026

Re: [PATCH v2 0/2] igb: Add FLR support

2023-10-26 Thread Jason Wang

On Tue, Oct 24, 2023 at 11:30 AM Akihiko Odaki  wrote:
>
> On 2023/10/24 0:45, Cédric Le Goater wrote:
> > From: Cédric Le Goater 
> >
> > Hello,
> >
> > Here is a little series adding FLR to the new IGB models.
> >
> > Thanks,
> >
> > C.
> >
> > Changes in v2:
> >
> > - add a "x-pcie-flr-init" compat property for pre 8.2 machines
> >
> > Cédric Le Goater (2):
> >   igb: Add a VF reset handler
> >   igb: Add Function Level Reset to PF and VF
> >
> >   hw/net/igb_common.h |  1 +
> >   hw/net/igb_core.h   |  3 +++
> >   hw/core/machine.c   |  3 ++-
> >   hw/net/igb.c        | 15 +++
> >   hw/net/igb_core.c   |  6 --
> >   hw/net/igbvf.c      | 19 +++
> >   hw/net/trace-events |  1 +
> >   7 files changed, 45 insertions(+), 3 deletions(-)
> >
>
> For the whole series:
> Reviewed-by: Akihiko Odaki 
>

Queued.

Thanks

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread Juergen Gross


On 26.10.23 22:56, Stefano Stabellini wrote:

On Thu, 26 Oct 2023, David Woodhouse wrote:

On Thu, 2023-10-26 at 13:36 -0700, Stefano Stabellini wrote:



This seems like a lot of code to replace that simpler option... is
there a massive performance win from doing it this way? Would we want
to use this trick for the Xen PV backends (qdisk, qnic) *too*? Might it
make sense to introduce the simple version and *then* the optimisation,
with some clear benchmarking to show the win?


This is not done for performance but for safety (as in safety
certifications, ISO 26262, etc.). This is to enable unprivileged virtio
backends running in a DomU. By unprivileged I mean a virtio backend that
is unable to map arbitrary memory (the xenforeignmemory interface is
prohibited).

The goal is to run Xen on safety-critical systems such as cars,
industrial robots and more. In this configuration there is no
traditional Dom0 in the system at all. If you would like to know more:
https://www.youtube.com/watch?v=tisljY6Bqv0=PLYyw7IQjL-zHtpYtMpFR3KYdRn0rcp5Xn=8


Yeah, I understand why we're using grant mappings instead of just
directly having access via foreignmem mappings. That wasn't what I was
confused about.

What I haven't worked out is why we're implementing this through an
automatically-populated MemoryRegion in QEMU, rather than just using
grant mapping ops like we always have.

It seems like a lot of complexity just to avoid calling
qemu_xen_gnttab_map_refs() from the virtio backend.


I think there are two questions here. One question is "Why do we need
all the new grant mapping code added to xen-mapcache.c in patch #7?
Can't we use qemu_xen_gnttab_map_refs() instead?"


The main motivation was to _avoid_ having to change all the backends.

My implementation enables _all_ qemu based virtio backends to use grant
mappings. And if a new backend is added to qemu, there will be no change
required to make it work with grants.


And the second question is where to call the grant mapping from (whether
the new code or the old qemu_xen_gnttab_map_refs code it doesn't
matter). It could be even simpler than calling it from the virtio
backends: we could just call it from the existing qemu_ram_ptr_length()
hook. See this discussion:
https://marc.info/?l=qemu-devel=169828434927778


I wanted to be explicit about where and when the mapping and unmapping
happen. Adding the mapping to qemu_ram_ptr_length() would probably be
possible, but it would be quite hard to verify that no double mappings
happen. I also had problems with that approach when qemu was setting
up the ring page access.

Adding a map() and an unmap() hook seemed to be the cleanest solution, even
if it needs more code churn. The qemu_ram_ptr_length() approach seemed to be
more like a hack than a solution.
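
To illustrate why explicit hooks make double mappings easier to verify, here
is a minimal sketch. It is not QEMU code: the class name, the method names and
the idea of tracking grant references in a set are assumptions made purely
for illustration.

```python
class GrantMapTracker:
    """Toy tracker for explicit map()/unmap() hooks (hypothetical, not QEMU code)."""

    def __init__(self):
        self._mapped = set()  # grant references that are currently mapped

    def map(self, ref):
        # With an explicit hook, a double mapping is detected at the call site.
        if ref in self._mapped:
            raise RuntimeError(f"grant ref {ref} is already mapped")
        self._mapped.add(ref)

    def unmap(self, ref):
        if ref not in self._mapped:
            raise RuntimeError(f"grant ref {ref} is not mapped")
        self._mapped.remove(ref)

    def mapped(self):
        return sorted(self._mapped)
```

With an implicit mapping inside a generic lookup path there is no single
place to put such a check, which is the verification problem described above.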


Juergen



Re: [PATCH v6 06/10] hw/fsi: Aspeed APB2OPB interface

2023-10-26 Thread Andrew Jeffery

On Thu, 2023-10-26 at 10:27 -0500, Ninad Palsule wrote:
> Hello Cedric,
> 
> 
> On 10/24/23 10:21, Cédric Le Goater wrote:
> > On 10/24/23 17:00, Ninad Palsule wrote:
> > > Hello Cedric,
> > > 
> > > On 10/24/23 02:46, Cédric Le Goater wrote:
> > > > and the fsi_opb_* routines are useless to me.
> > > We are trying to keep the separation between OPB implementation and 
> > > interface hence we have all those fsi_opb_*. I feel that we should 
> > > keep as it is so that future extensions will be easier. Please let me 
> > > know.
> > 
> > Well, I can't really tell because I don't know enough about FSI :/
> > 
> > The models look fragile and I have spent already a lot of time trying
> > to untangle what they are trying to do. Please ask your teammates or
> > let's see in the next QEMU cycle.
> 
> I have decided to go with the approach you suggested and it looks much 
> better. Fixed it.

I intended to reply to this before Ninad sent out v7, but life
intervened.

If we can't justify it with the code we have now I think it's right to
pull it out. Add the code to support the things we're trying to do when
we need to do them. As long as we don't do anything that precludes us
from adding that code later (and I can't really imagine how we'd corner
ourselves like that).

We should bear in mind I wrote the initial models several years ago in
the space of about a week while I was trying to learn FSI (and more
deeply about the QEMU bus and address space modelling). I think I was
doing that to unblock some CI due to the introduction of the kernel
driver for the Aspeed FSI hardware. The models were pretty rough -
prior to all this review the code reflected my hazy understanding of
the problems. I didn't get time to remove the complexities introduced
by my misunderstandings, and now it's been so long that I'm not much
help with fixing them.

Andrew

Re: [PATCH v2 RESEND] ppc/pnv: Fix number of I2C engines and ports for power9/10

2023-10-26 Thread Philippe Mathieu-Daudé


On 25/10/23 08:56, Cédric Le Goater wrote:

On 10/24/23 23:29, Glenn Miles wrote:

Power9 is supposed to have 4 PIB-connected I2C engines with the
following number of ports on each engine:

 0: 2
 1: 13
 2: 2
 3: 2

Power10 also has 4 engines but has the following number of ports
on each engine:

 0: 14
 1: 14
 2: 2
 3: 16

Current code assumes that they all have the same (maximum) number.
This can be a problem if software expects to see a certain number
of ports present (Power Hypervisor seems to care).

Fixed this by adding separate tables for power9 and power10 that
map the I2C controller number to the number of I2C buses that should
be attached for that engine.

Signed-off-by: Glenn Miles 


you could have kept :

Reviewed-by: Cédric Le Goater 

one comment below,


---
Based-on: <20231017221434.810363-1-mil...@linux.vnet.ibm.com>
([PATCH] ppc/pnv: Connect PNV I2C controller to powernv10)

Changes from v1:
 - Added i2c_ports_per_engine to PnvChipClass
 - replaced the word "ctlr" with "engine"

  hw/ppc/pnv.c  | 14 ++
  include/hw/ppc/pnv_chip.h |  6 ++
  2 files changed, 12 insertions(+), 8 deletions(-)

diff --git a/hw/ppc/pnv.c b/hw/ppc/pnv.c
index 2655b6e506..f6dc84b869 100644
--- a/hw/ppc/pnv.c
+++ b/hw/ppc/pnv.c
@@ -1507,6 +1507,8 @@ static void pnv_chip_power9_pec_realize(PnvChip *chip, Error **errp)

  }
  }
+static int pnv_power9_i2c_ports_per_engine[PNV9_CHIP_MAX_I2C] = {2, 13, 2, 2};

+


Generally, these class constants are located close to the class definitions
in the file.


Either keep them close by for comparison, or, since there
is a single use, declare it in the function using it here 
pnv_chip_power9_class_init().
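
The per-chip lookup described in the commit message can be sketched as
follows. The port counts are taken from the commit message; the dictionary
and the helper name are hypothetical and only stand in for the per-class
tables in the C code.

```python
# Port counts per I2C engine, copied from the commit message above.
I2C_PORTS_PER_ENGINE = {
    "power9": [2, 13, 2, 2],
    "power10": [14, 14, 2, 16],
}


def i2c_buses_for(chip, engine):
    """Return how many I2C buses to attach for one engine of a chip."""
    return I2C_PORTS_PER_ENGINE[chip][engine]
```

Keeping both tables side by side, as in this sketch, is the "close by for
comparison" option; the other option is to move each table into the single
function that uses it.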




Thanks,

C.

Re: [PATCH v2 0/3] vfio/pci: Fix buffer overrun when writing the VF token

2023-10-26 Thread Philippe Mathieu-Daudé


On 26/10/23 16:00, Cédric Le Goater wrote:

On 10/26/23 09:06, Cédric Le Goater wrote:

Hello,

This series fixes a buffer overrun in VFIO. The buffer used in
vfio_realize() by qemu_uuid_unparse() is too small, UUID_FMT_LEN lacks
one byte for the trailing NUL.

Instead of adding + 1, as done elsewhere, the changes introduce a
UUID_STR_LEN define for the correct size and use it where required.
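
A small sketch of the off-by-one: the canonical text form of a UUID is 36
characters, so a C buffer holding it needs 37 bytes. The constant names
mirror the ones in the cover letter; the helper is hypothetical.

```python
import uuid

UUID_FMT_LEN = 36                 # characters in the canonical text form
UUID_STR_LEN = UUID_FMT_LEN + 1   # plus one byte for the trailing NUL


def uuid_c_string(u):
    """Return the bytes a C buffer must hold for the unparsed UUID."""
    return str(u).encode("ascii") + b"\0"


buf = uuid_c_string(uuid.UUID(int=0))
```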


Cc: qemu-sta...@nongnu.org # 8.1+


Hopefully 8.2 shouldn't be affected ;)



I propose to take this series in vfio-next if no one objects.

Thanks,

C.

Re: [PATCH v5 8/8] hw/mem/cxl_type3: Add CXL RAS Error Injection Support.

2023-10-26 Thread Markus Armbruster

I'm trying to fill in QMP documentation holes, and found one in commit
415442a1b4a (this patch).  Details inline.

Jonathan Cameron  writes:

> CXL uses PCI AER Internal errors to signal to the host that an error has
> occurred. The host can then read more detailed status from the CXL RAS
> capability.
>
> For uncorrectable errors: support multiple injection in one operation
> as this is needed to reliably test multiple header logging support in an
> OS. The equivalent feature doesn't exist for correctable errors, so only
> one error need be injected at a time.
>
> Note:
>  - Header content needs to be manually specified in a fashion that
>matches the specification for what can be in the header for each
>error type.
>
> Injection via QMP:
> { "execute": "qmp_capabilities" }
> ...
> { "execute": "cxl-inject-uncorrectable-errors",
>   "arguments": {
> "path": "/machine/peripheral/cxl-pmem0",
> "errors": [
> {
> "type": "cache-address-parity",
> "header": [ 3, 4]
> },
> {
> "type": "cache-data-parity",
> "header": 
> [0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31]
> },
> {
> "type": "internal",
> "header": [ 1, 2, 4]
> }
> ]
>   }}
> ...
> { "execute": "cxl-inject-correctable-error",
> "arguments": {
> "path": "/machine/peripheral/cxl-pmem0",
> "type": "physical"
> } }
>
> Signed-off-by: Jonathan Cameron 

[...]
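
For readers who want to script the injection shown in the commit message,
here is a minimal sketch that only builds the QMP command as JSON. The enum
values are the ones listed in the qapi/cxl.json hunk quoted in this message;
the helper name and its argument shape are assumptions, and sending the
command over a QMP socket is left out.

```python
import json

# Enum values as listed in the quoted qapi/cxl.json hunk.
UNCOR_TYPES = {
    "cache-data-parity", "cache-address-parity", "cache-be-parity",
    "cache-data-ecc", "mem-data-parity", "mem-address-parity",
    "mem-be-parity", "mem-data-ecc", "reinit-threshold", "rsvd-encoding",
    "poison-received", "receiver-overflow", "internal",
    "cxl-ide-tx", "cxl-ide-rx",
}


def inject_uncorrectable(path, errors):
    """Build the QMP command; errors is a list of (type, header) pairs."""
    records = []
    for err_type, header in errors:
        if err_type not in UNCOR_TYPES:
            raise ValueError(f"unknown error type: {err_type}")
        records.append({"type": err_type, "header": list(header)})
    return {"execute": "cxl-inject-uncorrectable-errors",
            "arguments": {"path": path, "errors": records}}


cmd = inject_uncorrectable("/machine/peripheral/cxl-pmem0",
                           [("cache-address-parity", [3, 4]),
                            ("internal", [1, 2, 4])])
wire = json.dumps(cmd)  # text to send after the qmp_capabilities handshake
```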

> diff --git a/qapi/cxl.json b/qapi/cxl.json
> new file mode 100644
> index 00..ac7e167fa2
> --- /dev/null
> +++ b/qapi/cxl.json
> @@ -0,0 +1,118 @@
> +# -*- Mode: Python -*-
> +# vim: filetype=python
> +
> +##
> +# = CXL devices
> +##
> +
> +##
> +# @CxlUncorErrorType:
> +#
> +# Type of uncorrectable CXL error to inject. These errors are reported via
> +# an AER uncorrectable internal error with additional information logged at
> +# the CXL device.
> +#
> +# @cache-data-parity: Data error such as data parity or data ECC error CXL.cache
> +# @cache-address-parity: Address parity or other errors associated with the
> +#address field on CXL.cache
> +# @cache-be-parity: Byte enable parity or other byte enable errors on CXL.cache
> +# @cache-data-ecc: ECC error on CXL.cache
> +# @mem-data-parity: Data error such as data parity or data ECC error on CXL.mem
> +# @mem-address-parity: Address parity or other errors associated with the
> +#  address field on CXL.mem
> +# @mem-be-parity: Byte enable parity or other byte enable errors on CXL.mem.
> +# @mem-data-ecc: Data ECC error on CXL.mem.
> +# @reinit-threshold: REINIT threshold hit.
> +# @rsvd-encoding: Received unrecognized encoding.
> +# @poison-received: Received poison from the peer.
> +# @receiver-overflow: Buffer overflows (first 3 bits of header log indicate which)
> +# @internal: Component specific error
> +# @cxl-ide-tx: Integrity and data encryption tx error.
> +# @cxl-ide-rx: Integrity and data encryption rx error.
> +##
> +
> +{ 'enum': 'CxlUncorErrorType',
> +  'data': ['cache-data-parity',
> +   'cache-address-parity',
> +   'cache-be-parity',
> +   'cache-data-ecc',
> +   'mem-data-parity',
> +   'mem-address-parity',
> +   'mem-be-parity',
> +   'mem-data-ecc',
> +   'reinit-threshold',
> +   'rsvd-encoding',
> +   'poison-received',
> +   'receiver-overflow',
> +   'internal',
> +   'cxl-ide-tx',
> +   'cxl-ide-rx'
> +   ]
> + }
> +
> +##
> +# @CXLUncorErrorRecord:
> +#
> +# Record of a single error including header log.
> +#
> +# @type: Type of error
> +# @header: 16 DWORD of header.
> +##
> +{ 'struct': 'CXLUncorErrorRecord',
> +  'data': {
> +  'type': 'CxlUncorErrorType',
> +  'header': [ 'uint32' ]
> +  }
> +}
> +
> +##
> +# @cxl-inject-uncorrectable-errors:
> +#
> +# Command to allow injection of multiple errors in one go. This allows testing
> +# of multiple header log handling in the OS.
> +#
> +# @path: CXL Type 3 device canonical QOM path
> +# @errors: Errors to inject
> +##
> +{ 'command': 'cxl-inject-uncorrectable-errors',
> +  'data': { 'path': 'str',
> + 'errors': [ 'CXLUncorErrorRecord' ] }}
> +
> +##
> +# @CxlCorErrorType:
> +#
> +# Type of CXL correctable error to inject
> +#
> +# @cache-data-ecc: Data ECC error on CXL.cache
> +# @mem-data-ecc: Data ECC error on CXL.mem

Missing:

   # @retry-threshold: ...

I need suitable description text.  Can you help me?

> +# @crc-threshold: Component specific and applicable to 68 byte Flit mode only.
> +# @cache-poison-received: Received poison from a peer on CXL.cache.
> +# @mem-poison-received: Received poison from a peer on CXL.mem
> +# @physical: Received error indication from the physical layer.
> +##
> +{ 'enum': 'CxlCorErrorType',
> +  'data': ['cache-data-ecc',
>

Re: [PATCH 08/29] tcg/aarch64: Generate TBZ, TBNZ

2023-10-26 Thread Paolo Bonzini


On 10/26/23 02:13, Richard Henderson wrote:

+case TCG_COND_TSTEQ:
+case TCG_COND_TSTNE:
+if (b_const && is_power_of_2(b)) {
+tbit = ctz64(b);
+need_cmp = false;
+}


I think another value that can be handled efficiently is 0x 
which becomes a "cbz/cbnz wNN, LABEL" instruction.


This could be interesting if the i386 frontend implemented JE/JNE and 
JS/JNS (of sizes smaller than MO_TL) using masks like 0x and 
0x8000 respectively.  Like (for SF):


 MemOp size = (s->cc_op - CC_OP_ADDB) & 3;
 if (size == MO_TL) {
 return (CCPrepare) { .cond = TCG_COND_EQ, .reg = cpu_cc_dst,
  .mask = -1 };
 } else {
 return (CCPrepare) { .cond = TCG_COND_TSTEQ, .reg = cpu_cc_dst,
  .imm = (1ull << (8 << size)) - 1,
  .mask = -1 };
}

Then on aarch64, JE could become CBZ and JS could become TBNZ.
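
A rough sketch of the mask classification, for illustration only. The
zero-test mask uses the same expression as .imm in the snippet above. The
sign-bit mask and the choice of the 32-bit all-ones value for the CBZ/CBNZ
case are assumptions, and all function names are hypothetical.

```python
def is_power_of_2(x):
    return x > 0 and (x & (x - 1)) == 0


def ctz(x):
    """Count trailing zero bits of a non-zero integer."""
    return (x & -x).bit_length() - 1


def zero_mask(size):
    # Same expression as .imm in the snippet above: all bits of the operand.
    return (1 << (8 << size)) - 1


def sign_mask(size):
    # Assumption: the JS/JNS case tests only the top bit of the operand.
    return 1 << ((8 << size) - 1)


def aarch64_choice(mask):
    """Pick a branch form for a TSTEQ/TSTNE comparison against mask."""
    if is_power_of_2(mask):
        return ("tbz/tbnz", ctz(mask))   # single-bit test
    if mask == (1 << 32) - 1:
        return ("cbz/cbnz wNN", None)    # low 32 bits compared with zero
    return ("tst + b.cond", None)        # generic fallback
```

Here size follows the MemOp convention of 0, 1, 2 for 8, 16 and 32 bits, so
sign_mask(0) selects bit 7 and zero_mask(2) is the 32-bit all-ones value.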

Unfortunately, the code produced on x86 is not awful but also not too 
good; we discussed earlier how TST against 0x and 0x8000 can 
be computed efficiently using "testl reg, reg", but you don't get to 
that point in tcg_out_testi because the other conditions require an S32 
constraint.  Those constants don't satisfy it. :(  So you lose the sign 
extension instructions, but you get a somewhat bulky MOV to load the 
constant followed by "testl reg, reg_containing_imm".


I guess in principle you could add 
TCG_TARGET_{br,mov,set}condi_valid(cond, const) but it's pretty ugly.


Paolo

Re: [PATCH 0/1] Enable -Wshadow=local

2023-10-26 Thread Markus Armbruster

Stefan Hajnoczi  writes:

> On Thu, 26 Oct 2023 at 14:32, Markus Armbruster  wrote:
>>
>> Requires Brian's pull request and two patches from Thomas to compile:
>>
>> [PULL 0/2] hex queue - GETPC() fixes, shadowing fixes
>> [PATCH v2] block/snapshot: Fix compiler warning with -Wshadow=local
>> [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
>>
>> Stefan, the PR was posted a week ago; anything blocking it?
>
> It's not in a pull request, so I won't see it. I don't have tooling
> that can spot individual patch series that need to go into
> qemu.git/master, so I rely on being emailed about them.

I'm inquiring about this one:


https://lore.kernel.org/qemu-devel/20231019021733.2258592-1-bc...@quicinc.com/

Looks like a PR to me.

> Would you like me to merge this patch series into qemu.git/master?

Yes, I'd like you to merge Brian's PR I linked to.

[...]

Re: [PATCH] linux-user/i386: Properly align signal frame

2023-10-26 Thread Richard Henderson


On 10/25/23 23:35, Michael Tokarev wrote:

24.05.2023 08:46, Richard Henderson:

The beginning of the structure, with pretaddr, should be just below
16-byte alignment.  Disconnect fpstate from sigframe, just like the
kernel does.


Resolves: https://gitlab.com/qemu-project/qemu/-/issues/1648

Ping? Has this been forgotten? It's been 5 months already..


I have not returned to this problem yet.


r~
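
For reference, the alignment rule from the quoted commit message can be
sketched like this. It assumes the 4-byte pretaddr slot sits at the start of
the frame, so the address just above it must be 16-byte aligned; the function
name is hypothetical and this is not the patch itself.

```python
def align_sigframe(sp):
    """Round sp down so that (sp + 4) is a multiple of 16."""
    return ((sp + 4) & -16) - 4
```

The result never moves the stack pointer up, and it moves it down by fewer
than 16 bytes.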

Re: [PATCH] target/i386/kvm: Refine VMX controls setting for backward compatibility

2023-10-26 Thread Ewan Hai

On 10/25/23 23:20, Zhao Liu wrote:

On Mon, Sep 25, 2023 at 03:14:53AM -0400, EwanHai wrote:

Date: Mon, 25 Sep 2023 03:14:53 -0400
From: EwanHai
Subject: [PATCH] target/i386/kvm: Refine VMX controls setting for backward
  compatibility
X-Mailer: git-send-email 2.34.1

Commit 4a910e1 ("target/i386: do not set unsupported VMX secondary
execution controls") implemented a workaround for hosts that have
specific CPUID features but do not support the corresponding VMX
controls, e.g., hosts support RDSEED but do not support RDSEED-Exiting.

In detail, commit 4a910e1 introduced a flag `has_msr_vmx_procbased_ctls2`.
If KVM has `MSR_IA32_VMX_PROCBASED_CTLS2` in its msr list, QEMU would
use KVM's settings, avoiding any modifications to this MSR.

However, this commit (4a910e1) didn’t account for cases in older Linux

s/didn’t/didn't/

I'll fix it.

kernels(e.g., linux-4.19.90) where `MSR_IA32_VMX_PROCBASED_CTLS2` is

For this old kernel, it's better to add the brief lifecycle note (e.g.,
lts, EOL) to illustrate the value of considering such compatibility
fixes.

I've checked the linux-stable repo and found that
MSR_IA32_VMX_PROCBASED_CTLS2 was not included in the KVM regular MSR list
until linux-5.3. In linux-4.19.x (EOL: Dec 2024) it is also absent from
the KVM regular MSR list.

So this may be an important compatibility fix for kernels < 5.3.

in `kvm_feature_msrs`—obtained by ioctl(KVM_GET_MSR_FEATURE_INDEX_LIST),

s/—obtained/-obtained/

I'll fix it.

but not in `kvm_msr_list`—obtained by ioctl(KVM_GET_MSR_INDEX_LIST).

s/—obtained/-obtained/

I'll fix it.

As a result, it did not set the `has_msr_vmx_procbased_ctls2` flag based
on `kvm_msr_list` alone, even though KVM maintains the value of this MSR.

This patch supplements the above logic, ensuring that
`has_msr_vmx_procbased_ctls2` is correctly set by checking both MSR
lists, thus maintaining compatibility with older kernels.

Signed-off-by: EwanHai
---
  target/i386/kvm/kvm.c | 6 ++
  1 file changed, 6 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index af101fcdf6..6299284de4 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2343,6 +2343,7 @@ void kvm_arch_do_init_vcpu(X86CPU *cpu)
  static int kvm_get_supported_feature_msrs(KVMState *s)
  {
  int ret = 0;
+int i;

  if (kvm_feature_msrs != NULL) {

  return 0;
@@ -2377,6 +2378,11 @@ static int kvm_get_supported_feature_msrs(KVMState *s)
  return ret;
  }

It's worth adding a comment here to indicate that this is a
compatibility fix.

-Zhao

+for (i = 0; i < kvm_feature_msrs->nmsrs; i++) {

+if (kvm_feature_msrs->indices[i] == MSR_IA32_VMX_PROCBASED_CTLS2) {
+has_msr_vmx_procbased_ctls2 = true;
+}
+}
  return 0;
  }

--

2.34.1
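
The logic of the fix can be summarised as a membership test over both lists.
This is only a sketch: the function name and the example lists are
hypothetical, and 0x48B is the architectural index of
IA32_VMX_PROCBASED_CTLS2.

```python
MSR_IA32_VMX_PROCBASED_CTLS2 = 0x48B  # architectural MSR index


def has_procbased_ctls2(msr_list, feature_msr_list):
    """True if the MSR is reported by either KVM list."""
    return (MSR_IA32_VMX_PROCBASED_CTLS2 in msr_list
            or MSR_IA32_VMX_PROCBASED_CTLS2 in feature_msr_list)


# Hypothetical lists modelling an older kernel: the MSR appears only in
# the feature MSR list, not in the regular one.
old_kernel = has_procbased_ctls2(msr_list=[0x10], feature_msr_list=[0x48B])
```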

I plan to use the patch below. Any more suggestions?

 From a3006fcec3615d98ac1eb252a61952d44aa5029b Mon Sep 17 00:00:00 2001
From: EwanHai
Date: Mon, 25 Sep 2023 02:11:59 -0400
Subject: [PATCH] target/i386/kvm: Refine VMX controls setting for backward
  compatibility

Commit 4a910e1 ("target/i386: do not set unsupported VMX secondary
execution controls") implemented a workaround for hosts that have
specific CPUID features but do not support the corresponding VMX
controls, e.g., hosts support RDSEED but do not support RDSEED-Exiting.

In detail, commit 4a910e1 introduced a flag `has_msr_vmx_procbased_ctls2`.
If KVM has `MSR_IA32_VMX_PROCBASED_CTLS2` in its msr list, QEMU would
use KVM's settings, avoiding any modifications to this MSR.

However, this commit (4a910e1) didn't account for cases in older Linux
kernels (<5.3) where `MSR_IA32_VMX_PROCBASED_CTLS2` is in
`kvm_feature_msrs`-obtained by ioctl(KVM_GET_MSR_FEATURE_INDEX_LIST),
but not in `kvm_msr_list`-obtained by ioctl(KVM_GET_MSR_INDEX_LIST).
As a result, it did not set the `has_msr_vmx_procbased_ctls2` flag based
on `kvm_msr_list` alone, even though KVM maintains the value of this MSR.

This patch supplements the above logic, ensuring that
`has_msr_vmx_procbased_ctls2` is correctly set by checking both MSR
lists, thus maintaining compatibility with older kernels.

Signed-off-by: EwanHai
---
  target/i386/kvm/kvm.c | 14 ++
  1 file changed, 14 insertions(+)

diff --git a/target/i386/kvm/kvm.c b/target/i386/kvm/kvm.c
index af101fcdf6..3cf95f8579 100644
--- a/target/i386/kvm/kvm.c
+++ b/target/i386/kvm/kvm.c
@@ -2343,6 +2343,7 @@ void kvm_arch_do_init_vcpu(X86CPU *cpu)
  static int kvm_get_supported_feature_msrs(KVMState *s)
  {
  int ret = 0;
+int i;

  if (kvm_feature_msrs != NULL) {
  return 0;
@@ -2377,6 +2378,19 @@ static int kvm_get_supported_feature_msrs(KVMState *s)
  return ret;
  }

+/*
+ * Compatibility fix:
+ * Older Linux kernels (<5.3) include MSR_IA32_VMX_PROCBASED_CTLS2
+ * only in the feature MSR list, but not in the regular MSR list. This leads to
+ * an issue in older kernel versions where QEMU,

Re: [PATCH 2/6] system/physmem: IOMMU: Invoke the translate_size function if it is implemented

2023-10-26 Thread Ethan Chen via

On Thu, Oct 26, 2023 at 10:20:41AM -0400, Peter Xu wrote:
> Could you elaborate why is that important?  In what use case?
I was not involved in the formulation of the IOPMP specification, but I'll try
to explain my perspective. IOPMP uses the same idea as PMP: "The matching
PMP entry must match all bytes of an access, or the access fails."

> 
> Consider IOVA mapped for address range iova=[0, 4K] only, here we have a
> DMA request with range=[0, 8K].  Now my understanding is what you want to
> achieve is don't trigger the DMA to [0, 4K] and fail the whole [0, 8K]
> request.
> 
> Can we just fail at the latter DMA [4K, 8K] when it happens?  After all,
> IIUC a device can split the 0-8K DMA into two smaller DMAs, then the 1st
> chunk can succeed then if it falls in 0-4K.  Some further explanation of
> the failure use case could be helpful.

IOPMP can only detect a partial hit within a single access. A DMA device will
split a large DMA transfer into smaller transfers based on the target and the
DMA transfer width, so a partial-hit error only happens when an access crosses
an entry boundary. But ensuring that an access stays within one entry is still
important. For example, an entry may represent the permissions of one device
memory region. We do not want a single DMA transfer to access multiple
devices, even if the DMA has permissions from multiple entries.
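
To make the rule concrete, here is a minimal sketch of the check as I
understand it. The entry layout (base, size, allowed), the first-match
priority and the function name are assumptions for illustration, not the
IOPMP model in the patch.

```python
def check_access(entries, start, length):
    """entries: list of (base, size, allowed).

    Returns 'ok', 'deny' or 'partial-hit'.
    """
    end = start + length
    for base, size, allowed in entries:   # first matching entry wins
        overlaps = start < base + size and base < end
        if not overlaps:
            continue
        covers_all = base <= start and end <= base + size
        if not covers_all:
            return "partial-hit"          # access crosses the entry boundary
        return "ok" if allowed else "deny"
    return "deny"                         # no entry matched
```

With one entry covering [0, 4K), an access to [0, 8K) is a partial hit and
fails as a whole. The same happens with two adjacent allowed entries, which
is the multiple-devices case mentioned above.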

Thanks,
Ethan Chen

[PATCH 2/2] acpi/tests/avocado/bits: enable console logging from bits VM

2023-10-26 Thread Ani Sinha

Console logs from the VM can be useful for debugging when things go wrong.
Other avocado tests enable them. This change enables console logging with the
following changes:
 - point to the newer bios bits image that actually enabled VM console.
 - change the bits test to drain the console logs from the VM and write the
   logs.
 - wait for SHUTDOWN event from QEMU so that console logs can be drained out
   of the socket before it is closed as a part of vm.wait().

Additionally, following two cosmetic changes have been made:
 - Removed VM QEMU command line logging as avocado framework already logs it.
   This is a minor cleanup along the way.
 - Update my email to my work email in the avocado acpi bios bits test.

CC: js...@redhat.com
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits.py | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/tests/avocado/acpi-bits.py b/tests/avocado/acpi-bits.py
index 042007b0b8..68b9e98d4e 100644
--- a/tests/avocado/acpi-bits.py
+++ b/tests/avocado/acpi-bits.py
@@ -18,7 +18,7 @@
 #
 #
 # Author:
-#  Ani Sinha 
+#  Ani Sinha 
 
 # pylint: disable=invalid-name
 # pylint: disable=consider-using-f-string
@@ -48,6 +48,7 @@
 )
 from qemu.machine import QEMUMachine
 from avocado import skipIf
+from avocado.utils import datadrainer as drainer
 from avocado_qemu import QemuBaseTest
 
 deps = ["xorriso", "mformat"] # dependent tools needed in the test setup/box.
@@ -141,12 +142,12 @@ def __init__(self, *args, **kwargs):
 self._baseDir = None
 
 # following are some standard configuration constants
-self._bitsInternalVer = 2020
-self._bitsCommitHash = 'b48b88ff' # commit hash must match
+self._bitsInternalVer = 2020 # gitlab CI does shallow clones of depth 20
+self._bitsCommitHash = 'c7920d2b' # commit hash must match
   # the artifact tag below
-self._bitsTag = "qemu-bits-10182022" # this is the latest bits
+self._bitsTag = "qemu-bits-10262023" # this is the latest bits
  # release as of today.
-self._bitsArtSHA1Hash = 'b04790ac9b99b5662d0416392c73b97580641fe5'
+self._bitsArtSHA1Hash = 'b22cdfcfc7453875297d06d626f5474ee36a343f'
 self._bitsArtURL = ("https://gitlab.com/qemu-project/"
 "biosbits-bits/-/jobs/artifacts/%s/"
 "download?job=qemu-bits-build" %self._bitsTag)
@@ -386,15 +387,20 @@ def test_acpi_smbios_bits(self):
 # for newer machine models"). Therefore, enforce 32-bit entry point.
 self._vm.add_args('-machine', 'smbios-entry-point-type=32')
 
-args = " ".join(str(arg) for arg in self._vm.base_args()) + \
-" " + " ".join(str(arg) for arg in self._vm.args)
+# enable console logging
+self._vm.set_console()
+self._vm.launch()
 
-self.logger.info("launching QEMU vm with the following arguments: %s",
- args)
+self.logger.debug("Console output from bits VM follows ...")
+c_drainer = drainer.LineLogger(self._vm.console_socket.fileno(),
+   logger=self.logger.getChild("console"),
+   stop_check=(lambda :
+   not self._vm.is_running()))
+c_drainer.start()
 
-self._vm.launch()
 # biosbits has been configured to run all the specified test suites
 # in batch mode and then automatically initiate a vm shutdown.
 # Rely on avocado's unit test timeout.
+self._vm.event_wait('SHUTDOWN')
 self._vm.wait(timeout=None)
 self.parse_log()
-- 
2.42.0
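
The reason for waiting on SHUTDOWN before vm.wait() is that the console
socket must be drained before it is closed. The sketch below shows that
ordering with the standard library only. It does not use avocado's
datadrainer; the function name and the socketpair standing in for the VM
console are assumptions.

```python
import socket
import threading


def drain_lines(sock, sink, stop_check):
    """Read sock until EOF, appending decoded lines to sink."""
    sock.settimeout(0.1)
    pending = b""
    while True:
        try:
            chunk = sock.recv(4096)
        except socket.timeout:
            if stop_check():
                break
            continue
        if not chunk:          # peer closed the socket
            break
        pending += chunk
        *lines, pending = pending.split(b"\n")
        sink.extend(line.decode(errors="replace") for line in lines)
    if pending:
        sink.append(pending.decode(errors="replace"))


guest, host = socket.socketpair()   # stands in for the VM console socket
log = []
t = threading.Thread(target=drain_lines, args=(host, log, lambda: False))
t.start()
guest.sendall(b"booting\nrunning tests\n")
guest.close()                       # plays the role of the VM shutting down
t.join()
host.close()
```

Because the reader only stops at end of file, everything written before the
close ends up in the log.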

[PATCH 1/2] acpi/tests/avocado/bits: enforce 32-bit SMBIOS entry point

2023-10-26 Thread Ani Sinha

QEMU defaults to 64-bit entry point since the following commit
bf376f3020 ("hw/i386/pc: Default to use SMBIOS 3.0 for newer machine models")
The above change is applicable for all newer machine versions from version 8.1
and newer. i440fx and q35 machine versions 8.0 and older still use 32-bit entry
points.
Unfortunately, bits currently does not recognize 64-bit entry points and hence
is not able to parse SMBIOS tables. Therefore, we need to enforce 32-bit
SMBIOS entry point in QEMU command line so that bits is able to parse the
SMBIOS tables.
Once we implement the support in bits to parse 64-bit entry points, we can
remove the extra command line that is passed to enforce a 32-bit entry point.
The support can be added to the following smbios test script:
tests/avocado/acpi-bits/bits-tests/smbios.py2 in QEMU repository.

CC: jus...@redhat.com
CC: imamm...@redhat.com
Signed-off-by: Ani Sinha 
---
 tests/avocado/acpi-bits.py | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tests/avocado/acpi-bits.py b/tests/avocado/acpi-bits.py
index eca13dc518..042007b0b8 100644
--- a/tests/avocado/acpi-bits.py
+++ b/tests/avocado/acpi-bits.py
@@ -380,6 +380,11 @@ def test_acpi_smbios_bits(self):
 # consistent in terms of timing. smilatency tests have consistent
 # timing requirements.
 self._vm.add_args('-icount', 'auto')
+# currently there is no support in bits for recognizing 64-bit SMBIOS
+# entry points. QEMU defaults to 64-bit entry points since the
+# upstream commit bf376f3020 ("hw/i386/pc: Default to use SMBIOS 3.0
+# for newer machine models"). Therefore, enforce 32-bit entry point.
+self._vm.add_args('-machine', 'smbios-entry-point-type=32')
 
 args = " ".join(str(arg) for arg in self._vm.base_args()) + \
 " " + " ".join(str(arg) for arg in self._vm.args)
-- 
2.42.0
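
For anyone adding 64-bit support to the bits smbios test later, the two entry
point types can be told apart by their anchor strings: "_SM_" for the 32-bit
(SMBIOS 2.1) entry point and "_SM3_" for the 64-bit (SMBIOS 3.0) one. A
minimal sketch, with a hypothetical helper name and no relation to the
existing bits code:

```python
def smbios_entry_point_type(blob):
    """Classify an SMBIOS entry point structure by its anchor string."""
    if blob.startswith(b"_SM3_"):
        return 64      # SMBIOS 3.0 (64-bit) entry point
    if blob.startswith(b"_SM_"):
        return 32      # SMBIOS 2.1 (32-bit) entry point
    return None
```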

[PATCH 0/2] Some biosbits avocado test fixes

2023-10-26 Thread Ani Sinha

Included are a couple of biosbits test fixes.
32-bit SMBIOS entry point is enforced.
Console logging is enabled.

I have tested these changes in the CI pipeline here and the test seems
to pass:

https://gitlab.com/anisinha/qemu/-/jobs/5380627517
Log:

https://cdn.artifacts.gitlab-static.net/8a/b0/8ab0aa629e9c43a80356e27a440985f41da9ad10b120a410d9f070bed092fea6/2023_10_26/5380627517/5862985776/job.log?response-content-type=text%2Fplain%3B%20charset%3Dutf-8=inline=1698376660=gprd-artifacts-cdn=ln7fYsTb8t6ch0Trsa7SHAN01QY=

CC: js...@redhat.com
CC: jus...@redhat.com
CC: imamm...@redhat.com
CC: m...@redhat.com
CC: qemu-devel@nongnu.org
CC: cr...@redhat.com
CC: phi...@linaro.org
CC: Wainer dos Santos Moschetta 
CC: Beraldo Leal 
 
Ani Sinha (2):
  acpi/tests/avocado/bits: enforce 32-bit SMBIOS entry point
  acpi/tests/avocado/bits: enable console logging from bits VM

 tests/avocado/acpi-bits.py | 33 ++---
 1 file changed, 22 insertions(+), 11 deletions(-)

-- 
2.42.0

RE: [PATCH 0/1] Enable -Wshadow=local

2023-10-26 Thread Brian Cain



> -----Original Message-----
> From: Stefan Hajnoczi 
> Sent: Thursday, October 26, 2023 7:52 PM
> To: Markus Armbruster 
> Cc: qemu-devel@nongnu.org; pbonz...@redhat.com;
> marcandre.lur...@redhat.com; berra...@redhat.com; th...@redhat.com;
> phi...@linaro.org; Brian Cain ; i...@bsdimp.com;
> stefa...@redhat.com
> Subject: Re: [PATCH 0/1] Enable -Wshadow=local
> 
> On Thu, 26 Oct 2023 at 14:32, Markus Armbruster 
> wrote:
> >
> > Requires Brian's pull request and two patches from Thomas to compile:
> >
> > [PULL 0/2] hex queue - GETPC() fixes, shadowing fixes
> > [PATCH v2] block/snapshot: Fix compiler warning with -Wshadow=local
> > [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
> >
> > Stefan, the PR was posted a week ago; anything blocking it?
> 
> It's not in a pull request, so I won't see it. I don't have tooling
> that can spot individual patch series that need to go into
> qemu.git/master, so I rely on being emailed about them.

My mistake -- I thought I had emailed you.  But I see now that I likely used 
the wrong email address.

> 
> Would you like me to merge this patch series into qemu.git/master?
> 
> Stefan
> 
> > Warner, I believe not waiting for your cleanup of bsd-user is fine.
> > Please holler if it isn't.
> >
> > Based-on: <20231019021733.2258592-1-bc...@quicinc.com>
> > Based-on: <20231023175038.111607-1-th...@redhat.com>
> > Based-on: <20231024092220.55305-1-th...@redhat.com>
> >
> > Markus Armbruster (1):
> >   meson: Enable -Wshadow=local
> >
> >  meson.build | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > --
> > 2.41.0
> >
> >

Re: [PATCH 13/29] tcg/i386: Support TCG_COND_TST{EQ,NE}

2023-10-26 Thread Paolo Bonzini


On 10/26/23 02:14, Richard Henderson wrote:


+} else if ((i & ~0xff00) == 0 && r < 4) {
+tcg_out_modrm(s, OPC_GRP3_Eb, EXT3_TESTi, r);


Should be "r + 4".
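
To spell out the encoding: without a REX prefix, byte registers 0..3 are
AL, CL, DL, BL and 4..7 are AH, CH, DH, BH, so testing bits 8..15 of
register r needs r + 4 in the ModRM byte. The sketch below is an
illustration, not the TCG code; the constants are the plain x86 encoding
values and the function name is hypothetical.

```python
OPC_GRP3_EB = 0xF6   # x86 group-3 opcode with a byte operand
EXT3_TESTI = 0       # /0 selects TEST r/m8, imm8


def encode_test_high_byte(r, imm):
    """Encode 'test $imm8, %<r>h' for r in 0..3 (eax, ecx, edx, ebx)."""
    assert 0 <= r < 4 and (imm & ~0xFF00) == 0
    rm = r + 4                               # AH/CH/DH/BH are registers 4..7
    modrm = 0xC0 | (EXT3_TESTI << 3) | rm    # mod=11: register operand
    return bytes([OPC_GRP3_EB, modrm, imm >> 8])
```

With r left as is, the same bytes would test the low byte (AL, CL, DL, BL)
instead of bits 8..15.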

Paolo


+tcg_out8(s, i >> 8);
 } else {

Re: [PATCH 06/29] tcg/optimize: Handle TCG_COND_TST{EQ,NE}

2023-10-26 Thread Paolo Bonzini


On 10/26/23 02:13, Richard Henderson wrote:

+
+sh = ctz64(val);
+ret = op->args[0];
+src1 = op->args[1];
+inv = cond == TCG_COND_TSTEQ;
+
+if (neg && !inv && sext_opc) {
+op->opc = sext_opc;
+op->args[1] = src1;
+op->args[2] = sh;
+op->args[3] = 1;
+neg = false;


This needs to check the validity of (sh,1) as arguments to the extract 
opcode (and perhaps the opposite transformation should be done in 
tcg_gen_extract, when creating a 1-bit extract on a target that does not 
support it).
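
As a reminder of why a 1-bit signed extract implements the negated TSTNE
result, here is a small sketch of the two bitfield operations on plain
integers. The helper names follow the TCG terminology, but the code is only
an illustration.

```python
def extract(value, pos, length):
    """Unsigned bitfield extract: length bits starting at bit pos."""
    return (value >> pos) & ((1 << length) - 1)


def sextract(value, pos, length):
    """Signed bitfield extract: the field's top bit is its sign."""
    field = extract(value, pos, length)
    sign = 1 << (length - 1)
    return (field ^ sign) - sign
```

For a single-bit mask at position sh, extract(x, sh, 1) yields 0 or 1, while
sextract(x, sh, 1) yields 0 or -1.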


Paolo

[PATCH] Support for the RISCV Zalasr extension

2023-10-26 Thread Brendan Sweeney

From 4af1fca6e5c99578a5b80b834c22b70f6419639f Mon Sep 17 00:00:00 2001
From: Brendan Sweeney 
Date: Thu, 26 Oct 2023 17:01:29 -0500
Subject: [PATCH] Support for the RISCV Zalasr extension

Signed-off-by: Brendan Sweeney 
---
target/riscv/cpu.c | 2 +
target/riscv/cpu_cfg.h | 1 +
target/riscv/insn32.decode | 15 +++
target/riscv/insn_trans/trans_rvzalasr.c.inc | 112 +++
target/riscv/translate.c | 1 +
5 files changed, 131 insertions(+)
create mode 100644 target/riscv/insn_trans/trans_rvzalasr.c.inc

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index ac4a6c7eec..a0414bd956 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -85,6 +85,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
ISA_EXT_DATA_ENTRY(zihintpause, PRIV_VERSION_1_10_0, ext_zihintpause),
ISA_EXT_DATA_ENTRY(zmmul, PRIV_VERSION_1_12_0, ext_zmmul),
ISA_EXT_DATA_ENTRY(zawrs, PRIV_VERSION_1_12_0, ext_zawrs),
+ ISA_EXT_DATA_ENTRY(zalasr, PRIV_VERSION_1_12_0, ext_zalasr),
ISA_EXT_DATA_ENTRY(zfa, PRIV_VERSION_1_12_0, ext_zfa),
ISA_EXT_DATA_ENTRY(zfbfmin, PRIV_VERSION_1_12_0, ext_zfbfmin),
ISA_EXT_DATA_ENTRY(zfh, PRIV_VERSION_1_11_0, ext_zfh),
@@ -1248,6 +1249,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] =
{
MULTI_EXT_CFG_BOOL("zihintpause", ext_zihintpause, true),
MULTI_EXT_CFG_BOOL("zawrs", ext_zawrs, true),
MULTI_EXT_CFG_BOOL("zfa", ext_zfa, true),
+ MULTI_EXT_CFG_BOOL("zalasr", ext_zalasr, true),
MULTI_EXT_CFG_BOOL("zfh", ext_zfh, false),
MULTI_EXT_CFG_BOOL("zfhmin", ext_zfhmin, false),
MULTI_EXT_CFG_BOOL("zve32f", ext_zve32f, false),
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 0e6a0f245c..8e4f9282fd 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -76,6 +76,7 @@ struct RISCVCPUConfig {
bool ext_svpbmt;
bool ext_zdinx;
bool ext_zawrs;
+ bool ext_zalasr;
bool ext_zfa;
bool ext_zfbfmin;
bool ext_zfh;
diff --git a/target/riscv/insn32.decode b/target/riscv/insn32.decode
index 33597fe2bb..ba95cdf964 100644
--- a/target/riscv/insn32.decode
+++ b/target/riscv/insn32.decode
@@ -70,6 +70,9 @@
@atom_ld . aq:1 rl:1 .  . ...  rs2=0 %rs1 %rd
@atom_st . aq:1 rl:1 .  . ...  %rs2 %rs1 %rd
+@l_aq . . rl:1 .  . ...  rs2=0 %rs1 %rd aq=1
+@s_rl . aq:1 . .  . ...  %rs2 %rs1 rd=0 rl=1
+
@r4_rm . .. . . ... . ... %rs3 %rs2 %rs1 %rm %rd
@r_rm ... . . ... . ... %rs2 %rs1 %rm %rd
@@ -739,6 +742,18 @@ vsetvl 100 . . 111 . 1010111 @r
wrs_nto 1101 0 000 0 1110011
wrs_sto 00011101 0 000 0 1110011
+# *** RV32 Zalasr Standard Extension ***
+lb_aq 00110 1 . 0 . 000 . 010 @l_aq
+lh_aq 00110 1 . 0 . 001 . 010 @l_aq
+lw_aq 00110 1 . 0 . 010 . 010 @l_aq
+sb_rl 00111 . 1 . . 000 0 010 @s_rl
+sh_rl 00111 . 1 . . 001 0 010 @s_rl
+sw_rl 00111 . 1 . . 010 0 010 @s_rl
+
+# *** RV64 Zalasr Standard Extension (in addition to RV32 Zalasr) ***
+ld_aq 00110 1 . 0 . 011 . 010 @l_aq
+sd_rl 00111 . 1 . . 011 0 010 @s_rl
+
# *** RV32 Zba Standard Extension ***
sh1add 001 .. 010 . 0110011 @r
sh2add 001 .. 100 . 0110011 @r
diff --git a/target/riscv/insn_trans/trans_rvzalasr.c.inc
b/target/riscv/insn_trans/trans_rvzalasr.c.inc
new file mode 100644
index 00..cee81ce8b8
--- /dev/null
+++ b/target/riscv/insn_trans/trans_rvzalasr.c.inc
@@ -0,0 +1,112 @@
+/*
+ * RISC-V translation routines for the Zalasr Standard Extension.
+ *
+ * Copyright (c) 2023 Brendan Sweeney, b...@berkeley.edu
+ *
+ * This program is free software; you can redistribute it and/or modify it
+ * under the terms and conditions of the GNU General Public License,
+ * version 2 or later, as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope it will be useful, but WITHOUT
+ * ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or
+ * FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for
+ * more details.
+ *
+ * You should have received a copy of the GNU General Public License along with
+ * this program. If not, see <http://www.gnu.org/licenses/>.
+ */
+
+#define REQUIRE_ZALASR(ctx) do { \
+ if (!ctx->cfg_ptr->ext_zalasr) { \
+ return false; \
+ } \
+} while (0)
+
+static bool gen_l_aq(DisasContext *ctx, arg_atomic *a, MemOp mop)
+{
+ TCGv src1;
+
+ decode_save_opc(ctx);
+ src1 = get_address(ctx, a->rs1, 0);
+ if (a->rl) {
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_STRL);
+ }
+ tcg_gen_qemu_ld_tl(load_val, src1, ctx->mem_idx, mop);
+ if (a->aq) {
+ tcg_gen_mb(TCG_MO_ALL | TCG_BAR_LDAQ);
+ }
+ /* Put data in load_val. */
+ gen_set_gpr(ctx, a->rd, load_val);
+
+ return true;
+}
+
+static bool trans_lb_aq(DisasContext *ctx, arg_lb_aq *a)
+{
+ REQUIRE_ZALASR(ctx);
+ return gen_l_aq(ctx, a, (MO_ALIGN | MO_SB));
+}
+
+static bool trans_lh_aq(DisasContext *ctx,

Re: [PATCH 0/1] Enable -Wshadow=local

2023-10-26 Thread Stefan Hajnoczi

On Thu, 26 Oct 2023 at 14:32, Markus Armbruster  wrote:
>
> Requires Brian's pull request and two patches from Thomas to compile:
>
> [PULL 0/2] hex queue - GETPC() fixes, shadowing fixes
> [PATCH v2] block/snapshot: Fix compiler warning with -Wshadow=local
> [PATCH v2] migration/ram: Fix compilation with -Wshadow=local
>
> Stefan, the PR was posted a week ago; anything blocking it?

It's not in a pull request, so I won't see it. I don't have tooling
that can spot individual patch series that need to go into
qemu.git/master, so I rely on being emailed about them.

Would you like me to merge this patch series into qemu.git/master?

Stefan

> Warner, I believe not waiting for your cleanup of bsd-user is fine.
> Please holler if it isn't.
>
> Based-on: <20231019021733.2258592-1-bc...@quicinc.com>
> Based-on: <20231023175038.111607-1-th...@redhat.com>
> Based-on: <20231024092220.55305-1-th...@redhat.com>
>
> Markus Armbruster (1):
>   meson: Enable -Wshadow=local
>
>  meson.build | 1 +
>  1 file changed, 1 insertion(+)
>
> --
> 2.41.0
>
>

Re: [PULL 00/39] Migration 20231024 patches

2023-10-26 Thread Stefan Hajnoczi

On Fri, 27 Oct 2023 at 00:25, Juan Quintela  wrote:
>
> Stefan Hajnoczi  wrote:
> > On Tue, 24 Oct 2023 at 23:45, Juan Quintela  wrote:
> >>
> >> The following changes since commit 
> >> a95260486aa7e78d7c7194eba65cf03311ad94ad:
> >>
> >>   Merge tag 'pull-tcg-20231023' of https://gitlab.com/rth7680/qemu into 
> >> staging (2023-10-23 14:45:46 -0700)
> >>
> >> are available in the Git repository at:
> >>
> >>   https://gitlab.com/juan.quintela/qemu.git 
> >> tags/migration-20231024-pull-request
> >>
> >> for you to fetch changes up to 088f7f03da3f5b3487091302b795c22b1bfe56fb:
> >>
> >>   migration: Deprecate old compression method (2023-10-24 13:48:24 +0200)
> >>
> >> 
> >> Migration Pull request (20231024)
> >>
> >> Hi
> >>
> >> In this PULL:
> >> - vmstate registration fixes (thomas, juan)
> >> - start merging vmstate_section_needed changes (marc)
> >> - migration deprecations (juan)
> >> - migration documentation for backwards compatibility (juan)
> >>
> >> Please apply.
> >
> > Hi Juan,
> > I'm seeing CI failures:
> > https://gitlab.com/qemu-project/qemu/-/pipelines/1048630760
>
> start with s390x:
>
> Errors:
>
>  32/840 qemu:qtest+qtest-s390x / qtest-s390x/qom-test                ERROR  50.27s   killed by signal 6 SIGABRT
> 104/840 qemu:qtest+qtest-s390x / qtest-s390x/test-hmp                ERROR  51.55s   killed by signal 6 SIGABRT
> 189/840 qemu:qtest+qtest-s390x / qtest-s390x/boot-serial-test        ERROR  54.07s   killed by signal 6 SIGABRT
> 192/840 qemu:qtest+qtest-s390x / qtest-s390x/qos-test                ERROR  51.29s   killed by signal 6 SIGABRT
> 519/840 qemu:qtest+qtest-s390x / qtest-s390x/test-filter-mirror      ERROR  50.36s   killed by signal 6 SIGABRT
> 520/840 qemu:qtest+qtest-s390x / qtest-s390x/test-netfilter          ERROR  51.03s   killed by signal 6 SIGABRT
> 522/840 qemu:qtest+qtest-s390x / qtest-s390x/device-plug-test        ERROR  50.99s   killed by signal 6 SIGABRT
> 523/840 qemu:qtest+qtest-s390x / qtest-s390x/test-filter-redirector  ERROR  54.14s   killed by signal 6 SIGABRT
> 524/840 qemu:qtest+qtest-s390x / qtest-s390x/drive_del-test          ERROR  53.40s   killed by signal 6 SIGABRT
> 525/840 qemu:qtest+qtest-s390x / qtest-s390x/virtio-ccw-test         ERROR  54.67s   killed by signal 6 SIGABRT
> 526/840 qemu:qtest+qtest-s390x / qtest-s390x/device-introspect-test  ERROR  51.15s   killed by signal 6 SIGABRT
> 527/840 qemu:qtest+qtest-s390x / qtest-s390x/cpu-plug-test           ERROR  51.21s   killed by signal 6 SIGABRT
> 535/840 qemu:qtest+qtest-s390x / qtest-s390x/qmp-test                ERROR  51.18s   killed by signal 6 SIGABRT
> 534/840 qemu:qtest+qtest-s390x / qtest-s390x/machine-none-test       ERROR  51.21s   killed by signal 6 SIGABRT
> 533/840 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test            ERROR  51.22s   killed by signal 6 SIGABRT
> 549/840 qemu:qtest+qtest-s390x / qtest-s390x/readconfig-test         ERROR  51.20s   killed by signal 6 SIGABRT
> 644/840 qemu:block / io-qcow2-001                                    ERROR   0.32s   exit status 1
> 645/840 qemu:block / io-qcow2-002                                    ERROR   0.32s   exit status 1
> 646/840 qemu:block / io-qcow2-003                                    ERROR   0.34s   exit status 1
> 647/840 qemu:block / io-qcow2-004                                    ERROR   0.31s   exit status 1
> 648/840 qemu:block / io-qcow2-005                                    ERROR   0.43s   exit status 1
> 649/840 qemu:block / io-qcow2-007                                    ERROR   0.34s   exit status 1
> 650/840 qemu:block / io-qcow2-008                                    ERROR   0.63s   exit status 1
> 651/840 qemu:block / io-qcow2-009                                    ERROR   0.32s   exit status 1
> 652/840 qemu:block / io-qcow2-010                                    ERROR   0.30s   exit status 1
> 654/840 qemu:block / io-qcow2-011                                    ERROR   0.31s   exit status 1
> 655/840 qemu:block / io-qcow2-012                                    ERROR   0.36s   exit status 1
> 657/840 qemu:block / io-qcow2-013                                    ERROR   0.51s   exit status 1
> 658/840 qemu:block / io-qcow2-017                                    ERROR

Re: [PATCH] target/i386/monitor: synchronize cpu before printing lapic state

2023-10-26 Thread Dongli Zhang

Hi David,

Thank you very much for the Reviewed-by in another thread.

I have rebased the patch and sent it again.

https://lore.kernel.org/all/20231026211938.162815-1-dongli.zh...@oracle.com/

Dongli Zhang

On 10/26/23 09:39, Dongli Zhang wrote:
> Hi David,
> 
> On 10/26/23 08:39, David Woodhouse wrote:
>> From: David Woodhouse 
>>
>> Where the local APIC is emulated by KVM, we need kvm_get_apic() to pull
>> the current state into userspace before it's printed. Otherwise we get
>> stale values.
>>
>> Signed-off-by: David Woodhouse 
>> ---
>>  target/i386/monitor.c | 2 ++
>>  1 file changed, 2 insertions(+)
>>
>> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
>> index 6512846327..0754d699ba 100644
>> --- a/target/i386/monitor.c
>> +++ b/target/i386/monitor.c
>> @@ -29,6 +29,7 @@
>>  #include "monitor/hmp.h"
>>  #include "qapi/qmp/qdict.h"
>>  #include "sysemu/kvm.h"
>> +#include "sysemu/hw_accel.h"
>>  #include "qapi/error.h"
>>  #include "qapi/qapi-commands-misc-target.h"
>>  #include "qapi/qapi-commands-misc.h"
>> @@ -655,6 +656,7 @@ void hmp_info_local_apic(Monitor *mon, const QDict 
>> *qdict)
>>  if (qdict_haskey(qdict, "apic-id")) {
>>  int id = qdict_get_try_int(qdict, "apic-id", 0);
>>  cs = cpu_by_arch_id(id);
>> +cpu_synchronize_state(cs);
> 
> AFAIR, cs may be NULL here; I ran into that case when I sent a similar
> bugfix a long time ago.
> 
> https://lore.kernel.org/qemu-devel/20210701214051.1588-1-dongli.zh...@oracle.com/
> 
> ... and resend:
> 
> https://lore.kernel.org/qemu-devel/20210908143803.29191-1-dongli.zh...@oracle.com/
> 
> ... and resent by Daniel as part of another patchset (after review):
> 
> https://lore.kernel.org/qemu-devel/20211028155457.967291-19-berra...@redhat.com/
> 
> 
> This utility is helpful for diagnosing lost-interrupt issues.
> 
> Dongli Zhang
> 
>>  } else {
>>  cs = mon_get_cpu(mon);
>>  }

[PATCH v3 1/1] hmp: synchronize cpu state for lapic info

2023-10-26 Thread Dongli Zhang

While the default "info lapic" always synchronizes cpu state ...

mon_get_cpu()
-> mon_get_cpu_sync(mon, true)
   -> cpu_synchronize_state(cpu)
  -> ioctl KVM_GET_LAPIC (taking KVM as example)

... the cpu state is not synchronized when the apic-id is available as
argument.

The cpu state should be synchronized when apic-id is available. Otherwise
the "info lapic <apic-id>" always returns stale data.

Reference:
https://lore.kernel.org/all/20211028155457.967291-19-berra...@redhat.com/

Cc: Joe Jin 
Signed-off-by: Dongli Zhang 
Reviewed-by: Daniel P. Berrangé 
Reviewed-by: David Woodhouse 
---
Changed since v1:
  - I sent out wrong patch version in v1
Changed since v2:
  - Add the Reviewed-by from Daniel and David

 target/i386/monitor.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 6512846327..d727270fd0 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -28,6 +28,7 @@
 #include "monitor/hmp-target.h"
 #include "monitor/hmp.h"
 #include "qapi/qmp/qdict.h"
+#include "sysemu/hw_accel.h"
 #include "sysemu/kvm.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc-target.h"
@@ -654,7 +655,11 @@ void hmp_info_local_apic(Monitor *mon, const QDict *qdict)
 
 if (qdict_haskey(qdict, "apic-id")) {
 int id = qdict_get_try_int(qdict, "apic-id", 0);
+
 cs = cpu_by_arch_id(id);
+if (cs) {
+cpu_synchronize_state(cs);
+}
 } else {
 cs = mon_get_cpu(mon);
 }
-- 
2.34.1
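
The guard in the hunk above can be illustrated in isolation: a lookup by ID may legitimately return NULL, so the state synchronization has to be skipped in that case. A minimal sketch using stand-in types; FakeCPU, fake_cpu_by_id and fake_synchronize_state are hypothetical substitutes for CPUState, cpu_by_arch_id() and cpu_synchronize_state(), not QEMU code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a vCPU whose cached state may be stale. */
typedef struct FakeCPU {
    int arch_id;
    bool synced;
} FakeCPU;

static FakeCPU fake_cpus[] = { { 0, false }, { 1, false } };

/* Returns NULL when no CPU has the requested ID, like a lookup by apic-id. */
static FakeCPU *fake_cpu_by_id(int id)
{
    for (size_t i = 0; i < sizeof(fake_cpus) / sizeof(fake_cpus[0]); i++) {
        if (fake_cpus[i].arch_id == id) {
            return &fake_cpus[i];
        }
    }
    return NULL;
}

/* Dereferences its argument, so it must never be handed NULL. */
static void fake_synchronize_state(FakeCPU *cpu)
{
    cpu->synced = true;
}

/* Look up a CPU and refresh its state only if the lookup succeeded. */
static FakeCPU *lookup_and_sync(int id)
{
    FakeCPU *cpu = fake_cpu_by_id(id);

    if (cpu) {
        fake_synchronize_state(cpu);
    }
    return cpu;
}
```

A caller can then report an unknown ID on a NULL return instead of crashing inside the synchronization helper.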

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread Stefano Stabellini

On Thu, 26 Oct 2023, David Woodhouse wrote:
> On Thu, 2023-10-26 at 13:36 -0700, Stefano Stabellini wrote:
> > 
> > > This seems like a lot of code to replace that simpler option... is
> > > there a massive performance win from doing it this way? Would we want
> > > to use this trick for the Xen PV backends (qdisk, qnic) *too*? Might it
> > > make sense to introduce the simple version and *then* the optimisation,
> > > with some clear benchmarking to show the win?
> > 
> > This is not done for performance but for safety (as in safety
> > certifications, ISO 26262, etc.). This is to enable unprivileged virtio
> > backends running in a DomU. By unprivileged I mean a virtio backend that
> > is unable to map arbitrary memory (the xenforeignmemory interface is
> > prohibited).
> > 
> > The goal is to run Xen on safety-critical systems such as cars,
> > industrial robots and more. In this configuration there is no
> > > traditional Dom0 in the system at all. If you would like to know more:
> > > https://www.youtube.com/watch?v=tisljY6Bqv0&list=PLYyw7IQjL-zHtpYtMpFR3KYdRn0rcp5Xn&index=8
> 
> Yeah, I understand why we're using grant mappings instead of just
> directly having access via foreignmem mappings. That wasn't what I was
> confused about.
> 
> What I haven't worked out is why we're implementing this through an
> automatically-populated MemoryRegion in QEMU, rather than just using
> grant mapping ops like we always have.
> 
> It seems like a lot of complexity just to avoid calling
> qemu_xen_gnttab_map_refs() from the virtio backend.

I think there are two questions here. One question is "Why do we need
all the new grant mapping code added to xen-mapcache.c in patch #7?
Can't we use qemu_xen_gnttab_map_refs() instead?"

Good question, I'll let Juergen and Vikram comment as original authors.

And the second question is where to call the grant mapping from (whether
the new code or the old qemu_xen_gnttab_map_refs code it doesn't
matter). It could be even simpler than calling it from the virtio
backends: we could just call it from the existing qemu_ram_ptr_length()
hook. See this discussion:
https://marc.info/?l=qemu-devel&m=169828434927778

[RFC] pvpanic notifications for non-panic reboot events

2023-10-26 Thread Thomas Weißschuh

Hi everybody,

the mechanism for a (Linux) guest to signal a regular shutdown event to
QEMU seems fairly architecture specific and dependent on kernel
configuration.

The existing pvpanic protocol [0] could be extended fairly easily to
also cover these events.

Any thoughts?

Thanks,
Thomas

[0] https://github.com/qemu/qemu/blob/master/docs/specs/pvpanic.txt

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread David Woodhouse

On Thu, 2023-10-26 at 13:36 -0700, Stefano Stabellini wrote:
> 
> > This seems like a lot of code to replace that simpler option... is
> > there a massive performance win from doing it this way? Would we want
> > to use this trick for the Xen PV backends (qdisk, qnic) *too*? Might it
> > make sense to introduce the simple version and *then* the optimisation,
> > with some clear benchmarking to show the win?
> 
> This is not done for performance but for safety (as in safety
> certifications, ISO 26262, etc.). This is to enable unprivileged virtio
> backends running in a DomU. By unprivileged I mean a virtio backend that
> is unable to map arbitrary memory (the xenforeignmemory interface is
> prohibited).
> 
> The goal is to run Xen on safety-critical systems such as cars,
> industrial robots and more. In this configuration there is no
> traditional Dom0 in the system at all. If you would like to know more:
> https://www.youtube.com/watch?v=tisljY6Bqv0&list=PLYyw7IQjL-zHtpYtMpFR3KYdRn0rcp5Xn&index=8

Yeah, I understand why we're using grant mappings instead of just
directly having access via foreignmem mappings. That wasn't what I was
confused about.

What I haven't worked out is why we're implementing this through an
automatically-populated MemoryRegion in QEMU, rather than just using
grant mapping ops like we always have.

It seems like a lot of complexity just to avoid calling
qemu_xen_gnttab_map_refs() from the virtio backend.

And if we're going to use this magic region, are we going to start
using it for the Xen PV backends in QEMU too, when they want to map
grant refs? 

smime.p7s
Description: S/MIME cryptographic signature

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread Stefano Stabellini

On Thu, 26 Oct 2023, David Woodhouse wrote:
> On Thu, 2023-10-26 at 11:07 -0700, Stefano Stabellini wrote:
> > On Thu, 26 Oct 2023, David Woodhouse wrote:
> > > On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> > > > Hi,
> > > > This patch series adds support for grant mappings as a pseudo RAM region
> > > > for Xen.
> > > > 
> > > > The patches enabling grant mappings (the first 6) were written by Juergen in 2021.
> > > > 
> > > > QEMU Virtio devices provide emulated backends for Virtio frontend devices
> > > > in Xen.
> > > > Please set the "iommu_platform=on" option when invoking QEMU. This sets the
> > > > VIRTIO_F_ACCESS_PLATFORM feature, which the virtio frontend in Xen uses
> > > > to know whether the backend supports grants or not.
> > > 
> > > I don't really understand what's going on here. The subject of the
> > > cover letter certainly doesn't help me, because we *already* support
> > > grant mappings under Xen, don't we?
> > > 
> > > I found
> > > https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf but
> > > I think it's a bit out of date; the decision about how to handle grant
> > > mappings for virtio devices is still 'TBD'.
> > 
> > See this presentation:
> > https://www.youtube.com/watch?v=boRQ8UHc760
> > 
> > The patch series is for the guest (e.g. Linux) to use grants to share
> > memory with virtio devices. The plumbing was already done in Linux a
> > couple of years ago, but QEMU support for it is still missing.
> 
> Thanks.
> 
> > > Can you talk me through the process of what happens when a guest wants
> > > a virtio device to initiate 'DMA' to one of its pages? I assume it
> > > starts by creating a grant mapping, and then taking the gntref and...
> > > then what?
> > 
> > First the guest gets a grant reference for the page, then it uses it as
> > "dma address" to pass to the virtio device. The top bit (1ULL << 63) is
> > used to signal that the address in question is a grant address.
> 
> OK, so the guest sets the top bit in the DMA address and that indicates
> that this is no longer actually a guest physical address, but instead,
> bits 62-12 are a (starting) grant ref. (And the address *within* the
> page is still bits 0-11, assuming 4KiB pages).
> 
> An alternative way of implementing this on the QEMU side would just be
> to teach the virtio mapping to recognise the "special" format with the
> top bit set, and call qemu_xen_gnttab_map_refs() directly to get the
> mapping?
> 
> This seems like a lot of code to replace that simpler option... is
> there a massive performance win from doing it this way? Would we want
> to use this trick for the Xen PV backends (qdisk, qnic) *too*? Might it
> make sense to introduce the simple version and *then* the optimisation,
> with some clear benchmarking to show the win?

This is not done for performance but for safety (as in safety
certifications, ISO 26262, etc.). This is to enable unprivileged virtio
backends running in a DomU. By unprivileged I mean a virtio backend that
is unable to map arbitrary memory (the xenforeignmemory interface is
prohibited).

The goal is to run Xen on safety-critical systems such as cars,
industrial robots and more. In this configuration there is no
traditional Dom0 in the system at all. If you would like to know more:
https://www.youtube.com/watch?v=tisljY6Bqv0&list=PLYyw7IQjL-zHtpYtMpFR3KYdRn0rcp5Xn&index=8


> If we're just going to switch to grant mappings, why *aren't* we using
> Xen PV device models on Arm anyway?

I think you mean Xen PV drivers. Yes we have been using them for a
while. Now we would also like to add the option of running virtio
devices but we can only do that if the virtio backend is unprivileged as
we have no Dom0 in the system.

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread David Woodhouse

On Thu, 2023-10-26 at 11:07 -0700, Stefano Stabellini wrote:
> On Thu, 26 Oct 2023, David Woodhouse wrote:
> > On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> > > Hi,
> > > This patch series adds support for grant mappings as a pseudo RAM region
> > > for Xen.
> > > 
> > > The patches enabling grant mappings (the first 6) were written by Juergen in 2021.
> > > 
> > > QEMU Virtio devices provide emulated backends for Virtio frontend devices
> > > in Xen.
> > > Please set the "iommu_platform=on" option when invoking QEMU. This sets the
> > > VIRTIO_F_ACCESS_PLATFORM feature, which the virtio frontend in Xen uses
> > > to know whether the backend supports grants or not.
> > 
> > I don't really understand what's going on here. The subject of the
> > cover letter certainly doesn't help me, because we *already* support
> > grant mappings under Xen, don't we?
> > 
> > I found
> > https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf but
> > I think it's a bit out of date; the decision about how to handle grant
> > mappings for virtio devices is still 'TBD'.
> 
> See this presentation:
> https://www.youtube.com/watch?v=boRQ8UHc760
> 
> The patch series is for the guest (e.g. Linux) to use grants to share
> memory with virtio devices. The plumbing was already done in Linux a
> couple of years ago, but QEMU support for it is still missing.

Thanks.

> > Can you talk me through the process of what happens when a guest wants
> > a virtio device to initiate 'DMA' to one of its pages? I assume it
> > starts by creating a grant mapping, and then taking the gntref and...
> > then what?
> 
> First the guest gets a grant reference for the page, then it uses it as
> "dma address" to pass to the virtio device. The top bit (1ULL << 63) is
> used to signal that the address in question is a grant address.

OK, so the guest sets the top bit in the DMA address and that indicates
that this is no longer actually a guest physical address, but instead,
bits 62-12 are a (starting) grant ref. (And the address *within* the
page is still bits 0-11, assuming 4KiB pages).
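
The address layout described in this paragraph can be sketched as a few bit operations. This assumes 4KiB pages as stated; the macro and function names are hypothetical and are not taken from the patch series:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical names; the layout follows the description in this thread. */
#define GRANT_ADDR_FLAG   (1ULL << 63)   /* top bit marks a grant address */
#define GRANT_PAGE_SHIFT  12             /* assumes 4KiB pages */
#define GRANT_PAGE_MASK   ((1ULL << GRANT_PAGE_SHIFT) - 1)

/* True if the DMA address encodes a grant reference, not a guest address. */
static bool dma_addr_is_grant(uint64_t addr)
{
    return (addr & GRANT_ADDR_FLAG) != 0;
}

/* Bits 62-12: the (starting) grant reference. */
static uint64_t dma_addr_grant_ref(uint64_t addr)
{
    return (addr & ~GRANT_ADDR_FLAG) >> GRANT_PAGE_SHIFT;
}

/* Bits 11-0: the offset within the granted page. */
static uint64_t dma_addr_page_offset(uint64_t addr)
{
    return addr & GRANT_PAGE_MASK;
}
```

A backend taking the direct route would check the flag first and, if it is set, map the decoded reference before applying the page offset.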

An alternative way of implementing this on the QEMU side would just be
to teach the virtio mapping to recognise the "special" format with the
top bit set, and call qemu_xen_gnttab_map_refs() directly to get the
mapping?

This seems like a lot of code to replace that simpler option... is
there a massive performance win from doing it this way? Would we want
to use this trick for the Xen PV backends (qdisk, qnic) *too*? Might it
make sense to introduce the simple version and *then* the optimisation,
with some clear benchmarking to show the win?

If we're just going to switch to grant mappings, why *aren't* we using
Xen PV device models on Arm anyway?

Or am I missing the point completely?

smime.p7s
Description: S/MIME cryptographic signature

Re: [PATCH 4/3] migration: Add tracepoints for downtime checkpoints

2023-10-26 Thread Peter Xu

On Thu, Oct 26, 2023 at 04:08:20PM -0400, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 08:43:59PM +0100, Joao Martins wrote:
> > Considering we aren't including any downtime timestamps in the tracing, is 
> > this
> > a way to say that the tracing tool printing timestamps is what we use to 
> > extract
> > downtime contribution?
> > 
> > It might be obvious, but perhaps should be spelled out in the commit 
> > message?
> 
> Sure, I'll state that in the commit message in a new version.

On second thought, I'll rename it to vmstate_downtime_checkpoint().

I'll wait for 1-2 more days for further review comments before a repost.

Thanks,

-- 
Peter Xu

Re: [PATCH V2 0/6] Live Update reboot mode

2023-10-26 Thread Steven Sistare

Sorry for the noise, I typed the wrong send-mail command; please ignore this
resend of V2 - steve

On 10/26/2023 4:08 PM, Steve Sistare wrote:
> Add a mode migration parameter that can be used to select alternate
> migration algorithms.  The default mode is normal, representing the
> current migration algorithm, and does not need to be explicitly set.
> 
> Provide the cpr-reboot (CheckPoint and Restart) migration mode for live
> update, which saves state to a file.  This allows one to quit qemu, reboot
> to an updated kernel, install an updated version of qemu, and resume via
> the migrate-incoming command.  The caller must specify a migration URI
> that writes to and reads from a file, and must set the mode parameter
> before invoking the migrate or migrate-incoming commands.
> 
> Unlike normal mode, the use of certain local storage options does not block
> cpr-reboot mode, but the caller must not modify guest block devices between
> the quit and restart.  To avoid saving guest RAM to the state file, the memory
> backend must be shared, and the @x-ignore-shared migration capability must
> be set.
> 
> Guest RAM must be non-volatile across reboot, which can be achieved by
> backing it with a dax device, or /dev/shm PKRAM as proposed in
> https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yzn...@oracle.com
> but this is not enforced.  The restarted qemu arguments must match those used
> to initially start qemu, plus the -incoming option.
> 
> This patch series contains minimal functionality.  Future patches will enhance
> reboot mode by preserving vfio devices for suspended guests.  They will also
> add a new mode for updating qemu using the exec system call, which will keep
> vfio devices and certain character devices alive.
> 
> Here is an example of updating the host kernel using reboot mode.
> 
> window 1                                         | window 2
>                                                  |
> # qemu-system-$arch -monitor stdio               |
>   mem-path=/dev/dax0.0 ...                       |
> QEMU 8.1.50 monitor - type 'help' for more info  |
> (qemu) info status                               |
> VM status: running                               |
>                                                  | # yum update kernel-uek
> (qemu) migrate_set_capability x-ignore-shared on |
> (qemu) migrate_set_parameter mode cpr-reboot     |
> (qemu) migrate -d file:vm.state                  |
> (qemu) info status                               |
> VM status: paused (postmigrate)                  |
> (qemu) quit                                      |
>                                                  |
> # systemctl kexec                                |
> kexec_core: Starting new kernel                  |
> ...                                              |
>                                                  |
> # qemu-system-$arch -monitor stdio               |
>   mem-path=/dev/dax0.0 -incoming defer ...       |
> QEMU 8.1.50 monitor - type 'help' for more info  |
> (qemu) info status                               |
> VM status: paused (inmigrate)                    |
> (qemu) migrate_set_capability x-ignore-shared on |
> (qemu) migrate_set_parameter mode cpr-reboot     |
> (qemu) migrate_incoming file:vm.state            |
> (qemu) info status                               |
> VM status: running                               |
> 
> Changes in V2:
>   - moved "migration mode" code and comments to more appropriate places
>   - clarified the behavior of non-shared-memory backends
>   - split blocker patches and reverted some blockers
>   - added a test
> 
> Steve Sistare (6):
>   migration: mode parameter
>   migration: per-mode blockers
>   cpr: relax blockdev migration blockers
>   cpr: relax vhost migration blockers
>   cpr: reboot mode
>   tests/qtest: migration: add reboot mode test
> 
>  block/parallels.c   |  2 +-
>  block/qcow.c|  2 +-
>  block/vdi.c |  2 +-
>  block/vhdx.c|  2 +-
>  block/vmdk.c|  2 +-
>  block/vpc.c |  2 +-
>  block/vvfat.c   |  2 +-
>  hw/core/qdev-properties-system.c| 14 ++
>  hw/scsi/vhost-scsi.c|  2 +-
>  hw/virtio/vhost.c   |  2 +-
>  include/hw/qdev-properties-system.h |  4 ++
>  include/migration/blocker.h | 44 +++--
>  include/migration/misc.h|  1 +
>  migration/migration-hmp-cmds.c  |  9 
>  migration/migration.c   | 95 
> -
>  migration/options.c | 21 
>  migration/options.h |  1 +
>  qapi/migration.json | 40 ++--
>  stubs/migr-blocker.c| 10 
>  tests/qtest/migration-test.c| 27 +++
>  20 files changed, 255 insertions(+), 29 deletions(-)
>

[PATCH V2 1/6] migration: mode parameter

2023-10-26 Thread Steve Sistare

Create a mode migration parameter that can be used to select alternate
migration algorithms.  The default mode is normal, representing the
current migration algorithm, and does not need to be explicitly set.

No functional change until a new mode is added, except that the mode is
shown by the 'info migrate' command.

Signed-off-by: Steve Sistare 
---
 hw/core/qdev-properties-system.c| 14 ++
 include/hw/qdev-properties-system.h |  4 
 include/migration/misc.h|  1 +
 migration/migration-hmp-cmds.c  |  9 +
 migration/options.c | 21 +
 migration/options.h |  1 +
 qapi/migration.json | 27 ---
 7 files changed, 74 insertions(+), 3 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 6883406..07a848d 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -673,6 +673,20 @@ const PropertyInfo qdev_prop_multifd_compression = {
 .set_default_value = qdev_propinfo_set_default_value_enum,
 };
 
+/* --- MigMode --- */
+
+QEMU_BUILD_BUG_ON(sizeof(MigMode) != sizeof(int));
+
+const PropertyInfo qdev_prop_mig_mode = {
+.name = "MigMode",
+.description = "mig_mode values, "
+   "normal",
+.enum_table = &MigMode_lookup,
+.get = qdev_propinfo_get_enum,
+.set = qdev_propinfo_set_enum,
+.set_default_value = qdev_propinfo_set_default_value_enum,
+};
+
 /* --- Reserved Region --- */
 
 /*
diff --git a/include/hw/qdev-properties-system.h 
b/include/hw/qdev-properties-system.h
index 0ac327a..1418801 100644
--- a/include/hw/qdev-properties-system.h
+++ b/include/hw/qdev-properties-system.h
@@ -7,6 +7,7 @@ extern const PropertyInfo qdev_prop_chr;
 extern const PropertyInfo qdev_prop_macaddr;
 extern const PropertyInfo qdev_prop_reserved_region;
 extern const PropertyInfo qdev_prop_multifd_compression;
+extern const PropertyInfo qdev_prop_mig_mode;
 extern const PropertyInfo qdev_prop_losttickpolicy;
 extern const PropertyInfo qdev_prop_blockdev_on_error;
 extern const PropertyInfo qdev_prop_bios_chs_trans;
@@ -41,6 +42,9 @@ extern const PropertyInfo qdev_prop_pcie_link_width;
 #define DEFINE_PROP_MULTIFD_COMPRESSION(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_multifd_compression, \
MultiFDCompression)
+#define DEFINE_PROP_MIG_MODE(_n, _s, _f, _d) \
+DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_mig_mode, \
+   MigMode)
 #define DEFINE_PROP_LOSTTICKPOLICY(_n, _s, _f, _d) \
 DEFINE_PROP_SIGNED(_n, _s, _f, _d, qdev_prop_losttickpolicy, \
 LostTickPolicy)
diff --git a/include/migration/misc.h b/include/migration/misc.h
index 673ac49..1bc8902 100644
--- a/include/migration/misc.h
+++ b/include/migration/misc.h
@@ -15,6 +15,7 @@
 #define MIGRATION_MISC_H
 
 #include "qemu/notify.h"
+#include "qapi/qapi-types-migration.h"
 #include "qapi/qapi-types-net.h"
 
 /* migration/ram.c */
diff --git a/migration/migration-hmp-cmds.c b/migration/migration-hmp-cmds.c
index a82597f..35e57b8 100644
--- a/migration/migration-hmp-cmds.c
+++ b/migration/migration-hmp-cmds.c
@@ -387,6 +387,11 @@ void hmp_info_migrate_parameters(Monitor *mon, const QDict 
*qdict)
 monitor_printf(mon, "%s: %" PRIu64 " MB/s\n",
 MigrationParameter_str(MIGRATION_PARAMETER_VCPU_DIRTY_LIMIT),
 params->vcpu_dirty_limit);
+
+assert(params->has_mode);
+monitor_printf(mon, "%s: %s\n",
+MigrationParameter_str(MIGRATION_PARAMETER_MODE),
+qapi_enum_lookup(&MigMode_lookup, params->mode));
 }
 
 qapi_free_MigrationParameters(params);
@@ -661,6 +666,10 @@ void hmp_migrate_set_parameter(Monitor *mon, const QDict *qdict)
 p->has_vcpu_dirty_limit = true;
 visit_type_size(v, param, >vcpu_dirty_limit, );
 break;
+case MIGRATION_PARAMETER_MODE:
+p->has_mode = true;
+visit_type_MigMode(v, param, >mode, );
+break;
 default:
 assert(0);
 }
diff --git a/migration/options.c b/migration/options.c
index 42fb818..cbede3f 100644
--- a/migration/options.c
+++ b/migration/options.c
@@ -175,6 +175,9 @@ Property migration_properties[] = {
 DEFINE_PROP_UINT64("vcpu-dirty-limit", MigrationState,
parameters.vcpu_dirty_limit,
DEFAULT_MIGRATE_VCPU_DIRTY_LIMIT),
+DEFINE_PROP_MIG_MODE("mode", MigrationState,
+  parameters.mode,
+  MIG_MODE_NORMAL),
 
 /* Migration capabilities */
 DEFINE_PROP_MIG_CAP("x-xbzrle", MIGRATION_CAPABILITY_XBZRLE),
@@ -803,6 +806,13 @@ uint64_t migrate_max_postcopy_bandwidth(void)
 return s->parameters.max_postcopy_bandwidth;
 }
 
+MigMode migrate_mode(void)
+{
+MigrationState *s = migrate_get_current();
+
+return s->parameters.mode;
+}
+
 int migrate_multifd_channels(void)
 {

[PATCH V2 2/6] migration: per-mode blockers

2023-10-26 Thread Steve Sistare

Extend the blocker interface so that a blocker can be registered for
one or more migration modes.  The existing interfaces register a
blocker for all modes, and the new interfaces take a varargs list
of modes.

Internally, maintain a separate blocker list per mode.  The same Error
object may be added to multiple lists.  When a blocker is deleted, it is
removed from every list, and the Error is freed.

No functional change until a new mode is added.
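
For illustration only, the bookkeeping described above can be sketched in
standalone C.  This is not the code from the patch: the Mode enum, the fixed-size
arrays, and the names add_blocker_modes(), del_blocker() and is_blocked() are
hypothetical stand-ins, and plain strings take the place of Error objects.  The
sketch assumes the mode list is terminated by -1 or by the "all modes" value,
as the header comment in this patch states.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the QAPI-generated MigMode enum. */
typedef enum {
    MODE_NORMAL,
    MODE_CPR_REBOOT,
    MODE__MAX,
} Mode;

#define MODE_ALL MODE__MAX
#define MAX_BLOCKERS 16

/* One list of reasons per mode; the same pointer may sit on several lists. */
static const char *blockers[MODE__MAX][MAX_BLOCKERS];
static size_t n_blockers[MODE__MAX];

/* Fold a mode list, terminated by -1 or MODE_ALL, into a bitmask. */
static unsigned modes_to_mask(int mode, va_list ap)
{
    unsigned mask = 0;

    while (mode != -1 && mode != MODE_ALL) {
        assert(mode >= 0 && mode < MODE__MAX);
        mask |= 1u << mode;
        mode = va_arg(ap, int);
    }
    if (mode == MODE_ALL) {
        mask = (1u << MODE__MAX) - 1;
    }
    return mask;
}

/* Add one reason to every list selected by the mode list. */
static void add_blocker_modes(const char *reason, int mode, ...)
{
    va_list ap;
    unsigned mask;

    va_start(ap, mode);
    mask = modes_to_mask(mode, ap);
    va_end(ap);

    for (int m = 0; m < MODE__MAX; m++) {
        if ((mask & (1u << m)) && n_blockers[m] < MAX_BLOCKERS) {
            blockers[m][n_blockers[m]++] = reason;
        }
    }
}

/* Remove a reason from every list it appears on. */
static void del_blocker(const char *reason)
{
    for (int m = 0; m < MODE__MAX; m++) {
        size_t kept = 0;

        for (size_t i = 0; i < n_blockers[m]; i++) {
            if (blockers[m][i] != reason) {
                blockers[m][kept++] = blockers[m][i];
            }
        }
        n_blockers[m] = kept;
    }
}

/* A mode is blocked while its list is non-empty. */
static bool is_blocked(Mode mode)
{
    return n_blockers[mode] > 0;
}
```

With these stand-ins, add_blocker_modes(reason, MODE_NORMAL, -1) blocks only
normal mode, while add_blocker_modes(reason, MODE_ALL) blocks every mode.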

Signed-off-by: Steve Sistare 
Reviewed-by: Juan Quintela 
---
 include/migration/blocker.h | 44 +++--
 migration/migration.c   | 95 ++---
 stubs/migr-blocker.c| 10 +
 3 files changed, 132 insertions(+), 17 deletions(-)

diff --git a/include/migration/blocker.h b/include/migration/blocker.h
index b048f30..a687ac0 100644
--- a/include/migration/blocker.h
+++ b/include/migration/blocker.h
@@ -14,8 +14,12 @@
 #ifndef MIGRATION_BLOCKER_H
 #define MIGRATION_BLOCKER_H
 
+#include "qapi/qapi-types-migration.h"
+
+#define MIG_MODE_ALL MIG_MODE__MAX
+
 /**
- * @migrate_add_blocker - prevent migration from proceeding
+ * @migrate_add_blocker - prevent all modes of migration from proceeding
  *
  * @reasonp - address of an error to be returned whenever migration is attempted
  *
@@ -30,8 +34,8 @@
 int migrate_add_blocker(Error **reasonp, Error **errp);
 
 /**
- * @migrate_add_blocker_internal - prevent migration from proceeding without
- * only-migrate implications
+ * @migrate_add_blocker_internal - prevent all modes of migration from
+ * proceeding, but ignore -only-migratable
  *
  * @reasonp - address of an error to be returned whenever migration is attempted
  *
@@ -50,7 +54,7 @@ int migrate_add_blocker(Error **reasonp, Error **errp);
 int migrate_add_blocker_internal(Error **reasonp, Error **errp);
 
 /**
- * @migrate_del_blocker - remove a blocking error from migration and free it.
+ * @migrate_del_blocker - remove a migration blocker from all modes and free it.
  *
  * @reasonp - address of the error blocking migration
  *
@@ -58,4 +62,36 @@ int migrate_add_blocker_internal(Error **reasonp, Error **errp);
  */
 void migrate_del_blocker(Error **reasonp);
 
+/**
+ * @migrate_add_blocker_normal - prevent normal migration mode from proceeding
+ *
+ * @reasonp - address of an error to be returned whenever migration is attempted
+ *
+ * @errp - [out] The reason (if any) we cannot block migration right now.
+ *
+ * @returns - 0 on success, -EBUSY/-EACCES on failure, with errp set.
+ *
+ * *@reasonp is freed and set to NULL if failure is returned.
+ * On success, the caller must not free @reasonp, except by
+ *   calling migrate_del_blocker.
+ */
+int migrate_add_blocker_normal(Error **reasonp, Error **errp);
+
+/**
+ * @migrate_add_blocker_modes - prevent some modes of migration from proceeding
+ *
+ * @reasonp - address of an error to be returned whenever migration is attempted
+ *
+ * @errp - [out] The reason (if any) we cannot block migration right now.
+ *
+ * @mode - one or more migration modes to be blocked.  The list is terminated
+ * by -1 or MIG_MODE_ALL.  For the latter, all modes are blocked.
+ *
+ * @returns - 0 on success, -EBUSY/-EACCES on failure, with errp set.
+ *
+ * *@reasonp is freed and set to NULL if failure is returned.
+ * On success, the caller must not free *@reasonp before the blocker is removed.
+ */
+int migrate_add_blocker_modes(Error **reasonp, Error **errp, MigMode mode, ...);
+
 #endif
diff --git a/migration/migration.c b/migration/migration.c
index 67547eb..b8b54e6 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -92,7 +92,7 @@ enum mig_rp_message_type {
 static MigrationState *current_migration;
 static MigrationIncomingState *current_incoming;
 
-static GSList *migration_blockers;
+static GSList *migration_blockers[MIG_MODE__MAX];
 
 static bool migration_object_check(MigrationState *ms, Error **errp);
 static int migration_maybe_pause(MigrationState *s,
@@ -1011,7 +1011,7 @@ static void fill_source_migration_info(MigrationInfo *info)
 {
 MigrationState *s = migrate_get_current();
 int state = qatomic_read(&s->state);
-GSList *cur_blocker = migration_blockers;
+GSList *cur_blocker = migration_blockers[migrate_mode()];
 
 info->blocked_reasons = NULL;
 
@@ -1475,38 +1475,105 @@ int migrate_init(MigrationState *s, Error **errp)
 return 0;
 }
 
-int migrate_add_blocker_internal(Error **reasonp, Error **errp)
+static bool is_busy(Error **reasonp, Error **errp)
 {
+ERRP_GUARD();
+
 /* Snapshots are similar to migrations, so check RUN_STATE_SAVE_VM too. */
 if (runstate_check(RUN_STATE_SAVE_VM) || !migration_is_idle()) {
 error_propagate_prepend(errp, *reasonp,
 "disallowing migration blocker "
 "(migration/snapshot in progress) for: ");
 *reasonp = NULL;
-

[PATCH V2 3/6] cpr: relax blockdev migration blockers

2023-10-26 Thread Steve Sistare

Some blockdevs block migration because they do not support sharing across
hosts and/or do not support dirty bitmaps.  These prohibitions do not apply
if the old and new qemu processes do not run concurrently, and if new qemu
starts on the same host as old, which is the case for cpr.  Narrow the scope
of these blockers so they only apply to normal mode.  They will not block
cpr modes when they are added in subsequent patches.

No functional change until a new mode is added.

Signed-off-by: Steve Sistare 
Reviewed-by: Juan Quintela 
---
 block/parallels.c | 2 +-
 block/qcow.c  | 2 +-
 block/vdi.c   | 2 +-
 block/vhdx.c  | 2 +-
 block/vmdk.c  | 2 +-
 block/vpc.c   | 2 +-
 block/vvfat.c | 2 +-
 7 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/block/parallels.c b/block/parallels.c
index 1697a2e..8a520db 100644
--- a/block/parallels.c
+++ b/block/parallels.c
@@ -1369,7 +1369,7 @@ static int parallels_open(BlockDriverState *bs, QDict *options, int flags,
bdrv_get_device_or_node_name(bs));
 bdrv_graph_rdunlock_main_loop();
 
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 error_setg(errp, "Migration blocker error");
 goto fail;
diff --git a/block/qcow.c b/block/qcow.c
index fdd4c83..eab68e3 100644
--- a/block/qcow.c
+++ b/block/qcow.c
@@ -307,7 +307,7 @@ static int qcow_open(BlockDriverState *bs, QDict *options, int flags,
bdrv_get_device_or_node_name(bs));
 bdrv_graph_rdunlock_main_loop();
 
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/vdi.c b/block/vdi.c
index fd7e365..c647d72 100644
--- a/block/vdi.c
+++ b/block/vdi.c
@@ -498,7 +498,7 @@ static int vdi_open(BlockDriverState *bs, QDict *options, int flags,
bdrv_get_device_or_node_name(bs));
 bdrv_graph_rdunlock_main_loop();
 
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail_free_bmap;
 }
diff --git a/block/vhdx.c b/block/vhdx.c
index e37f8c0..a9d0874 100644
--- a/block/vhdx.c
+++ b/block/vhdx.c
@@ -1096,7 +1096,7 @@ static int vhdx_open(BlockDriverState *bs, QDict *options, int flags,
 error_setg(>migration_blocker, "The vhdx format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/vmdk.c b/block/vmdk.c
index 1335d39..85864b8 100644
--- a/block/vmdk.c
+++ b/block/vmdk.c
@@ -1386,7 +1386,7 @@ static int vmdk_open(BlockDriverState *bs, QDict *options, int flags,
 error_setg(>migration_blocker, "The vmdk format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/vpc.c b/block/vpc.c
index c30cf86..aa1a48a 100644
--- a/block/vpc.c
+++ b/block/vpc.c
@@ -452,7 +452,7 @@ static int vpc_open(BlockDriverState *bs, QDict *options, int flags,
bdrv_get_device_or_node_name(bs));
 bdrv_graph_rdunlock_main_loop();
 
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail;
 }
diff --git a/block/vvfat.c b/block/vvfat.c
index 266e036..9d050ba 100644
--- a/block/vvfat.c
+++ b/block/vvfat.c
@@ -1268,7 +1268,7 @@ static int vvfat_open(BlockDriverState *bs, QDict *options, int flags,
"The vvfat (rw) format used by node '%s' "
"does not support live migration",
bdrv_get_device_or_node_name(bs));
-ret = migrate_add_blocker(>migration_blocker, errp);
+ret = migrate_add_blocker_normal(>migration_blocker, errp);
 if (ret < 0) {
 goto fail;
 }
-- 
1.8.3.1

[PATCH V2 4/6] cpr: relax vhost migration blockers

2023-10-26 Thread Steve Sistare

vhost blocks migration if logging is not supported to track dirty
memory, and vhost-user blocks it if the log cannot be saved to a shm fd.

vhost-vdpa blocks migration if both hosts do not support all the device's
features using a shadow VQ, for tracking requests and dirty memory.

vhost-scsi blocks migration if storage cannot be shared across hosts,
or if state cannot be migrated.

None of these conditions apply if the old and new qemu processes do
not run concurrently, and if new qemu starts on the same host as old,
which is the case for cpr.

Narrow the scope of these blockers so they only apply to normal mode.
They will not block cpr modes when they are added in subsequent patches.

No functional change until a new mode is added.

Signed-off-by: Steve Sistare 
Reviewed-by: Juan Quintela 
---
 hw/scsi/vhost-scsi.c | 2 +-
 hw/virtio/vhost.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/hw/scsi/vhost-scsi.c b/hw/scsi/vhost-scsi.c
index 14e23cc..bf528d5 100644
--- a/hw/scsi/vhost-scsi.c
+++ b/hw/scsi/vhost-scsi.c
@@ -208,7 +208,7 @@ static void vhost_scsi_realize(DeviceState *dev, Error **errp)
 "When external environment supports it (Orchestrator migrates "
 "target SCSI device state or use shared storage over network), "
 "set 'migratable' property to true to enable migration.");
-if (migrate_add_blocker(>migration_blocker, errp) < 0) {
+if (migrate_add_blocker_normal(>migration_blocker, errp) < 0) {
 goto free_virtio;
 }
 }
diff --git a/hw/virtio/vhost.c b/hw/virtio/vhost.c
index d737671..f5e9625 100644
--- a/hw/virtio/vhost.c
+++ b/hw/virtio/vhost.c
@@ -1527,7 +1527,7 @@ int vhost_dev_init(struct vhost_dev *hdev, void *opaque,
 }
 
 if (hdev->migration_blocker != NULL) {
-r = migrate_add_blocker(&hdev->migration_blocker, errp);
+r = migrate_add_blocker_normal(&hdev->migration_blocker, errp);
 if (r < 0) {
 goto fail_busyloop;
 }
-- 
1.8.3.1

Re: [PATCH 4/3] migration: Add tracepoints for downtime checkpoints

2023-10-26 Thread Peter Xu

On Thu, Oct 26, 2023 at 08:43:59PM +0100, Joao Martins wrote:
> Considering we aren't including any downtime timestamps in the tracing, is this
> a way to say that the tracing tool printing timestamps is what we use to extract
> downtime contribution?
> 
> It might be obvious, but perhaps should be spelled out in the commit message?

Sure, I'll state that in the commit message in a new version.

-- 
Peter Xu

[PATCH V2 5/6] cpr: reboot mode

2023-10-26 Thread Steve Sistare

Add the cpr-reboot migration mode.  Usage:

$ qemu-system-$arch -monitor stdio ...
QEMU 8.1.50 monitor - type 'help' for more information
(qemu) migrate_set_capability x-ignore-shared on
(qemu) migrate_set_parameter mode cpr-reboot
(qemu) migrate -d file:vm.state
(qemu) info status
VM status: paused (postmigrate)
(qemu) quit

$ qemu-system-$arch -monitor stdio -incoming defer ...
QEMU 8.1.50 monitor - type 'help' for more information
(qemu) migrate_set_capability x-ignore-shared on
(qemu) migrate_set_parameter mode cpr-reboot
(qemu) migrate_incoming file:vm.state
(qemu) info status
VM status: running

In this mode, the migrate command saves state to a file, allowing one
to quit qemu, reboot to an updated kernel, and restart an updated version
of qemu.  The caller must specify a migration URI that writes to and reads
from a file.  Unlike normal mode, the use of certain local storage options
does not block the migration, but the caller must not modify guest block
devices between the quit and restart.  To avoid saving guest RAM to the
file, the memory backend must be shared, and the @x-ignore-shared migration
capability must be set.  Guest RAM must be non-volatile across reboot, such
as by backing it with a dax device, but this is not enforced.  The restarted
qemu arguments must match those used to initially start qemu, plus the
-incoming option.

Signed-off-by: Steve Sistare 
---
 hw/core/qdev-properties-system.c |  2 +-
 qapi/migration.json  | 15 ++-
 2 files changed, 15 insertions(+), 2 deletions(-)

diff --git a/hw/core/qdev-properties-system.c b/hw/core/qdev-properties-system.c
index 07a848d..d961404 100644
--- a/hw/core/qdev-properties-system.c
+++ b/hw/core/qdev-properties-system.c
@@ -680,7 +680,7 @@ QEMU_BUILD_BUG_ON(sizeof(MigMode) != sizeof(int));
 const PropertyInfo qdev_prop_mig_mode = {
 .name = "MigMode",
 .description = "mig_mode values, "
-   "normal",
+   "normal,cpr-reboot",
 .enum_table = _lookup,
 .get = qdev_propinfo_get_enum,
 .set = qdev_propinfo_set_enum,
diff --git a/qapi/migration.json b/qapi/migration.json
index f99904e..795182f 100644
--- a/qapi/migration.json
+++ b/qapi/migration.json
@@ -620,9 +620,22 @@
 #
 # @normal: the original form of migration. (since 8.2)
 #
+# @cpr-reboot: The migrate command saves state to a file, allowing one to
+#  quit qemu, reboot to an updated kernel, and restart an updated
+#  version of qemu.  The caller must specify a migration URI
+#  that writes to and reads from a file.  Unlike normal mode,
+#  the use of certain local storage options does not block the
+#  migration, but the caller must not modify guest block devices
+#  between the quit and restart.  To avoid saving guest RAM to the
+#  file, the memory backend must be shared, and the @x-ignore-shared
+#  migration capability must be set.  Guest RAM must be non-volatile
+#  across reboot, such as by backing it with a dax device, but this
+#  is not enforced.  The restarted qemu arguments must match those
+#  used to initially start qemu, plus the -incoming option.
+#  (since 8.2)
 ##
 { 'enum': 'MigMode',
-  'data': [ 'normal' ] }
+  'data': [ 'normal', 'cpr-reboot' ] }
 
 ##
 # @BitmapMigrationBitmapAliasTransform:
-- 
1.8.3.1

[PATCH V2 0/6] Live Update reboot mode

2023-10-26 Thread Steve Sistare

Add a mode migration parameter that can be used to select alternate
migration algorithms.  The default mode is normal, representing the
current migration algorithm, and does not need to be explicitly set.

Provide the cpr-reboot (CheckPoint and Restart) migration mode for live
update, which saves state to a file.  This allows one to quit qemu, reboot
to an updated kernel, install an updated version of qemu, and resume via
the migrate-incoming command.  The caller must specify a migration URI
that writes to and reads from a file, and must set the mode parameter
before invoking the migrate or migrate-incoming commands.

Unlike normal mode, the use of certain local storage options does not block
cpr-reboot mode, but the caller must not modify guest block devices between
the quit and restart.  To avoid saving guest RAM to the state file, the memory
backend must be shared, and the @x-ignore-shared migration capability must
be set.

Guest RAM must be non-volatile across reboot, which can be achieved by
backing it with a dax device, or /dev/shm PKRAM as proposed in
https://lore.kernel.org/lkml/1617140178-8773-1-git-send-email-anthony.yzn...@oracle.com
but this is not enforced.  The restarted qemu arguments must match those used
to initially start qemu, plus the -incoming option.
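
As a rough illustration of the two constraints above, the following standalone C
sketch encodes them as predicates.  It is not QEMU code: the CprRebootSetup
structure and the helpers uri_is_file() and ram_is_skipped() are hypothetical
names, and the only assumptions taken from this cover letter are that the URI
must use the file: scheme and that guest RAM is kept out of the state file only
when the memory backend is shared and x-ignore-shared is set.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical summary of a cpr-reboot setup; not a QEMU data structure. */
typedef struct {
    const char *uri;          /* URI given to migrate / migrate_incoming */
    bool ram_backend_shared;  /* guest RAM comes from a shared backend */
    bool x_ignore_shared;     /* x-ignore-shared capability is set */
} CprRebootSetup;

/* The migration URI must write to and read from a file. */
static bool uri_is_file(const char *uri)
{
    return uri != NULL && strncmp(uri, "file:", 5) == 0;
}

/* Guest RAM stays out of the state file only if both conditions hold. */
static bool ram_is_skipped(const CprRebootSetup *s)
{
    assert(s != NULL);
    return s->ram_backend_shared && s->x_ignore_shared;
}
```

Neither predicate can check that guest RAM actually survives the reboot, which
matches the statement above that this requirement is not enforced.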

This patch series contains minimal functionality.  Future patches will enhance
reboot mode by preserving vfio devices for suspended guests.  They will also
add a new mode for updating qemu using the exec system call, which will keep
vfio devices and certain character devices alive.

Here is an example of updating the host kernel using reboot mode.

window 1                                        | window 2
                                                |
# qemu-system-$arch -monitor stdio              |
  mem-path=/dev/dax0.0 ...                      |
QEMU 8.1.50 monitor - type 'help' for more info |
(qemu) info status                              |
VM status: running                              |
                                                | # yum update kernel-uek
(qemu) migrate_set_capability x-ignore-shared on|
(qemu) migrate_set_parameter mode cpr-reboot    |
(qemu) migrate -d file:vm.state                 |
(qemu) info status                              |
VM status: paused (postmigrate)                 |
(qemu) quit                                     |
                                                |
# systemctl kexec                               |
kexec_core: Starting new kernel                 |
...                                             |
                                                |
# qemu-system-$arch -monitor stdio              |
  mem-path=/dev/dax0.0 -incoming defer ...      |
QEMU 8.1.50 monitor - type 'help' for more info |
(qemu) info status                              |
VM status: paused (inmigrate)                   |
(qemu) migrate_set_capability x-ignore-shared on|
(qemu) migrate_set_parameter mode cpr-reboot    |
(qemu) migrate_incoming file:vm.state           |
(qemu) info status                              |
VM status: running                              |

Changes in V2:
  - moved "migration mode" code and comments to more appropriate places
  - clarified the behavior of non-shared-memory backends
  - split blocker patches and reverted some blockers
  - added a test

Steve Sistare (6):
  migration: mode parameter
  migration: per-mode blockers
  cpr: relax blockdev migration blockers
  cpr: relax vhost migration blockers
  cpr: reboot mode
  tests/qtest: migration: add reboot mode test

 block/parallels.c   |  2 +-
 block/qcow.c|  2 +-
 block/vdi.c |  2 +-
 block/vhdx.c|  2 +-
 block/vmdk.c|  2 +-
 block/vpc.c |  2 +-
 block/vvfat.c   |  2 +-
 hw/core/qdev-properties-system.c| 14 ++
 hw/scsi/vhost-scsi.c|  2 +-
 hw/virtio/vhost.c   |  2 +-
 include/hw/qdev-properties-system.h |  4 ++
 include/migration/blocker.h | 44 +++--
 include/migration/misc.h|  1 +
 migration/migration-hmp-cmds.c  |  9 
 migration/migration.c   | 95 -
 migration/options.c | 21 
 migration/options.h |  1 +
 qapi/migration.json | 40 ++--
 stubs/migr-blocker.c| 10 
 tests/qtest/migration-test.c| 27 +++
 20 files changed, 255 insertions(+), 29 deletions(-)

-- 
1.8.3.1

[PATCH V2 6/6] tests/qtest: migration: add reboot mode test

2023-10-26 Thread Steve Sistare

Signed-off-by: Steve Sistare 
---
 tests/qtest/migration-test.c | 27 +++
 1 file changed, 27 insertions(+)

diff --git a/tests/qtest/migration-test.c b/tests/qtest/migration-test.c
index e1c1105..de29fc5 100644
--- a/tests/qtest/migration-test.c
+++ b/tests/qtest/migration-test.c
@@ -2001,6 +2001,31 @@ static void test_precopy_file_offset_bad(void)
 test_file_common(, false);
 }
 
+static void *test_mode_reboot_start(QTestState *from, QTestState *to)
+{
+migrate_set_parameter_str(from, "mode", "cpr-reboot");
+migrate_set_parameter_str(to, "mode", "cpr-reboot");
+
+migrate_set_capability(from, "x-ignore-shared", true);
+migrate_set_capability(to, "x-ignore-shared", true);
+
+return NULL;
+}
+
+static void test_mode_reboot(void)
+{
+g_autofree char *uri = g_strdup_printf("file:%s/%s", tmpfs,
+   FILE_TEST_FILENAME);
+MigrateCommon args = {
+.start.use_shmem = true,
+.connect_uri = uri,
+.listen_uri = "defer",
+.start_hook = test_mode_reboot_start
+};
+
+test_file_common(&args, true);
+}
+
 static void test_precopy_tcp_plain(void)
 {
 MigrateCommon args = {
@@ -3056,6 +3081,8 @@ int main(int argc, char **argv)
 qtest_add_func("/migration/precopy/file/offset/bad",
test_precopy_file_offset_bad);
 
+qtest_add_func("/migration/mode/reboot", test_mode_reboot);
+
 #ifdef CONFIG_GNUTLS
 qtest_add_func("/migration/precopy/unix/tls/psk",
test_precopy_unix_tls_psk);
-- 
1.8.3.1

Re: [PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Peter Xu

On Thu, Oct 26, 2023 at 08:33:13PM +0100, Joao Martins wrote:
> Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
> considering it started on the other patches (if you also agree it is right). The
> patches of course are entirely different, but at least I like to believe the ideas
> initially presented and then subsequently improved are what led to the downtime
> observability improvements in this series.

Sure, I'll add that.

If you like, I would definitely be happy to add a Co-developed-by: tag for you,
if you agree.  I just don't know whether that addresses all your needs, and
I need a patch like that for our builds.

Thanks,

-- 
Peter Xu

Re: [PATCH 4/3] migration: Add tracepoints for downtime checkpoints

2023-10-26 Thread Joao Martins

On 26/10/2023 20:01, Peter Xu wrote:
> Add tracepoints for major downtime checkpoints on both src and dst.  They
> share the same tracepoint with a string showing its stage.
> 
> On src, we have these checkpoints added:
> 
>   - downtime-start: right before vm stops on src
>   - vm-stopped: after vm is fully stopped
>   - iterable-saved: after all iterables saved (END sections)
>   - non-iterable-saved: after all non-iterable saved (FULL sections)
>   - downtime-end: migration fully completed
> 
> On dst, we have these checkpoints added:
> 
>   - precopy-loadvm-completes: after loadvm all done for precopy
>   - precopy-bh-*: record BH steps to resume VM for precopy
>   - postcopy-bh-*: record BH steps to resume VM for postcopy
> 
> On dst side, we don't have a good way to trace total time consumed by
> iterable or non-iterable for now.  We can mark it by 1st time receiving a
> FULL / END section, but rather than that let's just rely on the other
> tracepoints added for vmstates to back up the information.
> 
> As of this patch, one can enable "vmstate_downtime*" and it'll enable all
> tracepoints for downtime measurements.
> 
> Since the downtime timestamp tracepoints will cover postcopy too, drop the
> loadvm_postcopy_handle_run_bh() tracepoint as well, because it serves the
> same purpose, but only for postcopy.  We then have a unified prefix for all
> downtime-relevant tracepoints.
> 
> Signed-off-by: Peter Xu 
> ---
>  migration/migration.c  | 16 +++-
>  migration/savevm.c | 14 +-
>  migration/trace-events |  2 +-
>  3 files changed, 25 insertions(+), 7 deletions(-)
> 
> diff --git a/migration/migration.c b/migration/migration.c
> index 9013c1b500..f1f1d2ae2b 100644
> --- a/migration/migration.c
> +++ b/migration/migration.c
> @@ -103,6 +103,7 @@ static int close_return_path_on_source(MigrationState *s);
>  
>  static void migration_downtime_start(MigrationState *s)
>  {
> +trace_vmstate_downtime_timestamp("downtime-start");
>  s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
>  }
>  
> @@ -117,6 +118,8 @@ static void migration_downtime_end(MigrationState *s)
>  if (!s->downtime) {
>  s->downtime = now - s->downtime_start;
>  }
> +
> +trace_vmstate_downtime_timestamp("downtime-end");
>  }
> 

Considering we aren't including any downtime timestamps in the tracing, is this
a way to say that the tracing tool printing timestamps is what we use to extract
downtime contribution?

It might be obvious, but perhaps should be spelled out in the commit message?
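
To make the question concrete, here is a minimal sketch of the post-processing
I have in mind, assuming the tracing tool attaches a timestamp to every
vmstate_downtime_timestamp event.  The TraceEvent structure and
checkpoint_delta_us() are hypothetical names, not part of QEMU or of any trace
backend; only the checkpoint strings come from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * One parsed trace record: the timestamp supplied by the tracing tool
 * (assumed non-negative, in microseconds) and the checkpoint string
 * that was passed to the tracepoint.
 */
typedef struct {
    int64_t ts_us;
    const char *checkpoint;
} TraceEvent;

/*
 * Return the time elapsed between the first occurrence of @from and the
 * next occurrence of @to, or -1 if either checkpoint is missing or they
 * appear in the wrong order.
 */
static int64_t checkpoint_delta_us(const TraceEvent *ev, size_t n,
                                   const char *from, const char *to)
{
    int64_t start = -1;

    assert(ev != NULL || n == 0);
    for (size_t i = 0; i < n; i++) {
        if (start < 0 && strcmp(ev[i].checkpoint, from) == 0) {
            start = ev[i].ts_us;
        } else if (start >= 0 && strcmp(ev[i].checkpoint, to) == 0) {
            return ev[i].ts_us - start;
        }
    }
    return -1;
}
```

The delta from "downtime-start" to "vm-stopped" would then be the stop phase, and
the delta from "vm-stopped" to "iterable-saved" the iterable phase, and so on.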

Re: [PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Joao Martins

On 26/10/2023 19:18, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 01:03:57PM -0400, Peter Xu wrote:
>> On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
>>> On 26/10/2023 16:53, Peter Xu wrote:
>>>> This small series (actually only the last patch; first two are cleanups)
>>>> wants to improve ability of QEMU downtime analysis similarly to what Joao
>>>> used to propose here:
>>>>
>>>>   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
>>>>
>>> Thanks for following up on the idea; it's been hard to have enough bandwidth
>>> for everything over the past few weeks :(
>>
>> Yeah, totally understood.  I think our QE team pushed me towards some
>> series like this, while my plan was waiting for your new version. :)
>>

On my end, it was similar (though not by QE/QA), with folks drawing a blank
when they see a bigger downtime.

Having an explainer/breakdown makes it much easier to pinpoint where the
problems are.

>> Then when I started I decided to go per-device.  I was thinking of
>> also persisting that information, but then I remembered some ppc guest can
>> have ~40,000 vmstates..  and memory to maintain that may or may not regress
>> a ppc user.  So I figured I should first keep it simple with tracepoints.
>>

Yeah, I should have removed that last patch for QAPI.

The per-vmstate part is something I wasn't quite happy with in terms of how it
looked, but I think you managed to find a relatively clean way in that last patch.

>>>
>>>> But with a few differences:
>>>>
>>>>   - Nothing exported yet to qapi, all tracepoints so far
>>>>
>>>>   - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
>>>>     finer granule by providing downtime measurements for each vmstate (I
>>>>     made microsecond to be the unit to be accurate).  So far it seems
>>>>     iterable / non-iterable is the core of the problem, and I want to nail
>>>>     it to per-device.
>>>>
>>>>   - Trace dest QEMU too
>>>>
>>>> For the last bullet: consider the case where a device save() can be super
>>>> fast, while load() can actually be super slow.  Both of them will
>>>> contribute to the ultimate downtime, but not a simple summary: when src
>>>> QEMU is save()ing on device1, dst QEMU can be load()ing on device2.  So
>>>> they can run in parallel.  However the only way to figure all components of
>>>> the downtime is to record both.
>>>>
>>>> Please have a look, thanks.
>>>>
>>>
>>> I like your series, as it allows a user to pinpoint one particular bad device,
>>> while covering the load side too. The checkpoints of migration on the other hand
>>> were useful -- while also a bit ugly -- for the sort of big picture of how
>>> downtime breaks down. Perhaps we could add that /also/ as tracepoints without
>>> specifically committing to exposing them in QAPI.
>>>
>>> More fundamentally, how can one capture the 'stop' part? There's also time spent
>>> there like e.g. quiescing/stopping vhost-net workers, or suspending the VF
>>> device. All are likely as bad as what those tracepoints cover for the
>>> device-state/ram related stuff (iterable and non-iterable portions).
>>
>> Yeah that's a good point.  I didn't cover "stop" yet because I think it's
>> just more tricky and I didn't think it all through, yet.
>>

It could follow your previous line of thought where you do it per vmstate.

But the catch is that vm state change handlers are nameless, so tracepoints
wouldn't be able to tell which handler the time is being spent in.

>> The first question is, when stopping some backends, the vCPUs are still
>> running, so it's not 100% clear to me on which should be contributed as
>> part of real downtime.
> 
> I was wrong.. we always stop vcpus first.
> 

I was about to say this, but I guess you figured it out. Even if your vCPUs
weren't stopped first, the external I/O threads (qemu or kernel) wouldn't service
the guest's own I/O, which is a portion of the outage.

> If you won't mind, I can add some tracepoints for all those spots in this
> series to cover your other series.  I'll also make sure I do that for both
> sides.
> 
Sure. For the fourth patch, feel free to add Suggested-by and/or a Link,
considering it started on the other patches (if you also agree it is right). The
patches of course are entirely different, but at least I like to believe the ideas
initially presented and then subsequently improved are what led to the downtime
observability improvements in this series.

Joao

Re: [PATCH] vfio/pci: Check return value of vfio_set_irq_signaling()

2023-10-26 Thread Alex Williamson

On Thu, 26 Oct 2023 09:10:43 +0200
Cédric Le Goater  wrote:

> and drop warning when ENOTTY is returned. Only useful for the mdev-mtty
> driver today, which has partial support for INTx: the AUTOMASK
> behavior is not implemented.

FWIW, I prefer not to carry a sentence through from subject to commit
log; I find it harder to follow.

Anyway, I'm not sure it's a great idea to suppress this warning.  We
really want drivers to implement the eventfd channel for INTx
unmasking.  I think we're only putting up with it for mtty because it's
a sample driver and it's a step forward versus the botched
implementation of the SET_IRQS ioctl that it previously had.  We could
implement the unmask eventfd channel for mtty, but it might be better
from a test coverage perspective to have it as a driver that forces the
QEMU INTx path to be exercised.

If we suppress this warning, as the de facto userspace driver for vfio
devices, we're declaring it ok to implement INTx without UNMASK eventfd
support when there's no technical reason it couldn't be implemented.

Maybe we should just let QEMU continue to complain about mtty?  Thanks,

Alex

> Signed-off-by: Cédric Le Goater 
> ---
>  hw/vfio/pci.c | 46 ++
>  1 file changed, 30 insertions(+), 16 deletions(-)
> 
> diff --git a/hw/vfio/pci.c b/hw/vfio/pci.c
> index b27011cee72a0fb3b2d57d297c0b5c2ccff9d9a6..5cbc771e55d83561011785e54a38dea042fc834c 100644
> --- a/hw/vfio/pci.c
> +++ b/hw/vfio/pci.c
> @@ -114,15 +114,16 @@ static void vfio_intx_eoi(VFIODevice *vbasedev)
>  vfio_unmask_single_irqindex(vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  }
>  
> -static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
> +static int vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
>  {
> +int ret = 0;
>  #ifdef CONFIG_KVM
>  int irq_fd = event_notifier_get_fd(>intx.interrupt);
>  
>  if (vdev->no_kvm_intx || !kvm_irqfds_enabled() ||
>  vdev->intx.route.mode != PCI_INTX_ENABLED ||
>  !kvm_resamplefds_enabled()) {
> -return;
> +return 0;
>  }
>  
>  /* Get to a known interrupt state */
> @@ -132,23 +133,26 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
>  pci_irq_deassert(>pdev);
>  
>  /* Get an eventfd for resample/unmask */
> -if (event_notifier_init(>intx.unmask, 0)) {
> +ret = event_notifier_init(>intx.unmask, 0);
> +if (ret) {
>  error_setg(errp, "event_notifier_init failed eoi");
>  goto fail;
>  }
>  
> -if (kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
> -   >intx.interrupt,
> -   >intx.unmask,
> -   vdev->intx.route.irq)) {
> +ret = kvm_irqchip_add_irqfd_notifier_gsi(kvm_state,
> + >intx.interrupt,
> + >intx.unmask,
> + vdev->intx.route.irq);
> +if (ret) {
>  error_setg_errno(errp, errno, "failed to setup resample irqfd");
>  goto fail_irqfd;
>  }
>  
> -if (vfio_set_irq_signaling(>vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
> -   VFIO_IRQ_SET_ACTION_UNMASK,
> -   event_notifier_get_fd(>intx.unmask),
> -   errp)) {
> +ret = vfio_set_irq_signaling(>vbasedev, VFIO_PCI_INTX_IRQ_INDEX, 0,
> + VFIO_IRQ_SET_ACTION_UNMASK,
> + event_notifier_get_fd(>intx.unmask),
> + errp);
> +if (ret) {
>  goto fail_vfio;
>  }
>  
> @@ -159,7 +163,7 @@ static void vfio_intx_enable_kvm(VFIOPCIDevice *vdev, Error **errp)
>  
>  trace_vfio_intx_enable_kvm(vdev->vbasedev.name);
>  
> -return;
> +return 0;
>  
>  fail_vfio:
>  kvm_irqchip_remove_irqfd_notifier_gsi(kvm_state, &vdev->intx.interrupt,
> @@ -170,6 +174,7 @@ fail:
>  qemu_set_fd_handler(irq_fd, vfio_intx_interrupt, NULL, vdev);
>  vfio_unmask_single_irqindex(&vdev->vbasedev, VFIO_PCI_INTX_IRQ_INDEX);
>  #endif
> +return ret;
>  }
>  
>  static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
> @@ -212,6 +217,7 @@ static void vfio_intx_disable_kvm(VFIOPCIDevice *vdev)
>  static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
>  {
>  Error *err = NULL;
> +int ret;
>  
>  trace_vfio_intx_update(vdev->vbasedev.name,
> vdev->intx.route.irq, route->irq);
> @@ -224,9 +230,13 @@ static void vfio_intx_update(VFIOPCIDevice *vdev, PCIINTxRoute *route)
>  return;
>  }
>  
> -vfio_intx_enable_kvm(vdev, &err);
> +ret = vfio_intx_enable_kvm(vdev, &err);
>  if (err) {
> -warn_reportf_err(err, VFIO_MSG_PREFIX, vdev->vbasedev.name);
> +if (ret != -ENOTTY) {
> +warn_reportf_err(err,

RE: [PATCH v2 07/16] target/hexagon: Declare QOM definitions in 'cpu-qom.h'

2023-10-26 Thread Brian Cain



> -Original Message-
> From: Philippe Mathieu-Daudé 
> Sent: Friday, October 13, 2023 9:01 AM
> To: qemu-devel@nongnu.org
> Cc: Eduardo Habkost ; Xiaojuan Yang
> ; Michael S. Tsirkin ; qemu-
> p...@nongnu.org; Aleksandar Rikalo ; David
> Hildenbrand ; qemu-s3...@nongnu.org; Edgar E. Iglesias
> ; Jiaxun Yang ; Song
> Gao ; Philippe Mathieu-Daudé ;
> Paolo Bonzini ; Stafford Horne ;
> Alistair Francis ; Yanan Wang
> ; Max Filippov ; Artyom
> Tarasenko ; Marcel Apfelbaum
> ; Cédric Le Goater ; Laurent
> Vivier ; Aurelien Jarno ; qemu-
> ri...@nongnu.org; Palmer Dabbelt ; Yoshinori Sato
> ; Bastian Koppelmann ; Bin Meng ; Daniel Henrique
> Barboza ; Mark Cave-Ayland ; Weiwei Li ; Daniel Henrique
> Barboza ; Nicholas Piggin
> ; qemu-...@nongnu.org; Liu Zhiwei
> ; Marek Vasut ; Laurent
> Vivier ; Peter Maydell ; Brian
> Cain ; Thomas Huth ; Chris Wulff
> ; Sergio Lopez ; Richard Henderson
> ; Ilya Leoshkevich ;
> Michael Rolnik 
> Subject: [PATCH v2 07/16] target/hexagon: Declare QOM definitions in 'cpu-
> qom.h'
> 
> "target/foo/cpu.h" contains the target specific declarations.
> 
> A heterogeneous setup needs to access target agnostic declarations
> (at least the QOM ones, to instantiate the objects).
> 
> Our convention is to add such target agnostic QOM declarations in
> the "target/foo/cpu-qom.h" header.
> 
> Extract QOM definitions from "cpu.h" to "cpu-qom.h".
> 
> Signed-off-by: Philippe Mathieu-Daudé 
> ---
>  target/hexagon/cpu-qom.h | 28 
>  target/hexagon/cpu.h | 15 +--
>  2 files changed, 29 insertions(+), 14 deletions(-)
>  create mode 100644 target/hexagon/cpu-qom.h
> 
> diff --git a/target/hexagon/cpu-qom.h b/target/hexagon/cpu-qom.h
> new file mode 100644
> index 00..f02df7ee6f
> --- /dev/null
> +++ b/target/hexagon/cpu-qom.h
> @@ -0,0 +1,28 @@
> +/*
> + * QEMU Hexagon CPU QOM header (target agnostic)
> + *
> + * Copyright(c) 2019-2023 Qualcomm Innovation Center, Inc. All Rights Reserved.
> + *
> + * SPDX-License-Identifier: GPL-2.0-or-later
> + */
> +
> +#ifndef QEMU_HEXAGON_CPU_QOM_H
> +#define QEMU_HEXAGON_CPU_QOM_H
> +
> +#include "hw/core/cpu.h"
> +#include "qom/object.h"
> +
> +#define TYPE_HEXAGON_CPU "hexagon-cpu"
> +
> +#define HEXAGON_CPU_TYPE_SUFFIX "-" TYPE_HEXAGON_CPU
> +#define HEXAGON_CPU_TYPE_NAME(name) (name HEXAGON_CPU_TYPE_SUFFIX)
> +
> +#define TYPE_HEXAGON_CPU_V67 HEXAGON_CPU_TYPE_NAME("v67")
> +#define TYPE_HEXAGON_CPU_V68 HEXAGON_CPU_TYPE_NAME("v68")
> +#define TYPE_HEXAGON_CPU_V69 HEXAGON_CPU_TYPE_NAME("v69")
> +#define TYPE_HEXAGON_CPU_V71 HEXAGON_CPU_TYPE_NAME("v71")
> +#define TYPE_HEXAGON_CPU_V73 HEXAGON_CPU_TYPE_NAME("v73")
> +
> +OBJECT_DECLARE_CPU_TYPE(HexagonCPU, HexagonCPUClass, HEXAGON_CPU)
> +
> +#endif
> diff --git a/target/hexagon/cpu.h b/target/hexagon/cpu.h
> index 035ac4fb6d..7d16083c6a 100644
> --- a/target/hexagon/cpu.h
> +++ b/target/hexagon/cpu.h
> @@ -20,11 +20,10 @@
> 
>  #include "fpu/softfloat-types.h"
> 
> +#include "cpu-qom.h"
>  #include "exec/cpu-defs.h"
>  #include "hex_regs.h"
>  #include "mmvec/mmvec.h"
> -#include "qom/object.h"
> -#include "hw/core/cpu.h"
>  #include "hw/registerfields.h"
> 
>  #define NUM_PREGS 4
> @@ -36,18 +35,8 @@
>  #define PRED_WRITES_MAX 5   /* 4 insns + endloop */
>  #define VSTORES_MAX 2
> 
> -#define TYPE_HEXAGON_CPU "hexagon-cpu"
> -
> -#define HEXAGON_CPU_TYPE_SUFFIX "-" TYPE_HEXAGON_CPU
> -#define HEXAGON_CPU_TYPE_NAME(name) (name HEXAGON_CPU_TYPE_SUFFIX)
>  #define CPU_RESOLVING_TYPE TYPE_HEXAGON_CPU
> 
> -#define TYPE_HEXAGON_CPU_V67 HEXAGON_CPU_TYPE_NAME("v67")
> -#define TYPE_HEXAGON_CPU_V68 HEXAGON_CPU_TYPE_NAME("v68")
> -#define TYPE_HEXAGON_CPU_V69 HEXAGON_CPU_TYPE_NAME("v69")
> -#define TYPE_HEXAGON_CPU_V71 HEXAGON_CPU_TYPE_NAME("v71")
> -#define TYPE_HEXAGON_CPU_V73 HEXAGON_CPU_TYPE_NAME("v73")
> -
>  void hexagon_cpu_list(void);
>  #define cpu_list hexagon_cpu_list
> 
> @@ -127,8 +116,6 @@ typedef struct CPUArchState {
>  VTCMStoreLog vtcm_log;
>  } CPUHexagonState;
> 
> -OBJECT_DECLARE_CPU_TYPE(HexagonCPU, HexagonCPUClass, HEXAGON_CPU)
> -
>  typedef struct HexagonCPUClass {
>  CPUClass parent_class;
> 
> --
> 2.41.0

Reviewed-by: Brian Cain

[PATCH 4/3] migration: Add tracepoints for downtime checkpoints

2023-10-26 Thread Peter Xu

Add tracepoints for major downtime checkpoints on both src and dst.  They
share the same tracepoint with a string showing its stage.

On src, we have these checkpoints added:

  - downtime-start: right before vm stops on src
  - vm-stopped: after vm is fully stopped
  - iterable-saved: after all iterables saved (END sections)
  - non-iterable-saved: after all non-iterable saved (FULL sections)
  - downtime-end: migration fully completed

On dst, we have these checkpoints added:

  - precopy-loadvm-completed: after loadvm all done for precopy
  - precopy-bh-*: record BH steps to resume VM for precopy
  - postcopy-bh-*: record BH steps to resume VM for postcopy

On the dst side, we don't have a good way to trace the total time consumed
by iterable or non-iterable state for now.  We could mark it by the first
time a FULL / END section is received, but rather than that let's just rely
on the other tracepoints added for vmstates to back up the information.

With this patch applied, one can enable "vmstate_downtime*" and it'll enable
all tracepoints for downtime measurements.

Since the downtime timestamp tracepoints cover postcopy too, drop the
loadvm_postcopy_handle_run_bh() tracepoint along the way: it serves the
same purpose, but only for postcopy.  We then have a unified prefix for
all downtime-relevant tracepoints.

Signed-off-by: Peter Xu 
---
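As a rough illustration of how the checkpoint names above could be turned
into a downtime breakdown, here is a minimal sketch.  It assumes the stage
names and their timestamps have already been extracted from the trace
output into an array; struct checkpoint and stage_interval_us() are
hypothetical names for this example only and are not part of the patch.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One recorded checkpoint: the stage string plus a timestamp (microseconds). */
struct checkpoint {
    const char *stage;
    int64_t ts_us;
};

/*
 * Return the time elapsed between the first occurrence of 'from' and the
 * first later occurrence of 'to'.  Returns -1 if 'from' is missing or if
 * 'to' does not appear after it.
 */
static int64_t stage_interval_us(const struct checkpoint *cp, size_t n,
                                 const char *from, const char *to)
{
    size_t i, j;

    for (i = 0; i < n; i++) {
        if (strcmp(cp[i].stage, from) != 0) {
            continue;
        }
        for (j = i + 1; j < n; j++) {
            if (strcmp(cp[j].stage, to) == 0) {
                return cp[j].ts_us - cp[i].ts_us;
            }
        }
        return -1;
    }
    return -1;
}
```

For example, the interval from "vm-stopped" to "iterable-saved" would give
the time spent saving iterable state on src, under the assumption that the
checkpoints are recorded in the order listed in the commit message.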
 migration/migration.c  | 16 +++-
 migration/savevm.c | 14 +-
 migration/trace-events |  2 +-
 3 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 9013c1b500..f1f1d2ae2b 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -103,6 +103,7 @@ static int close_return_path_on_source(MigrationState *s);
 
 static void migration_downtime_start(MigrationState *s)
 {
+trace_vmstate_downtime_timestamp("downtime-start");
 s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 }
 
@@ -117,6 +118,8 @@ static void migration_downtime_end(MigrationState *s)
 if (!s->downtime) {
 s->downtime = now - s->downtime_start;
 }
+
+trace_vmstate_downtime_timestamp("downtime-end");
 }
 
 static bool migration_needs_multiple_sockets(void)
@@ -151,7 +154,11 @@ static gint page_request_addr_cmp(gconstpointer ap, gconstpointer bp)
 
 int migration_stop_vm(RunState state)
 {
-return vm_stop_force_state(state);
+int ret = vm_stop_force_state(state);
+
+trace_vmstate_downtime_timestamp("vm-stopped");
+
+return ret;
 }
 
 void migration_object_init(void)
@@ -500,6 +507,8 @@ static void process_incoming_migration_bh(void *opaque)
 Error *local_err = NULL;
 MigrationIncomingState *mis = opaque;
 
+trace_vmstate_downtime_timestamp("precopy-bh-enter");
+
 /* If capability late_block_activate is set:
  * Only fire up the block code now if we're going to restart the
  * VM, else 'cont' will do it.
@@ -525,6 +534,8 @@ static void process_incoming_migration_bh(void *opaque)
  */
 qemu_announce_self(&mis->announce_timer, migrate_announce_params());
 
+trace_vmstate_downtime_timestamp("precopy-bh-announced");
+
 multifd_load_shutdown();
 
 dirty_bitmap_mig_before_vm_start();
@@ -542,6 +553,7 @@ static void process_incoming_migration_bh(void *opaque)
 } else {
 runstate_set(global_state_get_runstate());
 }
+trace_vmstate_downtime_timestamp("precopy-bh-vm-started");
 /*
  * This must happen after any state changes since as soon as an external
  * observer sees this event they might start to prod at the VM assuming
@@ -576,6 +588,8 @@ process_incoming_migration_co(void *opaque)
 ret = qemu_loadvm_state(mis->from_src_file);
 mis->loadvm_co = NULL;
 
+trace_vmstate_downtime_timestamp("precopy-loadvm-completed");
+
 ps = postcopy_state_get();
 trace_process_incoming_migration_co_end(ret, ps);
 if (ps != POSTCOPY_INCOMING_NONE) {
diff --git a/migration/savevm.c b/migration/savevm.c
index cd6d6ba493..49cbbd151c 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1494,6 +1494,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 end_ts_each - start_ts_each);
 }
 
+trace_vmstate_downtime_timestamp("iterable-saved");
+
 return 0;
 }
 
@@ -1560,6 +1562,8 @@ int qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 json_writer_free(vmdesc);
 ms->vmdesc = NULL;
 
+trace_vmstate_downtime_timestamp("non-iterable-saved");
+
 return 0;
 }
 
@@ -2102,18 +2106,18 @@ static void loadvm_postcopy_handle_run_bh(void *opaque)
 Error *local_err = NULL;
 MigrationIncomingState *mis = opaque;
 
-trace_loadvm_postcopy_handle_run_bh("enter");
+trace_vmstate_downtime_timestamp("postcopy-bh-enter");
 
 /* TODO we should move all of this lot into postcopy_ram.c or a shared code
  * in migration.c
  */
 cpu_synchronize_all_post_init();
 
-

Re: Replace calls to functions named cpu_physical_memory_* with address_space_*.

2023-10-26 Thread Tanmay

Yeah, I suspected it might not be the cakewalk it sounds like.

You're right, trying to understand the whole code is overwhelming. I'll
start with a small section instead.

I'm interested in working on the x86_64 and AArch64 architectures within QEMU.
Please let me know if there are any specific tasks where I can start
exploring.

Thanks,
Tanmay

On Thu, 26 Oct 2023 at 22:16, Peter Maydell wrote:

> On Thu, 26 Oct 2023 at 13:48, Tanmay  wrote:
> > I'm really interested in contributing to qemu. I wanted to
> > work on renaming the API calls cpu_physical_memory_* to
> > address_space_*. I couldn't find any related issues on the
> > GitLab tracker. Can I work on this issue?
>
> You're welcome to, but be aware that this is unfortunately
> one of the items in the "BiteSizedTasks" list that is
> not as simple as the one-line description makes it sound.
> (I have a personal project to try to go through that page and
> either expand entries into issues in gitlab that describe the
> task in more detail, or else delete them if they don't really
> seem to be "bite sized". But I haven't got very far with it yet,
> so there are still quite a few unhelpful "landmine" tasks on it.
> Sorry about that :-(  )
>
> It also is something where the right thing to do is going to
> depend on the call-site and what that particular device or piece
> of code is trying to do -- it is not a mechanical conversion.
> (This is partly why the conversion is not yet complete.)
>
> Most of the devices which use these functions should indeed
> use address_space_* functions instead, but the question then
> is "what address space should they access?". That usually ought
> to be one passed into them by the board code. (commit 112a829f8f0a
> is an example of that kind of conversion.) Unfortunately many
> of the remaining uses of cpu_physical_memory_* in hw/ are
> in very old code which hasn't even been converted to the
> kind of new device model coding style that would allow you to
> provide an address space by a QOM property that way. So for
> those devices this would be just one of a whole pile of
> "modernizations" and refactorings that need to be done.
>
> I think what I would suggest is that rather than starting
> with this task in general, that you start with what part
> of QEMU you're interested in working on in particular (eg
> whether you're interested in a particular target architecture
> or a particular subsystem like migration, etc), and then
> we can probably find some tasks that relate to that specific
> interest and help in starting to understand that part of the
> code. (QEMU as a whole is too big for anybody to understand
> all of it...) If what you want to work on turns out to
> involve one of the bits of code which needs this API upgrade,
> maybe we can help you work on that; but it might turn out that
> the two don't overlap at all, or that there's a better starting
> task.
>
> thanks
> -- PMM
>

Re: [PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Peter Xu

On Thu, Oct 26, 2023 at 01:03:57PM -0400, Peter Xu wrote:
> On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
> > On 26/10/2023 16:53, Peter Xu wrote:
> > > This small series (actually only the last patch; first two are cleanups)
> > > wants to improve ability of QEMU downtime analysis similarly to what Joao
> > > used to propose here:
> > > 
> > >   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
> > > 
> > Thanks for following up on the idea; it's been hard to have enough bandwidth
> > for everything over the past few weeks :(
> 
> Yeah, totally understood.  I think our QE team pushed me towards some
> series like this, while my plan was waiting for your new version. :)
> 
> Then when I started I decided to go per-device.  I was thinking of
> also persisting that information, but then I remembered some ppc guests can
> have ~40,000 vmstates..  and the memory to maintain that may or may not regress
> a ppc user.  So I figured I should first keep it simple with tracepoints.
> 
> > 
> > > But with a few differences:
> > > 
> > >   - Nothing exported yet to qapi, all tracepoints so far
> > > 
> > >   - Instead of major checkpoints (stop, iterable, non-iterable,
> > >     resume-rp), finer granule by providing downtime measurements for
> > >     each vmstate (I made microsecond to be the unit to be accurate).
> > >     So far it seems iterable / non-iterable is the core of the problem,
> > >     and I want to nail it to per-device.
> > > 
> > >   - Trace dest QEMU too
> > > 
> > > For the last bullet: consider the case where a device save() can be super
> > > fast, while load() can actually be super slow.  Both of them will
> > > contribute to the ultimate downtime, but not a simple summary: when src
> > > QEMU is save()ing on device1, dst QEMU can be load()ing on device2.  So
> > > they can run in parallel.  However the only way to figure all components
> > > of the downtime is to record both.
> > > 
> > > Please have a look, thanks.
> > >
> > 
> > I like your series, as it allows a user to pinpoint one particular bad
> > device, while covering the load side too. The checkpoints of migration on
> > the other hand were useful -- while also a bit ugly -- for the sort of big
> > picture of how downtime breaks down. Perhaps we could add that /also/ as
> > tracepoints without specifically committing to be exposed in QAPI.
> > 
> > More fundamentally, how can one capture the 'stop' part? There's also time
> > spent there like e.g. quiescing/stopping vhost-net workers, or suspending
> > the VF device. All likely as bad to those tracepoints pertaining
> > device-state/ram related stuff (iterable and non-iterable portions).
> 
> Yeah that's a good point.  I didn't cover "stop" yet because I think it's
> just more tricky and I didn't think it all through, yet.
> 
> The first question is, when stopping some backends, the vCPUs are still
> running, so it's not 100% clear to me on which should be contributed as
> part of real downtime.

I was wrong.. we always stop vcpus first.

If you don't mind, I can add some tracepoints for all those spots in this
series to cover your other series.  I'll also make sure I do that for both
sides.

Thanks,

> 
> Meanwhile that'll be another angle besides vmstates: need to keep some eye
> on the state change handlers, and that can be a device, or something else.
> 
> Did you measure the stop process in some way before?  Do you have some
> rough number or anything surprising you already observed?
> 
> Thanks,
> 
> -- 
> Peter Xu

-- 
Peter Xu

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread Stefano Stabellini

On Thu, 26 Oct 2023, David Woodhouse wrote:
> On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> > Hi,
> > This patch series adds support for grant mappings as a pseudo RAM region
> > for Xen.
> > 
> > Enabling grant mappings patches (first 6) were written by Juergen in 2021.
> > 
> > QEMU Virtio device provides emulated backends for Virtio frontend devices
> > in Xen.
> > Please set the "iommu_platform=on" option when invoking QEMU, as this will
> > set the VIRTIO_F_ACCESS_PLATFORM feature, which the virtio frontend in Xen
> > uses to know whether the backend supports grants or not.
> 
> I don't really understand what's going on here. The subject of the
> cover letter certainly doesn't help me, because we *already* support
> grant mappings under Xen, don't we?
> 
> I found
> https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf but
> I think it's a bit out of date; the decision about how to handle grant
> mappings for virtio devices is still 'TBD'.

See this presentation:
https://www.youtube.com/watch?v=boRQ8UHc760

The patch series is for the guest (e.g. Linux) to use grants to share
memory with virtio devices. The plumbing was already done in Linux a
couple of years ago, but QEMU support for it is still missing.


> Can you talk me through the process of what happens when a guest wants
> a virtio device to initiate 'DMA' to one of its pages? I assume it
> starts by creating a grant mapping, and then taking the gntref and...
> then what?

First the guest gets a grant reference for the page, then it uses it as
"dma address" to pass to the virtio device. The top bit (1ULL << 63) is
used to signal that the address in question is a grant address.

See in Linux:
drivers/xen/grant-dma-ops.c grant_to_dma, dma_to_grant,
xen_grant_dma_alloc, etc.
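
To make the encoding concrete, here is a small standalone sketch of the
idea.  Only the top-bit marker (1ULL << 63) comes from the description
above; placing the grant reference above a 12-bit page offset is an
assumption modeled on what grant_to_dma/dma_to_grant do, and every name in
the sketch is made up for illustration.  Please check the Linux source for
the real definitions.

```c
#include <stdint.h>

/* Assumption: 4 KiB pages, so the low 12 bits carry the in-page offset. */
#define SKETCH_PAGE_SHIFT 12
/* Top bit marks a "DMA address" as carrying a grant reference. */
#define SKETCH_GRANT_DMA_FLAG (1ULL << 63)

/* Build the address the guest would hand to the virtio device. */
static inline uint64_t sketch_grant_to_dma(uint32_t gref)
{
    return SKETCH_GRANT_DMA_FLAG | ((uint64_t)gref << SKETCH_PAGE_SHIFT);
}

/* Backend side: does this address refer to a grant at all? */
static inline int sketch_dma_is_grant(uint64_t dma)
{
    return (dma & SKETCH_GRANT_DMA_FLAG) != 0;
}

/* Backend side: recover the grant reference to map. */
static inline uint32_t sketch_dma_to_grant(uint64_t dma)
{
    return (uint32_t)((dma & ~SKETCH_GRANT_DMA_FLAG) >> SKETCH_PAGE_SHIFT);
}

/* Backend side: recover the offset inside the granted page. */
static inline uint32_t sketch_dma_page_offset(uint64_t dma)
{
    return (uint32_t)(dma & ((1ULL << SKETCH_PAGE_SHIFT) - 1));
}
```

The point of the marker bit is that the backend can tell a grant address
from an ordinary guest physical address without any extra signalling.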


> I don't see any changes to the virtio devices themselves in this
> series; are we doing something that will make it work by magic? If so,
> it might be useful to explain that magic...

Yes something like that :-)
https://marc.info/?l=xen-devel&m=165419780304962&w=2

Vikram, I think it would be a good idea to submit a separate patch to
xen.git to add a document under docs/ to capture this.

Re: [PATCH v3 3/6] target/riscv/tcg: add user flag for profile support

2023-10-26 Thread Andrea Bolognani

On Thu, Oct 26, 2023 at 05:14:49PM +0200, Andrew Jones wrote:
> On Thu, Oct 26, 2023 at 07:36:21AM -0700, Andrea Bolognani wrote:
> > On Mon, Oct 23, 2023 at 07:35:16PM +0200, Andrew Jones wrote:
> > > On Mon, Oct 23, 2023 at 02:00:00PM -0300, Daniel Henrique Barboza wrote:
> > > > On 10/23/23 05:16, Andrew Jones wrote:
> > > > > Hmm, I'm not sure I agree with special-casing profiles like this. I
> > > > > think the left-to-right processing should be consistent for all. I'm
> > > > > also not sure we should always warn when disabling a profile. For
> > > > > example, if a user does
> > > > >
> > > > >   -cpu rv64,rva22u64=true,rva22u64=false
> > > > >
> > > > > then they'll get a warning, even though all they're doing is
> > > > > restoring the cpu model. While that looks like an odd thing to do, a
> > > > > script may be adding the rva22u64=true and the rva22u64=false is the
> > > > > user input which undoes what the script did.
> > > >
> > > > QEMU options do not work with a "the user enabled then disabled the
> > > > same option, thus it'll count as nothing happened" logic. The last
> > > > instance of the option will overwrite all previous instances. In the
> > > > example you mentioned above the user would disable all mandatory
> > > > extensions of rva22u64 in the CPU, doesn't matter if the same profile
> > > > was enabled beforehand.
> > >
> > > Yup, I'm aware, but I keep thinking that we'll only be using profiles with
> > > a base cpu type. If you start with nothing (a base) and then add a profile
> > > and take the same one away, you shouldn't be taking away anything else. I
> > > agree that if you use a profile on some cpu type that already enabled a
> > > bunch of stuff itself, then disabling a profile would potentially remove
> > > some of those too, but mixing cpu types that have their own extensions and
> > > profiles seems like a great way to confuse oneself as to what extensions
> > > will be present.  IOW, we should be adding a base cpu type at the same
> > > time we're adding these profiles.
> >
> > The question that keeps bouncing around my head is: why would we even
> > allow disabling profiles?
> >
> > It seems to me that it only makes things more complicated, and I
> > really can't see the use case for it.
> >
> > Enabling additional features on top of a profile? There's obvious
> > value in that, so that you can model hardware that implements
> > optional and proprietary extensions. Enabling multiple profiles?
> > You've convinced me that it's useful. But disabling profiles, I just
> > don't see it. I believe Alistair was similarly unconvinced.
>
> The only value I see in allowing a profile to be disabled is to undo the
> enabling of the profile by specifying the profile as 'off' to the right of
> it being specified as 'on'. That may seem pointless, but scripts take
> advantage of being able to do that. Besides that one possible use case,
> there isn't much use in disabling profiles, but treating profile
> properties like every other boolean property makes the UI consistent and
> should actually simplify the code.

The code might be simpler, but the result is an additional burden on
the user, as the interactions between the various flags become much
more nuanced and less intuitive. I'm not convinced the trade-off is a
worthwhile one.

For the script override scenario, fair enough, but once again I feel
that we're making things much worse in the general case in order to
cater to a much narrower one. Script authors will naturally learn to
avoid hardcoding profile enablement once users have reported enough
failures resulting from that.
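
For reference, the left-to-right semantics under discussion (the last
instance of an option wins, disabling a profile disables its mandatory
extensions, and the warning cases quoted below) can be modeled with a toy
sketch.  This is not QEMU code: the profile "P", its two mandatory
extensions "a" and "c", and all the names are hypothetical and only meant
to show how the interactions play out.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Toy CPU: one hypothetical profile "P" that mandates "a" and "c". */
struct toy_cpu {
    bool profile;
    bool ext_a;
    bool ext_c;
};

struct toy_opt {
    const char *name;
    bool value;
};

/*
 * Apply options strictly left to right; a later option overwrites whatever
 * earlier ones did.  Returns how many times a mandatory extension was
 * switched off while the profile was enabled (the proposed warning case).
 */
static int toy_apply(struct toy_cpu *cpu, const struct toy_opt *opts, size_t n)
{
    int warnings = 0;

    for (size_t i = 0; i < n; i++) {
        bool *ext = NULL;

        if (strcmp(opts[i].name, "P") == 0) {
            /* Enabling or disabling the profile flips all mandatory bits. */
            cpu->profile = opts[i].value;
            cpu->ext_a = opts[i].value;
            cpu->ext_c = opts[i].value;
            continue;
        }
        if (strcmp(opts[i].name, "a") == 0) {
            ext = &cpu->ext_a;
        } else if (strcmp(opts[i].name, "c") == 0) {
            ext = &cpu->ext_c;
        }
        if (ext == NULL) {
            continue; /* unknown option: ignored in this toy model */
        }
        if (cpu->profile && *ext && !opts[i].value) {
            warnings++;
        }
        *ext = opts[i].value;
    }
    return warnings;
}
```

Even in a model this small, the outcome depends on both the order of the
options and the starting state of the CPU, which is the kind of nuance I'm
worried about pushing onto users.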

> > > > > As far as warnings go, it'd be nice to warn when mandatory profile
> > > > > extensions are disabled from an enabled profile. Doing that might be
> > > > > useful for debug, but users which do it without being aware they're
> > > > > "breaking" the profile may learn from that warning. Note, the warning
> > > > > should only come when the profile is actually enabled and when the
> > > > > extension would actually be disabled, i.e.
> > > > >
> > > > >   -cpu rv64,rva22u64=true,c=off
> > > > >
> > > > > should warn
> > > > >
> > > > >   -cpu rv64,c=off,rva22u64=true
> > > > >
> > > > > should not warn (rva22u64 overrides c=off since it's to the right)
> > > > >
> > > > >   -cpu rv64,rva22u64=true,rva22u64=false,c=off
> > > > >
> > > > > should not warn (rva22u64 is not enabled)
> >
> > I think these should be hard errors, not warnings.
> >
> > If you're enabling a profile and then disabling an extension that's
> > mandatory for that profile, you've invalidated the profile. You've
> > asked for a configuration that doesn't make any sense: you can't have
> > a CPU that both implements a profile and lacks one of its mandatory
> > extensions.
>
> Given a platform which implements a profile which mandates extension E and
> a need to debug E or test behavior where E is [incorrectly]

Re: [QEMU][PATCHv2 1/8] xen: when unplugging emulated devices skip virtio devices

2023-10-26 Thread Vikram Garhwal

On Thu, Oct 26, 2023 at 04:45:21PM +0100, David Woodhouse wrote:
> On Wed, 2023-10-25 at 18:23 -0700, Stefano Stabellini wrote:
> > On Thu, 26 Oct 2023, David Woodhouse wrote:
> > > On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> > > > From: Juergen Gross 
> > > > 
> > > > Virtio devices should never be unplugged at boot time, as they are
> > > > similar to pci passthrough devices.
> > > > 
> > > > Signed-off-by: Juergen Gross 
> > > > Signed-off-by: Vikram Garhwal 
> > > 
> > > Hm, do your virtio NICs still actually *work* after that? Or are they
> > > all disconnected from their netdev peers? 
> > > 
> > > I suspect you're going to want a variant of
> > > https://lore.kernel.org/qemu-devel/20231025145042.627381-19-dw...@infradead.org/T/#u
> > > which also leave the peers of your virtio devices intact?
> > 
> > Hi David, device unplug is an x86-only thing (see the definition of
> > xen_emul_unplug in Linux under arch/x86/xen/platform-pci-unplug.c) I
> > suspect Vikram who is working on ARM hasn't tested it.
> 
> Ah, I had assumed there was something else coming along later which
> would make it actually get used. 
> 
> > Vikram, a simple option is to drop this patch if you don't need it.
> 
> That works. Although I may revive it in that case. 
> 
Hopefully, Juergen is also okay with dropping the patch. Then, I will remove it
from v3.

Thanks David & Stefano!

Re: [PATCH 2/3] migration: Add migration_downtime_start|end() helpers

2023-10-26 Thread Fabiano Rosas

Peter Xu  writes:

> Unify the three users on recording downtimes with the same pair of helpers.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH v5 01/10] target/riscv/tcg: add 'zic64b' support

2023-10-26 Thread Daniel Henrique Barboza




On 10/25/23 20:44, Daniel Henrique Barboza wrote:

zic64b is defined in the RVA22U64 profile [1] as a named feature for
"Cache blocks must be 64 bytes in size, naturally aligned in the address
space".  It's a fantasy name for 64-byte cache blocks. RVA22U64
mandates this feature, meaning that applications using it expect
64-byte cache blocks.

In theory we already comply with it, since we use a 64-byte cache
block size by default, but nothing is stopping users from enabling a
profile and changing the cache block size at the same time.

We'll add zic64b as a 'named feature', not a regular extension, in the
sense that we won't write it in riscv,isa. It'll be used solely to track
whether the user changed cache sizes and if we should warn about it.

zic64b defaults to 'true' since we're already using 64-byte blocks.
If any cache block size (cbom_blocksize or cboz_blocksize) is changed to
something other than 64, zic64b is set to 'false' and, if zic64b was
set to 'true' in the command line, a user warning is also emitted.

Our profile implementation sets mandatory extensions as if users enabled
them in the command line, so this logic will extend to the future RVA22U64
implementation as well.


I talked with Andrew offline. We think that, ***for now***, it's overcomplicated
and a bit confusing to have these half-extensions be user-settable. Most of them
are internal aspects of the implementation that we already comply with, or
something that does not apply to us (e.g. cache-related features).

We'll not show these flags to users. We'll also add more code in
query-cpu-model-expansion to expose them, given that they won't be ordinary
user flags like the others.

As for this patch, we'll no longer expose zic64b to users. It'll be an
internal flag that we'll enable or disable based on the blocksizes alone.

I emphasized 'for now' at the start because there's always the possibility of
having to treat these named extensions more like regular extensions, adding
them in riscv,isa and so on. That's not the case for now, so let's simplify
while we can.
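
To illustrate what "based on the blocksizes alone" could look like, here is
a minimal standalone sketch.  CB_DEF_VALUE and the three field names match
the patch below, but struct toy_cfg and toy_update_zic64b() are hypothetical
stand-ins for this example, not the code that will end up in the next
version.

```c
#include <stdbool.h>
#include <stdint.h>

#define CB_DEF_VALUE 64 /* same value the patch introduces in cpu.h */

/* Hypothetical stand-in for the relevant RISCVCPUConfig fields. */
struct toy_cfg {
    uint16_t cbom_blocksize;
    uint16_t cboz_blocksize;
    bool zic64b;
};

/*
 * Derive the internal flag purely from the block sizes: it holds only if
 * every cache block size is 64 bytes.  There is no user-visible property
 * in this model, hence no warning path.
 */
static void toy_update_zic64b(struct toy_cfg *cfg)
{
    cfg->zic64b = cfg->cbom_blocksize == CB_DEF_VALUE &&
                  cfg->cboz_blocksize == CB_DEF_VALUE;
}
```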


Thanks,

Daniel




[1] https://github.com/riscv/riscv-profiles/releases/download/v1.0/profiles.pdf

Signed-off-by: Daniel Henrique Barboza 
---
  target/riscv/cpu.c | 12 ++--
  target/riscv/cpu.h |  3 +++
  target/riscv/cpu_cfg.h |  1 +
  target/riscv/tcg/tcg-cpu.c | 26 ++
  4 files changed, 40 insertions(+), 2 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index f40da4c661..5095f093ba 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1394,6 +1394,12 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
  DEFINE_PROP_END_OF_LIST(),
  };
  
+const RISCVCPUMultiExtConfig riscv_cpu_named_features[] = {

+MULTI_EXT_CFG_BOOL("zic64b", zic64b, true),
+
+DEFINE_PROP_END_OF_LIST(),
+};
+
  /* Deprecated entries marked for future removal */
  const RISCVCPUMultiExtConfig riscv_cpu_deprecated_exts[] = {
  MULTI_EXT_CFG_BOOL("Zifencei", ext_zifencei, true),
@@ -1423,8 +1429,10 @@ Property riscv_cpu_options[] = {
  DEFINE_PROP_UINT16("vlen", RISCVCPU, cfg.vlen, 128),
  DEFINE_PROP_UINT16("elen", RISCVCPU, cfg.elen, 64),
  
-DEFINE_PROP_UINT16("cbom_blocksize", RISCVCPU, cfg.cbom_blocksize, 64),

-DEFINE_PROP_UINT16("cboz_blocksize", RISCVCPU, cfg.cboz_blocksize, 64),
+DEFINE_PROP_UINT16("cbom_blocksize", RISCVCPU,
+   cfg.cbom_blocksize, CB_DEF_VALUE),
+DEFINE_PROP_UINT16("cboz_blocksize", RISCVCPU,
+   cfg.cboz_blocksize, CB_DEF_VALUE),
  
  DEFINE_PROP_END_OF_LIST(),

  };
diff --git a/target/riscv/cpu.h b/target/riscv/cpu.h
index 8efc4d83ec..ee9abe61d6 100644
--- a/target/riscv/cpu.h
+++ b/target/riscv/cpu.h
@@ -64,6 +64,8 @@ extern const uint32_t misa_bits[];
  const char *riscv_get_misa_ext_name(uint32_t bit);
  const char *riscv_get_misa_ext_description(uint32_t bit);
  
+#define CB_DEF_VALUE 64

+
  #define CPU_CFG_OFFSET(_prop) offsetof(struct RISCVCPUConfig, _prop)
  
  /* Privileged specification version */

@@ -745,6 +747,7 @@ typedef struct RISCVCPUMultiExtConfig {
  extern const RISCVCPUMultiExtConfig riscv_cpu_extensions[];
  extern const RISCVCPUMultiExtConfig riscv_cpu_vendor_exts[];
  extern const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[];
+extern const RISCVCPUMultiExtConfig riscv_cpu_named_features[];
  extern const RISCVCPUMultiExtConfig riscv_cpu_deprecated_exts[];
  extern Property riscv_cpu_options[];
  
diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h

index 6eef4a51ea..b6693320d3 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -107,6 +107,7 @@ struct RISCVCPUConfig {
  bool ext_smepmp;
  bool rvv_ta_all_1s;
  bool rvv_ma_all_1s;
+bool zic64b;
  
  uint32_t mvendorid;

  uint64_t marchid;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 093bda2e75..ac5f65a757 100644
---

Re: [PATCH 1/3] migration: Set downtime_start even for postcopy

2023-10-26 Thread Fabiano Rosas

Peter Xu  writes:

> Postcopy calculates its downtime separately.  It always sets
> MigrationState.downtime properly, but not MigrationState.downtime_start.
>
> Make postcopy do the same as other modes on properly recording the
> timestamp when the VM is going to be stopped.  Drop the temporary variable
> in postcopy_start() along the way.
>
> Signed-off-by: Peter Xu 

Reviewed-by: Fabiano Rosas

Re: [PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Peter Xu

On Thu, Oct 26, 2023 at 05:06:37PM +0100, Joao Martins wrote:
> On 26/10/2023 16:53, Peter Xu wrote:
> > This small series (actually only the last patch; first two are cleanups)
> > wants to improve ability of QEMU downtime analysis similarly to what Joao
> > used to propose here:
> > 
> >   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
> > 
> Thanks for following up on the idea; it's been hard to have enough bandwidth
> for everything over the past few weeks :(

Yeah, totally understood.  I think our QE team pushed me towards some
series like this, while my plan was waiting for your new version. :)

Then when I started I decided to go per-device.  I was thinking of also
persisting that information, but then I remembered some ppc guests can
have ~40,000 vmstates..  and the memory to maintain that may or may not
regress a ppc user.  So I figured I should first keep it simple with
tracepoints.

> 
> > But with a few differences:
> > 
> >   - Nothing exported yet to qapi, all tracepoints so far
> > 
> >   - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
> > finer granule by providing downtime measurements for each vmstate (I
> > made microsecond to be the unit to be accurate).  So far it seems
> > iterable / non-iterable is the core of the problem, and I want to nail
> > it to per-device.
> > 
> >   - Trace dest QEMU too
> > 
> > For the last bullet: consider the case where a device save() can be super
> > fast, while load() can actually be super slow.  Both of them will
> > contribute to the ultimate downtime, but not a simple summary: when src
> > QEMU is save()ing on device1, dst QEMU can be load()ing on device2.  So
> > they can run in parallel.  However the only way to figure all components of
> > the downtime is to record both.
> > 
> > Please have a look, thanks.
> >
> 
> I like your series, as it allows a user to pinpoint one particular bad device,
> while covering the load side too. The checkpoints of migration on the other
> hand were useful -- while also a bit ugly -- for the sort of big picture of how
> downtime breaks down. Perhaps we could add that /also/ as tracepoints without
> specifically committing to be exposed in QAPI.
> 
> More fundamentally, how can one capture the 'stop' part? There's also time
> spent there like e.g. quiescing/stopping vhost-net workers, or suspending the VF
> device. All likely as bad as what those tracepoints cover for device-state/ram
> related stuff (iterable and non-iterable portions).

Yeah that's a good point.  I didn't cover "stop" yet because I think it's
just more tricky and I didn't think it all through, yet.

The first question is, when stopping some backends, the vCPUs are still
running, so it's not 100% clear to me which part should be counted as
real downtime.

Meanwhile that'll be another angle besides vmstates: need to keep some eye
on the state change handlers, and that can be a device, or something else.

Did you measure the stop process in some way before?  Do you have some
rough number or anything surprising you already observed?

Thanks,

-- 
Peter Xu

Re: [PATCH v2 47/65] target/hppa: Remove TARGET_REGISTER_BITS

2023-10-26 Thread Richard Henderson


On 10/20/23 14:31, Philippe Mathieu-Daudé wrote:

diff --git a/target/hppa/machine.c b/target/hppa/machine.c
index 0c0bba68c0..ab34b72910 100644
--- a/target/hppa/machine.c
+++ b/target/hppa/machine.c
@@ -21,21 +21,12 @@
  #include "cpu.h"
  #include "migration/cpu.h"
-#if TARGET_REGISTER_BITS == 64
  #define qemu_put_betr   qemu_put_be64
  #define qemu_get_betr   qemu_get_be64
  #define VMSTATE_UINTTR_V(_f, _s, _v) \
  VMSTATE_UINT64_V(_f, _s, _v)
  #define VMSTATE_UINTTR_ARRAY_V(_f, _s, _n, _v) \
  VMSTATE_UINT64_ARRAY_V(_f, _s, _n, _v)


Total 6 uses, let's use in place, removing the
definitions.


I had meant to go back and remove these, thanks for the reminder.
I didn't realize there were so few uses.  I will just fold this in.


IIUC for TARGET_REGISTER_BITS == 32 we need:

-- >8 --
  static const VMStateDescription vmstate_env = {
  .name = "env",
-    .version_id = 1,
-    .minimum_version_id = 1,
+    .version_id = 2,
+    .minimum_version_id = 2,
  .fields = vmstate_env_fields,
  };


You're right -- a version bump is required.  I will simply do this unconditionally, as the 
effort of back-compat is not warranted for this target.


I also need to handle the TLB format change for pa2.0.
(There's an existing comment about that!)


r~

Re: [PATCH v4 18/22] hmp: synchronize cpu state for lapic info

2023-10-26 Thread Woodhouse, David

> From: Dongli Zhang 
> 
> While the default "info lapic" always synchronizes cpu state ...
> 
> mon_get_cpu()
> -> mon_get_cpu_sync(mon, true)
>-> cpu_synchronize_state(cpu)
>   -> ioctl KVM_GET_LAPIC (taking KVM as example)
> 
> ... the cpu state is not synchronized when the apic-id is available as
> argument.
> 
> The cpu state should be synchronized when apic-id is available. Otherwise
> the "info lapic " always returns stale data.
> 
> Cc: Joe Jin 
> Reviewed-by: Daniel P. Berrangé 
> Signed-off-by: Dongli Zhang 

Reviewed-by: David Woodhouse 

I spent a while staring at stale data from 'info lapic 1' this week
before realising. This fix would have been nice.




[PATCH v7 05/10] hw/fsi: IBM's On-chip Peripheral Bus

2023-10-26 Thread Ninad Palsule

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in
POWER processors. This now makes an appearance in the ASPEED SoC due
to tight integration of the FSI master IP with the OPB, mainly the
existence of an MMIO-mapping of the CFAM address straight onto a
sub-region of the OPB address space.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
Reviewed-by: Joel Stanley 
---
v2:
- Incorporated review comment by Joel.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric & Daniel
v7:
- Incorporated review comments by Cedric.
---
 include/hw/fsi/opb.h | 33 
 hw/fsi/fsi-master.c  |  3 +-
 hw/fsi/opb.c | 74 
 hw/fsi/Kconfig   |  4 +++
 hw/fsi/meson.build   |  1 +
 5 files changed, 113 insertions(+), 2 deletions(-)
 create mode 100644 include/hw/fsi/opb.h
 create mode 100644 hw/fsi/opb.c

diff --git a/include/hw/fsi/opb.h b/include/hw/fsi/opb.h
new file mode 100644
index 00..8b71bb55c2
--- /dev/null
+++ b/include/hw/fsi/opb.h
@@ -0,0 +1,33 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM On-Chip Peripheral Bus
+ */
+#ifndef FSI_OPB_H
+#define FSI_OPB_H
+
+#include "exec/memory.h"
+#include "hw/fsi/fsi-master.h"
+
+#define TYPE_OP_BUS "opb"
+OBJECT_DECLARE_SIMPLE_TYPE(OPBus, OP_BUS)
+
+typedef struct OPBus {
+/*< private >*/
+BusState bus;
+
+/*< public >*/
+MemoryRegion mr;
+AddressSpace as;
+
+/* Model OPB as dumb enough just to provide an address-space */
+/* TODO: Maybe don't store device state in the bus? */
+FSIMasterState fsi;
+} OPBus;
+
+typedef struct OPBusClass {
+BusClass parent_class;
+} OPBusClass;
+
+#endif /* FSI_OPB_H */
diff --git a/hw/fsi/fsi-master.c b/hw/fsi/fsi-master.c
index bb7a893003..ec092b42ea 100644
--- a/hw/fsi/fsi-master.c
+++ b/hw/fsi/fsi-master.c
@@ -11,8 +11,7 @@
 #include "trace.h"
 
 #include "hw/fsi/fsi-master.h"
-
-#define TYPE_OP_BUS "opb"
+#include "hw/fsi/opb.h"
 
 #define TO_REG(x)   ((x) >> 2)
 
diff --git a/hw/fsi/opb.c b/hw/fsi/opb.c
new file mode 100644
index 00..04771b4b27
--- /dev/null
+++ b/hw/fsi/opb.c
@@ -0,0 +1,74 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM On-chip Peripheral Bus
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/log.h"
+
+#include "hw/fsi/opb.h"
+
+static void fsi_opb_realize(BusState *bus, Error **errp)
+{
+OPBus *opb = OP_BUS(bus);
+
+memory_region_init_io(&opb->mr, OBJECT(opb), NULL, opb,
+  NULL, UINT32_MAX);
+address_space_init(&opb->as, &opb->mr, "opb");
+
+if (!object_property_set_bool(OBJECT(&opb->fsi), "realized", true, errp)) {
+return;
+}
+
+memory_region_add_subregion(&opb->mr, 0x8000, &opb->fsi.iomem);
+
+/* OPB2FSI region */
+/*
+ * Avoid endianness issues by mapping each slave's memory region directly.
+ * Manually bridging multiple address-spaces causes endian swapping
+ * headaches as memory_region_dispatch_read() and
+ * memory_region_dispatch_write() correct the endianness based on the
+ * target machine endianness and not relative to the device endianness on
+ * either side of the bridge.
+ */
+/*
+ * XXX: This is a bit hairy and will need to be fixed when I sort out the
+ * bus/slave relationship and any changes to the CFAM modelling (multiple
+ * slaves, LBUS)
+ */
+memory_region_add_subregion(&opb->mr, 0xa000, &opb->fsi.opb2fsi);
+}
+
+static void fsi_opb_init(Object *o)
+{
+OPBus *opb = OP_BUS(o);
+
+object_initialize_child(o, "fsi-master", &opb->fsi, TYPE_FSI_MASTER);
+qdev_set_parent_bus(DEVICE(&opb->fsi), BUS(o), &error_abort);
+}
+
+static void fsi_opb_class_init(ObjectClass *klass, void *data)
+{
+BusClass *bc = BUS_CLASS(klass);
+bc->realize = fsi_opb_realize;
+}
+
+static const TypeInfo opb_info = {
+.name = TYPE_OP_BUS,
+.parent = TYPE_BUS,
+.instance_init = fsi_opb_init,
+.instance_size = sizeof(OPBus),
+.class_init = fsi_opb_class_init,
+.class_size = sizeof(OPBusClass),
+};
+
+static void fsi_opb_register_types(void)
+{
+type_register_static(&opb_info);
+}
+
+type_init(fsi_opb_register_types);
diff --git a/hw/fsi/Kconfig b/hw/fsi/Kconfig
index 8d712e77ed..0f6e6d331a 100644
--- a/hw/fsi/Kconfig
+++ b/hw/fsi/Kconfig
@@ -1,3 +1,7 @@
+config FSI_OPB
+bool
+select FSI_CFAM
+
 config FSI_CFAM
 bool
 select FSI
diff --git a/hw/fsi/meson.build b/hw/fsi/meson.build
index f617943b4a..407b8c2775 100644
--- a/hw/fsi/meson.build
+++ b/hw/fsi/meson.build
@@ -2,3 +2,4 @@ system_ss.add(when: 'CONFIG_FSI_LBUS', if_true: files('lbus.c'))
 system_ss.add(when: 'CONFIG_FSI_SCRATCHPAD', if_true: files('engine-scratchpad.c'))
 system_ss.add(when:

[PATCH v7 07/10] hw/arm: Hook up FSI module in AST2600

2023-10-26 Thread Ninad Palsule

This patchset introduces IBM's Flexible Service Interface (FSI).

Time for some fun with inter-processor buses. FSI allows a service
processor access to the internal buses of a host POWER processor to
perform configuration or debugging.

FSI has long existed in POWER processors and so comes with some baggage,
including how it has been integrated into the ASPEED SoC.

Working backwards from the POWER processor, the fundamental pieces of
interest for the implementation are:

1. The Common FRU Access Macro (CFAM), an address space containing
   various "engines" that drive accesses on buses internal and external
   to the POWER chip. Examples include the SBEFIFO and I2C masters. The
   engines hang off of an internal Local Bus (LBUS) which is described
   by the CFAM configuration block.

2. The FSI slave: The slave is the terminal point of the FSI bus for
   FSI symbols addressed to it. Slaves can be cascaded off of one
   another. The slave's configuration registers appear in address space
   of the CFAM to which it is attached.

3. The FSI master: A controller in the platform service processor (e.g.
   BMC) driving CFAM engine accesses into the POWER chip. At the
   hardware level FSI is a bit-based protocol supporting synchronous and
   DMA-driven accesses of engines in a CFAM.

4. The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in
   POWER processors. This now makes an appearance in the ASPEED SoC due
   to tight integration of the FSI master IP with the OPB, mainly the
   existence of an MMIO-mapping of the CFAM address straight onto a
   sub-region of the OPB address space.

5. An APB-to-OPB bridge enabling access to the OPB from the ARM core in
   the AST2600. Hardware limitations prevent the OPB from being directly
   mapped into APB, so all accesses are indirect through the bridge.

The implementation appears as follows in the QEMU device tree:

(qemu) info qtree
bus: main-system-bus
  type System
  ...
  dev: aspeed.apb2opb, id ""
    gpio-out "sysbus-irq" 1
    mmio 1e79b000/1000
    bus: opb.1
      type opb
      dev: fsi.master, id ""
        bus: fsi.bus.1
          type fsi.bus
          dev: cfam.config, id ""
          dev: cfam, id ""
            bus: fsi.lbus.1
              type lbus
              dev: scratchpad, id ""
                address = 0 (0x0)
    bus: opb.0
      type opb
      dev: fsi.master, id ""
        bus: fsi.bus.0
          type fsi.bus
          dev: cfam.config, id ""
          dev: cfam, id ""
            bus: fsi.lbus.0
              type lbus
              dev: scratchpad, id ""
                address = 0 (0x0)

The LBUS is modelled to maintain the qdev bus hierarchy and to take
advantage of the object model to automatically generate the CFAM
configuration block. The configuration block presents engines in the
order they are attached to the CFAM's LBUS. Engine implementations
should subclass the LBusDevice and set the 'config' member of
LBusDeviceClass to match the engine's type.

CFAM designs offer a lot of flexibility, for instance it is possible for
a CFAM to be simultaneously driven from multiple FSI links. The modeling
is not so complete; it's assumed that each CFAM is attached to a single
FSI slave (as a consequence the CFAM subclasses the FSI slave).

As for FSI, its symbols and wire-protocol are not modelled at all. This
is not necessary to get FSI off the ground thanks to the mapping of the
CFAM address space onto the OPB address space - the models follow this
directly and map the CFAM memory region into the OPB's memory region.
Future work includes supporting more advanced accesses that drive the
FSI master directly rather than indirectly via the CFAM mapping, which
will require implementing the FSI state machine and methods for each of
the FSI symbols on the slave. Further down the track we can also look at
supporting the bitbanged SoftFSI drivers in Linux by extending the FSI
slave model to resolve sequences of GPIO IRQs into FSI symbols, and
calling the associated symbol method on the slave to map the access onto
the CFAM.

Testing:
Tested by reading cfam config address 0 on rainier machine type.

root@p10bmc:~# pdbg -a getcfam 0x0
p0: 0x0 = 0xc0022d15

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
Reviewed-by: Philippe Mathieu-Daudé 
---
 include/hw/arm/aspeed_soc.h |  4 
 hw/arm/aspeed_ast2600.c | 19 +++
 2 files changed, 23 insertions(+)

diff --git a/include/hw/arm/aspeed_soc.h b/include/hw/arm/aspeed_soc.h
index 8adff70072..db3ba3abc7 100644
--- a/include/hw/arm/aspeed_soc.h
+++ b/include/hw/arm/aspeed_soc.h
@@ -36,6 +36,7 @@
 #include "hw/misc/aspeed_lpc.h"
 #include "hw/misc/unimp.h"
 #include "hw/misc/aspeed_peci.h"
+#include "hw/fsi/aspeed-apb2opb.h"
 #include "hw/char/serial.h"
 
 #define ASPEED_SPIS_NUM  2
@@ -96,6 +97,7 @@ struct

[PATCH v7 08/10] hw/fsi: Added qtest

2023-10-26 Thread Ninad Palsule

Added basic qtests for FSI model.
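
The register tests in this patch follow a write, read back, restore
pattern. The sketch below shows that pattern against a plain in-memory
register file rather than a QEMU machine. It is only an illustration:
fake_regs, fake_readl, fake_writel and reg_change_ok are hypothetical
names that do not exist in the series, and the caller is assumed to pass
a register offset inside the fake register file.

```c
#include <stdbool.h>
#include <stdint.h>

#define FAKE_NR_REGS 64   /* hypothetical size of the fake register file */

/* Stand-in for the device registers; the real test talks to QEMU instead. */
static uint32_t fake_regs[FAKE_NR_REGS];

/* reg is a byte offset and must be below FAKE_NR_REGS * 4. */
static uint32_t fake_readl(uint32_t reg)
{
    return fake_regs[reg >> 2];
}

static void fake_writel(uint32_t reg, uint32_t val)
{
    fake_regs[reg >> 2] = val;
}

/*
 * Write newval, check that it reads back, then restore the previous
 * value and check that it reads back as well.
 */
static bool reg_change_ok(uint32_t reg, uint32_t newval)
{
    uint32_t base = fake_readl(reg);
    bool ok;

    fake_writel(reg, newval);
    ok = (fake_readl(reg) == newval);

    fake_writel(reg, base);
    return ok && fake_readl(reg) == base;
}
```

Restoring the original value keeps each check independent of the ones
that run after it.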

Signed-off-by: Ninad Palsule 
Acked-by: Thomas Huth 
---
v3:
 - Added new qtest as per Cedric's comment.
V4:
 - Remove MAINTAINER and documentation changes from this commit
v6:
 - Incorporated review comments by Thomas Huth.
v7:
 - Incorporated review comments by Thomas Huth & Cedric.
---
 tests/qtest/aspeed-fsi-test.c | 205 ++
 tests/qtest/meson.build   |   1 +
 2 files changed, 206 insertions(+)
 create mode 100644 tests/qtest/aspeed-fsi-test.c

diff --git a/tests/qtest/aspeed-fsi-test.c b/tests/qtest/aspeed-fsi-test.c
new file mode 100644
index 00..b3020dd821
--- /dev/null
+++ b/tests/qtest/aspeed-fsi-test.c
@@ -0,0 +1,205 @@
+/*
+ * QTest testcases for IBM's Flexible Service Interface (FSI)
+ *
+ * Copyright (c) 2023 IBM Corporation
+ *
+ * Authors:
+ *   Ninad Palsule 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 or later.
+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "qemu/module.h"
+#include "libqtest-single.h"
+
+/* Registers from ast2600 specifications */
+#define ASPEED_FSI_ENGINER_TRIGGER   0x04
+#define ASPEED_FSI_OPB0_BUS_SELECT   0x10
+#define ASPEED_FSI_OPB1_BUS_SELECT   0x28
+#define ASPEED_FSI_OPB0_RW_DIRECTION 0x14
+#define ASPEED_FSI_OPB1_RW_DIRECTION 0x2c
+#define ASPEED_FSI_OPB0_XFER_SIZE    0x18
+#define ASPEED_FSI_OPB1_XFER_SIZE    0x30
+#define ASPEED_FSI_OPB0_BUS_ADDR     0x1c
+#define ASPEED_FSI_OPB1_BUS_ADDR     0x34
+#define ASPEED_FSI_INTRRUPT_CLEAR    0x40
+#define ASPEED_FSI_INTRRUPT_STATUS   0x48
+#define ASPEED_FSI_OPB0_BUS_STATUS   0x80
+#define ASPEED_FSI_OPB1_BUS_STATUS   0x8c
+#define ASPEED_FSI_OPB0_READ_DATA    0x84
+#define ASPEED_FSI_OPB1_READ_DATA    0x90
+
+/*
+ * FSI Base addresses from the ast2600 specifications.
+ */
+#define AST2600_OPB_FSI0_BASE_ADDR 0x1e79b000
+#define AST2600_OPB_FSI1_BASE_ADDR 0x1e79b100
+
+static uint32_t aspeed_fsi_base_addr;
+
+static uint32_t aspeed_fsi_readl(QTestState *s, uint32_t reg)
+{
+return qtest_readl(s, aspeed_fsi_base_addr + reg);
+}
+
+static void aspeed_fsi_writel(QTestState *s, uint32_t reg, uint32_t val)
+{
+qtest_writel(s, aspeed_fsi_base_addr + reg, val);
+}
+
+/* Setup base address and select register */
+static void test_fsi_setup(QTestState *s, uint32_t base_addr)
+{
+uint32_t curval;
+
+aspeed_fsi_base_addr = base_addr;
+
+/* Set the base select register */
+if (base_addr == AST2600_OPB_FSI0_BASE_ADDR) {
+/* Unselect FSI1 */
+aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x0);
+curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+g_assert_cmpuint(curval, ==, 0x0);
+
+/* Select FSI0 */
+aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x1);
+curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+g_assert_cmpuint(curval, ==, 0x1);
+} else if (base_addr == AST2600_OPB_FSI1_BASE_ADDR) {
+/* Unselect FSI0 */
+aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x0);
+curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+g_assert_cmpuint(curval, ==, 0x0);
+
+/* Select FSI1 */
+aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x1);
+curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+g_assert_cmpuint(curval, ==, 0x1);
+} else {
+g_assert_not_reached();
+}
+}
+
+static void test_fsi_reg_change(QTestState *s, uint32_t reg, uint32_t newval)
+{
+uint32_t base;
+uint32_t curval;
+
+base = aspeed_fsi_readl(s, reg);
+aspeed_fsi_writel(s, reg, newval);
+curval = aspeed_fsi_readl(s, reg);
+g_assert_cmpuint(curval, ==, newval);
+aspeed_fsi_writel(s, reg, base);
+curval = aspeed_fsi_readl(s, reg);
+g_assert_cmpuint(curval, ==, base);
+}
+
+static void test_fsi0_master_regs(const void *data)
+{
+QTestState *s = (QTestState *)data;
+
+test_fsi_setup(s, AST2600_OPB_FSI0_BASE_ADDR);
+
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_RW_DIRECTION, 0xF3F4F514);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_XFER_SIZE, 0xF3F4F518);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_ADDR, 0xF3F4F51c);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_CLEAR, 0xF3F4F540);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_STATUS, 0xF3F4F548);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_STATUS, 0xF3F4F580);
+test_fsi_reg_change(s, ASPEED_FSI_OPB0_READ_DATA, 0xF3F4F584);
+}
+
+static void test_fsi1_master_regs(const void *data)
+{
+QTestState *s = (QTestState *)data;
+
+test_fsi_setup(s, AST2600_OPB_FSI1_BASE_ADDR);
+
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_RW_DIRECTION, 0xF3F4F514);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_XFER_SIZE, 0xF3F4F518);
+test_fsi_reg_change(s, ASPEED_FSI_OPB1_BUS_ADDR, 0xF3F4F51c);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_CLEAR, 0xF3F4F540);
+test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_STATUS,

[PATCH v7 02/10] hw/fsi: Introduce IBM's scratchpad

2023-10-26 Thread Ninad Palsule

This is part of the patchset that introduces the scratchpad.

The scratchpad provides a set of non-functional registers. The firmware
is free to use them; the hardware provides no special management
support. The scratchpad registers can be read or written from the LBUS
slave.

In this model, the LBUS device is the parent of the scratchpad.
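
The register behaviour described above can be sketched as a stand-alone
C model. This is only an illustration and not the QEMU device code:
ScratchpadModel and the scratchpad_model_* functions are hypothetical
names. The model assumes a single 32-bit register at offset 0, with
every other offset reading as zero and ignoring writes, which mirrors
the read and write handlers in this patch.

```c
#include <stdint.h>

/* Hypothetical stand-alone model: one 32-bit scratch register at offset 0. */
typedef struct ScratchpadModel {
    uint32_t reg;
} ScratchpadModel;

/* Reads of any offset other than 0 return 0. */
static uint64_t scratchpad_model_read(const ScratchpadModel *s, uint64_t addr)
{
    return addr ? 0 : s->reg;
}

/* Writes to any offset other than 0 are silently dropped. */
static void scratchpad_model_write(ScratchpadModel *s, uint64_t addr,
                                   uint64_t data)
{
    if (addr == 0) {
        s->reg = (uint32_t)data;   /* truncate to the register width */
    }
}

/* Reset clears the register, like the device reset handler. */
static void scratchpad_model_reset(ScratchpadModel *s)
{
    s->reg = 0;
}
```

Because the register has no side effects, firmware can use it as a
simple liveness check: write a pattern and read the same pattern back.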

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated Joel's review comments.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Daniel.
v7:
- Incorporated review comments by Philippe.
- Cleaned up unused bits.
---
 meson.build|  1 +
 hw/fsi/trace.h |  1 +
 include/hw/fsi/engine-scratchpad.h | 27 +
 include/hw/fsi/fsi.h   | 16 +
 hw/fsi/engine-scratchpad.c | 93 ++
 hw/fsi/Kconfig |  4 ++
 hw/fsi/meson.build |  1 +
 hw/fsi/trace-events|  2 +
 8 files changed, 145 insertions(+)
 create mode 100644 hw/fsi/trace.h
 create mode 100644 include/hw/fsi/engine-scratchpad.h
 create mode 100644 include/hw/fsi/fsi.h
 create mode 100644 hw/fsi/engine-scratchpad.c
 create mode 100644 hw/fsi/trace-events

diff --git a/meson.build b/meson.build
index dcef8b1e79..793c7c1f20 100644
--- a/meson.build
+++ b/meson.build
@@ -3257,6 +3257,7 @@ if have_system
 'hw/char',
 'hw/display',
 'hw/dma',
+'hw/fsi',
 'hw/hyperv',
 'hw/i2c',
 'hw/i386',
diff --git a/hw/fsi/trace.h b/hw/fsi/trace.h
new file mode 100644
index 00..ee67c7fb04
--- /dev/null
+++ b/hw/fsi/trace.h
@@ -0,0 +1 @@
+#include "trace/trace-hw_fsi.h"
diff --git a/include/hw/fsi/engine-scratchpad.h b/include/hw/fsi/engine-scratchpad.h
new file mode 100644
index 00..4ffa871965
--- /dev/null
+++ b/include/hw/fsi/engine-scratchpad.h
@@ -0,0 +1,27 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM scratchpad engine
+ */
+#ifndef FSI_ENGINE_SCRATCHPAD_H
+#define FSI_ENGINE_SCRATCHPAD_H
+
+#include "hw/fsi/lbus.h"
+#include "hw/fsi/fsi.h"
+
+#define ENGINE_CONFIG_NEXT            BE_BIT(0)
+#define ENGINE_CONFIG_TYPE_PEEK       (0x02 << 4)
+#define ENGINE_CONFIG_TYPE_FSI        (0x03 << 4)
+#define ENGINE_CONFIG_TYPE_SCRATCHPAD (0x06 << 4)
+
+#define TYPE_FSI_SCRATCHPAD "fsi.scratchpad"
+#define SCRATCHPAD(obj) OBJECT_CHECK(FSIScratchPad, (obj), TYPE_FSI_SCRATCHPAD)
+
+typedef struct FSIScratchPad {
+FSILBusDevice parent;
+
+uint32_t reg;
+} FSIScratchPad;
+
+#endif /* FSI_ENGINE_SCRATCHPAD_H */
diff --git a/include/hw/fsi/fsi.h b/include/hw/fsi/fsi.h
new file mode 100644
index 00..b08b97f62b
--- /dev/null
+++ b/include/hw/fsi/fsi.h
@@ -0,0 +1,16 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Flexible Service Interface
+ */
+#ifndef FSI_FSI_H
+#define FSI_FSI_H
+
+#include "qemu/bitops.h"
+
+/* Bitwise operations at the word level. */
+#define BE_BIT(x)   BIT(31 - (x))
+#define BE_GENMASK(hb, lb)  MAKE_64BIT_MASK((lb), ((hb) - (lb) + 1))
+
+#endif
diff --git a/hw/fsi/engine-scratchpad.c b/hw/fsi/engine-scratchpad.c
new file mode 100644
index 00..a8887cd613
--- /dev/null
+++ b/hw/fsi/engine-scratchpad.c
@@ -0,0 +1,93 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM scratchpad engine
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "trace.h"
+
+#include "hw/fsi/engine-scratchpad.h"
+
+static uint64_t fsi_scratchpad_read(void *opaque, hwaddr addr, unsigned size)
+{
+FSIScratchPad *s = SCRATCHPAD(opaque);
+
+trace_fsi_scratchpad_read(addr, size);
+
+if (addr) {
+return 0;
+}
+
+return s->reg;
+}
+
+static void fsi_scratchpad_write(void *opaque, hwaddr addr, uint64_t data,
+ unsigned size)
+{
+FSIScratchPad *s = SCRATCHPAD(opaque);
+
+trace_fsi_scratchpad_write(addr, size, data);
+
+if (addr) {
+return;
+}
+
+s->reg = data;
+}
+
+static const struct MemoryRegionOps scratchpad_ops = {
+.read = fsi_scratchpad_read,
+.write = fsi_scratchpad_write,
+.endianness = DEVICE_BIG_ENDIAN,
+};
+
+static void fsi_scratchpad_realize(DeviceState *dev, Error **errp)
+{
+FSILBusDevice *ldev = FSI_LBUS_DEVICE(dev);
+
+memory_region_init_io(&ldev->iomem, OBJECT(ldev), &scratchpad_ops,
+  ldev, TYPE_FSI_SCRATCHPAD, 0x400);
+}
+
+static void fsi_scratchpad_reset(DeviceState *dev)
+{
+FSIScratchPad *s = SCRATCHPAD(dev);
+
+s->reg = 0;
+}
+
+static void fsi_scratchpad_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+FSILBusDeviceClass *ldc = FSI_LBUS_DEVICE_CLASS(klass);
+
+dc->realize = fsi_scratchpad_realize;
+dc->reset = fsi_scratchpad_reset;
+
+ldc->config =
+  ENGINE_CONFIG_NEXT| /*

[PATCH v7 10/10] hw/fsi: Update MAINTAINER list

2023-10-26 Thread Ninad Palsule

Added maintainer for IBM FSI model

Signed-off-by: Ninad Palsule 
---
V4:
  - Added separate commit for MAINTAINER change.

V5:
  - Use * instead of listing all files in dir
---
 MAINTAINERS | 8 
 1 file changed, 8 insertions(+)

diff --git a/MAINTAINERS b/MAINTAINERS
index d36aa44661..cce7875fb0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3502,6 +3502,14 @@ F: tests/qtest/adm1272-test.c
 F: tests/qtest/max34451-test.c
 F: tests/qtest/isl_pmbus_vr-test.c
 
+FSI
+M: Ninad Palsule 
+S: Maintained
+F: hw/fsi/*
+F: include/hw/fsi/*
+F: docs/specs/fsi.rst
+F: tests/qtest/aspeed-fsi-test.c
+
 Firmware schema specifications
 M: Philippe Mathieu-Daudé 
 R: Daniel P. Berrange 
-- 
2.39.2

[PATCH v7 06/10] hw/fsi: Aspeed APB2OPB interface

2023-10-26 Thread Ninad Palsule

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

An APB-to-OPB bridge enabling access to the OPB from the ARM core in
the AST2600. Hardware limitations prevent the OPB from being directly
mapped into APB, so all accesses are indirect through the bridge.
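
What "indirect through the bridge" means for software can be sketched
with a toy model: the caller programs an address and a direction, then
writes the trigger register, and the bridge performs the OPB access on
its behalf. The register offsets and the read-mode bit are taken from
the defines in this series (OPB0 only). Everything else is an
assumption made for illustration: BridgeModel, bridge_writel,
bridge_readl, opb_read32 and opb_write32 are hypothetical names, the
backing store is a small array, and bus select, transfer size and
interrupt status handling are left out.

```c
#include <stdint.h>

/* Register offsets as defined in this series (OPB0 only). */
#define REG_TRIGGER     0x04
#define REG_OPB0_MODE   0x14   /* bit 0 set selects a read */
#define REG_OPB0_ADDR   0x1c
#define REG_OPB0_WDATA  0x20
#define REG_OPB0_RDATA  0x84

#define MODE_RD         0x1u
#define OPB_WORDS       16     /* hypothetical size of the toy OPB space */

typedef struct BridgeModel {
    uint32_t regs[0x100 >> 2]; /* bridge registers as seen from the APB side */
    uint32_t opb[OPB_WORDS];   /* toy backing store behind the bridge */
} BridgeModel;

/* A write to the trigger register performs the access programmed so far. */
static void bridge_writel(BridgeModel *b, uint32_t reg, uint32_t val)
{
    b->regs[reg >> 2] = val;

    if (reg == REG_TRIGGER) {
        /* Wrap the word index so the toy store is never overrun. */
        uint32_t idx = (b->regs[REG_OPB0_ADDR >> 2] >> 2) % OPB_WORDS;

        if (b->regs[REG_OPB0_MODE >> 2] & MODE_RD) {
            b->regs[REG_OPB0_RDATA >> 2] = b->opb[idx];
        } else {
            b->opb[idx] = b->regs[REG_OPB0_WDATA >> 2];
        }
    }
}

static uint32_t bridge_readl(const BridgeModel *b, uint32_t reg)
{
    return b->regs[reg >> 2];
}

/* Indirect read: program address and direction, trigger, fetch the data. */
static uint32_t opb_read32(BridgeModel *b, uint32_t addr)
{
    bridge_writel(b, REG_OPB0_ADDR, addr);
    bridge_writel(b, REG_OPB0_MODE, MODE_RD);
    bridge_writel(b, REG_TRIGGER, 1);
    return bridge_readl(b, REG_OPB0_RDATA);
}

/* Indirect write: program address, direction and data, then trigger. */
static void opb_write32(BridgeModel *b, uint32_t addr, uint32_t val)
{
    bridge_writel(b, REG_OPB0_ADDR, addr);
    bridge_writel(b, REG_OPB0_MODE, 0);
    bridge_writel(b, REG_OPB0_WDATA, val);
    bridge_writel(b, REG_TRIGGER, 1);
}
```

The point of the sketch is the access shape: no OPB address is ever
dereferenced directly by the caller, only bridge registers are.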

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated review comments by Joel
v3:
- Incorporated review comments by Thomas Huth
v4:
  - Compile FSI with ASPEED_SOC only.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric.
v7:
- Incorporated review comments by Cedric.
---
 include/hw/fsi/aspeed-apb2opb.h |  33 
 hw/fsi/aspeed-apb2opb.c | 272 
 hw/arm/Kconfig  |   1 +
 hw/fsi/Kconfig  |   4 +
 hw/fsi/meson.build  |   1 +
 hw/fsi/trace-events |   2 +
 6 files changed, 313 insertions(+)
 create mode 100644 include/hw/fsi/aspeed-apb2opb.h
 create mode 100644 hw/fsi/aspeed-apb2opb.c

diff --git a/include/hw/fsi/aspeed-apb2opb.h b/include/hw/fsi/aspeed-apb2opb.h
new file mode 100644
index 00..a81ae67023
--- /dev/null
+++ b/include/hw/fsi/aspeed-apb2opb.h
@@ -0,0 +1,33 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * ASPEED APB2OPB Bridge
+ */
+#ifndef FSI_ASPEED_APB2OPB_H
+#define FSI_ASPEED_APB2OPB_H
+
+#include "hw/sysbus.h"
+#include "hw/fsi/opb.h"
+
+#define TYPE_ASPEED_APB2OPB "aspeed.apb2opb"
+OBJECT_DECLARE_SIMPLE_TYPE(AspeedAPB2OPBState, ASPEED_APB2OPB)
+
+#define ASPEED_APB2OPB_NR_REGS ((0xe8 >> 2) + 1)
+
+#define ASPEED_FSI_NUM 2
+
+typedef struct AspeedAPB2OPBState {
+/*< private >*/
+SysBusDevice parent_obj;
+
+/*< public >*/
+MemoryRegion iomem;
+
+uint32_t regs[ASPEED_APB2OPB_NR_REGS];
+qemu_irq irq;
+
+OPBus opb[ASPEED_FSI_NUM];
+} AspeedAPB2OPBState;
+
+#endif /* FSI_ASPEED_APB2OPB_H */
diff --git a/hw/fsi/aspeed-apb2opb.c b/hw/fsi/aspeed-apb2opb.c
new file mode 100644
index 00..4cac62a38f
--- /dev/null
+++ b/hw/fsi/aspeed-apb2opb.c
@@ -0,0 +1,272 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * ASPEED APB-OPB FSI interface
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qom/object.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/fsi/aspeed-apb2opb.h"
+#include "hw/qdev-core.h"
+
+#define TO_REG(x) (x >> 2)
+
+#define APB2OPB_VERSION                TO_REG(0x00)
+#define APB2OPB_TRIGGER                TO_REG(0x04)
+
+#define APB2OPB_CONTROL                TO_REG(0x08)
+#define   APB2OPB_CONTROL_OFF          BE_GENMASK(31, 13)
+
+#define APB2OPB_OPB2FSI                TO_REG(0x0c)
+#define   APB2OPB_OPB2FSI_OFF          BE_GENMASK(31, 22)
+
+#define APB2OPB_OPB0_SEL               TO_REG(0x10)
+#define APB2OPB_OPB1_SEL               TO_REG(0x28)
+#define   APB2OPB_OPB_SEL_EN           BIT(0)
+
+#define APB2OPB_OPB0_MODE              TO_REG(0x14)
+#define APB2OPB_OPB1_MODE              TO_REG(0x2c)
+#define   APB2OPB_OPB_MODE_RD          BIT(0)
+
+#define APB2OPB_OPB0_XFER              TO_REG(0x18)
+#define APB2OPB_OPB1_XFER              TO_REG(0x30)
+#define   APB2OPB_OPB_XFER_FULL        BIT(1)
+#define   APB2OPB_OPB_XFER_HALF        BIT(0)
+
+#define APB2OPB_OPB0_ADDR              TO_REG(0x1c)
+#define APB2OPB_OPB0_WRITE_DATA        TO_REG(0x20)
+
+#define APB2OPB_OPB1_ADDR              TO_REG(0x34)
+#define APB2OPB_OPB1_WRITE_DATA        TO_REG(0x38)
+
+#define APB2OPB_IRQ_STS                TO_REG(0x48)
+#define   APB2OPB_IRQ_STS_OPB1_TX_ACK  BIT(17)
+#define   APB2OPB_IRQ_STS_OPB0_TX_ACK  BIT(16)
+
+#define APB2OPB_OPB0_WRITE_WORD_ENDIAN TO_REG(0x4c)
+#define   APB2OPB_OPB0_WRITE_WORD_ENDIAN_BE 0x0011101b
+#define APB2OPB_OPB0_WRITE_BYTE_ENDIAN TO_REG(0x50)
+#define   APB2OPB_OPB0_WRITE_BYTE_ENDIAN_BE 0x0c330f3f
+#define APB2OPB_OPB1_WRITE_WORD_ENDIAN TO_REG(0x54)
+#define APB2OPB_OPB1_WRITE_BYTE_ENDIAN TO_REG(0x58)
+#define APB2OPB_OPB0_READ_BYTE_ENDIAN  TO_REG(0x5c)
+#define APB2OPB_OPB1_READ_BYTE_ENDIAN  TO_REG(0x60)
+#define   APB2OPB_OPB0_READ_WORD_ENDIAN_BE  0x00030b1b
+
+#define APB2OPB_OPB0_READ_DATA TO_REG(0x84)
+#define APB2OPB_OPB1_READ_DATA TO_REG(0x90)
+
+/*
+ * The following magic values came from AST2600 data sheet
+ * The register values are defined under section "FSI controller"
+ * as initial values.
+ */
+static const uint32_t aspeed_apb2opb_reset[ASPEED_APB2OPB_NR_REGS] = {
+ [APB2OPB_VERSION]= 0x00a1,
+ [APB2OPB_OPB0_WRITE_WORD_ENDIAN] = 0x0044eee4,
+ [APB2OPB_OPB0_WRITE_BYTE_ENDIAN] = 0x0055aaff,
+ [APB2OPB_OPB1_WRITE_WORD_ENDIAN] = 0x00117717,
+ [APB2OPB_OPB1_WRITE_BYTE_ENDIAN] = 0xffaa5500,
+ [APB2OPB_OPB0_READ_BYTE_ENDIAN]  = 0x0044eee4,
+ [APB2OPB_OPB1_READ_BYTE_ENDIAN]  =

[PATCH v7 00/10] Introduce model for IBM's FSI

2023-10-26 Thread Ninad Palsule

Hello,

Please review the patch-set version 7.
I have incorporated review comments from Cedric, Philippe and Thomas.

Ninad Palsule (10):
  hw/fsi: Introduce IBM's Local bus
  hw/fsi: Introduce IBM's scratchpad
  hw/fsi: Introduce IBM's cfam,fsi-slave
  hw/fsi: Introduce IBM's FSI
  hw/fsi: IBM's On-chip Peripheral Bus
  hw/fsi: Aspeed APB2OPB interface
  hw/arm: Hook up FSI module in AST2600
  hw/fsi: Added qtest
  hw/fsi: Added FSI documentation
  hw/fsi: Update MAINTAINER list

 MAINTAINERS|   8 +
 docs/specs/fsi.rst | 138 +++
 docs/specs/index.rst   |   1 +
 meson.build|   1 +
 hw/fsi/trace.h |   1 +
 include/hw/arm/aspeed_soc.h|   4 +
 include/hw/fsi/aspeed-apb2opb.h|  33 
 include/hw/fsi/cfam.h  |  34 
 include/hw/fsi/engine-scratchpad.h |  27 +++
 include/hw/fsi/fsi-master.h|  30 
 include/hw/fsi/fsi-slave.h |  29 +++
 include/hw/fsi/fsi.h   |  36 
 include/hw/fsi/lbus.h  |  43 +
 include/hw/fsi/opb.h   |  33 
 hw/arm/aspeed_ast2600.c|  19 ++
 hw/fsi/aspeed-apb2opb.c| 272 +
 hw/fsi/cfam.c  | 173 ++
 hw/fsi/engine-scratchpad.c |  93 ++
 hw/fsi/fsi-master.c| 161 +
 hw/fsi/fsi-slave.c |  78 +
 hw/fsi/fsi.c   |  25 +++
 hw/fsi/lbus.c  |  74 
 hw/fsi/opb.c   |  74 
 tests/qtest/aspeed-fsi-test.c  | 205 ++
 hw/Kconfig |   1 +
 hw/arm/Kconfig |   1 +
 hw/fsi/Kconfig |  23 +++
 hw/fsi/meson.build |   6 +
 hw/fsi/trace-events|  13 ++
 hw/meson.build |   1 +
 tests/qtest/meson.build|   1 +
 31 files changed, 1638 insertions(+)
 create mode 100644 docs/specs/fsi.rst
 create mode 100644 hw/fsi/trace.h
 create mode 100644 include/hw/fsi/aspeed-apb2opb.h
 create mode 100644 include/hw/fsi/cfam.h
 create mode 100644 include/hw/fsi/engine-scratchpad.h
 create mode 100644 include/hw/fsi/fsi-master.h
 create mode 100644 include/hw/fsi/fsi-slave.h
 create mode 100644 include/hw/fsi/fsi.h
 create mode 100644 include/hw/fsi/lbus.h
 create mode 100644 include/hw/fsi/opb.h
 create mode 100644 hw/fsi/aspeed-apb2opb.c
 create mode 100644 hw/fsi/cfam.c
 create mode 100644 hw/fsi/engine-scratchpad.c
 create mode 100644 hw/fsi/fsi-master.c
 create mode 100644 hw/fsi/fsi-slave.c
 create mode 100644 hw/fsi/fsi.c
 create mode 100644 hw/fsi/lbus.c
 create mode 100644 hw/fsi/opb.c
 create mode 100644 tests/qtest/aspeed-fsi-test.c
 create mode 100644 hw/fsi/Kconfig
 create mode 100644 hw/fsi/meson.build
 create mode 100644 hw/fsi/trace-events

-- 
2.39.2

[PATCH v7 09/10] hw/fsi: Added FSI documentation

2023-10-26 Thread Ninad Palsule

Documentation for IBM FSI model.

Signed-off-by: Ninad Palsule 
---
v4:
  - Added separate commit for documentation
v7:
  - Incorporated review comments by Cedric.
---
 docs/specs/fsi.rst   | 138 +++
 docs/specs/index.rst |   1 +
 2 files changed, 139 insertions(+)
 create mode 100644 docs/specs/fsi.rst

diff --git a/docs/specs/fsi.rst b/docs/specs/fsi.rst
new file mode 100644
index 00..05a6b6347a
--- /dev/null
+++ b/docs/specs/fsi.rst
@@ -0,0 +1,138 @@
+======================================
+IBM's Flexible Service Interface (FSI)
+======================================
+
+The QEMU FSI emulation implements hardware interfaces between the ASPEED SoC,
+the FSI master/slave and the end engine.
+
+FSI is a point-to-point two-wire interface which is capable of supporting
+distances of up to 4 meters. FSI interfaces have been used successfully for
+many years in IBM servers to attach IBM Flexible Support Processors (FSP) to
+CPUs and IBM ASICs.
+
+FSI allows a service processor access to the internal buses of a host POWER
+processor to perform configuration or debugging. FSI has long existed in POWER
+processors and so comes with some baggage, including how it has been integrated
+into the ASPEED SoC.
+
+Working backwards from the POWER processor, the fundamental pieces of interest
+for the implementation are: (see the `FSI specification`_ for more details)
+
+1. The Common FRU Access Macro (CFAM), an address space containing various
+   "engines" that drive accesses on buses internal and external to the POWER
+   chip. Examples include the SBEFIFO and I2C masters. The engines hang off of
+   an internal Local Bus (LBUS) which is described by the CFAM configuration
+   block.
+
+2. The FSI slave: The slave is the terminal point of the FSI bus for FSI
+   symbols addressed to it. Slaves can be cascaded off of one another. The
+   slave's configuration registers appear in address space of the CFAM to
+   which it is attached.
+
+3. The FSI master: A controller in the platform service processor (e.g. BMC)
+   driving CFAM engine accesses into the POWER chip. At the hardware level
+   FSI is a bit-based protocol supporting synchronous and DMA-driven accesses
+   of engines in a CFAM.
+
+4. The On-Chip Peripheral Bus (OPB): A low-speed bus typically found in POWER
+   processors. This now makes an appearance in the ASPEED SoC due to tight
+   integration of the FSI master IP with the OPB, mainly the existence of an
+   MMIO-mapping of the CFAM address straight onto a sub-region of the OPB
+   address space.
+
+5. An APB-to-OPB bridge enabling access to the OPB from the ARM core in the
+   AST2600. Hardware limitations prevent the OPB from being directly mapped
+   into APB, so all accesses are indirect through the bridge.
+
+The LBUS is modelled to maintain the qdev bus hierarchy and to take advantage
+of the object model to automatically generate the CFAM configuration block.
+The configuration block presents engines in the order they are attached to the
+CFAM's LBUS. Engine implementations should subclass the LBusDevice and set the
+'config' member of LBusDeviceClass to match the engine's type.
+
+CFAM designs offer a lot of flexibility, for instance it is possible for a
+CFAM to be simultaneously driven from multiple FSI links. The modeling is not
+so complete; it's assumed that each CFAM is attached to a single FSI slave (as
+a consequence the CFAM subclasses the FSI slave).
+
+As for FSI, its symbols and wire-protocol are not modelled at all. This is not
+necessary to get FSI off the ground thanks to the mapping of the CFAM address
+space onto the OPB address space - the models follow this directly and map the
+CFAM memory region into the OPB's memory region.
+
+QEMU files related to FSI interface:
+ - ``hw/fsi/aspeed-apb2opb.c``
+ - ``include/hw/fsi/aspeed-apb2opb.h``
+ - ``hw/fsi/opb.c``
+ - ``include/hw/fsi/opb.h``
+ - ``hw/fsi/fsi.c``
+ - ``include/hw/fsi/fsi.h``
+ - ``hw/fsi/fsi-master.c``
+ - ``include/hw/fsi/fsi-master.h``
+ - ``hw/fsi/fsi-slave.c``
+ - ``include/hw/fsi/fsi-slave.h``
+ - ``hw/fsi/cfam.c``
+ - ``include/hw/fsi/cfam.h``
+ - ``hw/fsi/engine-scratchpad.c``
+ - ``include/hw/fsi/engine-scratchpad.h``
+ - ``include/hw/fsi/lbus.h``
+
+The following command starts the rainier machine with the built-in FSI model.
+There are no model-specific arguments.
+
+.. code-block:: console
+
+  qemu-system-arm -M rainier-bmc -nographic \
+  -kernel fitImage-linux.bin \
+  -dtb aspeed-bmc-ibm-rainier.dtb \
+  -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
+  -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
+  -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a"
+
+The implementation appears as follows in the QEMU device tree:
+
+.. code-block:: console
+
+  (qemu) info qtree
+  bus: main-system-bus
+type System
+...
+dev: aspeed.apb2opb, id ""
+  gpio-out "sysbus-irq" 1
+  mmio 1e79b000/1000
+  bus:

[PATCH v7 01/10] hw/fsi: Introduce IBM's Local bus

2023-10-26 Thread Ninad Palsule

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

The LBUS is modelled to maintain the qdev bus hierarchy and to take
advantage of the object model to automatically generate the CFAM
configuration block. The configuration block presents engines in the
order they are attached to the CFAM's LBUS. Engine implementations
should subclass the LBusDevice and set the 'config' member of
LBusDeviceClass to match the engine's type.
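
As a rough illustration of the ordering rule above, here is a toy sketch. It
is not the code in this series: ToyEngine, ToyLBus and the layout of the words
after the chip ID are assumptions made purely for illustration. Only the chip
ID value is taken from the cfam.c patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_CHIP_ID      0xc0022d15u  /* CFAM_CONFIG_CHIP_ID_P9 in cfam.c */
#define TOY_MAX_ENGINES  8

/* Hypothetical stand-in for an engine class: only 'config' matters here. */
typedef struct ToyEngine {
    const char *name;
    uint32_t config;
} ToyEngine;

/* Hypothetical stand-in for the LBUS: engines are kept in attach order. */
typedef struct ToyLBus {
    const ToyEngine *engines[TOY_MAX_ENGINES];
    size_t nr_engines;
} ToyLBus;

/* Attach an engine at the end of the bus; returns -1 when the bus is full. */
static int toy_lbus_attach(ToyLBus *bus, const ToyEngine *engine)
{
    if (bus->nr_engines == TOY_MAX_ENGINES) {
        return -1;
    }
    bus->engines[bus->nr_engines++] = engine;
    return 0;
}

/*
 * Read one word of the generated configuration block.
 * Assumed layout: word 0 is the chip ID, word n (n >= 1) is the 'config'
 * value of the n-th attached engine, and anything past the end reads as 0.
 */
static uint32_t toy_config_read(const ToyLBus *bus, size_t word)
{
    if (word == 0) {
        return TOY_CHIP_ID;
    }
    if (word - 1 < bus->nr_engines) {
        return bus->engines[word - 1]->config;
    }
    return 0;
}
```

The point of the sketch is only that the block is derived from the bus
contents, so attaching engines in a different order changes what the guest
reads back.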

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated Joel's review comments.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric & Daniel.
v7:
- Incorporated review comments by Philippe.
---
 include/hw/fsi/lbus.h | 43 +
 hw/fsi/lbus.c | 74 +++
 hw/Kconfig|  1 +
 hw/fsi/Kconfig|  2 ++
 hw/fsi/meson.build|  1 +
 hw/meson.build|  1 +
 6 files changed, 122 insertions(+)
 create mode 100644 include/hw/fsi/lbus.h
 create mode 100644 hw/fsi/lbus.c
 create mode 100644 hw/fsi/Kconfig
 create mode 100644 hw/fsi/meson.build

diff --git a/include/hw/fsi/lbus.h b/include/hw/fsi/lbus.h
new file mode 100644
index 00..4fa696bbdb
--- /dev/null
+++ b/include/hw/fsi/lbus.h
@@ -0,0 +1,43 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Local bus and connected device structures.
+ */
+#ifndef FSI_LBUS_H
+#define FSI_LBUS_H
+
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_FSI_LBUS_DEVICE "fsi.lbus.device"
+OBJECT_DECLARE_TYPE(FSILBusDevice, FSILBusDeviceClass, FSI_LBUS_DEVICE)
+
+#define FSI_LBUS_MEM_REGION_SIZE  (2 * 1024 * 1024)
+#define FSI_LBUSDEV_IOMEM_SIZE    0x400
+
+typedef struct FSILBusDevice {
+DeviceState parent;
+
+MemoryRegion iomem;
+uint32_t address;
+} FSILBusDevice;
+
+typedef struct FSILBusDeviceClass {
+DeviceClass parent;
+
+uint32_t config;
+} FSILBusDeviceClass;
+
+#define TYPE_FSI_LBUS "fsi.lbus"
+OBJECT_DECLARE_SIMPLE_TYPE(FSILBus, FSI_LBUS)
+
+typedef struct FSILBus {
+BusState bus;
+
+MemoryRegion mr;
+} FSILBus;
+
+DeviceState *lbus_create_device(FSILBus *bus, const char *type, uint32_t addr);
+int lbus_add_device(FSILBus *bus, FSILBusDevice *dev);
+#endif /* FSI_LBUS_H */
diff --git a/hw/fsi/lbus.c b/hw/fsi/lbus.c
new file mode 100644
index 00..3a7335dde5
--- /dev/null
+++ b/hw/fsi/lbus.c
@@ -0,0 +1,74 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Local bus where FSI slaves are connected
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "hw/fsi/lbus.h"
+
+#include "hw/qdev-properties.h"
+
+static void lbus_init(Object *o)
+{
+    FSILBus *lbus = FSI_LBUS(o);
+
+    memory_region_init(&lbus->mr, OBJECT(lbus), TYPE_FSI_LBUS,
+                       FSI_LBUS_MEM_REGION_SIZE - FSI_LBUSDEV_IOMEM_SIZE);
+}
+
+static const TypeInfo lbus_info = {
+.name = TYPE_FSI_LBUS,
+.parent = TYPE_BUS,
+.instance_init = lbus_init,
+.instance_size = sizeof(FSILBus),
+};
+
+static Property lbus_device_props[] = {
+DEFINE_PROP_UINT32("address", FSILBusDevice, address, 0),
+DEFINE_PROP_END_OF_LIST(),
+};
+
+DeviceState *lbus_create_device(FSILBus *bus, const char *type, uint32_t addr)
+{
+    DeviceState *ds;
+    BusState *state = BUS(bus);
+    FSILBusDevice *dev;
+
+    ds = qdev_new(type);
+    qdev_prop_set_uint32(ds, "address", addr);
+    qdev_realize_and_unref(ds, state, &error_fatal);
+
+    dev = FSI_LBUS_DEVICE(ds);
+    memory_region_add_subregion(&bus->mr, dev->address,
+                                &dev->iomem);
+
+    return ds;
+}
+
+static void lbus_device_class_init(ObjectClass *klass, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(klass);
+
+dc->bus_type = TYPE_FSI_LBUS;
+device_class_set_props(dc, lbus_device_props);
+}
+
+static const TypeInfo lbus_device_type_info = {
+.name = TYPE_FSI_LBUS_DEVICE,
+.parent = TYPE_DEVICE,
+.instance_size = sizeof(FSILBusDevice),
+.abstract = true,
+.class_init = lbus_device_class_init,
+.class_size = sizeof(FSILBusDeviceClass),
+};
+
+static void lbus_register_types(void)
+{
+    type_register_static(&lbus_info);
+    type_register_static(&lbus_device_type_info);
+}
+
+type_init(lbus_register_types);
diff --git a/hw/Kconfig b/hw/Kconfig
index 9ca7b38c31..2c00936c28 100644
--- a/hw/Kconfig
+++ b/hw/Kconfig
@@ -9,6 +9,7 @@ source core/Kconfig
 source cxl/Kconfig
 source display/Kconfig
 source dma/Kconfig
+source fsi/Kconfig
 source gpio/Kconfig
 source hyperv/Kconfig
 source i2c/Kconfig
diff --git a/hw/fsi/Kconfig b/hw/fsi/Kconfig
new file mode 100644
index 00..e650c660f0
--- /dev/null
+++ b/hw/fsi/Kconfig
@@ -0,0 +1,2 @@
+config FSI_LBUS
+bool
diff --git a/hw/fsi/meson.build b/hw/fsi/meson.build
new file mode 100644
index 00..4074d3a7d2
--- /dev/null
+++ b/hw/fsi/meson.build
@@ -0,0 +1 @@
+system_ss.add(when:

[PATCH v7 04/10] hw/fsi: Introduce IBM's FSI

2023-10-26 Thread Ninad Palsule

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

This commit models the FSI bus. The CFAM hangs off the FSI bus. The bus
is modelled such that it is embedded inside the FSI master, which is the
bus controller.

The FSI master: A controller in the platform service processor (e.g.
BMC) driving CFAM engine accesses into the POWER chip. At the
hardware level FSI is a bit-based protocol supporting synchronous and
DMA-driven accesses of engines in a CFAM.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
Reviewed-by: Joel Stanley 
---
v2:
- Incorporated review comments by Joel
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric & Daniel
v7:
- Cleaned up unused bits.
---
 include/hw/fsi/fsi-master.h |  30 +++
 hw/fsi/fsi-master.c | 162 
 hw/fsi/fsi.c|  25 ++
 hw/fsi/meson.build  |   2 +-
 hw/fsi/trace-events |   2 +
 5 files changed, 220 insertions(+), 1 deletion(-)
 create mode 100644 include/hw/fsi/fsi-master.h
 create mode 100644 hw/fsi/fsi-master.c
 create mode 100644 hw/fsi/fsi.c

diff --git a/include/hw/fsi/fsi-master.h b/include/hw/fsi/fsi-master.h
new file mode 100644
index 00..847078919c
--- /dev/null
+++ b/include/hw/fsi/fsi-master.h
@@ -0,0 +1,30 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2019 IBM Corp.
+ *
+ * IBM Flexible Service Interface Master
+ */
+#ifndef FSI_FSI_MASTER_H
+#define FSI_FSI_MASTER_H
+
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+#include "hw/fsi/fsi.h"
+
+#define TYPE_FSI_MASTER "fsi.master"
+OBJECT_DECLARE_SIMPLE_TYPE(FSIMasterState, FSI_MASTER)
+
+#define FSI_MASTER_NR_REGS ((0x2e0 >> 2) + 1)
+
+typedef struct FSIMasterState {
+DeviceState parent;
+MemoryRegion iomem;
+MemoryRegion opb2fsi;
+
+FSIBus bus;
+
+uint32_t regs[FSI_MASTER_NR_REGS];
+} FSIMasterState;
+
+
+#endif /* FSI_FSI_MASTER_H */
diff --git a/hw/fsi/fsi-master.c b/hw/fsi/fsi-master.c
new file mode 100644
index 00..bb7a893003
--- /dev/null
+++ b/hw/fsi/fsi-master.c
@@ -0,0 +1,162 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Flexible Service Interface master
+ */
+
+#include "qemu/osdep.h"
+#include "qapi/error.h"
+#include "qemu/log.h"
+#include "trace.h"
+
+#include "hw/fsi/fsi-master.h"
+
+#define TYPE_OP_BUS "opb"
+
+#define TO_REG(x)   ((x) >> 2)
+
+#define FSI_MENP0   TO_REG(0x010)
+#define FSI_MENP32  TO_REG(0x014)
+#define FSI_MSENP0  TO_REG(0x018)
+#define FSI_MLEVP0  TO_REG(0x018)
+#define FSI_MSENP32 TO_REG(0x01c)
+#define FSI_MLEVP32 TO_REG(0x01c)
+#define FSI_MCENP0  TO_REG(0x020)
+#define FSI_MREFP0  TO_REG(0x020)
+#define FSI_MCENP32 TO_REG(0x024)
+#define FSI_MREFP32 TO_REG(0x024)
+
+#define FSI_MVER    TO_REG(0x074)
+#define FSI_MRESP0  TO_REG(0x0d0)
+
+#define FSI_MRESB0  TO_REG(0x1d0)
+#define   FSI_MRESB0_RESET_GENERAL  BE_BIT(0)
+#define   FSI_MRESB0_RESET_ERROR    BE_BIT(1)
+
+static uint64_t fsi_master_read(void *opaque, hwaddr addr, unsigned size)
+{
+FSIMasterState *s = FSI_MASTER(opaque);
+
+trace_fsi_master_read(addr, size);
+
+if (addr + size > sizeof(s->regs)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds read: 0x%"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return 0;
+}
+
+return s->regs[TO_REG(addr)];
+}
+
+static void fsi_master_write(void *opaque, hwaddr addr, uint64_t data,
+ unsigned size)
+{
+FSIMasterState *s = FSI_MASTER(opaque);
+
+trace_fsi_master_write(addr, size, data);
+
+if (addr + size > sizeof(s->regs)) {
+qemu_log_mask(LOG_GUEST_ERROR,
+  "%s: Out of bounds write: %"HWADDR_PRIx" for %u\n",
+  __func__, addr, size);
+return;
+}
+
+switch (TO_REG(addr)) {
+case FSI_MENP0:
+s->regs[FSI_MENP0] = data;
+break;
+case FSI_MENP32:
+s->regs[FSI_MENP32] = data;
+break;
+case FSI_MSENP0:
+s->regs[FSI_MENP0] |= data;
+break;
+case FSI_MSENP32:
+s->regs[FSI_MENP32] |= data;
+break;
+case FSI_MCENP0:
+s->regs[FSI_MENP0] &= ~data;
+break;
+case FSI_MCENP32:
+s->regs[FSI_MENP32] &= ~data;
+break;
+case FSI_MRESP0:
+/* Perform necessary resets leave register 0 to indicate no errors */
+break;
+case FSI_MRESB0:
+if (data & FSI_MRESB0_RESET_GENERAL) {
+

[PATCH v7 03/10] hw/fsi: Introduce IBM's cfam,fsi-slave

2023-10-26 Thread Ninad Palsule

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

The Common FRU Access Macro (CFAM), an address space containing
various "engines" that drive accesses on busses internal and external
to the POWER chip. Examples include the SBEFIFO and I2C masters. The
engines hang off of an internal Local Bus (LBUS) which is described
by the CFAM configuration block.

The FSI slave: The slave is the terminal point of the FSI bus for
FSI symbols addressed to it. Slaves can be cascaded off of one
another. The slave's configuration registers appear in address space
of the CFAM to which it is attached.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated Joel's review comments.
v3:
- Incorporated Thomas Huth's review comments.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric & Daniel
---
 include/hw/fsi/cfam.h  |  34 
 include/hw/fsi/fsi-slave.h |  29 +++
 include/hw/fsi/fsi.h   |  20 +
 hw/fsi/cfam.c  | 173 +
 hw/fsi/fsi-slave.c |  78 +
 hw/fsi/Kconfig |   9 ++
 hw/fsi/meson.build |   2 +
 hw/fsi/trace-events|   7 ++
 8 files changed, 352 insertions(+)
 create mode 100644 include/hw/fsi/cfam.h
 create mode 100644 include/hw/fsi/fsi-slave.h
 create mode 100644 hw/fsi/cfam.c
 create mode 100644 hw/fsi/fsi-slave.c

diff --git a/include/hw/fsi/cfam.h b/include/hw/fsi/cfam.h
new file mode 100644
index 00..842a3bad0c
--- /dev/null
+++ b/include/hw/fsi/cfam.h
@@ -0,0 +1,34 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Common FRU Access Macro
+ */
+#ifndef FSI_CFAM_H
+#define FSI_CFAM_H
+
+#include "exec/memory.h"
+
+#include "hw/fsi/fsi-slave.h"
+#include "hw/fsi/lbus.h"
+
+#define TYPE_FSI_CFAM "cfam"
+#define FSI_CFAM(obj) OBJECT_CHECK(FSICFAMState, (obj), TYPE_FSI_CFAM)
+
+/* P9-ism */
+#define CFAM_CONFIG_NR_REGS 0x28
+
+typedef struct FSICFAMState {
+/* < private > */
+FSISlaveState parent;
+
+/* CFAM config address space */
+MemoryRegion config_iomem;
+
+MemoryRegion mr;
+AddressSpace as;
+
+FSILBus lbus;
+} FSICFAMState;
+
+#endif /* FSI_CFAM_H */
diff --git a/include/hw/fsi/fsi-slave.h b/include/hw/fsi/fsi-slave.h
new file mode 100644
index 00..f5f23f4457
--- /dev/null
+++ b/include/hw/fsi/fsi-slave.h
@@ -0,0 +1,29 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Flexible Service Interface slave
+ */
+#ifndef FSI_FSI_SLAVE_H
+#define FSI_FSI_SLAVE_H
+
+#include "exec/memory.h"
+#include "hw/qdev-core.h"
+
+#include "hw/fsi/lbus.h"
+
+#include 
+
+#define TYPE_FSI_SLAVE "fsi.slave"
+OBJECT_DECLARE_SIMPLE_TYPE(FSISlaveState, FSI_SLAVE)
+
+#define FSI_SLAVE_CONTROL_NR_REGS ((0x40 >> 2) + 1)
+
+typedef struct FSISlaveState {
+DeviceState parent;
+
+MemoryRegion iomem;
+uint32_t regs[FSI_SLAVE_CONTROL_NR_REGS];
+} FSISlaveState;
+
+#endif /* FSI_FSI_SLAVE_H */
diff --git a/include/hw/fsi/fsi.h b/include/hw/fsi/fsi.h
index b08b97f62b..3cbc685226 100644
--- a/include/hw/fsi/fsi.h
+++ b/include/hw/fsi/fsi.h
@@ -8,9 +8,29 @@
 #define FSI_FSI_H
 
 #include "qemu/bitops.h"
+#include "hw/qdev-core.h"
+
+/*
+ * TODO: Maybe unwind this dependency with const links? Store a
+ * pointer in FSIBus?
+ */
+#include "hw/fsi/cfam.h"
 
 /* Bitwise operations at the word level. */
 #define BE_BIT(x)   BIT(31 - (x))
 #define BE_GENMASK(hb, lb)  MAKE_64BIT_MASK((lb), ((hb) - (lb) + 1))
 
+#define TYPE_FSI_BUS "fsi.bus"
+OBJECT_DECLARE_SIMPLE_TYPE(FSIBus, FSI_BUS)
+
+/* TODO: Figure out what's best with a point-to-point bus */
+typedef struct FSISlaveState FSISlaveState;
+
+typedef struct FSIBus {
+BusState bus;
+
+/* XXX: It's point-to-point, just instantiate the slave directly for now */
+FSICFAMState slave;
+} FSIBus;
+
 #endif
diff --git a/hw/fsi/cfam.c b/hw/fsi/cfam.c
new file mode 100644
index 00..a1c037925f
--- /dev/null
+++ b/hw/fsi/cfam.c
@@ -0,0 +1,173 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Common FRU Access Macro
+ */
+
+#include "qemu/osdep.h"
+
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/fsi/cfam.h"
+#include "hw/fsi/fsi.h"
+#include "hw/fsi/engine-scratchpad.h"
+
+#include "hw/qdev-properties.h"
+
+#define TO_REG(x)  ((x) >> 2)
+
+#define CFAM_ENGINE_CONFIG  TO_REG(0x04)
+
+#define CFAM_CONFIG_CHIP_ID    TO_REG(0x00)
+#define CFAM_CONFIG_CHIP_ID_P9 0xc0022d15
+#define CFAM_CONFIG_CHIP_ID_BREAK  0xc0de
+
+static uint64_t fsi_cfam_config_read(void *opaque, hwaddr addr, unsigned size)
+{
+FSICFAMState *cfam = FSI_CFAM(opaque);
+BusChild *kid;
+int i;
+
+trace_fsi_cfam_config_read(addr, size);
+
+switch (addr) {
+case 0x00:
+return CFAM_CONFIG_CHIP_ID_P9;
+

Re: Replace calls to functions named cpu_physical_memory_* with address_space_*.

2023-10-26 Thread Peter Maydell

On Thu, 26 Oct 2023 at 13:48, Tanmay  wrote:
> I'm really interested in contributing to qemu. I wanted to
> work on renaming the API calls cpu_physical_memory_* to
> address_space_*. I couldn't find any related issues on the
> GitLab tracker. Can I work on this issue?

You're welcome to, but be aware that this is unfortunately
one of the items in the "BiteSizedTasks" list that is
not as simple as the one-line description makes it sound.
(I have a personal project to try to go through that page and
either expand entries into issues in gitlab that describe the
task in more detail, or else delete them if they don't really
seem to be "bite sized". But I haven't got very far with it yet,
so there are still quite a few unhelpful "landmine" tasks on it.
Sorry about that :-(  )

It also is something where the right thing to do is going to
depend on the call-site and what that particular device or piece
of code is trying to do -- it is not a mechanical conversion.
(This is partly why the conversion is not yet complete.)

Most of the devices which use these functions should indeed
use address_space_* functions instead, but the question then
is "what address space should they access?". That usually ought
to be one passed into them by the board code. (commit 112a829f8f0a
is an example of that kind of conversion.) Unfortunately many
of the remaining uses of cpu_physical_memory_* in hw/ are
in very old code which hasn't even been converted to the
kind of new device model coding style that would allow you to
provide an address space by a QOM property that way. So for
those devices this would be just one of a whole pile of
"modernizations" and refactorings that need to be done.

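To make the "address space passed in by the board code" idea concrete, here
is a deliberately toy sketch in plain C. It does not use any QEMU API: the
ToyAddressSpace and ToyDevice types and the toy_* helpers are hypothetical
stand-ins, and a real conversion would use a QOM link property and the
address_space_* functions instead.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Toy stand-in for an address space: a flat byte array, nothing more. */
typedef struct ToyAddressSpace {
    uint8_t *bytes;
    size_t size;
} ToyAddressSpace;

/* Toy device: it only ever touches the address space it was handed. */
typedef struct ToyDevice {
    ToyAddressSpace *dma_as;   /* wired up by the "board" before use */
} ToyDevice;

/* "Board code": decide which address space this device gets to see. */
static void toy_board_connect(ToyDevice *dev, ToyAddressSpace *as)
{
    dev->dma_as = as;
}

/*
 * Device-side read. Returns 0 on success, or -1 if the device has not been
 * wired to an address space or the access falls outside of it.
 */
static int toy_device_dma_read(const ToyDevice *dev, size_t addr,
                               void *buf, size_t len)
{
    const ToyAddressSpace *as = dev->dma_as;

    if (as == NULL || addr > as->size || len > as->size - addr) {
        return -1;
    }
    memcpy(buf, as->bytes + addr, len);
    return 0;
}
```

The device never reaches for a global "system memory"; pointing it at a
different address space is purely a board-level decision, which is the
property the conversion is after.
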
I think what I would suggest is that rather than starting
with this task in general, that you start with what part
of QEMU you're interested in working on in particular (eg
whether you're interested in a particular target architecture
or a particular subsystem like migration, etc), and then
we can probably find some tasks that relate to that specific
interest and help in starting to understand that part of the
code. (QEMU as a whole is too big for anybody to understand
all of it...) If what you want to work on turns out to
involve one of the bits of code which needs this API upgrade,
maybe we can help you work on that; but it might turn out that
the two don't overlap at all, or that there's a better starting
task.

thanks
-- PMM

Re: [PATCH] target/i386/monitor: synchronize cpu before printing lapic state

2023-10-26 Thread Dongli Zhang

Hi David,

On 10/26/23 08:39, David Woodhouse wrote:
> From: David Woodhouse 
> 
> Where the local APIC is emulated by KVM, we need kvm_get_apic() to pull
> the current state into userspace before it's printed. Otherwise we get
> stale values.
> 
> Signed-off-by: David Woodhouse 
> ---
>  target/i386/monitor.c | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/target/i386/monitor.c b/target/i386/monitor.c
> index 6512846327..0754d699ba 100644
> --- a/target/i386/monitor.c
> +++ b/target/i386/monitor.c
> @@ -29,6 +29,7 @@
>  #include "monitor/hmp.h"
>  #include "qapi/qmp/qdict.h"
>  #include "sysemu/kvm.h"
> +#include "sysemu/hw_accel.h"
>  #include "qapi/error.h"
>  #include "qapi/qapi-commands-misc-target.h"
>  #include "qapi/qapi-commands-misc.h"
> @@ -655,6 +656,7 @@ void hmp_info_local_apic(Monitor *mon, const QDict *qdict)
>  if (qdict_haskey(qdict, "apic-id")) {
>  int id = qdict_get_try_int(qdict, "apic-id", 0);
>  cs = cpu_by_arch_id(id);
> +cpu_synchronize_state(cs);

AFAIR, cs may be NULL here; I ran into that case when I sent a similar
bugfix a long time ago.

https://lore.kernel.org/qemu-devel/20210701214051.1588-1-dongli.zh...@oracle.com/

... and the resend:

https://lore.kernel.org/qemu-devel/20210908143803.29191-1-dongli.zh...@oracle.com/

... and resent by Daniel as part of another patchset (after review):

https://lore.kernel.org/qemu-devel/20211028155457.967291-19-berra...@redhat.com/
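
A minimal sketch of the kind of guard I mean, using toy stand-ins rather than
the real QEMU functions (ToyCPU and the toy_* helpers are hypothetical and
exist only for this example):

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for a CPU object. */
typedef struct ToyCPU {
    int arch_id;
    int synced;
} ToyCPU;

/* Lookup by ID; returns NULL when no CPU has that ID. */
static ToyCPU *toy_cpu_by_arch_id(ToyCPU *cpus, size_t n, int id)
{
    for (size_t i = 0; i < n; i++) {
        if (cpus[i].arch_id == id) {
            return &cpus[i];
        }
    }
    return NULL;
}

/* Dereferences its argument, so it must never be called with NULL. */
static void toy_synchronize_state(ToyCPU *cpu)
{
    cpu->synced = 1;
}

/* Look the CPU up and only synchronize it if the lookup succeeded. */
static ToyCPU *toy_lookup_and_sync(ToyCPU *cpus, size_t n, int id)
{
    ToyCPU *cs = toy_cpu_by_arch_id(cpus, n, id);

    if (cs != NULL) {
        toy_synchronize_state(cs);
    }
    return cs;
}
```

With an unknown ID the lookup returns NULL and the synchronize step is
skipped, so the caller can report the error instead of crashing.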


This utility is helpful for diagnosing lost-interrupt issues.

Dongli Zhang

>  } else {
>  cs = mon_get_cpu(mon);
>  }

Re: [PATCH v3 28/28] docs: update Xen-on-KVM documentation

2023-10-26 Thread David Woodhouse

On Thu, 2023-10-26 at 10:25 +0100, David Woodhouse wrote:
> 
> > So it would have been entirely possible to use -initrd 'bzImage
> > console=hvc0 root=/dev/xvda1' if Xen worked like that.
> 
> Xen does allow that too. I didn't realise our multiboot loader did though.
> 
> So yes, you *can* use  -initrd 'bzImage root=/dev/xvda1'. 
> 
> And you can even load more than one module, it seems. Separate them by
> commas, so -initrd 'bzImage,initrd.img' should work.
> 
> You can even do both at the same time. If you have commas on the kernel
> command line, *double* them:
> 
>  -initrd 'bzImage root=/dev/xvda earlyprintk=xen,,keep,initrd.img'
> 
> I'll update the documentation accordingly.

https://git.infradead.org/users/dwmw2/qemu.git/commitdiff/0b13c0ae39b


+Booting Xen PV guests
+---------------------
+
+Booting PV guest kernels is possible by using the Xen PV shim (a version of Xen
+itself, designed to run inside a Xen HVM guest and provide memory management
+services for one guest alone).
+
+The Xen binary is provided as the ``-kernel`` and the guest kernel itself (or
+PV Grub image) as the ``-initrd`` image, which actually just means the first
+multiboot "module". For example:
+
+.. parsed-literal::
+
+  |qemu_system| --accel kvm,xen-version=0x40011,kernel-irqchip=split \\
+   -chardev stdio,id=char0 -device xen-console,chardev=char0 \\
+   -display none  -m 1G  -kernel xen -initrd bzImage \\
+   -append "pv-shim console=xen,pv -- console=hvc0 root=/dev/xvda1" \\
+   -drive file=${GUEST_IMAGE},if=xen
+
+The Xen image must be built with the ``CONFIG_XEN_GUEST`` and ``CONFIG_PV_SHIM``
+options, and as of Xen 4.17, Xen's PV shim mode does not support using a serial
+port; it must have a Xen console or it will panic.
+
+The example above provides the guest kernel command line after a separator
+(" ``--`` ") on the Xen command line, and does not provide the guest kernel
+with an actual initramfs, which would need to be listed as a second multiboot
+module. For more complicated alternatives, see the
+:ref:`documentation ` for the ``-initrd`` option.
+


I also fixed up the -initrd documentation so that it actually mentions
how to quote commas, using a Xen PV launch as an example:

 ``-initrd "file1 arg=foo,file2"``
 This syntax is only available with multiboot.
 
-Use file1 and file2 as modules and pass arg=foo as parameter to the
-first module.
+Use file1 and file2 as modules and pass ``arg=foo`` as parameter to the
+first module. Commas can be provided in module parameters by doubling
+them on the command line to escape them:
+
+``-initrd "bzImage earlyprintk=xen,,keep root=/dev/xvda1,initrd.img"``
+Multiboot only. Use bzImage as the first module with
+"``earlyprintk=xen,keep root=/dev/xvda1``" as its command line,
+and initrd.img as the second module.
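
For anyone who wants to see the escaping rule spelled out, here is a small
standalone sketch. It is not QEMU's multiboot parser; toy_next_module is a
hypothetical helper written only to show how a single comma separates
modules while a doubled comma yields a literal one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the next module spec from *pp into out (at most outsz - 1 bytes),
 * treating ",," as a literal comma and a single "," as the separator.
 * Advances *pp past the consumed text. Returns 1 if a spec was produced,
 * 0 once the input is exhausted.
 */
static int toy_next_module(const char **pp, char *out, size_t outsz)
{
    const char *p = *pp;
    size_t n = 0;

    assert(outsz > 0);
    if (p == NULL || *p == '\0') {
        return 0;
    }
    while (*p != '\0') {
        if (*p == ',') {
            p++;
            if (*p != ',') {
                break;          /* single comma: end of this module */
            }
            /* doubled comma: fall through and copy the second one */
        }
        if (n + 1 < outsz) {
            out[n++] = *p;
        }
        p++;
    }
    out[n] = '\0';
    *pp = p;
    return 1;
}
```

Fed the string from the example above, the first call yields
"bzImage earlyprintk=xen,keep root=/dev/xvda1" and the second yields
"initrd.img".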



Re: Replace calls to functions named cpu_physical_memory_* with address_space_*.

2023-10-26 Thread Tanmay

Hi,

The above refactoring of functions was mentioned under API conversion at
https://wiki.qemu.org/Contribute/BiteSizedTasks .

Thanks,
Tanmay Patil

On Thu, 26 Oct 2023 at 15:54, Tanmay  wrote:

> Hi,
>
> I'm really interested in contributing to qemu. I wanted to work on
> renaming the API calls cpu_physical_memory_* to address_space_*. I couldn't
> find any related issues on the GitLab tracker. Can I work on this issue?
>
> Thanks,
> Tanmay Patil
>

Re: [QEMU][PATCHv2 0/8] Xen: support grant mappings.

2023-10-26 Thread David Woodhouse

On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> Hi,
> This patch series adds support for grant mappings as a pseudo RAM region for
> Xen.
> 
> The patches enabling grant mappings (the first 6) were written by Juergen in
> 2021.
> 
> The QEMU virtio device provides emulated backends for virtio frontend devices
> in Xen.
> Please set the "iommu_platform=on" option when invoking QEMU. This sets the
> VIRTIO_F_ACCESS_PLATFORM feature, which the virtio frontend in Xen uses
> to know whether the backend supports grants or not.

I don't really understand what's going on here. The subject of the
cover letter certainly doesn't help me, because we *already* support
grant mappings under Xen, don't we?

I found
https://static.linaro.org/connect/lvc21/presentations/lvc21-314.pdf but
I think it's a bit out of date; the decision about how to handle grant
mappings for virtio devices is still 'TBD'.

Can you talk me through the process of what happens when a guest wants
a virtio device to initiate 'DMA' to one of its pages? I assume it
starts by creating a grant mapping, and then taking the gntref and...
then what?

I don't see any changes to the virtio devices themselves in this
series; are we doing something that will make it work by magic? If so,
it might be useful to explain that magic...


Re: [PATCH 13/29] tcg/i386: Support TCG_COND_TST{EQ,NE}

2023-10-26 Thread Richard Henderson


On 10/26/23 04:29, Paolo Bonzini wrote:

On 10/26/23 02:14, Richard Henderson wrote:

Signed-off-by: Richard Henderson 
---


Also, a TST{EQ,NE} with a one-bit immediate argument can be changed to:

- a TEST reg, reg + js/jns (or sets/setns, or cmovs/cmovns) when testing
  bits 7, 15 or 31

- a BT reg, imm + jc/jnc (or setc/setnc, or cmovc/cmovnc) when testing other
  bits in the 8..63 range.
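
The selection rule in the two bullets can be sketched as a small classifier.
This is illustrative only and is not code from tcg/i386: the enum and
toy_pick_strategy are hypothetical names, and falling back to a plain TEST
with an immediate for bits 0..6 and for multi-bit masks is an assumption.

```c
#include <assert.h>
#include <stdint.h>

typedef enum {
    TOY_TEST_IMM,   /* generic: TEST reg, imm + jz/jnz (assumed fallback) */
    TOY_TEST_SIGN,  /* TEST reg, reg at the matching width + js/jns */
    TOY_BT_IMM      /* BT reg, imm + jc/jnc */
} ToyTestStrategy;

/*
 * Pick a strategy for testing 'mask'. On return *bit holds the index of
 * the single set bit, or -1 if the mask is not a one-bit immediate.
 */
static ToyTestStrategy toy_pick_strategy(uint64_t mask, int *bit)
{
    int b = 0;

    /* Zero or more than one bit set: not a one-bit test. */
    if (mask == 0 || (mask & (mask - 1)) != 0) {
        *bit = -1;
        return TOY_TEST_IMM;
    }
    while ((mask & 1) == 0) {
        mask >>= 1;
        b++;
    }
    *bit = b;

    /* Sign bit of an 8-, 16- or 32-bit operand: the sign flag does the job. */
    if (b == 7 || b == 15 || b == 31) {
        return TOY_TEST_SIGN;
    }
    /* Any other bit in the 8..63 range: BT moves it into the carry flag. */
    if (b >= 8) {
        return TOY_BT_IMM;
    }
    return TOY_TEST_IMM;
}
```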


I will take a look at using this to get rid of the mask field in CCPrepare,
but I would not mind if someone else took a look at these code generation
optimizations in tcg/i386.


I thought about that while working on this series, and it is part of the
reason why tcg_out_cmp now returns a JCC_* value rather than having the
caller look it up.


I thought I'd start simpler before adding these optimizations.


r~

Re: [PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Joao Martins




On 26/10/2023 16:53, Peter Xu wrote:
> This small series (actually only the last patch; first two are cleanups)
> wants to improve ability of QEMU downtime analysis similarly to what Joao
> used to propose here:
> 
>   https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com
> 
Thanks for following up on the idea; it's been hard to have enough bandwidth for
everything over the past few weeks :(

> But with a few differences:
> 
>   - Nothing exported yet to qapi, all tracepoints so far
> 
>   - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
> finer granule by providing downtime measurements for each vmstate (I
> made microsecond to be the unit to be accurate).  So far it seems
> iterable / non-iterable is the core of the problem, and I want to nail
> it to per-device.
> 
>   - Trace dest QEMU too
> 
> For the last bullet: consider the case where a device save() can be super
> fast, while load() can actually be super slow.  Both of them will
> contribute to the ultimate downtime, but not a simple summary: when src
> QEMU is save()ing on device1, dst QEMU can be load()ing on device2.  So
> they can run in parallel.  However the only way to figure all components of
> the downtime is to record both.
> 
> Please have a look, thanks.
>

I like your series, as it allows a user to pinpoint one particular bad device,
while covering the load side too. The migration checkpoints, on the other hand,
were useful (while also a bit ugly) for the big picture of how downtime
breaks down. Perhaps we could add those /also/ as tracepoints, without
specifically committing to exposing them in QAPI.

More fundamentally, how can one capture the 'stop' part? There's also time spent
there, e.g. quiescing/stopping vhost-net workers, or suspending the VF
device. All of that is likely as bad as what those tracepoints cover for the
device-state/RAM-related stuff (iterable and non-iterable portions).


> Peter Xu (3):
>   migration: Set downtime_start even for postcopy
>   migration: Add migration_downtime_start|end() helpers
>   migration: Add per vmstate downtime tracepoints
> 
>  migration/migration.c  | 38 +---
>  migration/savevm.c | 49 ++
>  migration/trace-events |  2 ++
>  3 files changed, 72 insertions(+), 17 deletions(-)
>

[PATCH 1/3] migration: Set downtime_start even for postcopy

2023-10-26 Thread Peter Xu

Postcopy calculates its downtime separately.  It always sets
MigrationState.downtime properly, but not MigrationState.downtime_start.

Make postcopy do the same as other modes on properly recording the
timestamp when the VM is going to be stopped.  Drop the temporary variable
in postcopy_start() along the way.

Signed-off-by: Peter Xu 
---
 migration/migration.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index 67547eb6a1..f8a54ff4d1 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -2121,7 +2121,6 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 int ret;
 QIOChannelBuffer *bioc;
 QEMUFile *fb;
-int64_t time_at_stop = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 uint64_t bandwidth = migrate_max_postcopy_bandwidth();
 bool restart_block = false;
 int cur_state = MIGRATION_STATUS_ACTIVE;
@@ -2143,6 +2142,8 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 qemu_mutex_lock_iothread();
 trace_postcopy_start_set_run();
 
+ms->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
 ret = vm_stop_force_state(RUN_STATE_FINISH_MIGRATE);
@@ -2245,7 +2246,7 @@ static int postcopy_start(MigrationState *ms, Error **errp)
 ms->postcopy_after_devices = true;
 migration_call_notifiers(ms);
 
-ms->downtime =  qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - time_at_stop;
+ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - ms->downtime_start;
 
 qemu_mutex_unlock_iothread();
 
-- 
2.41.0

[PATCH 0/3] migration: Downtime tracepoints

2023-10-26 Thread Peter Xu

This small series (actually only the last patch; the first two are cleanups)
aims to improve QEMU's downtime analysis, similar to what Joao proposed
earlier here:

  https://lore.kernel.org/r/20230926161841.98464-1-joao.m.mart...@oracle.com

But with a few differences:

  - Nothing exported yet to qapi, all tracepoints so far

  - Instead of major checkpoints (stop, iterable, non-iterable, resume-rp),
    finer granularity by providing downtime measurements for each vmstate
    (I made microseconds the unit, for accuracy).  So far it seems
    iterable / non-iterable is the core of the problem, and I want to nail
    it down per device.

  - Trace dest QEMU too

For the last bullet: consider the case where a device save() can be super
fast, while load() can actually be super slow.  Both of them will
contribute to the ultimate downtime, but not as a simple sum: while src
QEMU is save()ing device1, dst QEMU can be load()ing device2, so they can
run in parallel.  The only way to figure out all components of the
downtime is to record both.

Please have a look, thanks.

Peter Xu (3):
  migration: Set downtime_start even for postcopy
  migration: Add migration_downtime_start|end() helpers
  migration: Add per vmstate downtime tracepoints

 migration/migration.c  | 38 +---
 migration/savevm.c | 49 ++
 migration/trace-events |  2 ++
 3 files changed, 72 insertions(+), 17 deletions(-)

-- 
2.41.0
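
To make the last bullet of the cover letter above concrete, here is a minimal sketch (not part of the series) comparing the plain sum of per-device durations with the elapsed window they span. The `Span` type and both function names are hypothetical, and the sketch assumes source and destination timestamps can be placed on one common microsecond clock, which real traces from two hosts do not give you for free.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One traced save() or load() interval, in microseconds (hypothetical type). */
typedef struct {
    const char *what;
    int64_t start_us;
    int64_t end_us;
} Span;

/* Plain sum of all durations; over-counts when intervals overlap. */
int64_t spans_sum_us(const Span *spans, size_t n)
{
    int64_t total = 0;

    for (size_t i = 0; i < n; i++) {
        assert(spans[i].end_us >= spans[i].start_us);
        total += spans[i].end_us - spans[i].start_us;
    }
    return total;
}

/* Elapsed window from the earliest start to the latest end. */
int64_t spans_window_us(const Span *spans, size_t n)
{
    if (n == 0) {
        return 0;
    }

    int64_t first = spans[0].start_us;
    int64_t last = spans[0].end_us;

    for (size_t i = 1; i < n; i++) {
        if (spans[i].start_us < first) {
            first = spans[i].start_us;
        }
        if (spans[i].end_us > last) {
            last = spans[i].end_us;
        }
    }
    return last - first;
}
```

With made-up intervals where the destination loads device1 while the source is still saving device2, the sum comes out larger than the window, which is why neither side's numbers alone describe the downtime.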

[PATCH 3/3] migration: Add per vmstate downtime tracepoints

2023-10-26 Thread Peter Xu

We have a bunch of savevm_section* tracepoints.  They're good for analyzing
the migration stream, but not always suitable for analyzing the migration
downtime.  Two major problems:

  - savevm_section* tracepoints dump all sections, while we only care
    about the sections that contribute to the downtime

  - They don't have an identifier to show the type of section, so there is
    no easy way to filter downtime information either.

We could add the type to those tracepoints, but this patch keeps them
untouched and instead adds a set of downtime-specific tracepoints, so one
can enable "vmstate_downtime*" tracepoints and get a full picture of how
the downtime is distributed across iterative and non-iterative vmstate
save/load.

Note that here both save() and load() need to be traced, because both of
them may contribute to the downtime.  The contribution is not a simple "add
them together", though: consider when the src is doing a save() of device1
while the dest can be load()ing for device2, so they can happen
concurrently.

Tracking both sides makes sense because device load() and save() can be
imbalanced: one device can save() super fast but load() super slow, or vice
versa.  We can't figure that out without tracing both.

Signed-off-by: Peter Xu 
---
 migration/savevm.c | 49 ++
 migration/trace-events |  2 ++
 2 files changed, 47 insertions(+), 4 deletions(-)

diff --git a/migration/savevm.c b/migration/savevm.c
index 8622f229e5..cd6d6ba493 100644
--- a/migration/savevm.c
+++ b/migration/savevm.c
@@ -1459,6 +1459,7 @@ void qemu_savevm_state_complete_postcopy(QEMUFile *f)
 static
 int qemu_savevm_state_complete_precopy_iterable(QEMUFile *f, bool in_postcopy)
 {
+int64_t start_ts_each, end_ts_each;
 SaveStateEntry *se;
 int ret;
 
@@ -1475,6 +1476,8 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile 
*f, bool in_postcopy)
 continue;
 }
 }
+
+start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
 trace_savevm_section_start(se->idstr, se->section_id);
 
 save_section_header(f, se, QEMU_VM_SECTION_END);
@@ -1486,6 +1489,9 @@ int qemu_savevm_state_complete_precopy_iterable(QEMUFile 
*f, bool in_postcopy)
 qemu_file_set_error(f, ret);
 return -1;
 }
+end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+trace_vmstate_downtime_save("iterable", se->idstr, se->instance_id,
+end_ts_each - start_ts_each);
 }
 
 return 0;
@@ -1496,6 +1502,7 @@ int 
qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 bool inactivate_disks)
 {
 MigrationState *ms = migrate_get_current();
+int64_t start_ts_each, end_ts_each;
 JSONWriter *vmdesc = ms->vmdesc;
 int vmdesc_len;
 SaveStateEntry *se;
@@ -1507,11 +1514,17 @@ int 
qemu_savevm_state_complete_precopy_non_iterable(QEMUFile *f,
 continue;
 }
 
+start_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+
 ret = vmstate_save(f, se, vmdesc);
 if (ret) {
 qemu_file_set_error(f, ret);
 return ret;
 }
+
+end_ts_each = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+trace_vmstate_downtime_save("non-iterable", se->idstr, se->instance_id,
+end_ts_each - start_ts_each);
 }
 
 if (inactivate_disks) {
@@ -2506,9 +2519,12 @@ static bool check_section_footer(QEMUFile *f, 
SaveStateEntry *se)
 }
 
 static int
-qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis)
+qemu_loadvm_section_start_full(QEMUFile *f, MigrationIncomingState *mis,
+   uint8_t type)
 {
+bool trace_downtime = (type == QEMU_VM_SECTION_FULL);
 uint32_t instance_id, version_id, section_id;
+int64_t start_ts, end_ts;
 SaveStateEntry *se;
 char idstr[256];
 int ret;
@@ -2557,12 +2573,23 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 return -EINVAL;
 }
 
+if (trace_downtime) {
+start_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+}
+
 ret = vmstate_load(f, se);
 if (ret < 0) {
 error_report("error while loading state for instance 0x%"PRIx32" of"
  " device '%s'", instance_id, idstr);
 return ret;
 }
+
+if (trace_downtime) {
+end_ts = qemu_clock_get_us(QEMU_CLOCK_REALTIME);
+trace_vmstate_downtime_load("non-iterable", se->idstr,
+se->instance_id, end_ts - start_ts);
+}
+
 if (!check_section_footer(f, se)) {
 return -EINVAL;
 }
@@ -2571,8 +2598,11 @@ qemu_loadvm_section_start_full(QEMUFile *f, 
MigrationIncomingState *mis)
 }
 
 static int
-qemu_loadvm_section_part_end(QEMUFile *f, MigrationIncomingState *mis)

[PATCH 2/3] migration: Add migration_downtime_start|end() helpers

2023-10-26 Thread Peter Xu

Unify the three users on recording downtimes with the same pair of helpers.

Signed-off-by: Peter Xu 
---
 migration/migration.c | 37 -
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/migration/migration.c b/migration/migration.c
index f8a54ff4d1..70d775942a 100644
--- a/migration/migration.c
+++ b/migration/migration.c
@@ -101,6 +101,24 @@ static int migration_maybe_pause(MigrationState *s,
 static void migrate_fd_cancel(MigrationState *s);
 static int close_return_path_on_source(MigrationState *s);
 
+static void migration_downtime_start(MigrationState *s)
+{
+s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+}
+
+static void migration_downtime_end(MigrationState *s)
+{
+int64_t now = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+
+/*
+ * If downtime already set, should mean that postcopy already set it,
+ * then that should be the real downtime already.
+ */
+if (!s->downtime) {
+s->downtime = now - s->downtime_start;
+}
+}
+
 static bool migration_needs_multiple_sockets(void)
 {
 return migrate_multifd() || migrate_postcopy_preempt();
@@ -2142,7 +2160,7 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 qemu_mutex_lock_iothread();
 trace_postcopy_start_set_run();
 
-ms->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+migration_downtime_start(ms);
 
 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 global_state_store();
@@ -2246,7 +2264,7 @@ static int postcopy_start(MigrationState *ms, Error 
**errp)
 ms->postcopy_after_devices = true;
 migration_call_notifiers(ms);
 
-ms->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - ms->downtime_start;
+migration_downtime_end(ms);
 
 qemu_mutex_unlock_iothread();
 
@@ -2342,7 +2360,7 @@ static int migration_completion_precopy(MigrationState *s,
 int ret;
 
 qemu_mutex_lock_iothread();
-s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+migration_downtime_start(s);
 qemu_system_wakeup_request(QEMU_WAKEUP_REASON_OTHER, NULL);
 
 s->vm_old_state = runstate_get();
@@ -2699,15 +2717,8 @@ static void migration_calculate_complete(MigrationState 
*s)
 int64_t end_time = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
 int64_t transfer_time;
 
+migration_downtime_end(s);
 s->total_time = end_time - s->start_time;
-if (!s->downtime) {
-/*
- * It's still not set, so we are precopy migration.  For
- * postcopy, downtime is calculated during postcopy_start().
- */
-s->downtime = end_time - s->downtime_start;
-}
-
 transfer_time = s->total_time - s->setup_time;
 if (transfer_time) {
 s->mbps = ((double) bytes * 8.0) / transfer_time / 1000;
@@ -3126,7 +3137,7 @@ static void bg_migration_vm_start_bh(void *opaque)
 s->vm_start_bh = NULL;
 
 vm_start();
-s->downtime = qemu_clock_get_ms(QEMU_CLOCK_REALTIME) - s->downtime_start;
+migration_downtime_end(s);
 }
 
 /**
@@ -3193,7 +3204,7 @@ static void *bg_migration_thread(void *opaque)
 s->setup_time = qemu_clock_get_ms(QEMU_CLOCK_HOST) - setup_start;
 
 trace_migration_thread_setup_complete();
-s->downtime_start = qemu_clock_get_ms(QEMU_CLOCK_REALTIME);
+migration_downtime_start(s);
 
 qemu_mutex_lock_iothread();
 
-- 
2.41.0
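
The helper pair added in the patch above can be tried outside QEMU with a small stand-alone sketch. `DowntimeState` and the `*_at()` names are hypothetical; the timestamps are passed in explicitly, where the patch reads them with qemu_clock_get_ms(QEMU_CLOCK_REALTIME).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for the two MigrationState fields the helpers touch. */
typedef struct {
    int64_t downtime_start; /* ms timestamp taken when the VM is stopped */
    int64_t downtime;       /* ms; 0 is treated as "not recorded yet" */
} DowntimeState;

/* Record when the downtime window opens. */
void downtime_start_at(DowntimeState *s, int64_t now_ms)
{
    assert(s != NULL);
    s->downtime_start = now_ms;
}

/*
 * Close the window.  Only the first call records a value; later calls
 * leave an already-recorded downtime untouched, mirroring the guard in
 * migration_downtime_end().
 */
void downtime_end_at(DowntimeState *s, int64_t now_ms)
{
    assert(s != NULL);
    if (!s->downtime) {
        s->downtime = now_ms - s->downtime_start;
    }
}
```

One property of this guard, at least in the sketch: a measured downtime of exactly 0 ms is indistinguishable from "not set", so a later end call would overwrite it.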

Re: [QEMU][PATCHv2 1/8] xen: when unplugging emulated devices skip virtio devices

2023-10-26 Thread David Woodhouse

On Wed, 2023-10-25 at 18:23 -0700, Stefano Stabellini wrote:
> On Thu, 26 Oct 2023, David Woodhouse wrote:
> > On Wed, 2023-10-25 at 14:24 -0700, Vikram Garhwal wrote:
> > > From: Juergen Gross 
> > > 
> > > Virtio devices should never be unplugged at boot time, as they are
> > > similar to pci passthrough devices.
> > > 
> > > Signed-off-by: Juergen Gross 
> > > Signed-off-by: Vikram Garhwal 
> > 
> > Hm, do your virtio NICs still actually *work* after that? Or are they
> > all disconnected from their netdev peers? 
> > 
> > I suspect you're going to want a variant of
> > https://lore.kernel.org/qemu-devel/20231025145042.627381-19-dw...@infradead.org/T/#u
> > which also leave the peers of your virtio devices intact?
> 
> Hi David, device unplug is an x86-only thing (see the definition of
> xen_emul_unplug in Linux under arch/x86/xen/platform-pci-unplug.c) I
> suspect Vikram who is working on ARM hasn't tested it.

Ah, I had assumed there was something else coming along later which
would make it actually get used. 

> Vikram, a simple option is to drop this patch if you don't need it.

That works. Although I may revive it in that case. 




[PATCH] target/i386/monitor: synchronize cpu before printing lapic state

2023-10-26 Thread David Woodhouse

From: David Woodhouse 

Where the local APIC is emulated by KVM, we need kvm_get_apic() to pull
the current state into userspace before it's printed. Otherwise we get
stale values.

Signed-off-by: David Woodhouse 
---
 target/i386/monitor.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/i386/monitor.c b/target/i386/monitor.c
index 6512846327..0754d699ba 100644
--- a/target/i386/monitor.c
+++ b/target/i386/monitor.c
@@ -29,6 +29,7 @@
 #include "monitor/hmp.h"
 #include "qapi/qmp/qdict.h"
 #include "sysemu/kvm.h"
+#include "sysemu/hw_accel.h"
 #include "qapi/error.h"
 #include "qapi/qapi-commands-misc-target.h"
 #include "qapi/qapi-commands-misc.h"
@@ -655,6 +656,7 @@ void hmp_info_local_apic(Monitor *mon, const QDict *qdict)
 if (qdict_haskey(qdict, "apic-id")) {
 int id = qdict_get_try_int(qdict, "apic-id", 0);
 cs = cpu_by_arch_id(id);
+cpu_synchronize_state(cs);
 } else {
 cs = mon_get_cpu(mon);
 }
-- 
2.34.1





Re: [PULL 20/39] hw/s390x/s390-stattrib: Simplify handling of the "migration-enabled" property

2023-10-26 Thread Thomas Huth


On 24/10/2023 15.12, Juan Quintela wrote:

From: Thomas Huth 

There's no need for dedicated handlers here if they don't do anything
special.

Acked-by: David Hildenbrand 
Reviewed-by: Eric Farman 
Acked-by: Juan Quintela 
Signed-off-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020150554.664422-3-th...@redhat.com>
---
  hw/s390x/s390-stattrib.c | 26 ++
  1 file changed, 6 insertions(+), 20 deletions(-)

diff --git a/hw/s390x/s390-stattrib.c b/hw/s390x/s390-stattrib.c
index 220e845d12..52f9fc036e 100644
--- a/hw/s390x/s390-stattrib.c
+++ b/hw/s390x/s390-stattrib.c
@@ -13,6 +13,7 @@
  #include "qemu/units.h"
  #include "migration/qemu-file.h"
  #include "migration/register.h"
+#include "hw/qdev-properties.h"
  #include "hw/s390x/storage-attributes.h"
  #include "qemu/error-report.h"
  #include "exec/ram_addr.h"
@@ -340,6 +341,10 @@ static void s390_stattrib_realize(DeviceState *dev, Error 
**errp)
  }
  }
  
+static Property s390_stattrib_props[] = {

+DEFINE_PROP_BOOL("migration-enabled", S390StAttribState, 
migration_enabled, true),


This needs a  DEFINE_PROP_END_OF_LIST() here, too ... sorry for that!
/me is looking for his brown paper-bags...

 Thomas


+};
+
  static void s390_stattrib_class_init(ObjectClass *oc, void *data)
  {
  DeviceClass *dc = DEVICE_CLASS(oc);
@@ -347,22 +352,7 @@ static void s390_stattrib_class_init(ObjectClass *oc, void 
*data)
  dc->hotpluggable = false;
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
  dc->realize = s390_stattrib_realize;
-}
-
-static inline bool s390_stattrib_get_migration_enabled(Object *obj,
-   Error **errp)
-{
-S390StAttribState *s = S390_STATTRIB(obj);
-
-return s->migration_enabled;
-}
-
-static inline void s390_stattrib_set_migration_enabled(Object *obj, bool value,
-Error **errp)
-{
-S390StAttribState *s = S390_STATTRIB(obj);
-
-s->migration_enabled = value;
+device_class_set_props(dc, s390_stattrib_props);
  }
  
  static SaveVMHandlers savevm_s390_stattrib_handlers = {

@@ -383,10 +373,6 @@ static void s390_stattrib_instance_init(Object *obj)
   register_savevm_live(TYPE_S390_STATTRIB, 0, 0,
   &savevm_s390_stattrib_handlers, sas);
  
-object_property_add_bool(obj, "migration-enabled",

- s390_stattrib_get_migration_enabled,
- s390_stattrib_set_migration_enabled);
-object_property_set_bool(obj, "migration-enabled", true, NULL);
  sas->migration_cur_gfn = 0;
  }
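
Thomas's inline note above asks for a DEFINE_PROP_END_OF_LIST() entry at the end of the property array. The stand-alone sketch below shows why a terminator matters, under the assumption that the list is walked until an entry with no name is found. `DemoProp`, the `DEMO_PROP_*` macros and `demo_prop_count()` are hypothetical stand-ins, not the qdev API.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical miniature of a property descriptor. */
typedef struct {
    const char *name;     /* NULL marks the end of the list */
    bool default_value;
} DemoProp;

#define DEMO_PROP_BOOL(n, d)    { .name = (n), .default_value = (d) }
#define DEMO_PROP_END_OF_LIST() { .name = NULL }

/* Terminated list: the walker below stops at the sentinel entry. */
const DemoProp demo_props[] = {
    DEMO_PROP_BOOL("migration-enabled", true),
    DEMO_PROP_END_OF_LIST(),
};

/*
 * Count entries by walking until the sentinel.  Without the terminator
 * this loop would read past the end of the array.
 */
size_t demo_prop_count(const DemoProp *list)
{
    size_t n = 0;

    assert(list != NULL);
    while (list[n].name != NULL) {
        n++;
    }
    return n;
}
```

The array length is never passed to the walker, so the sentinel is the only thing telling it where to stop.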

Re: [PATCH v2 0/3] vfio/pci: Fix buffer overrun when writing the VF token

2023-10-26 Thread Peter Maydell

On Thu, 26 Oct 2023 at 16:06, Konstantin Ryabitsev
 wrote:
>
> October 26, 2023 at 5:58 AM, "Cédric Le Goater"  wrote:
> > >  Reviwed-by: Denis V. Lunev 
> > >
> >
> > I changed that to "Reviewed-by".
> >
> > Interesting to see that b4 was ok with this new tag.
>
> When we see an email address in the trailer contents, we don't check it 
> against a known-trailers list, because there are just too many things like 
> "Co-developed-by", "Reviewed-and-acked-by", etc. We could add some kind of 
> logic to break these apart and compare individual parts to a list of known 
> person-trailers (e.g. ["co", "reviewed", "developed", "and", "by", ...]), but 
> we currently don't, which is why typos like this one sneak through.

From the QEMU development perspective, I would be mildly in favour
of having checkpatch at least warn about unusual trailers, because
I don't think the profusion of oddball stuff is actually helpful,
and nudging towards standardization and guarding against typos in
the tag string would be better (eg if you want to do both a review
and an ack, provide both tags, not an -and- tag that no tooling
that might be asking "did this get review?" will be looking for).
But I don't care enough to have actually looked at getting our
checkpatch to do this :-)

-- PMM
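
The part-by-part check Konstantin describes above could look roughly like the following sketch. It is neither b4 nor checkpatch code. The first five list entries come from his message; the remaining ones, and both function names, are assumptions added for illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

static const char *const known_parts[] = {
    "co", "reviewed", "developed", "and", "by", /* quoted in the thread */
    "acked", "signed", "off", "tested",         /* assumed additions */
};

/* Case-insensitive match of one dash-separated part against the list. */
static bool part_is_known(const char *part, size_t len)
{
    size_t count = sizeof(known_parts) / sizeof(known_parts[0]);

    for (size_t i = 0; i < count; i++) {
        const char *k = known_parts[i];
        size_t j;

        if (strlen(k) != len) {
            continue;
        }
        for (j = 0; j < len; j++) {
            if (tolower((unsigned char)part[j]) != k[j]) {
                break;
            }
        }
        if (j == len) {
            return true;
        }
    }
    return false;
}

/* True if every dash-separated part of the tag is a known word. */
bool trailer_tag_is_known(const char *tag)
{
    const char *p = tag;

    assert(tag != NULL);
    for (;;) {
        const char *dash = strchr(p, '-');
        size_t len = dash ? (size_t)(dash - p) : strlen(p);

        if (len == 0 || !part_is_known(p, len)) {
            return false;
        }
        if (!dash) {
            return true;
        }
        p = dash + 1;
    }
}
```

This catches "Reviwed-by" but still accepts "Reviewed-and-acked-by". The stricter behaviour Peter suggests, warning on anything outside a fixed set of whole tags, would compare the full tag string instead.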

Re: [PULL 19/39] hw/s390x/s390-skeys: Don't call register_savevm_live() during instance_init()

2023-10-26 Thread Thomas Huth


On 24/10/2023 15.12, Juan Quintela wrote:

From: Thomas Huth 

Since the instance_init() function immediately tries to set the
property to "true", the s390_skeys_set_migration_enabled() tries
to register a savevm handler during instance_init(). However,
instance_init() functions can be called multiple times, e.g. for
introspection of devices. That means multiple instances of devices
can be created during runtime (which is fine as long as they all
don't get realized, too), so the "Prevent double registration of
savevm handler" check in the s390_skeys_set_migration_enabled()
function does not work at all as expected (since there could be
more than one instance).

Thus we must not call register_savevm_live() from an instance_init()
function at all. Move this to the realize() function instead. This
way we can also get rid of the property getter and setter functions
completely, simplifying the code along the way quite a bit.

Acked-by: David Hildenbrand 
Reviewed-by: Eric Farman 
Acked-by: Juan Quintela 
Signed-off-by: Thomas Huth 
Signed-off-by: Juan Quintela 
Message-ID: <20231020150554.664422-2-th...@redhat.com>
---
  hw/s390x/s390-skeys.c | 35 ---
  1 file changed, 8 insertions(+), 27 deletions(-)

diff --git a/hw/s390x/s390-skeys.c b/hw/s390x/s390-skeys.c
index 5024faf411..8e9d9e41e8 100644
--- a/hw/s390x/s390-skeys.c
+++ b/hw/s390x/s390-skeys.c
@@ -12,6 +12,7 @@
  #include "qemu/osdep.h"
  #include "qemu/units.h"
  #include "hw/boards.h"
+#include "hw/qdev-properties.h"
  #include "hw/s390x/storage-keys.h"
  #include "qapi/error.h"
  #include "qapi/qapi-commands-misc-target.h"
@@ -432,58 +433,38 @@ static int s390_storage_keys_load(QEMUFile *f, void 
*opaque, int version_id)
  return ret;
  }
  
-static inline bool s390_skeys_get_migration_enabled(Object *obj, Error **errp)

-{
-S390SKeysState *ss = S390_SKEYS(obj);
-
-return ss->migration_enabled;
-}
-
  static SaveVMHandlers savevm_s390_storage_keys = {
  .save_state = s390_storage_keys_save,
  .load_state = s390_storage_keys_load,
  };
  
-static inline void s390_skeys_set_migration_enabled(Object *obj, bool value,

-Error **errp)
+static void s390_skeys_realize(DeviceState *dev, Error **errp)
  {
-S390SKeysState *ss = S390_SKEYS(obj);
-
-/* Prevent double registration of savevm handler */
-if (ss->migration_enabled == value) {
-return;
-}
-
-ss->migration_enabled = value;
+S390SKeysState *ss = S390_SKEYS(dev);
  
  if (ss->migration_enabled) {

  register_savevm_live(TYPE_S390_SKEYS, 0, 1,
   &savevm_s390_storage_keys, ss);
-} else {
-unregister_savevm(VMSTATE_IF(ss), TYPE_S390_SKEYS, ss);
  }
  }
  
-static void s390_skeys_instance_init(Object *obj)

-{
-object_property_add_bool(obj, "migration-enabled",
- s390_skeys_get_migration_enabled,
- s390_skeys_set_migration_enabled);
-object_property_set_bool(obj, "migration-enabled", true, NULL);
-}
+static Property s390_skeys_props[] = {
+DEFINE_PROP_BOOL("migration-enabled", S390SKeysState, migration_enabled, 
true),


This needs a  DEFINE_PROP_END_OF_LIST() here ... mea culpa!

 Thomas



+};
  
  static void s390_skeys_class_init(ObjectClass *oc, void *data)

  {
  DeviceClass *dc = DEVICE_CLASS(oc);
  
  dc->hotpluggable = false;

+dc->realize = s390_skeys_realize;
+device_class_set_props(dc, s390_skeys_props);
  set_bit(DEVICE_CATEGORY_MISC, dc->categories);
  }
  
  static const TypeInfo s390_skeys_info = {

  .name  = TYPE_S390_SKEYS,
  .parent= TYPE_DEVICE,
-.instance_init = s390_skeys_instance_init,
  .instance_size = sizeof(S390SKeysState),
  .class_init= s390_skeys_class_init,
  .class_size= sizeof(S390SKeysClass),

Re: [PATCH v6 08/10] hw/fsi: Added qtest

2023-10-26 Thread Ninad Palsule


Hello Cedric,

On 10/24/23 02:34, Cédric Le Goater wrote:

On 10/21/23 23:17, Ninad Palsule wrote:

Added basic qtests for FSI model.

Signed-off-by: Ninad Palsule 
---
v3:
  - Added new qtest as per Cedric's comment.
V4:
  - Remove MAINTAINER and documentation changes from this commit
v6:
  - Incorporated review comments by Thomas Huth.
---
  tests/qtest/fsi-test.c  | 207 


please rename the file to aspeed-fsi-test.c


Renamed it.

Thanks for the review.

Regards,

Ninad



Thanks,

C.



  tests/qtest/meson.build |   1 +
  2 files changed, 208 insertions(+)
  create mode 100644 tests/qtest/fsi-test.c

diff --git a/tests/qtest/fsi-test.c b/tests/qtest/fsi-test.c
new file mode 100644
index 00..01a0739092
--- /dev/null
+++ b/tests/qtest/fsi-test.c
@@ -0,0 +1,207 @@
+/*
+ * QTest testcases for IBM's Flexible Service Interface (FSI)
+ *
+ * Copyright (c) 2023 IBM Corporation
+ *
+ * Authors:
+ *   Ninad Palsule 
+ *
+ * This work is licensed under the terms of the GNU GPL, version 2 
or later.

+ * See the COPYING file in the top-level directory.
+ */
+
+#include "qemu/osdep.h"
+#include 
+
+#include "qemu/module.h"
+#include "libqtest-single.h"
+
+/* Registers from ast2600 specifications */
+#define ASPEED_FSI_ENGINER_TRIGGER   0x04
+#define ASPEED_FSI_OPB0_BUS_SELECT   0x10
+#define ASPEED_FSI_OPB1_BUS_SELECT   0x28
+#define ASPEED_FSI_OPB0_RW_DIRECTION 0x14
+#define ASPEED_FSI_OPB1_RW_DIRECTION 0x2c
+#define ASPEED_FSI_OPB0_XFER_SIZE    0x18
+#define ASPEED_FSI_OPB1_XFER_SIZE    0x30
+#define ASPEED_FSI_OPB0_BUS_ADDR 0x1c
+#define ASPEED_FSI_OPB1_BUS_ADDR 0x34
+#define ASPEED_FSI_INTRRUPT_CLEAR    0x40
+#define ASPEED_FSI_INTRRUPT_STATUS   0x48
+#define ASPEED_FSI_OPB0_BUS_STATUS   0x80
+#define ASPEED_FSI_OPB1_BUS_STATUS   0x8c
+#define ASPEED_FSI_OPB0_READ_DATA    0x84
+#define ASPEED_FSI_OPB1_READ_DATA    0x90
+
+/*
+ * FSI Base addresses from the ast2600 specifications.
+ */
+#define AST2600_OPB_FSI0_BASE_ADDR 0x1e79b000
+#define AST2600_OPB_FSI1_BASE_ADDR 0x1e79b100
+
+static uint32_t aspeed_fsi_base_addr;
+
+static uint32_t aspeed_fsi_readl(QTestState *s, uint32_t reg)
+{
+    return qtest_readl(s, aspeed_fsi_base_addr + reg);
+}
+
+static void aspeed_fsi_writel(QTestState *s, uint32_t reg, uint32_t 
val)

+{
+    qtest_writel(s, aspeed_fsi_base_addr + reg, val);
+}
+
+/* Setup base address and select register */
+static void test_fsi_setup(QTestState *s, uint32_t base_addr)
+{
+    uint32_t curval;
+
+    /* Set the base select register */
+    if (base_addr == AST2600_OPB_FSI0_BASE_ADDR) {
+    aspeed_fsi_base_addr = base_addr;
+
+    /* Unselect FSI1 */
+    aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x0);
+    curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+    g_assert_cmpuint(curval, ==, 0x0);
+
+    /* Select FSI0 */
+    aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x1);
+    curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+    g_assert_cmpuint(curval, ==, 0x1);
+    } else if (base_addr == AST2600_OPB_FSI1_BASE_ADDR) {
+    aspeed_fsi_base_addr = base_addr;
+
+    /* Unselect FSI0 */
+    aspeed_fsi_writel(s, ASPEED_FSI_OPB0_BUS_SELECT, 0x0);
+    curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB0_BUS_SELECT);
+    g_assert_cmpuint(curval, ==, 0x0);
+
+    /* Select FSI1 */
+    aspeed_fsi_writel(s, ASPEED_FSI_OPB1_BUS_SELECT, 0x1);
+    curval = aspeed_fsi_readl(s, ASPEED_FSI_OPB1_BUS_SELECT);
+    g_assert_cmpuint(curval, ==, 0x1);
+    } else {
+    g_assert_not_reached();
+    }
+}
+
+static void test_fsi_reg_change(QTestState *s, uint32_t reg, 
uint32_t newval)

+{
+    uint32_t base;
+    uint32_t curval;
+
+    base = aspeed_fsi_readl(s, reg);
+    aspeed_fsi_writel(s, reg, newval);
+    curval = aspeed_fsi_readl(s, reg);
+    g_assert_cmpuint(curval, ==, newval);
+    aspeed_fsi_writel(s, reg, base);
+    curval = aspeed_fsi_readl(s, reg);
+    g_assert_cmpuint(curval, ==, base);
+}
+
+static void test_fsi0_master_regs(const void *data)
+{
+    QTestState *s = (QTestState *)data;
+
+    test_fsi_setup(s, AST2600_OPB_FSI0_BASE_ADDR);
+
+    test_fsi_reg_change(s, ASPEED_FSI_OPB0_RW_DIRECTION, 0xF3F4F514);
+    test_fsi_reg_change(s, ASPEED_FSI_OPB0_XFER_SIZE, 0xF3F4F518);
+    test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_ADDR, 0xF3F4F51c);
+    test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_CLEAR, 0xF3F4F540);
+    test_fsi_reg_change(s, ASPEED_FSI_INTRRUPT_STATUS, 0xF3F4F548);
+    test_fsi_reg_change(s, ASPEED_FSI_OPB0_BUS_STATUS, 0xF3F4F580);
+    test_fsi_reg_change(s, ASPEED_FSI_OPB0_READ_DATA, 0xF3F4F584);
+}
+
+static void test_fsi1_master_regs(const void *data)
+{
+    QTestState *s = (QTestState *)data;
+
+    test_fsi_setup(s, AST2600_OPB_FSI1_BASE_ADDR);
+
+    test_fsi_reg_change(s, ASPEED_FSI_OPB1_RW_DIRECTION, 0xF3F4F514);
+    test_fsi_reg_change(s, ASPEED_FSI_OPB1_XFER_SIZE, 0xF3F4F518);
+    test_fsi_reg_change(s,

Re: [PATCH v6 09/10] hw/fsi: Added FSI documentation

2023-10-26 Thread Ninad Palsule


Hello Cedric,


On 10/24/23 02:37, Cédric Le Goater wrote:

On 10/21/23 23:17, Ninad Palsule wrote:

Documentation for IBM FSI model.

Signed-off-by: Ninad Palsule 
---
v4:
   - Added separate commit for documentation
---
  docs/specs/fsi.rst | 141 +
  1 file changed, 141 insertions(+)
  create mode 100644 docs/specs/fsi.rst



Documentation build is broken.

An "fsi" entry should be added in docs/specs/index.rst. More below.

Sorry about that. Added entry in the index.rst





diff --git a/docs/specs/fsi.rst b/docs/specs/fsi.rst
new file mode 100644
index 00..73b082afe1
--- /dev/null
+++ b/docs/specs/fsi.rst
@@ -0,0 +1,141 @@
+==
+IBM's Flexible Service Interface (FSI)
+==
+
+The QEMU FSI emulation implements hardware interfaces between ASPEED 
SOC, FSI

+master/slave and the end engine.
+
+FSI is a point-to-point two wire interface which is capable of 
supporting
+distances of up to 4 meters. FSI interfaces have been used 
successfully for
+many years in IBM servers to attach IBM Flexible Support 
Processors(FSP) to

+CPUs and IBM ASICs.
+
+FSI allows a service processor access to the internal buses of a 
host POWER
+processor to perform configuration or debugging. FSI has long 
existed in POWER
+processes and so comes with some baggage, including how it has been 
integrated

+into the ASPEED SoC.
+
+Working backwards from the POWER processor, the fundamental pieces 
of interest

+for the implementation are:
+
+1. The Common FRU Access Macro (CFAM), an address space containing 
various
+   "engines" that drive accesses on buses internal and external to 
the POWER
+   chip. Examples include the SBEFIFO and I2C masters. The engines 
hang off of
+   an internal Local Bus (LBUS) which is described by the CFAM 
configuration

+   block.
+
+2. The FSI slave: The slave is the terminal point of the FSI bus for 
FSI
+   symbols addressed to it. Slaves can be cascaded off of one 
another. The
+   slave's configuration registers appear in address space of the 
CFAM to

+   which it is attached.
+
+3. The FSI master: A controller in the platform service processor 
(e.g. BMC)
+   driving CFAM engine accesses into the POWER chip. At the hardware 
level
+   FSI is a bit-based protocol supporting synchronous and DMA-driven 
accesses

+   of engines in a CFAM.
+
+4. The On-Chip Peripheral Bus (OPB): A low-speed bus typically found 
in POWER
+   processors. This now makes an appearance in the ASPEED SoC due to 
tight
+   integration of the FSI master IP with the OPB, mainly the 
existence of an
+   MMIO-mapping of the CFAM address straight onto a sub-region of 
the OPB

+   address space.
+
+5. An APB-to-OPB bridge enabling access to the OPB from the ARM core 
in the
+   AST2600. Hardware limitations prevent the OPB from being directly 
mapped

+   into APB, so all accesses are indirect through the bridge.
+
+The LBUS is modelled to maintain the qdev bus hierarchy and to take 
advantages
+of the object model to automatically generate the CFAM configuration 
block.
+The configuration block presents engines in the order they are 
attached to the
+CFAM's LBUS. Engine implementations should subclass the LBusDevice 
and set the

+'config' member of LBusDeviceClass to match the engine's type.
+
+CFAM designs offer a lot of flexibility, for instance it is possible 
for a
+CFAM to be simultaneously driven from multiple FSI links. The 
modeling is not
+so complete; it's assumed that each CFAM is attached to a single FSI 
slave (as

+a consequence the CFAM subclasses the FSI slave).
+
+As for FSI, its symbols and wire-protocol are not modelled at all. 
This is not
+necessary to get FSI off the ground thanks to the mapping of the 
CFAM address
+space onto the OPB address space - the models follow this directly 
and map the

+CFAM memory region into the OPB's memory region.
+
+QEMU files related to FSI interface:
+ - ``hw/fsi/aspeed-apb2opb.c``
+ - ``include/hw/fsi/aspeed-apb2opb.h``
+ - ``hw/fsi/opb.c``
+ - ``include/hw/fsi/opb.h``
+ - ``hw/fsi/fsi.c``
+ - ``include/hw/fsi/fsi.h``
+ - ``hw/fsi/fsi-master.c``
+ - ``include/hw/fsi/fsi-master.h``
+ - ``hw/fsi/fsi-slave.c``
+ - ``include/hw/fsi/fsi-slave.h``
+ - ``hw/fsi/cfam.c``
+ - ``include/hw/fsi/cfam.h``
+ - ``hw/fsi/engine-scratchpad.c``
+ - ``include/hw/fsi/engine-scratchpad.h``
+ - ``include/hw/fsi/lbus.h``
+
+The following commands start the rainier machine with built-in FSI 
model.

+There are no model specific arguments.
+
+.. code-block:: console
+
+  qemu-system-arm -M rainier-bmc -nographic \
+  -kernel fitImage-linux.bin \
+  -dtb aspeed-bmc-ibm-rainier.dtb \
+  -initrd obmc-phosphor-initramfs.rootfs.cpio.xz \
+  -drive file=obmc-phosphor-image.rootfs.wic.qcow2,if=sd,index=2 \
+  -append "rootwait console=ttyS4,115200n8 root=PARTLABEL=rofs-a"
+
+The implementation appears as following in the qemu device tree:
+
+.. code-block:: console
+
+  (qemu)

Re: [PATCH v6 06/10] hw/fsi: Aspeed APB2OPB interface

2023-10-26 Thread Ninad Palsule


Hello Cedric,


On 10/24/23 10:21, Cédric Le Goater wrote:

On 10/24/23 17:00, Ninad Palsule wrote:

Hello Cedric,

On 10/24/23 02:46, Cédric Le Goater wrote:

On 10/21/23 23:17, Ninad Palsule wrote:

This is a part of patchset where IBM's Flexible Service Interface is
introduced.

An APB-to-OPB bridge enabling access to the OPB from the ARM core in
the AST2600. Hardware limitations prevent the OPB from being directly
mapped into APB, so all accesses are indirect through the bridge.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated review comments by Joel
v3:
- Incorporated review comments by Thomas Huth
v4:
   - Compile FSI with ASPEED_SOC only.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Cedric.
---
  include/hw/fsi/aspeed-apb2opb.h |  33 
  hw/fsi/aspeed-apb2opb.c | 280 


  hw/arm/Kconfig  |   1 +
  hw/fsi/Kconfig  |   4 +
  hw/fsi/meson.build  |   1 +
  hw/fsi/trace-events |   2 +
  6 files changed, 321 insertions(+)
  create mode 100644 include/hw/fsi/aspeed-apb2opb.h
  create mode 100644 hw/fsi/aspeed-apb2opb.c

diff --git a/include/hw/fsi/aspeed-apb2opb.h 
b/include/hw/fsi/aspeed-apb2opb.h

new file mode 100644
index 00..a81ae67023
--- /dev/null
+++ b/include/hw/fsi/aspeed-apb2opb.h
@@ -0,0 +1,33 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * ASPEED APB2OPB Bridge
+ */
+#ifndef FSI_ASPEED_APB2OPB_H
+#define FSI_ASPEED_APB2OPB_H
+
+#include "hw/sysbus.h"
+#include "hw/fsi/opb.h"
+
+#define TYPE_ASPEED_APB2OPB "aspeed.apb2opb"
+OBJECT_DECLARE_SIMPLE_TYPE(AspeedAPB2OPBState, ASPEED_APB2OPB)
+
+#define ASPEED_APB2OPB_NR_REGS ((0xe8 >> 2) + 1)
+
+#define ASPEED_FSI_NUM 2
+
+typedef struct AspeedAPB2OPBState {
+    /*< private >*/
+    SysBusDevice parent_obj;
+
+    /*< public >*/
+    MemoryRegion iomem;
+
+    uint32_t regs[ASPEED_APB2OPB_NR_REGS];
+    qemu_irq irq;
+
+    OPBus opb[ASPEED_FSI_NUM];
+} AspeedAPB2OPBState;
+
+#endif /* FSI_ASPEED_APB2OPB_H */
diff --git a/hw/fsi/aspeed-apb2opb.c b/hw/fsi/aspeed-apb2opb.c
new file mode 100644
index 00..6f97a6bc7d
--- /dev/null
+++ b/hw/fsi/aspeed-apb2opb.c
@@ -0,0 +1,280 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * ASPEED APB-OPB FSI interface
+ */
+
+#include "qemu/osdep.h"
+#include "qemu/log.h"
+#include "qom/object.h"
+#include "qapi/error.h"
+#include "trace.h"
+
+#include "hw/fsi/aspeed-apb2opb.h"
+#include "hw/qdev-core.h"
+
+#define TO_REG(x) (x >> 2)
+#define GENMASK(t, b) (((1ULL << ((t) + 1)) - 1) & ~((1ULL << (b)) - 1))
+
+#define APB2OPB_VERSION    TO_REG(0x00)
+#define   APB2OPB_VERSION_VER  GENMASK(7, 0)
+
+#define APB2OPB_TRIGGER    TO_REG(0x04)
+#define   APB2OPB_TRIGGER_EN   BIT(0)
+
+#define APB2OPB_CONTROL    TO_REG(0x08)
+#define   APB2OPB_CONTROL_OFF  GENMASK(31, 13)
+
+#define APB2OPB_OPB2FSI    TO_REG(0x0c)
+#define   APB2OPB_OPB2FSI_OFF  GENMASK(31, 22)
+
+#define APB2OPB_OPB0_SEL   TO_REG(0x10)
+#define APB2OPB_OPB1_SEL   TO_REG(0x28)
+#define   APB2OPB_OPB_SEL_EN   BIT(0)
+
+#define APB2OPB_OPB0_MODE  TO_REG(0x14)
+#define APB2OPB_OPB1_MODE  TO_REG(0x2c)
+#define   APB2OPB_OPB_MODE_RD  BIT(0)
+
+#define APB2OPB_OPB0_XFER  TO_REG(0x18)
+#define APB2OPB_OPB1_XFER  TO_REG(0x30)
+#define   APB2OPB_OPB_XFER_FULL    BIT(1)
+#define   APB2OPB_OPB_XFER_HALF    BIT(0)
+
+#define APB2OPB_OPB0_ADDR  TO_REG(0x1c)
+#define APB2OPB_OPB0_WRITE_DATA    TO_REG(0x20)
+
+#define APB2OPB_OPB1_ADDR  TO_REG(0x34)
+#define APB2OPB_OPB1_WRITE_DATA TO_REG(0x38)
+
+#define APB2OPB_IRQ_STS    TO_REG(0x48)
+#define   APB2OPB_IRQ_STS_OPB1_TX_ACK  BIT(17)
+#define   APB2OPB_IRQ_STS_OPB0_TX_ACK  BIT(16)
+
+#define APB2OPB_OPB0_WRITE_WORD_ENDIAN TO_REG(0x4c)
+#define   APB2OPB_OPB0_WRITE_WORD_ENDIAN_BE 0x0011101b
+#define APB2OPB_OPB0_WRITE_BYTE_ENDIAN TO_REG(0x50)
+#define   APB2OPB_OPB0_WRITE_BYTE_ENDIAN_BE 0x0c330f3f
+#define APB2OPB_OPB1_WRITE_WORD_ENDIAN TO_REG(0x54)
+#define APB2OPB_OPB1_WRITE_BYTE_ENDIAN TO_REG(0x58)
+#define APB2OPB_OPB0_READ_BYTE_ENDIAN  TO_REG(0x5c)
+#define APB2OPB_OPB1_READ_BYTE_ENDIAN  TO_REG(0x60)
+#define   APB2OPB_OPB0_READ_WORD_ENDIAN_BE  0x00030b1b
+
+#define APB2OPB_OPB0_READ_DATA TO_REG(0x84)
+#define APB2OPB_OPB1_READ_DATA TO_REG(0x90)
+
+/*
+ * The following magic values came from AST2600 data sheet
+ * The register values are defined under section "FSI controller"
+ * as initial values.
+ */
+static const uint32_t aspeed_apb2opb_reset[ASPEED_APB2OPB_NR_REGS] = {
+ [APB2OPB_VERSION]

Re: [PULL 00/39] Migration 20231024 patches

2023-10-26 Thread Juan Quintela

Stefan Hajnoczi  wrote:
> On Tue, 24 Oct 2023 at 23:45, Juan Quintela  wrote:
>>
>> The following changes since commit a95260486aa7e78d7c7194eba65cf03311ad94ad:
>>
>>   Merge tag 'pull-tcg-20231023' of https://gitlab.com/rth7680/qemu into 
>> staging (2023-10-23 14:45:46 -0700)
>>
>> are available in the Git repository at:
>>
>>   https://gitlab.com/juan.quintela/qemu.git 
>> tags/migration-20231024-pull-request
>>
>> for you to fetch changes up to 088f7f03da3f5b3487091302b795c22b1bfe56fb:
>>
>>   migration: Deprecate old compression method (2023-10-24 13:48:24 +0200)
>>
>> 
>> Migration Pull request (20231024)
>>
>> Hi
>>
>> In this PULL:
>> - vmstate registration fixes (thomas, juan)
>> - start merging vmstate_section_needed changes (marc)
>> - migration deprecations (juan)
>> - migration documentation for backwards compatibility (juan)
>>
>> Please apply.
>
> Hi Juan,
> I'm seeing CI failures:
> https://gitlab.com/qemu-project/qemu/-/pipelines/1048630760

start with s390x:

Errors:

 32/840 qemu:qtest+qtest-s390x / qtest-s390x/qom-test                ERROR  50.27s   killed by signal 6 SIGABRT
104/840 qemu:qtest+qtest-s390x / qtest-s390x/test-hmp                ERROR  51.55s   killed by signal 6 SIGABRT
189/840 qemu:qtest+qtest-s390x / qtest-s390x/boot-serial-test        ERROR  54.07s   killed by signal 6 SIGABRT
192/840 qemu:qtest+qtest-s390x / qtest-s390x/qos-test                ERROR  51.29s   killed by signal 6 SIGABRT
519/840 qemu:qtest+qtest-s390x / qtest-s390x/test-filter-mirror      ERROR  50.36s   killed by signal 6 SIGABRT
520/840 qemu:qtest+qtest-s390x / qtest-s390x/test-netfilter          ERROR  51.03s   killed by signal 6 SIGABRT
522/840 qemu:qtest+qtest-s390x / qtest-s390x/device-plug-test        ERROR  50.99s   killed by signal 6 SIGABRT
523/840 qemu:qtest+qtest-s390x / qtest-s390x/test-filter-redirector  ERROR  54.14s   killed by signal 6 SIGABRT
524/840 qemu:qtest+qtest-s390x / qtest-s390x/drive_del-test          ERROR  53.40s   killed by signal 6 SIGABRT
525/840 qemu:qtest+qtest-s390x / qtest-s390x/virtio-ccw-test         ERROR  54.67s   killed by signal 6 SIGABRT
526/840 qemu:qtest+qtest-s390x / qtest-s390x/device-introspect-test  ERROR  51.15s   killed by signal 6 SIGABRT
527/840 qemu:qtest+qtest-s390x / qtest-s390x/cpu-plug-test           ERROR  51.21s   killed by signal 6 SIGABRT
535/840 qemu:qtest+qtest-s390x / qtest-s390x/qmp-test                ERROR  51.18s   killed by signal 6 SIGABRT
534/840 qemu:qtest+qtest-s390x / qtest-s390x/machine-none-test       ERROR  51.21s   killed by signal 6 SIGABRT
533/840 qemu:qtest+qtest-s390x / qtest-s390x/qmp-cmd-test            ERROR  51.22s   killed by signal 6 SIGABRT
549/840 qemu:qtest+qtest-s390x / qtest-s390x/readconfig-test         ERROR  51.20s   killed by signal 6 SIGABRT
644/840 qemu:block / io-qcow2-001                                    ERROR   0.32s   exit status 1
645/840 qemu:block / io-qcow2-002                                    ERROR   0.32s   exit status 1
646/840 qemu:block / io-qcow2-003                                    ERROR   0.34s   exit status 1
647/840 qemu:block / io-qcow2-004                                    ERROR   0.31s   exit status 1
648/840 qemu:block / io-qcow2-005                                    ERROR   0.43s   exit status 1
649/840 qemu:block / io-qcow2-007                                    ERROR   0.34s   exit status 1
650/840 qemu:block / io-qcow2-008                                    ERROR   0.63s   exit status 1
651/840 qemu:block / io-qcow2-009                                    ERROR   0.32s   exit status 1
652/840 qemu:block / io-qcow2-010                                    ERROR   0.30s   exit status 1
654/840 qemu:block / io-qcow2-011                                    ERROR   0.31s   exit status 1
655/840 qemu:block / io-qcow2-012                                    ERROR   0.36s   exit status 1
657/840 qemu:block / io-qcow2-013                                    ERROR   0.51s   exit status 1
658/840 qemu:block / io-qcow2-017                                    ERROR   0.37s   exit status 1
659/840 qemu:block / io-qcow2-018                                    ERROR   0.31s   exit status 1
660/840 qemu:block / io-qcow2-019                                    ERROR

Re: [PATCH v6 02/10] hw/fsi: Introduce IBM's scratchpad

2023-10-26 Thread Ninad Palsule




On 10/24/23 02:08, Philippe Mathieu-Daudé wrote:

On 23/10/23 19:08, Ninad Palsule wrote:

Hello Philippe,

On 10/23/23 10:00, Philippe Mathieu-Daudé wrote:

On 21/10/23 23:17, Ninad Palsule wrote:

This is part of a patchset that introduces the scratchpad.

The scratchpad provides a set of non-functional registers. The firmware
is free to use them; the hardware does not provide any special management
support. The scratchpad registers can be read or written from the LBUS
slave.

In this model, the LBUS device is the parent of the scratchpad.

Signed-off-by: Andrew Jeffery 
Signed-off-by: Ninad Palsule 
---
v2:
- Incorporated Joel's review comments.
v5:
- Incorporated review comments by Cedric.
v6:
- Incorporated review comments by Daniel.
---
  meson.build    |  1 +
  hw/fsi/trace.h |  1 +
  include/hw/fsi/engine-scratchpad.h | 32 ++
  include/hw/fsi/fsi.h   | 16 +
  hw/fsi/engine-scratchpad.c | 93 ++
  hw/fsi/Kconfig |  4 ++
  hw/fsi/meson.build |  1 +
  hw/fsi/trace-events    |  2 +
  8 files changed, 150 insertions(+)
  create mode 100644 hw/fsi/trace.h
  create mode 100644 include/hw/fsi/engine-scratchpad.h
  create mode 100644 include/hw/fsi/fsi.h
  create mode 100644 hw/fsi/engine-scratchpad.c
  create mode 100644 hw/fsi/trace-events




diff --git a/include/hw/fsi/fsi.h b/include/hw/fsi/fsi.h
new file mode 100644
index 00..e65f26f17b
--- /dev/null
+++ b/include/hw/fsi/fsi.h
@@ -0,0 +1,16 @@
+/*
+ * SPDX-License-Identifier: GPL-2.0-or-later
+ * Copyright (C) 2023 IBM Corp.
+ *
+ * IBM Flexible Service Interface
+ */
+#ifndef FSI_FSI_H
+#define FSI_FSI_H
+
+/* Bitwise operations at the word level. */
+#define BE_BIT(x)  BIT(31 - (x))
+#define GENMASK(t, b) \
+    (((1ULL << ((t) + 1)) - 1) & ~((1ULL << (b)) - 1))


Please use MAKE_64BIT_MASK() from "qemu/bitops.h".


The GENMASK and MAKE_64BIT_MASK macros are invoked differently.

GENMASK is invoked with the top bit t and the bottom bit b (t:b) and
returns the mask, whereas MAKE_64BIT_MASK takes a shift and a length.


Don't we have:

#define GENMASK(t, b) MAKE_64BIT_MASK(b, (t) - (b) + 1)

?


You are right. I am able to use this macro. I have removed some unused macros.
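
For reference, here is a minimal standalone sketch of the mapping discussed above. It assumes GENMASK(t, b) takes the top bit t and the bottom bit b, as in the definition quoted from fsi.h, and it carries a local copy of what I believe is the MAKE_64BIT_MASK(shift, length) definition from "qemu/bitops.h" so that it builds outside the QEMU tree. The name GENMASK_VIA_MAKE is made up for this illustration. Under those assumptions the shift is the bottom bit and the length is t - b + 1:

```c
#include <assert.h>
#include <stdint.h>

/* Local copy for illustration; in QEMU this comes from "qemu/bitops.h". */
#define MAKE_64BIT_MASK(shift, length) \
    (((~0ULL) >> (64 - (length))) << (shift))

/* Open-coded GENMASK as posted in the patch: bits t..b inclusive, t >= b. */
#define GENMASK(t, b) \
    (((1ULL << ((t) + 1)) - 1) & ~((1ULL << (b)) - 1))

/* Hypothetical name: the same mask expressed through MAKE_64BIT_MASK. */
#define GENMASK_VIA_MAKE(t, b) MAKE_64BIT_MASK((b), (t) - (b) + 1)

/* Compile-time checks using field positions that appear in this series. */
static_assert(GENMASK(7, 0) == GENMASK_VIA_MAKE(7, 0), "bits 7:0");
static_assert(GENMASK(31, 13) == GENMASK_VIA_MAKE(31, 13), "bits 31:13");
static_assert(GENMASK(31, 22) == GENMASK_VIA_MAKE(31, 22), "bits 31:22");
```

One side effect of going through MAKE_64BIT_MASK: the open-coded form shifts 1ULL by 64 when t is 63, which is undefined in C, while the shift/length form does not.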


Thanks for the review.

Regards,

Ninad






+#define BE_GENMASK(t, b)   GENMASK(BE_BIT(t), BE_BIT(b))

+
+#endif

[PATCH v2 06/14] target/riscv: Add cfg properties for Zvkn[c|g] extensions

2023-10-26 Thread Max Chou

The vector crypto spec defines the NIST algorithm suite extensions
(Zvkn, Zvknc, Zvkng) as shorthands that combine several vector crypto
extensions.

Signed-off-by: Max Chou 
---
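Note for reviewers (not part of the commit message): a minimal standalone sketch of the ordering used in the hunk below. The struct VCryptoCfg and the function vcrypto_expand_nist_shorthands are hypothetical stand-ins, not QEMU names, and the sketch sets flags unconditionally, which is a simplification compared with cpu_cfg_ext_auto_update(). The point it illustrates is that Zvknc and Zvkng are expanded before Zvkn, so a single pass is enough to reach the leaf extensions:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the relevant RISCVCPUConfig fields. */
typedef struct {
    bool zvbc, zvkb, zvkg, zvkned, zvknhb, zvkt;
    bool zvkn, zvknc, zvkng;
} VCryptoCfg;

/*
 * Expand the NIST suite shorthands in dependency order: the outer
 * shorthands (Zvknc, Zvkng) first, then Zvkn, which they both enable.
 */
void vcrypto_expand_nist_shorthands(VCryptoCfg *cfg)
{
    if (cfg->zvknc) {
        cfg->zvkn = true;
        cfg->zvbc = true;
    }
    if (cfg->zvkng) {
        cfg->zvkn = true;
        cfg->zvkg = true;
    }
    if (cfg->zvkn) {
        cfg->zvkned = true;
        cfg->zvknhb = true;
        cfg->zvkb = true;
        cfg->zvkt = true;
    }
}
```

With only zvkng set on entry, the call leaves zvkn, zvkg, zvkned, zvknhb, zvkb and zvkt set and zvbc clear. If the Zvkn block ran first, a second pass would be needed.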
 target/riscv/cpu_cfg.h |  3 +++
 target/riscv/tcg/tcg-cpu.c | 20 
 2 files changed, 23 insertions(+)

diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index 935335e5721..fd07aa96a27 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -96,6 +96,9 @@ struct RISCVCPUConfig {
 bool ext_zvksed;
 bool ext_zvksh;
 bool ext_zvkt;
+bool ext_zvkn;
+bool ext_zvknc;
+bool ext_zvkng;
 bool ext_zmmul;
 bool ext_zvfbfmin;
 bool ext_zvfbfwma;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index 1b08f27eee4..e460701a13c 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -499,6 +499,26 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
+/*
+ * Shorthand vector crypto extensions
+ */
+if (cpu->cfg.ext_zvknc) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkn), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
+}
+
+if (cpu->cfg.ext_zvkng) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkn), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkg), true);
+}
+
+if (cpu->cfg.ext_zvkn) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkned), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvknhb), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkb), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkt), true);
+}
+
 if (cpu->cfg.ext_zvkt) {
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbb), true);
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
-- 
2.34.1

[PATCH v2 10/14] target/riscv: Move vector crypto extensions to riscv_cpu_extensions

2023-10-26 Thread Max Chou

Because the vector crypto specification is ratified, move these
extensions from riscv_cpu_experimental_exts to riscv_cpu_extensions.

Signed-off-by: Max Chou 
---
 target/riscv/cpu.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 1709df76a9b..5b5805399ee 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -1342,6 +1342,24 @@ const RISCVCPUMultiExtConfig riscv_cpu_extensions[] = {
 MULTI_EXT_CFG_BOOL("zcmt", ext_zcmt, false),
 MULTI_EXT_CFG_BOOL("zicond", ext_zicond, false),
 
+/* Vector cryptography extensions */
+MULTI_EXT_CFG_BOOL("zvbb", ext_zvbb, false),
+MULTI_EXT_CFG_BOOL("zvbc", ext_zvbc, false),
+MULTI_EXT_CFG_BOOL("zvkb", ext_zvkb, false),
+MULTI_EXT_CFG_BOOL("zvkg", ext_zvkg, false),
+MULTI_EXT_CFG_BOOL("zvkned", ext_zvkned, false),
+MULTI_EXT_CFG_BOOL("zvknha", ext_zvknha, false),
+MULTI_EXT_CFG_BOOL("zvknhb", ext_zvknhb, false),
+MULTI_EXT_CFG_BOOL("zvksed", ext_zvksed, false),
+MULTI_EXT_CFG_BOOL("zvksh", ext_zvksh, false),
+MULTI_EXT_CFG_BOOL("zvkt", ext_zvkt, false),
+MULTI_EXT_CFG_BOOL("zvkn", ext_zvkn, false),
+MULTI_EXT_CFG_BOOL("zvknc", ext_zvknc, false),
+MULTI_EXT_CFG_BOOL("zvkng", ext_zvkng, false),
+MULTI_EXT_CFG_BOOL("zvks", ext_zvks, false),
+MULTI_EXT_CFG_BOOL("zvksc", ext_zvksc, false),
+MULTI_EXT_CFG_BOOL("zvksg", ext_zvksg, false),
+
 DEFINE_PROP_END_OF_LIST(),
 };
 
@@ -1374,24 +1392,6 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
 MULTI_EXT_CFG_BOOL("x-zvfbfmin", ext_zvfbfmin, false),
 MULTI_EXT_CFG_BOOL("x-zvfbfwma", ext_zvfbfwma, false),
 
-/* Vector cryptography extensions */
-MULTI_EXT_CFG_BOOL("x-zvbb", ext_zvbb, false),
-MULTI_EXT_CFG_BOOL("x-zvbc", ext_zvbc, false),
-MULTI_EXT_CFG_BOOL("x-zvkb", ext_zvkb, false),
-MULTI_EXT_CFG_BOOL("x-zvkg", ext_zvkg, false),
-MULTI_EXT_CFG_BOOL("x-zvkned", ext_zvkned, false),
-MULTI_EXT_CFG_BOOL("x-zvknha", ext_zvknha, false),
-MULTI_EXT_CFG_BOOL("x-zvknhb", ext_zvknhb, false),
-MULTI_EXT_CFG_BOOL("x-zvksed", ext_zvksed, false),
-MULTI_EXT_CFG_BOOL("x-zvksh", ext_zvksh, false),
-MULTI_EXT_CFG_BOOL("x-zvkt", ext_zvkt, false),
-MULTI_EXT_CFG_BOOL("x-zvkn", ext_zvkn, false),
-MULTI_EXT_CFG_BOOL("x-zvknc", ext_zvknc, false),
-MULTI_EXT_CFG_BOOL("x-zvkng", ext_zvkng, false),
-MULTI_EXT_CFG_BOOL("x-zvks", ext_zvks, false),
-MULTI_EXT_CFG_BOOL("x-zvksc", ext_zvksc, false),
-MULTI_EXT_CFG_BOOL("x-zvksg", ext_zvksg, false),
-
 DEFINE_PROP_END_OF_LIST(),
 };
 
-- 
2.34.1

[PATCH v2 09/14] target/riscv: Expose Zvks[c|g] extension properties

2023-10-26 Thread Max Chou

Expose the properties of ShangMi Algorithm Suite related extensions
(Zvks, Zvksc, Zvksg).

Signed-off-by: Max Chou 
---
 target/riscv/cpu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 8eae8d3e59c..1709df76a9b 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -133,7 +133,10 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zvkng, PRIV_VERSION_1_12_0, ext_zvkng),
 ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
 ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
+ISA_EXT_DATA_ENTRY(zvks, PRIV_VERSION_1_12_0, ext_zvks),
+ISA_EXT_DATA_ENTRY(zvksc, PRIV_VERSION_1_12_0, ext_zvksc),
 ISA_EXT_DATA_ENTRY(zvksed, PRIV_VERSION_1_12_0, ext_zvksed),
+ISA_EXT_DATA_ENTRY(zvksg, PRIV_VERSION_1_12_0, ext_zvksg),
 ISA_EXT_DATA_ENTRY(zvksh, PRIV_VERSION_1_12_0, ext_zvksh),
 ISA_EXT_DATA_ENTRY(zvkt, PRIV_VERSION_1_12_0, ext_zvkt),
 ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
@@ -1385,6 +1388,9 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
 MULTI_EXT_CFG_BOOL("x-zvkn", ext_zvkn, false),
 MULTI_EXT_CFG_BOOL("x-zvknc", ext_zvknc, false),
 MULTI_EXT_CFG_BOOL("x-zvkng", ext_zvkng, false),
+MULTI_EXT_CFG_BOOL("x-zvks", ext_zvks, false),
+MULTI_EXT_CFG_BOOL("x-zvksc", ext_zvksc, false),
+MULTI_EXT_CFG_BOOL("x-zvksg", ext_zvksg, false),
 
 DEFINE_PROP_END_OF_LIST(),
 };
-- 
2.34.1

[PATCH v2 13/14] disas/riscv: Add support for vector crypto extensions

2023-10-26 Thread Max Chou

This patch adds support for the following ratified v1.0.0 vector crypto
extensions to the RISC-V disassembler:
- Zvbb
- Zvbc
- Zvkb
- Zvkg
- Zvkned
- Zvknha
- Zvknhb
- Zvksed
- Zvksh

Signed-off-by: Max Chou 
---
 disas/riscv.c | 137 ++
 1 file changed, 137 insertions(+)

diff --git a/disas/riscv.c b/disas/riscv.c
index ec33e447f5b..7ea6ea050e9 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -862,6 +862,47 @@ typedef enum {
 rv_op_fltq_q = 831,
 rv_op_fleq_h = 832,
 rv_op_fltq_h = 833,
+rv_op_vaesdf_vv = 834,
+rv_op_vaesdf_vs = 835,
+rv_op_vaesdm_vv = 836,
+rv_op_vaesdm_vs = 837,
+rv_op_vaesef_vv = 838,
+rv_op_vaesef_vs = 839,
+rv_op_vaesem_vv = 840,
+rv_op_vaesem_vs = 841,
+rv_op_vaeskf1_vi = 842,
+rv_op_vaeskf2_vi = 843,
+rv_op_vaesz_vs = 844,
+rv_op_vandn_vv = 845,
+rv_op_vandn_vx = 846,
+rv_op_vbrev_v = 847,
+rv_op_vbrev8_v = 848,
+rv_op_vclmul_vv = 849,
+rv_op_vclmul_vx = 850,
+rv_op_vclmulh_vv = 851,
+rv_op_vclmulh_vx = 852,
+rv_op_vclz_v = 853,
+rv_op_vcpop_v = 854,
+rv_op_vctz_v = 855,
+rv_op_vghsh_vv = 856,
+rv_op_vgmul_vv = 857,
+rv_op_vrev8_v = 858,
+rv_op_vrol_vv = 859,
+rv_op_vrol_vx = 860,
+rv_op_vror_vv = 861,
+rv_op_vror_vx = 862,
+rv_op_vror_vi = 863,
+rv_op_vsha2ch_vv = 864,
+rv_op_vsha2cl_vv = 865,
+rv_op_vsha2ms_vv = 866,
+rv_op_vsm3c_vi = 867,
+rv_op_vsm3me_vv = 868,
+rv_op_vsm4k_vi = 869,
+rv_op_vsm4r_vv = 870,
+rv_op_vsm4r_vs = 871,
+rv_op_vwsll_vv = 872,
+rv_op_vwsll_vx = 873,
+rv_op_vwsll_vi = 874,
 } rv_op;
 
 /* register names */
@@ -2008,6 +2049,47 @@ const rv_opcode_data rvi_opcode_data[] = {
 { "fltq.q", rv_codec_r, rv_fmt_rd_frs1_frs2, NULL, 0, 0, 0 },
 { "fleq.h", rv_codec_r, rv_fmt_rd_frs1_frs2, NULL, 0, 0, 0 },
 { "fltq.h", rv_codec_r, rv_fmt_rd_frs1_frs2, NULL, 0, 0, 0 },
+{ "vaesdf.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesdf.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesdm.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesdm.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesef.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesef.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesem.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaesem.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vaeskf1.vi", rv_codec_v_i, rv_fmt_vd_vs2_uimm, NULL, 0, 0, 0 },
+{ "vaeskf2.vi", rv_codec_v_i, rv_fmt_vd_vs2_uimm, NULL, 0, 0, 0 },
+{ "vaesz.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vandn.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vandn.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vbrev.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vbrev8.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vclmul.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vclmul.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vclmulh.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vclmulh.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vclz.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vcpop.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vctz.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vghsh.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1, NULL, 0, 0, 0 },
+{ "vgmul.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vrev8.v", rv_codec_v_r, rv_fmt_vd_vs2_vm, NULL, 0, 0, 0 },
+{ "vrol.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vrol.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vror.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vror.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vror.vi", rv_codec_vror_vi, rv_fmt_vd_vs2_uimm_vm, NULL, 0, 0, 0 },
+{ "vsha2ch.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1, NULL, 0, 0, 0 },
+{ "vsha2cl.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1, NULL, 0, 0, 0 },
+{ "vsha2ms.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1, NULL, 0, 0, 0 },
+{ "vsm3c.vi", rv_codec_v_i, rv_fmt_vd_vs2_uimm, NULL, 0, 0, 0 },
+{ "vsm3me.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1, NULL, 0, 0, 0 },
+{ "vsm4k.vi", rv_codec_v_i, rv_fmt_vd_vs2_uimm, NULL, 0, 0, 0 },
+{ "vsm4r.vv", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vsm4r.vs", rv_codec_v_r, rv_fmt_vd_vs2, NULL, 0, 0, 0 },
+{ "vwsll.vv", rv_codec_v_r, rv_fmt_vd_vs2_vs1_vm, NULL, 0, 0, 0 },
+{ "vwsll.vx", rv_codec_v_r, rv_fmt_vd_vs2_rs1_vm, NULL, 0, 0, 0 },
+{ "vwsll.vi", rv_codec_v_i, rv_fmt_vd_vs2_uimm_vm, NULL, 0, 0, 0 },
 };
 
 /* CSR names */
@@ -3176,6 +3258,7 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 case 0:
 switch ((inst >> 26) & 0b11) {

[PATCH v2 14/14] disas/riscv: Replace TABs with space

2023-10-26 Thread Max Chou

Replace TABs with spaces to keep a consistent coding style of 4-space
indentation.

Signed-off-by: Max Chou 
---
 disas/riscv.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 7ea6ea050e9..e9458e574b9 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -3136,12 +3136,12 @@ static void decode_inst_opcode(rv_decode *dec, rv_isa isa)
 }
 break;
 case 89:
-   switch (((inst >> 12) & 0b111)) {
+switch (((inst >> 12) & 0b111)) {
 case 0: op = rv_op_fmvp_d_x; break;
 }
 break;
 case 91:
-   switch (((inst >> 12) & 0b111)) {
+switch (((inst >> 12) & 0b111)) {
 case 0: op = rv_op_fmvp_q_x; break;
 }
 break;
@@ -4579,7 +4579,7 @@ static void decode_inst_operands(rv_decode *dec, rv_isa isa)
 break;
 case rv_codec_zcmt_jt:
 dec->imm = operand_tbl_index(inst);
-   break;
+break;
 case rv_codec_fli:
 dec->rd = operand_rd(inst);
 dec->imm = operand_rs1(inst);
-- 
2.34.1

[PATCH v2 12/14] disas/riscv: Add rv_codec_vror_vi for vror.vi

2023-10-26 Thread Max Chou

Add rv_codec_vror_vi for the vector crypto instruction vror.vi.
The rotate amount of vror.vi is formed by combining two separated bit
fields of the instruction.

Signed-off-by: Max Chou 
---
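Note for reviewers (not part of the commit message): a minimal standalone sketch of which instruction bits the shift pairs in operand_vzimm6() select. It assumes rv_inst is a 64-bit unsigned container, as the shift counts imply. The function names operand_vzimm6_fields and operand_vzimm6_shifts are made up for this illustration. Read that way, (inst << 37) >> 63 picks bit 26 and (inst << 44) >> 59 picks bits 19:15:

```c
#include <stdint.h>

/* Assumed: instructions are held in a 64-bit unsigned container. */
typedef uint64_t rv_inst;

/* Field view: imm[5] is instruction bit 26, imm[4:0] is bits 19:15. */
uint32_t operand_vzimm6_fields(rv_inst inst)
{
    uint32_t hi = (uint32_t)((inst >> 26) & 0x1);
    uint32_t lo = (uint32_t)((inst >> 15) & 0x1f);

    return (hi << 5) | lo;
}

/* Shift view: the same extraction written with the patch's shift pairs. */
uint32_t operand_vzimm6_shifts(rv_inst inst)
{
    return (uint32_t)(((inst << 37) >> 63) << 5 | ((inst << 44) >> 59));
}
```

Both functions return 35 for an instruction word with bit 26 set and bits 19:15 equal to 3, and 0 when only bits outside those two fields are set.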
 disas/riscv.c | 14 +-
 disas/riscv.h |  1 +
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/disas/riscv.c b/disas/riscv.c
index 8e89e1d1157..ec33e447f5b 100644
--- a/disas/riscv.c
+++ b/disas/riscv.c
@@ -4011,6 +4011,12 @@ static uint32_t operand_vzimm10(rv_inst inst)
 return (inst << 34) >> 54;
 }
 
+static uint32_t operand_vzimm6(rv_inst inst)
+{
+return ((inst << 37) >> 63) << 5 |
+((inst << 44) >> 59);
+}
+
 static uint32_t operand_bs(rv_inst inst)
 {
 return (inst << 32) >> 62;
@@ -4393,6 +4399,12 @@ static void decode_inst_operands(rv_decode *dec, rv_isa isa)
 dec->imm = operand_vimm(inst);
 dec->vm = operand_vm(inst);
 break;
+case rv_codec_vror_vi:
+dec->rd = operand_rd(inst);
+dec->rs2 = operand_rs2(inst);
+dec->imm = operand_vzimm6(inst);
+dec->vm = operand_vm(inst);
+break;
 case rv_codec_vsetvli:
 dec->rd = operand_rd(inst);
 dec->rs1 = operand_rs1(inst);
@@ -4677,7 +4689,7 @@ static void format_inst(char *buf, size_t buflen, size_t tab, rv_decode *dec)
 append(buf, tmp, buflen);
 break;
 case 'u':
-snprintf(tmp, sizeof(tmp), "%u", ((uint32_t)dec->imm & 0b11111));
+snprintf(tmp, sizeof(tmp), "%u", ((uint32_t)dec->imm & 0b111111));
 append(buf, tmp, buflen);
 break;
 case 'j':
diff --git a/disas/riscv.h b/disas/riscv.h
index b242d73b25e..19e5ed2ce63 100644
--- a/disas/riscv.h
+++ b/disas/riscv.h
@@ -152,6 +152,7 @@ typedef enum {
 rv_codec_v_i,
 rv_codec_vsetvli,
 rv_codec_vsetivli,
+rv_codec_vror_vi,
 rv_codec_zcb_ext,
 rv_codec_zcb_mul,
 rv_codec_zcb_lb,
-- 
2.34.1

[PATCH v2 07/14] target/riscv: Expose Zvkn[c|g] extension properties

2023-10-26 Thread Max Chou

Expose the properties of NIST Algorithm Suite related extensions (Zvkn,
Zvknc, Zvkng).

Signed-off-by: Max Chou 
---
 target/riscv/cpu.c | 6 ++
 1 file changed, 6 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 992f8e0f7b0..8eae8d3e59c 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -127,7 +127,10 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
 ISA_EXT_DATA_ENTRY(zvkb, PRIV_VERSION_1_12_0, ext_zvkb),
 ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg),
+ISA_EXT_DATA_ENTRY(zvkn, PRIV_VERSION_1_12_0, ext_zvkn),
+ISA_EXT_DATA_ENTRY(zvknc, PRIV_VERSION_1_12_0, ext_zvknc),
 ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
+ISA_EXT_DATA_ENTRY(zvkng, PRIV_VERSION_1_12_0, ext_zvkng),
 ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
 ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
 ISA_EXT_DATA_ENTRY(zvksed, PRIV_VERSION_1_12_0, ext_zvksed),
@@ -1379,6 +1382,9 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
 MULTI_EXT_CFG_BOOL("x-zvksed", ext_zvksed, false),
 MULTI_EXT_CFG_BOOL("x-zvksh", ext_zvksh, false),
 MULTI_EXT_CFG_BOOL("x-zvkt", ext_zvkt, false),
+MULTI_EXT_CFG_BOOL("x-zvkn", ext_zvkn, false),
+MULTI_EXT_CFG_BOOL("x-zvknc", ext_zvknc, false),
+MULTI_EXT_CFG_BOOL("x-zvkng", ext_zvkng, false),
 
 DEFINE_PROP_END_OF_LIST(),
 };
-- 
2.34.1

[PATCH v2 11/14] disas/riscv: Add rv_fmt_vd_vs2_uimm format

2023-10-26 Thread Max Chou

Add rv_fmt_vd_vs2_uimm format for vector crypto instructions.

Signed-off-by: Max Chou 
---
 disas/riscv.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/disas/riscv.h b/disas/riscv.h
index 8abb578b515..b242d73b25e 100644
--- a/disas/riscv.h
+++ b/disas/riscv.h
@@ -274,6 +274,7 @@ enum {
 #define rv_fmt_vd_vs2_fs1_vm  "O\tD,F,4m"
 #define rv_fmt_vd_vs2_imm_vl  "O\tD,F,il"
 #define rv_fmt_vd_vs2_imm_vm  "O\tD,F,im"
+#define rv_fmt_vd_vs2_uimm"O\tD,F,u"
 #define rv_fmt_vd_vs2_uimm_vm "O\tD,F,um"
 #define rv_fmt_vd_vs1_vs2_vm  "O\tD,E,Fm"
 #define rv_fmt_vd_rs1_vs2_vm  "O\tD,1,Fm"
-- 
2.34.1

[PATCH v2 08/14] target/riscv: Add cfg properties for Zvks[c|g] extensions

2023-10-26 Thread Max Chou

The vector crypto spec defines the ShangMi algorithm suite extensions
(Zvks, Zvksc, Zvksg) as shorthands that combine several vector crypto
extensions.

Signed-off-by: Max Chou 
---
 target/riscv/cpu_cfg.h |  3 +++
 target/riscv/tcg/tcg-cpu.c | 17 +
 2 files changed, 20 insertions(+)

diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index fd07aa96a27..7b0ca657a6a 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -99,6 +99,9 @@ struct RISCVCPUConfig {
 bool ext_zvkn;
 bool ext_zvknc;
 bool ext_zvkng;
+bool ext_zvks;
+bool ext_zvksc;
+bool ext_zvksg;
 bool ext_zmmul;
 bool ext_zvfbfmin;
 bool ext_zvfbfwma;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index e460701a13c..f9023ca75c7 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -519,6 +519,23 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkt), true);
 }
 
+if (cpu->cfg.ext_zvksc) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvks), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
+}
+
+if (cpu->cfg.ext_zvksg) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvks), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkg), true);
+}
+
+if (cpu->cfg.ext_zvks) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvksed), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvksh), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkb), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvkt), true);
+}
+
 if (cpu->cfg.ext_zvkt) {
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbb), true);
 cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
-- 
2.34.1

[PATCH v2 03/14] target/riscv: Add cfg property for Zvkb extension

2023-10-26 Thread Max Chou

Since the v1.0.0-rc3 release of the vector crypto spec, the Zvkb
extension is defined as a proper subset of the Zvbb extension, and both
the Zvkn and Zvks shorthand extensions include Zvkb in place of Zvbb.

Signed-off-by: Max Chou 
---
 target/riscv/cpu_cfg.h | 1 +
 target/riscv/tcg/tcg-cpu.c | 6 +++---
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index d8d17dedeed..935335e5721 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -88,6 +88,7 @@ struct RISCVCPUConfig {
 bool ext_zve64d;
 bool ext_zvbb;
 bool ext_zvbc;
+bool ext_zvkb;
 bool ext_zvkg;
 bool ext_zvkned;
 bool ext_zvknha;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index b9eaecb699c..1b08f27eee4 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -508,9 +508,9 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
  * In principle Zve*x would also suffice here, were they supported
  * in qemu
  */
-if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkg || cpu->cfg.ext_zvkned ||
- cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksed || cpu->cfg.ext_zvksh) &&
-!cpu->cfg.ext_zve32f) {
+if ((cpu->cfg.ext_zvbb || cpu->cfg.ext_zvkb || cpu->cfg.ext_zvkg ||
+ cpu->cfg.ext_zvkned || cpu->cfg.ext_zvknha || cpu->cfg.ext_zvksed ||
+ cpu->cfg.ext_zvksh) && !cpu->cfg.ext_zve32f) {
 error_setg(errp,
"Vector crypto extensions require V or Zve* extensions");
 return;
-- 
2.34.1

[PATCH v2 05/14] target/riscv: Expose Zvkb extension property

2023-10-26 Thread Max Chou

Signed-off-by: Max Chou 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index 5099c786415..992f8e0f7b0 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -125,6 +125,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zvfbfwma, PRIV_VERSION_1_12_0, ext_zvfbfwma),
 ISA_EXT_DATA_ENTRY(zvfh, PRIV_VERSION_1_12_0, ext_zvfh),
 ISA_EXT_DATA_ENTRY(zvfhmin, PRIV_VERSION_1_12_0, ext_zvfhmin),
+ISA_EXT_DATA_ENTRY(zvkb, PRIV_VERSION_1_12_0, ext_zvkb),
 ISA_EXT_DATA_ENTRY(zvkg, PRIV_VERSION_1_12_0, ext_zvkg),
 ISA_EXT_DATA_ENTRY(zvkned, PRIV_VERSION_1_12_0, ext_zvkned),
 ISA_EXT_DATA_ENTRY(zvknha, PRIV_VERSION_1_12_0, ext_zvknha),
@@ -1370,6 +1371,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
 /* Vector cryptography extensions */
 MULTI_EXT_CFG_BOOL("x-zvbb", ext_zvbb, false),
 MULTI_EXT_CFG_BOOL("x-zvbc", ext_zvbc, false),
+MULTI_EXT_CFG_BOOL("x-zvkb", ext_zvkb, false),
 MULTI_EXT_CFG_BOOL("x-zvkg", ext_zvkg, false),
 MULTI_EXT_CFG_BOOL("x-zvkned", ext_zvkned, false),
 MULTI_EXT_CFG_BOOL("x-zvknha", ext_zvknha, false),
-- 
2.34.1

[PATCH v2 04/14] target/riscv: Replace Zvbb checking by Zvkb

2023-10-26 Thread Max Chou

The Zvkb extension is a proper subset of the Zvbb extension and includes
the following instructions:
  * vandn.[vv,vx]
  * vbrev8.v
  * vrev8.v
  * vrol.[vv,vx]
  * vror.[vv,vx,vi]

Signed-off-by: Max Chou 
---
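Note for reviewers (not part of the commit message): a minimal standalone sketch of the gating rule this patch introduces, leaving out the operand checks (opivv_check and friends). The struct BitmanipCfg and both function names are hypothetical. It assumes, based on the unchanged zvbb_opiv_check users in the hunk below, that instructions outside the Zvkb list keep requiring Zvbb:

```c
#include <stdbool.h>

/* Hypothetical stand-in for the two relevant config flags. */
typedef struct {
    bool ext_zvbb;
    bool ext_zvkb;
} BitmanipCfg;

/*
 * Instructions listed for Zvkb (vandn, vbrev8, vrev8, vrol, vror) are
 * also part of Zvbb, so either extension enables them.
 */
bool zvkb_insn_enabled(const BitmanipCfg *cfg)
{
    return cfg->ext_zvbb || cfg->ext_zvkb;
}

/* Instructions that stay Zvbb-only (for example vbrev.v, vclz.v, vctz.v). */
bool zvbb_only_insn_enabled(const BitmanipCfg *cfg)
{
    return cfg->ext_zvbb;
}
```

With only Zvkb enabled, the first predicate is true and the second is false, which is the subset relation described in the commit message.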
 target/riscv/insn_trans/trans_rvvk.c.inc | 37 +++-
 1 file changed, 24 insertions(+), 13 deletions(-)

diff --git a/target/riscv/insn_trans/trans_rvvk.c.inc b/target/riscv/insn_trans/trans_rvvk.c.inc
index e691519ed78..3801c16829d 100644
--- a/target/riscv/insn_trans/trans_rvvk.c.inc
+++ b/target/riscv/insn_trans/trans_rvvk.c.inc
@@ -112,24 +112,27 @@ GEN_VX_MASKED_TRANS(vclmulh_vx, vclmul_vx_check)
 return false;\
 }
 
-static bool zvbb_vv_check(DisasContext *s, arg_rmrr *a)
+static bool zvkb_vv_check(DisasContext *s, arg_rmrr *a)
 {
-return opivv_check(s, a) && s->cfg_ptr->ext_zvbb == true;
+return opivv_check(s, a) &&
+   (s->cfg_ptr->ext_zvbb == true || s->cfg_ptr->ext_zvkb == true);
 }
 
-static bool zvbb_vx_check(DisasContext *s, arg_rmrr *a)
+static bool zvkb_vx_check(DisasContext *s, arg_rmrr *a)
 {
-return opivx_check(s, a) && s->cfg_ptr->ext_zvbb == true;
+return opivx_check(s, a) &&
+   (s->cfg_ptr->ext_zvbb == true || s->cfg_ptr->ext_zvkb == true);
 }
 
 /* vrol.v[vx] */
-GEN_OPIVV_GVEC_TRANS_CHECK(vrol_vv, rotlv, zvbb_vv_check)
-GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vrol_vx, rotls, zvbb_vx_check)
+GEN_OPIVV_GVEC_TRANS_CHECK(vrol_vv, rotlv, zvkb_vv_check)
+GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vrol_vx, rotls, zvkb_vx_check)
 
 /* vror.v[vxi] */
-GEN_OPIVV_GVEC_TRANS_CHECK(vror_vv, rotrv, zvbb_vv_check)
-GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vror_vx, rotrs, zvbb_vx_check)
-GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_TRUNC_SEW, vror_vx, rotri, zvbb_vx_check)
+GEN_OPIVV_GVEC_TRANS_CHECK(vror_vv, rotrv, zvkb_vv_check)
+GEN_OPIVX_GVEC_SHIFT_TRANS_CHECK(vror_vx, rotrs, zvkb_vx_check)
+GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_TRUNC_SEW, vror_vx, rotri,
+   zvkb_vx_check)
 
 #define GEN_OPIVX_GVEC_TRANS_CHECK(NAME, SUF, CHECK) \
 static bool trans_##NAME(DisasContext *s, arg_rmrr *a)   \
@@ -147,8 +150,8 @@ GEN_OPIVI_GVEC_TRANS_CHECK(vror_vi, IMM_TRUNC_SEW, vror_vx, rotri, zvbb_vx_check
 }
 
 /* vandn.v[vx] */
-GEN_OPIVV_GVEC_TRANS_CHECK(vandn_vv, andc, zvbb_vv_check)
-GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvbb_vx_check)
+GEN_OPIVV_GVEC_TRANS_CHECK(vandn_vv, andc, zvkb_vv_check)
+GEN_OPIVX_GVEC_TRANS_CHECK(vandn_vx, andcs, zvkb_vx_check)
 
 #define GEN_OPIV_TRANS(NAME, CHECK)\
 static bool trans_##NAME(DisasContext *s, arg_rmr *a)  \
@@ -188,8 +191,16 @@ static bool zvbb_opiv_check(DisasContext *s, arg_rmr *a)
vext_check_ss(s, a->rd, a->rs2, a->vm);
 }
 
-GEN_OPIV_TRANS(vbrev8_v, zvbb_opiv_check)
-GEN_OPIV_TRANS(vrev8_v, zvbb_opiv_check)
+static bool zvkb_opiv_check(DisasContext *s, arg_rmr *a)
+{
+return (s->cfg_ptr->ext_zvbb == true || s->cfg_ptr->ext_zvkb == true) &&
+   require_rvv(s) &&
+   vext_check_isa_ill(s) &&
+   vext_check_ss(s, a->rd, a->rs2, a->vm);
+}
+
+GEN_OPIV_TRANS(vbrev8_v, zvkb_opiv_check)
+GEN_OPIV_TRANS(vrev8_v, zvkb_opiv_check)
 GEN_OPIV_TRANS(vbrev_v, zvbb_opiv_check)
 GEN_OPIV_TRANS(vclz_v, zvbb_opiv_check)
 GEN_OPIV_TRANS(vctz_v, zvbb_opiv_check)
-- 
2.34.1

[PATCH v2 02/14] target/riscv: Expose Zvkt extension property

2023-10-26 Thread Max Chou

Signed-off-by: Max Chou 
---
 target/riscv/cpu.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/target/riscv/cpu.c b/target/riscv/cpu.c
index a2881bfa383..5099c786415 100644
--- a/target/riscv/cpu.c
+++ b/target/riscv/cpu.c
@@ -131,6 +131,7 @@ const RISCVIsaExtData isa_edata_arr[] = {
 ISA_EXT_DATA_ENTRY(zvknhb, PRIV_VERSION_1_12_0, ext_zvknhb),
 ISA_EXT_DATA_ENTRY(zvksed, PRIV_VERSION_1_12_0, ext_zvksed),
 ISA_EXT_DATA_ENTRY(zvksh, PRIV_VERSION_1_12_0, ext_zvksh),
+ISA_EXT_DATA_ENTRY(zvkt, PRIV_VERSION_1_12_0, ext_zvkt),
 ISA_EXT_DATA_ENTRY(zhinx, PRIV_VERSION_1_12_0, ext_zhinx),
 ISA_EXT_DATA_ENTRY(zhinxmin, PRIV_VERSION_1_12_0, ext_zhinxmin),
 ISA_EXT_DATA_ENTRY(smaia, PRIV_VERSION_1_12_0, ext_smaia),
@@ -1375,6 +1376,7 @@ const RISCVCPUMultiExtConfig riscv_cpu_experimental_exts[] = {
 MULTI_EXT_CFG_BOOL("x-zvknhb", ext_zvknhb, false),
 MULTI_EXT_CFG_BOOL("x-zvksed", ext_zvksed, false),
 MULTI_EXT_CFG_BOOL("x-zvksh", ext_zvksh, false),
+MULTI_EXT_CFG_BOOL("x-zvkt", ext_zvkt, false),
 
 DEFINE_PROP_END_OF_LIST(),
 };
-- 
2.34.1

[PATCH v2 01/14] target/riscv: Add cfg property for Zvkt extension

2023-10-26 Thread Max Chou

The vector crypto spec defines the Zvkt extension, which includes all of
the instructions of the Zvbb and Zvbc extensions and some other vector
instructions.

Signed-off-by: Max Chou 
---
 target/riscv/cpu_cfg.h | 1 +
 target/riscv/tcg/tcg-cpu.c | 5 +
 2 files changed, 6 insertions(+)

diff --git a/target/riscv/cpu_cfg.h b/target/riscv/cpu_cfg.h
index e7ce977189c..d8d17dedeed 100644
--- a/target/riscv/cpu_cfg.h
+++ b/target/riscv/cpu_cfg.h
@@ -94,6 +94,7 @@ struct RISCVCPUConfig {
 bool ext_zvknhb;
 bool ext_zvksed;
 bool ext_zvksh;
+bool ext_zvkt;
 bool ext_zmmul;
 bool ext_zvfbfmin;
 bool ext_zvfbfwma;
diff --git a/target/riscv/tcg/tcg-cpu.c b/target/riscv/tcg/tcg-cpu.c
index c5ff03efce9..b9eaecb699c 100644
--- a/target/riscv/tcg/tcg-cpu.c
+++ b/target/riscv/tcg/tcg-cpu.c
@@ -499,6 +499,11 @@ void riscv_cpu_validate_set_extensions(RISCVCPU *cpu, Error **errp)
 return;
 }
 
+if (cpu->cfg.ext_zvkt) {
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbb), true);
+cpu_cfg_ext_auto_update(cpu, CPU_CFG_OFFSET(ext_zvbc), true);
+}
+
 /*
  * In principle Zve*x would also suffice here, were they supported
  * in qemu
-- 
2.34.1
