[xen-unstable-smoke test] 183960: tolerable all pass - PUSHED

2023-11-30 Thread osstest service owner
flight 183960 xen-unstable-smoke real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183960/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  def73fc14407252cc801f35cd7746e60ccd70884
baseline version:
 xen  f0dd0cd9598f22ee5509bb5d1466e4821834c4ba

Last test of basis   183939  2023-11-29 18:02:12 Z1 days
Testing same since   183960  2023-12-01 04:00:28 Z0 days1 attempts


People who touched revisions under test:
  Michal Orzel 
  Nicola Vetrini 
  Simone Ballarin 
  Stefano Stabellini 

jobs:
 build-arm64-xsm  pass
 build-amd64  pass
 build-armhf  pass
 build-amd64-libvirt  pass
 test-armhf-armhf-xl  pass
 test-arm64-arm64-xl-xsm  pass
 test-amd64-amd64-xl-qemuu-debianhvm-amd64pass
 test-amd64-amd64-libvirt pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/xen.git
   f0dd0cd959..def73fc144  def73fc14407252cc801f35cd7746e60ccd70884 -> smoke



Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Harsh Prateek Bora




On 12/1/23 01:57, Stefan Hajnoczi wrote:

On Thu, Nov 30, 2023 at 10:14:47AM +0100, Ilya Leoshkevich wrote:

On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:

The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
instead, it is already widely used and unambiguous.

Signed-off-by: Stefan Hajnoczi 
---
  include/qemu/main-loop.h  | 20 ++--
  hw/i386/kvm/xen_evtchn.c  | 14 +++---
  hw/i386/kvm/xen_gnttab.c  |  2 +-
  hw/mips/mips_int.c    |  2 +-
  hw/ppc/ppc.c  |  2 +-
  target/i386/kvm/xen-emu.c |  2 +-
  target/ppc/excp_helper.c  |  2 +-
  target/ppc/helper_regs.c  |  2 +-
  target/riscv/cpu_helper.c |  4 ++--
  9 files changed, 25 insertions(+), 25 deletions(-)

diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
index d6f75e57bd..0b6a3e4824 100644
--- a/include/qemu/main-loop.h
+++ b/include/qemu/main-loop.h
@@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int
line);
  void qemu_bql_unlock(void);
  
  /**

- * QEMU_IOTHREAD_LOCK_GUARD
+ * QEMU_BQL_LOCK_GUARD
   *
- * Wrap a block of code in a conditional
qemu_mutex_{lock,unlock}_iothread.
+ * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
   */
-typedef struct IOThreadLockAuto IOThreadLockAuto;
+typedef struct BQLLockAuto BQLLockAuto;
  
-static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char

*file,
+static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
  int line)


The padding is not correct anymore.


Good point, I didn't check the formatting after search-and-replace. I
will fix this across the patch series in v2.



Yeh, some comments in 5/6 and 6/6 can also make full use of 80 char 
width after search-replace effect.


regards,
Harsh


Stefan




Re: [XEN PATCH 7/7] xen/page_alloc: deviate first_valid_mfn for MISRA C Rule 8.4

2023-11-30 Thread Jan Beulich
On 01.12.2023 03:47, Stefano Stabellini wrote:
> On Wed, 29 Nov 2023, Nicola Vetrini wrote:
>> No functional change.
>>
>> Signed-off-by: Nicola Vetrini 
>> ---
>> The preferred way to deviate is to use asmlinkage, but this modification is 
>> only
>> the consequence of NUMA on ARM (and possibly PPC) being a work in progress.
>> As stated in the comment above the textual deviation, first_valid_mfn will
>> likely then become static and there would be no need for the comment anymore.
>> This works towards having the analysis for this rule clean (i.e. no 
>> violations);
>> the interest in having a clean rule is that then it could be used to signal
>> newly introduced violations by making the analysis job fail.
> 
> Please add this text as part of the commit message. It can be done on
> commit.

I assume you saw my reply on another of the patches in this series as to
asmlinkage use on variables? IOW I think this paragraph would also need
adjustment to account for that.

> With that:
> 
> Acked-by: Stefano Stabellini 
> 
> 
>> ---
>>  xen/common/page_alloc.c | 1 +
>>  1 file changed, 1 insertion(+)
>>
>> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
>> index 9b5df74fddab..794d7689b179 100644
>> --- a/xen/common/page_alloc.c
>> +++ b/xen/common/page_alloc.c
>> @@ -258,6 +258,7 @@ static PAGE_LIST_HEAD(page_broken_list);
>>   * first_valid_mfn is exported because it is use in ARM specific NUMA
>>   * helpers. See comment in arch/arm/include/asm/numa.h.
>>   */
>> +/* SAF-1-safe */
>>  mfn_t first_valid_mfn = INVALID_MFN_INITIALIZER;
>>  
>>  struct bootmem_region {
>> -- 
>> 2.34.1
>>




Re: [PATCH v3 02/10] xen/version: Introduce non-truncating deterministically-signed XENVER_* subops

2023-11-30 Thread Jan Beulich
On 30.11.2023 23:30, Stefano Stabellini wrote:
> Hi everyone following this thread,
> 
> please see:
> https://marc.info/?l=xen-devel=170135718323946
> https://cryptpad.fr/form/#/2/form/view/7ByH95Vd7KiDOvN4wjV5iUGlMuZbkVdwk7cYpZdluWo/
> 
> For a vote on the usage of the word "broken"

So I did vote before becoming aware of this context. I would have voted
differently if I had known that this _alone_ is the context. Yet then
I'm also not going to change my vote, because as written _there_ it is
intended to be more general. If the wording of the text describing what
to vote on changed, things would be different.

Jan

> On Tue, 15 Aug 2023, Andrew Cooper wrote:
>> Recently in XenServer, we have encountered problems caused by both
>> XENVER_extraversion and XENVER_commandline having fixed bounds.
>>
>> More than just the invariant size, the APIs/ABIs also broken by typedef-ing 
>> an
>> array, and using an unqualified 'char' which has implementation-specific
>> signed-ness.
>>
>> Provide brand new ops, which are capable of expressing variable length
>> strings, and mark the older ops as broken.
>>
>> This fixes all issues around XENVER_extraversion being longer than 15 chars.
>> Further work beyond just this API is needed to remove other assumptions about
>> XENVER_commandline being 1023 chars long.
>>
>> Signed-off-by: Andrew Cooper 
>> Reviewed-by: Jason Andryuk 
>> ---
>> CC: George Dunlap 
>> CC: Jan Beulich 
>> CC: Stefano Stabellini 
>> CC: Wei Liu 
>> CC: Julien Grall 
>> CC: Daniel De Graaf 
>> CC: Daniel Smith 
>> CC: Jason Andryuk 
>> CC: Henry Wang 
>>
>> v3:
>>  * Modify dummy.h's xsm_xen_version() in the same way as flask.
>> v2:
>>  * Remove xen_capabilities_info_t from the stack now that arch_get_xen_caps()
>>has gone.
>>  * Use an arbitrary limit check much lower than INT_MAX.
>>  * Use "buf" rather than "string" terminology.
>>  * Expand the API comment.
>>
>> Tested by forcing XENVER_extraversion to be 20 chars long, and confirming 
>> that
>> an untruncated version can be obtained.
>> ---
>>  xen/common/kernel.c  | 62 +++
>>  xen/include/public/version.h | 63 ++--
>>  xen/include/xlat.lst |  1 +
>>  xen/include/xsm/dummy.h  |  3 ++
>>  xen/xsm/flask/hooks.c|  4 +++
>>  5 files changed, 131 insertions(+), 2 deletions(-)
>>
>> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
>> index f822480a8ef3..79c008c7ee5f 100644
>> --- a/xen/common/kernel.c
>> +++ b/xen/common/kernel.c
>> @@ -24,6 +24,7 @@
>>  CHECK_build_id;
>>  CHECK_compile_info;
>>  CHECK_feature_info;
>> +CHECK_varbuf;
>>  #endif
>>  
>>  enum system_state system_state = SYS_STATE_early_boot;
>> @@ -498,6 +499,59 @@ static int __init cf_check param_init(void)
>>  __initcall(param_init);
>>  #endif
>>  
>> +static long xenver_varbuf_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>> +{
>> +struct xen_varbuf user_str;
>> +const char *str = NULL;
>> +size_t sz;
>> +
>> +switch ( cmd )
>> +{
>> +case XENVER_extraversion2:
>> +str = xen_extra_version();
>> +break;
>> +
>> +case XENVER_changeset2:
>> +str = xen_changeset();
>> +break;
>> +
>> +case XENVER_commandline2:
>> +str = saved_cmdline;
>> +break;
>> +
>> +case XENVER_capabilities2:
>> +str = xen_cap_info;
>> +break;
>> +
>> +default:
>> +ASSERT_UNREACHABLE();
>> +return -ENODATA;
>> +}
>> +
>> +sz = strlen(str);
>> +
>> +if ( sz > KB(64) ) /* Arbitrary limit.  Avoid long-running operations. 
>> */
>> +return -E2BIG;
>> +
>> +if ( guest_handle_is_null(arg) ) /* Length request */
>> +return sz;
>> +
>> +if ( copy_from_guest(_str, arg, 1) )
>> +return -EFAULT;
>> +
>> +if ( user_str.len == 0 )
>> +return -EINVAL;
>> +
>> +if ( sz > user_str.len )
>> +return -ENOBUFS;
>> +
>> +if ( copy_to_guest_offset(arg, offsetof(struct xen_varbuf, buf),
>> +  str, sz) )
>> +return -EFAULT;
>> +
>> +return sz;
>> +}
>> +
>>  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>>  {
>>  bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
>> @@ -711,6 +765,14 @@ long do_xen_version(int cmd, 
>> XEN_GUEST_HANDLE_PARAM(void) arg)
>>  
>>  return sz;
>>  }
>> +
>> +case XENVER_extraversion2:
>> +case XENVER_capabilities2:
>> +case XENVER_changeset2:
>> +case XENVER_commandline2:
>> +if ( deny )
>> +return -EPERM;
>> +return xenver_varbuf_op(cmd, arg);
>>  }
>>  
>>  return -ENOSYS;
>> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
>> index cbc4ef7a46e6..0dd6bbcb43cc 100644
>> --- a/xen/include/public/version.h
>> +++ b/xen/include/public/version.h
>> @@ -19,12 +19,20 @@
>>  /* arg == NULL; returns major:minor (16:16). */
>>  #define XENVER_version  0
>>  
>> -/* arg == 

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-11-30 Thread Jan Beulich
On 30.11.2023 23:57, Tamas K Lengyel wrote:
> I think this form is bad and is not helpful. We ought to be able to
> recommend an alternative term beside "broken" and "deprecated". I
> would not use the term broken in this context but that also doesn't
> mean we shouldn't use it in any context. But also in this context
> deprecated is not the right term to use either since deprecated would
> require us to actually make the old hypercalls stop working altogether
> at some future point, which we won't ever do AFAIU. My vote would be
> to use the term superseded in this context, which I can't express
> clearly in the form so I'm not going to cast a vote.

+1

This really supports my earlier voiced concern that reducing things too
much for "ease" of voting isn't helpful. (That said, I don't share
Tamas'es view on the use of "deprecated".)

Jan

> On Thu, Nov 30, 2023 at 5:28 PM Stefano Stabellini
>  wrote:
>>
>> Hi all,
>>
>> This vote is in the context of this thread:
>> https://marc.info/?l=xen-devel=169213351810075
>>
>>
>> On Thu, 30 Nov 2023, Kelly Choi wrote:
>>> Hi all,
>>> There have been a few discussions about how we use documentation wording 
>>> within the community. Whilst there are differences in opinions and
>>> perceptions of the definition, it would be helpful to see a wider consensus 
>>> of how we feel.
>>>
>>> Discussion: Should we use the term 'broken' in our documentation, or do you 
>>> think an alternative wording would be better? If you agree or
>>> disagree, please vote as this will impact future discussions.
>>>
>>> I have purposely made the vote between two options to help us move in a 
>>> forward direction.
>>>
>>> PLEASE VOTE HERE. Deadline 15th December 2023.
>>> Your name will be required but will be private. If you answer anonymously, 
>>> your vote will not count. This is to ensure it is fair and each
>>> person gets one vote.
>>>
>>> As an open-source project, we need to come to a common ground, which 
>>> sometimes means we may not personally agree. To make this fair, please
>>> note the final results will be used to determine our future actions within 
>>> the community.
>>>
>>> If the majority votes for/against, we will respect the majority and 
>>> implement this accordingly.
>>>
>>> Many thanks,
>>> Kelly Choi
>>>
>>> Xen Project Community Manager
>>> XenServer, Cloud Software Group
>>>
>>>




Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

2023-11-30 Thread Jan Beulich
On 30.11.2023 18:06, Stewart Hildebrand wrote:
> On 11/30/23 03:33, Jan Beulich wrote:
>> On 30.11.2023 03:47, Stewart Hildebrand wrote:
>>> On 11/14/23 04:13, Jan Beulich wrote:
 On 13.11.2023 23:21, Stewart Hildebrand wrote:
> --- a/xen/arch/x86/include/asm/domain.h
> +++ b/xen/arch/x86/include/asm/domain.h
> @@ -503,6 +503,8 @@ struct arch_domain
>  #define has_vpit(d)(!!((d)->arch.emulation_flags & X86_EMU_PIT))
>  #define has_pirq(d)(!!((d)->arch.emulation_flags & 
> X86_EMU_USE_PIRQ))
>  
> +#define arch_needs_vpci(d) ({ (void)(d); false; })

 See my comments on the v5 thread on both this and ...
>>>
>>> So, the goal here is to return true for a PVH dom0, and false otherwise 
>>> (for now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) 
>>> returns true for PVH, how about the following?
>>>
>>> /* TODO: re-visit when vPCI is enabled for PVH domUs. */
>>> #define arch_needs_vpci(d) ({   \
>>> const struct domain *_d = (d);  \
>>> is_hardware_domain(_d) && is_hvm_domain(_d); })
>>
>> Looks okay to me, except for the leading underscore in _d (see respective
>> Misra guidelines, merely re-enforcing what the C standard says).
> 
> Right. If I'm interpreting the standards correctly, it looks like a trailing 
> underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 
> (yet?).

Adopting the respective Misra rule would only affirm that we should adhere
to the C spec. Us being free-standing (and hence no runtime library involved)
may have been an argument towards more relaxed treatment, but imo never was a
good justification. And yes, trailing underscores is what I have been
recommending, but some of the other maintainers don't really like them
(without, iirc, indicating what else to use as an underlying naming scheme).

Jan



Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

2023-11-30 Thread Jan Beulich
On 30.11.2023 18:37, Roger Pau Monné wrote:
> On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
>> On 28.11.2023 11:03, Roger Pau Monne wrote:
>>> The minimal function size requirements for livepatch are either 5 bytes (for
>>> jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at 
>>> least
>>> that size by requesting the compiled to align the functions to 8 or 16 
>>> bytes,
>>> depending on whether Xen is build with IBT support.
>>
>> How is alignment going to enforce minimum function size? If a function is
>> last in a section, there may not be any padding added (ahead of linking at
>> least). The trailing padding also isn't part of the function.
> 
> If each function lives in it's own section (by using
> -ffunction-sections), and each section is aligned, then I think we can
> guarantee that there will always be enough padding space?
> 
> Even the last function/section on the .text block would still be
> aligned, and as long as the function alignment <= SECTION_ALIGN
> there will be enough padding left.  I should add some build time
> assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.

I'm not sure of there being a requirement for a section to be padded to
its alignment. If the following section has smaller alignment, it could
be made start earlier. Of course our linker scripts might guarantee
this ...

> While the trailing padding is not part of the function itself, I don't
> see issues in overwriting it with the replacement function code.

If it's really padding, yes.

Jan



Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Eric Farman
On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:
> The Big QEMU Lock (BQL) has many names and they are confusing. The
> actual QemuMutex variable is called qemu_global_mutex but it's
> commonly
> referred to as the BQL in discussions and some code comments. The
> locking APIs, however, are called qemu_mutex_lock_iothread() and
> qemu_mutex_unlock_iothread().
> 
> The "iothread" name is historic and comes from when the main thread
> was
> split into into KVM vcpu threads and the "iothread" (now called the
> main
> loop thread). I have contributed to the confusion myself by
> introducing
> a separate --object iothread, a separate concept unrelated to the
> BQL.
> 
> The "iothread" name is no longer appropriate for the BQL. Rename the
> locking APIs to:
> - void qemu_bql_lock(void)
> - void qemu_bql_unlock(void)
> - bool qemu_bql_locked(void)
> 
> There are more APIs with "iothread" in their names. Subsequent
> patches
> will rename them. There are also comments and documentation that will
> be
> updated in later patches.
> 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Eric Farman 



Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Harsh Prateek Bora

Hi Stefan,

On 11/30/23 02:56, Stefan Hajnoczi wrote:

The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void qemu_bql_lock(void)
- void qemu_bql_unlock(void)
- bool qemu_bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 
---
  include/block/aio-wait.h |   2 +-
  include/qemu/main-loop.h |  26 +++---
  accel/accel-blocker.c|  10 +--
  accel/dummy-cpus.c   |   8 +-
  accel/hvf/hvf-accel-ops.c|   4 +-
  accel/kvm/kvm-accel-ops.c|   4 +-
  accel/kvm/kvm-all.c  |  22 ++---
  accel/tcg/cpu-exec.c |  26 +++---
  accel/tcg/cputlb.c   |  16 ++--
  accel/tcg/tcg-accel-ops-icount.c |   4 +-
  accel/tcg/tcg-accel-ops-mttcg.c  |  12 +--
  accel/tcg/tcg-accel-ops-rr.c |  14 ++--
  accel/tcg/tcg-accel-ops.c|   2 +-
  accel/tcg/translate-all.c|   2 +-
  cpu-common.c |   4 +-
  dump/dump.c  |   4 +-
  hw/core/cpu-common.c |   6 +-
  hw/i386/intel_iommu.c|   6 +-
  hw/i386/kvm/xen_evtchn.c |  16 ++--
  hw/i386/kvm/xen_overlay.c|   2 +-
  hw/i386/kvm/xen_xenstore.c   |   2 +-
  hw/intc/arm_gicv3_cpuif.c|   2 +-
  hw/intc/s390_flic.c  |  18 ++--
  hw/misc/edu.c|   4 +-
  hw/misc/imx6_src.c   |   2 +-
  hw/misc/imx7_src.c   |   2 +-
  hw/net/xen_nic.c |   8 +-
  hw/ppc/pegasos2.c|   2 +-
  hw/ppc/ppc.c |   4 +-
  hw/ppc/spapr.c   |   2 +-
  hw/ppc/spapr_rng.c   |   4 +-
  hw/ppc/spapr_softmmu.c   |   4 +-
  hw/remote/mpqemu-link.c  |  12 +--
  hw/remote/vfio-user-obj.c|   2 +-
  hw/s390x/s390-skeys.c|   2 +-
  migration/block-dirty-bitmap.c   |   4 +-
  migration/block.c|  16 ++--
  migration/colo.c |  60 +++---
  migration/dirtyrate.c|  12 +--
  migration/migration.c|  52 ++--
  migration/ram.c  |  12 +--
  replay/replay-internal.c |   2 +-
  semihosting/console.c|   8 +-
  stubs/iothread-lock.c|   6 +-
  system/cpu-throttle.c|   4 +-
  system/cpus.c|  28 +++
  system/dirtylimit.c  |   4 +-
  system/memory.c  |   2 +-
  system/physmem.c |   8 +-
  system/runstate.c|   2 +-
  system/watchpoint.c  |   4 +-
  target/arm/arm-powerctl.c|  14 ++--
  target/arm/helper.c  |   4 +-
  target/arm/hvf/hvf.c |   8 +-
  target/arm/kvm.c |   4 +-
  target/arm/kvm64.c   |   4 +-
  target/arm/ptw.c |   6 +-
  target/arm/tcg/helper-a64.c  |   8 +-
  target/arm/tcg/m_helper.c|   4 +-
  target/arm/tcg/op_helper.c   |  24 +++---
  target/arm/tcg/psci.c|   2 +-
  target/hppa/int_helper.c |   8 +-
  target/i386/hvf/hvf.c|   6 +-
  target/i386/kvm/hyperv.c |   4 +-
  target/i386/kvm/kvm.c|  28 +++
  target/i386/kvm/xen-emu.c|  14 ++--
  target/i386/nvmm/nvmm-accel-ops.c|   4 +-
  target/i386/nvmm/nvmm-all.c  |  20 ++---
  target/i386/tcg/sysemu/fpu_helper.c  |   6 +-
  target/i386/tcg/sysemu/misc_helper.c |   4 +-
  target/i386/whpx/whpx-accel-ops.c|   4 +-
  target/i386/whpx/whpx-all.c  |  24 +++---
  target/loongarch/csr_helper.c|   4 +-
  target/mips/kvm.c|   4 +-
  target/mips/tcg/sysemu/cp0_helper.c  |   4 +-
  target/openrisc/sys_helper.c |  16 ++--
  target/ppc/excp_helper.c |  12 +--
  target/ppc/kvm.c |   4 +-
  target/ppc/misc_helper.c |   8 +-
  target/ppc/timebase_helper.c |   8 +-
  target/s390x/kvm/kvm.c   |   4 +-
  

[xen-unstable test] 183952: tolerable FAIL - PUSHED

2023-11-30 Thread osstest service owner
flight 183952 xen-unstable real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183952/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt   16 saverestore-support-check fail blocked in 183938
 test-amd64-amd64-xl-qemuu-win7-amd64 19 guest-stop  fail blocked in 183938
 test-amd64-amd64-xl-qemut-win7-amd64 19 guest-stopfail like 183938
 test-amd64-i386-xl-qemuu-win7-amd64 19 guest-stop fail like 183938
 test-amd64-amd64-xl-qemuu-ws16-amd64 19 guest-stopfail like 183938
 test-amd64-i386-xl-qemut-ws16-amd64 19 guest-stop fail like 183938
 test-armhf-armhf-libvirt-raw 15 saverestore-support-checkfail  like 183938
 test-amd64-i386-xl-qemut-win7-amd64 19 guest-stop fail like 183938
 test-amd64-amd64-xl-qemut-ws16-amd64 19 guest-stopfail like 183938
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183938
 test-amd64-amd64-qemuu-nested-amd 20 debian-hvm-install/l1/l2 fail like 183938
 test-amd64-i386-xl-qemuu-ws16-amd64 19 guest-stop fail like 183938
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-i386-xl-pvshim14 guest-start  fail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit1  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-xsm  16 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-thunderx 16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-multivcpu 15 migrate-support-checkfail  never pass
 test-armhf-armhf-xl-multivcpu 16 saverestore-support-checkfail  never pass
 test-arm64-arm64-xl-credit2  15 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-credit2  16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-xl-arndale  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-arndale  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit1  16 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  14 migrate-support-checkfail   never pass
 test-arm64-arm64-xl-vhd  15 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-rtds 16 saverestore-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  15 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-credit2  16 saverestore-support-checkfail   never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-xl-vhd  14 migrate-support-checkfail   never pass
 test-armhf-armhf-xl-vhd  15 saverestore-support-checkfail   never pass

version targeted for testing:
 xen  f0dd0cd9598f22ee5509bb5d1466e4821834c4ba
baseline version:
 xen  902377b690f42ddf44ae91c4b0751d597f1cd694

Last test of basis   183938  2023-11-29 16:37:13 Z1 days
Testing same since   183952  2023-11-30 12:38:47 Z0 days1 attempts


People who touched revisions under test:
  Julien Grall 
  Nicola Vetrini 

jobs:
 

Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

2023-11-30 Thread Stefano Stabellini
On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
> > On Fri, 24 Nov 2023, Jiqian Chen wrote:
> > > This patch is to solve two problems we encountered when we try to
> > > passthrough a device to hvm domU base on Xen PVH dom0.
> > > 
> > > First, hvm guest will alloc a pirq and irq for a passthrough device
> > > by using gsi, before that, the gsi must first has a mapping in dom0,
> > > see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call
> > > into Xen and check whether dom0 has the mapping. See
> > > XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH
> > > dom0 and it return irq is 0, and then return -EPERM.
> > > This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
> > > when thay are enabled.
> > > 
> > > Second, in PVH dom0, the gsi of a passthrough device doesn't get
> > > registered, but gsi must be configured for it to be able to be
> > > mapped into a domU.
> > > 
> > > After searching codes, we can find map_pirq and register_gsi will be
> > > done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when
> > > the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems
> > > can be conclude to that the gsi of a passthrough device doesn't be
> > > unmasked.
> > > 
> > > To solve the unmaske problem, this patch call the unmask_irq when we
> > > assign a device to be passthrough. So that the gsi can get registered
> > > and mapped in PVH dom0.
> > 
> > 
> > Roger, this seems to be more of a Xen issue than a Linux issue. Why do
> > we need the unmask check in Xen? Couldn't we just do:
> > 
> > 
> > diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> > index 4e40d3609a..df262a4a18 100644
> > --- a/xen/arch/x86/hvm/vioapic.c
> > +++ b/xen/arch/x86/hvm/vioapic.c
> > @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
> >  hvm_dpci_eoi(d, gsi);
> >  }
> >  
> > -if ( is_hardware_domain(d) && unmasked )
> > +if ( is_hardware_domain(d) )
> >  {
> >  /*
> >   * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock
> 
> There are some issues with this approach.
> 
> mp_register_gsi() will only setup the trigger and polarity of the
> IO-APIC pin once, so we do so once the guest unmask the pin in order
> to assert that the configuration is the intended one.  A guest is
> allowed to write all kind of nonsense stuff to the IO-APIC RTE, but
> that doesn't take effect unless the pin is unmasked.
> 
> Overall the question would be whether we have any guarantees that
> the hardware domain has properly configured the pin, even if it's not
> using it itself (as it hasn't been unmasked).
> 
> IIRC PCI legacy interrupts are level triggered and low polarity, so we
> could configure any pins that are not setup at bind time?

That could work.

Another idea is to move only the call to allocate_and_map_gsi_pirq at
bind time? That might be enough to pass a pirq_access_permitted check.

Re: [RFC XEN PATCH v2 3/3] tools: Add new function to get gsi from irq

2023-11-30 Thread Stefano Stabellini
On Thu, 30 Nov 2023, Roger Pau Monné wrote:
> On Wed, Nov 29, 2023 at 08:02:40PM -0800, Stefano Stabellini wrote:
> > n Wed, 29 Nov 2023, Stefano Stabellini wrote:
> > > On Tue, 28 Nov 2023, Roger Pau Monné wrote:
> > > > On Fri, Nov 24, 2023 at 06:41:36PM +0800, Jiqian Chen wrote:
> > > > > In PVH dom0, it uses the linux local interrupt mechanism,
> > > > > when it allocs irq for a gsi, it is dynamic, and follow
> > > > > the principle of applying first, distributing first. And
> > > > > if you debug the kernel codes, you will find the irq
> > > > > number is alloced from small to large, but the applying
> > > > > gsi number is not, may gsi 38 comes before gsi 28, that
> > > > > causes the irq number is not equal with the gsi number.
> > > > > And when we passthrough a device, QEMU will use its gsi
> > > > > number to do mapping actions, see xen_pt_realize->
> > > > > xc_physdev_map_pirq, but the gsi number is got from file
> > > > > /sys/bus/pci/devices/:xx:xx.x/irq in current code,
> > > > > so it will fail when mapping.
> > > > > And in current codes, there is no method to translate
> > > > > irq to gsi for userspace.
> > > > 
> > > > I think it would be cleaner to just introduce a new sysfs node that
> > > > contains the gsi if a device is using one (much like the irq sysfs
> > > > node)?
> > > > 
> > > > Such ioctl to translate from IRQ to GSI has nothing to do with Xen, so
> > > > placing it in privcmd does seem quite strange to me.  I understand
> > > > that for passthrough we need the GSI, but such translation layer from
> > > > IRQ to GSI is all Linux internal, and it would be much simpler to just
> > > > expose the GSI in sysfs IMO.
> > > 
> > > Maybe something to add to drivers/xen/sys-hypervisor.c in Linux.
> > > Juergen what do you think?
> > 
> > Let me also add that privcmd.c is already a Linux specific interface.
> > Although it was born to be a Xen hypercall "proxy" in reality today we
> > have a few privcmd ioctls that don't translate into hypercalls. So I
> > don't think that adding IOCTL_PRIVCMD_GSI_FROM_IRQ would be a problem.
> 
> Maybe not all ioctls translate to hypercalls (I guess you are
> referring to the IOCTL_PRIVCMD_RESTRICT ioctl), but they are specific
> Xen actions.  Getting the GSI used by a device has nothing do to with
> Xen.
> 
> IMO drivers/xen/sys-hypervisor.c is also not appropriate, but I'm not
> the maintainer of any of those components.
> 
> There's nothing Xen specific about fetching the GSI associated with a
> PCI device.  The fact that Xen needs it for passthrough is just a red
> herring, further cases where the GSI is needed might arise outside of
> Xen, and hence such node would better be placed in a generic
> location.  The right location should be /sys/bus/pci/devices//gsi.

That might be true but /sys/bus/pci/devices//gsi is a non-Xen
generic interface and the maintainers of that portion of Linux code
might have a different opinion. We'll have to see.

[ovmf test] 183958: all pass - PUSHED

2023-11-30 Thread osstest service owner
flight 183958 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183958/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 534021965f6f7c417610add53984f39d6945bbcf
baseline version:
 ovmf 26d484d0867b03ebd8a1ecdd9895f17e96732503

Last test of basis   183956  2023-11-30 19:44:47 Z0 days
Testing same since   183958  2023-12-01 01:15:11 Z0 days1 attempts


People who touched revisions under test:
  Zhi Jin 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   26d484d086..534021965f  534021965f6f7c417610add53984f39d6945bbcf -> 
xen-tested-master



Re: [XEN PATCH v3] automation/eclair: improve scheduled analyses

2023-11-30 Thread Stefano Stabellini
On Thu, 30 Nov 2023, Simone Ballarin wrote:
> The scheduled analyses are intended to maintain an overall vision
> of the MISRA complaince of the entire project. For this reason,
> the file exclusions in "out_of_scope.ecl" should not be applied.
> 
> This patch amends ECLAIR settings to prevent exempting files for
> scheduled analyses.
> 
> Signed-off-by: Simone Ballarin 

Acked-by: Stefano Stabellini 


> ---
> Changes in v3:
> - fix guard for inclusion of out_of_scope.ecl.
> Changes in v2:
> - drop changes to inhibit test and build stages in scheduled pipelines.
> ---
>  automation/eclair_analysis/ECLAIR/action.settings |  2 +-
>  automation/eclair_analysis/ECLAIR/analysis.ecl| 12 ++--
>  automation/gitlab-ci/analyze.yaml |  2 ++
>  3 files changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/automation/eclair_analysis/ECLAIR/action.settings 
> b/automation/eclair_analysis/ECLAIR/action.settings
> index f96368ffc7..3cba1a3afb 100644
> --- a/automation/eclair_analysis/ECLAIR/action.settings
> +++ b/automation/eclair_analysis/ECLAIR/action.settings
> @@ -134,7 +134,7 @@ push)
>  badgeLabel="ECLAIR ${ANALYSIS_KIND} ${ref}${variantHeadline} #${jobId}"
>  ;;
>  auto_pull_request)
> -git remote remove autoPRRemote || true
> +git remote remove autoPRRemote 2>/dev/null || true
>  git remote add autoPRRemote "${autoPRRemoteUrl}"
>  git fetch -q autoPRRemote
>  subDir="${ref}"
> diff --git a/automation/eclair_analysis/ECLAIR/analysis.ecl 
> b/automation/eclair_analysis/ECLAIR/analysis.ecl
> index fe418d6da1..f8d4cc8c99 100644
> --- a/automation/eclair_analysis/ECLAIR/analysis.ecl
> +++ b/automation/eclair_analysis/ECLAIR/analysis.ecl
> @@ -2,7 +2,13 @@
>  -project_name=getenv("ECLAIR_PROJECT_NAME")
>  -project_root=getenv("ECLAIR_PROJECT_ROOT")
>  
> --setq=data_dir,getenv("ECLAIR_DATA_DIR")
> +setq(data_dir,getenv("ECLAIR_DATA_DIR"))
> +setq(analysis_kind,getenv("ANALYSIS_KIND"))
> +setq(scheduled_analysis,nil)
> +
> +strings_map("scheduled-analysis",500,"","^.*scheduled$",0,setq(scheduled_analysis,t))
> +strings_map("scheduled-analysis",500,"","^.*$",0)
> +map_strings("scheduled-analysis",analysis_kind)
>  
>  -verbose
>  
> @@ -15,7 +21,9 @@
>  
>  -eval_file=toolchain.ecl
>  -eval_file=public_APIs.ecl
> --eval_file=out_of_scope.ecl
> +if(not(scheduled_analysis),
> +eval_file("out_of_scope.ecl")
> +)
>  -eval_file=deviations.ecl
>  -eval_file=call_properties.ecl
>  -eval_file=tagging.ecl
> diff --git a/automation/gitlab-ci/analyze.yaml 
> b/automation/gitlab-ci/analyze.yaml
> index bd9a68de31..6631db53fa 100644
> --- a/automation/gitlab-ci/analyze.yaml
> +++ b/automation/gitlab-ci/analyze.yaml
> @@ -28,6 +28,8 @@
>extends: .eclair-analysis
>allow_failure: true
>rules:
> +- if: $CI_PIPELINE_SOURCE == "schedule"
> +  when: never
>  - if: $WTOKEN && $CI_PROJECT_PATH =~ /^xen-project\/people\/.*$/
>when: manual
>  - !reference [.eclair-analysis, rules]
> -- 
> 2.34.1
> 



Re: [XEN PATCH v2] automation/eclair: add deviations for MISRA C:2012 Rule 5.6

2023-11-30 Thread Stefano Stabellini
On Wed, 29 Nov 2023, Federico Serafini wrote:
> Update ECLAIR configuration to take into account the adopted files
> and type "ret_t".
> Update docs/misra/deviations.rst accordingly.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Stefano Stabellini 




Re: [XEN PATCH 7/7] xen/page_alloc: deviate first_valid_mfn for MISRA C Rule 8.4

2023-11-30 Thread Stefano Stabellini
On Wed, 29 Nov 2023, Nicola Vetrini wrote:
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 
> ---
> The preferred way to deviate is to use asmlinkage, but this modification is 
> only
> the consequence of NUMA on ARM (and possibly PPC) being a work in progress.
> As stated in the comment above the textual deviation, first_valid_mfn will
> likely then become static and there would be no need for the comment anymore.
> This works towards having the analysis for this rule clean (i.e. no 
> violations);
> the interest in having a clean rule is that then it could be used to signal
> newly introduced violations by making the analysis job fail.

Please add this text as part of the commit message. It can be done on
commit. With that:

Acked-by: Stefano Stabellini 


> ---
>  xen/common/page_alloc.c | 1 +
>  1 file changed, 1 insertion(+)
> 
> diff --git a/xen/common/page_alloc.c b/xen/common/page_alloc.c
> index 9b5df74fddab..794d7689b179 100644
> --- a/xen/common/page_alloc.c
> +++ b/xen/common/page_alloc.c
> @@ -258,6 +258,7 @@ static PAGE_LIST_HEAD(page_broken_list);
>   * first_valid_mfn is exported because it is use in ARM specific NUMA
>   * helpers. See comment in arch/arm/include/asm/numa.h.
>   */
> +/* SAF-1-safe */
>  mfn_t first_valid_mfn = INVALID_MFN_INITIALIZER;
>  
>  struct bootmem_region {
> -- 
> 2.34.1
> 



Re: [XEN PATCH 4/7] x86/viridian: make build_assertions static

2023-11-30 Thread Stefano Stabellini
On Wed, 29 Nov 2023, Nicola Vetrini wrote:
> This is consistent with other instances of the same function
> and also resolves a violation of MISRA C:2012 Rule 8.4.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 


Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 1/7] xen/arm: mmu: add headers for missing declarations

2023-11-30 Thread Stefano Stabellini
On Wed, 29 Nov 2023, Nicola Vetrini wrote:
> The definitions needing the inclusion of asm/setup.h are
> boot_{first,second,third}(_id)?, whereas vmap.h is needed by 
> arch_vmap_virt_end.
> 
> No functional change.
> 
> Signed-off-by: Nicola Vetrini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 11/11] xen/serial: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 10/11] xen/perfc: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 09/11] xen/param: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 08/11] xen/kernel: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 06/11] xen/notifier: address violations of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter names. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH 05/11] xen/domain: address violations of MISRA C:2012 Rule 8.2

2023-11-30 Thread Stefano Stabellini
On Fri, 24 Nov 2023, Federico Serafini wrote:
> Add missing parameter names. No functional change.
> 
> Signed-off-by: Federico Serafini 

Reviewed-by: Stefano Stabellini 



Re: [XEN PATCH] docs/misra: fix a typo in rules.rst

2023-11-30 Thread Stefano Stabellini
On Thu, 30 Nov 2023, Nicola Vetrini wrote:
> No functional changes.
> 
> Signed-off-by: Nicola Vetrini 

Acked-by: Stefano Stabellini 

> ---
>  docs/misra/rules.rst | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
> index 53dab0070c7b..75921b9a3463 100644
> --- a/docs/misra/rules.rst
> +++ b/docs/misra/rules.rst
> @@ -531,7 +531,7 @@ maintainers if you want to suggest a change.
> the future or with another compiler.  For these reasons we discourage
> the introduction of new reserved identifiers in Xen, and we see it as
> positive the reduction of reserved identifiers. At the same time,
> -   certain identifiers starting with wo underscores are also commonly 
> used
> +   certain identifiers starting with two underscores are also commonly 
> used
> in Linux (e.g. __set_bit) and we don't think it would be an 
> improvement
> to rename them.
>  
> -- 
> 2.34.1
> 



Re: [PATCH v10 12/17] vpci/header: emulate PCI_COMMAND register for guests

2023-11-30 Thread Volodymyr Babchuk

Hi Roger, Stewart,

Roger Pau Monné  writes:

> On Thu, Oct 12, 2023 at 10:09:18PM +, Volodymyr Babchuk wrote:
>> From: Oleksandr Andrushchenko 
>> 
>> Xen and/or Dom0 may have put values in PCI_COMMAND which they expect
>> to remain unaltered. PCI_COMMAND_SERR bit is a good example: while the
>> guest's view of this will want to be zero initially, the host having set
>> it to 1 may not easily be overwritten with 0, or else we'd effectively
>> imply giving the guest control of the bit. Thus, PCI_COMMAND register needs
>> proper emulation in order to honor host's settings.
>> 
>> According to "PCI LOCAL BUS SPECIFICATION, REV. 3.0", section "6.2.2
>> Device Control" the reset state of the command register is typically 0,
>> so when assigning a PCI device use 0 as the initial state for the guest's 
>> view
>> of the command register.
>> 
>> Here is the full list of command register bits with notes about
>> emulation, along with QEMU behavior in the same situation:
>> 
>> PCI_COMMAND_IO - QEMU does not allow a guest to change value of this bit
>> in real device. Instead it is always set to 1. A guest can write to this
>> register, but writes are ignored.
>> 
>> PCI_COMMAND_MEMORY - QEMU behaves exactly as with PCI_COMMAND_IO. In
>> Xen case, we handle writes to this bit by mapping/unmapping BAR
>> regions. For devices assigned to DomUs, memory decoding will be
>> disabled and the initialization.
>> 
>> PCI_COMMAND_MASTER - Allow guest to control it. QEMU passes through
>> writes to this bit.
>> 
>> PCI_COMMAND_SPECIAL - Guest can generate special cycles only if it has
>> access to host bridge that supports software generation of special
>> cycles. In our case guest has no access to host bridges at all. Value
>> after reset is 0. QEMU passes through writes of this bit, we will do
>> the same.
>> 
>> PCI_COMMAND_INVALIDATE - Allows "Memory Write and Invalidate" commands
>> to be generated. It requires additional configuration via Cacheline
>> Size register. We are not emulating this register right now and we
>> can't expect guest to properly configure it. QEMU "emulates" access to
>> Cachline Size register by ignoring all writes to it. QEMU passes through
>> writes of PCI_COMMAND_INVALIDATE bit, we will do the same.
>> 
>> PCI_COMMAND_VGA_PALETTE - Enable VGA palette snooping. QEMU passes
>> through writes of this bit, we will do the same.
>> 
>> PCI_COMMAND_PARITY - Controls how device response to parity
>> errors. QEMU ignores writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_WAIT - Reserved. Should be 0, but QEMU passes
>> through writes of this bit, so we will do the same.
>> 
>> PCI_COMMAND_SERR - Controls if device can assert SERR. QEMU ignores
>> writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_FAST_BACK - Optional bit that allows fast back-to-back
>> transactions. It is configured by firmware, so we don't want guest to
>> control it. QEMU ignores writes to this bit, we will do the same.
>> 
>> PCI_COMMAND_INTX_DISABLE - Disables INTx signals. If MSI(X) is
>> enabled, device is prohibited from asserting INTx as per
>> specification. Value after reset is 0. In QEMU case, it checks of INTx
>> was mapped for a device. If it is not, then guest can't control
>> PCI_COMMAND_INTX_DISABLE bit. In our case, we prohibit a guest to
>> change value of this bit if MSI(X) is enabled.
>
> FWIW, bits 11-15 are RsvdP, so we will need to add support for them
> after the series from Stewart that adds support for register masks is
> accepted.

Stewart's series implement much better register handling than this
patch. What about dropping this change at all in favor of Stewart's
implementation? I'll be 100% okay with this.

[...]

-- 
WBR, Volodymyr

Re: [RFC PATCH v2 17/19] heki: x86: Update permissions counters during text patching

2023-11-30 Thread Edgecombe, Rick P
On Wed, 2023-11-29 at 15:07 -0600, Madhavan T. Venkataraman wrote:
> Threat Model
> 
> 
> In the threat model in Heki, the attacker is a user space attacker
> who exploits
> a kernel vulnerability to gain more privileges or bypass the kernel's
> access
> control and self-protection mechanisms. 
> 
> In the context of the guest page table, one of the things that the
> threat model translates
> to is a hacker gaining access to a guest page with RWX permissions.
> E.g., by adding execute
> permissions to a writable page or by adding write permissions to an
> executable page.
> 
> Today, the permissions for a guest page in the extended page table
> are RWX by
> default. So, if a hacker manages to establish RWX for a page in the
> guest page
> table, then that is all he needs to do some damage.

I had a few random comments from watching the plumbers talk online:

Is there really a big difference between a page that is RWX, and a RW
page that is about to become RX? I realize that there is an addition of
timing, but when executable code is getting loaded it can be written to
then and later executed. I think that gap could be addressed in two
different ways, both pretty difficult:
 1. Verifying the loaded code before it gets marked 
executable. This is difficult because the kernel does lots of 
tweaks on the code it is loading (alternatives, etc). It can't 
just check a signature.
 2. Loading the code in a protected environment. In this model the 
(for example) module signature would be checked, then the code 
would be loaded in some sort of protected environment. This way 
integrity of the loaded code would be enforced. But extracting 
module loading into a separate domain would be difficult. 
Various scattered features all have their hands in the loading.

Secondly, I wonder if another way to look at the memory parts of HEKI
could be that this is a way to protect certain page table bits from
stay writes. The RWX bits in the EPT are not directly writable, so more
steps are needed to change things than just a stray write (instead the
helpers involved in the operations need to be called). If that is a
fair way of looking at it, then I wonder how HEKI compares to a
solution like this security-wise:
https://lore.kernel.org/lkml/20210830235927.6443-1-rick.p.edgeco...@intel.com/

Functional-wise it had the benefit of working on bare metal and
supporting the normal kernel features.


Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-11-30 Thread Tamas K Lengyel
Hi all,
I think this form is bad and is not helpful. We ought to be able to
recommend an alternative term beside "broken" and "deprecated". I
would not use the term broken in this context but that also doesn't
mean we shouldn't use it in any context. But also in this context
deprecated is not the right term to use either since deprecated would
require us to actually make the old hypercalls stop working altogether
at some future point, which we won't ever do AFAIU. My vote would be
to use the term superseded in this context, which I can't express
clearly in the form so I'm not going to cast a vote.

Tamas

On Thu, Nov 30, 2023 at 5:28 PM Stefano Stabellini
 wrote:
>
> Hi all,
>
> This vote is in the context of this thread:
> https://marc.info/?l=xen-devel=169213351810075
>
>
> On Thu, 30 Nov 2023, Kelly Choi wrote:
> > Hi all,
> > There have been a few discussions about how we use documentation wording 
> > within the community. Whilst there are differences in opinions and
> > perceptions of the definition, it would be helpful to see a wider consensus 
> > of how we feel.
> >
> > Discussion: Should we use the term 'broken' in our documentation, or do you 
> > think an alternative wording would be better? If you agree or
> > disagree, please vote as this will impact future discussions.
> >
> > I have purposely made the vote between two options to help us move in a 
> > forward direction.
> >
> > PLEASE VOTE HERE. Deadline 15th December 2023.
> > Your name will be required but will be private. If you answer anonymously, 
> > your vote will not count. This is to ensure it is fair and each
> > person gets one vote.
> >
> > As an open-source project, we need to come to a common ground, which 
> > sometimes means we may not personally agree. To make this fair, please
> > note the final results will be used to determine our future actions within 
> > the community.
> >
> > If the majority votes for/against, we will respect the majority and 
> > implement this accordingly.
> >
> > Many thanks,
> > Kelly Choi
> >
> > Xen Project Community Manager
> > XenServer, Cloud Software Group
> >
> >



Re: [PATCH 10/12] scsi: remove outdated AioContext lock comment

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:51PM -0500, Stefan Hajnoczi wrote:
> The SCSI subsystem no longer uses the AioContext lock. Request
> processing runs exclusively in the BlockBackend's AioContext since
> "scsi: only access SCSIDevice->requests from one thread" and hence the
> lock is unnecessary.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  hw/scsi/scsi-disk.c | 1 -
>  1 file changed, 1 deletion(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 09/12] docs: remove AioContext lock from IOThread docs

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:50PM -0500, Stefan Hajnoczi wrote:
> Encourage the use of locking primitives and stop mentioning the
> AioContext lock since it is being removed.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  docs/devel/multiple-iothreads.txt | 45 +++
>  1 file changed, 15 insertions(+), 30 deletions(-)
> 
> diff --git a/docs/devel/multiple-iothreads.txt 
> b/docs/devel/multiple-iothreads.txt
> index a3e949f6b3..4865196bde 100644
> --- a/docs/devel/multiple-iothreads.txt

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v3 02/10] xen/version: Introduce non-truncating deterministically-signed XENVER_* subops

2023-11-30 Thread Stefano Stabellini
Hi everyone following this thread,

please see:
https://marc.info/?l=xen-devel=170135718323946
https://cryptpad.fr/form/#/2/form/view/7ByH95Vd7KiDOvN4wjV5iUGlMuZbkVdwk7cYpZdluWo/

For a vote on the usage of the word "broken"


On Tue, 15 Aug 2023, Andrew Cooper wrote:
> Recently in XenServer, we have encountered problems caused by both
> XENVER_extraversion and XENVER_commandline having fixed bounds.
> 
> More than just the invariant size, the APIs/ABIs also broken by typedef-ing an
> array, and using an unqualified 'char' which has implementation-specific
> signed-ness.
> 
> Provide brand new ops, which are capable of expressing variable length
> strings, and mark the older ops as broken.
> 
> This fixes all issues around XENVER_extraversion being longer than 15 chars.
> Further work beyond just this API is needed to remove other assumptions about
> XENVER_commandline being 1023 chars long.
> 
> Signed-off-by: Andrew Cooper 
> Reviewed-by: Jason Andryuk 
> ---
> CC: George Dunlap 
> CC: Jan Beulich 
> CC: Stefano Stabellini 
> CC: Wei Liu 
> CC: Julien Grall 
> CC: Daniel De Graaf 
> CC: Daniel Smith 
> CC: Jason Andryuk 
> CC: Henry Wang 
> 
> v3:
>  * Modify dummy.h's xsm_xen_version() in the same way as flask.
> v2:
>  * Remove xen_capabilities_info_t from the stack now that arch_get_xen_caps()
>has gone.
>  * Use an arbitrary limit check much lower than INT_MAX.
>  * Use "buf" rather than "string" terminology.
>  * Expand the API comment.
> 
> Tested by forcing XENVER_extraversion to be 20 chars long, and confirming that
> an untruncated version can be obtained.
> ---
>  xen/common/kernel.c  | 62 +++
>  xen/include/public/version.h | 63 ++--
>  xen/include/xlat.lst |  1 +
>  xen/include/xsm/dummy.h  |  3 ++
>  xen/xsm/flask/hooks.c|  4 +++
>  5 files changed, 131 insertions(+), 2 deletions(-)
> 
> diff --git a/xen/common/kernel.c b/xen/common/kernel.c
> index f822480a8ef3..79c008c7ee5f 100644
> --- a/xen/common/kernel.c
> +++ b/xen/common/kernel.c
> @@ -24,6 +24,7 @@
>  CHECK_build_id;
>  CHECK_compile_info;
>  CHECK_feature_info;
> +CHECK_varbuf;
>  #endif
>  
>  enum system_state system_state = SYS_STATE_early_boot;
> @@ -498,6 +499,59 @@ static int __init cf_check param_init(void)
>  __initcall(param_init);
>  #endif
>  
> +static long xenver_varbuf_op(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
> +{
> +struct xen_varbuf user_str;
> +const char *str = NULL;
> +size_t sz;
> +
> +switch ( cmd )
> +{
> +case XENVER_extraversion2:
> +str = xen_extra_version();
> +break;
> +
> +case XENVER_changeset2:
> +str = xen_changeset();
> +break;
> +
> +case XENVER_commandline2:
> +str = saved_cmdline;
> +break;
> +
> +case XENVER_capabilities2:
> +str = xen_cap_info;
> +break;
> +
> +default:
> +ASSERT_UNREACHABLE();
> +return -ENODATA;
> +}
> +
> +sz = strlen(str);
> +
> +if ( sz > KB(64) ) /* Arbitrary limit.  Avoid long-running operations. */
> +return -E2BIG;
> +
> +if ( guest_handle_is_null(arg) ) /* Length request */
> +return sz;
> +
> +if ( copy_from_guest(_str, arg, 1) )
> +return -EFAULT;
> +
> +if ( user_str.len == 0 )
> +return -EINVAL;
> +
> +if ( sz > user_str.len )
> +return -ENOBUFS;
> +
> +if ( copy_to_guest_offset(arg, offsetof(struct xen_varbuf, buf),
> +  str, sz) )
> +return -EFAULT;
> +
> +return sz;
> +}
> +
>  long do_xen_version(int cmd, XEN_GUEST_HANDLE_PARAM(void) arg)
>  {
>  bool_t deny = !!xsm_xen_version(XSM_OTHER, cmd);
> @@ -711,6 +765,14 @@ long do_xen_version(int cmd, 
> XEN_GUEST_HANDLE_PARAM(void) arg)
>  
>  return sz;
>  }
> +
> +case XENVER_extraversion2:
> +case XENVER_capabilities2:
> +case XENVER_changeset2:
> +case XENVER_commandline2:
> +if ( deny )
> +return -EPERM;
> +return xenver_varbuf_op(cmd, arg);
>  }
>  
>  return -ENOSYS;
> diff --git a/xen/include/public/version.h b/xen/include/public/version.h
> index cbc4ef7a46e6..0dd6bbcb43cc 100644
> --- a/xen/include/public/version.h
> +++ b/xen/include/public/version.h
> @@ -19,12 +19,20 @@
>  /* arg == NULL; returns major:minor (16:16). */
>  #define XENVER_version  0
>  
> -/* arg == xen_extraversion_t. */
> +/*
> + * arg == xen_extraversion_t.
> + *
> + * This API/ABI is broken.  Use XENVER_extraversion2 where possible.
> + */
>  #define XENVER_extraversion 1
>  typedef char xen_extraversion_t[16];
>  #define XEN_EXTRAVERSION_LEN (sizeof(xen_extraversion_t))
>  
> -/* arg == xen_compile_info_t. */
> +/*
> + * arg == xen_compile_info_t.
> + *
> + * This API/ABI is broken and truncates data.
> + */
>  #define XENVER_compile_info 2
>  struct xen_compile_info {
>  char compiler[64];
> @@ -34,10 +42,20 @@ struct 

Re: INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-11-30 Thread Stefano Stabellini
Hi all,

This vote is in the context of this thread:
https://marc.info/?l=xen-devel=169213351810075


On Thu, 30 Nov 2023, Kelly Choi wrote:
> Hi all, 
> There have been a few discussions about how we use documentation wording 
> within the community. Whilst there are differences in opinions and
> perceptions of the definition, it would be helpful to see a wider consensus 
> of how we feel. 
> 
> Discussion: Should we use the term 'broken' in our documentation, or do you 
> think an alternative wording would be better? If you agree or
> disagree, please vote as this will impact future discussions. 
> 
> I have purposely made the vote between two options to help us move in a 
> forward direction.
> 
> PLEASE VOTE HERE. Deadline 15th December 2023.
> Your name will be required but will be private. If you answer anonymously, 
> your vote will not count. This is to ensure it is fair and each
> person gets one vote. 
> 
> As an open-source project, we need to come to a common ground, which 
> sometimes means we may not personally agree. To make this fair, please
> note the final results will be used to determine our future actions within 
> the community. 
> 
> If the majority votes for/against, we will respect the majority and implement 
> this accordingly. 
> 
> Many thanks,
> Kelly Choi
> 
> Xen Project Community Manager
> XenServer, Cloud Software Group
> 
> 

[ovmf test] 183956: all pass - PUSHED

2023-11-30 Thread osstest service owner
flight 183956 ovmf real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183956/

Perfect :-)
All tests in this flight passed as required
version targeted for testing:
 ovmf 26d484d0867b03ebd8a1ecdd9895f17e96732503
baseline version:
 ovmf b4f8c75e316e74750d5806b8ebf2bd11a3d62626

Last test of basis   183946  2023-11-30 07:41:01 Z0 days
Testing same since   183956  2023-11-30 19:44:47 Z0 days1 attempts


People who touched revisions under test:
  Michael D Kinney 
  Michael Kubacki 

jobs:
 build-amd64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-xl-qemuu-ovmf-amd64 pass
 test-amd64-i386-xl-qemuu-ovmf-amd64  pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, is at
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README.email;hb=master
http://xenbits.xen.org/gitweb/?p=osstest.git;a=blob;f=README;hb=master

Test harness code can be found at
http://xenbits.xen.org/gitweb?p=osstest.git;a=summary


Pushing revision :

To xenbits.xen.org:/home/xen/git/osstest/ovmf.git
   b4f8c75e31..26d484d086  26d484d0867b03ebd8a1ecdd9895f17e96732503 -> 
xen-tested-master



Re: [PATCH 08/12] aio: remove aio_context_acquire()/aio_context_release() API

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:49PM -0500, Stefan Hajnoczi wrote:
> Delete these functions because nothing calls these functions anymore.
> 
> I introduced these APIs in commit 98563fc3ec44 ("aio: add
> aio_context_acquire() and aio_context_release()") in 2014. It's with a
> sigh of relief that I delete these APIs almost 10 years later.
> 
> Thanks to Paolo Bonzini's vision for multi-queue QEMU, we got an
> understanding of where the code needed to go in order to remove the
> limitations that the original dataplane and the IOThread/AioContext
> approach that followed it.
> 
> Emanuele Giuseppe Esposito had the splendid determination to convert
> large parts of the codebase so that they no longer needed the AioContext
> lock. This was a painstaking process, both in the actual code changes
> required and the iterations of code review that Emanuele eeked out of

s/eeked/eked/

> Kevin and me over many months.
> 
> Kevin Wolf tackled multitudes of graph locking conversions to protect
> in-flight I/O from run-time changes to the block graph as well as the
> clang Thread Safety Analysis annotations that allow the compiler to
> check whether the graph lock is being used correctly.
> 
> And me, well, I'm just here to add some pizzazz to the QEMU multi-queue
> block layer :). Thank you to everyone who helped with this effort,
> including Eric Blake, code reviewer extraordinaire, and others who I've
> forgotten to mention.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio.h | 17 -
>  util/async.c| 10 --
>  2 files changed, 27 deletions(-)
>

Yay!

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread BALATON Zoltan

On Thu, 30 Nov 2023, Stefan Hajnoczi wrote:


On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:

On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:

The Big QEMU Lock (BQL) has many names and they are confusing. The
actual QemuMutex variable is called qemu_global_mutex but it's commonly
referred to as the BQL in discussions and some code comments. The
locking APIs, however, are called qemu_mutex_lock_iothread() and
qemu_mutex_unlock_iothread().

The "iothread" name is historic and comes from when the main thread was
split into into KVM vcpu threads and the "iothread" (now called the main
loop thread). I have contributed to the confusion myself by introducing
a separate --object iothread, a separate concept unrelated to the BQL.

The "iothread" name is no longer appropriate for the BQL. Rename the
locking APIs to:
- void qemu_bql_lock(void)
- void qemu_bql_unlock(void)
- bool qemu_bql_locked(void)

There are more APIs with "iothread" in their names. Subsequent patches
will rename them. There are also comments and documentation that will be
updated in later patches.

Signed-off-by: Stefan Hajnoczi 


Acked-by: Peter Xu 

Two nickpicks:

  - BQL contains "QEMU" as the 2nd character, so maybe easier to further
rename qemu_bql into bql_?


Philippe wondered whether the variable name should end with _mutex (or
_lock is common too), so an alternative might be big_qemu_lock. That's
imperfect because it doesn't start with the usual qemu_ prefix.
qemu_big_lock is better in that regard but inconsistent with our BQL
abbreviation.


BQL isn't very specific for those unfamiliar with the code but it's short 
and already used and known by people so I'm OK with qemu_bql with some 
comments and docs explainig here and there what bql stands for should be 
enough for new people to quickly find out. If we want to be more verbose 
how about "qemu_global_mutex" which is self describing but longer and does 
not resemble BQL so then comments may be needed to explain this is what 
was called BQL as well. I don't mind either way though.


Regards,
BALATON Zoltan


I don't like putting an underscore at the end. It's unusual and would
make me wonder what that means.

Naming is hard, but please discuss and I'm open to change to BQL
variable's name to whatever we all agree on.



  - Could we keep the full spell of BQL at some places, so people can still
reference it if not familiar?  IIUC most of the BQL helpers will root
back to the major three functions (_lock, _unlock, _locked), perhaps
add a comment of "BQL stands for..." over these three functions as
comment?


Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
of these functions.

Stefan





Re: [PATCH 07/12] aio-wait: draw equivalence between AIO_WAIT_WHILE() and AIO_WAIT_WHILE_UNLOCKED()

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:48PM -0500, Stefan Hajnoczi wrote:
> Now that the AioContext lock no longer exists, AIO_WAIT_WHILE() and
> AIO_WAIT_WHILE_UNLOCKED() are equivalent.
> 
> A future patch will get rid of AIO_WAIT_WHILE_UNLOCKED().
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/block/aio-wait.h | 16 
>  1 file changed, 4 insertions(+), 12 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 06/12] scsi: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:47PM -0500, Stefan Hajnoczi wrote:
> The AioContext lock no longer has any effect. Remove it.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio-scsi.h | 14 --
>  hw/scsi/scsi-bus.c  |  2 --
>  hw/scsi/scsi-disk.c | 28 
>  hw/scsi/virtio-scsi.c   | 18 --
>  4 files changed, 4 insertions(+), 58 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 05/12] block: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:46PM -0500, Stefan Hajnoczi wrote:
> This is the big patch that removes
> aio_context_acquire()/aio_context_release() from the block layer and
> affected block layer users.
> 
> There isn't a clean way to split this patch and the reviewers are likely
> the same group of people, so I decided to do it in one patch.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---

> +++ b/block.c
> @@ -7585,29 +7433,12 @@ void coroutine_fn bdrv_co_leave(BlockDriverState *bs, 
> AioContext *old_ctx)
>  
>  void coroutine_fn bdrv_co_lock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -/* In the main thread, bs->aio_context won't change concurrently */
> -assert(qemu_get_current_aio_context() == qemu_get_aio_context());
> -
> -/*
> - * We're in coroutine context, so we already hold the lock of the main
> - * loop AioContext. Don't lock it twice to avoid deadlocks.
> - */
> -assert(qemu_in_coroutine());

Is this assertion worth keeping in the short term?...

> -if (ctx != qemu_get_aio_context()) {
> -aio_context_acquire(ctx);
> -}
> +/* TODO removed in next patch */
>  }

...I guess I'll see in the next patch.

>  
>  void coroutine_fn bdrv_co_unlock(BlockDriverState *bs)
>  {
> -AioContext *ctx = bdrv_get_aio_context(bs);
> -
> -assert(qemu_in_coroutine());
> -if (ctx != qemu_get_aio_context()) {
> -aio_context_release(ctx);
> -}
> +/* TODO removed in next patch */
>  }

Same comment.

> +++ b/blockdev.c
> @@ -1395,7 +1352,6 @@ static void external_snapshot_action(TransactionAction 
> *action,
>  /* File name of the new image (for 'blockdev-snapshot-sync') */
>  const char *new_image_file;
>  ExternalSnapshotState *state = g_new0(ExternalSnapshotState, 1);
> -AioContext *aio_context;
>  uint64_t perm, shared;
>  
>  /* TODO We'll eventually have to take a writer lock in this function */

I'm guessing removal of the locking gets us one step closer to
implementing this TODO at a later time?  Or is it now a stale comment?
Either way, it doesn't affect this patch.

> +++ b/migration/block.c
> @@ -270,7 +270,6 @@ static int mig_save_device_bulk(QEMUFile *f, 
> BlkMigDevState *bmds)
>  
>  if (bmds->shared_base) {
>  qemu_mutex_lock_iothread();
> -aio_context_acquire(blk_get_aio_context(bb));
...
> @@ -313,22 +311,10 @@ static int mig_save_device_bulk(QEMUFile *f, 
> BlkMigDevState *bmds)
>  block_mig_state.submitted++;
>  blk_mig_unlock();
>  
> -/* We do not know if bs is under the main thread (and thus does
> - * not acquire the AioContext when doing AIO) or rather under
> - * dataplane.  Thus acquire both the iothread mutex and the
> - * AioContext.
> - *
> - * This is ugly and will disappear when we make bdrv_* thread-safe,
> - * without the need to acquire the AioContext.
> - */
> -qemu_mutex_lock_iothread();
> -aio_context_acquire(blk_get_aio_context(bmds->blk));

Will conflict, but with trivial resolution, with your other thread
renaming things to qemu_bql_lock().


> +++ b/tests/unit/test-blockjob.c

> -static void test_complete_in_standby(void)
> -{

> @@ -531,13 +402,5 @@ int main(int argc, char **argv)
>  g_test_add_func("/blockjob/cancel/standby", test_cancel_standby);
>  g_test_add_func("/blockjob/cancel/pending", test_cancel_pending);
>  g_test_add_func("/blockjob/cancel/concluded", test_cancel_concluded);
> -
> -/*
> - * This test is flaky and sometimes fails in CI and otherwise:
> - * don't run unless user opts in via environment variable.
> - */
> -if (getenv("QEMU_TEST_FLAKY_TESTS")) {
> -g_test_add_func("/blockjob/complete_in_standby", 
> test_complete_in_standby);
> -}

Looks like you ripped out this entire test, because it is no longer
viable.  I might have mentioned it in the commit message, or squashed
the removal of this test into the earlier 02/12 patch.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Peter Xu
On Thu, Nov 30, 2023 at 03:43:25PM -0500, Stefan Hajnoczi wrote:
> On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:
> > On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> > > The Big QEMU Lock (BQL) has many names and they are confusing. The
> > > actual QemuMutex variable is called qemu_global_mutex but it's commonly
> > > referred to as the BQL in discussions and some code comments. The
> > > locking APIs, however, are called qemu_mutex_lock_iothread() and
> > > qemu_mutex_unlock_iothread().
> > > 
> > > The "iothread" name is historic and comes from when the main thread was
> > > split into into KVM vcpu threads and the "iothread" (now called the main
> > > loop thread). I have contributed to the confusion myself by introducing
> > > a separate --object iothread, a separate concept unrelated to the BQL.
> > > 
> > > The "iothread" name is no longer appropriate for the BQL. Rename the
> > > locking APIs to:
> > > - void qemu_bql_lock(void)
> > > - void qemu_bql_unlock(void)
> > > - bool qemu_bql_locked(void)
> > > 
> > > There are more APIs with "iothread" in their names. Subsequent patches
> > > will rename them. There are also comments and documentation that will be
> > > updated in later patches.
> > > 
> > > Signed-off-by: Stefan Hajnoczi 
> > 
> > Acked-by: Peter Xu 
> > 
> > Two nickpicks:
> > 
> >   - BQL contains "QEMU" as the 2nd character, so maybe easier to further
> > rename qemu_bql into bql_?
> 
> Philippe wondered whether the variable name should end with _mutex (or
> _lock is common too), so an alternative might be big_qemu_lock. That's

IMHO mutex isn't important in this context, but an implementation detail of
the "lock" as an abstract concept.

For example, we won't need to rename it again then if the impl changes,
e.g. using pure futex or a rwlock replacement.  When that happens we don't
need to change all call sites again.

(never really meant to change the lock impl, just an example.. :)

KVM actually has that example of KVM_MMU_LOCK() macro taking as the rwlock
write lock when the spinlock is replaced with rwlock, while it'll keep to
be the spinlock "lock()" when !KVM_HAVE_MMU_RWLOCK.

> imperfect because it doesn't start with the usual qemu_ prefix.
> qemu_big_lock is better in that regard but inconsistent with our BQL
> abbreviation.
> 
> I don't like putting an underscore at the end. It's unusual and would
> make me wonder what that means.

Ah, I meant replacing the "qemu_bql_" prefix with "bql_", as that contains
QEMU already, rather than making "_" at the end.  So they'll be bql_lock(),
bql_unlock(), bql_locked().

> 
> Naming is hard, but please discuss and I'm open to change to BQL
> variable's name to whatever we all agree on.

I'm pretty okay with qemu_bql_lock(), etc. too.  I prefer a tiny little bit
on bql_ over qemu_bql_ in this regard, but frankly they're all names good
enough to me.  The "qemu_" prefix can still be a good thing saying "this is
a qemu global function", even if contained inside "bql" itself.

> 
> > 
> >   - Could we keep the full spell of BQL at some places, so people can still
> > reference it if not familiar?  IIUC most of the BQL helpers will root
> > back to the major three functions (_lock, _unlock, _locked), perhaps
> > add a comment of "BQL stands for..." over these three functions as
> > comment?
> 
> Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
> of these functions.

Thanks!

-- 
Peter Xu




Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 03:08:49PM -0500, Peter Xu wrote:
> On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> > The Big QEMU Lock (BQL) has many names and they are confusing. The
> > actual QemuMutex variable is called qemu_global_mutex but it's commonly
> > referred to as the BQL in discussions and some code comments. The
> > locking APIs, however, are called qemu_mutex_lock_iothread() and
> > qemu_mutex_unlock_iothread().
> > 
> > The "iothread" name is historic and comes from when the main thread was
> > split into into KVM vcpu threads and the "iothread" (now called the main
> > loop thread). I have contributed to the confusion myself by introducing
> > a separate --object iothread, a separate concept unrelated to the BQL.
> > 
> > The "iothread" name is no longer appropriate for the BQL. Rename the
> > locking APIs to:
> > - void qemu_bql_lock(void)
> > - void qemu_bql_unlock(void)
> > - bool qemu_bql_locked(void)
> > 
> > There are more APIs with "iothread" in their names. Subsequent patches
> > will rename them. There are also comments and documentation that will be
> > updated in later patches.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> 
> Acked-by: Peter Xu 
> 
> Two nickpicks:
> 
>   - BQL contains "QEMU" as the 2nd character, so maybe easier to further
> rename qemu_bql into bql_?

Philippe wondered whether the variable name should end with _mutex (or
_lock is common too), so an alternative might be big_qemu_lock. That's
imperfect because it doesn't start with the usual qemu_ prefix.
qemu_big_lock is better in that regard but inconsistent with our BQL
abbreviation.

I don't like putting an underscore at the end. It's unusual and would
make me wonder what that means.

Naming is hard, but please discuss and I'm open to change to BQL
variable's name to whatever we all agree on.

> 
>   - Could we keep the full spell of BQL at some places, so people can still
> reference it if not familiar?  IIUC most of the BQL helpers will root
> back to the major three functions (_lock, _unlock, _locked), perhaps
> add a comment of "BQL stands for..." over these three functions as
> comment?

Yes, I'll update the doc comments to say "Big QEMU Lock (BQL)" for each
of these functions.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 6/6] Rename "QEMU global mutex" to "BQL" in comments and docs

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:49:48PM +0100, Philippe Mathieu-Daudé wrote:
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "QEMU global mutex" is identical to the more widely used Big
> > QEMU Lock ("BQL"). Update the code comments and documentation to use
> > "BQL" instead of "QEMU global mutex".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/multi-thread-tcg.rst   |  7 +++
> >   docs/devel/qapi-code-gen.rst  |  2 +-
> >   docs/devel/replay.rst |  2 +-
> >   docs/devel/multiple-iothreads.txt | 16 
> >   include/block/blockjob.h  |  6 +++---
> >   include/io/task.h |  2 +-
> >   include/qemu/coroutine-core.h |  2 +-
> >   include/qemu/coroutine.h  |  2 +-
> >   hw/block/dataplane/virtio-blk.c   |  8 
> >   hw/block/virtio-blk.c |  2 +-
> >   hw/scsi/virtio-scsi-dataplane.c   |  6 +++---
> >   net/tap.c |  2 +-
> >   12 files changed, 28 insertions(+), 29 deletions(-)
> 
> 
> > diff --git a/include/block/blockjob.h b/include/block/blockjob.h
> > index e594c10d23..b2bc7c04d6 100644
> > --- a/include/block/blockjob.h
> > +++ b/include/block/blockjob.h
> > @@ -54,7 +54,7 @@ typedef struct BlockJob {
> >   /**
> >* Speed that was set with @block_job_set_speed.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> "under the BQL"
> 
> >*/
> >   int64_t speed;
> > @@ -66,7 +66,7 @@ typedef struct BlockJob {
> >   /**
> >* Block other operations when block job is running.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto,
> 
> >*/
> >   Error *blocker;
> > @@ -89,7 +89,7 @@ typedef struct BlockJob {
> >   /**
> >* BlockDriverStates that are involved in this block job.
> > - * Always modified and read under QEMU global mutex 
> > (GLOBAL_STATE_CODE).
> > + * Always modified and read under BQL (GLOBAL_STATE_CODE).
> 
> Ditto.
> 
> >*/
> >   GSList *nodes;
> >   } BlockJob;

Will fix in v2.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 5/6] Replace "iothread lock" with "BQL" in comments

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:47:49PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The term "iothread lock" is obsolete. The APIs use Big QEMU Lock (BQL)
> > in their names. Update the code comments to use "BQL" instead of
> > "iothread lock".
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   docs/devel/reset.rst |  2 +-
> >   hw/display/qxl.h |  2 +-
> >   include/exec/cpu-common.h|  2 +-
> >   include/exec/memory.h|  4 ++--
> >   include/exec/ramblock.h  |  2 +-
> >   include/migration/register.h |  8 
> >   target/arm/internals.h   |  4 ++--
> >   accel/tcg/cputlb.c   |  4 ++--
> >   accel/tcg/tcg-accel-ops-icount.c |  2 +-
> >   hw/remote/mpqemu-link.c  |  2 +-
> >   migration/block-dirty-bitmap.c   | 10 +-
> >   migration/block.c| 24 
> >   migration/colo.c |  2 +-
> >   migration/migration.c|  2 +-
> >   migration/ram.c  |  4 ++--
> >   system/physmem.c |  6 +++---
> >   target/arm/helper.c  |  2 +-
> >   target/arm/tcg/m_helper.c|  2 +-
> >   ui/spice-core.c  |  2 +-
> >   util/rcu.c   |  2 +-
> >   audio/coreaudio.m|  4 ++--
> >   ui/cocoa.m   |  6 +++---
> >   22 files changed, 49 insertions(+), 49 deletions(-)
> 
> 
> > diff --git a/include/exec/ramblock.h b/include/exec/ramblock.h
> > index 69c6a53902..a2bc0a345d 100644
> > --- a/include/exec/ramblock.h
> > +++ b/include/exec/ramblock.h
> > @@ -34,7 +34,7 @@ struct RAMBlock {
> >   ram_addr_t max_length;
> >   void (*resized)(const char*, uint64_t length, void *host);
> >   uint32_t flags;
> > -/* Protected by iothread lock.  */
> > +/* Protected by BQL.  */
> 
> There is only one single BQL, so preferably:
> 
> "by the BQL"
> 
> >   char idstr[256];
> >   /* RCU-enabled, writes protected by the ramlist lock */
> >   QLIST_ENTRY(RAMBlock) next;
> 
> 
> 
> 
> > -/* Called with iothread lock taken.  */
> > +/* Called with BQL taken.  */
> 
> "with the BQL" (other uses)

I will try to change these for v2. It's a pre-existing issue though
because there was only ever one "iothread lock" too.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 4/6] system/cpus: rename qemu_global_mutex to qemu_bql

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 02:44:07PM +0100, Philippe Mathieu-Daudé wrote:
> Hi Stefan,
> 
> On 29/11/23 22:26, Stefan Hajnoczi wrote:
> > The APIs using qemu_global_mutex now follow the Big QEMU Lock (BQL)
> > nomenclature. It's a little strange that the actual QemuMutex variable
> > that embodies the BQL is called qemu_global_mutex instead of qemu_bql.
> > Rename it for consistency.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >   system/cpus.c | 20 ++--
> >   1 file changed, 10 insertions(+), 10 deletions(-)
> > 
> > diff --git a/system/cpus.c b/system/cpus.c
> > index eb24a4db8e..138720a540 100644
> > --- a/system/cpus.c
> > +++ b/system/cpus.c
> > @@ -65,7 +65,7 @@
> >   #endif /* CONFIG_LINUX */
> > -static QemuMutex qemu_global_mutex;
> > +static QemuMutex qemu_bql;
> 
> I thought we were using _cond/_sem/_mutex suffixes, but
> this is not enforced:

I'm open to alternative names. Here are some I can think of:
- big_qemu_lock (although grepping for "bql" won't find it)
- qemu_bql_mutex

If there is no strong feeling about this then let's leave it at
qemu_bql. Otherwise, please discuss.

Thanks,
Stefan


signature.asc
Description: PGP signature


Re: [PATCH 2/6] qemu/main-loop: rename QEMU_IOTHREAD_LOCK_GUARD to QEMU_BQL_LOCK_GUARD

2023-11-30 Thread Stefan Hajnoczi
On Thu, Nov 30, 2023 at 10:14:47AM +0100, Ilya Leoshkevich wrote:
> On Wed, 2023-11-29 at 16:26 -0500, Stefan Hajnoczi wrote:
> > The name "iothread" is overloaded. Use the term Big QEMU Lock (BQL)
> > instead, it is already widely used and unambiguous.
> > 
> > Signed-off-by: Stefan Hajnoczi 
> > ---
> >  include/qemu/main-loop.h  | 20 ++--
> >  hw/i386/kvm/xen_evtchn.c  | 14 +++---
> >  hw/i386/kvm/xen_gnttab.c  |  2 +-
> >  hw/mips/mips_int.c    |  2 +-
> >  hw/ppc/ppc.c  |  2 +-
> >  target/i386/kvm/xen-emu.c |  2 +-
> >  target/ppc/excp_helper.c  |  2 +-
> >  target/ppc/helper_regs.c  |  2 +-
> >  target/riscv/cpu_helper.c |  4 ++--
> >  9 files changed, 25 insertions(+), 25 deletions(-)
> > 
> > diff --git a/include/qemu/main-loop.h b/include/qemu/main-loop.h
> > index d6f75e57bd..0b6a3e4824 100644
> > --- a/include/qemu/main-loop.h
> > +++ b/include/qemu/main-loop.h
> > @@ -344,13 +344,13 @@ void qemu_bql_lock_impl(const char *file, int
> > line);
> >  void qemu_bql_unlock(void);
> >  
> >  /**
> > - * QEMU_IOTHREAD_LOCK_GUARD
> > + * QEMU_BQL_LOCK_GUARD
> >   *
> > - * Wrap a block of code in a conditional
> > qemu_mutex_{lock,unlock}_iothread.
> > + * Wrap a block of code in a conditional qemu_bql_{lock,unlock}.
> >   */
> > -typedef struct IOThreadLockAuto IOThreadLockAuto;
> > +typedef struct BQLLockAuto BQLLockAuto;
> >  
> > -static inline IOThreadLockAuto *qemu_iothread_auto_lock(const char
> > *file,
> > +static inline BQLLockAuto *qemu_bql_auto_lock(const char *file,
> >  int line)
> 
> The padding is not correct anymore.

Good point, I didn't check the formatting after search-and-replace. I
will fix this across the patch series in v2.

Stefan


signature.asc
Description: PGP signature


Re: [PATCH 1/6] system/cpus: rename qemu_mutex_lock_iothread() to qemu_bql_lock()

2023-11-30 Thread Peter Xu
On Wed, Nov 29, 2023 at 04:26:20PM -0500, Stefan Hajnoczi wrote:
> The Big QEMU Lock (BQL) has many names and they are confusing. The
> actual QemuMutex variable is called qemu_global_mutex but it's commonly
> referred to as the BQL in discussions and some code comments. The
> locking APIs, however, are called qemu_mutex_lock_iothread() and
> qemu_mutex_unlock_iothread().
> 
> The "iothread" name is historic and comes from when the main thread was
> split into into KVM vcpu threads and the "iothread" (now called the main
> loop thread). I have contributed to the confusion myself by introducing
> a separate --object iothread, a separate concept unrelated to the BQL.
> 
> The "iothread" name is no longer appropriate for the BQL. Rename the
> locking APIs to:
> - void qemu_bql_lock(void)
> - void qemu_bql_unlock(void)
> - bool qemu_bql_locked(void)
> 
> There are more APIs with "iothread" in their names. Subsequent patches
> will rename them. There are also comments and documentation that will be
> updated in later patches.
> 
> Signed-off-by: Stefan Hajnoczi 

Acked-by: Peter Xu 

Two nickpicks:

  - BQL contains "QEMU" as the 2nd character, so maybe easier to further
rename qemu_bql into bql_?

  - Could we keep the full spell of BQL at some places, so people can still
reference it if not familiar?  IIUC most of the BQL helpers will root
back to the major three functions (_lock, _unlock, _locked), perhaps
add a comment of "BQL stands for..." over these three functions as
comment?

Please take or ignore these nitpicks; my ACK will stand irrelevant.

Thanks,

-- 
Peter Xu




Re: [PATCH 04/12] graph-lock: remove AioContext locking

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:45PM -0500, Stefan Hajnoczi wrote:
> Stop acquiring/releasing the AioContext lock in
> bdrv_graph_wrlock()/bdrv_graph_unlock() since the lock no longer has any
> effect.
> 
> The distinction between bdrv_graph_wrunlock() and
> bdrv_graph_wrunlock_ctx() becomes meaningless and they can be collapsed
> into one function.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

2023-11-30 Thread Roger Pau Monné
On Thu, Nov 30, 2023 at 05:55:07PM +0100, Jan Beulich wrote:
> On 28.11.2023 11:03, Roger Pau Monne wrote:
> > The minimal function size requirements for livepatch are either 5 bytes (for
> > jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at 
> > least
> > that size by requesting the compiled to align the functions to 8 or 16 
> > bytes,
> > depending on whether Xen is build with IBT support.
> 
> How is alignment going to enforce minimum function size? If a function is
> last in a section, there may not be any padding added (ahead of linking at
> least). The trailing padding also isn't part of the function.

If each function lives in it's own section (by using
-ffunction-sections), and each section is aligned, then I think we can
guarantee that there will always be enough padding space?

Even the last function/section on the .text block would still be
aligned, and as long as the function alignment <= SECTION_ALIGN
there will be enough padding left.  I should add some build time
assert that CONFIG_CC_FUNCTION_ALIGNMENT <= SECTION_ALIGN.

While the trailing padding is not part of the function itself, I don't
see issues in overwriting it with the replacement function code.

> > Note that it's possible for the compiler to end up using a higher function
> > alignment regardless of the passed value, so this change just make sure that
> > the minimum required for livepatch to work is present.
> > 
> > Since the option (-falign-functions) is supported by both minimal required
> > compiler versions of clang and gcc there's no need to add a test to check 
> > for
> > its presence.
> > 
> > The alignment is currently only implemented for livepatch on x86, I'm unsure
> > whether ARM has a mandatory function alignment high enough to cover for the
> > space required by the replacement instruction(s).
> 
> Indeed I was wondering whether this shouldn't be generalized, if indeed
> required.

Well, more than required it's nice to have in order to avoid failures.
The checks in the livepatch code will trigger if there's not enough
space to apply the replacement code.

Thanks, Roger.



Re: [PATCH 03/12] aio: make aio_context_acquire()/aio_context_release() a no-op

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:44PM -0500, Stefan Hajnoczi wrote:
> aio_context_acquire()/aio_context_release() has been replaced by
> fine-grained locking to protect state shared by multiple threads. The
> AioContext lock still plays the role of balancing locking in
> AIO_WAIT_WHILE() and many functions in QEMU either require that the
> AioContext lock is held or not held for this reason. In other words, the
> AioContext lock is purely there for consistency with itself and serves
> no real purpose anymore.
> 
> Stop actually acquiring/releasing the lock in
> aio_context_acquire()/aio_context_release() so that subsequent patches
> can remove callers across the codebase incrementally.
> 
> I have performed "make check" and qemu-iotests stress tests across
> x86-64, ppc64le, and aarch64 to confirm that there are no failures as a
> result of eliminating the lock.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  util/async.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/util/async.c b/util/async.c
> index 8f90ddc304..04ee83d220 100644
> --- a/util/async.c
> +++ b/util/async.c
> @@ -725,12 +725,12 @@ void aio_context_unref(AioContext *ctx)
>  
>  void aio_context_acquire(AioContext *ctx)
>  {
> -qemu_rec_mutex_lock(>lock);
> +/* TODO remove this function */
>  }
>  
>  void aio_context_release(AioContext *ctx)
>  {
> -qemu_rec_mutex_unlock(>lock);
> +/* TODO remove this function */
>  }
>  
>  QEMU_DEFINE_STATIC_CO_TLS(AioContext *, my_aiocontext)
> -- 
> 2.42.0

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[XEN PATCH] docs/misra: fix a typo in rules.rst

2023-11-30 Thread Nicola Vetrini
No functional changes.

Signed-off-by: Nicola Vetrini 
---
 docs/misra/rules.rst | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/docs/misra/rules.rst b/docs/misra/rules.rst
index 53dab0070c7b..75921b9a3463 100644
--- a/docs/misra/rules.rst
+++ b/docs/misra/rules.rst
@@ -531,7 +531,7 @@ maintainers if you want to suggest a change.
the future or with another compiler.  For these reasons we discourage
the introduction of new reserved identifiers in Xen, and we see it as
positive the reduction of reserved identifiers. At the same time,
-   certain identifiers starting with wo underscores are also commonly used
+   certain identifiers starting with two underscores are also commonly used
in Linux (e.g. __set_bit) and we don't think it would be an improvement
to rename them.
 
-- 
2.34.1




Re: [PATCH v6 4/5] [FUTURE] xen/arm: enable vPCI for domUs

2023-11-30 Thread Stewart Hildebrand
On 11/30/23 03:33, Jan Beulich wrote:
> On 30.11.2023 03:47, Stewart Hildebrand wrote:
>> On 11/14/23 04:13, Jan Beulich wrote:
>>> On 13.11.2023 23:21, Stewart Hildebrand wrote:
 --- a/xen/arch/x86/include/asm/domain.h
 +++ b/xen/arch/x86/include/asm/domain.h
 @@ -503,6 +503,8 @@ struct arch_domain
  #define has_vpit(d)(!!((d)->arch.emulation_flags & X86_EMU_PIT))
  #define has_pirq(d)(!!((d)->arch.emulation_flags & 
 X86_EMU_USE_PIRQ))
  
 +#define arch_needs_vpci(d) ({ (void)(d); false; })
>>>
>>> See my comments on the v5 thread on both this and ...
>>
>> So, the goal here is to return true for a PVH dom0, and false otherwise (for 
>> now). Since dom0 can't feasibly be full HVM, and is_hvm_domain(d) returns 
>> true for PVH, how about the following?
>>
>> /* TODO: re-visit when vPCI is enabled for PVH domUs. */
>> #define arch_needs_vpci(d) ({   \
>> const struct domain *_d = (d);  \
>> is_hardware_domain(_d) && is_hvm_domain(_d); })
> 
> Looks okay to me, except for the leading underscore in _d (see respective
> Misra guidelines, merely re-enforcing what the C standard says).

Right. If I'm interpreting the standards correctly, it looks like a trailing 
underscore would work, seeing as we haven't adopted MISRA C:2012 Dir 4.5 (yet?).

> Of course
> the double evaluate_nospec() isn't quite nice in the result, but I guess
> this isn't going to be used in many places?

Only in XEN_DOMCTL_assign_device, as far as I'm aware.

> 
> Jan



[XEN PATCH v3] automation/eclair: improve scheduled analyses

2023-11-30 Thread Simone Ballarin
The scheduled analyses are intended to maintain an overall vision
of the MISRA complaince of the entire project. For this reason,
the file exclusions in "out_of_scope.ecl" should not be applied.

This patch amends ECLAIR settings to prevent exempting files for
scheduled analyses.

Signed-off-by: Simone Ballarin 

---
Changes in v3:
- fix guard for inclusion of out_of_scope.ecl.
Changes in v2:
- drop changes to inhibit test and build stages in scheduled pipelines.
---
 automation/eclair_analysis/ECLAIR/action.settings |  2 +-
 automation/eclair_analysis/ECLAIR/analysis.ecl| 12 ++--
 automation/gitlab-ci/analyze.yaml |  2 ++
 3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/automation/eclair_analysis/ECLAIR/action.settings 
b/automation/eclair_analysis/ECLAIR/action.settings
index f96368ffc7..3cba1a3afb 100644
--- a/automation/eclair_analysis/ECLAIR/action.settings
+++ b/automation/eclair_analysis/ECLAIR/action.settings
@@ -134,7 +134,7 @@ push)
 badgeLabel="ECLAIR ${ANALYSIS_KIND} ${ref}${variantHeadline} #${jobId}"
 ;;
 auto_pull_request)
-git remote remove autoPRRemote || true
+git remote remove autoPRRemote 2>/dev/null || true
 git remote add autoPRRemote "${autoPRRemoteUrl}"
 git fetch -q autoPRRemote
 subDir="${ref}"
diff --git a/automation/eclair_analysis/ECLAIR/analysis.ecl 
b/automation/eclair_analysis/ECLAIR/analysis.ecl
index fe418d6da1..f8d4cc8c99 100644
--- a/automation/eclair_analysis/ECLAIR/analysis.ecl
+++ b/automation/eclair_analysis/ECLAIR/analysis.ecl
@@ -2,7 +2,13 @@
 -project_name=getenv("ECLAIR_PROJECT_NAME")
 -project_root=getenv("ECLAIR_PROJECT_ROOT")
 
--setq=data_dir,getenv("ECLAIR_DATA_DIR")
+setq(data_dir,getenv("ECLAIR_DATA_DIR"))
+setq(analysis_kind,getenv("ANALYSIS_KIND"))
+setq(scheduled_analysis,nil)
+
+strings_map("scheduled-analysis",500,"","^.*scheduled$",0,setq(scheduled_analysis,t))
+strings_map("scheduled-analysis",500,"","^.*$",0)
+map_strings("scheduled-analysis",analysis_kind)
 
 -verbose
 
@@ -15,7 +21,9 @@
 
 -eval_file=toolchain.ecl
 -eval_file=public_APIs.ecl
--eval_file=out_of_scope.ecl
+if(not(scheduled_analysis),
+eval_file("out_of_scope.ecl")
+)
 -eval_file=deviations.ecl
 -eval_file=call_properties.ecl
 -eval_file=tagging.ecl
diff --git a/automation/gitlab-ci/analyze.yaml 
b/automation/gitlab-ci/analyze.yaml
index bd9a68de31..6631db53fa 100644
--- a/automation/gitlab-ci/analyze.yaml
+++ b/automation/gitlab-ci/analyze.yaml
@@ -28,6 +28,8 @@
   extends: .eclair-analysis
   allow_failure: true
   rules:
+- if: $CI_PIPELINE_SOURCE == "schedule"
+  when: never
 - if: $WTOKEN && $CI_PROJECT_PATH =~ /^xen-project\/people\/.*$/
   when: manual
 - !reference [.eclair-analysis, rules]
-- 
2.34.1




Re: [PATCH v2 3/5] xen/x86: introduce self modifying code test

2023-11-30 Thread Jan Beulich
On 28.11.2023 11:03, Roger Pau Monne wrote:
> --- /dev/null
> +++ b/xen/arch/x86/test-smc.c
> @@ -0,0 +1,68 @@
> +/* SPDX-License-Identifier: GPL-2.0 */
> +
> +#include 
> +
> +#include 
> +#include 
> +#include 
> +
> +static bool cf_check test_insn_replacement(void)
> +{
> +#define EXPECTED_VALUE 2
> +unsigned int r = ~EXPECTED_VALUE;

The compiler is permitted to elide the initializer unless ...

> +alternative_io("", "mov $" STR(EXPECTED_VALUE) ", %0",
> +   X86_FEATURE_ALWAYS, "=r"(r));

... you use "+r" here. Also (nit) there's a blank missing between that
string and the opening parethesis. Also what's wrong with passing
EXPECTED_VALUE in as an "i" constraint input operand?

> +return r == EXPECTED_VALUE;
> +#undef EXPECTED_VALUE
> +}
> +
> +int test_smc(uint32_t selection, uint32_t *results)
> +{
> +struct {
> +unsigned int mask;
> +bool (*test)(void);
> +const char *name;
> +} static const tests[] = {
> +{ XEN_SYSCTL_TEST_SMC_INSN_REPL, _insn_replacement,
> +  "alternative instruction replacement" },
> +};
> +unsigned int i;
> +
> +if ( selection & ~XEN_SYSCTL_TEST_SMC_ALL )
> +return -EINVAL;
> +
> +if ( results )
> +*results = 0;
> +
> +printk(XENLOG_INFO "Checking Self Modify Code\n");
> +
> +for ( i = 0; i < ARRAY_SIZE(tests); i++ )
> +{
> +if ( !(selection & tests[i].mask) )
> +continue;
> +
> +if ( tests[i].test() )
> +{
> +if ( results )
> +*results |= tests[i].mask;
> +continue;
> +}
> +
> +add_taint(TAINT_ERROR_SMC);
> +printk(XENLOG_ERR "%s test failed\n", tests[i].name);
> +}
> +
> +return 0;
> +}

I think I need to look at the next patch to make sense of this.

> @@ -1261,6 +1269,7 @@ struct xen_sysctl {
>  struct xen_sysctl_livepatch_op  livepatch;
>  #if defined(__i386__) || defined(__x86_64__)
>  struct xen_sysctl_cpu_policycpu_policy;
> +struct xen_sysctl_test_smc  smc;

Imo the field name would better be test_smc (leaving aside Stefano's comment).

Jan



Re: [XEN PATCH 2/7] x86/i8259: add missing header for init_IRQ declaration

2023-11-30 Thread Nicola Vetrini

On 2023-11-30 17:48, Jan Beulich wrote:

On 29.11.2023 16:24, Nicola Vetrini wrote:

--- a/xen/arch/x86/i8259.c
+++ b/xen/arch/x86/i8259.c
@@ -19,6 +19,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 


A patch doing this (among other things) is already pending: "x86: 
detect

PIC aliasing on ports other than 0x[2A][01]".



Ok

--
Nicola Vetrini, BSc
Software Engineer, BUGSENG srl (https://bugseng.com)



Re: [PATCH v2 1/5] x86/livepatch: set function alignment to ensure minimal function size

2023-11-30 Thread Jan Beulich
On 28.11.2023 11:03, Roger Pau Monne wrote:
> The minimal function size requirements for livepatch are either 5 bytes (for
> jmp) or 9 bytes (for endbr + jmp).  Ensure that functions are always at least
> that size by requesting the compiled to align the functions to 8 or 16 bytes,
> depending on whether Xen is build with IBT support.

How is alignment going to enforce minimum function size? If a function is
last in a section, there may not be any padding added (ahead of linking at
least). The trailing padding also isn't part of the function.

> Note that it's possible for the compiler to end up using a higher function
> alignment regardless of the passed value, so this change just make sure that
> the minimum required for livepatch to work is present.
> 
> Since the option (-falign-functions) is supported by both minimal required
> compiler versions of clang and gcc there's no need to add a test to check for
> its presence.
> 
> The alignment is currently only implemented for livepatch on x86, I'm unsure
> whether ARM has a mandatory function alignment high enough to cover for the
> space required by the replacement instruction(s).

Indeed I was wondering whether this shouldn't be generalized, if indeed
required.

Jan



Re: [XEN PATCH 2/7] x86/i8259: add missing header for init_IRQ declaration

2023-11-30 Thread Jan Beulich
On 29.11.2023 16:24, Nicola Vetrini wrote:
> --- a/xen/arch/x86/i8259.c
> +++ b/xen/arch/x86/i8259.c
> @@ -19,6 +19,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  #include 
>  #include 

A patch doing this (among other things) is already pending: "x86: detect
PIC aliasing on ports other than 0x[2A][01]".

Jan




Re: [XEN PATCH 3/7] xen/x86: add missing instances of asmlinkage attributes

2023-11-30 Thread Jan Beulich
On 29.11.2023 16:24, Nicola Vetrini wrote:
> --- a/xen/arch/x86/desc.c
> +++ b/xen/arch/x86/desc.c
> @@ -91,7 +91,7 @@ seg_desc_t boot_compat_gdt[PAGE_SIZE / sizeof(seg_desc_t)] =
>   * References boot_cpu_gdt_table for a short period, until the CPUs switch
>   * onto their per-CPU GDTs.
>   */
> -const struct desc_ptr boot_gdtr = {
> +const struct desc_ptr asmlinkage boot_gdtr = {
>  .limit = LAST_RESERVED_GDT_BYTE,
>  .base = (unsigned long)(boot_gdt - FIRST_RESERVED_GDT_ENTRY),
>  };

I'm not convinced asmlinkage is okay to use on data. Recall that in principle
it may expand to an attribute specifying a non-default calling convention.
Such attributes cannot be assumed to continue to be possible to apply to
non-functions, even if such may happen to work with a particular compiler
version.

Jan



Re: [XEN PATCH 6/7] xen/x86: remove stale comment

2023-11-30 Thread Jan Beulich
On 29.11.2023 16:24, Nicola Vetrini wrote:
> The comment referred to the declaration for do_mca, which
> now is part of hypercall-defs.h, therefore the comment is stale.

If the comments were stale, the #include-s should also be able to
disappear?

> --- a/xen/arch/x86/cpu/mcheck/mce.c
> +++ b/xen/arch/x86/cpu/mcheck/mce.c
> @@ -14,7 +14,7 @@
>  #include 
>  #include 
>  #include 
> -#include  /* for do_mca */
> +#include 
>  #include 

Here specifically I think the comment isn't stale, as xen/hypercall.h
includes xen/hypercall-defs.h.

> --- a/xen/arch/x86/include/asm/hypercall.h
> +++ b/xen/arch/x86/include/asm/hypercall.h
> @@ -12,7 +12,7 @@
>  #include 
>  #include 
>  #include 
> -#include  /* for do_mca */
> +#include 
>  #include 

Here otoh I'm not even sure this public header (or the others) is (are)
really needed.

Jan



Re: [XEN PATCH v2] automation/eclair: improve scheduled analyses

2023-11-30 Thread Simone Ballarin

On 30/11/23 04:13, Stefano Stabellini wrote:

On Thu, 23 Nov 2023, Simone Ballarin wrote:

The scheduled analyses are intended to maintain an overall vision
of the MISRA complaince of the entire project. For this reason,
the file exclusions in "out_of_scope.ecl" should not be applied.

This patch amends ECLAIR settings to prevent exempting files for
scheduled analyses.

Signed-off-by: Simone Ballarin 
Reviewed-by: Stefano Stabellini 

---
Changes in v2:
- drop changes to inhibit test and build stages in scheduled pipelines.
---
  automation/eclair_analysis/ECLAIR/action.settings |  2 +-
  automation/eclair_analysis/ECLAIR/analysis.ecl| 12 ++--
  automation/gitlab-ci/analyze.yaml |  2 ++
  3 files changed, 13 insertions(+), 3 deletions(-)

diff --git a/automation/eclair_analysis/ECLAIR/action.settings 
b/automation/eclair_analysis/ECLAIR/action.settings
index f96368ffc7..3cba1a3afb 100644
--- a/automation/eclair_analysis/ECLAIR/action.settings
+++ b/automation/eclair_analysis/ECLAIR/action.settings
@@ -134,7 +134,7 @@ push)
  badgeLabel="ECLAIR ${ANALYSIS_KIND} ${ref}${variantHeadline} #${jobId}"
  ;;
  auto_pull_request)
-git remote remove autoPRRemote || true
+git remote remove autoPRRemote 2>/dev/null || true
  git remote add autoPRRemote "${autoPRRemoteUrl}"
  git fetch -q autoPRRemote
  subDir="${ref}"
diff --git a/automation/eclair_analysis/ECLAIR/analysis.ecl 
b/automation/eclair_analysis/ECLAIR/analysis.ecl
index fe418d6da1..2507a8e787 100644
--- a/automation/eclair_analysis/ECLAIR/analysis.ecl
+++ b/automation/eclair_analysis/ECLAIR/analysis.ecl
@@ -2,7 +2,13 @@
  -project_name=getenv("ECLAIR_PROJECT_NAME")
  -project_root=getenv("ECLAIR_PROJECT_ROOT")
  
--setq=data_dir,getenv("ECLAIR_DATA_DIR")

+setq(data_dir,getenv("ECLAIR_DATA_DIR"))
+setq(analysis_kind,getenv("ANALYSIS_KIND"))
+setq(scheduled_analysis,nil)
+
+strings_map("scheduled-analysis",500,"","^.*scheduled$",0,setq(scheduled_analysis,t))
+strings_map("scheduled-analysis",500,"","^.*$",0)
+map_strings("scheduled-analysis",analysis_kind)
  
  -verbose
  
@@ -15,7 +21,9 @@
  
  -eval_file=toolchain.ecl

  -eval_file=public_APIs.ecl
--eval_file=out_of_scope.ecl
+if(scheduled_analysis,
+eval_file("out_of_scope.ecl")
+)



Overall the patch looks much better. Only one question: shouldn't it be

   if not scheduled_analysis

instead of:

   if scheduled_analysis

I don't know the language of analysis.ecl but we are supposed to add
out_of_scope.ecl in all cases except for scheduled_analysis. Looking at
this change it seems to do the opposite?


yes, you're right. I will fix the guard in v3.




  -eval_file=deviations.ecl
  -eval_file=call_properties.ecl
  -eval_file=tagging.ecl
diff --git a/automation/gitlab-ci/analyze.yaml 
b/automation/gitlab-ci/analyze.yaml
index bd9a68de31..6631db53fa 100644
--- a/automation/gitlab-ci/analyze.yaml
+++ b/automation/gitlab-ci/analyze.yaml
@@ -28,6 +28,8 @@
extends: .eclair-analysis
allow_failure: true
rules:
+- if: $CI_PIPELINE_SOURCE == "schedule"
+  when: never
  - if: $WTOKEN && $CI_PROJECT_PATH =~ /^xen-project\/people\/.*$/
when: manual
  - !reference [.eclair-analysis, rules]
--
2.34.1





--
Simone Ballarin, M.Sc.

Field Application Engineer, BUGSENG (https://bugseng.com)




Re: [XEN PATCH 11/11] xen/serial: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 10/11] xen/perfc: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 09/11] xen/param: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 

> --- a/xen/include/xen/param.h
> +++ b/xen/include/xen/param.h
> @@ -22,7 +22,7 @@ struct kernel_param {
>  unsigned int len;
>  union {
>  void *var;
> -int (*func)(const char *);
> +int (*func)(const char *s);

Still I again wonder what good this name does us here.

Jan




[libvirt test] 183944: tolerable all pass - PUSHED

2023-11-30 Thread osstest service owner
flight 183944 libvirt real [real]
http://logs.test-lab.xenproject.org/osstest/logs/183944/

Failures :-/ but no regressions.

Tests which did not succeed, but are not blocking:
 test-armhf-armhf-libvirt-raw 15 saverestore-support-check fail blocked in 
183927
 test-armhf-armhf-libvirt 16 saverestore-support-checkfail  like 183927
 test-armhf-armhf-libvirt-qcow2 15 saverestore-support-check   fail like 183927
 test-amd64-i386-libvirt-xsm  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt  15 migrate-support-checkfail   never pass
 test-amd64-amd64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt 16 saverestore-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 15 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-xsm 16 saverestore-support-checkfail   never pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm 13 migrate-support-check 
fail never pass
 test-armhf-armhf-libvirt 15 migrate-support-checkfail   never pass
 test-amd64-i386-libvirt-raw  14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-arm64-arm64-libvirt-qcow2 15 saverestore-support-checkfail never pass
 test-arm64-arm64-libvirt-raw 14 migrate-support-checkfail   never pass
 test-arm64-arm64-libvirt-raw 15 saverestore-support-checkfail   never pass
 test-amd64-amd64-libvirt-vhd 14 migrate-support-checkfail   never pass
 test-armhf-armhf-libvirt-qcow2 14 migrate-support-checkfail never pass
 test-armhf-armhf-libvirt-raw 14 migrate-support-checkfail   never pass

version targeted for testing:
 libvirt  94ded36b3fdde0c68f47ccfad4afa21fe4996fa5
baseline version:
 libvirt  f3573b5efaa2002c6c9efc379e3f8578c9bbbdf5

Last test of basis   183927  2023-11-29 04:18:51 Z1 days
Testing same since   183944  2023-11-30 04:22:11 Z0 days1 attempts


People who touched revisions under test:
  Michal Privoznik 

jobs:
 build-amd64-xsm  pass
 build-arm64-xsm  pass
 build-i386-xsm   pass
 build-amd64  pass
 build-arm64  pass
 build-armhf  pass
 build-i386   pass
 build-amd64-libvirt  pass
 build-arm64-libvirt  pass
 build-armhf-libvirt  pass
 build-i386-libvirt   pass
 build-amd64-pvopspass
 build-arm64-pvopspass
 build-armhf-pvopspass
 build-i386-pvops pass
 test-amd64-amd64-libvirt-qemuu-debianhvm-amd64-xsm   pass
 test-amd64-i386-libvirt-qemuu-debianhvm-amd64-xsmpass
 test-amd64-amd64-libvirt-xsm pass
 test-arm64-arm64-libvirt-xsm pass
 test-amd64-i386-libvirt-xsm  pass
 test-amd64-amd64-libvirt pass
 test-arm64-arm64-libvirt pass
 test-armhf-armhf-libvirt pass
 test-amd64-i386-libvirt  pass
 test-amd64-amd64-libvirt-pairpass
 test-amd64-i386-libvirt-pair pass
 test-arm64-arm64-libvirt-qcow2   pass
 test-armhf-armhf-libvirt-qcow2   pass
 test-arm64-arm64-libvirt-raw pass
 test-armhf-armhf-libvirt-raw pass
 test-amd64-i386-libvirt-raw  pass
 test-amd64-amd64-libvirt-vhd pass



sg-report-flight on osstest.test-lab.xenproject.org
logs: /home/logs/logs
images: /home/logs/images

Logs, config files, etc. are available at
http://logs.test-lab.xenproject.org/osstest/logs

Explanation of these reports, and of osstest in general, 

Re: [RFC KERNEL PATCH v2 2/3] xen/pvh: Unmask irq for passthrough device in PVH dom0

2023-11-30 Thread Roger Pau Monné
On Wed, Nov 29, 2023 at 07:53:59PM -0800, Stefano Stabellini wrote:
> On Fri, 24 Nov 2023, Jiqian Chen wrote:
> > This patch is to solve two problems we encountered when we try to
> > passthrough a device to hvm domU base on Xen PVH dom0.
> > 
> > First, hvm guest will alloc a pirq and irq for a passthrough device
> > by using gsi, before that, the gsi must first has a mapping in dom0,
> > see Xen code pci_add_dm_done->xc_domain_irq_permission, it will call
> > into Xen and check whether dom0 has the mapping. See
> > XEN_DOMCTL_irq_permission->pirq_access_permitted, "current" is PVH
> > dom0 and it return irq is 0, and then return -EPERM.
> > This is because the passthrough device doesn't do PHYSDEVOP_map_pirq
> > when thay are enabled.
> > 
> > Second, in PVH dom0, the gsi of a passthrough device doesn't get
> > registered, but gsi must be configured for it to be able to be
> > mapped into a domU.
> > 
> > After searching codes, we can find map_pirq and register_gsi will be
> > done in function vioapic_write_redirent->vioapic_hwdom_map_gsi when
> > the gsi(aka ioapic's pin) is unmasked in PVH dom0. So the problems
> > can be conclude to that the gsi of a passthrough device doesn't be
> > unmasked.
> > 
> > To solve the unmaske problem, this patch call the unmask_irq when we
> > assign a device to be passthrough. So that the gsi can get registered
> > and mapped in PVH dom0.
> 
> 
> Roger, this seems to be more of a Xen issue than a Linux issue. Why do
> we need the unmask check in Xen? Couldn't we just do:
> 
> 
> diff --git a/xen/arch/x86/hvm/vioapic.c b/xen/arch/x86/hvm/vioapic.c
> index 4e40d3609a..df262a4a18 100644
> --- a/xen/arch/x86/hvm/vioapic.c
> +++ b/xen/arch/x86/hvm/vioapic.c
> @@ -287,7 +287,7 @@ static void vioapic_write_redirent(
>  hvm_dpci_eoi(d, gsi);
>  }
>  
> -if ( is_hardware_domain(d) && unmasked )
> +if ( is_hardware_domain(d) )
>  {
>  /*
>   * NB: don't call vioapic_hwdom_map_gsi while holding hvm.irq_lock

There are some issues with this approach.

mp_register_gsi() will only setup the trigger and polarity of the
IO-APIC pin once, so we do so once the guest unmask the pin in order
to assert that the configuration is the intended one.  A guest is
allowed to write all kind of nonsense stuff to the IO-APIC RTE, but
that doesn't take effect unless the pin is unmasked.

Overall the question would be whether we have any guarantees that
the hardware domain has properly configured the pin, even if it's not
using it itself (as it hasn't been unmasked).

IIRC PCI legacy interrupts are level triggered and low polarity, so we
could configure any pins that are not setup at bind time?

Thanks, Roger.



Re: [XEN PATCH 07/11] xen/iommu: address violations of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter names to address violations of MISRA C:2012
> Rule 8.2 and remove uses of u{8,16,32} in favor of C standard types.
> No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 08/11] xen/kernel: address a violation of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter name. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 05/11] xen/domain: address violations of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter names. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 06/11] xen/notifier: address violations of MISRA C:2012 Rule 8.2

2023-11-30 Thread Jan Beulich
On 24.11.2023 15:03, Federico Serafini wrote:
> Add missing parameter names. No functional change.
> 
> Signed-off-by: Federico Serafini 

Acked-by: Jan Beulich 





Re: [XEN PATCH 0/2] x86/p2m: address a violation of MISRA C:2012 Rule 8.3

2023-11-30 Thread Federico Serafini

On 30/11/23 16:48, Federico Serafini wrote:

Patch 1/2 does some preparation work, hence it needs to be committed as first.
Patch 2/2 addresses the violation.

Federico Serafini (2):
   x86/p2m: preparation work for xenmem_add_to_physmap_one()
   x86/p2m: address a violation of MISRA C:2012 Rule 8.3

  xen/arch/x86/mm/p2m.c | 36 ++--
  1 file changed, 18 insertions(+), 18 deletions(-)


Adding maintainers in CC.

--
Federico Serafini, M.Sc.

Software Engineer, BUGSENG (http://bugseng.com)



[XEN PATCH 0/2] x86/p2m: address a violation of MISRA C:2012 Rule 8.3

2023-11-30 Thread Federico Serafini
Patch 1/2 does some preparation work, hence it needs to be committed as first.
Patch 2/2 addresses the violation.

Federico Serafini (2):
  x86/p2m: preparation work for xenmem_add_to_physmap_one()
  x86/p2m: address a violation of MISRA C:2012 Rule 8.3

 xen/arch/x86/mm/p2m.c | 36 ++--
 1 file changed, 18 insertions(+), 18 deletions(-)

-- 
2.34.1




[XEN PATCH 2/2] x86/p2m: address a violation of MISRA C:2012 Rule 8.3

2023-11-30 Thread Federico Serafini
Make function declaration and definition consistent changing
parameter name from "gpfn" to "gfn".
For consistency, rename also "old_gpfn" to "old_gfn".
No functional change.

Signed-off-by: Federico Serafini 
---
This patch depends on patch 1/2 of the same series.
---
 xen/arch/x86/mm/p2m.c | 28 ++--
 1 file changed, 14 insertions(+), 14 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index 42508e1e7e..6eb446e437 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2414,10 +2414,10 @@ int xenmem_add_to_physmap_one(
 unsigned int space,
 union add_to_physmap_extra extra,
 unsigned long idx,
-gfn_t gpfn)
+gfn_t gfn)
 {
 struct page_info *page = NULL;
-unsigned long gmfn = 0 /* gcc ... */, old_gpfn;
+unsigned long gmfn = 0 /* gcc ... */, old_gfn;
 mfn_t prev_mfn;
 int rc = 0;
 mfn_t mfn = INVALID_MFN;
@@ -2431,7 +2431,7 @@ int xenmem_add_to_physmap_one(
 break;
 
 case XENMAPSPACE_grant_table:
-rc = gnttab_map_frame(d, idx, gpfn, );
+rc = gnttab_map_frame(d, idx, gfn, );
 if ( rc )
 return rc;
 /* Need to take care of the reference obtained in gnttab_map_frame(). 
*/
@@ -2455,7 +2455,7 @@ int xenmem_add_to_physmap_one(
 }
 
 case XENMAPSPACE_gmfn_foreign:
-return p2m_add_foreign(d, idx, gfn_x(gpfn), extra.foreign_domid);
+return p2m_add_foreign(d, idx, gfn_x(gfn), extra.foreign_domid);
 }
 
 if ( mfn_eq(mfn, INVALID_MFN) )
@@ -2475,12 +2475,12 @@ int xenmem_add_to_physmap_one(
  * Upon freeing we wouldn't be able to find other mappings in the P2M
  * (unless we did a brute force search).
  */
-prev_mfn = get_gfn(d, gfn_x(gpfn), );
+prev_mfn = get_gfn(d, gfn_x(gfn), );
 
 /* XENMAPSPACE_gmfn: Check if the MFN is associated with another GFN. */
-old_gpfn = get_gpfn_from_mfn(mfn_x(mfn));
-ASSERT(!SHARED_M2P(old_gpfn));
-if ( space == XENMAPSPACE_gmfn && old_gpfn != gmfn )
+old_gfn = get_gpfn_from_mfn(mfn_x(mfn));
+ASSERT(!SHARED_M2P(old_gfn));
+if ( space == XENMAPSPACE_gmfn && old_gfn != gmfn )
 {
 rc = -EXDEV;
 goto put_all;
@@ -2493,22 +2493,22 @@ int xenmem_add_to_physmap_one(
 {
 if ( is_special_page(mfn_to_page(prev_mfn)) )
 /* Special pages are simply unhooked from this phys slot. */
-rc = p2m_remove_page(d, gpfn, prev_mfn, PAGE_ORDER_4K);
+rc = p2m_remove_page(d, gfn, prev_mfn, PAGE_ORDER_4K);
 else if ( !mfn_eq(mfn, prev_mfn) )
 /* Normal domain memory is freed, to avoid leaking memory. */
-rc = guest_remove_page(d, gfn_x(gpfn));
+rc = guest_remove_page(d, gfn_x(gfn));
 }
 
 /* Unmap from old location, if any. */
-if ( !rc && old_gpfn != INVALID_M2P_ENTRY && !gfn_eq(_gfn(old_gpfn), gpfn) 
)
-rc = p2m_remove_page(d, _gfn(old_gpfn), mfn, PAGE_ORDER_4K);
+if ( !rc && old_gfn != INVALID_M2P_ENTRY && !gfn_eq(_gfn(old_gfn), gfn) )
+rc = p2m_remove_page(d, _gfn(old_gfn), mfn, PAGE_ORDER_4K);
 
 /* Map at new location. */
 if ( !rc )
-rc = p2m_add_page(d, gpfn, mfn, PAGE_ORDER_4K, p2m_ram_rw);
+rc = p2m_add_page(d, gfn, mfn, PAGE_ORDER_4K, p2m_ram_rw);
 
  put_all:
-put_gfn(d, gfn_x(gpfn));
+put_gfn(d, gfn_x(gfn));
 
  put_both:
 /*
-- 
2.34.1




[XEN PATCH 1/2] x86/p2m: preparation work for xenmem_add_to_physmap_one()

2023-11-30 Thread Federico Serafini
The objective is to use parameter name "gfn" for
xenmem_add_to_physmap_one().
Since the name "gfn" is currently used as identifier for a local
variable, bad things could happen if new uses of such variable are
committed while a renaming patch is waiting for the approval.
To avoid such danger, as first thing rename the local variable from
"gfn" to "gmfn".

No functional change.

Signed-off-by: Federico Serafini 
---
 xen/arch/x86/mm/p2m.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/xen/arch/x86/mm/p2m.c b/xen/arch/x86/mm/p2m.c
index fe9ccabb87..42508e1e7e 100644
--- a/xen/arch/x86/mm/p2m.c
+++ b/xen/arch/x86/mm/p2m.c
@@ -2417,7 +2417,7 @@ int xenmem_add_to_physmap_one(
 gfn_t gpfn)
 {
 struct page_info *page = NULL;
-unsigned long gfn = 0 /* gcc ... */, old_gpfn;
+unsigned long gmfn = 0 /* gcc ... */, old_gpfn;
 mfn_t prev_mfn;
 int rc = 0;
 mfn_t mfn = INVALID_MFN;
@@ -2440,12 +2440,12 @@ int xenmem_add_to_physmap_one(
 
 case XENMAPSPACE_gmfn:
 {
-gfn = idx;
-mfn = get_gfn_unshare(d, gfn, );
+gmfn = idx;
+mfn = get_gfn_unshare(d, gmfn, );
 /* If the page is still shared, exit early */
 if ( p2m_is_shared(p2mt) )
 {
-put_gfn(d, gfn);
+put_gfn(d, gmfn);
 return -ENOMEM;
 }
 page = get_page_from_mfn(mfn, d);
@@ -2480,7 +2480,7 @@ int xenmem_add_to_physmap_one(
 /* XENMAPSPACE_gmfn: Check if the MFN is associated with another GFN. */
 old_gpfn = get_gpfn_from_mfn(mfn_x(mfn));
 ASSERT(!SHARED_M2P(old_gpfn));
-if ( space == XENMAPSPACE_gmfn && old_gpfn != gfn )
+if ( space == XENMAPSPACE_gmfn && old_gpfn != gmfn )
 {
 rc = -EXDEV;
 goto put_all;
@@ -2518,7 +2518,7 @@ int xenmem_add_to_physmap_one(
  */
 if ( space == XENMAPSPACE_gmfn )
 {
-put_gfn(d, gfn);
+put_gfn(d, gmfn);
 if ( !rc && extra.ppage )
 {
 *extra.ppage = page;
-- 
2.34.1




Re: [PATCH 02/12] tests: remove aio_context_acquire() tests

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:43PM -0500, Stefan Hajnoczi wrote:
> The aio_context_acquire() API is being removed. Drop the test case that
> calls the API.
> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  tests/unit/test-aio.c | 67 +--
>  1 file changed, 1 insertion(+), 66 deletions(-)

The rest of this series should not hold up 8.2.

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[RFC 34/41] hw/core/topo: Implement user-child to collect topology device from cli

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Support user-child for topology devices.

This will affect these 2 aspects:
1. For the basic topology device (with DEVICE_CATEGORY_CPU_DEF
   category), user could specify "parent" to build the topology
   relationship from cli. And cpu-slot will collect all topology
   devices.

2. For the hotplug topology devices (ppc-core or CPUs of other arches),
   user-child could help to search the correct topology parent in
   topology tree with the index properties. This is compatible with
   the original behavior of inserting CPU/core from cli. And this
   requires arch to support QOM topology with a few arch-specific
   modifications, before this support, hotplugged CPUs/cores are
   inserted into cpu-slot by default.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 44 ++
 hw/core/cpu-topo.c | 38 
 include/hw/core/cpu-slot.h |  3 +++
 include/hw/core/cpu-topo.h |  4 
 4 files changed, 89 insertions(+)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index 45b6aef0750a..413daa66aaad 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -559,3 +559,47 @@ bool machine_validate_cpu_topology(MachineState *ms, Error 
**errp)
 
 return true;
 }
+
+Object *cpu_slot_get_free_parent(CPUTopoState *child, Error **errp)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+CPUTopoLevel level = CPU_TOPO_LEVEL(child);
+CPUSlot *slot = ms->topo;
+
+if (!slot) {
+return NULL;
+}
+
+/*
+ * For CPUs and cores that support hotplug, the behavior is to specify
+ * some topology sub ids. This requires special handling.
+ */
+if (level == mc->smp_props.possible_cpus_qom_granu) {
+CPUTopoClass *child_tc = CPU_TOPO_GET_CLASS(child);
+
+if (child_tc->search_parent_pre_plug) {
+return child_tc->search_parent_pre_plug(child,
+   CPU_TOPO(slot), errp);
+}
+}
+
+/*
+ * For other topology devices, just collect them into CPU slot.
+ * The realize() of CPUTopoClass will check no "parent" option case.
+ */
+return OBJECT(slot);
+}
+
+char *cpu_slot_name_future_child(CPUTopoState *child)
+{
+MachineState *ms = MACHINE(qdev_get_machine());
+CPUTopoLevel level = CPU_TOPO_LEVEL(child);
+CPUSlot *slot = ms->topo;
+
+if (!slot) {
+return NULL;
+}
+
+return get_topo_global_name(>stat, level);
+}
diff --git a/hw/core/cpu-topo.c b/hw/core/cpu-topo.c
index 351112ca7a73..143b0a148b17 100644
--- a/hw/core/cpu-topo.c
+++ b/hw/core/cpu-topo.c
@@ -20,8 +20,10 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/core/cpu-slot.h"
 #include "hw/core/cpu-topo.h"
 #include "hw/qdev-properties.h"
+#include "monitor/user-child.h"
 #include "qapi/error.h"
 
 const char *cpu_topo_level_to_string(CPUTopoLevel level)
@@ -272,10 +274,38 @@ static void cpu_topo_unrealize(DeviceState *dev)
 }
 }
 
+/*
+ * Prefer to insert topology device into topology tree where the CPU
+ * slot is the root.
+ */
+static Object *cpu_topo_get_parent(UserChild *uc, Error **errp)
+{
+return cpu_slot_get_free_parent(CPU_TOPO(uc), errp);
+}
+
+static char *cpu_topo_get_child_name(UserChild *uc, Object *parent)
+{
+return cpu_slot_name_future_child(CPU_TOPO(uc));
+}
+
+/* Only check type. Other topology details with be checked at realize(). */
+static bool cpu_topo_check_user_parent(UserChild *uc, Object *parent)
+{
+CPUTopoState *topo;
+
+topo = (CPUTopoState *)object_dynamic_cast(parent, TYPE_CPU_TOPO);
+if (!topo) {
+return false;
+}
+
+return true;
+}
+
 static void cpu_topo_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
 CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+UserChildClass *ucc = USER_CHILD_CLASS(oc);
 
 /* All topology devices belong to CPU property. */
 set_bit(DEVICE_CATEGORY_CPU, dc->categories);
@@ -290,6 +320,10 @@ static void cpu_topo_class_init(ObjectClass *oc, void 
*data)
 dc->hotpluggable = false;
 
 tc->level = CPU_TOPO_UNKNOWN;
+
+ucc->get_parent = cpu_topo_get_parent;
+ucc->get_child_name = cpu_topo_get_child_name;
+ucc->check_parent = cpu_topo_check_user_parent;
 }
 
 static void cpu_topo_instance_init(Object *obj)
@@ -310,6 +344,10 @@ static const TypeInfo cpu_topo_type_info = {
 .class_init = cpu_topo_class_init,
 .instance_size = sizeof(CPUTopoState),
 .instance_init = cpu_topo_instance_init,
+.interfaces = (InterfaceInfo[]) {
+{ TYPE_USER_CHILD },
+{ }
+}
 };
 
 static void cpu_topo_register_types(void)
diff --git a/include/hw/core/cpu-slot.h b/include/hw/core/cpu-slot.h
index 9e1c6d6b7ff2..1a63f55f52c3 100644
--- a/include/hw/core/cpu-slot.h
+++ b/include/hw/core/cpu-slot.h
@@ -102,4 +102,7 @@ void machine_plug_cpu_slot(MachineState *ms);
 void machine_create_smp_topo_tree(MachineState *ms, Error **errp);
 

Re: [PATCH 01/12] virtio-scsi: replace AioContext lock with tmf_bh_lock

2023-11-30 Thread Eric Blake
On Wed, Nov 29, 2023 at 02:55:42PM -0500, Stefan Hajnoczi wrote:
> Protect the Task Management Function BH state with a lock. The TMF BH
> runs in the main loop thread. An IOThread might process a TMF at the
> same time as the TMF BH is running. Therefore tmf_bh_list and tmf_bh
> must be protected by a lock.
> 
> Run TMF request completion in the IOThread using aio_wait_bh_oneshot().
> This avoids more locking to protect the virtqueue and SCSI layer state.

Are we trying to get this into 8.2?

> 
> Signed-off-by: Stefan Hajnoczi 
> ---
>  include/hw/virtio/virtio-scsi.h |  3 +-
>  hw/scsi/virtio-scsi.c   | 62 ++---
>  2 files changed, 43 insertions(+), 22 deletions(-)
>

Reviewed-by: Eric Blake 

-- 
Eric Blake, Principal Software Engineer
Red Hat, Inc.
Virtualization:  qemu.org | libguestfs.org




[RFC 33/41] hw/machine: Validate smp topology tree without -smp

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

QOM topology allows user to create topology tree from cli without -smp,
in this case, validate the topology tree to meet the smp requirement.

Currently, for compatibility with MachineState.smp, initialize
MachineState.smp from topology tree in this case.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 146 +
 hw/core/machine.c  |  10 +++
 include/hw/core/cpu-slot.h |   1 +
 3 files changed, 157 insertions(+)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index ade155baf60b..45b6aef0750a 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -413,3 +413,149 @@ void machine_create_smp_topo_tree(MachineState *ms, Error 
**errp)
 }
 slot->smp_parsed = true;
 }
+
+static void set_smp_child_topo_info(CpuTopology *smp_info,
+CPUTopoStat *stat,
+CPUTopoLevel child_level)
+{
+unsigned int *smp_count;
+CPUTopoStatEntry *entry;
+
+smp_count = get_smp_info_by_level(smp_info, child_level);
+entry = get_topo_stat_entry(stat, child_level);
+*smp_count = entry->max_units ? entry->max_units : 1;
+
+return;
+}
+
+typedef struct ValidateCbData {
+CPUTopoStat *stat;
+CpuTopology *smp_info;
+Error **errp;
+} ValidateCbData;
+
+static int validate_topo_children(CPUTopoState *topo, void *opaque)
+{
+CPUTopoLevel level = CPU_TOPO_LEVEL(topo), next_level;
+ValidateCbData *cb = opaque;
+unsigned int max_children;
+CPUTopoStatEntry *entry;
+Error **errp = cb->errp;
+
+if (level != CPU_TOPO_THREAD && !topo->num_children &&
+!topo->max_children) {
+error_setg(errp, "Invalid topology: the CPU topology "
+   "(level: %s, index: %d) isn't completed.",
+   cpu_topo_level_to_string(level), topo->index);
+return TOPO_FOREACH_ERR;
+}
+
+if (level == CPU_TOPO_UNKNOWN) {
+error_setg(errp, "Invalid CPU topology: unknown topology level.");
+return TOPO_FOREACH_ERR;
+}
+
+/*
+ * Only CPU_TOPO_THREAD level's child_level could be CPU_TOPO_UNKNOWN,
+ * but machine_validate_cpu_topology() is before CPU creation.
+ */
+if (topo->child_level == CPU_TOPO_UNKNOWN) {
+error_setg(errp, "Invalid CPU topology: incomplete topology "
+   "(level: %s, index: %d), no child?.",
+   cpu_topo_level_to_string(level), topo->index);
+return TOPO_FOREACH_ERR;
+}
+
+/*
+ * Currently hybrid topology isn't supported, so only SMP topology
+ * is allowed.
+ */
+
+entry = get_topo_stat_entry(cb->stat, topo->child_level);
+
+/* Max threads per core is pre-configured by "nr-threads". */
+max_children = topo->child_level != CPU_TOPO_THREAD ?
+   topo->num_children : topo->max_children;
+
+if (entry->max_units != max_children) {
+error_setg(errp, "Invalid SMP topology: "
+   "The %s topology is asymmetric.",
+   cpu_topo_level_to_string(level));
+return TOPO_FOREACH_ERR;
+}
+
+next_level = find_next_bit(cb->stat->curr_levels, USER_AVAIL_LEVEL_NUM,
+   topo->child_level + 1);
+
+if (next_level != level) {
+error_setg(errp, "Invalid smp topology: "
+   "asymmetric CPU topology depth.");
+return TOPO_FOREACH_ERR;
+}
+
+set_smp_child_topo_info(cb->smp_info, cb->stat, topo->child_level);
+
+return TOPO_FOREACH_CONTINUE;
+}
+
+/*
+ * Only check the case user configures CPU topology via -device
+ * without -smp. In this case, MachineState.smp also needs to be
+ * initialized based on topology tree.
+ */
+bool machine_validate_cpu_topology(MachineState *ms, Error **errp)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+CPUTopoState *slot_topo = CPU_TOPO(ms->topo);
+CPUTopoStat *stat = >topo->stat;
+CpuTopology *smp_info = >smp;
+unsigned int total_cpus;
+ValidateCbData cb;
+
+if (ms->topo->smp_parsed) {
+return true;
+} else if (!slot_topo->num_children) {
+/*
+ * If there's no -smp nor -device to add topology children,
+ * then create the default topology.
+ */
+machine_create_smp_topo_tree(ms, errp);
+if (*errp) {
+return false;
+}
+return true;
+}
+
+if (test_bit(CPU_TOPO_CLUSTER, stat->curr_levels)) {
+mc->smp_props.has_clusters = true;
+}
+
+/*
+ * The next cpu_topo_child_foreach_recursive() doesn't cover the
+ * parent topology unit. Update information for root here.
+ */
+set_smp_child_topo_info(smp_info, stat, slot_topo->child_level);
+
+cb.stat = stat;
+cb.smp_info = smp_info;
+cb.errp = errp;
+
+cpu_topo_child_foreach_recursive(slot_topo, NULL,
+ validate_topo_children, );
+if (*errp) {
+return false;
+}
+
+ 

Re: [RFC XEN PATCH v2 2/3] x86/pvh: Open PHYSDEVOP_map_pirq for PVH dom0

2023-11-30 Thread Roger Pau Monné
On Thu, Nov 30, 2023 at 06:32:00AM +, Chen, Jiqian wrote:
> 
> On 2023/11/28 22:17, Roger Pau Monné wrote:
> > On Fri, Nov 24, 2023 at 06:41:35PM +0800, Jiqian Chen wrote:
> >> If we run Xen with PVH dom0 and hvm domU, hvm will map a pirq for
> >> a passthrough device by using gsi, see xen_pt_realize->xc_physdev_map_pirq
> >> and pci_add_dm_done->xc_physdev_map_pirq. Then xc_physdev_map_pirq will
> >> call into Xen, but in hvm_physdev_op, PHYSDEVOP_map_pirq is not allowed
> >> because currd is PVH dom0 and PVH has no X86_EMU_USE_PIRQ flag, it will
> >> fail at has_pirq check.
> >>
> >> So, I think we may need to allow PHYSDEVOP_map_pirq when currd is dom0 (at
> > 
> > And PHYSDEVOP_unmap_pirq also?
> Yes, in the failed path, PHYSDEVOP_unmap_pirq will be called. I will add some 
> descriptions about it into the commit message.
> 
> > 
> >> present dom0 is PVH).
> > 
> > IMO it would be better to implement a new set of DMOP hypercalls that
> > handle the setup of interrupts from physical devices that are assigned
> > to a guest.  That should allow us to get rid of the awkward PHYSDEVOP
> > + XEN_DOMCTL_{,un}bind_pt_irq hypercall interface, which currently
> > prevents QEMU from being hypervisor version agnostic (since the domctl
> > interface is not stable).
> > 
> > I understand this might be too much to ask for, but something to take
> > into account.
> Yes, that will be a complex project. I think current change can meet the 
> needs. We can take DMOP into account in the future. Thank you.

The issue with this approach is that we always do things in a rush and
cut corners, and then never pay back the debt.  Anyway, I'm not going
to block this, and I'm not blaming you.

Sadly this is just focused on getting something working in the short
term rather than thinking long term in a maintainable interface.

Regards, Roger.



INFORMAL VOTE REQUIRED - DOCUMENTATION WORDING

2023-11-30 Thread Kelly Choi
Hi all,

There have been a few discussions about how we use documentation wording
within the community. Whilst there are differences in opinions and
perceptions of the definition, it would be helpful to see a wider consensus
of how we feel.

*Discussion: Should we use the term 'broken' in our documentation, or do
you think an alternative wording would be better? If you agree or disagree,
please vote as this will impact future discussions. *

I have purposely made the vote between two options to help us move in a
forward direction.

*PLEASE VOTE HERE. Deadline 15th December 2023.
*
*Your name will be required but will be private. If you answer anonymously,
your vote will not count. This is to ensure it is fair and each person gets
one vote. *

As an open-source project, we need to come to a common ground, which
sometimes means we may not personally agree. To make this fair, please note
the final results will be used to determine our future actions within the
community.

If the majority votes for/against, we will respect the majority and
implement this accordingly.

Many thanks,
Kelly Choi

Xen Project Community Manager
XenServer, Cloud Software Group


Re: [RFC KERNEL PATCH v2 1/3] xen/pci: Add xen_reset_device_state function

2023-11-30 Thread Stewart Hildebrand
On 11/30/23 02:03, Chen, Jiqian wrote:
> 
> On 2023/11/30 11:46, Stefano Stabellini wrote:
>> On Fri, 24 Nov 2023, Jiqian Chen wrote:
>>> When device on dom0 side has been reset, the vpci on Xen side
>>> won't get notification, so that the cached state in vpci is
>>> all out of date with the real device state.
>>> To solve that problem, this patch add a function to clear all
>>> vpci device state when device is reset on dom0 side.
>>>
>>> And call that function in pcistub_init_device. Because when
>>> we use "pci-assignable-add" to assign a passthrough device in
>>> Xen, it will reset passthrough device and the vpci state will
>>> out of date, and then device will fail to restore bar state.
>>>
>>> Signed-off-by: Jiqian Chen 
>>> Signed-off-by: Huang Rui 
>>> ---
>>>  drivers/xen/pci.c  | 12 
>>>  drivers/xen/xen-pciback/pci_stub.c |  3 +++
>>>  include/xen/interface/physdev.h|  2 ++
>>>  include/xen/pci.h  |  6 ++
>>>  4 files changed, 23 insertions(+)
>>>
>>> diff --git a/drivers/xen/pci.c b/drivers/xen/pci.c
>>> index 72d4e3f193af..e9b30bc09139 100644
>>> --- a/drivers/xen/pci.c
>>> +++ b/drivers/xen/pci.c
>>> @@ -177,6 +177,18 @@ static int xen_remove_device(struct device *dev)
>>> return r;
>>>  }
>>>  
>>> +int xen_reset_device_state(const struct pci_dev *dev)
>>> +{
>>> +   struct physdev_pci_device device = {
>>> +   .seg = pci_domain_nr(dev->bus),
>>> +   .bus = dev->bus->number,
>>> +   .devfn = dev->devfn
>>> +   };
>>> +
>>> +   return HYPERVISOR_physdev_op(PHYSDEVOP_pci_device_state_reset, );
>>> +}
>>> +EXPORT_SYMBOL_GPL(xen_reset_device_state);
>>> +
>>>  static int xen_pci_notifier(struct notifier_block *nb,
>>> unsigned long action, void *data)
>>>  {
>>> diff --git a/drivers/xen/xen-pciback/pci_stub.c 
>>> b/drivers/xen/xen-pciback/pci_stub.c
>>> index e34b623e4b41..5a96b6c66c07 100644
>>> --- a/drivers/xen/xen-pciback/pci_stub.c
>>> +++ b/drivers/xen/xen-pciback/pci_stub.c
>>> @@ -421,6 +421,9 @@ static int pcistub_init_device(struct pci_dev *dev)
>>> else {
>>> dev_dbg(>dev, "resetting (FLR, D3, etc) the device\n");
>>> __pci_reset_function_locked(dev);
>>> +   err = xen_reset_device_state(dev);
>>> +   if (err)
>>> +   goto config_release;
>>
>> Older versions of Xen won't have the hypercall
>> PHYSDEVOP_pci_device_state_reset implemented. I think we should do
>> something like:
>>
>> if (err && xen_pvh_domain())
>> goto config_release;
>>
>>
>> Or even:
>>
>> if (xen_pvh_domain()) {
>> err = xen_reset_device_state(dev);
>> if (err)
>> goto config_release;
>> }
>>
>> depending on whether we want to call xen_reset_device_state also for PV
>> guests or not. I am assuming we don't want to error out on failure such
>> as -ENOENT for PV guests.
> Yes, only for PVH dom0, I will add the condition in next version. Thank you!

We will want to call xen_reset_device_state() for Arm dom0, too, so checking 
xen_pvh_domain() alone is not sufficient. I suggest instead to check 
!xen_pv_domain().

> 
>>
>>
>>> pci_restore_state(dev);
>>> }
>>> /* Now disable the device (this also ensures some private device
>>> diff --git a/include/xen/interface/physdev.h 
>>> b/include/xen/interface/physdev.h
>>> index a237af867873..231526f80f6c 100644
>>> --- a/include/xen/interface/physdev.h
>>> +++ b/include/xen/interface/physdev.h
>>> @@ -263,6 +263,8 @@ struct physdev_pci_device {
>>>  uint8_t devfn;
>>>  };
>>>  
>>> +#define PHYSDEVOP_pci_device_state_reset 32
>>> +
>>>  #define PHYSDEVOP_DBGP_RESET_PREPARE1
>>>  #define PHYSDEVOP_DBGP_RESET_DONE   2
>>>  
>>> diff --git a/include/xen/pci.h b/include/xen/pci.h
>>> index b8337cf85fd1..b2e2e856efd6 100644
>>> --- a/include/xen/pci.h
>>> +++ b/include/xen/pci.h
>>> @@ -4,10 +4,16 @@
>>>  #define __XEN_PCI_H__
>>>  
>>>  #if defined(CONFIG_XEN_DOM0)
>>> +int xen_reset_device_state(const struct pci_dev *dev);
>>>  int xen_find_device_domain_owner(struct pci_dev *dev);
>>>  int xen_register_device_domain_owner(struct pci_dev *dev, uint16_t domain);
>>>  int xen_unregister_device_domain_owner(struct pci_dev *dev);
>>>  #else
>>> +static inline int xen_reset_device_state(const struct pci_dev *dev)
>>> +{
>>> +   return -1;
>>> +}
>>> +
>>>  static inline int xen_find_device_domain_owner(struct pci_dev *dev)
>>>  {
>>> return -1;
>>> -- 
>>> 2.34.1
>>>
> 



Re: [PATCH] x86/DMI: adjustments to comply with Misra C:2012 Rule 9.3

2023-11-30 Thread Jan Beulich
On 30.11.2023 08:55, Jan Beulich wrote:
> The rule demands that all array elements be initialized (or dedicated
> initializers be used). Introduce a small set of macros to allow doing so
> without unduly affecting use sites (in particular in terms of how many
> elements .matches[] actually has; right now there's no use of
> DMI_MATCH4(), so we could even consider reducing the array size to 3).
> 
> Note that DMI_MATCH() needs adjustment because of the comma included in
> its expansion, which - due to being unparenthesized - would otherwise
> cause macro arguments in the "further replacement" step to be wrong.

Sadly this doesn't work with older gcc (4.8.5 is what I had an issue with,
complaining "initializer element is not constant").

Jan



Re: [RFC XEN PATCH v2 1/3] xen/vpci: Clear all vpci status of device

2023-11-30 Thread Roger Pau Monné
On Thu, Nov 30, 2023 at 07:39:38AM -0500, Daniel P. Smith wrote:
> On 11/30/23 07:25, Daniel P. Smith wrote:
> > On 11/30/23 01:22, Chen, Jiqian wrote:
> > > Hi Roger and Daniel P. Smith,
> > > 
> > > On 2023/11/28 22:08, Roger Pau Monné wrote:
> > > > On Fri, Nov 24, 2023 at 06:41:34PM +0800, Jiqian Chen wrote:
> > > > > When a device has been reset on dom0 side, the vpci on Xen
> > > > > side won't get notification, so the cached state in vpci is
> > > > > all out of date compare with the real device state.
> > > > > To solve that problem, this patch add new hypercall to clear
> > > > > all vpci device state. And when reset device happens on dom0
> > > > > side, dom0 can call this hypercall to notify vpci.
> > > > > 
> > > > > Signed-off-by: Jiqian Chen 
> > > > > Signed-off-by: Huang Rui 
> > > > > ---
> > > > >   xen/arch/x86/hvm/hypercall.c  |  1 +
> > > > >   xen/drivers/passthrough/pci.c | 21 +
> > > > >   xen/drivers/pci/physdev.c | 14 ++
> > > > >   xen/drivers/vpci/vpci.c   |  9 +
> > > > >   xen/include/public/physdev.h  |  2 ++
> > > > >   xen/include/xen/pci.h |  1 +
> > > > >   xen/include/xen/vpci.h    |  6 ++
> > > > >   7 files changed, 54 insertions(+)
> > > > > 
> > > > > diff --git a/xen/arch/x86/hvm/hypercall.c
> > > > > b/xen/arch/x86/hvm/hypercall.c
> > > > > index eeb73e1aa5..6ad5b4d5f1 100644
> > > > > --- a/xen/arch/x86/hvm/hypercall.c
> > > > > +++ b/xen/arch/x86/hvm/hypercall.c
> > > > > @@ -84,6 +84,7 @@ long hvm_physdev_op(int cmd,
> > > > > XEN_GUEST_HANDLE_PARAM(void) arg)
> > > > >   case PHYSDEVOP_pci_mmcfg_reserved:
> > > > >   case PHYSDEVOP_pci_device_add:
> > > > >   case PHYSDEVOP_pci_device_remove:
> > > > > +    case PHYSDEVOP_pci_device_state_reset:
> > > > >   case PHYSDEVOP_dbgp_op:
> > > > >   if ( !is_hardware_domain(currd) )
> > > > >   return -ENOSYS;
> > > > > diff --git a/xen/drivers/passthrough/pci.c
> > > > > b/xen/drivers/passthrough/pci.c
> > > > > index 04d00c7c37..f871715585 100644
> > > > > --- a/xen/drivers/passthrough/pci.c
> > > > > +++ b/xen/drivers/passthrough/pci.c
> > > > > @@ -824,6 +824,27 @@ int pci_remove_device(u16 seg, u8 bus, u8 devfn)
> > > > >   return ret;
> > > > >   }
> > > > > +int pci_reset_device_state(u16 seg, u8 bus, u8 devfn)
> > > > 
> > > > You could use pci_sbdf_t here instead of 3 parameters.
> > > Will change in next version, thank you.
> > > 
> > > > 
> > > > I'm however unsure whether we really need this helper just to fetch
> > > > the pdev and call vpci_reset_device_state(), I think you could place
> > > > this logic directly in pci_physdev_op().  Unless there are plans to
> > > > use such logic outside of pci_physdev_op().
> > > If I place the logic of pci_reset_device_state directly in
> > > pci_physdev_op. I think I need to declare vpci_reset_device_state in
> > > pci.h? If it is suitable, I will change in next version.
> > > 
> > > > 
> > > > > +{
> > > > > +    struct pci_dev *pdev;
> > > > > +    int ret = -ENODEV;
> > > > 
> > > > Some XSM check should be introduced fro this operation, as none of the
> > > > existing ones is suitable.  See xsm_resource_unplug_pci() for example.
> > > > 
> > > > xsm_resource_reset_state_pci() or some such I would assume.
> > > I don't know what I should do in XSM function(assume it is
> > > xsm_resource_reset_state_pci).
> > > Hi Daniel P. Smith, could you please give me some suggestions?
> > > Just to check the XSM_PRIV action?
> > > 
> > 
> > Roger, thank you for seeing this and adding me in!
> > 
> > Jiqian, I just wanted to let you know I have seen your post but I have
> > been a little tied up this week. Just with the cursory look, I think
> > Roger is suggesting a new XSM check/hook is warranted.
> > 
> > If you would like to attempt at writing the dummy policy side, mimic
> > xsm_resource_plug_pci in xen/include/xsm/dummy.h and
> > xen/include/xsm/xsm.h, then I can look at handling the FLASK portion
> > next week and provide it to you for inclusion into the series. If you
> > are not comfortable with it, I can look at the whole thing next week.
> > Just let me know what you would be comfortable with.
> 
> Apologies, thinking about for a moment and was thinking the hook should be
> called xsm_resource_config_pci. I would reset as a config operation and
> there might be new ones in the future. I do not believe there is a need to
> have fine grain access control down to individual config operation, thus
> they could all be captured under this one hook. Roger, what are your
> thoughts about this, in particular how you see vpci evolving?

So the configuration space reset should only be done by the domain
that's also capable of triggering the physical device reset, usually
the hardware domain.  I don't think it's possible ATM to allow a
domain different than the hardware domain to perform a PCI reset, as
doing it requires unmediated access to the PCI config space.

So 

[RFC 06/41] qdev: Introduce user-child interface to collect devices from -device

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Topology relationship is based on child<> property, therefore introduce
a new user-child interface to help bus-less devices create child<>
property from cli.

User-child interface works in qdev_set_id(), where the child<> property
is created for cli devices.

With several methods, user-child could provide the specific "parent"
device other than the default peripheral/peripheral-anon container.

The topology root (cpu-slot) could collect topology devices based on
user-child implementation.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS  |  2 +
 include/monitor/user-child.h | 57 +++
 include/qom/object.h | 11 +
 qom/object.c | 13 ++
 system/meson.build   |  1 +
 system/qdev-monitor.c| 89 +---
 system/user-child.c  | 72 +
 7 files changed, 239 insertions(+), 6 deletions(-)
 create mode 100644 include/monitor/user-child.h
 create mode 100644 system/user-child.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 695e0bd34fbb..fdbabaa983cc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3208,12 +3208,14 @@ F: hw/core/bus.c
 F: hw/core/sysbus.c
 F: include/hw/qdev*
 F: include/monitor/qdev.h
+F: include/monitor/user-child.h
 F: include/qom/
 F: qapi/qom.json
 F: qapi/qdev.json
 F: scripts/coccinelle/qom-parent-type.cocci
 F: scripts/qom-cast-macro-clean-cocci-gen.py
 F: system/qdev-monitor.c
+F: system/user-child.c
 F: stubs/qdev.c
 F: qom/
 F: tests/unit/check-qom-interface.c
diff --git a/include/monitor/user-child.h b/include/monitor/user-child.h
new file mode 100644
index ..e246fcefe40a
--- /dev/null
+++ b/include/monitor/user-child.h
@@ -0,0 +1,57 @@
+/*
+ * Child configurable interface header.
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef USER_CHILD_H
+#define USER_CHILD_H
+
+#include "qom/object.h"
+
+#define TYPE_USER_CHILD "user-child"
+
+typedef struct UserChildClass UserChildClass;
+DECLARE_CLASS_CHECKERS(UserChildClass, USER_CHILD, TYPE_USER_CHILD)
+#define USER_CHILD(obj) INTERFACE_CHECK(UserChild, (obj), TYPE_USER_CHILD)
+
+typedef struct UserChild UserChild;
+
+/**
+ * UserChildClass:
+ * @get_parent: Method to get the default parent if user doesn't specify
+ * the parent in cli.
+ * @get_child_name: Method to get the default device id for this device
+ * if user doesn't specify id in cli.
+ * @check_parent: Method to check if the parent specified by user in cli
+ * is valid.
+ */
+struct UserChildClass {
+/*  */
+InterfaceClass parent_class;
+
+/*  */
+Object *(*get_parent)(UserChild *uc, Error **errp);
+char *(*get_child_name)(UserChild *uc, Object *parent);
+bool (*check_parent)(UserChild *uc, Object *parent);
+};
+
+Object *uc_provide_default_parent(Object *obj, Error **errp);
+char *uc_name_future_child(Object *obj, Object *parent);
+bool uc_check_user_parent(Object *obj, Object *parent);
+
+#endif /* USER_CHILD_H */
diff --git a/include/qom/object.h b/include/qom/object.h
index 494eef801be3..f725d9452c76 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1484,6 +1484,17 @@ Object *object_get_objects_root(void);
  */
 Object *object_get_internal_root(void);
 
+/**
+ * object_is_child_from:
+ * @child: the object.
+ * @parent: the parent/non-direct parent object.
+ *
+ * Check whether @parent is the parent/non-direct parent of @child.
+ *
+ * Returns: true iff @parent is the parent/non-direct parent of @child.
+ */
+bool object_is_child_from(const Object *child, const Object *parent);
+
 /**
  * object_get_canonical_path_component:
  * @obj: the object
diff --git a/qom/object.c b/qom/object.c
index da29e88816b5..d6f55aa59504 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -2024,6 +2024,19 @@ object_property_add_const_link(Object *obj, const char 
*name,
 NULL, OBJ_PROP_LINK_DIRECT);
 }
 
+bool object_is_child_from(const Object *child, const Object *parent)
+{
+Object *obj = child->parent;
+
+while (obj) {
+if (obj == parent) {
+return true;
+}
+obj = obj->parent;
+}
+return false;
+}
+
 const char *object_get_canonical_path_component(const Object *obj)
 {
 ObjectProperty *prop = NULL;
diff --git 

[RFC 05/41] qdev: Set device parent and id after setting properties

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

The properties setting does not conflict with the creation of child<>
property.

Pre-setting the device's properties can help the device's parent
selection. Some topology devices (e.g., CPUs that support hotplug)
usually define topology sub indexes as properties, and the selection of
their parent needs to be based on these proteries.

Move qdev_set_id() after properties setting to help the next user-child
introduction.

Signed-off-by: Zhao Liu 
---
 system/qdev-monitor.c | 15 ---
 1 file changed, 8 insertions(+), 7 deletions(-)

diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 7ee33a50142a..107411bb50cc 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -700,14 +700,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, 
long *category,
 }
 }
 
-/*
- * set dev's parent and register its id.
- * If it fails it means the id is already taken.
- */
 id = g_strdup(qdict_get_try_str(opts, "id"));
-if (!qdev_set_id(dev, id, errp)) {
-goto err_del_dev;
-}
 
 /* set properties */
 dev->opts = qdict_clone_shallow(opts);
@@ -721,6 +714,14 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, 
long *category,
 goto err_del_dev;
 }
 
+/*
+ * set dev's parent and register its id.
+ * If it fails it means the id is already taken.
+ */
+if (!qdev_set_id(dev, id, errp)) {
+goto err_del_dev;
+}
+
 if (!qdev_realize(dev, bus, errp)) {
 goto err_del_dev;
 }
-- 
2.34.1




[RFC 39/41] hw/i386: Add the interface to search parent for QOM topology

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

QOM topology needs to search parent cpu-core for hotplugged CPU to
create topology child<> property in qdev_set_id().

This process is before x86_cpu_pre_plug(), thus place 2 helpers
x86_cpu_assign_apic_id() and x86_cpu_assign_topo_id() in
x86_cpu_search_parent_pre_plug() to help get the correct topology sub
indexes. Then x86_cpu_search_parent_pre_plug() searches the parent
cpu-core with these sub indexes.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 128 +++---
 include/hw/i386/x86.h |   3 +
 2 files changed, 122 insertions(+), 9 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 04edd6de6aeb..595d4365fdd1 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -460,16 +460,18 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 return;
 }
 
-/* if 'address' properties socket-id/core-id/thread-id are not set, set 
them
- * so that machine_query_hotpluggable_cpus would show correct values
+/*
+ * possible_cpus_qom_granu means the QOM topology support.
+ *
+ * TODO: Drop the "!mc->smp_props.possible_cpus_qom_granu" case when
+ * i386 completes QOM topology support.
  */
-/* TODO: move socket_id/core_id/thread_id checks into x86_cpu_realizefn()
- * once -smp refactoring is complete and there will be CPU private
- * CPUState::nr_cores and CPUState::nr_threads fields instead of globals */
-x86_topo_ids_from_apicid(cpu->apic_id, _info, _ids);
-x86_cpu_assign_topo_id(cpu, _ids, errp);
-if (*errp) {
-return;
+if (!mc->smp_props.possible_cpus_qom_granu) {
+x86_topo_ids_from_apicid(cpu->apic_id, _info, _ids);
+x86_cpu_assign_topo_id(cpu, _ids, errp);
+if (*errp) {
+return;
+}
 }
 
 if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VPINDEX) &&
@@ -484,6 +486,114 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 numa_cpu_pre_plug(cpu_slot, dev, errp);
 }
 
+static int x86_cpu_get_topo_id_by_level(X86CPU *cpu,
+CPUTopoLevel level)
+{
+switch (level) {
+case CPU_TOPO_THREAD:
+return cpu->thread_id;
+case CPU_TOPO_CORE:
+return cpu->core_id;
+case CPU_TOPO_DIE:
+return cpu->die_id;
+case CPU_TOPO_SOCKET:
+return cpu->socket_id;
+default:
+g_assert_not_reached();
+}
+
+return -1;
+}
+
+typedef struct SearchCoreCb {
+X86CPU *cpu;
+CPUTopoState *parent;
+int id;
+} SearchCoreCb;
+
+static int x86_cpu_search_parent_core(CPUTopoState *topo,
+  void *opaque)
+{
+SearchCoreCb *cb = opaque;
+CPUTopoLevel level = CPU_TOPO_LEVEL(topo);
+
+cb->parent = topo;
+cb->id = x86_cpu_get_topo_id_by_level(cb->cpu, level);
+
+if (cb->id == topo->index) {
+if (level == CPU_TOPO_CORE) {
+return TOPO_FOREACH_END;
+}
+return TOPO_FOREACH_CONTINUE;
+}
+return TOPO_FOREACH_SIBLING;
+}
+
+Object *x86_cpu_search_parent_pre_plug(CPUTopoState *topo,
+   CPUTopoState *root,
+   Error **errp)
+{
+int ret;
+SearchCoreCb cb;
+X86CPUTopoIDs topo_ids;
+X86CPUTopoInfo topo_info;
+X86CPU *cpu = X86_CPU(topo);
+CPUSlot *slot = CPU_SLOT(root);
+MachineState *ms = slot->ms;
+DECLARE_BITMAP(foreach_bitmap, USER_AVAIL_LEVEL_NUM);
+
+topo_info.dies_per_pkg = ms->smp.dies;
+topo_info.cores_per_die = ms->smp.cores;
+topo_info.threads_per_core = ms->smp.threads;
+
+if (cpu->apic_id == UNASSIGNED_APIC_ID) {
+x86_cpu_assign_apic_id(ms, cpu, _ids, _info, errp);
+if (*errp) {
+return NULL;
+}
+} else {
+/*
+ * if 'address' properties socket-id/core-id/thread-id are not set,
+ * set them so that machine_query_hotpluggable_cpus would show
+ * correct values.
+ *
+ * TODO: move socket_id/core_id/thread_id checks into
+ * x86_cpu_realizefn() once -smp refactoring is complete and there
+ * will be CPU private CPUState::nr_cores and CPUState::nr_threads
+ * fields instead of globals.
+ */
+x86_topo_ids_from_apicid(cpu->apic_id, _info, _ids);
+}
+
+x86_cpu_assign_topo_id(cpu, _ids, errp);
+if (*errp) {
+return NULL;
+}
+
+cb.cpu = cpu;
+cb.parent = NULL;
+cb.id = -1;
+bitmap_fill(foreach_bitmap, USER_AVAIL_LEVEL_NUM);
+clear_bit(CPU_TOPO_UNKNOWN, foreach_bitmap);
+clear_bit(CPU_TOPO_THREAD, foreach_bitmap);
+
+ret = cpu_topo_child_foreach_recursive(root, foreach_bitmap,
+   x86_cpu_search_parent_core, );
+if (ret != TOPO_FOREACH_END) {
+g_autofree char *search_info = NULL;
+
+search_info = !cb.parent ? g_strdup("") :
+g_strdup_printf(" for %s level with id: %d",
+

[RFC 04/41] qom/object: Introduce helper to resolve path from non-direct parent

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

When we support child<> property creation from cli, the peripheral
container (/machine/peripheral) may not be the direct parent of the
devices created from cli.

For this case, add a helper to resolve path from non-direct parent.

Signed-off-by: Zhao Liu 
---
 include/qom/object.h | 15 +++
 qom/object.c | 18 ++
 2 files changed, 33 insertions(+)

diff --git a/include/qom/object.h b/include/qom/object.h
index afccd24ca7ab..494eef801be3 100644
--- a/include/qom/object.h
+++ b/include/qom/object.h
@@ -1562,6 +1562,21 @@ Object *object_resolve_path_type(const char *path, const 
char *typename,
  */
 Object *object_resolve_path_at(Object *parent, const char *path);
 
+/**
+ * object_resolve_path_from:
+ * @parent: the object from which to resolve the path
+ * @path: the path to resolve
+ * @ambiguous: returns true if the path resolution failed because of an
+ *   ambiguous match
+ *
+ * This is like object_resolve_path_at(), except @parent may be the
+ * partial parent of @path.
+ *
+ * Returns: The resolved object or NULL on path lookup failure.
+ */
+Object *object_resolve_path_from(Object *parent, const char *path,
+ bool *ambiguous);
+
 /**
  * object_resolve_path_component:
  * @parent: the object in which to resolve the path
diff --git a/qom/object.c b/qom/object.c
index 95c0dc8285fe..da29e88816b5 100644
--- a/qom/object.c
+++ b/qom/object.c
@@ -2192,6 +2192,24 @@ Object *object_resolve_path_at(Object *parent, const 
char *path)
 return object_resolve_abs_path(parent, parts, TYPE_OBJECT);
 }
 
+Object *object_resolve_path_from(Object *parent, const char *path,
+ bool *ambiguousp)
+{
+g_auto(GStrv) parts = NULL;
+bool ambiguous = false;
+Object *obj;
+
+parts = g_strsplit(path, "/", 0);
+assert(parts);
+
+obj = object_resolve_partial_path(parent, parts, TYPE_OBJECT,
+  );
+if (ambiguousp) {
+*ambiguousp = ambiguous;
+}
+return obj;
+}
+
 typedef struct StringProperty
 {
 char *(*get)(Object *, Error **);
-- 
2.34.1




[RFC 41/41] hw/i386: Cleanup non-QOM topology support

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

After i386 supports QOM topology, drop original topology logic.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 52 +++
 1 file changed, 11 insertions(+), 41 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 99f6c502de43..cba8b806cdb6 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -118,7 +118,8 @@ out:
 
 void x86_cpus_init(X86MachineState *x86ms, int default_cpu_version)
 {
-int i;
+CPUCore *core;
+int i, cpu_index = 0, core_idx = 0;
 const CPUArchIdList *possible_cpus;
 MachineState *ms = MACHINE(x86ms);
 MachineClass *mc = MACHINE_GET_CLASS(x86ms);
@@ -153,34 +154,17 @@ void x86_cpus_init(X86MachineState *x86ms, int 
default_cpu_version)
 
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 
-/*
- * possible_cpus_qom_granu means the QOM topology support.
- *
- * TODO: Drop the "!mc->smp_props.possible_cpus_qom_granu" case when
- * i386 completes QOM topology support.
- */
-if (mc->smp_props.possible_cpus_qom_granu) {
-CPUCore *core;
-int cpu_index = 0;
-int core_idx = 0;
-
-MACHINE_CORE_FOREACH(ms, core) {
-for (i = 0; i < core->plugged_threads; i++) {
-x86_cpu_new(x86ms, possible_cpus->cpus[cpu_index].arch_id,
-OBJECT(core), cpu_index, _fatal);
-cpu_index++;
-}
-
-if (core->plugged_threads < core->nr_threads) {
-cpu_index += core->nr_threads - core->plugged_threads;
-}
-core_idx++;
+MACHINE_CORE_FOREACH(ms, core) {
+for (i = 0; i < core->plugged_threads; i++) {
+x86_cpu_new(x86ms, possible_cpus->cpus[cpu_index].arch_id,
+OBJECT(core), cpu_index, _fatal);
+cpu_index++;
 }
-} else {
-for (i = 0; i < ms->smp.cpus; i++) {
-x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
-NULL, i, _fatal);
+
+if (core->plugged_threads < core->nr_threads) {
+cpu_index += core->nr_threads - core->plugged_threads;
 }
+core_idx++;
 }
 }
 
@@ -460,20 +444,6 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 return;
 }
 
-/*
- * possible_cpus_qom_granu means the QOM topology support.
- *
- * TODO: Drop the "!mc->smp_props.possible_cpus_qom_granu" case when
- * i386 completes QOM topology support.
- */
-if (!mc->smp_props.possible_cpus_qom_granu) {
-x86_topo_ids_from_apicid(cpu->apic_id, _info, _ids);
-x86_cpu_assign_topo_id(cpu, _ids, errp);
-if (*errp) {
-return;
-}
-}
-
 if (hyperv_feat_enabled(cpu, HYPERV_FEAT_VPINDEX) &&
 kvm_enabled() && !kvm_hv_vpindex_settable()) {
 error_setg(errp, "kernel doesn't allow setting HyperV VP_INDEX");
-- 
2.34.1




[RFC 37/41] hw/i386: Allow i386 to create new CPUs from QOM topology

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

For QOM topology, maximum number of CPUs and the number of plugged CPUs
are configured in core level.

Iterate through all the cpu-cores to determine how many CPUs should be
created in each cpu-core.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 32 +---
 1 file changed, 29 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 3c99f4c3ab51..febffed92a83 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -152,9 +152,35 @@ void x86_cpus_init(X86MachineState *x86ms, int 
default_cpu_version)
 }
 
 possible_cpus = mc->possible_cpu_arch_ids(ms);
-for (i = 0; i < ms->smp.cpus; i++) {
-x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
-NULL, i, _fatal);
+
+/*
+ * possible_cpus_qom_granu means the QOM topology support.
+ *
+ * TODO: Drop the "!mc->smp_props.possible_cpus_qom_granu" case when
+ * i386 completes QOM topology support.
+ */
+if (mc->smp_props.possible_cpus_qom_granu) {
+CPUCore *core;
+int cpu_index = 0;
+int core_idx = 0;
+
+MACHINE_CORE_FOREACH(ms, core) {
+for (i = 0; i < core->plugged_threads; i++) {
+x86_cpu_new(x86ms, possible_cpus->cpus[cpu_index].arch_id,
+OBJECT(core), cpu_index, _fatal);
+cpu_index++;
+}
+
+if (core->plugged_threads < core->nr_threads) {
+cpu_index += core->nr_threads - core->plugged_threads;
+}
+core_idx++;
+}
+} else {
+for (i = 0; i < ms->smp.cpus; i++) {
+x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
+NULL, i, _fatal);
+}
 }
 }
 
-- 
2.34.1




[RFC 07/41] qdev: Introduce parent option in -device

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Currently, the devices added by "-device" are linked via bus, and are
set the parent as peripheral-anon or peripheral containers of the
machine.

But this is not enough for building CPU topology hierarchies as:
1. The relationship between different CPU hierarchies is child<>
   property other than link<> property, and they shouldn't be linked
   using the special bus.
2. The canonical path of device is built from the child<> property, and
   the well defined CPU topology hierarchies ask their canonical path to
   reflect the correct topological relationship.

With these, the child<> property support is needed for QDev interface to
allow user to configure proper parent in "-device".

Introduce the "parent" option in "-device" to create the child<>
property. This option asks for the device id of the parent device.

Signed-off-by: Zhao Liu 
---
 hw/xen/xen-legacy-backend.c |  2 +-
 include/monitor/qdev.h  |  3 ++-
 system/qdev-monitor.c   | 50 ++---
 3 files changed, 38 insertions(+), 17 deletions(-)

diff --git a/hw/xen/xen-legacy-backend.c b/hw/xen/xen-legacy-backend.c
index 124dd5f3d687..70ad11c6287e 100644
--- a/hw/xen/xen-legacy-backend.c
+++ b/hw/xen/xen-legacy-backend.c
@@ -184,7 +184,7 @@ static struct XenLegacyDevice *xen_be_get_xendev(const char 
*type, int dom,
 object_initialize(>qdev, ops->size, TYPE_XENBACKEND);
 OBJECT(xendev)->free = g_free;
 qdev_set_id(DEVICE(xendev), g_strdup_printf("xen-%s-%d", type, dev),
-_fatal);
+NULL, _fatal);
 qdev_realize(DEVICE(xendev), xen_sysbus, _fatal);
 object_unref(OBJECT(xendev));
 
diff --git a/include/monitor/qdev.h b/include/monitor/qdev.h
index f5fd6e6c1ffc..3d9d06158e5f 100644
--- a/include/monitor/qdev.h
+++ b/include/monitor/qdev.h
@@ -16,6 +16,7 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, 
long *category,
  * qdev_set_id: parent the device and set its id if provided.
  * @dev: device to handle
  * @id: id to be given to the device, or NULL.
+ * @parent: parent to be set for the device, or NULL.
  *
  * Returns: the id of the device in case of success; otherwise NULL.
  *
@@ -34,6 +35,6 @@ DeviceState *qdev_device_add_from_qdict(const QDict *opts, 
long *category,
  * returned string is owned by the corresponding child property and must
  * not be freed by the caller.
  */
-const char *qdev_set_id(DeviceState *dev, char *id, Error **errp);
+const char *qdev_set_id(DeviceState *dev, char *id, char *parent, Error 
**errp);
 
 #endif
diff --git a/system/qdev-monitor.c b/system/qdev-monitor.c
index 0261937b8462..8f56113eef65 100644
--- a/system/qdev-monitor.c
+++ b/system/qdev-monitor.c
@@ -587,22 +587,33 @@ static BusState *qbus_find(const char *path, Error **errp)
 }
 
 static Object *qdev_find_peripheral_parent(DeviceState *dev,
+   char *parent_id,
Error **errp)
 {
 Object *parent_obj, *obj = OBJECT(dev);
 
-parent_obj = uc_provide_default_parent(obj, errp);
-if (*errp) {
-return NULL;
-}
+if (parent_id) {
+parent_obj = object_resolve_path_from(qdev_get_peripheral(),
+  parent_id, NULL);
+if (parent_obj) {
+if (uc_check_user_parent(obj, parent_obj)) {
+return parent_obj;
+}
+}
+} else {
+parent_obj = uc_provide_default_parent(obj, errp);
+if (*errp) {
+return NULL;
+}
 
-if (parent_obj) {
-/*
- * Non-anonymous parents (under "/peripheral") are allowed to
- * be accessed to create child<> properties.
- */
-if (object_is_child_from(parent_obj, qdev_get_peripheral())) {
-return parent_obj;
+if (parent_obj) {
+/*
+ * Non-anonymous parents (under "/peripheral") are allowed to
+ * be accessed to create child<> properties.
+ */
+if (object_is_child_from(parent_obj, qdev_get_peripheral())) {
+return parent_obj;
+}
 }
 }
 
@@ -628,7 +639,8 @@ static bool qdev_pre_check_device_id(char *id, Error **errp)
 }
 
 /* Takes ownership of @id, will be freed when deleting the device */
-const char *qdev_set_id(DeviceState *dev, char *id, Error **errp)
+const char *qdev_set_id(DeviceState *dev, char *id,
+char *parent, Error **errp)
 {
 Object *parent_obj = NULL;
 ObjectProperty *prop;
@@ -639,7 +651,7 @@ const char *qdev_set_id(DeviceState *dev, char *id, Error 
**errp)
 uc = (UserChild *)object_dynamic_cast(OBJECT(dev), TYPE_USER_CHILD);
 
 if (uc) {
-parent_obj = qdev_find_peripheral_parent(dev, errp);
+parent_obj = qdev_find_peripheral_parent(dev, parent, errp);
 if (*errp) {
 goto err;
 }
@@ -655,6 +667,11 @@ const char 

[RFC 03/41] system: Create base category devices from cli before board initialization

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Topology devices are required to complete CPU topology building before
*_init_cpus() in MachineClass.init().

Add a qemu_create_cli_base_devices() before board initialization to
help create and realize topology devices from cli early.

Signed-off-by: Zhao Liu 
---
 system/vl.c | 51 ++-
 1 file changed, 34 insertions(+), 17 deletions(-)

diff --git a/system/vl.c b/system/vl.c
index 0be155b530b4..65add2fb2460 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -1197,8 +1197,9 @@ static int device_help_func(void *opaque, QemuOpts *opts, 
Error **errp)
 static int device_init_func(void *opaque, QemuOpts *opts, Error **errp)
 {
 DeviceState *dev;
+long *category = opaque;
 
-dev = qdev_device_add(opts, NULL, errp);
+dev = qdev_device_add(opts, category, errp);
 if (!dev && *errp) {
 error_report_err(*errp);
 return -1;
@@ -2617,25 +2618,13 @@ static void qemu_init_board(void)
 realtime_init();
 }
 
-static void qemu_create_cli_devices(void)
+static void qemu_create_cli_devices(long *category)
 {
 DeviceOption *opt;
 
-soundhw_init();
-
-qemu_opts_foreach(qemu_find_opts("fw_cfg"),
-  parse_fw_cfg, fw_cfg_find(), _fatal);
-
-/* init USB devices */
-if (machine_usb(current_machine)) {
-if (foreach_device_config(DEV_USB, usb_parse) < 0)
-exit(1);
-}
-
-/* init generic devices */
 rom_set_order_override(FW_CFG_ORDER_OVERRIDE_DEVICE);
 qemu_opts_foreach(qemu_find_opts("device"),
-  device_init_func, NULL, _fatal);
+  device_init_func, category, _fatal);
 QTAILQ_FOREACH(opt, _opts, next) {
 DeviceState *dev;
 loc_push_restore(>loc);
@@ -2646,13 +2635,40 @@ static void qemu_create_cli_devices(void)
  * from the start, so call qdev_device_add_from_qdict() directly for
  * now.
  */
-dev = qdev_device_add_from_qdict(opt->opts, NULL, true, _fatal);
+dev = qdev_device_add_from_qdict(opt->opts, category,
+ true, _fatal);
 object_unref(OBJECT(dev));
 loc_pop(>loc);
 }
 rom_reset_order_override();
 }
 
+static void qemu_create_cli_base_devices(void)
+{
+long category = DEVICE_CATEGORY_CPU_DEF;
+
+qemu_opts_foreach(qemu_find_opts("fw_cfg"),
+  parse_fw_cfg, fw_cfg_find(), _fatal);
+
+/* init CPU topology devices which don't support hotplug. */
+qemu_create_cli_devices();
+}
+
+static void qemu_create_cli_periphery_devices(void)
+{
+soundhw_init();
+
+/* init USB devices */
+if (machine_usb(current_machine)) {
+if (foreach_device_config(DEV_USB, usb_parse) < 0) {
+exit(1);
+}
+}
+
+/* init generic devices */
+qemu_create_cli_devices(NULL);
+}
+
 static void qemu_machine_creation_done(void)
 {
 MachineState *machine = MACHINE(qdev_get_machine());
@@ -2701,8 +2717,9 @@ void qmp_x_exit_preconfig(Error **errp)
 return;
 }
 
+qemu_create_cli_base_devices();
 qemu_init_board();
-qemu_create_cli_devices();
+qemu_create_cli_periphery_devices();
 qemu_machine_creation_done();
 
 if (loadvm) {
-- 
2.34.1




[RFC 40/41] hw/i386: Support QOM topology

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Set MachineClass.smp_props.possible_cpus_qom_granu and
TopoClass.search_parent_pre_plug for i386.

So far, the i386 topology is based on the QOM topology.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 1 +
 target/i386/cpu.c | 4 
 2 files changed, 5 insertions(+)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index 595d4365fdd1..99f6c502de43 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -1565,6 +1565,7 @@ static void x86_machine_class_init(ObjectClass *oc, void 
*data)
 mc->cpu_index_to_instance_props = x86_cpu_index_to_props;
 mc->get_default_cpu_node_id = x86_get_default_cpu_node_id;
 mc->possible_cpu_arch_ids = x86_possible_cpu_arch_ids;
+mc->smp_props.possible_cpus_qom_granu = CPU_TOPO_THREAD;
 x86mc->save_tsc_khz = true;
 x86mc->fwcfg_dma_enabled = true;
 nc->nmi_monitor_handler = x86_nmi;
diff --git a/target/i386/cpu.c b/target/i386/cpu.c
index cd16cb893daf..1de5726691e1 100644
--- a/target/i386/cpu.c
+++ b/target/i386/cpu.c
@@ -41,6 +41,7 @@
 #include "exec/address-spaces.h"
 #include "hw/boards.h"
 #include "hw/i386/sgx-epc.h"
+#include "hw/i386/x86.h"
 #endif
 
 #include "disas/capstone.h"
@@ -8009,6 +8010,9 @@ static void x86_cpu_common_class_init(ObjectClass *oc, 
void *data)
 #if !defined(CONFIG_USER_ONLY)
 object_class_property_add(oc, "crash-information", "GuestPanicInformation",
   x86_cpu_get_crash_info_qom, NULL, NULL, NULL);
+
+CPU_TOPO_CLASS(oc)->search_parent_pre_plug =
+x86_cpu_search_parent_pre_plug;
 #endif
 
 for (w = 0; w < FEATURE_WORDS; w++) {
-- 
2.34.1




[RFC 35/41] hw/i386: Make x86_cpu_new() private in x86.c

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

x86_cpu_new() is only invoked in x86.c. Declear it as static in x86.c.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 3 ++-
 include/hw/i386/x86.h | 2 --
 2 files changed, 2 insertions(+), 3 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index b3d054889bba..d9293846db64 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -95,7 +95,8 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
 }
 
 
-void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id, Error **errp)
+static void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id,
+Error **errp)
 {
 Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
 
diff --git a/include/hw/i386/x86.h b/include/hw/i386/x86.h
index da19ae15463a..19e9f93fe286 100644
--- a/include/hw/i386/x86.h
+++ b/include/hw/i386/x86.h
@@ -97,8 +97,6 @@ OBJECT_DECLARE_TYPE(X86MachineState, X86MachineClass, 
X86_MACHINE)
 
 uint32_t x86_cpu_apic_id_from_index(X86MachineState *pcms,
 unsigned int cpu_index);
-
-void x86_cpu_new(X86MachineState *pcms, int64_t apic_id, Error **errp);
 void x86_cpus_init(X86MachineState *pcms, int default_cpu_version);
 CpuInstanceProperties x86_cpu_index_to_props(MachineState *ms,
  unsigned cpu_index);
-- 
2.34.1




[RFC 29/41] hw/core/slot: Statistics topology information in CPU slot

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

The CPU slot, as the root of the topology tree, is responsible for
global topology information collection and updates.

When a new topology device is added to/deleted from the topology tree,
update the corresponding information in the slot.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 41 +++-
 include/hw/core/cpu-slot.h | 43 ++
 2 files changed, 83 insertions(+), 1 deletion(-)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index a6b7d98dea18..e8e6f4d25532 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -22,14 +22,44 @@
 
 #include "hw/core/cpu-slot.h"
 
+static inline
+CPUTopoStatEntry *get_topo_stat_entry(CPUTopoStat *stat,
+  CPUTopoLevel level)
+{
+assert(level != CPU_TOPO_UNKNOWN);
+
+return >entries[TOPO_STAT_ENTRY_IDX(level)];
+}
+
 static void cpu_slot_add_topo_info(CPUTopoState *root, CPUTopoState *child)
 {
 CPUSlot *slot = CPU_SLOT(root);
 CPUTopoLevel level = CPU_TOPO_LEVEL(child);
+CPUTopoStatEntry *entry;
 
 if (level == CPU_TOPO_CORE) {
-QTAILQ_INSERT_TAIL(>cores, CPU_CORE(child), node);
+CPUCore *core = CPU_CORE(child);
+CPUTopoStatEntry *thread_entry;
+
+QTAILQ_INSERT_TAIL(>cores, core, node);
+
+/* Max CPUs per core is pre-configured by "nr-threads". */
+slot->stat.max_cpus += core->nr_threads;
+slot->stat.pre_plugged_cpus += core->plugged_threads;
+
+thread_entry = get_topo_stat_entry(>stat, CPU_TOPO_THREAD);
+if (child->max_children > thread_entry->max_units) {
+thread_entry->max_units = child->max_children;
+}
 }
+
+entry = get_topo_stat_entry(>stat, level);
+entry->total_units++;
+if (child->parent->num_children > entry->max_units) {
+entry->max_units = child->parent->num_children;
+}
+
+set_bit(level, slot->stat.curr_levels);
 return;
 }
 
@@ -37,10 +67,18 @@ static void cpu_slot_del_topo_info(CPUTopoState *root, 
CPUTopoState *child)
 {
 CPUSlot *slot = CPU_SLOT(root);
 CPUTopoLevel level = CPU_TOPO_LEVEL(child);
+CPUTopoStatEntry *entry;
+
+assert(level != CPU_TOPO_UNKNOWN);
 
 if (level == CPU_TOPO_CORE) {
 QTAILQ_REMOVE(>cores, CPU_CORE(child), node);
 }
+
+entry = get_topo_stat_entry(>stat, level);
+entry->total_units--;
+
+/* No need to update entries[*].max_units and curr_levels. */
 return;
 }
 
@@ -73,6 +111,7 @@ static void cpu_slot_instance_init(Object *obj)
 CPUSlot *slot = CPU_SLOT(obj);
 
 QTAILQ_INIT(>cores);
+set_bit(CPU_TOPO_ROOT, slot->stat.curr_levels);
 }
 
 static const TypeInfo cpu_slot_type_info = {
diff --git a/include/hw/core/cpu-slot.h b/include/hw/core/cpu-slot.h
index d2a1160562be..fa2bd4af247d 100644
--- a/include/hw/core/cpu-slot.h
+++ b/include/hw/core/cpu-slot.h
@@ -25,6 +25,47 @@
 #include "hw/cpu/core.h"
 #include "hw/qdev-core.h"
 
+/**
+ * @USER_AVAIL_LEVEL_NUM: the number of total topology levels in topology
+ *bitmap, which includes CPU_TOPO_UNKNOWN.
+ */
+#define USER_AVAIL_LEVEL_NUM (CPU_TOPO_ROOT + 1)
+
+/**
+ * @VALID_LEVEL_NUM: the number of valid topology levels, which excludes
+ *   CPU_TOPO_UNKNOWN and CPU_TOPO_ROOT.
+ */
+#define VALID_LEVEL_NUM (CPU_TOPO_ROOT - 1)
+
+#define TOPO_STAT_ENTRY_IDX(level) ((level) - 1)
+
+/**
+ * CPUTopoStatEntry:
+ * @total: Total number of topological units at the same level that are
+ * currently inserted in CPU slot
+ * @max: Maximum number of topological units at the same level under the
+ *   parent topolofical container
+ */
+typedef struct CPUTopoStatEntry {
+unsigned int total_units;
+unsigned int max_units;
+} CPUTopoStatEntry;
+
+/**
+ * CPUTopoStat:
+ * @max_cpus: Maximum number of CPUs in CPU slot.
+ * @pre_plugged_cpus: Number of pre-plugged CPUs in CPU slot.
+ * @entries: Detail count information for valid topology levels under
+ *   CPU slot
+ * @curr_levels: Current CPU topology levels inserted in CPU slot
+ */
+typedef struct CPUTopoStat {
+unsigned int max_cpus;
+unsigned int pre_plugged_cpus;
+CPUTopoStatEntry entries[VALID_LEVEL_NUM];
+DECLARE_BITMAP(curr_levels, USER_AVAIL_LEVEL_NUM);
+} CPUTopoStat;
+
 #define TYPE_CPU_SLOT "cpu-slot"
 
 OBJECT_DECLARE_SIMPLE_TYPE(CPUSlot, CPU_SLOT)
@@ -35,6 +76,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(CPUSlot, CPU_SLOT)
  * where the cpu-slot is the root. cpu-slot can maintain similar
  * queues for other topology levels to facilitate traversal
  * when necessary.
+ * @stat: Statistical topology information for topology tree.
  */
 struct CPUSlot {
 /*< private >*/
@@ -42,6 +84,7 @@ struct CPUSlot {
 
 /*< public >*/
 QTAILQ_HEAD(, CPUCore) cores;
+CPUTopoStat stat;
 };
 
 #endif /* CPU_SLOT_H */
-- 
2.34.1




[RFC 38/41] hw/i386: Wrap apic id and topology sub ids assigning as helpers

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

For QOM topology, these 2 helpers are needed for hotplugged CPU to
verify its topology sub indexes and then search its parent core.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 173 --
 1 file changed, 96 insertions(+), 77 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index febffed92a83..04edd6de6aeb 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -306,6 +306,98 @@ void x86_cpu_unplug_cb(HotplugHandler *hotplug_dev,
 error_propagate(errp, local_err);
 }
 
+static void x86_cpu_assign_apic_id(MachineState *ms, X86CPU *cpu,
+   X86CPUTopoIDs *topo_ids,
+   X86CPUTopoInfo *topo_info,
+   Error **errp)
+{
+int max_socket = (ms->smp.max_cpus - 1) /
+ ms->smp.threads / ms->smp.cores / ms->smp.dies;
+
+/*
+ * die-id was optional in QEMU 4.0 and older, so keep it optional
+ * if there's only one die per socket.
+ */
+if (cpu->die_id < 0 && ms->smp.dies == 1) {
+cpu->die_id = 0;
+}
+
+if (cpu->socket_id < 0) {
+error_setg(errp, "CPU socket-id is not set");
+return;
+} else if (cpu->socket_id > max_socket) {
+error_setg(errp, "Invalid CPU socket-id: %u must be in range 0:%u",
+   cpu->socket_id, max_socket);
+return;
+}
+if (cpu->die_id < 0) {
+error_setg(errp, "CPU die-id is not set");
+return;
+} else if (cpu->die_id > ms->smp.dies - 1) {
+error_setg(errp, "Invalid CPU die-id: %u must be in range 0:%u",
+   cpu->die_id, ms->smp.dies - 1);
+return;
+}
+if (cpu->core_id < 0) {
+error_setg(errp, "CPU core-id is not set");
+return;
+} else if (cpu->core_id > (ms->smp.cores - 1)) {
+error_setg(errp, "Invalid CPU core-id: %u must be in range 0:%u",
+   cpu->core_id, ms->smp.cores - 1);
+return;
+}
+if (cpu->thread_id < 0) {
+error_setg(errp, "CPU thread-id is not set");
+return;
+} else if (cpu->thread_id > (ms->smp.threads - 1)) {
+error_setg(errp, "Invalid CPU thread-id: %u must be in range 0:%u",
+   cpu->thread_id, ms->smp.threads - 1);
+return;
+}
+
+topo_ids->pkg_id = cpu->socket_id;
+topo_ids->die_id = cpu->die_id;
+topo_ids->core_id = cpu->core_id;
+topo_ids->smt_id = cpu->thread_id;
+cpu->apic_id = x86_apicid_from_topo_ids(topo_info, topo_ids);
+}
+
+static void x86_cpu_assign_topo_id(X86CPU *cpu,
+   X86CPUTopoIDs *topo_ids,
+   Error **errp)
+{
+if (cpu->socket_id != -1 && cpu->socket_id != topo_ids->pkg_id) {
+error_setg(errp, "property socket-id: %u doesn't match set apic-id:"
+" 0x%x (socket-id: %u)", cpu->socket_id, cpu->apic_id,
+topo_ids->pkg_id);
+return;
+}
+cpu->socket_id = topo_ids->pkg_id;
+
+if (cpu->die_id != -1 && cpu->die_id != topo_ids->die_id) {
+error_setg(errp, "property die-id: %u doesn't match set apic-id:"
+" 0x%x (die-id: %u)", cpu->die_id, cpu->apic_id, topo_ids->die_id);
+return;
+}
+cpu->die_id = topo_ids->die_id;
+
+if (cpu->core_id != -1 && cpu->core_id != topo_ids->core_id) {
+error_setg(errp, "property core-id: %u doesn't match set apic-id:"
+" 0x%x (core-id: %u)", cpu->core_id, cpu->apic_id,
+topo_ids->core_id);
+return;
+}
+cpu->core_id = topo_ids->core_id;
+
+if (cpu->thread_id != -1 && cpu->thread_id != topo_ids->smt_id) {
+error_setg(errp, "property thread-id: %u doesn't match set apic-id:"
+" 0x%x (thread-id: %u)", cpu->thread_id, cpu->apic_id,
+topo_ids->smt_id);
+return;
+}
+cpu->thread_id = topo_ids->smt_id;
+}
+
 void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
   DeviceState *dev, Error **errp)
 {
@@ -317,8 +409,6 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
 CPUX86State *env = >env;
 MachineState *ms = MACHINE(hotplug_dev);
 X86MachineState *x86ms = X86_MACHINE(hotplug_dev);
-unsigned int smp_cores = ms->smp.cores;
-unsigned int smp_threads = ms->smp.threads;
 X86CPUTopoInfo topo_info;
 
 if (!object_dynamic_cast(OBJECT(cpu), ms->cpu_type)) {
@@ -347,55 +437,10 @@ void x86_cpu_pre_plug(HotplugHandler *hotplug_dev,
  * set it based on socket/die/core/thread properties.
  */
 if (cpu->apic_id == UNASSIGNED_APIC_ID) {
-int max_socket = (ms->smp.max_cpus - 1) /
-smp_threads / smp_cores / ms->smp.dies;
-
-/*
- * die-id was optional in QEMU 4.0 and older, so keep it optional
- * if there's only one die per socket.
- */
-if (cpu->die_id < 0 && ms->smp.dies == 1) {
-

[RFC 27/41] hw/core/slot: Introduce CPU slot as the root of CPU topology

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Abstract the root of topology tree as a special topology device
"cpu-slot".

Signed-off-by: Zhao Liu 
---
 MAINTAINERS|  2 ++
 hw/core/cpu-slot.c | 48 ++
 hw/core/meson.build|  1 +
 include/hw/core/cpu-slot.h | 38 ++
 4 files changed, 89 insertions(+)
 create mode 100644 hw/core/cpu-slot.c
 create mode 100644 include/hw/core/cpu-slot.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 4b373ff46ce3..ac08b5a8c4e0 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1856,6 +1856,7 @@ R: Philippe Mathieu-Daudé 
 R: Yanan Wang 
 S: Supported
 F: hw/core/cpu.c
+F: hw/core/cpu-slot.c
 F: hw/core/cpu-topo.c
 F: hw/core/machine-qmp-cmds.c
 F: hw/core/machine.c
@@ -1872,6 +1873,7 @@ F: qapi/machine-common.json
 F: qapi/machine-target.json
 F: include/hw/boards.h
 F: include/hw/core/cpu.h
+F: include/hw/core/cpu-slot.h
 F: include/hw/core/cpu-topo.h
 F: include/hw/cpu/book.h
 F: include/hw/cpu/cluster.h
diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
new file mode 100644
index ..5aef5b0189c2
--- /dev/null
+++ b/hw/core/cpu-slot.c
@@ -0,0 +1,48 @@
+/*
+ * CPU slot device abstraction
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+
+#include "hw/core/cpu-slot.h"
+
+static void cpu_slot_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_CPU_DEF, dc->categories);
+dc->user_creatable = false;
+
+tc->level = CPU_TOPO_ROOT;
+}
+
+static const TypeInfo cpu_slot_type_info = {
+.name = TYPE_CPU_SLOT,
+.parent = TYPE_CPU_TOPO,
+.class_init = cpu_slot_class_init,
+.instance_size = sizeof(CPUSlot),
+};
+
+static void cpu_slot_register_types(void)
+{
+type_register_static(_slot_type_info);
+}
+
+type_init(cpu_slot_register_types)
diff --git a/hw/core/meson.build b/hw/core/meson.build
index 501d2529697e..3347c054e162 100644
--- a/hw/core/meson.build
+++ b/hw/core/meson.build
@@ -23,6 +23,7 @@ else
 endif
 
 common_ss.add(files('cpu-common.c'))
+common_ss.add(files('cpu-slot.c'))
 common_ss.add(files('cpu-topo.c'))
 common_ss.add(files('machine-smp.c'))
 system_ss.add(when: 'CONFIG_FITLOADER', if_true: files('loader-fit.c'))
diff --git a/include/hw/core/cpu-slot.h b/include/hw/core/cpu-slot.h
new file mode 100644
index ..718c8ecaa751
--- /dev/null
+++ b/include/hw/core/cpu-slot.h
@@ -0,0 +1,38 @@
+/*
+ * CPU slot device abstraction
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef CPU_SLOT_H
+#define CPU_SLOT_H
+
+#include "hw/core/cpu-topo.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_CPU_SLOT "cpu-slot"
+
+OBJECT_DECLARE_SIMPLE_TYPE(CPUSlot, CPU_SLOT)
+
+struct CPUSlot {
+/*< private >*/
+CPUTopoState parent_obj;
+
+/*< public >*/
+};
+
+#endif /* CPU_SLOT_H */
-- 
2.34.1




[RFC 25/41] hw/cpu/book: Abstract cpu-book level as topology device

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Abstract book level as a topology device "cpu-book" to allow user to
create book level topology from cli and later the cpu-books could be
added into topology tree.

In addition, mark the cpu-book as DEVICE_CATEGORY_CPU_DEF category to
indicate it belongs to the basic CPU definition and should be created
from cli before board initialization.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS   |  2 ++
 hw/cpu/book.c | 46 +++
 hw/cpu/meson.build|  2 +-
 include/hw/cpu/book.h | 38 +++
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu/book.c
 create mode 100644 include/hw/cpu/book.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 6a9fa0aeed0c..dd5adfda64cc 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1862,6 +1862,7 @@ F: hw/core/machine.c
 F: hw/core/machine-smp.c
 F: hw/core/null-machine.c
 F: hw/core/numa.c
+F: hw/cpu/book.c
 F: hw/cpu/cluster.c
 F: hw/cpu/die.c
 F: hw/cpu/socket.c
@@ -1871,6 +1872,7 @@ F: qapi/machine-target.json
 F: include/hw/boards.h
 F: include/hw/core/cpu.h
 F: include/hw/core/cpu-topo.h
+F: include/hw/cpu/book.h
 F: include/hw/cpu/cluster.h
 F: include/hw/cpu/die.h
 F: include/hw/cpu/socket.h
diff --git a/hw/cpu/book.c b/hw/cpu/book.c
new file mode 100644
index ..4b16267b10eb
--- /dev/null
+++ b/hw/cpu/book.c
@@ -0,0 +1,46 @@
+/*
+ * CPU book abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/book.h"
+
+static void cpu_book_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_CPU_DEF, dc->categories);
+
+tc->level = CPU_TOPO_BOOK;
+}
+
+static const TypeInfo cpu_book_type_info = {
+.name = TYPE_CPU_BOOK,
+.parent = TYPE_CPU_TOPO,
+.class_init = cpu_book_class_init,
+.instance_size = sizeof(CPUBook),
+};
+
+static void cpu_book_register_types(void)
+{
+type_register_static(_book_type_info);
+}
+
+type_init(cpu_book_register_types)
diff --git a/hw/cpu/meson.build b/hw/cpu/meson.build
index 251724fea86c..c44b54c5abb0 100644
--- a/hw/cpu/meson.build
+++ b/hw/cpu/meson.build
@@ -1,4 +1,4 @@
-system_ss.add(files('core.c', 'cluster.c', 'die.c', 'socket.c'))
+system_ss.add(files('core.c', 'cluster.c', 'die.c', 'socket.c', 'book.c'))
 
 system_ss.add(when: 'CONFIG_ARM11MPCORE', if_true: files('arm11mpcore.c'))
 system_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview_mpcore.c'))
diff --git a/include/hw/cpu/book.h b/include/hw/cpu/book.h
new file mode 100644
index ..b91bd553bea6
--- /dev/null
+++ b/include/hw/cpu/book.h
@@ -0,0 +1,38 @@
+/*
+ * CPU book abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef HW_CPU_BOOK_H
+#define HW_CPU_BOOK_H
+
+#include "hw/core/cpu-topo.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_CPU_BOOK "cpu-book"
+
+OBJECT_DECLARE_SIMPLE_TYPE(CPUBook, CPU_BOOK)
+
+struct CPUBook {
+/*< private >*/
+CPUTopoState obj;
+
+/*< public >*/
+};
+
+#endif /* HW_CPU_BOOK_H */
-- 
2.34.1




[RFC 32/41] hw/machine: Build smp topology tree from -smp

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

For the architecture supports QOM topology (the field
MachineClass.possible_cpus_qom_granu is set), implement smp QOM topology
tree from MachineState.smp.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 217 +
 hw/core/machine-smp.c  |   9 ++
 hw/cpu/core.c  |   1 -
 include/hw/boards.h|  11 ++
 include/hw/core/cpu-slot.h |   5 +
 tests/unit/meson.build |   5 +-
 6 files changed, 246 insertions(+), 2 deletions(-)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index 4b148440ed3d..ade155baf60b 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -22,6 +22,11 @@
 
 #include "hw/boards.h"
 #include "hw/core/cpu-slot.h"
+#include "hw/cpu/book.h"
+#include "hw/cpu/cluster.h"
+#include "hw/cpu/die.h"
+#include "hw/cpu/drawer.h"
+#include "hw/cpu/socket.h"
 #include "qapi/error.h"
 
 static inline
@@ -196,3 +201,215 @@ void machine_plug_cpu_slot(MachineState *ms)
 clear_bit(CPU_TOPO_DRAWER, ms->topo->supported_levels);
 }
 }
+
+static unsigned int *get_smp_info_by_level(CpuTopology *smp_info,
+   CPUTopoLevel child_level)
+{
+switch (child_level) {
+case CPU_TOPO_THREAD:
+return _info->threads;
+case CPU_TOPO_CORE:
+return _info->cores;
+case CPU_TOPO_CLUSTER:
+return _info->clusters;
+case CPU_TOPO_DIE:
+return _info->dies;
+case CPU_TOPO_SOCKET:
+return _info->sockets;
+case CPU_TOPO_BOOK:
+return _info->books;
+case CPU_TOPO_DRAWER:
+return _info->drawers;
+default:
+/* No need to consider CPU_TOPO_UNKNOWN, and CPU_TOPO_ROOT. */
+g_assert_not_reached();
+}
+
+return NULL;
+}
+
+static const char *get_topo_typename_by_level(CPUTopoLevel level)
+{
+switch (level) {
+case CPU_TOPO_CORE:
+return TYPE_CPU_CORE;
+case CPU_TOPO_CLUSTER:
+return TYPE_CPU_CLUSTER;
+case CPU_TOPO_DIE:
+return TYPE_CPU_DIE;
+case CPU_TOPO_SOCKET:
+return TYPE_CPU_SOCKET;
+case CPU_TOPO_BOOK:
+return TYPE_CPU_BOOK;
+case CPU_TOPO_DRAWER:
+return TYPE_CPU_DRAWER;
+default:
+/*
+ * No need to consider CPU_TOPO_UNKNOWN, CPU_TOPO_THREAD
+ * and CPU_TOPO_ROOT.
+ */
+g_assert_not_reached();
+}
+
+return NULL;
+}
+
+static char *get_topo_global_name(CPUTopoStat *stat,
+  CPUTopoLevel level)
+{
+const char *type = NULL;
+CPUTopoStatEntry *entry;
+
+type = cpu_topo_level_to_string(level);
+entry = get_topo_stat_entry(stat, level);
+return g_strdup_printf("%s[%d]", type, entry->total_units);
+}
+
+typedef struct SMPBuildCbData {
+unsigned long *supported_levels;
+unsigned int plugged_cpus;
+CpuTopology *smp_info;
+CPUTopoStat *stat;
+Error **errp;
+} SMPBuildCbData;
+
+static int smp_core_set_threads(Object *core, unsigned int max_threads,
+unsigned int *plugged_cpus, Error **errp)
+{
+if (!object_property_set_int(core, "nr-threads", max_threads, errp)) {
+object_unref(core);
+return TOPO_FOREACH_ERR;
+}
+
+if (*plugged_cpus > max_threads) {
+if (!object_property_set_int(core, "plugged-threads",
+ max_threads, errp)) {
+object_unref(core);
+return TOPO_FOREACH_ERR;
+}
+*plugged_cpus -= max_threads;
+} else{
+if (!object_property_set_int(core, "plugged-threads",
+ *plugged_cpus, errp)) {
+object_unref(core);
+return TOPO_FOREACH_ERR;
+}
+*plugged_cpus = 0;
+}
+
+return TOPO_FOREACH_CONTINUE;
+}
+
+static int add_smp_topo_child(CPUTopoState *topo, void *opaque)
+{
+CPUTopoLevel level, child_level;
+Object *parent = OBJECT(topo);
+SMPBuildCbData *cb = opaque;
+unsigned int *nr_children;
+Error **errp = cb->errp;
+
+level = CPU_TOPO_LEVEL(topo);
+child_level = find_last_bit(cb->supported_levels, level);
+
+/*
+ * child_level equals to level, that means no child needs to create.
+ * This shouldn't happen.
+ */
+g_assert(child_level != level);
+
+nr_children = get_smp_info_by_level(cb->smp_info, child_level);
+topo->max_children = *nr_children;
+
+for (int i = 0; i < topo->max_children; i++) {
+ObjectProperty *prop;
+Object *child;
+gchar *name;
+
+child = object_new(get_topo_typename_by_level(child_level));
+name = get_topo_global_name(cb->stat, child_level);
+
+prop = object_property_try_add_child(parent, name, child, errp);
+g_free(name);
+if (!prop) {
+return TOPO_FOREACH_ERR;
+}
+
+if (child_level == CPU_TOPO_CORE) {
+int ret = smp_core_set_threads(child, cb->smp_info->threads,
+  

[RFC 36/41] hw/i386: Allow x86_cpu_new() to specify parent for new CPU

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

For QOM topology, CPU should be inserted under its parent core.

Extend x86_cpu_new() to allow caller to specify topology parent.

Signed-off-by: Zhao Liu 
---
 hw/i386/x86.c | 16 
 1 file changed, 12 insertions(+), 4 deletions(-)

diff --git a/hw/i386/x86.c b/hw/i386/x86.c
index d9293846db64..3c99f4c3ab51 100644
--- a/hw/i386/x86.c
+++ b/hw/i386/x86.c
@@ -94,15 +94,22 @@ uint32_t x86_cpu_apic_id_from_index(X86MachineState *x86ms,
 return x86_apicid_from_cpu_idx(_info, cpu_index);
 }
 
-
 static void x86_cpu_new(X86MachineState *x86ms, int64_t apic_id,
-Error **errp)
+Object *parent, int index, Error **errp)
 {
-Object *cpu = object_new(MACHINE(x86ms)->cpu_type);
+const char *cpu_type = MACHINE(x86ms)->cpu_type;
+Object *cpu = object_new(cpu_type);
 
 if (!object_property_set_uint(cpu, "apic-id", apic_id, errp)) {
 goto out;
 }
+
+if (parent) {
+char *name = g_strdup_printf("%s[%d]", cpu_type, index);
+object_property_add_child(parent, name, cpu);
+g_free(name);
+}
+
 qdev_realize(DEVICE(cpu), NULL, errp);
 
 out:
@@ -146,7 +153,8 @@ void x86_cpus_init(X86MachineState *x86ms, int 
default_cpu_version)
 
 possible_cpus = mc->possible_cpu_arch_ids(ms);
 for (i = 0; i < ms->smp.cpus; i++) {
-x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id, _fatal);
+x86_cpu_new(x86ms, possible_cpus->cpus[i].arch_id,
+NULL, i, _fatal);
 }
 }
 
-- 
2.34.1




[RFC 23/41] hw/cpu/die: Abstract cpu-die level as topology device

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Abstract die level as a topology device "cpu-die" to allow user to
create die level topology from cli and later the cpu-dies could be added
into topology tree.

In addition, mark the cpu-die as DEVICE_CATEGORY_CPU_DEF category to
indicate it belongs to the basic CPU definition and should be created
from cli before board initialization.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS  |  2 ++
 hw/cpu/die.c | 46 
 hw/cpu/meson.build   |  2 +-
 include/hw/cpu/die.h | 38 
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu/die.c
 create mode 100644 include/hw/cpu/die.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 89e350866d6a..91d0936edb32 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1863,6 +1863,7 @@ F: hw/core/machine-smp.c
 F: hw/core/null-machine.c
 F: hw/core/numa.c
 F: hw/cpu/cluster.c
+F: hw/cpu/die.c
 F: qapi/machine.json
 F: qapi/machine-common.json
 F: qapi/machine-target.json
@@ -1870,6 +1871,7 @@ F: include/hw/boards.h
 F: include/hw/core/cpu.h
 F: include/hw/core/cpu-topo.h
 F: include/hw/cpu/cluster.h
+F: include/hw/cpu/die.h
 F: include/sysemu/numa.h
 F: tests/unit/test-smp-parse.c
 T: git https://gitlab.com/ehabkost/qemu.git machine-next
diff --git a/hw/cpu/die.c b/hw/cpu/die.c
new file mode 100644
index ..06c4f7cc8fa2
--- /dev/null
+++ b/hw/cpu/die.c
@@ -0,0 +1,46 @@
+/*
+ * CPU die abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/die.h"
+
+static void cpu_die_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_CPU_DEF, dc->categories);
+
+tc->level = CPU_TOPO_DIE;
+}
+
+static const TypeInfo cpu_die_type_info = {
+.name = TYPE_CPU_DIE,
+.parent = TYPE_CPU_TOPO,
+.class_init = cpu_die_class_init,
+.instance_size = sizeof(CPUDie),
+};
+
+static void cpu_die_register_types(void)
+{
+type_register_static(_die_type_info);
+}
+
+type_init(cpu_die_register_types)
diff --git a/hw/cpu/meson.build b/hw/cpu/meson.build
index 6d319947ca0b..e685fe1c7d8a 100644
--- a/hw/cpu/meson.build
+++ b/hw/cpu/meson.build
@@ -1,4 +1,4 @@
-system_ss.add(files('core.c', 'cluster.c'))
+system_ss.add(files('core.c', 'cluster.c', 'die.c'))
 
 system_ss.add(when: 'CONFIG_ARM11MPCORE', if_true: files('arm11mpcore.c'))
 system_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview_mpcore.c'))
diff --git a/include/hw/cpu/die.h b/include/hw/cpu/die.h
new file mode 100644
index ..4c68786b5f2f
--- /dev/null
+++ b/include/hw/cpu/die.h
@@ -0,0 +1,38 @@
+/*
+ * CPU die abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef HW_CPU_DIE_H
+#define HW_CPU_DIE_H
+
+#include "hw/core/cpu-topo.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_CPU_DIE "cpu-die"
+
+OBJECT_DECLARE_SIMPLE_TYPE(CPUDie, CPU_DIE)
+
+struct CPUDie {
+/*< private >*/
+CPUTopoState obj;
+
+/*< public >*/
+};
+
+#endif /* HW_CPU_DIE_H */
-- 
2.34.1




[RFC 30/41] hw/core/slot: Check topology child to be added under CPU slot

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Implement CPUTopoClass.check_topo_child() in cpu-slot to be compatible
with the limitations of the current smp topology.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 37 +
 hw/core/cpu-topo.c |  2 +-
 include/hw/core/cpu-slot.h |  2 ++
 include/hw/core/cpu-topo.h |  1 +
 4 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index e8e6f4d25532..2a796ad5b6e7 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -21,6 +21,7 @@
 #include "qemu/osdep.h"
 
 #include "hw/core/cpu-slot.h"
+#include "qapi/error.h"
 
 static inline
 CPUTopoStatEntry *get_topo_stat_entry(CPUTopoStat *stat,
@@ -94,6 +95,37 @@ static void cpu_slot_update_topo_info(CPUTopoState *root, 
CPUTopoState *child,
 }
 }
 
+static void cpu_slot_check_topo_support(CPUTopoState *root, CPUTopoState 
*child,
+Error **errp)
+{
+CPUSlot *slot = CPU_SLOT(root);
+CPUTopoLevel child_level = CPU_TOPO_LEVEL(child);
+
+if (!test_bit(child_level, slot->supported_levels)) {
+error_setg(errp, "cpu topo: the level %s is not supported",
+   cpu_topo_level_to_string(child_level));
+return;
+}
+
+/*
+ * Currently we doesn't support hybrid topology. For SMP topology,
+ * each child under the same parent are same type.
+ */
+if (child->parent->num_children) {
+CPUTopoState *sibling = QTAILQ_FIRST(>parent->children);
+const char *sibling_type = object_get_typename(OBJECT(sibling));
+const char *child_type = object_get_typename(OBJECT(child));
+
+if (strcmp(sibling_type, child_type)) {
+error_setg(errp, "Invalid smp topology: different CPU "
+   "topology types (%s child vs %s sibling) "
+   "under the same parent (%s).",
+   child_type, sibling_type,
+   object_get_typename(OBJECT(child->parent)));
+}
+}
+}
+
 static void cpu_slot_class_init(ObjectClass *oc, void *data)
 {
 DeviceClass *dc = DEVICE_CLASS(oc);
@@ -104,6 +136,7 @@ static void cpu_slot_class_init(ObjectClass *oc, void *data)
 
 tc->level = CPU_TOPO_ROOT;
 tc->update_topo_info = cpu_slot_update_topo_info;
+tc->check_topo_child = cpu_slot_check_topo_support;
 }
 
 static void cpu_slot_instance_init(Object *obj)
@@ -112,6 +145,10 @@ static void cpu_slot_instance_init(Object *obj)
 
 QTAILQ_INIT(>cores);
 set_bit(CPU_TOPO_ROOT, slot->stat.curr_levels);
+
+/* Set all levels by default. */
+bitmap_fill(slot->supported_levels, USER_AVAIL_LEVEL_NUM);
+clear_bit(CPU_TOPO_UNKNOWN, slot->supported_levels);
 }
 
 static const TypeInfo cpu_slot_type_info = {
diff --git a/hw/core/cpu-topo.c b/hw/core/cpu-topo.c
index 687a4cc566ec..351112ca7a73 100644
--- a/hw/core/cpu-topo.c
+++ b/hw/core/cpu-topo.c
@@ -24,7 +24,7 @@
 #include "hw/qdev-properties.h"
 #include "qapi/error.h"
 
-static const char *cpu_topo_level_to_string(CPUTopoLevel level)
+const char *cpu_topo_level_to_string(CPUTopoLevel level)
 {
 switch (level) {
 case CPU_TOPO_UNKNOWN:
diff --git a/include/hw/core/cpu-slot.h b/include/hw/core/cpu-slot.h
index fa2bd4af247d..7bf51988afb3 100644
--- a/include/hw/core/cpu-slot.h
+++ b/include/hw/core/cpu-slot.h
@@ -77,6 +77,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(CPUSlot, CPU_SLOT)
  * queues for other topology levels to facilitate traversal
  * when necessary.
  * @stat: Statistical topology information for topology tree.
+ * @supported_levels: Supported topology levels for topology tree.
  */
 struct CPUSlot {
 /*< private >*/
@@ -85,6 +86,7 @@ struct CPUSlot {
 /*< public >*/
 QTAILQ_HEAD(, CPUCore) cores;
 CPUTopoStat stat;
+DECLARE_BITMAP(supported_levels, USER_AVAIL_LEVEL_NUM);
 };
 
 #endif /* CPU_SLOT_H */
diff --git a/include/hw/core/cpu-topo.h b/include/hw/core/cpu-topo.h
index 453bacbb558b..d27da0335c42 100644
--- a/include/hw/core/cpu-topo.h
+++ b/include/hw/core/cpu-topo.h
@@ -102,5 +102,6 @@ int cpu_topo_child_foreach(CPUTopoState *topo, unsigned 
long *levels,
 int cpu_topo_child_foreach_recursive(CPUTopoState *topo,
  unsigned long *levels,
  topo_fn fn, void *opaque);
+const char *cpu_topo_level_to_string(CPUTopoLevel level);
 
 #endif /* CPU_TOPO_H */
-- 
2.34.1




[RFC 26/41] hw/cpu/drawer: Abstract cpu-drawer level as topology device

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Abstract drawer level as a topology device "cpu-drawer" to allow user to
create drawer level topology from cli and later the cpu-drawers could be
added into topology tree.

In addition, mark the cpu-drawer as DEVICE_CATEGORY_CPU_DEF category to
indicate it belongs to the basic CPU definition and should be created
from cli before board initialization.

Signed-off-by: Zhao Liu 
---
 MAINTAINERS |  2 ++
 hw/cpu/drawer.c | 46 +
 hw/cpu/meson.build  |  2 +-
 include/hw/cpu/drawer.h | 38 ++
 4 files changed, 87 insertions(+), 1 deletion(-)
 create mode 100644 hw/cpu/drawer.c
 create mode 100644 include/hw/cpu/drawer.h

diff --git a/MAINTAINERS b/MAINTAINERS
index dd5adfda64cc..4b373ff46ce3 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -1865,6 +1865,7 @@ F: hw/core/numa.c
 F: hw/cpu/book.c
 F: hw/cpu/cluster.c
 F: hw/cpu/die.c
+F: hw/cpu/drawer.c
 F: hw/cpu/socket.c
 F: qapi/machine.json
 F: qapi/machine-common.json
@@ -1875,6 +1876,7 @@ F: include/hw/core/cpu-topo.h
 F: include/hw/cpu/book.h
 F: include/hw/cpu/cluster.h
 F: include/hw/cpu/die.h
+F: include/hw/cpu/drawer.h
 F: include/hw/cpu/socket.h
 F: include/sysemu/numa.h
 F: tests/unit/test-smp-parse.c
diff --git a/hw/cpu/drawer.c b/hw/cpu/drawer.c
new file mode 100644
index ..f1ccfd153284
--- /dev/null
+++ b/hw/cpu/drawer.c
@@ -0,0 +1,46 @@
+/*
+ * CPU drawer abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#include "qemu/osdep.h"
+#include "hw/cpu/drawer.h"
+
+static void cpu_drawer_class_init(ObjectClass *oc, void *data)
+{
+DeviceClass *dc = DEVICE_CLASS(oc);
+CPUTopoClass *tc = CPU_TOPO_CLASS(oc);
+
+set_bit(DEVICE_CATEGORY_CPU_DEF, dc->categories);
+
+tc->level = CPU_TOPO_DRAWER;
+}
+
+static const TypeInfo cpu_drawer_type_info = {
+.name = TYPE_CPU_DRAWER,
+.parent = TYPE_CPU_TOPO,
+.class_init = cpu_drawer_class_init,
+.instance_size = sizeof(CPUDrawer),
+};
+
+static void cpu_drawer_register_types(void)
+{
+type_register_static(_drawer_type_info);
+}
+
+type_init(cpu_drawer_register_types)
diff --git a/hw/cpu/meson.build b/hw/cpu/meson.build
index c44b54c5abb0..0dea39364b98 100644
--- a/hw/cpu/meson.build
+++ b/hw/cpu/meson.build
@@ -1,4 +1,4 @@
-system_ss.add(files('core.c', 'cluster.c', 'die.c', 'socket.c', 'book.c'))
+system_ss.add(files('core.c', 'cluster.c', 'die.c', 'socket.c', 'book.c', 
'drawer.c'))
 
 system_ss.add(when: 'CONFIG_ARM11MPCORE', if_true: files('arm11mpcore.c'))
 system_ss.add(when: 'CONFIG_REALVIEW', if_true: files('realview_mpcore.c'))
diff --git a/include/hw/cpu/drawer.h b/include/hw/cpu/drawer.h
new file mode 100644
index ..34ae089d33bf
--- /dev/null
+++ b/include/hw/cpu/drawer.h
@@ -0,0 +1,38 @@
+/*
+ * CPU drawer abstract device
+ *
+ * Copyright (c) 2023 Intel Corporation
+ * Author: Zhao Liu 
+ *
+ * This program is free software; you can redistribute it and/or modify
+ * it under the terms of the GNU General Public License as published by
+ * the Free Software Foundation; either version 2 of the License,
+ * or (at your option) any later version.
+ *
+ * This program is distributed in the hope that it will be useful,
+ * but WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+ * GNU General Public License for more details.
+ *
+ * You should have received a copy of the GNU General Public License
+ * along with this program; if not, see .
+ */
+
+#ifndef HW_CPU_DRAWER_H
+#define HW_CPU_DRAWER_H
+
+#include "hw/core/cpu-topo.h"
+#include "hw/qdev-core.h"
+
+#define TYPE_CPU_DRAWER "cpu-drawer"
+
+OBJECT_DECLARE_SIMPLE_TYPE(CPUDrawer, CPU_DRAWER)
+
+struct CPUDrawer {
+/*< private >*/
+CPUTopoState obj;
+
+/*< public >*/
+};
+
+#endif /* HW_CPU_DRAWER_H */
-- 
2.34.1




[RFC 31/41] hw/machine: Plug cpu-slot into machine to maintain topology tree

2023-11-30 Thread Zhao Liu
From: Zhao Liu 

Add a cpu-slot in machine as the root of topology tree to maintain the
QOM topology.

Signed-off-by: Zhao Liu 
---
 hw/core/cpu-slot.c | 31 +++
 include/hw/boards.h|  2 ++
 include/hw/core/cpu-slot.h |  7 +++
 system/vl.c|  2 ++
 4 files changed, 42 insertions(+)

diff --git a/hw/core/cpu-slot.c b/hw/core/cpu-slot.c
index 2a796ad5b6e7..4b148440ed3d 100644
--- a/hw/core/cpu-slot.c
+++ b/hw/core/cpu-slot.c
@@ -20,6 +20,7 @@
 
 #include "qemu/osdep.h"
 
+#include "hw/boards.h"
 #include "hw/core/cpu-slot.h"
 #include "qapi/error.h"
 
@@ -165,3 +166,33 @@ static void cpu_slot_register_types(void)
 }
 
 type_init(cpu_slot_register_types)
+
+void machine_plug_cpu_slot(MachineState *ms)
+{
+MachineClass *mc = MACHINE_GET_CLASS(ms);
+
+ms->topo = CPU_SLOT(qdev_new(TYPE_CPU_SLOT));
+
+object_property_add_child(container_get(OBJECT(ms), "/peripheral"),
+  "cpu-slot", OBJECT(ms->topo));
+DEVICE(ms->topo)->id = g_strdup_printf("%s", "cpu-slot");
+
+qdev_realize_and_unref(DEVICE(ms->topo), NULL, _abort);
+ms->topo->ms = ms;
+
+if (!mc->smp_props.clusters_supported) {
+clear_bit(CPU_TOPO_CLUSTER, ms->topo->supported_levels);
+}
+
+if (!mc->smp_props.dies_supported) {
+clear_bit(CPU_TOPO_DIE, ms->topo->supported_levels);
+}
+
+if (!mc->smp_props.books_supported) {
+clear_bit(CPU_TOPO_BOOK, ms->topo->supported_levels);
+}
+
+if (!mc->smp_props.drawers_supported) {
+clear_bit(CPU_TOPO_DRAWER, ms->topo->supported_levels);
+}
+}
diff --git a/include/hw/boards.h b/include/hw/boards.h
index da85f86efb91..81a7b04ece86 100644
--- a/include/hw/boards.h
+++ b/include/hw/boards.h
@@ -10,6 +10,7 @@
 #include "qemu/module.h"
 #include "qom/object.h"
 #include "hw/core/cpu.h"
+#include "hw/core/cpu-slot.h"
 
 #define TYPE_MACHINE_SUFFIX "-machine"
 
@@ -398,6 +399,7 @@ struct MachineState {
 AccelState *accelerator;
 CPUArchIdList *possible_cpus;
 CpuTopology smp;
+CPUSlot *topo;
 struct NVDIMMState *nvdimms_state;
 struct NumaState *numa_state;
 };
diff --git a/include/hw/core/cpu-slot.h b/include/hw/core/cpu-slot.h
index 7bf51988afb3..1361af4ccfc0 100644
--- a/include/hw/core/cpu-slot.h
+++ b/include/hw/core/cpu-slot.h
@@ -78,6 +78,7 @@ OBJECT_DECLARE_SIMPLE_TYPE(CPUSlot, CPU_SLOT)
  * when necessary.
  * @stat: Statistical topology information for topology tree.
  * @supported_levels: Supported topology levels for topology tree.
+ * @ms: Machine in which this cpu-slot is plugged.
  */
 struct CPUSlot {
 /*< private >*/
@@ -87,6 +88,12 @@ struct CPUSlot {
 QTAILQ_HEAD(, CPUCore) cores;
 CPUTopoStat stat;
 DECLARE_BITMAP(supported_levels, USER_AVAIL_LEVEL_NUM);
+MachineState *ms;
 };
 
+#define MACHINE_CORE_FOREACH(ms, core) \
+QTAILQ_FOREACH((core), &(ms)->topo->cores, node)
+
+void machine_plug_cpu_slot(MachineState *ms);
+
 #endif /* CPU_SLOT_H */
diff --git a/system/vl.c b/system/vl.c
index 65add2fb2460..637f708d2265 100644
--- a/system/vl.c
+++ b/system/vl.c
@@ -2128,6 +2128,8 @@ static void qemu_create_machine(QDict *qdict)
   false, _abort);
 qobject_unref(default_opts);
 }
+
+machine_plug_cpu_slot(current_machine);
 }
 
 static int global_init_func(void *opaque, QemuOpts *opts, Error **errp)
-- 
2.34.1




  1   2   >