I see this has expired, and I had no chance to chime in before ... :-/
It can stay expired unless there is more data, but there is an explanation to
be made that may help you and others with similar struggles, and that I can
link to when needed again :-)

BTW, forget about the network XML. That was for a different case we discussed
and ended up here by accident.
Sorry for that stray comment.

Still, what one would really need for any further step is the full output of
"virsh capabilities" and "virsh domcapabilities". To help people only reading
the bug (and potentially yourself), you could also attach the files these
features are compared against, which in this case would be:
- /usr/share/libvirt/cpu_map/x86_Icelake-Server.xml
- /usr/share/libvirt/cpu_map/x86_SapphireRapids-noTSX.xml

Now, once you (or anyone else wondering about something similar) have all of
these, let me add a bit more detail, as this kind of question has come up a few
times. TBH almost every similar case (not all, though) is resolved by
realizing that chip names vs. actual chip features are smoke and mirrors.
That is because libvirt has to operate by proximity here: it describes a CPU
as the named model that needs the fewest +/- features to match.

At the end of the day, what a chip really is, is not a name. It is a set of
features.
That set of features does not change because of the name.

Let me try to explain with an artificial example that is a bit easier to
understand.

Imagine we have:
- 2 CPU names, "A" and "B"
- 6 CPU features called "a, b, c, d, e and f"
- Manufacturers sell various variants of A and B
- The idealistic "A" is defined in the CPU types as having features "a b c"
- The idealistic "B" has newer features, hence it is defined as "a b c d e f"
- You bought a CPU sold as B-1234 and you expect it to appear as "B"
  - But it is a somewhat limited variant and only comes with features
    "a b c e"
- Libvirt could now call it "B -d -f" or "A +e"
  - And since it goes by minimal distance (one change instead of two), it
    will call it "A +e"
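The naming-by-distance idea from the A/B example can be sketched as a toy in a few lines of Python. It assumes the distance is simply the number of +/- features (libvirt's real logic considers more than this count), and the helpers describe/closest_name are hypothetical, not libvirt code; the model table reuses the made-up "A"/"B" definitions from above.

```python
# Toy version of the A/B example: express a CPU's feature set as
# "model +extra -missing" and keep the description with the fewest changes.
# Assumption: distance is only the count of +/- features; libvirt's real
# decision considers more than this.
MODELS = {
    "A": {"a", "b", "c"},
    "B": {"a", "b", "c", "d", "e", "f"},
}


def describe(cpu: set[str], model: str) -> tuple[int, str]:
    """Return (distance, label) for describing cpu as model +/- features."""
    base = MODELS[model]
    plus = ["+" + f for f in sorted(cpu - base)]   # features the model lacks
    minus = ["-" + f for f in sorted(base - cpu)]  # model features the CPU lacks
    return len(plus) + len(minus), " ".join([model, *plus, *minus])


def closest_name(cpu: set[str]) -> str:
    """Label of the model needing the fewest +/- features."""
    return min(describe(cpu, m) for m in MODELS)[1]


if __name__ == "__main__":
    b_1234 = {"a", "b", "c", "e"}   # the limited "B-1234" from the example
    print(describe(b_1234, "B"))    # (2, 'B -d -f')
    print(describe(b_1234, "A"))    # (1, 'A +e')
    print(closest_name(b_1234))     # A +e
```

The CPU sold as "B" ends up labelled "A +e" purely because that label is one change shorter.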

You can now look at the CPU features libvirt detected on your host and compare
them to those named definitions.
It is likely that Icelake-Server is closer to what your host can provide than
SapphireRapids-noTSX is.
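If you want to do that comparison mechanically, a rough Python sketch could read the feature names out of a cpu_map file and diff them against a host feature set. It assumes the model file lists its features directly as <feature name='...'/> elements and does not follow any reference to a parent model. The helper names are hypothetical, and the host set is something you would fill in yourself from your own capabilities/domcapabilities output.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def model_features(cpu_map_xml: str) -> set[str]:
    """Collect the names of all <feature name='...'/> elements in the XML.

    Assumption: the cpu_map file lists the model's features directly.
    A reference to a parent model, if any, is NOT followed here.
    """
    root = ET.fromstring(cpu_map_xml.strip())
    return {f.get("name") for f in root.iter("feature") if f.get("name")}


def plus_minus(host: set[str], model: set[str]) -> tuple[list[str], list[str]]:
    """Features the host has on top of the model (+) and lacks from it (-)."""
    return sorted(host - model), sorted(model - host)


def distance_to_file(host: set[str], path: str) -> int:
    """Number of +/- features needed to describe host via the model in path."""
    plus, minus = plus_minus(host, model_features(Path(path).read_text()))
    return len(plus) + len(minus)


if __name__ == "__main__":
    # Trimmed, made-up snippet in the cpu_map style; not a real model file.
    sample = """
    <cpus>
      <model name='Example'>
        <vendor name='Intel'/>
        <feature name='sse4.2'/>
        <feature name='avx2'/>
        <feature name='avx512f'/>
      </model>
    </cpus>
    """
    host = {"sse4.2", "avx2", "aes"}  # hypothetical host feature set
    print(plus_minus(host, model_features(sample)))  # (['aes'], ['avx512f'])
```

With a real host feature set you could call distance_to_file(host, "/usr/share/libvirt/cpu_map/x86_Icelake-Server.xml"), do the same for x86_SapphireRapids-noTSX.xml, and compare the two numbers.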


Now for the difference between capabilities and domcapabilities.
The former describes what the host looks like, the latter what the hypervisor
can provide (virtualize) for a guest.
If a feature is part of a model's definition but is not (yet) virtualizable,
and therefore cannot be provided to a guest, it has to be listed as -feature.
Such features add up and increase the distance.

In the example above imagine:
- Host is "B" with "a b c d e f" and is detected as "B"
- But qemu/kvm can only virtualize "a b c d"
- It will detect the guest as "A +d" (one change, vs. two for "B -e -f")
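The capabilities vs. domcapabilities effect fits into the same kind of toy: restrict the host features to what is assumed to be virtualizable, then pick the closest name again. The "virtualizable" set below is hypothetical and simply mirrors the example's "qemu/kvm can only virtualize a b c d"; closest() is an illustrative helper, not libvirt code.

```python
# Toy continuation of the A/B example: the guest only gets the features
# assumed to be virtualizable, and the closest name is chosen again.
# Distance = number of +/- features (a simplification).
MODELS = {
    "A": {"a", "b", "c"},
    "B": {"a", "b", "c", "d", "e", "f"},
}


def closest(features: set[str]) -> tuple[int, str]:
    """(distance, label) of the model needing the fewest +/- features."""
    candidates = []
    for name, base in MODELS.items():
        plus = ["+" + f for f in sorted(features - base)]
        minus = ["-" + f for f in sorted(base - features)]
        candidates.append((len(plus) + len(minus), " ".join([name, *plus, *minus])))
    return min(candidates)


host = {"a", "b", "c", "d", "e", "f"}   # roughly what capabilities describes
virtualizable = {"a", "b", "c", "d"}    # hypothetical: what qemu/kvm can expose
guest = host & virtualizable            # roughly what domcapabilities describes

if __name__ == "__main__":
    print("host: ", closest(host))      # (0, 'B')
    print("guest:", closest(guest))     # (1, 'A +d')
```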

There is more to it, e.g. CPU signatures like family/model, and I'm
simplifying a few things.
Furthermore, on x86 there are also AMD vs. Intel features that do not cross
over.
But in the end you could more or less construct a very modern server CPU as
"486 +alotofthings".
It would not be much different from "newer name +/- a shorter list of
features".

We could wish manufacturers had only "one" chip with one set of features for
a given name.
But that is not how the market works. Furthermore, the features can also be
affected by kernel support, firmware, board capabilities, ..., hence the same
physical chip can change its appearance over time or depending on where it is
used.

I guess what I'm trying to say is: do not tie yourself to the "name", but to
the features you can provide to your guests. That is what really
matters.


I do understand the request, though. I've spoken to people who said
something like: "but I promised my manager to provide CPU $NAME, I really
need to have that name". After discussing and understanding the above, there
usually was a nice laugh and then "... but I still need that name, as my
manager wants it".

So here is a story about that...

My laptop is detected as Skylake-Client-noTSX-IBRS + 32 features.

As mentioned, you cannot virtualize 100% of it the same way, so host-model is
instead Skylake-Client-IBRS +~100 features (mostly due to fine-grained
VMX controls) and -2 features (due to kernel support being disabled for
security reasons). We see a different model, different features, some gone, ...

But as mentioned, all of these are just names +/- features, so I can define
a guest as:

    <cpu mode='custom' match='exact' check='partial'>
      <model fallback='allow'>SapphireRapids</model>
    </cpu>

As-is, and as we'd expect, it tells me what is missing when I start
the guest:

  error: the CPU is incompatible with host CPU: Host CPU does not
  provide required features: hle, rtm, avx512f, avx512dq, avx512ifma,
  clwb, avx512cd, sha-ni, avx512bw, avx512vl, avx512vbmi, pku,
  avx512vbmi2, gfni, vaes, vpclmulqdq, avx512vnni, avx512bitalg,
  avx512-vpopcntdq, la57, rdpid, bus-lock-detect, fsrm, serialize,
  tsx-ldtrk, amx-bf16, avx512-fp16, amx-tile, amx-int8, avx-vnni,
  avx512-bf16, fzrm, fsrs, fsrc, xfd, wbnoinvd, rdctl-no, ibrs-all,
  mds-no, taa-no

But as we know, even within real CPU families features come and go with
product names, microcode, kernel, ..., so we can just make that happen
ourselves. Defining it the following way makes it work on my laptop:

  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>SapphireRapids</model>
    <feature policy='disable' name='hle'/>
    <feature policy='disable' name='rtm'/>
    <feature policy='disable' name='avx512f'/>
    <feature policy='disable' name='avx512dq'/>
    <feature policy='disable' name='avx512ifma'/>
    <feature policy='disable' name='clwb'/>
    <feature policy='disable' name='avx512cd'/>
    <feature policy='disable' name='sha-ni'/>
    <feature policy='disable' name='avx512bw'/>
    <feature policy='disable' name='avx512vl'/>
    <feature policy='disable' name='avx512vbmi'/>
    <feature policy='disable' name='pku'/>
    <feature policy='disable' name='avx512vbmi2'/>
    <feature policy='disable' name='gfni'/>
    <feature policy='disable' name='vaes'/>
    <feature policy='disable' name='vpclmulqdq'/>
    <feature policy='disable' name='avx512vnni'/>
    <feature policy='disable' name='avx512bitalg'/>
    <feature policy='disable' name='avx512-vpopcntdq'/>
    <feature policy='disable' name='la57'/>
    <feature policy='disable' name='rdpid'/>
    <feature policy='disable' name='bus-lock-detect'/>
    <feature policy='disable' name='fsrm'/>
    <feature policy='disable' name='serialize'/>
    <feature policy='disable' name='tsx-ldtrk'/>
    <feature policy='disable' name='amx-bf16'/>
    <feature policy='disable' name='avx512-fp16'/>
    <feature policy='disable' name='amx-tile'/>
    <feature policy='disable' name='amx-int8'/>
    <feature policy='disable' name='avx-vnni'/>
    <feature policy='disable' name='avx512-bf16'/>
    <feature policy='disable' name='fzrm'/>
    <feature policy='disable' name='fsrs'/>
    <feature policy='disable' name='fsrc'/>
    <feature policy='disable' name='xfd'/>
    <feature policy='disable' name='wbnoinvd'/>
    <feature policy='disable' name='rdctl-no'/>
    <feature policy='disable' name='ibrs-all'/>
    <feature policy='disable' name='mds-no'/>
    <feature policy='disable' name='taa-no'/>
  </cpu>

I can start this guest and it works fine.
So I have the CPU name I wanted :-)
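The long disable list above does not have to be typed by hand. A small, hypothetical Python helper could turn the "does not provide required features" message into such a <cpu> definition. It assumes the message is a single, unwrapped line ending in a comma-separated feature list, as in the error shown earlier in this comment; the function name is made up for illustration.

```python
# Hypothetical helper: build a <cpu> definition that disables every feature
# listed in libvirt's "does not provide required features" error message.
# Assumes the message is one unwrapped line ending in a comma-separated list.
MARKER = "required features:"


def disable_xml(model: str, error_text: str) -> str:
    _, found, tail = error_text.partition(MARKER)
    if not found:
        raise ValueError(f"no {MARKER!r} list found in the message")
    names = [n.strip() for n in tail.split(",") if n.strip()]
    lines = [
        "<cpu mode='custom' match='exact' check='partial'>",
        f"  <model fallback='allow'>{model}</model>",
    ]
    # One disable line per feature the host cannot provide.
    lines += [f"  <feature policy='disable' name='{n}'/>" for n in names]
    lines.append("</cpu>")
    return "\n".join(lines)


if __name__ == "__main__":
    msg = ("error: the CPU is incompatible with host CPU: Host CPU does not "
           "provide required features: hle, rtm, avx512f")
    print(disable_xml("SapphireRapids", msg))
```

Pasting the full error message would produce the same kind of definition as the one above.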

Is this a SapphireRapids CPU? No, it isn't!

But then, if you bought one and, due to the combination of the actual
variant + firmware + microcode + kernel, it has 20 features missing and
4 more than the idealistic definition, is that a SapphireRapids CPU?

We are entering philosophy territory here, but I hope you get the point:
the names mean nothing beyond making it easier for you to select a
predefined set of features by name, and then letting you +/- features
from there.

libvirt even has tools to compare different definitions, or to find a
common denominator so that guests can be set up to be migratable. Look at
https://www.libvirt.org/manpages/virsh.html#hypervisor-cpu-compare and
the sections around it.
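Conceptually, a "common denominator" for migration starts from the features every host has in common. The Python sketch below only illustrates that set-intersection idea with made-up host names and feature sets. virsh hypervisor-cpu-baseline additionally expresses its result as a named model +/- features and takes into account what the hypervisor can provide, which this sketch does not attempt.

```python
from functools import reduce


def common_features(hosts: dict[str, set[str]]) -> set[str]:
    """Features present on every host (plain set intersection).

    Sketch only: as far as CPU features go, a guest limited to these
    could run on any of the listed hosts.
    """
    if not hosts:
        raise ValueError("need at least one host")
    return set(reduce(set.intersection, hosts.values()))


def given_up(hosts: dict[str, set[str]]) -> dict[str, list[str]]:
    """Per host, the features a migratable guest would not get to use."""
    common = common_features(hosts)
    return {name: sorted(feats - common) for name, feats in hosts.items()}


if __name__ == "__main__":
    # Hypothetical hosts and feature sets, for illustration only.
    hosts = {
        "laptop": {"sse4.2", "avx2", "aes"},
        "server": {"sse4.2", "avx2", "aes", "avx512f", "amx-tile"},
    }
    print(sorted(common_features(hosts)))  # ['aes', 'avx2', 'sse4.2']
    print(given_up(hosts))  # {'laptop': [], 'server': ['amx-tile', 'avx512f']}
```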

So in your case, you could "ask" libvirt what it thinks is missing for the
SapphireRapids-noTSX model you had expected, without needing to create a
guest first. You could create:

  $ cat sr.xml
  <cpu mode='custom' match='exact' check='partial'>
    <model fallback='allow'>SapphireRapids-noTSX</model>
  </cpu>

(or anything similar you need) and ask it:

  $ virsh hypervisor-cpu-compare sr.xml --error
  error: Failed to compare hypervisor CPU with sr.xml
  error: the CPU is incompatible with host CPU: Host CPU does not provide
  required features: hle, rtm, avx512f, avx512dq, avx512ifma, clwb, avx512cd,
  sha-ni, avx512bw, avx512vl, avx512vbmi, pku, avx512vbmi2, gfni, vaes,
  vpclmulqdq, avx512vnni, avx512bitalg, avx512-vpopcntdq, la57, rdpid,
  bus-lock-detect, fsrm, serialize, tsx-ldtrk, amx-bf16, avx512-fp16, amx-tile,
  amx-int8, avx-vnni, avx512-bf16, fzrm, fsrs, fsrc, xfd, wbnoinvd, rdctl-no,
  ibrs-all, mds-no, taa-no

That is the same list I got in my example above.

After seeing all of that, you can determine how many +/- features you'd need
to make a SapphireRapids guest vs. how many you need to make an Icelake-Server
guest. I'd assume the latter is closer, and hence that is what libvirt shows
as host-model in domcapabilities.

-- 
You received this bug notification because you are a member of Ubuntu
Bugs, which is subscribed to Ubuntu.
https://bugs.launchpad.net/bugs/2086598

Title:
  [Libvirt] Xeon CPU Error (SapphireRapids vs. Icelake-Server)

To manage notifications about this bug go to:
https://bugs.launchpad.net/ubuntu/+source/libvirt/+bug/2086598/+subscriptions


-- 
ubuntu-bugs mailing list
https://lists.ubuntu.com/mailman/listinfo/ubuntu-bugs