Re: GLibC AMD CPUID cache reporting regression (was Re: qemu-user self emulation broken with default CPU on x86/x64)

Florian Weimer Tue, 04 Jul 2023 10:38:45 -0700

* Daniel P. Berrangé:

> On Mon, Jul 03, 2023 at 06:03:08PM +0200, Pierrick Bouvier wrote:
>> Hi everyone,
>> 
>> Recently (in d135f781 [1], between v7.0.0 and v8.0.0), qemu-user default cpu
>> was updated to "max" instead of qemu32/qemu64.
>> 
>> This change "broke" qemu self emulation if this new default cpu is used.
>> 
>> $ ./qemu-x86_64 ./qemu-x86_64 --version
>> qemu-x86_64: ../util/cacheflush.c:212: init_cache_info: Assertion `(isize &
>> (isize - 1)) == 0' failed.
>> qemu: uncaught target signal 6 (Aborted) - core dumped
>> Aborted
>> 
>> By setting cpu back to qemu64, it works again.
>> $ ./qemu-x86_64 -cpu qemu64 ./qemu-x86_64  --version
>> qemu-x86_64 version 8.0.50 (v8.0.0-2317-ge125b08ed6)
>> Copyright (c) 2003-2023 Fabrice Bellard and the QEMU Project developers
>> 
>> Commenting assert does not work, as qemu aligned malloc fail shortly after.
>> 
>> I'm willing to fix it, but I'm not sure what is the issue with "max" cpu
>> exactly. Is it missing CPU cache line, or something else?
>
> I've observed GLibC is issuing CPUID leaf 0x8000_001d
>
> QEMU 'max' CPU model doesn't defnie xlevel, so QEMU makes it default
> to the same as min_xlevel, which is calculated to be 0x8000_000a.
>
> cpu_x86_cpuid() in QEMU sees CPUID leaf 0x8000_001d is above 0x8000_000a,
> and so  considers it an invaild CPUID and thus forces it to report
> 0x0000_000d which is supposedly what an invalid CPUID leaf should do.
>
>
> Net result: glibc is asking for 0x8000_001d, but getting back data
> for 0x0000_000d.
>
> This doesn't end happily for obvious reasons, getting garbage for
> the dcache sizes.
>
>
> The 'qemu64' CPU model also gets CPUID leaf 0x8000_001d capped back
> to 0x0000_000d, but crucially qemu64 lacks the 'xsave' feature bit,
> so QEMU returns all-zeroes for CPUID leaf 0x0000_000d. Still not
> good, but this makes glibc report 0 for DCACHE_*, which in turn
> avoids tripping up the nested qemu which queries DCACHE sysconf.
>
> So the problem is thus more widespread than just 'max' CPU model.
>
> Any QEMU CPU model with vendor=AuthenticAMD and the xsave feature,
> and the xlevel unset, will cause glibc to report garbage for the
> L1D cache info
>
> Any QEMU CPU model with vendor=AuthenticAMD and without the xsave
> feature, and the xlevel unset, will cause glibc to report zeroes
> for L1D cache info
>
> Neither is good, but the latter at least doesn't trip up the
> nested QEMU when it queries L1D cache info.
>
> I'm unsure if QEMU's behaviour is correct with calculating the
> default 'xlevel' values for 'max', but I'm assuming the xlevel
> was correct for Opteron_G4/5 since those are explicitly set
> in the code for along time.


We are tracking this as:

  New AMD cache size computation logic does not work for some CPUs,
  hypervisors
  <https://sourceware.org/bugzilla/show_bug.cgi?id=30428>

I filed it after we resolved the earlier crashes because the data is
clearly not accurate.  I was also able to confirm that impacts more than
just hypervisors.

Sajan posted a first patch:

  [PATCH] x86: Fix for cache computation on AMD legacy cpus.
  <https://sourceware.org/pipermail/libc-alpha/2023-June/148763.html>

However, it changes the reported cache sizes on some older CPUs compared
to what we had before (although the values are no longer zero at least).

Thanks,
Florian

Re: GLibC AMD CPUID cache reporting regression (was Re: qemu-user self emulation broken with default CPU on x86/x64)

Reply via email to