[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread stormbyte at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #13 from David C. Manuelda  ---
For now, I'd suggest picking a common value to prevent the compilation
failure (in stage comparison) until a proper fix or workaround is chosen.

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

Richard Biener  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |NEW
     Ever confirmed|0                           |1
   Last reconfirmed|                            |2023-10-12

--- Comment #12 from Richard Biener  ---
Looking at the 'hybrid' flag in cpuid sounds like the most reasonable thing to
do, possibly simply skipping auto-detection for the problematic parameters
(L1 and L2 cache sizes) as Alex suggests.

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #11 from Alexander Monakov  ---
(In reply to Hongtao.liu from comment #10)
> > indeed (but I believe it did happen with Alder Lake already, by accident,
> > with AVX512 on P-cores but not on E-cores).
> 
> AVX512 is physically fused off for Alderlake P-core, P-core and E-core share
> the same ISA level(AVX2).

I think Arsen means initial Alder Lake batches, where AVX-512 wasn't yet fused
off (but BIOS support was unofficial/experimental anyway).

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #10 from Hongtao.liu  ---
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).

AVX512 is physically fused off on Alder Lake P-cores, so P-cores and E-cores
share the same ISA level (AVX2).

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #9 from Alexander Monakov  ---
(In reply to Arsen Arsenović from comment #8)
> indeed (but I believe it did happen with Alder Lake already, by accident,
> with AVX512 on P-cores but not on E-cores).

AFAIK on those Alder Lake CPUs you could only get AVX-512 by disabling E-cores
in the BIOS, so you couldn't boot in a configuration where both the E-cores
and AVX-512 on the P-cores are available.

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #8 from Arsen Arsenović  ---
(In reply to Alexander Monakov from comment #7)
> I'm afraid hybrid CPUs with varying ISA feature sets are not practical for
> the current ecosystem: you wouldn't be able to reschedule from a higher- to
> lower-capable core. Not to mention scenarios like Mesa on-disk llvmpipe
> shader cache.

indeed (but I believe it did happen with Alder Lake already, by accident, with
AVX512 on P-cores but not on E-cores).

> "Always" probing all cores is a not a good idea (the compiler would have to
> manually reschedule itself to all cores, of which there could be hundreds).
> Plus, portable API for such probing across available cores does not exist
> afaik.

I'd consider this close enough to 'not possible' ;P

my thinking was: does cpuid provide a way to query across CPUs (or CPU
'groups', I suppose)?  if not, we're definitely better off just using a
common, smaller cache size for intel hybrid CPUs (at least for now)

> I think releasing an x86 hybrid CPU with varying capabilities across cores
> would require substantial preparatory work in the kernel and likely in the
> userland as well, so probably best to leave it until the time comes and
> specifics of what can differ are known.

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

Alexander Monakov  changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |amonakov at gcc dot gnu.org

--- Comment #7 from Alexander Monakov  ---
I'm afraid hybrid CPUs with varying ISA feature sets are not practical for the
current ecosystem: you wouldn't be able to reschedule from a higher- to
lower-capable core. Not to mention scenarios like Mesa on-disk llvmpipe shader
cache.

"Always" probing all cores is not a good idea (the compiler would have to
manually reschedule itself to all cores, of which there could be hundreds).
Plus, a portable API for such probing across available cores does not exist
afaik.

I think releasing an x86 hybrid CPU with varying capabilities across cores
would require substantial preparatory work in the kernel and likely in the
userland as well, so probably best to leave it until the time comes and
specifics of what can differ are known.

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-12 Thread arsen at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #6 from Arsen Arsenović  ---
this poses another problem too, though: should big and little cores ever differ
in ISA support levels, building on big cores (which seems like a reasonable
thing to do) with -march=native could lead to generating code incompatible with
little cores.  perhaps it'd be reasonable to always probe all cores (is that
possible?) and pick a common subset?
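
As a rough illustration of the "common subset" idea only: if per-core feature
masks were somehow available, the subset is a bitwise AND over them. The
feature bits and the function name below are hypothetical, and how the masks
would be gathered from each core is deliberately left out of this sketch.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical feature bits, for illustration only. */
enum {
    FEAT_SSE4_2  = 1 << 0,
    FEAT_AVX2    = 1 << 1,
    FEAT_AVX512F = 1 << 2
};

/* Return the features supported by every core: the intersection
   (bitwise AND) of all per-core masks. An empty list yields 0. */
uint32_t common_isa_features(const uint32_t *per_core, size_t n)
{
    if (n == 0)
        return 0;
    uint32_t common = per_core[0];
    for (size_t i = 1; i < n; i++)
        common &= per_core[i];
    return common;
}
```

With one mask that includes FEAT_AVX512F and one that does not, the result
drops FEAT_AVX512F and keeps the rest.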

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread amonakov at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #5 from Alexander Monakov  ---
I think it's similar to attempting -march=native under distcc, which is already
warned about on the Gentoo wiki: https://wiki.gentoo.org/wiki/Distcc

The difference here is that Intel has so far decided to keep the ISA feature
set the same between 'performance' and 'power-efficient' cores, so the
differences for -march=native detection are minimal.

Intel also added a cpuid bit for hybrid CPUs, so in principle native arch
detection could inspect that bit and then override l1-cache-size to 32 KiB
(having the exact size in the param is not important, specifying a lower value
is ok), or just drop it and let cc1 fall back to the default value (64) from
params.opt.

Short term, I would advise users to add --param=l1-cache-size=32 after
-march=native in CFLAGS.
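
A minimal sketch of the check described above, not the actual GCC driver code.
It assumes the hybrid flag is CPUID leaf 7, subleaf 0, EDX bit 15, and uses
__get_cpuid_count from GCC's <cpuid.h>. The function names and the 32 KiB cap
policy are hypothetical and only illustrate lowering the detected value on
hybrid parts.

```c
#include <stdbool.h>
#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>   /* GCC/Clang wrapper around the CPUID instruction */
#endif

/* Assumption: CPUID.(EAX=7,ECX=0):EDX bit 15 marks a hybrid part. */
#define CPUID7_EDX_HYBRID (1u << 15)

/* Pure helper: is the hybrid flag set in this leaf-7 EDX value? */
bool edx_has_hybrid_flag(unsigned int edx)
{
    return (edx & CPUID7_EDX_HYBRID) != 0;
}

/* Query the core this thread happens to run on.
   Returns false on non-x86 targets or when leaf 7 is unavailable. */
bool cpu_is_hybrid(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return edx_has_hybrid_flag(edx);
#else
    return false;
#endif
}

/* Hypothetical policy: on hybrid parts, never report more than 32 KiB,
   so the result does not depend on which core type ran the detection. */
int l1_cache_size_param_kib(int detected_kib, bool hybrid)
{
    const int cap_kib = 32;
    if (hybrid && detected_kib > cap_kib)
        return cap_kib;
    return detected_kib;
}
```

A caller would combine the two, e.g.
l1_cache_size_param_kib(detected, cpu_is_hybrid()), so that a detected value
of 48 becomes 32 on a hybrid part and is left alone elsewhere.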

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread crazylht at gmail dot com via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #4 from Hongtao.liu  ---
I checked Alder Lake's L1 cache size and it is indeed 48, while the L1 cache
size in alderlake_cost is set to 32.
But then again, we have a lot of different platforms that share the same cost
table and may have different L1 cache sizes; from a micro-architecture tuning
point of view, it doesn't make a difference. A separate cost table just because
the L1 cache size differs is quite unnecessary (the size itself is just a
parameter for software prefetching; it doesn't have to be the real hardware
cache size).

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread rguenth at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #3 from Richard Biener  ---
I'd say "don't do this" (bootstrap with -march=native).  Alternatively, use
taskset to confine the build to either the big or the little cores.
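
A sketch of what such confinement amounts to on Linux, for readers who want to
do it programmatically. It parses a kernel-style CPU list and calls
sched_setaffinity. The helper names are hypothetical, and the idea of reading
the list from a sysfs file such as /sys/devices/cpu_core/cpus is an assumption
about the running kernel, not something stated in this thread.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   /* for cpu_set_t, CPU_SET and sched_setaffinity */
#endif
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

/* Parse a CPU list such as "0-15" or "0-3,8,10-11" into *set.
   Returns 0 on success, -1 on malformed or empty input. */
int parse_cpu_list(const char *list, cpu_set_t *set)
{
    CPU_ZERO(set);
    const char *p = list;
    while (*p != '\0' && *p != '\n') {
        char *end;
        long first = strtol(p, &end, 10);
        if (end == p || first < 0)
            return -1;
        long last = first;
        if (*end == '-') {            /* a range "first-last" */
            p = end + 1;
            last = strtol(p, &end, 10);
            if (end == p || last < first)
                return -1;
        }
        if (last >= CPU_SETSIZE)
            return -1;
        for (long cpu = first; cpu <= last; cpu++)
            CPU_SET((int)cpu, set);
        p = end;
        if (*p == ',')
            p++;
        else if (*p != '\0' && *p != '\n')
            return -1;
    }
    return CPU_COUNT(set) > 0 ? 0 : -1;
}

/* Pin the calling process to the CPUs listed in the given file.
   Returns 0 on success, -1 on any failure. */
int pin_to_cpus_from_file(const char *path)
{
    char buf[256];
    FILE *f = fopen(path, "r");
    if (f == NULL)
        return -1;
    char *line = fgets(buf, sizeof buf, f);
    fclose(f);
    cpu_set_t set;
    if (line == NULL || parse_cpu_list(buf, &set) != 0)
        return -1;
    return sched_setaffinity(0, sizeof set, &set);
}
```

The affinity mask is inherited by child processes, so a wrapper that pins
itself and then execs the build would keep every compiler invocation on one
core type, which is what running the build under taskset achieves.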

[Bug target/111768] X86: -march=native does not support alder lake big.little cache infor correctly

2023-10-11 Thread pinskia at gcc dot gnu.org via Gcc-bugs
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=111768

--- Comment #2 from Andrew Pinski  ---
I think on those SoCs we should ignore the cache info or set it to some common
value between the two.
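
One way to read "some common value" is the smallest size seen across core
types, with a fallback when nothing was detected. The sketch below is only an
illustration under that assumption; the function name and the fallback
argument are hypothetical and not taken from GCC.

```c
#include <stddef.h>

/* Return the smallest L1 cache size (in KiB) among the given core types,
   or fallback_kib when the list is empty (i.e. cache info is ignored). */
int common_l1_cache_size_kib(const int *sizes_kib, size_t n, int fallback_kib)
{
    if (n == 0)
        return fallback_kib;
    int common = sizes_kib[0];
    for (size_t i = 1; i < n; i++)
        if (sizes_kib[i] < common)
            common = sizes_kib[i];
    return common;
}
```

For the sizes discussed in this bug (48 on one core type, 32 on the other),
this yields 32 regardless of which core the detection ran on.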