Hi,

I'd like some input on some rather aggressive optimisations I'm after.

This is a WIP series (so, don't pay attention to bisectability just yet) to
perform aggressive DCE based on vendor checks. I've added some bloat-o-meter
outputs for the everything-enabled vs an amd-only builds for comparison. The
results are very promising.

Please focus on patch 3, as that's the key to the optimisations.

What I'm ultimately after is what I've come to call the "single-vendor
optimisation". In doing this it's fairly easy to optimise other cases too, such
as removing unused vendors and any branches they use exclusively. Please, tell
me what you think of the approach taken by the series.

  Single-vendor optimisation
  ==========================

  - If we compile Xen for a single-vendor AND
    Xen validates it always boots on such vendor AND
    We remove the default CPU path AND
    Xen sanitises guest CPU policies so they always match the boot vendor, THEN
    It's fair game to fold all vendor checks onto true/false depending on
    whether you're testing against the single vendor you compiled-in or not.
    Effectively ignoring the runtime component, and allowing DCE to perform
    eliminations left, right and center. The results are fairly dramatic in
    places like init_speculation_mitigations(). See bloat-o-meter below.

  Compiled-out-vendor elimination
  ===============================

  - If we have all possible x86 vendors described in Kconfig AND
    We compile out a vendor from Xen THEN
    It should be possible to remove such vendor from every vendor check in the
    system. Either we didn't boot it or we're using the default CPU ops anyway. 
    This causes &-checks to turn into ==- checks (more size-efficient) for
    multi-vendor checks turning into single-vendor checks, or even straight-up
    "false" when all vendors in the check are compiled out.
    This is key to enable transparent DCE-removals in multi-vendor settings.

This allows very transparent removal of all Hygon, Centaur and Shanghai code
in the system when it's not required, as it's most typically found in branches
around common x86 code gated by a runtime vendor check.

The secret sauce is a static always_inline function that operates on the
assumption that the second argument is always a constant and the first always
a variable (as it invairably is the case with these checks). It's written
carefully to fold everything into a constant for the appropriate cases. There's
a few more optimisations in the code, like a no-vendor optimisation where the
checks always return false when comparing against a non-unknown vendor when
no explicit vendor is compiled in.

Patch 1 adds the missing vendors and the default path to Kconfig, as they are
currently absent.

Patch 2 ensures consistency between host and guest CPU policies wrt CPU vendor

Patch 3 introduces the x86_vendor_is() function. This is the key.

Patch 4 migrates the early_cpu_init() vendor switch. It's a bit tricky because
it must be done compatibly with the single-vendor optimisation, but there's
nothing complex about it.

Patch 5 is simply code removal at the Makefile level for free thanks to DCE

Patches 6 through 11 is replacements of regular checks with x86_vendor_is(),
with 6 being the one with the most dramatic effect in the diffstat.

===============================================================================
                                 Bloat-o-meter
===============================================================================

all-vendors+default-path
========================

add/remove: 0/1 grow/shrink: 12/10 up/down: 175/-266 (-91)
Function                                     old     new   delta
start_vmx                                   1507    1582     +75
x86_cpu_policies_are_compatible              157     194     +37
xen_config_data                             1479    1506     +27
early_cpu_init                               948     958     +10
setup_apic_nmi_watchdog                      977     986      +9
init_speculation_mitigations                9836    9841      +5
intel_mcheck_init                           2398    2401      +3
set_cx_pminfo                               1691    1693      +2
init_bsp_APIC                                193     195      +2
guest_common_max_feature_adjustments         110     112      +2
disable_lapic_nmi_watchdog                   119     121      +2
do_get_hw_residencies                       1289    1290      +1
mce_firstbank                                 37      36      -1
mcheck_init                                 1227    1225      -2
hvm_vcpu_virtual_to_linear                   631     628      -3
init_nonfatal_mce_checker                    160     156      -4
domain_cpu_policy_changed                    677     672      -5
recalculate_misc                             898     890      -8
traps_init                                   543     527     -16
default_cpu                                   16       -     -16
cpufreq_driver_init                          468     441     -27
vmce_wrmsr                                   993     909     -84
vmce_rdmsr                                  1134    1034    -100
Total: Before=3726243, After=3726152, chg -0.00%

amd-only+no-default-path
========================

add/remove: 0/14 grow/shrink: 4/58 up/down: 93/-10948 (-10855)
Function                                     old     new   delta
x86_cpu_policies_are_compatible              157     194     +37
amd_check_entrysign                          807     829     +22
init_guest_cpu_policies                     1364    1382     +18
xen_config_data                             1471    1487     +16
opt_gds_mit                                    1       -      -1
nmi_p6_event_width                             4       -      -4
nmi_p4_cccr_val                                4       -      -4
init_e820                                   1037    1033      -4
pci_cfg_ok                                   307     301      -6
get_hw_residencies                           213     205      -8
recalculate_cpuid_policy                     909     900      -9
init_amd                                    2508    2499      -9
dom0_setup_permissions                      3809    3800      -9
arch_ioreq_server_get_type_addr              250     241      -9
cpu_has_amd_erratum                          230     219     -11
parse_spec_ctrl                             2321    2307     -14
amd_nonfatal_mcheck_init                     192     177     -15
shanghai_cpu_dev                              16       -     -16
hygon_cpu_dev                                 16       -     -16
default_init                                  16       -     -16
default_cpu                                   16       -     -16
centaur_cpu_dev                               16       -     -16
x86emul_0fae                                2758    2741     -17
vmce_init_vcpu                               153     136     -17
cpufreq_cpu_init                              34      15     -19
nmi_watchdog_tick                            534     514     -20
vmce_restore_vcpu                            160     139     -21
init_nonfatal_mce_checker                    142     120     -22
ucode_update_hcall_cont                      888     865     -23
mce_firstbank                                 37      10     -27
init_shanghai                                 29       -     -29
validate_gl4e                                617     587     -30
l4e_propagate_from_guest                     451     421     -30
guest_walk_tables_4_levels                  3411    3381     -30
clear_msr_range                               30       -     -30
acpi_dead_idle                               430     398     -32
print_mtrr_state                             719     684     -35
amd_mcheck_init                              451     416     -35
hvm_vcpu_virtual_to_linear                   631     595     -36
do_IRQ                                      1783    1747     -36
init_bsp_APIC                                193     149     -44
cpu_callback                                4650    4600     -50
mc_memerr_dhandler                           903     851     -52
mcheck_init                                 1187    1122     -65
microcode_nmi_callback                       205     139     -66
disable_lapic_nmi_watchdog                   119      49     -70
__start_xen                                 9448    9378     -70
alternative_instructions                     154      82     -72
traps_init                                   543     468     -75
protmode_load_seg                           1904    1829     -75
set_cx_pminfo                               1691    1614     -77
init_intel_cacheinfo                        1191    1111     -80
is_cpu_primary                                93       -     -93
do_mca                                      3181    3085     -96
guest_cpuid                                 2395    2292    -103
guest_common_max_feature_adjustments         110       -    -110
read_msr                                    1431    1319    -112
x86emul_decode                             12729   12597    -132
guest_common_default_feature_adjustments     232      62    -170
do_microcode_update                          787     602    -185
cpufreq_driver_init                          453     263    -190
vmce_wrmsr                                   967     768    -199
recalculate_misc                             898     689    -209
vmce_rdmsr                                  1083     872    -211
early_cpu_init                               948     732    -216
guest_wrmsr                                 2853    2622    -231
init_centaur                                 238       -    -238
domain_cpu_policy_changed                    677     408    -269
write_msr                                   1749    1465    -284
x86_emulate                               222198  221891    -307
init_hygon                                   389       -    -389
start_vmx                                   1507    1105    -402
guest_rdmsr                                 2308    1881    -427
setup_apic_nmi_watchdog                      977     276    -701
do_get_hw_residencies                       1289       9   -1280
init_speculation_mitigations                9714    6788   -2926
Total: Before=3679044, After=3668189, chg -0.30%

There's a few more patches needed to add conditional inclusion of amd.c and
intel.c at the Makefile level, but that can be done just as well. It just adds
5 patches worth of noise I don't want to discuss atm.

Just knowing x86_vendor_is() is "good to have" is good enough as it enables our
downstream to customise it with whatever optimisations we need.

I also suspect other areas of the hypervisor could benefit from this meld of
runtime+compiletime sort of checking, allowing transparent code removal.

I'm thinking DOM0LESS_BOOT vs DOM0_BOOT vs PVSHIM_BOOT, or AMD_SVM vs INTEL_VMX
in HVM-only builds, or family checks to have (i.e) a explicit "older-than-zen"
Kconfig option with a similar approach on a family range check.

This is maybe one of several such uses.

So... thoughts? I'm definitely fond of the single-vendor bloat-o-meter output.

Cheers,
Alejandro

Alejandro Vallejo (11):
  x86: Add more granularity to the vendors in Kconfig
  x86: Reject CPU policies with vendors other than the host's
  x86: Add x86_vendor_is() by itself before using it
  x86: Refactor early vendor lookup code to use x86_vendor_is()
  x86: Conditionalise the inclusion of Hygon/Centaur/Shanghai cpu/ files
  x86: Migrate switch-based vendor checks to x86_vendor_is()
  x86: Migrate MSR handler vendor checks to x86_vendor_is()
  x86: Migrate insn emulator to use x86_vendor_is()
  x86: Migrate spec_ctrl vendor checks to x86_vendor_is()
  x86: Migrate everything under cpu/ to use x86_vendor_is()
  x86: Migrate every remaining raw vendor check to x86_vendor_is()

 xen/arch/x86/Kconfig.cpu               | 45 +++++++++++++++++++++
 xen/arch/x86/acpi/cpu_idle.c           | 19 ++++-----
 xen/arch/x86/acpi/cpufreq/acpi.c       |  2 +-
 xen/arch/x86/acpi/cpufreq/cpufreq.c    | 32 +++++----------
 xen/arch/x86/alternative.c             | 30 ++++++--------
 xen/arch/x86/apic.c                    |  2 +-
 xen/arch/x86/cpu-policy.c              | 41 +++++++++----------
 xen/arch/x86/cpu/Makefile              |  6 +--
 xen/arch/x86/cpu/amd.c                 |  6 +--
 xen/arch/x86/cpu/common.c              | 50 +++++++++++++++--------
 xen/arch/x86/cpu/intel_cacheinfo.c     |  5 +--
 xen/arch/x86/cpu/mcheck/amd_nonfatal.c |  2 +-
 xen/arch/x86/cpu/mcheck/mcaction.c     |  3 +-
 xen/arch/x86/cpu/mcheck/mce.c          | 41 +++++--------------
 xen/arch/x86/cpu/mcheck/mce.h          | 20 +++++----
 xen/arch/x86/cpu/mcheck/mce_amd.c      |  6 +--
 xen/arch/x86/cpu/mcheck/mce_intel.c    |  6 +--
 xen/arch/x86/cpu/mcheck/non-fatal.c    | 20 +++------
 xen/arch/x86/cpu/mcheck/vmce.c         | 50 ++++++-----------------
 xen/arch/x86/cpu/microcode/amd.c       |  2 +-
 xen/arch/x86/cpu/microcode/core.c      |  2 +-
 xen/arch/x86/cpu/mtrr/generic.c        |  4 +-
 xen/arch/x86/cpu/mwait-idle.c          |  4 +-
 xen/arch/x86/cpuid.c                   |  4 +-
 xen/arch/x86/dom0_build.c              |  3 +-
 xen/arch/x86/domain.c                  | 37 ++++++++---------
 xen/arch/x86/e820.c                    |  3 +-
 xen/arch/x86/guest/xen/xen.c           | 19 ++++-----
 xen/arch/x86/hvm/hvm.c                 |  5 ++-
 xen/arch/x86/hvm/ioreq.c               |  2 +-
 xen/arch/x86/hvm/vmx/vmx.c             |  6 +--
 xen/arch/x86/include/asm/cpuid.h       | 56 +++++++++++++++++++++++++-
 xen/arch/x86/include/asm/guest_pt.h    |  4 +-
 xen/arch/x86/irq.c                     |  4 +-
 xen/arch/x86/msr.c                     | 41 ++++++++++---------
 xen/arch/x86/nmi.c                     | 18 +++------
 xen/arch/x86/pv/emul-priv-op.c         | 24 +++++------
 xen/arch/x86/setup.c                   |  2 +-
 xen/arch/x86/spec_ctrl.c               | 34 ++++++++--------
 xen/arch/x86/traps-setup.c             | 18 ++++-----
 xen/arch/x86/x86_emulate/private.h     |  4 +-
 xen/arch/x86/x86_emulate/x86_emulate.c |  2 +-
 xen/lib/x86/policy.c                   |  3 +-
 43 files changed, 360 insertions(+), 327 deletions(-)


base-commit: fb0e37df71a31318c61e0715ffed3e149ca8a4aa
-- 
2.43.0


Reply via email to