On Mon, Sep 06, 2021 at 03:02:12PM +0200, Jan Beulich wrote:
> From: Chen Yu <[email protected]>
> 
> Because cpuidle assumes worst-case C-state parameters, PC6 parameters
> are used for describing C6, which is worst-case for requesting CC6.
> When PC6 is enabled, this is appropriate. But if PC6 is disabled
> in the BIOS, the exit latency and target residency should be adjusted
> accordingly.
> 
> Exit latency:
> Previously the C6 exit latency was measured as the PC6 exit latency.
> With PC6 disabled, the C6 exit latency should be the one of CC6.
> 
> Target residency:
> With PC6 disabled, the idle duration within [CC6, PC6) would make the
> idle governor choose C1E over C6. This would cause low energy-efficiency.
> We should lower the bar to request C6 when PC6 is disabled.
> 
> To fill this gap, check if PC6 is disabled in the BIOS in the
> MSR_PKG_CST_CONFIG_CONTROL(0xe2) register. If so, use the CC6 exit latency
> for C6 and set target_residency to 3 times of the new exit latency. [This
> is consistent with how intel_idle driver uses _CST to calculate the
> target_residency.] As a result, the OS would be more likely to choose C6
> over C1E when PC6 is disabled, which is reasonable, because if C6 is
> enabled, it implies that the user cares about energy, so choosing C6 more
> frequently makes sense.
> 
> The new CC6 exit latency of 92us was measured with wult[1] on SKX via NIC
> wakeup as the 99.99th percentile. Also CLX and CPX both have the same CPU
> model number as SkX, but their CC6 exit latencies are similar to the SKX
> one, 96us and 89us respectively, so reuse the SKX value for them.
> 
> There is a concern that it might be better to use a more generic approach
> instead of optimizing every platform. However, if the required code
> complexity and different PC6 bit interpretation on different platforms
> are taken into account, tuning the code per platform seems to be an
> acceptable tradeoff.
> 
> Link: 
> https://nam04.safelinks.protection.outlook.com/?url=https%3A%2F%2Fintel.github.io%2Fwult%2F&amp;data=04%7C01%7Croger.pau%40citrix.com%7Ccdf115e71eb14429868408d97136a902%7C335836de42ef43a2b145348c2ee9ca5b%7C0%7C0%7C637665301851513469%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C1000&amp;sdata=D9sIoe%2BFAvia3OuRsT19VAxkTqrlnPHnKqTSKiW6pRM%3D&amp;reserved=0
>  # [1]
> Suggested-by: Len Brown <[email protected]>
> Signed-off-by: Chen Yu <[email protected]>
> Reviewed-by: Artem Bityutskiy <[email protected]>
> [ rjw: Subject and changelog edits ]
> Signed-off-by: Rafael J. Wysocki <[email protected]>
> [Linux commit: 64233338499126c5c31e07165735ab5441c7e45a]
> 
> Pull in Linux'es MSR_PKG_CST_CONFIG_CONTROL. Alongside the dropping of
> "const" from skx_cstates[] add __read_mostly, and extend that to other
> similar non-const tables.
> 
> Signed-off-by: Jan Beulich <[email protected]>

Acked-by: Roger Pau Monné <[email protected]>

> 
> --- a/xen/arch/x86/cpu/mwait-idle.c
> +++ b/xen/arch/x86/cpu/mwait-idle.c
> @@ -484,7 +484,7 @@ static const struct cpuidle_state bdw_cs
>       {}
>  };
>  
> -static struct cpuidle_state skl_cstates[] = {
> +static struct cpuidle_state __read_mostly skl_cstates[] = {
>       {
>               .name = "C1-SKL",
>               .flags = MWAIT2flg(0x00),
> @@ -536,7 +536,7 @@ static struct cpuidle_state skl_cstates[
>       {}
>  };
>  
> -static const struct cpuidle_state skx_cstates[] = {
> +static struct cpuidle_state __read_mostly skx_cstates[] = {
>       {
>               .name = "C1-SKX",
>               .flags = MWAIT2flg(0x00),
> @@ -674,7 +674,7 @@ static const struct cpuidle_state knl_cs
>       {}
>  };
>  
> -static struct cpuidle_state bxt_cstates[] = {
> +static struct cpuidle_state __read_mostly bxt_cstates[] = {
>       {
>               .name = "C1-BXT",
>               .flags = MWAIT2flg(0x00),
> @@ -870,9 +870,9 @@ static void auto_demotion_disable(void *
>  {
>       u64 msr_bits;
>  
> -     rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
> +     rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
>       msr_bits &= ~(icpu->auto_demotion_disable_flags);
> -     wrmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr_bits);
> +     wrmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr_bits);
>  }
>  
>  static void byt_auto_demotion_disable(void *dummy)
> @@ -1141,7 +1141,7 @@ static void __init sklh_idle_state_table
>       if ((mwait_substates & (MWAIT_CSTATE_MASK << 28)) == 0)
>               return;
>  
> -     rdmsrl(MSR_NHM_SNB_PKG_CST_CFG_CTL, msr);
> +     rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
>  
>       /* PC10 is not enabled in PKG C-state limit */
>       if ((msr & 0xF) != 8)
> @@ -1161,6 +1161,36 @@ static void __init sklh_idle_state_table
>  }
>  
>  /*
> + * skx_idle_state_table_update - Adjust the Sky Lake/Cascade Lake
> + * idle states table.
> + */
> +static void __init skx_idle_state_table_update(void)
> +{
> +     unsigned long long msr;
> +
> +     rdmsrl(MSR_PKG_CST_CONFIG_CONTROL, msr);
> +
> +     /*
> +      * 000b: C0/C1 (no package C-state support)
> +      * 001b: C2
> +      * 010b: C6 (non-retention)
> +      * 011b: C6 (retention)
> +      * 111b: No Package C state limits.
> +      */
> +     if ((msr & 0x7) < 2) {

I wouldn't mind adding this mask field to msr-index.h.

Thanks, Roger.

Reply via email to