[Public]

> -----Original Message-----
> From: Jan Beulich <jbeul...@suse.com>
> Sent: Friday, September 5, 2025 2:45 PM
> To: Penny, Zheng <penny.zh...@amd.com>
> Cc: Andrew Cooper <andrew.coop...@citrix.com>; Roger Pau Monné
> <roger....@citrix.com>; Anthony PERARD <anthony.per...@vates.tech>; Orzel,
> Michal <michal.or...@amd.com>; Julien Grall <jul...@xen.org>; Stefano
> Stabellini <sstabell...@kernel.org>; xen-devel@lists.xenproject.org
> Subject: Re: [PATCH v9 2/8] xen/cpufreq: implement amd-cppc driver for CPPC
> in passive mode
>
> On 05.09.2025 07:15, Penny, Zheng wrote:
> >> -----Original Message-----
> >> From: Jan Beulich <jbeul...@suse.com>
> >> Sent: Thursday, September 4, 2025 8:04 PM
> >>
> >> On 04.09.2025 08:35, Penny Zheng wrote:
> >>> +static void cf_check amd_cppc_write_request_msrs(void *info)
> >>> +{
> >>> +    const struct amd_cppc_drv_data *data = info;
> >>> +
> >>> +    wrmsrl(MSR_AMD_CPPC_REQ, data->req.raw);
> >>> +}
> >>> +
> >>> +static void amd_cppc_write_request(unsigned int cpu,
> >>> +                                   struct amd_cppc_drv_data *data,
> >>> +                                   uint8_t min_perf, uint8_t des_perf,
> >>> +                                   uint8_t max_perf, uint8_t epp)
> >>> +{
> >>> +    uint64_t prev = data->req.raw;
> >>> +
> >>> +    data->req.min_perf = min_perf;
> >>> +    data->req.max_perf = max_perf;
> >>> +    data->req.des_perf = des_perf;
> >>> +    data->req.epp = epp;
> >>> +
> >>> +    if ( prev == data->req.raw )
> >>> +        return;
> >>> +
> >>> +    on_selected_cpus(cpumask_of(cpu), amd_cppc_write_request_msrs,
> >>> +                     data, 1);
> >>
> >> With "cpu" coming from ...
> >>
> >>> +}
> >>> +
> >>> +static int cf_check amd_cppc_cpufreq_target(struct cpufreq_policy *policy,
> >>> +                                            unsigned int target_freq,
> >>> +                                            unsigned int relation)
> >>> +{
> >>> +    struct amd_cppc_drv_data *data = policy->u.amd_cppc;
> >>> +    uint8_t des_perf;
> >>> +    int res;
> >>> +
> >>> +    if ( unlikely(!target_freq) )
> >>> +        return 0;
> >>> +
> >>> +    res = amd_cppc_khz_to_perf(data, target_freq, &des_perf);
> >>> +    if ( res )
> >>> +        return res;
> >>> +
> >>> +    /*
> >>> +     * Having a performance level lower than the lowest nonlinear
> >>> +     * performance level, such as, lowest_perf <= perf <= lowest_nonlinear_perf,
> >>> +     * may actually cause an efficiency penalty. So when deciding the min_perf
> >>> +     * value, we prefer lowest nonlinear performance over lowest performance.
> >>> +     */
> >>> +    amd_cppc_write_request(policy->cpu, data,
> >>> +                           data->caps.lowest_nonlinear_perf,
> >>
> >> ... here, how can this work when this particular CPU isn't online anymore?
> >
> > Once any processor in the domain gets offline, the governor will stop,
> > and .target() cannot be invoked any more:
> > ```
> > if ( hw_all || cpumask_weight(cpufreq_dom->map) ==
> >      domain_info->num_processors )
> >     __cpufreq_governor(policy, CPUFREQ_GOV_STOP);
> > ```
>
> I can't bring the code in line with what you say.
Only when all processors in the domain are online does the weight equal
"num_processors". That is, the governor stops when the *first* processor in
the domain tries to go offline. Once the governor stops, cpufreq->target()
will not be executed any more. Also, in __cpufreq_driver_target() we do a
cpu_online(policy->cpu) check, to ensure that the CPU registered in
policy->cpu is online.

> >>> +                           des_perf, data->caps.highest_perf,
> >>> +                           /* Pre-defined BIOS value for passive mode */
> >>> +                           per_cpu(epp_init, policy->cpu));
> >>> +    return 0;
> >>> +}
> >>> +
> >>> +static void cf_check amd_cppc_init_msrs(void *info)
> >>> +{
> >>> +    struct cpufreq_policy *policy = info;
> >>> +    struct amd_cppc_drv_data *data = policy->u.amd_cppc;
> >>> +    uint64_t val;
> >>> +    unsigned int min_freq = 0, nominal_freq = 0, max_freq;
> >>> +
> >>> +    /* Package level MSR */
> >>> +    rdmsrl(MSR_AMD_CPPC_ENABLE, val);
> >>
> >> Here you clarify the scope, yet what about ...
> >>
> >>> +    /*
> >>> +     * Only when Enable bit is on, the hardware will calculate the processor’s
> >>> +     * performance capabilities and initialize the performance level fields in
> >>> +     * the CPPC capability registers.
> >>> +     */
> >>> +    if ( !(val & AMD_CPPC_ENABLE) )
> >>> +    {
> >>> +        val |= AMD_CPPC_ENABLE;
> >>> +        wrmsrl(MSR_AMD_CPPC_ENABLE, val);
> >>> +    }
> >>> +
> >>> +    rdmsrl(MSR_AMD_CPPC_CAP1, data->caps.raw);
> >>
> >> ... this and ...
> >
> > GOV_GETAVG);
> >>> +
> >>> +    /* Store pre-defined BIOS value for passive mode */
> >>> +    rdmsrl(MSR_AMD_CPPC_REQ, val);
> >>
> >> ... this?
> >
> > They are all per-thread MSRs. I'll add descriptions.
>
> If they're per-thread, coordination will be yet more difficult if any domain
> had more than one thread in it. So question again: Is it perhaps disallowed
> by the spec for there to be any "domain" covering more than a single thread?
>

I'll double-check with the hardware team about it. Also, Xen's current code
may already be covering SW_ANY coordination.
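To make the offline handling described above concrete, here is a minimal,
self-contained sketch (hypothetical names and data structures, not the actual
Xen implementation) of the two guards: the governor is stopped as soon as the
*first* CPU in the domain tries to go offline, and a
__cpufreq_driver_target()-style entry point additionally refuses requests when
policy->cpu itself is offline:

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified model of one coordination domain. */
#define NUM_PROCESSORS 4

struct policy {
    unsigned int cpu;     /* CPU registered to issue requests for the domain */
    bool gov_running;     /* governor state */
};

static bool cpu_online_map[NUM_PROCESSORS] = { true, true, true, true };
static unsigned int domain_weight = NUM_PROCESSORS; /* online CPUs in domain */

static bool cpu_online(unsigned int cpu)
{
    return cpu_online_map[cpu];
}

/* Called when a CPU in the domain begins to go offline. */
static void cpufreq_cpu_del(struct policy *p, unsigned int cpu)
{
    /*
     * The weight equals num_processors only while *all* CPUs in the domain
     * are online, so the governor stops on the very first offline attempt.
     */
    if ( domain_weight == NUM_PROCESSORS )
        p->gov_running = false;

    cpu_online_map[cpu] = false;
    domain_weight--;
}

/* Simplified stand-in for __cpufreq_driver_target(). */
static int cpufreq_driver_target(struct policy *p)
{
    if ( !p->gov_running )      /* governor stopped: no .target() calls */
        return -1;
    if ( !cpu_online(p->cpu) )  /* extra guard on the registered CPU */
        return -1;
    return 0;                   /* the request would be issued here */
}
```

With all CPUs online a target request goes through; after the first CPU in the
domain is removed, the governor is already stopped and further requests are
rejected before any cross-CPU MSR write could be attempted.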
As for the SW_ANY coordination type, the OS needs to coordinate the state for
all processors in the domain by making a state request on the control
interface of *only one* processor in the domain. In Xen, e.g., that "only one"
is the CPU registered in policy->cpu. For SW_ALL, however, the OSPM
coordinates the state for all processors in the domain by making the same
state request on the control interface of *each* processor in the domain; I
haven't seen any code coordinating that synchronization in Xen.

> Jan
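To illustrate the contrast just described, a minimal sketch (hypothetical
names; not actual Xen or ACPI code), with the per-thread request MSR modelled
as a per-CPU variable: under SW_ANY one request on policy->cpu's interface
covers the domain, while SW_ALL would require the same request on every
processor's own interface:

```c
#include <assert.h>
#include <stdint.h>

#define DOMAIN_CPUS 4

enum coord { SW_ANY, SW_ALL };

/* Model the per-thread request MSR as a per-CPU variable. */
static uint64_t cppc_req[DOMAIN_CPUS];

static void write_request(unsigned int cpu, uint64_t val)
{
    cppc_req[cpu] = val;  /* stands in for wrmsrl(MSR_AMD_CPPC_REQ, val) */
}

/*
 * Issue one logical request for the whole domain, according to the
 * coordination type:
 *   SW_ANY: a request on the registered CPU alone covers the domain;
 *   SW_ALL: the same request must be made on each CPU's own interface.
 */
static void domain_set_request(enum coord c, unsigned int policy_cpu,
                               uint64_t val)
{
    if ( c == SW_ANY )
        write_request(policy_cpu, val);
    else
        for ( unsigned int cpu = 0; cpu < DOMAIN_CPUS; cpu++ )
            write_request(cpu, val);
}
```

The SW_ALL branch is exactly the per-CPU fan-out that, as noted above, I
haven't found Xen performing today.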