On 20.05.2025 16:25, Ross Lagerwall wrote:
> On Tue, May 13, 2025 at 3:43 PM Jan Beulich <jbeul...@suse.com> wrote:
>>
>> On 12.05.2025 16:46, Ross Lagerwall wrote:
>>> --- a/xen/include/public/sysctl.h
>>> +++ b/xen/include/public/sysctl.h
>>> @@ -215,23 +215,51 @@ typedef struct pm_px_val pm_px_val_t;
>>>  DEFINE_XEN_GUEST_HANDLE(pm_px_val_t);
>>>
>>>  struct pm_px_stat {
>>> -    uint8_t total;        /* total Px states */
>>> -    uint8_t usable;       /* usable Px states */
>>> -    uint8_t last;         /* last Px state */
>>> -    uint8_t cur;          /* current Px state */
>>> -    XEN_GUEST_HANDLE_64(uint64) trans_pt;   /* Px transition table */
>>> +    /*
>>> +     * IN: Number of elements in pt, number of rows/columns in trans_pt
>>> +     *     (PMSTAT_get_pxstat)
>>> +     * OUT: total Px states (PMSTAT_get_max_px, PMSTAT_get_pxstat)
>>> +     */
>>> +    uint8_t total;
>>
>> The part for this field ought to go in patch 1, as already indicated there.
>>
>>> +    uint8_t usable;       /* OUT: usable Px states (PMSTAT_get_pxstat) */
>>> +    uint8_t last;         /* OUT: last Px state (PMSTAT_get_pxstat) */
>>> +    uint8_t cur;          /* OUT: current Px state (PMSTAT_get_pxstat) */
>>> +    /*
>>> +     * OUT: Px transition table. This should have total * total elements.
>>> +     *      This will not be copied if it is smaller than the hypervisor's
>>> +     *      Px transition table. (PMSTAT_get_pxstat)
>>> +     */
>>> +    XEN_GUEST_HANDLE_64(uint64) trans_pt;
>>> +    /* OUT: This should have total elements (PMSTAT_get_pxstat) */
>>>      XEN_GUEST_HANDLE_64(pm_px_val_t) pt;
>>
>> As also indicated there, the same constraint as for trans_pt applies to this
>> output buffer, just that it's having only one dimension.
>>
>>>  };
>>>
>>>  struct pm_cx_stat {
>>> -    uint32_t nr;    /* entry nr in triggers & residencies, including C0 */
>>> -    uint32_t last;  /* last Cx state */
>>> -    uint64_aligned_t idle_time;                 /* idle time from boot */
>>> -    XEN_GUEST_HANDLE_64(uint64) triggers;    /* Cx trigger counts */
>>> -    XEN_GUEST_HANDLE_64(uint64) residencies; /* Cx residencies */
>>> -    uint32_t nr_pc;                          /* entry nr in pc[] */
>>> -    uint32_t nr_cc;                          /* entry nr in cc[] */
>>>      /*
>>> +     * IN:  Number of elements in triggers, residencies (PMSTAT_get_cxstat)
>>> +     * OUT: entry nr in triggers & residencies, including C0
>>> +     *      (PMSTAT_get_cxstat, PMSTAT_get_max_cx)
>>> +     */
>>> +    uint32_t nr;
>>> +    uint32_t last;  /* OUT: last Cx state (PMSTAT_get_cxstat) */
>>> +    /* OUT: idle time from boot (PMSTAT_get_cxstat)*/
>>> +    uint64_aligned_t idle_time;
>>> +    /* OUT: Cx trigger counts, nr elements (PMSTAT_get_cxstat) */
>>> +    XEN_GUEST_HANDLE_64(uint64) triggers;
>>> +    /* OUT: Cx residencies, nr elements (PMSTAT_get_cxstat) */
>>> +    XEN_GUEST_HANDLE_64(uint64) residencies;
>>> +    /*
>>> +     * IN: entry nr in pc[] (PMSTAT_get_cxstat)
>>> +     * OUT: Index of highest non-zero entry set in pc[] (PMSTAT_get_cxstat)
>>> +     */
>>> +    uint32_t nr_pc;
>>> +    /*
>>> +     * IN: entry nr in cc[] (PMSTAT_get_cxstat)
>>> +     * OUT: Index of highest non-zero entry set in cc[] (PMSTAT_get_cxstat)
>>> +     */
>>
>> For both of these, it's not "highest non-zero" but, according to ...
>>
>>> +    uint32_t nr_cc;
>>> +    /*
>>> +     * OUT: (PMSTAT_get_cxstat)
>>>       * These two arrays may (and generally will) have unused slots; slots 
>>> not
>>>       * having a corresponding hardware register will not be written by the
>>>       * hypervisor. It is therefore up to the caller to put a suitable 
>>> sentinel
>>
>> ... this comment, "highest written by hypervisor". They're also not "index 
>> of",
>> but "one higher than the index of" (i.e. counts, not indexes).
>>
> 
> Looking at this again, I don't think that matches what Xen does (nor
> does my previous attempt). The code in question:
> 
> #define PUT_xC(what, n) do { \
>         if ( stat->nr_##what >= n && \
>              copy_to_guest_offset(stat->what, n - 1, &hw_res.what##n, 1) ) \
>             return -EFAULT; \
>         if ( hw_res.what##n ) \
>             nr_##what = n; \
>     } while ( 0 )

Right. I should have inserted "that could be" in my reply.

> Xen will copy all the hardware registers that it knows about (regardless
> of whether the hardware actually has them) and will return in nr_pc /
> nr_cc the index + 1 of the highest non-zero entry it _would_ have
> written if there is sufficient space.

Which is kind of bogus when the last (or more) of those would merely be
zero.

> I could describe it simply as "OUT: Required size of cc[]" ?

Maybe append something like "..., for all known to Xen entries to be
written"? Provided we don't want to change the behavior anyway.

Jan

Reply via email to