On Tue, 11 Nov 2025 22:19:18 +1000
Gavin Shan <[email protected]> wrote:

> Hi Jonathan,
> 
> On 11/11/25 9:55 PM, Jonathan Cameron wrote:
> > On Tue, 11 Nov 2025 20:55:17 +1000
> > Gavin Shan <[email protected]> wrote:  
> >> On 11/11/25 8:07 PM, Jonathan Cameron wrote:  
> >>> On Tue, 11 Nov 2025 14:08:13 +1000
> >>> Gavin Shan <[email protected]> wrote:  
> >>>> On 11/11/25 12:49 AM, Igor Mammedov wrote:  
> >>>>> On Thu, 6 Nov 2025 13:15:52 +1000
> >>>>> Gavin Shan <[email protected]> wrote:  
> >>>>>> On 11/6/25 12:14 AM, Jonathan Cameron wrote:  
> >>>>>>> On Wed,  5 Nov 2025 21:44:49 +1000
> >>>>>>> Gavin Shan <[email protected]> wrote:
> >>>>>>>            
> >>>>>>>> In the situation where host and guest has 64KiB and 4KiB page sizes,
> >>>>>>>> one problematic host page affects 16 guest pages. we need to send 16
> >>>>>>>> consective errors in this specific case.
> >>>>>>>>
> >>>>>>>> Extend acpi_ghes_memory_errors() to support multiple CPERs after the
> >>>>>>>> hunk of code to generate the GHES error status is pulled out from
> >>>>>>>> ghes_gen_err_data_uncorrectable_recoverable(). The status field of
> >>>>>>>> generic error status block is also updated accordingly if multiple
> >>>>>>>> error data entries are contained in the generic error status block.
> >>>>>>>>
> >>>>>>>> Signed-off-by: Gavin Shan <[email protected]>  
> >>>>>>> Hi Gavin,
> >>>>>>>
> >>>>>>> Mostly fine, but a few comments on the defines added and a
> >>>>>>> question on what the multiple things are meant to mean?
> >>>>>>>            
> >>>>>>
> >>>>>> Thanks for your review and comments, replies as below.
> >>>>>>        
> >>>>>>>> diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c
> >>>>>>>> index a9c08e73c0..527b85c8d8 100644
> >>>>>>>> --- a/hw/acpi/ghes.c
> >>>>>>>> +++ b/hw/acpi/ghes.c
> >>>>>>>> @@ -57,8 +57,12 @@
> >>>>>>>>      /* The memory section CPER size, UEFI 2.6: N.2.5 Memory Error 
> >>>>>>>> Section */
> >>>>>>>>      #define ACPI_GHES_MEM_CPER_LENGTH           80
> >>>>>>>>      
> >>>>>>>> -/* Masks for block_status flags */
> >>>>>>>> -#define ACPI_GEBS_UNCORRECTABLE         1
> >>>>>>>> +/* Bits for block_status flags */
> >>>>>>>> +#define ACPI_GEBS_UNCORRECTABLE           0
> >>>>>>>> +#define ACPI_GEBS_CORRECTABLE             1
> >>>>>>>> +#define ACPI_GEBS_MULTIPLE_UNCORRECTABLE  2
> >>>>>>>> +#define ACPI_GEBS_MULTIPLE_CORRECTABLE    3  
> >>>>>>>
> >>>>>>> So this maps to the bits in block status.
> >>>>>>>
> >>>>>>> I'm not actually sure what these multiple variants are meant to tell 
> >>>>>>> us.
> >>>>>>> The multiple error blocks example referred to by the spec is a way to 
> >>>>>>> represent
> >>>>>>> the same error applying to multiple places.  So that's one error, 
> >>>>>>> many blocks.
> >>>>>>> I have no idea if we set these bits in that case.
> >>>>>>>
> >>>>>>> Based on a quick look I don't think linux even takes any notice.  
> >>>>>>> THere
> >>>>>>> are defines in actbl1.h but I'm not seeing any use made of them.
> >>>>>>>            
> >>>>>>
> >>>>>> I hope Igor can confirm since it was suggested by him.
> >>>>>>
> >>>>>> It's hard to understand how exactly these multiple variants are used 
> >>>>>> from the
> >>>>>> spec. In ACPI 6.5 Table 18.11, it's explained as below.
> >>>>>>
> >>>>>> Bit [2] - Multiple Uncorrectable Errors: If set to one, indicates that 
> >>>>>> more
> >>>>>> than one uncorrectable errors have been detected.
> >>>>>>
> >>>>>> I don't see those multiple variants have been used by Linux. So I 
> >>>>>> think it's
> >>>>>> safe to drop them.  
> >>>>>
> >>>>> even though example describes 'same' error at different components,
> >>>>> the bit fields descriptions doesn't set any limits on what 'more than 
> >>>>> one' means.
> >>>>>
> >>>>> Also from guest POV it's multiple different pages that we are reporting 
> >>>>> here
> >>>>> as multiple CPERs.
> >>>>> It seems to me that setting *_MULTIPLE_* here is correct thing to do.
> >>>>>         
> >>>>
> >>>> I don't have strong opinions. Lets keep to set _MULTIPLE_ flag if 
> >>>> Jonathan
> >>>> is fine. Again, this field isn't used by Linux guest.  
> >>> I don't care strongly.  Maybe we should ask for a spec clarification as I 
> >>> doubt
> >>> implementations will be consistent on this given the vague description 
> >>> and that
> >>> Linux ignores it today.
> >>>      
> >>
> >> Google Gemini has the following question. If it can be trusted, it should 
> >> be
> >> set when @num_of_addresses is larger than 1.
> >>
> >> Quota from Google Gemini:
> >>
> >> The system firmware sets this bit to indicate to the Operating System 
> >> Power Management (OSPM)
> >> that more than one correctable error condition has been detected and 
> >> logged for the associated
> >> hardware component since the last time the status was cleared by the 
> >> software. This is crucial
> >> because a high frequency of correctable errors often indicates a potential 
> >> underlying hardware
> >> issue that could lead to uncorrectable (and potentially fatal) errors if 
> >> not addressed (e.g.,
> >> in memory, where multiple correctable errors might trigger a spare memory 
> >> operation).
> >>  
> >>>>     
> >>>>>>>> +#define ACPI_GEBS_ERROR_DATA_ENTRIES      4  
> >>>>>>>
> >>>>>>> This is bits 4-13 and the define isn't used. I'd drop it.
> >>>>>>>            
> >>>>>>
> >>>>>> The definition is used in acpi_ghes_memory_errors() of this patch. 
> >>>>>> However,
> >>>>>> I don't see it has been used by Linux. This field isn't used by Linux 
> >>>>>> to determine
> >>>>>> the total number of error entries. So I think I can drop it either if 
> >>>>>> Igor is ok.
> >>>>>>        
> >>>>
> >>>> Lets keep this field either in next revision if Jonathan is fine.  
> >>>
> >>> I'm fine with the field, but not the value.  As far as I can tell form 
> >>> the spec, it should
> >>> be a mask, not a single bit.
> >>>      
> >>
> >> Agreed, lets keep ACPI_HEST_ERROR_ENTRY_COUNT as zero in next revision.  
> > 
> > I'm even more confused now.  The GEBS Error Data entry count should be 
> > field from 13:4
> > and the value taken should be the number of entries in the record, so 1, 4, 
> > 16 depending
> > on the page size.
> > 
> > So that define of the value 4 is garbage. If it were DATA_ENTRIES_SHIFT 
> > then I'd be much happier.
> >   
> 
> My bad. I misunderstood your point. It will be fixed by using APIs from
> "hw/registerfields.h" as suggested by Philippe in another reply.
> 
>    ...
>    FIELD(ACPI_GEBS, MULTIPLE_CORRECTABLE, 3, 1)
>    FIELD(ACPI_GEBS, ERROR_DATA_ENTRIES, 4, 10)
> 
>    then use FIELD_DP32() to only set the correct bits.
> 
Perfect. Thanks!

J
> Thanks,
> Gavin
> 
> 


Reply via email to