On Wed, 4 Dec 2024 09:56:35 +0100 Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote:
> Em Wed, 4 Dec 2024 08:54:40 +0100 > Igor Mammedov <imamm...@redhat.com> escreveu: > > > On Tue, 3 Dec 2024 14:47:30 +0100 > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote: > > > > > Em Tue, 3 Dec 2024 12:51:43 +0100 > > > Igor Mammedov <imamm...@redhat.com> escreveu: > > > > > > > On Fri, 22 Nov 2024 10:11:30 +0100 > > > > Mauro Carvalho Chehab <mchehab+hua...@kernel.org> wrote: > > > > > > > > > Currently, CPER address location is calculated as an offset of > > > > > the hardware_errors table. It is also badly named, as the > > > > > offset actually used is the address where the CPER data starts, > > > > > and not the beginning of the error source. > > > > > > > > > > Move the logic which calculates such offset to a separate > > > > > function, in preparation for a patch that will be changing the > > > > > logic to calculate it from the HEST table. > > > > > > > > > > While here, properly name the variable which stores the cper > > > > > address. > > > > > > > > > > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > > > > > Reviewed-by: Jonathan Cameron <jonathan.came...@huawei.com> > > > > > --- > > > > > hw/acpi/ghes.c | 41 ++++++++++++++++++++++++++++++++--------- > > > > > 1 file changed, 32 insertions(+), 9 deletions(-) > > > > > > > > > > diff --git a/hw/acpi/ghes.c b/hw/acpi/ghes.c > > > > > index 87fd3feedd2a..d99697b20164 100644 > > > > > --- a/hw/acpi/ghes.c > > > > > +++ b/hw/acpi/ghes.c > > > > > @@ -364,10 +364,37 @@ void acpi_ghes_add_fw_cfg(AcpiGhesState *ags, > > > > > FWCfgState *s, > > > > > ags->present = true; > > > > > } > > > > > > > > > > +static void get_hw_error_offsets(uint64_t ghes_addr, > > > > > + uint64_t *cper_addr, > > > > > + uint64_t *read_ack_register_addr) > > > > > +{ > > > > > > > > > > > > > + if (!ghes_addr) { > > > > > + return; > > > > > + } > > > > > > > > why do we need this check? > > > > > > It is a safeguard measure to avoid crashes and OOM access. If fw_cfg > > > callback doesn't fill it properly, this will be zero. > > > > shouldn't happen, but yeah it firmware job to write back addr > > which might happen for whatever reason (a bug for example). > > > > The main reason I added it is that, after the second series, it could > also happen if there's something wrong with the backward compat logic. > > So, both here and after switching to HEST-based offsets, I opted > to explicitly test. > > > Perhaps push this up to the stack, so we don't have to deal > > with scattered checks in ghes code. > > > > kvm_arch_on_sigbus_vcpu() looks like a goo candidate for check > > and warn_once if that ever happens. > > It already calls acpi_ghes_present() which resolves GED device > > and then later we duplicate this job in ghes_record_cper_errors() > > > > so maybe rename acpi_ghes_present to something like AcpiGhesState* > > acpi_ghes_get_state() > > and call it instead. And then move ghes_addr check/warn_once there. > > This way the rest of ghes code won't have to deal handling practically > > impossible error conditions that cause reader to wonder why it might > > happen. > > I'll look on it. Yet, if ok for you, I would prefer dealing with this > once we have a bigger picture, e.g. once we merge those tree series: > > - cleanup series (this one); > - HEST offset (I'll be sending a new version today); ok, lets revisit this point after this series. Since at this point we should have a clean picture of how new code works. > - error_inject. > > Thanks, > Mauro >