Mauro Carvalho Chehab <mchehab+hua...@kernel.org> writes: > Em Thu, 25 Jul 2024 11:48:12 +0200 > Markus Armbruster <arm...@redhat.com> escreveu: > >> Mauro Carvalho Chehab <mchehab+hua...@kernel.org> writes: >> >> > From: Jonathan Cameron <jonathan.came...@huawei.com> >> > >> > 1. Some GHES functions require handling addresses. Add a helper function >> > to support it. >> > >> > 2. Add support for ACPI CPER (firmware-first) ARM processor error >> > injection. >> > >> > Compliance with N.2.4.4 ARM Processor Error Section in UEFI 2.6 and >> > upper specs, using error type bit encoding as detailed at UEFI 2.9A >> > errata. >> > >> > Error injection examples: >> > >> > { "execute": "qmp_capabilities" } >> > >> > { "execute": "arm-inject-error", >> > "arguments": { >> > "errortypes": ['cache-error'] >> > } >> > } >> > >> > { "execute": "arm-inject-error", >> > "arguments": { >> > "errortypes": ['tlb-error'] >> > } >> > } >> > >> > { "execute": "arm-inject-error", >> > "arguments": { >> > "errortypes": ['bus-error'] >> > } >> > } >> > >> > { "execute": "arm-inject-error", >> > "arguments": { >> > "errortypes": ['cache-error', 'tlb-error'] >> > } >> > } >> > >> > { "execute": "arm-inject-error", >> > "arguments": { >> > "errortypes": ['cache-error', 'tlb-error', 'bus-error', >> > 'micro-arch-error'] >> > } >> > } >> > ... >> > >> > Co-authored-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> >> > Co-authored-by: Shiju Jose <shiju.j...@huawei.com> >> > For Add a logic to handle block addresses, >> > Signed-off-by: Jonathan Cameron <jonathan.came...@huawei.com> >> > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> >> > For FW first ARM processor error injection, >> > Signed-off-by: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> >> > Signed-off-by: Shiju Jose <shiju.j...@huawei.com> >> > --- >> > configs/targets/aarch64-softmmu.mak | 1 + >> > hw/acpi/ghes.c | 258 ++++++++++++++++++++++++++-- >> > hw/arm/Kconfig | 4 + >> > hw/arm/arm_error_inject.c | 35 ++++ >> > hw/arm/arm_error_inject_stubs.c | 18 ++ >> > hw/arm/meson.build | 3 + >> > include/hw/acpi/ghes.h | 2 + >> > qapi/arm-error-inject.json | 49 ++++++ >> > qapi/meson.build | 1 + >> > qapi/qapi-schema.json | 1 + >> > 10 files changed, 361 insertions(+), 11 deletions(-) >> > create mode 100644 hw/arm/arm_error_inject.c >> > create mode 100644 hw/arm/arm_error_inject_stubs.c >> > create mode 100644 qapi/arm-error-inject.json >> >> Since the new file not covered in MAINTAINERS, get_maintainer.pl will >> blame it on the QAPI maintainers alone. No good. > > Added myself there: > > diff --git a/MAINTAINERS b/MAINTAINERS > index 98eddf7ae155..713a104ef901 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -2075,6 +2075,13 @@ F: hw/acpi/ghes.c > F: include/hw/acpi/ghes.h > F: docs/specs/acpi_hest_ghes.rst > > +ACPI/HEST/GHES/ARM processor CPER > +R: Mauro Carvalho Chehab <mchehab+hua...@kernel.org> > +S: Maintained > +F: hw/arm/arm_error_inject.c > +F: hw/arm/arm_error_inject_stubs.c > +F: qapi/arm-error-inject.json > + > ppc4xx > L: qemu-...@nongnu.org > S: Orphan > >> >> [...] >> >> > diff --git a/qapi/arm-error-inject.json b/qapi/arm-error-inject.json >> > new file mode 100644 >> > index 000000000000..430e6cea6b60 >> > --- /dev/null >> > +++ b/qapi/arm-error-inject.json >> > @@ -0,0 +1,49 @@ >> > +# -*- Mode: Python -*- >> > +# vim: filetype=python >> > + >> > +## >> > +# = ARM Processor Errors >> > +## >> > + >> > +## >> > +# @ArmProcessorErrorType: >> > +# >> > +# Type of ARM processor error to inject >> > +# >> > +# @unknown-error: Unknown error >> >> Removed in PATCH 7, and unused until then. Why add it in the first >> place? > > I folded this with patch 7, so this was gone now. > >> >> > +# >> > +# @cache-error: Cache error >> > +# >> > +# @tlb-error: TLB error >> > +# >> > +# @bus-error: Bus error. >> > +# >> > +# @micro-arch-error: Micro architectural error. >> > +# >> > +# Since: 9.1 >> > +## >> > +{ 'enum': 'ArmProcessorErrorType', >> > + 'data': ['unknown-error', >> > + 'cache-error', >> >> Tab in this line. Please convert to spaces. > > Ok. > >> >> > + 'tlb-error', >> > + 'bus-error', >> > + 'micro-arch-error'] >> > +} >> > + >> > +## >> > +# @arm-inject-error: >> > +# >> > +# Inject ARM Processor error. >> > +# >> > +# @errortypes: ARM processor error types to inject >> > +# >> > +# Features: >> > +# >> > +# @unstable: This command is experimental. >> > +# >> > +# Since: 9.1 >> > +## >> > +{ 'command': 'arm-inject-error', >> > + 'data': { 'errortypes': ['ArmProcessorErrorType'] }, >> >> Please separate words with dashes: 'error-types'. > > Done. > > Folding with patch 7 broke it on two separate fields: error and > type. > >> >> > + 'features': [ 'unstable' ] >> > +} >> >> Is this used only with TARGET_ARM? > > Yes, as this CPER record is defined only for arm. There are three other > processor error info: > - for x86; > - for ia32; > - for "generic cpu". > > They have different structures, with different fields.
A generic inject-error command feels nicer, but coding its arguments in the schema could be more trouble than it's worth. I'm not asking you to try. A target-specific command like this one should be conditional. Try this: { 'command': 'arm-inject-error', 'data': { 'errortypes': ['ArmProcessorErrorType'] }, 'features': [ 'unstable' ], 'if': 'TARGET_ARM' } No need to provide a qmp_arm_inject_error() stub then. >> Why is being able to inject multiple error types at once useful? > > The CPER ARM Processor record is defined at UEFI spec as having from 1 to > 255 errors, that can be using the same type or not. The idea behind UEFI > spec is that a single root error may be reflected on multiple errors. > > It may also help to reduce BIOS interrupts to OS, by merging errors > altogether, as memory errors usually happen in bursts. > > Due to that, a single Processor Error Information inside a CPER record > for ARM processor can, according with UEFI spec, contain more than one > of the following bits set: > > +-----|---------------------------+ > | Bit | Meaning | > +=====+===========================+ > | 1 | Cache Error | > | 2 | TLB Error | > | 3 | Bus Error | > | 4 | Micro-architectural Error | > +-----|---------------------------+ > > So, the spec allows, for instance, to have a single Processor Error > Information (PEI) with micro-arch and tlb-error flags raised at the > same time. > > We need the capability of testing multiple error types in order to check > if OS implementation is decoding it the right way. In particular, Linux > was not doing it right, as the CPER ARM Processor record handler was > written at the time UEFI 2.6 spec was written, while the actual encoding > for the error type was only defined at UEFI 2.9A errata and newer. I see. >> I'd expect at least some of these errors to come with additional >> information. For instance, I imagine a bus error is associated with >> some address. > > It actually depends on the ARM and PEI valid fields: the address may or > may not be present, depending if the phy/logical address valid field bit > is set or not. > >> >> If we encode the the error to inject as an enum value, adding more will >> be hard. >> >> If we wrap the enum in a struct >> >> { 'struct': 'ArmProcessorError', >> 'data': { 'type': 'ArmProcessorErrorType' } } >> >> we can later extend it like >> >> { 'union': 'ArmProcessorError', >> 'base: { 'type': 'ArmProcessorErrorType' } >> 'data': { >> 'bus-error': 'ArmProcessorBusErrorData' } } >> >> { 'struct': 'ArmProcessorBusErrorData', >> 'data': ... } > > I don't see this working as one might expect. See, the ARM error > information data can be repeated from 1 to 255 times. It is given > by this struct (see patch 7): > > { 'struct': 'ArmProcessorErrorInformation', > 'data': { '*validation': ['ArmPeiValidationBits'], > 'type': ['ArmProcessorErrorType'], > '*multiple-error': 'uint16', > '*flags': ['ArmProcessorFlags'], > '*error-info': 'uint64', > '*virt-addr': 'uint64', > '*phy-addr': 'uint64'} > } > > According with the UEFI spec, the type is always be present. > The other fields are marked as valid or not via the field > "validation". So, there's one bit indicating what is valid between > the fields at the PEI structure, e. g.: > > - multiple-error: multiple occurrences of the error; > - flags; > - error-info: error information; > - virt-addr: virtual address; > - phy-addr: physical address. > > There are also other fields that are global for the entire record, > also marked as valid or not via another bitmask. > > The contents of almost all those fields are independent of the error > type. The only field which content is affected by the error type is > "error-info", and the definition of such field is not fully specified. > > So, currently, UEFI spec only defines it when: > > 1. the error type has just one bit set; > 2. the error type is either cache, TLB or bus error[1]. > If type is micro-arch-specific error, the spec doesn't tell how this > field if filled. > > To make the API simple (yet powerful), I opted to not enforce any encoding > for error-info: let userspace fill it as required and use some default > that would make sense, if this is not passed via QMP. > > [1] See > https://uefi.org/specs/UEFI/2.10/Apx_N_Common_Platform_Error_Record.html#arm-processor-error-information I asked because designing for extensibility is good practice. It's not a hard requirement here, because feature 'unstable' gives us lincense to change the interface incompatibly. [...]