Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 09:32:18PM +, Ghannam, Yazen wrote:
> Not much more readable. It's still vague and confusing to a user and devoid
> of any real info so an expert can't help. And now the information is printed
> arbitrarily, so someone needs to read the source to figure out what it really
> means.

WTF? You need to read the source to figure out what the error is? So
"Corrected Processor Error" is confusing? I think you've been clearly
staring at the spec for waaay too long.

Also, read my mail again:

"Now, I admit that my vesion of the record is not enough to debug it
but it needs to contain only information which is clear and humanly
readable to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable."

> Maybe. But these records are generated by Platform Firmware. Why would
> FW report the error knowing the system is about to die?

WTF more! Dude, are you kidding me?

So the firmware should not report the error if it knows the system is
about to die?!?!

Now you're just making up insane counterarguments, just because.

> Your example still says "Hardware Error"

Oh my, that's the *prefix*.

> and odds are general users won't understand what the error type means
> or what a corrected error is. So it's not much better.

Yeah, and the next thing you'll say is that users won't understand what
"error" means, right?

Geez.

> Exactly! The more info available (usually) the more quickly it can be
> diagnosed.

You still don't understand what I'm trying to explain to you.

It is *not* about diagnosing it - in order to do that you need to
involve people to diagnose it. It is about making the error record as
human readable as possible so that you don't *have* to involve people to
diagnose it in the first place and the user can say, ah ok, corrected
error, no need to do anything.

Or "System Fatal error" - I better replace that part.

> Hardware errors are generally rare and hard to reproduce.

Except when they're not like DRAM ECC floods which are pretty easy to
reproduce.

> So when one does occur we should capture the data and provide it.

Did I say you should not do that?!

I said: make it as human readable as possible and dump the gory hex crap
after it.

> Here are a couple of scenarios based on similar experiences I've had:

Now play that same scenario with the following record format:

[ human readable error record ]
[ full raw error dump ]

> I'll send a V3 set with the following changes:
> 1) Fix table numbering in commit messages.
> 2) Remove "Validation Bits" lines.
> 3) Only print error type GUID for unknown types.
> 
> I think this set should focus on printing the x86 CPER based on the UEFI
> spec and the convention of the other CPER code. CPERs are generated
> by Platform Firmware. So errors are explicitly intended to be viewed by
> the user and all info should be displayed.

You should look up from the spec and realize that real life is much
different.

> I *have* been thinking that it would be nice to take the CPER and pipe it
> through the MCA decoding in arch/x86 and EDAC. This would be really
> nice for when the CPER includes the MCA registers in the Context info.
> So we'd get our usual MCA decoding instead of a binary blob of registers.

That would definitely be a step in the right direction.

> I was thinking that the MCA decoding would be in addition to this. But
> based on Boris's comments, maybe we can make it a default selection.
> For example, if MCA/EDAC decoding is available, use it. Otherwise, print
> the CPER fields in a generic way like we do for the other CPER types.

Yes.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov [mailto:b...@suse.de]
> Sent: Tuesday, February 27, 2018 2:10 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org;
> ard.biesheu...@linaro.org; x...@kernel.org; Tony Luck
> <tony.l...@intel.com>
> Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info
> Structure
> 
...
> > 1) No one except debug and HW design folks, who will eventually get a
> > report from a user.
> 
> Hahahha, yeah right.
> 
> The only people who get those reports are the maintainers of the code in
> the kernel and the distro people who get all the bugs assigned to them.
> 
> And if they can't decode the error - it is Tony and me.
> 
> HW folks hear about it from us. And we go and decode the damn crap
> *every* time. Do you catch my drift now?
> 

That's not true. You may get reports but so do platform and silicon vendors
directly.

> > [1.990948] [Hardware Error]:  Error 1, type: corrected
> > [1.995789] [Hardware Error]:  fru_text: ProcessorError
> > [2.000632] [Hardware Error]:   section_type: IA32/X64 processor error
> > [2.005796] [Hardware Error]:   Validation Bits: 0x0207
> > [2.010953] [Hardware Error]:   Local APIC_ID: 0x0
> > [2.015991] [Hardware Error]:   CPUID Info:
> > [2.020747] [Hardware Error]:   : 00800f12  00400800
> 
> > [2.025595] [Hardware Error]:   0010: 76d8320b  178bfbff
> 
> > [2.030423] [Hardware Error]:   0020:   
> 
> > [2.035198] [Hardware Error]:   Error Information Structure 0:
> > [2.039961] [Hardware Error]:Error Structure Type: a55701f5-e3ef-
> 43de-ac72-249b573fad2c
> > [2.049608] [Hardware Error]:Error Structure Type: cache error
> > [2.054344] [Hardware Error]:Validation Bits: 0x0001
> > [2.059046] [Hardware Error]:Check Information: 0x20540087
> > [2.063625] [Hardware Error]: Validation Bits: 0x0087
> > [2.068032] [Hardware Error]: Transaction Type: 0, Instruction
> > [2.072423] [Hardware Error]: Operation: 5, instruction fetch
> > [2.076776] [Hardware Error]: Level: 1
> > [2.081073] [Hardware Error]: Overflow: true
> > [2.085360] [Hardware Error]:   Context Information Structure 0:
> > [2.089661] [Hardware Error]:Register Context Type: MSR Registers
> (Machine Check and other MSRs)
> > [2.098487] [Hardware Error]:Register Array Size: 0x0050
> > [2.103113] [Hardware Error]:MSR Address: 0xc0002011
> > [2.107742] [Hardware Error]:Register Array:
> > [2.112270] [Hardware Error]:: d82a0151
> 
> > [2.116845] [Hardware Error]:0010: d010
> 00030031
> > [2.121228] [Hardware Error]:0020: 000100b0
> 4a00
> > [2.125514] [Hardware Error]:0030: 
> 
> > [2.129747] [Hardware Error]:0040: 
> 
> 
> Lemme simplify that error record:
> 
> [Hardware Error]:  Corrected Processor Error
> [Hardware Error]:   APIC_ID: 0x0 | CPUID: 0x17|0x1|0x2
> [Hardware Error]:Type: cache error during instruction fetch
> [Hardware Error]:cache level 1
> [Hardware Error]:Overflow: true
> 
> See how much more readable it got! And it is only 5 lines. I can make it
> even smaller.
> 

Not much more readable. It's still vague and confusing to a user and devoid
of any real info so an expert can't help. And now the information is printed
arbitrarily, so someone needs to read the source to figure out what it really
means.

> If it were a critical, uncorrectable error, every line counts: imagine
> you do the above fat record and the machine freezes at line 5.
> 

Maybe. But these records are generated by Platform Firmware. Why would
FW report the error knowing the system is about to die? Most likely we'll
only see CPERs in HEST if the OS can do something, or we'll see them in
BERT after recovering from a hang/reset.

> Now, I admit that my vesion of the record is not enough to debug it
> but it needs to contain only information which is clear and humanly
> readable to debug. You can always dump the raw data underneath from the
> tracepoint but make the beginning human readable.
> 
> Do you know what users say about your error record?
> 
> "Err, it says hardware error, is my machine broken? I need to replace my
> CPU."
> 

Your example still says "Hardware Error" and odds are general

Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 06:40:13PM +, Ghannam, Yazen wrote:
> Code readability.

Bullshit!

You can't tell me this:

snprintf(newpfx, sizeof(newpfx), "%s ", pfx);

is less readable than:

 snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);

> 1) No one except debug and HW design folks, who will eventually get a
> report from a user.

Hahahha, yeah right.

The only people who get those reports are the maintainers of the code in
the kernel and the distro people who get all the bugs assigned to them.

And if they can't decode the error - it is Tony and me.

HW folks hear about it from us. And we go and decode the damn crap
*every* time. Do you catch my drift now?

> [1.990948] [Hardware Error]:  Error 1, type: corrected 
> [1.995789] [Hardware Error]:  fru_text: ProcessorError 
> [2.000632] [Hardware Error]:   section_type: IA32/X64 processor error 
> [2.005796] [Hardware Error]:   Validation Bits: 0x0207 
> [2.010953] [Hardware Error]:   Local APIC_ID: 0x0 
> [2.015991] [Hardware Error]:   CPUID Info: 
> [2.020747] [Hardware Error]:   : 00800f12  00400800 
>  
> [2.025595] [Hardware Error]:   0010: 76d8320b  178bfbff 
>  
> [2.030423] [Hardware Error]:   0020:    
>  
> [2.035198] [Hardware Error]:   Error Information Structure 0:
> [2.039961] [Hardware Error]:Error Structure Type: 
> a55701f5-e3ef-43de-ac72-249b573fad2c
> [2.049608] [Hardware Error]:Error Structure Type: cache error
> [2.054344] [Hardware Error]:Validation Bits: 0x0001
> [2.059046] [Hardware Error]:Check Information: 0x20540087
> [2.063625] [Hardware Error]: Validation Bits: 0x0087
> [2.068032] [Hardware Error]: Transaction Type: 0, Instruction
> [2.072423] [Hardware Error]: Operation: 5, instruction fetch
> [2.076776] [Hardware Error]: Level: 1
> [2.081073] [Hardware Error]: Overflow: true
> [2.085360] [Hardware Error]:   Context Information Structure 0:
> [2.089661] [Hardware Error]:Register Context Type: MSR Registers 
> (Machine Check and other MSRs)
> [2.098487] [Hardware Error]:Register Array Size: 0x0050
> [2.103113] [Hardware Error]:MSR Address: 0xc0002011
> [2.107742] [Hardware Error]:Register Array:
> [2.112270] [Hardware Error]:: d82a0151 
> 
> [2.116845] [Hardware Error]:0010: d010 
> 00030031
> [2.121228] [Hardware Error]:0020: 000100b0 
> 4a00
> [2.125514] [Hardware Error]:0030:  
> 
> [2.129747] [Hardware Error]:0040:  
> 

Lemme simplify that error record:

[Hardware Error]:  Corrected Processor Error
[Hardware Error]:   APIC_ID: 0x0 | CPUID: 0x17|0x1|0x2
[Hardware Error]:Type: cache error during instruction fetch
[Hardware Error]:cache level 1
[Hardware Error]:Overflow: true

See how much more readable it got! And it is only 5 lines. I can make it
even smaller.

If it were a critical, uncorrectable error, every line counts: imagine
you do the above fat record and the machine freezes at line 5.

Now, I admit that my vesion of the record is not enough to debug it
but it needs to contain only information which is clear and humanly
readable to debug. You can always dump the raw data underneath from the
tracepoint but make the beginning human readable.

Do you know what users say about your error record?

"Err, it says hardware error, is my machine broken? I need to replace my
CPU."

I read that on a weekly basis.

Do you know how expensive support calls are about such errors which are
completely unreadable to people? 20 engineers need to get on a call to
realize it was a dumb correctable error? Btw, this is one of the reasons
why we did the error collector.

So put yourself in the users' shoes, look at the error record and think
hard whether the information displayed is readable to humans.

Btw, decode_error_status() in mce_amd.c is an attempt to explain the
error severity - note the "no action required." thing. It is still not
good enough - people still throw hands in the air and run in headless
chicken mode.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov [mailto:b...@suse.de]
> Sent: Tuesday, February 27, 2018 1:03 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org;
> ard.biesheu...@linaro.org; x...@kernel.org
> Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info
> Structure
> 
> On Tue, Feb 27, 2018 at 05:46:54PM +, Ghannam, Yazen wrote:
> > I think there's value in following the conventions in a subsystem.
> 
> "conventions in a subsystem" my ass. That's brainless copy-pasting.
> 
> It was added by
> 
> f6edea77c8c8 ("ACPI, APEI, CPER: Cleanup CPER memory error output
> format")
> 
> and then replicated everywhere.
> 
> It is simply a dumb way of writing:
> 
>   snprintf(newpfx, sizeof(newpfx), "%s ", pfx);
> 

There's a space after the %s, right? I missed it at first glance, though maybe
my LASIK didn't take very well.

> 
> > I can change this if you give a reason besides "it's dumb".
> 
> Two can play that game: you get to keep it if you give a good reason
> why.
> 

Code readability.

...
> > And the raw value should still be printed because
> > 1) It may represent a type that we can't decode. Maybe a type that's not
> part of
> > the spec.
> 
> If we can't decode it, *then* you dump it:
> 
>   "Unrecognized type: 0x%llx ..."
> 
> > 2) It's good to have the raw value for reference. We do this with
> MCA_STATUS
> > where we print the raw value followed by the decoding.
> 
> 1. No one stares at the raw value if the bits are decoded
> 2. MCA_STATUS is one register - this error record is huge.
> 

1) No one except debug and HW design folks, who will eventually get a
report from a user.
2) You're right, the record is huge. Printing out the Validation Bits, GUID,
and raw Check Info will be an extra 5 lines.

I posted an example at the end.

We could get rid of the GUID for known types like you suggest. Also, we
can drop the Validation Bits for the Check Info since that's already part of
it.

Thanks,
Yazen

[1.990948] [Hardware Error]:  Error 1, type: corrected 
[1.995789] [Hardware Error]:  fru_text: ProcessorError 
[2.000632] [Hardware Error]:   section_type: IA32/X64 processor error 
[2.005796] [Hardware Error]:   Validation Bits: 0x0207 
[2.010953] [Hardware Error]:   Local APIC_ID: 0x0 
[2.015991] [Hardware Error]:   CPUID Info: 
[2.020747] [Hardware Error]:   : 00800f12  00400800 
 
[2.025595] [Hardware Error]:   0010: 76d8320b  178bfbff 
 
[2.030423] [Hardware Error]:   0020:    
 
[2.035198] [Hardware Error]:   Error Information Structure 0:
[2.039961] [Hardware Error]:Error Structure Type: 
a55701f5-e3ef-43de-ac72-249b573fad2c
[2.049608] [Hardware Error]:Error Structure Type: cache error
[2.054344] [Hardware Error]:Validation Bits: 0x0001
[2.059046] [Hardware Error]:Check Information: 0x20540087
[2.063625] [Hardware Error]: Validation Bits: 0x0087
[2.068032] [Hardware Error]: Transaction Type: 0, Instruction
[2.072423] [Hardware Error]: Operation: 5, instruction fetch
[2.076776] [Hardware Error]: Level: 1
[2.081073] [Hardware Error]: Overflow: true
[2.085360] [Hardware Error]:   Context Information Structure 0:
[2.089661] [Hardware Error]:Register Context Type: MSR Registers 
(Machine Check and other MSRs)
[2.098487] [Hardware Error]:Register Array Size: 0x0050
[2.103113] [Hardware Error]:MSR Address: 0xc0002011
[2.107742] [Hardware Error]:Register Array:
[2.112270] [Hardware Error]:: d82a0151 
[2.116845] [Hardware Error]:0010: d010 00030031
[2.121228] [Hardware Error]:0020: 000100b0 4a00
[2.125514] [Hardware Error]:0030:  
[2.129747] [Hardware Error]:0040:  


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 05:46:54PM +, Ghannam, Yazen wrote:
> I think there's value in following the conventions in a subsystem.

"conventions in a subsystem" my ass. That's brainless copy-pasting.

It was added by

f6edea77c8c8 ("ACPI, APEI, CPER: Cleanup CPER memory error output format")

and then replicated everywhere.

It is simply a dumb way of writing:

snprintf(newpfx, sizeof(newpfx), "%s ", pfx);


> I can change this if you give a reason besides "it's dumb".

Two can play that game: you get to keep it if you give a good reason
why.

> We do map the spec-defined GUIDs in patch 4 of this set. I don't know if 
> there's
> a central place where all vendor-defined GUIDs are listed. I can look into 
> this.

Yes, at least for the most prominent ones.

> And the raw value should still be printed because
> 1) It may represent a type that we can't decode. Maybe a type that's not part 
> of
> the spec.

If we can't decode it, *then* you dump it:

"Unrecognized type: 0x%llx ..."

> 2) It's good to have the raw value for reference. We do this with MCA_STATUS
> where we print the raw value followed by the decoding.

1. No one stares at the raw value if the bits are decoded
2. MCA_STATUS is one register - this error record is huge.

> The structs are all different even though some fields may be the same.

Fair enough. Only if it makes sense.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov [mailto:b...@suse.de]
> Sent: Tuesday, February 27, 2018 12:04 PM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org;
> ard.biesheu...@linaro.org; x...@kernel.org
> Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info
> Structure
> 
> On Tue, Feb 27, 2018 at 03:25:06PM +, Ghannam, Yazen wrote:
> > This is the same as the other CPER code.
> 
> Dude, turn on brain!
> 
> So if it is in the other CPER code, we should copy it, no matter how
> dumb it is?!?
> 

I think there's value in following the conventions in a subsystem.

I can change this if you give a reason besides "it's dumb". This could apply
to the other CPER code also.

> > Also, the spec allows platform-defined GUIDs. So we should always print
> this
> > even if the GUID is not defined by the spec.
> 
> We need to have a way to map the GUID to a hw part. Dumb numbers mean
> shit
> because the error record is worthless.
> 

We do map the spec-defined GUIDs in patch 4 of this set. I don't know if there's
a central place where all vendor-defined GUIDs are listed. I can look into this.

> > The Check Information will be decoded further in another patch.
> >
> > I don't think there's much else to decode for the ID fields.
> 
> Again, those numbers can't help decoding the error, no need to dump
> them. Or we find a way to make sense out of that info.
> 

Which numbers? The Check Information? Like I said, that's decoded in another
patch, patches 5, 6, and 7.

And the raw value should still be printed because
1) It may represent a type that we can't decode. Maybe a type that's not part of
the spec.
2) It's good to have the raw value for reference. We do this with MCA_STATUS
where we print the raw value followed by the decoding.

> > Other tables may have the same fields but the offsets are usually different.
> > So it may be more trouble than it's worth trying to unify the different
> tables.
> 
> If the structs are the same, you can use generic functions for dumping -
> the offsets are meaningless then.
> 

The structs are all different even though some fields may be the same.

Thanks,
Yazen


Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Tue, Feb 27, 2018 at 03:25:06PM +, Ghannam, Yazen wrote:
> This is the same as the other CPER code.

Dude, turn on brain!

So if it is in the other CPER code, we should copy it, no matter how
dumb it is?!?

> Also, the spec allows platform-defined GUIDs. So we should always print this
> even if the GUID is not defined by the spec.

We need to have a way to map the GUID to a hw part. Dumb numbers mean shit
because the error record is worthless.

> The Check Information will be decoded further in another patch.
> 
> I don't think there's much else to decode for the ID fields.

Again, those numbers can't help decoding the error, no need to dump
them. Or we find a way to make sense out of that info.

> Other tables may have the same fields but the offsets are usually different.
> So it may be more trouble than it's worth trying to unify the different 
> tables.

If the structs are the same, you can use generic functions for dumping -
the offsets are meaningless then.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html


RE: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Ghannam, Yazen
> -Original Message-
> From: Borislav Petkov [mailto:b...@suse.de]
> Sent: Tuesday, February 27, 2018 9:26 AM
> To: Ghannam, Yazen <yazen.ghan...@amd.com>
> Cc: linux-efi@vger.kernel.org; linux-ker...@vger.kernel.org;
> ard.biesheu...@linaro.org; x...@kernel.org
> Subject: Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info
> Structure
> 
> On Mon, Feb 26, 2018 at 01:38:59PM -0600, Yazen Ghannam wrote:
> > From: Yazen Ghannam <yazen.ghan...@amd.com>
> >
> > Print the fields in the IA32/X64 Processor Error Info Structure.
> >
> > Based on UEFI 2.7 Table 256. IA32/X64 Processor Error Information
> > Structure.
> >
> > Signed-off-by: Yazen Ghannam <yazen.ghan...@amd.com>
> > ---
> > Link:
> > https://lkml.kernel.org/r/20180223200333.6410-4-
> yazen.ghan...@amd.com
> >
> > v1->v2:
> > * Add parantheses around "bits" expression in macro.
> > * Fix indentation on multi-line statements.
> >
> >  drivers/firmware/efi/cper-x86.c | 53
> +
> >  1 file changed, 53 insertions(+)
> >
> > diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-
> x86.c
> > index 9da0d981178f..417bd4e500a7 100644
> > --- a/drivers/firmware/efi/cper-x86.c
> > +++ b/drivers/firmware/efi/cper-x86.c
> > @@ -3,15 +3,28 @@
> >
> >  #include 
> >
> > +#define INDENT_SP  " "
> 
> That's just silly.
> 

This is the same as the other CPER code.

> > +
> >  /*
> >   * We don't need a "CPER_IA" prefix since these are all locally defined.
> >   * This will save us a lot of line space.
> >   */
> >  #define VALID_LAPIC_ID BIT_ULL(0)
> >  #define VALID_CPUID_INFO   BIT_ULL(1)
> > +#define VALID_PROC_ERR_INFO_NUM(bits)  (((bits) & GENMASK_ULL(7,
> 2)) >> 2)
> > +
> > +#define INFO_VALID_CHECK_INFO  BIT_ULL(0)
> > +#define INFO_VALID_TARGET_ID   BIT_ULL(1)
> > +#define INFO_VALID_REQUESTOR_IDBIT_ULL(2)
> > +#define INFO_VALID_RESPONDER_IDBIT_ULL(3)
> > +#define INFO_VALID_IP  BIT_ULL(4)
> >
> >  void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia
> *proc)
> >  {
> > +   int i;
> > +   struct cper_ia_err_info *err_info;
> > +   char newpfx[64];
> > +
> > printk("%sValidation Bits: 0x%016llx\n", pfx, proc->validation_bits);
> >
> > if (proc->validation_bits & VALID_LAPIC_ID)
> > @@ -22,4 +35,44 @@ void cper_print_proc_ia(const char *pfx, const struct
> cper_sec_proc_ia *proc)
> > print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc-
> >cpuid,
> >sizeof(proc->cpuid), 0);
> > }
> > +
> > +   snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> > +
> > +   err_info = (struct cper_ia_err_info *)(proc + 1);
> > +   for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits);
> i++) {
> > +   printk("%sError Information Structure %d:\n", pfx, i);
> > +
> > +   printk("%sError Structure Type: %pUl\n", newpfx,
> > +  _info->err_type);
> 
> That dumps a GUID - how do I find out what type that GUID refers to?
> 

A later patch prints out the spec-defined type.

Also, the spec allows platform-defined GUIDs. So we should always print this
even if the GUID is not defined by the spec.

> > +
> > +   printk("%sValidation Bits: 0x%016llx\n",
> > +  newpfx, err_info->validation_bits);
> 
> As before, no need for those.
> 
> > +
> > +   if (err_info->validation_bits & INFO_VALID_CHECK_INFO) {
> > +   printk("%sCheck Information: 0x%016llx\n", newpfx,
> > +  err_info->check_info);
> > +   }
> > +
> > +   if (err_info->validation_bits & INFO_VALID_TARGET_ID) {
> > +   printk("%sTarget Identifier: 0x%016llx\n",
> > +  newpfx, err_info->target_id);
> > +   }
> > +
> > +   if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) {
> > +   printk("%sRequestor Identifier: 0x%016llx\n",
> > +  newpfx, err_info->requestor_id);
> > +   }
> > +
> > +   if (err_info->validation_bits & INFO_VALID_RESPONDER_

Re: [PATCH v2 3/8] efi: Decode IA32/X64 Processor Error Info Structure

2018-02-27 Thread Borislav Petkov
On Mon, Feb 26, 2018 at 01:38:59PM -0600, Yazen Ghannam wrote:
> From: Yazen Ghannam 
> 
> Print the fields in the IA32/X64 Processor Error Info Structure.
> 
> Based on UEFI 2.7 Table 256. IA32/X64 Processor Error Information
> Structure.
> 
> Signed-off-by: Yazen Ghannam 
> ---
> Link:
> https://lkml.kernel.org/r/20180223200333.6410-4-yazen.ghan...@amd.com
> 
> v1->v2:
> * Add parantheses around "bits" expression in macro.
> * Fix indentation on multi-line statements.
> 
>  drivers/firmware/efi/cper-x86.c | 53 
> +
>  1 file changed, 53 insertions(+)
> 
> diff --git a/drivers/firmware/efi/cper-x86.c b/drivers/firmware/efi/cper-x86.c
> index 9da0d981178f..417bd4e500a7 100644
> --- a/drivers/firmware/efi/cper-x86.c
> +++ b/drivers/firmware/efi/cper-x86.c
> @@ -3,15 +3,28 @@
>  
>  #include 
>  
> +#define INDENT_SP" "

That's just silly.

> +
>  /*
>   * We don't need a "CPER_IA" prefix since these are all locally defined.
>   * This will save us a lot of line space.
>   */
>  #define VALID_LAPIC_ID   BIT_ULL(0)
>  #define VALID_CPUID_INFO BIT_ULL(1)
> +#define VALID_PROC_ERR_INFO_NUM(bits)(((bits) & GENMASK_ULL(7, 2)) 
> >> 2)
> +
> +#define INFO_VALID_CHECK_INFOBIT_ULL(0)
> +#define INFO_VALID_TARGET_ID BIT_ULL(1)
> +#define INFO_VALID_REQUESTOR_ID  BIT_ULL(2)
> +#define INFO_VALID_RESPONDER_ID  BIT_ULL(3)
> +#define INFO_VALID_IPBIT_ULL(4)
>  
>  void cper_print_proc_ia(const char *pfx, const struct cper_sec_proc_ia *proc)
>  {
> + int i;
> + struct cper_ia_err_info *err_info;
> + char newpfx[64];
> +
>   printk("%sValidation Bits: 0x%016llx\n", pfx, proc->validation_bits);
>  
>   if (proc->validation_bits & VALID_LAPIC_ID)
> @@ -22,4 +35,44 @@ void cper_print_proc_ia(const char *pfx, const struct 
> cper_sec_proc_ia *proc)
>   print_hex_dump(pfx, "", DUMP_PREFIX_OFFSET, 16, 4, proc->cpuid,
>  sizeof(proc->cpuid), 0);
>   }
> +
> + snprintf(newpfx, sizeof(newpfx), "%s%s", pfx, INDENT_SP);
> +
> + err_info = (struct cper_ia_err_info *)(proc + 1);
> + for (i = 0; i < VALID_PROC_ERR_INFO_NUM(proc->validation_bits); i++) {
> + printk("%sError Information Structure %d:\n", pfx, i);
> +
> + printk("%sError Structure Type: %pUl\n", newpfx,
> +_info->err_type);

That dumps a GUID - how do I find out what type that GUID refers to?

> +
> + printk("%sValidation Bits: 0x%016llx\n",
> +newpfx, err_info->validation_bits);

As before, no need for those.

> +
> + if (err_info->validation_bits & INFO_VALID_CHECK_INFO) {
> + printk("%sCheck Information: 0x%016llx\n", newpfx,
> +err_info->check_info);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_TARGET_ID) {
> + printk("%sTarget Identifier: 0x%016llx\n",
> +newpfx, err_info->target_id);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_REQUESTOR_ID) {
> + printk("%sRequestor Identifier: 0x%016llx\n",
> +newpfx, err_info->requestor_id);
> + }
> +
> + if (err_info->validation_bits & INFO_VALID_RESPONDER_ID) {
> + printk("%sResponder Identifier: 0x%016llx\n",
> +newpfx, err_info->responder_id);
> + }

Those look like would need an additional decoding. Can we do that here
too?

> +
> + if (err_info->validation_bits & INFO_VALID_IP) {
> + printk("%sInstruction Pointer: 0x%016llx\n",
> +newpfx, err_info->ip);

The only thing that makes sense so far.

> + }
> +
> + err_info++;
> + }

Also, it is worth checking how much duplication there is with
drivers/firmware/efi/cper-arm.c and unifying the common pieces into
functions in cper-common.c or so.

-- 
Regards/Gruss,
Boris.

SUSE Linux GmbH, GF: Felix Imendörffer, Jane Smithard, Graham Norton, HRB 21284 
(AG Nürnberg)
-- 
--
To unsubscribe from this list: send the line "unsubscribe linux-efi" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html