Re: [PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Sean Christopherson
On Fri, Dec 07, 2018 at 10:52:49AM -0800, Linus Torvalds wrote:
> On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
>  wrote:
> >
> > Remove the per-bit decoding of the error code and instead print the raw
> > error code followed by a brief description of what caused the fault, the
> > effective privilege level of the faulting access, and whether the fault
> > originated in user code or kernel code.
> 
> This doesn't quite work as-is, though.
> 
> For example, at least the PK bit is independent of the other bits and
> would be interesting in the human-legible version, but doesn't show up
> there at all.

Heh, I actually intentionally omitted protection keys thinking it'd be
superfluous, i.e. "go look at the error code bits if you care that much".

> That said, I think the end result might be more legible than the
> previous version, so this approach may well be good, it just needs at
> least that "permissions violation"  part to be extended with whether
> it was PK or not.
> 
> Also, shouldn't we show the SGX bit too as some kind of "during SGX"
> extension on the "in user/kernel space" part?

The SGX bit isn't defined in mainline yet.  But yeah, I can see how
printing e.g. "SGX EPCM violation" would be a lot more helpful than
a vanilla "permissions violation".  I'll send a v2 with the PK bit
added and a slightly reworded changelog.


Re: [PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Sean Christopherson
On Fri, Dec 07, 2018 at 10:52:49AM -0800, Linus Torvalds wrote:
> On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
>  wrote:
> >
> > Remove the per-bit decoding of the error code and instead print the raw
> > error code followed by a brief description of what caused the fault, the
> > effective privilege level of the faulting access, and whether the fault
> > originated in user code or kernel code.
> 
> This doesn't quite work as-is, though.
> 
> For example, at least the PK bit is independent of the other bits and
> would be interesting in the human-legible version, but doesn't show up
> there at all.

Heh, I actually intentionally omitted protection keys thinking it'd be
superfluous, i.e. "go look at the error code bits if you care that much".

> That said, I think the end result might be more legible than the
> previous version, so this approach may well be good, it just needs at
> least that "permissions violation"  part to be extended with whether
> it was PK or not.
> 
> Also, shouldn't we show the SGX bit too as some kind of "during SGX"
> extension on the "in user/kernel space" part?

The SGX bit isn't defined in mainline yet.  But yeah, I can see how
printing e.g. "SGX EPCM violation" would be a lot more helpful than
a vanilla "permissions violation".  I'll send a v2 with the PK bit
added and a slightly reworded changelog.


Re: [PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Linus Torvalds
On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
 wrote:
>
> Remove the per-bit decoding of the error code and instead print the raw
> error code followed by a brief description of what caused the fault, the
> effective privilege level of the faulting access, and whether the fault
> originated in user code or kernel code.

This doesn't quite work as-is, though.

For example, at least the PK bit is independent of the other bits and
would be interesting in the human-legible version, but doesn't show up
there at all.

That said, I think the end result might be more legible than the
previous version, so this approach may well be good, it just needs at
least that "permissions violation"  part to be extended with whether
it was PK or not.

Also, shouldn't we show the SGX bit too as some kind of "during SGX"
extension on the "in user/kernel space" part?

   Linus


smime.p7s
Description: S/MIME Cryptographic Signature


Re: [PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Linus Torvalds
On Fri, Dec 7, 2018 at 10:44 AM Sean Christopherson
 wrote:
>
> Remove the per-bit decoding of the error code and instead print the raw
> error code followed by a brief description of what caused the fault, the
> effective privilege level of the faulting access, and whether the fault
> originated in user code or kernel code.

This doesn't quite work as-is, though.

For example, at least the PK bit is independent of the other bits and
would be interesting in the human-legible version, but doesn't show up
there at all.

That said, I think the end result might be more legible than the
previous version, so this approach may well be good, it just needs at
least that "permissions violation"  part to be extended with whether
it was PK or not.

Also, shouldn't we show the SGX bit too as some kind of "during SGX"
extension on the "in user/kernel space" part?

   Linus


smime.p7s
Description: S/MIME Cryptographic Signature


[PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Sean Christopherson
Linus pointed out[1] that deciphering the raw #PF error code and
printing a more human readable message are two different things,
e.g. not-executable, not-writable, protection keys and even SGX faults
are all some form of permission violation faults, and a #PF with the
USER bit cleared in the error code does not indicate that the fault
originated in kernel code.

Remove the per-bit decoding of the error code and instead print the raw
error code followed by a brief description of what caused the fault, the
effective privilege level of the faulting access, and whether the fault
originated in user code or kernel code.

This provides the user with the information needed to do basic triage
on 99.9% oopses without polluting the message with information that is
useless the vast majority of the time, e.g. an oops on a protection keys
violation is beyond rare, stating that an oops *wasn't* due to a keys
violation is pointless noise.

[1] 
https://lkml.kernel.org/r/CAHk-=whk_fsnxvmvf1t2ffcap2wrvsybabrlqcwljycvhw6...@mail.gmail.com

Suggested-by: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Yu-cheng Yu 
Cc: linux-kernel@vger.kernel.org
Cc: Ingo Molnar 
Signed-off-by: Sean Christopherson 
---
 arch/x86/mm/fault.c | 41 ++---
 1 file changed, 10 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..85de05d41f58 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const 
char *name, u16 index)
 name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
 }
 
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long 
mask, const char *txt)
-{
-   if (error_code & mask) {
-   if (buf[0])
-   strcat(buf, " ");
-   strcat(buf, txt);
-   }
-}
-
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long 
address)
 {
-   char err_txt[64];
-
if (!oops_may_print())
return;
 
@@ -648,27 +633,21 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
error_code, unsigned long ad
 address < PAGE_SIZE ? "NULL pointer dereference" : "paging 
request",
 (void *)address);
 
-   err_txt[0] = 0;
-
-   /*
-* Note: length of these appended strings including the separation 
space and the
-* zero delimiter must fit into err_txt[].
-*/
-   err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
-   err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
-   err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
-   err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
-   err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
-   err_str_append(error_code, err_txt, X86_PF_PK,"[PK]"   );
-
-   pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read 
fault]");
+   pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+   !(error_code & X86_PF_PROT) ? "not-present page" :
+   (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
+ "permissions violation");
+   pr_alert("#PF: %s-privileged %s from %s code\n",
+   (error_code & X86_PF_USER)  ? "user" : "supervisor",
+   (error_code & X86_PF_INSTR) ? "instruction fetch" :
+   (error_code & X86_PF_WRITE) ? "write access" :
+ "read access",
+   user_mode(regs) ? "user" : "kernel");
 
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;
 
-   pr_alert("This was a system access from user code\n");
-
/*
 * This can happen for quite a few reasons.  The more obvious
 * ones are faults accessing the GDT, or LDT.  Perhaps
-- 
2.19.2



[PATCH] x86/fault: Decode and print #PF oops in human readable form

2018-12-07 Thread Sean Christopherson
Linus pointed out[1] that deciphering the raw #PF error code and
printing a more human readable message are two different things,
e.g. not-executable, not-writable, protection keys and even SGX faults
are all some form of permission violation faults, and a #PF with the
USER bit cleared in the error code does not indicate that the fault
originated in kernel code.

Remove the per-bit decoding of the error code and instead print the raw
error code followed by a brief description of what caused the fault, the
effective privilege level of the faulting access, and whether the fault
originated in user code or kernel code.

This provides the user with the information needed to do basic triage
on 99.9% oopses without polluting the message with information that is
useless the vast majority of the time, e.g. an oops on a protection keys
violation is beyond rare, stating that an oops *wasn't* due to a keys
violation is pointless noise.

[1] 
https://lkml.kernel.org/r/CAHk-=whk_fsnxvmvf1t2ffcap2wrvsybabrlqcwljycvhw6...@mail.gmail.com

Suggested-by: Linus Torvalds 
Cc: Andy Lutomirski 
Cc: Borislav Petkov 
Cc: Dave Hansen 
Cc: H. Peter Anvin 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Rik van Riel 
Cc: Thomas Gleixner 
Cc: Yu-cheng Yu 
Cc: linux-kernel@vger.kernel.org
Cc: Ingo Molnar 
Signed-off-by: Sean Christopherson 
---
 arch/x86/mm/fault.c | 41 ++---
 1 file changed, 10 insertions(+), 31 deletions(-)

diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..85de05d41f58 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -603,24 +603,9 @@ static void show_ldttss(const struct desc_ptr *gdt, const 
char *name, u16 index)
 name, index, addr, (desc.limit0 | (desc.limit1 << 16)));
 }
 
-/*
- * This helper function transforms the #PF error_code bits into
- * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
- */
-static void err_str_append(unsigned long error_code, char *buf, unsigned long 
mask, const char *txt)
-{
-   if (error_code & mask) {
-   if (buf[0])
-   strcat(buf, " ");
-   strcat(buf, txt);
-   }
-}
-
 static void
 show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long 
address)
 {
-   char err_txt[64];
-
if (!oops_may_print())
return;
 
@@ -648,27 +633,21 @@ show_fault_oops(struct pt_regs *regs, unsigned long 
error_code, unsigned long ad
 address < PAGE_SIZE ? "NULL pointer dereference" : "paging 
request",
 (void *)address);
 
-   err_txt[0] = 0;
-
-   /*
-* Note: length of these appended strings including the separation 
space and the
-* zero delimiter must fit into err_txt[].
-*/
-   err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
-   err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
-   err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
-   err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
-   err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
-   err_str_append(error_code, err_txt, X86_PF_PK,"[PK]"   );
-
-   pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read 
fault]");
+   pr_alert("#PF: error_code(0x%04lx) - %s\n", error_code,
+   !(error_code & X86_PF_PROT) ? "not-present page" :
+   (error_code & X86_PF_RSVD)  ? "reserved bit violation" :
+ "permissions violation");
+   pr_alert("#PF: %s-privileged %s from %s code\n",
+   (error_code & X86_PF_USER)  ? "user" : "supervisor",
+   (error_code & X86_PF_INSTR) ? "instruction fetch" :
+   (error_code & X86_PF_WRITE) ? "write access" :
+ "read access",
+   user_mode(regs) ? "user" : "kernel");
 
if (!(error_code & X86_PF_USER) && user_mode(regs)) {
struct desc_ptr idt, gdt;
u16 ldtr, tr;
 
-   pr_alert("This was a system access from user code\n");
-
/*
 * This can happen for quite a few reasons.  The more obvious
 * ones are faults accessing the GDT, or LDT.  Perhaps
-- 
2.19.2