Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 12:28 PM Andy Lutomirski  wrote:
>
> "read" isn't an actual bit in the error code, so I thought it would be
> polite to make it look a little bit different.

If you care about the bits in the error code, then just look at the number.

And if you care about what the numbers mean, it doesn't matter how it's encoded.

I don't think you should mix up the two concepts.

> Sure.  Although it's extremely odd for us to OOPS from user mode, so
> maybe the OOPS code in general should print a big fat warning, and
> we'll just otherwise assume it was from kernel mode.

Yeah, the "from user mode" case is generally really something horribly
bad (reserved bits being set or us screwing up LDT's etc), so yeah,
maybe a better model is indeed to point that out explicitly, the same
way the "user/kernel doesn't match U/S bit" is pointed out.

  Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 12:28 PM Andy Lutomirski  wrote:
>
> "read" isn't an actual bit in the error code, so I thought it would be
> polite to make it look a little bit different.

If you care about the bits in the error code, then just look at the number.

And if you care about what the numbers mean, it doesn't matter how it's encoded.

I don't think you should mix up the two concepts.

> Sure.  Although it's extremely odd for us to OOPS from user mode, so
> maybe the OOPS code in general should print a big fat warning, and
> we'll just otherwise assume it was from kernel mode.

Yeah, the "from user mode" case is generally really something horribly
bad (reserved bits being set or us screwing up LDT's etc), so yeah,
maybe a better model is indeed to point that out explicitly, the same
way the "user/kernel doesn't match U/S bit" is pointed out.

  Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski
On Thu, Dec 6, 2018 at 12:24 PM Linus Torvalds
 wrote:
>
> On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski  wrote:
> >
> > How do you like the attached patch?
>
> I agree with whoever thought it's odd that "read" is in lower case
> when everything else is in upper case.

"read" isn't an actual bit in the error code, so I thought it would be
polite to make it look a little bit different.  That's not a big deal,
though.

>
> And honestly, I'd just siggest making the err_text simply have the
> real user/kernel difference in it too, using something like
>
> pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
> user_mode(regs) ? "user" : "kernel" );
>
> The "oh, PF_USER and user_mode differs, so let's point that out
> explicitly" thing is a good thing regardless.

Sure.  Although it's extremely odd for us to OOPS from user mode, so
maybe the OOPS code in general should print a big fat warning, and
we'll just otherwise assume it was from kernel mode.  (In your tree
right now, we print the wrong value for CS for OOPSes from user mode,
which is extra confusing.  That's fixed in -tip.)

--Andy


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski
On Thu, Dec 6, 2018 at 12:24 PM Linus Torvalds
 wrote:
>
> On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski  wrote:
> >
> > How do you like the attached patch?
>
> I agree with whoever thought it's odd that "read" is in lower case
> when everything else is in upper case.

"read" isn't an actual bit in the error code, so I thought it would be
polite to make it look a little bit different.  That's not a big deal,
though.

>
> And honestly, I'd just siggest making the err_text simply have the
> real user/kernel difference in it too, using something like
>
> pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
> user_mode(regs) ? "user" : "kernel" );
>
> The "oh, PF_USER and user_mode differs, so let's point that out
> explicitly" thing is a good thing regardless.

Sure.  Although it's extremely odd for us to OOPS from user mode, so
maybe the OOPS code in general should print a big fat warning, and
we'll just otherwise assume it was from kernel mode.  (In your tree
right now, we print the wrong value for CS for OOPSes from user mode,
which is extra confusing.  That's fixed in -tip.)

--Andy


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski  wrote:
>
> How do you like the attached patch?

I agree with whoever thought it's odd that "read" is in lower case
when everything else is in upper case.

And honestly, I'd just siggest making the err_text simply have the
real user/kernel difference in it too, using something like

pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
user_mode(regs) ? "user" : "kernel" );

The "oh, PF_USER and user_mode differs, so let's point that out
explicitly" thing is a good thing regardless.

 Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Thu, Dec 6, 2018 at 11:07 AM Andy Lutomirski  wrote:
>
> How do you like the attached patch?

I agree with whoever thought it's odd that "read" is in lower case
when everything else is in upper case.

And honestly, I'd just siggest making the err_text simply have the
real user/kernel difference in it too, using something like

pr_alert("#PF error: 0x%04lx%s from %s mode\n", error_code, err_txt,
user_mode(regs) ? "user" : "kernel" );

The "oh, PF_USER and user_mode differs, so let's point that out
explicitly" thing is a good thing regardless.

 Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski
On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds
 wrote:
>
> On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar  wrote:
> >
> > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> > inconsistent, sporadic handling of negatives. Here's our error code bits:
> >
> > /*
> >  * Page fault error code bits:
> >  *
> >  *   bit 0 ==0: no page found   1: protection fault
> >  *   bit 1 ==0: read access 1: write access
> >  *   bit 2 ==0: kernel-mode access  1: user-mode access
>
> No. Really not at all.
>
> Bit 2 is *not* "kernel vs user".  Never has been. Never will be.
>
> It's a single bit that mixes up *three* different cases:
>
>  - regular user mode access (value: 1)
>
>  - regular CPL0 access (value: 0)
>
>  - CPU system access (value: 0)
>
> and that third case really is important and relevant. And importantly,
> it can happen from user space.
>
> In fact, these days we possibly have a fourth case:
>
>  - kernel access using wruss (value: 1)

Indeed.

>
> and I'd rather see just the numbers (which you have to know to decode)
> than see the simplified AND WRONG decoding of those numbers.

That's why the very next line in the OOPS explains this.

>
> Please don't ever confuse the fault U/S bit with "user vs kernel".
> It's just not true, and people should be very very aware of it now
> being true.
>
> If you care whether a page fault happened in user mode or not, you
> have to look at the register state (ie "user_mode(regs)").

The code we're arguing over was part of a patch set to fix this
confusion in the page fault code for real.

>
> Please call the U/S bit something else than "user" or "kernel".

Dunno.  I kind of like following the SDM.

How do you like the attached patch?
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..f1f9e19646f5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -610,8 +610,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
 {
 	if (error_code & mask) {
-		if (buf[0])
-			strcat(buf, " ");
+		strcat(buf, " ");
 		strcat(buf, txt);
 	}
 }
@@ -651,23 +650,30 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	err_txt[0] = 0;
 
 	/*
-	 * Note: length of these appended strings including the separation space and the
-	 * zero delimiter must fit into err_txt[].
+	 * Note: length of these appended strings including the separation
+	 * space and the zero delimiter must fit into err_txt[].
 	 */
 	err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
 	err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
+	if ((error_code & (X86_PF_WRITE | X86_PF_INSTR)) == 0)
+		strcat(err_txt, " [read]");
 	err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
 	err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
 	err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
 	err_str_append(error_code, err_txt, X86_PF_PK,"[PK]"   );
 
-	pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+	pr_alert("#PF error: 0x%04lx%s\n", error_code, err_txt);
 
+	/*
+	 * The X86_PF_USER bit does *not* mean the same thing as
+	 * user_mode(regs).  Make sure that the unusual cases are obvious to
+	 * the reader.
+	 */
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
 
-		pr_alert("This was a system access from user code\n");
+		pr_alert("NB: This was a supervisor-privileged access from user code.\n");
 
 		/*
 		 * This can happen for quite a few reasons.  The more obvious
@@ -692,6 +698,14 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 
 		store_tr(tr);
 		show_ldttss(, "TR", tr);
+	} else if ((error_code & X86_PF_USER) && !user_mode(regs)) {
+		/*
+		 * This can happen due to WRUSS.  If an ISA extension ever
+		 * gives new instructions to do user memory access from kernel
+		 * mode and we OOPS due to a broken fixup, we'll presumably land
+		 * here as well.
+		 */
+		pr_alert("NB: This was a user-privileged access from kernel code.\n");
 	}
 
 	dump_pagetable(address);


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski
On Thu, Dec 6, 2018 at 10:15 AM Linus Torvalds
 wrote:
>
> On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar  wrote:
> >
> > Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> > inconsistent, sporadic handling of negatives. Here's our error code bits:
> >
> > /*
> >  * Page fault error code bits:
> >  *
> >  *   bit 0 ==0: no page found   1: protection fault
> >  *   bit 1 ==0: read access 1: write access
> >  *   bit 2 ==0: kernel-mode access  1: user-mode access
>
> No. Really not at all.
>
> Bit 2 is *not* "kernel vs user".  Never has been. Never will be.
>
> It's a single bit that mixes up *three* different cases:
>
>  - regular user mode access (value: 1)
>
>  - regular CPL0 access (value: 0)
>
>  - CPU system access (value: 0)
>
> and that third case really is important and relevant. And importantly,
> it can happen from user space.
>
> In fact, these days we possibly have a fourth case:
>
>  - kernel access using wruss (value: 1)

Indeed.

>
> and I'd rather see just the numbers (which you have to know to decode)
> than see the simplified AND WRONG decoding of those numbers.

That's why the very next line in the OOPS explains this.

>
> Please don't ever confuse the fault U/S bit with "user vs kernel".
> It's just not true, and people should be very very aware of it now
> being true.
>
> If you care whether a page fault happened in user mode or not, you
> have to look at the register state (ie "user_mode(regs)").

The code we're arguing over was part of a patch set to fix this
confusion in the page fault code for real.

>
> Please call the U/S bit something else than "user" or "kernel".

Dunno.  I kind of like following the SDM.

How do you like the attached patch?
diff --git a/arch/x86/mm/fault.c b/arch/x86/mm/fault.c
index 2ff25ad33233..f1f9e19646f5 100644
--- a/arch/x86/mm/fault.c
+++ b/arch/x86/mm/fault.c
@@ -610,8 +610,7 @@ static void show_ldttss(const struct desc_ptr *gdt, const char *name, u16 index)
 static void err_str_append(unsigned long error_code, char *buf, unsigned long mask, const char *txt)
 {
 	if (error_code & mask) {
-		if (buf[0])
-			strcat(buf, " ");
+		strcat(buf, " ");
 		strcat(buf, txt);
 	}
 }
@@ -651,23 +650,30 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 	err_txt[0] = 0;
 
 	/*
-	 * Note: length of these appended strings including the separation space and the
-	 * zero delimiter must fit into err_txt[].
+	 * Note: length of these appended strings including the separation
+	 * space and the zero delimiter must fit into err_txt[].
 	 */
 	err_str_append(error_code, err_txt, X86_PF_PROT,  "[PROT]" );
 	err_str_append(error_code, err_txt, X86_PF_WRITE, "[WRITE]");
+	if ((error_code & (X86_PF_WRITE | X86_PF_INSTR)) == 0)
+		strcat(err_txt, " [read]");
 	err_str_append(error_code, err_txt, X86_PF_USER,  "[USER]" );
 	err_str_append(error_code, err_txt, X86_PF_RSVD,  "[RSVD]" );
 	err_str_append(error_code, err_txt, X86_PF_INSTR, "[INSTR]");
 	err_str_append(error_code, err_txt, X86_PF_PK,"[PK]"   );
 
-	pr_alert("#PF error: %s\n", error_code ? err_txt : "[normal kernel read fault]");
+	pr_alert("#PF error: 0x%04lx%s\n", error_code, err_txt);
 
+	/*
+	 * The X86_PF_USER bit does *not* mean the same thing as
+	 * user_mode(regs).  Make sure that the unusual cases are obvious to
+	 * the reader.
+	 */
 	if (!(error_code & X86_PF_USER) && user_mode(regs)) {
 		struct desc_ptr idt, gdt;
 		u16 ldtr, tr;
 
-		pr_alert("This was a system access from user code\n");
+		pr_alert("NB: This was a supervisor-privileged access from user code.\n");
 
 		/*
 		 * This can happen for quite a few reasons.  The more obvious
@@ -692,6 +698,14 @@ show_fault_oops(struct pt_regs *regs, unsigned long error_code, unsigned long ad
 
 		store_tr(tr);
 		show_ldttss(, "TR", tr);
+	} else if ((error_code & X86_PF_USER) && !user_mode(regs)) {
+		/*
+		 * This can happen due to WRUSS.  If an ISA extension ever
+		 * gives new instructions to do user memory access from kernel
+		 * mode and we OOPS due to a broken fixup, we'll presumably land
+		 * here as well.
+		 */
+		pr_alert("NB: This was a user-privileged access from kernel code.\n");
 	}
 
 	dump_pagetable(address);


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar  wrote:
>
> Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> inconsistent, sporadic handling of negatives. Here's our error code bits:
>
> /*
>  * Page fault error code bits:
>  *
>  *   bit 0 ==0: no page found   1: protection fault
>  *   bit 1 ==0: read access 1: write access
>  *   bit 2 ==0: kernel-mode access  1: user-mode access

No. Really not at all.

Bit 2 is *not* "kernel vs user".  Never has been. Never will be.

It's a single bit that mixes up *three* different cases:

 - regular user mode access (value: 1)

 - regular CPL0 access (value: 0)

 - CPU system access (value: 0)

and that third case really is important and relevant. And importantly,
it can happen from user space.

In fact, these days we possibly have a fourth case:

 - kernel access using wruss (value: 1)

and I'd rather see just the numbers (which you have to know to decode)
than see the simplified AND WRONG decoding of those numbers.

Please don't ever confuse the fault U/S bit with "user vs kernel".
It's just not true, and people should be very very aware of it now
being true.

If you care whether a page fault happened in user mode or not, you
have to look at the register state (ie "user_mode(regs)").

Please call the U/S bit something else than "user" or "kernel".

   Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Linus Torvalds
On Wed, Dec 5, 2018 at 11:34 PM Ingo Molnar  wrote:
>
> Yeah, so I don't like the overly long 'SUPERVISOR' and the somewhat
> inconsistent, sporadic handling of negatives. Here's our error code bits:
>
> /*
>  * Page fault error code bits:
>  *
>  *   bit 0 ==0: no page found   1: protection fault
>  *   bit 1 ==0: read access 1: write access
>  *   bit 2 ==0: kernel-mode access  1: user-mode access

No. Really not at all.

Bit 2 is *not* "kernel vs user".  Never has been. Never will be.

It's a single bit that mixes up *three* different cases:

 - regular user mode access (value: 1)

 - regular CPL0 access (value: 0)

 - CPU system access (value: 0)

and that third case really is important and relevant. And importantly,
it can happen from user space.

In fact, these days we possibly have a fourth case:

 - kernel access using wruss (value: 1)

and I'd rather see just the numbers (which you have to know to decode)
than see the simplified AND WRONG decoding of those numbers.

Please don't ever confuse the fault U/S bit with "user vs kernel".
It's just not true, and people should be very very aware of it now
being true.

If you care whether a page fault happened in user mode or not, you
have to look at the register state (ie "user_mode(regs)").

Please call the U/S bit something else than "user" or "kernel".

   Linus


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> That’s why I suggested “read,” in lowercase, for reads.  Other than 
> that, most of the unset bits are uninteresting. An OOPS is so likely to 
> be a kernel fault that it’s barely worth mentioning, and I even added a 
> whole separate diagnostic for user oopses.  Similarly, I don’t think we 
> need to remind the reader that an oops wasn’t an SGX error or that it 
> wasn’t a PK error.  So I think my idea highlights the interesting bits 
> and avoids distraction from the uninteresting bits.

Ok - all good points.

Thanks,

Ingo


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> That’s why I suggested “read,” in lowercase, for reads.  Other than 
> that, most of the unset bits are uninteresting. An OOPS is so likely to 
> be a kernel fault that it’s barely worth mentioning, and I even added a 
> whole separate diagnostic for user oopses.  Similarly, I don’t think we 
> need to remind the reader that an oops wasn’t an SGX error or that it 
> wasn’t a PK error.  So I think my idea highlights the interesting bits 
> and avoids distraction from the uninteresting bits.

Ok - all good points.

Thanks,

Ingo


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski



> On Dec 6, 2018, at 8:47 AM, Ingo Molnar  wrote:
> 
> 
> * Andy Lutomirski  wrote:
> 
>>> vs. (with SGX added as 'G' for testing purposes)
>>> 
>>> [0.158849] #PF error code(0001):  +P !W !U !S !I !K !G
>>> [0.159292] #PF error code(0003):  +P +W !U !S !I !K !G
>>> [0.159742] #PF error code(0007):  +P +W +U !S !I !K !G
>>> [0.160190] #PF error code(0025):  +P !W +U !S !I +K !G
>>> [0.160638] #PF error code(0002):  !P +W !U !S !I !K !G
>>> [0.161087] #PF error code(0004):  !P !W +U !S !I !K !G
>>> [0.161538] #PF error code(0006):  !P +W +U !S !I !K !G
>>> [0.161992] #PF error code(0014):  !P !W +U !S +I !K !G
>>> [0.162450] #PF error code(0011):  +P !W !U !S +I !K !G
>>> [0.162667] #PF error code(8001):  +P !W !U !S !I !K +G
>>> [0.162667] #PF error code(8003):  +P +W !U !S !I !K +G
>>> [0.162667] #PF error code(8007):  +P +W +U !S !I !K +G
>>> [0.162667] #PF error code(8025):  +P !W +U !S !I +K +G
>>> [0.162667] #PF error code(8002):  !P +W !U !S !I !K +G
>>> [0.162667] #PF error code(8004):  !P !W +U !S !I !K +G
>>> [0.162667] #PF error code(8006):  !P +W +U !S !I !K +G
>>> [0.162667] #PF error code(8014):  !P !W +U !S +I !K +G
>>> [0.162667] #PF error code(8011):  +P !W !U !S +I !K +G
>>> [0.162667] #PF error code():  !P !W !U !S !I !K !G
>>> 
>> 
>> Please don’t. The whole reason I added the decoding was to make it easy 
>> to read without a cheat sheet. This is incomprehensible without 
>> reference to the code, and I’m familiar with it to begin with.
> 
> Dunno, I can deduct the meaning from the above abbreviations without a 
> cheat sheet and I'm sure you'll be able to too from now on. All the 
> letters are very obvious references - to me at least, and brevity and 
> predictable, fixed-length output matters.
> 
>> How about:
>> 
>> #PF error code: 0001 [PROT read kernel]
>> 
>> #PF error code: 0001 [PROT WRITE kernel]
>> 
>> #PF error code: 0001 [PROT read kernel]
>> 
>> #PF error code: 8011 [PROT INSTR kernel SGX]
>> 
>> This has no noise from unset bits except that we add lowercase “read” 
>> or “kernel” as appropriate.  Even “kernel” seems barely necessary.
> 
> The thing is, the 'noise' from unset bits is actually important 
> information as well - at least for the major bits: it was a mostly random 
> choice that Intel defined '1' for write access and not for read access. 
> 
> 

That’s why I suggested “read,” in lowercase, for reads.  Other than that, most 
of the unset bits are uninteresting. An OOPS is so likely to be a kernel fault 
that it’s barely worth mentioning, and I even added a whole separate diagnostic 
for user oopses.  Similarly, I don’t think we need to remind the reader that an 
oops wasn’t an SGX error or that it wasn’t a PK error.  So I think my idea 
highlights the interesting bits and avoids distraction from the uninteresting 
bits.

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski



> On Dec 6, 2018, at 8:47 AM, Ingo Molnar  wrote:
> 
> 
> * Andy Lutomirski  wrote:
> 
>>> vs. (with SGX added as 'G' for testing purposes)
>>> 
>>> [0.158849] #PF error code(0001):  +P !W !U !S !I !K !G
>>> [0.159292] #PF error code(0003):  +P +W !U !S !I !K !G
>>> [0.159742] #PF error code(0007):  +P +W +U !S !I !K !G
>>> [0.160190] #PF error code(0025):  +P !W +U !S !I +K !G
>>> [0.160638] #PF error code(0002):  !P +W !U !S !I !K !G
>>> [0.161087] #PF error code(0004):  !P !W +U !S !I !K !G
>>> [0.161538] #PF error code(0006):  !P +W +U !S !I !K !G
>>> [0.161992] #PF error code(0014):  !P !W +U !S +I !K !G
>>> [0.162450] #PF error code(0011):  +P !W !U !S +I !K !G
>>> [0.162667] #PF error code(8001):  +P !W !U !S !I !K +G
>>> [0.162667] #PF error code(8003):  +P +W !U !S !I !K +G
>>> [0.162667] #PF error code(8007):  +P +W +U !S !I !K +G
>>> [0.162667] #PF error code(8025):  +P !W +U !S !I +K +G
>>> [0.162667] #PF error code(8002):  !P +W !U !S !I !K +G
>>> [0.162667] #PF error code(8004):  !P !W +U !S !I !K +G
>>> [0.162667] #PF error code(8006):  !P +W +U !S !I !K +G
>>> [0.162667] #PF error code(8014):  !P !W +U !S +I !K +G
>>> [0.162667] #PF error code(8011):  +P !W !U !S +I !K +G
>>> [0.162667] #PF error code():  !P !W !U !S !I !K !G
>>> 
>> 
>> Please don’t. The whole reason I added the decoding was to make it easy 
>> to read without a cheat sheet. This is incomprehensible without 
>> reference to the code, and I’m familiar with it to begin with.
> 
> Dunno, I can deduct the meaning from the above abbreviations without a 
> cheat sheet and I'm sure you'll be able to too from now on. All the 
> letters are very obvious references - to me at least, and brevity and 
> predictable, fixed-length output matters.
> 
>> How about:
>> 
>> #PF error code: 0001 [PROT read kernel]
>> 
>> #PF error code: 0001 [PROT WRITE kernel]
>> 
>> #PF error code: 0001 [PROT read kernel]
>> 
>> #PF error code: 8011 [PROT INSTR kernel SGX]
>> 
>> This has no noise from unset bits except that we add lowercase “read” 
>> or “kernel” as appropriate.  Even “kernel” seems barely necessary.
> 
> The thing is, the 'noise' from unset bits is actually important 
> information as well - at least for the major bits: it was a mostly random 
> choice that Intel defined '1' for write access and not for read access. 
> 
> 

That’s why I suggested “read,” in lowercase, for reads.  Other than that, most 
of the unset bits are uninteresting. An OOPS is so likely to be a kernel fault 
that it’s barely worth mentioning, and I even added a whole separate diagnostic 
for user oopses.  Similarly, I don’t think we need to remind the reader that an 
oops wasn’t an SGX error or that it wasn’t a PK error.  So I think my idea 
highlights the interesting bits and avoids distraction from the uninteresting 
bits.

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> > vs. (with SGX added as 'G' for testing purposes)
> > 
> > [0.158849] #PF error code(0001):  +P !W !U !S !I !K !G
> > [0.159292] #PF error code(0003):  +P +W !U !S !I !K !G
> > [0.159742] #PF error code(0007):  +P +W +U !S !I !K !G
> > [0.160190] #PF error code(0025):  +P !W +U !S !I +K !G
> > [0.160638] #PF error code(0002):  !P +W !U !S !I !K !G
> > [0.161087] #PF error code(0004):  !P !W +U !S !I !K !G
> > [0.161538] #PF error code(0006):  !P +W +U !S !I !K !G
> > [0.161992] #PF error code(0014):  !P !W +U !S +I !K !G
> > [0.162450] #PF error code(0011):  +P !W !U !S +I !K !G
> > [0.162667] #PF error code(8001):  +P !W !U !S !I !K +G
> > [0.162667] #PF error code(8003):  +P +W !U !S !I !K +G
> > [0.162667] #PF error code(8007):  +P +W +U !S !I !K +G
> > [0.162667] #PF error code(8025):  +P !W +U !S !I +K +G
> > [0.162667] #PF error code(8002):  !P +W !U !S !I !K +G
> > [0.162667] #PF error code(8004):  !P !W +U !S !I !K +G
> > [0.162667] #PF error code(8006):  !P +W +U !S !I !K +G
> > [0.162667] #PF error code(8014):  !P !W +U !S +I !K +G
> > [0.162667] #PF error code(8011):  +P !W !U !S +I !K +G
> > [0.162667] #PF error code():  !P !W !U !S !I !K !G
> > 
> 
> Please don’t. The whole reason I added the decoding was to make it easy 
> to read without a cheat sheet. This is incomprehensible without 
> reference to the code, and I’m familiar with it to begin with.

Dunno, I can deduct the meaning from the above abbreviations without a 
cheat sheet and I'm sure you'll be able to too from now on. All the 
letters are very obvious references - to me at least, and brevity and 
predictable, fixed-length output matters.

> How about:
> 
> #PF error code: 0001 [PROT read kernel]
> 
> #PF error code: 0001 [PROT WRITE kernel]
> 
> #PF error code: 0001 [PROT read kernel]
> 
> #PF error code: 8011 [PROT INSTR kernel SGX]
> 
> This has no noise from unset bits except that we add lowercase “read” 
> or “kernel” as appropriate.  Even “kernel” seems barely necessary.

The thing is, the 'noise' from unset bits is actually important 
information as well - at least for the major bits: it was a mostly random 
choice that Intel defined '1' for write access and not for read access. 

Thanks,

Ingo


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Andy Lutomirski  wrote:

> > vs. (with SGX added as 'G' for testing purposes)
> > 
> > [0.158849] #PF error code(0001):  +P !W !U !S !I !K !G
> > [0.159292] #PF error code(0003):  +P +W !U !S !I !K !G
> > [0.159742] #PF error code(0007):  +P +W +U !S !I !K !G
> > [0.160190] #PF error code(0025):  +P !W +U !S !I +K !G
> > [0.160638] #PF error code(0002):  !P +W !U !S !I !K !G
> > [0.161087] #PF error code(0004):  !P !W +U !S !I !K !G
> > [0.161538] #PF error code(0006):  !P +W +U !S !I !K !G
> > [0.161992] #PF error code(0014):  !P !W +U !S +I !K !G
> > [0.162450] #PF error code(0011):  +P !W !U !S +I !K !G
> > [0.162667] #PF error code(8001):  +P !W !U !S !I !K +G
> > [0.162667] #PF error code(8003):  +P +W !U !S !I !K +G
> > [0.162667] #PF error code(8007):  +P +W +U !S !I !K +G
> > [0.162667] #PF error code(8025):  +P !W +U !S !I +K +G
> > [0.162667] #PF error code(8002):  !P +W !U !S !I !K +G
> > [0.162667] #PF error code(8004):  !P !W +U !S !I !K +G
> > [0.162667] #PF error code(8006):  !P +W +U !S !I !K +G
> > [0.162667] #PF error code(8014):  !P !W +U !S +I !K +G
> > [0.162667] #PF error code(8011):  +P !W !U !S +I !K +G
> > [0.162667] #PF error code():  !P !W !U !S !I !K !G
> > 
> 
> Please don’t. The whole reason I added the decoding was to make it easy 
> to read without a cheat sheet. This is incomprehensible without 
> reference to the code, and I’m familiar with it to begin with.

Dunno, I can deduct the meaning from the above abbreviations without a 
cheat sheet and I'm sure you'll be able to too from now on. All the 
letters are very obvious references - to me at least, and brevity and 
predictable, fixed-length output matters.

> How about:
> 
> #PF error code: 0001 [PROT read kernel]
> 
> #PF error code: 0001 [PROT WRITE kernel]
> 
> #PF error code: 0001 [PROT read kernel]
> 
> #PF error code: 8011 [PROT INSTR kernel SGX]
> 
> This has no noise from unset bits except that we add lowercase “read” 
> or “kernel” as appropriate.  Even “kernel” seems barely necessary.

The thing is, the 'noise' from unset bits is actually important 
information as well - at least for the major bits: it was a mostly random 
choice that Intel defined '1' for write access and not for read access. 

Thanks,

Ingo


Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski




> On Dec 6, 2018, at 7:53 AM, Sean Christopherson 
>  wrote:
> 
>> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
>> I like your '!' idea, but with a further simplification: how about using 
>> '-/+' differentiation and a single character and a fixed-length message.
>> 
>> The new output will be lines of:
>> 
>>  #PF error code: -P -W -U -S -I -K (0x00)
>>  ...
>>  #PF error code: -P -W +U +S -I -K (0x0c)
>>  ...
>>  #PF error code: +P +W +U +S +I +K (0x3f)
>> 
>> The symbol abbreviations are pretty self-explanatory:
>> 
>>  P = protection fault   (X86_PF_PROT)
>>  W = write access   (X86_PF_WRITE)
>>  U = user-mode access   (X86_PF_USER)
>>  S = supervisor mode(X86_PF_RSVD)
>>  I = instruction fault  (X86_PF_INSTR)
>>  K = keys fault (X86_PF_PK)
>> 
>> Misc notes:
>> 
>> - In principle the new text is now short enough to include it in one of 
>>  the existing output lines, further shortening the oops output - but I
>>  havent done that in this patch.
>> 
>> - Another question is the ordering of the bits: the symbolic display is 
>>  actually big endian, while the numeric hexa printout is little endian.
>> 
>>  I kind of still like it that way, not just because the decoding loop is 
>>  more natural, but because the bits are actually ordered by importance: 
>>  the PROT bits is more important than the INSTR or the PK bits - and the 
>>  more important bits are displayed first.
> 
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line.  Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
> 
>> - Only build-tested the patch and looked at the generated assembly, but 
>>  it all looks sane enough so will obviously work just fine! ;-)
> 
> ...
> 
>> /*
>> - * This helper function transforms the #PF error_code bits into
>> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
>> + * This maps the somewhat obscure error_code number to symbolic text:
>> + *
>> + * P = protection fault   (X86_PF_PROT)
>> + * W = write access   (X86_PF_WRITE)
>> + * U = user-mode access   (X86_PF_USER)
>> + * S = supervisor mode(X86_PF_RSVD)
>> + * I = instruction fault  (X86_PF_INSTR)
>> + * K = keys fault (X86_PF_PK)
>>  */
>> -static void err_str_append(unsigned long error_code, char *buf, unsigned 
>> long mask, const char *txt)
>> +static const char error_code_chars[] = "PWUSIK";
>> +
>> +/*
>> + * This helper function transforms the #PF error_code bits into " +P -W +U 
>> -R -I -K"
>> + * type of descriptive, almost human-readable error strings:
>> + */
>> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
> 
> No need for @regs.
> 
>> {
>> - if (error_code & mask) {
>> - if (buf[0])
>> - strcat(buf, " ");
>> - strcat(buf, txt);
>> + unsigned int bit, mask;
>> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at 
>> the end */
> 
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
> 
> Hardcoding "6" is also a bit painful.
> 
>> +
>> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
>> +
>> + for (bit = 0; bit < 6; bit++) {
>> + unsigned int offset = bit*3;
>> +
>> + err_txt[offset+0] = ' ';
>> +
>> + mask = 1 << bit;
>> + if (error_code & mask)
>> + err_txt[offset+1] = '+';
>> + else
>> + err_txt[offset+1] = '-';
> 
> To me, using '!' contrasts better when side-by-side with '+'.
> 
>> +
>> + err_txt[offset+2] = error_code_chars[bit];
>>  }
>> +
>> + /* Close the string: */
>> + err_txt[sizeof(err_txt)-1] = 0;
>> +
>> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
> 
> The changelog example has a leading "0x" on the error code.  That being
> said, I actually like it without the "0x".
> 
> How about printing the raw value before the colon?  Having it at the end
> makes it look like extra noise.  And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
> 
> SGX will also break the two digit printing.  And for whatever reason four
> digits makes me think "this is an error code!".  IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
> 
> E.g.
> 
> [0.144247] #PF error code:  +P -W -U -S -I -K (01)
> [0.144411] #PF error code:  +P +W -U -S -I -K (03)
> [0.144826] #PF error code:  +P +W +U -S -I -K (07)
> [0.145252] #PF error code:  +P -W +U -S -I +K (25)
> [0.145706] #PF error 

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Andy Lutomirski




> On Dec 6, 2018, at 7:53 AM, Sean Christopherson 
>  wrote:
> 
>> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
>> I like your '!' idea, but with a further simplification: how about using 
>> '-/+' differentiation and a single character and a fixed-length message.
>> 
>> The new output will be lines of:
>> 
>>  #PF error code: -P -W -U -S -I -K (0x00)
>>  ...
>>  #PF error code: -P -W +U +S -I -K (0x0c)
>>  ...
>>  #PF error code: +P +W +U +S +I +K (0x3f)
>> 
>> The symbol abbreviations are pretty self-explanatory:
>> 
>>  P = protection fault   (X86_PF_PROT)
>>  W = write access   (X86_PF_WRITE)
>>  U = user-mode access   (X86_PF_USER)
>>  S = supervisor mode(X86_PF_RSVD)
>>  I = instruction fault  (X86_PF_INSTR)
>>  K = keys fault (X86_PF_PK)
>> 
>> Misc notes:
>> 
>> - In principle the new text is now short enough to include it in one of 
>>  the existing output lines, further shortening the oops output - but I
>>  havent done that in this patch.
>> 
>> - Another question is the ordering of the bits: the symbolic display is 
>>  actually big endian, while the numeric hexa printout is little endian.
>> 
>>  I kind of still like it that way, not just because the decoding loop is 
>>  more natural, but because the bits are actually ordered by importance: 
>>  the PROT bits is more important than the INSTR or the PK bits - and the 
>>  more important bits are displayed first.
> 
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line.  Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
> 
>> - Only build-tested the patch and looked at the generated assembly, but 
>>  it all looks sane enough so will obviously work just fine! ;-)
> 
> ...
> 
>> /*
>> - * This helper function transforms the #PF error_code bits into
>> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
>> + * This maps the somewhat obscure error_code number to symbolic text:
>> + *
>> + * P = protection fault   (X86_PF_PROT)
>> + * W = write access   (X86_PF_WRITE)
>> + * U = user-mode access   (X86_PF_USER)
>> + * S = supervisor mode(X86_PF_RSVD)
>> + * I = instruction fault  (X86_PF_INSTR)
>> + * K = keys fault (X86_PF_PK)
>>  */
>> -static void err_str_append(unsigned long error_code, char *buf, unsigned 
>> long mask, const char *txt)
>> +static const char error_code_chars[] = "PWUSIK";
>> +
>> +/*
>> + * This helper function transforms the #PF error_code bits into " +P -W +U 
>> -R -I -K"
>> + * type of descriptive, almost human-readable error strings:
>> + */
>> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
> 
> No need for @regs.
> 
>> {
>> - if (error_code & mask) {
>> - if (buf[0])
>> - strcat(buf, " ");
>> - strcat(buf, txt);
>> + unsigned int bit, mask;
>> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at 
>> the end */
> 
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
> 
> Hardcoding "6" is also a bit painful.
> 
>> +
>> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
>> +
>> + for (bit = 0; bit < 6; bit++) {
>> + unsigned int offset = bit*3;
>> +
>> + err_txt[offset+0] = ' ';
>> +
>> + mask = 1 << bit;
>> + if (error_code & mask)
>> + err_txt[offset+1] = '+';
>> + else
>> + err_txt[offset+1] = '-';
> 
> To me, using '!' contrasts better when side-by-side with '+'.
> 
>> +
>> + err_txt[offset+2] = error_code_chars[bit];
>>  }
>> +
>> + /* Close the string: */
>> + err_txt[sizeof(err_txt)-1] = 0;
>> +
>> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
> 
> The changelog example has a leading "0x" on the error code.  That being
> said, I actually like it without the "0x".
> 
> How about printing the raw value before the colon?  Having it at the end
> makes it look like extra noise.  And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
> 
> SGX will also break the two digit printing.  And for whatever reason four
> digits makes me think "this is an error code!".  IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
> 
> E.g.
> 
> [0.144247] #PF error code:  +P -W -U -S -I -K (01)
> [0.144411] #PF error code:  +P +W -U -S -I -K (03)
> [0.144826] #PF error code:  +P +W +U -S -I -K (07)
> [0.145252] #PF error code:  +P -W +U -S -I +K (25)
> [0.145706] #PF error 

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Sean Christopherson  wrote:

> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> > I like your '!' idea, but with a further simplification: how about using 
> > '-/+' differentiation and a single character and a fixed-length message.
> > 
> > The new output will be lines of:
> > 
> >   #PF error code: -P -W -U -S -I -K (0x00)
> >   ...
> >   #PF error code: -P -W +U +S -I -K (0x0c)
> >   ...
> >   #PF error code: +P +W +U +S +I +K (0x3f)
> > 
> > The symbol abbreviations are pretty self-explanatory:
> > 
> >   P = protection fault   (X86_PF_PROT)
> >   W = write access   (X86_PF_WRITE)
> >   U = user-mode access   (X86_PF_USER)
> >   S = supervisor mode(X86_PF_RSVD)
> >   I = instruction fault  (X86_PF_INSTR)
> >   K = keys fault (X86_PF_PK)
> > 
> > Misc notes:
> > 
> > - In principle the new text is now short enough to include it in one of 
> >   the existing output lines, further shortening the oops output - but I
> >   havent done that in this patch.
> > 
> > - Another question is the ordering of the bits: the symbolic display is 
> >   actually big endian, while the numeric hexa printout is little endian.
> > 
> >   I kind of still like it that way, not just because the decoding loop is 
> >   more natural, but because the bits are actually ordered by importance: 
> >   the PROT bits is more important than the INSTR or the PK bits - and the 
> >   more important bits are displayed first.
> 
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line.  Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
> 
> > - Only build-tested the patch and looked at the generated assembly, but 
> >   it all looks sane enough so will obviously work just fine! ;-)
> 
> ...
> 
> >  /*
> > - * This helper function transforms the #PF error_code bits into
> > - * "[PROT] [USER]" type of descriptive, almost human-readable error 
> > strings:
> > + * This maps the somewhat obscure error_code number to symbolic text:
> > + *
> > + * P = protection fault   (X86_PF_PROT)
> > + * W = write access   (X86_PF_WRITE)
> > + * U = user-mode access   (X86_PF_USER)
> > + * S = supervisor mode(X86_PF_RSVD)
> > + * I = instruction fault  (X86_PF_INSTR)
> > + * K = keys fault (X86_PF_PK)
> >   */
> > -static void err_str_append(unsigned long error_code, char *buf, unsigned 
> > long mask, const char *txt)
> > +static const char error_code_chars[] = "PWUSIK";
> > +
> > +/*
> > + * This helper function transforms the #PF error_code bits into " +P -W +U 
> > -R -I -K"
> > + * type of descriptive, almost human-readable error strings:
> > + */
> > +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
> 
> No need for @regs.
> 
> >  {
> > - if (error_code & mask) {
> > - if (buf[0])
> > - strcat(buf, " ");
> > - strcat(buf, txt);
> > + unsigned int bit, mask;
> > + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at 
> > the end */
> 
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
> 
> Hardcoding "6" is also a bit painful.
> 
> > +
> > + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> > +
> > + for (bit = 0; bit < 6; bit++) {
> > + unsigned int offset = bit*3;
> > +
> > + err_txt[offset+0] = ' ';
> > +
> > + mask = 1 << bit;
> > + if (error_code & mask)
> > + err_txt[offset+1] = '+';
> > + else
> > + err_txt[offset+1] = '-';
> 
> To me, using '!' contrasts better when side-by-side with '+'.
> 
> > +
> > + err_txt[offset+2] = error_code_chars[bit];
> >   }
> > +
> > + /* Close the string: */
> > + err_txt[sizeof(err_txt)-1] = 0;
> > +
> > + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
> 
> The changelog example has a leading "0x" on the error code.  That being
> said, I actually like it without the "0x".
> 
> How about printing the raw value before the colon?  Having it at the end
> makes it look like extra noise.  And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
> 
> SGX will also break the two digit printing.  And for whatever reason four
> digits makes me think "this is an error code!".  IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
> 
> E.g.
> 
> [0.144247] #PF error code:  +P -W -U -S -I -K (01)
> [0.144411] #PF error code:  +P +W -U -S -I -K (03)
> [0.144826] #PF error code:  +P +W +U -S -I -K (07)
> [

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Ingo Molnar


* Sean Christopherson  wrote:

> On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> > I like your '!' idea, but with a further simplification: how about using 
> > '-/+' differentiation and a single character and a fixed-length message.
> > 
> > The new output will be lines of:
> > 
> >   #PF error code: -P -W -U -S -I -K (0x00)
> >   ...
> >   #PF error code: -P -W +U +S -I -K (0x0c)
> >   ...
> >   #PF error code: +P +W +U +S +I +K (0x3f)
> > 
> > The symbol abbreviations are pretty self-explanatory:
> > 
> >   P = protection fault   (X86_PF_PROT)
> >   W = write access   (X86_PF_WRITE)
> >   U = user-mode access   (X86_PF_USER)
> >   S = supervisor mode(X86_PF_RSVD)
> >   I = instruction fault  (X86_PF_INSTR)
> >   K = keys fault (X86_PF_PK)
> > 
> > Misc notes:
> > 
> > - In principle the new text is now short enough to include it in one of 
> >   the existing output lines, further shortening the oops output - but I
> >   havent done that in this patch.
> > 
> > - Another question is the ordering of the bits: the symbolic display is 
> >   actually big endian, while the numeric hexa printout is little endian.
> > 
> >   I kind of still like it that way, not just because the decoding loop is 
> >   more natural, but because the bits are actually ordered by importance: 
> >   the PROT bits is more important than the INSTR or the PK bits - and the 
> >   more important bits are displayed first.
> 
> Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
> be the last thing makes it stand out more than being buried in the middle
> of the line.  Inverting the ordering between raw and decoded also makes
> it very difficult to correlate the raw value with each bit.
> 
> > - Only build-tested the patch and looked at the generated assembly, but 
> >   it all looks sane enough so will obviously work just fine! ;-)
> 
> ...
> 
> >  /*
> > - * This helper function transforms the #PF error_code bits into
> > - * "[PROT] [USER]" type of descriptive, almost human-readable error 
> > strings:
> > + * This maps the somewhat obscure error_code number to symbolic text:
> > + *
> > + * P = protection fault   (X86_PF_PROT)
> > + * W = write access   (X86_PF_WRITE)
> > + * U = user-mode access   (X86_PF_USER)
> > + * S = supervisor mode(X86_PF_RSVD)
> > + * I = instruction fault  (X86_PF_INSTR)
> > + * K = keys fault (X86_PF_PK)
> >   */
> > -static void err_str_append(unsigned long error_code, char *buf, unsigned 
> > long mask, const char *txt)
> > +static const char error_code_chars[] = "PWUSIK";
> > +
> > +/*
> > + * This helper function transforms the #PF error_code bits into " +P -W +U 
> > -R -I -K"
> > + * type of descriptive, almost human-readable error strings:
> > + */
> > +static void show_error_code(struct pt_regs *regs, unsigned long error_code)
> 
> No need for @regs.
> 
> >  {
> > - if (error_code & mask) {
> > - if (buf[0])
> > - strcat(buf, " ");
> > - strcat(buf, txt);
> > + unsigned int bit, mask;
> > + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at 
> > the end */
> 
> Assuming the error code bits are contiguous breaks if/when SGX gets added,
> which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
> it might be nice to be able to reuse show_error_code in other places.
> 
> Hardcoding "6" is also a bit painful.
> 
> > +
> > + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> > +
> > + for (bit = 0; bit < 6; bit++) {
> > + unsigned int offset = bit*3;
> > +
> > + err_txt[offset+0] = ' ';
> > +
> > + mask = 1 << bit;
> > + if (error_code & mask)
> > + err_txt[offset+1] = '+';
> > + else
> > + err_txt[offset+1] = '-';
> 
> To me, using '!' contrasts better when side-by-side with '+'.
> 
> > +
> > + err_txt[offset+2] = error_code_chars[bit];
> >   }
> > +
> > + /* Close the string: */
> > + err_txt[sizeof(err_txt)-1] = 0;
> > +
> > + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);
> 
> The changelog example has a leading "0x" on the error code.  That being
> said, I actually like it without the "0x".
> 
> How about printing the raw value before the colon?  Having it at the end
> makes it look like extra noise.  And for me, seeing the raw code first
> (reading left to right) cue's my brain that it's about to decode some
> bits.
> 
> SGX will also break the two digit printing.  And for whatever reason four
> digits makes me think "this is an error code!".  IIRC the vectoring ucode
> makes a lot of assumptions about the error code being at most 16 bits, so
> in theory four digits is all we'll ever need.
> 
> E.g.
> 
> [0.144247] #PF error code:  +P -W -U -S -I -K (01)
> [0.144411] #PF error code:  +P +W -U -S -I -K (03)
> [0.144826] #PF error code:  +P +W +U -S -I -K (07)
> [

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Sean Christopherson
On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> I like your '!' idea, but with a further simplification: how about using 
> '-/+' differentiation and a single character and a fixed-length message.
> 
> The new output will be lines of:
> 
>   #PF error code: -P -W -U -S -I -K (0x00)
>   ...
>   #PF error code: -P -W +U +S -I -K (0x0c)
>   ...
>   #PF error code: +P +W +U +S +I +K (0x3f)
> 
> The symbol abbreviations are pretty self-explanatory:
> 
>   P = protection fault   (X86_PF_PROT)
>   W = write access   (X86_PF_WRITE)
>   U = user-mode access   (X86_PF_USER)
>   S = supervisor mode(X86_PF_RSVD)
>   I = instruction fault  (X86_PF_INSTR)
>   K = keys fault (X86_PF_PK)
> 
> Misc notes:
> 
> - In principle the new text is now short enough to include it in one of 
>   the existing output lines, further shortening the oops output - but I
>   havent done that in this patch.
> 
> - Another question is the ordering of the bits: the symbolic display is 
>   actually big endian, while the numeric hexa printout is little endian.
> 
>   I kind of still like it that way, not just because the decoding loop is 
>   more natural, but because the bits are actually ordered by importance: 
>   the PROT bits is more important than the INSTR or the PK bits - and the 
>   more important bits are displayed first.

Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
be the last thing makes it stand out more than being buried in the middle
of the line.  Inverting the ordering between raw and decoded also makes
it very difficult to correlate the raw value with each bit.

> - Only build-tested the patch and looked at the generated assembly, but 
>   it all looks sane enough so will obviously work just fine! ;-)

...

>  /*
> - * This helper function transforms the #PF error_code bits into
> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
> + * This maps the somewhat obscure error_code number to symbolic text:
> + *
> + * P = protection fault   (X86_PF_PROT)
> + * W = write access   (X86_PF_WRITE)
> + * U = user-mode access   (X86_PF_USER)
> + * S = supervisor mode(X86_PF_RSVD)
> + * I = instruction fault  (X86_PF_INSTR)
> + * K = keys fault (X86_PF_PK)
>   */
> -static void err_str_append(unsigned long error_code, char *buf, unsigned 
> long mask, const char *txt)
> +static const char error_code_chars[] = "PWUSIK";
> +
> +/*
> + * This helper function transforms the #PF error_code bits into " +P -W +U 
> -R -I -K"
> + * type of descriptive, almost human-readable error strings:
> + */
> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)

No need for @regs.

>  {
> - if (error_code & mask) {
> - if (buf[0])
> - strcat(buf, " ");
> - strcat(buf, txt);
> + unsigned int bit, mask;
> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the 
> end */

Assuming the error code bits are contiguous breaks if/when SGX gets added,
which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
it might be nice to be able to reuse show_error_code in other places.

Hardcoding "6" is also a bit painful.

> +
> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> +
> + for (bit = 0; bit < 6; bit++) {
> + unsigned int offset = bit*3;
> +
> + err_txt[offset+0] = ' ';
> +
> + mask = 1 << bit;
> + if (error_code & mask)
> + err_txt[offset+1] = '+';
> + else
> + err_txt[offset+1] = '-';

To me, using '!' contrasts better when side-by-side with '+'.

> +
> + err_txt[offset+2] = error_code_chars[bit];
>   }
> +
> + /* Close the string: */
> + err_txt[sizeof(err_txt)-1] = 0;
> +
> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);

The changelog example has a leading "0x" on the error code.  That being
said, I actually like it without the "0x".

How about printing the raw value before the colon?  Having it at the end
makes it look like extra noise.  And for me, seeing the raw code first
(reading left to right) cue's my brain that it's about to decode some
bits.

SGX will also break the two digit printing.  And for whatever reason four
digits makes me think "this is an error code!".  IIRC the vectoring ucode
makes a lot of assumptions about the error code being at most 16 bits, so
in theory four digits is all we'll ever need.

E.g.

[0.144247] #PF error code:  +P -W -U -S -I -K (01)
[0.144411] #PF error code:  +P +W -U -S -I -K (03)
[0.144826] #PF error code:  +P +W +U -S -I -K (07)
[0.145252] #PF error code:  +P -W +U -S -I +K (25)
[0.145706] #PF error code:  -P +W -U -S -I -K (02)
[0.146111] #PF error code:  -P -W +U -S -I -K (04)
[0.146521] #PF error code:  -P +W +U -S -I -K (06)
[0.146934] #PF error code:  -P -W +U -S +I -K (14)
[0.147348] #PF error 

Re: [PATCH] x86/mm/fault: Streamline the fault error_code decoder some more

2018-12-06 Thread Sean Christopherson
On Thu, Dec 06, 2018 at 08:34:09AM +0100, Ingo Molnar wrote:
> I like your '!' idea, but with a further simplification: how about using 
> '-/+' differentiation and a single character and a fixed-length message.
> 
> The new output will be lines of:
> 
>   #PF error code: -P -W -U -S -I -K (0x00)
>   ...
>   #PF error code: -P -W +U +S -I -K (0x0c)
>   ...
>   #PF error code: +P +W +U +S +I +K (0x3f)
> 
> The symbol abbreviations are pretty self-explanatory:
> 
>   P = protection fault   (X86_PF_PROT)
>   W = write access   (X86_PF_WRITE)
>   U = user-mode access   (X86_PF_USER)
>   S = supervisor mode(X86_PF_RSVD)
>   I = instruction fault  (X86_PF_INSTR)
>   K = keys fault (X86_PF_PK)
> 
> Misc notes:
> 
> - In principle the new text is now short enough to include it in one of 
>   the existing output lines, further shortening the oops output - but I
>   havent done that in this patch.
> 
> - Another question is the ordering of the bits: the symbolic display is 
>   actually big endian, while the numeric hexa printout is little endian.
> 
>   I kind of still like it that way, not just because the decoding loop is 
>   more natural, but because the bits are actually ordered by importance: 
>   the PROT bits is more important than the INSTR or the PK bits - and the 
>   more important bits are displayed first.

Hmm, my eyes tend to be drawn to the end of the line, e.g. having PROT
be the last thing makes it stand out more than being buried in the middle
of the line.  Inverting the ordering between raw and decoded also makes
it very difficult to correlate the raw value with each bit.

> - Only build-tested the patch and looked at the generated assembly, but 
>   it all looks sane enough so will obviously work just fine! ;-)

...

>  /*
> - * This helper function transforms the #PF error_code bits into
> - * "[PROT] [USER]" type of descriptive, almost human-readable error strings:
> + * This maps the somewhat obscure error_code number to symbolic text:
> + *
> + * P = protection fault   (X86_PF_PROT)
> + * W = write access   (X86_PF_WRITE)
> + * U = user-mode access   (X86_PF_USER)
> + * S = supervisor mode(X86_PF_RSVD)
> + * I = instruction fault  (X86_PF_INSTR)
> + * K = keys fault (X86_PF_PK)
>   */
> -static void err_str_append(unsigned long error_code, char *buf, unsigned 
> long mask, const char *txt)
> +static const char error_code_chars[] = "PWUSIK";
> +
> +/*
> + * This helper function transforms the #PF error_code bits into " +P -W +U 
> -R -I -K"
> + * type of descriptive, almost human-readable error strings:
> + */
> +static void show_error_code(struct pt_regs *regs, unsigned long error_code)

No need for @regs.

>  {
> - if (error_code & mask) {
> - if (buf[0])
> - strcat(buf, " ");
> - strcat(buf, txt);
> + unsigned int bit, mask;
> + char err_txt[6*3+1]; /* Fixed length of 6 bits decoded plus zero at the 
> end */

Assuming the error code bits are contiguous breaks if/when SGX gets added,
which uses bit 15.  Oopsing on an SGX fault should be nigh impossible, but
it might be nice to be able to reuse show_error_code in other places.

Hardcoding "6" is also a bit painful.

> +
> + /* We go from the X86_PF_PROT bit to the X86_PF_PK bit: */
> +
> + for (bit = 0; bit < 6; bit++) {
> + unsigned int offset = bit*3;
> +
> + err_txt[offset+0] = ' ';
> +
> + mask = 1 << bit;
> + if (error_code & mask)
> + err_txt[offset+1] = '+';
> + else
> + err_txt[offset+1] = '-';

To me, using '!' contrasts better when side-by-side with '+'.

> +
> + err_txt[offset+2] = error_code_chars[bit];
>   }
> +
> + /* Close the string: */
> + err_txt[sizeof(err_txt)-1] = 0;
> +
> + pr_alert("#PF error code: %s (%02lx)\n", err_txt, error_code);

The changelog example has a leading "0x" on the error code.  That being
said, I actually like it without the "0x".

How about printing the raw value before the colon?  Having it at the end
makes it look like extra noise.  And for me, seeing the raw code first
(reading left to right) cue's my brain that it's about to decode some
bits.

SGX will also break the two digit printing.  And for whatever reason four
digits makes me think "this is an error code!".  IIRC the vectoring ucode
makes a lot of assumptions about the error code being at most 16 bits, so
in theory four digits is all we'll ever need.

E.g.

[0.144247] #PF error code:  +P -W -U -S -I -K (01)
[0.144411] #PF error code:  +P +W -U -S -I -K (03)
[0.144826] #PF error code:  +P +W +U -S -I -K (07)
[0.145252] #PF error code:  +P -W +U -S -I +K (25)
[0.145706] #PF error code:  -P +W -U -S -I -K (02)
[0.146111] #PF error code:  -P -W +U -S -I -K (04)
[0.146521] #PF error code:  -P +W +U -S -I -K (06)
[0.146934] #PF error code:  -P -W +U -S +I -K (14)
[0.147348] #PF error