Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Borislav Petkov
On Mon, Apr 21, 2014 at 06:53:36PM -0700, Andrew Lutomirski wrote: On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin h...@zytor.com wrote: Race condition (although with x86 being globally ordered, it probably can't actually happen.) The bitmask is probably the way to go. Does the race

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Borislav Petkov
Just nitpicks below: On Mon, Apr 21, 2014 at 03:47:52PM -0700, H. Peter Anvin wrote: This is a prototype of espfix for the 64-bit kernel. espfix is a workaround for the architectural definition of IRET, which fails to restore bits [31:16] of %esp when returning to a 16-bit stack segment. We

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Borislav Petkov
On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote: I wonder if it would be workable to use a bit in the espfix PGD to denote that it has been initialized already... I hear, near NX there's some room :-) Ok, I realized this won't work when I hit send... Oh well. Anyway, another

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 7:46 AM, Borislav Petkov b...@alien8.de wrote: On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote: I wonder if it would be workable to use a bit in the espfix PGD to denote that it has been initialized already... I hear, near NX there's some room :-) Ok,

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
Honestly, guys... you're painting the bikeshed at the moment. Initialization is the easiest bit of all this code. The tricky part is *the rest of the code*, i.e. the stuff in entry_64.S. Also, the code is butt-ugly at the moment. Aestetics have not been dealt with. -hpa -- To

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 9:10 AM, H. Peter Anvin h...@zytor.com wrote: Honestly, guys... you're painting the bikeshed at the moment. Initialization is the easiest bit of all this code. The tricky part is *the rest of the code*, i.e. the stuff in entry_64.S. That's because the initialization

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Linus Torvalds
On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski aml...@gmail.com wrote: For the espfix_adjust_stack thing, when can it actually need to do anything? irqs should be off, I think, and MCE, NMI, and debug exceptions use ist, so that leaves just #SS and #GP, I think. How can those actually

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 9:43 AM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski aml...@gmail.com wrote: For the espfix_adjust_stack thing, when can it actually need to do anything? irqs should be off, I think, and MCE, NMI, and debug

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Linus Torvalds
On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski aml...@gmail.com wrote: My point is that it may be safe to remove the special espfix fixup from #PF, which is probably the most performance-critical piece here, aside from iret itself. Actually, even that is unsafe. Why? The segment table

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:00 AM, Andrew Lutomirski wrote: Yes, you can very much trigger GP deliberately. The way to do it is to just make an invalid segment descriptor on the iret stack. Or make it a valid 16-bit one, but make it a code segment for the stack pointer, or read-only, or whatever. All

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 10:04 AM, Linus Torvalds torva...@linux-foundation.org wrote: On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski aml...@gmail.com wrote: My point is that it may be safe to remove the special espfix fixup from #PF, which is probably the most performance-critical piece

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:11 AM, Andrew Lutomirski wrote: Anyway, if done correctly, this whole espfix should be totally free for normal processes, since it should only trigger if SS is a LDT entry (bit #2 set in the segment descriptor). So the normal fast-path should just have a simple test for that.

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:04 AM, Linus Torvalds wrote: The segment table is shared for a process. So you can have one thread doing a load_ldt() that invalidates a segment, while another thread is busy taking a page fault. The segment was valid at page fault time and is saved on the kernel stack, but

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Linus Torvalds
On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote: Anyway, if done correctly, this whole espfix should be totally free for normal processes, since it should only trigger if SS is a LDT entry (bit #2 set in the segment descriptor). So the normal fast-path should just

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 10:09 AM, H. Peter Anvin h...@zytor.com wrote: As for Andy's questions: What happens on the IST entries? If I've read your patch right, you're still switching back to the normal stack, which looks questionable. No, in that case %rsp won't point into the espfix

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:20 AM, Andrew Lutomirski wrote: It won't, given the above. I misunderstood what you were checking. It still seems to me that only #GP needs this special handling. The IST entries should never run on the espfix stack, and #MC, #DB, #NM, and #SS (I missed that one earlier)

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Borislav Petkov
On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote: The fastpath interference is: 1. Testing for an LDT SS selector before IRET. This is actually easier than on 32 bits, because on 64 bits the SS:RSP on the stack is always valid. 2. Testing for an RSP inside the espfix region

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:19 AM, Linus Torvalds wrote: On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote: Anyway, if done correctly, this whole espfix should be totally free for normal processes, since it should only trigger if SS is a LDT entry (bit #2 set in the segment

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 10:26 AM, Borislav Petkov b...@alien8.de wrote: On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote: The fastpath interference is: 1. Testing for an LDT SS selector before IRET. This is actually easier than on 32 bits, because on 64 bits the SS:RSP on the

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 10:19 AM, Linus Torvalds wrote: On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote: Anyway, if done correctly, this whole espfix should be totally free for normal processes, since it

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 10:46 AM, Andrew Lutomirski wrote: That is the whole impact of the IRET path. If using IST for #GP won't cause trouble (ISTs don't nest, so we need to make sure there is absolutely no way we could end up nested) then the rest of the fixup code can go away and we kill the common

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Brian Gerst
On Tue, Apr 22, 2014 at 1:46 PM, Andrew Lutomirski aml...@gmail.com wrote: On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 10:19 AM, Linus Torvalds wrote: On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote: Anyway, if done

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 11:03 AM, Brian Gerst wrote: Maybe make the #GP handler check what the previous stack was at the start: 1) If we came from userspace, switch to the top of the process stack. 2) If the previous stack was not the espfix stack, switch back to that stack. 3) Switch to the top of

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Brian Gerst
On Tue, Apr 22, 2014 at 2:06 PM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 11:03 AM, Brian Gerst wrote: Maybe make the #GP handler check what the previous stack was at the start: 1) If we came from userspace, switch to the top of the process stack. 2) If the previous stack was not

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 11:17 AM, Brian Gerst wrote: That is the entry condition that we have to deal with. The fact that the switch to the IST is unconditional is what makes ISTs hard to deal with. Right, that is why you switch away from the IST as soon as possible, copying the data that is already

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Borislav Petkov
On Tue, Apr 22, 2014 at 10:29:45AM -0700, Andrew Lutomirski wrote: Or we could add a TIF_NEEDS_ESPFIX that gets set once you have a 16-bit LDT entry. Or something like that, yep. But I think it makes sense to nail down everything else first. I suspect that a single test-and-branch in the

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Brian Gerst
On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 11:17 AM, Brian Gerst wrote: That is the entry condition that we have to deal with. The fact that the switch to the IST is unconditional is what makes ISTs hard to deal with. Right, that is why you switch

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 12:55 PM, Brian Gerst wrote: On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 11:17 AM, Brian Gerst wrote: That is the entry condition that we have to deal with. The fact that the switch to the IST is unconditional is what makes ISTs hard to

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Brian Gerst
On Tue, Apr 22, 2014 at 4:17 PM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 12:55 PM, Brian Gerst wrote: On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote: On 04/22/2014 11:17 AM, Brian Gerst wrote: That is the entry condition that we have to deal with. The fact

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andi Kleen
That simply will not work if you can take a #GP due to the safe MSR functions from NMI and #MC context, which would be my main concern. At some point the IST entry functions subtracted 1K while the handler ran to handle simple nesting cases. Not sure that code is still there. -Andi -- To

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 04:39 PM, Andi Kleen wrote: That simply will not work if you can take a #GP due to the safe MSR functions from NMI and #MC context, which would be my main concern. At some point the IST entry functions subtracted 1K while the handler ran to handle simple nesting cases. Not

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
Another spin of the prototype. This one avoids the espfix for anything but #GP, and avoids save/restore/saving registers... one can wonder, though, how much that actually matters in practice. It still does redundant SWAPGS on the slow path. I'm not sure I personally care enough to optimize

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread Andrew Lutomirski
On Tue, Apr 22, 2014 at 6:17 PM, H. Peter Anvin h...@linux.intel.com wrote: Another spin of the prototype. This one avoids the espfix for anything but #GP, and avoids save/restore/saving registers... one can wonder, though, how much that actually matters in practice. It still does redundant

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-22 Thread H. Peter Anvin
On 04/22/2014 06:23 PM, Andrew Lutomirski wrote: What's the to_dmesg thing for? It's for debugging... the espfix page tables generate so many duplicate entries that trying to output it via a seqfile runs out of memory. I suspect we need to do something like skip the espfix range or some

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin wrote: > Race condition (although with x86 being globally ordered, it probably can't > actually happen.) The bitmask is probably the way to go. Does the race matter? In the worst case you take the lock unnecessarily. But yes, the bitmask is

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
Race condition (although with x86 being globally ordered, it probably can't actually happen.) The bitmask is probably the way to go. On April 21, 2014 6:28:12 PM PDT, Andrew Lutomirski wrote: >On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote: >> I wanted to avoid the "another cpu made

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote: > I wanted to avoid the "another cpu made this allocation, now I have to free" > crap, but I also didn't want to grab the lock if there was no work needed. I guess you also want to avoid bouncing all these cachelines around on boot on bit

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
I wanted to avoid the "another cpu made this allocation, now I have to free" crap, but I also didn't want to grab the lock if there was no work needed. On April 21, 2014 6:06:19 PM PDT, Andrew Lutomirski wrote: >On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote: >> Well, if 2^17 CPUs are

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote: > Well, if 2^17 CPUs are allocated we might 2K pages allocated. We could > easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would > reduce the code complexity. > Even simpler: just get rid of the check entirely.

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
Well, if 2^17 CPUs are allocated we might 2K pages allocated. We could easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would reduce the code complexity. On April 21, 2014 5:37:05 PM PDT, Andrew Lutomirski wrote: >On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote:

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote: > On 04/21/2014 04:19 PM, Andrew Lutomirski wrote: >> >> Hahaha! :) >> >> Some comments: >> >> Does returning to 64-bit CS with 16-bit SS not need espfix? > > There is no such thing. With a 64-bit CS, the flags on SS are ignored > (although

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
On 04/21/2014 04:19 PM, Andrew Lutomirski wrote: > > Hahaha! :) > > Some comments: > > Does returning to 64-bit CS with 16-bit SS not need espfix? There is no such thing. With a 64-bit CS, the flags on SS are ignored (although you still have to have a non-null SS... the conditions are a bit

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 3:47 PM, H. Peter Anvin wrote: > This is a prototype of espfix for the 64-bit kernel. espfix is a > workaround for the architectural definition of IRET, which fails to > restore bits [31:16] of %esp when returning to a 16-bit stack > segment. We have a workaround for the

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 3:47 PM, H. Peter Anvin h...@linux.intel.com wrote: This is a prototype of espfix for the 64-bit kernel. espfix is a workaround for the architectural definition of IRET, which fails to restore bits [31:16] of %esp when returning to a 16-bit stack segment. We have a

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
On 04/21/2014 04:19 PM, Andrew Lutomirski wrote: Hahaha! :) Some comments: Does returning to 64-bit CS with 16-bit SS not need espfix? There is no such thing. With a 64-bit CS, the flags on SS are ignored (although you still have to have a non-null SS... the conditions are a bit

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin h...@zytor.com wrote: On 04/21/2014 04:19 PM, Andrew Lutomirski wrote: Hahaha! :) Some comments: Does returning to 64-bit CS with 16-bit SS not need espfix? There is no such thing. With a 64-bit CS, the flags on SS are ignored (although

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
Well, if 2^17 CPUs are allocated we might 2K pages allocated. We could easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would reduce the code complexity. On April 21, 2014 5:37:05 PM PDT, Andrew Lutomirski aml...@gmail.com wrote: On Mon, Apr 21, 2014 at 4:29 PM, H. Peter

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin h...@zytor.com wrote: Well, if 2^17 CPUs are allocated we might 2K pages allocated. We could easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would reduce the code complexity. Even simpler: just get rid of the check

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
I wanted to avoid the another cpu made this allocation, now I have to free crap, but I also didn't want to grab the lock if there was no work needed. On April 21, 2014 6:06:19 PM PDT, Andrew Lutomirski aml...@gmail.com wrote: On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin h...@zytor.com wrote:

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin h...@zytor.com wrote: I wanted to avoid the another cpu made this allocation, now I have to free crap, but I also didn't want to grab the lock if there was no work needed. I guess you also want to avoid bouncing all these cachelines around on

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread H. Peter Anvin
Race condition (although with x86 being globally ordered, it probably can't actually happen.) The bitmask is probably the way to go. On April 21, 2014 6:28:12 PM PDT, Andrew Lutomirski aml...@gmail.com wrote: On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin h...@zytor.com wrote: I wanted to

Re: [PATCH] x86-64: espfix for 64-bit mode *PROTOTYPE*

2014-04-21 Thread Andrew Lutomirski
On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin h...@zytor.com wrote: Race condition (although with x86 being globally ordered, it probably can't actually happen.) The bitmask is probably the way to go. Does the race matter? In the worst case you take the lock unnecessarily. But yes, the

<    1   2