On Mon, Apr 21, 2014 at 06:53:36PM -0700, Andrew Lutomirski wrote:
> On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin h...@zytor.com wrote:
>> Race condition (although with x86 being globally ordered, it probably can't
>> actually happen.) The bitmask is probably the way to go.
>
> Does the race matter? In the worst case you take the lock
> unnecessarily. But yes, the bitmask is
Just nitpicks below:

On Mon, Apr 21, 2014 at 03:47:52PM -0700, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We
On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote:
> I wonder if it would be workable to use a bit in the espfix PGD to
> denote that it has been initialized already... I hear, near NX there's
> some room :-)

Ok, I realized this won't work when I hit send... Oh well.

Anyway, another
On Tue, Apr 22, 2014 at 7:46 AM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote:
>> I wonder if it would be workable to use a bit in the espfix PGD to
>> denote that it has been initialized already... I hear, near NX there's
>> some room :-)
>
> Ok,
Honestly, guys... you're painting the bikeshed at the moment.
Initialization is the easiest bit of all this code. The tricky part is
*the rest of the code*, i.e. the stuff in entry_64.S.

Also, the code is butt-ugly at the moment. Aesthetics have not been
dealt with.

	-hpa
On Tue, Apr 22, 2014 at 9:10 AM, H. Peter Anvin h...@zytor.com wrote:
> Honestly, guys... you're painting the bikeshed at the moment.
> Initialization is the easiest bit of all this code. The tricky part is
> *the rest of the code*, i.e. the stuff in entry_64.S.

That's because the initialization
On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski aml...@gmail.com wrote:
> For the espfix_adjust_stack thing, when can it actually need to do
> anything? irqs should be off, I think, and MCE, NMI, and debug
> exceptions use ist, so that leaves just #SS and #GP, I think. How can
> those actually
On Tue, Apr 22, 2014 at 9:43 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
> On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski aml...@gmail.com wrote:
>> For the espfix_adjust_stack thing, when can it actually need to do
>> anything? irqs should be off, I think, and MCE, NMI, and debug
On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski aml...@gmail.com wrote:
> My point is that it may be safe to remove the special espfix fixup
> from #PF, which is probably the most performance-critical piece here,
> aside from iret itself.

Actually, even that is unsafe.

Why?

The segment table
On 04/22/2014 10:00 AM, Andrew Lutomirski wrote:
> Yes, you can very much trigger GP deliberately.

The way to do it is to just make an invalid segment descriptor on the
iret stack. Or make it a valid 16-bit one, but make it a code segment
for the stack pointer, or read-only, or whatever. All
On Tue, Apr 22, 2014 at 10:04 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
> On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski aml...@gmail.com wrote:
>> My point is that it may be safe to remove the special espfix fixup
>> from #PF, which is probably the most performance-critical piece
On 04/22/2014 10:11 AM, Andrew Lutomirski wrote:
> Anyway, if done correctly, this whole espfix should be totally free
> for normal processes, since it should only trigger if SS is a LDT
> entry (bit #2 set in the segment descriptor). So the normal fast-path
> should just have a simple test for that.
On 04/22/2014 10:04 AM, Linus Torvalds wrote:
> The segment table is shared for a process. So you can have one thread
> doing a load_ldt() that invalidates a segment, while another thread is
> busy taking a page fault. The segment was valid at page fault time and
> is saved on the kernel stack, but
On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote:
> Anyway, if done correctly, this whole espfix should be totally free
> for normal processes, since it should only trigger if SS is a LDT
> entry (bit #2 set in the segment descriptor). So the normal fast-path
> should just
On Tue, Apr 22, 2014 at 10:09 AM, H. Peter Anvin h...@zytor.com wrote:
> As for Andy's questions:
>
>> What happens on the IST entries? If I've read your patch right,
>> you're still switching back to the normal stack, which looks
>> questionable.
>
> No, in that case %rsp won't point into the espfix
On 04/22/2014 10:20 AM, Andrew Lutomirski wrote:
> It won't, given the above. I misunderstood what you were checking.
>
> It still seems to me that only #GP needs this special handling. The
> IST entries should never run on the espfix stack, and #MC, #DB, #NM,
> and #SS (I missed that one earlier)
On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote:
> The fastpath interference is:
>
> 1. Testing for an LDT SS selector before IRET. This is actually easier
>    than on 32 bits, because on 64 bits the SS:RSP on the stack is always valid.
> 2. Testing for an RSP inside the espfix region
On 04/22/2014 10:19 AM, Linus Torvalds wrote:
> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote:
>> Anyway, if done correctly, this whole espfix should be totally free
>> for normal processes, since it should only trigger if SS is a LDT
>> entry (bit #2 set in the segment
On Tue, Apr 22, 2014 at 10:26 AM, Borislav Petkov b...@alien8.de wrote:
> On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote:
>> The fastpath interference is:
>>
>> 1. Testing for an LDT SS selector before IRET. This is actually easier
>>    than on 32 bits, because on 64 bits the SS:RSP on the
On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin h...@zytor.com wrote:
> On 04/22/2014 10:19 AM, Linus Torvalds wrote:
>> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com wrote:
>>> Anyway, if done correctly, this whole espfix should be totally free
>>> for normal processes, since it
On 04/22/2014 10:46 AM, Andrew Lutomirski wrote:
> That is the whole impact of the IRET path.

If using IST for #GP won't cause trouble (ISTs don't nest, so we need to
make sure there is absolutely no way we could end up nested) then the
rest of the fixup code can go away and we kill the common
On Tue, Apr 22, 2014 at 1:46 PM, Andrew Lutomirski aml...@gmail.com wrote:
> On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin h...@zytor.com wrote:
>> On 04/22/2014 10:19 AM, Linus Torvalds wrote:
>>> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski aml...@gmail.com
>>> wrote:
>>>> Anyway, if done
On 04/22/2014 11:03 AM, Brian Gerst wrote:
> Maybe make the #GP handler check what the previous stack was at the start:
>
> 1) If we came from userspace, switch to the top of the process stack.
> 2) If the previous stack was not the espfix stack, switch back to that stack.
> 3) Switch to the top of
On Tue, Apr 22, 2014 at 2:06 PM, H. Peter Anvin h...@zytor.com wrote:
> On 04/22/2014 11:03 AM, Brian Gerst wrote:
>> Maybe make the #GP handler check what the previous stack was at the start:
>>
>> 1) If we came from userspace, switch to the top of the process stack.
>> 2) If the previous stack was not
On 04/22/2014 11:17 AM, Brian Gerst wrote:
> That is the entry condition that we have to deal with. The fact that
> the switch to the IST is unconditional is what makes ISTs hard to deal with.

Right, that is why you switch away from the IST as soon as possible,
copying the data that is already
On Tue, Apr 22, 2014 at 10:29:45AM -0700, Andrew Lutomirski wrote:
> Or we could add a TIF_NEEDS_ESPFIX that gets set once you have a
> 16-bit LDT entry.

Or something like that, yep.

But I think it makes sense to nail down everything else first. I
suspect that a single test-and-branch in the
On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote:
> On 04/22/2014 11:17 AM, Brian Gerst wrote:
>> That is the entry condition that we have to deal with. The fact that
>> the switch to the IST is unconditional is what makes ISTs hard to deal with.
>
> Right, that is why you switch
On 04/22/2014 12:55 PM, Brian Gerst wrote:
> On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote:
>> On 04/22/2014 11:17 AM, Brian Gerst wrote:
>>> That is the entry condition that we have to deal with. The fact that
>>> the switch to the IST is unconditional is what makes ISTs hard to
On Tue, Apr 22, 2014 at 4:17 PM, H. Peter Anvin h...@zytor.com wrote:
> On 04/22/2014 12:55 PM, Brian Gerst wrote:
>> On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin h...@zytor.com wrote:
>>> On 04/22/2014 11:17 AM, Brian Gerst wrote:
>>>> That is the entry condition that we have to deal with. The fact
>
> That simply will not work if you can take a #GP due to the safe MSR
> functions from NMI and #MC context, which would be my main concern.

At some point the IST entry functions subtracted 1K while the
handler ran to handle simple nesting cases.
Not sure that code is still there.

-Andi
On 04/22/2014 04:39 PM, Andi Kleen wrote:
>> That simply will not work if you can take a #GP due to the safe MSR
>> functions from NMI and #MC context, which would be my main concern.
>
> At some point the IST entry functions subtracted 1K while the
> handler ran to handle simple nesting cases.
> Not
Another spin of the prototype. This one avoids the espfix for anything
but #GP, and avoids saving/restoring registers... one can wonder,
though, how much that actually matters in practice.

It still does redundant SWAPGS on the slow path. I'm not sure I
personally care enough to optimize
On Tue, Apr 22, 2014 at 6:17 PM, H. Peter Anvin h...@linux.intel.com wrote:
> Another spin of the prototype. This one avoids the espfix for anything
> but #GP, and avoids saving/restoring registers... one can wonder,
> though, how much that actually matters in practice.
>
> It still does redundant
On 04/22/2014 06:23 PM, Andrew Lutomirski wrote:
> What's the to_dmesg thing for?

It's for debugging... the espfix page tables generate so many duplicate
entries that trying to output it via a seqfile runs out of memory. I
suspect we need to do something like skip the espfix range or some
Race condition (although with x86 being globally ordered, it probably can't
actually happen.) The bitmask is probably the way to go.
On April 21, 2014 6:28:12 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote:
>> I wanted to avoid the "another cpu made
On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote:
> I wanted to avoid the "another cpu made this allocation, now I have to free"
> crap, but I also didn't want to grab the lock if there was no work needed.
I guess you also want to avoid bouncing all these cachelines around on
boot on bit
I wanted to avoid the "another cpu made this allocation, now I have to free"
crap, but I also didn't want to grab the lock if there was no work needed.
On April 21, 2014 6:06:19 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote:
>> Well, if 2^17 CPUs are
On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote:
> Well, if 2^17 CPUs are allocated we might have 2K pages allocated. We could
> easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would
> reduce the code complexity.
>
Even simpler: just get rid of the check entirely.
Well, if 2^17 CPUs are allocated we might have 2K pages allocated. We could
easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would
reduce the code complexity.
On April 21, 2014 5:37:05 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote:
On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote:
> On 04/21/2014 04:19 PM, Andrew Lutomirski wrote:
>>
>> Hahaha! :)
>>
>> Some comments:
>>
>> Does returning to 64-bit CS with 16-bit SS not need espfix?
>
> There is no such thing. With a 64-bit CS, the flags on SS are ignored
> (although
On 04/21/2014 04:19 PM, Andrew Lutomirski wrote:
>
> Hahaha! :)
>
> Some comments:
>
> Does returning to 64-bit CS with 16-bit SS not need espfix?
There is no such thing. With a 64-bit CS, the flags on SS are ignored
(although you still have to have a non-null SS... the conditions are a
bit
On Mon, Apr 21, 2014 at 3:47 PM, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the