On 04/28/2014 09:36 PM, H. Peter Anvin wrote:
>
> There are still things that need fixing: we need to go through the
> espfix path even when returning from NMI/MC (which fortunately can't
> nest with taking an NMI/MC on the espfix path itself, since in that case
> we will have been interrupted whi
On 04/28/2014 08:45 PM, H. Peter Anvin wrote:
>
> OK, so I found a bug in ldttest.c -- it sets CS to an LDT segment, but
> it never sets SS to an LDT segment. This means that it should really
> have zero footprint versus the espfix code, and implies that we instead
> have another bug involved. W
On 04/28/2014 07:38 PM, H. Peter Anvin wrote:
> On 04/28/2014 05:20 PM, Andrew Lutomirski wrote:
>>
>> ldttest segfaults on 3.13 and 3.14 for me. It reboots (triple fault?)
>> on your branch. It even said this:
>>
>> qemu-system-x86_64: 9pfs:virtfs_reset: One or more uncluncked fids
>> found duri
On 04/28/2014 05:20 PM, Andrew Lutomirski wrote:
>
> ldttest segfaults on 3.13 and 3.14 for me. It reboots (triple fault?)
> on your branch. It even said this:
>
> qemu-system-x86_64: 9pfs:virtfs_reset: One or more uncluncked fids
> found during reset
>
> I have no idea what an uncluncked fd i
On Mon, Apr 28, 2014 at 5:02 PM, Andrew Lutomirski wrote:
> On Mon, Apr 28, 2014 at 4:08 PM, H. Peter Anvin wrote:
>> On 04/28/2014 04:05 PM, H. Peter Anvin wrote:
>>>
>>> So I tried writing this bit up, but it fails in some rather spectacular
>>> ways. Furthermore, I have been unable to debug i
On 04/28/2014 05:02 PM, Andrew Lutomirski wrote:
>
> I'm compiling your branch. In the mean time, two possibly stupid questions:
>
> What's the assembly code in the double-fault entry for?
>
It was easier for me to add it there than adding all the glue
(prototypes and so on) to put it into C c
On Mon, Apr 28, 2014 at 4:08 PM, H. Peter Anvin wrote:
> On 04/28/2014 04:05 PM, H. Peter Anvin wrote:
>>
>> So I tried writing this bit up, but it fails in some rather spectacular
>> ways. Furthermore, I have been unable to debug it under Qemu, because
>> breakpoints don't work right (common Qem
On 04/28/2014 04:05 PM, H. Peter Anvin wrote:
>
> So I tried writing this bit up, but it fails in some rather spectacular
> ways. Furthermore, I have been unable to debug it under Qemu, because
> breakpoints don't work right (common Qemu problem, sadly.)
>
> The kernel code is at:
>
> https://g
On 04/23/2014 09:53 PM, Andrew Lutomirski wrote:
>
> This particular vector hurts: you can safely keep trying until it works.
>
> This just gave me an evil idea: what if we make the whole espfix area
> read-only? This has some weird effects. To switch to the espfix
> stack, you have to write to
On Wed, Apr 23, 2014 at 09:56:00AM -0700, H. Peter Anvin wrote:
> On 04/23/2014 07:24 AM, Boris Ostrovsky wrote:
> >>
> >> Konrad - I really could use some help figuring out what needs to be done
> >> for this not to break Xen.
> >
> > This does break Xen PV:
> >
>
> I know it does. This is why
On 04/25/2014 05:02 AM, Pavel Machek wrote:
>
> Just to understand the consequences -- we leak 16 bit of kernel data
> to the userspace, right? Because it is %esp, we know that we leak
> stack address, which is not too sensitive, but will make kernel
> address randomization less useful...?
>
It
On 04/25/2014 02:02 PM, Konrad Rzeszutek Wilk wrote:
>
> Any particular reason you are using __pgd
>
> __pud
>
> and __pmd?
>
> and __pte instead of the 'pgd', 'pud', 'pmd' and 'pte' macros?
>
Not that I know of other than that the semantics of the various macros
are not described anywhere to t
On Tue, Apr 22, 2014 at 06:17:21PM -0700, H. Peter Anvin wrote:
> Another spin of the prototype. This one avoids the espfix for anything
> but #GP, and avoids save/restore/saving registers... one can wonder,
> though, how much that actually matters in practice.
>
> It still does redundant SWAPGS
Hi!
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the 32-bit kernel, but that
> implementation doesn't wo
On Thu, Apr 24, 2014 at 3:37 PM, H. Peter Anvin wrote:
> On 04/24/2014 03:31 PM, Andrew Lutomirski wrote:
>>
>> I was imagining just randomizing a couple of high bits so the whole
>> espfix area moves as a unit.
>>
>>> We could XOR with a random constant with no penalty at all. Only
>>> problem i
On 04/24/2014 03:31 PM, Andrew Lutomirski wrote:
>
> I was imagining just randomizing a couple of high bits so the whole
> espfix area moves as a unit.
>
>> We could XOR with a random constant with no penalty at all. Only
>> problem is that this happens early, so the entropy system is not yet
>>
On Thu, Apr 24, 2014 at 3:24 PM, H. Peter Anvin wrote:
> On 04/23/2014 09:53 PM, Andrew Lutomirski wrote:
>>>
>>> - The user can put arbitrary data in registers before returning to the
>>> LDT in order to get it saved at a known address accessible from the
>>> kernel. With SMAP and KASLR this mig
On 04/23/2014 09:53 PM, Andrew Lutomirski wrote:
>>
>> - The user can put arbitrary data in registers before returning to the
>> LDT in order to get it saved at a known address accessible from the
>> kernel. With SMAP and KASLR this might otherwise be difficult.
>
> For one thing, this only matte
On Wed, Apr 23, 2014 at 9:13 PM, comex wrote:
> On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin wrote:
>> This is a prototype of espfix for the 64-bit kernel. espfix is a
>> workaround for the architectural definition of IRET, which fails to
>> restore bits [31:16] of %esp when returning to a 16
On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the
On Wed, Apr 23, 2014 at 10:28 AM, H. Peter Anvin wrote:
> On 04/23/2014 10:25 AM, Andrew Lutomirski wrote:
>> On Wed, Apr 23, 2014 at 10:16 AM, H. Peter Anvin wrote:
>>> On 04/23/2014 10:08 AM, Andrew Lutomirski wrote:
>>>> The only way I can see to trigger the race is with sigreturn, but it
On 04/23/2014 10:25 AM, Andrew Lutomirski wrote:
> On Wed, Apr 23, 2014 at 10:16 AM, H. Peter Anvin wrote:
>> On 04/23/2014 10:08 AM, Andrew Lutomirski wrote:
>>>
>>> The only way I can see to trigger the race is with sigreturn, but it's
>>> still there. Sigh.
>>
>> I don't see why sigreturn need
On Wed, Apr 23, 2014 at 10:16 AM, H. Peter Anvin wrote:
> On 04/23/2014 10:08 AM, Andrew Lutomirski wrote:
>>
>> The only way I can see to trigger the race is with sigreturn, but it's
>> still there. Sigh.
>>
>
> I don't see why sigreturn needs to be involved... all you need is
> modify_ldt() on
On 04/23/2014 10:08 AM, Andrew Lutomirski wrote:
>
> The only way I can see to trigger the race is with sigreturn, but it's
> still there. Sigh.
>
I don't see why sigreturn needs to be involved... all you need is
modify_ldt() on one CPU while the other is in the middle of an IRET
return. Small
On Wed, Apr 23, 2014 at 8:53 AM, H. Peter Anvin wrote:
> On 04/23/2014 02:54 AM, One Thousand Gnomes wrote:
>>> Ideally the tests should be doable such that on a normal machine the
>>> tests can be overlapped with the other things we have to do on that
>>> path. The exit branch will be strongly p
On 04/23/2014 07:24 AM, Boris Ostrovsky wrote:
>>
>> Konrad - I really could use some help figuring out what needs to be done
>> for this not to break Xen.
>
> This does break Xen PV:
>
I know it does. This is why I asked for help.
This is fundamentally the problem with PV and *especially* the
On 04/23/2014 02:54 AM, One Thousand Gnomes wrote:
>> Ideally the tests should be doable such that on a normal machine the
>> tests can be overlapped with the other things we have to do on that
>> path. The exit branch will be strongly predicted in the negative
>> direction, so it shouldn't be a s
On 04/22/2014 09:42 PM, H. Peter Anvin wrote:
> On 04/22/2014 06:23 PM, Andrew Lutomirski wrote:
>>
>> What's the to_dmesg thing for?
>>
> It's for debugging... the espfix page tables generate so many duplicate
> entries that trying to output it via a seqfile runs out of memory. I
> suspect we need to do some
> Ideally the tests should be doable such that on a normal machine the
> tests can be overlapped with the other things we have to do on that
> path. The exit branch will be strongly predicted in the negative
> direction, so it shouldn't be a significant problem.
>
> Again, this is not the case in
"H. Peter Anvin" writes:
> Does anyone have any idea if there is a real use case for non-16-bit
> LDT segments used as the stack segment? Does Wine use anything like
> that?
Wine uses them for DPMI support, though that would only get used when
vm86 mode is available.
--
Alexandre Julliard
jul
On 04/22/2014 10:04 AM, Linus Torvalds wrote:
> The segment table is shared for a process. So you can have one thread
> doing a load_ldt() that invalidates a segment, while another thread is
> busy taking a page fault. The segment was valid at page fault time and
> is saved on the kernel stack, but by t
On 04/22/2014 06:23 PM, Andrew Lutomirski wrote:
>
> What's the to_dmesg thing for?
>
It's for debugging... the espfix page tables generate so many duplicate
entries that trying to output it via a seqfile runs out of memory. I
suspect we need to do something like skip the espfix range or some o
On Tue, Apr 22, 2014 at 6:17 PM, H. Peter Anvin wrote:
> Another spin of the prototype. This one avoids the espfix for anything
> but #GP, and avoids save/restore/saving registers... one can wonder,
> though, how much that actually matters in practice.
>
> It still does redundant SWAPGS on the sl
Another spin of the prototype. This one avoids the espfix for anything
but #GP, and avoids save/restore/saving registers... one can wonder,
though, how much that actually matters in practice.
It still does redundant SWAPGS on the slow path. I'm not sure I
personally care enough to optimize that,
On 04/22/2014 04:39 PM, Andi Kleen wrote:
>> That simply will not work if you can take a #GP due to the "safe" MSR
>> functions from NMI and #MC context, which would be my main concern.
>
> At some point the IST entry functions subtracted 1K while the
> handler ran to handle simple nesting cases.
> That simply will not work if you can take a #GP due to the "safe" MSR
> functions from NMI and #MC context, which would be my main concern.
At some point the IST entry functions subtracted 1K while the
handler ran to handle simple nesting cases.
Not sure that code is still there.
-Andi
On Tue, Apr 22, 2014 at 4:17 PM, H. Peter Anvin wrote:
> On 04/22/2014 12:55 PM, Brian Gerst wrote:
>> On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin wrote:
>>> On 04/22/2014 11:17 AM, Brian Gerst wrote:
>
> That is the entry condition that we have to deal with. The fact that
> the
On 04/22/2014 12:55 PM, Brian Gerst wrote:
> On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin wrote:
>> On 04/22/2014 11:17 AM, Brian Gerst wrote:
That is the entry condition that we have to deal with. The fact that
the switch to the IST is unconditional is what makes ISTs hard to d
On Tue, Apr 22, 2014 at 2:51 PM, H. Peter Anvin wrote:
> On 04/22/2014 11:17 AM, Brian Gerst wrote:
>>>
>>> That is the entry condition that we have to deal with. The fact that
>>> the switch to the IST is unconditional is what makes ISTs hard to deal with.
>>
>> Right, that is why you switch awa
On Tue, Apr 22, 2014 at 10:29:45AM -0700, Andrew Lutomirski wrote:
> Or we could add a TIF_NEEDS_ESPFIX that gets set once you have a
> 16-bit LDT entry.
Or something like that, yep.
> But I think it makes sense to nail down everything else first. I
> suspect that a single test-and-branch in the
On 04/22/2014 11:17 AM, Brian Gerst wrote:
>>
>> That is the entry condition that we have to deal with. The fact that
>> the switch to the IST is unconditional is what makes ISTs hard to deal with.
>
> Right, that is why you switch away from the IST as soon as possible,
> copying the data that is
On Tue, Apr 22, 2014 at 2:06 PM, H. Peter Anvin wrote:
> On 04/22/2014 11:03 AM, Brian Gerst wrote:
>>
>> Maybe make the #GP handler check what the previous stack was at the start:
>> 1) If we came from userspace, switch to the top of the process stack.
>> 2) If the previous stack was not the espf
On 04/22/2014 11:03 AM, Brian Gerst wrote:
>
> Maybe make the #GP handler check what the previous stack was at the start:
> 1) If we came from userspace, switch to the top of the process stack.
> 2) If the previous stack was not the espfix stack, switch back to that stack.
> 3) Switch to the top o
On Tue, Apr 22, 2014 at 1:46 PM, Andrew Lutomirski wrote:
> On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin wrote:
>> On 04/22/2014 10:19 AM, Linus Torvalds wrote:
>>> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski
>>> wrote:
>
> Anyway, if done correctly, this whole espfix s
On 04/22/2014 10:46 AM, Andrew Lutomirski wrote:
>>
>> That is the whole impact of the IRET path.
>>
>> If using IST for #GP won't cause trouble (ISTs don't nest, so we need to
>> make sure there is absolutely no way we could end up nested) then the
>> rest of the fixup code can go away and we kill
On Tue, Apr 22, 2014 at 10:29 AM, H. Peter Anvin wrote:
> On 04/22/2014 10:19 AM, Linus Torvalds wrote:
>> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski wrote:
>>>
Anyway, if done correctly, this whole espfix should be totally free
for normal processes, since it should only t
On Tue, Apr 22, 2014 at 10:26 AM, Borislav Petkov wrote:
> On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote:
>> The fastpath interference is:
>>
>> 1. Testing for an LDT SS selector before IRET. This is actually easier
>> than on 32 bits, because on 64 bits the SS:RSP on the stack i
On 04/22/2014 10:19 AM, Linus Torvalds wrote:
> On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski wrote:
>>
>>>
>>> Anyway, if done correctly, this whole espfix should be totally free
>>> for normal processes, since it should only trigger if SS is a LDT
>>> entry (bit #2 set in the segment descr
On Tue, Apr 22, 2014 at 10:11:27AM -0700, H. Peter Anvin wrote:
> The fastpath interference is:
>
> 1. Testing for an LDT SS selector before IRET. This is actually easier
> than on 32 bits, because on 64 bits the SS:RSP on the stack is always valid.
>
> 2. Testing for an RSP inside the espfix re
On 04/22/2014 10:20 AM, Andrew Lutomirski wrote:
>
> It won't, given the above. I misunderstood what you were checking.
>
> It still seems to me that only #GP needs this special handling. The
> IST entries should never run on the espfix stack, and #MC, #DB, #NM,
> and #SS (I missed that one ear
On Tue, Apr 22, 2014 at 10:09 AM, H. Peter Anvin wrote:
>
> As for Andy's questions:
>
>> What happens on the IST entries? If I've read your patch right,
>> you're still switching back to the normal stack, which looks
>> questionable.
>
> No, in that case %rsp won't point into the espfix region,
On Tue, Apr 22, 2014 at 10:11 AM, Andrew Lutomirski wrote:
>
>>
>> Anyway, if done correctly, this whole espfix should be totally free
>> for normal processes, since it should only trigger if SS is a LDT
>> entry (bit #2 set in the segment descriptor). So the normal fast-path
>> should just have a
On 04/22/2014 10:04 AM, Linus Torvalds wrote:
>
> The segment table is shared for a process. So you can have one thread
> doing a load_ldt() that invalidates a segment, while another thread is
> busy taking a page fault. The segment was valid at page fault time and
> is saved on the kernel stack,
On 04/22/2014 10:11 AM, Andrew Lutomirski wrote:
>>
>> Anyway, if done correctly, this whole espfix should be totally free
>> for normal processes, since it should only trigger if SS is a LDT
>> entry (bit #2 set in the segment descriptor). So the normal fast-path
>> should just have a simple test
On Tue, Apr 22, 2014 at 10:04 AM, Linus Torvalds wrote:
> On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski wrote:
>>
>> My point is that it may be safe to remove the special espfix fixup
>> from #PF, which is probably the most performance-critical piece here,
>> aside from iret itself.
>
> Act
On 04/22/2014 10:00 AM, Andrew Lutomirski wrote:
>>
>> Yes, you can very much trigger GP deliberately.
>>
>> The way to do it is to just make an invalid segment descriptor on the
>> iret stack. Or make it a valid 16-bit one, but make it a code segment
>> for the stack pointer, or read-only, or what
On Tue, Apr 22, 2014 at 10:00 AM, Andrew Lutomirski wrote:
>
> My point is that it may be safe to remove the special espfix fixup
> from #PF, which is probably the most performance-critical piece here,
> aside from iret itself.
Actually, even that is unsafe.
Why?
The segment table is shared for
On Tue, Apr 22, 2014 at 9:43 AM, Linus Torvalds wrote:
> On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski wrote:
>>
>> For the espfix_adjust_stack thing, when can it actually need to do
>> anything? irqs should be off, I think, and MCE, NMI, and debug
>> exceptions use ist, so that leaves just
On Tue, Apr 22, 2014 at 9:33 AM, Andrew Lutomirski wrote:
>
> For the espfix_adjust_stack thing, when can it actually need to do
> anything? irqs should be off, I think, and MCE, NMI, and debug
> exceptions use ist, so that leaves just #SS and #GP, I think. How can
> those actually occur? Is th
On Tue, Apr 22, 2014 at 9:10 AM, H. Peter Anvin wrote:
> Honestly, guys... you're painting the bikeshed at the moment.
>
> Initialization is the easiest bit of all this code. The tricky part is
> *the rest of the code*, i.e. the stuff in entry_64.S.
That's because the initialization code is much
Honestly, guys... you're painting the bikeshed at the moment.
Initialization is the easiest bit of all this code. The tricky part is
*the rest of the code*, i.e. the stuff in entry_64.S.
Also, the code is butt-ugly at the moment. Aesthetics have not been
dealt with.
-hpa
On Tue, Apr 22, 2014 at 7:46 AM, Borislav Petkov wrote:
> On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote:
>> I wonder if it would be workable to use a bit in the espfix PGD to
>> denote that it has been initialized already... I hear, near NX there's
>> some room :-)
>
> Ok, I real
On Tue, Apr 22, 2014 at 01:23:12PM +0200, Borislav Petkov wrote:
> I wonder if it would be workable to use a bit in the espfix PGD to
> denote that it has been initialized already... I hear, near NX there's
> some room :-)
Ok, I realized this won't work when I hit send... Oh well.
Anyway, another
Just nitpicks below:
On Mon, Apr 21, 2014 at 03:47:52PM -0700, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment.
On Mon, Apr 21, 2014 at 06:53:36PM -0700, Andrew Lutomirski wrote:
> On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin wrote:
> > Race condition (although with x86 being globally ordered, it probably can't
> > actually happen.) The bitmask is probably the way to go.
>
> Does the race matter? In t
On Mon, Apr 21, 2014 at 6:47 PM, H. Peter Anvin wrote:
> Race condition (although with x86 being globally ordered, it probably can't
> actually happen.) The bitmask is probably the way to go.
Does the race matter? In the worst case you take the lock
unnecessarily. But yes, the bitmask is easy.
Race condition (although with x86 being globally ordered, it probably can't
actually happen.) The bitmask is probably the way to go.
On April 21, 2014 6:28:12 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote:
>> I wanted to avoid the "another cpu made this
On Mon, Apr 21, 2014 at 6:14 PM, H. Peter Anvin wrote:
> I wanted to avoid the "another cpu made this allocation, now I have to free"
> crap, but I also didn't want to grab the lock if there was no work needed.
I guess you also want to avoid bouncing all these cachelines around on
boot on bit mu
I wanted to avoid the "another cpu made this allocation, now I have to free"
crap, but I also didn't want to grab the lock if there was no work needed.
On April 21, 2014 6:06:19 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote:
>> Well, if 2^17 CPUs are al
On Mon, Apr 21, 2014 at 5:53 PM, H. Peter Anvin wrote:
> Well, if 2^17 CPUs are allocated we might have 2K pages allocated. We could
> easily do a bitmap here, of course. NR_CPUS/64 is a small number, and would
> reduce the code complexity.
>
Even simpler: just get rid of the check entirely. That
Well, if 2^17 CPUs are allocated we might have 2K pages allocated. We could easily
do a bitmap here, of course. NR_CPUS/64 is a small number, and would reduce
the code complexity.
On April 21, 2014 5:37:05 PM PDT, Andrew Lutomirski wrote:
>On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote:
>>
On Mon, Apr 21, 2014 at 4:29 PM, H. Peter Anvin wrote:
> On 04/21/2014 04:19 PM, Andrew Lutomirski wrote:
>>
>> Hahaha! :)
>>
>> Some comments:
>>
>> Does returning to 64-bit CS with 16-bit SS not need espfix?
>
> There is no such thing. With a 64-bit CS, the flags on SS are ignored
> (although y
On 04/21/2014 04:19 PM, Andrew Lutomirski wrote:
>
> Hahaha! :)
>
> Some comments:
>
> Does returning to 64-bit CS with 16-bit SS not need espfix?
There is no such thing. With a 64-bit CS, the flags on SS are ignored
(although you still have to have a non-null SS... the conditions are a
bit co
On Mon, Apr 21, 2014 at 3:47 PM, H. Peter Anvin wrote:
> This is a prototype of espfix for the 64-bit kernel. espfix is a
> workaround for the architectural definition of IRET, which fails to
> restore bits [31:16] of %esp when returning to a 16-bit stack
> segment. We have a workaround for the
This is a prototype of espfix for the 64-bit kernel. espfix is a
workaround for the architectural definition of IRET, which fails to
restore bits [31:16] of %esp when returning to a 16-bit stack
segment. We have a workaround for the 32-bit kernel, but that
implementation doesn't work for 64 bits.