Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Andy Lutomirski
On Mon, Mar 7, 2016 at 10:41 AM, Brian Gerst  wrote:
> On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski  wrote:
>> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
>>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
 Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
 if a user does SYSENTER with TF set, we will single-step through the
 kernel until something clears TF.  There is absolutely nothing we can
 do to prevent this short of turning off SYSENTER [1].

 Simplify the handling considerably with two changes:

 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
add TF to that list of flags to sanitize with no overhead whatsoever.

 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>>
>>> What is wrong with the current method of clearing TF and setting
>>> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
>>> complexity because it has to check for a range of addresses rather
>>> than just the first instruction, plus it has to singlestep all the way
>>> through the SYSENTER prologue.
>>>
>>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
>>> this patch is an improvement.
>>
>> TIF_SINGLESTEP has bizarrely overloaded semantics.
>>
>> There's this:
>>
>> /*
>>  * If we stepped into a sysenter/syscall insn, it trapped in
>>  * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
>>  * If user-mode had set TF itself, then it's still clear from
>>  * do_debug() and we need to set it again to restore the user
>>  * state so we don't wrongly set TIF_FORCED_TF below.
>>  * If enable_single_step() was used last and that is what
>>  * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
>>  * already set and our bookkeeping is fine.
>>  */
>> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
>> regs->flags |= X86_EFLAGS_TF;
>>
>> but TIF_SINGLESTEP is also used for other things.  (And I need to
>> follow up with a patch to remove that comment.)  This results in
>> incomprehensible behavior: if a user program sets TF and does
>> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
>> not happen if a user sets TF and does INT80 or SYSCALL.  There was at
>> least one bug in here that Oleg fixed a while back, and I wouldn't be
>> at all surprised if there were others.
>>
>> I don't know what would happen if TF were set and SYSENTER were used
>> to do a syscall where the __get_user to load the syscall nr failed.
>> That happens before the TIF_SINGLESTEP fixup.
>>
>> Basically, the overloaded use of TIF_SINGLESTEP was complicated and
>> hard to understand, and the new behavior is straightforward and
>> consistent with other entries, even if it's a bit slower.
>>
>> We could introduce a new TIF_SYSENTER_TF and use it directly, or we
>> could accelerate the TF fixup in regs->flags by switching to a
>> different entry path (I had a draft patch to do that), but I tend to
>> favor simplicity for things that aren't performance-critical.
>
> The alternate entry path wouldn't be very big, just 5 instructions
> with the OR being the only difference.

Agreed.  It's just more code to maintain and test.  I can certainly do it.

I scrapped it the first time because it was messy on Xen PV.  It was
extra messy because I did this part of the series before I cleaned up
the garbage ENTRY(debug) code, and that got in the way as well.

>
> Another option is to use a per-cpu var instead of a TIF flag.

Then we'd need to add a branch in the TF-not-set path, which would be
a bit unfortunate.

As a third option, we could force BTF on, which would reduce the
number of useless traps.

In any case, I'd be glad to speed this up as a follow-up patch if
anyone cares about performance.

--Andy


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Andy Lutomirski
On Mon, Mar 7, 2016 at 10:41 AM, Brian Gerst  wrote:
> On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski  wrote:
>> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
>>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
 Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
 if a user does SYSENTER with TF set, we will single-step through the
 kernel until something clears TF.  There is absolutely nothing we can
 do to prevent this short of turning off SYSENTER [1].

 Simplify the handling considerably with two changes:

 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
add TF to that list of flags to sanitize with no overhead whatsoever.

 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>>
>>> What is wrong with the current method of clearing TF and setting
>>> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
>>> complexity because it has to check for a range of addresses rather
>>> than just the first instruction, plus it has to singlestep all the way
>>> through the SYSENTER prologue.
>>>
>>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
>>> this patch is an improvement.
>>
>> TIF_SINGLESTEP has bizarrely overloaded semantics.
>>
>> There's this:
>>
>> /*
>>  * If we stepped into a sysenter/syscall insn, it trapped in
>>  * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
>>  * If user-mode had set TF itself, then it's still clear from
>>  * do_debug() and we need to set it again to restore the user
>>  * state so we don't wrongly set TIF_FORCED_TF below.
>>  * If enable_single_step() was used last and that is what
>>  * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
>>  * already set and our bookkeeping is fine.
>>  */
>> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
>> regs->flags |= X86_EFLAGS_TF;
>>
>> but TIF_SINGLESTEP is also used for other things.  (And I need to
>> follow up with a patch to remove that comment.)  This results in
>> incomprehensible behavior: if a user program sets TF and does
>> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
>> not happen if a user sets TF and does INT80 or SYSCALL.  There was at
>> least one bug in here that Oleg fixed a while back, and I wouldn't be
>> at all surprised if there were others.
>>
>> I don't know what would happen if TF were set and SYSENTER were used
>> to do a syscall where the __get_user to load the syscall nr failed.
>> That happens before the TIF_SINGLESTEP fixup.
>>
>> Basically, the overloaded use of TIF_SINGLESTEP was complicated and
>> hard to understand, and the new behavior is straightforward and
>> consistent with other entries, even if it's a bit slower.
>>
>> We could introduce a new TIF_SYSENTER_TF and use it directly, or we
>> could accelerate the TF fixup in regs->flags by switching to a
>> different entry path (I had a draft patch to do that), but I tend to
>> favor simplicity for things that aren't performance-critical.
>
> The alternate entry path wouldn't be very big, just 5 instructions
> with the OR being the only difference.

Agreed.  It's just more code to maintain and test.  I can certainly do it.

I scrapped it the first time because it was messy on Xen PV.  It was
extra messy because I did this part of the series before I cleaned up
the garbage ENTRY(debug) code, and that got in the way as well.

>
> Another option is to use a per-cpu var instead of a TIF flag.

Then we'd need to add a branch in the TF-not-set path, which would be
a bit unfortunate.

As a third option, we could force BTF on, which would reduce the
number of useless traps.

In any case, I'd be glad to speed this up as a follow-up patch if
anyone cares about performance.

--Andy


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Brian Gerst
On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski  wrote:
> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
>>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>>> if a user does SYSENTER with TF set, we will single-step through the
>>> kernel until something clears TF.  There is absolutely nothing we can
>>> do to prevent this short of turning off SYSENTER [1].
>>>
>>> Simplify the handling considerably with two changes:
>>>
>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>>
>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>
>> What is wrong with the current method of clearing TF and setting
>> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
>> complexity because it has to check for a range of addresses rather
>> than just the first instruction, plus it has to singlestep all the way
>> through the SYSENTER prologue.
>>
>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
>> this patch is an improvement.
>
> TIF_SINGLESTEP has bizarrely overloaded semantics.
>
> There's this:
>
> /*
>  * If we stepped into a sysenter/syscall insn, it trapped in
>  * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
>  * If user-mode had set TF itself, then it's still clear from
>  * do_debug() and we need to set it again to restore the user
>  * state so we don't wrongly set TIF_FORCED_TF below.
>  * If enable_single_step() was used last and that is what
>  * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
>  * already set and our bookkeeping is fine.
>  */
> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
> regs->flags |= X86_EFLAGS_TF;
>
> but TIF_SINGLESTEP is also used for other things.  (And I need to
> follow up with a patch to remove that comment.)  This results in
> incomprehensible behavior: if a user program sets TF and does
> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
> not happen if a user sets TF and does INT80 or SYSCALL.  There was at
> least one bug in here that Oleg fixed a while back, and I wouldn't be
> at all surprised if there were others.
>
> I don't know what would happen if TF were set and SYSENTER were used
> to do a syscall where the __get_user to load the syscall nr failed.
> That happens before the TIF_SINGLESTEP fixup.
>
> Basically, the overloaded use of TIF_SINGLESTEP was complicated and
> hard to understand, and the new behavior is straightforward and
> consistent with other entries, even if it's a bit slower.
>
> We could introduce a new TIF_SYSENTER_TF and use it directly, or we
> could accelerate the TF fixup in regs->flags by switching to a
> different entry path (I had a draft patch to do that), but I tend to
> favor simplicity for things that aren't performance-critical.

The alternate entry path wouldn't be very big, just 5 instructions
with the OR being the only difference.

Another option is to use a per-cpu var instead of a TIF flag.

--
Brian Gerst


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Brian Gerst
On Mon, Mar 7, 2016 at 1:03 PM, Andy Lutomirski  wrote:
> On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
>> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
>>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>>> if a user does SYSENTER with TF set, we will single-step through the
>>> kernel until something clears TF.  There is absolutely nothing we can
>>> do to prevent this short of turning off SYSENTER [1].
>>>
>>> Simplify the handling considerably with two changes:
>>>
>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>>
>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>
>> What is wrong with the current method of clearing TF and setting
>> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
>> complexity because it has to check for a range of addresses rather
>> than just the first instruction, plus it has to singlestep all the way
>> through the SYSENTER prologue.
>>
>> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
>> this patch is an improvement.
>
> TIF_SINGLESTEP has bizarrely overloaded semantics.
>
> There's this:
>
> /*
>  * If we stepped into a sysenter/syscall insn, it trapped in
>  * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
>  * If user-mode had set TF itself, then it's still clear from
>  * do_debug() and we need to set it again to restore the user
>  * state so we don't wrongly set TIF_FORCED_TF below.
>  * If enable_single_step() was used last and that is what
>  * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
>  * already set and our bookkeeping is fine.
>  */
> if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
> regs->flags |= X86_EFLAGS_TF;
>
> but TIF_SINGLESTEP is also used for other things.  (And I need to
> follow up with a patch to remove that comment.)  This results in
> incomprehensible behavior: if a user program sets TF and does
> SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
> not happen if a user sets TF and does INT80 or SYSCALL.  There was at
> least one bug in here that Oleg fixed a while back, and I wouldn't be
> at all surprised if there were others.
>
> I don't know what would happen if TF were set and SYSENTER were used
> to do a syscall where the __get_user to load the syscall nr failed.
> That happens before the TIF_SINGLESTEP fixup.
>
> Basically, the overloaded use of TIF_SINGLESTEP was complicated and
> hard to understand, and the new behavior is straightforward and
> consistent with other entries, even if it's a bit slower.
>
> We could introduce a new TIF_SYSENTER_TF and use it directly, or we
> could accelerate the TF fixup in regs->flags by switching to a
> different entry path (I had a draft patch to do that), but I tend to
> favor simplicity for things that aren't performance-critical.

The alternate entry path wouldn't be very big, just 5 instructions
with the OR being the only difference.

Another option is to use a per-cpu var instead of a TIF flag.

--
Brian Gerst


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Andy Lutomirski
On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>> if a user does SYSENTER with TF set, we will single-step through the
>> kernel until something clears TF.  There is absolutely nothing we can
>> do to prevent this short of turning off SYSENTER [1].
>>
>> Simplify the handling considerably with two changes:
>>
>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>
>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>
> What is wrong with the current method of clearing TF and setting
> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
> complexity because it has to check for a range of addresses rather
> than just the first instruction, plus it has to singlestep all the way
> through the SYSENTER prologue.
>
> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
> this patch is an improvement.

TIF_SINGLESTEP has bizarrely overloaded semantics.

There's this:

/*
 * If we stepped into a sysenter/syscall insn, it trapped in
 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
 * If user-mode had set TF itself, then it's still clear from
 * do_debug() and we need to set it again to restore the user
 * state so we don't wrongly set TIF_FORCED_TF below.
 * If enable_single_step() was used last and that is what
 * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
 * already set and our bookkeeping is fine.
 */
if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
regs->flags |= X86_EFLAGS_TF;

but TIF_SINGLESTEP is also used for other things.  (And I need to
follow up with a patch to remove that comment.)  This results in
incomprehensible behavior: if a user program sets TF and does
SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
not happen if a user sets TF and does INT80 or SYSCALL.  There was at
least one bug in here that Oleg fixed a while back, and I wouldn't be
at all surprised if there were others.

I don't know what would happen if TF were set and SYSENTER were used
to do a syscall where the __get_user to load the syscall nr failed.
That happens before the TIF_SINGLESTEP fixup.

Basically, the overloaded use of TIF_SINGLESTEP was complicated and
hard to understand, and the new behavior is straightforward and
consistent with other entries, even if it's a bit slower.

We could introduce a new TIF_SYSENTER_TF and use it directly, or we
could accelerate the TF fixup in regs->flags by switching to a
different entry path (I had a draft patch to do that), but I tend to
favor simplicity for things that aren't performance-critical.


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Andy Lutomirski
On Mon, Mar 7, 2016 at 9:17 AM, Brian Gerst  wrote:
> On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>> if a user does SYSENTER with TF set, we will single-step through the
>> kernel until something clears TF.  There is absolutely nothing we can
>> do to prevent this short of turning off SYSENTER [1].
>>
>> Simplify the handling considerably with two changes:
>>
>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>
>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>
> What is wrong with the current method of clearing TF and setting
> TIF_SINGLESTEP on the first debug trap?  This patch actually increases
> complexity because it has to check for a range of addresses rather
> than just the first instruction, plus it has to singlestep all the way
> through the SYSENTER prologue.
>
> Unless there is an actual issue with TIF_SINGLESTEP, I don't think
> this patch is an improvement.

TIF_SINGLESTEP has bizarrely overloaded semantics.

There's this:

/*
 * If we stepped into a sysenter/syscall insn, it trapped in
 * kernel mode; do_debug() cleared TF and set TIF_SINGLESTEP.
 * If user-mode had set TF itself, then it's still clear from
 * do_debug() and we need to set it again to restore the user
 * state so we don't wrongly set TIF_FORCED_TF below.
 * If enable_single_step() was used last and that is what
 * set TIF_SINGLESTEP, then both TF and TIF_FORCED_TF are
 * already set and our bookkeeping is fine.
 */
if (unlikely(test_tsk_thread_flag(child, TIF_SINGLESTEP)))
regs->flags |= X86_EFLAGS_TF;

but TIF_SINGLESTEP is also used for other things.  (And I need to
follow up with a patch to remove that comment.)  This results in
incomprehensible behavior: if a user program sets TF and does
SYSENTER, then TIF_SINGLESTEP gets set (and stays set!).  This does
not happen if a user sets TF and does INT80 or SYSCALL.  There was at
least one bug in here that Oleg fixed a while back, and I wouldn't be
at all surprised if there were others.

I don't know what would happen if TF were set and SYSENTER were used
to do a syscall where the __get_user to load the syscall nr failed.
That happens before the TIF_SINGLESTEP fixup.

Basically, the overloaded use of TIF_SINGLESTEP was complicated and
hard to understand, and the new behavior is straightforward and
consistent with other entries, even if it's a bit slower.

We could introduce a new TIF_SYSENTER_TF and use it directly, or we
could accelerate the TF fixup in regs->flags by switching to a
different entry path (I had a draft patch to do that), but I tend to
favor simplicity for things that aren't performance-critical.


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Brian Gerst
On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
> if a user does SYSENTER with TF set, we will single-step through the
> kernel until something clears TF.  There is absolutely nothing we can
> do to prevent this short of turning off SYSENTER [1].
>
> Simplify the handling considerably with two changes:
>
> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>add TF to that list of flags to sanitize with no overhead whatsoever.
>
> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.

What is wrong with the current method of clearing TF and setting
TIF_SINGLESTEP on the first debug trap?  This patch actually increases
complexity because it has to check for a range of addresses rather
than just the first instruction, plus it has to singlestep all the way
through the SYSENTER prologue.

Unless there is an actual issue with TIF_SINGLESTEP, I don't think
this patch is an improvement.

--
Brian Gerst


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-07 Thread Brian Gerst
On Sun, Mar 6, 2016 at 12:52 AM, Andy Lutomirski  wrote:
> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
> if a user does SYSENTER with TF set, we will single-step through the
> kernel until something clears TF.  There is absolutely nothing we can
> do to prevent this short of turning off SYSENTER [1].
>
> Simplify the handling considerably with two changes:
>
> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>add TF to that list of flags to sanitize with no overhead whatsoever.
>
> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.

What is wrong with the current method of clearing TF and setting
TIF_SINGLESTEP on the first debug trap?  This patch actually increases
complexity because it has to check for a range of addresses rather
than just the first instruction, plus it has to singlestep all the way
through the SYSENTER prologue.

Unless there is an actual issue with TIF_SINGLESTEP, I don't think
this patch is an improvement.

--
Brian Gerst


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Andrew Cooper
On 06/03/16 17:36, Andy Lutomirski wrote:
>
>> I haven't read the Xen hypervisor code, but what are those 5 words
>> that were pushed on the stack by the hypervisor?  It suspiciously is
>> the size of an IRET frame.
> I think it is nominally an IRET frame.

It is a notminal IRET frame.  Even to this day, Xen doesn't support
anything other "making it appear as if an interrupt/exception occurred",
even with the syscall/sysenter and artificial entrypoints.

The Xen entrypoint logic predates the introduction of the
syscall/sysenter support in Linux.  At the point where your hammer is
already iret shaped and you have a forked version of Linux for running
as a guest, modifying sysenter to be iret shaped is an easy option.  For
better or worse, this is now the ABI.

> One might wonder what's in it, given that the hypervisor doesn't know what 
> the old IP or SP was.

Looking at the code which synthesizes the iret frame

pushq $FLAT_USER_SS
pushq $0
pushfq
pushq $3 /* ring 3 null cs */
pushq $0 /* null rip */

Completely ignoring it definitely the best course of action.

>
>>  Considering that we don't use SYSEXIT on
>> Xen anymore, can we just redirect SYSENTER to the INT80 handler?
>> Perhaps even just disable SYSENTER support in the VDSO on Xen.   I
>> can't imagine SYSENTER is any faster than INT80 on Xen, because it has
>> to trap to the hypervisor first.
>>
> I think we should leave it enabled -- having user programs on Xen PV
> trap into the hypervisor for a system call using SYSENTER will still
> be much faster than using INT $0x80 as long as the hypervisor does
> something reasonable.

The trap into Xen has to happen either way (even for the INT $0x80
path).  There is almost certainly room for improvement in both paths,
but in principle the sysenter path will be faster.

~Andrew


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Andrew Cooper
On 06/03/16 17:36, Andy Lutomirski wrote:
>
>> I haven't read the Xen hypervisor code, but what are those 5 words
>> that were pushed on the stack by the hypervisor?  It suspiciously is
>> the size of an IRET frame.
> I think it is nominally an IRET frame.

It is a notminal IRET frame.  Even to this day, Xen doesn't support
anything other "making it appear as if an interrupt/exception occurred",
even with the syscall/sysenter and artificial entrypoints.

The Xen entrypoint logic predates the introduction of the
syscall/sysenter support in Linux.  At the point where your hammer is
already iret shaped and you have a forked version of Linux for running
as a guest, modifying sysenter to be iret shaped is an easy option.  For
better or worse, this is now the ABI.

> One might wonder what's in it, given that the hypervisor doesn't know what 
> the old IP or SP was.

Looking at the code which synthesizes the iret frame

pushq $FLAT_USER_SS
pushq $0
pushfq
pushq $3 /* ring 3 null cs */
pushq $0 /* null rip */

Completely ignoring it definitely the best course of action.

>
>>  Considering that we don't use SYSEXIT on
>> Xen anymore, can we just redirect SYSENTER to the INT80 handler?
>> Perhaps even just disable SYSENTER support in the VDSO on Xen.   I
>> can't imagine SYSENTER is any faster than INT80 on Xen, because it has
>> to trap to the hypervisor first.
>>
> I think we should leave it enabled -- having user programs on Xen PV
> trap into the hypervisor for a system call using SYSENTER will still
> be much faster than using INT $0x80 as long as the hypervisor does
> something reasonable.

The trap into Xen has to happen either way (even for the INT $0x80
path).  There is almost certainly room for improvement in both paths,
but in principle the sysenter path will be faster.

~Andrew


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Borislav Petkov
On Sun, Mar 06, 2016 at 09:36:42AM -0800, Andy Lutomirski wrote:
> > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
> > X86_FEATURE_XENPV
> 
> That might work.

Yap, and as we said on IRC, KALLSYMS can be off - even if it is "if
EXPERT" so we need to think of a better way to compute start address and
size of entry_SYSENTER_32 and the _compat one.

We probably could reuse some of the ELF parsing machinery of relocs.c or
so...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Borislav Petkov
On Sun, Mar 06, 2016 at 09:36:42AM -0800, Andy Lutomirski wrote:
> > ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
> > X86_FEATURE_XENPV
> 
> That might work.

Yap, and as we said on IRC, KALLSYMS can be off - even if it is "if
EXPERT" so we need to think of a better way to compute start address and
size of entry_SYSENTER_32 and the _compat one.

We probably could reuse some of the ELF parsing machinery of relocs.c or
so...

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Andy Lutomirski
On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerst  wrote:
> On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov  wrote:
>> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
>>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>>> if a user does SYSENTER with TF set, we will single-step through the
>>> kernel until something clears TF.  There is absolutely nothing we can
>>> do to prevent this short of turning off SYSENTER [1].
>>>
>>> Simplify the handling considerably with two changes:
>>>
>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>>
>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>>
>>> That's all we need to do.
>>>
>>> Don't get too excited -- our handling is still buggy on 32-bit
>>> kernels.  There's nothing wrong with the SYSENTER code itself, but
>>> the #DB prologue has a clever fixup for traps on the very first
>>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
>>> correctly.  The next two patches will fix that.
>>>
>>> [1] We could probably prevent it by forcing BTF on at all times and
>>> making sure we clear TF before any branches in the SYSENTER
>>> code.  Needless to say, this is a bad idea.
>>>
>>> Signed-off-by: Andy Lutomirski 
>>> ---
>>>  arch/x86/entry/entry_32.S| 42 ++--
>>>  arch/x86/entry/entry_64_compat.S |  9 ++-
>>>  arch/x86/include/asm/proto.h | 15 ++--
>>>  arch/x86/kernel/traps.c  | 52 
>>> +---
>>>  4 files changed, 94 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>>> index a8c3424c3392..7700cf695d8c 100644
>>> --- a/arch/x86/entry/entry_32.S
>>> +++ b/arch/x86/entry/entry_32.S
>>> @@ -287,7 +287,26 @@ need_resched:
>>>  END(resume_kernel)
>>>  #endif
>>>
>>> - # SYSENTER  call handler stub
>>> +GLOBAL(__begin_SYSENTER_singlestep_region)
>>> +/*
>>> + * All code from here through __end_SYSENTER_singlestep_region is subject
>>> + * to being single-stepped if a user program sets TF and executes SYSENTER.
>>> + * There is absolutely nothing that we can do to prevent this from 
>>> happening
>>> + * (thanks Intel!).  To keep our handling of this situation as simple as
>>> + * possible, we handle TF just like AC and NT, except that our #DB handler
>>> + * will ignore all of the single-step traps generated in this range.
>>> + */
>>> +
>>> +#ifdef CONFIG_XEN
>>> +/*
>>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
>>> + * entry point expects, so fix it up before using the normal path.
>>> + */
>>> +ENTRY(xen_sysenter_target)
>>> + addl$5*4, %esp  /* remove xen-provided frame 
>>> */
>>> + jmp sysenter_past_esp
>>> +#endif
>>> +
>>>  ENTRY(entry_SYSENTER_32)
>>>   movlTSS_sysenter_sp0(%esp), %esp
>>>  sysenter_past_esp:
>>
>> Can you do this below (ontop of your diff) and get rid of those
>> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
>> globals and use the entry_SYSENTER_{32|compat} symbols instead?
>>
>> From a quick scan, kallsyms can give you the symbol size too so that you
>> can compute where it ends:
>>
>> readelf -a vmlinux | grep entry_SYSENTER
>>  55454: 8170aff099 FUNCGLOBAL DEFAULT1 
>> entry_SYSENTER_compat
>>  62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
>> __end_entry_SYSENTER_comp
>>
>> 0x8170aff0 + 99 = 0x8170b053
>>
>> ---
>> Index: b/arch/x86/entry/entry_32.S
>> ===
>> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
>> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
>> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
>>   * will ignore all of the single-step traps generated in this range.
>>   */
>>
>> +ENTRY(entry_SYSENTER_32)
>>  #ifdef CONFIG_XEN
>> -/*
>> - * Xen doesn't set %esp to be precisely what the normal SYSENTER
>> - * entry point expects, so fix it up before using the normal path.
>> - */
>> -ENTRY(xen_sysenter_target)
>> addl$5*4, %esp  /* remove xen-provided frame 
>> */
>> jmp sysenter_past_esp
>> -#endif
>> -
>> -ENTRY(entry_SYSENTER_32)
>> +#else
>> movlTSS_sysenter_sp0(%esp), %esp
>> +#endif
>>  sysenter_past_esp:
>> pushl   $__USER_DS  /* pt_regs->ss */
>> pushl   %ebp/* pt_regs->sp (stashed in bp) */
>
>
> This won't work if you run a Xen enabled kernel on bare metal.  This
> would work though:
>
> ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
> X86_FEATURE_XENPV

That might work.

>
> I haven't read the Xen hypervisor code, but what are those 5 words
> that 

Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Andy Lutomirski
On Sun, Mar 6, 2016 at 9:30 AM, Brian Gerst  wrote:
> On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov  wrote:
>> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
>>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>>> if a user does SYSENTER with TF set, we will single-step through the
>>> kernel until something clears TF.  There is absolutely nothing we can
>>> do to prevent this short of turning off SYSENTER [1].
>>>
>>> Simplify the handling considerably with two changes:
>>>
>>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>>
>>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>>
>>> That's all we need to do.
>>>
>>> Don't get too excited -- our handling is still buggy on 32-bit
>>> kernels.  There's nothing wrong with the SYSENTER code itself, but
>>> the #DB prologue has a clever fixup for traps on the very first
>>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
>>> correctly.  The next two patches will fix that.
>>>
>>> [1] We could probably prevent it by forcing BTF on at all times and
>>> making sure we clear TF before any branches in the SYSENTER
>>> code.  Needless to say, this is a bad idea.
>>>
>>> Signed-off-by: Andy Lutomirski 
>>> ---
>>>  arch/x86/entry/entry_32.S| 42 ++--
>>>  arch/x86/entry/entry_64_compat.S |  9 ++-
>>>  arch/x86/include/asm/proto.h | 15 ++--
>>>  arch/x86/kernel/traps.c  | 52 
>>> +---
>>>  4 files changed, 94 insertions(+), 24 deletions(-)
>>>
>>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>>> index a8c3424c3392..7700cf695d8c 100644
>>> --- a/arch/x86/entry/entry_32.S
>>> +++ b/arch/x86/entry/entry_32.S
>>> @@ -287,7 +287,26 @@ need_resched:
>>>  END(resume_kernel)
>>>  #endif
>>>
>>> - # SYSENTER  call handler stub
>>> +GLOBAL(__begin_SYSENTER_singlestep_region)
>>> +/*
>>> + * All code from here through __end_SYSENTER_singlestep_region is subject
>>> + * to being single-stepped if a user program sets TF and executes SYSENTER.
>>> + * There is absolutely nothing that we can do to prevent this from 
>>> happening
>>> + * (thanks Intel!).  To keep our handling of this situation as simple as
>>> + * possible, we handle TF just like AC and NT, except that our #DB handler
>>> + * will ignore all of the single-step traps generated in this range.
>>> + */
>>> +
>>> +#ifdef CONFIG_XEN
>>> +/*
>>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
>>> + * entry point expects, so fix it up before using the normal path.
>>> + */
>>> +ENTRY(xen_sysenter_target)
>>> + addl$5*4, %esp  /* remove xen-provided frame 
>>> */
>>> + jmp sysenter_past_esp
>>> +#endif
>>> +
>>>  ENTRY(entry_SYSENTER_32)
>>>   movlTSS_sysenter_sp0(%esp), %esp
>>>  sysenter_past_esp:
>>
>> Can you do this below (ontop of your diff) and get rid of those
>> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
>> globals and use the entry_SYSENTER_{32|compat} symbols instead?
>>
>> From a quick scan, kallsyms can give you the symbol size too so that you
>> can compute where it ends:
>>
>> readelf -a vmlinux | grep entry_SYSENTER
>>  55454: 8170aff099 FUNCGLOBAL DEFAULT1 
>> entry_SYSENTER_compat
>>  62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
>> __end_entry_SYSENTER_comp
>>
>> 0x8170aff0 + 99 = 0x8170b053
>>
>> ---
>> Index: b/arch/x86/entry/entry_32.S
>> ===
>> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
>> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
>> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
>>   * will ignore all of the single-step traps generated in this range.
>>   */
>>
>> +ENTRY(entry_SYSENTER_32)
>>  #ifdef CONFIG_XEN
>> -/*
>> - * Xen doesn't set %esp to be precisely what the normal SYSENTER
>> - * entry point expects, so fix it up before using the normal path.
>> - */
>> -ENTRY(xen_sysenter_target)
>> addl$5*4, %esp  /* remove xen-provided frame 
>> */
>> jmp sysenter_past_esp
>> -#endif
>> -
>> -ENTRY(entry_SYSENTER_32)
>> +#else
>> movlTSS_sysenter_sp0(%esp), %esp
>> +#endif
>>  sysenter_past_esp:
>> pushl   $__USER_DS  /* pt_regs->ss */
>> pushl   %ebp/* pt_regs->sp (stashed in bp) */
>
>
> This won't work if you run a Xen enabled kernel on bare metal.  This
> would work though:
>
> ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
> X86_FEATURE_XENPV

That might work.

>
> I haven't read the Xen hypervisor code, but what are those 5 words
> that were pushed on the stack by the hypervisor?  It 

Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Brian Gerst
On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov  wrote:
> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>> if a user does SYSENTER with TF set, we will single-step through the
>> kernel until something clears TF.  There is absolutely nothing we can
>> do to prevent this short of turning off SYSENTER [1].
>>
>> Simplify the handling considerably with two changes:
>>
>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>
>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>
>> That's all we need to do.
>>
>> Don't get too excited -- our handling is still buggy on 32-bit
>> kernels.  There's nothing wrong with the SYSENTER code itself, but
>> the #DB prologue has a clever fixup for traps on the very first
>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
>> correctly.  The next two patches will fix that.
>>
>> [1] We could probably prevent it by forcing BTF on at all times and
>> making sure we clear TF before any branches in the SYSENTER
>> code.  Needless to say, this is a bad idea.
>>
>> Signed-off-by: Andy Lutomirski 
>> ---
>>  arch/x86/entry/entry_32.S| 42 ++--
>>  arch/x86/entry/entry_64_compat.S |  9 ++-
>>  arch/x86/include/asm/proto.h | 15 ++--
>>  arch/x86/kernel/traps.c  | 52 
>> +---
>>  4 files changed, 94 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>> index a8c3424c3392..7700cf695d8c 100644
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -287,7 +287,26 @@ need_resched:
>>  END(resume_kernel)
>>  #endif
>>
>> - # SYSENTER  call handler stub
>> +GLOBAL(__begin_SYSENTER_singlestep_region)
>> +/*
>> + * All code from here through __end_SYSENTER_singlestep_region is subject
>> + * to being single-stepped if a user program sets TF and executes SYSENTER.
>> + * There is absolutely nothing that we can do to prevent this from happening
>> + * (thanks Intel!).  To keep our handling of this situation as simple as
>> + * possible, we handle TF just like AC and NT, except that our #DB handler
>> + * will ignore all of the single-step traps generated in this range.
>> + */
>> +
>> +#ifdef CONFIG_XEN
>> +/*
>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
>> + * entry point expects, so fix it up before using the normal path.
>> + */
>> +ENTRY(xen_sysenter_target)
>> + addl$5*4, %esp  /* remove xen-provided frame */
>> + jmp sysenter_past_esp
>> +#endif
>> +
>>  ENTRY(entry_SYSENTER_32)
>>   movlTSS_sysenter_sp0(%esp), %esp
>>  sysenter_past_esp:
>
> Can you do this below (ontop of your diff) and get rid of those
> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
> globals and use the entry_SYSENTER_{32|compat} symbols instead?
>
> From a quick scan, kallsyms can give you the symbol size too so that you
> can compute where it ends:
>
> readelf -a vmlinux | grep entry_SYSENTER
>  55454: 8170aff099 FUNCGLOBAL DEFAULT1 
> entry_SYSENTER_compat
>  62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
> __end_entry_SYSENTER_comp
>
> 0x8170aff0 + 99 = 0x8170b053
>
> ---
> Index: b/arch/x86/entry/entry_32.S
> ===
> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
>   * will ignore all of the single-step traps generated in this range.
>   */
>
> +ENTRY(entry_SYSENTER_32)
>  #ifdef CONFIG_XEN
> -/*
> - * Xen doesn't set %esp to be precisely what the normal SYSENTER
> - * entry point expects, so fix it up before using the normal path.
> - */
> -ENTRY(xen_sysenter_target)
> addl$5*4, %esp  /* remove xen-provided frame 
> */
> jmp sysenter_past_esp
> -#endif
> -
> -ENTRY(entry_SYSENTER_32)
> +#else
> movlTSS_sysenter_sp0(%esp), %esp
> +#endif
>  sysenter_past_esp:
> pushl   $__USER_DS  /* pt_regs->ss */
> pushl   %ebp/* pt_regs->sp (stashed in bp) */


This won't work if you run a Xen enabled kernel on bare metal.  This
would work though:

ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
X86_FEATURE_XENPV

I haven't read the Xen hypervisor code, but what are those 5 words
that were pushed on the stack by the hypervisor?  It suspiciously is
the size of an IRET frame.  Considering that we don't use SYSEXIT on
Xen anymore, can we just redirect SYSENTER to the INT80 handler?
Perhaps even just disable 

Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Brian Gerst
On Sun, Mar 6, 2016 at 11:55 AM, Borislav Petkov  wrote:
> On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
>> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
>> if a user does SYSENTER with TF set, we will single-step through the
>> kernel until something clears TF.  There is absolutely nothing we can
>> do to prevent this short of turning off SYSENTER [1].
>>
>> Simplify the handling considerably with two changes:
>>
>> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>>add TF to that list of flags to sanitize with no overhead whatsoever.
>>
>> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
>>
>> That's all we need to do.
>>
>> Don't get too excited -- our handling is still buggy on 32-bit
>> kernels.  There's nothing wrong with the SYSENTER code itself, but
>> the #DB prologue has a clever fixup for traps on the very first
>> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
>> correctly.  The next two patches will fix that.
>>
>> [1] We could probably prevent it by forcing BTF on at all times and
>> making sure we clear TF before any branches in the SYSENTER
>> code.  Needless to say, this is a bad idea.
>>
>> Signed-off-by: Andy Lutomirski 
>> ---
>>  arch/x86/entry/entry_32.S| 42 ++--
>>  arch/x86/entry/entry_64_compat.S |  9 ++-
>>  arch/x86/include/asm/proto.h | 15 ++--
>>  arch/x86/kernel/traps.c  | 52 
>> +---
>>  4 files changed, 94 insertions(+), 24 deletions(-)
>>
>> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
>> index a8c3424c3392..7700cf695d8c 100644
>> --- a/arch/x86/entry/entry_32.S
>> +++ b/arch/x86/entry/entry_32.S
>> @@ -287,7 +287,26 @@ need_resched:
>>  END(resume_kernel)
>>  #endif
>>
>> - # SYSENTER  call handler stub
>> +GLOBAL(__begin_SYSENTER_singlestep_region)
>> +/*
>> + * All code from here through __end_SYSENTER_singlestep_region is subject
>> + * to being single-stepped if a user program sets TF and executes SYSENTER.
>> + * There is absolutely nothing that we can do to prevent this from happening
>> + * (thanks Intel!).  To keep our handling of this situation as simple as
>> + * possible, we handle TF just like AC and NT, except that our #DB handler
>> + * will ignore all of the single-step traps generated in this range.
>> + */
>> +
>> +#ifdef CONFIG_XEN
>> +/*
>> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
>> + * entry point expects, so fix it up before using the normal path.
>> + */
>> +ENTRY(xen_sysenter_target)
>> + addl$5*4, %esp  /* remove xen-provided frame */
>> + jmp sysenter_past_esp
>> +#endif
>> +
>>  ENTRY(entry_SYSENTER_32)
>>   movlTSS_sysenter_sp0(%esp), %esp
>>  sysenter_past_esp:
>
> Can you do this below (ontop of your diff) and get rid of those
> __{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
> globals and use the entry_SYSENTER_{32|compat} symbols instead?
>
> From a quick scan, kallsyms can give you the symbol size too so that you
> can compute where it ends:
>
> readelf -a vmlinux | grep entry_SYSENTER
>  55454: 8170aff099 FUNCGLOBAL DEFAULT1 
> entry_SYSENTER_compat
>  62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
> __end_entry_SYSENTER_comp
>
> 0x8170aff0 + 99 = 0x8170b053
>
> ---
> Index: b/arch/x86/entry/entry_32.S
> ===
> --- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
> +++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
> @@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
>   * will ignore all of the single-step traps generated in this range.
>   */
>
> +ENTRY(entry_SYSENTER_32)
>  #ifdef CONFIG_XEN
> -/*
> - * Xen doesn't set %esp to be precisely what the normal SYSENTER
> - * entry point expects, so fix it up before using the normal path.
> - */
> -ENTRY(xen_sysenter_target)
> addl$5*4, %esp  /* remove xen-provided frame 
> */
> jmp sysenter_past_esp
> -#endif
> -
> -ENTRY(entry_SYSENTER_32)
> +#else
> movlTSS_sysenter_sp0(%esp), %esp
> +#endif
>  sysenter_past_esp:
> pushl   $__USER_DS  /* pt_regs->ss */
> pushl   %ebp/* pt_regs->sp (stashed in bp) */


This won't work if you run a Xen enabled kernel on bare metal.  This
would work though:

ALTERNATIVE "movl TSS_sysenter_sp0(%esp), %esp", "addl $5*4, %esp",
X86_FEATURE_XENPV

I haven't read the Xen hypervisor code, but what are those 5 words
that were pushed on the stack by the hypervisor?  It suspiciously is
the size of an IRET frame.  Considering that we don't use SYSEXIT on
Xen anymore, can we just redirect SYSENTER to the INT80 handler?
Perhaps even just disable SYSENTER support in the VDSO on Xen.   I

Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Borislav Petkov
On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
> if a user does SYSENTER with TF set, we will single-step through the
> kernel until something clears TF.  There is absolutely nothing we can
> do to prevent this short of turning off SYSENTER [1].
> 
> Simplify the handling considerably with two changes:
> 
> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>add TF to that list of flags to sanitize with no overhead whatsoever.
> 
> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
> 
> That's all we need to do.
> 
> Don't get too excited -- our handling is still buggy on 32-bit
> kernels.  There's nothing wrong with the SYSENTER code itself, but
> the #DB prologue has a clever fixup for traps on the very first
> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
> correctly.  The next two patches will fix that.
> 
> [1] We could probably prevent it by forcing BTF on at all times and
> making sure we clear TF before any branches in the SYSENTER
> code.  Needless to say, this is a bad idea.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/entry/entry_32.S| 42 ++--
>  arch/x86/entry/entry_64_compat.S |  9 ++-
>  arch/x86/include/asm/proto.h | 15 ++--
>  arch/x86/kernel/traps.c  | 52 
> +---
>  4 files changed, 94 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a8c3424c3392..7700cf695d8c 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -287,7 +287,26 @@ need_resched:
>  END(resume_kernel)
>  #endif
>  
> - # SYSENTER  call handler stub
> +GLOBAL(__begin_SYSENTER_singlestep_region)
> +/*
> + * All code from here through __end_SYSENTER_singlestep_region is subject
> + * to being single-stepped if a user program sets TF and executes SYSENTER.
> + * There is absolutely nothing that we can do to prevent this from happening
> + * (thanks Intel!).  To keep our handling of this situation as simple as
> + * possible, we handle TF just like AC and NT, except that our #DB handler
> + * will ignore all of the single-step traps generated in this range.
> + */
> +
> +#ifdef CONFIG_XEN
> +/*
> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
> + * entry point expects, so fix it up before using the normal path.
> + */
> +ENTRY(xen_sysenter_target)
> + addl$5*4, %esp  /* remove xen-provided frame */
> + jmp sysenter_past_esp
> +#endif
> +
>  ENTRY(entry_SYSENTER_32)
>   movlTSS_sysenter_sp0(%esp), %esp
>  sysenter_past_esp:

Can you do this below (ontop of your diff) and get rid of those
__{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
globals and use the entry_SYSENTER_{32|compat} symbols instead?

>From a quick scan, kallsyms can give you the symbol size too so that you
can compute where it ends:

readelf -a vmlinux | grep entry_SYSENTER
 55454: 8170aff099 FUNCGLOBAL DEFAULT1 entry_SYSENTER_compat
 62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
__end_entry_SYSENTER_comp

0x8170aff0 + 99 = 0x8170b053

---
Index: b/arch/x86/entry/entry_32.S
===
--- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
+++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
@@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
  * will ignore all of the single-step traps generated in this range.
  */
 
+ENTRY(entry_SYSENTER_32)
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
addl$5*4, %esp  /* remove xen-provided frame */
jmp sysenter_past_esp
-#endif
-
-ENTRY(entry_SYSENTER_32)
+#else
movlTSS_sysenter_sp0(%esp), %esp
+#endif
 sysenter_past_esp:
pushl   $__USER_DS  /* pt_regs->ss */
pushl   %ebp/* pt_regs->sp (stashed in bp) */

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.


Re: [PATCH v2 07/10] x86/entry: Vastly simplify SYSENTER TF handling

2016-03-06 Thread Borislav Petkov
On Sat, Mar 05, 2016 at 09:52:20PM -0800, Andy Lutomirski wrote:
> Due to a blatant design error, SYSENTER doesn't clear TF.  As a result,
> if a user does SYSENTER with TF set, we will single-step through the
> kernel until something clears TF.  There is absolutely nothing we can
> do to prevent this short of turning off SYSENTER [1].
> 
> Simplify the handling considerably with two changes:
> 
> 1. We already sanitize EFLAGS in SYSENTER to clear NT and AC.  We can
>add TF to that list of flags to sanitize with no overhead whatsoever.
> 
> 2. Teach do_debug to ignore single-step traps in the SYSENTER prologue.
> 
> That's all we need to do.
> 
> Don't get too excited -- our handling is still buggy on 32-bit
> kernels.  There's nothing wrong with the SYSENTER code itself, but
> the #DB prologue has a clever fixup for traps on the very first
> instruction of entry_SYSENTER_32, and the fixup doesn't work quite
> correctly.  The next two patches will fix that.
> 
> [1] We could probably prevent it by forcing BTF on at all times and
> making sure we clear TF before any branches in the SYSENTER
> code.  Needless to say, this is a bad idea.
> 
> Signed-off-by: Andy Lutomirski 
> ---
>  arch/x86/entry/entry_32.S| 42 ++--
>  arch/x86/entry/entry_64_compat.S |  9 ++-
>  arch/x86/include/asm/proto.h | 15 ++--
>  arch/x86/kernel/traps.c  | 52 
> +---
>  4 files changed, 94 insertions(+), 24 deletions(-)
> 
> diff --git a/arch/x86/entry/entry_32.S b/arch/x86/entry/entry_32.S
> index a8c3424c3392..7700cf695d8c 100644
> --- a/arch/x86/entry/entry_32.S
> +++ b/arch/x86/entry/entry_32.S
> @@ -287,7 +287,26 @@ need_resched:
>  END(resume_kernel)
>  #endif
>  
> - # SYSENTER  call handler stub
> +GLOBAL(__begin_SYSENTER_singlestep_region)
> +/*
> + * All code from here through __end_SYSENTER_singlestep_region is subject
> + * to being single-stepped if a user program sets TF and executes SYSENTER.
> + * There is absolutely nothing that we can do to prevent this from happening
> + * (thanks Intel!).  To keep our handling of this situation as simple as
> + * possible, we handle TF just like AC and NT, except that our #DB handler
> + * will ignore all of the single-step traps generated in this range.
> + */
> +
> +#ifdef CONFIG_XEN
> +/*
> + * Xen doesn't set %esp to be precisely what the normal SYSENTER
> + * entry point expects, so fix it up before using the normal path.
> + */
> +ENTRY(xen_sysenter_target)
> + addl$5*4, %esp  /* remove xen-provided frame */
> + jmp sysenter_past_esp
> +#endif
> +
>  ENTRY(entry_SYSENTER_32)
>   movlTSS_sysenter_sp0(%esp), %esp
>  sysenter_past_esp:

Can you do this below (ontop of your diff) and get rid of those
__{begin,end}_SYSENTER_singlestep_region and __end_entry_SYSENTER_compat
globals and use the entry_SYSENTER_{32|compat} symbols instead?

>From a quick scan, kallsyms can give you the symbol size too so that you
can compute where it ends:

readelf -a vmlinux | grep entry_SYSENTER
 55454: 8170aff099 FUNCGLOBAL DEFAULT1 entry_SYSENTER_compat
 62596: 8170b053 0 NOTYPE  GLOBAL DEFAULT1 
__end_entry_SYSENTER_comp

0x8170aff0 + 99 = 0x8170b053

---
Index: b/arch/x86/entry/entry_32.S
===
--- a/arch/x86/entry/entry_32.S 2016-03-06 17:47:10.059733163 +0100
+++ b/arch/x86/entry/entry_32.S 2016-03-06 17:46:52.235733410 +0100
@@ -297,18 +297,13 @@ GLOBAL(__begin_SYSENTER_singlestep_regio
  * will ignore all of the single-step traps generated in this range.
  */
 
+ENTRY(entry_SYSENTER_32)
 #ifdef CONFIG_XEN
-/*
- * Xen doesn't set %esp to be precisely what the normal SYSENTER
- * entry point expects, so fix it up before using the normal path.
- */
-ENTRY(xen_sysenter_target)
addl$5*4, %esp  /* remove xen-provided frame */
jmp sysenter_past_esp
-#endif
-
-ENTRY(entry_SYSENTER_32)
+#else
movlTSS_sysenter_sp0(%esp), %esp
+#endif
 sysenter_past_esp:
pushl   $__USER_DS  /* pt_regs->ss */
pushl   %ebp/* pt_regs->sp (stashed in bp) */

-- 
Regards/Gruss,
Boris.

ECO tip #101: Trim your mails when you reply.