Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:53 PM, Brian Gerst  wrote:
> On Tue, Jul 21, 2015 at 10:12 PM, Andy Lutomirski  wrote:
>> On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst  wrote:
>>> On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski  wrote:
 modify_ldt has questionable locking and does not synchronize
 threads.  Improve it: redesign the locking and synchronize all
 threads' LDTs using an IPI on all modifications.
>>>
>>> What does this fix?  I can see sending an IPI if the LDT is
>>> reallocated, but on every update seems unnecessary.
>>>
>>
>> It prevents nastiness in which you're in user mode with an impossible
>> CS or SS, resulting in potentially interesting artifacts in
>> interrupts, NMIs, etc.
>
> By impossible, do you mean a partially updated descriptor when the
> interrupt occurs?  Would making sure that the descriptor is atomically
> updated (using set_64bit()) fix that?
>

I tried to exploit that once and didn't get very far.  If I could
cause the LDT to be populated one bit at a time, I think I could
materialize a call gate out of thin air.  The docs are unclear on
what, if anything, the memory ordering rules are.

I'm more concerned about the case where a segment register caches a
value that is no longer in the LDT.  If it's DS, ES, FS, or GS, it
results in nondeterministic behavior but is otherwise not a big deal.
If it's CS or SS, then an interrupt or exception will write a stack
frame with the selectors but IRET can fault.

If the interrupt is an NMI and certain other conditions are met and
your kernel is older than 4.2-rc3, then you should read the
CVE-2015-5157 advisory I'll send tomorrow :)  Even on 4.2-rc3, the NMI
code still struggles a bit if this happens.

With this patch, you can never be in user mode with CS or SS selectors
that point at the LDT but don't match the LDT.  That makes me a lot
more comfortable with modify_ldt.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Brian Gerst
On Tue, Jul 21, 2015 at 10:12 PM, Andy Lutomirski  wrote:
> On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst  wrote:
>> On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski  wrote:
>>> modify_ldt has questionable locking and does not synchronize
>>> threads.  Improve it: redesign the locking and synchronize all
>>> threads' LDTs using an IPI on all modifications.
>>
>> What does this fix?  I can see sending an IPI if the LDT is
>> reallocated, but on every update seems unnecessary.
>>
>
> It prevents nastiness in which you're in user mode with an impossible
> CS or SS, resulting in potentially interesting artifacts in
> interrupts, NMIs, etc.

By impossible, do you mean a partially updated descriptor when the
interrupt occurs?  Would making sure that the descriptor is atomically
updated (using set_64bit()) fix that?

--
Brian Gerst
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:04 PM, Boris Ostrovsky
 wrote:
>
>
> On 07/21/2015 08:49 PM, Andrew Cooper wrote:
>>
>> On 22/07/2015 01:28, Andy Lutomirski wrote:
>>>
>>> On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
>>>  wrote:

 On 22/07/2015 01:07, Andy Lutomirski wrote:
>
> On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
>  wrote:
>>
>> On 21/07/2015 22:53, Boris Ostrovsky wrote:
>>>
>>> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
#endif
  /*
 + * ldt_structs can be allocated, used, and freed, but they are
 never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};
>>>
>>> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>>>
>>> Jan, Andrew?
>>
>> PV guests are not permitted to have writeable mappings to the frames
>> making up the GDT and LDT, so it cannot make unaudited changes to
>> loadable descriptors.  In particular, for a 32bit PV guest, it is only
>> the segment limit which protects Xen from the ring1 guest kernel.
>>
>> A lot of this code hasn't been touched in years, and it certainly
>> predates me.  The alignment requirement appears to come from the
>> virtual
>> region Xen uses to map the guests GDT and LDT.  Strict alignment is
>> required for the GDT so Xen's descriptors starting at 0xe0xx are
>> correct, but the LDT alignment seems to be a side effect of similar
>> codepaths.
>>
>> For an LDT smaller than 8192 entries, I can't see any specific reason
>> for enforcing alignment, other than "that's the way it has always
>> been".
>>
>> However, the guest would still have to relinquish write access to all
>> frames which make up the LDT, which looks to be a bit of an issue
>> given
>> the snippet above.
>
> Does the LDT itself need to be aligned or just the address passed to
> paravirt_alloc_ldt?

 The address which Xen receives needs to be aligned.

 It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
 it is passed is page aligned, and passes it straight through.
>>>
>>> xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
>>> it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
>>> pointer to the ldt_struct.
>>
>> So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
>> audits to be page aligned.
>>
>> This will allow ldt_struct itself to be page aligned, and for the size
>> field to sit across the base/limit field of what would logically be
>> selector 0x0008  There would be some issues accessing size.  To load
>> frames as an LDT, a guest must drop all refs to the page so that its
>> type may be changed from writeable to segdesc.  After that, an
>> update_descriptor hypercall can be used to change size, and I believe
>> the guest may subsequently recreate read-only mappings to the frames
>> in
>> question (although frankly it is getting late so you will want to
>> double
>> check all of this).
>>
>> Anyhow, this looks like an issue which should be fixed up with
>> slightly
>> more PVOps, rather than enforcing a Xen view of the world on native
>> Linux.
>>
> I could presumably make the allocation the other way around so the
> size is at the end.  I could even use two separate allocations if
> needed.
>
>
> Why not wrap mm_context_t's ldt and size into a struct (just like ldt_struct
> but without __pad) and have a single allocation of ldt?
>
> I.e.
>
> struct ldt_struct {
> int size;
> struct desc_struct *entries;
> }
>
> --- a/arch/x86/include/asm/mmu.h
> +++ b/arch/x86/include/asm/mmu.h
> @@ -9,8 +9,7 @@
>* we put the segment information here.
>*/
>   typedef struct {
> -void *ldt;
> -int size;
> +struct ldt_struct ldt;
> #ifdef CONFIG_X86_64
>   /* True if mm supports a task running in 32 bit compatibility mode. */

I want the atomic read of both of them.  The current code make
interesting assumptions about ordering that may or may not be correct
but are certainly not obviously correct.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst  wrote:
> On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski  wrote:
>> modify_ldt has questionable locking and does not synchronize
>> threads.  Improve it: redesign the locking and synchronize all
>> threads' LDTs using an IPI on all modifications.
>
> What does this fix?  I can see sending an IPI if the LDT is
> reallocated, but on every update seems unnecessary.
>

It prevents nastiness in which you're in user mode with an impossible
CS or SS, resulting in potentially interesting artifacts in
interrupts, NMIs, etc.

> --
> Brian Gerst



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Boris Ostrovsky



On 07/21/2015 08:49 PM, Andrew Cooper wrote:

On 22/07/2015 01:28, Andy Lutomirski wrote:

On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
 wrote:

On 22/07/2015 01:07, Andy Lutomirski wrote:

On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 wrote:

On 21/07/2015 22:53, Boris Ostrovsky wrote:

On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
*mm) {}
   #endif
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+int size;
+int __pad;/* keep the descriptors naturally aligned. */
+struct desc_struct entries[];
+};

This breaks Xen which expects LDT to be page-aligned. Not sure why.

Jan, Andrew?

PV guests are not permitted to have writeable mappings to the frames
making up the GDT and LDT, so it cannot make unaudited changes to
loadable descriptors.  In particular, for a 32bit PV guest, it is only
the segment limit which protects Xen from the ring1 guest kernel.

A lot of this code hasn't been touched in years, and it certainly
predates me.  The alignment requirement appears to come from the virtual
region Xen uses to map the guests GDT and LDT.  Strict alignment is
required for the GDT so Xen's descriptors starting at 0xe0xx are
correct, but the LDT alignment seems to be a side effect of similar
codepaths.

For an LDT smaller than 8192 entries, I can't see any specific reason
for enforcing alignment, other than "that's the way it has always been".

However, the guest would still have to relinquish write access to all
frames which make up the LDT, which looks to be a bit of an issue given
the snippet above.

Does the LDT itself need to be aligned or just the address passed to
paravirt_alloc_ldt?

The address which Xen receives needs to be aligned.

It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
it is passed is page aligned, and passes it straight through.

xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
pointer to the ldt_struct.

So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
audits to be page aligned.


This will allow ldt_struct itself to be page aligned, and for the size
field to sit across the base/limit field of what would logically be
selector 0x0008  There would be some issues accessing size.  To load
frames as an LDT, a guest must drop all refs to the page so that its
type may be changed from writeable to segdesc.  After that, an
update_descriptor hypercall can be used to change size, and I believe
the guest may subsequently recreate read-only mappings to the frames in
question (although frankly it is getting late so you will want to double
check all of this).

Anyhow, this looks like an issue which should be fixed up with slightly
more PVOps, rather than enforcing a Xen view of the world on native Linux.


I could presumably make the allocation the other way around so the
size is at the end.  I could even use two separate allocations if
needed.


Why not wrap mm_context_t's ldt and size into a struct (just like 
ldt_struct but without __pad) and have a single allocation of ldt?


I.e.

struct ldt_struct {
int size;
struct desc_struct *entries;
}

--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
   * we put the segment information here.
   */
  typedef struct {
-void *ldt;
-int size;
+struct ldt_struct ldt;
#ifdef CONFIG_X86_64
  /* True if mm supports a task running in 32 bit compatibility 
mode. */



-boris


I suspect two separate allocations would be the better solution, as it
means that the size field doesn't need to be subject to funny page
permissions.

True.  OTOH we never write to the size field after allocating the thing.

Right, but even reading it is going to cause problems if one of the
paravirt ops can't re-establish ro mappings.

~Andrew

___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel


--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Brian Gerst
On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski  wrote:
> modify_ldt has questionable locking and does not synchronize
> threads.  Improve it: redesign the locking and synchronize all
> threads' LDTs using an IPI on all modifications.

What does this fix?  I can see sending an IPI if the LDT is
reallocated, but on every update seems unnecessary.

--
Brian Gerst
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 5:49 PM, Andrew Cooper
 wrote:
> On 22/07/2015 01:28, Andy Lutomirski wrote:
>> On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
>>  wrote:
>>> On 22/07/2015 01:07, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
  wrote:
> On 21/07/2015 22:53, Boris Ostrovsky wrote:
>> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
>>> --- a/arch/x86/include/asm/mmu_context.h
>>> +++ b/arch/x86/include/asm/mmu_context.h
>>> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
>>> *mm) {}
>>>   #endif
>>> /*
>>> + * ldt_structs can be allocated, used, and freed, but they are never
>>> + * modified while live.
>>> + */
>>> +struct ldt_struct {
>>> +int size;
>>> +int __pad;/* keep the descriptors naturally aligned. */
>>> +struct desc_struct entries[];
>>> +};
>>
>> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>>
>> Jan, Andrew?
> PV guests are not permitted to have writeable mappings to the frames
> making up the GDT and LDT, so it cannot make unaudited changes to
> loadable descriptors.  In particular, for a 32bit PV guest, it is only
> the segment limit which protects Xen from the ring1 guest kernel.
>
> A lot of this code hasn't been touched in years, and it certainly
> predates me.  The alignment requirement appears to come from the virtual
> region Xen uses to map the guests GDT and LDT.  Strict alignment is
> required for the GDT so Xen's descriptors starting at 0xe0xx are
> correct, but the LDT alignment seems to be a side effect of similar
> codepaths.
>
> For an LDT smaller than 8192 entries, I can't see any specific reason
> for enforcing alignment, other than "that's the way it has always been".
>
> However, the guest would still have to relinquish write access to all
> frames which make up the LDT, which looks to be a bit of an issue given
> the snippet above.
 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?
>>> The address which Xen receives needs to be aligned.
>>>
>>> It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
>>> it is passed is page aligned, and passes it straight through.
>> xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
>> it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
>> pointer to the ldt_struct.
>
> So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
> audits to be page aligned.
>
> This will allow ldt_struct itself to be page aligned, and for the size
> field to sit across the base/limit field of what would logically be
> selector 0x0008  There would be some issues accessing size.  To load
> frames as an LDT, a guest must drop all refs to the page so that its
> type may be changed from writeable to segdesc.  After that, an
> update_descriptor hypercall can be used to change size, and I believe
> the guest may subsequently recreate read-only mappings to the frames in
> question (although frankly it is getting late so you will want to double
> check all of this).
>
> Anyhow, this looks like an issue which should be fixed up with slightly
> more PVOps, rather than enforcing a Xen view of the world on native Linux.
>
 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.
>>> I suspect two separate allocations would be the better solution, as it
>>> means that the size field doesn't need to be subject to funny page
>>> permissions.
>> True.  OTOH we never write to the size field after allocating the thing.
>
> Right, but even reading it is going to cause problems if one of the
> paravirt ops can't re-establish ro mappings.

Does paravirt_alloc_ldt completely deny access or does it just set it RO?

--Andy

>
> ~Andrew



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 22/07/2015 01:28, Andy Lutomirski wrote:
> On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
>  wrote:
>> On 22/07/2015 01:07, Andy Lutomirski wrote:
>>> On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
>>>  wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
>> *mm) {}
>>   #endif
>> /*
>> + * ldt_structs can be allocated, used, and freed, but they are never
>> + * modified while live.
>> + */
>> +struct ldt_struct {
>> +int size;
>> +int __pad;/* keep the descriptors naturally aligned. */
>> +struct desc_struct entries[];
>> +};
>
> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>
> Jan, Andrew?
 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than "that's the way it has always been".

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.
>>> Does the LDT itself need to be aligned or just the address passed to
>>> paravirt_alloc_ldt?
>> The address which Xen receives needs to be aligned.
>>
>> It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
>> it is passed is page aligned, and passes it straight through.
> xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
> it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
> pointer to the ldt_struct.

So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
audits to be page aligned.

 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.

>>> I could presumably make the allocation the other way around so the
>>> size is at the end.  I could even use two separate allocations if
>>> needed.
>> I suspect two separate allocations would be the better solution, as it
>> means that the size field doesn't need to be subject to funny page
>> permissions.
> True.  OTOH we never write to the size field after allocating the thing.

Right, but even reading it is going to cause problems if one of the
paravirt ops can't re-establish ro mappings.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
 wrote:
> On 22/07/2015 01:07, Andy Lutomirski wrote:
>> On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
>>  wrote:
>>> On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
> --- a/arch/x86/include/asm/mmu_context.h
> +++ b/arch/x86/include/asm/mmu_context.h
> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
> *mm) {}
>   #endif
> /*
> + * ldt_structs can be allocated, used, and freed, but they are never
> + * modified while live.
> + */
> +struct ldt_struct {
> +int size;
> +int __pad;/* keep the descriptors naturally aligned. */
> +struct desc_struct entries[];
> +};


 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?
>>> PV guests are not permitted to have writeable mappings to the frames
>>> making up the GDT and LDT, so it cannot make unaudited changes to
>>> loadable descriptors.  In particular, for a 32bit PV guest, it is only
>>> the segment limit which protects Xen from the ring1 guest kernel.
>>>
>>> A lot of this code hasn't been touched in years, and it certainly
>>> predates me.  The alignment requirement appears to come from the virtual
>>> region Xen uses to map the guests GDT and LDT.  Strict alignment is
>>> required for the GDT so Xen's descriptors starting at 0xe0xx are
>>> correct, but the LDT alignment seems to be a side effect of similar
>>> codepaths.
>>>
>>> For an LDT smaller than 8192 entries, I can't see any specific reason
>>> for enforcing alignment, other than "that's the way it has always been".
>>>
>>> However, the guest would still have to relinquish write access to all
>>> frames which make up the LDT, which looks to be a bit of an issue given
>>> the snippet above.
>> Does the LDT itself need to be aligned or just the address passed to
>> paravirt_alloc_ldt?
>
> The address which Xen receives needs to be aligned.
>
> It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
> it is passed is page aligned, and passes it straight through.

xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
pointer to the ldt_struct.

>
>>
>>> I think I have a solution, but I doubt it is going to be very popular.
>>>
>>> * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
>>> backend can choose an alignment if needed
>>> * Make absolutely certain that __pad has the value 0 (so size and __pad
>>> combined don't look like a present descriptor)
>>> * Never hand selector 0x0008 to unsuspecting users.
>> Yuck.
>
> I actually meant 0x0004, but yes.  Very much yuck.
>
>>
>>> This will allow ldt_struct itself to be page aligned, and for the size
>>> field to sit across the base/limit field of what would logically be
>>> selector 0x0008  There would be some issues accessing size.  To load
>>> frames as an LDT, a guest must drop all refs to the page so that its
>>> type may be changed from writeable to segdesc.  After that, an
>>> update_descriptor hypercall can be used to change size, and I believe
>>> the guest may subsequently recreate read-only mappings to the frames in
>>> question (although frankly it is getting late so you will want to double
>>> check all of this).
>>>
>>> Anyhow, this looks like an issue which should be fixed up with slightly
>>> more PVOps, rather than enforcing a Xen view of the world on native Linux.
>>>
>> I could presumably make the allocation the other way around so the
>> size is at the end.  I could even use two separate allocations if
>> needed.
>
> I suspect two separate allocations would be the better solution, as it
> means that the size field doesn't need to be subject to funny page
> permissions.

True.  OTOH we never write to the size field after allocating the thing.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 22/07/2015 01:07, Andy Lutomirski wrote:
> On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
>  wrote:
>> On 21/07/2015 22:53, Boris Ostrovsky wrote:
>>> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};
>>>
>>>
>>> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>>>
>>> Jan, Andrew?
>> PV guests are not permitted to have writeable mappings to the frames
>> making up the GDT and LDT, so it cannot make unaudited changes to
>> loadable descriptors.  In particular, for a 32bit PV guest, it is only
>> the segment limit which protects Xen from the ring1 guest kernel.
>>
>> A lot of this code hasn't been touched in years, and it certainly
>> predates me.  The alignment requirement appears to come from the virtual
>> region Xen uses to map the guests GDT and LDT.  Strict alignment is
>> required for the GDT so Xen's descriptors starting at 0xe0xx are
>> correct, but the LDT alignment seems to be a side effect of similar
>> codepaths.
>>
>> For an LDT smaller than 8192 entries, I can't see any specific reason
>> for enforcing alignment, other than "that's the way it has always been".
>>
>> However, the guest would still have to relinquish write access to all
>> frames which make up the LDT, which looks to be a bit of an issue given
>> the snippet above.
> Does the LDT itself need to be aligned or just the address passed to
> paravirt_alloc_ldt?

The address which Xen receives needs to be aligned.

It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
it is passed is page aligned, and passes it straight through.

>
>> I think I have a solution, but I doubt it is going to be very popular.
>>
>> * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
>> backend can choose an alignment if needed
>> * Make absolutely certain that __pad has the value 0 (so size and __pad
>> combined don't look like a present descriptor)
>> * Never hand selector 0x0008 to unsuspecting users.
> Yuck.

I actually meant 0x0004, but yes.  Very much yuck.

>
>> This will allow ldt_struct itself to be page aligned, and for the size
>> field to sit across the base/limit field of what would logically be
>> selector 0x0008  There would be some issues accessing size.  To load
>> frames as an LDT, a guest must drop all refs to the page so that its
>> type may be changed from writeable to segdesc.  After that, an
>> update_descriptor hypercall can be used to change size, and I believe
>> the guest may subsequently recreate read-only mappings to the frames in
>> question (although frankly it is getting late so you will want to double
>> check all of this).
>>
>> Anyhow, this looks like an issue which should be fixed up with slightly
>> more PVOps, rather than enforcing a Xen view of the world on native Linux.
>>
> I could presumably make the allocation the other way around so the
> size is at the end.  I could even use two separate allocations if
> needed.

I suspect two separate allocations would be the better solution, as it
means that the size field doesn't need to be subject to funny page
permissions.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 wrote:
> On 21/07/2015 22:53, Boris Ostrovsky wrote:
>> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
>>> --- a/arch/x86/include/asm/mmu_context.h
>>> +++ b/arch/x86/include/asm/mmu_context.h
>>> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
>>> *mm) {}
>>>   #endif
>>> /*
>>> + * ldt_structs can be allocated, used, and freed, but they are never
>>> + * modified while live.
>>> + */
>>> +struct ldt_struct {
>>> +int size;
>>> +int __pad;/* keep the descriptors naturally aligned. */
>>> +struct desc_struct entries[];
>>> +};
>>
>>
>>
>> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>>
>> Jan, Andrew?
>
> PV guests are not permitted to have writeable mappings to the frames
> making up the GDT and LDT, so it cannot make unaudited changes to
> loadable descriptors.  In particular, for a 32bit PV guest, it is only
> the segment limit which protects Xen from the ring1 guest kernel.
>
> A lot of this code hasn't been touched in years, and it certainly
> predates me.  The alignment requirement appears to come from the virtual
> region Xen uses to map the guests GDT and LDT.  Strict alignment is
> required for the GDT so Xen's descriptors starting at 0xe0xx are
> correct, but the LDT alignment seems to be a side effect of similar
> codepaths.
>
> For an LDT smaller than 8192 entries, I can't see any specific reason
> for enforcing alignment, other than "that's the way it has always been".
>
> However, the guest would still have to relinquish write access to all
> frames which make up the LDT, which looks to be a bit of an issue given
> the snippet above.

Does the LDT itself need to be aligned or just the address passed to
paravirt_alloc_ldt?

>
> I think I have a solution, but I doubt it is going to be very popular.
>
> * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
> backend can choose an alignment if needed
> * Make absolutely certain that __pad has the value 0 (so size and __pad
> combined don't look like a present descriptor)
> * Never hand selector 0x0008 to unsuspecting users.

Yuck.

>
> This will allow ldt_struct itself to be page aligned, and for the size
> field to sit across the base/limit field of what would logically be
> selector 0x0008  There would be some issues accessing size.  To load
> frames as an LDT, a guest must drop all refs to the page so that its
> type may be changed from writeable to segdesc.  After that, an
> update_descriptor hypercall can be used to change size, and I believe
> the guest may subsequently recreate read-only mappings to the frames in
> question (although frankly it is getting late so you will want to double
> check all of this).
>
> Anyhow, this looks like an issue which should be fixed up with slightly
> more PVOps, rather than enforcing a Xen view of the world on native Linux.
>

I could presumably make the allocation the other way around so the
size is at the end.  I could even use two separate allocations if
needed.

--Andy
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 21/07/2015 22:53, Boris Ostrovsky wrote:
> On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
>> --- a/arch/x86/include/asm/mmu_context.h
>> +++ b/arch/x86/include/asm/mmu_context.h
>> @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
>> *mm) {}
>>   #endif
>> /*
>> + * ldt_structs can be allocated, used, and freed, but they are never
>> + * modified while live.
>> + */
>> +struct ldt_struct {
>> +int size;
>> +int __pad;/* keep the descriptors naturally aligned. */
>> +struct desc_struct entries[];
>> +};
>
>
>
> This breaks Xen which expects LDT to be page-aligned. Not sure why.
>
> Jan, Andrew?

PV guests are not permitted to have writeable mappings to the frames
making up the GDT and LDT, so it cannot make unaudited changes to
loadable descriptors.  In particular, for a 32bit PV guest, it is only
the segment limit which protects Xen from the ring1 guest kernel.

A lot of this code hasn't been touched in years, and it certainly
predates me.  The alignment requirement appears to come from the virtual
region Xen uses to map the guests GDT and LDT.  Strict alignment is
required for the GDT so Xen's descriptors starting at 0xe0xx are
correct, but the LDT alignment seems to be a side effect of similar
codepaths.

For an LDT smaller than 8192 entries, I can't see any specific reason
for enforcing alignment, other than "that's the way it has always been".

However, the guest would still have to relinquish write access to all
frames which make up the LDT, which looks to be a bit of an issue given
the snippet above.

I think I have a solution, but I doubt it is going to be very popular.

* Make a new paravirt hook for allocation of ldt_struct, so the paravirt
backend can choose an alignment if needed
* Make absolutely certain that __pad has the value 0 (so size and __pad
combined don't look like a present descriptor)
* Never hand selector 0x0008 to unsuspecting users.

This will allow ldt_struct itself to be page aligned, and for the size
field to sit across the base/limit field of what would logically be
selector 0x0008  There would be some issues accessing size.  To load
frames as an LDT, a guest must drop all refs to the page so that its
type may be changed from writeable to segdesc.  After that, an
update_descriptor hypercall can be used to change size, and I believe
the guest may subsequently recreate read-only mappings to the frames in
question (although frankly it is getting late so you will want to double
check all of this).

Anyhow, this looks like an issue which should be fixed up with slightly
more PVOps, rather than enforcing a Xen view of the world on native Linux.

~Andrew
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Boris Ostrovsky

On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
  arch/x86/include/asm/desc.h|  15 ---
  arch/x86/include/asm/mmu.h |   3 +-
  arch/x86/include/asm/mmu_context.h |  48 ++-
  arch/x86/kernel/cpu/common.c   |   4 +-
  arch/x86/kernel/cpu/perf_event.c   |  12 +-
  arch/x86/kernel/ldt.c  | 247 +++--
  arch/x86/kernel/process_64.c   |   4 +-
  arch/x86/kernel/step.c |   6 +-
  arch/x86/power/cpu.c   |   3 +-
  9 files changed, 192 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
  }
  
-/*

- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
  static inline unsigned long get_desc_base(const struct desc_struct *desc)
  {
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) 
<< 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
   * we put the segment information here.
   */
  typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
  
  #ifdef CONFIG_X86_64

/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 5e8daee7c5c9..1ff121fbf366 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
  #endif
  
  /*

+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   int size;
+   int __pad;  /* keep the descriptors naturally aligned. */
+   struct desc_struct entries[];
+};




This breaks Xen which expects LDT to be page-aligned. Not sure why.

Jan, Andrew?

-boris



+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+   DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
+   /* lockless_dereference synchronizes with smp_store_release */
+   ldt = lockless_dereference(mm->context.ldt);
+
+   /*
+* Any change to mm->context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to use RCU without extreme care.
+* We would still need IRQs off, because we don't want to change
+* the local LDT after an IPI loaded a newer value than the one
+* that we can see.
+*/
+
+   if (unlikely(ldt))
+   set_ldt(ldt->entries, ldt->size);
+   else
+   clear_LDT();
+}
+
+/*
   * Used for LDT copy/destruction.
   */
  int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +116,12 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
 * was called and then modify_ldt changed
 * prev->context.ldt but suppressed an IPI to this CPU.
 * In this case, prev->context.ldt != NULL, because we
-* never free an LDT while the mm still exists.  That
-* means that next->context.ldt != prev->context.ldt,
-* because mms never share an LDT.
+* never set context.ldt to NULL while the mm still
+* exists.  That means that next->context.ldt !=
+* prev->context.ldt, because mms never share an LDT.
 */
if (unlikely(prev->context.ldt != next->context.ldt))
-   load_LDT_nolock(>context);
+   load_mm_ldt(next);
}
  #ifdef CONFIG_SMP
  else {
@@ -106,7 +144,7 @@ static inline void switch_mm(struct mm_struct *prev, struct 
mm_struct *next,
  

[PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski 
---
 arch/x86/include/asm/desc.h|  15 ---
 arch/x86/include/asm/mmu.h |   3 +-
 arch/x86/include/asm/mmu_context.h |  48 ++-
 arch/x86/kernel/cpu/common.c   |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c  | 247 +++--
 arch/x86/kernel/process_64.c   |   4 +-
 arch/x86/kernel/step.c |   6 +-
 arch/x86/power/cpu.c   |   3 +-
 9 files changed, 192 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc->ldt, pc->size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
return (unsigned)(desc->base0 | ((desc->base1) << 16) | ((desc->base2) 
<< 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 5e8daee7c5c9..1ff121fbf366 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   int size;
+   int __pad;  /* keep the descriptors naturally aligned. */
+   struct desc_struct entries[];
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+   DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
+   /* lockless_dereference synchronizes with smp_store_release */
+   ldt = lockless_dereference(mm->context.ldt);
+
+   /*
+* Any change to mm->context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to use RCU without extreme care.
+* We would still need IRQs off, because we don't want to change
+* the local LDT after an IPI loaded a newer value than the one
+* that we can see.
+*/
+
+   if (unlikely(ldt))
+   set_ldt(ldt->entries, ldt->size);
+   else
+   clear_LDT();
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +116,12 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
 * was called and then modify_ldt changed
 * prev->context.ldt but suppressed an IPI to this CPU.
 * In this case, prev->context.ldt != NULL, because we
-* never free an LDT while the mm still exists.  That
-* means that next->context.ldt != prev->context.ldt,
-* because mms never share an LDT.
+* never set context.ldt to NULL while the mm still
+* exists.  That means that next->context.ldt !=
+* prev->context.ldt, because mms never share an LDT.
 */
if (unlikely(prev->context.ldt != next->context.ldt))
-   load_LDT_nolock(>context);
+   load_mm_ldt(next);
}
 #ifdef CONFIG_SMP
  else {
@@ -106,7 +144,7 @@ static inline void switch_mm(struct mm_struct *prev, struct 
mm_struct *next,
load_cr3(next->pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
TLB_FLUSH_ALL);
load_mm_cr4(next);
- 

Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
andrew.coop...@citrix.com wrote:
 On 22/07/2015 01:07, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};


 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?
 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.
 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?

 The address which Xen receives needs to be aligned.

 It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
 it is passed is page aligned, and passes it straight through.

xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
pointer to the ldt_struct.



 I think I have a solution, but I doubt it is going to be very popular.

 * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
 backend can choose an alignment if needed
 * Make absolutely certain that __pad has the value 0 (so size and __pad
 combined don't look like a present descriptor)
 * Never hand selector 0x0008 to unsuspecting users.
 Yuck.

 I actually meant 0x0004, but yes.  Very much yuck.


 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.

 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.

 I suspect two separate allocations would be the better solution, as it
 means that the size field doesn't need to be subject to funny page
 permissions.

True.  OTOH we never write to the size field after allocating the thing.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Brian Gerst
On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski l...@kernel.org wrote:
 modify_ldt has questionable locking and does not synchronize
 threads.  Improve it: redesign the locking and synchronize all
 threads' LDTs using an IPI on all modifications.

What does this fix?  I can see sending an IPI if the LDT is
reallocated, but on every update seems unnecessary.

--
Brian Gerst
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Brian Gerst
On Tue, Jul 21, 2015 at 10:12 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst brge...@gmail.com wrote:
 On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski l...@kernel.org wrote:
 modify_ldt has questionable locking and does not synchronize
 threads.  Improve it: redesign the locking and synchronize all
 threads' LDTs using an IPI on all modifications.

 What does this fix?  I can see sending an IPI if the LDT is
 reallocated, but on every update seems unnecessary.


 It prevents nastiness in which you're in user mode with an impossible
 CS or SS, resulting in potentially interesting artifacts in
 interrupts, NMIs, etc.

By impossible, do you mean a partially updated descriptor when the
interrupt occurs?  Would making sure that the descriptor is atomically
updated (using set_64bit()) fix that?

--
Brian Gerst
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 22/07/2015 01:07, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};


 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?
 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.
 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?

The address which Xen receives needs to be aligned.

It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
it is passed is page aligned, and passes it straight through.


 I think I have a solution, but I doubt it is going to be very popular.

 * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
 backend can choose an alignment if needed
 * Make absolutely certain that __pad has the value 0 (so size and __pad
 combined don't look like a present descriptor)
 * Never hand selector 0x0008 to unsuspecting users.
 Yuck.

I actually meant 0x0004, but yes.  Very much yuck.


 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.

 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.

I suspect two separate allocations would be the better solution, as it
means that the size field doesn't need to be subject to funny page
permissions.

~Andrew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [Xen-devel] [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Boris Ostrovsky



On 07/21/2015 08:49 PM, Andrew Cooper wrote:

On 22/07/2015 01:28, Andy Lutomirski wrote:

On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
andrew.coop...@citrix.com wrote:

On 22/07/2015 01:07, Andy Lutomirski wrote:

On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
andrew.coop...@citrix.com wrote:

On 21/07/2015 22:53, Boris Ostrovsky wrote:

On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
*mm) {}
   #endif
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+int size;
+int __pad;/* keep the descriptors naturally aligned. */
+struct desc_struct entries[];
+};

This breaks Xen which expects LDT to be page-aligned. Not sure why.

Jan, Andrew?

PV guests are not permitted to have writeable mappings to the frames
making up the GDT and LDT, so it cannot make unaudited changes to
loadable descriptors.  In particular, for a 32bit PV guest, it is only
the segment limit which protects Xen from the ring1 guest kernel.

A lot of this code hasn't been touched in years, and it certainly
predates me.  The alignment requirement appears to come from the virtual
region Xen uses to map the guests GDT and LDT.  Strict alignment is
required for the GDT so Xen's descriptors starting at 0xe0xx are
correct, but the LDT alignment seems to be a side effect of similar
codepaths.

For an LDT smaller than 8192 entries, I can't see any specific reason
for enforcing alignment, other than that's the way it has always been.

However, the guest would still have to relinquish write access to all
frames which make up the LDT, which looks to be a bit of an issue given
the snippet above.

Does the LDT itself need to be aligned or just the address passed to
paravirt_alloc_ldt?

The address which Xen receives needs to be aligned.

It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
it is passed is page aligned, and passes it straight through.

xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
pointer to the ldt_struct.

So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
audits to be page aligned.


This will allow ldt_struct itself to be page aligned, and for the size
field to sit across the base/limit field of what would logically be
selector 0x0008  There would be some issues accessing size.  To load
frames as an LDT, a guest must drop all refs to the page so that its
type may be changed from writeable to segdesc.  After that, an
update_descriptor hypercall can be used to change size, and I believe
the guest may subsequently recreate read-only mappings to the frames in
question (although frankly it is getting late so you will want to double
check all of this).

Anyhow, this looks like an issue which should be fixed up with slightly
more PVOps, rather than enforcing a Xen view of the world on native Linux.


I could presumably make the allocation the other way around so the
size is at the end.  I could even use two separate allocations if
needed.


Why not wrap mm_context_t's ldt and size into a struct (just like 
ldt_struct but without __pad) and have a single allocation of ldt?


I.e.

struct ldt_struct {
int size;
struct desc_struct *entries;
}

--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
   * we put the segment information here.
   */
  typedef struct {
-void *ldt;
-int size;
+struct ldt_struct ldt;
#ifdef CONFIG_X86_64
  /* True if mm supports a task running in 32 bit compatibility 
mode. */



-boris


I suspect two separate allocations would be the better solution, as it
means that the size field doesn't need to be subject to funny page
permissions.

True.  OTOH we never write to the size field after allocating the thing.

Right, but even reading it is going to cause problems if one of the
paravirt ops can't re-establish ro mappings.

~Andrew

___
Xen-devel mailing list
xen-de...@lists.xen.org
http://lists.xen.org/xen-devel


--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 22/07/2015 01:28, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 22/07/2015 01:07, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};

 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?
 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.
 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?
 The address which Xen receives needs to be aligned.

 It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
 it is passed is page aligned, and passes it straight through.
 xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
 it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
 pointer to the ldt_struct.

So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
audits to be page aligned.

 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.

 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.
 I suspect two separate allocations would be the better solution, as it
 means that the size field doesn't need to be subject to funny page
 permissions.
 True.  OTOH we never write to the size field after allocating the thing.

Right, but even reading it is going to cause problems if one of the
paravirt ops can't re-establish ro mappings.

~Andrew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:53 PM, Brian Gerst brge...@gmail.com wrote:
 On Tue, Jul 21, 2015 at 10:12 PM, Andy Lutomirski l...@amacapital.net wrote:
 On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst brge...@gmail.com wrote:
 On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski l...@kernel.org wrote:
 modify_ldt has questionable locking and does not synchronize
 threads.  Improve it: redesign the locking and synchronize all
 threads' LDTs using an IPI on all modifications.

 What does this fix?  I can see sending an IPI if the LDT is
 reallocated, but on every update seems unnecessary.


 It prevents nastiness in which you're in user mode with an impossible
 CS or SS, resulting in potentially interesting artifacts in
 interrupts, NMIs, etc.

 By impossible, do you mean a partially updated descriptor when the
 interrupt occurs?  Would making sure that the descriptor is atomically
 updated (using set_64bit()) fix that?


I tried to exploit that once and didn't get very far.  If I could
cause the LDT to be populated one bit at a time, I think I could
materialize a call gate out of thin air.  The docs are unclear on
what, if anything, the memory ordering rules are.

I'm more concerned about the case where a segment register caches a
value that is no longer in the LDT.  If it's DS, ES, FS, or GS, it
results in nondeterministic behavior but is otherwise not a big deal.
If it's CS or SS, then an interrupt or exception will write a stack
frame with the selectors but IRET can fault.

If the interrupt is an NMI and certain other conditions are met and
your kernel is older than 4.2-rc3, then you should read the
CVE-2015-5157 advisory I'll send tomorrow :)  Even on 4.2-rc3, the NMI
code still struggles a bit if this happens.

With this patch, you can never be in user mode with CS or SS selectors
that point at the LDT but don't match the LDT.  That makes me a lot
more comfortable with modify_ldt.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 5:49 PM, Andrew Cooper
andrew.coop...@citrix.com wrote:
 On 22/07/2015 01:28, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 22/07/2015 01:07, Andy Lutomirski wrote:
 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};

 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?
 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.
 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?
 The address which Xen receives needs to be aligned.

 It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
 it is passed is page aligned, and passes it straight through.
 xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
 it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
 pointer to the ldt_struct.

 So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
 audits to be page aligned.

 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.

 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.
 I suspect two separate allocations would be the better solution, as it
 means that the size field doesn't need to be subject to funny page
 permissions.
 True.  OTOH we never write to the size field after allocating the thing.

 Right, but even reading it is going to cause problems if one of the
 paravirt ops can't re-establish ro mappings.

Does paravirt_alloc_ldt completely deny access or does it just set it RO?

--Andy


 ~Andrew



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


[PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski l...@kernel.org
---
 arch/x86/include/asm/desc.h|  15 ---
 arch/x86/include/asm/mmu.h |   3 +-
 arch/x86/include/asm/mmu_context.h |  48 ++-
 arch/x86/kernel/cpu/common.c   |   4 +-
 arch/x86/kernel/cpu/perf_event.c   |  12 +-
 arch/x86/kernel/ldt.c  | 247 +++--
 arch/x86/kernel/process_64.c   |   4 +-
 arch/x86/kernel/step.c |   6 +-
 arch/x86/power/cpu.c   |   3 +-
 9 files changed, 192 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
 }
 
-/*
- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc-ldt, pc-size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
 static inline unsigned long get_desc_base(const struct desc_struct *desc)
 {
return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
  * we put the segment information here.
  */
 typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
 
 #ifdef CONFIG_X86_64
/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 5e8daee7c5c9..1ff121fbf366 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
 #endif
 
 /*
+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   int size;
+   int __pad;  /* keep the descriptors naturally aligned. */
+   struct desc_struct entries[];
+};
+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+   DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
+   /* lockless_dereference synchronizes with smp_store_release */
+   ldt = lockless_dereference(mm-context.ldt);
+
+   /*
+* Any change to mm-context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to use RCU without extreme care.
+* We would still need IRQs off, because we don't want to change
+* the local LDT after an IPI loaded a newer value than the one
+* that we can see.
+*/
+
+   if (unlikely(ldt))
+   set_ldt(ldt-entries, ldt-size);
+   else
+   clear_LDT();
+}
+
+/*
  * Used for LDT copy/destruction.
  */
 int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +116,12 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
 * was called and then modify_ldt changed
 * prev-context.ldt but suppressed an IPI to this CPU.
 * In this case, prev-context.ldt != NULL, because we
-* never free an LDT while the mm still exists.  That
-* means that next-context.ldt != prev-context.ldt,
-* because mms never share an LDT.
+* never set context.ldt to NULL while the mm still
+* exists.  That means that next-context.ldt !=
+* prev-context.ldt, because mms never share an LDT.
 */
if (unlikely(prev-context.ldt != next-context.ldt))
-   load_LDT_nolock(next-context);
+   load_mm_ldt(next);
}
 #ifdef CONFIG_SMP
  else {
@@ -106,7 +144,7 @@ static inline void switch_mm(struct mm_struct *prev, struct 
mm_struct *next,
load_cr3(next-pgd);
trace_tlb_flush(TLB_FLUSH_ON_TASK_SWITCH, 
TLB_FLUSH_ALL);
load_mm_cr4(next);
-

Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Boris Ostrovsky

On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

modify_ldt has questionable locking and does not synchronize
threads.  Improve it: redesign the locking and synchronize all
threads' LDTs using an IPI on all modifications.

This will dramatically slow down modify_ldt in multithreaded
programs, but there shouldn't be any multithreaded programs that
care about modify_ldt's performance in the first place.

Cc: sta...@vger.kernel.org
Signed-off-by: Andy Lutomirski l...@kernel.org
---
  arch/x86/include/asm/desc.h|  15 ---
  arch/x86/include/asm/mmu.h |   3 +-
  arch/x86/include/asm/mmu_context.h |  48 ++-
  arch/x86/kernel/cpu/common.c   |   4 +-
  arch/x86/kernel/cpu/perf_event.c   |  12 +-
  arch/x86/kernel/ldt.c  | 247 +++--
  arch/x86/kernel/process_64.c   |   4 +-
  arch/x86/kernel/step.c |   6 +-
  arch/x86/power/cpu.c   |   3 +-
  9 files changed, 192 insertions(+), 150 deletions(-)

diff --git a/arch/x86/include/asm/desc.h b/arch/x86/include/asm/desc.h
index a0bf89fd2647..4e10d73cf018 100644
--- a/arch/x86/include/asm/desc.h
+++ b/arch/x86/include/asm/desc.h
@@ -280,21 +280,6 @@ static inline void clear_LDT(void)
set_ldt(NULL, 0);
  }
  
-/*

- * load one particular LDT into the current CPU
- */
-static inline void load_LDT_nolock(mm_context_t *pc)
-{
-   set_ldt(pc-ldt, pc-size);
-}
-
-static inline void load_LDT(mm_context_t *pc)
-{
-   preempt_disable();
-   load_LDT_nolock(pc);
-   preempt_enable();
-}
-
  static inline unsigned long get_desc_base(const struct desc_struct *desc)
  {
return (unsigned)(desc-base0 | ((desc-base1)  16) | ((desc-base2) 
 24));
diff --git a/arch/x86/include/asm/mmu.h b/arch/x86/include/asm/mmu.h
index 09b9620a73b4..364d27481a52 100644
--- a/arch/x86/include/asm/mmu.h
+++ b/arch/x86/include/asm/mmu.h
@@ -9,8 +9,7 @@
   * we put the segment information here.
   */
  typedef struct {
-   void *ldt;
-   int size;
+   struct ldt_struct *ldt;
  
  #ifdef CONFIG_X86_64

/* True if mm supports a task running in 32 bit compatibility mode. */
diff --git a/arch/x86/include/asm/mmu_context.h 
b/arch/x86/include/asm/mmu_context.h
index 5e8daee7c5c9..1ff121fbf366 100644
--- a/arch/x86/include/asm/mmu_context.h
+++ b/arch/x86/include/asm/mmu_context.h
@@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct *mm) {}
  #endif
  
  /*

+ * ldt_structs can be allocated, used, and freed, but they are never
+ * modified while live.
+ */
+struct ldt_struct {
+   int size;
+   int __pad;  /* keep the descriptors naturally aligned. */
+   struct desc_struct entries[];
+};




This breaks Xen which expects LDT to be page-aligned. Not sure why.

Jan, Andrew?

-boris



+
+static inline void load_mm_ldt(struct mm_struct *mm)
+{
+   struct ldt_struct *ldt;
+   DEBUG_LOCKS_WARN_ON(!irqs_disabled());
+
+   /* lockless_dereference synchronizes with smp_store_release */
+   ldt = lockless_dereference(mm-context.ldt);
+
+   /*
+* Any change to mm-context.ldt is followed by an IPI to all
+* CPUs with the mm active.  The LDT will not be freed until
+* after the IPI is handled by all such CPUs.  This means that,
+* if the ldt_struct changes before we return, the values we see
+* will be safe, and the new values will be loaded before we run
+* any user code.
+*
+* NB: don't try to convert this to use RCU without extreme care.
+* We would still need IRQs off, because we don't want to change
+* the local LDT after an IPI loaded a newer value than the one
+* that we can see.
+*/
+
+   if (unlikely(ldt))
+   set_ldt(ldt-entries, ldt-size);
+   else
+   clear_LDT();
+}
+
+/*
   * Used for LDT copy/destruction.
   */
  int init_new_context(struct task_struct *tsk, struct mm_struct *mm);
@@ -78,12 +116,12 @@ static inline void switch_mm(struct mm_struct *prev, 
struct mm_struct *next,
 * was called and then modify_ldt changed
 * prev-context.ldt but suppressed an IPI to this CPU.
 * In this case, prev-context.ldt != NULL, because we
-* never free an LDT while the mm still exists.  That
-* means that next-context.ldt != prev-context.ldt,
-* because mms never share an LDT.
+* never set context.ldt to NULL while the mm still
+* exists.  That means that next-context.ldt !=
+* prev-context.ldt, because mms never share an LDT.
 */
if (unlikely(prev-context.ldt != next-context.ldt))
-   load_LDT_nolock(next-context);
+   load_mm_ldt(next);
}
  #ifdef CONFIG_SMP
  else {
@@ -106,7 +144,7 @@ static inline void switch_mm(struct mm_struct *prev, struct 
mm_struct *next,

Re: [Xen-devel] [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:04 PM, Boris Ostrovsky
boris.ostrov...@oracle.com wrote:


 On 07/21/2015 08:49 PM, Andrew Cooper wrote:

 On 22/07/2015 01:28, Andy Lutomirski wrote:

 On Tue, Jul 21, 2015 at 5:21 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:

 On 22/07/2015 01:07, Andy Lutomirski wrote:

 On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
 andrew.coop...@citrix.com wrote:

 On 21/07/2015 22:53, Boris Ostrovsky wrote:

 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:

 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
#endif
  /*
 + * ldt_structs can be allocated, used, and freed, but they are
 never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};

 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?

 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the
 virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always
 been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue
 given
 the snippet above.

 Does the LDT itself need to be aligned or just the address passed to
 paravirt_alloc_ldt?

 The address which Xen receives needs to be aligned.

 It looks like xen_alloc_ldt() blindly assumes that the desc_struct *ldt
 it is passed is page aligned, and passes it straight through.

 xen_alloc_ldt is just fiddling with protection though, I think.  Isn't
 it xen_set_ldt that's the meat?  We could easily pass xen_alloc_ldt a
 pointer to the ldt_struct.

 So it is.  It is the linear_addr in xen_set_ldt() which Xen currently
 audits to be page aligned.

 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames
 in
 question (although frankly it is getting late so you will want to
 double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with
 slightly
 more PVOps, rather than enforcing a Xen view of the world on native
 Linux.

 I could presumably make the allocation the other way around so the
 size is at the end.  I could even use two separate allocations if
 needed.


 Why not wrap mm_context_t's ldt and size into a struct (just like ldt_struct
 but without __pad) and have a single allocation of ldt?

 I.e.

 struct ldt_struct {
 int size;
 struct desc_struct *entries;
 }

 --- a/arch/x86/include/asm/mmu.h
 +++ b/arch/x86/include/asm/mmu.h
 @@ -9,8 +9,7 @@
* we put the segment information here.
*/
   typedef struct {
 -void *ldt;
 -int size;
 +struct ldt_struct ldt;
 #ifdef CONFIG_X86_64
   /* True if mm supports a task running in 32 bit compatibility mode. */

I want the atomic read of both of them.  The current code make
interesting assumptions about ordering that may or may not be correct
but are certainly not obviously correct.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 7:01 PM, Brian Gerst brge...@gmail.com wrote:
 On Tue, Jul 21, 2015 at 3:59 PM, Andy Lutomirski l...@kernel.org wrote:
 modify_ldt has questionable locking and does not synchronize
 threads.  Improve it: redesign the locking and synchronize all
 threads' LDTs using an IPI on all modifications.

 What does this fix?  I can see sending an IPI if the LDT is
 reallocated, but on every update seems unnecessary.


It prevents nastiness in which you're in user mode with an impossible
CS or SS, resulting in potentially interesting artifacts in
interrupts, NMIs, etc.

 --
 Brian Gerst



-- 
Andy Lutomirski
AMA Capital Management, LLC
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andrew Cooper
On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};



 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?

PV guests are not permitted to have writeable mappings to the frames
making up the GDT and LDT, so it cannot make unaudited changes to
loadable descriptors.  In particular, for a 32bit PV guest, it is only
the segment limit which protects Xen from the ring1 guest kernel.

A lot of this code hasn't been touched in years, and it certainly
predates me.  The alignment requirement appears to come from the virtual
region Xen uses to map the guests GDT and LDT.  Strict alignment is
required for the GDT so Xen's descriptors starting at 0xe0xx are
correct, but the LDT alignment seems to be a side effect of similar
codepaths.

For an LDT smaller than 8192 entries, I can't see any specific reason
for enforcing alignment, other than that's the way it has always been.

However, the guest would still have to relinquish write access to all
frames which make up the LDT, which looks to be a bit of an issue given
the snippet above.

I think I have a solution, but I doubt it is going to be very popular.

* Make a new paravirt hook for allocation of ldt_struct, so the paravirt
backend can choose an alignment if needed
* Make absolutely certain that __pad has the value 0 (so size and __pad
combined don't look like a present descriptor)
* Never hand selector 0x0008 to unsuspecting users.

This will allow ldt_struct itself to be page aligned, and for the size
field to sit across the base/limit field of what would logically be
selector 0x0008  There would be some issues accessing size.  To load
frames as an LDT, a guest must drop all refs to the page so that its
type may be changed from writeable to segdesc.  After that, an
update_descriptor hypercall can be used to change size, and I believe
the guest may subsequently recreate read-only mappings to the frames in
question (although frankly it is getting late so you will want to double
check all of this).

Anyhow, this looks like an issue which should be fixed up with slightly
more PVOps, rather than enforcing a Xen view of the world on native Linux.

~Andrew
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 1/3] x86/ldt: Make modify_ldt synchronous

2015-07-21 Thread Andy Lutomirski
On Tue, Jul 21, 2015 at 4:38 PM, Andrew Cooper
andrew.coop...@citrix.com wrote:
 On 21/07/2015 22:53, Boris Ostrovsky wrote:
 On 07/21/2015 03:59 PM, Andy Lutomirski wrote:
 --- a/arch/x86/include/asm/mmu_context.h
 +++ b/arch/x86/include/asm/mmu_context.h
 @@ -34,6 +34,44 @@ static inline void load_mm_cr4(struct mm_struct
 *mm) {}
   #endif
 /*
 + * ldt_structs can be allocated, used, and freed, but they are never
 + * modified while live.
 + */
 +struct ldt_struct {
 +int size;
 +int __pad;/* keep the descriptors naturally aligned. */
 +struct desc_struct entries[];
 +};



 This breaks Xen which expects LDT to be page-aligned. Not sure why.

 Jan, Andrew?

 PV guests are not permitted to have writeable mappings to the frames
 making up the GDT and LDT, so it cannot make unaudited changes to
 loadable descriptors.  In particular, for a 32bit PV guest, it is only
 the segment limit which protects Xen from the ring1 guest kernel.

 A lot of this code hasn't been touched in years, and it certainly
 predates me.  The alignment requirement appears to come from the virtual
 region Xen uses to map the guests GDT and LDT.  Strict alignment is
 required for the GDT so Xen's descriptors starting at 0xe0xx are
 correct, but the LDT alignment seems to be a side effect of similar
 codepaths.

 For an LDT smaller than 8192 entries, I can't see any specific reason
 for enforcing alignment, other than that's the way it has always been.

 However, the guest would still have to relinquish write access to all
 frames which make up the LDT, which looks to be a bit of an issue given
 the snippet above.

Does the LDT itself need to be aligned or just the address passed to
paravirt_alloc_ldt?


 I think I have a solution, but I doubt it is going to be very popular.

 * Make a new paravirt hook for allocation of ldt_struct, so the paravirt
 backend can choose an alignment if needed
 * Make absolutely certain that __pad has the value 0 (so size and __pad
 combined don't look like a present descriptor)
 * Never hand selector 0x0008 to unsuspecting users.

Yuck.


 This will allow ldt_struct itself to be page aligned, and for the size
 field to sit across the base/limit field of what would logically be
 selector 0x0008  There would be some issues accessing size.  To load
 frames as an LDT, a guest must drop all refs to the page so that its
 type may be changed from writeable to segdesc.  After that, an
 update_descriptor hypercall can be used to change size, and I believe
 the guest may subsequently recreate read-only mappings to the frames in
 question (although frankly it is getting late so you will want to double
 check all of this).

 Anyhow, this looks like an issue which should be fixed up with slightly
 more PVOps, rather than enforcing a Xen view of the world on native Linux.


I could presumably make the allocation the other way around so the
size is at the end.  I could even use two separate allocations if
needed.

--Andy
--
To unsubscribe from this list: send the line unsubscribe linux-kernel in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/