Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-24 Thread Andy Lutomirski
On Mon, Nov 21, 2016 at 1:21 PM, Linus Torvalds wrote:
> On Mon, Nov 21, 2016 at 10:26 AM, H. Peter Anvin  wrote:
>> On 11/21/16 10:00, Linus Torvalds wrote:
>>>
>>> I'd much rather we go back to just making the "cs" entry explicitly
>>> 16-bit, and have a separate padding entry, the way we used to long
>>> long ago.
>>>
>>
>> I would agree 100% with this.
>
> We _used_ to do it like this in some places (signal stack, other places):
>
> unsigned short  cs, __csh;

I'm testing a patch to do exactly this.  I didn't bother with the
fancy anonymous union stuff because I don't see any great reason that
anything needs to write the high bits.
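
A minimal sketch of what such a layout could look like, assuming a simplified three-slot exception frame. The names `frame_sketch` and `frame_from_user` are hypothetical and chosen for illustration only; this is not the patch under test:

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Hypothetical, simplified tail of a 32-bit exception frame.
 * Only cs/__csh mirror the declaration quoted above.
 */
struct frame_sketch {
	uint32_t ip;
	uint16_t cs;    /* selector: the 16 bits every CPU writes */
	uint16_t __csh; /* padding: may hold stale stack contents */
	uint32_t flags;
};

/* Splitting the slot must not move any other field. */
_Static_assert(offsetof(struct frame_sketch, cs) == 4, "cs starts the slot");
_Static_assert(offsetof(struct frame_sketch, flags) == 8, "slot is still 32 bits");
_Static_assert(sizeof(struct frame_sketch) == 12, "frame size unchanged");

/* Reads only the 16-bit selector, so garbage in __csh cannot leak in. */
static inline int frame_from_user(const struct frame_sketch *f)
{
	return (f->cs & 3) == 3; /* low two bits hold the privilege level */
}
```

With this shape, code that reads `cs` gets the selector without masking, and nothing ever has to look at `__csh`.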

Amusingly, grsecurity seems to contain a fix for one instance of this
bug on x86_32 and one instance on x86_64 (!).

--Andy


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-23 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> The SDM says:
> 
> If the source operand is an immediate of size less than the operand size, a
> sign-extended value is pushed on the stack. If the source operand is a segment
> register (16 bits) and the operand size is 64-bits, a zero-extended value is
> pushed on the stack; if the operand size is 32-bits, either a zero-extended
> value is pushed on the stack or the segment selector is written on the stack
> using a 16-bit move. For the last case, all recent Core and Atom processors
> perform a 16-bit move, leaving the upper portion of the stack location
> unmodified.
> 
> This makes me think that even new processors are quirky.

Oh well ...

Thanks,

Ingo


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-22 Thread Andy Lutomirski
On Tue, Nov 22, 2016 at 12:30 AM, Ingo Molnar  wrote:
>
> * Linus Torvalds  wrote:
>
>> On Sun, Nov 20, 2016 at 11:13 PM, Ingo Molnar  wrote:
>> >
>> > So I have applied your fix that addresses the worst fallout directly:
>> >
>> >   fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()
>> >
>> > ... but otherwise we might be better off zeroing out the high bits of segment
>> > registers stored on the stack, in all entry code pathways
>>
>> Ugh.
>>
>> I'd much rather we go back to just making the "cs" entry explicitly
>> 16-bit, and have a separate padding entry, the way we used to long
>> long ago.
>>
>> Or just rename it to something that you're not supposed to access
>> directly, and a helper accessor function that masks off the high bits.
>>
>> The entry code-paths are *much* more critical than any of the few user
>> codepaths.
>
> Absolutely, no arguments about that!
>
>> [...] Let's not add complexity to entry. Make the structure actually reflect
>> reality instead.
>
> So I have no problems at all with your suggestion either.
>
> I am still trying to semi-defend my suggestion as well, because if we do what I
> suggested:
>
>> > [...] so that the function call is patched out on modern CPUs.
>
> then it's essentially an opt-in quirk for really old CPUs and won't impact new
> CPUs, other than a single NOP for the patched out bits - and not even that on
> kernel builds with M686 or later or so ...
>
> I.e. the quirk essentially implements what new CPUs do (in C), and then all
> remaining code can just assume that all data is properly initialized/zeroed like
> on new CPUs and the effects of the quirk do not spread to data structures and
> code that handles and copies around those data structures - unless I'm missing
> something.

The SDM says:

If the source operand is an immediate of size less than the operand
size, a sign-extended value is pushed on the stack. If the source
operand is a segment register (16 bits) and the operand size is
64-bits, a zero-extended value is pushed on the stack; if the operand
size is 32-bits, either a zero-extended value is pushed on the stack
or the segment selector is written on the stack using a 16-bit move.
For the last case, all recent Core and Atom processors perform a
16-bit move, leaving the upper portion of the stack location
unmodified.

This makes me think that even new processors are quirky.
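
To make the two outcomes the SDM allows concrete, here is a small user-space model of a 32-bit stack slot. It is only a simulation under the assumption that the slot already holds stale data; the function names are hypothetical and no real push instruction is executed:

```c
#include <stdint.h>

/* Model 1: a full 32-bit store of the zero-extended selector. */
static void push_seg_zero_extended(uint32_t *slot, uint16_t sel)
{
	*slot = sel;
}

/* Model 2: a 16-bit move; the upper half keeps whatever was there. */
static void push_seg_16bit_move(uint32_t *slot, uint16_t sel)
{
	*slot = (*slot & 0xffff0000u) | sel;
}
```

Starting from a slot that contains 0xdeadbeef, the first model leaves 0x00000060 for selector 0x60 and the second leaves 0xdead0060, so only the low 16 bits can be trusted in both cases.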

--Andy


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-22 Thread Ingo Molnar

* Linus Torvalds  wrote:

> On Sun, Nov 20, 2016 at 11:13 PM, Ingo Molnar  wrote:
> >
> > So I have applied your fix that addresses the worst fallout directly:
> >
> >   fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()
> >
> > ... but otherwise we might be better off zeroing out the high bits of segment
> > registers stored on the stack, in all entry code pathways
> 
> Ugh.
> 
> I'd much rather we go back to just making the "cs" entry explicitly
> 16-bit, and have a separate padding entry, the way we used to long
> long ago.
> 
> Or just rename it to something that you're not supposed to access
> directly, and a helper accessor function that masks off the high bits.
> 
> The entry code-paths are *much* more critical than any of the few user
> codepaths.

Absolutely, no arguments about that!

> [...] Let's not add complexity to entry. Make the structure actually reflect 
> reality instead.

So I have no problems at all with your suggestion either.

I am still trying to semi-defend my suggestion as well, because if we do what I 
suggested:

> > [...] so that the function call is patched out on modern CPUs.

then it's essentially an opt-in quirk for really old CPUs and won't impact new 
CPUs, other than a single NOP for the patched out bits - and not even that on 
kernel builds with M686 or later or so ...

I.e. the quirk essentially implements what new CPUs do (in C), and then all
remaining code can just assume that all data is properly initialized/zeroed like
on new CPUs and the effects of the quirk do not spread to data structures and
code that handles and copies around those data structures - unless I'm missing
something.
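
A rough sketch of such a quirk in C, under stated assumptions: the struct, the flag and the function name are all hypothetical, and a plain boolean stands in for the boot-time patching that would remove the call on modern CPUs:

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical subset of saved segment slots, each 32 bits wide. */
struct seg_slots_sketch {
	uint32_t ds, es, fs, gs, cs, ss;
};

/*
 * Hypothetical flag standing in for boot-time patching: true only on
 * CPUs that leave the high half of a pushed selector unmodified.
 */
static bool cpu_has_stale_seg_high_bits;

/* Zero the high 16 bits of every saved segment slot on affected CPUs. */
static void fixup_seg_slots(struct seg_slots_sketch *r)
{
	if (!cpu_has_stale_seg_high_bits)
		return; /* the "patched out" case: nothing to do */

	r->ds &= 0xffff;
	r->es &= 0xffff;
	r->fs &= 0xffff;
	r->gs &= 0xffff;
	r->cs &= 0xffff;
	r->ss &= 0xffff;
}
```

The trade-off is that this work would run on every entry on the affected CPUs, whereas changing the structure costs nothing at entry time.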

Thanks,

Ingo


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread Linus Torvalds
On Mon, Nov 21, 2016 at 2:17 PM,   wrote:
>
> Now, segment loads have always ignored the top 32 bits; it's an issue when 
> examined by other kinds of code.

Yes. Particularly ptrace and signal information copying. Need to make
sure those things don't look at (or expose) high bits that may be
stale stack contents etc.
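
As an illustration of the concern, a copy routine along these lines would keep stale bits from reaching user space. Both structs and the function name are hypothetical stand-ins, not the kernel's real ptrace or signal code:

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical kernel-side frame: the high half of cs may be stale. */
struct kregs_sketch {
	uint32_t ip;
	uint32_t cs;
	uint32_t flags;
};

/* Hypothetical user-visible layout using the selector + padding split. */
struct uregs_sketch {
	uint32_t ip;
	uint16_t cs;
	uint16_t __cs;
	uint32_t flags;
};

static void export_regs(struct uregs_sketch *dst, const struct kregs_sketch *src)
{
	memset(dst, 0, sizeof(*dst));           /* no uninitialized bytes */
	dst->ip = src->ip;
	dst->cs = (uint16_t)(src->cs & 0xffff); /* keep only the selector */
	dst->__cs = 0;                          /* never copy the stale half */
	dst->flags = src->flags;
}
```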

   Linus


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread hpa
On November 21, 2016 1:21:35 PM PST, Linus Torvalds wrote:
>On Mon, Nov 21, 2016 at 10:26 AM, H. Peter Anvin  wrote:
>> On 11/21/16 10:00, Linus Torvalds wrote:
>>>
>>> I'd much rather we go back to just making the "cs" entry explicitly
>>> 16-bit, and have a separate padding entry, the way we used to long
>>> long ago.
>>>
>>
>> I would agree 100% with this.
>
>We _used_ to do it like this in some places (signal stack, other
>places):
>
>unsigned short  cs, __csh;
>
>and
>
>int xcs;
>
>in others (pt_regs, for example).
>
>You still see that "xcs" thing in the x86 uapi/asm/ptrace.h file (that's
>what our native pt_regs used to look like). And we still have
>that "cs+__cs" thing in at least 'struct user_regs_struct32'. But our
>"struct pt_regs" has lost it.
>
>I wonder why we broke that. I suspect it happened when we merged the
>64-bit and 32-bit files, but I was too lazy to try to pinpoint it.
>
>And I do think the original i386 model was better - exactly because it
>didn't access undefined state when you just accessed "cs". Either you
>had to know about it and it wasn't called 'cs' ("xcs") or you had that
>high/low separation.
>
>Of course, what might be better yet is to use an anonymous union, so
>that you can do both of the above for all the cases (ie access it both
>as a trustworthy low 16 bits, _and_ as a single 32-bit piece of
>information).
>
>We use anonymous unions all over now, we used to not do it because of
>compiler limitations.
>
>With an anonymous union, we could do something like
>
>    union {
>        unsigned int xcs;
>        unsigned short cs;
>    };
>
>and so easily access either the reliable part (cs) or the full word
>(xcs) without masking or having to play games.
>
>[ In fact, I think we could try to make the "cs" member in that union
>be marked "const", which should mean that we'd get warnings if
>somebody were to try to assign just the half-word (so you'd always
>have to *assign* to "xcs", but you'd be able to read "cs").
>
>  I think that has made it from C++ to C. I'm not sure that's
>something we can/should use, but it sounds potentially useful for
>these kinds of cases ]
>
>  Linus

Now, segment loads have always ignored the top 32 bits; it's an issue when 
examined by other kinds of code.


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread Linus Torvalds
On Mon, Nov 21, 2016 at 10:26 AM, H. Peter Anvin  wrote:
> On 11/21/16 10:00, Linus Torvalds wrote:
>>
>> I'd much rather we go back to just making the "cs" entry explicitly
>> 16-bit, and have a separate padding entry, the way we used to long
>> long ago.
>>
>
> I would agree 100% with this.

We _used_ to do it like this in some places (signal stack, other places):

unsigned short  cs, __csh;

and

int xcs;

in others (pt_regs, for example).

You still see that "xcs" thing in the x86 uapi/asm/ptrace.h file (that's
what our native pt_regs used to look like). And we still have
that "cs+__cs" thing in at least 'struct user_regs_struct32'. But our
"struct pt_regs" has lost it.

I wonder why we broke that. I suspect it happened when we merged the
64-bit and 32-bit files, but I was too lazy to try to pinpoint it.

And I do think the original i386 model was better - exactly because it
didn't access undefined state when you just accessed "cs". Either you
had to know about it and it wasn't called 'cs' ("xcs") or you had that
high/low separation.

Of course, what might be better yet is to use an anonymous union, so
that you can do both of the above for all the cases (ie access it both
as a trustworthy low 16 bits, _and_ as a single 32-bit piece of
information).

We use anonymous unions all over now, we used to not do it because of
compiler limitations.

With an anonymous union, we could do something like

    union {
        unsigned int xcs;
        unsigned short cs;
    };

and so easily access either the reliable part (cs) or the full word
(xcs) without masking or having to play games.
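
A compilable sketch of that union, assuming a little-endian target such as x86 (so that `cs` overlays the low half of `xcs`) and a C11 compiler for the anonymous union. The wrapper struct and the two helpers are hypothetical names added for illustration:

```c
#include <stdint.h>

#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ != __ORDER_LITTLE_ENDIAN__
#error "sketch assumes a little-endian layout, as on x86"
#endif

struct cs_slot_sketch {
	union {
		uint32_t xcs; /* whole slot, high half possibly stale */
		uint16_t cs;  /* low half: the selector itself */
	};
};

/* Reading the reliable part needs no masking. */
static inline uint16_t slot_selector(const struct cs_slot_sketch *s)
{
	return s->cs;
}

/* Writing goes through the full word so the high half is defined. */
static inline void slot_store(struct cs_slot_sketch *s, uint16_t sel)
{
	s->xcs = sel;
}
```

Reading `cs` after a slot was filled with 0xdead0060 yields 0x0060, and storing through `xcs` clears whatever the high half held before.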

[ In fact, I think we could try to make the "cs" member in that union
be marked "const", which should mean that we'd get warnings if
somebody were to try to assign just the half-word (so you'd always
have to *assign* to "xcs", but you'd be able to read "cs").

  I think that has made it from C++ to C. I'm not sure that's
something we can/should use, but it sounds potentially useful for
these kinds of cases ]

  Linus


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread H. Peter Anvin
On 11/21/16 10:00, Linus Torvalds wrote:
> 
> Ugh.
> 
> I'd much rather we go back to just making the "cs" entry explicitly
> 16-bit, and have a separate padding entry, the way we used to long
> long ago.
> 

I would agree 100% with this.

-hpa




Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread Linus Torvalds
On Mon, Nov 21, 2016 at 7:58 AM, H. Peter Anvin  wrote:
> On 11/20/16 20:54, h...@zytor.com wrote:
>>
>> I believe i686+ writes zero, older CPUs leave unchanged.
>
> I should point out that, at least from my memory, the same applies to
> instructions like "movl ".  I can't even remember for sure how the
> behavior differs between "movl ," and "movl ,";
> I'd have to do some digging.

I have this distinct feeling that there are issues with *both* the
register and memory versions.

Because I have this dim memory that on early microarchitectures, even
"mov segment to register" would always only do a 16-bit move, even if
it was encoded as a 32-bit "movl". Although that may be partly because
I know "gas" had some confusion about operand sizes and segment
register instructions, so there might have been toolchain issues too.

I just dug out my old 486 manual on _paper_ (Christ, I still had it):
"Intel486(tm) Microprocessor Family Programmer's Reference Manual".
The "mov" instruction is only documented for r/m16, and it has a
footnote saying "In protected mode, use 16-bit operand size prefix".

I definitely know that the "only write 16 bits" was the case for
memory accesses, but I think it might have been the case even for
register moves. After all, "mov segment register" is actually a
completely different instruction from the normal "mov" instructions,
even if it often shows up together with them in the instruction
descriptions.

The i686 cleaned up a lot of things, but I think this might be an area
where there were differences between i486 and Pentium and all the
clone chips too.

Linus


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread Linus Torvalds
On Sun, Nov 20, 2016 at 11:13 PM, Ingo Molnar  wrote:
>
> So I have applied your fix that addresses the worst fallout directly:
>
>   fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()
>
> ... but otherwise we might be better off zeroing out the high bits of segment
> registers stored on the stack, in all entry code pathways

Ugh.

I'd much rather we go back to just making the "cs" entry explicitly
16-bit, and have a separate padding entry, the way we used to long
long ago.

Or just rename it to something that you're not supposed to access
directly, and a helper accessor function that masks off the high bits.
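
A sketch of the accessor variant, with made-up names: `__cs_raw` stands for the renamed field and `regs_cs()` for the helper, neither of which is claimed to exist in the kernel under those names:

```c
#include <stdint.h>

/* Hypothetical frame in which the raw slot is not meant for direct use. */
struct regs_sketch {
	uint32_t ip;
	uint32_t __cs_raw; /* high 16 bits may be stale stack contents */
	uint32_t flags;
};

/* The only sanctioned way to read the selector: mask off the high bits. */
static inline uint16_t regs_cs(const struct regs_sketch *regs)
{
	return (uint16_t)(regs->__cs_raw & 0xffff);
}
```

The masking then lives in one place on the reader side, and the entry code does not change at all.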

The entry code-paths are *much* more critical than any of the few user
codepaths. Let's not add complexity to entry. Make the structure
actually reflect reality instead.

 Linus


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-21 Thread H. Peter Anvin
On 11/20/16 20:54, h...@zytor.com wrote:
> 
> I believe i686+ writes zero, older CPUs leave unchanged.
> 

I should point out that, at least from my memory, the same applies to
instructions like "movl ".  I can't even remember for sure how the
behavior differs between "movl ," and "movl ,";
I'd have to do some digging.

-hpa



Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-20 Thread Ingo Molnar

* Andy Lutomirski  wrote:

> On Sat, Nov 19, 2016 at 6:11 PM, Brian Gerst  wrote:
> > On Sat, Nov 19, 2016 at 8:52 PM, Andy Lutomirski  wrote:
> >> This is a question for the old-timers here, since I can't find
> >> anything resembling an answer in the SDM.
> >>
> >> Suppose an exception happens (#UD in this case, but I assume it
> >> doesn't really matter).  We're not in long mode, and the IDT is set up
> >> to deliver to a normal 32-bit kernel code segment.  We're running in
> >> that very same code segment when the exception hits, so no CPL change
> >> occurs and the TSS doesn't particularly matter.
> >>
> > >> The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
> >> happens to the high word of CS on the stack?
> >>
> >> The SDM appears to say nothing at all about this.  Modern systems
> >> (e.g. my laptop running in 32-bit legacy mode under KVM) appear to
> >> zero-extend CS.  But Matthew's 486DX appears to put garbage in the
> >> high bits (or maybe just leave whatever was already on the stack in
> >> place).
> >>
> >> Do any of you happen to know what's going on and when the behavior
> >> changed?  I'd like to know just how big of a problem this is.  Because
> >> if lots of CPUs work like Matthew's, we have lots of subtle bugs on
> >> them.
> >>
> >> --Andy
> >
> > > This came up a while back, and it was determined that we can't assume
> > zero-extension in 32-bit mode because older processors only do a
> > 16-bit write even on a 32-bit push.  So all segments have to be
> > treated as 16-bit values, or we have to explicitly zero-extend them.
> >
> > All 64-bit capable processors do zero-extend segments, even in 32-bit mode.
> 
> This almost makes me want to change the definition of pt_regs on
> 32-bit rather than fixing all the entry code.

So I have applied your fix that addresses the worst fallout directly:

  fc0e81b2bea0 x86/traps: Ignore high word of regs->cs in early_fixup_exception()

... but otherwise we might be better off zeroing out the high bits of segment
registers stored on the stack, in all entry code pathways - maybe using a single
function and conditional on

Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-20 Thread hpa
On November 19, 2016 5:52:57 PM PST, Andy Lutomirski  wrote:
>This is a question for the old-timers here, since I can't find
>anything resembling an answer in the SDM.
>
>Suppose an exception happens (#UD in this case, but I assume it
>doesn't really matter).  We're not in long mode, and the IDT is set up
>to deliver to a normal 32-bit kernel code segment.  We're running in
>that very same code segment when the exception hits, so no CPL change
>occurs and the TSS doesn't particularly matter.
>
>The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
>happens to the high word of CS on the stack?
>
>The SDM appears to say nothing at all about this.  Modern systems
>(e.g. my laptop running in 32-bit legacy mode under KVM) appear to
>zero-extend CS.  But Matthew's 486DX appears to put garbage in the
>high bits (or maybe just leave whatever was already on the stack in
>place).
>
>Do any of you happen to know what's going on and when the behavior
>changed?  I'd like to know just how big of a problem this is.  Because
>if lots of CPUs work like Matthew's, we have lots of subtle bugs on
>them.
>
>--Andy

I believe i686+ writes zero, older CPUs leave unchanged.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.
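
To make the two reported behaviours concrete, here is a minimal C sketch that
models the CS slot of a same-privilege 32-bit exception frame in ordinary
memory. It is only a model, not a probe of real hardware: the struct
`exc_frame32` and both helper names are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Same-privilege 32-bit exception frame without an error code (such as
 * #UD), lowest address first. */
struct exc_frame32 {
    uint32_t eip;
    uint32_t cs;      /* 32-bit slot that carries a 16-bit selector */
    uint32_t eflags;
};
static_assert(sizeof(struct exc_frame32) == 12, "three 32-bit slots");

/* Model of the older behaviour: only the low 16 bits of the slot are
 * written, so the high word keeps whatever was on the stack before. */
static inline void push_cs_16bit_write(struct exc_frame32 *f, uint16_t sel)
{
    f->cs = (f->cs & 0xffff0000u) | sel;
}

/* Model of the newer behaviour: the whole slot is written and the high
 * word ends up zero. */
static inline void push_cs_zero_extend(struct exc_frame32 *f, uint16_t sel)
{
    f->cs = sel;
}
```

Starting from a slot that holds stale data 0xdeadbeef, pushing selector 0x0060
leaves 0xdead0060 under the 16-bit-write model and 0x00000060 under the
zero-extending model.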


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-19 Thread Andy Lutomirski
On Sat, Nov 19, 2016 at 6:11 PM, Brian Gerst  wrote:
> On Sat, Nov 19, 2016 at 8:52 PM, Andy Lutomirski  wrote:
>> This is a question for the old-timers here, since I can't find
>> anything resembling an answer in the SDM.
>>
>> Suppose an exception happens (#UD in this case, but I assume it
>> doesn't really matter).  We're not in long mode, and the IDT is set up
>> to deliver to a normal 32-bit kernel code segment.  We're running in
>> that very same code segment when the exception hits, so no CPL change
>> occurs and the TSS doesn't particularly matter.
>>
>> The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
>> happens to the high word of CS on the stack?
>>
>> The SDM appears to say nothing at all about this.  Modern systems
>> (e.g. my laptop running in 32-bit legacy mode under KVM) appear to
>> zero-extend CS.  But Matthew's 486DX appears to put garbage in the
>> high bits (or maybe just leave whatever was already on the stack in
>> place).
>>
>> Do any of you happen to know what's going on and when the behavior
>> changed?  I'd like to know just how big of a problem this is.  Because
>> if lots of CPUs work like Matthew's, we have lots of subtle bugs on
>> them.
>>
>> --Andy
>
> This came up a while back, and it was determined that we can't assume
> zero-extension in 32-bit mode because older processors only do a
> 16-bit write even on a 32-bit push.  So all segments have to be
> treated as 16-bit values, or we have to explicitly zero-extend them.
>
> All 64-bit capable processors do zero-extend segments, even in 32-bit mode.

This almost makes me want to change the definition of pt_regs on
32-bit rather than fixing all the entry code.

>
> --
> Brian Gerst



-- 
Andy Lutomirski
AMA Capital Management, LLC
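
A hedged sketch of what changing the register-frame definition could look
like: the 32-bit cs slot is split into a 16-bit selector plus a 16-bit padding
field, so C code reading `cs` can never see the high word. The struct names
and the reduced field set are assumptions for illustration, not the real
pt_regs; the `cs, __csh` naming mirrors the style quoted earlier in the
thread. The layout relies on x86 being little-endian, so the selector occupies
the low-address half of the slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical "wide" layout: cs stored as a full 32-bit slot. */
struct regs_wide {
    uint32_t ip;
    uint32_t cs;
    uint32_t flags;
};

/* Hypothetical "split" layout: 16-bit selector plus explicit padding. */
struct regs_split {
    uint32_t ip;
    uint16_t cs;
    uint16_t __csh;   /* high word of the slot; contents are not trusted */
    uint32_t flags;
};

/* The split layout must not move anything the entry code pushes. */
static_assert(sizeof(struct regs_split) == sizeof(struct regs_wide),
              "same frame size");
static_assert(offsetof(struct regs_split, cs) == offsetof(struct regs_wide, cs),
              "cs stays at the same offset");
static_assert(offsetof(struct regs_split, flags) ==
              offsetof(struct regs_wide, flags),
              "flags stays at the same offset");

/* Interpret 12 raw frame bytes through the split layout and return cs. */
static inline uint16_t read_cs_split(const unsigned char raw[12])
{
    struct regs_split r;
    memcpy(&r, raw, sizeof r);
    return r.cs;
}
```

On a little-endian host, a frame whose cs slot bytes are 60 00 ad de (that is,
0xdead0060 as a 32-bit value) reads back as 0x0060 through `read_cs_split()`.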


Re: What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-19 Thread Brian Gerst
On Sat, Nov 19, 2016 at 8:52 PM, Andy Lutomirski  wrote:
> This is a question for the old-timers here, since I can't find
> anything resembling an answer in the SDM.
>
> Suppose an exception happens (#UD in this case, but I assume it
> doesn't really matter).  We're not in long mode, and the IDT is set up
> to deliver to a normal 32-bit kernel code segment.  We're running in
> that very same code segment when the exception hits, so no CPL change
> occurs and the TSS doesn't particularly matter.
>
> The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
> happens to the high word of CS on the stack?
>
> The SDM appears to say nothing at all about this.  Modern systems
> (e.g. my laptop running in 32-bit legacy mode under KVM) appear to
> zero-extend CS.  But Matthew's 486DX appears to put garbage in the
> high bits (or maybe just leave whatever was already on the stack in
> place).
>
> Do any of you happen to know what's going on and when the behavior
> changed?  I'd like to know just how big of a problem this is.  Because
> if lots of CPUs work like Matthew's, we have lots of subtle bugs on
> them.
>
> --Andy

This came up a while back, and it was determined that we can't assume
zero-extension in 32-bit mode because older processors only do a
16-bit write even on a 32-bit push.  So all segments have to be
treated as 16-bit values, or we have to explicitly zero-extend them.

All 64-bit capable processors do zero-extend segments, even in 32-bit mode.

--
Brian Gerst
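
As a small illustration of "treat segments as 16-bit values", the C sketch
below contrasts a comparison against the full 32-bit slot with one that
truncates to 16 bits first. The helper names and the selector value 0x60 are
assumptions chosen for the example, not taken from the kernel sources.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Example selector value, assumed for illustration only. */
#define EXAMPLE_KERNEL_CS 0x0060u

/* Truncating to 16 bits discards whatever the CPU left in the high word. */
static_assert((uint16_t)0xdead0060u == 0x0060u, "truncation keeps the low word");

/* Extract the selector from a saved 32-bit slot. */
static inline uint16_t seg_selector(uint32_t slot)
{
    return (uint16_t)slot;
}

/* Fragile: assumes the CPU zero-extended the selector when pushing it. */
static inline bool naive_is_kernel_cs(uint32_t slot)
{
    return slot == EXAMPLE_KERNEL_CS;
}

/* Robust: only the low 16 bits take part in the comparison. */
static inline bool is_kernel_cs(uint32_t slot)
{
    return seg_selector(slot) == EXAMPLE_KERNEL_CS;
}
```

With a slot value of 0xdead0060, `naive_is_kernel_cs()` returns false even
though the selector is 0x60, while `is_kernel_cs()` returns true.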


What exactly do 32-bit x86 exceptions push on the stack in the CS slot?

2016-11-19 Thread Andy Lutomirski
This is a question for the old-timers here, since I can't find
anything resembling an answer in the SDM.

Suppose an exception happens (#UD in this case, but I assume it
doesn't really matter).  We're not in long mode, and the IDT is set up
to deliver to a normal 32-bit kernel code segment.  We're running in
that very same code segment when the exception hits, so no CPL change
occurs and the TSS doesn't particularly matter.

The CPU will push EFLAGS, CS, and EIP.  Here's the question: what
happens to the high word of CS on the stack?

The SDM appears to say nothing at all about this.  Modern systems
(e.g. my laptop running in 32-bit legacy mode under KVM) appear to
zero-extend CS.  But Matthew's 486DX appears to put garbage in the
high bits (or maybe just leave whatever was already on the stack in
place).

Do any of you happen to know what's going on and when the behavior
changed?  I'd like to know just how big of a problem this is.  Because
if lots of CPUs work like Matthew's, we have lots of subtle bugs on
them.

--Andy

