On Thu, Apr 30, 2015 at 02:39:07PM -0700, H. Peter Anvin wrote:
> This is the microbenchmark I used.
>
> For the record, Intel's intention going forward is that 0F 1F will
> always be as fast or faster than any other alternative.
It looks like this is the case on AMD too.
So I took your benchmark
On Thu, Apr 30, 2015 at 04:23:26PM -0700, H. Peter Anvin wrote:
> I probably should have added that the microbenchmark specifically tests
> for an atomic 5-byte NOP (as required by tracepoints etc.) If the
> requirement for 5-byte atomic is dropped there might be faster
> combinations, e.g. 66 66 66
On 04/30/2015 02:39 PM, H. Peter Anvin wrote:
> This is the microbenchmark I used.
>
> For the record, Intel's intention going forward is that 0F 1F will
> always be as fast or faster than any other alternative.
>
I probably should have added that the microbenchmark specifically tests
for an
This is the microbenchmark I used.
For the record, Intel's intention going forward is that 0F 1F will
always be as fast or faster than any other alternative.
-hpa
#define _GNU_SOURCE
#include <stdio.h>
#include <stdlib.h>
#include <time.h>
#include <stdbool.h>
#include <sys/time.h>
static void nop_p6(void)
{
	asm volatile(".rept
On Tue, Apr 28, 2015 at 10:16:33AM -0700, Linus Torvalds wrote:
> I suspect it might be related to things like getting performance
> counters and instruction debug traps etc right. There are quite
> possibly also simply constraints where the front end has to generate
> *something* just to keep the
On Tue, Apr 28, 2015 at 9:58 AM, Borislav Petkov wrote:
>
> Well, AFAIK, NOPs do require resources for tracking in the machine. I
> was hoping that hw would be smarter and discard at decode time but there
> probably are reasons that it can't be done (...yet).
I suspect it might be related to
On Tue, Apr 28, 2015 at 09:28:52AM -0700, Linus Torvalds wrote:
> On Tue, Apr 28, 2015 at 8:55 AM, Borislav Petkov wrote:
> >
> > Provided it is correct, it shows that the 0x66-prefixed 3-byte NOPs are
> > better than the 0F 1F 00 suggested by the manual (Haha!):
>
> That's which AMD CPU?
F16h.
On Tue, Apr 28, 2015 at 8:55 AM, Borislav Petkov wrote:
>
> Provided it is correct, it shows that the 0x66-prefixed 3-byte NOPs are
> better than the 0F 1F 00 suggested by the manual (Haha!):
That's which AMD CPU?
On my intel i7-4770S, they are the same cost (I cut down your loop
numbers by an
On Mon, Apr 27, 2015 at 01:14:51PM -0700, H. Peter Anvin wrote:
> I did a microbenchmark in user space... let's see if I can find it.
How about the simple one below?
Provided it is correct, it shows that the 0x66-prefixed 3-byte NOPs are
better than the 0F 1F 00 suggested by the manual (Haha!):
On Mon, Apr 27, 2015 at 09:45:12PM +0200, Borislav Petkov wrote:
> > Maybe you are measuring random noise.
>
> Yeah. Last exercise tomorrow. Let's see what those numbers would look
> like.
Right, so with Mel's help, I did a simple microbenchmark to measure how
many cycles a syscall (getpid()) needs
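The benchmark itself is not shown in this snippet. As a rough editorial sketch of the idea (timing a tight loop of getpid() system calls), the following C helper may be useful. The function name `getpid_ns_per_call` is hypothetical, and it reports nanoseconds from `clock_gettime()` rather than the cycle counts discussed in the thread, purely to stay portable; it assumes a Linux system.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <time.h>
#include <unistd.h>
#include <sys/syscall.h>

/*
 * Hypothetical helper, not taken from the thread: average wall-clock
 * cost of one getpid() system call, in nanoseconds. The thread counts
 * cycles; clock_gettime() is swapped in here for portability.
 */
double getpid_ns_per_call(long iterations)
{
	struct timespec start, end;

	assert(iterations > 0);

	clock_gettime(CLOCK_MONOTONIC, &start);
	for (long i = 0; i < iterations; i++)
		syscall(SYS_getpid);	/* raw syscall, avoids any libc caching */
	clock_gettime(CLOCK_MONOTONIC, &end);

	double elapsed_ns = (double)(end.tv_sec - start.tv_sec) * 1e9 +
			    (double)(end.tv_nsec - start.tv_nsec);

	return elapsed_ns / (double)iterations;
}
```

Pinning the process to one CPU (as done with `taskset -c 0` elsewhere in the thread) and repeating the run several times would reduce noise in a measurement like this.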
I did a microbenchmark in user space... let's see if I can find it.
On April 27, 2015 1:03:29 PM PDT, Borislav Petkov wrote:
>On Mon, Apr 27, 2015 at 12:59:11PM -0700, H. Peter Anvin wrote:
>> It really comes down to this: it seems older cores from both Intel
>> and AMD perform better with 66 66
On Mon, Apr 27, 2015 at 12:59:11PM -0700, H. Peter Anvin wrote:
> It really comes down to this: it seems older cores from both Intel
> and AMD perform better with 66 66 66 90, whereas the 0F 1F series is
> better on newer cores.
>
> When I measured it, the differences were sometimes dramatic.
How
It really comes down to this: it seems older cores from both Intel and AMD
perform better with 66 66 66 90, whereas the 0F 1F series is better on newer
cores.
When I measured it, the differences were sometimes dramatic.
On April 27, 2015 11:53:44 AM PDT, Borislav Petkov wrote:
>On Mon, Apr
On Mon, Apr 27, 2015 at 09:21:34PM +0200, Denys Vlasenko wrote:
> On 04/27/2015 09:11 PM, Borislav Petkov wrote:
> > A: 709.528485252 seconds time elapsed
> > ( +- 0.02% )
> > B: 708.976557288 seconds time elapsed
On 04/27/2015 09:11 PM, Borislav Petkov wrote:
> A: 709.528485252 seconds time elapsed
> ( +- 0.02% )
> B: 708.976557288 seconds time elapsed
> ( +- 0.04% )
> C: 709.312844791 seconds time elapsed
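To put the quoted elapsed times in proportion, the relative difference between two runs can be computed as below. This is an editorial sketch; the helper name `rel_diff_percent` is hypothetical and not part of the thread.

```c
#include <assert.h>

/*
 * Hypothetical helper: relative difference of measurement b against
 * baseline a, in percent. Negative means b took less time than a.
 */
double rel_diff_percent(double a, double b)
{
	assert(a != 0.0);
	return (b - a) / a * 100.0;
}
```

With the A and B figures quoted above, `rel_diff_percent(709.528485252, 708.976557288)` comes out at roughly -0.08%, which can then be compared against the +- 0.02% and +- 0.04% run-to-run variation reported by perf stat.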
On Mon, Apr 27, 2015 at 08:38:54PM +0200, Borislav Petkov wrote:
> I'm running them now and will report numbers relative to the last run
> once it is done. And those numbers should in practice get even better if
> we revert to the simpler canonical-ness check but let's see...
Results are done.
On Mon, Apr 27, 2015 at 11:47:30AM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov wrote:
> >
> > So our current NOP-infrastructure does ASM_NOP_MAX NOPs of 8 bytes so
> > without more invasive changes, our longest NOPs are 8 byte long and then
> > we have to
On Mon, Apr 27, 2015 at 11:12:05AM -0700, Linus Torvalds wrote:
> So if one or two cycles in this code doesn't matter, then why are we
> adding alternate instructions just to avoid a few ALU instructions and
> a conditional branch that predicts perfectly? And if it does matter,
> then the 6-byte
On Mon, Apr 27, 2015 at 11:38 AM, Borislav Petkov wrote:
>
> So our current NOP-infrastructure does ASM_NOP_MAX NOPs of 8 bytes so
> without more invasive changes, our longest NOPs are 8 byte long and then
> we have to repeat.
Btw (and I'm too lazy to check) do we take alignment into account?
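The "longest NOP is 8 bytes, then repeat" scheme described above can be sketched in plain C. This is an illustration only, not the kernel's code: `fill_nops`, `p6_nops` and `NOP_MAX` are hypothetical names, the table holds the Intel-recommended multi-byte NOP encodings (the 0F 1F forms from 3 bytes up), and alignment is deliberately not considered.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NOP_MAX 8	/* mirrors the 8-byte limit mentioned in the thread */

/* NOP encodings indexed by length in bytes; index 0 is unused. */
static const unsigned char p6_nops[NOP_MAX + 1][NOP_MAX] = {
	{ 0 },
	{ 0x90 },
	{ 0x66, 0x90 },
	{ 0x0f, 0x1f, 0x00 },
	{ 0x0f, 0x1f, 0x40, 0x00 },
	{ 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x66, 0x0f, 0x1f, 0x44, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x80, 0x00, 0x00, 0x00, 0x00 },
	{ 0x0f, 0x1f, 0x84, 0x00, 0x00, 0x00, 0x00, 0x00 },
};

/* Hypothetical helper: pad len bytes with the longest NOPs available. */
void fill_nops(unsigned char *buf, size_t len)
{
	assert(buf != NULL || len == 0);

	while (len > 0) {
		size_t n = len < NOP_MAX ? len : NOP_MAX;

		memcpy(buf, p6_nops[n], n);	/* emit one n-byte NOP */
		buf += n;
		len -= n;
	}
}
```

Padding 11 bytes this way yields one 8-byte NOP followed by one 3-byte NOP (0F 1F 00).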
On Mon, Apr 27, 2015 at 11:14:15AM -0700, Linus Torvalds wrote:
> Btw, please don't use the "more than three 66h overrides" version.
Oh yeah, a notorious "frontend choker".
> Sure, that's what the optimization manual suggests if you want
> single-instruction decode for all sizes up to 15 bytes,
On Mon, Apr 27, 2015 at 9:40 AM, Borislav Petkov wrote:
>
> Either way, the NOPs-version is faster and I'm running the test with the
> F16h-specific NOPs to see how they perform.
Btw, please don't use the "more than three 66h overrides" version.
Sure, that's what the optimization manual suggests
On Mon, Apr 27, 2015 at 9:12 AM, Denys Vlasenko wrote:
>
> It is smaller, but not by much. It is two instructions smaller.
Ehh. That's _half_.
And on a decoding side, it's the difference between 6 bytes that
decode cleanly and can be decoded in parallel with other things
(assuming the 6-byte
On Mon, Apr 27, 2015 at 09:00:08AM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 8:46 AM, Borislav Petkov wrote:
> >
> > Right, what about the false positives:
>
> Anybody who tries to return to kernel addresses with sysret is
> suspect. It's more likely to be an attack vector than
On 04/27/2015 04:57 PM, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
>>
>> /*
>> * Change top 16 bits to be the sign-extension of 47th bit, if this
>> * changed %rcx, it was not canonical.
>> */
>> ALTERNATIVE "", \
>>
On 04/27/2015 06:04 PM, Brian Gerst wrote:
> On Mon, Apr 27, 2015 at 11:56 AM, Andy Lutomirski wrote:
>> On Mon, Apr 27, 2015 at 8:46 AM, Borislav Petkov wrote:
>>> On Mon, Apr 27, 2015 at 07:57:36AM -0700, Linus Torvalds wrote:
On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
>
On Mon, Apr 27, 2015 at 11:56 AM, Andy Lutomirski wrote:
> On Mon, Apr 27, 2015 at 8:46 AM, Borislav Petkov wrote:
>> On Mon, Apr 27, 2015 at 07:57:36AM -0700, Linus Torvalds wrote:
>>> On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
>>> >
>>> > /*
>>> > * Change top 16
On Mon, Apr 27, 2015 at 8:46 AM, Borislav Petkov wrote:
>
> Right, what about the false positives:
Anybody who tries to return to kernel addresses with sysret is
suspect. It's more likely to be an attack vector than anything else
(ie somebody who is trying to take advantage of a CPU bug).
I
On Mon, Apr 27, 2015 at 8:46 AM, Borislav Petkov wrote:
> On Mon, Apr 27, 2015 at 07:57:36AM -0700, Linus Torvalds wrote:
>> On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
>> >
>> > /*
>> > * Change top 16 bits to be the sign-extension of 47th bit, if this
>> >
On Mon, Apr 27, 2015 at 07:57:36AM -0700, Linus Torvalds wrote:
> On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
> >
> > /*
> > * Change top 16 bits to be the sign-extension of 47th bit, if this
> > * changed %rcx, it was not canonical.
> > */
> >
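The canonical-ness check described in the quoted comment (sign-extend bit 47 into the top 16 bits and see whether the value changed) can be sketched in portable C. This is an editorial illustration, not the patch itself: the function names are hypothetical, and 48-bit virtual addresses are assumed, matching the `47+1` in the quoted shift count.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define VA_BITS 48	/* assumption: 48-bit virtual addresses, bit 47 is the sign bit */

/* Sign-extend bit 47 into bits 63:48 using only unsigned arithmetic. */
uint64_t sign_extend_48(uint64_t addr)
{
	const uint64_t sign = UINT64_C(1) << (VA_BITS - 1);
	const uint64_t low = addr & ((UINT64_C(1) << VA_BITS) - 1);

	return (low ^ sign) - sign;
}

/* An address is canonical if sign extension leaves it unchanged. */
bool is_canonical(uint64_t addr)
{
	return sign_extend_48(addr) == addr;
}

/* Usage examples; the addresses are illustrative values only. */
void canonical_examples(void)
{
	assert(is_canonical(UINT64_C(0x00007fffffffffff)));	/* top of lower half */
	assert(is_canonical(UINT64_C(0xffff800000000000)));	/* bottom of upper half */
	assert(!is_canonical(UINT64_C(0x0000800000000000)));	/* inside the hole */
}
```

The shl/sar pair in the assembly does the same sign extension in two instructions; the XOR-and-subtract form is used here only because right-shifting a negative signed value is implementation-defined in C.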
On Mon, Apr 27, 2015 at 08:06:16AM -0700, Linus Torvalds wrote:
> So maybe our AMD nop tables should be updated?
Ho-humm, we're using k8_nops on all 64-bit AMD. I better do some
opt-guide staring.
--
Regards/Gruss,
Boris.
ECO tip #101: Trim your mails when you reply.
On Mon, Apr 27, 2015 at 7:57 AM, Linus Torvalds wrote:
>
> ..end result is just six bytes. That way you can use alternative to
> replace it with one single noop on AMD.
Actually, it looks like we have no good 6-byte no-ops on AMD. So you'd
get two three-byte ones. Oh well. It's still better than
On Mon, Apr 27, 2015 at 4:35 AM, Borislav Petkov wrote:
>
> /*
> * Change top 16 bits to be the sign-extension of 47th bit, if this
> * changed %rcx, it was not canonical.
> */
> ALTERNATIVE "", \
> "shl $(64 - (47+1)), %rcx; \
>
On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
> I know it would be ugly, but would it be worth saving two bytes by
> using ALTERNATIVE "jmp 1f", "shl ...", ...?
Damn, it is actually visible even that saving the unconditional forward
JMP makes the numbers marginally nicer (E:
On Mon, Apr 27, 2015 at 02:08:40PM +0200, Denys Vlasenko wrote:
> > 819ef40c: 48 c1 e1 10 shl $0x10,%rcx
> > 819ef410: 48 c1 f9 10 sar $0x10,%rcx
> > 819ef414: 49 39 cb cmp %rcx,%r11
> > 819ef417:
On 04/27/2015 01:35 PM, Borislav Petkov wrote:
> On Mon, Apr 27, 2015 at 10:53:05AM +0200, Borislav Petkov wrote:
>> ALTERNATIVE "",
>> "shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx \
>> sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx \
>>
On Mon, Apr 27, 2015 at 10:53:05AM +0200, Borislav Petkov wrote:
> ALTERNATIVE "",
> "shl $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx \
> sar $(64 - (__VIRTUAL_MASK_SHIFT+1)), %rcx \
> cmpq %rcx, %r11 \
> jne
On Mon, Apr 27, 2015 at 12:07:14PM +0200, Denys Vlasenko wrote:
> /* Only three 0x66 prefixes for NOP for fast decode on all CPUs */
> ALTERNATIVE ".byte 0x66,0x66,0x66,0x90 \
> .byte 0x66,0x66,0x66,0x90 \
> .byte 0x66,0x66,0x66,0x90",
>
On 04/27/2015 10:53 AM, Borislav Petkov wrote:
> On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
>>> +#define X86_BUG_CANONICAL_RCX X86_BUG(8) /* SYSRET #GPs when %RCX
>>> non-canonical */
>>
>> I think that "sysret" should appear in the name.
>
> Yeah, I thought about it too,
On Sun, Apr 26, 2015 at 04:39:38PM -0700, Andy Lutomirski wrote:
> > +#define X86_BUG_CANONICAL_RCX X86_BUG(8) /* SYSRET #GPs when %RCX
> > non-canonical */
>
> I think that "sysret" should appear in the name.
Yeah, I thought about it too, will fix.
> Oh no! My laptop is currently bug-free,
>
> diff --git a/arch/x86/include/asm/cpufeature.h
> b/arch/x86/include/asm/cpufeature.h
> index 7ee9b94d9921..8d555b046fe9 100644
> --- a/arch/x86/include/asm/cpufeature.h
> +++ b/arch/x86/include/asm/cpufeature.h
> @@ -265,6 +265,7 @@
> #define X86_BUG_11AP X86_BUG(5) /* Bad local
On Fri, Apr 24, 2015 at 7:17 PM, Denys Vlasenko wrote:
> On Fri, Apr 24, 2015 at 10:50 PM, Andy Lutomirski wrote:
>> On Fri, Apr 24, 2015 at 1:46 PM, Denys Vlasenko
>>>> This might be way more trouble than it's worth.
>>>
>>> Exactly my feeling. What are you trying to save? About four CPU
>>>
On Fri, Apr 24, 2015 at 4:18 AM, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski wrote:
> Even if the issue affects SYSRETQ, it could be that we don't care. If
> the extent of the info leak is whether we context switched during a
> 64-bit syscall to a non-syscall
On Sat, Apr 25, 2015 at 11:12:06PM +0200, Borislav Petkov wrote:
> I've prepended the perf stat output with markers A:, B: or C: for easier
> comparing. The markers mean:
>
> A: Linus' master from a couple of days ago + tip/master + tip/x86/asm
> B: With Andy's SYSRET patch ontop
> C: Without RCX
On Thu, Apr 23, 2015 at 07:15:01PM -0700, Andy Lutomirski wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by replacing NULL
On Fri, Apr 24, 2015 at 10:50 PM, Andy Lutomirski wrote:
> On Fri, Apr 24, 2015 at 1:46 PM, Denys Vlasenko
>>> This might be way more trouble than it's worth.
>>
>> Exactly my feeling. What are you trying to save? About four CPU
>> cycles of checking %ss != __KERNEL_DS on each switch_to?
>>
On 04/24/2015 01:50 PM, Andy Lutomirski wrote:
>>
>> Exactly my feeling. What are you trying to save? About four CPU
>> cycles of checking %ss != __KERNEL_DS on each switch_to?
>> That's not worth bothering about. Your last patch seems to be perfect.
>
> We'll have to do the write to ss almost
On Fri, Apr 24, 2015 at 1:21 PM, Andy Lutomirski wrote:
>
> 2. SYSRETQ. The only way that I know of to see the problem is SYSRETQ
> followed by a far jump or return. This is presumably *extremely*
> rare.
>
> What if we fixed #2 up in do_stack_segment. We should double-check
> the docs, but I
On Fri, Apr 24, 2015 at 1:46 PM, Denys Vlasenko wrote:
> On Fri, Apr 24, 2015 at 10:21 PM, Andy Lutomirski wrote:
>> On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski wrote:
>>> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
>>> with SS == 0 results in an invalid usermode
On Fri, Apr 24, 2015 at 10:21 PM, Andy Lutomirski wrote:
> On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski wrote:
>> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
>> with SS == 0 results in an invalid usermode state in which SS is
>> apparently equal to __USER_DS but causes
On Thu, Apr 23, 2015 at 7:15 PM, Andy Lutomirski wrote:
> AMD CPUs don't reinitialize the SS descriptor on SYSRET, so SYSRET
> with SS == 0 results in an invalid usermode state in which SS is
> apparently equal to __USER_DS but causes #SS if used.
>
> Work around the issue by replacing NULL SS
On Fri, Apr 24, 2015 at 12:59:06PM +0200, Borislav Petkov wrote:
> Yeah, that makes more sense. So I tested Andy's patch but changed it as
> above and I get
>
> $ taskset -c 0 ./sysret_ss_attrs_32
> [RUN] Syscalls followed by SS validation
> [OK] We survived
Andy, you wanted the 64-bit