Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-28 Thread Jeff Law
On 02/28/2018 02:39 AM, Steve Beattie wrote:
> Hi Jeff,
> 
> On Thu, Feb 22, 2018 at 10:10:13AM -0700, Jeff Law wrote:
>> A few notes.
>>
>> 1. It's not even clear at this time that retpolining user space binaries
>> makes any sense at all.   SO before doing anything to make this easier
>> I'd like to see a justification for why it's really needed.
> 
> Do you have a reference that gives evidence that retpolining user
> space is not needed or not preferred for x86? Everything that I've
> seen has suggested user space to user space attacks are possible,
> if difficult. And it does not seem likely that microcode updates will
> occur for all processor generations out there.
No reference, but that's general conclusion we've come to internally
with input from Intel.

Essentially you use the newly exposed capabilities from the microcode
updates to flush the state of the indirect predictor at context switch
time.  Combine that with the microcode change to stop sharing indirect
branch prediction state between SMT threads.

I'm being a bit imprecise, but that's the gist.


> 
>> 2. On the other hand, the existing thunk options do make it easier to
>> test independent of hte kernel.  ie, I can turn on inline thunks by
>> default and test things in user space (ie, do thunks generally work
>> properly).
> 
> If thunk-extern is to be the only maintained option, and its deemed
> sensible for user space in at least some situations, is there a
> preferred location for the thunks to end up?
Ideally you'd have one set of thunks per DSO.  Otherwise you end up
doing an indirect branch for the cross-DSO call to get to the thunk
which largely defeats the purpose of the thunks to begin with.


> 
> (I ask these questions because you can already find individual users
> recompiling apps important to them with retpoline options, and there
> is pressure (with associated deadlines) in some quarters to rebuild
> vast tracts of user space with retpolines for x86.)
I know.  I've given up trying to educate all of them.  I expected all
along that well intentioned, but ultimately clueless, folks would start
doing this kind of thing.

jeff



signature.asc
Description: OpenPGP digital signature


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-28 Thread Steve Beattie
Hi Jeff,

On Thu, Feb 22, 2018 at 10:10:13AM -0700, Jeff Law wrote:
> A few notes.
> 
> 1. It's not even clear at this time that retpolining user space binaries
> makes any sense at all.   SO before doing anything to make this easier
> I'd like to see a justification for why it's really needed.

Do you have a reference that gives evidence that retpolining user
space is not needed or not preferred for x86? Everything that I've
seen has suggested user space to user space attacks are possible,
if difficult. And it does not seem likely that microcode updates will
occur for all processor generations out there.

> 2. On the other hand, the existing thunk options do make it easier to
> test independent of hte kernel.  ie, I can turn on inline thunks by
> default and test things in user space (ie, do thunks generally work
> properly).

If thunk-extern is to be the only maintained option, and its deemed
sensible for user space in at least some situations, is there a
preferred location for the thunks to end up?

(I ask these questions because you can already find individual users
recompiling apps important to them with retpoline options, and there
is pressure (with associated deadlines) in some quarters to rebuild
vast tracts of user space with retpolines for x86.)

Thanks.

-- 
Steve Beattie

http://NxNW.org/~steve/


signature.asc
Description: PGP signature


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-27 Thread Jan Hubicka
> On Mon, Feb 26, 2018 at 8:54 AM, Jan Hubicka  wrote:
> >> On Mon, Feb 26, 2018 at 7:47 AM, Jan Hubicka  wrote:
> >> > Hi,
> >> > my main concern about the patch is that we now have 
> >> > -mindirect-branch=thunk-extern
> >> > which is intended to work well and is used by kernel, but we also have 
> >> > other modes
> >> > that are documented and as such they should work but they may lead to 
> >> > invalid
> >> > unwind info (or did I miss anything imporant here?).
> >> > Why we can't fix the others as well?
> >> >
> >>
> >> I took a closer look at my commit message.  It does leave an impression 
> >> that
> >> only  -mindirect-branch=thunk-extern is fixed.  But in fact, all
> >> -mindirect-branch=
> >> choices are fixed.
> >
> > I see, sorry for confussion!
> >>
> >> 1.  -mindirect-branch=thunk generates:
> >>
> >>.cfi_startproc
> >> pushq   %rbx
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 3, -16
> >> movq(%rdi), %rax
> >> movq%rdi, %rbx
> >> movq16(%rax), %rax
> >> call__x86_indirect_thunk_rax
> >> movq(%rbx), %rax
> >> movq%rbx, %rdi
> >> popq%rbx
> >> .cfi_def_cfa_offset 8
> >> movq16(%rax), %rax
> >> jmp __x86_indirect_thunk_rax
> >> .cfi_endproc
> >>
> >> 2.  -mindirect-branch=thunk-inline generates:
> >>
> >>.cfi_startproc
> >> pushq   %rbx
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 3, -16
> >> movq(%rdi), %rax
> >> movq%rdi, %rbx
> >> movq16(%rax), %rax
> >> jmp .LIND1
> >> .LIND0:
> >> call.LIND3
> >> .LIND2:
> >> pause
> >> lfence
> >> jmp .LIND2
> >> .LIND3:
> >> mov %rax, (%rsp)
> >> ret
> >> .LIND1:
> >> call.LIND0
> >> movq(%rbx), %rax
> >> movq%rbx, %rdi
> >> popq%rbx
> >> .cfi_def_cfa_offset 8
> >> movq16(%rax), %rax
> >> call.LIND5
> >> .LIND4:
> >> pause
> >> lfence
> >> jmp .LIND4
> >> .LIND5:
> >> mov %rax, (%rsp)
> >> ret
> >> .cfi_endproc
> >>
> >> I updated the commit message with
> >>
> >> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
> >> branch via register whenever -mindirect-branch= is used.
> >>
> >> OK for trunk?
> >>
> >> Thanks.
> >>
> >> --
> >> H.J.
> >
> >> From bd0672bd070da6fa4ff630540c1d87df3e8fdd53 Mon Sep 17 00:00:00 2001
> >> From: "H.J. Lu" 
> >> Date: Fri, 26 Jan 2018 15:54:25 -0800
> >> Subject: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER
> >>
> >> For
> >>
> >> ---
> >> struct C {
> >>   virtual ~C();
> >>   virtual void f();
> >> };
> >>
> >> void
> >> f (C *p)
> >> {
> >>   p->f();
> >>   p->f();
> >> }
> >> ---
> >>
> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> >>
> >> _Z1fP1C:
> >> .LFB0:
> >> .cfi_startproc
> >> pushq   %rbx
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 3, -16
> >> movq(%rdi), %rax
> >> movq%rdi, %rbx
> >> jmp .LIND1
> >> .LIND0:
> >> pushq   16(%rax)
> >> jmp __x86_indirect_thunk
> >> .LIND1:
> >> call.LIND0
> >> movq(%rbx), %rax
> >> movq%rbx, %rdi
> >> popq%rbx
> >> .cfi_def_cfa_offset 8
> >> movq16(%rax), %rax
> >> jmp __x86_indirect_thunk_rax
> >> .cfi_endproc
> >>
> >> x86-64 is supposed to have asynchronous unwind tables by default, but
> >> there is nothing that reflects the change in the (relative) frame
> >> address after .LIND0.  That region really has to be moved outside of
> >> the .cfi_startproc/.cfi_endproc bracket.
> >>
> >> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
> >> branch via register whenever -mindirect-branch= is used.  Now,
> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> >>
> >> _Z1fP1C:
> >> .LFB0:
> >>   .cfi_startproc
> >>   pushq   %rbx
> >>   .cfi_def_cfa_offset 16
> >>   .cfi_offset 3, -16
> >>   movq(%rdi), %rax
> >>   movq%rdi, %rbx
> >>   movq16(%rax), %rax
> >>   call__x86_indirect_thunk_rax
> >>   movq(%rbx), %rax
> >>   movq%rbx, %rdi
> >>   popq%rbx
> >>   .cfi_def_cfa_offset 8
> >>   movq16(%rax), %rax
> >>   jmp __x86_indirect_thunk_rax
> >>   .cfi_endproc
> >>
> >> so that "-mindirect-branch=thunk-extern" is equivalent to
> >> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
> >> used by Linux kernel.
> >>
> >> gcc/
> >>
> >>   PR target/84039
> >>   * config/i386/constraints.md (Bs): Replace
> >>   ix86_indirect_branch_register with
> >>   TARGET_INDIRECT_BRANCH_REGISTER.
> >>   (Bw): Likewise.
> >>   * 

Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-27 Thread H.J. Lu
On Mon, Feb 26, 2018 at 8:54 AM, Jan Hubicka  wrote:
>> On Mon, Feb 26, 2018 at 7:47 AM, Jan Hubicka  wrote:
>> > Hi,
>> > my main concern about the patch is that we now have 
>> > -mindirect-branch=thunk-extern
>> > which is intended to work well and is used by kernel, but we also have 
>> > other modes
>> > that are documented and as such they should work but they may lead to 
>> > invalid
>> > unwind info (or did I miss anything imporant here?).
>> > Why we can't fix the others as well?
>> >
>>
>> I took a closer look at my commit message.  It does leave an impression that
>> only  -mindirect-branch=thunk-extern is fixed.  But in fact, all
>> -mindirect-branch=
>> choices are fixed.
>
> I see, sorry for confussion!
>>
>> 1.  -mindirect-branch=thunk generates:
>>
>>.cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> movq16(%rax), %rax
>> call__x86_indirect_thunk_rax
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> 2.  -mindirect-branch=thunk-inline generates:
>>
>>.cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> movq16(%rax), %rax
>> jmp .LIND1
>> .LIND0:
>> call.LIND3
>> .LIND2:
>> pause
>> lfence
>> jmp .LIND2
>> .LIND3:
>> mov %rax, (%rsp)
>> ret
>> .LIND1:
>> call.LIND0
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> call.LIND5
>> .LIND4:
>> pause
>> lfence
>> jmp .LIND4
>> .LIND5:
>> mov %rax, (%rsp)
>> ret
>> .cfi_endproc
>>
>> I updated the commit message with
>>
>> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
>> branch via register whenever -mindirect-branch= is used.
>>
>> OK for trunk?
>>
>> Thanks.
>>
>> --
>> H.J.
>
>> From bd0672bd070da6fa4ff630540c1d87df3e8fdd53 Mon Sep 17 00:00:00 2001
>> From: "H.J. Lu" 
>> Date: Fri, 26 Jan 2018 15:54:25 -0800
>> Subject: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER
>>
>> For
>>
>> ---
>> struct C {
>>   virtual ~C();
>>   virtual void f();
>> };
>>
>> void
>> f (C *p)
>> {
>>   p->f();
>>   p->f();
>> }
>> ---
>>
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>> .cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> jmp .LIND1
>> .LIND0:
>> pushq   16(%rax)
>> jmp __x86_indirect_thunk
>> .LIND1:
>> call.LIND0
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> x86-64 is supposed to have asynchronous unwind tables by default, but
>> there is nothing that reflects the change in the (relative) frame
>> address after .LIND0.  That region really has to be moved outside of
>> the .cfi_startproc/.cfi_endproc bracket.
>>
>> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
>> branch via register whenever -mindirect-branch= is used.  Now,
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>>   .cfi_startproc
>>   pushq   %rbx
>>   .cfi_def_cfa_offset 16
>>   .cfi_offset 3, -16
>>   movq(%rdi), %rax
>>   movq%rdi, %rbx
>>   movq16(%rax), %rax
>>   call__x86_indirect_thunk_rax
>>   movq(%rbx), %rax
>>   movq%rbx, %rdi
>>   popq%rbx
>>   .cfi_def_cfa_offset 8
>>   movq16(%rax), %rax
>>   jmp __x86_indirect_thunk_rax
>>   .cfi_endproc
>>
>> so that "-mindirect-branch=thunk-extern" is equivalent to
>> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
>> used by Linux kernel.
>>
>> gcc/
>>
>>   PR target/84039
>>   * config/i386/constraints.md (Bs): Replace
>>   ix86_indirect_branch_register with
>>   TARGET_INDIRECT_BRANCH_REGISTER.
>>   (Bw): Likewise.
>>   * config/i386/i386.md (indirect_jump): Likewise.
>>   (tablejump): Likewise.
>>   (*sibcall_memory): Likewise.
>>   (*sibcall_value_memory): Likewise.
>>   Peepholes of indirect call and jump via memory: Likewise.
>>   (*sibcall_GOT_32): Disallowed for TARGET_INDIRECT_BRANCH_REGISTER.
>>   (*sibcall_value_GOT_32): Likewise.

Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-26 Thread Jan Hubicka
> On Mon, Feb 26, 2018 at 7:47 AM, Jan Hubicka  wrote:
> > Hi,
> > my main concern about the patch is that we now have 
> > -mindirect-branch=thunk-extern
> > which is intended to work well and is used by kernel, but we also have 
> > other modes
> > that are documented and as such they should work but they may lead to 
> > invalid
> > unwind info (or did I miss anything imporant here?).
> > Why we can't fix the others as well?
> >
> 
> I took a closer look at my commit message.  It does leave an impression that
> only  -mindirect-branch=thunk-extern is fixed.  But in fact, all
> -mindirect-branch=
> choices are fixed.

I see, sorry for confussion!
> 
> 1.  -mindirect-branch=thunk generates:
> 
>.cfi_startproc
> pushq   %rbx
> .cfi_def_cfa_offset 16
> .cfi_offset 3, -16
> movq(%rdi), %rax
> movq%rdi, %rbx
> movq16(%rax), %rax
> call__x86_indirect_thunk_rax
> movq(%rbx), %rax
> movq%rbx, %rdi
> popq%rbx
> .cfi_def_cfa_offset 8
> movq16(%rax), %rax
> jmp __x86_indirect_thunk_rax
> .cfi_endproc
> 
> 2.  -mindirect-branch=thunk-inline generates:
> 
>.cfi_startproc
> pushq   %rbx
> .cfi_def_cfa_offset 16
> .cfi_offset 3, -16
> movq(%rdi), %rax
> movq%rdi, %rbx
> movq16(%rax), %rax
> jmp .LIND1
> .LIND0:
> call.LIND3
> .LIND2:
> pause
> lfence
> jmp .LIND2
> .LIND3:
> mov %rax, (%rsp)
> ret
> .LIND1:
> call.LIND0
> movq(%rbx), %rax
> movq%rbx, %rdi
> popq%rbx
> .cfi_def_cfa_offset 8
> movq16(%rax), %rax
> call.LIND5
> .LIND4:
> pause
> lfence
> jmp .LIND4
> .LIND5:
> mov %rax, (%rsp)
> ret
> .cfi_endproc
> 
> I updated the commit message with
> 
> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
> branch via register whenever -mindirect-branch= is used.
> 
> OK for trunk?
> 
> Thanks.
> 
> -- 
> H.J.

> From bd0672bd070da6fa4ff630540c1d87df3e8fdd53 Mon Sep 17 00:00:00 2001
> From: "H.J. Lu" 
> Date: Fri, 26 Jan 2018 15:54:25 -0800
> Subject: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER
> 
> For
> 
> ---
> struct C {
>   virtual ~C();
>   virtual void f();
> };
> 
> void
> f (C *p)
> {
>   p->f();
>   p->f();
> }
> ---
> 
> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> 
> _Z1fP1C:
> .LFB0:
> .cfi_startproc
> pushq   %rbx
> .cfi_def_cfa_offset 16
> .cfi_offset 3, -16
> movq(%rdi), %rax
> movq%rdi, %rbx
> jmp .LIND1
> .LIND0:
> pushq   16(%rax)
> jmp __x86_indirect_thunk
> .LIND1:
> call.LIND0
> movq(%rbx), %rax
> movq%rbx, %rdi
> popq%rbx
> .cfi_def_cfa_offset 8
> movq16(%rax), %rax
> jmp __x86_indirect_thunk_rax
> .cfi_endproc
> 
> x86-64 is supposed to have asynchronous unwind tables by default, but
> there is nothing that reflects the change in the (relative) frame
> address after .LIND0.  That region really has to be moved outside of
> the .cfi_startproc/.cfi_endproc bracket.
> 
> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
> branch via register whenever -mindirect-branch= is used.  Now,
> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> 
> _Z1fP1C:
> .LFB0:
>   .cfi_startproc
>   pushq   %rbx
>   .cfi_def_cfa_offset 16
>   .cfi_offset 3, -16
>   movq(%rdi), %rax
>   movq%rdi, %rbx
>   movq16(%rax), %rax
>   call__x86_indirect_thunk_rax
>   movq(%rbx), %rax
>   movq%rbx, %rdi
>   popq%rbx
>   .cfi_def_cfa_offset 8
>   movq16(%rax), %rax
>   jmp __x86_indirect_thunk_rax
>   .cfi_endproc
> 
> so that "-mindirect-branch=thunk-extern" is equivalent to
> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
> used by Linux kernel.
> 
> gcc/
> 
>   PR target/84039
>   * config/i386/constraints.md (Bs): Replace
>   ix86_indirect_branch_register with
>   TARGET_INDIRECT_BRANCH_REGISTER.
>   (Bw): Likewise.
>   * config/i386/i386.md (indirect_jump): Likewise.
>   (tablejump): Likewise.
>   (*sibcall_memory): Likewise.
>   (*sibcall_value_memory): Likewise.
>   Peepholes of indirect call and jump via memory: Likewise.
>   (*sibcall_GOT_32): Disallowed for TARGET_INDIRECT_BRANCH_REGISTER.
>   (*sibcall_value_GOT_32): Likewise.
>   * config/i386/i386.opt: Likewise.
>   * config/i386/predicates.md (indirect_branch_operand): Likewise.
>   (GOT_memory_operand): Likewise.
>   (call_insn_operand): Likewise.
>   

Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-26 Thread H.J. Lu
On Mon, Feb 26, 2018 at 7:47 AM, Jan Hubicka  wrote:
> Hi,
> my main concern about the patch is that we now have 
> -mindirect-branch=thunk-extern
> which is intended to work well and is used by kernel, but we also have other 
> modes
> that are documented and as such they should work but they may lead to invalid
> unwind info (or did I miss anything imporant here?).
> Why we can't fix the others as well?
>

I took a closer look at my commit message.  It does leave an impression that
only  -mindirect-branch=thunk-extern is fixed.  But in fact, all
-mindirect-branch=
choices are fixed.

1.  -mindirect-branch=thunk generates:

   .cfi_startproc
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq(%rdi), %rax
movq%rdi, %rbx
movq16(%rax), %rax
call__x86_indirect_thunk_rax
movq(%rbx), %rax
movq%rbx, %rdi
popq%rbx
.cfi_def_cfa_offset 8
movq16(%rax), %rax
jmp __x86_indirect_thunk_rax
.cfi_endproc

2.  -mindirect-branch=thunk-inline generates:

   .cfi_startproc
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq(%rdi), %rax
movq%rdi, %rbx
movq16(%rax), %rax
jmp .LIND1
.LIND0:
call.LIND3
.LIND2:
pause
lfence
jmp .LIND2
.LIND3:
mov %rax, (%rsp)
ret
.LIND1:
call.LIND0
movq(%rbx), %rax
movq%rbx, %rdi
popq%rbx
.cfi_def_cfa_offset 8
movq16(%rax), %rax
call.LIND5
.LIND4:
pause
lfence
jmp .LIND4
.LIND5:
mov %rax, (%rsp)
ret
.cfi_endproc

I updated the commit message with

This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
branch via register whenever -mindirect-branch= is used.

OK for trunk?

Thanks.

-- 
H.J.
From bd0672bd070da6fa4ff630540c1d87df3e8fdd53 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 26 Jan 2018 15:54:25 -0800
Subject: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

For

---
struct C {
  virtual ~C();
  virtual void f();
};

void
f (C *p)
{
  p->f();
  p->f();
}
---

-mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:

_Z1fP1C:
.LFB0:
.cfi_startproc
pushq   %rbx
.cfi_def_cfa_offset 16
.cfi_offset 3, -16
movq(%rdi), %rax
movq%rdi, %rbx
jmp .LIND1
.LIND0:
pushq   16(%rax)
jmp __x86_indirect_thunk
.LIND1:
call.LIND0
movq(%rbx), %rax
movq%rbx, %rdi
popq%rbx
.cfi_def_cfa_offset 8
movq16(%rax), %rax
jmp __x86_indirect_thunk_rax
.cfi_endproc

x86-64 is supposed to have asynchronous unwind tables by default, but
there is nothing that reflects the change in the (relative) frame
address after .LIND0.  That region really has to be moved outside of
the .cfi_startproc/.cfi_endproc bracket.

This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
branch via register whenever -mindirect-branch= is used.  Now,
-mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:

_Z1fP1C:
.LFB0:
	.cfi_startproc
	pushq	%rbx
	.cfi_def_cfa_offset 16
	.cfi_offset 3, -16
	movq	(%rdi), %rax
	movq	%rdi, %rbx
	movq	16(%rax), %rax
	call	__x86_indirect_thunk_rax
	movq	(%rbx), %rax
	movq	%rbx, %rdi
	popq	%rbx
	.cfi_def_cfa_offset 8
	movq	16(%rax), %rax
	jmp	__x86_indirect_thunk_rax
	.cfi_endproc

so that "-mindirect-branch=thunk-extern" is equivalent to
"-mindirect-branch=thunk-extern -mindirect-branch-register", which is
used by Linux kernel.

gcc/

	PR target/84039
	* config/i386/constraints.md (Bs): Replace
	ix86_indirect_branch_register with
	TARGET_INDIRECT_BRANCH_REGISTER.
	(Bw): Likewise.
	* config/i386/i386.md (indirect_jump): Likewise.
	(tablejump): Likewise.
	(*sibcall_memory): Likewise.
	(*sibcall_value_memory): Likewise.
	Peepholes of indirect call and jump via memory: Likewise.
	(*sibcall_GOT_32): Disallowed for TARGET_INDIRECT_BRANCH_REGISTER.
	(*sibcall_value_GOT_32): Likewise.
	* config/i386/i386.opt: Likewise.
	* config/i386/predicates.md (indirect_branch_operand): Likewise.
	(GOT_memory_operand): Likewise.
	(call_insn_operand): Likewise.
	(sibcall_insn_operand): Likewise.
	(GOT32_symbol_operand): Likewise.
	* config/i386/i386.h (TARGET_INDIRECT_BRANCH_REGISTER): New.

gcc/testsuite/

	PR target/84039
	* gcc.target/i386/indirect-thunk-1.c: Updated.
	* gcc.target/i386/indirect-thunk-2.c: Likewise.
	* gcc.target/i386/indirect-thunk-3.c: Likewise.
	* gcc.target/i386/indirect-thunk-4.c: Likewise.
	* gcc.target/i386/indirect-thunk-5.c: Likewise.
	* gcc.target/i386/indirect-thunk-6.c: Likewise.
	* gcc.target/i386/indirect-thunk-7.c: Likewise.
	* gcc.target/i386/indirect-thunk-attr-1.c: Likewise.
	* 

Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-26 Thread Jan Hubicka
Hi,
my main concern about the patch is that we now have 
-mindirect-branch=thunk-extern
which is intended to work well and is used by kernel, but we also have other 
modes
that are documented and as such they should work but they may lead to invalid
unwind info (or did I miss anything imporant here?).
Why we can't fix the others as well? 

Honza


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-23 Thread H.J. Lu
On Thu, Feb 22, 2018 at 9:32 AM, H.J. Lu  wrote:
> On Thu, Feb 22, 2018 at 9:10 AM, Jeff Law  wrote:
>> On 02/22/2018 07:38 AM, Jan Hubicka wrote:
>>
>> Hi Jan,
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
>>
>> Is OK for trunk?
>
> I see that using register makes the problem go away and pushing address 
> to stack
> seemed bit odd anyway. However how does this work on other types of thunk?

 Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a 
 proposal
 to use -mindirect-branch=thunk-extern in user space to support CET in a 
 single
 binary.  So at the end of the day, only
 -mindirect-branch=thunk-extern will be used.
>>>
>>> OK, so it is about the fact that we do not really want to support all
>>> -mindirect-branch options in the future? If we don't want to support the 
>>> correctly,
>>> I wonder why we are including them at all.  Shall we at least output 
>>> warning/sorry
>>> when user tries other thunk type with stack unwinding enabled?
>>> (does Kernel use it?)
>> A few notes.
>>
>> 1. It's not even clear at this time that retpolining user space binaries
>> makes any sense at all.   SO before doing anything to make this easier
>> I'd like to see a justification for why it's really needed.
>
> Hi Jeff,
>
> Which part were commenting? My patch to add TARGET_INDIRECT_BRANCH_REGISTER
> or removing -mindirect-branch choices?

Is my patch OK for trunk?

>> 2. On the other hand, the existing thunk options do make it easier to
>> test independent of hte kernel.  ie, I can turn on inline thunks by
>> default and test things in user space (ie, do thunks generally work
>> properly).
>
> It sounds reasonable.
>

Thanks.


-- 
H.J.


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread H.J. Lu
On Thu, Feb 22, 2018 at 9:10 AM, Jeff Law  wrote:
> On 02/22/2018 07:38 AM, Jan Hubicka wrote:
>
> Hi Jan,
>
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
>
> Is OK for trunk?

 I see that using register makes the problem go away and pushing address to 
 stack
 seemed bit odd anyway. However how does this work on other types of thunk?
>>>
>>> Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a 
>>> proposal
>>> to use -mindirect-branch=thunk-extern in user space to support CET in a 
>>> single
>>> binary.  So at the end of the day, only
>>> -mindirect-branch=thunk-extern will be used.
>>
>> OK, so it is about the fact that we do not really want to support all
>> -mindirect-branch options in the future? If we don't want to support the 
>> correctly,
>> I wonder why we are including them at all.  Shall we at least output 
>> warning/sorry
>> when user tries other thunk type with stack unwinding enabled?
>> (does Kernel use it?)
> A few notes.
>
> 1. It's not even clear at this time that retpolining user space binaries
> makes any sense at all.   SO before doing anything to make this easier
> I'd like to see a justification for why it's really needed.

Hi Jeff,

Which part were commenting? My patch to add TARGET_INDIRECT_BRANCH_REGISTER
or removing -mindirect-branch choices?

> 2. On the other hand, the existing thunk options do make it easier to
> test independent of hte kernel.  ie, I can turn on inline thunks by
> default and test things in user space (ie, do thunks generally work
> properly).

It sounds reasonable.


-- 
H.J.


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread Jeff Law
On 02/22/2018 07:38 AM, Jan Hubicka wrote:

 Hi Jan,

 https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html

 Is OK for trunk?
>>>
>>> I see that using register makes the problem go away and pushing address to 
>>> stack
>>> seemed bit odd anyway. However how does this work on other types of thunk?
>>
>> Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a proposal
>> to use -mindirect-branch=thunk-extern in user space to support CET in a 
>> single
>> binary.  So at the end of the day, only
>> -mindirect-branch=thunk-extern will be used.
> 
> OK, so it is about the fact that we do not really want to support all
> -mindirect-branch options in the future? If we don't want to support the 
> correctly,
> I wonder why we are including them at all.  Shall we at least output 
> warning/sorry
> when user tries other thunk type with stack unwinding enabled?
> (does Kernel use it?)
A few notes.

1. It's not even clear at this time that retpolining user space binaries
makes any sense at all.   SO before doing anything to make this easier
I'd like to see a justification for why it's really needed.

2. On the other hand, the existing thunk options do make it easier to
test independent of hte kernel.  ie, I can turn on inline thunks by
default and test things in user space (ie, do thunks generally work
properly).

Jeff


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread H.J. Lu
On Thu, Feb 22, 2018 at 6:38 AM, Jan Hubicka  wrote:
>> >>
>> >> Hi Jan,
>> >>
>> >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
>> >>
>> >> Is OK for trunk?
>> >
>> > I see that using register makes the problem go away and pushing address to 
>> > stack
>> > seemed bit odd anyway. However how does this work on other types of thunk?
>>
>> Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a proposal
>> to use -mindirect-branch=thunk-extern in user space to support CET in a 
>> single
>> binary.  So at the end of the day, only
>> -mindirect-branch=thunk-extern will be used.
>
> OK, so it is about the fact that we do not really want to support all
> -mindirect-branch options in the future? If we don't want to support the 
> correctly,
> I wonder why we are including them at all.  Shall we at least output 
> warning/sorry

-mindirect-branch= implementation changed over time while
experimenting different
approaches.  I didn't remove the old ones when I added new ones.  We
should remove
others and only keep -mindirect-branch=thunk-extern with my proposal
to support CET
with -mindirect-branch=thunk-extern.

> when user tries other thunk type with stack unwinding enabled?

This is on my TODO list.

> (does Kernel use it?)

No, kernel doesn't use it.


-- 
H.J.


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread Jan Hubicka
> >>
> >> Hi Jan,
> >>
> >> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
> >>
> >> Is OK for trunk?
> >
> > I see that using register makes the problem go away and pushing address to 
> > stack
> > seemed bit odd anyway. However how does this work on other types of thunk?
> 
> Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a proposal
> to use -mindirect-branch=thunk-extern in user space to support CET in a single
> binary.  So at the end of the day, only
> -mindirect-branch=thunk-extern will be used.

OK, so it is about the fact that we do not really want to support all
-mindirect-branch options in the future? If we don't want to support the 
correctly,
I wonder why we are including them at all.  Shall we at least output 
warning/sorry
when user tries other thunk type with stack unwinding enabled?
(does Kernel use it?)

Honza
> 
> 
> -- 
> H.J.


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread H.J. Lu
On Thu, Feb 22, 2018 at 6:29 AM, Jan Hubicka  wrote:
>> On Sun, Jan 28, 2018 at 11:56 AM, H.J. Lu  wrote:
>> > On Sat, Jan 27, 2018 at 2:12 PM, H.J. Lu  wrote:
>> >> For
>> >>
>> >> ---
>> >> struct C {
>> >>   virtual ~C();
>> >>   virtual void f();
>> >> };
>> >>
>> >> void
>> >> f (C *p)
>> >> {
>> >>   p->f();
>> >>   p->f();
>> >> }
>> >> ---
>> >>
>> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>> >>
>> >> _Z1fP1C:
>> >> .LFB0:
>> >> .cfi_startproc
>> >> pushq   %rbx
>> >> .cfi_def_cfa_offset 16
>> >> .cfi_offset 3, -16
>> >> movq(%rdi), %rax
>> >> movq%rdi, %rbx
>> >> jmp .LIND1
>> >> .LIND0:
>> >> pushq   16(%rax)
>> >> jmp __x86_indirect_thunk
>> >> .LIND1:
>> >> call.LIND0
>> >> movq(%rbx), %rax
>> >> movq%rbx, %rdi
>> >> popq%rbx
>> >> .cfi_def_cfa_offset 8
>> >> movq16(%rax), %rax
>> >> jmp __x86_indirect_thunk_rax
>> >> .cfi_endproc
>> >>
>> >> x86-64 is supposed to have asynchronous unwind tables by default, but
>> >> there is nothing that reflects the change in the (relative) frame
>> >> address after .LIND0.  That region really has to be moved outside of
>> >> the .cfi_startproc/.cfi_endproc bracket.
>> >>
>> >> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
>> >> branch via register when -mindirect-branch=thunk-extern is used.  Now,
>> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>> >>
>> >> _Z1fP1C:
>> >> .LFB0:
>> >> .cfi_startproc
>> >> pushq   %rbx
>> >> .cfi_def_cfa_offset 16
>> >> .cfi_offset 3, -16
>> >> movq(%rdi), %rax
>> >> movq%rdi, %rbx
>> >> movq16(%rax), %rax
>> >> call__x86_indirect_thunk_rax
>> >> movq(%rbx), %rax
>> >> movq%rbx, %rdi
>> >> popq%rbx
>> >> .cfi_def_cfa_offset 8
>> >> movq16(%rax), %rax
>> >> jmp __x86_indirect_thunk_rax
>> >> .cfi_endproc
>> >>
>> >> Now "-mindirect-branch=thunk-extern" is equivalent to
>> >> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
>> >> used by Linux kernel.
>> >>
>> >> Tested on i686 and x86-64.  OK for trunk?
>> >>
>> >> Thanks.
>> >>
>> >> H.J.
>> >> 
>> >> gcc/
>> >>
>> >> PR target/84039
>> >> * config/i386/constraints.md (Bs): Replace
>> >> ix86_indirect_branch_register with
>> >> TARGET_INDIRECT_BRANCH_REGISTER.
>> >> (Bw): Likewise.
>> >> * config/i386/i386.md (indirect_jump): Likewise.
>> >> (tablejump): Likewise.
>> >> (*sibcall_memory): Likewise.
>> >> (*sibcall_value_memory): Likewise.
>> >> Peepholes of indirect call and jump via memory: Likewise.
>> >> * config/i386/i386.opt: Likewise.
>> >> * config/i386/predicates.md (indirect_branch_operand): Likewise.
>> >> (GOT_memory_operand): Likewise.
>> >> (call_insn_operand): Likewise.
>> >> (sibcall_insn_operand): Likewise.
>> >> (GOT32_symbol_operand): Likewise.
>> >> * config/i386/i386.h (TARGET_INDIRECT_BRANCH_REGISTER): New.
>> >>
>> >
>> > Here is the updated patch  to disallow *sibcall_GOT_32 and 
>> > *sibcall_value_GOT_32
>> > for TARGET_INDIRECT_BRANCH_REGISTER.
>> >
>> > Tested on i686 and x86-64.  OK for trunk?
>> >
>>
>> Hi Jan,
>>
>> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
>>
>> Is OK for trunk?
>
> I see that using register makes the problem go away and pushing address to 
> stack
> seemed bit odd anyway. However how does this work on other types of thunk?

Kernel only uses  -mindirect-branch=thunk-extern.  I am working on a proposal
to use -mindirect-branch=thunk-extern in user space to support CET in a single
binary.  So at the end of the day, only
-mindirect-branch=thunk-extern will be used.


-- 
H.J.


Re: PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-22 Thread Jan Hubicka
> On Sun, Jan 28, 2018 at 11:56 AM, H.J. Lu  wrote:
> > On Sat, Jan 27, 2018 at 2:12 PM, H.J. Lu  wrote:
> >> For
> >>
> >> ---
> >> struct C {
> >>   virtual ~C();
> >>   virtual void f();
> >> };
> >>
> >> void
> >> f (C *p)
> >> {
> >>   p->f();
> >>   p->f();
> >> }
> >> ---
> >>
> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> >>
> >> _Z1fP1C:
> >> .LFB0:
> >> .cfi_startproc
> >> pushq   %rbx
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 3, -16
> >> movq(%rdi), %rax
> >> movq%rdi, %rbx
> >> jmp .LIND1
> >> .LIND0:
> >> pushq   16(%rax)
> >> jmp __x86_indirect_thunk
> >> .LIND1:
> >> call.LIND0
> >> movq(%rbx), %rax
> >> movq%rbx, %rdi
> >> popq%rbx
> >> .cfi_def_cfa_offset 8
> >> movq16(%rax), %rax
> >> jmp __x86_indirect_thunk_rax
> >> .cfi_endproc
> >>
> >> x86-64 is supposed to have asynchronous unwind tables by default, but
> >> there is nothing that reflects the change in the (relative) frame
> >> address after .LIND0.  That region really has to be moved outside of
> >> the .cfi_startproc/.cfi_endproc bracket.
> >>
> >> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
> >> branch via register when -mindirect-branch=thunk-extern is used.  Now,
> >> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
> >>
> >> _Z1fP1C:
> >> .LFB0:
> >> .cfi_startproc
> >> pushq   %rbx
> >> .cfi_def_cfa_offset 16
> >> .cfi_offset 3, -16
> >> movq(%rdi), %rax
> >> movq%rdi, %rbx
> >> movq16(%rax), %rax
> >> call__x86_indirect_thunk_rax
> >> movq(%rbx), %rax
> >> movq%rbx, %rdi
> >> popq%rbx
> >> .cfi_def_cfa_offset 8
> >> movq16(%rax), %rax
> >> jmp __x86_indirect_thunk_rax
> >> .cfi_endproc
> >>
> >> Now "-mindirect-branch=thunk-extern" is equivalent to
> >> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
> >> used by Linux kernel.
> >>
> >> Tested on i686 and x86-64.  OK for trunk?
> >>
> >> Thanks.
> >>
> >> H.J.
> >> 
> >> gcc/
> >>
> >> PR target/84039
> >> * config/i386/constraints.md (Bs): Replace
> >> ix86_indirect_branch_register with
> >> TARGET_INDIRECT_BRANCH_REGISTER.
> >> (Bw): Likewise.
> >> * config/i386/i386.md (indirect_jump): Likewise.
> >> (tablejump): Likewise.
> >> (*sibcall_memory): Likewise.
> >> (*sibcall_value_memory): Likewise.
> >> Peepholes of indirect call and jump via memory: Likewise.
> >> * config/i386/i386.opt: Likewise.
> >> * config/i386/predicates.md (indirect_branch_operand): Likewise.
> >> (GOT_memory_operand): Likewise.
> >> (call_insn_operand): Likewise.
> >> (sibcall_insn_operand): Likewise.
> >> (GOT32_symbol_operand): Likewise.
> >> * config/i386/i386.h (TARGET_INDIRECT_BRANCH_REGISTER): New.
> >>
> >
> > Here is the updated patch  to disallow *sibcall_GOT_32 and 
> > *sibcall_value_GOT_32
> > for TARGET_INDIRECT_BRANCH_REGISTER.
> >
> > Tested on i686 and x86-64.  OK for trunk?
> >
> 
> Hi Jan,
> 
> https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html
> 
> Is OK for trunk?

I see that using register makes the problem go away and pushing address to stack
seemed bit odd anyway. However how does this work on other types of thunk?

Honza


PING: [PATCH] i386: Add TARGET_INDIRECT_BRANCH_REGISTER

2018-02-08 Thread H.J. Lu
On Sun, Jan 28, 2018 at 11:56 AM, H.J. Lu  wrote:
> On Sat, Jan 27, 2018 at 2:12 PM, H.J. Lu  wrote:
>> For
>>
>> ---
>> struct C {
>>   virtual ~C();
>>   virtual void f();
>> };
>>
>> void
>> f (C *p)
>> {
>>   p->f();
>>   p->f();
>> }
>> ---
>>
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>> .cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> jmp .LIND1
>> .LIND0:
>> pushq   16(%rax)
>> jmp __x86_indirect_thunk
>> .LIND1:
>> call.LIND0
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> x86-64 is supposed to have asynchronous unwind tables by default, but
>> there is nothing that reflects the change in the (relative) frame
>> address after .LIND0.  That region really has to be moved outside of
>> the .cfi_startproc/.cfi_endproc bracket.
>>
>> This patch adds TARGET_INDIRECT_BRANCH_REGISTER to force indirect
>> branch via register when -mindirect-branch=thunk-extern is used.  Now,
>> -mindirect-branch=thunk-extern -O2 on x86-64 GNU/Linux generates:
>>
>> _Z1fP1C:
>> .LFB0:
>> .cfi_startproc
>> pushq   %rbx
>> .cfi_def_cfa_offset 16
>> .cfi_offset 3, -16
>> movq(%rdi), %rax
>> movq%rdi, %rbx
>> movq16(%rax), %rax
>> call__x86_indirect_thunk_rax
>> movq(%rbx), %rax
>> movq%rbx, %rdi
>> popq%rbx
>> .cfi_def_cfa_offset 8
>> movq16(%rax), %rax
>> jmp __x86_indirect_thunk_rax
>> .cfi_endproc
>>
>> Now "-mindirect-branch=thunk-extern" is equivalent to
>> "-mindirect-branch=thunk-extern -mindirect-branch-register", which is
>> used by Linux kernel.
>>
>> Tested on i686 and x86-64.  OK for trunk?
>>
>> Thanks.
>>
>> H.J.
>> 
>> gcc/
>>
>> PR target/84039
>> * config/i386/constraints.md (Bs): Replace
>> ix86_indirect_branch_register with
>> TARGET_INDIRECT_BRANCH_REGISTER.
>> (Bw): Likewise.
>> * config/i386/i386.md (indirect_jump): Likewise.
>> (tablejump): Likewise.
>> (*sibcall_memory): Likewise.
>> (*sibcall_value_memory): Likewise.
>> Peepholes of indirect call and jump via memory: Likewise.
>> * config/i386/i386.opt: Likewise.
>> * config/i386/predicates.md (indirect_branch_operand): Likewise.
>> (GOT_memory_operand): Likewise.
>> (call_insn_operand): Likewise.
>> (sibcall_insn_operand): Likewise.
>> (GOT32_symbol_operand): Likewise.
>> * config/i386/i386.h (TARGET_INDIRECT_BRANCH_REGISTER): New.
>>
>
> Here is the updated patch  to disallow *sibcall_GOT_32 and 
> *sibcall_value_GOT_32
> for TARGET_INDIRECT_BRANCH_REGISTER.
>
> Tested on i686 and x86-64.  OK for trunk?
>

Hi Jan,

https://gcc.gnu.org/ml/gcc-patches/2018-01/msg02233.html

Is OK for trunk?

Thanks.

-- 
H.J.