Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-07-24 Thread H.J. Lu
On Thu, Jun 4, 2015 at 9:54 AM, Sriraman Tallam tmsri...@google.com wrote:
 Patch attached with those changes.

 Is this patch alright to commit?


 * c-family/c-common.c (noplt): New attribute.
 (handle_noplt_attribute): New handler.
 * calls.c (prepare_call_address): Check for noplt attribute.
 * config/i386/i386.c (ix86_function_ok_for_sibcall): Check
 for noplt attribute.
 (ix86_expand_call):  Ditto.
 (ix86_nopic_noplt_attribute_p): New function.
 (ix86_output_call_insn): Output indirect call for non-pic no plt calls.
 * doc/extend.texi (noplt): Document new attribute.
 * doc/invoke.texi: Document new attribute.
 * testsuite/gcc.target/i386/noplt-1.c: New test.
 * testsuite/gcc.target/i386/noplt-2.c: New test.
 * testsuite/gcc.target/i386/noplt-3.c: New test.
 * testsuite/gcc.target/i386/noplt-4.c: New test.


This may have caused:

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=67001

-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-04 Thread Sriraman Tallam
On Thu, Jun 4, 2015 at 10:05 AM, Richard Henderson r...@redhat.com wrote:
 On 06/04/2015 09:54 AM, Sriraman Tallam wrote:
 +  DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)

 Spacing.

   {
 use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
 if (ix86_use_pseudo_pic_reg ())
 @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call

return call;
  }
 +/* Return true if the function being called was marked with attribute noplt

 Vertical spacing.

 +  || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF

 Spacing.

 Otherwise ok.

Made these changes and committed the patch.  I had to add one more
check here to verify that the decl is not null before looking at its
attributes.  Without it, bootstrap hit a segfault while building
libgcc.

+   && (SYMBOL_REF_DECL ((XEXP (fnaddr, 0))) == NULL_TREE // This line was added after the patch was approved.
+  || !lookup_attribute ("noplt",
+ DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP (fnaddr, 0))
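
The guard matters because a SYMBOL_REF is not guaranteed to have a decl attached. A minimal sketch of the same null-before-lookup pattern follows; `struct decl_model`, `has_attribute` and `call_may_use_plt` are hypothetical stand-ins for illustration, not GCC's tree API:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a declaration carrying attribute names.  */
struct decl_model
{
  const char *const *attrs;   /* NULL-terminated list of attribute names.  */
};

/* Return true if D carries attribute NAME.  D must not be NULL, which is
   exactly why the caller has to test for a missing decl first.  */
static bool
has_attribute (const struct decl_model *d, const char *name)
{
  assert (d != NULL);
  for (const char *const *a = d->attrs; a != NULL && *a != NULL; a++)
    if (strcmp (*a, name) == 0)
      return true;
  return false;
}

/* Model of the condition with the added guard: the call is treated as a
   PLT call when there is no decl at all, or when the decl lacks "noplt".  */
static bool
call_may_use_plt (const struct decl_model *d)
{
  return d == NULL || !has_attribute (d, "noplt");
}
```

Because `||` short-circuits, `has_attribute` is never reached with a null decl, which is the property the extra line in the patch provides.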

Thanks
Sri



 r~


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-04 Thread Richard Henderson
On 06/04/2015 09:54 AM, Sriraman Tallam wrote:
 +  DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0)

Spacing.

   {
 use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
 if (ix86_use_pseudo_pic_reg ())
 @@ -25598,7 +25603,31 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
  
return call;
  }
 +/* Return true if the function being called was marked with attribute noplt

Vertical spacing.

 +  || !TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF

Spacing.

Otherwise ok.


r~


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-04 Thread Sriraman Tallam
 Patch attached with those changes.

Is this patch alright to commit?


* c-family/c-common.c (noplt): New attribute.
(handle_noplt_attribute): New handler.
* calls.c (prepare_call_address): Check for noplt attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for noplt attribute.
(ix86_expand_call):  Ditto.
(ix86_nopic_noplt_attribute_p): New function.
(ix86_output_call_insn): Output indirect call for non-pic no plt calls.
* doc/extend.texi (noplt): Document new attribute.
* doc/invoke.texi: Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Thanks
Sri

This patch does two things:

* Adds new generic function attribute noplt that is similar in functionality
  to -fno-plt except that it applies only to calls to functions that are marked
  with this attribute.
* For x86_64, it makes -fno-plt (and the attribute) also work for non-PIC code by
  directly generating an indirect call via a GOT entry.
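
For illustration, a minimal sketch of how user code might mark a declaration with the attribute. The `NOPLT` macro, its version guard (GCC 6 or later is an assumption about when the attribute became available) and the function names are hypothetical. The callee is defined in the same file only so that the sketch links on its own; in real use it would be exported by a shared library:

```c
#include <assert.h>

/* Expand to the attribute only where it is likely to be understood;
   otherwise expand to nothing.  The GCC >= 6 guard is an assumption.  */
#if defined (__GNUC__) && !defined (__clang__) && __GNUC__ >= 6
# define NOPLT __attribute__ ((noplt))
#else
# define NOPLT
#endif

/* In real use this declaration would name a function that lives in a
   shared library.  */
NOPLT extern int answer (void);

/* Defined locally only to keep the sketch self-contained.  */
int
answer (void)
{
  return 42;
}

/* Calls such as this one are what the attribute is meant to affect.  */
static int
call_answer (void)
{
  return answer ();
}
```

A caller can check the behaviour with `assert (call_answer () == 42);`. The attribute changes only how the call is emitted, not what it returns.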

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
  handle_aligned_attribute, false },
   { "weak",   0, 0, true,  false, false,
  handle_weak_attribute, false },
+  { "noplt",   0, 0, true,  false, false,
+ handle_noplt_attribute, false },
   { "ifunc",  1, 1, true,  false, false,
  handle_ifunc_attribute, false },
   { "alias",  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a noplt attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_noplt_attribute (tree *node, tree name,
+  tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags),
+  bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute is only applicable on functions", name);
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+  return NULL_TREE;
+}
+
 /* Handle an alias or ifunc attribute; arguments as in
struct attribute_spec.handler, except that IS_ALIAS tells us
whether this is an alias as opposed to ifunc attribute.  */
Index: calls.c
===
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun
        && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
       ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
       : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (flag_pic
+           && fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && (!flag_plt
+               || lookup_attribute ("noplt", DECL_ATTRIBUTES (fndecl_or_type)))
            && !targetm.binds_local_p (fndecl_or_type))
     {
+      /* This is done only for PIC code.  There is no easy interface to force the
+         function address into GOT for non-PIC case.  non-PIC case needs to be
+         handled specially by the backend.  */
   funexp = force_reg (Pmode, 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-03 Thread Sriraman Tallam
On Wed, Jun 3, 2015 at 1:09 PM, Richard Henderson r...@redhat.com wrote:
 On 06/03/2015 11:38 AM, Sriraman Tallam wrote:
 +  { "no_plt",   0, 0, true,  false, false,
 +   handle_no_plt_attribute, false },

 Call it noplt.  We don't add the underscore for noinline, noclone, etc.

Done.




 Index: config/i386/i386.c
 ===
 --- config/i386/i386.c(revision 223720)
 +++ config/i386/i386.c(working copy)
 @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
     && !TARGET_64BIT
     && flag_pic
     && flag_plt
 -   && decl && !targetm.binds_local_p (decl))
 +   && decl
 +   && (TREE_CODE (decl) != FUNCTION_DECL
 +       || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
 +   && !targetm.binds_local_p (decl))
  return false;

/* If we need to align the outgoing stack, then sibcalling would

 Is this really necessary?  I'd expect DECL to be NULL in this case,
 since the non-use of the PLT will mean that the (sib)call is indirect.

Removed.



 @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
  }
else
  {
 -  /* Static functions and indirect calls don't need the pic register.  */
 +  /* Static functions and indirect calls don't need the pic register.  Also,
 +  check if PLT was explicitly avoided via no-plt or no_plt attribute, making
 +  it an indirect call.  */
    if (flag_pic
        && (!TARGET_64BIT
            || (ix86_cmodel == CM_LARGE_PIC
                && DEFAULT_ABI != MS_ABI))
        && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 -      && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
 +      && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
 +      && flag_plt
 +      && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
 +          || !lookup_attribute ("no_plt",
 +  DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))
   {
 use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
 if (ix86_use_pseudo_pic_reg ())

 Why are you testing FUNCTION_DECL?  Even if, somehow, the user were producing a
 function call to a data symbol, why do you think that lookup_attribute would
 produce incorrect results?

 Similarly in ix86_nopic_no_plt_attribute_p.

Fixed.

Patch attached with those changes.

Thanks
Sri
* c-family/c-common.c (noplt): New attribute.
(handle_noplt_attribute): New handler.
* calls.c (prepare_call_address): Check for noplt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for noplt attribute.
(ix86_expand_call):  Ditto.
(ix86_nopic_noplt_attribute_p): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (noplt): Document new attribute.
* doc/invoke.texi: Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute noplt that is similar in functionality
  to -fno-plt except that it applies only to calls to functions that are marked
  with this attribute.
* For x86_64, it makes -fno-plt (and the attribute) also work for non-PIC code by
  directly generating an indirect call via a GOT entry.

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_noplt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
  handle_aligned_attribute, false },
   { "weak",   0, 0, true,  false, false,
  handle_weak_attribute, false },
+  { "noplt",   0, 0, true,  false, false,
+ handle_noplt_attribute, false },
   { "ifunc",  1, 1, true,  false, false,
  handle_ifunc_attribute, false },
   { "alias",  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a noplt attribute; arguments as in
+   

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-03 Thread Richard Henderson
On 06/02/2015 01:56 PM, Ramana Radhakrishnan wrote:
 I'm sorry I'm going to push back again for the same reason.
 
 Other than forcing targets to tweak their call insn patterns, the act
 of generating the indirect call should remain in target independent
 code.

How is that going to help?

Unless a target tweaks its call insn patterns, combine or cse is going
to reconstruct the direct call from the indirect call.  Indeed, the tweak
itself will be exactly what's needed to force the generation of the indirect
call, no?


r~


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-03 Thread Richard Henderson
On 06/03/2015 11:38 AM, Sriraman Tallam wrote:
 +  { "no_plt",   0, 0, true,  false, false,
 +   handle_no_plt_attribute, false },

Call it noplt.  We don't add the underscore for noinline, noclone, etc.



 Index: config/i386/i386.c
 ===
 --- config/i386/i386.c(revision 223720)
 +++ config/i386/i386.c(working copy)
 @@ -5479,7 +5479,10 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
     && !TARGET_64BIT
     && flag_pic
     && flag_plt
 -   && decl && !targetm.binds_local_p (decl))
 +   && decl
 +   && (TREE_CODE (decl) != FUNCTION_DECL
 +       || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
 +   && !targetm.binds_local_p (decl))
  return false;
  
/* If we need to align the outgoing stack, then sibcalling would

Is this really necessary?  I'd expect DECL to be NULL in this case,
since the non-use of the PLT will mean that the (sib)call is indirect.


 @@ -25497,13 +25500,19 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
  }
else
  {
 -  /* Static functions and indirect calls don't need the pic register.  */
 +  /* Static functions and indirect calls don't need the pic register.  Also,
 +  check if PLT was explicitly avoided via no-plt or no_plt attribute, making
 +  it an indirect call.  */
    if (flag_pic
        && (!TARGET_64BIT
            || (ix86_cmodel == CM_LARGE_PIC
                && DEFAULT_ABI != MS_ABI))
        && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
 -      && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
 +      && !SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
 +      && flag_plt
 +      && (TREE_CODE (SYMBOL_REF_DECL (XEXP(fnaddr, 0))) != FUNCTION_DECL
 +          || !lookup_attribute ("no_plt",
 +  DECL_ATTRIBUTES (SYMBOL_REF_DECL (XEXP(fnaddr, 0))
   {
 use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
 if (ix86_use_pseudo_pic_reg ())

Why are you testing FUNCTION_DECL?  Even if, somehow, the user were producing a
function call to a data symbol, why do you think that lookup_attribute would
produce incorrect results?

Similarly in ix86_nopic_no_plt_attribute_p.


r~


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-03 Thread Ramana Radhakrishnan

Hi Sriraman,

Thanks for the detailed explanation, that was useful.



I'm sorry I'm going to push back again for the same reason.


Let me describe the problem I am having in a little more detail:

For the PIC case, I think there is no confusion. Both of us agree on
what is being done. Attribute no_plt exactly shadows -fno-plt and is
completely target independent.


Agreed.



For the non-PIC case, this is where some target dependent portions are
needed.  This is because I simply cannot remove the flag_pic check in
calls.c and force the address into a register. Let's say I did that
with this patch:


Of course I should have realized this earlier - sorry for being a pain.
We need to load the value from the GOT (or in an equivalent
position-independent manner) and that is entirely handled by the backends;
there's no easy interface to do this from the mid-end.


I tried a horrible hack in calls.c which was -

int old_flag_pic = flag_pic;
flag_pic = 1;
funexp = force_reg (Pmode, funexp);
flag_pic = old_flag_pic;

We then have to relax quite a lot of checks in a number of places across 
backends to handle !flag_plt which ain't worth it.


I agree now that it will be much cleaner just to punt this into the 
backend, so it may be worth noting that making this work properly for 
the non-PIC case requires quite a degree of massaging in the backends.


Objections withdrawn.

Thanks,
Ramana







Index: calls.c
===
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,8 +226,10 @@ prepare_call_address (tree fndecl_or_type, rtx fun
        && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
       ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
       : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
+           && (!flag_plt
+               || lookup_attribute ("no_plt", DECL_ATTRIBUTES (fndecl_or_type)))
            && !targetm.binds_local_p (fndecl_or_type))
  {
funexp = force_reg (Pmode, funexp);

what would the code look like for this example below in the non-PIC case:

__attribute__((no_plt))
extern int foo();

int main ()
{
   return foo();
}


Without -O2:

mov _Z3foov, %eax
call *%eax

The indirect call is there but this is wrong because this will force
the linker to still create a PLT entry for foo and use that address.
This is worse than calling the PLT directly as we end up calling the
PLT indirectly.

Now, with -O2:
call *_Z3foov

and again same story.  The linker creates a PLT entry for foo and
calls foo_plt indirectly.

What we really need to do in the non-PIC case, if we need a target
independent solution, is pretend that the call to foo is like a PIC
call when we see the attribute.  I looked at how to do this and the
change to me seems pretty hairy and that is why it seemed like it is
better to handle this in the target directly.
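
As a rough analogy for what is being asked for here: a GOT-indirect call loads the callee's address from a data slot that the dynamic linker fills in, then calls through it, with no PLT stub in between. On x86_64 this is typically spelled `call *foo@GOTPCREL(%rip)`. The C sketch below only models that shape; `got_slot_foo`, `foo_impl` and `call_via_got` are hypothetical names, and the slot is initialised statically instead of by a dynamic linker:

```c
#include <assert.h>

typedef int (*fn_ptr) (void);

/* Stand-in for the external function foo.  */
static int
foo_impl (void)
{
  return 7;
}

/* Hypothetical stand-in for foo's GOT slot.  At run time the dynamic
   linker would store foo's real address here.  */
static fn_ptr got_slot_foo = foo_impl;

/* Analogy for a GOT-indirect call: load the address from the slot, then
   call through it.  No intermediate stub is involved.  */
static int
call_via_got (void)
{
  fn_ptr target = got_slot_foo;
  return target ();
}
```

By contrast, the `call *_Z3foov` sequence shown above takes the address of the symbol itself, which in non-PIC code the linker resolves to the PLT entry.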

Thanks
Sri




Other than forcing targets to tweak their call insn patterns, the act
of generating the indirect call should remain in target independent
code. Sorry, not having the same behaviour on all platforms for
something like this is just a recipe for confusion.

regards
Ramana



For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Please review.

Thanks
Sri




To be honest, this is trivial to implement in the ARM backend as one
would just piggy back on the longcalls work - despite that, IMNSHO
it's best done in a target independent manner.

regards
Ramana



Thanks
Sri



regards
Ramana







I am not familiar with PLT calls for other targets.  I can move the
tests to gcc.dg but what relocation are you suggesting I check for?



Move the test to gcc.dg, add a target_support_no_plt function in
testsuite/lib/target-supports.exp and mark this as being supported only on
x86 and use scan-assembler to scan for PLT relocations for x86. Other
targets can add things as they deem fit.




In any case, on a large number of elf/ linux targets I would have thought
the absence of a JMP_SLOT relocation would be good enough to check that this
is working correctly.

regards
Ramana






Thanks
Sri






Ramana





Also I think the PLT calls have EBX in call fusage which is added by
ix86_expand_call.
else
  {
/* Static functions 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-03 Thread Sriraman Tallam

 I agree now that it will be much cleaner just to punt this into the backend,
 so it may be worth noting that making this work properly for the non-PIC
 case requires quite a degree of massaging in the backends.

 Objections withdrawn.

Thanks! I have attached the latest patch after making the changes
Bernhard suggested.  I also added a comment saying that the non-PIC
case needs to be handled specially by the backend.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(ix86_nopic_no_plt_attribute_p): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* doc/invoke.texi: Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute no_plt that is similar in
functionality to -fno-plt except that it applies only to calls to
functions that are marked with this attribute.
* For x86_64, it makes -fno-plt (and the attribute) also work for
non-PIC code by directly generating an indirect call via a GOT entry.


Sri



 Thanks,
 Ramana

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_no_plt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
  handle_aligned_attribute, false },
   { "weak",   0, 0, true,  false, false,
  handle_weak_attribute, false },
+  { "no_plt",   0, 0, true,  false, false,
+ handle_no_plt_attribute, false },
   { "ifunc",  1, 1, true,  false, false,
  handle_ifunc_attribute, false },
   { "alias",  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a no_plt attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_plt_attribute (tree *node, tree name,
+  tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags),
+  bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+{
+  warning (OPT_Wattributes,
+  "%qE attribute is only applicable on functions", name);
+  *no_add_attrs = true;
+  return NULL_TREE;
+}
+  return NULL_TREE;
+}
+
 /* Handle an alias or ifunc attribute; arguments as in
struct attribute_spec.handler, except that IS_ALIAS tells us
whether this is an alias as opposed to ifunc attribute.  */
Index: calls.c
===
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,10 +226,16 @@ prepare_call_address (tree fndecl_or_type, rtx fun

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Sriraman Tallam
On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code, and having non-PIC code avoid
 the PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

 If you want __attribute__ ((noplt)) to work for non-PIC code, we
 should look to code it in the same place surely by making all
 __attribute__((noplt)) calls, indirect calls irrespective of whether
 it's fpic or not.



 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

 We can, and please see this thread; that is the exact patch I proposed:
 https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

 However, there was one caveat.  I want this working without -fPIC too.
 non-PIC code also generates PLT calls and I want them eliminated.


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

 You are right.  This is the way I wanted it too but I also wanted the
 attribute to work without PIC. PLT calls are generated without -fPIC
 and -fPIE too and I wanted a solution for that.  On looking at the
 code in more detail,

 * -fno-plt is made to work with -fPIC, is there a reason to not make
 it work for non-PIC code?  I can remove the flag_pic check from
 calls.c

 I don't think that's right, you probably have to allow that along with
 (flag_pic || (decl && attribute_no_plt (decl)) - however it seems odd
 to me that the language extension allows this but the flag doesn't.

 * Then, I add the generic attribute noplt and everything is fine.

 There is just one caveat with the above approach, for x86_64
 (*call_insn) will not generate indirect-calls for *non-PIC* code
 because constant_call_address_operand in predicates.md will evaluate
 to false.  This can be fixed appropriately in ix86_output_call_insn in
 i386.c.

 Yes, targets need to massage that into place but that's essentially
 the mechanics of retaining indirect calls in each backend. -fno-plt
 doesn't work for ARM / AArch64 with optimizers currently (and I
 suspect on most other targets) because our predicates are too liberal,
 fixed by treating noplt or -fno-plt as the equivalent of
 -mlong-calls.



 Is this alright?  Sorry for the confusion, but the primary reason why
 I did not do it the way you suggested is because we wanted noplt
 attribute to work for non-PIC code also.

 If that is the case, then this is a slightly more complicated
 condition in the same place. We then always have indirect calls for
 functions that are marked noplt and just have target generate this
 appropriately.
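
One way to read the condition under discussion is as a plain predicate. The sketch below mirrors the shape of the revised prepare_call_address check in this thread (PIC only, a non-local function, and either -fno-plt or the attribute). `struct callee_info` and `force_indirect_call` are hypothetical names used for illustration, not GCC's:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of what the mid-end knows about a callee.  */
struct callee_info
{
  bool is_function_decl;  /* Models TREE_CODE (decl) == FUNCTION_DECL.  */
  bool has_noplt_attr;    /* Models the decl carrying the attribute.  */
  bool binds_local;       /* Models targetm.binds_local_p (decl).  */
};

/* Return true when the call address should be forced into a register,
   turning the call into an indirect one.  CALLEE may be NULL when no
   declaration is available.  */
static bool
force_indirect_call (bool flag_pic, bool flag_plt,
                     const struct callee_info *callee)
{
  return flag_pic
         && callee != NULL
         && callee->is_function_decl
         && (!flag_plt || callee->has_noplt_attr)
         && !callee->binds_local;
}
```

For example, `assert (force_indirect_call (true, true, &ext))` holds for an external callee `ext` that has the attribute set, while the same call with `flag_pic` false returns false. That gap is the non-PIC case, which has to be handled by the backend.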

I have now modified this patch.

This patch does two things:

1) Adds new generic function attribute no_plt that is similar in
functionality to -fno-plt except that it applies only to calls to
functions that are marked with this attribute.
2) For x86_64, it makes -fno-plt (and the attribute) also work for
non-PIC code by directly generating an indirect call via a GOT entry.

For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Bernhard Reutner-Fischer
On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam tmsri...@google.com 
wrote:
[]

I have now modified this patch.

This patch does two things:

1) Adds new generic function attribute no_plt that is similar in
functionality to -fno-plt except that it applies only to calls to
functions that are marked with this attribute.
2) For x86_64, it makes -fno-plt (and the attribute) also work for
non-PIC code by directly generating an indirect call via a GOT entry.

For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Please review.

--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
    && !TARGET_64BIT
    && flag_pic
    && flag_plt
+   && (TREE_CODE (decl) != FUNCTION_DECL
+       || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
    && decl && !targetm.binds_local_p (decl))
 return false;

Wrong order or && decl is redundant. Stopped reading here.

Thanks,



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Ramana Radhakrishnan
On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

 If you want __attribute__ ((noplt)) to work for non-PIC code, we
 should look to code it in the same place surely by making all
 __attribute__((noplt)) calls, indirect calls irrespective of whether
 it's fpic or not.



 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

 We can, and please see this thread; that is the exact patch I proposed:
 https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

 However, there was one caveat.  I want this working without -fPIC too.
 non-PIC code also generates PLT calls and I want them eliminated.


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

 You are right.  This is the way I wanted it too but I also wanted the
 attribute to work without PIC. PLT calls are generated without -fPIC
 and -fPIE too and I wanted a solution for that.  On looking at the
 code in more detail,

 * -fno-plt is made to work with -fPIC, is there a reason to not make
 it work for non-PIC code?  I can remove the flag_pic check from
 calls.c

 I don't think that's right; you probably have to allow that along with
 (flag_pic || (decl && attribute_no_plt (decl))) - however it seems odd
 to me that the language extension allows this but the flag doesn't.

 * Then, I add the generic attribute noplt and everything is fine.

 There is just one caveat with the above approach, for x86_64
 (*call_insn) will not generate indirect-calls for *non-PIC* code
 because constant_call_address_operand in predicates.md will evaluate
 to false.  This can be fixed appropriately in ix86_output_call_insn in
 i386.c.

 Yes, targets need to massage that into place but that's essentially
 the mechanics of retaining indirect calls in each backend. -fno-plt
 doesn't work for ARM / AArch64 with optimizers currently (and I
 suspect on most other targets) because our predicates are too liberal,
 fixed by treating noplt or -fno-plt as the equivalent of
 -mlong-calls.



 Is this alright?  Sorry for the confusion, but the primary reason why
 I did not do it the way you suggested is because we wanted noplt
 attribute to work for non-PIC code also.

 If that is the case, then this is a slightly more complicated
 condition in the same place. We then always have indirect calls for
 functions that are marked noplt and just have target generate this
 appropriately.
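
The combined condition discussed above can be sketched as a small stand-alone model. This is only one reading of the proposal, not the code in calls.c: the struct, its field names and the function name are hypothetical, and the real check works on tree nodes and target hooks rather than booleans. It assumes the attribute should force an indirect call whether or not PIC is in effect, while -fno-plt alone keeps its PIC-only behaviour.

```c
#include <stdbool.h>

/* Simplified stand-in for the facts the real condition reads from the
   declaration and the command line.  All field names are hypothetical.  */
struct call_site
{
  bool flag_pic;          /* compiling with -fPIC or -fPIE */
  bool flag_plt;          /* false when -fno-plt is given */
  bool is_function_decl;  /* callee is a known function declaration */
  bool binds_local;       /* callee resolves within this module */
  bool has_noplt_attr;    /* callee carries the attribute */
};

/* Return true when the call should be emitted as an indirect call
   through the GOT rather than as a direct call that may use the PLT.  */
bool
wants_indirect_call (const struct call_site *site)
{
  /* Unknown callees and locally bound functions never need a PLT stub.  */
  if (!site->is_function_decl || site->binds_local)
    return false;

  /* Assumption: the attribute applies with or without PIC.  */
  if (site->has_noplt_attr)
    return true;

  /* The command-line flag on its own only takes effect for PIC code.  */
  return site->flag_pic && !site->flag_plt;
}
```

Under this model a non-PIC call to an attributed external function is indirect, while a non-PIC call under plain -fno-plt is left alone, which is exactly the gap the x86_64 part of the patch is meant to close.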

 I have now modified this patch.

Thanks for taking care of this. I'll have a read through tomorrow
morning when I'm at my normal work machine.


 This patch does two things:

 1) Adds new generic function attribute no_plt that is similar in
 functionality  to -fno-plt except that it applies only to calls to
 functions that are marked  with this attribute.
 2) For x86_64, it makes -fno-plt(and the attribute) also work for
 non-PIC code by  directly generating an indirect call via a GOT entry.

I'm sorry I'm going to push back again for the same reason.

Other than 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Sriraman Tallam
On Tue, Jun 2, 2015 at 12:32 PM, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam tmsri...@google.com 
 wrote:
 []

I have now modified this patch.

This patch does two things:

1) Adds new generic function attribute no_plt that is similar in
functionality  to -fno-plt except that it applies only to calls to
functions that are marked  with this attribute.
2) For x86_64, it makes -fno-plt(and the attribute) also work for
non-PIC code by  directly generating an indirect call via a GOT entry.

For PIC code, no_plt merely shadows the implementation of -fno-plt, no
surprises here.
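
For readers who want to see what using the attribute looks like, here is a minimal sketch. It makes several assumptions: the attribute is spelled no_plt in this revision of the patch, but the sketch uses the noplt spelling from the later ChangeLog; ext_scale, scale_twice and the CALL_WITHOUT_PLT macro are hypothetical names; and ext_scale is defined in the same file only so the example links, whereas the attribute only matters when the callee lives in a shared library.

```c
#include <assert.h>

/* Apply the attribute only when the compiler recognises it; otherwise the
   macro expands to nothing and the call is compiled as usual.  */
#if defined(__has_attribute)
#  if __has_attribute(noplt)
#    define CALL_WITHOUT_PLT __attribute__((noplt))
#  endif
#endif
#ifndef CALL_WITHOUT_PLT
#  define CALL_WITHOUT_PLT
#endif

/* Hypothetical external function.  In a real build it would be defined in
   a shared library, the only case in which a PLT stub is involved.  */
CALL_WITHOUT_PLT int ext_scale (int value);

/* Defined here only so that the sketch links on its own.  */
int
ext_scale (int value)
{
  return value * 2;
}

/* With a truly external callee, each call below is a candidate for an
   indirect call through the GOT instead of a jump through a PLT stub.  */
int
scale_twice (int value)
{
  int once = ext_scale (value);
  assert (once == value * 2);
  return ext_scale (once);
}
```

Unmarked functions called from the same file keep their normal calling sequence, which is the difference from passing -fno-plt for the whole translation unit.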

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Please review.

 --- config/i386/i386.c  (revision 223720)
 +++ config/i386/i386.c  (working copy)
 @@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
        && !TARGET_64BIT
        && flag_pic
        && flag_plt
 +      && (TREE_CODE (decl) != FUNCTION_DECL
 +          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
        && decl && !targetm.binds_local_p (decl))
      return false;

 Wrong order or && decl is redundant. Stopped reading here.

Fixed and new patch attached.

Thanks
Sri


 Thanks,

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.

This patch does two things:

* Adds new generic function attribute no_plt that is similar in functionality
  to -fno-plt except that it applies only to calls to functions that are marked
  with this attribute.
* For x86_64, it makes -fno-plt(and the attribute) also work for non-PIC code by
  directly generating an indirect call via a GOT entry.

Index: c-family/c-common.c
===
--- c-family/c-common.c (revision 223720)
+++ c-family/c-common.c (working copy)
@@ -357,6 +357,7 @@ static tree handle_mode_attribute (tree *, tree, t
 static tree handle_section_attribute (tree *, tree, tree, int, bool *);
 static tree handle_aligned_attribute (tree *, tree, tree, int, bool *);
 static tree handle_weak_attribute (tree *, tree, tree, int, bool *) ;
+static tree handle_no_plt_attribute (tree *, tree, tree, int, bool *) ;
 static tree handle_alias_ifunc_attribute (bool, tree *, tree, tree, bool *);
 static tree handle_ifunc_attribute (tree *, tree, tree, int, bool *);
 static tree handle_alias_attribute (tree *, tree, tree, int, bool *);
@@ -706,6 +707,8 @@ const struct attribute_spec c_common_attribute_tab
  handle_aligned_attribute, false },
   { "weak",   0, 0, true,  false, false,
  handle_weak_attribute, false },
+  { "no_plt",   0, 0, true,  false, false,
+ handle_no_plt_attribute, false },
   { "ifunc",  1, 1, true,  false, false,
  handle_ifunc_attribute, false },
   { "alias",  1, 1, true,  false, false,
@@ -8185,6 +8188,25 @@ handle_weak_attribute (tree *node, tree name,
   return NULL_TREE;
 }
 
+/* Handle a no_plt attribute; arguments as in
+   struct attribute_spec.handler.  */
+
+static tree
+handle_no_plt_attribute (tree *node, tree name,
+  tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags),
+  bool * ARG_UNUSED (no_add_attrs))
+{
+  if (TREE_CODE (*node) != FUNCTION_DECL)
+    {
+      warning (OPT_Wattributes,
+               "%qE attribute is only applicable on functions", name);
+      *no_add_attrs = true;
+      return NULL_TREE;
+    }
+  return NULL_TREE;
+}
+
 /* Handle an alias or ifunc attribute; arguments as in
struct attribute_spec.handler, except that IS_ALIAS tells us
whether this is an alias as opposed to ifunc 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Bernhard Reutner-Fischer
On June 2, 2015 9:59:40 PM GMT+02:00, Sriraman Tallam tmsri...@google.com 
wrote:
On Tue, Jun 2, 2015 at 12:32 PM, Bernhard Reutner-Fischer
rep.dot@gmail.com wrote:
 On June 2, 2015 8:15:42 PM GMT+02:00, Sriraman Tallam
tmsri...@google.com wrote:
 []

I have now modified this patch.

This patch does two things:

1) Adds new generic function attribute no_plt that is similar in
functionality  to -fno-plt except that it applies only to calls to
functions that are marked  with this attribute.
2) For x86_64, it makes -fno-plt(and the attribute) also work for
non-PIC code by  directly generating an indirect call via a GOT
entry.

For PIC code, no_plt merely shadows the implementation of -fno-plt,
no
surprises here.

* c-family/c-common.c (no_plt): New attribute.
(handle_no_plt_attribute): New handler.
* calls.c (prepare_call_address): Check for no_plt
attribute.
* config/i386/i386.c (ix86_function_ok_for_sibcall): Check
for no_plt attribute.
(ix86_expand_call):  Ditto.
(nopic_no_plt_attribute): New function.
(ix86_output_call_insn): Output indirect call for non-pic
no plt calls.
* doc/extend.texi (no_plt): Document new attribute.
* testsuite/gcc.target/i386/noplt-1.c: New test.
* testsuite/gcc.target/i386/noplt-2.c: New test.
* testsuite/gcc.target/i386/noplt-3.c: New test.
* testsuite/gcc.target/i386/noplt-4.c: New test.


Please review.

 --- config/i386/i386.c  (revision 223720)
 +++ config/i386/i386.c  (working copy)
 @@ -5479,6 +5479,8 @@ ix86_function_ok_for_sibcall (tree decl, tree exp)
        && !TARGET_64BIT
        && flag_pic
        && flag_plt
 +      && (TREE_CODE (decl) != FUNCTION_DECL
 +          || !lookup_attribute ("no_plt", DECL_ATTRIBUTES (decl)))
        && decl && !targetm.binds_local_p (decl))
      return false;

 Wrong order or && decl is redundant. Stopped reading here.

Fixed and new patch

Just reading the diff I do not grok the different conditions in
ix86_function_ok_for_sibcall
ix86_expand_call
especially regarding CM_LARGE_PIC but I take it you've read more context.

-  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+  && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+  && flag_plt

s/! /!/;# while you touch or maybe that's OK -- check_GNU.sh  would know, 
hopefully.

+/* Return true if the function being called was marked with attribute
+   no_plt or using -fno-plt and we are compiling for no-PIC and x86_64.
+   This is currently used only with 64-bit ELF targets to call the function

a function

+   marked no_plt indirectly.  */
+
+static bool
+nopic_no_plt_attribute (rtx call_op)

IIRC predicates ought to have a _p suffix but maybe that's outdated nowadays?

+{
+  if (flag_pic)
+return false;
+
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)

missing space after ||
We have a contrib/check*.sh style checker for patches in there.

+return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && (!flag_plt
+          || lookup_attribute ("no_plt", DECL_ATTRIBUTES (symbol_decl))))
+    return true;
+
+  return false;
+}
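
The decision made by the quoted predicate can be restated as a self-contained model, which may make the review comments easier to follow. This is a sketch under stated assumptions, not the GCC function: the struct and its field names are hypothetical, the four target checks (64-bit, not Mach-O, not SEH, not PE/COFF) are collapsed into a single target_64bit_elf field, and the function name simply takes up the _p suffix suggested above.

```c
#include <stdbool.h>

/* Hypothetical flattening of what the real predicate reads from the
   call operand, the target macros and the command-line flags.  */
struct x86_call_site
{
  bool flag_pic;            /* compiling position-independent code */
  bool flag_plt;            /* false when -fno-plt is given */
  bool target_64bit_elf;    /* 64-bit and not Mach-O, SEH or PE/COFF */
  bool symbol_is_local;     /* callee binds within this module */
  bool callee_is_function;  /* operand has a function declaration */
  bool has_no_plt_attr;     /* callee carries the attribute */
};

/* Return true when a non-PIC call should be output as an indirect call
   through the GOT instead of a direct call.  */
bool
nopic_no_plt_call_p (const struct x86_call_site *site)
{
  if (site->flag_pic)
    return false;           /* the PIC case is handled elsewhere */

  if (!site->target_64bit_elf)
    return false;           /* only 64-bit ELF targets are covered */

  if (site->symbol_is_local)
    return false;           /* local symbols never go through the PLT */

  return site->callee_is_function
         && (!site->flag_plt || site->has_no_plt_attr);
}
```

The early returns follow the same order as the quoted patch, so each rejected case can be matched against the corresponding hunk.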

 
+@item no_plt
+@cindex @code{no_plt} function attribute
+The @code{no_plt} attribute is used to inform the compiler that a calls

Doesn't parse. a call / calls

+to the function should not use the PLT.  For example, external functions

would be nice to have an xref to PLT definition for the casual reader, iff we 
have one or could have one easily.

+defined in shared objects are called from the executable using the PLT.
+This attribute on the function declaration calls these functions indirectly
+rather than going via the PLT.  This is similar to @option{-fno-plt} but
+is only applicable to calls to the function marked with this attribute.
+

smallexample (or you-name-it counterpart) for code-avoidance for bonus points, 
maybe.

Not a conceptual review due to current cellphone-impairedness, but looks 
somewhat plausible at first glance..

HTH  cheers,


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Sriraman Tallam
On Tue, Jun 2, 2015 at 1:56 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a 
 GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target 
 backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as 
 being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way 
 in
 which -fno-plt is handled in a target agnostic manner ? After all, if 
 you
 can handle this for the command line, doing the same for a function 
 which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

 If you want __attribute__ ((noplt)) to work for non-PIC code, we
 should look to code it in the same place surely by making all
 __attribute__((noplt)) calls, indirect calls irrespective of whether
 it's fpic or not.



 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

 We can, and please see this thread; that is the exact patch I proposed:
 https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

 However, there was one caveat.  I want this working without -fPIC too.
 non-PIC code also generates PLT calls and I want them eliminated.


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

 You are right.  This is the way I wanted it too but I also wanted the
 attribute to work without PIC. PLT calls are generated without -fPIC
 and -fPIE too and I wanted a solution for that.  On looking at the
 code in more detail,

 * -fno-plt is made to work with -fPIC, is there a reason to not make
 it work for non-PIC code?  I can remove the flag_pic check from
 calls.c

 I don't think that's right; you probably have to allow that along with
 (flag_pic || (decl && attribute_no_plt (decl))) - however it seems odd
 to me that the language extension allows this but the flag doesn't.

 * Then, I add the generic attribute noplt and everything is fine.

 There is just one caveat with the above approach, for x86_64
 (*call_insn) will not generate indirect-calls for *non-PIC* code
 because constant_call_address_operand in predicates.md will evaluate
 to false.  This can be fixed appropriately in ix86_output_call_insn in
 i386.c.

 Yes, targets need to massage that into place but that's essentially
 the mechanics of retaining indirect calls in each backend. -fno-plt
 doesn't work for ARM / AArch64 with optimizers currently (and I
 suspect on most other targets) because our predicates are too liberal,
 fixed by treating noplt or -fno-plt as the equivalent of
 -mlong-calls.



 Is this alright?  Sorry for the confusion, but the primary reason why
 I did not do it the way you suggested is because we wanted noplt
 attribute to work for non-PIC code also.

 If that is the case, then this is a slightly more complicated
 condition in the same place. We then always have indirect calls for
 functions that are marked noplt and just have target generate this
 appropriately.

 I have now modified this patch.

 Thanks for taking care of this. I'll have a read through tomorrow
 morning when I'm at my normal work machine.


 This patch does two things:

 1) Adds new generic function attribute no_plt that is similar in
 functionality  to -fno-plt except that it applies only to calls to
 functions that are marked  with this attribute.
 2) For x86_64, it makes -fno-plt(and the attribute) also work for
 non-PIC code by  directly generating an indirect 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Xinliang David Li
On Tue, Jun 2, 2015 at 1:56 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Tue, Jun 2, 2015 at 7:15 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:33 PM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a 
 GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target 
 backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as 
 being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way 
 in
 which -fno-plt is handled in a target agnostic manner ? After all, if 
 you
 can handle this for the command line, doing the same for a function 
 which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

 If you want __attribute__ ((noplt)) to work for non-PIC code, we
 should look to code it in the same place surely by making all
 __attribute__((noplt)) calls, indirect calls irrespective of whether
 it's fpic or not.



 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

 We can, and please see this thread; that is the exact patch I proposed:
 https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

 However, there was one caveat.  I want this working without -fPIC too.
 non-PIC code also generates PLT calls and I want them eliminated.


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

 You are right.  This is the way I wanted it too but I also wanted the
 attribute to work without PIC. PLT calls are generated without -fPIC
 and -fPIE too and I wanted a solution for that.  On looking at the
 code in more detail,

 * -fno-plt is made to work with -fPIC, is there a reason to not make
 it work for non-PIC code?  I can remove the flag_pic check from
 calls.c

 I don't think that's right; you probably have to allow that along with
 (flag_pic || (decl && attribute_no_plt (decl))) - however it seems odd
 to me that the language extension allows this but the flag doesn't.

 * Then, I add the generic attribute noplt and everything is fine.

 There is just one caveat with the above approach, for x86_64
 (*call_insn) will not generate indirect-calls for *non-PIC* code
 because constant_call_address_operand in predicates.md will evaluate
 to false.  This can be fixed appropriately in ix86_output_call_insn in
 i386.c.

 Yes, targets need to massage that into place but that's essentially
 the mechanics of retaining indirect calls in each backend. -fno-plt
 doesn't work for ARM / AArch64 with optimizers currently (and I
 suspect on most other targets) because our predicates are too liberal,
 fixed by treating noplt or -fno-plt as the equivalent of
 -mlong-calls.



 Is this alright?  Sorry for the confusion, but the primary reason why
 I did not do it the way you suggested is because we wanted noplt
 attribute to work for non-PIC code also.

 If that is the case, then this is a slightly more complicated
 condition in the same place. We then always have indirect calls for
 functions that are marked noplt and just have target generate this
 appropriately.

 I have now modified this patch.

 Thanks for taking care of this. I'll have a read through tomorrow
 morning when I'm at my normal work machine.


 This patch does two things:

 1) Adds new generic function attribute no_plt that is similar in
 functionality  to -fno-plt except that it applies only to calls to
 functions that are marked  with this attribute.
 2) For x86_64, it makes -fno-plt(and the attribute) also work for
 non-PIC code by  directly generating an indirect 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-02 Thread Bernhard Reutner-Fischer
On June 2, 2015 11:22:03 PM GMT+02:00, Xinliang David Li davi...@google.com 
wrote:

 I'm sorry I'm going to push back again for the same reason.

 Other than forcing targets to tweak their call insn patterns, the act
 of generating the indirect call should remain in target independent
 code. Sorry, not having the same behaviour on all platforms for
 something like this is just a recipe for confusion.

Everything else will be a nightmare for any real (widespread)  use, yes. Just 
doing this for x86, x86_64 and x32 gets us in an unpleasant situation like the 
dances everybody had and has to do for ebx avoidance.



Do you have a good suggestion on the way to implement this (non PIC
no-plt) in a clean and target independent way? Regarding the

not offhand here, at least, fwiw.

'confusion' part, is it a matter of documentation (can be updated when
more targets start to support it more efficiently)?

I386 compatible relief in this respect certainly is nice but we ought to handle 
this better throughout IMHO. Cannot devote time there myself though, so just 
hoping you folks are able to put some effort into this.

PS: and please, pretty please clip your replies sensibly..



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-01 Thread Ramana Radhakrishnan



Why isn't it just an indirect call in the cases that would require a GOT
slot and a direct call otherwise ? I'm trying to work out what's so
different on each target that mandates this to be in the target backend.
Also it would be better to push the tests into gcc.dg if you can and check
for the absence of a relocation so that folks at least see these as being
UNSUPPORTED on their target.





To be even more explicit, shouldn't this be handled similar to the way 
in which -fno-plt is handled in a target agnostic manner ? After all, if 
you can handle this for the command line, doing the same for a function 
which has been decorated with attribute((noplt)) should be simple.



I am not familiar with PLT calls for other targets.  I can move the
tests to gcc.dg but what relocation are you suggesting I check for?


Move the test to gcc.dg, add a target_support_no_plt function in 
testsuite/lib/target-supports.exp and mark this as being supported only 
on x86 and use scan-assembler to scan for PLT relocations for x86. Other 
targets can add things as they deem fit.


In any case, on a large number of elf/ linux targets I would have 
thought the absence of a JMP_SLOT relocation would be good enough to 
check that this is working correctly.
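
The kind of check suggested here, scanning the generated assembly for PLT references, can be illustrated with a small stand-alone helper. This is only a stand-in for what a scan-assembler directive would do in the testsuite, not DejaGnu code: the function names are hypothetical, and the "@PLT" marker assumes the x86 GNU assembler syntax, where a call through the PLT is written as foo@PLT.

```c
#include <stdbool.h>
#include <string.h>

/* Return true if one line of assembly output refers to a PLT entry.
   Assumes the x86 GNU assembler spelling "symbol@PLT".  */
bool
line_references_plt (const char *asm_line)
{
  return strstr (asm_line, "@PLT") != NULL;
}

/* Return true if none of the COUNT lines in LINES refers to the PLT,
   which is what a test of the attribute would want to establish.  */
bool
listing_is_plt_free (const char *const *lines, size_t count)
{
  for (size_t i = 0; i < count; i++)
    if (line_references_plt (lines[i]))
      return false;
  return true;
}
```

A per-target test would pair such a negative check with a positive one for the expected indirect call, since the absence of a PLT reference alone does not show that the call went through the GOT.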


regards
Ramana





Thanks
Sri






Ramana




Also I think the PLT calls have EBX in call fusage, which is added by
ix86_expand_call.
   else
     {
       /* Static functions and indirect calls don't need the pic
          register.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
         {
           use_reg (&use, gen_rtx_REG (Pmode,
                                       REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
             emit_move_insn (gen_rtx_REG (Pmode,
                                          REAL_PIC_OFFSET_TABLE_REGNUM),
                             pic_offset_table_rtx);
         }

I think you want to take that away from FUSAGE there just like we do for
local calls (and in fact the code should already check flag_pic && flag_plt,
I suppose).


Done that now and patch attached.

Thanks
Sri



Honza




Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-01 Thread Sriraman Tallam
On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

-fno-plt does not work for non-PIC code; having non-PIC code not use
PLT was my primary motivation.  In fact, if you go back in this thread,
I asked HJ whether I should piggyback on -fno-plt.  I tried using
the -fno-plt implementation to do this by removing the flag_pic check
in calls.c, but that still does not work for non-PIC code.


 I am not familiar with PLT calls for other targets.  I can move the
 tests to gcc.dg but what relocation are you suggesting I check for?


 Move the test to gcc.dg, add a target_support_no_plt function in
 testsuite/lib/target-supports.exp and mark this as being supported only on
 x86 and use scan-assembler to scan for PLT relocations for x86. Other
 targets can add things as they deem fit.


 In any case, on a large number of elf/ linux targets I would have thought
 the absence of a JMP_SLOT relocation would be good enough to check that this
 is working correctly.

 regards
 Ramana





 Thanks
 Sri





 Ramana



 Also I think the PLT calls have EBX in call fusage, which is added by
 ix86_expand_call.
    else
      {
        /* Static functions and indirect calls don't need the pic
           register.  */
        if (flag_pic
            && (!TARGET_64BIT
                || (ix86_cmodel == CM_LARGE_PIC
                    && DEFAULT_ABI != MS_ABI))
            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
          {
            use_reg (&use, gen_rtx_REG (Pmode,
                                        REAL_PIC_OFFSET_TABLE_REGNUM));
            if (ix86_use_pseudo_pic_reg ())
              emit_move_insn (gen_rtx_REG (Pmode,
                                           REAL_PIC_OFFSET_TABLE_REGNUM),
                              pic_offset_table_rtx);
          }

 I think you want to take that away from FUSAGE there just like we do for
 local calls (and in fact the code should already check flag_pic && flag_plt,
 I suppose).


 Done that now and patch attached.

 Thanks
 Sri


 Honza





Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-01 Thread Ramana Radhakrishnan
On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

You're missing my point, unless I'm missing something basic here - I
should have been even more explicit and said -fPIC was a given in all
this discussion.

calls.c:229 has

else if (flag_pic && !flag_plt && fndecl_or_type
         && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
         && !targetm.binds_local_p (fndecl_or_type))

why can't we merge the check in here for the attribute noplt ?

If a new attribute is added to the GNU language in this case, why
isn't this being treated in the same way as the command line option
has been treated ? All this means is that we add an attribute and a
command line option to common code and then not implement it in a
proper target agnostic fashion.

regards
Ramana




 I am not familiar with PLT calls for other targets.  I can move the
 tests to gcc.dg but what relocation are you suggesting I check for?


 Move the test to gcc.dg, add a target_support_no_plt function in
 testsuite/lib/target-supports.exp and mark this as being supported only on
 x86 and use scan-assembler to scan for PLT relocations for x86. Other
 targets can add things as they deem fit.


 In any case, on a large number of elf/ linux targets I would have thought
 the absence of a JMP_SLOT relocation would be good enough to check that this
 is working correctly.

 regards
 Ramana





 Thanks
 Sri





 Ramana



 Also I think the PLT calls have EBX in call fusage, which is added by
 ix86_expand_call.
    else
      {
        /* Static functions and indirect calls don't need the pic
           register.  */
        if (flag_pic
            && (!TARGET_64BIT
                || (ix86_cmodel == CM_LARGE_PIC
                    && DEFAULT_ABI != MS_ABI))
            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
          {
            use_reg (&use, gen_rtx_REG (Pmode,
                                        REAL_PIC_OFFSET_TABLE_REGNUM));
            if (ix86_use_pseudo_pic_reg ())
              emit_move_insn (gen_rtx_REG (Pmode,
                                           REAL_PIC_OFFSET_TABLE_REGNUM),
                              pic_offset_table_rtx);
          }

 I think you want to take that away from FUSAGE there just like we do for
 local calls (and in fact the code should already check flag_pic && flag_plt,
 I suppose).


 Done that now and patch attached.

 Thanks
 Sri


 Honza





Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-01 Thread Sriraman Tallam
On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 the PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

We can, and please see this thread; that is the exact patch I proposed:
https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

However, there was one caveat.  I want this working without -fPIC too.
non-PIC code also generates PLT calls and I want them eliminated.
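
To make the non-PIC case concrete, here is a minimal sketch of how a
caller might mark an external function. It is illustrative only and not
part of the patch: NOPLT and ext_add are made-up names, and the
__has_attribute guard is an assumption so that the snippet still builds
on compilers that do not know the attribute.

```c
/* Fall back gracefully on compilers that lack __has_attribute.  */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* NOPLT is a hypothetical convenience macro, not something from the patch.  */
#if __has_attribute(noplt)
#define NOPLT __attribute__((noplt))
#else
#define NOPLT
#endif

/* Declaration as a caller would see it in a header.  In real use the
   definition would live in a shared library; it is defined below only
   so that this sketch links on its own.  */
NOPLT int ext_add (int a, int b);

int
call_ext (int a, int b)
{
  /* When the attribute is honoured and ext_add does not bind locally,
     the intent is for this call to go through the GOT rather than a
     PLT stub, with or without -fPIC.  */
  return ext_add (a, b);
}

int
ext_add (int a, int b)
{
  return a + b;
}
```

Because ext_add is defined in the same file here, it binds locally and
the attribute makes no difference to the generated call; the effect
would only show up with the definition moved into a shared object.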


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

You are right.  This is the way I wanted it too but I also wanted the
attribute to work without PIC. PLT calls are generated without -fPIC
and -fPIE too and I wanted a solution for that.  On looking at the
code in more detail,

* -fno-plt is made to work with -fPIC, is there a reason to not make
it work for non-PIC code?  I can remove the flag_pic check from
calls.c
* Then, I add the generic attribute noplt and everything is fine.

There is just one caveat with the above approach: for x86_64,
(*call_insn) will not generate indirect calls for *non-PIC* code
because constant_call_address_operand in predicates.md will evaluate
to false.  This can be fixed appropriately in ix86_output_call_insn in
i386.c.
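
As a rough model of the choice that ix86_output_call_insn would then
make, the sketch below mirrors the condition from the patch posted
earlier in this thread. The struct, the enum and both function names
are hypothetical stand-ins; only the target checks and the two sibcall
assembler templates come from that patch.

```c
#include <stdbool.h>

/* Hypothetical stand-ins for the target macros and the patch's helper;
   in GCC these are TARGET_64BIT, TARGET_MACHO, TARGET_SEH,
   TARGET_PECOFF and avoid_plt_to_call (call_op).  */
struct call_ctx
{
  bool target_64bit;
  bool target_macho;
  bool target_seh;
  bool target_pecoff;
  bool callee_avoids_plt;
};

enum call_form { CALL_DIRECT, CALL_VIA_GOT };

/* Pick the form of a direct call: through the GOT only on 64-bit ELF
   when the callee asked to avoid the PLT, otherwise a plain direct call.  */
enum call_form
choose_call_form (const struct call_ctx *ctx)
{
  if (!ctx->target_macho && !ctx->target_seh && !ctx->target_pecoff
      && ctx->target_64bit && ctx->callee_avoids_plt)
    return CALL_VIA_GOT;
  return CALL_DIRECT;
}

/* Assembler templates as they appear in the posted patch (sibcall case).  */
const char *
sibcall_template (enum call_form form)
{
  return form == CALL_VIA_GOT ? "%!jmp\t*%p0@GOTPCREL(%%rip)" : "%!jmp\t%P0";
}
```

Keeping the decision separate from the template string is only a
convenience for the sketch; the patch makes the same test inline.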


Is this alright?  Sorry for the confusion, but the primary reason why
I did not do it the way you suggested is because we wanted noplt
attribute to work for non-PIC code also.

Thanks
Sri


 regards
 Ramana




 I am not familiar with PLT calls for other targets.  I can move the
 tests to gcc.dg but what relocation are you suggesting I check for?


 Move the test to gcc.dg, add a target_support_no_plt function in
 testsuite/lib/target-supports.exp and mark this as being supported only on
 x86 and use scan-assembler to scan for PLT relocations for x86. Other
 targets can add things as they deem fit.


 In any case, on a large number of elf/ linux targets I would have thought
 the absence of a JMP_SLOT relocation would be good enough to check that this
 is working correctly.

 regards
 Ramana





 Thanks
 Sri





 Ramana



 Also I think the PLT calls have EBX in call fusage, which is added by
 ix86_expand_call.
   else
     {
       /* Static functions and indirect calls don't need the pic
          register.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
         {
           use_reg (use, gen_rtx_REG (Pmode,
                                      REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
             emit_move_insn (gen_rtx_REG (Pmode,
                                          REAL_PIC_OFFSET_TABLE_REGNUM),
                             pic_offset_table_rtx);
         }

 I think you want to take that away from FUSAGE there, just like we do
 for local calls (and in fact the code should already check
 flag_pic && flag_plt, I suppose).


 Done that now and patch attached.

 Thanks
 Sri


 Honza





Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-06-01 Thread Ramana Radhakrishnan
On Mon, Jun 1, 2015 at 7:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 11:41 AM, Ramana Radhakrishnan
 ramana@googlemail.com wrote:
 On Mon, Jun 1, 2015 at 7:01 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Jun 1, 2015 at 1:24 AM, Ramana Radhakrishnan
 ramana.radhakrish...@arm.com wrote:

 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and
 check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.




 To be even more explicit, shouldn't this be handled similar to the way in
 which -fno-plt is handled in a target agnostic manner ? After all, if you
 can handle this for the command line, doing the same for a function which
 has been decorated with attribute((noplt)) should be simple.

 -fno-plt does not work for non-PIC code; having non-PIC code not use
 the PLT was my primary motivation.  In fact, if you go back in this thread,
 I asked HJ whether I should piggyback on -fno-plt.  I tried using
 the -fno-plt implementation to do this by removing the flag_pic check
 in calls.c, but that still does not work for non-PIC code.

If you want __attribute__ ((noplt)) to work for non-PIC code, we
should surely code it in the same place, by making all calls to
__attribute__ ((noplt)) functions indirect calls irrespective of whether
the code is PIC or not.



 You're missing my point, unless I'm missing something basic here - I
 should have been even more explicit and said -fPIC was a given in all
 this discussion.

 calls.c:229 has

 else if (flag_pic && !flag_plt && fndecl_or_type
          && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
          && !targetm.binds_local_p (fndecl_or_type))

 why can't we merge the check in here for the attribute noplt ?

 We can, and please see this thread; that is the exact patch I proposed:
 https://gcc.gnu.org/ml/gcc-patches/2015-05/msg02682.html

 However, there was one caveat.  I want this working without -fPIC too.
 non-PIC code also generates PLT calls and I want them eliminated.


 If a new attribute is added to the GNU language in this case, why
 isn't this being treated in the same way as the command line option
 has been treated ? All this means is that we add an attribute and a
 command line option to common code and then not implement it in a
 proper target agnostic fashion.

 You are right.  This is the way I wanted it too but I also wanted the
 attribute to work without PIC. PLT calls are generated without -fPIC
 and -fPIE too and I wanted a solution for that.  On looking at the
 code in more detail,

 * -fno-plt is made to work with -fPIC, is there a reason to not make
 it work for non-PIC code?  I can remove the flag_pic check from
 calls.c

I don't think that's right; you probably have to allow that along with
(flag_pic || (decl && attribute_no_plt (decl))).  However, it seems odd
to me that the language extension allows this but the flag doesn't.

 * Then, I add the generic attribute noplt and everything is fine.

 There is just one caveat with the above approach, for x86_64
 (*call_insn) will not generate indirect-calls for *non-PIC* code
 because constant_call_address_operand in predicates.md will evaluate
 to false.  This can be fixed appropriately in ix86_output_call_insn in
 i386.c.

Yes, targets need to massage that into place, but that's essentially
the mechanics of retaining indirect calls in each backend.  -fno-plt
doesn't currently work for ARM / AArch64 with optimizers (and I
suspect on most other targets) because our predicates are too liberal;
that can be fixed by treating noplt or -fno-plt as the equivalent of
-mlong-calls.



 Is this alright?  Sorry for the confusion, but the primary reason why
 I did not do it the way you suggested is because we wanted noplt
 attribute to work for non-PIC code also.

If that is the case, then this is a slightly more complicated
condition in the same place. We then always have indirect calls for
functions that are marked noplt and just have target generate this
appropriately.

To be honest, this is trivial to implement in the ARM backend as one
would just piggy back on the longcalls work - despite that, IMNSHO
it's best done in a target independent manner.
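
A minimal sketch of what that "slightly more complicated condition"
could look like, written as a standalone predicate rather than as GCC
code. The struct and the function name are hypothetical; the fields
stand in for the checks that prepare_call_address already performs plus
the proposed attribute lookup.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical flattened view of what prepare_call_address consults;
   the field names are illustrative, not GCC's.  */
struct callee_info
{
  bool is_function_decl;  /* TREE_CODE (decl) == FUNCTION_DECL */
  bool binds_local;       /* targetm.binds_local_p (decl) */
  bool has_noplt_attr;    /* lookup_attribute ("noplt", ...) != NULL */
};

/* Return true when the call address should be forced into a register,
   i.e. the call is made indirect instead of going through the PLT.  */
bool
force_indirect_call_p (bool flag_pic, bool flag_plt,
                       const struct callee_info *callee)
{
  /* No decl, not a function, or a locally bound function: a direct
     call never needs the PLT, so leave it alone.  */
  if (callee == NULL || !callee->is_function_decl || callee->binds_local)
    return false;

  /* -fno-plt keeps its existing PIC-only meaning; the attribute applies
     regardless of PIC.  */
  return (flag_pic && !flag_plt) || callee->has_noplt_attr;
}
```

With this shape the command line flag and the attribute share one
target-independent test, and each backend only has to make sure the
resulting indirect call survives to output.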

regards
Ramana


 Thanks
 Sri


 regards
 Ramana




 I am not familiar with PLT calls for other targets.  I can move the
 tests to gcc.dg but what relocation are you suggesting I check for?


 Move the test to gcc.dg, add a target_support_no_plt function in
 testsuite/lib/target-supports.exp and mark this as being supported only on
 x86 and use scan-assembler to scan for PLT relocations for x86. Other
 targets can add things as they deem fit.


 In any case, on a large number of elf/ linux targets I would have thought
 the absence of a JMP_SLOT relocation would be good 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
On Thu, May 28, 2015 at 5:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 4:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:52 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam 
 tmsri...@google.com wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
               ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
               : memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic && !flag_plt && fndecl_or_type
 +  else if (fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -          && !targetm.binds_local_p (fndecl_or_type))
 +          && !targetm.binds_local_p (fndecl_or_type)
 +          && ((flag_pic && !flag_plt)
 +              || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
      {
        funexp = force_reg (Pmode, funexp);
      }


 Does it work on non-PIC calls?

 You are right, it doesn't work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do we need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

 We do need it right because  for this case below, I do not want an
 indirect call:

 __attribute__((noplt))
 int foo() {
   return 0;
 }

 int main()
 {
   return foo();
 }

 Assuming foo is not inlined, if I remove the lines you mentioned, I
 will get an indirect call which is unnecessary.


 I meant the GET_CODE (call_op) != SYMBOL_REF part isn't
 needed.

 I should have realized that :), sorry.  Patch fixed.


 --- testsuite/gcc.target/i386/noplt-1.c (revision 0)
 +++ testsuite/gcc.target/i386/noplt-1.c (working copy)
 @@ -0,0 +1,13 @@
 +/* { dg-do compile { target x86_64-*-* } } */
 ...
 +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

 The test will fail on Windows and Darwin.

Changed to use x86_64-*-linux* target.



 --
 H.J.
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25611,7 +25629,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
   /* SEH epilogue detection 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
+Uros

On Fri, May 29, 2015 at 10:25 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 29, 2015 at 10:20 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi HJ,

 Is this ok to commit?


 Looks good to me.  But I can't approve it.

 --
 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread H.J. Lu
On Fri, May 29, 2015 at 10:20 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi HJ,

 Is this ok to commit?


Looks good to me.  But I can't approve it.

-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
On Fri, May 29, 2015 at 3:24 PM, Ramana Radhakrishnan
ramana@googlemail.com wrote:


 On Friday, 29 May 2015, Sriraman Tallam tmsri...@google.com wrote:

 On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka hubi...@ucw.cz wrote:
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.
 
  Index: config/i386/i386.c
  ===
  --- config/i386/i386.c(revision 223720)
  +++ config/i386/i386.c(working copy)
  @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx
  call
 return call;
   }
 
  +/* Return true if the function being called was marked with attribute
  +   noplt.  If this function is defined, this should return false.
  */
  +static bool
  +avoid_plt_to_call (rtx call_op)
  +{
  +  if (SYMBOL_REF_LOCAL_P (call_op))
  +    return false;
  +
  +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
  +
  +  if (symbol_decl != NULL_TREE
  +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
  +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
  +    return true;
  +
  +  return false;
  +}
 
  OK, now we have __attribute__ (optimize(noplt)) which binds to the
  caller and makes all calls in the function skip the PLT, and
  __attribute__ (noplt) which binds to the callee and makes all calls to
  the function not use the PLT.
 
  That sort of makes sense to me, but why is the noplt attribute not
  implemented at the generic level, just like -fplt? Is it only because
  every target supporting PLT would need an update in its call expansion
  patterns?

 Yes, that is what I had in mind.



 Why isn't it just an indirect call in the cases that would require a GOT
 slot and a direct call otherwise ? I'm trying to work out what's so
 different on each target that mandates this to be in the target backend.
 Also it would be better to push the tests into gcc.dg if you can and check
 for the absence of a relocation so that folks at least see these as being
 UNSUPPORTED on their target.

I am not familiar with PLT calls for other targets.  I can move the
tests to gcc.dg but what relocation are you suggesting I check for?

Thanks
Sri





 Ramana

 
  Also I think the PLT calls have EBX in call fusage, which is added by
  ix86_expand_call.
    else
      {
        /* Static functions and indirect calls don't need the pic
           register.  */
        if (flag_pic
            && (!TARGET_64BIT
                || (ix86_cmodel == CM_LARGE_PIC
                    && DEFAULT_ABI != MS_ABI))
            && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
            && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
          {
            use_reg (use, gen_rtx_REG (Pmode,
                                       REAL_PIC_OFFSET_TABLE_REGNUM));
            if (ix86_use_pseudo_pic_reg ())
              emit_move_insn (gen_rtx_REG (Pmode,
                                           REAL_PIC_OFFSET_TABLE_REGNUM),
                              pic_offset_table_rtx);
          }
 
  I think you want to take that away from FUSAGE there, just like we do
  for local calls (and in fact the code should already check
  flag_pic && flag_plt, I suppose).

 Done that now and patch attached.

 Thanks
 Sri

 
  Honza


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
Hi HJ,

Is this ok to commit?

Thanks
Sri

On Thu, May 28, 2015 at 11:03 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 5:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 4:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:52 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam 
 tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam 
 tmsri...@google.com wrote:
 I have attached a patch that adds the new attribute noplt.  
 Please review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
               ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
               : memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic && !flag_plt && fndecl_or_type
 +  else if (fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -          && !targetm.binds_local_p (fndecl_or_type))
 +          && !targetm.binds_local_p (fndecl_or_type)
 +          && ((flag_pic && !flag_plt)
 +              || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
      {
        funexp = force_reg (Pmode, funexp);
      }


 Does it work on non-PIC calls?

 You are right, it doesn't work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do we need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

 We do need it right because  for this case below, I do not want an
 indirect call:

 __attribute__((noplt))
 int foo() {
   return 0;
 }

 int main()
 {
   return foo();
 }

 Assuming foo is not inlined, if I remove the lines you mentioned, I
 will get an indirect call which is unnecessary.


 I meant the GET_CODE (call_op) != SYMBOL_REF part isn't
 needed.

 I should have realized that :), sorry.  Patch fixed.


 --- testsuite/gcc.target/i386/noplt-1.c (revision 0)
 +++ testsuite/gcc.target/i386/noplt-1.c (working copy)
 @@ -0,0 +1,13 @@
 +/* { dg-do compile { target x86_64-*-* } } */
 ...
 +/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

 The test will fail on Windows and Darwin.

 Changed to use x86_64-*-linux* target.



 --
 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Jan Hubicka
   * config/i386/i386.c (avoid_plt_to_call): New function.
   (ix86_output_call_insn): Generate indirect call for functions
   marked with noplt attribute.
   (attribute_spec ix86_attribute_): Define new attribute noplt.
   * doc/extend.texi: Document new attribute noplt.
   * gcc.target/i386/noplt-1.c: New testcase.
   * gcc.target/i386/noplt-2.c: New testcase.
 
 Index: config/i386/i386.c
 ===
 --- config/i386/i386.c(revision 223720)
 +++ config/i386/i386.c(working copy)
 @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
return call;
  }
  
 +/* Return true if the function being called was marked with attribute
 +   noplt.  If this function is defined, this should return false.  */
 +static bool
 +avoid_plt_to_call (rtx call_op)
 +{
 +  if (SYMBOL_REF_LOCAL_P (call_op))
 +    return false;
 +
 +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
 +
 +  if (symbol_decl != NULL_TREE
 +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
 +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
 +    return true;
 +
 +  return false;
 +}

OK, now we have __attribute__ (optimize(noplt)) which binds to the caller
and makes all calls in the function skip the PLT, and __attribute__ (noplt)
which binds to the callee and makes all calls to the function not use the
PLT.

That sort of makes sense to me, but why is the noplt attribute not
implemented at the generic level, just like -fplt? Is it only because every
target supporting PLT would need an update in its call expansion patterns?

Also I think the PLT calls have EBX in call fusage, which is added by
ix86_expand_call.
  else
    {
      /* Static functions and indirect calls don't need the pic register.  */
      if (flag_pic
          && (!TARGET_64BIT
              || (ix86_cmodel == CM_LARGE_PIC
                  && DEFAULT_ABI != MS_ABI))
          && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
        {
          use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
          if (ix86_use_pseudo_pic_reg ())
            emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
                            pic_offset_table_rtx);
        }

I think you want to take that away from FUSAGE there, just like we do for
local calls (and in fact the code should already check flag_pic && flag_plt,
I suppose).

Honza


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka hubi...@ucw.cz wrote:
   * config/i386/i386.c (avoid_plt_to_call): New function.
   (ix86_output_call_insn): Generate indirect call for functions
   marked with noplt attribute.
   (attribute_spec ix86_attribute_): Define new attribute noplt.
   * doc/extend.texi: Document new attribute noplt.
   * gcc.target/i386/noplt-1.c: New testcase.
   * gcc.target/i386/noplt-2.c: New testcase.

 Index: config/i386/i386.c
 ===
 --- config/i386/i386.c(revision 223720)
 +++ config/i386/i386.c(working copy)
 @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
return call;
  }

 +/* Return true if the function being called was marked with attribute
 +   noplt.  If this function is defined, this should return false.  */
 +static bool
 +avoid_plt_to_call (rtx call_op)
 +{
 +  if (SYMBOL_REF_LOCAL_P (call_op))
 +    return false;
 +
 +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
 +
 +  if (symbol_decl != NULL_TREE
 +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
 +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
 +    return true;
 +
 +  return false;
 +}

 OK, now we have __attribute__ (optimize(noplt)) which binds to the caller
 and makes all calls in the function skip the PLT, and __attribute__ (noplt)
 which binds to the callee and makes all calls to the function not use the
 PLT.

 That sort of makes sense to me, but why is the noplt attribute not
 implemented at the generic level, just like -fplt? Is it only because every
 target supporting PLT would need an update in its call expansion patterns?

Yes, that is what I had in mind.


 Also I think the PLT calls have EBX in call fusage, which is added by
 ix86_expand_call.
   else
     {
       /* Static functions and indirect calls don't need the pic register.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
         {
           use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
             emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
                             pic_offset_table_rtx);
         }

 I think you want to take that away from FUSAGE there, just like we do for
 local calls (and in fact the code should already check flag_pic && flag_plt,
 I suppose).

Done that now and patch attached.

Thanks
Sri


 Honza
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_expand_call): Don't use the PIC register when external function
calls are not made via PLT.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25475,6 +25475,28 @@ construct_plt_address (rtx symbol)
   return tmp;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  This
+   is currently used only with 64-bit ELF targets.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  rtx callarg2,
@@ -25497,13 +25519,16 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
     }
   else
     {
-      /* Static functions and indirect calls don't need the pic register.  */
+      /* Static functions and indirect calls don't need the pic register.  Also,
+         check if PLT was explicitly avoided via no-plt or noplt attribute, making
+         it an indirect call.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+          && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+          && flag_plt && !avoid_plt_to_call (XEXP (fnaddr, 0)))
{
  use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
  if 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-29 Thread Sriraman Tallam
Made one more change; new patch attached.

Thanks
Sri

On Fri, May 29, 2015 at 2:37 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Fri, May 29, 2015 at 12:35 PM, Jan Hubicka hubi...@ucw.cz wrote:
   * config/i386/i386.c (avoid_plt_to_call): New function.
   (ix86_output_call_insn): Generate indirect call for functions
   marked with noplt attribute.
   (attribute_spec ix86_attribute_): Define new attribute noplt.
   * doc/extend.texi: Document new attribute noplt.
   * gcc.target/i386/noplt-1.c: New testcase.
   * gcc.target/i386/noplt-2.c: New testcase.

 Index: config/i386/i386.c
 ===
 --- config/i386/i386.c(revision 223720)
 +++ config/i386/i386.c(working copy)
 @@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
return call;
  }

 +/* Return true if the function being called was marked with attribute
 +   noplt.  If this function is defined, this should return false.  */
 +static bool
 +avoid_plt_to_call (rtx call_op)
 +{
 +  if (SYMBOL_REF_LOCAL_P (call_op))
 +    return false;
 +
 +  tree symbol_decl = SYMBOL_REF_DECL (call_op);
 +
 +  if (symbol_decl != NULL_TREE
 +      && TREE_CODE (symbol_decl) == FUNCTION_DECL
 +      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
 +    return true;
 +
 +  return false;
 +}

 OK, now we have __attribute__ (optimize(noplt)) which binds to the caller
 and makes all calls in the function skip the PLT, and __attribute__ (noplt)
 which binds to the callee and makes all calls to the function not use the
 PLT.

 That sort of makes sense to me, but why is the noplt attribute not
 implemented at the generic level, just like -fplt? Is it only because every
 target supporting PLT would need an update in its call expansion patterns?

 Yes, that is what I had in mind.


 Also I think the PLT calls have EBX in call fusage, which is added by
 ix86_expand_call.
   else
     {
       /* Static functions and indirect calls don't need the pic register.  */
       if (flag_pic
           && (!TARGET_64BIT
               || (ix86_cmodel == CM_LARGE_PIC
                   && DEFAULT_ABI != MS_ABI))
           && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
           && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
         {
           use_reg (use, gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM));
           if (ix86_use_pseudo_pic_reg ())
             emit_move_insn (gen_rtx_REG (Pmode, REAL_PIC_OFFSET_TABLE_REGNUM),
                             pic_offset_table_rtx);
         }

 I think you want to take that away from FUSAGE there, just like we do for
 local calls (and in fact the code should already check flag_pic && flag_plt,
 I suppose).

 Done that now and patch attached.

 Thanks
 Sri


 Honza
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_expand_call): Don't use the PIC register when external function
calls are not made via PLT.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25475,6 +25475,28 @@ construct_plt_address (rtx symbol)
   return tmp;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  This
+   is currently used only with 64-bit ELF targets.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (!TARGET_64BIT || TARGET_MACHO|| TARGET_SEH || TARGET_PECOFF)
+    return false;
+
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 rtx
 ix86_expand_call (rtx retval, rtx fnaddr, rtx callarg1,
  rtx callarg2,
@@ -25497,13 +25519,16 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
 }
   else
 {
-  /* Static functions and indirect calls don't need the pic register.  */
+  /* Static functions and indirect calls don't need the pic register.  Also,
+     check if PLT was explicitly avoided via no-plt or noplt attribute, making
+     it an indirect call.  */
   if (flag_pic
       && (!TARGET_64BIT
           || (ix86_cmodel == CM_LARGE_PIC
               && DEFAULT_ABI != MS_ABI))
       && GET_CODE (XEXP (fnaddr, 0)) == SYMBOL_REF
-      && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0)))
+      && ! SYMBOL_REF_LOCAL_P (XEXP (fnaddr, 0))
+      && flag_plt &&

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread H.J. Lu
On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

 You are right, it doesnt work.  I have attached the patch with the
 changes you mentioned.


Since direct_p is true, do we need

+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op))
+    return false;

H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread Sriraman Tallam
On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

 You are right, it doesnt work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do wee need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

We do need it, because for the case below I do not want an
indirect call:

__attribute__((noplt))
int foo() {
  return 0;
}

int main()
{
  return foo();
}

Assuming foo is not inlined, if I remove the lines you mentioned, I
will get an indirect call which is unnecessary.
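
For illustration only (this sketch is not part of the patch), the two cases
can be put side by side. NOPLT is a hypothetical helper macro, and
local_answer, call_local and call_external are made-up names; strlen is
redeclared purely to have an external function to mark. Assuming a non-PIC
x86-64 ELF build, one would expect the call in call_local to stay direct,
while the call in call_external is the kind that would use the
@GOTPCREL(%rip) form, unless the compiler expands strlen inline.

```c
#include <string.h>

/* NOPLT is a hypothetical helper macro, not part of the patch: it expands
   to the attribute only when the compiler reports support for it.  */
#if defined(__has_attribute)
#  if __has_attribute(noplt)
#    define NOPLT __attribute__((noplt))
#  endif
#endif
#ifndef NOPLT
#  define NOPLT
#endif

/* Defined in this translation unit, so the symbol binds locally and an
   indirect call through the GOT would be unnecessary.  */
NOPLT int local_answer(void) { return 0; }

/* Declared here but defined in libc: this is the external case the
   attribute is meant for.  */
NOPLT size_t strlen(const char *s);

int call_local(void) { return local_answer(); }

size_t call_external(const char *s) { return strlen(s); }
```

Comparing the output of gcc -S -O0 -fno-pic for the two callers is one way
to check which call form a given compiler actually picks.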

Thanks
Sri


 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread H.J. Lu
On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

 You are right, it doesnt work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do wee need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

 We do need it right because  for this case below, I do not want an
 indirect call:

 __attribute__((noplt))
 int foo() {
   return 0;
 }

 int main()
 {
   return foo();
 }

 Assuming foo is not inlined, if I remove the lines you mentioned, I
 will get an indirect call which is unnecessary.


I meant the GET_CODE (call_op) != SYMBOL_REF part isn't
needed.



-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread Sriraman Tallam
On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

You are right, it doesn't work.  I have attached the patch with the
changes you mentioned.

Thanks
Sri


 --
 H.J.
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25599,6 +25599,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25611,7 +25630,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
   /* SEH epilogue detection requires the indirect branch case
 to include REX.W.  */
   else if (TARGET_SEH)
@@ -25654,7 +25679,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
 }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46659,9 @@ static const struct attribute_spec ix86_attribute_
 false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT.  */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element.  */
   { NULL,        0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread H.J. Lu
On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com wrote:
 I have attached a patch that adds the new attribute noplt.  Please review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


2 comments:

1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
2. Don't you need to check

   !TARGET_MACHO
   !TARGET_SEH
   !TARGET_PECOFF

since it only works for ELF.
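
As a rough sketch of what the combined guard amounts to, the check can be
modelled as a plain predicate. Everything here is a toy: struct target_info,
its fields and gotpcrel_call_allowed are hypothetical stand-ins for GCC's
TARGET_* macros, and the 64-bit test is the one the posted patch already
applies via TARGET_64BIT.

```c
#include <stdbool.h>

/* Toy stand-ins for GCC's TARGET_* macros; the struct and its field names
   are hypothetical and exist only for this illustration.  */
struct target_info {
  bool is_64bit;   /* models TARGET_64BIT */
  bool is_macho;   /* models TARGET_MACHO */
  bool is_seh;     /* models TARGET_SEH */
  bool is_pecoff;  /* models TARGET_PECOFF */
};

/* True only for 64-bit targets that are none of Mach-O, SEH or PE/COFF,
   which is how the review comment narrows the GOTPCREL form to ELF.  */
static bool gotpcrel_call_allowed(const struct target_info *t)
{
  return t->is_64bit && !t->is_macho && !t->is_seh && !t->is_pecoff;
}
```

With this model, a 64-bit ELF target passes the guard, while a 64-bit
Mach-O target falls back to the ordinary direct call.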

-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread Sriraman Tallam
I have attached a patch that adds the new attribute noplt.  Please review.

* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.



Thanks
Sri

On Fri, May 22, 2015 at 2:00 AM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 11:02 PM, Sriraman Tallam wrote:
 On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:

 My original proposal, for x86_64 only, was to add
 -fno-plt=function-name. This lets the user decide for which
 functions PLT must be avoided.  Let the compiler always generate an
 indirect call using call *func@GOTPCREL(%rip).  We could do this for
 non-PIC code too.  No need for linker fixups since this relies on the
 user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

 It is not clear to me where I would stick the attribute.  Example
 usage in foo.cc:

 #include <string.h>

 int main() {
   int n = memcmp();
 }

 I want memcmp to not go through PLT, do you propose explicitly
 re-declaring it in foo.cc with the attribute?

 I guess you'd do:

 #include <string.h>

 __attribute__((no_plt)) typeof (memcpy) memcpy;

 int main() {
   int n = memcmp();
 }

 or even:

 #include <string.h>

 int main() {
   if (hotpath) {
 __attribute__((no_plt)) typeof (memcpy) memcpy;
 for (..) {
   int n = memcmp();
 }
   } else {
   int n = memcmp();
   }
 }

 or globally:

 $ cat no-plt/string.h:
 #include_next <string.h>
 __attribute__((no_plt)) typeof (memcpy) memcpy;

 $ gcc -I no-plt/ ...

 Thanks,
 Pedro Alves

* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25599,6 +25599,25 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (GET_CODE (call_op) != SYMBOL_REF
+      || SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25611,7 +25630,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "jmp\t%P0";
+        }
   /* SEH epilogue detection requires the indirect branch case
 to include REX.W.  */
   else if (TARGET_SEH)
@@ -25654,7 +25678,12 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
 }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46657,9 @@ static const struct attribute_spec ix86_attribute_
 false },
   { "callee_pop_aggregate_return", 1, 1, false, true, true,
     ix86_handle_callee_pop_aggregate_return, true },
+  /* Attribute to avoid calling function via PLT.  */
+  { "noplt", 0, 0, true, false, false, ix86_handle_fndecl_attribute,
+    false },
   /* End element.  */
   { NULL,        0, 0, false, false, false, NULL, false }
 };
Index: doc/extend.texi
===
--- doc/extend.texi (revision 223720)
+++ doc/extend.texi (working copy)
@@ -4858,6 +4858,13 @@ On x86-32 targets, the @code{stdcall} attribute ca
 assume that the called function pops off the stack space used to
 pass arguments, unless it takes a variable number of arguments.
 
+@item noplt
+@cindex @code{noplt} function attribute, x86-64
+@cindex functions whose calls do not go via PLT
+On x86-64 targets. the @code{noplt} 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread Sriraman Tallam
On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com wrote:
 I have attached a patch that adds the new attribute noplt.  Please review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

Ok, I will make this change. OTOH, is it just better to piggy-back on
existing -fno-plt change by Alex in calls.c
and do this:

Index: calls.c
===
--- calls.c (revision 223720)
+++ calls.c (working copy)
@@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
               ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
               : memory_address (FUNCTION_MODE, funexp));
-  else if (flag_pic && !flag_plt && fndecl_or_type
+  else if (fndecl_or_type
            && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
-           && !targetm.binds_local_p (fndecl_or_type))
+           && !targetm.binds_local_p (fndecl_or_type)
+           && ((flag_pic && !flag_plt)
+               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
     {
       funexp = force_reg (Pmode, funexp);
     }


Thanks
Sri


 --
 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread H.J. Lu
On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
                 && targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
                ? force_not_mem (memory_address (FUNCTION_MODE, funexp))
                : memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic && !flag_plt && fndecl_or_type
 +  else if (fndecl_or_type
             && TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -           && !targetm.binds_local_p (fndecl_or_type))
 +           && !targetm.binds_local_p (fndecl_or_type)
 +           && ((flag_pic && !flag_plt)
 +               || (lookup_attribute ("noplt", DECL_ATTRIBUTES(fndecl_or_type)))))
      {
        funexp = force_reg (Pmode, funexp);
      }


Does it work on non-PIC calls?

-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread Sriraman Tallam
On Thu, May 28, 2015 at 2:52 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, 
 DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

 You are right, it doesnt work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do wee need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

 We do need it right because  for this case below, I do not want an
 indirect call:

 __attribute__((noplt))
 int foo() {
   return 0;
 }

 int main()
 {
   return foo();
 }

 Assuming foo is not inlined, if I remove the lines you mentioned, I
 will get an indirect call which is unnecessary.


 I meant the GET_CODE (call_op) != SYMBOL_REF part isn't
 needed.

I should have realized that :), sorry.  Patch fixed.
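
For readers following along, the decision the fixed helper makes can be
modelled outside of GCC. This is only a toy sketch: toy_symbol, toy_decl and
toy_avoid_plt are hypothetical names standing in for the rtx/tree machinery
(SYMBOL_REF_LOCAL_P, SYMBOL_REF_DECL, lookup_attribute), and the caller is
assumed to pass a symbol reference already, which is why no GET_CODE-style
test appears.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for GCC's tree and rtx types.  */
enum toy_code { TOY_FUNCTION_DECL, TOY_VAR_DECL };

struct toy_decl {
  enum toy_code code;   /* models TREE_CODE (decl) */
  bool has_noplt;       /* models lookup_attribute ("noplt", ...) != NULL */
};

struct toy_symbol {
  bool binds_locally;           /* models SYMBOL_REF_LOCAL_P */
  const struct toy_decl *decl;  /* models SYMBOL_REF_DECL; may be NULL */
};

/* Mirrors the order of checks in the patch: locally bound symbols never
   avoid the PLT; otherwise a non-null function decl carrying the
   attribute is required.  */
static bool toy_avoid_plt(const struct toy_symbol *sym)
{
  if (sym->binds_locally)
    return false;

  const struct toy_decl *decl = sym->decl;
  return decl != NULL
         && decl->code == TOY_FUNCTION_DECL
         && decl->has_noplt;
}
```

The null test on the decl matters in this model for the same reason it does
in the patch: a symbol may have no declaration attached at all.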

Thanks
Sri




 --
 H.J.
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn): Generate indirect call for functions
marked with noplt attribute.
(attribute_spec ix86_attribute_): Define new attribute noplt.
* doc/extend.texi: Document new attribute noplt.
* gcc.target/i386/noplt-1.c: New testcase.
* gcc.target/i386/noplt-2.c: New testcase.

Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 223720)
+++ config/i386/i386.c  (working copy)
@@ -25599,6 +25599,24 @@ ix86_expand_call (rtx retval, rtx fnaddr, rtx call
   return call;
 }
 
+/* Return true if the function being called was marked with attribute
+   noplt.  If this function is defined, this should return false.  */
+static bool
+avoid_plt_to_call (rtx call_op)
+{
+  if (SYMBOL_REF_LOCAL_P (call_op))
+    return false;
+
+  tree symbol_decl = SYMBOL_REF_DECL (call_op);
+
+  if (symbol_decl != NULL_TREE
+      && TREE_CODE (symbol_decl) == FUNCTION_DECL
+      && lookup_attribute ("noplt", DECL_ATTRIBUTES (symbol_decl)))
+    return true;
+
+  return false;
+}
+
 /* Output the assembly for a call instruction.  */
 
 const char *
@@ -25611,7 +25629,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
   if (SIBLING_CALL_P (insn))
     {
       if (direct_p)
-        xasm = "%!jmp\t%P0";
+        {
+          if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+              && TARGET_64BIT && avoid_plt_to_call (call_op))
+            xasm = "%!jmp\t*%p0@GOTPCREL(%%rip)";
+          else
+            xasm = "%!jmp\t%P0";
+        }
   /* SEH epilogue detection requires the indirect branch case
 to include REX.W.  */
   else if (TARGET_SEH)
@@ -25654,7 +25678,13 @@ ix86_output_call_insn (rtx_insn *insn, rtx call_op
 }
 
   if (direct_p)
-    xasm = "%!call\t%P0";
+    {
+      if (!TARGET_MACHO && !TARGET_SEH && !TARGET_PECOFF
+          && TARGET_64BIT && avoid_plt_to_call (call_op))
+        xasm = "%!call\t*%p0@GOTPCREL(%%rip)";
+      else
+        xasm = "%!call\t%P0";
+    }
   else
     xasm = "%!call\t%A0";
 
@@ -46628,6 +46658,9 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-28 Thread H.J. Lu
On Thu, May 28, 2015 at 4:54 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:52 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 2:27 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Thu, May 28, 2015 at 2:01 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 1:54 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 12:05 PM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:50 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Thu, May 28, 2015 at 11:42 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Thu, May 28, 2015 at 11:34 AM, Sriraman Tallam 
 tmsri...@google.com wrote:
 I have attached a patch that adds the new attribute noplt.  Please 
 review.

 * config/i386/i386.c (avoid_plt_to_call): New function.
 (ix86_output_call_insn): Generate indirect call for functions
 marked with noplt attribute.
 (attribute_spec ix86_attribute_): Define new attribute noplt.
 * doc/extend.texi: Document new attribute noplt.
 * gcc.target/i386/noplt-1.c: New testcase.
 * gcc.target/i386/noplt-2.c: New testcase.


 2 comments:

 1. Don't remove %! prefix before call/jmp.  It is needed for MPX.
 2. Don't you need to check

!TARGET_MACHO
!TARGET_SEH
!TARGET_PECOFF

 since it only works for ELF.

 Ok, I will make this change. OTOH, is it just better to piggy-back on
 existing -fno-plt change by Alex in calls.c
 and do this:

 Index: calls.c
 ===
 --- calls.c (revision 223720)
 +++ calls.c (working copy)
 @@ -226,9 +226,11 @@ prepare_call_address (tree fndecl_or_type, rtx fun
  targetm.small_register_classes_for_mode_p (FUNCTION_MODE))
? force_not_mem (memory_address (FUNCTION_MODE, funexp))
: memory_address (FUNCTION_MODE, funexp));
 -  else if (flag_pic  !flag_plt  fndecl_or_type
 +  else if (fndecl_or_type
  TREE_CODE (fndecl_or_type) == FUNCTION_DECL
 -!targetm.binds_local_p (fndecl_or_type))
 +!targetm.binds_local_p (fndecl_or_type)
 +((flag_pic  !flag_plt)
 +   || (lookup_attribute (noplt, 
 DECL_ATTRIBUTES(fndecl_or_type)
  {
funexp = force_reg (Pmode, funexp);
  }


 Does it work on non-PIC calls?

 You are right, it doesnt work.  I have attached the patch with the
 changes you mentioned.


 Since direct_p is true, do wee need

 +  if (GET_CODE (call_op) != SYMBOL_REF
 +  || SYMBOL_REF_LOCAL_P (call_op))
 +return false;

 We do need it right because  for this case below, I do not want an
 indirect call:

 __attribute__((noplt))
 int foo() {
   return 0;
 }

 int main()
 {
   return foo();
 }

 Assuming foo is not inlined, if I remove the lines you mentioned, I
 will get an indirect call which is unnecessary.


 I meant the GET_CODE (call_op) != SYMBOL_REF part isn't
 needed.

 I should have realized that :), sorry.  Patch fixed.


--- testsuite/gcc.target/i386/noplt-1.c (revision 0)
+++ testsuite/gcc.target/i386/noplt-1.c (working copy)
@@ -0,0 +1,13 @@
+/* { dg-do compile { target x86_64-*-* } } */
...
+/* { dg-final { scan-assembler "call\[ \t\]\\*.*foo.*@GOTPCREL\\(%rip\\)" } } */

The test will fail on Windows and Darwin.


-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-22 Thread Pedro Alves
On 05/21/2015 11:02 PM, Sriraman Tallam wrote:
 On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:

 My original proposal, for x86_64 only, was to add
 -fno-plt=function-name. This lets the user decide for which
 functions PLT must be avoided.  Let the compiler always generate an
 indirect call using call *func@GOTPCREL(%rip).  We could do this for
 non-PIC code too.  No need for linker fixups since this relies on the
 user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?
 
 It is not clear to me where I would stick the attribute.  Example
 usage in foo.cc:
 
 #include <string.h>
 
 int main() {
   int n = memcmp();
 }
 
 I want memcmp to not go through PLT, do you propose explicitly
 re-declaring it in foo.cc with the attribute?

I guess you'd do:

#include <string.h>

__attribute__((no_plt)) typeof (memcpy) memcpy;

int main() {
  int n = memcmp();
}

or even:

#include <string.h>

int main() {
  if (hotpath) {
    __attribute__((no_plt)) typeof (memcpy) memcpy;
    for (..) {
      int n = memcmp();
    }
  } else {
    int n = memcmp();
  }
}

or globally:

$ cat no-plt/string.h:
#include_next <string.h>
__attribute__((no_plt)) typeof (memcpy) memcpy;

$ gcc -I no-plt/ ...

Thanks,
Pedro Alves
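
A compilable variant of the redeclaration idea above, offered as a minimal
sketch rather than as the committed interface.  It assumes the attribute
spelling `noplt` (the name listed in the ChangeLog of the patch that was
eventually committed in this thread) instead of the `no_plt` spelling used
in the examples, and guards it with `__has_attribute` so that compilers
without the attribute still build the file.  The helper `same_prefix` is
hypothetical and exists only to give the call a caller.  The compiler may
still expand a small memcmp inline, in which case no call is emitted at all.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Older compilers do not provide __has_attribute; treat that as "absent". */
#ifndef __has_attribute
#define __has_attribute(x) 0
#endif

/* Redeclare memcmp with the attribute, keeping the type from <string.h>.
   Assumption: the attribute is spelled "noplt" on this compiler.  */
#if __has_attribute(noplt)
extern __typeof__ (memcmp) memcmp __attribute__ ((noplt));
#endif

/* Hypothetical helper: returns 1 when the first n bytes of a and b match.  */
static int same_prefix(const char *a, const char *b, size_t n)
{
    assert(a != NULL && b != NULL);
    return memcmp(a, b, n) == 0;
}
```

Placing the redeclaration in a wrapper header, as in the "globally" variant
above, would apply it to every translation unit that includes the wrapper.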



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-22 Thread Sriraman Tallam
On Fri, May 22, 2015 at 2:00 AM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 11:02 PM, Sriraman Tallam wrote:
 On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:

 My original proposal, for x86_64 only, was to add
 -fno-plt=function-name. This lets the user decide for which
 functions PLT must be avoided.  Let the compiler always generate an
 indirect call using call *func@GOTPCREL(%rip).  We could do this for
 non-PIC code too.  No need for linker fixups since this relies on the
 user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

 It is not clear to me where I would stick the attribute.  Example
 usage in foo.cc:

 #include <string.h>

 int main() {
   int n = memcmp();
 }

 I want memcmp to not go through PLT, do you propose explicitly
 re-declaring it in foo.cc with the attribute?

 I guess you'd do:

 #include <string.h>

 __attribute__((no_plt)) typeof (memcmp) memcmp;

 int main() {
   int n = memcmp();
 }

 or even:

 #include <string.h>

 int main() {
   if (hotpath) {
     __attribute__((no_plt)) typeof (memcmp) memcmp;
     for (..) {
       int n = memcmp();
     }
   } else {
     int n = memcmp();
   }
 }

 or globally:

 $ cat no-plt/string.h:
 #include_next <string.h>
 __attribute__((no_plt)) typeof (memcmp) memcmp;

 $ gcc -I no-plt/ ...

That looks good, thanks.

Sri


 Thanks,
 Pedro Alves



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Sriraman Tallam
On Sun, May 10, 2015 at 10:01 AM, Sriraman Tallam tmsri...@google.com wrote:

 On Sun, May 10, 2015, 8:19 AM H.J. Lu hjl.to...@gmail.com wrote:

 On Sat, May 9, 2015 at 9:34 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated
 PLT stubs for some of the hot external library functions like memcmp,
 pow.  The win was from better icache and itlb performance. The main
 reason was that the PLT stubs had no spatial locality with the
 call-sites. I have started looking at ways to tell the compiler to
 eliminate PLT stubs (in-effect inline them) for specified external
 functions, for x86_64. I have a proposal and a patch and I would like to
 hear what you think.

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if a
 function is truly extern (defined in a shared library). If a function
 is not truly extern(ends up defined in the final executable), then
 calling it indirectly is a performance penalty as it could have been a
 direct call.

 This can be fixed by Alans idea.

 Further, the newly created GOT entries are fixed up at
 start-up and do not get lazily bound.

 And this can be fixed by some enhancements in the linker and dynamic
 linker.  The idea is to still generate a PLT stub and make its GOT entry
 point to it initially (like a normal got.plt slot).  Then the first
 indirect call will use the address of PLT entry (starting lazy
 resolution)
 and update the GOT slot with the real address, so further indirect calls
 will directly go to the function.

 This requires a new asm marker (and hence new reloc) as normally if
 there's a GOT slot it's filled by the real symbols address, unlike if
 there's only a got.plt slot.  E.g. a

   call *foo@GOTPLT(%rip)

 would generate a GOT slot (and fill its address into above call insn),
 but
 generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


 I added the relax prefix support to x86 assembler on users/hjl/relax
 branch

 at

 https://sourceware.org/git/?p=binutils-gdb.git;a=summary

 [hjl@gnu-tools-1 relax-3]$ cat r.S
 .text
 relax jmp foo
 relax call foo
 relax jmp foo@plt
 relax call foo@plt
 [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
 [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

 r.o: file format elf64-x86-64


 Disassembly of section .text:

  <.text>:
     0:  66 e9 00 00 00 00   data16 jmpq 0x6     2: R_X86_64_RELAX_PC32   foo-0x4
     6:  66 e8 00 00 00 00   data16 callq 0xc    8: R_X86_64_RELAX_PC32   foo-0x4
     c:  66 e9 00 00 00 00   data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32  foo-0x4
    12:  66 e8 00 00 00 00   data16 callq 0x18  14: R_X86_64_RELAX_PLT32  foo-0x4
 [hjl@gnu-tools-1 relax-3]$

 Right now, the relax relocations are treated as PC32/PLT32 relocations.
 I am working on linker support.


 I implemented the linker support for x86-64:

  <main>:
     0:  48 83 ec 08         sub    $0x8,%rsp
     4:  e8 00 00 00 00      callq  9 <main+0x9>          5: R_X86_64_PC32         plt-0x4
     9:  e8 00 00 00 00      callq  e <main+0xe>          a: R_X86_64_PLT32        plt-0x4
     e:  e8 00 00 00 00      callq  13 <main+0x13>        f: R_X86_64_PC32         bar-0x4
    13:  66 e8 00 00 00 00   data16 callq 19 <main+0x19> 15: R_X86_64_RELAX_PC32   bar-0x4
    19:  66 e8 00 00 00 00   data16 callq 1f <main+0x1f> 1b: R_X86_64_RELAX_PLT32  bar-0x4
    1f:  66 e8 00 00 00 00   data16 callq 25 <main+0x25> 21: R_X86_64_RELAX_PC32   foo-0x4
    25:  66 e8 00 00 00 00   data16 callq 2b <main+0x2b> 27: R_X86_64_RELAX_PLT32  foo-0x4
    2b:  31 c0               xor    %eax,%eax
    2d:  48 83 c4 08         add    $0x8,%rsp
    31:  c3                  retq

 00400460 <main>:
   400460:  48 83 ec 08         sub    $0x8,%rsp
   400464:  e8 d7 ff ff ff      callq  400440 <plt@plt>
   400469:  e8 d2 ff ff ff      callq  400440 <plt@plt>
   40046e:  e8 ad ff ff ff      callq  400420 <bar@plt>
   400473:  ff 15 ff 03 20 00   callq  *0x2003ff(%rip)        # 600878 <_DYNAMIC+0xf8>
   400479:  ff 15 f9 03 20 00   callq  *0x2003f9(%rip)        # 600878 <_DYNAMIC+0xf8>
   40047f:  66 e8 f3 00 00 00   data16 callq 400578 <foo>
   400485:  66 e8 ed 00 00 00   data16 callq 400578 <foo>
   40048b:  31 c0               xor    %eax,%eax
   40048d:  48 83 c4 08         add    $0x8,%rsp
   400491:  c3                  retq

 Sriraman, can you give it a try?


I like HJ's proposal here and it is important that the linker fixes
unnecessary indirect calls to direct ones.

However, independently I think my original proposal is still useful
and I want to pitch it again for the following reasons.

AFAIU, Alexander Monakov's -fno-plt does not solve the following:

* Does not do anything for non-PIC code. The compiler does not
generate a @PLT call but the linker will route all external calls via
PLT.  We noticed a problem with non-PIC executables where the PLT
stubs were 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Sriraman Tallam
On Thu, May 21, 2015 at 2:12 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Sun, May 10, 2015 at 10:01 AM, Sriraman Tallam tmsri...@google.com wrote:

 On Sun, May 10, 2015, 8:19 AM H.J. Lu hjl.to...@gmail.com wrote:

 On Sat, May 9, 2015 at 9:34 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated
 PLT stubs for some of the hot external library functions like memcmp,
 pow.  The win was from better icache and itlb performance. The main
 reason was that the PLT stubs had no spatial locality with the
 call-sites. I have started looking at ways to tell the compiler to
 eliminate PLT stubs (in-effect inline them) for specified external
 functions, for x86_64. I have a proposal and a patch and I would like to
 hear what you think.

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if a
 function is truly extern (defined in a shared library). If a function
 is not truly extern(ends up defined in the final executable), then
 calling it indirectly is a performance penalty as it could have been a
 direct call.

 This can be fixed by Alans idea.

 Further, the newly created GOT entries are fixed up at
 start-up and do not get lazily bound.

 And this can be fixed by some enhancements in the linker and dynamic
 linker.  The idea is to still generate a PLT stub and make its GOT entry
 point to it initially (like a normal got.plt slot).  Then the first
 indirect call will use the address of PLT entry (starting lazy
 resolution)
 and update the GOT slot with the real address, so further indirect calls
 will directly go to the function.

 This requires a new asm marker (and hence new reloc) as normally if
 there's a GOT slot it's filled by the real symbols address, unlike if
 there's only a got.plt slot.  E.g. a

   call *foo@GOTPLT(%rip)

 would generate a GOT slot (and fill its address into above call insn),
 but
 generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


 I added the relax prefix support to x86 assembler on users/hjl/relax
 branch

 at

 https://sourceware.org/git/?p=binutils-gdb.git;a=summary

 [hjl@gnu-tools-1 relax-3]$ cat r.S
 .text
 relax jmp foo
 relax call foo
 relax jmp foo@plt
 relax call foo@plt
 [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
 [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

 r.o: file format elf64-x86-64


 Disassembly of section .text:

  <.text>:
     0:  66 e9 00 00 00 00   data16 jmpq 0x6     2: R_X86_64_RELAX_PC32   foo-0x4
     6:  66 e8 00 00 00 00   data16 callq 0xc    8: R_X86_64_RELAX_PC32   foo-0x4
     c:  66 e9 00 00 00 00   data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32  foo-0x4
    12:  66 e8 00 00 00 00   data16 callq 0x18  14: R_X86_64_RELAX_PLT32  foo-0x4
 [hjl@gnu-tools-1 relax-3]$

 Right now, the relax relocations are treated as PC32/PLT32 relocations.
 I am working on linker support.


 I implemented the linker support for x86-64:

  <main>:
     0:  48 83 ec 08         sub    $0x8,%rsp
     4:  e8 00 00 00 00      callq  9 <main+0x9>          5: R_X86_64_PC32         plt-0x4
     9:  e8 00 00 00 00      callq  e <main+0xe>          a: R_X86_64_PLT32        plt-0x4
     e:  e8 00 00 00 00      callq  13 <main+0x13>        f: R_X86_64_PC32         bar-0x4
    13:  66 e8 00 00 00 00   data16 callq 19 <main+0x19> 15: R_X86_64_RELAX_PC32   bar-0x4
    19:  66 e8 00 00 00 00   data16 callq 1f <main+0x1f> 1b: R_X86_64_RELAX_PLT32  bar-0x4
    1f:  66 e8 00 00 00 00   data16 callq 25 <main+0x25> 21: R_X86_64_RELAX_PC32   foo-0x4
    25:  66 e8 00 00 00 00   data16 callq 2b <main+0x2b> 27: R_X86_64_RELAX_PLT32  foo-0x4
    2b:  31 c0               xor    %eax,%eax
    2d:  48 83 c4 08         add    $0x8,%rsp
    31:  c3                  retq

 00400460 <main>:
   400460:  48 83 ec 08         sub    $0x8,%rsp
   400464:  e8 d7 ff ff ff      callq  400440 <plt@plt>
   400469:  e8 d2 ff ff ff      callq  400440 <plt@plt>
   40046e:  e8 ad ff ff ff      callq  400420 <bar@plt>
   400473:  ff 15 ff 03 20 00   callq  *0x2003ff(%rip)        # 600878 <_DYNAMIC+0xf8>
   400479:  ff 15 f9 03 20 00   callq  *0x2003f9(%rip)        # 600878 <_DYNAMIC+0xf8>
   40047f:  66 e8 f3 00 00 00   data16 callq 400578 <foo>
   400485:  66 e8 ed 00 00 00   data16 callq 400578 <foo>
   40048b:  31 c0               xor    %eax,%eax
   40048d:  48 83 c4 08         add    $0x8,%rsp
   400491:  c3                  retq

 Sriraman, can you give it a try?


 I like HJ's proposal here and it is important that the linker fixes
 unnecessary indirect calls to direct ones.

 However, independently I think my original proposal is still useful
 and I want to pitch it again for the following reasons.

 AFAIU, Alexander Monakov's -fno-plt does not solve the following:

 * Does not do anything for non-PIC code. The compiler does not
 generate a @PLT call but the linker will route all external calls 

Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Pedro Alves
On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
 
 My original proposal, for x86_64 only, was to add
 -fno-plt=function-name. This lets the user decide for which
 functions PLT must be avoided.  Let the compiler always generate an
 indirect call using call *func@GOTPCREL(%rip).  We could do this for
 non-PIC code too.  No need for linker fixups since this relies on the
 user to know that func is from a shared object.

Having to pass function names on the command line seems like an odd
interface.  E.g., you'll need to pass the mangled name for
C++ functions.  Any reason this isn't a function attribute?

Thanks,
Pedro Alves



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Sriraman Tallam
On Thu, May 21, 2015 at 2:51 PM, Pedro Alves pal...@redhat.com wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:

 My original proposal, for x86_64 only, was to add
 -fno-plt=function-name. This lets the user decide for which
 functions PLT must be avoided.  Let the compiler always generate an
 indirect call using call *func@GOTPCREL(%rip).  We could do this for
 non-PIC code too.  No need for linker fixups since this relies on the
 user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

It is not clear to me where I would stick the attribute.  Example
usage in foo.cc:

#include <string.h>

int main() {
  int n = memcmp();
}

I want memcmp to not go through PLT, do you propose explicitly
re-declaring it in foo.cc with the attribute?

Thanks
Sri




 Thanks,
 Pedro Alves



Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread H.J. Lu
On Thu, May 21, 2015 at 2:58 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
 
  My original proposal, for x86_64 only, was to add
  -fno-plt=function-name. This lets the user decide for which
  functions PLT must be avoided.  Let the compiler always generate an
  indirect call using call *func@GOTPCREL(%rip).  We could do this for
  non-PIC code too.  No need for linker fixups since this relies on the
  user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

 I strongly second this.  Similar reasons for why we haven't added
 the asan blacklisting from the command line, one really should use
 function attributes for this kind of things.


We can extend attribute to add something similar to dllimport


-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Jakub Jelinek
On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
  
  My original proposal, for x86_64 only, was to add
  -fno-plt=function-name. This lets the user decide for which
  functions PLT must be avoided.  Let the compiler always generate an
  indirect call using call *func@GOTPCREL(%rip).  We could do this for
  non-PIC code too.  No need for linker fixups since this relies on the
  user to know that func is from a shared object.
 
 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

I strongly second this.  For similar reasons we haven't added
asan blacklisting from the command line; one really should use
function attributes for this kind of thing.

Jakub


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-21 Thread Xinliang David Li
We have -finstrument-functions-exclude-function-list=.. in GCC, though
it is not using mangled names.

David

On Thu, May 21, 2015 at 2:58 PM, Jakub Jelinek ja...@redhat.com wrote:
 On Thu, May 21, 2015 at 10:51:50PM +0100, Pedro Alves wrote:
 On 05/21/2015 10:12 PM, Sriraman Tallam wrote:
 
  My original proposal, for x86_64 only, was to add
  -fno-plt=function-name. This lets the user decide for which
  functions PLT must be avoided.  Let the compiler always generate an
  indirect call using call *func@GOTPCREL(%rip).  We could do this for
  non-PIC code too.  No need for linker fixups since this relies on the
  user to know that func is from a shared object.

 Having to pass function names on the command line seems like an odd
 interface.  E.g, you'll need to pass the mangled name for
 C++ functions.  Any reason this isn't a function attribute?

 I strongly second this.  Similar reasons for why we haven't added
 the asan blacklisting from the command line, one really should use
 function attributes for this kind of things.

 Jakub


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-10 Thread H.J. Lu
On Sat, May 9, 2015 at 9:34 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated
 PLT stubs for some of the hot external library functions like memcmp,
 pow.  The win was from better icache and itlb performance. The main
 reason was that the PLT stubs had no spatial locality with the
 call-sites. I have started looking at ways to tell the compiler to
 eliminate PLT stubs (in-effect inline them) for specified external
 functions, for x86_64. I have a proposal and a patch and I would like to
 hear what you think.

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if a
 function is truly extern (defined in a shared library). If a function
 is not truly extern(ends up defined in the final executable), then
 calling it indirectly is a performance penalty as it could have been a
 direct call.

 This can be fixed by Alans idea.

 Further, the newly created GOT entries are fixed up at
 start-up and do not get lazily bound.

 And this can be fixed by some enhancements in the linker and dynamic
 linker.  The idea is to still generate a PLT stub and make its GOT entry
 point to it initially (like a normal got.plt slot).  Then the first
 indirect call will use the address of PLT entry (starting lazy resolution)
 and update the GOT slot with the real address, so further indirect calls
 will directly go to the function.

 This requires a new asm marker (and hence new reloc) as normally if
 there's a GOT slot it's filled by the real symbols address, unlike if
 there's only a got.plt slot.  E.g. a

   call *foo@GOTPLT(%rip)

 would generate a GOT slot (and fill its address into above call insn), but
 generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


 I added the relax prefix support to x86 assembler on users/hjl/relax
 branch

 at

 https://sourceware.org/git/?p=binutils-gdb.git;a=summary

 [hjl@gnu-tools-1 relax-3]$ cat r.S
 .text
 relax jmp foo
 relax call foo
 relax jmp foo@plt
 relax call foo@plt
 [hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
 [hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

 r.o: file format elf64-x86-64


 Disassembly of section .text:

  <.text>:
     0:  66 e9 00 00 00 00   data16 jmpq 0x6     2: R_X86_64_RELAX_PC32   foo-0x4
     6:  66 e8 00 00 00 00   data16 callq 0xc    8: R_X86_64_RELAX_PC32   foo-0x4
     c:  66 e9 00 00 00 00   data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32  foo-0x4
    12:  66 e8 00 00 00 00   data16 callq 0x18  14: R_X86_64_RELAX_PLT32  foo-0x4
 [hjl@gnu-tools-1 relax-3]$

 Right now, the relax relocations are treated as PC32/PLT32 relocations.
 I am working on linker support.


I implemented the linker support for x86-64:

 <main>:
    0:  48 83 ec 08         sub    $0x8,%rsp
    4:  e8 00 00 00 00      callq  9 <main+0x9>          5: R_X86_64_PC32         plt-0x4
    9:  e8 00 00 00 00      callq  e <main+0xe>          a: R_X86_64_PLT32        plt-0x4
    e:  e8 00 00 00 00      callq  13 <main+0x13>        f: R_X86_64_PC32         bar-0x4
   13:  66 e8 00 00 00 00   data16 callq 19 <main+0x19> 15: R_X86_64_RELAX_PC32   bar-0x4
   19:  66 e8 00 00 00 00   data16 callq 1f <main+0x1f> 1b: R_X86_64_RELAX_PLT32  bar-0x4
   1f:  66 e8 00 00 00 00   data16 callq 25 <main+0x25> 21: R_X86_64_RELAX_PC32   foo-0x4
   25:  66 e8 00 00 00 00   data16 callq 2b <main+0x2b> 27: R_X86_64_RELAX_PLT32  foo-0x4
   2b:  31 c0               xor    %eax,%eax
   2d:  48 83 c4 08         add    $0x8,%rsp
   31:  c3                  retq

00400460 <main>:
  400460:  48 83 ec 08         sub    $0x8,%rsp
  400464:  e8 d7 ff ff ff      callq  400440 <plt@plt>
  400469:  e8 d2 ff ff ff      callq  400440 <plt@plt>
  40046e:  e8 ad ff ff ff      callq  400420 <bar@plt>
  400473:  ff 15 ff 03 20 00   callq  *0x2003ff(%rip)        # 600878 <_DYNAMIC+0xf8>
  400479:  ff 15 f9 03 20 00   callq  *0x2003f9(%rip)        # 600878 <_DYNAMIC+0xf8>
  40047f:  66 e8 f3 00 00 00   data16 callq 400578 <foo>
  400485:  66 e8 ed 00 00 00   data16 callq 400578 <foo>
  40048b:  31 c0               xor    %eax,%eax
  40048d:  48 83 c4 08         add    $0x8,%rsp
  400491:  c3                  retq

Sriraman, can you give it a try?

-- 
H.J.
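
The displacements in the linked output above can be checked by hand.  Both
the relaxed direct call (66 e8 rel32) and the indirect call through a GOT
slot (ff 15 disp32) occupy six bytes, which is presumably why the direct
form carries the data16 prefix, and both displacements are relative to the
address of the following instruction.  The sketch below recomputes the bytes
shown at 0x400473 and 0x40047f.  The function names are made up for
illustration and are not part of binutils.

```c
#include <assert.h>
#include <stdint.h>

enum { INSN_LEN = 6 };  /* both call forms below are six bytes long */

/* Store a 32-bit value in little-endian byte order. */
static void put_le32(uint8_t *p, uint32_t v)
{
    for (int i = 0; i < 4; i++)
        p[i] = (uint8_t)(v >> (8 * i));
}

/* Displacement from the end of the instruction at insn_addr to target. */
static uint32_t disp32(uint64_t insn_addr, uint64_t target)
{
    uint64_t delta = target - (insn_addr + INSN_LEN);
    /* The value has to fit in a signed 32-bit field. */
    assert((int64_t)delta >= INT32_MIN && (int64_t)delta <= INT32_MAX);
    return (uint32_t)delta;
}

/* Hypothetical helper: "data16 callq target" is encoded as 66 e8 rel32. */
static void encode_direct_call(uint8_t out[INSN_LEN],
                               uint64_t insn_addr, uint64_t target)
{
    out[0] = 0x66;
    out[1] = 0xe8;
    put_le32(out + 2, disp32(insn_addr, target));
}

/* Hypothetical helper: "callq *slot(%rip)" is encoded as ff 15 disp32. */
static void encode_got_call(uint8_t out[INSN_LEN],
                            uint64_t insn_addr, uint64_t slot_addr)
{
    out[0] = 0xff;
    out[1] = 0x15;
    put_le32(out + 2, disp32(insn_addr, slot_addr));
}
```

For example, encode_got_call with the instruction at 0x400473 and the slot
at 0x600878 gives the displacement 0x600878 - 0x400479 = 0x2003ff, matching
the "ff 15 ff 03 20 00" bytes in the dump.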


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-09 Thread H.J. Lu
On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated
 PLT stubs for some of the hot external library functions like memcmp,
 pow.  The win was from better icache and itlb performance. The main
 reason was that the PLT stubs had no spatial locality with the
 call-sites. I have started looking at ways to tell the compiler to
 eliminate PLT stubs (in-effect inline them) for specified external
 functions, for x86_64. I have a proposal and a patch and I would like to
 hear what you think.

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if a
 function is truly extern (defined in a shared library). If a function
 is not truly extern(ends up defined in the final executable), then
 calling it indirectly is a performance penalty as it could have been a
 direct call.

 This can be fixed by Alans idea.

 Further, the newly created GOT entries are fixed up at
 start-up and do not get lazily bound.

 And this can be fixed by some enhancements in the linker and dynamic
 linker.  The idea is to still generate a PLT stub and make its GOT entry
 point to it initially (like a normal got.plt slot).  Then the first
 indirect call will use the address of PLT entry (starting lazy resolution)
 and update the GOT slot with the real address, so further indirect calls
 will directly go to the function.

 This requires a new asm marker (and hence new reloc) as normally if
 there's a GOT slot it's filled by the real symbols address, unlike if
 there's only a got.plt slot.  E.g. a

   call *foo@GOTPLT(%rip)

 would generate a GOT slot (and fill its address into above call insn), but
 generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


I added the relax prefix support to the x86 assembler on the
users/hjl/relax branch at

https://sourceware.org/git/?p=binutils-gdb.git;a=summary

[hjl@gnu-tools-1 relax-3]$ cat r.S
.text
relax jmp foo
relax call foo
relax jmp foo@plt
relax call foo@plt
[hjl@gnu-tools-1 relax-3]$ ./as -o r.o r.S
[hjl@gnu-tools-1 relax-3]$ ./objdump -drw r.o

r.o: file format elf64-x86-64


Disassembly of section .text:

 <.text>:
    0:  66 e9 00 00 00 00   data16 jmpq 0x6     2: R_X86_64_RELAX_PC32   foo-0x4
    6:  66 e8 00 00 00 00   data16 callq 0xc    8: R_X86_64_RELAX_PC32   foo-0x4
    c:  66 e9 00 00 00 00   data16 jmpq 0x12    e: R_X86_64_RELAX_PLT32  foo-0x4
   12:  66 e8 00 00 00 00   data16 callq 0x18  14: R_X86_64_RELAX_PLT32  foo-0x4
[hjl@gnu-tools-1 relax-3]$

Right now, the relax relocations are treated as PC32/PLT32 relocations.
I am working on linker support.

-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Michael Matz
Hi,

On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated 
 PLT stubs for some of the hot external library functions like memcmp, 
 pow.  The win was from better icache and itlb performance. The main 
 reason was that the PLT stubs had no spatial locality with the 
 call-sites. I have started looking at ways to tell the compiler to 
 eliminate PLT stubs (in-effect inline them) for specified external 
 functions, for x86_64. I have a proposal and a patch and I would like to 
 hear what you think.
 
 This comes with caveats.  This cannot be generally done for all 
 functions marked extern as it is impossible for the compiler to say if a 
 function is truly extern (defined in a shared library). If a function 
 is not truly extern(ends up defined in the final executable), then 
 calling it indirectly is a performance penalty as it could have been a 
 direct call.

This can be fixed by Alan's idea.

 Further, the newly created GOT entries are fixed up at 
 start-up and do not get lazily bound.

And this can be fixed by some enhancements in the linker and dynamic 
linker.  The idea is to still generate a PLT stub and make its GOT entry 
point to it initially (like a normal got.plt slot).  Then the first 
indirect call will use the address of PLT entry (starting lazy resolution) 
and update the GOT slot with the real address, so further indirect calls 
will directly go to the function.

This requires a new asm marker (and hence a new reloc), as normally if
there's a GOT slot it's filled with the real symbol's address, unlike if
there's only a got.plt slot.  E.g. a

  call *foo@GOTPLT(%rip)

would generate a GOT slot (and fill its address into above call insn), but 
generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


Ciao,
Michael.
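
The lazy scheme described above can be modelled in plain C with a function
pointer standing in for the GOT slot.  This is only an illustration of the
control flow, not how the linker or dynamic linker implement it;
foo_got_slot, foo_lazy_stub, real_foo, call_foo and resolver_calls are
invented names.

```c
#include <assert.h>

typedef int (*fn_t)(int);

/* Stands in for foo as defined in some shared library. */
static int real_foo(int x)
{
    return x + 1;
}

static int resolver_calls;          /* how often the slow path has run */
static int foo_lazy_stub(int x);    /* plays the role of the PLT stub */

/* Model of the GOT slot: it initially points at the stub, like a normal
   got.plt slot, rather than at the real function. */
static fn_t foo_got_slot = foo_lazy_stub;

/* First call lands here: resolve the symbol, patch the slot, then forward
   the call.  In the real scheme the dynamic linker does the patching. */
static int foo_lazy_stub(int x)
{
    resolver_calls++;
    foo_got_slot = real_foo;
    return foo_got_slot(x);
}

/* Models an indirect call through the slot, as in call *foo@GOTPLT(%rip). */
static int call_foo(int x)
{
    assert(foo_got_slot != 0);
    return foo_got_slot(x);
}
```

Only the first call_foo invocation goes through the stub; later calls load
the patched slot and reach real_foo directly, which is the property the
proposal relies on.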


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Xinliang David Li
The use case proposed by Sri allows the user to selectively eliminate PLT
overhead for hot external calls only. In such scenarios, lazy binding
won't be something that matters to the user.

David

On Mon, May 4, 2015 at 7:45 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 30 Apr 2015, Sriraman Tallam wrote:

 We noticed that one of our benchmarks sped-up by ~1% when we eliminated
 PLT stubs for some of the hot external library functions like memcmp,
 pow.  The win was from better icache and itlb performance. The main
 reason was that the PLT stubs had no spatial locality with the
 call-sites. I have started looking at ways to tell the compiler to
 eliminate PLT stubs (in-effect inline them) for specified external
 functions, for x86_64. I have a proposal and a patch and I would like to
 hear what you think.

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if a
 function is truly extern (defined in a shared library). If a function
 is not truly extern(ends up defined in the final executable), then
 calling it indirectly is a performance penalty as it could have been a
 direct call.

 This can be fixed by Alans idea.

 Further, the newly created GOT entries are fixed up at
 start-up and do not get lazily bound.

 And this can be fixed by some enhancements in the linker and dynamic
 linker.  The idea is to still generate a PLT stub and make its GOT entry
 point to it initially (like a normal got.plt slot).  Then the first
 indirect call will use the address of PLT entry (starting lazy resolution)
 and update the GOT slot with the real address, so further indirect calls
 will directly go to the function.

 This requires a new asm marker (and hence new reloc) as normally if
 there's a GOT slot it's filled by the real symbols address, unlike if
 there's only a got.plt slot.  E.g. a

   call *foo@GOTPLT(%rip)

 would generate a GOT slot (and fill its address into above call insn), but
 generate a JUMP_SLOT reloc in the final executable, not a GLOB_DAT one.


 Ciao,
 Michael.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Michael Matz
Hi,

On Mon, 4 May 2015, Xinliang David Li wrote:

 The use case proposed by Sri allows user to selectively eliminate PLT
 overhead for hot external calls only.

Yes, but only _because_ his approach doesn't use lazy binding.  With the 
full solution such restriction to a subset of functions isn't necessary.
And we should strive for going the full way, instead of adding hacks, 
shouldn't we?


Ciao,
Michael.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-04 Thread Xinliang David Li
yes -- a full solution that supports lazy binding will be nice.

David

On Mon, May 4, 2015 at 9:58 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Mon, 4 May 2015, Xinliang David Li wrote:

 The use case proposed by Sri allows user to selectively eliminate PLT
 overhead for hot external calls only.

 Yes, but only _because_ his approach doesn't use lazy binding.  With the
 full solution such restriction to a subset of functions isn't necessary.
 And we should strive for going the full way, instead of adding hacks,
 shouldn't we?


 Ciao,
 Michael.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-02 Thread Andi Kleen
On Fri, May 01, 2015 at 11:05:58AM -0700, Sriraman Tallam wrote:
 On Fri, May 1, 2015 at 9:26 AM, Xinliang David Li davi...@google.com wrote:
  yes -- it is good to turn this on by default in LTO mode without
  requiring user to specify the option.
 
 Yes, with LTO, we would exactly know what the truly extern functions
 are

... unless a function is overwritten somewhere else at dynamic link time.
That's why you may need -fno-semantic...

-Andi


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread Sriraman Tallam
On Fri, May 1, 2015 at 8:01 AM, Andi Kleen a...@firstfloor.org wrote:
 Sriraman Tallam tmsri...@google.com writes:

 This comes with  caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern(ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 This means you need to make it depend on -fno-semantic-interposition ?

Please correct me if I am wrong, but I do not see any dependency on
semantic interposition.  The GOT entry created for the function
pointer (whose PLT has been eliminated) has a dynamic relocation
against it to fix up the address at run time, and the dynamic linker
fills it with the right address.  This is not a new mechanism.  The
same mechanism is used when we access function pointers with PIE, for
instance.

Thanks
Sri


 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

 The argument seems awkward. The command line may get very long.
 Better an attribute?

 Longer term it would be probably better to support it properly
 in the linker.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread Xinliang David Li
yes -- it is good to turn this on by default in LTO mode without
requiring the user to specify the option.

David

On Fri, May 1, 2015 at 9:23 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li davi...@google.com wrote:
 On Fri, May 1, 2015 at 8:01 AM, Andi Kleen a...@firstfloor.org wrote:
 Sriraman Tallam tmsri...@google.com writes:

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 This means you need to make it depend on -fno-semantic-interposition ?

 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

 The argument seems awkward. The command line may get very long.
 Better an attribute?

 They are complementary. Perhaps another option like linker's
 --dynamic-list= that can take a file specifying the list of symbols.


 Longer term it would be probably better to support it properly
 in the linker.


 Linker solution has its own downside -- it require reserving more
 space conservatively for many callsites which end up being direct
 calls.


 Can we do it automatically for LTO?


 --
 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread H.J. Lu
On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li davi...@google.com wrote:
 On Fri, May 1, 2015 at 8:01 AM, Andi Kleen a...@firstfloor.org wrote:
 Sriraman Tallam tmsri...@google.com writes:

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 This means you need to make it depend on -fno-semantic-interposition ?

 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

 The argument seems awkward. The command line may get very long.
 Better an attribute?

 They are complementary. Perhaps another option like linker's
 --dynamic-list= that can take a file specifying the list of symbols.


 Longer term it would be probably better to support it properly
 in the linker.


 Linker solution has its own downside -- it require reserving more
 space conservatively for many callsites which end up being direct
 calls.


Can we do it automatically for LTO?


-- 
H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread Xinliang David Li
On Fri, May 1, 2015 at 8:01 AM, Andi Kleen a...@firstfloor.org wrote:
 Sriraman Tallam tmsri...@google.com writes:

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 This means you need to make it depend on -fno-semantic-interposition ?

 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

 The argument seems awkward. The command line may get very long.
 Better an attribute?

They are complementary. Perhaps another option like linker's
--dynamic-list= that can take a file specifying the list of symbols.


 Longer term it would be probably better to support it properly
 in the linker.


The linker solution has its own downside -- it requires reserving more
space conservatively for many call sites which end up being direct
calls.

David



 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread Andi Kleen
Sriraman Tallam tmsri...@google.com writes:

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

This means you need to make it depend on -fno-semantic-interposition?

 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

The argument seems awkward. The command line may get very long.
Better an attribute?

Longer term it would probably be better to support it properly
in the linker.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-05-01 Thread Sriraman Tallam
On Fri, May 1, 2015 at 9:26 AM, Xinliang David Li davi...@google.com wrote:
 yes -- it is good to turn this on by default in LTO mode without
 requiring user to specify the option.

Yes, with LTO, we would know exactly what the truly extern functions
are, and PLT stubs can be eliminated for all extern functions when
early binding is specified. With lazy binding, we can eliminate the
PLT stubs selectively for the hot extern functions.

Thanks
Sri


 David

 On Fri, May 1, 2015 at 9:23 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 1, 2015 at 9:19 AM, Xinliang David Li davi...@google.com wrote:
 On Fri, May 1, 2015 at 8:01 AM, Andi Kleen a...@firstfloor.org wrote:
 Sriraman Tallam tmsri...@google.com writes:

 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 This means you need to make it depend on -fno-semantic-interposition ?

 Given this, I propose adding a new option called
 -fno-plt=function-name to the compiler.  This tells the compiler
 that we know that the function is truly extern and we want the
 indirect call only for these call-sites.  I have attached a patch that
 adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
 all call-sites corresponding to these named functions will be done
 indirectly using the mechanism described above without the use of a
 PLT stub.

 The argument seems awkward. The command line may get very long.
 Better an attribute?

 They are complementary. Perhaps another option like linker's
 --dynamic-list= that can take a file specifying the list of symbols.


 Longer term it would be probably better to support it properly
 in the linker.


 Linker solution has its own downside -- it require reserving more
 space conservatively for many callsites which end up being direct
 calls.


 Can we do it automatically for LTO?


 --
 H.J.


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-04-30 Thread Alan Modra
On Thu, Apr 30, 2015 at 05:31:30PM -0700, Sriraman Tallam wrote:
 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

I've considered something similar for PowerPC (but didn't consider
doing so for a subset of calls).  Losing lazy symbol resolution is
a real problem.  The other problem you cite, of indirect calls that
could be direct, can be fixed in the linker relatively easily.
Edit this code
   0:   ff 15 00 00 00 00       callq  *0x0(%rip)        # 0x6
                        2: R_X86_64_GOTPCREL    foo-0x4
   6:   ff 25 00 00 00 00       jmpq   *0x0(%rip)        # 0xc
                        8: R_X86_64_GOTPCREL    foo-0x4
to this
   c:   e8 00 00 00 00          callq  0x11
                        d: R_X86_64_PC32        foo-0x4
  11:   90                      nop
  12:   e9 00 00 00 00          jmpq   0x17
                        13: R_X86_64_PC32       foo-0x4
  17:   90                      nop
You may need to have gcc or gas add a marker reloc to say exactly
where an instruction starts.
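
A minimal sketch of that byte-level edit, for illustration only.  The
function name relax_got_indirect is hypothetical and is not an existing
linker routine; it assumes the caller already knows where the
instruction starts (hence the marker reloc) and only handles the two
encodings shown above.

```c
#include <stddef.h>
#include <stdint.h>

/* Rewrite one 6-byte GOT-indirect call/jmp at insn[0..5] into a 5-byte
   direct call/jmp followed by a 1-byte nop.  Returns 1 if the bytes
   were rewritten, 0 if they are not one of the two recognized forms.
   The rel32 field is left as zero: a real linker would also change the
   relocation from R_X86_64_GOTPCREL at insn+2 to R_X86_64_PC32 at
   insn+1 and let normal relocation processing fill it in.  */
int relax_got_indirect(uint8_t *insn, size_t len)
{
    uint8_t opcode;

    if (len < 6 || insn[0] != 0xff)
        return 0;

    if (insn[1] == 0x15)        /* ff 15: callq *disp32(%rip) */
        opcode = 0xe8;          /* e8:    callq rel32 */
    else if (insn[1] == 0x25)   /* ff 25: jmpq *disp32(%rip) */
        opcode = 0xe9;          /* e9:    jmpq rel32 */
    else
        return 0;

    insn[0] = opcode;
    insn[1] = insn[2] = insn[3] = insn[4] = 0;  /* rel32 placeholder */
    insn[5] = 0x90;             /* nop fills the byte that was freed */
    return 1;
}
```

Padding with a nop after the direct instruction keeps every later
offset in the section unchanged, so no other relocation has to move.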

-- 
Alan Modra
Australia Development Lab, IBM


Re: [RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-04-30 Thread Sriraman Tallam
On Thu, Apr 30, 2015 at 8:21 PM, Alan Modra amo...@gmail.com wrote:
 On Thu, Apr 30, 2015 at 05:31:30PM -0700, Sriraman Tallam wrote:
 This comes with caveats.  This cannot be generally done for all
 functions marked extern as it is impossible for the compiler to say if
 a function is truly extern (defined in a shared library). If a
 function is not truly extern (ends up defined in the final executable),
 then calling it indirectly is a performance penalty as it could have
 been a direct call.  Further, the newly created GOT entries are fixed
 up at start-up and do not get lazily bound.

 I've considered something similar for PowerPC (but didn't consider
 doing so for a subset of calls).  Losing lazy symbol resolution is
 a real problem.

With the -fno-plt= option, you are choosing functions that are hot and
for which the PLT must be avoided.  Losing lazy binding on these should
be perfectly fine because they would be called anyway.

Thanks
Sri

 The other problem you cite of indirect calls that
 could be direct can be fixed in the linker relatively easily.
 Edit this code
    0:   ff 15 00 00 00 00       callq  *0x0(%rip)        # 0x6
                         2: R_X86_64_GOTPCREL    foo-0x4
    6:   ff 25 00 00 00 00       jmpq   *0x0(%rip)        # 0xc
                         8: R_X86_64_GOTPCREL    foo-0x4
 to this
    c:   e8 00 00 00 00          callq  0x11
                         d: R_X86_64_PC32        foo-0x4
   11:   90                      nop
   12:   e9 00 00 00 00          jmpq   0x17
                         13: R_X86_64_PC32       foo-0x4
   17:   90                      nop
 You may need to have gcc or gas add a marker reloc to say exactly
 where an instruction starts.

 --
 Alan Modra
 Australia Development Lab, IBM


[RFC][PATCH][X86_64] Eliminate PLT stubs for specified external functions via -fno-plt=

2015-04-30 Thread Sriraman Tallam
Hi,

We noticed that one of our benchmarks sped up by ~1% when we
eliminated PLT stubs for some of the hot external library functions
like memcmp and pow.  The win was from better icache and iTLB
performance. The main reason was that the PLT stubs had no spatial
locality with the call sites. I have started looking at ways to tell
the compiler to eliminate PLT stubs (in effect, inline them) for
specified external functions, for x86_64. I have a proposal and a
patch and I would like to hear what you think.

Here is a summary of what is happening currently. A call to an
external function is direct but calls into the PLT stub, which then
jumps indirectly through the GOT entry.  If I replace the direct call
to the PLT stub with an indirect call through a GOT entry that holds
the address of the external function, I get rid of the PLT
stub.  Here is an example:

foo.cc
======

// Truly external library function, defined in a shared library.
extern int foo ();

int main() {
  foo();
  ...
}

Currently, this is what is happening.

foo.s looks like this:

main:
.
callq _Z3foov

but the linker redirects this call to the PLT stub of foo instead.

Function main calls the plt stub directly:

00400766 <main>:
...
  40076a:   e8 71 fe ff ff  callq  4005e0 <_Z3foov@plt>

and the PLT stub does this:

004005e0 <_Z3foov@plt>:
  4005e0:   jmpq   *0x15d2(%rip)        # 401bb8 <_GLOBAL_OFFSET_TABLE_+0x28>
  4005e6:   pushq  $0x2
  4005eb:   jmpq   4005b0 <_init+0x28>

The GOT entry at address 0x401bb8 contains the address of foo which
will be lazily bound.

What my proposal does is change foo.s to look like this:

callq *_Z3foov@GOTPCREL(%rip)

which is indirectly calling foo via a GOT entry that contains the
address of foo.  The address in the GOT entry is fixed up at load time
and the linker creates only one GOT entry per function irrespective of
the number of callers.

a.out now looks like this:

00400746 <main>:
...
  40074a:   ff 15 20 14 00 00   callq  *0x1420(%rip)        # 401b70 <_DYNAMIC+0x1e8>
...

Function main indirectly calls foo using the contents at location
0x401b70 which is actually a GOT entry containing the address of foo.
Notice that we have in effect inlined the PLT stub.
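
To make the difference concrete, here is a small C model of the two
call paths.  It is only an analogy using ordinary function pointers,
not what the compiler or dynamic linker actually emits, and every name
in it (got_foo, plt_stub_foo, lazy_resolve_foo, and so on) is made up
for illustration.

```c
typedef int (*fn_ptr)(void);

/* Stands in for the real foo that lives in the shared library. */
static int foo_impl(void) { return 42; }

static fn_ptr got_foo;      /* models the GOT slot for foo */
static int resolver_runs;   /* counts trips through the lazy resolver */

/* Models lazy binding: patch the GOT slot on first use, then call. */
static int lazy_resolve_foo(void)
{
    resolver_runs++;
    got_foo = foo_impl;
    return got_foo();
}

/* Lazy binding: the slot initially points at the resolver. */
void setup_lazy(void)  { got_foo = lazy_resolve_foo; resolver_runs = 0; }

/* Eager binding: the slot is filled before the first call, which is
   what the GOT-indirect call sequence relies on. */
void setup_eager(void) { got_foo = foo_impl; resolver_runs = 0; }

/* Current scheme: direct call to a stub that jumps through the slot. */
static int plt_stub_foo(void) { return got_foo(); }
int call_via_plt(void) { return plt_stub_foo(); }

/* Proposed scheme: the call site itself calls through the slot. */
int call_via_got(void) { return got_foo(); }

int resolver_run_count(void) { return resolver_runs; }
```

In this model call_via_got skips the extra hop through plt_stub_foo,
which corresponds to the stub that no longer needs to be fetched from a
distant page.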

This comes with caveats.  This cannot be generally done for all
functions marked extern as it is impossible for the compiler to say if
a function is truly extern (defined in a shared library). If a
function is not truly extern (ends up defined in the final executable),
then calling it indirectly is a performance penalty as it could have
been a direct call.  Further, the newly created GOT entries are fixed
up at start-up and do not get lazily bound.

Given this, I propose adding a new option called
-fno-plt=function-name to the compiler.  This tells the compiler
that we know that the function is truly extern and we want the
indirect call only for these call-sites.  I have attached a patch that
adds -fno-plt= to GCC.  Any number of -fno-plt= can be specified and
all call-sites corresponding to these named functions will be done
indirectly using the mechanism described above without the use of a
PLT stub.
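
As a rough sketch of the bookkeeping this implies: each -fno-plt=
argument is recorded, and each call site asks whether its callee was
named.  The attached patch uses a hash table for this; the version
below uses a fixed-size array purely to stay short, and the names
record_no_plt_name and avoid_plt_p are hypothetical, not the functions
in the patch.

```c
#include <string.h>

#define MAX_NO_PLT_NAMES 64

/* Names given via -fno-plt=NAME.  Only the pointers are stored, so the
   caller must keep the strings alive (option strings normally are). */
static const char *no_plt_names[MAX_NO_PLT_NAMES];
static int no_plt_count;

/* Return 1 if calls to NAME should go through the GOT, not the PLT. */
int avoid_plt_p(const char *name)
{
    int i;

    for (i = 0; i < no_plt_count; i++)
        if (strcmp(no_plt_names[i], name) == 0)
            return 1;
    return 0;
}

/* Record one -fno-plt=NAME argument.  Duplicates are ignored.
   Returns 0 on success, -1 if the table is full. */
int record_no_plt_name(const char *name)
{
    if (avoid_plt_p(name))
        return 0;
    if (no_plt_count == MAX_NO_PLT_NAMES)
        return -1;
    no_plt_names[no_plt_count++] = name;
    return 0;
}
```

With -fno-plt=memcmp -fno-plt=pow, avoid_plt_p would answer yes for
those two names and no for everything else, so all other call sites
keep the usual PLT sequence.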

Alternatively, we can do this entirely in the linker.  We can
introduce a new relocation type to tell the linker to convert all
direct calls to truly extern functions into indirect calls via GOT
entries.  The GCC patch just seems simpler.

Also, we could link statically, but we do not want that, or we could
copy the specific external functions into our executable. This might
work for executable A, but a different set of external functions might
be hot for executable B. We want a more general solution.


Please let me know what you think.

Thanks
Sri

* common.opt (-fno-plt=): New option.
* config/i386/i386.c (avoid_plt_to_call): New function.
(ix86_output_call_insn):  Check if PLT needs to be avoided
and call or jump indirectly if true.
* opts-global.c (htab_str_eq): New function.
(avoid_plt_fnsymbol_names_tab): New htab.
(handle_common_deferred_options): Handle -fno-plt=

Index: common.opt
===================================================================
--- common.opt  (revision 222641)
+++ common.opt  (working copy)
@@ -1087,6 +1087,11 @@ fdbg-cnt=
 Common RejectNegative Joined Var(common_deferred_options) Defer
 -fdbg-cnt=counter:limit[,counter:limit,...]  Set the debug counter limit.
 
+fno-plt=
+Common RejectNegative Joined Var(common_deferred_options) Defer
+-fno-plt=symbol1  Avoid going through the PLT when calling the specified function.
+Allow multiple instances of this option with different function names.
+
 fdebug-prefix-map=
 Common Joined RejectNegative Var(common_deferred_options) Defer
 Map one directory name to another in debug information
Index: config/i386/i386.c
===================================================================
--- config/i386/i386.c  (revision 222641)
+++ config/i386/i386.c