Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions

2015-04-03 Thread Ilya Enkovich
2015-04-02 20:01 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 Ping
 It would help if you add hubi...@ucw.cz to CC for IPA related patches.

 2015-03-19 11:34 GMT+03:00 Ilya Enkovich enkovich@gmail.com:
  On 12 Mar 13:27, Ilya Enkovich wrote:
  Hi,
 
  Currently cgraph merge has several issues with instrumented code:
   - original function node may be removed = no assembler name conflict is 
  detected between function and variable

 Why do you need to detect this one?

Assembler name of instrumented function is a transparent alias of
original function's name.  Alias chains are not taken into account by
analysis. Thus we see no conflict between instrumented function's name
and a variable name but emit them with the same assembler name. To
detect name conflict I keep original function node alive.

   - only orig_decl name is privatized for instrumented function = node 
  still shares assembler name which causes infinite privatization loop
   - information about changed name is stored in file_data of instrumented 
  node = original section name may be not found for original function
   - chkp reference is not fixed when nodes are merged
 
  This patch should fix theese problems by keeping instrumentation thunks 
  reachable, privatizing both nodes and fixing chkp references.  
  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

 Next stage1 I definitly hope we will be able to reduce number of special cases
 we need for chck nodes.  I will try to read the code and your design document
 and give it some tought.

Original design didn't take into account several LTO aspect and
additional patches appeared to fix various LTO issues. It would be
nice to improve it with your help. I'll need to update a design
document to reflect recent problems and used fixes.

  2015-03-19  Ilya Enkovich  ilya.enkov...@intel.com
 
  * ipa-chkp.h (chkp_maybe_fix_chkp_ref): New.
  * ipa-chkp.c (chkp_maybe_fix_chkp_ref): New.
  * ipa.c (symbol_table::remove_unreachable_nodes): Don't
  remove instumentation thunks calling reachable functions.
  * lto-cgraph.c: Include ipa-chkp.h.
  (input_symtab): Fix chkp references for boundary nodes.
  * lto/lto-partition.c (privatize_symbol_name_1): New.
  (privatize_symbol_name): Privatize both decl and orig_decl
  names for instrumented functions.
  * lto/lto-symtab.c: Include ipa-chkp.h.
  (lto_cgraph_replace_node): Fix chkp references for merged
  function nodes.
 
  gcc/testsuite/
 
  2015-03-19  Ilya Enkovich  ilya.enkov...@intel.com
 
  * gcc.dg/lto/chkp-privatize-1_0.c: New.
  * gcc.dg/lto/chkp-privatize-1_1.c: New.
  * gcc.dg/lto/chkp-privatize-2_0.c: New.
  * gcc.dg/lto/chkp-privatize-2_1.c: New.
 
 
  diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
  index 0b857ff..7a7fc3c 100644
  --- a/gcc/ipa-chkp.c
  +++ b/gcc/ipa-chkp.c
  @@ -414,6 +414,40 @@ chkp_instrumentable_p (tree fndecl)
 (!fn || !copy_forbidden (fn, fndecl)));
   }
 
  +/* Check NODE has a correct IPA_REF_CHKP reference.
  +   Create a new reference if required.  */
  +
  +void
  +chkp_maybe_fix_chkp_ref (cgraph_node *node)

 OK, so you have the chkp references that links to instrumented_version.
 You do not stream them for boundary symbols but for some reason they are 
 needed,
 but when you can end up with wrong CHKP link?

Wrong links may appear when cgraph nodes are merged.

Thanks
Ilya


Re: [CHKP] Support returned bounds in thunks expand

2015-04-03 Thread Ilya Enkovich
2015-04-02 23:55 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 Ping

 2015-03-10 13:12 GMT+03:00 Ilya Enkovich enkovich@gmail.com:
  Hi,
 
  Currentl we loose returned bounds when functions are merged.  This patch 
  fixes it by adding returne bounds support for cgraph_node::expand_thunk.  
  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
  2015-03-06  Ilya Enkovich  ilya.enkov...@intel.com
 
  * cgraphunit.c (cgraph_node::expand_thunk): Build returned
  bounds for instrumented functions.

 I think if the extra bultin call is needed here, the same andling needs to be
 added to ipa-split.  We really ought to unify that code - it is surprisingly
 difficult to produce a wrapper call these days.

Yep, there is such code in place. It is done by
insert_bndret_call_after. You are right, I should use the same
interface for both these cases.

  + if (instrumentation_clone
  +  !DECL_BY_REFERENCE (resdecl)
  +  restmp
  +  BOUNDED_P (restmp))
  +   {
  + tree fn = targetm.builtin_chkp_function 
  (BUILT_IN_CHKP_BNDRET);
  + gcall *retbnd = gimple_build_call (fn, 1, restmp);
  +
  + resbnd = create_tmp_reg (pointer_bounds_type_node, retbnd);
  + gimple_call_set_lhs (retbnd, resbnd);
  +
  + gsi_insert_after (bsi, retbnd, GSI_NEW_STMT);
  + create_edge (get_create (fn), retbnd, callees-count, 
  callees-frequency);
  +   }

 I am not sure why we need to check here:  Originaly we have two bounded 
 functions, A and B.
 We turn function B to a wrapper of function A. If callers of bounded 
 functions need
 to call a builtin, I would expect all callers of B to do it already, so I 
 would expect
 that wrapper is safe here?

The problem with instrumented call is that instrumented function
returns two values and call lhs gets only the first one. Thus we
generate bndret call to get the second one to build own return with
two values.


 Or do we mix instrumented and non-instrumented versions somehow?

 Addding code handling return value is going to turn the call to non-tailcall, 
 so you probably
 want to drop the tailcall flag, too.

No, function still return what the last call return, thus may keep tailcall.

Thanks,
Ilya


 Honza


[PATCH] Fix PR ipa/65665

2015-04-03 Thread Martin Liška

Hello.

Following patch fixes a new issue, ICE has disappeared on aarch64-linux machine 
and I was able to boostrap on
the machine. Moreover, no new regression seen on x86_64-linux-pc.

Ready for trunk?
Thanks,
Martin
From b8463b8a7b2f6b652d6d9c3a2ece814ec5554619 Mon Sep 17 00:00:00 2001
From: mliska mli...@suse.cz
Date: Fri, 3 Apr 2015 09:30:50 +0200
Subject: [PATCH] Fix PR ipa/65665

gcc/ChangeLog:

2015-04-03  Martin Liska  mli...@suse.cz

	PR ipa/65665
	* ipa-icf.c (sem_function::equals_wpa): Verify that IPA CP
	has computed data structure.
	(sem_item_optimizer::update_hash_by_addr_refs): Likewise.
---
 gcc/ipa-icf.c | 10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
index 8626730..8f8a0cf 100644
--- a/gcc/ipa-icf.c
+++ b/gcc/ipa-icf.c
@@ -535,7 +535,8 @@ sem_function::equals_wpa (sem_item *item,
(TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE
   || TREE_CODE (TREE_TYPE (item-decl)) == METHOD_TYPE)
(ipa_node_params_sum == NULL
-	  || ipa_is_param_used (IPA_NODE_REF (dyn_cast cgraph_node *(node)),
+	  || IPA_NODE_REF (get_node ())-descriptors.is_empty ()
+	  || ipa_is_param_used (IPA_NODE_REF (get_node ()),
 0))
compare_polymorphic_p ())
 {
@@ -2501,14 +2502,15 @@ sem_item_optimizer::update_hash_by_addr_refs ()
   m_items[i]-update_hash_by_addr_refs (m_symtab_node_map);
   if (m_items[i]-type == FUNC)
 	{
+	  cgraph_node *cnode = dyn_cast cgraph_node * (m_items[i]-node);
+
 	  if (TREE_CODE (TREE_TYPE (m_items[i]-decl)) == METHOD_TYPE
 	   contains_polymorphic_type_p
 		   (method_class_type (TREE_TYPE (m_items[i]-decl)))
 	   (DECL_CXX_CONSTRUCTOR_P (m_items[i]-decl)
 		  || ((ipa_node_params_sum == NULL
-		   || ipa_is_param_used (
-			IPA_NODE_REF
-			  (dyn_cast cgraph_node *(m_items[i]-node)), 0))
+		   || IPA_NODE_REF (cnode)-descriptors.is_empty ()
+		   || ipa_is_param_used (IPA_NODE_REF (cnode), 0))
 		   static_castsem_function * (m_items[i])
 			   -compare_polymorphic_p (
 	 {
-- 
2.1.4



Re: [PATCH, CHKP] Fix ipa-comdats for instrumentation thunks

2015-04-03 Thread Ilya Enkovich
2015-04-02 20:04 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 Hi,

 With r221574 (https://gcc.gnu.org/ml/gcc-cvs/2015-03/msg00495.html) thunks 
 don't get comdat groups assigned and this causes a failure in cgraph checker 
 for instrumentation thunks.  It happens because instrumentation thunk may 
 reference local symbol in comdat not being in comdat by itself.  This patch 
 fixes the problem.  Doesn't affect not instrumented code.  Testing is in 
 progress.  Does it look OK?

 Thanks,
 Ilya
 --
 gcc/

 2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com

   * ipa-comdats.c (ipa_comdats): Visit all instrumentation
   thunks to set proper comdat group.

 gcc/testsuite/

 2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com

   * gcc.target/i386/mpx/chkp-thunk-comdat-1.cc: New.
   * gcc.target/i386/mpx/chkp-thunk-comdat-2.cc: New.


 diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c
 index f349f9f..30bcad8 100644
 --- a/gcc/ipa-comdats.c
 +++ b/gcc/ipa-comdats.c
 @@ -348,10 +348,9 @@ ipa_comdats (void)
  }

/* Finally assign symbols to the sections.  */
 -
 +  cgraph_node *fun;
FOR_EACH_DEFINED_SYMBOL (symbol)
  {
 -  struct cgraph_node *fun;
symbol-aux = NULL;
if (!symbol-get_comdat_group ()
  !symbol-alias
 @@ -388,6 +387,20 @@ ipa_comdats (void)
  true);
   }
  }
 +
 +  /* Instrumentation thunks reference original node and thus
 + need to be in the same comdat group.  Otherwise we may
 + get a local instrumented symbol in a comdat group and
 + the referencing original node outside of it.  */
 +  FOR_EACH_DEFINED_FUNCTION (fun)
 +if (fun-instrumentation_clone
 +  fun-instrumented_version
 +  !fun-instrumented_version-alias
 +  fun-get_comdat_group ()
 +  !fun-instrumented_version-get_comdat_group ())
 +  fun-instrumented_version-call_for_symbol_and_aliases
 + (set_comdat_group_1, fun, true);

 I think you want to handle them same way as the aliasesthunks are handled.
 This fix is symptomatic: the code may assign them to different comdat groups
 and may propagate that furhter.

Currently ipa_comdats doesn't set comdat groups for thunks. At the
same time all references to local symbol should be within one comdat
group. That's why I need this fix. Assigning function and its
instrumentation thunks to different comdat groups would be an error
because both nodes represent the same function.

Thanks,
Ilya


 Honza


[PING, AArch64] Add long-call attribute and pragma interfaces

2015-04-03 Thread Yangfei (Felix)
Patch ping: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01148.html 

Thanks. 


Re: [CHKP] Support returned bounds in thunks expand

2015-04-03 Thread Ilya Enkovich
2015-04-02 22:42 GMT+03:00 Jeff Law l...@redhat.com:
 On 04/02/2015 08:49 AM, Ilya Enkovich wrote:

 Ping

 2015-03-10 13:12 GMT+03:00 Ilya Enkovich enkovich@gmail.com:

 Hi,

 Currentl we loose returned bounds when functions are merged.  This patch
 fixes it by adding returne bounds support for cgraph_node::expand_thunk.
 Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?

 Thanks,
 Ilya
 --
 gcc/

 2015-03-06  Ilya Enkovich  ilya.enkov...@intel.com

  * cgraphunit.c (cgraph_node::expand_thunk): Build returned
  bounds for instrumented functions.

 gcc/testsuite/

 2015-03-06  Ilya Enkovich  ilya.enkov...@intel.com

  * gcc/testsuite/gcc.target/i386/thunk-retbnd.c: New.

 I really dislike the amount of gimple and bounded pointer knowledge in this
 code.

 It seems like a significant modularity violation and while you didn't
 introduce the gimple stuff, we probably shouldn't be making it worse.

 Is it possible to let this code build up the thunk, then pass it off as a
 whole to the chkp code to add the instrumentation, particularly for the
 return value?

OK, will rework the patch.


 ALso, is this critical for stage4?  It looks like this is strictly a QofI
 change, correct?

Actually having just a tail call in a function we don't lose bounds
and I don't expect it affects QofI. But we get instrumented function
with no returned bounds for a pointer which triggers an assert
somewhere (don't remember exact place). I'll revisit the original
problem and will probably make a simpler stability fix for now,
leaving thunk modification for stage 1.

Thanks,
Ilya


 jeff



[PATCH, i386] PR63211 broken type-punning in avx* tests.

2015-04-03 Thread Ilya Tocar
Hi,

I've looked into avx* tests and many of them (even those that don't fail
in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type
punning. Properly fixing them looks like a lot of work, so I propose
just adding  -fno-strict-aliasing to them.
This patch was obtained by running
sed -i s/-O2/-O2  -fno-strict-aliasing/g 
../gcc/testsuite/gcc.target/i386/avx*-2.c

Ok for stage1?

Changelog below:

testsuite/

2015-04-03  Ilya Tocar  ilya.to...@intel.com

PR target/63211
* gcc.target/i386/avx-cmpsd-2.c: Update test.
* gcc.target/i386/avx-cmpss-2.c: Update test.
* gcc.target/i386/avx-vbroadcastf128-256-2.c: Update test.
* gcc.target/i386/avx-vbroadcastss-2.c: Update test.
* gcc.target/i386/avx-vcomisd-2.c: Update test.
* gcc.target/i386/avx-vcomiss-2.c: Update test.
* gcc.target/i386/avx-vcvtsd2si-2.c: Update test.
* gcc.target/i386/avx-vcvtsi2sd-2.c: Update test.
* gcc.target/i386/avx-vcvtsi2ss-2.c: Update test.
* gcc.target/i386/avx-vcvtss2si-2.c: Update test.
* gcc.target/i386/avx-vcvttsd2si-2.c: Update test.
* gcc.target/i386/avx-vcvttss2si-2.c: Update test.
* gcc.target/i386/avx-vdppd-2.c: Update test.
* gcc.target/i386/avx-vdpps-2.c: Update test.
* gcc.target/i386/avx-vextractf128-256-2.c: Update test.
* gcc.target/i386/avx-vinsertf128-256-2.c: Update test.
* gcc.target/i386/avx-vinsertps-2.c: Update test.
* gcc.target/i386/avx-vmaskmovpd-2.c: Update test.
* gcc.target/i386/avx-vmaskmovpd-256-2.c: Update test.
* gcc.target/i386/avx-vmaskmovps-2.c: Update test.
* gcc.target/i386/avx-vmaskmovps-256-2.c: Update test.
* gcc.target/i386/avx-vmovapd-2.c: Update test.
* gcc.target/i386/avx-vmovapd-256-2.c: Update test.
* gcc.target/i386/avx-vmovaps-2.c: Update test.
* gcc.target/i386/avx-vmovaps-256-2.c: Update test.
* gcc.target/i386/avx-vmovd-2.c: Update test.
* gcc.target/i386/avx-vmovdqa-2.c: Update test.
* gcc.target/i386/avx-vmovdqa-256-2.c: Update test.
* gcc.target/i386/avx-vmovdqu-2.c: Update test.
* gcc.target/i386/avx-vmovdqu-256-2.c: Update test.
* gcc.target/i386/avx-vmovhpd-2.c: Update test.
* gcc.target/i386/avx-vmovhps-2.c: Update test.
* gcc.target/i386/avx-vmovlpd-2.c: Update test.
* gcc.target/i386/avx-vmovq-2.c: Update test.
* gcc.target/i386/avx-vmovsd-2.c: Update test.
* gcc.target/i386/avx-vmovss-2.c: Update test.
* gcc.target/i386/avx-vmovupd-2.c: Update test.
* gcc.target/i386/avx-vmovupd-256-2.c: Update test.
* gcc.target/i386/avx-vmovups-2.c: Update test.
* gcc.target/i386/avx-vmovups-256-2.c: Update test.
* gcc.target/i386/avx-vpcmpestri-2.c: Update test.
* gcc.target/i386/avx-vpcmpestrm-2.c: Update test.
* gcc.target/i386/avx-vpcmpistri-2.c: Update test.
* gcc.target/i386/avx-vpcmpistrm-2.c: Update test.
* gcc.target/i386/avx-vperm2f128-256-2.c: Update test.
* gcc.target/i386/avx-vpermilpd-2.c: Update test.
* gcc.target/i386/avx-vpermilpd-256-2.c: Update test.
* gcc.target/i386/avx-vpermilps-2.c: Update test.
* gcc.target/i386/avx-vpermilps-256-2.c: Update test.
* gcc.target/i386/avx-vpslld-2.c: Update test.
* gcc.target/i386/avx-vpsllq-2.c: Update test.
* gcc.target/i386/avx-vpsllw-2.c: Update test.
* gcc.target/i386/avx-vpsrad-2.c: Update test.
* gcc.target/i386/avx-vpsraw-2.c: Update test.
* gcc.target/i386/avx-vpsrld-2.c: Update test.
* gcc.target/i386/avx-vpsrlq-2.c: Update test.
* gcc.target/i386/avx-vpsrlw-2.c: Update test.
* gcc.target/i386/avx-vptest-2.c: Update test.
* gcc.target/i386/avx-vptest-256-2.c: Update test.
* gcc.target/i386/avx-vroundpd-2.c: Update test.
* gcc.target/i386/avx-vroundpd-256-2.c: Update test.
* gcc.target/i386/avx-vtestpd-2.c: Update test.
* gcc.target/i386/avx-vtestpd-256-2.c: Update test.
* gcc.target/i386/avx-vtestps-2.c: Update test.
* gcc.target/i386/avx-vtestps-256-2.c: Update test.
* gcc.target/i386/avx-vucomisd-2.c: Update test.
* gcc.target/i386/avx-vucomiss-2.c: Update test.
* gcc.target/i386/avx2-i32gatherd-2.c: Update test.
* gcc.target/i386/avx2-i32gatherd256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherpd-2.c: Update test.
* gcc.target/i386/avx2-i32gatherpd256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherps-2.c: Update test.
* gcc.target/i386/avx2-i32gatherps256-2.c: Update test.
* gcc.target/i386/avx2-i32gatherq-2.c: Update test.
* gcc.target/i386/avx2-i32gatherq256-2.c: Update test.
* gcc.target/i386/avx2-i64gatherd-2.c: Update test.
* gcc.target/i386/avx2-i64gatherd256-2.c: Update test.
  

Re: [PATCH, CHKP] Restore transparent alias chains

2015-04-03 Thread Ilya Enkovich
2015-04-02 21:48 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 On 03/20/2015 02:20 AM, Ilya Enkovich wrote:
 Hi,
 
 Identifiers read with input_identifier miss IDENTIFIER_TRANSPARENT_ALIAS 
 bit.  We always expect it for instrumentation clones, thus restore it 
 input_cgraph_1.  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK 
 for trunk?
 
 Thanks,
 Ilya
 --
 2015-03-20  Ilya Enkovich  ilya.enkov...@intel.com
 
  * lto-cgraph.c (input_cgraph_1): Always link instrumented
  assembler name with original one.
 This appears to be a code path that only triggers when MPX is
 enabled and is roughly analogous to the code in
 chkp_build_instrumented_fndecl links things up.

 OK for the trunk.

 I think this will lead to wrong code. At this time, we may have multple
 declarations sharing single assembler name (and thus
 IDENTIFIER_TRANSPARENT_ALIAS link). Think of case where one unit defines 
 static
 function and other global function of the same name. We may end up renaming 
 the
 symbol but keeping bogus transparent alias link that will trigger on other
 symbol.

That is the reason for fixing chains in privatize_symbol_name.


 During WPA the assembler names are never fully unique, but we also probably do
 not need to set IDENTIFIER_TRANSPARENT_ALIAS.

 I think IDENTIFIER_TRANSPARENT_ALIAS should be set in rename_statics and
 separately in ltrans on the place we skip lto_symtab merging. At least it is
 the place I link it with my patch for supporting transparent aliases in 
 cgraph.

 I will try to find time to audit chkp code - I missed the addition of
 transparent aliases in the initial chkp patches.  Symbol table code expect 1-1
 correspondence between symbol table entries and real symbols.  Basically all
 places we special case aliases or weakrefs needs to be revisited for
 transparent aliases.

I don't see how 1-1 matching may be achieved now for instrumented
functions. We have two cgraph_nodes which actually represent the same
function. Thus single real symbol for two nodes.

Thanks,
Ilya


 Honza


Re: [CHKP] Never expand instrumentation thunks

2015-04-03 Thread Ilya Enkovich
2015-04-03 0:13 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 On 04/02/2015 02:11 PM, Jan Hubicka wrote:
 2015-03-18  Ilya Enkovich  ilya.enkov...@intel.com
 
  * cgraphunit.c (cgraph_node::expand_thunk): Don't expand
  instrumentation thunks.
 
 
 diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
 index e640907..abc9cfe 100644
 --- a/gcc/cgraphunit.c
 +++ b/gcc/cgraphunit.c
 @@ -1508,6 +1508,10 @@ cgraph_node::expand_thunk (bool output_asm_thunks, 
 bool force_gimple_thunk)
 tree thunk_fndecl = decl;
 tree a;
 
 +  /* Instrumentation thunk is the same function with
 + a different signature.  Never need to expand it.  */
 +  if (thunk.add_pointer_bounds_args)
 +return false;
 
 Yeah, this is another case where we hit problem with transparent alias 
 pretending
 to be thunk :)
 I guess the patch is OK for GCC-5 and for next stage1 we can clean this up.
 I was actually building a compiler so I could take a look at this
 one under a debugger ;-)

 I think it is really the transparent alias issue.  The comment seems pretty 
 clear about it.
 What is confusing is that instrumentation thunks are called thunks when they 
 are
 really not - thunk is a small hunk of code, while instrumentation thunk is a 
 transparent
 alias.

 Too bad I did not notice we introduced transparent aliases, i would push out 
 my
 code for that. I will compare Ilya's changes with mine and hopefully we can
 catch more bugs and unify the code next stage1.

Thunk was the best I could find to represent the same function but
with different signature. It would be great to have a more specialized
way for that.

Thanks,
Ilya


 Honza


Re: [PATCH, i386] PR63211 broken type-punning in avx* tests.

2015-04-03 Thread Uros Bizjak
On Fri, Apr 3, 2015 at 1:02 PM, Ilya Tocar tocarip.in...@gmail.com wrote:

 I've looked into avx* tests and many of them (even those that don't fail
 in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type
 punning. Properly fixing them looks like a lot of work, so I propose
 just adding  -fno-strict-aliasing to them.
 This patch was obtained by running
 sed -i s/-O2/-O2  -fno-strict-aliasing/g 
 ../gcc/testsuite/gcc.target/i386/avx*-2.c

 Ok for stage1?

I don't like this approach. If the testcase is broken, then it should
be fixed, not worked around.

Uros.


Re: [PATCH, stage1][PR65443] Add transform_to_exit_first_loop_alt

2015-04-03 Thread Tom de Vries

On 27-03-15 15:10, Tom de Vries wrote:

Hi,

this patch fixes PR65443, a todo in the parloops pass for function
transform_to_exit_first_loop:
...
TODO: the common case is that latch of the loop is empty and immediately
follows the loop exit.  In this case, it would be better not to copy the
body of the loop, but only move the entry of the loop directly before the
exit check and increase the number of iterations of the loop by one.
This may need some additional preconditioning in case NIT = ~0.
...

The current implementation of transform_to_exit_first_loop transforms the loop
by moving and duplicating the loop body. This patch transforms the loop into the
same normal form, but without the duplication, if that's possible (determined by
try_transform_to_exit_first_loop_alt).

The actual transformation, done by transform_to_exit_first_loop_alt transforms
this loop:
...
  bb preheader:
  ...
  goto bb header

  bb header:
  ...
  if (ivtmp  n)
goto bb latch;
  else
goto bb exit;

  bb latch:
  ivtmp = ivtmp + inc;
  goto bb header
...

into this one:
...
  bb preheader:
  ...
  goto bb newheader

  bb header:
  ...
  goto bb latch;

  bb newheader:
  if (ivtmp  n + 1)
goto bb header;
  else
goto bb exit;

  bb latch:
  ivtmp = ivtmp + inc;
  goto bb newheader
...



Updated patch, now using redirect_edge_var_map and flush_pending_stmts.

Bootstrapped and reg-tested on x86_64.

OK for stage1 trunk?

Thanks,
- Tom


Add transform_to_exit_first_loop_alt

2015-03-27  Tom de Vries  t...@codesourcery.com

	PR tree-optimization/65443
	* tree-parloops.c (replace_imm_uses, replace_uses_in_bb_by)
	(replace_uses_in_bbs_by, transform_to_exit_first_loop_alt)
	(try_transform_to_exit_first_loop_alt): New function.
	(transform_to_exit_first_loop): Use
	try_transform_to_exit_first_loop_alt.

	* gcc.dg/parloops-exit-first-loop-alt-2.c: New test.
	* gcc.dg/parloops-exit-first-loop-alt-3.c: New test.
	* gcc.dg/parloops-exit-first-loop-alt.c: New test.

	* testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c: New test.
	* testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c: New test.
	* testsuite/libgomp.c/parloops-exit-first-loop-alt.c: New test.
---
 .../gcc.dg/parloops-exit-first-loop-alt-2.c|  27 ++
 .../gcc.dg/parloops-exit-first-loop-alt-3.c|  26 ++
 .../gcc.dg/parloops-exit-first-loop-alt.c  |  27 ++
 gcc/tree-parloops.c| 341 -
 .../libgomp.c/parloops-exit-first-loop-alt-2.c |  48 +++
 .../libgomp.c/parloops-exit-first-loop-alt-3.c |  29 ++
 .../libgomp.c/parloops-exit-first-loop-alt.c   |  48 +++
 7 files changed, 534 insertions(+), 12 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
 create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
 create mode 100644 gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt.c
 create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-2.c
 create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt-3.c
 create mode 100644 libgomp/testsuite/libgomp.c/parloops-exit-first-loop-alt.c

diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
new file mode 100644
index 000..a4d6cb2
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-2.c
@@ -0,0 +1,27 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options -O2 -ftree-parallelize-loops=2 -fdump-tree-parloops } */
+
+#define N 1000
+
+unsigned int a[N];
+unsigned int b[N];
+unsigned int c[N];
+
+void __attribute__((noclone,noinline))
+f (unsigned int n)
+{
+  int i;
+
+for (i = 0; i  N; ++i)
+  c[i] = a[i] + b[i];
+}
+
+/* Three times three array accesses:
+   - three in f._loopfn.0
+   - three in the parallel
+   - three in the low iteration count loop
+   Crucially, none for a peeled off last iteration following the parallel.  */
+/* { dg-final { scan-tree-dump-times (?n)\\\[i 9 parloops } } */
+
+/* { dg-final { cleanup-tree-dump parloops } } */
diff --git a/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
new file mode 100644
index 000..c7ab51f88
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/parloops-exit-first-loop-alt-3.c
@@ -0,0 +1,26 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target pthread } */
+/* { dg-options -O2 -ftree-parallelize-loops=2 -fdump-tree-parloops } */
+
+unsigned int *a;
+
+unsigned int __attribute__((noclone,noinline))
+f (unsigned int n)
+{
+  int i;
+  unsigned int sum = 1;
+
+  for (i = 0; i  n; ++i)
+sum += a[i];
+
+  return sum;
+}
+
+/* Three array accesses:
+   - one in f._loopfn.0
+   - one in the parallel
+   - one in the low iteration count loop
+   Crucially, none for a peeled off last 

Re: [RS6000] Fix 65576 regression

2015-04-03 Thread Alan Modra
On Thu, Apr 02, 2015 at 08:02:57PM -0400, David Edelsohn wrote:
 If the final alternative cannot occur, it should be removed as well.
 Alan, would you please test that change also?

Tested powerpc64-linux and powerpc-linux no regressions.

* config/rs6000/rs6000.md (extenddftf2_internal): Remove last
alternative.

Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 221805)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -8369,9 +8369,9 @@
 })
 
 (define_insn_and_split *extenddftf2_internal
-  [(set (match_operand:TF 0 nonimmediate_operand =m,Y,ws,d,d,r)
-   (float_extend:TF (match_operand:DF 1 input_operand 
d,r,md,md,md,rm)))
-   (use (match_operand:DF 2 zero_reg_mem_operand d,r,j,m,d,n))]
+  [(set (match_operand:TF 0 nonimmediate_operand =m,Y,ws,d,d)
+   (float_extend:TF (match_operand:DF 1 input_operand d,r,md,md,md)))
+   (use (match_operand:DF 2 zero_reg_mem_operand d,r,j,m,d))]
   !TARGET_IEEEQUAD
 TARGET_HARD_FLOAT  TARGET_FPRS  TARGET_DOUBLE_FLOAT 
 TARGET_LONG_DOUBLE_128

-- 
Alan Modra
Australia Development Lab, IBM


Re: [C++ Patch/RFC] PR 64085

2015-04-03 Thread Jason Merrill

OK.

Jason


Re: [PATCH, CHKP] Fix cdtor merge for instrumented functions

2015-04-03 Thread Ilya Enkovich
On 02 Apr 22:21, Jan Hubicka wrote:
  Hi,
  
  This patch doesn't allow instrumentation thunks calls while merging 
  constructors and destructors.  Not isntrumented code is not affeceted.  
  Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
  
  Thanks,
  Ilya
  --
  gcc/
  
  2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com
  
  * ipa.c (ipa_cdtor_merge): Skip instrumentation thunks.
 
 So the problem here is that you do have two names for the function, one that
 is not instrumented and other that is instrumented?  I am bit surprised we get
 instrumentation on ctors that should not take or return pointer parameter,
 but I see one can trigger that at least by manually adding constructor 
 attribute.
 
 I think what you need is to drop DECL_STATIC_CONSTRUCTOR/DESTRUCTURO flags 
 when
 producing the transparent alias.
 
 Honza

Dropping flag is a good option.  Here is a corresponding patch.  Bootstrapped 
and tested on x86_64-unknown-linux-gnu.  Does it look OK?

Thanks,
Ilya
--
gcc/

2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com

* ipa-chkp.c (chkp_maybe_create_clone): Reset cdtor
flags for instrumentation thunk.
(chkp_produce_thunks): Likewise.

gcc/testsuite/

2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com

* gcc.dg/lto/chkp-ctor-merge_0.c: New.


diff --git a/gcc/ipa-chkp.c b/gcc/ipa-chkp.c
index a9933e2..0c16f71 100644
--- a/gcc/ipa-chkp.c
+++ b/gcc/ipa-chkp.c
@@ -550,6 +550,9 @@ chkp_maybe_create_clone (tree fndecl)
  clone-thunk.thunk_p = true;
  clone-thunk.add_pointer_bounds_args = true;
  clone-create_edge (node, NULL, 0, CGRAPH_FREQ_BASE);
+ /* Thunk shouldn't be a cdtor.  */
+ DECL_STATIC_CONSTRUCTOR (clone-decl) = 0;
+ DECL_STATIC_DESTRUCTOR (clone-decl) = 0;
}
  else
{
@@ -713,6 +716,9 @@ chkp_produce_thunks (bool early)
 0, CGRAPH_FREQ_BASE);
  node-create_reference (node-instrumented_version,
   IPA_REF_CHKP, NULL);
+ /* Thunk shouldn't be a cdtor.  */
+ DECL_STATIC_CONSTRUCTOR (node-decl) = 0;
+ DECL_STATIC_DESTRUCTOR (node-decl) = 0;
}
 }
 
diff --git a/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c 
b/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c
new file mode 100644
index 000..ac4095b
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/chkp-ctor-merge_0.c
@@ -0,0 +1,23 @@
+/* { dg-lto-do run } */
+/* { dg-require-effective-target mpx } */
+/* { dg-lto-options { { -O2 -flto -fcheck-pointer-bounds -mmpx -nodefaultlibs 
-lc } } } */
+
+int glob = 1;
+
+void __attribute__((constructor))
+ctor1 ()
+{
+  glob += 1;
+}
+
+
+void __attribute__((constructor))
+ctor2 ()
+{
+  glob -= 2;
+}
+
+int main (int argc, const char **argv)
+{
+  return glob;
+}


Re: [debug-early] equate new DIE with DW_AT_specificationto a previous declaration

2015-04-03 Thread Jason Merrill

On 03/18/2015 11:51 AM, Aldy Hernandez wrote:

On 03/17/2015 07:12 PM, Jason Merrill wrote:

Why are we outlining a DECL_EXTERNAL function?


SRA has no restrictions on whether a function is DECL_EXTERNAL.


Aha.

So it seems that DECL_EXTERNAL is the wrong gate for 
equate_decl_number_to_die here.  I think the rule we want is that if we 
don't already have a non-declaration DIE for a function, we should 
equate the new DIE.  Let's remove the existing calls and replace them 
with a single conditional call before the if (declaration).


Incidentally,


  /* A declaration that has been previously dumped needs no
 additional information.  */
  if (dumped_early  declaration)
return;


Do we need to check dumped_early here?  I would think that any 
declaration that has an old_die needs no additional information.


Jason



Re: [debug-early] Handle specification of class scoped static functions

2015-04-03 Thread Jason Merrill

On 03/20/2015 08:11 PM, Aldy Hernandez wrote:

+  /* For class scoped static functions, the dumped early
+ version was the declaration, whereas the next time
+ around with a different context should be the
+ specification.  In this case, avoid reusing the DIE, but
+ generate a specification below. E.g.:
+
+ class C {
+ public:
+   static void moo () {}
+ };  */
+  || !is_cu_die (context_die))


Why do we still need this added (relative to trunk)?  Are we getting 
here multiple times with class context_die?


Also, the comment seems redundant with the comment immediately above.

Jason



Re: [PATCH, i386] PR63211 broken type-punning in avx* tests.

2015-04-03 Thread Ilya Tocar
On 03 Apr 13:39, Uros Bizjak wrote:
 On Fri, Apr 3, 2015 at 1:02 PM, Ilya Tocar tocarip.in...@gmail.com wrote:
 
  I've looked into avx* tests and many of them (even those that don't fail
  in https://gcc.gnu.org/bugzilla/show_bug.cgi?id=63211) use invalid type
  punning. Properly fixing them looks like a lot of work, so I propose
  just adding  -fno-strict-aliasing to them.
  This patch was obtained by running
  sed -i s/-O2/-O2  -fno-strict-aliasing/g 
  ../gcc/testsuite/gcc.target/i386/avx*-2.c
 
  Ok for stage1?
 
 I don't like this approach. If the testcase is broken, then it should
 be fixed, not worked around.

IMHO those tests don't need to be alias conformant.
There are plenty of tests for aliasing rules,
and avx tests verify intrinsics implementaion. There are plenty of real
programs braking alias rules, so why can't we have non-conformant tests?


Re: [PATCH] Fix a test with bogus size_t type

2015-04-03 Thread Jakub Jelinek
On Thu, Apr 02, 2015 at 10:35:35AM +0200, Marek Polacek wrote:
 We are now more strict when accepting user-defined initializer_lists; in
 particular, we now require sizetype, not just any integral type.  The
 following test failed with -m32, because it had a bogus type of size_t,
 with -m32 it usually should be unsigned int, not unsigned long.
 
 Test passes now with both -m32/-m64, ok for trunk?

That is obvious.

 2015-04-02  Marek Polacek  pola...@redhat.com
 
   * g++.dg/cpp0x/pr57101.C: Use proper type for size_t.
 
 diff --git gcc/testsuite/g++.dg/cpp0x/pr57101.C 
 gcc/testsuite/g++.dg/cpp0x/pr57101.C
 index 94b576f..c0fc966 100644
 --- gcc/testsuite/g++.dg/cpp0x/pr57101.C
 +++ gcc/testsuite/g++.dg/cpp0x/pr57101.C
 @@ -1,7 +1,7 @@
  // { dg-do compile { target c++11 } }
  // { dg-options -fcompare-debug }
  
 -typedef long unsigned size_t;
 +typedef __SIZE_TYPE__ size_t;
  namespace
  {
template  typename _Tp, _Tp __v  struct integral_constant

Jakub


Re: [libstdc++/65033] Give alignment info to libatomic

2015-04-03 Thread Hans-Peter Nilsson
On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote:
 Why then use __alignof(_M_i) (the object-alignment)
 instead of _S_alignment (the deduced alas insufficiently
 increased type-alignment)?

(The immediate reason is that _S_alignment wasn't there until a
later revision, but the gist of the question remains. :-)

  making sure that atomic_is_lock_free returns the same
  value for all objects of a given type,

 (That would work but it doesn't seem to be the case.)

  we probably should have changed the
  interface so that we would pass size and alignment rather than size and 
  object
  pointer.
 
  Instead, we decided that passing null for the object pointer would be
  sufficient.  But as this PR shows, we really do need to take alignment into
  account.

 Regarding what's actually needed, alignment of an atomic type
 should always be *forced to be at least the natural alignment of
 of the object* (with non-power-of-two sized-objects rounded up)
 and until then atomic types won't work for targets where the
 non-atomic equivalents have less alignment (as straddling a
 page-boundary won't be lock-less-atomic anywhere where
 straddling a page-boundary may cause a non-atomic-access...) So,
 not target-specific except for targets that require even
 higher-than-natural alignment.

So, can we do something like this instead for gcc5?
(Completely untested and may be syntactically, namespacingly and
cxxstandardversionly flawed.)

Index: include/std/atomic
===
--- include/std/atomic  (revision 221849)
+++ include/std/atomic  (working copy)
@@ -165,16 +165,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 struct atomic
 {
 private:
-  // Align 1/2/4/8/16-byte types the same as integer types of that size.
+  // Align 1/2/4/8/16-byte types to the natural alignment of that size.
   // This matches the alignment effects of the C11 _Atomic qualifier.
   static constexpr int _S_min_alignment
-   = sizeof(_Tp) == sizeof(char)  ? alignof(char)
-   : sizeof(_Tp) == sizeof(short) ? alignof(short)
-   : sizeof(_Tp) == sizeof(int)   ? alignof(int)
-   : sizeof(_Tp) == sizeof(long)  ? alignof(long)
-   : sizeof(_Tp) == sizeof(long long) ? alignof(long long)
+  = sizeof(_Tp) == sizeof(char)   ? max(sizeof(char), alignof(char))
+   : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), alignof(short))
+   : sizeof(_Tp) == sizeof(int)   ? max(sizeof(int), alignof(int))
+   : sizeof(_Tp) == sizeof(long)  ? max(sizeof(long), alignof(long))
+   : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), 
alignof(long long))
 #ifdef _GLIBCXX_USE_INT128
-   : sizeof(_Tp) == sizeof(__int128)  ? alignof(__int128)
+   : sizeof(_Tp) == sizeof(__int128)  ? max(sizeof(__int128), 
alignof(__int128))
 #endif
: 0;

@@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   is_lock_free() const noexcept
   {
// Produce a fake, minimally aligned pointer.
-   void *__a = reinterpret_castvoid *(-__alignof(_M_i));
+   void *__a = reinterpret_castvoid *(-_S_alignment);
return __atomic_is_lock_free(sizeof(_M_i), __a);
   }

@@ -224,7 +224,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   is_lock_free() const volatile noexcept
   {
// Produce a fake, minimally aligned pointer.
-   void *__a = reinterpret_castvoid *(-__alignof(_M_i));
+   void *__a = reinterpret_castvoid *(-_S_alignment);
return __atomic_is_lock_free(sizeof(_M_i), __a);
   }

Index: include/bits/atomic_base.h
===
--- include/bits/atomic_base.h  (revision 221849)
+++ include/bits/atomic_base.h  (working copy)
@@ -240,7 +240,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 private:
   typedef _ITp __int_type;

-  __int_type   _M_i;
+  // Align 1/2/4/8/16-byte types to the natural alignment of that size.
+  // This matches the alignment effects of the C11 _Atomic qualifier.
+  static constexpr int _S_min_alignment
+  = sizeof(_Tp) == sizeof(char)   ? max(sizeof(char), __alignof(char))
+   : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), 
__alignof(short))
+   : sizeof(_Tp) == sizeof(int)   ? max(sizeof(int), __alignof(int))
+   : sizeof(_Tp) == sizeof(long)  ? max(sizeof(long), __alignof(long))
+   : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), 
__alignof(long long))
+#ifdef _GLIBCXX_USE_INT128
+   : sizeof(_Tp) == sizeof(__int128)  ? max(sizeof(__int128), 
__alignof(__int128))
+#endif
+   : 0;
+
+  static constexpr int _S_alignment
+= _S_min_alignment  alignof(_Tp) ? _S_min_alignment : __alignof(_Tp);
+
+  __int_type   _M_i __attribute ((__aligned(_S_alignment)));

 public:
   __atomic_base() noexcept = default;
@@ -348,7 +364,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
   is_lock_free() const noexcept
  

Re: ipa-cp heuristic tweek

2015-04-03 Thread Ilya Enkovich
2015-03-31 15:32 GMT+03:00 Ilya Enkovich enkovich@gmail.com:
 2015-03-29 18:43 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
 +static bool
 +set_single_call_flag (cgraph_node *node, void *)
 +{
 +  cgraph_edge *cs = node-callers;
 +  /* Local thunks can be handled transparently, skip them.  */
 +  while (cs  cs-caller-thunk.thunk_p  cs-caller-local.local)
 +cs = cs-next_caller;
 +  if (cs)
 +{
 +  gcc_assert (!cs-next_caller);

 This assert assumes the only non-thunk caller is always at the end of
 a callers list. Is it actually guaranteed?

 +  IPA_NODE_REF (cs-caller)-node_calling_single_call = true;
 +  return true;
 +}
 +  return false;
 +}
 +
  /* Initialize ipcp_lattices.  */


 Thanks,
 Ilya

Hi Honza,

For chkp testing I see cases when gcc asserts in set_single_call_flag
because instrumentation thunk is not the last one in a callers list. I
want to install following patch to fix it. Is it OK?

Thanks,
Ilya
--
2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com

* ipa-cp (set_single_call_flag): Remove too
restrictive assert.


diff --git a/gcc/ipa-cp.c b/gcc/ipa-cp.c
index d9aa92e..bfe4821 100644
--- a/gcc/ipa-cp.c
+++ b/gcc/ipa-cp.c
@@ -839,7 +839,6 @@ set_single_call_flag (cgraph_node *node, void *)
 cs = cs-next_caller;
   if (cs)
 {
-  gcc_assert (!cs-next_caller);
   IPA_NODE_REF (cs-caller)-node_calling_single_call = true;
   return true;
 }


Re: ipa-cp heuristic tweek

2015-04-03 Thread Jan Hubicka
 2015-03-31 15:32 GMT+03:00 Ilya Enkovich enkovich@gmail.com:
  2015-03-29 18:43 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
  +static bool
  +set_single_call_flag (cgraph_node *node, void *)
  +{
  +  cgraph_edge *cs = node-callers;
  +  /* Local thunks can be handled transparently, skip them.  */
  +  while (cs  cs-caller-thunk.thunk_p  cs-caller-local.local)
  +cs = cs-next_caller;
  +  if (cs)
  +{
  +  gcc_assert (!cs-next_caller);
 
  This assert assumes the only non-thunk caller is always at the end of
  a callers list. Is it actually guaranteed?
 
  +  IPA_NODE_REF (cs-caller)-node_calling_single_call = true;
  +  return true;
  +}
  +  return false;
  +}
  +
   /* Initialize ipcp_lattices.  */
 
 
  Thanks,
  Ilya
 
 Hi Honza,
 
 For chkp testing I see cases when gcc asserts in set_single_call_flag
 because instrumentation thunk is not the last one in a callers list. I
 want to install following patch to fix it. Is it OK?
 
 Thanks,
 Ilya
 --
 2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com
 
 * ipa-cp (set_single_call_flag): Remove too
 restrictive assert.

OK (this is probably artifact of earlier version of the code, thanks for 
pointing
it out)
Honza


Re: [CHKP] Support returned bounds in thunks expand

2015-04-03 Thread Jan Hubicka
 
 Yep, there is such code in place. It is done by
 insert_bndret_call_after. You are right, I should use the same
 interface for both these cases.

That would be nice indeed.  We should separate out and unify the code for 
porducing
a wrapper call next stage1 (as Jeff mentions too).
 
   + if (instrumentation_clone
   +  !DECL_BY_REFERENCE (resdecl)
   +  restmp
   +  BOUNDED_P (restmp))
   +   {
   + tree fn = targetm.builtin_chkp_function 
   (BUILT_IN_CHKP_BNDRET);
   + gcall *retbnd = gimple_build_call (fn, 1, restmp);
   +
   + resbnd = create_tmp_reg (pointer_bounds_type_node, 
   retbnd);
   + gimple_call_set_lhs (retbnd, resbnd);
   +
   + gsi_insert_after (bsi, retbnd, GSI_NEW_STMT);
   + create_edge (get_create (fn), retbnd, callees-count, 
   callees-frequency);
   +   }
 
  I am not sure why we need to check here:  Originaly we have two bounded 
  functions, A and B.
  We turn function B to a wrapper of function A. If callers of bounded 
  functions need
  to call a builtin, I would expect all callers of B to do it already, so I 
  would expect
  that wrapper is safe here?
 
 The problem with instrumented call is that instrumented function
 returns two values and call lhs gets only the first one. Thus we
 generate bndret call to get the second one to build own return with
 two values.

I see, patch is OK then (preferably merging as much as possible with ipa-split)

Honza
 
 
  Or do we mix instrumented and non-instrumented versions somehow?
 
  Addding code handling return value is going to turn the call to 
  non-tailcall, so you probably
  want to drop the tailcall flag, too.
 
 No, function still return what the last call return, thus may keep tailcall.
 
 Thanks,
 Ilya
 
 
  Honza


Re: [CHKP, PATCH] Fix LTO cgraph merge for instrumented functions

2015-04-03 Thread Jan Hubicka
 Assembler name of instrumented function is a transparent alias of
 original function's name.  Alias chains are not taken into account by
 analysis. Thus we see no conflict between instrumented function's name
 and a variable name but emit them with the same assembler name. To
 detect name conflict I keep original function node alive.

Hmm, so lets see if I understand your situation.  You always have transparent 
alias TA
and function F connected by IPA_REF_CHKP and because TA is represented as thunk
you also have a fake call from TA to F?

I do not understand how you can miss the link these days.  I extended 
lto-cgraph to
always keep thunk and its target in every unit that reffers to thunk. That 
should
solve you problem?  How IPA_REF_CHKP can link get lost?

I suppose we still need to
 1) write_symbol and lto-symtab should know that transparent aliases are not 
real
symbols and ignore them. We have predicate symtab_node::real_symbol_p,
I suppose it should return true.

I think cgraph_merge and varpool_merge in lto-symtab needs to know what to 
do
when merging two symbols with transparent aliases.
 2) At some point we need to populate transparent alias links as discussed in 
the
other thread.
 3) For safety, I guess we want symbol_table::change_decl_assembler_name to 
either
sanity check there are no trasparent alias links pointing to old assembler
names or update it.
 4) I think we want verify_node to check that transparent alias links and chkp 
points
as intended and only on those symbols where they need to be

There is also logic in lto-partition to always insert alias target
  OK, so you have the chkp references that links to instrumented_version.
  You do not stream them for boundary symbols but for some reason they are 
  needed,
  but when you can end up with wrong CHKP link?
 
 Wrong links may appear when cgraph nodes are merged.

I see, I think you really want to fix these at merging time as sugested in 1).
In this case even the node-instrumented_version pointer may be out of date and
point to a node that lost in merging?

Honza
 
 Thanks
 Ilya


Fix ICE in speculative_call_info

2015-04-03 Thread Jan Hubicka
Hi,
this patch fixes ordering issue where we try to resolve speculation while
we are mid of its duplication.

Bootstrapped/regtested x86_64-liinux, will commit it shortly.

Honza

PR ipa/65655
* ipa-inline-analysis.c (edge_set_predicate): Do not redirect
speculative indirect edges to avoid ordering issue.
* g++.dg/torture/pr65655.C: New testcase.
Index: ipa-inline-analysis.c
===
--- ipa-inline-analysis.c   (revision 221845)
+++ ipa-inline-analysis.c   (working copy)
@@ -793,7 +793,11 @@ edge_set_predicate (struct cgraph_edge *
 {
   /* If the edge is determined to be never executed, redirect it
  to BUILTIN_UNREACHABLE to save inliner from inlining into it.  */
-  if (predicate  false_predicate_p (predicate))
+  if (predicate  false_predicate_p (predicate)
+  /* When handling speculative edges, we need to do the redirection
+ just once.  Do it always on the direct edge, so we do not
+attempt to resolve speculation while duplicating the edge.  */
+   (!e-speculative || e-callee))
 e = redirect_to_unreachable (e);
 
   struct inline_edge_summary *es = inline_edge_summary (e);
Index: testsuite/g++.dg/torture/pr65655.C
===
--- testsuite/g++.dg/torture/pr65655.C  (revision 0)
+++ testsuite/g++.dg/torture/pr65655.C  (working copy)
@@ -0,0 +1,51 @@
+/* { dg-do compile } */
+// { dg-additional-options -std=c++11 -fsanitize=undefined -O2 }
+class ECoordinate { };
+class EPoint {
+public:
+  inline ECoordinate  y ();
+};
+ECoordinate  EPoint::y () { }
+template  class KEY, class CONTENT  class AVLTree;
+template  class KEY, class CONTENT  class AVLTreeNode {
+  friend class
+AVLTree  KEY, CONTENT ;
+  KEY key;
+  void set_rthread (unsigned char b);
+  void set_lthread (unsigned char b);
+};
+template  class KEY, class CONTENT  class AVLTree {
+public:
+  AVLTree ();
+  void insert (const KEY  key, const CONTENT  c);
+AVLTreeNode  KEY, CONTENT  *root;
+  const KEY * _target_key;
+  virtual int compare (const KEY  k1, const KEY  k2) const;
+  void _add (AVLTreeNode  KEY, CONTENT  *t);
+  virtual void _status (unsigned int) { }
+};
+template  class KEY, class CONTENT  void AVLTree  KEY, CONTENT ::_add 
(AVLTreeNode  KEY, CONTENT  *t) {
+  int cmp = compare (*_target_key, t-key);
+  if (cmp == 0)
+{ _status (1); }
+}
+template  class KEY, class CONTENT  void AVLTree  KEY, CONTENT ::insert 
(const KEY  key, const CONTENT  c) {
+  if (root == 0) {
+  root-set_rthread (1);
+  root-set_lthread (1);
+}
+else { _target_key = key; _add (root); }
+}
+template  class KEY, class CONTENT  AVLTree  KEY, CONTENT ::AVLTree ()
+: root (0) { }
+class ContactRepository {
+  void insertContact (EPoint  pt, int val);
+};
+void ContactRepository::insertContact (EPoint  pt, int val) {
+  AVLTreeNode  ECoordinate, AVLTree  ECoordinate, int **cont_x_node;
+  if (cont_x_node == __null)
+{
+  AVLTree  ECoordinate, int *insert_tree = new AVLTree  ECoordinate, 
int ;
+  insert_tree-insert (pt.y (), val);
+}
+}


Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-04-03 Thread Ramana Radhakrishnan
On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote:
 Hi,

 On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh
 james.greenha...@arm.com wrote:
 Trunk is currently in Stage 4 development, these patches are fairly
 low-risk, but they are certainly not regression fixes. I'll defer
 to port maintainers and release managers for the final say, but in my
 opinion it would not be appropriate to commit them until Stage 1
 development for GCC 6.0 opens (hopefully in a few weeks).

 I thought that adding flags for new processors was ok at any time,
 even to backport.


It's usually risk vs reward on a per patch basis and I don't think of
it as a general rule. We've always avoided the CPU tuning backport
rule to the FSF branches. The smaller the CPU tuning patch - the
better it is and in this case I'm comfortable with the patch going in
as it is adding another tuning option, using existing constructs and
is not invasive in the backend.

Ok for the ARM port if no regressions and there are no objections from the RMs.

regards
Ramana


Re: [PATCH, CHKP] Fix static const bounds creation in LTO

2015-04-03 Thread Ilya Enkovich
On 02 Apr 22:45, Jan Hubicka wrote:
  Hi,
  
  Current in LTO static const bounds are created as external symbol.  It 
  doesn't work in case original symbols were removed and privatized.  This 
  patch introduces a separate comdat symbol to be used in LTO.   Bootstrapped 
  and tested on x86_64-unknown-linux-gnu.  Does this approach look OK?
 
 As I understand it, you want to create a static variables, like 
 __chkp_zero_bounds
 which should be shared across translation units.  Currently the function does:
   static tree
   chkp_make_static_const_bounds (HOST_WIDE_INT lb,
HOST_WIDE_INT ub,
const char *name)
   {
 tree var;
 
 /* With LTO we may have constant bounds already in varpool.
Try to find it.  */
 var = chkp_find_const_bounds_var (lb, ub);
 
 if (var)
   return var;
 
 var  = build_decl (UNKNOWN_LOCATION, VAR_DECL,
  get_identifier (name), pointer_bounds_type_node);
 
 TREE_PUBLIC (var) = 1;
 TREE_USED (var) = 1;
 TREE_READONLY (var) = 1;
 TREE_STATIC (var) = 1;
 TREE_ADDRESSABLE (var) = 0;
 DECL_ARTIFICIAL (var) = 1;
 DECL_READ_P (var) = 1;
 /* We may use this symbol during ctors generation in chkp_finish_file
when all symbols are emitted.  Force output to avoid undefined
symbols in ctors.  */
 if (!in_lto_p)
   {
   DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub);
   DECL_COMDAT (var) = 1;
   varpool_node::get_create (var)-set_comdat_group (DECL_ASSEMBLER_NAME 
 (var));
   varpool_node::get_create (var)-force_output = 1;
   }
 else
   DECL_EXTERNAL (var) = 1;
 varpool_node::finalize_decl (var);
   }
 
 You should not set comdat group by hand, or we get into troubles on non-ELF
 targets.  Just call make_decl_one_only (var, DECL_ASSEMBLER_NAME (var));
 
 Now in LTO I guess you want to  check if the symbol is provided by the source
 compilation unit, i.e. call symtab_node::get_for_asmname (DECL_ASSEMBLER_NAME
 (var)) and if it is there, check it have proper type and fatal_error otherwise
 and if it does, discar var and use existing variable
 
 This avoid a duplicate declaration of a symbol that is invalid in symtab.
 (once we solve the transparent alias thing, I will add verification for that)
 Because you already set force_output, the symbol should not be privatized and
 renamed in any way.
 
 If there are cases the symbol can get legally optimized out, I guess you
 can also avoid setting force_output and we can stream references to the
 decls into optimization summary in a case we want ot bind for it in WPA,
 but that can wait for next stage1.
 
 Honza

Thanks a lot for looking into it!  Seems originally I mostly got problems due 
to no DECL_WEAK for generated var, make_decl_one_only seems to work for my 
tests.  Will run more testing to make sure it's OK.  Here is a new patch 
version.  Is it OK?

Thanks,
Ilya
--
gcc/

2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com

* tree-chkp.c (chkp_find_const_bounds_var): Search
by assembler name.
(chkp_make_static_const_bounds): Use make_decl_one_only.

gcc/testsuite/

2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com

* gcc.dg/lto/chkp-static-bounds_0.c: New.


diff --git a/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c 
b/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c
new file mode 100644
index 000..596e551
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/lto/chkp-static-bounds_0.c
@@ -0,0 +1,26 @@
+/* { dg-lto-do link } */
+/* { dg-require-effective-target mpx } */
+/* { dg-lto-options { { -flto -flto-partition=max -fcheck-pointer-bounds -mmpx 
} } } */
+
+const char *cc;
+
+int test1 (const char *c)
+{
+  c = __builtin___bnd_init_ptr_bounds (c);
+  cc = c;
+  return c[0] * 2;
+}
+
+struct S
+{
+  int (*fnptr) (const char *);
+} S;
+
+struct S s1 = {test1};
+struct S s2 = {test1};
+struct S s3 = {test1};
+
+int main (int argc, const char **argv)
+{
+  return s1.fnptr (argv[0]) + s2.fnptr (argv[1]);
+}
diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
index 03f75b3..4daeab6 100644
--- a/gcc/tree-chkp.c
+++ b/gcc/tree-chkp.c
@@ -1876,26 +1876,17 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator *gsi)
 /* Return constant static bounds var with specified LB and UB
if such var exists in varpool.  Return NULL otherwise.  */
 static tree
-chkp_find_const_bounds_var (HOST_WIDE_INT lb,
-   HOST_WIDE_INT ub)
+chkp_find_const_bounds_var (tree id)
 {
-  tree val = targetm.chkp_make_bounds_constant (lb, ub);
-  struct varpool_node *node;
-
-  /* We expect bounds constant is represented as a complex value
- of two pointer sized integers.  */
-  gcc_assert (TREE_CODE (val) == COMPLEX_CST);
+  struct symtab_node *node = symtab_node::get_for_asmname (id);
 
-  FOR_EACH_VARIABLE (node)
-if (POINTER_BOUNDS_P (node-decl)
-TREE_READONLY (node-decl)
-DECL_INITIAL 

Re: [PATCH, CHKP] Fix cdtor merge for instrumented functions

2015-04-03 Thread Jan Hubicka
 On 02 Apr 22:21, Jan Hubicka wrote:
   Hi,
   
   This patch doesn't allow instrumentation thunks calls while merging 
   constructors and destructors.  Not isntrumented code is not affeceted.  
   Bootstrapped and tested on x86_64-unknown-linux-gnu.  OK for trunk?
   
   Thanks,
   Ilya
   --
   gcc/
   
   2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com
   
 * ipa.c (ipa_cdtor_merge): Skip instrumentation thunks.
  
  So the problem here is that you do have two names for the function, one that
  is not instrumented and other that is instrumented?  I am bit surprised we 
  get
  instrumentation on ctors that should not take or return pointer parameter,
  but I see one can trigger that at least by manually adding constructor 
  attribute.
  
  I think what you need is to drop DECL_STATIC_CONSTRUCTOR/DESTRUCTURO flags 
  when
  producing the transparent alias.
  
  Honza
 
 Dropping flag is a good option.  Here is a corresponding patch.  Bootstrapped 
 and tested on x86_64-unknown-linux-gnu.  Does it look OK?
 
 Thanks,
 Ilya
 --
 gcc/
 
 2015-04-03  Ilya Enkovich  ilya.enkov...@intel.com
 
   * ipa-chkp.c (chkp_maybe_create_clone): Reset cdtor
   flags for instrumentation thunk.
   (chkp_produce_thunks): Likewise.

OK,
Thanks!
Honza


Re: [PATCH] Fix PR ipa/65665

2015-04-03 Thread Jan Hubicka
 Hello.
 
 Following patch fixes a new issue, ICE has disappeared on aarch64-linux 
 machine and I was able to boostrap on
 the machine. Moreover, no new regression seen on x86_64-linux-pc.
 
 Ready for trunk?
 Thanks,
 Martin

 From b8463b8a7b2f6b652d6d9c3a2ece814ec5554619 Mon Sep 17 00:00:00 2001
 From: mliska mli...@suse.cz
 Date: Fri, 3 Apr 2015 09:30:50 +0200
 Subject: [PATCH] Fix PR ipa/65665
 
 gcc/ChangeLog:
 
 2015-04-03  Martin Liska  mli...@suse.cz
 
   PR ipa/65665
   * ipa-icf.c (sem_function::equals_wpa): Verify that IPA CP
   has computed data structure.
   (sem_item_optimizer::update_hash_by_addr_refs): Likewise.

OK,
we should clean this up next stage1 ideally making passmanager to do the 
ipa-prop
analysis and making them trigger for all ipa passes consuming them
(ipa-icf/ipa-cp/inliner). Currently the way they are initialized either by 
inliner
or ipa-cp is quite ugly.

Honza
 ---
  gcc/ipa-icf.c | 10 ++
  1 file changed, 6 insertions(+), 4 deletions(-)
 
 diff --git a/gcc/ipa-icf.c b/gcc/ipa-icf.c
 index 8626730..8f8a0cf 100644
 --- a/gcc/ipa-icf.c
 +++ b/gcc/ipa-icf.c
 @@ -535,7 +535,8 @@ sem_function::equals_wpa (sem_item *item,
 (TREE_CODE (TREE_TYPE (decl)) == METHOD_TYPE
|| TREE_CODE (TREE_TYPE (item-decl)) == METHOD_TYPE)
 (ipa_node_params_sum == NULL
 -   || ipa_is_param_used (IPA_NODE_REF (dyn_cast cgraph_node *(node)),
 +   || IPA_NODE_REF (get_node ())-descriptors.is_empty ()
 +   || ipa_is_param_used (IPA_NODE_REF (get_node ()),
   0))
 compare_polymorphic_p ())
  {
 @@ -2501,14 +2502,15 @@ sem_item_optimizer::update_hash_by_addr_refs ()
m_items[i]-update_hash_by_addr_refs (m_symtab_node_map);
if (m_items[i]-type == FUNC)
   {
 +   cgraph_node *cnode = dyn_cast cgraph_node * (m_items[i]-node);
 +
 if (TREE_CODE (TREE_TYPE (m_items[i]-decl)) == METHOD_TYPE
  contains_polymorphic_type_p
  (method_class_type (TREE_TYPE (m_items[i]-decl)))
  (DECL_CXX_CONSTRUCTOR_P (m_items[i]-decl)
 || ((ipa_node_params_sum == NULL
 -|| ipa_is_param_used (
 - IPA_NODE_REF
 -   (dyn_cast cgraph_node *(m_items[i]-node)), 0))
 +|| IPA_NODE_REF (cnode)-descriptors.is_empty ()
 +|| ipa_is_param_used (IPA_NODE_REF (cnode), 0))
  static_castsem_function * (m_items[i])
  -compare_polymorphic_p (
{
 -- 
 2.1.4
 



Re: [PATCH, CHKP] Fix static const bounds creation in LTO

2015-04-03 Thread Jan Hubicka
 
 Thanks a lot for looking into it!  Seems originally I mostly got problems due 
 to no DECL_WEAK for generated var, make_decl_one_only seems to work for my 
 tests.  Will run more testing to make sure it's OK.  Here is a new patch 
 version.  Is it OK?

Good ;)
 diff --git a/gcc/tree-chkp.c b/gcc/tree-chkp.c
 index 03f75b3..4daeab6 100644
 --- a/gcc/tree-chkp.c
 +++ b/gcc/tree-chkp.c
 @@ -1876,26 +1876,17 @@ chkp_add_bounds_to_call_stmt (gimple_stmt_iterator 
 *gsi)
  /* Return constant static bounds var with specified LB and UB
 if such var exists in varpool.  Return NULL otherwise.  */
  static tree
 -chkp_find_const_bounds_var (HOST_WIDE_INT lb,
 - HOST_WIDE_INT ub)
 +chkp_find_const_bounds_var (tree id)
  {
 -  tree val = targetm.chkp_make_bounds_constant (lb, ub);
 -  struct varpool_node *node;
 -
 -  /* We expect bounds constant is represented as a complex value
 - of two pointer sized integers.  */
 -  gcc_assert (TREE_CODE (val) == COMPLEX_CST);
 +  struct symtab_node *node = symtab_node::get_for_asmname (id);

Sadly I think this won't work on targets that adds prefix to assembler name,
like djgpp/mingw.
To obtain mangled assembler name you need to call
id = targetm.mangle_decl_assembler_name (decl, DECL_NAME (decl));
that in turn requires decl to already exist.  That is why I guess you may want
to just build the new var and see if already exists

Honza
  
 -  FOR_EACH_VARIABLE (node)
 -if (POINTER_BOUNDS_P (node-decl)
 -  TREE_READONLY (node-decl)
 -  DECL_INITIAL (node-decl)
 -  TREE_CODE (DECL_INITIAL (node-decl)) == COMPLEX_CST
 -  tree_int_cst_equal (TREE_REALPART (DECL_INITIAL (node-decl)),
 -TREE_REALPART (val))
 -  tree_int_cst_equal (TREE_IMAGPART (DECL_INITIAL (node-decl)),
 -TREE_IMAGPART (val)))
 +  if (node)
 +{
 +  /* We don't allow this symbol usage for non bounds.  */
 +  gcc_assert (node-type == SYMTAB_VARIABLE);
 +  gcc_assert (POINTER_BOUNDS_P (node-decl));
return node-decl;
 +}
  
return NULL;
  }
 @@ -1907,37 +1898,34 @@ chkp_make_static_const_bounds (HOST_WIDE_INT lb,
  HOST_WIDE_INT ub,
  const char *name)
  {
 +  tree id = get_identifier (name);
tree var;
 +  varpool_node *node;
  
/* With LTO we may have constant bounds already in varpool.
   Try to find it.  */
 -  var = chkp_find_const_bounds_var (lb, ub);
 +  var = chkp_find_const_bounds_var (id);
  
if (var)
  return var;
  
 -  var  = build_decl (UNKNOWN_LOCATION, VAR_DECL,
 -  get_identifier (name), pointer_bounds_type_node);
 +  var  = build_decl (UNKNOWN_LOCATION, VAR_DECL, id,
 +  pointer_bounds_type_node);
  
 -  TREE_PUBLIC (var) = 1;
TREE_USED (var) = 1;
TREE_READONLY (var) = 1;
TREE_STATIC (var) = 1;
TREE_ADDRESSABLE (var) = 0;
DECL_ARTIFICIAL (var) = 1;
DECL_READ_P (var) = 1;
 +  DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub);
 +  make_decl_one_only (var, DECL_ASSEMBLER_NAME (var));
/* We may use this symbol during ctors generation in chkp_finish_file
   when all symbols are emitted.  Force output to avoid undefined
   symbols in ctors.  */
 -  if (!in_lto_p)
 -{
 -  DECL_INITIAL (var) = targetm.chkp_make_bounds_constant (lb, ub);
 -  DECL_COMDAT (var) = 1;
 -  varpool_node::get_create (var)-set_comdat_group (DECL_ASSEMBLER_NAME 
 (var));
 -  varpool_node::get_create (var)-force_output = 1;
 -}
 -  else
 -DECL_EXTERNAL (var) = 1;
 +  node = varpool_node::get_create (var);
 +  node-force_output = 1;
 +
varpool_node::finalize_decl (var);
  
return var;


Re: [PATCH, CHKP] Restore transparent alias chains

2015-04-03 Thread Jan Hubicka
  I think this will lead to wrong code. At this time, we may have multple
  declarations sharing single assembler name (and thus
  IDENTIFIER_TRANSPARENT_ALIAS link). Think of case where one unit defines 
  static
  function and other global function of the same name. We may end up renaming 
  the
  symbol but keeping bogus transparent alias link that will trigger on other
  symbol.
 
 That is the reason for fixing chains in privatize_symbol_name.

OK, so with your proposed patch you produce the links during lto-stream-in
but because the links may be wrong due to multiple different symbol sharing same
assembler name you rely on not using it until you fixup in 
privatize_symbol_name.
What happen when you have two symbols with different links sharing assembler 
name
originally, but you rename one and do not rename other during privatization?

It still looks much safer to simply install the links only after names become 
unique
and probably don't do that during WPA compilation at all, since we never output
any assembly.
 
 
  During WPA the assembler names are never fully unique, but we also probably 
  do
  not need to set IDENTIFIER_TRANSPARENT_ALIAS.
 
  I think IDENTIFIER_TRANSPARENT_ALIAS should be set in rename_statics and
  separately in ltrans on the place we skip lto_symtab merging. At least it is
  the place I link it with my patch for supporting transparent aliases in 
  cgraph.
 
  I will try to find time to audit chkp code - I missed the addition of
  transparent aliases in the initial chkp patches.  Symbol table code expect 
  1-1
  correspondence between symbol table entries and real symbols.  Basically all
  places we special case aliases or weakrefs needs to be revisited for
  transparent aliases.
 
 I don't see how 1-1 matching may be achieved now for instrumented
 functions. We have two cgraph_nodes which actually represent the same
 function. Thus single real symbol for two nodes.

Yeah, this is quite important change in the design assumptions for symtab that
is why we need so many places to fix...

Honza
 
 Thanks,
 Ilya
 
 
  Honza


Re: [PATCH, CHKP] Fix ipa-comdats for instrumentation thunks

2015-04-03 Thread Jan Hubicka
 2015-04-02 20:04 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
  Hi,
 
  With r221574 (https://gcc.gnu.org/ml/gcc-cvs/2015-03/msg00495.html) thunks 
  don't get comdat groups assigned and this causes a failure in cgraph 
  checker for instrumentation thunks.  It happens because instrumentation 
  thunk may reference local symbol in comdat not being in comdat by itself.  
  This patch fixes the problem.  Doesn't affect not instrumented code.  
  Testing is in progress.  Does it look OK?
 
  Thanks,
  Ilya
  --
  gcc/
 
  2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com
 
* ipa-comdats.c (ipa_comdats): Visit all instrumentation
thunks to set proper comdat group.
 
  gcc/testsuite/
 
  2015-04-02  Ilya Enkovich  ilya.enkov...@intel.com
 
* gcc.target/i386/mpx/chkp-thunk-comdat-1.cc: New.
* gcc.target/i386/mpx/chkp-thunk-comdat-2.cc: New.
 
 
  diff --git a/gcc/ipa-comdats.c b/gcc/ipa-comdats.c
  index f349f9f..30bcad8 100644
  --- a/gcc/ipa-comdats.c
  +++ b/gcc/ipa-comdats.c
  @@ -348,10 +348,9 @@ ipa_comdats (void)
   }
 
 /* Finally assign symbols to the sections.  */
  -
  +  cgraph_node *fun;
 FOR_EACH_DEFINED_SYMBOL (symbol)
   {
  -  struct cgraph_node *fun;
 symbol-aux = NULL;
 if (!symbol-get_comdat_group ()
   !symbol-alias
  @@ -388,6 +387,20 @@ ipa_comdats (void)
   true);
}
   }
  +
  +  /* Instrumentation thunks reference original node and thus
  + need to be in the same comdat group.  Otherwise we may
  + get a local instrumented symbol in a comdat group and
  + the referencing original node outside of it.  */
  +  FOR_EACH_DEFINED_FUNCTION (fun)
  +if (fun-instrumentation_clone
  +  fun-instrumented_version
  +  !fun-instrumented_version-alias
  +  fun-get_comdat_group ()
  +  !fun-instrumented_version-get_comdat_group ())
  +  fun-instrumented_version-call_for_symbol_and_aliases
  + (set_comdat_group_1, fun, true);
 
  I think you want to handle them same way as the aliasesthunks are handled.
  This fix is symptomatic: the code may assign them to different comdat groups
  and may propagate that furhter.
 
 Currently ipa_comdats doesn't set comdat groups for thunks. At the

I see, that is a bug.  It is supposed to keep thunks in the same section
as their target (thunks doesn't really work across sections on some target,
like PPC, because there is no way to produce a tailcall)
Does the following fix the problem?
Index: ipa-comdats.c
===
--- ipa-comdats.c   (revision 221857)
+++ ipa-comdats.c   (working copy)
@@ -377,12 +377,12 @@
  fprintf (dump_file, To group: %s\n, IDENTIFIER_POINTER (group));
}
  if (is_a cgraph_node * (symbol))
-  dyn_cast cgraph_node *(symbol)-call_for_symbol_and_aliases
+  dyn_cast cgraph_node *(symbol)-call_for_symbol_thunks_and_aliases
  (set_comdat_group_1,
   *comdat_head_map.get (group),
   true);
  else
-  symbol-call_for_symbol_and_aliases
+  symbol-call_for_symbol_thunks_and_aliases
  (set_comdat_group,
   *comdat_head_map.get (group),
   true);

 same time all references to local symbol should be within one comdat
 group. That's why I need this fix. Assigning function and its
 instrumentation thunks to different comdat groups would be an error
 because both nodes represent the same function.
 
 Thanks,
 Ilya
 
 
  Honza


Re: [CHKP] Never expand instrumentation thunks

2015-04-03 Thread Jan Hubicka
 2015-04-03 0:13 GMT+03:00 Jan Hubicka hubi...@ucw.cz:
  On 04/02/2015 02:11 PM, Jan Hubicka wrote:
  2015-03-18  Ilya Enkovich  ilya.enkov...@intel.com
  
   * cgraphunit.c (cgraph_node::expand_thunk): Don't expand
   instrumentation thunks.
  
  
  diff --git a/gcc/cgraphunit.c b/gcc/cgraphunit.c
  index e640907..abc9cfe 100644
  --- a/gcc/cgraphunit.c
  +++ b/gcc/cgraphunit.c
  @@ -1508,6 +1508,10 @@ cgraph_node::expand_thunk (bool 
  output_asm_thunks, bool force_gimple_thunk)
  tree thunk_fndecl = decl;
  tree a;
  
  +  /* Instrumentation thunk is the same function with
  + a different signature.  Never need to expand it.  */
  +  if (thunk.add_pointer_bounds_args)
  +return false;
  
  Yeah, this is another case where we hit problem with transparent alias 
  pretending
  to be thunk :)
  I guess the patch is OK for GCC-5 and for next stage1 we can clean this 
  up.
  I was actually building a compiler so I could take a look at this
  one under a debugger ;-)
 
  I think it is really the transparent alias issue.  The comment seems pretty 
  clear about it.
  What is confusing is that instrumentation thunks are called thunks when 
  they are
  really not - thunk is a small hunk of code, while instrumentation thunk is 
  a transparent
  alias.
 
  Too bad I did not notice we introduced transparent aliases, i would push 
  out my
  code for that. I will compare Ilya's changes with mine and hopefully we can
  catch more bugs and unify the code next stage1.
 
 Thunk was the best I could find to represent the same function but
 with different signature. It would be great to have a more specialized
 way for that.

Yeah, that is also what we need for 
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61886
I just did not notice you are independently solving the same issue :(

Honza


Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-04-03 Thread Sebastian Pop
Hi,

On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh
james.greenha...@arm.com wrote:
 Trunk is currently in Stage 4 development, these patches are fairly
 low-risk, but they are certainly not regression fixes. I'll defer
 to port maintainers and release managers for the final say, but in my
 opinion it would not be appropriate to commit them until Stage 1
 development for GCC 6.0 opens (hopefully in a few weeks).

I thought that adding flags for new processors was ok at any time,
even to backport.

 For the AArch64 patch, I was expecting to see a respin after Junmo's
 comment on Wednesday [1].

 In particular:

 +AARCH64_CORE(exynos-m1,   exynosm1,  exynosm1,  8,  AARCH64_FL_FOR_ARCH8 | 
 AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1)

 As there has not been a scheduling model contributed for the exynos-m1,
 this will use *no* scheduling model on AArch64. This is unlikely to give
 good performance.

Fixed in the attached patch.


 +static const struct tune_params exynosm1_tunings =
 +{
 +  cortexa57_extra_costs,
 +  cortexa57_addrcost_table,
 +  cortexa57_regmove_cost,
 +  cortexa57_vector_cost,
 +  4, /* memmov_cost  */
 +  3, /* issue_rate  */
 +  (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
 +   | AARCH64_FUSE_MOVK_MOVK), /* fuseable_ops  */
 +  16,  /* function_align.  */
 +  8,   /* jump_align.  */
 +  4,   /* loop_align.  */
 +  2,   /* int_reassoc_width.  */
 +  4,   /* fp_reassoc_width.  */
 +  1/* vec_reassoc_width.  */
 +};
 +

 As these are identical to the Cortex-A57 tuning, is there any reason to
 add them? I'd prefer if we took a copy-on-modify policy for these
 tuning structs, only adding them where there is a difference.

Agreed.  Fixed.

I'm testing the two patches with bootstrap and make check on Juno
aarch64-linux and Arndale arm-linux.
Ok for trunk after regstrap passes?

Thanks,
Sebastian


0001-ARM-add-option-for-the-Samsung-Exynos-M1-core.patch
Description: Binary data


0002-AArch64-add-option-for-the-Samsung-Exynos-M1-core-fo.patch
Description: Binary data


Re: [libstdc++/65033] Give alignment info to libatomic

2015-04-03 Thread Richard Henderson
On 04/03/2015 07:13 AM, Jonathan Wakely wrote:
 +++ b/libstdc++-v3/include/std/atomic
 @@ -165,9 +165,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  struct atomic
  {
  private:
 -  // Align 1/2/4/8/16-byte types the same as integer types of that size.
 +  // Align 1/2/4/8/16-byte types the natural alignment of that size.
// This matches the alignment effects of the C11 _Atomic qualifier.
 -  static constexpr int _S_min_alignment
 +  static constexpr int _S_int_alignment
   = sizeof(_Tp) == sizeof(char)  ? alignof(char)
   : sizeof(_Tp) == sizeof(short) ? alignof(short)
   : sizeof(_Tp) == sizeof(int)   ? alignof(int)
 @@ -178,6 +178,9 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  #endif
   : 0;
  
 +  static constexpr int _S_min_alignment
 + = _S_int_alignment  sizeof(_Tp) ? _S_int_alignment : sizeof(_Tp);
 +

This doesn't work for non-power-of-two sized _Tp.

Again, we don't have *any* target for which alignof(integer)  sizeof(integer).
So if you care about forcing natural alignment, don't bother with the alignof
on the integrals, as you're doing with _S_int_alignment at the moment.


r~


Re: [PING, AArch64] Add long-call attribute and pragma interfaces

2015-04-03 Thread Sandra Loosemore

On 04/02/2015 11:59 PM, Yangfei (Felix) wrote:

Patch ping: https://gcc.gnu.org/ml/gcc-patches/2014-11/msg01148.html


This patch needs documentation for the new attributes and pragmas before 
it can be committed.  (Since this is a new feature I think it has to 
wait until stage 1, too, but that's not my call.)


-Sandra



Another tweak of inline badness metric

2015-04-03 Thread Jan Hubicka
Hi,
this patch adds combined_size (i.e. size of caller after inlining) to the
denominator of the badness metric.  This is based on observations in PR 65076.
Old metric did use relative time speedup to get into sane range with fixed
point math.  I removed this after switching to sreals and this caused quite
noticeable compile time slowdowns on tramp3d and a regression on dromaeo
benchmark of Firefox.

The issue turns out to be the fact that new metric preffer building very
large functions with are not completely cool for optimization and definitely
bad for compile time.

I am comitting the attached patch and plan to watch benchmark results over
the long weekend. If everything goes well, I plan to also increase somewhat
the inline-unit-growth. In GCC 4.9 it is 30%, I dropped it to 15% that seems
too tight for firefox.  Given that large applications was the main motivator
for the drop, I think it is good enough reason to retreat and do not be so
agressive about code size at -O2/-O3.

Bootstrapped/regtested x86_64-linux, will commit it shortly.

Honza

PR ipa/65076
* ipa-inline.c (edge_badness): Add combined size to the denominator.

Index: ipa-inline.c
===
--- ipa-inline.c(revision 221806)
+++ ipa-inline.c(working copy)
@@ -1077,8 +1077,8 @@ edge_badness (struct cgraph_edge *edge,
   /* When profile is available. Compute badness as:
  
  time_saved * caller_count
- goodness =  -
-growth_of_caller * overall_growth
+ goodness =  -
+growth_of_caller * overall_growth * combined_size
 
  badness = - goodness
 
@@ -1167,6 +1167,7 @@ edge_badness (struct cgraph_edge *edge,
overall_growth += 256 * 256 - 256;
  denominator *= overall_growth;
 }
+  denominator *= inline_summaries-get (caller)-self_size + growth;
 
   badness = - numerator / denominator;
 


Re: [PATCH, libmpx, i386, PR driver/65444] Pass '-z bndplt' when building dynamic objects with MPX

2015-04-03 Thread Joseph Myers
On Tue, 31 Mar 2015, Ilya Enkovich wrote:

 +library.  It also passes '-z bndplt' to a linker in case it supports this
 +option (which is checked on libmpx configuration).  Note that old versions
 +of linker may ignore option.  Gold linker doesn't support '-z bndplt'
 +option.  With no '-z bndplt' support in linker all calls to dynamic libraries
 +lose passed bounds reducing overall protection level.  It's highly
 +recommended to use linker with '-z bndplt' support.  In case such linker
 +is not available it is adviced to always use @option{-static-libmpxwrappers}
 +for better protection level or use @option{-static} to completely avoid
 +external calls to dynamic libraries.  MPX-based instrumentation

Use @samp{-z bndplt} rather than '' quoting (but Sandra may have further 
advice on the substance of this documentation).

-- 
Joseph S. Myers
jos...@codesourcery.com


Work around ICE in PR 65648

2015-04-03 Thread Jan Hubicka
Hi,
the dealII compilation ICEs on check that verifies that projected size match
the size after inlining.  This check suffers from roundoff errors in corner
cases and I believe it is what triggers here.
Given that I have a reorg of this code to sreals scheduled for next stage1,
I think we can just disable the sanity check in GCC 5.  It is non-critical
thoguh very useful to fix varoius bug that causes propagation to diverge.

Will commit it shortly as obvious.

PR ipa/65648
* ipa-inline-transform.c (inline_call): Skip sanity check to work
around the ICE
Index: ipa-inline-transform.c
===
--- ipa-inline-transform.c  (revision 221859)
+++ ipa-inline-transform.c  (working copy)
@@ -304,7 +304,8 @@ inline_call (struct cgraph_edge *e, bool
   struct cgraph_node *callee = e-callee-ultimate_alias_target ();
   bool new_edges_found = false;
 
-#ifdef ENABLE_CHECKING
+  /* This is used only for assert bellow.  */
+#if 0
   int estimated_growth = estimate_edge_growth (e);
   bool predicated = inline_edge_summary (e)-predicate != NULL;
 #endif
@@ -375,7 +376,10 @@ inline_call (struct cgraph_edge *e, bool
to-calls_comdat_local = false;
 }
 
-#ifdef ENABLE_CHECKING
+  /* FIXME: This assert suffers from roundoff errors, disable it for GCC 5
+ and revisit it after conversion to sreals in GCC 6.
+ See PR 65654.  */
+#if 0
   /* Verify that estimated growth match real growth.  Allow off-by-one
  error due to INLINE_SIZE_SCALE roudoff errors.  */
   gcc_assert (!update_overall_summary || !overall_size || new_edges_found


[PATCH], PR target/65614, Prefer to use LFD on powerpc for constants

2015-04-03 Thread Michael Meissner
In my fix for PR target/65240, I removed the special -ffast-math code that
delayed dealing with constants until reload time.  In this patch, constants are
now pushed to memory earlier, and the compiler uses LFS (load floating point
single) to load double precision constants.  When you use the LRA register
allocator (-mlra), it uses the Altivec registers for scalar data more
frequently, and there appears to be interactions between values loaded up as
single constants that are moved to the Altivec registers via XXLOR.

This patch makes (float_extend (mem)) slightly more costly than just (mem) and
the code in expr.c will not compress the constant.  In addition, for scalar
single precision moves it uses copy sign instead of or to move the data.  The
copy sign instruction deals with single precision values that would create
denormals.  While working in the code, I also noticed that truncdfsf2 did not
have support for ISA 2.07, so I added support for it.

I have done bootstraps and make check with no regressions (after fixing the two
tests that were checking that LFS was used).  I have also built and run the
Spec 2006 benchmark bwaves with the patch, and it now runs when compiled with
-mlra and upper register support.  Is the patch ok to commit to trunk?

[gcc]
2015-04-03  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/65614
* config/rs6000/rs6000.c (rs6000_rtx_costs): Make FLOAT_EXTEND
more expensive, so that LFD is used to load double constants, and
not LFS.

* config/rs6000/rs6000.md (extendsfdf2_fpr): Generate XSCPSGNDP
instead of XXLOR to copy SFmode to clear out dirty bits created
when SFmode denormals are generated.
(movmode_hardfloat, FMOVE32 case): Likewise.
(truncdfsf2_fpr): Add support for ISA 2.07 XSRSP instruction.

[gcc/testsuite]
2015-04-03  Michael Meissner  meiss...@linux.vnet.ibm.com

PR target/65614
* gcc.target/powerpc/compress-float-ppc-pic.c: Run test on power5
to get floating point compression.
* gcc.target/powerpc/compress-foat-ppc.c: Likewise.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 221802)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -30479,8 +30479,10 @@ rs6000_rtx_costs (rtx x, int code, int o
   return false;
 
 case FLOAT_EXTEND:
+  /* Make converts on newer machines slightly more expensive to encourage
+expr.c to not use a LFS instead of LFD to load constants.  */
   if (mode == DFmode)
-   *total = 0;
+   *total = (TARGET_VSX || TARGET_POPCNTD) ? 1 : 0;
   else
*total = rs6000_cost-fp;
   return false;
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 221802)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -5222,7 +5222,7 @@ (define_insn_and_split *extendsfdf2_fpr
fmr %0,%1
lfs%U1%X1 %0,%1
#
-   xxlor %x0,%x1,%x1
+   xscpsgndp %x0,%x1,%x1
lxsspx %x0,%y1
reload_completed  REG_P (operands[1])  REGNO (operands[0]) == REGNO 
(operands[1])
   [(const_int 0)]
@@ -5230,7 +5230,7 @@ (define_insn_and_split *extendsfdf2_fpr
   emit_note (NOTE_INSN_DELETED);
   DONE;
 }
-  [(set_attr type fp,fp,fpload,fp,vecsimple,fpload)])
+  [(set_attr type fp,fp,fpload,fp,fp,fpload)])
 
 (define_expand truncdfsf2
   [(set (match_operand:SF 0 gpc_reg_operand )
@@ -5239,10 +5239,12 @@ (define_expand truncdfsf2
   )
 
 (define_insn *truncdfsf2_fpr
-  [(set (match_operand:SF 0 gpc_reg_operand =f)
-   (float_truncate:SF (match_operand:DF 1 gpc_reg_operand d)))]
+  [(set (match_operand:SF 0 gpc_reg_operand =f,wy)
+   (float_truncate:SF (match_operand:DF 1 gpc_reg_operand d,ws)))]
   TARGET_HARD_FLOAT  TARGET_FPRS  TARGET_DOUBLE_FLOAT
-  frsp %0,%1
+  @
+   frsp %0,%1
+   xsrsp %x0,%x1
   [(set_attr type fp)])
 
 ;; This expander is here to avoid FLOAT_WORDS_BIGENDIAN tests in
@@ -8058,7 +8060,7 @@ (define_insn movmode_hardfloat
lwz%U1%X1 %0,%1
stw%U0%X0 %1,%0
fmr %0,%1
-   xxlor %x0,%x1,%x1
+   xscpsgndp %x0,%x1,%x1
xxlxor %x0,%x0,%x0
li %0,0
f32_li
@@ -8070,7 +8072,7 @@ (define_insn movmode_hardfloat
mt%0 %1
mf%1 %0
nop
-  [(set_attr type 
*,load,store,fp,vecsimple,vecsimple,integer,fpload,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*)
+  [(set_attr type 
*,load,store,fp,fp,vecsimple,integer,fpload,fpstore,fpload,fpstore,mftgpr,mffgpr,mtjmpr,mfjmpr,*)
(set_attr length 4)])
 
 (define_insn *movmode_softfloat
Index: gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c
===
--- gcc/testsuite/gcc.target/powerpc/compress-float-ppc.c   (revision 
221802)
+++ 

Re: [PATCH], PR target/65614, Prefer to use LFD on powerpc for constants

2015-04-03 Thread David Edelsohn
On Fri, Apr 3, 2015 at 3:07 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 In my fix for PR target/65240, I removed the special -ffast-math code that
 delayed dealing with constants until reload time.  In this patch, constants 
 are
 now pushed to memory earlier, and the compiler uses LFS (load floating point
 single) to load double precision constants.  When you use the LRA register
 allocator (-mlra), it uses the Altivec registers for scalar data more
 frequently, and there appears to be interactions between values loaded up as
 single constants that are moved to the Altivec registers via XXLOR.

 This patch makes (float_extend (mem)) slightly more costly than just (mem) and
 the code in expr.c will not compress the constant.  In addition, for scalar
 single precision moves it uses copy sign instead of or to move the data.  The
 copy sign instruction deals with single precision values that would create
 denormals.  While working in the code, I also noticed that truncdfsf2 did not
 have support for ISA 2.07, so I added support for it.

 I have done bootstraps and make check with no regressions (after fixing the 
 two
 tests that were checking that LFS was used).  I have also built and run the
 Spec 2006 benchmark bwaves with the patch, and it now runs when compiled with
 -mlra and upper register support.  Is the patch ok to commit to trunk?

The FLOAT_EXTEND cost should be based on the processor tuning, not the ISA.

- David


Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-04-03 Thread James Greenhalgh
On Fri, Apr 03, 2015 at 07:53:12PM +0100, Ramana Radhakrishnan wrote:
 On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote:
  Hi,
 
  On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh
  james.greenha...@arm.com wrote:
  Trunk is currently in Stage 4 development, these patches are fairly
  low-risk, but they are certainly not regression fixes. I'll defer
  to port maintainers and release managers for the final say, but in my
  opinion it would not be appropriate to commit them until Stage 1
  development for GCC 6.0 opens (hopefully in a few weeks).
 
  I thought that adding flags for new processors was ok at any time,
  even to backport.
 
 It's usually risk vs reward on a per patch basis and I don't think of
 it as a general rule. We've always avoided the CPU tuning backport
 rule to the FSF branches. The smaller the CPU tuning patch - the
 better it is and in this case I'm comfortable with the patch going in
 as it is adding another tuning option, using existing constructs and
 is not invasive in the backend.

Thanks for the clarification Ramana.

In which case, and now that I've seen that binutils support has also
been accepted, the AArch64 part is OK to commit (assuming no regressions
and no objections from Richard or Jakub).

It would be great if you could follow these up with a patch for
changes.html for GCC 5 for both ARM and AArch64.

Cheers,
James



Re: [PATCH] [ARM] Add support for the Samsung Exynos M1 processor

2015-04-03 Thread Sebastian Pop
On Fri, Apr 3, 2015 at 4:09 PM, James Greenhalgh
james.greenha...@arm.com wrote:
 On Fri, Apr 03, 2015 at 07:53:12PM +0100, Ramana Radhakrishnan wrote:
 On Fri, Apr 3, 2015 at 5:17 PM, Sebastian Pop seb...@gmail.com wrote:
  Hi,
 
  On Thu, Apr 2, 2015 at 5:51 PM, James Greenhalgh
  james.greenha...@arm.com wrote:
  Trunk is currently in Stage 4 development, these patches are fairly
  low-risk, but they are certainly not regression fixes. I'll defer
  to port maintainers and release managers for the final say, but in my
  opinion it would not be appropriate to commit them until Stage 1
  development for GCC 6.0 opens (hopefully in a few weeks).
 
  I thought that adding flags for new processors was ok at any time,
  even to backport.

 It's usually risk vs reward on a per patch basis and I don't think of
 it as a general rule. We've always avoided the CPU tuning backport
 rule to the FSF branches. The smaller the CPU tuning patch - the
 better it is and in this case I'm comfortable with the patch going in
 as it is adding another tuning option, using existing constructs and
 is not invasive in the backend.

 Thanks for the clarification Ramana.

 In which case, and now that I've seen that binutils support has also
 been accepted, the AArch64 part is OK to commit (assuming no regressions
 and no objections from Richard or Jakub).

I will wait to hear from Richi or Jakub before committing the two patches.


 It would be great if you could follow these up with a patch for
 changes.html for GCC 5 for both ARM and AArch64.

Attached.  I will commit this after the two patches adding the exynos-m1 flags.

Thanks,
Sebastian
*** changes.html.~1.92.~2015-03-27 09:42:10.0 -0500
--- changes.html2015-04-03 21:57:33.550001606 -0500
***
*** 585,591 
(codecortex-a72/code) and initial support for its big.LITTLE
combination with the ARM Cortex-A53 
(codecortex-a72.cortex-a53/code),
Cavium ThunderX (codethunderx/code), Applied Micro X-Gene 1
!   (codexgene1/code).
The GCC identifiers can be used
as arguments to the code-mcpu/code or code-mtune/code options,
for example: code-mcpu=xgene1/code or
--- 585,591 
(codecortex-a72/code) and initial support for its big.LITTLE
combination with the ARM Cortex-A53 
(codecortex-a72.cortex-a53/code),
Cavium ThunderX (codethunderx/code), Applied Micro X-Gene 1
!   (codexgene1/code), and Samsung Exynos M1 (codeexynos-m1/code).
The GCC identifiers can be used
as arguments to the code-mcpu/code or code-mtune/code options,
for example: code-mcpu=xgene1/code or
***
*** 624,630 
 (codecortex-a72/code) and initial support for its big.LITTLE
 combination with the ARM Cortex-A53 
(codecortex-a72.cortex-a53/code),
 ARM Cortex-M7 (codecortex-m7/code), Applied Micro X-Gene 1
!(codexgene1/code).  The GCC identifiers can be used
 as arguments to the code-mcpu/code or code-mtune/code options,
 for example: code-mcpu=xgene1/code or
 code-mtune=cortex-a72.cortex-a53/code.
--- 624,631 
 (codecortex-a72/code) and initial support for its big.LITTLE
 combination with the ARM Cortex-A53 
(codecortex-a72.cortex-a53/code),
 ARM Cortex-M7 (codecortex-m7/code), Applied Micro X-Gene 1
!(codexgene1/code), and Samsung Exynos M1 (codeexynos-m1/code).
!The GCC identifiers can be used
 as arguments to the code-mcpu/code or code-mtune/code options,
 for example: code-mcpu=xgene1/code or
 code-mtune=cortex-a72.cortex-a53/code.


Re: [libstdc++/65033] Give alignment info to libatomic

2015-04-03 Thread Jonathan Wakely

On 03/04/15 05:24 -0400, Hans-Peter Nilsson wrote:

On Thu, 2 Apr 2015, Hans-Peter Nilsson wrote:

Why then use __alignof(_M_i) (the object-alignment)
instead of _S_alignment (the deduced alas insufficiently
increased type-alignment)?


Isn't the object aligned to _S_alignment?



(The immediate reason is that _S_alignment wasn't there until a
later revision, but the gist of the question remains. :-)




 making sure that atomic_is_lock_free returns the same
 value for all objects of a given type,

(That would work but it doesn't seem to be the case.)


Because we haven't done anything to increase the alignment for the
__atomic_base_ITp specialization yet, see the additional patch at:

https://gcc.gnu.org/ml/gcc-patches/2015-03/msg01664.html

(Although that's insufficient as you point out, because it should
depend on the size too.)


 we probably should have changed the
 interface so that we would pass size and alignment rather than size and object
 pointer.

 Instead, we decided that passing null for the object pointer would be
 sufficient.  But as this PR shows, we really do need to take alignment into
 account.

Regarding what's actually needed, alignment of an atomic type
should always be *forced to be at least the natural alignment of
of the object* (with non-power-of-two sized-objects rounded up)
and until then atomic types won't work for targets where the
non-atomic equivalents have less alignment (as straddling a
page-boundary won't be lock-less-atomic anywhere where
straddling a page-boundary may cause a non-atomic-access...) So,
not target-specific except for targets that require even
higher-than-natural alignment.


So, can we do something like this instead for gcc5?
(Completely untested and may be syntactically, namespacingly and
cxxstandardversionly flawed.)

Index: include/std/atomic
===
--- include/std/atomic  (revision 221849)
+++ include/std/atomic  (working copy)
@@ -165,16 +165,16 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
struct atomic
{
private:
-  // Align 1/2/4/8/16-byte types the same as integer types of that size.
+  // Align 1/2/4/8/16-byte types to the natural alignment of that size.
  // This matches the alignment effects of the C11 _Atomic qualifier.
  static constexpr int _S_min_alignment
-   = sizeof(_Tp) == sizeof(char)  ? alignof(char)
-   : sizeof(_Tp) == sizeof(short) ? alignof(short)
-   : sizeof(_Tp) == sizeof(int)   ? alignof(int)
-   : sizeof(_Tp) == sizeof(long)  ? alignof(long)
-   : sizeof(_Tp) == sizeof(long long) ? alignof(long long)
+  = sizeof(_Tp) == sizeof(char)   ? max(sizeof(char), alignof(char))
+   : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), alignof(short))
+   : sizeof(_Tp) == sizeof(int)   ? max(sizeof(int), alignof(int))
+   : sizeof(_Tp) == sizeof(long)  ? max(sizeof(long), alignof(long))
+   : sizeof(_Tp) == sizeof(long long) ? max(sizeof(long long), 
alignof(long long))
#ifdef _GLIBCXX_USE_INT128
-   : sizeof(_Tp) == sizeof(__int128)  ? alignof(__int128)
+   : sizeof(_Tp) == sizeof(__int128)  ? max(sizeof(__int128), 
alignof(__int128))
#endif
: 0;


Instead of changing every case in the condition to include sizeof why
not just do it afterwards using sizeof(_Tp), in the _S_alignment
calculation?

We know sizeof(_Tp) == sizeof(corresponding integer type) because
that's the whole point of the conditionals! See attachment.


@@ -216,7 +216,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
  is_lock_free() const noexcept
  {
// Produce a fake, minimally aligned pointer.
-   void *__a = reinterpret_castvoid *(-__alignof(_M_i));
+   void *__a = reinterpret_castvoid *(-_S_alignment);
return __atomic_is_lock_free(sizeof(_M_i), __a);
  }


If _M_i is aligned to _S_alignment then what difference does the
change above make?

It doesn't matter if the value is per-object if we've forced all such
objects to have the same alignment, does it?

Or is it different if a std::atomicT is included in some other
struct and the user forces a different alignment on it? I don't think
we really need to support that, users shouldn't be doing that.



Index: include/bits/atomic_base.h
===
--- include/bits/atomic_base.h  (revision 221849)
+++ include/bits/atomic_base.h  (working copy)
@@ -240,7 +240,23 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
private:
  typedef _ITp  __int_type;

-  __int_type   _M_i;
+  // Align 1/2/4/8/16-byte types to the natural alignment of that size.
+  // This matches the alignment effects of the C11 _Atomic qualifier.
+  static constexpr int _S_min_alignment
+  = sizeof(_Tp) == sizeof(char)   ? max(sizeof(char), __alignof(char))
+   : sizeof(_Tp) == sizeof(short) ? max(sizeof(short), 
__alignof(short))
+   : sizeof(_Tp) == sizeof(int)   ? 

Re: [PR64164] drop copyrename, integrate into expand

2015-04-03 Thread Alexandre Oliva
On Mar 31, 2015, Jeff Law l...@redhat.com wrote:

 -  if (var1 != var2)
 +  if (var1 != var2  !optimize)
 return false;
 So when when the base variables are different and we are optimizing,
 this allows coalescing, right?

Yeah.

 What I don't see is a corresponding change to var_map_base_init to
 ensure we build a conflict graph which includes objects when
 SSA_NAME_VARs are not the same.  I see a vague reference in
 var_map_base_init's header comment that refers us to
 coalesce_ssa_name.

 It appears that compute_optimized_partition_bases handles this by
 creating a partitions of things that are related by copies/phis
 regardless of their underlying named object, type, etc.  Right?

Correct.  I guess it makes sense to move partition base computation to a
single location.  Since compute_optimized_partition_bases relies on data
structures local to this source file, I'm moving the non-optimized
version to tree-ssa-coalesce.c, and dropping support for basevar
initialization from tree-ssa-live.c.


 Hard to argue with removing a pass that gets called 5 times!

:-)


 @@ -890,6 +900,36 @@ build_ssa_conflict_graph (tree_live_info_p liveinfo)
 live_track_process_def (live, result, graph);
 }
 
 +  /* Pretend there are defs for params' default defs at the start
 + of the (post-)entry block.  We run after abnormal coalescing,
 + so we can't assume the leader variable is the default
 + definition, but because of SSA_NAME_VAR adjustments in
 + attempt_coalesce, we can assume that if there is any
 + PARM_DECL in the partition, it will be the leader's
 + SSA_NAME_VAR.  */

This comment is outdated.  Since we no longer have abnormal coalescing
before building the conflict graph, we can just test whether each
SSA_NAME is a default def for a PARM_DECL and be done with it.

 So the issue here is you want to iterate over the objects live at the
 entry block, which would include any SSA_NAMEs which result from
 PARM_DECLs.  I don't guess there's an easier way to do that other than
 iterating over everything live in that initial block?

We could iterate over all SSA_NAMEs, but that would probably be more
costly.  There shouldn't be very many live variables at the function
entry, so using the live bitmaps is likely to save us time, especially
on functions with lots of SSA_NAMEs.

 And the second second EXECUTE_IF_SET_IN_BITMAP iterates over
 everything in the partitions associated with the SSA_NAMES that are
 live at the the entry block, right?

Yeah, we iterate over the bases in live_base_var, because the per-base
bitmaps are only accurate when the corresponding live_base_var bit is
iset.

 @@ -1126,11 +1166,12 @@ create_outofssa_var_map (coalesce_list_p cl, bitmap 
 used_in_copy)
 
 static inline bool
 attempt_coalesce (var_map map, ssa_conflicts_p graph, int x, int y,
 -  FILE *debug)
 +  bitmap param_defaults, FILE *debug)
 [ ... ]
 So the bulk of the changes into this routine are really about picking
 a good leader, which presumably is how we're able to get the desired
 effects on debuginfo that we used to get from tree-ssa-copyrename.c?

This has nothing to do with debuginfo, I'm afraid.  We just had to keep
track of parm and result decls to avoid coalescing them together, and
parm decl default defs to promote them to leaders, because expand copies
incoming REGs to pseudos in PARM_DECL's DECL_RTL.  We should fill that
in with the RTL created for the default def for the PARM_DECL.  At the
end, I believe we also copy the RESULT_DECL DECL_RTL to the actual
return register or rtl.  I didn't want to tackle the reworking of these
expanders to avoid problems out of copying incoming parms to one pseudo
and then reading it from another, as I observed before I made this
change.  I'm now tackling that, so that we can refrain from touching the
base vars altogether, and not refrain from coalescing parms and results.

 So some comments about the various cases here might help.  I can sort
 them out if I read the code, but one could argue that a block comment
 on the rules for how to select the partition leader would be better.

*nod*.  I won't bother, though, if this code ends up gone in the next
iteration of the patch ;-)

 Is the special casing of PARM_DECLs + RESULT_DECLs really a failing of
 not handling one or both properly when computing liveness information?

No, it's about RTL assignment and copying to/from hard regs.  We assign
RTL to PARM_DECLs and RESULT_DECLs explicitly in the expander, but we
can't assign different RTL to them if they are coalesced in a single
partition.

 I'm not aware of an inherent reason why a PARM_DECL couldn't coalesce
 with a related RESULT_DECL if they are otherwise non-conflicting and
 related by a copy/phi.

Indeed, there isn't any inherent reason.  It was just a restriction I
carried over from copyrename, and that I postponed cleaning up.

 Presumably ordering of unioning of the partitions doesn't matter here
 as we're looking at coalesce 

Re: Silence merge warnings on artificial types

2015-04-03 Thread Jakub Jelinek
On Thu, Apr 02, 2015 at 09:05:53PM +0200, Jakub Jelinek wrote:
 On Thu, Apr 02, 2015 at 09:23:03PM +0300, Ilya Verbin wrote:
  Hmm, libgomp.c++/target-3.C still fails.
  Here is what I see in need_assembler_name_p:
 
 Guess we should make the .omp_data_s.* types TYPE_ARTIFICIAL too.
 Will take care of that tomorrow.

Tested on x86_64-linux, committed to trunk.

2015-04-03  Jakub Jelinek  ja...@redhat.com

* omp-low.c (scan_omp_parallel, scan_omp_task, scan_omp_target): Set
TYPE_ARTIFICIAL on the .omp_data* types.

--- gcc/omp-low.c.jj2015-03-23 08:47:51.0 +0100
+++ gcc/omp-low.c   2015-04-02 22:06:30.547850197 +0200
@@ -2351,6 +2351,7 @@ scan_omp_parallel (gimple_stmt_iterator
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx-record_type) = name;
+  TYPE_ARTIFICIAL (ctx-record_type) = 1;
   create_omp_child_function (ctx, false);
   gimple_omp_parallel_set_child_fn (stmt, ctx-cb.dst_fn);
 
@@ -2391,6 +2392,7 @@ scan_omp_task (gimple_stmt_iterator *gsi
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx-record_type) = name;
+  TYPE_ARTIFICIAL (ctx-record_type) = 1;
   create_omp_child_function (ctx, false);
   gimple_omp_task_set_child_fn (stmt, ctx-cb.dst_fn);
 
@@ -2404,6 +2406,7 @@ scan_omp_task (gimple_stmt_iterator *gsi
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx-srecord_type) = name;
+  TYPE_ARTIFICIAL (ctx-srecord_type) = 1;
   create_omp_child_function (ctx, true);
 }
 
@@ -2671,6 +2674,7 @@ scan_omp_target (gomp_target *stmt, omp_
   DECL_ARTIFICIAL (name) = 1;
   DECL_NAMELESS (name) = 1;
   TYPE_NAME (ctx-record_type) = name;
+  TYPE_ARTIFICIAL (ctx-record_type) = 1;
   if (offloaded)
 {
   if (is_gimple_omp_oacc (stmt))


Jakub


Re: [PR64164] drop copyrename, integrate into expand

2015-04-03 Thread Alexandre Oliva
On Mar 31, 2015, Richard Biener richard.guent...@gmail.com wrote:

 + || !(SSA_NAME_IS_DEFAULT_DEF (var)
 +  || (param_defaults
 +   bitmap_bit_p (param_defaults, part

 This looks somewhat awkward to me ;)  Is it really important to allow
 coalescing PARM_DECL-based SSA vars with sth else?

It's a valid optimization.  I can't say it's really important, but if
the only objection is to param_defaults, I'm getting rid of it.

 At least abnormal coalescing doesn't need to do that, so just walking
 over the function decls parameters and making their default-def live
 should be enough?

It should.  We'd have to duplicate logic of parameters, including static
chain and whatnot.  I figured this would make it more resilient to
changes elsewhere.

 +  else if (TREE_CODE (SSA_NAME_VAR (var2)) == VAR_DECL)
 +   leader = SSA_NAME_VAR (var2);
 +  else /* What else could it be?  */
 +   gcc_unreachable ();
 +

 definitely comments missing in this spaghetti...

I'm trying to remove the spaghetting now.

 or seeing this, why coalesce default-defs at all?

Why not?  (the referenced code is gone from my local tree, BTW)

 Either they are param values or they have indetermined values (and
 thus we can and do pick whatever is available at expansion time)?

If they are param values, we want to have them available; if they
aren't, whatever we coalesce with is good.


 So the above does full coalescing ignoring conflicts.

Yeah.  We want to tell what we'd get if all coalesce possibilities are
taken, so that we can assign the same basevar to all partitions so that
we detect conflicts.

 Did you do any statistics on how the number of basevars changes with your 
 patch
 compared to trunk?

'fraid I didn't run any statistics whatsoever.  I didn't think it was
important, since it's pretty much just doing copyrename during coalesce.
Truth be told, this has since relaxed some of the constraints from
copyrename, and I'm going to relax some more in the next iteration, so I
guess some statistics wouldn't be a bad idea.  Is there any specific
testcase you're interested in, or something like a GCC bootstrap or
somesuch?

-- 
Alexandre Oliva, freedom fighterhttp://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [RS6000] Fix 65576 regression

2015-04-03 Thread David Edelsohn
On Fri, Apr 3, 2015 at 8:51 AM, Alan Modra amo...@gmail.com wrote:
 On Thu, Apr 02, 2015 at 08:02:57PM -0400, David Edelsohn wrote:
 If the final alternative cannot occur, it should be removed as well.
 Alan, would you please test that change also?

 Tested powerpc64-linux and powerpc-linux no regressions.

 * config/rs6000/rs6000.md (extenddftf2_internal): Remove last
 alternative.

Okay with the two patches combined.

Thanks, David


Re: [libcpp] Fix ICE with conditional macros (PR preprocessor/61977)

2015-04-03 Thread Jason Merrill

OK.

Jason