Re: [PATCH, alpha]: Introduce handle_trap_shadows and align_insns passes

2014-07-11 Thread Uros Bizjak
On Mon, Jun 30, 2014 at 9:59 PM, Uros Bizjak ubiz...@gmail.com wrote:

 After fixing _.barriers and _.eh_range passes w.r.t. CALL_ARG_LOCATION
 notes, we can finaly move handling of trap shadows (PR 56858) and insn
 alignments into their own passes.

 Additionally, the patch skips handling of BARRIERs in
 alpha_pad_function_end, since CALL_ARG_LOCATION notes are not split
 away from their call insn anymore.

 2014-06-30  Uros Bizjak  ubiz...@gmail.com

 PR target/56858
 * config/alpha/alpha.c: Include tree-pass.h, context.h
 and pass_manager.h.
 (pass_data_handle_trap_shadows): New pass.
 (pass_handle_trap_shadows::gate): New pass gate function.
 (make_pass_handle_trap_shadows): New function.
 (rest_of_handle_trap_shadows): Ditto.

 (alpha_align_insns_1): Rename from alpha_align_insns.
 (pass_data_align_insns): New pass.
 (pass_align_insns::gate): New pass gate function.
 (make_pass_aling_insns): New function.
 (rest_of_align_insns): Ditto.
 (alpha_align_insns): Ditto.

 (alpha_option_override): Declare handle_trap_shadows info
 and align_insns_info.  Register handle_trap_shadows and align_insns
 passes here.
 (alpha_reorg): Do not call alpha_trap_shadows and
 alpha_align_insn from here.

 (alpha_pad_function_end): Do not skip BARRIERs.

I went agead and installed the patch on mainline SVN. The patch just
fixes existing approach and doesn't introduce new functionality.

Uros.


Re: [PATCH] Generate more efficient memory barriers for LEON3

2014-07-11 Thread Daniel Cederman

Hi,

Thank you for your comments.

 ...stdbar should never be generated since #StoreStore is implied by 
TSO.


I missed that #StoreStore is never generated for TSO, so I'm removing 
that pattern.


 OK, thanks.  Does this result in a significance performance gain?

stb seems to be at least twice as fast as ldstub.

 I think that only the membar_storeload_leon3 pattern is necessary.

The full barrier pattern membar_leon3 also gets generated so I think 
that one should be kept also.


I'm changing the pattern type to store and the condition on the 
original patterns to  !TARGET_LEON3 and resubmitting the patch.


On 2014-07-10 11:37, Eric Botcazou wrote:

The memory barriers generated for SPARC are targeting the weakest memory
model allowed for SPARC.


That's not quite true, they are targeting the sparc_memory_model, which is the
memory model selected for the architecture/OS pair by default and which can be
overridden by the user with -mmemory-model=[default|rmo|pso|tso|sc].


The LEON3/4 SPARC processors are using a stronger memory model and thus have
less requirements on the memory barriers.


My understanding is that they use TSO, in which case...


For LEON3/4, StoreStore is compiler-only, instead of stbar,


...stdbar should never be generated since #StoreStore is implied by TSO.


and StoreLoad can be achieved with a normal byte write stb, instead of
an atomic byte read-write ldstub.


OK, thanks.  Does this result in a significance performance gain?


The provided patch changes the previously mentioned memory barriers for
TARGET_LEON3.


I think that only the membar_storeload_leon3 pattern is necessary.  Couple of
more nits: the new pattern is not multi, it's store and you need to add:

!TARGET_LEON3

to the original membar_storeload since TARGET_LEON3 is also TARGET_V8.



--
Daniel Cederman
Software Engineer
Aeroflex Gaisler AB
Aeroflex Microelectronic Solutions – HiRel
Kungsgatan 12
SE-411 19 Gothenburg, Sweden
Phone: +46 31 7758665
ceder...@gaisler.com
www.Aeroflex.com/Gaisler


[PATCH-v2] Generate more efficient memory barriers for LEON3

2014-07-11 Thread Daniel Cederman
ChangeLog:

2014-07-11  Daniel Cederman  ceder...@gaisler.com

gcc/config/sparc/
* sync.md: Generate more efficient memory barriers for LEON3 

---
 gcc/config/sparc/sync.md |   21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index e6e237f..05c3277 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -64,11 +64,19 @@
   stbar
   [(set_attr type multi)])
 
+;; For LEON3, STB has the effect of membar #StoreLoad.
+(define_insn *membar_storeload_leon3
+  [(set (match_operand:BLK 0  )
+   (unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
+  TARGET_LEON3
+  stb\t%%g0, [%%sp-1]
+  [(set_attr type store)])
+
 ;; For V8, LDSTUB has the effect of membar #StoreLoad.
 (define_insn *membar_storeload
   [(set (match_operand:BLK 0  )
(unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
-  TARGET_V8
+  TARGET_V8  !TARGET_LEON3
   ldstub\t[%%sp-1], %%g0
   [(set_attr type multi)])
 
@@ -79,7 +87,7 @@
   [(set (match_operand:BLK 0  )
(unspec:BLK [(match_dup 0) (match_operand:SI 1 const_int_operand)]
UNSPEC_MEMBAR))]
-  TARGET_V8
+  TARGET_V8  !TARGET_LEON3
   stbar\n\tldstub\t[%%sp-1], %%g0
   [(set_attr type multi)
(set_attr length 2)])
@@ -93,6 +101,15 @@
   membar\t%1
   [(set_attr type multi)])
 
+;; For LEON3, membar #StoreLoad is enough for a full barrier.
+(define_insn *membar_leon3
+  [(set (match_operand:BLK 0  )
+   (unspec:BLK [(match_dup 0) (match_operand:SI 1 const_int_operand)]
+   UNSPEC_MEMBAR))]
+  TARGET_LEON3
+  stb\t%%g0, [%%sp-1]
+  [(set_attr type store)])
+
 (define_peephole2
   [(set (match_operand:BLK 0  )
(unspec:BLK [(match_dup 0) (match_operand:SI 1 const_int_operand)]
-- 
1.7.9.5



Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Richard Biener
On Thu, 10 Jul 2014, Jakub Jelinek wrote:

 On Thu, Jul 10, 2014 at 05:12:53PM +0200, Richard Biener wrote:
  The following makes sure we fold the reads from string constants
  created from folding memcpy (foo, string_cst[0], ...) to
  eventually create similar code as if that memcpy folding wasn't
  done (move-by-pieces).
  
  Bootstrapped and tested on x86_64-unknown-linux-gnu.
  
  Comments?
  
  (yeah, those native_encode_expr improvement would be really nice,
  eventually for a slightly different interface, native_encode_expr_part)
 
 Don't think you need a new function for that, just add an optional argument
 to native_encode_expr, and propagate it to native_encode_string, which would
 if the new bool arg is true not do:
   if (total_bytes  len)
 return 0;
 but if total_bytes  len set total_bytes to len instead.
 Or, isn't your code only handling STRING_CSTs and nothing else?
 Then perhaps just export native_encode_string too and add the optional
 argument to that.

I started with adding an optional arg to native_encode_expr but then
figured that all the current workers return failure if the entire
expr doesn't fit within 'len'.  But what I want is a native_encode_expr
that encodes (at most) [offset, offset + len] even when the actual
object is larger.  But yes, I'd share the actual workers and only
add native_encode_expr_part with the extra offset argument.

For the particular case native_encode_string is enough but I think
for constructor folding and constructor output we want to support
CONSTRUCTOR in native_encode_expr in the future.

Thus, I'm not sure if changing the semantics of native_encode_expr
is ok (or desirable with just an extra offset argument - if so
I can add an overload retaining the old semantic for the old
interface).

Richard.


Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts

2014-07-11 Thread Richard Biener
On Thu, 10 Jul 2014, Jakub Jelinek wrote:

 On Thu, Jul 10, 2014 at 04:30:13PM +0200, Richard Biener wrote:
  Compromise hack below.  It simply avoids the transform for
  sources that c_strlen can compute a length of.  That fixes all
  strlenopt testcase apart from strlenopt-8.c which does
  memcpy (, flag ? a : b); which then still folds
  during gimplification.  I have XFAILed that.
  
  I've tried to comb my way through strlenopt but failed to quickly
  add support for generic stores (it has very rough support for
  char stores, see also PR61773).
  
  Does the below look ok?
 
 I can look at the tree-ssa-strlen.c stuff and removing the !c_strlen
 hack incrementally.

Ok, I'll bootstrap/test the patch then and will commit if there are
no issues left.

Thanks,
Richard.


Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Richard Biener
On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote:
 Here is the updated patch. The difference with patch set 3 is
 (1) use the gcov-counter.def for scaling operation.
 (2) fix wrong scaling function for time-profiler.

 passed with bootstrap, profiledboostrap and SPEC2006.

One of the patches breaks bootstrap for me:

/space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO
C++ forbids zero-size array ‘ctrs’
make[3]: *** [libgcov-util.o] Error 1

host compiler is gcc 4.3.4

Richard.

 Thanks,

 -Rong

 On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote:
 Hi, Honza,

 Please find the new patch in the attachment. This patch integrated
 your earlier suggestions. The noticeable changes are:
 (1) build specialized object for libgcov-driver.c and libgcov-merge.c
 and link into gcov-tool, rather including the source file.
 (2) split some gcov counter definition code to gcov-counter.def to
 avoid code duplication.
 (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
 code in libgcov-merge.c with minimal change.

 This patch does not address the suggestion of creating a new directory
 for libgcov. I agree with you that's a much better
 and cleaner structure we should go for. We can do that in follow-up patches.

 I tested this patch with boostrap and profiledbootstrap. Other tests
 are on-going.

 Thanks,

 -Rong

 On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote:
 GCOT_TOOL needs to use this function to read the string in gcda file
 to memory to construct gcov_info objects.
 As you noticed, gcov runtime does not need this interface. But
 gcov-tool links with gcov runtime and it also uses the function.
 We could make it available in gcov_runtime, but that will slightly
 increase the memory footprint.

 Yep, it is not really pretty. I wrote bellow some plan how things may be
 structured in more convenient way.  Any work in that direction would be 
 welcome.

 The planned new functions for trunk version is: (1) overlap score b/w
 two profiles (2) better dumping (top hot objects/function/counters)
 and statistics.
 Once this basic version is in, we can start to add the new functionality.

 Sounds good. I assume the autoFDO does not go via gcov tool but rather uses
 custom reader of profile data at GCC side?
 I wonder, are there any recent overview papers/slides/text of design of 
 AutoFDO
 and other features to be merged?
 I remember the talk from GCC Summit and I did read some of pre-LTO LIPO
 sources  papers, but it would be nice to have somethin up to date.

 libgcov-util.o is built in gcc/ directory, rather in libgcc.
 It's directly linked to gcov-tool.
 So libgcov-util.o is built for HOST, not TARGET.
 With makefile changes, we can built HOST version of libgcov-driver.o
 and libgcov-merge.o.
 I also need to make some static functions/variables public.

 I suppose that can go with IN_GCOV_TOOL ifdef.

 So we currently have basic gcov io handling in gcc/gcov-io.c that is 
 borrowed
 by libgcc/libgcov* code.  We also will get libgcov-util.c in libgcc 
 directory
 that is actually borrowed by by gcc/gcov-tool.c code.

 We now have one runtime using STDIO for streaming and kernel has custom 
 version
 that goes via /proc interface (last time I looked).  We added some 
 abstraction
 into libgcov-interface that is the part that relies on STDIO, partly via 
 gcov-io.c
 code and now we have in-memory handling code in libgcov-util.

 I guess it would make most sentse to put all the gcov code into a new 
 directory
 (libgcov) and make it stand-alone library that can be configured
 1) for stdio based runtime as we do not
 2) for runtime missing the interface and relyin on user providing it
 3) for use within gcov file manipulation tools with reorg of
 GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
 In a longer term, I would like to make FDO profiling intstrumentation to 
 happen
 at linktime. For that I need to make the instrumentation code (minimal 
 spaning
 tree  friends) to work w/o cgraph that would ideally be done in a shared
 implementation
  Won't this get wrong answer when counters[0] is not the same?
  I would expected the merging code to compare the counters first. 
  Similarly for delta counter.

 These *_op functions are for scaling only. So there is only one
 profile involved (thus there is no comparison).
 The merge handles are in libgcov-merge.c which have the code to handle
 mismatched profile targets.

 I see, OK then.
 
  Adding correct rounding may actually make difference for Martin's 
  startup time work.

 Do you mean to use something like in RDIV macro?

 Yes.

 Honza


Re: update address taken: don't drop clobbers

2014-07-11 Thread Richard Biener
On Thu, Jul 10, 2014 at 8:22 PM, Jeff Law l...@redhat.com wrote:
 On 07/10/14 09:48, Michael Matz wrote:

 Hi,

 On Thu, 10 Jul 2014, Richard Biener wrote:

 Apart from the out-of-SSA patch you proposed elsewhere a possibility
 is to simply never mark undefined SSA names as
 SSA_NAME_OCCURS_IN_ABNORMAL_PHI ... (or not mark those
 as must-coalesce).


 The insight to note is, that undefined SSA names should really be
 coalesced with something (otherwise you lost an optimization opportunity),
 but it doesn't matter with _what_ each use of the undefined name is
 coalesced, you can even identify different uses of them with different SSA
 names (e.g. the LHS of each using stmt).  Requires some change in the
 order things are done in out-of-ssa.

 The last part is what I hinted might be problematical.  If some undefined
 SSA_NAME appears on the RHS of two PHIs and we want to coalesce that
 undefined SSA_NAME with the LHS of each of those PHIs, then the LHS of those
 two PHIs must coalesce as well.  At least that's my recollection of how all
 that stuff worked.

Yes, coalescing doesn't do live-range splitting to avoid coalescing the
two PHI results.  But they have to be coalesced anyway.

I still think simply never recording conflicts for undefined SSA names
is a proper hack to avoid this issue.

 It was that realization that made me wonder if we should have a unique
 SSA_NAME at each undefined use point.

That would be unnecessarily expensive.

Richard.

 jeff



Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Jakub Jelinek
On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote:
 I started with adding an optional arg to native_encode_expr but then
 figured that all the current workers return failure if the entire
 expr doesn't fit within 'len'.  But what I want is a native_encode_expr

That is because the functions were written for VIEW_CONVERT_EXPR folding
where it is expected that both the types are the same size, and it doesn't
make sense to spend too much time on large stuff that doesn't fit into
the fold_view_convert_expr's fixed size buffer.  Now, for your case
the sizes aren't the same and thus partial data is just fine, because
what native_interpret_expr will do is small fixed size.
So I don't see a problem with not returning NULL and instead handling the
partial store into buffer if you ask for it through optional argument
(doing it always would be fine for fold_view_convert_expr assuming nobody
feeds us different sized argument vs. VCE type, but e.g.
tree-loop-distribution.c certainly appreciates if we return NULL for large
things, because if it had just partial buffer, it would not be able to
verify what it is looking for).

Jakub


Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Christophe Lyon
On 11 July 2014 10:07, Richard Biener richard.guent...@gmail.com wrote:
 On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote:
 Here is the updated patch. The difference with patch set 3 is
 (1) use the gcov-counter.def for scaling operation.
 (2) fix wrong scaling function for time-profiler.

 passed with bootstrap, profiledboostrap and SPEC2006.

 One of the patches breaks bootstrap for me:

 /space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO
 C++ forbids zero-size array ‘ctrs’
 make[3]: *** [libgcov-util.o] Error 1

 host compiler is gcc 4.3.4

 Richard.


On my side, commit 212448 breaks the cross-build of GCC for targets
using newlib:
* arm-none-eabi
* aarch64-none-elf
/usr/include/sys/stat.h: In function 8098void
gcov_output_files(const char*, gcov_info*)8099:
/usr/include/sys/stat.h:317: error: too few arguments to function
8098int mkdir(const char*, __mode_t)8099
/tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96:
error: at this point in file
make[2]: *** [gcov-tool.o] Error 1

Christophe.

 Thanks,

 -Rong

 On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote:
 Hi, Honza,

 Please find the new patch in the attachment. This patch integrated
 your earlier suggestions. The noticeable changes are:
 (1) build specialized object for libgcov-driver.c and libgcov-merge.c
 and link into gcov-tool, rather including the source file.
 (2) split some gcov counter definition code to gcov-counter.def to
 avoid code duplication.
 (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
 code in libgcov-merge.c with minimal change.

 This patch does not address the suggestion of creating a new directory
 for libgcov. I agree with you that's a much better
 and cleaner structure we should go for. We can do that in follow-up patches.

 I tested this patch with boostrap and profiledbootstrap. Other tests
 are on-going.

 Thanks,

 -Rong

 On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote:
 GCOT_TOOL needs to use this function to read the string in gcda file
 to memory to construct gcov_info objects.
 As you noticed, gcov runtime does not need this interface. But
 gcov-tool links with gcov runtime and it also uses the function.
 We could make it available in gcov_runtime, but that will slightly
 increase the memory footprint.

 Yep, it is not really pretty. I wrote bellow some plan how things may be
 structured in more convenient way.  Any work in that direction would be 
 welcome.

 The planned new functions for trunk version is: (1) overlap score b/w
 two profiles (2) better dumping (top hot objects/function/counters)
 and statistics.
 Once this basic version is in, we can start to add the new functionality.

 Sounds good. I assume the autoFDO does not go via gcov tool but rather uses
 custom reader of profile data at GCC side?
 I wonder, are there any recent overview papers/slides/text of design of 
 AutoFDO
 and other features to be merged?
 I remember the talk from GCC Summit and I did read some of pre-LTO LIPO
 sources  papers, but it would be nice to have somethin up to date.

 libgcov-util.o is built in gcc/ directory, rather in libgcc.
 It's directly linked to gcov-tool.
 So libgcov-util.o is built for HOST, not TARGET.
 With makefile changes, we can built HOST version of libgcov-driver.o
 and libgcov-merge.o.
 I also need to make some static functions/variables public.

 I suppose that can go with IN_GCOV_TOOL ifdef.

 So we currently have basic gcov io handling in gcc/gcov-io.c that is 
 borrowed
 by libgcc/libgcov* code.  We also will get libgcov-util.c in libgcc 
 directory
 that is actually borrowed by by gcc/gcov-tool.c code.

 We now have one runtime using STDIO for streaming and kernel has custom 
 version
 that goes via /proc interface (last time I looked).  We added some 
 abstraction
 into libgcov-interface that is the part that relies on STDIO, partly via 
 gcov-io.c
 code and now we have in-memory handling code in libgcov-util.

 I guess it would make most sentse to put all the gcov code into a new 
 directory
 (libgcov) and make it stand-alone library that can be configured
 1) for stdio based runtime as we do not
 2) for runtime missing the interface and relyin on user providing it
 3) for use within gcov file manipulation tools with reorg of
 GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
 In a longer term, I would like to make FDO profiling intstrumentation to 
 happen
 at linktime. For that I need to make the instrumentation code (minimal 
 spaning
 tree  friends) to work w/o cgraph that would ideally be done in a shared
 implementation
  Won't this get wrong answer when counters[0] is not the same?
  I would expected the merging code to compare the counters first. 
  Similarly for delta counter.

 These *_op functions are for scaling only. So there is only one
 profile involved (thus there is no comparison).
 The merge handles are in libgcov-merge.c which have the code to 

Re: update address taken: don't drop clobbers

2014-07-11 Thread Richard Biener
On Fri, Jul 11, 2014 at 10:10 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Thu, Jul 10, 2014 at 8:22 PM, Jeff Law l...@redhat.com wrote:
 On 07/10/14 09:48, Michael Matz wrote:

 Hi,

 On Thu, 10 Jul 2014, Richard Biener wrote:

 Apart from the out-of-SSA patch you proposed elsewhere a possibility
 is to simply never mark undefined SSA names as
 SSA_NAME_OCCURS_IN_ABNORMAL_PHI ... (or not mark those
 as must-coalesce).


 The insight to note is, that undefined SSA names should really be
 coalesced with something (otherwise you lost an optimization opportunity),
 but it doesn't matter with _what_ each use of the undefined name is
 coalesced, you can even identify different uses of them with different SSA
 names (e.g. the LHS of each using stmt).  Requires some change in the
 order things are done in out-of-ssa.

 The last part is what I hinted might be problematical.  If some undefined
 SSA_NAME appears on the RHS of two PHIs and we want to coalesce that
 undefined SSA_NAME with the LHS of each of those PHIs, then the LHS of those
 two PHIs must coalesce as well.  At least that's my recollection of how all
 that stuff worked.

 Yes, coalescing doesn't do live-range splitting to avoid coalescing the
 two PHI results.  But they have to be coalesced anyway.

 I still think simply never recording conflicts for undefined SSA names
 is a proper hack to avoid this issue.

 It was that realization that made me wonder if we should have a unique
 SSA_NAME at each undefined use point.

 That would be unnecessarily expensive.

Btw, the bug must be already kind-of preexisting due to

bool
may_propagate_copy (tree dest, tree orig)
{
...
  /* If ORIG flows in from an abnormal edge, it cannot be propagated.  */
  if (TREE_CODE (orig) == SSA_NAME
   SSA_NAME_OCCURS_IN_ABNORMAL_PHI (orig)
  /* If it is the default definition and an automatic variable then
 we can though and it is important that we do to avoid
 uninitialized regular copies.  */
   !(SSA_NAME_IS_DEFAULT_DEF (orig)
(SSA_NAME_VAR (orig) == NULL_TREE
   || TREE_CODE (SSA_NAME_VAR (orig)) == VAR_DECL)))
return false;

but it never replaces an abnormal SSA names with its default definition
due to the next check:

  /* If DEST is an SSA_NAME that flows from an abnormal edge, then it
 cannot be replaced.  */
  if (TREE_CODE (dest) == SSA_NAME
   SSA_NAME_OCCURS_IN_ABNORMAL_PHI (dest))
return false;

still it might create overlapping life-ranges for abnormals.

Richard.

 Richard.

 jeff



Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jakub Jelinek wrote:

 On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote:
  I started with adding an optional arg to native_encode_expr but then
  figured that all the current workers return failure if the entire
  expr doesn't fit within 'len'.  But what I want is a native_encode_expr
 
 That is because the functions were written for VIEW_CONVERT_EXPR folding
 where it is expected that both the types are the same size, and it doesn't
 make sense to spend too much time on large stuff that doesn't fit into
 the fold_view_convert_expr's fixed size buffer.  Now, for your case
 the sizes aren't the same and thus partial data is just fine, because
 what native_interpret_expr will do is small fixed size.
 So I don't see a problem with not returning NULL and instead handling the
 partial store into buffer if you ask for it through optional argument
 (doing it always would be fine for fold_view_convert_expr assuming nobody
 feeds us different sized argument vs. VCE type, but e.g.
 tree-loop-distribution.c certainly appreciates if we return NULL for large
 things, because if it had just partial buffer, it would not be able to
 verify what it is looking for).

Yeah, if you pass 'offset' then 'len' suddenly changes semantics
from specifying the buffer size to specifying the desired amount
of encoded data ... (which the present interface computes for you,
the caller doesn't even need to know how large it is).

The question is also if I ask native_encode_expr to produce
data for [offset, offset + len] but the object only contains
part of it - thus the tail is 'undefined' - do we want to fail
or simply feed (undefined) garbage to the following
native_interpret_expr?  (yes, native_encode_expr will of course
return sth  len in this case)

Richard.


Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
 Are we sure it ever stabilizes? But yes, I had something like this in mind 
 (just do one iteration always) in case we need to improve hashing.

Hi,
this is bit quick experiment with strengthening the hash by iteration. I don't
seem to be able to measure WPA time difference for the patch though (but on
positive side, it also does not make streaming out slower). I did not find
clean way to hook things into streamer cache, so I added separate hash for
that; I am sure you will know better ;)

I added dumping for duplicated trees withing SCC regions that shows some 
interesting
info.  It seems that at firefox we do have duplicate type variants, duplicated
const decls such as:
 function_type 0x2bf2a540
type integer_type 0x2bde95e8 int sizes-gimplified public SI
size integer_cst 0x2bdd4e70 constant 32
unit size integer_cst 0x2bdd4e88 constant 4
align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 
32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 
2147483647 context translation_unit_decl 0x2db62a00 
/aux/hubicka/firefox/db/sqlite3/src/sqlite3.c
pointer_to_this pointer_type 0x2dc527e0
QI
size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 
bitsizetype constant 8
unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 
sizetype constant 1
align 8 symtab 0 alias set -1 canonical type 0x2bf2a5e8
arg-types tree_list 0x2beedd20
value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 
sqlite3_file
sizes-gimplified public unsigned DI
size integer_cst 0x2bdd4c30 constant 64
unit size integer_cst 0x2bdd4c48 constant 8
align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8
pointer_to_this pointer_type 0x2c5dadc8
chain tree_list 0x2beedcf8 value integer_type 0x2bde95e8 int
chain tree_list 0x2bde4640 value void_type 0x2bde9f18 
void
pointer_to_this pointer_type 0x2bf2a690
 function_type 0x2bf2adc8
type integer_type 0x2bde95e8 int sizes-gimplified public SI
size integer_cst 0x2bdd4e70 constant 32
unit size integer_cst 0x2bdd4e88 constant 4
align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 
32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 0x2bdd4e40 
2147483647 context translation_unit_decl 0x2db62a00 
/aux/hubicka/firefox/db/sqlite3/src/sqlite3.c
pointer_to_this pointer_type 0x2dc527e0
QI
size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 
bitsizetype constant 8
unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 
sizetype constant 1
align 8 symtab 0 alias set -1 canonical type 0x2bf2ae70
arg-types tree_list 0x2beedfa0
value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 
sqlite3_file
sizes-gimplified public unsigned DI
size integer_cst 0x2bdd4c30 constant 64
unit size integer_cst 0x2bdd4c48 constant 8
align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8
pointer_to_this pointer_type 0x2c5dadc8
chain tree_list 0x2beedf78 value pointer_type 0x2bdf1690
chain tree_list 0x2bde4640 value void_type 0x2bde9f18 
void
pointer_to_this pointer_type 0x2bf2af18

constants as:

 integer_cst 0x2e8101b0 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 0
 integer_cst 0x2e810600 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 0
 integer_cst 0x2e810c90 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 0
 integer_cst 0x2e810cc0 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 0
 integer_cst 0x2e810ee8 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 0
 integer_cst 0x2e810fd8 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 5242893
 integer_cst 0x2e8130a8 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 5242893
 integer_cst 0x2e814858 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 7864321
 integer_cst 0x2e814870 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 7864321
 integer_cst 0x2e810f78 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 2152726542
 integer_cst 0x2e8130c0 type enumeral_type 0x2e7e55e8 tag_nsresult 
constant 2152726542

tree lists:

 tree_list 0x2e952708
value pointer_type 0x2e954888
type record_type 0x2e9509d8 PLDHashTable sizes-gimplified type_5 
type_6 BLK
size integer_cst 0x2d5229a8 constant 384
unit size integer_cst 0x2d522960 constant 48
align 64 symtab 0 alias set -1 canonical type 0x2e950930 fields 
field_decl 0x2e955000 ops context translation_unit_decl 0x2bdec0a0 
/aux/hubicka/firefox/webapprt/gtk2/webapprt.cpp

[PATCH, DOC]: Fix for Options That Control Optimization section

2014-07-11 Thread Martin Liška

Hello,
   I fixed  Options That Control Optimization section according to 'gcc -Q 
--help=optimizers' and after consultation with Jakub, I added missing 
-foptimize-strlen option.

Ready for trunk?
Martin

ChangeLog:

2014-07-11  Martin Liska  mli...@suse.cz

* doc/invoke.texi: Added missing options to options
that control optimization. Missing -foptimize-strlen option
introduced.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index a83f6c6..8fa63ff 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -6921,25 +6921,31 @@ compilation time.
 @option{-O} turns on the following optimization flags:
 @gccoptlist{
 -fauto-inc-dec @gol
+-fbranch-count-reg @gol
+-fcombine-stack-adjustments @gol
 -fcompare-elim @gol
 -fcprop-registers @gol
 -fdce @gol
 -fdefer-pop @gol
 -fdelayed-branch @gol
 -fdse @gol
+-fforward-propagate @gol
 -fguess-branch-probability @gol
 -fif-conversion2 @gol
 -fif-conversion @gol
+-finline-functions-called-once @gol
 -fipa-pure-const @gol
 -fipa-profile @gol
 -fipa-reference @gol
--fmerge-constants
+-fmerge-constants @gol
+-fmove-loop-invariants @gol
+-fshrink-wrap @gol
 -fsplit-wide-types @gol
 -ftree-bit-ccp @gol
--ftree-builtin-call-dce @gol
 -ftree-ccp @gol
 -fssa-phiopt @gol
 -ftree-ch @gol
+-ftree-copy-prop @gol
 -ftree-copyrename @gol
 -ftree-dce @gol
 -ftree-dominator-opts @gol
@@ -6947,6 +6953,7 @@ compilation time.
 -ftree-forwprop @gol
 -ftree-fre @gol
 -ftree-phiprop @gol
+-ftree-sink @gol
 -ftree-slsr @gol
 -ftree-sra @gol
 -ftree-pta @gol
@@ -6978,19 +6985,23 @@ also turns on the following optimization flags:
 -fhoist-adjacent-loads @gol
 -finline-small-functions @gol
 -findirect-inlining @gol
+-fipa-cp @gol
 -fipa-sra @gol
 -fisolate-erroneous-paths-dereference @gol
 -foptimize-sibling-calls @gol
+-foptimize-strlen @gol
 -fpartial-inlining @gol
 -fpeephole2 @gol
--freorder-blocks  -freorder-functions @gol
+-freorder-blocks -freorder-blocks-and-partition -freorder-functions @gol
 -frerun-cse-after-loop  @gol
 -fsched-interblock  -fsched-spec @gol
 -fschedule-insns  -fschedule-insns2 @gol
 -fstrict-aliasing -fstrict-overflow @gol
+-ftree-builtin-call-dce @gol
 -ftree-switch-conversion -ftree-tail-merge @gol
 -ftree-pre @gol
--ftree-vrp}
+-ftree-vrp @gol
+-fuse-caller-save}
 
 Please note the warning under @option{-fgcse} about
 invoking @option{-O2} on programs that use computed gotos.
@@ -7000,9 +7011,10 @@ invoking @option{-O2} on programs that use computed gotos.
 Optimize yet more.  @option{-O3} turns on all optimizations specified
 by @option{-O2} and also turns on the @option{-finline-functions},
 @option{-funswitch-loops}, @option{-fpredictive-commoning},
-@option{-fgcse-after-reload}, @option{-ftree-loop-vectorize},
-@option{-ftree-slp-vectorize}, @option{-fvect-cost-model},
-@option{-ftree-partial-pre} and @option{-fipa-cp-clone} options.
+@option{-fgcse-after-reload}, @option{-ftree-loop-distribute-patterns},
+@option{-ftree-loop-vectorize}, @option{-ftree-slp-vectorize},
+@option{-fvect-cost-model}, @option{-ftree-partial-pre}
+and @option{-fipa-cp-clone} options.
 
 @item -O0
 @opindex O0
@@ -7113,6 +7125,14 @@ Optimize sibling and tail recursive calls.
 
 Enabled at levels @option{-O2}, @option{-O3}, @option{-Os}.
 
+@item -foptimize-strlen
+@opindex foptimize-strlen
+Optimize various standard C string functions (e.g. @code{strlen},
+@code{strchr} or @code{strcpy}) and
+their _FORTIFY_SOURCE counterparts into faster alternatives.
+
+Enabled at levels @option{-O2}, @option{-O3}.
+
 @item -fno-inline
 @opindex fno-inline
 Do not expand any functions inline apart from those marked with
@@ -7278,6 +7298,8 @@ register, compare it against zero, then branch based upon the result.
 This option is only meaningful on architectures that support such
 instructions, which include x86, PowerPC, IA-64 and S/390.
 
+Enabled by default at -O1 and higher.
+
 The default is @option{-fbranch-count-reg}.
 
 @item -fno-function-cse


Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Roman Gareev
 I would use -x to see the headers. This should tell you if it is defined
 or only used there.

It gives the following output:

roman@roman-System-Product-Name:~/isl-0.12.2/lib$ objdump -x
libisl.so.10.2.2 | grep isl_ast_expr_get_val
00060380 g F .text 0053
isl_ast_expr_get_val

 Sorry, but this still needs to be debugged. :-( I am too busy to do it
 myself, so i am afraid you may just need to dig into it yourself. :-(

I've found the reason of this error. Somehow cc1 used an older version
of isl provided by the package from the standard repository in spite
of manually selected paths to isl 0.12.2. I think it is better to move
on and not to concentrate on this.

 +#include graphite-htab.h

 Is this still needed?

No, this is redundant.

 +/* IVS_PARAMS_MAP maps ISL's scattering and parameter identifiers
 +   to corresponding trees.  */
 +typedef struct ivs_params {
 +  std::mapisl_id *, tree, id_comp ivs_params_map;

 What about calling it isl_id_to_tree?

Maybe it would be better to call it tree_from_isl_id (because it would
help to avoid confusion with definitions from isl)?

 +  sese region;

 Is this region needed?

No, this is redundant.

 +   We create a new if region protecting the loop to be executed, if
 +   the execution count is zero (lower bound  upper bound). In this case,
 +   it is subsequently removed by dead code elimination.  */


 The last sentence is unclear. Are you saying the loop will be removed or
 that the condition will be removed in case it is always true. Also, what
 about cases where the condition remains?

I wanted to say that the loop will be removed. Maybe it would be
better to remove the last sentence at all (because the previous
sentence already explains the motivation)?

 Those changes are unrelated and are part of the other patch, right?

Yes, these changes were made in the revision under No. 212361.

 +  ip.region = region;

 Is this needed? ip.region seems unused.

Yes, this is redundant.


I tried to incorporate all your comments in the attached patch.

--
   Cheers, Roman Gareev.
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 212361)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -25,7 +25,14 @@
 #include isl/map.h
 #include isl/union_map.h
 #include isl/ast_build.h
+#if defined(__cplusplus)
+extern C {
 #endif
+#include isl/val_gmp.h
+#if defined(__cplusplus)
+}
+#endif
+#endif
 
 #include system.h
 #include coretypes.h
@@ -42,6 +49,9 @@
 #include cfgloop.h
 #include tree-data-ref.h
 #include sese.h
+#include tree-ssa-loop-manip.h
+#include tree-scalar-evolution.h
+#include map
 
 #ifdef HAVE_cloog
 #include graphite-poly.h
@@ -52,6 +62,517 @@
 
 static bool graphite_regenerate_error;
 
+/* We always use signed 128, until isl is able to give information about
+types  */
+
+static tree *graphite_expression_size_type = int128_integer_type_node;
+
+/* Converts a GMP constant VAL to a tree and returns it.  */
+
+static tree
+gmp_cst_to_tree (tree type, mpz_t val)
+{
+  tree t = type ? type : integer_type_node;
+  mpz_t tmp;
+
+  mpz_init (tmp);
+  mpz_set (tmp, val);
+  wide_int wi = wi::from_mpz (t, tmp, true);
+  mpz_clear (tmp);
+
+  return wide_int_to_tree (t, wi);
+}
+
+/* Verifies properties that GRAPHITE should maintain during translation.  */
+
+static inline void
+graphite_verify (void)
+{
+#ifdef ENABLE_CHECKING
+  verify_loop_structure ();
+  verify_loop_closed_ssa (true);
+#endif
+}
+
+/* TREE_FROM_ISL_ID maps ISL's scattering and parameter identifiers
+   to corresponding trees.  */
+
+typedef struct ivs_params {
+  std::mapisl_id *, tree tree_from_isl_id;
+} *ivs_params_p;
+
+/* Free all memory allocated for ISL's identifiers.  */
+
+void ivs_params_clear (ivs_params_p ip)
+{
+  std::mapisl_id *, tree::iterator it;
+  for (it = ip-tree_from_isl_id.begin ();
+   it != ip-tree_from_isl_id.end (); it++)
+{
+  isl_id_free (it-first);
+}
+}
+
+static tree
+gcc_expression_from_isl_expression (tree type, __isl_take isl_ast_expr *,
+   ivs_params_p ip);
+
+/* Return the tree variable that corresponds to the given isl ast identifier
+ expression (an isl_ast_expr of type isl_ast_expr_id).  */
+
+static tree
+gcc_expression_from_isl_ast_expr_id (__isl_keep isl_ast_expr *expr_id,
+ivs_params_p ip)
+{
+  gcc_assert (isl_ast_expr_get_type (expr_id) == isl_ast_expr_id);
+  isl_id *tmp_isl_id = isl_ast_expr_get_id (expr_id);
+  std::mapisl_id *, tree::iterator res;
+  res = ip-tree_from_isl_id.find (tmp_isl_id);
+  isl_id_free (tmp_isl_id);
+  gcc_assert (res != ip-tree_from_isl_id.end () 
+  Could not map isl_id to tree expression);
+  isl_ast_expr_free (expr_id);
+  return res-second;
+}
+
+/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
+   type TYPE.  */
+
+static tree

Re: [patch 1/4] change specific int128 - generic intN

2014-07-11 Thread Eric Botcazou
 PSImode is 20 bits, fits in a 20 bit register, and uses 20 bit operations.

Then why do you need this change?

 -  TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
 +  TYPE_SIZE (type) = bitsize_int (GET_MODE_PRECISION (TYPE_MODE
 (type))); TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE
 (type))); break;

What are GET_MODE_BITSIZE and GET_MODE_PRECISION for PSImode?

 If a type is 17-20 bits, PSImode is chosen.  If it's 21 bits or
 larger, SImode is chosen.  If it's 16 or fewer bits, HImode is chosen.

Size or precision?  That's the crux of the matter.

-- 
Eric Botcazou


Fwd: Re: [PING][PATCH] Fix for PR 61561

2014-07-11 Thread Marat Zakirov

Thank to you all.

Committed revision 212450.

--Marat

 Original Message 
Subject:Re: [PING][PATCH] Fix for PR 61561
Date:   Thu, 10 Jul 2014 14:01:24 +0100
From:   Ramana Radhakrishnan ramana.radhakrish...@arm.com
To: 	Marat Zakirov m.zaki...@samsung.com, gcc-patches@gcc.gnu.org 
gcc-patches@gcc.gnu.org
CC: 	Richard Earnshaw richard.earns...@arm.com, Kyrylo Tkachov 
kyrylo.tkac...@arm.com, Slava Garbuzov v.garbu...@samsung.com, Yuri 
Gribov tetra2...@gmail.com, mara...@gmail.com mara...@gmail.com






On 30/06/14 16:21, Marat Zakirov wrote:


Thank for your attention.


This is OK for trunk - Sorry about the delayed response.

Ramana


Marat.







Re: Optimize type streaming

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

  Are we sure it ever stabilizes? But yes, I had something like this in mind 
  (just do one iteration always) in case we need to improve hashing.
 
 Hi,
 this is bit quick experiment with strengthening the hash by iteration. I don't
 seem to be able to measure WPA time difference for the patch though (but on
 positive side, it also does not make streaming out slower). I did not find
 clean way to hook things into streamer cache, so I added separate hash for
 that; I am sure you will know better ;)

Well - as we re-use the streamer cache to store the hash value it isn't
really possible to do better ... (at least I don't have a clever idea)

 I added dumping for duplicated trees withing SCC regions that shows some 
 interesting
 info.  It seems that at firefox we do have duplicate type variants, duplicated
 const decls such as:
  function_type 0x2bf2a540
 type integer_type 0x2bde95e8 int sizes-gimplified public SI
 size integer_cst 0x2bdd4e70 constant 32
 unit size integer_cst 0x2bdd4e88 constant 4
 align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 
 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 
 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 
 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c
 pointer_to_this pointer_type 0x2dc527e0
 QI
 size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 
 bitsizetype constant 8
 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 
 sizetype constant 1
 align 8 symtab 0 alias set -1 canonical type 0x2bf2a5e8
 arg-types tree_list 0x2beedd20
 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 
 sqlite3_file
 sizes-gimplified public unsigned DI
 size integer_cst 0x2bdd4c30 constant 64
 unit size integer_cst 0x2bdd4c48 constant 8
 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8
 pointer_to_this pointer_type 0x2c5dadc8
 chain tree_list 0x2beedcf8 value integer_type 0x2bde95e8 
 int
 chain tree_list 0x2bde4640 value void_type 0x2bde9f18 
 void
 pointer_to_this pointer_type 0x2bf2a690
  function_type 0x2bf2adc8
 type integer_type 0x2bde95e8 int sizes-gimplified public SI
 size integer_cst 0x2bdd4e70 constant 32
 unit size integer_cst 0x2bdd4e88 constant 4
 align 32 symtab 0 alias set 0 canonical type 0x2bde95e8 precision 
 32 min integer_cst 0x2bdd4e28 -2147483648 max integer_cst 
 0x2bdd4e40 2147483647 context translation_unit_decl 0x2db62a00 
 /aux/hubicka/firefox/db/sqlite3/src/sqlite3.c
 pointer_to_this pointer_type 0x2dc527e0
 QI
 size integer_cst 0x2bdd4d20 type integer_type 0x2bde90a8 
 bitsizetype constant 8
 unit size integer_cst 0x2bdd4d38 type integer_type 0x2bde9000 
 sizetype constant 1
 align 8 symtab 0 alias set -1 canonical type 0x2bf2ae70
 arg-types tree_list 0x2beedfa0
 value pointer_type 0x2bf1e930 type record_type 0x2bf1e5e8 
 sqlite3_file
 sizes-gimplified public unsigned DI
 size integer_cst 0x2bdd4c30 constant 64
 unit size integer_cst 0x2bdd4c48 constant 8
 align 64 symtab 0 alias set -1 canonical type 0x2bf1e9d8
 pointer_to_this pointer_type 0x2c5dadc8
 chain tree_list 0x2beedf78 value pointer_type 0x2bdf1690
 chain tree_list 0x2bde4640 value void_type 0x2bde9f18 
 void
 pointer_to_this pointer_type 0x2bf2af18
 
 constants as:
 
  integer_cst 0x2e8101b0 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 0
  integer_cst 0x2e810600 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 0
  integer_cst 0x2e810c90 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 0
  integer_cst 0x2e810cc0 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 0
  integer_cst 0x2e810ee8 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 0
  integer_cst 0x2e810fd8 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 5242893
  integer_cst 0x2e8130a8 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 5242893
  integer_cst 0x2e814858 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 7864321
  integer_cst 0x2e814870 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 7864321
  integer_cst 0x2e810f78 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 2152726542
  integer_cst 0x2e8130c0 type enumeral_type 0x2e7e55e8 tag_nsresult 
 constant 2152726542
 
 tree lists:
 
  tree_list 0x2e952708
 value pointer_type 0x2e954888
 type record_type 0x2e9509d8 PLDHashTable sizes-gimplified type_5 
 type_6 BLK
 size integer_cst 0x2d5229a8 

Re: [patch,gomp-4_0-branch] misc reduction clause bug fixes

2014-07-11 Thread Thomas Schwinge
Hi Cesar!

On Thu, 10 Jul 2014 11:43:11 -0700, Cesar Philippidis ce...@codesourcery.com 
wrote:
 This patch addresses two bugs openacc reduction clause bugs.

Thanks!  OK; one question/suggestion, though:

 --- a/gcc/omp-low.c
 +++ b/gcc/omp-low.c
 @@ -9679,11 +9679,23 @@ process_reduction_data (gimple_seq *body, gimple_seq 
 *in_stmt_seqp,
gcc_assert (is_gimple_omp_oacc_specifically (ctx-stmt));
  
gimple_stmt_iterator gsi;
 +  gimple_seq inner;
 +  gimple stmt;
 +
 +  /* A collapse clause may have inserted a new bind block.  */
 +  stmt = gimple_seq_first (*body);
 +  if (stmt  gimple_code (stmt) == GIMPLE_BIND)
 +{
 +  inner = gimple_bind_body (gimple_seq_first (*body));
 +  body = inner;
 +}
  
for (gsi = gsi_start (*body); !gsi_end_p (gsi); gsi_next (gsi))
  {
gimple stmt = gsi_stmt (gsi);

Can get rid of this shadow variable stmt?


Grüße,
 Thomas


pgpAk2WKO0Q1o.pgp
Description: PGP signature


Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Richard Biener wrote:

 On Fri, 11 Jul 2014, Jakub Jelinek wrote:
 
  On Fri, Jul 11, 2014 at 10:00:01AM +0200, Richard Biener wrote:
   I started with adding an optional arg to native_encode_expr but then
   figured that all the current workers return failure if the entire
   expr doesn't fit within 'len'.  But what I want is a native_encode_expr
  
  That is because the functions were written for VIEW_CONVERT_EXPR folding
  where it is expected that both the types are the same size, and it doesn't
  make sense to spend too much time on large stuff that doesn't fit into
  the fold_view_convert_expr's fixed size buffer.  Now, for your case
  the sizes aren't the same and thus partial data is just fine, because
  what native_interpret_expr will do is small fixed size.
  So I don't see a problem with not returning NULL and instead handling the
  partial store into buffer if you ask for it through optional argument
  (doing it always would be fine for fold_view_convert_expr assuming nobody
  feeds us different sized argument vs. VCE type, but e.g.
  tree-loop-distribution.c certainly appreciates if we return NULL for large
  things, because if it had just partial buffer, it would not be able to
  verify what it is looking for).
 
 Yeah, if you pass 'offset' then 'len' suddenly changes semantics
 from specifying the buffer size to specifying the desired amount
 of encoded data ... (which the present interface computes for you,
 the caller doesn't even need to know how large it is).
 
 The question is also if I ask native_encode_expr to produce
 data for [offset, offset + len] but the object only contains
 part of it - thus the tail is 'undefined' - do we want to fail
 or simply feed (undefined) garbage to the following
 native_interpret_expr?  (yes, native_encode_expr will of course
 return sth  len in this case)

Like the following (completely untested - lacks handling of
len  total_bytes - off as I just figured out).  If off is -1
then it should behave as the current interface, else we encode
the part [off, off + MIN (len, total_bytes)] and return
MIN (len, total_bytes).  Ah, a guard against off = total_bytes
is also missing.

I wonder if it would be ok to start with gcc_assert (off == -1)
in all but native_encode_string.

Does the interface look sane?

Thanks,
Richard.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 212449)
+++ gcc/fold-const.c(working copy)
@@ -7239,15 +7239,17 @@ fold_plusminus_mult_expr (location_t loc
upon failure.  */
 
 static int
-native_encode_int (const_tree expr, unsigned char *ptr, int len)
+native_encode_int (const_tree expr, unsigned char *ptr, int len, int off)
 {
   tree type = TREE_TYPE (expr);
   int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
   int byte, offset, word, words;
   unsigned char value;
 
-  if (total_bytes  len)
+  if (off == -1  total_bytes  len)
 return 0;
+  if (off == -1)
+off = 0;
   words = total_bytes / UNITS_PER_WORD;
 
   for (byte = 0; byte  total_bytes; byte++)
@@ -7270,9 +7272,10 @@ native_encode_int (const_tree expr, unsi
}
   else
offset = BYTES_BIG_ENDIAN ? (total_bytes - 1) - byte : byte;
-  ptr[offset] = value;
+  if (offset = off)
+   ptr[offset - off] = value;
 }
-  return total_bytes;
+  return total_bytes - off;
 }
 
 
@@ -7282,7 +7285,7 @@ native_encode_int (const_tree expr, unsi
upon failure.  */
 
 static int
-native_encode_fixed (const_tree expr, unsigned char *ptr, int len)
+native_encode_fixed (const_tree expr, unsigned char *ptr, int len, int off)
 {
   tree type = TREE_TYPE (expr);
   enum machine_mode mode = TYPE_MODE (type);
@@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un
   FIXED_VALUE_TYPE value;
   tree i_value, i_type;
 
-  if (total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
+  if (off == -1
+   total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
 return 0;
 
   i_type = lang_hooks.types.type_for_size (GET_MODE_BITSIZE (mode), 1);
@@ -7302,7 +7306,7 @@ native_encode_fixed (const_tree expr, un
   value = TREE_FIXED_CST (expr);
   i_value = double_int_to_tree (i_type, value.data);
 
-  return native_encode_int (i_value, ptr, len);
+  return native_encode_int (i_value, ptr, len, off);
 }
 
 
@@ -7312,7 +7316,7 @@ native_encode_fixed (const_tree expr, un
upon failure.  */
 
 static int
-native_encode_real (const_tree expr, unsigned char *ptr, int len)
+native_encode_real (const_tree expr, unsigned char *ptr, int len, int off)
 {
   tree type = TREE_TYPE (expr);
   int total_bytes = GET_MODE_SIZE (TYPE_MODE (type));
@@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns
  up to 192 bits.  */
   long tmp[6];
 
-  if (total_bytes  len)
+  if (off == -1
+   total_bytes  len)
 return 0;
+  if (off == -1)
+off = 0;
   words = (32 / BITS_PER_UNIT) / UNITS_PER_WORD;
 
   real_to_target (tmp, TREE_REAL_CST_PTR (expr), 

Re: [PATCH] Generate more efficient memory barriers for LEON3

2014-07-11 Thread Eric Botcazou
 The full barrier pattern membar_leon3 also gets generated so I think
 that one should be kept also.

Do you have a testcase?  membar is generated by sparc_emit_membar_for_model 
and, for the TSO model of LEON3, implied = StoreStore | LoadLoad | LoadStore 
so mm can only be StoreLoad, which means that membar_storeload will match so 
the full barrier never will.

-- 
Eric Botcazou


Use separate sections to stream non-trivial constructors

2014-07-11 Thread Jan Hubicka
Hi,
since we both agreed offlining constructors from global decl stream is a good
idea, I went ahead and implemented it.  I would like to followup by an
cleanups; for example the sections are still tagged as function sections, but I
would like to do it incrementally. There is quite some uglyness in the way we
handle function sections and the patch started to snowball very quickly.

The patch conceptually copies what we do for functions and re-uses most of
infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
getting function in) and it is used by output machinery, by ipa-visibility
while rewritting the constructor and by ctor_for_folding (which makes us to
load the ctor whenever it is needed by ipa-cp or ipa-devirt).

I kept get_symbol_initial_value as an authority to decide if we want to encode
given constructor or not.  The section itself for trivial ctor is about 25
bytes and with header it is probably close to double of it. Currently the 
heuristic
is to offline only constructors that are CONSTRUCTOR and keep simple expressions
inline.  We may want to tweak it.

The patch does not bring miraculous savings to firefox WPA, but it does some:

GGC memory after global stream is read goes from 1376898k to 1250533k
overall GGC allocations from 4156478 kB to 4012462 kB
read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average 
size 2.037867
20997206 tree bodies read in total - 18584194 tree bodies read in total
Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section decls: 
271557265 bytes
Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd section 
function_body: 7548680 bytes 

Things would be better if ipa-visibility and ipa-devirt did not load most of
the virtual tables into memory (still better than loading each into memory 20
times at average).  I will work on that incrementally. We load 10311 ctors into
memory at WPA time.

Note that firefox seems to feature really huge data segment these days.
http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html

Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap in 
progress, OK?

* vapool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h
(varpool_get_constructor): New function.
(ctor_for_folding): Use it.
(varpool_assemble_decl): Likewise.
* lto-streamer.h (struct output_block): Turn cgraph_node
to symbol filed.
(lto_input_variable_constructor): Declare.
* ipa-visibility.c (function_and_variable_visibility): Use
varpool_get_constructor.
* cgraph.h (varpool_get_constructor): Declare.
* lto-streamer-out.c (get_symbol_initial_value): Take encoder
parameter; return error_mark_node for non-trivial constructors.
(lto_write_tree_1, DFS_write_tree): UPdate use of
get_symbol_initial_value.
(output_function): Update initialization of symbol.
(output_constructor): New function.
(copy_function): Rename to ..
(copy_function_or_variable): ... this one; handle vars too.
(lto_output): Output variable sections.
* lto-streamer-in.c (input_constructor): New function.
(lto_read_body): Rename from ...
(lto_read_body_or_constructor): ... this one; handle vars
too.
(lto_input_variable_constructor): New function.
* ipa-prop.c (ipa_prop_write_jump_functions,
ipa_prop_write_all_agg_replacement): Update.
Index: varpool.c
===
--- varpool.c   (revision 212426)
+++ varpool.c   (working copy)
@@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
 #include gimple-expr.h
 #include flags.h
 #include pointer-set.h
+#include tree-ssa-alias.h
+#include gimple.h
+#include lto-streamer.h
 
 const char * const tls_model_names[]={none, tls-emulated, tls-real,
  tls-global-dynamic, tls-local-dynamic,
@@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
 return NULL;
 }
 
+/* When doing LTO, read NODE's constructor from disk if it is not already 
present.  */
+
+tree
+varpool_get_constructor (struct varpool_node *node)
+{
+  struct lto_file_decl_data *file_data;
+  const char *data, *name;
+  size_t len;
+  tree decl = node-decl;
+
+  if (DECL_INITIAL (node-decl) != error_mark_node
+  || !in_lto_p)
+return DECL_INITIAL (node-decl);
+
+  file_data = node-lto_file_data;
+  name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
+
+  /* We may have renamed the declaration, e.g., a static function.  */
+  name = lto_get_decl_name_mapping (file_data, name);
+
+  data = lto_get_section_data (file_data, LTO_section_function_body,
+  name, len);
+  if (!data)
+fatal_error (%s: section %s is missing,
+file_data-file_name,
+name);
+
+  lto_input_variable_constructor (file_data, node, data);
+  

Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Tobias Grosser

I tried to incorporate all your comments in the attached patch.


Nice. It looks now very good to me. One final and last detail:


+/* TREE_FROM_ISL_ID maps ISL's scattering and parameter identifiers
+   to corresponding trees.  */
+
+typedef struct ivs_params {
+  std::mapisl_id *, tree tree_from_isl_id;
+} *ivs_params_p;

The struct now contains only a single element such that there seems to 
be no need for it anymore. Should we remove it? (We could still use a 
typedef if you believe the datatype is too long).


Cheers,
Tobias


Re: [patch,gomp-4_0-branch] openacc collapse clause

2014-07-11 Thread Thomas Schwinge
Hi Cesar!

On Thu, 10 Jul 2014 11:47:42 -0700, Cesar Philippidis ce...@codesourcery.com 
wrote:
 These patch enables the collapse clause with a value greater than one.

Thanks!

 Is this patch OK for gomp-4_0-branch?

OK with the following addressed:

 --- a/gcc/c/c-parser.c
 +++ b/gcc/c/c-parser.c
 @@ -11260,6 +11260,10 @@ c_parser_oacc_all_clauses (c_parser *parser, 
 omp_clause_mask mask,
  
switch (c_kind)
   {
 + case PRAGMA_OMP_CLAUSE_COLLAPSE:
 +   clauses = c_parser_omp_clause_collapse (parser, clauses);
 +   c_name = collapse;
 +   break;

Annotate c_parser_omp_clause_collapse that it is now used for OpenACC,
too.


  #define OACC_LOOP_CLAUSE_MASK
 \
 - (OMP_CLAUSE_MASK_1  PRAGMA_OMP_CLAUSE_REDUCTION)
 + ( (OMP_CLAUSE_MASK_1  PRAGMA_OMP_CLAUSE_COLLAPSE) \
 + | (OMP_CLAUSE_MASK_1  PRAGMA_OMP_CLAUSE_REDUCTION))

Given the syntax generally used, add a space between the two closing
parens.


Even though we're re-using OpenMP's tested implementation, please add
some basic front end test coverage for OpenACC: reject invalid, accept
valid syntax.


Grüße,
 Thomas


pgpbYNzXe3f6Q.pgp
Description: PGP signature


Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
 
 Well - as we re-use the streamer cache to store the hash value it isn't
 really possible to do better ... (at least I don't have a clever idea)

OK, no cleverness on my side either.
 
 Yeah (though you wouldn't need the extra hashing - we only need to know
 the hash of the SCC).  The current iterating to iterate until no
 duplicate hashes are found is bogous IMHO - better do the right thing
 from the start or only iterate until at least _one_ hash is unique
 (you get rid of the scc_entry_len  1 case which can be quite expensive).

agreed. I just got this implemented before I got into enough of understading
of the lto/lto.c side of the machinery - I assumed that identical nodes
are tested deeply but that would of course be expensive

 Of course the question is whether we can always guarantee this property
 ... if we can then the offline compare can be expanded to all trees
 (it requires a stable sort of the SCCs independent on SCC entry).

I do not think we need stable sorting here.  This is what I think extra
DFS walk should ensure - as soon as we have unique entry (by identifying first
node w/o conflict), the DFS walks in all units will give same order.

I do not think we can 100% guarantee it, but we can get very close to 100%.
If we won't find unique entry, I suppose we can stick with current scheme or
just drop the complicated path with multiple entries and do not merge in this
case (or just merge when the entry happens to match by accident).
 
  This is because those otherwise equivalent trees are not really 
  equivalent from merging perspective. You test strict pointer equivalence 
  outside SCC regions, so it matters whether two same trees are referred 
  or two equivalent copies.
 
 Hmm?  So you say you have two identical trees in the _same_ SCC?
 How can that be?  Answer: not possible - they at least differ
 in the edges forming the SCC (unless we have a cyclic list but
 I don't think we (should) have that).

We hash only on outgoing SCC edges.  You can easily have
main variant type T and variants T1,T2 that are same all used by type
T again.

T1 and T2 differs by references in T, but they will always have same
hash and in theory they can be merged if your code worked recursively
on nested SCC regions (as given by Tarjan's algorithm).  Because you work on
outermost SCC regions only, we never merge T1 and T2.

The congruence you compute is not best possible, but best possible assuming no
duplicates within compilation units.  I think this is sane, we should look
towards eliminating such duplicates rather than working harder in the streamer.
 
 Btw, if you re-hash only nodes with duplicate hashes might there be
 a higher probability of ending up with a hash that collides with
 one that was not a duplicate one?  That is, why not simply re-hash
 all nodes?  (premature optimization...?)
 
 So - can you can iterate on this re-doing the DFS walk with the
 found entry node?

Yep, that is the plan. Iterate long enough until I get unqiue hash.  Can I
easilly re-do the DFS walk?

Honza
 
 Thanks,
 Richard.
 
  Honza
  
  * lto-streamer-out.c (hash_tree): Add map argument;
  remove special purpose hack for pointer types.
  (visit): Update it.
  (hash_scc): Add iteration until equivalence classes stabilize.
  Index: lto-streamer-out.c
  ===
  --- lto-streamer-out.c  (revision 212426)
  +++ lto-streamer-out.c  (working copy)
  @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
   /* Return a hash value for the tree T.  */
   
   static hashval_t
  -hash_tree (struct streamer_tree_cache_d *cache, tree t)
  +hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, hashval_t 
  *map, tree t)
   {
   #define visit(SIBLING) \
 do { \
   unsigned ix; \
  -if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, ix)) \
  +if (!SIBLING) \
  +  v = iterative_hash_hashval_t (0, v); \
  +else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) \
 v = iterative_hash_hashval_t (streamer_tree_cache_get_hash (cache, 
  ix), v); \
  +else if (map) \
  +  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
  +else \
  +  v = iterative_hash_hashval_t (1, v); \
 } while (0)
   
 /* Hash TS_BASE.  */
  @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
   
 if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
   {
  -  if (POINTER_TYPE_P (t))
  -   {
  - /* For pointers factor in the pointed-to type recursively as
  -we cannot recurse through only pointers.
  -???  We can generalize this by keeping track of the
  -in-SCC edges for each tree (or arbitrarily the first
  -such edge) and hashing that in in a second stage
  -(instead of the quadratic mixing of the SCC we do now).  */
  - hashval_t x;
  - unsigned ix;
  - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), ix))
  -   x = 

Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Jakub Jelinek
On Fri, Jul 11, 2014 at 11:13:12AM +0200, Richard Biener wrote:
 +  if (offset = off)
 +   ptr[offset - off] = value;

For original off != 0, you aren't checking whether offset - off  len
though (various places), you don't want to write beyond end of buffer.

 -  return total_bytes;
 +  return total_bytes - off;

What will happen with originally off is bigger than total_bytes?
Returning negative value sounds wrong, IMHO you should in that case return
early 0.  So like:
if ((off == -1  total_bytes  len) || off = total_bytes)
  return 0;

 @@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un
FIXED_VALUE_TYPE value;
tree i_value, i_type;
  
 -  if (total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
 +  if (off == -1
 +   total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
  return 0;

This isn't comparing total_bytes to len, so IMHO shouldn't be changed.

 @@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns
   up to 192 bits.  */
long tmp[6];
  
 -  if (total_bytes  len)
 +  if (off == -1
 +   total_bytes  len)

This can fit onto one line.

 -  rsize = native_encode_expr (part, ptr, len);
 +  rsize = native_encode_expr (part, ptr, len, off);
if (rsize == 0)
  return 0;

If off is not -1 and len is too short, the above will do a partial
store.  But:
a) if it is a partial store, because some bytes didn't fit, then the
second native_encode_expr should probably not be invoked
b) what about the case when the first one returns 0 because you are asking
for few bytes from the imag part?

part = TREE_IMAGPART (expr);
 -  isize = native_encode_expr (part, ptr+rsize, len-rsize);
 -  if (isize != rsize)
 +  if (off != -1)
 +off = MAX (0, off - rsize);
 +  isize = native_encode_expr (part, ptr+rsize, len-rsize, off);
 +  if (off == -1
 +   isize != rsize)
  return 0;
return rsize + isize;
  }

 @@ -7396,9 +7408,13 @@ native_encode_vector (const_tree expr, u
for (i = 0; i  count; i++)
  {
elem = VECTOR_CST_ELT (expr, i);
 -  if (native_encode_expr (elem, ptr+offset, len-offset) != size)
 +  int res = native_encode_expr (elem, ptr+offset, len-offset, off);
 +  if (off == -1
 +res != size)
   return 0;

I don't think this will work correctly if off is not -1.

if (TREE_STRING_LENGTH (expr)  total_bytes)

No verification that you are not accessing beyond end of string here.
  {
 -  memcpy (ptr, TREE_STRING_POINTER (expr), TREE_STRING_LENGTH (expr));
 -  memset (ptr + TREE_STRING_LENGTH (expr), 0,
 -   total_bytes - TREE_STRING_LENGTH (expr));
 +  memcpy (ptr, TREE_STRING_POINTER (expr) + off,
 +   TREE_STRING_LENGTH (expr) - off);
 +  memset (ptr + TREE_STRING_LENGTH (expr) - off, 0,
 +   MIN (total_bytes, len) - TREE_STRING_LENGTH (expr) + off);
  }
else
 -memcpy (ptr, TREE_STRING_POINTER (expr), total_bytes);
 -  return total_bytes;
 +memcpy (ptr, TREE_STRING_POINTER (expr) + off,
 + MIN (total_bytes, len));
 +  return MIN (total_bytes - off, len);
  }

Jakub


Re: Optimize type streaming

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

  
  Well - as we re-use the streamer cache to store the hash value it isn't
  really possible to do better ... (at least I don't have a clever idea)
 
 OK, no cleverness on my side either.
  
  Yeah (though you wouldn't need the extra hashing - we only need to know
  the hash of the SCC).  The current iterating to iterate until no
  duplicate hashes are found is bogous IMHO - better do the right thing
  from the start or only iterate until at least _one_ hash is unique
  (you get rid of the scc_entry_len  1 case which can be quite expensive).
 
 agreed. I just got this implemented before I got into enough of understading
 of the lto/lto.c side of the machinery - I assumed that identical nodes
 are tested deeply but that would of course be expensive
 
  Of course the question is whether we can always guarantee this property
  ... if we can then the offline compare can be expanded to all trees
  (it requires a stable sort of the SCCs independent on SCC entry).
 
 I do not think we need stable sorting here.  This is what I think extra
 DFS walk should ensure - as soon as we have unique entry (by identifying first
 node w/o conflict), the DFS walks in all units will give same order.

Yes, that's what I call stable sorting of the SCC.

 I do not think we can 100% guarantee it, but we can get very close to 100%.
 If we won't find unique entry, I suppose we can stick with current scheme or
 just drop the complicated path with multiple entries and do not merge in this
 case (or just merge when the entry happens to match by accident).

Yeah, I suppose we want some statistics here (how many times we have
to iterate - I've once placed an assert to see wheter we have
SCCs with non-unique entry and there was none during LTO bootstrap
at least, with no iteration).

  
   This is because those otherwise equivalent trees are not really 
   equivalent from merging perspective. You test strict pointer equivalence 
   outside SCC regions, so it matters whether two same trees are referred 
   or two equivalent copies.
  
  Hmm?  So you say you have two identical trees in the _same_ SCC?
  How can that be?  Answer: not possible - they at least differ
  in the edges forming the SCC (unless we have a cyclic list but
  I don't think we (should) have that).
 
 We hash only on outgoing SCC edges.  You can easily have
 main variant type T and variants T1,T2 that are same all used by type
 T again.

Ok, but then the two variants shouldn't be in the same SCC or at
least not in the same SCC as the main variant.

 T1 and T2 differs by references in T, but they will always have same
 hash and in theory they can be merged if your code worked recursively
 on nested SCC regions (as given by Tarjan's algorithm).  Because you work on
 outermost SCC regions only, we never merge T1 and T2.

I wonder what ties them into a SCC though.

 The congruence you compute is not best possible, but best possible 
 assuming no duplicates within compilation units.  I think this is sane, 
 we should look towards eliminating such duplicates rather than working 
 harder in the streamer.

Yeah.

  
  Btw, if you re-hash only nodes with duplicate hashes might there be
  a higher probability of ending up with a hash that collides with
  one that was not a duplicate one?  That is, why not simply re-hash
  all nodes?  (premature optimization...?)
  
  So - can you can iterate on this re-doing the DFS walk with the
  found entry node?
 
 Yep, that is the plan. Iterate long enough until I get unqiue hash.  Can I
 easilly re-do the DFS walk?

Not that easily...  you have to refactor things a bit I fear ;)
DFS_write_tree is the main worker and the init is embedded
in lto_output_tree.  The main issue is that you can end up
with multiple SCCs from the start of the DFS walk.  So you'd
need to first collect all SCCs (but not write anything yet)
and then factor out a worker that processes them - you can
then iterate the DFS walk on the SCCs.

Quite some work eventually.  Not sure ;)

Richard.

 Honza
  
  Thanks,
  Richard.
  
   Honza
   
 * lto-streamer-out.c (hash_tree): Add map argument;
 remove special purpose hack for pointer types.
 (visit): Update it.
 (hash_scc): Add iteration until equivalence classes stabilize.
   Index: lto-streamer-out.c
   ===
   --- lto-streamer-out.c(revision 212426)
   +++ lto-streamer-out.c(working copy)
   @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
/* Return a hash value for the tree T.  */

static hashval_t
   -hash_tree (struct streamer_tree_cache_d *cache, tree t)
   +hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
   hashval_t *map, tree t)
{
#define visit(SIBLING) \
  do { \
unsigned ix; \
   -if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, ix)) \
   +if (!SIBLING) \
   +  v = iterative_hash_hashval_t (0, v); \
   +

Re: [RFC, PATCH 1/n] IPA C++ refactoring

2014-07-11 Thread Jan Hubicka
 Hi,
this first patch continues with rafactoring of IPA infrastructure so that 
 we will have C++ API. In the patch, I transformed many global functions to 
 members of symtab_node and cgraph_node.
 
 Example:
 cgraph_remove_node (struct cgraph_node *node) - cgraph_node::remove (void)
 symtab_unregister_node (symtab_node *node) - symtab_node::unregister (void)
 
 The patch is being consulted with Honza and will iterate. We want to inform 
 folk that we plan to do following changes.
 
 After the patch is applied, I would like to transform varpool_node and 
 cgraph_edge in the following patch.
 
 Thank you for your comments,
 Martin

 
 /* Remove the node from cgraph.  */

Perhaps Remove function from symbol table.
(similarly for varpool, perhaps few other block comments needs revisiting.
We may do that incrementally.)
 
+  /* Add node into symbol table.  This function is not used directly, but via
+ cgraph/varpool node creation routines.  */
+  void register_symbol (void);
+
+  /* Remove symtab node from the symbol table.  */
+  void remove (void);
+
+  /* Dump symtab node to F.  */
+  void dump (FILE *f);
+
+  /* Dump symtab node to stderr.  */
+  void DEBUG_FUNCTION debug (void);
+
+  /* Verify consistency of node.  */
+  void DEBUG_FUNCTION verify (void);
+
+  /* Return ipa reference from this symtab_node to
+ REFERED_NODE or REFERED_VARPOOL_NODE. USE_TYPE specify type
+ of the use and STMT the statement (if it exists).  */
+  struct ipa_ref *add_reference (symtab_node *referred_node,
+   enum ipa_ref_use use_type);
+
+  /* Return ipa reference from this symtab_node to
+ REFERED_NODE or REFERED_VARPOOL_NODE. USE_TYPE specify type
+ of the use and STMT the statement (if it exists).  */
+  struct ipa_ref *add_reference (symtab_node *referred_node,
+enum ipa_ref_use use_type, gimple stmt);
+
+  /* If VAL is a reference to a function or a variable, add a reference from
+ this symtab_node to the corresponding symbol table node.  USE_TYPE specify
+ type of the use and STMT the statement (if it exists).  Return the new
+ reference or NULL if none was created.  */
+  struct ipa_ref *maybe_add_reference (tree val, enum ipa_ref_use use_type,
+  gimple stmt);
+
+  /* Clone all references from symtab NODE to this symtab_node.  */
+  void clone_references (symtab_node *node);
+
+  /* Remove all stmt references in non-speculative references.
+ Those are not maintained during inlining  clonning.
+ The exception are speculative references that are updated along
+ with callgraph edges associated with them.  */
+  void clone_referring (symtab_node *node);
+
+  /* Clone reference REF to this symtab_node and set its stmt to STMT.  */
+  struct ipa_ref *clone_reference (struct ipa_ref *ref, gimple stmt);
+
+  /* Find the structure describing a reference to REFERRED_NODE
+ and associated with statement STMT.  */
+  struct ipa_ref *find_reference (symtab_node *, gimple, unsigned int);
+
+  /* Remove all references that are associated with statement STMT.  */
+  void remove_stmt_references (gimple stmt);
+
+  /* Remove all stmt references in non-speculative references.
+ Those are not maintained during inlining  clonning.
+ The exception are speculative references that are updated along
+ with callgraph edges associated with them.  */
+  void clear_stmts_in_references (void);
+
+  /* Remove all references in ref list.  */
+  void remove_all_references (void);
+
+  /* Remove all referring items in ref list.  */
+  void remove_all_referring (void);
+
+  /* Dump references in ref list to FILE.  */
+  void dump_references (FILE *file);
+
+  /* Dump referring in list to FILE.  */
+  void dump_referring (FILE *);
+
+  /* Return true if symtab node and TARGET represents
+ semantically equivalent symbols.  */
+  bool semantically_equivalent_p (symtab_node *target);
+
+  /* Classify symbol symtab node for partitioning.  */
+  enum symbol_partitioning_class get_partitioning_class (void);
+
+  /* Return comdat group.  */
+  tree get_comdat_group ()
+{
+  return x_comdat_group;
+}
+
+  /* Return comdat group as identifier_node.  */
+  tree get_comdat_group_id ()
+{
+  if (x_comdat_group  TREE_CODE (x_comdat_group) != IDENTIFIER_NODE)
+   x_comdat_group = DECL_ASSEMBLER_NAME (x_comdat_group);
+  return x_comdat_group;
+}
+
+  /* Set comdat group.  */
+  void set_comdat_group (tree group)
+{
+  gcc_checking_assert (!group || TREE_CODE (group) == IDENTIFIER_NODE
+  || DECL_P (group));
+  x_comdat_group = group;
+}
+
+  /* Return section as string.  */
+  const char * get_section ()
+{
+  if (!x_section)
+   return NULL;
+  return x_section-name;
+}
+
+  /* Remove node from same comdat group.   */
+  void remove_from_same_comdat_group (void);
+
+  /* Add this symtab_node to the same comdat 

Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
  
  We hash only on outgoing SCC edges.  You can easily have
  main variant type T and variants T1,T2 that are same all used by type
  T again.
 
 Ok, but then the two variants shouldn't be in the same SCC or at
 least not in the same SCC as the main variant.

They will because T refers them (since it contains i.e. pointers to itself)
 
  T1 and T2 differs by references in T, but they will always have same
  hash and in theory they can be merged if your code worked recursively
  on nested SCC regions (as given by Tarjan's algorithm).  Because you work on
  outermost SCC regions only, we never merge T1 and T2.
 
 I wonder what ties them into a SCC though.
 
  The congruence you compute is not best possible, but best possible 
  assuming no duplicates within compilation units.  I think this is sane, 
  we should look towards eliminating such duplicates rather than working 
  harder in the streamer.
 
 Yeah.
 
   
   Btw, if you re-hash only nodes with duplicate hashes might there be
   a higher probability of ending up with a hash that collides with
   one that was not a duplicate one?  That is, why not simply re-hash
   all nodes?  (premature optimization...?)
   
   So - can you can iterate on this re-doing the DFS walk with the
   found entry node?
  
  Yep, that is the plan. Iterate long enough until I get unqiue hash.  Can I
  easilly re-do the DFS walk?
 
 Not that easily...  you have to refactor things a bit I fear ;)
 DFS_write_tree is the main worker and the init is embedded
 in lto_output_tree.  The main issue is that you can end up
 with multiple SCCs from the start of the DFS walk.  So you'd
 need to first collect all SCCs (but not write anything yet)
 and then factor out a worker that processes them - you can
 then iterate the DFS walk on the SCCs.
 
 Quite some work eventually.  Not sure ;)

Hmm, walking everything first and then starting streaming sounds bad idea
for locality.  Can't I just re-do the walk since I know what the SCC is?
I will read the code more after lunch.

Honza
 
 Richard.
 
  Honza
   
   Thanks,
   Richard.
   
Honza

* lto-streamer-out.c (hash_tree): Add map argument;
remove special purpose hack for pointer types.
(visit): Update it.
(hash_scc): Add iteration until equivalence classes stabilize.
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 212426)
+++ lto-streamer-out.c  (working copy)
@@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
 /* Return a hash value for the tree T.  */
 
 static hashval_t
-hash_tree (struct streamer_tree_cache_d *cache, tree t)
+hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
hashval_t *map, tree t)
 {
 #define visit(SIBLING) \
   do { \
 unsigned ix; \
-if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, ix)) \
+if (!SIBLING) \
+  v = iterative_hash_hashval_t (0, v); \
+else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) \
   v = iterative_hash_hashval_t (streamer_tree_cache_get_hash 
(cache, ix), v); \
+else if (map) \
+  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
+else \
+  v = iterative_hash_hashval_t (1, v); \
   } while (0)
 
   /* Hash TS_BASE.  */
@@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
 
   if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
 {
-  if (POINTER_TYPE_P (t))
-   {
- /* For pointers factor in the pointed-to type recursively as
-we cannot recurse through only pointers.
-???  We can generalize this by keeping track of the
-in-SCC edges for each tree (or arbitrarily the first
-such edge) and hashing that in in a second stage
-(instead of the quadratic mixing of the SCC we do now).  */
- hashval_t x;
- unsigned ix;
- if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), ix))
-   x = streamer_tree_cache_get_hash (cache, ix);
- else
-   x = hash_tree (cache, TREE_TYPE (t));
- v = iterative_hash_hashval_t (x, v);
-   }
-  else if (code != IDENTIFIER_NODE)
+  if (code != IDENTIFIER_NODE)
visit (TREE_TYPE (t));
 }
 
@@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons
 static hashval_t
 hash_scc (struct streamer_tree_cache_d *cache, unsigned first, 
unsigned size)
 {
+  unsigned int last_classes = 0, iterations = 0;
+
   /* Compute hash values for the SCC members.  */
   for (unsigned i = 0; i  size; ++i)
-sccstack[first+i].hash = hash_tree (cache, sccstack[first+i].t);
+sccstack[first+i].hash = hash_tree (cache, NULL, 
sccstack[first+i].t);
 
   if (size 

Re: [PATCH] Fix PR61762, reads from string constants

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jakub Jelinek wrote:

 On Fri, Jul 11, 2014 at 11:13:12AM +0200, Richard Biener wrote:
  +  if (offset = off)
  +   ptr[offset - off] = value;
 
 For original off != 0, you aren't checking whether offset - off  len
 though (various places), you don't want to write beyond end of buffer.

Indeed.  Fixed.

  -  return total_bytes;
  +  return total_bytes - off;
 
 What will happen with originally off is bigger than total_bytes?
 Returning negative value sounds wrong, IMHO you should in that case return
 early 0.  So like:
 if ((off == -1  total_bytes  len) || off = total_bytes)
   return 0;

Fixed.

  @@ -7290,7 +7293,8 @@ native_encode_fixed (const_tree expr, un
 FIXED_VALUE_TYPE value;
 tree i_value, i_type;
   
  -  if (total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
  +  if (off == -1
  +   total_bytes * BITS_PER_UNIT  HOST_BITS_PER_DOUBLE_INT)
   return 0;
 
 This isn't comparing total_bytes to len, so IMHO shouldn't be changed.

Oh, indeed.  (reading that and native_encode_int I think we want
a native_encode_wide_int and avoid creating an integer cst for
encoding fixed_csts)

  @@ -7324,8 +7328,11 @@ native_encode_real (const_tree expr, uns
up to 192 bits.  */
 long tmp[6];
   
  -  if (total_bytes  len)
  +  if (off == -1
  +   total_bytes  len)
 
 This can fit onto one line.
 
  -  rsize = native_encode_expr (part, ptr, len);
  +  rsize = native_encode_expr (part, ptr, len, off);
 if (rsize == 0)
   return 0;
 
 If off is not -1 and len is too short, the above will do a partial
 store.  But:
 a) if it is a partial store, because some bytes didn't fit, then the
 second native_encode_expr should probably not be invoked

len will get negative so that might just be handled fine (fingers
crossing).  I suppose I need to come up with a testcase that ends
up native-encoding a vector partially (SCCVN also benefits from
this work).

 b) what about the case when the first one returns 0 because you are asking
 for few bytes from the imag part?

Yeah.  Sth like

 part = TREE_IMAGPART (expr);
  -  isize = native_encode_expr (part, ptr+rsize, len-rsize);
  -  if (isize != rsize)
  +  if (off != -1)
  +off = MAX (0, off - rsize);

  if (off != -1)
off = MAX (0, off - GET_MODE_SIZE (TYPE_MODE (TREE_TYPE (part;

instead.  Same mistake in vector encoding - we need to adjust
off by the size of the sub-object not by what was encoded.

  +  isize = native_encode_expr (part, ptr+rsize, len-rsize, off);
  +  if (off == -1
  +   isize != rsize)
   return 0;
 return rsize + isize;
   }
 
  @@ -7396,9 +7408,13 @@ native_encode_vector (const_tree expr, u
 for (i = 0; i  count; i++)
   {
 elem = VECTOR_CST_ELT (expr, i);
  -  if (native_encode_expr (elem, ptr+offset, len-offset) != size)
  +  int res = native_encode_expr (elem, ptr+offset, len-offset, off);
  +  if (off == -1
  +  res != size)
  return 0;
 
 I don't think this will work correctly if off is not -1.

I have

  elem = VECTOR_CST_ELT (expr, i);
  int res = native_encode_expr (elem, ptr+offset, len-offset, off);
  if (off == -1
   res != size)
return 0;
  offset += res;
  if (off != -1)
off = MAX (0, off - size);

now, so 'offset' tracks the number of encoded bytes and off is
decremented properly.  Now it will of course break if we fail
to encode one vector element but not another.  But I hate
complicating things here ... (maybe guard all recursive calls
with off  size and check for a 0 return).  Like

  for (i = 0; i  count; i++)
{
  if (off = size)
{
  off -= size;
  continue;
}
  elem = VECTOR_CST_ELT (expr, i);
  int res = native_encode_expr (elem, ptr+offset, len-offset, off);
  if ((off == -1  res != size)
  || res == 0)
return 0;
  offset += res;
  if (off != -1)
off = 0;

?

 if (TREE_STRING_LENGTH (expr)  total_bytes)
 
 No verification that you are not accessing beyond end of string here.

Changed to

  if (TREE_STRING_LENGTH (expr) - off  MIN (total_bytes, len))
{
  int written = 0;
  if (off  TREE_STRING_LENGTH (expr))
{
  written = MIN (len, TREE_STRING_LENGTH (expr) - off);
  memcpy (ptr, TREE_STRING_POINTER (expr) + off, written);
}
  memset (ptr + written, 0,
  MIN (total_bytes - written, len - written));
}
  else
memcpy (ptr, TREE_STRING_POINTER (expr) + off, MIN (total_bytes, 
len));

ugh.

Now I have to think about getting at least some testing coverage ...

Richard.

Index: gcc/fold-const.c
===
*** gcc/fold-const.c.orig   2014-07-11 11:13:22.380607780 +0200
--- gcc/fold-const.c2014-07-11 12:05:02.352394351 +0200
*** fold_plusminus_mult_expr (location_t loc
*** 7239,7253 
 upon failure.  */
  
  static int
! 

Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-07-11 Thread Thomas Schwinge
Hi!

On Thu, 22 May 2014 11:31:25 +0400, Ilmir Usmanov i.usma...@samsung.com wrote:
 On 16.05.2014 19:44, Ilmir Usmanov wrote:
  On 16.05.2014 19:12, Thomas Schwinge wrote:
  You recently indicated that you have already begun implementing OpenACC
  subarray specifications in the GCC Fortran front end, but have not
  been/are not currently able to complete that.  Would you be willing to
  share your WIP patch with Cesar, who is now working on this, so that he
  doesn't have to duplicate your work?
  Sure! I'm glad to know that my work won't go directly to trash.

:-)

  You can find the patch in attachment.
 
  I started to implement sub-arrays in gfortran by implementing OpenMP 
  4.0 target map clause. This clause was already implemented in C/C++ 
  FEs, so I could check the behavior. I don't know whether it's already 
  implemented in gfortran or not.

To avoid duplication of work: with Jakub's Fortran OpenMP 4 target
changes recently committed to trunk, and now merged into gomp-4_0-branch,
I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4
target changes removed, and TODO markers added to integrate into that.

Jakub, before your Fortran OpenMP 4 target changes, Ilmir had written the
test case gcc/testsuite/gfortran.dg/gomp/map-1.f90 (based on his
interpretation and implementation of OpenMP 4 target), which I have now
amended with XFAILs and changed error messages -- anything in there that
you'd like to see addressed for Fortran OpenMP 4 target?

  To represent OpenMP array sections (or OpenACC subarrays) I used 
  gfc_expr.
 
  After implementing OpenMP target map clauses I was going to use it to 
  represent OpenACC data clauses, just as Thomas recommended in his 
  mail: http://gcc.gnu.org/ml/gcc-patches/2014-01/msg02040.html
 
  I hope this will be useful for you. If you will have any question feel 
  free to ask.

 gcc/fortran/dump-parse-tree.c |  47 +--
 gcc/fortran/gfortran.h|  18 +++
 gcc/fortran/openmp.c  | 182 ++
 gcc/fortran/trans-openmp.c| 145 +++-
 gcc/testsuite/gfortran.dg/goacc/subarrays.f95 |  36 +
 gcc/testsuite/gfortran.dg/gomp/map-1.f90  | 109 +++
 6 files changed, 520 insertions(+), 17 deletions(-)

diff --git gcc/fortran/dump-parse-tree.c gcc/fortran/dump-parse-tree.c
index c3671395..8d7c38c 100644
--- gcc/fortran/dump-parse-tree.c
+++ gcc/fortran/dump-parse-tree.c
@@ -1072,6 +1072,18 @@ show_omp_namelist (int list_type, gfc_omp_namelist *n)
 }
 }
 
+/* TODO: remove; use show_omp_namelist.  */
+static void
+show_expr_list (gfc_expr_list *el)
+{
+  for (; el-next; el = el-next)
+{
+  show_expr (el-expr);
+  fputc (',', dumpfile);
+}
+  show_expr (el-expr);
+}
+
 
 /* Show OpenMP or OpenACC clauses.  */
 
@@ -1214,28 +1226,35 @@ show_omp_clauses (gfc_omp_clauses *omp_clauses)
}
   fprintf (dumpfile,  DEFAULT(%s), type);
 }
+  for (int kind = 0; kind  OMP_MAP_LIST_LAST; kind++)
+{
+  const char *type;
+  if (omp_clauses-map_lists[kind] == NULL)
+   continue;
+
+  switch (kind)
+   {
+   case OMP_MAP_LIST_ALLOC: type = ALLOC; break;
+   case OMP_MAP_LIST_TO: type = TO; break;
+   case OMP_MAP_LIST_FROM: type = FROM; break;
+   case OMP_MAP_LIST_TOFROM: type = TOFROM; break;
+   default:
+ gcc_unreachable ();
+   }
+  fprintf (dumpfile,  MAP(%s:, type);
+  show_expr_list (omp_clauses-map_lists[kind]);
+  fputc (')', dumpfile);
+}
   if (omp_clauses-tile_list)
 {
-  gfc_expr_list *list;
   fputs ( TILE(, dumpfile);
-  for (list = omp_clauses-tile_list; list; list = list-next)
-   {
- show_expr (list-expr);
- if (list-next) 
-   fputs (, , dumpfile);
-   }
+  show_expr_list (omp_clauses-tile_list);
   fputc (')', dumpfile);
 }
   if (omp_clauses-wait_list)
 {
-  gfc_expr_list *list;
   fputs ( WAIT(, dumpfile);
-  for (list = omp_clauses-wait_list; list; list = list-next)
-   {
- show_expr (list-expr);
- if (list-next) 
-   fputs (, , dumpfile);
-   }
+  show_expr_list (omp_clauses-wait_list);
   fputc (')', dumpfile);
 }
   if (omp_clauses-seq)
diff --git gcc/fortran/gfortran.h gcc/fortran/gfortran.h
index cc445e6..09da2d1 100644
--- gcc/fortran/gfortran.h
+++ gcc/fortran/gfortran.h
@@ -1172,6 +1172,22 @@ enum
   OMP_LIST_NUM
 };
 
+/* OpenACC 2.0: data clauses kind.  */
+/* TODO: remove; use OpenMP 4 target infrastructure.  */
+enum gfc_omp_clause_map_kind
+{
+  /* If not already present, allocate.  */
+  OMP_MAP_LIST_ALLOC,
+  /* ..., and copy to device.  */
+  OMP_MAP_LIST_TO,
+  /* ..., and copy from device.  */
+  OMP_MAP_LIST_FROM,
+  /* ..., and copy to and from device.  */
+  OMP_MAP_LIST_TOFROM,
+  /* End marker.  */
+  OMP_MAP_LIST_LAST
+};
+
 /* Because a symbol can belong to multiple 

Re: Optimize type streaming

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

   
   We hash only on outgoing SCC edges.  You can easily have
   main variant type T and variants T1,T2 that are same all used by type
   T again.
  
  Ok, but then the two variants shouldn't be in the same SCC or at
  least not in the same SCC as the main variant.
 
 They will because T refers them (since it contains i.e. pointers to itself)

Ah, yes.  Bah ;)  [handle TYPE_FIELDs like variants - built them up
late via adjusting DECL_CONTEXTs TYPE_FIELDs ... 8)]

That is, ideally stream a tree, not a graph ...

   T1 and T2 differs by references in T, but they will always have same
   hash and in theory they can be merged if your code worked recursively
   on nested SCC regions (as given by Tarjan's algorithm).  Because you work 
   on
   outermost SCC regions only, we never merge T1 and T2.
  
  I wonder what ties them into a SCC though.
  
   The congruence you compute is not best possible, but best possible 
   assuming no duplicates within compilation units.  I think this is sane, 
   we should look towards eliminating such duplicates rather than working 
   harder in the streamer.
  
  Yeah.
  

Btw, if you re-hash only nodes with duplicate hashes might there be
a higher probability of ending up with a hash that collides with
one that was not a duplicate one?  That is, why not simply re-hash
all nodes?  (premature optimization...?)

So - can you can iterate on this re-doing the DFS walk with the
found entry node?
   
   Yep, that is the plan. Iterate long enough until I get unqiue hash.  Can I
   easilly re-do the DFS walk?
  
  Not that easily...  you have to refactor things a bit I fear ;)
  DFS_write_tree is the main worker and the init is embedded
  in lto_output_tree.  The main issue is that you can end up
  with multiple SCCs from the start of the DFS walk.  So you'd
  need to first collect all SCCs (but not write anything yet)
  and then factor out a worker that processes them - you can
  then iterate the DFS walk on the SCCs.
  
  Quite some work eventually.  Not sure ;)
 
 Hmm, walking everything first and then starting streaming sounds bad idea
 for locality.  Can't I just re-do the walk since I know what the SCC is?
 I will read the code more after lunch.

Might be possible with a 2nd SCC stack set up, yes.

Richard.

 Honza
  
  Richard.
  
   Honza

Thanks,
Richard.

 Honza
 
   * lto-streamer-out.c (hash_tree): Add map argument;
   remove special purpose hack for pointer types.
   (visit): Update it.
   (hash_scc): Add iteration until equivalence classes stabilize.
 Index: lto-streamer-out.c
 ===
 --- lto-streamer-out.c(revision 212426)
 +++ lto-streamer-out.c(working copy)
 @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
  /* Return a hash value for the tree T.  */
  
  static hashval_t
 -hash_tree (struct streamer_tree_cache_d *cache, tree t)
 +hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
 hashval_t *map, tree t)
  {
  #define visit(SIBLING) \
do { \
  unsigned ix; \
 -if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, ix)) 
 \
 +if (!SIBLING) \
 +  v = iterative_hash_hashval_t (0, v); \
 +else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) \
v = iterative_hash_hashval_t (streamer_tree_cache_get_hash 
 (cache, ix), v); \
 +else if (map) \
 +  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
 +else \
 +  v = iterative_hash_hashval_t (1, v); \
} while (0)
  
/* Hash TS_BASE.  */
 @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
  
if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
  {
 -  if (POINTER_TYPE_P (t))
 - {
 -   /* For pointers factor in the pointed-to type recursively as
 -  we cannot recurse through only pointers.
 -  ???  We can generalize this by keeping track of the
 -  in-SCC edges for each tree (or arbitrarily the first
 -  such edge) and hashing that in in a second stage
 -  (instead of the quadratic mixing of the SCC we do now).  */
 -   hashval_t x;
 -   unsigned ix;
 -   if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), ix))
 - x = streamer_tree_cache_get_hash (cache, ix);
 -   else
 - x = hash_tree (cache, TREE_TYPE (t));
 -   v = iterative_hash_hashval_t (x, v);
 - }
 -  else if (code != IDENTIFIER_NODE)
 +  if (code != IDENTIFIER_NODE)
   visit (TREE_TYPE (t));
  }
  
 @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons
  static hashval_t
  hash_scc (struct streamer_tree_cache_d *cache, 

[C++ Patch] PR 53159 (Take 2)

2014-07-11 Thread Paolo Carlini

Hi,

only today it occurred to me that we can as well delay all the 
diagnostic at issue to the check_narrowing at the end of 
convert_like_real and avoid at once possible issues with duplicate 
warnings. Tested x86_64-linux.


Thanks,
Paolo.

//
/cp
2014-07-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53159
* call.c (build_user_type_conversion_1): Copy LOOKUP_NO_NARROWING
into convflags.
* decl.c (check_initializer): Dont' call check_narrowing here,
set LOOKUP_NO_NARROWING.
* typeck2.c (digest_init_r): Likewise.

/testsuite
2014-07-11  Paolo Carlini  paolo.carl...@oracle.com

PR c++/53159
* g++.dg/cpp0x/Wnarrowing1.C: New.
Index: cp/call.c
===
--- cp/call.c   (revision 212449)
+++ cp/call.c   (working copy)
@@ -3586,7 +3586,8 @@ build_user_type_conversion_1 (tree totype, tree ex
 
   /* It's OK to bind a temporary for converting constructor arguments, but
  not in converting the return value of a conversion operator.  */
-  convflags = ((flags  LOOKUP_NO_TEMP_BIND) | LOOKUP_NO_CONVERSION);
+  convflags = ((flags  LOOKUP_NO_TEMP_BIND) | LOOKUP_NO_CONVERSION
+  | (flags  LOOKUP_NO_NARROWING));
   flags = ~LOOKUP_NO_TEMP_BIND;
 
   if (ctors)
Index: cp/decl.c
===
--- cp/decl.c   (revision 212449)
+++ cp/decl.c   (working copy)
@@ -5757,7 +5757,7 @@ check_initializer (tree decl, tree init, int flags
{
  init = reshape_init (type, init, tf_warning_or_error);
  if (SCALAR_TYPE_P (type))
-   check_narrowing (type, init);
+   flags |= LOOKUP_NO_NARROWING;
}
}
   else if (TREE_CODE (init) == TREE_LIST
Index: cp/typeck2.c
===
--- cp/typeck2.c(revision 212449)
+++ cp/typeck2.c(working copy)
@@ -1027,7 +1027,7 @@ digest_init_r (tree type, tree init, bool nested,
   tree *exp;
 
   if (nested)
-   check_narrowing (type, init);
+   flags |= LOOKUP_NO_NARROWING;
   init = convert_for_initialization (0, type, init, flags,
 ICR_INIT, NULL_TREE, 0,
 complain);
Index: testsuite/g++.dg/cpp0x/Wnarrowing1.C
===
--- testsuite/g++.dg/cpp0x/Wnarrowing1.C(revision 0)
+++ testsuite/g++.dg/cpp0x/Wnarrowing1.C(working copy)
@@ -0,0 +1,18 @@
+// PR c++/53159
+// { dg-do compile { target c++11 } }
+// { dg-options -Wnarrowing -Wno-overflow }
+
+struct X
+{
+  constexpr operator int() { return __INT_MAX__; }
+};
+
+int f() { return __INT_MAX__; }
+
+signed char a { __INT_MAX__ }; // { dg-warning narrowing conversion }
+signed char b { f() }; // { dg-warning narrowing conversion }
+signed char c { X{} }; // { dg-warning narrowing conversion }
+
+signed char ar[] { __INT_MAX__ };  // { dg-warning narrowing conversion }
+signed char br[] { f() };  // { dg-warning narrowing conversion }
+signed char cr[] { X{} };  // { dg-warning narrowing conversion }


Re: [PATCH] Generate more efficient memory barriers for LEON3

2014-07-11 Thread Daniel Cederman
That was an error on my side. The wrong memory model had gotten cached 
in a generated make script. Lets drop membar_leon3 also then :)


On 2014-07-11 11:15, Eric Botcazou wrote:

The full barrier pattern membar_leon3 also gets generated so I think
that one should be kept also.


Do you have a testcase?  membar is generated by sparc_emit_membar_for_model
and, for the TSO model of LEON3, implied = StoreStore | LoadLoad | LoadStore
so mm can only be StoreLoad, which means that membar_storeload will match so
the full barrier never will.



[PATCH-v3] Generate more efficient memory barriers for LEON3

2014-07-11 Thread Daniel Cederman
ChangeLog:

2014-07-11  Daniel Cederman  ceder...@gaisler.com

gcc/config/sparc/
* sync.md: Generate more efficient memory barriers for LEON3 

---
 gcc/config/sparc/sync.md |   10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/gcc/config/sparc/sync.md b/gcc/config/sparc/sync.md
index e6e237f..98ac0d3 100644
--- a/gcc/config/sparc/sync.md
+++ b/gcc/config/sparc/sync.md
@@ -64,11 +64,19 @@
   stbar
   [(set_attr type multi)])
 
+;; For LEON3, STB has the effect of membar #StoreLoad.
+(define_insn *membar_storeload_leon3
+  [(set (match_operand:BLK 0  )
+   (unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
+  TARGET_LEON3
+  stb\t%%g0, [%%sp-1]
+  [(set_attr type store)])
+
 ;; For V8, LDSTUB has the effect of membar #StoreLoad.
 (define_insn *membar_storeload
   [(set (match_operand:BLK 0  )
(unspec:BLK [(match_dup 0) (const_int 2)] UNSPEC_MEMBAR))]
-  TARGET_V8
+  TARGET_V8  !TARGET_LEON3
   ldstub\t[%%sp-1], %%g0
   [(set_attr type multi)])
 
-- 
1.7.9.5



Re: [Patch 1/2][ARM]Split insn type alu_reg into alu_sreg and alu_dsp_reg

2014-07-11 Thread Richard Earnshaw
On 10/07/14 08:46, Terry Guo wrote:
 Hi there,
 
 Currently the insn type of DSP-kind instructions like QSUB8 is alu_reg which
 is same as other normal instructions like SUB. In order to distinguish those
 DSP-kind instructions, this patch intends to replace current alu_reg with
 two sub categories alu_sreg and alu_dsp_reg. Meanwhile the alus_reg is
 renamed to alus_sreg in terms of consistence. This is the first patch of
 this series and intends to cover the files under gcc/config/arm. Tested with
 gcc regression, no new regressions. Is it ok to trunk?
 
 BR,
 Terry
 

This is OK.  But please wait for the review of the AArch64 code before
committing (both patches need to be in one commit for consistency).

R.

 2014-07-10  Terry Guo  terry@arm.com
 
  * config/arm/types.md (alu_reg): Replaced by alu_sreg and alu_dsp_reg.
  (alus_reg): Renamed to alus_sreg.
  * config/arm/arm-fixed.md: Change type of non-dsp instructions
  from alu_reg to alu_sreg.  Change type of dsp instructions from
  alu_reg to alu_dsp_reg.
  * config/arm/thumb1.md: Likewise.
  * config/arm/thumb2.md: Likewise.
  * config/arm/arm.c (cortexa7_older_only): Use new ALU type names.
  * config/arm/arm1020e.md (1020alu_op): Replace alu_reg and alus_reg
 with
  alu_sreg and alus_sreg.
  * config/arm/arm1026ejs.md (alu_op): Likewise.
  * config/arm/arm1136jfs.md (11_alu_op): Likewise.
  * config/arm/arm926ejs.md (9_alu_op): Likewise.
  * config/arm/fa526.md (526_alu_op): Likewise.
  * config/arm/fa606te.md (606te_alu_op): Likewise.
  * config/arm/fa626te.md (626te_alu_op): Likewise.
  * config/arm/fa726te.md (726te_alu_op): Likewise.
  * config/arm/fmp626.md (mp626_alu_op): Likewise.
  * config/arm/arm.md (core_cycles): Replace alu_reg and alus_reg with
  alu_sreg, alu_dsp_reg and alus_sreg. 
  * config/arm/cortex-a15.md (cortex_a15_alu): Likewise.
  * config/arm/cortex-a5.md (cortex_a5_alu): Likewise.
  * config/arm/cortex-a53.md (cortex_a53_alu): Likewise.
  * config/arm/cortex-a7.md (cortex_a7_alu_sreg): Likewise.
  * config/arm/cortex-a8.md (cortex_a8_alu): Likewise.
  * config/arm/cortex-a9.md (cortex_a9_dp): Likewise.
  * config/arm/cortex-m4.md (cortex_m4_alu): Likewise.
  * config/arm/cortex-r4.md (cortex_r4_alu): Likewise.
  * config/arm/marvell-pj4.md (pj4_alu, pj4_alu_conds): Likewise.
 
 
 arm-new-insn-type-v3.txt
 
 
 diff --git a/gcc/config/arm/arm-fixed.md b/gcc/config/arm/arm-fixed.md
 index 4ab9d35..5611ad1 100644
 --- a/gcc/config/arm/arm-fixed.md
 +++ b/gcc/config/arm/arm-fixed.md
 @@ -26,7 +26,7 @@
add%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it yes,no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_sreg)])
  
  (define_insn addmode3
[(set (match_operand:ADDSUB 0 s_register_operand =r)
 @@ -36,7 +36,7 @@
saddqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  (define_insn usaddmode3
[(set (match_operand:UQADDSUB 0 s_register_operand =r)
 @@ -46,7 +46,7 @@
uqaddqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  (define_insn ssaddmode3
[(set (match_operand:QADDSUB 0 s_register_operand =r)
 @@ -56,7 +56,7 @@
qaddqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  (define_insn submode3
[(set (match_operand:FIXED 0 s_register_operand =l,r)
 @@ -66,7 +66,7 @@
sub%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it yes,no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_sreg)])
  
  (define_insn submode3
[(set (match_operand:ADDSUB 0 s_register_operand =r)
 @@ -76,7 +76,7 @@
ssubqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  (define_insn ussubmode3
[(set (match_operand:UQADDSUB 0 s_register_operand =r)
 @@ -87,7 +87,7 @@
uqsubqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  (define_insn sssubmode3
[(set (match_operand:QADDSUB 0 s_register_operand =r)
 @@ -97,7 +97,7 @@
qsubqaddsub_suf%?\\t%0, %1, %2
[(set_attr predicable yes)
 (set_attr predicable_short_it no)
 -   (set_attr type alu_reg)])
 +   (set_attr type alu_dsp_reg)])
  
  ;; Fractional multiplies.
  
 diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
 index 78cae73..942df7d 100644
 --- a/gcc/config/arm/arm.c
 +++ b/gcc/config/arm/arm.c
 @@ -11710,8 +11710,9 @@ cortexa7_older_only (rtx insn)
  
switch (get_attr_type 

Re: [Patch 2/2][AArch64]Split insn type alu_reg into alu_sreg and alu_dsp_reg

2014-07-11 Thread Richard Earnshaw
On 10/07/14 09:06, Terry Guo wrote:
 Hi there,
 
 As the second and final patch in this series, it intends to update alu_reg
 and alus_reg types for AArch64 port. With this change, the gcc can be
 successfully built for AArch64. Is it OK to trunk?
 
 BR,
 Terry
 
 2014-07-10  Terry Guo  terry@arm.com
 
  * config/aarch64/aarch64.md (*addsi3_aarch64, *addsi3_aarch64_uxtw,
  subsi3, *adddi3_aarch64, *subsi3_uxtw, subdi3, absdi2, negmode2,
  add_losym_mode, *negsi2_uxtw, tlsle_small_mode): Rename type
 alu_reg
  to alu_sreg. 
  (addmode3_compare0, *addsi3_compare0_uxtw, *addmode3nr_compare0,
  submode3_compare0, *compare_negmode, *negmode2_compare0,
  subsi3_compare0_uxtw, *negsi2_compare0_uxtw, *cmpmode): Rename type 
  alus_reg to alus_sreg.
 
 

OK.  As mentioned in the other thread, please commit both patches in a
single commit.

R.

 aarch64-new-insn-type-v1.txt
 
 
 diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
 index 3eb783c..76d8cd3 100644
 --- a/gcc/config/aarch64/aarch64.md
 +++ b/gcc/config/aarch64/aarch64.md
 @@ -1167,7 +1167,7 @@
add\\t%w0, %w1, %w2
add\\t%0.2s, %1.2s, %2.2s
sub\\t%w0, %w1, #%n2
 -  [(set_attr type alu_imm,alu_reg,neon_add,alu_imm)
 +  [(set_attr type alu_imm,alu_sreg,neon_add,alu_imm)
 (set_attr simd *,*,yes,*)]
  )
  
 @@ -1183,7 +1183,7 @@
add\\t%w0, %w1, %2
add\\t%w0, %w1, %w2
sub\\t%w0, %w1, #%n2
 -  [(set_attr type alu_imm,alu_reg,alu_imm)]
 +  [(set_attr type alu_imm,alu_sreg,alu_imm)]
  )
  
  (define_insn *adddi3_aarch64
 @@ -1198,7 +1198,7 @@
add\\t%x0, %x1, %x2
sub\\t%x0, %x1, #%n2
add\\t%d0, %d1, %d2
 -  [(set_attr type alu_imm,alu_reg,alu_imm,alu_reg)
 +  [(set_attr type alu_imm,alu_sreg,alu_imm,alu_sreg)
 (set_attr simd *,*,*,yes)]
  )
  
 @@ -1234,7 +1234,7 @@
adds\\t%w0, %w1, %w2
adds\\t%w0, %w1, %w2
subs\\t%w0, %w1, #%n2
 -  [(set_attr type alus_reg,alus_imm,alus_imm)]
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]
  )
  
  ;; zero_extend version of above
 @@ -1251,7 +1251,7 @@
adds\\t%w0, %w1, %w2
adds\\t%w0, %w1, %w2
subs\\t%w0, %w1, #%n2
 -  [(set_attr type alus_reg,alus_imm,alus_imm)]
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]
  )
  
  (define_insn *adds_mul_imm_mode
 @@ -1365,7 +1365,7 @@
cmn\\t%w0, %w1
cmn\\t%w0, %w1
cmp\\t%w0, #%n1
 -  [(set_attr type alus_reg,alus_imm,alus_imm)]
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]
  )
  
  (define_insn *compare_negmode
 @@ -1375,7 +1375,7 @@
(match_operand:GPI 1 register_operand r)))]

cmn\\t%w1, %w0
 -  [(set_attr type alus_reg)]
 +  [(set_attr type alus_sreg)]
  )
  
  (define_insn *add_shift_mode
 @@ -1647,7 +1647,7 @@
  (match_operand:SI 2 register_operand r)))]

sub\\t%w0, %w1, %w2
 -  [(set_attr type alu_reg)]
 +  [(set_attr type alu_sreg)]
  )
  
  ;; zero_extend version of above
 @@ -1658,7 +1658,7 @@
  (match_operand:SI 2 register_operand r]

sub\\t%w0, %w1, %w2
 -  [(set_attr type alu_reg)]
 +  [(set_attr type alu_sreg)]
  )
  
  (define_insn subdi3
 @@ -1669,7 +1669,7 @@
@
 sub\\t%x0, %x1, %x2
 sub\\t%d0, %d1, %d2
 -  [(set_attr type alu_reg, neon_sub)
 +  [(set_attr type alu_sreg, neon_sub)
 (set_attr simd *,yes)]
  )
  
 @@ -1701,7 +1701,7 @@
   (minus:GPI (match_dup 1) (match_dup 2)))]

subs\\t%w0, %w1, %w2
 -  [(set_attr type alus_reg)]
 +  [(set_attr type alus_sreg)]
  )
  
  ;; zero_extend version of above
 @@ -1714,7 +1714,7 @@
   (zero_extend:DI (minus:SI (match_dup 1) (match_dup 2]

subs\\t%w0, %w1, %w2
 -  [(set_attr type alus_reg)]
 +  [(set_attr type alus_sreg)]
  )
  
  (define_insn *sub_shift_mode
 @@ -1925,7 +1925,7 @@
GEN_INT (63);
  DONE;
}
 -  [(set_attr type alu_reg)]
 +  [(set_attr type alu_sreg)]
  )
  
  (define_insn negmode2
 @@ -1935,7 +1935,7 @@
@
 neg\\t%w0, %w1
 neg\\t%rtn0vas, %rtn1vas
 -  [(set_attr type alu_reg, neon_negq)
 +  [(set_attr type alu_sreg, neon_negq)
 (set_attr simd *,yes)]
  )
  
 @@ -1945,7 +1945,7 @@
   (zero_extend:DI (neg:SI (match_operand:SI 1 register_operand r]

neg\\t%w0, %w1
 -  [(set_attr type alu_reg)]
 +  [(set_attr type alu_sreg)]
  )
  
  (define_insn *ngcmode
 @@ -1975,7 +1975,7 @@
   (neg:GPI (match_dup 1)))]

negs\\t%w0, %w1
 -  [(set_attr type alus_reg)]
 +  [(set_attr type alus_sreg)]
  )
  
  ;; zero_extend version of above
 @@ -1987,7 +1987,7 @@
   (zero_extend:DI (neg:SI (match_dup 1]

negs\\t%w0, %w1
 -  [(set_attr type alus_reg)]
 +  [(set_attr type alus_sreg)]
  )
  
  (define_insn *neg_shiftmode3_compare0
 @@ -2266,7 +2266,7 @@
 cmp\\t%w0, %w1
 cmp\\t%w0, %w1
 cmn\\t%w0, #%n1
 -  [(set_attr type alus_reg,alus_imm,alus_imm)]
 +  [(set_attr type alus_sreg,alus_imm,alus_imm)]
  )
  
  (define_insn *cmpmode
 @@ -3790,7 

Re: FWD: Re: OpenACC subarray specifications in the GCC Fortran front end

2014-07-11 Thread Jakub Jelinek
On Fri, Jul 11, 2014 at 12:11:10PM +0200, Thomas Schwinge wrote:
 To avoid duplication of work: with Jakub's Fortran OpenMP 4 target
 changes recently committed to trunk, and now merged into gomp-4_0-branch,
 I have trimmed down Ilmir's patch to just the OpenACC bits, OpenMP 4
 target changes removed, and TODO markers added to integrate into that.

Resolving the TODO markers would be nice, indeed.

 Jakub, before your Fortran OpenMP 4 target changes, Ilmir had written the
 test case gcc/testsuite/gfortran.dg/gomp/map-1.f90 (based on his
 interpretation and implementation of OpenMP 4 target), which I have now
 amended with XFAILs and changed error messages -- anything in there that
 you'd like to see addressed for Fortran OpenMP 4 target?

 +  !$omp target map(j(5:4)) ! { dg-error Lower bound of OpenMP array section 
 in greater than upper  { xfail *-*-* } }
 +  !$omp end target

I think this isn't an error in Fortran, if low bound is above upper bound,
then it is considered a zero size array section.  Though supposedly for
depend clause we might want to diagnose that.

 +  !$omp target map(aas) ! { dg-error The upper bound in the last dimension 
 must appear  { xfail *-*-* } }
 +  !$omp end target

Assumed-size in map without array section would be indeed nice thing to
diagnose.

 +  !$omp target map(tt%i) ! { dg-error Syntax error in OpenMP variable list 
 }
 +  !$omp end target ! { dg-bogus Unexpected !\\\$OMP END TARGET statement 
  { xfail *-*-* } }

Right now the parsing of !$omp directives in case of parsing error rejects
the whole directive, perhaps it should be reconsidered unless it is a fatal
error from which there is no easy way out.

 +  !$omp target map(tt%j(1)) ! { dg-bogus Syntax error in OpenMP variable 
 list  { xfail *-*-* } }
 +  !$omp end target ! { dg-bogus Unexpected !\\\$OMP END TARGET statement 
  { xfail *-*-* } }
 +
 +  !$omp target map(tt%j(1:)) ! { dg-bogus Syntax error in OpenMP variable 
 list  { xfail *-*-* } }
 +  !$omp end target ! { dg-bogus Unexpected !\\\$OMP END TARGET statement 
  { xfail *-*-* } }

These two are pending resolution on omp-lang, I had exchanged a few mails
about it, I think we shouldn't support those for consistency with the C/C++
support, where tt.j[1] or tt.j[1:] and similar is explicitly invalid.

Jakub


Re: [Patch ARM-AArch64/testsuite v2 01/21] Neon intrinsics execution tests initial framework.

2014-07-11 Thread Richard Earnshaw
On 10/07/14 11:12, Marcus Shawcroft wrote:
 On 1 July 2014 11:05, Christophe Lyon christophe.l...@linaro.org wrote:
 * documentation (README)
 * dejanu driver (neon-intrinsics.exp)
 * support macros (arm-neon-ref.h, compute-ref-data.h)
 * Tests for 3 intrinsics: vaba, vld1, vshl
 
 Hi, The terminology in armv8 is advsimd rather than neon.  Can we
 rename neon-intrinsics to advsimd-intrinsics or simd-intrinsics
 throughout please.  The existing gcc.target/aarch64/simd directory of
 tests will presumably be superseded by this more comprehensive set of
 tests so I suggest these tests go in gcc.target/aarch64/advsimd and we
 eventually remove gcc.target/aarch64/simd/ directory.
 
 GNU style should apply throughout this patch series, notably double
 space after period in comments and README text.  Space before left
 parenthesis in function/macro call and function declaration.  The
 function name in a declaration goes on a new line.  The GCC wiki notes
 on test case state individual test should have file names ending in
 _number, see here https://gcc.gnu.org/wiki/TestCaseWriting
 
 I'm OK with the execute only no scan nature of the tests.
 
 diff --git a/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README 
 b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
 new file mode 100644
 index 000..232bb1d
 --- /dev/null
 +++ b/gcc/testsuite/gcc.target/aarch64/neon-intrinsics/README
 @@ -0,0 +1,132 @@
 +This directory contains executable tests for ARM/AArch64 Neon
 +intrinsics.
 
 Neon - Advanced SIMD as below.

On first use, I think Advanced SIMD (Neon) is even better.

R.

 
 +
 +It is meant to cover execution cases of all the Advanced SIMD
 +intrinsics, but does not scan the generated assembler code.
 
 +#if __BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__
 +
 +typedef union {
 +  struct {
 
 GNUstyle { on new lne.
 
 +#define Neon_Cumulative_Sat  __read_neon_cumulative_sat()
 +#define Set_Neon_Cumulative_Sat(x)  __set_neon_cumulative_sat((x))
 
 Upper case the macro's rather than camel case.
 
 +# Copyright (C) 2013 Free Software Foundation, Inc.
 
 s/13/14/
 
 Cheers
 /Marcus
 




Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Roman Gareev
 The struct now contains only a single element such that there seems to be no
 need for it anymore. Should we remove it? (We could still use a typedef if
 you believe the datatype is too long).

I don't mind removing it. However I think that we should choose the
way of transferring of sese region, which is used for translation of
an isl_ast_node_user to Gimple code. Should we transfer it separately
or restore ivs_params later? What do you think?

--
   Cheers, Roman Gareev.


Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Tobias Grosser

On 11/07/2014 13:11, Roman Gareev wrote:

The struct now contains only a single element such that there seems to be no
need for it anymore. Should we remove it? (We could still use a typedef if
you believe the datatype is too long).


I don't mind removing it. However I think that we should choose the
way of transferring of sese region, which is used for translation of
an isl_ast_node_user to Gimple code. Should we transfer it separately
or restore ivs_params later? What do you think?


Sorry, I currently don't see why sese region is needed later. In case it
is, we may keep it, but this would require to pull more code in the 
review. I personally would just remove it to finish this review.


Feel free to commit after applying this change.

Cheers,
Tobias



Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

 Hi,
 since we both agreed offlining constructors from global decl stream is a good
 idea, I went ahead and implemented it.  I would like to followup by an
 cleanups; for example the sections are still tagged as function sections, but 
 I
 would like to do it incrementally. There is quite some uglyness in the way we
 handle function sections and the patch started to snowball very quickly.
 
 The patch conceptually copies what we do for functions and re-uses most of
 infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
 getting function in) and it is used by output machinery, by ipa-visibility
 while rewritting the constructor and by ctor_for_folding (which makes us to
 load the ctor whenever it is needed by ipa-cp or ipa-devirt).
 
 I kept get_symbol_initial_value as an authority to decide if we want to encode
 given constructor or not.  The section itself for trivial ctor is about 25
 bytes and with header it is probably close to double of it. Currently the 
 heuristic
 is to offline only constructors that are CONSTRUCTOR and keep simple 
 expressions
 inline.  We may want to tweak it.

Hmm, so what about artificial testcase with gazillions of

struct X { int i; };

struct X a0001 = { 1 };
struct X a0002 = { 2 };


how does it explode LTO IL size and streaming time (compile-out and
LTRANS in)?  I suppose it still helps WPA stage.

Also what we desparately miss is to put CONST_DECLs into the symbol 
table (and thus eventually move the constant pool to symtab).  That
and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
with STRING_CST initializers (to fix PR50199).

 The patch does not bring miraculous savings to firefox WPA, but it does some:
 
 GGC memory after global stream is read goes from 1376898k to 1250533k
 overall GGC allocations from 4156478 kB to 4012462 kB
 read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average 
 size 2.037867
 20997206 tree bodies read in total - 18584194 tree bodies read in total
 Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section 
 decls: 271557265 bytes
 Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd section 
 function_body: 7548680 bytes 
 
 Things would be better if ipa-visibility and ipa-devirt did not load most of
 the virtual tables into memory (still better than loading each into memory 20
 times at average).  I will work on that incrementally. We load 10311 ctors 
 into
 memory at WPA time.
 
 Note that firefox seems to feature really huge data segment these days.
 http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
 
 Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
 in progress, OK?

The patch looks ok to me.  How about simply doing 
s/LTO_section_function_body/LTO_section_symbol_content/ instead of
adding LTO_section_variable_initializer?

Thanks,
Richard.

   * vapool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h
   (varpool_get_constructor): New function.
   (ctor_for_folding): Use it.
   (varpool_assemble_decl): Likewise.
   * lto-streamer.h (struct output_block): Turn cgraph_node
   to symbol filed.
   (lto_input_variable_constructor): Declare.
   * ipa-visibility.c (function_and_variable_visibility): Use
   varpool_get_constructor.
   * cgraph.h (varpool_get_constructor): Declare.
   * lto-streamer-out.c (get_symbol_initial_value): Take encoder
   parameter; return error_mark_node for non-trivial constructors.
   (lto_write_tree_1, DFS_write_tree): UPdate use of
   get_symbol_initial_value.
   (output_function): Update initialization of symbol.
   (output_constructor): New function.
   (copy_function): Rename to ..
   (copy_function_or_variable): ... this one; handle vars too.
   (lto_output): Output variable sections.
   * lto-streamer-in.c (input_constructor): New function.
   (lto_read_body): Rename from ...
   (lto_read_body_or_constructor): ... this one; handle vars
   too.
   (lto_input_variable_constructor): New function.
   * ipa-prop.c (ipa_prop_write_jump_functions,
   ipa_prop_write_all_agg_replacement): Update.
 Index: varpool.c
 ===
 --- varpool.c (revision 212426)
 +++ varpool.c (working copy)
 @@ -35,6 +35,9 @@ along with GCC; see the file COPYING3.
  #include gimple-expr.h
  #include flags.h
  #include pointer-set.h
 +#include tree-ssa-alias.h
 +#include gimple.h
 +#include lto-streamer.h
  
  const char * const tls_model_names[]={none, tls-emulated, tls-real,
 tls-global-dynamic, tls-local-dynamic,
 @@ -253,6 +256,41 @@ varpool_node_for_asm (tree asmname)
  return NULL;
  }
  
 +/* When doing LTO, read NODE's constructor from disk if it is not already 
 present.  */
 +
 +tree
 +varpool_get_constructor (struct varpool_node *node)
 +{
 +  

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-07-11 Thread Kugan
Thanks foe the review and suggestions.

On 10/07/14 22:15, Richard Biener wrote:
 On Mon, Jul 7, 2014 at 8:55 AM, Kugan kugan.vivekanandara...@linaro.org 
 wrote:

[...]


 For -fwrapv, it is due to how PROMOTE_MODE is defined in arm back-end.
 In the test-case, a function (which has signed char return type) returns
 -1 in one of the paths. ARM PROMOTE_MODE changes that to 255 and relies
 on zero/sign extension generated by RTL again for the correct value. I
 saw some other targets also defining similar think. I am therefore
 skipping removing zero/sign extension if the ssa variable can be set to
 negative integer constants.
 
 Hm?  I think you should rather check that you are removing a
 sign-/zero-extension - PROMOTE_MODE tells you if it will sign- or
 zero-extend.  Definitely
 
 +  /* In some architectures, negative integer constants are truncated and
 + sign changed with target defined PROMOTE_MODE macro. This will impact
 + the value range seen here and produce wrong code if zero/sign extensions
 + are eliminated. Therefore, return false if this SSA can have negative
 + integers.  */
 +  if (is_gimple_assign (stmt)
 +   (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_unary))
 +{
 +  tree rhs1 = gimple_assign_rhs1 (stmt);
 +  if (TREE_CODE (rhs1) == INTEGER_CST
 +  !TYPE_UNSIGNED (TREE_TYPE (ssa))
 +  tree_int_cst_compare (rhs1, integer_zero_node) == -1)
 +   return false;
 
 looks completely bogus ... (an unary op with a constant operand?)
 instead you want to do sth like

I see that unary op with a constant operand is not possible in gimple.
What I wanted to check here is any sort of constant loads; but seems
that will not happen in gimple. Is PHI statements the only possible
statements where we will end up with such constants.

   mode = TYPE_MODE (TREE_TYPE (ssa));
   rhs_uns = TYPE_UNSIGNED (TREE_TYPE (ssa));
   PROMOTE_MODE (mode, rhs_uns, TREE_TYPE (ssa));
 
 instead of initializing rhs_uns from ssas type.  That is, if
 PROMOTE_MODE tells you to promote _not_ according to ssas sign then
 honor that.

This is triggered in pr43017.c in function foo for arm-none-linux-gnueabi.

where, the gimple statement that cause this looks like:
.
  # _3 = PHI _17(7), -1(2)
bb43:
  return _3;

ARM PROMOTE_MODE changes the sign for integer constants only and hence
looking at the variable with PROMOTE_MODE is not changing the sign in
this case.

#define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \
  if (GET_MODE_CLASS (MODE) == MODE_INT \
   GET_MODE_SIZE (MODE)  4)  \
{   \
  if (MODE == QImode)   \
UNSIGNEDP = 1;  \
  else if (MODE == HImode)  \
UNSIGNEDP = 1;  \
  (MODE) = SImode;  \
}

 As for the -fno-strict-overflow case, if the variables overflows, in VRP
 dumps, I see +INF(OVF), but the value range stored in ssa has TYPE_MAX.
 We therefore should limit the comparison to (TYPE_MIN  VR_MIN  VR_MAX
  TYPE_MAX) instead of (TYPE_MIN = VR_MIN  VR_MAX = TYPE_MAX) when
 checking to be sure that this is not the overflowing case. Attached
 patch changes this.
 
 I don't think that's necessary - the overflow cases happen only when
 that overflow has undefined behavior, thus any valid program will have
 values = MAX.

I see that you have now removed +INF(OVF). I will change it this way.

Thanks again,
Kugan



Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Jan Hubicka
 On Fri, 11 Jul 2014, Jan Hubicka wrote:
 
  Hi,
  since we both agreed offlining constructors from global decl stream is a 
  good
  idea, I went ahead and implemented it.  I would like to followup by an
  cleanups; for example the sections are still tagged as function sections, 
  but I
  would like to do it incrementally. There is quite some uglyness in the way 
  we
  handle function sections and the patch started to snowball very quickly.
  
  The patch conceptually copies what we do for functions and re-uses most of
  infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
  getting function in) and it is used by output machinery, by ipa-visibility
  while rewritting the constructor and by ctor_for_folding (which makes us to
  load the ctor whenever it is needed by ipa-cp or ipa-devirt).
  
  I kept get_symbol_initial_value as an authority to decide if we want to 
  encode
  given constructor or not.  The section itself for trivial ctor is about 25
  bytes and with header it is probably close to double of it. Currently the 
  heuristic
  is to offline only constructors that are CONSTRUCTOR and keep simple 
  expressions
  inline.  We may want to tweak it.
 
 Hmm, so what about artificial testcase with gazillions of
 
 struct X { int i; };
 
 struct X a0001 = { 1 };
 struct X a0002 = { 2 };
 
 
 how does it explode LTO IL size and streaming time (compile-out and
 LTRANS in)?  I suppose it still helps WPA stage.

Well, nothing really artificial, except that gazzilions of static variables
called a0001 to a000gazzilion are ugly :))

I just put the CONSRUCTOR bits in the initial varsion to not have the path 
unused
at all.  Either we can base our decision on size of the variable or do simple
walk to see if it needs more than, say 8 trees.

I will play with this incrementally after cleaning up the headers (as those
accounts for the overhead)
 
 Also what we desparately miss is to put CONST_DECLs into the symbol 
 table (and thus eventually move the constant pool to symtab).  That
 and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
 with STRING_CST initializers (to fix PR50199).

Yep, I have patch for putting CONST_DECLs into symbol table. It however
does not help partitionability because at the moment output machinery do
not expect const decls to have visibilities.

I will push out that change (and LABEL_DECL, too) after Martin's renaming
patches lands to mainline.
 
  The patch does not bring miraculous savings to firefox WPA, but it does 
  some:
  
  GGC memory after global stream is read goes from 1376898k to 1250533k
  overall GGC allocations from 4156478 kB to 4012462 kB
  read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of average 
  size 2.037867
  20997206 tree bodies read in total - 18584194 tree bodies read in total
  Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section 
  decls: 271557265 bytes
  Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd 
  section function_body: 7548680 bytes 
  
  Things would be better if ipa-visibility and ipa-devirt did not load most of
  the virtual tables into memory (still better than loading each into memory 
  20
  times at average).  I will work on that incrementally. We load 10311 ctors 
  into
  memory at WPA time.
  
  Note that firefox seems to feature really huge data segment these days.
  http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
  
  Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
  in progress, OK?
 
 The patch looks ok to me.  How about simply doing 
 s/LTO_section_function_body/LTO_section_symbol_content/ instead of
 adding LTO_section_variable_initializer?

Yeah, I was thinking about it, too.
I think variable and constructor sections may differ in its header however, 
since we do
not need CFG stream for variables.

Thanks!
Honza
 
 Thanks,
 Richard.


Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
  Hmm, walking everything first and then starting streaming sounds bad idea
  for locality.  Can't I just re-do the walk since I know what the SCC is?
  I will read the code more after lunch.
 
 Might be possible with a 2nd SCC stack set up, yes.

I set up hash_map to store the hash values anyway. What about sorting using
walk_tree with simple callback pushing the parameter on stack if it is in the
hash map and removing it from hash map?

That way I do not need to duplicate the tricky SCC walking and probably won't
be more terrible in overhead than qsort.  Seems like a good idea at least for a
prototype, will try it out.

honza
 
 Richard.
 
  Honza
   
   Richard.
   
Honza
 
 Thanks,
 Richard.
 
  Honza
  
  * lto-streamer-out.c (hash_tree): Add map argument;
  remove special purpose hack for pointer types.
  (visit): Update it.
  (hash_scc): Add iteration until equivalence classes stabilize.
  Index: lto-streamer-out.c
  ===
  --- lto-streamer-out.c  (revision 212426)
  +++ lto-streamer-out.c  (working copy)
  @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
   /* Return a hash value for the tree T.  */
   
   static hashval_t
  -hash_tree (struct streamer_tree_cache_d *cache, tree t)
  +hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
  hashval_t *map, tree t)
   {
   #define visit(SIBLING) \
 do { \
   unsigned ix; \
  -if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, 
  ix)) \
  +if (!SIBLING) \
  +  v = iterative_hash_hashval_t (0, v); \
  +else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) \
 v = iterative_hash_hashval_t (streamer_tree_cache_get_hash 
  (cache, ix), v); \
  +else if (map) \
  +  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
  +else \
  +  v = iterative_hash_hashval_t (1, v); \
 } while (0)
   
 /* Hash TS_BASE.  */
  @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
   
 if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
   {
  -  if (POINTER_TYPE_P (t))
  -   {
  - /* For pointers factor in the pointed-to type recursively as
  -we cannot recurse through only pointers.
  -???  We can generalize this by keeping track of the
  -in-SCC edges for each tree (or arbitrarily the first
  -such edge) and hashing that in in a second stage
  -(instead of the quadratic mixing of the SCC we do now).  */
  - hashval_t x;
  - unsigned ix;
  - if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), ix))
  -   x = streamer_tree_cache_get_hash (cache, ix);
  - else
  -   x = hash_tree (cache, TREE_TYPE (t));
  - v = iterative_hash_hashval_t (x, v);
  -   }
  -  else if (code != IDENTIFIER_NODE)
  +  if (code != IDENTIFIER_NODE)
  visit (TREE_TYPE (t));
   }
   
  @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons
   static hashval_t
   hash_scc (struct streamer_tree_cache_d *cache, unsigned first, 
  unsigned size)
   {
  +  unsigned int last_classes = 0, iterations = 0;
  +
 /* Compute hash values for the SCC members.  */
 for (unsigned i = 0; i  size; ++i)
  -sccstack[first+i].hash = hash_tree (cache, 
  sccstack[first+i].t);
  +sccstack[first+i].hash = hash_tree (cache, NULL, 
  sccstack[first+i].t);
   
 if (size == 1)
   return sccstack[first].hash;
   
  -  /* Sort the SCC of type, hash pairs so that when we mix in
  - all members of the SCC the hash value becomes independent on
  - the order we visited the SCC.  Produce hash of the whole SCC 
  as
  - combination of hashes of individual elements.  Then combine 
  that hash into
  - hash of each element, so othewise identically looking 
  elements from two
  - different SCCs are distinguished.  */
  -  qsort (sccstack[first], size, sizeof (scc_entry), 
  scc_entry_compare);
  -
  -  hashval_t scc_hash = sccstack[first].hash;
  -  for (unsigned i = 1; i  size; ++i)
  -scc_hash = iterative_hash_hashval_t (scc_hash,
  -sccstack[first+i].hash);
  -  for (unsigned i = 0; i  size; ++i)
  -sccstack[first+i].hash = iterative_hash_hashval_t 
  (sccstack[first+i].hash, scc_hash);
  -  return scc_hash;
  +  /* We may want to iterate multiple times across SCC propagating 
  across the internal
  + edges in order to get different hash values for individual 
  trees.  */
  +  do
  +{
  +  /* Sort the SCC of type, hash pairs 

Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

  On Fri, 11 Jul 2014, Jan Hubicka wrote:
  
   Hi,
   since we both agreed offlining constructors from global decl stream is a 
   good
   idea, I went ahead and implemented it.  I would like to followup by an
   cleanups; for example the sections are still tagged as function sections, 
   but I
   would like to do it incrementally. There is quite some uglyness in the 
   way we
   handle function sections and the patch started to snowball very quickly.
   
   The patch conceptually copies what we do for functions and re-uses most of
   infrastructure. varpool_get_constructor is cgraph_get_body (i.e. mean of
   getting function in) and it is used by output machinery, by ipa-visibility
   while rewritting the constructor and by ctor_for_folding (which makes us 
   to
   load the ctor whenever it is needed by ipa-cp or ipa-devirt).
   
   I kept get_symbol_initial_value as an authority to decide if we want to 
   encode
   given constructor or not.  The section itself for trivial ctor is about 25
   bytes and with header it is probably close to double of it. Currently the 
   heuristic
   is to offline only constructors that are CONSTRUCTOR and keep simple 
   expressions
   inline.  We may want to tweak it.
  
  Hmm, so what about artificial testcase with gazillions of
  
  struct X { int i; };
  
  struct X a0001 = { 1 };
  struct X a0002 = { 2 };
  
  
  how does it explode LTO IL size and streaming time (compile-out and
  LTRANS in)?  I suppose it still helps WPA stage.
 
 Well, nothing really artificial, except that gazzilions of static variables
 called a0001 to a000gazzilion are ugly :))
 
 I just put the CONSRUCTOR bits in the initial varsion to not have the path 
 unused
 at all.  Either we can base our decision on size of the variable or do simple
 walk to see if it needs more than, say 8 trees.

Hum, probably not worth special-casing.

 I will play with this incrementally after cleaning up the headers (as those
 accounts for the overhead)
  
  Also what we desparately miss is to put CONST_DECLs into the symbol 
  table (and thus eventually move the constant pool to symtab).  That
  and no longer allowing STRING_CSTs in the IL but only CONST_DECLs
  with STRING_CST initializers (to fix PR50199).
 
 Yep, I have patch for putting CONST_DECLs into symbol table. It however
 does not help partitionability because at the moment output machinery do
 not expect const decls to have visibilities.

Well, just make them regular (anonymous) VAR_DECLs then ... (the fact
that a CONST_DECL is anonymous is probably the only real difference - 
and that they are mergeable by content).

 I will push out that change (and LABEL_DECL, too) after Martin's renaming
 patches lands to mainline.

Thanks.

  
   The patch does not bring miraculous savings to firefox WPA, but it does 
   some:
   
   GGC memory after global stream is read goes from 1376898k to 1250533k
   overall GGC allocations from 4156478 kB to 4012462 kB
   read 11006599 SCCs of average size 1.907692 - read 9119433 SCCs of 
   average size 2.037867
   20997206 tree bodies read in total - 18584194 tree bodies read in total
   Size of mmap'd section decls: 299540188 bytes - Size of mmap'd section 
   decls: 271557265 bytes
   Size of mmap'd section function_body: 5711078 bytes - Size of mmap'd 
   section function_body: 7548680 bytes 
   
   Things would be better if ipa-visibility and ipa-devirt did not load most 
   of
   the virtual tables into memory (still better than loading each into 
   memory 20
   times at average).  I will work on that incrementally. We load 10311 
   ctors into
   memory at WPA time.
   
   Note that firefox seems to feature really huge data segment these days.
   http://hubicka.blogspot.ca/2014/04/linktime-optimization-in-gcc-2-firefox.html
   
   Bootstrapped/regtested x86_64-linux, tested with firefox, lto bootstrap 
   in progress, OK?
  
  The patch looks ok to me.  How about simply doing 
  s/LTO_section_function_body/LTO_section_symbol_content/ instead of
  adding LTO_section_variable_initializer?
 
 Yeah, I was thinking about it, too.
 I think variable and constructor sections may differ in its header however, 
 since we do
 not need CFG stream for variables.
 
 Thanks!
 Honza
  
  Thanks,
  Richard.
 
 

-- 
Richard Biener rguent...@suse.de
SUSE / SUSE Labs
SUSE LINUX Products GmbH - Nuernberg - AG Nuernberg - HRB 16746
GF: Jeff Hawn, Jennifer Guild, Felix Imendorffer


Re: Optimize type streaming

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

   Hmm, walking everything first and then starting streaming sounds bad idea
   for locality.  Can't I just re-do the walk since I know what the SCC is?
   I will read the code more after lunch.
  
  Might be possible with a 2nd SCC stack set up, yes.
 
 I set up hash_map to store the hash values anyway. What about sorting using
 walk_tree with simple callback pushing the parameter on stack if it is in the
 hash map and removing it from hash map?
 
 That way I do not need to duplicate the tricky SCC walking and probably won't
 be more terrible in overhead than qsort.  Seems like a good idea at least for 
 a
 prototype, will try it out.

Awww, but you get edges walked that are not walked in the DFS walk ...
(basically you can end up walking the whole program IL if you are 
unlucky).

RIchard.

 honza
  
  Richard.
  
   Honza

Richard.

 Honza
  
  Thanks,
  Richard.
  
   Honza
   
 * lto-streamer-out.c (hash_tree): Add map argument;
 remove special purpose hack for pointer types.
 (visit): Update it.
 (hash_scc): Add iteration until equivalence classes stabilize.
   Index: lto-streamer-out.c
   ===
   --- lto-streamer-out.c(revision 212426)
   +++ lto-streamer-out.c(working copy)
   @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
/* Return a hash value for the tree T.  */

static hashval_t
   -hash_tree (struct streamer_tree_cache_d *cache, tree t)
   +hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
   hashval_t *map, tree t)
{
#define visit(SIBLING) \
  do { \
unsigned ix; \
   -if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, 
   ix)) \
   +if (!SIBLING) \
   +  v = iterative_hash_hashval_t (0, v); \
   +else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) \
  v = iterative_hash_hashval_t (streamer_tree_cache_get_hash 
   (cache, ix), v); \
   +else if (map) \
   +  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
   +else \
   +  v = iterative_hash_hashval_t (1, v); \
  } while (0)

  /* Hash TS_BASE.  */
   @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d

  if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
{
   -  if (POINTER_TYPE_P (t))
   - {
   -   /* For pointers factor in the pointed-to type recursively as
   -  we cannot recurse through only pointers.
   -  ???  We can generalize this by keeping track of the
   -  in-SCC edges for each tree (or arbitrarily the first
   -  such edge) and hashing that in in a second stage
   -  (instead of the quadratic mixing of the SCC we do now).  */
   -   hashval_t x;
   -   unsigned ix;
   -   if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), ix))
   - x = streamer_tree_cache_get_hash (cache, ix);
   -   else
   - x = hash_tree (cache, TREE_TYPE (t));
   -   v = iterative_hash_hashval_t (x, v);
   - }
   -  else if (code != IDENTIFIER_NODE)
   +  if (code != IDENTIFIER_NODE)
 visit (TREE_TYPE (t));
}

   @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons
static hashval_t
hash_scc (struct streamer_tree_cache_d *cache, unsigned first, 
   unsigned size)
{
   +  unsigned int last_classes = 0, iterations = 0;
   +
  /* Compute hash values for the SCC members.  */
  for (unsigned i = 0; i  size; ++i)
   -sccstack[first+i].hash = hash_tree (cache, 
   sccstack[first+i].t);
   +sccstack[first+i].hash = hash_tree (cache, NULL, 
   sccstack[first+i].t);

  if (size == 1)
return sccstack[first].hash;

   -  /* Sort the SCC of type, hash pairs so that when we mix in
   - all members of the SCC the hash value becomes independent on
   - the order we visited the SCC.  Produce hash of the whole 
   SCC as
   - combination of hashes of individual elements.  Then combine 
   that hash into
   - hash of each element, so othewise identically looking 
   elements from two
   - different SCCs are distinguished.  */
   -  qsort (sccstack[first], size, sizeof (scc_entry), 
   scc_entry_compare);
   -
   -  hashval_t scc_hash = sccstack[first].hash;
   -  for (unsigned i = 1; i  size; ++i)
   -scc_hash = iterative_hash_hashval_t (scc_hash,
   -  sccstack[first+i].hash);
   -  for (unsigned i = 0; i  size; ++i)
   -sccstack[first+i].hash = iterative_hash_hashval_t 
   (sccstack[first+i].hash, scc_hash);
   -  

Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
 On Fri, 11 Jul 2014, Jan Hubicka wrote:
 
Hmm, walking everything first and then starting streaming sounds bad 
idea
for locality.  Can't I just re-do the walk since I know what the SCC is?
I will read the code more after lunch.
   
   Might be possible with a 2nd SCC stack set up, yes.
  
  I set up hash_map to store the hash values anyway. What about sorting 
  using
  walk_tree with simple callback pushing the parameter on stack if it is in 
  the
  hash map and removing it from hash map?
  
  That way I do not need to duplicate the tricky SCC walking and probably 
  won't
  be more terrible in overhead than qsort.  Seems like a good idea at least 
  for a
  prototype, will try it out.
 
 Awww, but you get edges walked that are not walked in the DFS walk ...
 (basically you can end up walking the whole program IL if you are 
 unlucky).

Why? I just terminate the walk on every node that is not withing the SCC region.
Yes, I will unnecesarily walk all edges out of my SCC region, but I think we can
live with that (SCC walk would visit them anyway, too)

Honza
 
 RIchard.
 
  honza
   
   Richard.
   
Honza
 
 Richard.
 
  Honza
   
   Thanks,
   Richard.
   
Honza

* lto-streamer-out.c (hash_tree): Add map argument;
remove special purpose hack for pointer types.
(visit): Update it.
(hash_scc): Add iteration until equivalence classes 
stabilize.
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 212426)
+++ lto-streamer-out.c  (working copy)
@@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
 /* Return a hash value for the tree T.  */
 
 static hashval_t
-hash_tree (struct streamer_tree_cache_d *cache, tree t)
+hash_tree (struct streamer_tree_cache_d *cache, hash_maptree, 
hashval_t *map, tree t)
 {
 #define visit(SIBLING) \
   do { \
 unsigned ix; \
-if (SIBLING  streamer_tree_cache_lookup (cache, SIBLING, 
ix)) \
+if (!SIBLING) \
+  v = iterative_hash_hashval_t (0, v); \
+else if (streamer_tree_cache_lookup (cache, SIBLING, ix)) 
\
   v = iterative_hash_hashval_t 
(streamer_tree_cache_get_hash (cache, ix), v); \
+else if (map) \
+  v = iterative_hash_hashval_t (*map-get (SIBLING), v); \
+else \
+  v = iterative_hash_hashval_t (1, v); \
   } while (0)
 
   /* Hash TS_BASE.  */
@@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
 
   if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
 {
-  if (POINTER_TYPE_P (t))
-   {
- /* For pointers factor in the pointed-to type 
recursively as
-we cannot recurse through only pointers.
-???  We can generalize this by keeping track of the
-in-SCC edges for each tree (or arbitrarily the 
first
-such edge) and hashing that in in a second stage
-(instead of the quadratic mixing of the SCC we do 
now).  */
- hashval_t x;
- unsigned ix;
- if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), 
ix))
-   x = streamer_tree_cache_get_hash (cache, ix);
- else
-   x = hash_tree (cache, TREE_TYPE (t));
- v = iterative_hash_hashval_t (x, v);
-   }
-  else if (code != IDENTIFIER_NODE)
+  if (code != IDENTIFIER_NODE)
visit (TREE_TYPE (t));
 }
 
@@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, cons
 static hashval_t
 hash_scc (struct streamer_tree_cache_d *cache, unsigned first, 
unsigned size)
 {
+  unsigned int last_classes = 0, iterations = 0;
+
   /* Compute hash values for the SCC members.  */
   for (unsigned i = 0; i  size; ++i)
-sccstack[first+i].hash = hash_tree (cache, 
sccstack[first+i].t);
+sccstack[first+i].hash = hash_tree (cache, NULL, 
sccstack[first+i].t);
 
   if (size == 1)
 return sccstack[first].hash;
 
-  /* Sort the SCC of type, hash pairs so that when we mix in
- all members of the SCC the hash value becomes independent 
on
- the order we visited the SCC.  Produce hash of the whole 
SCC as
- combination of hashes of individual elements.  Then 
combine that hash into
- hash of each element, so othewise identically looking 
elements from 

Re: update address taken: don't drop clobbers

2014-07-11 Thread Michael Matz
Hi,

On Thu, 10 Jul 2014, Jeff Law wrote:

  The insight to note is, that undefined SSA names should really be
  coalesced with something (otherwise you lost an optimization opportunity),
  but it doesn't matter with _what_ each use of the undefined name is
  coalesced, you can even identify different uses of them with different SSA
  names (e.g. the LHS of each using stmt).  Requires some change in the
  order things are done in out-of-ssa.
 
 The last part is what I hinted might be problematical.  If some 
 undefined SSA_NAME appears on the RHS of two PHIs and we want to 
 coalesce that undefined SSA_NAME with the LHS of each of those PHIs, 
 then the LHS of those two PHIs must coalesce as well.  At least that's 
 my recollection of how all that stuff worked.

Only with the usual definition of coalescing (being transitive).  For 
undefined SSA names the definition can be mended.

 It was that realization that made me wonder if we should have a unique 
 SSA_NAME at each undefined use point.

It's easier to implicitely regard every individual use of an undefined SSA 
name as a unique name in coalescing I think (instead of having it be a 
unique name explicitely).  That is, given:

bb1:
  x_1 = PHI a_2, b_3(UND)
  ...

bb2:
  x_4 = PHI y_5, b_3(UND)
  ...

There is no reason to not regard the two uses of b_3 as separate and 
identify the first with x_1 and the second with x_2, _without_ coalescing 
x_1 and x_2.  But yes, this doesn't fit readily into the normal coalescing 
merge-find framework, but rather would have to be something after 
coalescing when writing out-of-ssa (whenever an undefined use is rewritten 
just take a random other fitting variable).


Ciao,
Michael.


Re: Optimize type streaming

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Jan Hubicka wrote:

  On Fri, 11 Jul 2014, Jan Hubicka wrote:
  
 Hmm, walking everything first and then starting streaming sounds bad 
 idea
 for locality.  Can't I just re-do the walk since I know what the SCC 
 is?
 I will read the code more after lunch.

Might be possible with a 2nd SCC stack set up, yes.
   
   I set up hash_map to store the hash values anyway. What about sorting 
   using
   walk_tree with simple callback pushing the parameter on stack if it is in 
   the
   hash map and removing it from hash map?
   
   That way I do not need to duplicate the tricky SCC walking and probably 
   won't
   be more terrible in overhead than qsort.  Seems like a good idea at least 
   for a
   prototype, will try it out.
  
  Awww, but you get edges walked that are not walked in the DFS walk ...
  (basically you can end up walking the whole program IL if you are 
  unlucky).
 
 Why? I just terminate the walk on every node that is not withing the SCC 
 region.
 Yes, I will unnecesarily walk all edges out of my SCC region, but I think we 
 can
 live with that (SCC walk would visit them anyway, too)

Ah, ok.  Well, let's hope walk_tree walks all edges the DFS walk walks ;)
A quick look tells me it doesn't walk DECL_VINDEX or
DECL_FUNCTION_PERSONALITY for example.

Richard.

 Honza
  
  RIchard.
  
   honza

Richard.

 Honza
  
  Richard.
  
   Honza

Thanks,
Richard.

 Honza
 
   * lto-streamer-out.c (hash_tree): Add map argument;
   remove special purpose hack for pointer types.
   (visit): Update it.
   (hash_scc): Add iteration until equivalence classes 
 stabilize.
 Index: lto-streamer-out.c
 ===
 --- lto-streamer-out.c(revision 212426)
 +++ lto-streamer-out.c(working copy)
 @@ -690,13 +712,19 @@ DFS_write_tree_body (struct output_block
  /* Return a hash value for the tree T.  */
  
  static hashval_t
 -hash_tree (struct streamer_tree_cache_d *cache, tree t)
 +hash_tree (struct streamer_tree_cache_d *cache, 
 hash_maptree, hashval_t *map, tree t)
  {
  #define visit(SIBLING) \
do { \
  unsigned ix; \
 -if (SIBLING  streamer_tree_cache_lookup (cache, 
 SIBLING, ix)) \
 +if (!SIBLING) \
 +  v = iterative_hash_hashval_t (0, v); \
 +else if (streamer_tree_cache_lookup (cache, SIBLING, 
 ix)) \
v = iterative_hash_hashval_t 
 (streamer_tree_cache_get_hash (cache, ix), v); \
 +else if (map) \
 +  v = iterative_hash_hashval_t (*map-get (SIBLING), v); 
 \
 +else \
 +  v = iterative_hash_hashval_t (1, v); \
} while (0)
  
/* Hash TS_BASE.  */
 @@ -896,23 +924,7 @@ hash_tree (struct streamer_tree_cache_d
  
if (CODE_CONTAINS_STRUCT (code, TS_TYPED))
  {
 -  if (POINTER_TYPE_P (t))
 - {
 -   /* For pointers factor in the pointed-to type 
 recursively as
 -  we cannot recurse through only pointers.
 -  ???  We can generalize this by keeping track of the
 -  in-SCC edges for each tree (or arbitrarily the 
 first
 -  such edge) and hashing that in in a second stage
 -  (instead of the quadratic mixing of the SCC we do 
 now).  */
 -   hashval_t x;
 -   unsigned ix;
 -   if (streamer_tree_cache_lookup (cache, TREE_TYPE (t), 
 ix))
 - x = streamer_tree_cache_get_hash (cache, ix);
 -   else
 - x = hash_tree (cache, TREE_TYPE (t));
 -   v = iterative_hash_hashval_t (x, v);
 - }
 -  else if (code != IDENTIFIER_NODE)
 +  if (code != IDENTIFIER_NODE)
   visit (TREE_TYPE (t));
  }
  
 @@ -1122,28 +1156,78 @@ scc_entry_compare (const void *p1_, 
 cons
  static hashval_t
  hash_scc (struct streamer_tree_cache_d *cache, unsigned 
 first, unsigned size)
  {
 +  unsigned int last_classes = 0, iterations = 0;
 +
/* Compute hash values for the SCC members.  */
for (unsigned i = 0; i  size; ++i)
 -sccstack[first+i].hash = hash_tree (cache, 
 sccstack[first+i].t);
 +sccstack[first+i].hash = hash_tree (cache, NULL, 
 sccstack[first+i].t);
  
if (size == 1)
  return sccstack[first].hash;
  
 -  /* Sort the SCC of type, hash pairs so that when we mix in
 - 

Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Jan Hubicka
 
 Well, just make them regular (anonymous) VAR_DECLs then ... (the fact
 that a CONST_DECL is anonymous is probably the only real difference - 
 and that they are mergeable by content).

Something like that, perhaps. Plan to do that incrementally - having them in
symbol tabel first is an important step. There is also an option to update
CONST_DECL into VAR_DECL when it is being turned into hidden.

Currently things are bit inflexible because we still make difference between
const decl and var decl in tree representation.  Once I finish my transition
to push out DECL_WITH_VIS and DECL_NON_COMMON fields, we can turn CONST_DECL
into VAR_DECL with special flag saying that symbol name/address value doesn't
matter.

This and, of course, cleaning up constpool mess can get me occupised for months
;)

Honza


Re: Optimize type streaming

2014-07-11 Thread Jan Hubicka
 
 Ah, ok.  Well, let's hope walk_tree walks all edges the DFS walk walks ;)
 A quick look tells me it doesn't walk DECL_VINDEX or
 DECL_FUNCTION_PERSONALITY for example.

I guess it should - will try patch adding that :) Both seems like just
ommisions - these fileds were added quite recently (moved from elsewhere) and
probably walk tree was not updated and no current user cares.

DECL_VINDEX is a strange animal, since it always points to INTEGER_CST in well
formed IL.  I wanted to turn it into integer, but of course both C++ and Java
temporarily sets it into TREE_LIST.  We probably could stream it as integer
saving the walk.

Those two should not be in-SCC edges.  I can easily match the two stacks
for size and see if I find something catastrophical.

Honza


Fix PR61772: ifcvt removing asm volatile gotos

2014-07-11 Thread Michael Matz
Hi,

our kernel guys saw this problem when they used an asm goto to implement a 
conditional where the then-block was empty.  RTL ifcvt happily goes on and 
removes the whole block and the jump instruction, even though it has 
side-effects (as marked with the volatility).  Patch below fixes it by 
checking that it's really just a jump.  Regstrapped on trunk.  Okay for 
trunk, 4.9, 4.8?


Ciao,
Michael.
PR rtl-optimization/61772
* ifcvt.c (dead_or_predicable): Check jump to be free of side
effects.

testsuite/
PR rtl-optimization/61772
* gcc.dg/torture/pr61772.c: New test.

Index: ifcvt.c
===
--- ifcvt.c (revision 212430)
+++ ifcvt.c (working copy)
@@ -4138,6 +4138,8 @@ dead_or_predicable (basic_block test_bb,
 
   if (JUMP_P (end))
 {
+  if (!onlyjump_p (end))
+   return FALSE;
   if (head == end)
{
  head = end = NULL_RTX;
Index: testsuite/gcc.dg/torture/pr61772.c
===
--- testsuite/gcc.dg/torture/pr61772.c  (revision 0)
+++ testsuite/gcc.dg/torture/pr61772.c  (working copy)
@@ -0,0 +1,24 @@
+/* { dg-do compile } */
+/* { dg-final { scan-assembler-times XXX 2 } } */
+
+static inline __attribute__((always_inline)) int dec_and_test (int *i)
+{
+asm volatile goto (XXX %0, %l[cc_label]
+  : : m (*i) : memory : cc_label);
+return 0;
+cc_label:
+return 1;
+}
+extern int getit (int *);
+int f (int *i, int cond)
+{
+  if (cond) {
+  getit (0);
+  if (dec_and_test (i))
+   getit (i);
+  return 42;
+  }
+  if (dec_and_test (i))
+(void)1;
+  return getit (i);
+}


Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns

2014-07-11 Thread Kyrill Tkachov


On 10/07/14 22:53, Maxim Kuvyrkov wrote:

On Jul 10, 2014, at 8:00 PM, Kyrill Tkachov kyrylo.tkac...@arm.com wrote:


On 30/06/14 21:39, Jeff Law wrote:

On 06/27/14 02:29, Kyrill Tkachov wrote:

Hi all,

This patch generalises the TARGET_MACRO_FUSION_PAIR_P hook usage to work
on more than just
compares and conditional branches for which it was initially designed
for (for x86).

There are some instructions in arm and aarch64 that can be fused
together when they're back to back in the instruction stream and I'd
like to use this hook to keep them together.

I'll post an implementation of TARGET_MACRO_FUSION_PAIR_P for arm and
aarch64 shortly...

Bootstrapped and tested on x86, aarch64-none-linux-gnu and
arm-none-linux-gnueabihf.

Ok for trunk?

The patch looks good to me, but some cleanup suggestions below.


Thanks, here's an updated patch.
How's this?

2014-07-11  Ramana Radhakrishnan ramana.radhakrish...@arm.com
Kyrylo Tkachov  kyrylo.tkac...@arm.com

* sched-deps.c (try_group_insn): Generalise macro fusion hook usage
to any two insns.  Update comment.  Rename to sched_macro_fuse_insns.
(sched_analyze_insn): Update use of try_group_insn to
sched_macro_fuse_insns.
* config/i386/i386.c (ix86_macro_fusion_pair_p): Reject 2nd arguments
that are not conditional jumps.





commit e36b8977738dbe3f63445199710ca627ab37e243
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:41:41 2014 +0100

 [sched-deps] Generalise macro fusion hook usage

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 8046c67..7dd2ce5 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25817,6 +25817,9 @@ ix86_macro_fusion_pair_p (rtx condgen, rtx condjmp)
rtx compare_set = NULL_RTX, test_if, cond;
rtx alu_set = NULL_RTX, addr = NULL_RTX;
  
+  if (!any_condjump_p (condjmp))

+return false;
+
if (get_attr_type (condgen) != TYPE_TEST
 get_attr_type (condgen) != TYPE_ICMP
 get_attr_type (condgen) != TYPE_INCDEC
diff --git a/gcc/sched-deps.c b/gcc/sched-deps.c
index 7cafc8b..c01a8a6 100644
--- a/gcc/sched-deps.c
+++ b/gcc/sched-deps.c
@@ -2820,35 +2820,48 @@ sched_analyze_2 (struct deps_desc *deps, rtx x, rtx 
insn)
  sched_deps_info-finish_rhs ();
  }
  
-/* Try to group comparison and the following conditional jump INSN if

-   they're already adjacent. This is to prevent scheduler from scheduling
-   them apart.  */
+/* Try to group two fuseable insns together to prevent scheduler
+   from scheduling them apart.  */
  
  static void

  try_group_insn (rtx insn)

Please rename try_group_insn to sched_macro_fuse_insns.  The call is predicated 
to try_group_insn is predicated on targetm.sched.macro_fusion_p, so this code 
will not be used for any other kinds of fusion -- might as well just state that 
in the name,.


  {
-  unsigned int condreg1, condreg2;
-  rtx cc_reg_1;
rtx prev;
  
-  if (!any_condjump_p (insn))

+  if (!targetm.sched.macro_fusion_p ())
  return;

This is a no-op since there is a check on the upper level.  Please remove.

  
-  targetm.fixed_condition_code_regs (condreg1, condreg2);

-  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
-  prev = prev_nonnote_nondebug_insn (insn);
-  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
-  || !prev
-  || !modified_in_p (cc_reg_1, prev))
-return;
+  if (any_condjump_p (insn))
+{
+  unsigned int condreg1, condreg2;
+  rtx cc_reg_1;
+  targetm.fixed_condition_code_regs (condreg1, condreg2);
+  cc_reg_1 = gen_rtx_REG (CCmode, condreg1);
+  prev = prev_nonnote_nondebug_insn (insn);
+  if (!reg_referenced_p (cc_reg_1, PATTERN (insn))
+ || !prev
+ || !modified_in_p (cc_reg_1, prev))
+   return;
  
-  /* Different microarchitectures support macro fusions for different

- combinations of insn pairs.  */
-  if (!targetm.sched.macro_fusion_pair_p
-  || !targetm.sched.macro_fusion_pair_p (prev, insn))
-return;
+  if (targetm.sched.macro_fusion_pair_p (prev, insn))
+SCHED_GROUP_P (insn) = 1;
+}
+  else
+{
+  rtx insn_set = single_set (insn);
+
+  prev = prev_nonnote_nondebug_insn (insn);
+  if (prev
+   insn_set
+   single_set (prev)
+   modified_in_p (SET_DEST (insn_set), prev)

Invert the check (as done in the upper if-clause) and cut it here.  Then you 
can use a single unified

if (targetm.sched.macro_fusion_pair_p (prev, insn))
   SCHED_GROUP_P (insn) = 1;

as the final statement of the function.

Thank you,

--
Maxim Kuvyrkov
www.linaro.org


commit cb0584229d9247df805df35dc4c5bffbb839d59f
Author: Kyrylo Tkachov kyrylo.tkac...@arm.com
Date:   Fri Jun 13 11:41:41 2014 +0100

[sched-deps] Generalise macro fusion hook usage

diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
index 1b5cbeb..6951ddd 100644
--- a/gcc/config/i386/i386.c
+++ b/gcc/config/i386/i386.c
@@ -25820,6 +25820,9 @@ ix86_macro_fusion_pair_p (rtx condgen, 

Re: [PATCH 2/2] Enable elimination of zext/sext

2014-07-11 Thread Richard Biener
On Fri, Jul 11, 2014 at 1:52 PM, Kugan
kugan.vivekanandara...@linaro.org wrote:
 Thanks foe the review and suggestions.

 On 10/07/14 22:15, Richard Biener wrote:
 On Mon, Jul 7, 2014 at 8:55 AM, Kugan kugan.vivekanandara...@linaro.org 
 wrote:

 [...]


 For -fwrapv, it is due to how PROMOTE_MODE is defined in arm back-end.
 In the test-case, a function (which has signed char return type) returns
 -1 in one of the paths. ARM PROMOTE_MODE changes that to 255 and relies
 on zero/sign extension generated by RTL again for the correct value. I
 saw some other targets also defining similar think. I am therefore
 skipping removing zero/sign extension if the ssa variable can be set to
 negative integer constants.

 Hm?  I think you should rather check that you are removing a
 sign-/zero-extension - PROMOTE_MODE tells you if it will sign- or
 zero-extend.  Definitely

 +  /* In some architectures, negative integer constants are truncated and
 + sign changed with target defined PROMOTE_MODE macro. This will impact
 + the value range seen here and produce wrong code if zero/sign 
 extensions
 + are eliminated. Therefore, return false if this SSA can have negative
 + integers.  */
 +  if (is_gimple_assign (stmt)
 +   (TREE_CODE_CLASS (gimple_assign_rhs_code (stmt)) == tcc_unary))
 +{
 +  tree rhs1 = gimple_assign_rhs1 (stmt);
 +  if (TREE_CODE (rhs1) == INTEGER_CST
 +  !TYPE_UNSIGNED (TREE_TYPE (ssa))
 +  tree_int_cst_compare (rhs1, integer_zero_node) == -1)
 +   return false;

 looks completely bogus ... (an unary op with a constant operand?)
 instead you want to do sth like

 I see that unary op with a constant operand is not possible in gimple.
 What I wanted to check here is any sort of constant loads; but seems
 that will not happen in gimple. Is PHI statements the only possible
 statements where we will end up with such constants.

No, in theory you can have

  ssa_1 = -1;

but that's not unary but a GIMPLE_SINGLE_RHS and thus
gimple_assign_rhs_code (stmt) == INTEGER_CST.

   mode = TYPE_MODE (TREE_TYPE (ssa));
   rhs_uns = TYPE_UNSIGNED (TREE_TYPE (ssa));
   PROMOTE_MODE (mode, rhs_uns, TREE_TYPE (ssa));

 instead of initializing rhs_uns from ssas type.  That is, if
 PROMOTE_MODE tells you to promote _not_ according to ssas sign then
 honor that.

 This is triggered in pr43017.c in function foo for arm-none-linux-gnueabi.

 where, the gimple statement that cause this looks like:
 .
   # _3 = PHI _17(7), -1(2)
 bb43:
   return _3;

 ARM PROMOTE_MODE changes the sign for integer constants only and hence
 looking at the variable with PROMOTE_MODE is not changing the sign in
 this case.

 #define PROMOTE_MODE(MODE, UNSIGNEDP, TYPE) \
   if (GET_MODE_CLASS (MODE) == MODE_INT \
GET_MODE_SIZE (MODE)  4)  \
 {   \
   if (MODE == QImode)   \
 UNSIGNEDP = 1;  \
   else if (MODE == HImode)  \
 UNSIGNEDP = 1;  \
   (MODE) = SImode;  \
 }

Where does it only apply for constants?  It applies to all QImode and
HImode entities.

 As for the -fno-strict-overflow case, if the variables overflows, in VRP
 dumps, I see +INF(OVF), but the value range stored in ssa has TYPE_MAX.
 We therefore should limit the comparison to (TYPE_MIN  VR_MIN  VR_MAX
  TYPE_MAX) instead of (TYPE_MIN = VR_MIN  VR_MAX = TYPE_MAX) when
 checking to be sure that this is not the overflowing case. Attached
 patch changes this.

 I don't think that's necessary - the overflow cases happen only when
 that overflow has undefined behavior, thus any valid program will have
 values = MAX.

 I see that you have now removed +INF(OVF). I will change it this way.

I have not removed anything, I just fixed a bug.

Richard.

 Thanks again,
 Kugan



Change an assignment to an assert in varpool

2014-07-11 Thread Bernd Schmidt
I noticed that we set node-definition = true in varpool_assemble_decl. 
The surrounding code suggests that we should only ever get there if 
definition is already true, so I changed it to an assert. The question 
is interesting for some modifications I'm making for ptx (which requires 
declarations for external variables in the assembly).


The only thing that was tripped by the assert was a variable created by 
asan. AFAICT the problem there is that asan calls varpool_assemble_decl 
when it should be calling varpool_finalize_decl.


Bootstrapped and tested on x86_64-linux, ok?


Bernd
	* asan.c (asan_finish_file): Use varpool_finalize_decl instead of
	varpool_assemble_decl.
	* varpool.c (varpool_assemble_decl): Assert that node-definition is
	true.

diff --git a/gcc/asan.c b/gcc/asan.c
index b9a4a91..0d78634 100644
--- a/gcc/asan.c
+++ b/gcc/asan.c
@@ -2595,7 +2595,7 @@ asan_finish_file (void)
   TREE_CONSTANT (ctor) = 1;
   TREE_STATIC (ctor) = 1;
   DECL_INITIAL (var) = ctor;
-  varpool_assemble_decl (varpool_node_for_decl (var));
+  varpool_finalize_decl (var);
 
   fn = builtin_decl_implicit (BUILT_IN_ASAN_REGISTER_GLOBALS);
   tree gcount_tree = build_int_cst (pointer_sized_int_node, gcount);
diff --git a/gcc/varpool.c b/gcc/varpool.c
index 79f07bf..a72fb22 100644
--- a/gcc/varpool.c
+++ b/gcc/varpool.c
@@ -473,7 +473,7 @@ varpool_assemble_decl (varpool_node *node)
 {
   assemble_variable (decl, 0, 1, 0);
   gcc_assert (TREE_ASM_WRITTEN (decl));
-  node-definition = true;
+  gcc_assert (node-definition);
   assemble_aliases (node);
   return true;
 }


[BUILDROBOT] gcov patch (was: r212448 - in /trunk: gcc/ChangeLog gcc/Makefile...)

2014-07-11 Thread Jan-Benedict Glaw
On Fri, 2014-07-11 05:48:08 -, x...@gcc.gnu.org x...@gcc.gnu.org wrote:
 Author: xur
 Date: Fri Jul 11 05:48:07 2014
 New Revision: 212448
 
 URL: https://gcc.gnu.org/viewcvs?rev=212448root=gccview=rev
 Log:
 2014-07-10  Rong Xu  x...@google.com
 
   Add gcov-tool: an offline gcda profile processing tool
   Support.
[...]

See eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639,
it breaks like this:

[...]
g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
-fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
-Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
-Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
-I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
-I/home/jbglaw/repos/gcc/gcc/../include 
-I/home/jbglaw/repos/gcc/gcc/../libcpp/include  
-I/home/jbglaw/repos/gcc/gcc/../libdecnumber 
-I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
-I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o gcov-tool.o -MT gcov-tool.o 
-MMD -MP -MF ./.deps/gcov-tool.TPo /home/jbglaw/repos/gcc/gcc/gcov-tool.c
/usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, 
gcov_info*)’:
/usr/include/sys/stat.h:323: error: too few arguments to function ‘int 
mkdir(const char*, __mode_t)’
/home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file
make[1]: *** [gcov-tool.o] Error 1
make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc'
make: *** [all-gcc] Error 2


MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
  Signature of:Lauf nicht vor Deinem Glück davon:
  the second  : Es könnte hinter Dir stehen!


signature.asc
Description: Digital signature


Re: [BUILDROBOT] gcov patch

2014-07-11 Thread Jan-Benedict Glaw
On Fri, 2014-07-11 15:03:06 +0200, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 See eg. http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639,
 it breaks like this:
 
 [...]
 g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
 -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
 -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H -I. 
 -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
 -I/home/jbglaw/repos/gcc/gcc/../include 
 -I/home/jbglaw/repos/gcc/gcc/../libcpp/include  
 -I/home/jbglaw/repos/gcc/gcc/../libdecnumber 
 -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
 -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o gcov-tool.o -MT 
 gcov-tool.o -MMD -MP -MF ./.deps/gcov-tool.TPo 
 /home/jbglaw/repos/gcc/gcc/gcov-tool.c
 /usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, 
 gcov_info*)’:
 /usr/include/sys/stat.h:323: error: too few arguments to function ‘int 
 mkdir(const char*, __mode_t)’
 /home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file
 make[1]: *** [gcov-tool.o] Error 1
 make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc'
 make: *** [all-gcc] Error 2

[...]

+#ifdef TARGET_POSIX_IO
+  if (mkdir (out, 0755) == -1  errno != EEXIST)
+#else
+  if (mkdir (out) == -1  errno != EEXIST)
+#endif


And with POSIX I/O, this should probably use the S_I* macros for the
mode bits?

MfG, JBG

-- 
  Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
 Signature of:Arroganz verkürzt fruchtlose Gespräche.
 the second  :   -- Jan-Benedict Glaw


signature.asc
Description: Digital signature


Re: [PATCH, AArch64, Testsuite] Specify -fno-use-caller-save for func-ret* tests

2014-07-11 Thread Yufeng Zhang
Hi Tom,

On 8 July 2014 20:45, Tom de Vries tom_devr...@mentor.com wrote:
 On 01-07-14 19:26, Jeff Law wrote:

 On 07/01/14 09:51, Yufeng Zhang wrote:

 Hi,

 This patch resolves a conflict between the aapcs64 test framework for
 func-ret tests and the optimization option -fuse-caller-save, which was
 enabled by default at -O1 or above recently.


 Minor detail: it's enabled by default at -O2 or above:
 ...
 { OPT_LEVELS_2_PLUS, OPT_fuse_caller_save, NULL, 1 },
 ...

I see. Thanks for correcting me.



 Basically, the test framework has an inline-assembly based mechanism in
 place which invokes the test facility function right on the return of a
 tested function.

  The compiler with -fuse-caller-save is unable to
 identify the unconventional call graph and carries out the optimization
 regardless.

 AFAIU, we're overwriting the return register to implement a function call at
 return in order to see the exact state of registers at return:

Yes, exactly.

 ...
 __attribute__ ((noinline)) unsigned char
 func_return_val_0 (int i, double d, unsigned char t)
 {
   asm (::r (i),r (d));
   asm volatile (mov %0, x30 : =r (saved_return_address));
   asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc));
   return t;
 }
 ...

 But we're not informing the compiler that a hidden function call takes
 place. This patch fixes that, and there's no need to disable
 fuse-caller-save.

 Tested with aarch64 build.  OK for trunk?

 Thanks,
 - Tom

 2014-07-08  Tom de Vries  t...@codesourcery.com

 * gcc.target/aarch64/aapcs64/aapcs64.exp
 (additional_flags_for_func_ret): Remove.
 (func-ret-*.c): Use additional_flags.
 * gcc.target/aarch64/aapcs64/abitest-2.h (FUNC_VAL_CHECK): Add missing
 call_used_regs clobbers.

 Index: gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp
 ===
 --- gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp (revision 212294)
 +++ gcc/testsuite/gcc.target/aarch64/aapcs64/aapcs64.exp (working copy)
 @@ -48,15 +48,11 @@ foreach src [lsort [glob -nocomplain $sr
  }

  # Test function return value.
 -#   Disable -fuse-caller-save to prevent the compiler from generating
 -#   conflicting code.
 -set additional_flags_for_func_ret $additional_flags
 -append additional_flags_for_func_ret  -fno-use-caller-save
  foreach src [lsort [glob -nocomplain $srcdir/$subdir/func-ret-*.c]] {
  if {[runtest_file_p $runtests $src]} {
  c-torture-execute [list $src \
  $srcdir/$subdir/abitest.S] \
 -$additional_flags_for_func_ret
 +$additional_flags
  }
  }

 Index: gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h
 ===
 --- gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h (revision 212294)
 +++ gcc/testsuite/gcc.target/aarch64/aapcs64/abitest-2.h (working copy)
 @@ -80,10 +80,18 @@ __attribute__ ((noinline)) type FUNC_NAM
 The previous approach of sequentially calling myfunc right after  
 \
 this function does not guarantee myfunc see the exact register  \
 content, as compiler may emit code in between the two calls,  \
 -   especially during the -O0 codegen.  */  \
 +   especially during the -O0 codegen.  \
 +   However, since we're doing a call, we need to clobber all call  \
 +   used regs.  */  \
  asm volatile (mov %0, x30 : =r (saved_return_address));  \
 -asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc));   \
 -return t;  \
 +asm volatile (mov x30, %0 : : r ((unsigned long long) myfunc) :  
 \
 +  x0, x1, x2, x3, x4, x5, x6, x7,  \
 +  x8, x9,x10, x11, x12, x13, x14, x15, \
 +  x16, x17, x18,  \
 +  v0, v1,v2,  v3,  v4,  v5,  v6,  v7,  \
 +  v16, v17, v18, v19, v20, v21, v22, v23, \
 +  v24, v25, v26, v27, v28, v29, v30, v31);\
 +return t;\
}
  #include TESTFILE

Your patch is probably OK (although I'm no longer in a position to
verify the code-gen by myself easily). I still prefer to have
-fuse-caller-save disabled for these tests in order to have a
strictly-conformed ABI environment for these ABI tests.

I'll leave the AArch64 maintainers to comment.

Thanks,
Yufeng


Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns

2014-07-11 Thread Alexander Monakov
On Fri, 11 Jul 2014, Kyrill Tkachov wrote:
 
 On 10/07/14 22:53, Maxim Kuvyrkov wrote:
  The patch looks good to me, but some cleanup suggestions below.
 
 Thanks, here's an updated patch.
 How's this?

You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the
first if branch, keeping only one SCHED_GROUP_P assignment at the end of the
function.

Alexander


Re: Change an assignment to an assert in varpool

2014-07-11 Thread Jan Hubicka
 I noticed that we set node-definition = true in
 varpool_assemble_decl. The surrounding code suggests that we should
 only ever get there if definition is already true, so I changed it
 to an assert. The question is interesting for some modifications I'm
 making for ptx (which requires declarations for external variables
 in the assembly).
 
 The only thing that was tripped by the assert was a variable created
 by asan. AFAICT the problem there is that asan calls
 varpool_assemble_decl when it should be calling
 varpool_finalize_decl.
 
 Bootstrapped and tested on x86_64-linux, ok?
 
 
 Bernd

   * asan.c (asan_finish_file): Use varpool_finalize_decl instead of
   varpool_assemble_decl.
   * varpool.c (varpool_assemble_decl): Assert that node-definition is
   true.

OK,
we used to call assemble_decl directly before the transition to varpool
was finished, seems I never got around adding an assert.

Thanks,
Honza
 
 diff --git a/gcc/asan.c b/gcc/asan.c
 index b9a4a91..0d78634 100644
 --- a/gcc/asan.c
 +++ b/gcc/asan.c
 @@ -2595,7 +2595,7 @@ asan_finish_file (void)
TREE_CONSTANT (ctor) = 1;
TREE_STATIC (ctor) = 1;
DECL_INITIAL (var) = ctor;
 -  varpool_assemble_decl (varpool_node_for_decl (var));
 +  varpool_finalize_decl (var);
  
fn = builtin_decl_implicit (BUILT_IN_ASAN_REGISTER_GLOBALS);
tree gcount_tree = build_int_cst (pointer_sized_int_node, gcount);
 diff --git a/gcc/varpool.c b/gcc/varpool.c
 index 79f07bf..a72fb22 100644
 --- a/gcc/varpool.c
 +++ b/gcc/varpool.c
 @@ -473,7 +473,7 @@ varpool_assemble_decl (varpool_node *node)
  {
assemble_variable (decl, 0, 1, 0);
gcc_assert (TREE_ASM_WRITTEN (decl));
 -  node-definition = true;
 +  gcc_assert (node-definition);
assemble_aliases (node);
return true;
  }



Re: [PATCH][sched-deps] Generalise usage of macro fusion to work on any two insns

2014-07-11 Thread Kyrill Tkachov


On 11/07/14 14:20, Alexander Monakov wrote:

On Fri, 11 Jul 2014, Kyrill Tkachov wrote:

On 10/07/14 22:53, Maxim Kuvyrkov wrote:

The patch looks good to me, but some cleanup suggestions below.

Thanks, here's an updated patch.
How's this?

You need to remove 'if (targetm. ...) SCHED_GROUP_P (insn) = 1;' from the
first if branch, keeping only one SCHED_GROUP_P assignment at the end of the
function.


Yes, your're right, I had missed that. Will do.

Thanks,
Kyrill



Alexander






Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts

2014-07-11 Thread Richard Biener
On Fri, 11 Jul 2014, Richard Biener wrote:

 On Thu, 10 Jul 2014, Jakub Jelinek wrote:
 
  On Thu, Jul 10, 2014 at 04:30:13PM +0200, Richard Biener wrote:
   Compromise hack below.  It simply avoids the transform for
   sources that c_strlen can compute a length of.  That fixes all
   strlenopt testcase apart from strlenopt-8.c which does
   memcpy (, flag ? a : b); which then still folds
   during gimplification.  I have XFAILed that.
   
   I've tried to comb my way through strlenopt but failed to quickly
   add support for generic stores (it has very rough support for
   char stores, see also PR61773).
   
   Does the below look ok?
  
  I can look at the tree-ssa-strlen.c stuff and removing the !c_strlen
  hack incrementally.
 
 Ok, I'll bootstrap/test the patch then and will commit if there are
 no issues left.

This is what I have applied.

Bootstrapped and tested on x86_64-unknown-linux-gnu.

Richard.

2014-07-11  Richard Biener  rguent...@suse.de

PR middle-end/61473
* builtins.c (fold_builtin_memory_op): Inline memory moves
that can be implemented with a single load followed by a
single store.
(c_strlen): Only warn when only_value is not 2.

* gcc.dg/memmove-4.c: New testcase.
* gcc.dg/strlenopt-8.c: XFAIL.
* gfortran.dg/coarray_lib_realloc_1.f90: Adjust.

Index: gcc/builtins.c
===
*** gcc/builtins.c.orig 2014-07-11 09:54:49.844932232 +0200
--- gcc/builtins.c  2014-07-11 13:15:35.297102917 +0200
*** get_pointer_alignment (tree exp)
*** 535,540 
--- 535,544 
 len = c_strlen (src, 1); if (len) expand_expr (len, ...); would not
 evaluate the side-effects.
  
+If ONLY_VALUE is two then we do not emit warnings about out-of-bound
+accesses.  Note that this implies the result is not going to be emitted
+into the instruction stream.
+ 
 The value returned is of type `ssizetype'.
  
 Unfortunately, string_constant can't access the values of const char
*** c_strlen (tree src, int only_value)
*** 606,612 
  
/* If the offset is known to be out of bounds, warn, and call strlen at
   runtime.  */
!   if (offset  0 || offset  max)
  {
   /* Suppress multiple warnings for propagated constant strings.  */
if (! TREE_NO_WARNING (src))
--- 610,617 
  
/* If the offset is known to be out of bounds, warn, and call strlen at
   runtime.  */
!   if (only_value != 2
!(offset  0 || offset  max))
  {
   /* Suppress multiple warnings for propagated constant strings.  */
if (! TREE_NO_WARNING (src))
*** fold_builtin_memory_op (location_t loc,
*** 8637,8647 
unsigned int src_align, dest_align;
tree off0;
  
!   if (endp == 3)
{
! src_align = get_pointer_alignment (src);
! dest_align = get_pointer_alignment (dest);
  
  /* Both DEST and SRC must be pointer types.
 ??? This is what old code did.  Is the testing for pointer types
 really mandatory?
--- 8642,8698 
unsigned int src_align, dest_align;
tree off0;
  
!   /* Build accesses at offset zero with a ref-all character type.  */
!   off0 = build_int_cst (build_pointer_type_for_mode (char_type_node,
!ptr_mode, true), 0);
! 
!   /* If we can perform the copy efficiently with first doing all loads
!  and then all stores inline it that way.  Currently efficiently
!means that we can load all the memory into a single integer
!register which is what MOVE_MAX gives us.  */
!   src_align = get_pointer_alignment (src);
!   dest_align = get_pointer_alignment (dest);
!   if (tree_fits_uhwi_p (len)
!  compare_tree_int (len, MOVE_MAX) = 0
! /* ???  Don't transform copies from strings with known length this
!confuses the tree-ssa-strlen.c.  This doesn't handle
!the case in gcc.dg/strlenopt-8.c which is XFAILed for that
!reason.  */
!  !c_strlen (src, 2))
{
! unsigned ilen = tree_to_uhwi (len);
! if (exact_log2 (ilen) != -1)
!   {
! tree type = lang_hooks.types.type_for_size (ilen * 8, 1);
! if (type
!  TYPE_MODE (type) != BLKmode
!  (GET_MODE_SIZE (TYPE_MODE (type)) * BITS_PER_UNIT
! == ilen * 8)
! /* If the pointers are not aligned we must be able to
!emit an unaligned load.  */
!  ((src_align = GET_MODE_ALIGNMENT (TYPE_MODE (type))
!   dest_align = GET_MODE_ALIGNMENT (TYPE_MODE (type)))
! || !SLOW_UNALIGNED_ACCESS (TYPE_MODE (type),
!MIN (src_align, dest_align
!   {
! tree srctype = type;

Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Roman Gareev
I've attached an improved version of the patch and the ChangeLog
entry. Are they fine for trunk?

--
   Cheers, Roman Gareev.
2014-07-11  Roman Gareev  gareevro...@gmail.com

gcc/
* graphite-isl-ast-to-gimple.c (gmp_cst_to_tree):
New function.
(graphite_verify): New function.
(ivs_params_clear): New function.
(gcc_expression_from_isl_ast_expr_id): New function.
(gcc_expression_from_isl_expr_int): New function.
(binary_op_to_tree): New function.
(ternary_op_to_tree): New function.
(unary_op_to_tree): New function.
(nary_op_to_tree): New function.
(gcc_expression_from_isl_expr_op): New function.
(gcc_expression_from_isl_expression): New function.
(graphite_create_new_loop): New function.
(translate_isl_ast_for_loop): New function.
(get_upper_bound): New function.
(graphite_create_new_loop_guard): New function.
(translate_isl_ast_node_for): New function.
(translate_isl_ast): New function.
(add_parameters_to_ivs_params): New function.
(scop_to_isl_ast): New parameter ip.
(graphite_regenerate_ast_isl): Add generation of GIMPLE code.
Index: gcc/graphite-isl-ast-to-gimple.c
===
--- gcc/graphite-isl-ast-to-gimple.c(revision 212450)
+++ gcc/graphite-isl-ast-to-gimple.c(working copy)
@@ -25,7 +25,14 @@
 #include isl/map.h
 #include isl/union_map.h
 #include isl/ast_build.h
+#if defined(__cplusplus)
+extern C {
 #endif
+#include isl/val_gmp.h
+#if defined(__cplusplus)
+}
+#endif
+#endif
 
 #include system.h
 #include coretypes.h
@@ -42,6 +49,9 @@
 #include cfgloop.h
 #include tree-data-ref.h
 #include sese.h
+#include tree-ssa-loop-manip.h
+#include tree-scalar-evolution.h
+#include map
 
 #ifdef HAVE_cloog
 #include graphite-poly.h
@@ -52,6 +62,515 @@
 
 static bool graphite_regenerate_error;
 
+/* We always use signed 128, until isl is able to give information about
+types  */
+
+static tree *graphite_expression_size_type = int128_integer_type_node;
+
+/* Converts a GMP constant VAL to a tree and returns it.  */
+
+static tree
+gmp_cst_to_tree (tree type, mpz_t val)
+{
+  tree t = type ? type : integer_type_node;
+  mpz_t tmp;
+
+  mpz_init (tmp);
+  mpz_set (tmp, val);
+  wide_int wi = wi::from_mpz (t, tmp, true);
+  mpz_clear (tmp);
+
+  return wide_int_to_tree (t, wi);
+}
+
+/* Verifies properties that GRAPHITE should maintain during translation.  */
+
+static inline void
+graphite_verify (void)
+{
+#ifdef ENABLE_CHECKING
+  verify_loop_structure ();
+  verify_loop_closed_ssa (true);
+#endif
+}
+
+/* IVS_PARAMS maps ISL's scattering and parameter identifiers
+   to corresponding trees.  */
+
+typedef std::mapisl_id *, tree ivs_params;
+
+/* Free all memory allocated for ISL's identifiers.  */
+
+void ivs_params_clear (ivs_params ip)
+{
+  std::mapisl_id *, tree::iterator it;
+  for (it = ip.begin ();
+   it != ip.end (); it++)
+{
+  isl_id_free (it-first);
+}
+}
+
+static tree
+gcc_expression_from_isl_expression (tree type, __isl_take isl_ast_expr *,
+   ivs_params ip);
+
+/* Return the tree variable that corresponds to the given isl ast identifier
+ expression (an isl_ast_expr of type isl_ast_expr_id).  */
+
+static tree
+gcc_expression_from_isl_ast_expr_id (__isl_keep isl_ast_expr *expr_id,
+ivs_params ip)
+{
+  gcc_assert (isl_ast_expr_get_type (expr_id) == isl_ast_expr_id);
+  isl_id *tmp_isl_id = isl_ast_expr_get_id (expr_id);
+  std::mapisl_id *, tree::iterator res;
+  res = ip.find (tmp_isl_id);
+  isl_id_free (tmp_isl_id);
+  gcc_assert (res != ip.end () 
+  Could not map isl_id to tree expression);
+  isl_ast_expr_free (expr_id);
+  return res-second;
+}
+
+/* Converts an isl_ast_expr_int expression E to a GCC expression tree of
+   type TYPE.  */
+
+static tree
+gcc_expression_from_isl_expr_int (tree type, __isl_take isl_ast_expr *expr)
+{
+  gcc_assert (isl_ast_expr_get_type (expr) == isl_ast_expr_int);
+  isl_val *val = isl_ast_expr_get_val (expr);
+  mpz_t val_mpz_t;
+  mpz_init (val_mpz_t);
+  tree res;
+  if (isl_val_get_num_gmp (val, val_mpz_t) == -1)
+res = NULL_TREE;
+  else
+res = gmp_cst_to_tree (type, val_mpz_t);
+  isl_val_free (val);
+  isl_ast_expr_free (expr);
+  mpz_clear (val_mpz_t);
+  return res;
+}
+
+/* Converts a binary isl_ast_expr_op expression E to a GCC expression tree of
+   type TYPE.  */
+
+static tree
+binary_op_to_tree (tree type, __isl_take isl_ast_expr *expr, ivs_params ip)
+{
+  isl_ast_expr *arg_expr = isl_ast_expr_get_op_arg (expr, 0);
+  tree tree_lhs_expr = gcc_expression_from_isl_expression (type, arg_expr, ip);
+  arg_expr = isl_ast_expr_get_op_arg (expr, 1);
+  tree tree_rhs_expr = gcc_expression_from_isl_expression (type, arg_expr, ip);
+  enum isl_ast_op_type expr_type = 

Re: [PATCH][RFC] Fix PR61473, inline small memcpy/memmove during tree opts

2014-07-11 Thread Jakub Jelinek
On Fri, Jul 11, 2014 at 03:36:15PM +0200, Richard Biener wrote:
 *** c_strlen (tree src, int only_value)
 *** 606,612 
   
 /* If the offset is known to be out of bounds, warn, and call strlen at
runtime.  */
 !   if (offset  0 || offset  max)
   {
/* Suppress multiple warnings for propagated constant strings.  */
 if (! TREE_NO_WARNING (src))
 --- 610,617 
   
 /* If the offset is known to be out of bounds, warn, and call strlen at
runtime.  */
 !   if (only_value != 2
 !(offset  0 || offset  max))
   {
/* Suppress multiple warnings for propagated constant strings.  */
 if (! TREE_NO_WARNING (src))

This looks wrong.  I'd say you only want to disable the warning for
only_value != 2, but still return NULL_TREE, otherwise the strlen call will
be out of bounds in the compiler.  So move only_value != 2 down to the
second if ?

Jakub


[PATCH][Ping v4] Add patch for debugging compiler ICEs

2014-07-11 Thread Maxim Ostapenko

Ping. Added small changes due to previous discussion in community.


 Original Message 
Subject:[PATCH][Ping v3] Add patch for debugging compiler ICEs
Date:   Fri, 04 Jul 2014 18:32:44 +0400
From:   Maxim Ostapenko m.ostape...@partner.samsung.com
To: GCC Patches gcc-patches@gcc.gnu.org
CC: 	Yury Gribov y.gri...@samsung.com, Slava Garbuzov 
v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com, 
tsaund...@mozilla.com, Maxim Ostapenko chefm...@gmail.com




Ping.


 Original Message 
Subject:[PATCH][Ping v2] Add patch for debugging compiler ICEs
Date:   Thu, 26 Jun 2014 19:46:08 +0400
From:   Maxim Ostapenko m.ostape...@partner.samsung.com
To: GCC Patches gcc-patches@gcc.gnu.org
CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov
v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com,
tsaund...@mozilla.com, Maxim Ostapenko chefm...@gmail.com



Ping.


 Original Message 
Subject:[PATCH][Ping] Add patch for debugging compiler ICEs
Date:   Wed, 11 Jun 2014 18:15:27 +0400
From:   Maxim Ostapenko m.ostape...@partner.samsung.com
To: GCC Patches gcc-patches@gcc.gnu.org
CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov
v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com,
tsaund...@mozilla.com, chefm...@gmail.com



Ping.


 Original Message 
Subject:[PATCH] Add patch for debugging compiler ICEs
Date:   Mon, 02 Jun 2014 19:21:14 +0400
From:   Maxim Ostapenko m.ostape...@partner.samsung.com
To: GCC Patches gcc-patches@gcc.gnu.org
CC: Yury Gribov y.gri...@samsung.com, Slava Garbuzov
v.garbu...@samsung.com, Jakub Jelinek ja...@redhat.com,
tsaund...@mozilla.com, chefm...@gmail.com



Hi,

A years ago there was a discussion 
(https://gcc.gnu.org/ml/gcc-patches/2004-01/msg02437.html) about debugging 
compiler ICEs that resulted in a patch from Jakub, which dumps
useful information into temporary file, but for some reasons this patch wasn't 
applied to trunk.

This is the resurrected patch with added GCC version information into generated 
repro file.

-Maxim










2014-06-02  Jakub Jelinek  ja...@redhat.com
	Max Ostapenko  m.ostape...@partner.samsung.com

	* diagnostic.c (diagnostic_action_after_output): Exit with
	ICE_EXIT_CODE instead of FATAL_EXIT_CODE.
	* gcc.c (execute): Don't free first string early, but at the end
	of the function.  Call retry_ice if compiler exited with
	ICE_EXIT_CODE.
	(main): Factor out common code.
	(print_configuration): New function.
	(try_fork): Likewise.
	(redirect_stdout_stderr): Likewise.
	(files_equal_p): Likewise.
	(check_repro): Likewise.
	(run_attempt): Likewise.
	(generate_preprocessed_code): Likewise.
	(append_text): Likewise.
	(try_generate_repro): Likewise.

diff --git a/gcc/diagnostic.c b/gcc/diagnostic.c
index 0cc7593..67b8c5b 100644
--- a/gcc/diagnostic.c
+++ b/gcc/diagnostic.c
@@ -492,7 +492,7 @@ diagnostic_action_after_output (diagnostic_context *context,
 	real_abort ();
   diagnostic_finish (context);
   fnotice (stderr, compilation terminated.\n);
-  exit (FATAL_EXIT_CODE);
+  exit (ICE_EXIT_CODE);
 
 default:
   gcc_unreachable ();
diff --git a/gcc/gcc.c b/gcc/gcc.c
index 6cd08ea..045363c 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -43,6 +43,13 @@ compilation is specified by a string called a spec.  */
 #include params.h
 #include vec.h
 #include filenames.h
+#ifdef HAVE_UNISTD_H
+#include unistd.h
+#endif
+
+#if !(defined (__MSDOS__) || defined (OS2) || defined (VMS))
+#define RETRY_ICE_SUPPORTED
+#endif
 
 /* By default there is no special suffix for target executables.  */
 /* FIXME: when autoconf is fixed, remove the host check - dj */
@@ -253,6 +260,9 @@ static void init_gcc_specs (struct obstack *, const char *, const char *,
 static const char *convert_filename (const char *, int, int);
 #endif
 
+#ifdef RETRY_ICE_SUPPORTED
+static void try_generate_repro (const char *prog, const char **argv);
+#endif
 static const char *getenv_spec_function (int, const char **);
 static const char *if_exists_spec_function (int, const char **);
 static const char *if_exists_else_spec_function (int, const char **);
@@ -2850,7 +2860,7 @@ execute (void)
 	}
 	}
 
-  if (string != commands[i].prog)
+  if (i  string != commands[i].prog)
 	free (CONST_CAST (char *, string));
 }
 
@@ -2903,6 +2913,16 @@ execute (void)
 	else if (WIFEXITED (status)
 		  WEXITSTATUS (status) = MIN_FATAL_STATUS)
 	  {
+#ifdef RETRY_ICE_SUPPORTED
+	/* For ICEs in cc1, cc1obj, cc1plus see if it is
+	   reproducible or not.  */
+	const char *p;
+	if (WEXITSTATUS (status) == ICE_EXIT_CODE
+		 i == 0
+		 (p = strrchr (commands[0].argv[0], DIR_SEPARATOR))
+		 ! strncmp (p + 1, cc1, 3))
+	  try_generate_repro (commands[0].prog, commands[0].argv);
+#endif
 	if (WEXITSTATUS (status)  greatest_status)
 	  greatest_status = WEXITSTATUS (status);
 	ret_code = -1;
@@ -2960,6 +2980,9 @@ execute (void)
 	  }
   

Re: [GSoC] generation of Gimple loops with empty bodies

2014-07-11 Thread Tobias Grosser

On 11/07/2014 15:41, Roman Gareev wrote:

I've attached an improved version of the patch and the ChangeLog
entry. Are they fine for trunk?


LGTM.

Thank you!
Tobias



Re: [PATCH v3 3/3] Port libstdc++ pretty-printers to Python 2 + Python 3

2014-07-11 Thread Jonathan Wakely

On 10/07/14 22:48 -0400, Samuel Bronson wrote:

PR libstdc++/58962
* python/libstdcxx/v6/printers.py: Port to Python 2+3
  (imap): New compat function.
  (izip): Likewise.
  (Iterator): New mixin to allow writing iterators in Python 3 style
  regardless of which version we're running on.
  [Python3] (long) New compat alias for int.
* testsuite/lib/gdb-test.exp: Port to Python 2+3 (print syntax)


Thanks, I've committed this with a few tiny changes to the comments
(s/compatability/compatibility/ and the comment about your GDB PR).

I'm not planning to backport this myself, but if the patch applies
cleanly or if someone else does the work then I have no objections to
it going on the release branches (after the branch reopens post-4.9.1
of course).

Thanks very much for doing this.


Re: [PATCH preprocessor/61389] - libcpp diagnostics shouldn't talk about ISO C99 for C++ input files

2014-07-11 Thread Dominique Dhumieres
 Here is a preprocessor patch to make error messages show C++11 
 and other relevant C++ language instead of C99.

 Built and tested on x86_64-linux.

This caused

FAIL: gcc.dg/cpp/macsyntx.c (test for excess errors)
FAIL: gcc.dg/cpp/macsyntx.c (test for excess errors)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 54)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 54)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 55)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 55)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 62)
FAIL: gcc.dg/cpp/macsyntx.c  (test for warnings, line 62)
FAIL: gcc.dg/cpp/sysmac1.c (test for excess errors)
FAIL: gcc.dg/cpp/sysmac1.c (test for excess errors)
FAIL: gcc.dg/cpp/sysmac1.c  (test for warnings, line 25)
FAIL: gcc.dg/cpp/sysmac1.c  (test for warnings, line 25)

(see https://gcc.gnu.org/ml/gcc-regression/2014-07/msg00162.html).

The warnings have to be adjusted as in

--- ../_clean/gcc/testsuite/gcc.dg/cpp/macsyntx.c   2014-05-10 
23:17:13.0 +0200
+++ gcc/testsuite/gcc.dg/cpp/macsyntx.c 2014-07-11 15:57:33.0 +0200
@@ -51,15 +51,15 @@ one(ichi\
 two(ichi)  /* { dg-error requires 2 } */
 var0() /* OK.  */
 var0(ichi) /* OK.  */
-var1() /* { dg-warning rest arguments to be used } */
-var1(ichi) /* { dg-warning rest arguments to be used } */
+var1() /* { dg-warning ISO C99 requires at least one 
argument } */
+var1(ichi) /* { dg-warning ISO C99 requires at least one 
argument } */
 var1(ichi, ni) /* OK.  */
 
 /* This tests two oddities of GNU rest args - omitting a comma is OK,
and backtracking a token on pasting an empty rest args.  */
 #define rest(x, y...) x ## y   /* { dg-warning ISO C } */
 rest(ichi,)/* OK.  */
-rest(ichi) /* { dg-warning rest arguments to be used } */
+rest(ichi) /* { dg-warning ISO C99 requires at least one 
argument } */
 #if 23 != rest(2, 3)   /* OK, no warning.  */
 #error 23 != 23 !!
 #endif
--- ../_clean/gcc/testsuite/gcc.dg/cpp/sysmac1.c2014-05-10 
23:17:12.0 +0200
+++ gcc/testsuite/gcc.dg/cpp/sysmac1.c  2014-07-11 15:59:18.0 +0200
@@ -22,5 +22,5 @@
 (str); /* { dg-warning used with arguments } */
 (sys_str); /* { dg-bogus used with arguments } */
 
-foo (one_arg); /* { dg-warning requires rest arguments } */
+foo (one_arg); /* { dg-warning ISO C99 requires at least one 
argument } */
 sys_foo (one_arg); /* { dg-bogus requires rest arguments } */

Dominique


Re: [BUILDROBOT] gcov patch

2014-07-11 Thread Rong Xu
Sorry. This code meant to work with the different mkdir api in
windows. I used wrong ifdef.

Here is the patch. OK for checkin?

Thanks,

-Rong

2014-07-11  Rong Xu  x...@google.com

* gcov-tool.c (gcov_output_files): Fix build error.

Index: gcov-tool.c
===
--- gcov-tool.c (revision 212448)
+++ gcov-tool.c (working copy)
@@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in
   /* Try to make directory if it doesn't already exist.  */
   if (access (out, F_OK) == -1)
 {
-#ifdef TARGET_POSIX_IO
-  if (mkdir (out, 0755) == -1  errno != EEXIST)
+#if !defined(_WIN32)
+  if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1  errno != EEXIST)
 #else
   if (mkdir (out) == -1  errno != EEXIST)
 #endif

On Fri, Jul 11, 2014 at 6:08 AM, Jan-Benedict Glaw jbg...@lug-owl.de wrote:
 On Fri, 2014-07-11 15:03:06 +0200, Jan-Benedict Glaw jbg...@lug-owl.de 
 wrote:
 See eg. 
 http://toolchain.lug-owl.de/buildbot/show_build_details.php?id=289639,
 it breaks like this:

 [...]
 g++ -c   -g -O2 -DIN_GCC  -DCROSS_DIRECTORY_STRUCTURE  -fno-exceptions 
 -fno-rtti -fasynchronous-unwind-tables -W -Wall -Wwrite-strings -Wcast-qual 
 -Wmissing-format-attribute -Woverloaded-virtual -pedantic -Wno-long-long 
 -Wno-variadic-macros -Wno-overlength-strings -fno-common  -DHAVE_CONFIG_H 
 -I. -I. -I/home/jbglaw/repos/gcc/gcc -I/home/jbglaw/repos/gcc/gcc/. 
 -I/home/jbglaw/repos/gcc/gcc/../include 
 -I/home/jbglaw/repos/gcc/gcc/../libcpp/include  
 -I/home/jbglaw/repos/gcc/gcc/../libdecnumber 
 -I/home/jbglaw/repos/gcc/gcc/../libdecnumber/dpd -I../libdecnumber 
 -I/home/jbglaw/repos/gcc/gcc/../libbacktrace-o gcov-tool.o -MT 
 gcov-tool.o -MMD -MP -MF ./.deps/gcov-tool.TPo 
 /home/jbglaw/repos/gcc/gcc/gcov-tool.c
 /usr/include/sys/stat.h: In function ‘void gcov_output_files(const char*, 
 gcov_info*)’:
 /usr/include/sys/stat.h:323: error: too few arguments to function ‘int 
 mkdir(const char*, __mode_t)’
 /home/jbglaw/repos/gcc/gcc/gcov-tool.c:96: error: at this point in file
 make[1]: *** [gcov-tool.o] Error 1
 make[1]: Leaving directory `/home/jbglaw/build/rl78-elf/build-gcc/gcc'
 make: *** [all-gcc] Error 2

 [...]

 +#ifdef TARGET_POSIX_IO
 +  if (mkdir (out, 0755) == -1  errno != EEXIST)
 +#else
 +  if (mkdir (out) == -1  errno != EEXIST)
 +#endif


 And with POSIX I/O, this should probably use the S_I* macros for the
 mode bits?

 MfG, JBG

 --
   Jan-Benedict Glaw  jbg...@lug-owl.de  +49-172-7608481
  Signature of:Arroganz verkürzt fruchtlose Gespräche.
  the second  :   -- Jan-Benedict Glaw


Re: Fix PR61772: ifcvt removing asm volatile gotos

2014-07-11 Thread Steven Bosscher
On Fri, Jul 11, 2014 at 2:34 PM, Michael Matz wrote:
 PR rtl-optimization/61772
 * ifcvt.c (dead_or_predicable): Check jump to be free of side
 effects.

This is OK.

Ciao!
Steven


Re: [PATCH preprocessor/61389] - libcpp diagnostics shouldn't talk about ISO C99 for C++ input files

2014-07-11 Thread Andreas Schwab
Tested on x86_64-suse-linux and installed as obvious.

Andreas.

PR preprocessor/61389
* gcc.dg/cpp/macsyntx.c: Update expected warnings.
* gcc.dg/cpp/sysmac1.c: Likewise.

diff --git a/gcc/testsuite/gcc.dg/cpp/macsyntx.c 
b/gcc/testsuite/gcc.dg/cpp/macsyntx.c
index 495921e..146dced 100644
--- a/gcc/testsuite/gcc.dg/cpp/macsyntx.c
+++ b/gcc/testsuite/gcc.dg/cpp/macsyntx.c
@@ -51,15 +51,15 @@ one(ichi\
 two(ichi)  /* { dg-error requires 2 } */
 var0() /* OK.  */
 var0(ichi) /* OK.  */
-var1() /* { dg-warning rest arguments to be used } */
-var1(ichi) /* { dg-warning rest arguments to be used } */
+var1() /* { dg-warning requires at least one } */
+var1(ichi) /* { dg-warning requires at least one } */
 var1(ichi, ni) /* OK.  */
 
 /* This tests two oddities of GNU rest args - omitting a comma is OK,
and backtracking a token on pasting an empty rest args.  */
 #define rest(x, y...) x ## y   /* { dg-warning ISO C } */
 rest(ichi,)/* OK.  */
-rest(ichi) /* { dg-warning rest arguments to be used } */
+rest(ichi) /* { dg-warning requires at least one } */
 #if 23 != rest(2, 3)   /* OK, no warning.  */
 #error 23 != 23 !!
 #endif
diff --git a/gcc/testsuite/gcc.dg/cpp/sysmac1.c 
b/gcc/testsuite/gcc.dg/cpp/sysmac1.c
index cc8469e..54f161e 100644
--- a/gcc/testsuite/gcc.dg/cpp/sysmac1.c
+++ b/gcc/testsuite/gcc.dg/cpp/sysmac1.c
@@ -22,5 +22,5 @@
 (str); /* { dg-warning used with arguments } */
 (sys_str); /* { dg-bogus used with arguments } */
 
-foo (one_arg); /* { dg-warning requires rest arguments } */
-sys_foo (one_arg); /* { dg-bogus requires rest arguments } */
+foo (one_arg); /* { dg-warning requires at least one } */
+sys_foo (one_arg); /* { dg-bogus requires at least one } */

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] Fix for PR 61561

2014-07-11 Thread Richard Earnshaw
On 19/06/14 21:19, Yuri Gribov wrote:
 Thirdly, we also need to fix movhi_bytes (for pre-v4) thumb2_movhi_insn
 (for thumb2) and, quite possibly, thumb1_movhi_insn (for thumb1).  There
 may well be additional changes for movqi variants as well.
 
 A general question: how should one test ARM backend patches? Is it
 enough to regtest ARM and Thumb2 on some modern Cortex? If not - what
 other variants should be tested?
 
 -Y
 

The answer to this question is that you have to be realistic.  It's
clearly impossible to test every permutation, but you should, ideally,
test a reasonable selection of permutations that might be affected by a
change.  Ie, if you make a change that is likely to affect thumb1 code
generation, it doesn't make sense to only test the arm or thumb2 code
generation (and vice versa).  The tricky thing here is that although
thumb1 and thumb2 sound quite similar in name, in practice thumb2 code
generation is closer to ARM than thumb1.

R.



[PATCH] AIX COMDAT

2014-07-11 Thread David Edelsohn
This patch enables COMDAT functionality on AIX.  It defines
MAKE_DECL_ONE_ONLY and removes the historical legacy link-line option
of -bnodelcsect.

-bnodelcsect instructs the AIX (garbage collecting) linker to preserve
un-referenced CSECTs (like ELF sections). In the distant past, GCC
emitted code that produced necessary sections that appeared not to be
referenced, and the linker deleted them.

GCC bootstraps and shows similar testsuite results with the patch. At
this point, the patch needs wider use to find any problems.  I plan to
install the patch shortly.

Thanks, David

Bootstrapped on powerpc-ibm-aix7.1.0.0

* config/rs6000/aix51.h (LINK_SPEC): Remove -bnodelcsect.
* config/rs6000/aix52.h (LINK_SPEC): Same.
* config/rs6000/aix53.h (LINK_SPEC): Same.
* config/rs6000/aix61.h (LINK_SPEC): Same.
* config/rs6000/xcoff.h (MAKE_DECL_ONE_ONLY): Define.

Index: aix53.h
===
--- aix53.h (revision 212454)
+++ aix53.h (working copy)
@@ -133,7 +133,7 @@
%{pthread:-lpthreads} -lc

 #undef LINK_SPEC
-#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\
+#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\
%{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\
%{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\
%{mpe:-binitfini:poe_remote_main}
Index: aix51.h
===
--- aix51.h (revision 212454)
+++ aix51.h (working copy)
@@ -121,7 +121,7 @@
%{pthread:-lpthreads} -lc

 #undef LINK_SPEC
-#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\
+#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\
%{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\
%{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\
%{mpe:-binitfini:poe_remote_main}
Index: xcoff.h
===
--- xcoff.h (revision 212454)
+++ xcoff.h (working copy)
@@ -309,3 +309,6 @@
than in the .eh_frame section.  We do this because the AIX linker
would otherwise garbage collect these sections.  */
 #define EH_FRAME_IN_DATA_SECTION 1
+
+#define MAKE_DECL_ONE_ONLY(DECL) (DECL_WEAK (DECL) = 1)
+
Index: aix52.h
===
--- aix52.h (revision 212454)
+++ aix52.h (working copy)
@@ -133,7 +133,7 @@
%{pthread:-lpthreads} -lc

 #undef LINK_SPEC
-#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\
+#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\
%{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\
%{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\
%{mpe:-binitfini:poe_remote_main}
Index: aix61.h
===
--- aix61.h (revision 212454)
+++ aix61.h (working copy)
@@ -156,7 +156,7 @@
%{pthread:-lpthreads} -lc

 #undef LINK_SPEC
-#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro} -bnodelcsect\
+#define LINK_SPEC -bpT:0x1000 -bpD:0x2000 %{!r:-btextro}\
%{static:-bnso %(link_syscalls) } %{shared:-bM:SRE %{!e:-bnoentry}}\
%{!maix64:%{!shared:%{g*: %(link_libg) }}} %{maix64:-b64}\
%{mpe:-binitfini:poe_remote_main}


Re: [BUILDROBOT] gcov patch

2014-07-11 Thread Jan Hubicka
 Sorry. This code meant to work with the different mkdir api in
 windows. I used wrong ifdef.
 
 Here is the patch. OK for checkin?
OK. I also see the following with LTO bootstrap:
../../gcc/../libgcc/libgcov-util.c:41:24: error: type of �gcov_max_filename� 
does not match original declaration [-Werror]
 extern gcov_unsigned_t gcov_max_filename;
^
../../gcc/../libgcc/libgcov-driver.c:88:8: note: previously declared here
 size_t gcov_max_filename = 0;
Probably both can be size_t?

Honza

 
 Thanks,
 
 -Rong
 
 2014-07-11  Rong Xu  x...@google.com
 
 * gcov-tool.c (gcov_output_files): Fix build error.
 
 Index: gcov-tool.c
 ===
 --- gcov-tool.c (revision 212448)
 +++ gcov-tool.c (working copy)
 @@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in
/* Try to make directory if it doesn't already exist.  */
if (access (out, F_OK) == -1)
  {
 -#ifdef TARGET_POSIX_IO
 -  if (mkdir (out, 0755) == -1  errno != EEXIST)
 +#if !defined(_WIN32)
 +  if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1  errno != EEXIST)

Sounds almost like something we could have libiberty glue for...


  #else
if (mkdir (out) == -1  errno != EEXIST)
  #endif
 


[c++-concepts] fixes

2014-07-11 Thread Andrew Sutton
Fix a couple of issues causing a test regression and boostrap build errors.

2014-07-11  Andrew Sutton  andrew.n.sut...@gmail.com
* gcc/cp/typeck.c (cp_build_function_call_vec): Emit diagnostic
at the input location.
* gcc/cp/error.c (dump_template_decl): Constraints are never invalid
in this way. Also fixes brace warning.

Andrew Sutton
Index: gcc/cp/typeck.c
===
--- gcc/cp/typeck.c	(revision 212456)
+++ gcc/cp/typeck.c	(working copy)
@@ -3463,8 +3463,8 @@ cp_build_function_call_vec (tree functio
 if (tree ci = get_constraints (function))
   if (!check_constraints (ci))
 {
+  error (cannot call function %qD, function);
   location_t loc = DECL_SOURCE_LOCATION (function);
-  error_at (loc, cannot call function %qD, function);
   diagnose_constraints (loc, function, NULL_TREE);
   return error_mark_node;
 }
Index: gcc/cp/error.c
===
--- gcc/cp/error.c	(revision 212456)
+++ gcc/cp/error.c	(working copy)
@@ -1280,10 +1280,7 @@ dump_template_decl (cxx_pretty_printer *
 
   if (flag_concepts)
 if (tree ci = get_constraints (t))
-if (ci != error_mark_node)
-  pp_cxx_requires_clause (pp, CI_LEADING_REQS (ci));
-else
-  pp_cxx_ws_string (pp, invalid-constraints);
+  pp_cxx_requires_clause (pp, CI_LEADING_REQS (ci));
 
 	  pp_cxx_whitespace (pp);
 	}


Re: [PATCH v2 1/3] Make libstdc++ testsuite work with pre-color GCC versions again

2014-07-11 Thread Jonathan Wakely

On 10/07/14 22:48 -0400, Samuel Bronson wrote:


When I try to build  test just libstdc++, or to run the testsuite
from trunk against my installed libstdc++, the testsuite tries to pass
-fdiagnostics-color=never to the system GCC, which is too old to
know what that is.

Since I really just want to test a patch for the gdb pretty-printers,
and since evidently my machine is too puny to actually build GCC, this
is a bit problematic.

According to the documentation, setting GCC_COLORS to  in the
environment should be just as effective, while it clearly can't cause
older GCCs to freak out, so that's just what I've done.

(I've also taken the liberty of swapping the set ccflags and set
cxxflags lines here so that ccflags doesn't end up with two
-DLOCALEDIR flags.)

libstdc++-v3/

* testsuite/lib/libstdc++.exp (libstdc++_init): Set $GCC_COLORS=
 instead of insisting that GCC understand -fdiagnostics-color=never


Committed to trunk - thanks.


Re: [PATCH v2 2/3] libstdc++ testsuite: Turn off GDB's auto-load, list loaded libs

2014-07-11 Thread Jonathan Wakely

On 10/07/14 22:48 -0400, Samuel Bronson wrote:


We load our pretty-printers explicitly, and we shouldn't need any other
random -gdb.gdb or -gdb.py files from anywhere, so in this patch we turn
that off by running set auto-load no.

Also, run info share so that the list of loaded libraries ends up in
the logs for the GDB tests.

libstdc++-v3/

* testsuite/lib/gdb-test.exp (gdb-test): Turn off GDB's auto-load, list
  loaded libs


Committed to trunk - thanks.



Re: [PATCH, libstdc++] Add the logistic distribution as an extension

2014-07-11 Thread Ed Smith-Rowland

On 07/10/2014 06:16 AM, Paolo Carlini wrote:

.. I have another comment: are we sure the usual strategy:

  templatetypename _UniformRandomNumberGenerator
result_type
operator()(_UniformRandomNumberGenerator __urng)
{ return this-operator()(__urng, this-_M_param); }

doesn't make sense here too?

Paolo.


 Look OK to me too, but I would move both operator() out of line

 and definitely operator()(_UniformRandomNumberGenerator, const 
param_type) of von_mises_distribution.

Ouch - that on was huge huge!

So moved.  The logistic operator() bodies were small looking to me but 
they aren't one or two lines either.  And DRY.


Rebuilt and retested on x86_64-linux.
OK?


2014-07-11  Edward Smith-Rowland  3dw...@verizon.net

Add the logistic_distribution as an extension.
* include/ext/random: Add the logistic_distribution.
* include/ext/random.tcc: Add the logistic_distribution.
* testsuite/ext/random/logistic_distribution/cons/parms.cc: New.
* testsuite/ext/random/logistic_distribution/cons/default.cc: New.
* testsuite/ext/random/logistic_distribution/requirements/typedefs.cc:
New.
* testsuite/ext/random/logistic_distribution/operators/inequal.cc: New.
* testsuite/ext/random/logistic_distribution/operators/equal.cc: New.
* testsuite/ext/random/logistic_distribution/operators/serialize.cc:
New.

Index: include/ext/random
===
--- include/ext/random  (revision 212442)
+++ include/ext/random  (working copy)
@@ -2728,42 +2728,8 @@
   templatetypename _UniformRandomNumberGenerator
result_type
operator()(_UniformRandomNumberGenerator __urng,
-  const param_type __p)
-   {
- const result_type __pi
-   = __gnu_cxx::__math_constantsresult_type::__pi;
- std::__detail::_Adaptor_UniformRandomNumberGenerator, result_type
-   __aurng(__urng);
+  const param_type __p);
 
- result_type __f;
- while (1)
-   {
- result_type __rnd = std::cos(__pi * __aurng());
- __f = (result_type(1) + __p._M_r * __rnd) / (__p._M_r + __rnd);
- result_type __c = __p._M_kappa * (__p._M_r - __f);
-
- result_type __rnd2 = __aurng();
- if (__c * (result_type(2) - __c)  __rnd2)
-   break;
- if (std::log(__c / __rnd2) = __c - result_type(1))
-   break;
-   }
-
- result_type __res = std::acos(__f);
-#if _GLIBCXX_USE_C99_MATH_TR1
- __res = std::copysign(__res, __aurng() - result_type(0.5));
-#else
- if (__aurng()  result_type(0.5))
-   __res = -__res;
-#endif
- __res += __p._M_mu;
- if (__res  __pi)
-   __res -= result_type(2) * __pi;
- else if (__res  -__pi)
-   __res += result_type(2) * __pi;
- return __res;
-   }
-
   templatetypename _ForwardIterator,
   typename _UniformRandomNumberGenerator
void
@@ -3106,6 +3072,227 @@
   const __gnu_cxx::hypergeometric_distribution_UIntType __d2)
 { return !(__d1 == __d2); }
 
+  /**
+   * @brief A logistic continuous distribution for random numbers.
+   *
+   * The formula for the logistic probability density function is
+   * @f[
+   * p(x|\a,\b) = \frac{e^{(x - a)/b}}{b[1 + e^{(x - a)/b}]^2}
+   * @f]
+   * where @f$b  0@f$.
+   *
+   * The formula for the logistic probability function is
+   * @f[
+   * cdf(x|\a,\b) = \frac{e^{(x - a)/b}}{1 + e^{(x - a)/b}}
+   * @f]
+   * where @f$b  0@f$.
+   *
+   * table border=1 cellpadding=10 cellspacing=0
+   * caption align=topDistribution Statistics/caption
+   * trtdMean/tdtd@f$a@f$/td/tr
+   * trtdVariance/tdtd@f$b^2\pi^2/3@f$/td/tr
+   * trtdRange/tdtd@f$[0, \infty)@f$/td/tr
+   * /table
+   */
+  templatetypename _RealType = double
+class
+logistic_distribution
+{
+  static_assert(std::is_floating_point_RealType::value,
+   template argument not a floating point type);
+
+public:
+  /** The type of the range of the distribution. */
+  typedef _RealType result_type;
+  /** Parameter type. */
+  struct param_type
+  {
+   typedef logistic_distributionresult_type distribution_type;
+
+   param_type(result_type __a = result_type(0),
+  result_type __b = result_type(1))
+   : _M_a(__a), _M_b(__b)
+   {
+ _GLIBCXX_DEBUG_ASSERT(_M_b  result_type(0));
+   }
+
+   result_type
+   a() const
+   { return _M_a; }
+
+   result_type
+   b() const
+   { return _M_b; }
+
+   friend bool
+   operator==(const param_type __p1, const param_type __p2)
+   { return __p1._M_a == __p2._M_a
+   __p1._M_b == __p2._M_b; }
+
+  private:
+   void _M_initialize();
+
+   result_type _M_a;
+   result_type _M_b;
+  };
+
+  

Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Xinliang David Li
I wonder why. The struct definition for gcov_fn_info has not changed
in this patch.

David

On Fri, Jul 11, 2014 at 1:07 AM, Richard Biener
richard.guent...@gmail.com wrote:
 On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote:
 Here is the updated patch. The difference with patch set 3 is
 (1) use the gcov-counter.def for scaling operation.
 (2) fix wrong scaling function for time-profiler.

 passed with bootstrap, profiledboostrap and SPEC2006.

 One of the patches breaks bootstrap for me:

 /space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO
 C++ forbids zero-size array ‘ctrs’
 make[3]: *** [libgcov-util.o] Error 1

 host compiler is gcc 4.3.4

 Richard.

 Thanks,

 -Rong

 On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote:
 Hi, Honza,

 Please find the new patch in the attachment. This patch integrated
 your earlier suggestions. The noticeable changes are:
 (1) build specialized object for libgcov-driver.c and libgcov-merge.c
 and link into gcov-tool, rather including the source file.
 (2) split some gcov counter definition code to gcov-counter.def to
 avoid code duplication.
 (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
 code in libgcov-merge.c with minimal change.

 This patch does not address the suggestion of creating a new directory
 for libgcov. I agree with you that's a much better
 and cleaner structure we should go for. We can do that in follow-up patches.

 I tested this patch with boostrap and profiledbootstrap. Other tests
 are on-going.

 Thanks,

 -Rong

 On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote:
 GCOT_TOOL needs to use this function to read the string in gcda file
 to memory to construct gcov_info objects.
 As you noticed, gcov runtime does not need this interface. But
 gcov-tool links with gcov runtime and it also uses the function.
 We could make it available in gcov_runtime, but that will slightly
 increase the memory footprint.

 Yep, it is not really pretty. I wrote bellow some plan how things may be
 structured in more convenient way.  Any work in that direction would be 
 welcome.

 The planned new functions for trunk version is: (1) overlap score b/w
 two profiles (2) better dumping (top hot objects/function/counters)
 and statistics.
 Once this basic version is in, we can start to add the new functionality.

 Sounds good. I assume the autoFDO does not go via gcov tool but rather uses
 custom reader of profile data at GCC side?
 I wonder, are there any recent overview papers/slides/text of design of 
 AutoFDO
 and other features to be merged?
 I remember the talk from GCC Summit and I did read some of pre-LTO LIPO
 sources  papers, but it would be nice to have somethin up to date.

 libgcov-util.o is built in gcc/ directory, rather in libgcc.
 It's directly linked to gcov-tool.
 So libgcov-util.o is built for HOST, not TARGET.
 With makefile changes, we can built HOST version of libgcov-driver.o
 and libgcov-merge.o.
 I also need to make some static functions/variables public.

 I suppose that can go with IN_GCOV_TOOL ifdef.

 So we currently have basic gcov io handling in gcc/gcov-io.c that is 
 borrowed
 by libgcc/libgcov* code.  We also will get libgcov-util.c in libgcc 
 directory
 that is actually borrowed by by gcc/gcov-tool.c code.

 We now have one runtime using STDIO for streaming and kernel has custom 
 version
 that goes via /proc interface (last time I looked).  We added some 
 abstraction
 into libgcov-interface that is the part that relies on STDIO, partly via 
 gcov-io.c
 code and now we have in-memory handling code in libgcov-util.

 I guess it would make most sentse to put all the gcov code into a new 
 directory
 (libgcov) and make it stand-alone library that can be configured
 1) for stdio based runtime as we do not
 2) for runtime missing the interface and relyin on user providing it
 3) for use within gcov file manipulation tools with reorg of
 GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
 In a longer term, I would like to make FDO profiling intstrumentation to 
 happen
 at linktime. For that I need to make the instrumentation code (minimal 
 spaning
 tree  friends) to work w/o cgraph that would ideally be done in a shared
 implementation
  Won't this get wrong answer when counters[0] is not the same?
  I would expected the merging code to compare the counters first. 
  Similarly for delta counter.

 These *_op functions are for scaling only. So there is only one
 profile involved (thus there is no comparison).
 The merge handles are in libgcov-merge.c which have the code to handle
 mismatched profile targets.

 I see, OK then.
 
  Adding correct rounding may actually make difference for Martin's 
  startup time work.

 Do you mean to use something like in RDIV macro?

 Yes.

 Honza


Re: [PATCH, libstdc++] Add the logistic distribution as an extension

2014-07-11 Thread Paolo Carlini

Hi,

On 07/11/2014 05:38 PM, Ed Smith-Rowland wrote:

OK?
Ok, thanks, but please adjust the dates you have on the testcases to the 
date of the actual commit (I suppose today or tomorrow)


Thanks again!
Paolo.



Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Xinliang David Li
What is the macro to test POSIX IO on host? The guard uses
TARGET_POSIX_IO which is not right.

David

On Fri, Jul 11, 2014 at 1:12 AM, Christophe Lyon
christophe.l...@linaro.org wrote:
 On 11 July 2014 10:07, Richard Biener richard.guent...@gmail.com wrote:
 On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote:
 Here is the updated patch. The difference with patch set 3 is
 (1) use the gcov-counter.def for scaling operation.
 (2) fix wrong scaling function for time-profiler.

 passed with bootstrap, profiledboostrap and SPEC2006.

 One of the patches breaks bootstrap for me:

 /space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO
 C++ forbids zero-size array ‘ctrs’
 make[3]: *** [libgcov-util.o] Error 1

 host compiler is gcc 4.3.4

 Richard.


 On my side, commit 212448 breaks the cross-build of GCC for targets
 using newlib:
 * arm-none-eabi
 * aarch64-none-elf
 /usr/include/sys/stat.h: In function 8098void
 gcov_output_files(const char*, gcov_info*)8099:
 /usr/include/sys/stat.h:317: error: too few arguments to function
 8098int mkdir(const char*, __mode_t)8099
 /tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96:
 error: at this point in file
 make[2]: *** [gcov-tool.o] Error 1

 Christophe.

 Thanks,

 -Rong

 On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote:
 Hi, Honza,

 Please find the new patch in the attachment. This patch integrated
 your earlier suggestions. The noticeable changes are:
 (1) build specialized object for libgcov-driver.c and libgcov-merge.c
 and link into gcov-tool, rather including the source file.
 (2) split some gcov counter definition code to gcov-counter.def to
 avoid code duplication.
 (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
 code in libgcov-merge.c with minimal change.

 This patch does not address the suggestion of creating a new directory
 for libgcov. I agree with you that's a much better
 and cleaner structure we should go for. We can do that in follow-up 
 patches.

 I tested this patch with boostrap and profiledbootstrap. Other tests
 are on-going.

 Thanks,

 -Rong

 On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote:
 GCOT_TOOL needs to use this function to read the string in gcda file
 to memory to construct gcov_info objects.
 As you noticed, gcov runtime does not need this interface. But
 gcov-tool links with gcov runtime and it also uses the function.
 We could make it available in gcov_runtime, but that will slightly
 increase the memory footprint.

 Yep, it is not really pretty. I wrote bellow some plan how things may be
 structured in more convenient way.  Any work in that direction would be 
 welcome.

 The planned new functions for trunk version is: (1) overlap score b/w
 two profiles (2) better dumping (top hot objects/function/counters)
 and statistics.
 Once this basic version is in, we can start to add the new functionality.

 Sounds good. I assume the autoFDO does not go via gcov tool but rather 
 uses
 custom reader of profile data at GCC side?
 I wonder, are there any recent overview papers/slides/text of design of 
 AutoFDO
 and other features to be merged?
 I remember the talk from GCC Summit and I did read some of pre-LTO LIPO
 sources  papers, but it would be nice to have somethin up to date.

 libgcov-util.o is built in gcc/ directory, rather in libgcc.
 It's directly linked to gcov-tool.
 So libgcov-util.o is built for HOST, not TARGET.
 With makefile changes, we can built HOST version of libgcov-driver.o
 and libgcov-merge.o.
 I also need to make some static functions/variables public.

 I suppose that can go with IN_GCOV_TOOL ifdef.

 So we currently have basic gcov io handling in gcc/gcov-io.c that is 
 borrowed
 by libgcc/libgcov* code.  We also will get libgcov-util.c in libgcc 
 directory
 that is actually borrowed by by gcc/gcov-tool.c code.

 We now have one runtime using STDIO for streaming and kernel has custom 
 version
 that goes via /proc interface (last time I looked).  We added some 
 abstraction
 into libgcov-interface that is the part that relies on STDIO, partly via 
 gcov-io.c
 code and now we have in-memory handling code in libgcov-util.

 I guess it would make most sentse to put all the gcov code into a new 
 directory
 (libgcov) and make it stand-alone library that can be configured
 1) for stdio based runtime as we do not
 2) for runtime missing the interface and relyin on user providing it
 3) for use within gcov file manipulation tools with reorg of
 GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
 In a longer term, I would like to make FDO profiling intstrumentation to 
 happen
 at linktime. For that I need to make the instrumentation code (minimal 
 spaning
 tree  friends) to work w/o cgraph that would ideally be done in a shared
 implementation
  Won't this get wrong answer when counters[0] is not the same?
  I would expected the merging code to compare the counters first. 
  

Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Jakub Jelinek
On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote:
 I wonder why. The struct definition for gcov_fn_info has not changed
 in this patch.

Perhaps it has been used only in C until now?

Jakub


Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Xinliang David Li
right.

Rong, the fix would be just change ctr array size to 1. For each
function, there should be at least one kind of counters -- see the
assertion in build_fn_info_type.  There are some code that do
'sizeof(gcov_fn_info)' when computing heap size -- they can be
adjusted or leave it as it is (if not doing memcpy for the whole
array).

David

On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote:
 I wonder why. The struct definition for gcov_fn_info has not changed
 in this patch.

 Perhaps it has been used only in C until now?

 Jakub


Re: [Patch, avr] Add atmel ata5782 and ata5831 devices

2014-07-11 Thread Denis Chertykov
2014-07-11 12:50 GMT+04:00 S, Pitchumani pitchuman...@atmel.com:
 This patch adds supports for Atmel's ata5782 and ata5831 devices.

 These devices ISA matches with AVR5 architecture except that text section
 start for these two devices is 0x8000. By default 0x0 is used as the text
 start address. This patch adds a device attribute for text start and that
 is passed to Linker via -Ttext option.

 If OK, could someone commit please?

 Regards,
 Pitchumani

 2014-07-11  Pitchumani Sivanupandi  pitchuman...@atmel.com

 * config/avr/avr-arch.h (avr_mcu_t): Add text section start attribute.
 * config/avr/avr-devices.c (AVR_MCU): Same.
 (avr_mcu_types): add text start value to end of device list.
 * config/avr/avr-mcus.def: Add text section start for all devices.
 (ata5782): Add new avr5 device.
 (ata5831): Same.
 * config/avr/avr-tables.opt: Regenerate.
 * config/avr/avr.h: Add declaration for text section start handler.
 (EXTRA_SPEC_FUNCTIONS): Add text section start handler to SPEC functions.
 (LINK_SPEC): Include text section start handler to linker spec.
 * config/avr/driver-avr.c (avr_device_to_text_start): New function to
 pass -Ttext option to linker if the text section start for the device is
 not zero.
 * config/avr/t-multilib: Regenerate.
 * doc/avr-mmcu.texi: Regenerate.


Committed.

Denis.


Re: [patch 1/4] change specific int128 - generic intN

2014-07-11 Thread DJ Delorie

  PSImode is 20 bits, fits in a 20 bit register, and uses 20 bit operations.
 
 Then why do you need this change?

Because parts of the gcc code use the byte size instead of the bit
size, or round up, or assume powers-of-two sizes.

  -  TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
  +  TYPE_SIZE (type) = bitsize_int (GET_MODE_PRECISION (TYPE_MODE
  (type))); TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE
  (type))); break;
 
 What are GET_MODE_BITSIZE and GET_MODE_PRECISION for PSImode?

It *should* be 20 and 20 for msp430.  But GET_MODE_BITSIZE returns 32,
because it's a macro that does GET_MODE_SIZE * BITS_PER_UNIT, so it
cannot return 20.

  If a type is 17-20 bits, PSImode is chosen.  If it's 21 bits or
  larger, SImode is chosen.  If it's 16 or fewer bits, HImode is chosen.
 
 Size or precision?  That's the crux of the matter.

GCC typically uses size for fits in a tests.



Re: [BUILDROBOT] gcov patch

2014-07-11 Thread Rong Xu
On Fri, Jul 11, 2014 at 8:06 AM, Jan Hubicka hubi...@ucw.cz wrote:
 Sorry. This code meant to work with the different mkdir api in
 windows. I used wrong ifdef.

 Here is the patch. OK for checkin?
 OK. I also see the following with LTO bootstrap:
 ../../gcc/../libgcc/libgcov-util.c:41:24: error: type of �gcov_max_filename� 
 does not match original declaration [-Werror]
  extern gcov_unsigned_t gcov_max_filename;
 ^
 ../../gcc/../libgcc/libgcov-driver.c:88:8: note: previously declared here
  size_t gcov_max_filename = 0;
 Probably both can be size_t?

OK. I will change this too and submit.

Thanks for the quick review.

-Rong


 Honza


 Thanks,

 -Rong

 2014-07-11  Rong Xu  x...@google.com

 * gcov-tool.c (gcov_output_files): Fix build error.

 Index: gcov-tool.c
 ===
 --- gcov-tool.c (revision 212448)
 +++ gcov-tool.c (working copy)
 @@ -90,8 +90,8 @@ gcov_output_files (const char *out, struct gcov_in
/* Try to make directory if it doesn't already exist.  */
if (access (out, F_OK) == -1)
  {
 -#ifdef TARGET_POSIX_IO
 -  if (mkdir (out, 0755) == -1  errno != EEXIST)
 +#if !defined(_WIN32)
 +  if (mkdir (out, S_IRWXU | S_IRWXG | S_IRWXO) == -1  errno != EEXIST)

 Sounds almost like something we could have libiberty glue for...


  #else
if (mkdir (out) == -1  errno != EEXIST)
  #endif



Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Rong Xu
I should use _WIN32 macro. This code is for windows mkdir api.
I'll commit a patch for this shortly (approved by honza).

-Rong

On Fri, Jul 11, 2014 at 8:49 AM, Xinliang David Li davi...@google.com wrote:
 What is the macro to test POSIX IO on host? The guard uses
 TARGET_POSIX_IO which is not right.

 David

 On Fri, Jul 11, 2014 at 1:12 AM, Christophe Lyon
 christophe.l...@linaro.org wrote:
 On 11 July 2014 10:07, Richard Biener richard.guent...@gmail.com wrote:
 On Mon, May 5, 2014 at 7:17 PM, Rong Xu x...@google.com wrote:
 Here is the updated patch. The difference with patch set 3 is
 (1) use the gcov-counter.def for scaling operation.
 (2) fix wrong scaling function for time-profiler.

 passed with bootstrap, profiledboostrap and SPEC2006.

 One of the patches breaks bootstrap for me:

 /space/rguenther/src/svn/trunk/gcc/../libgcc/libgcov.h:184: error: ISO
 C++ forbids zero-size array ‘ctrs’
 make[3]: *** [libgcov-util.o] Error 1

 host compiler is gcc 4.3.4

 Richard.


 On my side, commit 212448 breaks the cross-build of GCC for targets
 using newlib:
 * arm-none-eabi
 * aarch64-none-elf
 /usr/include/sys/stat.h: In function 8098void
 gcov_output_files(const char*, gcov_info*)8099:
 /usr/include/sys/stat.h:317: error: too few arguments to function
 8098int mkdir(const char*, __mode_t)8099
 /tmp/1392958_1.tmpdir/aci-gcc-fsf/sources/gcc-fsf/gccsrc/gcc/gcov-tool.c:96:
 error: at this point in file
 make[2]: *** [gcov-tool.o] Error 1

 Christophe.

 Thanks,

 -Rong

 On Thu, May 1, 2014 at 3:37 PM, Rong Xu x...@google.com wrote:
 Hi, Honza,

 Please find the new patch in the attachment. This patch integrated
 your earlier suggestions. The noticeable changes are:
 (1) build specialized object for libgcov-driver.c and libgcov-merge.c
 and link into gcov-tool, rather including the source file.
 (2) split some gcov counter definition code to gcov-counter.def to
 avoid code duplication.
 (3) use a macro for gcov_read_counter(), so gcov-tool can use existing
 code in libgcov-merge.c with minimal change.

 This patch does not address the suggestion of creating a new directory
 for libgcov. I agree with you that's a much better
 and cleaner structure we should go for. We can do that in follow-up 
 patches.

 I tested this patch with boostrap and profiledbootstrap. Other tests
 are on-going.

 Thanks,

 -Rong

 On Wed, Apr 16, 2014 at 8:34 PM, Jan Hubicka hubi...@ucw.cz wrote:
 GCOT_TOOL needs to use this function to read the string in gcda file
 to memory to construct gcov_info objects.
 As you noticed, gcov runtime does not need this interface. But
 gcov-tool links with gcov runtime and it also uses the function.
 We could make it available in gcov_runtime, but that will slightly
 increase the memory footprint.

 Yep, it is not really pretty. I wrote bellow some plan how things may be
 structured in more convenient way.  Any work in that direction would be 
 welcome.

 The planned new functions for trunk version is: (1) overlap score b/w
 two profiles (2) better dumping (top hot objects/function/counters)
 and statistics.
 Once this basic version is in, we can start to add the new 
 functionality.

 Sounds good. I assume the autoFDO does not go via gcov tool but rather 
 uses
 custom reader of profile data at GCC side?
 I wonder, are there any recent overview papers/slides/text of design of 
 AutoFDO
 and other features to be merged?
 I remember the talk from GCC Summit and I did read some of pre-LTO LIPO
 sources  papers, but it would be nice to have somethin up to date.

 libgcov-util.o is built in gcc/ directory, rather in libgcc.
 It's directly linked to gcov-tool.
 So libgcov-util.o is built for HOST, not TARGET.
 With makefile changes, we can built HOST version of libgcov-driver.o
 and libgcov-merge.o.
 I also need to make some static functions/variables public.

 I suppose that can go with IN_GCOV_TOOL ifdef.

 So we currently have basic gcov io handling in gcc/gcov-io.c that is 
 borrowed
 by libgcc/libgcov* code.  We also will get libgcov-util.c in libgcc 
 directory
 that is actually borrowed by by gcc/gcov-tool.c code.

 We now have one runtime using STDIO for streaming and kernel has custom 
 version
 that goes via /proc interface (last time I looked).  We added some 
 abstraction
 into libgcov-interface that is the part that relies on STDIO, partly via 
 gcov-io.c
 code and now we have in-memory handling code in libgcov-util.

 I guess it would make most sentse to put all the gcov code into a new 
 directory
 (libgcov) and make it stand-alone library that can be configured
 1) for stdio based runtime as we do not
 2) for runtime missing the interface and relyin on user providing it
 3) for use within gcov file manipulation tools with reorg of
 GCC/gcov/gcov-dump/gcov-tool to all use the same low-level interfaces.
 In a longer term, I would like to make FDO profiling intstrumentation to 
 happen
 at linktime. For that I need to make the instrumentation code (minimal 
 spaning
 tree  friends) to 

Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Rong Xu
I did see the warning in the bootstrap, but it did not exit the build.
I thought it was ok.

I'll have a patch for this and send for review.

-Rong

On Fri, Jul 11, 2014 at 9:13 AM, Xinliang David Li davi...@google.com wrote:
 right.

 Rong, the fix would be just change ctr array size to 1. For each
 function, there should be at least one kind of counters -- see the
 assertion in build_fn_info_type.  There are some code that do
 'sizeof(gcov_fn_info)' when computing heap size -- they can be
 adjusted or leave it as it is (if not doing memcpy for the whole
 array).

 David

 On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote:
 I wonder why. The struct definition for gcov_fn_info has not changed
 in this patch.

 Perhaps it has been used only in C until now?

 Jakub


possible negative patch for gcc/tree-ssa-loop-im.c (or removal of an assert)

2014-07-11 Thread Gaius Mulley


Hello,

I've been looking through tree-ssa-loop-im.c (while hunting down a bug
in the modula-2 front end) and found a curiosity in
gcc/tree-ssa-loop-im.c.

It seems that there is dead code in function determine_max_movement
as mem_ref_in_stmt can never return NULL.


static mem_ref_p
mem_ref_in_stmt (gimple stmt)
{
  ...

  gcc_assert (ref != NULL);
  return ref;
}

so the patch below could logically be applied as the else statement is
currently unreachable.



--- tree-ssa-loop-im.c.orig 2014-07-11 16:54:41.0 +0100
+++ tree-ssa-loop-im.c  2014-07-11 16:55:38.0 +0100
@@ -798,21 +798,11 @@
 {
   mem_ref_p ref = mem_ref_in_stmt (stmt);
 
-  if (ref)
-   {
- lim_data-max_loop
- = outermost_indep_loop (lim_data-max_loop, loop, ref);
- if (!lim_data-max_loop)
-   return false;
-   }
-  else
-   {
- if ((val = gimple_vuse (stmt)) != NULL_TREE)
-   {
- if (!add_dependency (val, lim_data, loop, false))
-   return false;
-   }
-   }
+  gcc_assert (ref != NULL);
+  lim_data-max_loop
+   = outermost_indep_loop (lim_data-max_loop, loop, ref);
+  if (!lim_data-max_loop)
+   return false;
 }
 
   lim_data-cost += stmt_cost (stmt);


However my question is whether the assert in mem_ref_in_stmt is correct?
Since the author of determine_max_movement must have thought ref could
be NULL.

Anyhow, it seems that either the above patch should be applied or the
'gcc_assert (ref != NULL);' from mem_ref_p should be removed.

regards,
Gaius


Re: update address taken: don't drop clobbers

2014-07-11 Thread Jeff Law

On 07/11/14 06:05, Michael Matz wrote:

Hi,

On Thu, 10 Jul 2014, Jeff Law wrote:


The insight to note is, that undefined SSA names should really be
coalesced with something (otherwise you lost an optimization opportunity),
but it doesn't matter with _what_ each use of the undefined name is
coalesced, you can even identify different uses of them with different SSA
names (e.g. the LHS of each using stmt).  Requires some change in the
order things are done in out-of-ssa.


The last part is what I hinted might be problematical.  If some
undefined SSA_NAME appears on the RHS of two PHIs and we want to
coalesce that undefined SSA_NAME with the LHS of each of those PHIs,
then the LHS of those two PHIs must coalesce as well.  At least that's
my recollection of how all that stuff worked.


Only with the usual definition of coalescing (being transitive).  For
undefined SSA names the definition can be mended.
My recollection is the transitive nature of coalescing is baked into our 
implementation.  I'd be happy to be proved wrong :-)



Jeff



Patch to fix a bug in LRA inheritance

2014-07-11 Thread Vladimir Makarov
The following patch fixes a bug in LRA inheritance found on compiling a 
big file on s390x.  Unfortunately, the bug is very hard to reproduce and 
the test is too big and can not be included.  But even if it were 
included, checking a correct code generation would be a problem too.


The bug was in undoing inheritance transformation.  Not all inheritance 
pseudo occurrences were removed which resulted in reading a garbage from 
memory.


The patch was successfully bootstrapped and tested on x86-64.

Committed to the trunk as rev. 212464.

2014-07-11  Vladimir Makarov  vmaka...@redhat.com

* lra-constraints.c (remove_inheritance_pseudos): Process
destination pseudo too.

Index: lra-constraints.c
===
--- lra-constraints.c   (revision 212460)
+++ lra-constraints.c   (working copy)
@@ -5697,6 +5697,20 @@ remove_inheritance_pseudos (bitmap remov
SUBREG_REG (SET_SRC (set)) = SET_SRC (prev_set);
  else
SET_SRC (set) = SET_SRC (prev_set);
+ /* As we are finishing with processing the insn
+here, check the destination too as it might
+inheritance pseudo for another pseudo.  */
+ if (bitmap_bit_p (remove_pseudos, dregno)
+  bitmap_bit_p (lra_inheritance_pseudos, dregno)
+  (restore_regno
+ = lra_reg_info[dregno].restore_regno) = 0)
+   {
+ if (GET_CODE (SET_DEST (set)) == SUBREG)
+   SUBREG_REG (SET_DEST (set))
+ = regno_reg_rtx[restore_regno];
+ else
+   SET_DEST (set) = regno_reg_rtx[restore_regno];
+   }
  lra_push_insn_and_update_insn_regno_info (curr_insn);
  lra_set_used_insn_alternative_by_uid
(INSN_UID (curr_insn), -1);


Re: possible negative patch for gcc/tree-ssa-loop-im.c (or removal of an assert)

2014-07-11 Thread Gaius Mulley
Gaius Mulley gaius.mul...@southwales.ac.uk writes:

 Hello,

 I've been looking through tree-ssa-loop-im.c (while hunting down a bug
 in the modula-2 front end) and found a curiosity in
 gcc/tree-ssa-loop-im.c.

 It seems that there is dead code in function determine_max_movement
 as mem_ref_in_stmt can never return NULL.


 static mem_ref_p
 mem_ref_in_stmt (gimple stmt)
 {
   ...

   gcc_assert (ref != NULL);
   return ref;
 }

 so the patch below could logically be applied as the else statement is
 currently unreachable.



 --- tree-ssa-loop-im.c.orig   2014-07-11 16:54:41.0 +0100
 +++ tree-ssa-loop-im.c2014-07-11 16:55:38.0 +0100
 @@ -798,21 +798,11 @@
  {
mem_ref_p ref = mem_ref_in_stmt (stmt);
  
 -  if (ref)
 - {
 -   lim_data-max_loop
 -   = outermost_indep_loop (lim_data-max_loop, loop, ref);
 -   if (!lim_data-max_loop)
 - return false;
 - }
 -  else
 - {
 -   if ((val = gimple_vuse (stmt)) != NULL_TREE)
 - {
 -   if (!add_dependency (val, lim_data, loop, false))
 - return false;
 - }
 - }
 +  gcc_assert (ref != NULL);
 +  lim_data-max_loop
 + = outermost_indep_loop (lim_data-max_loop, loop, ref);
 +  if (!lim_data-max_loop)
 + return false;
  }
  
lim_data-cost += stmt_cost (stmt);


 However my question is whether the assert in mem_ref_in_stmt is correct?
 Since the author of determine_max_movement must have thought ref could
 be NULL.

 Anyhow, it seems that either the above patch should be applied or the
 'gcc_assert (ref != NULL);' from mem_ref_p should be removed.

apologies, this should read:

Anyhow, it seems that either the above patch should be applied or the
'gcc_assert (ref != NULL);' from mem_ref_in_stmt should be removed.


 regards,
 Gaius


Fix ICE in ipa-devirt while building firefox

2014-07-11 Thread Jan Hubicka
Hi,
Firefox build ICEs in ipa-devirt (types_same_for_odr) not being able
to establish ODR equivalency for non-polymorphic types.  This is because
ipa-prop and ipa-cp does call get_binfo_at_offset where it really wants
to propagate on types and offsets.  Before this is fixed this patch avoids
the ICE by not doing the propagation when we don't know what to do.

Bootstrapped/regtested x86_64-linux, comitted.

Honza

* ipa-prop.c (ipa_binfo_from_known_type_jfunc): In LTO do not walk
non-polymorphic types.
* ipa-cp.c (ipa_get_jf_ancestor_result): Likewise.
* ipa-devirt.c (types_same_for_odr): Do not explode when one
of types is not polymorphic.

Index: ipa-prop.c
===
--- ipa-prop.c  (revision 212457)
+++ ipa-prop.c  (working copy)
@@ -560,6 +560,19 @@ ipa_binfo_from_known_type_jfunc (struct
 
   if (!base_binfo)
 return NULL_TREE;
+  /* FIXME: At LTO we can't propagate to non-polymorphic type, because
+ we have no ODR equivalency on those.  This should be fixed by
+ propagating on types rather than binfos that would make type
+ matching here unnecesary.  */
+  if (in_lto_p
+   (TREE_CODE (jfunc-value.known_type.component_type) != RECORD_TYPE
+ || !TYPE_BINFO (jfunc-value.known_type.component_type)
+ || !BINFO_VTABLE (TYPE_BINFO 
(jfunc-value.known_type.component_type
+{
+  if (!jfunc-value.known_type.offset)
+   return base_binfo;
+  return NULL;
+}
   return get_binfo_at_offset (base_binfo,
  jfunc-value.known_type.offset,
  jfunc-value.known_type.component_type);
Index: ipa-cp.c
===
--- ipa-cp.c(revision 212457)
+++ ipa-cp.c(working copy)
@@ -789,6 +789,19 @@ ipa_get_jf_ancestor_result (struct ipa_j
 {
   if (!ipa_get_jf_ancestor_type_preserved (jfunc))
return NULL;
+  /* FIXME: At LTO we can't propagate to non-polymorphic type, because
+we have no ODR equivalency on those.  This should be fixed by
+propagating on types rather than binfos that would make type
+matching here unnecesary.  */
+  if (in_lto_p
+  (TREE_CODE (ipa_get_jf_ancestor_type (jfunc)) != RECORD_TYPE
+ || !TYPE_BINFO (ipa_get_jf_ancestor_type (jfunc))
+ || !BINFO_VTABLE (TYPE_BINFO (ipa_get_jf_ancestor_type (jfunc)
+   {
+ if (!ipa_get_jf_ancestor_offset (jfunc))
+   return input;
+ return NULL;
+   }
   return get_binfo_at_offset (input,
  ipa_get_jf_ancestor_offset (jfunc),
  ipa_get_jf_ancestor_type (jfunc));
Index: ipa-devirt.c
===
--- ipa-devirt.c(revision 212457)
+++ ipa-devirt.c(working copy)
@@ -341,6 +341,20 @@ types_same_for_odr (const_tree type1, co
   || type_in_anonymous_namespace_p (type2))
 return false;
 
+  /* See if types are obvoiusly different (i.e. different codes
+ or polymorphis wrt non-polymorphic).  This is not strictly correct
+ for ODR violating programs, but we can't do better without streaming
+ ODR names.  */
+  if (TREE_CODE (type1) != TREE_CODE (type2))
+return false;
+  if (TREE_CODE (type1) == RECORD_TYPE
+   (TYPE_BINFO (type1) == NULL_TREE) != (TYPE_BINFO (type1) == 
NULL_TREE))
+return false;
+  if (TREE_CODE (type1) == RECORD_TYPE  TYPE_BINFO (type1)
+   (BINFO_VTABLE (TYPE_BINFO (type1)) == NULL_TREE)
+!= (BINFO_VTABLE (TYPE_BINFO (type2)) == NULL_TREE))
+return false;
+
   /* At the moment we have no way to establish ODR equivlaence at LTO
  other than comparing virtual table pointrs of polymorphic types.
  Eventually we should start saving mangled names in TYPE_NAME.


Re: [PATCH x86_64] Optimize access to globals in -fpie -pie builds with copy relocations

2014-07-11 Thread Sriraman Tallam
Ping.

On Thu, Jun 26, 2014 at 10:54 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi Uros,

Could you please review this patch?

 Thanks
 Sri

 On Fri, Jun 20, 2014 at 5:17 PM, Sriraman Tallam tmsri...@google.com wrote:
 Patch Updated.

 Sri

 On Mon, Jun 9, 2014 at 3:55 PM, Sriraman Tallam tmsri...@google.com wrote:
 Ping.

 On Mon, May 19, 2014 at 11:11 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Ping.

 On Thu, May 15, 2014 at 11:34 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Optimize access to globals with -fpie, x86_64 only:

 Currently, with -fPIE/-fpie, GCC accesses globals that are extern to the 
 module
 using the GOT.  This is two instructions, one to get the address of the 
 global
 from the GOT and the other to get the value.  If it turns out that the 
 global
 gets defined in the executable at link-time, it still needs to go through 
 the
 GOT as it is too late then to generate a direct access.

 Examples:

 foo.cc
 --
 int a_glob;
 int main () {
   return a_glob; // defined in this file
 }

 With -O2 -fpie -pie, the generated code directly accesses the global via
 PC-relative insn:

 5e0   main:
mov0x165a(%rip),%eax# 1c40 a_glob

 foo.cc
 --

 extern int a_glob;
 int main () {
   return a_glob; // defined in this file
 }

 With -O2 -fpie -pie, the generated code accesses global via GOT using two
 memory loads:

 6f0  main:
mov0x1609(%rip),%rax   # 1d00 _DYNAMIC+0x230
mov(%rax),%eax

 This is true even if in the latter case the global was defined in the
 executable through a different file.

 Some experiments on google benchmarks shows that the extra memory loads 
 affects
 performance by 1% to 5%.


 Solution - Copy Relocations:

 When the linker supports copy relocations, GCC can always assume that the
 global will be defined in the executable.  For globals that are truly 
 extern
 (come from shared objects), the linker will create copy relocations and 
 have
 them defined in the executable. Result is that no global access needs to 
 go
 through the GOT and hence improves performance.

 This patch to the gold linker :
 https://sourceware.org/ml/binutils/2014-05/msg00092.html
 submitted recently allows gold to generate copy relocations for -pie mode 
 when
 necessary.

 I have added option -mld-pie-copyrelocs which when combined with -fpie 
 would do
 this.  Note that the BFD linker does not support pie copyrelocs yet and 
 this
 option cannot be used there.

 Please review.


 ChangeLog:

 * config/i386/i36.opt (mld-pie-copyrelocs): New option.
 * config/i386/i386.c (legitimate_pic_address_disp_p): Check if this
  address is still legitimate in the presence of copy relocations
  and -fpie.
 * testsuite/gcc.target/i386/ld-pie-copyrelocs-1.c: New test.
 * testsuite/gcc.target/i386/ld-pie-copyrelocs-2.c: New test.



 Patch attached.
 Thanks
 Sri


Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Jan Hubicka
Hi,
this is the variant of patch I comitted. I noticed that partitioning actually
calls ctor_for_folding just to figure out if the constant value may be used
that drags in every readonly variable ctor into memory at WPA.
So now we have separate predicate varpool_ctor_useable_for_folding_p to check
if ctor can be used and ctor_for_folding to drag in the actual value.

This actually happens too late to show in LTO report.  In followup patch I will
add timevar for the delayed streaming so we have some idea how often it triggers
and how much memory it takes.

Bootstrapped/regtested x86_64-linux and comitted.

Honza

* vapool.c: Include tree-ssa-alias.h, gimple.h and lto-streamer.h
(varpool_get_constructor): New function.
(varpool_ctor_useable_for_folding_p): Break out from ...
(ctor_for_folding): ... here; use varpool_get_constructor.
(varpool_assemble_decl): Likewise.
* lto-streamer.h (struct output_block): Turn cgraph_node
to symbol filed.
(lto_input_variable_constructor): Declare.
* ipa-visibility.c (function_and_variable_visibility): Use
varpool_get_constructor.
* cgraph.h (varpool_get_constructor): Declare.
(varpool_ctor_useable_for_folding_p): New function.
* lto-streamer-out.c (get_symbol_initial_value): Take encoder
parameter; return error_mark_node for non-trivial constructors.
(lto_write_tree_1, DFS_write_tree): UPdate use of
get_symbol_initial_value.
(output_function): Update initialization of symbol.
(output_constructor): New function.
(copy_function): Rename to ..
(copy_function_or_variable): ... this one; handle vars too.
(lto_output): Output variable sections.
* lto-streamer-in.c (input_constructor): New function.
(lto_read_body): Rename from ...
(lto_read_body_or_constructor): ... this one; handle vars
too.
(lto_input_variable_constructor): New function.
* ipa-prop.c (ipa_prop_write_jump_functions,
ipa_prop_write_all_agg_replacement): Update.
* lto-cgraph.c (compute_ltrans_boundary): Use it.
(output_cgraph_opt_summary): Set symbol to NULL.

* lto-partition.c (add_references_to_partition): Use 
varpool_ctor_useable_for_folding_p.
* lto.c (lto_read_in_decl_state): Update sanity check.
Index: ipa-visibility.c
===
--- ipa-visibility.c(revision 212457)
+++ ipa-visibility.c(working copy)
@@ -686,6 +686,8 @@ function_and_variable_visibility (bool w
  if (found)
{
  struct pointer_set_t *visited_nodes = pointer_set_create ();
+
+ varpool_get_constructor (vnode);
  walk_tree (DECL_INITIAL (vnode-decl),
 update_vtable_references, NULL, visited_nodes);
  pointer_set_destroy (visited_nodes);
Index: lto/lto.c
===
--- lto/lto.c   (revision 212457)
+++ lto/lto.c   (working copy)
@@ -236,7 +236,7 @@ lto_read_in_decl_state (struct data_in *
 
   ix = *data++;
   decl = streamer_tree_cache_get_tree (data_in-reader_cache, ix);
-  if (TREE_CODE (decl) != FUNCTION_DECL)
+  if (!VAR_OR_FUNCTION_DECL_P (decl))
 {
   gcc_assert (decl == void_type_node);
   decl = NULL_TREE;
Index: lto/lto-partition.c
===
--- lto/lto-partition.c (revision 212457)
+++ lto/lto-partition.c (working copy)
@@ -96,7 +96,7 @@ add_references_to_partition (ltrans_part
Recursively look into the initializers of the constant variable and add
references, too.  */
 else if (is_a varpool_node * (ref-referred)
- ctor_for_folding (ref-referred-decl) != error_mark_node
+ varpool_ctor_useable_for_folding_p (varpool (ref-referred))
  !lto_symtab_encoder_in_partition_p (part-encoder, 
ref-referred))
   {
if (!part-initializers_visited)
Index: lto-cgraph.c
===
--- lto-cgraph.c(revision 212457)
+++ lto-cgraph.c(working copy)
@@ -867,7 +867,7 @@ compute_ltrans_boundary (lto_symtab_enco
{
  if (!lto_symtab_encoder_encode_initializer_p (encoder,
vnode)
-  ctor_for_folding (vnode-decl) != error_mark_node)
+  varpool_ctor_useable_for_folding_p (vnode))
{
  lto_set_symtab_encoder_encode_initializer (encoder, vnode);
  add_references (encoder, vnode);
@@ -1808,7 +1808,7 @@ output_cgraph_opt_summary (void)
   struct output_block *ob = create_output_block (LTO_section_cgraph_opt_sum);
   unsigned count = 0;
 
-  ob-cgraph_node = NULL;
+  ob-symbol = NULL;
   encoder = ob-decl_state-symtab_node_encoder;
   n_nodes = 

Re: [C++ Patch/RFC] Back to PR 53159

2014-07-11 Thread Jason Merrill

On 07/10/2014 06:38 PM, Paolo Carlini wrote:

  if (SCALAR_TYPE_P (type))
+ if (!CLASS_TYPE_P (TREE_TYPE (init)))


Why this change?  check_narrowing only deals with scalar types.

Jason



Re: [C++ Patch] PR 53159 (Take 2)

2014-07-11 Thread Jason Merrill

Oops, this patch didn't thread with the earlier one...

On 07/11/2014 06:15 AM, Paolo Carlini wrote:

  if (SCALAR_TYPE_P (type))


Is the condition still necessary?


-   check_narrowing (type, init);
+   flags |= LOOKUP_NO_NARROWING;


Jason




Re: [C++ Patch] PR 53159 (Take 2)

2014-07-11 Thread Paolo Carlini

Hi,

On 07/11/2014 07:58 PM, Jason Merrill wrote:

Oops, this patch didn't thread with the earlier one...
Yeah, sorry, I considered my drafts of yesterday evening a distraction 
and wanted to start a fresh thread with a complete (more meaningful) new 
proposal. Probably not a good idea, after all, confusing...

On 07/11/2014 06:15 AM, Paolo Carlini wrote:

if (SCALAR_TYPE_P (type))


Is the condition still necessary?
I'm removing it and retesting. In any case (the eventual) 
check_narrowing will return immediately if !ARITHMETIC_TYPE_P (type) is 
true... Would the amended patch be Ok?


Thanks!
Paolo.




Re: [PATCH] offline gcda profile processing tool

2014-07-11 Thread Rong Xu
Richard,

I looked at my patch again. I already add -Wno-error to libgcov-util.o
compilation:
In line 200 of gcc/Makefile.in
libgcov-util.o-warn = -Wno-error

In my test, I used gcc-4.6 as the host compiler. I got warning like this:

In file included from ../../gcc/gcc/../libgcc/libgcov-util.c:30:0:
../../gcc/gcc/../libgcc/libgcov.h:184:30: warning: ISO C++ forbids
zero-size array ‘ctrs’ [-pedantic]

Can you check your buildlog to see if -Wno-error is added to the command line?

Thanks,

-Rong

On Fri, Jul 11, 2014 at 9:47 AM, Rong Xu x...@google.com wrote:
 I did see the warning in the bootstrap, but it did not exit the build.
 I thought it was ok.

 I'll have a patch for this and send for review.

 -Rong

 On Fri, Jul 11, 2014 at 9:13 AM, Xinliang David Li davi...@google.com wrote:
 right.

 Rong, the fix would be just change ctr array size to 1. For each
 function, there should be at least one kind of counters -- see the
 assertion in build_fn_info_type.  There are some code that do
 'sizeof(gcov_fn_info)' when computing heap size -- they can be
 adjusted or leave it as it is (if not doing memcpy for the whole
 array).

 David

 On Fri, Jul 11, 2014 at 8:44 AM, Jakub Jelinek ja...@redhat.com wrote:
 On Fri, Jul 11, 2014 at 08:42:27AM -0700, Xinliang David Li wrote:
 I wonder why. The struct definition for gcov_fn_info has not changed
 in this patch.

 Perhaps it has been used only in C until now?

 Jakub


Re: Use separate sections to stream non-trivial constructors

2014-07-11 Thread Jan Hubicka
Hi
this is patch i am going to commit after testing.  It removes DECL_INIT_IO
timevar that guards only one variable set (so hardly measure anything) and
moves GIMPLE_IN to proper place. It also adds CTORS_IN and CTORS_OUT.
I get:
 ipa lto gimple out  :   0.37 ( 0%) usr   0.21 ( 3%) sys   0.64 ( 1%) wall  
 0 kB ( 0%) ggc
 ipa lto decl in :  23.56 (26%) usr   1.24 (15%) sys  24.81 (23%) wall 
2429174 kB (60%) ggc
 ipa lto decl out:   5.58 ( 6%) usr   0.34 ( 4%) sys   5.94 ( 5%) wall  
 0 kB ( 0%) ggc
 ipa lto constructors in :   0.34 ( 0%) usr   0.10 ( 1%) sys   0.47 ( 0%) wall  
 14864 kB ( 0%) ggc
 ipa lto constructors out:   0.06 ( 0%) usr   0.01 ( 0%) sys   0.01 ( 0%) wall  
 0 kB ( 0%) ggc
 ipa lto cgraph I/O  :   1.20 ( 1%) usr   0.25 ( 3%) sys   1.45 ( 1%) wall  
437317 kB (11%) ggc

for Firefox WPA that is surprisingly good. We traded about 400MB for 14MB. So 
perhaps I do not
need to care about not bringing in all vtables needed for devirt machinery.

honza

* lto.c (read_cgraph_and_symbols): Do not push DECL_INIT_IO
timevar
(materialize_cgraph): Do not push GIMPLE_IN timevar.

* timevar.def (TV_IPA_LTO_DECL_INIT_IO): Remove.
(TV_IPA_LTO_CTORS_IN, TV_IPA_LTO_CTORS_OUT): New timevar.
* cgraph.c (cgraph_get_body): Push GIMPLE_IN timevar.
(varpool_get_constructor): Push CTORS_IN timevar.
* lto-streamer-out.c (lto_output): Push TV_IPA_LTO_CTORS_OUT
timevar.
Index: lto/lto.c
===
--- lto/lto.c   (revision 212467)
+++ lto/lto.c   (working copy)
@@ -3094,12 +3094,9 @@ read_cgraph_and_symbols (unsigned nfiles
 
   timevar_pop (TV_IPA_LTO_CGRAPH_MERGE);
 
-  timevar_push (TV_IPA_LTO_DECL_INIT_IO);
-
   /* Indicate that the cgraph is built and ready.  */
   cgraph_function_flags_ready = true;
 
-  timevar_pop (TV_IPA_LTO_DECL_INIT_IO);
   ggc_free (all_file_decl_data);
   all_file_decl_data = NULL;
 }
@@ -3117,9 +3114,6 @@ materialize_cgraph (void)
 fprintf (stderr,
 flag_wpa ? Materializing decls: : Reading function bodies:);
 
-  /* Now that we have input the cgraph, we need to clear all of the aux
- nodes and read the functions if we are not running in WPA mode.  */
-  timevar_push (TV_IPA_LTO_GIMPLE_IN);
 
   FOR_EACH_FUNCTION (node)
 {
@@ -3130,7 +3124,6 @@ materialize_cgraph (void)
}
 }
 
-  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
 
   /* Start the appropriate timer depending on the mode that we are
  operating in.  */
Index: timevar.def
===
--- timevar.def (revision 212457)
+++ timevar.def (working copy)
@@ -77,7 +77,8 @@ DEFTIMEVAR (TV_IPA_LTO_GIMPLE_IN , 
 DEFTIMEVAR (TV_IPA_LTO_GIMPLE_OUT, ipa lto gimple out)
 DEFTIMEVAR (TV_IPA_LTO_DECL_IN   , ipa lto decl in)
 DEFTIMEVAR (TV_IPA_LTO_DECL_OUT  , ipa lto decl out)
-DEFTIMEVAR (TV_IPA_LTO_DECL_INIT_IO  , ipa lto decl init I/O)
+DEFTIMEVAR (TV_IPA_LTO_CTORS_IN  , ipa lto constructors in)
+DEFTIMEVAR (TV_IPA_LTO_CTORS_OUT , ipa lto constructors out)
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_IO , ipa lto cgraph I/O)
 DEFTIMEVAR (TV_IPA_LTO_DECL_MERGE, ipa lto decl merge)
 DEFTIMEVAR (TV_IPA_LTO_CGRAPH_MERGE  , ipa lto cgraph merge)
Index: cgraph.c
===
--- cgraph.c(revision 212457)
+++ cgraph.c(working copy)
@@ -3053,6 +3053,8 @@ cgraph_get_body (struct cgraph_node *nod
 
   gcc_assert (in_lto_p);
 
+  timevar_push (TV_IPA_LTO_GIMPLE_IN);
+
   file_data = node-lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
 
@@ -3076,6 +3078,9 @@ cgraph_get_body (struct cgraph_node *nod
   lto_free_section_data (file_data, LTO_section_function_body, name,
 data, len);
   lto_free_function_in_decl_state_for_node (node);
+
+  timevar_pop (TV_IPA_LTO_GIMPLE_IN);
+
   return true;
 }
 
Index: varpool.c
===
--- varpool.c   (revision 212467)
+++ varpool.c   (working copy)
@@ -268,6 +268,8 @@ varpool_get_constructor (struct varpool_
   || !in_lto_p)
 return DECL_INITIAL (node-decl);
 
+  timevar_push (TV_IPA_LTO_CTORS_IN);
+
   file_data = node-lto_file_data;
   name = IDENTIFIER_POINTER (DECL_ASSEMBLER_NAME (decl));
 
@@ -286,6 +288,7 @@ varpool_get_constructor (struct varpool_
   lto_free_section_data (file_data, LTO_section_function_body, name,
 data, len);
   lto_free_function_in_decl_state_for_node (node);
+  timevar_pop (TV_IPA_LTO_CTORS_IN);
   return DECL_INITIAL (node-decl);
 }
 
Index: lto-streamer-out.c
===
--- lto-streamer-out.c  (revision 212467)
+++ lto-streamer-out.c  (working copy)
@@ -2115,6 +2115,7 @@ lto_output (void)
   lto_symtab_encoder_encode_initializer_p (encoder, node)
  

  1   2   >