Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations

2012-02-26 Thread Andrew T Pinski
On Fri, 2012-02-24 at 15:41 -0600, William J. Schmidt wrote:
 On Fri, 2012-02-10 at 15:46 -0500, Michael Meissner wrote:
  I was looking at the routelookup EEMBC benchmark and it has code of the 
  form:
  
 while ( this_node->cmpbit > next_node->cmpbit )
   {
     this_node = next_node;

     if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
       next_node = this_node->rlink;
     else
       next_node = this_node->llink;
   }
  
 
 Andrew and Richard both suggested this would be better handled as a tree
 optimization.  Here is a proposed patch to do that.  
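
(For reference, a rough sketch of the if-conversion being discussed; the
struct and function names are invented for illustration and this is not
code from the patch.  The two conditional loads from the adjacent
llink/rlink fields become a select of the address followed by a single
unconditional load.)

  struct node { unsigned cmpbit; struct node *llink, *rlink; };

  /* Illustration only: choose the address of the adjacent field first,
     then do one unconditional load.  */
  static struct node *
  step (struct node *this_node, unsigned dst_addr)
  {
    struct node **link = (dst_addr & (0x1u << this_node->cmpbit))
                         ? &this_node->rlink : &this_node->llink;
    return *link;
  }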

I think this is slightly different from what I suggested, since my
suggestion takes the load from this_node->cmpbit into account and not
just the fact that this_node->rlink and this_node->llink might be in
the same page.

Thanks,
Andrew Pinski



Re: [PR52001] too many cse reverse equiv exprs (take2)

2012-02-26 Thread Richard Sandiford
Alexandre Oliva aol...@redhat.com writes:
 On Feb 19, 2012, Richard Sandiford rdsandif...@googlemail.com wrote:
 and it still isn't obvious to me when canonical_cselib_val is supposed
 to be used.

 For comparison of VALUEs, it avoids the need for recursive or
 combinatorial compares, for all equivalent VALUEs map directly to the
 single canonical value.  For recursive searches and the like, it's just
 an optimization to avoid an additional recursion (avoiding recursing
 into *any* VALUEs is recommended along with it).

 Other algorithms that iterate over loc lists and recurse should take
 care to avoid infinite recursion, by marking already-visited nodes
 (cselib and var-tracking do some of this), temporarily zeroing out their
 loc lists (like find_base_term), or using other mechanisms to break
 recursion cycles (like get_addr).

 Algorithms that didn't expect naked VALUEs in loc lists (like get_addr)
 may need adjusting to iterate over the loc lists of the canonical value
 (since non-canonical values have a single loc, the canonical value), and
 to disregard values in the canonical value's lists (unless e.g. we happen to
 be looking for VALUEs that turn out to be noncanonical).

 In other cases, the use of canonical values instead of noncanonical ones
 when filling in data structures (say, building expressions to record in
 cselib) may save memory by avoiding duplication, but since it causes
 cselib to compute different hashes, we'd better use whatever is most
 likely to be searched for by hashing.  (We could tweak lookups to use
 canonical values and to recompute hashes when adding equivalences
 between values already used in expressions, but this hasn't been done
 yet).
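
(A minimal sketch of the comparison point above; values_equivalent_p is an
invented name, while cselib_val and canonical_cselib_val are the existing
cselib interfaces:)

  /* Sketch only: two VALUEs are equivalent exactly when they map to the
     same canonical value, so no recursive or combinatorial walk of loc
     lists is needed for the comparison.  */
  static bool
  values_equivalent_p (cselib_val *v1, cselib_val *v2)
  {
    return canonical_cselib_val (v1) == canonical_cselib_val (v2);
  }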

 I hope this makes some sense ;-)

Yeah, it does, thanks.  It seemed that when we recorded two values V1
and V2 were equivalent, we added V1 to V2's location list and V2 to V1's
location list.  But it sounds from the above like the canonical value is
what we want in almost all cases, so if V2 is the one that becomes
noncanonical, is it really worth adding V2 to V1's location list?

Richard


[patch committed SH] Fix some comment typos and formattings

2012-02-26 Thread Oleg Endo
Hello,

I've just committed the attached patch as obvious.

Cheers,
Oleg

2012-02-26  Oleg Endo  olege...@gcc.gnu.org

* config/sh/predicates.md: Remove blank lines.
* config/sh/sh.c: Fix typos in comments.
* config/sh/constraints.md: Likewise.
* config/sh/sh.md: Remove blank lines.
Fix typos in comments.  Use ;; as comment characters.

Index: gcc/config/sh/predicates.md
===
--- gcc/config/sh/predicates.md	(revision 184582)
+++ gcc/config/sh/predicates.md	(working copy)
@@ -448,7 +448,6 @@
   return general_operand (op, mode);
 })
 
-
 ;; Returns 1 if OP is a POST_INC on stack pointer register.
 
 (define_predicate "sh_no_delay_pop_operand"
@@ -466,7 +465,6 @@
   return 0;
 })
 
-
 ;; Returns 1 if OP is a MEM that can be source of a simple move operation.
 
 (define_predicate "unaligned_load_operand"
Index: gcc/config/sh/sh.c
===
--- gcc/config/sh/sh.c	(revision 184582)
+++ gcc/config/sh/sh.c	(working copy)
@@ -1850,7 +1850,7 @@
 }
 
 /* ??? How should we distribute probabilities when more than one branch
-   is generated.  So far we only have soem ad-hoc observations:
+   is generated.  So far we only have some ad-hoc observations:
- If the operands are random, they are likely to differ in both parts.
- If comparing items in a hash chain, the operands are random or equal;
  operation should be EQ or NE.
@@ -5380,7 +5380,7 @@
 
   /* If relaxing, generate pseudo-ops to associate function calls with
  the symbols they call.  It does no harm to not generate these
- pseudo-ops.  However, when we can generate them, it enables to
+ pseudo-ops.  However, when we can generate them, it enables the
  linker to potentially relax the jsr to a bsr, and eliminate the
  register load and, possibly, the constant pool entry.  */
 
@@ -9259,7 +9259,7 @@
 
 #if 0
   /* If this is a label that existed before reload, then the register
-	 if dead here.  However, if this is a label added by reorg, then
+	 is dead here.  However, if this is a label added by reorg, then
 	 the register may still be live here.  We can't tell the difference,
 	 so we just ignore labels completely.  */
   if (code == CODE_LABEL)
@@ -9569,7 +9569,7 @@
 	{
 	  int size;
 
-	  /* Check if this the address of an unaligned load / store.  */
+	  /* Check if this is the address of an unaligned load / store.  */
 	  if (mode == VOIDmode)
 	return CONST_OK_FOR_I06 (INTVAL (op));
 
Index: gcc/config/sh/constraints.md
===
--- gcc/config/sh/constraints.md	(revision 184582)
+++ gcc/config/sh/constraints.md	(working copy)
@@ -139,7 +139,7 @@
       (match_test "ival >= 0 && ival <= 255")))
  
 (define_constraint "K12"
-  "An unsigned 8-bit constant, as used in SH2A 12-bit display."
+  "An unsigned 8-bit constant, as used in SH2A 12-bit displacement addressing."
   (and (match_code "const_int")
        (match_test "ival >= 0 && ival <= 4095")))
 
Index: gcc/config/sh/sh.md
===
--- gcc/config/sh/sh.md	(revision 184582)
+++ gcc/config/sh/sh.md	(working copy)
@@ -4405,8 +4405,6 @@
   "sub	r63, %1, %0"
   [(set_attr "type" "arith_media")])
 
-
-
 ;; Don't expand immediately because otherwise neg:DI (abs:DI) will not be
 ;; combined.
 (define_expand "negdi2"
@@ -4497,7 +4495,6 @@
   DONE;
 })
 
-
 ;; The SH4 202 can do zero-offset branches without pipeline stalls.
 ;; This can be used as some kind of conditional execution, which is useful
 ;; for abs.
@@ -5468,7 +5465,7 @@
   operands[3] = gen_rtx_REG (DImode, REGNO (operands[2]));
 })
 
-/* When storing r0, we have to avoid reg+reg addressing.  */
+;; When storing r0, we have to avoid reg+reg addressing.
 (define_insn "movhi_i"
   [(set (match_operand:HI 0 "general_movdst_operand"   "=r,r,r,r,m,r,l,r")
 	(match_operand:HI 1 "general_movsrc_operand" "Q,rI08,m,t,r,l,r,i"))]
@@ -7472,7 +7469,7 @@
 (set_attr "fp_set" "unknown")])
 
 ;; This is TBR relative jump instruction for SH2A architecture.
-;; Its use is enabled assigning an attribute function_vector
+;; Its use is enabled by assigning an attribute function_vector
 ;; and the vector number to a function during its declaration.
 
 (define_insn "call_valuei_tbr_rel"
@@ -9581,8 +9578,6 @@
DONE;
 )
 
-
-
 ;; sne moves the complement of the T reg to DEST like this:
 ;;  cmp/eq ...
 ;;  mov#-1,temp
@@ -9605,7 +9600,6 @@
   operands[1] = gen_reg_rtx (SImode);
 })
 
-
 ;; Recognize mov #-1/negc/neg sequence, and change it to movt/add #-1.
 ;; This prevents a regression that occurred when we switched from xor to
 ;; mov/neg for sne.
@@ -9659,7 +9653,6 @@
DONE;
 )
 
-
 ;; -------------------------------------------------------------------------
 ;; Instructions to cope with inline literal tables
 ;; -------------------------------------------------------------------------

New Swedish PO file for 'gcc' (version 4.7-b20120128)

2012-02-26 Thread Translation Project Robot
Hello, gentle maintainer.

This is a message from the Translation Project robot.

A revised PO file for textual domain 'gcc' has been submitted
by the Swedish team of translators.  The file is available at:

http://translationproject.org/latest/gcc/sv.po

(This file, 'gcc-4.7-b20120128.sv.po', has just now been sent to you in
a separate email.)

All other PO files for your package are available in:

http://translationproject.org/latest/gcc/

Please consider including all of these in your next release, whether
official or a pretest.

Whenever you have a new distribution with a new version number ready,
containing a newer POT file, please send the URL of that distribution
tarball to the address below.  The tarball may be just a pretest or a
snapshot, it does not even have to compile.  It is just used by the
translators when they need some extra translation context.

The following HTML page has been updated:

http://translationproject.org/domain/gcc.html

If any question arises, please contact the translation coordinator.

Thank you for all your work,

The Translation Project robot, in the
name of your translation coordinator.
coordina...@translationproject.org



Re: [PATCH] [RFC, GCC 4.8] Optimize conditional moves from adjacent memory locations

2012-02-26 Thread William J. Schmidt
On Sun, 2012-02-26 at 00:39 -0800, Andrew T Pinski wrote:
 On Fri, 2012-02-24 at 15:41 -0600, William J. Schmidt wrote:
  On Fri, 2012-02-10 at 15:46 -0500, Michael Meissner wrote:
   I was looking at the routelookup EEMBC benchmark and it has code of the 
   form:
   
  while ( this_node->cmpbit > next_node->cmpbit )
    {
      this_node = next_node;

      if ( proute->dst_addr & (0x1 << this_node->cmpbit) )
        next_node = this_node->rlink;
      else
        next_node = this_node->llink;
    }
   
  
  Andrew and Richard both suggested this would be better handled as a tree
  optimization.  Here is a proposed patch to do that.  
 
 I think this is slightly different from what I suggested, since my
 suggestion takes the load from this_node->cmpbit into account and not
 just the fact that this_node->rlink and this_node->llink might be in
 the same page.

Hi Andrew,

That's true, but I must be missing your concern here.  If this_node is
NULL, an exception will occur with or without the transformation,
whether this_node is dereferenced in bb0 or not.  Are you worried about
exception ordering or...?  Sorry if I'm being dense.

Thanks,
Bill

 
 Thanks,
 Andrew Pinski
 



[patch committed SH] PR 49263 - add missing test case

2012-02-26 Thread Oleg Endo
Hello,

The test case that was originally included in the patch for PR 49263
didn't make it into trunk.  Committed as rev 184585.

Cheers,
Oleg


testsuite/ChangeLog:

2012-02-26  Oleg Endo  olege...@gcc.gnu.org

PR target/49263
* gcc.target/sh/pr49263.c: New.

Index: gcc/testsuite/gcc.target/sh/pr49263.c
===
--- gcc/testsuite/gcc.target/sh/pr49263.c	(revision 0)
+++ gcc/testsuite/gcc.target/sh/pr49263.c	(revision 0)
@@ -0,0 +1,86 @@
+/* Verify that TST #imm, R0 instruction is generated if the constant
+   allows it.  Under some circumstances another compare instruction might
+   be selected, which is also fine.  Any AND instructions are considered
+   counter productive and fail the test.  */
+/* { dg-do compile { target sh*-*-* } } */
+/* { dg-options "-O2" } */
+/* { dg-final { scan-assembler-not "and" } } */
+
+#define make_func(__valtype__, __valget__, __tstval__, __suff__)\
+  int test_imm_##__tstval__##__suff__ (__valtype__ val) \
+{\
+  return ((__valget__) & (0x##__tstval__ << 0)) ? -20 : -40;\
+}
+
+#define make_func_0_F(__valtype__, __valget__, __y__, __suff__)\
+  make_func (__valtype__, __valget__, __y__##0, __suff__)\
+  make_func (__valtype__, __valget__, __y__##1, __suff__)\
+  make_func (__valtype__, __valget__, __y__##2, __suff__)\
+  make_func (__valtype__, __valget__, __y__##3, __suff__)\
+  make_func (__valtype__, __valget__, __y__##4, __suff__)\
+  make_func (__valtype__, __valget__, __y__##5, __suff__)\
+  make_func (__valtype__, __valget__, __y__##6, __suff__)\
+  make_func (__valtype__, __valget__, __y__##7, __suff__)\
+  make_func (__valtype__, __valget__, __y__##8, __suff__)\
+  make_func (__valtype__, __valget__, __y__##9, __suff__)\
+  make_func (__valtype__, __valget__, __y__##A, __suff__)\
+  make_func (__valtype__, __valget__, __y__##B, __suff__)\
+  make_func (__valtype__, __valget__, __y__##C, __suff__)\
+  make_func (__valtype__, __valget__, __y__##D, __suff__)\
+  make_func (__valtype__, __valget__, __y__##E, __suff__)\
+  make_func (__valtype__, __valget__, __y__##F, __suff__)\
+
+#define make_funcs_0_FF(__valtype__, __valget__, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 0, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 1, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 2, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 3, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 4, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 5, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 6, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 7, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 8, __suff__)\
+  make_func_0_F (__valtype__, __valget__, 9, __suff__)\
+  make_func_0_F (__valtype__, __valget__, A, __suff__)\
+  make_func_0_F (__valtype__, __valget__, B, __suff__)\
+  make_func_0_F (__valtype__, __valget__, C, __suff__)\
+  make_func_0_F (__valtype__, __valget__, D, __suff__)\
+  make_func_0_F (__valtype__, __valget__, E, __suff__)\
+  make_func_0_F (__valtype__, __valget__, F, __suff__)\
+
+make_funcs_0_FF (signed char*, *val, int8_mem)
+make_funcs_0_FF (signed char, val, int8_reg)
+
+make_funcs_0_FF (unsigned char*, *val, uint8_mem)
+make_funcs_0_FF (unsigned char, val, uint8_reg)
+
+make_funcs_0_FF (short*, *val, int16_mem)
+make_funcs_0_FF (short, val, int16_reg)
+
+make_funcs_0_FF (unsigned short*, *val, uint16_mem)
+make_funcs_0_FF (unsigned short, val, uint16_reg)
+
+make_funcs_0_FF (int*, *val, int32_mem)
+make_funcs_0_FF (int, val, int32_reg)
+
+make_funcs_0_FF (unsigned int*, *val, uint32_mem)
+make_funcs_0_FF (unsigned int, val, uint32_reg)
+
+make_funcs_0_FF (long long*, *val, int64_lowword_mem)
+make_funcs_0_FF (long long, val, int64_lowword_reg)
+
+make_funcs_0_FF (unsigned long long*, *val, uint64_lowword_mem)
+make_funcs_0_FF (unsigned long long, val, uint64_lowword_reg)
+
+make_funcs_0_FF (long long*, *val >> 32, int64_highword_mem)
+make_funcs_0_FF (long long, val >> 32, int64_highword_reg)
+
+make_funcs_0_FF (unsigned long long*, *val >> 32, uint64_highword_mem)
+make_funcs_0_FF (unsigned long long, val >> 32, uint64_highword_reg)
+
+make_funcs_0_FF (long long*, *val >> 16, int64_midword_mem)
+make_funcs_0_FF (long long, val >> 16, int64_midword_reg)
+
+make_funcs_0_FF (unsigned long long*, *val >> 16, uint64_midword_mem)
+make_funcs_0_FF (unsigned long long, val >> 16, uint64_midword_reg)
+


Re: [PATCH] Fix PR52298

2012-02-26 Thread Richard Guenther
On Fri, Feb 24, 2012 at 2:16 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
 Richard Guenther wrote:
 On Thu, 23 Feb 2012, Ulrich Weigand wrote:
  The assert in question looks like:
 
    if (nested_in_vect_loop
        && (TREE_INT_CST_LOW (STMT_VINFO_DR_STEP (stmt_info))
            % GET_MODE_SIZE (TYPE_MODE (vectype)) != 0))
      {
        gcc_assert (alignment_support_scheme != dr_explicit_realign_optimized);
        compute_in_loop = true;
      }
 
  where your patch changed DR_STEP to STMT_VINFO_DR_STEP (reverting just this
  one change makes the ICEs go away).
 
  However, at the place where the decision to use the 
  dr_explicit_realign_optimized
  strategy is made (tree-vect-data-refs.c:vect_supportable_dr_alignment), we 
  still
  have:
 
            if ((nested_in_vect_loop
                 && (TREE_INT_CST_LOW (DR_STEP (dr))
                     != GET_MODE_SIZE (TYPE_MODE (vectype))))
                || !loop_vinfo)
              return dr_explicit_realign;
            else
              return dr_explicit_realign_optimized;
 
  Should this now also use STMT_VINFO_DR_STEP?

 Yes, I think so.

 Hmmm.  Reading the comment in vect_supportable_dr_alignment:

     However, in the case of outer-loop vectorization, when vectorizing a
     memory access in the inner-loop nested within the LOOP that is now being
     vectorized, while it is guaranteed that the misalignment of the
     vectorized memory access will remain the same in different outer-loop
     iterations, it is *not* guaranteed that it will remain the same throughout
     the execution of the inner-loop.  This is because the inner-loop advances
     with the original scalar step (and not in steps of VS).  If the inner-loop
     step happens to be a multiple of VS, then the misalignment remains fixed
     and we can use the optimized realignment scheme.

 it would appear that in this case, checking the inner-loop step is deliberate.

 Given the comment in vectorizable_load:

  /* If the misalignment remains the same throughout the execution of the
     loop, we can create the init_addr and permutation mask at the loop
     preheader.  Otherwise, it needs to be created inside the loop.
     This can only occur when vectorizing memory accesses in the inner-loop
     nested within an outer-loop that is being vectorized.  */

 this looks to me that, since the check is intended to verify that
 misalignment remains the same throughout the execution of the loop,
 we actually want to check the inner-loop step here as well, i.e. revert
 this chunk of your patch ...
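
(To make the scenario concrete, a hedged sketch of the loop shape in
question; the function and array are invented, not taken from the PR
testcase:)

  /* Outer-loop vectorization: the outer i loop is vectorized while the
     access a[j][i] in the inner j loop still advances by the original
     scalar step.  Whether that inner-loop step is a multiple of the
     vector size decides if the misalignment stays fixed across inner
     iterations, i.e. whether dr_explicit_realign_optimized is usable.  */
  void
  f (float a[64][64], float *sum, int n, int m)
  {
    for (int i = 0; i < n; i++)	/* outer loop, being vectorized */
      for (int j = 0; j < m; j++)	/* inner loop, scalar step */
        sum[i] += a[j][i];
  }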

Hmm.  I have to admit I don't know the outer loop vectorization code very well,
but the comments indeed suggest that reverting that hunk is ok.  Can you check
that patch on powerpc and commit it if it works ok?

Thanks,
Richard.

 Bye,
 Ulrich

 --
  Dr. Ulrich Weigand
  GNU Toolchain for Linux on System z and Cell BE
  ulrich.weig...@de.ibm.com



Re: [PR51752] publication safety violations in loop invariant motion pass

2012-02-26 Thread Richard Guenther
On Fri, Feb 24, 2012 at 2:10 PM, Torvald Riegel trie...@redhat.com wrote:
 On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:
 On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandez al...@redhat.com wrote:
  On 02/23/12 12:19, Aldy Hernandez wrote:
 
  about hit me. Instead now I save all loads in a function and iterate
  through them in a brute force way. I'd like to rewrite this into a hash
  of some sort, but before I go any further I'm interested to know if the
  main idea is ok.
 
 
  For the record, it may be ideal to reuse some of the iterations we already
  do over the function's basic blocks, so we don't have to iterate yet again
  over the IL stream.  Though it may complicate the pass unnecessarily.

 It's definitely ugly that you need to do this.

 Indeed.  But that's simply the price we have to pay for making
 publication with transactions easier for the programmer yet still at
 acceptable performance.

 Also note that what we primarily have to care about for this PR is to
 not hoist loads _within_ transactions if they would violate publication
 safety.  I didn't have time to look at Aldy's patch yet, but a first
 safe and conservative way would be to treat transactions as full
 transformation barriers, and prevent publication-safety-violating
 transformations _within_ transactions.  Which I would prefer until we're
 confident that we understood all of it.

 For hoisting out of or across transactions, we have to reason about more
 than just publication safety.

 And yes, you have to look at
 very many passes I guess, PRE comes to my mind at least.

 I still do not understand the situation very well, at least why for

  transaction { for () if (b) load; } load;

 it should be safe to hoist load out of the transaction

 This one is funny.  *Iff* this is an atomic txn, we can assume that the
 transaction does not wait for a concurrent event. If it is a relaxed
 txn, then we must not have a loop which terminates depending on an
 atomic load (which we don't in that example); otherwise, we cannot hoist
 load to before the txn (same reason as why we must not hoist to before
 an atomic load with memory_order_acquire).
 Now, if these things hold, then the load will always be executed after
 the txn.  Thus, we can assume that it will happen anyway and nothing can
 stop it (irrespective of which position the txn gets in the
 Transactional Synchronization Order (see the C++ TM spec)).  It is a
 nonatomic load, and we can rely on the program being data-race-free, so
 we cannot have other threads storing to the same location, so we can
 hoist it across because in that part of the program, the location is
 guaranteed to be thread-local data (or immutable).

 As I said, this isn't related to just pub safety anymore.  And this is
 tricky enough that I'd rather not try to optimize this currently but
 instead wait until we have more confidence in our understanding of the
 matter.

 while for

  load; transaction { for () if (b) load; }

 it is not.

 If the same assumptions as above hold, I think you can hoist it out,
 because again you can assume that it targets thread-local/immutable
 data: the nontransactional (+nonatomic!) load can happen at any time,
 essentially, irrespective of b's value or how/when other threads modify
 it.  Thus, it cannot have been changed between the two loads in a
 data-race-free program.

 Neither do I understand why it's not ok for

  transaction { for () if (b) load; }

 to hoist the load out of the transaction.

 You would violate publication safety.

 Also, you don't have any reason to believe that load targets
 thread-local/immutable data, so you must not make it nontransactional
 (otherwise, you could introduce a race condition).
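
(For readers following along, a sketch of the publication pattern at
issue; it assumes -fgnu-tm and uses invented names:)

  /* Thread 1 publishes data under a flag inside a transaction.  In
     thread 2, hoisting the load of data out of the conditional (or out
     of the transaction) could observe the value from before
     publication.  */
  int data, flag;

  void publisher (void)
  {
    __transaction_atomic { data = 42; flag = 1; }
  }

  int reader (int n)
  {
    int sum = 0;
    __transaction_atomic {
      for (int i = 0; i < n; i++)
        if (flag)
          sum += data;	/* must stay guarded by the flag check */
    }
    return sum;
  }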


 I assume all the issue is about hoisting things over the transaction start.
 So - why does the transaction start not simply clobber all global memory?
 That would avoid the hoisting.  I assume that  transforming the above to

  transaction { tem = load; for () if (b) = tem; }

 is ok?

 No, it is not.  Actually, this is precisely the transformation that we
 need to prevent from happening.
 As Aldy said, please see the explanations in the PR, or have a look at
 the C++ TM specification and the C++11 memory model alternatively.  We
 could also discuss this on IRC, if this would be easier.

Ok. I see.  So, I think what would be best is to have a way to check whether
a store/load is part of a transaction - do we have a way to do that right now?
(For example a flag on a gimple stmt?)  Then we can simply avoid the LIM
transform by making transaction load/store stmts behave the same as
potentially trapping stmts (thus, only optimize if the memory is accessed
unconditionally somewhere else).  That would work for PRE as well.
[easiest would be to make *_could_trap_p return true for memory ops
inside a transaction]
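
(A rough sketch of that idea; gimple_could_trap_p is the existing
predicate, stmt_in_transaction_p is a hypothetical flag check:)

  /* Sketch: treat a memory access inside a transaction like a potentially
     trapping statement, so invariant motion hoists it only when the
     location is also accessed unconditionally somewhere else.  */
  static bool
  ref_safe_to_hoist_p (gimple stmt)
  {
    if (gimple_could_trap_p (stmt) || stmt_in_transaction_p (stmt))
      return false;	/* handled like a conditional/trapping access */
    return true;
  }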

Does that sound like it would make TM happy (well, apart from missed
optimizations)?  Thus, is it enough to avoid transforms that would be

Re: [PR51752] publication safety violations in loop invariant motion pass

2012-02-26 Thread Richard Guenther
On Fri, Feb 24, 2012 at 5:34 PM, Aldy Hernandez al...@redhat.com wrote:
 On 02/24/12 07:10, Torvald Riegel wrote:

 On Fri, 2012-02-24 at 09:58 +0100, Richard Guenther wrote:

 On Thu, Feb 23, 2012 at 10:11 PM, Aldy Hernandezal...@redhat.com
  wrote:

 On 02/23/12 12:19, Aldy Hernandez wrote:

 about hit me. Instead now I save all loads in a function and iterate
 through them in a brute force way. I'd like to rewrite this into a hash
 of some sort, but before I go any further I'm interested to know if the
 main idea is ok.



 For the record, it may be ideal to reuse some of the iterations we
 already
 do over the function's basic blocks, so we don't have to iterate yet
 again
 over the IL stream.  Though it may complicate the pass unnecessarily.


 It's definitely ugly that you need to do this.


 Indeed.  But that's simply the price we have to pay for making
 publication with transactions easier for the programmer yet still at
 acceptable performance.

 Also note that what we primarily have to care about for this PR is to
 not hoist loads _within_ transactions if they would violate publication


 For that matter, didn't rth add a memory barrier at the beginning of
 transactions last week?  That would mean that we can't hoist anything
 outside of a transaction anyhow.  Or was it not a full memory barrier?

It's now a full memory barrier for all global memory and for local statics
if their address is taken (and for automatic vars with their address taken).
Do we need to be concerned about non-address-taken local statics?


 safety.  I didn't have time to look at Aldy's patch yet, but a first
 safe and conservative way would be to treat transactions as full
 transformation barriers, and prevent publication-safety-violating
 transformations _within_ transactions.  Which I would prefer until we're
 confident that we understood all of it.


 Do you mean disallow hoisting of *any* loads that happen inside of a
 transaction (regardless of whether a subsequent load happens on every path
 out of the loop)?  This would definitely be safe and quite easily doable,
 simply by checking if loads to be hoisted are within a transaction.

Yes, simply handle memory accesses inside a transaction as if they may
trap.  Then all optimizers should be fine (or they are buggy even without
TM).

Richard.



 For hoisting out of or across transactions, we have to reason about more
 than just publication safety.


 Again, __transactions being barriers and all, I don't think we should
 complicate things unnecessarily at this point, since it doesn't happen.

 Aldy


[Patch libiberty/mach-o] fix byte-swapping of indices in wrapper sections.

2012-02-26 Thread Iain Sandoe

Found while testing x86 X ppc ...
.. I missed byte-swapping the indices when outputting the index of the  
GNU wrapper for LTO sections.


OK/When?
Iain

libiberty:

* simple-object-mach-o.c (simple_object_mach_o_write_segment):
Byte-swap indices when required.

diff --git a/libiberty/simple-object-mach-o.c b/libiberty/simple-object-mach-o.c

index af5e4f9..fbf6a3f 100644
--- a/libiberty/simple-object-mach-o.c
+++ b/libiberty/simple-object-mach-o.c
@@ -1224,6 +1224,11 @@ simple_object_mach_o_write_segment (simple_object_write *sobj, int descriptor,

index[4 * i] -= index[0];
   index[0] = 0;

+  /* Swap the indices, if required.  */
+
+  for (i = 0; i < (nsects_in * 4); ++i)
+    set_32 (&index[i], index[i]);
+
   sechdr_offset += sechdrsize;

   /* Write out the section names.



Re: [Patch libiberty/mach-o] fix byte-swapping of indices in wrapper sections.

2012-02-26 Thread Ian Lance Taylor
Iain Sandoe develo...@sandoe-acoustics.co.uk writes:

 Found while testing x86 X ppc ...
 .. I missed byte-swapping the indices when outputting the index of the
 GNU wrapper for LTO sections.

 OK/When?
 Iain

 libiberty:

   * simple-object-mach-o.c (simple_object_mach_o_write_segment):
   Byte-swap indices when required.

 diff --git a/libiberty/simple-object-mach-o.c b/libiberty/simple-object-mach-o.c
 index af5e4f9..fbf6a3f 100644
 --- a/libiberty/simple-object-mach-o.c
 +++ b/libiberty/simple-object-mach-o.c
 @@ -1224,6 +1224,11 @@ simple_object_mach_o_write_segment (simple_object_write *sobj, int descriptor,
   index[4 * i] -= index[0];
index[0] = 0;

 +  /* Swap the indices, if required.  */
 +
 +  for (i = 0; i < (nsects_in * 4); ++i)
 +    set_32 (&index[i], index[i]);
 +
sechdr_offset += sechdrsize;

/* Write out the section names.


I think that is not the only way the code is broken.  When you pass the
array to simple_object_internal_write, you are assuming that the type
unsigned int is exactly 4 bytes long.  I would strongly recommend
changing the code to do something like

  index_bytes = XNEWVEC (unsigned char, nsects_in * 16);
  for (i = 0; i < nsects_in * 4; ++i)
    set_32 (index_bytes + i * 4, index[i]);
  if (!simple_object_internal_write (descriptor, offset, index_bytes,
 nsects_in * 16, errmsg, err))
return 0;
  XFREEVEC (index_bytes);

If that works, I'll approve it.  I think this is OK in stage 4, as it is
a bug fix.

Ian


[wwwdocs] SH update for 4.7

2012-02-26 Thread Oleg Endo
Hello,

The attached patch adds some SH update notes for GCC 4.7.
OK to commit?

Cheers,
Oleg


Index: htdocs/gcc-4.7/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.88
diff -u -r1.88 changes.html
--- htdocs/gcc-4.7/changes.html	23 Feb 2012 13:24:36 -	1.88
+++ htdocs/gcc-4.7/changes.html	26 Feb 2012 21:55:51 -
@@ -735,6 +735,24 @@
 	being defined in preprocessor output.</li>
   </ul>
 
+<h3>SH</h3>
+  <ul>
+    <li>A new option <code>-msoft-atomic</code> has been added.  When it is
+    specified, GCC will generate GNU/Linux compatible gUSA atomic sequences
+    for the new <code>__atomic</code> routines.
+    <li>Since it is neither supported by GAS nor officially documented, code
+    generation for little endian SH2A has been disabled.  Specifying
+    <code>-ml</code> with <code>-m2a*</code> will now result in a compiler
+    error.
+    <li>The defunct <code>-mbranch-cost</code> option has been fixed.
+    <li>Some improvements to the generated code of:
+      <ul>
+        <li>Utilization of the <code>tst #imm,R0</code> instruction.
+        <li>Dynamic shift instructions on SH2A.
+        <li>Integer absolute value calculations.
+      </ul>
+  </ul>
+
 <h3>SPARC</h3>
   <ul>
 <li>The option <code>-mflat</code> has been reinstated.  When it is


Re: [SH] Delete dead GO_IF_LEGITIMATE_INDEX macro

2012-02-26 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 The attached patch deletes the dead GO_IF_LEGITIMATE_INDEX macro in
 sh.h.
 OK to apply to trunk?

OK.

Regards,
kaz


Re: [wwwdocs] SH update for 4.7

2012-02-26 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 The attached patch adds some SH update notes for GCC 4.7.
 OK to commit?

Looks fine to me, though it requires OK from wwwdocs maintainer.

Regards,
kaz


[SH] Use SImode for return value in atomic_compare_and_swap*

2012-02-26 Thread Oleg Endo
Hello,

The attached patch changes the atomic_compare_and_swap expander/insn to
use SImode for the return value instead of QImode.  This is more in line
with the other insns, which handle the T bit as SImode, and avoids some
unnecessary test instructions in cases where the result of the atomic op
in the T bit is re-used.
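
(For illustration only, not from the original mail: the kind of user code
affected, where the boolean result of the builtin feeds a return or
branch and can now stay in SImode, i.e. come straight from the T bit:)

  int
  try_update (int *p, int expected, int desired)
  {
    /* With an SImode success result the T bit of the gUSA sequence can
       be used directly, without an extra tst instruction.  */
    return __atomic_compare_exchange_n (p, &expected, desired, 0,
                                        __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  }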

Tested against rev 184582 with

make -k check RUNTESTFLAGS=--target_board=sh-sim
\{-m2/-ml/-msoft-atomic,
-m2/-mb/-msoft-atomic,
-m2a-single/-mb/-msoft-atomic,
-m4-single/-ml/-msoft-atomic,
-m4-single/-mb/-msoft-atomic,
-m4a-single/-ml/-msoft-atomic,
-m4a-single/-mb/-msoft-atomic}

and no new failures.

Cheers,
Oleg

2012-02-27  Oleg Endo  olege...@gcc.gnu.org

* config/sh/sync.md (atomic_compare_and_swap<mode>): Use SImode
for return value instead of QImode.
(atomic_compare_and_swap<mode>_soft): Likewise.
Index: gcc/config/sh/sync.md
===
--- gcc/config/sh/sync.md	(revision 184582)
+++ gcc/config/sh/sync.md	(working copy)
@@ -109,7 +109,7 @@
   [(plus "add") (minus "sub") (ior "or") (xor "xor") (and "and")])
 
 (define_expand "atomic_compare_and_swap<mode>"
-  [(match_operand:QI 0 "register_operand" "")		;; bool success output
+  [(match_operand:SI 0 "register_operand" "")		;; bool success output
    (match_operand:I124 1 "register_operand" "")		;; oldval output
    (match_operand:I124 2 "memory_operand" "")		;; memory
    (match_operand:I124 3 "register_operand" "")		;; expected input
@@ -131,7 +131,7 @@
   else if (<MODE>mode == HImode)
 emit_insn (gen_zero_extendhisi2 (gen_lowpart (SImode, operands[1]),
  operands[1]));
-  emit_insn (gen_movqi (operands[0], gen_rtx_REG (QImode, T_REG)));
+  emit_insn (gen_movsi (operands[0], gen_rtx_REG (SImode, T_REG)));
   DONE;
 })
 
@@ -144,8 +144,8 @@
 	  UNSPECV_CMPXCHG_1))
(set (mem:I124 (match_dup 1))
 	(unspec_volatile:I124 [(const_int 0)] UNSPECV_CMPXCHG_2))
-   (set (reg:QI T_REG)
-	(unspec_volatile:QI [(const_int 0)] UNSPECV_CMPXCHG_3))
+   (set (reg:SI T_REG)
+	(unspec_volatile:SI [(const_int 0)] UNSPECV_CMPXCHG_3))
 (clobber (match_scratch:SI 4 "=&u"))
(clobber (reg:SI R0_REG))
(clobber (reg:SI R1_REG))]


[SH] Add atomic_exchange patterns

2012-02-26 Thread Oleg Endo
Hello,

The attached patch adds atomic_exchange patterns to the SH target.
This results in slightly better generated code compared to the default
compare_and_swap loop that is generated if atomic_exchange patterns are
absent.
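
(Again purely for illustration, a call that exercises the new pattern:)

  int
  acquire_lock (int *lock)
  {
    /* With atomic_exchange<mode> present this expands to a single gUSA
       exchange sequence instead of a compare-and-swap retry loop.  */
    return __atomic_exchange_n (lock, 1, __ATOMIC_ACQUIRE);
  }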

Tested against rev 184582 with

make -k check RUNTESTFLAGS=--target_board=sh-sim
\{-m2/-ml/-msoft-atomic,
-m2/-mb/-msoft-atomic,
-m2a-single/-mb/-msoft-atomic,
-m4-single/-ml/-msoft-atomic,
-m4-single/-mb/-msoft-atomic,
-m4a-single/-ml/-msoft-atomic,
-m4a-single/-mb/-msoft-atomic}

and no new failures.

Cheers,
Oleg

2012-02-27  Oleg Endo  olege...@gcc.gnu.org

* config/sh/sync.md (atomic_exchange<mode>): New expander.
(atomic_exchange<mode>_soft): New insn.
Index: gcc/config/sh/sync.md
===
--- gcc/config/sh/sync.md	(revision 184582)
+++ gcc/config/sh/sync.md	(working copy)
@@ -164,6 +164,45 @@
 }
   [(set_attr "length" "20")])
 
+(define_expand "atomic_exchange<mode>"
+  [(match_operand:I124 0 "register_operand" "")		;; oldval output
+   (match_operand:I124 1 "memory_operand" "")		;; memory
+   (match_operand:I124 2 "register_operand" "")		;; newval input
+   (match_operand:SI 3 "const_int_operand" "")]		;; memory model
+  "TARGET_SOFT_ATOMIC && !TARGET_SHMEDIA"
+{
+  rtx addr = force_reg (Pmode, XEXP (operands[1], 0));
+  emit_insn (gen_atomic_exchange<mode>_soft
+	     (operands[0], addr, operands[2]));
+  if (<MODE>mode == QImode)
+    emit_insn (gen_zero_extendqisi2 (gen_lowpart (SImode, operands[0]),
+				     operands[0]));
+  else if (<MODE>mode == HImode)
+    emit_insn (gen_zero_extendhisi2 (gen_lowpart (SImode, operands[0]),
+				     operands[0]));
+  DONE;
+})
+
+(define_insn "atomic_exchange<mode>_soft"
+  [(set (match_operand:I124 0 "register_operand" "=&u")
+	(mem:I124 (match_operand:SI 1 "register_operand" "u")))
+   (set (mem:I124 (match_dup 1))
+	(unspec:I124
+	  [(match_operand:I124 2 "register_operand" "u")] UNSPEC_ATOMIC))
+   (clobber (reg:SI R0_REG))
+   (clobber (reg:SI R1_REG))]
+  "TARGET_SOFT_ATOMIC && !TARGET_SHMEDIA"
+{
+  return "mova	1f,r0\n"
+	 "	.align 2			\n"
+	 "	mov	r15,r1			\n"
+	 "	mov	#(0f-1f),r15		\n"
+	 "0:	mov.<i124suffix>	@%1,%0	\n"
+	 "	mov.<i124suffix>	%2,@%1	\n"
+	 "1:	mov	r1,r15";
+}
+  [(set_attr "length" "14")])
+
 (define_expand "atomic_fetch_<fetchop_name><mode>"
   [(set (match_operand:I124 0 "register_operand" "")
 	(match_operand:I124 1 "memory_operand" ""))


Re: [wwwdocs] SH update for 4.7

2012-02-26 Thread Gerald Pfeifer
On Mon, 27 Feb 2012, Kaz Kojima wrote:
 The attached patch adds some SH update notes for GCC 4.7.
 OK to commit?
 Looks fine to me, though it requires OK from wwwdocs maintainer.

It does not. :-)  As port maintainer, you are more than welcome
(and obviously qualified), Kaz!


Oleg, just one detail regarding markup: please close all list
items with </li> at the end, or the validator will haunt you.

I was going to show this using an example, but then figured,
why not be helpful and go ahead and make this change for you?
The patch now looks as follows and I applied it already.

Thanks,
Gerald

Index: changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-4.7/changes.html,v
retrieving revision 1.88
diff -u -3 -p -r1.88 changes.html
--- changes.html23 Feb 2012 13:24:36 -  1.88
+++ changes.html27 Feb 2012 00:30:46 -
@@ -735,6 +735,24 @@ int add_values (const __flash int *p, in
 being defined in preprocessor output.</li>
   </ul>
 
+<h3>SH</h3>
+  <ul>
+    <li>A new option <code>-msoft-atomic</code> has been added.  When it is
+    specified, GCC will generate GNU/Linux compatible gUSA atomic sequences
+    for the new <code>__atomic</code> routines.</li>
+    <li>Since it is neither supported by GAS nor officially documented, code
+    generation for little endian SH2A has been disabled.  Specifying
+    <code>-ml</code> with <code>-m2a*</code> will now result in a compiler
+    error.</li>
+    <li>The defunct <code>-mbranch-cost</code> option has been fixed.</li>
+    <li>Some improvements to the generated code of:
+      <ul>
+        <li>Utilization of the <code>tst #imm,R0</code> instruction.</li>
+        <li>Dynamic shift instructions on SH2A.</li>
+        <li>Integer absolute value calculations.</li>
+      </ul></li>
+  </ul>
+
 <h3>SPARC</h3>
   <ul>
 <li>The option <code>-mflat</code> has been reinstated.  When it is


Re: [SH] Use SImode for return value in atomic_compare_and_swap*

2012-02-26 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 The attached patch changes the atomic_compare_and_swap expander/insn to
 use SImode for the return value instead of QImode.  This is more aligned
 to the other insns which handle the T bit as SImode and avoids some
 unnecessary test instructions in cases where the result of the atomic op
 in the T bit is re-used.
 
 Tested against rev 184582 with
 
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml/-msoft-atomic,
 -m2/-mb/-msoft-atomic,
 -m2a-single/-mb/-msoft-atomic,
 -m4-single/-ml/-msoft-atomic,
 -m4-single/-mb/-msoft-atomic,
 -m4a-single/-ml/-msoft-atomic,
 -m4a-single/-mb/-msoft-atomic}
 
 and no new failures.

OK for 4.8.

Regards,
kaz


Re: [SH] Add atomic_exchange patterns

2012-02-26 Thread Kaz Kojima
Oleg Endo oleg.e...@t-online.de wrote:
 The attached patch adds atomic_exchange patterns to the SH target.
 This results in slightly better generated code compared to the default
 compare_and_swap loop that is generated if atomic_exchange patterns are
 absent.
 
 Tested against rev 184582 with
 
 make -k check RUNTESTFLAGS=--target_board=sh-sim
 \{-m2/-ml/-msoft-atomic,
 -m2/-mb/-msoft-atomic,
 -m2a-single/-mb/-msoft-atomic,
 -m4-single/-ml/-msoft-atomic,
 -m4-single/-mb/-msoft-atomic,
 -m4a-single/-ml/-msoft-atomic,
 -m4a-single/-mb/-msoft-atomic}
 
 and no new failures.

OK for 4.8.

Regards,
kaz


[patch libgcc]: Fix float128 soft-float for mingw targets

2012-02-26 Thread Kai Tietz
Hi,

by recent tests in gcc.target/i386 I noticed that the testcase
float128-2.c failed on execution.  This failure is caused by an
incompatible bitfield-structure definition in soft-fp/quad.h when the
ms-bitfields layout is enabled.

The patch marks those structures as 'gcc_struct' for mingw targets.
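
(A minimal illustration of the layout difference, not part of the patch;
the field widths are only an example:)

  /* Under the MS bitfield layout (the MinGW default), a bitfield whose
     declared type differs from that of the previous field starts a new
     storage unit, so the struct no longer overlays the TFmode value the
     way soft-fp expects.  gcc_struct forces the GNU layout.  */
  struct q_view
  #ifdef __MINGW32__
    __attribute__ ((gcc_struct))
  #endif
  {
    unsigned sign : 1;
    unsigned exp : 15;
    unsigned long long frac : 48;	/* example widths only */
  };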

ChangeLog

2012-02-27  Kai Tietz  kti...@redhat.com

* soft-fp/quad.h: Mark bitfield-structures as gcc_struct.

Regression tested for i686-w64-mingw32, x86_64-w64-mingw32, and
x86_64-unknown-linux-gnu for all languages (including Ada + Obj-C++).
Ok for apply?

Regards,
Kai

Index: soft-fp/quad.h
===
--- soft-fp/quad.h  (revision 184486)
+++ soft-fp/quad.h  (working copy)
@@ -67,7 +67,11 @@
 union _FP_UNION_Q
 {
TFtype flt;
-   struct
+   struct
+#ifdef __MINGW32__
+  /* Make sure we are using gnu-style bitfield handling.  */
+  __attribute__ ((gcc_struct))
+#endif
{
 #if __BYTE_ORDER == __BIG_ENDIAN
   unsigned sign : 1;
@@ -174,7 +178,12 @@
   struct {
 _FP_W_TYPE a, b;
   } longs;
-  struct {
+  struct
+#ifdef __MINGW32__
+  /* Make sure we are using gnu-style bitfield handling.  */
+  __attribute__ ((gcc_struct))
+#endif
+  {
 #if __BYTE_ORDER == __BIG_ENDIAN
 unsigned sign: 1;
 unsigned exp : _FP_EXPBITS_Q;


Re: [PR52001] too many cse reverse equiv exprs (take2)

2012-02-26 Thread Alexandre Oliva
On Feb 26, 2012, Richard Sandiford rdsandif...@googlemail.com wrote:

 It seemed that when we recorded two values V1 and V2 were equivalent,
 we added V1 to V2's location list and V2 to V1's location list.  But
 it sounds from the above like the canonical value is what we want in
 almost all cases, so if V2 is the one that becomes noncanonical, is
 it really worth adding V2 to V1's location list?

I'd given that some thought and concluded that it wasn't safe to take V2
out of V1's list, in case what we were searching for among V1's
locations was precisely V2.  Now, maybe there are ways around that
(say, canonicalizing a value before searching for it) that I haven't
given much thought to.  I didn't think it would buy us much, but I could
easily be wrong, and I'd be glad to look into this given evidence that I
am.

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer