Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote:
 All good.  My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it:

The directory should be /libhf/ or /libhfp/ for that for consistency
with all the other architectures.  Note e.g. x86_64 dynamic linker
is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2.
For distros that choose to multilib softfp vs. hardfp all the hardfp
libraries would go into the usual */lib{qual} paths (for qual hf resp.
hfp), for others /libhf can be a symlink to /lib or for those doing
multiarch stuff can be a symlink to the multiarch location of the thing.

I'm fine with arm and hf (resp. hfp) being mentioned in the name of
the dynamic linker, but IMNSHO having the gnu and eabi strings in
there is overkill - why mention gnu there, when all the other
architectures which also have GNU libc dynamic linkers don't?  That
part is implicit.  And EABI is implied by so.3: the softfp dynamic linker
for EABI is /lib/ld-linux.so.3, while the softfp dynamic linker for the old
ABI is /lib/ld-linux.so.2.

Jakub


[PATCH] Option to build bare-metal ARM cross-compiler for arm-none-eabi target without libunwind

2012-04-12 Thread Fredrik Hederstierna
We've been using a bare-metal 'arm-elf' cross-compiler toolchain over the years
since GCC 2.95.x/GCC3/GCC4, and it has worked fine.
Now, though, the 'arm-elf' target seems obsolete, and we are trying
to switch to 'arm-none-eabi'.

Note that we use no standard libs at all; we are really hard bare-metal, with
our own memcpy() implementation etc.

Now when we try to build GCC 4.7.0 we get errors because libgcc has dependencies
on unwind stuff that we cannot get rid of.
The dependencies are caused by some division library functions like _divdi3
that can throw exceptions such as division-by-zero.
These exceptions, I guess, mainly target C++/Java exception handling, and we want
to avoid dependencies on libunwind when building libgcc for our bare-metal pure
C toolchain.
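(For context on where the division-by-zero path ends up: on arm-none-eabi,
the integer division helpers report divide-by-zero through the
__aeabi_idiv0/__aeabi_ldiv0 hooks defined by the ARM run-time ABI, and a
bare-metal project can provide its own.  A minimal sketch, assuming only the
ABI-defined hook names; the bodies are illustrative:

/* The ARM RT ABI says the value returned here becomes the result of the
   failed division; trapping keeps the sketch simple.  */
int __aeabi_idiv0 (int result)
{
  __builtin_trap ();
  return result;  /* not reached */
}

long long __aeabi_ldiv0 (long long result)
{
  __builtin_trap ();
  return result;  /* not reached */
}

Providing such hooks does not by itself remove the unwind dependencies this
patch is about; it only shows where the exception path originates.)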

I attach a patch that adds an option to configure to avoid building libgcc with 
libunwind dependencies.
The new option is called '--disable-libunwind-exceptions'.

Over the years several people have experienced the same problem; here is some
background from the gcc mailing lists:
http://gcc.gnu.org/ml/gcc-help/2012-03/msg00352.html
http://gcc.gnu.org/ml/gcc-help/2009-10/msg00302.html

Please review and comment!
Best Regards,

Fredrik Hederstierna
Securitas Direct AB
Malmoe Sweden


disable_libunwind_exceptions.patch
Description: Binary data


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2012 at 10:33:08AM +0300, Riku Voipio wrote:
 On 12 April 2012 09:05, Jakub Jelinek ja...@redhat.com wrote:
  On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote:
  All good.  My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it:
 
  The directory should be /libhf/ or /libhfp/ for that for consistency
  with all the other architectures.  Note e.g. x86_64 dynamic linker
  is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2.
 
 For some value of consistency. x86_64, mips64, powerpc64 and sparc64
 install to /lib64. But on ia64 it is /lib/ld-linux-ia64.so.2 and on

ia64 installs in /lib, because it isn't a multilibbed architecture.

 s390x it is /lib/ld64.so.1 [1].

Ok, I forgot about this.  I tried to convince the s390x folks to move it
to /lib64/ld64.so.1 many years ago, but that just didn't happen, so
/lib/ld64.so.1 is just a symlink to /lib64/ld64.so.1.
Upstream glibc binaries use /lib64/ld64.so.1 as their dynamic linker,
while all other packages use /lib/ld64.so.1, as that is hardcoded in gcc.

That is an argument that perhaps /lib/ld-linux-armhf.so.3 could be
acceptable too, as it would follow the s390x model.  I wouldn't be terribly
happy about that, but I could live with it.

Jakub


Re: [C++ Patch] for c++/52465

2012-04-12 Thread Fabien Chêne
2012/4/11 Jason Merrill ja...@redhat.com:
[...]
 Your ChangeLog needs to be adjusted.  :)

I did write an updated ChangeLog, but I pasted the wrong one here...
I committed it to trunk yesterday, hopefully with the correct
ChangeLog.  I'll be backporting it to 4.7 after a relaxing weekend :-)

-- 
Fabien


Re: [cxx-conversion] Switch default to build in C++

2012-04-12 Thread Richard Guenther
On Wed, Apr 11, 2012 at 6:37 PM, Diego Novillo dnovi...@google.com wrote:
 This patch makes the branch to build in C++ mode by default.

 Tested on x86_64.  Committed.

Note that for merging the branch the support to build with a C host compiler
should be removed (and docs updated).

Richard.


 ChangeLog.cxx-conversion
        * configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
        * configure: Regenerate.

 gcc/ChangeLog.cxx-conversion
        * configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
        * configure: Regenerate.

 libcpp/ChangeLog.cxx-conversion
        * configure.ac (ENABLE_BUILD_WITH_CXX): Default to 'yes'.
        * configure: Regenerate.

 Index: configure.ac
 ===
 --- configure.ac        (revision 186326)
 +++ configure.ac        (working copy)
 @@ -1191,7 +1191,7 @@
  [AS_HELP_STRING([--enable-build-with-cxx],
                [build with C++ compiler instead of C compiler])],
  ENABLE_BUILD_WITH_CXX=$enableval,
 -ENABLE_BUILD_WITH_CXX=no)
 +ENABLE_BUILD_WITH_CXX=yes)

  # Build stage1 with C and build stages 2 and 3 with C++.
  AC_ARG_ENABLE(build-poststage1-with-cxx,
 Index: gcc/configure.ac
 ===
 --- gcc/configure.ac    (revision 186326)
 +++ gcc/configure.ac    (working copy)
 @@ -608,7 +608,7 @@
  [AS_HELP_STRING([--enable-build-with-cxx],
                [build with C++ compiler instead of C compiler])],
  ENABLE_BUILD_WITH_CXX=$enableval,
 -ENABLE_BUILD_WITH_CXX=no)
 +ENABLE_BUILD_WITH_CXX=yes)
  AC_SUBST(ENABLE_BUILD_WITH_CXX)
  if test $ENABLE_BUILD_WITH_CXX = yes; then
   AC_DEFINE(ENABLE_BUILD_WITH_CXX, 1,
 Index: libcpp/configure.ac
 ===
 --- libcpp/configure.ac (revision 186326)
 +++ libcpp/configure.ac (working copy)
 @@ -22,7 +22,7 @@
  AC_ARG_ENABLE(build-with-cxx,
  [  --enable-build-with-cxx build with C++ compiler instead of C compiler],
  ENABLE_BUILD_WITH_CXX=$enableval,
 -ENABLE_BUILD_WITH_CXX=no)
 +ENABLE_BUILD_WITH_CXX=yes)
  AC_SUBST(ENABLE_BUILD_WITH_CXX)

  MISSING=`cd $ac_aux_dir  ${PWDCMD-pwd}`/missing


Re: [IA-64] Fix PR target/48496

2012-04-12 Thread Richard Guenther
On Wed, Apr 11, 2012 at 11:01 PM, Eric Botcazou ebotca...@adacore.com wrote:
 Hi,

 this is the spurious error on asm statements of the form:

  error: 'asm' operand requires impossible reload

 present for IA-64 on mainline and 4.7 branch.  As diagnosed by Ulrich, the 
 code
 responsible for the error implicitly assumes that constraints accepting memory
 operands also accept pseudo-registers during reload.  That isn't true for the
 Q constraint of IA-64.  The patch also fixes the MEM_VOLATILE_P access not
 properly guarded by a MEM_P test in the expression.

 Bootstrapped/regtested on IA-64/Linux, OK for mainline and 4.7 branch?

How ugly ;)  Can we have a generic helper for the pseudo-created-by-reload
test on trunk please?

Ok.

Thanks,
Richard.


 2012-04-11  Eric Botcazou  ebotca...@adacore.com

        PR target/48496
        * config/ia64/constraints.md (Q): Only accept non-volatile MEMs and
        also pseudo-registers during reload.


 2012-04-11  Eric Botcazou  ebotca...@adacore.com

        * gcc.target/ia64/pr48496.c: New test.
        * gcc.target/ia64/pr52657.c: Likewise.


 --
 Eric Botcazou


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Kirill Yukhin
 Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
 HLE_END instead?  Not particularly great names, but at least it avoids
 overloading ACQUIRE/RELEASE and thus should make it clearer that you
 still need to specify a memory order.

IMHO, this is also not as good, since ACQUIRE/RELEASE reflect the actual
prefix names.
HLE_START/END may confuse users even more.

I also agree with Jakub that such things shouldn't be implied.  Going
to remove it from the patch.

K


Re: [IA-64] Fix PR target/48496

2012-04-12 Thread Jakub Jelinek
On Wed, Apr 11, 2012 at 11:01:40PM +0200, Eric Botcazou wrote:
 2012-04-11  Eric Botcazou  ebotca...@adacore.com
 
   PR target/48496
   * config/ia64/constraints.md (Q): Only accept non-volatile MEMs and
   also pseudo-registers during reload.
 
 
 2012-04-11  Eric Botcazou  ebotca...@adacore.com
 
   * gcc.target/ia64/pr48496.c: New test.
   * gcc.target/ia64/pr52657.c: Likewise.

Is the standard condition for define_memory_constraint here
/* Likewise if the address will be reloaded because
   reg_equiv_address is nonzero.  For reg_equiv_mem
   we have to check.  */
else if (REG_P (operand)
	 && REGNO (operand) >= FIRST_PSEUDO_REGISTER
	 && reg_renumber[REGNO (operand)] < 0
	 && ((reg_equiv_mem (REGNO (operand)) != 0
	      && EXTRA_CONSTRAINT_STR (reg_equiv_mem (REGNO (operand)), c, p))
	     || (reg_equiv_address (REGNO (operand)) != 0)))
  win = 1;
If so, shouldn't you check those conditions as well, or at least something
similar?  Not sure if reg_equiv_address needs to be allowed there, and I guess
reg_equiv_mem should satisfy the Q constraint, i.e. !MEM_VOLATILE_P &&
memory_operand.  Accepting any pseudo there sounds too risky to me...

 Index: config/ia64/constraints.md
 ===
 --- config/ia64/constraints.md(revision 186272)
 +++ config/ia64/constraints.md(working copy)
@@ -111,11 +111,16 @@ (define_constraint "H"
 
 ;; Note that while this accepts mem, it only accepts non-volatile mem,
 ;; and so cannot be fixed by adjusting the address.  Thus it cannot
-;; and does not use define_memory_constraint.
+;; and does not use define_memory_constraint.  But it needs to accept
+;; pseudo-registers during reload like a define_memory_constraint.
 (define_constraint "Q"
   "Non-volatile memory for FP_REG loads/stores"
-  (and (match_operand 0 "memory_operand")
-       (match_test "!MEM_VOLATILE_P (op)")))
+  (ior (and (match_code "mem")
+	    (match_test "!MEM_VOLATILE_P (op)")
+	    (match_operand 0 "memory_operand"))
+       (and (match_code "reg")
+	    (match_test "!HARD_REGISTER_P (op)")
+	    (match_test "reload_in_progress"))))
 
 (define_constraint "R"
   "1..4 for shladd arguments"

Jakub


[Patch, Fortran] PR52864 - Fix pointer-intent regresssion

2012-04-12 Thread Tobias Burnus
That's a GCC 4.6-4.8 regression.  Pointer intents apply to the
association status and not to the value.  Thus, assigning to an
intent(in) pointer is fine.  The problem was that the LHS is not a pointer
due to the array access (dt%ptr(1) =); thus, the check got triggered.


Built and regtested on x86-64-linux.
OK for the trunk and the 4.6/4.7 branch?

Tobias
2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* expr.c (gfc_check_vardef_context): Fix assignment check for
	pointer components.

2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* gfortran.dg/pointer_intent_6.f90: New.

diff --git a/gcc/fortran/expr.c b/gcc/fortran/expr.c
index e6a9c88..7ce693b 100644
--- a/gcc/fortran/expr.c
+++ b/gcc/fortran/expr.c
@@ -4656,7 +4680,11 @@ gfc_check_vardef_context (gfc_expr* e, bool pointer, bool alloc_obj,
       if (ptr_component && ref->type == REF_COMPONENT)
 	check_intentin = false;
       if (ref->type == REF_COMPONENT && ref->u.c.component->attr.pointer)
-	ptr_component = true;
+	{
+	  ptr_component = true;
+	  if (!pointer)
+	    check_intentin = false;
+	}
     }
   if (check_intentin && sym->attr.intent == INTENT_IN)
     {
--- /dev/null	2012-04-12 06:55:49.927755790 +0200
+++ gcc/gcc/testsuite/gfortran.dg/pointer_intent_6.f90	2012-04-12 09:35:25.0 +0200
@@ -0,0 +1,19 @@
+! { dg-do compile }
+!
+! PR fortran/52864
+!
+! Assigning to an intent(in) pointer (which is valid).
+!
+  program test
+ type PoisFFT_Solver3D
+   complex, dimension(:,:,:), &
+   pointer :: work => null()
+ end type PoisFFT_Solver3D
+  contains
+subroutine PoisFFT_Solver3D_FullPeriodic(D, p)
+  type(PoisFFT_Solver3D), intent(in) :: D
+  real, intent(in), pointer :: p(:)
+  D%work(i,j,k) = 0.0
+  p = 0.0
+end subroutine
+  end


Re: [PATCH] [ARM] thumb1 imm move 256-510

2012-04-12 Thread Richard Earnshaw
On 12/04/12 02:48, Joey Ye wrote:
 For thumb1 use move + add instructions for immediate move [256-510].
 
 Following is the complete range when combining an imm mov with the listed
 instructions.  Among them, lsls and neg have already been implemented.  The
 only missing opportunity is add, which I enabled in this patch.  The others
 are already covered by lsls or neg.
 
 1. Result imm range with 16bit movs
 Insn  Result imm with movs
 lsls  [0x1-0xff] shifted by 1-31
 add   [256, 510]
 neg   [-255, 0]
 rev   [0x100-0xff00] step 0x100
 rev16 [0x100-0xff00] step 0x100
 revsh [0x1-0xff] step 0x1
 rsb   [-255, 0]
 mvn   [-255, 0]
 sub   [-255, 0]
 sxtb  [-255, 0]
 
 ChangeLog:
 
   * config/arm/constraints.md (Pe): New constraint.
   * config/arm/arm.md: New split for imm 256-510.
 
 Testcase:
   * gcc.target/arm/thumb1-imm.c: New testcase.
 

OK.

R.
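As a concrete illustration of the split (a sketch, not taken from the patch
itself), an immediate in [256, 510] becomes a 16-bit mov plus a 16-bit add:

/* For a thumb1 target, the constant 400 can be synthesized as
     movs r0, #255 ; adds r0, #145    (255 + 145 == 400)
   i.e. two 16-bit instructions instead of a literal-pool load.  */
int
f (void)
{
  return 400;
}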



Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Kirill Yukhin
Folks,
Here is patch with removed implied atomic ACQUIRE/RELEASE. Could you
please have a look?

ChangeLog entry:
2012-04-12  Kirill Yukhin  kirill.yuk...@intel.com

* builtins.c (get_memmodel): Remove check of upper bound.
(expand_builtin_atomic_compare_exchange): Mask memmodel values.
* config/i386/cpuid.h (bit_HLE): New.
* config/i386/driver-i386.c (host_detect_local_cpu): Detect
HLE support.
* config/i386/i386-protos.h (ix86_generate_hle_prefix): New.
* config/i386/i386-c.c (ix86_target_macros_internal): Set
HLE defines.
(ix86_target_string)-mhle: New.
(ix86_option_override_internal)PTA_HLE: Ditto.
(ix86_valid_target_attribute_inner_p)OPT_mhle: Ditto.
* config/i386/i386.c (ix86_target_string)OPTION_MASK_ISA_HLE:
New.
(ix86_valid_target_attribute_inner_p)OPT_mhle: Ditto.
(ix86_generate_hle_prefix): Ditto.
* config/i386/i386.h (OPTION_ISA_HLE): Ditto.
(IX86_HLE_ACQUIRE): Ditto.
(IX86_HLE_RELEASE): Ditto.
* config/i386/i386.h (ix86_generate_hle_prefix): Ditto.
* config/i386/i386.opt (mhle): Ditto.
* config/i386/sync.md(atomic_compare_and_swapmode): Pass
success model to instruction emitter.
(atomic_compare_and_swap_singlemode): Define and use argument
for success model.
(atomic_compare_and_swap_doublemode): Ditto.
* configure.ac: Check if the assembler supports HLE prefixes.
* configure: Regenerate.
* config.in: Ditto.

Thanks, K


hle-rfc-5.gcc.patch
Description: Binary data


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2012 at 01:37:24PM +0400, Kirill Yukhin wrote:
 Folks,
 Here is patch with removed implied atomic ACQUIRE/RELEASE. Could you
 please have a look?

+
+  sprintf (hle_macro, "__ATOMIC_HLE_ACQUIRE=%d", IX86_HLE_ACQUIRE);
+  def_or_undef (parse_in, hle_macro);
+
+  sprintf (hle_macro, "__ATOMIC_HLE_RELEASE=%d", IX86_HLE_RELEASE);
+  def_or_undef (parse_in, hle_macro);

This doesn't belong to ix86_target_macros_internal, but to
ix86_target_macros.  It is defined unconditionally, so you don't want to
undef and define it again on each target pragma, and furthermore
cpp_undef with __ATOMIC_HLE_ACQUIRE=something wouldn't work (for
undef you'd need __ATOMIC_HLE_ACQUIRE).  And in ix86_target_macros
you should be able to use cpp_define_formatted and avoid the temporary
buffer.
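Concretely, that would look something like this (a sketch, reusing the macro
and value names from the patch):

  /* In ix86_target_macros: defined once at startup, never undef'ed by
     target pragmas, and no temporary buffer needed.  */
  cpp_define_formatted (parse_in, "__ATOMIC_HLE_ACQUIRE=%d",
			IX86_HLE_ACQUIRE);
  cpp_define_formatted (parse_in, "__ATOMIC_HLE_RELEASE=%d",
			IX86_HLE_RELEASE);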

As for the rest of the patch, I'd like Richard to chime in...

Jakub


strengthen protection against REG_EQUIV/EQUAL on !REG set

2012-04-12 Thread Olivier Hainque
Hello,

This is a followup on a suggestion made along the thread at

  http://gcc.gnu.org/ml/gcc-patches/2010-09/msg00967.html

where we were observing the middle-end setting invalid REG_EQUIV
notes on set(mem) insns. At the time we had fixed specific locations
where this was happening via explicit calls to set_unique_reg_note.

Steven suggested adding some code to set_unique_reg_note as well.  This
patch implements that suggestion.  We (AdaCore) have been using it without
problems for a while now; it turned out useful in preventing a real case in
the gcc-4.5 series, and we have never seen the original issue resurface
since then.

Bootstrapped and regtested on x86_64-suse-linux

OK to commit ?

Thanks in advance,

With Kind Regards,

Olivier

2012-04-12  Olivier Hainque  hain...@adacore.com

* emit-rtl.c (set_unique_reg_note): Don't add REG_EQUAL or REG_EQUIV
on a SET if the destination isn't a REG or a SUBREG of REG.
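Since the attachment shows up as binary data in the archive, here is a sketch
of the shape such a guard could take, going by the ChangeLog text above (the
details are hypothetical, not the actual diff):

  /* In set_unique_reg_note: refuse REG_EQUAL/REG_EQUIV when the SET
     destination is not a REG or a SUBREG of a REG.  */
  if ((kind == REG_EQUAL || kind == REG_EQUIV)
      && GET_CODE (PATTERN (insn)) == SET)
    {
      rtx dest = SET_DEST (PATTERN (insn));
      if (!REG_P (dest)
	  && !(GET_CODE (dest) == SUBREG && REG_P (SUBREG_REG (dest))))
	return NULL_RTX;
    }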



regequiv.dif
Description: video/dv


Re: strengthen protection against REG_EQUIV/EQUAL on !REG set

2012-04-12 Thread Olivier Hainque

Clarifying:

On Apr 12, 2012, at 11:54 , Olivier Hainque wrote:
 At the time we had fixed specific locations
 where this was happening via explicit calls to set_unique_reg_note.

 We had fixed the problems observable at the time by preventing
 calls to set_unique_reg_notes when they would lead to invalid settings.

 We did that at specific locations where we had observed real
 problematic calls taking place.

 Steven suggested to add some code to set_unique_reg_note as well.
 This patch implements this suggestion.


 Olivier


[PATCH] Also use max-iteration knowledge for cost consideration

2012-04-12 Thread Richard Guenther

This makes us also use max-iteration knowledge when considering costs
for prefetching and parallelization, instead of only relying on
iteration estimates, which are sometimes not available.
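As a hypothetical illustration of when this matters: in a loop like the
following there may be no profile-based estimate, but an upper bound can
still be proven:

void
f (int *a, unsigned n)
{
  unsigned i;
  /* Without profile data, estimated_stmt_executions_int () may fail here,
     but the mask bounds the trip count by 15, the kind of upper bound
     max_stmt_executions_int () reports.  */
  for (i = 0; i < (n & 15); i++)
    a[i] = 0;
}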

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-04-11  Richard Guenther  rguent...@suse.de

* tree-parloops.c (parallelize_loops): Also consult the upper
bound for the number of iterations.
* tree-ssa-loop-prefetch.c (determine_loop_nest_reuse): Likewise.
(loop_prefetch_arrays): Likewise.

Index: trunk/gcc/tree-parloops.c
===
*** trunk.orig/gcc/tree-parloops.c  2012-04-11 16:31:08.0 +0200
--- trunk/gcc/tree-parloops.c   2012-04-11 16:39:25.271205694 +0200
*** parallelize_loops (void)
*** 2192,2203 ****
  	     header-copied loops correctly - see PR46886.  */
  	  || !do_while_loop_p (loop))
  	continue;
        estimated = estimated_stmt_executions_int (loop);
        /* FIXME: Bypass this check as graphite doesn't update the
! 	 count and frequency correctly now.  */
        if (!flag_loop_parallelize_all
! 	  && ((estimated != -1
! 	       && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
  	      /* Do not bother with loops in cold areas.  */
  	      || optimize_loop_nest_for_size_p (loop)))
  	continue;
--- 2192,2206 ----
  	     header-copied loops correctly - see PR46886.  */
  	  || !do_while_loop_p (loop))
  	continue;
+ 
        estimated = estimated_stmt_executions_int (loop);
+       if (estimated == -1)
+ 	estimated = max_stmt_executions_int (loop);
        /* FIXME: Bypass this check as graphite doesn't update the
! 	 count and frequency correctly now.  */
        if (!flag_loop_parallelize_all
! 	  && ((estimated != -1
! 	       && estimated <= (HOST_WIDE_INT) n_threads * MIN_PER_THREAD)
  	      /* Do not bother with loops in cold areas.  */
  	      || optimize_loop_nest_for_size_p (loop)))
  	continue;
Index: trunk/gcc/tree-ssa-loop-prefetch.c
===
*** trunk.orig/gcc/tree-ssa-loop-prefetch.c 2012-04-11 16:31:08.0 
+0200
--- trunk/gcc/tree-ssa-loop-prefetch.c  2012-04-11 16:42:45.864193724 +0200
*** determine_loop_nest_reuse (struct loop *
*** 1549,1555 
  
aloop = VEC_index (loop_p, vloops, i);
vol = estimated_stmt_executions_int (aloop);
!   if (vol < 0)
vol = expected_loop_iterations (aloop);
volume *= vol;
  }
--- 1549,1555 
  
aloop = VEC_index (loop_p, vloops, i);
vol = estimated_stmt_executions_int (aloop);
!   if (vol == -1)
vol = expected_loop_iterations (aloop);
volume *= vol;
  }
*** loop_prefetch_arrays (struct loop *loop)
*** 1801,1806 
--- 1801,1808 
  
ahead = (PREFETCH_LATENCY + time - 1) / time;
est_niter = estimated_stmt_executions_int (loop);
+   if (est_niter == -1)
+ est_niter = max_stmt_executions_int (loop);
  
/* Prefetching is not likely to be profitable if the trip count to ahead
   ratio is too small.  */


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Igor Zamyatin
On Wed, Apr 11, 2012 at 5:38 PM, Andi Kleen a...@firstfloor.org wrote:
 Igor Zamyatin izamya...@gmail.com writes:

 Hi All!

 It is known that imul placement is rather critical for Atom processors
 and changes try to improve imul scheduling for Atom.

 This gives +5% performance on several tests from new OA 2.0 testsuite
 from EEMBC.

 Tested for i386 and x86-64, ok for trunk?

 Did you measure how much this slows down the compiler when compiling
 for Atom? The new pass looks rather slow.

No, since this pass is applied rarely.  The conditions are mentioned in
the comments to ix86_sched_reorder.


 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Igor Zamyatin
On Wed, Apr 11, 2012 at 6:10 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 3:38 PM, Andi Kleen a...@firstfloor.org wrote:
 Igor Zamyatin izamya...@gmail.com writes:

 Hi All!

 It is known that imul placement is rather critical for Atom processors
 and changes try to improve imul scheduling for Atom.

 This gives +5% performance on several tests from new OA 2.0 testsuite
 from EEMBC.

 Tested for i386 and x86-64, ok for trunk?

 Did you measure how much this slows down the compiler when compiling
 for Atom? The new pass looks rather slow.

 Also please explain why adjusting the automaton for Atom is not a way to
 attack this issue.

If I understand the question correctly - it's a dynamic check, and it's
not clear how to describe this adjustment statically in the machine
description.


 Richard.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Richard Guenther
On Thu, Apr 12, 2012 at 12:20 PM, Igor Zamyatin izamya...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 6:10 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 3:38 PM, Andi Kleen a...@firstfloor.org wrote:
 Igor Zamyatin izamya...@gmail.com writes:

 Hi All!

 It is known that imul placement is rather critical for Atom processors
 and changes try to improve imul scheduling for Atom.

 This gives +5% performance on several tests from new OA 2.0 testsuite
 from EEMBC.

 Tested for i386 and x86-64, ok for trunk?

 Did you measure how much this slows down the compiler when compiling
 for Atom? The new pass looks rather slow.

 Also please explain why adjusting the automaton for Atom is not a way to
 attack this issue.

 If I understand the question correctly - it's a dynamic check and it's
 not clear how to describe this adjusting statically in machine
 description

From reading the code (the comments are not clear to me) you seem to
produce two adjacent IMUL instructions from within the ready list - but
first check that there is only a single one.  So I fail to see how it can work.
Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
behavior?

You miss a testcase that would show the effect of your patch.

Richard.


 Richard.

 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-12 Thread Igor Zamyatin
On Wed, Apr 11, 2012 at 5:34 PM, Andi Kleen a...@firstfloor.org wrote:
 Richard Guenther richard.guent...@gmail.com writes:

 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
 why? Why do you disable register renaming?  check_imull requires a function
 comment.

 This completely looks like a hack for EEMBC2.0, so it's definitely not ok.

 -O2 is not supposed to give best benchmark results.

 Besides it is against the Intel Optimization Manual recommendation
 to prefer small code on Atom to avoid falling out of the predecode hints
 in the cache.

Yes, this is a well-known concern for Atom.  But at the same time unrolling
could help a lot on in-order machines because it provides more
opportunities to the compiler scheduler.  And experiments showed that
unrolling could really help.


 So would need much more benchmarking on macro workloads first at least.

Like what, for example?  I believe in this case everything also
strongly depends on the test usage model (e.g. it is usually compiled with
-Os, not -O2) and, let's say, on the internal test structure - whether there
are hot loops suitable for unrolling.


 -Andi

 --
 a...@linux.intel.com -- Speaking for myself only


Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-12 Thread Igor Zamyatin
On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi All!

 Here is a patch that enables unroll at O2 for Atom.

 This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32
 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32
 bits).

 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
 why? Why do you disable register renaming?  check_imull requires a function
 comment.

Sure, enabling unrolling for -O3 could be the next step.
We can't avoid a code size increase with unrolling - what number do you
think would be appropriate?
Register renaming was the cause of several degradations during the tuning
process.  A comment for check_imull was added.


 This completely looks like a hack for EEMBC2.0, so it's definitely not ok.

Why?  EEMBC was measured and the result provided here just because this
benchmark is considered to be very relevant for Atom.


 -O2 is not supposed to give best benchmark results.

-O2 is widely used, so a performance improvement could be important for users.


 Thanks,
 Richard.


 Tested for i386 and x86-64, ok for trunk?

Updated patch attached


 Thanks,
 Igor

 ChangeLog:

 2012-04-10  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

        * gcc/config/i386/i386.c (check_imul): New routine.
        (ix86_loop_unroll_adjust): New target hook.
        (ix86_option_override_internal): Enable unrolling on Atom at -O2.
        (TARGET_LOOP_UNROLL_ADJUST): New define.


unroll1.patch
Description: Binary data


Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-12 Thread Richard Guenther
On Thu, Apr 12, 2012 at 1:05 PM, Igor Zamyatin izamya...@gmail.com wrote:
 On Wed, Apr 11, 2012 at 12:39 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Tue, Apr 10, 2012 at 8:43 PM, Igor Zamyatin izamya...@gmail.com wrote:
 Hi All!

 Here is a patch that enables unroll at O2 for Atom.

 This gives good performance boost on EEMBC 2.0 (~+8% in Geomean for 32
 bits) with quite moderate code size increase (~5% for EEMBC2.0, 32
 bits).

 5% is not moderate.  Your patch does enable unrolling at -O2 but not -O3,
 why? Why do you disable register renaming?  check_imull requires a function
 comment.

 Sure, enabling unroll for O3 could be the next step.
 We can't avoid code size increase with unroll - what number do you
 think will be appropriate?
 Register renaming was the reason of several degradations during tuning process
 Comment for check_imull was added


 This completely looks like a hack for EEMBC2.0, so it's definitely not ok.

 Why? EEMBC was measured and result provided here just because this
 benchmark considers to be very relevant for Atom

I'd say that SPEC INT (2000 / 2006) is more relevant for Atom (SPEC FP
would be irrelevant OTOH).  Similarly, code size for, say, Mozilla Firefox
or GCC itself would be important.

 -O2 is not supposed to give best benchmark results.

 O2 is wide-used so performance improvement could be important for users.

But not at a 5% size cost.  Please also always check the compile-time effect,
which is important for -O2 as well.

Richard.


 Thanks,
 Richard.


 Tested for i386 and x86-64, ok for trunk?

 Updated patch attached


 Thanks,
 Igor

 ChangeLog:

 2012-04-10  Yakovlev Vladimir  vladimir.b.yakov...@intel.com

        * gcc/config/i386/i386.c (check_imul): New routine.
        (ix86_loop_unroll_adjust): New target hook.
        (ix86_option_override_internal): Enable unrolling on Atom at -O2.
        (TARGET_LOOP_UNROLL_ADJUST): New define.


Re: [PATCH, i386, Android] -mandroid support for i386 target

2012-04-12 Thread Ilya Enkovich

 This is a big change.  Maybe it is better to break it into 2
 parts:

 1. Introduce gnu-user32.h only, which can be reviewed by x86 backend
 maintainers.   Another possibility is to add gnu-user-common.h instead of
 gnu-user32.h so that no changes to config.gcc are needed.
 2. Add Android support, which can reviewed by Android maintainer.

 Thanks.

 --
 H.J.

OK, I'll split this patch and send new patches in separate threads.

Thanks
Ilya


[PATCH] Fix PR52943

2012-04-12 Thread Richard Guenther

Data-dependence analysis gets the analysis of a[3] vs. a[{3, +, -1}_1] wrong,
claiming there is no dependence.  That is because
analyze_siv_subscript_cst_affine uses chrec_is_positive to distinguish
cases, but that always returns negative for zero (zero, in this case,
being the distance between 3 and 3).  The following patch makes
chrec_is_positive fail for zero, which is neither positive nor
negative.  It also special-cases the zero-distance case because it
can be handled independently of the direction or value of the increment.
Given the questionable handling of 'zero', I moved chrec_is_positive to
a private place.
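A hypothetical testcase of the affected shape (not the committed pr52943.c):
both references touch a[3] in the first iteration, so the dependence distance
is zero, which is neither positive nor negative.

int a[4];

void
f (int n)
{
  int i;
  for (i = 0; i < n; i++)
    {
      a[3] = 1;       /* constant subscript 3 */
      a[3 - i] = 2;   /* chrec {3, +, -1}_1, equal to 3 when i == 0 */
    }
}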

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.  I will
apply this to the trunk and the 4.7 branch where this latent bug
shows up first.

Thanks,
Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

PR tree-optimization/52943
* tree-chrec.h (chrec_is_positive): Remove.
* tree-scalar-evolution.c (chrec_is_positive): Move ...
* tree-data-ref.c (chrec_is_positive): ... here.  Make static.
Return false for a constant zero instead of negative.
(analyze_siv_subscript_cst_affine): Handle zero difference
in the initial condition explicitely.

* gcc.dg/torture/pr52943.c: New testcase.

Index: gcc/tree-chrec.h
===
*** gcc/tree-chrec.h(revision 186372)
--- gcc/tree-chrec.h(working copy)
*** extern void for_each_scev_op (tree *, bo
*** 77,83 
  /* Observers.  */
  extern bool eq_evolutions_p (const_tree, const_tree);
  extern bool is_multivariate_chrec (const_tree);
- extern bool chrec_is_positive (tree, bool *);
  extern bool chrec_contains_symbols (const_tree);
  extern bool chrec_contains_symbols_defined_in_loop (const_tree, unsigned);
  extern bool chrec_contains_undetermined (const_tree);
--- 77,82 
Index: gcc/tree-scalar-evolution.c
===
*** gcc/tree-scalar-evolution.c (revision 186372)
--- gcc/tree-scalar-evolution.c (working copy)
*** compute_overall_effect_of_inner_loop (st
*** 501,565 
  return chrec_dont_know;
  }
  
- /* Determine whether the CHREC is always positive/negative.  If the expression
-cannot be statically analyzed, return false, otherwise set the answer into
-VALUE.  */
- 
- bool
- chrec_is_positive (tree chrec, bool *value)
- {
-   bool value0, value1, value2;
-   tree end_value, nb_iter;
- 
-   switch (TREE_CODE (chrec))
- {
- case POLYNOMIAL_CHREC:
-   if (!chrec_is_positive (CHREC_LEFT (chrec), &value0)
-	|| !chrec_is_positive (CHREC_RIGHT (chrec), &value1))
-   return false;
- 
-   /* FIXME -- overflows.  */
-   if (value0 == value1)
-   {
- *value = value0;
- return true;
-   }
- 
-   /* Otherwise the chrec is under the form: {-197, +, 2}_1,
-and the proof consists in showing that the sign never
-changes during the execution of the loop, from 0 to
-    loop->nb_iterations.  */
-   if (!evolution_function_is_affine_p (chrec))
-   return false;
- 
-   nb_iter = number_of_latch_executions (get_chrec_loop (chrec));
-   if (chrec_contains_undetermined (nb_iter))
-   return false;
- 
- #if 0
-   /* TODO -- If the test is after the exit, we may decrease the number of
-iterations by one.  */
-   if (after_exit)
-   nb_iter = chrec_fold_minus (type, nb_iter, build_int_cst (type, 1));
- #endif
- 
-   end_value = chrec_apply (CHREC_VARIABLE (chrec), chrec, nb_iter);
- 
-   if (!chrec_is_positive (end_value, &value2))
-   return false;
- 
-   *value = value0;
-   return value0 == value1;
- 
- case INTEGER_CST:
-   *value = (tree_int_cst_sgn (chrec) == 1);
-   return true;
- 
- default:
-   return false;
- }
- }
- 
  /* Associate CHREC to SCALAR.  */
  
  static void
--- 501,506 
Index: gcc/tree-data-ref.c
===
*** gcc/tree-data-ref.c (revision 186372)
--- gcc/tree-data-ref.c (working copy)
*** max_stmt_executions_tree (struct loop *l
*** 1718,1723 
--- 1718,1793 
return double_int_to_tree (unsigned_type_node, nit);
  }
  
+ /* Determine whether the CHREC is always positive/negative.  If the expression
+cannot be statically analyzed, return false, otherwise set the answer into
+VALUE.  */
+ 
+ static bool
+ chrec_is_positive (tree chrec, bool *value)
+ {
+   bool value0, value1, value2;
+   tree end_value, nb_iter;
+ 
+   switch (TREE_CODE (chrec))
+ {
+ case POLYNOMIAL_CHREC:
+   if (!chrec_is_positive (CHREC_LEFT (chrec), &value0)
+	|| !chrec_is_positive (CHREC_RIGHT (chrec), &value1))
+   return false;
+ 
+   /* FIXME -- overflows.  */
+   if (value0 == value1)
+   {
+ *value = value0;
+ return true;
+   }
+ 
+   

Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andi Kleen
On Thu, Apr 12, 2012 at 12:38:44AM +0200, Torvald Riegel wrote:
 On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote:
   Tests passing, bootstrap in progress.
   
   Comments?
  
  Do you really imply ACQUIRE/RELEASE with HLE_ACQUIRE/RELEASE now? I don't
  see that in the code. I think that's really required, otherwise the 
  optimizer
  will do the wrong thing and move memory references outside the region.
 
 Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
 HLE_END instead?  Not particularly great names, but at least it avoids
 overloading ACQUIRE/RELEASE and thus should make it clearer that you
 still need to specify a memory order.

It still seems wrong to me.  HLE is an x86 construct, so weaker
memory orders at the compiler level than what the instruction implements
do not really make sense to me.  And the instruction just has LOCK
semantics.

Currently it's highly error prone - on Rusty Russell's hard-to-misuse
scale it rates no higher than 1 as is [1].

At the minimum it would need a warning with RELAXED as suggested
by Jakub earlier.


 I agree with Jakub that users really should specify memory order bits,
 if they want ordering.  Andi, I also see your point regarding catching
 bugs, but this is really expert stuff, and my hope is that we can make
 HLE really transparent or at least provide better abstractions around it

At least this form of HLE cannot be transparent; it has to be annotated by the
programmer.

-Andi

[1] http://ozlabs.org/~rusty/index.cgi/tech/2008-03-30.html
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Alexander Monakov

 Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
 behavior?

As I understand from Intel's optimization reference manual, the behavior is as
follows: if the instruction immediately following IMUL has shorter latency,
execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a
4-or-more cycles latency instruction can be issued after IMUL without a stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.

From reading the patch, I could not understand the link between pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander


Re: [IA-64] Fix PR target/48496

2012-04-12 Thread Eric Botcazou
 Is the standard condition for define_memory_constraint here
 /* Likewise if the address will be reloaded because
reg_equiv_address is nonzero.  For reg_equiv_mem
we have to check.  */
 else if (REG_P (operand)
	  && REGNO (operand) >= FIRST_PSEUDO_REGISTER
	  && reg_renumber[REGNO (operand)] < 0
	  && ((reg_equiv_mem (REGNO (operand)) != 0
	       && EXTRA_CONSTRAINT_STR (reg_equiv_mem (REGNO (operand)), c, p))
	      || (reg_equiv_address (REGNO (operand)) != 0)))
   win = 1;
 If so, shouldn't you check those conditions as well, or at least something
 similar?  Not sure if reg_equiv_address needs to be allowed there, and I
 guess reg_equiv_mem should satisfy the Q constraint, i.e. !MEM_VOLATILE_P &&
 memory_operand.  Accepting any pseudo there sounds too risky to me...

You're right, and modifying a constraint to silence a bogus error is probably 
too dangerous in any case.  And there may be other affected architectures.
So patch withdrawn.  The best fix is very likely in reload1.c in the end.

-- 
Eric Botcazou


[v3] libstdc++/52942

2012-04-12 Thread Paolo Carlini

Hi,

this is what I'm going to commit to mainline and 4_7-branch to fix the 
regression.


Tested x86_64-linux, debug-mode too.

Thanks,
Paolo.


2012-04-12  Paolo Carlini  paolo.carl...@oracle.com

PR libstdc++/52942
* include/bits/stl_function.h (_Identity, _Select1st, _Select2nd):
In C++11 mode do not derive from std::unary_function.
* include/ext/functional (identity, select1st, select2nd): Adjust.
* testsuite/23_containers/unordered_map/requirements/52942.cc: New.
* testsuite/23_containers/unordered_set/requirements/52942.cc: Likewise.
Index: include/ext/functional
===
--- include/ext/functional  (revision 186366)
+++ include/ext/functional  (working copy)
@@ -1,6 +1,6 @@
 // Functional extensions -*- C++ -*-
 
-// Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010
+// Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2012
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -183,7 +183,13 @@
*  @addtogroup SGIextensions
*/
  template<class _Tp>
-struct identity : public std::_Identity<_Tp> {};
+struct identity
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+: public std::unary_function<_Tp, _Tp>,
+  public std::_Identity<_Tp> {};
+#else
+: public std::_Identity<_Tp> {};
+#endif
 
   /** @c select1st and @c select2nd are extensions provided by SGI.  Their
*  @c operator()s
@@ -197,11 +203,23 @@
*/
   /// An \link SGIextensions SGI extension \endlink.
  template<class _Pair>
-struct select1st : public std::_Select1st<_Pair> {};
+struct select1st
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+: public std::unary_function<_Pair, typename _Pair::first_type>,
+  public std::_Select1st<_Pair> {};
+#else
+: public std::_Select1st<_Pair> {};
+#endif
 
   /// An \link SGIextensions SGI extension \endlink.
   template<class _Pair>
-struct select2nd : public std::_Select2nd<_Pair> {};
+struct select2nd
+#ifdef __GXX_EXPERIMENTAL_CXX0X__
+: public std::unary_function<_Pair, typename _Pair::second_type>,
+  public std::_Select2nd<_Pair> {};
+#else
+: public std::_Select2nd<_Pair> {};
+#endif
   /** @}  */
 
   // extension documented next
Index: include/bits/stl_function.h
===
--- include/bits/stl_function.h (revision 186372)
+++ include/bits/stl_function.h (working copy)
@@ -1,6 +1,7 @@
 // Functor implementations -*- C++ -*-
 
-// Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010, 2011
+// Copyright (C) 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2009, 2010,
+// 2011, 2012
 // Free Software Foundation, Inc.
 //
 // This file is part of the GNU ISO C++ Library.  This library is free
@@ -471,7 +472,12 @@
   /** @}  */
 
  template<typename _Tp>
-struct _Identity : public unary_function<_Tp,_Tp>
+struct _Identity
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+// unary_function itself is deprecated in C++11 and deriving from
+// it can even be a nuisance (see PR 52942).
+: public unary_function<_Tp,_Tp>
+#endif
 {
   _Tp&
   operator()(_Tp& __x) const
@@ -483,8 +489,10 @@
 };
 
  template<typename _Pair>
-struct _Select1st : public unary_function<_Pair,
-					   typename _Pair::first_type>
+struct _Select1st
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+: public unary_function<_Pair, typename _Pair::first_type>
+#endif
 {
   typename _Pair::first_type&
   operator()(_Pair& __x) const
@@ -508,8 +516,10 @@
 };
 
  template<typename _Pair>
-struct _Select2nd : public unary_function<_Pair,
-					   typename _Pair::second_type>
+struct _Select2nd
+#ifndef __GXX_EXPERIMENTAL_CXX0X__
+: public unary_function<_Pair, typename _Pair::second_type>
+#endif
 {
   typename _Pair::second_type&
   operator()(_Pair& __x) const
Index: testsuite/23_containers/unordered_map/requirements/52942.cc
===
--- testsuite/23_containers/unordered_map/requirements/52942.cc (revision 0)
+++ testsuite/23_containers/unordered_map/requirements/52942.cc (revision 0)
@@ -0,0 +1,37 @@
+// { dg-do compile }
+// { dg-options -std=gnu++11 }
+
+// Copyright (C) 2012 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// 

Re: [cxx-conversion] Switch default to build in C++

2012-04-12 Thread Diego Novillo

On 4/12/12 4:31 AM, Richard Guenther wrote:

On Wed, Apr 11, 2012 at 6:37 PM, Diego Novillodnovi...@google.com  wrote:

This patch makes the branch to build in C++ mode by default.

Tested on x86_64.  Committed.


Note that for merging the branch the support to build with a C host compiler
should be removed (and docs updated).


Thanks.  Yes, I already had that in mind.


Diego.


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andrew MacLeod

On 04/11/2012 06:38 PM, Torvald Riegel wrote:

On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote:



Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
HLE_END instead?  Not particularly great names, but at least it avoids
overloading ACQUIRE/RELEASE and thus should make it clearer that you
still need to specify a memory order.




Does it make any sense to simply predefine the possible valid
combinations with the HLE bit already set?  It at least removes any
possible invalid combinations and forces the programmer to consciously
choose their memory model.


ie,
__ATOMIC_HLE_XACQ_CONSUME
__ATOMIC_HLE_XACQ_ACQUIRE
__ATOMIC_HLE_XACQ_ACQ_REL
__ATOMIC_HLE_XACQ_SEQ_CST

__ATOMIC_HLE_XREL_RELEASE
__ATOMIC_HLE_XREL_ACQ_REL
__ATOMIC_HLE_XREL_SEQ_CST

or whatever happens to be valid...   Doesn't really scale to adding more 
new bits later, but perhaps that doesn't matter.


Just a thought.

Andrew



Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Richard Guenther
On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

 Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
 behavior?

 As I understand from Intel's optimization reference manual, the behavior is as
 follows: if the instruction immediately following IMUL has shorter latency,
 execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a
 4-or-more cycles latency instruction can be issued after IMUL without a stall.
 In other words, IMUL is pipelined with respect to other long-latency
 instructions, but not to short-latency instructions.

It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")

at least from my very limited guessing of what the above does.  So, did you
analyze why the scheduler does not properly schedule things for you?

Richard.


 From reading the patch, I could not understand the link between pipeline
 behavior and what the patch appears to do.

 Hope that helps.

 Alexander


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2012 at 08:21:47AM -0400, Andrew MacLeod wrote:
 On 04/11/2012 06:38 PM, Torvald Riegel wrote:
 On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote:
 Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
 HLE_END instead?  Not particularly great names, but at least it avoids
 overloading ACQUIRE/RELEASE and thus should make it clearer that you
 still need to specify a memory order.
 
 Does it make any sense to simply predefine the possible valid
 combinations with the HLE bit already set?  it at least removes any
 possible invalid combinations and forces the programmer to
 consciously choose their memory model.
 
 ie,
 __ATOMIC_HLE_XACQ_CONSUME
 __ATOMIC_HLE_XACQ_ACQUIRE
 __ATOMIC_HLE_XACQ_ACQ_REL
 __ATOMIC_HLE_XACQ_SEQ_CST
 
 __ATOMIC_HLE_XREL_RELEASE
 __ATOMIC_HLE_XREL_ACQ_REL
 __ATOMIC_HLE_XREL_SEQ_CST
 
 or whatever happens to be valid...   Doesn't really scale to adding
 more new bits later, but perhaps that doesn't matter.

I'd prefer not to predefine these.  They can surely be defined in some
intrinsics header, but the number of predefined macros is already huge and is
quite costly (it appears in all -g3 macro info, increases compiler
initialization time even for empty sources, etc.).

Jakub


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Torvald Riegel
On Thu, 2012-04-12 at 13:36 +0200, Andi Kleen wrote:
 On Thu, Apr 12, 2012 at 12:38:44AM +0200, Torvald Riegel wrote:
  On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote:
Tests passing, bootstrap in progress.

Comments?
   
   Do you really imply ACQUIRE/RELEASE with HLE_ACQUIRE/RELEASE now? I don't
   see that in the code. I think that's really required, otherwise the 
   optimizer
   will do the wrong thing and move memory references outside the region.
  
  Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
  HLE_END instead?  Not particularly great names, but at least it avoids
  overloading ACQUIRE/RELEASE and thus should make it clearer that you
  still need to specify a memory order.
 
 It still seems wrong to me.  HLE is an x86 construct, so weaker
 memory orders at the compiler level than what the instruction implements
 do not really make sense to me.  And the instruction just has LOCK
 semantics.

What if another vendor shows up, perhaps on another architecture?

 
 Currently it's highly error prone - on Rusty Russell's hard-to-misuse
 scale it rates no higher than 1 as is [1]

It would be a three if the patch contained documentation of the
additional bit.  I guess that's a hint :)

It could even be a four, depending on the point of view.  Not from the
POV of the Intel HLE feature.  But from a conceptual POV, HLE is pretty
independent of memory orders.

 
 At the minimum it would need a warning with RELAXED as suggested
 by Jakub earlier.

That could be helpful.

However, I'm not 100% sure that HLE is only useful paired with
acquire/release memory orders in general (ie, possibly on other archs).
For example, if you only care about having acq_rel atomics being
protected by your possibly-elided lock, then they won't get moved out of
the critical section (unless I read the C++11 memory model too
conservatively).
Therefore, given that I don't see the atomic builtins being used by lots
of programmers, I'd rather make them more general.

  I agree with Jakub that users really should specify memory order bits,
  if they want ordering.  Andi, I also see your point regarding catching
  bugs, but this is really expert stuff, and my hope is that we can make
  HLE really transparent or at least provide better abstractions around it
 
 At least this form of HLE cannot be transparent, it has to be annotated by the
 programmer.

Let me elaborate.  The point I was trying to make is that it should be
transparent for the non-concurrency-expert programmer.  Or at least make
this specific detail we're discussing here transparent (ie, whether
HLE_ACQUIRE should imply a certain memory order).  That is, if
non-experts only see a default lock implementation, or see a
lock-implementation with a use-HLE-here-if-possible flag, then they
don't have to deal with memory orders anyway.



Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Igor Zamyatin
On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru wrote:

 Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
 behavior?

 As I understand from Intel's optimization reference manual, the behavior is 
 as
 follows: if the instruction immediately following IMUL has shorter latency,
 execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a
 4-or-more cycles latency instruction can be issued after IMUL without a 
 stall.
 In other words, IMUL is pipelined with respect to other long-latency
 instructions, but not to short-latency instructions.

 It seems to be modeled in the pipeline description though:

 ;;; imul insn has 5 cycles latency
 (define_reservation "atom-imul-32"
                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                      atom-port-0")

 ;;; imul instruction excludes other non-FP instructions.
 (exclusion_set "atom-eu-0, atom-eu-1"
                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")


The main idea is quite simple:

If we are going to schedule an IMUL instruction (it is at the top of the
ready list), we try to find the producer of another (independent) IMUL
instruction that is in the ready list too.  The goal is to schedule
such a producer so that another IMUL lands in the ready list and we get
two successive IMUL instructions scheduled.

And the MD only allows us to prefer scheduling of successive IMUL
instructions, i.e. if an IMUL was just scheduled and the ready list
contains another IMUL instruction, then it will be chosen as the best
candidate for scheduling.
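A much-simplified sketch of that reordering step (the helpers is_imul_insn
and produces_ready_imul are hypothetical; the real logic is in
ix86_sched_reorder in the patch):

/* READY[N_READY - 1] is the top of the ready list.  If the next insn to
   issue is an imul, move the producer of another, independent imul to the
   top so that two imuls end up scheduled back to back.  */
static int
sched_reorder_sketch (rtx *ready, int n_ready)
{
  int i;

  if (n_ready < 2 || !is_imul_insn (ready[n_ready - 1]))
    return n_ready;

  for (i = n_ready - 2; i >= 0; i--)
    if (produces_ready_imul (ready[i]))
      {
	rtx producer = ready[i];
	/* Rotate PRODUCER to the top of the list (memmove from
	   <string.h>).  */
	memmove (&ready[i], &ready[i + 1],
		 (n_ready - 1 - i) * sizeof (rtx));
	ready[n_ready - 1] = producer;
	break;
      }

  return n_ready;
}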


 at least from my very limited guessing of what the above does.  So, did you
 analyze why the scheduler does not properly schedule things for you?

 Richard.


 From reading the patch, I could not understand the link between pipeline
 behavior and what the patch appears to do.

 Hope that helps.

 Alexander


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Richard Guenther
On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com wrote:
 On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru 
 wrote:

 Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
 behavior?

 As I understand from Intel's optimization reference manual, the behavior is 
 as
 follows: if the instruction immediately following IMUL has shorter latency,
 execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a
 4-or-more cycles latency instruction can be issued after IMUL without a 
 stall.
 In other words, IMUL is pipelined with respect to other long-latency
 instructions, but not to short-latency instructions.

 It seems to be modeled in the pipeline description though:

 ;;; imul insn has 5 cycles latency
 (define_reservation "atom-imul-32"
                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                      atom-port-0")

 ;;; imul instruction excludes other non-FP instructions.
 (exclusion_set "atom-eu-0, atom-eu-1"
                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")


 The main idea is quite simple:

 If we are going to schedule an IMUL instruction (it is at the top of the
 ready list), we try to find the producer of another (independent) IMUL
 instruction that is in the ready list too.  The goal is to schedule
 such a producer so that another IMUL lands in the ready list and we get
 two successive IMUL instructions scheduled.

Why does that not happen without your patch?  Does it never happen without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?

 And the MD only allows us to prefer scheduling of successive IMUL
 instructions, i.e. if an IMUL was just scheduled and the ready list
 contains another IMUL instruction, then it will be chosen as the best
 candidate for scheduling.


 at least from my very limited guessing of what the above does.  So, did you
 analyze why the scheduler does not properly schedule things for you?

 Richard.


 From reading the patch, I could not understand the link between pipeline
 behavior and what the patch appears to do.

 Hope that helps.

 Alexander


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andi Kleen
Hi Andrew,

 Does it make any sense to simply predefine the possible valid 
 combinations with the HLE bit already set?  it at least removes any 
 possible invalid combinations and forces the programmer to consciously 
 choose their memory model.
 
 ie,
 __ATOMIC_HLE_XACQ_CONSUME
 __ATOMIC_HLE_XACQ_ACQUIRE
 __ATOMIC_HLE_XACQ_ACQ_REL
 __ATOMIC_HLE_XACQ_SEQ_CST
 
 __ATOMIC_HLE_XREL_RELEASE
 __ATOMIC_HLE_XREL_ACQ_REL
 __ATOMIC_HLE_XREL_SEQ_CST
 
 or whatever happens to be valid...   Doesn't really scale to adding more 
 new bits later, but perhaps that doesn't matter.

The idea sounds good to me.  It would certainly be much harder to misuse,
so it would generally be a better interface.

As to what combinations make sense:

An HLE region ACQUIRE has somewhat interesting ordering semantics.
It's a fairly strong barrier (LOCK prefix) for reads and writes.
The HLE RELEASE is either LOCK too, or a MOV without LOCK.  If it's running
transactionally, the whole block acts like a LOCK too.  But we have
to use the weakest.

I suppose that would map to _SEQ_CST for most instructions,
except for the MOV release, which can be _RELEASE too (and would need
an additional MFENCE generated for anything stronger).

Probably there is not a lot of value in allowing the optimizer
weaker models than what the CPU does.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andi Kleen
 What if another vendor shows up, perhaps on another architecture?

HLE code is currently usually x86 specific; e.g. for practical spin locks
you have to issue a __builtin_ia32_pause() when the lock is found locked, to
stop speculation, otherwise the lock path will speculate too, which is very
inefficient.

So if you wanted abstracted HLE, you would need more abstracted builtins first.
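To make that concrete, a spin lock using the interface proposed in this
thread might look like this (a sketch; __ATOMIC_HLE_ACQUIRE and
__ATOMIC_HLE_RELEASE are the bits under discussion, OR-ed into the memory
model argument):

static int lockvar;

static void
hle_lock (void)
{
  /* xacquire lock xchg; the pause keeps the CPU from speculating into
     the busy-wait loop.  */
  while (__atomic_exchange_n (&lockvar, 1,
			      __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
    __builtin_ia32_pause ();
}

static void
hle_unlock (void)
{
  /* xrelease mov */
  __atomic_store_n (&lockvar, 0,
		    __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}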

  Currently it's highly error prone - on Rusty Russell's hard-to-misuse
  scale it rates no higher than 1 as is [1]
 
 It would be a three, if the patch would contain documentation of the
 additional bit.  I guess that's a hint :)

Yes :)

 However, I'm not 100% sure that HLE is only useful paired with
 acquire/release memory orders in general (ie, possibly on other archs).

It's actually stronger on the instruction level (see other mail),
except for the MOV release.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andi Kleen
  __ATOMIC_HLE_XACQ_CONSUME
  __ATOMIC_HLE_XACQ_ACQUIRE
  __ATOMIC_HLE_XACQ_ACQ_REL
  __ATOMIC_HLE_XACQ_SEQ_CST
  
  __ATOMIC_HLE_XREL_RELEASE
  __ATOMIC_HLE_XREL_ACQ_REL
  __ATOMIC_HLE_XREL_SEQ_CST
  
  or whatever happens to be valid...   Doesn't really scale to adding
  more new bits later, but perhaps that doesn't matter.
 
 I'd prefer not to predefine these.  They can be surely defined in some 
 intrinsic

I think only a very small number make sense for x86 HLE: two, or three if
you include the weak MOV RELEASE.  That's only one more than currently.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Andrey Belevantsev

On 12.04.2012 16:38, Richard Guenther wrote:

On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatinizamya...@gmail.com  wrote:

On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
richard.guent...@gmail.com  wrote:

On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakovamona...@ispras.ru  wrote:



Can atom execute two IMUL in parallel?  Or what exactly is the pipeline
behavior?


As I understand from Intel's optimization reference manual, the behavior is as
follows: if the instruction immediately following IMUL has shorter latency,
execution is stalled for 4 cycles (which is IMUL's latency); otherwise, a
4-or-more cycles latency instruction can be issued after IMUL without a stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.


It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")



The main idea is quite simple:

If we are going to schedule an IMUL instruction (it is on the top of the
ready list) we try to find the producer of another (independent) IMUL
instruction that is in the ready list too. The goal is to try to schedule
such a producer to get another IMUL into the ready list and get scheduling
of 2 successive IMUL instructions.


Why does that not happen without your patch?  Does it never happen without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?


It does not happen because the scheduler by itself does not do such 
specific reordering.  That said, it is easy to imagine the cases where this 
patch will make things worse rather than better.


Igor, why not try different, subtler mechanisms like adjust_priority, which
gets called when an insn is added to the ready list?  E.g. increase the
producer's priority.
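
For reference, a minimal sketch of what that could look like as a target
hook (TARGET_SCHED_ADJUST_PRIORITY is the documented hook; the
insn_feeds_ready_imul_p predicate is a made-up stand-in for the producer
detection the patch under review implements):

  /* Sketch only: bump the priority of insns that feed an independent
     imul, so that two imuls become ready back to back.  */
  static int
  ix86_atom_adjust_priority (rtx insn, int priority)
  {
    if (NONDEBUG_INSN_P (insn)
        && insn_feeds_ready_imul_p (insn))  /* hypothetical predicate */
      return priority + 1;
    return priority;
  }

  #undef TARGET_SCHED_ADJUST_PRIORITY
  #define TARGET_SCHED_ADJUST_PRIORITY ix86_atom_adjust_priority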


The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when 
you have more than one imul in the ready list?  Don't you want to bump the 
priority of the other imul found?


Andrey




And the MD allows us only to prefer scheduling of successive IMUL
instructions, i.e. if an IMUL was just scheduled and the ready list contains
another IMUL instruction, then it will be chosen as the best candidate for
scheduling.



at least from my very limited guessing of what the above does.  So, did you
analyze why the scheduler does not properly schedule things for you?

Richard.



 From reading the patch, I could not understand the link between pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander




Re: [patch optimization]: Add some basic folding for ==/!= comparison

2012-04-12 Thread Richard Guenther
On Thu, Apr 5, 2012 at 6:15 PM, Kai Tietz ktiet...@googlemail.com wrote:
 Hello,

 this patch adds some basic folding capabilities to fold-const for
 equal and non-equal comparisons
 on integer typed arguments.

 ChangeLog

 2012-04-05  Kai Tietz  kti...@redhat.com

        * fold-const.c (fold_comparison_1): New
        function.
        (fold_comparison): Use fold_comparison_1.

 2012-04-05  Kai Tietz  kti...@redhat.com

        * gcc.dg/fold-compare-1.c: Adjust matching rule
        for a == b without argument swap.
        * gcc.dg/fold-compare-7.c: New test.

 Regression tested for x86_64-unknown-linux-gnu for all languages
 (including Ada and Obj-C++).  Ok for apply?

 Regards,
 Kai

 Index: gcc/gcc/fold-const.c
 ===
 --- gcc.orig/gcc/fold-const.c
 +++ gcc/gcc/fold-const.c
 @@ -8739,6 +8739,241 @@ pointer_may_wrap_p (tree base, tree offs
   return total_low > (unsigned HOST_WIDE_INT) size;
  }

 +/* Sub-routine of fold_comparison.  Folding of EQ_EXPR/NE_EXPR
 +   comparisons with integral typed arguments.  */
 +
 +static tree
 +fold_comparison_1 (location_t loc, enum tree_code code, tree type,
 +                  tree arg0, tree arg1)

Please name it more specific, like for example
fold_integral_equality_comparison.

 +{
 +  enum tree_code c1, c2;
 +  tree optype, op0, op1, opr0, opr1, tem;
 +
 +  if (code != NE_EXPR && code != EQ_EXPR)
 +    return NULL_TREE;
 +
 +  optype = TREE_TYPE (arg0);
 +  if (!INTEGRAL_TYPE_P (optype))
 +    return NULL_TREE;
 +
 +  c1 = TREE_CODE (arg0);
 +  c2 = TREE_CODE (arg1);
 +
 +  /* Integer constant is on right-hand side.  */
 +  if (c1 == INTEGER_CST
 +      && c2 != c1)
 +    return fold_build2_loc (loc, code, type, arg1, arg0);

  /* If one arg is a real or integer constant, put it last.  */
  if (tree_swap_operands_p (arg0, arg1, true))
return fold_build2_loc (loc, swap_tree_comparison (code), type, op1, op0);

in fold_comparison already ensures this.

 +  if (!TREE_SIDE_EFFECTS (arg0)
 +      && operand_equal_p (arg0, arg1, 0))
 +    {
 +      if (code == EQ_EXPR)
 +        return build_one_cst (type);
 +      return build_zero_cst (type);
 +    }

This is already done in a more general way in fold_comparison:

  /* Simplify comparison of something with itself.  (For IEEE
 floating-point, we can only do some of these simplifications.)  */
  if (operand_equal_p (arg0, arg1, 0))
{
  switch (code)
{
case EQ_EXPR:
...

which also shows how to fold to true/false - using constant_boolean_node.

 +
 +  if (c1 == NEGATE_EXPR)
 +    {
 +      op0 = TREE_OPERAND (arg0, 0);
 +      /* -X ==/!= -Y -> X ==/!= Y.  */
 +      if (c2 == c1)
 +        return fold_build2_loc (loc, code, type,
 +                               op0,
 +                               TREE_OPERAND (arg1, 0));

This is already done, in a more general way but only for float types,
in fold_comparison.  It's beyond me why it is conditional on float types
there and does not check for trapping math and NaNs (maybe that's
well-defined - one would need to double-check).  For integral types
you'd have to care for undefined overflow (or issue a warning), and ...

 +      /* -X ==/!= CST -> X ==/!= CST' with CST' = -CST.  */
 +      else if (c2 == INTEGER_CST)
 +        return fold_build2_loc (loc, code, type, op0,
 +                               fold_build1_loc (loc, NEGATE_EXPR,
 +                                                optype, arg1));

... generalizing this the code should use negate_expr_p / negate_expr
to for example handle folding -b != b - a to b != a - b (of course you'd
require at least one explicit NEGATE_EXPR - similar foldings elsewhere
will tell you what to do).
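
At the source level, the equality folds being discussed here amount to
transformations like the following (a sketch; the comment on each return
shows the intended folded form, and the negation cases rely on negation
being a bijection on the operand type):

  int f1 (int x, int y) { return -x == -y; }  /* -> x == y        */
  int f2 (int x)        { return -x == 5;  }  /* -> x == -5       */
  int f3 (int x, int y) { return ~x == ~y; }  /* -> x == y        */
  int f4 (int x)        { return ~x == 5;  }  /* -> x == ~5 (-6)  */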

 +     }
 +  else if (c1 == BIT_NOT_EXPR)
 +    {
 +      op0 = TREE_OPERAND (arg0, 0);
 +      /* ~X ==/!= ~Y -> X ==/!= Y.  */
 +      if (c2 == c1)
 +        return fold_build2_loc (loc, code, type, op0,
 +                               TREE_OPERAND (arg1, 0));

This can be generalized to relational comparisons as well I think.  It's also
already done in fold_comparison here:

  /* Fold ~X op ~Y as Y op X.  */
  if (TREE_CODE (arg0) == BIT_NOT_EXPR
      && TREE_CODE (arg1) == BIT_NOT_EXPR)
{


 +      /* ~X ==/!= CST -> X ==/!= CST' with CST' = ~CST.  */
 +      else if (c2 == INTEGER_CST)
 +        return fold_build2_loc (loc, code, type, op0,
 +                               fold_build1_loc (loc, BIT_NOT_EXPR,
 +                                                optype, arg1));

Possibly unified with having a new predicate/worker invert_expr_p / invert_expr.

 +    }
 +
 +  /* See if we can simplify case X == (Y +|-|^ Z).  */
 +  if (c1 != PLUS_EXPR && c1 != MINUS_EXPR && c1 != BIT_XOR_EXPR)
 +    {
 +      if ((c2 != PLUS_EXPR && c2 != MINUS_EXPR
 +           && c2 != BIT_XOR_EXPR)
 +          || TREE_SIDE_EFFECTS (arg0))
 +        return NULL_TREE;

(I'm not sure why you are testing for side-effects - if you omit sth use
omit_*_operand ())

 +
 +      

Re: [RS6000] Stack tie fix.

2012-04-12 Thread Alan Modra
On Wed, Apr 11, 2012 at 11:00:04AM +0200, Richard Guenther wrote:
 On Wed, Apr 11, 2012 at 6:15 AM, Alan Modra amo...@gmail.com wrote:
 Well - you are expecting generic code to understand your arch-specific
 UNSPEC.  It of course cannot.

Right.

  (USE (mem:BLK (reg 1)))
  (CLOBBER (mem:BLK (reg 1)))
 
 repeated, for each register (maybe you can avoid the USE if the CLOBBER

I tried that.  It doesn't work without something else in the insn to
stop rtl-dce deleting it, so you may as well use SETs.  But thanks for
the prod in the right direction.  We do get slightly better results
when the regs are not hidden away in an UNSPEC, for instance
non-stack writes/reads are seen by the alias oracle to not conflict
with the epilogue frame deallocation.

Bootstrapped etc. powerpc-linux.  OK to apply, David?

PR target/52828
* config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
tie regs on destination of sets.  Delete forward declaration.
(rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
(rs6000_emit_prologue): Likewise.
(rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
and gen_stack_tie.
(is_mem_ref): Use tie_operand to recognise stack ties.
* config/rs6000/predicates.md (tie_operand): New.
* config/rs6000/rs6000.md (UNSPEC_TIE): Delete.
(restore_stack_block): Generate new stack tie rtl.
(restore_stack_nonlocal): Likewise.
(stack_tie): Update.
(frame_tie): Delete.

Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 186373)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -925,7 +925,6 @@ static const char *rs6000_invalid_within_doloop (c
 static bool rs6000_legitimate_address_p (enum machine_mode, rtx, bool);
 static bool rs6000_debug_legitimate_address_p (enum machine_mode, rtx, bool);
 static rtx rs6000_generate_compare (rtx, enum machine_mode);
-static void rs6000_emit_stack_tie (void);
 static bool spe_func_has_64bit_regs_p (void);
 static rtx gen_frame_mem_offset (enum machine_mode, rtx, int);
 static unsigned rs6000_hash_constant (rtx);
@@ -18505,12 +18504,29 @@ rs6000_aix_asm_output_dwarf_table_ref (char * fram
and the change to the stack pointer.  */
 
 static void
-rs6000_emit_stack_tie (void)
+rs6000_emit_stack_tie (rtx fp, bool hard_frame_needed)
 {
-  rtx mem = gen_frame_mem (BLKmode,
-  gen_rtx_REG (Pmode, STACK_POINTER_REGNUM));
+  rtvec p;
+  int i;
+  rtx regs[3];
 
-  emit_insn (gen_stack_tie (mem));
+  i = 0;
+  regs[i++] = gen_rtx_REG (Pmode, STACK_POINTER_REGNUM);
+  if (hard_frame_needed)
+regs[i++] = gen_rtx_REG (Pmode, HARD_FRAME_POINTER_REGNUM);
+  if (!(REGNO (fp) == STACK_POINTER_REGNUM
+	|| (hard_frame_needed
+	    && REGNO (fp) == HARD_FRAME_POINTER_REGNUM)))
+regs[i++] = fp;
+
+  p = rtvec_alloc (i);
+  while (--i >= 0)
+{
+  rtx mem = gen_frame_mem (BLKmode, regs[i]);
+  RTVEC_ELT (p, i) = gen_rtx_SET (VOIDmode, mem, const0_rtx);
+}
+
+  emit_insn (gen_stack_tie (gen_rtx_PARALLEL (VOIDmode, p)));
 }
 
 /* Emit the correct code for allocating stack space, as insns.
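
For illustration, with both the stack pointer and a hard frame pointer
involved, the tie emitted by the rewritten function looks roughly like the
following RTL (a sketch; reg 1 is the rs6000 stack pointer and reg 31 the
hard frame pointer, MEM attributes omitted):

  (insn (parallel [
          (set (mem:BLK (reg:SI 1)) (const_int 0))
          (set (mem:BLK (reg:SI 31)) (const_int 0))
        ]))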
@@ -19130,7 +19146,7 @@ rs6000_emit_stack_reset (rs6000_stack_t *info,
   || (TARGET_SPE_ABI
       && info->spe_64bit_regs_used != 0
       && info->first_gp_reg_save != 32))
-rs6000_emit_stack_tie ();
+rs6000_emit_stack_tie (frame_reg_rtx, frame_pointer_needed);
   
   if (frame_reg_rtx != sp_reg_rtx)
 {
@@ -19350,7 +19366,7 @@ rs6000_emit_prologue (void)
}
   rs6000_emit_allocate_stack (info-total_size, copy_reg);
   if (frame_reg_rtx != sp_reg_rtx)
-   rs6000_emit_stack_tie ();
+   rs6000_emit_stack_tie (frame_reg_rtx, false);
 }
 
   /* Handle world saves specially here.  */
@@ -19854,7 +19870,7 @@ rs6000_emit_prologue (void)
	sp_offset = info->total_size;
   rs6000_emit_allocate_stack (info-total_size, copy_reg);
   if (frame_reg_rtx != sp_reg_rtx)
-   rs6000_emit_stack_tie ();
+   rs6000_emit_stack_tie (frame_reg_rtx, false);
 }
 
   /* Set frame pointer, if needed.  */
@@ -20425,13 +20441,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Prevent reordering memory accesses against stack pointer restore.  */
   else if (cfun->calls_alloca
	    || offset_below_red_zone_p (-info->total_size))
-   {
- rtx mem1 = gen_rtx_MEM (BLKmode, hard_frame_pointer_rtx);
- rtx mem2 = gen_rtx_MEM (BLKmode, sp_reg_rtx);
- MEM_NOTRAP_P (mem1) = 1;
- MEM_NOTRAP_P (mem2) = 1;
- emit_insn (gen_frame_tie (mem1, mem2));
-   }
+   rs6000_emit_stack_tie (frame_reg_rtx, true);
 
   insn = emit_insn (gen_add3_insn (frame_reg_rtx, hard_frame_pointer_rtx,
				    GEN_INT (info->total_size)));
@@ -20444,11 +20454,7 @@ rs6000_emit_epilogue (int sibcall)
   /* Prevent reordering memory 

Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Torvald Riegel
On Thu, 2012-04-12 at 08:21 -0400, Andrew MacLeod wrote:
 On 04/11/2012 06:38 PM, Torvald Riegel wrote:
  On Wed, 2012-04-11 at 15:06 +0200, Andi Kleen wrote:
 
  Perhaps HLE_ACQUIRE / HLE_RELEASE should be something like HLE_START /
  HLE_END instead?  Not particularly great names, but at least it avoids
  overloading ACQUIRE/RELEASE and thus should make it clearer that you
  still need to specify a memory order.
 
 
 
 Does it make any sense to simply predefine the possible valid 
 combinations with the HLE bit already set?  it at least removes any 
 possible invalid combinations and forces the programmer to consciously 
 choose their memory model.
 
 ie,
 __ATOMIC_HLE_XACQ_CONSUME
 __ATOMIC_HLE_XACQ_ACQUIRE
 __ATOMIC_HLE_XACQ_ACQ_REL
 __ATOMIC_HLE_XACQ_SEQ_CST
 
 __ATOMIC_HLE_XREL_RELEASE
 __ATOMIC_HLE_XREL_ACQ_REL
 __ATOMIC_HLE_XREL_SEQ_CST
 
 or whatever happens to be valid...   Doesn't really scale to adding more 
 new bits later, but perhaps that doesn't matter.

I would suggest that we keep the HLE acq/rel bits independent of the
memory order bits.  Both are independent on a conceptual level.  And we
should add documentation that tells programmers that memory orders must
always be specified.

This way, we:
- keep it generic and don't add any arch-specific assumptions,
- remain closer to a true extension of the C++11/C11 atomics instead of
  a more strongly coupled mix-up,
- can still do arch-specific selection of memory orders (e.g., even just
  the memory orders are more fine-granular than you'd need for just
  x86),
- can still do arch-specific warnings if some combinations really don't
  make sense,
- keep the number of macros small as preferred by Jakub.
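
As a concrete illustration of the two styles (a sketch; both spellings are
proposals from this thread, not a committed API):

  /* Pre-combined macros (Andrew's proposal):  */
  __atomic_exchange_n (&lock, 1, __ATOMIC_HLE_XACQ_ACQUIRE);

  /* Independent bits, kept orthogonal as suggested here:  */
  __atomic_exchange_n (&lock, 1, __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE);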

Would that be an acceptable option for everyone?  Andi, would proper
documentation resolve your ease-of-use concerns?



Re: [PATCH] Atom: Enabling unroll at O2 optimization level

2012-04-12 Thread Andi Kleen
  So would need much more benchmarking on macro workloads first at least.
 
 Like what, for example? I believe in this case everything also
 strongly depends on the test usage model (e.g. it is usually compiled with Os,
 not O2) and, let's say, internal test structure - whether there are
 hot loops suitable for unrolling.

Normally the compiler doesn't know if a loop is hot unless you use
profile feedback.  So worst case, on a big code base you may end up
with a lot of unnecessary unrolling.  On cold code it's just wasted
bytes, but on code that is already icache-limited it
would be worse.

How about just a compiler bootstrap on Atom as a worst case?
For the benchmark can you use profile feedback?
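
For reference, the profile-feedback workflow that implies would be along
these lines (the flags are standard GCC; the file names are illustrative):

  gcc -O2 -funroll-loops -fprofile-generate bench.c -o bench
  ./bench                    # training run writes *.gcda data
  gcc -O2 -funroll-loops -fprofile-use bench.c -o bench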

BTW I know some loops are unrolled at -O3 by default at tree level because
the vectorizer likes it.  I actually have an older patch to dial this
down for some common cases.

-Andi
-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Andi Kleen
 Would that be an acceptable option for everyone?  Andi, would proper
 documentation resolve your ease-of-use concerns?

Proper documentation is needed in any case, but I would strongly prefer an
inherently hard-to-misuse interface. I think Andrew's proposal is the best 
for that so far.

-Andi

-- 
a...@linux.intel.com -- Speaking for myself only.


Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Richard Guenther
2012/4/12 Andrey Belevantsev a...@ispras.ru:
 On 12.04.2012 16:38, Richard Guenther wrote:

 On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com
  wrote:

 On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
 richard.guent...@gmail.com  wrote:

 On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru
  wrote:


 Can atom execute two IMUL in parallel?  Or what exactly is the
 pipeline
 behavior?


 As I understand from Intel's optimization reference manual, the
 behavior is as
 follows: if the instruction immediately following IMUL has shorter
 latency,
 execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
 a
 4-or-more cycles latency instruction can be issued after IMUL without a
 stall.
 In other words, IMUL is pipelined with respect to other long-latency
 instructions, but not to short-latency instructions.


 It seems to be modeled in the pipeline description though:

 ;;; imul insn has 5 cycles latency
 (define_reservation "atom-imul-32"
                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                      atom-port-0")

 ;;; imul instruction excludes other non-FP instructions.
 (exclusion_set "atom-eu-0, atom-eu-1"
                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")


 The main idea is quite simple:

 If we are going to schedule an IMUL instruction (it is on the top of the
 ready list) we try to find the producer of another (independent) IMUL
 instruction that is in the ready list too. The goal is to try to schedule
 such a producer to get another IMUL into the ready list and get scheduling
 of 2 successive IMUL instructions.


 Why does that not happen without your patch?  Does it never happen without
 your patch or does it merely not happen for one EEMBC benchmark (can
 you provide a testcase?)?


 It does not happen because the scheduler by itself does not do such specific
 reordering.  That said, it is easy to imagine the cases where this patch
 will make things worse rather than better.

That surprises me.  What is so specific about this reordering?

 Igor, why not try different, subtler mechanisms like adjust_priority, which
 gets called when an insn is added to the ready list?  E.g. increase the
 producer's priority.

 The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
 you have more than one imul in the ready list?  Don't you want to bump the
 priority of the other imul found?

 Andrey



 And the MD allows us only to prefer scheduling of successive IMUL
 instructions, i.e. if an IMUL was just scheduled and the ready list
 contains another IMUL instruction, then it will be chosen as the best
 candidate for scheduling.


 at least from my very limited guessing of what the above does.  So, did
 you
 analyze why the scheduler does not properly schedule things for you?

 Richard.


  From reading the patch, I could not understand the link between
 pipeline
 behavior and what the patch appears to do.

 Hope that helps.

 Alexander




Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Andrey Belevantsev

On 12.04.2012 17:54, Richard Guenther wrote:

2012/4/12 Andrey Belevantsev a...@ispras.ru:

On 12.04.2012 16:38, Richard Guenther wrote:


On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com
  wrote:


On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
richard.guent...@gmail.com  wrote:


On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru
  wrote:




Can atom execute two IMUL in parallel?  Or what exactly is the
pipeline
behavior?



As I understand from Intel's optimization reference manual, the
behavior is as
follows: if the instruction immediately following IMUL has shorter
latency,
execution is stalled for 4 cycles (which is IMUL's latency); otherwise,
a
4-or-more cycles latency instruction can be issued after IMUL without a
stall.
In other words, IMUL is pipelined with respect to other long-latency
instructions, but not to short-latency instructions.



It seems to be modeled in the pipeline description though:

;;; imul insn has 5 cycles latency
(define_reservation "atom-imul-32"
                    "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                     atom-port-0")

;;; imul instruction excludes other non-FP instructions.
(exclusion_set "atom-eu-0, atom-eu-1"
               "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")



The main idea is quite simple:

If we are going to schedule an IMUL instruction (it is on the top of the
ready list) we try to find the producer of another (independent) IMUL
instruction that is in the ready list too. The goal is to try to schedule
such a producer to get another IMUL into the ready list and get scheduling
of 2 successive IMUL instructions.



Why does that not happen without your patch?  Does it never happen without
your patch or does it merely not happen for one EEMBC benchmark (can
you provide a testcase?)?



It does not happen because the scheduler by itself does not do such specific
reordering.  That said, it is easy to imagine the cases where this patch
will make things worse rather than better.


That surprises me.  What is so specific about this reordering?


I mean that the scheduler does things like sort the ready list according 
to a number of heuristics and to the insn priority, then choose the insn 
that would allow the maximum number of ready insns to be issued on the current 
cycle.  The heuristics are rather general.  The scheduler does not do 
things like "if some random insn is ready, then choose some other random 
insn from the ready list and schedule it" (which is what the patch does). 
This is what scheduler hooks are for, to account for some target-specific 
heuristic.


The problem is that this particular implementation looks somewhat like 
overkill and is also motivated by a single benchmark.  Testing on a wider set 
of benchmarks and checking the compile-time hit would make the motivation more 
convincing.


Andrey




Igor, why not try different, subtler mechanisms like adjust_priority, which
gets called when an insn is added to the ready list?  E.g. increase the
producer's priority.

The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out when
you have more than one imul in the ready list?  Don't you want to bump the
priority of the other imul found?

Andrey





And the MD allows us only to prefer scheduling of successive IMUL
instructions, i.e. if an IMUL was just scheduled and the ready list
contains another IMUL instruction, then it will be chosen as the best
candidate for scheduling.


at least from my very limited guessing of what the above does.  So, did
you
analyze why the scheduler does not properly schedule things for you?

Richard.



  From reading the patch, I could not understand the link between
pipeline
behavior and what the patch appears to do.

Hope that helps.

Alexander







Re: [PATCH, ARM] PR49448 incorrectly detecting big-endian arm-linux triplets

2012-04-12 Thread Richard Earnshaw
On 28/02/12 15:27, Richard Earnshaw wrote:
 The pattern to match a big-endian machine for linux is ambiguous as
 reported by the PR and can cause some little-endian triplets to be
 confused as big-endian.
 
 This patch makes the string unambiguous.
 
 R.
 
   * config.gcc (arm*-*-linux*): Use an unambiguous pattern for
   detecting big-endian triplets.
 
 
 be-lin.patch
 
 
 Index: config.gcc
 ===
 --- config.gcc(revision 184624)
 +++ config.gcc(working copy)
 @@ -825,7 +825,7 @@ arm*-*-netbsdelf*)
  arm*-*-linux*)   # ARM GNU/Linux with ELF
  tm_file="dbxelf.h elfos.h gnu-user.h linux.h linux-android.h 
 glibc-stdint.h arm/elf.h arm/linux-gas.h arm/linux-elf.h"
   case $target in
 - arm*b-*)
 + arm*b-*-linux*)
  tm_defines="${tm_defines} TARGET_BIG_ENDIAN_DEFAULT=1"
   ;;
   esac

For the record, I've back-ported this patch to gcc-4.5 and gcc-4.6.

R.



Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Kirill Yukhin
 I would suggest that we keep the HLE acq/rel bits independent of the
 memory order bits.  Both are independent on a conceptual level.  And we
 should add documentation that tells programmers that memory orders need
 always be specified.

Sorry, I didn't get your point. Do you propose to separate out the
HLE-capable builtins altogether?

How would we make it independent without introducing a new parameter?
Introducing a new parameter would mean a new builtin.
We already had that: see the initial patch - it contains HLE-featured
intrinsics (and builtins)
independent of __atomic*

K


[PATCH] More code refactoring

2012-04-12 Thread Richard Guenther

Eventually I'll succeed in making tree-optimize.c empty.  At least
the pass stuff I'm interested in gets better now.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

* Makefile.in (cgraphunit.o): Add $(EXCEPT_H) dependency.
* cgraph.h (tree_rest_of_compilation): Remove.
* cgraph.c (cgraph_add_new_function): Move ...
* cgraphunit.c (cgraph_add_new_function): ... here.
(tree_rest_of_compilation): Make static.
(cgraph_expand_function): Do not set cgraph_function_flags_ready.
* tree-optimize.c (gate_all_optimizations, pass_all_optimizations,
gate_all_early_local_passes, execute_all_early_local_passes,
pass_early_local_passes, gate_all_early_optimizations,
pass_all_early_optimizations): Move ...
* passes.c (gate_all_optimizations, pass_all_optimizations,
gate_all_early_local_passes, execute_all_early_local_passes,
pass_early_local_passes, gate_all_early_optimizations,
pass_all_early_optimizations): ... here.
* tree-optimize.c (execute_free_datastructures): Remove.
* tree-flow.h (execute_free_datastructures): Remove.
* tree-optimize.c (execute_init_datastructures,
pass_init_datastructures): Move ...
* tree-ssa.c (execute_init_datastructures,
pass_init_datastructures): ... here.
* cfgexpand.c (gimple_expand_cfg): Inline-expand call to
execute_free_datastructures.

Index: gcc/Makefile.in
===
*** gcc/Makefile.in.orig2012-04-12 14:05:23.0 +0200
--- gcc/Makefile.in 2012-04-12 14:05:51.274596925 +0200
*** cgraphunit.o : cgraphunit.c $(CONFIG_H)
*** 2922,2928 
 $(FIBHEAP_H) output.h $(PARAMS_H) $(RTL_H) $(TIMEVAR_H) $(IPA_PROP_H) \
 gt-cgraphunit.h tree-iterator.h $(COVERAGE_H) $(TREE_DUMP_H) \
 tree-pretty-print.h gimple-pretty-print.h ipa-inline.h $(IPA_UTILS_H) \
!$(LTO_STREAMER_H) output.h $(REGSET_H)
  cgraphbuild.o : cgraphbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
 $(TREE_H) langhooks.h $(CGRAPH_H) intl.h pointer-set.h $(GIMPLE_H) \
 $(TREE_FLOW_H) $(TREE_PASS_H) $(IPA_UTILS_H) $(EXCEPT_H) \
--- 2922,2928 
 $(FIBHEAP_H) output.h $(PARAMS_H) $(RTL_H) $(TIMEVAR_H) $(IPA_PROP_H) \
 gt-cgraphunit.h tree-iterator.h $(COVERAGE_H) $(TREE_DUMP_H) \
 tree-pretty-print.h gimple-pretty-print.h ipa-inline.h $(IPA_UTILS_H) \
!$(LTO_STREAMER_H) output.h $(REGSET_H) $(EXCEPT_H)
  cgraphbuild.o : cgraphbuild.c $(CONFIG_H) $(SYSTEM_H) coretypes.h $(TM_H) \
 $(TREE_H) langhooks.h $(CGRAPH_H) intl.h pointer-set.h $(GIMPLE_H) \
 $(TREE_FLOW_H) $(TREE_PASS_H) $(IPA_UTILS_H) $(EXCEPT_H) \
Index: gcc/cgraph.c
===
*** gcc/cgraph.c.orig   2012-04-12 14:05:23.0 +0200
--- gcc/cgraph.c2012-04-12 14:05:51.276596925 +0200
*** cgraph_function_body_availability (struc
*** 2397,2485 
return avail;
  }
  
- /* Add the function FNDECL to the call graph.
-Unlike cgraph_finalize_function, this function is intended to be used
-by middle end and allows insertion of new function at arbitrary point
-of compilation.  The function can be either in high, low or SSA form
-GIMPLE.
- 
-The function is assumed to be reachable and have address taken (so no
-API breaking optimizations are performed on it).
- 
-Main work done by this function is to enqueue the function for later
-processing to avoid need the passes to be re-entrant.  */
- 
- void
- cgraph_add_new_function (tree fndecl, bool lowered)
- {
-   struct cgraph_node *node;
-   switch (cgraph_state)
- {
-   case CGRAPH_STATE_CONSTRUCTION:
-   /* Just enqueue function to be processed at nearest occurrence.  */
-   node = cgraph_create_node (fndecl);
-   node->next_needed = cgraph_new_nodes;
-   if (lowered)
-	node->lowered = true;
-   cgraph_new_nodes = node;
- break;
- 
-   case CGRAPH_STATE_IPA:
-   case CGRAPH_STATE_IPA_SSA:
-   case CGRAPH_STATE_EXPANSION:
-   /* Bring the function into finalized state and enqueue for later
-  analyzing and compilation.  */
-   node = cgraph_get_create_node (fndecl);
-   node->local.local = false;
-   node->local.finalized = true;
-   node->reachable = node->needed = true;
-   if (!lowered && cgraph_state == CGRAPH_STATE_EXPANSION)
- {
-   push_cfun (DECL_STRUCT_FUNCTION (fndecl));
-   current_function_decl = fndecl;
-   gimple_register_cfg_hooks ();
-   bitmap_obstack_initialize (NULL);
-   execute_pass_list (all_lowering_passes);
-   execute_pass_list (pass_early_local_passes.pass.sub);
-   bitmap_obstack_release (NULL);
-   pop_cfun ();
-   current_function_decl = 

Re: [PATCH] Atom: Scheduler improvements for better imul placement

2012-04-12 Thread Richard Guenther
2012/4/12 Andrey Belevantsev a...@ispras.ru:
 On 12.04.2012 17:54, Richard Guenther wrote:

 2012/4/12 Andrey Belevantsev a...@ispras.ru:

 On 12.04.2012 16:38, Richard Guenther wrote:


 On Thu, Apr 12, 2012 at 2:36 PM, Igor Zamyatin izamya...@gmail.com
  wrote:


 On Thu, Apr 12, 2012 at 4:24 PM, Richard Guenther
 richard.guent...@gmail.com    wrote:


 On Thu, Apr 12, 2012 at 2:00 PM, Alexander Monakov amona...@ispras.ru
  wrote:



 Can atom execute two IMUL in parallel?  Or what exactly is the
 pipeline
 behavior?



 As I understand from Intel's optimization reference manual, the
 behavior is as
 follows: if the instruction immediately following IMUL has shorter
 latency,
 execution is stalled for 4 cycles (which is IMUL's latency);
 otherwise,
 a
 4-or-more cycles latency instruction can be issued after IMUL without
 a
 stall.
 In other words, IMUL is pipelined with respect to other long-latency
 instructions, but not to short-latency instructions.



 It seems to be modeled in the pipeline description though:

 ;;; imul insn has 5 cycles latency
 (define_reservation "atom-imul-32"
                     "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4,
                      atom-port-0")

 ;;; imul instruction excludes other non-FP instructions.
 (exclusion_set "atom-eu-0, atom-eu-1"
                "atom-imul-1, atom-imul-2, atom-imul-3, atom-imul-4")


 The main idea is quite simple:

 If we are going to schedule an IMUL instruction (it is on the top of the
 ready list) we try to find the producer of another (independent) IMUL
 instruction that is in the ready list too. The goal is to try to schedule
 such a producer to get another IMUL into the ready list and get scheduling
 of 2 successive IMUL instructions.



 Why does that not happen without your patch?  Does it never happen
 without
 your patch or does it merely not happen for one EEMBC benchmark (can
 you provide a testcase?)?



 It does not happen because the scheduler by itself does not do such
 specific
 reordering.  That said, it is easy to imagine the cases where this patch
 will make things worse rather than better.


 That surprises me.  What is so specific about this reordering?


 I mean that the scheduler does things like sort the ready list according to
 a number of heuristics and to the insn priority, then choose the insn that
 would allow the maximum number of ready insns to be issued on the current cycle.
  The heuristics are rather general.  The scheduler does not do things like
 "if some random insn is ready, then choose some other random insn from the
 ready list and schedule it" (which is what the patch does). This is what
 scheduler hooks are for, to account for some target-specific heuristic.

 The problem is that this particular implementation looks somewhat like
 overkill and is also motivated by a single benchmark.  Testing on a wider set
 of benchmarks and checking the compile-time hit would make the motivation more
 convincing.

Yeah, and especially looking at _why_ the generic heuristics are not working
and whether they could be improved.  After all, the issue seems to be properly
modeled in the DFA.

Richard.

 Andrey



 Igor, why not try different, subtler mechanisms like adjust_priority,
 which gets called when an insn is added to the ready list?  E.g. increase
 the producer's priority.

 The patch as is misses checks for NONDEBUG_INSN_P.  Also, why bail out
 when
 you have more than one imul in the ready list?  Don't you want to bump
 the
 priority of the other imul found?

 Andrey



 And the MD allows us only to prefer scheduling of successive IMUL
 instructions, i.e. if an IMUL was just scheduled and the ready list
 contains another IMUL instruction, then it will be chosen as the best
 candidate for scheduling.


 at least from my very limited guessing of what the above does.  So,
 did
 you
 analyze why the scheduler does not properly schedule things for you?

 Richard.


  From reading the patch, I could not understand the link between
 pipeline
 behavior and what the patch appears to do.

 Hope that helps.

 Alexander






Re: [PATCH] Fix ICE in expand_cse_reciprocals (PR tree-optimization/42078)

2012-04-12 Thread Richard Guenther
On Mon, Apr 9, 2012 at 8:37 AM, Alexandre Oliva aol...@redhat.com wrote:
 On Jun  3, 2011, Alexandre Oliva aol...@redhat.com wrote:

 According to http://gcc.gnu.org/ml/gcc-patches/2009-11/msg01082.html
 on Nov 20, 2009, Alexandre Oliva aol...@redhat.com wrote:

 On Nov 19, 2009, Richard Guenther richard.guent...@gmail.com wrote:
 In fact this exchanging of the LHS (or rather invalidating of the
 SSA name value) should be a helper function that knows
 the implementation details and avoids going through releasing
 and allocating the name.

 Okie dokie, here's a sequence of patches that implements helper
 functions for this sort of stuff.

 The first patch introduces machinery to propagate “dying” DEFs into
 debug stmts, while replacing them with other SSA_NAMEs.

 This is already in.

 The second extends it so as to enable the old LHS to be redefined
 e.g. in terms of the new LHS.  IIRC this may be useful in some other
 transformations that, in the process of introducing VTA, I changed from
 modifying stmts in place to inserting new stmts and removing others.  I
 haven't looked for them yet.

 The third uses this new feature for the case at hand, while avoiding
 computing the reciprocal expression if we know it won't be used.

 Updated versions of these follow.  Regstrapped on x86_64-linux-gnu and
 i686-linux-gnu.  Ok to install?

 I was asked to submit these again in stage1, so...  Ping?
 (updated and re-tested)


+  /* If the conditions in which this function uses VALUE change,
+ adjust gimple_replace_lhs_wants_value().  */
+  gcc_assert (gimple_replace_lhs_wants_value ()
+ == MAY_HAVE_DEBUG_STMTS);
+

that looks ... odd.  But I see you want to conditionally compute value.

+static inline bool
+gimple_replace_lhs_wants_value (void)
+{
+  return MAY_HAVE_DEBUG_STMTS;
+}

but should this not depend on the old stmt / lhs?  For example do we really
want to do this for artificial variables?  I suppose not.

Thanks,
Richard.



 --
 Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
 You must be the change you wish to see in the world. -- Gandhi
 Be Free! -- http://FSFLA.org/   FSF Latin America board member
 Free Software Evangelist      Red Hat Brazil Compiler Engineer



Re: [PATCH] Fix PR18589

2012-04-12 Thread Richard Guenther
On Thu, Apr 5, 2012 at 3:49 PM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:
 On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote:
 On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
 
  Unfortunately this seems to be necessary if I name the two passes
  reassoc1 and reassoc2.  If I try to name both of them reassoc I
  get failures in other tests like gfortran.dg/reassoc_4, where
  -fdump-tree-reassoc1 doesn't work.  Unless I'm missing something
  obvious, I think I need to keep that change.

 Hm, naming them reassoc1 and reassoc2 is a hack.  Naming both
 reassoc will not trigger re-naming them to reassoc1 and reassoc2
 I think.  How ugly.  Especially that -fdump-tree-reassoc will no longer
 work.  Maybe instead of using two pass structs resort to using
 the existing hack with using first_pass_instance and 
 TODO_mark_first_instance.

 OK, that seems to be the best among evils.  Using the
 first_pass_instance hack, the patch is transformed as below.
 Regstrapped on powerpc64-linux, no additional failures.  OK for trunk?

Ok.

Thanks,
Richard.

 Thanks,
 Bill


 gcc:

 2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com

        PR tree-optimization/18589
        * tree-ssa-reassoc.c (reassociate_stats): Add two fields.
        (operand_entry): Add count field.
        (add_repeat_to_ops_vec): New function.
        (completely_remove_stmt): Likewise.
        (remove_def_if_absorbed_call): Likewise.
        (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
        (acceptable_pow_call): New function.
        (linearize_expr_tree): Look for builtin pow/powi calls and add operand
        entries with repeat counts when found.
        (repeat_factor_d): New struct and associated typedefs.
        (repeat_factor_vec): New static vector variable.
        (compare_repeat_factors): New function.
        (get_reassoc_pow_ssa_name): Likewise.
        (attempt_builtin_powi): Likewise.
        (reassociate_bb): Call attempt_builtin_powi.
        (fini_reassoc): Two new calls to statistics_counter_event.

 gcc/testsuite:

 2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com

        PR tree-optimization/18589
        * gcc.dg/tree-ssa/pr18589-1.c: New test.
        * gcc.dg/tree-ssa/pr18589-2.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-3.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-4.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-5.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-6.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-7.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-8.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-9.c: Likewise.
        * gcc.dg/tree-ssa/pr18589-10.c: Likewise.


 Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c   (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-4.c   (revision 0)
 @@ -0,0 +1,10 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
 +
 +double baz (double x, double y, double z, double u)
 +{
 +  return x * x * y * y * y * z * z * z * z * u;
 +}
 +
 +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c   (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-5.c   (revision 0)
 @@ -0,0 +1,10 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
 +
 +double baz (double x, double y, double z, double u)
 +{
 +  return x * x * x * y * y * y * z * z * z * z * u * u * u * u;
 +}
 +
 +/* { dg-final { scan-tree-dump-times " \\* " 6 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
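
For reference, the factorization behind the expected count of 6 multiplies
in pr18589-5.c above (a sketch of the shape attempt_builtin_powi aims for;
the exact sequence GCC emits may differ):

  double baz (double x, double y, double z, double u)
  {
    /* x^3 * y^3 * z^4 * u^4 == (x*y*z*u)^3 * (z*u)  */
    double t1 = z * u;    /* 1 */
    double t2 = x * y;    /* 2 */
    double t3 = t2 * t1;  /* 3: x*y*z*u */
    double t4 = t3 * t3;  /* 4 */
    double t5 = t4 * t3;  /* 5: (x*y*z*u)^3 */
    return t5 * t1;       /* 6 */
  }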
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c   (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-6.c   (revision 0)
 @@ -0,0 +1,10 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
 +
 +double baz (double x, double y)
 +{
 +  return __builtin_pow (x, 3.0) * __builtin_pow (y, 4.0);
 +}
 +
 +/* { dg-final { scan-tree-dump-times " \\* " 4 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump "optimized" } } */
 Index: gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c
 ===
 --- gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c   (revision 0)
 +++ gcc/testsuite/gcc.dg/tree-ssa/pr18589-7.c   (revision 0)
 @@ -0,0 +1,10 @@
 +/* { dg-do compile } */
 +/* { dg-options "-O3 -ffast-math -fdump-tree-optimized" } */
 +
 +float baz (float x, float y)
 +{
 +  return x * x * x * x * y * y * y * y;
 +}
 +
 +/* { dg-final { scan-tree-dump-times " \\* " 3 "optimized" } } */
 +/* { dg-final { cleanup-tree-dump optimized 

Re: [PATCH][ARM] NEON DImode neg

2012-04-12 Thread Andrew Stubbs

Ping.

On 26/03/12 11:14, Andrew Stubbs wrote:

On 28/02/12 17:45, Andrew Stubbs wrote:

Hi all,

This patch adds a DImode negate pattern for NEON.

Unfortunately, the NEON vneg instruction only supports vectors, not
singletons, so there's no direct way to do it in DImode, and the
compiler ends up moving the value back to core registers, negating it,
and returning to NEON afterwards:

fmrrd r2, r3, d16 @ int
negs r2, r2
sbc r3, r3, r3, lsl #1
fmdrr d16, r2, r3 @ int

The new patch does it entirely in NEON:

vmov.i32 d17, #0 @ di
vsub.i64 d16, d17, d16

(Note that this is the result when combined with my recent patch for
NEON DImode immediates. Without that you get a constant pool load.)


This update fixes a bootstrap failure caused by an early-clobber error.
I've also got a native regression test running now.

OK?

Andrew




[PATCH][C] Fix PR52549

2012-04-12 Thread Richard Guenther

This fixes PR52549 - we are running into an overzealous assert
that wants to make sure we don't have PLUS_EXPR on pointers.
But that code does not really check this and falls foul of
the conversion removal code right before it that transforms
(void *)(a + b) to a + b.

Fixed as follows.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Ok?

Thanks,
Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

PR c/52549
* c-typeck.c (pointer_diff): Remove bogus assert.

* gcc.dg/pr52549.c: New testcase.

Index: gcc/c-typeck.c
===
--- gcc/c-typeck.c  (revision 186373)
+++ gcc/c-typeck.c  (working copy)
@@ -3446,8 +3446,6 @@ pointer_diff (location_t loc, tree op0,
   else
 con1 = op1;
 
-  gcc_assert (TREE_CODE (con0) != PLUS_EXPR
-	      && TREE_CODE (con1) != PLUS_EXPR);
   if (TREE_CODE (con0) == POINTER_PLUS_EXPR)
 {
   lit0 = TREE_OPERAND (con0, 1);
Index: gcc/testsuite/gcc.dg/pr52549.c
===
--- gcc/testsuite/gcc.dg/pr52549.c  (revision 0)
+++ gcc/testsuite/gcc.dg/pr52549.c  (revision 0)
@@ -0,0 +1,6 @@
+/* { dg-do compile } */
+
+_mark (long obj, int i, char *a)
+{
+  (char *)(((long *)(obj)) [i]) - a;
+}


[PATCH] Fix PR52862

2012-04-12 Thread Richard Guenther

This removes the special casing of zero from convert_to_pointer
(why would only that need overflow flag handling?  and why would
the generic code below not properly handle it ...)

That sounds better than papering over the issue with noting
that integer_zerop can return true for non-INTEGER_CSTs and
checking for that before querying TREE_OVERFLOW.

Bootstrap and regtest pending on x86_64-unknown-linux-gnu.

Richard.

2012-04-12  Richard Guenther  rguent...@suse.de

PR c/52862
* convert.c (convert_to_pointer): Remove special-casing of
zero.

* gcc.dg/pr52862.c: New testcase.

Index: gcc/convert.c
===
*** gcc/convert.c   (revision 186373)
--- gcc/convert.c   (working copy)
*** convert_to_pointer (tree type, tree expr
*** 44,54 
if (TREE_TYPE (expr) == type)
  return expr;
  
-   /* Propagate overflow to the NULL pointer.  */
-   if (integer_zerop (expr))
- return force_fit_type_double (type, double_int_zero, 0,
- TREE_OVERFLOW (expr));
- 
switch (TREE_CODE (TREE_TYPE (expr)))
  {
  case POINTER_TYPE:
--- 44,49 
Index: gcc/testsuite/gcc.dg/pr52862.c
===
*** gcc/testsuite/gcc.dg/pr52862.c  (revision 0)
--- gcc/testsuite/gcc.dg/pr52862.c  (revision 0)
***
*** 0 
--- 1,9 
+ /* { dg-do compile } */
+ /* { dg-options "-O" } */
+ 
+ void ASMAtomicWritePtrVoid(const void *pv);
+ void rtThreadDestroy(void)
+ {
+   void * const pvTypeChecked = ((void *)0);
+   ASMAtomicWritePtrVoid((void *)(pvTypeChecked));
+ }


Re: [PATCH, powerpc] Fix PR52775, enable FCFID on power4

2012-04-12 Thread David Edelsohn
On Wed, Apr 11, 2012 at 4:12 PM, Michael Meissner
meiss...@linux.vnet.ibm.com wrote:
 It was brought to my attention that when I rewrote the floating point
 conversion operations for power7, I did not notice that the power4 and 970
 powerpcs actually support the FCFID (floating point convert) instruction in
 32-bit mode.  This patch fixes it.  Is it ok to apply?  I did bootstraps with
 and without the patch, and there were no regressions.

 [gcc]
 2012-04-11  Michael Meissner  meiss...@linux.vnet.ibm.com

        PR target/52775
        * config/rs6000/rs6000.h (TARGET_FCFID): Add TARGET_PPC_GPOPT to
        the list of options to enable the FCFID instruction.
        (TARGET_EXTRA_BUILTINS): Adjust comment.

 [gcc/testsuite]
 2012-04-11  Michael Meissner  meiss...@linux.vnet.ibm.com

        PR target/52775
        * gcc.target/powerpc/pr52775.c: New file.

Okay.

Thanks, David


[Patch, Fortran] PR52864 - fix actual/formal checks

2012-04-12 Thread Tobias Burnus
This patch is a kind of follow up to the other one for the same PR - 
though this one is for a separate test case, it is not a regression and 
it's about actual/formal checks.


When trying to fix the rejects-valid bug, I realized that one function 
was never reached, as a call to expr.c's gfc_check_vardef_context is 
done before.  I did some cleanup and added some code to ensure pointer 
CLASS variables are correctly handled.  I am not positive that the removed 
code is unreachable, but I failed to produce code that reaches it, and the 
test suite passed.


Thus, this patch removes a rejects-valid bug and an accepts-invalid bug, 
cleans up the code a bit, and adds a test case for existing checks 
(besides testing the bug fixes).


Build and regtested on x86-64-linux.
OK for the trunk?

Tobias
2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* interface.c (compare_parameter_intent): Remove.
	(check_intents): Remove call, handle CLASS pointer.
	(compare_actual_formal): Handle CLASS pointer.

2012-04-12  Tobias Burnus  bur...@net-b.de

	PR fortran/52864
	* gfortran.dg/pointer_intent_7.f90: New.
	* gfortran.dg/pure_formal_3.f90: New.

diff --git a/gcc/fortran/interface.c b/gcc/fortran/interface.c
index 298ae23..3c8f9cb 100644
--- a/gcc/fortran/interface.c
+++ b/gcc/fortran/interface.c
@@ -2504,7 +2520,9 @@ compare_actual_formal (gfc_actual_arglist **ap, gfc_formal_arglist *formal,
	     ? _("actual argument to INTENT = OUT/INOUT")
	     : NULL);
 
-	  if (f->sym->attr.pointer
+	  if (((f->sym->ts.type == BT_CLASS && f->sym->attr.class_ok
+		&& CLASS_DATA (f->sym)->attr.class_pointer)
+	       || (f->sym->ts.type != BT_CLASS && f->sym->attr.pointer))
 	      && gfc_check_vardef_context (a->expr, true, false, context)
 		 == FAILURE)
 	    return 0;
@@ -2799,25 +2817,6 @@ check_some_aliasing (gfc_formal_arglist *f, gfc_actual_arglist *a)
 }
 
 
-/* Given a symbol of a formal argument list and an expression,
-   return nonzero if their intents are compatible, zero otherwise.  */
-
-static int
-compare_parameter_intent (gfc_symbol *formal, gfc_expr *actual)
-{
-  if (actual->symtree->n.sym->attr.pointer && !formal->attr.pointer)
-    return 1;
-
-  if (actual->symtree->n.sym->attr.intent != INTENT_IN)
-    return 1;
-
-  if (formal->attr.intent == INTENT_INOUT || formal->attr.intent == INTENT_OUT)
-    return 0;
-
-  return 1;
-}
-
-
 /* Given formal and actual argument lists that correspond to one
another, check that they are compatible in the sense that intents
are not mismatched.  */
@@ -2839,25 +2838,11 @@ check_intents (gfc_formal_arglist *f, gfc_actual_arglist *a)
 
   f_intent = f->sym->attr.intent;
 
-      if (!compare_parameter_intent (f->sym, a->expr))
-	{
-	  gfc_error ("Procedure argument at %L is INTENT(IN) while interface "
-		     "specifies INTENT(%s)", &a->expr->where,
-		     gfc_intent_string (f_intent));
-	  return FAILURE;
-	}
-
   if (gfc_pure (NULL) && gfc_impure_variable (a->expr->symtree->n.sym))
 	{
-	  if (f_intent == INTENT_INOUT || f_intent == INTENT_OUT)
-	    {
-	      gfc_error ("Procedure argument at %L is local to a PURE "
-			 "procedure and is passed to an INTENT(%s) argument",
-			 &a->expr->where, gfc_intent_string (f_intent));
-	      return FAILURE;
-	    }
-
-	  if (f->sym->attr.pointer
+	  if ((f->sym->ts.type == BT_CLASS && CLASS_DATA (f->sym)
+	       && CLASS_DATA (f->sym)->attr.class_pointer)
+	      || (f->sym->ts.type != BT_CLASS && f->sym->attr.pointer))
 	{
	  gfc_error ("Procedure argument at %L is local to a PURE "
		     "procedure and has the POINTER attribute",
		     &a->expr->where);
 	  return FAILURE;
 	}
 
-	  if (f->sym->attr.pointer
+	  if ((f->sym->ts.type == BT_CLASS && CLASS_DATA (f->sym)
+	       && CLASS_DATA (f->sym)->attr.class_pointer)
+	      || (f->sym->ts.type != BT_CLASS && f->sym->attr.pointer))
 	    {
 	      gfc_error ("Coindexed actual argument at %L in PURE procedure "
 			 "is passed to a POINTER dummy argument",
--- /dev/null	2012-04-12 06:55:49.927755790 +0200
+++ gcc/gcc/testsuite/gfortran.dg/pointer_intent_7.f90	2012-04-12 12:21:37.0 +0200
@@ -0,0 +1,45 @@
+! { dg-do compile }
+!
+! PR fortran/
+!
+! Contributed by Neil Carlson
+!
+! Check whether passing an intent(in) pointer
+! to an intent(inout) nonpointer is allowed
+!
+module modA
+  type :: typeA
+integer, pointer :: ptr
+  end type
+contains
+  subroutine foo (a,b,c)
+type(typeA), intent(in) :: a
+type(typeA), intent(in) , pointer :: b
+class(typeA), intent(in) , pointer :: c
+
+call bar (a%ptr)
+call bar2 (b)
+call bar3 (b)
+call bar2 (c)
+call bar3 (c)
+call bar2p (b) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+call bar3p (b) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+call bar2p (c) ! { dg-error "INTENT\\(IN\\) in pointer association context \\(actual argument to INTENT = OUT/INOUT" }
+

Re: [RS6000] Stack tie fix.

2012-04-12 Thread David Edelsohn
On Thu, Apr 12, 2012 at 9:22 AM, Alan Modra amo...@gmail.com wrote:

 I tried that.  It doesn't work without something else in the insn to
 stop rtl-dce deleting it, so you may as well use SETs.  But thanks for
 the prod in the right direction.  We do get slightly better results
 when the regs are not hidden away in an UNSPEC, for instance
 non-stack writes/reads are seen by the alias oracle to not conflict
 with the epilogue frame deallocation.

 Bootstrapped etc. powerpc-linux.  OK to apply, David?

        PR target/52828
        * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
        tie regs on destination of sets.  Delete forward declaration.
        (rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
        (rs6000_emit_prologue): Likewise.
        (rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
        and gen_stack_tie.
        (is_mem_ref): Use tie_operand to recognise stack ties.
        * config/rs6000/predicates.md (tie_operand): New.
        * config/rs6000/rs6000.md (UNSPEC_TIE): Delete.
        (restore_stack_block): Generate new stack tie rtl.
        (restore_stack_nonlocal): Likewise.
        (stack_tie): Update.
        (frame_tie): Delete.

This probably is getting close to the best we can do with GCC's RTL
alias analysis infrastructure.

This version is okay, but I also want to give Richi and Olivier an
opportunity to comment if they still have any concerns.

Thanks, David


Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable

2012-04-12 Thread Martin Jambor
Hi,

On Fri, Apr 06, 2012 at 06:13:20PM +0200, Eric Botcazou wrote:
  2012-04-03  Martin Jambor  mjam...@suse.cz
 
  * expr.c (expand_expr_real_1): Pass type, not the expression, to
  set_mem_attributes for a memory temporary.  Do not call the
  function for temporaries with a different alias set.
 
 The last sentence is unprecise, this would rather be: Do not call
 the function for the memory temporary created for a bitfield.

OK

 
 I wonder whether we should simplify the bitfield case in the process.  Once
 we remove the call to set_mem_attributes, I think the
 
   /* If the reference doesn't use the alias set of its type,
  we cannot create the temporary using that type.  */
 
 is useless, so we could try to revert r122014 in the process.

Well, the commit did not add a testcase and when I looked up the patch
in the mailing list archive
(http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it
was fixing problems not reproducible on trunk so it's basically
impossible for me to evaluate whether it is still necessary by some
simple testing.  Having said that, I guess I can give it a round of
regular testing on all the platforms I have currently set up.

On Fri, Apr 06, 2012 at 06:21:55PM +0200, Eric Botcazou wrote:
  @@ -9870,7 +9871,14 @@ expand_expr_real_1 (tree exp, rtx target
  if (op0 == orig_op0)
op0 = copy_rtx (op0);
 
  -   set_mem_attributes (op0, exp, 0);
  +   /* If op0 is a temporary because of forcing to memory, pass only the
  +  type to set_mem_attributes so that the original expression is never
  +  marked as ADDRESSABLE through MEM_EXPR of the temporary.  */
  +   if (mem_attrs_from_type)
  + set_mem_attributes (op0, TREE_TYPE (exp), 0);
 
 set_mem_attributes (op0, type, 0); is equivalent.
 

OK.

So basically I'd like to commit the following.  It has been
successfully bootstrapped and tested on x86_64-linux, i686-linux,
sparc64-linux (without Java), ia64-linux (without Ada) and tested on
hppa-linux (C and C++ only).

OK for trunk?

Thanks,

Martin


2012-04-10  Martin Jambor  mjam...@suse.cz

* expr.c (expand_expr_real_1): Pass type, not the expression, to
set_mem_attributes for a memory temporary. Do not call the function
for the memory temporary created for a bitfield.

Index: src/gcc/expr.c
===
--- src.orig/gcc/expr.c
+++ src/gcc/expr.c
@@ -9598,6 +9598,7 @@ expand_expr_real_1 (tree exp, rtx target
	tree tem = get_inner_reference (exp, &bitsize, &bitpos, &offset,
					&mode1, &unsignedp, &volatilep, true);
rtx orig_op0, memloc;
+   bool mem_attrs_from_type = false;
 
/* If we got back the original object, something is wrong.  Perhaps
   we are evaluating an expression too early.  In any event, don't
@@ -9703,6 +9704,7 @@ expand_expr_real_1 (tree exp, rtx target
memloc = assign_temp (nt, 1, 1, 1);
emit_move_insn (memloc, op0);
op0 = memloc;
+   mem_attrs_from_type = true;
  }
 
if (offset)
@@ -9875,7 +9877,6 @@ expand_expr_real_1 (tree exp, rtx target
emit_move_insn (new_rtx, op0);
op0 = copy_rtx (new_rtx);
PUT_MODE (op0, BLKmode);
-   set_mem_attributes (op0, exp, 1);
  }
 
return op0;
@@ -9896,7 +9897,14 @@ expand_expr_real_1 (tree exp, rtx target
if (op0 == orig_op0)
  op0 = copy_rtx (op0);
 
-   set_mem_attributes (op0, exp, 0);
+   /* If op0 is a temporary because of forcing to memory, pass only the
+  type to set_mem_attributes so that the original expression is never
+  marked as ADDRESSABLE through MEM_EXPR of the temporary.  */
+   if (mem_attrs_from_type)
+ set_mem_attributes (op0, type, 0);
+   else
+ set_mem_attributes (op0, exp, 0);
+
if (REG_P (XEXP (op0, 0)))
  mark_reg_pointer (XEXP (op0, 0), MEM_ALIGN (op0));
 


Re: [PATCH][ARM] NEON DImode neg

2012-04-12 Thread Richard Earnshaw
On 26/03/12 11:14, Andrew Stubbs wrote:
 On 28/02/12 17:45, Andrew Stubbs wrote:
 Hi all,

 This patch adds a DImode negate pattern for NEON.

 Unfortunately, the NEON vneg instruction only supports vectors, not
 singletons, so there's no direct way to do it in DImode, and the
 compiler ends up moving the value back to core registers, negating it,
 and returning to NEON afterwards:

 fmrrd r2, r3, d16 @ int
 negs r2, r2
 sbc r3, r3, r3, lsl #1
 fmdrr d16, r2, r3 @ int

 The new patch does it entirely in NEON:

 vmov.i32 d17, #0 @ di
 vsub.i64 d16, d17, d16

 (Note that this is the result when combined with my recent patch for
 NEON DImode immediates. Without that you get a constant pool load.)
 
  This update fixes a bootstrap failure caused by an early-clobber error. 
 I've also got a native regression test running now.
 
 OK?
 
 Andrew
 
 
 neon-neg64.patch
 
 
 2012-03-26  Andrew Stubbs  a...@codesourcery.com
 
   gcc/
   * config/arm/arm.md (negdi2): Use gen_negdi2_neon.
   * config/arm/neon.md (negdi2_neon): New insn.
   Also add splitters for core and NEON registers.
 
 ---
  gcc/config/arm/arm.md  |8 +++-
  gcc/config/arm/neon.md |   37 +
  2 files changed, 44 insertions(+), 1 deletions(-)
 
 diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
 index 751997f..f1dbbf7 100644
 --- a/gcc/config/arm/arm.md
 +++ b/gcc/config/arm/arm.md
 @@ -4048,7 +4048,13 @@
 	(neg:DI (match_operand:DI 1 "s_register_operand" "")))
   (clobber (reg:CC CC_REGNUM))])]
   "TARGET_EITHER"
 -  ""
 +  {
 +if (TARGET_NEON)
 +  {
 +emit_insn (gen_negdi2_neon (operands[0], operands[1]));
 + DONE;
 +  }
 +  }
  )
  
  ;; The constraints here are to prevent a *partial* overlap (where %Q0 == 
 %R1).
 diff --git a/gcc/config/arm/neon.md b/gcc/config/arm/neon.md
 index 3c88568..bf229a7 100644
 --- a/gcc/config/arm/neon.md
 +++ b/gcc/config/arm/neon.md
 @@ -922,6 +922,43 @@
  (const_string neon_int_3)))]
  )
  
 +(define_insn "negdi2_neon"
 +  [(set (match_operand:DI 0 "s_register_operand"	 "= w,?r,?r,?w")
 +	(neg:DI (match_operand:DI 1 "s_register_operand" "  w, 0,  r, w")))
 +   (clobber (match_scratch:DI 2				 "=w, X,  X,w"))
 +   (clobber (reg:CC CC_REGNUM))]
 +  "TARGET_NEON"
 +  "#"
 +  [(set_attr "length" "8")
 +   (set_attr "arch" "nota8,*,*,onlya8")]
 +)
 +

If negation in Neon needs a scratch register, it seems to me to be
somewhat odd that we're disparaging the ARM version.

Also, wouldn't it be sensible to support a variant that was
early-clobber on operand 0, but loaded immediate zero into that value first:

vmov	Dd, #0
vsub	Dd, Dd, Dm

That way you'll never need more than two registers, whereas today you
want three.


R.



Re: [RS6000] Stack tie fix.

2012-04-12 Thread Richard Guenther
On Thu, Apr 12, 2012 at 5:34 PM, David Edelsohn dje@gmail.com wrote:
 On Thu, Apr 12, 2012 at 9:22 AM, Alan Modra amo...@gmail.com wrote:

 I tried that.  It doesn't work without something else in the insn to
 stop rtl-dce deleting it, so you may as well use SETs.  But thanks for
 the prod in the right direction.  We do get slightly better results
 when the regs are not hidden away in an UNSPEC, for instance
 non-stack writes/reads are seen by the alias oracle to not conflict
 with the epilogue frame deallocation.

 Bootstrapped etc. powerpc-linux.  OK to apply, David?

        PR target/52828
        * config/rs6000/rs6000.c (rs6000_emit_stack_tie): Rewrite with
        tie regs on destination of sets.  Delete forward declaration.
        (rs6000_emit_stack_reset): Update rs6000_emit_stack_tie calls.
        (rs6000_emit_prologue): Likewise.
        (rs6000_emit_epilogue): Likewise.  Use in place of gen_frame_tie
        and gen_stack_tie.
        (is_mem_ref): Use tie_operand to recognise stack ties.
        * config/rs6000/predicates.md (tie_operand): New.
        * config/rs6000/rs6000.md (UNSPEC_TIE): Delete.
        (restore_stack_block): Generate new stack tie rtl.
        (restore_stack_nonlocal): Likewise.
        (stack_tie): Update.
        (frame_tie): Delete.

 This probably is getting close to the best we can do with GCC's RTL
 alias analysis infrastructure.

 This version is okay, but I also want to give Richi and Olivier an
 opportunity to comment if they still have any concerns.

It looks fine to me.

Richard.

 Thanks, David


Re: [RFC] Should SRA stop producing COMPONENT_REF for non-bit-fields (again)?

2012-04-12 Thread Martin Jambor
Hi,

On Wed, Apr 04, 2012 at 04:42:05PM +0200, Richard Guenther wrote:
 On Wed, 4 Apr 2012, Martin Jambor wrote:
 
  Hi everyone, especially Richi and Eric,
  
  I'd like to know what is your attitude to changing SRA's
  build_ref_for_model to what it once looked like, so that it produces
  COMPONENT_REFs only for bit-fields.  The non-bit field handling was
  added in order to work-around problems when expanding non-aligned
  MEM_REFs on strict-alignment targets but that should not be a problem
  now and my experiments confirm that.  Last week I successfully
  bootstrapped and tested this patch on sparc64-linux (with the
  temporary MEM_EXPR patch, not including Java), ia64-linux (without
  Ada), x86_64-linux, i686-linux and tested it on hppa-linux (only C and
  C++).

...

  
  2012-03-20 Martin Jambor mjam...@suse.cz
  
  * tree-sra.c (build_ref_for_model): Create COMPONENT_REFs only for
  bit-fields.
  
  Index: src/gcc/tree-sra.c
  ===
  *** src.orig/gcc/tree-sra.c
  --- src/gcc/tree-sra.c
  *** build_ref_for_offset (location_t loc, tr
  *** 1489,1558 
  return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
}

  - DEF_VEC_ALLOC_P_STACK (tree);
  - #define VEC_tree_stack_alloc(alloc) VEC_stack_alloc (tree, alloc)
  - 
/* Construct a memory reference to a part of an aggregate BASE at the 
  given
  !OFFSET and of the type of MODEL.  In case this is a chain of references
  !to component, the function will replicate the chain of COMPONENT_REFs 
  of
  !the expression of MODEL to access it.  GSI and INSERT_AFTER have the 
  same
  !meaning as in build_ref_for_offset.  */

static tree
build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
   struct access *model, gimple_stmt_iterator *gsi,
   bool insert_after)
{
  !   tree type = model->type, t;
  !   VEC(tree,stack) *cr_stack = NULL;
  ! 
  !   if (TREE_CODE (model->expr) == COMPONENT_REF)
    {
  !   tree expr = model->expr;
  ! 
  !   /* Create a stack of the COMPONENT_REFs so later we can walk them in
  !order from inner to outer.  */
  !   cr_stack = VEC_alloc (tree, stack, 6);
  ! 
  !   do {
  !   tree field = TREE_OPERAND (expr, 1);
  !   tree cr_offset = component_ref_field_offset (expr);
  !   HOST_WIDE_INT bit_pos
  ! = tree_low_cst (cr_offset, 1) * BITS_PER_UNIT
  ! + TREE_INT_CST_LOW (DECL_FIELD_BIT_OFFSET (field));

  !   /* We can be called with a model different from the one associated
  !  with BASE so we need to avoid going up the chain too far.  */
  !   if (offset - bit_pos < 0)
  ! break;
  ! 
  !   offset -= bit_pos;
  !   VEC_safe_push (tree, stack, cr_stack, expr);
  ! 
  !   expr = TREE_OPERAND (expr, 0);
  !   type = TREE_TYPE (expr);
  !   } while (TREE_CODE (expr) == COMPONENT_REF);
}
  ! 
  !   t = build_ref_for_offset (loc, base, offset, type, gsi, insert_after);
  ! 
  !   if (TREE_CODE (model->expr) == COMPONENT_REF)
  ! {
  !   unsigned i;
  !   tree expr;
  ! 
  !   /* Now replicate the chain of COMPONENT_REFs from inner to outer.  
  */
  !   FOR_EACH_VEC_ELT_REVERSE (tree, cr_stack, i, expr)
  !   {
  ! tree field = TREE_OPERAND (expr, 1);
  ! t = fold_build3_loc (loc, COMPONENT_REF, TREE_TYPE (field), t, field,
  !  TREE_OPERAND (expr, 2));
  !   }
  ! 
  !   VEC_free (tree, stack, cr_stack);
  ! }
  ! 
  !   return t;
}

/* Construct a memory reference consisting of component_refs and 
  array_refs to
  --- 1489,1520 
  return fold_build2_loc (loc, MEM_REF, exp_type, base, off);
}

/* Construct a memory reference to a part of an aggregate BASE at the 
  given
  !OFFSET and of the same type as MODEL.  In case this is a reference to a
  !bit-field, the function will replicate the last component_ref of 
  model's
  !expr to access it.  GSI and INSERT_AFTER have the same meaning as in
  !build_ref_for_offset.  */

static tree
build_ref_for_model (location_t loc, tree base, HOST_WIDE_INT offset,
   struct access *model, gimple_stmt_iterator *gsi,
   bool insert_after)
{
  !   if (TREE_CODE (model->expr) == COMPONENT_REF
  !       && DECL_BIT_FIELD (TREE_OPERAND (model->expr, 1)))
 
 I think you need to test DECL_BIT_FIELD_TYPE here

I have no problems changing that but in between revisions 164280 and
166730 this test worked fine.  What exactly is the difference?

 
{
  !   /* This access represents a bit-field.  */
  !   tree t, exp_type;

  !   offset -= int_bit_position (TREE_OPERAND (model->expr, 1));
 
 I'm not sure that offset is now byte-aligned for all the funny Ada
 bit-layouted records.  But maybe we don't even try to SRA those?

Aggregates containing aggregate fields which start at position
straddling 

Re: [RS6000] Stack tie fix.

2012-04-12 Thread Olivier Hainque

On Apr 12, 2012, at 17:34 , David Edelsohn wrote:
 This version is okay, but I also want to give Richi and Olivier an
 opportunity to comment if they still have any concerns.

 Thanks :) I'm rebuilding our old compiler with the patch
 and will double check how things go on our original failing case.

 Olivier



Re: [PATCH][ARM] NEON DImode neg

2012-04-12 Thread Andrew Stubbs

On 12/04/12 16:48, Richard Earnshaw wrote:

If negation in Neon needs a scratch register, it seems to me to be
somewhat odd that we're disparaging the ARM version.


Good point, I'm really meaning to provide missing functionality in NEON 
to prevent unnecessary moves, rather than cause them, so I should remove 
the '?'s.


It probably doesn't need disparaging on A8, either?


Also, wouldn't it be sensible to support a variant that was
early-clobber on operand 0, but loaded immediate zero into that value first:

	vmov	Dd, #0
	vsub	Dd, Dd, Dm

That way you'll never need more than two registers, whereas today you
want three.


That does seem like a good option. I'll have a go and see what happens.

Andrew


Re: [PATCH][ARM] NEON DImode neg

2012-04-12 Thread Richard Earnshaw
On 12/04/12 17:16, Andrew Stubbs wrote:
 On 12/04/12 16:48, Richard Earnshaw wrote:
 If negation in Neon needs a scratch register, it seems to me to be
 somewhat odd that we're disparaging the ARM version.
 
 Good point, I'm really meaning to provide missing functionality in NEON 
 to prevent unnecessary moves, rather than cause them, so I should remove 
 the '?'s.
 
 It probably doesn't need disparaging on A8, either?

Possibly not.  Which then begs the question as to whether we need an A8
variant at all.

 
 Also, wouldn't it be sensible to support a variant that was
 early-clobber on operand 0, but loaded immediate zero into that value first:

 	vmov	Dd, #0
 	vsub	Dd, Dd, Dm

 That way you'll never need more than two registers, whereas today you
 want three.
 
 That does seem like a good option. I'll have a go and see what happens.
 
 Andrew
 




Re: [PATCH] Fix PR18589

2012-04-12 Thread H.J. Lu
On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt
wschm...@linux.vnet.ibm.com wrote:
 On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote:
 On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
 
  Unfortunately this seems to be necessary if I name the two passes
  reassoc1 and reassoc2.  If I try to name both of them reassoc I
  get failures in other tests like gfortran.dg/reassoc_4, where
  -fdump-tree-reassoc1 doesn't work.  Unless I'm missing something
  obvious, I think I need to keep that change.

 Hm, naming them reassoc1 and reassoc2 is a hack.  Naming both
 reassoc will not trigger re-naming them to reassoc1 and reassoc2
 I think.  How ugly.  Especially that -fdump-tree-reassoc will no longer
 work.  Maybe instead of using two pass structs resort to using
 the existing hack with using first_pass_instance and 
 TODO_mark_first_instance.

 OK, that seems to be the best among evils.  Using the
 first_pass_instance hack, the patch is transformed as below.
 Regstrapped on powerpc64-linux, no additional failures.  OK for trunk?

 Thanks,
 Bill


 gcc:

 2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com

        PR tree-optimization/18589
        * tree-ssa-reassoc.c (reassociate_stats): Add two fields.
        (operand_entry): Add count field.
        (add_repeat_to_ops_vec): New function.
        (completely_remove_stmt): Likewise.
        (remove_def_if_absorbed_call): Likewise.
        (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
        (acceptable_pow_call): New function.
        (linearize_expr_tree): Look for builtin pow/powi calls and add operand
        entries with repeat counts when found.
        (repeat_factor_d): New struct and associated typedefs.
        (repeat_factor_vec): New static vector variable.
        (compare_repeat_factors): New function.
        (get_reassoc_pow_ssa_name): Likewise.
        (attempt_builtin_powi): Likewise.
        (reassociate_bb): Call attempt_builtin_powi.
        (fini_reassoc): Two new calls to statistics_counter_event.


It breaks bootstrap on Linux/ia32:

../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void
attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**,
tree_node**)':
../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld'
expects argument of type 'long int', but argument 3 has type 'long
long int' [-Werror=format]
 fprintf (dump_file, ")^%ld\n", power);
 ^
../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld'
expects argument of type 'long int', but argument 3 has type 'long
long int' [-Werror=format]
fprintf (dump_file, ")^%ld\n", power);
^
cc1plus: all warnings being treated as errors


H.J.


Re: [PATCH] Caret diagnostics

2012-04-12 Thread Tom Tromey
 Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes:

Manuel So, in fact, libcpp is buggy for not implementing this (as can be seen
Manuel in emacs). If/When libcpp is fixed, the column info will be correct
Manuel for tabs. And then, one may care about printing tabs as anything
Manuel different from a single space. But until then, a single space is a
Manuel good as anything. In fact this is what is done in c-ppoutput.c.

Gaby I would not necessarily say that libcpp is buggy in this aspect.
Gaby However, I do agree that removing the tab expansion simplifies
Gaby the code and is better.

Unfortunately I think it really is buggy.  Other GNU programs, including
Emacs, already conform to the GCS here.  Arguably the GCS made the wrong
decision, but it is already baked in all over.

Tom


Re: [PATCH] Caret diagnostics

2012-04-12 Thread Gabriel Dos Reis
On Thu, Apr 12, 2012 at 11:53 AM, Tom Tromey tro...@redhat.com wrote:
 Gaby == Gabriel Dos Reis g...@integrable-solutions.net writes:

 Manuel So, in fact, libcpp is buggy for not implementing this (as can be seen
 Manuel in emacs). If/When libcpp is fixed, the column info will be correct
 Manuel for tabs. And then, one may care about printing tabs as anything
 Manuel different from a single space. But until then, a single space is a
 Manuel good as anything. In fact this is what is done in c-ppoutput.c.

 Gaby I would not necessarily say that libcpp is buggy in this aspect.
 Gaby However, I do agree that removing the tab expansion simplifies
 Gaby the code and is better.

 Unfortunately I think it really is buggy.  Other GNU programs, including
 Emacs, already conform to the GCS here.  Arguably the GCS made the wrong
 decision, but it is already baked in all over.

OK.

-- Gaby


Re: [PATCH] Fix PR18589

2012-04-12 Thread William J. Schmidt
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote:
 On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote:
  On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt
  wschm...@linux.vnet.ibm.com wrote:
  
   Unfortunately this seems to be necessary if I name the two passes
   reassoc1 and reassoc2.  If I try to name both of them reassoc I
   get failures in other tests like gfortran.dg/reassoc_4, where
   -fdump-tree-reassoc1 doesn't work.  Unless I'm missing something
   obvious, I think I need to keep that change.
 
  Hm, naming them reassoc1 and reassoc2 is a hack.  Naming both
  reassoc will not trigger re-naming them to reassoc1 and reassoc2
  I think.  How ugly.  Especially that -fdump-tree-reassoc will no longer
  work.  Maybe instead of using two pass structs resort to using
  the existing hack with using first_pass_instance and 
  TODO_mark_first_instance.
 
  OK, that seems to be the best among evils.  Using the
  first_pass_instance hack, the patch is transformed as below.
  Regstrapped on powerpc64-linux, no additional failures.  OK for trunk?
 
  Thanks,
  Bill
 
 
  gcc:
 
  2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 PR tree-optimization/18589
 * tree-ssa-reassoc.c (reassociate_stats): Add two fields.
 (operand_entry): Add count field.
 (add_repeat_to_ops_vec): New function.
 (completely_remove_stmt): Likewise.
 (remove_def_if_absorbed_call): Likewise.
 (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
 (acceptable_pow_call): New function.
 (linearize_expr_tree): Look for builtin pow/powi calls and add 
  operand
 entries with repeat counts when found.
 (repeat_factor_d): New struct and associated typedefs.
 (repeat_factor_vec): New static vector variable.
 (compare_repeat_factors): New function.
 (get_reassoc_pow_ssa_name): Likewise.
 (attempt_builtin_powi): Likewise.
 (reassociate_bb): Call attempt_builtin_powi.
 (fini_reassoc): Two new calls to statistics_counter_event.
 
 
 It breaks bootstrap on Linux/ia32:
 
 ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void
 attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**,
 tree_node**)':
 ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld'
 expects argument of type 'long int', but argument 3 has type 'long
 long int' [-Werror=format]
  fprintf (dump_file, ")^%ld\n", power);
  ^
 ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld'
 expects argument of type 'long int', but argument 3 has type 'long
 long int' [-Werror=format]
 fprintf (dump_file, ")^%ld\n", power);
 ^
 cc1plus: all warnings being treated as errors
 
 
 H.J.
 

Whoops.  Looks like I need to use HOST_WIDE_INT_PRINT_DEC instead of %ld
in those spots.  I'll get a fix prepared.
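
A one-line sketch of the fix (confirmed by the patch later in this
digest): HOST_WIDE_INT_PRINT_DEC expands to a string-literal fragment, so
it is spliced into the format by C's adjacent-literal concatenation and
always matches the host's HOST_WIDE_INT width (long vs. long long):

  fprintf (dump_file, ")^" HOST_WIDE_INT_PRINT_DEC "\n", power);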



Re: [RS6000] Stack tie fix.

2012-04-12 Thread Olivier Hainque

On Apr 12, 2012, at 18:07 , Olivier Hainque wrote:

 I'm rebuilding our old compiler with the patch
 and will double check how things go on our original
 failing case.

 All is well :) Thanks Alan!




Re: [PATCH] Dissociate store_expr's temp from exp so that it is not marked as addressable

2012-04-12 Thread Eric Botcazou
 Well, the commit did not add a testcase and when I looked up the patch
 in the mailing list archive
 (http://gcc.gnu.org/ml/gcc-patches/2006-11/msg01449.html) it said it
 was fixing problems not reproducible on trunk so it's basically
 impossible for me to evaluate whether it is still necessary by some
 simple testing.  Having said that, I guess I can give it a round of
 regular testing on all the platforms I have currently set up.

The problem was that, for the same address, you had the alias set of the type 
on one MEM and the alias set of the reference on the other MEM.  If the alias 
set of the reference doesn't conflict with that of the type (this can happen 
in Ada because of DECL_NONADDRESSABLE_P), the RAW dependency may be missed.

If we don't put the alias set of the reference on one of the MEM, then I don't 
think that we need to put it on the other MEM.  That's what's done for the 
first, non-bitfield temporary now.

 2012-04-10  Martin Jambor  mjam...@suse.cz

   * expr.c (expand_expr_real_1): Pass type, not the expression, to
   set_mem_attributes for a memory temporary. Do not call the function
   for the memory temporary created for a bitfield.

Fine with me, but the now dangling code in the bitfield case is a bit annoying.

-- 
Eric Botcazou


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-12 Thread Mike Frysinger
On Thursday 12 April 2012 02:05:23 Jakub Jelinek wrote:
 On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote:
  All good.  My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it:

 The directory should be /libhf/ or /libhfp/ for that for consistency
 with all the other architectures.

i think the idea was that no one is looking to do multilib here.  so we won't 
have softfloat in /lib/ and hardfloat in /libhf/.  we're just changing the ldso 
to reflect a change in the ABI.

you could also make this argument for EABI and OABI -- the EABI ldso should 
not be in /lib/.  but since we've got OABI and EABI both in /lib/ and people 
are happy with that, as well as the hardfloat ldso in /lib/, there's no need 
for a sep /libhf/.

 I'm fine with arm and hf (resp. hfp) being mentioned in the name of
 the dynamic linker, but IMNSHO having there gnu and eabi strings
 is an overkill - why mention gnu there, when all the other
 architectures which also have GNU libc dynamic linkers don't?  That
 part is implicit.  And, EABI is implied by so.3, softfp dynamic linker
 for EABI is /lib/ld-linux.so.3 while softfp dynamic linker for the old
 ABI is /lib/ld-linux.so.2.

i have no opinion either way here.  uClibc has already opted to name things 
with -uClibc- in them, so we're clear of collisions there.
-mike


signature.asc
Description: This is a digitally signed message part.


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-12 Thread Mike Frysinger
On Thursday 12 April 2012 03:47:29 Jakub Jelinek wrote:
 On Thu, Apr 12, 2012 at 10:33:08AM +0300, Riku Voipio wrote:
  On 12 April 2012 09:05, Jakub Jelinek ja...@redhat.com wrote:
   On Thu, Apr 12, 2012 at 11:22:13AM +1200, Michael Hope wrote:
   All good.  My vote is for /lib/ld-arm-linux-gnueabihf.so.3 as it:
   The directory should be /libhf/ or /libhfp/ for that for consistency
   with all the other architectures.  Note e.g. x86_64 dynamic linker
   is /lib64/ld-linux-x86-64.so.2, not /lib/ld-linux-x86-64.so.2.
  
  For some value of consistency. x86_64, mips64, powerpc64 and sparc64
  install to /lib64. But on ia64 it is /lib/ld-linux-ia64.so.2 and on
 
 ia64 installs in /lib, because it isn't a multilibbed architecture.

because distros choose not to support it.  in first gen chips, there was 
hardware support for running x86.  so if we were to be strict, there should 
have been /libia64/ (or whatever) while the current x86 32bit code would stay 
in /lib/.  but no distro was interested in supporting that (probably due to 
the 32bit x86 layers being balls-ass slow), so it never happened.

  s390x it is /lib/ld64.so.1 [1].
 
 Ok, I forgot about this, I've tried to convince s390x folks to move it
 to /lib64/ld64.so.1 many years ago, but that just didn't happen, so
 /lib/ld64.so.1 is just a symlink to /lib64/ld64.so.1.
 Upstream glibc binaries use /lib64/ld64.so.1 as their dynamic linker,
 while all other packages use /lib/ld64.so.1 as that is hardcoded in gcc.

i always thought this was weird.  glad to know i'm not the only one :).
-mike


signature.asc
Description: This is a digitally signed message part.


Re: Phone call (was Re: Armhf dynamic linker path)

2012-04-12 Thread Jakub Jelinek
On Thu, Apr 12, 2012 at 01:49:16PM -0400, Mike Frysinger wrote:
  ia64 installs in /lib, because it isn't a multilibbed architecture.
 
 because distros choose not to support it.  in first gen chips, there was 
 hardware support for running x86.  so if we were to be strict, there should 
 have been /libia64/ (or whatever) while the current x86 32bit code would stay 
 in /lib/.  but no distro was interested in supporting that (probably due to 
 the 32bit x86 layers being balls-ass slow), so it never happened.

We even carried patches (not applied) for lib64 ia64 support in our
binutils/glibc/gcc for a while, but the final decision after these were
written was that it is going to stay in /lib.

Jakub


Re: [i386, patch, RFC] HLE support in GCC

2012-04-12 Thread Torvald Riegel
On Thu, 2012-04-12 at 18:13 +0400, Kirill Yukhin wrote:
  I would suggest that we keep the HLE acq/rel bits independent of the
  memory order bits.  Both are independent on a conceptual level.  And we
  should add documentation that tells programmers that memory orders need
  always be specified.
 
 Sorry, I didn't get your point. You propose to separate HLE-capable
 builtins at all?

No, just the bits; programmers would need to do
  __atomic_...(..., __ATOMIC_RELEASE | HLE_RELEASE);
I believe this is what you had in one of your versions of the patch.  My
suggestion was not about doing something new but instead a
suggestion/poll for a resolution of the discussion.
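
A short usage sketch (editorial; the flag spellings below,
__ATOMIC_HLE_ACQUIRE and __ATOMIC_HLE_RELEASE, are one plausible naming
and were still being settled in this thread), showing the HLE bit OR'ed
with an explicitly specified memory order:

/* Hypothetical lock elision on a simple spinlock: the HLE bit is never
   given on its own, only combined with a memory order.  */
static int lock;

void acquire (void)
{
  while (__atomic_exchange_n (&lock, 1,
                              __ATOMIC_ACQUIRE | __ATOMIC_HLE_ACQUIRE))
    ; /* spin */
}

void release (void)
{
  __atomic_store_n (&lock, 0, __ATOMIC_RELEASE | __ATOMIC_HLE_RELEASE);
}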




[steve.mcint...@linaro.org: Phone call (was Re: Armhf dynamic linker path)]

2012-04-12 Thread Steve McIntyre
Getting bounces from this message. Let's try again.

- Forwarded message from Steve McIntyre steve.mcint...@linaro.org -

From: Steve McIntyre steve.mcint...@linaro.org
Date: Wed, 11 Apr 2012 23:37:55 +0100
To: cross-dis...@lists.linaro.org
Cc: Adam Conrad adcon...@debian.org, linaro-toolch...@lists.linaro.org,
Jeff Law l...@redhat.com, libc-po...@sourceware.org,
gcc-patches@gcc.gnu.org
Subject: Phone call (was Re: Armhf dynamic linker path)
X-attached: unknown

On Wed, Apr 11, 2012 at 02:06:09AM +0100, Steve McIntyre wrote:

And here's the details as promised.

I've started a wiki page at 

https://wiki.linaro.org/OfficeofCTO/HardFloat/LinkerPathCallApr2012

with a strawman agenda for now, and a Doodle poll at 

http://www.doodle.com/93bitkqeb7auyxn7

to see when the best time is for the call on Thursday/Friday. Please
fill in the times that work for you ASAP and I'll announce the result
during Wednesday. Ideally we'd like stakeholders from all the relevant
distros and the upstream toolchain developers to be there, able to
represent their groups and (importantly) able to make a decision here
on what we should do.

Apologies for the short notice, but we need a decision quickly.

And the best time turns out to be Friday at 15:00 UTC (16:00 BST,
11:00 EDT etc.). Of the 10 people who responded in the poll, the only
person who can't make that time is Michael in .nz. Sorry, Michael.

Phone numbers are listed in the wiki page above. I'll take notes and
post minutes afterwards. Let's talk Friday and get this shaken out to
a working conclusion.

Thanks,
-- 
Steve McIntyre steve.mcint...@linaro.org
http://www.linaro.org/ Linaro.org | Open source software for ARM SoCs

- End forwarded message -
Cheers,
-- 
Steve McIntyre steve.mcint...@linaro.org
http://www.linaro.org/ Linaro.org | Open source software for ARM SoCs



[PATCH, i386]: Fix PR 52932 - _mm256_permutevar8x32_ps and _mm256_permutevar8x32_epi32

2012-04-12 Thread Uros Bizjak
Hello!

Attached patch fixes issues around AVX2 vpermps and vpermd instructions.

1. Changes second argument of _mm256_permutevar8x32_ps to __m256i type
and consequently changes second argument of
__builtin_ia32_permvarsf256 argument to __v8si type.
2. Changes avx2_permvarv8sf pattern to accept v8si mask operand as its
2nd operand
3. Changes avx2_permvarv8si pattern in a similar way, so it accepts mask
as its 2nd operand
4. Macroizes avx2_permvarv8sf and avx2_permvarv8si patterns
5. Mechanically updates all calls to these two expanders
6. Fixes testcases accordingly

2012-04-12  Uros Bizjak  ubiz...@gmail.com

PR target/52932
* config/i386/avx2intrin.h (_mm256_permutevar8x32_ps): Change second
argument type to __m256i.  Update call to __builtin_ia32_permvarsf256.
* config/i386/sse.md (UNSPEC_VPERMVAR): New.
(UNSPEC_VPERMSI, UNSPEC_VPERMSF): Remove.
(avx2_permvarv8sf, avx2_permvarv8si): Switch operands 1 and 2.
(avx2_permvar<mode>): Macroize insn from avx2_permvarv8sf and
avx2_permvarv8si using VI4F_256 mode iterator.
* config/i386/i386.c (bdesc_args) <__builtin_ia32_permvarsf256>:
Update builtin type to V8SF_FTYPE_V8SF_V8SI.
(ix86_expand_vec_perm): Update calls to gen_avx2_permvarv8si and
gen_avx2_permvarv8sf.
(expand_vec_perm_pshufb): Ditto.

testsuite/ChangeLog:

2012-04-12  Uros Bizjak  ubiz...@gmail.com

PR target/52932
* gcc.target/i386/avx2-vpermps-1.c (avx2_test): Use __m256i type for
second function argument.
* gcc.target/i386/avx2-vpermps-2.c (init_permps): Update declaration.
(calc_permps): Update declaration.  Calculate result correctly.
(avx2_test): Change src2 type to union256i_d.
* gcc.target/i386/avx2-vpermd-2.c (calc_permd): Calculate result
correctly.

Patch was tested on x86_64-pc-linux-gnu {,-m32}. Earlier version of
the patch (without mechanical changes) was also tested on AVX2 target
by Kirill.

Patch was committed to mainline SVN, will be committed to 4.7.1 in a few days.

Uros.
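
A usage sketch of the corrected intrinsic (an illustration, not part of
the patch): the shuffle selector is now built as an integer vector.

#include <immintrin.h>

/* Reverse the eight floats in x.  After the fix, the selector argument
   of _mm256_permutevar8x32_ps is a __m256i, matching vpermps' integer
   index operand.  */
__m256
reverse8 (__m256 x)
{
  const __m256i idx = _mm256_setr_epi32 (7, 6, 5, 4, 3, 2, 1, 0);
  return _mm256_permutevar8x32_ps (x, idx);
}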
Index: config/i386/avx2intrin.h
===
--- config/i386/avx2intrin.h(revision 186383)
+++ config/i386/avx2intrin.h(working copy)
@@ -1034,9 +1034,9 @@ _mm256_permute4x64_pd (__m256d __X, const int __M)
 
 extern __inline __m256
 __attribute__ ((__gnu_inline__, __always_inline__, __artificial__))
-_mm256_permutevar8x32_ps (__m256 __X, __m256 __Y)
+_mm256_permutevar8x32_ps (__m256 __X, __m256i __Y)
 {
-  return (__m256) __builtin_ia32_permvarsf256 ((__v8sf)__X,(__v8sf)__Y);
+  return (__m256) __builtin_ia32_permvarsf256 ((__v8sf)__X, (__v8si)__Y);
 }
 
 #ifdef __OPTIMIZE__
Index: config/i386/sse.md
===
--- config/i386/sse.md  (revision 186383)
+++ config/i386/sse.md  (working copy)
@@ -79,8 +79,7 @@
   UNSPEC_VCVTPS2PH
 
   ;; For AVX2 support
-  UNSPEC_VPERMSI
-  UNSPEC_VPERMSF
+  UNSPEC_VPERMVAR
   UNSPEC_VPERMTI
   UNSPEC_GATHER
   UNSPEC_VSIBADDR
@@ -11901,30 +11900,18 @@
 (set_attr "prefix" "vex")
 (set_attr "mode" "<sseinsnmode>")])
  
-(define_insn "avx2_permvarv8si"
-  [(set (match_operand:V8SI 0 "register_operand" "=x")
-	(unspec:V8SI
-	  [(match_operand:V8SI 1 "register_operand" "x")
-	   (match_operand:V8SI 2 "nonimmediate_operand" "xm")]
-	  UNSPEC_VPERMSI))]
+(define_insn "avx2_permvar<mode>"
+  [(set (match_operand:VI4F_256 0 "register_operand" "=x")
+	(unspec:VI4F_256
+	  [(match_operand:VI4F_256 1 "nonimmediate_operand" "xm")
+	   (match_operand:V8SI 2 "register_operand" "x")]
+	  UNSPEC_VPERMVAR))]
   "TARGET_AVX2"
-  "vpermd\t{%2, %1, %0|%0, %1, %2}"
+  "vperm<ssemodesuffix>\t{%1, %2, %0|%0, %2, %1}"
   [(set_attr "type" "sselog")
    (set_attr "prefix" "vex")
    (set_attr "mode" "OI")])
 
-(define_insn "avx2_permvarv8sf"
-  [(set (match_operand:V8SF 0 "register_operand" "=x")
-	(unspec:V8SF
-	  [(match_operand:V8SF 1 "register_operand" "x")
-	   (match_operand:V8SF 2 "nonimmediate_operand" "xm")]
-	  UNSPEC_VPERMSF))]
-  "TARGET_AVX2"
-  "vpermps\t{%2, %1, %0|%0, %1, %2}"
-  [(set_attr "type" "sselog")
-   (set_attr "prefix" "vex")
-   (set_attr "mode" "OI")])
-
 (define_expand "avx2_perm<mode>"
   [(match_operand:VI8F_256 0 "register_operand")
    (match_operand:VI8F_256 1 "nonimmediate_operand")
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 186383)
+++ config/i386/i386.c  (working copy)
@@ -19937,7 +19937,7 @@ ix86_expand_vec_perm (rtx operands[])
  vt = force_reg (maskmode, vt);
  mask = gen_lowpart (maskmode, mask);
  if (maskmode == V8SImode)
-   emit_insn (gen_avx2_permvarv8si (t1, vt, mask));
+   emit_insn (gen_avx2_permvarv8si (t1, mask, vt));
  else
emit_insn (gen_avx2_pshufbv32qi3 (t1, mask, vt));
 
@@ -19971,13 +19971,13 @@ ix86_expand_vec_perm (rtx operands[])
 the 

libstdc++-v3: Run prettyprinters.exp

2012-04-12 Thread Andreas Schwab
prettyprinters.exp is never run except in non-parallel check or when
requested explicitly.  Tested on powerpc-linux and checked in as
obvious.

Andreas.

2012-04-12  Andreas Schwab  sch...@linux-m68k.org

* testsuite/Makefile.am (check_DEJAGNUnormal0): Run
prettyprinters.exp.
* testsuite/Makefile.in: Regenerated.

diff --git a/libstdc++-v3/testsuite/Makefile.am 
b/libstdc++-v3/testsuite/Makefile.am
index 7094ad5..0cf8de5 100644
--- a/libstdc++-v3/testsuite/Makefile.am
+++ b/libstdc++-v3/testsuite/Makefile.am
@@ -134,7 +134,7 @@ check-DEJAGNU $(check_DEJAGNU_normal_targets): 
check-DEJAGNU%: site.exp
  normal0) \
	  if $(SHELL) -c "$$runtest --version" > /dev/null 2>&1; then \
  $$runtest $(AM_RUNTESTFLAGS) $(RUNTESTDEFAULTFLAGS) \
-   $(RUNTESTFLAGS) abi.exp; \
+   $(RUNTESTFLAGS) abi.exp prettyprinters.exp; \
	  else echo "WARNING: could not find \`runtest'" 1>&2; :;\
fi; \
dirs=`cd $$srcdir; echo [013-9][0-9]_*/*`;; \
-- 
1.7.10

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: libstdc++-v3: Run prettyprinters.exp

2012-04-12 Thread Paolo Carlini
Hi,

 prettyprinters.exp is never run except in non-parallel check or when
 requested explicitly.  Tested on powerpc-linux and checked in as
 obvious.

Thanks Andreas.

Since you seem to pay good attention to these testing details, I wonder if you 
could also help with the issue I noticed again over the last days with make 
check-performance not actually doing much if make check isn't run before. I bet 
a fix would turn out to be just a few lines..

Paolo


Re: [PATCH] PR c++/50852 - loose template parameter comparison

2012-04-12 Thread Jason Merrill
Sorry for the delay in getting back to you on this; I just keep thinking 
there has to be a better way to deal with this issue than this 
increasingly complex fixup machinery.


The basic problem we're dealing with is that two typedefs from different 
contexts are considered identical for the purpose of template argument 
comparison, and we are leaving them in non-type template arguments, so 
when we look for a pre-existing specialization one typedef gets replaced 
by the other, which breaks.


The fixup stuff is a solution to the problem of typedefs with 
incompatible template parameters leading to crashes, but it seems to me 
that this is just a symptom of the problem of replacing one typedef with 
another, and we should attack that underlying problem instead.  That 
problem is also visible when the typedefs have different names and an 
error message ends up talking about the wrong one.


It seems to me that the way to do this is to strip typedefs from 
non-type arguments like strip_typedefs does for types.  Did we try this 
before?


Ideal would be to treat a specialization using the typedef-using 
arguments as a typedef to the specialization using the canonical 
arguments, but that could follow on later.


Jason



[google/google-main] Fix regression - SUBTARGET_EXTRA_SPECS overridden by LINUX_GRTE_EXTRA_SPECS (issue 6016047)

2012-04-12 Thread shenhan

Reviewers: asharif1, jingyu, Diego Novillo,

Message:
Hi, the newest chrome gcc (from google-main) fails to link anything.
Looking into the specs file, it seems that the 'link_emulation' section
is missing in specs.

The problem I found is that SUBTARGET_EXTRA_SPECS, which is not empty
for chromeos, is overridden by LINUX_GRTE_EXTRA_SPECS.

Please review the proposed patch. (Tested by buildit bootstrap).

-Han



Description:
In linux.h, macro SUBTARGET_EXTRA_SPECS is overridden by
LINUX_GRTE_EXTRA_SPECS, so the origin content of SUBTARGET_EXTRA_SPECS
never gets written into gcc spec, which causes linking failure for
chrome gcc.


Please review this at http://codereview.appspot.com/6016047/

Affected files:
  M gcc/config/i386/i386.h
  M gcc/config/i386/linux.h


Index: gcc/config/i386/i386.h
diff --git a/gcc/config/i386/i386.h b/gcc/config/i386/i386.h
index 7721c46..29fdd0b 100644
--- a/gcc/config/i386/i386.h
+++ b/gcc/config/i386/i386.h
@@ -621,8 +621,9 @@ enum target_cpu_default
 #endif

 #define EXTRA_SPECS					\
-  { "cc1_cpu",  CC1_CPU_SPEC },				\
-  SUBTARGET_EXTRA_SPECS
+  { "cc1_cpu",  CC1_CPU_SPEC },				\
+  LINUX_GRTE_EXTRA_SPECS				\
+  SUBTARGET_EXTRA_SPECS
 

 /* Set the value of FLT_EVAL_METHOD in float.h.  When using only the
Index: gcc/config/i386/linux.h
diff --git a/gcc/config/i386/linux.h b/gcc/config/i386/linux.h
index ade524c..383dedf 100644
--- a/gcc/config/i386/linux.h
+++ b/gcc/config/i386/linux.h
@@ -30,7 +30,3 @@ along with GCC; see the file COPYING3.  If not see
 #ifndef LINUX_GRTE_EXTRA_SPECS
 #define LINUX_GRTE_EXTRA_SPECS
 #endif
-
-#undef  SUBTARGET_EXTRA_SPECS
-#define SUBTARGET_EXTRA_SPECS \
-  LINUX_GRTE_EXTRA_SPECS




Re: [PATCH] Fix PR18589

2012-04-12 Thread William J. Schmidt
On Thu, 2012-04-12 at 09:50 -0700, H.J. Lu wrote:
 On Thu, Apr 5, 2012 at 6:49 AM, William J. Schmidt
 wschm...@linux.vnet.ibm.com wrote:
  On Thu, 2012-04-05 at 11:23 +0200, Richard Guenther wrote:
  On Wed, Apr 4, 2012 at 9:15 PM, William J. Schmidt
  wschm...@linux.vnet.ibm.com wrote:
  
   Unfortunately this seems to be necessary if I name the two passes
   reassoc1 and reassoc2.  If I try to name both of them reassoc I
   get failures in other tests like gfortran.dg/reassoc_4, where
   -fdump-tree-reassoc1 doesn't work.  Unless I'm missing something
   obvious, I think I need to keep that change.
 
  Hm, naming them reassoc1 and reassoc2 is a hack.  Naming both
  reassoc will not trigger re-naming them to reassoc1 and reassoc2
  I think.  How ugly.  Especially that -fdump-tree-reassoc will no longer
  work.  Maybe instead of using two pass structs resort to using
  the existing hack with using first_pass_instance and 
  TODO_mark_first_instance.
 
  OK, that seems to be the best among evils.  Using the
  first_pass_instance hack, the patch is transformed as below.
  Regstrapped on powerpc64-linux, no additional failures.  OK for trunk?
 
  Thanks,
  Bill
 
 
  gcc:
 
  2012-04-05  Bill Schmidt  wschm...@linux.vnet.ibm.com
 
 PR tree-optimization/18589
 * tree-ssa-reassoc.c (reassociate_stats): Add two fields.
 (operand_entry): Add count field.
 (add_repeat_to_ops_vec): New function.
 (completely_remove_stmt): Likewise.
 (remove_def_if_absorbed_call): Likewise.
 (remove_visited_stmt_chain): Remove feeding builtin pow/powi calls.
 (acceptable_pow_call): New function.
 (linearize_expr_tree): Look for builtin pow/powi calls and add 
  operand
 entries with repeat counts when found.
 (repeat_factor_d): New struct and associated typedefs.
 (repeat_factor_vec): New static vector variable.
 (compare_repeat_factors): New function.
 (get_reassoc_pow_ssa_name): Likewise.
 (attempt_builtin_powi): Likewise.
 (reassociate_bb): Call attempt_builtin_powi.
 (fini_reassoc): Two new calls to statistics_counter_event.
 
 
 It breaks bootstrap on Linux/ia32:
 
 ../../src-trunk/gcc/tree-ssa-reassoc.c: In function 'void
 attempt_builtin_powi(gimple, VEC_operand_entry_t_heap**,
 tree_node**)':
 ../../src-trunk/gcc/tree-ssa-reassoc.c:3189:41: error: format '%ld'
 expects argument of type 'long int', but argument 3 has type 'long
 long int' [-Werror=format]
  fprintf (dump_file, ")^%ld\n", power);
  ^
 ../../src-trunk/gcc/tree-ssa-reassoc.c:3222:44: error: format '%ld'
 expects argument of type 'long int', but argument 3 has type 'long
 long int' [-Werror=format]
 fprintf (dump_file, ")^%ld\n", power);
 ^
 cc1plus: all warnings being treated as errors
 
 
 H.J.
 

Thanks, H.J.  Sorry for the problem!

Fixing as follows.  I'll plan to commit as obvious shortly.


2012-04-12  Bill Schmidt  wschm...@linux.vnet.ibm.com

* tree-ssa-reassoc.c (attempt_builtin_powi_stats): Change %ld to
HOST_WIDE_INT_PRINT_DEC in format strings.


Index: gcc/tree-ssa-reassoc.c
===
--- gcc/tree-ssa-reassoc.c  (revision 186384)
+++ gcc/tree-ssa-reassoc.c  (working copy)
@@ -3186,7 +3186,8 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent
 	  if (elt < vec_len - 1)
 	    fputs (" * ", dump_file);
 	}
-	  fprintf (dump_file, ")^%ld\n", power);
+	  fprintf (dump_file, ")^" HOST_WIDE_INT_PRINT_DEC "\n",
+		   power);
 	}
 }
 }
 @@ -3219,7 +3220,7 @@ attempt_builtin_powi (gimple stmt, VEC(operand_ent
 	  if (elt < vec_len - 1)
 	    fputs (" * ", dump_file);
 	}
-	  fprintf (dump_file, ")^%ld\n", power);
+	  fprintf (dump_file, ")^" HOST_WIDE_INT_PRINT_DEC "\n", power);
}
 
  reassociate_stats.pows_created++;




Re: libstdc++-v3: Run prettyprinters.exp

2012-04-12 Thread Andreas Schwab
Paolo Carlini pcarl...@gmail.com writes:

 Since you seem to pay good attention to these testing details, I wonder if
 you could also help with the issue I noticed again over the last days with
 make check-performance not actually doing much if make check isn't run
 before. I bet a fix would turn out to be just a few lines..

Perhaps because
# XXX Need to add dependency on libtestc++.a

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


[PR tree-optimization/52558]: RFC: questions on store data race

2012-04-12 Thread Aldy Hernandez
Here we have a testcase that affects both the C++ memory model and 
transactional memory.


[Hans, this is caused by the same problem that is causing the 
speculative register promotion issue you and Torvald pointed me at].


In the following testcase (adapted from the PR), the loop invariant 
motion pass caches a pre-existing value for g_2, and then performs a 
store to g_2 on every path, causing a store data race:


int g_1 = 1;
int g_2 = 0;

int func_1(void)
{
  int l;
  for (l = 0; l < 1234; l++)
  {
if (g_1)
  return l;
else
      g_2 = 0;  <-- Store to g_2 should only happen if !g_1
  }
  return 999;
}

This gets transformed into something like:

g_2_lsm = g_2;
if (g_1) {
g_2 = g_2_lsm;  // boo! hiss!
return 0;
} else {
g_2_lsm = 0;
g_2 = g_2_lsm;
}

The spurious write to g_2 could cause a data race.

Andrew has suggested verifying that the store to g_2 was actually 
different than on entry, and letting PHI copy propagation optimize the 
redundant comparisons.  Like this:


g_2_lsm = g_2;
if (g_1) {
if (g_2_lsm != g_2) // hopefully optimized away
g_2 = g_2_lsm;  // hopefully optimized away
return 0;
} else {
g_2_lsm = 0;
if (g_2_lsm != g_2) // hopefully optimized away
g_2 = g_2_lsm;
}

...which PHI copy propagation and/or friends should be able to optimize.

For that matter, regardless of the memory model, the proposed solution 
would improve the existing code by removing the extra store (in this 
contrived test case anyhow).


Anyone see a problem with this approach (see attached patch)?

Assuming the unlikely scenario that everyone likes this :), I have two 
problems that I would like answered.  But feel free to ignore if the 
approach is a no go.


1. No pass can figure out the equality (or inequality) of g_2_lsm and 
g_2.  In the PR, Richard mentions that both FRE/PRE can figure this out, 
but they are not run after store motion.  DOM should also be able to, 
but apparently does not :(.  Tips?


2. The GIMPLE_CONDs I create in this patch are currently causing 
problems with complex floats/doubles.  do_jump is complaining that it 
can't compare them, so I assume it is too late (in tree-ssa-loop-im.c) 
to compare complex values since complex lowering has already happened 
(??). Is there a more canonical way of creating a GIMPLE_COND that may 
contain complex floats at this stage?


Thanks folks.
* tree-ssa-loop-im.c (execute_sm): Guard store motion with a
conditional.
* opts.c (finish_options): Do not allow store or load data races
with -fgnu-tm.

Index: tree-ssa-loop-im.c
===
--- tree-ssa-loop-im.c  (revision 186108)
+++ tree-ssa-loop-im.c  (working copy)
@@ -1999,8 +1999,59 @@ execute_sm (struct loop *loop, VEC (edge
 
   FOR_EACH_VEC_ELT (edge, exits, i, ex)
     {
-      store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
-      gsi_insert_on_edge (ex, store);
+      basic_block new_bb, then_bb, old_dest;
+      edge then_old_edge;
+      gimple_stmt_iterator gsi;
+      gimple stmt;
+      tree t1;
+
+      if (PARAM_VALUE (PARAM_ALLOW_STORE_DATA_RACES))
+	{
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_on_edge (ex, store);
+	}
+      else
+	{
+	  old_dest = ex->dest;
+	  new_bb = split_edge (ex);
+	  then_bb = create_empty_bb (new_bb);
+	  if (current_loops && new_bb->loop_father)
+	    add_bb_to_loop (then_bb, new_bb->loop_father);
+
+	  gsi = gsi_start_bb (new_bb);
+	  t1 = make_rename_temp (TREE_TYPE (ref->mem), NULL);
+	  stmt = gimple_build_assign (t1, unshare_expr (ref->mem));
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+	  stmt = gimple_build_cond (NE_EXPR, t1, tmp_var,
+				    NULL_TREE, NULL_TREE);
+	  gsi_insert_after (&gsi, stmt, GSI_CONTINUE_LINKING);
+
+	  gsi = gsi_start_bb (then_bb);
+	  store = gimple_build_assign (unshare_expr (ref->mem), tmp_var);
+	  gsi_insert_after (&gsi, store, GSI_CONTINUE_LINKING);
+
+	  make_edge (new_bb, then_bb, EDGE_TRUE_VALUE);
+	  make_edge (new_bb, old_dest, EDGE_FALSE_VALUE);
+	  then_old_edge = make_edge (then_bb, old_dest, EDGE_FALLTHRU);
+	  set_immediate_dominator (CDI_DOMINATORS, then_bb, new_bb);
+
+	  for (gsi = gsi_start_phis (old_dest); !gsi_end_p (gsi); gsi_next (gsi))
+	    {
+	      gimple phi = gsi_stmt (gsi);
+	      unsigned i;
+
+	      for (i = 0; i < gimple_phi_num_args (phi); i++)
+		if (gimple_phi_arg_edge (phi, i)->src == new_bb)
+		  {
+		    tree arg = gimple_phi_arg_def (phi, i);
+   

Re: libstdc++-v3: Run prettyprinters.exp

2012-04-12 Thread Paolo Carlini

On 04/12/2012 11:43 PM, Andreas Schwab wrote:

Paolo Carlinipcarl...@gmail.com  writes:


Since you seem to pay good attention to these testing details, I wonder if
you could also help with the issue I noticed again over the last days with
make check-performance not actually doing much if make check isn't run
before. I bet a fix would turn out to be just a few lines..

Perhaps because
# XXX Need to add dependency on libtestc++.a
Indeed, I was looking for some sort of missing dependency... Any 
patchlet forthcoming for that? ;)


Thanks,
Paolo.


Re: Support for Runtime CPU type detection via builtins (issue5754058)

2012-04-12 Thread Sriraman Tallam
Ping.

On Tue, Apr 3, 2012 at 12:47 PM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

 i386 maintainers -  Is this patch ok?

 Thanks,
 -Sri.

 On Mon, Apr 2, 2012 at 5:48 PM, Sriraman Tallam tmsri...@google.com wrote:
 On Mon, Apr 2, 2012 at 5:38 AM, Richard Guenther
 richard.guent...@gmail.com wrote:
 On Sat, Mar 31, 2012 at 1:03 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 On Fri, Mar 30, 2012 at 5:47 AM, Michael Matz m...@suse.de wrote:
 Hi,

 On Thu, 29 Mar 2012, Sriraman Tallam wrote:

 +struct __processor_model
 +{
 +  /* Vendor. */
 +  unsigned int __cpu_is_amd : 1;
 +  unsigned int __cpu_is_intel : 1;
 +  /* CPU type. */
 +  unsigned int __cpu_is_intel_atom : 1;
 +  unsigned int __cpu_is_intel_core2 : 1;
 +  unsigned int __cpu_is_intel_corei7 : 1;
 +  unsigned int __cpu_is_intel_corei7_nehalem : 1;
 +  unsigned int __cpu_is_intel_corei7_westmere : 1;
 +  unsigned int __cpu_is_intel_corei7_sandybridge : 1;
 +  unsigned int __cpu_is_amdfam10h : 1;
 +  unsigned int __cpu_is_amdfam10h_barcelona : 1;
 +  unsigned int __cpu_is_amdfam10h_shanghai : 1;
 +  unsigned int __cpu_is_amdfam10h_istanbul : 1;
 +  unsigned int __cpu_is_amdfam15h_bdver1 : 1;
 +  unsigned int __cpu_is_amdfam15h_bdver2 : 1;
 +} __cpu_model;

 It doesn't make sense for the model to be a bitfield, a processor will
 have only ever exactly one model.  Just make it an enum or even just an
 int.

 Not entirely true, nehalem and corei7 can be both set. However, I
 modified this by dividing it into types and sub types and then did
 what you said.
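
 (An editorial sketch of that type/sub-type split; the field names below
 are illustrative, the authoritative layout is in the patch's
 libgcc/config/i386/i386-cpuinfo.c:)

 /* Illustrative only: one vendor, one type and one subtype are set at a
    time, rather than a bitfield per CPU model.  */
 struct __processor_model
 {
   unsigned int __cpu_vendor;   /* e.g. Intel or AMD.  */
   unsigned int __cpu_type;     /* e.g. corei7, amdfam10h.  */
   unsigned int __cpu_subtype;  /* e.g. nehalem, barcelona.  */
 };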

 Uh...  then I suppose you need to document somewhere what names
 match to what cpuid family/model (supposedly thats where your two-layer
 hierarchy comes from, which incidentially misses one layer, the vendor?)

 Added documentation to extend.texi

 Patch available for review here:
 http://codereview.appspot.com/5754058

 Thanks,
 -Sri.



 Richard.

       * config/i386/i386.c (build_processor_features_struct): New function.
       (build_processor_model_struct): New function.
       (make_var_decl): New function.
       (get_field_from_struct): New function.
       (fold_builtin_target): New function.
       (ix86_fold_builtin): New function.
       (ix86_expand_builtin): Expand new builtins by folding them.
       (make_cpu_type_builtin): New functions.
       (ix86_init_platform_type_builtins): Make the new builtins.
       (ix86_init_builtins): Make new builtins to detect CPU type.
       (TARGET_FOLD_BUILTIN): New macro.
       (IX86_BUILTIN_CPU_INIT): New enum value.
       (IX86_BUILTIN_CPU_IS): New enum value.
       (IX86_BUILTIN_CPU_SUPPORTS): New enum value.
       * config/i386/i386-builtin-types.def: New function type.
       * testsuite/gcc.target/builtin_target.c: New testcase.

       * libgcc/config/i386/i386-cpuinfo.c: New file.
       * libgcc/config/i386/t-cpuinfo: New file.
       * libgcc/config.host: Include t-cpuinfo.
       * libgcc/config/i386/libgcc-glibc.ver: Version symbols __cpu_model
       and __cpu_features.

 Patch available for review here:
 http://codereview.appspot.com/5754058

 Thanks,
 -Sri.




 Ciao,
 Michael.


[v3] hashtable aliases

2012-04-12 Thread Benjamin De Kosnik


This patch restructures the std::unordered_* containers, as per the
tempting comment in unordered_set:

  // NB: When we get typedef templates these class definitions
  // will be unnecessary.

This required some changes in _Hashtable, forced the consolidation of
insert() logic in a base class, and necessitated the restructuring of
type definitions in the _Hashtable. I think the current form allows
some nice advantages in terms of nested types and expressiveness. 

tested x86/linux
tested x86/linux -debug

-benjamin

20120412-334.patch.bz2
Description: application/bzip