Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Thread Uros Bizjak
On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
 On 2012-06-18 13:19, Uros Bizjak wrote:
        /* ??? The builtin doesn't understand that the PCMPESTRI read from
        memory need not be aligned.  */
 -      __asm ("%vpcmpestri $0, (%1), %2"
 -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
 +      sv = __builtin_ia32_loaddqu ((const char *) s);
 +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
 +


 Surely the comment can be removed too then?

I'm not sure about that. The builtin, as defined, expects a V16QI operand
with an "xm" constraint. Using:

int test (const char *s1)
{
  const v16qi *p = (const v16qi *)(unsigned long) s1;
  return __builtin_ia32_pcmpistri128 (*p, ...);
}

will generate movdqa before pcmpistri.

With the x86 pcmp[ie]str patch, we trick gcc into passing unaligned memory
to the pcmp[ie]str RTX, but we still need __builtin_ia32_loaddqu in front
of __builtin_ia32_pcmpestri128.
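
The alignment point can be illustrated in portable terms. The following is only a sketch, assuming GCC's generic vector extension rather than the ia32 builtins: the names v16qi_t, load_unaligned and first_match_index are made up for this example, and the scalar loop merely imitates what PCMPESTRI with immediate 0 computes (the index of the first byte equal to any search byte, or 16 if there is none). It is not the libcpp code.

```c
#include <assert.h>
#include <string.h>

/* 16-byte vector type, spelled with the GCC/Clang vector extension.  */
typedef char v16qi_t __attribute__ ((vector_size (16)));

/* Dereferencing a `const v16qi_t *` promises 16-byte alignment, so the
   compiler may emit an aligned load (movdqa on x86).  Copying through
   memcpy makes no such promise and is a portable way to request an
   unaligned load (typically movdqu on x86).  */
static v16qi_t
load_unaligned (const char *s)
{
  v16qi_t v;
  memcpy (&v, s, sizeof v);
  return v;
}

/* Scalar imitation of the search: return the index of the first of the
   16 bytes at S that equals any of the NSEARCH bytes in SEARCH, or 16
   if no byte matches.  S may have any alignment.  */
static int
first_match_index (const char *s, const char *search, int nsearch)
{
  assert (nsearch >= 0 && nsearch <= 16);
  v16qi_t sv = load_unaligned (s);
  for (int i = 0; i < 16; i++)
    for (int j = 0; j < nsearch; j++)
      if (sv[i] == search[j])
        return i;
  return 16;
}
```

Calling first_match_index (buf + 1, "\n\r\\?", 4) is fine even though buf + 1 is misaligned, which is the property the explicit unaligned load provides.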

Uros.


Re: [PATCH 2/3] Add XLP-specific atomic instructions and tweaks.

2012-06-19 Thread Maxim Kuvyrkov
On 16/06/2012, at 7:45 PM, Richard Sandiford wrote:

 Maxim Kuvyrkov ma...@codesourcery.com writes:
 Updated patch attached.  Any further comments?
 
 It's due to my bad explanation, sorry, but this isn't what I meant.
 The two main changes I was looking for were:
 
 1) Your pattern uses:
 
        [(mem:GPR (match_operand:P 1 "register_operand" "d"))]
 
   Instead, we should define a new memory predicate/constraint pair
   for memories that only accept register addresses.  I.e. there
   should be a new predicate to go alongside things like
   memory_operand and stack_operand, except that the new one would
   be even more restrictive in the set of addresses that it allows.
   mem_reg_operand seems as good a name as any, but I'm not wedded
   to a particular name.
 
   The new memory constraint would likewise go alongside m, W, etc.,
   except that (like the predicate) it too would only allow register
   addresses.  We're running low on constraint letters, so a two-letter
   one like ZR might be OK.  We can then use Z as a prefix for other
   MIPS-specific memory and address constraints in future.
 
   The atomic_exchange and atomic_fetch_add expanders should use
   the code I quoted in the earlier message to force the original
   memory_operand into this more restrictive form:
 
        if (!mem_reg_operand (operands[1], <MODE>mode))
          {
            addr = force_reg (Pmode, XEXP (operands[1], 0));
            operands[1] = replace_equiv_address (operands[1], addr);
          }
 
   The reason is that hard-coding (mem ...) in named define_insns
   (i.e. those with a gen_* function) is usually a mistake.  We end
   up discarding the original MEM and losing track of its MEM_ATTRs.
 
   (Note that this change means we don't need separate Pmode == SImode
   and Pmode == DImode patterns.)
 
 2) Your pattern has:
 
        (match_operand:GPR 2 "arith_operand" "0")
 
   to match:
 
        (match_operand:GPR 0 "register_operand" "=d")
 
   Operand 2 doesn't accept constants, so it should be a register_operand
   rather than an arith_operand.  Then the atomic_exchange and
   atomic_fetch_add expanders should use force_reg to turn _their_
   arith_operands into register_operands before calling
   gen_atomic_fetch_add<mode>_ldadd and gen_atomic_fetch<mode>_swap.
 
 Your new comment says:
 
   /* Spill the address to a register upfront to simplify reload's job.  */
 
 But this isn't about making reload's job easier.  Reload can cope just
 fine with the arith_operand above and would cope just fine with:
 
   (match_operand ... "memory_operand" "ZR")
 
 with ZR defined as above.  Instead, we're trying to describe the
 instruction as accurately as possible so that the pre-reload passes
 (including IRA) are in a position to make good optimisation decisions.
 They're less able to do that if patterns claim to accept more things
 than they actually do.
 
 I.e. it's the same reason that we don't just use general_operand
 for all reloadable rvalues and nonimmediate_operand for all
 reloadable lvalues.  Trying to use accurate predicates is such
 standard practice that I think it'd be better to drop the comment here.
 Having one gives the impression that we're trying to cope with some
 special case, which AFAICT we're not.

Richard,

Thank you for a thoughtful write-up.  I really appreciate the time you are 
taking to educate me.

I've incorporated your and Richard H.'s comments (stealing pieces from the
ARM port), and the updated patch is attached.

The only other change I made that was not in your comments is the addition
of the "b" mips_print_operand specifier.  The LDADD and SWAP instructions
accept their address as a plain register without parentheses, so I've added
the specifier to skip outputting the parentheses.

Any further comments?

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



0002-Add-XLP-specific-atomic-instructions-and-tweaks.patch
Description: Binary data


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Thread Uros Bizjak
On Tue, Jun 19, 2012 at 8:38 AM, Uros Bizjak ubiz...@gmail.com wrote:
 On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
 On 2012-06-18 13:19, Uros Bizjak wrote:
        /* ??? The builtin doesn't understand that the PCMPESTRI read from
        memory need not be aligned.  */
 -      __asm ("%vpcmpestri $0, (%1), %2"
 -          : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
 +      sv = __builtin_ia32_loaddqu ((const char *) s);
 +      index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
 +


 Surely the comment can be removed too then?

 I'm not sure about that. The builtin, as defined, expects a V16QI operand
 with an "xm" constraint. Using:

 int test (const char *s1)
 {
  const v16qi *p = (const v16qi *)(unsigned long) s1;
  return __builtin_ia32_pcmpistri128 (*p, ...);
 }

 will generate movdqa before pcmpistri.

Pedantic correction: __builtin_ia32_pcmpistri128 (v16qi_arg, *p, N);

movdqa in front of this builtin will be generated with -O0.

Uros.


Re: [v3] PR 53270 fix hppa-linux bootstrap regression

2012-06-19 Thread Jonathan Wakely
On 14 June 2012 23:23, Jonathan Wakely wrote:

 For 4.6.4 and 4.7.2 I plan to make a less intrusive change, #undef'ing
 the __GTHREAD_MUTEX_INIT, __GTHREAD_RECURSIVE_MUTEX_INIT and
 __GTHREAD_COND_INIT macros on hppa-linux in C++11 mode, so that the
 init functions are used instead.  This fixes the bootstrap regression
 on hppa-linux without affecting other targets.

Here's the simpler patch I'm committing to the 4.7 and 4.6 branches.

PR libstdc++/53270
* config/os/gnu-linux/os_defines.h: Disable static initializer macros
for gthreads types in C++11 mode.

Tested hppa-linux.
commit 82976f5a0e4a69d247bded9d8bae99a633360f20
Author: Jonathan Wakely jwakely@gmail.com
Date:   Tue Jun 19 01:07:54 2012 +0100

PR libstdc++/53270
* config/os/gnu-linux/os_defines.h: Disable static initializer macros
for gthreads types in C++11 mode.

diff --git a/libstdc++-v3/config/os/gnu-linux/os_defines.h b/libstdc++-v3/config/os/gnu-linux/os_defines.h
index c4aa305..f41160f 100644
--- a/libstdc++-v3/config/os/gnu-linux/os_defines.h
+++ b/libstdc++-v3/config/os/gnu-linux/os_defines.h
@@ -46,4 +46,10 @@
 # undef _GLIBCXX_HAVE_GETS
 #endif
 
+#if defined(__hppa__) && defined(__GXX_EXPERIMENTAL_CXX0X__)
+# define _GTHREAD_USE_MUTEX_INIT_FUNC
+# define _GTHREAD_USE_RECURSIVE_MUTEX_INIT_FUNC
+# define _GTHREAD_USE_COND_INIT_FUNC
+#endif
+
 #endif
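
As a rough analogy for the two initialization styles the patch chooses between, here is a sketch in plain POSIX threads. It is not the gthreads or libstdc++ code; counter_t and its helper functions are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Style 1: a static initializer, analogous to what the
   __GTHREAD_*_INIT macros provide.  */
static pthread_mutex_t static_lock = PTHREAD_MUTEX_INITIALIZER;
static int static_value;

static int
static_increment (void)
{
  pthread_mutex_lock (&static_lock);
  int v = ++static_value;
  pthread_mutex_unlock (&static_lock);
  return v;
}

/* Style 2: the mutex is set up by an init function at run time,
   analogous to what the _GTHREAD_USE_*_INIT_FUNC macros select.  */
typedef struct
{
  pthread_mutex_t lock;
  int value;
} counter_t;

static void
counter_init (counter_t *c)
{
  int rc = pthread_mutex_init (&c->lock, NULL);
  assert (rc == 0);
  (void) rc;
  c->value = 0;
}

static int
counter_increment (counter_t *c)
{
  pthread_mutex_lock (&c->lock);
  int v = ++c->value;
  pthread_mutex_unlock (&c->lock);
  return v;
}

static void
counter_destroy (counter_t *c)
{
  pthread_mutex_destroy (&c->lock);
}
```

Both styles behave the same once the mutex exists; they differ only in whether the object can be initialized by a constant expression.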


Re: [Patch] Adjustments for Windows x64 SEH

2012-06-19 Thread Tristan Gingold

On Jun 18, 2012, at 4:28 PM, Kai Tietz wrote:

 Hello Tristan,
 
 patch works for me, too. Just one nit about the patch.
 
 2012/6/18 Tristan Gingold ging...@adacore.com:
 @@ -8558,6 +8558,11 @@ ix86_frame_pointer_required (void)
   if (TARGET_32BIT_MS_ABI && cfun->calls_setjmp)
 return true;
 
 +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
 + allocation is 4GB (add a safety guard for saved registers).  */
 +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
 +return true;
 Where does this magic 4096 come from?  Is it intended to be the
 page-size, or is it meant to be the maximum stack-frame consumed by
 the prologue?

It is an upper bound for the maximum stack-frame consumed by prologue.

  I would suggest to use here instead:
 +  if (TARGET_64BIT_MS_ABI && get_frame_size () > (SEH_MAX_FRAME_SIZE - 4096))
 +return true;
 
 Additionally, a testcase for a big stack frame would be interesting.  You
 won't need to make it an execution test here; an assembler scan would be
 enough.

I think that a simple build test should make it.
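
For reference, the form of the check suggested above can be sketched in isolation as follows. The limit used here is an assumed placeholder taken from the "4GB" wording in the patch comment, not GCC's actual SEH_MAX_FRAME_SIZE, and needs_frame_pointer is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed values for illustration only; the real SEH_MAX_FRAME_SIZE in
   GCC may differ.  */
#define ASSUMED_SEH_MAX_FRAME_SIZE ((int64_t) 1 << 32)
#define PROLOGUE_GUARD 4096  /* upper bound for stack used by the prologue */

/* Return true if a frame of FRAME_SIZE bytes is too large to describe
   without a frame pointer.  Comparing against (limit - guard) keeps the
   left-hand side free of an addition that could overflow if the frame
   size were held in a narrower type.  */
static bool
needs_frame_pointer (int64_t frame_size)
{
  assert (frame_size >= 0);
  return frame_size > ASSUMED_SEH_MAX_FRAME_SIZE - PROLOGUE_GUARD;
}
```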

Thanks,
Tristan.



Re: RFA: Fix PR53688

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 4:59 PM, Michael Matz m...@suse.de wrote:
 Hi,

 now that we regard MEM_EXPR as a conservative approximation for MEM_SIZE
 (and MEM_OFFSET) we must ensure that this is really the case.  It isn't
 currently for the string expanders, as they use the MEM_REF (whose address
 was taken) directly as the one to use for MEM_EXPR on the MEM rtx.  That's
 wrong, on gimple side we take the address only and hence its size is
 arbitrary.

 So, we have to build a memref always and rewrite its type to one
 representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
 don't need to check for 'len' being null or not.

 This fixes the C testcase (don't know about fma 3d), and is in
 regstrapping on x86_64-linux.  Okay if that passes?

Ok.

Note that as a followup you should be able to remove the whole

  /* Allow the string and memory builtins to overflow from one
 field into another, see http://gcc.gnu.org/PR23561.
 Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole
 memory accessed by the string or memory builtin will fit
 within the field.  */
  if (MEM_EXPR (mem) && TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF)
{

block.  Also practically (as we are expanding from GIMPLE now), off
should always be zero and TREE_CODE (exp) should never be
POINTER_PLUS_EXPR, nor should there be wrapping conversions.
The 'off' case can also be dealt with by using the offset operand of
the MEM_REF we build.

Finally MEM_EXPR itself has invalid type-based aliasing properties
(it has so even before your patch), of course that doesn't really matter,
as below we do set_mem_alias_set (mem, 0).  Still with MEM_REF
you should be able to do

Index: gcc/builtins.c
===
--- gcc/builtins.c  (revision 188733)
+++ gcc/builtins.c  (working copy)
@@ -1250,132 +1250,27 @@ expand_builtin_prefetch (tree exp)
 static rtx
 get_memory_rtx (tree exp, tree len)
 {
-  tree orig_exp = exp;
   rtx addr, mem;
-  HOST_WIDE_INT off;

-  /* When EXP is not resolved SAVE_EXPR, MEM_ATTRS can be still derived
- from its expression, for expr->a.b only <variable>.a.b is recorded.  */
-  if (TREE_CODE (exp) == SAVE_EXPR && !SAVE_EXPR_RESOLVED_P (exp))
-exp = TREE_OPERAND (exp, 0);
-
-  addr = expand_expr (orig_exp, NULL_RTX, ptr_mode, EXPAND_NORMAL);
+  addr = expand_expr (exp, NULL_RTX, ptr_mode, EXPAND_NORMAL);
   mem = gen_rtx_MEM (BLKmode, memory_address (BLKmode, addr));

-  /* Get an expression we can use to find the attributes to assign to MEM.
- If it is an ADDR_EXPR, use the operand.  Otherwise, dereference it if
- we can.  First remove any nops.  */
-  while (CONVERT_EXPR_P (exp)
-        && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (exp, 0))))
-exp = TREE_OPERAND (exp, 0);
-
-  off = 0;
-  if (TREE_CODE (exp) == POINTER_PLUS_EXPR
-      && TREE_CODE (TREE_OPERAND (exp, 0)) == ADDR_EXPR
-      && host_integerp (TREE_OPERAND (exp, 1), 0)
-      && (off = tree_low_cst (TREE_OPERAND (exp, 1), 0)) > 0)
-exp = TREE_OPERAND (TREE_OPERAND (exp, 0), 0);
-  else if (TREE_CODE (exp) == ADDR_EXPR)
-exp = TREE_OPERAND (exp, 0);
-  else if (POINTER_TYPE_P (TREE_TYPE (exp)))
-exp = build1 (INDIRECT_REF, TREE_TYPE (TREE_TYPE (exp)), exp);
-  else
-exp = NULL;
+  /* Build a memory reference suitable for MEM_EXPR for use by the
+ alias oracle.  Make sure to give that memory reference a proper
+ access size as well as alias-set zero.  */
+  exp = fold_build2 (MEM_REF,
+build_array_type (char_type_node,
+  build_range_type (sizetype,
+size_one_node, len)),
+exp, build_int_cst (ptr_type_node, 0));

   /* Honor attributes derived from exp, except for the alias set
  (as builtin stringops may alias with anything) and the size
  (as stringops may access multiple array elements).  */
-  if (exp)
-{
-  set_mem_attributes (mem, exp, 0);
-
-  if (off)
-   mem = adjust_automodify_address_nv (mem, BLKmode, NULL, off);
-
-  /* Allow the string and memory builtins to overflow from one
-field into another, see http://gcc.gnu.org/PR23561.
-Thus avoid COMPONENT_REFs in MEM_EXPR unless we know the whole
-memory accessed by the string or memory builtin will fit
-within the field.  */
-  if (MEM_EXPR (mem) && TREE_CODE (MEM_EXPR (mem)) == COMPONENT_REF)
-   {
- tree mem_expr = MEM_EXPR (mem);
- HOST_WIDE_INT offset = -1, length = -1;
- tree inner = exp;
-
- while (TREE_CODE (inner) == ARRAY_REF
-|| CONVERT_EXPR_P (inner)
-|| TREE_CODE (inner) == VIEW_CONVERT_EXPR
-|| TREE_CODE (inner) == SAVE_EXPR)
-   inner = TREE_OPERAND (inner, 0);
-
- gcc_assert (TREE_CODE (inner) == COMPONENT_REF);
-
- if (MEM_OFFSET_KNOWN_P 

Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 7:51 PM, Steven Bosscher stevenb@gmail.com wrote:
 Hello,

 This patch started as an attempt to remove #include "output.h" from
 objc/: Instead of writing references directly to asm_out_file, the
 references are output as top-level asm statements. It's a bit of a
 hack, but it works and it's a better hack than writing to
 asm_out_file from a front end, and it also happens to fix PR
 objc/48109 to make ObjC on darwin/-m32 LTO-compatible.

 Bootstrapped&tested on darwin by Iain, and bootstrapped&tested by me
 on x86_64-unknown-linux-gnu.
 OK for trunk?

Ok for the general idea and implementation, I'd still ask for a darwin
maintainer ack though.

Thanks,
Richard.

 Ciao!
 Steven


Re: [4.6][ARM] Backport MCR Not available in Thumb1

2012-06-19 Thread Richard Earnshaw
On 19/06/12 04:03, Joey Ye wrote:
 Backporting trunk r179979
 
 OK for 4.6?
 
 Backported from mainline
 2011-10-14  David Alan Gilbert  david.gilb...@linaro.org
 
 PR target/48126
 * config/arm/arm.c (arm_output_sync_loop): Move label before
 barrier.
 
 Index: gcc/config/arm/arm.h
 ===
 --- gcc/config/arm/arm.h  (revision 188331)
 +++ gcc/config/arm/arm.h  (working copy)
 @@ -294,7 +294,8 @@
  #define TARGET_HAVE_DMB  (arm_arch7)
  
  /* Nonzero if this chip implements a memory barrier via CP15.  */
 -#define TARGET_HAVE_DMB_MCR  (arm_arch6k && ! TARGET_HAVE_DMB)
 +#define TARGET_HAVE_DMB_MCR  (arm_arch6 && ! TARGET_HAVE_DMB \
 +   && ! TARGET_THUMB1)
  
  /* Nonzero if this chip implements a memory barrier instruction.  */
  #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)
 
 

Not ok (yet), the ChangeLog entry doesn't match the patch.

R.



Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 9:51 PM, Jiří Hruška ji...@fud.cz wrote:
 Hi all,

 I have tracked down a bug which results in invalid code being
 generated for indexed TARGET_MEM_REF expressions during dominator
 optimization.

 The conditions are: accessing objects adjacent in memory in a loop (in
 order to generate the TARGET_MEM_REF gimple) and optimizing this tree
 item during dom optimization (to trigger folding). There might be
 another set of conditions which get to the same state through a
 different

 The problem is that get_ref_base_and_extent() for TARGET_MEM_REF with
 variable index sets `maxsize' to -1 to signal that via index or
 index2, the whole object can be reached and returns. But before that,
 if the target object is a declaration with known size and `maxsize' is
 -1, it is updated, which can be taken by the caller (if `maxsize'
 equals to basic `size') as possibility to fold the expression into a
 constant.

 Assuming I understood the code and comments right, the solution is
 then to really take a quick exit in the abovementioned indexed case
 instead of just breaking the loop and letting the rest of function
 change the `maxsize' parameter.

 A quick search did not reveal any existing ticket for this problem.
 The bug was originally found in GCC 4.6.1 while compiling x86 code
 under MinGW, which is what the attached simplified testcase is based
 upon (compilation with -O1 is OK, anything higher fails).
 GCC 3.4.6 seems unaffected.
 Also the relevant code parts seem unchanged in current trunk.
 Patched build of 4.7.1 survived bootstrap on x86_64-rhel fine.

 The attached patch and all changes provided therein are released to
 public domain and can be freely used or modified by anyone.

 (This is my first time dealing with GCC bowels, please excuse my
 superficial understanding of everything I have written above.)

The issue is that your testcase is invalid.

__attribute__((section(".rodata$int0"))) const int fooS = 0;
__attribute__((section(".rodata$int1"))) const int foo1 = 1;
__attribute__((section(".rodata$int2"))) const int foo2 = 2;
__attribute__((section(".rodata$int3"))) const int foo3 = 3;
__attribute__((section(".rodata$int4"))) const int fooE = 0;
...
int x = ret(*(&fooS + i));

this access is only ever valid for i == 0 as otherwise you are creating
a pointer that points outside of the object fooS.
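
One conforming way to get this kind of iteration is to keep the values in a single array object, so that indexing never leaves the object it started from. The sketch below shows only that idea; table and sum_table are hypothetical names, and it does not reproduce the section-placement trick of the original testcase.

```c
#include <assert.h>
#include <stddef.h>

/* One array object instead of five separately declared ints: every
   index in [0, n) stays within the same object, so the compiler has no
   grounds to assume that only index 0 is ever accessed.  */
static const int table[] = { 0, 1, 2, 3, 0 };

/* Sum the elements table[first] .. table[last - 1].  */
static int
sum_table (size_t first, size_t last)
{
  const size_t n = sizeof table / sizeof table[0];
  assert (first <= last && last <= n);
  int sum = 0;
  for (size_t i = first; i < last; i++)
    sum += table[i];
  return sum;
}
```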

Richard.


 Thanks,
 Jiri Hruska


Re: [PATCH, testsuite]: Fix scan-tree-dump-times argument order in gcc.dg/tree-ssa/vrp68.c.

2012-06-19 Thread Richard Guenther
On Mon, Jun 18, 2012 at 10:01 PM, Janis Johnson
janis_john...@mentor.com wrote:
 On 06/17/2012 05:03 AM, Richard Guenther wrote:
 On Sun, Jun 17, 2012 at 10:41 AM, Uros Bizjak ubiz...@gmail.com wrote:
 Hello!

 The testcase still fails on x86_64-pc-linux-gnu with:

 FAIL: gcc.dg/tree-ssa/vrp68.c scan-tree-dump-times vrp1 link_error 1

 since there are two calls to link_error.

 Oops.  I wonder how I did not see those failures myself ...

 Richard.

 I'm confused about what this test is supposed to do.  It uses
 dg-do link which means the compile (test for excess errors) will
 fail if there is a reference to link_error.  There are two uses of
 scan-tree-dump-times for the same string in the same file, so one
 of those is guaranteed to fail.  It looks like the scans aren't
 needed, and dg-do link is the thing that needs the xfail.

No, the scan-tree-dump-times are supposed to catch that already
VRP1 has done the optimization - it does not so fully, which is
why I added the XFAILed scan-tree-dump-times.  But we still catch
that XFAILed case with subsequent optimizations so the link succeeds
nevertheless.

The testcase fails now, I must have broken the optimization somehow
and I am looking into it.

Richard.

 Janis


[patch] Fix failing nested-3.C on ARM.

2012-06-19 Thread Richard Earnshaw
The regexp in nested-3.C has to parse the machine-specific comment
character; on ARM that is '@'.

Tested on arm-eabi, where this test now passes.

OK?

R.

* g++.dg/debug/dwarf2/nested-3.C: Add ARM comment character to regexp.
--- g++.dg/debug/dwarf2/nested-3.C  (revision 188750)
+++ g++.dg/debug/dwarf2/nested-3.C  (local)
@@ -59,4 +59,4 @@ main ()
 //
 // Hence the scary regexp:
 //
-// { dg-final { scan-assembler \[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE
 \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|\]*\[#/!|\]
 
\[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end
 of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 
0x\\1\[\n\r]+ } }
+// { dg-final { scan-assembler \[^\n\r\]*\\(DIE \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_namespace\\)\[\n\r\]+\[^\n\r\]*\thread\[\^\n\r]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\\(DIE
 \\(0x(\[0-9a-f\]+)\\) 
DW_TAG_class_type\\)(\[\n\r\]+\[^\n\r\]*)+\Executor\[^\n\r\]+\[\n\r\]+\[^\n\r\]*DW_AT_declaration\[\n\r\]+\[^\n\r\]*DW_AT_signature\[^#/!|@\]*\[#/!|@\]
 
\[^\n\r\]*\\(DIE\[^\n\r\]*DW_TAG_subprogram\\)\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*\CurrentExecutor\[^\n\r\]+\[\n\r\]+(\[^\n\r\]*\[\n\r\]+)+(\[^\n\r\]*\[\n\r\]+)+\[^\n\r\]*end
 of children of DIE 0x\\3\[\n\r]+\[^\n\r\]*end of children of DIE 
0x\\1\[\n\r]+ } }

[PATCH] Fix PR53708

2012-06-19 Thread Richard Guenther

We are too eager to bump alignment of some decls when vectorizing.
The fix is to not bump alignment of decls the user explicitly
aligned or that are used in an unknown way.

Bootstrapped and tested on i686-darwin9 and x86_64-apple-darwin10
and powerpc-apple-darwin9 by darwin folks, applied.

Richard.

2012-06-19  Richard Guenther  rguent...@suse.de

PR tree-optimization/53708
* tree-vect-data-refs.c (vect_can_force_dr_alignment_p): Preserve
user-supplied alignment and alignment of decls with the used
attribute.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 188733)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -4731,6 +4720,12 @@ vect_can_force_dr_alignment_p (const_tre
   if (TREE_ASM_WRITTEN (decl))
 return false;
 
+  /* Do not override explicit alignment set by the user or the alignment
+ as specified by the ABI when the used attribute is set.  */
+  if (DECL_USER_ALIGN (decl)
+  || DECL_PRESERVE_P (decl))
+return false;
+
   if (TREE_STATIC (decl))
     return (alignment <= MAX_OFILE_ALIGNMENT);
   else
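
The decision order of the patched predicate can be modelled outside the compiler as follows. This is a toy sketch, not GCC code: struct toy_decl stands in for the decl flags the function consults, the alignment limits are placeholder numbers, and the final non-static branch is an assumption because the diff above is cut off at the "else".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for the handful of decl properties the predicate reads.  */
struct toy_decl
{
  bool asm_written;  /* models TREE_ASM_WRITTEN */
  bool user_align;   /* models DECL_USER_ALIGN */
  bool preserve;     /* models DECL_PRESERVE_P (the "used" attribute) */
  bool is_static;    /* models TREE_STATIC */
};

/* Placeholder limits in bits; the real values are target-specific.  */
#define TOY_MAX_OFILE_ALIGNMENT 256u
#define TOY_MAX_STACK_ALIGNMENT 128u

/* Return true if the vectorizer may raise DECL's alignment to ALIGNMENT.  */
static bool
toy_can_force_alignment (const struct toy_decl *decl, unsigned alignment)
{
  assert (decl != NULL);
  if (decl->asm_written)
    return false;
  /* The new check: leave user-specified and "used" decls alone.  */
  if (decl->user_align || decl->preserve)
    return false;
  if (decl->is_static)
    return alignment <= TOY_MAX_OFILE_ALIGNMENT;
  /* Assumed: automatic variables are bounded by a stack limit.  */
  return alignment <= TOY_MAX_STACK_ALIGNMENT;
}
```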


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

 On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
  On Fri, 8 Jun 2012, William J. Schmidt wrote:
  
 snip
  
  Hmm.  I don't like this patch or its general idea too much.  Instead
  I'd like us to move more of the cost model detail to the target, giving
  it a chance to look at the whole loop before deciding on a cost.  ISTR
  posting the overall idea at some point, but let me repeat it here instead
  of trying to find that e-mail.
  
  The basic interface of the cost model should be, in targetm.vectorize
  
/* Tell the target to start cost analysis of a loop or a basic-block
   (if the loop argument is NULL).  Returns an opaque pointer to
   target-private data.  */
void *init_cost (struct loop *loop);
  
/* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
void add_stmt_cost (void *data, unsigned n,
vectorized-stmt-kind,
enum machine_mode vector_mode);
  
/* Tell the target to compute and return the cost of the accumulated
   statements and free any target-private data.  */
unsigned finish_cost (void *data);
  
  with eventually slightly different signatures for add_stmt_cost
  (like pass in the original scalar stmt?).
  
  It allows the target, at finish_cost time, to evaluate things like
  register pressure and resource utilization.
  
  Thanks,
  Richard.
 
 I've been looking at this in between other projects.  I wanted to be
 sure I understood the SLP infrastructure and whether it would cause any
 problems.  It looks to me like it will be mostly ok.  One issue I
 noticed is a possible difference in the order in which SLP instructions
 are analyzed and the order in which the instructions are issued during
 transformation.
 
 For both loop analysis and basic block analysis, SLP trees are
 constructed and analyzed prior to examining other vectorizable
 instructions.  Their costs are calculated and stored in the SLP trees at
 this time.  Later, when transforming statements to their vector
 equivalents, instructions in the block (or loop body) are processed in
 order until the first instruction that's part of an SLP tree is
 encountered.  At that point, every instruction that's part of any SLP
 tree is transformed; then the vectorizer continues with the remaining
 non-SLP vectorizable statements.
 
 So if we do the natural and easy thing of placing calls to add_stmt_cost
 everywhere that costs are calculated today, the order that those costs
 are presented to the back end model will possibly be different than the
 order they are actually emitted.

Interesting.  But I suppose this is similar to how pattern statements
are handled?  Thus, the whole pattern sequence is processed when
we encounter the main pattern statement?

 For a first cut at this, I suggest ignoring the problem other than to
 document it as an opportunity for improvement.  Later we could improve
 it by using an add_stmt_slp_cost () interface (or adding an is_slp
 flag), and another interface to be called at the time during analysis
 when the SLP statements will be issued during transformation.  This
 would allow the back end model to queue up the SLP costs in a separate
 vector and later place them in its internal structures at the
 appropriate place.

 It should eventually be possible to remove these fields/accessors:
 
  * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
  * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
  * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
 
 However, I think this should be delayed until we have the basic
 infrastructure in place for the new model and well-tested.

Indeed.
 
 The other issue is that we should have the model track both the inside
 and outside costs if we're going to get everything into the target
 model.  For a first pass we can ignore this and keep the existing logic
 for the outside costs.  Later we should add some interfaces analogous to
 add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
 the model can track this stuff as carefully as it wants to.

Outside costs are merely added to the niter * inner-cost metric to
be compared with the scalar cost niter * scalar-cost, right?  Thus
they would be tracked completely separate - eventually similar to
how we compute the cost of the scalar loop.
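
The comparison described here can be written down as a small function. This is a simplified sketch under stated assumptions: the parameter names are invented, the vector and scalar iteration counts are passed separately, and no other term of the vectorizer's real profitability formula is modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Return true if the vectorized loop is estimated to be cheaper:
     vec_niter * inner_cost + outside_cost < scalar_niter * scalar_cost
   The outside cost (prologue/epilogue work) is added once, while the
   inside cost is paid on every vector iteration.  */
static bool
vectorization_profitable (unsigned long vec_niter, unsigned inner_cost,
                          unsigned outside_cost,
                          unsigned long scalar_niter, unsigned scalar_cost)
{
  /* A vector loop never runs more iterations than the scalar one.  */
  assert (vec_niter <= scalar_niter);
  unsigned long vec_total = vec_niter * inner_cost + outside_cost;
  unsigned long scalar_total = scalar_niter * scalar_cost;
  return vec_total < scalar_total;
}
```

With 100 scalar iterations of cost 2 against 25 vector iterations of cost 4 plus an outside cost of 20, the totals are 200 and 120, so vectorizing wins; with only 4 scalar iterations the fixed outside cost dominates and it loses.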

 So, I'd propose going at this in several phases:
 
 (1) Add calls to the new interface without disturbing existing logic;
 modify the profitability algorithms to query the new model for inside
 costs.  Default algorithm for the model is to just sum costs as is done
 today.

Right.

 (x) Add heuristics to target models as desired.
 (2) Handle the SLP ordering problem.
 (3) Handle outside costs in the target model.
 (4) Remove the now unnecessary cost fields and the calls that set them.
 
 Item (x) can happen anytime after item (1).
 
 I don't think this work is terribly difficult, just a bit tedious.  The
 only really time-consuming aspect of it will be in 

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Mon, 18 Jun 2012, William J. Schmidt wrote:

 On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
  On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
   On Fri, 8 Jun 2012, William J. Schmidt wrote:
   
  snip
   
   Hmm.  I don't like this patch or its general idea too much.  Instead
   I'd like us to move more of the cost model detail to the target, giving
   it a chance to look at the whole loop before deciding on a cost.  ISTR
   posting the overall idea at some point, but let me repeat it here instead
   of trying to find that e-mail.
   
   The basic interface of the cost model should be, in targetm.vectorize
   
 /* Tell the target to start cost analysis of a loop or a basic-block
(if the loop argument is NULL).  Returns an opaque pointer to
target-private data.  */
 void *init_cost (struct loop *loop);
   
 /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
 void add_stmt_cost (void *data, unsigned n,
   vectorized-stmt-kind,
 enum machine_mode vector_mode);
   
 /* Tell the target to compute and return the cost of the accumulated
statements and free any target-private data.  */
 unsigned finish_cost (void *data);
 
 By the way, I don't see much point in passing the void *data around
 here.  Too many levels of interfaces that we'd have to pass it around in
 the vectorizer, so it would just sit in a static variable.  Might as
 well let the data be wholly private to the target.

Ok, so you'd have void init_cost (struct loop *) and
unsigned finish_cost (void); then?  Static variables are of course
not properly abstracted so we can't ever compute two sets of costs
at the same time ... but that's true all-over-the-place in GCC ...

With previous discussion the add_stmt_cost hook would be split up
to also allow passing the operation code for example.
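
To make the trade-off concrete, here is a toy version of the opaque-pointer variant of the interface. It is a sketch only: the toy_ names, the statement kinds and the unit costs are invented, the machine mode argument is omitted, and the default behaviour of simply summing costs follows the proposal earlier in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical statement kinds standing in for vectorized-stmt-kind.  */
enum toy_stmt_kind { TOY_VECTOR_STMT, TOY_VECTOR_LOAD, TOY_VECTOR_STORE };

/* Target-private accumulator hidden behind the void pointer.  */
struct toy_cost_data
{
  unsigned total;
};

/* Start cost analysis; LOOP is unused in this toy model.  */
static void *
toy_init_cost (const void *loop)
{
  (void) loop;
  struct toy_cost_data *d = malloc (sizeof *d);
  assert (d != NULL);
  d->total = 0;
  return d;
}

/* Record N statements of kind KIND.  */
static void
toy_add_stmt_cost (void *data, unsigned n, enum toy_stmt_kind kind)
{
  static const unsigned per_stmt[] = { 1, 2, 2 };  /* made-up unit costs */
  struct toy_cost_data *d = data;
  d->total += n * per_stmt[kind];
}

/* Return the accumulated cost and release the private data.  */
static unsigned
toy_finish_cost (void *data)
{
  struct toy_cost_data *d = data;
  unsigned total = d->total;
  free (d);
  return total;
}
```

Because each toy_init_cost call returns its own accumulator, two analyses can be in flight at once, which is what the static-variable variant gives up.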

Richard.


Re: RFA: Fix PR53688

2012-06-19 Thread Michael Matz
Hi,

On Tue, 19 Jun 2012, Richard Guenther wrote:

  So, we have to build a memref always and rewrite its type to one
  representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
  don't need to check for 'len' being null or not.
 
  This fixes the C testcase (don't know about fma 3d), and is in
  regstrapping on x86_64-linux.  Okay if that passes?
 
 Ok.

Thanks, but I now know why we built an INDIRECT_REF :)  
build_simple_mem_ref() only handles some very constrained arguments, 
namely pointers and offseted ADDR_EXPRs when the offset is a constant.  
It doesn't for instance handle bla->a[i] (it asserts).  So the patch 
trips over the assert in build_simple_mem_ref on __builtin_memset 
(&p->c[i], 0, 42);.

I could build INDIRECT_REFs always instead of MEM_REFs, this fixes the bug 
too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole 
brouhaha would be dead code (well, except for alignment setting).

Or I could build MEM_REFs directly, not via build_simple_mem_ref, that 
also works, but leaves us with such MEM_EXPRs sometimes:

  (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8])

Note the complicated and non-canonical expression in the MEM[].  I'm not 
sure if the disambiguators do anything interesting with such expressions.  
If they don't, we'd save memory by not generating this MEM_EXPR at all.

If the latter is acceptable, then I indeed can as well wrap everything in 
a MEM_REF like you proposed (possibly with a predicate simple enough 
that reflects what build_simple_mem_ref is also checking) and be done with 
it.

So, what should it be?


Ciao,
Michael.

Re: [PATCH] Fix PR53708

2012-06-19 Thread Richard Sandiford
Richard Guenther rguent...@suse.de writes:
 We are too eager to bump alignment of some decls when vectorizing.
 The fix is to not bump alignment of decls the user explicitly
 aligned or that are used in an unknown way.

I thought attribute((__aligned__)) only set a minimum alignment
for variables?  Most uses I've seen have been trying to get
better performance from higher alignment, so it might not go
down well if the attribute stopped the vectoriser from increasing
the alignment still further.

Richard


Re: [PATCH] Fix PR53708

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, Richard Sandiford wrote:

 Richard Guenther rguent...@suse.de writes:
  We are too eager to bump alignment of some decls when vectorizing.
  The fix is to not bump alignment of decls the user explicitly
  aligned or that are used in an unknown way.
 
 I thought attribute((__aligned__)) only set a minimum alignment
 for variables?  Most uses I've seen have been trying to get
 better performance from higher alignment, so it might not go
 down well if the attribute stopped the vectoriser from increasing
 the alignment still further.

That's what the documentation says indeed.  I'm not sure which
part of the patch fixes the ObjC failures where the alignment
is part of the ABI (and I suppose ObjC then mis-uses the aligned
attribute?).

Richard.


Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)

2012-06-19 Thread Jakub Jelinek
On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote:
 On 2012-06-14 13:58, Jakub Jelinek wrote:
  +  if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
  +  vecwtype, vectype,
  +  dummy, dummy, dummy_code,
  +  dummy_code, dummy_int, dummy_vec))
  +return NULL;
 
 
 It would be nice to be able to handle high-part multiplies as well, e.g. 
 VEC_WIDEN_MULT_HI_EXPR.  Which is what Altivec provides, and not 
 VEC_WIDEN_MULT.

Sure, but we don't have a tree code for that right now, do we?
VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results,
not all the high halves of the widened multiply.
For 16-bit multiplication we could also use {,V}PMULH{,U}W
(for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards).
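
To make the distinction concrete, here is a scalar sketch of a "high half of every product" operation, which is what PMULHUW computes per unsigned 16-bit lane. The helper names and the divide-by-3 constant 0xAAAB are illustrative assumptions and are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* High 16 bits of the 32-bit product of two unsigned 16-bit values.  */
static uint16_t
mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}

/* Unsigned 16-bit division by 3 using a high-part multiply and a
   shift.  0xAAAB is (2^17 + 1) / 3, so the full operation is
   (x * 0xAAAB) >> 17.  */
static uint16_t
div3_u16 (uint16_t x)
{
  uint16_t q = (uint16_t) (mulhi_u16 (x, 0xAAAB) >> 1);
  assert (q == x / 3);
  return q;
}
```

A tree code for this operation would yield one such high half for every element, whereas VEC_WIDEN_MULT_HI_EXPR yields full-width products for only half of the elements.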

Jakub


Re: RFA: Fix PR53688

2012-06-19 Thread Richard Guenther
On Tue, Jun 19, 2012 at 12:13 PM, Michael Matz m...@suse.de wrote:
 Hi,

 On Tue, 19 Jun 2012, Richard Guenther wrote:

  So, we have to build a memref always and rewrite its type to one
  representing the real size.  Note that TYPE_MAX_VALUE may be NULL, so we
  don't need to check for 'len' being null or not.
 
  This fixes the C testcase (don't know about fma 3d), and is in
  regstrapping on x86_64-linux.  Okay if that passes?

 Ok.

 Thanks, but I now know why we built an INDIRECT_REF :)
 build_simple_mem_ref() only handles some very constrained arguments,
 namely pointers and offsetted ADDR_EXPRs when the offset is a constant.
 It doesn't for instance handle &bla->a[i] (it asserts).  So the patch
 trips over the assert in build_simple_mem_ref on __builtin_memset
 (&p->c[i], 0, 42);.

 I could build INDIRECT_REFs always instead of MEM_REFs, this fixes the bug
 too, but it wouldn't generate any MEM_EXPRs anymore, and hence the whole
 brouhaha would be dead code (well, except for alignment setting).

 Or I could build MEM_REFs directly, not via build_simple_mem_ref, that
 also works, but leaves us with such MEM_EXPRs sometimes:

  (mem/c:BLK (reg:DI 65) [0 MEM[(void *)&p_1(D)->c[i_2(D)]]+0 A8])

 Note the complicated and non-canonical expression in the MEM[].  I'm not
 sure if the disambiguators do anything interesting with such expressions.
 If they don't, we'd save memory by not generating this MEM_EXPR at all.

 If the latter is acceptable, then I indeed can as well wrap everything in
 a MEM_REF like you proposed (possibly with a predicate simple enough
 that reflects what build_simple_mem_ref is also checking) and be done with
 it.

 So, what should it be?

The MEM_REF is acceptable to the tree oracle and it can extract
points-to information from it.

Thus for simplicity unconditionally building the above is the best.

We can always massage both fold to handle more complex cases
(like the POINTER_PLUS_EXPR case) and set_mem_attributes to
canonicalize / strip the above from useless parts.

Thanks,
Richard.


 Ciao,
 Michael.


RE: [4.6][ARM] Backport MCR Not available in Thumb1

2012-06-19 Thread Joey Ye
Oops! Sorry for such a stupid problem.

2012-06-18  Joey Ye  joey...@arm.com

Backported from mainline
2011-10-14  David Alan Gilbert  david.gilb...@linaro.org

* config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in
Thumb1.

Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h(revision 188331)
+++ gcc/config/arm/arm.h(working copy)
@@ -294,7 +294,8 @@
 #define TARGET_HAVE_DMB (arm_arch7)
 
 /* Nonzero if this chip implements a memory barrier via CP15.  */
-#define TARGET_HAVE_DMB_MCR (arm_arch6k && ! TARGET_HAVE_DMB)
+#define TARGET_HAVE_DMB_MCR (arm_arch6 && ! TARGET_HAVE_DMB \
+                             && ! TARGET_THUMB1)
 
 /* Nonzero if this chip implements a memory barrier instruction.  */
 #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB || TARGET_HAVE_DMB_MCR)


 -Original Message-
 From: Richard Earnshaw
 Sent: Tuesday, June 19, 2012 16:43
 To: Joey Ye
 Cc: GCC Patches
 Subject: Re: [4.6][ARM] Backport MCR Not available in Thumb1
 
 On 19/06/12 04:03, Joey Ye wrote:
  Backporting trunk r179979
 
  OK for 4.6?
 
  Backported from mainline
  2011-10-14  David Alan Gilbert  david.gilb...@linaro.org
 
  PR target/48126
  * config/arm/arm.c (arm_output_sync_loop): Move label before
  barrier.
 
  Index: gcc/config/arm/arm.h
  ===
  --- gcc/config/arm/arm.h(revision 188331)
  +++ gcc/config/arm/arm.h(working copy)
  @@ -294,7 +294,8 @@
   #define TARGET_HAVE_DMB(arm_arch7)
 
   /* Nonzero if this chip implements a memory barrier via CP15.  */
  -#define TARGET_HAVE_DMB_MCR (arm_arch6k && ! TARGET_HAVE_DMB)
  +#define TARGET_HAVE_DMB_MCR (arm_arch6 && ! TARGET_HAVE_DMB \
  +                             && ! TARGET_THUMB1)
 
   /* Nonzero if this chip implements a memory barrier instruction.  */
   #define TARGET_HAVE_MEMORY_BARRIER (TARGET_HAVE_DMB ||
 TARGET_HAVE_DMB_MCR)
 
 
 
 Not ok (yet), the ChangeLog entry doesn't match the patch.
 
 R.





Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
 On Mon, 18 Jun 2012, William J. Schmidt wrote:
 
  On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
   On Fri, 8 Jun 2012, William J. Schmidt wrote:
   
  snip
   
   Hmm.  I don't like this patch or its general idea too much.  Instead
   I'd like us to move more of the cost model detail to the target, giving
   it a chance to look at the whole loop before deciding on a cost.  ISTR
   posting the overall idea at some point, but let me repeat it here instead
   of trying to find that e-mail.
   
   The basic interface of the cost model should be, in targetm.vectorize
   
     /* Tell the target to start cost analysis of a loop or a basic-block
        (if the loop argument is NULL).  Returns an opaque pointer to
        target-private data.  */
     void *init_cost (struct loop *loop);
   
     /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
     void add_stmt_cost (void *data, unsigned n,
                         vectorized-stmt-kind,
                         enum machine_mode vector_mode);
   
     /* Tell the target to compute and return the cost of the accumulated
        statements and free any target-private data.  */
     unsigned finish_cost (void *data);
   
   with eventually slightly different signatures for add_stmt_cost
   (like pass in the original scalar stmt?).
   
   It allows the target, at finish_cost time, to evaluate things like
   register pressure and resource utilization.
   
   Thanks,
   Richard.
  
  I've been looking at this in between other projects.  I wanted to be
  sure I understood the SLP infrastructure and whether it would cause any
  problems.  It looks to me like it will be mostly ok.  One issue I
  noticed is a possible difference in the order in which SLP instructions
  are analyzed and the order in which the instructions are issued during
  transformation.
  
  For both loop analysis and basic block analysis, SLP trees are
  constructed and analyzed prior to examining other vectorizable
  instructions.  Their costs are calculated and stored in the SLP trees at
  this time.  Later, when transforming statements to their vector
  equivalents, instructions in the block (or loop body) are processed in
  order until the first instruction that's part of an SLP tree is
  encountered.  At that point, every instruction that's part of any SLP
  tree is transformed; then the vectorizer continues with the remaining
  non-SLP vectorizable statements.
  
  So if we do the natural and easy thing of placing calls to add_stmt_cost
  everywhere that costs are calculated today, the order that those costs
  are presented to the back end model will possibly be different than the
  order they are actually emitted.
 
 Interesting.  But I suppose this is similar to how pattern statements
 are handled?  Thus, the whole pattern sequence is processed when
 we encounter the main pattern statement?

Yes, but the difference is that both vect_analyze_stmt and
vect_transform_loop handle the pattern statements in the same order
(thankfully -- I would hate to have to deal with the pattern mess).
With SLP, all SLP statements are analyzed ahead of time, but they aren't
transformed until one of them is encountered in the statement walk.

 
  For a first cut at this, I suggest ignoring the problem other than to
  document it as an opportunity for improvement.  Later we could improve
  it by using an add_stmt_slp_cost () interface (or adding an is_slp
  flag), and another interface to be called at the time during analysis
  when the SLP statements will be issued during transformation.  This
  would allow the back end model to queue up the SLP costs in a separate
  vector and later place them in its internal structures at the
  appropriate place.
 
  It should eventually be possible to remove these fields/accessors:
  
   * STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
   * SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
   * SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
  
  However, I think this should be delayed until we have the basic
  infrastructure in place for the new model and well-tested.
 
 Indeed.
 
  The other issue is that we should have the model track both the inside
  and outside costs if we're going to get everything into the target
  model.  For a first pass we can ignore this and keep the existing logic
  for the outside costs.  Later we should add some interfaces analogous to
  add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
  the model can track this stuff as carefully as it wants to.
 
 Outside costs are merely added to the niter * inner-cost metric to
 be compared with the scalar cost niter * scalar-cost, right?  Thus
 they would be tracked completely separate - eventually similar to
 how we compute the cost of the scalar loop.
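
The comparison described in the quoted paragraph can be written out as a small sketch. It takes the formula exactly as stated there (outside cost plus niter times inner cost, against niter times scalar cost) and ignores everything else a real cost model would consider, such as the vectorization factor. The function names are hypothetical.

```c
#include <assert.h>

/* Nonzero if the vectorized version is estimated to be cheaper.
   All costs are in the same abstract units.  */
static int
vector_profitable_p (unsigned long niter, unsigned long inner_cost,
                     unsigned long outside_cost, unsigned long scalar_cost)
{
  return outside_cost + niter * inner_cost < niter * scalar_cost;
}

/* Smallest iteration count for which vector_profitable_p holds, or 0
   if the inner cost is not below the scalar cost and the vector
   version can never win.  */
static unsigned long
min_profitable_niter (unsigned long inner_cost, unsigned long outside_cost,
                      unsigned long scalar_cost)
{
  if (inner_cost >= scalar_cost)
    return 0;
  return outside_cost / (scalar_cost - inner_cost) + 1;
}
```

With an outside cost of 10, an inner cost of 2 and a scalar cost of 5, the break-even point in this sketch is 4 iterations.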

Yes, that's the way they're used today, and probably nobody will ever
want to get fancier than that.  But as you say, the idea would be to let
them be tracked similarly, but in 

Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

 On Tue, 2012-06-19 at 12:08 +0200, Richard Guenther wrote:
  On Mon, 18 Jun 2012, William J. Schmidt wrote:
  
   On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
On Fri, 8 Jun 2012, William J. Schmidt wrote:

   snip

Hmm.  I don't like this patch or its general idea too much.  Instead
I'd like us to move more of the cost model detail to the target, giving
it a chance to look at the whole loop before deciding on a cost.  ISTR
posting the overall idea at some point, but let me repeat it here 
instead
of trying to find that e-mail.

The basic interface of the cost model should be, in targetm.vectorize

  /* Tell the target to start cost analysis of a loop or a basic-block
 (if the loop argument is NULL).  Returns an opaque pointer to
 target-private data.  */
  void *init_cost (struct loop *loop);

  /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
  void add_stmt_cost (void *data, unsigned n,
  vectorized-stmt-kind,
  enum machine_mode vector_mode);

  /* Tell the target to compute and return the cost of the accumulated
 statements and free any target-private data.  */
  unsigned finish_cost (void *data);

with eventually slightly different signatures for add_stmt_cost
(like pass in the original scalar stmt?).

It allows the target, at finish_cost time, to evaluate things like
register pressure and resource utilization.

Thanks,
Richard.
   
   I've been looking at this in between other projects.  I wanted to be
   sure I understood the SLP infrastructure and whether it would cause any
   problems.  It looks to me like it will be mostly ok.  One issue I
   noticed is a possible difference in the order in which SLP instructions
   are analyzed and the order in which the instructions are issued during
   transformation.
   
   For both loop analysis and basic block analysis, SLP trees are
   constructed and analyzed prior to examining other vectorizable
   instructions.  Their costs are calculated and stored in the SLP trees at
   this time.  Later, when transforming statements to their vector
   equivalents, instructions in the block (or loop body) are processed in
   order until the first instruction that's part of an SLP tree is
   encountered.  At that point, every instruction that's part of any SLP
   tree is transformed; then the vectorizer continues with the remaining
   non-SLP vectorizable statements.
   
   So if we do the natural and easy thing of placing calls to add_stmt_cost
   everywhere that costs are calculated today, the order that those costs
   are presented to the back end model will possibly be different than the
   order they are actually emitted.
  
  Interesting.  But I suppose this is similar to how pattern statements
  are handled?  Thus, the whole pattern sequence is processed when
  we encounter the main pattern statement?
 
 Yes, but the difference is that both vect_analyze_stmt and
 vect_transform_loop handle the pattern statements in the same order
 (thankfully -- I would hate to have to deal with the pattern mess).
 With SLP, all SLP statements are analyzed ahead of time, but they aren't
 transformed until one of them is encountered in the statement walk.

Ah, ok.  I suppose we can simply declare that when we register
vectorized stmts with the backend they are in arbitrary order.
After all this is not supposed to be another machine dependent reorg
phase (to quote David).

  
   For a first cut at this, I suggest ignoring the problem other than to
   document it as an opportunity for improvement.  Later we could improve
   it by using an add_stmt_slp_cost () interface (or adding an is_slp
   flag), and another interface to be called at the time during analysis
   when the SLP statements will be issued during transformation.  This
   would allow the back end model to queue up the SLP costs in a separate
   vector and later place them in its internal structures at the
   appropriate place.
  
   It should eventually be possible to remove these fields/accessors:
   
* STMT_VINFO_{IN,OUT}SIDE_OF_LOOP_COST
* SLP_TREE_{IN,OUT}SIDE_OF_LOOP_COST
* SLP_INSTANCE_{IN,OUT}SIDE_OF_LOOP_COST
   
   However, I think this should be delayed until we have the basic
   infrastructure in place for the new model and well-tested.
  
  Indeed.
  
   The other issue is that we should have the model track both the inside
   and outside costs if we're going to get everything into the target
   model.  For a first pass we can ignore this and keep the existing logic
   for the outside costs.  Later we should add some interfaces analogous to
   add_stmt_cost such as add_stmt_prolog_cost and add_stmt_epilog_cost so
   the model can track this stuff as carefully as it wants to.
  
  Outside costs are merely added to the niter * 

Re: [4.6][ARM] Backport MCR Not available in Thumb1

2012-06-19 Thread Richard Earnshaw
On 19/06/12 12:26, Joey Ye wrote:
 Oops! Sorry for such a stupid problem.
 
 2012-06-18  Joey Ye  joey...@arm.com
 
 Backported from mainline
 2011-10-14  David Alan Gilbert  david.gilb...@linaro.org
 
 * config/arm/arm.h (TARGET_HAVE_DMB_MCR): MCR Not available in
 Thumb1.
 

OK.

R.



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
 On Mon, 18 Jun 2012, William J. Schmidt wrote:
 
  On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
   On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
On Fri, 8 Jun 2012, William J. Schmidt wrote:

   snip

Hmm.  I don't like this patch or its general idea too much.  Instead
I'd like us to move more of the cost model detail to the target, giving
it a chance to look at the whole loop before deciding on a cost.  ISTR
posting the overall idea at some point, but let me repeat it here 
instead
of trying to find that e-mail.

The basic interface of the cost model should be, in targetm.vectorize

  /* Tell the target to start cost analysis of a loop or a basic-block
 (if the loop argument is NULL).  Returns an opaque pointer to
 target-private data.  */
  void *init_cost (struct loop *loop);

  /* Add cost for N vectorized-stmt-kind statements in vector_mode.  */
  void add_stmt_cost (void *data, unsigned n,
  vectorized-stmt-kind,
  enum machine_mode vector_mode);

  /* Tell the target to compute and return the cost of the accumulated
 statements and free any target-private data.  */
  unsigned finish_cost (void *data);
  
  By the way, I don't see much point in passing the void *data around
  here.  Too many levels of interfaces that we'd have to pass it around in
  the vectorizer, so it would just sit in a static variable.  Might as
  well let the data be wholly private to the target.
 
 Ok, so you'd have void init_cost (struct loop *) and
 unsigned finish_cost (void); then?  Static variables are of course
 not properly abstracted so we can't ever compute two sets of costs
 at the same time ... but that's true all-over-the-place in GCC ...

It's a fair point, and perhaps I'll decide to pass the data pointer
around anyway to keep that option open.  We'll see which looks uglier.
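
For reference, a minimal sketch of the data-pointer variant of the proposed hooks, showing why it keeps the option of two cost computations in flight at once. Everything beyond the three hook names is an assumption: enum stmt_kind stands in for the unspecified "vectorized-stmt-kind", struct cost_data and the unit costs are invented, and the machine mode argument is dropped to keep the example self-contained.

```c
#include <assert.h>
#include <stdlib.h>

struct loop;  /* Opaque here; only passed through.  */

/* Hypothetical stand-in for "vectorized-stmt-kind".  */
enum stmt_kind { KIND_VECTOR_STMT, KIND_VECTOR_LOAD, KIND_VECTOR_STORE };

/* Hypothetical target-private accumulator.  */
struct cost_data { unsigned total; };

/* Start cost analysis; LOOP may be NULL for a basic block.  */
static void *
init_cost (struct loop *loop)
{
  struct cost_data *d = malloc (sizeof *d);
  (void) loop;
  assert (d != NULL);
  d->total = 0;
  return d;
}

/* Accumulate the cost of N statements of kind KIND.  */
static void
add_stmt_cost (void *data, unsigned n, enum stmt_kind kind)
{
  static const unsigned unit_cost[] = { 1, 2, 2 };  /* Made-up weights.  */
  struct cost_data *d = data;
  d->total += n * unit_cost[kind];
}

/* Return the accumulated cost and free the private data.  */
static unsigned
finish_cost (void *data)
{
  struct cost_data *d = data;
  unsigned total = d->total;
  free (d);
  return total;
}
```

Because each init_cost call returns its own accumulator, calls to add_stmt_cost for two different loops can be interleaved freely. With a single static variable inside the target, the second init_cost would overwrite the first.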

 
 With previous discussion the add_stmt_cost hook would be split up
 to also allow passing the operation code for example.

I remember having this discussion, and I was looking for it to check on
the details, but I can't seem to find it either in my inbox or in the
archives.  Can you please point me to that again?  Sorry for the bother.

Thanks,
Bill

 
 Richard.
 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

 On Tue, 2012-06-19 at 12:10 +0200, Richard Guenther wrote:
  On Mon, 18 Jun 2012, William J. Schmidt wrote:
  
   On Mon, 2012-06-18 at 13:49 -0500, William J. Schmidt wrote:
On Mon, 2012-06-11 at 13:40 +0200, Richard Guenther wrote:
 On Fri, 8 Jun 2012, William J. Schmidt wrote:
 
snip
 
 Hmm.  I don't like this patch or its general idea too much.  Instead
 I'd like us to move more of the cost model detail to the target, 
 giving
 it a chance to look at the whole loop before deciding on a cost.  ISTR
 posting the overall idea at some point, but let me repeat it here 
 instead
 of trying to find that e-mail.
 
 The basic interface of the cost model should be, in targetm.vectorize
 
   /* Tell the target to start cost analysis of a loop or a basic-block
  (if the loop argument is NULL).  Returns an opaque pointer to
  target-private data.  */
   void *init_cost (struct loop *loop);
 
   /* Add cost for N vectorized-stmt-kind statements in vector_mode.  
 */
   void add_stmt_cost (void *data, unsigned n,
 vectorized-stmt-kind,
   enum machine_mode vector_mode);
 
   /* Tell the target to compute and return the cost of the accumulated
  statements and free any target-private data.  */
   unsigned finish_cost (void *data);
   
   By the way, I don't see much point in passing the void *data around
   here.  Too many levels of interfaces that we'd have to pass it around in
   the vectorizer, so it would just sit in a static variable.  Might as
   well let the data be wholly private to the target.
  
  Ok, so you'd have void init_cost (struct loop *) and
  unsigned finish_cost (void); then?  Static variables are of course
  not properly abstracted so we can't ever compute two sets of costs
  at the same time ... but that's true all-over-the-place in GCC ...
 
 It's a fair point, and perhaps I'll decide to pass the data pointer
 around anyway to keep that option open.  We'll see which looks uglier.
 
  
  With previous discussion the add_stmt_cost hook would be split up
  to also allow passing the operation code for example.
 
 I remember having this discussion, and I was looking for it to check on
 the details, but I can't seem to find it either in my inbox or in the
 archives.  Can you please point me to that again?  Sorry for the bother.

It was in the "Correct cost model for strided loads" thread.

Richard.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Dominique Dhumieres
On Tue, 19 Jun 2012, Richard Guenther wrote:
 
  Richard Guenther rguent...@suse.de writes:
   We are too eager to bump alignment of some decls when vectorizing.
   The fix is to not bump alignment of decls the user explicitly
   aligned or that are used in an unknown way.
  
  I thought attribute((__aligned__)) only set a minimum alignment for
  variables?  Most uses I've seen have been trying to get better
  performance from higher alignment, so it might not go down well if the
  attribute stopped the vectoriser from increasing the alignment still
  further.
 
 That's what the documentation says indeed.  I'm not sure which part of
 the patch fixes the ObjC failures where the alignment is part of the ABI
 (and I suppose ObjC then mis-uses the aligned attribute?).

A quick test shows that 

if (DECL_PRESERVE_P (decl))

alone is enough to fix the objc failures, while they are still there if 
one uses only

if (DECL_USER_ALIGN (decl))

Dominique


[PATCH][5/n] VRP and anti-ranges

2012-06-19 Thread Richard Guenther

This adjusts intersect_ranges to match what will become union_ranges
(but in a separate patch).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.
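
As a reading aid for the new range-versus-anti-range cases in the patch below, here is a simplified sketch over plain ints instead of trees. It covers only the situation where vr0 is a range and vr1 is an anti-range lying within it. The struct, enum and function names are invented for illustration, and type_min and type_max stand in for what vrp_val_is_min and vrp_val_is_max test.

```c
#include <assert.h>

enum range_type { R_UNDEFINED, R_RANGE, R_ANTI_RANGE };

struct range { enum range_type type; int min, max; };

/* Intersect the range R = [r.min, r.max] with the anti-range
   A = ~[a.min, a.max], where A's bounds lie within R's bounds.  */
static struct range
intersect_range_with_inner_anti_range (struct range r, struct range a,
                                       int type_min, int type_max)
{
  assert (r.type == R_RANGE && a.type == R_ANTI_RANGE);
  assert (r.min <= a.min && a.min <= a.max && a.max <= r.max);

  if (r.min == a.min && r.max == a.max)
    r.type = R_UNDEFINED;          /* Same bounds: nothing is left.  */
  else if (r.min == a.min)
    r.min = a.max + 1;             /* Left gap empty: keep the right gap.  */
  else if (r.max == a.max)
    r.max = a.min - 1;             /* Right gap empty: keep the left gap.  */
  else if (r.min == type_min && r.max == type_max)
    r = a;                         /* Range is effectively varying.  */
  /* Else keep the range; the hole in the middle is not representable.  */
  return r;
}
```

For example, [0, 10] intersected with ~[0, 3] gives [4, 10], and [0, 10] intersected with ~[7, 10] gives [0, 6].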

Richard.

2012-06-19  Richard Guenther  rguent...@suse.de

* tree-vrp.c (intersect_ranges): Handle more cases.
(vrp_intersect_ranges): Dump what we intersect and call ...
(vrp_intersect_ranges_1): ... this.

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c  (revision 188771)
--- gcc/tree-vrp.c  (working copy)
*** intersect_ranges (enum value_range_type
*** 6781,6789 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
/* [] is vr0, () is vr1 in the following classification comments.  */
!   if (operand_less_p (*vr0max, vr1min) == 1
!   || operand_less_p (vr1max, *vr0min) == 1)
  {
/* [ ] ( ) or ( ) [ ]
 If the ranges have an empty intersection, the result of the
--- 6781,6811 
  enum value_range_type vr1type,
  tree vr1min, tree vr1max)
  {
+   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
+   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
+ 
/* [] is vr0, () is vr1 in the following classification comments.  */
!   if (mineq && maxeq)
!     {
!       /* [(  )] */
!       if (*vr0type == vr1type)
!         /* Nothing to do for equal ranges.  */
!         ;
!       else if ((*vr0type == VR_RANGE
!                 && vr1type == VR_ANTI_RANGE)
!                || (*vr0type == VR_ANTI_RANGE
!                    && vr1type == VR_RANGE))
!         {
!           /* For anti-range with range intersection the result is empty.  */
!           *vr0type = VR_UNDEFINED;
!           *vr0min = NULL_TREE;
!           *vr0max = NULL_TREE;
!         }
!       else
!         gcc_unreachable ();
!     }
!   else if (operand_less_p (*vr0max, vr1min) == 1
!            || operand_less_p (vr1max, *vr0min) == 1)
  {
/* [ ] ( ) or ( ) [ ]
 If the ranges have an empty intersection, the result of the
*** intersect_ranges (enum value_range_type
*** 6813,6831 
  /* Take VR0.  */
}
  }
!   else if (operand_less_p (vr1max, *vr0max) == 1
!            && operand_less_p (*vr0min, vr1min) == 1)
  {
!   /* [ (  ) ]  */
!   if (*vr0type == VR_RANGE)
{
! /* If the outer is a range choose the inner one.
!???  If the inner is an anti-range this arbitrarily chooses
!the anti-range.  */
  *vr0type = vr1type;
  *vr0min = vr1min;
  *vr0max = vr1max;
}
        else if (*vr0type == VR_ANTI_RANGE
                 && vr1type == VR_ANTI_RANGE)
/* If both are anti-ranges the result is the outer one.  */
--- 6835,6882 
  /* Take VR0.  */
}
  }
!   else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1)
!            && (mineq || operand_less_p (*vr0min, vr1min) == 1))
      {
!       /* [ (  ) ] or [(  ) ] or [ (  )] */
!       if (*vr0type == VR_RANGE
!           && vr1type == VR_RANGE)
          {
!           /* If both are ranges the result is the inner one.  */
            *vr0type = vr1type;
            *vr0min = vr1min;
            *vr0max = vr1max;
          }
+       else if (*vr0type == VR_RANGE
+                && vr1type == VR_ANTI_RANGE)
+         {
+           /* Choose the right gap if the left one is empty.  */
+           if (mineq)
+             {
+               if (TREE_CODE (vr1max) == INTEGER_CST)
+                 *vr0min = int_const_binop (PLUS_EXPR, vr1max, integer_one_node);
+               else
+                 *vr0min = vr1max;
+             }
+           /* Choose the left gap if the right one is empty.  */
+           else if (maxeq)
+             {
+               if (TREE_CODE (vr1min) == INTEGER_CST)
+                 *vr0max = int_const_binop (MINUS_EXPR, vr1min,
+                                            integer_one_node);
+               else
+                 *vr0max = vr1min;
+             }
+           /* Choose the anti-range if the range is effectively varying.  */
+           else if (vrp_val_is_min (*vr0min)
+                    && vrp_val_is_max (*vr0max))
+             {
+               *vr0type = vr1type;
+               *vr0min = vr1min;
+               *vr0max = vr1max;
+             }
+           /* Else choose the range.  */
+         }
        else if (*vr0type == VR_ANTI_RANGE
                 && vr1type == VR_ANTI_RANGE)
          /* If both are anti-ranges the result is the outer one.  */
*** intersect_ranges (enum value_range_type
*** 6841,6856 
else
gcc_unreachable ();
  }
!   else if (operand_less_p (*vr0max, vr1max) == 1
!            && operand_less_p (vr1min, *vr0min) == 1)
  {
!   /* ( [  ] )  */
!   if (vr1type == VR_RANGE)
!   /* If the outer is a range, choose the inner one.
!  ???  If the inner is an anti-range this arbitrarily chooses
!  the anti-range.  */
;
else if (*vr0type == 

[arm] Remove obsolete FPA support (7/n): Tidy up attributes

2012-06-19 Thread Richard Earnshaw
This patch cleans up some more of the fallout from removing
the FPA and Maverick co-processors.  In particular it covers:

- Removing the redundant states from the type attributes.
- Removing some now-redundant UNSPEC values.
- Removing some state from the generic scheduler description that is
no longer needed.

Tested on arm-eabi and installed on trunk.

* arm.md (enum unspec): Delete UNSPEC_SIN and UNSPEC_COS.
(attr type): Remove fmul, ffmul, farith, ffarith, float_em,
f_fpa_load, f_fpa_store, f_mem_r, r_mem_f.
(attr write_conflict, attr core_cycles): Update.
* arm-generic.md (r_mem_f_wbuf): Delete reservation.

R.

Index: config/arm/arm.md
===
--- config/arm/arm.md   (revision 188771)
+++ config/arm/arm.md   (working copy)
@@ -65,12 +65,6 @@ (define_constants
 ;; Unspec enumerators for iwmmxt2 are defined in iwmmxt2.md
 
 (define_c_enum unspec [
-  UNSPEC_SIN; `sin' operation (MODE_FLOAT):
-;   operand 0 is the result,
-;   operand 1 the parameter.
-  UNPSEC_COS; `cos' operation (MODE_FLOAT):
-;   operand 0 is the result,
-;   operand 1 the parameter.
   UNSPEC_PUSH_MULT  ; `push multiple' operation:
 ;   operand 0 is the first register,
 ;   subsequent registers are in parallel (use ...)
@@ -321,21 +315,11 @@ (define_attr insn
 ; floata floating point arithmetic operation (subject to 
expansion)
 ; fdivdDFmode floating point division
 ; fdivsSFmode floating point division
-; fmul Floating point multiply
-; ffmulFast floating point multiply
-; farith   Floating point arithmetic (4 cycle)
-; ffarith  Fast floating point arithmetic (2 cycle)
-; float_em a floating point arithmetic operation that is normally emulated
-;  even on a machine with an fpa.
-; f_fpa_load   a floating point load from memory. Only for the FPA.
-; f_fpa_store  a floating point store to memory. Only for the FPA.
 ; f_load[sd]   A single/double load from memory. Used for VFP unit.
 ; f_store[sd]  A single/double store to memory. Used for VFP unit.
 ; f_flag   a transfer of co-processor flags to the CPSR
-; f_mem_r  a transfer of a floating point register to a real reg via mem
-; r_mem_f  the reverse of f_mem_r
-; f_2_rfast transfer float to arm (no memory needed)
-; r_2_ffast transfer arm to float
+; f_2_rtransfer float to core (no memory needed)
+; r_2_ftransfer core to float
 ; f_cvtconvert floating-integral
 ; branch   a branch
 ; call a subroutine call
@@ -351,18 +335,59 @@ (define_attr insn
 ;
 
 (define_attr type
-   
alu,alu_shift,alu_shift_reg,mult,block,float,fdivx,fdivd,fdivs,fmul,fmuls,fmuld,fmacs,fmacd,ffmul,farith,ffarith,f_flag,float_em,f_fpa_load,f_fpa_store,f_loads,f_loadd,f_stores,f_stored,f_mem_r,r_mem_f,f_2_r,r_2_f,f_cvt,branch,call,load_byte,load1,load2,load3,load4,store1,store2,store3,store4,fconsts,fconstd,fadds,faddd,ffariths,ffarithd,fcmps,fcmpd,fcpys
-   (if_then_else 
-(eq_attr insn 
smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals)
-(const_string mult)
-(const_string alu)))
+ alu,\
+  alu_shift,\
+  alu_shift_reg,\
+  mult,\
+  block,\
+  float,\
+  fdivd,\
+  fdivs,\
+  fmuls,\
+  fmuld,\
+  fmacs,\
+  fmacd,\
+  f_flag,\
+  f_loads,\
+  f_loadd,\
+  f_stores,\
+  f_stored,\
+  f_2_r,\
+  r_2_f,\
+  f_cvt,\
+  branch,\
+  call,\
+  load_byte,\
+  load1,\
+  load2,\
+  load3,\
+  load4,\
+  store1,\
+  store2,\
+  store3,\
+  store4,\
+  fconsts,\
+  fconstd,\
+  fadds,\
+  faddd,\
+  ffariths,\
+  ffarithd,\
+  fcmps,\
+  fcmpd,\
+  fcpys
+ (if_then_else 
+(eq_attr insn smulxy,smlaxy,smlalxy,smulwy,smlawx,mul,muls,mla,mlas,\
+umull,umulls,umlal,umlals,smull,smulls,smlal,smlals)
+(const_string mult)
+(const_string alu)))
 
 ; Is this an (integer side) multiply with a 64-bit result?
 (define_attr mul64 no,yes
-(if_then_else
-  (eq_attr insn 
smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals)
-  (const_string yes)
-  (const_string no)))
+  (if_then_else
+(eq_attr insn
+ smlalxy,umull,umulls,umlal,umlals,smull,smulls,smlal,smlals)
+(const_string yes)
+(const_string no)))
 
 ; wtype for WMMX insn scheduling purposes.
 (define_attr wtype
@@ -486,7 +511,7 @@ (define_attr model_wbuf no,yes (cons
 ; to stall the processor.  Used with model_wbuf above.
 (define_attr write_conflict no,yes
   (if_then_else (eq_attr type
-
block,float_em,f_fpa_load,f_fpa_store,f_mem_r,r_mem_f,call,load1)
+block,call,load1)

[PATCH][AARCH64]: Invent new regclass - FP low regs.

2012-06-19 Thread Tejas Belagod


Hi,

The attached patch invents a new register class V0 - V15 that is needed for some
lane variants of AdvSIMD instructions that can only take V0 - V15 as their 
indexed register when working on half-word type.


Regression tests are happy. OK?

Thanks,
Tejas Belagod.
ARM.

Changelog:

2012-06-19  Tejas Belagod  tejas.bela...@arm.com

gcc/
* config/aarch64/aarch64-simd.md (aarch64_sqrdmulh_lanemode,
aarch64_sqdmlSBINQOPS:asl_lanemode_internal,
aarch64_sqdmlal_lanemode, aarch64_sqdmlal_laneqmode,
aarch64_sqdmlsl_lanemode, aarch64_sqdmlsl_laneqmode,
aarch64_sqdmlSBINQOPS:asl2_lanemode_internal,
aarch64_sqdmlal2_lanemode, aarch64_sqdmlal2_laneqmode,
aarch64_sqdmlsl2_lanemode, aarch64_sqdmlsl2_laneqmode,
aarch64_sqdmull_lanemode_internal, aarch64_sqdmull_lanemode,
aarch64_sqdmull_laneqmode, aarch64_sqdmull2_lanemode_internal,
aarch64_sqdmull2_lanemode, aarch64_sqdmull2_laneqmode): Change the
constraint of the indexed operand to use vwx instead of w.
* config/aarch64/aarch64.c (aarch64_hard_regno_nregs): Add case for
FP_LO_REGS class.
(aarch64_regno_regclass): Return FP_LO_REGS if register in V0 - V15.
(aarch64_secondary_reload): Change condition to check for both FP reg
classes.
(aarch64_class_max_nregs): Add case for FP_LO_REGS.
* config/aarch64/aarch64.h (reg_class): New register class FP_LO_REGS.
(REG_CLASS_NAMES): Likewise.
(REG_CLASS_CONTENTS): Likewise.
(FP_LO_REGNUM_P): New.
* config/aarch64/aarch64.md (V15_REGNUM): New.
* config/aarch64/constraints.md (x): New register constraint.
* config/aarch64/iterators.md (vwx): New.

diff --git a/gcc/config/aarch64/aarch64-simd.md b/gcc/config/aarch64/aarch64-simd.md
index 9ceefee..43017df 100644
--- a/gcc/config/aarch64/aarch64-simd.md
+++ b/gcc/config/aarch64/aarch64-simd.md
@@ -1897,7 +1897,7 @@
 (unspec:VSDQ_HSI
  [(match_operand:VSDQ_HSI 1 register_operand w)
(vec_select:VEL
- (match_operand:VCON 2 register_operand w)
+ (match_operand:VCON 2 register_operand vwx)
  (parallel [(match_operand:SI 3 immediate_operand i)]))]
 VQDMULH))]
   TARGET_SIMD
@@ -1940,7 +1940,7 @@
  (sign_extend:VWIDE
(vec_duplicate:VD_HSI
  (vec_select:VEL
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 register_operand vwx)
(parallel [(match_operand:SI 4 immediate_operand i)])))
   ))
(const_int 1]
@@ -1960,7 +1960,7 @@
(match_operand:SD_HSI 2 register_operand w))
  (sign_extend:VWIDE
(vec_select:VEL
- (match_operand:VCON 3 register_operand w)
+ (match_operand:VCON 3 register_operand vwx)
  (parallel [(match_operand:SI 4 immediate_operand i)])))
   )
(const_int 1]
@@ -1974,7 +1974,7 @@
   [(match_operand:VWIDE 0 register_operand =w)
(match_operand:VWIDE 1 register_operand 0)
(match_operand:VSD_HSI 2 register_operand w)
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 register_operand vwx)
(match_operand:SI 4 immediate_operand i)]
   TARGET_SIMD
 {
@@ -1989,7 +1989,7 @@
   [(match_operand:VWIDE 0 register_operand =w)
(match_operand:VWIDE 1 register_operand 0)
(match_operand:VSD_HSI 2 register_operand w)
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 register_operand vwx)
(match_operand:SI 4 immediate_operand i)]
   TARGET_SIMD
 {
@@ -2004,7 +2004,7 @@
   [(match_operand:VWIDE 0 register_operand =w)
(match_operand:VWIDE 1 register_operand 0)
(match_operand:VSD_HSI 2 register_operand w)
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 register_operand vwx)
(match_operand:SI 4 immediate_operand i)]
   TARGET_SIMD
 {
@@ -2019,7 +2019,7 @@
   [(match_operand:VWIDE 0 register_operand =w)
(match_operand:VWIDE 1 register_operand 0)
(match_operand:VSD_HSI 2 register_operand w)
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 register_operand vwx)
(match_operand:SI 4 immediate_operand i)]
   TARGET_SIMD
 {
@@ -2114,7 +2114,7 @@
(sign_extend:VWIDE
   (vec_duplicate:VHALF
(vec_select:VEL
- (match_operand:VCON 3 register_operand w)
+ (match_operand:VCON 3 register_operand vwx)
  (parallel [(match_operand:SI 4 immediate_operand i)])

  (const_int 1]
@@ -2128,7 +2128,7 @@
   [(match_operand:VWIDE 0 register_operand =w)
(match_operand:VWIDE 1 register_operand w)
(match_operand:VQ_HSI 2 register_operand w)
-   (match_operand:VCON 3 register_operand w)
+   (match_operand:VCON 3 

[PATCH] AIX pthread.h fixincludes

2012-06-19 Thread David Edelsohn
AIX 5.2 pthread.h uses the wrong number of braces for more of the
PTHREAD initializers. This patch extends the earlier patch to fix the
other broken macros.
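
To illustrate the kind of problem being fixed, here is a minimal sketch using a hypothetical type (`demo_mutex_t` is not the AIX definition). When an aggregate is nested inside another aggregate, a single level of braces only works through brace elision, which GCC reports with -Wmissing-braces; the fix adds the inner level.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for an opaque pthread type: a struct whose only
   member is an array, so a fully braced initializer needs two levels.  */
typedef struct
{
  long words[4];
} demo_mutex_t;

/* Under-braced form, similar in spirit to the broken header: valid C
   through brace elision, but it triggers -Wmissing-braces.  */
#define DEMO_MUTEX_INITIALIZER_OLD { 0, 0, 0, 0 }

/* Fully braced form, analogous to what the fix substitutes.  */
#define DEMO_MUTEX_INITIALIZER { { 0, 0, 0, 0 } }

/* Return the sum of all words; zero for a freshly initialized object.  */
static long
demo_mutex_sum (const demo_mutex_t *m)
{
  long sum = 0;
  assert (m != NULL);
  for (int i = 0; i < 4; i++)
    sum += m->words[i];
  return sum;
}
```

Both macros produce the same object; only the fully braced one compiles cleanly when warnings are treated as errors.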

* inclhack.def (aix_mutex_initializer_1, aix_cond_initializer_1,
aix_rwlock_initializer_1): New.
* fixincl.x: Regenerate.
* tests/base/pthread.h [AIX_MUTEX_INITIALIZER_1_CHECK,
AIX_COND_INITIALIZER_1_CHECK,
AIX_RWLOCK_INITIALIZER_1_CHECK]: New.

Okay?

Thanks, David
Index: inclhack.def
===
--- inclhack.def(revision 188738)
+++ inclhack.def(working copy)
@@ -397,7 +397,9 @@
 };
 
 /*
- *  pthread.h on AIX defines PTHREAD_ONCE_INIT without enough braces.
+ *  pthread.h on AIX defines PTHREAD_ONCE_INIT, PTHREAD_MUTEX_INITIALIZER,
+ *  PTHREAD_COND_INITIALIZER and PTHREAD_RWLOCK_INITIALIZER without enough
+ *  braces.
  */
 fix = {
 hackname  = aix_once_init_1;
@@ -425,6 +427,45 @@
}\n;
 };
 
+fix = {
+hackname  = aix_mutex_initializer_1;
+mach  = *-*-aix*;
+files = pthread.h;
+select= #define[ \t]PTHREAD_MUTEX_INITIALIZER \n
+   \\{ \n;
+c_fix = format;
+c_fix_arg = #define PTHREAD_MUTEX_INITIALIZER \\\n
+   {{ \\\n;
+test_text = #define PTHREAD_MUTEX_INITIALIZER \n
+   { \n;
+};
+
+fix = {
+hackname  = aix_cond_initializer_1;
+mach  = *-*-aix*;
+files = pthread.h;
+select= #define[ \t]PTHREAD_COND_INITIALIZER \n
+   \\{ \n;
+c_fix = format;
+c_fix_arg = #define PTHREAD_COND_INITIALIZER \\\n
+   {{ \\\n;
+test_text = #define PTHREAD_COND_INITIALIZER \n
+   { \n;
+};
+
+fix = {
+hackname  = aix_rwlock_initializer_1;
+mach  = *-*-aix*;
+files = pthread.h;
+select= #define[ \t]PTHREAD_RWLOCK_INITIALIZER \n
+   \\{ \n;
+c_fix = format;
+c_fix_arg = #define PTHREAD_RWLOCK_INITIALIZER \\\n
+   {{ \\\n;
+test_text = #define PTHREAD_RWLOCK_INITIALIZER \n
+   { \n;
+};
+
 /*
  *  pthread.h on AIX 4.3.3 tries to define a macro without whitspace
  *  which violates a requirement of ISO C.


Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread William J. Schmidt
On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
 On Tue, 19 Jun 2012, William J. Schmidt wrote:
 
  I remember having this discussion, and I was looking for it to check on
  the details, but I can't seem to find it either in my inbox or in the
  archives.  Can you please point me to that again?  Sorry for the bother.
 
 It was in the "Correct cost model for strided loads" thread.

Ah, right, thanks.  I think it will be best to make that a separate
patch in the series.  Like so:

(1) Add calls to the new interface without disturbing existing logic;
modify the profitability algorithms to query the new model for inside
costs.  Default algorithm for the model is to just sum costs as is done
today.
(1a) Split up the cost hooks (one for loads/stores with misalign parm,
one for vector_stmt with tree_code, etc.).
(x) Add heuristics to target models as desired.
(2) Handle the SLP ordering problem.
(3) Handle outside costs in the target model.
(4) Remove the now unnecessary cost fields and the calls that set them.
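
As a rough illustration of step (1), a default model that simply accumulates per-statement costs might look like the sketch below. Every name here (`demo_cost_model`, `demo_add_stmt_cost`, and so on) is hypothetical and is not the interface proposed in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical accumulator for the "inside" cost of a vectorized loop.  */
struct demo_cost_model
{
  unsigned inside_cost;
};

/* Start a fresh cost computation.  */
static void
demo_init_cost (struct demo_cost_model *model)
{
  assert (model != NULL);
  model->inside_cost = 0;
}

/* Default behaviour: record COUNT statements costing STMT_COST each by
   adding them to the running total, with no target heuristics.  */
static void
demo_add_stmt_cost (struct demo_cost_model *model,
                    unsigned count, unsigned stmt_cost)
{
  assert (model != NULL);
  model->inside_cost += count * stmt_cost;
}

/* Query used by a profitability check.  */
static unsigned
demo_finish_cost (const struct demo_cost_model *model)
{
  assert (model != NULL);
  return model->inside_cost;
}
```

A target wanting a density heuristic could, under this sketch, adjust the value returned by `demo_finish_cost` without touching the callers.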

I'll start work on this series of patches as I have time between other
projects.

Thanks,
Bill

 
 Richard.
 



Re: [PATCH] Add vector cost model density heuristic

2012-06-19 Thread Richard Guenther
On Tue, 19 Jun 2012, William J. Schmidt wrote:

 On Tue, 2012-06-19 at 14:48 +0200, Richard Guenther wrote:
  On Tue, 19 Jun 2012, William J. Schmidt wrote:
  
   I remember having this discussion, and I was looking for it to check on
   the details, but I can't seem to find it either in my inbox or in the
   archives.  Can you please point me to that again?  Sorry for the bother.
  
  It was in the "Correct cost model for strided loads" thread.
 
 Ah, right, thanks.  I think it will be best to make that a separate
 patch in the series.  Like so:
 
 (1) Add calls to the new interface without disturbing existing logic;
 modify the profitability algorithms to query the new model for inside
 costs.  Default algorithm for the model is to just sum costs as is done
 today.
 (1a) Split up the cost hooks (one for loads/stores with misalign parm,
 one for vector_stmt with tree_code, etc.).
 (x) Add heuristics to target models as desired.
 (2) Handle the SLP ordering problem.
 (3) Handle outside costs in the target model.
 (4) Remove the now unnecessary cost fields and the calls that set them.
 
 I'll start work on this series of patches as I have time between other
 projects.

Thanks!
Richard.


Re: [PATCH] AIX pthread.h fixincludes

2012-06-19 Thread Bruce Korb
Hi David,

On Tue, Jun 19, 2012 at 7:16 AM, David Edelsohn dje@gmail.com wrote:
 Okay?

Okay.

Cheers - Bruce


[testsuite] Clear hwcap_2 with Sun ld

2012-06-19 Thread Rainer Orth
In recent Solaris 11 Update 1 builds, the Sun assembler tags AVX2 object
files with a hardware capability that isn't cleared by the current
gcc/testsuite/gcc.target/i386/clearcap.map file.  There are some new
capabilities in sys/auxv_386.h in AT_SUN_CAP_HW2, but unfortunately
the old linker map syntax has no support for setting/clearing hwcap_2,
and won't ever get it.

To deal with this situation, I've introduced a new mapfile using the v2
syntax which does support clearing hwcap_2, but now I need to determine
if the linker supports that syntax before using it.  Solaris 11 ld has
the necessary support, and it was backported to Solaris 10 Update 10.
Older Solaris 10 updates and Solaris 8/9 lack it, though.

The following patch does just that.  Tested with the appropriate runtest
invocation on i386-pc-solaris2.11 (ld v2 support), i386-pc-solaris2.9
(ld v1 support only), and x86_64-unknown-linux-gnu (GNU ld which doesn't
support either syntax).

Unless someone finds fault with the patch, I'll commit it in a day.

Rainer


2012-06-19  Rainer Orth  r...@cebitec.uni-bielefeld.de

* gcc.target/i386/clearcapv2.map: New file.
* gcc.target/i386/i386.exp: Try it first before clearcap.map.

# HG changeset patch
# Parent 02789d700fe014df8358c45b8dc09a6b104fbb6b
Clear hwcap_2 with Sun ld

diff --git a/gcc/testsuite/gcc.target/i386/clearcapv2.map b/gcc/testsuite/gcc.target/i386/clearcapv2.map
new file mode 100644
--- /dev/null
+++ b/gcc/testsuite/gcc.target/i386/clearcapv2.map
@@ -0,0 +1,7 @@
+# clear all hardware capabilities emitted by Sun as: the tests here
+# guard against execution at runtime
+# uses mapfile v2 syntax which is the only way to clear AT_SUN_CAP_HW2 flags
+$mapfile_version 2
+CAPABILITY {
+  HW = ;
+};
diff --git a/gcc/testsuite/gcc.target/i386/i386.exp b/gcc/testsuite/gcc.target/i386/i386.exp
--- a/gcc/testsuite/gcc.target/i386/i386.exp
+++ b/gcc/testsuite/gcc.target/i386/i386.exp
@@ -256,12 +256,23 @@ proc check_effective_target_rtm { } {
 
 # If the linker used understands -M mapfile, pass it to clear hardware
 # capabilities set by the Sun assembler.
-set clearcap_ldflags -Wl,-M,$srcdir/$subdir/clearcap.map
+# Try mapfile syntax v2 first which is the only way to clear hwcap_2 flags.
+set clearcap_ldflags -Wl,-M,$srcdir/$subdir/clearcapv2.map
 
-if [check_no_compiler_messages mapfile executable {
+if ![check_no_compiler_messages mapfilev2 executable {
+int main (void) { return 0; }
+} $clearcap_ldflags ] {
+# If this doesn't work, fall back to the less capable v1 syntax.
+set clearcap_ldflags -Wl,-M,$srcdir/$subdir/clearcap.map
+
+if ![check_no_compiler_messages mapfile executable {
 	int main (void) { return 0; }
-  } $clearcap_ldflags ] {
+} $clearcap_ldflags ] {
+	unset clearcap_ldflags
+}
+}
 
+if [info exists clearcap_ldflags] {
   if { [info procs gcc_target_compile] != [list] \
 	 && [info procs saved_gcc_target_compile] == [list] } {
 rename gcc_target_compile saved_gcc_target_compile

-- 
-
Rainer Orth, Center for Biotechnology, Bielefeld University


[PATCH][7/n] VRP and anti-ranges

2012-06-19 Thread Richard Guenther

And here is the union_ranges part.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-06-19  Richard Guenther  rguent...@suse.de

* tree-vrp.c (union_ranges): New function.
(vrp_meet_1): Use union_ranges.
(vrp_meet): Dump what we union and call vrp_meet_1.
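
For readers unfamiliar with anti-ranges, the following is a simplified sketch of one case handled by the patch: the union of two disjoint plain ranges over `int`. It is not the GCC code; the types and the function `demo_union_disjoint` are made up for illustration, and only the "[ ] ( )" range/range case is modelled.

```c
#include <assert.h>
#include <limits.h>

enum demo_range_type { DEMO_RANGE, DEMO_ANTI_RANGE };

struct demo_range
{
  enum demo_range_type type;
  int min, max;
};

/* Union two plain ranges A and B where A lies entirely below B
   (a.max < b.min).  The result is the convex hull, unless A starts at
   the type minimum and B ends at the type maximum, in which case the
   gap between them is expressed exactly as an anti-range.  */
static struct demo_range
demo_union_disjoint (struct demo_range a, struct demo_range b)
{
  struct demo_range r;
  assert (a.type == DEMO_RANGE && b.type == DEMO_RANGE);
  assert (a.min <= a.max && b.min <= b.max && a.max < b.min);

  /* a.max < b.min, so a.max + 1 and b.min - 1 cannot overflow.  */
  if (a.min == INT_MIN && b.max == INT_MAX && a.max + 1 <= b.min - 1)
    {
      r.type = DEMO_ANTI_RANGE;
      r.min = a.max + 1;
      r.max = b.min - 1;
    }
  else
    {
      r.type = DEMO_RANGE;
      r.min = a.min;
      r.max = b.max;
    }
  return r;
}
```

For example, [INT_MIN, 0] united with [10, INT_MAX] gives ~[1, 9], while [1, 5] united with [8, 9] gives the hull [1, 9], which is larger than the exact union.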

Index: gcc/tree-vrp.c
===
*** gcc/tree-vrp.c.orig 2012-06-19 15:18:34.0 +0200
--- gcc/tree-vrp.c  2012-06-19 15:23:20.803752745 +0200
*** vrp_visit_stmt (gimple stmt, edge *taken
*** 6770,6775 
--- 6770,7032 
return SSA_PROP_VARYING;
  }
  
+ /* Union the two value-ranges { *VR0TYPE, *VR0MIN, *VR0MAX } and
+    { VR1TYPE, VR0MIN, VR0MAX } and store the result
+    in { *VR0TYPE, *VR0MIN, *VR0MAX }.  This may not be the smallest
+    possible such range.  The resulting range is not canonicalized.  */
+
+ static void
+ union_ranges (enum value_range_type *vr0type,
+               tree *vr0min, tree *vr0max,
+               enum value_range_type vr1type,
+               tree vr1min, tree vr1max)
+ {
+   bool mineq = operand_equal_p (*vr0min, vr1min, 0);
+   bool maxeq = operand_equal_p (*vr0max, vr1max, 0);
+
+   /* [] is vr0, () is vr1 in the following classification comments.  */
+   if (mineq && maxeq)
+     {
+       /* [(  )] */
+       if (*vr0type == vr1type)
+         /* Nothing to do for equal ranges.  */
+         ;
+       else if ((*vr0type == VR_RANGE
+                 && vr1type == VR_ANTI_RANGE)
+                || (*vr0type == VR_ANTI_RANGE
+                    && vr1type == VR_RANGE))
+         {
+           /* For anti-range with range union the result is varying.  */
+           goto give_up;
+         }
+       else
+         gcc_unreachable ();
+     }
+   else if (operand_less_p (*vr0max, vr1min) == 1
+            || operand_less_p (vr1max, *vr0min) == 1)
+     {
+       /* [ ] ( ) or ( ) [ ]
+          If the ranges have an empty intersection, result of the union
+          operation is the anti-range or if both are anti-ranges
+          it covers all.  */
+       if (*vr0type == VR_ANTI_RANGE
+           && vr1type == VR_ANTI_RANGE)
+         goto give_up;
+       else if (*vr0type == VR_ANTI_RANGE
+                && vr1type == VR_RANGE)
+         ;
+       else if (*vr0type == VR_RANGE
+                && vr1type == VR_ANTI_RANGE)
+         {
+           *vr0type = vr1type;
+           *vr0min = vr1min;
+           *vr0max = vr1max;
+         }
+       else if (*vr0type == VR_RANGE
+                && vr1type == VR_RANGE)
+         {
+           /* The result is the convex hull of both ranges.  */
+           if (operand_less_p (*vr0max, vr1min) == 1)
+             {
+               /* If the result can be an anti-range, create one.  */
+               if (TREE_CODE (*vr0max) == INTEGER_CST
+                   && TREE_CODE (vr1min) == INTEGER_CST
+                   && vrp_val_is_min (*vr0min)
+                   && vrp_val_is_max (vr1max))
+                 {
+                   tree min = int_const_binop (PLUS_EXPR,
+                                               *vr0max, integer_one_node);
+                   tree max = int_const_binop (MINUS_EXPR,
+                                               vr1min, integer_one_node);
+                   if (!operand_less_p (max, min))
+                     {
+                       *vr0type = VR_ANTI_RANGE;
+                       *vr0min = min;
+                       *vr0max = max;
+                     }
+                   else
+                     *vr0max = vr1max;
+                 }
+               else
+                 *vr0max = vr1max;
+             }
+           else
+             {
+               /* If the result can be an anti-range, create one.  */
+               if (TREE_CODE (vr1max) == INTEGER_CST
+                   && TREE_CODE (*vr0min) == INTEGER_CST
+                   && vrp_val_is_min (vr1min)
+                   && vrp_val_is_max (*vr0max))
+                 {
+                   tree min = int_const_binop (PLUS_EXPR,
+                                               vr1max, integer_one_node);
+                   tree max = int_const_binop (MINUS_EXPR,
+                                               *vr0min, integer_one_node);
+                   if (!operand_less_p (max, min))
+                     {
+                       *vr0type = VR_ANTI_RANGE;
+                       *vr0min = min;
+                       *vr0max = max;
+                     }
+                   else
+                     *vr0min = vr1min;
+                 }
+               else
+                 *vr0min = vr1min;
+             }
+         }
+       else
+         gcc_unreachable ();
+     }
+   else if ((maxeq || operand_less_p (vr1max, *vr0max) == 1)
+            && (mineq || operand_less_p (*vr0min, vr1min) == 1))
+     {
+       /* [ (  ) ] or [(  ) ] or [ (  )] */
+       if (*vr0type == VR_RANGE
+           && vr1type == VR_RANGE)
+         ;
+       else if (*vr0type == VR_ANTI_RANGE
+                && vr1type == VR_ANTI_RANGE)
+         {
+           *vr0type = vr1type;
+           *vr0min = vr1min;
+           *vr0max = vr1max;

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

 These series of patches are for the D compiler frontend for inclusion into 
 GCC.
 
 http://www.gdcproject.org/files/gdc_frontend.patch.gz
 http://www.gdcproject.org/files/gdc_libphobos.patch.gz
 http://www.gdcproject.org/files/gdc_testsuite.patch.gz
 http://www.gdcproject.org/files/gdc_gcc.patch.gz

Please provide GNU ChangeLog entries for each patch, for each relevant 
ChangeLog file.  It would be best to post those in plain text to the list, 
even if the patches themselves are too big.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH][AARCH64]: Invent new regclass - FP low regs.

2012-06-19 Thread Marcus Shawcroft

On 19/06/12 15:03, Tejas Belagod wrote:


Hi,

The attached patch invents a new register class V0 - V15 that is needed for some
lane variants of AdvSIMD instructions that can only take V0 - V15 as their
indexed register when working on half-word type.

Regression tests are happy. OK?


OK
/Marcus
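
As a minimal sketch of what a V0 - V15 register class implies, the predicates below separate the low half of the vector register file from the rest. The `demo_` names and the register numbering (vector registers numbered consecutively from 32) are assumptions for illustration, not taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumed numbering: 32 vector registers V0..V31, numbered
   consecutively starting at DEMO_V0_REGNUM.  */
#define DEMO_V0_REGNUM  32
#define DEMO_V15_REGNUM (DEMO_V0_REGNUM + 15)
#define DEMO_V31_REGNUM (DEMO_V0_REGNUM + 31)

enum demo_reg_class { DEMO_NO_REGS, DEMO_FP_LO_REGS, DEMO_FP_REGS };

/* True only for V0..V15, the subset usable as the indexed operand of
   the half-word lane instructions.  */
static bool
demo_fp_lo_regnum_p (int regno)
{
  return regno >= DEMO_V0_REGNUM && regno <= DEMO_V15_REGNUM;
}

/* Return the smallest class containing REGNO: the low class for
   V0..V15, the full FP class for V16..V31.  */
static enum demo_reg_class
demo_regno_regclass (int regno)
{
  assert (regno >= 0);
  if (demo_fp_lo_regnum_p (regno))
    return DEMO_FP_LO_REGS;
  if (regno >= DEMO_V0_REGNUM && regno <= DEMO_V31_REGNUM)
    return DEMO_FP_REGS;
  return DEMO_NO_REGS;
}
```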



Re: [PATCH 4/4] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

 --- gcc-4.8-20120617/gcc/doc/install.texi 2012-05-29 15:14:06.0 +0100
 +++ gcc-4.8/gcc/doc/install.texi  2012-06-18 20:39:45.058591380 +0100
 @@ -1360,12 +1360,12 @@ their runtime libraries should be built.
  grep language= */config-lang.in
  @end smallexample
  Currently, you can use any of the following:
 -@code{all}, @code{ada}, @code{c}, @code{c++}, @code{fortran},
 +@code{all}, @code{ada}, @code{c}, @code{c++}, @code{d}, @code{fortran},
  @code{go}, @code{java}, @code{objc}, @code{obj-c++}.
  Building the Ada compiler has special requirements, see below.
  If you do not pass this flag, or specify the option @code{all}, then all
  default languages available in the @file{gcc} sub-tree will be configured.
 -Ada, Go and Objective-C++ are not default languages; the rest are.
 +Ada, D, Go and Objective-C++ are not default languages; the rest are.

Maybe this should be true, but I don't see a build_by_default=no setting 
in config-lang.in (in gdc_frontend.patch.gz) to make it so.

 --- gcc-4.8-20120617/gcc/doc/standards.texi   2011-12-21 17:53:58.0 
 +
 +++ gcc-4.8/gcc/doc/standards.texi2012-04-22 17:11:38.553880036 +0100
 @@ -289,6 +289,16 @@ a specific version.  In general GCC trac
  closely, and any given release will support the language as of the
  date that the release was frozen.
  
 +@section D language
 +
 +The D language continues to evolve as of this writing; see the
 +@uref{http://golang.org/@/doc/@/go_spec.html, current language
 +specifications}.  At present there are no specific versions of Go, and
 +there is no way to describe the language supported by GCC in terms of
 +a specific version.  In general GCC tracks the evolving specification
 +closely, and any given release will support the language as of the
 +date that the release was frozen.

Referring to Go in a section about D doesn't make sense.

I don't see entries in contrib.texi in this patch.

I'd also expect contrib/gcc_update to be updated to handle timestamp 
ordering for generated files in libphobos.

Are you volunteering to be appointed maintainer for this front end by the 
SC?

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH] Fix vrp68 testcase

2012-06-19 Thread Richard Guenther

This fixes the testcase to match reality and updates the comments
in it appropriately.

Tested on x86_64-unknown-linux-gnu, applied.

Richard.

2012-06-19  Richard Guenther  rguent...@suse.de

* gcc.dg/tree-ssa/vrp68.c: Adjust testcase.

Index: gcc/testsuite/gcc.dg/tree-ssa/vrp68.c
===
--- gcc/testsuite/gcc.dg/tree-ssa/vrp68.c   (revision 188780)
+++ gcc/testsuite/gcc.dg/tree-ssa/vrp68.c   (working copy)
@@ -8,17 +8,11 @@ void test1 (int i, int j, int b)
   RANGE(i, 2, 6);
   ANTI_RANGE(j, 1, 7);
   MERGE(b, i, j);
-  CHECK_ANTI_RANGE(i, 7, 7);
   CHECK_ANTI_RANGE(i, 1, 1);
-  /* If we swap the anti-range tests the ~[6, 6] test is never eliminated.  */
 }
 int main() { }
 
-/* While subsequent VRP/DOM passes manage to even recognize the ~[6, 6]
-   test as redundant a single VRP run will arbitrarily choose ~[0, 0] when
-   merging [1, 5] with ~[0, 6] so the first VRP pass can only eliminate
-   the ~[0, 0] check as redundant.  */
+/* VRP will arbitrarily choose ~[1, 1] when merging [2, 6] with ~[1, 7].  */
 
-/* { dg-final { scan-tree-dump-times link_error 0 vrp1 { xfail *-*-* } } } */
-/* { dg-final { scan-tree-dump-times link_error 1 vrp1 } } */
+/* { dg-final { scan-tree-dump-times link_error 0 vrp1 } } */
 /* { dg-final { cleanup-tree-dump vrp1 } } */


Re: [PATCH, libcpp]: Use x86 __builtin_ia32_pcmpestri128 instead of asm.

2012-06-19 Thread Richard Henderson
On 2012-06-18 23:38, Uros Bizjak wrote:
 On Tue, Jun 19, 2012 at 12:07 AM, Richard Henderson r...@redhat.com wrote:
 On 2012-06-18 13:19, Uros Bizjak wrote:
/* ??? The builtin doesn't understand that the PCMPESTRI read from
memory need not be aligned.  */
 -  __asm ("%vpcmpestri $0, (%1), %2"
 -  : "=c"(index) : "r"(s), "x"(search), "a"(4), "d"(16));
 +  sv = __builtin_ia32_loaddqu ((const char *) s);
 +  index = __builtin_ia32_pcmpestri128 (search, 4, sv, 16, 0);
 +


 Surely the comment can be removed too then?
 
 I'm not sure there. The builtin, as defined, expects V16QI operand
 with xm constraint.

Fair enough.  I'm ok with the patch as-is.


r~




Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Steven Bosscher
Hello,

I had a very quick look through the gdc_frontend patch. Below are a
couple of comments on it:

 http://www.gdcproject.org/files/gdc_frontend.patch.gz

 [PATCH 1/4]:
 The D compiler frontend
  -  gcc/d

How did you test this? You include rtl.h/expr.h in d-builtins.c and
d-gcc-includes.h, which should both be in ALL_HOST_FRONTEND_OBJS and
fail to build because IN_GCC_FRONTEND is defined and GCC_RTL_H is
poisoned. See system.h:

/* Front ends should never have to include middle-end headers.  Enforce
   this by poisoning the header double-include protection defines.  */
#ifdef IN_GCC_FRONTEND
#pragma GCC poison GCC_RTL_H GCC_EXCEPT_H GCC_EXPR_H
#endif

Do you somehow bypass the normal build system? Or maybe you don't
include system.h? Either way, front ends should never have to include
RTL headers.
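
For anyone who has not met this mechanism, here is a small self-contained sketch of how poisoning an include guard blocks a header. The `DEMO_` names are invented; only `#pragma GCC poison` itself is the real GCC (and Clang) extension.

```c
#include <assert.h>

/* Stand-in for the macro a front-end source file is compiled with.  */
#define DEMO_IN_FRONTEND 1

#ifdef DEMO_IN_FRONTEND
/* From here on, any use of this identifier, including in #ifndef or
   #define, is a hard error.  */
#pragma GCC poison DEMO_RTL_H
#endif

/* A middle-end header that opens with the usual "#ifndef guard /
   #define guard" pair on the poisoned name can no longer be included
   below this point: the guard test itself fails to compile.  */

static int
demo_frontend_entry (void)
{
  /* Nothing in front-end code may mention the poisoned identifier.  */
  return 0;
}
```

This is why a front-end file that includes rtl.h after system.h is expected to fail to build rather than silently pick up middle-end declarations.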

BTW you also include output.h in those two files, and I am about two
patches away from adding output.h to the list of headers that no front
end should ever include (a front end should never have to write out
assembly). Can you please check what you need output.h for, and fix
this?


What are you calling targetm.asm_out.output_mi_thunk and
targetm.asm_out.generate_internal_label for? Thunks and aliases should
go through cgraphunit.

(NB: This also means that this front end cannot work with LTO. IMHO we
shouldn't let in new front ends that don't work with LTO.)


Many functions have no leading comment, and other GNU coding standard
requirements are not followed either. Those should IMHO be fixed also,
before this front end can be accepted.


There is this comment:
+/* GCC does not support jumps from asm statements.

This isn't really true anymore, as your patch also notes:
+   --
+   %% Fix for GCC-4.5+
+   GCC now accepts a 5th operand, ASM_LABELS.
(...)
+   For prior versions of gcc, this requires a backpatch.

It seems to me that if this front end is contributed, handling of
prior version of gcc isn't necessary anymore - that code should just
be removed.


+
+   case Op_de:
+#ifndef TARGET_80387
+#define XFmode TFmode
+#endif
+ mode = XFmode; // not TFmode

What is this hack for? This is not the way to find the right mode for
this operation.

+#ifdef TARGET_80387
+#include "d-asm-i386.h"
+#else
+#define D_NO_INLINE_ASM_AT_ALL
+#endif
+
+/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. */

Idem here. And Apple GCC is irrelevant too, if this front end lands on
FSF trunk.

What is d/d-asm-i386.h for? It looks like i386 is a special case
throughout the front end.


In d-gcc-tree.h:
+// normally include config.h (hconfig.h, tconfig.h?), but that
+// includes things that cause problems, so...
+
+union tree_node;
+typedef union tree_node *tree;

See coretypes.h.

Ciao!
Steven


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

 [PATCH 1/4]:
 The D compiler frontend
  -  gcc/d

Only selectively reviewed, but here are some comments:

 diff -Naur gcc-4.8-20120617/gcc/d/asmstmt.cc gcc-4.8/gcc/d/asmstmt.cc
 --- gcc-4.8-20120617/gcc/d/asmstmt.cc   1970-01-01 01:00:00.0 +0100
 +++ gcc-4.8/gcc/d/asmstmt.cc2012-06-05 13:42:09.044876794 +0100
 @@ -0,0 +1,2731 @@
 +// asmstmt.cc -- D frontend for GCC.
 +// Originally contributed by David Friedman
 +// Maintained by Iain Buclaw
 +
 +// GCC is free software; you can redistribute it and/or modify it under

Every file more than ten lines long needs a copyright notice as well as 
the license notice.  See 
http://www.gnu.org/prep/maintain/html_node/Copyright-Notices.html for 
instructions, including the case of multiple copyright holders - though if 
there are any significant (more than fifteen lines of copyrightable text 
or so) contributors not assigning copyright to the FSF then special 
approval from the FSF will be needed to include the front end.

I would say that the files in dfrontend/ need copyright and license 
notices as well, though not necessarily in exactly GNU form.  Thus, you 
will need to get Digital Mars to approve appropriate notices for those 
files (aav.c is the first I see that's lacking such a notice but is long 
enough to need one; likewise async.c, gnuc.c, speller.c; rmem.c just says 
"All Rights Reserved" and needs a proper license notice like other files; 
likewise rmem.h).

 +#ifdef TARGET_80387
 +#include "d-asm-i386.h"
 +#else
 +#define D_NO_INLINE_ASM_AT_ALL
 +#endif

Ugh.  We want to move away from target macros, and this isn't even a 
proper target macro.  It would be better to define target hooks for the D 
inline asm support - possibly with a D-specific hook structure, like the C 
hooks structure.  (Even if you avoid needing copyright assignments for the 
front end itself, such hook implementations will probably need to be 
assigned.)
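
A rough sketch of what such a hook structure could look like follows. The structure, its single hook and the two implementations are all hypothetical names chosen for illustration; nothing like this exists in the posted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical D-specific hook table; each target fills in what it
   supports.  */
struct demo_d_target_hooks
{
  /* Return true if TEXT is inline asm this target can translate.  */
  bool (*inline_asm_supported_p) (const char *text);
};

/* Default used by targets without D inline-asm support.  */
static bool
demo_default_inline_asm_supported_p (const char *text)
{
  (void) text;
  return false;
}

/* Hypothetical i386 implementation: accept any non-empty statement.  */
static bool
demo_i386_inline_asm_supported_p (const char *text)
{
  assert (text != NULL);
  return text[0] != '\0';
}

static const struct demo_d_target_hooks demo_generic_hooks
  = { demo_default_inline_asm_supported_p };
static const struct demo_d_target_hooks demo_i386_hooks
  = { demo_i386_inline_asm_supported_p };

/* Front-end code queries the hook instead of testing a target macro.  */
static bool
demo_can_lower_asm (const struct demo_d_target_hooks *hooks,
                    const char *text)
{
  assert (hooks != NULL);
  return hooks->inline_asm_supported_p (text);
}
```

The point of the design is that the front end compiles identically for every target and the target-specific decision is made through a function pointer at run time.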

 +/* Apple GCC extends ASM_EXPR to five operands; cannot use build4. */

I don't see why that should be in the least relevant to a contribution to 
FSF GCC.  If you can do things in a more natural way in FSF GCC, then do 
so.

Each function in the GCC-specific parts of the code should have a comment 
on it, explaining the semantics of the function, its operands and its 
return value if any.

For new code in GCC, it's better to use snprintf than sprintf.
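
A minimal sketch of the difference: `snprintf` takes the buffer size and reports how long the full result would have been, so truncation can be detected instead of overflowing. The helper name `demo_make_label` and the label format are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Format "<prefix>_<n>" into BUF of SIZE bytes.  Return true if the
   whole string fit; on false, BUF holds a truncated but still
   NUL-terminated string.  */
static bool
demo_make_label (char *buf, size_t size, const char *prefix, int n)
{
  assert (buf != NULL && size > 0);
  int len = snprintf (buf, size, "%s_%d", prefix, n);
  return len >= 0 && (size_t) len < size;
}
```

With `sprintf` the same call on a too-small buffer would write past its end.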

 +extern void decode_options (struct gcc_options *, struct gcc_options *,

Please use appropriate headers rather than local declarations of GCC 
functions.

 +// d-bi-attr.h -- D frontend for GCC.

This file looks like it's largely copied from elsewhere in GCC.  In such a 
case, please work out a better way to refactor the code so that it can be 
shared rather than duplicated.  (Again, such common code will no doubt 
need full copyright assignments.)

I don't know whether your assignment "Assigns Past and Future Changes to 
the GNU D Compiler (GDC)" covers changes elsewhere in GCC.  But I expect a 
general assignment for GCC to be needed for any refactoring involved in 
adapting common code for use in D.  (And such refactoring would be a new 
contribution so there shouldn't be any issues with unknown previous 
contributors without assignments - those would only arise if significant 
amounts of previously written D front-end code are being moved into common 
code.)

 +#if D_VA_LIST_TYPE_VOIDPTR

Please avoid #if conditionals on anything that could be a target property.  
It's generally better to use "if" conditionals instead of "#if", so that all 
cases are checked for syntax in all compiles.

I see #if conditions on defines such as V2 and V1 as well.  Unless 
something is an *existing* target macro or configure macro in GCC, use 
"if" conditions and ensure that the macro is defined to true or false 
values (rather than defined or not defined).  But if a macro is always 
defined, or never defined, then just avoiding the conditionals may be 
better.
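
A short sketch of the suggested pattern, using a made-up macro name (`DEMO_VA_LIST_IS_VOIDPTR`): the macro is always defined to 0 or 1 and tested with an ordinary `if`, so both branches are parsed and type-checked in every build.

```c
#include <assert.h>

/* Always defined, to 0 or 1, rather than defined-or-not.  */
#ifndef DEMO_VA_LIST_IS_VOIDPTR
#define DEMO_VA_LIST_IS_VOIDPTR 0
#endif

/* Both branches are compiled in every configuration; the optimizer
   can discard the dead one because the condition is a constant.  */
static const char *
demo_va_list_kind (void)
{
  assert (DEMO_VA_LIST_IS_VOIDPTR == 0 || DEMO_VA_LIST_IS_VOIDPTR == 1);
  if (DEMO_VA_LIST_IS_VOIDPTR)
    return "void *";
  else
    return "target va_list";
}
```

With `#if`, a syntax error in the branch that is not selected would go unnoticed until someone builds the other configuration.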

The gcc/d/dfrontend/readme.txt says:

 +These sources are free, they are redistributable and modifiable
 +under the terms of the GNU General Public License (attached as gpl.txt),
 +or the Artistic License (attached as artistic.txt).

But that license is GPLv2.  We need an explicit notice (approved by the 
copyright holder) saying that *any later version* may be used.  If Digital 
Mars wishes to license the separately maintained dfrontend/ code under 
GPLv2+ rather than GPLv3+, that's fine, just like the gofrontend/ code is 
under a permissive license - but it needs to be explicit that any later 
version may be used.

I haven't studied the details of the dfrontend/ code.  But if you are to 
follow the Go model - separately maintained code for the front end proper 
that may be used verbatim in multiple compilers, with the code outside 
dfrontend/ doing everything related to interfacing with GCC, and only 
what's related to interfacing with GCC - then the

 +/* NOTE: This file has been patched 

[PATCH, i386]: Introduce FRNDINT_ROUNDING int iterator

2012-06-19 Thread Uros Bizjak
Hello!

2012-06-19  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (FRNDINT_ROUNDING): New int iterator.
(rounding): New int attribute.
(ROUNDING): Ditto.
(frndintxf2_<rounding>): Macroize insn from
frndintxf2_{floor,ceil,trunc} using FRNDINT_ROUNDING int iterator.
(frndintxf2_<rounding>_i387): Macroize insn from
frndintxf2_{floor,ceil,trunc}_i387 using FRNDINT_ROUNDING int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32}.
Will be committed to mainline SVN.

BTW: A follow-up patch will also macroize fist<mode>2_{floor,ceil} and friends.

Uros.
Index: i386.md
===
--- i386.md (revision 188781)
+++ i386.md (working copy)
@@ -15099,11 +15099,26 @@
   DONE;
 })
 
+(define_int_iterator FRNDINT_ROUNDING
+   [UNSPEC_FRNDINT_FLOOR
+UNSPEC_FRNDINT_CEIL
+UNSPEC_FRNDINT_TRUNC])
+
+(define_int_attr rounding
+   [(UNSPEC_FRNDINT_FLOOR floor)
+(UNSPEC_FRNDINT_CEIL ceil)
+(UNSPEC_FRNDINT_TRUNC trunc)])
+
+(define_int_attr ROUNDING
+   [(UNSPEC_FRNDINT_FLOOR FLOOR)
+(UNSPEC_FRNDINT_CEIL CEIL)
+(UNSPEC_FRNDINT_TRUNC TRUNC)])
+
 ;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split frndintxf2_floor
+(define_insn_and_split frndintxf2_rounding
   [(set (match_operand:XF 0 register_operand)
(unspec:XF [(match_operand:XF 1 register_operand)]
-UNSPEC_FRNDINT_FLOOR))
+  FRNDINT_ROUNDING))
(clobber (reg:CC FLAGS_REG))]
   TARGET_USE_FANCY_MATH_387
 flag_unsafe_math_optimizations
@@ -15112,30 +15127,30 @@
1
   [(const_int 0)]
 {
-  ix86_optimize_mode_switching[I387_FLOOR] = 1;
+  ix86_optimize_mode_switching[I387_ROUNDING] = 1;
 
   operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR);
+  operands[3] = assign_386_stack_local (HImode, SLOT_CW_ROUNDING);
 
-  emit_insn (gen_frndintxf2_floor_i387 (operands[0], operands[1],
-   operands[2], operands[3]));
+  emit_insn (gen_frndintxf2_rounding_i387 (operands[0], operands[1],
+operands[2], operands[3]));
   DONE;
 }
   [(set_attr type frndint)
-   (set_attr i387_cw floor)
+   (set_attr i387_cw rounding)
(set_attr mode XF)])
 
-(define_insn frndintxf2_floor_i387
+(define_insn frndintxf2_rounding_i387
   [(set (match_operand:XF 0 register_operand =f)
(unspec:XF [(match_operand:XF 1 register_operand 0)]
-UNSPEC_FRNDINT_FLOOR))
+  FRNDINT_ROUNDING))
(use (match_operand:HI 2 memory_operand m))
(use (match_operand:HI 3 memory_operand m))]
   TARGET_USE_FANCY_MATH_387
 flag_unsafe_math_optimizations
   fldcw\t%3\n\tfrndint\n\tfldcw\t%2
   [(set_attr type frndint)
-   (set_attr i387_cw floor)
+   (set_attr i387_cw rounding)
(set_attr mode XF)])
 
 (define_expand floorxf2
@@ -15357,45 +15372,6 @@
   DONE;
 })
 
-;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split frndintxf2_ceil
-  [(set (match_operand:XF 0 register_operand)
-   (unspec:XF [(match_operand:XF 1 register_operand)]
-UNSPEC_FRNDINT_CEIL))
-   (clobber (reg:CC FLAGS_REG))]
-  TARGET_USE_FANCY_MATH_387
-flag_unsafe_math_optimizations
-can_create_pseudo_p ()
-  #
-   1
-  [(const_int 0)]
-{
-  ix86_optimize_mode_switching[I387_CEIL] = 1;
-
-  operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_CEIL);
-
-  emit_insn (gen_frndintxf2_ceil_i387 (operands[0], operands[1],
-  operands[2], operands[3]));
-  DONE;
-}
-  [(set_attr type frndint)
-   (set_attr i387_cw ceil)
-   (set_attr mode XF)])
-
-(define_insn frndintxf2_ceil_i387
-  [(set (match_operand:XF 0 register_operand =f)
-   (unspec:XF [(match_operand:XF 1 register_operand 0)]
-UNSPEC_FRNDINT_CEIL))
-   (use (match_operand:HI 2 memory_operand m))
-   (use (match_operand:HI 3 memory_operand m))]
-  TARGET_USE_FANCY_MATH_387
-flag_unsafe_math_optimizations
-  fldcw\t%3\n\tfrndint\n\tfldcw\t%2
-  [(set_attr type frndint)
-   (set_attr i387_cw ceil)
-   (set_attr mode XF)])
-
 (define_expand ceilxf2
   [(use (match_operand:XF 0 register_operand))
(use (match_operand:XF 1 register_operand))]
@@ -15613,45 +15589,6 @@
   DONE;
 })
 
-;; Rounding mode control word calculation could clobber FLAGS_REG.
-(define_insn_and_split frndintxf2_trunc
-  [(set (match_operand:XF 0 register_operand)
-   (unspec:XF [(match_operand:XF 1 register_operand)]
-UNSPEC_FRNDINT_TRUNC))
-   (clobber (reg:CC FLAGS_REG))]
-  TARGET_USE_FANCY_MATH_387
-flag_unsafe_math_optimizations
-can_create_pseudo_p ()
-  #
-   1
-  [(const_int 0)]
-{
-  ix86_optimize_mode_switching[I387_TRUNC] = 1;
-
-  operands[2] = 

Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

 http://www.gdcproject.org/files/gdc_libphobos.patch.gz

Same comments as before about FSF postal addresses.

Although runtime libraries need not be assigned to the FSF (as per the GCC 
Mission Statement), all significant files should still have copyright and 
license notices (approved by all significant contributors) so that people 
know the free software terms under which they may be used.  E.g., 
libphobos/libdruntime/config/x3.c appears to be missing such notices.  
Without a license (or a dedication to the public domain), a file is 
presumptively copyright and has no license for anyone to use it at all.

 +if true; then

"if true" seems odd; if you have a good reason for it, you need to comment 
it.

 +# generated automatically by aclocal 1.9.6 -*- Autoconf -*-

Please use the standard documented autoconf/automake versions for GCC 
(autoconf 2.64, automake 1.11.1).

 diff -Naur gcc-4.8-20120617/libphobos/autom4te.cache/output.0 
 gcc-4.8/libphobos/autom4te.cache/output.0

We don't check in autom4te.cache directories.

 +# libphobos is usually a symlink to gcc/d/phobos, so libphobos/..

No it's not.  No runtime libraries should go under gcc/ any more at all.

 +dnl Copied from libstdc++-v3/acinclude.m4.  Indeed, multilib will not work

Refactor into the config/ directory, don't copy.

 \ No newline at end of file

Add any missing newlines to text files in all patches.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, gdc] - Merging gdc (GNU D Compiler) into gcc

2012-06-19 Thread Joseph S. Myers
On Mon, 18 Jun 2012, Iain Buclaw wrote:

 http://www.gdcproject.org/files/gdc_testsuite.patch.gz

I have no comments on this patch for now.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [Patch] Adjustments for Windows x64 SEH

2012-06-19 Thread Richard Henderson
On 2012-06-18 05:22, Tristan Gingold wrote:
 +  /* Win64 SEH, very large frames need a frame-pointer as maximum stack
 + allocation is 4GB (add a safety guard for saved registers).  */
 +  if (TARGET_64BIT_MS_ABI && get_frame_size () + 4096 > SEH_MAX_FRAME_SIZE)
 +return true;

Elsewhere you say this is an upper bound for stack use by the prologue.
It's clearly a wild guess.  The maximum stack use is 10*sse + 8*int 
registers saved, which is a lot less than 4096.

That said, I'm ok with *using* 4096 so long as the comment clearly
states that it's a large over-estimate.  I do suggest, however, folding
this into the SEH_MAX_FRAME_SIZE value, and expanding on the comment
there.  I see no practical difference between 0x8000 and 0x7fffe000
being the limit.
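
To put numbers on that estimate, here is a minimal sketch, assuming
16-byte save slots for SSE registers and 8-byte slots for integer
registers.  The helper names are made up for illustration and do not
come from the patch; the second function only shows what folding the
guard into the limit would look like at the use site.

```c
#include <stddef.h>

/* Assumed save-slot sizes on x86-64: 16 bytes per SSE register,
   8 bytes per integer register.  */
enum { SSE_SAVE_BYTES = 16, INT_SAVE_BYTES = 8 };

/* Worst-case prologue save area: 10 SSE + 8 integer registers.  */
static size_t
max_prologue_save_bytes (void)
{
  return 10 * SSE_SAVE_BYTES + 8 * INT_SAVE_BYTES;
}

/* Hypothetical check with the guard already folded into the limit,
   so no separate "+ 4096" is needed where the limit is tested.  */
static int
frame_too_large_p (unsigned long long frame_size,
                   unsigned long long folded_limit)
{
  return frame_size > folded_limit;
}
```

That comes to 224 bytes, so 4096 is a generous over-estimate.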

 +/* Output assembly code to get the establisher frame (Windows x64 only).
 +   This corresponds to what will be computed by Windows from Frame Register
 +   and Frame Register Offset fields of the UNWIND_INFO structure.  Since
 +   these values are computed very late (by ix86_expand_prologue), we cannot
 +   express this using only RTL.  */
 +
 +const char *
 +ix86_output_establisher_frame (rtx target)
 +{
 +  if (!frame_pointer_needed)
 +{
 +  /* Note that we have advertized an lea operation.  */
 +  output_asm_insn ("lea{q}\t{0(%%rsp), %0|%0, 0[rsp]}", &target);
 +}
 +  else
 +{
 +  rtx xops[3];
 +  struct ix86_frame frame;
 +
 +  /* Recompute the frame layout here.  */
 +  ix86_compute_frame_layout (&frame);
 +
 +  /* Closely follow how the frame pointer is set in
 +  ix86_expand_prologue.  */
 +  xops[0] = target;
 +  xops[1] = hard_frame_pointer_rtx;
 +  if (frame.hard_frame_pointer_offset == frame.reg_save_offset)
 + xops[2] = GEN_INT (0);
 +  else
 + xops[2] = GEN_INT (-(frame.stack_pointer_offset
 +  - frame.hard_frame_pointer_offset));
 +  output_asm_insn ("lea{q}\t{%a2(%1), %0|%0, %a2[%1]}", xops);

This is what register elimination is for; the value substitution happens
during reload.

Now, one *could* add a new pseudo-hard-register for this (we support as
many register eliminations as needed), but before we do that we need to
decide if we can adjust the soft frame pointer to be the value required.
If so, you can then rely on the existing __builtin_frame_address.  Which
is a very attractive sounding solution.  I'm 99% sure moving the sfp will work.
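
For reference, a minimal sketch of how code would consume the existing
builtin.  Whether the value it returns matches the Win64 establisher
frame is exactly the open question above, so this only shows the call
itself; current_frame is an illustrative name.

```c
#include <stddef.h>

/* __builtin_frame_address (0) yields the frame address of the
   current function; noinline keeps a real frame around.  */
__attribute__((noinline)) static void *
current_frame (void)
{
  return __builtin_frame_address (0);
}
```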


r~


Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Jiří Hruška
On Tue, Jun 19, 2012 at 10:54 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 The issue is that your testcase is invalid.
     int x = ret(*(&fooS + i));
 this access is only ever valid for i == 0 as otherwise you are creating
 a pointer that points outside of the object fooS.

Richard,

thanks for your reply.

The testcase is also invalid for other reasons. A big one is that the
automatic sorting and merging of sections with a dollar sign in their
names is a Windows-originated extension used for the PE target only, so
it does not work elsewhere. Sorry about that, I'll refrain from using
anything non-standard here.


Accessing outside object bounds is IMO a common C practice allowed by
the existence of pointers. This exact technique is used for
decentralized lists created during compile-time, be it extensible
handler/hook structures, pointers to init/fini functions etc. It has
notable use e.g. in Linux kernel [1], [2].

The programmer places the data in a special linker section in
individual compilation units, then traverses it using linker-provided
symbols (e.g. ld creates __start_<section-name> and
__stop_<section-name> automatically), as test0.c shows:
  $ gcc -O1 -m32 -fno-toplevel-reorder test0.c && ./a.out
  0: 1
  1: 2
  2: 3

The sole reason for messing with the section attributes is to keep the
values together. Because I can force the order (to the necessary
extent) by -fno-toplevel-reorder, the program can be changed to use
just bounding variables without any linker magic (test1.c):
  $ gcc -O1 -m32 -fno-toplevel-reorder test1.c && ./a.out
  0: 1
  1: 2
  2: 3
The only changes in the code are removing the section attributes and
adding an offset of one to skip the starting element (as __start_foo
has a size now).

Now, changing the end condition from test for the end address to test
for the end sentinel -1 and duplicating the printf() line (to hit the
right optimization spot), something weird happens (test2.c):
  $ gcc -O1 -m32 -fno-toplevel-reorder test2.c && ./a.out
  0: 1
  0: -1
  1: 2
  1: -1
  2: 3
  2: -1
Why is the second line in each iteration different from the first? It
should be printing exactly the same expression.
Analyzing the dom phase log shows that the memory access is folded to
the constant value of the base variable, hence -1.
And without optimization, both of them are correct:
  $ gcc -O0 -m32 test2.c && ./a.out
  0: 1
  0: 1
  1: 2
  1: 2
  2: 3
  2: 3

That is the problem I am talking about and which the patch aims to address.
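
For comparison, here is a sketch (not one of the attached testcases) of
the same sentinel loop over a single array object, where every access
stays inside one object and so the objection about pointing outside of
it does not apply.  The name foo_table and the mismatch counter are
illustrative assumptions.

```c
#include <stddef.h>

/* One array object holds the leading sentinel, the values and the
   trailing sentinel, so every access below stays inside it.  */
static const int foo_table[] = { -1, 1, 2, 3, -1 };

/* Count the entries between the two sentinels, reading each entry
   twice the way test2.c prints it twice, and record disagreements.  */
static int
count_entries (int *mismatches)
{
  const int *p = foo_table + 1;
  int i = 0;

  *mismatches = 0;
  do
    {
      int first = p[i];
      int second = p[i];
      if (first != second)
        (*mismatches)++;
      i++;
    }
  while (p[i] != -1);

  return i;
}
```

With this layout both reads of an element have to agree at any
optimization level, since the loop never leaves the array.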

Jiri


[1] 
http://www.compsoc.man.ac.uk/~moz/kernelnewbies/documents/initcall/index.html
[2] http://lkml.indiana.edu/hypermail/linux/kernel/0706.2/2552.html
#include <stdio.h>

__attribute__((section("foo"))) const int foo1 = 1;
__attribute__((section("foo"))) const int foo2 = 2;
__attribute__((section("foo"))) const int foo3 = 3;

extern const int __start_foo, __stop_foo;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + i));
    i++;
  } while(&__start_foo + i != &__stop_foo);

  return 0;
}
#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    i++;
  } while(&__start_foo + 1 + i != &__stop_foo);

  return 0;
}
#include <stdio.h>

const int __start_foo = -1;
const int foo1 = 1;
const int foo2 = 2;
const int foo3 = 3;
const int __stop_foo = -1;

int main(void)
{
  int i;

  i = 0;
  do {
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    printf("%d: %d\n", i, *(&__start_foo + 1 + i));
    i++;
  } while(*(&__start_foo + 1 + i) != -1);

  return 0;
}


Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)

2012-06-19 Thread Sterling Augustine
On Wed, Jun 13, 2012 at 10:47 PM, Jason Merrill ja...@redhat.com wrote:
 On 06/13/2012 04:26 PM, Sterling Augustine wrote:

 I lean toward -g myself, since there doesn't seem to be a strong rule one
 way or the other.


 Unless there are further comments, I'll stick with -g then.

 I think that covers all the comments, so I think I will commit this
 Friday morning unless I hear anything further.


 Weren't you going to repost the patch first?  :)

I hate how codereview.appspot.com doesn't connect some messages properly.

After this prompting, I re-posted the patch here:

http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00949.html

As this has addressed all previous comments, and barring any
objections, I'll check it in tomorrow morning.

Sterling


Re: [PATCH 2/3] Add XLP-specific atomic instructions and tweaks.

2012-06-19 Thread Richard Sandiford
Maxim Kuvyrkov ma...@codesourcery.com writes:
 The only other change that I made that was not in your comments is the
 addition of the "b" mips_print_operand specifier.  The LDADD and SWAP
 instructions accept their address as a plain register without
 parentheses,

Ouch.

 so I've added the specifier to skip outputting parentheses.

Yeah, good idea.

Patch is OK, thanks.

Richard


Re: [PATCH, i386]: Introduce FIST_ROUNDING int iterator

2012-06-19 Thread Uros Bizjak
Hello!

2012-06-19  Uros Bizjak  ubiz...@gmail.com

* config/i386/i386.md (FIST_ROUNDING): New int iterator.
(rounding): Handle UNSPEC_FIST_{FLOOR,CEIL}.
(ROUNDING): Ditto.
(*fist<mode>2_<rounding>_1): Macroize insn from
*fist<mode>2_{floor,ceil}_1 using FIST_ROUNDING int iterator.
(fistdi2_<rounding>): Macroize insn from
fistdi2_{floor,ceil} using FIST_ROUNDING int iterator.
(fistdi2_<rounding>_with_temp and splitters): Macroize insn and
corresponding splitters from fistdi2_{floor,ceil} and corresponding
splitters using FIST_ROUNDING int iterator.
(fist<mode>2_<rounding>): Macroize insn from
fist<mode>2_{floor,ceil} using FIST_ROUNDING int iterator.
(fist<mode>2_<rounding>_with_temp and splitters): Macroize insn and
corresponding splitters from fist<mode>2_{floor,ceil} and corresponding
splitters using FIST_ROUNDING int iterator.
(l<rounding>xf<mode>2): Macroize expander from l{floor,ceil}xf<mode>2
using FIST_ROUNDING int iterator.

Bootstrapped and regression tested on x86_64-pc-linux-gnu {,-m32},
committed to mainline SVN.
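
For readers less familiar with int iterators, a loose C analogy (an
illustration only, not how the .md file is processed): one template is
instantiated once per rounding variant, with the variant-specific part
looked up by name, much as <rounding> is substituted per
FIST_ROUNDING value.  All names below are made up for the sketch.

```c
/* Per-rounding adjustment of a truncated value; these play the role
   of the floor/ceil-specific pieces selected by the iterator.  */
static long adjust_floor (double x, long t) { return (double) t > x ? t - 1 : t; }
static long adjust_ceil (double x, long t) { return (double) t < x ? t + 1 : t; }

/* One template, analogous to a single macroized pattern.  */
#define DEFINE_FIST(rounding)                      \
  static long fist_##rounding (double x)           \
  {                                                \
    long t = (long) x; /* truncates toward zero */ \
    return adjust_##rounding (x, t);               \
  }

/* Analogous to iterating over UNSPEC_FIST_FLOOR and UNSPEC_FIST_CEIL.  */
DEFINE_FIST (floor)
DEFINE_FIST (ceil)
```

The two expansions yield fist_floor and fist_ceil from a single body,
which is the duplication the patch removes from the floor and ceil
patterns.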

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 188783)
+++ config/i386/i386.md (working copy)
@@ -15104,15 +15104,23 @@
 UNSPEC_FRNDINT_CEIL
 UNSPEC_FRNDINT_TRUNC])
 
+(define_int_iterator FIST_ROUNDING
+   [UNSPEC_FIST_FLOOR
+UNSPEC_FIST_CEIL])
+
 (define_int_attr rounding
    [(UNSPEC_FRNDINT_FLOOR "floor")
     (UNSPEC_FRNDINT_CEIL "ceil")
-    (UNSPEC_FRNDINT_TRUNC "trunc")])
+    (UNSPEC_FRNDINT_TRUNC "trunc")
+    (UNSPEC_FIST_FLOOR "floor")
+    (UNSPEC_FIST_CEIL "ceil")])
 
 (define_int_attr ROUNDING
    [(UNSPEC_FRNDINT_FLOOR "FLOOR")
     (UNSPEC_FRNDINT_CEIL "CEIL")
-    (UNSPEC_FRNDINT_TRUNC "TRUNC")])
+    (UNSPEC_FRNDINT_TRUNC "TRUNC")
+    (UNSPEC_FIST_FLOOR "FLOOR")
+    (UNSPEC_FIST_CEIL "CEIL")])
 
 ;; Rounding mode control word calculation could clobber FLAGS_REG.
 (define_insn_and_split "frndintxf2_<rounding>"
@@ -15205,174 +15213,59 @@
   DONE;
 })
 
-(define_insn_and_split "*fist<mode>2_floor_1"
-  [(set (match_operand:SWI248x 0 "nonimmediate_operand")
-   (unspec:SWI248x [(match_operand:XF 1 "register_operand")]
-   UNSPEC_FIST_FLOOR))
-   (clobber (reg:CC FLAGS_REG))]
+(define_expand "ceilxf2"
+  [(use (match_operand:XF 0 "register_operand"))
+   (use (match_operand:XF 1 "register_operand"))]
   "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations
-   && can_create_pseudo_p ()"
-  "#"
-  "&& 1"
-  [(const_int 0)]
+   && flag_unsafe_math_optimizations"
 {
-  ix86_optimize_mode_switching[I387_FLOOR] = 1;
+  if (optimize_insn_for_size_p ())
+    FAIL;
+  emit_insn (gen_frndintxf2_ceil (operands[0], operands[1]));
+  DONE;
+})
 
-  operands[2] = assign_386_stack_local (HImode, SLOT_CW_STORED);
-  operands[3] = assign_386_stack_local (HImode, SLOT_CW_FLOOR);
-  if (memory_operand (operands[0], VOIDmode))
-    emit_insn (gen_fist<mode>2_floor (operands[0], operands[1],
-                                      operands[2], operands[3]));
+(define_expand "ceil<mode>2"
+  [(use (match_operand:MODEF 0 "register_operand"))
+   (use (match_operand:MODEF 1 "register_operand"))]
+  "(TARGET_USE_FANCY_MATH_387
+    && (!(SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH)
+        || TARGET_MIX_SSE_I387)
+    && flag_unsafe_math_optimizations)
+   || (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
+       && !flag_trapping_math)"
+{
+  if (SSE_FLOAT_MODE_P (<MODE>mode) && TARGET_SSE_MATH
+      && !flag_trapping_math)
+    {
+      if (TARGET_ROUND)
+        emit_insn (gen_sse4_1_round<mode>2
+                   (operands[0], operands[1], GEN_INT (ROUND_CEIL)));
+      else if (optimize_insn_for_size_p ())
+        FAIL;
+      else if (TARGET_64BIT || (<MODE>mode != DFmode))
+        ix86_expand_floorceil (operands[0], operands[1], false);
+      else
+        ix86_expand_floorceildf_32 (operands[0], operands[1], false);
+    }
   else
     {
-      operands[4] = assign_386_stack_local (<MODE>mode, SLOT_TEMP);
-      emit_insn (gen_fist<mode>2_floor_with_temp (operands[0], operands[1],
-                                                  operands[2], operands[3],
-                                                  operands[4]));
-    }
-  DONE;
-}
-  [(set_attr "type" "fistp")
-   (set_attr "i387_cw" "floor")
-   (set_attr "mode" "<MODE>")])
+      rtx op0, op1;
 
-(define_insn "fistdi2_floor"
-  [(set (match_operand:DI 0 "memory_operand" "=m")
-   (unspec:DI [(match_operand:XF 1 "register_operand" "f")]
-      UNSPEC_FIST_FLOOR))
-   (use (match_operand:HI 2 "memory_operand" "m"))
-   (use (match_operand:HI 3 "memory_operand" "m"))
-   (clobber (match_scratch:XF 4 "=&1f"))]
-  "TARGET_USE_FANCY_MATH_387
-   && flag_unsafe_math_optimizations"
-  "* return output_fix_trunc (insn, operands, false);"
-  [(set_attr "type" "fistp")
-   (set_attr "i387_cw" "floor")
-   (set_attr "mode" "DI")])
+  if 

Re: [PATCH 2/3] Use synth_mult for vector multiplies vs scalar constant

2012-06-19 Thread Richard Henderson
On 2012-06-16 04:19, Eric Botcazou wrote:
 @@ -179,7 +179,11 @@ extern const unsigned char
 mode_class[NUM_MACHINE_MODES];

  extern CONST_MODE_SIZE unsigned char mode_size[NUM_MACHINE_MODES];
  #define GET_MODE_SIZE(MODE)((unsigned short) mode_size[MODE])
 -#define GET_MODE_BITSIZE(MODE) ((unsigned short) (GET_MODE_SIZE (MODE) *
 BITS_PER_UNIT)) +
 +#define GET_MODE_BITSIZE(MODE) \
 +  ((unsigned short) (GET_MODE_SIZE (MODE) * BITS_PER_UNIT))
 +#define GET_MODE_UNIT_BITSIZE(MODE) \
 +  ((unsigned short) (GET_MODE_UNIT_SIZE (MODE) * BITS_PER_UNIT))

  /* Get the number of value bits of an object of mode MODE.  */
  extern const unsigned short mode_precision[NUM_MACHINE_MODES];
 
 Can you move GET_MODE_UNIT_BITSIZE to after GET_MODE_UNIT_SIZE, changing
 "size in bytes" to "size in bytes and bits" in the comment just above?  Because the
 overloading of UNIT in the macro makes the whole thing slightly confusing. :-)
 

Done in the committed patch.
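
As a quick illustration of why the two macros differ for vector modes,
here is a toy model.  The struct, the table entries and the TOY_ names
are assumptions made for the sketch and are not GCC's real mode tables.

```c
/* Toy model of a machine mode: total size and per-element (unit)
   size in bytes.  */
struct toy_mode { unsigned short size; unsigned short unit_size; };

enum { TOY_BITS_PER_UNIT = 8 };

static const struct toy_mode toy_SImode = { 4, 4 };    /* scalar int */
static const struct toy_mode toy_V4SImode = { 16, 4 }; /* 4 x SImode */

/* Bits in the whole mode.  */
#define TOY_MODE_BITSIZE(M) \
  ((unsigned short) ((M).size * TOY_BITS_PER_UNIT))
/* Bits in one element of the mode.  */
#define TOY_MODE_UNIT_BITSIZE(M) \
  ((unsigned short) ((M).unit_size * TOY_BITS_PER_UNIT))
```

For the scalar mode both macros give 32; for the four-element vector
the first gives 128 and the second 32.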


r~



Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Andreas Schwab
Jiří Hruška ji...@fud.cz writes:

 #include stdio.h

 __attribute__((section(foo))) const int foo1 = 1;
 __attribute__((section(foo))) const int foo2 = 2;
 __attribute__((section(foo))) const int foo3 = 3;

 extern const int __start_foo, __stop_foo;

Declare them as arrays.

extern const int __start_foo[], __stop_foo[];
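
A minimal sketch of the loop with that declaration, assuming GNU ld on
an ELF target (it provides __start_foo and __stop_foo because "foo" is
a valid C identifier).  The "used" attribute and the count_foo helper
are additions for the sketch.

```c
/* Place three values in the "foo" section; "used" keeps them from
   being discarded even though nothing names them directly.  */
static const int foo1 __attribute__((section ("foo"), used)) = 1;
static const int foo2 __attribute__((section ("foo"), used)) = 2;
static const int foo3 __attribute__((section ("foo"), used)) = 3;

/* Provided by the linker; declared as arrays of unknown bound.  */
extern const int __start_foo[], __stop_foo[];

/* Count the ints the linker collected into the section.  */
static int
count_foo (void)
{
  int n = 0;
  for (const int *p = __start_foo; p != __stop_foo; p++)
    n++;
  return n;
}
```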

Andreas.

-- 
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
And now for something completely different.


Re: [PATCH] backport darwin12 fixes to gcc-4_7-branch

2012-06-19 Thread Mike Stump
Ok.


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Richard Henderson
On 2012-06-15 13:57, Richard Henderson wrote:
 Bootstrapped and tested on x86_64, but I'll leave some time for
 comment before committing any of this.

Patches now committed.


r~


Re: [PATCH] Improve pattern recognizer for division by constant (PR tree-optimization/51581)

2012-06-19 Thread Richard Henderson
On 2012-06-18 22:46, Jakub Jelinek wrote:
 On Mon, Jun 18, 2012 at 04:44:21PM -0700, Richard Henderson wrote:
 On 2012-06-14 13:58, Jakub Jelinek wrote:
 +  if (!supportable_widening_operation (WIDEN_MULT_EXPR, last_stmt,
 +  vecwtype, vectype,
 +  &dummy, &dummy, &dummy_code,
 +  &dummy_code, &dummy_int, &dummy_vec))
 +return NULL;


 It would be nice to be able to handle high-part multiplies as well, e.g. 
 VEC_WIDEN_MULT_HI_EXPR.  Which is what Altivec provides, and not 
 VEC_WIDEN_MULT.
 
 Sure, but we don't have a tree code for that right now, do we?
 VEC_WIDEN_MULT_HI_EXPR is just one half of the widened multiply results,
 not all the high halves of the widened multiply.

Actually, it is all the high parts of the multiply results.  The comment
in tree.def is incorrect.  Likewise MULT_LO_EXPR is the low parts (and
fully redundant with plain MULT_EXPR, really).

 For 16-bit multiplication we could also use {,V}PMULH{,U}W
 (for 32-bit multiplication we use two {,V}PMUL{,U}DQ plus shifts afterwards).

Well, a single interleave, not shifts, but yes.
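
For concreteness, a scalar sketch of what a high-part multiply buys for
division by a constant.  It models the per-element result of an
unsigned 16-bit high multiply (as PMULHUW computes) and uses it for
x / 3; the function names and the choice of divisor are only for
illustration.

```c
#include <stdint.h>

/* High 16 bits of an unsigned 16x16 -> 32 multiply.  */
static uint16_t
mulhi_u16 (uint16_t a, uint16_t b)
{
  return (uint16_t) (((uint32_t) a * (uint32_t) b) >> 16);
}

/* Unsigned 16-bit division by 3 via a high-part multiply and a
   shift.  0xAAAB is ceil (2^17 / 3), so the combined effect is
   floor (x * 0xAAAB / 2^17).  */
static uint16_t
div3_u16 (uint16_t x)
{
  return (uint16_t) (mulhi_u16 (x, 0xAAAB) >> 1);
}
```

Since 3 * 0xAAAB = 2^17 + 1, the error term x / (3 * 2^17) stays below
1/3 for every 16-bit x, so the result equals x / 3 across the whole
range.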


r~




Re: [PATCH] Fix PR53708

2012-06-19 Thread Iain Sandoe

On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:

 On Tue, 19 Jun 2012, Richard Guenther wrote:
 
 Richard Guenther rguent...@suse.de writes:
 We are too eager to bump alignment of some decls when vectorizing.
 The fix is to not bump alignment of decls the user explicitly
 aligned or that are used in an unknown way.
 
 I thought attribute((__aligned__)) only set a minimum alignment for
 variables?  Most uses I've seen have been trying to get better
 performance from higher alignment, so it might not go down well if the
 attribute stopped the vectoriser from increasing the alignment still
 further.
 
 That's what the documentation says indeed.  I'm not sure which part of
 the patch fixes the ObjC failures where the alignment is part of the ABI
 (and I suppose ObjC then mis-uses the aligned attribute?).
 
 A quick test shows that 
 
 if (DECL_PRESERVE_P (decl))
 
 alone is enough to fix the objc failures, while they are still there if 
 one uses only
 
 if (DECL_USER_ALIGN (decl))

That makes sense, I had a quick look at the ObjC code, and it appears that the 
explicit ALIGNs were never committed to trunk.

Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
that specific ABI (upper) alignment constraints are met?

Iain



Re: [patch] Deal with #ident without

2012-06-19 Thread Steven Bosscher
On Thu, Jun 7, 2012 at 11:22 AM, Richard Guenther
richard.guent...@gmail.com wrote:
 On Thu, Jun 7, 2012 at 8:16 AM, Andreas Schwab sch...@linux-m68k.org wrote:
 Steven Bosscher stevenb@gmail.com writes:

 Index: doc/tm.texi
 ===
 --- doc/tm.texi       (revision 188182)
 +++ doc/tm.texi       (working copy)
 @@ -5847,6 +5847,10 @@ value is 0.
  @end deftypevr

  @deftypefn {Target Hook} void TARGET_ASM_OUTPUT_ANCHOR (rtx @var{x})
 +
 +@deftypefn {Target Hook} void TARGET_ASM_OUTPUT_IDENT (const char 
 *@var{name})
 +Generate a string based on @var{name}, suitable for the @samp{#ident}  
 directive, or the equivalent directive or pragma in non-C-family languages. 
  If this hook is not defined, nothing is output for the @samp{#ident}  
 directive.
 +@end deftypefn

 That looks misplaced.

 Ok after double-checking the above.

I've now committed this, see r188791.

Ciao!
Steven


Re: [patch] Fix PR48109 using artificial top-level asm statements (darwin/objc)

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 10:51 AM, Steven Bosscher stevenb@gmail.com wrote:
 This patch started as an attempt to remove #include "output.h" from
 objc/: Instead of writing references directly to asm_out_file, the
 references are output as top-level asm statements.

  OK for trunk?

Ok.


Re: [patch] Use IDENTIFIER_LENGTH instead of strlen(IDENTIFIER_POINTER) in a few places

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 8:55 AM, Steven Bosscher stevenb@gmail.com wrote:
 Obvious enough
 
 objc/
* objc-encoding.c (encode_aggregate_fields): Use IDENTIFIER_LENGTH
instead of strlen(IDENTIFIER_POINTER).
(encode_aggregate_within): Likewise.

Ok.
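
The point of the change, sketched with a stand-in type (toy_identifier
and its helpers are illustrative, not GCC's tree nodes): when the length
is recorded at creation time, reading it is constant-time, whereas
strlen on the pointer rescans the string on every call.

```c
#include <stddef.h>
#include <string.h>

/* Illustrative stand-in for an identifier node that records its
   length when it is created.  */
struct toy_identifier { const char *str; size_t len; };

static struct toy_identifier
make_identifier (const char *s)
{
  struct toy_identifier id = { s, strlen (s) };
  return id;
}

/* O(1): read the stored length instead of rescanning the string.  */
static size_t
identifier_length (const struct toy_identifier *id)
{
  return id->len;
}
```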


Re: [PATCH] Bad code generation: incorrect folding of TARGET_MEM_REF into a constant

2012-06-19 Thread Jiří Hruška
On Tue, Jun 19, 2012 at 8:59 PM, Andreas Schwab sch...@linux-m68k.org wrote:
 Declare them as arrays.
 extern const int __start_foo[], __stop_foo[];
Thanks, that's a good suggestion, cleans the code nicely!
(Though, of course, both ways work here and the strange things happen
only in the 3rd testcase, which does not use these special variables.)


Re: [PATCH] Fix PR tree-optimization/53636 (SLP generates invalid misaligned access)

2012-06-19 Thread Mikael Pettersson
Richard Guenther writes:
  On Fri, Jun 15, 2012 at 5:00 PM, Ulrich Weigand uweig...@de.ibm.com wrote:
   Richard Guenther wrote:
   On Fri, Jun 15, 2012 at 3:13 PM, Ulrich Weigand uweig...@de.ibm.com 
   wrote:
However, there is a second case where we need to check every pass: if
we're not actually vectorizing any loop, but are performing basic-block
SLP.  In this case, it would appear that we need the same check as
described in the comment above, i.e. to verify that the stride is a
multiple of the vector size.
   
The patch below adds this check, and this indeed fixes the invalid 
access
I was seeing in the test case (in the final assembler, we now get a
vld1.16 instead of vldr).
   
Tested on arm-linux-gnueabi with no regressions.
   
OK for mainline?
  
   Ok.
  
   Thanks for the quick review; I've checked this in to mainline now.
  
   I just noticed that the test case also crashes on 4.7, but not on 4.6.
  
   Would a backport to 4.7 also be OK, once testing passes?
  
  Yes.  Please leave it on mainline a few days to catch fallout from
  autotesters.

This patch caused

FAIL: gcc.dg/vect/bb-slp-16.c scan-tree-dump-times slp "basic block vectorized using SLP" 1

on sparc64-linux.  Comparing the pre and post patch dumps for that file shows

 22: vect_compute_data_ref_alignment:
 22: misalign = 4 bytes of ref MEM[(unsigned int *)pout_90 + 28B]
 22: vect_compute_data_ref_alignment:
-22: force alignment of arr[i_87]
-22: misalign = 0 bytes of ref arr[i_87]
+22: SLP: step doesn't divide the vector-size.
+22: Unknown alignment for access: arr

(lots of stuff that's simply gone)

-22: BASIC BLOCK VECTORIZED
-
-22: basic block vectorized using SLP
+22: not vectorized: unsupported unaligned store.arr[i_87]
+22: not vectorized: unsupported alignment in basic block.

/Mikael


Re: [patch] Fix failing nested-3.C on ARM.

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 2:18 AM, Richard Earnshaw rearn...@arm.com wrote:
 The regexp in nested-3.C has to parse the machine-specific comment
 character; on ARM that is '@'.
 
 Tested on arm-eabi, where this test now passes.
 
 OK?

Ok.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote:
 On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:
 
 On Tue, 19 Jun 2012, Richard Guenther wrote:
 
 Richard Guenther rguent...@suse.de writes:
 We are too eager to bump alignment of some decls when vectorizing.
 The fix is to not bump alignment of decls the user explicitly
 aligned or that are used in an unknown way.
 
 I thought attribute((__aligned__)) only set a minimum alignment for
 variables?  Most uses I've seen have been trying to get better
 performance from higher alignment, so it might not go down well if the
 attribute stopped the vectoriser from increasing the alignment still
 further.
 
 That's what the documentation says indeed.  I'm not sure which part of
 the patch fixes the ObjC failures where the alignment is part of the ABI
 (and I suppose ObjC then mis-uses the aligned attribute?).
 
 A quick test shows that 
 
 if (DECL_PRESERVE_P (decl))
 
 alone is enough to fix the objc failures, while they are still there if 
 one uses only
 
 if (DECL_USER_ALIGN (decl))
 
 That makes sense, I had a quick look at the ObjC code, and it appears that 
 the explicit ALIGNs were never committed to trunk.
 
 Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
 that specific ABI (upper) alignment constraints are met?

Hum, upper is easy...  I thought the issue was that extra alignment would kill 
it?  I know that extra alignment does kill some of the objc metadata.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Mike Stump
On Jun 19, 2012, at 5:53 AM, domi...@lps.ens.fr (Dominique Dhumieres) wrote:
 On Tue, 19 Jun 2012, Richard Guenther wrote:
 
 Richard Guenther rguent...@suse.de writes:
 We are too eager to bump alignment of some decls when vectorizing.
 The fix is to not bump alignment of decls the user explicitly
 aligned or that are used in an unknown way.
 
 I thought attribute((__aligned__)) only set a minimum alignment for
 variables?  Most uses I've seen have been trying to get better
 performance from higher alignment, so it might not go down well if the
 attribute stopped the vectoriser from increasing the alignment still
 further.
 
 That's what the documentation says indeed.  I'm not sure which part of
 the patch fixes the ObjC failures where the alignment is part of the ABI
 (and I suppose ObjC then mis-uses the aligned attribute?).
 
 A quick test shows that 
 
 if (DECL_PRESERVE_P (decl))
 
 alone is enough to fix the objc failures,

Sounds good to me.  It seems ok to me for the optimizer to bump up the alignment
on things that aren't special.  DECL_PRESERVE seems like a reasonable way to
declare they are special.
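
A toy model of that rule, to make the proposal concrete.  The struct
and function are stand-ins invented for this sketch, not GCC's data
structures or the actual patch; only the preserve flag is consulted,
matching the quick test quoted above.

```c
/* Toy stand-in for a declaration; the fields mirror the two flags
   discussed in this thread.  */
struct toy_decl
{
  unsigned align;          /* current alignment in bits */
  unsigned preserve : 1;   /* models DECL_PRESERVE_P */
  unsigned user_align : 1; /* models DECL_USER_ALIGN */
};

/* Raise the alignment only for decls that are not marked preserved.
   user_align is deliberately not consulted here.  Returns nonzero
   if the alignment was changed.  */
static int
toy_force_alignment (struct toy_decl *d, unsigned wanted)
{
  if (d->preserve || d->align >= wanted)
    return 0;
  d->align = wanted;
  return 1;
}
```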


Re: [testsuite] profopt.exp and friends: use expected list of options

2012-06-19 Thread Mike Stump
On Jun 18, 2012, at 4:51 PM, Janis Johnson janis_john...@mentor.com wrote:
 There are tests in g++.tree-prof that have non-unique lines in test
 summaries for scan-*-dump checks.  Investigation showed that these tests
 were being run multiple times, for a list of options that had leaked
 over from another set of profile-directed optimization tests.

 This patch makes
 it use [ { -O2 } {-O3 } ] so the options tested there will get some
 coverage with optimization, although not as much as originally planned
 when the tests were added years and years ago.

Sounds ok to me, but I'd be happy to have a prof champion chime in, if they 
disagree.

 OK for mainline?

Ok, with the caveat that I'll defer to a prof champion.


Re: [PATCH] Fix PR53708

2012-06-19 Thread Iain Sandoe

On 19 Jun 2012, at 22:41, Mike Stump wrote:

 On Jun 19, 2012, at 12:22 PM, Iain Sandoe i...@codesourcery.com wrote:
 On 19 Jun 2012, at 13:53, Dominique Dhumieres wrote:
 
 On Tue, 19 Jun 2012, Richard Guenther wrote:
 
 Richard Guenther rguent...@suse.de writes:
 We are too eager to bump alignment of some decls when vectorizing.
 The fix is to not bump alignment of decls the user explicitly
 aligned or that are used in an unknown way.
 
 I thought attribute((__aligned__)) only set a minimum alignment for
 variables?  Most uses I've seen have been trying to get better
 performance from higher alignment, so it might not go down well if the
 attribute stopped the vectoriser from increasing the alignment still
 further.
 
 That's what the documentation says indeed.  I'm not sure which part of
 the patch fixes the ObjC failures where the alignment is part of the ABI
 (and I suppose ObjC then mis-uses the aligned attribute?).
 
 A quick test shows that 
 
 if (DECL_PRESERVE_P (decl))
 
 alone is enough to fix the objc failures, while they are still there if 
 one uses only
 
 if (DECL_USER_ALIGN (decl))
 
 That makes sense, I had a quick look at the ObjC code, and it appears that 
 the explicit ALIGNs were never committed to trunk.
 
 Thus, the question becomes; what should ObjC (or any other) FE do to ensure 
 that specific ABI (upper) alignment constraints are met?
 
 Hum, upper is easy...  I thought the issue was that extra alignment would 
 kill it?  I know that extra alignment does kill some of the objc metadata.

Clearly, ambiguous phrasing on my part.
I mean when we want to say "no more than this much".





Fix e500 vector ICE with string constants

2012-06-19 Thread Joseph S. Myers
On some tests involving storing a pointer to a string constant in a
vector, on powerpc with SPE vectors, an ICE occurs of the form:

t2.c: In function 'f':
t2.c:7:1: error: unrecognizable insn:
 }
 ^
(insn 9 8 10 2 (set (subreg:SI (reg:V2SI 125 [ D.1618 ]) 4)
(lo_sum:SI (reg:SI 126)
(symbol_ref/f:SI ("*.LC0") [flags 0x82] <var_decl
0xf745b000 *.LC0>))) t2.c:6 -1
 (nil))
t2.c:7:1: internal compiler error: in extract_insn, at recog.c:2130
Please submit a full bug report,
with preprocessed source if appropriate.
See http://gcc.gnu.org/bugs.html for instructions.

The patterns to set individual words of SPE vectors only allow
input_operand and do not allow for the LO_SUM constructs used for
pointers to strings.  This patch fixes things by adding further
patterns for the LO_SUM case.  (It's possible the issue could also
arise with the patterns for subregs of TFmode at offset 8 and 12, but
I couldn't get the compiler to generate stores of string constant
pointers to such subregs.)
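
As background on what the LO_SUM represents, a sketch of the usual
high/low split of a 32-bit address on PowerPC: the first instruction
loads a "high adjusted" half, and the LO_SUM adds the sign-extended low
16 bits.  The helper names are illustrative; the arithmetic is the
point.

```c
#include <stdint.h>

/* Low 16 bits of ADDR, sign-extended, as added by the second insn.  */
static int32_t
addr_lo (uint32_t addr)
{
  int32_t lo = (int32_t) (addr & 0xffffu);
  return lo >= 0x8000 ? lo - 0x10000 : lo;
}

/* High-adjusted half: rounded up when the low half will be negative,
   to compensate for the sign extension.  */
static uint32_t
addr_ha (uint32_t addr)
{
  return (addr + 0x8000u) >> 16;
}

/* Recombine the two halves the way the instruction pair does.  */
static uint32_t
addr_rebuild (uint32_t addr)
{
  return (addr_ha (addr) << 16) + (uint32_t) addr_lo (addr);
}
```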

The original test I had for this issue in a 4.6-based compiler
simplified to

char *a1[20];
int a2[20];
char a3[1];

void
f (void)
{
  int i;
  for (i = 1; i < 20; i++)
    {
      a1[i] = "";
      a2[i] = 0;
    }
}

with -O3, where the vectors were generated internally, but that
doesn't ICE with trunk, so I created the synthetic testcases in this
patch that do ICE with trunk.

Tested with no regressions with cross to powerpc-eabispe.  OK to
commit?

2012-06-19  Joseph Myers  jos...@codesourcery.com

* config/rs6000/spe.md (*mov_si<mode>_e500_subreg0): Rename to
mov_si<mode>_e500_subreg0.
(*mov_si<mode>_e500_subreg0_elf_low)
(*mov_si<mode>_e500_subreg4_elf_low): New patterns.

testsuite:
2012-06-19  Joseph Myers  jos...@codesourcery.com

* gcc.c-torture/compile/vector-5.c,
gcc.c-torture/compile/vector-6.c: New tests.

Index: gcc/testsuite/gcc.c-torture/compile/vector-5.c
===
--- gcc/testsuite/gcc.c-torture/compile/vector-5.c  (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/vector-5.c  (revision 0)
@@ -0,0 +1,7 @@
+typedef int v2si __attribute__((__vector_size__(8)));
+
+v2si
+f (int x)
+{
+  return (v2si) { x, (__INTPTR_TYPE__) "" };
+}
Index: gcc/testsuite/gcc.c-torture/compile/vector-6.c
===
--- gcc/testsuite/gcc.c-torture/compile/vector-6.c  (revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/vector-6.c  (revision 0)
@@ -0,0 +1,7 @@
+typedef int v2si __attribute__((__vector_size__(8)));
+
+v2si
+f (int x)
+{
+  return (v2si) { (__INTPTR_TYPE__) "", x };
+}
Index: gcc/config/rs6000/spe.md
===
--- gcc/config/rs6000/spe.md(revision 188753)
+++ gcc/config/rs6000/spe.md(working copy)
@@ -1,5 +1,5 @@
 ;; e500 SPE description
-;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009
+;; Copyright (C) 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2011, 2012
 ;; Free Software Foundation, Inc.
 ;; Contributed by Aldy Hernandez (a...@quesejoda.com)
 
@@ -2329,7 +2329,7 @@
   evmergehi %0,%1,%1\;mr %L0,%1\;evmergehi %Y0,%L1,%L1\;mr %Z0,%L1
   [(set_attr "length" "16")])
 
-(define_insn "*mov_si<mode>_e500_subreg0"
+(define_insn "mov_si<mode>_e500_subreg0"
   [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r,&r") 0)
    (match_operand:SI 1 "input_operand" "r,m"))]
   "(TARGET_E500_DOUBLE && (<MODE>mode == DFmode || <MODE>mode == TFmode))
@@ -2339,6 +2339,24 @@
    evmergelohi %0,%0,%0\;{l%U1%X1|lwz%U1%X1} %0,%1\;evmergelohi %0,%0,%0"
   [(set_attr "length" "4,12")])
 
+(define_insn_and_split "*mov_si<mode>_e500_subreg0_elf_low"
+  [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r") 0)
+   (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "r")
+      (match_operand 2 "" "")))]
+  "((TARGET_E500_DOUBLE && (<MODE>mode == DFmode || <MODE>mode == TFmode))
+    || (TARGET_SPE && <MODE>mode != DFmode && <MODE>mode != TFmode))
+   && TARGET_ELF && !TARGET_64BIT && can_create_pseudo_p ()"
+  "#"
+  "&& 1"
+  [(pc)]
+{
+  rtx tmp = gen_reg_rtx (SImode);
+  emit_insn (gen_elf_low (tmp, operands[1], operands[2]));
+  emit_insn (gen_mov_si<mode>_e500_subreg0 (operands[0], tmp));
+  DONE;
+}
+  [(set_attr "length" "8")])
+
 ;; ??? Could use evstwwe for memory stores in some cases, depending on
 ;; the offset.
 (define_insn "*mov_si<mode>_e500_subreg0_2"
@@ -2360,6 +2378,15 @@
    mr %0,%1
    {l%U1%X1|lwz%U1%X1} %0,%1")
 
+(define_insn "*mov_si<mode>_e500_subreg4_elf_low"
+  [(set (subreg:SI (match_operand:SPE64TF 0 "register_operand" "+r") 4)
+   (lo_sum:SI (match_operand:SI 1 "gpc_reg_operand" "r")
+      (match_operand 2 "" "")))]
+  "((TARGET_E500_DOUBLE && (<MODE>mode == DFmode || <MODE>mode == TFmode))
+    || (TARGET_SPE && <MODE>mode != DFmode && <MODE>mode != TFmode))
+   && TARGET_ELF && !TARGET_64BIT"
+  "{ai|addic} %0,%1,%K2")
+
 (define_insn "*mov_si<mode>_e500_subreg4_2"
   [(set (match_operand:SI 0 

[patch][PCH] Do not write/read asm_out_file, take 2

2012-06-19 Thread Steven Bosscher
Hello,

The attached patch removes one more #include "output.h", this time from
c-family/c-pch.c.

Anything written out to asm_out_file between pch_init and
c_common_write_pch is read back in by c_common_write_pch and dumped to
the PCH that's being written out. In c_common_read_pch this data is
written out verbatim to asm_out_file again.

But nothing should write to asm_out_file between pch_init and
c_common_write_pch. I suppose this happened before unit-at-a-time
became the only supported compilation mode, but these days there's
nothing, AFAICT, that should be written to asm_out_file by a front end
during PCH generation.
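
For readers unfamiliar with the old behaviour, the round trip described above can be sketched in isolation. This is only an illustration under stated assumptions: capture_since and replay are hypothetical helper names, not functions from c-pch.c, and the stream must be opened for both reading and writing.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: copy everything written to STREAM since byte
   offset START into a freshly allocated, NUL-terminated buffer.  This
   mimics reading back what was written to asm_out_file.  */
static char *
capture_since (FILE *stream, long start, size_t *len)
{
  long end;
  char *buf;

  assert (stream != NULL && len != NULL);
  fflush (stream);
  end = ftell (stream);
  if (end < start)
    return NULL;
  *len = (size_t) (end - start);
  buf = malloc (*len + 1);
  if (buf == NULL)
    return NULL;
  fseek (stream, start, SEEK_SET);
  if (fread (buf, 1, *len, stream) != *len)
    {
      free (buf);
      return NULL;
    }
  buf[*len] = '\0';
  /* Restore the write position so later output is appended.  */
  fseek (stream, end, SEEK_SET);
  return buf;
}

/* Hypothetical helper: write the captured bytes back out verbatim,
   as the PCH reader used to do.  Returns nonzero on success.  */
static int
replay (FILE *stream, const char *buf, size_t len)
{
  return fwrite (buf, 1, len, stream) == len;
}
```

If nothing is written between the two points, the captured length is zero and the whole round trip is dead weight, which is the argument for removing it.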

This patch was bootstrapped and tested on powerpc64-unknown-linux-gnu.
The issues with #ident have already been addressed, and this patch
adds a new test case, to make sure...

OK for trunk?

Ciao!
Steven


01_c_pch_no_asm_out_file.diff
Description: Binary data


Re: Fix e500 vector ICE with string constants

2012-06-19 Thread David Edelsohn
On Tue, Jun 19, 2012 at 5:56 PM, Joseph S. Myers
jos...@codesourcery.com wrote:

 2012-06-19  Joseph Myers  jos...@codesourcery.com

        * config/rs6000/spe.md (*mov_simode_e500_subreg0): Rename to
        mov_simode_e500_subreg0.
        (*mov_simode_e500_subreg0_elf_low)
        (*mov_simode_e500_subreg4_elf_low): New patterns.

 testsuite:
 2012-06-19  Joseph Myers  jos...@codesourcery.com

        * gcc.c-torture/compile/vector-5.c,
        gcc.c-torture/compile/vector-6.c: New tests.

Okay.

Thanks, David


[patch committed testsuite] Tweak gcc.dg/stack-usage-1.c on SH

2012-06-19 Thread Kaz Kojima
Hi,

I've applied the attached patch which is a tiny SH specific
change of gcc.dg/stack-usage-1.c test.  Tested on sh-linux
and i686-pc-linux-gnu.
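
To illustrate why the pattern matters, here is a small sketch using POSIX fnmatch as a stand-in for the testsuite's target matching. That substitution is an assumption made for illustration (DejaGnu does its own glob-style matching), and target_matches is a hypothetical helper name.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

/* Return nonzero if the target TRIPLET matches the glob PATTERN.
   POSIX fnmatch is used only as an approximation of the testsuite's
   own target matching.  */
static int
target_matches (const char *pattern, const char *triplet)
{
  assert (pattern != NULL && triplet != NULL);
  return fnmatch (pattern, triplet, 0) == 0;
}
```

With this helper, "sh-*-*" matches sh-unknown-linux-gnu but not sh4-unknown-linux-gnu, while "sh*-*-*" matches both.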

Regards,
kaz
--
2012-06-19  Kaz Kojima  kkoj...@gcc.gnu.org

* gcc.dg/stack-usage-1.c: Use sh*-*-* instead of sh-*-*.

--- ORIG/trunk/gcc/testsuite/gcc.dg/stack-usage-1.c 2012-06-16 09:29:54.0 +0900
+++ trunk/gcc/testsuite/gcc.dg/stack-usage-1.c  2012-06-19 07:55:54.0 +0900
@@ -1,6 +1,6 @@
 /* { dg-do compile } */
 /* { dg-options "-fstack-usage" } */
-/* { dg-options "-fstack-usage -fomit-frame-pointer" { target { sh-*-* } } } */
+/* { dg-options "-fstack-usage -fomit-frame-pointer" { target { sh*-*-* } } } */
 
 /* This is aimed at testing basic support for -fstack-usage in the back-ends.
See the SPARC back-end for example (grep flag_stack_usage_info in sparc.c).


[patch][ARM] Do not include output.h in arm-c.c

2012-06-19 Thread Steven Bosscher
Hello,

Only a few front-end files to go that need output.h, and some of them
are in the c_target_objs: arm, mep, m32c, and rl78.

This patch tackles the ARM case.  arm-c.c needs output.h because
EMIT_EABI_ATTRIBUTE wants to print to asm_out_file. Solved by
replacing EMIT_EABI_ATTRIBUTE with a function
arm.c:arm_emit_eabi_attribute.
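
The shape of that change can be sketched as follows. This is not the code from the attached patch: the function name, its signature and the exact output format are assumptions chosen for illustration, and asm_out_file is a local stand-in for GCC's global.

```c
#include <assert.h>
#include <stdio.h>

/* Local stand-in for GCC's asm_out_file.  The point of the change is
   that only the back-end file needs to see this stream.  */
static FILE *asm_out_file;

/* Hypothetical replacement for a macro that printed directly to
   asm_out_file.  Callers in front-end files only need this
   prototype, so they no longer have to include output.h.  */
void
sketch_emit_eabi_attribute (const char *name, int num, int val)
{
  assert (asm_out_file != NULL);
  fprintf (asm_out_file, "\t.eabi_attribute %d, %d", num, val);
  if (name != NULL)
    fprintf (asm_out_file, "\t@ %s", name);
  fputc ('\n', asm_out_file);
}
```

The design choice is simply to move the dependency on the output stream behind a function boundary, so the front-end side needs only a declaration.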

Tested by building a cross-compiler from powerpc64-unknown-linux-gnu X
arm-eabi, and comparing assembly on a set of files.
OK for trunk?

Ciao!
Steven


arm_C_no_output_h.diff
Description: Binary data


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Matt

On 2012-06-15 13:57, Richard Henderson wrote:

 Bootstrapped and tested on x86_64, but I'll leave some time for
 comment before committing any of this.



Patches now committed.


Hey Richard,

Thanks for taking on some of these issues. I'm not seeing much of an 
improvement yet when manually applying the patches to 4.7, but it looks 
like steps in the right direction. Having to turn off vectorization to 
approximate previous compiler performance was disappointing given it's 
supposed to give us a boost on some of these architectures ;)


Would it be possible to commit these to 4_7-branch as well? (One of the 
patches looks relevant to 4.6 as well, and applied cleanly, but I haven't 
tested to see if it had a noticeable effect.)


Thanks again!


--
tangled strands of DNA explain the way that I behave.
http://www.clock.org/~matt


[cxx-conversion] Remove option to build without a C++ compiler (issue6296093)

2012-06-19 Thread Diego Novillo
Remove option to build without a C++ compiler.

This patch removes all the configuration code that allowed GCC to
build without a C++ compiler.  After this patch the following
configuration flags are no longer valid:

--enable-build-with-cxx
--enable-build-poststage1-with-cxx

All builds will unconditionally use C++.

Tested on x86_64.

Ian, could you please take a look to double check I have not missed
anything?  There was more code dealing with it than I was expecting.

I'm also not sure how to propagate the changes in go/gofrontend, but
we don't need to worry about that until we do the actual merge into
trunk.

Thanks.  Diego.


2012-06-19   Diego Novillo  dnovi...@google.com

ChangeLog.cxx-conversion
* Makefile.tpl (STAGE[+id+]_CXXFLAGS): Remove
POSTSTAGE1_CONFIGURE_FLAGS.
* Makefile.in: Regenerate.
* configure.ac (ENABLE_BUILD_WITH_CXX): Remove.  Update all users.
* configure: Regenerate.

gcc/ChangeLog.cxx-conversion
* Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* configure.ac: Likewise.
* configure: Regenerate.
* config.in: Regenerate.
* doc/install.texi: Remove documentation for --enable-build-with-cxx
and --enable-build-poststage1-with-cxx.

gcc/go/ChangeLog.cxx-conversion
* go-c.h: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* go-gcc.cc: Likewise.
* go-system.h: Likewise.

libcpp/ChangeLog.cxx-conversion
* Makefile.in: Remove all handlers of ENABLE_BUILD_WITH_CXX.
* configure.ac: Likewise.
* configure: Regenerate.

diff --git a/Makefile.in b/Makefile.in
index def860e..d81fb97 100644
--- a/Makefile.in
+++ b/Makefile.in
@@ -422,7 +422,6 @@ TFLAGS =
 STAGE_CFLAGS = $(BOOT_CFLAGS)
 STAGE_TFLAGS = $(TFLAGS)
 STAGE_CONFIGURE_FLAGS=@stage2_werror_flag@
-POSTSTAGE1_CONFIGURE_FLAGS = @POSTSTAGE1_CONFIGURE_FLAGS@
 
 
 # Defaults for stage 1; some are overridden below.
@@ -433,10 +432,7 @@ STAGE1_CXXFLAGS = $(CXXFLAGS)
 STAGE1_CXXFLAGS = $(STAGE1_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE1_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE1_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE1_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 2; some are overridden below.
 STAGE2_CFLAGS = $(STAGE_CFLAGS)
@@ -446,10 +442,7 @@ STAGE2_CXXFLAGS = $(CXXFLAGS)
 STAGE2_CXXFLAGS = $(STAGE2_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE2_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE2_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE2_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 3; some are overridden below.
 STAGE3_CFLAGS = $(STAGE_CFLAGS)
@@ -459,10 +452,7 @@ STAGE3_CXXFLAGS = $(CXXFLAGS)
 STAGE3_CXXFLAGS = $(STAGE3_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE3_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE3_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE3_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage 4; some are overridden below.
 STAGE4_CFLAGS = $(STAGE_CFLAGS)
@@ -472,10 +462,7 @@ STAGE4_CXXFLAGS = $(CXXFLAGS)
 STAGE4_CXXFLAGS = $(STAGE4_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGE4_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGE4_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGE4_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage profile; some are overridden below.
 STAGEprofile_CFLAGS = $(STAGE_CFLAGS)
@@ -485,10 +472,7 @@ STAGEprofile_CXXFLAGS = $(CXXFLAGS)
 STAGEprofile_CXXFLAGS = $(STAGEprofile_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEprofile_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGEprofile_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGEprofile_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 # Defaults for stage feedback; some are overridden below.
 STAGEfeedback_CFLAGS = $(STAGE_CFLAGS)
@@ -498,10 +482,7 @@ STAGEfeedback_CXXFLAGS = $(CXXFLAGS)
 STAGEfeedback_CXXFLAGS = $(STAGEfeedback_CFLAGS)
 @endif target-libstdc++-v3-bootstrap
 STAGEfeedback_TFLAGS = $(STAGE_TFLAGS)
-# STAGE1_CONFIGURE_FLAGS overridden below, so we can use
-# POSTSTAGE1_CONFIGURE_FLAGS here.
-STAGEfeedback_CONFIGURE_FLAGS = \
-   $(STAGE_CONFIGURE_FLAGS) $(POSTSTAGE1_CONFIGURE_FLAGS)
+STAGEfeedback_CONFIGURE_FLAGS = $(STAGE_CONFIGURE_FLAGS)
 
 
 # Only build the C compiler for stage1, because that is the only one that
@@ -519,9 +500,6 @@ 

Re: [patch] Deal with #ident without

2012-06-19 Thread Hans-Peter Nilsson
On Tue, 19 Jun 2012, Steven Bosscher wrote:
 I've now committed this, see r188791.

Breaking cris-elf.  Just try rebuilding cc1:
./gcc/gcc/../libdecnumber/dpd -I../libdecnumber\
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c -o cris.o
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_asm_output_ident':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'cgraph_state' undeclared (first use in this function)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: (Each undeclared identifier is reported only once
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: for each function it appears in.)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2480: error: 'CGRAPH_STATE_PARSING' undeclared (first use in this function)
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2478: warning: unused variable 'buf'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2477: warning: unused variable 'size'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2476: warning: unused variable 'section_asm_op'
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c: In function 'cris_option_override':
/tmp/hpautotest-gcc1/gcc/gcc/config/cris/cris.c:2538: error: 'flag_no_gcc_ident' undeclared (first use in this function)
make[2]: *** [cris.o] Error 1

brgds, H-P


Re: [RFC 0/3] Stuff related to pr53533

2012-06-19 Thread Richard Henderson
On 2012-06-19 15:55, Matt wrote:
 On 2012-06-15 13:57, Richard Henderson wrote:
  Bootstrapped and tested on x86_64, but I'll leave some time for
  comment before committing any of this.
 
 Patches now committed.
 
 Hey Richard,
 
 Thanks for taking on some of these issues. I'm not seeing much of an
 improvement yet when manually applying the patches to 4.7...

Of course not.  None of them address the real problem.  They merely
fix warts discovered along the way.

 Would it be possible to commit these to 4_7-branch as well?

No, I don't think so.


r~


Re: Updated to respond to various email comments from Jason, Diego and Cary (issue6197069)

2012-06-19 Thread Jason Merrill

On 06/19/2012 10:12 AM, Sterling Augustine wrote:

+ /* If we're putting types in their own .debug_types sections,
+the .debug_pubtypes table will still point to the compile
+unit (not the type unit), so we want to use the offset of
+the skeleton DIE (if there is one).  */
+ if (pub->die->comdat_type_p && names == pubtype_table)
+   {
+ comdat_type_node_ref type_node = pub->die->die_id.die_type_node;
+
+ if (type_node != NULL && type_node->skeleton_die != NULL)
+   die_offset = type_node->skeleton_die->die_offset;
+   }


I think we had agreed that if there is no skeleton, we should use an 
offset of 0.
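
A minimal sketch of the behaviour being asked for, using simplified stand-in types (sketch_die, sketch_type_node and pubtype_offset are hypothetical names, not the dwarf2out.c structures):

```c
#include <stddef.h>

/* Simplified stand-ins for the real DIE and type-unit structures.  */
struct sketch_die
{
  unsigned long die_offset;
};

struct sketch_type_node
{
  struct sketch_die *skeleton_die;
};

/* Offset to record in the pubtypes table for a type that lives in its
   own type unit: the skeleton DIE's offset if a skeleton exists, and 0
   otherwise.  */
static unsigned long
pubtype_offset (const struct sketch_type_node *type_node)
{
  if (type_node != NULL && type_node->skeleton_die != NULL)
    return type_node->skeleton_die->die_offset;
  return 0;
}
```

The difference from the quoted hunk is the explicit fallback: without a skeleton the offset becomes 0 instead of keeping whatever value die_offset held before.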


Jason


Re: User directed Function Multiversioning via Function Overloading (issue5752064)

2012-06-19 Thread Sriraman Tallam
Ping.

On Thu, Jun 14, 2012 at 1:13 PM, Sriraman Tallam tmsri...@google.com wrote:
 +cc c++ front-end maintainers

 Hi,

   C++ Frontend maintainers, Could you please take a look at the
 front-end part when you find the time?

   Honza, your thoughts on the callgraph part?

   Richard, any further comments/feedback?

   Additionally, I am working on generating better mangled names for
 function versions, along the lines of C++ thunks.

 Thanks,
 -Sri.

 On Mon, Jun 4, 2012 at 11:59 AM, Sriraman Tallam tmsri...@google.com wrote:
 Hi,

   Attaching updated patch for function multiversioning which brings
 in plenty of changes.

 * As suggested by Richard earlier, I have made cgraph aware of
 function versions. All nodes of function versions are chained and the
 dispatcher bodies are created on demand while building cgraph edges.
 The dispatcher body will be created if and only if there is a call or
 reference to a versioned function. Previously, I was maintaining the
 list of versions separately in a hash map, all that is gone now.
 * Now, the file multiversion.c has some helper routines that are used
 in the context of function versioning. There are no new passes and no
 new globals.
 * More tests, updated existing tests.
 * Fixed lots of bugs.
 * Updated patch description.

 Patch attached. Patch also available for review at
 http://codereview.appspot.com/5752064

 Please let me know what you think,

 Thanks,
 -Sri.


 On Mon, May 14, 2012 at 11:28 AM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi H.J,

   Attaching new patch with 2 test cases, mv2.C checks ISAs only and
 mv1.C checks ISAs and arches mixed. Right now, checking only arches is
 not needed as they are mutually exclusive, any order should be fine.

 Patch also available for review here:  http://codereview.appspot.com/5752064

 Thanks,
 -Sri.

 On Sat, May 12, 2012 at 6:37 AM, H.J. Lu hjl.to...@gmail.com wrote:
 On Fri, May 11, 2012 at 7:04 PM, Sriraman Tallam tmsri...@google.com 
 wrote:
 Hi H.J.,

   I have updated the patch to improve the dispatching method like we
 discussed. Each feature gets a priority now, and the dispatching is
 done in priority order. Please see i386.c for the changes.
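
 As an illustration only, priority-ordered dispatching can be sketched like this. None of it is taken from the patch: the struct, the function names and the feature predicates are hypothetical, and the real dispatcher is generated by the compiler in i386.c.

```c
#include <stddef.h>

typedef int (*version_fn) (void);

/* One candidate version of a function: a priority, a predicate saying
   whether the required feature is available, and the implementation.  */
struct fn_version
{
  unsigned priority;
  int (*supported) (void);
  version_fn impl;
};

/* Example predicates and implementations, for demonstration only.  */
int feature_present (void) { return 1; }
int feature_absent (void) { return 0; }
int impl_generic (void) { return 0; }
int impl_low (void) { return 1; }
int impl_high (void) { return 2; }

/* Pick the supported version with the highest priority, falling back
   to FALLBACK when no version is usable.  */
version_fn
dispatch (const struct fn_version *versions, size_t n, version_fn fallback)
{
  const struct fn_version *best = NULL;
  size_t i;

  for (i = 0; i < n; i++)
    if (versions[i].supported ()
        && (best == NULL || versions[i].priority > best->priority))
      best = &versions[i];
  return best != NULL ? best->impl : fallback;
}
```

 Ordering by an explicit priority, rather than by declaration order, is what makes the result independent of how the versions happen to be listed.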

 Patch also available for review here:  
 http://codereview.appspot.com/5752064


 I think you need 3 tests:

 1.  Only with ISA.
 2.  Only with arch
 3.  Mixed with ISA and arch

 since test mixed ISA and arch may hide issues with ISA only or arch only.

 --
 H.J.


Re: [PATCH, MIPS] Add most common atomic patterns

2012-06-19 Thread Maxim Kuvyrkov
I've now checked in these patches.

Tom, thanks for the great work optimizing the sync and atomic builtins for MIPS 
and XLP, and, Richard, thanks for the reviews and the education on writing good 
.md descriptions.

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics



On 13/06/2012, at 5:50 PM, Maxim Kuvyrkov wrote:

 This patch series adds necessary patterns for __atomic_compare_exchange[_n], 
 __atomic_exchange[_n] and __atomic_fetch_add builtins.  These are the 
 builtins that correspond to inline assembly that MIPS GLIBC port is using.
 
 The patches were originally developed by Tom de Vries a while ago, and I've 
 rewritten parts of them to be better suited for upstream.
 
 The second patch adds XLP-specific patterns to support its swap and ldadd 
 instructions.  Unfortunately, there seems to be a problem in reload that 
 prevents reload from properly spilling address for these two patterns.  I 
 will work with reload experts on investigating and fixing this problem, but, 
 meanwhile, the patch contains a workaround that avoids the problem.
 
 The third patch is a small optimization to alleviate the cost of the 
 __atomic_compare_exchange[_n] builtins being one-size-fits-all solutions.  
 These builtins return both a boolean success flag and the old value.  As most 
 callers use only one of the two results, this optimization looks at REG_UNUSED 
 notes to determine whether the instructions that set them can be omitted.
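
 To make the "only one result is used" case concrete, here is a small sketch built on GCC's documented __atomic builtins. The wrapper names (try_lock, cas_old_value, next_ticket) are made up for illustration and are not part of the patch.

```c
#include <stdbool.h>

/* Only the boolean success result is used here; the old value written
   back into EXPECTED on failure is ignored.  */
bool
try_lock (int *lock)
{
  int expected = 0;
  return __atomic_compare_exchange_n (lock, &expected, 1, false,
                                      __ATOMIC_ACQUIRE, __ATOMIC_RELAXED);
}

/* Only the old value is used here; the boolean result is ignored.  On
   success EXPECTED already equals the old value, on failure the
   builtin stores the current value into it.  */
int
cas_old_value (int *ptr, int expected, int desired)
{
  __atomic_compare_exchange_n (ptr, &expected, desired, false,
                               __ATOMIC_SEQ_CST, __ATOMIC_SEQ_CST);
  return expected;
}

/* __atomic_fetch_add returns the value held before the addition.  */
int
next_ticket (int *counter)
{
  return __atomic_fetch_add (counter, 1, __ATOMIC_RELAXED);
}
```

 In try_lock the instructions that materialize the old value are dead, and in cas_old_value the ones that materialize the success flag are, which is the kind of case the third patch is described as targeting.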
 
 The patch series was tested by running GLIBC testsuite for n32, n64 and o32 
 ABIs on XLP and [in-progress] non-XLP MIPS boards with no regressions with a 
 corresponding patch to MIPS GLIBC port to use the new atomic builtins.
 
 --
 Maxim Kuvyrkov
 CodeSourcery / Mentor Graphics
 
 
 



Re: [PATCH] Unify emit_{pre,post}_atomic_barrier across Alpha, ARM, MIPS and TileGX

2012-06-19 Thread Maxim Kuvyrkov
On 15/06/2012, at 11:16 AM, Richard Henderson wrote:

 On 2012-06-14 16:06, Maxim Kuvyrkov wrote:
 2012-06-15  Maxim Kuvyrkov  ma...@codesourcery.com
 
  * emit-rtl.c (need_atomic_barrier_p): New function.
  * emit-rtl.h (need_atomic_barrier_p): Declare it.
  * config/alpha/alpha.c (alpha_{pre,post}_atomic_barrier): Remove, use
  generic version instead.
  * config/arm/arm.c (arm_{pre,post}_atomic_barrier): Remove, use
  generic version instead.
  * config/mips/mips.c (mips_{pre,post}_atomic_barrier_p): Remove, use
  generic version instead.
  * config/tilegx/tilegx.c, config/tilegx/tilegx-protos.h,
  * config/tilegx/sync.md (tilegx_{pre,post}_atomic_barrier): Remove, use
  generic version instead.
 
 
 Ok.

Since I didn't hear any objections from target maintainers I've checked in this 
patch.
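
For context, a generic predicate of this kind might look roughly like the sketch below. This is an assumption about its general shape, not the code that was committed: the enum and function names are hypothetical, and treating "consume" like "acquire" is a conservative choice made only for this sketch.

```c
#include <stdbool.h>

/* Hypothetical memory-model enumeration, mirroring the C11 orders.  */
enum sketch_memmodel
{
  SK_RELAXED,
  SK_CONSUME,
  SK_ACQUIRE,
  SK_RELEASE,
  SK_ACQ_REL,
  SK_SEQ_CST
};

/* Return true if a barrier is needed before (PRE is true) or after
   (PRE is false) an atomic operation with memory model MODEL.  */
bool
sketch_need_atomic_barrier_p (enum sketch_memmodel model, bool pre)
{
  switch (model)
    {
    case SK_RELAXED:
      return false;
    case SK_RELEASE:
      /* Earlier accesses must not sink below the operation.  */
      return pre;
    case SK_CONSUME:
    case SK_ACQUIRE:
      /* Later accesses must not hoist above the operation.  */
      return !pre;
    case SK_ACQ_REL:
    case SK_SEQ_CST:
      return true;
    }
  /* Unknown model: be conservative.  */
  return true;
}
```

Having one such predicate in common code is what lets the per-target pre/post helpers listed in the ChangeLog go away.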

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics




C++ PATCH for c++/53651 (ICE with ill-formed use of decltype)

2012-06-19 Thread Jason Merrill

A decltype doesn't have a name.

Tested x86_64-pc-linux-gnu, applying to trunk and 4.7.
commit bab2f5e9e77bd41b91ca6eae34483eb159307519
Author: Jason Merrill ja...@redhat.com
Date:   Thu Jun 14 17:28:08 2012 -0700

	PR c++/53651
	* name-lookup.c (constructor_name_p): Don't try to look at the
	name of a DECLTYPE_TYPE.

diff --git a/gcc/cp/name-lookup.c b/gcc/cp/name-lookup.c
index 0f28820..cc8439c 100644
--- a/gcc/cp/name-lookup.c
+++ b/gcc/cp/name-lookup.c
@@ -1966,6 +1966,11 @@ constructor_name_p (tree name, tree type)
   if (TREE_CODE (name) != IDENTIFIER_NODE)
 return false;
 
+  /* These don't have names.  */
+  if (TREE_CODE (type) == DECLTYPE_TYPE
+  || TREE_CODE (type) == TYPEOF_TYPE)
+return false;
+
   ctor_name = constructor_name_full (type);
   if (name == ctor_name)
 return true;
diff --git a/gcc/testsuite/g++.dg/cpp0x/decltype37.C b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
new file mode 100644
index 0000000..c885e9a
--- /dev/null
+++ b/gcc/testsuite/g++.dg/cpp0x/decltype37.C
@@ -0,0 +1,14 @@
+// PR c++/53651
+// { dg-do compile { target c++11 } }
+
+template<typename> struct wrap { void bar(); };
+
+template<typename T> auto foo(T* t) -> wrap<T>* { return 0; }
+
+template<typename T>
+struct holder : decltype(*foo((T*)0)) // { dg-error "class type" }
+{
+using decltype(*foo((T*)0))::bar; // { dg-error "is not a base" }
+};
+
+holder<int> h;


Re: [patch] Remove NO_IMPLICIT_EXTERN_C target macro

2012-06-19 Thread Hans-Peter Nilsson
On Mon, 18 Jun 2012, Steven Bosscher wrote:
 The attached patch removes NO_IMPLICIT_EXTERN_C, and replaces its sole
 user with IMPLICIT_EXTERN_C to avoid the double negations (#ifndef
 NO_IMPLICIT_EXTERN_C, etc.).

 Bootstrapped and tested on x86_64-unknown-linux-gnu. OK for trunk?

I saw it wasn't part of this patch so: when and if this
eventually gets in, please don't forget to poison it, see
system.h.
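
For anyone who hasn't used it, poisoning relies on the preprocessor's "#pragma GCC poison" directive. A minimal sketch outside of GCC's own system.h (the function name is made up for illustration):

```c
/* After the rename, only the positive macro is meant to be used.  */
#define IMPLICIT_EXTERN_C 1

/* From here on, any appearance of the old name is a hard error, so
   stale uses in target headers are caught at build time.  */
#pragma GCC poison NO_IMPLICIT_EXTERN_C

int
implicit_extern_c_enabled (void)
{
#ifdef IMPLICIT_EXTERN_C
  return 1;
#else
  return 0;
#endif
}
```

Note that even testing the poisoned name with #ifdef is rejected, which is what makes this useful for retiring a macro.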

brgds, H-P


Re: [PATCH] Take branch misprediction effects into account when RTL loop unrolling (issue6099055)

2012-06-19 Thread Teresa Johnson
Ping.
Teresa

On Fri, May 18, 2012 at 7:21 AM, Teresa Johnson tejohn...@google.com wrote:
 Ping?
 Teresa

 On Fri, May 11, 2012 at 6:11 AM, Teresa Johnson tejohn...@google.com wrote:
 Ping?
 Teresa

 On Fri, May 4, 2012 at 3:41 PM, Teresa Johnson tejohn...@google.com wrote:

 On David's suggestion, I have removed the changes that rename niter_desc to
 loop_desc from this patch to focus the patch on the unrolling changes. I can
 submit a cleanup patch to do the renaming as soon as this one goes in.

 Bootstrapped and tested on x86_64-unknown-linux-gnu.  Ok for trunk?

 Thanks,
 Teresa

 Here is the new description of improvements from the original patch:

 Improved patch based on feedback. Main changes are:

 1) Improve efficiency by caching loop analysis results in the loop auxiliary
 info structure hanging off the loop structure. Added a new routine,
 analyze_loop_insns, to fill in information about the average and total number
 of branches, as well as whether there are any floating point set and call
 instructions in the loop. The new routine is invoked when we first create a
 loop's niter_desc struct, and the caller (get_simple_loop_desc) has been
 modified to handle creating a niter_desc for the fake outermost loop.

 2) Improvements to max_unroll_with_branches:
 - Treat the fake outermost loop (the procedure body) as we would a hot outer
 loop, i.e. compute the max unroll looking at its nested branches, instead of
 shutting off unrolling when we reach the fake outermost loop.
 - Pull the checks previously done in the caller into the routine (e.g.
 whether the loop iterates frequently or contains fp instructions).
 - Fix a bug in the previous version that sometimes caused overflow in the
 new unroll factor.

 3) Remove float variables, and use integer computation to compute the
 average number of branches in the loop.

 4) Detect more types of floating point computations in the loop by walking
 all set instructions, not just single sets.
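
 As a rough illustration of points 2) and 3), the integer-only averaging and the budget cap could look like the sketch below. This is a simplified model under stated assumptions, not the algorithm in the patch: both function names and the exact rounding and clamping rules are hypothetical.

```c
/* Average number of branches executed per iteration, rounded to the
   nearest integer without using floating point.  */
unsigned
average_branches (unsigned long total_branch_executions,
                  unsigned long iterations)
{
  if (iterations == 0)
    return 0;
  return (unsigned) ((total_branch_executions + iterations / 2)
                     / iterations);
}

/* Limit the unroll factor NUNROLL so that the unrolled body holds at
   most BRANCH_BUDGET branches.  The result is never less than 1 and
   never more than NUNROLL.  */
unsigned
cap_unroll_by_branch_budget (unsigned nunroll, unsigned branches_per_iter,
                             unsigned branch_budget)
{
  unsigned limit;

  if (branches_per_iter == 0)
    return nunroll;
  limit = branch_budget / branches_per_iter;
  if (limit < 1)
    limit = 1;
  return nunroll < limit ? nunroll : limit;
}
```

 Dividing the budget by the per-iteration count, instead of multiplying the count by the unroll factor, is one simple way to sidestep the kind of overflow mentioned in 2).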

 2012-05-04   Teresa Johnson  tejohn...@google.com

        * doc/invoke.texi: Update the documentation with new params.
        * loop-unroll.c (max_unroll_with_branches): New function.
        (decide_unroll_constant_iterations, decide_unroll_runtime_iterations):
        Add heuristic to avoid increasing branch mispredicts when unrolling.
        (decide_peel_simple, decide_unroll_stupid): Retrieve number of
        branches from niter_desc instead of via function that walks loop.
        * loop-iv.c (get_simple_loop_desc): Invoke new analyze_loop_insns
        function, and add guards to enable this function to work for the
        outermost loop.
        * cfgloop.c (insn_has_fp_set, analyze_loop_insns): New functions.
        (num_loop_branches): Remove.
        * cfgloop.h (struct loop_desc): Added new fields to cache additional
        loop analysis information.
        (num_loop_branches): Remove.
        (analyze_loop_insns): Declare.
        * params.def (PARAM_MIN_ITER_UNROLL_WITH_BRANCHES): New param.
        (PARAM_UNROLL_OUTER_LOOP_BRANCH_BUDGET): Ditto.

 Index: doc/invoke.texi
 ===
 --- doc/invoke.texi     (revision 187013)
 +++ doc/invoke.texi     (working copy)
 @@ -8842,6 +8842,12 @@ The maximum number of insns of an unswitched loop.
  @item max-unswitch-level
  The maximum number of branches unswitched in a single loop.

 +@item min-iter-unroll-with-branches
 +Minimum iteration count to ignore branch effects when unrolling.
 +
 +@item unroll-outer-loop-branch-budget
 +Maximum number of branches allowed in hot outer loop region after unroll.
 +
  @item lim-expensive
  The minimum cost of an expensive expression in the loop invariant motion.

 Index: loop-unroll.c
 ===
 --- loop-unroll.c       (revision 187013)
 +++ loop-unroll.c       (working copy)
 @@ -152,6 +152,99 @@ static void combine_var_copies_in_loop_exit (struc
                                             basic_block);
  static rtx get_expansion (struct var_to_expand *);

 +/* Compute the maximum number of times LOOP can be unrolled without exceeding
 +   a branch budget, which can increase branch mispredictions. The number of
 +   branches is computed by weighting each branch with its expected execution
 +   probability through the loop based on profile data. If no profile feedback
 +   data exists, simply return the current NUNROLL factor.  */
 +
 +static unsigned
 +max_unroll_with_branches(struct loop *loop, unsigned nunroll)
 +{
 +  struct loop *outer;
 +  struct niter_desc *outer_desc = 0;
 +  int outer_niters = 1;
 +  int frequent_iteration_threshold;
 +  unsigned branch_budget;
 +  struct niter_desc *desc = get_simple_loop_desc (loop);
 +
 +  /* Ignore loops with FP computation as these tend to benefit much more
 +     consistently from unrolling.  */
 +  if (desc->has_fp)
 +    return nunroll;
 +
 +  frequent_iteration_threshold = PARAM_VALUE

Re: [PING ARM Patches] PR53447: optimizations of 64bit ALU operation with constant

2012-06-19 Thread Michael Hope
On 18 June 2012 22:17, Carrot Wei car...@google.com wrote:
 Hi

 Could ARM maintainers review following patches?

 http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00497.html
 64bit add/sub constants.

 http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01834.html
 64bit and with constants.

 http://gcc.gnu.org/ml/gcc-patches/2012-05/msg01974.html
 64bit xor with constants.

 http://gcc.gnu.org/ml/gcc-patches/2012-06/msg00287.html
 64bit ior with constants.

Hi Carrot.  Out of interest, how do these interact with the 64 bit in
NEON patches that Andrew has been doing?  They seem to touch many of
the same patterns and I'm concerned that they'd cause GCC to prefer
core registers instead of NEON, especially as the constant values you
can use in a vmov are limited.

There's a (in progress) summary of the current state for the standard
C operators here:
 https://wiki.linaro.org/MichaelHope/Sandbox/64BitOperations

-- Michael


RE: [PATCH, ARM] New CPU support for Marvell PJ4 cores

2012-06-19 Thread Yi-Hsiu Hsu
marvell-pj4 is added to BE8_LINK_SPEC.

Modified patch is attached.

Thanks!

B.R.
Yi-Hsiu, Hsu

-Original Message-
From: Ramana Radhakrishnan [mailto:ramana.radhakrish...@linaro.org] 
Sent: Thursday, June 14, 2012 2:19 AM
To: Yi-Hsiu Hsu
Cc: gcc-patches@gcc.gnu.org
Subject: Re: [PATCH, ARM] New CPU support for Marvell PJ4 cores

On 29 May 2012 10:07, Yi-Hsiu Hsu a...@marvell.com wrote:
 Hi,

 This patch maintains Marvell PJ4 cores pipeline description.
 Run arm testsuite on arm-linux-gnueabi and no extra regressions are found.

        * config/arm/marvell-pj4.md: New marvell-pj4 pipeline description.
        * config/arm/arm.c (arm_issue_rate): Add marvell_pj4.
        * config/arm/arm-cores.def: Add core marvell-pj4.
        * config/arm/arm-tune.md: Regenerated.
        * config/arm/arm-tables.opt: Regenerated.
        * doc/invoke.texi: Added entry for marvell-pj4.

This command line option should also be added to BE8_LINK_SPEC similar
to what's done for the other v7-a cores.

Ok with that change.

regards,
Ramana





 Thanks!

 P.S. I created the patch against revision 187308, but that revision does not 
 build successfully, so I applied the patch to revision 187623, which builds 
 and passes the testsuite.



marvell-pj4-core.patch
Description: marvell-pj4-core.patch


[PR debug/53682] avoid crash in cselib promote_debug_loc

2012-06-19 Thread Alexandre Oliva
When promote_debug_loc was first introduced, it would never be called
with a NULL loc list.  However, because of the strategy of temporarily
resetting loc lists before recursion introduced a few months ago in
alias.c, the earlier assumption no longer holds.

This patch adjusts promote_debug_loc to deal with this case.

Ok to install?


for  gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	PR debug/53682
	* cselib.c (promote_debug_loc): Don't crash on NULL argument.

Index: gcc/cselib.c
===
--- gcc/cselib.c.orig	2012-06-17 22:52:27.740087279 -0300
+++ gcc/cselib.c	2012-06-18 08:55:32.948832112 -0300
@@ -322,7 +322,7 @@ new_elt_loc_list (cselib_val *val, rtx l
 static inline void
 promote_debug_loc (struct elt_loc_list *l)
 {
-  if (l->setting_insn && DEBUG_INSN_P (l->setting_insn)
+  if (l && l->setting_insn && DEBUG_INSN_P (l->setting_insn)
       && (!cselib_current_insn || !DEBUG_INSN_P (cselib_current_insn)))
 {
   n_debug_values--;

-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist  Red Hat Brazil Compiler Engineer


Re: [PR49888, VTA] don't keep VALUEs bound to modified MEMs

2012-06-19 Thread Alexandre Oliva
On Jun 16, 2012, H.J. Lu hjl.to...@gmail.com wrote:

 If I understand it correctly, the new approach fails to handle push
 properly.

It's actually cselib that didn't deal with push properly, so it thinks
incoming stack arguments may be clobbered by them.  But that's not the
whole story, unfortunately.  I still don't have a complete fix for the
problem, but I have some patches that restore nearly all of the passes.

The first one extends RTX alias analysis so that cselib can recognize
that (mem:SI ARGP) and (mem:SI (plus (and (plus ARGP #-4) #-32) #-4))
don't alias.  Before the patch, we'd go for infinite sized objects upon
AND.
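
The arithmetic behind this can be shown in isolation. The two helpers below are hypothetical names written for this note, not functions from alias.c, although the mask test is the same expression the patch uses.

```c
/* Return nonzero if SC is a negative constant of the form -2^n, i.e.
   an AND mask that aligns an address down to a power of two.  */
int
is_alignment_mask (long sc)
{
  unsigned long uc = (unsigned long) sc;

  /* -uc is the alignment; uc & -uc isolates the lowest set bit of uc.
     The two agree exactly when the alignment is a power of two.  */
  return sc < 0 && -uc == (uc & -uc);
}

/* Aligning down with mask SC can move the address back by up to
   -SC - 1 bytes, so an access of SIZE bytes at the aligned address
   stays within a window of SIZE + (-SC - 1) bytes that ends where
   the unaligned access would have ended.  */
long
widened_size (long size, long sc)
{
  return size - (sc + 1);
}
```

For example, a 4-byte access through (and X -32) is covered by a 35-byte window, which is a finite size the overlap check can still reason about, unlike the "infinite size" fallback.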

The second introduces an entry-point equivalence between ARGP and SP, so
that SP references in push and stack-align sequences can be
canonicalized to ARGP-based.

The third introduces address canonicalization that uses information in
the dataflow variable set in addition to the static cselib table.  This
is the one I'm still working on, because some expressions still fail to
canonicalize to ARGP although they could.

The fourth removes a now-redundant equivalence from the dynamic table;
the required information is always preserved in the static table.

I've regstrapped (and checked results! :-) all of these on
x86_64-linux-gnu and i686-linux-gnu.  It fixes all visible regressions
in x86_64-linux-gnu, and nearly all on i686-linux-gnu.

May I check these in and keep on working to complete the fix, or should
I revert the original patch and come back only with a patchset that
fixes all debug info regressions?

for  gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	PR debug/53671
	PR debug/49888
	* alias.c (memrefs_conflict_p): Improve handling of AND for
	alignment.
	
Index: gcc/alias.c
===
--- gcc/alias.c.orig	2012-06-17 22:52:27.551102225 -0300
+++ gcc/alias.c	2012-06-17 22:59:00.674994588 -0300
@@ -2103,17 +2103,31 @@ memrefs_conflict_p (int xsize, rtx x, in
  at least as large as the alignment, assume no other overlap.  */
   if (GET_CODE (x) == AND && CONST_INT_P (XEXP (x, 1)))
 {
-  if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
+  HOST_WIDE_INT sc = INTVAL (XEXP (x, 1));
+  unsigned HOST_WIDE_INT uc = sc;
+  if (xsize > 0 && sc < 0 && -uc == (uc & -uc))
+	{
+	  xsize -= sc + 1;
+	  c -= sc;
+	}
+  else if (GET_CODE (y) == AND || ysize < -INTVAL (XEXP (x, 1)))
 	xsize = -1;
   return memrefs_conflict_p (xsize, canon_rtx (XEXP (x, 0)), ysize, y, c);
 }
   if (GET_CODE (y) == AND && CONST_INT_P (XEXP (y, 1)))
 {
+  HOST_WIDE_INT sc = INTVAL (XEXP (y, 1));
+  unsigned HOST_WIDE_INT uc = sc;
+  if (ysize > 0 && sc < 0 && -uc == (uc & -uc))
+	{
+	  ysize -= sc + 1;
+	  c += sc;
+	}
   /* ??? If we are indexing far enough into the array/structure, we
 	 may yet be able to determine that we can not overlap.  But we
 	 also need to that we are far enough from the end not to overlap
 	 a following reference, so we do nothing with that for now.  */
-  if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
+  else if (GET_CODE (x) == AND || xsize < -INTVAL (XEXP (y, 1)))
 	ysize = -1;
   return memrefs_conflict_p (xsize, x, ysize, canon_rtx (XEXP (y, 0)), c);
 }
for  gcc/ChangeLog
from  Alexandre Oliva  aol...@redhat.com

	PR debug/53671
	PR debug/49888
	* var-tracking.c (vt_initialize): Record initial offset between
	arg pointer and stack pointer.

Index: gcc/var-tracking.c
===
--- gcc/var-tracking.c.orig	2012-06-17 23:00:45.793675979 -0300
+++ gcc/var-tracking.c	2012-06-17 23:01:02.525351931 -0300
@@ -9507,6 +9507,41 @@ vt_initialize (void)
   valvar_pool = NULL;
 }
 
+  if (MAY_HAVE_DEBUG_INSNS)
+{
+  rtx reg, expr;
+  int ofst;
+  cselib_val *val;
+
+#ifdef FRAME_POINTER_CFA_OFFSET
+  reg = frame_pointer_rtx;
+  ofst = FRAME_POINTER_CFA_OFFSET (current_function_decl);
+#else
+  reg = arg_pointer_rtx;
+  ofst = ARG_POINTER_CFA_OFFSET (current_function_decl);
+#endif
+
+  ofst -= INCOMING_FRAME_SP_OFFSET;
+
+  val = cselib_lookup_from_insn (reg, GET_MODE (reg), 1,
+ VOIDmode, get_insns ());
+  preserve_value (val);
+  cselib_preserve_cfa_base_value (val, REGNO (reg));
+  expr = plus_constant (GET_MODE (stack_pointer_rtx),
+			stack_pointer_rtx, -ofst);
+  cselib_add_permanent_equiv (val, expr, get_insns ());
+
+  if (ofst)
+	{
+	  val = cselib_lookup_from_insn (stack_pointer_rtx,
+	 GET_MODE (stack_pointer_rtx), 1,
+	 VOIDmode, get_insns ());
+	  preserve_value (val);
+	  expr = plus_constant (GET_MODE (reg), reg, ofst);
+	  cselib_add_permanent_equiv (val, expr, get_insns ());
+	}
+}
+
   /* In order to factor out the adjustments made to the stack pointer or to
  the hard frame pointer and thus be able to use DW_OP_fbreg operations
  instead of individual location lists, we're going to