[PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list
Richard,

I have a patch for PR50672.

When compiling the testcase from the PR with -ftree-tail-merge, the scenario is as follows. We start out tail_merge_optimize with blocks 14 and 20, which are alike, but not equal, since they have different successors:
...
# BLOCK 14 freq:690
# PRED: 25 [61.0%]  (false,exec)
if (wD.2197_57(D) != 0B)
  goto <bb 15>;
else
  goto <bb 16>;
# SUCC: 15 [78.4%]  (true,exec) 16 [21.6%]  (false,exec)

# BLOCK 20 freq:2900
# PRED: 29 [100.0%]  (fallthru) 31 [100.0%]  (fallthru)
# .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)>
if (wD.2197_57(D) != 0B)
  goto <bb 5>;
else
  goto <bb 6>;
# SUCC: 5 [85.0%]  (true,exec) 6 [15.0%]  (false,exec)
...

In the first iteration, we merge block 5 with block 15 and block 6 with block 16. After that, blocks 14 and 20 are equal. In the second iteration, blocks 14 and 20 are merged, by redirecting the incoming edges of block 20 to block 14, and removing block 20. Block 20 also contains the definition of .MEMD.2447_209. Removing the definition delinks the vuse of .MEMD.2447_209 in block 5:
...
# BLOCK 5 freq:6036
# PRED: 20 [85.0%]  (true,exec)
# PT = nonlocal escaped
D.2306_58 = thisD.2200_10(D)->D.2156;
# .MEMD.2447_132 = VDEF <.MEMD.2447_209>
# USE = anything
# CLB = anything
drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D));
goto <bb 17>;
# SUCC: 17 [100.0%]  (fallthru,exec)
...

After the pass, when executing TODO_update_ssa_only_virtuals, we update the drawLine call in block 5 using rewrite_update_stmt, which calls maybe_replace_use for the vuse operand. However, maybe_replace_use has no effect, since the old vuse and the new vuse happen to be the same (rdef == use), so SET_USE is not called and the vuse remains delinked:
...
      if (rdef && rdef != use)
	SET_USE (use_p, rdef);
...

The patch fixes this by forcing SET_USE for delinked uses.

Bootstrapped and reg-tested on x86_64. OK for trunk?
Thanks,
- Tom

2011-10-12  Tom de Vries  t...@codesourcery.com

	PR tree-optimization/50672
	* tree-into-ssa.c (maybe_replace_use): Force SET_USE for delinked
	uses.

Index: gcc/tree-into-ssa.c
===
--- gcc/tree-into-ssa.c	(revision 179592)
+++ gcc/tree-into-ssa.c	(working copy)
@@ -1908,7 +1908,7 @@ maybe_replace_use (use_operand_p use_p)
   else if (is_old_name (use))
     rdef = get_reaching_def (use);
 
-  if (rdef && rdef != use)
+  if (rdef && (rdef != use || (!use_p->next && !use_p->prev)))
     SET_USE (use_p, rdef);
 }
Re: int_cst_hash_table mapping persistence and the garbage collector
On 10/11/11 11:05:18, Eric Botcazou wrote:

    One easy way to address the current issue is to call tree_int_cst_equal() if the integer constant tree pointers do not match:

      if ((c1 != c2) && !tree_int_cst_equal (c1, c2))
        /* integer constants aren't equal.  */

  You have two objects C1 and C2 for the same constant and you're comparing them. One was created first, say C1. If C1 was still live when C2 was created, why was C2 created in the first place? If C1 wasn't live anymore when C2 was created, why are you still using C1 here?

Eric, this note provides some more detail in addition to my earlier reply to Richard. The problem is that the references to objects C1 and C2 live in a hash table, and that although the referenced nodes will be retained by the garbage collector, their mapping in int_cst_hash_table is deleted by the GC. Thus, we follow the diagram:

  tree (type) -> [ upc_block_factor_for_type ] -> tree (integer constant)
  tree (integer constant) -> [ int_cst_hash_table ] {unique map} -> tree (integer constant)

Given two tree nodes, P (prototype) and F (function): they declare a parameter that is a pointer to a UPC shared object, and this pointer is declared with a UPC blocking factor of 1000. Without garbage collection, the mappings look like this:

  P.param -> C1, F.param -> C1

where C1 is an integer constant of the form (sizetype, 1000). But when GC kicks in, it decides that the hash table entry in int_cst_hash_table can be deleted, because it doesn't think that C1 is live. Therefore the next attempt to map (sizetype, 1000) will yield a new integer constant tree node, C2. Then the mapping changes to:

  P.param -> C1, F.param -> C2

and we can no longer use TYPE_UPC_BLOCKING_FACTOR (P.param) == TYPE_UPC_BLOCKING_FACTOR (F.param) to check that the blocking factors of P.param and F.param are equal.
For the GC to know that the int_cst_hash_table entry is needed, perhaps upc_block_factor_for_type needs to be traversed and each mapped integer constant marked, or the constant has to be re-hashed into int_cst_hash_table and the actual hash table entry marked. I am not familiar with the details of garbage collection and pretty much just try to use existing code as a model. Apparently, this sequence of statements is insufficient to tell the GC that it should mark the integer constants referenced in this hash table as in use:

  static GTY ((if_marked (tree_map_marked_p),
               param_is (struct tree_map)))
    htab_t upc_block_factor_for_type;
  [...]
  upc_block_factor_for_type
    = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);

Reading the GC code:

  static int
  ggc_htab_delete (void **slot, void *info)
  {
    const struct ggc_cache_tab *r = (const struct ggc_cache_tab *) info;

    if (! (*r->marked_p) (*slot))
      htab_clear_slot (*r->base, slot);
    else
      (*r->cb) (*slot);

    return 1;
  }

it appears that the int_cst_hash_table entry for C1 needs to be marked or it will be cleared. I don't know how to set things up so that the garbage collecting mechanisms are in place to do that, and was hoping that the tree_map hash table would provide the required mechanisms. Apparently, this is not the case. I had hoped that this declaration would be sufficient to convince the GC to consider all mapped integer constant nodes to be live. If not, then perhaps I need a GC hook associated with upc_block_factor_for_type that does something like the following:

  for t in (each used upc_block_factor_for_type entry):
    c = t.to        # the mapped integer constant
    if is_integer_constant (c):
      h = int_cst_hash_table.hash (c)
      gc_mark_htab (int_cst_hash_table, h)

or perhaps this is sufficient?

  for t in (each used upc_block_factor_for_type entry):
    c = t.to
    gc_mark_tree_node (c)

However, I thought that this would already have been done automatically by the GC hash tree implementation.
If either of those methods are required, I would appreciate suggestions/pointers/code that would help me make sure that this approach is implemented correctly. thanks, - Gary
[PATCH] Add capability to run several iterations of early optimizations
The following patch adds a new knob to make GCC perform several iterations of early optimizations and inlining. This is for dont-care-about-compile-time-optimize-all-you-can scenarios.

Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is for search and ad-hoc analysis of cases where GCC misses optimization opportunities. With the default setting of '1', nothing is changed from the current status quo.

The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in the regression testsuite were due to latent bugs in handling of EH information, which are being discussed in a different thread.

Performance impact on the standard benchmarks is not conclusive: there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. SPEC2006 benchmarks will take another day or two to complete, and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}. The effect on compilation time is fairly predictable: about a 10% compile-time increase with 3 iterations.

OK for trunk?

[*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFEhl=en_US

Thank you,

--
Maxim Kuvyrkov
CodeSourcery / Mentor Graphics

fsf-gcc-iter-eipa.ChangeLog
Description: Binary data

fsf-gcc-iter-eipa.patch
Description: Binary data
[patch i386]: Unbreak bootstrap for x64 SEH enabled target
Hello,

recent changes caused gcc to begin moving code into the prologue region. For x64 SEH this is an issue, as the unwind table information for the prologue is limited to 255 bytes of code. So we need to avoid moving additional code into the prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of the prologue. Also we need to emit a memory blockage.

ChangeLog

2011-10-12  Kai Tietz  kti...@redhat.com

	* config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH
	all sse/integer registers as prologue-used.

Tested for x86_64-w64-mingw32. Ok for apply?

Regards,
Kai

Index: i386.c
===
--- i386.c	(revision 179824)
+++ i386.c	(working copy)
@@ -10356,7 +10356,24 @@
      Further, prevent alloca modifications to the stack pointer from being
      combined with prologue modifications.  */
   if (TARGET_SEH)
-    emit_insn (gen_prologue_use (stack_pointer_rtx));
+    {
+      int i;
+
+      /* Due to the limited prologue-code size of 255 bytes, we need
+	 to prevent the scheduler from sinking instructions into the
+	 prologue code.  Therefore we mark all standard, sse, fpu, and
+	 the pc registers as prologue-used to prevent this.  Also a
+	 memory blockage is necessary.  */
+      emit_insn (gen_memory_blockage ());
+
+      for (i = 0; i <= 7; i++)
+	{
+	  emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i)));
+	  emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i)));
+	}
+    }
 }
 
 /* Emit code to restore REG using a POP insn.  */
Re: int_cst_hash_table mapping persistence and the garbage collector
  The problem is that the references to objects C1 and C2 live in a hash table, and that although the referenced nodes will be retained by the garbage collector, their mapping in int_cst_hash_table is deleted by the GC.

This isn't a simple hash table, is it?

  I am not familiar with the details of garbage collection and pretty much just try to use existing code as a model. Apparently, this sequence of statements is insufficient to tell the GC that it should mark the integer constants referenced in this hash table as in use.

    static GTY ((if_marked (tree_map_marked_p),
                 param_is (struct tree_map)))
      htab_t upc_block_factor_for_type;
    [...]
    upc_block_factor_for_type
      = htab_create_ggc (512, tree_map_hash, tree_map_eq, 0);

This is a map, so it is garbage-collected as a map: if the key isn't marked, then the value isn't either. Hence 2 questions:
- why use a map and not a simple hash table?
- what is the key, and why isn't it marked?

--
Eric Botcazou
Re: [PATCH] Mark static const strings as read-only.
On 10/10/2011 05:50 PM, Eric Botcazou wrote: So, the patch for build_constant_desc does not have the desired effect. OK, too bad that we need to play this back-and-forth game with MEMs. So the original patch is OK (with TREE_READONLY (base) on the next line to mimic what is done just above and without the gcc/ prefix in the ChangeLog). If you have some available cycles, you can test and install the build_constant_desc change in the same commit, otherwise I'll do it myself. I'll include the build_constant_desc change in a bootstrap/reg-test on x86_64. Thanks, - Tom
Re: int_cst_hash_table mapping persistence and the garbage collector
  It maps a type node into a corresponding integer node that is the blocking factor associated with the type node. Before the advent of this hash table the blocking factor was stored in a dedicated field in the tree type node. The suggestion was made to move this into a hash table to save space. I chose the tree map hash table because I thought it could do the job.

So this isn't a simple hash table, since this is a map. A simple hash table doesn't store the key in the slot, only the value; a map does.

  The keys are valid. In the example discussed in this thread, there is a pointer to a type node that is used in a parameter declaration of a function prototype and also in the similarly named parameter of the function definition. Both tree pointers are used as keys, and they are live at the point that the GC runs. But somehow they aren't marked by the GC.

You need to find out why, since the value will be kept only if the key is already marked by the GC. By the time a GC pass is run, all trees to be kept must be linked to a GC root. You said that the pass was run between the point that the function prototype tree node was created and the point at which the function declaration was processed. To which GC root are the keys linked between these points?

--
Eric Botcazou
Re: [Patch,AVR]: Fix PR49939: Skip 2-word insns
Denis Chertykov schrieb:

  2011/10/11 Georg-Johann Lay a...@gjlay.de:

    This patch teaches avr-gcc to skip 2-word instructions like STS and LDS. It's just about looking into an insn and checking if it's a 2-word instruction or not. Passes without regression. Ok to install?

  Please commit.

  Denis.

Committed with the following change:

-      && avr_2word_insn_p (next_nonnote_nondebug_insn (insn)))
+      && avr_2word_insn_p (next_active_insn (insn)))

Otherwise a code label would knock out the optimization, as in:

char c;
void foo (char a, char b)
{
  if (a || b)
    c = b;
}

which now compiles to:

foo:
	tst r24
	brne .L4
	cpse r22,__zero_reg__
.L4:
	sts c,r22
.L3:
	ret

Johann
Commit: ARM: Add comments to emitted .eabi_attribute directives
Hi Guys,

I am checking in the patch below to add comments to the .eabi_attribute assembler directives emitted by the ARM backend, when commented assembler output is enabled.

Cheers
Nick

gcc/ChangeLog
2011-10-12  Nick Clifton  ni...@redhat.com

	* config/arm/arm.h (EMIT_EABI_ATTRIBUTE): New macro.  Used to
	emit a .eabi_attribute assembler directive, possibly with a
	comment attached.
	* config/arm/arm.c (arm_file_start): Use the new macro.
	* config/arm/arm-c.c (arm_output_c_attributes): Likewise.

Index: gcc/config/arm/arm.c
===
--- gcc/config/arm/arm.c	(revision 179839)
+++ gcc/config/arm/arm.c	(working copy)
@@ -22291,9 +22291,9 @@
       if (arm_fpu_desc->model == ARM_FP_MODEL_VFP)
	{
	  if (TARGET_HARD_FLOAT)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 27, 3\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_HardFP_use, 27, 3);
	  if (TARGET_HARD_FLOAT_ABI)
-	    asm_fprintf (asm_out_file, "\t.eabi_attribute 28, 1\n");
+	    EMIT_EABI_ATTRIBUTE (Tag_ABI_VFP_args, 28, 1);
	}
     }
   asm_fprintf (asm_out_file, "\t.fpu %s\n", fpu_name);
@@ -22302,31 +22302,24 @@
      are used.  However we don't have any easy way of figuring this out.
      Conservatively record the setting that would have been used.  */

-  /* Tag_ABI_FP_rounding.  */
   if (flag_rounding_math)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 19, 1\n");
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_rounding, 19, 1);
+
   if (!flag_unsafe_math_optimizations)
     {
-      /* Tag_ABI_FP_denomal.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 20, 1\n");
-      /* Tag_ABI_FP_exceptions.  */
-      asm_fprintf (asm_out_file, "\t.eabi_attribute 21, 1\n");
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_denormal, 20, 1);
+      EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_exceptions, 21, 1);
     }
-  /* Tag_ABI_FP_user_exceptions.  */
   if (flag_signaling_nans)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 22, 1\n");
-  /* Tag_ABI_FP_number_model.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 23, %d\n",
-	       flag_finite_math_only ? 1 : 3);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_user_exceptions, 22, 1);
 
-  /* Tag_ABI_align8_needed.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 24, 1\n");
-  /* Tag_ABI_align8_preserved.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 25, 1\n");
-  /* Tag_ABI_enum_size.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 26, %d\n",
-	       flag_short_enums ? 1 : 2);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_number_model, 23,
+		       flag_finite_math_only ? 1 : 3);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_needed, 24, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_align8_preserved, 25, 1);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_enum_size, 26, flag_short_enums ? 1 : 2);
+
   /* Tag_ABI_optimization_goals.  */
   if (optimize_size)
     val = 4;
@@ -22336,21 +22329,18 @@
     val = 1;
   else
     val = 6;
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 30, %d\n", val);
+  EMIT_EABI_ATTRIBUTE (Tag_ABI_optimization_goals, 30, val);
 
-  /* Tag_CPU_unaligned_access.  */
-  asm_fprintf (asm_out_file, "\t.eabi_attribute 34, %d\n",
-	       unaligned_access);
+  EMIT_EABI_ATTRIBUTE (Tag_CPU_unaligned_access, 34, unaligned_access);
 
-  /* Tag_ABI_FP_16bit_format.  */
   if (arm_fp16_format)
-    asm_fprintf (asm_out_file, "\t.eabi_attribute 38, %d\n",
-		 (int) arm_fp16_format);
+    EMIT_EABI_ATTRIBUTE (Tag_ABI_FP_16bit_format, 38, (int) arm_fp16_format);
 
   if (arm_lang_output_object_attributes_hook)
     arm_lang_output_object_attributes_hook();
 
 static void
Index: gcc/config/arm/arm.h
===
--- gcc/config/arm/arm.h	(revision 179839)
+++ gcc/config/arm/arm.h	(working copy)
@@ -2235,4 +2235,19 @@
   "%{mcpu=generic-*:-march=%*; \
     :%{mcpu=*:-mcpu=%*} %{march=*:-march=%*}}"
 
+/* This macro is used to emit an EABI tag and its associated value.
+   We emit the numerical value of the tag in case the assembler does not
+   support textual tags.  (Eg gas prior to 2.20).  If requested we include
+   the tag name in a comment so that anyone reading the assembler output
+   will know which tag is being set.  */
+#define EMIT_EABI_ATTRIBUTE(NAME,NUM,VAL) \
+  do \
+    { \
+      asm_fprintf (asm_out_file, "\t.eabi_attribute %d, %d", NUM, VAL); \
+      if (flag_verbose_asm || flag_debug_asm) \
+	asm_fprintf
[C++ Patch] PR 50594 (C++ front-end bits)
Hi,

thus, per the discussion in the audit trail, I'm proceeding with decorating with __attribute__((externally_visible)) both the 8 operator new and operator delete in <new>, and the 4 pre-declared by the C++ front-end. The below is what I regression tested successfully, together with the library bits, on x86_64-linux. I'm also attaching, for convenience, the library work (I took the occasion to adjust noexcept vs throw(), etc, otherwise the patch would be tiny).

What do you think?

Thanks,
Paolo.

2011-10-12  Paolo Carlini  paolo.carl...@oracle.com

	PR c++/50594
	* decl.c (cxx_init_decl_processing): Add
	__attribute__((externally_visible)) to operator new and
	operator delete library fn.

Index: decl.c
===
--- decl.c	(revision 179842)
+++ decl.c	(working copy)
@@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void)
   current_lang_name = lang_name_cplusplus;
 
   {
-    tree newattrs;
+    tree newattrs, delattrs;
     tree newtype, deltype;
     tree ptr_ftype_sizetype;
     tree new_eh_spec;
@@ -3687,9 +3687,16 @@ cxx_init_decl_processing (void)
     newattrs = build_tree_list (get_identifier ("alloc_size"),
				build_tree_list (NULL_TREE, integer_one_node));
+    newattrs
+      = chainon (newattrs, build_tree_list
+		 (get_identifier ("externally_visible"), NULL_TREE));
     newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs);
     newtype = build_exception_variant (newtype, new_eh_spec);
-    deltype = build_exception_variant (void_ftype_ptr, empty_except_spec);
+    delattrs
+      = build_tree_list (get_identifier ("externally_visible"),
+			 build_tree_list (NULL_TREE, integer_one_node));
+    deltype = cp_build_type_attribute_variant (void_ftype_ptr, delattrs);
+    deltype = build_exception_variant (deltype, empty_except_spec);
     push_cp_library_fn (NEW_EXPR, newtype);
     push_cp_library_fn (VEC_NEW_EXPR, newtype);
     global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);

Index: include/bits/c++config
===
--- include/bits/c++config	(revision 179842)
+++ include/bits/c++config	(working copy)
@@ -103,9 +103,11 @@
 # ifdef __GXX_EXPERIMENTAL_CXX0X__
 #  define _GLIBCXX_NOEXCEPT noexcept
 #  define _GLIBCXX_USE_NOEXCEPT noexcept
+#  define _GLIBCXX_THROW(_EXC)
 # else
 #  define _GLIBCXX_NOEXCEPT
 #  define _GLIBCXX_USE_NOEXCEPT throw()
+#  define _GLIBCXX_THROW(_EXC) throw(_EXC)
 # endif
 #endif

Index: libsupc++/del_op.cc
===
--- libsupc++/del_op.cc	(revision 179842)
+++ libsupc++/del_op.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -41,7 +42,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void
-operator delete(void* ptr) throw ()
+operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT
 {
   if (ptr)
     std::free(ptr);

Index: libsupc++/new_opv.cc
===
--- libsupc++/new_opv.cc	(revision 179842)
+++ libsupc++/new_opv.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -27,7 +28,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void*
-operator new[] (std::size_t sz) throw (std::bad_alloc)
+operator new[] (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
 {
   return ::operator new(sz);
 }

Index: libsupc++/new_op.cc
===
--- libsupc++/new_op.cc	(revision 179842)
+++ libsupc++/new_op.cc	(working copy)
@@ -42,7 +42,7 @@
 extern new_handler __new_handler;
 
 _GLIBCXX_WEAK_DEFINITION void *
-operator new (std::size_t sz) throw (std::bad_alloc)
+operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc)
 {
   void *p;

Index: libsupc++/del_opv.cc
===
--- libsupc++/del_opv.cc	(revision 179842)
+++ libsupc++/del_opv.cc	(working copy)
@@ -1,6 +1,7 @@
 // Boilerplate support routines for -*- C++ -*- dynamic memory management.
 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation
+// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011
+// Free Software Foundation
 //
 // This file is part of GCC.
 //
@@ -27,7 +28,7 @@
 #include <new>
 
 _GLIBCXX_WEAK_DEFINITION void
Re: Fix PR 50565 (offsetof-type expressions in static initializers)
On Tue, Oct 11, 2011 at 5:32 PM, Joseph S. Myers jos...@codesourcery.com wrote: This patch fixes PR 50565, a failure to accept certain offsetof-type expressions in static initializers introduced by my constant expressions changes. (These expressions are permitted but not required by ISO C to be accepted; the intent of my constant expressions model is that they should be valid in GNU C.) The problem comes down to an expression with the difference of two pointers being cast to int on a 64-bit system, resulting in convert_to_integer moving the conversions inside the subtraction. (These optimizations at conversion time should really be done later as a part of folding, or even later than that, rather than unconditionally in convert_to_*, but that's another issue.) So when the expression reaches c_fully_fold it is a difference of narrowed pointers being folded, which the compiler cannot optimize as it can a difference of unnarrowed pointers with the same base object. Before the introduction of c_fully_fold the difference would have been folded when built and so the narrowing of operands would never have been applied to it. This patch disables the narrowing in the case of pointer subtraction, as it doesn't seem particularly likely to be useful there and is known to prevent this folding required for these initializers to be accepted. Bootstrapped with no regressions on x86_64-unknown-linux-gnu. OK to commit? Ok. Thanks, Richard. 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * convert.c (convert_to_integer): Do not narrow operands of pointer subtraction. testsuite: 2011-10-11 Joseph Myers jos...@codesourcery.com PR c/50565 * gcc.c-torture/compile/pr50565-1.c, gcc.c-torture/compile/pr50565-2.c: New tests. 
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-1.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-1.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-1.c	(revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0U;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1U;
Index: gcc/testsuite/gcc.c-torture/compile/pr50565-2.c
===
--- gcc/testsuite/gcc.c-torture/compile/pr50565-2.c	(revision 0)
+++ gcc/testsuite/gcc.c-torture/compile/pr50565-2.c	(revision 0)
@@ -0,0 +1,4 @@
+struct s { char p[2]; };
+static struct s v;
+const int o0 = (int) ((void *) &v.p[0] - (void *) &v) + 0;
+const int o1 = (int) ((void *) &v.p[0] - (void *) &v) + 1;
Index: gcc/convert.c
===
--- gcc/convert.c	(revision 179754)
+++ gcc/convert.c	(working copy)
@@ -745,6 +745,15 @@ convert_to_integer (tree type, tree expr
	    tree arg0 = get_unwidened (TREE_OPERAND (expr, 0), type);
	    tree arg1 = get_unwidened (TREE_OPERAND (expr, 1), type);
 
+	    /* Do not try to narrow operands of pointer subtraction;
+	       that will interfere with other folding.  */
+	    if (ex_form == MINUS_EXPR
+		&& CONVERT_EXPR_P (arg0)
+		&& CONVERT_EXPR_P (arg1)
+		&& POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg0, 0)))
+		&& POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (arg1, 0))))
+	      break;
+
	    if (outprec >= BITS_PER_WORD
		|| TRULY_NOOP_TRUNCATION (outprec, inprec)
		|| inprec > TYPE_PRECISION (TREE_TYPE (arg0))

--
Joseph S. Myers
jos...@codesourcery.com
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi,

On Wed, 12 Oct 2011, Kai Tietz wrote:

    And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A && B) && C) into (A && (B & C)), and (A && B) into (A & B), for this part:

    +  /* Needed for sequence points to handle trappings, and side-effects.  */
    +  else if (simple_operand_p_2 (arg0))
    +    return fold_build2_loc (loc, ncode, type, arg0, arg1);

  Well, to use here the binary form of the operation seems to me misleading.

Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations.

  It is right that the non-IF AND/OR finally has the same behavior as the binary form in gimple. Nevertheless it isn't the same at AST level. But sure, I can add comments for operations like (A OR/AND-IF B) OR/AND-IF C -> (A OR/AND-IF (B OR/AND C)) and A OR/AND-IF C -> (A OR/AND C)

Too much noise; leave out the || variant, and just say once "Same for ||."

Ciao,
Michael.
Re: [PATCH] Add capability to run several iterations of early optimizations
On Wed, Oct 12, 2011 at 8:50 AM, Maxim Kuvyrkov ma...@codesourcery.com wrote: The following patch adds new knob to make GCC perform several iterations of early optimizations and inlining. This is for dont-care-about-compile-time-optimize-all-you-can scenarios. Performing several iterations of optimizations does significantly improve code speed on a certain proprietary source base. Some hand-tuning of the parameter value is required to get optimum performance. Another good use for this option is for search and ad-hoc analysis of cases where GCC misses optimization opportunities. With the default setting of '1', nothing is changed from the current status quo. The patch was bootstrapped and regtested with 3 iterations set by default on i686-linux-gnu. The only failures in regression testsuite were due to latent bugs in handling of EH information, which are being discussed in a different thread. Performance impact on the standard benchmarks is not conclusive, there are improvements in SPEC2000 of up to 4% and regressions down to -2%, see [*]. SPEC2006 benchmarks will take another day or two to complete and I will update the spreadsheet then. The benchmarks were run on a Core2 system for all combinations of {-m32/-m64}{-O2/-O3}. Effect on compilation time is fairly predictable, about 10% compile time increase with 3 iterations. OK for trunk? I don't think this is a good idea, especially in the form you implemented it. If we'd want to iterate early optimizations we'd want to do it by iterating an IPA pass so that we benefit from more precise size estimates when trying to inline a function the second time. Also statically scheduling the passes will mess up dump files and you have no chance of say, noticing that nothing changed for function f and its callees in iteration N and thus you can skip processing them in iteration N + 1. 
So, at least you should split the pass_early_local_passes IPA pass into three; you'd iterate over the 2nd (definitely not over pass_split_functions though), and the third would be pass_profile and pass_split_functions only. And you'd iterate from the place the 2nd IPA pass is executed, not by scheduling them N times. Then you'd have to analyze the compile-time impact of the IPA splitting on its own when not iterating. Then you should look at which optimizations actually were performed that led to the improvement (I can see some indirect inlining happening, but everything else would be a bug in present optimizers in the early pipeline - they are all designed to be roughly independent of each other and _not_ expose new opportunities by iteration). Thus - testcases?

Thanks,
Richard.

  [*] https://docs.google.com/spreadsheet/ccc?key=0AvK0Y-Pgj7bNdFBQMEJ6d3laeFdvdk9lQ1p0LUFkVFEhl=en_US

  Thank you,

  --
  Maxim Kuvyrkov
  CodeSourcery / Mentor Graphics
Re: int_cst_hash_table mapping persistence and the garbage collector
On Wed, Oct 12, 2011 at 10:29 AM, Eric Botcazou ebotca...@adacore.com wrote: It maps a type node into a corresponding integer node that is the blocking factor associated with the type node. Before the advent of this hash table the blocking factor was stored in a dedicated field in the tree type node. The suggestion was made to move this into a hash table to save space. I chose the tree map hash table because I thought it could do the job. So this isn't a simple hash table since this is a map. A simple hash table doesn't store the key in the slot, only the value; a map does. The keys are valid. In the example discussed in this thread, there is a pointer to type node that used in a parameter declaration of a function prototype and also in the similarly named parameter of the function definition. Both tree pointers are used as keys, and they are live at the point that the GC runs. But somehow they aren't marked by the GC. You need to find out why, since the value will be kept only if the key is already marked by the GC. By the time a GC pass is run, all trees to be kept must be linked to a GC root. You said that the pass was run between the point that the function prototype tree node was created and the point at which the function declaration was processed. To which GC root are the keys linked between these points? I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. Richard. -- Eric Botcazou
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
2011/10/12 Michael Matz m...@suse.de:

  Hi,

  On Wed, 12 Oct 2011, Kai Tietz wrote:

    And I think it could use some overview of the transformation done like in the initial patch, ala: Transform ((A && B) && C) into (A && (B & C)), and (A && B) into (A & B), for this part:

    + /* Needed for sequence points to handle trappings, and side-effects. */
    + else if (simple_operand_p_2 (arg0))
    +   return fold_build2_loc (loc, ncode, type, arg0, arg1);

    Well, to use here the binary form of the operation seems to me misleading.

  Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations.

See, TRUTH_AND_EXPR is in C notation '&' and TRUTH_ANDIF_EXPR is '&&'. The transcription to binary form is done in the gimplifier. Btw, I just noticed that with this patch a latent bug in the gimplifier about boolification for TRUTH_NOT_EXPR/TRUTH_AND/OR... is present. In Fortran there are different boolean-kind types with different precision. This makes them incompatible to each other in gimple (as useless_type_conversion_p returns false for them). The gimplifier needs to ensure that the operands of those TRUTH_... expressions have a type compatible with the final expression type. I will send a patch for this as soon as I have completed regression-testing for it.

  It is right that the non-IF AND/OR finally has the same behavior as the binary form in gimple. Nevertheless it isn't the same at AST level. But sure, I can add comments for operations like (A OR/AND-IF B) OR/AND-IF C -> (A OR/AND-IF (B OR/AND C)) and A OR/AND-IF C -> (A OR/AND C)

  Too much noise; leave out the || variant, and just say once "Same for ||."

  Ciao,
  Michael.

Cheers,
Kai
Re: [PATCH] [Annotalysis] Bugfix where lock function is attached to a base class.
On Tue, Oct 11, 2011 at 13:52, Delesley Hutchins deles...@google.com wrote: This patch fixes an error where Annotalysis generates bogus warnings when using lock and unlock functions that are attached to a base class. The canonicalize routine did not work correctly in this case. Bootstrapped and passed gcc regression testsuite on x86_64-unknown-linux-gnu. Okay for google/gcc-4_6? -DeLesley Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * tree-threadsafe-analyze.c (get_canonical_lock_expr) testsuite/Changelog.google-4_6: 2011-10-11 DeLesley Hutchins deles...@google.com * g++.dg/thread-ann/thread_annot_lock-83.C ChangeLog entries are missing. OK otherwise. Diego.
Re: [PATCH] RFC: Cache LTO streamer mappings
On Sun, Oct 9, 2011 at 13:11, Andi Kleen a...@firstfloor.org wrote: Is it still a good idea? Given that you found no speedups and it introduces added complexity, I think it's best if we revisit the idea later. I never found bytecode reading to be a bottleneck in LTO, but perhaps Jan can comment what the experience is with Mozilla builds. Diego.
Re: [gimplefe][patch] A bugfix for a missed symbol
On Mon, Oct 10, 2011 at 17:28, Sandeep Soni soni.sande...@gmail.com wrote: 2011-10-11 Sandeep Soni soni.sande...@gmail.com * parser.c (gp_parse_var_decl): Fixed a bug for the missing symbol 'CPP_LESS' in the 'INTEGER_TYPE' declaration. OK. Diego.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 07:26 AM, Paolo Carlini wrote: +delattrs + = build_tree_list (get_identifier ("externally_visible"), +build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? Jason
Re: [PR50672, PATCH] Fix ice triggered by -ftree-tail-merge: verify_ssa failed: no immediate_use list
On Wed, Oct 12, 2011 at 8:35 AM, Tom de Vries vr...@codesourcery.com wrote: Richard, I have a patch for PR50672. When compiling the testcase from the PR with -ftree-tail-merge, the scenario is as follows: We start out tail_merge_optimize with blocks 14 and 20, which are alike, but not equal, since they have different successors: ... # BLOCK 14 freq:690 # PRED: 25 [61.0%] (false,exec) if (wD.2197_57(D) != 0B) goto <bb 15>; else goto <bb 16>; # SUCC: 15 [78.4%] (true,exec) 16 [21.6%] (false,exec) # BLOCK 20 freq:2900 # PRED: 29 [100.0%] (fallthru) 31 [100.0%] (fallthru) # .MEMD.2447_209 = PHI <.MEMD.2447_125(29), .MEMD.2447_129(31)> if (wD.2197_57(D) != 0B) goto <bb 5>; else goto <bb 6>; # SUCC: 5 [85.0%] (true,exec) 6 [15.0%] (false,exec) ... In the first iteration, we merge block 5 with block 15 and block 6 with block 16. After that, the blocks 14 and 20 are equal. In the second iteration, the blocks 14 and 20 are merged, by redirecting the incoming edges of block 20 to block 14, and removing block 20. Block 20 also contains the definition of .MEMD.2447_209. Removing the definition delinks the vuse of .MEMD.2447_209 in block 5: ... # BLOCK 5 freq:6036 # PRED: 20 [85.0%] (true,exec) # PT = nonlocal escaped D.2306_58 = thisD.2200_10(D)->D.2156; # .MEMD.2447_132 = VDEF <.MEMD.2447_209> # USE = anything # CLB = anything drawLineD.2135 (D.2306_58, wD.2197_57(D), gcD.2198_59(D)); goto <bb 17>; # SUCC: 17 [100.0%] (fallthru,exec) ... And block 5 is retained and block 15 is discarded? After the pass, when executing the TODO_update_ssa_only_virtuals, we update the drawLine call in block 5 using rewrite_update_stmt, which calls maybe_replace_use for the vuse operand. However, maybe_replace_use doesn't have an effect since the old vuse and the new vuse happen to be the same (rdef == use), so SET_USE is not called and the vuse remains delinked: ... if (rdef && rdef != use) SET_USE (use_p, rdef); ... The patch fixes this by forcing SET_USE for delinked uses. That isn't the correct fix.
Whoever unlinks the vuse (by removing its definition) has to replace it with something valid, which is either the bare symbol .MEM, or the VUSE associated with the removed VDEF (thus, as unlink_stmt_vdef does). Richard. Bootstrapped and reg-tested on x86_64. OK for trunk? Thanks, - Tom 2011-10-12 Tom de Vries t...@codesourcery.com PR tree-optimization/50672 * tree-into-ssa.c (maybe_replace_use): Force SET_USE for delinked uses.
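For readers without the SSA operand internals fresh in mind: every use operand sits on a doubly-linked "immediate uses" list hanging off its definition, and removing the definition delinks the use (its prev/next pointers become NULL) while the use keeps pointing at the stale def. Tom's patch makes maybe_replace_use call SET_USE even when the reaching def is unchanged, so a delinked use gets spliced back in. A toy model of just that condition — the real operand structures and SET_USE are of course richer:

```c
#include <stddef.h>

struct use_node { struct use_node *prev, *next; int *def; };
struct def_node { struct use_node uses; int value; };  /* uses = list head */

/* Model of SET_USE: point the use at D's value and splice it into
   D's immediate-use list.  */
static void set_use(struct def_node *d, struct use_node *u) {
  u->def = &d->value;
  u->next = d->uses.next;
  u->prev = &d->uses;
  if (d->uses.next)
    d->uses.next->prev = u;
  d->uses.next = u;
}

/* Model of removing the definition: the use is delinked but still
   points at the stale def.  */
static void delink(struct use_node *u) {
  if (u->prev) u->prev->next = u->next;
  if (u->next) u->next->prev = u->prev;
  u->prev = u->next = NULL;
}

/* The patched condition: relink when the use is delinked, even if the
   reaching def equals the one already recorded.  */
static void maybe_replace_use(struct def_node *d, struct use_node *u) {
  int *rdef = &d->value;  /* stand-in for get_reaching_def */
  if (rdef && (rdef != u->def || (!u->next && !u->prev)))
    set_use(d, u);
}
```

With the unpatched condition (rdef && rdef != u->def), the delinked use in the scenario above would stay off the list forever, which is exactly the verify_ssa "no immediate_use list" failure — and Richard's point is that whoever removed the definition should have relinked it (as unlink_stmt_vdef does) rather than patching the update machinery.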
Re: [gimplefe][patch] The symbol table for declarations
On Tue, Oct 11, 2011 at 09:42, Tom Tromey tro...@redhat.com wrote: Sandeep == Sandeep Soni soni.sande...@gmail.com writes: Sandeep The following patch is a basic attempt to build a symbol table that Sandeep stores the names of all the declarations made in the input file. I don't know anything about gimplefe, but unless you have complicated needs, it is more usual to just put a symbol's value directly into the identifier node. The C front end is a good example of this. Granted, but a central symbol table simplifies processing like generating gimple output. The gimple FE will want to emit a text file with transformed gimple. Diego.
[PATCH] Fix PR50700
This fixes __builtin_object_size folding on static storage accessed via a type with a trailing array element. Bootstrapped and tested on x86_64-unknown-linux-gnu, will apply after testing on the 4.6 branch. Thanks, Richard. 2011-10-12 Richard Guenther rguent...@suse.de PR tree-optimization/50700 * tree-object-size.c (addr_object_size): Simplify and treat MEM_REF bases consistently. * gcc.dg/builtin-object-size-12.c: New testcase. Index: gcc/tree-object-size.c === *** gcc/tree-object-size.c (revision 179757) --- gcc/tree-object-size.c (working copy) *** addr_object_size (struct object_size_inf *** 166,189 gcc_assert (TREE_CODE (ptr) == ADDR_EXPR); pt_var = TREE_OPERAND (ptr, 0); ! if (REFERENCE_CLASS_P (pt_var)) ! pt_var = get_base_address (pt_var); if (pt_var && TREE_CODE (pt_var) == MEM_REF && TREE_CODE (TREE_OPERAND (pt_var, 0)) == SSA_NAME && POINTER_TYPE_P (TREE_TYPE (TREE_OPERAND (pt_var, 0)))) { unsigned HOST_WIDE_INT sz; ! if (!osi || (object_size_type & 1) != 0) { sz = compute_builtin_object_size (TREE_OPERAND (pt_var, 0), object_size_type & ~1); - if (host_integerp (TREE_OPERAND (pt_var, 1), 0)) - sz -= TREE_INT_CST_LOW (TREE_OPERAND (pt_var, 1)); - else - sz = offset_limit; } else { --- 166,184 gcc_assert (TREE_CODE (ptr) == ADDR_EXPR); pt_var = TREE_OPERAND (ptr, 0); ! while (handled_component_p (pt_var)) ! pt_var = TREE_OPERAND (pt_var, 0); if (pt_var && TREE_CODE (pt_var) == MEM_REF) { unsigned HOST_WIDE_INT sz; ! if (!osi || (object_size_type & 1) != 0 ! || TREE_CODE (pt_var) != SSA_NAME) { sz = compute_builtin_object_size (TREE_OPERAND (pt_var, 0), object_size_type & ~1); } else { *** addr_object_size (struct object_size_inf *** 195,204 sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; else sz = unknown[object_size_type]; ! if (host_integerp (TREE_OPERAND (pt_var, 1), 0)) ! sz -= TREE_INT_CST_LOW (TREE_OPERAND (pt_var, 1)); else !
sz = offset_limit; } if (sz != unknown[object_size_type] && sz < offset_limit) --- 190,206 sz = object_sizes[object_size_type][SSA_NAME_VERSION (var)]; else sz = unknown[object_size_type]; ! } ! if (sz != unknown[object_size_type]) ! { ! double_int dsz = double_int_sub (uhwi_to_double_int (sz), ! mem_ref_offset (pt_var)); ! if (double_int_negative_p (dsz)) ! sz = 0; ! else if (double_int_fits_in_uhwi_p (dsz)) ! sz = double_int_to_uhwi (dsz); else ! sz = unknown[object_size_type]; } if (sz != unknown[object_size_type] && sz < offset_limit) *** addr_object_size (struct object_size_inf *** 211,217 tree_low_cst (DECL_SIZE_UNIT (pt_var), 1) < offset_limit) pt_var_size = DECL_SIZE_UNIT (pt_var); else if (pt_var && (SSA_VAR_P (pt_var) || TREE_CODE (pt_var) == STRING_CST) && TYPE_SIZE_UNIT (TREE_TYPE (pt_var)) && host_integerp (TYPE_SIZE_UNIT (TREE_TYPE (pt_var)), 1) && (unsigned HOST_WIDE_INT) --- 213,219 tree_low_cst (DECL_SIZE_UNIT (pt_var), 1) < offset_limit) pt_var_size = DECL_SIZE_UNIT (pt_var); else if (pt_var && TREE_CODE (pt_var) == STRING_CST && TYPE_SIZE_UNIT (TREE_TYPE (pt_var)) && host_integerp (TYPE_SIZE_UNIT (TREE_TYPE (pt_var)), 1) && (unsigned HOST_WIDE_INT) Index: gcc/testsuite/gcc.dg/builtin-object-size-12.c === *** gcc/testsuite/gcc.dg/builtin-object-size-12.c (revision 0) --- gcc/testsuite/gcc.dg/builtin-object-size-12.c (revision 0) *** *** 0 --- 1,19 + /* { dg-do run } */ + /* { dg-options "-O2" } */ + + extern void abort (void); + struct S { + int len; + char s[0]; + }; + int main() + { + char buf[sizeof (struct S) + 32]; + if (__builtin_object_size (((struct S *)&buf[0])->s, 1) != 32) + abort (); + if (__builtin_object_size (((struct S *)&buf[1])->s, 1) != 31) + abort (); + if (__builtin_object_size (((struct S *)&buf[64])->s, 0) != 0) + abort (); + return 0; + }
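The arithmetic the fixed code performs for such accesses — the size of the enclosing object minus the offset of the accessed member, clamped at zero — can be checked independently of the compiler. A sketch of the values the new testcase expects, assuming the usual layout where the zero-length trailing array starts at sizeof (struct S):

```c
#include <stddef.h>

struct S { int len; char s[0]; };   /* zero-length trailing array (GCC extension) */

/* Bytes remaining in a BUFSIZE-byte buffer for the member 's' of a
   struct S placed at byte offset OFF, clamped at zero -- the value a
   correct mode-1 __builtin_object_size should fold to here.  */
static size_t remaining(size_t bufsize, size_t off) {
  size_t start = off + offsetof(struct S, s);
  return start >= bufsize ? 0 : bufsize - start;
}
```

With a 36-byte buffer on a typical ABI this yields 32 at offset 0, 31 at offset 1, and 0 once the struct is placed past the end of the buffer — matching the three checks in builtin-object-size-12.c.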
Re: [patch tree-optimization]: Improve handling of conditional-branches on targets with high branch costs
Hi, On Wed, 12 Oct 2011, Kai Tietz wrote: Hmm? What do you mean? Both operations are binary. ANDIF is '&&', AND is '&'. In fold-const.c comments we usually use the C notations of the operations. See, TRUTH_AND_EXPR is & in C-notation and TRUTH_ANDIF_EXPR is also &&. Ah right, confusing but there we are. A comment using ANDIF and AND it is then. Ciao, Michael.
[PATCH] Fix PR50189
This changes VRP to use the type of the variable we record an assertion for to look for TYPE_MIN/MAX_VALUEs rather than the limit that it is tested against. That makes sense anyway and happens to mitigate the wrong-code bug for the testcase in PR50189. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Thanks, Richard. 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 * tree-vrp.c (extract_range_from_assert): Use the type of the variable, not the limit. * g++.dg/torture/pr50189.C: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 179757) --- gcc/tree-vrp.c (working copy) *** extract_range_from_assert (value_range_t *** 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (limit); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality --- 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (var); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality Index: gcc/testsuite/g++.dg/torture/pr50189.C === *** gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) *** *** 0 --- 1,121 + // { dg-do run } + // { dg-options "-fstrict-enums" } + + extern "C" void abort (void); + class CCUTILS_KeyedScalarLevelPosition + { + public: + + typedef enum + { + UNINITED = 0, + AT_BEGIN = 1, + AT_END = 2, + AT_KEY = 3 + + } position_t; + + bool is_init() const + { return(m_timestamp != UNINITED); } + + bool is_at_begin() const + { return(m_timestamp == AT_BEGIN); } + + position_t get_state() const + { + return((m_timestamp >= AT_KEY) + ?
AT_KEY + : ((position_t)m_timestamp)); + } + + void set_at_begin() + { m_timestamp = AT_BEGIN; } + + unsigned int get_index() const + { return(m_index); } + + void set_pos(unsigned int a_index, unsigned int a_timestmap) + { + m_index = a_index; + m_timestamp = a_timestmap; + } + + bool check_pos(unsigned int a_num_entries, unsigned int a_timestamp) const + { + if (get_state() != AT_KEY) + return(false); + + if (m_timestamp != a_timestamp) + return(false); + + return(m_index < a_num_entries); + } + + void set_not_init() + { m_timestamp = 0; } + + private: + + unsigned int m_timestamp; + unsigned int m_index; + + }; + + class CCUTILS_KeyedScalarPosition + { + public: + + CCUTILS_KeyedScalarLevelPosition m_L1; + CCUTILS_KeyedScalarLevelPosition m_L2; + }; + + class baz + { + public: + int *n[20]; + unsigned int m_cur_array_len; + unsigned int m_timestamp; + + unsigned int _get_timestamp() const + { return(m_timestamp); } + + bool _check_L1_pos(const CCUTILS_KeyedScalarPosition &a_position) const + { + return(a_position.m_L1.check_pos( +m_cur_array_len, _get_timestamp())); + } + + void *next (CCUTILS_KeyedScalarPosition &); + }; + + void * baz::next (CCUTILS_KeyedScalarPosition &a_position) + { + if (a_position.m_L1.is_at_begin() || (!a_position.m_L1.is_init())) + { + a_position.m_L1.set_pos(0, _get_timestamp()); + a_position.m_L2.set_at_begin(); + } + else if (!_check_L1_pos(a_position)) + return(0); + + return n[a_position.m_L1.get_index ()]; + } + + int main (int, char **) + { + baz obj; + CCUTILS_KeyedScalarPosition a_pos; + void *ret; + int n[5]; + + obj.n[0] = n; + obj.m_cur_array_len = 1; + obj.m_timestamp = 42; + + a_pos.m_L1.set_pos (0, 42); + + ret = obj.next (a_pos); + if (ret == 0) + abort (); + return 0; + }
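The essence of the one-line fix: when VRP records a range from an assertion like var <= limit, the bound that caps the range must come from var's own type (which with -fstrict-enums can be as narrow as the enum's value range), not from limit's type. A simplified model with invented names — real VRP works on value_range_t and tree types, not longs:

```c
struct range { long min, max; };

/* Model of extract_range_from_assert for "var <= limit": the range is
   clamped to the min/max of VAR's type (the fix), not of LIMIT's type.  */
static struct range range_for_le(long limit, long var_min, long var_max) {
  struct range r;
  r.min = var_min;
  r.max = limit < var_max ? limit : var_max;
  return r;
}
```

For a position_t variable whose strict-enum range is [0, 3], clamping against the limit's (wider) type instead of [0, 3] would let a stale wide bound leak into later comparisons — the class of wrong-code the testcase exercises.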
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 02:18 PM, Jason Merrill wrote: On 10/12/2011 07:26 AM, Paolo Carlini wrote: +delattrs + = build_tree_list (get_identifier ("externally_visible"), + build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? To be honest? No idea, I copied what pre-existed for operator new. Shall I test (NULL_TREE, NULL_TREE)?? Paolo.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On Wed, Oct 12, 2011 at 2:29 PM, Paolo Carlini paolo.carl...@oracle.com wrote: On 10/12/2011 02:18 PM, Jason Merrill wrote: On 10/12/2011 07:26 AM, Paolo Carlini wrote: + delattrs + = build_tree_list (get_identifier ("externally_visible"), + build_tree_list (NULL_TREE, integer_one_node)); Why integer_one_node? To be honest? No idea, I copied what pre-existed for operator new. Shall I test (NULL_TREE, NULL_TREE)?? build_tree_list (get_identifier ("externally_visible"), NULL_TREE) Paolo.
Re: [PATCH] Fix PR50204
Hi, On Tue, 11 Oct 2011, Richard Guenther wrote: Since we have the alias oracle we no longer optimize the testcase below because I initially restricted the stmt walking to give up for PHIs with more than 2 arguments because of compile-time complexity issues. But it's easy to see that compile-time is not an issue when we reduce PHI args pairwise to a single dominating operand. Of course it is, not a different complexity class, but a constant factor. You have to do N-1 pairwise reductions, meaning with a large fan-in block you pay N-1 times the price, not just once for one pair, and if the price happens to be walking all up to the function start you indeed then are at N*M. I think there should be a cutoff, possibly not at two. Think about the generated testcases with many large switches. Ciao, Michael.
Re: Out-of-order update of new_spill_reg_store[]
On 10/11/11 14:35, Richard Sandiford wrote: No, reload 1 is inherited by a later instruction. And it's inherited correctly, in terms of the register contents being what we expect. (Reload 1 is the one that survives to the end of the instruction's reload sequence. Reload 2, in contrast, is clobbered by reload 1, so could not be inherited. So when we record inheritance information in emit_reload_insns, reload_reg_reaches_end_p correctly stops us from recording reload 2 but allows us to record reload 1.) The problem is that we record the wrong instruction for reload 1. We say that reload 1 is performed by the instruction that performs reload 2. So spill_reg_store[] contains the instruction for reload 2 rather than the instruction for reload 1. We delete it in delete_output_reload at the point of inheritance. Ok. So, would the minimal fix of testing !new_spill_reg_store[..] before writing to it also work? Seems to me this would cope with the out-of-order writes by only allowing the first. If so, then I think I'd prefer that, but we could gcc_assert (reload_reg_reaches_end (..)) as a bit of a verification of that function. Bernd
Re: [PATCH] RFC: Cache LTO streamer mappings
On 11-10-12 08:25 , Jan Hubicka wrote: WPA is currently about 1/3 of reading & type merging, 1/3 of streaming out and 1/3 of inlining. Inlining is relatively easy to cure, so yes, streaming performance is important. The very basic streaming primitives actually still show up at the top of the profile, along with the hashing and type-comparing code. I will post some updated oprofiles into the Mozilla PR. OK, thanks. My numbers are from very early LTO development. Honestly, I think we won't get any great speedups unless we work on reducing the amount of unnecessary info we pickle/unpickle. That's what I was leaning towards. Optimizing the basic access patterns may not buy us as much as just reducing the amount of clutter we have to deal with. It may make sense, however, as a subsequent optimization. Diego.
Re: [Patch,AVR]: Fix PR49939: Skip 2-word insns
2011/10/12 Georg-Johann Lay a...@gjlay.de: Denis Chertykov schrieb: 2011/10/11 Georg-Johann Lay a...@gjlay.de: This patch teaches avr-gcc to skip 2-word instructions like STS and LDS. It's just about looking into a 2-word insn and checking whether it's a 2-word instruction or not. Passes without regression. Ok to install? Please commit. Denis. Committed with the following change: - avr_2word_insn_p (next_nonnote_nondebug_insn (insn))); + avr_2word_insn_p (next_active_insn (insn))); It was discussed in another thread. Denis.
Re: [C++ Patch] PR 50594 (C++ front-end bits)
... or like this, maybe better. Paolo. Index: decl.c === --- decl.c (revision 179842) +++ decl.c (working copy) @@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void) current_lang_name = lang_name_cplusplus; { -tree newattrs; +tree newattrs, extvisattr; tree newtype, deltype; tree ptr_ftype_sizetype; tree new_eh_spec; @@ -3687,9 +3687,13 @@ cxx_init_decl_processing (void) newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs); newtype = build_exception_variant (newtype, new_eh_spec); -deltype = build_exception_variant (void_ftype_ptr, empty_except_spec); +deltype = cp_build_type_attribute_variant (void_ftype_ptr, extvisattr); +deltype = build_exception_variant (deltype, empty_except_spec); push_cp_library_fn (NEW_EXPR, newtype); push_cp_library_fn (VEC_NEW_EXPR, newtype); global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);
Re: [gimplefe][patch] The symbol table for declarations
On 11-10-10 17:47 , Sandeep Soni wrote: Hi, The following patch is a basic attempt to build a symbol table that stores the names of all the declarations made in the input file. Index: gcc/gimple/parser.c === --- gcc/gimple/parser.c (revision 174754) +++ gcc/gimple/parser.c (working copy) @@ -28,6 +28,7 @@ #include "tree.h" #include "gimple.h" #include "parser.h" +#include "hashtab.h" #include "ggc.h" /* The GIMPLE parser. Note: do not use this variable directly. It is @@ -44,6 +45,43 @@ /* EOF token. */ static gimple_token gl_eof_token = { CPP_EOF, 0, 0, 0 }; +/* The GIMPLE symbol table entry. */ + +struct GTY (()) gimple_symtab_entry_def +{ + /* Variable that is declared. */ + tree decl; + +}; No blank line before '};' Add 'typedef struct gimple_symtab_entry_def gimple_symtab_entry;' to shorten declarations. + +/* Gimple symbol table. */ +static htab_t gimple_symtab; + +/* Return the hash value of the declaration name of a gimple_symtab_entry_def + object pointed by ENTRY. */ + +static hashval_t +gimple_symtab_entry_hash (const void *entry) +{ + const struct gimple_symtab_entry_def *base = +(const struct gimple_symtab_entry_def *)entry; + return IDENTIFIER_HASH_VALUE (DECL_NAME(base->decl)); Space after DECL_NAME. +} + +/* Returns non-zero if ENTRY1 and ENTRY2 points to gimple_symtab_entry_def s/points/point/ + objects corresponding to the same declaration. */ + +static int +gimple_symtab_eq_hash (const void *entry1, const void *entry2) +{ + const struct gimple_symtab_entry_def *p1 = +(const struct gimple_symtab_entry_def *)entry1; + const struct gimple_symtab_entry_def *p2 = +(const struct gimple_symtab_entry_def *)entry2; + + return DECL_NAME(p1->decl) == DECL_NAME(p2->decl); Space after DECL_NAME. +} + /* Return the string representation of token TOKEN. */ static const char * @@ -807,6 +845,7 @@ } } + /* The Declaration section within a .gimple file can consist of a) Declaration of variables. b) Declaration of functions.
@@ -870,11 +909,17 @@ static void gp_parse_var_decl (gimple_parser *parser) { - const gimple_token *next_token; + const gimple_token *next_token, *name_token; + const char* name; s/char* /char */ enum tree_code code ; + struct gimple_symtab_entry_def e; gl_consume_expected_token (parser->lexer, CPP_LESS); - gl_consume_expected_token (parser->lexer, CPP_NAME); + name_token = gl_consume_expected_token (parser->lexer, CPP_NAME); + name = gl_token_as_text (name_token); + e.decl = + build_decl (UNKNOWN_LOCATION, VAR_DECL, get_identifier(name), void_type_node); No need to use UNKNOWN_LOCATION. Get the location for E.DECL from name_token.location. Additionally, before building the decl, you should make sure that the symbol table does not already have it. So, instead of looking up with a DECL, you should look it up using IDENTIFIER_NODEs. There are two approaches you can use: 1- Add an identifier field to gimple_symtab_entry_def. Use that field for hash table lookups (in this code you'd then fill E.ID with NAME_TOKEN). 2- Use a pointer_map_t and a VEC(). With this approach, you use a pointer map to map identifier nodes to unsigned integers. These integers are the index into the VEC() array where the corresponding decl is stored. In this case, I think #1 is the simplest approach. + htab_find_slot (gimple_symtab, &e, INSERT); This looks wrong. Where are you actually filling in the slot? You need to check the returned slot, if it's empty, you fill it in with E.DECL. See other uses of htab_*.
gl_consume_expected_token (parser->lexer, CPP_COMMA); next_token = gl_consume_token (parser->lexer); @@ -981,6 +1027,7 @@ gimple_parser *parser = ggc_alloc_cleared_gimple_parser (); line_table = parser->line_table = ggc_alloc_cleared_line_maps (); parser->ident_hash = ident_hash; + linemap_init (parser->line_table); parser->lexer = gl_init (parser, fname); @@ -1403,6 +1450,9 @@ if (parser->lexer->filename == NULL) return; + gimple_symtab = +htab_create_ggc (1021, gimple_symtab_entry_hash, +gimple_symtab_eq_hash, NULL); Do you need to indent it this way? Seems to me that the call to htab_create_ggc can fit in the line above. Diego.
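Diego's "check the returned slot" point is the standard insert-or-find idiom: the find-slot call returns a pointer to the slot, and only when that slot is empty does the caller allocate and store a new entry. A self-contained toy version of the idiom — table size, probing scheme, and entry layout are all invented for the sketch; libiberty's htab does the probing for you:

```c
#include <string.h>

#define NSLOTS 64

struct entry { const char *name; int decl; };
static struct entry *table[NSLOTS];

static unsigned hash_name(const char *s) {
  unsigned h = 5381;
  while (*s) h = h * 33 + (unsigned char)*s++;
  return h;
}

/* Return the entry for NAME, creating it with DECL only if absent --
   an existing entry is never overwritten.  */
static struct entry *get_or_insert(const char *name, int decl) {
  unsigned i = hash_name(name) % NSLOTS;
  while (table[i] && strcmp(table[i]->name, name) != 0)
    i = (i + 1) % NSLOTS;              /* linear probing */
  if (!table[i]) {                     /* empty slot: fill it in */
    static struct entry pool[NSLOTS];
    static int used;
    pool[used].name = name;
    pool[used].decl = decl;
    table[i] = &pool[used++];
  }
  return table[i];
}
```

The bug in the reviewed patch is visible by contrast: it passed a stack-allocated entry to the insert call and never stored anything into the returned slot, so the table ended up holding a dangling (or empty) slot rather than the decl.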
[patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
Hi, dropping the optional DWARF attribute DW_AT_sibling has only advantages and no disadvantages: For files with .gdb_index GDB initial scan does not use DW_AT_sibling at all. For files without .gdb_index GDB initial scan has 1.79% time _improvement_. For .debug files it brings 3.49% size decrease (7.84% for rpm compressed files). I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock multipliers. Nowadays mostly only the data size transferred over FSB matters. I do not think there would be any DWARF consumers compatibility problems as DW_AT_sibling has always been optional but I admit I have tested only GDB. clean is FSF GCC+GDB, ns is FSF GCC with the patch applied. gdbindex -readnow 100x warm: clean: 56.975 57.161 57.738 58.243 57.529249 seconds ns: 57.799 58.008 58.202 58.473 58.1204993 seconds +1.03% = performance decrease but it should be 0%, it is a measurement error gdbindex -readnow 20x warm(gdb) cold(data): clean: 57.989 ns: 58.538 +0.95% = performance decrease but it should be 0%, it is a measurement error 200x warm: clean: 14.393 14.414 14.587 14.496 14.4724998 seconds ns: 14.202 14.160 14.174 14.318 14.2134998 seconds -1.79% = performance improvement of non-gdbindex scan (dwarf2_build_psymtabs_hard) gdbindex .debug: clean = 5589272 bytes ns = 5394120 bytes -3.49% = size improvement gdbindex .debug.xz9: clean = 1158696 bytes ns = 1067900 bytes -7.84% = size improvement .debug_info + .debug_types: clean = 0x1a11a0+0x08f389 bytes ns = 0x184205+0x0833b0 bytes -7.31% = size improvement Intel i7-920 CPU and only libstdc++ from GCC 4.7.0 20111002 and `-O2 -gdwarf-4 -fdebug-types-section' were used for the benchmark. GCC 4.7.0 20111002 --enable-languages=c++ was used for `make check' regression testing. Thanks, Jan gcc/ 2011-10-12 Jan Kratochvil jan.kratoch...@redhat.com Stop producing DW_AT_sibling. * dwarf2out.c (add_sibling_attributes): Remove the declaration. (add_sibling_attributes): Remove the function. 
(dwarf2out_finish): Remove calls of add_sibling_attributes. --- a/gcc/dwarf2out.c +++ b/gcc/dwarf2out.c @@ -3316,7 +3316,6 @@ static int htab_cu_eq (const void *, const void *); static void htab_cu_del (void *); static int check_duplicate_cu (dw_die_ref, htab_t, unsigned *); static void record_comdat_symbol_number (dw_die_ref, htab_t, unsigned); -static void add_sibling_attributes (dw_die_ref); static void build_abbrev_table (dw_die_ref); static void output_location_lists (dw_die_ref); static int constant_size (unsigned HOST_WIDE_INT); @@ -7482,24 +7481,6 @@ copy_decls_for_unworthy_types (dw_die_ref unit) unmark_dies (unit); } -/* Traverse the DIE and add a sibling attribute if it may have the - effect of speeding up access to siblings. To save some space, - avoid generating sibling attributes for DIE's without children. */ - -static void -add_sibling_attributes (dw_die_ref die) -{ - dw_die_ref c; - - if (! die->die_child) -return; - - if (die->die_parent && die != die->die_parent->die_child) -add_AT_die_ref (die, DW_AT_sibling, die->die_sib); - - FOR_EACH_CHILD (die, c, add_sibling_attributes (c)); -} - /* Output all location lists for the DIE and its children. */ static void @@ -22496,14 +22477,6 @@ dwarf2out_finish (const char *filename) prune_unused_types (); } - /* Traverse the DIE's and add sibling attributes to those DIE's - that have children. */ - add_sibling_attributes (comp_unit_die ()); - for (node = limbo_die_list; node; node = node->next) -add_sibling_attributes (node->die); - for (ctnode = comdat_type_list; ctnode != NULL; ctnode = ctnode->next) -add_sibling_attributes (ctnode->root_die); - /* Output a terminator label for the .text section. */ switch_to_section (text_section); targetm.asm_out.internal_label (asm_out_file, TEXT_END_LABEL, 0);
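For context on what is being given up: DW_AT_sibling lets a consumer jump from a DIE straight to its next sibling, instead of decoding every descendant in between. Jan's numbers suggest modern GDB rarely takes the shortcut, so the attribute mostly costs space. A toy model of the traversal cost, nothing DWARF-specific:

```c
struct die {
  struct die *child;   /* first child */
  struct die *sib;     /* next sibling */
};

/* Cost of reaching the next sibling WITHOUT a sibling attribute: the
   consumer must decode every DIE in the subtree it is skipping.
   Returns the number of descendant DIEs visited.  */
static int count_skipped(const struct die *d) {
  int n = 0;
  for (const struct die *c = d->child; c; c = c->sib)
    n += 1 + count_skipped(c);
  return n;
}
```

With the attribute, the same skip is a single offset chase (the model's d->sib pointer); without it, the cost grows with the subtree size — which is exactly the trade Tristan worries about for debuggers other than GDB.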
[patch, Fortran] Change -std=f2008tr to f2008ts, update *.texi status and TR29113-TS29113
Hello all, this patch does three things: a) It updates the Fortran 2003 and TR/TS 29113 status in the GNU Fortran manual. b) It changes all references to Technical Report 29113 to Technical Specification 29113 c) It changes -std=f2008tr to -std=f2008ts (a) is obvious. Regarding (b): For some reason, ISO's SC22 thinks that one should not use Technical Reports (TR) but a Technical Specification (TS) for the Further Interoperability of Fortran with C document - and later also for the coarray extensions. Glancing at the documentation, I think they are right that a TS is better; there are procedural differences, but for us the main difference is the name. As the final word is TS, I think gfortran should use TS and not TR throughout for the post-F2008 technical documents. Cf. ftp://ftp.nag.co.uk/sc22wg5/N1851-N1900/N1879.txt : JTC 1/SC 22 instructs the JTC 1/SC 22/WG 5 Convenor to submit future drafts of TR 29113, Further Interoperability of Fortran with C as Technical Specifications For the difference between TS and TR, see also http://www.iso.org/iso/standards_development/processes_and_procedures/deliverables.htm; for the different approval scheme also the following flow chart (clickable): http://www.iso.org/iso/standards_development/it_tools/flowchart_main.htm Regarding (c): If we switch to TS everywhere, I think it makes sense to also call the flag -std=f2008ts; the flag stands for: Follow the standard according to Fortran 2008 with the extensions defined in the post-F2008 (pre-F2013) standard. Namely, TS 29113 on further interoperability of Fortran with C and the coarray TS, which is in a rather early stage. (TS 29113 is already past a PDTS voting and a DTS should be submitted by June 2012.) Given that -std=f2008tr was never included in a released GCC version and given that it currently only allows very few features, I think no one actually uses it. Hence, I decided that one can simply change it without taking care of still accepting the f2008tr version.
(Currently supported TS 29113 features: OPTIONAL with BIND(C) - absent arguments are indicated as a NULL pointer, matching the internal implementation. RANK() intrinsic - which is boring without assumed-rank arrays. ASYNCHRONOUS - well, only the semantics have changed a bit since F2003/GCC 4.6; however, GCC's middle end uses ASYNCHRONOUS semantics by default; turning it off is a missed-optimization bug.) The patch was built and regtested on x86-64-linux. OK for the trunk? Tobias PS: I will also update the release notes after the patch has been committed. 2011-10-12 Tobias Burnus bur...@net-b.de * gfortran.texi (Fortran 2008 status, TS 29113 status, Further Interoperability of Fortran with C): Update implementation status, change references from TR 29113 to TS 29113. * intrinsic.texi (RANK): Change TR 29113 to TS 29113. * invoke.texi (-std=): Ditto, change -std=f2008tr to -std=f2008ts. * lang.opt (std=): Ditto. * options.c (gfc_handle_option, set_default_std_flags): Ditto and change GFC_STD_F2008_TR to GFC_STD_F2008_TS. * libgfortran.h: Ditto. * intrinsic.c (add_functions, gfc_check_intrinsic_standard): Ditto. * decl.c (verify_c_interop_param): Ditto. 2011-10-12 Tobias Burnus bur...@net-b.de * gfortran.dg/bind_c_usage_23.f90: Change TR 29113 to TS 29113 in the comments. * gfortran.dg/bind_c_usage_24.f90: Ditto. * gfortran.dg/rank_3.f90: Ditto. * gfortran.dg/bind_c_usage_22.f90: Ditto, change -std=f2008tr to -std=f2008ts in dg-options. * gfortran.dg/rank_4.f90: Ditto.
diff --git a/gcc/fortran/decl.c b/gcc/fortran/decl.c index 0ee2575..9f3a39e 100644 --- a/gcc/fortran/decl.c +++ b/gcc/fortran/decl.c @@ -1069,7 +1069,7 @@ verify_c_interop_param (gfc_symbol *sym) retval = FAILURE; } else if (sym-attr.optional == 1 - gfc_notify_std (GFC_STD_F2008_TR, TR29113: Variable '%s' + gfc_notify_std (GFC_STD_F2008_TS, TS29113: Variable '%s' at %L with OPTIONAL attribute in procedure '%s' which is BIND(C), sym-name, (sym-declared_at), diff --git a/gcc/fortran/gfortran.texi b/gcc/fortran/gfortran.texi index 389c05b..f847df3 100644 --- a/gcc/fortran/gfortran.texi +++ b/gcc/fortran/gfortran.texi @@ -772,7 +772,7 @@ compile option was used. @menu * Fortran 2003 status:: * Fortran 2008 status:: -* TR 29113 status:: +* TS 29113 status:: @end menu @node Fortran 2003 status @@ -1003,8 +1003,11 @@ the intrinsic module @code{ISO_FORTRAN_ENV}. @code{ISO_C_BINDINGS} and @code{COMPILER_VERSION} and @code{COMPILER_OPTIONS} of @code{ISO_FORTRAN_ENV}. -@item Experimental coarray, use the @option{-fcoarray=single} or -@option{-fcoarray=lib} flag to enable it. +@item Coarray support for serial programs with @option{-fcoarray=single} flag +and experimental support for multiple images with the @option{-fcoarray=lib} +flag. + +@item The @code{DO CONCURRENT} construct is supported. @item The @code{BLOCK} construct is supported. @@ -1049,19
Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Oct 12, 2011, at 3:50 PM, Jan Kratochvil wrote: Hi, dropping the optional DWARF attribute DW_AT_sibling has only advantages and no disadvantages: For files with .gdb_index, GDB's initial scan does not use DW_AT_sibling at all. For files without .gdb_index, GDB's initial scan gets a 1.79% time _improvement_. For .debug files it brings a 3.49% size decrease (7.84% for rpm-compressed files). I guess DW_AT_sibling had real performance gains on CPUs with 1x (=no) clock multipliers. Nowadays mostly only the data size transferred over the FSB matters. I do not think there would be any DWARF consumer compatibility problems as DW_AT_sibling has always been optional, but I admit I have tested only GDB. I fear that this may degrade performance of other debuggers. What about adding a command-line option? Tristan.
[Patch, Fortran, Committed] Update -f(no-)whole-file in invoke.texi (was: Re: [Patch, Fortran, committed] PR 50585: [4.6/4.7 Regression] ICE with assumed length character array argument)
I have committed the attached patch to the 4.7 trunk (rev 179854) and the 4.6 branch (rev 179855). invoke.texi wasn't updated when -fwhole-file became the default in GCC 4.6. This was spotted by Janus, who created the first draft patch. This patch was approved by Janus off list. Tobias On 10/09/2011 07:01 PM, Tobias Burnus wrote: On 08.10.2011 11:51, Janus Weil wrote: Thanks! What's about the .texi change for -fwhole-file? Will do. Should I include a note about deprecation? And if yes, do you have a suggestion for the wording? How about the following attachment? Index: ChangeLog === --- ChangeLog (revision 179852) +++ ChangeLog (working copy) @@ -1,3 +1,9 @@ +2011-10-11 Tobias Burnus bur...@net-b.de + Janus Weil ja...@gcc.gnu.org + + * invoke.texi (-fwhole-file): Update wording since -fwhole-file + is now enabled by default. + 2011-10-11 Michael Meissner meiss...@linux.vnet.ibm.com * trans-expr.c (gfc_conv_power_op): Delete old interface with two Index: invoke.texi === --- invoke.texi (revision 179852) +++ invoke.texi (working copy) @@ -164,7 +164,7 @@ @item Code Generation Options @xref{Code Gen Options,,Options for code generation conventions}. @gccoptlist{-fno-automatic -ff2c -fno-underscoring @gol --fwhole-file -fsecond-underscore @gol +-fno-whole-file -fsecond-underscore @gol -fbounds-check -fcheck-array-temporaries -fmax-array-constructor =@var{n} @gol -fcheck=@var{all|array-temps|bounds|do|mem|pointer|recursion} @gol -fcoarray=@var{none|single|lib} -fmax-stack-var-size=@var{n} @gol @@ -1225,19 +1225,22 @@ prevent accidental linking between procedures with incompatible interfaces. -@item -fwhole-file -@opindex @code{fwhole-file} -By default, GNU Fortran parses, resolves and translates each procedure -in a file separately. Using this option modifies this such that the -whole file is parsed and placed in a single front-end tree. 
During -resolution, in addition to all the usual checks and fixups, references +@item -fno-whole-file +@opindex @code{fno-whole-file} +This flag causes the compiler to resolve and translate each procedure in +a file separately. + +By default, the whole file is parsed and placed in a single front-end tree. +During resolution, in addition to all the usual checks and fixups, references to external procedures that are in the same file effect resolution of -that procedure, if not already done, and a check of the interfaces. The +that procedure, if not already done, and a check of the interfaces. The dependences are resolved by changing the order in which the file is translated into the backend tree. Thus, a procedure that is referenced is translated before the reference and the duplication of backend tree declarations eliminated. +The @option{-fno-whole-file} option is deprecated and may lead to wrong code. + @item -fsecond-underscore @opindex @code{fsecond-underscore} @cindex underscore Index: ChangeLog === --- ChangeLog (revision 179794) +++ ChangeLog (working copy) @@ -1,5 +1,11 @@ 2011-10-11 Tobias Burnus bur...@net-b.de + Janus Weil ja...@gcc.gnu.org + * invoke.texi (-fwhole-file): Update wording since -fwhole-file + is now enabled by default. + +2011-10-11 Tobias Burnus bur...@net-b.de + PR fortran/50273 * trans-common.c (translate_common): Fix -Walign-commons check. Index: invoke.texi === --- invoke.texi (revision 179793) +++ invoke.texi (working copy) @@ -163,7 +163,7 @@ @item Code Generation Options @xref{Code Gen Options,,Options for code generation conventions}. 
@gccoptlist{-fno-automatic -ff2c -fno-underscoring @gol --fwhole-file -fsecond-underscore @gol +-fno-whole-file -fsecond-underscore @gol -fbounds-check -fcheck-array-temporaries -fmax-array-constructor =@var{n} @gol -fcheck=@var{all|array-temps|bounds|do|mem|pointer|recursion} @gol -fcoarray=@var{none|single} -fmax-stack-var-size=@var{n} @gol @@ -1206,19 +1206,22 @@ prevent accidental linking between procedures with incompatible interfaces. -@item -fwhole-file -@opindex @code{fwhole-file} -By default, GNU Fortran parses, resolves and translates each procedure -in a file separately. Using this option modifies this such that the -whole file is parsed and placed in a single front-end tree. During -resolution, in addition to all the usual checks and fixups, references +@item -fno-whole-file +@opindex @code{fno-whole-file} +This flag causes the compiler to resolve and translate each procedure in +a file separately. + +By default, the whole file is parsed and placed in a single front-end tree. +During resolution, in addition to all the usual checks and fixups, references to external procedures that are in the same file effect resolution of -that procedure, if not already done,
Re: [patch] dwarf2out: Drop the size + performance overhead of DW_AT_sibling
On Wed, 12 Oct 2011 16:07:24 +0200, Tristan Gingold wrote: I fear that this may degrade performance of other debuggers. What about adding a command line option? I can test idb; there aren't so many DWARF debuggers out there, I think. If DW_AT_sibling is removed by default, a new option may make sense as a compatibility safeguard. Thanks, Jan
Re: [PATCH] RFC: Cache LTO streamer mappings
On 11-10-12 08:25 , Jan Hubicka wrote: WPA is currently about 1/3 of reading & type merging, 1/3 of streaming out and 1/3 of inlining. Inlining is relatively easy to cure, so yes, streaming performance is important. The very basic streaming primitives actually still show at the top of the profile along with hashing and type comparing code. I will post some updated oprofiles into the Mozilla PR. OK, thanks. My numbers are from very early LTO development. Yeah, the problem is minor on small projects and C projects. C++ tends to carry a lot of context with it - both in the files streamed from compilation to WPA (a lot of types and such) as well as into individual ltrans units. We still need to stream in and out about 2GB from WPA to ltrans (combined sizes of ltrans0 to ltrans31) and since we are at 3 minutes of compilation now, seconds actually count. Honestly I think we won't get any great speedups unless we work on reducing the amount of unnecessary info we pickle/unpickle. That's what I was leaning towards. Optimizing the basic access patterns may not buy us as much as just reducing the amount of clutter we have to deal with. It may make sense, however, as a subsequent optimization. I will give this patch a try on Mozilla to see if I can report some positive numbers. Obviously having the basic I/O effective is also important. Honza Diego.
Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes
I don't think that will fix this bug. The bug occurs if: (1) The exclusive lock set has error_mark_node. (2) The shared lock set has the actual lock. In this case, remove_lock_from_lockset thinks that it has found the lock in the exclusive lock set, and fails to remove it from the shared lock set. To fix the bug, the first call to lock_set_contains should ignore the universal lock and return null, so that remove_lock will continue on to search the shared lock set. If I understand your suggested fix correctly, lock_set_contains would still return non-null when the universal lock was present, which is not what we want. IMHO, lock_set_contains is operating correctly; it was just passed the wrong arguments. -DeLesley On Tue, Oct 11, 2011 at 2:34 PM, Ollie Wild a...@google.com wrote: On Mon, Oct 10, 2011 at 3:37 PM, Delesley Hutchins deles...@google.com wrote: --- gcc/tree-threadsafe-analyze.c (revision 179771) +++ gcc/tree-threadsafe-analyze.c (working copy) @@ -1830,14 +1830,27 @@ remove_lock_from_lockset (tree lockable, struct po This feels like a bug in lock_set_contains(), not remove_lock_from_lockset(). I'd modify lock_set_contains() as follows: 1) During the universal lock conditional, remove the return statement. Instead, set default_lock = lock (where default_lock is a new variable initialized to NULL_TREE). 2) Anywhere NULL_TREE is returned later, replace it with default_lock. Ollie -- DeLesley Hutchins | Software Engineer | deles...@google.com | 505-206-0315
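The two-set lookup being debated can be modeled in a few lines of self-contained C. The names and data shapes below are simplified stand-ins, not the actual annotalysis structures: the point is only that a "universal" lock (error_mark_node in the real code) must not short-circuit an exact-match search, so that the caller can go on to search the shared lock set.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical simplified model: a lock set may contain a "universal"
   lock that matches any lock.  When the caller asks for an exact hit,
   remember the universal lock but keep searching; only fall back to it
   when accept_universal is set.  */

#define UNIVERSAL_LOCK "<universal>"

static const char *
lock_set_contains (const char **set, size_t n, const char *lock,
                   int accept_universal)
{
  const char *universal = NULL;
  size_t i;
  for (i = 0; i < n; i++)
    {
      if (strcmp (set[i], lock) == 0)
        return set[i];            /* Exact match always wins.  */
      if (strcmp (set[i], UNIVERSAL_LOCK) == 0)
        universal = set[i];       /* Remember, but keep looking.  */
    }
  return accept_universal ? universal : NULL;
}
```

With exact matching requested, a universal lock in the exclusive set no longer hides the real lock sitting in the shared set.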
Re: [PATCH] Fix PR50189
On Wed, 12 Oct 2011, Richard Guenther wrote: This changes VRP to use the type of the variable we record an assertion for to look for TYPE_MIN/MAX_VALUEs rather than the limit that it is tested against. That makes sense anyway and happens to mitigate the wrong-code bug for the testcase in PR50189. Bootstrapped on x86_64-unknown-linux-gnu, testing in progress. Which shows I need to adjust types for plus/minus we build because of possible mismatches. Bootstrapped and tested on x86_64-unknown-linux-gnu, installed. Richard. 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 * tree-vrp.c (extract_range_from_assert): Use the type of the variable, not the limit. * g++.dg/torture/pr50189.C: New testcase. Index: gcc/tree-vrp.c === *** gcc/tree-vrp.c (revision 179850) --- gcc/tree-vrp.c (working copy) *** extract_range_from_assert (value_range_t *** 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (limit); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality --- 1476,1482 limit = avoid_overflow_infinity (limit); ! type = TREE_TYPE (var); gcc_assert (limit != var); /* For pointer arithmetic, we only keep track of pointer equality *** extract_range_from_assert (value_range_t *** 1650,1657 /* For LT_EXPR, we create the range [MIN, MAX - 1]. */ if (cond_code == LT_EXPR) { ! tree one = build_int_cst (type, 1); ! max = fold_build2 (MINUS_EXPR, type, max, one); if (EXPR_P (max)) TREE_NO_WARNING (max) = 1; } --- 1650,1657 /* For LT_EXPR, we create the range [MIN, MAX - 1]. */ if (cond_code == LT_EXPR) { ! tree one = build_int_cst (TREE_TYPE (max), 1); ! max = fold_build2 (MINUS_EXPR, TREE_TYPE (max), max, one); if (EXPR_P (max)) TREE_NO_WARNING (max) = 1; } *** extract_range_from_assert (value_range_t *** 1685,1692 /* For GT_EXPR, we create the range [MIN + 1, MAX]. */ if (cond_code == GT_EXPR) { ! tree one = build_int_cst (type, 1); ! 
min = fold_build2 (PLUS_EXPR, type, min, one); if (EXPR_P (min)) TREE_NO_WARNING (min) = 1; } --- 1685,1692 /* For GT_EXPR, we create the range [MIN + 1, MAX]. */ if (cond_code == GT_EXPR) { ! tree one = build_int_cst (TREE_TYPE (min), 1); ! min = fold_build2 (PLUS_EXPR, TREE_TYPE (min), min, one); if (EXPR_P (min)) TREE_NO_WARNING (min) = 1; } Index: gcc/testsuite/g++.dg/torture/pr50189.C === *** gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) --- gcc/testsuite/g++.dg/torture/pr50189.C (revision 0) *** *** 0 --- 1,121 + // { dg-do run } + // { dg-options "-fstrict-enums" } + + extern "C" void abort (void); + class CCUTILS_KeyedScalarLevelPosition + { + public: + + typedef enum + { + UNINITED = 0, + AT_BEGIN = 1, + AT_END = 2, + AT_KEY = 3 + + } position_t; + + bool is_init() const + { return(m_timestamp != UNINITED); } + + bool is_at_begin() const + { return(m_timestamp == AT_BEGIN); } + + position_t get_state() const + { + return((m_timestamp >= AT_KEY) + ? AT_KEY + : ((position_t)m_timestamp)); + } + + void set_at_begin() + { m_timestamp = AT_BEGIN; } + + unsigned int get_index() const + { return(m_index); } + + void set_pos(unsigned int a_index, unsigned int a_timestmap) + { + m_index = a_index; + m_timestamp = a_timestmap; + } + + bool check_pos(unsigned int a_num_entries, unsigned int a_timestamp) const + { + if (get_state() != AT_KEY) + return(false); + + if (m_timestamp != a_timestamp) + return(false); + + return(m_index < a_num_entries); + } + + void set_not_init() + { m_timestamp = 0; } + + private: + + unsigned int m_timestamp; + unsigned int m_index; + + }; + + class CCUTILS_KeyedScalarPosition + { + public: + + CCUTILS_KeyedScalarLevelPosition m_L1; + CCUTILS_KeyedScalarLevelPosition m_L2; + }; + + class baz + { + public: + int *n[20]; + unsigned int m_cur_array_len; + unsigned int m_timestamp; + + unsigned int _get_timestamp() const + { return(m_timestamp); } + + bool _check_L1_pos(const CCUTILS_KeyedScalarPosition &a_position) const + { + 
return(a_position.m_L1.check_pos( +
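The range-building step changed by this patch can be sketched without any GCC internals. The toy model below (its names and shapes are invented for illustration) shows the intent: for an assertion VAR < LIMIT we build [TYPE_MIN, LIMIT - 1], and for VAR > LIMIT we build [LIMIT + 1, TYPE_MAX], where the TYPE_MIN/TYPE_MAX bounds now come from the type of the variable rather than the type of the limit.

```c
#include <assert.h>

/* Toy model of extract_range_from_assert's endpoint construction.
   A "type" is modeled as an explicit min/max pair, standing in for
   TYPE_MIN_VALUE/TYPE_MAX_VALUE of the variable's type.  */

struct type_bounds { long min, max; };
struct range { long min, max; };

static struct range
range_for_lt (struct type_bounds var_type, long limit)
{
  /* VAR < LIMIT  =>  [TYPE_MIN, LIMIT - 1]  */
  struct range r = { var_type.min, limit - 1 };
  return r;
}

static struct range
range_for_gt (struct type_bounds var_type, long limit)
{
  /* VAR > LIMIT  =>  [LIMIT + 1, TYPE_MAX]  */
  struct range r = { limit + 1, var_type.max };
  return r;
}
```

For an enum-like variable with bounds [0, 3] (as with -fstrict-enums in the testcase), asserting VAR < 3 yields [0, 2] and VAR > 0 yields [1, 3] - bounds that only make sense when taken from the variable's own type.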
Re: [PATCH] Fix PR50204
On Wed, 12 Oct 2011, Michael Matz wrote: Hi, On Tue, 11 Oct 2011, Richard Guenther wrote: Since we have the alias oracle we no longer optimize the testcase below because I initially restricted the stmt walking to give up for PHIs with more than 2 arguments because of compile-time complexity issues. But it's easy to see that compile-time is not an issue when we reduce PHI args pairwise to a single dominating operand. Of course it is, not a different complexity class, but a constant factor. You have to do N-1 pairwise reductions, meaning with a large fan-in block you pay N-1 times the price, not just once for one pair, and if the price happens to be walking all up to the function start you indeed then are at N*M. I think there should be a cutoff, possibly not at two. Think about the generated testcases with many large switches. Indeed we can do a little better by also caching at possible branches. The easiest is to have cache points at the first store we visit in a basic-block (thus we have at most two bits in the visited bitmap per BB). We can also make the result (more) independent of the order of PHI arguments by disambiguating against a VUSE that (possibly) dominates all other VUSEs. Bootstrapped and tested on x86_64-unknown-linux-gnu, applied. Richard. 2011-10-12 Richard Guenther rguent...@suse.de * tree-ssa-alias.c (maybe_skip_until): Cache also at the point of the first store we visit in a basic-block. (get_continuation_for_phi): Search for a candidate VUSE that might dominate all others. Do pairwise disambiguation against that candidate. 
Index: gcc/tree-ssa-alias.c === --- gcc/tree-ssa-alias.c(revision 179849) +++ gcc/tree-ssa-alias.c(working copy) @@ -1846,6 +1846,8 @@ static bool maybe_skip_until (gimple phi, tree target, ao_ref *ref, tree vuse, bitmap *visited) { + basic_block bb = gimple_bb (phi); + if (!*visited) *visited = BITMAP_ALLOC (NULL); @@ -1870,6 +1872,14 @@ maybe_skip_until (gimple phi, tree targe else if (gimple_nop_p (def_stmt) || stmt_may_clobber_ref_p_1 (def_stmt, ref)) return false; + /* If we reach a new basic-block see if we already skipped it + in a previous walk that ended successfully. */ + if (gimple_bb (def_stmt) != bb) + { + if (!bitmap_set_bit (*visited, SSA_NAME_VERSION (vuse))) + return true; + bb = gimple_bb (def_stmt); + } vuse = gimple_vuse (def_stmt); } return true; @@ -1948,18 +1958,35 @@ get_continuation_for_phi (gimple phi, ao until we hit the phi argument definition that dominates the other one. */ else if (nargs >= 2) { - tree arg0 = PHI_ARG_DEF (phi, 0); - tree arg1; - unsigned i = 1; - do + tree arg0, arg1; + unsigned i; + + /* Find a candidate for the virtual operand which definition +dominates those of all others. */ + arg0 = PHI_ARG_DEF (phi, 0); + if (!SSA_NAME_IS_DEFAULT_DEF (arg0)) + for (i = 1; i < nargs; ++i) + { + arg1 = PHI_ARG_DEF (phi, i); + if (SSA_NAME_IS_DEFAULT_DEF (arg1)) + { + arg0 = arg1; + break; + } + if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg0)), + gimple_bb (SSA_NAME_DEF_STMT (arg1)))) + arg0 = arg1; + } + + /* Then pairwise reduce against the found candidate. */ + for (i = 0; i < nargs; ++i) { arg1 = PHI_ARG_DEF (phi, i); arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited); if (!arg0) return NULL_TREE; - } - while (++i < nargs); return arg0; }
Re: [PATCH] Fix PR50204
Hi, no need to test a phi argument with itself (testing arg0 != arg1 is not the right test, though, so remembering the candidate index): @@ -1948,18 +1958,35 @@ get_continuation_for_phi (gimple phi, ao until we hit the phi argument definition that dominates the other one. */ else if (nargs >= 2) { - tree arg0 = PHI_ARG_DEF (phi, 0); - tree arg1; - unsigned i = 1; - do + tree arg0, arg1; + unsigned i; unsigned j; + /* Find a candidate for the virtual operand which definition + dominates those of all others. */ + arg0 = PHI_ARG_DEF (phi, 0); j = 0; + if (!SSA_NAME_IS_DEFAULT_DEF (arg0)) + for (i = 1; i < nargs; ++i) + { + arg1 = PHI_ARG_DEF (phi, i); + if (SSA_NAME_IS_DEFAULT_DEF (arg1)) + { + arg0 = arg1; j = i; + break; + } + if (dominated_by_p (CDI_DOMINATORS, + gimple_bb (SSA_NAME_DEF_STMT (arg0)), + gimple_bb (SSA_NAME_DEF_STMT (arg1)))) + arg0 = arg1, j = i; + } + + /* Then pairwise reduce against the found candidate. */ + for (i = 0; i < nargs; ++i) { arg1 = PHI_ARG_DEF (phi, i); if (i != j) arg0 = get_continuation_for_phi_1 (phi, arg0, arg1, ref, visited); if (!arg0) return NULL_TREE; - } - while (++i < nargs); Ciao, Michael.
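The candidate-then-pairwise scheme discussed in this thread can be sketched in isolation. In the toy model below (names and the "depth" encoding are invented for illustration; the real code walks virtual operand def-stmts and dominator information), a smaller depth number stands for "defined earlier, dominates the others":

```c
#include <assert.h>
#include <stddef.h>

/* Pick the PHI argument whose definition dominates the others,
   modeled here as the argument with the smallest depth.  */
static size_t
pick_candidate (const int *depth, size_t nargs)
{
  size_t i, j = 0;
  for (i = 1; i < nargs; i++)
    if (depth[i] < depth[j])    /* depth[i]'s def dominates depth[j]'s */
      j = i;
  return j;
}

/* Then disambiguate every other argument pairwise against that
   candidate, skipping the candidate itself (Michael's point above).
   Returns the number of pairwise walks performed: N-1 for N args,
   a constant factor, not a different complexity class.  */
static int
reduce_pairwise (const int *depth, size_t nargs)
{
  size_t i, j = pick_candidate (depth, nargs);
  int walks = 0;
  for (i = 0; i < nargs; i++)
    if (i != j)
      walks++;                  /* one disambiguation walk per arg */
  return walks;
}
```

This also makes the trade-off in the thread concrete: a PHI with a large fan-in still costs N-1 walks, which is why a cutoff and the per-BB caching were added.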
Re: New warning for expanded vector operations
On Tue, Oct 11, 2011 at 9:11 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Committed with the revision 179807. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50704 -- H.J.
[PATCH] Optimize some loops using bool types (PR tree-optimization/50596)
Hi! This patch allows vectorization of some loops that use bool (which is especially important now that we use bool more often even for stmts that weren't originally using bool in the sources), in particular when bool is cast to an integer type and the bool rhs has def stmts within the loop that are either BIT_{AND,IOR,XOR}_EXPR, plain SSA_NAME assigns, bool -> another bool casts, or comparisons (tested recursively). In that case the pattern recognizer transforms the comparisons into COND_EXPRs using a suitable integer type (the same width as the comparison operands) and other bools to suitable integer types with casts added where needed. The patch doesn't yet handle vectorization of storing into a bool array; I'll work on that later. Bootstrapped/regtested on x86_64-linux and i686-linux. Ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com PR tree-optimization/50596 * tree-vectorizer.h (NUM_PATTERNS): Increase to 7. * tree-vect-patterns.c (vect_vect_recog_func_ptrs): Add vect_recog_bool_pattern. (check_bool_pattern, adjust_bool_pattern_cast, adjust_bool_pattern, vect_recog_bool_pattern): New functions. * gcc.dg/vect/vect-cond-9.c: New test. --- gcc/tree-vectorizer.h.jj2011-10-10 09:41:29.0 +0200 +++ gcc/tree-vectorizer.h 2011-10-10 10:12:03.0 +0200 @@ -902,7 +902,7 @@ extern void vect_slp_transform_bb (basic Additional pattern recognition functions can (and will) be added in the future. */ typedef gimple (* vect_recog_func_ptr) (VEC (gimple, heap) **, tree *, tree *); -#define NUM_PATTERNS 6 +#define NUM_PATTERNS 7 void vect_pattern_recog (loop_vec_info); /* In tree-vectorizer.c. 
*/ --- gcc/tree-vect-patterns.c.jj 2011-10-10 09:41:29.0 +0200 +++ gcc/tree-vect-patterns.c2011-10-10 18:23:41.0 +0200 @@ -51,13 +51,15 @@ static gimple vect_recog_over_widening_p tree *); static gimple vect_recog_mixed_size_cond_pattern (VEC (gimple, heap) **, tree *, tree *); +static gimple vect_recog_bool_pattern (VEC (gimple, heap) **, tree *, tree *); static vect_recog_func_ptr vect_vect_recog_func_ptrs[NUM_PATTERNS] = { vect_recog_widen_mult_pattern, vect_recog_widen_sum_pattern, vect_recog_dot_prod_pattern, vect_recog_pow_pattern, vect_recog_over_widening_pattern, - vect_recog_mixed_size_cond_pattern}; + vect_recog_mixed_size_cond_pattern, + vect_recog_bool_pattern}; /* Function widened_name_p @@ -1068,10 +1070,8 @@ vect_operation_fits_smaller_type (gimple constants. Check if S3 and S4 can be done on a smaller type than 'TYPE', it can either be 'type' or some intermediate type. For now, we expect S5 to be a type - demotion operation. We also check that S3 and S4 have only one use. -. + demotion operation. We also check that S3 and S4 have only one use. */ -*/ static gimple vect_recog_over_widening_pattern (VEC (gimple, heap) **stmts, tree *type_in, tree *type_out) @@ -1333,6 +1333,356 @@ vect_recog_mixed_size_cond_pattern (VEC } +/* Helper function of vect_recog_bool_pattern. Called recursively, return + true if bool VAR can be optimized that way. 
*/ + +static bool +check_bool_pattern (tree var, loop_vec_info loop_vinfo) +{ + gimple def_stmt; + enum vect_def_type dt; + tree def, rhs1; + enum tree_code rhs_code; + + if (!vect_is_simple_use (var, loop_vinfo, NULL, &def_stmt, &def, &dt)) +return false; + + if (dt != vect_internal_def) +return false; + + if (!is_gimple_assign (def_stmt)) +return false; + + if (!has_single_use (def)) +return false; + + rhs1 = gimple_assign_rhs1 (def_stmt); + rhs_code = gimple_assign_rhs_code (def_stmt); + switch (rhs_code) +{ +case SSA_NAME: + return check_bool_pattern (rhs1, loop_vinfo); + +CASE_CONVERT: + if ((TYPE_PRECISION (TREE_TYPE (rhs1)) != 1 + || !TYPE_UNSIGNED (TREE_TYPE (rhs1))) + && TREE_CODE (TREE_TYPE (rhs1)) != BOOLEAN_TYPE) + return false; + return check_bool_pattern (rhs1, loop_vinfo); + +case BIT_NOT_EXPR: + return check_bool_pattern (rhs1, loop_vinfo); + +case BIT_AND_EXPR: +case BIT_IOR_EXPR: +case BIT_XOR_EXPR: + if (!check_bool_pattern (rhs1, loop_vinfo)) + return false; + return check_bool_pattern (gimple_assign_rhs2 (def_stmt), loop_vinfo); + +default: + if (TREE_CODE_CLASS (rhs_code) == tcc_comparison) + { + tree vecitype, comp_vectype; + + comp_vectype = get_vectype_for_scalar_type (TREE_TYPE (rhs1)); + if (comp_vectype == NULL_TREE) + return false; + + if (TREE_CODE (TREE_TYPE (rhs1)) != INTEGER_TYPE) + { + enum machine_mode mode = TYPE_MODE (TREE_TYPE (rhs1)); + tree itype + =
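The shape of the recursive check above can be seen more easily stripped of the GIMPLE plumbing. The miniature below models only the recursion structure (the expr struct and kind names are invented for illustration; the real check_bool_pattern also verifies single-use SSA defs and vector-type availability): accept comparisons as leaves, recurse through copies, NOT, and the bitwise AND/IOR/XOR combiners, and reject everything else.

```c
#include <assert.h>
#include <stddef.h>

enum kind { K_COPY, K_NOT, K_AND, K_OR, K_XOR, K_CMP, K_CALL };

struct expr {
  enum kind kind;
  const struct expr *op0, *op1;   /* NULL where unused.  */
};

/* Return nonzero if the whole boolean expression tree is built from
   shapes the pattern recognizer can rewrite into COND_EXPRs.  */
static int
check_bool_pattern (const struct expr *e)
{
  switch (e->kind)
    {
    case K_CMP:
      return 1;                       /* Comparison: a valid leaf.  */
    case K_COPY:
    case K_NOT:
      return check_bool_pattern (e->op0);
    case K_AND:
    case K_OR:
    case K_XOR:
      return check_bool_pattern (e->op0)
             && check_bool_pattern (e->op1);
    default:
      return 0;                       /* Anything else defeats the pattern.  */
    }
}
```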
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
On 10/12/2011 12:07 AM, Kai Tietz wrote: Hello, by recent changes gcc begun to move code into the prologue region. This is for x64 SEH an issue, as here the table-information for prologue is limited to 255 bytes size. So we need to avoid moving additional code into prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of prologue. Also we need to emit a memory blockage. ChangeLog 2011-10-12 Kai Tietz kti...@redhat.com * config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH all sse/integer registers as prologue-used. Tested for x86_64-w64-mingw32. Ok for apply? Regards, Kai Index: i386.c === --- i386.c (revision 179824) +++ i386.c (working copy) @@ -10356,7 +10356,24 @@ Further, prevent alloca modifications to the stack pointer from being combined with prologue modifications. */ if (TARGET_SEH) -emit_insn (gen_prologue_use (stack_pointer_rtx)); +{ + int i; + + /* Due limited size of prologue-code size of 255 bytes, + we need to prevent scheduler to sink instructions into + prologue code. Therefore we mark all standard, sse, fpu, + and the pc registers as prologue-used to prevent this. + Also an memory-blockage is necessary. */ + emit_insn (gen_memory_blockage ()); + + for (i = 0; i <= 7; i++) +{ + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i))); + } +} This is overkill. We simply need to disable shrink-wrapping for SEH. The easiest way to do that is to add !TARGET_SEH (and a comment) to the simple_return pattern predicate. r~
[PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support
Hi! This patch started with noticing while working on PR50596 that #define N 1024 long long a[N]; char b[N]; void foo (void) { int i; for (i = 0; i < N; i++) b[i] = a[i]; } is even with -O3 -mavx2 vectorized just with 16-byte vectors instead of 32-byte vectors and has various fixes I've noticed when diving into it. The vector permutations with AVX2 aren't very easy, because some instructions don't shuffle cross-lane, some do but only for some modes. The patch adds AVX2 vec_pack_trunc* expanders so that the above can be vectorized, and implements a couple of permutation sequences, including for a single operand __builtin_vec_shuffle a 4 insn sequence that handles arbitrary V32QI/V16HI constant permutations (and some cases where 1 insn is possible too) and also variable-mask V{32Q,16H,8S,4D}I permutations. I think we badly need testcases which will try all possible constant permutations (probably one testcase per mode), even for V32QImode that's just 32x32 plus 32x64 tests (if split into 32 tests in a function times 96 noinline functions), but with that I'd like to wait for Richard's permutation improvements, because although currently the backend signals it can handle some constant argument e.g. V32QImode permutation, as there is no V32QImode permutation builtin __builtin_shuffle emits it as a variable mask operation. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. 
(expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for xm constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. --- gcc/config/i386/i386.md.jj 2011-10-06 16:42:12.0 +0200 +++ gcc/config/i386/i386.md 2011-10-11 10:07:04.0 +0200 @@ -235,7 +235,6 @@ (define_c_enum unspec [ UNSPEC_VPERMSI UNSPEC_VPERMDF UNSPEC_VPERMSF - UNSPEC_VPERMDI UNSPEC_VPERMTI UNSPEC_GATHER --- gcc/config/i386/i386.c.jj 2011-10-10 09:41:28.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 11:05:06.0 +0200 @@ -19334,7 +19334,7 @@ ix86_expand_vec_perm (rtx operands[]) rtx op0 = operands[1]; rtx op1 = operands[2]; rtx mask = operands[3]; - rtx t1, t2, vt, vec[16]; + rtx t1, t2, t3, t4, vt, vt2, vec[32]; enum machine_mode mode = GET_MODE (op0); enum machine_mode maskmode = GET_MODE (mask); int w, e, i; @@ -19343,50 +19343,72 @@ ix86_expand_vec_perm (rtx operands[]) /* Number of elements in the vector. */ w = GET_MODE_NUNITS (mode); e = GET_MODE_UNIT_SIZE (mode); - gcc_assert (w <= 16); + gcc_assert (w <= 32); if (TARGET_AVX2) { - if (mode == V4DImode || mode == V4DFmode) + if (mode == V4DImode || mode == V4DFmode || mode == V16HImode) { /* Unfortunately, the VPERMQ and VPERMPD instructions only support a constant shuffle operand. 
With a tiny bit of effort we can use VPERMD instead. A re-interpretation stall for V4DFmode is -unfortunate but there's no avoiding it. */ - t1 = gen_reg_rtx (V8SImode); +unfortunate but there's no avoiding it. +Similarly for V16HImode we don't have instructions for variable +shuffling, while for V32QImode we can use after preparing suitable +masks vpshufb; vpshufb; vpermq; vpor. */ + + if (mode == V16HImode) + { + maskmode = mode = V32QImode; + w = 32; + e = 1; + } + else + { + maskmode = mode = V8SImode; +
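The vpshufb; vpshufb; vpermq; vpor idea mentioned in the patch description can be modeled in scalar C to see why it works. This is a behavioral sketch, not the patch's code: vpshufb can only pick bytes within each 16-byte lane, so one shufb gathers the "same lane" bytes, the lanes are swapped (the vpermq step), a second shufb gathers the "other lane" bytes, and vpor merges the two halves. Mask semantics follow vpshufb: a set high bit yields zero.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scalar emulation of vpshufb on a 32-byte vector: each result byte is
   taken from within its own 16-byte lane, or zeroed if bit 7 of the
   mask byte is set.  */
static void
pshufb32 (uint8_t *dst, const uint8_t *src, const uint8_t *mask)
{
  for (int i = 0; i < 32; i++)
    {
      int lane = i & ~15;
      dst[i] = (mask[i] & 0x80) ? 0 : src[lane + (mask[i] & 15)];
    }
}

/* Arbitrary single-operand 32-byte permutation: sel[i] in 0..31 names
   the source byte for result byte i.  */
static void
perm32 (uint8_t *dst, const uint8_t *src, const uint8_t *sel)
{
  uint8_t m1[32], m2[32], t1[32], t2[32], swapped[32];
  for (int i = 0; i < 32; i++)
    {
      int same_lane = ((sel[i] ^ i) & 16) == 0;
      m1[i] = same_lane ? (sel[i] & 15) : 0x80;   /* from src        */
      m2[i] = same_lane ? 0x80 : (sel[i] & 15);   /* from other lane */
    }
  memcpy (swapped, src + 16, 16);   /* the vpermq step: swap lanes */
  memcpy (swapped + 16, src, 16);
  pshufb32 (t1, src, m1);           /* same-lane bytes  */
  pshufb32 (t2, swapped, m2);       /* cross-lane bytes */
  for (int i = 0; i < 32; i++)      /* the vpor step    */
    dst[i] = t1[i] | t2[i];
}
```

Since each result byte comes from exactly one of the two shufb results (the other contributes zero), the OR is an exact merge, which is what makes the 4-insn sequence handle arbitrary V32QI permutations.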
[PATCH] Add mulv32qi3 support
Hi! On long long a[1024], c[1024]; char b[1024]; void foo (void) { int i; for (i = 0; i < 1024; i++) b[i] = a[i] + 3 * c[i]; } I've noticed that while the i?86 backend supports mulv16qi3, it doesn't support mulv32qi3 even with AVX2. The following patch implements that similarly to how mulv16qi3 is implemented. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? BTW, I wonder if vector multiply expansion when one argument is a VECTOR_CST with all elements the same shouldn't use something similar to what expand_mult does, not sure if in the generic code or at least in the backends. Testing the costs will be harder, maybe it could just test fewer algorithms and perhaps just count number of instructions or something similar. But certainly e.g. v32qi multiplication by 3 is quite costly (4 interleaves, 2 v16hi multiplications, 4 insns to select even from the two), while two vector additions (tmp = x + x; result = x + tmp;) would do the job. 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_avx2): New mode_attr. (mulv16qi3): Macroize to cover also mulv32qi3 for TARGET_AVX2 into ... (mul<mode>3): ... this. 
--- gcc/config/i386/sse.md.jj 2011-10-12 09:23:37.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 12:16:39.0 +0200 @@ -163,6 +163,12 @@ (define_mode_attr avx_avx2 (V4SI avx2) (V2DI avx2) (V8SI avx2) (V4DI avx2)]) +(define_mode_attr vec_avx2 + [(V16QI "vec") (V32QI "avx2") + (V8HI "vec") (V16HI "avx2") + (V4SI "vec") (V8SI "avx2") + (V2DI "vec") (V4DI "avx2")]) ;; Mapping of logic-shift operators (define_code_iterator lshift [lshiftrt ashift]) @@ -4838,10 +4844,10 @@ (define_insn *sse2_avx2_plusminus_in (set_attr prefix orig,vex) (set_attr mode TI)]) -(define_insn_and_split "mulv16qi3" - [(set (match_operand:V16QI 0 "register_operand" "") - (mult:V16QI (match_operand:V16QI 1 "register_operand" "") - (match_operand:V16QI 2 "register_operand" "")))] +(define_insn_and_split "mul<mode>3" + [(set (match_operand:VI1_AVX2 0 "register_operand" "") + (mult:VI1_AVX2 (match_operand:VI1_AVX2 1 "register_operand" "") + (match_operand:VI1_AVX2 2 "register_operand" "")))] "TARGET_SSE2 && can_create_pseudo_p ()" "#" @@ -4850,34 +4856,41 @@ (define_insn_and_split mulv16qi3 { rtx t[6]; int i; + enum machine_mode mulmode = <sseunpackmode>mode; for (i = 0; i < 6; ++i) -t[i] = gen_reg_rtx (V16QImode); +t[i] = gen_reg_rtx (<MODE>mode); /* Unpack data such that we've got a source byte in each low byte of each word. We don't care what goes into the high byte of each word. Rather than trying to get zero in there, most convenient is to let it be a copy of the low byte. 
*/ - emit_insn (gen_vec_interleave_highv16qi (t[0], operands[1], operands[1])); - emit_insn (gen_vec_interleave_highv16qi (t[1], operands[2], operands[2])); - emit_insn (gen_vec_interleave_lowv16qi (t[2], operands[1], operands[1])); - emit_insn (gen_vec_interleave_lowv16qi (t[3], operands[2], operands[2])); + emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[0], operands[1], + operands[1])); + emit_insn (gen_<vec_avx2>_interleave_high<mode> (t[1], operands[2], + operands[2])); + emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[2], operands[1], + operands[1])); + emit_insn (gen_<vec_avx2>_interleave_low<mode> (t[3], operands[2], + operands[2])); /* Multiply words. The end-of-line annotations here give a picture of what the output of that instruction looks like. Dot means don't care; the letters are the bytes of the result with A being the most significant. */ - emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[4]), /* .A.B.C.D.E.F.G.H */ - gen_lowpart (V8HImode, t[0]), - gen_lowpart (V8HImode, t[1]))); - emit_insn (gen_mulv8hi3 (gen_lowpart (V8HImode, t[5]), /* .I.J.K.L.M.N.O.P */ - gen_lowpart (V8HImode, t[2]), - gen_lowpart (V8HImode, t[3]))); + emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[4]), + gen_rtx_MULT (mulmode,/* .A.B.C.D.E.F.G.H */ + gen_lowpart (mulmode, t[0]), + gen_lowpart (mulmode, t[1])))); + emit_insn (gen_rtx_SET (VOIDmode, gen_lowpart (mulmode, t[5]), + gen_rtx_MULT (mulmode,/* .I.J.K.L.M.N.O.P */ + gen_lowpart (mulmode, t[2]), + gen_lowpart (mulmode, t[3])))); /* Extract the even bytes and merge them back together. */ ix86_expand_vec_extract_even_odd (operands[0], t[5], t[4], 0); set_unique_reg_note (get_last_insn (), REG_EQUAL, -
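The widening scheme the split implements - unpack bytes into 16-bit lanes, multiply there, keep the low byte of each product - comes down to the observation that byte multiplication is naturally modulo 256. A scalar sketch (illustrative only, per-element rather than vectorized):

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of the mul<mode>3 split above: each pair of bytes is
   widened to 16 bits (the interleave steps), multiplied there (the
   word multiply), and only the low byte of the product is kept (the
   extract-even step).  The result equals (a[i] * b[i]) mod 256.  */
static void
mulv_bytes (uint8_t *dst, const uint8_t *a, const uint8_t *b, int n)
{
  for (int i = 0; i < n; i++)
    {
      uint16_t wide = (uint16_t) a[i] * (uint16_t) b[i];
      dst[i] = (uint8_t) wide;    /* keep the "even" (low) byte */
    }
}
```

It also illustrates the multiply-by-constant remark at the end of the mail: for b[i] == 3 everywhere, tmp = a + a followed by result = a + tmp gives the same bytes with two additions instead of the interleave/multiply/extract dance.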
Re: int_cst_hash_table mapping persistence and the garbage collector
On 10/12/11 14:00:54, Richard Guenther wrote: I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. Richard, thanks. I thought that might be the case, but I don't understand the GC well enough to make this determination. - Gary
Re: [PATCH 3/7] Emit macro expansion related diagnostics
On Tue, Oct 11, 2011 at 9:47 AM, Jason Merrill ja...@redhat.com wrote: That looks pretty good, but do you really need to build up a separate data structure to search? You seem to be searching it in the same order that it's built up, so why not just walk the expansion chain directly when searching? Agreed. Also, please keep linemap_location_before_p, if needed as macro that expands to the comparison function. That aids readability.
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
2011/10/12 Richard Henderson r...@redhat.com: On 10/12/2011 12:07 AM, Kai Tietz wrote: Hello, by recent changes gcc begun to move code into the prologue region. This is for x64 SEH an issue, as here the table-information for prologue is limited to 255 bytes size. So we need to avoid moving additional code into prologue. To achieve this we mark all standard and xmm registers as prologue-used at the end of prologue. Also we need to emit a memory blockage. ChangeLog 2011-10-12 Kai Tietz kti...@redhat.com * config/i386/i386.c (ix86_expand_prologue): Mark for TARGET_SEH all sse/integer registers as prologue-used. Tested for x86_64-w64-mingw32. Ok for apply? Regards, Kai Index: i386.c === --- i386.c (revision 179824) +++ i386.c (working copy) @@ -10356,7 +10356,24 @@ Further, prevent alloca modifications to the stack pointer from being combined with prologue modifications. */ if (TARGET_SEH) - emit_insn (gen_prologue_use (stack_pointer_rtx)); + { + int i; + + /* Due limited size of prologue-code size of 255 bytes, + we need to prevent scheduler to sink instructions into + prologue code. Therefore we mark all standard, sse, fpu, + and the pc registers as prologue-used to prevent this. + Also an memory-blockage is necessary. */ + emit_insn (gen_memory_blockage ()); + + for (i = 0; i <= 7; i++) + { + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, AX_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (Pmode, R8_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM0_REG + i))); + emit_insn (gen_prologue_use (gen_rtx_REG (TImode, XMM8_REG + i))); + } + } This is overkill. We simply need to disable shrink-wrapping for SEH. The easiest way to do that is to add !TARGET_SEH (and a comment) to the simple_return pattern predicate. r~ Thanks, this is indeed more simple. I wasn't aware that the named simple_return expander also enables shrink-wrapping into the prologue. Patch tested for x86_64-w64-mingw32. Ok? 
Regards, Kai Index: i386.md === --- i386.md (revision 179824) +++ i386.md (working copy) @@ -11708,9 +11708,13 @@ } }) +;; We need to disable this for TARGET_SEH, as otherwise +;; shrink-wrapped prologue gets enabled too. This might exceed +;; the maximum size of prologue in unwind information. + (define_expand "simple_return" [(simple_return)] - "" + "!TARGET_SEH" { if (crtl->args.pops_args) {
Re: [patch i386]: Unbreak bootstrap for x64 SEH enabled target
On 10/12/2011 09:54 AM, Kai Tietz wrote: +;; We need to disable this for TARGET_SEH, as otherwise +;; shrink-wrapped prologue gets enabled too. This might exceed +;; the maximum size of prologue in unwind information. + (define_expand "simple_return" [(simple_return)] - "" + "!TARGET_SEH" Ok. r~
Re: [PATCH, testsuite, i386] FMA3 testcases + typo fix in MD
On Tue, Oct 11, 2011 at 8:37 AM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 11, 2011 at 3:12 AM, Kirill Yukhin kirill.yuk...@gmail.com wrote: Hi Uros, you was right both with fpmath and configflags. That is why it was passing for me. Attached patch which cures the problem. testsuite/ChangeLog entry: 2011-10-11 Kirill Yukhin kirill.yuk...@intel.com * gcc.target/i386/fma_double_1.c: Add -mfpmath=sse. * gcc.target/i386/fma_double_2.c: Ditto. * gcc.target/i386/fma_double_3.c: Ditto. * gcc.target/i386/fma_double_4.c: Ditto. * gcc.target/i386/fma_double_5.c: Ditto. * gcc.target/i386/fma_double_6.c: Ditto. * gcc.target/i386/fma_float_1.c: Ditto. * gcc.target/i386/fma_float_2.c: Ditto. * gcc.target/i386/fma_float_3.c: Ditto. * gcc.target/i386/fma_float_4.c: Ditto. * gcc.target/i386/fma_float_5.c: Ditto. * gcc.target/i386/fma_float_6.c: Ditto. * gcc.target/i386/l_fma_double_1.c: Ditto. * gcc.target/i386/l_fma_double_2.c: Ditto. * gcc.target/i386/l_fma_double_3.c: Ditto. * gcc.target/i386/l_fma_double_4.c: Ditto. * gcc.target/i386/l_fma_double_5.c: Ditto. * gcc.target/i386/l_fma_double_6.c: Ditto. * gcc.target/i386/l_fma_float_1.c: Ditto. * gcc.target/i386/l_fma_float_2.c: Ditto. * gcc.target/i386/l_fma_float_3.c: Ditto. * gcc.target/i386/l_fma_float_4.c: Ditto. * gcc.target/i386/l_fma_float_5.c: Ditto. * gcc.target/i386/l_fma_float_6.c: Ditto. * gcc.target/i386/l_fma_run_double_1.c: Ditto. * gcc.target/i386/l_fma_run_double_2.c: Ditto. * gcc.target/i386/l_fma_run_double_3.c: Ditto. * gcc.target/i386/l_fma_run_double_4.c: Ditto. * gcc.target/i386/l_fma_run_double_5.c: Ditto. * gcc.target/i386/l_fma_run_double_6.c: Ditto. * gcc.target/i386/l_fma_run_float_1.c: Ditto. * gcc.target/i386/l_fma_run_float_2.c: Ditto. * gcc.target/i386/l_fma_run_float_3.c: Ditto. * gcc.target/i386/l_fma_run_float_4.c: Ditto. * gcc.target/i386/l_fma_run_float_5.c: Ditto. * gcc.target/i386/l_fma_run_float_6.c: Ditto. Could you please have a look? 
Sorry for the inconvenience, K All double vector tests fail when GCC is configured with --with-cpu=atom since the double vectorizer is turned off by default. You should add -mtune=generic to those tests. I checked in this patch to add -mfpmath=sse/-mtune=generic to the FMA tests. I also removed the extra dg-options. Tested on Linux/ia32 and Linux/x86-64. -- H.J. --- diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 5af301f..11a3cc6 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,32 @@ +2011-10-12 H.J. Lu hongjiu...@intel.com + + * gcc.target/i386/fma_run_double_1.c: Add -mfpmath=sse. + * gcc.target/i386/fma_run_double_2.c: Likewise. + * gcc.target/i386/fma_run_double_3.c: Likewise. + * gcc.target/i386/fma_run_double_4.c: Likewise. + * gcc.target/i386/fma_run_double_5.c: Likewise. + * gcc.target/i386/fma_run_double_6.c: Likewise. + * gcc.target/i386/fma_run_float_1.c: Likewise. + * gcc.target/i386/fma_run_float_2.c: Likewise. + * gcc.target/i386/fma_run_float_3.c: Likewise. + * gcc.target/i386/fma_run_float_4.c: Likewise. + * gcc.target/i386/fma_run_float_5.c: Likewise. + * gcc.target/i386/fma_run_float_6.c: Likewise. + + * gcc.target/i386/l_fma_double_1.c: Add -mtune=generic and + remove the extra dg-options. + * gcc.target/i386/l_fma_double_2.c: Likewise. + * gcc.target/i386/l_fma_double_3.c: Likewise. + * gcc.target/i386/l_fma_double_4.c: Likewise. + * gcc.target/i386/l_fma_double_5.c: Likewise. + * gcc.target/i386/l_fma_double_6.c: Likewise. + * gcc.target/i386/l_fma_float_1.c: Likewise. + * gcc.target/i386/l_fma_float_2.c: Likewise. + * gcc.target/i386/l_fma_float_3.c: Likewise. + * gcc.target/i386/l_fma_float_4.c: Likewise. + * gcc.target/i386/l_fma_float_5.c: Likewise. + * gcc.target/i386/l_fma_float_6.c: Likewise.
+ 2011-10-12 Paul Koning pkon...@gcc.gnu.org PR tree-optimization/50189 diff --git a/gcc/testsuite/gcc.target/i386/fma_run_double_1.c b/gcc/testsuite/gcc.target/i386/fma_run_double_1.c index d46327d..79b219b 100644 --- a/gcc/testsuite/gcc.target/i386/fma_run_double_1.c +++ b/gcc/testsuite/gcc.target/i386/fma_run_double_1.c @@ -1,7 +1,7 @@ /* { dg-do run } */ /* { dg-prune-output ".*warning: 'sseregparm' attribute ignored.*" } */ /* { dg-require-effective-target fma } */ -/* { dg-options "-O3 -mfma" } */ +/* { dg-options "-O3 -mfpmath=sse -mfma" } */ /* Test that the compiler properly optimizes floating point multiply and add instructions into FMA3 instructions. */ diff --git a/gcc/testsuite/gcc.target/i386/fma_run_double_2.c
Re: [PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support
On 10/12/2011 09:09 AM, Jakub Jelinek wrote: /* Multiply the shuffle indices by two. */ - emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + if (maskmode == V8SImode) + emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + else + emit_insn (gen_addv32qi3 (t1, t1, t1)); I guess it would be cleaner to use plus always. And thus expand_simple_binop instead of (a couple of) these mode tests. + case V32QImode: + t1 = gen_reg_rtx (V32QImode); + t2 = gen_reg_rtx (V32QImode); + t3 = gen_reg_rtx (V32QImode); + vt2 = GEN_INT (128); + for (i = 0; i < 32; i++) + vec[i] = vt2; + vt = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec)); + vt = force_reg (V32QImode, vt); + for (i = 0; i < 32; i++) + vec[i] = i < 16 ? vt2 : const0_rtx; + vt2 = gen_rtx_CONST_VECTOR (V32QImode, gen_rtvec_v (32, vec)); + vt2 = force_reg (V32QImode, vt2); + emit_insn (gen_avx2_lshlv4di3 (gen_lowpart (V4DImode, t1), + gen_lowpart (V4DImode, mask), + GEN_INT (3))); + emit_insn (gen_avx2_andnotv32qi3 (t2, vt, mask)); + emit_insn (gen_xorv32qi3 (t1, t1, vt2)); + emit_insn (gen_andv32qi3 (t1, t1, vt)); + emit_insn (gen_iorv32qi3 (t3, t1, t2)); + emit_insn (gen_xorv32qi3 (t1, t1, vt)); + emit_insn (gen_avx2_permv4di_1 (gen_lowpart (V4DImode, t3), + gen_lowpart (V4DImode, t3), + const2_rtx, GEN_INT (3), + const0_rtx, const1_rtx)); + emit_insn (gen_iorv32qi3 (t1, t1, t2)); Some commentary here is required. I might have expected to see a compare, or something, but the logical operations here are less than obvious. I believe I've commented on everything else in the previous messages. r~
Re: PR c++/30195
Copying the decl is unlikely to do what we want, I think. Does putting the target decl directly into the method vec work? Unfortunately not, it ends up with the same error: undefined reference. Hunh, that's surprising. Furthermore, I don't think it is the right approach since the access may be different between the member function and the using declaration... Never mind. I would expect the existing access declaration code to deal with that, though I could be wrong. There don't seem to be any tests for a class that both uses and defines functions with the same name to verify that both functions can be called; I suspect that doesn't work yet with this patch. If we can't put the used functions directly into CLASSTYPE_METHOD_VEC, we need to combine them with functions from there at lookup time. + if (TREE_CODE (target_field) == FUNCTION_DECL + && DECL_NAME (OVL_CURRENT (target_field)) == name) Checking for FUNCTION_DECL won't work if the target is overloaded. Jason
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 09:18 AM, Paolo Carlini wrote: newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); Instead of chainon you could build newattrs after extvisattr with tree_cons. Jason
Re: RFC: Add ADD_RESTRICT tree code
On Wed, Oct 12, 2011 at 07:16:56PM +0200, Michael Matz wrote: This patch will fix the currently XFAILed tree-ssa/restrict-4.c again, as well as fix PR 50419. It also still fixes the original testcase of PR 49279. But it will break the checked in testcase for this bug report. I believe the checked in testcase is invalid as follows: struct S { int a; int *__restrict p; }; int foo (int *p, int *q) { struct S s, *t; s.a = 1; s.p = p; // 1 t = wrap (&s); // 2 t == &s in effect, but GCC doesn't see this t->p = q; // 3 s.p[0] = 0; // 4 t->p[0] = 1; // 5 return s.p[0]; // 6 } I'm fairly sure this is completely valid. Assignment 2 means that t->p points to s.p. Assignment 3 changes t->p and s.p, but the change to s.p doesn't occur through a pointer based on t->p or any other restrict pointer, in fact it doesn't occur through any explicit initialization or assignment, but rather through an indirect access via a different pointer. Hence the accesses to the same memory object at s.p[0] and t->p[0] were undefined because both accesses weren't through pointers based on each other. Only the field p in the structure is restrict qualified, there is no restrict qualification on the other pointers (e.g. t is not restrict). Thus, it is valid that t points to s. And, the s.p[0] access is based on s.p as well as t->p, and similarly the t->p[0] access is based on s.p as well as t->p, in the sense of the ISO C99 restrict wording. Because, if you change t->p (or s.p) at some point in between t->p = q; and s.p[0]; (i.e. prior to the access) to point to a copy of the array, both s.p and t->p change. In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E. Note that ‘‘based’’ is defined only for expressions with pointer types. 
Which means that for memory restricts (fields in particular) we need to limit ourselves to the cases where the field is accessed through a restricted pointer or doesn't have address taken. Jakub
Re: [C++-11] User defined literals
On 10/12/2011 01:05 AM, Ed Smith-Rowland wrote: cp_parser_operator(function_id) is simply run twice in cp_parser_unqualified_id. Once inside cp_parser_template_id called at parser.c:4515. Once directly inside cp_parser_unqualified_id at parser.c:4525. Ah. You could try replacing the operator X tokens with a single CPP_LITERAL_OPERATOR token, like we do for CPP_NESTED_NAME_SPECIFIER and CPP_TEMPLATE_ID. cp_parser_template_id never succeeds with literal operator templates. I find that curious. But I haven't looked real hard and the things do get parsed somehow. I'd only expect it to succeed if you actually wrote, e.g., operator "" _c<'a','b','c'>(); Jason
Re: [C++ Patch] PR 50594 (C++ front-end bits)
On 10/12/2011 07:56 PM, Jason Merrill wrote: On 10/12/2011 09:18 AM, Paolo Carlini wrote: newattrs = build_tree_list (get_identifier ("alloc_size"), build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = chainon (newattrs, extvisattr); Instead of chainon you could build newattrs after extvisattr with tree_cons. Yes. Like this? Paolo. / Index: decl.c === --- decl.c (revision 179859) +++ decl.c (working copy) @@ -3654,7 +3654,7 @@ cxx_init_decl_processing (void) current_lang_name = lang_name_cplusplus; { -tree newattrs; +tree newattrs, extvisattr; tree newtype, deltype; tree ptr_ftype_sizetype; tree new_eh_spec; @@ -3684,12 +3684,15 @@ cxx_init_decl_processing (void) /* Ensure attribs.c is initialized. */ init_attributes (); -newattrs - = build_tree_list (get_identifier ("alloc_size"), -build_tree_list (NULL_TREE, integer_one_node)); +extvisattr = build_tree_list (get_identifier ("externally_visible"), + NULL_TREE); +newattrs = tree_cons (get_identifier ("alloc_size"), + build_tree_list (NULL_TREE, integer_one_node), + extvisattr); newtype = cp_build_type_attribute_variant (ptr_ftype_sizetype, newattrs); newtype = build_exception_variant (newtype, new_eh_spec); -deltype = build_exception_variant (void_ftype_ptr, empty_except_spec); +deltype = cp_build_type_attribute_variant (void_ftype_ptr, extvisattr); +deltype = build_exception_variant (deltype, empty_except_spec); push_cp_library_fn (NEW_EXPR, newtype); push_cp_library_fn (VEC_NEW_EXPR, newtype); global_delete_fndecl = push_cp_library_fn (DELETE_EXPR, deltype);
Re: [C++ Patch] PR 50594 (C++ front-end bits)
OK. Jason
[PATCH] Slightly fix up vgather* patterns (take 2)
On Sun, Oct 09, 2011 at 12:55:40PM +0200, Uros Bizjak wrote: BTW: No need to use %c modifier: /* Meaning of CODE: L,W,B,Q,S,T -- print the opcode suffix for specified size of operand. C -- print opcode suffix for set/cmov insn. c -- like C, but print reversed condition ... */ Well, something needs to be used there, because otherwise we get addresses like (%rax, %ymm0, $4) instead of the needed (%rax, %ymm0, 4) I've used %p6 instead of %c6 in the patch below. On Mon, Oct 10, 2011 at 01:47:49PM -0700, Richard Henderson wrote: The use of match_dup in the clobber is wrong. We should not be clobbering the user-visible copy of the operand. That does not make sense when dealing with the user-visible builtin. Ok. Instead, use (clobber (match_scratch)) and matching constraints with operand 4. Ok. I think that a (mem (scratch)) as input to the unspec is probably best. The exact memory usage is almost certainly too complex to describe in a useful way. Ok, so how about this (so far untested, will bootstrap/regtest it soon)? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (avx2_gathersimode, avx2_gatherdimode, avx2_gatherdimode256): Add clobber of match_scratch, change memory_operand to register_operand, add (mem:BLK (scratch)) use. (*avx2_gathersimode, *avx2_gatherdimode, *avx2_gatherdimode256): Add clobber of match_scratch, add earlyclobber to the output operand and match_scratch, add (mem:BLK (scratch)) use, change the other mem to match_operand. Use %p6 instead of %c6 in the pattern. * config/i386/i386.c (ix86_expand_builtin): Adjust for operand 2 being a Pmode register_operand instead of memory_operand. --- gcc/config/i386/i386.c.jj 2011-10-12 16:15:50.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 19:12:15.0 +0200 @@ -28862,7 +28862,6 @@ rdrand_step: op4 = expand_normal (arg4); /* Note the arg order is different from the operand order. 
*/ mode0 = insn_data[icode].operand[1].mode; - mode1 = insn_data[icode].operand[2].mode; mode2 = insn_data[icode].operand[3].mode; mode3 = insn_data[icode].operand[4].mode; mode4 = insn_data[icode].operand[5].mode; @@ -28876,12 +28875,11 @@ rdrand_step: if (GET_MODE (op1) != Pmode) op1 = convert_to_mode (Pmode, op1, 1); op1 = force_reg (Pmode, op1); - op1 = gen_rtx_MEM (mode1, op1); if (!insn_data[icode].operand[1].predicate (op0, mode0)) op0 = copy_to_mode_reg (mode0, op0); - if (!insn_data[icode].operand[2].predicate (op1, mode1)) - op1 = copy_to_mode_reg (mode1, op1); + if (!insn_data[icode].operand[2].predicate (op1, Pmode)) + op1 = copy_to_mode_reg (Pmode, op1); if (!insn_data[icode].operand[3].predicate (op2, mode2)) op2 = copy_to_mode_reg (mode2, op2); if (!insn_data[icode].operand[4].predicate (op3, mode3)) --- gcc/config/i386/sse.md.jj 2011-10-12 16:16:49.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 19:19:55.0 +0200 @@ -12582,55 +12582,61 @@ (define_mode_attr VEC_GATHER_MODE (V8SI V8SI) (V8SF V8SI)]) (define_expand avx2_gathersimode - [(set (match_operand:VEC_GATHER_MODE 0 register_operand ) - (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 register_operand ) - (match_operand:ssescalarmode 2 memory_operand ) - (match_operand:VEC_GATHER_MODE 3 register_operand ) - (match_operand:VEC_GATHER_MODE 4 register_operand ) - (match_operand:SI 5 const1248_operand )] - UNSPEC_GATHER))] + [(parallel [(set (match_operand:VEC_GATHER_MODE 0 register_operand ) + (unspec:VEC_GATHER_MODE +[(match_operand:VEC_GATHER_MODE 1 register_operand ) + (match_operand 2 register_operand ) + (mem:BLK (scratch)) + (match_operand:VEC_GATHER_MODE 3 register_operand ) + (match_operand:VEC_GATHER_MODE 4 register_operand ) + (match_operand:SI 5 const1248_operand )] +UNSPEC_GATHER)) + (clobber (match_scratch:VEC_GATHER_MODE 6 ))])] TARGET_AVX2) (define_insn *avx2_gathersimode - [(set (match_operand:VEC_GATHER_MODE 0 register_operand =x) + [(set (match_operand:VEC_GATHER_MODE 0 
register_operand =x) (unspec:VEC_GATHER_MODE - [(match_operand:VEC_GATHER_MODE 1 register_operand 0) - (mem:ssescalarmode -(match_operand:P 2 register_operand r)) - (match_operand:VEC_GATHER_MODE 3 register_operand x) - (match_operand:VEC_GATHER_MODE 4 register_operand x) - (match_operand:SI 5 const1248_operand n)] - UNSPEC_GATHER))] + [(match_operand:VEC_GATHER_MODE 2 register_operand 0) + (match_operand:P 3 register_operand r) + (mem:BLK
[v3] PR C++/50594
Hi, these are the library bits, which I committed together with the front-end bits approved by Jason. Tested x86_64-linux. Paolo. 2011-10-12 Paolo Carlini paolo.carl...@oracle.com PR c++/50594 * libsupc++/new (operator new, operator delete): Decorate with __attribute__((__externally_visible__)). * include/bits/c++config: Add _GLIBCXX_THROW. * libsupc++/del_op.cc: Adjust. * libsupc++/del_opv.cc: Likewise. * libsupc++/del_opnt.cc: Likewise. * libsupc++/del_opvnt.cc: Likewise. * libsupc++/new_op.cc: Likewise. * libsupc++/new_opv.cc: Likewise. * libsupc++/new_opnt.cc: Likewise. * libsupc++/new_opvnt.cc: Likewise. * testsuite/18_support/50594.cc: New. * testsuite/ext/profile/mutex_extensions_neg.cc: Adjust dg-error line number. Index: include/bits/c++config === --- include/bits/c++config (revision 179842) +++ include/bits/c++config (working copy) @@ -103,9 +103,11 @@ # ifdef __GXX_EXPERIMENTAL_CXX0X__ # define _GLIBCXX_NOEXCEPT noexcept # define _GLIBCXX_USE_NOEXCEPT noexcept +# define _GLIBCXX_THROW(_EXC) # else # define _GLIBCXX_NOEXCEPT # define _GLIBCXX_USE_NOEXCEPT throw() +# define _GLIBCXX_THROW(_EXC) throw(_EXC) # endif #endif Index: libsupc++/del_op.cc === --- libsupc++/del_op.cc (revision 179842) +++ libsupc++/del_op.cc (working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2007, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -41,7 +42,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void -operator delete(void* ptr) throw () +operator delete(void* ptr) _GLIBCXX_USE_NOEXCEPT { if (ptr) std::free(ptr); Index: libsupc++/new_opv.cc === --- libsupc++/new_opv.cc(revision 179842) +++ libsupc++/new_opv.cc(working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. 
-// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -27,7 +28,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void* -operator new[] (std::size_t sz) throw (std::bad_alloc) +operator new[] (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) { return ::operator new(sz); } Index: libsupc++/new_op.cc === --- libsupc++/new_op.cc (revision 179842) +++ libsupc++/new_op.cc (working copy) @@ -42,7 +42,7 @@ extern new_handler __new_handler; _GLIBCXX_WEAK_DEFINITION void * -operator new (std::size_t sz) throw (std::bad_alloc) +operator new (std::size_t sz) _GLIBCXX_THROW (std::bad_alloc) { void *p; Index: libsupc++/del_opv.cc === --- libsupc++/del_opv.cc(revision 179842) +++ libsupc++/del_opv.cc(working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. // @@ -27,7 +28,7 @@ #include new _GLIBCXX_WEAK_DEFINITION void -operator delete[] (void *ptr) throw () +operator delete[] (void *ptr) _GLIBCXX_USE_NOEXCEPT { ::operator delete (ptr); } Index: libsupc++/del_opnt.cc === --- libsupc++/del_opnt.cc (revision 179842) +++ libsupc++/del_opnt.cc (working copy) @@ -1,6 +1,7 @@ // Boilerplate support routines for -*- C++ -*- dynamic memory management. -// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009 Free Software Foundation +// Copyright (C) 1997, 1998, 1999, 2000, 2004, 2009, 2010, 2011 +// Free Software Foundation // // This file is part of GCC. 
// @@ -29,7 +30,7 @@ extern "C" void free (void *); _GLIBCXX_WEAK_DEFINITION void -operator delete (void *ptr, const std::nothrow_t&) throw () +operator delete (void *ptr, const std::nothrow_t&) _GLIBCXX_USE_NOEXCEPT { free (ptr); } Index: libsupc++/new === --- libsupc++/new (revision 179842) +++ libsupc++/new (working copy) @@ -1,7 +1,7 @@ // The -*- C++ -*- dynamic memory management header. // Copyright (C) 1994, 1995, 1996, 1997, 1998, 1999, 2000, 2001, 2002, -// 2003, 2004, 2005, 2006, 2007, 2009, 2010 +// 2003, 2004, 2005, 2006,
Re: [Patch, Fortran, committed] PR 50659: [4.4/4.5/4.6/4.7 Regression] ICE with PROCEDURE statement
Committed to the 4.6 branch as r179864: http://gcc.gnu.org/viewcvs?view=revision&revision=179723 Cheers, Janus 2011/10/9 Janus Weil ja...@gcc.gnu.org: Hi all, I have just committed as obvious a patch for an ICE-on-valid problem with PROCEDURE statements: http://gcc.gnu.org/viewcvs?root=gcc&view=rev&rev=179723 The problem was the following: When setting up an external procedure or procedure pointer (declared via a PROCEDURE statement), we copy the expressions for the array bounds and string length from the interface symbol given in the PROCEDURE declaration (cf. 'resolve_procedure_interface'). If those expressions depend on the actual args of the interface, we have to replace those args by the args of the new procedure symbol that we're setting up. This is what 'gfc_expr_replace_symbols' / 'replace_symbol' does. Unfortunately we failed to check whether the symbol we try to replace is actually a dummy! Contrary to Andrew's initial assumption, I think the test case is valid. I could neither find a compiler which rejects it, nor a restriction in the standard which makes it invalid. The relevant part of F08 is probably chapter 7.1.11 (Specification expression). This states that a specification expression can contain variables, which are made accessible via use association. I'm planning to apply the patch to the 4.6, 4.5 and 4.4 branches soon. Cheers, Janus
Factor out allocation of sorted_fields (issue5253050)
This moves the allocation of sorted_fields_type elements into a new allocator function. It's not strictly necessary in trunk, but on the pph branch we need to allocate this type from pph images, so we need to call it from outside of class.c. OK for trunk? Tested on x86_64. Diego. * class.c (sorted_fields_type_new): Factor out of ... (finish_struct_1): ... here. diff --git a/gcc/cp/class.c b/gcc/cp/class.c index 2df9177..6185054 100644 --- a/gcc/cp/class.c +++ b/gcc/cp/class.c @@ -5663,6 +5663,22 @@ determine_key_method (tree type) return; } + +/* Allocate and return an instance of struct sorted_fields_type with + N fields. */ + +static struct sorted_fields_type * +sorted_fields_type_new (int n) +{ + struct sorted_fields_type *sft; + sft = ggc_alloc_sorted_fields_type (sizeof (struct sorted_fields_type) + + n * sizeof (tree)); + sft->len = n; + + return sft; +} + + /* Perform processing required when the definition of T (a class type) is complete. */ @@ -5792,9 +5808,7 @@ finish_struct_1 (tree t) n_fields = count_fields (TYPE_FIELDS (t)); if (n_fields >= 7) { - struct sorted_fields_type *field_vec = ggc_alloc_sorted_fields_type -(sizeof (struct sorted_fields_type) + n_fields * sizeof (tree)); - field_vec->len = n_fields; + struct sorted_fields_type *field_vec = sorted_fields_type_new (n_fields); add_fields_to_record_type (TYPE_FIELDS (t), field_vec, 0); qsort (field_vec->elts, n_fields, sizeof (tree), field_decl_cmp); -- 1.7.3.1 -- This patch is available for review at http://codereview.appspot.com/5253050
Add new debugging routines to the C++ parser (issue5232053)
I added this code while learning my way through the parser. It dumps most of the internal parser state. It also changes the lexer dumper to support dumping a window of tokens and highlighting a specific token when dumping. Tested on x86_64. OK for trunk? Diego. * parser.c: Remove ENABLE_CHECKING markers around debugging routines. (cp_lexer_dump_tokens): Add arguments START_TOKEN and CURR_TOKEN. Make static When printing CURR_TOKEN surround it in [[ ]]. Start printing at START_TOKEN. Update all users. (cp_debug_print_tree_if_set): New. (cp_debug_print_context): New. (cp_debug_print_context_stack): New. (cp_debug_print_flag): New. (cp_debug_print_unparsed_function): New. (cp_debug_print_unparsed_queues): New. (cp_debug_parser_tokens): New. (cp_debug_parser): New. (cp_lexer_start_debugging): Set cp_lexer_debug_stream to stderr. (cp_lexer_stop_debugging): Set cp_lexer_debug_stream to NULL. * parser.h (cp_lexer_dump_tokens): Remove declaration. (cp_debug_parser): Declare. diff --git a/gcc/cp/parser.c b/gcc/cp/parser.c index cabe9aa..48d92bb 100644 --- a/gcc/cp/parser.c +++ b/gcc/cp/parser.c @@ -210,7 +210,6 @@ static void cp_lexer_commit_tokens (cp_lexer *); static void cp_lexer_rollback_tokens (cp_lexer *); -#ifdef ENABLE_CHECKING static void cp_lexer_print_token (FILE *, cp_token *); static inline bool cp_lexer_debugging_p @@ -219,15 +218,6 @@ static void cp_lexer_start_debugging (cp_lexer *) ATTRIBUTE_UNUSED; static void cp_lexer_stop_debugging (cp_lexer *) ATTRIBUTE_UNUSED; -#else -/* If we define cp_lexer_debug_stream to NULL it will provoke warnings - about passing NULL to functions that require non-NULL arguments - (fputs, fprintf). It will never be used, so all we need is a value - of the right type that's guaranteed not to be NULL. 
*/ -#define cp_lexer_debug_stream stdout -#define cp_lexer_print_token(str, tok) (void) 0 -#define cp_lexer_debugging_p(lexer) 0 -#endif /* ENABLE_CHECKING */ static cp_token_cache *cp_token_cache_new (cp_token *, cp_token *); @@ -241,33 +231,64 @@ static void cp_parser_initial_pragma /* Variables. */ -#ifdef ENABLE_CHECKING /* The stream to which debugging output should be written. */ static FILE *cp_lexer_debug_stream; -#endif /* ENABLE_CHECKING */ /* Nonzero if we are parsing an unevaluated operand: an operand to sizeof, typeof, or alignof. */ int cp_unevaluated_operand; -#ifdef ENABLE_CHECKING -/* Dump up to NUM tokens in BUFFER to FILE. If NUM is 0, dump all the - tokens. */ +/* Dump up to NUM tokens in BUFFER to FILE starting with token + START_TOKEN. If START_TOKEN is NULL, the dump starts with the + first token in BUFFER. If NUM is 0, dump all the tokens. If + CURR_TOKEN is set and it is one of the tokens in BUFFER, it will be + highlighted by surrounding it in [[ ]]. */ -void -cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, unsigned num) +static void +cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, + cp_token *start_token, unsigned num, + cp_token *curr_token) { - unsigned i; + unsigned i, nprinted; cp_token *token; + bool do_print; fprintf (file, "%u tokens\n", VEC_length (cp_token, buffer)); + if (buffer == NULL) +return; + if (num == 0) num = VEC_length (cp_token, buffer); - for (i = 0; VEC_iterate (cp_token, buffer, i, token) && i < num; i++) + if (start_token == NULL) +start_token = VEC_address (cp_token, buffer); + + if (start_token > VEC_address (cp_token, buffer)) +{ + cp_lexer_print_token (file, VEC_index (cp_token, buffer, 0)); + fprintf (file, "... "); +} + + do_print = false; + nprinted = 0; + for (i = 0; VEC_iterate (cp_token, buffer, i, token) && nprinted < num; i++) { + if (token == start_token) + do_print = true; + + if (!do_print) + continue; + + nprinted++; + if (token == curr_token) + fprintf (file, "[["); + cp_lexer_print_token (file, token); + + if (token == curr_token) + fprintf (file, "]]"); + switch (token->type) { case CPP_SEMICOLON: @@ -298,9 +319,218 @@ cp_lexer_dump_tokens (FILE *file, VEC(cp_token,gc) *buffer, unsigned num) void cp_lexer_debug_tokens (VEC(cp_token,gc) *buffer) { - cp_lexer_dump_tokens (stderr, buffer, 0); + cp_lexer_dump_tokens (stderr, buffer, NULL, 0, NULL); +} + + +/* Dump the cp_parser tree field T to FILE if T is non-NULL. DESC is the + description for T. */ + +static void +cp_debug_print_tree_if_set (FILE *file, const char *desc, tree t) +{ + if (t) +{ + fprintf (file, "%s: ", desc); + print_node_brief (file, "", t, 0); +} +} + + +/* Dump parser context C to FILE. */ + +static void +cp_debug_print_context (FILE *file, cp_parser_context *c) +{ + const char *status_s[] = {
[PATCH] Fix number of arguments in call to alloca_with_align
Richard, This patch fixes a trivial problem in gimplify_parameters, introduced by the patch that introduced BUILT_IN_ALLOCA_WITH_ALIGN. BUILT_IN_ALLOCA_WITH_ALIGN has 2 parameters, so the number of arguments in the corresponding build_call_expr should be 2, not 1. Bootstrapped and reg-tested (including Ada) on x86_64. OK for trunk? Thanks, - Tom 2011-10-12 Tom de Vries t...@codesourcery.com * function.c (gimplify_parameters): Set number of arguments of call to BUILT_IN_ALLOCA_WITH_ALIGN to 2. Index: gcc/function.c === --- gcc/function.c (revision 179773) +++ gcc/function.c (working copy) @@ -3636,7 +3636,7 @@ gimplify_parameters (void) local = build_fold_indirect_ref (addr); t = built_in_decls[BUILT_IN_ALLOCA_WITH_ALIGN]; - t = build_call_expr (t, 1, DECL_SIZE_UNIT (parm), + t = build_call_expr (t, 2, DECL_SIZE_UNIT (parm), size_int (DECL_ALIGN (parm))); /* The call has been built for a variable-sized object. */
Re: [PATCH] Slightly fix up vgather* patterns (take 2)
On 10/12/2011 11:25 AM, Jakub Jelinek wrote: * config/i386/sse.md (avx2_gathersimode, avx2_gatherdimode, avx2_gatherdimode256): Add clobber of match_scratch, change memory_operand to register_operand, add (mem:BLK (scratch)) use. (*avx2_gathersimode, *avx2_gatherdimode, *avx2_gatherdimode256): Add clobber of match_scratch, add earlyclobber to the output operand and match_scratch, add (mem:BLK (scratch)) use, change the other mem to match_operand. Use %p6 instead of %c6 in the pattern. * config/i386/i386.c (ix86_expand_builtin): Adjust for operand 2 being a Pmode register_operand instead of memory_operand. Ok. It looks like these 4 patterns could be macro-ized some more. But that can wait for a follow-up. r~
Re: [PATCH] Add mulv32qi3 support
On 10/12/2011 09:24 AM, Jakub Jelinek wrote: BTW, I wonder if vector multiply expansion when one argument is a VECTOR_CST with all elements the same shouldn't use something similar to what expand_mult does, not sure if in the generic code or at least in the backends. Testing the costs will be harder, maybe it could just test fewer algorithms and perhaps just count the number of instructions or something similar. But certainly e.g. v32qi multiplication by 3 is quite costly (4 interleaves, 2 v16hi multiplications, 4 insns to select even from the two), while two vector additions (tmp = x + x; result = x + tmp;) would do the job. It would certainly be a good thing to try to do this in the middle-end. 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_avx2): New mode_attr. (mulv16qi3): Macroize to cover also mulv32qi3 for TARGET_AVX2 into ... (mul<mode>3): ... this. Ok. r~
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 00:33:43 +0200 I see, so we can test the code generation in the testsuite even if the compiler was built against an assembler without support for the instructions. At least partially, yes. But in such a case, I'm unsure if I understand why i386.exp needs these tests at all. The presence of support for a particular i386 intrinsic is an implicit property of the gcc sources that these test cases are a part of. If the tests are properly added only once the code to support the i386 intrinsic is added as well, the checks seem superfluous. The check is an _object_ check, for example: ... So the first category of tests will always be executed, whereas the latter two will only be executed if you have the binutils support. Thanks a lot for explaining things. I'm currently testing the following patch in various scenarios, I'm pretty sure this is what you had in mind. Any feedback is appreciated, thanks again Eric. diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 9c7cc56..fa790b3 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -850,7 +850,11 @@ sparc_option_override (void) cpu = cpu_table[(int) sparc_cpu_and_features]; target_flags &= ~cpu->disable; - target_flags |= cpu->enable; + target_flags |= (cpu->enable +#ifndef HAVE_AS_FMAF_HPC_VIS3 + & ~(MASK_FMAF | MASK_VIS3) +#endif + ); /* If -mfpu or -mno-fpu was explicitly used, don't override with the processor default. 
*/ diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index 0642ff2..669f106 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1871,10 +1871,6 @@ extern int sparc_indent_opcode; #ifndef HAVE_AS_FMAF_HPC_VIS3 #define AS_NIAGARA3_FLAG "b" -#undef TARGET_FMAF -#define TARGET_FMAF 0 -#undef TARGET_VIS3 -#define TARGET_VIS3 0 #else #define AS_NIAGARA3_FLAG "d" #endif diff --git a/gcc/testsuite/gcc.target/sparc/cmask.c b/gcc/testsuite/gcc.target/sparc/cmask.c index 989274c..b3168ec 100644 --- a/gcc/testsuite/gcc.target/sparc/cmask.c +++ b/gcc/testsuite/gcc.target/sparc/cmask.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ void test_cm8 (long x) diff --git a/gcc/testsuite/gcc.target/sparc/fhalve.c b/gcc/testsuite/gcc.target/sparc/fhalve.c index 737fc71..340b936 100644 --- a/gcc/testsuite/gcc.target/sparc/fhalve.c +++ b/gcc/testsuite/gcc.target/sparc/fhalve.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ float test_fhadds (float x, float y) diff --git a/gcc/testsuite/gcc.target/sparc/fnegop.c b/gcc/testsuite/gcc.target/sparc/fnegop.c index 3e3e72c..25f8c19 100644 --- a/gcc/testsuite/gcc.target/sparc/fnegop.c +++ b/gcc/testsuite/gcc.target/sparc/fnegop.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-O2 -mcpu=niagara3 -mvis" } */ float test_fnadds(float x, float y) diff --git a/gcc/testsuite/gcc.target/sparc/fpadds.c b/gcc/testsuite/gcc.target/sparc/fpadds.c index f55cb05..d0704e0 100644 --- a/gcc/testsuite/gcc.target/sparc/fpadds.c +++ b/gcc/testsuite/gcc.target/sparc/fpadds.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef int __v2si __attribute__((vector_size(8))); typedef int __v1si __attribute__((vector_size(4))); diff --git 
a/gcc/testsuite/gcc.target/sparc/fshift.c b/gcc/testsuite/gcc.target/sparc/fshift.c index 6adbed6..a12df04 100644 --- a/gcc/testsuite/gcc.target/sparc/fshift.c +++ b/gcc/testsuite/gcc.target/sparc/fshift.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef int __v2si __attribute__((vector_size(8))); typedef short __v4hi __attribute__((vector_size(8))); diff --git a/gcc/testsuite/gcc.target/sparc/fucmp.c b/gcc/testsuite/gcc.target/sparc/fucmp.c index 4e7ecad..7f291c3 100644 --- a/gcc/testsuite/gcc.target/sparc/fucmp.c +++ b/gcc/testsuite/gcc.target/sparc/fucmp.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3 -mvis" } */ typedef unsigned char vec8 __attribute__((vector_size(8))); diff --git a/gcc/testsuite/gcc.target/sparc/lzd.c b/gcc/testsuite/gcc.target/sparc/lzd.c index 5ffaf56..a897829 100644 --- a/gcc/testsuite/gcc.target/sparc/lzd.c +++ b/gcc/testsuite/gcc.target/sparc/lzd.c @@ -1,4 +1,4 @@ -/* { dg-do compile { target { vis3 } } } */ +/* { dg-do compile } */ /* { dg-options "-mcpu=niagara3" } */ int test_clz(int a) { diff --git a/gcc/testsuite/gcc.target/sparc/vis3misc.c b/gcc/testsuite/gcc.target/sparc/vis3misc.c index e3ef49e..8a9535e 100644 --- a/gcc/testsuite/gcc.target/sparc/vis3misc.c +++
Rename some fields in struct language_function. (issue5229058)
This patch is needed in the pph branch because streamers need access to the fields in struct language_function without going through cp_function_chain. Since these fields are named exactly like their #define counterparts, we cannot reference them without the pre-processor expanding the #defines, which causes build errors. OK for trunk? Tested on x86_64. Diego. * cp-tree.h (struct language_function): Rename returns_value to x_returns_value. Rename returns_null to x_returns_null. Rename returns_abnormally to x_returns_abnormally. Rename in_function_try_handler to x_in_function_try_handler. Rename in_base_initializer to x_in_base_initializer. Update all users. diff --git a/gcc/cp/cp-tree.h b/gcc/cp/cp-tree.h index b53accf..a163cd2 100644 --- a/gcc/cp/cp-tree.h +++ b/gcc/cp/cp-tree.h @@ -1050,11 +1050,11 @@ struct GTY(()) language_function { tree x_vtt_parm; tree x_return_value; - BOOL_BITFIELD returns_value : 1; - BOOL_BITFIELD returns_null : 1; - BOOL_BITFIELD returns_abnormally : 1; - BOOL_BITFIELD in_function_try_handler : 1; - BOOL_BITFIELD in_base_initializer : 1; + BOOL_BITFIELD x_returns_value : 1; + BOOL_BITFIELD x_returns_null : 1; + BOOL_BITFIELD x_returns_abnormally : 1; + BOOL_BITFIELD x_in_function_try_handler : 1; + BOOL_BITFIELD x_in_base_initializer : 1; /* True if this function can throw an exception. */ BOOL_BITFIELD can_throw : 1; @@ -1107,23 +1107,23 @@ struct GTY(()) language_function { /* Set to 0 at beginning of a function definition, set to 1 if a return statement that specifies a return value is seen. */ -#define current_function_returns_value cp_function_chain->returns_value +#define current_function_returns_value cp_function_chain->x_returns_value /* Set to 0 at beginning of a function definition, set to 1 if a return statement with no argument is seen. 
*/ -#define current_function_returns_null cp_function_chain->returns_null +#define current_function_returns_null cp_function_chain->x_returns_null /* Set to 0 at beginning of a function definition, set to 1 if a call to a noreturn function is seen. */ #define current_function_returns_abnormally \ - cp_function_chain->returns_abnormally + cp_function_chain->x_returns_abnormally /* Nonzero if we are processing a base initializer. Zero elsewhere. */ -#define in_base_initializer cp_function_chain->in_base_initializer +#define in_base_initializer cp_function_chain->x_in_base_initializer -#define in_function_try_handler cp_function_chain->in_function_try_handler +#define in_function_try_handler cp_function_chain->x_in_function_try_handler /* Expression always returned from function, or error_mark_node otherwise, for use by the automatic named return value optimization. */ -- This patch is available for review at http://codereview.appspot.com/5229058
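The macro-capture problem described above can be reproduced in isolation; a minimal sketch (only the `in_base_initializer` macro mirrors cp-tree.h, the surrounding names are simplified for illustration):

```c
/* With the accessor macro in scope, a field spelled the same way can
   never be accessed directly: lf->in_base_initializer would expand to
   lf->cp_function_chain->x_in_base_initializer and fail to compile.
   Renaming the field with an x_ prefix keeps the two names distinct.  */
struct language_function { int x_in_base_initializer; };
static struct language_function *cp_function_chain;
#define in_base_initializer cp_function_chain->x_in_base_initializer

static int
read_field_directly (struct language_function *lf)
{
  return lf->x_in_base_initializer;  /* fine: no macro by this name */
}
```

This is exactly what a streamer needs: direct field access on an arbitrary `struct language_function *`, independent of whatever `cp_function_chain` currently points at.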
Re: Out-of-order update of new_spill_reg_store[]
Bernd Schmidt ber...@codesourcery.com writes: On 10/11/11 14:35, Richard Sandiford wrote: No, reload 1 is inherited by a later instruction. And it's inherited correctly, in terms of the register contents being what we expect. (Reload 1 is the one that survives to the end of the instruction's reload sequence. Reload 2, in contrast, is clobbered by reload 1, so could not be inherited. So when we record inheritance information in emit_reload_insns, reload_reg_reaches_end_p correctly stops us from recording reload 2 but allows us to record reload 1.) The problem is that we record the wrong instruction for reload 1. We say that reload 1 is performed by the instruction that performs reload 2. So spill_reg_store[] contains the instruction for reload 2 rather than the instruction for reload 1. We delete it in delete_output_reload at the point of inheritance. Ok. So, would the minimal fix of testing !new_spill_reg_store[..] before writing to it also work? Seems to me this would cope with the out-of-order writes by only allowing the first. If so, then I think I'd prefer that, but we could gcc_assert (reload_reg_reaches_end (..)) as a bit of a verification of that function. I don't think the assert would be safe. We could have similar reuse in cases where the first reload (in rld order) is a double-register value starting in $4 and the second reload uses just $5. In that case, the first reload will have set new_spill_reg_store[4], so new_spill_reg_store[5] will still be null. But $5 in the second reload won't survive until the end of the sequence. So we'd try to set new_spill_reg_store[5] and trip the assert. IMO it's a choice between just checking for null and not asserting (if that really is safe, since we'll be storing instructions that don't actually reach the end of the reload sequence), or checking reload_reg_reaches_end. I prefer the second, since it seems more direct, and matches the corresponding code in emit_reload_insns. Richard
Re: Rename some fields in struct language_function. (issue5229058)
On 10/12/2011 04:48 PM, Diego Novillo wrote: This patch is needed in the pph branch because streamers need access to the fields in struct language_function without going through cp_function_chain. Since these fields are named exactly like their #define counterparts, we cannot reference them without the pre-processor expanding the #defines, which causes build errors. -#define current_function_returns_value cp_function_chain->returns_value +#define current_function_returns_value cp_function_chain->x_returns_value -#define current_function_returns_null cp_function_chain->returns_null +#define current_function_returns_null cp_function_chain->x_returns_null #define current_function_returns_abnormally \ - cp_function_chain->returns_abnormally + cp_function_chain->x_returns_abnormally Doesn't seem necessary for these three. OK for in_*. Jason
Re: Factor out allocation of sorted_fields (issue5253050)
OK. Jason
Re: int_cst_hash_table mapping persistence and the garbage collector
I think there is an issue when two cache htabs refer to each other with respect to GC, you might search the list to find out more. I'm not sure this is the case here, there seems to be a clear hierarchy. -- Eric Botcazou
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
I'm currently testing the following patch in various scenerios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? -- Eric Botcazou
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 23:08:39 +0200 I'm currently testing the following patch in various scenerios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? Yes, I need to correct the testcase flags now. I just noticed this while testing. I will post a finalized patch later tonight.
[PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support (take 2)
On Wed, Oct 12, 2011 at 10:49:33AM -0700, Richard Henderson wrote: I believe I've commented on everything else in the previous messages. Here is an updated patch which should incorporate your comments from both mails (thanks for them). Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. (expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for the "xm" constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. 
--- gcc/config/i386/i386.md.jj 2011-10-12 20:28:19.0 +0200 +++ gcc/config/i386/i386.md 2011-10-12 20:30:00.0 +0200 @@ -235,7 +235,6 @@ (define_c_enum unspec [ UNSPEC_VPERMSI UNSPEC_VPERMDF UNSPEC_VPERMSF - UNSPEC_VPERMDI UNSPEC_VPERMTI UNSPEC_GATHER --- gcc/config/i386/i386.c.jj 2011-10-12 20:28:19.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 21:01:39.0 +0200 @@ -19334,7 +19334,7 @@ ix86_expand_vec_perm (rtx operands[]) rtx op0 = operands[1]; rtx op1 = operands[2]; rtx mask = operands[3]; - rtx t1, t2, vt, vec[16]; + rtx t1, t2, t3, t4, vt, vt2, vec[32]; enum machine_mode mode = GET_MODE (op0); enum machine_mode maskmode = GET_MODE (mask); int w, e, i; @@ -19343,50 +19343,68 @@ ix86_expand_vec_perm (rtx operands[]) /* Number of elements in the vector. */ w = GET_MODE_NUNITS (mode); e = GET_MODE_UNIT_SIZE (mode); - gcc_assert (w <= 16); + gcc_assert (w <= 32); if (TARGET_AVX2) { - if (mode == V4DImode || mode == V4DFmode) + if (mode == V4DImode || mode == V4DFmode || mode == V16HImode) { /* Unfortunately, the VPERMQ and VPERMPD instructions only support a constant shuffle operand. With a tiny bit of effort we can use VPERMD instead. A re-interpretation stall for V4DFmode is -unfortunate but there's no avoiding it. */ - t1 = gen_reg_rtx (V8SImode); +unfortunate but there's no avoiding it. +Similarly for V16HImode we don't have instructions for variable +shuffling, while for V32QImode we can use after preparing suitable +masks vpshufb; vpshufb; vpermq; vpor. */ + + if (mode == V16HImode) + { + maskmode = mode = V32QImode; + w = 32; + e = 1; + } + else + { + maskmode = mode = V8SImode; + w = 8; + e = 4; + } + t1 = gen_reg_rtx (maskmode); /* Replicate the low bits of the V4DImode mask into V8SImode: mask = { A B C D } t1 = { A A B B C C D D }. 
*/ - for (i = 0; i < 4; ++i) + for (i = 0; i < w / 2; ++i) vec[i*2 + 1] = vec[i*2] = GEN_INT (i * 2); - vt = gen_rtx_CONST_VECTOR (V8SImode, gen_rtvec_v (8, vec)); - vt = force_reg (V8SImode, vt); - mask = gen_lowpart (V8SImode, mask); - emit_insn (gen_avx2_permvarv8si (t1, vt, mask)); + vt = gen_rtx_CONST_VECTOR (maskmode, gen_rtvec_v (w, vec)); + vt = force_reg (maskmode, vt); + mask = gen_lowpart (maskmode, mask); + if (maskmode == V8SImode) + emit_insn (gen_avx2_permvarv8si (t1, vt, mask)); + else + emit_insn (gen_avx2_pshufbv32qi3 (t1, mask, vt)); /* Multiply the shuffle indicies by two. */ - emit_insn (gen_avx2_lshlv8si3 (t1, t1, const1_rtx)); + t1 = expand_simple_binop (maskmode, PLUS, t1, t1, t1, 1, +
[PATCH] Add VEC_UNPACK_{HI,LO}_EXPR support for V{32QI,16HI,8SI} with AVX2
Hi! This patch makes it possible to vectorize char a[1024], c[1024]; long long b[1024]; void foo (void) { int i; for (i = 0; i < 1024; i++) b[i] = a[i] + 3 * c[i]; } using 32-byte vectors with -mavx2. Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk? 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_<mode>, vec_unpacks_hi_<mode>, vec_unpacku_lo_<mode>, vec_unpacku_hi_<mode>): Change VI124_128 mode to VI124_AVX2. * config/i386/i386.c (ix86_expand_sse_unpack): Handle V32QImode, V16HImode and V8SImode for TARGET_AVX2. --- gcc/config/i386/sse.md.jj 2011-10-12 15:42:12.0 +0200 +++ gcc/config/i386/sse.md 2011-10-12 16:16:49.0 +0200 @@ -7536,25 +7536,25 @@ (define_insn "vec_concatv2di" (define_expand "vec_unpacks_lo_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, false, false); DONE;") (define_expand "vec_unpacks_hi_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, false, true); DONE;") (define_expand "vec_unpacku_lo_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, true, false); DONE;") (define_expand "vec_unpacku_hi_<mode>" [(match_operand:<sseunpackmode> 0 "register_operand" "") - (match_operand:VI124_128 1 "register_operand" "")] + (match_operand:VI124_AVX2 1 "register_operand" "")] "TARGET_SSE2" "ix86_expand_sse_unpack (operands, true, true); DONE;") --- gcc/config/i386/i386.c.jj 2011-10-12 14:19:26.0 +0200 +++ gcc/config/i386/i386.c 2011-10-12 16:15:50.0 +0200 @@ -19658,9 +19658,38 @@ ix86_expand_sse_unpack (rtx operands[2], if (TARGET_SSE4_1) { rtx (*unpack)(rtx, rtx); + rtx (*extract)(rtx, rtx) = NULL; + enum 
machine_mode halfmode = BLKmode; switch (imode) { + case V32QImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv16qiv16hi2; + else + unpack = gen_avx2_sign_extendv16qiv16hi2; + halfmode = V16QImode; + extract + = high_p ? gen_vec_extract_hi_v32qi : gen_vec_extract_lo_v32qi; + break; + case V16HImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv8hiv8si2; + else + unpack = gen_avx2_sign_extendv8hiv8si2; + halfmode = V8HImode; + extract + = high_p ? gen_vec_extract_hi_v16hi : gen_vec_extract_lo_v16hi; + break; + case V8SImode: + if (unsigned_p) + unpack = gen_avx2_zero_extendv4siv4di2; + else + unpack = gen_avx2_sign_extendv4siv4di2; + halfmode = V4SImode; + extract + = high_p ? gen_vec_extract_hi_v8si : gen_vec_extract_lo_v8si; + break; case V16QImode: if (unsigned_p) unpack = gen_sse4_1_zero_extendv8qiv8hi2; @@ -19683,7 +19712,12 @@ ix86_expand_sse_unpack (rtx operands[2], gcc_unreachable (); } - if (high_p) + if (GET_MODE_SIZE (imode) == 32) + { + tmp = gen_reg_rtx (halfmode); + emit_insn (extract (tmp, operands[1])); + } + else if (high_p) { /* Shift higher 8 bytes to lower 8 bytes. */ tmp = gen_reg_rtx (imode); Jakub
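For reference, the unpack operations being wired up here have simple element-wise semantics; a scalar model of the signed 32-byte case (the function name is invented for illustration):

```c
/* Scalar model of vec_unpacks_lo_v32qi / vec_unpacks_hi_v32qi:
   sign-extend the low (or high) 16 chars of a 32-element vector into
   16 halfwords, matching the extract-half-then-sign-extend sequence
   that ix86_expand_sse_unpack emits for 32-byte AVX2 vectors above.  */
void
unpacks_v32qi (const signed char in[32], short out[16], int high_p)
{
  int i;
  for (i = 0; i < 16; i++)
    out[i] = in[i + (high_p ? 16 : 0)];
}
```

The unsigned (`vec_unpacku_*`) variants are identical except that the input elements are zero-extended instead of sign-extended.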
Re: New warning for expanded vector operations
This patch fixes PR50704. gcc/testsuite: * gcc.target/i386/warn-vect-op-3.c: Exclude ia32 target. * gcc.target/i386/warn-vect-op-1.c: Ditto. * gcc.target/i386/warn-vect-op-2.c: Ditto. Ok for trunk? Artem. On Wed, Oct 12, 2011 at 4:40 PM, H.J. Lu hjl.to...@gmail.com wrote: On Tue, Oct 11, 2011 at 9:11 AM, Artem Shinkarov artyom.shinkar...@gmail.com wrote: Committed with the revision 179807. This caused: http://gcc.gnu.org/bugzilla/show_bug.cgi?id=50704 -- H.J. fix-performance-tests.diff
Re: [PATCH] AVX2 vector permutation fixes plus vec_pack_trunc_{v16hi,v8si,v4di} support (take 2)
On 10/12/2011 02:23 PM, Jakub Jelinek wrote: 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/i386.md (UNSPEC_VPERMDI): Remove. * config/i386/i386.c (ix86_expand_vec_perm): Handle V16QImode and V32QImode for TARGET_AVX2. (MAX_VECT_LEN): Increase to 32. (expand_vec_perm_blend): Add support for 32-byte integer vectors with TARGET_AVX2. (valid_perm_using_mode_p): New function. (expand_vec_perm_pshufb): Add support for 32-byte integer vectors with TARGET_AVX2. (expand_vec_perm_vpshufb2_vpermq): New function. (expand_vec_perm_vpshufb2_vpermq_even_odd): New function. (expand_vec_perm_even_odd_1): Handle 32-byte integer vectors with TARGET_AVX2. (ix86_expand_vec_perm_builtin_1): Try expand_vec_perm_vpshufb2_vpermq and expand_vec_perm_vpshufb2_vpermq_even_odd. * config/i386/sse.md (VEC_EXTRACT_EVENODD_MODE): Add for TARGET_AVX2 32-byte integer vector modes. (vec_pack_trunc_<mode>): Use VI248_AVX2 instead of VI248_128. (avx2_interleave_highv32qi, avx2_interleave_lowv32qi): Remove pasto. (avx2_pshufdv3, avx2_pshuflwv3, avx2_pshufhwv3): Generate 4 new operands. (avx2_pshufd_1, avx2_pshuflw_1, avx2_pshufhw_1): Don't use match_dup, instead add 4 new operands and require they have right cross-lane values. (avx2_permv4di): Change into define_expand. (avx2_permv4di_1): New instruction. (avx2_permv2ti): Use nonimmediate_operand instead of register_operand for the "xm" constrained operand. (VEC_PERM_AVX2): Add V32QI and V16QI for TARGET_AVX2. Ok. r~
Re: [PATCH] Add VEC_UNPACK_{HI,LO}_EXPR support for V{32QI,16HI,8SI} with AVX2
On 10/12/2011 02:28 PM, Jakub Jelinek wrote: 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_mode, vec_unpacks_hi_mode, vec_unpacku_lo_mode, vec_unpacku_hi_mode): Change VI124_128 mode to VI124_AVX2. * config/i386/i386.c (ix86_expand_sse_unpack): Handle V32QImode, V16HImode and V8SImode for TARGET_AVX2. Ok. r~
[Ada] Enable vectorization for loops with dynamic bounds
Loops with static bounds are reasonably well vectorized in Ada. Problems arise when things start to go dynamic, because of the dynamic bounds themselves but also because of the checks. This patch is a first step towards enabling more vectorization in the dynamic cases. The generated code isn't pretty though... Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gcc-interface/ada-tree.h (DECL_LOOP_PARM_P): New flag. (DECL_INDUCTION_VAR): New macro. (SET_DECL_INDUCTION_VAR): Likewise. * gcc-interface/gigi.h (convert_to_index_type): Declare. (gnat_invariant_expr): Likewise. * gcc-interface/decl.c (gnat_to_gnu_entity) <object>: If this is a loop parameter, set DECL_LOOP_PARM_P on it. * gcc-interface/misc.c (gnat_print_decl) <VAR_DECL>: If DECL_LOOP_PARM_P is set, print DECL_INDUCTION_VAR instead of DECL_RENAMED_OBJECT. * gcc-interface/trans.c (gnu_loop_label_stack): Delete. (struct range_check_info_d): New type. (struct loop_info_d): Likewise. (gnu_loop_stack): New stack. (Identifier_to_gnu): Set TREE_READONLY flag on the first dereference built for a by-double-ref read-only parameter. If DECL_LOOP_PARM_P is set, do not test DECL_RENAMED_OBJECT. (push_range_check_info): New function. (Loop_Statement_to_gnu): Push a new struct loop_info_d instead of just the label. Reference the label and the iteration variable from it. Build the special induction variable in the unsigned version of the size type, if it is larger than the base type. And attach it to the iteration variable if the latter isn't by-ref. In the iteration scheme case, initialize the invariant conditions in front of the loop if deemed profitable. Use gnu_loop_stack. (gnat_to_gnu) <N_Exit_Statement>: Use gnu_loop_stack. <N_Raise_Constraint_Error>: Always process the reason. In the range check and related cases, and if loop unswitching is enabled, compute invariant conditions and push this information onto the stack. 
Do not translate the condition again if it has already been translated. * gcc-interface/utils.c (record_global_renaming_pointer): Assert that DECL_LOOP_PARM_P isn't set. (convert_to_index_type): New function. * gcc-interface/utils2.c (build_binary_op) <ARRAY_REF>: Use it in order to convert the index from the base index type to sizetype. (gnat_invariant_expr): New function. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gnat.dg/vect1.ad[sb]: New test. * gnat.dg/vect1_pkg.ads: New helper. * gnat.dg/vect2.ad[sb]: New test. * gnat.dg/vect2_pkg.ads: New helper. * gnat.dg/vect3.ad[sb]: New test. * gnat.dg/vect3_pkg.ads: New helper. * gnat.dg/vect4.ad[sb]: New test. * gnat.dg/vect4_pkg.ads: New helper. * gnat.dg/vect5.ad[sb]: New test. * gnat.dg/vect5_pkg.ads: New helper. * gnat.dg/vect6.ad[sb]: New test. * gnat.dg/vect6_pkg.ads: New helper. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 179844) +++ gcc-interface/utils.c (working copy) @@ -1771,7 +1771,7 @@ process_attributes (tree decl, struct at void record_global_renaming_pointer (tree decl) { - gcc_assert (DECL_RENAMED_OBJECT (decl)); + gcc_assert (!DECL_LOOP_PARM_P (decl) && DECL_RENAMED_OBJECT (decl)); VEC_safe_push (tree, gc, global_renaming_pointers, decl); } @@ -4247,6 +4247,92 @@ convert (tree type, tree expr) gcc_unreachable (); } } + +/* Create an expression whose value is that of EXPR converted to the common + index type, which is sizetype. EXPR is supposed to be in the base type + of the GNAT index type. Calling it is equivalent to doing + + convert (sizetype, expr) + + but we try to distribute the type conversion with the knowledge that EXPR + cannot overflow in its type. This is a best-effort approach and we fall + back to the above expression as soon as difficulties are encountered. 
+ + This is necessary to overcome issues that arise when the GNAT base index + type and the GCC common index type (sizetype) don't have the same size, + which is quite frequent on 64-bit architectures. In this case, and if + the GNAT base index type is signed but the iteration type of the loop has + been forced to unsigned, the loop scalar evolution engine cannot compute + a simple evolution for the general induction variables associated with the + array indices, because it will preserve the wrap-around semantics in the + unsigned type of their inner part. As a result, many loop optimizations + are blocked. + + The solution is to use a special (basic) induction variable that is at + least as large as
[Ada] Housekeeping work in gigi (39/n)
Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com * gcc-interface/trans.c (Attribute_to_gnu): Use remove_conversions. (push_range_check_info): Likewise. (gnat_to_gnu) <N_Code_Statement>: Likewise. * gcc-interface/utils2.c (build_unary_op) <INDIRECT_REF>: Likewise. (gnat_invariant_expr): Likewise. * gcc-interface/utils.c (compute_related_constant): Likewise. (max_size): Fix handling of SAVE_EXPR. (remove_conversions): Fix formatting. -- Eric Botcazou Index: gcc-interface/utils.c === --- gcc-interface/utils.c (revision 179868) +++ gcc-interface/utils.c (working copy) @@ -1147,11 +1147,11 @@ compute_related_constant (tree op0, tree static tree split_plus (tree in, tree *pvar) { - /* Strip NOPS in order to ease the tree traversal and maximize the - potential for constant or plus/minus discovery. We need to be careful + /* Strip conversions in order to ease the tree traversal and maximize the + potential for constant or plus/minus discovery. We need to be careful to always return and set *pvar to bitsizetype trees, but it's worth the effort. */ - STRIP_NOPS (in); + in = remove_conversions (in, false); *pvar = convert (bitsizetype, in); @@ -2288,7 +2288,9 @@ max_size (tree exp, bool max_p) switch (TREE_CODE_LENGTH (code)) { case 1: - if (code == NON_LVALUE_EXPR) + if (code == SAVE_EXPR) + return exp; + else if (code == NON_LVALUE_EXPR) return max_size (TREE_OPERAND (exp, 0), max_p); else return @@ -2330,9 +2332,7 @@ max_size (tree exp, bool max_p) } case 3: - if (code == SAVE_EXPR) - return exp; - else if (code == COND_EXPR) + if (code == COND_EXPR) return fold_build2 (max_p ? 
MAX_EXPR : MIN_EXPR, type, max_size (TREE_OPERAND (exp, 1), max_p), max_size (TREE_OPERAND (exp, 2), max_p)); @@ -4359,8 +4359,9 @@ remove_conversions (tree exp, bool true_ return remove_conversions (TREE_OPERAND (exp, 0), true_address); break; -case VIEW_CONVERT_EXPR: case NON_LVALUE_EXPR: CASE_CONVERT: +case VIEW_CONVERT_EXPR: +case NON_LVALUE_EXPR: return remove_conversions (TREE_OPERAND (exp, 0), true_address); default: Index: gcc-interface/utils2.c === --- gcc-interface/utils2.c (revision 179868) +++ gcc-interface/utils2.c (working copy) @@ -1277,13 +1277,8 @@ build_unary_op (enum tree_code op_code, case INDIRECT_REF: { - bool can_never_be_null; - tree t = operand; - - while (CONVERT_EXPR_P (t) || TREE_CODE (t) == VIEW_CONVERT_EXPR) - t = TREE_OPERAND (t, 0); - - can_never_be_null = DECL_P (t) && DECL_CAN_NEVER_BE_NULL_P (t); + tree t = remove_conversions (operand, false); + bool can_never_be_null = DECL_P (t) && DECL_CAN_NEVER_BE_NULL_P (t); /* If TYPE is a thin pointer, first convert to the fat pointer. */ if (TYPE_IS_THIN_POINTER_P (type) @@ -2608,16 +2603,13 @@ gnat_invariant_expr (tree expr) { tree type = TREE_TYPE (expr), t; - STRIP_NOPS (expr); + expr = remove_conversions (expr, false); while ((TREE_CODE (expr) == CONST_DECL || (TREE_CODE (expr) == VAR_DECL && TREE_READONLY (expr))) && decl_function_context (expr) == current_function_decl && DECL_INITIAL (expr)) -{ - expr = DECL_INITIAL (expr); - STRIP_NOPS (expr); -} +expr = remove_conversions (DECL_INITIAL (expr), false); if (TREE_CONSTANT (expr)) return fold_convert (type, expr); Index: gcc-interface/trans.c === --- gcc-interface/trans.c (revision 179868) +++ gcc-interface/trans.c (working copy) @@ -1364,10 +1364,7 @@ Attribute_to_gnu (Node_Id gnat_node, tre don't try to build a trampoline. 
*/ if (attribute == Attr_Code_Address) { - for (gnu_expr = gnu_result; - CONVERT_EXPR_P (gnu_expr); - gnu_expr = TREE_OPERAND (gnu_expr, 0)) - TREE_CONSTANT (gnu_expr) = 1; + gnu_expr = remove_conversions (gnu_result, false); if (TREE_CODE (gnu_expr) == ADDR_EXPR) TREE_NO_TRAMPOLINE (gnu_expr) = TREE_CONSTANT (gnu_expr) = 1; @@ -1378,10 +1375,7 @@ Attribute_to_gnu (Node_Id gnat_node, tre a useful warning with -Wtrampolines. */ else if (TREE_CODE (TREE_TYPE (gnu_prefix)) == FUNCTION_TYPE) { - for (gnu_expr = gnu_result; - CONVERT_EXPR_P (gnu_expr); - gnu_expr = TREE_OPERAND (gnu_expr, 0)) - ; + gnu_expr = remove_conversions (gnu_result, false); if (TREE_CODE (gnu_expr) == ADDR_EXPR && decl_function_context (TREE_OPERAND (gnu_expr, 0))) @@ -2156,8 +2150,7 @@ push_range_check_info (tree var) if (VEC_empty (loop_info, gnu_loop_stack)) return NULL; - while (CONVERT_EXPR_P (var) || TREE_CODE (var) == VIEW_CONVERT_EXPR) -var = TREE_OPERAND (var, 0); + var =
[rs6000] Enable scalar shifts of vectors
I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. Tested on ppc64-linux, --with-cpu=G5. Ok? r~ * config/rs6000/rs6000.c (rs6000_expand_vector_broadcast): New. * config/rs6000/rs6000-protos.h: Update. * config/rs6000/vector.md (ashl<VEC_I>3): New. (lshr<VEC_I>3, ashr<VEC_I>3): New. commit 63a6b475bcde403cc4e220827370e6ecea9aad33 Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:34:59 2011 -0700 rs6000: Implement scalar shifts of vectors. diff --git a/gcc/config/rs6000/rs6000-protos.h b/gcc/config/rs6000/rs6000-protos.h index 73da0f6..4dee23f 100644 --- a/gcc/config/rs6000/rs6000-protos.h +++ b/gcc/config/rs6000/rs6000-protos.h @@ -55,6 +55,7 @@ extern void rs6000_expand_vector_init (rtx, rtx); extern void paired_expand_vector_init (rtx, rtx); extern void rs6000_expand_vector_set (rtx, rtx, int); extern void rs6000_expand_vector_extract (rtx, rtx, int); +extern rtx rs6000_expand_vector_broadcast (enum machine_mode, rtx); extern void build_mask64_2_operands (rtx, rtx *); extern int expand_block_clear (rtx[]); extern int expand_block_move (rtx[]); diff --git a/gcc/config/rs6000/rs6000.c b/gcc/config/rs6000/rs6000.c index 63c0f0c..786736d 100644 --- a/gcc/config/rs6000/rs6000.c +++ b/gcc/config/rs6000/rs6000.c @@ -4890,6 +4890,35 @@ rs6000_expand_vector_extract (rtx target, rtx vec, int elt) emit_move_insn (target, adjust_address_nv (mem, inner_mode, 0)); } +/* Broadcast an element to all parts of a vector, loaded into a register. + Used to turn vector shifts by a scalar into vector shifts by a vector. 
*/ + +rtx +rs6000_expand_vector_broadcast (enum machine_mode mode, rtx elt) +{ + rtx repl, vec[16]; + int i, n; + + n = GET_MODE_NUNITS (mode); + for (i = 0; i < n; ++i) +vec[i] = elt; + + if (CONSTANT_P (elt)) +{ + repl = gen_rtx_CONST_VECTOR (mode, gen_rtvec_v (n, vec)); + repl = force_reg (mode, repl); +} + else +{ + rtx par = gen_rtx_PARALLEL (VOIDmode, gen_rtvec_v (n, vec)); + repl = gen_reg_rtx (mode); + rs6000_expand_vector_init (repl, par); +} + + return repl; +} + + /* Generates shifts and masks for a pair of rldicl or rldicr insns to implement ANDing by the mask IN. */ void diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md index 0179cd9..24b473e 100644 --- a/gcc/config/rs6000/vector.md +++ b/gcc/config/rs6000/vector.md @@ -987,6 +987,16 @@ "TARGET_ALTIVEC" "") +(define_expand "ashl<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (ashift:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]); +}) + ;; Expanders for logical shift right on each vector element (define_expand "vlshr<mode>3" [(set (match_operand:VEC_I 0 "vint_operand" "") @@ -995,6 +1005,16 @@ "TARGET_ALTIVEC" "") +(define_expand "lshr<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (lshiftrt:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, operands[2]); +}) + ;; Expanders for arithmetic shift right on each vector element (define_expand "vashr<mode>3" [(set (match_operand:VEC_I 0 "vint_operand" "") @@ -1002,6 +1022,16 @@ (match_operand:VEC_I 2 "vint_operand" "")))] "TARGET_ALTIVEC" "") + +(define_expand "ashr<mode>3" + [(set (match_operand:VEC_I 0 "vint_operand" "") + (ashiftrt:VEC_I + (match_operand:VEC_I 1 "vint_operand" "") + (match_operand:<VEC_base> 2 "nonmemory_operand" "")))] + "TARGET_ALTIVEC" +{ + operands[2] = rs6000_expand_vector_broadcast (<MODE>mode, 
operands[2]); +}) ;; Vector reduction expanders for VSX
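To make the semantics of the new expanders concrete: a scalar-shift pattern like ashl<mode>3 shifts every lane by the same count, and the expander realizes it by broadcasting the scalar and reusing the existing vector-by-vector shift. A minimal host-side sketch of that semantics, using GCC vector extensions (the function and type names here are illustrative, not part of the rs6000 backend):

```c
#include <stdint.h>

typedef int32_t v4si __attribute__ ((vector_size (16)));

/* Reference semantics of ashl<mode>3 with a scalar count: broadcast the
   scalar into a vector, then do a lane-wise shift -- exactly what
   rs6000_expand_vector_broadcast plus vashl<mode>3 produce together.  */
static v4si
ashl_v4si_by_scalar (v4si x, int count)
{
  v4si vcount = { count, count, count, count };  /* the broadcast step */
  return x << vcount;                            /* lane-wise shift */
}

/* Read one lane portably.  */
static int32_t
lane (v4si v, int i)
{
  union { v4si v; int32_t a[4]; } u = { v };
  return u.a[i];
}

/* Self-check: {1,2,3,4} << 2 should give {4,8,12,16} in every lane.  */
static int
ashl_demo_ok (void)
{
  v4si x = { 1, 2, 3, 4 };
  v4si r = ashl_v4si_by_scalar (x, 2);
  return lane (r, 0) == 4 && lane (r, 1) == 8
         && lane (r, 2) == 12 && lane (r, 3) == 16;
}
```

This only mirrors the semantics; the actual expander of course emits the broadcast and the vector shift as RTL, not as C.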
Re: [PATCH] Fix VIS3 assembler check and conditionalize testsuite on VIS3 support.
From: David Miller da...@davemloft.net Date: Wed, 12 Oct 2011 17:14:59 -0400 (EDT) From: Eric Botcazou ebotca...@adacore.com Date: Wed, 12 Oct 2011 23:08:39 +0200 I'm currently testing the following patch in various scenarios, I'm pretty sure this is what you had in mind. Yes, this seems to go in the right direction. Don't you need to pass -mvis3 instead of -mvis? Do you need to pass -mcpu=niagara3 at all? Yes, I need to correct the testcase flags now. I just noticed this while testing. I will post a finalized patch later tonight. Ok, I tested that this does the right thing both with and without a vis3/fmaf-capable assembler. Committed to trunk. Eric, let me know if there are any further tweaks you'd like me to implement. Fix sparc when assembler lacks support for vis3/fmaf instructions. gcc/ * config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF to zero when assembler lacks support for such instructions. * config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3 and MASK_FMAF in defaults when assembler lacks necessary support. gcc/testsuite/ * gcc.target/sparc/cmask.c: Remove 'vis3' target check and specify '-mvis3' instead of '-mcpu=niagara3' in options. * gcc.target/sparc/fhalve.c: Likewise. * gcc.target/sparc/fnegop.c: Likewise. * gcc.target/sparc/fpadds.c: Likewise. * gcc.target/sparc/fshift.c: Likewise. * gcc.target/sparc/fucmp.c: Likewise. * gcc.target/sparc/lzd.c: Likewise. * gcc.target/sparc/vis3misc.c: Likewise. * gcc.target/sparc/xmul.c: Likewise. 
git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@179875 138bc75d-0d04-0410-961f-82ee72b054a4 --- gcc/ChangeLog |7 +++ gcc/config/sparc/sparc.c |6 +- gcc/config/sparc/sparc.h |4 gcc/testsuite/ChangeLog | 13 + gcc/testsuite/gcc.target/sparc/cmask.c|4 ++-- gcc/testsuite/gcc.target/sparc/fhalve.c |4 ++-- gcc/testsuite/gcc.target/sparc/fnegop.c |4 ++-- gcc/testsuite/gcc.target/sparc/fpadds.c |4 ++-- gcc/testsuite/gcc.target/sparc/fshift.c |4 ++-- gcc/testsuite/gcc.target/sparc/fucmp.c|4 ++-- gcc/testsuite/gcc.target/sparc/lzd.c |4 ++-- gcc/testsuite/gcc.target/sparc/vis3misc.c |4 ++-- gcc/testsuite/gcc.target/sparc/xmul.c |4 ++-- 13 files changed, 43 insertions(+), 23 deletions(-) diff --git a/gcc/ChangeLog b/gcc/ChangeLog index cdc9391..017594f 100644 --- a/gcc/ChangeLog +++ b/gcc/ChangeLog @@ -1,3 +1,10 @@ +2011-10-12 David S. Miller da...@davemloft.net + + * config/sparc/sparc.h: Do not force TARGET_VIS3 and TARGET_FMAF + to zero when assembler lacks support for such instructions. + * config/sparc/sparc.c (sparc_option_override): Clear MASK_VIS3 + and MASK_FMAF in defaults when assembler lacks necessary support. + 2011-10-12 Jakub Jelinek ja...@redhat.com * config/i386/sse.md (vec_unpacks_lo_mode, diff --git a/gcc/config/sparc/sparc.c b/gcc/config/sparc/sparc.c index 9c7cc56..fa790b3 100644 --- a/gcc/config/sparc/sparc.c +++ b/gcc/config/sparc/sparc.c @@ -850,7 +850,11 @@ sparc_option_override (void) cpu = &cpu_table[(int) sparc_cpu_and_features]; target_flags &= ~cpu->disable; - target_flags |= cpu->enable; + target_flags |= (cpu->enable +#ifndef HAVE_AS_FMAF_HPC_VIS3 + & ~(MASK_FMAF | MASK_VIS3) +#endif + ); /* If -mfpu or -mno-fpu was explicitly used, don't override with the processor default. 
*/ diff --git a/gcc/config/sparc/sparc.h b/gcc/config/sparc/sparc.h index 0642ff2..669f106 100644 --- a/gcc/config/sparc/sparc.h +++ b/gcc/config/sparc/sparc.h @@ -1871,10 +1871,6 @@ extern int sparc_indent_opcode; #ifndef HAVE_AS_FMAF_HPC_VIS3 #define AS_NIAGARA3_FLAG "b" -#undef TARGET_FMAF -#define TARGET_FMAF 0 -#undef TARGET_VIS3 -#define TARGET_VIS3 0 #else #define AS_NIAGARA3_FLAG "d" #endif diff --git a/gcc/testsuite/ChangeLog b/gcc/testsuite/ChangeLog index 9e8f1f9..943f36f 100644 --- a/gcc/testsuite/ChangeLog +++ b/gcc/testsuite/ChangeLog @@ -1,3 +1,16 @@ +2011-10-12 David S. Miller da...@davemloft.net + + * gcc.target/sparc/cmask.c: Remove 'vis3' target check and specify + '-mvis3' instead of '-mcpu=niagara3' in options. + * gcc.target/sparc/fhalve.c: Likewise. + * gcc.target/sparc/fnegop.c: Likewise. + * gcc.target/sparc/fpadds.c: Likewise. + * gcc.target/sparc/fshift.c: Likewise. + * gcc.target/sparc/fucmp.c: Likewise. + * gcc.target/sparc/lzd.c: Likewise. + * gcc.target/sparc/vis3misc.c: Likewise. + * gcc.target/sparc/xmul.c: Likewise. + 2011-10-12 Eric Botcazou ebotca...@adacore.com * gnat.dg/vect1.ad[sb]: New test. diff --git a/gcc/testsuite/gcc.target/sparc/cmask.c b/gcc/testsuite/gcc.target/sparc/cmask.c index
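The shape of the sparc_option_override change — masking unsupported features out of the per-CPU default enable set at option-processing time instead of hard-wiring the TARGET_* macros to zero in sparc.h — can be sketched in isolation (mask values and the predicate argument below are illustrative stand-ins for the real HAVE_AS_FMAF_HPC_VIS3 check):

```c
/* Illustrative feature masks, not the real sparc.h values.  */
#define MASK_FMAF (1u << 0)
#define MASK_VIS3 (1u << 1)
#define MASK_FPU  (1u << 2)

/* Mirror of the patch's logic: start from the cpu table's default
   enable mask, but drop FMAF/VIS3 from the defaults when the
   assembler cannot emit the corresponding instructions.  */
static unsigned
filter_cpu_enable (unsigned cpu_enable, int have_as_fmaf_vis3)
{
  if (!have_as_fmaf_vis3)
    cpu_enable &= ~(MASK_FMAF | MASK_VIS3);
  return cpu_enable;
}
```

The presumable advantage over the old `#undef TARGET_VIS3` approach is that only the *defaults* are filtered, so an explicit `-mvis3` on the command line (as the adjusted testcases now pass) still takes effect.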
PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
Hi, When combine tries to combine: (insn 37 35 39 3 (set (reg:SI 90) (plus:SI (mult:SI (reg/v:SI 84 [ i ]) (const_int 4 [0x4])) (reg:SI 106))) x.i:11 247 {*leasi_2} (nil)) (insn 39 37 41 3 (set (mem:SI (zero_extend:DI (reg:SI 90)) [3 MEM[symbol: x, index: D.2741_12, step: 4, offset: 4294967292B]+0 S4 A32]) (reg/v:SI 84 [ i ])) x.i:11 64 {*movsi_internal} (expr_list:REG_DEAD (reg:SI 90) (nil))) it optimizes (zero_extend:DI (plus:SI (mult:SI (reg/v:SI 84 [ i ]) (const_int 4 [0x4])) (reg:SI 106))) into (and:DI (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) (const_int 4294967292 [0xfffffffc])) in make_compound_operation. The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? Thanks. H.J. --- 2011-10-12 H.J. Lu hongjiu...@intel.com PR rtl-optimization/50696 * combine.c (subst): If an address is zero-extended to Pmode, replace FROM with TO while keeping ZERO_EXTEND. diff --git a/gcc/combine.c b/gcc/combine.c index 6c3b17c..45180e5 100644 --- a/gcc/combine.c +++ b/gcc/combine.c @@ -5078,6 +5078,23 @@ subst (rtx x, rtx from, rtx to, int in_dest, int in_cond, int unique_copy) } } } +#ifdef POINTERS_EXTEND_UNSIGNED + else if (POINTERS_EXTEND_UNSIGNED > 0 + && code == MEM + && GET_CODE (XEXP (x, 0)) == ZERO_EXTEND + && GET_MODE (XEXP (x, 0)) == Pmode) +{ + /* If an address is zero-extended to Pmode, replace FROM with +TO while keeping ZERO_EXTEND. */ + new_rtx = subst (XEXP (XEXP (x, 0), 0), from, to, 0, 0, + unique_copy); + /* Drop ZERO_EXTEND on constant. */ + if (CONST_INT_P (new_rtx)) + SUBST (XEXP (x, 0), new_rtx); + else + SUBST (XEXP (XEXP (x, 0), 0), new_rtx); +} +#endif else { len = GET_RTX_LENGTH (code);
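The equivalence combine is exploiting here is that zero-extending an SImode value to DImode computes the same thing as ANDing with the low 32-bit mask; combine then narrows the mask further (to 0xfffffffc above) using its nonzero-bits knowledge of the scaled index, and it is that narrowed AND form that the x32 address patterns fail to recognize. A small host-side check of the underlying identity (function names are mine):

```c
#include <stdint.h>

/* (zero_extend:DI x:SI) ...  */
static uint64_t
via_zero_extend (uint32_t lo)
{
  return (uint64_t) lo;
}

/* ... computes the same value as (and:DI x (const_int 0xffffffff)).
   Combine additionally shrinks the mask, e.g. to 0xfffffffc, when it
   can prove the low bits are zero (an index scaled by 4).  */
static uint64_t
via_and_mask (uint64_t x)
{
  return x & UINT64_C (0xffffffff);
}
```

Both forms describe the same x32 address computation; the disagreement in the thread is only about which form the backend should be taught to accept.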
Re: [rs6000] Enable scalar shifts of vectors
From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:32:46 -0700 I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. I'm pretty sure Sparc's VIS3 can do this too, see the '<vis3_shift_insn><vbits>_vis' patterns in sparc.md
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this.
[rs6000, spu] Add vec_perm named pattern
The generic support for vector permutation will allow for automatic lowering to V*QImode, so all we need to add to support for these targets is the single V16QI pattern that represents the base permutation insn. I'm not touching any of the other ways that the permutation insn could be generated. After the generic support is added, I'll leave it to the port maintainers to determine what they want to keep. I suspect in many cases using the generic __builtin_shuffle plus some casting in the target-specific header files would be sufficient, eliminating several dozen builtins. Ok? r~ * config/rs6000/altivec.md (vec_permv16qi): New. * config/spu/spu.md (vec_permv16qi): New. commit f2d8929afb989a09d7e287dc171607440c1a Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:35:25 2011 -0700 rs6000: Implement vec_permv16qi. diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md index 9e7437e..84c5444 100644 --- a/gcc/config/rs6000/altivec.md +++ b/gcc/config/rs6000/altivec.md @@ -1357,6 +1357,15 @@ "vperm %0,%1,%2,%3" [(set_attr "type" "vecperm")]) +(define_expand "vec_permv16qi" + [(set (match_operand:V16QI 0 "register_operand" "") + (unspec:V16QI [(match_operand:V16QI 1 "register_operand" "") + (match_operand:V16QI 2 "register_operand" "") + (match_operand:V16QI 3 "register_operand" "")] + UNSPEC_VPERM))] + "TARGET_ALTIVEC" + "") + (define_insn "altivec_vrfip" ; ceil [(set (match_operand:V4SF 0 "register_operand" "=v") (unspec:V4SF [(match_operand:V4SF 1 "register_operand" "v")] commit a67ea08189a4399d6ade00c15e69447304f85f96 Author: Richard Henderson r...@twiddle.net Date: Mon Oct 10 12:35:50 2011 -0700 spu: Implement vec_permv16qi. 
diff --git a/gcc/config/spu/spu.md b/gcc/config/spu/spu.md index 676d54e..00cfaa4 100644 --- a/gcc/config/spu/spu.md +++ b/gcc/config/spu/spu.md @@ -4395,6 +4395,18 @@ selb\t%0,%4,%0,%3 shufb\t%0,%1,%2,%3 [(set_attr "type" "shuf")]) +(define_expand "vec_permv16qi" + [(set (match_operand:V16QI 0 "spu_reg_operand" "") + (unspec:V16QI + [(match_operand:V16QI 1 "spu_reg_operand" "") + (match_operand:V16QI 2 "spu_reg_operand" "") + (match_operand:V16QI 3 "spu_reg_operand" "")] + UNSPEC_SHUFB))] + "" + { +operands[3] = gen_lowpart (TImode, operands[3]); + }) + (define_insn "nop" [(unspec_volatile [(const_int 0)] UNSPECV_NOP)]
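For reference, the generic facility these vec_permv16qi patterns will back is the byte-level permute that __builtin_shuffle exposes at the source level (added to GCC in the 4.7 timeframe this thread leads up to): the selector vector picks, per result byte, a byte of the input, much as Altivec vperm and SPU shufb do in one instruction. A minimal illustration using GCC vector extensions:

```c
#include <stdint.h>

typedef uint8_t v16qi __attribute__ ((vector_size (16)));

/* Reverse the 16 bytes of a vector with a constant selector -- the
   kind of single-instruction permute vperm/shufb provide.  */
static v16qi
reverse_bytes (v16qi x)
{
  const v16qi sel = { 15, 14, 13, 12, 11, 10, 9, 8,
                      7, 6, 5, 4, 3, 2, 1, 0 };
  return __builtin_shuffle (x, sel);
}

/* Read one byte portably.  */
static uint8_t
byte_of (v16qi v, int i)
{
  union { v16qi v; uint8_t a[16]; } u = { v };
  return u.a[i];
}

/* Self-check: bytes 0..15 come back as 15..0.  */
static int
shuffle_demo_ok (void)
{
  union { v16qi v; uint8_t a[16]; } u;
  int i, ok = 1;
  for (i = 0; i < 16; i++)
    u.a[i] = (uint8_t) i;
  for (i = 0; i < 16; i++)
    ok &= (byte_of (reverse_bytes (u.v), i) == 15 - i);
  return ok;
}
```

On the targets in this patch, a shuffle like this with a non-constant selector is exactly what would lower to the new V16QI permute pattern.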
[Ada] Enable -W -Wall across the entire build
They weren't enabled for the Ada part of the front-end and the C part of the library. Of course there are a few warnings... Tested on i586-suse-linux, applied on the mainline. 2011-10-12 Eric Botcazou ebotca...@adacore.com gnattools/ * Makefile.in (LOOSE_WARN): Delete. (GCC_WARN_CFLAGS): Set to -W -Wall. (TOOLS_FLAGS_TO_PASS_1): Delete. (TOOLS_FLAGS_TO_PASS_1re): Rename into... (TOOLS_FLAGS_TO_PASS_RE): ...this. (gnattools-native): Use TOOLS_FLAGS_TO_PASS_NATIVE. (regnattools): Use TOOLS_FLAGS_TO_PASS_RE. libada/ * Makefile.in (LOOSE_WARN): Delete. (GCC_WARN_CFLAGS): Likewise. (WARN_CFLAGS): Likewise. (GNATLIBFLAGS): Add -nostdinc. (GNATLIBCFLAGS_FOR_C): Add -W -Wall. (LIBADA_FLAGS_TO_PASS): Remove WARN_CFLAGS. * configure.ac (warn_cflags): Delete. * configure: Regenerate. gcc/ada/ * sem_util.adb (Denotes_Same_Prefix): Fix fatal warning. * gcc-interface/Make-lang.in (WARN_ADAFLAGS): New. (ALL_ADAFLAGS): Include WARN_ADAFLAGS. (ADA_FLAGS_TO_PASS): Likewise. (COMMON_FLAGS_TO_PASS): New. (ADA_TOOLS_FLAGS_TO_PASS): Use COMMON_FLAGS_TO_PASS. In the regular native case, also use FLAGS_TO_PASS and ADA_FLAGS_TO_PASS. (gnatlib): Use COMMON_FLAGS_TO_PASS. (ada.install-common): Likewise. (install-gnatlib): Likewise. (install-gnatlib-obj): Likewise. (gnattools): Use ADA_TOOLS_FLAGS_TO_PASS for gnattools1 as well. (gnat-cross): Delete. (gnatboot): Likewise. (gnatboot2): Likewise. (gnatboot3): Likewise. (gnatstage1): Likewise. (gnatstage2): Likewise. * gcc-interface/Makefile.in (SOME_ADAFLAGS): Likewise. (MOST_ADAFLAGS): Likewise. (LOOSE_CFLAGS): Likewise. (gnat-cross): Likewise. (GNATLIBFLAGS): Add -W -Wall. (GNATLIBCFLAGS_FOR_C): Likewise. * gcc-interface/lang.opt: Remove C-specific warnings. Add doc lines. * gcc-interface/misc.c (gnat_handle_option): Remove obsolete cases. 
-- Eric Botcazou Index: gnattools/Makefile.in === --- gnattools/Makefile.in (revision 179844) +++ gnattools/Makefile.in (working copy) @@ -44,8 +44,7 @@ PWD_COMMAND = $${PWDCMD-pwd} # The tedious process of getting CFLAGS right. CFLAGS=-g -LOOSE_WARN = -W -Wall -Wwrite-strings -Wstrict-prototypes -Wmissing-prototypes -GCC_WARN_CFLAGS = $(LOOSE_WARN) +GCC_WARN_CFLAGS = -W -Wall WARN_CFLAGS = @warn_cflags@ ADA_CFLAGS=@ADA_CFLAGS@ @@ -64,8 +63,8 @@ INCLUDES_FOR_SUBDIR = -I. -I.. -I../.. - -I$(fsrcdir)/../include -I$(fsrcdir) ADA_INCLUDES_FOR_SUBDIR = -I. -I$(fsrcdir)/ada -# Variables for gnattools1, native -TOOLS_FLAGS_TO_PASS_1= \ +# Variables for gnattools, native +TOOLS_FLAGS_TO_PASS_NATIVE= \ CC=../../xgcc -B../../ \ CFLAGS=$(CFLAGS) $(WARN_CFLAGS) \ LDFLAGS=$(LDFLAGS) \ @@ -76,11 +75,13 @@ TOOLS_FLAGS_TO_PASS_1= \ exeext=$(exeext) \ fsrcdir=$(fsrcdir) \ srcdir=$(fsrcdir) \ + GNATMAKE=../../gnatmake \ + GNATLINK=../../gnatlink \ GNATBIND=../../gnatbind \ TOOLSCASE=native # Variables for regnattools -TOOLS_FLAGS_TO_PASS_1re= \ +TOOLS_FLAGS_TO_PASS_RE= \ CC=../../xgcc -B../../ \ CFLAGS=$(CFLAGS) \ ADAFLAGS=$(ADAFLAGS) \ @@ -93,24 +94,7 @@ TOOLS_FLAGS_TO_PASS_1re= \ GNATMAKE=../../gnatmake \ GNATLINK=../../gnatlink \ GNATBIND=../../gnatbind \ - TOOLSCASE=cross \ - INCLUDES= - -# Variables for gnattools2, native -TOOLS_FLAGS_TO_PASS_NATIVE= \ - CC=../../xgcc -B../../ \ - CFLAGS=$(CFLAGS) \ - ADAFLAGS=$(ADAFLAGS) \ - ADA_CFLAGS=$(ADA_CFLAGS) \ - INCLUDES=$(INCLUDES_FOR_SUBDIR) \ - ADA_INCLUDES=-I../rts $(ADA_INCLUDES_FOR_SUBDIR) \ - exeext=$(exeext) \ - fsrcdir=$(fsrcdir) \ - srcdir=$(fsrcdir) \ - GNATMAKE=../../gnatmake \ - GNATLINK=../../gnatlink \ - GNATBIND=../../gnatbind \ - TOOLSCASE=native + TOOLSCASE=cross # Variables for gnattools, cross TOOLS_FLAGS_TO_PASS_CROSS= \ @@ -177,7 +161,7 @@ $(GCC_DIR)/stamp-tools: gnattools-native: $(GCC_DIR)/stamp-tools $(GCC_DIR)/stamp-gnatlib-rts # gnattools1 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ - 
$(TOOLS_FLAGS_TO_PASS_1) \ + $(TOOLS_FLAGS_TO_PASS_NATIVE) \ ../../gnatmake$(exeext) ../../gnatlink$(exeext) # gnattools2 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ @@ -189,7 +173,7 @@ gnattools-native: $(GCC_DIR)/stamp-tools regnattools: $(GCC_DIR)/stamp-gnatlib-rts # gnattools1-re $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ - $(TOOLS_FLAGS_TO_PASS_1re) \ + $(TOOLS_FLAGS_TO_PASS_RE) INCLUDES= \ gnatmake-re gnatlink-re # gnattools2 $(MAKE) -C $(GCC_DIR)/ada/tools -f ../Makefile \ Index: libada/Makefile.in === --- libada/Makefile.in (revision 179844) +++ libada/Makefile.in (working copy) @@ -45,21 +45,17 @@ AWK=@AWK@ # Variables for the user (or the top
Re: [rs6000] Enable scalar shifts of vectors
On 10/12/2011 03:37 PM, David Miller wrote: From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:32:46 -0700 I suppose technically the middle-end could be improved to implement ashl<mode> as vashl<mode> by broadcasting the scalar, but Altivec is the only extant SIMD ISA that would make use of this. All of the others can arrange for constant shifts to be encoded into the insn, and so implement the ashl<mode> named pattern. I'm pretty sure Sparc's VIS3 can do this too, see the '<vis3_shift_insn><vbits>_vis' patterns in sparc.md Ok, if I read the rtl correctly, you can perform a vector shift, where each shift count comes from the corresponding element of op2. But VIS has no vector shift where the shift count comes from a single scalar (immediate or register)? If so, please rename this pattern to the v<shift_pat_name><mode>3 form and I'll work on more middle-end support for re-use of the v<shift_pat_name> optab. r~
[lto] Add streamer hooks for emitting location_t (issue5249058)
The pph streamer does not write out expanded locations. It emits the line map tables exactly as it found them on the initial compile so that it can recreate them when the pph image is restored. This allows it to emit location_t values as integers and produce the same line locations as the original compile. This patch adds the two hooks needed to make sure that the tree streamer writes locations using the pph routines. Barring any objections, I'll commit this patch to mainline in the next couple of days. Diego. * streamer-hooks.h (struct streamer_hooks): Add hooks input_location and output_location. * lto-streamer-in.c (lto_input_location): Use streamer_hooks.input_location, if set. * lto-streamer-out.c (lto_output_location): Use streamer_hooks.output_location, if set. diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c index d4e80c7..f18b944 100644 --- a/gcc/lto-streamer-in.c +++ b/gcc/lto-streamer-in.c @@ -50,6 +50,7 @@ along with GCC; see the file COPYING3. If not see #include lto-streamer.h #include tree-streamer.h #include tree-pass.h +#include streamer-hooks.h /* The table to hold the file names. */ static htab_t file_name_hash_table; @@ -180,15 +181,23 @@ lto_input_location_bitpack (struct data_in *data_in, struct bitpack_d *bp) } -/* Read a location from input block IB. */ +/* Read a location from input block IB. + If the input_location streamer hook exists, call it. + Otherwise, proceed with reading the location from the + expanded location bitpack. */ location_t lto_input_location (struct lto_input_block *ib, struct data_in *data_in) { - struct bitpack_d bp; + if (streamer_hooks.input_location) +return streamer_hooks.input_location (ib, data_in); + else +{ + struct bitpack_d bp; - bp = streamer_read_bitpack (ib); - return lto_input_location_bitpack (data_in, &bp); + bp = streamer_read_bitpack (ib); + return lto_input_location_bitpack (data_in, &bp); +} } diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c index c14b3a9..4d88f62 100644 --- a/gcc/lto-streamer-out.c +++ b/gcc/lto-streamer-out.c @@ -172,15 +172,21 @@ lto_output_location_bitpack (struct bitpack_d *bp, /* Emit location LOC to output block OB. - When bitpack is handy, it is more space effecient to call + If the output_location streamer hook exists, call it. + Otherwise, when bitpack is handy, it is more space efficient to call lto_output_location_bitpack with existing bitpack. */ void lto_output_location (struct output_block *ob, location_t loc) { - struct bitpack_d bp = bitpack_create (ob->main_stream); - lto_output_location_bitpack (&bp, ob, loc); - streamer_write_bitpack (&bp); + if (streamer_hooks.output_location) +streamer_hooks.output_location (ob, loc); + else +{ + struct bitpack_d bp = bitpack_create (ob->main_stream); + lto_output_location_bitpack (&bp, ob, loc); + streamer_write_bitpack (&bp); +} } diff --git a/gcc/streamer-hooks.h b/gcc/streamer-hooks.h index b4c6562..0c1d483 100644 --- a/gcc/streamer-hooks.h +++ b/gcc/streamer-hooks.h @@ -51,6 +51,16 @@ struct streamer_hooks { and descriptors needed by the unpickling routines. It returns the tree instantiated from the stream. */ tree (*read_tree) (struct lto_input_block *, struct data_in *); + + /* [OPT] Called by lto_input_location to retrieve the source location of the + tree currently being read. If this hook returns NULL, lto_input_location + defaults to calling lto_input_location_bitpack. */ + location_t (*input_location) (struct lto_input_block *, struct data_in *); + + /* [OPT] Called by lto_output_location to write the source_location of the + tree currently being written. If this hook returns NULL, + lto_output_location defaults to calling lto_output_location_bitpack. */ + void (*output_location) (struct output_block *, location_t); }; #define stream_write_tree(OB, EXPR, REF_P) \ -- This patch is available for review at http://codereview.appspot.com/5249058
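The control flow this patch adds is the usual optional-hook-with-default pattern: if the client (here, pph) installed a hook, dispatch to it; otherwise fall back to the default bitpack encoding. A reduced sketch of that pattern (struct and function names are stand-ins, not the real lto-streamer API):

```c
#include <stddef.h>

typedef unsigned int location_t;

/* Optional override; NULL means "use the default".  */
struct hooks_sketch
{
  location_t (*input_location) (const unsigned int *stream);
};

/* Stand-in for lto_input_location_bitpack.  */
static location_t
default_input_location (const unsigned int *stream)
{
  return stream[0];
}

/* Stand-in for lto_input_location: dispatch to the hook if set.  */
static location_t
read_location (const struct hooks_sketch *h, const unsigned int *stream)
{
  if (h->input_location)
    return h->input_location (stream);
  return default_input_location (stream);
}

/* An example override, standing in for a pph-style raw reader.  */
static location_t
doubling_hook (const unsigned int *stream)
{
  return stream[0] * 2;
}

/* Exercise both paths.  */
static location_t
read_via (int use_hook)
{
  struct hooks_sketch h = { use_hook ? doubling_hook : NULL };
  unsigned int stream[1];
  stream[0] = use_hook ? 21 : 42;
  return read_location (&h, stream);
}
```

The important property, which the patch preserves, is that the default path is untouched for every existing LTO caller; only a client that explicitly sets the hook changes behavior.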
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
On Wed, Oct 12, 2011 at 3:40 PM, Richard Kenner ken...@vlsi1.ultra.nyu.edu wrote: The x86 backend doesn't accept the new expression as a valid address while (zero_extend:DI) works just fine. This patch keeps ZERO_EXTEND when zero-extending an address to Pmode. It reduces the number of lea from 24173 to 21428 in x32 libgfortran.so. Does it make any sense? I'd be inclined to have the x86 backend accept combine's canonicalized form rather than doing a patch such as this. The address format generated by combine is very unusual in 2 aspects: 1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by the x86 backend. 2. The biggest problem is optimizing mask 0xffffffff to 0xfffffffc by keeping track of non-zero bits in registers. The x86 backend doesn't have such information to know ADDR & 0xfffffffc == ADDR & 0xffffffff. -- H.J.
Re: [rs6000] Enable scalar shifts of vectors
From: Richard Henderson r...@redhat.com Date: Wed, 12 Oct 2011 15:49:28 -0700 Ok, if I read the rtl correctly, you can perform a vector shift, where each shift count comes from the corresponding element of op2. But VIS has no vector shift where the shift count comes from a single scalar (immediate or register)? That's correct. If so, please rename this pattern to the v<shift_pat_name><mode>3 form and I'll work on more middle-end support for re-use of the v<shift_pat_name> optab. Will do, thanks Richard.
Re: PATCH: PR rtl-optimization/50696: [x32] Unnecessary lea
1. The placement of subreg in (plus:DI (subreg:DI (mult:SI (reg/v:SI 85 [ i ]) (const_int 4 [0x4])) 0) (subreg:DI (reg:SI 106) 0)) isn't supported by the x86 backend. That's easy to fix. 2. The biggest problem is optimizing mask 0xffffffff to 0xfffffffc by keeping track of non-zero bits in registers. The x86 backend doesn't have such information to know ADDR & 0xfffffffc == ADDR & 0xffffffff. But this indeed isn't. I withdraw my comment. I still don't like the patch, but I'm no longer as familiar with the code as I used to be so can't suggest a replacement. Let's see what others think about it.
[cxx-mem-model] merge from trunk @ 179855
Nothing major to report. I found a small buglet in gcc/testsuite/g++.dg/dg.exp that is left over from the memmodel -> simulate-thread rename. I also found that trunk does not have the following change to g++.dg/dg.exp and I will submit this as a follow up (to trunk). Committed to branch. houston:/source/cxx-mem-model-merge/gcc/testsuite/g++.dg$ svn diff *.exp Index: dg.exp === --- dg.exp (revision 179857) +++ dg.exp (working copy) @@ -48,7 +48,7 @@ set tests [prune $tests $srcdir/$subdir/ set tests [prune $tests $srcdir/$subdir/torture/*] set tests [prune $tests $srcdir/$subdir/graphite/*] set tests [prune $tests $srcdir/$subdir/guality/*] -set tests [prune $tests $srcdir/$subdir/memmodel/*] +set tests [prune $tests $srcdir/$subdir/simulate-thread/*] # Main loop. dg-runtest $tests $DEFAULT_CXXFLAGS
[testsuite] require arm_little_endian in two tests
Tests gcc.target/arm/pr48252.c and gcc.target/arm/neon-vset_lanes8.c expect little-endian code and fail when compiled with -mbig-endian. This patch skips the test if the current multilib does not generate little-endian code. I'm not able to run execution tests for -mbig-endian for GCC mainline but have tested this patch with CodeSourcery's GCC 4.6. OK for trunk? 2011-10-12 Janis Johnson jani...@codesourcery.com * gcc.target/arm/pr48252.c: Require arm_little_endian. * gcc.target/arm/neon-vset_lanes8.c: Likewise. Index: gcc/testsuite/gcc.target/arm/pr48252.c === --- gcc/testsuite/gcc.target/arm/pr48252.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/pr48252.c (working copy) @@ -1,5 +1,6 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options -O2 } */ /* { dg-add-options arm_neon } */ Index: gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c === --- gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (revision 344214) +++ gcc/testsuite/gcc.target/arm/neon-vset_lanes8.c (working copy) @@ -2,6 +2,7 @@ /* { dg-do run } */ /* { dg-require-effective-target arm_neon_hw } */ +/* { dg-require-effective-target arm_little_endian } */ /* { dg-options -O0 } */ /* { dg-add-options arm_neon } */
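The endianness sensitivity these tests trip over comes from the mapping between a vector lane number and its byte offset in memory, which flips between little- and big-endian multilibs; expected lane values hard-coded for little-endian therefore fail under -mbig-endian. A simplified model of that mapping (illustrative only, not the precise NEON/AAPCS rule for every element ordering):

```c
/* Byte offset in memory of lane LANE in a VEC_BYTES-byte vector with
   ELEM_SIZE-byte elements, under each endianness: on little-endian,
   lane 0 starts at offset 0; on big-endian, lane 0 sits at the
   highest-addressed element.  */
static unsigned
lane_offset (unsigned lane, unsigned elem_size, unsigned vec_bytes,
             int big_endian)
{
  if (big_endian)
    return vec_bytes - elem_size - lane * elem_size;
  return lane * elem_size;
}
```

Skipping the tests via an arm_little_endian effective-target check, as the patch does, sidesteps encoding both mappings in the expected results.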
Re: RFC: Add ADD_RESTRICT tree code
Hi, On Wed, 12 Oct 2011, Jakub Jelinek wrote: Assignment 2 means that t->p points to s.p. Assignment 3 changes t->p and s.p, but the change to s.p doesn't occur through a pointer based on t->p or any other restrict pointer, in fact it doesn't occur through any explicit initialization or assignment, but rather through an indirect access via a different pointer. Hence the accesses to the same memory object at s.p[0] and t->p[0] were undefined because both accesses weren't through pointers based on each other. Only the field p in the structure is restrict qualified, there is no restrict qualification on the other pointers (e.g. t is not restrict). Thus, it is valid that t points to s. And, the s.p[0] access is based on s.p as well as t->p and similarly the t->p[0] access is based on s.p as well as t->p, in the sense of the ISO C99 restrict wording. IMO reading the standard to allow an access to be based on s.p _as well as_ t->p and that this should result in any sensible behaviour regarding restrict is interpreting too much into it. Let's do away with the fields, trying to capture the core of the disagreement. What you seem to be saying is that this code is well-defined and shouldn't return 1: int foo (int * _a, int * _b) { int * restrict a = _a; int * restrict b = _b; int * restrict *pa = wrap (&a); *pa = _b; // 1 *a = 0; **pa = 1; return *a; } I think that would go straight against the intent of restrict. I'd read the standard as making the above trick undefined. Because, if you change t->p (or s.p) at some point in between t->p = q; and s.p[0]; (i.e. prior to the access) to point to a copy of the array, both s.p and t->p change. Yes, but the question is, if the very modification of t->p was valid to start with. In my example above insn 1 is a funny way to write a = _b, i.e. reassigning the already set restrict pointer a to the one that also is already in b. 
Simplifying the above then leads to: int foo (int * _a, int * _b) { int * restrict a = _a; int * restrict b = _b; a = _b; *a = 0; *b = 1; return *a; } which I think is undefined because of the fourth clause (multiple modifying accesses to the same underlying object X need to go through one particular restrict chain). Seen from another perspective your reading would introduce an inconsistency with composition. Let's assume we have this function available: int tail (int * restrict a, int * restrict b) { *a = 0; *b = 1; return *a; } Clearly we can optimize this into { *a=0;*b=1;return 0; } without looking at the context. Now write the testcase or my example above in terms of that function: int goo (int *p, int *q) { struct S s, *t; s.a = 1; s.p = p; // 1 t = wrap(&s); // 2 t=&s in effect, but GCC doesn't see this t->p = q; // 3 return tail (s.p, t->p); } Now we get the same behaviour of returning a zero. Something must be undefined here, and it's not in tail itself. It's either the call of tail, the implicit modification of s.p with writes to t->p or the existence of two separate restrict pointers of the same value. I think the production of two separate equal-value restrict pointers via indirect modification is the undefinedness, and _if_ the standard can be read in a way that this is supposed to be valid then it needs to be clarified to not allow that anymore. I believe the standard should say something to the effect of disallowing modifying restrict pointers after they are initialized/assigned to once. Ciao, Michael.
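The `tail` function in the discussion is exactly the case restrict exists for: the compiler may fold `return *a;` to `return 0;` because `*b = 1;` cannot legally modify `*a`. The unqualified variant below shows what a compiler must assume without restrict — calling the restrict version with aliasing arguments (as `goo` effectively arranges) is the undefined behavior at issue. A compilable illustration:

```c
/* Without restrict, a and b may alias, so the final load must observe
   the store through b.  */
static int
tail_norestrict (int *a, int *b)
{
  *a = 0;
  *b = 1;
  return *a;   /* 1 when a == b, 0 when they don't alias */
}

/* Aliased call: well-defined for this unqualified version, but
   undefined for the restrict-qualified tail -- which is why GCC is
   entitled to compile the restrict version to return 0.  */
static int
tail_aliased (void)
{
  int x = 5;
  return tail_norestrict (&x, &x);
}
```

In other words, the composition argument in the mail holds precisely because `tail` may be optimized in isolation; the undefinedness has to live at the call site that smuggles in aliasing pointers.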
Re: [wwwdocs] gcc-4.6/porting_to.html
On Tue, 11 Oct 2011, Benjamin Kosnik wrote: Many users still won't have GCC 4.6 deployed yet, so I think it's still worth it. Ouch. I see this is not in, and I thought I checked in the draft months ago. Please check this in immediately!!! Done last evening, and made some further tweaks. For reference here is the full patch that's now live on the system. Gerald Index: porting_to.html === RCS file: porting_to.html diff -N porting_to.html --- /dev/null 1 Jan 1970 00:00:00 - +++ porting_to.html 12 Oct 2011 16:16:54 - 1.3 @@ -0,0 +1,150 @@ +<html> + +<head> +<title>Porting to GCC 4.6</title> +</head> + +<body> +<h1>Porting to GCC 4.6</h1> + +<p> +The GCC 4.6 release series differs from previous GCC releases in more +than the usual list of +<a href="http://gcc.gnu.org/gcc-4.6/changes.html">changes</a>. Some of +these are a result of bug fixing, and some old behaviors have been +intentionally changed in order to support new standards, or relaxed +in standards-conforming ways to facilitate compilation or runtime +performance. Some of these changes are not visible to the naked eye +and will not cause problems when updating from older versions. +</p> + +<p> +However, some of these changes are visible, and can cause grief to +users porting to GCC 4.6. This document is an effort to identify major +issues and provide clear solutions in a quick and easily searched +manner. Additions and suggestions for improvement are welcome. +</p> + +<h2>C language issues</h2> + +<h3>New warnings for unused variables and parameters</h3> + +<p> +The behavior of <code>-Wall</code> has changed and now includes the +new warning flags <code>-Wunused-but-set-variable</code> and +(with <code>-Wall +-Wextra</code>) <code>-Wunused-but-set-parameter</code>. This may +result in new warnings in code that compiled cleanly with previous +versions of GCC. +</p> + +<p>For example,</p> +<pre> + void fn (void) + { +int foo; +foo = bar (); /* foo is never used. */ + } +</pre> +<p>Gives the following diagnostic:</p> +<pre> +warning: variable 'foo' set but not used [-Wunused-but-set-variable] +</pre> + +<p>Although these warnings will not result in compilation failure, +often <code>-Wall</code> is used in conjunction with +<code>-Werror</code> and as a result, new warnings are turned into +new errors.</p> + +<p>To fix, first see if the unused variable or parameter can be removed +without changing the result or logic of the surrounding code. If not, +annotate it with <code>__attribute__((__unused__))</code>.</p> + +<p>As a workaround, remove <code>-Werror</code> until the new warnings +are fixed. For conversion warnings add +<code>-Wno-unused-but-set-variable</code> or +<code>-Wno-unused-but-set-parameter</code>.</p> + +<h3>Strict overflow warnings</h3> + +<p>Using the <code>-Wstrict-overflow</code> flag with +<code>-Werror</code> and optimization flags above <code>-O2</code> +may result in compile errors when using glibc optimizations +for <code>strcmp</code>.</p> + +<p>For example,</p> +<pre> +#include &lt;string.h&gt; +void do_rm_rf (const char *p) { if (strcmp (p, "/") == 0) return; } +</pre> +<p>Results in the following diagnostic:</p> +<pre> +error: assuming signed overflow does not occur when changing X +- C1 cmp C2 to X cmp C1 +- C2 [-Werror=strict-overflow] +</pre> + +<p>To work around this, use <code>-D__NO_STRING_INLINES</code>.</p> + +<h2>C++ language issues</h2> + +<h3>Header dependency changes</h3> + +<p> +Many of the standard C++ library include files have been edited to no +longer include &lt;cstddef&gt; to get <code>namespace std</code> +-scoped versions of <code>size_t</code> and <code>ptrdiff_t</code>. +</p> + +<p> +As such, C++ programs that used the macros <code>NULL</code> +or <code>offsetof</code> without including &lt;cstddef&gt; will no +longer compile. The diagnostic produced is similar to: +</p> + +<pre> +error: 'ptrdiff_t' does not name a type +</pre> + +<pre> +error: 'size_t' has not been declared +</pre> + +<pre> +error: 'NULL' was not declared in this scope +</pre> + +<pre> +error: there are no arguments to 'offsetof' that depend on a template +parameter, so a declaration of 'offsetof' must be available +</pre> + +<p> +Fixing this issue is easy: just include &lt;cstddef&gt;. +</p> + +<!-- +<h3>Java issues</h3> +--> + +<h3>Links</h3> + +<p> +Jakub Jelinek, + <a href="http://lists.fedoraproject.org/pipermail/devel/2011-February/148523.html">GCC +4.6 related common package rebuild failures (was Re: mass rebuild status)</a> +</p> + +<p> +Matthias Klose, +<a href="http://lists.debian.org/debian-devel-announce/2011/02/msg00012.html">prepare +to fix build failures with new GCC versions</a> +</p> + +<p> +Jim Meyering, + <a href="http://lists.fedoraproject.org/pipermail/devel/2011-March/149355.html">gcc-4.6.0-0.12.fc15.x86_64 breaks strcmp?</a> +</p> + +</body> +</html> + +
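For completeness, the two remedies the page recommends for the new -Wunused-but-set-variable warning can be shown side by side; this compiles cleanly with `gcc -Wall -Werror` (`__attribute__` is a GNU extension, and `bar` is a stand-in for the function in the page's example):

```c
static int bar (void) { return 42; }

/* Preferred fix: actually use (or remove) the variable.  */
int
use_result (void)
{
  int foo = bar ();
  return foo;
}

/* Fallback when the value is deliberately discarded: annotate it so
   -Wunused-but-set-variable stays quiet.  */
void
annotate_unused (void)
{
  int foo __attribute__ ((__unused__));
  foo = bar ();
}
```

Either form avoids turning the new warning into a -Werror build failure.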
[lto] Factor out code for streaming struct function. (issue5253051)
The pph branch also needs to stream struct function objects, but it doesn't
need to stream things like SSA names and the other data processed by
output_function/input_function.  This patch factors out the common code into
separate functions.  I've made them static for now, since no other external
code calls them yet.  This will minimize differences between pph and trunk.

Tested on x86_64 with LTO profiled bootstrap.

Barring any objections, I'll commit this to trunk in the next couple of days.


Diego.


2011-10-12  Lawrence Crowl  <cr...@google.com>
	    Diego Novillo  <dnovi...@google.com>

	* lto-streamer-in.c (input_struct_function_base): Factor out of ...
	(input_function): ... here.
	* lto-streamer-out.c (output_struct_function_base): Factor out of ...
	(output_function): ... here.

diff --git a/gcc/lto-streamer-in.c b/gcc/lto-streamer-in.c
index f18b944..1847738 100644
--- a/gcc/lto-streamer-in.c
+++ b/gcc/lto-streamer-in.c
@@ -764,27 +764,40 @@ fixup_call_stmt_edges (struct cgraph_node *orig, gimple *stmts)
     }
 }
 
-/* Read the body of function FN_DECL from DATA_IN using input block IB.  */
+
+/* Input the base body of struct function FN from DATA_IN
+   using input block IB.  */
 
 static void
-input_function (tree fn_decl, struct data_in *data_in,
-		struct lto_input_block *ib)
+input_struct_function_base (struct function *fn, struct data_in *data_in,
+			    struct lto_input_block *ib)
 {
-  struct function *fn;
-  enum LTO_tags tag;
-  gimple *stmts;
-  basic_block bb;
   struct bitpack_d bp;
-  struct cgraph_node *node;
-  tree args, narg, oarg;
   int len;
 
-  fn = DECL_STRUCT_FUNCTION (fn_decl);
-  tag = streamer_read_record_start (ib);
-  clear_line_info (data_in);
+  /* Read the static chain and non-local goto save area.  */
+  fn->static_chain_decl = stream_read_tree (ib, data_in);
+  fn->nonlocal_goto_save_area = stream_read_tree (ib, data_in);
 
-  gimple_register_cfg_hooks ();
-  lto_tag_check (tag, LTO_function);
+  /* Read all the local symbols.  */
+  len = streamer_read_hwi (ib);
+  if (len > 0)
+    {
+      int i;
+      VEC_safe_grow (tree, gc, fn->local_decls, len);
+      for (i = 0; i < len; i++)
+	{
+	  tree t = stream_read_tree (ib, data_in);
+	  VEC_replace (tree, fn->local_decls, i, t);
+	}
+    }
+
+  /* Input the function start and end loci.  */
+  fn->function_start_locus = lto_input_location (ib, data_in);
+  fn->function_end_locus = lto_input_location (ib, data_in);
+
+  /* Input the current IL state of the function.  */
+  fn->curr_properties = streamer_read_uhwi (ib);
 
   /* Read all the attributes for FN.  */
   bp = streamer_read_bitpack (ib);
@@ -802,30 +815,30 @@ input_function (tree fn_decl, struct data_in *data_in,
   fn->calls_setjmp = bp_unpack_value (&bp, 1);
   fn->va_list_fpr_size = bp_unpack_value (&bp, 8);
   fn->va_list_gpr_size = bp_unpack_value (&bp, 8);
+}
 
-  /* Input the function start and end loci.  */
-  fn->function_start_locus = lto_input_location (ib, data_in);
-  fn->function_end_locus = lto_input_location (ib, data_in);
 
-  /* Input the current IL state of the function.  */
-  fn->curr_properties = streamer_read_uhwi (ib);
+/* Read the body of function FN_DECL from DATA_IN using input block IB.  */
 
-  /* Read the static chain and non-local goto save area.  */
-  fn->static_chain_decl = stream_read_tree (ib, data_in);
-  fn->nonlocal_goto_save_area = stream_read_tree (ib, data_in);
+static void
+input_function (tree fn_decl, struct data_in *data_in,
+		struct lto_input_block *ib)
+{
+  struct function *fn;
+  enum LTO_tags tag;
+  gimple *stmts;
+  basic_block bb;
+  struct cgraph_node *node;
+  tree args, narg, oarg;
 
-  /* Read all the local symbols.  */
-  len = streamer_read_hwi (ib);
-  if (len > 0)
-    {
-      int i;
-      VEC_safe_grow (tree, gc, fn->local_decls, len);
-      for (i = 0; i < len; i++)
-	{
-	  tree t = stream_read_tree (ib, data_in);
-	  VEC_replace (tree, fn->local_decls, i, t);
-	}
-    }
+  fn = DECL_STRUCT_FUNCTION (fn_decl);
+  tag = streamer_read_record_start (ib);
+  clear_line_info (data_in);
+
+  gimple_register_cfg_hooks ();
+  lto_tag_check (tag, LTO_function);
+
+  input_struct_function_base (fn, data_in, ib);
 
   /* Read all function arguments.  We need to re-map them here to the
      arguments of the merged function declaration.  */

diff --git a/gcc/lto-streamer-out.c b/gcc/lto-streamer-out.c
index 4d88f62..1ae4a4b 100644
--- a/gcc/lto-streamer-out.c
+++ b/gcc/lto-streamer-out.c
@@ -719,36 +719,30 @@ produce_asm (struct output_block *ob, tree fn)
 }
 
 
-/* Output the body of function NODE->decl.  */
+/* Output the base body of struct function FN using output block OB.  */
 
-static void
-output_function (struct cgraph_node *node)
+void
+output_struct_function_base (struct output_block *ob, struct function *fn)
 {
   struct bitpack_d bp;
-  tree function;
-  struct function *fn;
-
Re: [PATCH] [Annotalysis] Bugfix for spurious thread safety warnings with shared mutexes
On Wed, Oct 12, 2011 at 9:58 AM, Delesley Hutchins <deles...@google.com> wrote:
> I don't think that will fix this bug.  The bug occurs if: (1) The
> exclusive lock set has error_mark_node.  (2) The shared lock set has
> the actual lock.

Oh, I see.  This change looks fine for google/gcc-4_6, then.

> If I understand your suggested fix correctly, lock_set_contains would
> still return non-null when the universal lock was present, which is
> not what we want.  IMHO, lock_set_contains is operating correctly; it
> was just passed the wrong arguments.

I still think there may be a bug in lock_set_contains, but my knowledge of
the code is insufficient to know if this can lead to problems in practice.
Suppose the lock set contains both the supplied lock and the universal
lock, and ignore_universal_lock is false.  Then lock_set_contains() will
return the lock directly.  However, it *should* return the canonicalized
version of the lock.

Ollie