Re: [PATCH 01/02] PR/62314: add ability to add fixit-hints

2015-11-10 Thread Bernd Schmidt

On 11/10/2015 05:35 PM, David Malcolm wrote:

+  /* Nasty workaround to convince the linker to add
+  rich_location::add_fixit_insert
+  rich_location::add_fixit_remove
+  rich_location::add_fixit_replace
+ to cc1 for use by diagnostic_plugin_test_show_locus,
+ before anything in cc1 is using them.
+
+ This conditional should never hold, but hopefully the compiler can't
+ figure that out.  */


Does attribute((used)) help with this problem?


Bernd


Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-10 Thread Ramana Radhakrishnan
On Tue, Nov 10, 2015 at 4:39 PM, Alan Lawrence  wrote:
> On 04/11/15 14:26, Ramana Radhakrishnan wrote:
>>
>>
>> True and I've just been reading more of the backend - We could now start
>> using blocks for constant pools as well. So let's do that.
>>
>> How does something like this look ?
>>
>> Tested on aarch64-none-elf - no regressions.
>>
>> 2015-11-04  Ramana Radhakrishnan  
>>
>>  * config/aarch64/aarch64.c
>>  (aarch64_can_use_per_function_literal_pools_p): New.
>>  (aarch64_use_blocks_for_constant_p): Adjust declaration
>>  and use aarch64_can_use_per_function_literal_pools_p.
>>  (aarch64_select_rtx_section): Update.
>>
>
> Since r229878, I've been seeing
>
> FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
> UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable
>
> (both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and
> aarch64-none-linux-gnu. Here's a log from aarch64_be-none-elf (the others
> look similar):
>
> /work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/xgcc
> -B/work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/
> /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1.c
> -fno-diagnostics-show-caret -fdiagnostics-color=never -O2
> /work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1a.c
> -specs=aem-validation.specs -lm -o ./attr-weakref-1.exe
> /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12'
> collect2: error: ld returned 1 exit status
> compiler exited with status 1
> output is:
> /tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12'
> /tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12'
> collect2: error: ld returned 1 exit status
>
> FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
>

Hmmm, I'm surprised it failed in the first place, as my testing didn't
show it; I need to check on that.

Nevertheless, this failure has gone away in my testing with
https://gcc.gnu.org/ml/gcc-cvs/2015-11/msg00453.html in a bootstrap
and regression run on aarch64-none-linux-gnu. I see nothing
triplet-specific in the testcase that would make it fail differently.

Is this something you really see with the tip of trunk?

regards
Ramana


[PATCH 01/02] PR/62314: add ability to add fixit-hints

2015-11-10 Thread David Malcolm
This patch adds the ability to add "fix-it hints" to a rich_location,
which will be displayed when the corresponding diagnostic is printed.

It does not actually add any fix-it hints (that comes in the second
patch), but it adds test coverage of the machinery and printing,
by using the existing diagnostic_plugin_test_show_locus to inject
some meaningless fixit hints, and to verify the output.

For now, add a nasty linker kludge in layout::print_any_fixits for
the sake of diagnostic_plugin_test_show_locus.

Successfully bootstrapped the pair of patches on
x86_64-pc-linux-gnu (on top of the 10-patch diagnostics kit).

OK for trunk?

gcc/ChangeLog:
PR/62314
* diagnostic-show-locus.c (colorizer::set_fixit_hint): New.
(class layout): Update comment
(layout::print_any_fixits): New method.
(layout::move_to_column): New method.
(diagnostic_show_locus): Add call to layout.print_any_fixits.

gcc/testsuite/ChangeLog:
PR/62314
* gcc.dg/plugin/diagnostic-test-show-locus-ascii-bw.c
(test_fixit_insert): New.
(test_fixit_remove): New.
(test_fixit_replace): New.
* gcc.dg/plugin/diagnostic-test-show-locus-ascii-color.c
(test_fixit_insert): New.
(test_fixit_remove): New.
(test_fixit_replace): New.
* gcc.dg/plugin/diagnostic_plugin_test_show_locus.c
(test_show_locus): Add tests of rendering fixit hints.

libcpp/ChangeLog:
PR/62314
* include/line-map.h (source_range::intersects_line_p): New
method.
(rich_location::~rich_location): New.
(rich_location::add_fixit_insert): New method.
(rich_location::add_fixit_remove): New method.
(rich_location::add_fixit_replace): New method.
(rich_location::get_num_fixit_hints): New accessor.
(rich_location::get_fixit_hint): New accessor.
(rich_location::MAX_FIXIT_HINTS): New constant.
(rich_location::m_num_fixit_hints): New field.
(rich_location::m_fixit_hints): New field.
(class fixit_hint): New class.
(class fixit_insert): New class.
(class fixit_remove): New class.
(class fixit_replace): New class.
* line-map.c (source_range::intersects_line_p): New method.
(rich_location::rich_location): Add initialization of
m_num_fixit_hints to both ctors.
(rich_location::~rich_location): New.
(rich_location::add_fixit_insert): New method.
(rich_location::add_fixit_remove): New method.
(rich_location::add_fixit_replace): New method.
(fixit_insert::fixit_insert): New.
(fixit_insert::~fixit_insert): New.
(fixit_insert::affects_line_p): New.
(fixit_remove::fixit_remove): New.
(fixit_remove::affects_line_p): New.
(fixit_replace::fixit_replace): New.
(fixit_replace::~fixit_replace): New.
(fixit_replace::affects_line_p): New.
---
 gcc/diagnostic-show-locus.c| 125 ++-
 .../gcc.dg/plugin/diagnostic-test-show-locus-bw.c  |  43 +++
 .../plugin/diagnostic-test-show-locus-color.c  |  43 +++
 .../plugin/diagnostic_plugin_test_show_locus.c |  35 ++
 libcpp/include/line-map.h  |  96 +++
 libcpp/line-map.c  | 136 -
 6 files changed, 471 insertions(+), 7 deletions(-)

diff --git a/gcc/diagnostic-show-locus.c b/gcc/diagnostic-show-locus.c
index 22203cd..f3d4a0e 100644
--- a/gcc/diagnostic-show-locus.c
+++ b/gcc/diagnostic-show-locus.c
@@ -78,6 +78,7 @@ class colorizer
 
   void set_range (int range_idx) { set_state (range_idx); }
   void set_normal_text () { set_state (STATE_NORMAL_TEXT); }
+  void set_fixit_hint () { set_state (0); }
 
  private:
   void set_state (int state);
@@ -139,8 +140,8 @@ struct line_bounds
 /* A class to control the overall layout when printing a diagnostic.
 
The layout is determined within the constructor.
-   It is then printed by repeatedly calling the "print_source_line"
-   and "print_annotation_line" methods.
+   It is then printed by repeatedly calling the "print_source_line",
+   "print_annotation_line" and "print_any_fixits" methods.
 
We assume we have disjoint ranges.  */
 
@@ -155,6 +156,7 @@ class layout
 
   bool print_source_line (int row, line_bounds *lbounds_out);
   void print_annotation_line (int row, const line_bounds lbounds);
+  void print_any_fixits (int row, const rich_location *richloc);
 
  private:
   bool
@@ -168,6 +170,9 @@ class layout
   get_x_bound_for_row (int row, int caret_column,
   int last_non_ws);
 
+  void
+  move_to_column (int *column, int dest_column);
+
  private:
   diagnostic_context *m_context;
   pretty_printer *m_pp;
@@ -593,6 +598,92 @@ layout::print_annotation_line (int row, const line_bounds lbounds)
   pp_newline (m_pp);
 }
 
+/* If there are any fixit hints on source line ROW within 

[PATCH 02/02] C FE: add fix-it hint for . vs ->

2015-11-10 Thread David Malcolm
This is the most trivial real fix-it hint I could think of: the user
writes
    ptr.field
rather than
    ptr->field.

gcc/c/ChangeLog:
* c-typeck.c (build_component_ref): Special-case POINTER_TYPE when
generating a "not a structure or union" error message, and
suggest a "->" rather than a ".", providing a fix-it hint.

gcc/testsuite/ChangeLog:
* gcc.dg/fixits.c: New file.
---
 gcc/c/c-typeck.c  | 15 +++
 gcc/testsuite/gcc.dg/fixits.c | 14 ++
 2 files changed, 29 insertions(+)
 create mode 100644 gcc/testsuite/gcc.dg/fixits.c

diff --git a/gcc/c/c-typeck.c b/gcc/c/c-typeck.c
index c2e16c6..6fe1ca8 100644
--- a/gcc/c/c-typeck.c
+++ b/gcc/c/c-typeck.c
@@ -2336,6 +2336,21 @@ build_component_ref (location_t loc, tree datum, tree component)
 
   return ref;
 }
+  else if (code == POINTER_TYPE && !c_dialect_objc ())
+{
+  /* Special-case the error message for "ptr.field" for the case
+where the user has confused "." vs "->".
+We don't do it for Objective-C, since Objective-C 2.0 dot-syntax
+allows "." for ptrs; we could be handling a failed attempt
+to access a property.  */
+  rich_location richloc (line_table, loc);
+  /* "loc" should be the "." token.  */
+  richloc.add_fixit_replace (source_range::from_location (loc), "->");
+  error_at_rich_loc (&richloc,
+"%qE is a pointer; did you mean to use %<->%>?",
+datum);
+  return error_mark_node;
+}
   else if (code != ERROR_MARK)
 error_at (loc,
  "request for member %qE in something not a structure or union",
diff --git a/gcc/testsuite/gcc.dg/fixits.c b/gcc/testsuite/gcc.dg/fixits.c
new file mode 100644
index 000..3b8c8a8
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/fixits.c
@@ -0,0 +1,14 @@
+/* { dg-do compile } */
+/* { dg-options "-fdiagnostics-show-caret" } */
+
+struct foo { int x; };
+
+int test (struct foo *ptr)
+{
+  return ptr.x; /* { dg-error "'ptr' is a pointer; did you mean to use '->'?" } */
+/* { dg-begin-multiline-output "" }
+   return ptr.x;
+             ^
+             ->
+   { dg-end-multiline-output "" } */
+}
-- 
1.8.5.3



Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-10 Thread Christophe Lyon
On 10 November 2015 at 12:41, Robert Suchanek wrote:
> Hi Christophe,
>
>> Hi,
>>
>> Since you committed this (r230087 if I'm correct), I can see that GCC
>> fails to build libgfortran for target arm-none-linux-gnueabi
>> --with-cpu=cortex-a9.
> ...
>>
>> Can you have a look?
>
> Sorry for the breakage. I see that my assertion is being triggered.
> I'll investigate this and check whether the assertion is correct or
> something else needs to be done.
>

Now that 'make check' has had enough time to run, I can see several
regressions in the configurations where GCC still builds.
For more details:
http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/230087/report-build-info.html


> Robert


RE: [PATCH 1/2][ARC] Add support for ARCv2 CPUs

2015-11-10 Thread Claudiu Zissulescu

> If you can name a pre-existing testcase to trigger the assert, the patch is
> approved for separate check-in.

The patch fixes the gcc.dg/pr29921-2.c error, which is visible on the ARC700
architecture.  I will prepare a new patch for this error.

Thank you for the review,
Claudiu


Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions

2015-11-10 Thread Richard Biener
On Tue, 10 Nov 2015, Martin Jambor wrote:

> On Fri, Nov 06, 2015 at 09:38:21AM +0100, Richard Biener wrote:
> > On Thu, 5 Nov 2015, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > in the previous email I wrote we need to "change behavior" of a few
> > > optimization passes.  One was the flattening of GPU functions and the
> > > other two are in the patch below.  It all comes to that, at the
> > > moment, we need to switch off the vectorizer (only for the GPU
> > > functions, of course).
> > > 
> > > We are actually quite close to being able to handle gimple vector
> > > input in HSA back-end but not all the way yet, and before allowing the
> > > vectorizer again, we will have to make sure it never produces vectors
> > > bigger than 128bits (in GPU functions).
> > 
> > Hmm.  I'd rather have this modify
> > DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this
> > effect.  I think I mentioned this to the OACC guys as well for a
> > similar need of theirs.
> 
> I see, that is a good idea.  I have reverted changes to
> tree-ssa-loop.c and tree-vectorizer.c and on top of that committed the
> following patch to the branch which makes modifications to HSA fndecls
> at a more convenient spot and disables vectorization in the following
> way:
> 
>   tree gdecl = gpu->decl;
>   tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
>   if (fn_opts == NULL_TREE)
>     fn_opts = optimization_default_node;
>   fn_opts = copy_node (fn_opts);
>   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
>   TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
>   DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
> 
> I hope that is what you meant.  I have also verified that it works.

Yes, that's what I meant.
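
For readers less familiar with the internals: as far as I understand, the user-level `optimize` function attribute feeds the same per-function option mechanism (DECL_FUNCTION_SPECIFIC_OPTIMIZATION), so the effect of the snippet quoted above can be approximated from source code. The following is only an illustrative sketch, not part of the HSA patch; the function name is made up, and the attribute is GCC-specific (other compilers may ignore it with a warning).

```cpp
// Hypothetical example: ask GCC not to run the tree vectorizers on this
// one function, leaving the rest of the translation unit unaffected.
// "no-tree-vectorize" is interpreted as -fno-tree-vectorize.
__attribute__ ((optimize ("no-tree-vectorize")))
int
sum_without_vectorizer (const int *v, int n)
{
  int sum = 0;
  for (int i = 0; i < n; ++i)
    sum += v[i];
  return sum;
}
```

The patch does the equivalent on the tree itself, copying the option node first so that other functions sharing it are not affected.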

Thanks,
Richard.

> Thanks,
> 
> Martin
> 
> 
> 2015-11-10  Martin Jambor  
> 
>   * hsa.h (hsa_summary_t): Add a comment to method link_functions.
>   (hsa_summary_t::link_functions): Moved...
>   * hsa.c (hsa_summary_t::link_functions): ...here.  Added common fndecl
>   modifications.
>   Include stringpool.h.
>   * ipa-hsa.c (process_hsa_functions): Do not add flatten attribute
>   here.  Fixed comments.
> 
> diff --git a/gcc/hsa.c b/gcc/hsa.c
> index ab05a1d..e63be95 100644
> --- a/gcc/hsa.c
> +++ b/gcc/hsa.c
> @@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "alloc-pool.h"
>  #include "cgraph.h"
>  #include "print-tree.h"
> +#include "stringpool.h"
>  #include "symbol-summary.h"
>  #include "hsa.h"
>  
> @@ -693,6 +694,40 @@ hsa_get_declaration_name (tree decl)
>return NULL;
>  }
>  
> +/* Couple GPU and HOST as gpu-specific and host-specific implementation of the
> +   same function.  KIND determines whether GPU is a host-invokable kernel or
> +   gpu-callable function.  */
> +
> +inline void
> +hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
> +hsa_function_kind kind)
> +{
> +  hsa_function_summary *gpu_summary = get (gpu);
> +  hsa_function_summary *host_summary = get (host);
> +
> +  gpu_summary->m_kind = kind;
> +  host_summary->m_kind = kind;
> +
> +  gpu_summary->m_gpu_implementation_p = true;
> +  host_summary->m_gpu_implementation_p = false;
> +
> +  gpu_summary->m_binded_function = host;
> +  host_summary->m_binded_function = gpu;
> +
> +  tree gdecl = gpu->decl;
> +  DECL_ATTRIBUTES (gdecl)
> += tree_cons (get_identifier ("flatten"), NULL_TREE,
> +  DECL_ATTRIBUTES (gdecl));
> +
> +  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
> +  if (fn_opts == NULL_TREE)
> +fn_opts = optimization_default_node;
> +  fn_opts = copy_node (fn_opts);
> +  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
> +  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
> +  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
> +}
> +
>  /* Add a HOST function to HSA summaries.  */
>  
>  void
> diff --git a/gcc/hsa.h b/gcc/hsa.h
> index 025de67..b6855ea 100644
> --- a/gcc/hsa.h
> +++ b/gcc/hsa.h
> @@ -1161,27 +1161,14 @@ public:
>hsa_summary_t (symbol_table *table):
>  function_summary (table) { }
>  
> +  /* Couple GPU and HOST as gpu-specific and host-specific implementation of
> +     the same function.  KIND determines whether GPU is a host-invokable kernel
> +     or gpu-callable function.  */
> +
>void link_functions (cgraph_node *gpu, cgraph_node *host,
>  hsa_function_kind kind);
>  };
>  
> -inline void
> -hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
> -hsa_function_kind kind)
> -{
> -  hsa_function_summary *gpu_summary = get (gpu);
> -  hsa_function_summary *host_summary = get (host);
> -
> -  gpu_summary->m_kind = kind;
> -  host_summary->m_kind = kind;
> -
> -  gpu_summary->m_gpu_implementation_p = true;
> -  host_summary->m_gpu_implementation_p = false;
> -
> -  

Re: [AArch64] Move iterators from atomics.md to iterators.md

2015-11-10 Thread James Greenhalgh
On Mon, Nov 02, 2015 at 11:44:02AM +, Matthew Wahab wrote:
> Hello
> 
> One of the review comments for the v8.1 atomics patches was that the
> iterators and unspec declarations should be moved out of the atomics.md
> file (https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01375.html).
> 
> The iterators in atomics.md are tied to the unspecv definition in the
> same file. This patch moves both into iterators.md.
> 
> Tested aarch64-none-elf with cross-compiled check-gcc and
> aarch64-none-linux-gnu with native bootstrap and make check.
> 
> Ok for trunk?

OK.

Thanks,
James

> Matthew
> 
> gcc/
> 2015-11-02  Matthew Wahab  
> 
>   * config/aarch64/atomics.md (unspecv): Move to iterators.md.
>   (ATOMIC_LDOP): Likewise.
>   (atomic_ldop): Likewise.
>   * config/aarch64/iterators.md (unspecv): Moved from atomics.md.
>   (ATOMIC_LDOP): Likewise.
>   (atomic_ldop): Likewise.



Re: [PATCH 01/02] PR/62314: add ability to add fixit-hints

2015-11-10 Thread David Malcolm
On Tue, 2015-11-10 at 17:26 +0100, Bernd Schmidt wrote:
> On 11/10/2015 05:35 PM, David Malcolm wrote:
> > +  /* Nasty workaround to convince the linker to add
> > +  rich_location::add_fixit_insert
> > +  rich_location::add_fixit_remove
> > +  rich_location::add_fixit_replace
> > + to cc1 for use by diagnostic_plugin_test_show_locus,
> > + before anything in cc1 is using them.
> > +
> > + This conditional should never hold, but hopefully the compiler can't
> > + figure that out.  */
> 
> Does attribute((used)) help with this problem?

For some reason, I'm no longer seeing the problem; I tried simply taking
out the kludge, and it now works (this is *without* the in-cc1 usage in
patch 2); looking at cc1 shows that the above 3 symbols are indeed being
added:

$ eu-readelf -s ./cc1 |grep add_fixit
 2510: 012a5280 94 FUNC GLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_insertEjPKc
 2905: 012a5300 76 FUNC GLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_removeE12source_range
 9262: 012a5390 94 FUNC GLOBAL DEFAULT 13 _ZN13rich_location17add_fixit_replaceE12source_rangePKc
37430: 012a5300 76 FUNC GLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_removeE12source_range
46935: 012a5390 94 FUNC GLOBAL DEFAULT 13 _ZN13rich_location17add_fixit_replaceE12source_rangePKc
47508: 012a5280 94 FUNC GLOBAL DEFAULT 13 _ZN13rich_location16add_fixit_insertEjPKc

I've tried poking at it, but I'm not sure what changed since I first
added the kludge (an earlier version of this, sent as:
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00732.html
); sorry.

Dave



Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-10 Thread Ilya Enkovich
2015-11-10 17:46 GMT+03:00 Richard Biener:
> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich wrote:
>> 2015-11-10 15:33 GMT+03:00 Richard Biener:
>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev wrote:
>>>> Richard,
>>>>
>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>
>>> What's the symptom?  The compare cannot be expanded?  Just add a
>>> pattern then.  After all we have modes up to XImode.
>>
>> I suppose the problem may be in:
>>
>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>
>> which doesn't allow creating constants of a bigger size.  Changing it
>> to the maximum vector size (512) would mean we increase the wide_int
>> structure size significantly.  New patterns are probably also needed.
>
> Yes, new patterns are needed, but wide-int should be fine (we only need
> to create a literal zero, AFAICS).  The "new pattern" would be
> equality/inequality compares against zero only.

Currently, creating a 256-bit integer type fails because the wide_int
values for its max and min cannot be created.  Increasing
MAX_BITSIZE_MODE_ANY_INT fixes that, but it also increases
WIDE_INT_MAX_ELTS and thus the size of the wide_int structure.  If we
use 512 for MAX_BITSIZE_MODE_ANY_INT, the wide_int structure grows by
48 bytes (16 bytes if we use 256).  Is that OK for such a narrow use?
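
The growth figures quoted above can be reproduced with a little arithmetic. This is a minimal sketch under two assumptions that are not stated in this thread: HOST_WIDE_INT is 64 bits wide, and WIDE_INT_MAX_ELTS is computed as (MAX_BITSIZE_MODE_ANY_INT + HOST_BITS_PER_WIDE_INT) / HOST_BITS_PER_WIDE_INT. The helper names are invented for the example.

```cpp
// Assumption: HOST_WIDE_INT is 64 bits on the host.
constexpr int host_bits_per_wide_int = 64;

// Hypothetical helper mirroring the assumed WIDE_INT_MAX_ELTS formula.
constexpr int
max_elts (int max_bitsize_mode_any_int)
{
  return (max_bitsize_mode_any_int + host_bits_per_wide_int)
         / host_bits_per_wide_int;
}

// Bytes occupied by the element array alone for a given maximum bitsize.
constexpr int
elts_bytes (int max_bitsize_mode_any_int)
{
  return max_elts (max_bitsize_mode_any_int) * (host_bits_per_wide_int / 8);
}

// 128 bits -> 3 elements (24 bytes); 256 -> 5 (40 bytes); 512 -> 9 (72 bytes).
static_assert (elts_bytes (256) - elts_bytes (128) == 16, "growth for 256");
static_assert (elts_bytes (512) - elts_bytes (128) == 48, "growth for 512");
```

Under those assumptions the extra storage is 16 bytes per wide_int for 256 and 48 bytes for 512, matching the numbers in the message.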

Ilya

>
> Richard.
>
>> Ilya
>>
>>>
>>> Richard.
>>>
>>>> Yuri.




Re: [PR64164] drop copyrename, integrate into expand

2015-11-10 Thread Alan Lawrence

On 05/11/15 05:08, Alexandre Oliva wrote:

[PR67753] fix copy of PARALLEL entry_parm to CONCAT target_reg
for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Avoid allocating a
stack slot if we don't have an ABI-reserved one.  Emit the
copy to target_reg in the conversion seq if the copy from
entry_parm is in it too.  Don't use the conversion seq to copy
a PARALLEL to a REG or a CONCAT.


Since this change, we have on aarch64_be:

FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O1
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O3 -g
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -Os
FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -Og -g

The difference in the assembler looks as follows (this is at -Og):

 func_return_val_10:
-       sub     sp, sp, #16
-       lsr     x2, x1, 48
-       lsr     x1, x1, 32
+       ubfx    x2, x1, 16, 16
        fmov    x3, d0
        // Start of user assembly
 // 23 "func-ret-4.c" 1
        mov     x0, x30
 // 0 "" 2
        // End of user assembly
        adrp    x3, saved_return_address
        str     x0, [x3, #:lo12:saved_return_address]
        adrp    x0, myfunc
        add     x0, x0, :lo12:myfunc
        // Start of user assembly
 // 23 "func-ret-4.c" 1
        mov     x30, x0
 // 0 "" 2
        // End of user assembly
        bfi     w0, w2, 16, 16
        bfi     w0, w1, 0, 16
        lsl     x0, x0, 32
-       add     sp, sp, 16

(ubfx is a bitfield extract: the first immediate is the least significant bit,
the second the width.  lsr is a logical shift right.)  And in the RTL dump,
this (before the patch):


(insn 4 3 5 2 (set (mem/c:DI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0 t+0 S8 A64])
        (reg:DI 1 x1)) func-ret-4.c:23 -1
     (nil))
(insn 5 4 6 2 (set (reg:HI 78 [ t ])
        (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                (const_int -8 [0xfffffffffffffff8])) [0 t+0 S2 A64])) func-ret-4.c:23 -1
     (nil))
(insn 6 5 7 2 (set (reg:HI 79 [ t+2 ])
        (mem/c:HI (plus:DI (reg/f:DI 68 virtual-stack-vars)
                (const_int -6 [0xfffffffffffffffa])) [0 t+2 S2 A16])) func-ret-4.c:23 -1
     (nil))
becomes (after the patch):

(insn 4 3 5 2 (set (subreg:SI (reg:CHI 80) 0)
(reg:SI 1 x1 [ t ])) func-ret-4.c:23 -1
 (nil))
(insn 5 4 6 2 (set (reg:SI 81)
(subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1
 (nil))
(insn 6 5 7 2 (set (subreg:DI (reg:HI 82) 0)
(zero_extract:DI (subreg:DI (reg:SI 81) 0)
(const_int 16 [0x10])
(const_int 16 [0x10]))) func-ret-4.c:23 -1
 (nil))
(insn 7 6 8 2 (set (reg:HI 78 [ t ])
(reg:HI 82)) func-ret-4.c:23 -1
 (nil))
(insn 8 7 9 2 (set (reg:SI 83)
(subreg:SI (reg:CHI 80) 0)) func-ret-4.c:23 -1
 (nil))
(insn 9 8 10 2 (set (reg:HI 79 [ t+2 ])
(subreg:HI (reg:SI 83) 2)) func-ret-4.c:23 -1
 (nil))
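
To make the difference between the two instruction sequences concrete, here is a small C++ model of the instructions involved. It is only a sketch: the function names and the sample register value are invented, and the semantics are the ones given in the parenthetical note above (ubfx takes an lsb and a width; lsr is a logical shift right).

```cpp
#include <cstdint>

// Model of "lsr xd, xn, #shift": logical shift right.
constexpr std::uint64_t
lsr (std::uint64_t x, unsigned shift)
{
  return x >> shift;
}

// Model of "ubfx xd, xn, #lsb, #width": extract WIDTH bits starting at
// bit LSB, zero-extended.  Assumes 0 < width < 64.
constexpr std::uint64_t
ubfx (std::uint64_t x, unsigned lsb, unsigned width)
{
  return (x >> lsb) & ((std::uint64_t (1) << width) - 1);
}

// Sample incoming x1 with four distinguishable 16-bit fields.
constexpr std::uint64_t x1 = 0xAAAABBBBCCCCDDDDull;

// Old sequence: the two halfwords fed to bfi come from bits 63:48 and 47:32.
static_assert (lsr (x1, 48) == 0xAAAA, "old x2");
static_assert ((lsr (x1, 32) & 0xFFFF) == 0xBBBB, "old low half of w1");

// New sequence: they come from bits 31:16 and 15:0 instead.
static_assert (ubfx (x1, 16, 16) == 0xCCCC, "new x2");
static_assert ((x1 & 0xFFFF) == 0xDDDD, "new low half of w1");
```

In other words, the old code picked the two 16-bit members out of the upper half of x1 and the new code picks them out of the lower half, which would be consistent with the execution failures appearing only on big-endian.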

--Alan



Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-10 Thread Alan Lawrence

On 10/11/15 16:39, Alan Lawrence wrote:

Since r229878, I've been seeing

FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable

(both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and
aarch64-none-linux-gnu.


Ah, these are fixed by Ramana's partial rollback (r230085).

--Alan



Re: [RFC] Combine vectorized loops with its scalar remainder.

2015-11-10 Thread Richard Biener
On Tue, Nov 10, 2015 at 2:02 PM, Ilya Enkovich  wrote:
> 2015-11-10 15:30 GMT+03:00 Richard Biener :
>> On Tue, Nov 3, 2015 at 1:08 PM, Yuri Rumyantsev  wrote:
>>> Richard,
>>>
>>> It looks like misunderstanding - we assume that for GCCv6 the simple
>>> scheme of remainder will be used through introducing new IV :
>>> https://gcc.gnu.org/ml/gcc-patches/2015-09/msg01435.html
>>>
>>> Is it true or we missed something?
>>
>> 
>>> > Do you have an idea how "masking" is better be organized to be usable
>>> > for both 4b and 4c?
>>>
>>> Do 2a ...
>> Okay.
>> 
>
> 2a was 'transform already vectorized loop as a separate
> post-processing'. Isn't it what this prototype patch implements?
> Current version only masks loop body which is in practice applicable
> for AVX-512 only in the most cases.  With AVX-512 it's easier to see
> how profitable masking might be and it is a main target for the first
> masking version.  Extending it to prologues/epilogues and thus making
> it more profitable for other targets is the next step and is out of
> the scope of this patch.

Ok, technically the prototype transforms the already vectorized loop.
Of course I meant the vectorized loop be copied, masked and that
result used as epilogue...

I'll queue a more detailed look into the patch for this week.

Did you take any measurements with this patch, such as the number of
masked epilogues in SPEC 2006 FP (and any speedup)?

Thanks,
Richard.

> Thanks,
> Ilya
>
>>
>> Richard.
>>


[PATCH] libcpp: add examples to source_location description

2015-11-10 Thread David Malcolm
This is a followup to:
  [PATCH 10/10] Compress short ranges into source_location
which adds some worked examples of what a source_location/location_t
can encode.

Successfully bootstrapped on x86_64-pc-linux-gnu
(although it only touches a comment).

OK for trunk?

libcpp/ChangeLog:
* include/line-map.h (source_location): Add worked examples of
location encoding to the leading comment.
---
 libcpp/include/line-map.h | 97 ++-
 1 file changed, 95 insertions(+), 2 deletions(-)

diff --git a/libcpp/include/line-map.h b/libcpp/include/line-map.h
index 36247b2..e7169af 100644
--- a/libcpp/include/line-map.h
+++ b/libcpp/include/line-map.h
@@ -163,8 +163,101 @@ typedef unsigned int linenum_type;
   0x | UINT_MAX  |
   ---+---+---
 
-  To see how this works in practice, see the worked example in
-  libcpp/location-example.txt.  */
+   Examples of location encoding.
+
+   Packed ranges
+   =============
+
+   Consider encoding the location of a token "foo", seen underlined here
+   on line 523, within an ordinary line_map that starts at line 500:
+
+                 11111111112
+        12345678901234567890
+     522
+     523   return foo + bar;
+                  ^~~
+     524
+
+   The location's caret and start are both at line 523, column 11; the
+   location's finish is on the same line, at column 13 (an offset of 2
+   columns, for length 3).
+
+   Line 523 is offset 23 from the starting line of the ordinary line_map.
+
+   caret == start, and the offset of the finish fits within 5 bits, so
+   this can be stored as a packed range.
+
+   This is encoded as:
+  ordmap->start
+ + (line_offset << ordmap->m_column_and_range_bits)
+ + (column << ordmap->m_range_bits)
+ + (range_offset);
+   i.e. (for line offset 23, column 11, range offset 2):
+  ordmap->start
+ + (23 << 12)
+ + (11 << 5)
+ + 2;
+   i.e.:
+  ordmap->start + 0x17162
+   assuming that the line_map uses the default of 7 bits for columns and
+   5 bits for packed range (giving 12 bits for m_column_and_range_bits).
+
+
+   "Pure" locations
+   ================
+
+   These are a special case of the above, where
+  caret == start == finish
+   They are stored as packed ranges with offset == 0.
+   For example, the location of the "f" of "foo" could be stored
+   as above, but with range offset 0, giving:
+  ordmap->start
+ + (23 << 12)
+ + (11 << 5)
+ + 0;
+   i.e.:
+  ordmap->start + 0x17160
+
+
+   Unoptimized ranges
+   ==================
+
+   Consider encoding the location of the binary expression
+   below:
+
+                 11111111112
+        12345678901234567890
+     522
+     523   return foo + bar;
+                  ~~~~^~~~~
+     524
+
+   The location's caret is at the "+", line 523 column 15, but starts
+   earlier, at the "f" of "foo" at column 11.  The finish is at the "r"
+   of "bar" at column 19.
+
+   This can't be stored as a packed range since start != caret.
+   Hence it is stored as an ad-hoc location e.g. 0x80000003.
+
+   Stripping off the top bit gives us an index into the ad-hoc
+   lookaside table:
+
+ line_table->location_adhoc_data_map.data[0x3]
+
+   from which the caret, start and finish can be looked up,
+   encoded as "pure" locations:
+
+ start  == ordmap->start + (23 << 12) + (11 << 5)
+== ordmap->start + 0x17160  (as above; the "f" of "foo")
+
+ caret  == ordmap->start + (23 << 12) + (15 << 5)
+== ordmap->start + 0x171e0
+
+ finish == ordmap->start + (23 << 12) + (19 << 5)
+== ordmap->start + 0x17260
+
+   To further see how source_location works in practice, see the
+   worked example in libcpp/location-example.txt.  */
 typedef unsigned int source_location;
 
 /* A range of source locations.
-- 
1.8.5.3
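
The arithmetic in the worked examples of the patch above can be checked mechanically. The following is a minimal sketch, assuming the default split described in the comment (7 column bits plus 5 range bits, so 12 bits in total); the function names are hypothetical and the result is an offset relative to ordmap->start, not a complete source_location.

```cpp
#include <cstdint>

// Assumed defaults from the comment: 5 bits for the packed range and
// 12 bits for column + range together.
constexpr unsigned range_bits = 5;
constexpr unsigned column_and_range_bits = 12;

// Offset from ordmap->start of a packed location (hypothetical helper).
constexpr std::uint32_t
packed_offset (unsigned line_offset, unsigned column, unsigned range_offset)
{
  return (line_offset << column_and_range_bits)
         + (column << range_bits)
         + range_offset;
}

// Ad-hoc locations have the top bit set; the rest indexes the lookaside table.
constexpr bool
is_adhoc (std::uint32_t loc)
{
  return (loc & 0x80000000u) != 0;
}

constexpr std::uint32_t
adhoc_index (std::uint32_t loc)
{
  return loc & 0x7fffffffu;
}

static_assert (packed_offset (23, 11, 2) == 0x17162, "packed range for foo");
static_assert (packed_offset (23, 11, 0) == 0x17160, "pure location of f");
static_assert (packed_offset (23, 15, 0) == 0x171e0, "caret at the +");
static_assert (packed_offset (23, 19, 0) == 0x17260, "finish at the r of bar");
static_assert (is_adhoc (0x80000003u) && adhoc_index (0x80000003u) == 0x3,
               "ad-hoc lookup index");
```

All four offsets agree with the values quoted in the comment.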



[C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-10 Thread Marek Polacek
While both C and C++ FEs are able to reject e.g.
int a[__SIZE_MAX__ / sizeof(int)];
they are accepting code such as
int (*a)[__SIZE_MAX__ / sizeof(int)];

As Joseph pointed out, any construction of a non-VLA type whose size is half or
more of the address space should receive a compile-time error.

Done by moving up the check for the size in bytes so that it checks every
non-VLA complete array type constructed in the course of processing the
declarator.  Since the C++ FE had the same problem, I've fixed it up there as
well.  And that's why I had to tweak the dg-error of two C++ tests; if the size
of an array is considered invalid, we give an error message with the word
"unnamed".

(I've removed the comment about crashing in tree_to_[su]hwi since that seems
to no longer be the case.)
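
As an illustration of the rule being enforced (a size that overflows, or that is half or more of the address space, is an error), here is a stand-alone sketch. It is not GCC's valid_constant_size_p; the helper name is invented, and it works on plain size_t values rather than trees.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the front-end check: true if an array of
// NELTS elements of ELT_SIZE bytes would overflow size_t or would take
// up half or more of the address space.  Dividing instead of multiplying
// avoids overflow in the check itself.
constexpr bool
array_size_too_large (std::size_t nelts, std::size_t elt_size)
{
  return elt_size != 0 && nelts > (SIZE_MAX / 2) / elt_size;
}

// The element count used in both declarators quoted above.
static_assert (array_size_too_large (SIZE_MAX / sizeof (int), sizeof (int)),
               "huge int array is rejected");
static_assert (!array_size_too_large (1024, sizeof (int)),
               "ordinary array is accepted");
static_assert (!array_size_too_large (SIZE_MAX / 2, 1),
               "one byte below half the address space is accepted");
static_assert (array_size_too_large (SIZE_MAX / 2 + 1, 1),
               "exactly half the address space is rejected");
```

The point of the patch is that this test now applies to every complete non-VLA array type built while processing a declarator, so the pointer-to-array form is diagnosed just like the plain array.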

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-11-10  Marek Polacek  

PR c/68107
PR c++/68266
* c-decl.c (grokdeclarator): Check whether the size of arrays is
valid earlier.

* decl.c (grokdeclarator): Check whether the size of arrays is valid
earlier.

* c-c++-common/pr68107.c: New test.
* g++.dg/init/new38.C (large_array_char): Adjust dg-error.
(large_array_char_template): Likewise.
* g++.dg/init/new44.C: Adjust dg-error.

diff --git gcc/c/c-decl.c gcc/c/c-decl.c
index a3d8ead..2ec4865 100644
--- gcc/c/c-decl.c
+++ gcc/c/c-decl.c
@@ -6007,6 +6007,19 @@ grokdeclarator (const struct c_declarator *declarator,
TYPE_SIZE_UNIT (type) = size_zero_node;
SET_TYPE_STRUCTURAL_EQUALITY (type);
  }
+
+   /* Did array size calculations overflow or does the array
+  cover more than half of the address-space?  */
+   if (COMPLETE_TYPE_P (type)
+   && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
+   && !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
+ {
+   if (name)
+ error_at (loc, "size of array %qE is too large", name);
+   else
+ error_at (loc, "size of unnamed array is too large");
+   type = error_mark_node;
+ }
  }
 
if (decl_context != PARM
@@ -6014,7 +6027,8 @@ grokdeclarator (const struct c_declarator *declarator,
|| array_ptr_attrs != NULL_TREE
|| array_parm_static))
  {
-   error_at (loc, "static or type qualifiers in non-parameter array declarator");
+   error_at (loc, "static or type qualifiers in non-parameter "
+ "array declarator");
array_ptr_quals = TYPE_UNQUALIFIED;
array_ptr_attrs = NULL_TREE;
array_parm_static = 0;
@@ -6293,22 +6307,6 @@ grokdeclarator (const struct c_declarator *declarator,
}
 }
 
-  /* Did array size calculations overflow or does the array cover more
- than half of the address-space?  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && COMPLETE_TYPE_P (type)
-  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
-  && ! valid_constant_size_p (TYPE_SIZE_UNIT (type)))
-{
-  if (name)
-   error_at (loc, "size of array %qE is too large", name);
-  else
-   error_at (loc, "size of unnamed array is too large");
-  /* If we proceed with the array type as it is, we'll eventually
-crash in tree_to_[su]hwi().  */
-  type = error_mark_node;
-}
-
   /* If this is declaring a typedef name, return a TYPE_DECL.  */
 
   if (storage_class == csc_typedef)
diff --git gcc/cp/decl.c gcc/cp/decl.c
index bd3f2bc..68ad82e 100644
--- gcc/cp/decl.c
+++ gcc/cp/decl.c
@@ -9945,6 +9945,18 @@ grokdeclarator (const cp_declarator *declarator,
case cdk_array:
  type = create_array_type_for_decl (dname, type,
 declarator->u.array.bounds);
+ if (type != error_mark_node
+ && COMPLETE_TYPE_P (type)
+ && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
+ && !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
+   {
+ if (dname)
+   error ("size of array %qE is too large", dname);
+ else
+   error ("size of unnamed array is too large");
+ type = error_mark_node;
+   }
+
  if (declarator->std_attributes)
/* [dcl.array]/1:
 
@@ -10508,19 +10520,6 @@ grokdeclarator (const cp_declarator *declarator,
 error ("non-parameter %qs cannot be a parameter pack", name);
 }
 
-  /* Did array size calculations overflow or does the array cover more
- than half of the address-space?  */
-  if (TREE_CODE (type) == ARRAY_TYPE
-  && COMPLETE_TYPE_P (type)
-  && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
-  && ! valid_constant_size_p 

[patch] libstdc++/68190 Fix return type of heterogeneous find for sets

2015-11-10 Thread Jonathan Wakely

This converts the return type of heterogeneous find members to the
correct set iterator type.

Tested powerpc64le-linux, committed to trunk. Will commit to the
gcc-5-branch too.
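
As a rough illustration of the user-visible effect (not part of the patch), the
sketch below performs heterogeneous lookups on a std::set with a transparent
comparator and checks the deduced return types. The alias and helper names are
made up for this example, and it assumes a C++14 compiler with a libstdc++ that
contains the fix.

```cpp
#include <cassert>
#include <functional>
#include <set>
#include <string>
#include <type_traits>

// Hypothetical alias for this example: std::less<> is a transparent
// comparator, which enables the heterogeneous find() overloads.
using string_set = std::set<std::string, std::less<>>;

// Looks up a C string without constructing a std::string key first.
inline bool contains_literal(const string_set& s, const char* key)
{
  auto cit = s.find(key);  // const overload
  static_assert(std::is_same<decltype(cit), string_set::const_iterator>::value,
                "const find returns const_iterator");
  return cit != s.end();
}

// Erases the element matching a C string, if there is one.
inline bool erase_literal(string_set& s, const char* key)
{
  auto it = s.find(key);  // non-const overload
  static_assert(std::is_same<decltype(it), string_set::iterator>::value,
                "find returns iterator");
  if (it == s.end())
    return false;
  s.erase(it);
  assert(s.find(key) == s.end());  // keys are unique in a set
  return true;
}
```

The static_asserts mirror what the new testsuite checks are meant to verify:
the result is the container's own iterator type, not whatever the underlying
tree happens to return.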


commit d84e13dd8a7d47016bdfc5a9f45d8658a9d16ed9
Author: Jonathan Wakely 
Date:   Tue Nov 10 14:59:00 2015 +

Fix return type of heterogeneous find for sets

	PR libstdc++/68190
	* include/bits/stl_multiset.h (multiset::find): Fix return types.
	* include/bits/stl_set.h (set::find): Likewise.
	* testsuite/23_containers/map/operations/2.cc: Test find return types.
	* testsuite/23_containers/multimap/operations/2.cc: Likewise.
	* testsuite/23_containers/multiset/operations/2.cc: Likewise.
	* testsuite/23_containers/set/operations/2.cc: Likewise.

diff --git a/libstdc++-v3/include/bits/stl_multiset.h b/libstdc++-v3/include/bits/stl_multiset.h
index 5ccc6dd..e6e2337 100644
--- a/libstdc++-v3/include/bits/stl_multiset.h
+++ b/libstdc++-v3/include/bits/stl_multiset.h
@@ -680,13 +680,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus > 201103L
   template<typename _Kt>
 	auto
-	find(const _Kt& __x) -> decltype(_M_t._M_find_tr(__x))
-	{ return _M_t._M_find_tr(__x); }
+	find(const _Kt& __x)
+	-> decltype(iterator{_M_t._M_find_tr(__x)})
+	{ return iterator{_M_t._M_find_tr(__x)}; }
 
   template<typename _Kt>
 	auto
-	find(const _Kt& __x) const -> decltype(_M_t._M_find_tr(__x))
-	{ return _M_t._M_find_tr(__x); }
+	find(const _Kt& __x) const
+	-> decltype(const_iterator{_M_t._M_find_tr(__x)})
+	{ return const_iterator{_M_t._M_find_tr(__x)}; }
 #endif
   //@}
 
diff --git a/libstdc++-v3/include/bits/stl_set.h b/libstdc++-v3/include/bits/stl_set.h
index cf74368..8bea61a 100644
--- a/libstdc++-v3/include/bits/stl_set.h
+++ b/libstdc++-v3/include/bits/stl_set.h
@@ -699,13 +699,15 @@ _GLIBCXX_BEGIN_NAMESPACE_CONTAINER
 #if __cplusplus > 201103L
   template<typename _Kt>
 	auto
-	find(const _Kt& __x) -> decltype(_M_t._M_find_tr(__x))
-	{ return _M_t._M_find_tr(__x); }
+	find(const _Kt& __x)
+	-> decltype(iterator{_M_t._M_find_tr(__x)})
+	{ return iterator{_M_t._M_find_tr(__x)}; }
 
   template<typename _Kt>
 	auto
-	find(const _Kt& __x) const -> decltype(_M_t._M_find_tr(__x))
-	{ return _M_t._M_find_tr(__x); }
+	find(const _Kt& __x) const
+	-> decltype(const_iterator{_M_t._M_find_tr(__x)})
+	{ return const_iterator{_M_t._M_find_tr(__x)}; }
 #endif
   //@}
 
diff --git a/libstdc++-v3/testsuite/23_containers/map/operations/2.cc b/libstdc++-v3/testsuite/23_containers/map/operations/2.cc
index 6cc277a..ef301ef 100644
--- a/libstdc++-v3/testsuite/23_containers/map/operations/2.cc
+++ b/libstdc++-v3/testsuite/23_containers/map/operations/2.cc
@@ -54,6 +54,11 @@ test01()
   VERIFY( cit == cx.end() );
 
   VERIFY( Cmp::count == 0);
+
+  static_assert(std::is_same::value,
+  "find returns iterator");
+  static_assert(std::is_same::value,
+  "const find returns const_iterator");
 }
 
 void
diff --git a/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc b/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc
index 67c3bfd..eef6ee4 100644
--- a/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc
+++ b/libstdc++-v3/testsuite/23_containers/multimap/operations/2.cc
@@ -54,6 +54,11 @@ test01()
   VERIFY( cit == cx.end() );
 
   VERIFY( Cmp::count == 0);
+
+  static_assert(std::is_same::value,
+  "find returns iterator");
+  static_assert(std::is_same::value,
+  "const find returns const_iterator");
 }
 
 void
diff --git a/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc b/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc
index ff2748f..4bea719 100644
--- a/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc
+++ b/libstdc++-v3/testsuite/23_containers/multiset/operations/2.cc
@@ -54,6 +54,11 @@ test01()
   VERIFY( cit == cx.end() );
 
   VERIFY( Cmp::count == 0);
+
+  static_assert(std::is_same::value,
+  "find returns iterator");
+  static_assert(std::is_same::value,
+  "const find returns const_iterator");
 }
 
 void
diff --git a/libstdc++-v3/testsuite/23_containers/set/operations/2.cc b/libstdc++-v3/testsuite/23_containers/set/operations/2.cc
index 84ddd1f..6a68453 100644
--- a/libstdc++-v3/testsuite/23_containers/set/operations/2.cc
+++ b/libstdc++-v3/testsuite/23_containers/set/operations/2.cc
@@ -54,6 +54,11 @@ test01()
   VERIFY( cit == cx.end() );
 
   VERIFY( Cmp::count == 0);
+
+  static_assert(std::is_same::value,
+  "find returns iterator");
+  static_assert(std::is_same::value,
+  "const find returns const_iterator");
 }
 
 void


Re: [Patch AArch64] Switch constant pools to separate rodata sections.

2015-11-10 Thread Alan Lawrence

On 04/11/15 14:26, Ramana Radhakrishnan wrote:


True and I've just been reading more of the backend - We could now start using 
blocks for constant pools as well. So let's do that.

How does something like this look ?

Tested on aarch64-none-elf - no regressions.

2015-11-04  Ramana Radhakrishnan  

 * config/aarch64/aarch64.c
 (aarch64_can_use_per_function_literal_pools_p): New.
 (aarch64_use_blocks_for_constant_p): Adjust declaration
 and use aarch64_can_use_function_literal_pools_p.
 (aarch64_select_rtx_section): Update.



Since r229878, I've been seeing

FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)
UNRESOLVED: gcc.dg/attr-weakref-1.c compilation failed to produce executable

(both previously passing) on aarch64-none-elf, aarch64_be-none-elf, and 
aarch64-none-linux-gnu. Here's a log from aarch64_be-none-elf (the others look 
similar):


/work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/xgcc 
-B/work/alalaw01/build-aarch64_be-none-elf/obj/gcc2/gcc/ 
/work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1.c 
-fno-diagnostics-show-caret -fdiagnostics-color=never -O2 
/work/alalaw01/src/gcc/gcc/testsuite/gcc.dg/attr-weakref-1a.c 
-specs=aem-validation.specs -lm -o ./attr-weakref-1.exe

/tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12'
collect2: error: ld returned 1 exit status
compiler exited with status 1
output is:
/tmp/ccEfngi6.o:(.rodata.cst8+0x30): undefined reference to `wv12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x38): undefined reference to `wv12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x60): undefined reference to `wf12'
/tmp/ccEfngi6.o:(.rodata.cst8+0x68): undefined reference to `wf12'
collect2: error: ld returned 1 exit status

FAIL: gcc.dg/attr-weakref-1.c (test for excess errors)



Re: [PATCH] Simple optimization for MASK_STORE.

2015-11-10 Thread Mike Stump
On Nov 10, 2015, at 6:56 AM, Ilya Enkovich  wrote:
> 2015-11-10 17:46 GMT+03:00 Richard Biener :
>> On Tue, Nov 10, 2015 at 1:48 PM, Ilya Enkovich  
>> wrote:
>>> 2015-11-10 15:33 GMT+03:00 Richard Biener :
>>>> On Fri, Nov 6, 2015 at 2:28 PM, Yuri Rumyantsev  wrote:
>>>>> Richard,
>>>>>
>>>>> I tried it but 256-bit precision integer type is not yet supported.
>>>>
>>>> What's the symptom?  The compare cannot be expanded?  Just add a
>>>> pattern then.  After all we have modes up to XImode.
>>> 
>>> I suppose problem may be in:
>>> 
>>> gcc/config/i386/i386-modes.def:#define MAX_BITSIZE_MODE_ANY_INT (128)
>>> 
>>> which doesn't allow creating constants of a bigger size.  Changing it
>>> to maximum vector size (512) would mean we increase wide_int structure
>>> size significantly. New patterns are probably also needed.
>> 
>> Yes, new patterns are needed but wide-int should be fine (we only need to 
>> create
>> a literal zero AFAICS).  The "new pattern" would be equality/inequality
>> against zero
>> compares only.
> 
> Currently 256bit integer creation fails because wide_int for max and
> min values cannot be created.
> It is fixed by increasing MAX_BITSIZE_MODE_ANY_INT, but it increases
> WIDE_INT_MAX_ELTS
> and thus increases wide_int structure. If we use 512 for
> MAX_BITSIZE_MODE_ANY_INT then
> wide_int structure would grow by 48 bytes (16 bytes if use 256 for
> MAX_BITSIZE_MODE_ANY_INT).

Not answering for Richard, but the design of wide-int was that though the 
temporary space would grow, trees and rtl would not.  Most wide-int values are 
short lived.
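
The 48-byte and 16-byte figures quoted above can be reproduced with a little
arithmetic. The sketch below assumes 64-bit HOST_WIDE_INT elements and an
element count of (max bitsize + element bits) / element bits, modeled on how
WIDE_INT_MAX_ELTS is understood to be derived; the constant and function names
here are invented for the example and are not GCC's.

```cpp
#include <cstddef>

// Assumption: one HOST_WIDE_INT element is 64 bits wide.
constexpr std::size_t kHostBitsPerWideInt = 64;

// Assumed model of WIDE_INT_MAX_ELTS for a given MAX_BITSIZE_MODE_ANY_INT.
constexpr std::size_t max_elts(std::size_t max_bitsize)
{
  return (max_bitsize + kHostBitsPerWideInt) / kHostBitsPerWideInt;
}

// Bytes taken by the element array alone (other members are ignored).
constexpr std::size_t storage_bytes(std::size_t max_bitsize)
{
  return max_elts(max_bitsize) * (kHostBitsPerWideInt / 8);
}

// Growth relative to the current 128-bit limit.
static_assert(storage_bytes(256) - storage_bytes(128) == 16,
              "256-bit limit adds 16 bytes");
static_assert(storage_bytes(512) - storage_bytes(128) == 48,
              "512-bit limit adds 48 bytes");
```

Under these assumptions the growth only affects wide_int temporaries, which is
the point made above about trees and rtl keeping their size.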

Re: [PATCH] Make BB vectorizer work on sub-BBs

2015-11-10 Thread Christophe Lyon
On 10 November 2015 at 14:02, Richard Biener  wrote:
> On Tue, 10 Nov 2015, Christophe Lyon wrote:
>
>> On 6 November 2015 at 12:11, Kyrill Tkachov  wrote:
>> > Hi Richard,
>> >
>> >
>> > On 06/11/15 11:09, Richard Biener wrote:
>> >>
>> >> On Fri, 6 Nov 2015, Richard Biener wrote:
>> >>
>> >>> The following patch makes the BB vectorizer not only handle BB heads
>> >>> (until the first stmt with a data reference it cannot handle) but
>> >>> arbitrary regions in a BB separated by such stmts.
>> >>>
>> >>> This improves the number of BB vectorizations from 469 to 556
>> >>> in a quick test on SPEC CPU 2006 with -Ofast on x86_64 and
>> >>> 1x400.perlbench 1x410.bwaves 1x416.gamess 1x450.soplex 1x453.povray
>> >>> 1x481.wrf failing both patched and unpatched (have to update my
>> >>> config used for such experiments it seems ...)
>> >>>
>> >>> Bootstrapped and tested on x86_64-unknown-linux-gnu, aarch64 cross built.
>> >>>
>> >>> I'm currently re-testing for a cosmetic change I made when writing
>> >>> the changelog.
>> >>>
>> >>> I expected (and there are) some issues with compile-time.  Left
>> >>> is unpatched and right is patched.
>> >>>
>> >>> '403.gcc': 00:00:54 (54)  | '403.gcc': 00:00:55 (55)
>> >>> '483.xalancbmk': 00:02:20 (140)   | '483.xalancbmk': 00:02:24 (144)
>> >>> '416.gamess': 00:02:36 (156)  | '416.gamess': 00:02:37 (157)
>> >>> '435.gromacs': 00:00:18 (18)  | '435.gromacs': 00:00:19 (19)
>> >>> '447.dealII': 00:01:31 (91)   | '447.dealII': 00:01:33 (93)
>> >>> '453.povray': 00:04:54 (294)  | '453.povray': 00:08:54 (534)
>> >>> '454.calculix': 00:00:34 (34) | '454.calculix': 00:00:52 (52)
>> >>> '481.wrf': 00:01:57 (117) | '481.wrf': 00:01:59 (119)
>> >>>
>> >>> other benchmarks are unchanged.  I'm double-checking now that a followup
>> >>> patch I have which re-implements BB vectorization dependence checking
>> >>> fixes this (that's the only quadraticness I know of).
>> >>
>> >> Fixes all but
>> >>
>> >> '453.povray': 00:04:54 (294)  | '453.povray': 00:06:46 (406)
>> >
>> >
>> > Note that povray is currently suffering from PR 68198
>> >
>>
>> Hi,
>>
>> I've also noticed that the new test bb-slp-38 fails on armeb:
>> FAIL: gcc.dg/vect/bb-slp-38.c -flto -ffat-lto-objects
>> scan-tree-dump-times slp2 "basic block part vectorized" 2
>> FAIL: gcc.dg/vect/bb-slp-38.c scan-tree-dump-times slp2 "basic block
>> part vectorized" 2
>>
>> I haven't checked in more detail, maybe it's similar to what we
>> discussed in PR65962
>
> Maybe though there is no misalignment involved as far as I can see.
>
> Please open a bug and attach vectorizer dumps.
>
OK, this is https://gcc.gnu.org/bugzilla/show_bug.cgi?id=68275

> Richard.
>
>> > Kyrill
>> >
>> >
>> >>
>> >> it even improves compile-time on some:
>> >>
>> >> '464.h264ref': 00:00:26 (26)  | '464.h264ref': 00:00:21 (21)
>> >>
>> >> it also increases the number of vectorized BBs to 722.
>> >>
>> >> Needs some work still though.
>> >>
>> >> Richard.
>> >>
>> >>> Richard.
>> >>>
>> >>> 2015-11-06  Richard Biener  
>> >>>
>> >>> * tree-vectorizer.h (struct _bb_vec_info): Add region_begin/end
>> >>> members.
>> >>> (vect_stmt_in_region_p): Declare.
>> >>> * tree-vect-slp.c (new_bb_vec_info): Work on a region.
>> >>> (destroy_bb_vec_info): Likewise.
>> >>> (vect_bb_slp_scalar_cost): Use vect_stmt_in_region_p.
>> >>> (vect_get_and_check_slp_defs): Likewise.
>> >>> (vect_slp_analyze_bb_1): Refactor to make it work on sub-BBs.
>> >>> (vect_slp_bb): Likewise.
>> >>> * tree-vect-patterns.c (vect_same_loop_or_bb_p): Implement
>> >>> in terms of vect_stmt_in_region_p.
>> >>> (vect_pattern_recog): Iterate over the BB region.
>> >>> * tree-vect-stmts.c (vect_is_simple_use): Use
>> >>> vect_stmt_in_region_p.
>> >>> * tree-vectorizer.c (vect_stmt_in_region_p): New function.
>> >>> (pass_slp_vectorize::execute): Initialize all stmt UIDs to -1.
>> >>>
>> >>> * config/i386/i386.c: Include gimple-iterator.h.
>> >>> * config/aarch64/aarch64.c: Likewise.
>> >>>
>> >>> * gcc.dg/vect/bb-slp-38.c: New testcase.
>> >>>
>> >>> Index: gcc/tree-vectorizer.h
>> >>> ===
>> >>> *** gcc/tree-vectorizer.h.orig  2015-11-05 09:52:00.640227178 +0100
>> >>> --- gcc/tree-vectorizer.h   2015-11-05 13:20:58.385786476 +0100
>> >>> *** nested_in_vect_loop_p (struct loop *loop
>> >>> *** 390,395 
>> >>> --- 390,397 
>> >>>typedef struct _bb_vec_info : public vec_info
>> >>>{
>> >>>  basic_block bb;
>> >>> +   gimple_stmt_iterator region_begin;
>> >>> +   gimple_stmt_iterator region_end;
>> >>>} *bb_vec_info;
>> >>>   #define BB_VINFO_BB(B)   (B)->bb
>> >>> *** void 

Re: [gomp4] Random omp-low.c backporting

2015-11-10 Thread Thomas Schwinge
Hi Nathan!

On Tue, 10 Nov 2015 09:19:50 -0500, Nathan Sidwell  wrote:
> I've committed this to backport a bunch of random bits from trunk to gomp4,
> and thereby reduce divergence.

Yeah, I had some of these on my list, too.

> --- omp-low.c (revision 230080)
> +++ omp-low.c (working copy)
> @@ -12515,7 +12485,7 @@ replace_oacc_fn_attrib (tree fn, tree di
> function attribute.  Push any that are non-constant onto the ARGS
> list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
>  
> -void
> +static void
>  set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
>  {
>/* Must match GOMP_DIM ordering.  */

[...]/gcc/omp-low.c: In function 'void set_oacc_fn_attrib(tree, tree, 
vec*)':
[...]/gcc/omp-low.c:12578:59: error: 'void set_oacc_fn_attrib(tree, tree, 
vec*)' was declared 'extern' and later 'static' [-fpermissive]
 set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
   ^
In file included from [...]/gcc/omp-low.c:71:0:
[...]/gcc/omp-low.h:36:13: error: previous declaration of 'void 
set_oacc_fn_attrib(tree, tree, vec*)' [-fpermissive]
 extern void set_oacc_fn_attrib (tree, tree, vec *);
 ^
Makefile:1083: recipe for target 'omp-low.o' failed
make[2]: *** [omp-low.o] Error 1

If it's intended to be static in gcc/omp-low.c, you'll need to change
gcc/tree-parloops.c:create_parallel_loop to not use it.

> @@ -15530,7 +15499,7 @@ lower_omp_target (gimple_stmt_iterator *
> case OMP_CLAUSE_MAP:
> case OMP_CLAUSE_TO:
> case OMP_CLAUSE_FROM:
> -   oacc_firstprivate_2:
> +   oacc_firstprivate_map:
>   nc = c;
>   ovar = OMP_CLAUSE_DECL (c);
>   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP

We got another label oacc_firstprivate above this one, which is why I had
named this oacc_firstprivate_2 -- no idea if oacc_firstprivate_map is a
"better" name.

> @@ -15581,9 +15550,9 @@ lower_omp_target (gimple_stmt_iterator *
>   x = build_sender_ref (ovar, ctx);
>  
>   if (OMP_CLAUSE_CODE (c) == OMP_CLAUSE_MAP
> -  && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
> -  && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
> -  && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
> + && OMP_CLAUSE_MAP_KIND (c) == GOMP_MAP_POINTER
> + && !OMP_CLAUSE_MAP_ZERO_BIAS_ARRAY_SECTION (c)
> + && TREE_CODE (TREE_TYPE (ovar)) == ARRAY_TYPE)
> {
>   gcc_assert (offloaded);
>   tree avar

Needs to be fixed on trunk, I think?

> @@ -15727,8 +15696,7 @@ lower_omp_target (gimple_stmt_iterator *
>  
> case OMP_CLAUSE_FIRSTPRIVATE:
>   if (is_oacc_parallel (ctx))
> -   goto oacc_firstprivate_2;
> - gcc_assert (!is_gimple_omp_oacc (ctx->stmt));
> +   goto oacc_firstprivate_map;
>   ovar = OMP_CLAUSE_DECL (c);
>   if (is_reference (ovar))
> talign = TYPE_ALIGN_UNIT (TREE_TYPE (TREE_TYPE (ovar)));

I had put in the "gcc_assert (!is_gimple_omp_oacc (ctx->stmt))" to make
sure we don't ever reach this for OpenACC kernels, which will not "goto
oacc_firstprivate_2" because that's only being done for
"is_oacc_parallel" (but not for "is_oacc_kernels").


Grüße
 Thomas




Re: [hsa 7/12] Disabling the vectorizer for GPU kernels/functions

2015-11-10 Thread Martin Jambor
On Fri, Nov 06, 2015 at 09:38:21AM +0100, Richard Biener wrote:
> On Thu, 5 Nov 2015, Martin Jambor wrote:
> 
> > Hi,
> > 
> > in the previous email I wrote we need to "change behavior" of a few
> > optimization passes.  One was the flattening of GPU functions and the
> > other two are in the patch below.  It all comes to that, at the
> > moment, we need to switch off the vectorizer (only for the GPU
> > functions, of course).
> > 
> > We are actually quite close to being able to handle gimple vector
> > input in HSA back-end but not all the way yet, and before allowing the
> > vectorizer again, we will have to make sure it never produces vectors
> > bigger than 128bits (in GPU functions).
> 
> Hmm.  I'd rather have this modify
> DECL_FUNCTION_SPECIFIC_OPTIMIZATION of the hsa function to get this
> effect.  I think I mentioned this to the OACC guys as well for a
> similar needs of them.

I see, that is a good idea.  I have reverted changes to
tree-ssa-loop.c and tree-vectorizer.c and on top of that committed the
following patch to the branch which makes modifications to HSA fndecls
at a more convenient spot and disables vectorization in the following
way:

  tree gdecl = gpu->decl;
  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
  if (fn_opts == NULL_TREE)
fn_opts = optimization_default_node;
  fn_opts = copy_node (fn_opts);
  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;

I hope that is what you meant.  I have also verified that it works.
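
For comparison only, a user can get a similar per-function effect at the source
level with GCC's "optimize" function attribute. The sketch below is a
source-level analogue, not what the HSA code does internally; the macro and
function names are hypothetical, and the attribute is compiled out for
non-GCC compilers.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical macro: ask GCC to disable both vectorizers for one function.
#if defined(__GNUC__) && !defined(__clang__)
#define NO_VECTORIZE \
  __attribute__((optimize("no-tree-loop-vectorize", "no-tree-slp-vectorize")))
#else
#define NO_VECTORIZE
#endif

// A simple reduction loop that the loop vectorizer would otherwise consider.
NO_VECTORIZE
long sum_no_vectorize(const int* data, std::size_t n)
{
  assert(data != nullptr || n == 0);
  long total = 0;
  for (std::size_t i = 0; i < n; ++i)
    total += data[i];
  return total;
}
```

The result of the function is unchanged; only the optimization options that
apply to this one function differ from the rest of the translation unit.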

Thanks,

Martin


2015-11-10  Martin Jambor  

* hsa.h (hsa_summary_t): Add a comment to method link_functions.
(hsa_summary_t::link_functions): Moved...
* hsa.c (hsa_summary_t::link_functions): ...here.  Added common fndecl
modifications.
Include stringpool.h.
* ipa-hsa.c (process_hsa_functions): Do not add flatten attribute
here.  Fixed comments.

diff --git a/gcc/hsa.c b/gcc/hsa.c
index ab05a1d..e63be95 100644
--- a/gcc/hsa.c
+++ b/gcc/hsa.c
@@ -34,6 +34,7 @@ along with GCC; see the file COPYING3.  If not see
 #include "alloc-pool.h"
 #include "cgraph.h"
 #include "print-tree.h"
+#include "stringpool.h"
 #include "symbol-summary.h"
 #include "hsa.h"
 
@@ -693,6 +694,40 @@ hsa_get_declaration_name (tree decl)
   return NULL;
 }
 
+/* Couple GPU and HOST as gpu-specific and host-specific implementation of the
+   same function.  KIND determines whether GPU is a host-invokable kernel or
+   gpu-callable function.  */
+
+inline void
+hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
+  hsa_function_kind kind)
+{
+  hsa_function_summary *gpu_summary = get (gpu);
+  hsa_function_summary *host_summary = get (host);
+
+  gpu_summary->m_kind = kind;
+  host_summary->m_kind = kind;
+
+  gpu_summary->m_gpu_implementation_p = true;
+  host_summary->m_gpu_implementation_p = false;
+
+  gpu_summary->m_binded_function = host;
+  host_summary->m_binded_function = gpu;
+
+  tree gdecl = gpu->decl;
+  DECL_ATTRIBUTES (gdecl)
+= tree_cons (get_identifier ("flatten"), NULL_TREE,
+DECL_ATTRIBUTES (gdecl));
+
+  tree fn_opts = DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl);
+  if (fn_opts == NULL_TREE)
+fn_opts = optimization_default_node;
+  fn_opts = copy_node (fn_opts);
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_loop_vectorize = false;
+  TREE_OPTIMIZATION (fn_opts)->x_flag_tree_slp_vectorize = false;
+  DECL_FUNCTION_SPECIFIC_OPTIMIZATION (gdecl) = fn_opts;
+}
+
 /* Add a HOST function to HSA summaries.  */
 
 void
diff --git a/gcc/hsa.h b/gcc/hsa.h
index 025de67..b6855ea 100644
--- a/gcc/hsa.h
+++ b/gcc/hsa.h
@@ -1161,27 +1161,14 @@ public:
   hsa_summary_t (symbol_table *table):
 function_summary (table) { }
 
+  /* Couple GPU and HOST as gpu-specific and host-specific implementation of
+ the same function.  KIND determines whether GPU is a host-invokable kernel
+ or gpu-callable function.  */
+
   void link_functions (cgraph_node *gpu, cgraph_node *host,
   hsa_function_kind kind);
 };
 
-inline void
-hsa_summary_t::link_functions (cgraph_node *gpu, cgraph_node *host,
-  hsa_function_kind kind)
-{
-  hsa_function_summary *gpu_summary = get (gpu);
-  hsa_function_summary *host_summary = get (host);
-
-  gpu_summary->m_kind = kind;
-  host_summary->m_kind = kind;
-
-  gpu_summary->m_gpu_implementation_p = true;
-  host_summary->m_gpu_implementation_p = false;
-
-  gpu_summary->m_binded_function = host;
-  host_summary->m_binded_function = gpu;
-}
-
 /* in hsa.c */
 extern struct hsa_function_representation *hsa_cfun;
 extern hash_map  *hsa_decl_kernel_dependencies;
diff --git a/gcc/ipa-hsa.c b/gcc/ipa-hsa.c
index b4cb58e..d77fa6b 100644
--- a/gcc/ipa-hsa.c
+++ b/gcc/ipa-hsa.c
@@ -90,16 +90,12 @@ 

Re: [PATCH 1/2] Fix invalid left shift of negative value.

2015-11-10 Thread Mike Stump
On Nov 10, 2015, at 3:13 AM, Dominik Vogt  wrote:
> On Tue, Nov 10, 2015 at 12:11:23PM +0100, Dominik Vogt wrote:
>> The following series of patches fixes all occurrences of
>> left-shifting negative constants in C code which is undefined by
>> the C standard.  The patches have been tested on s390x, covering
>> only a small subset of the changes.
> 
> Changes in gdb/.

So, should these go to the gdb list?
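
For readers who have not seen the pattern being fixed: an expression such as
-1 << n shifts a negative value, which the C standard leaves undefined. The
sketch below shows two common well-defined rewrites. It is an illustration
under assumptions, not code from the patch series; the function names are
hypothetical, and the unsigned-to-signed conversion is implementation-defined
before C++20 (GCC wraps modulo 2^32).

```cpp
#include <cassert>
#include <cstdint>

// Builds the mask that "-1 << n" is usually meant to produce by shifting an
// unsigned value, where the shift is fully defined, then converting back.
inline std::int32_t mask_clearing_low_bits(unsigned n)
{
  assert(n < 32);
  const std::uint32_t bits = ~std::uint32_t{0} << n;
  return static_cast<std::int32_t>(bits);  // wraps on GCC
}

// Computes -(2^n) by shifting the positive value and negating afterwards.
inline std::int32_t negative_power_of_two(unsigned n)
{
  assert(n < 31);
  return -(std::int32_t{1} << n);
}
```

Both helpers return -16 for n == 4, which is the value code written as
-1 << 4 typically expects.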


Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-10 Thread Ramana Radhakrishnan
On Tue, Nov 10, 2015 at 6:01 PM, Ramana Radhakrishnan
 wrote:
> On Tue, Nov 10, 2015 at 5:50 PM, Evandro Menezes  
> wrote:
>>2015-11-10  Evandro Menezes 
>>
>>gcc/
>>
>>* config/aarch64/aarch64.md (predicated): Copy attribute from
>>"arm.md".
>>
>> This patch duplicates an attribute from arm.md so that the same pipeline
>> model can be used for both AArch32 and AArch64.
>
> I'm not an aarch64 maintainer so I cannot approve.
>
> There are no predicated instructions in aarch64 - thus it's best imho
> to have only one option, "no" and not even give the option for someone
> to accidentally set this to yes.

Scratch that - I had a brain fade.

Ramana

>
> regards
> Ramana
>
>
>>
>> Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.
>>
>> Please, commit if it's alright.
>>
>> --
>> Evandro Menezes
>>
>>


Re: [PATCH], Add power9 support to GCC, patch #6 (IEEE 128-bit hardware support)

2015-11-10 Thread Michael Meissner
On Tue, Nov 10, 2015 at 12:41:07AM +, Joseph Myers wrote:
> I don't see any conversions between KFmode and TImode (in either 
> direction, signed or unsigned) here - I suppose there are no instructions 
> for that?

No, in power9 there is no instruction that converts 128-bit integer to IEEE
128-bit floating point or vice versa.

> If so, I would guess (without having tested it) that it is more efficient 
> to use the libgcc2 implementations of those functions (whether copied, or 
> with some logic to build selected libgcc2.c functions for KFmode), which 
> implement them using a few hardware operations on DImode [note that where 
> libgcc2.c has e.g. __floatditf, that gets mapped to __floattitf for 64-bit 
> systems], than to use the soft-fp implementations doing everything with 
> integer arithmetic.  (There are IEEE exceptions issues with the libgcc2.c 
> conversions from double-word integers to floating-point - see bug 59412 - 
> but since that's a preexisting issue for all architectures using this 
> code, it's clearly not your problem to fix.)
> 
> Ideally, I'd think that for optimal efficiency if objects built for power8 
> are linked with libgcc built for power9, or if an executable using shared 
> libgcc that was built for power8 gets run with shared libgcc for power9, 
> you'd want power9 libgcc to contain t-hardfp versions of all the functions 
> that can be expanded inline for power9, and libgcc2 versions of those 
> (such as TImode comparisons) that aren't expanded inline, but not to 
> contain soft-fp versions of any of those KFmode functions.  Cf. how 
> config.host ensures various 32-bit powerpc variants use the right mixture 
> of hardfp and soft-fp functions.  It's a bit fiddly to make sure you get 
> the preferred implementation of every function and that the ABI doesn't 
> change depending on the configured processor, but not that hard.

Yep, that is my thinking. 

> Since none of the libgcc pieces for KFmode support are yet in, and the 
> proposed changes are optimizations rather than a matter of correctness, 
> none of the above should directly affect this patch in any way - it simply 
> indicates desirable followup once both the libgcc soft-fp KFmode support, 
> and this patch, are in.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c

2015-11-10 Thread Joseph Myers
On Tue, 10 Nov 2015, Richard Biener wrote:

> Looks ok but I wonder if this is dead code with
> 
> (for pows (POW)
>  sqrts (SQRT)
>  cbrts (CBRT)
>  (simplify
>   (pows @0 REAL_CST@1)
>   (with {
> const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
> REAL_VALUE_TYPE tmp;
>}
>(switch
> ...
> /* pow(x,0.5) -> sqrt(x).  */
> (if (flag_unsafe_math_optimizations
>  && canonicalize_math_p ()
>  && real_equal (value, &dconsthalf))
>  (sqrts @0))
> 
> also wondering here about canonicalize_math_p (), I'd expected the
> reverse transform as canonicalization.  Also wondering about
> flag_unsafe_math_optimizations (missing from the vectorizer pattern).

pow(x,0.5) -> sqrt(x) is unsafe because: pow (-0, 0.5) is specified in 
Annex F to be +0 but sqrt (-0) is -0; pow (-Inf, 0.5) is specified in 
Annex F to be +Inf, but sqrt (-Inf) is NaN with "invalid" exception 
raised.  I think it's safe in other cases (the reverse of course is not 
safe, sqrt is a fully-specified correctly-rounded IEEE operation and pow 
isn't).
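
The two special cases above can be checked with a small program. This is a
sketch that assumes a libm conforming to Annex F and a build without
-ffast-math or similar flags; the function names are made up for the example.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

// volatile keeps the compiler from folding the calls at compile time.
inline bool differs_at_negative_zero()
{
  volatile double x = -0.0;
  const double p = std::pow(x, 0.5);   // Annex F: +0
  const double s = std::sqrt(x);       // IEEE 754: -0
  return p == 0.0 && s == 0.0 && !std::signbit(p) && std::signbit(s);
}

inline bool differs_at_negative_infinity()
{
  volatile double x = -std::numeric_limits<double>::infinity();
  const double p = std::pow(x, 0.5);   // Annex F: +Inf
  const double s = std::sqrt(x);       // NaN, "invalid" raised
  return std::isinf(p) && p > 0.0 && std::isnan(s);
}

// Asserts that both inputs really do distinguish pow(x, 0.5) from sqrt(x).
inline void check_pow_half_vs_sqrt()
{
  assert(differs_at_negative_zero());
  assert(differs_at_negative_infinity());
}
```

These are the inputs for which replacing pow(x, 0.5) with sqrt(x) changes the
observable result, which is why the transform is guarded by
flag_unsafe_math_optimizations.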

-- 
Joseph S. Myers
jos...@codesourcery.com


[PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-10 Thread Evandro Menezes

   2015-11-10  Evandro Menezes 

   gcc/

   * config/aarch64/aarch64.md (predicated): Copy attribute from
   "arm.md".

This patch duplicates an attribute from arm.md so that the same pipeline 
model can be used for both AArch32 and AArch64.


Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.

--
Evandro Menezes


From 3b643a3c026350864713e1700dc44e4794d93809 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:11:16 -0600
Subject: [PATCH 1/2] [AArch64] Add attribute for compatibility with ARM
 pipeline models

gcc/
	* config/aarch64/aarch64.md (predicated): Copy attribute from "arm.md".
---
 gcc/config/aarch64/aarch64.md | 5 +
 1 file changed, 5 insertions(+)

diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 6b08850..2bc2ff5 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -195,6 +195,11 @@
 ;; 1 :=: yes
 (define_attr "far_branch" "" (const_int 0))
 
+;; [For compatibility with ARM in pipeline models]
+;; Attribute that specifies whether or not the instruction is executed
+;; conditionally ( != "AL"? "yes": "no").
+(define_attr "predicated" "yes,no" (const_string "no"))
+
 ;; ---
 ;; Pipeline descriptions and scheduling
 ;; ---
-- 
2.1.0.243.g30d45f7



Re: [PATCH, i386]: Fix gcc.c-torture/compile/pr41634.c FAIL

2015-11-10 Thread Uros Bizjak
On Tue, Nov 10, 2015 at 7:00 PM, Richard Henderson  wrote:
> On 11/10/2015 06:54 PM, Uros Bizjak wrote:
>>
>> -  return "movabs{}\t{%1, %0|%0, %1}";
>> +  return "movabs{}\t{%1, %P0|[%P0], %1}";
>
>
> The thing that's missing from this, that's present in the patch that I sent
> you off-list, is the  thing for Intel syntax.
>
> Would you prefer to just add that back here via , rather than
> using a new %v specifier like in my patch?

I have opted for the same assembly code as it was generated
previously. But, since we have macroized pattern and already available
mode attribute, I'd prefer to use  PTR [...]. There are
already a couple of examples using this approach in i386.md.

BTW: gas is able to determine pointer size from register name, so
having PTR prefix does not change generated object code.

Uros.


Re: State of support for the ISO C++ Transactional Memory TS and remanining work

2015-11-10 Thread Szabolcs Nagy

On 09/11/15 00:19, Torvald Riegel wrote:

Hi,

I'd like to summarize the current state of support for the TM TS, and
outline the current plan for the work that remains to complete the
support.

I'm aware we're at the end of stage 1, but I'm confident we can still
finish this work and hope to include it in GCC 6 because:
(1) most of the support is already in GCC, and we have a big head start
in the space of TM so it would be unfortunate to waste that by not
delivering support for the TM TS,
(2) this is a TS and support for it is considered experimental,
(3) most of the affected code is in libitm or the compiler's TM passes,
which has to be enabled explicitly by the user.

Currently, we have complete support for the syntax and all necessary
instrumentation except the exception handling bits listed below.  libitm
has a good set of STM and HTM-based algorithms.


What is missing on the compiler side is essentially a change of how we
support atomic_noexcept and atomic_cancel, in particular exception
handling.  Instead of just using a finally block as done currently, the
compiler needs to build a catch clause so that it can actively intercept
exceptions that escape an atomic_noexcept or atomic_cancel.  For
atomic_noexcept, the compiler needs to include a call to abort() in the
catch clause.
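
In plain C++ terms, the catch clause described for atomic_noexcept behaves
roughly like the wrapper below. This is only a conceptual model under stated
assumptions: the helper names are hypothetical, nothing here is transactional,
and the real lowering happens inside the compiler's TM passes rather than in a
library template.

```cpp
#include <cassert>
#include <cstdlib>

// Hypothetical model: run BODY and abort if any exception tries to escape,
// instead of merely running cleanup as a finally block would.
template <typename Body>
void run_like_atomic_noexcept(Body body)
{
  try {
    body();  // stands in for the instrumented transaction body
  } catch (...) {
    std::abort();  // an escaping exception terminates the program
  }
}

// Usage example: a body that does not throw completes normally.
inline int demo_noexcept_like()
{
  int x = 0;
  run_like_atomic_noexcept([&x] { x = 42; });
  assert(x == 42);
  return x;
}
```

The difference from a finally block is that the handler is entered only when an
exception is actually propagating, so the compiler-generated code can act on
the exception itself.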


For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch
clause, and use NULL as exception argument.  This can then be used by
libitm to look at the currently being handled exception and (a) check
whether the type supports transaction cancellation as specified by the TS
and (b) pick out the allocations that belong to this exception and roll
back everything else before rethrowing this exception.

For (a), it's probably best to place this check into libstdc++
(specifically, libsupc++ I suppose) because support for transaction
cancellation is a property that library parts of the standard (or the
TS) require, and that has to match the implementation in libstdc++.
Attached is a patch by Jason that implements this check.  This adds one
symbol, which we hope should be okay.



does that mean libitm will depend on libstdc++?

i think the tm runtime should not depend on c++,
so it is usable from c code.


For (b), our plan is to track the additional allocations that happen
during construction of the exception types that support
cancellation (eg, creating the what() string for logic_error).  There
are several ways to do that, one of them being that we create custom
transactional clones of those constructors that tell libitm that either
such a constructor is currently running or explicitly list the
allocations that have been made by the constructor; eventually, we would
always (copy) construct into memory returned by cxa_allocate_exception,
which then makes available the necessary undo information when such an
exception is handled in libitm.


The other big piece of missing support is making sure that the functions
that are specified in the TS as transaction_safe are indeed that.  I
believe we do not need to actually add such annotations to any libstdc++
functions that are already transaction-safe and completely defined in
headers -- those functions are implicitly transaction-safe, and we can
thus let the compiler instrument them at the point of use inside of a
transaction.

If a supposedly transaction-safe function is not defined in a header,
we'd need a transaction_safe annotation at the declaration.  Jason has
implemented the TM TS feature test macro, so we can add the
annotation only if the user has enabled support for the TM TS in the
respective compilation process.


sounds ugly: the function type (transaction-safety)
depends on the visibility of the definition..


We also need to ensure that there is a transactional clone of the function.
This will add symbols to libstdc++, but these all have their own special
prefix in the mangled name.  I'd like to get feedback on how to best
trigger the instrumentation and make it a part of a libstdc++ build.
(If that turns out to be too problematic, we could still fall back to
writing transactional clones manually.)
For the clones of the constructors of the types that support
cancellation, I suppose manually written clones might be easier than
automatic instrumentation.

I've not yet created tests for the full list of functions specified as
transaction-safe in the TS, but my understanding is that this list was
created after someone from the ISO C++ TM study group looked at
libstdc++'s implementation and investigated which functions might be
feasible to be declared transaction-safe in it.



is that list available somewhere?

libitm seems to try to allow allocating functions in
transactions, but syscalls are not supposed to be
transaction safe.

are allocating functions prohibited?


I'm looking forward to your feedback.

Thanks,

Torvald



i'm not familiar with libitm, but i see several implementation
issues:

xmalloc
  the runtime exits on memory allocation failure,
 

[PATCH][ARM] PR 68149 Fix ICE in unaligned_loaddi split

2015-11-10 Thread Kyrill Tkachov

Hi all,

The ICE in this PR occurs when we're trying to split unaligned_loaddi into two
SImode unaligned loads.  The problem is in the addressing mode.  When reload was
picking the addressing mode we accepted an offset of -256 because the mode in
the pattern is advertised as DImode, and that was accepted by the legitimate
address hooks because they thought it was a NEON load (DImode is in
VALID_NEON_DREG_MODE).  However, the splitter wants to generate two normal
SImode unaligned loads using that address, for which -256 is not valid, so we
ICE in gen_lowpart.

The only way unaligned_loaddi could be generated was through the
gen_movmem_ldrd_strd expansion that implements a memmove using LDRD and STRD
sequences.  If the memmove source is not aligned we can't use LDRDs, so the code
generates unaligned_loaddi patterns and expects them to be split into two normal
loads after reload.  Similarly for unaligned store destinations.

This patch just explicitly generates the two unaligned SImode loads or stores
when appropriate inside gen_movmem_ldrd_strd.  This makes the unaligned_loaddi
and unaligned_storedi patterns unused, so we can remove them.
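
To illustrate the idea outside of RTL, here is a minimal C++ sketch of copying
one 8-byte chunk as two back-to-back 4-byte accesses that make no alignment
assumption.  The helper names are hypothetical and this is not the code the
expander emits; memcpy merely stands in for the unaligned SImode loads and
stores.

```cpp
#include <cstdint>
#include <cstring>

// Hypothetical stand-in for one unaligned SImode load.
static inline std::uint32_t load_word_unaligned (const unsigned char *p)
{
  std::uint32_t w;
  std::memcpy (&w, p, sizeof w);
  return w;
}

// Hypothetical stand-in for one unaligned SImode store.
static inline void store_word_unaligned (unsigned char *p, std::uint32_t w)
{
  std::memcpy (p, &w, sizeof w);
}

// Copy 8 bytes as two consecutive word accesses instead of one
// double-word access, so neither pointer needs to be 8-byte aligned.
void copy_doubleword_as_two_words (unsigned char *dst, const unsigned char *src)
{
  std::uint32_t first = load_word_unaligned (src);
  std::uint32_t second = load_word_unaligned (src + 4);
  store_word_unaligned (dst, first);
  store_word_unaligned (dst + 4, second);
}
```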

This patch fixes the ICE in gcc.target/aarch64/advsimd-intrinsics/vldX.c seen
with
-mthumb -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard -mfp16-format=ieee
so no new testcase is added.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?

Thanks,
Kyrill

2015-11-10  Kyrylo Tkachov  

PR target/68149
* config/arm/arm.md (unaligned_loaddi): Delete.
(unaligned_storedi): Likewise.
* config/arm/arm.c (gen_movmem_ldrd_strd): Don't generate
unaligned DImode memory ops.  Instead perform two back-to-back
unaligned SImode ops.
commit 51849126dbef9ebdd95e0ee4dbcd84361e22c992
Author: Kyrylo Tkachov 
Date:   Tue Nov 3 17:36:38 2015 +

[ARM] Fix ICE in unaligned_loaddi split

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 71e704c..eafcb9c 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -14911,21 +14911,41 @@ gen_movmem_ldrd_strd (rtx *operands)
   if (!(dst_aligned || src_aligned))
 return arm_gen_movmemqi (operands);
 
-  src = adjust_address (src, DImode, 0);
-  dst = adjust_address (dst, DImode, 0);
+  /* If the either src or dst is unaligned we'll be accessing it as pairs
+ of unaligned SImode accesses.  Otherwise we can generate DImode
+ ldrd/strd instructions.  */
+  src = adjust_address (src, src_aligned ? DImode : SImode, 0);
+  dst = adjust_address (dst, dst_aligned ? DImode : SImode, 0);
+
   while (len >= 8)
 {
   len -= 8;
   reg0 = gen_reg_rtx (DImode);
+  rtx low_reg = NULL_RTX;
+  rtx hi_reg = NULL_RTX;
+
+  if (!src_aligned || !dst_aligned)
+	{
+	  low_reg = gen_lowpart (SImode, reg0);
+	  hi_reg = gen_highpart_mode (SImode, DImode, reg0);
+	}
   if (src_aligned)
 emit_move_insn (reg0, src);
   else
-emit_insn (gen_unaligned_loaddi (reg0, src));
+	{
+	  emit_insn (gen_unaligned_loadsi (low_reg, src));
+	  src = next_consecutive_mem (src);
+	  emit_insn (gen_unaligned_loadsi (hi_reg, src));
+	}
 
   if (dst_aligned)
 emit_move_insn (dst, reg0);
   else
-emit_insn (gen_unaligned_storedi (dst, reg0));
+	{
+	  emit_insn (gen_unaligned_storesi (dst, low_reg));
+	  dst = next_consecutive_mem (dst);
+	  emit_insn (gen_unaligned_storesi (dst, hi_reg));
+	}
 
   src = next_consecutive_mem (src);
   dst = next_consecutive_mem (dst);
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 1e40b17..42f961f 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -4362,59 +4362,6 @@ (define_insn "unaligned_storehi"
(set_attr "predicable_short_it" "yes,no")
(set_attr "type" "store1")])
 
-;; Unaligned double-word load and store.
-;; Split after reload into two unaligned single-word accesses.
-;; It prevents lower_subreg from splitting some other aligned
-;; double-word accesses too early. Used for internal memcpy.
-
-(define_insn_and_split "unaligned_loaddi"
-  [(set (match_operand:DI 0 "s_register_operand" "=l,r")
-	(unspec:DI [(match_operand:DI 1 "memory_operand" "o,o")]
-		   UNSPEC_UNALIGNED_LOAD))]
-  "unaligned_access && TARGET_32BIT"
-  "#"
-  "&& reload_completed"
-  [(set (match_dup 0) (unspec:SI [(match_dup 1)] UNSPEC_UNALIGNED_LOAD))
-   (set (match_dup 2) (unspec:SI [(match_dup 3)] UNSPEC_UNALIGNED_LOAD))]
-  {
-operands[2] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[3] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-
-/* If the first destination register overlaps with the base address,
-   swap the order in which the loads are emitted.  */
-if (reg_overlap_mentioned_p (operands[0], operands[1]))
-  {
-std::swap (operands[1], operands[3]);
-std::swap (operands[0], operands[2]);

[PATCH, i386]: Fix gcc.c-torture/compile/pr41634.c FAIL

2015-11-10 Thread Uros Bizjak
Hello!

Recent AS patches introduced the above compilation failure. We have to
treat movabs operands in a special way - without %rip and inside
square brackets for -masm=intel.

Also, the patch removes dead code while at it.

2015-11-10  Uros Bizjak  

* config/i386/i386.c (ix86_print_operand): Remove dead code that
tried to avoid (%rip) for call operands.

2015-11-10  Uros Bizjak  

* config/i386/i386.c (ix86_print_operand_address_as): Add no_rip
argument.  Do not use RIP relative addressing when no_rip is set.
(ix86_print_operand): Update call to ix86_print_operand_address_as.
(ix86_print_operand_address): Ditto.
* config/i386/i386.md (*movabs_1): Use %P modifier for
absolute movabs operand 0.  Add square braces for -masm=intel.
(*movabs_2): Ditto for operand 1.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32} and
committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 230084)
+++ config/i386/i386.md (working copy)
@@ -2601,7 +2601,7 @@
   switch (which_alternative)
 {
 case 0:
-  return "movabs{}\t{%1, %0|%0, %1}";
+  return "movabs{}\t{%1, %P0|[%P0], %1}";
 case 1:
   return "mov{}\t{%1, %0|%0, %1}";
 default:
@@ -2625,7 +2625,7 @@
   switch (which_alternative)
 {
 case 0:
-  return "movabs{}\t{%1, %0|%0, %1}";
+  return "movabs{}\t{%P1, %0|%0, [%P1]}";
 case 1:
   return "mov{}\t{%1, %0|%0, %1}";
 default:
Index: config/i386/i386.c
===
--- config/i386/i386.c  (revision 230084)
+++ config/i386/i386.c  (working copy)
@@ -80,7 +80,7 @@ along with GCC; see the file COPYING3.  If not see
 static rtx legitimize_dllimport_symbol (rtx, bool);
 static rtx legitimize_pe_coff_extern_decl (rtx, bool);
 static rtx legitimize_pe_coff_symbol (rtx, bool);
-static void ix86_print_operand_address_as (FILE *file, rtx addr, addr_space_t);
+static void ix86_print_operand_address_as (FILE *, rtx, addr_space_t, bool);
 
 #ifndef CHECK_STACK_LIMIT
 #define CHECK_STACK_LIMIT (-1)
@@ -17131,13 +17131,6 @@ ix86_print_operand (FILE *file, rtx x, int code)
 {
   rtx addr = XEXP (x, 0);
 
-  /* Avoid (%rip) for call operands.  */
-  if (code == 'P' && CONSTANT_ADDRESS_P (x) && !CONST_INT_P (x))
-   {
- output_addr_const (file, addr);
- return;
-   }
-
   /* No `byte ptr' prefix for call instructions ... */
   if (ASSEMBLER_DIALECT == ASM_INTEL && code != 'X' && code != 'P')
{
@@ -17187,7 +17180,8 @@ ix86_print_operand (FILE *file, rtx x, int code)
   if (this_is_asm_operands && ! address_operand (addr, VOIDmode))
output_operand_lossage ("invalid constraints for operand");
   else
-   ix86_print_operand_address_as (file, addr, MEM_ADDR_SPACE (x));
+   ix86_print_operand_address_as
+ (file, addr, MEM_ADDR_SPACE (x), code == 'p' || code == 'P');
 }
 
   else if (CONST_DOUBLE_P (x) && GET_MODE (x) == SFmode)
@@ -17272,7 +17266,8 @@ ix86_print_operand_punct_valid_p (unsigned char co
 /* Print a memory operand whose address is ADDR.  */
 
 static void
-ix86_print_operand_address_as (FILE *file, rtx addr, addr_space_t as)
+ix86_print_operand_address_as (FILE *file, rtx addr,
+  addr_space_t as, bool no_rip)
 {
   struct ix86_address parts;
   rtx base, index, disp;
@@ -17346,7 +17341,7 @@ static void
 }
 
   /* Use one byte shorter RIP relative addressing for 64bit mode.  */
-  if (TARGET_64BIT && !base && !index)
+  if (TARGET_64BIT && !base && !index && !no_rip)
 {
   rtx symbol = disp;
 
@@ -17505,7 +17500,7 @@ static void
 static void
 ix86_print_operand_address (FILE *file, machine_mode /*mode*/, rtx addr)
 {
-  ix86_print_operand_address_as (file, addr, ADDR_SPACE_GENERIC);
+  ix86_print_operand_address_as (file, addr, ADDR_SPACE_GENERIC, false);
 }
 
 /* Implementation of TARGET_ASM_OUTPUT_ADDR_CONST_EXTRA.  */


Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-10 Thread Joseph Myers
On Tue, 10 Nov 2015, Marek Polacek wrote:

> While both C and C++ FEs are able to reject e.g.
> int a[__SIZE_MAX__ / sizeof(int)];
> they are accepting code such as
> int (*a)[__SIZE_MAX__ / sizeof(int)];
> 
> As Joseph pointed out, any construction of a non-VLA type whose size is half
> or more of the address space should receive a compile-time error.
> 
> Done by moving up the check for the size in bytes so that it checks every
> non-VLA complete array type constructed in the course of processing the
> declarator.  Since the C++ FE had the same problem, I've fixed it up there as
> well.  And that's why I had to tweak dg-error of two C++ tests; if the size of
> an array is considered invalid, we give an error message with the word "unnamed".
> 
> (I've removed the comment about crashing in tree_to_[su]hwi since that seems
> to no longer be the case.)
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?

The C front-end changes are OK.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 3a/4][AArch64] Add attribute for compatibility with ARM pipeline models

2015-11-10 Thread Ramana Radhakrishnan
On Tue, Nov 10, 2015 at 5:50 PM, Evandro Menezes  wrote:
>2015-11-10  Evandro Menezes 
>
>gcc/
>
>* config/aarch64/aarch64.md (predicated): Copy attribute from
>"arm.md".
>
> This patch duplicates an attribute from arm.md so that the same pipeline
> model can be used for both AArch32 and AArch64.

I'm not an aarch64 maintainer so I cannot approve.

There are no predicated instructions in aarch64 - thus it's best imho
to have only one option, "no" and not even give the option for someone
to accidentally set this to yes.

regards
Ramana


>
> Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.
>
> Please, commit if it's alright.
>
> --
> Evandro Menezes
>
>


Re: [PATCH], Add power9 support to GCC, patch #8 (add integer multiply/add)

2015-11-10 Thread Michael Meissner
This patch adds support for the MADDLD instruction, which is a fused
multiply/add instruction for integers.  At this time, it is for 64-bit
multiplies only.  Eventually, we will restructure 128-bit multiply so that we
can use the 64x64 + 64 high bit variants.

I have bootstrapped a compiler with this change in and there were no
regressions.  Is it ok to apply to the trunk?

[gcc]
2015-11-10  Michael Meissner  

* config/rs6000/rs6000.h (TARGET_MADDLD): Add support for the ISA
3.0 integer multiply-add instruction.
* config/rs6000/rs6000.md (mul3): Likewise.

[gcc/testsuite]
2015-11-10  Michael Meissner  

* gcc.target/powerpc/maddld.c: New test.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/rs6000.h
===
--- gcc/config/rs6000/rs6000.h  (revision 230078)
+++ gcc/config/rs6000/rs6000.h  (working copy)
@@ -571,6 +571,7 @@ extern int rs6000_vector_align[];
 #define TARGET_FCTIWUZ TARGET_POPCNTD
 #define TARGET_CTZ TARGET_MODULO
 #define TARGET_EXTSWSLI (TARGET_MODULO && TARGET_POWERPC64)
+#define TARGET_MADDLD  (TARGET_MODULO && TARGET_POWERPC64)
 
 #define TARGET_XSCVDPSPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
 #define TARGET_XSCVSPDPN   (TARGET_DIRECT_MOVE || TARGET_P8_VECTOR)
Index: gcc/config/rs6000/rs6000.md
===
--- gcc/config/rs6000/rs6000.md (revision 230078)
+++ gcc/config/rs6000/rs6000.md (working copy)
@@ -2837,6 +2837,14 @@ (define_expand "mul3"
   DONE;
 })
 
+(define_insn "*maddld4"
+  [(set (match_operand:DI 0 "gpc_reg_operand" "=r")
+   (plus:DI (mult:DI (match_operand:DI 1 "gpc_reg_operand" "r")
+ (match_operand:DI 2 "gpc_reg_operand" "r"))
+(match_operand:DI 3 "gpc_reg_operand" "r")))]
+  "TARGET_MADDLD"
+  "maddld %0,%1,%2,%3"
+  [(set_attr "type" "mul")])
 
 (define_insn "udiv3"
   [(set (match_operand:GPR 0 "gpc_reg_operand" "=r")
Index: gcc/testsuite/gcc.target/powerpc/maddld.c
===
--- gcc/testsuite/gcc.target/powerpc/maddld.c   (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/maddld.c   (revision 0)
@@ -0,0 +1,20 @@
+/* { dg-do compile { target { powerpc*-*-* && lp64 } } } */
+/* { dg-require-effective-target powerpc_p9modulo_ok } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-options "-mcpu=power9 -O2" } */
+
+long
+s_madd (long a, long b, long c)
+{
+  return (a * b) + c;
+}
+
+unsigned long
+u_madd (unsigned long a, unsigned long b, unsigned long c)
+{
+  return (a * b) + c;
+}
+
+/* { dg-final { scan-assembler-times "maddld " 2 } } */
+/* { dg-final { scan-assembler-not   "mulld "} } */
+/* { dg-final { scan-assembler-not   "add "  } } */


[PATCH 3b/4][AArch64] Add scheduling model for Exynos M1

2015-11-10 Thread Evandro Menezes

   2015-11-10  Evandro Menezes 

   gcc/

   * config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
   * config/aarch64/aarch64.md: Include "exynos-m1.md".
   * config/arm/arm-cores.def: Use the Exynos M1 sched model.
   * config/arm/arm.md: Include "exynos-m1.md".
   * config/arm/arm-tune.md: Regenerated.
   * config/arm/exynos-m1.md: New file.

This patch adds the scheduling model for Exynos M1.  It depends on 
https://gcc.gnu.org/ml/gcc-patches/2015-11/msg01257.html


Bootstrapped on arm-unknown-linux-gnueabihf, aarch64-unknown-linux-gnu.

Please, commit if it's alright.

--
Evandro Menezes


From 0b7b6d597e5877c78c4d88e0d4491858555a5364 Mon Sep 17 00:00:00 2001
From: Evandro Menezes 
Date: Mon, 9 Nov 2015 17:18:52 -0600
Subject: [PATCH 2/2] [AArch64] Add scheduling model for Exynos M1

gcc/
	* config/aarch64/aarch64-cores.def: Use the Exynos M1 sched model.
	* config/aarch64/aarch64.md: Include "exynos-m1.md".
	* config/arm/arm-cores.def: Use the Exynos M1 sched model.
	* config/arm/arm.md: Include "exynos-m1.md".
	* config/arm/arm-tune.md: Regenerated.
	* config/arm/exynos-m1.md: New file.
---
 gcc/config/aarch64/aarch64-cores.def |   2 +-
 gcc/config/aarch64/aarch64.md|   1 +
 gcc/config/arm/arm-cores.def |   2 +-
 gcc/config/arm/arm.md|   3 +-
 gcc/config/arm/exynos-m1.md  | 947 +++
 5 files changed, 952 insertions(+), 3 deletions(-)
 create mode 100644 gcc/config/arm/exynos-m1.md

diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 0ab1ca8..c17baa3 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -43,7 +43,7 @@
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
-AARCH64_CORE("exynos-m1",   exynosm1,  cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
+AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa72, "0x53", "0x001")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
 AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xgene1, "0x50", "0x000")
 
diff --git a/gcc/config/aarch64/aarch64.md b/gcc/config/aarch64/aarch64.md
index 2bc2ff5..18f5547 100644
--- a/gcc/config/aarch64/aarch64.md
+++ b/gcc/config/aarch64/aarch64.md
@@ -210,6 +210,7 @@
 ;; Scheduling
 (include "../arm/cortex-a53.md")
 (include "../arm/cortex-a57.md")
+(include "../arm/exynos-m1.md")
 (include "thunderx.md")
 (include "../arm/xgene1.md")
 
diff --git a/gcc/config/arm/arm-cores.def b/gcc/config/arm/arm-cores.def
index 4c35200..3448e82 100644
--- a/gcc/config/arm/arm-cores.def
+++ b/gcc/config/arm/arm-cores.def
@@ -168,7 +168,7 @@ ARM_CORE("cortex-a17.cortex-a7", cortexa17cortexa7, cortexa7,	7A,	ARM_FSET_MAKE_
 ARM_CORE("cortex-a53",	cortexa53, cortexa53,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a53)
 ARM_CORE("cortex-a57",	cortexa57, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("cortex-a72",	cortexa72, cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
-ARM_CORE("exynos-m1",	exynosm1,  cortexa57,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
+ARM_CORE("exynos-m1",	exynosm1,  exynosm1,	8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_CRC32 | FL_FOR_ARCH8A), cortex_a57)
 ARM_CORE("xgene1",  xgene1,xgene1,  8A,	ARM_FSET_MAKE_CPU1 (FL_LDSCHED | FL_FOR_ARCH8A),xgene1)
 
 /* V8 big.LITTLE implementations */
diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8ebb1bf..f14cd0e 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -377,7 +377,7 @@
 arm1136jfs,cortexa5,cortexa7,cortexa8,\
 cortexa9,cortexa12,cortexa15,cortexa17,\
 cortexa53,cortexa57,cortexm4,cortexm7,\
-marvell_pj4,xgene1")
+exynosm1,marvell_pj4,xgene1")
 	   (eq_attr "tune_cortexr4" "yes"))
   (const_string "no")
   (const_string "yes"
@@ -416,6 +416,7 @@
 (include "cortex-m7.md")
 (include "cortex-m4.md")
 (include "cortex-m4-fpu.md")
+(include "exynos-m1.md")
 (include "vfp11.md")
 (include "marvell-pj4.md")
 (include "xgene1.md")
diff --git a/gcc/config/arm/exynos-m1.md b/gcc/config/arm/exynos-m1.md
new file mode 100644
index 000..fd73353
--- /dev/null
+++ 

[PATCH] gcc.c: new macro POST_LINK_SPEC to be able to add additional steps after linking

2015-11-10 Thread Andris Pavenis

One may need to execute extra steps after linking a program.  This is required,
for example, for DJGPP to run stubify.exe on the file generated by the linker.

The only way to achieve this so far was to use LINK_COMMAND_SPEC.  It would be
much easier and less error prone to use the new macro POST_LINK_SPEC introduced
in this patch.
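
For illustration, a port could then define the macro in its target header
roughly as follows.  The spec string is a hypothetical sketch and is not taken
from this patch: it assumes a stubify tool that takes the output file name as
its argument, and it assumes the usual %{o*:%*} spec substitution to pass the
name given with -o (falling back to a.out when no -o is present).

```cpp
/* Hypothetical port header fragment; the command line below is an
   assumption made for illustration, not part of the posted patch.  */
#undef POST_LINK_SPEC
#define POST_LINK_SPEC "stubify %{o*:%*} %{!o*:a.out}"
```

With the LINK_COMMAND_SPEC change in this patch, such a string would be run as
a separate command after the linker command.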

Andris

ChangeLog entry

2015 Nov 10 Andris Pavenis 

* gcc.c: new macro POST_LINK_SPEC
* doc/tm.texi.in: document POST_LINK_SPEC
* doc/tm.texi: regenerate

From 2b50898ca2340aa43ce756bd605862b947cf1e7d Mon Sep 17 00:00:00 2001
From: Andris Pavenis 
Date: Tue, 10 Nov 2015 19:52:57 +0200
Subject: [PATCH] New macro POST_LINK_SPEC for additional steps after running
 linker

---
 gcc/doc/tm.texi| 5 +
 gcc/doc/tm.texi.in | 5 +
 gcc/gcc.c  | 8 +++-
 3 files changed, 17 insertions(+), 1 deletion(-)

diff --git a/gcc/doc/tm.texi b/gcc/doc/tm.texi
index f394db7..fe4e7f0 100644
--- a/gcc/doc/tm.texi
+++ b/gcc/doc/tm.texi
@@ -375,6 +375,11 @@ The sequence in which libgcc and libc are specified to the linker.
 By default this is @code{%G %L %G}.
 @end defmac
 
+@defmac POST_LINK_SPEC
+Define this macro to add additional steps to be executed after linker.
+The default value of this macro is empty string.
+@end defmac
+
 @defmac LINK_COMMAND_SPEC
 A C string constant giving the complete command line need to execute the
 linker.  When you do this, you will need to update your port each time a
diff --git a/gcc/doc/tm.texi.in b/gcc/doc/tm.texi.in
index d188c57..8c9c1b2 100644
--- a/gcc/doc/tm.texi.in
+++ b/gcc/doc/tm.texi.in
@@ -375,6 +375,11 @@ The sequence in which libgcc and libc are specified to the linker.
 By default this is @code{%G %L %G}.
 @end defmac
 
+@defmac POST_LINK_SPEC
+Define this macro to add additional steps to be executed after linker.
+The default value of this macro is empty string.
+@end defmac
+
 @defmac LINK_COMMAND_SPEC
 A C string constant giving the complete command line need to execute the
 linker.  When you do this, you will need to update your port each time a
diff --git a/gcc/gcc.c b/gcc/gcc.c
index bbc9b23..45d6089 100644
--- a/gcc/gcc.c
+++ b/gcc/gcc.c
@@ -979,6 +979,10 @@ proper position among the other output files.  */
 %{%:sanitize(leak):" LIBLSAN_SPEC "}}}"
 #endif
 
+#ifndef POST_LINK_SPEC
+#define POST_LINK_SPEC ""
+#endif
+
 /*  This is the spec to use, once the code for creating the vtable
 verification runtime library, libvtv.so, has been created.  Currently
 the vtable verification runtime functions are in libstdc++, so we use
@@ -1021,7 +1025,7 @@ proper position among the other output files.  */
 %(mflib) " STACK_SPLIT_SPEC "\
 %{fprofile-arcs|fprofile-generate*|coverage:-lgcov} " SANITIZER_SPEC " \
 %{!nostdlib:%{!nodefaultlibs:%(link_ssp) %(link_gcc_c_sequence)}}\
-%{!nostdlib:%{!nostartfiles:%E}} %{T*} }}"
+%{!nostdlib:%{!nostartfiles:%E}} %{T*}  \n%(post_link) }}"
 #endif
 
 #ifndef LINK_LIBGCC_SPEC
@@ -1063,6 +1067,7 @@ static const char *linker_name_spec = LINKER_NAME;
 static const char *linker_plugin_file_spec = "";
 static const char *lto_wrapper_spec = "";
 static const char *lto_gcc_spec = "";
+static const char *post_link_spec = POST_LINK_SPEC;
 static const char *link_command_spec = LINK_COMMAND_SPEC;
 static const char *link_libgcc_spec = LINK_LIBGCC_SPEC;
 static const char *startfile_prefix_spec = STARTFILE_PREFIX_SPEC;
@@ -1571,6 +1576,7 @@ static struct spec_list static_specs[] =
   INIT_STATIC_SPEC ("linker_plugin_file",	&linker_plugin_file_spec),
   INIT_STATIC_SPEC ("lto_wrapper",		&lto_wrapper_spec),
   INIT_STATIC_SPEC ("lto_gcc",			&lto_gcc_spec),
+  INIT_STATIC_SPEC ("post_link",		&post_link_spec),
   INIT_STATIC_SPEC ("link_libgcc",		&link_libgcc_spec),
   INIT_STATIC_SPEC ("md_exec_prefix",		&md_exec_prefix),
   INIT_STATIC_SPEC ("md_startfile_prefix",	&md_startfile_prefix),
-- 
2.4.3



Re: State of support for the ISO C++ Transactional Memory TS and remaining work

2015-11-10 Thread Torvald Riegel
On Tue, 2015-11-10 at 17:26 +, Szabolcs Nagy wrote:
> On 09/11/15 00:19, Torvald Riegel wrote:
> > Hi,
> >
> > I'd like to summarize the current state of support for the TM TS, and
> > outline the current plan for the work that remains to complete the
> > support.
> >
> > I'm aware we're at the end of stage 1, but I'm confident we can still
> > finish this work and hope to include it in GCC 6 because:
> > (1) most of the support is already in GCC, and we have a big head start
> > in the space of TM so it would be unfortunate to waste that by not
> > delivering support for the TM TS,
> > (2) this is a TS and support for it is considered experimental,
> > (3) most of the affected code is in libitm or the compiler's TM passes,
> > which has to be enabled explicitly by the user.
> >
> > Currently, we have complete support for the syntax and all necessary
> > instrumentation except the exception handling bits listed below.  libitm
> > has a good set of STM and HTM-based algorithms.
> >
> >
> > What is missing on the compiler side is essentially a change of how we
> > support atomic_noexcept and atomic_cancel, in particular exception
> > handling.  Instead of just using a finally block as done currently, the
> > compiler needs to build a catch clause so that it can actively intercept
> > exceptions that escape an atomic_noexcept or atomic_cancel.  For
> > atomic_noexcept, the compiler needs to include a call to abort() in the
> > catch clause.
> >
> >
> > For atomic_cancel, it needs to call ITM_commitTransactionEH in the catch
> > clause, and use NULL as exception argument.  This can then be used by
> > libitm to look at the exception currently being handled and (a) check
> > whether the type supports transaction cancellation as specified by the TS
> > and (b) pick out the allocations that belong to this exception and roll
> > back everything else before rethrowing this exception.
> >
> > For (a), it's probably best to place this check into libstdc++
> > (specifically, libsupc++ I suppose) because support for transaction
> > cancellation is a property that library parts of the standard (or the
> > TS) require, and that has to match the implementation in libstdc++.
> > Attached is a patch by Jason that implements this check.  This adds one
> > symbol, which should be okay we hope.
> >
> 
> does that mean libitm will depend on libstdc++?

No, weak references are used to avoid that.  See libitm/eh_cpp.cc for
example.
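
As a generic illustration of the technique (this is not the actual libitm
code), a weak reference lets a library call into another one only when that
other library happens to be linked in.  The function name
hypothetical_cxx_hook is made up for this sketch, and __attribute__ ((weak))
is the GNU extension assumed here, on a target such as ELF where an undefined
weak symbol resolves to a null address.

```cpp
// Hypothetical symbol that some other library may or may not provide.
extern "C" void hypothetical_cxx_hook () __attribute__ ((weak));

// Returns true if the hook was available and has been called.
bool call_hook_if_present ()
{
  if (hypothetical_cxx_hook)   // Null when no definition was linked in.
    {
      hypothetical_cxx_hook ();
      return true;
    }
  return false;
}
```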

> i think the tm runtime should not depend on c++,
> so it is usable from c code.
> 
> > For (b), our plan is to track the additional allocations that happen
> > during construction of the exception types that support
> > cancellation (eg, creating the what() string for logic_error).  There
> > are several ways to do that, one of them being that we create custom
> > transactional clones of those constructors that tell libitm that either
> > such a constructor is currently running or explicitly list the
> > allocations that have been made by the constructor; eventually, we would
> > always (copy) construct into memory returned by cxa_allocate_exception,
> > which then makes available the necessary undo information when such an
> > exception is handled in libitm.
> >
> >
> > The other big piece of missing support is making sure that the functions
> > that are specified in the TS as transaction_safe are indeed that.  I
> > believe we do not need to actually add such annotations to any libstdc++
> > functions that are already transaction-safe and completely defined in
> > headers -- those functions are implicitly transaction-safe, and we can
> > thus let the compiler instrument them at the point of use inside of a
> > transaction.
> >
> > If a supposedly transaction-safe function is not defined in a header,
> > we'd need a transaction_safe annotation at the declaration.  Jason has
> > implemented the TM TS feature test macro, so we can only add the
> > annotation if the user has enabled support for the TM TS in the
> > respective compilation process.
> 
> sounds ugly: the function type (transaction-safety)
> depends on the visibility of the definition..

I don't understand why that would be the case.  The TS specifies whether
a function has to be safe or not.  We strive to implement that to
support the TS.  Nonetheless, the TS also has the notion of a function
being implicitly transaction-safe (ie, if the compiler can check that it
is, it can be used as-if annotated as transaction-safe even if it
actually isn't).
Also note that you can't overload based on transaction-safe or not, so
it's not something you can check without compilation failures or your
program aborting.

> > We also need to ensure that there is a transactional clone of the function.
> > This will add symbols to libstdc++, but these all have their own special
> > prefix in the mangled name.  I'd like to get feedback on how to best
> > trigger the instrumentation and make it a part of a libstdc++ build.
> > (If that would show to be 

Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-10 Thread Paolo Carlini

 Hi,

On 11/10/2015 05:36 PM, Marek Polacek wrote:

+
+   /* Did array size calculations overflow or does the array
+  cover more than half of the address-space?  */
+   if (COMPLETE_TYPE_P (type)
+   && TREE_CODE (TYPE_SIZE_UNIT (type)) == INTEGER_CST
+   && !valid_constant_size_p (TYPE_SIZE_UNIT (type)))
+ {
+   if (name)
+ error_at (loc, "size of array %qE is too large", name);
+   else
+ error_at (loc, "size of unnamed array is too large");
+   type = error_mark_node;
+ }
  }
Obviously "the issue" predates your proposed change, but I don't 
understand why the code implementing the check can't be shared by the 
front-ends via a small function in c-family...
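
To make the suggestion concrete, here is a stand-alone sketch of the check
itself, written in plain C++ rather than against GCC's tree API.  The function
name and the use of size_t are assumptions for illustration; the real code
works on TYPE_SIZE_UNIT and valid_constant_size_p as in the hunk above.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for the front-end check: returns false if the
// size computation overflows or if the array would cover half or more
// of the address space.
inline bool array_size_in_bytes_is_valid (std::size_t count, std::size_t elem_size)
{
  if (elem_size == 0)
    return true;
  // Would count * elem_size overflow size_t?
  if (count > SIZE_MAX / elem_size)
    return false;
  // Reject sizes whose top bit is set, i.e. half the address space or more.
  return count * elem_size <= SIZE_MAX / 2;
}
```

With this model, the element count from the example in the patch description,
__SIZE_MAX__ / sizeof(int), is rejected, while ordinary sizes are accepted.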


Paolo.


Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-10 Thread Segher Boessenkool
On Tue, Nov 10, 2015 at 12:16:09PM +0100, Bernd Schmidt wrote:
> On 11/09/2015 08:33 AM, Segher Boessenkool wrote:
> >If we have
> >
> > (truncate:M1 (and:M2 (lshiftrt:M2 (x:M2) C) C2))
> >
> >we can write it instead as
> >
> > (and:M1 (lshiftrt:M1 (truncate:M1 (x:M2)) C) C2)
> >
> >
> >+  /* Likewise (truncate:QI (and:SI (lshiftrt:SI (x:SI) C) C2)) into
> >+ (and:QI (lshiftrt:QI (truncate:QI (x:SI)) C) C2) for suitable C
> >+ and C2.  */
> >+  if (GET_CODE (op) == AND
> >+  && (GET_CODE (XEXP (op, 0)) == LSHIFTRT
> >+  || GET_CODE (XEXP (op, 0)) == ASHIFTRT)
> >+  && CONST_INT_P (XEXP (XEXP (op, 0), 1))
> >+  && CONST_INT_P (XEXP (op, 1))
> >+  && UINTVAL (XEXP (XEXP (op, 0), 1)) < precision
> >+  && ((GET_MODE_MASK (mode) >> UINTVAL (XEXP (XEXP (op, 0), 1)))
> >+  & UINTVAL (XEXP (op, 1)))
> >+ == ((GET_MODE_MASK (op_mode) >> UINTVAL (XEXP (XEXP (op, 0), 1)))
> >+ & UINTVAL (XEXP (op, 1
> 
> In general this would be easier to read if there were intermediate 
> variables called shift_amount and mask.

Yes I know.  All the rest of the code around it is like this though.
Do you want this written in a saner way?

> I'm not entirely sure what the 
> last condition here is supposed to test.

It tests whether moving the truncate inside will give the same result.
It essentially checks whether it works for an x with all bits set; if that
works, it works for any x.
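
A small stand-alone model of that condition, as a sketch only: it fixes M2 to
32 bits and M1 to 8 bits, handles just the logical shift case, and all
function names are made up for illustration (none of this is the
simplify-rtx.c code).

```cpp
#include <cstdint>

// Mirrors the last condition of the patch for SI -> QI: apply the shift
// and mask to an all-ones value in each mode and compare the results.
inline bool truncate_can_move_inside (unsigned shift, std::uint32_t mask)
{
  return shift < 8
	 && ((UINT32_C (0xff) >> shift) & mask)
	    == ((UINT32_C (0xffffffff) >> shift) & mask);
}

// (truncate:QI (and:SI (lshiftrt:SI x C) C2))
inline std::uint8_t truncate_outside (std::uint32_t x, unsigned shift,
				      std::uint32_t mask)
{
  return static_cast<std::uint8_t> ((x >> shift) & mask);
}

// (and:QI (lshiftrt:QI (truncate:QI x) C) C2), with C2 reduced to QImode.
inline std::uint8_t truncate_inside (std::uint32_t x, unsigned shift,
				     std::uint32_t mask)
{
  std::uint8_t narrow = static_cast<std::uint8_t> (x);
  return static_cast<std::uint8_t> ((narrow >> shift)
				    & static_cast<std::uint8_t> (mask));
}
```

For C = 4 and C2 = 0x0f the condition holds and both forms agree; for C = 4
and C2 = 0xff it does not hold, and the two forms differ for inputs that have
bits set above bit 7.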

> Is it related to...
> 
> >+return simplify_gen_binary (AND, mode, op0, XEXP (op, 1));
> 
> ... the fact that here I think you'd have to trunc_int_for_mode the AND 
> amount for the smaller mode?

Ugh yes, I still have to do that for it to be valid RTL in all cases.
Thanks for catching it.


Segher


Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-10 Thread James Greenhalgh
On Tue, Nov 10, 2015 at 05:22:40PM +0100, Christophe Lyon wrote:
> On 10 November 2015 at 12:41, Robert Suchanek
>  wrote:
> > Hi Christophe,
> >
> >> Hi,
> >>
> >> Since you committed this (r230087 if I'm correct), I can see that GCC
> >> fails to build
> >> libgfortran for target arm-none-linux-gnueabi --with-cpu=cortex-a9.
> > ...
> >>
> >> Can you have a look?
> >
> > Sorry for the breakage. I see that my assertion is being triggered.
> > I'll investigate this and check whether the assertion is correct or
> > something else needs to be done.
> >
> 
> Now that 'make check' has had enough time to run, I can see several
> regressions in the configurations where GCC still builds.
> For more details:
> http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/230087/report-build-info.html
> 

This also causes failures for AArch64 -mcpu=cortex-a57 targets. This
testcase:

  void
  foo (unsigned char *out, const unsigned char *in, int a)
  {
for (int i = 0; i < a; i++)
  {
out[0] = in[2];
out[1] = in[1];
out[2] = in[0];
in += 3;
out += 3;
  }
  }

Fails like so:

  foo.c: In function 'void foo(unsigned char*, const unsigned char*, int)':
  foo.c:12:1: internal compiler error: in scan_rtx_reg, at regrename.c:1074
   }
   ^

  0xbe00f8 scan_rtx_reg
/gcc/regrename.c:1073
  0xbe0ad5 scan_rtx
/gcc/regrename.c:1401
  0xbe1038 record_out_operands
/gcc/regrename.c:1554
  0xbe1f50 build_def_use
/gcc/regrename.c:1802
  0xbe1f50 regrename_analyze(bitmap_head*)
/gcc/regrename.c:726
  0xf7a0c7 func_fma_steering::execute_fma_steering()
/gcc/config/aarch64/cortex-a57-fma-steering.c:1026
  0xf7a9c1 pass_fma_steering::execute(function*)
/gcc/config/aarch64/cortex-a57-fma-steering.c:1063
  Please submit a full bug report,
  with preprocessed source if appropriate.
  Please include the complete backtrace with any bug report.
  See  for instructions.

When compiled with:

   -O3 -mcpu=cortex-a57 foo.c

Thanks,
James



Re: [PATCH, i386]: Fix gcc.c-torture/compile/pr41634.c FAIL

2015-11-10 Thread Richard Henderson

On 11/10/2015 06:54 PM, Uros Bizjak wrote:

-  return "movabs{}\t{%1, %0|%0, %1}";
+  return "movabs{}\t{%1, %P0|[%P0], %1}";


The thing that's missing from this, that's present in the patch that I sent you 
off-list, is the  thing for Intel syntax.


Would you prefer to just add that back here via , rather than using a 
new %v specifier like in my patch?



r~


Re: Enable pointer TBAA for LTO

2015-11-10 Thread Jan Hubicka
> > Index: tree.c
> > ===
> > --- tree.c  (revision 229968)
> > +++ tree.c  (working copy)
> > @@ -13198,6 +13198,7 @@ gimple_canonical_types_compatible_p (con
> >/* If the types have been previously registered and found equal
> >   they still are.  */
> >if (TYPE_CANONICAL (t1) && TYPE_CANONICAL (t2)
> > +  && !POINTER_TYPE_P (t1) && !POINTER_TYPE_P (t2)
> 
> But TYPE_CANONICAL (t1) should be NULL_TREE for POINTER_TYPE_P?

The reason is that TYPE_CANONICAL is initialized in get_alias_set, which may be
called before we finish all merging, and then it is more fine grained than what
we need here (i.e. TYPE_CANONICAL of pointers to two different types will be
different, but here we want them to be equal) so we can match:

struct aa { void *ptr;};
struct bb { int * ptr;};

Which is actually required for Fortran interoperability.

Removing this hunk triggers false type incompatibility warning in one of the
interoperability testcases I added.

Even if I drop the code below setting TYPE_CANONICAL, I think I need to keep
this conditional: the types may be built in and those get TYPE_CANONICAL set as
they are constructed by build_pointer_type.  I can gcc_checking_assert for this
scenario and see.  Perhaps we never build LTO type from builtin type and this
won't happen. If we did, we would probably have trouble with false negatives
in return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2); on non-pointers anyway.
> 
> >&& trust_type_canonical)
> >  return TYPE_CANONICAL (t1) == TYPE_CANONICAL (t2);
> >  
> > Index: alias.c
> > ===
> > --- alias.c (revision 229968)
> > +++ alias.c (working copy)
> > @@ -869,13 +874,19 @@ get_alias_set (tree t)
> >set = lang_hooks.get_alias_set (t);
> >if (set != -1)
> > return set;
> > -  return 0;
> > +  /* LTO frontend does not assign canonical types to pointers (which we
> > +ignore anyway) and we compute them.  The following path may be
> > +probably enabled for non-LTO, too, and it may improve TBAA for
> > +pointers to types with structural equality.  */
> > +  if (!in_lto_p || !POINTER_TYPE_P (t))
> > +return 0;
> 
> No new LTO paths please, do the suggested change immediately.

OK, I originally tested the patch without the if and there were no problems.
Just chickened out before preparing the final version of the patch.
> > + p = TYPE_MAIN_VARIANT (p);
> > + /* Normally all pointer types are built by
> > +build_pointer_type_for_mode which ensures they have canonical
> > +type unless they point to type with structural equality.
> > +LTO frontend produce pointer types without TYPE_CANONICAL
> > +that are then added to TYPE_POINTER_TO lists and 
> > +build_pointer_type_for_mode will end up picking one for us.
> > +Declare it the canonical one.  This is the same as
> > +build_pointer_type_for_mode would do. */
> > + if (!TYPE_CANONICAL (p))
> > +   {
> > + TYPE_CANONICAL (p) = p;
> > + gcc_checking_assert (in_lto_p);
> > +   }
> > + else
> > +   gcc_checking_assert (p == TYPE_CANONICAL (p));
> 
> The assert can trigger as
> build_pointer_type_for_mode builds SET_TYPE_STRUCTURAL_EQUALITY pointer
> types for SET_TYPE_STRUCTURAL_EQUALITY pointed-to types.  Ah,
> looking up more context reveals
> 
>   if (TREE_CODE (p) == VOID_TYPE || TYPE_STRUCTURAL_EQUALITY_P (p))
> set = get_alias_set (ptr_type_node);

Yep, we don't get here.
> 
> Not sure why you adjust TYPE_CANONICAL here at all either.

You are right, I can probably just drop all the code and just do:
gcc_checking_assert (!TYPE_CANONICAL (p) || p == TYPE_CANONICAL (p));
I will test this and re-think the build_pointer_type code to be sure that we
won't get into a problem there.

As I recall, the original code
  p = TYPE_CANONICAL (p);
was there to permit frontends to glob two pointers by setting the same canonical
type on them.  My original plan was to use this for the LTO frontend and make
gimple_compare_canonical_types do the right thing for pointers, and this would
then follow the gimple_compare_canonical_types globbing.

This idea was wrong: since pointer rules are not transitive (i.e. void
* alias them all), we can't model that by an equivalence produced by
gimple_compare_canonical_types.

Since the assert does not trigger, it seems no frontend is doing that, and moreover
I do not see how that would be useful (well, perhaps for some kind of internal
bookkeeping when building TYPE_CANONICAL of more complex types from pointer types,
like arrays, but for those we ignore TYPE_CANONICAL anyway).  Grepping over
TYPE_CANONICAL sets in frontends, I see no code that I would suspect of doing
something like this.

Thank you!
Honza
> 
> Otherwise looks ok.
> 
> Richard.
> 
> 
> > }
> > - 

Re: [PATCH, i386]: Fix gcc.c-torture/compile/pr41634.c FAIL

2015-11-10 Thread Uros Bizjak
On Tue, Nov 10, 2015 at 7:14 PM, Uros Bizjak  wrote:
> On Tue, Nov 10, 2015 at 7:00 PM, Richard Henderson  wrote:
>> On 11/10/2015 06:54 PM, Uros Bizjak wrote:
>>>
>>> -  return "movabs{}\t{%1, %0|%0, %1}";
>>> +  return "movabs{}\t{%1, %P0|[%P0], %1}";
>>
>>
>> The thing that's missing from this, that's present in the patch that I sent
>> you off-list, is the  thing for Intel syntax.
>>
>> Would you prefer to just add that back here via , rather than
>> using a new %v specifier like in my patch?
>
> I have opted for the same assembly code as it was generated
> previously. But, since we have macroized pattern and already available
> mode attribute, I'd prefer to use  PTR [...]. There are
> already a couple of examples using this approach in i386.md.
>
> BTW: gas is able to determine pointer size from register name, so
> having PTR prefix does not change generated object code.

I'm testing following patch:

--cut here--
Index: i386.md
===
--- i386.md (revision 230117)
+++ i386.md (working copy)
@@ -2601,7 +2601,7 @@
   switch (which_alternative)
 {
 case 0:
-  return "movabs{}\t{%1, %P0|[%P0], %1}";
+  return "movabs{}\t{%1, %P0| PTR [%P0], %1}";
 case 1:
   return "mov{}\t{%1, %0|%0, %1}";
 default:
@@ -2625,7 +2625,7 @@
   switch (which_alternative)
 {
 case 0:
-  return "movabs{}\t{%P1, %0|%0, [%P1]}";
+  return "movabs{}\t{%P1, %0|%0,  PTR [%P1]}";
 case 1:
   return "mov{}\t{%1, %0|%0, %1}";
 default:
--cut here--

Uros.


Re: [PATCH][ARM][cleanup] Remove uses of CONST_DOUBLE_HIGH/LOW

2015-11-10 Thread Kyrill Tkachov

Hi Ramana,

On 10/11/15 14:33, Ramana Radhakrishnan wrote:

On Thu, Nov 5, 2015 at 9:32 AM, Kyrill Tkachov  wrote:

Hi all,

This cleanup patch removes handling of CONST_DOUBLE rtxes that carry large
integers.
These should never be passed down from the midend and the arm backend
doesn't create them.
The code has been there since 2007 but the arm backend was moved to
TARGET_SUPPORTS_WIDE_INT
in 2014, so this path should never be taken.

Bootstrapped and tested on arm-none-linux-gnueabihf.

Ok for trunk?


This is OK -


Thanks for reviewing.
Sorry I had forgotten the ChangeLog in the initial submission.
It is
2015-11-10  Kyrylo Tkachov  

* config/arm/arm.c (neon_valid_immediate): Remove integer
CONST_DOUBLE handling.  It should never occur.

I have committed the patch with that entry as r230115.

Thanks,
Kyrill


Ramana

Thanks,
Kyrill





Re: [PATCH 02/02] C FE: add fix-it hint for . vs ->

2015-11-10 Thread Joseph Myers
On Tue, 10 Nov 2015, David Malcolm wrote:

> This is the most trivial example of a real fix-it example I could think
> of: if the user writes
>   ptr.field
> rather than ptr->field.
> 
> gcc/c/ChangeLog:
>   * c-typeck.c (build_component_ref): Special-case POINTER_TYPE when
>   generating a "not a structure or union"  error message, and
>   suggest a "->" rather than a ".", providing a fix-it hint.

I wonder if this should be restricted to the case where the pointer's 
target is of structure or union type.  At least, if it's some other type, 
more of a fix is needed than just using -> (e.g. converting from void * to 
a pointer to the relevant type).
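
A small sketch of the two situations, assuming a hypothetical struct point that is not part of the patch.  In the first function the operand is a pointer to a structure, so swapping "." for "->" is the whole fix.  In the second the operand is a void pointer, so "->" alone would still not compile and a conversion is needed as well.

```cpp
#include <cassert>

// Hypothetical type used only for illustration.
struct point { int x; int y; };

// Pointer to a structure: writing ptr.x is the mistake the fix-it hint
// targets, and ptr->x is the complete correction.
int read_x(const point *ptr)
{
  return ptr->x;
}

// Pointer to void: replacing "." by "->" is not enough here, the
// pointer must first be converted to a pointer to the relevant type.
int read_x_from_untyped(const void *ptr)
{
  return static_cast<const point *>(ptr)->x;
}

void check_member_access()
{
  point p = {3, 4};
  assert(read_x(&p) == 3);
  assert(read_x_from_untyped(&p) == 3);
}
```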

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c

2015-11-10 Thread Richard Biener
On November 10, 2015 6:29:36 PM GMT+01:00, Joseph Myers 
 wrote:
>On Tue, 10 Nov 2015, Richard Biener wrote:
>
>> Looks ok but I wonder if this is dead code with
>> 
>> (for pows (POW)
>>  sqrts (SQRT)
>>  cbrts (CBRT)
>>  (simplify
>>   (pows @0 REAL_CST@1)
>>   (with {
>> const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
>> REAL_VALUE_TYPE tmp;
>>}
>>(switch
>> ...
>> /* pow(x,0.5) -> sqrt(x).  */
>> (if (flag_unsafe_math_optimizations
>>  && canonicalize_math_p ()
>>  && real_equal (value, ))
>>  (sqrts @0))
>> 
>> also wondering here about canonicalize_math_p (), I'd expected the
>> reverse transform as canonicalization.  Also wondering about
>> flag_unsafe_math_optimizations (missing from the vectorizer pattern).
>
>pow(x,0.5) -> sqrt(x) is unsafe because: pow (-0, 0.5) is specified in 
>Annex F to be +0 but sqrt (-0) is -0; pow (-Inf, 0.5) is specified in 
>Annex F to be +Inf, but sqrt (-Inf) is NaN with "invalid" exception 
>raised.  I think it's safe in other cases

So it's safe with no signed zeros and finite math rather than unsafe.  The 
reverse would be unsafe in addition (not fully specified and rounded).
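
The two Annex F cases mentioned above can be checked directly.  This is only a sketch: it assumes a math library that follows Annex F for these inputs and a build without -ffast-math or similar flags, and the helper names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <limits>

struct pow_sqrt_results { double pow_result; double sqrt_result; };

// Evaluate both spellings for the same input.
pow_sqrt_results compare_pow_half_with_sqrt(double x)
{
  return { std::pow(x, 0.5), std::sqrt(x) };
}

// Equal as values, including the sign of zero; two NaNs count as equal.
bool same_value_and_sign(double a, double b)
{
  if (std::isnan(a) || std::isnan(b))
    return std::isnan(a) && std::isnan(b);
  return a == b && std::signbit(a) == std::signbit(b);
}

// True when rewriting pow(x, 0.5) as sqrt(x) would not change the result.
bool rewrite_is_exact_for(double x)
{
  const pow_sqrt_results r = compare_pow_half_with_sqrt(x);
  return same_value_and_sign(r.pow_result, r.sqrt_result);
}

void check_annex_f_cases()
{
  const double inf = std::numeric_limits<double>::infinity();
  // pow (-0, 0.5) is +0 but sqrt (-0) is -0.
  assert(!rewrite_is_exact_for(-0.0));
  // pow (-Inf, 0.5) is +Inf but sqrt (-Inf) is NaN.
  assert(!rewrite_is_exact_for(-inf));
}
```

Both exceptions disappear once signed zeros and infinities are excluded, which matches the point that the transform needs no-signed-zeros and finite-math rather than the full unsafe-math flag.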

>(the reverse of course is not
>safe, sqrt is a fully-specified correctly-rounded IEEE operation and pow
>isn't).




Re: Replace match.pd DEFINE_MATH_FNs with auto-generated lists

2015-11-10 Thread Richard Biener
On November 10, 2015 9:13:25 PM GMT+01:00, Richard Sandiford 
 wrote:
>Richard Biener  writes:
>> On Sat, Nov 7, 2015 at 2:23 PM, Richard Sandiford
>>  wrote:
>>> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
>>> index cff32b0..7139476 100644
>>> --- a/gcc/genmatch.c
>>> +++ b/gcc/genmatch.c
>>> @@ -4638,6 +4638,11 @@ main (int argc, char **argv)
>>>cpp_callbacks *cb = cpp_get_callbacks (r);
>>>cb->error = error_cb;
>>>
>>> +  /* Add the build directory to the #include "" search path.  */
>>> +  cpp_dir *dir = XCNEW (cpp_dir);
>>> +  dir->name = ASTRDUP (".");
>>> +  cpp_set_include_chains (r, dir, NULL, false);
>>
>> Does that work on non-UNIX hosts?
>
>Bah, hadn't thought about that.
>
>> I wonder if there is sth
>> better we can use by passing some -DXXX=... to the genmatch
>> build command from the Makefile?
>
>toplev.c has:
>
>  src_pwd = getpwd ();
>  if (!src_pwd)
>   src_pwd = ".";
>
>where getpwd is a libiberty function.  Maybe we can use that?

Looks like so.

Richard.

>Thanks,
>Richard




libgo patch committed: always use --whole-archive in go tool

2015-11-10 Thread Ian Lance Taylor
This patch changes the Go tool to always use --whole-archive when
linking gccgo packages.  This fixes cases where a Go package uses cgo
to call C code in which the only referenced symbol is a C global
variable.  This is a backport of https://golang.org/cl/16775 in the
master Go sources.  This fixes GCC PR 68255.  Bootstrapped and ran Go
testsuite on x86_64-pc-linux-gnu.  Committed to mainline.

Ian
Index: gcc/go/gofrontend/MERGE
===
--- gcc/go/gofrontend/MERGE (revision 230064)
+++ gcc/go/gofrontend/MERGE (working copy)
@@ -1,4 +1,4 @@
-012ab5cb2ef1c26e8023ce90d3a2bba174da7b30
+0c07751d139ef90a43ef7f299f925622a8792a9f
 
 The first line of this file holds the git revision number of the last
 merge done from the gofrontend repository.
Index: libgo/go/cmd/go/build.go
===
--- libgo/go/cmd/go/build.go(revision 230064)
+++ libgo/go/cmd/go/build.go(working copy)
@@ -2555,17 +2555,9 @@ func (tools gccgoToolchain) ld(b *builde
}
}
 
-   switch ldBuildmode {
-   case "c-archive", "c-shared":
-   ldflags = append(ldflags, "-Wl,--whole-archive")
-   }
-
+   ldflags = append(ldflags, "-Wl,--whole-archive")
ldflags = append(ldflags, afiles...)
-
-   switch ldBuildmode {
-   case "c-archive", "c-shared":
-   ldflags = append(ldflags, "-Wl,--no-whole-archive")
-   }
+   ldflags = append(ldflags, "-Wl,--no-whole-archive")
 
ldflags = append(ldflags, cgoldflags...)
ldflags = append(ldflags, envList("CGO_LDFLAGS", "")...)


Re: [patch 4/3] Header file reduction - Tools for contrib - second cut

2015-11-10 Thread Jeff Law


Andrew, can you go ahead and commit those changes into contrib?  I think 
in a subdirectory would be best so that you can include the README.


Make sure the permissions are set correctly.  Applying them as a patch 
kept mucking them up.


header-tools or somesuch should be a fine directory name to use.

Generally we haven't required the same level of rigor on the contrib/ 
bits that's required elsewhere.  And I really don't want to lose these 
tools and see them bitrot.


Jeff


Re: Replace match.pd DEFINE_MATH_FNs with auto-generated lists

2015-11-10 Thread Richard Sandiford
Richard Biener  writes:
> On Sat, Nov 7, 2015 at 2:23 PM, Richard Sandiford
>  wrote:
>> diff --git a/gcc/genmatch.c b/gcc/genmatch.c
>> index cff32b0..7139476 100644
>> --- a/gcc/genmatch.c
>> +++ b/gcc/genmatch.c
>> @@ -4638,6 +4638,11 @@ main (int argc, char **argv)
>>cpp_callbacks *cb = cpp_get_callbacks (r);
>>cb->error = error_cb;
>>
>> +  /* Add the build directory to the #include "" search path.  */
>> +  cpp_dir *dir = XCNEW (cpp_dir);
>> +  dir->name = ASTRDUP (".");
>> +  cpp_set_include_chains (r, dir, NULL, false);
>
> Does that work on non-UNIX hosts?

Bah, hadn't thought about that.

> I wonder if there is sth
> better we can use by passing some -DXXX=... to the genmatch
> build command from the Makefile?

toplev.c has:

  src_pwd = getpwd ();
  if (!src_pwd)
src_pwd = ".";

where getpwd is a libiberty function.  Maybe we can use that?
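
A sketch of the fallback pattern quoted from toplev.c.  To keep it self-contained the directory lookup is passed in as a function pointer instead of calling libiberty's getpwd directly; the names below and the example path are hypothetical.

```cpp
#include <cassert>
#include <string>

// Stand-in for a lookup such as getpwd: returns a directory name, or a
// null pointer when the current directory cannot be determined.
using directory_lookup = const char *(*)();

// Same shape as the toplev.c snippet: use the looked-up directory when
// there is one, otherwise fall back to ".".
std::string include_directory_name(directory_lookup lookup)
{
  const char *dir = lookup();
  return dir ? std::string(dir) : std::string(".");
}

// Two hypothetical lookups used to exercise both branches.
const char *lookup_that_fails() { return nullptr; }
const char *lookup_that_succeeds() { return "/hypothetical/build/gcc"; }

void check_include_directory_name()
{
  assert(include_directory_name(lookup_that_fails) == ".");
  assert(include_directory_name(lookup_that_succeeds)
         == "/hypothetical/build/gcc");
}
```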

Thanks,
Richard



[v3 PATCH] LWG 2510, make the default constructors of library tag types explicit.

2015-11-10 Thread Ville Voutilainen
Tested on Linux-X64.

2015-11-10  Ville Voutilainen  

LWG 2510, make the default constructors of library tag types
explicit.
* include/bits/mutex.h (defer_lock_t, try_to_lock_t,
adopt_lock_t): Add an explicit default constructor.
* include/bits/stl_pair.h (piecewise_construct_t): Likewise.
* include/bits/uses_allocator.h (allocator_arg_t): Likewise.
* libsupc++/new (nothrow_t): Likewise.
* testsuite/17_intro/tag_type_explicit_ctor.cc: New.
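
The effect of the change can be sketched with two stand-alone tag types.  The types and the detection trait below are hypothetical and are not library code; the trait asks whether an empty brace list can copy-initialize a function argument, which is the situation the explicit default constructor is meant to reject.

```cpp
#include <cassert>
#include <type_traits>

// Tag type as it looked before the change: "= {}" and f({}) both work.
struct implicit_tag_t { };

// Tag type after the change: the default constructor is explicit.
struct explicit_tag_t { explicit explicit_tag_t() = default; };

// Declared only; used in an unevaluated context below.
template <typename T>
void accept_copy_initialized(const T &);

// Detects whether accept_copy_initialized<T>({}) is well-formed.
template <typename T, typename = void>
struct accepts_empty_braces : std::false_type { };

template <typename T>
struct accepts_empty_braces<T, decltype(accept_copy_initialized<T>({}))>
  : std::true_type { };

static_assert(accepts_empty_braces<implicit_tag_t>::value,
              "a plain empty struct can be created from {}");
static_assert(!accepts_empty_braces<explicit_tag_t>::value,
              "an explicit default constructor rejects {}");

void check_explicit_tag()
{
  // Default-initialization and direct-list-initialization still work.
  explicit_tag_t plain;
  explicit_tag_t direct{};
  (void) plain;
  (void) direct;
  assert(!accepts_empty_braces<explicit_tag_t>::value);
}
```

This corresponds to the split in the new test: declarations and calls that name the type keep compiling, while the forms that pass a bare {} are expected to be diagnosed.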
diff --git a/libstdc++-v3/include/bits/mutex.h 
b/libstdc++-v3/include/bits/mutex.h
index 43f5b0b..dd27989 100644
--- a/libstdc++-v3/include/bits/mutex.h
+++ b/libstdc++-v3/include/bits/mutex.h
@@ -129,14 +129,14 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 #endif // _GLIBCXX_HAS_GTHREADS
 
   /// Do not acquire ownership of the mutex.
-  struct defer_lock_t { };
+  struct defer_lock_t { explicit defer_lock_t() = default; };
 
   /// Try to acquire ownership of the mutex without blocking.
-  struct try_to_lock_t { };
+  struct try_to_lock_t { explicit try_to_lock_t() = default; };
 
   /// Assume the calling thread has already obtained mutex ownership
   /// and manage it.
-  struct adopt_lock_t { };
+  struct adopt_lock_t { explicit adopt_lock_t() = default; };
 
   constexpr defer_lock_t   defer_lock { };
   constexpr try_to_lock_t  try_to_lock { };
diff --git a/libstdc++-v3/include/bits/stl_pair.h 
b/libstdc++-v3/include/bits/stl_pair.h
index dfcd357..d6f6b86 100644
--- a/libstdc++-v3/include/bits/stl_pair.h
+++ b/libstdc++-v3/include/bits/stl_pair.h
@@ -73,7 +73,7 @@ _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
 #if __cplusplus >= 201103L
   /// piecewise_construct_t
-  struct piecewise_construct_t { };
+  struct piecewise_construct_t { explicit piecewise_construct_t() = default; };
 
   /// piecewise_construct
   constexpr piecewise_construct_t piecewise_construct = 
piecewise_construct_t();
diff --git a/libstdc++-v3/include/bits/uses_allocator.h 
b/libstdc++-v3/include/bits/uses_allocator.h
index f9ea7d6..a0f084d 100644
--- a/libstdc++-v3/include/bits/uses_allocator.h
+++ b/libstdc++-v3/include/bits/uses_allocator.h
@@ -36,7 +36,7 @@ namespace std _GLIBCXX_VISIBILITY(default)
 _GLIBCXX_BEGIN_NAMESPACE_VERSION
 
   /// [allocator.tag]
-  struct allocator_arg_t { };
+  struct allocator_arg_t { explicit allocator_arg_t() = default; };
 
   constexpr allocator_arg_t allocator_arg = allocator_arg_t();
 
diff --git a/libstdc++-v3/libsupc++/new b/libstdc++-v3/libsupc++/new
index 0f6a05a..8621f73 100644
--- a/libstdc++-v3/libsupc++/new
+++ b/libstdc++-v3/libsupc++/new
@@ -79,7 +79,12 @@ namespace std
   };
 #endif
 
-  struct nothrow_t { };
+  struct nothrow_t
+  {
+#if __cplusplus >= 201103L
+explicit nothrow_t() = default;
+#endif
+  };
 
   extern const nothrow_t nothrow;
 
diff --git a/libstdc++-v3/testsuite/17_intro/tag_type_explicit_ctor.cc 
b/libstdc++-v3/testsuite/17_intro/tag_type_explicit_ctor.cc
new file mode 100644
index 000..4b9d217
--- /dev/null
+++ b/libstdc++-v3/testsuite/17_intro/tag_type_explicit_ctor.cc
@@ -0,0 +1,60 @@
+// { dg-do compile }
+// { dg-options "-std=gnu++11" }
+
+// Copyright (C) 2015 Free Software Foundation, Inc.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+//
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+//
+// You should have received a copy of the GNU General Public License along
+// with this library; see the file COPYING3.  If not see
+// .
+
+#include 
+#include 
+#include 
+#include 
+
+void f1(std::nothrow_t);
+void f2(std::piecewise_construct_t);
+void f3(std::allocator_arg_t);
+void f4(std::defer_lock_t);
+void f5(std::try_to_lock_t);
+void f6(std::adopt_lock_t);
+
+
+int main()
+{
+  std::nothrow_t v1;
+  std::piecewise_construct_t v2;
+  std::allocator_arg_t v3;
+  std::defer_lock_t v4;
+  std::try_to_lock_t v5;
+  std::try_to_lock_t v6;
+  std::nothrow_t v7 = {}; // { dg-error "explicit" }
+  std::piecewise_construct_t v8 = {}; // { dg-error "explicit" }
+  std::allocator_arg_t v9 = {}; // { dg-error "explicit" }
+  std::defer_lock_t v10 = {}; // { dg-error "explicit" }
+  std::try_to_lock_t v11 = {}; // { dg-error "explicit" }
+  std::try_to_lock_t v12 = {}; // { dg-error "explicit" }
+  f1(std::nothrow_t{});
+  f2(std::piecewise_construct_t{});
+  f3(std::allocator_arg_t{});
+  f4(std::defer_lock_t{});
+  f5(std::try_to_lock_t{});
+  f6(std::adopt_lock_t{});
+  f1({}); // { dg-error "explicit" }
+  f2({}); // { dg-error "explicit" }
+  f3({}); // { dg-error "explicit" }
+  f4({}); // { 

Re: [PATCH, applied], Add power9 support to GCC, patch #9 (config.gcc)

2015-11-10 Thread Michael Meissner
I applied this patch as obvious.  I missed submitting it in my original patch
for the power9 support (it was in the sandbox I was testing power9 support on).

2015-11-10  Michael Meissner  

* config.gcc (powerpc*-*-*, rs6000*-*-*): Add power9 to hosts that
default to 64-bit.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config.gcc
===
--- gcc/config.gcc  (revision 230072)
+++ gcc/config.gcc  (working copy)
@@ -439,7 +439,7 @@ powerpc*-*-*)
cpu_type=rs6000
extra_headers="ppc-asm.h altivec.h spe.h ppu_intrinsics.h paired.h 
spu2vmx.h vec_types.h si2vmx.h htmintrin.h htmxlintrin.h"
case x$with_cpu in
-   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[345678]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
+   
xpowerpc64|xdefault64|x6[23]0|x970|xG5|xpower[3456789]|xpower6x|xrs64a|xcell|xa2|xe500mc64|xe5500|xe6500)
cpu_is_64bit=yes
;;
esac
@@ -4131,7 +4131,7 @@ case "${target}" in
eval "with_$which=405"
;;
"" | common | native \
-   | power | power[2345678] | power6x | powerpc | 
powerpc64 \
+   | power | power[23456789] | power6x | powerpc | 
powerpc64 \
| rios | rios1 | rios2 | rsc | rsc1 | rs64a \
| 401 | 403 | 405 | 405fp | 440 | 440fp | 464 | 464fp \
| 476 | 476fp | 505 | 601 | 602 | 603 | 603e | ec603e \


Re: [RFC] [PATCH V2]: RE: [RFC] [Patch] Relax tree-if-conv.c trap assumptions.

2015-11-10 Thread Bernhard Reutner-Fischer
On November 10, 2015 1:02:57 PM GMT+01:00, Richard Biener 
 wrote:
>On Sat, Nov 7, 2015 at 12:41 PM, Kumar, Venkataramanan
> wrote:
>> Hi Richard,
>>
>> I have now implemented storing of DR and references using hash maps.
>> Please find attached patch.
>>
>> As discussed, I am now storing the ref, DR  and baseref, DR pairs
>along with unconditional read/write information  in  hash tables while
>iterating over DR during its initialization.
>> Then during checking for possible traps for if-converting,  just
>check if the memory reference for a gimple statement is read/written
>unconditionally by querying the hash table instead of quadratic walk.
>>
>> Bootstrapped and regression tested on x86_64.
>
>@@ -592,137 +598,153 @@ struct ifc_dr {
>
>   /* -1 when not initialized, 0 when false, 1 when true.  */
>   int rw_unconditionally;
>+
>+  tree ored_result;
>+
>
>excess vertical space at the end.  A better name would be simply
>"predicate".
>
>+  if (!exsist1)

s/exsist/exists/g

Also watch out for wrong spaces around assignments and comparisons (one "=3" in 
the testcase and one "=0" in the code). Not sure offhand if check_GCC_style.sh 
catches those.

Thanks,



Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-10 Thread Martin Sebor

On 11/10/2015 09:36 AM, Marek Polacek wrote:

While both C and C++ FEs are able to reject e.g.
int a[__SIZE_MAX__ / sizeof(int)];
they are accepting code such as
int (*a)[__SIZE_MAX__ / sizeof(int)];

As Joseph pointed out, any construction of a non-VLA type whose size is half or
more of the address space should receive a compile-time error.
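
A sketch of that rule as a stand-alone predicate.  It is an assumption drawn from the sentence above, not the front ends' actual code, and the helper name is hypothetical: a type is rejected when its size in bytes is at least half of the address space, and the comparison is arranged so that the multiplication cannot overflow.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical helper: true when element_size * element_count would be
// at least SIZE_MAX / 2 + 1 bytes, i.e. half of the address space or more.
constexpr bool array_type_too_large(std::size_t element_size,
                                    std::size_t element_count)
{
  return element_size != 0
         && element_count > (SIZE_MAX / 2) / element_size;
}

void check_array_size_rule()
{
  // The element type of "int (*a)[__SIZE_MAX__ / sizeof(int)]" is an
  // array this large, so it should be rejected like the plain array.
  assert(array_type_too_large(sizeof(int), SIZE_MAX / sizeof(int)));
  // An ordinary array is accepted.
  assert(!array_type_too_large(sizeof(int), 1024));
}
```

The same predicate applies whether the array type is declared directly or only appears behind a pointer or reference, since the size being checked belongs to the array type itself.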

Done by moving up the check for the size in bytes so that it checks every
non-VLA complete array type constructed in the course of processing the
declarator.  Since the C++ FE had the same problem, I've fixed it up there as
well.  And that's why I had to tweak dg-error of two C++ tests; if the size of
an array is considered invalid, we give an error message with word "unnamed".

(I've removed the comment about crashing in tree_to_[su]hwi since that seems
to no longer be the case.)


Thanks for including me on this. I tested it with C++ references
to arrays (in addition to pointers) and it works correctly for
those as well (unsurprisingly). The only thing that bothers me
a bit is the seemingly arbitrary inconsistency between
the diagnostics:


+p = new char [1][MAX - 99]; // { dg-error "size of unnamed array" }
  p = new char [1][MAX / 2];  // { dg-error "size of array" }


Would it be possible to make the message issued by the front ends
the same? I.e., either both "unnamed array" or both just "array?"

Martin


[PATCH] Fix minor fallout from output_address changes

2015-11-10 Thread Jeff Law
The ft32 and moxie ports failed to build after the recent output_address 
changes.  Fixed thusly and committed to the trunk after verifying the 
moxie and ft32 ports in config-list.mk build again.


Jeff
commit b408dd85568c5d0c0a9673810280a8438753b60f
Author: law 
Date:   Tue Nov 10 21:11:07 2015 +

[PATCH] Fix minor fallout from output_address changes
2015-11-10  Jeff Law  

* config/ft32/ft32.c (ft32_print_operand): Supply mode to
call to output_address.
* config/moxie/moxie.c (moxie_print_operand_address): Similarly.
Add unnamed machine_mode argument.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/trunk@230130 
138bc75d-0d04-0410-961f-82ee72b054a4

diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index 2c966e7..84481ef 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,10 @@
+2015-11-10  Jeff Law  
+
+   * config/ft32/ft32.c (ft32_print_operand): Supply mode to
+   call to output_address.
+   * config/moxie/moxie.c (moxie_print_operand_address): Similarly.
+   Add unnamed machine_mode argument.
+
 2015-11-10  Michael Meissner  
 
* config.gcc (powerpc*-*-*, rs6000*-*-*): Add power9 to hosts that
diff --git a/gcc/config/ft32/ft32.c b/gcc/config/ft32/ft32.c
index 85e5ba3..ab62061 100644
--- a/gcc/config/ft32/ft32.c
+++ b/gcc/config/ft32/ft32.c
@@ -238,7 +238,7 @@ ft32_print_operand (FILE * file, rtx x, int code)
   return;
 
 case MEM:
-  output_address (XEXP (operand, 0));
+  output_address (GET_MODE (XEXP (operand, 0)), XEXP (operand, 0));
   return;
 
 default:
diff --git a/gcc/config/moxie/moxie.c b/gcc/config/moxie/moxie.c
index a45b825..756e2f7 100644
--- a/gcc/config/moxie/moxie.c
+++ b/gcc/config/moxie/moxie.c
@@ -106,7 +106,7 @@ moxie_operand_lossage (const char *msgid, rtx op)
 /* The PRINT_OPERAND_ADDRESS worker.  */
 
 static void
-moxie_print_operand_address (FILE *file, rtx x)
+moxie_print_operand_address (FILE *file, machine_mode, rtx x)
 {
   switch (GET_CODE (x))
 {
@@ -183,7 +183,7 @@ moxie_print_operand (FILE *file, rtx x, int code)
   return;
 
 case MEM:
-  output_address (XEXP (operand, 0));
+  output_address (GET_MODE (XEXP (operand, 0)), XEXP (operand, 0));
   return;
 
 default:


Re: [PATCH] libcpp: add examples to source_location description

2015-11-10 Thread Jeff Law

On 11/10/2015 09:44 AM, David Malcolm wrote:

This is a followup to:
   [PATCH 10/10] Compress short ranges into source_location
which adds some worked examples of what a source_location/location_t
can encode.

Successfully bootstrapped on x86_64-pc-linux-gnu
(although it only touches a comment).

OK for trunk?

libcpp/ChangeLog:
* include/line-map.h (source_location): Add worked examples of
location encoding to the leading comment.

OK.
jeff



[gomp4] Fix some broken tests

2015-11-10 Thread Nathan Sidwell
I've committed this to gomp4.  In preparing the reworked firstprivate patch
changes for gomp4's gimplify.c I discovered these testcases were passing by 
accident, and lacked a data clause.


nathan
2015-11-10  Nathan Sidwell  

	* testsuite/libgomp.oacc-fortran/parallel-reduction.f90: Fix
	missing data clause.
	* testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c: Likewise.
	* testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-4.c: Likewise.

Index: libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90
===
--- libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90	(revision 230116)
+++ libgomp/testsuite/libgomp.oacc-fortran/parallel-reduction.f90	(working copy)
@@ -7,7 +7,7 @@ program reduction
 
   sum = 0
 
-  !$acc parallel reduction(+:sum) num_gangs (n)
+  !$acc parallel reduction(+:sum) num_gangs (n) copy(sum)
   sum = sum + 1
   !$acc end parallel
 
@@ -32,7 +32,7 @@ end program reduction
 subroutine redsub(sum, n)
   integer :: sum, n
 
-  !$acc parallel reduction(+:sum) num_gangs (10)
+  !$acc parallel reduction(+:sum) num_gangs (10)  copy(sum)
   sum = sum + 1
   !$acc end parallel
 end subroutine redsub
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c	(revision 230116)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/parallel-reduction.c	(working copy)
@@ -14,7 +14,7 @@ main ()
 
 #pragma acc data copy (dummy)
   {
-#pragma acc parallel num_gangs (N) reduction (+:s1)
+#pragma acc parallel num_gangs (N) reduction (+:s1) copy(s1)
 {
   s1++;
 }
@@ -34,7 +34,7 @@ main ()
   s1 = 0;
   s2 = 0;
 
-#pragma acc parallel num_gangs (10) reduction (+:s1, s2)
+#pragma acc parallel num_gangs (10) reduction (+:s1, s2) copy(s1, s2)
   {
 s1++;
 s2 += N;
@@ -57,7 +57,7 @@ main ()
 
   s1 = 0;
 
-#pragma acc parallel num_gangs (10) reduction (+:s1)
+#pragma acc parallel num_gangs (10) reduction (+:s1) copy(s1)
   {
 #pragma acc loop gang reduction (+:s1)
 for (i = 0; i < 10; i++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c	(revision 230116)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-1.c	(working copy)
@@ -15,7 +15,7 @@ main (int argc, char *argv[])
 # define GANGS 256
 #endif
   #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
-		   reduction(+:res1) copy(res2)
+reduction(+:res1) copy(res2, res1)
   {
 res1 += 5;
 
@@ -36,7 +36,7 @@ main (int argc, char *argv[])
 # define GANGS 8
 #endif
   #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
-		   reduction(*:res1) copy(res2)
+reduction(*:res1) copy(res1, res2)
   {
 res1 *= 5;
 
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c	(revision 230116)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-1.c	(working copy)
@@ -13,7 +13,7 @@ main (int argc, char *argv[])
 arr[i] = i;
 
   #pragma acc parallel num_gangs(32) num_workers(32) vector_length(32) \
-		   reduction(+:res)
+reduction(+:res) copy(res)
   {
 #pragma acc loop gang
 for (j = 0; j < 32; j++)
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c	(revision 230116)
+++ libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-2.c	(working copy)
@@ -14,7 +14,7 @@ main (int argc, char *argv[])
 # define GANGS 256
 #endif
   #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
-		   reduction(+:res1) copy(res2) async(1)
+reduction(+:res1) copy(res1, res2) async(1)
   {
 res1 += 5;
 
@@ -37,7 +37,7 @@ main (int argc, char *argv[])
 # define GANGS 8
 #endif
   #pragma acc parallel num_gangs(GANGS) num_workers(32) vector_length(32) \
-		   reduction(*:res1) copy(res2) async(1)
+reduction(*:res1) copy(res1, res2) async(1)
   {
 res1 *= 5;
 
Index: libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c
===
--- libgomp/testsuite/libgomp.oacc-c-c++-common/par-loop-comb-reduction-3.c	(revision 230116)

[PATCH, rs6000] Remove redundant logic from rs6000_secondary_reload_direct_move

2015-11-10 Thread Bill Schmidt
Hi,

While investigating another issue, I observed some repeated logic in
rs6000_secondary_reload_direct_move ().  This patch takes it out.  No
functional change intended, and quite straightforward, so I'll plan to
commit shortly if no concerns are raised.

Bootstrapped and tested on powerpc64le-linux-gnu with no regressions.

Thanks,
Bill


2015-11-10  Bill Schmidt  

* config/rs6000/rs6000.c (rs6000_secondary_reload_direct_move):
Remove redundant code.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 230052)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -17926,29 +17926,8 @@ rs6000_secondary_reload_direct_move (enum rs6000_r
}
 }
 
-  if (TARGET_POWERPC64 && size == 16)
+  else if (size == 8)
 {
-  /* Handle moving 128-bit values from GPRs to VSX point registers on
-power8 when running in 64-bit mode using XXPERMDI to glue the two
-64-bit values back together.  */
-  if (to_type == VSX_REG_TYPE && from_type == GPR_REG_TYPE)
-   {
- cost = 3; /* 2 mtvsrd's, 1 xxpermdi.  */
- icode = reg_addr[mode].reload_vsx_gpr;
-   }
-
-  /* Handle moving 128-bit values from VSX point registers to GPRs on
-power8 when running in 64-bit mode using XXPERMDI to get access to the
-bottom 64-bit value.  */
-  else if (to_type == GPR_REG_TYPE && from_type == VSX_REG_TYPE)
-   {
- cost = 3; /* 2 mfvsrd's, 1 xxpermdi.  */
- icode = reg_addr[mode].reload_gpr_vsx;
-   }
-}
-
-  else if (!TARGET_POWERPC64 && size == 8)
-{
   /* Handle moving 64-bit values from GPRs to floating point registers on
 power8 when running in 32-bit mode using FMRGOW to glue the two 32-bit
 values back together.  Altivec register classes must be handled




Re: [PATCH 1/2] simplify-rtx: Simplify trunc of and of shiftrt

2015-11-10 Thread Bernd Schmidt

On 11/10/2015 06:44 PM, Segher Boessenkool wrote:


Yes I know.  All the rest of the code around it is like this though.
Do you want this written in a saner way?


I won't object to leaving it as-is for now, but in the future it would 
be good to keep this in mind.



I'm not entirely sure what the
last condition here is supposed to test.


It tests whether moving the truncate inside will give the same result.
It essentially looks if it works for an x with all bits set; if that
works, it works for any x.


Yeah, I figured afterwards that must have been the purpose of the test 
but I was thinking of other constants because of the trunc_int_for_mode 
thing. (I probably would have written "(and_const >> shift_amount) & 
~small_mask == 0" but yours should be ok too). You might want to use 
your description as a comment.
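
For illustration, the all-bits-set argument can be checked in plain C.  This
is only a sketch, assuming a 64-bit wide mode and a 32-bit narrow mode;
narrowing_is_safe is a made-up name and is not taken from simplify-rtx.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not GCC code: model a 64-bit "wide" mode and a
   32-bit "narrow" mode.  Returns true if
     (uint32_t) ((x >> shift) & and_const)
   equals
     ((uint32_t) x >> shift) & (uint32_t) and_const
   for every x, by testing only the x that has all bits set.  */
static bool
narrowing_is_safe (unsigned shift, uint64_t and_const)
{
  assert (shift < 32);	/* Keep the narrow shift well defined.  */

  uint64_t wide_ones = ~(uint64_t) 0;
  uint32_t narrow_ones = ~(uint32_t) 0;

  /* Truncate after the shift and AND (the original expression).  */
  uint32_t truncate_outside = (uint32_t) ((wide_ones >> shift) & and_const);
  /* Truncate first, then shift and AND (the simplified expression).  */
  uint32_t truncate_inside = (narrow_ones >> shift) & (uint32_t) and_const;

  return truncate_outside == truncate_inside;
}
```

With a shift of 28 and a mask of 0xff the check fails, because mask bits 4
to 7 would select bits 32 to 35 of x, which the narrow mode no longer has.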



... the fact that here I think you'd have to trunc_int_for_mode the AND
amount for the smaller mode?


Ugh yes, I still have to do that for it to be valid RTL in all cases.
Thanks for catching it.


So FAOD the patch is OK with that change.


Bernd


Re: [PATCH][combine][RFC] Don't transform sign and zero extends inside mults

2015-11-10 Thread Segher Boessenkool
On Mon, Nov 09, 2015 at 03:51:32AM -0600, Segher Boessenkool wrote:
> > From the original patch submission, it looks like this patch would
> > also benefit x86_32.
> 
> Yes, that is what I thought too.
> 
> > Regarding the above code size increase -  do you perhaps have a
> > testcase, to see what causes the difference?
> 
> I could extract some.  It happens quite rarely on usual code.
> 
> > It isn't necessary due to
> > the patch, but perhaps some loads are moved to the insn and aren't
> > CSE'd anymore.

I don't have a small testcase yet.

What causes the degradation is that sometimes we end up with imul reg,reg
instead of imul mem,reg.  In the normal case we already have imul mem,reg
after expand, so the patch doesn't change anything in the normal case.
Even if expand didn't do it fwprop would I think.

It also isn't LRA that is doing it, the MEMs in case are not on stack.
Maybe as you say some CSE pass.

For x86_64, which has many more registers than i386, often a peephole
fires that turns a  mov reg,reg ; imul mem,reg  into a  mov mem,reg ;
imul reg,reg  which makes the generated machine code identical with
or without the patch (tested on a Linux build, 12MB text).

The i386 size regression is 0.01% btw (comparable to the gains for
other targets).


Segher


Re: [PATCH v4] SH FDPIC backend support

2015-11-10 Thread Rich Felker
On Tue, Oct 27, 2015 at 11:01:39PM +0900, Oleg Endo wrote:
> On Mon, 2015-10-26 at 22:47 -0400, Rich Felker wrote:
> > On Sun, Oct 25, 2015 at 11:28:51PM +0900, Oleg Endo wrote:
> > > On Fri, 2015-10-23 at 02:32 -0400, Rich Felker wrote:
> > > > Here's my updated version of the FDPIC patch with all requested
> > > > changes made and Changelog added. I've included all the original
> > > > authors. This is my first time writing such an extensive
> > > > Changelog
> > > > entry so please let me know if there are things I got wrong.
> > > 
> > > I took the liberty and fixed some minor formatting trivia and
> > > extracted
> > > functions sh_emit_storesi and sh_emit_storehi which are used in
> > >  sh_trampoline_init to effectively memcpy code into the trampoline
> > > area.  Can you please check it?  If it's OK I'll commit the
> > > attached
> > > patch to trunk.
> > 
> > Is there anything in particular you'd like me to check? It builds
> > fine
> > for fdpic target, successfully compiles musl libc.so, and busybox
> > runs
> > with the resulting libc.so. I did a quick visual inspection of the
> > diff between my version and yours too and didn't see anything that
> > looked suspicious to me.
> 
> Thanks.  I have committed it as r229438 after a sanity check with "make
> all" on sh-elf.
> 
> The way libcalls are now emitted is a bit unhandy.  If more special-ABI
> libcalls are to be added in the future, they all have to do the jsr vs.
> bsrf handling (some potential candidates for new libcalls are optimized
> soft FP routines).  Then we still have PR 65374 and PR 54019. In the
> future maybe we should come up with something that allows emitting
> libcalls in a more transparent way...

I'd like to look into improving this at some point in the near future.
On further reading of the changes made, I think there's a lot of code
we could reduce or simplify.

In all the places where new RTL patterns were added for *call*_fdpic,
the main constraint change vs the non-fdpic version is using REG_PIC.
Is it possible to make a REG_GOT_ARG macro or similar that's defined
as something like TARGET_FDPIC ? REG_PIC : nonexistent_or_dummy?
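
To make the idea concrete, here is a rough sketch of what I mean.  All the
definitions are stand-ins for illustration (target_fdpic, the register
number 12, REG_NONE and call_uses_reg_p are assumptions, not the real SH
backend definitions), and the .md patterns may well need something
different in practice.

```c
#include <stdbool.h>

/* Stand-ins for illustration only; the real definitions live in the SH
   backend and may differ.  */
static bool target_fdpic;		/* Stands in for TARGET_FDPIC.  */
#define REG_PIC 12			/* Assumed register number.  */
#define REG_NONE (-1)			/* Hypothetical "no register" value.  */

/* The proposed macro: the GOT argument register on FDPIC, nothing
   otherwise.  */
#define REG_GOT_ARG (target_fdpic ? REG_PIC : REG_NONE)

/* Hypothetical consumer: does a call pattern need to mention REGNO?  */
static bool
call_uses_reg_p (int regno)
{
  return REG_GOT_ARG != REG_NONE && regno == REG_GOT_ARG;
}
```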

As for the call site stuff, I wonder why the existing call site stuff
used by "call_pcrel" can't be used for SFUNC_STATIC. I'm actually
trying to prepare a simpler FDPIC patch for other gcc versions we're
interested in that's not so invasive, and for now I'm just having
function_symbol replace SFUNC_STATIC with SFUNC_GOT on TARGET_FDPIC to
avoid needing all the label stuff, but it would be nice to find a way
to reuse the existing framework.

Rich


Re: [gomp4] Random omp-low.c backporting

2015-11-10 Thread Nathan Sidwell

On 11/10/15 11:28, Thomas Schwinge wrote:

Hi Nathan!

On Tue, 10 Nov 2015 09:19:50 -0500, Nathan Sidwell  wrote:

I've committed this to backport a bunch of random bits from trunk to gomp4, and
thereby reduce divergence.


Yeah, I had some of these on my list, too.


--- omp-low.c   (revision 230080)
+++ omp-low.c   (working copy)
@@ -12515,7 +12485,7 @@ replace_oacc_fn_attrib (tree fn, tree di
 function attribute.  Push any that are non-constant onto the ARGS
 list, along with an appropriate GOMP_LAUNCH_DIM tag.  */

-void
+static void
  set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
  {
/* Must match GOMP_DIM ordering.  */




fixed.  Not sure why I don't encounter these build problems ...

nathan

2015-11-10  Nathan Sidwell  

	* omp-low.c (set_oacc_fn_attrib): Revert static storage specifier.

Index: gcc/omp-low.c
===
--- gcc/omp-low.c	(revision 230120)
+++ gcc/omp-low.c	(working copy)
@@ -12574,7 +12574,7 @@ replace_oacc_fn_attrib (tree fn, tree di
function attribute.  Push any that are non-constant onto the ARGS
list, along with an appropriate GOMP_LAUNCH_DIM tag.  */
 
-static void
+void
 set_oacc_fn_attrib (tree fn, tree clauses, vec *args)
 {
   /* Must match GOMP_DIM ordering.  */


Re: [PATCH] gcc.c: new macro POST_LINK_SPECS to be able to add additional steps after linking

2015-11-10 Thread Jeff Law

On 11/10/2015 11:16 AM, Andris Pavenis wrote:

One may need to execute extra steps after linking a program. This is required,
for example, for DJGPP to run stubify.exe on the file generated by the linker.

The only way to achieve this so far was to use LINK_COMMAND_SPEC. It would be
much easier
and less error-prone to use the new macro POST_LINK_SPEC introduced in this
patch.

Andris

ChangeLog entry

2015 Nov 10 Andris Pavenis 

 * gcc.c: new macro POST_LINK_SPEC
 * doc/tm.texi.in: document POST_LINK_SPEC
 * doc/tm.texi: regenerate


Can you also include the changes to djgpp.h which exploit this capability?

Jeff


Re: Short-cut generation of simple built-in functions

2015-11-10 Thread Richard Sandiford
Richard Biener  writes:
> On Sat, Nov 7, 2015 at 2:31 PM, Richard Sandiford
>  wrote:
>> This patch short-circuits the builtins.c expansion code for a particular
>> gimple call if:
>>
>> - the function has an associated internal function
>> - the target implements that internal function
>> - the call has no side effects
>>
>> This allows a later patch to remove the builtins.c code, once calls with
>> side effects have been handled.
>>
>> Tested on x86_64-linux-gnu, aarch64-linux-gnu and arm-linux-gnueabi.
>> OK to install?
>>
>> Thanks,
>> Richard
>>
>>
>> gcc/
>> * builtins.h (called_as_built_in): Declare.
>> * builtins.c (called_as_built_in): Make external.
>> * internal-fn.h (expand_internal_call): Define a variant that
>> specifies the internal function explicitly.
>> * internal-fn.c (expand_load_lanes_optab_fn)
>> (expand_store_lanes_optab_fn, expand_ANNOTATE, expand_GOMP_SIMD_LANE)
>> (expand_GOMP_SIMD_VF, expand_GOMP_SIMD_LAST_LANE)
>> (expand_GOMP_SIMD_ORDERED_START, expand_GOMP_SIMD_ORDERED_END)
>> (expand_UBSAN_NULL, expand_UBSAN_BOUNDS, expand_UBSAN_VPTR)
>> (expand_UBSAN_OBJECT_SIZE, expand_ASAN_CHECK, expand_TSAN_FUNC_EXIT)
>> (expand_UBSAN_CHECK_ADD, expand_UBSAN_CHECK_SUB)
>> (expand_UBSAN_CHECK_MUL, expand_ADD_OVERFLOW, expand_SUB_OVERFLOW)
>> (expand_MUL_OVERFLOW, expand_LOOP_VECTORIZED)
>> (expand_mask_load_optab_fn, expand_mask_store_optab_fn)
>> (expand_ABNORMAL_DISPATCHER, expand_BUILTIN_EXPECT, expand_VA_ARG)
>> (expand_UNIQUE, expand_GOACC_DIM_SIZE, expand_GOACC_DIM_POS)
>> (expand_GOACC_LOOP, expand_GOACC_REDUCTION, expand_direct_optab_fn)
>> (expand_unary_optab_fn, expand_binary_optab_fn): Add an internal_fn
>> argument.
>> (internal_fn_expanders): Update prototype.
>> (expand_internal_call): Define a variant that specifies the
>> internal function explicitly. Use it to implement the previous
>> interface.
>> * cfgexpand.c (expand_call_stmt): Try to expand calls to built-in
>> functions as calls to internal functions.
>>
>> diff --git a/gcc/builtins.c b/gcc/builtins.c
>> index f65011e..bbcc7dc3 100644
>> --- a/gcc/builtins.c
>> +++ b/gcc/builtins.c
>> @@ -222,7 +222,7 @@ is_builtin_fn (tree decl)
>> of the optimization level.  This means whenever a function is invoked 
>> with
>> its "internal" name, which normally contains the prefix "__builtin".  */
>>
>> -static bool
>> +bool
>>  called_as_built_in (tree node)
>>  {
>>/* Note that we must use DECL_NAME, not DECL_ASSEMBLER_NAME_SET_P since
>> diff --git a/gcc/builtins.h b/gcc/builtins.h
>> index 917eb90..1d00068 100644
>> --- a/gcc/builtins.h
>> +++ b/gcc/builtins.h
>> @@ -50,6 +50,7 @@ extern struct target_builtins *this_target_builtins;
>>  extern bool force_folding_builtin_constant_p;
>>
>>  extern bool is_builtin_fn (tree);
>> +extern bool called_as_built_in (tree);
>>  extern bool get_object_alignment_1 (tree, unsigned int *,
>> unsigned HOST_WIDE_INT *);
>>  extern unsigned int get_object_alignment (tree);
>> diff --git a/gcc/cfgexpand.c b/gcc/cfgexpand.c
>> index bfbc958..dc7d4f5 100644
>> --- a/gcc/cfgexpand.c
>> +++ b/gcc/cfgexpand.c
>> @@ -2551,10 +2551,25 @@ expand_call_stmt (gcall *stmt)
>>return;
>>  }
>>
>> +  /* If this is a call to a built-in function and it has no effect other
>> + than setting the lhs, try to implement it using an internal function
>> + instead.  */
>> +  decl = gimple_call_fndecl (stmt);
>> +  if (gimple_call_lhs (stmt)
>> +  && !gimple_vdef (stmt)
>
> I think you want && ! gimple_has_side_effects (stmt)
> instead of checking !gimple_vdef (stmt).

OK, I can do that, but what would the difference be in practice for
these types of call?  I.e. are there cases for built-ins where:

  (A) gimple_vdef (stmt) && !gimple_side_effects (stmt)

or:

  (B) !gimple_vdef (stmt) && gimple_side_effects (stmt)

?

It just seems like this check should be the opposite of the one used
in the call-cdce patch (when deciding whether to optimise a call
with an lhs).  In order to keep them in sync I'd need to use
gimple_side_effects rather than gimple_vdef there too, but is
(B) a possibility there?

>> +  && (optimize || (decl && called_as_built_in (decl))))
>> +{
>> +  internal_fn ifn = replacement_internal_fn (stmt);
>> +  if (ifn != IFN_LAST)
>> +   {
>> + expand_internal_call (ifn, stmt);
>> + return;
>> +   }
>> +}
>> +
>>exp = build_vl_exp (CALL_EXPR, gimple_call_num_args (stmt) + 3);
>>
>>CALL_EXPR_FN (exp) = gimple_call_fn (stmt);
>> -  decl = gimple_call_fndecl (stmt);
>>builtin_p = decl && DECL_BUILT_IN (decl);
>>
>>/* If this is not a builtin function, the function type through which the
>> diff --git a/gcc/internal-fn.c 

Re: libgo patch committed: Update to Go 1.5 release

2015-11-10 Thread Ian Lance Taylor
On Sun, Nov 8, 2015 at 9:21 AM, Rainer Orth  
wrote:
>
> There were two remaining problems:
>
> * Before Solaris 12, sendfile only lives in libsendfile.  This led to
>   link failures in gotools.
>
> * Solaris 12 introduced a couple more types that use _in6_addr_t, which
>   are filtered out by mksysinfo.sh, leading to compilation failures.
>
> The following patch addresses both issues.  Solaris 10 and 11 bootstraps
> have completed, a Solaris 12 bootstrap is still running make check.

Thanks.  Committed to mainline.

Ian


Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)

2015-11-10 Thread Michael Meissner
This patch adds the d-form addressing for float/double scalars on PowerPC that
was added in ISA 3.0 (power9).  This patch does not yet turn on D-form
addressing by default.  It is likely that patch #11, which will add limited
d-form addressing to vector registers, will enable it by default.
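
As a rough illustration of the mask-based check behind
mode_supports_vmx_dform, here is a toy version.  The bit values, the toy_*
names and the size-only condition are assumptions made for this sketch; the
real code also looks at the mode (SFmode/DFmode) and at TARGET_P9_DFORM.

```c
#include <stdbool.h>

/* Illustrative bit values only; the real RELOAD_REG_* masks are defined
   in rs6000.c.  */
#define RELOAD_REG_VALID	0x01	/* Mode is valid in this class.  */
#define RELOAD_REG_INDEXED	0x04	/* reg+reg (X-form) addressing.  */
#define RELOAD_REG_OFFSET	0x08	/* reg+offset (D-form) addressing.  */

enum toy_reg_class { TOY_GPR, TOY_FPR, TOY_VMX, TOY_N_CLASSES };

/* Toy version of the per-mode address mask table.  */
struct toy_reg_addr
{
  unsigned char addr_mask[TOY_N_CLASSES];
};

/* Same shape of test as mode_supports_vmx_dform, on the toy table.  */
static bool
toy_supports_vmx_dform (const struct toy_reg_addr *ra)
{
  return (ra->addr_mask[TOY_VMX] & RELOAD_REG_OFFSET) != 0;
}

/* Enable reg+offset for the Altivec class only when D-form is available
   and the value is a scalar of at most 8 bytes.  */
static void
toy_setup_vmx (struct toy_reg_addr *ra, bool p9_dform, unsigned msize)
{
  ra->addr_mask[TOY_VMX] = RELOAD_REG_VALID | RELOAD_REG_INDEXED;
  if (p9_dform && msize <= 8)
    ra->addr_mask[TOY_VMX] |= RELOAD_REG_OFFSET;
}
```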

I have bootstrapped the compiler with these changes, and there were no
regressions to the testsuite.

In addition, I built all of the Spec 2006 benchmark with my normal options
(-ffast-math -O3 -mveclibabi=mass -mcpu=power9 -mpower9-dform -mrecip=rsqrt
-fpeel-loops -funroll-loops -fvect-cost-model -msave-toc-indirect
-fno-aggressive-loop-optimizations -mno-pointers-to-nested-functions) and there
were no compiler failures (and various power9 instructions were generated,
including d-form addressing).

Are these patches ok to check in?

[gcc]
2015-11-10  Michael Meissner  


* config/rs6000/constraints.md (wb constraint): New constraint for
ISA 3.0 d-form scalar addressing.

* config/rs6000/rs6000.c (mode_supports_vmx_dform): Add support
for ISA 3.0 D-form addressing to load SFmode/DFmode scalars into
Altivec registers.  Add wb constraint for Altivec registers with
D-form addressing.  If we have ISA 3.0 d-form support, undo
secondary reload support for using FPR registers if we want to do
D-form addressing.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.

* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wb
constraint.

* config/rs6000/rs6000.md (f32_lr2 mode attribute): Add support
for ISA 3.0 SFmode/DFmode d-form addressing to Altivec registers.
(f32_lm2): Likewise.
(f32_li2): Likewise.
(f32_sr2): Likewise.
(f32_sm2): Likewise.
(f32_si2): Likewise.
(f64_p9): Likewise.
(extendsfdf2_fpr): Likewise.
(mov_hardfloat): Likewise.
(mov_hardfloat32): Likewise.
(mov_hardfloat64): Likewise.

* doc/md.texi (RS/6000 constraints): Document wb constraint.
Fixup we constraint documentation.

[gcc/testsuite]
2015-11-10  Michael Meissner  

* gcc.target/powerpc/dform-1.c: New test.
* gcc.target/powerpc/dform-2.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[ptx] partitioning optimization

2015-11-10 Thread Nathan Sidwell
I've committed this patch to trunk.  It implements a partitioning optimization 
for a loop partitioned over both vector and worker axes.  We can elide the inner 
vector partitioning state propagation, if there are no intervening instructions 
in the worker-partitioned outer loop other than the forking and joining.  We 
simply execute the worker propagation on all vectors.
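
A stripped-down model of the merge, for readers who don't want to wade
through the RTL details.  This is a sketch only: toy_par and its
only_fork_join flag are invented stand-ins for the real parallel structure
and the insn checks, and the axis numbering is assumed to match
gomp-constants.h.

```c
#include <stdbool.h>
#include <stddef.h>

/* Axis numbering as in gomp-constants.h (assumed here).  */
#define GOMP_DIM_GANG	0
#define GOMP_DIM_WORKER	1
#define GOMP_DIM_VECTOR	2
#define GOMP_DIM_MAX	3
#define GOMP_DIM_MASK(X) (1u << (X))

/* Toy stand-in for the backend's parallel structure.  */
struct toy_par
{
  unsigned mask;		/* Axes this region is partitioned over.  */
  struct toy_par *inner;	/* First nested region.  */
  struct toy_par *next;		/* Next sibling region.  */
  bool only_fork_join;		/* No insns besides fork/join around inner.  */
};

/* Fold the single inner region into PAR when nothing but forking and
   joining separates them.  Returns true if a merge happened.  */
static bool
toy_optimize_inner (struct toy_par *par)
{
  struct toy_par *inner = par->inner;
  unsigned axes = GOMP_DIM_MASK (GOMP_DIM_MAX) - 1;

  if (!par->mask || !inner || inner->next || !par->only_fork_join)
    return false;
  if (par->mask & inner->mask & axes)
    return false;		/* Overlapping axes: leave alone.  */

  par->mask |= inner->mask & axes;
  par->inner = inner->inner;
  inner->inner = NULL;
  return true;
}
```

A worker-partitioned region (mask 2) wrapping a vector-partitioned one
(mask 4) ends up as a single region with mask 6.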


I've been unable to introduce a testcase for this. The difficulty is we want to 
check an rtl dump from the acceleration compiler, and there doesn't  appear to 
be existing machinery for that in the testsuite.  Perhaps something to be added 
later?


nathan
2015-11-10  Nathan Sidwell  

	* config/nvptx/nvptx.opt (moptimize): New flag.
	* config/nvptx/nvptx.c (nvptx_option_override): Set nvptx_optimize
	default.
	(nvptx_optimize_inner): New.
	(nvptx_process_pars): Call it when optimizing.
	* doc/invoke.texi (Nvidia PTX Options): Document -moptimize.

Index: config/nvptx/nvptx.c
===
--- config/nvptx/nvptx.c	(revision 230112)
+++ config/nvptx/nvptx.c	(working copy)
@@ -137,6 +137,9 @@ nvptx_option_override (void)
   write_symbols = NO_DEBUG;
   debug_info_level = DINFO_LEVEL_NONE;
 
+  if (nvptx_optimize < 0)
+nvptx_optimize = optimize > 0;
+
   declared_fndecls_htab = hash_table::create_ggc (17);
   needed_fndecls_htab = hash_table::create_ggc (17);
   declared_libfuncs_htab
@@ -2942,6 +2945,69 @@ nvptx_skip_par (unsigned mask, parallel
   nvptx_single (mask, par->forked_block, pre_tail);
 }
 
+/* If PAR has a single inner parallel and PAR itself only contains
+   empty entry and exit blocks, swallow the inner PAR.  */
+
+static void
+nvptx_optimize_inner (parallel *par)
+{
+  parallel *inner = par->inner;
+
+  /* We mustn't be the outer dummy par.  */
+  if (!par->mask)
+return;
+
+  /* We must have a single inner par.  */
+  if (!inner || inner->next)
+return;
+
+  /* We must only contain 2 blocks ourselves -- the head and tail of
+ the inner par.  */
+  if (par->blocks.length () != 2)
+return;
+
+  /* We must be disjoint partitioning.  As we only have vector and
+ worker partitioning, this is sufficient to guarantee the pars
+ have adjacent partitioning.  */
+  if ((par->mask & inner->mask) & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1))
+/* This indicates malformed code generation.  */
+return;
+
+  /* The outer forked insn should be immediately followed by the inner
+ fork insn.  */
+  rtx_insn *forked = par->forked_insn;
+  rtx_insn *fork = BB_END (par->forked_block);
+
+  if (NEXT_INSN (forked) != fork)
+return;
+  gcc_checking_assert (recog_memoized (fork) == CODE_FOR_nvptx_fork);
+
+  /* The outer joining insn must immediately follow the inner join
+ insn.  */
+  rtx_insn *joining = par->joining_insn;
+  rtx_insn *join = inner->join_insn;
+  if (NEXT_INSN (join) != joining)
+return;
+
+  /* Preconditions met.  Swallow the inner par.  */
+  if (dump_file)
+fprintf (dump_file, "Merging loop %x [%d,%d] into %x [%d,%d]\n",
+	 inner->mask, inner->forked_block->index,
+	 inner->join_block->index,
+	 par->mask, par->forked_block->index, par->join_block->index);
+
+  par->mask |= inner->mask & (GOMP_DIM_MASK (GOMP_DIM_MAX) - 1);
+
+  par->blocks.reserve (inner->blocks.length ());
+  while (inner->blocks.length ())
+par->blocks.quick_push (inner->blocks.pop ());
+
+  par->inner = inner->inner;
+  inner->inner = NULL;
+
+  delete inner;
+}
+
 /* Process the parallel PAR and all its contained
parallels.  We do everything but the neutering.  Return mask of
partitioned modes used within this parallel.  */
@@ -2949,6 +3015,9 @@ nvptx_skip_par (unsigned mask, parallel
 static unsigned
 nvptx_process_pars (parallel *par)
 {
+  if (nvptx_optimize)
+nvptx_optimize_inner (par);
+  
   unsigned inner_mask = par->mask;
 
   /* Do the inner parallels first.  */
Index: config/nvptx/nvptx.opt
===
--- config/nvptx/nvptx.opt	(revision 230112)
+++ config/nvptx/nvptx.opt	(working copy)
@@ -28,3 +28,7 @@ Generate code for a 64-bit ABI.
 mmainkernel
 Target Report RejectNegative
 Link in code for a __main kernel.
+
+moptimize
+Target Report Var(nvptx_optimize) Init(-1)
+Optimize partition neutering
Index: doc/invoke.texi
===
--- doc/invoke.texi	(revision 230112)
+++ doc/invoke.texi	(working copy)
@@ -873,7 +873,7 @@ Objective-C and Objective-C++ Dialects}.
 -march=@var{arch} -mbmx -mno-bmx -mcdx -mno-cdx}
 
 @emph{Nvidia PTX Options}
-@gccoptlist{-m32 -m64 -mmainkernel}
+@gccoptlist{-m32 -m64 -mmainkernel -moptimize}
 
 @emph{PDP-11 Options}
 @gccoptlist{-mfpu  -msoft-float  -mac0  -mno-ac0  -m40  -m45  -m10 @gol
@@ -18960,6 +18960,11 @@ Generate code for 32-bit or 64-bit ABI.
 Link in code for a __main kernel.  This is for stand-alone instead of
 offloading execution.
 

Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)

2015-11-10 Thread Michael Meissner
Arghh, forgot the patch once again.

[gcc]
2015-11-10  Michael Meissner  


* config/rs6000/constraints.md (wb constraint): New constraint for
ISA 3.0 d-form scalar addressing.

* config/rs6000/rs6000.c (mode_supports_vmx_dform): Add support
for ISA 3.0 D-form addressing to load SFmode/DFmode scalars into
Altivec registers.  Add wb constraint for Altivec registers with
D-form addressing.  If we have ISA 3.0 d-form support, undo
secondary reload support for using FPR registers if we want to do
D-form addressing.
(rs6000_debug_reg_global): Likewise.
(rs6000_setup_reg_addr_masks): Likewise.
(rs6000_init_hard_regno_mode_ok): Likewise.
(rs6000_secondary_reload): Likewise.
(rs6000_preferred_reload_class): Likewise.
(rs6000_secondary_reload_class): Likewise.

* config/rs6000/rs6000.h (enum r6000_reg_class_enum): Add wb
constraint.

* config/rs6000/rs6000.md (f32_lr2 mode attribute): Add support
for ISA 3.0 SFmode/DFmode d-form addressing to Altivec registers.
(f32_lm2): Likewise.
(f32_li2): Likewise.
(f32_sr2): Likewise.
(f32_sm2): Likewise.
(f32_si2): Likewise.
(f64_p9): Likewise.
(extendsfdf2_fpr): Likewise.
(mov_hardfloat): Likewise.
(mov_hardfloat32): Likewise.
(mov_hardfloat64): Likewise.

* doc/md.texi (RS/6000 constraints): Document wb constraint.
Fixup we constraint documentation.

[gcc/testsuite]
2015-11-10  Michael Meissner  

* gcc.target/powerpc/dform-1.c: New test.
* gcc.target/powerpc/dform-2.c: Likewise.


-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/config/rs6000/constraints.md
===
--- gcc/config/rs6000/constraints.md(revision 230078)
+++ gcc/config/rs6000/constraints.md(working copy)
@@ -56,7 +56,8 @@ (define_register_constraint "z" "CA_REGS
 (define_register_constraint "wa" "rs6000_constraints[RS6000_CONSTRAINT_wa]"
   "Any VSX register if the -mvsx option was used or NO_REGS.")
 
-;; wb is not currently used
+(define_register_constraint "wb" "rs6000_constraints[RS6000_CONSTRAINT_wb]"
+  "Altivec register if the -mpower9-dform option was used or NO_REGS.")
 
 ;; NOTE: For compatibility, "wc" is reserved to represent individual CR bits.
 ;; It is currently used for that purpose in LLVM.
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 230078)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -408,6 +408,13 @@ mode_supports_pre_modify_p (machine_mode
  != 0);
 }
 
+/* Return true if we have D-form addressing in altivec registers.  */
+static inline bool
+mode_supports_vmx_dform (machine_mode mode)
+{
+  return ((reg_addr[mode].addr_mask[RELOAD_REG_VMX] & RELOAD_REG_OFFSET) != 0);
+}
+
 
 /* Target cpu costs.  */
 
@@ -2258,7 +2265,9 @@ rs6000_debug_reg_global (void)
   "f  reg_class = %s\n"
   "v  reg_class = %s\n"
   "wa reg_class = %s\n"
+  "wb reg_class = %s\n"
   "wd reg_class = %s\n"
+  "we reg_class = %s\n"
   "wf reg_class = %s\n"
   "wg reg_class = %s\n"
   "wh reg_class = %s\n"
@@ -2283,7 +2292,9 @@ rs6000_debug_reg_global (void)
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_f]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_v]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wa]],
+  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wb]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wd]],
+  reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_we]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wf]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wg]],
   reg_class_names[rs6000_constraints[RS6000_CONSTRAINT_wh]],
@@ -2664,9 +2675,15 @@ rs6000_setup_reg_addr_masks (void)
}
 
  /* GPR and FPR registers can do REG+OFFSET addressing, except
-possibly for SDmode.  */
+possibly for SDmode.  ISA 3.0 (i.e. power9) adds D-form
+addressing for scalars to altivec registers.  */
  if ((addr_mask != 0) && !indexed_only_p
- && (rc == RELOAD_REG_GPR || rc == RELOAD_REG_FPR))
+ && msize <= 8
+ && (rc == RELOAD_REG_GPR
+ || rc == RELOAD_REG_FPR
+ || (rc == RELOAD_REG_VMX
+ && TARGET_P9_DFORM
+ && (m2 == DFmode || m2 == SFmode))))
addr_mask |= RELOAD_REG_OFFSET;
 
  /* VMX registers can do (REG & -16) and 

Re: [ptx] partitioning optimization

2015-11-10 Thread Ilya Verbin
> I've been unable to introduce a testcase for this. The difficulty is we want
> to check an rtl dump from the acceleration compiler, and there doesn't
> appear to be existing machinery for that in the testsuite.  Perhaps
> something to be added later?

I haven't tried it, but doesn't
/* { dg-options "-foffload=-fdump-rtl-..." } */
with
/* { dg-final { scan-rtl-dump ... } } */
work?

  -- Ilya


RE: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-10 Thread Robert Suchanek
Hi all,

> > Now that 'make check' has had enough time to run, I can see several
> > regressions in the configurations where GCC still builds.
> > For more details:
> > http://people.linaro.org/~christophe.lyon/cross-validation/gcc/trunk/230087/report-build-info.html
> >
> 
> This also causes failures for AArch64 -mcpu=cortex-a57 targets. This
> testcase:
> 
>   void
>   foo (unsigned char *out, const unsigned char *in, int a)
>   {
> for (int i = 0; i < a; i++)
>   {
> out[0] = in[2];
> out[1] = in[1];
> out[2] = in[0];
> in += 3;
> out += 3;
>   }
>   }
> 
> Fails as so:
> 
>   foo.c: In function 'void foo(unsigned char*, const unsigned char*, int)':
>   foo.c:12:1: internal compiler error: in scan_rtx_reg, at regrename.c:1074
>}
>^
> 
>   0xbe00f8 scan_rtx_reg
> /gcc/regrename.c:1073
>   0xbe0ad5 scan_rtx
> /gcc/regrename.c:1401
>   0xbe1038 record_out_operands
> /gcc/regrename.c:1554
>   0xbe1f50 build_def_use
> /gcc/regrename.c:1802
>   0xbe1f50 regrename_analyze(bitmap_head*)
> /gcc/regrename.c:726
>   0xf7a0c7 func_fma_steering::execute_fma_steering()
> /gcc/config/aarch64/cortex-a57-fma-steering.c:1026
>   0xf7a9c1 pass_fma_steering::execute(function*)
> /gcc/config/aarch64/cortex-a57-fma-steering.c:1063
>   Please submit a full bug report,
>   with preprocessed source if appropriate.
>   Please include the complete backtrace with any bug report.
>   See  for instructions.
> 
> When compiled with:
> 
>-O3 -mcpu=cortex-a57 foo.c
> 
> Thanks,
> James

Thanks for the test case.

It appears that I managed to run only those tests that didn't expose
the assertion error, and there is at least one more port, i.e. powerpc64,
showing similar ICEs when -funroll-loops and/or -fpeel-loops are used,
which enable the regrename pass.

In both AArch64 and ARM cases I found the same insufficient checks
when chains are tied and it seems that this is the root cause behind
all failures.

With the attached patch I built arm-none-linux-gnueabi without failures,
checked a number of cases shown on Christophe's page, the above
test case, and it would appear that the problem is solved.

The reason behind the failures is that terminated_this_insn had
a different number of consecutive registers (and mode) from the input
operand in the move currently being considered for tying. In the fix,
I allow tying only if the number of consecutive registers (nregs) matches.
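
Reduced to its essence, the added condition looks like the sketch below.
The toy_* types are stand-ins invented for this example; the real du_head
and rtx carry much more state.

```c
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; the real du_head and rtx carry much more state.  */
struct toy_chain
{
  unsigned regno;	/* First hard register of the chain.  */
  unsigned nregs;	/* Number of consecutive hard registers.  */
};

struct toy_operand
{
  unsigned regno;	/* First hard register of the operand.  */
  unsigned nregs;	/* Number of consecutive hard registers.  */
};

/* A chain that ends in this move may be tied to the move's source only
   if both cover the same number of consecutive registers.  */
static bool
toy_can_tie_p (const struct toy_chain *terminated,
	       const struct toy_operand *src)
{
  return terminated != NULL && terminated->nregs == src->nregs;
}
```

For instance, a chain spanning two registers is not tied to a source
operand that occupies only one.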

Bernd, do you think that this check would be sufficient and safe?
I'm not sure what would be better: check the mode, nregs plus perhaps
consider tying only if nregs == 1.

Regards,
Robert

gcc/
* regrename.c (scan_rtx_reg): Check the matching number of consecutive
registers when tying chains.
---
 gcc/regrename.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/gcc/regrename.c b/gcc/regrename.c
index d727dd9..0b8f032 100644
--- a/gcc/regrename.c
+++ b/gcc/regrename.c
@@ -1068,7 +1068,9 @@ scan_rtx_reg (rtx_insn *insn, rtx *loc, enum reg_class 
cl, enum scan_actions act
  && GET_CODE (pat) == SET
  && GET_CODE (SET_DEST (pat)) == REG
  && GET_CODE (SET_SRC (pat)) == REG
- && terminated_this_insn)
+ && terminated_this_insn
+ && terminated_this_insn->nregs
+== REG_NREGS (recog_data.operand[1]))
{
  gcc_assert (terminated_this_insn->regno
  == REGNO (recog_data.operand[1]));
-- 
2.4.5


Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-10 Thread Joseph Myers
On Mon, 9 Nov 2015, Sandra Loosemore wrote:

> If we're going to switch documentation formats, I'd rather we used DocBook.
> I've had to use "restructured text" before and found it really awkward.

I should perhaps note that the Sphinx extensions to reST are a lot better 
documented than the ZWiki ones!

-- 
Joseph S. Myers
jos...@codesourcery.com


[Patch, MIPS] Remove definition of TARGET_PROMOTE_PROTOTYPES

2015-11-10 Thread Steve Ellcey
This patch removes the definition of TARGET_PROMOTE_PROTOTYPES from MIPS,
where it was defined as true, so that it now defaults to false.

Currently MIPS does prototype promotion in the caller and the callee and this
patch removes the TARGET_PROMOTE_PROTOTYPES macro definition so that
the promotion is only done in the caller (due to PROMOTE_MODE being defined).
This does not break the ABI which requires the caller to do promotions anyway.
(See https://gcc.gnu.org/ml/gcc/2015-10/msg00223.html).  This change also
causes GCC to match what the LLVM and Greenhills compilers already do on MIPS.

After removing this macro I had three regressions, two were just tests that
needed changing but one was a bug (gcc.dg/fixed-point/convert-sat.c).
This test was calling a library function to convert a signed char into an
unsigned fixed type and because we don't have tree type information about
libcalls GCC cannot do the ABI required type promotion on those calls that it
does on normal user defined calls.  In fact promote_mode in explow.c explicitly
returns without doing anything if no type is given to it.  Before this change it
didn't matter on MIPS because the callee did the same promotion that the caller
was supposed to have done before using the argument.  Now that callee code is
gone we depend on the caller doing the correct promotion and that was not
happening.
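To make the caller/callee split concrete, here is a minimal sketch of what
"promotion" means for a char-sized argument.  The helper names are
hypothetical and are not GCC functions; the sketch assumes a 32-bit register
and two's-complement conversions.

```cpp
#include <cstdint>

// Hypothetical helper, not part of GCC: widen an 8-bit argument to a 32-bit
// register image the way a caller-side promotion would.
inline std::uint32_t promote_qi_to_si(std::uint8_t raw, bool unsignedp)
{
  if (unsignedp)
    return raw;  // zero-extend
  // Sign-extend: reinterpret as signed 8-bit, then widen.
  return static_cast<std::uint32_t>(
      static_cast<std::int32_t>(static_cast<std::int8_t>(raw)));
}

// A callee that trusts the caller and reads the whole register without
// extending the value again.
inline std::int32_t callee_reads_full_register(std::uint32_t reg)
{
  return static_cast<std::int32_t>(reg);
}
```

If the caller skips the promotion, as happened for the libcall, the callee
sees whatever was left in the upper bits of the register.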

I submitted and checked in another patch to optabs.c
(See https://gcc.gnu.org/ml/gcc-patches/2015-11/msg00704.html) to provide
me with the infrastructure to do the correct type promotion in expand_fixed
and this patch redefines TARGET_PROMOTE_FUNCTION_MODE to return the needed
promotion mode even when type is NULL_TREE.  When type is set it does
the same thing as it used to do.  This change allows me to remove the
definition of TARGET_PROMOTE_PROTOTYPES without the convert-sat.c test
failing.

The two tests that I changed are gcc.dg/tree-ssa/ssa-fre-4.c and
gcc.target/mips/ext-2.c.  ssa-fre-4.c no longer applies to MIPS now
that we do not define TARGET_PROMOTE_PROTOTYPES so I removed the MIPS
target from it.  ext-2.c now generates an srl instruction instead of a
dext instruction but the number of instructions has not changed and I
updated the scan checks.

Tested on mips-mti-linux-gnu with no unfixed regressions.  OK to checkin?

Steve Ellcey
sell...@imgtec.com


2015-11-10  Steve Ellcey  

* config/mips/mips.c (mips_promote_function_mode): New function.
(TARGET_PROMOTE_FUNCTION_MODE): Define as above function.
(TARGET_PROMOTE_PROTOTYPES): Remove.


diff --git a/gcc/config/mips/mips.c b/gcc/config/mips/mips.c
index 9880b23..e9c3830 100644
--- a/gcc/config/mips/mips.c
+++ b/gcc/config/mips/mips.c
@@ -19760,6 +19760,32 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class)
 return GR_REGS;
   return allocno_class;
 }
+
+/* Implement TARGET_PROMOTE_FUNCTION_MODE */
+
+/* This function is equivalent to default_promote_function_mode_always_promote
+   except that it returns a promoted mode even if type is NULL_TREE.  This is
+   needed by libcalls which have no type (only a mode) such as fixed conversion
+   routines that take a signed or unsigned char/short argument and convert it
+   to a fixed type.  */
+
+static machine_mode
+mips_promote_function_mode (const_tree type ATTRIBUTE_UNUSED,
+machine_mode mode,
+int *punsignedp ATTRIBUTE_UNUSED,
+const_tree fntype ATTRIBUTE_UNUSED,
+int for_return ATTRIBUTE_UNUSED)
+{
+  int unsignedp;
+
+  if (type != NULL_TREE)
+return promote_mode (type, mode, punsignedp);
+
+  unsignedp = *punsignedp;
+  PROMOTE_MODE (mode, unsignedp, type);
+  *punsignedp = unsignedp;
+  return mode;
+}
 
 /* Initialize the GCC target structure.  */
 #undef TARGET_ASM_ALIGNED_HI_OP
@@ -19864,10 +19890,7 @@ mips_ira_change_pseudo_allocno_class (int regno, reg_class_t allocno_class)
 #define TARGET_GIMPLIFY_VA_ARG_EXPR mips_gimplify_va_arg_expr
 
 #undef  TARGET_PROMOTE_FUNCTION_MODE
-#define TARGET_PROMOTE_FUNCTION_MODE default_promote_function_mode_always_promote
-#undef TARGET_PROMOTE_PROTOTYPES
-#define TARGET_PROMOTE_PROTOTYPES hook_bool_const_tree_true
-
+#define TARGET_PROMOTE_FUNCTION_MODE mips_promote_function_mode
 #undef TARGET_FUNCTION_VALUE
 #define TARGET_FUNCTION_VALUE mips_function_value
 #undef TARGET_LIBCALL_VALUE




2015-11-10  Steve Ellcey  

* gcc.dg/tree-ssa/ssa-fre-4.c: Remove mips*-*-* target.
* gcc.target/mips/ext-2.c: Update scan checks.


diff --git a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
index 02b6719..5a7588f 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/ssa-fre-4.c
@@ -1,6 +1,6 @@
 /* If the target returns false for TARGET_PROMOTE_PROTOTYPES, then there
will be no casts for FRE to eliminate and the 

Re: [PATCH], Add power9 support to GCC, patch #10 (SFmode/DFmode d-form addressing)

2015-11-10 Thread Segher Boessenkool
On Tue, Nov 10, 2015 at 04:56:15PM -0500, Michael Meissner wrote:
> This patch adds d-form addressing to float/double scalars for the PowerPC that was
> added in ISA 3.0 (power9).  This patch does not yet turn on D-form addressing
> as default.  It is likely that patch #11, which will add limited d-form
> addressing to vector registers will enable it by default.
> 
> I have bootstrapped the compiler with these changes, and there were no
> regressions to the testsuite.
> 
> In addition, I built all of the Spec 2006 benchmark with my normal options
> (-ffast-math -O3 -mveclibabi=mass -mcpu=power9 -mpower9-dform -mrecip=rsqrt
> -fpeel-loops -funroll-loops -fvect-cost-model -msave-toc-indirect
> -fno-aggressive-loop-optimizations -mno-pointers-to-nested-functions) and
> there were no compiler failures (and various power9 instructions were generated,
> including d-form addressing).
> 
> Are these patches ok to check in?

You forgot the patch again, it must be a curse ;-)


Segher


Re: [PR64164] drop copyrename, integrate into expand

2015-11-10 Thread Alexandre Oliva
On Nov 10, 2015, Alan Lawrence  wrote:

> FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2

Ugh, sorry.  I even checked that testcase by hand before submitting the
patch, because I knew it took the paths I was changing, but I didn't
realize the stack store and load would amount to shifts when the stack
slot was bypassed.

With the following patch, we get a lsr and a ubfx, without the sp
adjustments.  Please let me know if it causes any further problems.  So
far, I've tested it on x86_64-linux-gnu, i686-linux-gnu, and
ppc64le-linux-gnu; the ppc64-linux-gnu test run is running slower and
probably won't be done before I call it a day, but I wanted to give you
something before taking off for the day.

Is this ok to install if ppc64-linux-gnu also regstraps successfully?


[PR67753] adjust for padding when bypassing memory in assign_parm_setup_block

From: Alexandre Oliva 

Storing a register in memory as a full word and then accessing the
same memory address under a smaller-than-word mode amounts to
right-shifting of the register word on big endian machines.  So, if
BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
we're copying from the entry_parm REG directly to a pseudo, bypassing
any stack slot, perform the shifting explicitly.

This fixes the miscompile of function_return_val_10 in
gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
introduced in the first patch for 67753.
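As an illustration of the equivalence described above, here is a small
host-side sketch.  It assumes an 8-byte word and models memory as a byte
array; the function names are made up for the example and do not exist in
GCC.

```cpp
#include <cstdint>

// Store a 64-bit register image to memory in big-endian byte order.
inline void store_word_be(std::uint64_t word, unsigned char mem[8])
{
  for (int i = 0; i < 8; ++i)
    mem[i] = static_cast<unsigned char>(word >> (8 * (7 - i)));
}

// Load SIZE bytes (1..8) from the same address as a big-endian integer,
// i.e. access the slot under a smaller-than-word mode.
inline std::uint64_t load_narrow_be(const unsigned char mem[8], unsigned size)
{
  std::uint64_t value = 0;
  for (unsigned i = 0; i < size; ++i)
    value = (value << 8) | mem[i];
  return value;
}

// The explicit shift needed when the stack slot is bypassed:
// (UNITS_PER_WORD - size) * BITS_PER_UNIT, with 8-byte words and 8-bit units.
inline std::uint64_t shift_when_bypassing(std::uint64_t word, unsigned size)
{
  const unsigned by = (8 - size) * 8;  // 0..56 for size in 1..8
  return word >> by;
}
```

For every size from 1 to 8 the narrow load and the shift give the same
value, which is why the shift has to be emitted once the store and load
disappear.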

for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Right-shift
upward-padded big-endian args when bypassing the stack slot.
---
 gcc/function.c |   44 +---
 1 file changed, 41 insertions(+), 3 deletions(-)

diff --git a/gcc/function.c b/gcc/function.c
index a637cb3..1ee092c 100644
--- a/gcc/function.c
+++ b/gcc/function.c
@@ -3002,6 +3002,38 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
  emit_move_insn (change_address (mem, mode, 0), reg);
}
 
+#ifdef BLOCK_REG_PADDING
+ /* Storing the register in memory as a full word, as
+move_block_from_reg below would do, and then using the
+MEM in a smaller mode, has the effect of shifting right
+if BYTES_BIG_ENDIAN.  If we're bypassing memory, the
+shifting must be explicit.  */
+ else if (!MEM_P (mem))
+   {
+ rtx x;
+
+ /* If the assert below fails, we should have taken the
+mode != BLKmode path above, unless we have downward
+padding of smaller-than-word arguments on a machine
+with little-endian bytes, which would likely require
+additional changes to work correctly.  */
+ gcc_checking_assert (BYTES_BIG_ENDIAN
+  && (BLOCK_REG_PADDING (mode,
+ data->passed_type, 1)
+  == upward));
+
+ int by = (UNITS_PER_WORD - size) * BITS_PER_UNIT;
+
+ x = gen_rtx_REG (word_mode, REGNO (entry_parm));
+ x = expand_shift (RSHIFT_EXPR, word_mode, x, by,
+   NULL_RTX, 1);
+ x = force_reg (word_mode, x);
+ x = gen_lowpart_SUBREG (GET_MODE (mem), x);
+
+ emit_move_insn (mem, x);
+   }
+#endif
+
  /* Blocks smaller than a word on a BYTES_BIG_ENDIAN
 machine must be aligned to the left before storing
 to memory.  Note that the previous test doesn't
@@ -3023,14 +3055,20 @@ assign_parm_setup_block (struct assign_parm_data_all *all,
  tem = change_address (mem, word_mode, 0);
  emit_move_insn (tem, x);
}
- else if (!MEM_P (mem))
-   emit_move_insn (mem, entry_parm);
  else
move_block_from_reg (REGNO (entry_parm), mem,
 size_stored / UNITS_PER_WORD);
}
   else if (!MEM_P (mem))
-   emit_move_insn (mem, entry_parm);
+   {
+ gcc_checking_assert (size > UNITS_PER_WORD);
+#ifdef BLOCK_REG_PADDING
+ gcc_checking_assert (BLOCK_REG_PADDING (GET_MODE (mem),
+ data->passed_type, 0)
+  == upward);
+#endif
+ emit_move_insn (mem, entry_parm);
+   }
   else
move_block_from_reg (REGNO (entry_parm), mem,
 size_stored / UNITS_PER_WORD);


-- 
Alexandre Oliva, freedom fighter    http://FSFLA.org/~lxoliva/
You must be the change you wish to see in the world. -- Gandhi
Be Free! -- http://FSFLA.org/   FSF Latin America board member
Free Software Evangelist|Red Hat Brasil GNU Toolchain Engineer


Re: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-10 Thread Bernd Schmidt

On 11/10/2015 11:33 PM, Robert Suchanek wrote:


The reason behind the failures is that the terminated_this_insn had
a different number of consecutive registers (and mode) to the input
operand in a move currently being considered for tying. In the fix,
I allow tying only if there is matching number of NREGS.

Bernd, do you think that this check would be sufficient and safe?
I'm not sure what would be better: check the mode, nregs plus perhaps
consider tying only if nregs == 1.


Hmm, but shouldn't the regno still be the same? Or is this a case where 
we have a multi-word chain like ax/dx and then something like a "set bx, 
dx" involving only a part of it, but the entire chain dies?


I guess this is ok to stop the failures for now, but you may want to 
move the check to the point where we set terminated_this_insn. Also, as 
I pointed out earlier, clearing terminated_this_insn should probably 
happen earlier.



Bernd


Re: [C/C++ PATCH] Reject declarators with huge arrays (PR c/68107, c++/68266)

2015-11-10 Thread Jeff Law

On 11/10/2015 09:36 AM, Marek Polacek wrote:

While both C and C++ FEs are able to reject e.g.
int a[__SIZE_MAX__ / sizeof(int)];
they are accepting code such as
int (*a)[__SIZE_MAX__ / sizeof(int)];

As Joseph pointed out, any construction of a non-VLA type whose size is half or
more of the address space should receive a compile-time error.

Done by moving up the check for the size in bytes so that it checks every
non-VLA complete array type constructed in the course of processing the
declarator.  Since the C++ FE had the same problem, I've fixed it up there as
well.  And that's why I had to tweak dg-error of two C++ tests; if the size of
an array is considered invalid, we give an error message with word "unnamed".

(I've removed the comment about crashing in tree_to_[su]hwi since that seems
to no longer be the case.)

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2015-11-10  Marek Polacek  

PR c/68107
PR c++/68266
* c-decl.c (grokdeclarator): Check whether the size of arrays is
valid earlier.

* decl.c (grokdeclarator): Check whether the size of arrays is valid
earlier.

* c-c++-common/pr68107.c: New test.
* g++.dg/init/new38.C (large_array_char): Adjust dg-error.
(large_array_char_template): Likewise.
* g++.dg/init/new44.C: Adjust dg-error.
Someone (I can't recall who) suggested the overflow check ought to be 
shared, I agree.  Can you factor out that check, shove it into c-family/ 
and call it from the C & C++ front-ends?


Approved with that change.  Please post it here for archival purposes 
though.


Your decision as to whether or not the shared routine verifies that type 
!= error_mark_node as is currently done in the C++ front-end.  The C 
front-end merely checks it earlier.  So it's safe to put that test into 
the shared code if you want.
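For the archive, a rough sketch of the kind of shared predicate being asked
for.  This is not the code that went into c-family/; the name and signature
are invented for illustration, and a 64-bit address space is assumed.

```cpp
#include <cstdint>

// Hypothetical shared helper (illustrative name and signature only).
// Returns true when an array of NELEMS elements of ELEM_SIZE bytes occupies
// strictly less than half of a 64-bit address space.
inline bool array_size_ok(std::uint64_t nelems, std::uint64_t elem_size)
{
  const std::uint64_t half = std::uint64_t(1) << 63;
  if (nelems == 0 || elem_size == 0)
    return true;
  // nelems * elem_size < half, written as a division so the product
  // cannot overflow.
  return nelems <= (half - 1) / elem_size;
}
```

With this rule, __SIZE_MAX__ / sizeof(int) elements of a 4-byte int are
rejected whether the array type appears directly or behind a pointer
declarator.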


Jeff


Re: RFC: Experimental use of Sphinx for GCC documentation

2015-11-10 Thread David Malcolm
On Mon, 2015-11-09 at 16:37 -0700, Sandra Loosemore wrote:
> On 11/08/2015 06:55 AM, David Malcolm wrote:
> > I've been experimenting with using Sphinx [1] for GCC's documentation.
> >
> > [snip]
> >
> > The primary advantages of .rst/sphinx over .texi/texinfo I see are in
> > the generated HTML:
> >
> > * sane, stable URLs (so e.g. there is a reliable URL for the docs for,
> > say, "-Wall").
> >
> > * a page-splitting structure that makes sense, to me, at least [3]
> >
> > * much more use of markup, with restrained and well-chosen CSS
> > (texinfo's HTML seems to ignore much of the inline markup in
> > the .texinfo file)
> >
> > * autogenerated internal links, so that almost everything is clickable,
> > and will take you somewhere sane, by default
> >
> > * syntax-highlighting of code examples, with support for multiple
> > programming languages (note the mixture of C, C++, Fortran, etc in the
> > docs for the gcc options).
> >
> > * looks modern and fresh (IMHO), letting casual observers see that the
> > project is alive and kicking.
> >
> >
> > Thoughts?
> 
> If we're going to switch documentation formats, I'd rather we used 
> DocBook.  I've had to use "restructured text" before and found it really 
> awkward.

My own preference is the opposite; I've used DocBook and rst, and I find
DocBook to be the awkward one [1].  I think DocBook may be OK as an
interchange format, but I find it overly verbose to author and to read
in plain-text form.

(I'm not so fond of some parts of .rst's inline markup syntax, but I
find its structural aspects to be extremely expressive and concise;
overall I find it a joy to work with).

> But, personal preferences aside, I also think it's more important that 
> we commit documentation-person resources to making the content more 
> correct, readable, and better organized, than to making the HTML output 
> look "modern and fresh", or worse yet, translating the docs to another 
> format and having to proofread them for conversion goofs.

Correct, readable and well-organized documentation are laudable goals...
but I think that, to a first approximation, we're already there: I just
feel that the content is hidden behind a poor tool chain.

I believe that no matter how good we make the .texi files, the issues
with URLs, HTML page-splitting, etc with how texinfo's HTML generation
works will hold gcc back.

> BTW, Mentor Graphics' toolchains ship with a custom HTML stylesheet for 
> the generated manuals, to make them a little "prettier".  Maybe 
> something like that would go a long way towards solving the perceived 
> problems here? 

I'm interested in seeing that, though presumably the URL and
page-splitting issues would remain (is this at the CSS level, or do you
make deeper changes to the HTML generation?)


> Or improvements to texinfo's HTML generation.

texinfo is implemented in Perl, and FWIW, for me to help, that's a
showstopper (sorry; I've tried several times to get my head around Perl,
but my brain seems incompatible with it).


One other approach might be to retain .texi as the canonical format, but
have an optional custom HTML generator (perhaps using texi2rst to
generate .rst for sphinx, this time as an intermediate step during "make
html", rather than as a one-time conversion).  The main thing I think
it's missing is a way to express the language of embedded source
examples, so that they can be syntax-highlighted.


Thanks; I hope this is constructive.
Dave

[1] fwiw, my opinion on this has changed; a decade or so ago I worked on
a DocBook editor,  http://conglomerate.org/ 




Re: [gomp4] Fix some broken tests

2015-11-10 Thread Cesar Philippidis
On 11/10/2015 12:35 PM, Nathan Sidwell wrote:
> I've committed this to  gomp4.  In preparing the reworked firstprivate
> patch changes for gomp4's gimplify.c I discovered these testcases were
> passing by accident, and lacked a data clause.

It used to be if a reduction was on a parallel construct, the gimplifier
would introduce a pcopy clause for the reduction variable if it was not
associated with any data clause. Is that not the case anymore?

Cesar


Re: [PR64164] drop copyrename, integrate into expand

2015-11-10 Thread Jeff Law

On 11/10/2015 03:58 PM, Alexandre Oliva wrote:

On Nov 10, 2015, Alan Lawrence  wrote:


FAIL: gcc.target/aarch64/aapcs64/func-ret-4.c execution,  -O2


Ugh, sorry.  I even checked that testcase by hand before submitting the
patch, because I knew it took the paths I was changing, but I didn't
realize the stack store and load would amount to shifts when the stack
slot was bypassed.

With the following patch, we get a lsr and a ubfx, without the sp
adjustments.  Please let me know if it causes any further problems.  So
far, I've tested it on x86_64-linux-gnu, i686-linux-gnu, and
ppc64le-linux-gnu; the ppc64-linux-gnu test run is running slower and
probably won't be done before I call it a day, but I wanted to give you
something before taking off for the day.

Is this ok to install if ppc64-linux-gnu also regstraps successfully?


[PR67753] adjust for padding when bypassing memory in assign_parm_setup_block

From: Alexandre Oliva 

Storing a register in memory as a full word and then accessing the
same memory address under a smaller-than-word mode amounts to
right-shifting of the register word on big endian machines.  So, if
BLOCK_REG_PADDING chooses upward padding for BYTES_BIG_ENDIAN, and
we're copying from the entry_parm REG directly to a pseudo, bypassing
any stack slot, perform the shifting explicitly.

This fixes the miscompile of function_return_val_10 in
gcc.target/aarch64/aapcs64/func-ret-4.c for target aarch64_be-elf
introduced in the first patch for 67753.

for  gcc/ChangeLog

PR rtl-optimization/67753
PR rtl-optimization/64164
* function.c (assign_parm_setup_block): Right-shift
upward-padded big-endian args when bypassing the stack slot.

Don't you need to check the value of BLOCK_REG_PADDING at runtime?  The 
padding is essentially allowed to vary.


If you look at the other places where BLOCK_REG_PADDING is used, it's 
checked in a #ifdef, then again inside an if conditional.




Jeff



RE: [RFC][PATCH] Preferred rename register in regrename pass

2015-11-10 Thread Robert Suchanek
Hi,

> > Bernd, do you think that this check would be sufficient and safe?
> > I'm not sure what would be better: check the mode, nregs plus perhaps
> > consider tying only if nregs == 1.
>
> Hmm, but shouldn't the regno still be the same? Or is this a case where
> we have a multi-word chain like ax/dx and then something like a "set bx,
> dx" involving only a part of it, but the entire chain dies?

The more I stare at this the more confusing it is. Yes, it appears to be a
multi-word chain and when a subset dies then the whole chain dies.

Let's consider the following snippet:
...
(insn 1467 1465 1466 68 (set (reg:DI 4 r4 [626])
(mult:DI (zero_extend:DI (reg:SI 1 r1 [orig:698 bbase_yn ] [698]))
(zero_extend:DI (reg:SI 12 ip [orig:700 _302 ] [700] 
/scratch2/check-other-ports/src/gcc/libgfortran/generated/matmul_i8.c:284 54 
{*umulsidi3_v6}
 (nil))
(insn 1466 1467 4288 68 (set (reg:SI 2 r2 [625])
(plus:SI (mult:SI (reg:SI 12 ip [orig:700 _302 ] [700])
(reg:SI 0 r0 [orig:699 bbase_yn+4 ] [699]))
(reg:SI 2 r2 [624]))) 
/scratch2/check-other-ports/src/gcc/libgfortran/generated/matmul_i8.c:284 43 
{*mulsi3addsi_v6}
 (expr_list:REG_DEAD (reg:SI 12 ip [orig:700 _302 ] [700])
(nil)))
(insn 4288 1466 1469 68 (set (reg:SI 12 ip [1933])
(reg:SI 5 r5 [+4 ])) 
/scratch2/check-other-ports/src/gcc/libgfortran/generated/matmul_i8.c:284 174 
{*arm_movsi_insn}
 (expr_list:REG_DEAD (reg:SI 5 r5 [+4 ])
(nil)))
...

When the input operand in insn 4288 is terminated as dead then the
terminated_this_insn->regno points to register 4 but this_regno is 5.
terminated_this_insn->last->insn points to insn 1467.
I presume "[+4 ]" for register 5 in the dump indicates that this is a part of
the multi-word register.

When a new chain is created for the output operand with register 12
and tying is attempted then we get an assertion error.
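A toy model of the condition may help here.  The struct below is a stand-in
for the real du_head and keeps only the two fields that matter; the names
are hypothetical.  Unlike the patch, which checks nregs and then asserts on
regno, this sketch folds both comparisons into one predicate.

```cpp
// Minimal stand-in for a register chain: first hard register and the
// number of consecutive hard registers it covers.
struct toy_chain
{
  unsigned regno;
  unsigned nregs;
};

// Tie the new chain to the terminated one only if the move's source
// operand covers exactly the registers of the terminated chain.
inline bool can_tie(const toy_chain *terminated,
                    unsigned src_regno, unsigned src_nregs)
{
  return terminated != nullptr
         && terminated->nregs == src_nregs
         && terminated->regno == src_regno;
}
```

In the dump above the terminated chain is the DImode value in r4/r5
(regno 4, nregs 2) while the source of insn 4288 is r5 alone (regno 5,
nregs 1), so the predicate is false and no tying is attempted.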

> I guess this is ok to stop the failures for now, but you may want to
> move the check to the point where we set terminated_this_insn. Also, as
> I pointed out earlier, clearing terminated_this_insn should probably
> happen earlier.
>
> Bernd

Ah yes, I forgot to move this. I'll move it and commit the patch in the morning.

Regards,
Robert

Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-10 Thread Bin.Cheng
On Tue, Nov 10, 2015 at 9:26 AM, Bin.Cheng  wrote:
> On Mon, Nov 9, 2015 at 11:24 PM, Bernd Schmidt  wrote:
>> On 11/08/2015 10:11 AM, Richard Biener wrote:
>>>
>>> On November 8, 2015 3:58:57 AM GMT+01:00, "Bin.Cheng"
>>>  wrote:
>
> +inline bool
> +iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
> +  const iv_common_cand *ccand2)
> +{
> +  return ccand1->hash == ccand2->hash
> +&& operand_equal_p (ccand1->base, ccand2->base, 0)
> +&& operand_equal_p (ccand1->step, ccand2->step, 0)
> +&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
> + == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>
>>> Yes.  Patch is OK then.
>>
>>
>> Doesn't follow the formatting rules though in the quoted piece.
>
> Hi Bernd,
> Thanks for reviewing.  I haven't committed it yet, could you please
> point out which quoted piece is so that I can update patch?
Ah, the part quoted in review message, I was stupid and tried to find
quoted part in my patch...  I can see the problem now, here is the
updated patch.

Thanks,
bin
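For readers skimming the patch below, the bookkeeping amounts to grouping
uses by a {base, step} key and then ordering the groups by how many uses
share them.  Here is a standalone sketch of that idea using the standard
library instead of GCC's hash_table and vec; strings stand in for trees
and integers for iv_use pointers, so all names are illustrative.

```cpp
#include <algorithm>
#include <cstddef>
#include <map>
#include <string>
#include <utility>
#include <vector>

// One common candidate and the ids of the uses it was derived from.
struct common_cand
{
  std::string base;
  std::string step;
  std::vector<int> uses;
};

class common_cand_table
{
public:
  // Record that USE_ID suggests the candidate {BASE, STEP}.
  void record(const std::string &base, const std::string &step, int use_id)
  {
    const auto key = std::make_pair(base, step);
    auto it = m_index.find(key);
    if (it == m_index.end())
      {
        it = m_index.emplace(key, m_cands.size()).first;
        m_cands.push_back({base, step, {}});
      }
    m_cands[it->second].uses.push_back(use_id);
  }

  // Candidates shared by the most uses come first.
  std::vector<common_cand> sorted_by_use_count() const
  {
    std::vector<common_cand> out = m_cands;
    std::stable_sort(out.begin(), out.end(),
                     [](const common_cand &a, const common_cand &b)
                     { return a.uses.size() > b.uses.size(); });
    return out;
  }

private:
  std::map<std::pair<std::string, std::string>, std::size_t> m_index;
  std::vector<common_cand> m_cands;
};
```

The sketch compares use counts directly rather than subtracting them, which
sidesteps any question about unsigned wraparound in the comparator.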
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..aecba12 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+ const iv_common_cand *ccand2)
+{
+  return ccand1->hash == ccand2->hash
+&& operand_equal_p (ccand1->base, ccand2->base, 0)
+&& operand_equal_p (ccand1->step, ccand2->step, 0)
+&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
+ == TYPE_PRECISION (TREE_TYPE (ccand2->base));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
 }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+   tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+{
+  *slot = XNEW (struct iv_common_cand);
+  (*slot)->base = base;
+  (*slot)->step = step;
+  (*slot)->uses.create (8);
+  (*slot)->hash = ent.hash;
+  data->iv_common_cands.safe_push ((*slot));
+}
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+= (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+= (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on common candidated recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data 

Re: [hsa 9/12] Small alloc-pool fix

2015-11-10 Thread Martin Liška
On 11/06/2015 10:57 AM, Richard Biener wrote:
> On Fri, 6 Nov 2015, Martin Liška wrote:
> 
>> On 11/06/2015 10:00 AM, Richard Biener wrote:
>>> On Thu, 5 Nov 2015, Martin Jambor wrote:
>>>
>>>> Hi,
>>>>
>>>> we use C++ new operators based on alloc-pools a lot in the subsequent
>>>> patches and realized that on the current trunk, such new operators
>>>> would needlessly call the placement ::new operator within the allocate
>>>> method of pool-alloc.  Fixed below by providing a new allocation
>>>> method which does not call placement new, which is only safe to use
>>>> from within a new operator.
>>>>
>>>> The patch also fixes the slightly weird two parameter operator new
>>>> (which we do not use in HSA backend) so that it does not do the same.
>>>
>>
>> Hi.
>>
>>> Why do you need to add the pointer variant then?
>>
>> You are right, we originally used the variant in the branch, but it was
>> eventually left.
>>
>>>
>>> Also isn't the issue with allocate() that it does
>>>
>>> return ::new (m_allocator.allocate ()) T ();
>>>
>>> which 1) value-initializes and 2) doesn't even work with types like
>>>
>>> struct T { T(int); };
>>>
>>> thus types without a default constructor.
>>
>> You are right, it produces compilation error.
>>
>>>
>>> I think the allocator was poorly C++-ified without updating the
>>> specification for the cases it is supposed to handle.  And now
>>> we have C++ uses that are not working because the allocator is
>>> broken.
>>>
>>> An incrementally better version (w/o fixing the issue with
>>> types w/o default constructor) is
>>>
>>> return ::new (m_allocator.allocate ()) T;
>>
>> I've tried that, and it also calls default ctor:
>>
>> ../../gcc/alloc-pool.h: In instantiation of ‘T* object_allocator<T>::allocate() [with T = et_occ]’:
>> ../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, object_allocator<T>&) [with T = et_occ; size_t = long unsigned int]’
>> ../../gcc/et-forest.c:449:46:   required from here
>> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
>>et_occ ();
>>^
>> In file included from ../../gcc/et-forest.c:28:0:
>> ../../gcc/alloc-pool.h:483:44: error: within this context
>>  return ::new (m_allocator.allocate ()) T;
> 
> Yes, but it does slightly cheaper initialization of PODs
> 
>>
>>>
>>> thus default-initialize which does no initialization for PODs (without
>>> array members...) which is what the old pool allocator did.
>>
>> I'm not so familiar with differences related to PODs.
>>
>>>
>>> To fix the new operator (how do you even call that?  does it allow
>>> specifying constructor args and thus work without a default constructor?)
>>> it should indeed use an allocation method not performing the placement
>>> new.  But I'd call it allocate_raw rather than vallocate.
>>
>> For situations where you do not have a default ctor, one should use the
>> helper method defined at the end of alloc-pool.h:
>>
>> template <typename T>
>> inline void *
>> operator new (size_t, object_allocator<T> &a)
>> {
>>   return a.allocate ();
>> }
>>
>> For instance:
>> et_occ *nw = new (et_occurrences) et_occ (2);
> 
> Oh, so it uses placement new syntax...  works for me.
> 
>> or as used in the HSA branch:
>>
>> /* New operator to allocate convert instruction from pool alloc.  */
>>
>> void *
>> hsa_insn_cvt::operator new (size_t)
>> {
>>   return hsa_allocp_inst_cvt->allocate_raw ();
>> }
>>
>> and
>>
>> cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
>>
>>
>> I attached patch where I rename the method as suggested.
> 
> Ok.

Hi.

I'm sending suggested patch that survives regression tests and bootstrap
on x86_64-linux-gnu.

Can I install the patch to trunk?
Thanks,
Martin
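To summarize the thread for the archive, here is a self-contained sketch of
the pattern under discussion.  It is a toy allocator, not the code in
alloc-pool.h: toy_object_allocator and insn_like are invented names, and
the "pool" just tracks individually allocated blocks.

```cpp
#include <cstddef>
#include <new>
#include <vector>

// Toy stand-in for object_allocator<T>.
template <typename T>
class toy_object_allocator
{
public:
  toy_object_allocator() = default;
  toy_object_allocator(const toy_object_allocator &) = delete;
  toy_object_allocator &operator=(const toy_object_allocator &) = delete;

  ~toy_object_allocator()
  {
    for (void *block : m_blocks)
      ::operator delete(block);
  }

  // Hand out raw storage without constructing anything.  Only meant to be
  // called from an operator new, which is followed by a constructor call.
  void *allocate_raw()
  {
    void *block = ::operator new(sizeof(T));
    m_blocks.push_back(block);
    return block;
  }

  // Default-initialize a T in pool storage (needs a default ctor, so it is
  // only instantiated for types that have one).
  T *allocate() { return ::new (allocate_raw()) T; }

  std::size_t blocks_allocated() const { return m_blocks.size(); }

private:
  std::vector<void *> m_blocks;
};

// A class without a default constructor that still lives in a pool.
struct insn_like
{
  int lhs;
  int rhs;

  insn_like(int l, int r) : lhs(l), rhs(r) {}

  static toy_object_allocator<insn_like> &pool()
  {
    static toy_object_allocator<insn_like> the_pool;
    return the_pool;
  }

  // Class-specific operator new: raw storage only, the new-expression
  // then runs the (int, int) constructor on it.
  void *operator new(std::size_t) { return pool().allocate_raw(); }

  // The pool owns the storage, so deleting an object is a no-op here.
  void operator delete(void *) {}
};
```

A plain "new insn_like (3, 4)" then takes its storage from the pool and runs
only the two-argument constructor, with no extra placement new in between.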

> 
> Thanks,
> Richard.
> 
>> Thanks,
>> Martin
>>
>>>
>>> Thanks.
>>> Richard.
>>>
>>>> Thanks,
>>>>
>>>> Martin
>>>>
>>>>
>>>> 2015-11-05  Martin Liska
>>>>             Martin Jambor
>>>>
>>>>    * alloc-pool.h (object_allocator::vallocate): New method.
>>>>    (operator new): Call vallocate instead of allocate.
>>>>    (operator new): New operator.
>>>>

 diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
 index 0dc05cd..46b6550 100644
 --- a/gcc/alloc-pool.h
 +++ b/gcc/alloc-pool.h
 @@ -483,6 +483,12 @@ public:
  return ::new (m_allocator.allocate ()) T ();
}
  
 +  inline void *
 +  vallocate () ATTRIBUTE_MALLOC
 +  {
 +return m_allocator.allocate ();
 +  }
 +
inline void
remove (T *object)
{
 @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
  };
  
  /* Helper for classes that do not provide default ctor.  */
 -
  template 
  inline void *
  operator new (size_t, object_allocator )
  {
 -  return a.allocate ();
 +  return a.vallocate ();
 +}
 +
 +/* Helper for classes that do not provide default ctor.  */
 +template 
 +inline void *
 +operator new (size_t, object_allocator *a)
 +{

Re: [PATCH, VECTOR ABI] Add __attribute__((__simd__)) to GCC.

2015-11-10 Thread Kirill Yukhin
Hi Jakub,
On 29 Oct 09:54, Jakub Jelinek wrote:
> On Wed, Oct 28, 2015 at 12:16:04PM +0300, Kirill Yukhin wrote:
> > Bootstrapped. Regtested. Is it ok for trunk?
> > 
> > 
> > gcc/
> > * omp-low.c (pass_omp_simd_clone::gate): If target allows - call
> > without additional conditions.
> > * doc/extend.texi (simd): Document new attribute.
> > gcc/cp/
> > * parser.h (cp_parser): Add simd_attr_present.
> > * parser.c (cp_parser_late_return_type_opt): Handle
> > simd_attr_present, require comma in __vector__ attribute.
> > (cp_parser_gnu_attribute_list): Ditto.
> > gcc/c/
> > * c-parser.c (c_parser): Add simd_attr_present flag.
> > (c_parser_declaration_or_fndef): Call c_parser_declaration_or_fndef
> > if simd_attr_present is set.
> > (c_finish_omp_declare_simd): Handle simd_attr_present.
> 
> Actually, do you plan to eventually add some clauses/operands to the simd
> attribute, or is the plan to just say that simd attribute is
> #pragma omp declare simd
> with no clauses as if -fopenmp-simd has been enabled?
I think so.
> If you don't plan to add any clauses, I wonder whether you really need to
> add any parser changes at all, whether this couldn't be all handled in
> c-family/c-common.c - handle_simd_attribute, adding simd to the attribute
> table in there as a function decl attribute, and simply when processing it
> add
> tree c = build_tree_list (get_identifier ("omp declare simd"), 
> NULL_TREE);
> TREE_CHAIN (c) = DECL_ATTRIBUTES (fndecl);
> DECL_ATTRIBUTES (fndecl) = c;
> (after checking whether the attribute isn't already present and erroring out
> if there is "cilk simd function" attribute).
> The reason for the (admittedly ugly) parser changes for #pragma omp declare
> simd is that the clauses on the directive refer to parameters that will be
> declared
> later, so we need to save the tokens of the pragma and then after parsing
> the parameter declarations actually parse the clauses.  But, in the simd
> attribute case, there are no clauses, there is nothing to parse later.
I've refactored the patch.
New tests pass except one, which fails due to PR68158.
Bootstrapped and reg-tested.
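
For readers who want to see the shape of the suggested handler, the following is a minimal sketch only. It models the attribute chain with a hypothetical `attr_model` list instead of GCC trees, so `attr_model`, `lookup_attr_model`, `add_simd_model` and `free_attr_models` are illustrative names and not part of the patch. It follows the steps in the quoted suggestion: refuse when "cilk simd function" is present, otherwise prepend "omp declare simd" unless it is already there.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for a TREE_LIST of attributes: a name plus a chain
// pointer, mirroring TREE_CHAIN.  None of this is GCC's real data structure.
struct attr_model
{
  std::string name;
  attr_model *chain;
};

// Analogue of lookup_attribute: first node with the given name, or nullptr.
static const attr_model *
lookup_attr_model (const std::string &name, const attr_model *list)
{
  for (; list; list = list->chain)
    if (list->name == name)
      return list;
  return nullptr;
}

// Model of the suggested handler.  Returns false, leaving the list
// untouched, when the Cilk Plus attribute is present; otherwise makes sure
// "omp declare simd" appears in the list, prepending it if it is missing.
static bool
add_simd_model (attr_model *&decl_attributes)
{
  if (lookup_attr_model ("cilk simd function", decl_attributes))
    return false;
  if (!lookup_attr_model ("omp declare simd", decl_attributes))
    decl_attributes = new attr_model{"omp declare simd", decl_attributes};
  return true;
}

// Release every node of a list that was built with new.
static void
free_attr_models (attr_model *list)
{
  while (list)
    {
      attr_model *next = list->chain;
      delete list;
      list = next;
    }
}
```

Calling `add_simd_model` twice on the same list leaves a single "omp declare simd" node, which is the "already present" check from the suggestion.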

Is it ok for trunk?

gcc/
* omp-low.c (pass_omp_simd_clone::gate): If target allows - call
without additional conditions.
* doc/extend.texi (@item simd): New.
gcc/c-family/
* c-common.c (handle_simd_attribute): New.
(struct attribute_spec): Add entry for "simd".
gcc/c/
* c-parser.c (c_finish_omp_declare_simd): Look for
"simd" attribute as well. Update error message.
gcc/cp/
* parser.c (cp_parser_late_parsing_cilk_simd_fn_info): Look for
"simd" attribute as well. Update error message.
gcc/testsuite/
* c-c++-common/attr-simd.c: New test.
* c-c++-common/attr-simd-2.c: Ditto.
* c-c++-common/attr-simd-3.c: Ditto.

>   Jakub

--
Thanks, K

diff --git a/gcc/c-family/c-common.c b/gcc/c-family/c-common.c
index 1c75921..08ab220 100644
--- a/gcc/c-family/c-common.c
+++ b/gcc/c-family/c-common.c
@@ -392,6 +392,7 @@ static tree handle_warn_unused_attribute (tree *, tree, tree, int, bool *);
 static tree handle_returns_nonnull_attribute (tree *, tree, tree, int, bool *);
 static tree handle_omp_declare_simd_attribute (tree *, tree, tree, int,
   bool *);
+static tree handle_simd_attribute (tree *, tree, tree, int, bool *);
 static tree handle_omp_declare_target_attribute (tree *, tree, tree, int,
 bool *);
 static tree handle_designated_init_attribute (tree *, tree, tree, int, bool *);
@@ -818,6 +819,8 @@ const struct attribute_spec c_common_attribute_table[] =
  handle_omp_declare_simd_attribute, false },
   { "cilk simd function", 0, -1, true,  false, false,
  handle_omp_declare_simd_attribute, false },
+  { "simd",  0, -1, true,  false, false,
+ handle_simd_attribute, false },
   { "omp declare target", 0, 0, true, false, false,
  handle_omp_declare_target_attribute, false },
   { "alloc_align",   1, 1, false, true, true,
@@ -8955,6 +8958,37 @@ handle_omp_declare_simd_attribute (tree *, tree, tree, int, bool *)
   return NULL_TREE;
 }
 
+/* Handle an "simd" attribute.  */
+
+static tree
+handle_simd_attribute (tree *node, tree name, tree ARG_UNUSED (args),
+  int ARG_UNUSED (flags), bool *no_add_attrs)
+{
+  if (TREE_CODE (*node) == FUNCTION_DECL)
+{
+  if (lookup_attribute ("cilk simd function", DECL_ATTRIBUTES (*node)) != NULL)
+   {
+ error_at (DECL_SOURCE_LOCATION (*node),
+   "%<__simd__%> attribute cannot be "
+   "used in the same function marked as a Cilk Plus 

Re: [patch] Fix PR middle-end/68251

2015-11-10 Thread Eric Botcazou
> Tested on x86_64-suse-linux, OK for the mainline?  I'll install the Fortran
> testcase once it is reduced because it takes a while to compile ATM.

Here it is, as reduced by Joost, installed on the mainline.


2015-11-10  Eric Botcazou  

* gfortran.dg/pr68251.f90: New test.

-- 
Eric Botcazou

! PR middle-end/68251
! Reduced testcase by Joost VandeVondele 

! { dg-do compile }
! { dg-options "-O3" }

MODULE hfx_contract_block
  INTEGER, PARAMETER :: dp=8
CONTAINS
  SUBROUTINE contract_block(ma_max,mb_max,mc_max,md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
REAL(KIND=dp) :: kbd(mb_max*md_max), kbc(mb_max*mc_max), &
  kad(ma_max*md_max), kac(ma_max*mc_max), pbd(mb_max*md_max), &
  pbc(mb_max*mc_max), pad(ma_max*md_max), pac(ma_max*mc_max), &
  prim(ma_max*mb_max*mc_max*md_max), scale
SELECT CASE(ma_max)
CASE(1)
  SELECT CASE(mb_max)
  CASE(1)
SELECT CASE(mc_max)
CASE(1)
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_1_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_1_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_1_11(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
END SELECT
SELECT CASE(mc_max)
CASE(1)
  SELECT CASE(md_max)
  CASE(2)
CALL block_1_2_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_1_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_1_5(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_1_6(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_1_7(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_2_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_4_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_2_6_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_2_7_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
END SELECT
SELECT CASE(mc_max)
CASE(1)
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_3_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_1_4(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_1_5(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_1_6(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_1(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_2_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_2_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_3_3_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_3_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_3_5(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_3_5(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
END SELECT
SELECT CASE(mc_max)
CASE(1)
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_4_1_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_1_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_1_3(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
  END SELECT
  SELECT CASE(md_max)
  CASE(1)
CALL block_1_4_2_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_2_2(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_3(md_max,kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL block_1_4_4_1(kbd,kbc,kad,kac,pbd,pbc,pad,pac,prim,scale)
CALL 

[PATCH] Fix PR56118

2015-11-10 Thread Richard Biener

The following fixes PR56118 by adjusting the cost model handling of
basic-block vectorization to favor the vectorized version in case
estimated cost is the same as the estimated cost of the scalar
version.  This makes sense because we over-estimate the vectorized
cost in several places.
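
As a stand-alone illustration of the tie-breaking change (a sketch, not GCC code), the function below models the comparison before and after the patch. The parameter names mirror the variables in the patch; the function name and the `favor_vector_on_tie` switch are assumptions added only so both behaviours can be compared side by side.

```cpp
#include <cassert>

// Hypothetical model of the basic-block profitability test.
// favor_vector_on_tie == false models the old ">=" rejection,
// favor_vector_on_tie == true models the new ">" rejection.
static bool
bb_vectorization_profitable_model (int vec_outside_cost, int vec_inside_cost,
                                   int scalar_cost, bool favor_vector_on_tie)
{
  assert (vec_outside_cost >= 0 && vec_inside_cost >= 0 && scalar_cost >= 0);
  int vec_cost = vec_outside_cost + vec_inside_cost;
  if (favor_vector_on_tie)
    return vec_cost <= scalar_cost;  // reject only when vec_cost > scalar_cost
  return vec_cost < scalar_cost;     // reject when vec_cost >= scalar_cost
}
```

With costs 1 + 1 against a scalar cost of 2, the old comparison rejects vectorization and the new one accepts it; every other outcome is unchanged.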

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-10  Richard Biener  

PR tree-optimization/56118
* tree-vect-slp.c (vect_bb_vectorization_profitable_p): Make equal
cost favor vectorized version.

* gcc.target/i386/pr56118.c: New testcase.

Index: gcc/tree-vect-slp.c
===
--- gcc/tree-vect-slp.c (revision 230020)
+++ gcc/tree-vect-slp.c (working copy)
@@ -2317,9 +2317,12 @@ vect_bb_vectorization_profitable_p (bb_v
   dump_printf (MSG_NOTE, "  Scalar cost of basic block: %d\n", scalar_cost);
 }
 
-  /* Vectorization is profitable if its cost is less than the cost of scalar
- version.  */
-  if (vec_outside_cost + vec_inside_cost >= scalar_cost)
+  /* Vectorization is not profitable if its cost is more than the cost of scalar
+ version.  Note that we err on the vector side for equal cost because
+ the cost estimate is otherwise quite pessimistic (constant uses are
+ free on the scalar side but cost a load on the vector side for
+ example).  */
+  if (vec_outside_cost + vec_inside_cost > scalar_cost)
 return false;
 
   return true;
Index: gcc/testsuite/gcc.target/i386/pr56118.c
===
--- gcc/testsuite/gcc.target/i386/pr56118.c (revision 0)
+++ gcc/testsuite/gcc.target/i386/pr56118.c (working copy)
@@ -0,0 +1,17 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 -msse2" } */
+
+#include 
+
+__m128d f()
+{
+  __m128d r={3,4};
+  r[0]=1;
+  r[1]=2;
+  return r;
+}
+
+/* We want to "vectorize" this to an aligned vector load from the
+   constant pool.  */
+
+/* { dg-final { scan-assembler "movapd" } } */


Re: Use combined_fn in tree-vrp.c

2015-11-10 Thread Richard Biener
On Tue, Nov 10, 2015 at 1:09 AM, Bernd Schmidt  wrote:
> On 11/07/2015 01:46 PM, Richard Sandiford wrote:
>>
>> @@ -3814,8 +3817,8 @@ extract_range_basic (value_range *vr, gimple *stmt)
>>   break;
>>   /* Both __builtin_ffs* and __builtin_popcount return
>>  [0, prec].  */
>> -   CASE_INT_FN (BUILT_IN_FFS):
>> -   CASE_INT_FN (BUILT_IN_POPCOUNT):
>> +   CASE_CFN_FFS:
>> +   CASE_CFN_POPCOUNT:
>>   arg = gimple_call_arg (stmt, 0);
>>   prec = TYPE_PRECISION (TREE_TYPE (arg));
>>   mini = 0;
>
>
> So let me see if I understood this. From what we discussed the purpose of
> these new internal functions is that they can have vector types. If so,
> isn't this code (here and elsewhere) which expects integers potentially
> going to be confused?

We indeed need to add additional checks to most users of CASE_CFN_* to cover
the bigger freedom that exists with respect to types.

Richard, please audit all the cases you change for that.

Thanks,
Richard.

>
>
> Bernd


Re: [hsa 9/12] Small alloc-pool fix

2015-11-10 Thread Richard Biener
On Tue, Nov 10, 2015 at 9:47 AM, Martin Liška  wrote:
> On 11/06/2015 10:57 AM, Richard Biener wrote:
>> On Fri, 6 Nov 2015, Martin Liška wrote:
>>
>>> On 11/06/2015 10:00 AM, Richard Biener wrote:
>>>> On Thu, 5 Nov 2015, Martin Jambor wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> we use C++ new operators based on alloc-pools a lot in the subsequent
>>>>> patches and realized that on the current trunk, such new operators
>>>>> would needlessly call the placement ::new operator within the allocate
>>>>> method of pool-alloc.  Fixed below by providing a new allocation
>>>>> method which does not call placement new, which is only safe to use
>>>>> from within a new operator.
>>>>>
>>>>> The patch also fixes the slightly weird two parameter operator new
>>>>> (which we do not use in HSA backend) so that it does not do the same.
>>>>
>>>
>>> Hi.
>>>
>>>> Why do you need to add the pointer variant then?
>>>
>>> You are right, we originally used the variant in the branch, but it was 
>>> eventually
>>> left.
>>>

>>>> Also isn't the issue with allocate() that it does
>>>>
>>>> return ::new (m_allocator.allocate ()) T ();
>>>>
>>>> which 1) value-initializes and 2) doesn't even work with types like
>>>>
>>>> struct T { T(int); };
>>>>
>>>> thus types without a default constructor.
>>>
>>> You are right, it produces compilation error.
>>>

>>>> I think the allocator was poorly C++-ified without updating the
>>>> specification for the cases it is supposed to handle.  And now
>>>> we have C++ uses that are not working because the allocator is
>>>> broken.
>>>>
>>>> An incrementally better version (w/o fixing the issue with
>>>> types w/o default constructor) is
>>>>
>>>> return ::new (m_allocator.allocate ()) T;
>>>
>>> I've tried that, and it also calls default ctor:
>>>
>>> ../../gcc/alloc-pool.h: In instantiation of ‘T* 
>>> object_allocator::allocate() [with T = et_occ]’:
>>> ../../gcc/alloc-pool.h:531:22:   required from ‘void* operator new(size_t, 
>>> object_allocator&) [with T = et_occ; size_t = long unsigned int]’
>>> ../../gcc/et-forest.c:449:46:   required from here
>>> ../../gcc/et-forest.c:58:3: error: ‘et_occ::et_occ()’ is private
>>>et_occ ();
>>>^
>>> In file included from ../../gcc/et-forest.c:28:0:
>>> ../../gcc/alloc-pool.h:483:44: error: within this context
>>>  return ::new (m_allocator.allocate ()) T;
>>
>> Yes, but it does slightly cheaper initialization of PODs
>>
>>>

>>>> thus default-initialize which does no initialization for PODs (without
>>>> array members...) which is what the old pool allocator did.
>>>
>>> I'm not so familiar with differences related to PODs.
>>>

>>>> To fix the new operator (how do you even call that?  does it allow
>>>> specifying constructor args and thus work without a default constructor?)
>>>> it should indeed use an allocation method not performing the placement
>>>> new.  But I'd call it allocate_raw rather than vallocate.
>>>
>>> For situations where we do not have a default ctor, one should use the
>>> helper method defined at the end of alloc-pool.h:
>>>
>>> template 
>>> inline void *
>>> operator new (size_t, object_allocator )
>>> {
>>>   return a.allocate ();
>>> }
>>>
>>> For instance:
>>> et_occ *nw = new (et_occurrences) et_occ (2);
>>
>> Oh, so it uses placement new syntax...  works for me.
>>
>>> or as used in the HSA branch:
>>>
>>> /* New operator to allocate convert instruction from pool alloc.  */
>>>
>>> void *
>>> hsa_insn_cvt::operator new (size_t)
>>> {
>>>   return hsa_allocp_inst_cvt->allocate_raw ();
>>> }
>>>
>>> and
>>>
>>> cvtinsn = new hsa_insn_cvt (reg, *ptmp2);
>>>
>>>
>>> I attached patch where I rename the method as suggested.
>>
>> Ok.
>
> Hi.
>
> I'm sending suggested patch that survives regression tests and bootstrap
> on x86_64-linux-gnu.
>
> Can I install the patch to trunk?

Ok.

Thanks,
Richard.
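
The split discussed in this thread can be sketched outside GCC as follows. This is an illustration under assumptions, not the code from alloc-pool.h: `tiny_pool`, `occ` and `make_and_read` are hypothetical names, and the "pool" simply forwards to malloc. It shows why the helper operator new must hand out raw storage: the constructor named in the new-expression then runs exactly once, and the type needs no default constructor.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <new>

// Hypothetical miniature pool; it only models the allocate/allocate_raw split.
template <typename T>
class tiny_pool
{
public:
  // Raw storage only: no constructor runs here.
  void *allocate_raw ()
  {
    void *p = std::malloc (sizeof (T));
    if (!p)
      throw std::bad_alloc ();
    return p;
  }

  // Storage plus value-initialization; requires a default-constructible T.
  T *allocate () { return ::new (allocate_raw ()) T (); }

  // Destroy the object and release its storage.
  void remove (T *object)
  {
    assert (object != nullptr);
    object->~T ();
    std::free (object);
  }
};

// Helper for classes without a default ctor: only raw memory is returned,
// the new-expression at the call site runs the chosen constructor.
template <typename T>
inline void *
operator new (std::size_t, tiny_pool<T> &a)
{
  return a.allocate_raw ();
}

// Matching placement delete, used only if the constructor throws.
template <typename T>
inline void
operator delete (void *p, tiny_pool<T> &)
{
  std::free (p);
}

// A type without a default constructor.
struct occ
{
  explicit occ (int d) : depth (d) {}
  int depth;
};

// Construct an occ from the pool with a constructor argument and read it back.
inline int
make_and_read (int d)
{
  tiny_pool<occ> pool;
  occ *o = new (pool) occ (d);
  int result = o->depth;
  pool.remove (o);
  return result;
}
```

Because `allocate ()` is a member of a class template, it is never instantiated for `occ`, so the missing default constructor does not cause a compile error as long as only the placement-style new is used.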

> Thanks,
> Martin
>
>>
>> Thanks,
>> Richard.
>>
>>> Thanks,
>>> Martin
>>>

>>>> Thanks.
>>>> Richard.
>>>>
>>>>> Thanks,
>>>>>
>>>>> Martin
>>>>>
>>>>>
>>>>> 2015-11-05  Martin Liska
>>>>>Martin Jambor
>>>>>
>>>>>* alloc-pool.h (object_allocator::vallocate): New method.
>>>>>(operator new): Call vallocate instead of allocate.
>>>>>(operator new): New operator.
>>>>>
>>>>>
>>>>> diff --git a/gcc/alloc-pool.h b/gcc/alloc-pool.h
>>>>> index 0dc05cd..46b6550 100644
>>>>> --- a/gcc/alloc-pool.h
>>>>> +++ b/gcc/alloc-pool.h
>>>>> @@ -483,6 +483,12 @@ public:
>>>>>  return ::new (m_allocator.allocate ()) T ();
>>>>>}
>>>>>
>>>>> +  inline void *
>>>>> +  vallocate () ATTRIBUTE_MALLOC
>>>>> +  {
>>>>> +return m_allocator.allocate ();
>>>>> +  }
>>>>> +
>>>>>inline void
>>>>>remove (T *object)
>>>>>{
>>>>> @@ -523,12 +529,19 @@ struct alloc_pool_descriptor
>>>>>  };
>>>>>
>>>>>  /* Helper for classes that do not provide default ctor.  */
>>>>> -
>>>>>  template 
>>>>>  inline void *
>>>>>  operator new (size_t, 

[PATCH] vect_slp_analyze_node_dependences TLC

2015-11-10 Thread Richard Biener

Some TLC also preparing for further enhancements.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied.

Richard.

2015-11-10  Richard Biener  

* tree-vect-data-refs.c (vect_slp_analyze_node_dependences):
Handle memory using/clobbering stmts without a STMT_VINFO_DATA_REF
conservatively.

Index: gcc/tree-vect-data-refs.c
===
--- gcc/tree-vect-data-refs.c   (revision 230020)
+++ gcc/tree-vect-data-refs.c   (working copy)
@@ -573,21 +595,22 @@ vect_slp_analyze_node_dependences (slp_i
   gimple *access = SLP_TREE_SCALAR_STMTS (node)[k];
   if (access == last_access)
continue;
-  stmt_vec_info access_stmt_info = vinfo_for_stmt (access);
-  gimple_stmt_iterator gsi = gsi_for_stmt (access);
-  gsi_next ();
-  for (; gsi_stmt (gsi) != last_access; gsi_next ())
+  data_reference *dr_a = STMT_VINFO_DATA_REF (vinfo_for_stmt (access));
+  for (gimple_stmt_iterator gsi = gsi_for_stmt (access);
+  gsi_stmt (gsi) != last_access; gsi_next ())
{
  gimple *stmt = gsi_stmt (gsi);
- stmt_vec_info stmt_info = vinfo_for_stmt (stmt);
- if (!STMT_VINFO_DATA_REF (stmt_info)
- || (DR_IS_READ (STMT_VINFO_DATA_REF (stmt_info))
- && DR_IS_READ (STMT_VINFO_DATA_REF (access_stmt_info
+ if (! gimple_vuse (stmt)
+ || (DR_IS_READ (dr_a) && ! gimple_vdef (stmt)))
continue;
 
- ddr_p ddr = initialize_data_dependence_relation
- (STMT_VINFO_DATA_REF (access_stmt_info),
-  STMT_VINFO_DATA_REF (stmt_info), vNULL);
+ /* If we couldn't record a (single) data reference for this
+stmt we have to give up.  */
+ data_reference *dr_b = STMT_VINFO_DATA_REF (vinfo_for_stmt (stmt));
+ if (!dr_b)
+   return false;
+
+ ddr_p ddr = initialize_data_dependence_relation (dr_a, dr_b, vNULL);
  if (vect_slp_analyze_data_ref_dependence (ddr))
{
  /* ???  If the dependence analysis failed we can resort to the



Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)

2015-11-10 Thread Bernd Schmidt

On 11/10/2015 10:56 AM, Bernhard Reutner-Fischer wrote:
> On November 9, 2015 6:35:20 PM GMT+01:00, Bernd Schmidt wrote:
>> I think something that starts with bb->loop_father and iterates
>> outwards would be more efficient.
>
> flow_bb_inside_loop_p() ?

Ah thanks. I knew there must be one but I couldn't find it.


Bernd
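
The "start at bb->loop_father and iterate outwards" idea can be sketched with a small model. This is an assumption-laden illustration, not the implementation of flow_bb_inside_loop_p in GCC: `loop_model`, `block_model` and `bb_inside_loop_model_p` are hypothetical names, and each loop only records its immediately enclosing loop.

```cpp
#include <cassert>

// Hypothetical loop-tree node: outer is nullptr for the outermost loop.
struct loop_model
{
  loop_model *outer;
};

// Hypothetical basic block: loop_father is the innermost enclosing loop.
struct block_model
{
  loop_model *loop_father;
};

// Walk outwards from the block's innermost loop.  The cost is the nesting
// depth of the block, not the number of blocks in the loop body.
static bool
bb_inside_loop_model_p (const loop_model *loop, const block_model *bb)
{
  assert (loop != nullptr && bb != nullptr);
  for (const loop_model *l = bb->loop_father; l; l = l->outer)
    if (l == loop)
      return true;
  return false;
}
```

Compared with scanning the array returned by get_loop_body, this avoids building and walking the whole body for every query.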



Re: RFC: Incomplete Draft Patches to Correct Errors in Loop Unrolling Frequencies (bugzilla problem 68212)

2015-11-10 Thread Bernhard Reutner-Fischer
On November 9, 2015 6:35:20 PM GMT+01:00, Bernd Schmidt wrote:
>On 11/07/2015 03:44 PM, Kelvin Nilsen wrote:

>> +bool
>> +in_loop_p (basic_block block, struct loop *loop_ptr)
>> +{
>> +  basic_block *bbs = get_loop_body (loop_ptr);
>> +  bool result = false;
>> +
>> +  for (unsigned int i = 0; i < loop_ptr->num_nodes; i++)
>> +{
>> +  if (bbs[i] == block)
>> +result = true;
>> +}
>
>I think something that starts with bb->loop_father and iterates
>outwards 
>would be more efficient.

flow_bb_inside_loop_p() ?

Cheers,



Re: Extend tree-call-cdce to calls whose result is used

2015-11-10 Thread Richard Biener
On Mon, Nov 9, 2015 at 10:03 PM, Michael Matz  wrote:
> Hi,
>
> On Mon, 9 Nov 2015, Richard Sandiford wrote:
>
>> +static bool
>> +can_use_internal_fn (gcall *call)
>> +{
>> +  /* Only replace calls that set errno.  */
>> +  if (!gimple_vdef (call))
>> +return false;
>
> Oh, I managed to confuse this in my head while reading the patch.  So,
> hmm, you don't actually replace the builtin with an internal function
> (without the condition) under no-errno-math?  Does something else do that?
> Because otherwise that seems an unnecessary restriction?
>
>> >> r229916 fixed that for the non-EH case.
>> >
>> > Ah, missed it.  Even the EH case shouldn't be difficult.  If the
>> > original dominator of the EH destination was the call block it moves,
>> > otherwise it remains unchanged.
>>
>> The target of the edge is easy in itself, I agree, but that isn't
>> necessarily the only affected block, if the EH handler doesn't
>> exit or rethrow.
>
> You're worried the non-EH and the EH regions merge again, right?  Like so:
>
> before change:
>
> BB1: throwing-call
>  fallthru/   \EH
> BB2   BBeh
>  |   /\ (stuff in EH-region)
>  | /some path out of EH region
>  | /--/
> BB3
>
> Here, BB3 must at least be dominated by BB1 (the throwing block), or by
> something further up (when there are other side-entries to the path
> BB2->BB3 or into the EH region).  When further up, nothing changes, when
> it's BB1, then it's afterwards dominated by the BB containing the
> condition.  So everything with idom==BB1 gets idom=Bcond, except for BBeh,
> which gets idom=Bcall.  Depending on how you split BB1, either Bcond or
> BBcall might still be BB1 and doesn't lead to changes in the dom tree.
>
>> > Currently we have quite some of such passes (reassoc, forwprop,
>> > lower_vector_ssa, cse_reciprocals, cse_sincos (sigh!), optimize_bswap
>> > and others), but they are all handling only special situations in one
>> > way or the other.  pass_fold_builtins is another one, but it seems
>> > most related to what you want (replacing a call with something else),
>> > so I thought that'd be the natural choice.
>>
>> Well, to be pedantic, it's not really replacing the call.  Except for
>> the special case of targets that support direct assignments to errno,
>> it keeps the original call but ensures that it isn't usually executed.
>> From that point of view it doesn't really seem like a fold.
>>
>> But I suppose that's just naming again :-).  And it's easily solved with
>> s/fold/rewrite/.
>
> Exactly, in my mind pass_fold_builtin (like many of the others I
> mentioned) doesn't do folding but rewriting :)

So I am replying here to the issue of where to do the transform call_cdce
does and the one Richard wants to add.  For example we "lower"
posix_memalign as early as GIMPLE lowering (that's before CFG construction).
We also lower sincos to cexpi during GENERIC folding (or if that is dropped
either GIMPLE lowering or GIMPLE folding during gimplification would be
appropriate).

Now, with offloading we have to avoid creating target dependencies before
LTO stream-out (thus no IFN replacements before that - not sure if
Richard's patches have an issue there already).  Which would leave us
with a lowering stage early in the main optimization pipeline - I think
fold_builtins pass is way too late but any "folding" pass will do
(like forwprop or backprop where the latter might be better because
it might end up computing FP "ranges" to improve the initial lowering code).

Of course call_cdce is as good as long as it still exists.

>> > call_cdce is also such a pass, but I think it's simply not the
>> > appropriate one (only in so far as its source file contains the helper
>> > routines you need), and in addition I think it shouldn't exist at all
>> > (and wouldn't need to if it had been part of DCE from the start, or if
>> > you implemented the conditionalizing as part of another pass).  Hey,
>> > you could be one to remove a pass! ;-)
>>
>> It still seems a bit artificial to me to say that the transformation
>> with a null lhs is "DCE enough" to go in the main DCE pass (even though
>> like I say it doesn't actually eliminate any code from the IR, it just
>> adds more code) and should be kept in a separate pass from the one that
>> does the transformation on a non-null lhs.
>
> Oh, I agree, I might not have been clear: I'm not arguing that the normal
> DCE should now be changed to do the conditionalizing when it removes a
> call LHS; I was saying that it _would_ have been good instead of adding
> the call_cdce pass in the past, when it was for DCE purposes only.

Yes, I also argued that.

> But
> now your proposal is on the plate, namely doing the conditionalizing also
> with an LHS.  So that conditionalizing should take place in some rewriting
> pass (and ideally not call_cdce), no matter the LHS, and normal DCE not be
> 

Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-10 Thread Bernd Schmidt

On 11/10/2015 09:25 AM, Bin.Cheng wrote:

> Thanks for reviewing.  I haven't committed it yet, could you please
> point out which quoted piece it is so that I can update patch?


Sorry, I thought it was pretty obvious...


+{
+  return ccand1->hash == ccand2->hash
+&& operand_equal_p (ccand1->base, ccand2->base, 0)
+&& operand_equal_p (ccand1->step, ccand2->step, 0)
+&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
+ == TYPE_PRECISION (TREE_TYPE (ccand2->base));
+}
+


Multi-line expressions should be wrapped in parentheses so that 
emacs/indent can format them automatically. Two sets of parens are 
needed for this. Operators should then line up appropriately.
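
A sketch of the requested layout, using a hypothetical stand-in for the candidate structure (plain integers instead of trees; `cand_model`, `precision_of_model` and `cand_model_equal` are illustrative names, not the patch's code). One set of parentheses wraps the whole returned expression and a second wraps the comparison that spans two lines, so the operators line up:

```cpp
#include <cassert>

// Hypothetical stand-in for the candidate being hashed.
struct cand_model
{
  unsigned hash;
  long base;
  long step;
  int precision;
};

// Stand-in for TYPE_PRECISION (TREE_TYPE (...)) in the original snippet.
static int
precision_of_model (const cand_model *c)
{
  return c->precision;
}

// Multi-line condition wrapped in parentheses: the outer pair covers the
// whole return value, the inner pair covers the two-line comparison.
static bool
cand_model_equal (const cand_model *ccand1, const cand_model *ccand2)
{
  assert (ccand1 != nullptr && ccand2 != nullptr);
  return (ccand1->hash == ccand2->hash
          && ccand1->base == ccand2->base
          && ccand1->step == ccand2->step
          && (precision_of_model (ccand1)
              == precision_of_model (ccand2)));
}
```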



Bernd


[Patch GCC 5/Vect] Partial backport of r228751 (pr68238)

2015-11-10 Thread James Greenhalgh

Hi,

As requested in the PR, this patch is a partial backport of r228751.

I can't claim any responsibility for it, but I did take it through the
paces on an aarch64-none-linux-gnu and x86_64-none-linux-gnu bootstrap/
test run and found no issues.

Applied as r230092 on gcc-5-branch (pre-approved in the PR) after checking
that it gives the right results for the code I derived the PR from.

I'll start a test cycle for a 4.9 backport.

Thanks,
James

---
2015-11-09  James Greenhalgh  

Partial backport from trunk r228751.
PR tree-optimization/68238
2015-10-13  Richard Biener  

* tree-vect-loop.c (vect_estimate_min_profitable_iters): Use
LOOP_VINFO_COMP_ALIAS_DDRS to estimate alias versioning cost.

diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index 88ef251..05515b5 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -2825,7 +2825,7 @@ vect_estimate_min_profitable_iters (loop_vec_info loop_vinfo,
   if (LOOP_REQUIRES_VERSIONING_FOR_ALIAS (loop_vinfo))
 {
   /*  FIXME: Make cost depend on complexity of individual check.  */
-  unsigned len = LOOP_VINFO_MAY_ALIAS_DDRS (loop_vinfo).length ();
+  unsigned len = LOOP_VINFO_COMP_ALIAS_DDRS (loop_vinfo).length ();
   (void) add_stmt_cost (target_cost_data, len, vector_stmt, NULL, 0,
 			vect_prologue);
   dump_printf (MSG_NOTE,


Re: [PATCH 1/6] Use IFN_SQRT in tree-vect-patterns.c

2015-11-10 Thread Richard Biener
On Mon, Nov 9, 2015 at 5:21 PM, Richard Sandiford
 wrote:
> In practice all targets that can vectorise sqrt define the appropriate
> sqrt2 optab.  The only case where this isn't immediately obvious
> is the libmass support in rs6000.c, but Mike Meissner said that it shouldn't
> be exercised for sqrt.
>
> This patch therefore uses the internal function interface instead of
> going via the target hook.
>
>
> gcc/
> * tree-vect-patterns.c: Include internal-fn.h.
> (vect_recog_pow_pattern): Use IFN_SQRT instead of BUILT_IN_SQRT*.
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index bab9a4f..a803e8c 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -39,6 +39,7 @@ along with GCC; see the file COPYING3.  If not see
>  #include "tree-vectorizer.h"
>  #include "dumpfile.h"
>  #include "builtins.h"
> +#include "internal-fn.h"
>  #include "case-cfn-macros.h"
>
>  /* Pattern recognition functions  */
> @@ -1052,18 +1053,13 @@ vect_recog_pow_pattern (vec *stmts, tree *type_in,
>if (TREE_CODE (exp) == REAL_CST
>&& real_equal (_REAL_CST (exp), ))
>  {
> -  tree newfn = mathfn_built_in (TREE_TYPE (base), BUILT_IN_SQRT);
>*type_in = get_vectype_for_scalar_type (TREE_TYPE (base));
> -  if (*type_in)
> +  if (*type_in && direct_internal_fn_supported_p (IFN_SQRT, *type_in))
> {
> - gcall *stmt = gimple_build_call (newfn, 1, base);
> - if (vectorizable_function (stmt, *type_in, *type_in)
> - != NULL_TREE)
> -   {
> - var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
> - gimple_call_set_lhs (stmt, var);
> - return stmt;
> -   }
> + gcall *stmt = gimple_build_call_internal (IFN_SQRT, 1, base);
> + var = vect_recog_temp_ssa_var (TREE_TYPE (base), stmt);
> + gimple_call_set_lhs (stmt, var);
> + return stmt;

Looks ok but I wonder if this is dead code with

(for pows (POW)
 sqrts (SQRT)
 cbrts (CBRT)
 (simplify
  (pows @0 REAL_CST@1)
  (with {
const REAL_VALUE_TYPE *value = TREE_REAL_CST_PTR (@1);
REAL_VALUE_TYPE tmp;
   }
   (switch
...
/* pow(x,0.5) -> sqrt(x).  */
(if (flag_unsafe_math_optimizations
 && canonicalize_math_p ()
 && real_equal (value, ))
 (sqrts @0))

also wondering here about canonicalize_math_p (), I'd expected the
reverse transform as canonicalization.  Also wondering about
flag_unsafe_math_optimizations (missing from the vectorizer pattern).
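
One way to see why the match.pd rule is guarded by an unsafe-math flag: under IEEE 754 semantics (C Annex F), pow (x, 0.5) and sqrt (x) agree for ordinary inputs but differ for -0.0 and -infinity. The helper below is a sketch that assumes an Annex F conforming math library; `pow_half_matches_sqrt` is a hypothetical name used only for this illustration.

```cpp
#include <cassert>
#include <cmath>

// Returns true when pow (x, 0.5) and sqrt (x) give the same result,
// treating two NaNs as equal and +0.0 / -0.0 as different.
static bool
pow_half_matches_sqrt (double x)
{
  volatile double v = x;  // keep the compiler from folding the calls away
  double p = std::pow (v, 0.5);
  double s = std::sqrt (v);
  if (std::isnan (p) || std::isnan (s))
    return std::isnan (p) && std::isnan (s);
  return p == s && std::signbit (p) == std::signbit (s);
}
```

For -0.0 the pow form yields +0.0 while sqrt yields -0.0, and for -infinity the pow form yields +infinity while sqrt yields NaN, so the rewrite is not value-preserving in general.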

Anyway, patch is ok.

Thanks,
Richard.

> }
>  }
>
>


Re: [gomp4 06/14] omp-low: copy omp_data_o to shared memory on NVPTX

2015-11-10 Thread Jakub Jelinek
On Tue, Nov 03, 2015 at 05:25:53PM +0300, Alexander Monakov wrote:
> Here's an alternative patch that does not depend on exposure of shared-memory
> address space, and does not try to use pass_late_lower_omp.  It's based on
> Bernd's suggestion to transform

FYI, I've committed a new testcase to gomp-4_5-branch that covers various
target data sharing/team sharing/privatization parallel
sharing/privatization offloading cases.

2015-11-10  Jakub Jelinek  

* testsuite/libgomp.c/target-31.c: New test.

--- libgomp/testsuite/libgomp.c/target-31.c.jj  2015-11-09 19:05:50.439644694 +0100
+++ libgomp/testsuite/libgomp.c/target-31.c 2015-11-10 11:12:12.930286760 +0100
@@ -0,0 +1,163 @@
+#include 
+#include 
+
+int a = 1, b = 2, c = 3, d = 4;
+int e[2] = { 5, 6 }, f[2] = { 7, 8 }, g[2] = { 9, 10 }, h[2] = { 11, 12 };
+
+__attribute__((noinline, noclone)) void
+use (int *k, int *l, int *m, int *n, int *o, int *p, int *q, int *r)
+{
+  asm volatile ("" : : "r" (k) : "memory");
+  asm volatile ("" : : "r" (l) : "memory");
+  asm volatile ("" : : "r" (m) : "memory");
+  asm volatile ("" : : "r" (n) : "memory");
+  asm volatile ("" : : "r" (o) : "memory");
+  asm volatile ("" : : "r" (p) : "memory");
+  asm volatile ("" : : "r" (q) : "memory");
+  asm volatile ("" : : "r" (r) : "memory");
+}
+
+#pragma omp declare target to (use)
+
+int
+main ()
+{
+  int err = 0, r = -1, t[4];
+  long s[4] = { -1, -2, -3, -4 };
+  int j = 13, k = 14, l[2] = { 15, 16 }, m[2] = { 17, 18 };
+  #pragma omp target private (a, b, e, f) firstprivate (c, d, g, h) map(from: r, s, t) \
+map(tofrom: err, j, l) map(to: k, m)
+  #pragma omp teams num_teams (4) thread_limit (8) private (b, f) firstprivate (d, h, k, m)
+  {
+int u1 = k, u2[2] = { m[0], m[1] };
+int u3[64];
+int i;
+for (i = 0; i < 64; i++)
+  u3[i] = k + i;
+#pragma omp parallel num_threads (1)
+{
+  if (c != 3 || d != 4 || g[0] != 9 || g[1] != 10 || h[0] != 11 || h[1] != 12 || k != 14 || m[0] != 17 || m[1] != 18)
+   #pragma omp atomic write
+ err = 1;
+  b = omp_get_team_num ();
+  if (b >= 4)
+   #pragma omp atomic write
+ err = 1;
+  if (b == 0)
+   {
+ a = omp_get_num_teams ();
+ e[0] = 2 * a;
+ e[1] = 3 * a;
+   }
+  f[0] = 2 * b;
+  f[1] = 3 * b;
+  #pragma omp atomic update
+   c++;
+  #pragma omp atomic update
+   g[0] += 2;
+  #pragma omp atomic update
+   g[1] += 3;
+  d++;
+  h[0] += 2;
+  h[1] += 3;
+  k += b;
+  m[0] += 2 * b;
+  m[1] += 3 * b;
+}
+use (, , , , e, f, g, h);
+#pragma omp parallel firstprivate (u1, u2)
+{
+  int w = omp_get_thread_num ();
+  int x = 19;
+  int y[2] = { 20, 21 };
+  int v = 24;
+  int ll[64];
+  if (u1 != 14 || u2[0] != 17 || u2[1] != 18)
+   #pragma omp atomic write
+ err = 1;
+  u1 += w;
+  u2[0] += 2 * w;
+  u2[1] += 3 * w;
+  use (, u2, [b], l, , m, , h);
+  #pragma omp master
+   t[b] = omp_get_num_threads ();
+  #pragma omp atomic update
+   j++;
+  #pragma omp atomic update
+   l[0] += 2;
+  #pragma omp atomic update
+   l[1] += 3;
+  #pragma omp atomic update
+   k += 4;
+  #pragma omp atomic update
+   m[0] += 5;
+  #pragma omp atomic update
+   m[1] += 6;
+  x += w;
+  y[0] += 2 * w;
+  y[1] += 3 * w;
+  #pragma omp simd safelen(32) private (v)
+  for (i = 0; i < 64; i++)
+   {
+ v = 3 * i;
+ ll[i] = u1 + v * u2[0] + u2[1] + x + y[0] + y[1] + v + h[0] + u3[i];
+   }
+  #pragma omp barrier
+  use (, u2, [b], l, , m, , y);
+  if (w < 0 || w > 8 || w != omp_get_thread_num () || u1 != 14 + w
+ || u2[0] != 17 + 2 * w || u2[1] != 18 + 3 * w
+ || x != 19 + w || y[0] != 20 + 2 * w || y[1] != 21 + 3 * w
+ || v != 24)
+   #pragma omp atomic write
+ err = 1;
+  for (i = 0; i < 64; i++)
+   if (ll[i] != u1 + 3 * i * u2[0] + u2[1] + x + y[0] + y[1] + 3 * i + 13 + 14 + i)
+ #pragma omp atomic write
+   err = 1;
+}
+#pragma omp parallel num_threads (1)
+{
+  if (b == 0)
+   {
+ r = a;
+ if (a != omp_get_num_teams ()
+ || e[0] != 2 * a
+ || e[1] != 3 * a)
+   #pragma omp atomic write
+ err = 1;
+   }
+  int v1, v2, v3;
+  #pragma omp atomic read
+   v1 = c;
+  #pragma omp atomic read
+   v2 = g[0];
+  #pragma omp atomic read
+   v3 = g[1];
+  s[b] = v1 * 65536L + v2 * 256L + v3;
+  if (d != 5 || h[0] != 13 || h[1] != 15
+ || k != 14 + b + 4 * t[b]
+ || m[0] != 17 + 2 * b + 5 * t[b]
+ || m[1] != 18 + 3 * b + 6 * t[b]
+ || b != omp_get_team_num ()
+ || f[0] != 2 * b || f[1] != 3 * b)
+   #pragma omp atomic write
+ err = 1;
+}
+  }

Re: [hsa 5/12] New HSA-related GCC options

2015-11-10 Thread Richard Biener
On Mon, 9 Nov 2015, Martin Jambor wrote:

> Hi,
> 
> On Fri, Nov 06, 2015 at 09:42:25AM +0100, Richard Biener wrote:
> > On Thu, 5 Nov 2015, Martin Jambor wrote:
> > 
> > > Hi,
> > > 
> > > the following small part of the merge deals with new options.  It adds
> > > four independent things:
> > > 
> > > 1) flag_disable_hsa is used by code in opts.c (in the first patch) to
> > > >remember whether HSA has been explicitly disabled on the compiler
> > >command line.
> > 
> > But I don't see any way to disable it on the command line?  (no switch?)
> 
> No, the switch is -foffload, whose documentation is missing (PR
> 67300); it is only described at https://gcc.gnu.org/wiki/Offloading.
> Nevertheless, the option allows the user to specify
> -foffload=disable, in which case no offloading should happen, not
> even to HSA.  The user can also enumerate just the offload targets
> they want (and pass them special command line options).
> 
> It seems I have misplaced a hunk in the patch series.  Nevertheless,
> in the first patch (with configuration stuff), there is a change to
> opts.c which scans the -foffload= contents and sets the flag variable
> if hsa is not present.
> 
> Whenever the compiler has to decide whether HSA is enabled for the
> given compilation or not, it has to look at this variable (if
> configured for HSA).
> 
> > 
> > > 2) -Whsa is a new warning we emit whenever we fail to produce HSAIL
> > >    for some source code.  It is on by default but of course only
> > >    emitted by HSAIL generating code so should never affect anybody who
> > >    does not use an HSA-enabled compiler and OpenMP 4 device constructs.
> > > 
> > > We have found the following two additions very useful for debugging on
> > > the branch but will understand if they are not deemed suitable for
> > > trunk and will gladly remove them:
> > > 
> > > 3) -fdisable-hsa-gridification disables the gridification process to
> > >    ease experimenting with dynamic parallelism.  With this option,
> > >    HSAIL is always generated from the CPU-intended gimple.
> > 
> > So this sounds like something a user should never do, which means
> > it shouldn't be a switch (but a parameter or removed).
> 
> Martin said he likes the capability to switch gridification off so I
> turned it into a parameter.
> 
> > 
> > > 4) Parameter hsa-gen-debug-stores will be obsolete once the HSA run-time
> > >    supports debugging traps.  Before that, we have to make do with
> > >    debugging stores to memory at defined places, which however can
> > >    cost speed in benchmarks.  So they are only enabled with this
> > >    parameter.  We decided to make it a parameter rather than a switch
> > >    to emphasize the fact it will go away and to possibly allow us to
> > >    select different levels of verbosity of the stores in the future.
> > 
> > You are missing documentation in invoke.texi for the new switches and parameters.
> 
> Right, I have added that together with other changes addressing the
> above comments and am about to commit the following to the branch:

Looks good to me.

Thanks,
Richard.
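
For illustration, the -foffload= scan described in the quoted reply could look roughly like the sketch below.  It is not the hunk from opts.c: the helper name hsa_disabled_by_offload_arg is made up, the target spelling "hsa" is assumed from the text, and the target=options form of the switch is not handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not the code from opts.c.  Returns true when the
   value given to -foffload= leaves HSA disabled.  */
static bool
hsa_disabled_by_offload_arg (const char *arg)
{
  const char *p = arg;
  bool seen_hsa = false;

  while (*p != '\0')
    {
      const char *comma = strchr (p, ',');
      size_t len = comma ? (size_t) (comma - p) : strlen (p);

      /* "disable" turns off all offloading, HSA included.  */
      if (len == 7 && strncmp (p, "disable", 7) == 0)
        return true;

      /* An explicit "hsa" entry asks for HSA offloading.  */
      if (len == 3 && strncmp (p, "hsa", 3) == 0)
        seen_hsa = true;

      if (!comma)
        break;
      p = comma + 1;
    }

  /* HSA stays disabled unless it was listed.  */
  return !seen_hsa;
}
```

Code that later has to decide whether HSA is enabled would then consult the stored result instead of re-parsing the option.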

> 
> 2015-11-09  Martin Jambor  
> 
>   * common.opt (-fdisable-hsa-gridification): Removed.
>   * params.def (PARAM_OMP_GPU_GRIDIFY): New.
>   * omp-low.c: Include params.h.
>   (execute_lower_omp): Check parameter PARAM_OMP_GPU_GRIDIFY instead of
>   flag_disable_hsa_gridification.
>   * doc/invoke.texi (Optimize Options): Add description of
>   omp-gpu-gridify and hsa-gen-debug-stores parameters.
> 
> diff --git a/gcc/common.opt b/gcc/common.opt
> index 9cb52db..8bee504 100644
> --- a/gcc/common.opt
> +++ b/gcc/common.opt
> @@ -1115,10 +1115,6 @@ fdiagnostics-show-location=
>  Common Joined RejectNegative Enum(diagnostic_prefixing_rule)
>  -fdiagnostics-show-location=[once|every-line]How often to emit 
> source location at the beginning of line-wrapped diagnostics.
>  
> -fdisable-hsa-gridification
> -Common Report Var(flag_disable_hsa_gridification)
> -Disable HSA gridification for OMP pragmas
> -
>  ; Required for these enum values.
>  SourceInclude
>  pretty-print.h
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 4fc7d88..b9fb1e1 100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -11171,6 +11171,17 @@ dynamic, guided, auto, runtime).  The default is 
> static.
>  Maximum depth of recursion when querying properties of SSA names in things
>  like fold routines.  One level of recursion corresponds to following a
>  use-def chain.
> +
> +@item omp-gpu-gridify
> +Enable creation of gridified GPU kernels out of loops within target
> +OpenMP constructs.  This conversion is enabled by default when
> +offloading to HSA, to disable it, use @option{--param omp-gpu-gridify=0}
> +
> +@item hsa-gen-debug-stores
> +Enable emission of special debug stores within HSA kernels which are
> +then read and reported by libgomp plugin.  Generation of these stores
> +is disabled by default, use @option{--param 

Re: RFC: C++ delayed folding merge

2015-11-10 Thread Richard Biener
On Mon, 9 Nov 2015, Jason Merrill wrote:

> On 11/09/2015 02:28 PM, Jason Merrill wrote:
> > On 11/09/2015 04:08 AM, Richard Biener wrote:
> > > On Mon, 9 Nov 2015, Jason Merrill wrote:
> > > 
> > > > I'm planning to merge the C++ delayed folding branch this week, but I
> > > > need to
> > > > get approval of the back end changes (the first patch attached).
> > > > Most of
> > > > these are the introduction of non-folding variants of convert_to_*,
> > > > but there
> > > > are a few others.
> > > > 
> > > > One question: The branch changes 'convert' to not fold its result,
> > > > and it's
> > > > not clear to me whether that's part of the expected behavior of a
> > > > front end
> > > > 'convert' function or not.
> > > 
> > > > History.  convert is purely frontend (but shared, unfortunately, between
> > > all frontends).  I would expect that FEs that do not do delayed folding
> > > expect convert to fold.
> > > 
> > > > Also, I'm a bit uncertain about merging this at the end of stage 1,
> > > > since it's
> > > > a large internal change with relatively small user impact; it just
> > > > improves
> > > > handling of constant expression corner cases.  I'm inclined to go
> > > > ahead with
> > > > it at this point, but I'm interested in contrary opinions.
> > > 
> > > I welcome this change as it should allow cleaning up the FE-middle-end
> > > interface a bit more.  It should be possible to remove all
> > > NON_LVALUE_EXPR adding/removal from the middle-end folders.
> > > 
> > > Looks like the backend patch included frontend parts but as far as I
> > > skimmed it only
> > > 
> > > diff --git a/gcc/fold-const.c b/gcc/fold-const.c
> > > index 5e32901..d754a90 100644
> > > --- a/gcc/fold-const.c
> > > +++ b/gcc/fold-const.c
> > > @@ -2091,6 +2091,17 @@ fold_convert_const (enum tree_code code, tree
> > > type,
> > > tree arg1)
> > > else if (TREE_CODE (arg1) == REAL_CST)
> > >  return fold_convert_const_fixed_from_real (type, arg1);
> > >   }
> > > +  else if (TREE_CODE (type) == VECTOR_TYPE)
> > > +{
> > > +  if (TREE_CODE (arg1) == VECTOR_CST
> > > + && TYPE_MAIN_VARIANT (type) == TYPE_MAIN_VARIANT (TREE_TYPE
> > > (arg1))
> > > + && TYPE_VECTOR_SUBPARTS (type) == VECTOR_CST_NELTS (arg1))
> > > +   {
> > > + tree r = copy_node (arg1);
> > > + TREE_TYPE (arg1) = type;
> > > + return r;
> > > +   }
> > > +}
> > > 
> > > 
> > > looks suspicious.  The issue here is that the vector elements will
> > > have the wrong type after this simple handling.
> > 
> > I was aiming to just handle simple cv-qualifier changes; that's why the
> > TYPE_MAIN_VARIANT comparison is there.
> > 
> > > If you fix that you can as well handle all kinds of element type
> > > changes via recursing to fold_convert_const (that includes
> > > float to int / int to float changes).
> > 
> > But I'll try this.
> 
> Like so?

Yes.

Thanks,
Richard.
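
The suggestion to recurse per element can be pictured with a toy model in plain C.  Nothing below is GCC code: the struct, the two element kinds and the function names are invented for the illustration, with convert_scalar standing in for what fold_convert_const does on scalar constants.

```c
#include <stddef.h>

/* Toy constant representation; all names here are hypothetical.  */
enum elt_kind { ELT_INT, ELT_FLOAT };

struct scalar_cst
{
  enum elt_kind kind;
  union { long i; double f; } u;
};

/* Convert one scalar constant to KIND.  */
static struct scalar_cst
convert_scalar (enum elt_kind kind, struct scalar_cst c)
{
  struct scalar_cst r;

  r.kind = kind;
  if (kind == ELT_INT)
    r.u.i = (c.kind == ELT_INT) ? c.u.i : (long) c.u.f;
  else
    r.u.f = (c.kind == ELT_FLOAT) ? c.u.f : (double) c.u.i;
  return r;
}

/* Convert a vector constant element by element, so that no element is
   left with the old element type.  */
static void
convert_vector (enum elt_kind kind, const struct scalar_cst *in,
                struct scalar_cst *out, size_t n)
{
  for (size_t i = 0; i < n; i++)
    out[i] = convert_scalar (kind, in[i]);
}
```

Because every element goes through the scalar conversion, the same loop covers int-to-float and float-to-int changes as well as the trivial case where only qualifiers differ.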


[PATCH][ARM][3/3][v2] Implement negsicc, notsicc optabs

2015-11-10 Thread Kyrill Tkachov

Hi all,

This is a slight respin of
https://gcc.gnu.org/ml/gcc-patches/2015-09/msg00075.html.
This had been ok'd, but I've encountered a bug with the
*if_<NOT_NEG_op>_move pattern.  For some reason, after reload operands[1]
doesn't end up living in the same register as operands[0] even though it
has the constraint '0'.  Maybe I misunderstood the semantics of the '0'
constraint.  In any case, telling the splitter to explicitly emit the move
before the cond_exec if the registers don't match fixes this.

Bootstrapped and tested on arm.
Ok to commit this updated version instead?

Thanks,
Kyrill

2015-11-10  Kyrylo Tkachov  

* config/arm/arm.md (<NOT_NEG_op>sicc): New define_expand.
(*if_neg_move): Rename to...
(*if_<NOT_NEG_op>_move): ... This.  Use NOT_NEG code iterator.
Move operands[1] into operands[0] if they don't match up.
* config/arm/iterators.md (NOT_NEG): New code iterator.
(NOT_NEG_op): New code attribute.

2015-11-10  Kyrylo Tkachov  

* gcc.target/arm/cond_op_imm_1.c: New test.
commit c5a3ade022a18dad02d3391aab7af9ddf7e26340
Author: Kyrylo Tkachov 
Date:   Fri Aug 14 13:42:51 2015 +0100

[ARM][3/3] Implement negsicc, notsicc optabs

diff --git a/gcc/config/arm/arm.md b/gcc/config/arm/arm.md
index 8ebb1bf..ab7ece9 100644
--- a/gcc/config/arm/arm.md
+++ b/gcc/config/arm/arm.md
@@ -10079,19 +10079,43 @@ (define_insn "*ifcompare_neg_move"
(set_attr "type" "multiple")]
 )
 
-(define_insn_and_split "*if_neg_move"
+;; The negsicc and notsicc optabs.
+(define_expand "<NOT_NEG_op>sicc"
+  [(set (match_operand:SI 0 "s_register_operand" "")
+	(if_then_else:SI (match_operand 1 "arm_comparison_operator" "")
+			  (NOT_NEG:SI (match_operand:SI 2 "s_register_operand" ""))
+			  (match_operand:SI 3 "s_register_operand" "")))]
+  "TARGET_32BIT"
+  {
+rtx ccreg;
+enum rtx_code code = GET_CODE (operands[1]);
+
+if (code == UNEQ || code == LTGT)
+  FAIL;
+
+ccreg = arm_gen_compare_reg (code, XEXP (operands[1], 0),
+  XEXP (operands[1], 1), NULL);
+operands[1] = gen_rtx_fmt_ee (code, VOIDmode, ccreg, const0_rtx);
+  }
+)
+
+
+(define_insn_and_split "*if_<NOT_NEG_op>_move"
   [(set (match_operand:SI 0 "s_register_operand" "=l,r")
 	(if_then_else:SI
 	 (match_operator 4 "arm_comparison_operator"
 	  [(match_operand 3 "cc_register" "") (const_int 0)])
-	 (neg:SI (match_operand:SI 2 "s_register_operand" "l,r"))
+	 (NOT_NEG:SI (match_operand:SI 2 "s_register_operand" "l,r"))
 	 (match_operand:SI 1 "s_register_operand" "0,0")))]
   "TARGET_32BIT"
   "#"
   "&& reload_completed"
   [(cond_exec (match_op_dup 4 [(match_dup 3) (const_int 0)])
-	  (set (match_dup 0) (neg:SI (match_dup 2))))]
-  ""
+	  (set (match_dup 0) (NOT_NEG:SI (match_dup 2))))]
+  {
+if (!rtx_equal_p (operands[0], operands[1]))
+  emit_move_insn (operands[0], operands[1]);
+  }
   [(set_attr "conds" "use")
(set_attr "length" "4")
(set_attr "arch" "t2,32")
diff --git a/gcc/config/arm/iterators.md b/gcc/config/arm/iterators.md
index 6a54125..2f4bc5c 100644
--- a/gcc/config/arm/iterators.md
+++ b/gcc/config/arm/iterators.md
@@ -209,6 +209,9 @@ (define_code_iterator COMPARISONS [eq gt ge le lt])
 ;; A list of ...
 (define_code_iterator IOR_XOR [ior xor])
 
+;; Bitwise complement and negation
+(define_code_iterator NOT_NEG [not neg])
+
 ;; Operations on two halves of a quadword vector.
 (define_code_iterator VQH_OPS [plus smin smax umin umax])
 
@@ -656,6 +659,8 @@ (define_code_attr VQH_type [(plus "add") (smin "minmax") (smax "minmax")
 (define_code_attr VQH_sign [(plus "i") (smin "s") (smax "s") (umin "u")
 (umax "u")])
 
+(define_code_attr NOT_NEG_op [(not "not") (neg "neg")])
+
 (define_code_attr cnb [(ltu "CC_C") (geu "CC")])
 (define_code_attr optab [(ltu "ltu") (geu "geu")])
 
diff --git a/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c b/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c
new file mode 100644
index 000..9d335e2
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/cond_op_imm_1.c
@@ -0,0 +1,42 @@
+/* { dg-do run } */
+/* { dg-options "-save-temps -O2 -fno-inline" } */
+/* { dg-require-effective-target arm_cond_exec } */
+
+extern void abort (void);
+
+#define N 25089992
+
+int
+foonegsi (int a)
+{
+  return a ? N : -N;
+}
+
+int
+fooinvsi (int a)
+{
+  return a ? N : ~N;
+}
+
+
+
+int
+main (void)
+{
+  if (foonegsi (1) != N)
+abort ();
+
+  if (foonegsi (0) != -N)
+abort ();
+
+  if (fooinvsi (1) != N)
+abort ();
+
+  if (fooinvsi (0) != ~N)
+abort ();
+
+  return 0;
+}
+
+/* { dg-final { scan-assembler "rsbne" } } */
+/* { dg-final { scan-assembler "mvnne" } } */
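
As a plain C model of what the split pattern is meant to compute (a sketch only, not the RTL; the function names are made up): the destination first receives the fallback value, which is the explicit move added when operands[0] and operands[1] differ, and the negate or complement is then executed conditionally.

```c
/* Hypothetical model of the negsicc case: dest = cond ? -src : fallback.  */
static int
cond_neg (int cond, int src, int fallback)
{
  int dest = fallback;   /* move emitted when the registers don't match */

  if (cond)
    dest = -src;         /* conditionally executed negate */
  return dest;
}

/* Hypothetical model of the notsicc case: dest = cond ? ~src : fallback.  */
static int
cond_not (int cond, int src, int fallback)
{
  int dest = fallback;   /* same explicit move as above */

  if (cond)
    dest = ~src;         /* conditionally executed complement */
  return dest;
}
```

If the move were skipped while the two registers differ, the false branch would return whatever happened to be in the destination register, which matches the failure described in the mail.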


[PATCH] Fix PR68240

2015-11-10 Thread Richard Biener

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2015-11-10  Richard Biener  

PR tree-optimization/68240
* tree-ssa-sccvn.c (cond_stmts_equal_p): Handle commutative compares
properly.
(visit_phi): For PHIs with just a single executable edge
take its value directly.
(expressions_equal_p): Handle VN_TOP properly.

* gcc.dg/torture/pr68240.c: New testcase.
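
The commutative-compare part of the fix can be sketched outside of GCC as follows.  The types and names are invented for the example; operands are modelled as already-valueized integers, and the handling of swapped comparison codes that precedes this check in the real function is left out.

```c
#include <stdbool.h>

/* Toy comparison codes and condition record; hypothetical names.  */
enum cmp_code { CMP_EQ, CMP_NE, CMP_LT };

struct cond
{
  enum cmp_code code;
  int lhs;   /* value number of the left operand */
  int rhs;   /* value number of the right operand */
};

static bool
commutative_cmp_p (enum cmp_code code)
{
  return code == CMP_EQ || code == CMP_NE;
}

/* Two conditions are equal if the operands match in order, or, for a
   commutative comparison, if they match with the operands swapped.  */
static bool
conds_equal_p (struct cond a, struct cond b)
{
  if (a.code != b.code)
    return false;
  return ((a.lhs == b.lhs && a.rhs == b.rhs)
          || (commutative_cmp_p (a.code)
              && a.lhs == b.rhs && a.rhs == b.lhs));
}
```

With this shape, a == b and b == a compare equal while a < b and b < a do not.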

Index: gcc/testsuite/gcc.dg/torture/pr68240.c
===
*** gcc/testsuite/gcc.dg/torture/pr68240.c  (revision 0)
--- gcc/testsuite/gcc.dg/torture/pr68240.c  (working copy)
***
*** 0 
--- 1,12 
+ /* { dg-do compile } */
+ 
+ int a, b, f;
+ 
+ void
+ fn1 ()
+ {
+   int c = 1, d, e = 1;
+   a = 1; 
+   for (; f;)
+ b = (c && (d = (e && a)));
+ }
Index: gcc/tree-ssa-sccvn.c
===
*** gcc/tree-ssa-sccvn.c(revision 230020)
--- gcc/tree-ssa-sccvn.c(working copy)
*** cond_stmts_equal_p (gcond *cond1, gcond
*** 2760,2770 
else
  return false;
  
!   if (! expressions_equal_p (vn_valueize (lhs1), vn_valueize (lhs2))
!   || ! expressions_equal_p (vn_valueize (rhs1), vn_valueize (rhs2)))
! return false;
! 
!   return true;
  }
  
  /* Compare two phi entries for equality, ignoring VN_TOP arguments.  */
--- 2806,2820 
else
  return false;
  
!   lhs1 = vn_valueize (lhs1);
!   rhs1 = vn_valueize (rhs1);
!   lhs2 = vn_valueize (lhs2);
!   rhs2 = vn_valueize (rhs2);
!   return ((expressions_equal_p (lhs1, lhs2)
!  && expressions_equal_p (rhs1, rhs2))
! || (commutative_tree_code (code1)
! && expressions_equal_p (lhs1, rhs2)
! && expressions_equal_p (rhs1, lhs2)));
  }
  
  /* Compare two phi entries for equality, ignoring VN_TOP arguments.  */
*** visit_phi (gimple *phi)
*** 3379,3384 
--- 3428,3434 
tree result;
tree sameval = VN_TOP;
bool allsame = true;
+   unsigned n_executable = 0;
  
/* TODO: We could check for this in init_sccvn, and replace this
   with a gcc_assert.  */
*** visit_phi (gimple *phi)
*** 3394,3399 
--- 3444,3450 
{
tree def = PHI_ARG_DEF_FROM_EDGE (phi, e);
  
+   ++n_executable;
if (TREE_CODE (def) == SSA_NAME)
  def = SSA_VAL (def);
if (def == VN_TOP)
*** visit_phi (gimple *phi)
*** 3408,3416 
}

/* If none of the edges was executable or all incoming values are
!  undefined keep the value-number at VN_TOP.  */
!   if (sameval == VN_TOP)
! return set_ssa_val_to (PHI_RESULT (phi), VN_TOP);
  
/* First see if it is equivalent to a phi node in this block.  We prefer
   this as it allows IV elimination - see PRs 66502 and 67167.  */
--- 3459,3469 
}

/* If none of the edges was executable or all incoming values are
!  undefined keep the value-number at VN_TOP.  If only a single edge
!  is executable use its value.  */
!   if (sameval == VN_TOP
!   || n_executable == 1)
! return set_ssa_val_to (PHI_RESULT (phi), sameval);
  
/* First see if it is equivalent to a phi node in this block.  We prefer
   this as it allows IV elimination - see PRs 66502 and 67167.  */
*** expressions_equal_p (tree e1, tree e2)
*** 4610,4615 
--- 4663,4672 
if (e1 == e2)
  return true;
  
+   /* If either one is VN_TOP consider them equal.  */
+   if (e1 == VN_TOP || e2 == VN_TOP)
+ return true;
+ 
/* If only one of them is null, they cannot be equal.  */
if (!e1 || !e2)
  return false;
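
The visit_phi change can be modelled in a few lines of standalone C.  This is a simplified sketch under assumptions: VAL_TOP and VAL_VARYING are stand-ins invented here, values are plain integers, and the lookup of equivalent PHIs that follows in the real function is omitted.

```c
#include <stdbool.h>
#include <stddef.h>

#define VAL_TOP (-1)       /* stand-in for VN_TOP */
#define VAL_VARYING (-2)   /* stand-in for "no single value" */

/* One incoming PHI argument; hypothetical representation.  */
struct phi_arg
{
  bool executable;   /* is the incoming edge executable? */
  int value;         /* value number, or VAL_TOP */
};

static int
phi_value (const struct phi_arg *args, size_t n)
{
  int sameval = VAL_TOP;
  bool allsame = true;
  unsigned n_executable = 0;

  for (size_t i = 0; i < n; i++)
    {
      if (!args[i].executable)
        continue;
      ++n_executable;
      if (args[i].value == VAL_TOP)
        continue;
      if (sameval == VAL_TOP)
        sameval = args[i].value;
      else if (sameval != args[i].value)
        {
          allsame = false;
          break;
        }
    }

  /* No executable edge or only undefined values: stay at VAL_TOP.
     A single executable edge: take its value directly.  */
  if (sameval == VAL_TOP || n_executable == 1)
    return sameval;

  return allsame ? sameval : VAL_VARYING;
}
```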


Re: [PATCH PR52272]Be smart when adding iv candidates

2015-11-10 Thread Bin.Cheng
On Tue, Nov 10, 2015 at 6:06 PM, Bernd Schmidt  wrote:
> On 11/10/2015 09:25 AM, Bin.Cheng wrote:
>>>
>>> Thanks for reviewing.  I haven't committed it yet, could you please
>>> point out which quoted piece it is so that I can update the patch?
>
>
> Sorry, I thought it was pretty obvious...
>
>> +{
>> +  return ccand1->hash == ccand2->hash
>> +&& operand_equal_p (ccand1->base, ccand2->base, 0)
>> +&& operand_equal_p (ccand1->step, ccand2->step, 0)
>> +&& TYPE_PRECISION (TREE_TYPE (ccand1->base))
>> + == TYPE_PRECISION (TREE_TYPE (ccand2->base));
>> +}
>> +
>
>
> Multi-line expressions should be wrapped in parentheses so that emacs/indent
> can format them automatically. Two sets of parens are needed for this.
> Operators should then line up appropriately.
Ah, thanks for the explanation.  Here is the updated patch; I hope it's correct.

Thanks,
bin
>
>
> Bernd
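
A small standalone example of the formatting rule being asked for, using a made-up struct rather than the ivopts types: the whole multi-line expression sits inside one pair of parentheses, the nested comparison gets its own pair, and each continuation line starts with the operator.

```c
#include <stdbool.h>

/* Hypothetical record, only here to have something to compare.  */
struct cand
{
  unsigned hash;
  int base;
  int step;
  unsigned precision;
};

static bool
cand_equal (const struct cand *c1, const struct cand *c2)
{
  return (c1->hash == c2->hash
          && c1->base == c2->base
          && c1->step == c2->step
          && (c1->precision
              == c2->precision));
}
```

With the outer parentheses in place, an editor's automatic indentation can line the operators up under the first operand.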
diff --git a/gcc/tree-ssa-loop-ivopts.c b/gcc/tree-ssa-loop-ivopts.c
index 1f952a7..a00e33c 100644
--- a/gcc/tree-ssa-loop-ivopts.c
+++ b/gcc/tree-ssa-loop-ivopts.c
@@ -247,6 +247,45 @@ struct iv_cand
   smaller type.  */
 };
 
+/* Hashtable entry for common candidate derived from iv uses.  */
+struct iv_common_cand
+{
+  tree base;
+  tree step;
+  /* IV uses from which this common candidate is derived.  */
+  vec<iv_use *> uses;
+  hashval_t hash;
+};
+
+/* Hashtable helpers.  */
+
+struct iv_common_cand_hasher : free_ptr_hash <iv_common_cand>
+{
+  static inline hashval_t hash (const iv_common_cand *);
+  static inline bool equal (const iv_common_cand *, const iv_common_cand *);
+};
+
+/* Hash function for possible common candidates.  */
+
+inline hashval_t
+iv_common_cand_hasher::hash (const iv_common_cand *ccand)
+{
+  return ccand->hash;
+}
+
+/* Hash table equality function for common candidates.  */
+
+inline bool
+iv_common_cand_hasher::equal (const iv_common_cand *ccand1,
+ const iv_common_cand *ccand2)
+{
+  return (ccand1->hash == ccand2->hash
+          && operand_equal_p (ccand1->base, ccand2->base, 0)
+          && operand_equal_p (ccand1->step, ccand2->step, 0)
+          && (TYPE_PRECISION (TREE_TYPE (ccand1->base))
+              == TYPE_PRECISION (TREE_TYPE (ccand2->base))));
+}
+
 /* Loop invariant expression hashtable entry.  */
 struct iv_inv_expr_ent
 {
@@ -255,8 +294,6 @@ struct iv_inv_expr_ent
   hashval_t hash;
 };
 
-/* The data used by the induction variable optimizations.  */
-
 /* Hashtable helpers.  */
 
 struct iv_inv_expr_hasher : free_ptr_hash <iv_inv_expr_ent>
@@ -323,6 +360,12 @@ struct ivopts_data
   /* Cache used by tree_to_aff_combination_expand.  */
   hash_map *name_expansion_cache;
 
+  /* The hashtable of common candidates derived from iv uses.  */
+  hash_table<iv_common_cand_hasher> *iv_common_cand_tab;
+
+  /* The common candidates.  */
+  vec<iv_common_cand *> iv_common_cands;
+
   /* The maximum invariant id.  */
   unsigned max_inv_id;
 
@@ -894,6 +937,8 @@ tree_ssa_iv_optimize_init (struct ivopts_data *data)
   data->inv_expr_tab = new hash_table<iv_inv_expr_hasher> (10);
   data->inv_expr_id = 0;
   data->name_expansion_cache = NULL;
+  data->iv_common_cand_tab = new hash_table<iv_common_cand_hasher> (10);
+  data->iv_common_cands.create (20);
   decl_rtl_to_reset.create (20);
   gcc_obstack_init (&data->iv_obstack);
 }
@@ -3051,6 +3096,96 @@ add_iv_candidate_for_bivs (struct ivopts_data *data)
 }
 }
 
+/* Record common candidate {BASE, STEP} derived from USE in hashtable.  */
+
+static void
+record_common_cand (struct ivopts_data *data, tree base,
+   tree step, struct iv_use *use)
+{
+  struct iv_common_cand ent;
+  struct iv_common_cand **slot;
+
+  gcc_assert (use != NULL);
+
+  ent.base = base;
+  ent.step = step;
+  ent.hash = iterative_hash_expr (base, 0);
+  ent.hash = iterative_hash_expr (step, ent.hash);
+
+  slot = data->iv_common_cand_tab->find_slot (&ent, INSERT);
+  if (*slot == NULL)
+{
+  *slot = XNEW (struct iv_common_cand);
+  (*slot)->base = base;
+  (*slot)->step = step;
+  (*slot)->uses.create (8);
+  (*slot)->hash = ent.hash;
+  data->iv_common_cands.safe_push ((*slot));
+}
+  (*slot)->uses.safe_push (use);
+  return;
+}
+
+/* Comparison function used to sort common candidates.  */
+
+static int
+common_cand_cmp (const void *p1, const void *p2)
+{
+  unsigned n1, n2;
+  const struct iv_common_cand *const *const ccand1
+= (const struct iv_common_cand *const *)p1;
+  const struct iv_common_cand *const *const ccand2
+= (const struct iv_common_cand *const *)p2;
+
+  n1 = (*ccand1)->uses.length ();
+  n2 = (*ccand2)->uses.length ();
+  return n2 - n1;
+}
+
+/* Adds IV candidates based on the common candidates recorded.  */
+
+static void
+add_iv_candidate_derived_from_uses (struct ivopts_data *data)
+{
+  unsigned i, j;
+  struct iv_cand *cand_1, *cand_2;
+
+  data->iv_common_cands.qsort (common_cand_cmp);
+  for (i = 0; i < data->iv_common_cands.length (); i++)
+{
+  struct iv_common_cand *ptr = data->iv_common_cands[i];
+
+  /* Only 
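
The sort step in the patch orders the common candidates so that those shared by the most uses come first.  A standalone sketch of that ordering with the C library qsort is given below; the struct and function names are invented, and the comparator is a variant that uses explicit comparisons rather than relying on an unsigned difference converted to int.

```c
#include <stdlib.h>

/* Hypothetical stand-in for a common candidate: only the use count.  */
struct common_cand
{
  unsigned n_uses;
};

/* Order pointers to candidates by descending number of uses.  */
static int
cmp_by_uses_desc (const void *p1, const void *p2)
{
  const struct common_cand *const *c1 = p1;
  const struct common_cand *const *c2 = p2;
  unsigned n1 = (*c1)->n_uses;
  unsigned n2 = (*c2)->n_uses;

  /* Negative when the first candidate has more uses, so it sorts earlier.  */
  return (n1 < n2) - (n1 > n2);
}

static void
sort_common_cands (struct common_cand **cands, size_t n)
{
  qsort (cands, n, sizeof *cands, cmp_by_uses_desc);
}
```

Walking the sorted array from the front then visits the most widely shared candidates first.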

Re: [PATCH 2/6] Make builtin_vectorized_function take a combined_fn

2015-11-10 Thread Richard Biener
On Mon, Nov 9, 2015 at 5:25 PM, Richard Sandiford
 wrote:
> This patch replaces the fndecl argument to builtin_vectorized_function
> with a combined_fn and gets the vectoriser to call it for internal
> functions too.  The patch also moves vectorisation of machine-specific
> built-ins to a new hook, builtin_md_vectorized_function.
>
> I've attached a -b version too since that's easier to read.

@@ -42095,8 +42018,7 @@ ix86_builtin_vectorized_function (tree fndecl,
tree type_out,

   /* Dispatch to a handler for a vectorization library.  */
   if (ix86_veclib_handler)
-return ix86_veclib_handler ((enum built_in_function) fn, type_out,
-   type_in);
+return ix86_veclib_handler (combined_fn (fn), type_out, type_in);

   return NULL_TREE;
 }

fn is already a combined_fn?  Why does the builtin_vectorized_function
not take one but an unsigned int?

@@ -42176,11 +42077,12 @@ ix86_veclibabi_svml (enum built_in_function
fn, tree type_out, tree type_in)
   return NULL_TREE;
 }

-  bname = IDENTIFIER_POINTER (DECL_NAME (builtin_decl_implicit (fn)));
+  tree fndecl = mathfn_built_in (TREE_TYPE (type_in), fn);
+  bname = IDENTIFIER_POINTER (DECL_NAME (fndecl));

-  if (fn == BUILT_IN_LOGF)
+  if (DECL_FUNCTION_CODE (fndecl) == BUILT_IN_LOGF)

With 'fn' now a combined_fn, how is this going to work with IFNs?

@@ -42194,9 +42096,7 @@ ix86_veclibabi_svml (enum built_in_function
fn, tree type_out, tree type_in)
   name[4] &= ~0x20;

   arity = 0;
-  for (args = DECL_ARGUMENTS (builtin_decl_implicit (fn));
-   args;
-   args = TREE_CHAIN (args))
+  for (args = DECL_ARGUMENTS (fndecl); args; args = TREE_CHAIN (args))
 arity++;


or this?

Did you try this out?  We have only two basic testcases for all this
code using sin(), which may not end up as an IFN even with -ffast-math(?).

+/* Implement TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION.  */
+
+static tree
+rs6000_builtin_md_vectorized_function (tree fndecl, tree type_out,
+  tree type_in)
+{

Any reason you are using a fndecl for this hook instead of the function code?

@@ -1639,20 +1639,20 @@ vect_finish_stmt_generation (gimple *stmt,
gimple *vec_stmt,
 tree
 vectorizable_function (gcall *call, tree vectype_out, tree vectype_in)
 {
-  tree fndecl = gimple_call_fndecl (call);
-
-  /* We only handle functions that do not read or clobber memory -- i.e.
- const or novops ones.  */
-  if (!(gimple_call_flags (call) & (ECF_CONST | ECF_NOVOPS)))
+  /* We only handle functions that do not read or clobber memory.  */
+  if (gimple_vuse (call))
 return NULL_TREE;

-  if (!fndecl
-  || TREE_CODE (fndecl) != FUNCTION_DECL
-  || !DECL_BUILT_IN (fndecl))
-return NULL_TREE;
+  combined_fn fn = gimple_call_combined_fn (call);
+  if (fn != CFN_LAST)
+return targetm.vectorize.builtin_vectorized_function
+  (fn, vectype_out, vectype_in);

-  return targetm.vectorize.builtin_vectorized_function (fndecl, vectype_out,
-   vectype_in);
+  if (gimple_call_builtin_p (call, BUILT_IN_MD))
+return targetm.vectorize.builtin_md_vectorized_function
+  (gimple_call_fndecl (call), vectype_out, vectype_in);
+
+  return NULL_TREE;

Looking at this and the issues above wouldn't it be easier to simply
pass the call stmt to the hook (which then can again handle
both normal and target builtins)?  And it has context available
(actual arguments and number of arguments for IFN calls).

Richard.
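
The concern about handlers that treat every code as a built-in can be shown with a toy combined code space.  Everything below is hypothetical: the enums are tiny stand-ins, the layout (built-in codes first, internal-function codes after them) is an assumption made for the sketch, and the returned names are placeholders rather than real library symbols.

```c
#include <stddef.h>

/* Toy enumerations; all names are invented for this example.  */
enum toy_builtin { TOY_BUILT_IN_SIN, TOY_BUILT_IN_LOGF, TOY_END_BUILTINS };
enum toy_internal { TOY_IFN_SQRT, TOY_IFN_LAST };

/* One code space: built-ins first, internal functions after them.  */
typedef unsigned toy_combined_fn;
#define TOY_CFN_LAST ((toy_combined_fn) TOY_END_BUILTINS + TOY_IFN_LAST)

static toy_combined_fn
toy_from_builtin (enum toy_builtin fn)
{
  return (toy_combined_fn) fn;
}

static toy_combined_fn
toy_from_internal (enum toy_internal fn)
{
  return (toy_combined_fn) TOY_END_BUILTINS + (toy_combined_fn) fn;
}

static int
toy_builtin_p (toy_combined_fn fn)
{
  return fn < (toy_combined_fn) TOY_END_BUILTINS;
}

/* A library handler that only knows built-ins has to reject internal
   function codes before interpreting FN as a built-in.  */
static const char *
toy_veclib_name (toy_combined_fn fn)
{
  if (!toy_builtin_p (fn))
    return NULL;

  switch ((enum toy_builtin) fn)
    {
    case TOY_BUILT_IN_SIN:
      return "toy_vec_sin";
    case TOY_BUILT_IN_LOGF:
      return "toy_vec_logf";
    default:
      return NULL;
    }
}
```

Passing the call statement itself to the hook, as suggested in the review, would let the target make this distinction with the arguments available as well.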

>
> gcc/
> * target.def (builtin_vectorized_function): Take a combined_fn (in
> the form of an unsigned int) rather than a function decl.
> (builtin_md_vectorized_function): New.
> * targhooks.h (default_builtin_vectorized_function): Replace the
> fndecl argument with an unsigned int.
> (default_builtin_md_vectorized_function): Declare.
> * targhooks.c (default_builtin_vectorized_function): Replace the
> fndecl argument with an unsigned int.
> (default_builtin_md_vectorized_function): New function.
> * doc/tm.texi.in (TARGET_VECTORIZE_BUILTIN_MD_VECTORIZED_FUNCTION):
> New hook.
> * doc/tm.texi: Regenerate.
> * tree-vect-stmts.c (vectorizable_function): Update call to
> builtin_vectorized_function, also passing internal functions.
> Call builtin_md_vectorized_function for target-specific builtins.
> * config/aarch64/aarch64-protos.h
> (aarch64_builtin_vectorized_function): Replace fndecl argument
> with an unsigned int.
> * config/aarch64/aarch64-builtins.c: Include case-cfn-macros.h.
> (aarch64_builtin_vectorized_function): Update after above changes.
> Use CASE_CFN_*.
> * config/arm/arm-protos.h (arm_builtin_vectorized_function): Replace
> fndecl argument with an unsigned int.
> * config/arm/arm-builtins.c: Include 
