Re: [PATCH] Add a new target hook to compute the frame layout

2016-06-21 Thread Bernd Edlinger
On 06/21/16 23:29, Jeff Law wrote:
> On 06/16/2016 08:47 AM, Bernd Edlinger wrote:
>> Hi!
>>
>>
>> By the design of the target hook INITIAL_ELIMINATION_OFFSET
>> it is necessary to call this function several times with
>> different register combinations.
>> Most targets use a cached data structure that describes the
>> exact frame layout of the current function.
>>
>> It is safe to skip the computation when reload_completed = true,
>> and most targets do that already.
>>
>> However while reload is doing its work, it is not clear when to
>> do the computation and when not.  This results in unnecessary
>> work.  Computing the frame layout can be a simple function or an
>> arbitrarily complex one, that walks all instructions of the current
>> function for instance, which is more or less the common case.
>>
>>
>> This patch adds a new optional target hook that can be used
>> by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
>> into a O(n) computation part, and a O(1) result function.
>>
>> The patch implements a compute_frame_layout target hook just
>> for ARM in the moment, to show the principle.
>> Other targets may also implement that hook, if it seems appropriate.
>>
>>
>> Boot-strapped and reg-tested on arm-linux-gnueabihf.
>> OK for trunk?
>>
>>
>> Thanks
>> Bernd.
>>
>>
>> changelog-frame-layout.txt
>>
>>
>> 2016-06-16  Bernd Edlinger  
>>
>> * target.def (compute_frame_layout): New optional target hook.
>> * doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
>> * doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
>> * lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
>> target hook.
>> * reload1.c (verify_initial_elim_offsets): Likewise.
>> * config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
>> (use_simple_return_p): Call arm_compute_frame_layout if needed.
>> (arm_get_frame_offsets): Split up into this ...
>> (arm_compute_frame_layout): ... and this function.
> The ARM maintainers would need to chime in on the ARM specific changes
> though.
>
>
>
>> Index: gcc/target.def
>> ===================================================================
>> --- gcc/target.def	(Revision 233176)
>> +++ gcc/target.def	(Arbeitskopie)
>> @@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
>>   unsigned int, (void),
>>   default_case_values_threshold)
>>
>> -/* Retutn true if a function must have and use a frame pointer.  */
> s/Retutn/Return
>
Yes, the corrected line "+ /* Return ..." is a few lines below.
>> +/* Optional callback to advise the target to compute the frame layout.  */
>>  DEFHOOK
>> +(compute_frame_layout,
>> + "This target hook is called immediately before reload wants to call\n\
>> +@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\
>> +layout instead of re-computing it on every invocation.  This is particularly\n\
>> +useful for targets that have an O(n) frame layout function.  Implementing\n\
>> +this callback is optional.",
>> + void, (void),
>> + hook_void_void)
> So the docs say "immediately before", but that's not actually reality in
> lra-eliminations.  I think you can just say "This target hook is called
> before reload or lra-eliminations calls
> @code{INITIAL_ELIMINATION_OFFSET} and allows ..."
>
>
> How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

What I wanted to say here is that lra goes through several iterations,
typically 4-5 times, each changing something in the register allocation
that has an impact on the frame layout, and calls
INITIAL_ELIMINATION_OFFSET 3-4 times in a row; the results must be
consistent within each iteration to be usable.

So I am open to suggestions: how would you explain this idea in the doc?


Thanks
Bernd.

>
> I'm OK with this conceptually.  I think you need a minor doc update and
> OK from the ARM maintainers before it can be installed though.
>
> jeff


Re: C, C++: Fix PR 69733 (bad location for ignored qualifiers warning)

2016-06-21 Thread Jeff Law

On 05/04/2016 09:17 AM, Bernd Schmidt wrote:

On 04/25/2016 10:18 PM, Joseph Myers wrote:

On Fri, 22 Apr 2016, Bernd Schmidt wrote:


+/* Returns the smallest location != UNKNOWN_LOCATION in LOCATIONS,
+   considering only those c_declspec_words found in LIST, which
+   must be terminated by cdw_number_of_elements.  */
+
+static location_t
+smallest_type_quals_location (const location_t* locations,
+  c_declspec_word *list)


I'd expect list to be a pointer to const...


@@ -6101,6 +6122,18 @@ grokdeclarator (const struct c_declarato
 qualify the return type, not the function type.  */
  if (type_quals)
{
+enum c_declspec_word ignored_quals_list[] =
+  {
+cdw_const, cdw_volatile, cdw_restrict, cdw_address_space,
+cdw_number_of_elements
+  };


  ... and ignored_quals_list to be static const here.
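
For reference, the revised function presumably takes a shape like this (a
minimal sketch reconstructed from the comment quoted above; not the
committed code):

static location_t
smallest_type_quals_location (const location_t *locations,
                              const c_declspec_word *list)
{
  location_t loc = UNKNOWN_LOCATION;
  for (const c_declspec_word *p = list; *p != cdw_number_of_elements; p++)
    {
      location_t newloc = locations[*p];
      if (loc == UNKNOWN_LOCATION
          || (newloc != UNKNOWN_LOCATION && newloc < loc))
        loc = newloc;
    }
  return loc;
}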


How's this? Fully retested on x86_64-linux.


Bernd

declspecs-v2.diff


c/
PR c++/69733
* c-decl.c (smallest_type_quals_location): New static function.
(grokdeclarator): Try to find the correct location for an ignored
qualifier.
cp/
PR c++/69733
* decl.c (grokdeclarator): Try to find the correct location for an
ignored qualifier.
testsuite/
PR c++/69733
* c-c++-common/pr69733.c: New test.
* gcc.target/i386/pr69733.c: New test.

It looks like this stalled...

Anyway, it's fine for the trunk.

Thanks,
Jeff


Re: [PR66726] Fix regression caused by Factor conversion out of COND_EXPR

2016-06-21 Thread Jeff Law

On 06/01/2016 04:46 AM, kugan wrote:


Hi All,

Factoring out CONVERT_EXPR introduced a regression (PR66726). I had
to revert my previous patch due to some regressions. This is a much
simplified version compared to the one I reverted.

There is a test-case (pr46309.c) in the test-suite which is valid for
targets that have a branch cost greater than 1.

This patch makes optimize_range_tests understand the factored-out
COND_EXPR: it updates final_range_test_p to look for the new pattern
and changes maybe_optimize_range_tests (which does the inter-basic-block
range test optimization) accordingly.

With the patch:

m68k-linux-gnu-gcc -O2 -S pr46309.c -fdump-tree-reassoc-details
grep -e "Optimizing range tests" -e into pr46309.c.*.reassoc1

pr46309.c.114t.reassoc1:Optimizing range tests a_6(D) -[1, 1] and -[2, 2] and -[3, 3] and -[4, 4]
pr46309.c.114t.reassoc1: into (unsigned int) a_6(D) + 4294967295 > 3
pr46309.c.114t.reassoc1: into _10 = _13;
pr46309.c.114t.reassoc1:Optimizing range tests a_6(D) -[1, 1] and -[2, 2] and -[3, 3] and -[4, 4]
pr46309.c.114t.reassoc1: into (unsigned int) a_6(D) + 4294967295 > 3
pr46309.c.114t.reassoc1: into _10 = _13;
pr46309.c.114t.reassoc1:Optimizing range tests a_4(D) -[1, 1] and -[3, 3]
pr46309.c.114t.reassoc1: into (a_4(D) & -3) != 1
pr46309.c.114t.reassoc1: into _6 = _8;
pr46309.c.114t.reassoc1:Optimizing range tests a_4(D) -[1, 1] and -[2, 2]
pr46309.c.114t.reassoc1: into (unsigned int) a_4(D) + 4294967295 > 1
pr46309.c.114t.reassoc1: into _6 = _9;
pr46309.c.114t.reassoc1:Optimizing range tests a_5(D) -[0, 31] and -[64, 95]
pr46309.c.114t.reassoc1: into (a_5(D) & 4294967231) > 31
pr46309.c.114t.reassoc1: into _7 = _9;
pr46309.c.114t.reassoc1:Optimizing range tests a_9(D) -[0, 31] and -[64, 95]
pr46309.c.114t.reassoc1: into (a_9(D) & 4294967231) > 31
pr46309.c.114t.reassoc1:Optimizing range tests a_9(D) -[128, 159] and -[192, 223]
pr46309.c.114t.reassoc1: into (a_9(D) & 4294967231) + 4294967168 > 31
pr46309.c.114t.reassoc1: into _13 = _18 | _15;
pr46309.c.116t.reassoc1:Optimizing range tests a_2(D) -[1, 1] and -[2, 2] and -[3, 3] and -[4, 4]
pr46309.c.116t.reassoc1: into (unsigned int) a_2(D) + 4294967295 > 3
pr46309.c.116t.reassoc1:Optimizing range tests a_2(D) -[1, 1] and -[2, 2] and -[3, 3] and -[4, 4]
pr46309.c.116t.reassoc1: into (unsigned int) a_2(D) + 4294967295 > 3
pr46309.c.116t.reassoc1:Optimizing range tests a_3(D) -[0, 31] and -[64, 95]
pr46309.c.116t.reassoc1: into (a_3(D) & 4294967231) > 31
pr46309.c.116t.reassoc1:Optimizing range tests a_5(D) -[0, 31] and -[64, 95]
pr46309.c.116t.reassoc1: into (a_5(D) & 4294967231) > 31
pr46309.c.116t.reassoc1:Optimizing range tests a_5(D) -[128, 159] and -[192, 223]
pr46309.c.116t.reassoc1: into (a_5(D) & 4294967231) + 4294967168 > 31


Bootstrapped and regression tested on x86_64-linux-gnu and
ppc64le-linux-gnu with no new regressions.  Also regression tested
arm variants, which have a branch cost greater than 1.

Is this OK for trunk?

Thanks,
Kugan

gcc/ChangeLog:

2016-06-01  Kugan Vivekanandarajah  

PR middle-end/66726
PR middle-end/66726
* tree-ssa-reassoc.c (optimize_vec_cond_expr): Handle tcc_compare stmt
whose result is used in a PHI.
(final_range_test_p): Likewise.
(maybe_optimize_range_tests): Likewise.

OK for the trunk.

Thanks for your patience,
Jeff


Re: [PATCH PING] boehm-gc: check for execinfo.h directly

2016-06-21 Thread Jeff Law

On 06/21/2016 06:59 PM, Mike Frysinger wrote:

On 21 Jun 2016 15:46, Jeff Law wrote:


If accepted into upstream Boehm-GC, then this is obviously acceptable in
GCC's copy.


so changes can be pushed directly if they're already in upstream?
my original goal is already fixed in upstream, but it's not in
gcc's copy ...
Yes.  Ideally we'd just resync at some point in the near future, but if 
you want to cherry-pick a fix from upstream so it gets fixed sooner, 
that's fine.


jeff


Re: [PATCH] Print column numbers in inclusion trace consistently.

2016-06-21 Thread Jeff Law

On 06/03/2016 05:24 AM, Marcin Baczyński wrote:

Hi,
the patch below fixes PR/42014. Although the fix itself seems easy enough,
I have a problem with the test. Is there a way to match the output before
the "warning:" line? dg-{begin,end}-multiline-output doesn't do the job, or
at least I don't know how to convince it.
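
For illustration, the intended output shape is something like this
(hypothetical file names and positions), with a line:column pair on every
mentioned file rather than just the last one:

In file included from b.h:2:25,
                 from a.c:3:18:
c.h:5:9: warning: ...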

Bootstrapped on x86_64 linux.

Thanks,
Marcin


gcc/ChangeLog:

   PR/42014

   * diagnostic.c (diagnostic_report_current_module): Print column numbers
for all mentioned files if context->show_column.

gcc/testsuite/ChangeLog:

   PR/42014

   * gcc.dg/inclusion-trace-column.i: New test.
The change itself seems reasonable.  You might contact David Malcolm 
(dmalc...@redhat.com) directly to see if he's got any ideas on how to 
convince the multi-line test to do what you want.  Let's hold off 
installing the fix until we've got the testsuite issue sorted out.


Thanks,
jeff



Re: Unreviewed patches

2016-06-21 Thread Jeff Law

On 06/06/2016 02:16 AM, Rainer Orth wrote:

The following patches have remained unreviewed for a week:

[gotools, libcc1] Update copyright dates
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02307.html
Everything but the gotools changes is OK.  The master bits for gotools 
are outside of GCC.  Probably best to contact Ian to get these updated.




Richard already approved the update-copyright.py changes, but the actual
effects on gotools and libcc1 require either maintainer or release
manager approval, I believe.

[build] Handle gas/gld --compress-debug-sections=type
https://gcc.gnu.org/ml/gcc-patches/2016-05/msg02325.html

This one needs a build maintainer.

This is OK for the trunk.

jeff



[PATCH] Implement -fdiagnostics-parseable-fixits

2016-06-21 Thread David Malcolm
Both GCC and Clang can emit "fix-it" hints for a diagnostic, giving
a suggestion about how to fix the issue.

Clang has an option -fdiagnostics-parseable-fixits, which emits a
machine-readable version of the fixits, for use by IDEs.  (The only
IDE I know of that supports this format is Xcode [1]; does anyone
know of other IDEs supporting this?  I'd be very happy if someone
implemented Emacs support for it, or could point me at it).

This patch implements the option for gcc.  It seems prudent to be
compatible with Clang here, so I've deliberately used the same option
name and attempted to emulate the output format, based on the
documentation here:
http://llvm.org/releases/3.8.0/tools/clang/docs/UsersManual.html#cmdoption-fdiagnostics-parseable-fixits
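
For reference, the machine-readable lines documented there look like this
(example taken from the Clang manual; one line per fix-it, giving the file,
a character range, and the replacement text):

fix-it:"t.cpp":{7:25-7:29}:"Gamma"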

Experiments with clang (3.4.2) showed that supplying the option does
*not* suppress the normal printing of fixits, so I emulated that
behavior in my patch.

I implemented tests using both -fself-test and DejaGnu.
For the DejaGnu test coverage, I attempted to implement detection of the
output strings via existing directives, but after several hours of
failing, I instead implemented a new "dg-regexp" directive, which doesn't
expect anything other than the given regexp.  If anyone can see a way
to implement the tests using existing directives, I'll port to that.
(I believe that the injection of leading line numbers was the issue).

I need review of the proposed additions to gcc/testsuite/lib at least
(the rest I believe I can self-approve, but another pair of eyes would
be appreciated).

Successfully bootstrapped on x86_64-pc-linux-gnu.
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0.

OK for trunk?

[1] 
http://clang-developers.42468.n3.nabble.com/Parsing-clang-output-td3275815.html#a3276978

gcc/ChangeLog:
* common.opt (fdiagnostics-parseable-fixits): New option.
* diagnostic.c: Include "selftest.h".
(print_escaped_string): New function.
(print_parseable_fixits): New function.
(diagnostic_report_diagnostic): Call print_parseable_fixits.
(selftest::assert_print_escaped_string): New function.
(ASSERT_PRINT_ESCAPED_STRING_STREQ): New macro.
(selftest::test_print_escaped_string): New function.
(selftest::test_print_parseable_fixits_none): New function.
(selftest::test_print_parseable_fixits_insert): New function.
(selftest::test_print_parseable_fixits_remove): New function.
(selftest::test_print_parseable_fixits_replace): New function.
(selftest::diagnostic_c_tests): New function.
* diagnostic.h (struct diagnostic_context): Add field
"parseable_fixits_p".
* doc/invoke.texi (Diagnostic Message Formatting Options): Add
-fdiagnostics-parseable-fixits.
(-fdiagnostics-parseable-fixits): New option.
* opts.c (common_handle_option): Handle
-fdiagnostics-parseable-fixits.
* selftest-run-tests.c (selftest::run_tests): Call
selftest::diagnostic_c_tests.
* selftest.h (selftest::diagnostic_c_tests): New prototype.

gcc/testsuite/ChangeLog:
* gcc.dg/plugin/diagnostic-test-show-locus-parseable-fixits.c: New
file.
* gcc.dg/plugin/plugin.exp (plugin_test_list): Add
diagnostic-test-show-locus-parseable-fixits.c to sources for
diagnostic_plugin_test_show_locus.c.
* lib/gcc-defs.exp (freeform_regexps): New global.
(dg-regexp): New function.
(handle-dg-regexps): New function.
* lib/gcc-dg.exp (cleanup-after-saved-dg-test): Reset
freeform_regexps to the empty list.
* lib/prune.exp (prune_gcc_output): Call handle-dg-regexps.

libcpp/ChangeLog:
* include/line-map.h (fixit_hint::get_start_loc): New pure virtual
function.
(fixit_hint::maybe_get_end_loc): Likewise.
(fixit_insert::get_start_loc): New function, implementing
fixit_hint::get_start_loc.
(fixit_insert::maybe_get_end_loc): New function, implementing
fixit_hint::maybe_get_end_loc.
(fixit_remove::get_start_loc): New function, implementing
fixit_hint::get_start_loc.
(fixit_remove::maybe_get_end_loc): New function, implementing
fixit_hint::maybe_get_end_loc.
(fixit_replace::get_start_loc): New function, implementing
fixit_hint::get_start_loc.
(fixit_replace::maybe_get_end_loc): New function, implementing
fixit_hint::maybe_get_end_loc.
---
 gcc/common.opt                                    |   4 +
 gcc/diagnostic.c                                  | 258 +
 gcc/diagnostic.h                                  |   4 +
 gcc/doc/invoke.texi                               |  34 ++-
 gcc/opts.c                                        |   4 +
 gcc/selftest-run-tests.c                          |   1 +
 gcc/selftest.h                                    |   1 +
 .../diagnostic-test-show-locus-parseable-fixits.c |  41
 

Re: [PATCH 2/2] gcc: Update comment in bb-reorder.c

2016-06-21 Thread Jeff Law

On 06/10/2016 10:56 AM, Andrew Burgess wrote:

* gcc/bb-reorder.c (pass_partition_blocks::gate): Update comment.

Thanks.  Installed.

jeff


Re: [PATCH 1/2] gcc: Remove unneeded global flag.

2016-06-21 Thread Jeff Law

On 06/10/2016 10:56 AM, Andrew Burgess wrote:

The global flag `user_defined_section_attribute' is set while parsing C
code when the section attribute is encountered.  The flag is set when
anything has the section attribute applied to it, functions or data.

The only place this global was used was within the gate function for
partitioning blocks (pass_partition_blocks::gate), however, the
partitioning is done during compilation, while the flag is set earlier,
during the parsing.  The flag is then cleared again during the final
compilation pass.

The result is that if any function or data has a section attribute then
the flag will be set to true during the file parse pass.  The first
compiled function will then skip the partition-blocks pass, and the flag
will be set back to false during the final-pass on the first function.
After then, the flag is never set to true again.

The guarding of the partition-blocks pass does not appear to be
necessary, given that applying a section attribute correctly
overrides the hot/cold section partitioning (this is taken care of
in varasm.c).

gcc/ChangeLog:

* gcc/bb-reorder.c: Remove 'toplev.h' include.
(pass_partition_blocks::gate): No longer check
user_defined_section_attribute.
* gcc/c-family/c-common.c (handle_section_attribute): No longer
set user_defined_section_attribute.
* gcc/final.c (rest_of_handle_final): Likewise.
* gcc/toplev.c: Remove definition of user_defined_section_attribute.
* gcc/toplev.h: Remove declaration of
user_defined_section_attribute.
user_defined_section_attribute was introduced as part of the hot/cold 
partitioning changes.


https://gcc.gnu.org/ml/gcc-patches/2004-07/msg01545.html


What's supposed to happen is that hot/cold partitioning is turned off 
for any function which has a user-defined section attribute.


So proper behaviour is to set the flag to true when the attribute is 
parsed and turn it off when we're finished with the current function. 
The gate for hot/cold partitioning should check the value of the flag 
and avoid hot/cold partitioning when the flag is true.
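
Schematically, the intended lifecycle looks like this (simplified sketch;
the real code lives in the files named in the ChangeLog above):

/* c-family/c-common.c (handle_section_attribute): set at parse time.  */
user_defined_section_attribute = true;

/* bb-reorder.c (pass_partition_blocks::gate): skip partitioning while
   the flag is set.  */
return ... && !user_defined_section_attribute;

/* final.c (rest_of_handle_final): clear once the current function is
   finished, ready for the next one.  */
user_defined_section_attribute = false;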


So AFAICT everything is working as it should.  Keep in mind that 
multiple functions might have user defined section attributes.


So what might be better to do here is introduce a test to verify proper 
behavior.


Jeff


Re: [PATCH PING] boehm-gc: check for execinfo.h directly

2016-06-21 Thread Mike Frysinger
On 21 Jun 2016 15:46, Jeff Law wrote:
> On 06/13/2016 11:40 AM, Mike Frysinger wrote:
> > The current header depends on glibc version checks to determine whether
> > execinfo.h exists which breaks uClibc.  Instead, add an explicit configure
> > check for it.
> >
> > 2015-08-29  Mike Frysinger  
> >
> > * configure.ac: Call AC_CHECK_HEADERS([execinfo.h]).
> > * configure: Regenerated.
> > * include/gc.h [HAVE_EXECINFO_H]: Define GC_HAVE_BUILTIN_BACKTRACE.
> > * include/gc_config.h.in: Regenerated.
> 
> Shouldn't this be going to the upstream Boehm-GC project?  You should be 
> able to find information here on how to get the patch into the official 
> Boehm-GC project:

thanks ... didn't realize this was an embedded upstream project.
i've sent my patch there now:
https://github.com/ivmai/bdwgc/pull/123

> If accepted into upstream Boehm-GC, then this is obviously acceptable in 
> GCC's copy.

so changes can be pushed directly if they're already in upstream?
my original goal is already fixed in upstream, but it's not in
gcc's copy ...
-mike




Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-21 Thread Bill Schmidt

> On Jun 21, 2016, at 5:34 PM, Segher Boessenkool  
> wrote:
> 
> On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
>> I discovered recently that, with -mcpu=power9, an attempt to generate a 
>> vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  
>> This is semantically correct but the extra instruction is not optimal.  I 
>> found that there was some logic in xxspltib_constant_p to do special casing 
>> for const_vector with small constants, but not for vec_duplicate with small 
>> constants.  This patch duplicates that logic so we can generate the single 
>> instruction when possible.
> 
> This part is okay.
> 
>> When I did this, I ran into a problem with an existing test case.  We end up 
>> matching the *vsx_splat_v4si_internal pattern instead of falling back to the 
>> altivec_vspltisw pattern.  The constraints don't match for constant input.  
>> To avoid this, I added a pattern ahead of this one that will match for VMX 
>> output registers and produce the vspltisw as desired.  This corrected the 
>> failing test and produces the expected code.
> 
> Why does the predicate allow constant input, while the constraints do not?

I have no idea why it was built that way.  The predicate seems to provide for 
all sorts of things, but this and the subsequent pattern both handle only a 
subset of the constraints implied by it.  To be honest, I didn't feel competent 
to try to fix the existing patterns.  Do you have any suggestions for what to 
do instead?

Thanks!
Bill

> 
>> I've added a test case to demonstrate the code works properly now in the 
>> usual case.
> 
> Thanks :-)
> 
> 
> Segher
> 



Re: [7/7] Add negative and zero strides to vect_memory_access_type

2016-06-21 Thread Jeff Law

On 06/15/2016 02:53 AM, Richard Sandiford wrote:

This patch uses the vect_memory_access_type from patch 6 to represent
the effect of a negative contiguous stride or a zero stride.  The latter
is valid only for loads.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_memory_access_type): Add
VMAT_INVARIANT, VMAT_CONTIGUOUS_DOWN and VMAT_CONTIGUOUS_REVERSED.
* tree-vect-stmts.c (compare_step_with_zero): New function.
(perm_mask_for_reverse): Move further up file.
(get_group_load_store_type): Stick to VMAT_ELEMENTWISE if the
step is negative.
(get_negative_load_store_type): New function.
(get_load_store_type): Call it.  Add an ncopies argument.
(vectorizable_mask_load_store): Update call accordingly and
remove tests for negative steps.
(vectorizable_store, vectorizable_load): Likewise.  Handle new
memory_access_types.

OK.
jeff



Re: [6/7] Explicitly classify vector loads and stores

2016-06-21 Thread Jeff Law

On 06/15/2016 02:52 AM, Richard Sandiford wrote:

This is the main patch in the series.  It adds a new enum and routines
for classifying a vector load or store implementation.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Thanks,
Richard


gcc/
* tree-vectorizer.h (vect_memory_access_type): New enum.
(_stmt_vec_info): Add a memory_access_type field.
(STMT_VINFO_MEMORY_ACCESS_TYPE): New macro.
(vect_model_store_cost): Take an access type instead of a boolean.
(vect_model_load_cost): Likewise.
* tree-vect-slp.c (vect_analyze_slp_cost_1): Update calls to
vect_model_store_cost and vect_model_load_cost.
* tree-vect-stmts.c (vec_load_store_type): New enum.
(vect_model_store_cost): Take an access type instead of a
store_lanes_p boolean.  Simplify tests.
(vect_model_load_cost): Likewise, but for load_lanes_p.
(get_group_load_store_type, get_load_store_type): New functions.
(vectorizable_store): Use get_load_store_type.  Record the access
type in STMT_VINFO_MEMORY_ACCESS_TYPE.
(vectorizable_load): Likewise.
(vectorizable_mask_load_store): Likewise.  Replace is_store
variable with vls_type.
OK.  Looks like a nice cleanup to me.  If there's something that got 
goof'd along the way, I trust you'll deal with it appropriately -- I 
didn't try to map from every conditional back in the original code to 
the conditionals in the new code.



Jeff


Re: [PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-21 Thread Segher Boessenkool
On Tue, Jun 21, 2016 at 03:14:51PM -0500, Bill Schmidt wrote:
> I discovered recently that, with -mcpu=power9, an attempt to generate a 
> vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  
> This is semantically correct but the extra instruction is not optimal.  I 
> found that there was some logic in xxspltib_constant_p to do special casing 
> for const_vector with small constants, but not for vec_duplicate with small 
> constants.  This patch duplicates that logic so we can generate the single 
> instruction when possible.

This part is okay.

> When I did this, I ran into a problem with an existing test case.  We end up 
> matching the *vsx_splat_v4si_internal pattern instead of falling back to the 
> altivec_vspltisw pattern.  The constraints don't match for constant input.  
> To avoid this, I added a pattern ahead of this one that will match for VMX 
> output registers and produce the vspltisw as desired.  This corrected the 
> failing test and produces the expected code.

Why does the predicate allow constant input, while the constraints do not?

> I've added a test case to demonstrate the code works properly now in the 
> usual case.

Thanks :-)


Segher


Re: [PATCH] Drop excess size used for run time allocated stack variables.

2016-06-21 Thread Jeff Law

On 06/21/2016 03:35 AM, Dominik Vogt wrote:

What do we do now with the two patches?  At the moment, the
functional patch depends on the changes in the cleanup patch, so
it cannot be applied on its own.  Options:

(with the requested cleanup in the functional patch)

 1) Apply both patches as they are now and do further cleanup on
top of it.
 2) Rewrite the functional patch so that it applies without the
cleanup patch and commit it now.
 3) Look into the suggested cleanup now and adapt the functional
patch to it when its ready.

Actually I'd prefer (1) or (2) to just get the functional patch
off my desk.  I agree that the cleanup is very useful, but there's
not relation between the cleanup and the functional stuff except
that they touch the same code.  Having the functional patch
applied would simplify further work for me.
I thought Eric had ack'd the cleanup patch with a comment fix, so that 
can move forward and presumably unblock your functional patch.  Right?


So I think the TODO here is for me to fix the comment per Eric's review 
so that you can move forward.  The trick is getting it done before I go 
on PTO at the end of this week :-)


Jeff



Re: [PATCH] Fix bootstrap on hppa*-*-hpux*

2016-06-21 Thread Jeff Law

On 05/20/2016 12:09 PM, John David Anglin wrote:

On 2016-05-18 2:20 AM, Jakub Jelinek wrote:

On Tue, May 17, 2016 at 08:31:00PM -0400, John David Anglin wrote:

>r235550 introduced the use of long long, and the macros LLONG_MIN
and LLONG_MAX.  These macros
>are not defined by default and we need to include  when
compiling with c++ to define them.

IMNSHO we should get rid of those long long uses instead and just use
int64_t and INTTYPE_MINIMUM (int64_t) and INTTYPE_MAXIMUM (int64_t).

There is also another use of long long in libcpp, we should also replace
that.
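
In other words, roughly this substitution (a sketch; INTTYPE_MINIMUM and
INTTYPE_MAXIMUM are GCC's type-generic limit macros, so no limits header
is needed):

int64_t lo = INTTYPE_MINIMUM (int64_t);   /* was: LLONG_MIN */
int64_t hi = INTTYPE_MAXIMUM (int64_t);   /* was: LLONG_MAX */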


The attached change implements the above.  There is an implicit
assumption that int64_t
is long long if it is not long.

The patch also changes gcov-tool.c.  This affects the interface somewhat,
but I think consistently using int64_t is better.

Tested on hppa2.0w-hp-hpux11.11.  Okay for trunk?

Dave

--
John David Anglin  dave.ang...@bell.net


longlong.d.txt


2016-05-20  John David Anglin  

PR bootstrap/71014
* c-common.c (get_source_date_epoch): Use int64_t instead of long long.

* gcov-tool.c (profile_rewrite): Use int64_t instead of long long.
(do_rewrite): Likewise.

* line-map.c (location_adhoc_data_update): Use int64_t instead of
long long.
(get_combined_adhoc_loc): Likewise.

OK.  Please install if you haven't already.

jeff


Re: [PATCH] Improve TBAA with unions

2016-06-21 Thread Jeff Law

On 05/24/2016 03:14 AM, Richard Biener wrote:

On Wed, 18 May 2016, Richard Biener wrote:



The following adjusts get_alias_set behavior when applied to
union accesses to use the union alias-set rather than alias-set
zero.  This is in line with behavior from the alias oracle
which (bogusly) circumvents alias-set zero by looking at
the alias-sets of the base object.  Thus for

union U { int i; float f; };

float
foo (union U *u, double *p)
{
  u->f = 1.;
  *p = 0;
  return u->f;
}

the langhooks ensured u->f has alias-set zero and thus disambiguation
against *p was not allowed.  Still the alias-oracle did the disambiguation
by using the alias set of the union here (I think optimizing the
return to return 1. is valid).

We have a good place in the middle-end to apply such rules which
is component_uses_parent_alias_set_from - this is where I move
the logic that is duplicated in various frontends.

The Java and Ada frontends do not allow union type punning (LTO does),
so this patch may eventually pessimize them.  I don't much care
about Java, but Ada folks might want to chime in.

Bootstrapped on x86_64-unknown-linux-gnu, testing in progress.

Ok for trunk?


Ping.

Thanks,
Richard.


Thanks,
Richard.

2016-05-18  Richard Biener  

* alias.c (component_uses_parent_alias_set_from): Handle
type punning through union accesses by using the union alias set.
* gimple.c (gimple_get_alias_set): Remove union type punning case.

c-family/
* c-common.c (c_common_get_alias_set): Remove union type punning case.

fortran/
* f95-lang.c (LANG_HOOKS_GET_ALIAS_SET): Remove (un-)define.
(gfc_get_alias_set): Remove.
You know the aliasing rules better than I.  If you're confident using 
the union's alias set is safe, then it's OK with me.


My only worry is that if we get it wrong, it's likely to be a subtle bug 
that may take a long time to expose itself.  But that in and of itself 
shouldn't stop us from going forward.



Jeff



Re: RFA: Generate normal DWARF DW_LOC descriptors for non integer mode pointers

2016-06-21 Thread Jeff Law

On 05/26/2016 10:16 AM, Nick Clifton wrote:

Hi Jeff,


I may be missing something, but isn't it the transition to an FP
relative address rather than a SP relative address that's the problem
here?


Yes, I believe so.


Where does that happen?


I think that it happens in dwarf2out.c:based_loc_descr()  which
detects the use of the frame pointer and works out that it is going
to be eliminated to the stack pointer:

  /* We only use "frame base" when we're sure we're talking about the
 post-prologue local stack frame.  We do this by *not* running
 register elimination until this point, and recognizing the special
 argument pointer and soft frame pointer rtx's.  */
  if (reg == arg_pointer_rtx || reg == frame_pointer_rtx)
{
  rtx elim = (ira_use_lra_p
  ? lra_eliminate_regs (reg, VOIDmode, NULL_RTX)
  : eliminate_regs (reg, VOIDmode, NULL_RTX));

  if (elim != reg)
.

The problem, I believe, is that based_loc_descr() is only called
from mem_loc_descriptor when the mode of the rtl concerned is in
the MODE_INT class.  For example:

case REG:
  if (GET_MODE_CLASS (mode) != MODE_INT
 [...]
  else
  if (REGNO (rtl) < FIRST_PSEUDO_REGISTER)
mem_loc_result = based_loc_descr (rtl, 0, VAR_INIT_STATUS_INITIALIZED);

or, (this is another one that I found whilst investigating this
problem further):

  case PLUS:
plus:
  if (is_based_loc (rtl)
  && (GET_MODE_SIZE (mode) <= DWARF2_ADDR_SIZE
  || XEXP (rtl, 0) == arg_pointer_rtx
  || XEXP (rtl, 0) == frame_pointer_rtx)
  && GET_MODE_CLASS (mode) == MODE_INT)
mem_loc_result = based_loc_descr (XEXP (rtl, 0),
  INTVAL (XEXP (rtl, 1)),
  VAR_INIT_STATUS_INITIALIZED);
  else


There are quite a few places in mem_loc_descriptor where the code checks
for the mode being in the MODE_INT class.  I am not exactly sure why.  I
think that it might be that the programmer thought that any expression that
does not involve integer based arithmetic cannot be expressed in DWARF CFA
notation and so would have to be handled specially.  If I am correct,
then it seems to me that the proper fix would be to use SCALAR_INT_MODE_P
instead.

I tried out the extended patch (attached) and it gave even better GDB
results for the MSP430 and still no regressions (GCC or GDB) for MSP430 or
x86_64.

Is this enough justification ?
So the argument is that MODE_INT was likely intended to filter out 
things like FP modes and such that might somehow be bogusly used as 
addresses.  As written those tests are also filtering out partial 
integer modes which we can support?


And that many (most?) of the things that filter on MODE_INT should 
really be using MODE_INT || MODE_PARTIAL_INT (aka SCALAR_INT_MODE_P).
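
Concretely, the direction amounts to something like this (illustrative
only; SCALAR_INT_MODE_P accepts both MODE_INT and MODE_PARTIAL_INT):

/* Before: rejects partial-int modes such as the MSP430 pointer mode.  */
if (GET_MODE_CLASS (mode) == MODE_INT)
  mem_loc_result = based_loc_descr (rtl, 0, VAR_INIT_STATUS_INITIALIZED);

/* After: also accepts MODE_PARTIAL_INT.  */
if (SCALAR_INT_MODE_P (mode))
  mem_loc_result = based_loc_descr (rtl, 0, VAR_INIT_STATUS_INITIALIZED);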


I can buy that ;-)   OK with a suitable ChangeLog entry.

jeff




Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-06-21 Thread Jeff Law

On 06/06/2016 05:23 AM, Bin.Cheng wrote:


Hi Jeff,
What's your opinion on this (and how to extend it to a region based
interface)?  I will update this patch for further review if you are
okay.
I've never pondered how to revamp DOM into a regional interface (though 
I have pondered how to revamp the threading bits into a regional 
interface -- without any real success).


Conceptually I think it's just a matter of setting an entry block/edge, 
computing all blocks dominated by that entry block/edge and limiting the 
walker to only visiting blocks in that set.  So the regional aspect 
would live in domwalk.[ch].
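
A minimal sketch of that idea, using the existing dominance helpers
(hypothetical code, not a patch):

/* Collect the region dominated by ENTRY_BB and restrict the walk to it.  */
bitmap region = BITMAP_ALLOC (NULL);
unsigned i;
basic_block bb;
vec<basic_block> doms = get_all_dominated_blocks (CDI_DOMINATORS, entry_bb);
FOR_EACH_VEC_ELT (doms, i, bb)
  bitmap_set_bit (region, bb->index);
doms.release ();
/* ... then in the dom walker, before visiting BB:  */
if (!bitmap_bit_p (region, bb->index))
  return;  /* outside the region, skip */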


We'd probably want to avoid all the threading bits in a regional walk -- 
not because threading isn't useful, but sequencing the updates is a PITA 
and trying to do it in the middle of some other pass would just be a mess.


jeff


Re: [PATCH PR68030/PR69710][RFC]Introduce a simple local CSE interface and use it in vectorizer

2016-06-21 Thread Jeff Law

On 05/27/2016 05:56 AM, Richard Biener wrote:

On Fri, May 27, 2016 at 1:11 PM, Bin.Cheng  wrote:

On Fri, May 27, 2016 at 11:45 AM, Richard Biener
 wrote:

On Wed, May 25, 2016 at 1:22 PM, Bin Cheng  wrote:

Hi,
As analyzed in PR68030 and PR69710, the vectorizer generates duplicated 
computations in the loop's pre-header basic block when creating base 
addresses for vector references to the same memory object.  Because the 
duplicated code is outside the loop, IVOPT fails to track the base object 
for these vector references, resulting in missed strength reduction.
It's agreed that the vectorizer should be improved to generate optimal 
(IVOPT-friendly) code; the difficult part is that we want a generic 
infrastructure.  After investigation, I tried to introduce a generic, 
simple local CSE interface by reusing the existing algorithm and data 
structures from tree-ssa-dom (tree-ssa-scopedtables).  The interface runs 
local CSE for each basic block in a bitmap; customers of this interface 
only need to record basic blocks in the bitmap when necessary.  Note we 
don't need scopedtables' unwinding facility since the interface runs only 
on a single basic block, which should be good in terms of compilation time.
Besides CSE issue, this patch also re-associates address expressions in 
vect_create_addr_base_for_vector_ref, specifically, it splits constant offset 
and adds it back near the expression root in IR.  This is necessary because GCC 
only handles re-association for commutative operators in CSE.

I checked its impact on various test cases.
With this patch, PR68030's generated assembly is reduced from ~750 lines to 
~580 lines on x86_64, with both pre-header and loop body simplified.  But,
1) It doesn't fix the whole problem on x86_64.  The root cause is that the 
computation of the base address for the first reference is somehow moved 
outside of the loop's pre-header, so local CSE can't help in this case.  
Though split_constant_offset can back-track the ssa def chain, it can 
introduce redundancy when there are no CSE opportunities in the pre-header.
2) It causes a regression for PR68030 on AArch64.  I think the regression 
is caused by IVOPT issues which are exposed by this patch.  The checks on 
offset validity in get_address_cost are wrong/inaccurate now.  It considers 
an offset as valid if it's within the maximum offset range that the backend 
supports.  This is not true; for example, AArch64 additionally requires an 
aligned offset: LDR [base + 2060] is invalid for V4SFmode, although "2060" 
is within the maximum offset range.  Another issue is also in 
get_address_cost: an inaccurate cost is computed for the 
"base + offset + INDEX" address expression.  When register pressure is low, 
"base + offset" can be hoisted out and we can use the [base + INDEX] 
addressing mode, which is the current behavior.

Bootstrap and test on x86_64 and AArch64.  Any comments appreciated.


It looks quite straight-forward with the caveat that it has one
obvious piece that is not in the order
of the complexity of a basic-block.  threadedge_initialize_values
creates the SSA value array

I noticed this too, and think it's better to get rid of these init/fini
functions by some kind of re-design.  I found it quite weird to call
threadedge_X in tree-vrp.c.  I will keep investigating this.

which is zero-initialized (upon use).  That's probably a non-issue for
the use you propose for the vectorizer (call cse_bbs once per function).
As ideally I would like this facility to replace the loop unroller's own
propagate_constants_for_unrolling, it might become an issue though.
In that regard the unroller facility is also more powerful because it
handles regions (including PHIs).

With the current data-structure, I think it's not very hard to extend
the interface to regions.  I will keep investigating this too.  BTW,
if it's okay, I intend to do that in follow-up patches.


I'm fine with doing enhancements to this in followup patches (adding Jeff
to CC for his opinions).
Likewise.  Ideally we'll get all the threading stuff out of DOM/VRP. 
VRP is probably the easiest as I think it'll fall-out of some work 
Andrew is doing.  Getting rid of the threading from DOM is more work, 
but nothing that (IMHO) is terribly complex, its just time consuming.


Jeff



Re: [PATCH PING] boehm-gc: check for execinfo.h directly

2016-06-21 Thread Jeff Law

On 06/13/2016 11:40 AM, Mike Frysinger wrote:

The current header depends on glibc version checks to determine whether
execinfo.h exists which breaks uClibc.  Instead, add an explicit configure
check for it.

2015-08-29  Mike Frysinger  

* configure.ac: Call AC_CHECK_HEADERS([execinfo.h]).
* configure: Regenerated.
* include/gc.h [HAVE_EXECINFO_H]: Define GC_HAVE_BUILTIN_BACKTRACE.
* include/gc_config.h.in: Regenerated.
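
Spelled out, the change in the ChangeLog above amounts to roughly this
(sketch):

# configure.ac
AC_CHECK_HEADERS([execinfo.h])

/* include/gc.h: replace the glibc version check with the configure
   result.  */
#ifdef HAVE_EXECINFO_H
# define GC_HAVE_BUILTIN_BACKTRACE
#endif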
Shouldn't this be going to the upstream Boehm-GC project?  You should be 
able to find information here on how to get the patch into the official 
Boehm-GC project:



http://hboehm.info/gc/

If accepted into upstream Boehm-GC, then this is obviously acceptable in 
GCC's copy.


jeff




Re: [PATCH 4/4] C FE: suggest corrections for misspelled identifiers and type names

2016-06-21 Thread Jeff Law

On 06/14/2016 09:15 AM, David Malcolm wrote:

This patch introduces a new lookup_name_fuzzy function to the
C frontend and uses it to provides suggestions for various error
messages that may be due to misspellings, and also for the
warnings in implicit_decl_warning.

This latter part may be controversial.  So far, we've only provided
spelling suggestions during error-handling, and I think there's
a strong case for spending the extra cycles to give a good error
message given that the compilation is going to fail.  For a *warning*,
this tradeoff may not be so clear.  In my experience, the
"implicit declaration of function" warning usually means that
either the compile or link is going to fail, so it may be that these
cases are usually effectively errors (hence the suggestions in this
patch).

Alternatively, the call to lookup_name_fuzzy could be somehow guarded
so that we don't do work upfront to generate the suggestion
if the warning is going to be discarded internally within diagnostics.c.
(if the user has supplied -Wno-implicit-function-declaration, they're
presumably working with code that uses implicit function decls, I
suppose).

Amongst other things, builtins get offered as suggestions (hence the
suggestion of "nanl" for "name" in one existing test case, which I
initially found surprising).

The patch includes PR c/70339: "singed" vs "signed" hints, looking
at reserved words that could start typenames.

The patch also considers preprocessor macros when considering
spelling candidates: if the user misspells a macro name, this is
typically seen by the frontend as an unrecognized identifier, so
we can offer suggestions like this:

spellcheck-identifiers.c: In function 'test_4':
spellcheck-identifiers.c:64:10: warning: implicit declaration of
function 'IDENTIFIER_PTR'; did you mean 'IDENTIFIER_POINTER'?
[-Wimplicit-function-declaration]
   return IDENTIFIER_PTR (node);
          ^~~~~~~~~~~~~~
          IDENTIFIER_POINTER

Doing so uses the "best_match" class added in a prior patch, merging
in results between C scopes and the preprocessor, using the
optimizations there to minimize the work done.  Merging these
results required some minor surgery to class best_match.

In c-c++-common/attributes-1.c, the error message for:
  void* my_calloc(unsigned, unsigned) __attribute__((alloc_size(1,bar))); /* { dg-warning "outside range" } */
gains a suggestion of "carg", becoming:
  attributes-1.c:4:65: error: 'bar' undeclared here (not in a function); did you mean 'carg'?
  attributes-1.c:4:1: warning: alloc_size parameter outside range [-Wattributes]
This is an unhelpful suggestion, given that alloc_size expects an integer
value identifying a parameter, which the builtin
  double carg (double complex z);
is not.  It's not clear to me what the best way to fix this is:
if I'm reading things right, c_parser_attributes parses expressions
using c_parser_expr_list without regard to which attribute it's
handling, so there's (currently) no way to "tune" the attribute parser
based on the attribute (and I don't know if that's a complexity we
want to take on).

Successfully bootstrapped in combination with the rest of
the kit on x86_64-pc-linux-gnu
Successful -fself-test of stage1 on powerpc-ibm-aix7.1.3.0 (in
combination with the rest of the kit).

OK for trunk?

gcc/c-family/ChangeLog:
PR c/70339
* c-common.h (enum lookup_name_fuzzy_kind): New enum.
(lookup_name_fuzzy): New prototype.

gcc/c/ChangeLog:
PR c/70339
* c-decl.c: Include spellcheck-tree.h and gcc-rich-location.h.
(implicit_decl_warning): When issuing warnings for implicit
declarations, attempt to provide a suggestion via
lookup_name_fuzzy.
(undeclared_variable): Likewise when issuing errors.
(lookup_name_in_scope): Likewise.
(struct edit_distance_traits): New struct.
(best_macro_match): New typedef.
(find_closest_macro_cpp_cb): New function.
(lookup_name_fuzzy): New function.
* c-parser.c: Include gcc-rich-location.h.
(c_token_starts_typename): Split out case CPP_KEYWORD into...
(c_keyword_starts_typename): ...this new function.
(c_parser_declaration_or_fndef): When issuing errors about
missing "struct" etc, add a fixit.  For other kinds of errors,
attempt to provide a suggestion via lookup_name_fuzzy.
(c_parser_parms_declarator): When looking ahead to detect typos in
type names, also reject CPP_KEYWORD.
(c_parser_parameter_declaration): When issuing errors about
unknown type names, attempt to provide a suggestion via
lookup_name_fuzzy.
* c-tree.h (c_keyword_starts_typename): New prototype.

gcc/ChangeLog:
PR c/70339
* diagnostic-core.h (pedwarn_at_rich_loc): New prototype.
* diagnostic.c (pedwarn_at_rich_loc): New function.
* spellcheck.h (best_match::best_match): Add a
"best_distance_so_far" optional parameter.
  

[Ada] clean up handling of DECL_ORIGINAL_TYPE in gigi

2016-06-21 Thread Eric Botcazou
More specifically the ??? comment:

  /* ??? Copy and original type are not supposed to be variant but we
 really need a variant for the placeholder machinery to work.  */
  if (TYPE_IS_FAT_POINTER_P (t))
tt = build_variant_type_copy (t);
  else
{
  /* TYPE_NEXT_PTR_TO is a chain of main variants.  */
  tt = build_distinct_type_copy (TYPE_MAIN_VARIANT (t));
  if (TREE_CODE (t) == POINTER_TYPE)
TYPE_NEXT_PTR_TO (TYPE_MAIN_VARIANT (t)) = tt;
  tt = build_qualified_type (tt, TYPE_QUALS (t));
}

The C family of compilers always uses a variant in this case so the attached 
patch does the same for the Ada compiler.  The verify_type_variant hunk is 
necessary because it would otherwise flag spurious differences for TYPE_SIZE 
and TYPE_SIZE_UNIT between variants: the trees are initially the same but they 
are reset to different PLACEHOLDER_EXPRs by free_lang_data_in_one_sizepos:

/* Reset the expression *EXPR_P, a size or position.

   ??? We could reset all non-constant sizes or positions.  But it's cheap
   enough to not do so and refrain from adding workarounds to dwarf2out.c.

   We need to reset self-referential sizes or positions because they cannot
   be gimplified and thus can contain a CALL_EXPR after the gimplification
   is finished, which will run afoul of LTO streaming.  And they need to be
   reset to something essentially dummy but not constant, so as to preserve
   the properties of the object they are attached to.  */

static inline void
free_lang_data_in_one_sizepos (tree *expr_p)
{
  tree expr = *expr_p;
  if (CONTAINS_PLACEHOLDER_P (expr))
*expr_p = build0 (PLACEHOLDER_EXPR, TREE_TYPE (expr));
}

Tested on x86_64-suse-linux, applied on the mainline, as obvious for the 
verify_type_variant hunk.


2016-06-21  Eric Botcazou  

* tree.c (verify_type_variant): Skip TYPE_SIZE and TYPE_SIZE_UNIT if
they are both PLACEHOLDER_EXPRs.
ada/
* gcc-interface/decl.c (set_nonaliased_component_on_array_type): New
function.
(set_reverse_storage_order_on_array_type): Likewise.
(gnat_to_gnu_entity) : Call them to set the flags.
: Likewise.
: Likewise.
(substitute_in_type) : Likewise.
* gcc-interface/utils.c (gnat_pushdecl): Always create a variant for
the DECL_ORIGINAL_TYPE of a type.

-- 
Eric Botcazou

Index: ada/gcc-interface/decl.c
===================================================================
--- ada/gcc-interface/decl.c	(revision 237571)
+++ ada/gcc-interface/decl.c	(working copy)
@@ -206,6 +206,8 @@ static tree gnat_to_gnu_subprog_type (En
 static tree gnat_to_gnu_field (Entity_Id, tree, int, bool, bool);
 static tree gnu_ext_name_for_subprog (Entity_Id, tree);
 static tree change_qualified_type (tree, int);
+static void set_nonaliased_component_on_array_type (tree);
+static void set_reverse_storage_order_on_array_type (tree);
 static bool same_discriminant_p (Entity_Id, Entity_Id);
 static bool array_type_has_nonaliased_component (tree, Entity_Id);
 static bool compile_time_known_address_p (Node_Id);
@@ -2265,12 +2267,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	for (index = ndim - 1; index >= 0; index--)
 	  {
 	tem = build_nonshared_array_type (tem, gnu_index_types[index]);
-	if (index == ndim - 1)
-	  TYPE_REVERSE_STORAGE_ORDER (tem)
-		= Reverse_Storage_Order (gnat_entity);
 	TYPE_MULTI_ARRAY_P (tem) = (index > 0);
+	TYPE_CONVENTION_FORTRAN_P (tem) = convention_fortran_p;
+	if (index == ndim - 1 && Reverse_Storage_Order (gnat_entity))
+	  set_reverse_storage_order_on_array_type (tem);
 	if (array_type_has_nonaliased_component (tem, gnat_entity))
-	  TYPE_NONALIASED_COMPONENT (tem) = 1;
+	  set_nonaliased_component_on_array_type (tem);
 	  }
 
 	/* If an alignment is specified, use it if valid.  But ignore it
@@ -2287,8 +2289,6 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	  TYPE_USER_ALIGN (tem) = 1;
 	  }
 
-	TYPE_CONVENTION_FORTRAN_P (tem) = convention_fortran_p;
-
 	/* Tag top-level ARRAY_TYPE nodes for packed arrays and their
 	   implementation types as such so that the debug information back-end
 	   can output the appropriate description for them.  */
@@ -2651,12 +2651,12 @@ gnat_to_gnu_entity (Entity_Id gnat_entit
 	{
 	  gnu_type = build_nonshared_array_type (gnu_type,
 		 gnu_index_types[index]);
-	  if (index == ndim - 1)
-		TYPE_REVERSE_STORAGE_ORDER (gnu_type)
-		  = Reverse_Storage_Order (gnat_entity);
 	  TYPE_MULTI_ARRAY_P (gnu_type) = (index > 0);
+	  TYPE_CONVENTION_FORTRAN_P (gnu_type) = convention_fortran_p;
+	  if (index == ndim - 1 && Reverse_Storage_Order (gnat_entity))
+		set_reverse_storage_order_on_array_type (gnu_type);
 	  if (array_type_has_nonaliased_component (gnu_type, gnat_entity))
-		TYPE_NONALIASED_COMPONENT (gnu_type) = 1;
+		

Re: [PATCH] Reject boolean/enum types in last arg of __builtin_*_overflow_p (take 2)

2016-06-21 Thread Jeff Law

On 06/15/2016 05:47 AM, Jakub Jelinek wrote:

On Tue, Jun 14, 2016 at 11:13:28AM -0600, Martin Sebor wrote:

Here is an untested patch for that.  Except that the middle-end considers
conversions between BOOLEAN_TYPE and single bit unsigned type as useless,
so in theory this can't work well, and in practice only if we are lucky
enough (plus it generates terrible code right now), so we'd probably need
to come up with a different way of expressing whether the internal fn
should have a bool/_Bool-ish behavior or not (optional 3rd argument or
something ugly like that).  Plus add lots of testcases to cover the weirdo
cases.  Is it really worth it, even when we don't want to support overflow
into enumeration type and thus will not cover all integral types anyway?


If it's cumbersome to get to work I agree that it's not worth
the effort.  Thanks for taking the time to prototype it.


Ok, so here is an updated patch.  In addition to diagnostic wording changes
this (as also the earlier posted patch) fixes the handling of sub-mode
precision, it adds hopefully sufficient testsuite coverage for
__builtin_{add,sub,mul}_overflow_p.

The only thing I'm unsure about is what to do with bitfield types.
For __builtin_{add,sub,mul}_overflow it is not an issue, as one can't take
address of a bitfield.  For __builtin_{add,sub,mul}_overflow_p right now,
the C FE doesn't promote the last argument in any way, therefore for C
the builtin-arith-overflow-p-19.c testcase tests the behavior of bitfield
overflows.  The C++ FE even for type-generic builtins promotes the argument
to the underlying type (as part of decay_conversion), therefore for C++
overflow to bit-fields doesn't work.  Is that acceptable that because the
bitfields in the two languages behave generally slightly differently it is
ok that it differs even here, or should the C FE promote bitfields to the
underlying type for the last argument of __builtin_{add,sub,mul}_overflow_p,
or should the C++ FE special case __builtin_{add,sub,mul}_overflow_p and
not decay_conversion on the last argument to these, something else?
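
For context, a self-contained example of the type-generic predicate under
discussion; the value of the third argument is ignored, only its type
determines the overflow check (hence the patch rejecting enumerated and
boolean types there):

#include <stdio.h>

int
main (void)
{
  int a = __INT_MAX__, b = 1;
  /* Does a + b overflow "int"?  The (int) 0 argument only supplies the
     result type.  */
  if (__builtin_add_overflow_p (a, b, (int) 0))
    printf ("a + b would overflow int\n");
  return 0;
}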

2016-06-15  Jakub Jelinek  

* internal-fn.c (expand_arith_set_overflow): New function.
(expand_addsub_overflow, expand_neg_overflow, expand_mul_overflow):
Use it.
(expand_arith_overflow_result_store): Likewise.  Handle precision
smaller than mode precision.
* tree-vrp.c (extract_range_basic): For imag part, handle
properly signed 1-bit precision result.
* doc/extend.texi (__builtin_add_overflow): Document that last
argument can't be pointer to enumerated or boolean type.
(__builtin_add_overflow_p): Document that last argument can't
have enumerated or boolean type.

* c-common.c (check_builtin_function_arguments): Require last
argument of BUILT_IN_*_OVERFLOW_P to have INTEGER_TYPE type.
Adjust wording of diagnostics for BUILT_IN_*_OVERFLOW
if the last argument is pointer to enumerated or boolean type.

* c-c++-common/builtin-arith-overflow-1.c (generic_wrong_type, f3,
f4): Adjust expected diagnostics.
* c-c++-common/torture/builtin-arith-overflow.h (TP): New macro.
(T): If OVFP is defined, redefine to TP.
* c-c++-common/torture/builtin-arith-overflow-12.c: Adjust comment.
* c-c++-common/torture/builtin-arith-overflow-p-1.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-2.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-3.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-4.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-5.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-6.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-7.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-8.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-9.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-10.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-11.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-12.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-13.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-14.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-15.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-16.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-17.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-18.c: New test.
* c-c++-common/torture/builtin-arith-overflow-p-19.c: New test.
* g++.dg/ext/builtin-arith-overflow-1.C: Pass 0 instead of C
as last argument to __builtin_add_overflow_p.
I think this is OK -- however, please don't install until you've 
received an ACK from Jason on the follow-up patch which prevents 
promotion of bitfields in 

Re: [RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost

2016-06-21 Thread Bernhard Reutner-Fischer
On June 21, 2016 5:50:26 PM GMT+02:00, James Greenhalgh 
 wrote:
>
>On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
>> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
>>  wrote:
>> >
>> > Hi,
>> >
>> > This patch introduces a new target hook, to be used like
>BRANCH_COST but
>> > with a guaranteed unit of measurement. We want this to break away
>from
>> > the current ambiguous uses of BRANCH_COST.
>> >
>> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One
>against
>> > instruction counts - where it is used as the limit on the number of
>new
>> > instructions we are permitted to generate. The other (after
>multiplying
>> > by COSTS_N_INSNS (1)) directly against RTX costs.
>> >
>> > Of these, a comparison against RTX costs is the more easily
>understood
>> > metric across the compiler, and the one I've pulled out to the new
>hook.
>> > To keep things consistent for targets which don't migrate, this new
>hook
>> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
>> >
>> > OK?
>>
>> How does the caller compute "predictable"?  There are some archs
>where
>> an information on whether this is a forward or backward jump is more
>> useful I guess.  Also at least for !speed_p the distance of the
>branch is
>> important given not all targets support arbitrary branch offsets.
>
>Just through a call to predictable_edge_p. It isn't perfect. My worry
>with adding more details of the branch is that you end up with a
>nonsense
>target implementation that tries way too hard to be clever. But, I
>don't
>mind passing the edge through to the target hook, that way a target has
>it if they want it. In this patch revision, I pass the edge through.
>
>> I remember that at the last Cauldron we discussed to change things to
>> compare costs of sequences of instructions rather than giving targets
>no
>> context with just asking for single (sub-)insn rtx costs.
>
>I've made better use of seq_cost in this respin. Bernd was right,
>constructing dummy RTX just for costs, then discarding it, then
>constructing the actual RTX for matching doesn't make sense as a
>pipeline.
>Better just to construct the real sequence and use the cost of that.
>
>In this patch revision, I started by removing the idea that this costs
>a branch at all. It doesn't; the use of this hook is really a target
>trying to limit if-convert to not end up pulling too much on to the
>unconditional path. It seems better to expose that limit directly by
>explicitly asking for the maximum cost of an unconditional sequence we
>would create, and comparing against seq_cost of the new RTL. This saves
>a target trying to figure out what is meant by a cost of a branch.
>
>Having done that, I think I can see a clearer path to getting the
>default hook implementation in shape. I've introduced two new params,
>which give maximum costs for the generated sequence (one for a
>"predictable"
>branch, one for "unpredictable") in the speed_p cases. I'm not
>expecting it
>to be useful to give the user control in the case we are compiling for
>size - whether this is a size win or not is independent of whether the
>branch is predictable.
>
>For the default implementation, if the parameters are not set, I just
>multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
>COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still
>short
>of ideas on how best to form the default implementation.

How bad is it in e.g. CSiBE?

>we're
>still potentially going to introduce performance regressions for
>targets
>that don't provide an implementation of the new hook, or a default
>value
>for the new parameters. It does mean we can keep the testsuite clean by
>setting parameter values suitably high for all targets that have
>conditional move instructions.
>
>The new default causes some changes in generated conditional move
>sequences
>for x86_64. Whether these changes are for the better or not I can't
>say.
>
>This first patch introduces the two new parameters, and uses them in
>the default implementation of the target hook.

s/precitable/predictable/ ?

Present tense in documentation (s/will try to/tries to/).
s/should return/returns/

TARGET_MAX_NOCE_IFCVT_SEQ_COST (bool @var{speed_p}, edge @var{e}) talks about 
predictable_p but doesn't document e.


+DEFPARAM (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST,
+	  "max-rtl-if-conversion-unpredictable-cost",
+	  "Maximum permissible cost for the sequence that would be "
+	  "generated by the RTL if-conversion pass for a branch which "
+	  "is considered predictable.",
+	  40, 0, 200)

unpredictable.

Present tense also in target.def.

+@code{predictable_p} is true

no predictable_p anymore, but e is still missing in the docs.

/Then multiply through by/s/through by/with/

thanks,
>
>Bootstrapped on x86_64 and aarch64 with no issues.
>
>OK?
>
>Thanks,
>James
>
>---
>2016-06-21  James Greenhalgh  
>
>   * target.def 

Re: [PATCH] Add a new target hook to compute the frame layout

2016-06-21 Thread Jeff Law

On 06/16/2016 08:47 AM, Bernd Edlinger wrote:

Hi!


By the design of the target hook INITIAL_ELIMINATION_OFFSET
it is necessary to call this function several times with
different register combinations.
Most targets use a cached data structure that describes the
exact frame layout of the current function.

It is safe to skip the computation when reload_completed = true,
and most targets do that already.

However while reload is doing its work, it is not clear when to
do the computation and when not.  This results in unnecessary
work.  Computing the frame layout can be a simple function or an
arbitrarily complex one, that walks all instructions of the current
function for instance, which is more or less the common case.


This patch adds a new optional target hook that can be used
by the target to factor the INITIAL_ELIMINATION_OFFSET-hook
into a O(n) computation part, and a O(1) result function.

The patch implements a compute_frame_layout target hook just
for ARM in the moment, to show the principle.
Other targets may also implement that hook, if it seems appropriate.


Boot-strapped and reg-tested on arm-linux-gnueabihf.
OK for trunk?


Thanks
Bernd.


changelog-frame-layout.txt


2016-06-16  Bernd Edlinger  

* target.def (compute_frame_layout): New optional target hook.
* doc/tm.texi.in (TARGET_COMPUTE_FRAME_LAYOUT): Add hook.
* doc/tm.texi (TARGET_COMPUTE_FRAME_LAYOUT): Add documentation.
* lra-eliminations.c (update_reg_eliminate): Call compute_frame_layout
target hook.
* reload1.c (verify_initial_elim_offsets): Likewise.
* config/arm/arm.c (TARGET_COMPUTE_FRAME_LAYOUT): Define.
(use_simple_return_p): Call arm_compute_frame_layout if needed.
(arm_get_frame_offsets): Split up into this ...
(arm_compute_frame_layout): ... and this function.
The ARM maintainers would need to chime in on the ARM specific changes 
though.





Index: gcc/target.def
===
--- gcc/target.def  (Revision 233176)
+++ gcc/target.def  (Arbeitskopie)
@@ -5245,8 +5245,19 @@ five otherwise.  This is best for most machines.",
  unsigned int, (void),
  default_case_values_threshold)

-/* Retutn true if a function must have and use a frame pointer.  */

s/Retutn/Return


+/* Optional callback to advise the target to compute the frame layout.  */
 DEFHOOK
+(compute_frame_layout,
+ "This target hook is called immediately before reload wants to call\n\
+@code{INITIAL_ELIMINATION_OFFSET} and allows the target to cache the frame\n\
+layout instead of re-computing it on every invocation.  This is particularly\n\
+useful for targets that have an O(n) frame layout function.  Implementing\n\
+this callback is optional.",
+ void, (void),
+ hook_void_void)
So the docs say "immediately before", but that's not actually reality in 
lra-eliminations.  I think you can just say "This target hook is called 
before reload or lra-eliminations calls 
@code{INITIAL_ELIMINATION_OFFSET} and allows ..."



How does this macro interact with INITIAL_FRAME_POINTER_OFFSET?

I'm OK with this conceptually.  I think you need a minor doc update and 
OK from the ARM maintainers before it can be installed though.


jeff
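
As an illustration of the split the hook enables (a sketch with
hypothetical names, not the ARM implementation): the O(n) walk runs
once in the new hook, and INITIAL_ELIMINATION_OFFSET reduces to an
O(1) lookup into the cached layout.

struct frame_layout_cache
{
  int fp_to_sp;		/* frame pointer -> stack pointer offset */
  int arg_to_sp;	/* arg pointer -> stack pointer offset */
  int valid;
};

static struct frame_layout_cache cache;

void
example_compute_frame_layout (void)	/* TARGET_COMPUTE_FRAME_LAYOUT */
{
  if (cache.valid)
    return;
  /* The O(n) scan of the function's insns/registers would go here.  */
  cache.fp_to_sp = 16;
  cache.arg_to_sp = 32;
  cache.valid = 1;
}

int
example_initial_elimination_offset (int from_fp)	/* now O(1) */
{
  return from_fp ? cache.fp_to_sp : cache.arg_to_sp;
}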


Re: Implement C _FloatN, _FloatNx types [version 2]

2016-06-21 Thread Joseph Myers
On Tue, 21 Jun 2016, Michael Meissner wrote:

> From one perspective, it would have been cleaner if the PowerPC
> had separate internal types for IBM extended double and IEEE 128-bit floating
> point.  But the way the compiler has been implemented, TFmode is whichever
> type is the default.  Typically, in the Linux environment, this is IBM
> extended double, but we are hoping in the future to change the default to make
> long double IEEE 128-bit.  So, I think we need to do something like:
> 
> static machine_mode
> rs6000_floatn_mode (int n, bool extended)
> {
>   if (extended)
> {
>   switch (n)
>   {
>   case 32:
> return DFmode;
> 
>   case 64:
> if (TARGET_FLOAT128)
>   return (TARGET_IEEEQUAD) ? TFmode : KFmode;

That's essentially what I had in the first patch version.  I.e., you and 
Bill are making contradictory statements about whether always to use 
KFmode (in which case *q constants should be fixed to match) or whether to 
prefer TFmode when it's binary128 (in which case no fixes beyond my patch 
(with the first version of this function) are needed as my patch will make 
__float128 into an alias for _Float128, which will prefer TFmode when it's 
binary128).

> As an aside, one of the issues I'm currently grappling with is how to enable
> the __float128 machinery without enabling the __float128 keyword (PR 70589).
> The reason is we need to create the various built-in functions for IEEE
> 128-bit floating point, which means creating the type and keyword support.

Perhaps it would be best not to allow __float128 to be disabled?  I.e. 
have it always available for 64-bit and never available for 32-bit (or 
something like that).  Or at least don't allow it to change during the 
source file.  Because we want these functions to end up in generic code 
rather than target-specific.  And at least the C-family built-ins 
machinery handles not creating a built-in function if its type would be 
error_mark_node, and using error_mark_node for types derived from 
error_mark_node.  That is, if in the generic code we have

DEF_PRIMITIVE_TYPE (BT_FLOAT128, float128_type_node ? float128_type_node : 
error_mark_node)

in builtin-types.def, then it should be possible to list all the relevant 
functions in builtins.def and have them only created when the type is 
enabled.  But you'll run into problems if there are conditions for 
function availability changing part way through the source file.
(There's existing machinery for targets to change built-in function 
availability part way through the source file, but it seems likely to be 
nasty to apply that to target-independent functions.)
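
For instance, a hypothetical builtins.def entry along these lines (the
name and the type code are illustrative, not from the patch):

/* BT_FN_FLOAT128_FLOAT128 would be declared via DEF_FUNCTION_TYPE_1 in
   builtin-types.def; the built-in is then skipped automatically when
   BT_FLOAT128 resolves to error_mark_node.  */
DEF_GCC_BUILTIN (BUILT_IN_FABSF128, "fabsf128",
		 BT_FN_FLOAT128_FLOAT128, ATTR_CONST_NOTHROW_LEAF_LIST)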

> Another thing that I'm grappling with is how to identify when the floating
> point emulation functions have been built.  Right now, they are only built for
> 64-bit Linux systems, and not for AIX, BSD, or the various 32-bit
> embedded targets.

See the libgcc_floating_mode_supported_p hook.  (It doesn't do much, and 
you might need to implement it for your target, but it's the right idea.)

> However, I suspect that in the future, we may have users that want to do
> arithmetic in this format (particularly vectors).  This comes from people
> wanting to interface with attached GPUs that oftentimes support HFmode, and
> with machine learning systems that do not need the precision.  So, we should
> at least think about what is needed to enable HFmode as a real type.

Basically, you need to define an ABI, and you need changes to excess 
precision support which is currently geared to the x87 case.

> > GCC has some prior support for nonstandard floating-point types in the
> > form of __float80 and __float128.  Where these were previously types
> > distinct from long double, they are made by this patch into aliases
> > for _Float64x / _Float128 if those types have the required properties.
> 
> And of course other machine dependent non-standard floating point types 
> such as IBM extended double.

Which cannot be _FloatN / _FloatNx types; those names are strictly for 
types with IEEE semantics (and representation, in the _FloatN case).  
float, double and long double are much more flexible about possible 
formats than _FloatN and _FloatNx.

> In addition, there is the question of what to do on the embedded machines that
> might use IEEE format for floating point but do not support all of the
> features in IEEE 754R, such as NaNs, infinities, denormal numbers, negative 0,
> etc.

Those also cannot be _FloatN / _FloatNx types.  Note how my real.c changes 
mark spu_single_format as not suitable for such a type.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PATCH, PR middle-end/71488] Fix vectorization of comparison of booleans

2016-06-21 Thread Jeff Law

On 06/16/2016 05:06 AM, Ilya Enkovich wrote:

Hi,

This patch fixes incorrect comparison vectorization for booleans.
The problem is that regular comparison which works for scalars
doesn't work for vectors due to different binary representation.
Also this never works for scalar masks.

This patch replaces such comparisons with bitwise operations
which work correctly for both vector and scalar masks.
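
The key point, sketched as scalar code (an illustration, not the
patch): vector and mask "booleans" represent true as all-ones, so
equality has to be expressed bitwise rather than as an ordinary
compare.

#include <stdio.h>

int
main (void)
{
  signed char t = -1, f = 0;	/* mask true / mask false */

  /* Mask equality as bitwise XNOR, inequality as XOR; both stay
     within the mask representation.  */
  signed char eq = ~(t ^ f);	/* 0 here: the masks differ */
  signed char ne = t ^ f;	/* -1 here: the masks differ */

  printf ("%d %d\n", eq, ne);
  return 0;
}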

Bootstrapped and regtested on x86_64-unknown-linux-gnu.  Is it
OK for trunk?  What should be done for gcc-6-branch?  Port this
patch or just restrict vectorization for comparison of booleans?

Thanks,
Ilya
--
gcc/

2016-06-15  Ilya Enkovich  

PR middle-end/71488
* tree-vect-patterns.c (vect_recog_mask_conversion_pattern): Support
comparison of boolean vectors.
* tree-vect-stmts.c (vectorizable_comparison): Vectorize comparison
of boolean vectors using bitwise operations.

gcc/testsuite/

2016-06-15  Ilya Enkovich  

PR middle-end/71488
* g++.dg/pr71488.C: New test.
* gcc.dg/vect/vect-bool-cmp.c: New test.
OK.  Given this is a code generation bug, I'll support porting this 
patch to the gcc-6 branch.  Is there any reason to think that porting 
would be more risky than usual?  It looks pretty simple to me; am I 
missing some subtle dependency?


jeff



Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-21 Thread Joseph Myers
On Tue, 21 Jun 2016, Michael Meissner wrote:

> > I'm now working on support for TS 18661-3 _FloatN / _FloatNx type names 
> > (keywords), constant suffixes and  addiitions.  That will 
> > address, for C, the need to use modes for complex float128 (bug 32187) by 
> > allowing the standard _Complex _Float128 to be used.  The issue would 
> > still apply for C++ (I'm not including any C++ support for these type 
> > names / constant suffixes in my patch), and for complex ibm128.
> 
> Great!
> 
> Of course we will need to have some solution for C++.

Since the standard C++ way of using complex numbers is std::complex, it's 
not clear to me that the C-compatible _Complex _Float128 needs to be 
particularly conveniently available in C++.

Rather, where the C++ standard says "The effect of instantiating the 
template complex for any type other than float, double, or long double is 
unspecified. The specializations complex<float>, complex<double>, and 
complex<long double> are literal types (3.9).", presumably one should aim 
to make complex<__float128> work properly if it doesn't already.  And that 
might internally use _Complex _Float128 in some way to call libm functions 
when available, much as libstdc++ currently has _GLIBCXX_USE_C99_COMPLEX.  
And then I suppose you'd want to make appropriate literal suffixes work as 
well.  That's part of rather more generally making libstdc++ work with a 
new floating-point type; 
 has an overview of 
the sorts of things involved and links to other discussions.
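
A sketch of that direction, assuming __float128 support (there is no
such libstdc++ specialization yet; the generic template is doing the
work here):

#include <complex>

int
main ()
{
  std::complex<__float128> z (1, 2);
  z *= z;	// ideally backed by _Complex _Float128 libm routines
  return (int) (__float128) z.real ();
}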

Standard C++ support for decimal floating-point types, TR 24733, took the 
form of classes such as std::decimal::decimal64 rather than the C-style 
_Decimal64.  I don't know what form C++ bindings to IEEE interchange and 
extended types might take, but something similar would seem natural.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Implement C _FloatN, _FloatNx types [version 2]

2016-06-21 Thread Michael Meissner
On Tue, Jun 21, 2016 at 05:41:36PM +, Joseph Myers wrote:
> [Changes in version 2 of the patch:
> 
> * PowerPC always uses KFmode for _Float128 and _Float64x when those
>   types are supported, not TFmode.

From one perspective, it would have been cleaner if the PowerPC
had separate internal types for IBM extended double and IEEE 128-bit floating
point.  But the way the compiler has been implemented, TFmode is whichever
type is the default.  Typically, in the Linux environment, this is IBM
extended double, but we are hoping in the future to change the default to make
long double IEEE 128-bit.  So, I think we need to do something like:

static machine_mode
rs6000_floatn_mode (int n, bool extended)
{
  if (extended)
{
  switch (n)
{
case 32:
  return DFmode;

case 64:
  if (TARGET_FLOAT128)
return (TARGET_IEEEQUAD) ? TFmode : KFmode;
  else
return VOIDmode;

case 128:
  return VOIDmode;

default:
  /* Those are the only valid _FloatNx types.  */
  gcc_unreachable ();
}
}
  else
{
  switch (n)
{
case 32:
  return SFmode;

case 64:
  return DFmode;

case 128:
  if (TARGET_FLOAT128)
return (TARGET_IEEEQUAD) ? TFmode : KFmode;
  else
return VOIDmode;

default:
  return VOIDmode;
}
}
}

As an aside, one of the issues I'm currently grappling with is how to enable
the __float128 machinery without enabling the __float128 keyword (PR 70589).
The reason is we need to create the various built-in functions for IEEE 128-bit
floating point, which means creating the type and keyword support.

As Richard Biener and I discussed, the way I created the types using
FRACTIONAL_FLOAT_MODE for IFmode/KFmode isn't really ideal.
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01114.html

It would have made things simpler if there were an MD hook to control what
automatically widens to what.  I tried to implement it in the early days, but I
never came up with code that did what I wanted.  I use rs6000_invalid_binary_op
to do this to some extent.

Another thing that I'm grappling with is how to identify when the floating
point emulation functions have been built.  Right now, they are only built for
64-bit Linux systems, and not for AIX, BSD, or the various 32-bit
embedded targets.

> No GCC port supports a floating-point format suitable for _Float128x.
> Although there is HFmode support for ARM and AArch64, use of that for
> _Float16 is not enabled.  Supporting _Float16 would require additional
> work on the excess precision aspects of TS 18661-3: there are new
> values of FLT_EVAL_METHOD, which are not currently supported in GCC,
> and FLT_EVAL_METHOD == 0 now means that operations and constants on
> types narrower than float are evaluated to the range and precision of
> float.  Implementing that, so that _Float16 gets evaluated with excess
> range and precision, would involve changes to the excess precision
> infrastructure so that the _Float16 case is enabled by default, unlike
> the x87 case which is only enabled for -fexcess-precision=standard.
> Other differences between _Float16 and __fp16 would also need to be
> disentangled.

ISA 3.0 (Power9) does have instructions to convert to/from 16-bit floating
point to 32-bit floating point.  Right now, it is just conversion to/from
storage format to a more normal format (similar to _Decimal32 where there are
no instructions to operate on 32-bit decimal data).  At the moment, we don't
have support for these instructions, but we will likely do it in the future.

However, I suspect that in the future, we may have users that want to do
arithmetic in this format (particularly vectors).  This comes from people
wanting to interface with attached GPUs that oftentimes support HFmode, and
with machine learning systems that do not need the precision.  So, we should
at least think about what is needed to enable HFmode as a real type.

> GCC has some prior support for nonstandard floating-point types in the
> form of __float80 and __float128.  Where these were previously types
> distinct from long double, they are made by this patch into aliases
> for _Float64x / _Float128 if those types have the required properties.

And of course other machine dependent non-standard floating point types such as
IBM extended double.

In addition, there is the question of what to do on the embedded machines that
might use IEEE format for floating point but do not support all of the
features in IEEE 754R, such as NaNs, infinities, denormal numbers, negative 0,
etc.

As I said in my previous message, whatever we do in this space needs to be
backwards compatible with existing usage in GCC 6.x.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



[PATCH, rs6000] Prefer vspltisw/h over xxspltib+instruction when available

2016-06-21 Thread Bill Schmidt
Hi,

I discovered recently that, with -mcpu=power9, an attempt to generate a 
vspltish instruction resulted instead in an xxspltib followed by a vupkhsb.  
This is semantically correct but the extra instruction is not optimal.  I found 
that there was some logic in xxspltib_constant_p to do special casing for 
const_vector with small constants, but not for vec_duplicate with small 
constants.  This patch duplicates that logic so we can generate the single 
instruction when possible.
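
For illustration, the two sequences contrasted (register and operand
choices below are arbitrary, not taken from generated code):

	before:	xxspltib 34,5	; splat immediate byte 5 into vs34 (= vr2)
		vupkhsb  2,2	; sign-extend bytes to halfwords
	after:	vspltish 2,5	; single splat-immediate-halfword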

When I did this, I ran into a problem with an existing test case.  We end up 
matching the *vsx_splat_v4si_internal pattern instead of falling back to the 
altivec_vspltisw pattern.  The constraints don't match for constant input.  To 
avoid this, I added a pattern ahead of this one that will match for VMX output 
registers and produce the vspltisw as desired.  This corrected the failing test 
and produces the expected code.

I've added a test case to demonstrate the code works properly now in the usual 
case.

Bootstrapped and tested on powerpc64le-unknown-linux-gnu.  OK for trunk, and 
for 6.2 after suitable burn-in?

Thanks!

Bill


[gcc]

2016-06-21  Bill Schmidt  

* config/rs6000/rs6000.c (xxspltib_constant_p): Prefer vspltisw/h
for vec_duplicate when this is cheaper.
* config/rs6000/vsx.md (*vsx_splat_v4si_altivec): New define_insn.

[gcc/testsuite]

2016-06-21  Bill Schmidt  

* gcc.target/powerpc/splat-p9-1.c: New test.


Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  (revision 237619)
+++ gcc/config/rs6000/rs6000.c  (working copy)
@@ -6329,6 +6329,13 @@ xxspltib_constant_p (rtx op,
       value = INTVAL (element);
       if (!IN_RANGE (value, -128, 127))
	return false;
+
+      /* See if we could generate vspltisw/vspltish directly instead of
+	 xxspltib + sign extend.  Special case 0/-1 to allow getting
+	 any VSX register instead of an Altivec register.  */
+      if (!IN_RANGE (value, -1, 0) && EASY_VECTOR_15 (value)
+	  && (mode == V4SImode || mode == V8HImode))
+	return false;
     }
 
   /* Handle (const_vector [...]).  */
Index: gcc/config/rs6000/vsx.md
===
--- gcc/config/rs6000/vsx.md(revision 237619)
+++ gcc/config/rs6000/vsx.md(working copy)
@@ -2400,6 +2400,17 @@
 operands[1] = force_reg (mode, operands[1]);
 })
 
+;; The pattern following this one hides altivec_vspltisw, which we
+;; prefer to match when possible, so duplicate that here for
+;; TARGET_P9_VECTOR.
+(define_insn "*vsx_splat_v4si_altivec"
+  [(set (match_operand:V4SI 0 "altivec_register_operand" "=v")
+	(vec_duplicate:V4SI
+	 (match_operand:QI 1 "s5bit_cint_operand" "i")))]
+  "TARGET_P9_VECTOR"
+  "vspltisw %0,%1"
+  [(set_attr "type" "vecperm")])
+
 (define_insn "*vsx_splat_v4si_internal"
   [(set (match_operand:V4SI 0 "vsx_register_operand" "=wa,wa")
(vec_duplicate:V4SI
Index: gcc/testsuite/gcc.target/powerpc/splat-p9-1.c
===
--- gcc/testsuite/gcc.target/powerpc/splat-p9-1.c   (revision 0)
+++ gcc/testsuite/gcc.target/powerpc/splat-p9-1.c   (working copy)
@@ -0,0 +1,16 @@
+/* { dg-do compile } */
+/* { dg-require-effective-target powerpc_p9vector_ok } */
+/* { dg-options "-maltivec -mcpu=power9" } */
+/* { dg-skip-if "do not override -mcpu" { powerpc*-*-* } { "-mcpu=*" } { "-mcpu=power9" } } */
+/* { dg-final { scan-assembler "vspltish" } } */
+/* { dg-final { scan-assembler-not "xxspltib" } } */
+
+/* Make sure we don't use an inefficient sequence for small integer splat.  */
+
+#include <altivec.h>
+
+vector short
+foo ()
+{
+  return vec_splat_s16 (5);
+}





Re: [PATCH 2/3] Add support for arm*-*-phoenix* targets.

2016-06-21 Thread Jeff Law

On 06/15/2016 08:22 AM, Kuba Sejdak wrote:

Is it ok for trunk? If possible, please merge it also to the GCC-6 and 
GCC-5 branches.

2016-06-15  Jakub Sejdak  

   * config.gcc: Add support for arm*-*-phoenix* targets.
   * config/arm/t-phoenix: New.
   * config/phoenix.h: New.

---
 gcc/ChangeLog|  6 ++
 gcc/config.gcc   | 11 +++
 gcc/config/arm/t-phoenix | 29 +
 gcc/config/phoenix.h | 33 +
 4 files changed, 79 insertions(+)
 create mode 100644 gcc/config/arm/t-phoenix
 create mode 100644 gcc/config/phoenix.h




+arm*-*-phoenix*)
+   tm_file="dbxelf.h elfos.h arm/unknown-elf.h arm/elf.h arm/bpabi.h"
+   tm_file="${tm_file} newlib-stdint.h phoenix.h"
+   tm_file="${tm_file} arm/aout.h arm/arm.h"
+   tmake_file="${tmake_file} arm/t-arm arm/t-bpabi arm/t-phoenix"
Do you really need dbxelf.h?  We're trying to get away from stabs, so 
unless there's a strong need, avoid dbxelf.h :-)


OK for the trunk with dbxelf.h removed.

jeff


Re: [PATCH 3/3] Add support for arm*-*-phoenix* targets in libgcc.

2016-06-21 Thread Jeff Law

On 06/15/2016 08:22 AM, Kuba Sejdak wrote:

Is it ok for trunk? If possible, please merge it also to the GCC-6 and 
GCC-5 branches.

2016-06-15  Jakub Sejdak  

   * config.host: Add support for arm*-*-phoenix* targets.

OK for the trunk.

jeff



Re: [PATCH 1/3] Disable libgcj and libgloss for Phoenix-RTOS targets.

2016-06-21 Thread Jeff Law

On 06/15/2016 08:22 AM, Kuba Sejdak wrote:

This patch disables libgcj and libgloss in the main configure.ac for the new 
OS port, Phoenix-RTOS.
Those libs are unnecessary for building GCC or newlib for arm-phoenix.

Is it ok for trunk? If possible, please merge it also to the GCC-6 and 
GCC-5 branches.

2016-06-15  Jakub Sejdak  

* configure.ac: Disable libgcj and libgloss for Phoenix-RTOS targets.
* configure: Regenerated.
These are fine for the trunk.  Please go ahead and commit once your SVN 
write access is set up.


We generally don't do feature enablement in release branches.  Jakub, 
Joseph or Richi would have to grant an exception for this to be accepted 
on the release branches.


jeff



Re: [PATCH], Simplify setup of complex types

2016-06-21 Thread Jeff Law

On 06/21/2016 07:48 AM, Michael Meissner wrote:

When I submitted the back port to allow complex __float128 to be created on the
PowerPC to the GCC 6.2 branch, Richard Biener suggested a simpler way to set
the complex type:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01114.html

This patch implements this change for the trunk.  I have a companion patch for
6.2 once this goes into the trunk.

I bootstrapped the compiler and did a make check with no regressions on a big
endian Power 7 system and a little endian Power 8 system.  Is it ok to go into
the trunk?

[gcc]
2016-06-21  Michael Meissner  

* stor-layout.c (layout_type): Move setting complex MODE to
layout_type, instead of setting it ahead of time by the caller.
* tree.c (build_complex_type): Likewise.

[gcc/fortran]
2016-06-21  Michael Meissner  

* trans-types.c (gfc_build_complex_type): Move setting complex
MODE to layout_type, instead of setting it ahead of time by the
caller.

OK.

Jeff

ps.  Thanks for the pointer back to the prior discussion with Richi -- 
that makes it a lot easier to pick up state on this while Richi is on PTO.
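
The direction of the simplification, as a condensed excerpt-style
sketch (not the committed hunk):

    /* In layout_type, the complex mode is now derived from the
       component type, so callers no longer set it beforehand.  */
    case COMPLEX_TYPE:
      TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
      SET_TYPE_MODE (type,
		     GET_MODE_COMPLEX_MODE (TYPE_MODE (TREE_TYPE (type))));
      break;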




Re: [PATCH] x86-64: Load external function address via GOT slot

2016-06-21 Thread H.J. Lu
On Tue, Jun 21, 2016 at 11:22 AM, Uros Bizjak  wrote:
> On Tue, Jun 21, 2016 at 2:40 PM, H.J. Lu  wrote:
>> On Mon, Jun 20, 2016 at 12:46 PM, Richard Sandiford
>>  wrote:
>>> Uros Bizjak  writes:
 On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu  wrote:
> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak  wrote:
>> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu  wrote:
>>> Hi,
>>>
>>> This patch implements the alternate code sequence recommended in
>>>
>>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>>
>>> to load external function address via GOT slot with
>>>
>>> movq func@GOTPCREL(%rip), %rax
>>>
>>> so that the linker won't create a PLT entry for an extern function
>>> address.
>>>
>>> Tested on x86-64.  OK for trunk?
>>
>>> +  else if (ix86_force_load_from_GOT_p (op1))
>>> +{
>>> +  /* Load the external function address via the GOT slot to
>>> +avoid PLT.  */
>>> +  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>>> +   (TARGET_64BIT
>>> +? UNSPEC_GOTPCREL
>>> +: UNSPEC_GOT));
>>> +  op1 = gen_rtx_CONST (Pmode, op1);
>>> +  op1 = gen_const_mem (Pmode, op1);
>>> +  /* This symbol must be referenced via a load from the Global
>>> +Offset Table.  */
>>> +  set_mem_alias_set (op1, ix86_GOT_alias_set ());
>>> +  op1 = convert_to_mode (mode, op1, 1);
>>> +  op1 = force_reg (mode, op1);
>>> +  emit_insn (gen_rtx_SET (op0, op1));
>>> +  /* Generate a CLOBBER so that there will be no REG_EQUAL note
>>> +on the last insn to prevent cse and fwprop from replacing
>>> +a GOT load with a constant.  */
>>> +  rtx tmp = gen_reg_rtx (Pmode);
>>> +  emit_clobber (tmp);
>>> +  return;
>>
>> Jeff, is this the recommended way to prevent CSE, as far as RTL
>> infrastructure is concerned? I didn't find any example of this
>> approach with other targets.
>>
>
> FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
> as well as other targets:
>
> frv/frv.c:  emit_clobber (op0);
> frv/frv.c:  emit_clobber (op1);
> im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
> s390/s390.c:  emit_clobber (addr);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg1);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg0);
> s390/s390.md:  emit_clobber (reg1);

 These usages mark the whole register as being "clobbered"
 (=undefined), before only a part of register is written, e.g.:

   emit_clobber (int_xmm);
   emit_move_insn (gen_lowpart (DImode, int_xmm), input);

 They aren't used to prevent unwanted CSE.
>>>
>>> Since it's being called in the move expander, I thought the normal
>>> way of preventing the constant being rematerialised would be to reject
>>> it in the move define_insn predicates.
>>>
>>> FWIW, I agree that using a clobber for this is going to be fragile.
>>>
>>
>> Here is the patch without clobber.  Tested on x86-64.  OK for
>> trunk?
>
> No, your patch has multiple problems:
>
> 1. It won't work for e.g. +1, since you have to legitimize the
> symbol in two places of ix86_expand_move. Also, why use TARGET_64BIT
> in
>
> +  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
> +(TARGET_64BIT
> + ? UNSPEC_GOTPCREL
> + : UNSPEC_GOT));
>
> when ix86_force_load_from_GOT_p rejects non-64bit targets?
>
> 2. New check should be added to ix86_legitimate_constant_p, not to
> predicates of move insn patterns. Unfortunately, we still have to
> change x86_64_immediate_operand in two places.
>
> I have attached my version of the patch. It handles all your
> testcases, plus +1 case. Bootstrap is still running.
>
> Does the patch work for you?

It works.

Thanks.

-- 
H.J.
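
For reference, the case being optimized, as a minimal sketch:

/* Taking an extern function's address.  With the GOT-slot sequence the
   compiler emits
       movq  func@GOTPCREL(%rip), %rax
   so the linker need not create a PLT entry just for the address.  */
extern void func (void);

void *
func_addr (void)
{
  return (void *) &func;
}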


Re: [PATCH] Giant concepts patch

2016-06-21 Thread Jason Merrill
I've pushed my work-in-progress integration branch to jason/concepts-rewrite.

Jason


On Mon, Jun 20, 2016 at 4:28 PM, Jason Merrill  wrote:
> On Fri, Mar 25, 2016 at 1:33 AM, Andrew Sutton
>  wrote:
>> I'll just leave this here...
>>
>> This patch significantly improves performance with concepts (i.e.,
>> makes it actually usable for real systems) and improves the
>> specificity of diagnostics when constraints fail.
>>
>> Unfortunately, this isn't easily submittable in small pieces because
>> it completely replaces most of the core processing routines for
>> constraints, including (essentially) a complete rewrite of logic.cc
>> and the diagnostics in constraint.cc. More perfective work could be
>> done related to diagnostics, but this needs to be applied first.
>>
>> As part of the patch, I added timevars for constraint satisfaction and
>> subsumption. In template-heavy code (~80KLOC), I'm seeing satisfaction
>> account for ~6% of compilation time and subsumption ~2%. Template
>> instantiation remains ~35%, but I think there's still room for
>> improvement in concepts. It just requires experimentation.
>>
>> Tests involving significant number of disjunctions have yet to
>> register > 1% of compilation time spent in subsumption, but I'm still
>> testing.
>
> Thanks, I've been working on integrating this patch, hoping to have it
> in for 6.2.  Have you done more work on it since you sent this out?
>
> A few issues:
>
> I've run into some trouble building cmcstl2: declarator requirements
> on a function can lead to constraints that tsubst_constraint doesn't
> handle.  What was your theory of only handling a few _CONSTR codes
> there?  This is blocking me from checking in the patch.
>
> Adding gcc_unreachable to the ARGUMENT_PACK_SELECT handling in concept
> arg hash/compare doesn't seem to break anything in either the GCC
> testsuite or your stl2.  Do you have a testcase where that code is
> still needed?
>
>> Also, it might be worth noting that partial specialization of variable
>> templates is currently broken. We don't seem to be emitting template
>> arguments as part of the mangled name, leading to lots and lots of
>> late redefinition errors.
>
> This should be fixed now.
>
> Jason


Re: [PATCH] Backport PowerPC complex __float128 compiler support to GCC 6.x

2016-06-21 Thread Michael Meissner
On Thu, Jun 16, 2016 at 10:06:48PM +, Joseph Myers wrote:
> On Wed, 15 Jun 2016, Michael Meissner wrote:
> 
> > Note, I do feel the front ends should be modified to allow __complex 
> > __float128
> > directly rather than having to use an attribute to force the complex type 
> > (and
> > to use mode(TF) on x86 or mode(KF) on PowerPC).  It would clean up both x86 
> > and
> > PowerPC.  However, those patches aren't written yet.
> 
> I'm now working on support for TS 18661-3 _FloatN / _FloatNx type names 
> (keywords), constant suffixes and <float.h> additions.  That will 
> address, for C, the need to use modes for complex float128 (bug 32187) by 
> allowing the standard _Complex _Float128 to be used.  The issue would 
> still apply for C++ (I'm not including any C++ support for these type 
> names / constant suffixes in my patch), and for complex ibm128.

Great!

Of course we will need to have some solution for C++.

And we will have to live with the current stuff in GCC 6.x.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797



Re: [PATCH] x86-64: Load external function address via GOT slot

2016-06-21 Thread Uros Bizjak
On Tue, Jun 21, 2016 at 2:40 PM, H.J. Lu  wrote:
> On Mon, Jun 20, 2016 at 12:46 PM, Richard Sandiford
>  wrote:
>> Uros Bizjak  writes:
>>> On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu  wrote:
 On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak  wrote:
> On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu  wrote:
>> Hi,
>>
>> This patch implements the alternate code sequence recommended in
>>
>> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>>
>> to load external function address via GOT slot with
>>
>> movq func@GOTPCREL(%rip), %rax
>>
>> so that the linker won't create a PLT entry for an extern function
>> address.
>>
>> Tested on x86-64.  OK for trunk?
>
>> +  else if (ix86_force_load_from_GOT_p (op1))
>> +{
>> +  /* Load the external function address via the GOT slot to
>> +avoid PLT.  */
>> +  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
>> +   (TARGET_64BIT
>> +? UNSPEC_GOTPCREL
>> +: UNSPEC_GOT));
>> +  op1 = gen_rtx_CONST (Pmode, op1);
>> +  op1 = gen_const_mem (Pmode, op1);
>> +  /* This symbol must be referenced via a load from the Global
>> +Offset Table.  */
>> +  set_mem_alias_set (op1, ix86_GOT_alias_set ());
>> +  op1 = convert_to_mode (mode, op1, 1);
>> +  op1 = force_reg (mode, op1);
>> +  emit_insn (gen_rtx_SET (op0, op1));
>> +  /* Generate a CLOBBER so that there will be no REG_EQUAL note
>> +on the last insn to prevent cse and fwprop from replacing
>> +a GOT load with a constant.  */
>> +  rtx tmp = gen_reg_rtx (Pmode);
>> +  emit_clobber (tmp);
>> +  return;
>
> Jeff, is this the recommended way to prevent CSE, as far as RTL
> infrastructure is concerned? I didn't find any example of this
> approach with other targets.
>

 FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
 ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
 as well as other targets:

 frv/frv.c:  emit_clobber (op0);
 frv/frv.c:  emit_clobber (op1);
 im32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
 s390/s390.c:  emit_clobber (addr);
 s390/s390.md:  emit_clobber (reg0);
 s390/s390.md:  emit_clobber (reg1);
 s390/s390.md:  emit_clobber (reg0);
 s390/s390.md:  emit_clobber (reg0);
 s390/s390.md:  emit_clobber (reg1);
>>>
>>> These usages mark the whole register as being "clobbered"
>>> (=undefined), before only a part of register is written, e.g.:
>>>
>>>   emit_clobber (int_xmm);
>>>   emit_move_insn (gen_lowpart (DImode, int_xmm), input);
>>>
>>> They aren't used to prevent unwanted CSE.
>>
>> Since it's being called in the move expander, I thought the normal
>> way of preventing the constant being rematerialised would be to reject
>> it in the move define_insn predicates.
>>
>> FWIW, I agree that using a clobber for this is going to be fragile.
>>
>
> Here is the patch without clobber.  Tested on x86-64.  OK for
> trunk?

No, your patch has multiple problems:

1. It won't work for e.g. +1, since you have to legitimize the
symbol in two places of ix86_expand_move. Also, why use TARGET_64BIT
in

+  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
+(TARGET_64BIT
+ ? UNSPEC_GOTPCREL
+ : UNSPEC_GOT));

when ix86_force_load_from_GOT_p rejects non-64bit targets?

2. New check should be added to ix86_legitimate_constant_p, not to
predicates of move insn patterns. Unfortunately, we still have to
change x86_64_immediate_operand in two places.

I have attached my version of the patch. It handles all your
testcases, plus +1 case. Bootstrap is still running.

Does the patch work for you?

Uros.
Index: i386-protos.h
===
--- i386-protos.h   (revision 237653)
+++ i386-protos.h   (working copy)
@@ -70,6 +70,7 @@ extern bool ix86_expand_set_or_movmem (rtx, rtx, r
 extern bool constant_address_p (rtx);
 extern bool legitimate_pic_operand_p (rtx);
 extern bool legitimate_pic_address_disp_p (rtx);
+extern bool ix86_force_load_from_GOT_p (rtx);
 extern void print_reg (rtx, int, FILE*);
 extern void ix86_print_operand (FILE *, rtx, int);
 
Index: i386.c
===
--- i386.c  (revision 237653)
+++ i386.c  (working copy)
@@ -15120,6 +15120,19 @@ darwin_local_data_pic (rtx disp)
  && XINT (disp, 1) == UNSPEC_MACHOPIC_OFFSET);
 }
 
+/* True if operand X should be loaded from GOT.  */
+
+bool
+ix86_force_load_from_GOT_p (rtx x)
+{
+  return (TARGET_64BIT && !TARGET_PECOFF && !TARGET_MACHO
+ && 

Re: [Patch, Fortran] PR71068 - fix ICE on invalid with coindexed DATA

2016-06-21 Thread Tobias Burnus

Dear Paul,

Paul Richard Thomas wrote:

Thanks for the patch

Thanks also from my side.


PS Why, in principle, can data objects not have co-indices?


I think there is no really fundamental reason, but it doesn't really 
make sense.  DATA is an explicit initialization, similar to

  "integer :: i = 5"

and (mostly) implicitly has the SAVE attribute [5.6.7 @ J3/16-007r1].  To 
initialize the variable on a remote image feels odd - especially as each 
image initializes it to the same value.


[Side remark, since I just stumbled over it: "The statement ordering 
rules allow DATA statements to appear anywhere in a program unit after 
the specification statements. The ability to position DATA statements 
amongst executable statements is very rarely used, unnecessary, and a 
potential source of error." (B.3.5 in the section of obsolescent 
features in F2015 (J3/16-007r1).)]


Cheers,

Tobias



On 21 June 2016 at 16:15, Tobias Burnus
 wrote:

Dear all,

the problem comes up with:
data a(1)[1] /1/
which is invalid. In resolve.c's check_data_variable(), one has:

   if (!gfc_resolve_expr (var->expr))
 return false;
...
   e = var->expr;

   if (e->expr_type != EXPR_VARIABLE)
 gfc_internal_error ("check_data_variable(): Bad expression");

which triggers as resolve_variable() has:

   if (t && flag_coarray == GFC_FCOARRAY_LIB && gfc_is_coindexed (e))
 add_caf_get_intrinsic (e);


The solution is either not to decorate the DATA variable with
caf_get() - or to strip it off for testing. The latter has been
done in this patch. It's not really beautiful, but it works.

Additionally, I had to add the argument-handling short cut
as otherwise, more and more caf_get() could be added around the
argument, which is both pointless and causes the strip off to
fail.


Build and regtested on x86-64-gnu-linux.
OK for the trunk? Or do you see a more beautiful approach?

Tobias







C++ PATCH for concept checking in non-dependent expressions

2016-06-21 Thread Jason Merrill
Concept code also needs some updates to accommodate my GCC 7 fix for 
10200.  First, and not limited to concepts, we need to treat a member 
template as dependent if its signature depends on template parameters of 
its enclosing class (which, more importantly, are template parameters of 
the scope where the member template is named).  Second, 
constraints_satisfied_p needs the same kind of change that I made to 
instantiate_decl and fn_type_unification to handle non-dependent calls 
within a template.  The testcase tests both these changes.


Tested x86_64-pc-linux-gnu, applying to trunk.
commit 44d09c91bac2a200149e30c1d1bbc13042c6a723
Author: Jason Merrill 
Date:   Mon Jun 20 11:22:49 2016 +0300

	Fix type_dependent_expression_p of member templates.

	* pt.c (template_parm_outer_level, uses_outer_template_parms): New.
	(type_dependent_expression_p): Use uses_outer_template_parms.

diff --git a/gcc/cp/pt.c b/gcc/cp/pt.c
index 11b5d82..c5f65a7 100644
--- a/gcc/cp/pt.c
+++ b/gcc/cp/pt.c
@@ -5064,6 +5064,24 @@ template_parm_this_level_p (tree t, void* data)
   return level == this_level;
 }
 
+/* Worker for uses_outer_template_parms, called via for_each_template_parm.
+   DATA is really an int, indicating the innermost outer level of parameters.
+   If T is a template parameter of that level or further out, return
+   nonzero.  */
+
+static int
+template_parm_outer_level (tree t, void *data)
+{
+  int this_level = *(int *)data;
+  int level;
+
+  if (TREE_CODE (t) == TEMPLATE_PARM_INDEX)
+level = TEMPLATE_PARM_LEVEL (t);
+  else
+level = TEMPLATE_TYPE_LEVEL (t);
+  return level <= this_level;
+}
+
 /* Creates a TEMPLATE_DECL for the indicated DECL using the template
parameters given by current_template_args, or reuses a
previously existing one, if appropriate.  Returns the DECL, or an
@@ -9032,6 +9050,33 @@ uses_template_parms_level (tree t, int level)
  /*include_nondeduced_p=*/true);
 }
 
+/* Returns true if the signature of DECL depends on any template parameter from
+   its enclosing class.  */
+
+bool
+uses_outer_template_parms (tree decl)
+{
+  int depth = template_class_depth (CP_DECL_CONTEXT (decl));
+  if (depth == 0)
+return false;
+  if (for_each_template_parm (TREE_TYPE (decl), template_parm_outer_level,
			      &depth, NULL, /*include_nondeduced_p=*/true))
+return true;
+  if (PRIMARY_TEMPLATE_P (decl)
+      && for_each_template_parm (INNERMOST_TEMPLATE_PARMS
+				 (DECL_TEMPLATE_PARMS (decl)),
+				 template_parm_outer_level,
+				 &depth, NULL, /*include_nondeduced_p=*/true))
+return true;
+  tree ci = get_constraints (decl);
+  if (ci)
+ci = CI_NORMALIZED_CONSTRAINTS (ci);
+  if (ci && for_each_template_parm (ci, template_parm_outer_level,
+				    &depth, NULL, /*nondeduced*/true))
+return true;
+  return false;
+}
+
 /* Returns TRUE iff INST is an instantiation we don't need to do in an
ill-formed translation unit, i.e. a variable or function that isn't
usable in a constant expression.  */
@@ -23008,7 +23053,7 @@ type_dependent_expression_p (tree expression)
 
   if (TREE_CODE (expression) == TEMPLATE_DECL
   && !DECL_TEMPLATE_TEMPLATE_PARM_P (expression))
-return false;
+return uses_outer_template_parms (expression);
 
   if (TREE_CODE (expression) == STMT_EXPR)
 expression = stmt_expr_value_expr (expression);

commit 2f9c0996e3c50d7c84c34c3353ac70a7c10c0141
Author: Jason Merrill 
Date:   Mon Jun 20 11:24:10 2016 +0300

	Fix constraint satisfaction in uninstantiated template.

	* constraint.cc (constraints_satisfied_p): Keep as many levels of
	args as our template has levels of parms.

diff --git a/gcc/cp/constraint.cc b/gcc/cp/constraint.cc
index 5e42fa9..af7a593 100644
--- a/gcc/cp/constraint.cc
+++ b/gcc/cp/constraint.cc
@@ -2122,8 +2122,10 @@ constraints_satisfied_p (tree decl)
   tree args = NULL_TREE;
   if (tree ti = DECL_TEMPLATE_INFO (decl))
 {
-  ci = get_constraints (TI_TEMPLATE (ti));
-  args = INNERMOST_TEMPLATE_ARGS (TI_ARGS (ti));
+  tree tmpl = TI_TEMPLATE (ti);
+  ci = get_constraints (tmpl);
+  int depth = TMPL_PARMS_DEPTH (DECL_TEMPLATE_PARMS (tmpl));
+  args = get_innermost_template_args (TI_ARGS (ti), depth);
 }
   else
 {
diff --git a/gcc/testsuite/g++.dg/concepts/memtmpl1.C b/gcc/testsuite/g++.dg/concepts/memtmpl1.C
new file mode 100644
index 000..6f3d5a3
--- /dev/null
+++ b/gcc/testsuite/g++.dg/concepts/memtmpl1.C
@@ -0,0 +1,15 @@
+// { dg-options "-std=c++1z -fconcepts" }
+
+template <class T>
+struct A {
+  template <class U>
+    requires sizeof(T) == 1
+  static void f(U);
+  template <class U>
+    requires sizeof(T) == 2
+  static void f(U);
+  void g()
+  {
+    f(42);
+  }
+};


Re: [Patch, Fortran] PR71068 - fix ICE on invalid with coindexed DATA

2016-06-21 Thread Paul Richard Thomas
Dear Tobias,

"Beauty is in the eye of the beholder!" It works, it's good :-)

OK for trunk

Thanks for the patch

Paul

PS Why, in principle, can data objects not have co-indices?

On 21 June 2016 at 16:15, Tobias Burnus
 wrote:
> Dear all,
>
> the problem comes up with:
>data a(1)[1] /1/
> which is invalid. In resolve.c's check_data_variable(), one has:
>
>   if (!gfc_resolve_expr (var->expr))
> return false;
> ...
>   e = var->expr;
>
>   if (e->expr_type != EXPR_VARIABLE)
> gfc_internal_error ("check_data_variable(): Bad expression");
>
> which triggers as resolve_variable() has:
>
>   if (t && flag_coarray == GFC_FCOARRAY_LIB && gfc_is_coindexed (e))
> add_caf_get_intrinsic (e);
>
>
> The solution is either not to decorate the DATA variable with
> caf_get() - or to strip it off for testing. The latter has been
> done in this patch. It's not really beautiful, but it works.
>
> Additionally, I had to add the argument-handling short cut
> as otherwise, more and more caf_get() could be added around the
> argument, which is both pointless and causes the strip off to
> fail.
>
>
> Build and regtested on x86-64-gnu-linux.
> OK for the trunk? Or do you see a more beautiful approach?
>
> Tobias



-- 
The difference between genius and stupidity is; genius has its limits.

Albert Einstein


[PATCH, rs6000] Scheduling update

2016-06-21 Thread Pat Haugen
This patch adds instruction scheduling support for the Power9 processor. 
Bootstrap/regression tested on powerpc64/powerpc64le with no new failures. Ok 
for trunk? Ok for backport to GCC 6 branch after successful bootstrap/regtest 
there?

-Pat


2016-06-21  Pat Haugen  

* config/rs6000/power8.md (power8-fp): Include dfp type.
* config/rs6000/power6.md (power6-fp): Likewise.
* config/rs6000/htm.md (various insns): Change type attribute to 
htmsimple and set power9_alu2 appropriately.
* config/rs6000/power9.md: New file.
* config/rs6000/t-rs6000 (MD_INCLUDES): Add power9.md.
* config/rs6000/power7.md (power7-fp): Include dfp type.
* config/rs6000/rs6000.c (power9_cost): Update costs, cache size
and prefetch streams.
(rs6000_option_override_internal): Remove temporary code setting
tuning to power8. Don't set rs6000_sched_groups for power9.
(last_scheduled_insn): Change to rtx_insn *.
(divCnt, vec_load_pendulum): New variables.
(rs6000_adjust_cost): Add Power9 to test for store->load separation.
(rs6000_issue_rate): Set issue rate for Power9.
(is_power9_pairable_vec_type): New.
(rs6000_sched_reorder2): Add Power9 code to group fixed point divide
insns and group/alternate vector operations with vector loads.
(insn_must_be_first_in_group): Remove Power9.
(insn_must_be_last_in_group): Likewise.
(force_new_group): Likewise.
(rs6000_sched_init): Fix initialization of last_scheduled_insn.
Initialize divCnt/vec_load_pendulum.
(_rs6000_sched_context, rs6000_init_sched_context,
rs6000_set_sched_context): Handle context save/restore of new
variables.
* config/rs6000/vsx.md (various insns): Set power9_alu2 attribute.
* config/rs6000/altivec.md (various insns): Likewise.
* config/rs6000/dfp.md (various insns): Change type attribute to dfp.
*  config/rs6000/crypto.md (crypto_vshasigma): Change type
and set power9_alu2.
* config/rs6000/rs6000.md ('type' attribute): Add htmsimple/dfp types.
Define "power9_alu2" and "mnemonic" attributes.
Include power9.md.
(*cmp<mode>_fpr, *fpmask<mode>): Set power9_alu2 attribute.
(*cmp<mode>_hw): Change type to veccmp.

Index: config/rs6000/power8.md
===
--- config/rs6000/power8.md	(revision 237621)
+++ config/rs6000/power8.md	(working copy)
@@ -317,7 +317,7 @@ (define_bypass 4 "power8-branch" "power8
 
 ; VS Unit (includes FP/VSX/VMX/DFP/Crypto)
 (define_insn_reservation "power8-fp" 6
-  (and (eq_attr "type" "fp,dmul")
+  (and (eq_attr "type" "fp,dmul,dfp")
(eq_attr "cpu" "power8"))
   "DU_any_power8,VSU_power8")
 
Index: config/rs6000/power6.md
===
--- config/rs6000/power6.md	(revision 237621)
+++ config/rs6000/power6.md	(working copy)
@@ -500,7 +500,7 @@ (define_insn_reservation "power6-mtcr" 4
 (define_bypass 9 "power6-mtcr" "power6-branch")
 
 (define_insn_reservation "power6-fp" 6
-  (and (eq_attr "type" "fp,dmul")
+  (and (eq_attr "type" "fp,dmul,dfp")
(eq_attr "cpu" "power6"))
   "FPU_power6")
 
Index: config/rs6000/htm.md
===
--- config/rs6000/htm.md	(revision 237621)
+++ config/rs6000/htm.md	(working copy)
@@ -72,7 +72,8 @@ (define_insn "*tabort"
(set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabort. %0"
-  [(set_attr "type" "htm")
+  [(set_attr "type" "htmsimple")
+   (set_attr "power9_alu2" "yes")
(set_attr "length" "4")])
 
 (define_expand "tabortc"
@@ -98,7 +99,8 @@ (define_insn "*tabortc"
(set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabortc. %0,%1,%2"
-  [(set_attr "type" "htm")
+  [(set_attr "type" "htmsimple")
+   (set_attr "power9_alu2" "yes")
(set_attr "length" "4")])
 
 (define_expand "tabortci"
@@ -124,7 +126,8 @@ (define_insn "*tabortci"
(set (match_operand:BLK 4) (unspec:BLK [(match_dup 4)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tabortci. %0,%1,%2"
-  [(set_attr "type" "htm")
+  [(set_attr "type" "htmsimple")
+   (set_attr "power9_alu2" "yes")
(set_attr "length" "4")])
 
 (define_expand "tbegin"
@@ -146,7 +149,7 @@ (define_insn "*tbegin"
(set (match_operand:BLK 2) (unspec:BLK [(match_dup 2)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "tbegin. %0"
-  [(set_attr "type" "htm")
+  [(set_attr "type" "htmsimple")
(set_attr "length" "4")])
 
 (define_expand "tcheck"
@@ -208,7 +211,7 @@ (define_insn "*trechkpt"
(set (match_operand:BLK 1) (unspec:BLK [(match_dup 1)] UNSPEC_HTM_FENCE))]
   "TARGET_HTM"
   "trechkpt."
-  [(set_attr "type" "htm")
+  [(set_attr "type" "htmsimple")
(set_attr "length" "4")])
 
 (define_expand "treclaim"
@@ -230,7 +233,7 @@ 

Implement C _FloatN, _FloatNx types [version 2]

2016-06-21 Thread Joseph Myers
[Changes in version 2 of the patch:

* PowerPC always uses KFmode for _Float128 and _Float64x when those
  types are supported, not TFmode.

* More thorough checking for back ends using too-low precision values
  for modes, to avoid that causing too-low precision values for types.

* Patch description explicitly says that _FloatN may use MIPS NaN
  conventions, not just the conventions recommended in IEEE 754-2008.

* Patch description updated for Fortran now we've established that the
  Fortran float128_type_node is deliberately different from the
  language-independent one and should be NULL when long double has
  binary128 format, so renaming is the right thing to do.

]

ISO/IEC TS 18661-3:2015 defines C bindings to IEEE interchange and
extended types, in the form of _FloatN and _FloatNx type names with
corresponding fN/FN and fNx/FNx constant suffixes and FLTN_* / FLTNX_*
<float.h> macros.  This patch implements support for this feature in
GCC.

The _FloatN types, for N = 16, 32, 64 or >= 128 and a multiple of 32,
are types encoded according to the corresponding IEEE interchange
format (endianness unspecified; may use either the NaN conventions
recommended in IEEE 754-2008, or the MIPS NaN conventions, since the
choice of convention is only an IEEE recommendation, not a
requirement).  The _FloatNx types, for N = 32, 64 and 128, are IEEE
"extended" types: types extending a narrower format with range and
precision at least as big as those specified in IEEE 754 for each
extended type (and with unspecified representation, but still
following IEEE semantics for their values and operations - and with
the set of values being determined by the precision and the maximum
exponent, which means that while Intel "extended" is suitable for
_Float64x, m68k "extended" is not).  These types are always distinct
from and not compatible with each other and the standard floating
types float, double, long double; thus, double, _Float64 and _Float32x
may all have the same ABI, but they are three still distinct types.
The type names may be used with _Complex to construct corresponding
complex types (unlike __float128, which acts more like a typedef name
than a keyword - thus, this patch may be considered to fix PR
c/32187).  The new suffixes can be combined with GNU "i" and "j"
suffixes for constants of complex types (e.g. 1.0if128, 2.0f64i).
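
A small usage sketch, assuming a target where _Float128 is supported:

#include <stdio.h>

int
main (void)
{
  _Float128 x = 1.5f128;			/* TS 18661-3 suffix */
  _Complex _Float128 z = 2.0f128 + 3.0if128;	/* GNU "i" suffix combined */
  x += __real__ z;
  printf ("%d\n", (int) x);			/* prints 3 */
  return 0;
}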

The set of types supported is implementation-defined.  In this GCC
patch, _Float32 is SFmode if that is suitable; _Float32x and _Float64
are DFmode if that is suitable; _Float128 is TFmode if that is
suitable; _Float64x is XFmode if that is suitable, and otherwise
TFmode if that is suitable.  There is a target hook to override the
choices if necessary.  "Suitable" means both conforming to the
requirements of that type, and supported as a scalar type including in
libgcc.  The ABI is whatever the back end does for scalars of that
mode (but note that _Float32 is passed without promotion in variable
arguments, unlike float).  All the existing issues with exceptions and
rounding modes for existing types apply equally to the new type names.

No GCC port supports a floating-point format suitable for _Float128x.
Although there is HFmode support for ARM and AArch64, use of that for
_Float16 is not enabled.  Supporting _Float16 would require additional
work on the excess precision aspects of TS 18661-3: there are new
values of FLT_EVAL_METHOD, which are not currently supported in GCC,
and FLT_EVAL_METHOD == 0 now means that operations and constants on
types narrower than float are evaluated to the range and precision of
float.  Implementing that, so that _Float16 gets evaluated with excess
range and precision, would involve changes to the excess precision
infrastructure so that the _Float16 case is enabled by default, unlike
the x87 case which is only enabled for -fexcess-precision=standard.
Other differences between _Float16 and __fp16 would also need to be
disentangled.

GCC has some prior support for nonstandard floating-point types in the
form of __float80 and __float128.  Where these were previously types
distinct from long double, they are made by this patch into aliases
for _Float64x / _Float128 if those types have the required properties.

In principle the set of possible _FloatN types is infinite.  This
patch hardcodes the four such types for N <= 128, but with as much
code as possible using loops over types to minimize the number of
places with such hardcoding.  I don't think it's likely any further
such types will be of use in future (or indeed that formats suitable
for _Float128x will actually be implemented).  There is a corner case
that all _FloatN, for N >= 128 and a multiple of 32, should be treated
as keywords even when the corresponding type is not supported; I
intend to deal with that in a followup patch.

Tests are added for various functionality of the new types, mostly
using type-generic headers.  PowerPC maintainers should note that the
tests do not do anything regarding passing 

Re: [Fortran, Patch] First patch for coarray FAILED IMAGES (TS 18508)

2016-06-21 Thread Alessandro Fanfarillo
* PING *

2016-06-06 15:05 GMT-06:00 Alessandro Fanfarillo :
> Dear all,
>
> please find in attachment the first patch (of n) for the FAILED IMAGES
> capability defined in the coarray TS 18508.
> The patch adds support for three new intrinsic functions defined in
> the TS for simulating a failure (fail image), checking an image status
> (image_status) and getting the list of failed images (failed_images).
> The patch has been built and regtested on x86_64-pc-linux-gnu.
>
> Ok for trunk?
>
> Alessandro


Re: RFC: pass to warn on questionable uses of alloca().

2016-06-21 Thread Martin Sebor

On 06/21/2016 10:00 AM, Jakub Jelinek wrote:

On Tue, Jun 21, 2016 at 09:57:59AM -0600, Jeff Law wrote:

Would a new attribute to annotate async-signal safe functions
help?  I envision that the attribute on a function definition
would turn off the alloca/VLA to malloc transformation, and
could also diagnose calls to other function whose declarations
were not also declared async-signal safe with the same
attribute.

It's probably a good idea -- there's enough "special" stuff with those
functions that having a way to mark them is useful.

In fact, given the attribute, we ought to be able to build warnings around
multiple constructs that are not safe in that context.


What about functions that are meant to be async-signal safe only for certain
arguments or under some other conditions?
Automatically turning VLAs or alloca into malloc/free would break those.


I imagine in those unusual cases the author of such a function
would have to disable the transformation explicitly, either by
using the command line option it was under, or via the equivalent
pragma.

But I wouldn't expect such cases to be common.  I can't think of
any POSIX function that's conditionally async-signal safe.  In
fact, by my reading, the POSIX definition of an async-signal safe
function prohibits it:

  Async-Signal-Safe Function
  A function that may be invoked, without restriction, from
  signal-catching functions.  No function is async-signal-safe
  unless explicitly described as such.

Joseph would have a better idea whether there are any functions
in Glibc that are conditionally async-signal safe.

Martin
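
To make the attribute idea concrete (the attribute name below is
hypothetical, not an existing GCC attribute):

/* Marking a function async-signal safe would suppress the
   alloca/VLA-to-malloc transformation inside it and let the compiler
   warn about calls to functions not marked the same way.  */
__attribute__ ((async_signal_safe))
void
log_signal (int sig)
{
  char buf[2];			/* fixed buffer: no malloc in handlers */
  buf[0] = '0' + (sig % 10);
  buf[1] = '\n';
  /* write (2, buf, 2);  -- only async-signal-safe calls allowed here */
}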


[Committed][testsuite] Ensure vrnd* tests run on ARMv8 cores

2016-06-21 Thread Wilco Dijkstra
The recently added gcc.target/aarch64/advsimd-intrinsics/vrnd*.c tests cause
failures due to accidentally running on non-ARMv8 hardware - the target check
arm_v8_neon_ok is correct for compilation tests but should be arm_v8_neon_hw
for execution tests.  Fix this and also change arm_v8_neon_hw to return
true for AArch64 so these tests are run on AArch64 too.

Committed as trivial patch in r237653.

ChangeLog:
2016-06-21  Wilco Dijkstra  

gcc/testsuite/

* gcc.target/aarch64/advsimd-intrinsics/vrnd.c
(dg-require-effective-target): Use arm_v8_neon_hw.
* gcc.target/aarch64/advsimd-intrinsics/vrnda.c
(dg-require-effective-target): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrndm.c
(dg-require-effective-target): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrndn.c
(dg-require-effective-target): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrndp.c
(dg-require-effective-target): Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vrndx.c
(dg-require-effective-target): Likewise.
* lib/target-supports.exp (check_runtime arm_v8_neon_hw_available):
Add AArch64 check.
--
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
index 
5f492d41bffb49fd6811a31aacb86d8a949ab0e6..d97a3a25ee5fb0a6021f0d75c4d653bff0d59bb7
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnd.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
index 
816fd28dd19d6f0591619c3fa3ca06b7e4d99c3e..ff2bdc0563fc3a15115ae6121408f134bb9e81cd
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrnda.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
index 
029880c21f6fdea148ec6a41a7a438cd08eeafe3..eae9f61c5859b7f7add3dd01ff0edfd0ae8cd75b
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndm.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
index 
571243c49298ec154c55932b611eb3bcc42efe60..c6c707d67655cc648e0526a489046411a065675f
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndn.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
index 
ff4771c87892d202732e9eefa9b241c1dec1c9eb..e94eb6b76221c7b229229e1286eb910b7eef740f
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndp.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
index 
ff2357bebf3c01d723229c43d35a747d2bbe1315..0d2a63ef26c75f684ed17689a37d1b8ada0b043f
 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vrndx.c
@@ -1,4 +1,4 @@
-/* { dg-require-effective-target arm_v8_neon_ok } */
+/* { dg-require-effective-target arm_v8_neon_hw } */
 /* { dg-add-options arm_v8_neon } */
 
 #include <arm_neon.h>
diff --git a/gcc/testsuite/lib/target-supports.exp 
b/gcc/testsuite/lib/target-supports.exp
index 
9876bb5f4ce539628c451f77e21415507830c4f6..2a8feb8f13e0130036771a73a40015c905c21993
 100644
--- a/gcc/testsuite/lib/target-supports.exp
+++ b/gcc/testsuite/lib/target-supports.exp
@@ -3417,11 +3417,17 @@ proc check_effective_target_arm_v8_neon_hw { } {
int
main (void)
{
- float32x2_t a;
+ float32x2_t a = { 1.0f, 2.0f };
+ #ifdef __ARM_ARCH_ISA_A64
+ asm ("frinta %0.2s, 

Re: [PATCH][AArch64] Add initial support for Cortex-A73

2016-06-21 Thread James Greenhalgh
On Tue, Jun 21, 2016 at 04:55:43PM +0100, Kyrill Tkachov wrote:
> Hi all,
> 
> This is a rebase of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00403.html
> on top of Evandro's changes.
> Also, to elaborate on the original posting, the initial tuning structure is
> based on the Cortex-A57 one but with the issue rate set to 2, FMA steering
> turned off and ADRP+LDR fusion enabled.

I see you've also chosen to use the generic_branch_cost costs for
branches. As you didn't mention it explicitly here, was that intentional?

> Is this ok for trunk?

This looks OK to me. Watch out for the conflict with the Broadcom Vulcan
patch that was committed to trunk earlier today. The merge should be easy.

Thanks for the patch!

James

> 2016-06-21  Kyrylo Tkachov  
> 
> * config/aarch64/aarch64.c (cortexa73_tunings): New struct.
> * config/aarch64/aarch64-cores.def (cortex-a73): New entry.
> (cortex-a73.cortex-a35): Likewise.
> (cortex-a73.cortex-a53): Likewise.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi (AArch64 Options): Document cortex-a73,
> cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to
> -mcpu and -mtune.



[Committed][testsuite] Fix tree-ssa/attr-hotcold-2.c failures

2016-06-21 Thread Wilco Dijkstra
Fix tree-ssa/attr-hotcold-2.c failures now that the test runs.
GCC dumps the blocks 3 times, so update the expected count to 3 and the test passes.

ChangeLog:
2016-06-21  Wilco Dijkstra  

    gcc/testsuite/

    * gcc.dg/tree-ssa/attr-hotcold-2.c (scan-tree-dump-times):
    Set to 3 so test passes.
--

diff --git a/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c 
b/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
index 
e2e81434febd707c36e5ce0a2687a4bdc568b0e5..13d2916c47b9f0b358fe455088b6e17e3a6ad60f
 100644
--- a/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
+++ b/gcc/testsuite/gcc.dg/tree-ssa/attr-hotcold-2.c
@@ -20,10 +20,9 @@ void f(int x, int y)
 
 /* { dg-final { scan-tree-dump-times "hot label heuristics" 1 
"profile_estimate" } } */
 /* { dg-final { scan-tree-dump-times "cold label heuristics" 1 
"profile_estimate" } } */
-/* { dg-final { scan-tree-dump-times "block 4, loop depth 0, count 0, freq 
\[1-4\]\[^0-9\]" 1 "profile_estimate" } } */
+/* { dg-final { scan-tree-dump-times "block 4, loop depth 0, count 0, freq 
\[1-4\]\[^0-9\]" 3 "profile_estimate" } } */
 
 /* Note: we're attempting to match some number > 6000, i.e. > 60%.
    The exact number ought to be tweekable without having to juggle
    the testcase around too much.  */
-/* { dg-final { scan-tree-dump-times "block 5, loop depth 0, count 0, freq 
\[6-9\]\[0-9\]\[0-9\]\[0-9\]" 1 "profile_estimate" } } */
-
+/* { dg-final { scan-tree-dump-times "block 5, loop depth 0, count 0, freq 
\[6-9\]\[0-9\]\[0-9\]\[0-9\]" 3 "profile_estimate" } } */




Re: [patch, avr,wwwdocs] PR 58655

2016-06-21 Thread Georg-Johann Lay

Pitchumani Sivanupandi wrote:

Attached patches add documentation for -mfract-convert-truncate option
and add that info to release notes (gcc-4.9 changes).

If OK, could someone commit please? I do not have commit access.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-21  Pitchumani Sivanupandi  

PR target/58655
* doc/invoke.texi (AVR Options): Document -mfract-convert-truncate
option.

--- a/wwwdocs/htdocs/gcc-4.9/changes.html
+++ b/wwwdocs/htdocs/gcc-4.9/changes.html
@@ -579,6 +579,14 @@ auto incr(T x) { return x++; }
size when compiling for the M-profile processors.
  
  
+AVR
+
+  
+A new command-line option -mfract-convert-truncate has been added.


Please use <code> tags around the option.


+It allows compiler to use truncation instead of rounding towards
+0 for fractional int types.


"zero" instead of "0", and it's for fixed-point types, not for int types.


+  
+
 IA-32/x86-64
   
 -mfpmath=sse is now implied by -ffast-math




--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,8 +643,8 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
 -mcall-prologues -mint8 -mn_flash=@var{size} -mno-interrupts @gol
--mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert @gol
--Wmisspelled-isr}
+-mrelax -mrmw -mstrict-X -mtiny-stack -mfract-convert-truncate -nodevicelib 
@gol
+-Waddr-space-convert -Wmisspelled-isr}
 
 @emph{Blackfin Options}

 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14586,6 +14586,10 @@ sbiw r26, const   ; X -= const
 @opindex mtiny-stack
 Only change the lower 8@tie{}bits of the stack pointer.
 
+@item -mfract-convert-truncate

+@opindex mfract-convert-truncate
+Allow to use truncation instead of rounding towards 0 for fractional int types.


Same here: "zero" and "fixed-point".


+
 @item -nodevicelib
 @opindex nodevicelib
 Don't link against AVR-LibC's device specific library @code{lib.a}.



Thanks, Johann


[PATCH] [OBVIOUS] s/imposisble/impossible in predict.c

2016-06-21 Thread Martin Liška
Hello.

I've just installed a patch that does $SUBJECT.

Thanks,
Martin
From 8302396974053dd00cd5eaff594dddf2f1ccf80b Mon Sep 17 00:00:00 2001
From: marxin 
Date: Tue, 21 Jun 2016 18:05:50 +0200
Subject: [PATCH] s/imposisble/impossible in predict.c

gcc/ChangeLog:

2016-06-21  Martin Liska  

	* predict.c (force_edge_cold): Replace imposisble with
	impossible.
---
 gcc/predict.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/gcc/predict.c b/gcc/predict.c
index 642bd62..470de8a 100644
--- a/gcc/predict.c
+++ b/gcc/predict.c
@@ -3507,7 +3507,7 @@ force_edge_cold (edge e, bool impossible)
 	fprintf (dump_file, "Making edge %i->%i %s by redistributing "
 		 "probability to other edges.\n",
 		 e->src->index, e->dest->index,
-		 impossible ? "imposisble" : "cold");
+		 impossible ? "impossible" : "cold");
   FOR_EACH_EDGE (e2, ei, e->src->succs)
 	if (e2 != e)
 	  {
@@ -3533,7 +3533,7 @@ force_edge_cold (edge e, bool impossible)
 	  int old_frequency = e->src->frequency;
 	  if (dump_file && (dump_flags & TDF_DETAILS))
 	fprintf (dump_file, "Making bb %i %s.\n", e->src->index,
-		 impossible ? "imposisble" : "cold");
+		 impossible ? "impossible" : "cold");
 	  e->src->frequency = MIN (e->src->frequency, impossible ? 0 : 1);
 	  e->src->count = e->count = RDIV (e->src->count * e->src->frequency,
 	   old_frequency);
@@ -3542,6 +3542,6 @@ force_edge_cold (edge e, bool impossible)
   else if (dump_file && (dump_flags & TDF_DETAILS)
 	   && maybe_hot_bb_p (cfun, e->src))
 	fprintf (dump_file, "Giving up on making bb %i %s.\n", e->src->index,
-		 impossible ? "imposisble" : "cold");
+		 impossible ? "impossible" : "cold");
 }
 }
-- 
2.8.4



Re: RFC: pass to warn on questionable uses of alloca().

2016-06-21 Thread Jakub Jelinek
On Tue, Jun 21, 2016 at 09:57:59AM -0600, Jeff Law wrote:
> >Would a new attribute to annotate async-signal safe functions
> >help?  I envision that the attribute on a function definition
> >would turn off the alloca/VLA to malloc transformation, and
> >could also diagnose calls to other functions whose declarations
> >were not also declared async-signal safe with the same
> >attribute.
> It's probably a good idea -- there's enough "special" stuff with those
> functions that having a way to mark them is useful.
> 
> In fact, given the attribute, we ought to be able to build warnings around
> multiple constructs that are not safe in that context.

What about functions that are meant to be async-signal safe only for certain
arguments or under some other conditions?
Automatically turning VLAs or alloca into malloc/free would break those.

Jakub


[RFC: Patch 4/6 v2] Modify cost model for noce_cmove_arith

2016-06-21 Thread James Greenhalgh

Hi,

This patch clears up the cost model for noce_try_cmove_arith. We lose
the "??? FIXME: Magic number 5" comment, and gain a more realistic cost
model for if-converting memory accesses.

This is the patch that has the chance to cause the largest behavioural
changes for most targets: the current heuristic does not take into
consideration the cost of a conditional move, and once we add that, the
cost of the converted sequence often looks higher than we allowed before.

I think that missing the cost of the conditional move from these sequences
is not a good idea, and that the cost model should rely on the target giving
back good information. A target that finds tests failing after this patch
should consider either reducing the cost of a conditional move sequence, or
increasing TARGET_MAX_NOCE_IFCVT_SEQ_COST.
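
As a rough illustration of that second option, a target could provide
something along these lines (a sketch only, using the hook signature
from patch 1/6; the cost numbers are made up):

  static unsigned int
  example_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
  {
    /* Permit more expensive unconditional sequences when the branch
       is hard to predict and we optimize for speed.  */
    if (speed_p)
      return COSTS_N_INSNS (predictable_edge_p (e) ? 2 : 4);
    return COSTS_N_INSNS (1);
  }

  #define TARGET_MAX_NOCE_IFCVT_SEQ_COST example_max_noce_ifcvt_seq_cost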

As this ups the cost of if-conversion dramatically, I've used the new
parameters to ensure that the tests in the testsuite continue to pass on
all targets.

Bootstrapped in series on aarch64 and x86-64.

OK?

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_try_cmove_arith): Check costs after constructing
new sequence.

gcc/testsuite/

2016-06-21  James Greenhalgh  

* gcc.dg/ifcvt-2.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-3.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/pr68435.c: Use parameter to guide if-conversion heuristics.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index f4ad037..78906d3 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -2092,7 +2092,8 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   rtx a = if_info->a;
   rtx b = if_info->b;
   rtx x = if_info->x;
-  rtx orig_a, orig_b;
+  rtx orig_a = a;
+  rtx orig_b = b;
   rtx_insn *insn_a, *insn_b;
   bool a_simple = if_info->then_simple;
   bool b_simple = if_info->else_simple;
@@ -2102,16 +2103,15 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   int is_mem = 0;
   enum rtx_code code;
   rtx_insn *ifcvt_seq;
+  bool speed_p = optimize_bb_for_speed_p (if_info->test_bb);
 
   /* A conditional move from two memory sources is equivalent to a
  conditional on their addresses followed by a load.  Don't do this
  early because it'll screw alias analysis.  Note that we've
  already checked for no side effects.  */
-  /* ??? FIXME: Magic number 5.  */
   if (cse_not_expected
   && MEM_P (a) && MEM_P (b)
-  && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b)
-  && noce_estimate_conversion_profitable_p (if_info, 5))
+  && MEM_ADDR_SPACE (a) == MEM_ADDR_SPACE (b))
 {
   machine_mode address_mode = get_address_mode (a);
 
@@ -2143,25 +2143,6 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!can_conditionally_move_p (x_mode))
 return FALSE;
 
-  unsigned int then_cost;
-  unsigned int else_cost;
-  if (insn_a)
-then_cost = if_info->then_cost;
-  else
-then_cost = 0;
-
-  if (insn_b)
-else_cost = if_info->else_cost;
-  else
-else_cost = 0;
-
-  /* We're going to execute one of the basic blocks anyway, so
- bail out if the most expensive of the two blocks is unacceptable.  */
-
-  /* TODO: Revisit cost model.  */
-  if (MAX (then_cost, else_cost) > if_info->max_seq_cost)
-return FALSE;
-
   /* Possibly rearrange operands to make things come out more natural.  */
   if (reversed_comparison_code (if_info->cond, if_info->jump) != UNKNOWN)
 {
@@ -2353,6 +2334,12 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
   if (!ifcvt_seq)
 return FALSE;
 
+  /* Check that our cost model will allow the transform.  */
+
+  if (seq_cost (ifcvt_seq, speed_p) > if_info->max_seq_cost)
+/* Just return false, the sequence has already been finalized.  */
+return FALSE;
+
   emit_insn_before_setloc (ifcvt_seq, if_info->jump,
 			   INSN_LOCATION (if_info->insn_a));
   if_info->transform_name = "noce_try_cmove_arith";
diff --git a/gcc/testsuite/gcc.dg/ifcvt-2.c b/gcc/testsuite/gcc.dg/ifcvt-2.c
index e0e1728..73e0dcc 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-2.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-2.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target aarch64*-*-* x86_64-*-* } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 
 typedef unsigned char uint8_t;
diff --git a/gcc/testsuite/gcc.dg/ifcvt-3.c b/gcc/testsuite/gcc.dg/ifcvt-3.c
index 44233d4..b250bc1 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-3.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-3.c
@@ -1,5 +1,5 @@
 /* { dg-do compile { target { { aarch64*-*-* i?86-*-* x86_64-*-* } && lp64 } } } */
-/* { dg-options "-fdump-rtl-ce1 -O2" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 
 typedef long long s64;
 
diff --git a/gcc/testsuite/gcc.dg/pr68435.c b/gcc/testsuite/gcc.dg/pr68435.c
index 765699a..f86b7f8 100644
--- a/gcc/testsuite/gcc.dg/pr68435.c
+++ 

Re: RFC: pass to warn on questionable uses of alloca().

2016-06-21 Thread Jeff Law

On 06/21/2016 09:51 AM, Martin Sebor wrote:

On 06/20/2016 03:41 PM, Jeff Law wrote:

On 06/20/2016 08:56 AM, Joseph Myers wrote:

On Sat, 18 Jun 2016, Martin Sebor wrote:


the function regardless of the value of its argument).  At
the same time, it seems that an even more reliable solution
than pointing out potentially unsafe calls to the function
and relying on users to modify their code to use malloc for
large/unbounded allocations would be to let GCC do it for
them automatically (i.e., in response to some other option,
emit a call to malloc instead, and insert a call to free when
appropriate).


Note that such an option would not be usable for the original motivating
case of glibc, because in code that's meant to be async-signal-safe,
alloca and VLAs can be used, but malloc cannot.

I've actually considered the other direction more viable.  Define the
right set of constraints and let the compiler optimize from malloc/free
to alloca.

For the uses of alloca in glibc that have to be async-signal-safe, we
should just leave those alone no matter what we may or may not be able
to prove.


Would a new attribute to annotate async-signal safe functions
help?  I envision that the attribute on a function definition
would turn off the alloca/VLA to malloc transformation, and
could also diagnose calls to other functions whose declarations
were not also declared async-signal safe with the same
attribute.
It's probably a good idea -- there's enough "special" stuff with those 
functions that having a way to mark them is useful.


In fact, given the attribute, we ought to be able to build warnings 
around multiple constructs that are not safe in that context.


jeff


[RFC: Patch 2/6 v2] Factor out the comparisons against magic numbers in ifcvt

2016-06-21 Thread James Greenhalgh

Hi,

This patch pulls the comparisons between if_info->branch_cost and a magic
number representing an instruction count out into a common function.  While I'm
doing it, I've documented the instructions that the magic numbers relate
to, and updated them where they were inconsistent.

If our measure of the cost of a branch is now in rtx costs units, we can
get to an estimate for the cost of an expression from the number of
instructions by multiplying through by COSTS_N_INSNS (1).
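
As a worked example of the units (COSTS_N_INSNS (N) expands to N * 4):

  /* A transform expected to emit two cheap instructions, e.g. an
     ASHIFT and an IOR, is considered profitable iff:  */
  if_info->max_seq_cost >= 2 * COSTS_N_INSNS (1)  /* i.e. >= 8 */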

Alternatively, we could actually construct the cheap sequences and
check their cost. But in these cases we expect to if-convert on almost
all targets: the transforms in this patch are almost universally a good
idea, even for targets with a very powerful branch predictor, since
eliminating the branch removes a basic block boundary and so might help
scheduling, combine, and other RTL optimizers.

Bootstrapped on x86-64 and aarch64 as part of the full sequence.

OK?

Thanks,
James

---

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_if_info): New field: max_seq_cost.
(noce_estimate_conversion_profitable_p): New.
(noce_try_store_flag_constants): Use it.
(noce_try_addcc): Likewise.
(noce_try_store_flag_mask): Likewise.
(noce_try_cmove): Likewise.
(noce_try_cmove_arith): Likewise.
(noce_find_if_block): Record targetm.max_noce_ifcvt_seq_cost.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index fd29516..0b97114 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -814,6 +814,10 @@ struct noce_if_info
   /* Estimated cost of the particular branch instruction.  */
   unsigned int branch_cost;
 
+  /* Maximum permissible cost for the unconditional sequence we should
+ generate to replace this branch.  */
+  unsigned int max_seq_cost;
+
   /* The name of the noce transform that succeeded in if-converting
  this structure.  Used for debugging.  */
   const char *transform_name;
@@ -835,6 +839,17 @@ static int noce_try_minmax (struct noce_if_info *);
 static int noce_try_abs (struct noce_if_info *);
 static int noce_try_sign_mask (struct noce_if_info *);
 
+/* This function is always called when we would expand a number of "cheap"
+   instructions.  Multiply NINSNS by COSTS_N_INSNS (1) to approximate the
+   RTX cost of those cheap instructions.  */
+
+inline static bool
+noce_estimate_conversion_profitable_p (struct noce_if_info *if_info,
+   unsigned int ninsns)
+{
+  return if_info->max_seq_cost >= ninsns * COSTS_N_INSNS (1);
+}
+
 /* Helper function for noce_try_store_flag*.  */
 
 static rtx
@@ -1320,7 +1335,8 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
   && (REG_P (XEXP (a, 0))
 	  || (noce_operand_ok (XEXP (a, 0))
 	  && ! reg_overlap_mentioned_p (if_info->x, XEXP (a, 0
-  && if_info->branch_cost >= 2)
+  /* We need one instruction, the ADD of the store flag.  */
+  && noce_estimate_conversion_profitable_p (if_info, 1))
 {
   common = XEXP (a, 0);
   a = XEXP (a, 1);
@@ -1393,22 +1409,32 @@ noce_try_store_flag_constants (struct noce_if_info *if_info)
 	  else
 	gcc_unreachable ();
 	}
+  /* Is this (cond) ? 2^n : 0?  */
   else if (ifalse == 0 && exact_log2 (itrue) >= 0
 	   && (STORE_FLAG_VALUE == 1
-		   || if_info->branch_cost >= 2))
+		   /* We need ASHIFT, IOR.   */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	normalize = 1;
+  /* Is this (cond) ? 0 : 2^n?  */
   else if (itrue == 0 && exact_log2 (ifalse) >= 0 && can_reverse
-	   && (STORE_FLAG_VALUE == 1 || if_info->branch_cost >= 2))
+	   && (STORE_FLAG_VALUE == 1
+		   /* We need ASHIFT, IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 2)))
 	{
 	  normalize = 1;
 	  reversep = true;
 	}
+  /* Is this (cond) ? -1 : x?  */
   else if (itrue == -1
 	   && (STORE_FLAG_VALUE == -1
-		   || if_info->branch_cost >= 2))
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	normalize = -1;
+  /* Is this (cond) ? x : -1?  */
   else if (ifalse == -1 && can_reverse
-	   && (STORE_FLAG_VALUE == -1 || if_info->branch_cost >= 2))
+	   && (STORE_FLAG_VALUE == -1
+		   /* Just an IOR.  */
+		   || noce_estimate_conversion_profitable_p (if_info, 1)))
 	{
 	  normalize = -1;
 	  reversep = true;
@@ -1564,8 +1590,8 @@ noce_try_addcc (struct noce_if_info *if_info)
 	}
 
   /* If that fails, construct conditional increment or decrement using
-	 setcc.  */
-  if (if_info->branch_cost >= 2
+	 setcc.  We'd only need an ADD/SUB for this.  */
+  if (noce_estimate_conversion_profitable_p (if_info, 1)
 	  && (XEXP (if_info->a, 1) == const1_rtx
 	  || XEXP (if_info->a, 1) == constm1_rtx))
 {
@@ -1621,7 +1647,9 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 return FALSE;
 
   reversep = 0;
-  if ((if_info->branch_cost >= 2
+
+  /* One instruction, AND.  */
+  if 

Re: [PATCH][AArch64] Add initial support for Cortex-A73

2016-06-21 Thread Kyrill Tkachov

Hi all,

This is a rebase of https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00403.html
on top of Evandro's changes.
Also, to elaborate on the original posting, the initial tuning structure is 
based on the Cortex-A57 one
but with the issue rate set to 2, FMA steering turned off and ADRP+LDR fusion 
enabled.

Is this ok for trunk?

Thanks,
Kyrill

2016-06-21  Kyrylo Tkachov  

* config/aarch64/aarch64.c (cortexa73_tunings): New struct.
* config/aarch64/aarch64-cores.def (cortex-a73): New entry.
(cortex-a73.cortex-a35): Likewise.
(cortex-a73.cortex-a53): Likewise.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi (AArch64 Options): Document cortex-a73,
cortex-a73.cortex-a35 and cortex-a73.cortex-a53 arguments to
-mcpu and -mtune.
diff --git a/gcc/config/aarch64/aarch64-cores.def b/gcc/config/aarch64/aarch64-cores.def
index 251a3ebb9be82def8f257cbdcab440d7a51d478b..3bbf42504c528fc364af19f422ff79dc0f8b7cd8 100644
--- a/gcc/config/aarch64/aarch64-cores.def
+++ b/gcc/config/aarch64/aarch64-cores.def
@@ -44,6 +44,7 @@ AARCH64_CORE("cortex-a35",  cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AA
 AARCH64_CORE("cortex-a53",  cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa53, "0x41", "0xd03")
 AARCH64_CORE("cortex-a57",  cortexa57, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07")
 AARCH64_CORE("cortex-a72",  cortexa72, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08")
+AARCH64_CORE("cortex-a73",  cortexa73, cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09")
 AARCH64_CORE("exynos-m1",   exynosm1,  exynosm1,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, exynosm1,  "0x53", "0x001")
 AARCH64_CORE("qdf24xx", qdf24xx,   cortexa57, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, cortexa57, "0x51", "0x800")
 AARCH64_CORE("thunderx",thunderx,  thunderx,  8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC | AARCH64_FL_CRYPTO, thunderx,  "0x43", "0x0a1")
@@ -53,4 +54,5 @@ AARCH64_CORE("xgene1",  xgene1,xgene1,8A,  AARCH64_FL_FOR_ARCH8, xge
 
 AARCH64_CORE("cortex-a57.cortex-a53",  cortexa57cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa57, "0x41", "0xd07.0xd03")
 AARCH64_CORE("cortex-a72.cortex-a53",  cortexa72cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa72, "0x41", "0xd08.0xd03")
-
+AARCH64_CORE("cortex-a73.cortex-a35",  cortexa73cortexa35, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd04")
+AARCH64_CORE("cortex-a73.cortex-a53",  cortexa73cortexa53, cortexa53, 8A,  AARCH64_FL_FOR_ARCH8 | AARCH64_FL_CRC, cortexa73, "0x41", "0xd09.0xd03")
diff --git a/gcc/config/aarch64/aarch64-tune.md b/gcc/config/aarch64/aarch64-tune.md
index cbc6f4879edb2f3842a50dfafe206313d49e9cf8..392dfbd0d922007b2d245d168ab5cf95db2670b5 100644
--- a/gcc/config/aarch64/aarch64-tune.md
+++ b/gcc/config/aarch64/aarch64-tune.md
@@ -1,5 +1,5 @@
 ;; -*- buffer-read-only: t -*-
 ;; Generated automatically by gentune.sh from aarch64-cores.def
 (define_attr "tune"
-	"cortexa35,cortexa53,cortexa57,cortexa72,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53"
+	"cortexa35,cortexa53,cortexa57,cortexa72,cortexa73,exynosm1,qdf24xx,thunderx,xgene1,cortexa57cortexa53,cortexa72cortexa53,cortexa73cortexa35,cortexa73cortexa53"
 	(const (symbol_ref "((enum attr_tune) aarch64_tune)")))
diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 78140653a8b09e5789afe670dc2c2f22a3a94a08..43eaa272dc54f231d3f31eea0fdc0b288b0d3f61 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -548,6 +548,32 @@ static const struct tune_params cortexa72_tunings =
   (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
 };
 
+static const struct tune_params cortexa73_tunings =
+{
+  _extra_costs,
+  _addrcost_table,
+  _regmove_cost,
+  _vector_cost,
+  _branch_cost,
+  _approx_modes,
+  4, /* memmov_cost.  */
+  2, /* issue_rate.  */
+  (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
+   | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
+  16,	/* function_align.  */
+  8,	/* jump_align.  */
+  4,	/* loop_align.  */
+  2,	/* int_reassoc_width.  */
+  4,	/* fp_reassoc_width.  */
+  1,	/* vec_reassoc_width.  */
+  2,	/* min_div_recip_mul_sf.  */
+  2,	/* min_div_recip_mul_df.  */
+  0,	/* max_case_values.  */
+  0,	/* cache_line_size.  */
+  tune_params::AUTOPREFETCHER_WEAK,	/* autoprefetcher_model.  */
+  (AARCH64_EXTRA_TUNE_NONE)	/* tune_flags.  */
+};
+
 static const struct tune_params exynosm1_tunings =
 {
   _extra_costs,
diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index c05ce8f8f12f9419d68c9ab6ceb8d89310b6c077..92c34764fce31b6a6e59f740c1e2131692ac527c 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -13085,12 +13085,14 @@ processors 

[RFC: Patch 6/6 v2] Remove second cost model from noce_try_store_flag_mask

2016-06-21 Thread James Greenhalgh

Hi,

This transformation tries two cost models, one estimating the number
of insns to use, one estimating the RTX cost of the transformed sequence.
This is inconsistent with the other cost models used in ifcvt.c and
unnecessary - eliminate the second cost model.

Thanks,
James

---
2016-06-21  James Greenhalgh  

* ifcvt.c (noce_try_store_flag_mask): Delete redundant cost model.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 8f892b0..0cb8280 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -1668,9 +1668,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
   if (target)
 	{
-	  int old_cost, new_cost, insn_cost;
-	  int speed_p;
-
 	  if (target != if_info->x)
 	noce_emit_move_insn (if_info->x, target);
 
@@ -1678,15 +1675,6 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 	  if (!seq)
 	return FALSE;
 
-	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
-	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  /* TODO: Revisit this cost model.  */
-	  old_cost = if_info->max_seq_cost + insn_cost;
-	  new_cost = seq_cost (seq, speed_p);
-
-	  if (new_cost > old_cost)
-	return FALSE;
-
 	  emit_insn_before_setloc (seq, if_info->jump,
    INSN_LOCATION (if_info->insn_a));
 	  if_info->transform_name = "noce_try_store_flag_mask";


[RFC: Patch 5/6 v2] Improve the cost model for multiple-sets

2016-06-21 Thread James Greenhalgh

Hi,

This patch rewrites the cost model for bb_ok_for_noce_convert_multiple_sets
to use the max_seq_cost heuristic added in earlier patch revisions.

As with the previous patch, I've used the new parameters to ensure that
the testsuite is still testing the functionality rather than relying on
the target setting the costs appropriately.

Thanks,
James

---
gcc/

2016-06-21  James Greenhalgh  

* ifcvt.c (noce_convert_multiple_sets): Move cost model to here,
check the sequence cost after constructing the converted sequence.
(bb_ok_for_noce_convert_multiple_sets): Move cost model.

gcc/testsuite/

2016-06-21  James Greenhalgh  

* gcc.dg/ifcvt-4.c: Use parameter to guide if-conversion heuristics.
* gcc.dg/ifcvt-5.c: Use parameter to guide if-conversion heuristics.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 78906d3..8f892b0 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -3191,6 +3191,7 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   rtx_insn *jump = if_info->jump;
   rtx_insn *cond_earliest;
   rtx_insn *insn;
+  bool speed_p = optimize_bb_for_speed_p (if_info->test_bb);
 
   start_sequence ();
 
@@ -3273,9 +3274,17 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
   for (int i = 0; i < count; i++)
 noce_emit_move_insn (targets[i], temporaries[i]);
 
-  /* Actually emit the sequence.  */
+  /* Actually emit the sequence if it isn't too expensive.  */
   rtx_insn *seq = get_insns ();
 
+  /*  Check the cost model to ensure this is profitable.  */
+  if (seq_cost (seq, speed_p)
+  > if_info->max_seq_cost)
+{
+  end_sequence ();
+  return FALSE;
+}
+
   for (insn = seq; insn; insn = NEXT_INSN (insn))
 set_used_flags (insn);
 
@@ -3325,23 +3334,16 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
 
 /* Return true iff basic block TEST_BB is comprised of only
(SET (REG) (REG)) insns suitable for conversion to a series
-   of conditional moves.  FORNOW: Use II to find the expected cost of
-   the branch into/over TEST_BB.
-
-   TODO: This creates an implicit "magic number" for if conversion.
-   II->max_seq_cost now guides the maximum number of set instructions in
-   a basic block which is considered profitable to completely
-   if-convert.  */
+   of conditional moves.  Also check that we have more than one set
+   (other routines can handle a single set better than we would), and
+   fewer than PARAM_MAX_RTL_IF_CONVERSION_INSNS sets.  */
 
 static bool
-bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
-  struct noce_if_info *ii)
+bb_ok_for_noce_convert_multiple_sets (basic_block test_bb)
 {
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  /* TODO:  Revisit this cost model.  */
-  unsigned limit = MIN (ii->max_seq_cost / COSTS_N_INSNS (1), param);
 
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -3377,14 +3379,15 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
   if (!can_conditionally_move_p (GET_MODE (dest)))
 	return false;
 
-  /* FORNOW: Our cost model is a count of the number of instructions we
-	 would if-convert.  This is suboptimal, and should be improved as part
-	 of a wider rework of branch_cost.  */
-  if (++count > limit)
-	return false;
+  count++;
 }
 
-  return count > 1;
+  /* If we would only put out one conditional move, the other strategies
+ this pass tries are better optimized and will be more appropriate.
+ Some targets want to strictly limit the number of conditional moves
+ that are emitted, they set this through PARAM, we need to respect
+ that.  */
+  return count > 1 && count <= param;
 }
 
 /* Given a simple IF-THEN-JOIN or IF-THEN-ELSE-JOIN block, attempt to convert
@@ -3420,7 +3423,7 @@ noce_process_if_block (struct noce_if_info *if_info)
   if (!else_bb
   && HAVE_conditional_move
   && !HAVE_cc0
-  && bb_ok_for_noce_convert_multiple_sets (then_bb, if_info))
+  && bb_ok_for_noce_convert_multiple_sets (then_bb))
 {
   if (noce_convert_multiple_sets (if_info))
 	{
diff --git a/gcc/testsuite/gcc.dg/ifcvt-4.c b/gcc/testsuite/gcc.dg/ifcvt-4.c
index 319b583..0d1671c 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-4.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-4.c
@@ -1,4 +1,4 @@
-/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3" } */
+/* { dg-options "-fdump-rtl-ce1 -O2 --param max-rtl-if-conversion-insns=3 --param max-rtl-if-conversion-unpredictable-cost=100" } */
 /* { dg-additional-options "-misel" { target { powerpc*-*-* } } } */
 /* { dg-skip-if "Multiple set if-conversion not guaranteed on all subtargets" { "arm*-*-* hppa*64*-*-* visium-*-*" } }  */
 
diff --git a/gcc/testsuite/gcc.dg/ifcvt-5.c b/gcc/testsuite/gcc.dg/ifcvt-5.c
index 818099a..d2a9476 100644
--- a/gcc/testsuite/gcc.dg/ifcvt-5.c
+++ b/gcc/testsuite/gcc.dg/ifcvt-5.c
@@ -1,7 +1,8 @@
 /* Check that 

Re: RFC: pass to warn on questionable uses of alloca().

2016-06-21 Thread Martin Sebor

On 06/20/2016 03:41 PM, Jeff Law wrote:

On 06/20/2016 08:56 AM, Joseph Myers wrote:

On Sat, 18 Jun 2016, Martin Sebor wrote:


the function regardless of the value of its argument).  At
the same time, it seems that an even more reliable solution
than pointing out potentially unsafe calls to the function
and relying on users to modify their code to use malloc for
large/unbounded allocations would be to let GCC do it for
them automatically (i.e., in response to some other option,
emit a call to malloc instead, and insert a call to free when
appropriate).


Note that such an option would not be usable for the original motivating
case of glibc, because in code that's meant to be async-signal-safe,
alloca and VLAs can be used, but malloc cannot.

I've actually considered the other direction more viable.  Define the
right set of constraints and let the compiler optimize from malloc/free
to alloca.

For the uses of alloca in glibc that have to be async-signal-safe, we
should just leave those alone no matter what we may or may not be able
to prove.


Would a new attribute to annotate async-signal safe functions
help?  I envision that the attribute on a function definition
would turn off the alloca/VLA to malloc transformation, and
could also diagnose calls to other functions whose declarations
were not also declared async-signal safe with the same
attribute.

Martin


[RFC: Patch 3/6 v2] Remove if_info->branch_cost

2016-06-21 Thread James Greenhalgh

Hi,

This patch removes what is left of branch_cost uses, moving them to use
the new hook and tagging each left over spot with a TODO to revisit them.
All these uses are in rtx costs units, so we don't have more work to do at
this point.

Bootstrapped as part of the patch series on aarch64 and x86-64.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  

* ifcvt.c (noce_if_info): Remove branch_cost.
(noce_try_store_flag_mask): Use max_seq_cost rather than
branch_cost, tag as a TODO.
(noce_try_cmove_arith): Likewise.
(noce_convert_multiple_sets): Likewise.
(bb_ok_for_noce_convert_multiple_sets): Likewise.
(noce_find_if_block): Remove set of branch_cost.

diff --git a/gcc/ifcvt.c b/gcc/ifcvt.c
index 0b97114..f4ad037 100644
--- a/gcc/ifcvt.c
+++ b/gcc/ifcvt.c
@@ -811,9 +811,6 @@ struct noce_if_info
   unsigned int then_cost;
   unsigned int else_cost;
 
-  /* Estimated cost of the particular branch instruction.  */
-  unsigned int branch_cost;
-
   /* Maximum permissible cost for the unconditional sequence we should
  generate to replace this branch.  */
   unsigned int max_seq_cost;
@@ -1683,7 +1680,8 @@ noce_try_store_flag_mask (struct noce_if_info *if_info)
 
 	  speed_p = optimize_bb_for_speed_p (BLOCK_FOR_INSN (if_info->insn_a));
 	  insn_cost = insn_rtx_cost (PATTERN (if_info->insn_a), speed_p);
-	  old_cost = COSTS_N_INSNS (if_info->branch_cost) + insn_cost;
+	  /* TODO: Revisit this cost model.  */
+	  old_cost = if_info->max_seq_cost + insn_cost;
 	  new_cost = seq_cost (seq, speed_p);
 
 	  if (new_cost > old_cost)
@@ -2159,7 +2157,9 @@ noce_try_cmove_arith (struct noce_if_info *if_info)
 
   /* We're going to execute one of the basic blocks anyway, so
  bail out if the most expensive of the two blocks is unacceptable.  */
-  if (MAX (then_cost, else_cost) > COSTS_N_INSNS (if_info->branch_cost))
+
+  /* TODO: Revisit cost model.  */
+  if (MAX (then_cost, else_cost) > if_info->max_seq_cost)
 return FALSE;
 
   /* Possibly rearrange operands to make things come out more natural.  */
@@ -3341,8 +3341,8 @@ noce_convert_multiple_sets (struct noce_if_info *if_info)
of conditional moves.  FORNOW: Use II to find the expected cost of
the branch into/over TEST_BB.
 
-   TODO: This creates an implicit "magic number" for branch_cost.
-   II->branch_cost now guides the maximum number of set instructions in
+   TODO: This creates an implicit "magic number" for if conversion.
+   II->max_seq_cost now guides the maximum number of set instructions in
a basic block which is considered profitable to completely
if-convert.  */
 
@@ -3353,7 +3353,8 @@ bb_ok_for_noce_convert_multiple_sets (basic_block test_bb,
   rtx_insn *insn;
   unsigned count = 0;
   unsigned param = PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_INSNS);
-  unsigned limit = MIN (ii->branch_cost, param);
+  /* TODO:  Revisit this cost model.  */
+  unsigned limit = MIN (ii->max_seq_cost / COSTS_N_INSNS (1), param);
 
   FOR_BB_INSNS (test_bb, insn)
 {
@@ -4070,8 +4071,6 @@ noce_find_if_block (basic_block test_bb, edge then_edge, edge else_edge,
   if_info.cond_earliest = cond_earliest;
   if_info.jump = jump;
   if_info.then_else_reversed = then_else_reversed;
-  if_info.branch_cost = BRANCH_COST (optimize_bb_for_speed_p (test_bb),
- predictable_edge_p (then_edge));
   if_info.max_seq_cost
 = targetm.max_noce_ifcvt_seq_cost (optimize_bb_for_speed_p (test_bb),
    then_edge);


[RFC: Patch 1/6 v2] New target hook: max_noce_ifcvt_seq_cost

2016-06-21 Thread James Greenhalgh

On Fri, Jun 03, 2016 at 12:39:42PM +0200, Richard Biener wrote:
> On Thu, Jun 2, 2016 at 6:53 PM, James Greenhalgh
>  wrote:
> >
> > Hi,
> >
> > This patch introduces a new target hook, to be used like BRANCH_COST but
> > with a guaranteed unit of measurement. We want this to break away from
> > the current ambiguous uses of BRANCH_COST.
> >
> > BRANCH_COST is used in ifcvt.c in two types of comparisons. One against
> > instruction counts - where it is used as the limit on the number of new
> > instructions we are permitted to generate. The other (after multiplying
> > by COSTS_N_INSNS (1)) directly against RTX costs.
> >
> > Of these, a comparison against RTX costs is the more easily understood
> > metric across the compiler, and the one I've pulled out to the new hook.
> > To keep things consistent for targets which don't migrate, this new hook
> > has a default value of BRANCH_COST * COSTS_N_INSNS (1).
> >
> > OK?
>
> How does the caller compute "predictable"?  There are some archs where
> an information on whether this is a forward or backward jump is more
> useful I guess.  Also at least for !speed_p the distance of the branch is
> important given not all targets support arbitrary branch offsets.

Just through a call to predictable_edge_p. It isn't perfect. My worry
with adding more details of the branch is that you end up with a nonsense
target implementation that tries way too hard to be clever. But, I don't
mind passing the edge through to the target hook, that way a target has
it if they want it. In this patch revision, I pass the edge through.

> I remember that at the last Cauldron we discussed to change things to
> compare costs of sequences of instructions rather than giving targets no
> context with just asking for single (sub-)insn rtx costs.

I've made better use of seq_cost in this respin. Bernd was right,
constructing dummy RTX just for costs, then discarding it, then
constructing the actual RTX for matching doesn't make sense as a pipeline.
Better just to construct the real sequence and use the cost of that.

In this patch revision, I started by removing the idea that this costs
a branch at all. It doesn't: the use of this hook is really a target
trying to limit if-conversion so that it does not pull too much onto the
unconditional path. It seems better to expose that limit directly by
explicitly asking for the maximum cost of an unconditional sequence we
would create, and comparing against seq_cost of the new RTL. This saves
a target trying to figure out what is meant by a cost of a branch.

Having done that, I think I can see a clearer path to getting the
default hook implementation in shape. I've introduced two new params,
which give maximum costs for the generated sequence (one for a "predictable"
branch, one for "unpredictable") in the speed_p cases. I'm not expecting it
to be useful to give the user control in the case we are compiling for
size - whether this is a size win or not is independent of whether the
branch is predictable.

For the default implementation, if the parameters are not set, I just
multiply BRANCH_COST through by COSTS_N_INSNS (1) for size and
COSTS_N_INSNS (3) for speed. I know this is not ideal, but I'm still short
of ideas on how best to form the default implementation. This means we're
still potentially going to introduce performance regressions for targets
that don't provide an implementation of the new hook, or a default value
for the new parameters. It does mean we can keep the testsuite clean by
setting parameter values suitably high for all targets that have
conditional move instructions.
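
Concretely, the default implementation described above has roughly this
shape (a sketch rather than the exact patch text; the param names follow
the ChangeLog below):

  unsigned int
  default_max_noce_ifcvt_seq_cost (bool speed_p, edge e)
  {
    bool predictable_p = predictable_edge_p (e);

    if (speed_p)
      {
        /* Honour the user-visible params when set; the check for
           "param was set by the user" is elided in this sketch.  */
        unsigned param = predictable_p
          ? PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST)
          : PARAM_VALUE (PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST);
        if (param)
          return param;
        return BRANCH_COST (true, predictable_p) * COSTS_N_INSNS (3);
      }

    /* For size, predictability is irrelevant, as noted above.  */
    return BRANCH_COST (false, predictable_p) * COSTS_N_INSNS (1);
  }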

The new default causes some changes in generated conditional move sequences
for x86_64. Whether these changes are for the better or not I can't say.

This first patch introduces the two new parameters, and uses them in the
default implementation of the target hook.

Bootstrapped on x86_64 and aarch64 with no issues.

OK?

Thanks,
James

---
2016-06-21  James Greenhalgh  

* target.def (max_noce_ifcvt_seq_cost): New.
* doc/tm.texi.in (TARGET_MAX_NOCE_IFCVT_SEQ_COST): Document it.
* doc/tm.texi: Regenerate.
* targhooks.h (default_max_noce_ifcvt_seq_cost): New.
* targhooks.c (default_max_noce_ifcvt_seq_cost): New.
* params.def (PARAM_MAX_RTL_IF_CONVERSION_PREDICTABLE_COST): New.
(PARAM_MAX_RTL_IF_CONVERSION_UNPREDICTABLE_COST): Likewise.
* doc/invoke.texi: Document new params.

diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
index e000218..b71968f 100644
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -8816,6 +8816,17 @@ considered for if-conversion.  The default is 10, though the compiler will
 also use other heuristics to decide whether if-conversion is likely to be
 profitable.
 
+@item max-rtl-if-conversion-predictable-cost
+@item max-rtl-if-conversion-unpredictable-cost
+RTL if-conversion tries to remove conditional branches around a block and

Re: Implement C _FloatN, _FloatNx types

2016-06-21 Thread Joseph Myers
On Tue, 21 Jun 2016, Bill Schmidt wrote:

> I haven't read through the patch in detail yet, but thank you for the 
> very thorough description!  For PowerPC, we currently have a lot 
> invested in having __float128 correspond to KFmode for the time being, 
> during the transition while supporting both IEEE-128 and IBM long 
> double.  (I was not aware of the problem with q-suffix constants being 
> treated as TFmode, which will need to be fixed.)  I'm currently in 
> process of developing the minimal set of *f128 built-ins required for 
> glibc, using __float128 and KFmode.
> 
> Can we please keep this correspondence for now?  Do you foresee any 
> further concerns with staying on that path until we can fully throw the 
> switch to move away from IBM long double?

The question is what mode is used for _Float128 (and thus for __float128, 
which is now a typedef for _Float128) in the FLOAT128_IEEE_P (TFmode) case 
(that is, when TARGET_IEEEQUAD is true).  I see no problems with that mode 
being KFmode instead of TFmode if that's what's preferred; I'll make that 
change in the next revision of my patch.  Of course, you should make *q 
constants match, so that they are consistent with *f128 constants.
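
For illustration (my reading of the requirement, not patch text):

  /* Both constants below should end up with the same type and mode,
     e.g. KFmode when TARGET_IEEEQUAD as discussed above.  */
  __float128 a = 1.0q;      /* GNU *q suffix */
  _Float128  b = 1.0f128;   /* *f128 suffix */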

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING] [PATCH] Fix ICE with invalid use of flags output operand

2016-06-21 Thread Bernd Edlinger
Oh, yes.

Uros...  ?


Thanks
Bernd.

On 06/21/2016 05:04 PM, Jeff Law wrote:
> On 06/21/2016 02:09 AM, Bernd Edlinger wrote:
>> Ping...
>>
>> for this patch: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00871.html
>>
>> I'd say it's a no-brainer...
> You might want to contact Uros directly.  He does the most with the x86
> backend these days.
>
> jeff
>


Re: Implement C _FloatN, _FloatNx types

2016-06-21 Thread Joseph Myers
On Tue, 21 Jun 2016, FX wrote:

> > Fortran note: I took the conservative approach of renaming the
> > float128_type_node used in the Fortran front end, since I wasn't sure
> > if it's safe to make the front end always use the language-independent
> > node (which follows C rules - thus, being distinct from long double
> > even if that has binary128 format).
> 
> In the Fortran front-end, float128_type_node is defined as the 
> (expectedly unique) floating-point type for which mode_precision is 
> equal to 128 but not equal to LONG_DOUBLE_TYPE_SIZE. That is, it is 
> guaranteed that float128_type_node is not long_double_type_node.
> 
> In fact, if there is a long double type with precision of 128, then 
> float128_type_node is NULL.
> 
> Would that match the (new) C behavior? I think it does.

Precision is a poorly defined concept in this context.

Precision of a type is meant to be the number of value bits, which is 128 
not just for binary128 but also for IBM long double.  I think the same 
applies to precision of a mode.  See Richard's comments in 
 on how the use 
of fractional float modes for IFmode and KFmode is a lie, because those 
modes both have 128 significant bits, not 106 and 113.  To avoid the 
misleading mode precision values becoming misleading type precision 
values, my patch includes a workaround in build_common_tree_nodes to give 
precedence to the N in binaryN (although it occurs to me that this is not 
in fact a sufficient workaround, because _Float64x needs such a fix as 
well) - that keeps _Float128 having type precision 128 in that case, like 
ieee128_float_type_node in the PowerPC back end does.

Language-independent float128_type_node has the following properties: it 
is always of binary128 format, never of another format that might have 
precision 128 (such as IBM long double).  If binary128 is supported, 
float128_type_node is non-NULL, whether or not long double also has 
precision 128.  That does not seem to match your semantics for Fortran in 
the case where long double is binary128, as for Fortran you'd have NULL 
but the language-independent node would be a distinct type from long 
double, not NULL.

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: Implement C _FloatN, _FloatNx types

2016-06-21 Thread FX
> Fortran note: I took the conservative approach of renaming the
> float128_type_node used in the Fortran front end, since I wasn't sure
> if it's safe to make the front end always use the language-independent
> node (which follows C rules - thus, being distinct from long double
> even if that has binary128 format).

In the Fortran front-end, float128_type_node is defined as the (expectedly 
unique) floating-point type for which mode_precision is equal to 128 but not 
equal to LONG_DOUBLE_TYPE_SIZE. That is, it is guaranteed that 
float128_type_node is not long_double_type_node.

In fact, if there is a long double type with precision of 128, then 
float128_type_node is NULL.

Would that match the (new) C behavior? I think it does.

FX

Re: [PING] [PATCH] Fix ICE with invalid use of flags output operand

2016-06-21 Thread Jeff Law

On 06/21/2016 02:09 AM, Bernd Edlinger wrote:

Ping...

for this patch: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00871.html

I'd say it's a no-brainer...
You might want to contact Uros directly.  He does the most with the x86 
backend these days.


jeff



Re: [PATCH] update-copyright.py: Retain file mode

2016-06-21 Thread Jeff Law

On 06/21/2016 08:14 AM, Bernhard Reutner-Fischer wrote:

Hi!

Ok for trunk?

thanks,

contrib/ChangeLog

2016-06-21  Bernhard Reutner-Fischer  

* update-copyright.py (Copyright.process_file): Retain original
file mode.

OK.
jeff



[Committed][testsuite] Fix vect-8.f90 test

2016-06-21 Thread Wilco Dijkstra
Due to recent improvements to the vectorizer, the expected number of
vectorized loops in gfortran.dg/vect/vect-8.f90 needs to be increased to 21.

Confirmed this test now passes on AArch64.

Committed as a trivial patch in r237650.

ChangeLog:
2016-06-21  Wilco Dijkstra  

* gfortran.dg/vect/vect-8.f90: Set "vectorized loops" to 21.
--
diff --git a/gcc/testsuite/gfortran.dg/vect/vect-8.f90 
b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
index 
1b85a6152a40bec5399971ea65dff85fc604230a..865a47725de3df6e0aaabd1466c910d3177d5c08
 100644
--- a/gcc/testsuite/gfortran.dg/vect/vect-8.f90
+++ b/gcc/testsuite/gfortran.dg/vect/vect-8.f90
@@ -703,4 +703,4 @@ CALL track('KERNEL  ')
 RETURN
 END SUBROUTINE kernel
 
-! { dg-final { scan-tree-dump-times "vectorized 20 loops" 1 "vect" } }
+! { dg-final { scan-tree-dump-times "vectorized 21 loops" 1 "vect" } }



Re: [PATCH 0/6] remove some usage of rtx_{insn,expr}_list

2016-06-21 Thread Jeff Law

On 06/21/2016 08:47 AM, Trevor Saunders wrote:

On Mon, Jun 20, 2016 at 06:52:35PM +0200, Bernd Schmidt wrote:

On 06/20/2016 12:22 PM, tbsaunde+...@tbsaunde.org wrote:

In theory I would expect if anything this helps performance since it isn't
necessary to malloc every time a node is added, however the data is less clear.


Well, we have alloc pools for these lists, so a malloc is not needed for
every node.


its true, and lists.c has its own special cache, but still it is more
than storing a pointer and incrementing the length I expect.
Yea, it's got that cache.  IIRC that was a fairly minor optimization.  I 
wouldn't lose any sleep if it went away.  IIRC it was to reduce the cost 
of the lists we build up and tear down gcse.c.  It may have even been 
limited to load/store motion, in which case I really don't think the 
cache is all that important.




jeff



Re: [PATCH 6/6] loop-iv.c: make cond_list a vec

2016-06-21 Thread Trevor Saunders
On Mon, Jun 20, 2016 at 09:11:21PM +0100, Richard Sandiford wrote:
> tbsaunde+...@tbsaunde.org writes:
> > diff --git a/gcc/loop-iv.c b/gcc/loop-iv.c
> > index 57fb8c1..21c3180 100644
> > --- a/gcc/loop-iv.c
> > +++ b/gcc/loop-iv.c
> > @@ -1860,7 +1860,6 @@ simplify_using_initial_values (struct loop *loop, 
> > enum rtx_code op, rtx *expr)
> >  {
> >bool expression_valid;
> >rtx head, tail, last_valid_expr;
> > -  rtx_expr_list *cond_list;
> >rtx_insn *insn;
> >rtx neutral, aggr;
> >regset altered, this_altered;
> > @@ -1936,7 +1935,7 @@ simplify_using_initial_values (struct loop *loop, 
> > enum rtx_code op, rtx *expr)
> >  
> >expression_valid = true;
> >last_valid_expr = *expr;
> > -  cond_list = NULL;
> > +  auto_vec<rtx> cond_list;
> >while (1)
> >  {
> >insn = BB_END (e->src);
> 
> How about using "auto_vec<rtx, N>" for some small N, since we expect
> cond_list to be used fairly often?

sure, why not?

> > @@ -1988,39 +1988,30 @@ simplify_using_initial_values (struct loop *loop, 
> > enum rtx_code op, rtx *expr)
> >  
> >   if (suitable_set_for_replacement (insn, , ))
> > {
> > - rtx_expr_list **pnote, **pnote_next;
> > -
> >   replace_in_expr (expr, dest, src);
> >   if (CONSTANT_P (*expr))
> > goto out;
> >  
> > - for (pnote = &cond_list; *pnote; pnote = pnote_next)
> > + unsigned int len = cond_list.length ();
> > + for (unsigned int i = len - 1; i < len; i--)
> > {
> > - rtx_expr_list *note = *pnote;
> > - rtx old_cond = XEXP (note, 0);
> > + rtx old_cond = cond_list[i];
> >  
> > - pnote_next = (rtx_expr_list **) &XEXP (note, 1);
> > - replace_in_expr (&XEXP (note, 0), dest, src);
> > + replace_in_expr (&cond_list[i], dest, src);
> >  
> >   /* We can no longer use a condition that has been simplified
> >  to a constant, and simplify_using_condition will abort if
> >  we try.  */
> > - if (CONSTANT_P (XEXP (note, 0)))
> > -   {
> > - *pnote = *pnote_next;
> > - pnote_next = pnote;
> > - free_EXPR_LIST_node (note);
> > -   }
> > + if (CONSTANT_P (cond_list[i]))
> > +   cond_list.ordered_remove (i);
> 
> Do we really need ordered removes here and below?  Obviously it turns
> the original O(1) operation into O(n), and it wasn't obvious from first
> glance that the order of the conditions was relevant.

I'm not sure, but I certainly don't know that we don't need them.  I
kind of meant to not send this patch because of that question but then
forgot.  I'm not really sure what to do with these; I don't know that I
know what's going on well enough to prove unordered removes are fine,
but I guess I can try.
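
For what it's worth, if the order does turn out not to matter, the O(1)
alternative is already in the vec.h API:

  /* Moves the last element into slot i instead of shifting the tail.  */
  cond_list.unordered_remove (i);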

Trev

> 
> Thanks,
> Richard


Re: [PATCH 0/6] remove some usage of rtx_{insn,expr}_list

2016-06-21 Thread Trevor Saunders
On Mon, Jun 20, 2016 at 06:52:35PM +0200, Bernd Schmidt wrote:
> On 06/20/2016 12:22 PM, tbsaunde+...@tbsaunde.org wrote:
> > In theory I would expect if anything this helps performance since it isn't
> > necessary to malloc every time a node is added, however the data is less 
> > clear.
> 
> Well, we have alloc pools for these lists, so a malloc is not needed for
> every node.

its true, and lists.c has its own special cache, but still it is more
than storing a pointer and incrementing the length I expect.

> > fold const O2 new
> > real0m5.034s
> > user0m3.408s
> > sys 0m0.364s
> > 
> > fold const O2 old
> > real0m4.012s
> > user0m3.420s
> > sys 0m0.340s
> 
> So that's a second more in real time - was the machine very busy at the time
> you ran these tests so that these aren't meaningful, or is there a need to
> investigate this?

Well, it was on my laptop which was running a web browser and stuff.  I
wasn't aware of it being busy, but it also wasn't an especially stable
machine.  I also noticed a bit of variance within the same configuration
so I'm not terribly concerned, but it is odd.

> > So a couple got about .3s slower, and others got about .1 faster, I'm not
> > really sure but inclined to say any change is too small to easily measure.
> > 
> > bootstrapped + regtested patches individually on x86_64-linux-gnu, ok?
> 
> Modulo the question about compile times I think patches 1-4 are ok.  In 5 and
> 6 I see explicit for loops instead of FOR_EACH macros; I'm curious as to the
> reason.

uh, I suck and wasn't careful enough checking that I fixed everything, and
I guess it was easy to forget since I've been distracted by other stuff.
Sorry about that.

Trev

> 
> Bernd
> 


Re: [Patch AArch64 2/2]Add missing vcond by rewriting it with vcond_mask/vec_cmp patterns.

2016-06-21 Thread James Greenhalgh
On Wed, Jun 15, 2016 at 09:22:20AM +, Bin Cheng wrote:
> +  rtx mask = gen_reg_rtx (mode);
> +  enum rtx_code code = GET_CODE (operands[3]);
> +
> +  emit_insn (gen_vec_cmp_internal (mask, operands[3],
> +operands[4], operands[5]));
> +  /* See comments of vec_cmp_internal, the opposite
> + result masks are computed for below operators, we need to invert
> + the mask here.  In this case we can save an inverting instruction
> + by simply swapping the two operands to bsl.  */
> +  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
> +  || code == UNGT || code == UNGE || code == UNORDERED)
> +std::swap (operands[1], operands[2]);

With regard to my comments on patch 1/2: do you not get the same code-gen
if you change those functions to always generate the correct mask, and add
a second invert here before swapping the operands?  The two mask inverts
should simplify to nothing, and you'd clean up the design.
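
(Concretely, the suggestion seems to amount to something like the
following in the vcond expander - a sketch, not tested; the
mode-parameterized gen_* name is elided as in the quoted patch, and
the extra inversion is shown via expand_unop:)

  emit_insn (gen_vec_cmp_internal (mask, operands[3],
                                   operands[4], operands[5]));
  if (code == NE || code == UNEQ || code == UNLT || code == UNLE
      || code == UNGT || code == UNGE || code == UNORDERED)
    {
      /* Second inversion: paired with an internal pattern that now
         always computes the true mask, the back-to-back one's
         complements should simplify away.  */
      mask = expand_unop (GET_MODE (mask), one_cmpl_optab, mask, mask, 1);
      std::swap (operands[1], operands[2]);
    }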

Thanks,
James



Re: [Patch AArch64 1/2]Implement vcond_mask/vec_cmp patterns.

2016-06-21 Thread James Greenhalgh
On Wed, Jun 15, 2016 at 09:21:29AM +, Bin Cheng wrote:
> Hi,
> According to review comments, I split the original patch @
> https://gcc.gnu.org/ml/gcc-patches/2016-05/msg01182.html into two, as well as
> refined the comments.  Here is the first one implementing vcond_mask/vec_cmp
> patterns on AArch64.  These new patterns will be used in the second patch for
> vcond.
> 
> +;; Patterns comparing two vectors to produce a mask.
> +
> +;; Internal pattern for vec_cmp.  It returns expected result mask for
> +;; comparison operators other than NE.  For NE operator, it returns
> +;; the opposite result mask.  This is intended behavior so that we
> +;; can save one mask inverting instruction when using this pattern in
> +;; vcond patterns.  In this case, it is the caller's responsibility
> +;; to interpret and use the result mask correctly.

I'm not convinced by this design at all. Having a function that sometimes
generates the inverse of what you expect is difficult to follow, and I'm
not going to OK it unless it is really the only way of implementing these
hooks.

Can we not rely on the compiler spotting that it can simplify two
one's complement operations that appear beside each other?

The integer case needing negation of NE is almost possible to follow, but
the float unordered/UNGE/etc. cases become very, very difficult to reason
about, particularly seeing UNGT fall through to UNLT.

> +;; Internal pattern for vec_cmp.  Similar to vec_cmp,
> +;; it returns the opposite result mask for operators NE, UNEQ, UNLT,
> +;; UNLE, UNGT, UNGE and UNORDERED.  This is intended behavior so that
> +;; we can save one mask inverting instruction when using this pattern
> +;; in vcond patterns.  In these cases, it is the caller's responsibility
> +;; to interpret and use the result mask correctly.
> +(define_expand "vec_cmp_internal"
> +  [(set (match_operand: 0 "register_operand")
> + (match_operator 1 "comparison_operator"
> + [(match_operand:VDQF 2 "register_operand")
> +  (match_operand:VDQF 3 "nonmemory_operand")]))]
> +  "TARGET_SIMD"
> +{
> +  int use_zero_form = 0;
> +  enum rtx_code code = GET_CODE (operands[1]);
> +  rtx tmp = gen_reg_rtx (mode);
> +
> +  rtx (*comparison) (rtx, rtx, rtx);
> +
> +  if (operands[3] == CONST0_RTX (mode)
> +  && (code == LE || code == LT || code == GE || code == GT || code == 
> EQ))
> +{
> +  /* Some instructions have a form taking an immediate zero.  */
> +  use_zero_form = 1;
> +}
> +  else if (!REG_P (operands[3]))
> +{
> +  /* Make sure we can handle the last operand.  */
> +  operands[3] = force_reg (mode, operands[3]);
> +}
> +
> +  switch (code)
> +{
> +case LT:
> +  if (use_zero_form)
> + {
> +   comparison = gen_aarch64_cmlt;
> +   break;
> + }
> +  /* Else, fall through.  */
> +case UNGE:
> +  std::swap (operands[2], operands[3]);
> +  /* Fall through.  */
> +case UNLE:
> +case GT:
> +  comparison = gen_aarch64_cmgt;
> +  break;
> +case LE:
> +  if (use_zero_form)
> + {
> +   comparison = gen_aarch64_cmle;
> +   break;
> + }
> +  /* Else, fall through.  */
> +case UNGT:
> +  std::swap (operands[2], operands[3]);
> +  /* Fall through.  */
> +case UNLT:
> +case GE:
> +  comparison = gen_aarch64_cmge;
> +  break;
> +case NE:
> +case EQ:
> +  comparison = gen_aarch64_cmeq;
> +  break;
> +case UNEQ:
> +case ORDERED:
> +case UNORDERED:
> +  break;
> +default:
> +  gcc_unreachable ();
> +}
> +
> +  switch (code)
> +{
> +case UNGT:
> +case UNGE:
> +case NE:
> +case UNLT:
> +case UNLE:
> +  /* FCM returns false for lanes which are unordered, so if we use
> +  the inverse of the comparison we actually want to emit, then
> +  revert the result, we will end up with the correct result.
> +  Note that a NE NaN and NaN NE b are true for all a, b.
> +
> +  Our transformations are:
> +  a GE b -> !(b GT a)
> +  a GT b -> !(b GE a)
> +  a LE b -> !(a GT b)
> +  a LT b -> !(a GE b)
> +  a NE b -> !(a EQ b)
> +
> +  See the comment at the beginning of this pattern: we return the
> +  opposite of the result mask for these operators, and it's the
> +  caller's responsibility to invert the mask.

These comments don't fit with the code (which does nothing other than
fall through).

> +
> +  Fall through.  */
> +case LT:
> +case LE:
> +case GT:
> +case GE:
> +case EQ:
> +  /* The easy case.  Here we emit one of FCMGE, FCMGT or FCMEQ.
> +  As a LT b <=> b GE a && a LE b <=> b GT a.  Our transformations are:
> +  a GE b -> a GE b
> +  a GT b -> a GT b
> +  a LE b -> b GE a
> +  a LT b -> b GT a
> +  a EQ b -> a EQ b
> +  Note that there also exist direct comparison against 0 forms,
> +  so catch those as a special case.  */
> +
> +  emit_insn (comparison 

Re: [PATCH/AARCH64] Accept vulcan as a cpu name for the AArch64 port of GCC

2016-06-21 Thread Virendra Pathak
Hi James,

> This patch is OK for trunk.
Thank you for the review and merging the patch to trunk.

> I couldn't spot your name in the MAINTAINERS file, so I've applied this
> on your behalf as revision 237645.
My name is not present in the MAINTAINERS file. This was my first
patch in GCC :-)
In the future I will request the aarch64 maintainers to apply the patches
on my behalf after review (to avoid any confusion).

Thanks.

with regards,
Virendra Pathak

On Tue, Jun 21, 2016 at 7:16 PM, James Greenhalgh
 wrote:
>
> On Sat, Jun 18, 2016 at 01:57:43AM +0530, Virendra Pathak wrote:
> > Hi,
> >
> > Please find the patch for introducing vulcan as a cpu name for the
> > AArch64 port of GCC.
> > Broadcom's vulcan is an armv8.1-a aarch64 server processor.
> >
> > Since vulcan is the first armv8.1-a processor to be introduced in
> > aarch64-cores.def,
> > I have created a new section in the file for the armv8.1 based processors.
> > Kindly let me know if that is okay.
> >
> > Tested the patch with cross aarch64-linux-gnu, bootstrapped native
> > aarch64-unknown-linux-gnu
> > and make check (gcc, ld, gas, binutils, gdb).
> > No new regression failure is added by this patch.
> >
> > In addition, tested -mcpu=vulcan -mtune=vulcan flags by passing them
> > via command line.
> > Also verified that the above flags pass the armv8.1-a option to the assembler (as).
> >
> > At present we are using the schedule & cost model of cortex-a57, but
> > soon we will be submitting one for vulcan.
> >
> > Please review the patch.
> > Ok for trunk?
> >
>
> This patch is OK for trunk.
>
> I couldn't spot your name in the MAINTAINERS file, so I've applied this
> on your behalf as revision 237645.
>
> Thank you for the patch.
>
> Thanks,
> James
>
> > gcc/ChangeLog:
> >
> > Virendra Pathak 
> >
> > * config/aarch64/aarch64-cores.def (vulcan): New core.
> > * config/aarch64/aarch64-tune.md: Regenerate.
> > * doc/invoke.texi: Document vulcan as an available option.
> >
>


[PATCH][ARM] Updating testcase unsigned-extend-2.c

2016-06-21 Thread Andre Vieira (lists)
Hello,

After some changes to GCC this test no longer tests the desired code
generation behavior.  The generated assembly is better than it used to
be, but the compiler has become too smart.  I added an extra parameter to
make sure GCC can't optimize away the loop.

Tested for arm-none-eabi-gcc with a Cortex-M3 target.

Is this OK?

Cheers,
Andre

gcc/ChangeLog
2016-06-21  Andre Vieira  

* gcc.target/arm/unsigned-extend-2.c: Update testcase.
From 12da0a48045b37efb2e459116ec81cc7117a0981 Mon Sep 17 00:00:00 2001
From: Andre Simoes Dias Vieira 
Date: Mon, 20 Jun 2016 14:27:06 +0100
Subject: [PATCH] fix testcase

---
 gcc/testsuite/gcc.target/arm/unsigned-extend-2.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
index b610b73617dc6e6a5428c966380516007a02acba..013240749ecaabf0d2e8ad802d27c7edc69d8828 100644
--- a/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
+++ b/gcc/testsuite/gcc.target/arm/unsigned-extend-2.c
@@ -2,13 +2,13 @@
 /* { dg-require-effective-target arm_thumb2_ok } */
 /* { dg-options "-O" } */
 
-unsigned short foo (unsigned short x)
+unsigned short foo (unsigned short x, unsigned short c)
 {
   unsigned char i = 0;
   for (i = 0; i < 8; i++)
 {
   x >>= 1;
-  x &= 0x7fff;
+  x &= c;
 }
   return x;
 }
-- 
1.9.1



[Patch, Fortran] PR71068 - fix ICE on invalid with coindexed DATA

2016-06-21 Thread Tobias Burnus
Dear all,

the problem comes up with:
   data a(1)[1] /1/
which is invalid. In resolve.c's check_data_variable(), one has:

  if (!gfc_resolve_expr (var->expr))
return false;
...
  e = var->expr;

  if (e->expr_type != EXPR_VARIABLE)
gfc_internal_error ("check_data_variable(): Bad expression");

which triggers as resolve_variable() has:

  if (t && flag_coarray == GFC_FCOARRAY_LIB && gfc_is_coindexed (e))
add_caf_get_intrinsic (e);


The solution is either not to decorate the DATA variable with
caf_get() - or to strip it off for testing.  The latter has been
done in this patch.  It's not really beautiful, but it works.

Additionally, I had to add the argument-handling shortcut, as
otherwise more and more caf_get() calls could be added around the
argument, which is both pointless and causes the strip-off to
fail.


Built and regtested on x86-64-gnu-linux.
OK for the trunk? Or do you see a more beautiful approach?

Tobias
	PR fortran/71068
	* resolve.c (resolve_function): Don't resolve caf_get/caf_send.
	(check_data_variable): Strip-off caf_get before checking.

	PR fortran/71068
	* gfortran.dg/coarray/data_1.f90: New.

diff --git a/gcc/fortran/resolve.c b/gcc/fortran/resolve.c
index 77f8c10..4378313 100644
--- a/gcc/fortran/resolve.c
+++ b/gcc/fortran/resolve.c
@@ -2923,6 +2923,13 @@ resolve_function (gfc_expr *expr)
   if (gfc_is_proc_ptr_comp (expr))
 return true;
 
+  /* Avoid re-resolving the arguments of caf_get, which can lead to inserting
+ another caf_get.  */
+  if (sym && sym->attr.intrinsic
+  && (sym->intmod_sym_id == GFC_ISYM_CAF_GET
+	  || sym->intmod_sym_id == GFC_ISYM_CAF_SEND))
+return true;
+
   if (sym && sym->attr.intrinsic
   && !gfc_resolve_intrinsic (sym, &expr->where))
 return false;
@@ -14495,6 +14502,10 @@ check_data_variable (gfc_data_variable *var, locus *where)
   mpz_init_set_si (offset, 0);
   e = var->expr;
 
+  if (e->expr_type == EXPR_FUNCTION && e->value.function.isym
+  && e->value.function.isym->id == GFC_ISYM_CAF_GET)
+e = e->value.function.actual->expr;
+
   if (e->expr_type != EXPR_VARIABLE)
 gfc_internal_error ("check_data_variable(): Bad expression");
 
diff --git a/gcc/testsuite/gfortran.dg/coarray/data_1.f90 b/gcc/testsuite/gfortran.dg/coarray/data_1.f90
new file mode 100644
index 0000000..d68ac14
--- /dev/null
+++ b/gcc/testsuite/gfortran.dg/coarray/data_1.f90
@@ -0,0 +1,11 @@
+! { dg-do compile }
+!
+! PR fortran/71068
+!
+! Contributed by Gerhard Steinmetz
+!
+program p
+   integer :: a(2)[*]
+   data a(1)[1] /1/  ! { dg-error "cannot have a coindex" }
+   data a(2)[1] /2/  ! { dg-error "cannot have a coindex" }
+end


[PATCH] update-copyright.py: Retain file mode

2016-06-21 Thread Bernhard Reutner-Fischer
Hi!

Ok for trunk?

thanks,

contrib/ChangeLog

2016-06-21  Bernhard Reutner-Fischer  

* update-copyright.py (Copyright.process_file): Retain original
file mode.
---
 contrib/update-copyright.py | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/contrib/update-copyright.py b/contrib/update-copyright.py
index ebefa46..04afd18 100755
--- a/contrib/update-copyright.py
+++ b/contrib/update-copyright.py
@@ -393,8 +393,10 @@ class Copyright:
 lines = []
 changed = False
 line_filter = filter.get_line_filter (dir, filename)
+mode = None
 with open (pathname, 'r') as file:
 prev = None
+mode = os.fstat (file.fileno()).st_mode
 for line in file:
 while line:
 next_line = None
@@ -421,6 +423,7 @@ class Copyright:
 with open (tmp_pathname, 'w') as file:
 for line in lines:
 file.write (line)
+os.fchmod (file.fileno(), mode)
 if self.use_quilt:
 subprocess.call (['quilt', 'add', pathname])
 os.rename (tmp_pathname, pathname)
-- 
2.8.1



Re: [PATCH,rs6000] Add support for HAVE_AS_POWER9

2016-06-21 Thread Segher Boessenkool
On Mon, Jun 20, 2016 at 07:16:49PM -0600, Kelvin Nilsen wrote:
> A "#define HAVE_AS_POWER9" or "#undef HAVE_AS_POWER9" preprocessor
> directive is emitted into the $GCC_BUILD/gcc/auto-host.h file at
> configuration time, depending on whether the available assembler
> supports the Power9 instruction set.  This patch arranges to disable
> Power9-specific compiler features if HAVE_AS_POWER9 is not defined.
> 
> The patch includes code to modify the behavior of the compiler along with
> directives to adjust the treatment of certain dejagnu tests.  Disable
> the Power9-specific tests on aix because of known incompatibilities.
> 
> This patch has bootstrapped and regression tested on
> powerpc64le-unknown-linux-gnu with both a configuration that has a
> Power9 assembler and one that does not have a Power9 assembler.  In
> both cases, there were no regressions.  Is this ok for the trunk?  Is
> this patch ok for gcc-6 after a few days of burn-in on the trunk?

Okay for trunk, okay for 6 after a week.  Thanks,


Segher


Re: [PATCH] PR target/71549: Convert V1TImode register to TImode in debug insn

2016-06-21 Thread Ilya Enkovich
2016-06-21 15:48 GMT+03:00 H.J. Lu :
> On Mon, Jun 20, 2016 at 11:53 AM, H.J. Lu  wrote:
>> On Mon, Jun 20, 2016 at 10:31 AM, Ilya Enkovich  
>> wrote:
>>> On 20 Jun 09:45, H.J. Lu wrote:
 On Mon, Jun 20, 2016 at 7:30 AM, Ilya Enkovich  
 wrote:
 > 2016-06-20 16:39 GMT+03:00 Uros Bizjak :
 >> On Mon, Jun 20, 2016 at 1:55 PM, H.J. Lu  wrote:
 >>> TImode register referenced in debug insn can be converted to V1TImode
 >>> by scalar to vector optimization.  We need to convert a debug insn if
 >>> it has a variable in a TImode register.
 >>>
 >>> Tested on x86-64.  OK for trunk?
 >>
 >> Ilya, does this approach look good to you? Also, does DImode STV
 >> conversion need similar handling of debug insns?
 >
 > DImode conversion doesn't change register mode (i.e. never calls
 > PUT_MODE for registers).  That keeps debug instructions valid.
 >
 > Overall I don't like the idea of having debug insns in candidates
 > set and in chains.  Looks like it is possible to have a chain
 > consisting of a debug insn only which is weird (otherwise I don't
 > see why we may return false in timode_scalar_chain::convert_insn).

 Yes, it can happen:

 (insn 11 8 12 2 (parallel [
 (set (reg/v:TI 91 [  ])
 (plus:TI (reg/v:TI 92 [ a ])
 (reg/v:TI 96 [ b ])))
 (clobber (reg:CC 17 flags))
 ]) y.i:5 210 {*addti3_doubleword}
  (expr_list:REG_UNUSED (reg:CC 17 flags)
 (nil)))
 (debug_insn 12 11 13 2 (var_location:TI w (reg/v:TI 91 [  ])) 
 y.i:5 -1
  (nil))


 > What about other possible register uses?  If debug insns are added
 > to candidates then NONDEBUG_INSN_P check for uses in
 > timode_check_non_convertible_regs becomes invalid, right?

 Debug insn has no impact on STV decision.  We just need to convert
 register referenced in debug insn from V1TImode to TImode in
 timode_scalar_chain::convert_insn.

 > If we have (or want) to fix some register uses then it's probably
 > would be better to visit register uses when we convert its mode
 > and make required fix-ups.  It seems better to me to not involve
 > debug insns in analysis phase.

 Here is the updated patch to add debug insn, which references the
 TImode register which will be converted to V1TImode to queue.
 I am testing it now.

>>>
>>> You still count and dump debug insns as optimized ones.  Also we
>>> try to use virtual functions to cover differences in DI and TI
>>> optimizations and introducing additional TARGET_64BIT in common
>>> STV code is undesirable.
>>>
>>> Also your conversion now depends on instructions processing order.
>>> You will fail to process debug insn before non-debug ones. Required
>>> order is not guaranteed because processing depends on instruction
>>> UIDs only.
>>>
>>> I propose to modify transformation phase only like in the patch
>>> (untested) below.  I rely on your code which assumes the only
>>> possible usage in debug insn is VAR_LOCATION.
>>>
>>> Thanks,
>>> Ilya
>>> --
>>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>>> index c5e5e12..ec955f0 100644
>>> --- a/gcc/config/i386/i386.c
>>> +++ b/gcc/config/i386/i386.c
>>> @@ -3139,6 +3139,7 @@ class timode_scalar_chain : public scalar_chain
>>>
>>>   private:
>>>void mark_dual_mode_def (df_ref def);
>>> +  void fix_debug_reg_uses (rtx reg);
>>>void convert_insn (rtx_insn *insn);
>>>/* We don't convert registers to difference size.  */
>>>void convert_registers () {}
>>> @@ -3790,6 +3791,34 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
>>>df_insn_rescan (insn);
>>>  }
>>>
>>> +/* Fix uses of converted REG in debug insns.  */
>>> +
>>> +void
>>> +timode_scalar_chain::fix_debug_reg_uses (rtx reg)
>>> +{
>>> +  df_ref ref;
>>> +  for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; ref = DF_REF_NEXT_REG 
>>> (ref))
>>> +{
>>> +  rtx_insn *insn = DF_REF_INSN (ref);
>>> +
>>> +  if (DEBUG_INSN_P (insn))
>>> +   {
>>> + /* It must be a debug insn with a TImode variable in register.  */
>>> + rtx val = PATTERN (insn);
>>> + gcc_assert (GET_MODE (val) == TImode
>>> + && GET_CODE (val) == VAR_LOCATION);
>>> + rtx loc = PAT_VAR_LOCATION_LOC (val);
>>> + gcc_assert (REG_P (loc)
>>> + && GET_MODE (loc) == V1TImode);
>>> + /* Convert V1TImode register, which has been updated by a SET
>>> + insn before, to SUBREG TImode.  */
>>> + PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0);
>>> + df_insn_rescan (insn);
>>> + return;
>>> +   }
>>> +}
>>> +}
>>> +
>>>  /* Convert INSN from TImode to 

[PATCH], Simplify setup of complex types

2016-06-21 Thread Michael Meissner
When I submitted the backport that allows complex __float128 to be created on
the PowerPC to the GCC 6.2 branch, Richard Biener suggested a simpler way to
set the complex type:
https://gcc.gnu.org/ml/gcc-patches/2016-06/msg01114.html

This patch implements this change for the trunk.  I have a companion patch for
6.2 once this goes into the trunk.

I bootstrapped the compiler and did a make check with no regressions on a big
endian Power 7 system and a little endian Power 8 system.  Is it ok to go into
the trunk?

[gcc]
2016-06-21  Michael Meissner  

* stor-layout.c (layout_type): Move setting complex MODE to
layout_type, instead of setting it ahead of time by the caller.
* tree.c (build_complex_type): Likewise.

[gcc/fortran]
2016-06-21  Michael Meissner  

* trans-types.c (gfc_build_complex_type): Move setting complex
MODE to layout_type, instead of setting it ahead of time by the
caller.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.vnet.ibm.com, phone: +1 (978) 899-4797
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   (revision 237612)
+++ gcc/stor-layout.c   (working copy)
@@ -2146,12 +2146,8 @@ layout_type (tree type)
 
 case COMPLEX_TYPE:
   TYPE_UNSIGNED (type) = TYPE_UNSIGNED (TREE_TYPE (type));
-
-  /* build_complex_type and fortran's gfc_build_complex_type have set the
-expected mode to allow having multiple complex types for multiple
-floating point types that have the same size such as the PowerPC with
-__ibm128 and __float128.  */
-  gcc_assert (TYPE_MODE (type) != VOIDmode);
+  SET_TYPE_MODE (type,
+		 GET_MODE_COMPLEX_MODE (TYPE_MODE (TREE_TYPE (type))));
 
   TYPE_SIZE (type) = bitsize_int (GET_MODE_BITSIZE (TYPE_MODE (type)));
   TYPE_SIZE_UNIT (type) = size_int (GET_MODE_SIZE (TYPE_MODE (type)));
Index: gcc/tree.c
===
--- gcc/tree.c  (revision 237612)
+++ gcc/tree.c  (working copy)
@@ -8783,7 +8783,6 @@ build_complex_type (tree component_type)
   t = make_node (COMPLEX_TYPE);
 
   TREE_TYPE (t) = TYPE_MAIN_VARIANT (component_type);
-  SET_TYPE_MODE (t, GET_MODE_COMPLEX_MODE (TYPE_MODE (component_type)));
 
   /* If we already have such a type, use the old one.  */
   hstate.add_object (TYPE_HASH (component_type));
Index: gcc/fortran/trans-types.c
===
--- gcc/fortran/trans-types.c   (revision 237613)
+++ gcc/fortran/trans-types.c   (working copy)
@@ -828,7 +828,6 @@ gfc_build_complex_type (tree scalar_type
 
   new_type = make_node (COMPLEX_TYPE);
   TREE_TYPE (new_type) = scalar_type;
-  SET_TYPE_MODE (new_type, GET_MODE_COMPLEX_MODE (TYPE_MODE (scalar_type)));
   layout_type (new_type);
   return new_type;
 }


Re: [PATCH/AARCH64] Accept vulcan as a cpu name for the AArch64 port of GCC

2016-06-21 Thread James Greenhalgh
On Sat, Jun 18, 2016 at 01:57:43AM +0530, Virendra Pathak wrote:
> Hi,
> 
> Please find the patch for introducing vulcan as a cpu name for the
> AArch64 port of GCC.
> Broadcom's vulcan is an armv8.1-a aarch64 server processor.
> 
> Since vulcan is the first armv8.1-a processor to be introduced in
> aarch64-cores.def,
> I have created a new section in the file for the armv8.1 based processors.
> Kindly let me know if that is okay.
> 
> Tested the patch with cross aarch64-linux-gnu, bootstrapped native
> aarch64-unknown-linux-gnu
> and make check (gcc, ld, gas, binutils, gdb).
> No new regression failure is added by this patch.
> 
> In addition, tested -mcpu=vulcan -mtune=vulcan flags by passing them
> via command line.
> Also verified that the above flags pass the armv8.1-a option to the assembler (as).
> 
> At present we are using the schedule & cost model of cortex-a57, but
> soon we will be submitting one for vulcan.
> 
> Please review the patch.
> Ok for trunk?
> 

This patch is OK for trunk.

I couldn't spot your name in the MAINTAINERS file, so I've applied this
on your behalf as revision 237645.

Thank you for the patch.

Thanks,
James

> gcc/ChangeLog:
> 
> Virendra Pathak 
> 
> * config/aarch64/aarch64-cores.def (vulcan): New core.
> * config/aarch64/aarch64-tune.md: Regenerate.
> * doc/invoke.texi: Document vulcan as an available option.
> 



Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-21 Thread Christophe Lyon
On 21 June 2016 at 15:13, Jakub Jelinek  wrote:
> On Tue, Jun 21, 2016 at 03:10:33PM +0200, Christophe Lyon wrote:
>> > Here is a new patch version, which removes the hardcoded dg-do run 
>> > directives,
>> > so that tests use compile or run depending on the result of
>> > check_vect_support_and_set_flags.
>> >
>> > On ARM, this first uses arm_neon_ok to detect the required flags, then
>> > depending on
>> > arm_neon_hw, it decides whether to dg-do run or compile.
>> >
>> > OK?
>>
>> ping?
>
> I'm not convinced we want to do this, even if the hw doesn't support it, the
> dg-do run tests are tested for not just compilation, but also assembly and
> linking; IMHO much better is to really fix up
> check_vect so that it works on ARM as it does everywhere else, even if it
> just means rewriting parts of it in inline asm.
>

I'm not sure I follow: many other targets use dg-do compile if some "hw"
support is not present (see check_vect_support_and_set_flags for
powerpc, sparc, alpha, x86).

Do you mean that on these targets, we want to run the tests even if
the hw does not support the required features, and that removing dg-do run
will result in less coverage on these targets?

What about defaulting to 'assemble' instead of 'compile'?

> Jakub


Re: [PATCH][AArch64] Increase code alignment

2016-06-21 Thread Wilco Dijkstra

ping


From: Wilco Dijkstra
Sent: 03 June 2016 11:51
To: GCC Patches
Cc: nd; philipp.toms...@theobroma-systems.com; pins...@gmail.com; 
jim.wil...@linaro.org; benedikt.hu...@theobroma-systems.com; Evandro Menezes
Subject: [PATCH][AArch64] Increase code alignment
    
Increase loop alignment on Cortex cores to 8 and set function alignment to 16.
This makes things consistent across big.LITTLE cores, improves performance of
benchmarks with tight loops and reduces performance variations due to small
changes in code layout.  It looks like almost all AArch64 cores agree on an
alignment of 16 for functions, and 8 for loops and branches, so we should
change -mcpu=generic as well if there is no disagreement - feedback welcome.

OK for commit?

ChangeLog:

2016-05-03  Wilco Dijkstra  

    * gcc/config/aarch64/aarch64.c (cortexa53_tunings):
    Increase loop alignment to 8.  Set function alignment to 16.
    (cortexa35_tunings): Likewise.
    (cortexa57_tunings): Increase loop alignment to 8.
    (cortexa72_tunings): Likewise.

---

diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c
index 12e5017a6d4b0ab15dcf932014980fdbd1a598ee..6ea10a187a1f895a399515b8cd0da0be63be827a 100644
--- a/gcc/config/aarch64/aarch64.c
+++ b/gcc/config/aarch64/aarch64.c
@@ -424,9 +424,9 @@ static const struct tune_params cortexa35_tunings =
   1, /* issue_rate  */
   (AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,   /* function_align.  */
+  16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -449,9 +449,9 @@ static const struct tune_params cortexa53_tunings =
   2, /* issue_rate  */
   (AARCH64_FUSE_AES_AESMC | AARCH64_FUSE_MOV_MOVK | AARCH64_FUSE_ADRP_ADD
    | AARCH64_FUSE_MOVK_MOVK | AARCH64_FUSE_ADRP_LDR), /* fusible_ops  */
-  8,   /* function_align.  */
+  16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -476,7 +476,7 @@ static const struct tune_params cortexa57_tunings =
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */
@@ -502,7 +502,7 @@ static const struct tune_params cortexa72_tunings =
    | AARCH64_FUSE_MOVK_MOVK), /* fusible_ops  */
   16,  /* function_align.  */
   8,   /* jump_align.  */
-  4,   /* loop_align.  */
+  8,   /* loop_align.  */
   2,   /* int_reassoc_width.  */
   4,   /* fp_reassoc_width.  */
   1,   /* vec_reassoc_width.  */




Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-21 Thread Jakub Jelinek
On Tue, Jun 21, 2016 at 03:10:33PM +0200, Christophe Lyon wrote:
> > Here is a new patch version, which removes the hardcoded dg-do run 
> > directives,
> > so that tests use compile or run depending on the result of
> > check_vect_support_and_set_flags.
> >
> > On ARM, this first uses arm_neon_ok to detect the required flags, then
> > depending on
> > arm_neon_hw, it decides whether to dg-do run or compile.
> >
> > OK?
> 
> ping?

I'm not convinced we want to do this, even if the hw doesn't support it, the
dg-do run tests are tested for not just compilation, but also assembly and
linking; IMHO much better is to really fix up
check_vect so that it works on ARM as it does everywhere else, even if it
just means rewriting parts of it in inline asm.

Jakub


Re: [PATCH] Fix SLP wrong-code with VECTOR_BOOLEAN_TYPE_P (PR tree-optimization/71259)

2016-06-21 Thread Christophe Lyon
On 15 June 2016 at 10:45, Christophe Lyon  wrote:
> On 9 June 2016 at 14:46, Jakub Jelinek  wrote:
>> On Thu, Jun 09, 2016 at 02:40:43PM +0200, Christophe Lyon wrote:
>>> > Bet it depends if this happens before the signal(SIGILL, sig_ill_handler);
>>> > call or after it.  If before, then I guess you'd better rewrite the
>>> > long long a = 0, b = 1;
>>> > asm ("vorr %P0, %P1, %P2"
>>> >  : "=w" (a)
>>> >  : "0" (a), "w" (b));
>>> > if (a != 1)
>>>
>>> Of course you are right: it happens just before the call to signal,
>>> to build the sig_ill_handler address in r1.
>>>
>>> So it's not even a problem with rewriting the asm.
>>
>> Ugh, so the added options don't affect just vectorized code, but normal
>> integer only code?
>> check_vect is fragile, there is always a risk that some instruction is
>> scheduled before the call.
>
> Yes, here it's an instruction used to build a parameter of the call.
>
>> If you have working target attribute support, I think you should compile
>> check_vect with attribute set to some lowest common denominator that every
>> ARM CPU supports (if there is any, that is).  Though most likely you'll need
>> to tweak the inline asm, because maybe "w" constraint won't be available
>> then.
>
> ARM does not support attribute/pragma cpu :(
>
> Here is a new patch version, which removes the hardcoded dg-do run directives,
> so that tests use compile or run depending on the result of
> check_vect_support_and_set_flags.
>
> On ARM, this first uses arm_neon_ok to detect the required flags, then
> depending on
> arm_neon_hw, it decides whether to dg-do run or compile.
>
> OK?

ping?


>
> Christophe
>
>> Jakub


Re: [PATCH] PR target/71549: Convert V1TImode register to TImode in debug insn

2016-06-21 Thread H.J. Lu
On Mon, Jun 20, 2016 at 11:53 AM, H.J. Lu  wrote:
> On Mon, Jun 20, 2016 at 10:31 AM, Ilya Enkovich  
> wrote:
>> On 20 Jun 09:45, H.J. Lu wrote:
>>> On Mon, Jun 20, 2016 at 7:30 AM, Ilya Enkovich  
>>> wrote:
>>> > 2016-06-20 16:39 GMT+03:00 Uros Bizjak :
>>> >> On Mon, Jun 20, 2016 at 1:55 PM, H.J. Lu  wrote:
>>> >>> TImode register referenced in debug insn can be converted to V1TImode
>>> >>> by scalar to vector optimization.  We need to convert a debug insn if
>>> >>> it has a variable in a TImode register.
>>> >>>
>>> >>> Tested on x86-64.  OK for trunk?
>>> >>
>>> >> Ilya, does this approach look good to you? Also, does DImode STV
>>> >> conversion need similar handling of debug insns?
>>> >
>>> > DImode conversion doesn't change register mode (i.e. never calls
>>> > PUT_MODE for registers).  That keeps debug instructions valid.
>>> >
>>> > Overall I don't like the idea of having debug insns in candidates
>>> > set and in chains.  Looks like it is possible to have a chain
>>> > consisting of a debug insn only which is weird (otherwise I don't
>>> > see why we may return false in timode_scalar_chain::convert_insn).
>>>
>>> Yes, it can happen:
>>>
>>> (insn 11 8 12 2 (parallel [
>>> (set (reg/v:TI 91 [  ])
>>> (plus:TI (reg/v:TI 92 [ a ])
>>> (reg/v:TI 96 [ b ])))
>>> (clobber (reg:CC 17 flags))
>>> ]) y.i:5 210 {*addti3_doubleword}
>>>  (expr_list:REG_UNUSED (reg:CC 17 flags)
>>> (nil)))
>>> (debug_insn 12 11 13 2 (var_location:TI w (reg/v:TI 91 [  ])) y.i:5 
>>> -1
>>>  (nil))
>>>
>>>
>>> > What about other possible register uses?  If debug insns are added
>>> > to candidates then NONDEBUG_INSN_P check for uses in
>>> > timode_check_non_convertible_regs becomes invalid, right?
>>>
>>> Debug insn has no impact on STV decision.  We just need to convert
>>> register referenced in debug insn from V1TImode to TImode in
>>> timode_scalar_chain::convert_insn.
>>>
>>> > If we have (or want) to fix some register uses then it's probably
>>> > would be better to visit register uses when we convert its mode
>>> > and make required fix-ups.  It seems better to me to not involve
>>> > debug insns in analysis phase.
>>>
>>> Here is the updated patch to add debug insn, which references the
>>> TImode register which will be converted to V1TImode to queue.
>>> I am testing it now.
>>>
>>
>> You still count and dump debug insns as optimized ones.  Also we
>> try to use virtual functions to cover differences in DI and TI
>> optimizations and introducing additional TARGET_64BIT in common
>> STV code is undesirable.
>>
>> Also your conversion now depends on instructions processing order.
>> You will fail to process debug insn before non-debug ones. Required
>> order is not guaranteed because processing depends on instruction
>> UIDs only.
>>
>> I propose to modify transformation phase only like in the patch
>> (untested) below.  I rely on your code which assumes the only
>> possible usage in debug insn is VAR_LOCATION.
>>
>> Thanks,
>> Ilya
>> --
>> diff --git a/gcc/config/i386/i386.c b/gcc/config/i386/i386.c
>> index c5e5e12..ec955f0 100644
>> --- a/gcc/config/i386/i386.c
>> +++ b/gcc/config/i386/i386.c
>> @@ -3139,6 +3139,7 @@ class timode_scalar_chain : public scalar_chain
>>
>>   private:
>>void mark_dual_mode_def (df_ref def);
>> +  void fix_debug_reg_uses (rtx reg);
>>void convert_insn (rtx_insn *insn);
>>/* We don't convert registers to difference size.  */
>>void convert_registers () {}
>> @@ -3790,6 +3791,34 @@ dimode_scalar_chain::convert_insn (rtx_insn *insn)
>>df_insn_rescan (insn);
>>  }
>>
>> +/* Fix uses of converted REG in debug insns.  */
>> +
>> +void
>> +timode_scalar_chain::fix_debug_reg_uses (rtx reg)
>> +{
>> +  df_ref ref;
>> +  for (ref = DF_REG_USE_CHAIN (REGNO (reg)); ref; ref = DF_REF_NEXT_REG 
>> (ref))
>> +{
>> +  rtx_insn *insn = DF_REF_INSN (ref);
>> +
>> +  if (DEBUG_INSN_P (insn))
>> +   {
>> + /* It must be a debug insn with a TImode variable in register.  */
>> + rtx val = PATTERN (insn);
>> + gcc_assert (GET_MODE (val) == TImode
>> + && GET_CODE (val) == VAR_LOCATION);
>> + rtx loc = PAT_VAR_LOCATION_LOC (val);
>> + gcc_assert (REG_P (loc)
>> + && GET_MODE (loc) == V1TImode);
>> + /* Convert V1TImode register, which has been updated by a SET
>> + insn before, to SUBREG TImode.  */
>> + PAT_VAR_LOCATION_LOC (val) = gen_rtx_SUBREG (TImode, loc, 0);
>> + df_insn_rescan (insn);
>> + return;
>> +   }
>> +}
>> +}
>> +
>>  /* Convert INSN from TImode to V1TImode.  */
>>
>>  void
>> @@ -3806,8 +3835,10 @@ timode_scalar_chain::convert_insn (rtx_insn *insn)
>> rtx tmp = find_reg_equal_equiv_note (insn);
>> if (tmp)
>>

Re: [PATCH] x86-64: Load external function address via GOT slot

2016-06-21 Thread H.J. Lu
On Mon, Jun 20, 2016 at 12:46 PM, Richard Sandiford
 wrote:
> Uros Bizjak  writes:
>> On Mon, Jun 20, 2016 at 9:19 PM, H.J. Lu  wrote:
>>> On Mon, Jun 20, 2016 at 12:13 PM, Uros Bizjak  wrote:
 On Mon, Jun 20, 2016 at 7:05 PM, H.J. Lu  wrote:
> Hi,
>
> This patch implements the alternate code sequence recommended in
>
> https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI
>
> to load external function address via GOT slot with
>
> movq func@GOTPCREL(%rip), %rax
>
> so that the linker won't create a PLT entry for an extern function
> address.
>
> Tested on x86-64.  OK for trunk?

> +  else if (ix86_force_load_from_GOT_p (op1))
> +{
> +  /* Load the external function address via the GOT slot to
> +avoid PLT.  */
> +  op1 = gen_rtx_UNSPEC (Pmode, gen_rtvec (1, op1),
> +   (TARGET_64BIT
> +? UNSPEC_GOTPCREL
> +: UNSPEC_GOT));
> +  op1 = gen_rtx_CONST (Pmode, op1);
> +  op1 = gen_const_mem (Pmode, op1);
> +  /* This symbol must be referenced via a load from the Global
> +Offset Table.  */
> +  set_mem_alias_set (op1, ix86_GOT_alias_set ());
> +  op1 = convert_to_mode (mode, op1, 1);
> +  op1 = force_reg (mode, op1);
> +  emit_insn (gen_rtx_SET (op0, op1));
> +  /* Generate a CLOBBER so that there will be no REG_EQUAL note
> +on the last insn to prevent cse and fwprop from replacing
> +a GOT load with a constant.  */
> +  rtx tmp = gen_reg_rtx (Pmode);
> +  emit_clobber (tmp);
> +  return;

 Jeff, is this the recommended way to prevent CSE, as far as RTL
 infrastructure is concerned? I didn't find any example of this
 approach with other targets.

>>>
>>> FWIW, the similar approach is used in ix86_expand_vector_move_misalign,
>>> ix86_expand_convert_uns_didf_sse and ix86_expand_vector_init_general
>>> as well as other targets:
>>>
>>> frv/frv.c:  emit_clobber (op0);
>>> frv/frv.c:  emit_clobber (op1);
>>> m32c/m32c.c:  /*  emit_clobber (gen_rtx_REG (HImode, R0L_REGNO)); */
>>> s390/s390.c:  emit_clobber (addr);
>>> s390/s390.md:  emit_clobber (reg0);
>>> s390/s390.md:  emit_clobber (reg1);
>>> s390/s390.md:  emit_clobber (reg0);
>>> s390/s390.md:  emit_clobber (reg0);
>>> s390/s390.md:  emit_clobber (reg1);
>>
>> These usages mark the whole register as being "clobbered"
>> (=undefined), before only a part of register is written, e.g.:
>>
>>   emit_clobber (int_xmm);
>>   emit_move_insn (gen_lowpart (DImode, int_xmm), input);
>>
>> They aren't used to prevent unwanted CSE.
>
> Since it's being called in the move expander, I thought the normal
> way of preventing the constant being rematerialised would be to reject
> it in the move define_insn predicates.
>
> FWIW, I agree that using a clobber for this is going to be fragile.
>

Here is the patch without the clobber.  Tested on x86-64.  OK for
trunk?

Thanks.


-- 
H.J.
From 55ab339cc4173565095b66c0fc2ffa4267b55606 Mon Sep 17 00:00:00 2001
From: "H.J. Lu" 
Date: Fri, 28 Aug 2015 19:14:49 -0700
Subject: [PATCH] x86-64: Load external function address via GOT slot

This patch implements the alternate code sequence recommended in

https://groups.google.com/forum/#!topic/x86-64-abi/de5_KnLHxtI

to load external function address via GOT slot with

movq func@GOTPCREL(%rip), %rax

so that the linker won't create a PLT entry for an extern function address.
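
A minimal illustration of the case being changed (hypothetical source
file; get_func_addr is an invented name, and only the address-taken use
of the extern function matters):

extern void func (void);

void *
get_func_addr (void)
{
  /* The address is now loaded from the GOT slot:
       movq  func@GOTPCREL(%rip), %rax
     instead of resolving to a PLT entry.  */
  return (void *) func;
}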

gcc/

	PR target/67400
	* config/i386/i386-protos.h (ix86_force_load_from_GOT_p): New.
	* config/i386/i386.c (ix86_force_load_from_GOT_p): New function.
	(ix86_legitimate_address_p): Allow UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_print_operand_address): Support UNSPEC_GOTPCREL if
	ix86_force_load_from_GOT_p returns true.
	(ix86_expand_move): Load the external function address via the
	GOT slot if ix86_force_load_from_GOT_p returns true.
	* config/i386/i386.md (*movsi_internal): Replace general_operand
	with ix86_general_operand.
	(*movqi_internal): Likewise.
	* config/i386/predicates.md (x86_64_immediate_operand): Return
	false if ix86_force_load_from_GOT_p returns true.
	(address_no_seg_operand): Likewise.
	(ix86_general_operand): New predicate.

gcc/testsuite/

	PR target/67400
	* gcc.target/i386/pr67400-1.c: New test.
	* gcc.target/i386/pr67400-2.c: Likewise.
	* gcc.target/i386/pr67400-3.c: Likewise.
	* gcc.target/i386/pr67400-4.c: Likewise.
	* gcc.target/i386/pr67400-5.c: Likewise.
	* gcc.target/i386/pr67400-6.c: Likewise.
---
 gcc/config/i386/i386-protos.h |  1 +
 gcc/config/i386/i386.c| 44 +++
 gcc/config/i386/i386.md   |  4 +--
 

Implement C _FloatN, _FloatNx types

2016-06-21 Thread Joseph Myers
ISO/IEC TS 18661-3:2015 defines C bindings to IEEE interchange and
extended types, in the form of _FloatN and _FloatNx type names with
corresponding fN/FN and fNx/FNx constant suffixes and FLTN_* / FLTNX_*
 macros.  This patch implements support for this feature in
GCC.

The _FloatN types, for N = 16, 32, 64 or >= 128 and a multiple of 32,
are types encoded according to the corresponding IEEE interchange
format (endianness unspecified).  The _FloatNx types, for N = 32, 64
and 128, are IEEE "extended" types: types extending a narrower format
with range and precision at least as big as those specified in IEEE
754 for each extended type (and with unspecified representation, but
still following IEEE semantics for their values and operations - and
with the set of values being determined by the precision and the
maximum exponent, which means that while Intel "extended" is suitable
for _Float64x, m68k "extended" is not).  These types are always
distinct from and not compatible with each other and the standard
floating types float, double, long double; thus, double, _Float64 and
_Float32x may all have the same ABI, but they are three still distinct
types.  The type names may be used with _Complex to construct
corresponding complex types (unlike __float128, which acts more like a
typedef name than a keyword - thus, this patch may be considered to
fix PR c/32187).  The new suffixes can be combined with GNU "i" and
"j" suffixes for constants of complex types (e.g. 1.0if128, 2.0f64i).

The set of types supported is implementation-defined.  In this GCC
patch, _Float32 is SFmode if that is suitable; _Float32x and _Float64
are DFmode if that is suitable; _Float128 is TFmode if that is
suitable; _Float64x is XFmode if that is suitable, and otherwise
TFmode if that is suitable.  There is a target hook to override the
choices if necessary.  "Suitable" means both conforming to the
requirements of that type, and supported as a scalar type including in
libgcc.  The ABI is whatever the back end does for scalars of that
mode (but note that _Float32 is passed without promotion in variable
arguments, unlike float).  All the existing issues with exceptions and
rounding modes for existing types apply equally to the new type names.

No GCC port supports a floating-point format suitable for _Float128x.
Although there is HFmode support for ARM and AArch64, use of that for
_Float16 is not enabled.  Supporting _Float16 would require additional
work on the excess precision aspects of TS 18661-3: there are new
values of FLT_EVAL_METHOD, which are not currently supported in GCC,
and FLT_EVAL_METHOD == 0 now means that operations and constants on
types narrower than float are evaluated to the range and precision of
float.  Implementing that, so that _Float16 gets evaluated with excess
range and precision, would involve changes to the excess precision
infrastructure so that the _Float16 case is enabled by default, unlike
the x87 case which is only enabled for -fexcess-precision=standard.
Other differences between _Float16 and __fp16 would also need to be
disentangled.

GCC has some prior support for nonstandard floating-point types in the
form of __float80 and __float128.  Where these were previously types
distinct from long double, they are made by this patch into aliases
for _Float64x / _Float128 if those types have the required properties.

In principle the set of possible _FloatN types is infinite.  This
patch hardcodes the four such types for N <= 128, but with as much
code as possible using loops over types to minimize the number of
places with such hardcoding.  I don't think it's likely any further
such types will be of use in future (or indeed that formats suitable
for _Float128x will actually be implemented).  There is a corner case
that all _FloatN, for N >= 128 and a multiple of 32, should be treated
as keywords even when the corresponding type is not supported; I
intend to deal with that in a followup patch.

Tests are added for various functionality of the new types, mostly
using type-generic headers.  PowerPC maintainers should note that the
tests do not do anything regarding passing special options to enable
support for the types, either for the tests themselves or for the
corresponding effective-target tests.  Thus, to run the _Float128
tests on PowerPC, you will need to add such support, { dg-add-options
float128 } or similar and make sure it affects both the
effective-target tests and the tests themselves.  The complex
arithmetic support in libgcc will also be needed, as otherwise the
associated tests will fail.  (The same would apply to _Float16 on ARM
as well if that were enabled with -mfp16-format=ieee required to use
it.  Of course, -mfp16-format=alternative enables use of a format
which is not compatible with the requirements of the _Float16 type.)

C++ note: no support for the new types or constant suffixes is added
for C++.  C++ decimal floating-point support was very different from
the C support, using class types, 

[PATCH] Disable gdb and sim builds for ARC in top-level configure.ac

2016-06-21 Thread Anton Kolesov
As of now GDB and sim cannot be built from the upstream binutils-gdb
repository, so they should be disabled by default.

2016-06-21  Anton Kolesov  

* configure.ac: Disable gdb and sim for ARC.
* configure: Regenerate.
---
 configure| 3 +++
 configure.ac | 3 +++
 2 files changed, 6 insertions(+)

diff --git a/configure b/configure
index ea63784..805fbe9 100755
--- a/configure
+++ b/configure
@@ -3756,6 +3756,9 @@ case "${target}" in
   sh*-*-pe|mips*-*-pe|*arm-wince-pe)
 noconfigdirs="$noconfigdirs tcl tk itcl libgui sim"
 ;;
+  arc*-*-*)
+noconfigdirs="$noconfigdirs gdb sim"
+;;
   arm-*-pe*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
diff --git a/configure.ac b/configure.ac
index 54558df..04ed98e 100644
--- a/configure.ac
+++ b/configure.ac
@@ -1092,6 +1092,9 @@ case "${target}" in
   sh*-*-pe|mips*-*-pe|*arm-wince-pe)
 noconfigdirs="$noconfigdirs tcl tk itcl libgui sim"
 ;;
+  arc*-*-*)
+noconfigdirs="$noconfigdirs gdb sim"
+;;
   arm-*-pe*)
 noconfigdirs="$noconfigdirs target-libgloss"
 ;;
-- 
2.8.1



Canonicalize ASM_OPERANDS inside PARALLEL during CSE

2016-06-21 Thread Eric Botcazou
canonicalize_insn attempts to replace pseudo-registers with the canonical ones 
during the CSE pass and it does so inside ASM_OPERANDS (only for the inputs) 
and inside PARALLELs, but not inside ASM_OPERANDS which are themselves inside 
a PARALLEL.  The latter case occurs for targets that automatically clobber the 
CC register for inline assembly statements, for example Visium and x86.

Tested on x86_64-suse-linux, applied on the mainline.


2016-06-21  Eric Botcazou  

* cse.c (canon_asm_operands): New function extracted from...
(canonicalize_insn): ...here.  Call it to canonicalize an ASM_OPERANDS
either standalone or member of a PARALLEL.

-- 
Eric Botcazou

Index: cse.c
===
--- cse.c	(revision 237571)
+++ cse.c	(working copy)
@@ -4298,6 +4298,22 @@ find_sets_in_insn (rtx_insn *insn, struc
   return n_sets;
 }
 
+/* Subroutine of canonicalize_insn.  X is an ASM_OPERANDS in INSN.  */
+
+static void
+canon_asm_operands (rtx x, rtx_insn *insn)
+{
+  for (int i = ASM_OPERANDS_INPUT_LENGTH (x) - 1; i >= 0; i--)
+{
+  rtx input = ASM_OPERANDS_INPUT (x, i);
+  if (!(REG_P (input) && HARD_REGISTER_P (input)))
+	{
+	  input = canon_reg (input, insn);
+	  validate_change (insn, &ASM_OPERANDS_INPUT (x, i), input, 1);
+	}
+}
+}
+
 /* Where possible, substitute every register reference in the N_SETS
number of SETS in INSN with the canonical register.
 
@@ -4361,17 +4377,7 @@ canonicalize_insn (rtx_insn *insn, struc
 /* Canonicalize a USE of a pseudo register or memory location.  */
 canon_reg (x, insn);
   else if (GET_CODE (x) == ASM_OPERANDS)
-{
-  for (i = ASM_OPERANDS_INPUT_LENGTH (x) - 1; i >= 0; i--)
-	{
-	  rtx input = ASM_OPERANDS_INPUT (x, i);
-	  if (!(REG_P (input) && REGNO (input) < FIRST_PSEUDO_REGISTER))
-	{
-	  input = canon_reg (input, insn);
-	      validate_change (insn, &ASM_OPERANDS_INPUT (x, i), input, 1);
-	}
-	}
-}
+canon_asm_operands (x, insn);
   else if (GET_CODE (x) == CALL)
 {
   canon_reg (x, insn);
@@ -4400,6 +4406,8 @@ canonicalize_insn (rtx_insn *insn, struc
 		   && ! (REG_P (XEXP (y, 0))
 			 && REGNO (XEXP (y, 0)) < FIRST_PSEUDO_REGISTER))
 	canon_reg (y, insn);
+	  else if (GET_CODE (y) == ASM_OPERANDS)
+	canon_asm_operands (y, insn);
 	  else if (GET_CODE (y) == CALL)
 	{
 	  canon_reg (y, insn);


[patch, avr] PR 58655

2016-06-21 Thread Pitchumani Sivanupandi
The attached patches add documentation for the -mfract-convert-truncate
option and add that info to the release notes (gcc-4.9 changes).

If OK, could someone commit them please?  I do not have commit access.

Regards,
Pitchumani

gcc/ChangeLog

2016-06-21  Pitchumani Sivanupandi  

    PR target/58655
* doc/invoke.texi (AVR Options): Document -mfract-convert-truncate
    option.
--- a/gcc/doc/invoke.texi
+++ b/gcc/doc/invoke.texi
@@ -643,8 +643,8 @@ Objective-C and Objective-C++ Dialects}.
 @emph{AVR Options}
 @gccoptlist{-mmcu=@var{mcu} -maccumulate-args -mbranch-cost=@var{cost} @gol
 -mcall-prologues -mint8 -mn_flash=@var{size} -mno-interrupts @gol
--mrelax -mrmw -mstrict-X -mtiny-stack -nodevicelib -Waddr-space-convert @gol
--Wmisspelled-isr}
+-mrelax -mrmw -mstrict-X -mtiny-stack -mfract-convert-truncate -nodevicelib @gol
+-Waddr-space-convert -Wmisspelled-isr}
 
 @emph{Blackfin Options}
 @gccoptlist{-mcpu=@var{cpu}@r{[}-@var{sirevision}@r{]} @gol
@@ -14586,6 +14586,10 @@ sbiw r26, const   ; X -= const
 @opindex mtiny-stack
 Only change the lower 8@tie{}bits of the stack pointer.
 
+@item -mfract-convert-truncate
+@opindex mfract-convert-truncate
+Allow the use of truncation instead of rounding towards 0 for fractional int types.
+
 @item -nodevicelib
 @opindex nodevicelib
 Don't link against AVR-LibC's device specific library @code{lib.a}.
--- a/wwwdocs/htdocs/gcc-4.9/changes.html
+++ b/wwwdocs/htdocs/gcc-4.9/changes.html
@@ -579,6 +579,14 @@ auto incr(T x) { return x++; }
 size when compiling for the M-profile processors.
   </li>
 </ul>
+<h3>AVR</h3>
+<ul>
+  <li>
+A new command-line option <code>-mfract-convert-truncate</code> has been added.
+It allows the compiler to use truncation instead of rounding towards
+0 for fractional int types.
+  </li>
+</ul>
 <h3>IA-32/x86-64</h3>
 <ul>
 <li><code>-mfpmath=sse</code> is now implied by <code>-ffast-math</code>


Re: i386/prologues: ROP mitigation for normal function epilogues

2016-06-21 Thread Bernd Schmidt

On 06/20/2016 02:08 PM, Michael Matz wrote:


P.S.: Though I do feel these ROP countermeasures are not much more than
security by obscurity; I guess enough obscurity can indeed at least lead
to harder-to-exploit programs.


I think security by obscurity is the wrong term for this. But I kind of 
know what you are saying, and at the moment none of us are probably in a 
position to say how much harder this makes exploits. We are however also 
working on tools that should help answer such questions, and hopefully 
we'll eventually be able to accumulate enough pieces like this one to 
make a real difference.



Bernd


Re: [PATCH] Drop excess size used for run time allocated stack variables.

2016-06-21 Thread Dominik Vogt
What do we do now with the two patches?  At the moment, the
functional patch depends on the changes in the cleanup patch, so
it cannot be applied on its own.  Options:

(with the requested cleanup in the functional patch)

 1) Apply both patches as they are now and do further cleanup on
top of it.
 2) Rewrite the functional patch so that it applies without the
cleanup patch and commit it now.
 3) Look into the suggested cleanup now and adapt the functional
patch to it when its ready.

Actually I'd prefer (1) or (2) to just get the functional patch
off my desk.  I agree that the cleanup is very useful, but there's
no relation between the cleanup and the functional stuff except
that they touch the same code.  Having the functional patch
applied would simplify further work for me.

On Thu, Jun 09, 2016 at 02:00:21PM +0200, Bernd Schmidt wrote:
> On 05/20/2016 01:11 AM, Jeff Law wrote:
> >Let's start with clean up of dead code:
> >
> > /* We will need to ensure that the address we return is aligned to
> > REQUIRED_ALIGN.  If STACK_DYNAMIC_OFFSET is defined, we don't
> > always know its final value at this point in the compilation (it
> > might depend on the size of the outgoing parameter lists, for
> > example), so we must align the value to be returned in that case.
> > (Note that STACK_DYNAMIC_OFFSET will have a default nonzero value if
> > STACK_POINTER_OFFSET or ACCUMULATE_OUTGOING_ARGS are defined).
> > We must also do an alignment operation on the returned value if
> > the stack pointer alignment is less strict than REQUIRED_ALIGN.
> >
> > If we have to align, we must leave space in SIZE for the hole
> > that might result from the alignment operation.  */
> >
> >  must_align = (crtl->preferred_stack_boundary < required_align);
> >  if (must_align)
> >{
> >  if (required_align > PREFERRED_STACK_BOUNDARY)
> >extra_align = PREFERRED_STACK_BOUNDARY;
> >  else if (required_align > STACK_BOUNDARY)
> >extra_align = STACK_BOUNDARY;
> >  else
> >extra_align = BITS_PER_UNIT;
> >}
> >
> >  /* ??? STACK_POINTER_OFFSET is always defined now.  */
> >#if defined (STACK_DYNAMIC_OFFSET) || defined (STACK_POINTER_OFFSET)
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT;
> >#endif
> >
> >If we look at defaults.h, it always defines STACK_POINTER_OFFSET.  So
> >all the code above I think collapses to:
> >
> >  must_align = true;
> >  extra_align = BITS_PER_UNIT
> 
> (Cc'ing rth because portions of this seem to be his, r165240).
> 
> I kind of want to approach this from a different angle; let's look
> at extra_align. The way this is used subsequently makes it appear to
> be misnamed. It looks like it should hold the stack alignment we can
> assume:
> 
>   unsigned extra = (required_align - extra_align) / BITS_PER_UNIT;
> 
>   size = plus_constant (Pmode, size, extra);
> [...]
>   if (extra && size_align > extra_align)
> size_align = extra_align;
> 
> (where size_align is the known alignment of the size of the block to
> be allocated). If I'm reading this right, then the first part of the
> cleanup ought to be to get the naming right.
> 
> So why BITS_PER_UNIT? Shouldn't it at least be STACK_BOUNDARY? Let's
> look at the previous block a little more closely.
> 
> >   must_align = (crtl->preferred_stack_boundary < required_align);
> 
> [ crtl->preferred_stack_boundary is initialized to STACK_BOUNDARY in
> cfgexpand and only ever increased ]
> 
> >   if (must_align)
> > {
> 
> [ if must_align, then required_align > crtl->p_s_b >= STACK_BOUNDARY ]
> 
> >   if (required_align > PREFERRED_STACK_BOUNDARY)
> > extra_align = PREFERRED_STACK_BOUNDARY;
> 
> [ so far so good ]
> 
> >   else if (required_align > STACK_BOUNDARY)
> > extra_align = STACK_BOUNDARY;
> 
> [ always true, right? ]
> 
> >   else
> > extra_align = BITS_PER_UNIT;
> > }
> 
> [ dead code, right? ]
> 
> So we're left with the question of why extra_align is set to
> BITS_PER_UNIT for STACK_DYNAMIC_OFFSET, and I can't really see a
> reason to do that either. AFAIK the minimum alignment of the stack
> is always STACK_BOUNDARY, and it's possible we could do better.
> 
> As far as I can tell, no definition of STACK_DYNAMIC_OFFSET differs
> substantially from the default definition in function.c. Why
> couldn't we round up the outgoing_args_size to the preferred stack
> boundary (or a new value to keep track of the required alignment for
> dynamic allocations) before instantiating dynamic_offset? We then
> wouldn't have to add extra alignment for it here.
> 
> This rounding seems to happen anyway in port's frame calculations,
> e.g. here in i386:
> 
>  if (ACCUMULATE_OUTGOING_ARGS
>   && (!crtl->is_leaf || cfun->calls_alloca
>   || ix86_current_function_calls_tls_descriptor))
> {
>   offset += crtl->outgoing_args_size;
>   frame->outgoing_arguments_size = crtl->outgoing_args_size;
> }
>   else
>  

Re: [PATCH][typo] alignement -> alignment

2016-06-21 Thread Eric Botcazou
> Committing the attached typo fix as obvious (I believe "alignement" is the
> French form).

You are right.

-- 
Eric Botcazou


[ARM][testsuite] Add missing guards to fp16 AdvSIMD tests

2016-06-21 Thread Christophe Lyon
Hi,

I've noticed that some guards were missing on some of the AdvSIMD
tests involving fp16 code.

The attached patch fixes them, although I didn't notice any difference in
validation: I have no configuration where
check_effective_target_arm_neon_fp16_ok fails.

I did locally modify this effective target to always return false to
make sure I covered all the missing guards.

However, I'm not sure when check_effective_target_arm_neon_fp16_ok can
fail.  This effective-target test was added by Alan Lawrence a few
months ago.

OK?

Christophe
gcc/testsuite/ChangeLog:

2016-06-21  Christophe Lyon  

* gcc.target/aarch64/advsimd-intrinsics/vget_lane.c: Add ifdef
around fp16 code.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p128.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vreinterpret_p64.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vstX_lane.c: Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld2_lane_f16_indices_1.c:
Add arm_neon_fp16_ok effective target.
* gcc.target/aarch64/advsimd-intrinsics/vld2q_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld3_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld3q_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld4_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vld4q_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst2_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst2q_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst3_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst3q_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst4_lane_f16_indices_1.c: 
Likewise.
* gcc.target/aarch64/advsimd-intrinsics/vst4q_lane_f16_indices_1.c: 
Likewise.

diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
index fe41c5f..ee6d650 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vget_lane.c
@@ -54,10 +54,12 @@ void exec_vget_lane (void)
 uint32_t var_int32;
 float32_t var_float32;
   } var_int32_float32;
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   union {
 uint16_t var_int16;
 float16_t var_float16;
   } var_int16_float16;
+#endif
 
 #define TEST_VGET_LANE_FP(Q, T1, T2, W, N, L) \
   VAR(var, T1, W) = vget##Q##_lane_##T2##W(VECT_VAR(vector, T1, W, N), L); \
diff --git a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c 
b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
index 0de2ab3..127e1aa 100644
--- a/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
+++ b/gcc/testsuite/gcc.target/aarch64/advsimd-intrinsics/vreinterpret.c
@@ -665,9 +665,11 @@ void exec_vreinterpret (void)
 
   /* Initialize input "vector" from "buffer".  */
   TEST_MACRO_ALL_VARIANTS_2_5(VLOAD, vector, buffer);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   VLOAD(vector, buffer, , float, f, 16, 4);
-  VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 16, 8);
+#endif
+  VLOAD(vector, buffer, , float, f, 32, 2);
   VLOAD(vector, buffer, q, float, f, 32, 4);
 
   /* vreinterpret_s8_xx.  */
@@ -680,7 +682,9 @@ void exec_vreinterpret (void)
   TEST_VREINTERPRET(, int, s, 8, 8, uint, u, 64, 1, expected_s8_7);
   TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 8, 8, expected_s8_8);
   TEST_VREINTERPRET(, int, s, 8, 8, poly, p, 16, 4, expected_s8_9);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   TEST_VREINTERPRET(, int, s, 8, 8, float, f, 16, 4, expected_s8_10);
+#endif
 
   /* vreinterpret_s16_xx.  */
   TEST_VREINTERPRET(, int, s, 16, 4, int, s, 8, 8, expected_s16_1);
@@ -692,7 +696,9 @@ void exec_vreinterpret (void)
   TEST_VREINTERPRET(, int, s, 16, 4, uint, u, 64, 1, expected_s16_7);
   TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 8, 8, expected_s16_8);
   TEST_VREINTERPRET(, int, s, 16, 4, poly, p, 16, 4, expected_s16_9);
+#if defined (__ARM_FP16_FORMAT_IEEE) || defined (__ARM_FP16_FORMAT_ALTERNATIVE)
   TEST_VREINTERPRET(, int, s, 16, 4, float, f, 16, 4, expected_s16_10);
+#endif
 
   /* vreinterpret_s32_xx.  */
   TEST_VREINTERPRET(, int, s, 32, 2, int, s, 8, 8, expected_s32_1);
@@ -704,7 +710,9 @@ void exec_vreinterpret (void)
   TEST_VREINTERPRET(, int, s, 32, 2, uint, u, 64, 1, expected_s32_7);
   TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 8, 8, expected_s32_8);
   TEST_VREINTERPRET(, int, s, 32, 2, poly, p, 16, 4, 

[PING] [PATCH] Fix ICE with invalid use of flags output operand

2016-06-21 Thread Bernd Edlinger
Ping...

for this patch: https://gcc.gnu.org/ml/gcc-patches/2016-06/msg00871.html

I'd say it's a no-brainer...


> Hi,
> 
> this fixes an ICE that happens when an asm statement tries to print
> the flags output operand.
> 
> Boot-strapped and reg-tested on x86_64-linux-gnu.
> OK for trunk?
> 
> 
> Thanks
> Bernd.
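
A minimal sketch of the kind of asm that trips it, assuming the x86
flag-output constraints (hypothetical reproducer, not necessarily the
testcase from the patch):

  int f (unsigned a, unsigned b)
  {
    int carry;
    /* "=@ccc" binds CARRY to the carry flag; naming that operand in
       the template as %0 is the invalid use that used to ICE.  */
    __asm__ ("cmpl %2, %1; movl %0, %0"
             : "=@ccc" (carry) : "r" (a), "r" (b));
    return carry;
  }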


Re: [PING] [PATCH] c/69507 - bogus warning: ISO C does not allow ‘__alignof__ (expression)’

2016-06-21 Thread Christophe Lyon
On 20 June 2016 at 17:46, Martin Sebor  wrote:
>> Since this patch was committed, I am now seeing failures on:
>> gcc.dg/gnu99-const-expr-1.c
>> gcc.dg/gnu99-static-1.c
>>
>> (targets arm, aarch64, I don't think that it should matter?)
>>
>> Can you have a look?
>
>
> Sorry about that.  I missed the test updates in my initial patch.
> I've committed them in r237606.
>

OK, thanks. I confirm the tests no longer fail.

Christophe

> Martin
>


Re: Update probabilities in predict.def to match reality

2016-06-21 Thread Andreas Schwab
Renlin Li  writes:

> Hi,
>
> On 08/06/16 11:21, Andreas Schwab wrote:
>> Jan Hubicka  writes:
>>
>>> Bootstrapped/regtested x86_64-linux, will commit it later today.
>>
>> FAIL: gcc.dg/tree-ssa/slsr-8.c scan-tree-dump-times optimized " w?* " 7
>
> This has been failing for all arm and aarch64 targets as well since the commit.

In fact, it has regressed everywhere.

Andreas.

-- 
Andreas Schwab, SUSE Labs, sch...@suse.de
GPG Key fingerprint = 0196 BAD8 1CE9 1970 F4BE  1748 E4D4 88E3 0EEA B9D7
"And now for something completely different."


Re: [PATCH, Fortran, OpenACC] Fix PR70598, Fortran host_data ICE (ping x2)

2016-06-21 Thread Chung-Lin Tang
Ping x2

On 2016/6/7 08:03 PM, Chung-Lin Tang wrote:
> Ping.
> 
> On 2016/5/11 02:57 AM, Bernhard Reutner-Fischer wrote:
>> On May 9, 2016 4:26:50 PM GMT+02:00, Chung-Lin Tang 
>>  wrote:
>>> Hi, this patch resolves an ICE for Fortran when using the OpenACC
>>> host_data directive.  Actually, rather than say resolve, it's more like
>>> adjusting the front-end to the same middle-end restrictions as C/C++,
>>> namely that we only support pointers or arrays for host_data right now.
>>>
>>> This patch contains a few adjustments in
>>> fortran/openmp.c:resolve_omp_clauses(),
>>> and some testcase adjustments. This has been tested without regressions
>>> for Fortran.
>>>
>>> Is this okay for trunk?
>>>
>>> Thanks,
>>> Chung-Lin
>>>
>>> 2016-05-09  Chung-Lin Tang  
>>>
>>> gcc/
>>> * fortran/openmp.c (resolve_omp_clauses): Adjust use_device clause
>>> handling to only allow pointers and arrays.
>>
>> Fortran has its own ChangeLog.  The patch itself looks somewhat plausible to 
>> me, fwiw, but Jakub or a FE maintainer has the say.
>>
>
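
In C terms, the restriction being mirrored looks roughly like this
(hypothetical snippet; only pointers and arrays are accepted in
use_device):

  void f (float *devp, float scalar)
  {
  #pragma acc host_data use_device(devp)    /* accepted: pointer */
    {
      /* devp holds the device address here.  */
    }
  #pragma acc host_data use_device(scalar)  /* rejected: scalar */
    {
    }
  }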