Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-06-29 Thread Cesar Philippidis
On 06/29/2018 10:12 AM, Cesar Philippidis wrote:
> Ping.

While porting the vector length patches to trunk, I realized that I
mistakenly removed support for the environment variable GOMP_OPENACC_DIM
in this patch (thanks for adding those test cases, Tom!). I'll post an
updated version of this patch once I get the vector length patches
working with it.

Cesar

> On 06/20/2018 02:59 PM, Cesar Philippidis wrote:
>> At present, the nvptx libgomp plugin does not take into account the
>> amount of shared resources on GPUs (mostly shared-memory and register
>> usage) when selecting the default num_gangs and num_workers. In certain
>> situations, an OpenACC offloaded function can fail to launch if the GPU
>> does not have sufficient shared resources to accommodate all of the
>> threads in a CUDA block. This typically manifests when a PTX function
>> uses a lot of registers and num_workers is set too large, although it
>> can also happen if the shared-memory has been exhausted by the threads
>> in a vector.
>>
>> This patch resolves that issue by adjusting num_workers based on the amount
>> of shared resources used by each thread. If worker parallelism has been
>> requested, libgomp will spawn as many workers as possible up to 32.
>> Without this patch, libgomp would always default to launching 32 workers
>> when worker parallelism is used.
>>
>> Besides the worker parallelism, this patch also includes some
>> heuristics on selecting num_gangs. Before, the plugin would launch two
>> gangs per GPU multiprocessor. Now it follows the formula contained in
>> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
>>
>> Is this patch OK for trunk?
>>
>> Thanks,
>> Cesar
>>
> 
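The worker-count adjustment described in the quoted message can be sketched as follows. This is only an illustration, not the libgomp plugin code: `choose_num_workers`, its parameters and the fixed vector length of 32 are assumptions, only register pressure is modelled (no shared memory, no allocation granularity), and the occupancy-calculator formula for num_gangs is not reproduced.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: pick the largest worker count, capped at 32, whose
// threads still fit in the registers available to one CUDA block.
//   regs_per_thread: registers used by one thread of the PTX function
//   regs_per_block:  registers available to one CUDA block
//   vector_length:   threads per worker (assumed to be 32 here)
int choose_num_workers (int regs_per_thread, int regs_per_block,
                        int vector_length = 32)
{
  assert (regs_per_thread > 0 && regs_per_block > 0 && vector_length > 0);
  const int max_workers = 32;
  int threads_that_fit = regs_per_block / regs_per_thread;
  int workers = threads_that_fit / vector_length;
  // Never return fewer than one worker; a launch with a single worker
  // may still fail if even that does not fit.
  return std::max (1, std::min (workers, max_workers));
}
```

With 65536 registers per block, a function using 32 registers per thread still gets the full 32 workers, while one using 128 registers per thread is limited to 16.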



Re: C++ PATCH for c++/57891, narrowing conversions in non-type template arguments

2018-06-29 Thread Marek Polacek
On Wed, Jun 27, 2018 at 07:35:15PM -0400, Jason Merrill wrote:
> On Wed, Jun 27, 2018 at 12:53 PM, Marek Polacek  wrote:
> > This PR complains about us accepting invalid code like
> >
> >   template<unsigned int> struct A {};
> >   A<-1> a;
> >
> > Where we should detect the narrowing: [temp.arg.nontype] says
> > "A template-argument for a non-type template-parameter shall be a converted
> > constant expression ([expr.const]) of the type of the template-parameter."
> > and a converted constant expression can contain only
> > - integral conversions other than narrowing conversions,
> > - [...]."
> > It spurred e.g.
> > 
> > and has >=3 dups so it has some visibility.
> >
> > I think build_converted_constant_expr needs to set check_narrowing.
> > check_narrowing also always mentions that it's in { } but that is no longer
> > true; in the future it will also apply to <=>.  We'd probably have to add a new
> > flag to struct conversion if we wanted to distinguish between these.
> >
> > This does not yet fix detecting narrowing in function templates (78244).
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2018-06-27  Marek Polacek  
> >
> > PR c++/57891
> > * call.c (build_converted_constant_expr): Set check_narrowing.
> > * decl.c (compute_array_index_type): Add warning sentinel.  Use
> > input_location.
> > * pt.c (convert_nontype_argument): Return NULL_TREE if any errors
> > were reported.
> > * typeck2.c (check_narrowing): Don't mention { } in diagnostic.
> >
> > * g++.dg/cpp0x/Wnarrowing6.C: New test.
> > * g++.dg/cpp0x/Wnarrowing7.C: New test.
> > * g++.dg/cpp0x/Wnarrowing8.C: New test.
> > * g++.dg/cpp0x/constexpr-data2.C: Add dg-error.
> > * g++.dg/init/new43.C: Adjust dg-error.
> > * g++.dg/other/fold1.C: Likewise.
> > * g++.dg/parse/array-size2.C: Likewise.
> > * g++.dg/other/vrp1.C: Add dg-error.
> > * g++.dg/template/char1.C: Likewise.
> > * g++.dg/ext/builtin12.C: Likewise.
> > * g++.dg/template/dependent-name3.C: Adjust dg-error.
> >
> > diff --git gcc/cp/call.c gcc/cp/call.c
> > index 209c1fd2f0e..956c7b149dc 100644
> > --- gcc/cp/call.c
> > +++ gcc/cp/call.c
> > @@ -4152,7 +4152,10 @@ build_converted_constant_expr (tree type, tree expr, 
> > tsubst_flags_t complain)
> >  }
> >
> >if (conv)
> > -expr = convert_like (conv, expr, complain);
> > +{
> > +  conv->check_narrowing = !processing_template_decl;
> 
> Why !processing_template_decl?  This needs a comment.

Otherwise we'd warn for e.g.

template<int N> struct S { char a[N]; };
S<1> s;

where compute_array_index_type will try to convert the size of the array (which
is a template_parm_index of type int when parsing the template) to size_type.
So I guess I can say that we need to wait for instantiation?
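To make the two cases concrete, here is a small sketch. The helper `narrows` is hypothetical (it is not anything in GCC); it only models "does this constant change value when converted", which is what makes an integral conversion of a constant narrowing. The template parameter types are assumptions chosen to match the discussion.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical helper: true if converting the constant V to type To changes
// its value, i.e. the conversion would be narrowing for that constant.
template <typename To, typename From>
constexpr bool narrows (From v)
{
  static_assert (std::is_integral<To>::value && std::is_integral<From>::value,
                 "integral types only");
  // Either the round trip loses information, or the sign flips.
  return static_cast<From> (static_cast<To> (v)) != v
         || ((static_cast<To> (v) > To (0)) != (v > From (0)));
}

// Non-type template parameter of unsigned type (assumed for illustration).
template <unsigned int N> struct A {};
A<1> a_ok;         // fine: 1 is representable as unsigned int
// A<-1> a_bad;    // ill-formed: narrowing conversion of -1 to unsigned int

// The array bound depends on N, so its conversion to the size type can
// only be checked once N is known, at instantiation.
template <int N> struct S { char a[N]; };
S<1> s;
```

Inside the template body N has no value yet, so there is nothing to check for narrowing until S<1> is instantiated.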

> > +  expr = convert_like (conv, expr, complain);
> > +}
> >else
> >  expr = error_mark_node;
> >
> > diff --git gcc/cp/pt.c gcc/cp/pt.c
> > index 3780f3492aa..12d1a1e1cd3 100644
> > --- gcc/cp/pt.c
> > +++ gcc/cp/pt.c
> > @@ -6669,9 +6669,10 @@ convert_nontype_argument (tree type, tree expr, 
> > tsubst_flags_t complain)
> >   /* C++17: A template-argument for a non-type template-parameter 
> > shall
> >  be a converted constant expression (8.20) of the type of the
> >  template-parameter.  */
> > + int errs = errorcount;
> >   expr = build_converted_constant_expr (type, expr, complain);
> >   if (expr == error_mark_node)
> > -   return error_mark_node;
> > +   return errorcount > errs ? NULL_TREE : error_mark_node;
> 
> I suspect that what you want here is to check (complain & tf_error)
> rather than errorcount.  Otherwise it needs a comment.

I added a comment.  Checking complain doesn't work because that doesn't
say if we have really issued an error.  If we have not, and we return
NULL_TREE anyway, we hit this assert:

  if (lost)
    {
      gcc_assert (!(complain & tf_error) || seen_error ());
      return error_mark_node;
    }
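The difference between "we were allowed to complain" and "we actually issued an error" can be shown with a toy model. Everything here is hypothetical (`error_count`, `convert`, `Result` are stand-ins, not GCC interfaces); the point is only the snapshot-and-compare pattern on the error counter.

```cpp
#include <cassert>

// Hypothetical stand-in for the compiler's running error counter.
static int error_count = 0;

enum class Result { ok, failed_silently, failed_with_diagnostic };

// Toy conversion: fails for negative values, but only values below -1
// are diagnosed, and only when COMPLAIN is set.
static bool convert (int value, bool complain)
{
  if (value >= 0)
    return true;
  if (complain && value < -1)
    ++error_count;
  return false;
}

// Snapshot the counter, run the conversion, then compare: the COMPLAIN
// flag alone cannot tell us whether a diagnostic was really emitted.
Result checked_convert (int value, bool complain)
{
  int errs = error_count;
  if (convert (value, complain))
    return Result::ok;
  return error_count > errs ? Result::failed_with_diagnostic
                            : Result::failed_silently;
}
```

Here `checked_convert (-1, true)` fails without a diagnostic even though complaining was allowed, which is the situation that would trip an assert like the one above if the caller assumed an error had been reported.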

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2018-06-29  Marek Polacek  

PR c++/57891
* call.c (build_converted_constant_expr): Set check_narrowing.
* decl.c (compute_array_index_type): Add warning sentinel.  Use
input_location.
* pt.c (convert_nontype_argument): Return NULL_TREE if any errors
were reported.
* typeck2.c (check_narrowing): Don't mention { } in diagnostic.

* g++.dg/cpp0x/Wnarrowing6.C: New test.
* g++.dg/cpp0x/Wnarrowing7.C: New test.
* g++.dg/cpp0x/Wnarrowing8.C: New test.
* g++.dg/cpp0x/constexpr-data2.C: Add dg-error.
* g++.dg/init/new43.C: Adjust dg-error.
* g++.dg/other/fold1.C: Likewise.
  

Re: [PATCH][PR84877]Dynamically align the address for local parameter copy on the stack when required alignment is larger than MAX_SUPPORTED_STACK_ALIGNMENT

2018-06-29 Thread Jeff Law
On 03/22/2018 05:56 AM, Renlin Li wrote:
> Hi all,
> 
> As described in PR84877. https://gcc.gnu.org/bugzilla/show_bug.cgi?id=84877
> The local copy of parameter on stack is not aligned.
> 
> For BLKmode parameters, a local copy on the stack will be saved.
> There are three cases:
> 1) arguments passed partially on the stack, partially via registers.
> 2) arguments passed fully on the stack.
> 3) arguments passed via registers.
> 
> After the change here, in all three cases, the stack slot for the local 
> parameter copy is aligned by the data type.
Presumably this is only for named arguments.  If we have to deal with
stdarg/varargs there's a number of additional complications we'd need to
worry about.


> 
> The stack slot is the DECL_RTL of the parameter. All the references 
> thereafter in the function will refer to this RTL.
OK.  This implies we're dealing strictly with named arguments.


> 
> 
> To populate the local copy on the stack,
> For case 1) and 2), there are operations to move data from the caller's stack 
> (from incoming rtl) into callee's stack.
> 
> For case 3), the registers are directly saved into the stack slot.
> 
> In all cases, the destination address is properly aligned.
> But for case 1) and case 2), the source address is not aligned by the type. 
> It is defined by the PCS how the arguments are prepared.
I'm not 100% sure the destination is always aligned.  I vaguely recall
the PA being an oddball on this kind of stuff.



> 
> The block move operation is fulfilled by emit_block_move (). As far as I can 
> see,
Yes.  But we may have had to flush argument registers to memory prior to
using emit_block_move.  And the flushing operation can be odd because of
things like alignment, padding, etc.  The PA in particular was an
oddball here, but I don't remember the precise details.


> 
> it will use the smaller alignment of source and destination.
> This looks fine as long as we don't use instructions which require a strictly 
> larger alignment than the address actually has.
Right.


> 
> 
> Here, it only changes receiving parameters.
> The function assign_stack_local_1 will be called in various places.
> Usually, the caller will constrain the ALIGN parameter. For example via 
> STACK_SLOT_ALIGNMENT macro.
> 
> assign_parm_setup_block will call assign_stack_local () with alignment from 
> the parameter type which in this case could be
> 
> larger than MAX_SUPPORTED_STACK_ALIGNMENT.
> 
> The alignment operation for parameter copy on the stack is similar to stack 
> vars.
> 
> First, enough space is reserved on the stack. The size is fixed at compile 
> time.
> 
> Instructions are emitted to dynamically get an aligned address at runtime 
> within this piece of memory.
At least that's how it's supposed to work.  I have some concerns about
the existing dynamic alignment bits independent of your change.
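For reference, the reserve-then-align scheme described in the quoted text amounts to the usual round-up arithmetic. This is a sketch under assumptions: `align_up` and `reserve_size` are hypothetical names, and the worst-case padding of align - 1 bytes assumes nothing is known about the alignment of the reserved area (the compiler can do better when it knows the guaranteed stack alignment).

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Round ADDR up to the next multiple of ALIGN, which must be a power of two.
std::uintptr_t align_up (std::uintptr_t addr, std::size_t align)
{
  assert (align != 0 && (align & (align - 1)) == 0);
  return (addr + align - 1) & ~(static_cast<std::uintptr_t> (align) - 1);
}

// Bytes to reserve at compile time so that an aligned object of SIZE bytes
// always fits, wherever the reserved area happens to start at runtime.
std::size_t reserve_size (std::size_t size, std::size_t align)
{
  return size + align - 1;
}
```

The reservation is fixed at compile time, while `align_up` corresponds to the instructions emitted to compute the aligned address at runtime; the extra align - 1 bytes are the stack cost mentioned below.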


> 
> 
> This will unavoidably increase the usage of stack. However, it really depends 
> on
> 
> how many over-aligned parameters are passed by value.
It's relatively rare in my experience, so I wouldn't let this get in the
way.


> 
> x86-linux, arm-none-eabi, aarch64-none-elf regression test Okay.
> linux-armhf bootstrap Okay.
>   
> I assume there are other targets which will be affected by the change.
> But I don't have environment to test.
I don't think my tester will help much here as over-aligned parameters
are relatively rare and likely not triggered by bootstraps.

> 
> Okay to commit?
>   
> 
> Regards,
> Renlin
> 
> gcc/
> 
> 2018-03-22  Renlin Li  
> 
> PR middle-end/84877
> * explow.h (get_dynamic_stack_size): Declare it as external.
> * explow.c (record_new_stack_level): Remove function static attribute.
> * function.c (assign_stack_local_1): Dynamically align the stack slot
> addr for parameter copy on the stack.
> 
> gcc/testsuite/
> 
> 2018-03-22  Renlin Li  
> 
> PR middle-end/84877
> * gcc.dg/pr84877.c: New.
OK.  Certainly keep an eye out for issues on other targets.
Jeff


[patch] Add OpenACC Fortran support for deviceptr and variable in common blocks

2018-06-29 Thread Cesar Philippidis
The attached patch adds Fortran support for OpenACC deviceptr
and the use of common block variables in data clauses (both implicit and
explicit). This patch also relaxes the Fortran parser to not error on
certain types of integral expressions and assumed-size arrays.

With respect to those errors, I removed them because a lot of working
applications do not explicitly use type attributes (like contiguous).
Perhaps it would be better to reduce them to a warning. Any thoughts on
that? My argument for their removal is that, while the standard states
that, say, arrays must be contiguous or bad things will happen, it does
not necessarily mandate that the compiler enforce it. I.e., the intent is
to set the user's expectation that things will go bad if garbage input
is fed to the accelerator. If necessary, I can push back on the OpenACC
standards committee on this issue, but don't expect a quick resolution.

In hindsight, I probably should have kept the error relaxation patches
separate. This patch includes the following patches from og8:

  * (dd8b75a) [OpenACC] Update deviceptr handling
  * (634727d) [OpenACC] Handle Fortran deviceptr clause
  * (d50862a) [Fortran] Remove pointer check in check_array_not_assumed
  * (0793cef) [OpenACC] add support for fortran common blocks
  * (bdc1acc) [Fortran] update gfortran's tile clause error handling
  * (5dc4968) Fix PR72715 "ICE in gfc_trans_omp_do, at
  fortran/trans-openmp.c:3164"

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar
2018-06-29  Cesar Philippidis  
	James Norris  

	gcc/fortran/
	* openmp.c (gfc_match_omp_map_clause): Re-write handling of the
	deviceptr clause.  Add new common_blocks argument.  Propagate it to
	gfc_match_omp_variable_list.
	(gfc_match_omp_clauses): Update calls to gfc_match_omp_map_clauses.
	(resolve_positive_int_expr): Promote the warning to an error.
	(check_array_not_assumed): Remove pointer check.
	(resolve_oacc_nested_loops): Error on do concurrent loops.
	* trans-openmp.c (gfc_omp_finish_clause): Don't create pointer data
	mappings for deviceptr clauses.
	(gfc_trans_omp_clauses): Likewise.

	gcc/
	* gimplify.c (enum gimplify_omp_var_data): Add GOVD_DEVICEPTR.
	(oacc_default_clause): Privatize fortran common blocks.
	(omp_notice_variable): Add GOVD_DEVICEPTR attribute when appropriate.
	Defer the expansion of DECL_VALUE_EXPR for common block decls.
	(gimplify_scan_omp_clauses): Add GOVD_DEVICEPTR attribute when
	appropriate.
	(gimplify_adjust_omp_clauses_1): Set GOMP_MAP_FORCE_DEVICEPTR for
	implicit deviceptr mappings.

	gcc/testsuite/
	* c-c++-common/goacc/deviceptr-4.c: Update.
	* gfortran.dg/goacc/common-block-1.f90: New test.
	* gfortran.dg/goacc/common-block-2.f90: New test.
	* gfortran.dg/goacc/loop-2.f95: Update.
	* gfortran.dg/goacc/loop-3-2.f95: Update.
	* gfortran.dg/goacc/loop-3.f95: Update.
	* gfortran.dg/goacc/loop-5.f95: Update.
	* gfortran.dg/goacc/pr72715.f90: New test.
	* gfortran.dg/goacc/sie.f95: Update.
	* gfortran.dg/goacc/tile-1.f90: Update.
	* gfortran.dg/gomp/pr77516.f90: Update.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Handle Fortran deviceptr
	clause.
	(GOACC_data_start): Likewise.
	* testsuite/libgomp.oacc-fortran/common-block-1.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-2.f90: New test.
	* testsuite/libgomp.oacc-fortran/common-block-3.f90: New test.
	* testsuite/libgomp.oacc-fortran/deviceptr-1.f90: New test.


From 09c1aa87d9a7db2e08384bb47c80b4a61d218a99 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Mon, 25 Jun 2018 13:10:13 -0700
Subject: [PATCH] fortran deviceptr

dd8b75 [OpenACC] Update deviceptr handling
634727 [OpenACC] Handle Fortran deviceptr clause
0793ce [OpenACC] add support for fortran common blocks
bdc1ac [Fortran] update gfortran's tile clause error handling
d50862 [Fortran] Remove pointer check in check_array_not_assumed
5dc496 Fix PR72715 "ICE in gfc_trans_omp_do, at fortran/trans-openmp.c:3164"

---
 gcc/fortran/openmp.c  |  57 ++---
 gcc/fortran/trans-openmp.c|   9 +
 gcc/gimplify.c|  35 +++-
 .../c-c++-common/goacc/deviceptr-4.c  |   2 +-
 .../gfortran.dg/goacc/common-block-1.f90  |  69 ++
 .../gfortran.dg/goacc/common-block-2.f90  |  49 +
 gcc/testsuite/gfortran.dg/goacc/loop-2.f95|   8 +-
 gcc/testsuite/gfortran.dg/goacc/loop-3-2.f95  |   4 +-
 gcc/testsuite/gfortran.dg/goacc/loop-3.f95|   4 +-
 gcc/testsuite/gfortran.dg/goacc/loop-5.f95|  12 --
 gcc/testsuite/gfortran.dg/goacc/pr72715.f90   |   6 +
 gcc/testsuite/gfortran.dg/goacc/sie.f95   |  36 ++--
 gcc/testsuite/gfortran.dg/goacc/tile-1.f90|  16 +-
 gcc/testsuite/gfortran.dg/gomp/pr77516.f90|   2 +-
 libgomp/oacc-parallel.c   |  11 +-
 .../libgomp.oacc-fortran/common-block-1.f90   | 105 ++
 .../libgomp.oacc-fortran/common-block-2.f90   | 150 

[PATCH 3/3] Extend -falign-FOO=N to N[:M[:N2[:M2]]]

2018-06-29 Thread Jeff Law
On 05/21/2018 12:58 PM, marxin wrote:
> gcc/ChangeLog:
> 
> 2018-05-25  Denys Vlasenko  
>   Martin Liska  
> 
>   PR middle-end/66240
>   PR target/45996
>   PR c/84100
>   * common.opt: Rename align options with 'str_' prefix.
>   * common/config/i386/i386-common.c (set_malign_value): New
>   function.
>   (ix86_handle_option): Use it to set -falign-* options.
>   * config/aarch64/aarch64-protos.h (struct tune_params): Change
>   type from int to string.
>   * config/aarch64/aarch64.c: Update default values from int
>   to string.
>   * config/alpha/alpha.c (alpha_override_options_after_change):
>   Likewise.
>   * config/arm/arm.c (arm_override_options_after_change_1): Likewise.
>   * config/i386/dragonfly.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   * config/i386/freebsd.h (SUBALIGN_LOG): New.
>   (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   * config/i386/gas.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   * config/i386/gnu-user.h (SUBALIGN_LOG): New.
>   (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   * config/i386/i386.c (struct ptt): Change type from int to
>   string.
>   (ix86_default_align): Set default values.
>   * config/i386/i386.h (ASM_OUTPUT_MAX_SKIP_PAD): Print
>   max skip conditionally.
>   * config/i386/iamcu.h (SUBALIGN_LOG): New.
>   (ASM_OUTPUT_MAX_SKIP_ALIGN):
>   * config/i386/lynx.h (ASM_OUTPUT_MAX_SKIP_ALIGN):
>   * config/i386/netbsd-elf.h (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   * config/i386/openbsdelf.h (SUBALIGN_LOG): New.
>   (ASM_OUTPUT_MAX_SKIP_ALIGN): Print max skip conditionally.
>   * config/i386/x86-64.h (SUBALIGN_LOG): New.
>   (ASM_OUTPUT_MAX_SKIP_ALIGN): Print
>   max skip conditionally.
>   (ASM_OUTPUT_MAX_SKIP_PAD): Likewise.
>   * config/mips/mips.c (mips_set_compression_mode): Change
>   type of constants.
>   * config/rs6000/rs6000.c (rs6000_option_override_internal):
>   Likewise.
>   * config/rx/rx.c (rx_option_override): Likewise.
>   * config/rx/rx.h (JUMP_ALIGN): Use align_jumps_log.
>   (LABEL_ALIGN): Use align_labels_log.
>   (LOOP_ALIGN): Use align_loops_align.
>   * config/sh/sh.c (sh_override_options_after_change):
>   Change type of constants.
>   * config/spu/spu.c (spu_sched_init): Likewise.
>   * config/visium/visium.c (visium_option_override): Likewise.
>   * doc/invoke.texi: Document extended format of -falign-*.
>   * final.c: Use align_labels alignment.
>   * flags.h (struct target_flag_state): Change type to use
>   align_flags.
>   (struct align_flags_tuple): New.
>   (struct align_flags): Likewise.
>   (align_loops_log): Redefine macro to use new types.
>   (align_loops_max_skip): Redefine macro to use new types.
>   (align_jumps_log): Redefine macro to use new types.
>   (align_jumps_max_skip): Redefine macro to use new types.
>   (align_labels_log): Redefine macro to use new types.
>   (align_labels_max_skip): Redefine macro to use new types.
>   (align_functions_log): Redefine macro to use new types.
>   (align_loops): Redefine macro to use new types.
>   (align_jumps): Redefine macro to use new types.
>   (align_labels): Redefine macro to use new types.
>   (align_functions): Redefine macro to use new types.
>   (align_functions_max_skip): Redefine macro to use new types.
>   * function.c (invoke_set_current_function_hook): Propagate
>   alignment values from flags to global variables default in
>   toplev.h.
>   * ipa-icf.c (sem_function::equals_wpa): Use
>   cl_optimization_option_eq instead of memcmp.
>   * lto-streamer.h (cl_optimization_stream_out): Support streaming
>   of string types.
>   (cl_optimization_stream_in): Likewise.
>   * optc-save-gen.awk: Support strings in cl_optimization.
>   * opth-gen.awk: Likewise.
>   * opts.c (finish_options): Remove error checking of invalid
>   value ranges.
>   (MAX_CODE_ALIGN): Remove.
>   (MAX_CODE_ALIGN_VALUE): Likewise.
>   (parse_and_check_align_values): New function.
>   (check_alignment_argument): Likewise.
>   (common_handle_option): Use check_alignment_argument.
>   * opts.h (parse_and_check_align_values): Declare.
>   * toplev.c (init_alignments): Remove.
>   (read_log_maxskip): New.
>   (parse_N_M): Likewise.
>   (parse_alignment_opts): Likewise.
>   (backend_init_target): Remove usage of init_alignments.
>   * toplev.h (parse_alignment_opts): Declare.
>   * tree-streamer-in.c (streamer_read_tree_bitfields): Add new
>   argument.
>   * tree-streamer-out.c (streamer_write_tree_bitfields): Likewise.
>   * tree.c (cl_option_hasher::equal): New.
>   * varasm.c: Use new 

Re: [patch] jump threading multiple paths that start from the same BB

2018-06-29 Thread Jeff Law
[ Returning to another old patch... ]

On 11/07/2017 10:33 AM, Aldy Hernandez wrote:
> [One more time, but without rejected HTML mail, because apparently this
> is my first post to gcc-patches *ever* ;-)].
> 
> Howdy!
> 
> While poking around in the backwards threader I noticed that we bail if
> we have already seen a starting BB.
> 
>   /* Do not jump-thread twice from the same block.  */
>   if (bitmap_bit_p (threaded_blocks, entry->src->index))
> 
> This limitation discards paths that are sub-paths of paths that have
> already been threaded.
> 
> The following patch scans the remaining to-be-threaded paths to identify
> if any of them start from the same point, and are thus sub-paths of the
> just-threaded path.  By removing the common prefix of blocks in upcoming
> threadable paths, and then rewiring first non-common block
> appropriately, we expose new threading opportunities, since we are no
> longer starting from the same BB.  We also simplify the would-be
> threaded paths, because we don't duplicate already duplicated paths.
> 
> This sounds confusing, but the documentation for the entry point to my
> patch (adjust_paths_after_duplication) shows an actual example:
> 
> +/* After an FSM path has been jump threaded, adjust the remaining FSM
> +   paths that are subsets of this path, so these paths can be safely
> +   threaded within the context of the new threaded path.
> +
> +   For example, suppose we have just threaded:
> +
> +   5 -> 6 -> 7 -> 8 -> 12  =>  5 -> 6' -> 7' -> 8' -> 12'
> +
> +   And we have an upcoming threading candidate:
> +   5 -> 6 -> 7 -> 8 -> 15 -> 20
> +
> +   This function adjusts the upcoming path into:
> +   8' -> 15 -> 20
> 
> Notice that we will no longer have two paths that start from the same
> BB.  One will start with bb5, while the adjusted path will start with
> bb8'.  With this we kill two birds-- we are able to thread more paths,
> and these paths will avoid duplicating a whole mess of things that have
> already been threaded.
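The adjustment in the example above can be modelled on plain block indices. This is a sketch, not the patch: paths are vectors of block numbers rather than jump_thread_edge vectors, a copied block N' is represented as N + COPY_OFFSET, and returning an empty path when nothing is left after the shared prefix is a choice made for this sketch only.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical model: a path is a list of basic-block indices, and the
// copy of block N made by threading is written as N + COPY_OFFSET.
using Path = std::vector<int>;
const int COPY_OFFSET = 1000;

// Number of leading blocks that A and B have in common.
std::size_t common_prefix (const Path &a, const Path &b)
{
  std::size_t n = 0;
  while (n < a.size () && n < b.size () && a[n] == b[n])
    ++n;
  return n;
}

// Adjust CANDIDATE after THREADED has been threaded: drop the shared
// prefix and restart from the copy of the last shared block.
Path adjust_candidate (const Path &threaded, const Path &candidate)
{
  std::size_t n = common_prefix (threaded, candidate);
  if (n < 2)
    return candidate;           // no shared edge, nothing to adjust
  if (n == candidate.size ())
    return Path ();             // nothing left beyond the shared prefix
  Path adjusted;
  adjusted.push_back (candidate[n - 1] + COPY_OFFSET);
  adjusted.insert (adjusted.end (), candidate.begin () + n, candidate.end ());
  return adjusted;
}
```

For the paths in the comment, the threaded path 5 -> 6 -> 7 -> 8 -> 12 and the candidate 5 -> 6 -> 7 -> 8 -> 15 -> 20 share four blocks, so the candidate becomes 8' -> 15 -> 20, i.e. {1008, 15, 20} in this model.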
> 
> The attached patch is a subset of some upcoming work that can live on
> its own.  It bootstraps and regtests.  Also, by running it on a handful
> of .ii files, I can see that we are able to thread sub-paths that we
> previously dropped on the floor.  More is better, right? :)
> 
> To test this, I stole Jeff's method of using cachegrind to benchmark
> instruction counts and conditional branches
> (https://gcc.gnu.org/ml/gcc-patches/2013-11/msg02434.html).
> 
> Basically, I bootstrapped two compilers, with and without improvements,
> and used each to build a stage1 trunk.  Each of these stage1-trunks were
> used on 246 .ii GCC files I have lying around from a bootstrap some
> random point last year.  I used no special flags on builds apart from
> --enable-languages=c,c++.
> 
> Although I would've wished for a larger improvement, this work comes for
> free, as it's just a subset of other work I'm hacking on.
> 
> Without further ado, here are my monumental, earth shattering improvements:
> 
> Conditional branches
>    Without patch: 411846839709
>    With patch:    411831330316
>    %changed:      -0.0037660%
> 
> Number of instructions
>    Without patch: 2271844650634
>    With patch:    2271761153578
>    %changed:      -0.0036754%
> 
> 
> OK for trunk?
> Aldy
> 
> p.s. There is a lot of debugging/dumping code in my patch, which I can
> gladly remove if/when approved.  It helps keep my head straight while
> looking at this spaghetti :).
> 
> curr.patch
> 
> 
> gcc/
> 
>   * tree-ssa-threadupdate.c (mark_threaded_blocks): Avoid
>   dereferencing path[] beyond its length.
>   (debug_path): New.
>   (debug_all_paths): New.
>   (rewire_first_differing_edge): New.
>   (adjust_paths_after_duplication): New.
>   (duplicate_thread_path): Call adjust_paths_after_duplication.
>   Add new argument.
>   (thread_through_all_blocks): Add new argument to
>   duplicate_thread_path.
This is fine for the trunk.  I'd keep the dumping code as-is.  It'll be
useful in the future :-)

> 
> diff --git a/gcc/tree-ssa-threadupdate.c b/gcc/tree-ssa-threadupdate.c
> index 1dab0f1fab4..53ac7181b4b 100644
> --- a/gcc/tree-ssa-threadupdate.c
> +++ b/gcc/tree-ssa-threadupdate.c
> +
> +/* Rewire a jump_thread_edge so that the source block is now a
> +   threaded source block.
> +
> +   PATH_NUM is an index into the global path table PATHS.
> +   EDGE_NUM is the jump thread edge number into said path.
> +
> +   Returns TRUE if we were able to successfully rewire the edge.  */
> +
> +static bool
> +rewire_first_differing_edge (unsigned path_num, unsigned edge_num)
> +{
> +  vec<jump_thread_edge *> *path = paths[path_num];
> +  edge &e = (*path)[edge_num]->e;
> +  if (dump_file && (dump_flags & TDF_DETAILS))
> +fprintf (dump_file, "rewiring edge candidate: %d -> %d\n",
> +  e->src->index, e->dest->index);
> +  basic_block src_copy = get_bb_copy (e->src);
> +  if (src_copy == NULL)
> +{
> +  if (dump_file && (dump_flags & 

Re: [PATCH] Fix PR86321

2018-06-29 Thread Janne Blomqvist
On Thu, Jun 28, 2018 at 12:16 PM, Richard Biener  wrote:

>
> The fortran FE creates array descriptor types via build_distinct_type_copy
> which ends up re-using the TYPE_FIELDs chain of FIELD_DECLs between
> types in different type-variant chains.  While that seems harmless
> in practice it breaks once we try to generate C-like debug info for
> it because dwarf2out doesn't expect such sharing to occur (and I
> wouldn't be surprised of other odd behavior elsewhere that simply
> doesn't manifest in a as fatal way as PR86321).
>
> We generate C-like debug info when you use LTO and -g0 at compile-time
> and -g at link-time (that's the way targets w/o debug-copy implementation
> end up wired).  For non-LTO we avoid directly generating debug for
> the array descriptor types by detecting them via a langhook.
>
> The solution seems to be to adhere to the invariant that TYPE_FIELDs
> (and thus FIELD_DECL) sharing is only valid between variant types
> and their main variant.  Thus, copy the chain.
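As an illustration of "copy the chain", here is a toy model of the ownership invariant. The `Field` and `Type` structs are hypothetical stand-ins for FIELD_DECL and a record type, with `context` playing the role of DECL_CONTEXT; none of this is the actual Fortran front-end change.

```cpp
#include <cassert>
#include <memory>
#include <string>
#include <vector>

struct Type;

// Hypothetical stand-in for a FIELD_DECL: a name plus the type that owns it.
struct Field
{
  std::string name;
  const Type *context;
};

// Hypothetical stand-in for a record type with its chain of fields.
struct Type
{
  std::vector<std::shared_ptr<Field>> fields;
};

// Give TO its own copies of FROM's fields, each re-parented to TO,
// instead of letting both types point at the same Field objects.
void copy_fields (const Type &from, Type &to)
{
  assert (&from != &to);
  to.fields.clear ();
  for (const std::shared_ptr<Field> &f : from.fields)
    {
      std::shared_ptr<Field> copy = std::make_shared<Field> (*f);
      copy->context = &to;
      to.fields.push_back (copy);
    }
}

// True if every field of T names T as its owner.
bool owns_all_fields (const Type &t)
{
  for (const std::shared_ptr<Field> &f : t.fields)
    if (f->context != &t)
      return false;
  return true;
}
```

A plain copy of a `Type` shares the `Field` objects, so `owns_all_fields` fails for the copy; after `copy_fields` both types pass, which is the kind of check a verifier could make.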
>
> Bootstrap / regtest pending on x86_64-unknown-linux-gnu.
>
> I suppose verify_type () could check proper ownership of the
> FIELD_DECLs (simply verify that DECL_CONTEXT is TYPE_MAIN_VARIANT).
> But I guess this may break in different ways.  Honza - did you
> originally try to verify that?  It currently says
>
>   for (tree fld = TYPE_FIELDS (t); fld; fld = TREE_CHAIN (fld))
> {
>   /* TODO: verify properties of decls.  */
>   if (TREE_CODE (fld) == FIELD_DECL)
> ;
> ...
>
> OK for trunk?
>

Ok, thanks for the patch, and to Dominique for testing!

-- 
Janne Blomqvist


[committed] Convert v850 to LRA

2018-06-29 Thread Jeff Law

So this patch converts the v850 port to use LRA.  From a code generation
standpoint it looks like the old bare v850 gets slightly worse code.
However, we get slightly better code in general on the newer parts like
v850e3v5.

The only really interesting part of the conversion is the removal of
several patterns that never should have existed in the first place.  A
later variant of the v850 introduced loads/stores with larger
displacement values than the original v850 supported.

Those loads/stores were implemented with distinct patterns that only
matched those large displacement loads/stores.

The better solution is to fix GO_IF_LEGITIMATE_ADDRESS to allow the
larger displacements.  Once that's done the existing movxx patterns as
well as the zero/sign extending loads can use the bigger displacements
as-is.
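The shape of that check can be sketched outside of GCC as a simple range test. Assumptions in this sketch: the base displacement is a signed 16-bit value (what constraint K accepts) and the larger one is a signed 23-bit value (what constraint W and the removed disp23 patterns suggest); `fits_signed` and `displacement_ok` are hypothetical names, and the mode restriction to SImode/HImode/QImode in the real hunk is not modelled.

```cpp
#include <cassert>

// True if VALUE fits in a signed two's-complement field of BITS bits.
constexpr bool fits_signed (long value, int bits)
{
  return value >= -(1L << (bits - 1)) && value < (1L << (bits - 1));
}

// Hypothetical stand-in for the displacement part of the address check:
// the small displacement is always allowed, the large one only when the
// target has the newer load/store forms (HAS_DISP23).
constexpr bool displacement_ok (long disp, bool has_disp23)
{
  return fits_signed (disp, 16) || (has_disp23 && fits_signed (disp, 23));
}
```

With the predicate widened like this, one set of move patterns can cover both displacement sizes, so separate patterns for the large displacements are no longer needed.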

That also avoids a bit of a wart in the LRA implementation related to
re-recognizing insns during register elimination.  I'm actually rather
surprised reload didn't complain as well as we've long required a single
movXX pattern to implement all the loads/stores for a particular mode
for similar reasons.

From a testing standpoint we get 100% identical results with LRA as we
were getting without LRA.

Installed on the trunk.

Jeff
diff --git a/gcc/ChangeLog b/gcc/ChangeLog
index a0da66e6932..c486d6d2172 100644
--- a/gcc/ChangeLog
+++ b/gcc/ChangeLog
@@ -1,3 +1,12 @@
+2018-06-29  Jeff Law  
+
+   * config/v850/v850.c (v850_legitimate_address_p): Handle large
+   displacements for TARGET_V850E2V3 and newer.
+   (TARGET_LRA_P): Remove.  Defaults to LRA now.
+   * config/v850/v850.md (sign23byte_load): Remove.
+   (unsign23byte_load, sign23hword_load, unsign23hword_load): Likewise.
+   (23word_load, 23byte_store, 23hword_store, 23word_store): Likewise.
+
 2018-06-29  Martin Liska  
 
PR lto/85759
diff --git a/gcc/config/v850/v850.c b/gcc/config/v850/v850.c
index cb2debf46f1..8936c732307 100644
--- a/gcc/config/v850/v850.c
+++ b/gcc/config/v850/v850.c
@@ -3079,7 +3079,10 @@ v850_legitimate_address_p (machine_mode mode, rtx x, 
bool strict_p,
 return true;
   if (GET_CODE (x) == PLUS
   && v850_rtx_ok_for_base_p (XEXP (x, 0), strict_p)
-  && constraint_satisfied_p (XEXP (x,1), CONSTRAINT_K)
+  && (constraint_satisfied_p (XEXP (x, 1), CONSTRAINT_K)
+ || (TARGET_V850E2V3_UP
+ && (mode == SImode || mode == HImode || mode == QImode)
+ && constraint_satisfied_p (XEXP (x, 1), CONSTRAINT_W)))
   && ((mode == QImode || INTVAL (XEXP (x, 1)) % 2 == 0)
   && CONST_OK_FOR_K (INTVAL (XEXP (x, 1))
  + (GET_MODE_NUNITS (mode) * UNITS_PER_WORD
@@ -3309,9 +3312,6 @@ v850_modes_tieable_p (machine_mode mode1, machine_mode 
mode2)
 #undef  TARGET_LEGITIMATE_CONSTANT_P
 #define TARGET_LEGITIMATE_CONSTANT_P v850_legitimate_constant_p
 
-#undef TARGET_LRA_P
-#define TARGET_LRA_P hook_bool_void_false
-
 #undef  TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P
 #define TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P v850_legitimate_address_p
 
diff --git a/gcc/config/v850/v850.md b/gcc/config/v850/v850.md
index b8f098b9363..6530778c8f6 100644
--- a/gcc/config/v850/v850.md
+++ b/gcc/config/v850/v850.md
@@ -138,74 +138,6 @@
 ;; --
 ;; MOVE INSTRUCTIONS
 ;; --
-(define_insn "sign23byte_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (sign_extend:SI
-   (mem:QI (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand 2 "disp23_operand" "W")]
-  "TARGET_V850E2V3_UP"
-  "ld.b %2[%1],%0"
-  [(set_attr "length" "4")])
-  
-(define_insn "unsign23byte_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (zero_extend:SI
-   (mem:QI (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand 2 "disp23_operand" "W")]
-  "TARGET_V850E2V3_UP"
-  "ld.bu %2[%1],%0"
-  [(set_attr "length" "4")])
-
-(define_insn "sign23hword_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (sign_extend:SI
-   (mem:HI (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand 2 "disp23_operand" "W")]
-  "TARGET_V850E2V3_UP"
-  "ld.h %2[%1],%0"
-  [(set_attr "length" "4")])
-
-(define_insn "unsign23hword_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (zero_extend:SI
-   (mem:HI (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand 2 "disp23_operand" "W")]
-  "TARGET_V850E2V3_UP"
-  "ld.hu %2[%1],%0"
-  [(set_attr "length" "4")])
-
-(define_insn "23word_load"
-  [(set (match_operand:SI 0 "register_operand" "=r")
-   (mem:SI (plus:SI (match_operand:SI 1 "register_operand" "r")
-(match_operand 2 "disp23_operand" "W"]
-  "TARGET_V850E2V3_UP"
-  "ld.w %2[%1],%0"

Re: [PATCH] Make sure rs6000-modes.h is installed in plugin/include/config/rs6000/ subdir

2018-06-29 Thread Michael Meissner
On Fri, Jun 29, 2018 at 12:52:59AM +0200, Jakub Jelinek wrote:
> Hi!
> 
> The newly added rs6000-modes.h is now included from rs6000.h, so it is
> needed when building plugins that include tm.h, but it wasn't listed in the
> Makefile fragments and therefore included among PLUGIN_HEADERS.
> 
> Fixed thusly, tested on powerpc64le-linux, ok for trunk and 8.2?
> 
> 2018-06-28  Jakub Jelinek  
> 
>   * config/rs6000/t-rs6000: Append rs6000-modes.h to TM_H.

Thanks for catching this.

-- 
Michael Meissner, IBM
IBM, M/S 2506R, 550 King Street, Littleton, MA 01460-6245, USA
email: meiss...@linux.ibm.com, phone: +1 (978) 899-4797



Re: [patch] various OpenACC reduction enhancements - test cases

2018-06-29 Thread Cesar Philippidis
Attached are the updated reduction test cases. Again, these have been
bootstrapped and regression tested cleanly for x86_64 with nvptx
offloading. Are they OK for trunk?

Thanks,
Cesar
2018-06-29  Cesar Philippidis  
	Nathan Sidwell  

	gcc/testsuite/
	* c-c++-common/goacc/orphan-reductions-1.c: New test.
	* c-c++-common/goacc/reduction-7.c: New test.
	* c-c++-common/goacc/routine-4.c: Update.
	* g++.dg/goacc/reductions-1.C: New test.
	* gcc.dg/goacc/loop-processing-1.c: Update.
	* gfortran.dg/goacc/orphan-reductions-1.f90: New test.

	libgomp/
	* libgomp.oacc-c-c++-common/par-reduction-3.c: New test.
	* libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c: New test.
	* libgomp.oacc-fortran/reduction-9.f90: New test.


From b128e80be7cd2c81171fbd9c8b23e786bb832633 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Thu, 21 Jun 2018 11:37:56 -0700
Subject: [PATCH] Trunk reductions patches

OG8 Reduction patches

4469fc4 [Fortran] Permit reductions in gfc_omp_clause_copy_ctor
704f1a2 [nxptx, OpenACC] vector reductions
8a35c89 [OpenACC] Fix a reduction bug involving GOMP_MAP_FIRSTPRIVATE_POINTER variables
16ead33 [OpenACC] Update error messages for c and c++ reductions
65dd9cf Make OpenACC orphan gang reductions errors
5d60102 [PR80547] Handle parallel reductions explicitly initialized by the user
---
 gcc/c/c-parser.c  |  46 +-
 gcc/c/c-typeck.c  |   8 +
 gcc/config/nvptx/nvptx.c  | 233 +++-
 gcc/config/nvptx/nvptx.md |   7 +
 gcc/cp/parser.c   |  27 +-
 gcc/cp/semantics.c|   8 +
 gcc/fortran/openmp.c  |  12 +
 gcc/fortran/trans-openmp.c|   3 +-
 gcc/omp-general.h |   5 +-
 gcc/omp-low.c |  33 +-
 gcc/omp-offload.c |  18 +
 .../c-c++-common/goacc/orphan-reductions-1.c  |  56 ++
 .../c-c++-common/goacc/reduction-7.c  | 111 
 gcc/testsuite/c-c++-common/goacc/routine-4.c  |   8 +-
 gcc/testsuite/g++.dg/goacc/reductions-1.C | 548 ++
 .../gcc.dg/goacc/loop-processing-1.c  |   3 +-
 .../gfortran.dg/goacc/orphan-reductions-1.f90 | 204 +++
 .../par-reduction-3.c |  29 +
 .../reduction-cplx-flt-2.c|  32 +
 .../libgomp.oacc-fortran/reduction-9.f90  |  54 ++
 20 files changed, 1396 insertions(+), 49 deletions(-)
 create mode 100644 gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
 create mode 100644 gcc/testsuite/c-c++-common/goacc/reduction-7.c
 create mode 100644 gcc/testsuite/g++.dg/goacc/reductions-1.C
 create mode 100644 gcc/testsuite/gfortran.dg/goacc/orphan-reductions-1.f90
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/par-reduction-3.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-c-c++-common/reduction-cplx-flt-2.c
 create mode 100644 libgomp/testsuite/libgomp.oacc-fortran/reduction-9.f90

diff --git a/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
new file mode 100644
index 000..b0bd4a7de05
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/orphan-reductions-1.c
@@ -0,0 +1,56 @@
+/* Test orphan reductions.  */
+
+#include 
+
+#pragma acc routine seq
+int
+seq_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop seq reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 1;
+
+  return sum;
+}
+
+#pragma acc routine gang
+int
+gang_reduction (int n)
+{
+  int i, s1 = 0, s2 = 0;
+#pragma acc loop gang reduction(+:s1) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s1 = s1 + 2;
+
+#pragma acc loop gang reduction(+:s2) /* { dg-error "gang reduction on an orphan loop" } */
+  for (i = 0; i < n; i++)
+s2 = s2 + 2;
+
+
+  return s1 + s2;
+}
+
+#pragma acc routine worker
+int
+worker_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop worker reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 3;
+
+  return sum;
+}
+
+#pragma acc routine vector
+int
+vector_reduction (int n)
+{
+  int i, sum = 0;
+#pragma acc loop vector reduction(+:sum)
+  for (i = 0; i < n; i++)
+sum = sum + 4;
+
+  return sum;
+}
diff --git a/gcc/testsuite/c-c++-common/goacc/reduction-7.c b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
new file mode 100644
index 000..245c848d509
--- /dev/null
+++ b/gcc/testsuite/c-c++-common/goacc/reduction-7.c
@@ -0,0 +1,111 @@
+/* Exercise invalid reductions on array and struct members.  */
+
+void
+test_parallel ()
+{
+  struct {
+int a;
+float b[5];
+  } s1, s2[10];
+
+  int i;
+  double z[100];
+
+#pragma acc parallel reduction(+:s1.a) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)
+s1.a += 1;
+
+#pragma acc parallel reduction(+:s1.b[3]) /* { dg-error "invalid reduction variable" } */
+  for (i = 0; i < 10; i++)

Re: [patch] various OpenACC reduction enhancements - FE changes

2018-06-29 Thread Cesar Philippidis
Attached are the FE changes for the OpenACC reduction enhancements. They
depend on the ME patch.

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar
2018-06-29  Cesar Philippidis  
	Nathan Sidwell  

	gcc/c/
	* c-parser.c (c_parser_omp_variable_list): New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(c_parser_omp_clause_reduction): Update call to
	c_parser_omp_variable_list.  Propagate OpenACC errors as necessary.
	(c_parser_oacc_all_clauses): Update call to
	c_parser_omp_clause_reduction.
	(c_parser_omp_all_clauses): Likewise.
	* c-typeck.c (c_finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/cp/
	* parser.c (cp_parser_omp_var_list_no_open):  New c_omp_region_type
	argument.  Use it to specialize handling of OMP_CLAUSE_REDUCTION for
	OpenACC.
	(cp_parser_omp_clause_reduction): Update call to
	cp_parser_omp_variable_list.  Propagate OpenACC errors as necessary.
	(cp_parser_oacc_all_clauses): Update call to
	cp_parser_omp_clause_reduction.
	(cp_parser_omp_all_clauses): Likewise.
	* semantics.c (finish_omp_clauses): Emit an error on orphan OpenACC
	gang reductions.

	gcc/fortran/
	* openmp.c (resolve_oacc_loop_blocks): Emit an error on orphan OpenACC
	gang reductions.
	* trans-openmp.c (gfc_omp_clause_copy_ctor): Permit reductions.
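
For readers skimming the ChangeLog: the OpenACC-specific check in the C
parser hunk below peeks at the token that follows the variable name and
rejects the reduction if it is '(', '[', '.' or '->'. A minimal
standalone sketch of that idea follows. It works on plain strings rather
than on the parser's token stream, and the helper name
acc_reduction_var_ok is hypothetical, not something the patch adds.

```c
#include <assert.h>
#include <ctype.h>

/* Hypothetical helper, not part of the patch: return 1 if TEXT names a
   plain variable, 0 if the identifier is followed by '(', '[', '.' or
   "->", i.e. the cases the patch reports as "invalid reduction
   variable".  */
static int
acc_reduction_var_ok (const char *text)
{
  assert (text);
  const char *p = text;

  /* Skip the identifier itself.  */
  while (isalnum ((unsigned char) *p) || *p == '_')
    p++;
  /* Skip blanks before the next token.  */
  while (isspace ((unsigned char) *p))
    p++;

  switch (*p)
    {
    case '(':
    case '[':
    case '.':
      return 0;
    case '-':
      return p[1] != '>';
    default:
      return 1;
    }
}
```

With this sketch, "sum" is accepted while "s1.a", "s1.b[3]" and "p->x"
are rejected, which matches the dg-error lines in reduction-7.c.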

---
diff --git a/gcc/c/c-parser.c b/gcc/c/c-parser.c
index 7a926285f3a..a6f453dae54 100644
--- a/gcc/c/c-parser.c
+++ b/gcc/c/c-parser.c
@@ -965,12 +965,13 @@ class token_pair
 
   /* Like token_pair::require_close, except that tokens will be skipped
  until the desired token is found.  An error message is still produced
- if the next token is not as expected.  */
+ if the next token is not as expected, unless QUIET is set.  */
 
-  void skip_until_found_close (c_parser *parser) const
+  void skip_until_found_close (c_parser *parser, bool quiet = false) const
   {
 c_parser_skip_until_found (parser, traits_t::close_token_type,
-			   traits_t::close_gmsgid, m_open_loc);
+			   quiet ? NULL : traits_t::close_gmsgid,
+			   m_open_loc);
   }
 
  private:
@@ -11498,7 +11499,8 @@ c_parser_oacc_wait_list (c_parser *parser, location_t clause_loc, tree list)
 static tree
 c_parser_omp_variable_list (c_parser *parser,
 			location_t clause_loc,
-			enum omp_clause_code kind, tree list)
+			enum omp_clause_code kind, tree list,
+			enum c_omp_region_type ort = C_ORT_OMP)
 {
   if (c_parser_next_token_is_not (parser, CPP_NAME)
   || c_parser_peek_token (parser)->id_kind != C_ID_ID)
@@ -11557,6 +11559,22 @@ c_parser_omp_variable_list (c_parser *parser,
 	  /* FALLTHROUGH  */
 	case OMP_CLAUSE_DEPEND:
 	case OMP_CLAUSE_REDUCTION:
+	  if (kind == OMP_CLAUSE_REDUCTION && ort == C_ORT_ACC)
+		{
+		  switch (c_parser_peek_token (parser)->type)
+		{
+		case CPP_OPEN_PAREN:
+		case CPP_OPEN_SQUARE:
+		case CPP_DOT:
+		case CPP_DEREF:
+		  error ("invalid reduction variable");
+		  t = error_mark_node;
+		default:;
+		  break;
+		}
+		  if (t == error_mark_node)
+		break;
+		}
 	  while (c_parser_next_token_is (parser, CPP_OPEN_SQUARE))
 		{
 		  tree low_bound = NULL_TREE, length = NULL_TREE;
@@ -12789,9 +12807,12 @@ c_parser_omp_clause_private (c_parser *parser, tree list)
  identifier  */
 
 static tree
-c_parser_omp_clause_reduction (c_parser *parser, tree list)
+c_parser_omp_clause_reduction (c_parser *parser, tree list,
+			   enum c_omp_region_type ort)
 {
   location_t clause_loc = c_parser_peek_token (parser)->location;
+  bool seen_error = false;
+
   matching_parens parens;
   if (parens.require_open (parser))
 {
@@ -12855,7 +12876,13 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
 	  tree nl, c;
 
 	  nl = c_parser_omp_variable_list (parser, clause_loc,
-	   OMP_CLAUSE_REDUCTION, list);
+	   OMP_CLAUSE_REDUCTION, list, ort);
+	  if (c_parser_peek_token (parser)->type != CPP_CLOSE_PAREN)
+	{
+	  seen_error = true;
+	  goto cleanup;
+	}
+
 	  for (c = nl; c != list; c = OMP_CLAUSE_CHAIN (c))
 	{
 	  tree d = OMP_CLAUSE_DECL (c), type;
@@ -12891,7 +12918,8 @@ c_parser_omp_clause_reduction (c_parser *parser, tree list)
 
 	  list = nl;
 	}
-  parens.skip_until_found_close (parser);
+cleanup:
+  parens.skip_until_found_close (parser, seen_error);
 }
   return list;
 }
@@ -13998,7 +14026,7 @@ c_parser_oacc_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case PRAGMA_OACC_CLAUSE_REDUCTION:
-	  clauses = c_parser_omp_clause_reduction (parser, clauses);
+	  clauses = c_parser_omp_clause_reduction (parser, clauses, C_ORT_ACC);
 	  c_name = "reduction";
 	  break;
 	case PRAGMA_OACC_CLAUSE_SEQ:
@@ -14157,7 +14185,7 @@ c_parser_omp_all_clauses (c_parser *parser, omp_clause_mask mask,
 	  c_name = "private";
 	  break;
 	case 

Re: [patch] various OpenACC reduction enhancements - ME and nvptx changes

2018-06-29 Thread Cesar Philippidis
The attached patch includes the nvptx and GCC ME reductions enhancements.

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar
2018-06-29  Cesar Philippidis  
	Nathan Sidwell  

	gcc/
	* config/nvptx/nvptx.c (nvptx_propagate_unified): New.
	(nvptx_split_blocks): Call it for cond_uni insn.
	(nvptx_expand_cond_uni): New.
	(enum nvptx_builtins): Add NVPTX_BUILTIN_COND_UNI.
	(nvptx_init_builtins): Initialize it.
	(nvptx_expand_builtin):
	(nvptx_generate_vector_shuffle): Change integral SHIFT operand to
	tree BITS operand.
	(nvptx_vector_reduction): New.
	(nvptx_adjust_reduction_type): New.
	(nvptx_goacc_reduction_setup): Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_init): Don't update LHS if it doesn't exist.
	(nvptx_goacc_reduction_fini): Call nvptx_vector_reduction for vector.
	Use it to adjust the type of ref_to_res.
	(nvptx_goacc_reduction_teardown):
	* config/nvptx/nvptx.md (cond_uni): New pattern.
	* omp-general.h (enum oacc_loop_flags): Add OLF_REDUCTION enum.
	* omp-low.c (lower_oacc_reductions): Handle reduction decls mapped
	with GOMP_MAP_FIRSTPRIVATE_POINTER.
	(lower_oacc_head_mark): Use OLF_REDUCTION to mark OpenACC reductions.
	* omp-offload.c (oacc_loop_auto_partitions): Don't assign gang
	level parallelism to orphan reductions.
	(default_goacc_reduction): Retype ref_to_res as necessary.

---
diff --git a/gcc/config/nvptx/nvptx.c b/gcc/config/nvptx/nvptx.c
index 5608bee8a8d..33ec3db1153 100644
--- a/gcc/config/nvptx/nvptx.c
+++ b/gcc/config/nvptx/nvptx.c
@@ -2863,6 +2863,52 @@ nvptx_reorg_uniform_simt ()
 }
 }
 
+/* UNIFIED is a cond_uni insn.  Find the branch insn it affects, and
+   mark that as unified.  We expect to be in a single block.  */
+
+static void
+nvptx_propagate_unified (rtx_insn *unified)
+{
+  rtx_insn *probe = unified;
+  rtx cond_reg = SET_DEST (PATTERN (unified));
+  rtx pat = NULL_RTX;
+
+  /* Find the comparison.  (We could skip this and simply scan to the
+ blocks' terminating branch, if we didn't care for self
+ checking.)  */
+  for (;;)
+{
+  probe = next_real_insn (probe);
+  if (!probe)
+	break;
+  pat = PATTERN (probe);
+
+  if (GET_CODE (pat) == SET
+	  && GET_RTX_CLASS (GET_CODE (SET_SRC (pat))) == RTX_COMPARE
+	  && XEXP (SET_SRC (pat), 0) == cond_reg)
+	break;
+  gcc_assert (NONJUMP_INSN_P (probe));
+}
+  gcc_assert (pat);
+  rtx pred_reg = SET_DEST (pat);
+
+  /* Find the branch.  */
+  do
+probe = NEXT_INSN (probe);
+  while (!JUMP_P (probe));
+
+  pat = PATTERN (probe);
+  rtx itec = XEXP (SET_SRC (pat), 0);
+  gcc_assert (XEXP (itec, 0) == pred_reg);
+
+  /* Mark the branch's condition as unified.  */
+  rtx unspec = gen_rtx_UNSPEC (BImode, gen_rtvec (1, pred_reg),
+			   UNSPEC_BR_UNIFIED);
+  bool ok = validate_change (probe, &XEXP (itec, 0), unspec, false);
+
+  gcc_assert (ok);
+}
+
 /* Loop structure of the function.  The entire function is described as
a NULL loop.  */
 
@@ -2964,6 +3010,9 @@ nvptx_split_blocks (bb_insn_map_t *map)
 	continue;
 	  switch (recog_memoized (insn))
 	{
+	case CODE_FOR_cond_uni:
+	  nvptx_propagate_unified (insn);
+	  /* FALLTHROUGH */
 	default:
 	  seen_insn = true;
 	  continue;
@@ -5080,6 +5129,21 @@ nvptx_expand_cmp_swap (tree exp, rtx target,
   return target;
 }
 
+/* Expander for the compare unified builtin.  */
+
+static rtx
+nvptx_expand_cond_uni (tree exp, rtx target, machine_mode mode, int ignore)
+{
+  if (ignore)
+return target;
+  
+  rtx src = expand_expr (CALL_EXPR_ARG (exp, 0),
+			 NULL_RTX, mode, EXPAND_NORMAL);
+
+  emit_insn (gen_cond_uni (target, src));
+
+  return target;
+}
 
 /* Codes for all the NVPTX builtins.  */
 enum nvptx_builtins
@@ -5089,6 +5153,7 @@ enum nvptx_builtins
   NVPTX_BUILTIN_WORKER_ADDR,
   NVPTX_BUILTIN_CMP_SWAP,
   NVPTX_BUILTIN_CMP_SWAPLL,
+  NVPTX_BUILTIN_COND_UNI,
   NVPTX_BUILTIN_MAX
 };
 
@@ -5126,6 +5191,7 @@ nvptx_init_builtins (void)
(PTRVOID, ST, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAP, "cmp_swap", (UINT, PTRVOID, UINT, UINT, NULL_TREE));
   DEF (CMP_SWAPLL, "cmp_swapll", (LLUINT, PTRVOID, LLUINT, LLUINT, NULL_TREE));
+  DEF (COND_UNI, "cond_uni", (integer_type_node, integer_type_node, NULL_TREE));
 
 #undef DEF
 #undef ST
@@ -5158,6 +5224,9 @@ nvptx_expand_builtin (tree exp, rtx target, rtx ARG_UNUSED (subtarget),
 case NVPTX_BUILTIN_CMP_SWAPLL:
   return nvptx_expand_cmp_swap (exp, target, mode, ignore);
 
+case NVPTX_BUILTIN_COND_UNI:
+  return nvptx_expand_cond_uni (exp, target, mode, ignore);
+
 default: gcc_unreachable ();
 }
 }
@@ -5284,7 +5353,7 @@ nvptx_get_worker_red_addr (tree type, tree offset)
 
 static void
 nvptx_generate_vector_shuffle (location_t loc,
-			   tree dest_var, tree var, unsigned shift,
+			   tree dest_var, tree var, tree bits,
 			   gimple_seq *seq)
 {
   unsigned fn = NVPTX_BUILTIN_SHUFFLE;
@@ -5307,7 +5376,6 @@ 

[patch] various OpenACC reduction enhancements

2018-06-29 Thread Cesar Philippidis
The following patch set includes various OpenACC reduction enhancements
present in og8. These include the following individual og8 commits:

  * (4469fc4) [Fortran] Permit reductions in gfc_omp_clause_copy_ctor
  * (704f1a2) [nxptx, OpenACC] vector reductions
  * (8a35c89) [OpenACC] Fix a reduction bug involving
  GOMP_MAP_FIRSTPRIVATE_POINTER variables
  * (16ead33) [OpenACC] Update error messages for c and c++ reductions
  * (65dd9cf) Make OpenACC orphan gang reductions errors
  * (5d60102) [PR80547] Handle parallel reductions explicitly
  initialized by the user

The nvptx vector reduction enhancement is a prerequisite for the
forthcoming variable-length patches.

This patch as a whole is somewhat large, so I've split it into three
pieces: 1) ME and nvptx changes, 2) FE changes, and 3) test cases. I'll reply
to this message with the individual patches.
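
For anyone unfamiliar with the construct these patches touch, here is a
minimal sketch of an OpenACC vector reduction in C. The function name
sum_array is hypothetical and is not taken from the test cases. When
built without -fopenacc the pragma is ignored and the loop simply runs
sequentially, so the result is the same either way.

```c
#include <assert.h>

/* Sum the N elements of A.  With OpenACC enabled, each vector lane
   accumulates a private partial sum and the reduction clause combines
   them at the end of the loop.  */
int
sum_array (const int *a, int n)
{
  assert (a && n >= 0);
  int sum = 0;
#pragma acc parallel loop vector reduction(+:sum) copyin(a[0:n])
  for (int i = 0; i < n; i++)
    sum += a[i];
  return sum;
}
```

Note that this reduction sits inside a parallel construct. The gang
reductions that the series turns into errors are the orphan ones, i.e.
loops inside an "acc routine" with no enclosing compute construct.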

Thanks,
Cesar


Re: [patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Jakub Jelinek
On Fri, Jun 29, 2018 at 11:07:48AM -0700, Cesar Philippidis wrote:
> On 06/29/2018 10:49 AM, Jakub Jelinek wrote:
> > On Fri, Jun 29, 2018 at 10:33:56AM -0700, Cesar Philippidis wrote:
> >> @@ -1044,21 +1046,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
> >>  return;
> >>  
> >>tree decl = OMP_CLAUSE_DECL (c);
> >> -
> >> -  /* Assumed-size arrays can't be mapped implicitly, they have to be
> >> - mapped explicitly using array sections.  */
> >> -  if (TREE_CODE (decl) == PARM_DECL
> >> -  && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
> >> -  && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
> >> -  && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
> >> -  GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
> >> -   == NULL)
> >> -{
> >> -  error_at (OMP_CLAUSE_LOCATION (c),
> >> -  "implicit mapping of assumed size array %qD", decl);
> >> -  return;
> >> -}
> >> -
> >>tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
> >>if (POINTER_TYPE_P (TREE_TYPE (decl)))
> >>  {
> > 
> > I don't have time to review this fully right now, but the above looks like a
> > blocker to me.  The above must be diagnosed for OpenMP, so taking it away
> > rather than say conditionalizing it on whether it is in an OpenMP or OpenACC
> > construct is just wrong.
> 
> In certain respects, the above code is overly strict if the data is
> already present on the device. However, I do see your point. Would you
> be OK if I reduced that error to a warning?

No, it is violating the standard requirements for OpenMP, so it should be
an error.

> > As a general feeling of the patch there are many other spots that change
> > unconditionally code used by OpenMP and OpenACC and it isn't clear it
> > doesn't affect OpenMP code generation.  If some change is useful even for
> > OpenMP and Fortran, then I'd certainly expect it to be done only in omp-low
> > or omp-expand, before that it better should be represented how the standard
> > mandates.
> 
> I'll add more comments to the code. Also, I admit that I should make a
> stronger effort to share code between OpenACC and OpenMP. Would you be
> interested in using GOMP_MAP_FIRSTPRIVATE_POINTER mappings for arrays in
> OpenMP? I'm not sure if that's supported by OpenMP, although even with
> OpenACC it's not used everywhere yet.

GOMP_MAP_FIRSTPRIVATE_POINTER is (at least for OpenMP) standard mandated
behavior, which is for C/C++ pointers only, not for Fortran arrays.

Jakub


Re: [patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Cesar Philippidis
On 06/29/2018 10:49 AM, Jakub Jelinek wrote:
> On Fri, Jun 29, 2018 at 10:33:56AM -0700, Cesar Philippidis wrote:
>> @@ -1044,21 +1046,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
>>  return;
>>  
>>tree decl = OMP_CLAUSE_DECL (c);
>> -
>> -  /* Assumed-size arrays can't be mapped implicitly, they have to be
>> - mapped explicitly using array sections.  */
>> -  if (TREE_CODE (decl) == PARM_DECL
>> -  && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
>> -  && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
>> -  && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
>> -GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
>> - == NULL)
>> -{
>> -  error_at (OMP_CLAUSE_LOCATION (c),
>> -"implicit mapping of assumed size array %qD", decl);
>> -  return;
>> -}
>> -
>>tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
>>if (POINTER_TYPE_P (TREE_TYPE (decl)))
>>  {
> 
> I don't have time to review this fully right now, but the above looks like a
> blocker to me.  The above must be diagnosed for OpenMP, so taking it away
> rather than say conditionalizing it on whether it is in an OpenMP or OpenACC
> construct is just wrong.

In certain respects, the above code is overly strict if the data is
already present on the device. However, I do see your point. Would you
be OK if I reduced that error to a warning?

> As a general feeling of the patch there are many other spots that change
> unconditionally code used by OpenMP and OpenACC and it isn't clear it
> doesn't affect OpenMP code generation.  If some change is useful even for
> OpenMP and Fortran, then I'd certainly expect it to be done only in omp-low
> or omp-expand, before that it better should be represented how the standard
> mandates.

I'll add more comments to the code. Also, I admit that I should make a
stronger effort to share code between OpenACC and OpenMP. Would you be
interested in using GOMP_MAP_FIRSTPRIVATE_POINTER mappings for arrays in
OpenMP? I'm not sure if that's supported by OpenMP, although even with
OpenACC it's not used everywhere yet.

Cesar


extract_range_from_binary* cleanups for VRP

2018-06-29 Thread Aldy Hernandez

Howdy!

Attached are some cleanups to the VRP code dealing with PLUS/MINUS_EXPR 
on ranges.  This will make it easier to share code with any other range 
implementation in the future, but is completely independent from any 
other work.


Currently there is a lot of code duplication in the PLUS/MINUS_EXPR 
code, which we can easily abstract out and make everything easier to 
read.  I have tried to keep functionality changes to a minimum to help 
in reviewing.


A few minor things that are different:

1. As mentioned in a previous thread with Richard 
(https://gcc.gnu.org/ml/gcc/2018-06/msg00100.html), I would like to use 
the first variant here, as they seem to ultimately do the same thing:


- /* Get the lower and upper bounds of the type.  */
- if (TYPE_OVERFLOW_WRAPS (expr_type))
-   {
- type_min = wi::min_value (prec, sgn);
- type_max = wi::max_value (prec, sgn);
-   }
- else
-   {
- type_min = wi::to_wide (vrp_val_min (expr_type));
- type_max = wi::to_wide (vrp_val_max (expr_type));
-   }

2. I've removed the code below, as it seems to be a remnant from when 
the comparisons were being done with double_int's.  The overflow checks 
were/are being done prior anyhow.  For that matter, I put in some 
gcc_unreachables in the code below, and never triggered it in a 
bootstrap + regtest.


- /* Check for type overflow.  */
- if (min_ovf == 0)
-   {
- if (wi::cmp (wmin, type_min, sgn) == -1)
-   min_ovf = -1;
- else if (wi::cmp (wmin, type_max, sgn) == 1)
-   min_ovf = 1;
-   }
- if (max_ovf == 0)
-   {
- if (wi::cmp (wmax, type_min, sgn) == -1)
-   max_ovf = -1;
- else if (wi::cmp (wmax, type_max, sgn) == 1)
-   max_ovf = 1;
-   }

Everything else is exactly as it was, just abstracted and moved around.

To test this, I compared the results of every binary op before and after 
this patch, to make sure that we were getting the same exact ranges. 
There were no differences in a bootstrap plus regtest.
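
To make the over/underflow convention used by combine_bound concrete,
here is a small standalone sketch for 32-bit signed bounds. It is an
analogy only: it uses a wider integer type instead of GCC's wide_int
API, and the name combine_bound_i32 and its signature are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Add (or subtract, if MINUS_P) two 32-bit signed bounds.  Store the
   wrapped result in *RESULT and return -1 on underflow, +1 on overflow
   and 0 if neither occurred.  */
static int
combine_bound_i32 (int minus_p, int32_t op0, int32_t op1, int32_t *result)
{
  assert (result);
  /* Do the arithmetic in 64 bits so it cannot overflow.  */
  int64_t wide = minus_p ? (int64_t) op0 - op1 : (int64_t) op0 + op1;

  /* Truncate to 32 bits.  Converting an out-of-range value to int32_t
     is implementation-defined; GCC wraps modulo 2^32.  */
  *result = (int32_t) (uint32_t) wide;

  if (wide < INT32_MIN)
    return -1;
  if (wide > INT32_MAX)
    return 1;
  return 0;
}
```

For example, INT32_MAX + 1 yields +1 with a wrapped result of INT32_MIN,
and INT32_MIN - 1 yields -1.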


OK for trunk?

gcc/

	* tree-vrp.c (extract_range_from_binary_expr_1): Abstract a lot of the
	{PLUS,MINUS}_EXPR code to...
	(adjust_symbolic_bound): ...here,
	(combine_bound): ...here,
	(set_value_range_with_overflow): ...and here.

diff --git a/gcc/tree-vrp.c b/gcc/tree-vrp.c
index 7c675396d78..ee112bb1826 100644
--- a/gcc/tree-vrp.c
+++ b/gcc/tree-vrp.c
@@ -1275,6 +1275,196 @@ extract_range_from_multiplicative_op_1 (value_range *vr,
 		   wide_int_to_tree (type, max), NULL);
 }
 
+/* If BOUND will include a symbolic bound, adjust it accordingly,
+   otherwise leave it as is.
+
+   CODE is the original operation that combined the bounds (PLUS_EXPR
+   or MINUS_EXPR).
+
+   TYPE is the type of the original operation.
+
+   SYM_OPn is the symbolic for OPn if it has a symbolic.
+
+   NEG_OPn is TRUE if the OPn was negated.  */
+
+static void
+adjust_symbolic_bound (tree &bound, enum tree_code code, tree type,
+		   tree sym_op0, tree sym_op1,
+		   bool neg_op0, bool neg_op1)
+{
+  bool minus_p = (code == MINUS_EXPR);
+  /* If the result bound is constant, we're done; otherwise, build the
+ symbolic lower bound.  */
+  if (sym_op0 == sym_op1)
+;
+  else if (sym_op0)
+bound = build_symbolic_expr (type, sym_op0,
+ neg_op0, bound);
+  else if (sym_op1)
+{
+  /* We may not negate if that might introduce
+	 undefined overflow.  */
+  if (!minus_p
+	  || neg_op1
+	  || TYPE_OVERFLOW_WRAPS (type))
+	bound = build_symbolic_expr (type, sym_op1,
+ neg_op1 ^ minus_p, bound);
+  else
+	bound = NULL_TREE;
+}
+}
+
+/* Combine OP0 and OP1, which are two parts of a bound, into one wide
+   int bound according to CODE.  CODE is the operation combining the
+   bound (either a PLUS_EXPR or a MINUS_EXPR).
+
+   TYPE is the type of the combine operation.
+
+   WI is the wide int to store the result.
+
+   OVF is -1 if an underflow occurred, +1 if an overflow occurred or 0
+   if neither occurred.  */
+
+static void
+combine_bound (enum tree_code code, wide_int &wi, int &ovf,
+	   tree type, tree op0, tree op1)
+{
+  bool minus_p = (code == MINUS_EXPR);
+  const signop sgn = TYPE_SIGN (type);
+  const unsigned int prec = TYPE_PRECISION (type);
+
+  /* Combine the bounds, if any.  */
+  if (op0 && op1)
+{
+  if (minus_p)
+	{
+	  wi = wi::to_wide (op0) - wi::to_wide (op1);
+
+	  /* Check for overflow.  */
+	  if (wi::cmp (0, wi::to_wide (op1), sgn)
+	  != wi::cmp (wi, wi::to_wide (op0), sgn))
+	ovf = wi::cmp (wi::to_wide (op0),
+			   wi::to_wide (op1), sgn);
+	}
+  else
+	{
+	  wi = wi::to_wide (op0) + wi::to_wide (op1);
+
+	  /* Check for overflow.  */
+	  if (wi::cmp (wi::to_wide (op1), 0, sgn)
+	  != wi::cmp (wi, wi::to_wide (op0), sgn))
+	ovf = wi::cmp (wi::to_wide (op0), wi, sgn);
+	}
+}
+  else if 

Re: [patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Jakub Jelinek
On Fri, Jun 29, 2018 at 10:33:56AM -0700, Cesar Philippidis wrote:
> @@ -1044,21 +1046,6 @@ gfc_omp_finish_clause (tree c, gimple_seq *pre_p)
>  return;
>  
>tree decl = OMP_CLAUSE_DECL (c);
> -
> -  /* Assumed-size arrays can't be mapped implicitly, they have to be
> - mapped explicitly using array sections.  */
> -  if (TREE_CODE (decl) == PARM_DECL
> -  && GFC_ARRAY_TYPE_P (TREE_TYPE (decl))
> -  && GFC_TYPE_ARRAY_AKIND (TREE_TYPE (decl)) == GFC_ARRAY_UNKNOWN
> -  && GFC_TYPE_ARRAY_UBOUND (TREE_TYPE (decl),
> - GFC_TYPE_ARRAY_RANK (TREE_TYPE (decl)) - 1)
> -  == NULL)
> -{
> -  error_at (OMP_CLAUSE_LOCATION (c),
> - "implicit mapping of assumed size array %qD", decl);
> -  return;
> -}
> -
>tree c2 = NULL_TREE, c3 = NULL_TREE, c4 = NULL_TREE;
>if (POINTER_TYPE_P (TREE_TYPE (decl)))
>  {

I don't have time to review this fully right now, but the above looks like a
blocker to me.  The above must be diagnosed for OpenMP, so taking it away
rather than say conditionalizing it on whether it is in an OpenMP or OpenACC
construct is just wrong.
As a general feeling of the patch there are many other spots that change
unconditionally code used by OpenMP and OpenACC and it isn't clear it
doesn't affect OpenMP code generation.  If some change is useful even for
OpenMP and Fortran, then I'd certainly expect it to be done only in omp-low
or omp-expand, before that it better should be represented how the standard
mandates.

Jakub


Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-06-29 Thread Carl Love
Bill:
> > * config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
> > float argument to ALTIVEC_BUILTIN_UNPACKH_V4SF.
> > Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
> > ALTIVEC_BUILTIN_UNPACKH_V4SF.
> 
> UNPACKL
> 
> That's all I see; will leave to Segher to approve, of course. 

Ah, yes.  Fixed.

  Carl Love



[patch] Update support for Fortran arrays in OpenACC

2018-06-29 Thread Cesar Philippidis
The attached patch includes various bug fixes and performance
improvements involving the use of Fortran arrays in OpenACC data
clauses. More specifically,

  * Transfers Fortran arrays using GOMP_MAP_FIRSTPRIVATE_POINTERs.
  * Privatizes array descriptors in the Fortran FE.
  * Corrects a couple of bugs involving the offsets of subarray data.

The privatization of array descriptors results in a significant speedup
when programs utilize a lot of arrays. That patch was introduced back in
gomp-4_0-branch, so I lost state on it. However, I believe that I
privatized the array descriptors directly in the Fortran FE instead of
during gimplification because the FE had more knowledge on the array
descriptor types.

For reference, this patch contains the following patches from og8:

cecd29 OpenACC subarray data alignment in fortran
be4fec Privatize internal array variables introduced by the fortran FE
629dfb [OpenACC] firstprivate subarray changes
19cfe1 Fix PR70828s "broken array-type subarrays inside acc data in
   openacc"
00c258 [libgomp, OpenACC] Adjust offsets for present data clauses
924e50 [OpenACC, Fortran] fix an ICE involving assumed-size arrays
a5736d Correct the reported line number in fortran combined OpenACC
   directives

Is this patch OK for trunk? It bootstrapped / regression tested cleanly
for x86_64 with nvptx offloading.

Thanks,
Cesar
2018-06-29  Cesar Philippidis  

	gcc/fortran/
	* trans-array.c (gfc_trans_array_bounds): Add an INIT_VLA argument
	to control whether VLAs should be initialized.  Don't mark this
	function as static.
	(gfc_trans_auto_array_allocation): Update call to
	gfc_trans_array_bounds.
	(gfc_trans_g77_array): Likewise.
	* trans-array.h: Declare gfc_trans_array_bounds.
	* trans-openmp.c (gfc_omp_finish_clause): Don't cast ptr into a
	character pointer.  Remove "implicit mapping of assumed size array"
	error.
	(gfc_trans_omp_clauses): Likewise.
	(gfc_scan_nodesc_arrays): New.
	(gfc_privatize_nodesc_arrays_1): New.
	(gfc_privatize_nodesc_arrays): New.
	(gfc_init_nodesc_arrays): New.
	(gfc_trans_oacc_construct): Initialize any internal variables for
	arrays without array descriptors inside the offloaded parallel and
	kernels region.
	(gfc_trans_oacc_combined_directive): Likewise.  Set the location of
	combined acc loops.

	gcc/
	* gimplify.c (struct gimplify_omp_ctx): Add tree clauses member.
	(new_omp_context): Initialize clauses to NULL_TREE.
	(gimplify_scan_omp_clauses): Prune firstprivate clause associated with
	OACC_DATA, OACC_ENTER_DATA and OACC_EXIT_DATA regions.  Set clauses in
	the gimplify_omp_ctx.
	(omp_clause_matching_array_ref): New.
	(gomp_needs_data_present): New.
	(gimplify_adjust_omp_clauses_1): Use present or pointer omp clause map
	kinds when creating implicit data clauses for OpenACC offloaded
	variables defined or used in an acc data region, as necessary.  Link
	new ACC clauses with the old ones.
	* omp-low.c (lower_omp_target): Handle NULL-sized types for
	assumed-sized arrays.

	gcc/testsuite/
	* c-c++-common/goacc/acc-data-chain.c: New test.
	* gfortran.dg/goacc/mod-array.f90: New test.
	* gfortran.dg/gomp/pr78866-2.f90: Update.

	libgomp/
	* oacc-parallel.c (GOACC_parallel_keyed): Add offset to devaddrs.
	* testsuite/libgomp.oacc-c-c++-common/data_offset.c: New test.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-3.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/kernels-loop-and-seq-4.c: Update.
	* testsuite/libgomp.oacc-c-c++-common/pr70828.c: New test.
	* testsuite/libgomp.oacc-fortran/assumed-size.f90: New test.
	* testsuite/libgomp.oacc-fortran/data-alignment.f90: New test.
	* testsuite/libgomp.oacc-fortran/data_offset.f90: New test.
	* testsuite/libgomp.oacc-fortran/lib-13.f90: Update.
	* testsuite/libgomp.oacc-fortran/pr70828.f90: New test.


From 6723544609c3ae0fd9daa01c6585060625fe5454 Mon Sep 17 00:00:00 2001
From: Cesar Philippidis 
Date: Fri, 22 Jun 2018 09:58:43 -0700
Subject: [PATCH] Fortran array support

cecd29 OpenACC subarray data alignment in fortran
be4fec Privatize internal array variables introduced by the fortran FE
629dfb [OpenACC] firstprivate subarray changes
19cfe1 Fix PR70828s "broken array-type subarrays inside acc data in openacc"
00c258 [libgomp, OpenACC] Adjust offsets for present data clauses
924e50 [OpenACC, Fortran] fix an ICE involving assumed-size arrays
a5736d Correct the reported line number in fortran combined OpenACC directives

---
 gcc/fortran/trans-array.c |  12 +-
 gcc/fortran/trans-array.h |   2 +
 gcc/fortran/trans-openmp.c| 197 +-
 gcc/gimplify.c| 112 +-
 gcc/omp-low.c |   6 +
 .../c-c++-common/goacc/acc-data-chain.c   |  24 +++
 gcc/testsuite/gfortran.dg/goacc/mod-array.f90 |  23 ++
 gcc/testsuite/gfortran.dg/gomp/pr78866-2.f90  |   3 +-
 libgomp/oacc-parallel.c   |   3 +-
 .../libgomp.oacc-c-c++-common/data_offset.c   |  

Re: [PATCH] libtool: Sort output of 'find' to enable deterministic builds.

2018-06-29 Thread Eric Gallager
On 6/29/18, Ian Lance Taylor  wrote:
> On Fri, Jun 29, 2018 at 8:43 AM, Jakub Jelinek  wrote:
>> On Fri, Jun 29, 2018 at 09:09:38AM -0600, Jeff Law wrote:
>>> > Btw, running find to search for libtool.m4/ltmain.sh I find extra
>>> > copies in
>>> >
>>> > ./libgo/config/ltmain.sh
>>> > ./libgo/config/libtool.m4
>>> >
>>> > which are nearly identical besides apparently patched in GO support?
>>> >
>>> > Can we consolidate those and/or do we need to patch those as well?
>>> Ideally consolidate.  The README indicates that directory is supposed
>>> "temporarily until Go support is added to autoconf and libtool".
>>
>> Can it be done when these files are not owned by GCC, but copied from
>> upstream?
>
> I sent the libtool patches upstream years ago, but I'm not sure what
> happened to them.
>
> Ian
>

Try pinging them; when there are libtool maintainers, they sometimes miss stuff.


Re: [RFC PATCH] diagnose built-in declarations without prototype (PR 83656)

2018-06-29 Thread Eric Gallager
On 6/29/18, Jeff Law  wrote:
> On 06/27/2018 08:40 PM, Martin Sebor wrote:
>> On 06/27/2018 03:53 PM, Jeff Law wrote:
>>> On 06/27/2018 09:27 AM, Jakub Jelinek wrote:
>>>> On Wed, Jun 27, 2018 at 09:17:07AM -0600, Jeff Law wrote:
>>>>>> About 115 tests fail due to incompatible declarations of
>>>>>> the built-in functions below (the number shows the number
>>>>>> of warnings for each function):
>>>>>>
>>>>>> 428   abort
>>>>>>  58   exit
>>>>>>  36   memcpy
>>>>>>  17   memmove
>>>>>>  15   realloc
>>>>>>  14   cabs
>>>>>>   5   strncpy
>>>>>>   4   strcmp
>>>>>>   3   alloca
>>>>>>   2   rindex
>>>>>>   1   aligned_alloc
>>>>> I'm supportive of this change.  Though I'm more worried about the
>>>>> effects on the distros than I am on the testsuite (which I expected to
>>>>> be in worse shape WRT this issue than your analysis indicates).
>>>>
>>>> I'm mainly worried about configure tests, those will be hit hardest by
>>>> this and might result in silently turning off many features of the
>>>> compiled
>>>> programs.
>>> It's certainly a concern.  Sadly, it's a hard one to deal with because
>>> we can't just throw code at it and see what complains.  Instead we end
>>> up with packages that don't configure in the appropriate features.
>>
>> I checked all GCC's config logs and although there are 543
>> instances of the -Wbuiltin-declaration-mismatch warning in
>> an x86_64-linux build, none of them is an error and
>> the number is the same as before the patch.
> That's both depressing and promising at the same time.  Depressing
> there's so many, promising that none trigger an error.
>
> I wonder if stepping forward to a more modern version of autoconf is
> going to help here and if we should be feeding them updates to make this
> kind of stuff less pervasive, at least in standard autoconf tests.
> jeff
>

If you're going to be feeding the autoconf people updates, please help
them do a release, too; 2.70 has been stuck in limbo for too long now
due to nobody being able to do a release.
Thanks!


Re: [patch] adjust default nvptx launch geometry for OpenACC offloaded regions

2018-06-29 Thread Cesar Philippidis
Ping.

Cesar

On 06/20/2018 02:59 PM, Cesar Philippidis wrote:
> At present, the nvptx libgomp plugin does not take into account the
> amount of shared resources on GPUs (mostly shared-memory and register
> usage) when selecting the default num_gangs and num_workers. In certain
> situations, an OpenACC offloaded function can fail to launch if the GPU
> does not have sufficient shared resources to accommodate all of the
> threads in a CUDA block. This typically manifests when a PTX function
> uses a lot of registers and num_workers is set too large, although it
> can also happen if the shared-memory has been exhausted by the threads
> in a vector.
> 
> This patch resolves that issue by adjusting num_workers based on the amount
> of shared resources used by each thread. If worker parallelism has been
> requested, libgomp will spawn as many workers as possible up to 32.
> Without this patch, libgomp would always default to launching 32 workers
> when worker parallelism is used.
> 
> Besides the worker parallelism, this patch also includes some
> heuristics on selecting num_gangs. Before, the plugin would launch two
> gangs per GPU multiprocessor. Now it follows the formula contained in
> the "CUDA Occupancy Calculator" spreadsheet that's distributed with CUDA.
> 
> Is this patch OK for trunk?
> 
> Thanks,
> Cesar
> 
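
The worker clamping described above can be sketched roughly as follows. This is only an illustration of the idea, not the code in the libgomp nvptx plugin: the helper name pick_num_workers and its parameters (per-thread register count, per-block register limit, vector length) are assumptions made up for this example, and the real plugin would also have to account for shared memory.

```c
#include <assert.h>

/* Hypothetical helper, not taken from the patch: choose num_workers so
   that a CUDA block of num_workers * vector_length threads still fits
   in the register budget of one block.  */
static int
pick_num_workers (int regs_per_thread, int regs_per_block_limit,
		  int vector_length)
{
  assert (regs_per_thread > 0 && vector_length > 0);

  /* Largest thread count the register budget can accommodate.  */
  int max_threads = regs_per_block_limit / regs_per_thread;

  /* Each worker runs vector_length threads.  */
  int workers = max_threads / vector_length;

  /* The patch description caps the default at 32 workers.  */
  if (workers > 32)
    workers = 32;
  /* Always launch at least one worker.  */
  if (workers < 1)
    workers = 1;
  return workers;
}
```

With a 65536-register budget and a vector length of 32, a kernel using 32 registers per thread would still get 32 workers, while one using 128 registers per thread would be limited to 16.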



Re: [PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-06-29 Thread Bill Schmidt
On Jun 29, 2018, at 9:38 AM, Carl Love  wrote:
> 
> GCC Maintainers:
> 
> The vec_unpackh, vec_unpackl builtins with vector float arguments
> unpack the high or low half of a floating point vector and convert the
> elements to a vector of doubles.  The current implementation of the
> builtin for the vector float argument incorrectly uses the vupklsh and
> vupkhsh instructions, which unpack a vector of pixel type.  The following
> patch fixes this issue by having the vec_unpackh and vec_unpackl
> builtins use the xvcvspdp instruction to do the vector float to double
> conversion.
> 
> Additionally, runnable tests were added to verify that the builtins
> work correctly for all of the valid argument types.
> 
> The patch has been tested on 
> 
> powerpc64le-unknown-linux-gnu (Power 8 LE)  
> powerpc64-unknown-linux-gnu (Power 8 BE)
> powerpc64le-unknown-linux-gnu (Power 9 LE)
> 
> With no regressions.
> 
> Please let me know if the patch looks OK for GCC mainline. The patch
> also needs to be backported to GCC 8.
> 
>  Carl Love
> --
> 
> gcc/ChangeLog:
> 
> 2018-06-29  Carl Love  
> 
>   * config/rs6000/altivec.md: Add define_expand altivec_unpackh_v4sf,
>   and define_expand altivec_unpackl_v4sf expansions.
>   * config/rs6000/rs6000-builtin.def: Add UNPACKH_V4SF and
>   UNPACKL_V4SF definitions.
>   * config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
>   float argument to ALTIVEC_BUILTIN_UNPACKH_V4SF.
>   Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
>   ALTIVEC_BUILTIN_UNPACKH_V4SF.

UNPACKL

That's all I see; will leave to Segher to approve, of course. 

Thanks,
Bill
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-06-29  Carl Love  
>   * gcc.target/altivec-1-runnable.c: New test file.
>   * gcc.target/altivec-2-runnable.c: New test file.
>   * gcc.target/vsx-7.c (main2): Change expected instruction
>   for tests.
> ---
> gcc/config/rs6000/altivec.md   |  22 ++
> gcc/config/rs6000/rs6000-builtin.def   |   2 +
> gcc/config/rs6000/rs6000-c.c   |   4 +-
> .../gcc.target/powerpc/altivec-1-runnable.c| 257 +
> .../gcc.target/powerpc/altivec-2-runnable.c|  94 
> gcc/testsuite/gcc.target/powerpc/vsx-7.c   |   7 +-
> 6 files changed, 380 insertions(+), 6 deletions(-)
> create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
> create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c
> 
> diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
> index 8ee42ae..e0ab588 100644
> --- a/gcc/config/rs6000/altivec.md
> +++ b/gcc/config/rs6000/altivec.md
> @@ -2311,6 +2311,28 @@
> }
>   [(set_attr "type" "vecperm")])
> 
> +;; Unpack high elements of float vector to vector of doubles
> +(define_expand "altivec_unpackh_v4sf"
> +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> +(match_operand:V4SF 1 "register_operand" "v"))]
> +  "TARGET_VSX"
> +{
> +  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
> +  DONE;
> +}
> +  [(set_attr "type" "veccomplex")])
> +
> +;; Unpack low elements of float vector to vector of doubles
> +(define_expand "altivec_unpackl_v4sf"
> +  [(set (match_operand:V2DF 0 "register_operand" "=v")
> +(match_operand:V4SF 1 "register_operand" "v"))]
> +  "TARGET_VSX"
> +{
> +  emit_insn (gen_doublelv4sf2 (operands[0], operands[1]));
> +  DONE;
> +}
> +  [(set_attr "type" "veccomplex")])
> +
> ;; Compare vectors producing a vector result and a predicate, setting CR6 to
> ;; indicate a combined status
> (define_insn "*altivec_vcmpequ_p"
> diff --git a/gcc/config/rs6000/rs6000-builtin.def 
> b/gcc/config/rs6000/rs6000-builtin.def
> index f799681..8c9fd4d 100644
> --- a/gcc/config/rs6000/rs6000-builtin.def
> +++ b/gcc/config/rs6000/rs6000-builtin.def
> @@ -1144,6 +1144,8 @@ BU_ALTIVEC_1 (VUPKHSH,"vupkhsh",CONST,  
> altivec_vupkhsh)
> BU_ALTIVEC_1 (VUPKLSB,  "vupklsb",CONST,  altivec_vupklsb)
> BU_ALTIVEC_1 (VUPKLPX,  "vupklpx",CONST,  altivec_vupklpx)
> BU_ALTIVEC_1 (VUPKLSH,  "vupklsh",CONST,  altivec_vupklsh)
> +BU_ALTIVEC_1 (UNPACKH_V4SF,   "unpackh_v4sf",CONST,  
> altivec_unpackh_v4sf)
> +BU_ALTIVEC_1 (UNPACKL_V4SF,   "unpackl_v4sf",CONST,  
> altivec_unpackl_v4sf)
> 
> BU_ALTIVEC_1 (VREVE_V2DI,  "vreve_v2di", CONST,  altivec_vrevev2di2)
> BU_ALTIVEC_1 (VREVE_V4SI,  "vreve_v4si", CONST,  altivec_vrevev4si2)
> diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
> index f4b1bf7..cd50d4e 100644
> --- a/gcc/config/rs6000/rs6000-c.c
> +++ b/gcc/config/rs6000/rs6000-c.c
> @@ -865,7 +865,7 @@ const struct altivec_builtin_types 
> altivec_overloaded_builtins[] = {
> RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
>   { 

Re: [PATCH] enhance strlen to understand MEM_REF and partial overlaps (PR 86042, 86043)

2018-06-29 Thread Jeff Law
On 06/07/2018 09:57 AM, Martin Sebor wrote:
> The attached patch enhances the strlen pass to more consistently
> deal with MEM_REF assignments (PR 86042) and to track string
> lengths across calls to memcpy that overwrite parts of a string
> with sequences of non-nul characters (PR 86043).
> 
> Fixes for both bugs rely on changes to the same code so I chose
> to include them in the same patch.
> 
> To fix PR 86042 the patch extends handle_char_store() to deal with
> more forms of multi-character assignments from MEM_REF (originally
> introduced in r256180).  To handle assignments from strings of
> multiple nuls the patch also extends the initializer_zerop()
> function to understand MEM_REFs of the form:
> 
>    MEM[(char * {ref-all})] = MEM[(char * {ref-all})"..."];
> 
> The solution for PR 86043 consists of two parts: the extension
> above which lets handle_char_store() recognize assignments of
> sequences of non-null characters that overwrite some portion of
> the leading non-zero characters in the destination and avoid
> discarding the destination information, and a similar extension
> to handle_builtin_memcpy().
> 
> Martin
> 
> gcc-86042.diff
> 
> 
> PR tree-optimization/86042 - missing strlen optimization after second strcpy
> 
> gcc/ChangeLog:
> 
>   PR tree-optimization/86042
>   * tree-ssa-strlen.c (handle_builtin_memcpy): Handle strict overlaps.
>   (get_string_cst_length): Rename...
>   (get_min_string_length): ...to this.  Add argument.
>   (handle_char_store): Extend to handle multi-character stores by
>   MEM_REF.
>   * tree.c (initializer_zerop): Use new argument.  Handle MEM_REF.
>   * tree.h (initializer_zerop): Add argument.
> 
> gcc/testsuite/ChangeLog:
> 
>   PR tree-optimization/86042
>   * gcc.dg/strlenopt-44.c: New test.
OK.

Thanks,
jeff
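
For readers following along, the kind of source pattern PR 86043 is about can be sketched as below. This is an illustrative example written for this thread summary, not the contents of gcc.dg/strlenopt-44.c, and the function name overwrite_prefix is made up. It shows only the run-time property the strlen pass can rely on: overwriting leading non-nul characters with other non-nul characters does not change the string length. Whether a given GCC version actually folds the final strlen call depends on the compiler and options.

```c
#include <assert.h>
#include <string.h>

/* Store a known string, then overwrite part of its leading non-nul
   characters with other non-nul characters.  The length stays 4.  */
static size_t
overwrite_prefix (char *dst, size_t dst_size)
{
  assert (dst_size >= 5);
  strcpy (dst, "1234");		/* length is known to be 4 here */
  memcpy (dst, "ab", 2);	/* overwrites "12" with "ab", no nul written */
  return strlen (dst);		/* still 4; candidate for folding */
}
```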


Re: [PATCH] avoid using strnlen result for late calls to strlen (PR 82604)

2018-06-29 Thread Martin Sebor

On 06/29/2018 06:56 AM, Rainer Orth wrote:

> Hi Martin,
>
>> While looking into opportunities to detect strnlen/strlen coding
>> mistakes (pr86199) I noticed a bug in the strnlen implementation
>> I committed earlier today that lets a strnlen() result be saved
>> and used in subsequent calls to strlen() with the same argument.
>> The attached patch changes the handle_builtin_strlen() function
>> to discard the strnlen() result unless its bound is greater than
>> the length of the string.
>
> the new test FAILs to link on Solaris 10:
>
> +FAIL: gcc.dg/strlenopt-46.c (test for excess errors)
> +UNRESOLVED: gcc.dg/strlenopt-46.c compilation failed to produce executable
>
> Excess errors:
> Undefined   first referenced
>  symbol in file
> strnlen /var/tmp//ccyrOY3M.o
> ld: fatal: symbol referencing errors. No output written to ./strlenopt-46.exe


I forgot that the strlenopt tests don't define their own library
versions of the tested functions like the ones in
the gcc.c-torture/execute/builtins directory do.  I've added one
in r262255 so the test should link on targets without the function
too.  Please let me know if the problem doesn't clear up.

Martin
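
As a reminder of why the strnlen() result cannot always stand in for strlen(), here is a small sketch. It is not part of the patch or of strlenopt-46.c, and the helper name strnlen_gives_length is hypothetical. It relies only on the POSIX definition of strnlen, which returns the smaller of the string length and the bound.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L	/* for strnlen on strict C targets */
#endif
#include <assert.h>
#include <string.h>

/* Return nonzero if strnlen (s, bound) is known to equal strlen (s).
   That is the case only when the result is strictly less than the
   bound, i.e. when the bound is greater than the string length.  */
static int
strnlen_gives_length (const char *s, size_t bound)
{
  assert (s != NULL);
  return strnlen (s, bound) < bound;
}
```

For "hello", a bound of 3 or 5 yields a result equal to the bound, which says nothing about where the terminating nul is; only a bound of 6 or more pins down the length.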


Re: [PATCH] libtool: Sort output of 'find' to enable deterministic builds.

2018-06-29 Thread Ian Lance Taylor
On Fri, Jun 29, 2018 at 8:43 AM, Jakub Jelinek  wrote:
> On Fri, Jun 29, 2018 at 09:09:38AM -0600, Jeff Law wrote:
>> > Btw, running find to search for libtool.m4/ltmain.sh I find extra copies in
>> >
>> > ./libgo/config/ltmain.sh
>> > ./libgo/config/libtool.m4
>> >
>> > which are nearly identical besides apparently patched in GO support?
>> >
>> > Can we consolidate those and/or do we need to patch those as well?
>> Ideally consolidate.  The README indicates that directory is supposed
>> "temporarily until Go support is added to autoconf and libtool".
>
> Can it be done when these files are not owned by GCC, but copied from
> upstream?

I sent the libtool patches upstream years ago, but I'm not sure what
happened to them.

Ian


Re: [RFC PATCH] diagnose built-in declarations without prototype (PR 83656)

2018-06-29 Thread Martin Sebor

On 06/29/2018 09:11 AM, Jeff Law wrote:

> On 06/27/2018 08:40 PM, Martin Sebor wrote:
>> On 06/27/2018 03:53 PM, Jeff Law wrote:
>>> On 06/27/2018 09:27 AM, Jakub Jelinek wrote:
>>>> On Wed, Jun 27, 2018 at 09:17:07AM -0600, Jeff Law wrote:
>>>>>> About 115 tests fail due to incompatible declarations of
>>>>>> the built-in functions below (the number shows the number
>>>>>> of warnings for each function):
>>>>>>
>>>>>> 428   abort
>>>>>>  58   exit
>>>>>>  36   memcpy
>>>>>>  17   memmove
>>>>>>  15   realloc
>>>>>>  14   cabs
>>>>>>   5   strncpy
>>>>>>   4   strcmp
>>>>>>   3   alloca
>>>>>>   2   rindex
>>>>>>   1   aligned_alloc
>>>>>
>>>>> I'm supportive of this change.  Though I'm more worried about the
>>>>> effects on the distros than I am on the testsuite (which I expected to
>>>>> be in worse shape WRT this issue than your analysis indicates).
>>>>
>>>> I'm mainly worried about configure tests, those will be hit hardest by
>>>> this and might result in silently turning off many features of the
>>>> compiled
>>>> programs.
>>>
>>> It's certainly a concern.  Sadly, it's a hard one to deal with because
>>> we can't just throw code at it and see what complains.  Instead we end
>>> up with packages that don't configure in the appropriate features.
>>
>> I checked all GCC's config logs and although there are 543
>> instances of the -Wbuiltin-declaration-mismatch warning in
>> an x86_64-linux build, none of them is an error and
>> the number is the same as before the patch.
>
> That's both depressing and promising at the same time.  Depressing
> there's so many, promising that none trigger an error.


All the warnings I have seen are because of declarations like
the one in the test below that checks for the presence of symbol
sin in the library:

  char sin ();
  int main () { return sin (); }

GCC has warned for this code by default since at least 4.1 so if
it is, in fact, a problem it has been with us for over a decade.

There's a comment in the test that explains that the char return
type is deliberate to prevent GCC from treating the function as
a built-in.  (That, of course, relies on undefined behavior
because names of extern library functions are reserved and
conforming compilers could simply avoid emitting the call or
replace it with a trap.)


> I wonder if stepping forward to a more modern version of autoconf is
> going to help here and if we should be feeding them updates to make this
> kind of stuff less pervasive, at least in standard autoconf tests.


That would make sense to me.  The tests should not rely on
undefined behavior.  They should declare standard functions with
the right prototypes.  IMO, for GCC and compatible compilers they
should disable built-in expansion instead via -fno-builtin.  For
all other compilers, they could store the address of each function
in a (perhaps volatile) pointer and use it to make the call instead.

But since the number of warnings here hasn't changed, the ones
in GCC logs predate my changes.  So updating the tests seems
like an improvement to consider independently of the patch.

Martin
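
The function-pointer alternative mentioned above could look something like the following. This is a sketch of the idea only, not what autoconf generates today, and the helper name call_strlen_indirect is invented for the example. It declares the standard function with its correct prototype (via the header) and calls it through a volatile pointer so the compiler cannot expand the call as a built-in.

```c
#include <assert.h>
#include <string.h>

/* Call strlen through a volatile function pointer.  The pointer has
   the correct prototype, so no -Wbuiltin-declaration-mismatch warning
   is triggered, and the volatile qualifier keeps the compiler from
   replacing the call with a built-in expansion.  */
static size_t
call_strlen_indirect (const char *s)
{
  size_t (*volatile fn) (const char *) = strlen;
  assert (fn != NULL);
  return fn (s);
}
```

A configure-style probe built this way still forces a real reference to the library symbol at link time, which is what the existing "char sin ();" trick is trying to achieve.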


Re: [PATCH] libtool: Sort output of 'find' to enable deterministic builds.

2018-06-29 Thread Jakub Jelinek
On Fri, Jun 29, 2018 at 09:09:38AM -0600, Jeff Law wrote:
> > Btw, running find to search for libtool.m4/ltmain.sh I find extra copies in
> > 
> > ./libgo/config/ltmain.sh
> > ./libgo/config/libtool.m4
> > 
> > which are nearly identical besides apparently patched in GO support?
> > 
> > Can we consolidate those and/or do we need to patch those as well?
> Ideally consolidate.  The README indicates that directory is supposed
> "temporarily until Go support is added to autoconf and libtool".

Can it be done when these files are not owned by GCC, but copied from
upstream?

Jakub


Re: [RFC PATCH] diagnose built-in declarations without prototype (PR 83656)

2018-06-29 Thread Jakub Jelinek
On Fri, Jun 29, 2018 at 09:11:39AM -0600, Jeff Law wrote:
> > I checked all GCC's config logs and although there are 543
> > instances of the -Wbuiltin-declaration-mismatch warning in
> > an x86_64-linux build, none of them is an error and
> > the number is the same as before the patch.
> That's both depressing and promising at the same time.  Depressing
> there's so many, promising that none trigger an error.
> 
> I wonder if stepping forward to a more modern version of autoconf is
> going to help here and if we should be feeding them updates to make this
> kind of stuff less pervasive, at least in standard autoconf tests.

For autoconf, the problem is that the tests usually scan for any compiler
messages, so even a warning results in test failure.
Even current autoconf for AC_CHECK_FUNCS declares
#ifdef __cplusplus
extern "C"
#endif
char whatever();

and in main body does return whatever();, so if we start warning on this if
we haven't warned before, then it might be a problem.

Jakub


Re: C++ PATCH for c++/86184, rejects-valid with ?: and omitted operand

2018-06-29 Thread Marek Polacek
On Wed, Jun 27, 2018 at 05:47:25PM -0400, Jason Merrill wrote:
> On Thu, Jun 21, 2018 at 2:22 PM, Marek Polacek  wrote:
> > The following testcase is rejected because, for this line:
> >
> >   bool b = X() ?: false;
> >
> > arg2 is missing and arg1 is a TARGET_EXPR.  A TARGET_EXPR is a class
> > prvalue so we wrap it in a SAVE_EXPR.  Later when building 'this' we
> > call build_this (SAVE_EXPR <TARGET_EXPR <...>>) which triggers lvalue_error:
> >  5856   cp_lvalue_kind kind = lvalue_kind (arg);
> >  5857   if (kind == clk_none)
> >  5858 {
> >  5859   if (complain & tf_error)
> >  5860 lvalue_error (input_location, lv_addressof);
> > because all SAVE_EXPRs are non-lvalue.
> >
> > Since
> > a) cp_build_addr_expr_1 can process xvalues and class prvalues,
> > b) TARGET_EXPRs are only evaluated once (gimplify_target_expr),
> > I thought we could do the following.  The testcase ensures that
> > with the omitted operand we only construct X once.
> >
> > Bootstrapped/regtested on x86_64-linux, ok for trunk?
> >
> > 2018-06-21  Marek Polacek  
> >
> > PR c++/86184
> > * call.c (build_conditional_expr_1): Don't wrap TARGET_EXPRs
> > in a SAVE_EXPR.
> >
> > * g++.dg/ext/cond3.C: New test.
> >
> > --- gcc/cp/call.c
> > +++ gcc/cp/call.c
> > @@ -4806,6 +4806,10 @@ build_conditional_expr_1 (location_t loc, tree arg1, 
> > tree arg2, tree arg3,
> >/* Make sure that lvalues remain lvalues.  See g++.oliva/ext1.C.  */
> >if (lvalue_p (arg1))
> > arg2 = arg1 = cp_stabilize_reference (arg1);
> > +  else if (TREE_CODE (arg1) == TARGET_EXPR)
> > +   /* TARGET_EXPRs are only expanded once, don't wrap it in a 
> > SAVE_EXPR,
> > +  rendering it clk_none of clk_class.  */
> > +   arg2 = arg1;
> >else
> > arg2 = arg1 = cp_save_expr (arg1);
> 
> How about adding the special handling in cp_save_expr rather than
> here, so other callers also benefit?
> 
> OK with that change.

Thanks, this is what I'll commit (bootstrap/regtest passed on x86_64):

2018-06-29  Marek Polacek  

PR c++/86184
* tree.c (cp_save_expr): Don't call save_expr for TARGET_EXPRs.

* g++.dg/ext/cond3.C: New test.

diff --git gcc/cp/tree.c gcc/cp/tree.c
index e7bd79b6276..361248d4b52 100644
--- gcc/cp/tree.c
+++ gcc/cp/tree.c
@@ -4918,6 +4918,11 @@ cp_save_expr (tree expr)
  tree codes.  */
   if (processing_template_decl)
 return expr;
+
+  /* TARGET_EXPRs are only expanded once.  */
+  if (TREE_CODE (expr) == TARGET_EXPR)
+return expr;
+
   return save_expr (expr);
 }
 
diff --git gcc/testsuite/g++.dg/ext/cond3.C gcc/testsuite/g++.dg/ext/cond3.C
index e69de29bb2d..6390dc4270b 100644
--- gcc/testsuite/g++.dg/ext/cond3.C
+++ gcc/testsuite/g++.dg/ext/cond3.C
@@ -0,0 +1,20 @@
+// PR c++/86184
+// { dg-do run }
+// { dg-options "" }
+
+int j;
+struct X {
+  X() { j++; }
+  operator bool() { return true; }
+};
+
+/* Only create X once.  */
+bool b = X() ?: false;
+bool b2 = X() ? X() : false;
+
+int
+main ()
+{
+  if (j != 3)
+__builtin_abort ();
+}

Marek


Re: [RFC PATCH] diagnose built-in declarations without prototype (PR 83656)

2018-06-29 Thread Jeff Law
On 06/27/2018 08:40 PM, Martin Sebor wrote:
> On 06/27/2018 03:53 PM, Jeff Law wrote:
>> On 06/27/2018 09:27 AM, Jakub Jelinek wrote:
>>> On Wed, Jun 27, 2018 at 09:17:07AM -0600, Jeff Law wrote:
>>>>> About 115 tests fail due to incompatible declarations of
>>>>> the built-in functions below (the number shows the number
>>>>> of warnings for each function):
>>>>>
>>>>> 428   abort
>>>>>  58   exit
>>>>>  36   memcpy
>>>>>  17   memmove
>>>>>  15   realloc
>>>>>  14   cabs
>>>>>   5   strncpy
>>>>>   4   strcmp
>>>>>   3   alloca
>>>>>   2   rindex
>>>>>   1   aligned_alloc
>>>> I'm supportive of this change.  Though I'm more worried about the
>>>> effects on the distros than I am on the testsuite (which I expected to
>>>> be in worse shape WRT this issue than your analysis indicates).
>>>
>>> I'm mainly worried about configure tests, those will be hit hardest by
>>> this and might result in silently turning off many features of the
>>> compiled
>>> programs.
>> It's certainly a concern.  Sadly, it's a hard one to deal with because
>> we can't just throw code at it and see what complains.  Instead we end
>> up with packages that don't configure in the appropriate features.
> 
> I checked all GCC's config logs and although there are 543
> instances of the -Wbuiltin-declaration-mismatch warning in
> an x86_64-linux build, none of them is an error and
> the number is the same as before the patch.
That's both depressing and promising at the same time.  Depressing
there's so many, promising that none trigger an error.

I wonder if stepping forward to a more modern version of autoconf is
going to help here and if we should be feeding them updates to make this
kind of stuff less pervasive, at least in standard autoconf tests.
jeff


Re: [PATCH] libtool: Sort output of 'find' to enable deterministic builds.

2018-06-29 Thread Jeff Law
On 06/29/2018 02:13 AM, Richard Biener wrote:
> On Mon, Jun 25, 2018 at 1:39 PM Bernhard M. Wiedemann
>  wrote:
>>
>> so that gcc builds in a reproducible way
>> in spite of indeterministic filesystem readdir order
>>
>> See https://reproducible-builds.org/ for why this is good.
>>
>> While working on the reproducible builds effort, I found that
>> when building the gcc8 package for openSUSE, there were differences
>> between each build in resulting binaries like gccgo, cc1obj and cpp
>> because the order of objects in libstdc++.a varied based on
>> the order of entries returned by the filesystem.
>>
>> Two remaining issues are with timestamps in the ada build
>> and with profiledbootstrap that only is reproducible if all inputs
>> in the profiling run remain constant (and make -j breaks it too)
>>
>> Testcases:
>>   none included because patch is trivial and it would need to compare builds 
>> on 2 filesystems.
>>
>> Bootstrapping and testing:
>>   tested successfully with gcc8 on x86_64
> 
> Looks ok to me.
> 
> Btw, running find to search for libtool.m4/ltmain.sh I find extra copies in
> 
> ./libgo/config/ltmain.sh
> ./libgo/config/libtool.m4
> 
> which are nearly identical besides apparently patched in GO support?
> 
> Can we consolidate those and/or do we need to patch those as well?
Ideally consolidate.  The README indicates that directory is supposed
"temporarily until Go support is added to autoconf and libtool".

So, assuming autoconf/libtool have updated appropriately upstream, then
we "just" need to get ourselves up-to-date and I think that directory of
stuff can go away.

In the immediate term, applying the patch to both instances seems wise.

Bernhard, do you have commit privs?

jeff
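
The underlying issue is that directory enumeration order is not defined, so anything built from an unsorted listing can differ between builds. The patch itself sorts the output of 'find' in the libtool shell code; the sketch below only shows the same idea in C for illustration, with made-up helper names (compare_names, sort_names), and is not part of the patch.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings, using plain byte order
   so the result does not depend on the locale.  */
static int
compare_names (const void *a, const void *b)
{
  return strcmp (*(const char *const *) a, *(const char *const *) b);
}

/* Put file names, in whatever order the filesystem returned them,
   into a deterministic order before they are archived or linked.  */
static void
sort_names (const char **names, size_t count)
{
  assert (names != NULL || count == 0);
  if (count > 1)
    qsort (names, count, sizeof names[0], compare_names);
}
```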


Re: [PATCH] Revert one ipa_call_summaries::get to get_create (PR ipa/86323).

2018-06-29 Thread Jeff Law
On 06/29/2018 07:41 AM, Martin Liška wrote:
> Hi.
> 
> It's a revert of a hunk that causes a new ICE.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-06-29  Martin Liska  
> 
> PR ipa/86323
>   * ipa-inline.c (early_inliner): Revert wrongly added ::get call.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-06-29  Martin Liska  
> 
> PR ipa/86323
>   * g++.dg/ipa/pr86323.C: New test.
OK.
jeff


Re: [PATCH] Revert 2 ::get to ::get_create for IPA summaries (PR ipa/86279).

2018-06-29 Thread Jeff Law
On 06/29/2018 07:41 AM, Martin Liška wrote:
> Hi.
> 
> It's a revert of a hunk that causes a new ICE in IPA summaries.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
> 
> Ready to be installed?
> Martin
> 
> gcc/ChangeLog:
> 
> 2018-06-29  Martin Liska  
> 
> PR ipa/86279
>   * ipa-pure-const.c (malloc_candidate_p): Revert usage of ::get.
>   (propagate_nothrow): Likewise.
> 
> gcc/testsuite/ChangeLog:
> 
> 2018-06-29  Martin Liska  
> 
> PR ipa/86279
>   * gcc.dg/ipa/pr86279.c: New test.
ISTM that if you're reverting something recent of your own that's
causing failures you ought to be able to revert without waiting.

Anyway, OK.

jeff


[PATCH,rs6000] Fix implementation of vec_unpackh, vec_unpackl builtins

2018-06-29 Thread Carl Love
GCC Maintainers:

The vec_unpackh, vec_unpackl builtins with vector float arguments
unpack the high or low half of a floating point vector and convert the
elements to a vector of doubles.  The current implementation of the
builtin for the vector float argument incorrectly uses the vupklsh and
vupkhsh instructions, which unpack a vector of pixel type.  The following
patch fixes this issue by having the vec_unpackh and vec_unpackl
builtins use the xvcvspdp instruction to do the vector float to double
conversion.

Additionally, runnable tests were added to verify that the builtins
work correctly for all of the valid argument types.

The patch has been tested on 

powerpc64le-unknown-linux-gnu (Power 8 LE)  
powerpc64-unknown-linux-gnu (Power 8 BE)
powerpc64le-unknown-linux-gnu (Power 9 LE)

With no regressions.

Please let me know if the patch looks OK for GCC mainline. The patch
also needs to be backported to GCC 8.

 Carl Love
--

gcc/ChangeLog:

2018-06-29  Carl Love  

* config/rs6000/altivec.md: Add define_expand altivec_unpackh_v4sf,
and define_expand altivec_unpackl_v4sf expansions.
* config/rs6000/rs6000-builtin.def: Add UNPACKH_V4SF and
UNPACKL_V4SF definitions.
* config/rs6000/rs6000-c.c: Map ALTIVEC_BUILTIN_VEC_UNPACKH for
float argument to ALTIVEC_BUILTIN_UNPACKH_V4SF.
Map ALTIVEC_BUILTIN_VEC_UNPACKL for float argument to
ALTIVEC_BUILTIN_UNPACKH_V4SF.

gcc/testsuite/ChangeLog:

2018-06-29  Carl Love  
* gcc.target/altivec-1-runnable.c: New test file.
* gcc.target/altivec-2-runnable.c: New test file.
* gcc.target/vsx-7.c (main2): Change expected instruction
for tests.
---
 gcc/config/rs6000/altivec.md   |  22 ++
 gcc/config/rs6000/rs6000-builtin.def   |   2 +
 gcc/config/rs6000/rs6000-c.c   |   4 +-
 .../gcc.target/powerpc/altivec-1-runnable.c| 257 +
 .../gcc.target/powerpc/altivec-2-runnable.c|  94 
 gcc/testsuite/gcc.target/powerpc/vsx-7.c   |   7 +-
 6 files changed, 380 insertions(+), 6 deletions(-)
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-1-runnable.c
 create mode 100644 gcc/testsuite/gcc.target/powerpc/altivec-2-runnable.c

diff --git a/gcc/config/rs6000/altivec.md b/gcc/config/rs6000/altivec.md
index 8ee42ae..e0ab588 100644
--- a/gcc/config/rs6000/altivec.md
+++ b/gcc/config/rs6000/altivec.md
@@ -2311,6 +2311,28 @@
 }
   [(set_attr "type" "vecperm")])
 
+;; Unpack high elements of float vector to vector of doubles
+(define_expand "altivec_unpackh_v4sf"
+  [(set (match_operand:V2DF 0 "register_operand" "=v")
+(match_operand:V4SF 1 "register_operand" "v"))]
+  "TARGET_VSX"
+{
+  emit_insn (gen_doublehv4sf2 (operands[0], operands[1]));
+  DONE;
+}
+  [(set_attr "type" "veccomplex")])
+
+;; Unpack low elements of float vector to vector of doubles
+(define_expand "altivec_unpackl_v4sf"
+  [(set (match_operand:V2DF 0 "register_operand" "=v")
+(match_operand:V4SF 1 "register_operand" "v"))]
+  "TARGET_VSX"
+{
+  emit_insn (gen_doublelv4sf2 (operands[0], operands[1]));
+  DONE;
+}
+  [(set_attr "type" "veccomplex")])
+
 ;; Compare vectors producing a vector result and a predicate, setting CR6 to
 ;; indicate a combined status
 (define_insn "*altivec_vcmpequ_p"
diff --git a/gcc/config/rs6000/rs6000-builtin.def 
b/gcc/config/rs6000/rs6000-builtin.def
index f799681..8c9fd4d 100644
--- a/gcc/config/rs6000/rs6000-builtin.def
+++ b/gcc/config/rs6000/rs6000-builtin.def
@@ -1144,6 +1144,8 @@ BU_ALTIVEC_1 (VUPKHSH,  "vupkhsh",CONST,  
altivec_vupkhsh)
 BU_ALTIVEC_1 (VUPKLSB,   "vupklsb",CONST,  altivec_vupklsb)
 BU_ALTIVEC_1 (VUPKLPX,   "vupklpx",CONST,  altivec_vupklpx)
 BU_ALTIVEC_1 (VUPKLSH,   "vupklsh",CONST,  altivec_vupklsh)
+BU_ALTIVEC_1 (UNPACKH_V4SF,   "unpackh_v4sf",  CONST,  altivec_unpackh_v4sf)
+BU_ALTIVEC_1 (UNPACKL_V4SF,   "unpackl_v4sf",  CONST,  altivec_unpackl_v4sf)
 
 BU_ALTIVEC_1 (VREVE_V2DI,  "vreve_v2di", CONST,  altivec_vrevev2di2)
 BU_ALTIVEC_1 (VREVE_V4SI,  "vreve_v4si", CONST,  altivec_vrevev4si2)
diff --git a/gcc/config/rs6000/rs6000-c.c b/gcc/config/rs6000/rs6000-c.c
index f4b1bf7..cd50d4e 100644
--- a/gcc/config/rs6000/rs6000-c.c
+++ b/gcc/config/rs6000/rs6000-c.c
@@ -865,7 +865,7 @@ const struct altivec_builtin_types 
altivec_overloaded_builtins[] = {
 RS6000_BTI_bool_V2DI, RS6000_BTI_bool_V4SI, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
 RS6000_BTI_unsigned_V4SI, RS6000_BTI_pixel_V8HI, 0, 0 },
-  { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_VUPKHPX,
+  { ALTIVEC_BUILTIN_VEC_UNPACKH, ALTIVEC_BUILTIN_UNPACKH_V4SF,
 RS6000_BTI_V2DF, RS6000_BTI_V4SF, 0, 0 },
   { ALTIVEC_BUILTIN_VEC_VUPKHSH, ALTIVEC_BUILTIN_VUPKHSH,
 RS6000_BTI_V4SI, 

Re: [PATCH] When using -fprofile-generate=/some/path mangle absolute path of file (PR lto/85759).

2018-06-29 Thread Martin Liška
On 06/22/2018 10:35 PM, Jeff Law wrote:
> On 05/16/2018 05:53 AM, Martin Liška wrote:
>> On 12/21/2017 10:13 AM, Martin Liška wrote:
>>> On 12/20/2017 06:45 PM, Jakub Jelinek wrote:
>>>> Another thing is that the "/" in there is wrong, so
>>>>   const char dir_separator_str[] = { DIR_SEPARATOR, '\0' };
>>>>   char *b = concat (profile_data_prefix, dir_separator_str, pwd, NULL);
>>>> needs to be used instead.
>>> This looks much nicer, I forgot about DIR_SEPARATOR.
>>>
>>>> Does profile_data_prefix have any dir separators stripped from the end?
>>> That's easy to achieve..
>>>
>>>> Is pwd guaranteed to be relative in this case?
>>> .. however this is an absolute path, which would be problematic on a DOS based
>>> FS.
>>> Maybe we should do the same path mangling as we do for purpose of gcov:
>>>
>>> https://github.com/gcc-mirror/gcc/blob/master/gcc/gcov.c#L2424
>> Hi.
>>
>> I decided to implement that. Which means for:
>>
>> $ gcc -fprofile-generate=/tmp/myfolder empty.c -O2 && ./a.out 
>>
>> we get following file:
>> /tmp/myfolder/#home#marxin#Programming#testcases#tmp#empty.gcda
>>
>> That guarantees we have a unique file path. As seen in the PR it
>> can produce a funny ICE.
>>
>> I've been testing the patch.
>> Ready after it finishes tests?
>>
>> Martin
>>
>>> What do you think about it?
>>> Regarding the string manipulation: I'm not an expert, but working with
>>> strings in C is always a pain for me :)
>>>
>>> Martin
>>>
>>
>> 0001-When-using-fprofile-generate-some-path-mangle-absolu.patch
>>
>>
>> From 386a4561a4d1501e8959871791289e95f6a89af5 Mon Sep 17 00:00:00 2001
>> From: marxin 
>> Date: Wed, 16 Aug 2017 10:22:57 +0200
>> Subject: [PATCH] When using -fprofile-generate=/some/path mangle absolute 
>> path
>>  of file (PR lto/85759).
>>
>> gcc/ChangeLog:
>>
>> 2018-05-16  Martin Liska  
>>
>>  PR lto/85759
>>  * coverage.c (coverage_init): Mangle full path name.
>>  * doc/invoke.texi: Document the change.
>>  * gcov-io.c (mangle_path): New.
>>  * gcov-io.h (mangle_path): Likewise.
>>  * gcov.c (mangle_name): Use mangle_path for path mangling.
> ISTM you can self-approve this now if you want it to move forward :-)
> 
> jeff
> 

Sure, let me install the patch then.

Martin
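
For readers following the thread, the mangling that produces
"#home#marxin#Programming#testcases#tmp#empty.gcda" can be sketched roughly as
below. This is only an illustration, assuming the simplest possible rule
(every '/' becomes '#'); mangle_path_sketch is a hypothetical name, not the
mangle_path function added by the patch, which may handle more cases.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not GCC's mangle_path): return a newly allocated
   copy of PATH in which every '/' is replaced by '#', so that an absolute
   path can be used as a single file name component.  The caller frees
   the result.  Returns NULL if allocation fails.  */
char *
mangle_path_sketch (const char *path)
{
  assert (path != NULL);

  size_t len = strlen (path);
  char *out = malloc (len + 1);
  if (out == NULL)
    return NULL;

  /* Copy up to and including the terminating NUL.  */
  for (size_t i = 0; i <= len; i++)
    out[i] = (path[i] == '/') ? '#' : path[i];

  return out;
}
```

The mangled name would then be appended to the -fprofile-generate directory
with DIR_SEPARATOR, as discussed in the quoted review.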


[PATCH] Revert 2 ::get to ::get_create for IPA summaries (PR ipa/86279).

2018-06-29 Thread Martin Liška
Hi.

This reverts a hunk that causes a new ICE in IPA summaries.
The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin
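
As background, the difference between the two accessors in the diff below can
be modelled with a small stand-alone table. This is a rough sketch, not the
GCC summary classes: it assumes only what the diff shows, namely that ::get
can return NULL (hence the removed gcc_assert) while ::get_create always
yields a summary. All names here (summary_table, table_get, table_get_create)
are hypothetical.

```c
#include <stdlib.h>

#define MAX_KEYS 16

/* Hypothetical per-edge summary.  */
struct summary
{
  int call_stmt_size;
};

/* Hypothetical table mapping small integer keys to summaries.  */
struct summary_table
{
  struct summary *slots[MAX_KEYS];
};

/* Lookup only: returns NULL when no summary exists for KEY.  */
struct summary *
table_get (struct summary_table *t, unsigned key)
{
  return key < MAX_KEYS ? t->slots[key] : NULL;
}

/* Lookup or allocate: returns the existing summary for KEY, or a new
   zero-initialized one.  NULL only for an out-of-range key or when
   allocation fails.  */
struct summary *
table_get_create (struct summary_table *t, unsigned key)
{
  if (key >= MAX_KEYS)
    return NULL;
  if (t->slots[key] == NULL)
    t->slots[key] = calloc (1, sizeof (struct summary));
  return t->slots[key];
}
```

A caller that writes through the result without a NULL check, as in
malloc_candidate_p, needs the second form whenever the summary may not exist
yet.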

gcc/ChangeLog:

2018-06-29  Martin Liska  

PR ipa/86279
* ipa-pure-const.c (malloc_candidate_p): Revert usage of ::get.
(propagate_nothrow): Likewise.

gcc/testsuite/ChangeLog:

2018-06-29  Martin Liska  

PR ipa/86279
* gcc.dg/ipa/pr86279.c: New test.
---
 gcc/ipa-pure-const.c   |  5 ++---
 gcc/testsuite/gcc.dg/ipa/pr86279.c | 25 +
 2 files changed, 27 insertions(+), 3 deletions(-)
 create mode 100644 gcc/testsuite/gcc.dg/ipa/pr86279.c


diff --git a/gcc/ipa-pure-const.c b/gcc/ipa-pure-const.c
index 714239f8734..dede783bd5f 100644
--- a/gcc/ipa-pure-const.c
+++ b/gcc/ipa-pure-const.c
@@ -923,8 +923,7 @@ malloc_candidate_p (function *fun, bool ipa)
 	  cgraph_edge *cs = node->get_edge (call_stmt);
 	  if (cs)
 	{
-	  ipa_call_summary *es = ipa_call_summaries->get (cs);
-	  gcc_assert (es);
+	  ipa_call_summary *es = ipa_call_summaries->get_create (cs);
 	  es->is_return_callee_uncaptured = true;
 	}
 	}
@@ -1803,7 +1802,7 @@ propagate_nothrow (void)
   w = node;
   while (w)
 	{
-	  funct_state w_l = funct_state_summaries->get (w);
+	  funct_state w_l = funct_state_summaries->get_create (w);
 	  if (!can_throw && !TREE_NOTHROW (w->decl))
 	{
 	  /* Inline clones share declaration with their offline copies;
diff --git a/gcc/testsuite/gcc.dg/ipa/pr86279.c b/gcc/testsuite/gcc.dg/ipa/pr86279.c
new file mode 100644
index 000..a9360213ec6
--- /dev/null
+++ b/gcc/testsuite/gcc.dg/ipa/pr86279.c
@@ -0,0 +1,25 @@
+/* PR ipa/86279 */
+/* { dg-do compile } */
+/* { dg-options "-fipa-pure-const" } */
+
+typedef __SIZE_TYPE__ size_t;
+extern inline __attribute__ ((__always_inline__))
+void *
+memset (void *x, int y, size_t z)
+{
+  return __builtin___memset_chk (x, y, z, __builtin_object_size (x, 0));
+}
+
+void
+foo (unsigned char *x, unsigned char *y, unsigned char *z,
+ unsigned char *w, unsigned int v, int u, int t)
+{
+  int i;
+  for (i = 0; i < t; i++)
+{
+  memset (z, x[0], v);
+  memset (w, y[0], v);
+  x += u;
+}
+  __builtin_memcpy (z, x, u);
+}



[PATCH] Revert one ipa_call_summaries::get to get_create (PR ipa/86323).

2018-06-29 Thread Martin Liška
Hi.

This reverts a hunk that causes a new ICE.
The patch bootstraps on ppc64le-redhat-linux and survives regression tests.

Ready to be installed?
Martin

gcc/ChangeLog:

2018-06-29  Martin Liska  

PR ipa/86323
* ipa-inline.c (early_inliner): Revert wrongly added ::get call.

gcc/testsuite/ChangeLog:

2018-06-29  Martin Liska  

PR ipa/86323
* g++.dg/ipa/pr86323.C: New test.
---
 gcc/ipa-inline.c   | 13 +
 gcc/testsuite/g++.dg/ipa/pr86323.C | 28 
 2 files changed, 33 insertions(+), 8 deletions(-)
 create mode 100644 gcc/testsuite/g++.dg/ipa/pr86323.C


diff --git a/gcc/ipa-inline.c b/gcc/ipa-inline.c
index a62c1ffd3b0..a84d1d9ad3e 100644
--- a/gcc/ipa-inline.c
+++ b/gcc/ipa-inline.c
@@ -2802,14 +2802,11 @@ early_inliner (function *fun)
 	  for (edge = node->callees; edge; edge = edge->next_callee)
 	{
 	  /* We have no summary for new bound store calls yet.  */
-	  ipa_call_summary *es = ipa_call_summaries->get (edge);
-	  if (es != NULL)
-		{
-		  es->call_stmt_size
-		= estimate_num_insns (edge->call_stmt, &eni_size_weights);
-		  es->call_stmt_time
-		= estimate_num_insns (edge->call_stmt, &eni_time_weights);
-		}
+	  ipa_call_summary *es = ipa_call_summaries->get_create (edge);
+	  es->call_stmt_size
+		= estimate_num_insns (edge->call_stmt, &eni_size_weights);
+	  es->call_stmt_time
+		= estimate_num_insns (edge->call_stmt, &eni_time_weights);
 
 	  if (edge->callee->decl
 		  && !gimple_check_call_matching_types (
diff --git a/gcc/testsuite/g++.dg/ipa/pr86323.C b/gcc/testsuite/g++.dg/ipa/pr86323.C
new file mode 100644
index 000..6632f35d86c
--- /dev/null
+++ b/gcc/testsuite/g++.dg/ipa/pr86323.C
@@ -0,0 +1,28 @@
+/* { dg-do compile } */
+/* { dg-options "-O3 --param max-early-inliner-iterations=5" } */
+
+char *s;
+namespace a {
+template  class af {
+public:
+  af(ae);
+};
+typedef af b;
+namespace ag {
+class ah {
+public:
+  void ai(b aj) { c(aj); }
+  virtual void c(b);
+};
+class d : public ah {
+  void c(b);
+};
+class e {
+  void f(bool);
+  void ai(b aj) { g.ai(aj); }
+  d g;
+};
+void d::c(b) {}
+void e::f(bool) { ai(s); }
+}
+}



Re: [PATCH] Fix PR86321

2018-06-29 Thread Dominique d'Humières
The patch fixes PR86321 without regression.

Thanks,

Dominique



[PATCH][arm] Avoid STRD with odd register for TARGET_ARM in output_move_double

2018-06-29 Thread Kyrill Tkachov

Hi all,

In this testcase the user forces an odd register as the starting reg for a
DFmode value.
The output_move_double function tries to store that using an STRD instruction.
But for TARGET_ARM the starting register of an STRD must be an even one.
This is always the case with compiler-allocated registers for DFmode values,
but the inline assembly forced our hand here.

This patch restricts the STRD-emitting logic in output_move_double so that
STRD is not emitted when the starting source register is odd-numbered.
I'm not a fan of the whole function, we should be exposing a lot of the logic
in there to RTL rather than at the final output stage, but that would need to
be fixed separately.

This patch is much safer for backporting purposes.
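
The new condition can be read in isolation as the small predicate below. It
is only a sketch: the three parameters are hypothetical stand-ins for
TARGET_LDRD, TARGET_ARM and REGNO (operands[1]) in the patch, and
allow_strd_sketch is not a function in arm.c.

```c
#include <stdbool.h>

/* Hypothetical stand-alone version of the allow_strd computation.
   STRD is usable when the target has LDRD/STRD at all and, in ARM
   state, the first source register is even-numbered.  */
bool
allow_strd_sketch (bool target_ldrd, bool target_arm,
		   unsigned first_src_regno)
{
  bool odd_start = (first_src_regno & 1) == 1;
  return target_ldrd && !(target_arm && odd_start);
}
```

With this predicate, an odd starting register in ARM state falls through to
the STM-based alternatives where they exist.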

Bootstrapped and tested on arm-none-linux-gnueabihf.

Committing to trunk.
Thanks,
Kyrill

2018-06-29  Kyrylo Tkachov  

* config/arm/arm.c (output_move_double): Don't allow STRD instructions
if starting source register is not even.

2018-06-29  Kyrylo Tkachov  

* gcc.target/arm/arm-soft-strd-even.c: New test.
diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 0c6738cf4d669b17d063e9c5c7ab3b4f455fe8bb..8f1df45fca589c6af2f136bcba30cc1a04b0ceec 100644
--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -18455,12 +18455,18 @@ output_move_double (rtx *operands, bool emit, int *count)
   gcc_assert ((REGNO (operands[1]) != IP_REGNUM)
   || (TARGET_ARM && TARGET_LDRD));
 
+  /* For TARGET_ARM the first source register of an STRD
+	 must be even.  This is usually the case for double-word
+	 values but user assembly constraints can force an odd
+	 starting register.  */
+  bool allow_strd = TARGET_LDRD
+			 && !(TARGET_ARM && (REGNO (operands[1]) & 1) == 1);
   switch (GET_CODE (XEXP (operands[0], 0)))
 {
 	case REG:
 	  if (emit)
 	{
-	  if (TARGET_LDRD)
+	  if (allow_strd)
 		output_asm_insn ("strd%?\t%1, [%m0]", operands);
 	  else
 		output_asm_insn ("stm%?\t%m0, %M1", operands);
@@ -18468,7 +18474,7 @@ output_move_double (rtx *operands, bool emit, int *count)
 	  break;
 
 case PRE_INC:
-	  gcc_assert (TARGET_LDRD);
+	  gcc_assert (allow_strd);
 	  if (emit)
 	output_asm_insn ("strd%?\t%1, [%m0, #8]!", operands);
 	  break;
@@ -18476,7 +18482,7 @@ output_move_double (rtx *operands, bool emit, int *count)
 case PRE_DEC:
 	  if (emit)
 	{
-	  if (TARGET_LDRD)
+	  if (allow_strd)
 		output_asm_insn ("strd%?\t%1, [%m0, #-8]!", operands);
 	  else
 		output_asm_insn ("stmdb%?\t%m0!, %M1", operands);
@@ -18486,7 +18492,7 @@ output_move_double (rtx *operands, bool emit, int *count)
 case POST_INC:
 	  if (emit)
 	{
-	  if (TARGET_LDRD)
+	  if (allow_strd)
 		output_asm_insn ("strd%?\t%1, [%m0], #8", operands);
 	  else
 		output_asm_insn ("stm%?\t%m0!, %M1", operands);
@@ -18494,7 +18500,7 @@ output_move_double (rtx *operands, bool emit, int *count)
 	  break;
 
 case POST_DEC:
-	  gcc_assert (TARGET_LDRD);
+	  gcc_assert (allow_strd);
 	  if (emit)
 	output_asm_insn ("strd%?\t%1, [%m0], #-8", operands);
 	  break;
@@ -18505,8 +18511,8 @@ output_move_double (rtx *operands, bool emit, int *count)
 	  otherops[1] = XEXP (XEXP (XEXP (operands[0], 0), 1), 0);
 	  otherops[2] = XEXP (XEXP (XEXP (operands[0], 0), 1), 1);
 
-	  /* IWMMXT allows offsets larger than ldrd can handle,
-	 fix these up with a pair of ldr.  */
+	  /* IWMMXT allows offsets larger than strd can handle,
+	 fix these up with a pair of str.  */
 	  if (!TARGET_THUMB2
 	  && CONST_INT_P (otherops[2])
 	  && (INTVAL(otherops[2]) <= -256
@@ -18571,7 +18577,7 @@ output_move_double (rtx *operands, bool emit, int *count)
 		  return "";
 		}
 	}
-	  if (TARGET_LDRD
+	  if (allow_strd
 	  && (REG_P (otherops[2])
 		  || TARGET_THUMB2
 		  || (CONST_INT_P (otherops[2])
diff --git a/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c b/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c
new file mode 100644
index ..fb7317c8718229f5a6a01a6ce2a734edb466c151
--- /dev/null
+++ b/gcc/testsuite/gcc.target/arm/arm-soft-strd-even.c
@@ -0,0 +1,18 @@
+/* { dg-do assemble } */
+/* { dg-require-effective-target arm_arm_ok } */
+/* { dg-options "-O2 -marm -mfloat-abi=soft" } */
+
+/* Check that we don't try to emit STRD in ARM state with
+   odd starting register.  */
+
+struct S {
+  double M0;
+} __attribute((aligned)) __attribute((packed));
+
+void bar(void *);
+
+void foo(int x, struct S y) {
+  asm("" : : : "r1", "r8", "r7", "r4");
+  y.M0 ?: bar(0);
+  bar(__builtin_alloca(8));
+}


[PATCH][wwwdocs] Mention Cortex-A76 support in GCC 9 changes.html

2018-06-29 Thread Kyrill Tkachov

Hi all,

This patch mentions support for the Arm Cortex-A76 processor in changes.html
for GCC 9.
It enables the AArch64 section of the page and adds the news blob there.
It also adds an entry to the already-existing arm entry.

Ok to commit to CVS (for the aarch64 parts)?

Thanks,
Kyrill
Index: htdocs/gcc-9/changes.html
===
RCS file: /cvs/gcc/wwwdocs/htdocs/gcc-9/changes.html,v
retrieving revision 1.9
diff -U 3 -r1.9 changes.html
--- htdocs/gcc-9/changes.html	20 Jun 2018 16:15:35 -	1.9
+++ htdocs/gcc-9/changes.html	27 Jun 2018 10:25:19 -
@@ -77,13 +77,41 @@
 
 New Targets and Target Specific Improvements
 
-
+AArch64
+
+  
+Support has been added for the following processors
+(GCC identifiers in parentheses):
+
+	Arm Cortex-A76 (cortex-a76).
+	Arm Cortex-A55/Cortex-A76 DynamIQ big.LITTLE (cortex-a76.cortex-a55).
+
+The GCC identifiers can be used
+as arguments to the -mcpu or -mtune options,
+for example: -mcpu=cortex-a76 or
+-mtune=cortex-a76.cortex-a55 or as arguments to the equivalent target
+attributes and pragmas.
+  
+
 
 
 
 ARM
 
   
+Support has been added for the following processors
+(GCC identifiers in parentheses):
+
+	Arm Cortex-A76 (cortex-a76).
+	Arm Cortex-A55/Cortex-A76 DynamIQ big.LITTLE (cortex-a76.cortex-a55).
+
+The GCC identifiers can be used
+as arguments to the -mcpu or -mtune options,
+for example: -mcpu=cortex-a76 or
+-mtune=cortex-a76.cortex-a55 or as arguments to the equivalent target
+attributes and pragmas.
+  
+  
 Support for the deprecated Armv2 and Armv3 architectures and their
 variants has been removed.  Their corresponding -march
 values and the -mcpu options that used these architectures


Re: [PATCH] avoid using strnlen result for late calls to strlen (PR 82604)

2018-06-29 Thread Rainer Orth
Hi Martin,

> While looking into opportunities to detect strnlen/strlen coding
> mistakes (pr86199) I noticed a bug in the strnlen implementation
> I committed earlier today that lets a strnlen() result be saved
> and used in subsequent calls to strlen() with the same argument.
> The attached patch changes the handle_builtin_strlen() function
> to discard the strnlen() result unless its bound is greater than
> the length of the string.

the new test FAILs to link on Solaris 10:

+FAIL: gcc.dg/strlenopt-46.c (test for excess errors)
+UNRESOLVED: gcc.dg/strlenopt-46.c compilation failed to produce executable

Excess errors:
Undefined   first referenced
 symbol in file
strnlen /var/tmp//ccyrOY3M.o
ld: fatal: symbol referencing errors. No output written to ./strlenopt-46.exe

Rainer

-- 
Rainer Orth, Center for Biotechnology, Bielefeld University


Re: [14/n] PR85694: Rework overwidening detection

2018-06-29 Thread Richard Sandiford
Richard Sandiford  writes:
> This patch is the main part of PR85694.  The aim is to recognise at least:
>
>   signed char *a, *b, *c;
>   ...
>   for (int i = 0; i < 2048; i++)
> c[i] = (a[i] + b[i]) >> 1;
>
> as an over-widening pattern, since the addition and shift can be done
> on shorts rather than ints.  However, it ended up being a lot more
> general than that.
>
> The current over-widening pattern detection is limited to a few simple
> cases: logical ops with immediate second operands, and shifts by a
> constant.  These cases are enough for common pixel-format conversion
> and can be detected in a peephole way.
>
> The loop above requires two generalisations of the current code: support
> for addition as well as logical ops, and support for non-constant second
> operands.  These are harder to detect in the same peephole way, so the
> patch tries to take a more global approach.
>
> The idea is to get information about the minimum operation width
> in two ways:
>
> (1) by using the range information attached to the SSA_NAMEs
> (effectively a forward walk, since the range info is
> context-independent).
>
> (2) by back-propagating the number of output bits required by
> users of the result.
>
> As explained in the comments, there's a balance to be struck between
> narrowing an individual operation and fitting in with the surrounding
> code.  The approach is pretty conservative: if we could narrow an
> operation to N bits without changing its semantics, it's OK to do that if:
>
> - no operations later in the chain require more than N bits; or
>
> - all internally-defined inputs are extended from N bits or fewer,
>   and at least one of them is single-use.
>
> See the comments for the rationale.
>
> I didn't bother adding STMT_VINFO_* wrappers for the new fields
> since the code seemed more readable without.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
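
As a sanity check of the motivating loop quoted above, the following sketch
compares the int computation with a 16-bit one for every pair of signed char
inputs. It is only an illustration: the function names are made up, the
narrowing is modelled by truncating the sum to int16_t (C itself still
promotes to int), and it assumes that >> on a negative value is an arithmetic
shift, as it is in GCC.

```c
#include <stdint.h>

/* Reference: the addition and shift happen in int after the usual
   integer promotions.  */
int8_t
avg_in_int (int8_t a, int8_t b)
{
  return (int8_t) ((a + b) >> 1);
}

/* Narrowed: the sum is forced through a 16-bit value first.  The sum of
   two int8_t values lies in [-256, 254], so nothing is lost.  */
int8_t
avg_in_short (int8_t a, int8_t b)
{
  int16_t sum = (int16_t) (a + b);
  return (int8_t) (sum >> 1);
}

/* Return 1 if both versions agree for all 65536 input pairs, else 0.  */
int
narrowing_is_exact (void)
{
  for (int a = INT8_MIN; a <= INT8_MAX; a++)
    for (int b = INT8_MIN; b <= INT8_MAX; b++)
      if (avg_in_int ((int8_t) a, (int8_t) b)
	  != avg_in_short ((int8_t) a, (int8_t) b))
	return 0;
  return 1;
}
```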

Here's a version rebased on top of current trunk.  Changes from last time:

- reintroduce dump_generic_expr_loc, with the obvious change to the
  prototype

- fix a typo in a comment

- use vect_element_precision from the new version of 12/n.

Tested as before.  OK to install?

Richard


2018-06-29  Richard Sandiford  

gcc/
* poly-int.h (print_hex): New function.
* dumpfile.h (dump_generic_expr_loc, dump_dec, dump_hex): Declare.
* dumpfile.c (dump_generic_expr): Fix formatting.
(dump_generic_expr_loc): New function.
(dump_dec, dump_hex): New poly_wide_int functions.
* tree-vectorizer.h (_stmt_vec_info): Add min_output_precision,
min_input_precision, operation_precision and operation_sign.
* tree-vect-patterns.c (vect_get_range_info): New function.
(vect_same_loop_or_bb_p, vect_single_imm_use)
(vect_operation_fits_smaller_type): Delete.
(vect_look_through_possible_promotion): Add an optional
single_use_p parameter.
(vect_recog_over_widening_pattern): Rewrite to use new
stmt_vec_info information.  Handle one operation at a time.
(vect_recog_cast_forwprop_pattern, vect_narrowable_type_p)
(vect_truncatable_operation_p, vect_set_operation_type)
(vect_set_min_input_precision): New functions.
(vect_determine_min_output_precision_1): Likewise.
(vect_determine_min_output_precision): Likewise.
(vect_determine_precisions_from_range): Likewise.
(vect_determine_precisions_from_users): Likewise.
(vect_determine_stmt_precisions, vect_determine_precisions): Likewise.
(vect_vect_recog_func_ptrs): Put over_widening first.
Add cast_forwprop.
(vect_pattern_recog): Call vect_determine_precisions.

gcc/testsuite/
* gcc.dg/vect/vect-over-widen-1.c: Update the scan tests for new
over-widening messages.
* gcc.dg/vect/vect-over-widen-1-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-2-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-3.c: Likewise.
* gcc.dg/vect/vect-over-widen-3-big-array.c: Likewise.
* gcc.dg/vect/vect-over-widen-4.c: Likewise.
* gcc.dg/vect/vect-over-widen-4-big-array.c: Likewise.
* gcc.dg/vect/bb-slp-over-widen-1.c: New test.
* gcc.dg/vect/bb-slp-over-widen-2.c: Likewise.
* gcc.dg/vect/vect-over-widen-5.c: Likewise.
* gcc.dg/vect/vect-over-widen-6.c: Likewise.
* gcc.dg/vect/vect-over-widen-7.c: Likewise.
* gcc.dg/vect/vect-over-widen-8.c: Likewise.
* gcc.dg/vect/vect-over-widen-9.c: Likewise.
* gcc.dg/vect/vect-over-widen-10.c: Likewise.
* gcc.dg/vect/vect-over-widen-11.c: Likewise.
* gcc.dg/vect/vect-over-widen-12.c: Likewise.
* gcc.dg/vect/vect-over-widen-13.c: Likewise.
* gcc.dg/vect/vect-over-widen-14.c: Likewise.
* gcc.dg/vect/vect-over-widen-15.c: Likewise.
* 

Re: [12/n] PR85694: Rework detection of widened operations

2018-06-29 Thread Richard Biener
On Fri, Jun 29, 2018 at 11:20 AM Richard Sandiford
 wrote:
>
> Richard Sandiford  writes:
> > This patch adds helper functions for detecting widened operations and
> > generalises the existing code to handle more cases.
> >
> > One of the main changes is to recognise multi-stage type conversions,
> > which are possible even in the original IR and can also occur as a
> > result of earlier pattern matching (especially after the main
> > over-widening patch).  E.g. for:
> >
> >   unsigned int res = 0;
> >   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> > {
> >   int av = a[i];
> >   int bv = b[i];
> >   short diff = av - bv;
> >   unsigned short abs = diff < 0 ? -diff : diff;
> >   res += abs;
> > }
> >
> > we have:
> >
> >   _9 = _7 - _8;
> >   diff_20 = (short int) _9;
> >   _10 = (int) diff_20;
> >   _11 = ABS_EXPR <_10>;
> >
> > where the first cast establishes the sign of the promotion done
> > by the second cast.
> >
> > vect_recog_sad_pattern didn't handle this kind of intermediate promotion
> > between the MINUS_EXPR and the ABS_EXPR.  Sign extensions and casts from
> > unsigned to signed are both OK there.  Unsigned promotions aren't, and
> > need to be rejected, but should have been folded away earlier anyway.
> >
> > Also, the dot_prod and widen_sum patterns both required the promotions
> > to be from one signedness to the same signedness, rather than say signed
> > char to unsigned int.  That shouldn't be necessary, since it's only the
> > sign of the input to the promotion that matters.  Nothing requires the
> > narrow and wide types in a DOT_PROD_EXPR or WIDEN_SUM_EXPR to have the
> > same sign (and IMO that's a good thing).
> >
> > Fixing these fixed an XFAIL in gcc.dg/vect/vect-widen-mult-sum.c.
> >
> > The patch also uses a common routine to handle both the WIDEN_MULT_EXPR
> > and WIDEN_LSHIFT_EXPR patterns.  I hope this could be reused for other
> > similar operations in future.
> >
> > Also, the patch means we recognise the index calculations in
> > vect-mult-const-pattern*.c as widening multiplications, whereas the
> > scan test was expecting them to be recognised as mult patterns instead.
> > The patch makes the tests check specifically for the multiplication we
> > care about.
> >
> > Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>
> It turned out that it would be better to generalise vect_widened_op_p
> to handle a tree of operations, for the benefit of the average detection
> itself.
>
> Tested as before.  OK to install?

OK.

Thanks,
Richard.
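
On the quoted remark that only the sign of the input to a promotion matters:
in C terms, widening a signed char sign-extends whether the wide type is
signed or unsigned. A minimal sketch, with made-up function names:

```c
#include <stdint.h>

/* Widen a signed 8-bit value to an unsigned 32-bit type.  The value is
   converted modulo 2^32, which is the same bit pattern that sign
   extension produces.  */
uint32_t
widen_s8_to_u32 (int8_t x)
{
  return (uint32_t) x;
}

/* Widen a signed 8-bit value to a signed 32-bit type (sign extension).  */
int32_t
widen_s8_to_s32 (int8_t x)
{
  return (int32_t) x;
}

/* Widen an unsigned 8-bit value: zero extension, whatever the sign of
   the wide type.  */
int32_t
widen_u8_to_s32 (uint8_t x)
{
  return (int32_t) x;
}
```

So a promotion from signed char to unsigned int behaves like a signed
widening followed by a sign change of the same width.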

> Richard
>
>
> 2018-06-29  Richard Sandiford  
>
> gcc/
> * tree-vect-patterns.c (append_pattern_def_seq): Take an optional
> vector type.  If given, install it in the new statement's
> STMT_VINFO_VECTYPE.
> (vect_element_precision): New function.
> (vect_unpromoted_value): New struct.
> (vect_unpromoted_value::vect_unpromoted_value): New function.
> (vect_unpromoted_value::set_op): Likewise.
> (vect_look_through_possible_promotion): Likewise.
> (vect_joust_widened_integer, vect_joust_widened_type): Likewise.
> (vect_widened_op_tree, vect_convert_input): Likewise.
> (vect_convert_inputs, vect_convert_output): Likewise.
> (vect_recog_dot_prod_pattern): Use 
> vect_look_through_possible_promotion
> to handle the optional cast of the multiplication result and
> vect_widened_op_tree to detect the widened multiplication itself.
> Do not require the input and output of promotion casts to have
> the same sign, but base the signedness of the operation on the
> input rather than the result.  If the pattern includes two
> promotions, check that those promotions have the same sign.
> Do not restrict the MULT_EXPR handling to a double-width result;
> handle quadruple-width results and wider.  Use vect_convert_inputs
> to convert the inputs to the common type.
> (vect_recog_sad_pattern):  Use vect_look_through_possible_promotion
> to handle the optional cast of the ABS result.  Also allow a sign
> change or a sign extension between the ABS and MINUS.
> Use vect_widened_op_tree to detect the widened subtraction and use
> vect_convert_inputs to convert the inputs to the common type.
> (vect_handle_widen_op_by_const): Delete.
> (vect_recog_widen_op_pattern): New function.
> (vect_recog_widen_mult_pattern): Use it.
> (vect_recog_widen_shift_pattern): Likewise.
> (vect_recog_widen_sum_pattern): Use
> vect_look_through_possible_promotion to handle the promoted
> PLUS_EXPR operand.
>
> gcc/testsuite/
> * gcc.dg/vect/vect-widen-mult-sum.c: Remove xfail.
> * gcc.dg/vect/no-scevccp-outer-6.c: Don't match widened 
> multiplications
> by 4 in the computation of a[i].
> * gcc.dg/vect/vect-mult-const-pattern-1.c: Test specifically for 

Re: [11/n] PR85694: Apply pattern matching to pattern definition statements

2018-06-29 Thread Richard Biener
On Wed, Jun 20, 2018 at 12:28 PM Richard Sandiford
 wrote:
>
> Although the first pattern match wins in the sense that no later
> function can match the *old* gimple statement, it still seems worth
> letting them match the *new* gimple statements, just like we would if
> the original IR had included that sequence from the outset.
>
> This is mostly true after the later patch for PR85694, where e.g. we
> could recognise:
>
>signed char a;
>int ap = (int) a;
>int res = ap * 3;
>
> as the pattern:
>
>short ap' = (short) a;
>short res = ap' * 3; // S1: definition statement
>int res = (int) res; // S2: pattern statement
>
> and then apply the mult pattern to "ap' * 3".  The patch needs to
> come first (without its own test cases) so that the main over-widening
> patch doesn't regress anything.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

OK.

Richard.

> Richard
>
>
> 2018-06-20  Richard Sandiford  
>
> gcc/
> * gimple-iterator.c (gsi_for_stmt): Add a new overload that takes
> the containing gimple_seq *.
> * gimple-iterator.h (gsi_for_stmt): Declare it.
> * tree-vect-patterns.c (vect_recog_dot_prod_pattern)
> (vect_recog_sad_pattern, vect_recog_widen_sum_pattern)
> (vect_recog_widen_shift_pattern, vect_recog_rotate_pattern)
> (vect_recog_vector_vector_shift_pattern, vect_recog_divmod_pattern)
> (vect_recog_mask_conversion_pattern): Remove STMT_VINFO_IN_PATTERN_P
> checks.
> (vect_init_pattern_stmt, vect_set_pattern_stmt): New functions,
> split out from...
> (vect_mark_pattern_stmts): ...here.  Handle cases in which the
> statement being replaced is part of an existing pattern
> definition sequence, inserting the new pattern statements before
> the original one.
> (vect_pattern_recog_1): Don't return a bool.  If the statement
> is already part of a pattern, instead apply pattern matching
> to the pattern definition statements.  Don't clear the
> STMT_VINFO_RELATED_STMT if is_pattern_stmt_p.
> (vect_pattern_recog): Don't break after the first match;
> continue processing the pattern definition statements instead.
> Don't bail out for STMT_VINFO_IN_PATTERN_P here.
>
> Index: gcc/gimple-iterator.c
> ===
> --- gcc/gimple-iterator.c   2018-05-02 08:38:09.833407589 +0100
> +++ gcc/gimple-iterator.c   2018-06-20 11:26:07.913295807 +0100
> @@ -619,6 +619,18 @@ gsi_for_stmt (gimple *stmt)
>return i;
>  }
>
> +/* Get an iterator for STMT, which is known to belong to SEQ.  This is
> +   equivalent to starting at the beginning of SEQ and searching forward
> +   until STMT is found.  */
> +
> +gimple_stmt_iterator
> +gsi_for_stmt (gimple *stmt, gimple_seq *seq)
> +{
> +  gimple_stmt_iterator i = gsi_start_1 (seq);
> +  i.ptr = stmt;
> +  return i;
> +}
> +
>  /* Finds iterator for PHI.  */
>
>  gphi_iterator
> Index: gcc/gimple-iterator.h
> ===
> --- gcc/gimple-iterator.h   2018-05-02 08:38:10.117404903 +0100
> +++ gcc/gimple-iterator.h   2018-06-20 11:26:07.913295807 +0100
> @@ -79,6 +79,7 @@ extern void gsi_insert_after (gimple_stm
>   enum gsi_iterator_update);
>  extern bool gsi_remove (gimple_stmt_iterator *, bool);
>  extern gimple_stmt_iterator gsi_for_stmt (gimple *);
> +extern gimple_stmt_iterator gsi_for_stmt (gimple *, gimple_seq *);
>  extern gphi_iterator gsi_for_phi (gphi *);
>  extern void gsi_move_after (gimple_stmt_iterator *, gimple_stmt_iterator *);
>  extern void gsi_move_before (gimple_stmt_iterator *, gimple_stmt_iterator *);
> Index: gcc/tree-vect-patterns.c
> ===
> --- gcc/tree-vect-patterns.c2018-06-20 11:26:00.497361237 +0100
> +++ gcc/tree-vect-patterns.c2018-06-20 11:26:07.913295807 +0100
> @@ -60,6 +60,42 @@ vect_pattern_detected (const char *name,
>  }
>  }
>
> +/* Associate pattern statement PATTERN_STMT with ORIG_STMT_INFO.
> +   Set its vector type to VECTYPE if it doesn't have one already.  */
> +
> +static void
> +vect_init_pattern_stmt (gimple *pattern_stmt, stmt_vec_info orig_stmt_info,
> +   tree vectype)
> +{
> +  stmt_vec_info pattern_stmt_info = vinfo_for_stmt (pattern_stmt);
> +  if (pattern_stmt_info == NULL)
> +{
> +  pattern_stmt_info = new_stmt_vec_info (pattern_stmt,
> +orig_stmt_info->vinfo);
> +  set_vinfo_for_stmt (pattern_stmt, pattern_stmt_info);
> +}
> +  gimple_set_bb (pattern_stmt, gimple_bb (orig_stmt_info->stmt));
> +
> +  STMT_VINFO_RELATED_STMT (pattern_stmt_info) = orig_stmt_info->stmt;
> +  STMT_VINFO_DEF_TYPE (pattern_stmt_info)
> += STMT_VINFO_DEF_TYPE (orig_stmt_info);
> +  if (!STMT_VINFO_VECTYPE 

Re: Add support for dumping multiple dump files under one name

2018-06-29 Thread David Malcolm
On Fri, 2018-06-29 at 10:15 +0200, Richard Biener wrote:
> On Fri, 22 Jun 2018, Jan Hubicka wrote:
> 
> > Hi,
> > this patch adds dumpfile support for dumps that come in multiple
> > parts.  This
> > is needed for WPA stream-out dump since we stream partitions in
> > parallel and
> > the dumps would come up in random order.  Parts are added by new
> > parameter that
> > is initialized by default to -1 (no parts). 
> > 
> > One thing I skipped is any support for duplicated opening of file
> > with parts since I do not need it.
> > 
> > Bootstrapped/regtested x86_64-linux, OK?
> 
> Looks reasonable - David, anything you want to add / have changed?

No worries from my side; I don't think it interacts with the
optimization records stuff I'm working on - presumably this is just for
dumping the WPA stream-out, rather than for dumping specific
optimizations.

[...snip...]

Dave



[PATCH] Add whitespace to some dejagnu directives in libstdc++ tests

2018-06-29 Thread Jonathan Wakely

* testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc:
Add whitespace to dejagnu directive.
* testsuite/23_containers/array/element_access/at_neg.cc: Likewise.

Tested x86_64-linux, committed to trunk.


commit 0c1a20c4c48f12dfa2797eb193ff9c5edbb4fd8f
Author: Jonathan Wakely 
Date:   Fri Jun 29 10:37:05 2018 +0100

Add whitespace to some dejagnu directives in libstdc++ tests

* testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc:
Add whitespace to dejagnu directive.
* testsuite/23_containers/array/element_access/at_neg.cc: Likewise.

diff --git 
a/libstdc++-v3/testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc
 
b/libstdc++-v3/testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc
index db2d47219c6..d940cec22d3 100644
--- 
a/libstdc++-v3/testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc
+++ 
b/libstdc++-v3/testsuite/20_util/add_rvalue_reference/requirements/alias_decl.cc
@@ -1,4 +1,4 @@
-// { dg-do compile {target c++14 } }
+// { dg-do compile { target c++14 } }
 
 // Copyright (C) 2013-2018 Free Software Foundation, Inc.
 //
diff --git 
a/libstdc++-v3/testsuite/23_containers/array/element_access/at_neg.cc 
b/libstdc++-v3/testsuite/23_containers/array/element_access/at_neg.cc
index 33f52f265d8..15f36909044 100644
--- a/libstdc++-v3/testsuite/23_containers/array/element_access/at_neg.cc
+++ b/libstdc++-v3/testsuite/23_containers/array/element_access/at_neg.cc
@@ -1,4 +1,4 @@
-// { dg-do run {target c++11 xfail *-*-* } }
+// { dg-do run { target c++11 xfail *-*-* } }
 
 // Copyright (C) 2011-2018 Free Software Foundation, Inc.
 //


[PATCH] fixincludes: vxworks: remove unnecessary parentheses in ioctl wrapper macro

2018-06-29 Thread Rasmus Villemoes
The rationale for the fixinclude ioctl macro wrapper is, as far as I can
tell (https://gcc.gnu.org/ml/gcc-patches/2012-09/msg01619.html):

  Fix 2: Add hack for ioctl() on VxWorks.

  ioctl() is supposed to be variadic, but VxWorks only has a three
  argument version with the third argument of type int.  This messes up
  when the third argument is not implicitly convertible to int.  This
  adds a macro which wraps around ioctl() and explicitly casts the third
  argument to an int.  This way, the most common use case of ioctl (with
  a const char * for the third argument) will compile in C++, where
  pointers must be explicitly casted to int.

However, we have existing C++ code that calls the ioctl function via

  ::ioctl(foo, bar, baz)

and obviously this breaks when it gets expanded to

  ::(ioctl)(foo, bar, (int)(baz))

Since the GNU C preprocessor already prevents recursive expansion of
function-like macros, the parentheses around ioctl are unnecessary.
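
A small illustration of that preprocessor rule, using a hypothetical
demo_ioctl instead of the real VxWorks declaration (the function must be
defined before the macro, otherwise the macro would also expand in the
definition itself):

```c
#include <stdint.h>

/* Hypothetical stand-in for a three-argument ioctl whose last
   parameter is an int.  It returns FUNC + ARG so the effect of the
   cast is observable.  */
int
demo_ioctl (int fd, int func, int arg)
{
  (void) fd;
  return func + arg;
}

/* Self-referential function-like macro without parentheses around the
   name: while demo_ioctl is being expanded, the inner demo_ioctl token
   is not expanded again, so the call reaches the function above.  */
#define demo_ioctl(fd, func, arg) \
  demo_ioctl (fd, func, (int) (intptr_t) (arg))

/* Passing a pointer as the third argument compiles because the macro
   inserts the cast.  */
int
forward_pointer (const char *p)
{
  return demo_ioctl (3, 40, p);
}
```

With this form, a qualified C++ call such as ::demo_ioctl (a, b, c) would
expand to ::demo_ioctl (a, b, (int) ...) rather than to the invalid
::(demo_ioctl) (...).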

Incidentally, there is also a macro sioIoctl() in the vxworks sioLib.h
header that expands to

  ((pSioChan)->pDrvFuncs->ioctl (pSioChan, cmd, arg))

which also breaks when that gets further expanded to

  ((pSioChan)->pDrvFuncs->(ioctl) (pSioChan, cmd, (int)(arg)))

This patch partly fixes that issue as well, but the third argument to
the pDrvFuncs->ioctl method should be void*, so the cast to (int) is
slightly annoying. Internally, we've simply patched the sioIoctl macro:

  (((pSioChan)->pDrvFuncs->ioctl) (pSioChan, cmd, arg))

==changelog==

fixincludes/

* inclhack.def (vxworks_ioctl_macro): Remove parentheses from
expansion of ioctl macro.
* fixincl.x: Regenerate.
---
 fixincludes/inclhack.def | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fixincludes/inclhack.def b/fixincludes/inclhack.def
index c1f5a13eda4..f7d2124ba74 100644
--- a/fixincludes/inclhack.def
+++ b/fixincludes/inclhack.def
@@ -4902,7 +4902,7 @@ fix = {
 
 c_fix   = format;
 c_fix_arg   = "%0\n"
-"#define ioctl(fd, func, arg) (ioctl)(fd, func, (int)(arg))\n";
+"#define ioctl(fd, func, arg) ioctl(fd, func, (int)(arg))\n";
 c_fix_arg   = "extern[\t ]+int[\t ]+ioctl[\t ]*\\([\t ,[:alnum:]]*\\);";
 
 test_text   = "extern int ioctl ( int asdf1234, int jkl , int qwerty ) ;";
-- 
2.16.4



Re: [PATCH] Add experimental::sample and experimental::shuffle from N4531

2018-06-29 Thread Jonathan Wakely

On 29/06/18 09:39 +0200, Christophe Lyon wrote:

On Fri, 29 Jun 2018 at 09:21, Jonathan Wakely  wrote:


On 29/06/18 08:55 +0200, Christophe Lyon wrote:
>On Mon, 25 Jun 2018 at 18:23, Jonathan Wakely  wrote:
>>
>> The additions to <experimental/random> were added in 2015 but the new
>> algorithms in <experimental/algorithm> were not. This adds them.
>>
>> * include/experimental/algorithm (sample, shuffle): Add new overloads
>> using per-thread random number engine.
>> * testsuite/experimental/algorithm/sample.cc: Simplify and reduce
>> dependencies by using __gnu_test::test_container.
>> * testsuite/experimental/algorithm/sample-2.cc: New.
>> * testsuite/experimental/algorithm/shuffle.cc: New.
>>
>> Tested x86_64-linux, committed to trunk.
>>
>> This would be safe to backport, but nobody has noticed the algos are
>> missing or complained, so it doesn't seem very important to backport.
>>
>>
>
>Hi,
>
>On bare-metal targets (aarch64 and arm + newlib), I've noticed that
>the two new tests fail:
>PASS: experimental/algorithm/shuffle.cc (test for excess errors)
>spawn /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh ./shuffle.exe
>terminate called after throwing an instance of 'std::runtime_error'
>  what():  random_device::random_device(const std::string&)
>
>*** EXIT code 4242
>FAIL: experimental/algorithm/shuffle.cc execution test
>
>PASS: experimental/algorithm/sample-2.cc (test for excess errors)
>spawn /aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh ./sample-2.exe
>terminate called after throwing an instance of 'std::runtime_error'
>  what():  random_device::random_device(const std::string&)
>
>*** EXIT code 4242
>FAIL: experimental/algorithm/sample-2.cc execution test
>
>Does this ring a bell?

Does the existing testsuite/experimental/random/randint.cc file fail
in the same way?



Yes it does.

And so do:
25_algorithms/make_heap/complexity.cc


This one also uses std::random_device.


23_containers/array/element_access/at_neg.cc


Hmm,

 // Expected behavior is to either throw and have the uncaught
 // exception end up in a terminate handler which eventually exits,
 // or abort. (Depending on -fno-exceptions.)

So this is expected to XFAIL.


26_numerics/random/random_device/cons/default.cc


We should XFAIL the ones that use std::random_device, if we can
identify an effective target to describe them.




GCC 7 backport

2018-06-29 Thread Martin Liška
Hi.

I'm going to install the following tested patch (it's already in the GCC 6 branch).

Martin
From 46584361c5f48925395e8155e6a9b809507a25be Mon Sep 17 00:00:00 2001
From: marxin 
Date: Fri, 15 Jun 2018 08:51:28 +
Subject: [PATCH] Partial backport r256656

2018-06-15  Martin Liska  

	Backport from mainline
	2018-01-10  Kelvin Nilsen  

	* lex.c (search_line_fast): Remove illegal coercion of an
	unaligned pointer value to vector pointer type and replace with
	use of __builtin_vec_vsx_ld () built-in function, which operates
	on unaligned pointer values.

git-svn-id: svn+ssh://gcc.gnu.org/svn/gcc/branches/gcc-6-branch@261621 138bc75d-0d04-0410-961f-82ee72b054a4
---
 libcpp/lex.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/libcpp/lex.c b/libcpp/lex.c
index 097c78002cb..e0fb9e822c4 100644
--- a/libcpp/lex.c
+++ b/libcpp/lex.c
@@ -568,7 +568,7 @@ search_line_fast (const uchar *s, const uchar *end ATTRIBUTE_UNUSED)
 {
   vc m_nl, m_cr, m_bs, m_qm;
 
-  data = *((const vc *)s);
+  data = __builtin_vec_vsx_ld (0, s);
   s += 16;
 
   m_nl = (vc) __builtin_vec_cmpeq(data, repl_nl);
-- 
2.17.1



[17/n] PR85694: AArch64 support for AVG_FLOOR/CEIL

2018-06-29 Thread Richard Sandiford
This patch adds AArch64 patterns for the new AVG_FLOOR/CEIL operations.
AVG_FLOOR is [SU]HADD and AVG_CEIL is [SU]RHADD.

Tested on aarch64-linux-gnu (with and without SVE).  OK to install?

Richard


2018-06-29  Richard Sandiford  

gcc/
PR tree-optimization/85694
* config/aarch64/iterators.md (HADD, RHADD): New int iterators.
(u): Handle UNSPEC_SHADD, UNSPEC_UHADD, UNSPEC_SRHADD and
UNSPEC_URHADD.
* config/aarch64/aarch64-simd.md (avg3_floor)
(avg3_ceil): New patterns.

gcc/testsuite/
PR tree-optimization/85694
* lib/target-supports.exp (check_effective_target_vect_avg_qi):
Return true for AArch64 without SVE.
* gcc.target/aarch64/vect_hadd_1.h: New file.
* gcc.target/aarch64/vect_shadd_1.c: New test.
* gcc.target/aarch64/vect_srhadd_1.c: Likewise.
* gcc.target/aarch64/vect_uhadd_1.c: Likewise.
* gcc.target/aarch64/vect_urhadd_1.c: Likewise.

Index: gcc/config/aarch64/iterators.md
===
--- gcc/config/aarch64/iterators.md 2018-06-27 10:27:10.018648589 +0100
+++ gcc/config/aarch64/iterators.md 2018-06-29 10:16:35.300385599 +0100
@@ -1448,6 +1448,10 @@ (define_int_iterator HADDSUB [UNSPEC_SHA
  UNSPEC_SHSUB UNSPEC_UHSUB
  UNSPEC_SRHSUB UNSPEC_URHSUB])
 
+(define_int_iterator HADD [UNSPEC_SHADD UNSPEC_UHADD])
+
+(define_int_iterator RHADD [UNSPEC_SRHADD UNSPEC_URHADD])
+
 (define_int_iterator DOTPROD [UNSPEC_SDOT UNSPEC_UDOT])
 
 (define_int_iterator ADDSUBHN [UNSPEC_ADDHN UNSPEC_RADDHN
@@ -1672,8 +1676,10 @@ (define_int_attr lr [(UNSPEC_SSLI  "l")
 
 (define_int_attr u [(UNSPEC_SQSHLU "u") (UNSPEC_SQSHL "") (UNSPEC_UQSHL "")
(UNSPEC_SQSHRUN "u") (UNSPEC_SQRSHRUN "u")
-(UNSPEC_SQSHRN "")  (UNSPEC_UQSHRN "")
-(UNSPEC_SQRSHRN "") (UNSPEC_UQRSHRN "")])
+   (UNSPEC_SQSHRN "")  (UNSPEC_UQSHRN "")
+   (UNSPEC_SQRSHRN "") (UNSPEC_UQRSHRN "")
+   (UNSPEC_SHADD "") (UNSPEC_UHADD "u")
+   (UNSPEC_SRHADD "") (UNSPEC_URHADD "u")])
 
 (define_int_attr addsub [(UNSPEC_SHADD "add")
 (UNSPEC_UHADD "add")
Index: gcc/config/aarch64/aarch64-simd.md
===
--- gcc/config/aarch64/aarch64-simd.md  2018-06-27 10:27:10.022648553 +0100
+++ gcc/config/aarch64/aarch64-simd.md  2018-06-29 10:16:35.296385636 +0100
@@ -3387,6 +3387,22 @@ (define_expand "aarch64_usubw2"
 
 ;; h.
 
+(define_expand "avg3_floor"
+  [(set (match_operand:VDQ_BHSI 0 "register_operand")
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")
+ (match_operand:VDQ_BHSI 2 "register_operand")]
+HADD))]
+  "TARGET_SIMD"
+)
+
+(define_expand "avg3_ceil"
+  [(set (match_operand:VDQ_BHSI 0 "register_operand")
+   (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand")
+ (match_operand:VDQ_BHSI 2 "register_operand")]
+RHADD))]
+  "TARGET_SIMD"
+)
+
 (define_insn "aarch64_h"
   [(set (match_operand:VDQ_BHSI 0 "register_operand" "=w")
 (unspec:VDQ_BHSI [(match_operand:VDQ_BHSI 1 "register_operand" "w")
Index: gcc/testsuite/lib/target-supports.exp
===
--- gcc/testsuite/lib/target-supports.exp   2018-06-29 10:16:31.940416295 +0100
+++ gcc/testsuite/lib/target-supports.exp   2018-06-29 10:16:35.300385599 +0100
@@ -6317,7 +6317,8 @@ proc check_effective_target_vect_usad_ch
 # and unsigned average operations on vectors of bytes.
 
 proc check_effective_target_vect_avg_qi {} {
-return 0
+return [expr { [istarget aarch64*-*-*]
+  && ![check_effective_target_aarch64_sve] }]
 }
 
 # Return 1 if the target plus current options supports a vector
Index: gcc/testsuite/gcc.target/aarch64/vect_hadd_1.h
===
--- /dev/null   2018-06-13 14:36:57.192460992 +0100
+++ gcc/testsuite/gcc.target/aarch64/vect_hadd_1.h  2018-06-29 10:16:35.300385599 +0100
@@ -0,0 +1,39 @@
+#include 
+
+#pragma GCC target "+nosve"
+
+#define N 100
+
+#define DEF_FUNC(TYPE, B1, B2, C1, C2) \
+  void __attribute__ ((noipa)) \
+  f_##TYPE (TYPE *restrict a, TYPE *restrict b, TYPE *restrict c)  \
+  {\
+for (int i = 0; i < N; ++i)   \
+  a[i] = ((__int128) b[i] + c[i] + BIAS) >> 1; \
+  }
+
+#define TEST_FUNC(TYPE, B1, B2, C1, C2)   \
+  {\
+TYPE a[N], b[N], c[N];   

[16/n] PR85694: Add detection of averaging operations

2018-06-29 Thread Richard Sandiford
This patch adds detection of average instructions:

   a = (((wide) b + (wide) c) >> 1);
   --> a = (wide) .AVG_FLOOR (b, c);

   a = (((wide) b + (wide) c + 1) >> 1);
   --> a = (wide) .AVG_CEIL (b, c);

in cases where users of "a" need only the low half of the result,
making the cast to (wide) redundant.  The heavy lifting was done by
earlier patches.
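
To illustrate why only the low half of the wide result matters, here is a minimal sketch for unsigned bytes. The function names are hypothetical, and the narrow-only bit-twiddling forms are standard identities used here for comparison; they are not taken from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Average rounded down, computed in a wider type as in the pattern
   a = ((wide) b + (wide) c) >> 1.  */
uint8_t avg_floor_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)(((uint16_t)b + (uint16_t)c) >> 1);
}

/* Average rounded up: a = ((wide) b + (wide) c + 1) >> 1.  */
uint8_t avg_ceil_u8(uint8_t b, uint8_t c)
{
    return (uint8_t)(((uint16_t)b + (uint16_t)c + 1) >> 1);
}

/* Check every pair of bytes against formulas that never leave the
   narrow type, which is what a halving-add instruction can compute
   without widening.  */
void check_all_pairs(void)
{
    for (unsigned b = 0; b < 256; ++b)
        for (unsigned c = 0; c < 256; ++c) {
            uint8_t fl = (uint8_t)((b & c) + ((b ^ c) >> 1));
            uint8_t ce = (uint8_t)((b | c) - ((b ^ c) >> 1));
            assert(avg_floor_u8((uint8_t)b, (uint8_t)c) == fl);
            assert(avg_ceil_u8((uint8_t)b, (uint8_t)c) == ce);
        }
}
```

The wide addition cannot overflow, and the shifted result always fits back into the narrow type, which is why the final cast loses nothing.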

This showed up another problem in vectorizable_call: if the call is a
pattern definition statement rather than the main pattern statement,
the type of the vectorised call might be different from the type of the
original statement.

Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

Richard


2018-06-29  Richard Sandiford  

gcc/
PR tree-optimization/85694
* doc/md.texi (avgM3_floor, uavgM3_floor, avgM3_ceil)
(uavgM3_ceil): Document new optabs.
* doc/sourcebuild.texi (vect_avg_qi): Document new target selector.
* internal-fn.def (IFN_AVG_FLOOR, IFN_AVG_CEIL): New internal
functions.
* optabs.def (savg_floor_optab, uavg_floor_optab, savg_ceil_optab)
(uavg_ceil_optab): New optabs.
* tree-vect-patterns.c (vect_recog_average_pattern): New function.
(vect_vect_recog_func_ptrs): Add it.
* tree-vect-stmts.c (vectorizable_call): Get the type of the zero
constant directly from the associated lhs.

gcc/testsuite/
PR tree-optimization/85694
* lib/target-supports.exp (check_effective_target_vect_avg_qi): New
proc.
* gcc.dg/vect/vect-avg-1.c: New test.
* gcc.dg/vect/vect-avg-2.c: Likewise.
* gcc.dg/vect/vect-avg-3.c: Likewise.
* gcc.dg/vect/vect-avg-4.c: Likewise.
* gcc.dg/vect/vect-avg-5.c: Likewise.
* gcc.dg/vect/vect-avg-6.c: Likewise.
* gcc.dg/vect/vect-avg-7.c: Likewise.
* gcc.dg/vect/vect-avg-8.c: Likewise.
* gcc.dg/vect/vect-avg-9.c: Likewise.
* gcc.dg/vect/vect-avg-10.c: Likewise.
* gcc.dg/vect/vect-avg-11.c: Likewise.
* gcc.dg/vect/vect-avg-12.c: Likewise.
* gcc.dg/vect/vect-avg-13.c: Likewise.
* gcc.dg/vect/vect-avg-14.c: Likewise.

Index: gcc/doc/md.texi
===
--- gcc/doc/md.texi 2018-06-29 10:14:49.425353913 +0100
+++ gcc/doc/md.texi 2018-06-29 10:16:31.936416331 +0100
@@ -5599,6 +5599,34 @@ Other shift and rotate instructions, ana
 Vector shift and rotate instructions that take vectors as operand 2
 instead of a scalar type.
 
+@cindex @code{avg@var{m}3_floor} instruction pattern
+@cindex @code{uavg@var{m}3_floor} instruction pattern
+@item @samp{avg@var{m}3_floor}
+@itemx @samp{uavg@var{m}3_floor}
+Signed and unsigned average instructions.  These instructions add
+operands 1 and 2 without truncation, divide the result by 2,
+round towards -Inf, and store the result in operand 0.  This is
+equivalent to the C code:
+@smallexample
+narrow op0, op1, op2;
+@dots{}
+op0 = (narrow) (((wide) op1 + (wide) op2) >> 1);
+@end smallexample
+where the sign of @samp{narrow} determines whether this is a signed
+or unsigned operation.
+
+@cindex @code{avg@var{m}3_ceil} instruction pattern
+@cindex @code{uavg@var{m}3_ceil} instruction pattern
+@item @samp{avg@var{m}3_ceil}
+@itemx @samp{uavg@var{m}3_ceil}
+Like @samp{avg@var{m}3_floor} and @samp{uavg@var{m}3_floor}, but round
+towards +Inf.  This is equivalent to the C code:
+@smallexample
+narrow op0, op1, op2;
+@dots{}
+op0 = (narrow) (((wide) op1 + (wide) op2 + 1) >> 1);
+@end smallexample
+
 @cindex @code{bswap@var{m}2} instruction pattern
 @item @samp{bswap@var{m}2}
 Reverse the order of bytes of operand 1 and store the result in operand 0.
Index: gcc/doc/sourcebuild.texi
===
--- gcc/doc/sourcebuild.texi    2018-06-14 12:27:24.156171818 +0100
+++ gcc/doc/sourcebuild.texi    2018-06-29 10:16:31.936416331 +0100
@@ -1417,6 +1417,10 @@ Target supports Fortran @code{real} kind
 The target's ABI allows stack variables to be aligned to the preferred
 vector alignment.
 
+@item vect_avg_qi
+Target supports both signed and unsigned averaging operations on vectors
+of bytes.
+
 @item vect_condition
 Target supports vector conditional operations.
 
Index: gcc/internal-fn.def
===
--- gcc/internal-fn.def 2018-06-14 12:27:34.108084438 +0100
+++ gcc/internal-fn.def 2018-06-29 10:16:31.936416331 +0100
@@ -143,6 +143,11 @@ DEF_INTERNAL_OPTAB_FN (FMS, ECF_CONST, f
 DEF_INTERNAL_OPTAB_FN (FNMA, ECF_CONST, fnma, ternary)
 DEF_INTERNAL_OPTAB_FN (FNMS, ECF_CONST, fnms, ternary)
 
+DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_FLOOR, ECF_CONST | ECF_NOTHROW, first,
+ savg_floor, uavg_floor, binary)
+DEF_INTERNAL_SIGNED_OPTAB_FN (AVG_CEIL, ECF_CONST | ECF_NOTHROW, first,
+ savg_ceil, uavg_ceil, binary)
+
 DEF_INTERNAL_OPTAB_FN (COND_ADD, ECF_CONST, 

Re: [12/n] PR85694: Rework detection of widened operations

2018-06-29 Thread Richard Sandiford
Richard Sandiford  writes:
> This patch adds helper functions for detecting widened operations and
> generalises the existing code to handle more cases.
>
> One of the main changes is to recognise multi-stage type conversions,
> which are possible even in the original IR and can also occur as a
> result of earlier pattern matching (especially after the main
> over-widening patch).  E.g. for:
>
>   unsigned int res = 0;
>   for (__INTPTR_TYPE__ i = 0; i < N; ++i)
> {
>   int av = a[i];
>   int bv = b[i];
>   short diff = av - bv;
>   unsigned short abs = diff < 0 ? -diff : diff;
>   res += abs;
> }
>
> we have:
>
>   _9 = _7 - _8;
>   diff_20 = (short int) _9;
>   _10 = (int) diff_20;
>   _11 = ABS_EXPR <_10>;
>
> where the first cast establishes the sign of the promotion done
> by the second cast.
>
> vect_recog_sad_pattern didn't handle this kind of intermediate promotion
> between the MINUS_EXPR and the ABS_EXPR.  Sign extensions and casts from
> unsigned to signed are both OK there.  Unsigned promotions aren't, and
> need to be rejected, but should have been folded away earlier anyway.
>
> Also, the dot_prod and widen_sum patterns both required the promotions
> to be from one signedness to the same signedness, rather than say signed
> char to unsigned int.  That shouldn't be necessary, since it's only the
> sign of the input to the promotion that matters.  Nothing requires the
> narrow and wide types in a DOT_PROD_EXPR or WIDEN_SUM_EXPR to have the
> same sign (and IMO that's a good thing).
>
> Fixing these fixed an XFAIL in gcc.dg/vect/vect-widen-mult-sum.c.
>
> The patch also uses a common routine to handle both the WIDEN_MULT_EXPR
> and WIDEN_LSHIFT_EXPR patterns.  I hope this could be reused for other
> similar operations in future.
>
> Also, the patch means we recognise the index calculations in
> vect-mult-const-pattern*.c as widening multiplications, whereas the
> scan test was expecting them to be recognised as mult patterns instead.
> The patch makes the tests check specifically for the multiplication we
> care about.
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?

It turned out that it would be better to generalise vect_widened_op_p
to handle a tree of operations, for the benefit of the average detection
itself.

Tested as before.  OK to install?

Richard


2018-06-29  Richard Sandiford  

gcc/
* tree-vect-patterns.c (append_pattern_def_seq): Take an optional
vector type.  If given, install it in the new statement's
STMT_VINFO_VECTYPE.
(vect_element_precision): New function.
(vect_unpromoted_value): New struct.
(vect_unpromoted_value::vect_unpromoted_value): New function.
(vect_unpromoted_value::set_op): Likewise.
(vect_look_through_possible_promotion): Likewise.
(vect_joust_widened_integer, vect_joust_widened_type): Likewise.
(vect_widened_op_tree, vect_convert_input): Likewise.
(vect_convert_inputs, vect_convert_output): Likewise.
(vect_recog_dot_prod_pattern): Use vect_look_through_possible_promotion
to handle the optional cast of the multiplication result and
vect_widened_op_tree to detect the widened multiplication itself.
Do not require the input and output of promotion casts to have
the same sign, but base the signedness of the operation on the
input rather than the result.  If the pattern includes two
promotions, check that those promotions have the same sign.
Do not restrict the MULT_EXPR handling to a double-width result;
handle quadruple-width results and wider.  Use vect_convert_inputs
to convert the inputs to the common type.
(vect_recog_sad_pattern):  Use vect_look_through_possible_promotion
to handle the optional cast of the ABS result.  Also allow a sign
change or a sign extension between the ABS and MINUS.
Use vect_widened_op_tree to detect the widened subtraction and use
vect_convert_inputs to convert the inputs to the common type.
(vect_handle_widen_op_by_const): Delete.
(vect_recog_widen_op_pattern): New function.
(vect_recog_widen_mult_pattern): Use it.
(vect_recog_widen_shift_pattern): Likewise.
(vect_recog_widen_sum_pattern): Use
vect_look_through_possible_promotion to handle the promoted
PLUS_EXPR operand.

gcc/testsuite/
* gcc.dg/vect/vect-widen-mult-sum.c: Remove xfail.
* gcc.dg/vect/no-scevccp-outer-6.c: Don't match widened multiplications
by 4 in the computation of a[i].
* gcc.dg/vect/vect-mult-const-pattern-1.c: Test specifically for the
main multiplication constant.
* gcc.dg/vect/vect-mult-const-pattern-2.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-s16.c: Likewise.
* gcc.dg/vect/vect-widen-mult-const-u16.c: Likewise.  Expect the
pattern to cast the result to int.
* 

Re: [patch, fortran] Handling of .and. and .or. expressions

2018-06-29 Thread Janus Weil
2018-06-29 9:28 GMT+02:00 Jakub Jelinek :
> On Thu, Jun 28, 2018 at 07:36:56PM -0700, Steve Kargl wrote:
>> === gfortran Summary ===
>>
>> # of expected passes            47558
>> # of unexpected failures        6
>> # of expected failures          104
>> # of unsupported tests          85
>>
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O0  execution test
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O1  execution test
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O2  execution test
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O3 -fomit-frame-pointer 
>> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O3 -g  execution test
>> FAIL: gfortran.dg/actual_pointer_function_1.f90   -Os  execution test
>>
>> Execution timeout is: 300
>> spawn [open ...]
>>
>> Program received signal SIGSEGV: Segmentation fault - invalid memory 
>> reference.
>>
>> Backtrace for this error:
>> #0  0x71a2 in ???
>> #1  0x400c09 in ???
>> #2  0x400b91 in ???
>> #3  0x400c51 in ???
>> #4  0x400854 in _start
>> at /usr/src/lib/csu/amd64/crt1.c:74
>> #5  0x200627fff in ???
>
> If you have a test that is broken by the TRUTH_ANDIF_EXPR -> TRUTH_AND_EXPR
> change, then the test must be broken, because from the snippets that were
> posted, Fortran does not require (unlike C/C++) that the second operand is
> not evaluated if the first one evaluates to false (for and) or true (for
> or), it just allows it.

Exactly.
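
For readers less familiar with the two tree codes: TRUTH_ANDIF_EXPR is the short-circuit form and TRUTH_AND_EXPR evaluates both operands. A minimal C analogue, with hypothetical helper names, shows how the difference becomes observable once the operands are function calls with side effects.

```c
#include <assert.h>
#include <stdbool.h>

/* Counts how many operand evaluations actually happened.  */
int calls;

bool check(bool v)
{
    ++calls;
    return v;
}

/* Short-circuit form: the second call is skipped when the first
   operand is false.  */
int calls_short_circuit(bool a, bool b)
{
    calls = 0;
    bool r = check(a) && check(b);
    assert(r == (a && b));
    (void)r;
    return calls;
}

/* Non-short-circuit form: both operands are always evaluated.  */
int calls_eager(bool a, bool b)
{
    calls = 0;
    bool r = check(a) & check(b);
    assert(r == (a && b));
    (void)r;
    return calls;
}
```

Both functions compute the same truth value; only the number of calls differs. A program whose correctness depends on the second call being skipped is relying on something C guarantees for && but that, per the discussion above, Fortran merely permits.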


> So, the optimizing away of the function calls should be an optimization, and
> as such should be done only when optimizing.  So for -O0 at least always use
> TRUTH_{AND,OR}_EXPR, so that people can actually make sure that their
> programs are valid Fortran and can also step into those functions when
> debugging.  For -O1 and higher perhaps use temporarily the *IF_EXPR, or
> better, as I said in another mail, let's add an attribute that will optimize
> all the calls that can be optimized, not just one special case.

Thanks for the comments, Jakub. I fully agree. This is pretty much the
sanest strategy I've heard so far in all of this monstrous thread and
I definitely support it.

Cheers,
Janus


Re: [PATCH 3/3][POPCOUNT] Remove unnecessary if condition in phiopt

2018-06-29 Thread Richard Biener
On Wed, Jun 27, 2018 at 7:09 AM Kugan Vivekanandarajah
 wrote:
>
> Hi Richard,
>
> Thanks for the review,
>
> On 25 June 2018 at 20:20, Richard Biener  wrote:
> > On Fri, Jun 22, 2018 at 11:16 AM Kugan Vivekanandarajah
> >  wrote:
> >>
> >> gcc/ChangeLog:
> >
> > @@ -1516,6 +1521,114 @@ minmax_replacement (basic_block cond_bb,
> > basic_block middle_bb,
> >
> >return true;
> >  }
> > +/* Convert
> > +
> > +   
> > +   if (b_4(D) != 0)
> > +   goto 
> >
> > vertical space before the comment.
> Done.
>
> >
> > + edge e0 ATTRIBUTE_UNUSED, edge e1
> > ATTRIBUTE_UNUSED,
> >
> > why pass them if they are unused?
> Removed.
>
> >
> > +  if (stmt_count != 2)
> > +return false;
> >
> > so what about the case when there is no conversion?
> Done.
>
> >
> > +  /* Check that we have a popcount builtin.  */
> > +  if (!is_gimple_call (popcount)
> > +  || !gimple_call_builtin_p (popcount, BUILT_IN_NORMAL))
> > +return false;
> > +  tree fndecl = gimple_call_fndecl (popcount);
> > +  if ((DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNT)
> > +  && (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNTL)
> > +  && (DECL_FUNCTION_CODE (fndecl) != BUILT_IN_POPCOUNTLL))
> > +return false;
> >
> > look at popcount handling in tree-vrp.c how to properly also handle
> > IFN_POPCOUNT.
> > (CASE_CFN_POPCOUNT)
> Done.
> >
> > +  /* Cond_bb has a check for b_4 != 0 before calling the popcount
> > + builtin.  */
> > +  if (gimple_code (cond) != GIMPLE_COND
> > +  || gimple_cond_code (cond) != NE_EXPR
> > +  || TREE_CODE (gimple_cond_lhs (cond)) != SSA_NAME
> > +  || rhs != gimple_cond_lhs (cond))
> > +return false;
> >
> > The check for SSA_NAME is redundant.
> > You fail to check that gimple_cond_rhs is zero.
> Done.
>
> >
> > +  /* Remove the popcount builtin and cast stmt.  */
> > +  gsi = gsi_for_stmt (popcount);
> > +  gsi_remove (&gsi, true);
> > +  gsi = gsi_for_stmt (cast);
> > +  gsi_remove (&gsi, true);
> > +
> > +  /* And insert the popcount builtin and cast stmt before the cond_bb.  */
> > +  gsi = gsi_last_bb (cond_bb);
> > +  gsi_insert_before (&gsi, popcount, GSI_NEW_STMT);
> > +  gsi_insert_before (&gsi, cast, GSI_NEW_STMT);
> >
> > use gsi_move_before ().  You need to reset flow sensitive info on the
> > LHS of the popcount call as well as on the LHS of the cast.
> Done.
>
> >
> > You fail to check the PHI operand on the false edge.  Consider
> >
> >  if (b != 0)
> >res = __builtin_popcount (b);
> >  else
> >res = 1;
> >
> > You fail to check the PHI operand on the true edge.  Consider
> >
> >  res = 0;
> >  if (b != 0)
> >{
> >   __builtin_popcount (b);
> >   res = 2;
> >}
> >
> > and using -fno-tree-dce and whatever you need to keep the
> > popcount call in the IL.  A gimple testcase for phiopt will do.
> >
> > Your testcase relies on popcount detection.  Please write it
> > using __builtin_popcount instead.  Write one with a cast and
> > one without.
> Added the testcases.
>
> Is this OK now.

+  for (gsi = gsi_start_bb (middle_bb); !gsi_end_p (gsi); gsi_next (&gsi))
+{

use gsi_after_labels (middle_bb)

+  popcount = last_stmt (middle_bb);
+  if (popcount == NULL)
+return false;

after the counting this test is always false, remove it.

+  /* We have a cast stmt feeding popcount builtin.  */
+  cast = first_stmt (middle_bb);

looking at the implementation of first_stmt this will
give you a label in case the BB has one.  I think it's better
to merge this and the above with the "counting" like

gsi = gsi_start_nondebug_after_labels_bb (middle_bb);
if (gsi_end_p (gsi))
  return false;
cast = gsi_stmt (gsi);
gsi_next_nondebug (&gsi);
if (!gsi_end_p (gsi))
  {
    popcount = gsi_stmt (gsi);
    gsi_next_nondebug (&gsi);
    if (!gsi_end_p (gsi))
      return false;
  }
else
  {
    popcount = cast;
    cast = NULL;
  }

+  /* Check that we have a cast prior to that.  */
+  if (gimple_code (cast) != GIMPLE_ASSIGN
+ || gimple_assign_rhs_code (cast) != NOP_EXPR)
+   return false;

CONVERT_EXPR_CODE_P (gimple_assign_rhs_code (cast))

+  /* Check PHI arguments.  */
+  if (lhs != arg0 || !integer_zerop (arg1))
+return false;

that is not sufficient, you do not know whether arg0 is the true
value or the false value.  The edge flags will tell you.

Otherwise looks OK.
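
As a source-level sketch of what the transformation relies on, and why the PHI arguments must be checked: the guard is redundant only when the value on the skipped path is 0. The function names are hypothetical, and __builtin_popcount is assumed to be available (GCC or a compatible compiler).

```c
#include <assert.h>

/* Shape the pass wants to simplify: the b != 0 test adds nothing,
   because popcount of 0 is already 0.  */
int popcount_guarded(unsigned b)
{
    int res;
    if (b != 0)
        res = __builtin_popcount(b);
    else
        res = 0;
    return res;
}

/* What the guarded form can be replaced with.  */
int popcount_unguarded(unsigned b)
{
    return __builtin_popcount(b);
}

/* Counter-example from the review: the other PHI argument is 1, so
   dropping the condition here would change the result for b == 0.  */
int popcount_guarded_one(unsigned b)
{
    return b != 0 ? __builtin_popcount(b) : 1;
}

/* Spot-check that the first two forms agree on a few inputs.  */
void check_guard_is_redundant(void)
{
    static const unsigned samples[] = { 0u, 1u, 0xF0u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; ++i)
        assert(popcount_guarded(samples[i]) == popcount_unguarded(samples[i]));
}
```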

Richard.

> Thanks,
> Kugan
> >
> > Thanks,
> > Richard.
> >
> >
> >> 2018-06-22  Kugan Vivekanandarajah  
> >>
> >> * tree-ssa-phiopt.c (cond_removal_in_popcount_pattern): New.
> >> (tree_ssa_phiopt_worker): Call cond_removal_in_popcount_pattern.
> >>
> >> gcc/testsuite/ChangeLog:
> >>
> >> 2018-06-22  Kugan Vivekanandarajah  
> >>
> >> * gcc.dg/tree-ssa/popcount3.c: New test.


Re: Add support for dumping multiple dump files under one name

2018-06-29 Thread Richard Biener
On Fri, 22 Jun 2018, Jan Hubicka wrote:

> Hi,
> this patch adds dumpfile support for dumps that come in multiple parts.  This
> is needed for the WPA stream-out dump since we stream partitions in parallel and
> the dumps would come up in random order.  Parts are added by a new parameter
> that is initialized by default to -1 (no parts).
> 
> One thing I skipped is any support for duplicated opening of file
> with parts since I do not need it.
> 
> Bootstrapped/regtested x86_64-linux, OK?

Looks reasonable - David, anything you want to add / have changed?
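
A minimal standalone sketch of the naming scheme in the quoted patch, assuming a hypothetical helper make_dump_name in place of GCC's concat-based code; the base name, dump id and suffix used in the usage note are made up for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Build "<base><dump_id>[.<part>]<suffix>".  A part of -1 means the
   dump has no parts, matching the default in the patch.  The caller
   owns the returned string and must free() it.  */
char *make_dump_name(const char *base, const char *dump_id,
                     const char *suffix, int part)
{
    char part_id[16] = "";

    assert(base && dump_id && suffix);
    if (part != -1)
        snprintf(part_id, sizeof part_id, ".%i", part);

    size_t len = strlen(base) + strlen(dump_id) + strlen(part_id)
                 + strlen(suffix) + 1;
    char *name = malloc(len);
    if (!name)
        return NULL;
    snprintf(name, len, "%s%s%s%s", base, dump_id, part_id, suffix);
    return name;
}
```

With these made-up inputs, make_dump_name ("test.c", ".081i", ".wpa", 3) yields "test.c.081i.3.wpa", while a part of -1 yields "test.c.081i.wpa". Each partition streamed in parallel therefore writes to its own file, so the output no longer interleaves.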

Thanks,
Richard.

> Honza
> 
>   * dumpfile.c (gcc::dump_manager::get_dump_file_name): Add PART
>parameter.
>   (gcc::dump_manager::get_dump_file_name): likewise.
>   (dump_begin): Likewise.
>   * dumpfile.h (dump_begin): Update prototype.
>   (gcc::dump_manager::get_dump_file_name,
>   gcc::dump_manager::get_dump_file_name): Update prototype.
> Index: dumpfile.c
> ===
> --- dumpfile.c	(revision 261885)
> +++ dumpfile.c	(working copy)
> @@ -269,7 +269,7 @@ get_dump_file_info_by_switch (const char
>  
>  char *
>  gcc::dump_manager::
> -get_dump_file_name (int phase) const
> +get_dump_file_name (int phase, int part) const
>  {
>struct dump_file_info *dfi;
>  
> @@ -278,7 +278,7 @@ get_dump_file_name (int phase) const
>  
>dfi = get_dump_file_info (phase);
>  
> -  return get_dump_file_name (dfi);
> +  return get_dump_file_name (dfi, part);
>  }
>  
>  /* Return the name of the dump file for the given dump_file_info.
> @@ -288,7 +288,7 @@ get_dump_file_name (int phase) const
>  
>  char *
>  gcc::dump_manager::
> -get_dump_file_name (struct dump_file_info *dfi) const
> +get_dump_file_name (struct dump_file_info *dfi, int part) const
>  {
>char dump_id[10];
>  
> @@ -312,7 +312,14 @@ get_dump_file_name (struct dump_file_inf
>   dump_id[0] = '\0';
>  }
>  
> -  return concat (dump_base_name, dump_id, dfi->suffix, NULL);
> +  if (part != -1)
> +{
> +   char part_id[8];
> +   snprintf (part_id, sizeof (part_id), ".%i", part);
> +   return concat (dump_base_name, dump_id, part_id, dfi->suffix, NULL);
> +}
> +  else
> +return concat (dump_base_name, dump_id, dfi->suffix, NULL);
>  }
>  
>  /* Open a dump file called FILENAME.  Some filenames are special and
> @@ -592,17 +599,19 @@ dump_finish (int phase)
>  /* Begin a tree dump for PHASE. Stores any user supplied flag in
> *FLAG_PTR and returns a stream to write to. If the dump is not
> enabled, returns NULL.
> -   Multiple calls will reopen and append to the dump file.  */
> +   PART can be used for dump files which should be split to multiple
> +   parts. PART == -1 indicates dump file with no parts.
> +   If PART is -1, multiple calls will reopen and append to the dump file.  */
>  
>  FILE *
> -dump_begin (int phase, dump_flags_t *flag_ptr)
> +dump_begin (int phase, dump_flags_t *flag_ptr, int part)
>  {
> -  return g->get_dumps ()->dump_begin (phase, flag_ptr);
> +  return g->get_dumps ()->dump_begin (phase, flag_ptr, part);
>  }
>  
>  FILE *
>  gcc::dump_manager::
> -dump_begin (int phase, dump_flags_t *flag_ptr)
> +dump_begin (int phase, dump_flags_t *flag_ptr, int part)
>  {
>char *name;
>struct dump_file_info *dfi;
> @@ -611,12 +620,14 @@ dump_begin (int phase, dump_flags_t *fla
>if (phase == TDI_none || !dump_phase_enabled_p (phase))
>  return NULL;
>  
> -  name = get_dump_file_name (phase);
> +  name = get_dump_file_name (phase, part);
>if (!name)
>  return NULL;
>dfi = get_dump_file_info (phase);
>  
> -  stream = dump_open (name, dfi->pstate < 0);
> +  /* We do not support re-opening of dump files with parts.  This would
> + require tracking pstate per part of the dump file.  */
> +  stream = dump_open (name, part != -1 || dfi->pstate < 0);
>if (stream)
>  dfi->pstate = 1;
>free (name);
> Index: dumpfile.h
> ===
> --- dumpfile.h	(revision 261885)
> +++ dumpfile.h	(working copy)
> @@ -269,7 +269,7 @@ struct dump_file_info
>  };
>  
>  /* In dumpfile.c */
> -extern FILE *dump_begin (int, dump_flags_t *);
> +extern FILE *dump_begin (int, dump_flags_t *, int part=-1);
>  extern void dump_end (int, FILE *);
>  extern int opt_info_switch_p (const char *);
>  extern const char *dump_flag_name (int);
> @@ -343,10 +343,10 @@ public:
>/* Return the name of the dump file for the given phase.
>   If the dump is not enabled, returns NULL.  */
>char *
> -  get_dump_file_name (int phase) const;
> +  get_dump_file_name (int phase, int part = -1) const;
>  
>char *
> -  get_dump_file_name (struct dump_file_info *dfi) const;
> +  get_dump_file_name (struct dump_file_info *dfi, int part = -1) const;
>  
>int
>dump_switch_p (const char *arg);
> @@ -365,7 +365,7 @@ public:
>dump_finish (int phase);
>  
>

Re: [PATCH] libtool: Sort output of 'find' to enable deterministic builds.

2018-06-29 Thread Richard Biener
On Mon, Jun 25, 2018 at 1:39 PM Bernhard M. Wiedemann
 wrote:
>
> so that gcc builds in a reproducible way
> in spite of indeterministic filesystem readdir order
>
> See https://reproducible-builds.org/ for why this is good.
>
> While working on the reproducible builds effort, I found that
> when building the gcc8 package for openSUSE, there were differences
> between each build in resulting binaries like gccgo, cc1obj and cpp
> because the order of objects in libstdc++.a varied based on
> the order of entries returned by the filesystem.
>
> Two remaining issues are with timestamps in the ada build
> and with profiledbootstrap that only is reproducible if all inputs
> in the profiling run remain constant (and make -j breaks it too)
>
> Testcases:
>   none included because patch is trivial and it would need to compare builds
>   on 2 filesystems.
>
> Bootstrapping and testing:
>   tested successfully with gcc8 on x86_64

Looks ok to me.
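
The idea behind the fix can be sketched in C: whatever order a directory listing comes back in, sorting it before use gives the same sequence on every build. The helper names are hypothetical. Note that strcmp compares bytes, whereas the sort utility in the patch follows the current locale, so the two orders may differ outside the C locale.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* qsort comparator for an array of C strings, byte-wise order.  */
int cmp_names(const void *a, const void *b)
{
    const char *const *sa = a;
    const char *const *sb = b;
    return strcmp(*sa, *sb);
}

/* Put object file names into a stable order before they are handed
   to the archiver or linker.  */
void sort_names(const char **names, size_t n)
{
    assert(names != NULL || n == 0);
    if (n > 1)
        qsort(names, n, sizeof *names, cmp_names);
}
```

Two builds that see the same set of files in different readdir orders then feed them to the tool in identical order, which is what makes the resulting archive reproducible.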

Btw, running find to search for libtool.m4/ltmain.sh I find extra copies in

./libgo/config/ltmain.sh
./libgo/config/libtool.m4

which are nearly identical besides apparently patched-in Go support?

Can we consolidate those and/or do we need to patch those as well?

Thanks,
Richard.

> [gcc]
> 2018-06-19  Bernhard M. Wiedemann  
>
> libtool: Sort output of 'find' to enable deterministic builds.
>
> ---
> pulled in libtool commit 74c8993c178a1386ea5e2363a01d919738402f30
> because a full update appears to be too troublesome after 8+ years
> of divergence, but we still really want that fix.
>
> See also https://gcc.gnu.org/ml/gcc/2017-10/msg00060.html
> ---
>  libtool.m4 | 8 
>  ltmain.sh  | 4 ++--
>  2 files changed, 6 insertions(+), 6 deletions(-)
>
> diff --git a/libtool.m4 b/libtool.m4
> index 24d13f344..940faaa16 100644
> --- a/libtool.m4
> +++ b/libtool.m4
> @@ -6005,20 +6005,20 @@ if test "$_lt_caught_CXX_error" != yes; then
>   _LT_TAGVAR(prelink_cmds, $1)='tpldir=Template.dir~
> rm -rf $tpldir~
> $CC --prelink_objects --instantiation_dir $tpldir $objs 
> $libobjs $compile_deplibs~
> -   compile_command="$compile_command `find $tpldir -name \*.o | 
> $NL2SP`"'
> +   compile_command="$compile_command `find $tpldir -name \*.o | 
> sort | $NL2SP`"'
>   _LT_TAGVAR(old_archive_cmds, $1)='tpldir=Template.dir~
> rm -rf $tpldir~
> $CC --prelink_objects --instantiation_dir $tpldir 
> $oldobjs$old_deplibs~
> -   $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name 
> \*.o | $NL2SP`~
> +   $AR $AR_FLAGS $oldlib$oldobjs$old_deplibs `find $tpldir -name 
> \*.o | sort | $NL2SP`~
> $RANLIB $oldlib'
>   _LT_TAGVAR(archive_cmds, $1)='tpldir=Template.dir~
> rm -rf $tpldir~
> $CC --prelink_objects --instantiation_dir $tpldir 
> $predep_objects $libobjs $deplibs $convenience $postdep_objects~
> -   $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find 
> $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname 
> ${wl}$soname -o $lib'
> +   $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find 
> $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags 
> ${wl}-soname ${wl}$soname -o $lib'
>   _LT_TAGVAR(archive_expsym_cmds, $1)='tpldir=Template.dir~
> rm -rf $tpldir~
> $CC --prelink_objects --instantiation_dir $tpldir 
> $predep_objects $libobjs $deplibs $convenience $postdep_objects~
> -   $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find 
> $tpldir -name \*.o | $NL2SP` $postdep_objects $compiler_flags ${wl}-soname 
> ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o $lib'
> +   $CC -shared $pic_flag $predep_objects $libobjs $deplibs `find 
> $tpldir -name \*.o | sort | $NL2SP` $postdep_objects $compiler_flags 
> ${wl}-soname ${wl}$soname ${wl}-retain-symbols-file ${wl}$export_symbols -o 
> $lib'
>   ;;
> *) # Version 6 and above use weak symbols
>   _LT_TAGVAR(archive_cmds, $1)='$CC -shared $pic_flag 
> $predep_objects $libobjs $deplibs $postdep_objects $compiler_flags 
> ${wl}-soname ${wl}$soname -o $lib'
> diff --git a/ltmain.sh b/ltmain.sh
> index 9503ec85d..79f9ba89a 100644
> --- a/ltmain.sh
> +++ b/ltmain.sh
> @@ -2917,7 +2917,7 @@ func_extract_archives ()
> darwin_file=
> darwin_files=
> for darwin_file in $darwin_filelist; do
> - darwin_files=`find unfat-$$ -name $darwin_file -print | $NL2SP`
> + darwin_files=`find unfat-$$ -name $darwin_file -print | sort | 
> $NL2SP`
>   $LIPO -create -output "$darwin_file" $darwin_files
> done # $darwin_filelist
> $RM -rf unfat-$$
> @@ -2932,7 +2932,7 @@ func_extract_archives ()
>  func_extract_an_archive "$my_xdir" "$my_xabs"

Re: [PATCH] -fopt-info: add indentation via DUMP_VECT_SCOPE

2018-06-29 Thread Richard Biener
On Tue, Jun 26, 2018 at 5:43 PM David Malcolm  wrote:
>
> This patch adds a concept of nested "scopes" to dumpfile.c's dump_*_loc
> calls, and wires it up to the DUMP_VECT_SCOPE macro in tree-vectorizer.h,
> so that the nested structure is shown in -fopt-info by indentation.
>
> For example, this converts -fopt-info-all output from:
>
> test.c:8:3: note: === analyzing loop ===
> test.c:8:3: note: === analyze_loop_nest ===
> test.c:8:3: note: === vect_analyze_loop_form ===
> test.c:8:3: note: === get_loop_niters ===
> test.c:8:3: note: symbolic number of iterations is (unsigned int) n_9(D)
> test.c:8:3: note: not vectorized: loop contains function calls or data 
> references that cannot be analyzed
> test.c:8:3: note: vectorized 0 loops in function
>
> to:
>
> test.c:8:3: note: === analyzing loop ===
> test.c:8:3: note:  === analyze_loop_nest ===
> test.c:8:3: note:   === vect_analyze_loop_form ===
> test.c:8:3: note:    === get_loop_niters ===
> test.c:8:3: note:   symbolic number of iterations is (unsigned int) n_9(D)
> test.c:8:3: note:   not vectorized: loop contains function calls or data 
> references that cannot be analyzed
> test.c:8:3: note: vectorized 0 loops in function
>
> showing that the "symbolic number of iterations" message is within
> the "=== analyze_loop_nest ===" (and not within the
> "=== vect_analyze_loop_form ===").
>
> This is also enabling work for followups involving optimization records
> (allowing the records to directly capture the nested structure of the
> dump messages).
>
> Successfully bootstrapped & regrtested on x86_64-pc-linux-gnu.
>
> OK for trunk?

OK and sorry for the delay.
Richard.

> gcc/ChangeLog:
> * dumpfile.c (dump_loc): Add indentation based on scope depth.
> (dump_scope_depth): New variable.
> (get_dump_scope_depth): New function.
> (dump_begin_scope): New function.
> (dump_end_scope): New function.
> * dumpfile.h (get_dump_scope_depth): New declaration.
> (dump_begin_scope): New declaration.
> (dump_end_scope): New declaration.
> (class auto_dump_scope): New class.
> (AUTO_DUMP_SCOPE): New macro.
> * tree-vectorizer.h (DUMP_VECT_SCOPE): Reimplement in terms of
> AUTO_DUMP_SCOPE.
> ---
>  gcc/dumpfile.c| 35 +++
>  gcc/dumpfile.h| 39 +++
>  gcc/tree-vectorizer.h | 15 ---
>  3 files changed, 82 insertions(+), 7 deletions(-)
>
> diff --git a/gcc/dumpfile.c b/gcc/dumpfile.c
> index 122e420..190b52d 100644
> --- a/gcc/dumpfile.c
> +++ b/gcc/dumpfile.c
> @@ -419,6 +419,8 @@ dump_loc (dump_flags_t dump_kind, FILE *dfile, 
> source_location loc)
>   DECL_SOURCE_FILE (current_function_decl),
>   DECL_SOURCE_LINE (current_function_decl),
>   DECL_SOURCE_COLUMN (current_function_decl));
> +  /* Indentation based on scope depth.  */
> +  fprintf (dfile, "%*s", get_dump_scope_depth (), "");
>  }
>  }
>
> @@ -539,6 +541,39 @@ template void dump_dec (dump_flags_t, const poly_uint64 
> &);
>  template void dump_dec (dump_flags_t, const poly_offset_int &);
>  template void dump_dec (dump_flags_t, const poly_widest_int &);
>
> +/* The current dump scope-nesting depth.  */
> +
> +static int dump_scope_depth;
> +
> +/* Get the current dump scope-nesting depth.
> +   For use by dump_*_loc (for showing nesting via indentation).  */
> +
> +unsigned int
> +get_dump_scope_depth ()
> +{
> +  return dump_scope_depth;
> +}
> +
> +/* Push a nested dump scope.
> +   Print "=== NAME ===\n" to the dumpfile, if any, and to the -fopt-info
> +   destination, if any.
> +   Increment the scope depth.  */
> +
> +void
> +dump_begin_scope (const char *name, const dump_location_t &loc)
> +{
> +  dump_printf_loc (MSG_NOTE, loc, "=== %s ===\n", name);
> +  dump_scope_depth++;
> +}
> +
> +/* Pop a nested dump scope.  */
> +
> +void
> +dump_end_scope ()
> +{
> +  dump_scope_depth--;
> +}
> +
>  /* Start a dump for PHASE. Store user-supplied dump flags in
> *FLAG_PTR.  Return the number of streams opened.  Set globals
> DUMP_FILE, and ALT_DUMP_FILE to point to the opened streams, and
> diff --git a/gcc/dumpfile.h b/gcc/dumpfile.h
> index 90d8930..89d5c11 100644
> --- a/gcc/dumpfile.h
> +++ b/gcc/dumpfile.h
> @@ -456,6 +456,45 @@ dump_enabled_p (void)
>return (dump_file || alt_dump_file);
>  }
>
> +/* Managing nested scopes, so that dumps can express the call chain
> +   leading to a dump message.  */
> +
> +extern unsigned int get_dump_scope_depth ();
> +extern void dump_begin_scope (const char *name, const dump_location_t &loc);
> +extern void dump_end_scope ();
> +
> +/* Implementation detail of the AUTO_DUMP_SCOPE macro below.
> +
> +   A RAII-style class intended to make it easy to emit dump
> +   information about entering and exiting a collection of nested
> +   function calls.  */
> +
> +class auto_dump_scope
> +{
> + public:
> +  auto_dump_scope 
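
For readers following along, the RAII idea behind auto_dump_scope can be sketched outside of GCC roughly as follows. This is only an illustration under stated assumptions: scope_depth, dump_output, note, begin_scope, end_scope and auto_scope are hypothetical stand-ins, and the output goes to a std::string rather than to the real dump files.

```cpp
#include <string>

// Hypothetical stand-ins for the dumpfile.c state; not the GCC API.
static unsigned int scope_depth = 0;
static std::string dump_output;

// Emit one message, indented by the current scope depth
// (mirrors the "%*s" indentation added to dump_loc).
void note (const std::string &msg)
{
  dump_output += "note: " + std::string (scope_depth, ' ') + msg + "\n";
}

// Push a scope: print the banner at the current depth, then nest deeper.
void begin_scope (const std::string &name)
{
  note ("=== " + name + " ===");
  scope_depth++;
}

// Pop a scope.
void end_scope ()
{
  scope_depth--;
}

// RAII helper: the scope is popped automatically on every exit path.
class auto_scope
{
 public:
  explicit auto_scope (const std::string &name) { begin_scope (name); }
  ~auto_scope () { end_scope (); }
  auto_scope (const auto_scope &) = delete;
  auto_scope &operator= (const auto_scope &) = delete;
};

// Two nested "analysis" functions, each opening its own scope.
void analyze_inner ()
{
  auto_scope s ("inner");
  note ("message from inner");
}

void analyze_outer ()
{
  auto_scope s ("outer");
  analyze_inner ();
  note ("message from outer");
}
```

Calling analyze_outer () leaves the depth back at zero, and the message emitted after analyze_inner () returns is indented one level less than the one emitted inside it, which is the property the patch description points out.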

Re: [PATCH 3/3] Come up with new --completion option.

2018-06-29 Thread Martin Liška
Hi.

I would like to link the bash-completion pull request that adjusts how gcc
option completions are provided:
https://github.com/scop/bash-completion/pull/222

Martin


Re: [PATCH] Add experimental::sample and experimental::shuffle from N4531

2018-06-29 Thread Christophe Lyon
On Fri, 29 Jun 2018 at 09:21, Jonathan Wakely  wrote:
>
> On 29/06/18 08:55 +0200, Christophe Lyon wrote:
> >On Mon, 25 Jun 2018 at 18:23, Jonathan Wakely  wrote:
> >>
> >> The additions to <experimental/random> were added in 2015 but the new
> >> algorithms in <experimental/algorithm> were not. This adds them.
> >>
> >> * include/experimental/algorithm (sample, shuffle): Add new 
> >> overloads
> >> using per-thread random number engine.
> >> * testsuite/experimental/algorithm/sample.cc: Simplify and reduce
> >> dependencies by using __gnu_test::test_container.
> >> * testsuite/experimental/algorithm/sample-2.cc: New.
> >> * testsuite/experimental/algorithm/shuffle.cc: New.
> >>
> >> Tested x86_64-linux, committed to trunk.
> >>
> >> This would be safe to backport, but nobody has noticed the algos are
> >> missing or complained, so it doesn't seem very important to backport.
> >>
> >>
> >
> >Hi,
> >
> >On bare-metal targets (aarch64 and arm + newlib), I've noticed that
> >the two new tests fail:
> >PASS: experimental/algorithm/shuffle.cc (test for excess errors)
> >spawn 
> >/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
> >./shuffle.exe
> >terminate called after throwing an instance of 'std::runtime_error'
> >  what():  random_device::random_device(const std::string&)
> >
> >*** EXIT code 4242
> >FAIL: experimental/algorithm/shuffle.cc execution test
> >
> >PASS: experimental/algorithm/sample-2.cc (test for excess errors)
> >spawn 
> >/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
> >./sample-2.exe
> >terminate called after throwing an instance of 'std::runtime_error'
> >  what():  random_device::random_device(const std::string&)
> >
> >*** EXIT code 4242
> >FAIL: experimental/algorithm/sample-2.cc execution test
> >
> >Does this ring a bell?
>
> Does the existing testsuite/experimental/random/randint.cc file fail
> in the same way?
>

Yes it does.

And so do:
25_algorithms/make_heap/complexity.cc
23_containers/array/element_access/at_neg.cc
26_numerics/random/random_device/cons/default.cc
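
For what it's worth, the failure mode can be shown in isolation. The sketch below assumes only the standard <random> API; the helper name make_seed and the fixed fallback value are hypothetical, and this is not what the libstdc++ tests actually do.

```cpp
#include <exception>
#include <random>

// Try to obtain a seed from std::random_device; fall back to a fixed
// (hypothetical) value if the device cannot be constructed or read.
unsigned int make_seed (bool *used_fallback)
{
  try
    {
      std::random_device rd;
      *used_fallback = false;
      return rd ();
    }
  catch (const std::exception &)
    {
      // On targets without an entropy source the constructor may throw,
      // which is what the "terminate called" output above suggests.
      *used_fallback = true;
      return 12345u;
    }
}
```

A test that needs reproducibility more than entropy could seed its engine this way instead of letting the exception escape.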


Re: [PATCH][2/3] Share dataref and dependence analysis for multi-vector size vectorization

2018-06-29 Thread Richard Biener
On Thu, 28 Jun 2018, Christophe Lyon wrote:

> On Fri, 22 Jun 2018 at 12:52, Richard Biener  wrote:
> >
> >
> > This is the main part to make considering multiple vector sizes based on
> > costs less compile-time costly.  It shares dataref analysis and
> > dependence analysis for loop vectorization (BB vectorization is only
> > adjusted to comply with the new APIs so far).
> >
> > Sharing means that DRs (and of course DDRs) may not be modified during
> > vectorization analysis which means splitting out dataref_aux from
> > dangling from dr->aux to separate copies in the DR_STMTs vinfo.
> > I've put measures in place that assure that DRs are not modified
> > (because they were) but refrained from doing the same for DDRs
> > (because I didn't run into any issues).
> >
> > The sharing then is accomplished by moving the dataref and
> > ddr array as well as the dependent loop-nest array into a
> > separate structure with bigger lifetime than vinfo and appropriately
> > link to it from there.
> >
> > Bootstrapped on x86_64-unknown-linux-gnu (together with [1/3]),
> > testing in progress.
> >
> 
> Hi Richard,
> 
> Since you committed this patch (r262009) I've noticed a regression on
> aarch64 and arm,
> and I think it's also present on x86 according to gcc-testresults:
> FAIL:gcc.dg/params/blocksort-part.c -O3 --param
> loop-max-datarefs-for-datadeps=0 (test for excess errors)
> FAIL:gcc.dg/params/blocksort-part.c -O3 --param
> loop-max-datarefs-for-datadeps=0 (internal compiler error)
> 
> gcc.log says:
> 
> during GIMPLE pass: vect
> /gcc/testsuite/gcc.dg/params/blocksort-part.c: In function 'fallbackQSort3':
> /gcc/testsuite/gcc.dg/params/blocksort-part.c:116:6: internal compiler
> error: Segmentation fault
> 0xbef215 crash_signal
> /gcc/toplev.c:324
> 0x14715cc gimple_uid
> /gcc/tree-vectorizer.h:1043
> 0x14715cc vinfo_for_stmt
> /gcc/tree-vectorizer.h:1043
> 0x14715cc vect_dr_stmt
> /gcc/tree-vectorizer.h:1331
> 0x14715cc vect_analyze_data_ref_dependence
> /gcc/tree-vect-data-refs.c:297
> 0x14715cc vect_analyze_data_ref_dependences(_loop_vec_info*, unsigned int*)
> /gcc/tree-vect-data-refs.c:593
> 0xecabb8 vect_analyze_loop_2
> /gcc/tree-vect-loop.c:1910
> 0xecc70c vect_analyze_loop(loop*, _loop_vec_info*, vec_info_shared*)
> /gcc/tree-vect-loop.c:2337
> 0xeec188 try_vectorize_loop_1
> /gcc/tree-vectorizer.c:705
> 0xeed512 vectorize_loops()
> /gcc/tree-vectorizer.c:918

I am testing the following.

Richard.

2018-06-29  Richard Biener  

* tree-vect-data-refs.c (vect_analyze_data_ref_dependences): Assert
compute_all_dependences succeeds.
* tree-vect-loop.c (vect_get_datarefs_in_loop): Fail early if we
exceed --param loop-max-datarefs-for-datadeps.

diff --git a/gcc/tree-vect-data-refs.c b/gcc/tree-vect-data-refs.c
index 9f848fefd1e..63429a34bf2 100644
--- a/gcc/tree-vect-data-refs.c
+++ b/gcc/tree-vect-data-refs.c
@@ -575,10 +575,11 @@ vect_analyze_data_ref_dependences (loop_vec_info 
loop_vinfo,
 * LOOP_VINFO_DATAREFS (loop_vinfo).length ());
   /* We need read-read dependences to compute
 STMT_VINFO_SAME_ALIGN_REFS.  */
-  if (!compute_all_dependences (LOOP_VINFO_DATAREFS (loop_vinfo),
-   &LOOP_VINFO_DDRS (loop_vinfo),
-   LOOP_VINFO_LOOP_NEST (loop_vinfo), true))
-   return false;
+  bool res = compute_all_dependences (LOOP_VINFO_DATAREFS (loop_vinfo),
+ &LOOP_VINFO_DDRS (loop_vinfo),
+ LOOP_VINFO_LOOP_NEST (loop_vinfo),
+ true);
+  gcc_assert (res);
 }
 
   LOOP_VINFO_NO_DATA_DEPENDENCIES (loop_vinfo) = true;
diff --git a/gcc/tree-vect-loop.c b/gcc/tree-vect-loop.c
index cd19fe6fbab..8d6ee648b35 100644
--- a/gcc/tree-vect-loop.c
+++ b/gcc/tree-vect-loop.c
@@ -1808,6 +1808,11 @@ vect_get_datarefs_in_loop (loop_p loop, basic_block *bbs,
  }
return false;
  }
+   /* If dependence analysis will give up due to the limit on the
+  number of datarefs stop here and fail fatally.  */
+   if (datarefs->length ()
+   > (unsigned)PARAM_VALUE (PARAM_LOOP_MAX_DATAREFS_FOR_DATADEPS))
+ return false;
   }
   return true;
 }
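
The shape of this change, rejecting the loop while the datarefs are being collected so that the later dependence computation can no longer trip over the limit, can be sketched generically. The names collect_refs, compute_dependences, analyze and max_refs are hypothetical, and the "pairs" count is just a placeholder result; this is not the vectorizer code.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical limit, standing in for --param loop-max-datarefs-for-datadeps.
static const std::size_t max_refs = 4;

// Gather references; give up as soon as the limit is exceeded.
bool collect_refs (const std::vector<int> &input, std::vector<int> *refs)
{
  for (int r : input)
    {
      refs->push_back (r);
      if (refs->size () > max_refs)
        return false;
    }
  return true;
}

// Pretend dependence computation; refuses to run past the limit.
bool compute_dependences (const std::vector<int> &refs, std::size_t *pairs)
{
  if (refs.size () > max_refs)
    return false;
  *pairs = refs.size () * refs.size ();
  return true;
}

// Because collection already enforced the limit, a failure in the
// dependence step would indicate a bug, so it is asserted instead.
bool analyze (const std::vector<int> &input, std::size_t *pairs)
{
  std::vector<int> refs;
  if (!collect_refs (input, &refs))
    return false;
  bool res = compute_dependences (refs, pairs);
  assert (res);
  (void) res;
  return true;
}
```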



Re: [patch, fortran] Handling of .and. and .or. expressions

2018-06-29 Thread Jakub Jelinek
On Thu, Jun 28, 2018 at 07:36:56PM -0700, Steve Kargl wrote:
> === gfortran Summary ===
> 
> # of expected passes            47558
> # of unexpected failures        6
> # of expected failures          104
> # of unsupported tests          85
> 
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O0  execution test
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O1  execution test
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O2  execution test
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O3 -fomit-frame-pointer 
> -funroll-loops -fpeel-loops -ftracer -finline-functions  execution test
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -O3 -g  execution test
> FAIL: gfortran.dg/actual_pointer_function_1.f90   -Os  execution test
> 
> Execution timeout is: 300
> spawn [open ...]
> 
> Program received signal SIGSEGV: Segmentation fault - invalid memory 
> reference.
> 
> Backtrace for this error:
> #0  0x71a2 in ???
> #1  0x400c09 in ???
> #2  0x400b91 in ???
> #3  0x400c51 in ???
> #4  0x400854 in _start
> at /usr/src/lib/csu/amd64/crt1.c:74
> #5  0x200627fff in ???

If you have a test that is broken by the TRUTH_ANDIF_EXPR -> TRUTH_AND_EXPR
change, then the test itself must be broken.  From the snippets that were
posted, Fortran (unlike C/C++) does not require that the second operand is
left unevaluated if the first one evaluates to false (for and) or true (for
or); it just allows it.

So, the optimizing away of the function calls should be an optimization, and
as such should be done only when optimizing.  So for -O0 at least always use
TRUTH_{AND,OR}_EXPR, so that people can actually make sure that their
programs are valid Fortran and can also step into those functions when
debugging.  For -O1 and higher perhaps temporarily use the *IF_EXPR, or
better, as I said in another mail, let's add an attribute that will optimize
all the calls that can be optimized, not just one special case.

Jakub
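
To make the distinction concrete, here is a small C++ sketch (not Fortran, and not GCC internals) contrasting short-circuit evaluation, which corresponds to TRUTH_ANDIF_EXPR, with unconditional evaluation of both operands, which corresponds to TRUTH_AND_EXPR. The names calls, check, and_short_circuit and and_both are made up for illustration.

```cpp
// Counts how many times check () has run.
static int calls = 0;

// A function with a side effect, standing in for a Fortran function
// referenced inside a .and. expression.
bool check (bool value)
{
  ++calls;
  return value;
}

// Like TRUTH_ANDIF_EXPR: the second operand is skipped when the
// first one is false.
bool and_short_circuit (bool a, bool b)
{
  return check (a) && check (b);
}

// Like TRUTH_AND_EXPR: both operands are always evaluated.
bool and_both (bool a, bool b)
{
  bool lhs = check (a);
  bool rhs = check (b);
  return lhs & rhs;
}
```

Both functions return the same value for the same inputs; only the number of calls to check () differs, so a program whose behaviour depends on that count is relying on something Fortran leaves up to the processor.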


Re: [PATCH] Add experimental::sample and experimental::shuffle from N4531

2018-06-29 Thread Jonathan Wakely

On 29/06/18 08:55 +0200, Christophe Lyon wrote:
>On Mon, 25 Jun 2018 at 18:23, Jonathan Wakely  wrote:
>>
>> The additions to <experimental/random> were added in 2015 but the new
>> algorithms in <experimental/algorithm> were not. This adds them.
>>
>> * include/experimental/algorithm (sample, shuffle): Add new overloads
>> using per-thread random number engine.
>> * testsuite/experimental/algorithm/sample.cc: Simplify and reduce
>> dependencies by using __gnu_test::test_container.
>> * testsuite/experimental/algorithm/sample-2.cc: New.
>> * testsuite/experimental/algorithm/shuffle.cc: New.
>>
>> Tested x86_64-linux, committed to trunk.
>>
>> This would be safe to backport, but nobody has noticed the algos are
>> missing or complained, so it doesn't seem very important to backport.
>>
>
>Hi,
>
>On bare-metal targets (aarch64 and arm + newlib), I've noticed that
>the two new tests fail:
>PASS: experimental/algorithm/shuffle.cc (test for excess errors)
>spawn 
>/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
>./shuffle.exe
>terminate called after throwing an instance of 'std::runtime_error'
>  what():  random_device::random_device(const std::string&)
>
>*** EXIT code 4242
>FAIL: experimental/algorithm/shuffle.cc execution test
>
>PASS: experimental/algorithm/sample-2.cc (test for excess errors)
>spawn 
>/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
>./sample-2.exe
>terminate called after throwing an instance of 'std::runtime_error'
>  what():  random_device::random_device(const std::string&)
>
>*** EXIT code 4242
>FAIL: experimental/algorithm/sample-2.cc execution test
>
>Does this ring a bell?

Does the existing testsuite/experimental/random/randint.cc file fail
in the same way?




Re: [PATCH] Fix bit-test expansion for single cluster (PR tree-optimization/86263).

2018-06-29 Thread Richard Biener
On Thu, Jun 28, 2018 at 9:06 PM Martin Liška  wrote:
>
> Hi.
>
> I'm sending a patch for the situation where we create a bit-test for the
> entire switch. In that case the split BB must be redirected so that
> the original switch becomes dead code.
>
> The patch bootstraps on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Testcase?

OK with one added.

Thanks,
Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2018-06-28  Martin Liska  
>
> PR tree-optimization/86263
> * tree-switch-conversion.c 
> (switch_decision_tree::try_switch_expansion):
> Make edge redirection.
> ---
>  gcc/tree-switch-conversion.c | 8 ++--
>  1 file changed, 6 insertions(+), 2 deletions(-)
>
>


Re: [committed] Introduce dump_location_t

2018-06-29 Thread Richard Biener
On Thu, Jun 28, 2018 at 4:29 PM David Malcolm  wrote:
>
> On Thu, 2018-06-28 at 13:29 +0200, Richard Biener wrote:
> > On Tue, Jun 26, 2018 at 3:54 PM David Malcolm 
> > wrote:
> > >
> > > On Mon, 2018-06-25 at 15:34 +0200, Richard Biener wrote:
> > > > > On Wed, Jun 20, 2018 at 6:34 PM David Malcolm
> > > > > wrote:
> > > > >
> > > > > Here's v3 of the patch (one big patch this time, rather than a
> > > > > kit).
> > > > >
> > > > > Like the v2 patch kit, this patch reuses the existing dump API,
> > > > > rather than inventing its own.
> > > > >
> > > > > Specifically, it uses the dump_* functions in dumpfile.h that
> > > > > don't
> > > > > take a FILE *, the ones that implicitly write to dump_file
> > > > > and/or
> > > > > alt_dump_file.  I needed a name for them, so I've taken to
> > > > > calling
> > > > > them the "structured dump API" (better name ideas welcome).
> > > > >
> > > > > v3 eliminates v2's optinfo_guard class, instead using
> > > > > "dump_*_loc"
> > > > > calls as delimiters when consolidating "dump_*" calls.  There's
> > > > > a
> > > > > new dump_context class which has responsibility for
> > > > > consolidating
> > > > > them into optimization records.
> > > > >
> > > > > The dump_*_loc calls now capture more than just a location_t:
> > > > > they
> > > > > capture the profile_count and the location in GCC's own sources
> > > > > where
> > > > > the dump is being emitted from.
> > > > >
> > > > > This works by introducing a new "dump_location_t" class as the
> > > > > argument of those dump_*_loc calls.  The dump_location_t can
> > > > > be constructed from a gimple * or from an rtx_insn *, so that
> > > > > rather than writing:
> > > > >
> > > > >   dump_printf_loc (MSG_NOTE, gimple_location (stmt),
> > > > >"some message: %i", 42);
> > > > >
> > > > > you can write:
> > > > >
> > > > >   dump_printf_loc (MSG_NOTE, stmt,
> > > > >"some message: %i", 42);
> > > > >
> > > > > and the dump_location_t constructor will grab the location_t
> > > > > and
> > > > > profile_count of stmt, and the location of the
> > > > > "dump_printf_loc"
> > > > > callsite (and gracefully handle "stmt" being NULL).
> > > > >
> > > > > Earlier versions of the patch captured the location of the
> > > > > dump_*_loc call via preprocessor hacks, or didn't work
> > > > > properly;
> > > > > this version of the patch works more cleanly: internally,
> > > > > dump_location_t is split into two new classes:
> > > > >   * dump_user_location_t: the location_t and profile_count
> > > > > within
> > > > > the *user's code*, and
> > > > >   * dump_impl_location_t: the __builtin_FILE/LINE/FUNCTION
> > > > > within
> > > > > the *implementation* code (i.e. GCC or a plugin), captured
> > > > > "automagically" via default params
> > > > >
> > > > > These classes are sometimes used elsewhere in the code.  For
> > > > > example, "vect_location" becomes a dump_user_location_t
> > > > > (location_t and profile_count), so that in e.g:
> > > > >
> > > > >   vect_location = find_loop_location (loop);
> > > > >
> > > > > it's capturing the location_t and profile_count, and then when
> > > > > it's used here:
> > > > >
> > > > >   dump_printf_loc (MSG_NOTE, vect_location, "foo");
> > > > >
> > > > > the dump_location_t is constructed from the vect_location
> > > > > plus the dump_impl_location_t at that callsite.
> > > > >
> > > > > In contrast, loop-unroll.c's report_unroll's "locus" param
> > > > > becomes a dump_location_t: we're interested in where it was
> > > > > called from, not in the locations of the various dump_*_loc
> > > > > calls
> > > > > within it.
> > > > >
> > > > > Previous versions of the patch captured a gimple *, and needed
> > > > > GTY markers; in this patch, the dump_user_location_t is now
> > > > > just a
> > > > > location_t and a profile_count.
> > > > >
> > > > > The v2 patch added an overload for dump_printf_loc so that you
> > > > > could pass in either a location_t, or the new type; this
> > > > > version
> > > > > of the patch eliminates that: they all now take
> > > > > dump_location_t.
> > > > >
> > > > > Doing so required adding support for rtx_insn *, so that one
> > > > > can
> > > > > write this kind of thing in RTL passes:
> > > > >
> > > > >   dump_printf_loc (MSG_NOTE, insn, "foo");
> > > > >
> > > > > One knock-on effect is that get_loop_location now returns a
> > > > > dump_user_location_t rather than a location_t, so that it has
> > > > > hotness information.
> > > > >
> > > > > Richi: would you like me to split out this location-handling
> > > > > code into a separate patch?  (It's kind of redundant without
> > > > > adding the remarks and optimization records work, but if that's
> > > > > easier I can do it)
> > > >
> > > > I think that would be easier because it doesn't require the JSON
> > > > stuff and so I'll happily approve it.
> > > >
> > > > Thus - trying to review those bits (and sorry for the delay).
> > > >
> > > > +  
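
The "captured automagically via default params" idea quoted above can be sketched as below. This assumes a compiler that provides __builtin_FILE, __builtin_LINE and __builtin_FUNCTION (GCC does); impl_location, log_here and caller_a are hypothetical names, not the classes from the patch.

```cpp
#include <string>

// Records where in the implementation a call was made from.
struct impl_location
{
  // Default arguments are evaluated at the call site, so the builtins
  // describe the caller rather than this constructor.
  impl_location (const char *file = __builtin_FILE (),
                 int line = __builtin_LINE (),
                 const char *function = __builtin_FUNCTION ())
    : m_file (file), m_line (line), m_function (function)
  {
  }

  const char *m_file;
  int m_line;
  const char *m_function;
};

// The caller passes nothing for LOC; it is filled in by the default
// argument, in the same spirit as dump_location_t's constructors.
std::string log_here (const std::string &msg,
                      const impl_location &loc = impl_location ())
{
  return std::string (loc.m_function) + ":" + std::to_string (loc.m_line)
         + ": " + msg;
}

// Example call site: no location is spelled out by the caller.
std::string caller_a ()
{
  return log_here ("hello");
}
```

The design choice here is that no macro is needed at the call site: the location travels as an ordinary (defaulted) parameter, which is why the quoted text contrasts it with earlier "preprocessor hacks".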

Re: [PATCH] Add experimental::sample and experimental::shuffle from N4531

2018-06-29 Thread Christophe Lyon
On Mon, 25 Jun 2018 at 18:23, Jonathan Wakely  wrote:
>
> The additions to <experimental/random> were added in 2015 but the new
> algorithms in <experimental/algorithm> were not. This adds them.
>
> * include/experimental/algorithm (sample, shuffle): Add new overloads
> using per-thread random number engine.
> * testsuite/experimental/algorithm/sample.cc: Simplify and reduce
> dependencies by using __gnu_test::test_container.
> * testsuite/experimental/algorithm/sample-2.cc: New.
> * testsuite/experimental/algorithm/shuffle.cc: New.
>
> Tested x86_64-linux, committed to trunk.
>
> This would be safe to backport, but nobody has noticed the algos are
> missing or complained, so it doesn't seem very important to backport.
>
>

Hi,

On bare-metal targets (aarch64 and arm + newlib), I've noticed that
the two new tests fail:
PASS: experimental/algorithm/shuffle.cc (test for excess errors)
spawn 
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
./shuffle.exe
terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device::random_device(const std::string&)

*** EXIT code 4242
FAIL: experimental/algorithm/shuffle.cc execution test

PASS: experimental/algorithm/sample-2.cc (test for excess errors)
spawn 
/aci-gcc-fsf/builds/gcc-fsf-gccsrc/obj-arm-none-eabi/gcc3/utils/bin/qemu-wrapper.sh
./sample-2.exe
terminate called after throwing an instance of 'std::runtime_error'
  what():  random_device::random_device(const std::string&)

*** EXIT code 4242
FAIL: experimental/algorithm/sample-2.cc execution test

Does this ring a bell?

Thanks,

Christophe