[PATCH 1/2] Add a pass to automatically add ptwrite instrumentation

2019-11-10 Thread Andi Kleen
From: Andi Kleen 

[v4: Rebased on current tree. Avoid some redundant log statements
for locals and a few other fixes.  Fix some comments. Improve
documentation. Did some studies on the debug information quality,
see below]

Add a new pass to automatically instrument changes to variables
with the new PTWRITE instruction on x86. PTWRITE writes a 4- or 8-byte
field into the Processor Trace log, which allows low-overhead
logging of information. Essentially it's a hardware-accelerated
printf.

This makes it possible to reconstruct values later from the log,
which can be useful for debugging or other analysis of the program
behavior. With the compiler support this can be done without
having to manually add instrumentation to the code.

Using DWARF information, the logged values can later be mapped back
to variables. The decoder decodes the PTWRITE instructions using the
IP information in the log, and then looks up the argument in the
debug information. This can then be used to reconstruct the original
variable name and display a value history for the variable.

There are new options to enable instrumentation for different types,
and also a new attribute to control instrumentation at a fine-grained
per-function or per-variable level. The attribute can be set at both
the variable and the type level, and also on structure fields.
This allows enabling tracing only for specific code in large
programs in a flexible manner.

The pass is generic, but only the x86 backend enables the necessary
hooks. When the backend enables them (with -mptwrite),
an additional pass looks through the code for
functions or variables with the vartrace attribute enabled.

Earlier there were concerns that the debug information might not
always be associated with the ptwrite instruction because the
backend doesn't know how to keep them together.

I did some experiments using -fdump-rtl-final, just checking if the
PTWRITE builtin has a variable location on "loop-unroll.c" from
the gcc source.

With -fvartrace=args there is good coverage; the ptwrite always had a
usable variable name.

With -fvartrace=returns there is usually no variable name, but in this
case the decoder can figure it out by looking for the RET, and knowing
that the value is in %rax.

With -fvartrace=reads,writes the ptwrite usually just has the variable
name of the gimple temporary in a register.
However there is nearly always an RTL set for the address just before it,
and the set tends to have the expected name/type and offset (if for a
structure). There are two ways to handle this: either teach the
decoder to track debug info for registers, or
change the ptwrite define_insn to avoid splitting the effective
address into a separate register. I tried the latter with some different
constraints, but wasn't successful. I hope there's some way to do
this though.

With -fvartrace=locals there's a mix. Most accesses have the correct
variable name, but sometimes it is lost. I believe that's acceptable;
locals is more an experimental option and may not be too useful anyway
because it generates a lot of traffic.

Currently the code can be tested with SDE, or on an Intel
Gemini Lake system with a new enough Linux kernel (v4.10+)
that supports PTWRITE for PT. Gemini Lake is used in low-end
laptops ("Intel Pentium Silver J5.. / Celeron N4... /
Celeron J4...").

Linux perf (4.10+) can be used to record the values:

perf record -e intel_pt/ptw=1,pt=1,branch=0,fup_on_ptw=1/u ./program
perf script -F +srcline ..

I have an experimental version of perf that can also use
DWARF information to symbolize many values back to their variable
names. So far it is not in standard perf, but available at

https://git.kernel.org/pub/scm/linux/kernel/git/ak/linux-misc.git/log/?h=perf/var-resolve-5

It is currently not able to decode all variable locations to names,
but a large subset.

The CPU can potentially generate very high data bandwidth when
code doing a lot of computation is heavily instrumented.
This can cause some data loss both in the CPU and in perf
logging the data when the disk cannot keep up.

Running some larger workloads, most do not cause
CPU-level overflows, but I've seen them with -fvartrace
on crafty, and with more workloads with -fvartrace=locals.

The recommendation is to not fully instrument programs,
but only areas of interest, either at the file level or using
the attributes.

perf and the disk often cannot keep up
with the data bandwidth for longer computations. In this case
it's possible to use perf snapshot mode (add --snapshot
to the command line above). The data will then only be logged to
a memory ring buffer, and the buffers are only dumped on events
of interest by sending SIGUSR2 to the perf binary.

In the future this will hopefully be better supported with
core files and gdb.

Passes bootstrap and test suite on x86_64-linux, also
bootstrapped and tested gcc itself with full -fvartrace
and -fvartrace=all instrumentation, and running
the test suite with 

[PATCH, rs6000] Refactor FP vector comparison operators

2019-11-10 Thread Kewen.Lin
Hi,

This is a subsequent patch to refactor the existing floating point
vector comparison operator support.  The patch to fix PR92132
supplemented vector floating point comparison by exposing the names
for unordered/ordered/uneq/ltgt and adding ungt/unge/unlt/unle/
ne.  As Segher pointed out, some patterns can be refactored
together.  The main link on this is: 
https://gcc.gnu.org/ml/gcc-patches/2019-11/msg00452.html


The refactoring mainly follows the below patterns:

pattern 1:
  lt(a,b) = gt(b,a)
  le(a,b) = ge(b,a)

pattern 2:
  unge(a,b) = ~gt(b,a)
  unle(a,b) = ~gt(a,b)
  ne(a,b)   = ~eq(a,b)
  ungt(a,b) = ~ge(b,a)
  unlt(a,b) = ~ge(a,b)

pattern 3:
  ltgt: gt(a,b) | gt(b,a)
  ordered: ge(a,b) | ge(b,a)

pattern 4:
  uneq: ~gt(a,b) & ~gt(b,a)
  unordered: ~ge(a,b) & ~ge(b,a)

Naming the code iterators and attributes is really knotty for me :(.

Regression testing just launched.

BR,
Kewen

---
gcc/ChangeLog

2019-11-11 Kewen Lin  

* config/rs6000/vector.md (vec_fp_cmp1): New code iterator.
(vec_fp_cmp2): Likewise.
(vec_fp_cmp3): Likewise.
(vec_fp_cmp4): Likewise.
(vec_fp_cmp1_attr): New code attribute.
(vec_fp_cmp2_attr): Likewise.
(vec_fp_cmp3_attr): Likewise.
(vec_fp_cmp4_attr): Likewise.
(vector_ for VEC_F and vec_fp_cmp1): New
define_and_split.
(vector_ for VEC_F and vec_fp_cmp2): Likewise.
(vector_ for VEC_F and vec_fp_cmp3): Likewise.
(vector_ for VEC_F and vec_fp_cmp4): Likewise.
(vector_lt for VEC_F): Refactor with vec_fp_cmp1.
(vector_le for VEC_F): Likewise.
(vector_unge for VEC_F): Refactor with vec_fp_cmp2.
(vector_unle for VEC_F): Likewise.
(vector_ne for VEC_F): Likewise.
(vector_ungt for VEC_F): Likewise.
(vector_unlt for VEC_F): Likewise.
(vector_ltgt for VEC_F): Refactor with vec_fp_cmp3.
(vector_ordered for VEC_F): Likewise.
(vector_uneq for VEC_F): Refactor with vec_fp_cmp4.
(vector_unordered for VEC_F): Likewise.
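For reference, the pattern-1 expander refactored with the new iterator might look roughly like this (a sketch based on the iterator names above, not the exact committed text):

```
;; lt(a,b) = gt(b,a); le(a,b) = ge(b,a)
(define_expand "vector_<code><mode>"
  [(set (match_operand:VEC_F 0 "vfloat_operand")
        (vec_fp_cmp1:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
                           (match_operand:VEC_F 2 "vfloat_operand")))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (<MODE>mode)"
{
  emit_insn (gen_vector_<vec_fp_cmp1_attr><mode> (operands[0], operands[2],
                                                  operands[1]));
  DONE;
})
```

The code attribute supplies the reversed comparison to emit, and the operand swap implements the argument reversal from pattern 1.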
diff --git a/gcc/config/rs6000/vector.md b/gcc/config/rs6000/vector.md
index b132037..be2d425 100644
--- a/gcc/config/rs6000/vector.md
+++ b/gcc/config/rs6000/vector.md
@@ -107,6 +107,31 @@
 (smin "smin")
 (smax "smax")])
 
+;; code iterators and attributes for vector FP comparison operators:
+
+;; 1. lt and le.
+(define_code_iterator vec_fp_cmp1 [lt le])
+(define_code_attr vec_fp_cmp1_attr [(lt "gt")
+   (le "ge")])
+
+; 2. unge, unle, ne, ungt and unlt.
+(define_code_iterator vec_fp_cmp2 [unge unle ne ungt unlt])
+(define_code_attr vec_fp_cmp2_attr [(unge "gt")
+   (unle "gt")
+   (ne   "eq")
+   (ungt "ge")
+   (unlt "ge")])
+
+;; 3. ltgt and ordered.
+(define_code_iterator vec_fp_cmp3 [ltgt ordered])
+(define_code_attr vec_fp_cmp3_attr [(ltgt "gt")
+   (ordered "ge")])
+
+;; 4. uneq and unordered.
+(define_code_iterator vec_fp_cmp4 [uneq unordered])
+(define_code_attr vec_fp_cmp4_attr [(uneq "gt")
+   (unordered "ge")])
+
 
 ;; Vector move instructions.  Little-endian VSX loads and stores require
 ;; special handling to circumvent "element endianness."
@@ -665,88 +690,6 @@
   DONE;
 })
 
-; lt(a,b) = gt(b,a)
-(define_expand "vector_lt"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (lt:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
- (match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  emit_insn (gen_vector_gt (operands[0], operands[2], operands[1]));
-  DONE;
-})
-
-; le(a,b) = ge(b,a)
-(define_expand "vector_le"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (le:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
- (match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  emit_insn (gen_vector_ge (operands[0], operands[2], operands[1]));
-  DONE;
-})
-
-; ne(a,b) = ~eq(a,b)
-(define_expand "vector_ne"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (ne:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
- (match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  emit_insn (gen_vector_eq (operands[0], operands[1], operands[2]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
-  DONE;
-})
-
-; unge(a,b) = ~gt(b,a)
-(define_expand "vector_unge"
-  [(set (match_operand:VEC_F 0 "vfloat_operand")
-   (unge:VEC_F (match_operand:VEC_F 1 "vfloat_operand")
-   (match_operand:VEC_F 2 "vfloat_operand")))]
-  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
-{
-  emit_insn (gen_vector_gt (operands[0], operands[2], operands[1]));
-  emit_insn (gen_one_cmpl2 (operands[0], operands[0]));
-  

[PATCH 2/2] Add tests for the vartrace pass

2019-11-10 Thread Andi Kleen
From: Andi Kleen 

So far they are mostly i386 target specific. Later they could
be moved up out of the architecture-specific directories if some other
architecture adds vartracing. This would need abstracting the scanning
for the trace function.

gcc/testsuite/:

2019-11-10  Andi Kleen  

* g++.dg/vartrace-3.C: New test.
* g++.dg/vartrace-ret.C: New test.
* g++.dg/vartrace-ret2.C: New test.
* gcc.target/i386/vartrace-1.c: New test.
* gcc.target/i386/vartrace-10.c: New test.
* gcc.target/i386/vartrace-11.c: New test.
* gcc.target/i386/vartrace-12.c: New test.
* gcc.target/i386/vartrace-13.c: New test.
* gcc.target/i386/vartrace-14.c: New test.
* gcc.target/i386/vartrace-15.c: New test.
* gcc.target/i386/vartrace-16.c: New test.
* gcc.target/i386/vartrace-17.c: New test.
* gcc.target/i386/vartrace-18.c: New test.
* gcc.target/i386/vartrace-19.c: New test.
* gcc.target/i386/vartrace-20.c: New test.
* gcc.target/i386/vartrace-21.c: New test.
* gcc.target/i386/vartrace-22.c: New test.
* gcc.target/i386/vartrace-23.c: New test.
* gcc.target/i386/vartrace-2.c: New test.
* gcc.target/i386/vartrace-3.c: New test.
* gcc.target/i386/vartrace-4.c: New test.
* gcc.target/i386/vartrace-5.c: New test.
* gcc.target/i386/vartrace-6.c: New test.
* gcc.target/i386/vartrace-7.c: New test.
* gcc.target/i386/vartrace-8.c: New test.
* gcc.target/i386/vartrace-9.c: New test.
---
 gcc/testsuite/g++.dg/vartrace-3.C   | 14 
 gcc/testsuite/g++.dg/vartrace-ret.C | 17 +
 gcc/testsuite/g++.dg/vartrace-ret2.C| 24 +++
 gcc/testsuite/gcc.target/i386/vartrace-1.c  | 41 +++
 gcc/testsuite/gcc.target/i386/vartrace-10.c | 13 
 gcc/testsuite/gcc.target/i386/vartrace-11.c | 16 +
 gcc/testsuite/gcc.target/i386/vartrace-12.c | 16 +
 gcc/testsuite/gcc.target/i386/vartrace-13.c | 18 +
 gcc/testsuite/gcc.target/i386/vartrace-14.c | 17 +
 gcc/testsuite/gcc.target/i386/vartrace-15.c | 12 
 gcc/testsuite/gcc.target/i386/vartrace-16.c | 12 
 gcc/testsuite/gcc.target/i386/vartrace-17.c | 23 ++
 gcc/testsuite/gcc.target/i386/vartrace-18.c | 21 ++
 gcc/testsuite/gcc.target/i386/vartrace-19.c | 24 +++
 gcc/testsuite/gcc.target/i386/vartrace-2.c  |  9 +++
 gcc/testsuite/gcc.target/i386/vartrace-20.c | 32 +
 gcc/testsuite/gcc.target/i386/vartrace-21.c | 79 +
 gcc/testsuite/gcc.target/i386/vartrace-22.c | 17 +
 gcc/testsuite/gcc.target/i386/vartrace-23.c | 38 ++
 gcc/testsuite/gcc.target/i386/vartrace-3.c  |  9 +++
 gcc/testsuite/gcc.target/i386/vartrace-4.c  | 13 
 gcc/testsuite/gcc.target/i386/vartrace-5.c  | 11 +++
 gcc/testsuite/gcc.target/i386/vartrace-6.c  | 13 
 gcc/testsuite/gcc.target/i386/vartrace-7.c  | 11 +++
 gcc/testsuite/gcc.target/i386/vartrace-8.c  | 11 +++
 gcc/testsuite/gcc.target/i386/vartrace-9.c  | 10 +++
 gcc/testsuite/gcc.target/vartrace-19.c  | 23 ++
 27 files changed, 544 insertions(+)
 create mode 100644 gcc/testsuite/g++.dg/vartrace-3.C
 create mode 100644 gcc/testsuite/g++.dg/vartrace-ret.C
 create mode 100644 gcc/testsuite/g++.dg/vartrace-ret2.C
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-1.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-10.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-11.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-12.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-13.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-14.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-15.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-16.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-17.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-18.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-19.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-2.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-20.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-21.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-22.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-23.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-3.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-4.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-5.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-6.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-7.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-8.c
 create mode 100644 gcc/testsuite/gcc.target/i386/vartrace-9.c
 create mode 100644 gcc/testsuite/gcc.target/vartrace-19.c

diff --git a/gcc/testsuite/g++.dg/vartrace-3.C 
b/gcc/testsuite/g++.dg/vartrace-3.C
new file mode 100644
index 000..217db297baa
--- /dev/null
+++ 

Re: PC-relative TLS support

2019-11-10 Thread Alan Modra
On Wed, Aug 21, 2019 at 09:55:28PM +0930, Alan Modra wrote:
> On Mon, Aug 19, 2019 at 07:45:19AM -0500, Segher Boessenkool wrote:
> > But if you think we can remove the !TARGET_TLS_MARKERS everywhere it
> > is relevant at all, now is the time, patches very welcome, it would be
> > a nice cleanup :-)  Needs testing everywhere of course, but now is
> > stage 1 :-)
> 
> This patch removes !TARGET_TLS_MARKERS support.  -mtls-markers (and
> -mno-tls-markers) disappear as valid options too, because I figure
> they haven't been used too much except by people testing the
> compiler.  Bootstrapped and regression tested powerpc64le-linux and
> powerpc-ibm-aix7.1.3.0 (on gcc111).  I believe powerpc*-darwin doesn't
> support TLS.
> 
> Requiring an 8 year old binutils-2.20 shouldn't be that onerous.
> 
> Note that this patch doesn't remove the configure test to set
> HAVE_AS_TLS_MARKERS.  I was wondering whether I ought to hook that
> into a "sorry, your assembler is too old" error?

https://gcc.gnu.org/ml/gcc-patches/2019-08/msg01487.html

I should have pinged this before now, and really I think the following
additional patch makes more sense than any sort of sorry message.
Mostly people will be running the assembler anyway so will discover
quickly that their assembler is too old.

* configure.ac (HAVE_AS_TLS_MARKERS): Delete test.
* configure: Regenerate.
* config.in: Regenerate.

diff --git a/gcc/configure.ac b/gcc/configure.ac
index 5f32fd4d5e4..44d816630e9 100644
--- a/gcc/configure.ac
+++ b/gcc/configure.ac
@@ -4811,12 +4811,6 @@ LCF0:
   [AC_DEFINE(HAVE_AS_GNU_ATTRIBUTE, 1,
  [Define if your assembler supports .gnu_attribute.])])
 
-gcc_GAS_CHECK_FEATURE([tls marker support],
-  gcc_cv_as_powerpc_tls_markers, [2,20,0],,
-  [ bl __tls_get_addr(x@tlsgd)],,
-  [AC_DEFINE(HAVE_AS_TLS_MARKERS, 1,
- [Define if your assembler supports arg info for __tls_get_addr.])])
-
 gcc_GAS_CHECK_FEATURE([prologue entry point marker support],
   gcc_cv_as_powerpc_entry_markers, [2,26,0],-a64 --fatal-warnings,
   [ .reloc .,R_PPC64_ENTRY; nop],,

>   * config/rs6000/rs6000-protos.h (rs6000_output_tlsargs): Delete.
>   * config/rs6000/rs6000.c (rs6000_output_tlsargs): Delete.
>   (rs6000_legitimize_tls_address): Remove !TARGET_TLS_MARKERS code.
>   (rs6000_call_template_1): Delete TARGET_TLS_MARKERS test and
>   allow other UNSPECs besides UNSPEC_TLSGD and UNSPEC_TLSLD.
>   (rs6000_indirect_call_template_1): Likewise.
>   (rs6000_pltseq_template): Likewise.
>   (rs6000_opt_vars): Remove "tls-markers" entry.
>   * config/rs6000/rs6000.h (TARGET_TLS_MARKERS): Don't define.
>   (IS_NOMARK_TLSGETADDR): Likewise.
>   * config/rs6000/rs6000.md (tls_gd): Replace TARGET_TLS_MARKERS
>   with !TARGET_XCOFF.
>   (tls_gd_high, tls_gd_low): Likewise.
>   (tls_ld, tls_ld_high, tls_ld_low): Likewise.
>   (pltseq_plt_pcrel): Likewise.
>   (call_value_local32): Remove IS_NOMARK_TLSGETADDR predicate test.
>   (call_value_local64): Likewise.
>   (call_value_indirect_nonlocal_sysv): Remove IS_NOMARK_TLSGETADDR
>   output and length attribute sub-expression.
>   (call_value_nonlocal_sysv),
>   (call_value_nonlocal_sysv_secure),
>   (call_value_local_aix, call_value_nonlocal_aix),
>   (call_value_indirect_aix, call_value_indirect_elfv2),
>   (call_value_indirect_pcrel): Likewise.
>   * config/rs6000/rs6000.opt (mtls-markers): Delete.
>   * doc/install.texi (powerpc-*-*): Require binutils-2.20.

-- 
Alan Modra
Australia Development Lab, IBM


Re: [PATCH][vect]Account for epilogue's peeling for gaps when checking if we have enough niters for epilogue

2019-11-10 Thread Richard Biener
On Fri, 8 Nov 2019, Andre Vieira (lists) wrote:

> Hi,
> 
> As I mentioned in the patch to disable epilogue vectorization for loops with
> SIMDUID set, there were still some aarch64 libgomp failures. This patch fixes
> those.
> 
> The problem was that we were vectorizing a reduction that was only using one
> of the parts from a complex number, creating data accesses with gaps. For this
> we set PEELING_FOR_GAPS which forces us to peel an extra scalar iteration.
> 
> What was happening in the testcase I looked at was that we had a known niters
> of 10. The first VF was 4, leaving 10 % 4 = 2 scalar iterations. The epilogue
> had VF 2, which meant the current code thought we could do it. However, given
> the PEELING_FOR_GAPS it would create a scalar epilogue and we would end up
> doing too many iterations, surprisingly 12 as I think the code assumed we
> hadn't created said epilogue.
> 
> I ran a local check where I upped the iterations of the fortran test to 11 and
> I see GCC vectorizing the epilogue with VF = 2 and a scalar epilogue for one
> iteration, so that looks good too. I have transformed it into a test that
> would reproduce the issue in C and without openacc so I can run it in gcc's
> normal testsuite more easily.
> 
> Bootstrap on aarch64 and x86_64.
> 
> Is this OK for trunk?

OK.

Richard.

> Cheers,
> Andre
> 
> gcc/ChangeLog:
> 2019-11-08  Andre Vieira  
> 
>   * tree-vect-loop-manip.c (vect_do_peeling): Take epilogue gaps
> into account when checking if there are enough iterations to
> vectorize epilogue.
> 
> gcc/testsuite/ChangeLog:
> 2019-11-08  Andre Vieira  
> 
>   * gcc.dg/vect/vect-reduc-epilogue-gaps.c: New test.
> 
> 

-- 
Richard Biener 
SUSE Software Solutions Germany GmbH, Maxfeldstrasse 5, 90409 Nuernberg,
Germany; GF: Felix Imendörffer; HRB 36809 (AG Nuernberg)

Re: [PATCH rs6000]Fix PR92132

2019-11-10 Thread Kewen.Lin
Hi Segher,

on 2019/11/9 1:36 AM, Segher Boessenkool wrote:
> Hi!
> 
> On Fri, Nov 08, 2019 at 10:38:13AM +0800, Kewen.Lin wrote:
 +  [(set (match_operand: 0 "vint_operand")
 +   (match_operator 1 "comparison_operator"
>>>
>>> If you make an iterator for this instead, it is simpler code (you can then
>>> use  to do all these cases in one statement).
>>
>> If my understanding is correct and based on some tries before, I think we
>> have to leave these **CASEs** there (at least at the 1st level define_expand
>> for vec_cmp*), since vec_cmp* doesn't have  field in the pattern name.
>> The code can be only extracted from operator 1.  I tried to add one dummy
>> operand to hold  but it's impractical.
>>
>> Sorry, I may miss something here, I'm happy to make a subsequent patch to
>> uniform these cases if there is a good way to run a code iterator on them.
> 
> Instead of
> 
>   [(set (match_operand:VEC_I 0 "vint_operand")
>   (match_operator 1 "signed_or_equality_comparison_operator"
> [(match_operand:VEC_I 2 "vint_operand")
>  (match_operand:VEC_I 3 "vint_operand")]))]
> 
> you can do
> 
>   [(set (match_operand:VEC_I 0 "vint_operand")
>   (some_iter:VEC_I (match_operand:VEC_I 1 "vint_operand")
>(match_operand:VEC_I 2 "vint_operand")))]
> 

Thanks for your example.  But I'm afraid that it doesn't work for these 
patterns.

I tried it with simple code below:

; For testing
(define_code_iterator some_iter [eq gt])

(define_expand "vec_cmp"
  [(set (match_operand:VEC_I 0 "vint_operand")
(some_iter:VEC_I
  (match_operand:VEC_I 2 "vint_operand")
  (match_operand:VEC_I 3 "vint_operand")))]
  "VECTOR_UNIT_ALTIVEC_OR_VSX_P (mode)"
{
  emit_insn (gen_vector_ (operands[0], operands[2], operands[3]));
  DONE;
})

Error messages were emitted:

/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv16qiv16qi'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv8hiv8hi'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv4siv4si'
/home/linkw/gcc/gcc-git-fix/gcc/config/rs6000/vector.md:531:1: duplicate 
definition of 'vec_cmpv2div2di'

It's expected, since the pattern here is vec_cmp rather than
vec_cmp, your example would work perfectly for the latter.
Btw, in that pattern, the comparison operator is passed in operand 1.


BR,
Kewen

> with some_iter some code_iterator, (note you need to renumber), and in the
> body you can then just use  (or , or some other code_attribute).
> 
> code_iterator is more flexible than match_operator, in most ways.
> 
> 
> Segher
> 



Re: [PATCH V3] rs6000: Refine small loop unroll in loop_unroll_adjust hook

2019-11-10 Thread Jiufu Guo
Segher Boessenkool  writes:

> Hi Jiu Fu,
>
> On Thu, Nov 07, 2019 at 10:40:41PM +0800, Jiufu Guo wrote:
>> gcc/
>> 2019-11-07  Jiufu Guo  
>> 
>>  PR tree-optimization/88760
>>  * gcc/config/rs6000/rs6000.opt (-munroll-only-small-loops): New option.
>>  * gcc/common/config/rs6000/rs6000-common.c
>>  (rs6000_option_optimization_table) [OPT_LEVELS_2_PLUS_SPEED_ONLY]:
>>  Turn on -funroll-loops and -munroll-only-small-loops.
>>  [OPT_LEVELS_ALL]: Turn off -fweb and -frename-registers.
>>  * config/rs6000/rs6000.c (rs6000_option_override_internal): Remove
>>  set of PARAM_MAX_UNROLL_TIMES and PARAM_MAX_UNROLLED_INSNS.
>>  Turn off -munroll-only-small-loops for explicit -funroll-loops.
>>  (TARGET_LOOP_UNROLL_ADJUST): Add loop unroll adjust hook.
>>  (rs6000_loop_unroll_adjust): Define it.  Use -munroll-only-small-loops.
>> 
>> gcc.testsuite/
>> 2019-11-07  Jiufu Guo  
>> 
>>  PR tree-optimization/88760
>>  * gcc.dg/pr59643.c: Update back to r277550.
>
> Okay for trunk.  Thanks!  Just some formatting stuff:
Thanks Segher! I will update the patch accordingly.
>
>> +/* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>> +loops at -O2 and above by default.   */
>
> The "l" of "loops" should align with the "E" of "Enable", and only two
> spaces after a dot:
> /* Enable -munroll-only-small-loops with -funroll-loops to unroll small
>loops at -O2 and above by default.  */
>
>> +/*  Implement targetm.loop_unroll_adjust.  */
>
> Only one space at the start of the comment.
>
>> +static unsigned
>> +rs6000_loop_unroll_adjust (unsigned nunroll, struct loop * loop)
>
> struct loop *loop
>
>> +  /* TODO: This is hardcoded to 10 right now.  It can be refined, for
>> + example we may want to unroll very small loops more times (4 perhaps).
>> + We also should use a PARAM for this.  */
>
> There will be target-specific params soon, if I understood correctly
> :-)
Yes, it is what I want to do.

Thanks again!

Jiufu
BR.

>
> Cheers,
>
>
> Segher


[Darwin, machopic 11/n, committed] A flag to indicate symbols should be linker-visible.

2019-11-10 Thread Iain Sandoe

Part of the solution to PR71767 is incomplete, and we need finer-grained
control over whether symbols need to be made linker-visible.  This is a
preparation patch, providing the flag.

tested on x86_64-darwin16, applied to mainline.
thanks
gcc/ChangeLog:

2019-11-10  Iain Sandoe  

* config/darwin.h (MACHO_SYMBOL_FLAG_LINKER_VIS): New.
(MACHO_SYMBOL_LINKER_VIS_P): New.

diff --git a/gcc/config/darwin.h b/gcc/config/darwin.h
index f331fa1..8eb8edf 100644
--- a/gcc/config/darwin.h
+++ b/gcc/config/darwin.h
@@ -843,6 +843,13 @@ extern GTY(()) section *  
darwin_sections[NUM_DARWIN_SECTIONS];

 #define MACHO_SYMBOL_HIDDEN_VIS_P(RTX) \
   ((SYMBOL_REF_FLAGS (RTX) & MACHO_SYMBOL_FLAG_HIDDEN_VIS) != 0)

+/* Set on a symbol that should be made visible to the linker (overriding
+   'L' symbol prefixes).  */
+
+#define MACHO_SYMBOL_FLAG_LINKER_VIS ((SYMBOL_FLAG_SUBT_DEP) << 4)
+#define MACHO_SYMBOL_LINKER_VIS_P(RTX) \
+  ((SYMBOL_REF_FLAGS (RTX) & MACHO_SYMBOL_FLAG_LINKER_VIS) != 0)
+
 /* Set on a symbol that is a pic stub or symbol indirection (i.e. the
L_x${stub,non_lazy_ptr,lazy_ptr}.  */




[PATCH, committed] Don't print warning when moving to static with -fno-automatic

2019-11-10 Thread Janne Blomqvist
As part of PR 91413, GFortran now prints a warning when a variable is
moved from the stack to static storage. However, when the user
explicitly specifies that all local variables should be put in static
storage with the -fno-automatic option, don't print this warning.

Regtested on x86_64-pc-linux-gnu, committed r278027 as obvious.

gcc/fortran/ChangeLog:

2019-11-10  Janne Blomqvist  

PR fortran/91413
* trans-decl.c (gfc_finish_var_decl): Don't print warning when
-fno-automatic is enabled.
---
 gcc/fortran/trans-decl.c | 19 ++-
 1 file changed, 10 insertions(+), 9 deletions(-)

diff --git a/gcc/fortran/trans-decl.c b/gcc/fortran/trans-decl.c
index ffa6316..76e1c7a8453 100644
--- a/gcc/fortran/trans-decl.c
+++ b/gcc/fortran/trans-decl.c
@@ -746,15 +746,16 @@ gfc_finish_var_decl (tree decl, gfc_symbol * sym)
  || sym->attr.allocatable)
   && !DECL_ARTIFICIAL (decl))
 {
-  gfc_warning (OPT_Wsurprising,
-  "Array %qs at %L is larger than limit set by"
-  " %<-fmax-stack-var-size=%>, moved from stack to static"
-  " storage. This makes the procedure unsafe when called"
-  " recursively, or concurrently from multiple threads."
-  " Consider using %<-frecursive%>, or increase the"
-  " %<-fmax-stack-var-size=%> limit, or change the code to"
-  " use an ALLOCATABLE array.",
-  sym->name, >declared_at);
+  if (flag_max_stack_var_size > 0)
+   gfc_warning (OPT_Wsurprising,
+"Array %qs at %L is larger than limit set by"
+" %<-fmax-stack-var-size=%>, moved from stack to static"
+" storage. This makes the procedure unsafe when called"
+" recursively, or concurrently from multiple threads."
+" Consider using %<-frecursive%>, or increase the"
+" %<-fmax-stack-var-size=%> limit, or change the code to"
+" use an ALLOCATABLE array.",
+sym->name, >declared_at);
 
   TREE_STATIC (decl) = 1;
 
-- 
2.17.1



Re: [wwwdocs] readings.html - "Porting GCC for Dunces" is gone

2019-11-10 Thread David Malcolm
On Sun, 2019-11-10 at 14:53 +0100, Gerald Pfeifer wrote:
> Hi H-P,
> 
> it appears this download is gone. Do you have an alternate location?
> 
> For now I applied the patch below which disables that link in 
> readings.html.
> 
> Gerald

FWIW archive.org seems to have a copy here:

https://web.archive.org/web/20190214220423/http://ftp.axis.se/pub/users/hp/pgccfd/pgccfd.pdf





[C++] Implement D1957R0, T* to bool should be considered narrowing.

2019-11-10 Thread Jason Merrill
This paper was delayed until the February meeting in Prague so that we could
get a better idea of what the impact on existing code would actually be.  To
that end, I'm implementing it now.

Tested x86_64-pc-linux-gnu, applying to trunk.

* typeck2.c (check_narrowing): Treat pointer->bool as a narrowing
conversion with -std=c++2a.
---
 gcc/cp/typeck2.c|  5 +++
 gcc/testsuite/g++.dg/cpp0x/initlist92.C | 51 +++--
 2 files changed, 18 insertions(+), 38 deletions(-)

diff --git a/gcc/cp/typeck2.c b/gcc/cp/typeck2.c
index 7884d423a59..9fb36fd1ed3 100644
--- a/gcc/cp/typeck2.c
+++ b/gcc/cp/typeck2.c
@@ -1018,6 +1018,11 @@ check_narrowing (tree type, tree init, tsubst_flags_t 
complain,
ok = true;
}
 }
+  else if (TREE_CODE (type) == BOOLEAN_TYPE
+  && (TYPE_PTR_P (ftype) || TYPE_PTRMEM_P (ftype)))
+/* This hasn't actually made it into C++20 yet, but let's add it now to get
+   an idea of the impact.  */
+ok = (cxx_dialect < cxx2a);
 
   bool almost_ok = ok;
   if (!ok && !CONSTANT_CLASS_P (init) && (complain & tf_warning_or_error))
diff --git a/gcc/testsuite/g++.dg/cpp0x/initlist92.C 
b/gcc/testsuite/g++.dg/cpp0x/initlist92.C
index 81a63182f0e..319264ae274 100644
--- a/gcc/testsuite/g++.dg/cpp0x/initlist92.C
+++ b/gcc/testsuite/g++.dg/cpp0x/initlist92.C
@@ -1,26 +1,13 @@
 // PR c++/64665, DR 1467 
-// { dg-do run { target c++11 } }
+// { dg-do compile { target c++11 } }
 
 #include 
-#include 
 
-bool Test1(bool) 
-{
-  return true;
-}
-bool Test1(std::string)
-{
-  return false;
-}
+bool Test1(bool);
+bool Test1(std::string) = delete;
 
-bool Test2(int)
-{
-  return false;
-}
-bool Test2(std::initializer_list)
-{
-  return true;
-}
+bool Test2(int) = delete;
+bool Test2(std::initializer_list);
 
 struct S 
 { 
@@ -28,28 +15,16 @@ struct S
 private:
 int a;
 };
-bool Test3(int)
-{
-  return true;
-}
-bool Test3(S)
-{
-  return false;
-}
+bool Test3(int);
+bool Test3(S) = delete;
 
-bool Test4(bool) 
-{
-  return false;
-}
-bool Test4(std::initializer_list)
-{
-  return true;
-}
+bool Test4(bool) = delete;
+bool Test4(std::initializer_list);
 
 int main () 
 {
-  assert ( Test1({"false"}) );
-  assert ( Test2({123}) );
-  assert ( Test3({456}) );
-  assert ( Test4({"false"}) );
+  ( Test1({"false"}) );// { dg-error "narrowing" "" { target c++2a } }
+  ( Test2({123}) );
+  ( Test3({456}) );
+  ( Test4({"false"}) );
 }

base-commit: 9b0807d9fe86b69112c4b1b65a923d685640b094
-- 
2.18.1



Re: [PATCH] Handle gimple_clobber_p stmts in store-merging (PR target/92038)

2019-11-10 Thread Christophe Lyon
On Thu, 7 Nov 2019 at 16:28, Jakub Jelinek  wrote:
>
> Hi!
>
> The following patch adds handling of clobbers in store-merging.  The intent
> is if we have a clobber followed by some stores into the clobbered area,
> even if we don't store all the bytes in the area, we can avoid masking, because
> the non-stored bytes are undefined and in some cases we can even overwrite
> the whole area with the same or smaller number of stores compared to the
> original IL.
> Clobbers aren't removed from the IL, even if the following stores completely
> cover the whole area, as clobbers carry important additional information
> that the old value is gone, e.g. for tail call discovery if address taken
> before the clobber but not after it, removing the clobbers would disable
> tail call optimization.
> The patch right now treats the clobbered non-stored bytes as non-masked zero
> stores, except that we don't add stores to whole words etc. if there are no
> other overlapping stores; I have a separate patch that also computed
> defined_mask which contained whether some bytes are just undefined and we
> could in theory try different bit patterns in those bytes, but in the end
> decided it is too complicated and if needed, could be done as a follow-up.
>
> Bootstrapped/regtested on x86_64-linux and i686-linux, ok for trunk?
>

Hi Jakub,

I've noticed that the new test store-merging-2.C fails on arm:
FAIL: g++.dg/opt/store-merging-2.C  -std=gnu++14  scan-tree-dump
store-merging "New sequence of 2 stores to replace old one of 3
stores"

Christophe



> 2019-11-07  Jakub Jelinek  
>
> PR target/92038
> * gimple-ssa-store-merging.c (find_constituent_stores): For return
> value only, return non-NULL if there is a single non-clobber
> constituent store even if there are constituent clobbers and return
> one of clobber constituent stores if all constituent stores are
> clobbers.
> (split_group): Handle clobbers.
> (imm_store_chain_info::output_merged_store): When computing
> bzero_first, look after all clobbers at the start.  Don't count
> clobber stmts in orig_num_stmts, except if the first orig store is
> a clobber covering the whole area and split_stores cover the whole
> area, consider equal number of stmts ok.  Punt if split_stores
> contains only ->orig stores and their number plus number of original
> clobbers is equal to original number of stmts.  For ->orig, look past
> clobbers in the constituent stores.
> (imm_store_chain_info::output_merged_stores): Don't remove clobber
> stmts.
> (rhs_valid_for_store_merging_p): Don't return false for clobber stmt
> rhs.
> (store_valid_for_store_merging_p): Allow clobber stmts.
> (verify_clear_bit_region_be): Fix up a thinko in function comment.
>
> * g++.dg/opt/store-merging-1.C: New test.
> * g++.dg/opt/store-merging-2.C: New test.
> * g++.dg/opt/store-merging-3.C: New test.
>
> --- gcc/gimple-ssa-store-merging.c.jj   2019-11-07 09:50:38.029447052 +0100
> +++ gcc/gimple-ssa-store-merging.c  2019-11-07 12:13:15.048531180 +0100
> @@ -3110,7 +3110,8 @@ split_store::split_store (unsigned HOST_
>  /* Record all stores in GROUP that write to the region starting at BITPOS and
> is of size BITSIZE.  Record infos for such statements in STORES if
> non-NULL.  The stores in GROUP must be sorted by bitposition.  Return INFO
> -   if there is exactly one original store in the range.  */
> +   if there is exactly one original store in the range (in that case ignore
> +   clobber stmts, unless there are only clobber stmts).  */
>
>  static store_immediate_info *
>  find_constituent_stores (class merged_store_group *group,
> @@ -3146,16 +3147,24 @@ find_constituent_stores (class merged_st
>if (stmt_start >= end)
> return ret;
>
> +  if (gimple_clobber_p (info->stmt))
> +   {
> + if (stores)
> +   stores->safe_push (info);
> + if (ret == NULL)
> +   ret = info;
> + continue;
> +   }
>if (stores)
> {
>   stores->safe_push (info);
> - if (ret)
> + if (ret && !gimple_clobber_p (ret->stmt))
> {
>   ret = NULL;
>   second = true;
> }
> }
> -  else if (ret)
> +  else if (ret && !gimple_clobber_p (ret->stmt))
> return NULL;
>if (!second)
> ret = info;
> @@ -3347,13 +3356,17 @@ split_group (merged_store_group *group,
>
>if (bzero_first)
>  {
> -  first = 1;
> +  store_immediate_info *gstore;
> +  FOR_EACH_VEC_ELT (group->stores, first, gstore)
> +   if (!gimple_clobber_p (gstore->stmt))
> + break;
> +  ++first;
>ret = 1;
>if (split_stores)
> {
>   split_store *store
> -   = new split_store (bytepos, group->stores[0]->bitsize, 
> align_base);
> -  

Re: [committed] Handle POLY_INT_CST in copy_reference_ops_from_ref

2019-11-10 Thread Christophe Lyon
On Fri, 8 Nov 2019 at 10:44, Richard Sandiford
 wrote:
>
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  Applied as obvious.
>

Hi Richard,

The new deref_2.c test fails with -mabi=ilp32:
FAIL: gcc.target/aarch64/sve/acle/general/deref_2.c
-march=armv8.2-a+sve (test for excess errors)
Excess errors:
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:39:
error: no matching function for call to 'svld1(svbool_t&, int32_t*&)'
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:38:
error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
int*' [-fpermissive]
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:17:38:
error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
unsigned int*' [-fpermissive]
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:43:
error: no matching function for call to 'svld1(svbool_t&, int32_t*&)'
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:42:
error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
int*' [-fpermissive]
/gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c:18:42:
error: invalid conversion from 'int32_t*' {aka 'long int*'} to 'const
unsigned int*' [-fpermissive]

Christophe

> Richard
>
>
> 2019-11-08  Richard Sandiford  
>
> gcc/
> * tree-ssa-sccvn.c (copy_reference_ops_from_ref): Handle
> POLY_INT_CST.
>
> gcc/testsuite/
> * gcc.target/aarch64/sve/acle/general/deref_2.c: New test.
> * gcc.target/aarch64/sve/acle/general/whilele_8.c: Likewise.
> * gcc.target/aarch64/sve/acle/general/whilelt_4.c: Likewise.
>
> Index: gcc/tree-ssa-sccvn.c
> ===
> --- gcc/tree-ssa-sccvn.c2019-10-31 17:15:21.594544316 +
> +++ gcc/tree-ssa-sccvn.c2019-11-08 09:43:07.927488162 +
> @@ -928,6 +928,7 @@ copy_reference_ops_from_ref (tree ref, v
>   break;
> case STRING_CST:
> case INTEGER_CST:
> +   case POLY_INT_CST:
> case COMPLEX_CST:
> case VECTOR_CST:
> case REAL_CST:
> Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/deref_2.c 2019-11-08 
> 09:43:07.927488162 +
> @@ -0,0 +1,20 @@
> +/* { dg-options "-O2" } */
> +
> +#include <arm_sve.h>
> +#include <string.h>
> +
> +inline void
> +copy (void *dst, svbool_t src)
> +{
> +  memcpy (dst, &src, svcntd ());
> +}
> +
> +uint64_t
> +f (int32_t *x, int32_t *y)
> +{
> +  union { uint64_t x; char c[8]; } u;
> +  svbool_t pg = svptrue_b32 ();
> +  copy (u.c, svcmpeq (pg, svld1 (pg, x), 0));
> +  copy (u.c + 4, svcmpeq (pg, svld1 (pg, y), 1));
> +  return u.x;
> +}
> Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_8.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilele_8.c   
> 2019-11-08 09:43:07.927488162 +
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include <arm_sve.h>
> +
> +/* { dg-final { scan-assembler-not {\tptrue\t} } } */
> +/* { dg-final { scan-assembler-not {\tpfalse\t} } } */
> +
> +void
> +test1 (svbool_t *ptr)
> +{
> +  *ptr = svwhilele_b32_s32 (-4, 0);
> +}
> +
> +void
> +test2 (svbool_t *ptr)
> +{
> +  *ptr = svwhilele_b16_s64 (svcntb (), svcntb () + 8);
> +}
> +
> +void
> +test3 (svbool_t *ptr)
> +{
> +  *ptr = svwhilele_b64_s32 (0, 2);
> +}
> +
> +void
> +test4 (svbool_t *ptr)
> +{
> +  *ptr = svwhilele_b8_s64 (16, svcntb ());
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilel[et]\t} 4 } } */
> Index: gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilelt_4.c
> ===
> --- /dev/null   2019-09-17 11:41:18.176664108 +0100
> +++ gcc/testsuite/gcc.target/aarch64/sve/acle/general/whilelt_4.c   
> 2019-11-08 09:43:07.927488162 +
> @@ -0,0 +1,33 @@
> +/* { dg-do compile } */
> +/* { dg-options "-O2" } */
> +
> +#include <arm_sve.h>
> +
> +/* { dg-final { scan-assembler-not {\tptrue\t} } } */
> +/* { dg-final { scan-assembler-not {\tpfalse\t} } } */
> +
> +void
> +test1 (svbool_t *ptr)
> +{
> +  *ptr = svwhilelt_b32_s32 (-4, 1);
> +}
> +
> +void
> +test2 (svbool_t *ptr)
> +{
> +  *ptr = svwhilelt_b16_s64 (svcntb (), svcntb () + 9);
> +}
> +
> +void
> +test3 (svbool_t *ptr)
> +{
> +  *ptr = svwhilelt_b64_s32 (0, 3);
> +}
> +
> +void
> +test4 (svbool_t *ptr)
> +{
> +  *ptr = svwhilelt_b8_s64 (16, svcntb ());
> +}
> +
> +/* { dg-final { scan-assembler-times {\twhilel[et]\t} 4 } } */


Avoid even more sreal calculations in inliner

2019-11-10 Thread Jan Hubicka
Hi,
this patch manually CSEs sreal frequency calculations.

Bootstrapped/regtested x86_64-linux, committed.

* ipa-inline.c (compute_uninlined_call_time,
compute_inlined_call_time): Take edge frequency as
parameter rather than computing it by itself.
(big_speedup_p, edge_badness): Manually CSE sreal
frequency calculations.

Index: ipa-inline.c
===
--- ipa-inline.c(revision 278020)
+++ ipa-inline.c(working copy)
@@ -735,13 +735,13 @@ want_early_inline_function_p (struct cgr
 
 inline sreal
 compute_uninlined_call_time (struct cgraph_edge *edge,
-sreal uninlined_call_time)
+sreal uninlined_call_time,
+sreal freq)
 {
   cgraph_node *caller = (edge->caller->inlined_to
 ? edge->caller->inlined_to
 : edge->caller);
 
-  sreal freq = edge->sreal_frequency ();
   if (freq > 0)
 uninlined_call_time *= freq;
   else
@@ -756,14 +756,14 @@ compute_uninlined_call_time (struct cgra
 
 inline sreal
 compute_inlined_call_time (struct cgraph_edge *edge,
-  sreal time)
+  sreal time,
+  sreal freq)
 {
   cgraph_node *caller = (edge->caller->inlined_to
 ? edge->caller->inlined_to
 : edge->caller);
   sreal caller_time = ipa_fn_summaries->get (caller)->time;
 
-  sreal freq = edge->sreal_frequency ();
   if (freq > 0)
 time *= freq;
   else
@@ -787,8 +787,9 @@ big_speedup_p (struct cgraph_edge *e)
 {
   sreal unspec_time;
   sreal spec_time = estimate_edge_time (e, _time);
-  sreal time = compute_uninlined_call_time (e, unspec_time);
-  sreal inlined_time = compute_inlined_call_time (e, spec_time);
+  sreal freq = e->sreal_frequency ();
+  sreal time = compute_uninlined_call_time (e, unspec_time, freq);
+  sreal inlined_time = compute_inlined_call_time (e, spec_time, freq);
   cgraph_node *caller = (e->caller->inlined_to
 ? e->caller->inlined_to
 : e->caller);
@@ -1164,9 +1165,10 @@ edge_badness (struct cgraph_edge *edge,
 {
   sreal numerator, denominator;
   int overall_growth;
-  sreal inlined_time = compute_inlined_call_time (edge, edge_time);
+  sreal freq = edge->sreal_frequency ();
+  sreal inlined_time = compute_inlined_call_time (edge, edge_time, freq);
 
-  numerator = (compute_uninlined_call_time (edge, unspec_edge_time)
+  numerator = (compute_uninlined_call_time (edge, unspec_edge_time, freq)
   - inlined_time);
   if (numerator <= 0)
numerator = ((sreal) 1 >> 8);
@@ -1198,14 +1200,14 @@ edge_badness (struct cgraph_edge *edge,
  && callee_info->single_caller
  && !edge->caller->inlined_to
  /* ... and edges executed only conditionally ... */
- && edge->sreal_frequency () < 1
+ && freq < 1
  /* ... consider case where callee is not inline but caller is ... */
  && ((!DECL_DECLARED_INLINE_P (edge->callee->decl)
   && DECL_DECLARED_INLINE_P (caller->decl))
  /* ... or when early optimizers decided to split and edge
 frequency still indicates splitting is a win ... */
  || (callee->split_part && !caller->split_part
- && edge->sreal_frequency () * 100
+ && freq * 100
 < PARAM_VALUE
  (PARAM_PARTIAL_INLINING_ENTRY_PROBABILITY)
  /* ... and do not overwrite user specified hints.   */
@@ -1256,11 +1258,11 @@ edge_badness (struct cgraph_edge *edge,
   " overall growth %i (current) %i (original)"
   " %i (compensated)\n",
   badness.to_double (),
-  edge->sreal_frequency ().to_double (),
+  freq.to_double (),
   edge->count.ipa ().initialized_p () ? edge->count.ipa 
().to_gcov_type () : -1,
   caller->count.ipa ().initialized_p () ? caller->count.ipa 
().to_gcov_type () : -1,
   compute_uninlined_call_time (edge,
-   unspec_edge_time).to_double (),
+   unspec_edge_time, 
freq).to_double (),
   inlined_time.to_double (),
   estimate_growth (callee),
   callee_info->growth, overall_growth);


Another sreal micro optimization

2019-11-10 Thread Jan Hubicka
Hi,
this is another case where we can save quite a few sreal operations
(because it is common that the call frequency is the same as the entry
block frequency)

Bootstrapped/regtested x86_64-linux, committed.

Honza

* profile-count.c (profile_count::to_sreal_scale): Short circuit
case where profiles are same.
Index: profile-count.c
===
--- profile-count.c (revision 278020)
+++ profile-count.c (working copy)
@@ -312,6 +312,8 @@ profile_count::to_sreal_scale (profile_c
 *known = true;
   if (*this == zero ())
 return 0;
+  if (m_val == in.m_val)
+return 1;
 
   if (!in.m_val)
 {


Avoid sreal in cgraph_maybe_hot_p

2019-11-10 Thread Jan Hubicka
Hi,
while looking into performance issues with too much sreal use in the
inliner I noticed that in maybe_hot_p it is used just to hold a fraction,
which is easily done on profile_counts, too.

This also has the advantage that it will work with partially guessed static
profiles which are there during early inlining.

Bootstrapped/regtested x86_64-linux, committed.

* cgraph.c (cgraph_edge::maybe_hot_p): Do not use sreal_frequency.
Index: cgraph.c
===
--- cgraph.c(revision 278020)
+++ cgraph.c(working copy)
@@ -2697,14 +2697,18 @@ cgraph_edge::maybe_hot_p (void)
 return false;
   if (caller->frequency == NODE_FREQUENCY_HOT)
 return true;
-  /* If profile is now known yet, be conservative.
- FIXME: this predicate is used by early inliner and can do better there.  
*/
-  if (symtab->state < IPA_SSA)
+  if (!count.initialized_p ())
 return true;
-  if (caller->frequency == NODE_FREQUENCY_EXECUTED_ONCE
-  && sreal_frequency () * 2 < 3)
+  cgraph_node *where = caller->inlined_to ? caller->inlined_to : caller;
+  if (!where->count.initialized_p ())
 return false;
-  if (sreal_frequency () * PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION) <= 1)
+  if (caller->frequency == NODE_FREQUENCY_EXECUTED_ONCE)
+{
+  if (count.apply_scale (2, 1) < where->count.apply_scale (3, 1))
+   return false;
+}
+  else if (count.apply_scale (PARAM_VALUE (HOT_BB_FREQUENCY_FRACTION), 1)
+  < where->count)
 return false;
   return true;
 }


Free ipcp transformation summaries for inline clones

2019-11-10 Thread Jan Hubicka
Hi,
this patch implements freeing of transformation summaries for inline
clones. It also cleans up how they are duplicated (via their own duplicate
method rather than being bundled into the duplication of indirect call
infos) and adds a destructor so we release the memory of the vectors.

I also noticed that agg replacements were not copied before, so I
implemented that.

Bootstrapped/regtested x86_64-linux, committed.

* ipa-prop.c (ipa_propagate_indirect_call_infos): Remove ipcp
summary.
(ipcp_transformation_t::duplicate): Break out from ...
(ipa_node_params_t::duplicate): ... here; add copying of agg
replacements.
* ipa-prop.h (ipcp_transformation): Add constructor and destructor.
(ipcp_transformation_t): Add duplicate.
Index: ipa-prop.c
===
--- ipa-prop.c  (revision 278020)
+++ ipa-prop.c  (working copy)
@@ -3746,6 +3746,8 @@ ipa_propagate_indirect_call_infos (struc
   if (ok)
 ipa_edge_args_sum->remove (cs);
 }
+  if (ipcp_transformation_sum)
+ipcp_transformation_sum->remove (cs->callee);
 
   return changed;
 }
@@ -3986,27 +3988,28 @@ ipa_node_params_t::duplicate(cgraph_node
}
   ipa_set_node_agg_value_chain (dst, new_av);
 }
+}
 
-  ipcp_transformation *src_trans = ipcp_get_transformation_summary (src);
+/* Duplication of ipcp transformation summaries.  */
 
-  if (src_trans)
+void
+ipcp_transformation_t::duplicate(cgraph_node *, cgraph_node *dst,
+ipcp_transformation *src_trans,
+ipcp_transformation *dst_trans)
+{
+  /* Avoid redundant work of duplicating vectors we will never use.  */
+  if (dst->inlined_to)
+return;
+  dst_trans->bits = vec_safe_copy (src_trans->bits);
+  dst_trans->m_vr = vec_safe_copy (src_trans->m_vr);
+  ipa_agg_replacement_value *agg = src_trans->agg_values,
> +   **aggptr = &dst_trans->agg_values;
+  while (agg)
 {
-  ipcp_transformation_initialize ();
-  src_trans = ipcp_transformation_sum->get_create (src);
-  ipcp_transformation *dst_trans
-   = ipcp_transformation_sum->get_create (dst);
-
-  dst_trans->bits = vec_safe_copy (src_trans->bits);
-
> -  const vec<ipa_vr, va_gc> *src_vr = src_trans->m_vr;
> -  vec<ipa_vr, va_gc> *&dst_vr
> -   = ipcp_get_transformation_summary (dst)->m_vr;
-  if (vec_safe_length (src_trans->m_vr) > 0)
-   {
- vec_safe_reserve_exact (dst_vr, src_vr->length ());
- for (unsigned i = 0; i < src_vr->length (); ++i)
-   dst_vr->quick_push ((*src_vr)[i]);
-   }
> +  *aggptr = ggc_alloc<ipa_agg_replacement_value> ();
+  **aggptr = *agg;
+  agg = agg->next;
+  aggptr = &(*aggptr)->next;
 }
 }
 
Index: ipa-prop.h
===
--- ipa-prop.h  (revision 278020)
+++ ipa-prop.h  (working copy)
@@ -639,6 +639,25 @@ struct GTY(()) ipcp_transformation
   vec<ipa_bits *, va_gc> *bits;
   /* Value range information.  */
   vec<ipa_vr, va_gc> *m_vr;
+
+  /* Default constructor.  */
+  ipcp_transformation ()
+  : agg_values (NULL), bits (NULL), m_vr (NULL)
+  { }
+
+  /* Default destructor.  */
+  ~ipcp_transformation ()
+  {
+ipa_agg_replacement_value *agg = agg_values;
+while (agg)
+  {
+   ipa_agg_replacement_value *next = agg->next;
+   ggc_free (agg);
+   agg = next;
+  }
+vec_free (bits);
+vec_free (m_vr);
+  }
 };
 
 void ipa_set_node_agg_value_chain (struct cgraph_node *node,
@@ -759,6 +778,11 @@ public:
   ipcp_transformation_t (symtab, true);
 return summary;
   }
+  /* Hook that is called by summary when a node is duplicated.  */
+  virtual void duplicate (cgraph_node *node,
+ cgraph_node *node2,
+ ipcp_transformation *data,
+ ipcp_transformation *data2);
 };
 
 /* Function summary where the IPA CP transformations are actually stored.  */


Free ipa args summary after inlining

2019-11-10 Thread Jan Hubicka
Hi,
this patch adds removal of ipa_edge_args_sum for the inlined edges.
It turns out that those are still needed for described uses (though I am
not sure why described uses are counted across all inline duplicates
rather than handled for each of them independently), so I keep those.

Bootstrapped/regtested x86_64-linux, committed.

Honza

* ipa-prop.c (ipa_propagate_indirect_call_infos): Remove ipa edge
args summaries of inlined edge unless it holds info about
described reference.

Index: ipa-prop.c
===
--- ipa-prop.c  (revision 278016)
+++ ipa-prop.c  (working copy)
@@ -3727,6 +3727,26 @@ ipa_propagate_indirect_call_infos (struc
   changed = propagate_info_to_inlined_callees (cs, cs->callee, new_edges);
   ipa_node_params_sum->remove (cs->callee);
 
+  class ipa_edge_args *args = IPA_EDGE_REF (cs);
+  if (args)
+{
+  bool ok = true;
+  if (args->jump_functions)
+   {
+ struct ipa_jump_func *jf;
+ int i;
+ FOR_EACH_VEC_ELT (*args->jump_functions, i, jf)
+   if (jf->type == IPA_JF_CONST
+   && ipa_get_jf_constant_rdesc (jf))
+ {
+   ok = false;
+   break;
+ }
+   }
+  if (ok)
+ipa_edge_args_sum->remove (cs);
+}
+
   return changed;
 }
 


[wwwdocs] readings.html - "Porting GCC for Dunces" is gone

2019-11-10 Thread Gerald Pfeifer
Hi H-P,

it appears this download is gone. Do you have an alternate location?

For now I applied the patch below which disables that link in 
readings.html.

Gerald

- Log -
commit c5f63c81361196993ea4fdbc3e77c1f2a35a6e15
Author: Gerald Pfeifer 
Date:   Sun Nov 10 14:41:36 2019 +0100

Disable "Porting GCC for Dunces" which is gone (or moved).

diff --git a/htdocs/readings.html b/htdocs/readings.html
index 03010f0..e95760a 100644
--- a/htdocs/readings.html
+++ b/htdocs/readings.html
@@ -29,10 +29,10 @@
   http://www.pspace.org/a/thesis/;>Compilation
   of Functional Programming Languages using GCC -- Tail Calls
   by Andreas Bauer.
-   
+
   http://cobolforgcc.sourceforge.net/cobol_toc.html;>Using,
   Maintaining and Enhancing COBOL for the GNU Compiler Collection (GCC)
   by Joachim Nadler and Tim Josling


Re: [PATCH] Bump minimum MPFR version to 3.1.0

2019-11-10 Thread Gerald Pfeifer
On Sun, 10 Nov 2019, Janne Blomqvist wrote:
> Thanks, I'll take that as an Ok for the Fortran part. I believe I
> still need an Ok by a global or build machinery reviewer for the
> global and docs parts.

For the docs parts, and in particular a change like this, only in
the most pedantic of worlds imaginable. ;-)  

That said: okay. And thank you!

Gerald


[libstdc++,doc] doc/xml/manual/using.xml: Switch www.hboehm.info to https

2019-11-10 Thread Gerald Pfeifer
Committed.

Gerald


2019-11-10  Gerald Pfeifer  

* doc/xml/manual/using.xml: Switch www.hboehm.info to https.

Index: doc/xml/manual/using.xml
===
--- doc/xml/manual/using.xml(revision 278018)
+++ doc/xml/manual/using.xml(working copy)
@@ -1849,7 +1849,7 @@ gcc version 4.1.2 20070925 (Red Hat 4.1.2-33)
 
   For further details of the C++11 memory model see Hans-J. Boehm's
   http://www.w3.org/1999/xlink; 
xlink:href="https://www.hboehm.info/c++mm/;>Threads
-  and memory model for C++ pages, particularly the http://www.w3.org/1999/xlink; 
xlink:href="http://www.hboehm.info/c++mm/threadsintro.html;>introduction 
+  and memory model for C++ pages, particularly the http://www.w3.org/1999/xlink; 
xlink:href="https://www.hboehm.info/c++mm/threadsintro.html;>introduction
 
   and http://www.w3.org/1999/xlink; 
xlink:href="https://www.hboehm.info/c++mm/user-faq.html;>FAQ.
   
 


[PATCH] rs6000: Allow any CC mode in movcc

2019-11-10 Thread Segher Boessenkool
Sometimes combine wants to do a move in CCFPmode, but we don't currently
handle moves in any CC mode other than CCmode.  Fix that oversight.

Tested on powerpc64-linux {-m32,-m64}.  Committing to trunk.


Segher


2019-11-10  Segher Boessenkool  

* config/rs6000/rs6000.md (CC_any): New mode iterator.
(*movcc_internal1): Rename to...
(*movcc_<mode> for CC_any): ... this.  Support moves of all CC modes.

---
 gcc/config/rs6000/rs6000.md | 16 +---
 1 file changed, 9 insertions(+), 7 deletions(-)

diff --git a/gcc/config/rs6000/rs6000.md b/gcc/config/rs6000/rs6000.md
index 1944abf..0cdfdef 100644
--- a/gcc/config/rs6000/rs6000.md
+++ b/gcc/config/rs6000/rs6000.md
@@ -7220,13 +7220,15 @@ (define_expand "movcc"
   ""
   "")
 
-(define_insn "*movcc_internal1"
-  [(set (match_operand:CC 0 "nonimmediate_operand"
-   "=y,x,?y,y,r,r,r,r, r,*c*l,r,m")
-   (match_operand:CC 1 "general_operand"
-   " y,r, r,O,x,y,r,I,*h,   r,m,r"))]
-  "register_operand (operands[0], CCmode)
-   || register_operand (operands[1], CCmode)"
+(define_mode_iterator CC_any [CC CCUNS CCEQ CCFP])
+
+(define_insn "*movcc_<mode>"
+  [(set (match_operand:CC_any 0 "nonimmediate_operand"
+   "=y,x,?y,y,r,r,r,r, r,*c*l,r,m")
+   (match_operand:CC_any 1 "general_operand"
+   " y,r, r,O,x,y,r,I,*h,   r,m,r"))]
+  "register_operand (operands[0], <MODE>mode)
+   || register_operand (operands[1], <MODE>mode)"
   "@
mcrf %0,%1
mtcrf 128,%1
-- 
1.8.3.1



Remove ipa-prop node summaries for inline clones

2019-11-10 Thread Jan Hubicka
Hi,
this patch makes creation of IPA_NODE_REF summaries explicit and fixes
the fallout. It also removes the summaries after the info has been
propagated into the caller when a function is inlined.

Martin, I had to add the flag ipcp_clone_p to cgraph_node since it was
used while resolving cloned references.  I do not see why this flag is
needed at all.  I understand it is about a chain of references when
multiple things were cloned first and inlined later, but it seems to me
that we have all the info and there cannot be non-clones in the chain.

Also it seems to me that ipa-cp should simply remove those parameters
earlier and then not get multiplied references.

Honza

* cgraph.h (struct cgraph_node): Add ipcp_clone flag.
(cgraph_node::create_virtual_clone): Copy it.
* ipa-cp.c (ipcp_versionable_function_p): Watch for missing
summaries.
(ignore_edge_p): If caller has ipa-cp disabled, skip the edge, too.
(ipcp_verify_propagated_values): Do not verify nodes where ipcp
is disabled.
(propagate_constants_across_call): If callee is not analyzed, give up.
(propagate_constants_topo): Lower to bottom latties of all callees of
functions with ipa-cp disabled.
(ipcp_propagate_stage): Skip functions with ipa-cp disabled.
(cgraph_edge_brings_value_p): Check for availability first.
(create_specialized_node): Set ipcp_clone.
(ipcp_store_bits_results): Check that info is present.
* ipa-fnsummary.c (evaluate_properties_for_edge): Do not analyze
thunks.
(ipa_call_context::duplicate_from, ipa_call_context::equal_to): Be
conservative when callee summary is missing.
(remap_edge_summaries): Lookup call summary only when needed.
* ipa-icf.c (sem_function::param_used_p): Be ready for missing summary.
* ipa-prop.c (ipa_alloc_node_params, ipa_initialize_node_params):
Use get_create.
(ipa_analyze_node): Use get_create.
(propagate_controlled_uses): Do not propagate when function is not
analyzed.
(ipa_propagate_indirect_call_infos): Remove summary of inline clone.
(ipa_read_node_info): Use get_create.
* ipa-prop.h (IPA_NODE_REF): Use get.
(IPA_NODE_REF_GET_CREATE): New.
Index: cgraph.h
===
--- cgraph.h(revision 278009)
+++ cgraph.h(working copy)
@@ -1484,6 +1484,8 @@ struct GTY((tag ("SYMTAB_FUNCTION"))) cg
   unsigned redefined_extern_inline : 1;
   /* True if the function may enter serial irrevocable mode.  */
   unsigned tm_may_enter_irr : 1;
+  /* True if this was a clone created by ipa-cp.  */
+  unsigned ipcp_clone : 1;
 
 private:
   /* Unique id of the node.  */
Index: cgraphclones.c
===
--- cgraphclones.c  (revision 278009)
+++ cgraphclones.c  (working copy)
@@ -570,6 +570,7 @@ cgraph_node::create_virtual_clone (vec
+  new_node->ipcp_clone = ipcp_clone;
   new_node->clone.tree_map = tree_map;
   if (!implicit_section)
 new_node->set_section (get_section ());
Index: ipa-cp.c
===
--- ipa-cp.c(revision 278009)
+++ ipa-cp.c(working copy)
@@ -656,7 +656,7 @@ determine_versionability (struct cgraph_
 static bool
 ipcp_versionable_function_p (struct cgraph_node *node)
 {
-  return IPA_NODE_REF (node)->versionable;
+  return IPA_NODE_REF (node) && IPA_NODE_REF (node)->versionable;
 }
 
 /* Structure holding accumulated information about callers of a node.  */
@@ -817,6 +817,7 @@ ignore_edge_p (cgraph_edge *e)
 = e->callee->function_or_virtual_thunk_symbol (&avail, e->caller);
 
   return (avail <= AVAIL_INTERPOSABLE
+ || !opt_for_fn (e->caller->decl, flag_ipa_cp)
  || !opt_for_fn (ultimate_target->decl, flag_ipa_cp));
 }
 
@@ -1471,6 +1472,8 @@ ipcp_verify_propagated_values (void)
   FOR_EACH_FUNCTION_WITH_GIMPLE_BODY (node)
 {
   class ipa_node_params *info = IPA_NODE_REF (node);
+  if (!opt_for_fn (node->decl, flag_ipa_cp))
+   continue;
   int i, count = ipa_get_param_count (info);
 
   for (i = 0; i < count; i++)
@@ -2307,6 +2310,8 @@ propagate_constants_across_call (struct
 return false;
   gcc_checking_assert (callee->has_gimple_body_p ());
   callee_info = IPA_NODE_REF (callee);
+  if (!callee_info)
+return false;
 
   args = IPA_EDGE_REF (cs);
   parms_count = ipa_get_param_count (callee_info);
@@ -3233,7 +3238,17 @@ propagate_constants_topo (class ipa_topo
 until all lattices stabilize.  */
   FOR_EACH_VEC_ELT (cycle_nodes, j, v)
if (v->has_gimple_body_p ())
- push_node_to_stack (topo, v);
+ {
+   if (opt_for_fn (v->decl, flag_ipa_cp))
+ push_node_to_stack (topo, v);
+   /* When V is not optimized, we cannot push it to the stack, but
+  still we need to set all its callees' lattices to bottom.  */
+   

Re: [PATCH] Bump minimum MPFR version to 3.1.0

2019-11-10 Thread Janne Blomqvist
On Sun, Nov 10, 2019 at 11:43 AM Thomas Koenig  wrote:
>
> Hi Janne,
>
> > Bump the minimum MPFR version to 3.1.0, released 2011-10-03. With this
> > requirement one can still build GCC with the operating system provided
> > MPFR on old but still supported operating systems like SLES 12 (MPFR
> > 3.1.2) or RHEL/CentOS 7.x (MPFR 3.1.1).
> >
>
> OK for trunk.

Thanks, I'll take that as an Ok for the Fortran part. I believe I
still need an Ok by a global or build machinery reviewer for the
global and docs parts.

> Can you also make a note in https://gcc.gnu.org/gcc-10/changes.html ?

Sure, will do, when the patch is accepted.




-- 
Janne Blomqvist


Re: [PATCH] Bump minimum MPFR version to 3.1.0

2019-11-10 Thread Thomas Koenig

Hi Janne,


Bump the minimum MPFR version to 3.1.0, released 2011-10-03. With this
requirement one can still build GCC with the operating system provided
MPFR on old but still supported operating systems like SLES 12 (MPFR
3.1.2) or RHEL/CentOS 7.x (MPFR 3.1.1).



OK for trunk.

Can you also make a note in https://gcc.gnu.org/gcc-10/changes.html ?

Regards

Thomas