Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Wilco Dijkstra
Kyrill Tkachov wrote:

> > After Bernd's change almost all DI mode instructions are split before
> > register allocation. So instructions using DI mode no longer exist and
> > thus these extend variants can never be matched and are thus redundant.
>
> Bernd's patch splits them when we don't have NEON. When NEON is available
> though they still maintain the DImode so we'd still benefit from these
> transformations, no?

While you're right it may be possible to trigger these instructions, ORN is
already so rare that it is hardly beneficial to have an instruction for it,
and ORN of an extended value never ever happens. So there is absolutely no
benefit in keeping these versions temporarily until we fix Neon too.

For the Neon case my proposal is to use the VFP early expansion (so you get
an efficient expansion by default in all cases). You can then use
-mneon-for-64bits to enable the use of Neon instructions (which may be even
better in some cases). There are quite a few patches in this series already
and more to come soon!

Wilco

Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-04 Thread Bernd Edlinger
Hi Kyrill,

Thanks for your review!


On 09/04/17 15:55, Kyrill Tkachov wrote:
> Hi Bernd,
> 
> On 18/01/17 15:36, Bernd Edlinger wrote:
>> On 01/13/17 19:28, Bernd Edlinger wrote:
>>> On 01/13/17 17:10, Bernd Edlinger wrote:
 On 01/13/17 14:50, Richard Earnshaw (lists) wrote:
> On 18/12/16 12:58, Bernd Edlinger wrote:
>> Hi,
>>
>> this is related to PR77308, the follow-up patch will depend on this
>> one.
>>
>> When trying to split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
>> before reload, a mis-compilation in the libgcc function
>> __gnu_satfractdasq was discovered, see [1] for more details.
>>
>> The reason seems to be that when the *arm_cmpdi_insn is directly
>> followed by a *arm_cmpdi_unsigned instruction, both are split
>> up into this:
>>
>> [(set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 0) (match_dup 1)))
>>  (parallel [(set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 3) (match_dup 4)))
>> (set (match_dup 2)
>>  (minus:SI (match_dup 5)
>>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>>
>> [(set (reg:CC CC_REGNUM)
>>   (compare:CC (match_dup 2) (match_dup 3)))
>>  (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
>> (set (reg:CC CC_REGNUM)
>>  (compare:CC (match_dup 0) (match_dup 1))))]
>>
>> The problem is that the reg:CC from the *subsi3_carryin_compare
>> does not mention that the reg:CC also depends on the reg:CC
>> from before.  Therefore the *arm_cmpsi_insn appears to be
>> redundant and thus got removed, because the data values are
>> identical.
>>
>> I think that applies to a number of similar patterns where data
>> flow is happening through the CC reg.
>>
>> So this is a kind of correctness issue, and should be fixed
>> independently from the optimization issue PR77308.
>>
>> Therefore I think the patterns need to specify the true
>> value that will be in the CC reg, in order for cse to
>> know what the instructions are really doing.
>>
>>
>> Bootstrapped and reg-tested on arm-linux-gnueabihf.
>> Is it OK for trunk?
>>
> I agree you've found a valid problem here, but I have some issues with
> the patch itself.
>
>
> (define_insn_and_split "subdi3_compare1"
>[(set (reg:CC_NCV CC_REGNUM)
>  (compare:CC_NCV
>(match_operand:DI 1 "register_operand" "r")
>(match_operand:DI 2 "register_operand" "r")))
> (set (match_operand:DI 0 "register_operand" "=&r")
>  (minus:DI (match_dup 1) (match_dup 2)))]
>"TARGET_32BIT"
>"#"
>"&& reload_completed"
>[(parallel [(set (reg:CC CC_REGNUM)
> (compare:CC (match_dup 1) (match_dup 2)))
>(set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
> (parallel [(set (reg:CC_C CC_REGNUM)
> (compare:CC_C
>   (zero_extend:DI (match_dup 4))
>   (plus:DI (zero_extend:DI (match_dup 5))
>(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
>(set (match_dup 3)
> (minus:SI (minus:SI (match_dup 4) (match_dup 5))
>   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]
>
>
> This pattern is now no longer self-consistent in that before the split
> the overall result for the condition register is in mode CC_NCV, but
> afterwards it is just CC_C.
>
> I think CC_NCV is the correct mode (the N, C and V bits all correctly
> reflect the result of the 64-bit comparison), but that then implies that
> the cc mode of subsi3_carryin_compare is incorrect as well and should in
> fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
> that CC_NCV is the correct mode for this operation.
>
> I'm not sure if there are other consequences that will fall out from
> fixing this (it's possible that we might need a change to select_cc_mode
> as well).
>
 Yes, this is still a bit awkward...

 The N and V bits will be the correct result for subdi3_compare1 (a
 64-bit comparison), but the (zero_extend:DI (match_dup 4)) vs.
 (plus:DI ...) comparison only gets the C bit correct; the expression
 for N and V is a different one.

 It probably works because the subsi3_carryin_compare instruction sets
 more CC bits than the pattern explicitly specifies.  We know
 subsi3_carryin_compare also computes the N and V bits, but it is hard
 to write down the correct rtl expression for it.

 In theory the pattern should describe everything correctly, maybe
 like:

 set (reg:CC_C CC_REGNUM)
  (compare:CC_C
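The carry data flow these patterns model can be sketched in plain C. This is an illustration of the SUBS + SBC lowering being discussed, not GCC code; the function name is ours:

```c
#include <stdint.h>

/* A 64-bit subtraction implemented as two 32-bit operations, mirroring
   what the split subdi3 patterns emit as SUBS + SBC.  The borrow out of
   the low word is exactly the C-bit data flow the thread is discussing.  */
static uint64_t
sub64_via_32 (uint64_t a, uint64_t b)
{
  uint32_t a_lo = (uint32_t) a, a_hi = (uint32_t) (a >> 32);
  uint32_t b_lo = (uint32_t) b, b_hi = (uint32_t) (b >> 32);

  uint32_t lo = a_lo - b_lo;           /* SUBS: sets the carry/borrow flag.  */
  uint32_t borrow = a_lo < b_lo;       /* ARM's C flag is the inverse of this.  */
  uint32_t hi = a_hi - b_hi - borrow;  /* SBC: consumes the flag.  */

  return ((uint64_t) hi << 32) | lo;
}
```

Note that only the N and V bits of the final flags reflect the full 64-bit comparison; the intermediate C bit carries the inter-word dependency that the RTL must not lose.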
 

Re: [PATCH]: PR target/80204 (Darwin macosx-version-min problem)

2017-09-04 Thread Jakub Jelinek
On Mon, Sep 04, 2017 at 08:47:07PM +0100, Simon Wright wrote:
> On 1 Sep 2017, at 23:05, Simon Wright  wrote:
> > 
> > 2017-09-01 Simon Wright 
> > 
> > PR target/80204
> > * config/darwin-driver.c (darwin_find_version_from_kernel): eliminate 
> > calculation of the
> >   minor version, always output as 0.
> 
> Like this?
> 
> Do you need the patch to be resubmitted as well?
> 
> gcc/Changelog
> 
> 2017-09-01 Simon Wright 
> 
>   PR target/80204
>   * config/darwin-driver.c (darwin_find_version_from_kernel):
>   Eliminate calculation of the minor version, always output as 0.

Better

2017-09-01 Simon Wright 

PR target/80204
* config/darwin-driver.c (darwin_find_version_from_kernel): Eliminate
calculation of the minor version, always output as 0.

Jakub


Re: std::vector default default and move constructors

2017-09-04 Thread François Dumont

Hi

Gentle reminder.

Thanks


On 21/08/2017 21:15, François Dumont wrote:
Following feedback on the std::list patch, this one had the same problem of
unused code being deleted. So here is a new version.


Ok to commit ?

François

On 28/07/2017 18:45, François Dumont wrote:

Hi

There was a little issue in this patch, here is the correct version.

François


On 23/07/2017 19:41, François Dumont wrote:

Hi

Is it time now to consider this patch ?

* include/bits/stl_vector.h
(_Vector_impl_data): New.
(_Vector_impl): Inherit from latter.
(_Vector_impl(_Vector_impl&&, _Tp_alloc_type&&)): New.
(_Vector_base(_Vector_base&&, const allocator_type&)): Use latter.
(_Vector_base()): Default.
(_Vector_base(size_t)): Delete.
(_Vector_base(_Tp_alloc_type&&)): Delete.
(_Vector_base(_Vector_base&&)): Default.
(vector()): Default.
(vector(vector&&, const allocator_type&, true_type)): New.
(vector(vector&&, const allocator_type&, false_type)): New.
(vector(vector&&, const allocator_type&)): Use latters.

Tested under linux x86_64.

François



Re: [PATCH]: PR target/80204 (Darwin macosx-version-min problem)

2017-09-04 Thread Simon Wright
On 1 Sep 2017, at 23:05, Simon Wright  wrote:
> 
>   2017-09-01 Simon Wright 
> 
>   PR target/80204
>   * config/darwin-driver.c (darwin_find_version_from_kernel): eliminate 
> calculation of the
> minor version, always output as 0.

Like this?

Do you need the patch to be resubmitted as well?

gcc/Changelog

2017-09-01 Simon Wright 

PR target/80204
* config/darwin-driver.c (darwin_find_version_from_kernel):
Eliminate calculation of the minor version, always output as 0.



Re: [PATCH]: PR target/80204 (Darwin macosx-version-min problem)

2017-09-04 Thread Jakub Jelinek
On Mon, Sep 04, 2017 at 07:56:58PM +0100, Simon Wright wrote:
> >>2017-09-01 Simon Wright 
> >> 
> >>PR target/80204
> >>* config/darwin-driver.c (darwin_find_version_from_kernel): eliminate 
> >> calculation of the
> >>  minor version, always output as 0.
> >> 
> > OK
> > jeff
> 
> Great! Can it be applied, please, when convenient?

Note the ChangeLog entry is badly formatted: it should start with a capital
letter (Eliminate) and the line is way too long, so it should be wrapped
differently.

Jakub


Re: [PING][PATCH 2/3] retire mem_signal_fence pattern

2017-09-04 Thread Uros Bizjak
> On 09/01/2017 12:26 AM, Alexander Monakov wrote:
>> On Thu, 31 Aug 2017, Jeff Law wrote:
>>> This is OK.
>>>
>>> What's the point of the delete_insns_since calls in patch #3?
>>
>> Deleting the first barrier when maybe_expand_insn failed.
>> Other functions in the file use a similar approach.
> Thanks for clarifying.  OK for the trunk.  Sorry about the wait.

It looks like:

2017-09-04  Alexander Monakov  

PR rtl-optimization/57448
PR target/67458
PR target/81316
* optabs.c (expand_atomic_load): Place compiler memory barriers if
using atomic_load pattern.
(expand_atomic_store): Likewise.

introduced a couple of regressions on x86 (-m32, 32bit)  testsuite:

New failures:
FAIL: gcc.target/i386/pr71245-1.c scan-assembler-not (fistp|fild)
FAIL: gcc.target/i386/pr71245-2.c scan-assembler-not movlps

Uros.


Re: [PATCH]: PR target/80204 (Darwin macosx-version-min problem)

2017-09-04 Thread Simon Wright
(Apologies if this is a duplicate: Mac Mail vs the list's requirement for
plain text)

On 4 Sep 2017, at 07:09, Jeff Law  wrote:
> 
> On 09/01/2017 04:05 PM, Simon Wright wrote:
>> In gcc/config/darwin-driver.c, darwin_find_version_from_kernel() assumes
>> that the minor version in the Darwin kernel version (16.7.0, => minor
>> version 7) is equal to the bugfix component of the macOS version, so that
>> the compiler receives -mmacosx-version-min=10.12.7 and the linker receives
>> -macosx_version_min 10.12.7.
>> 
>> Unfortunately, Apple don’t apply this algorithm; the macOS version is
>> actually 10.12.6.
>> 
>> Getting this wrong means that it’s impossible to run an executable from 
>> within a bundle: Sierra complains "You have macOS 10.12.6. The application
>> requires macOS 10.12.7 or later".
>> 
>> A workround would perhaps be to link the executable with
>> -Wl,-macosx_version_min,`sw_vers -productVersion` (I assume that it’s only
>> the linker phase that matters?)
>> 
>> I see that Apple’s gcc (Apple LLVM version 8.0.0
>> (clang-800.0.42.1)) specifies - only at link time -
>> -macosx_version_min 10.12.0
>> 
>> This patch does the same.
>> 
>> gcc/Changelog:
>> 
>>  2017-09-01 Simon Wright 
>> 
>>  PR target/80204
>>  * config/darwin-driver.c (darwin_find_version_from_kernel): eliminate 
>> calculation of the
>>minor version, always output as 0.
>> 
> OK
> jeff

Great! Can it be applied, please, when convenient?

--S



[PATCH, i386 testsuite]: Introduce gcc.target/i386/mpx/mpx-os-support.h

2017-09-04 Thread Uros Bizjak
2017-09-04  Uros Bizjak  

* gcc.target/i386/mpx/mpx-os-support.h: New file.
* gcc.target/i386/mpx/mpx-check.h: Include mpx-os-support.h.

Tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: gcc.target/i386/mpx/mpx-check.h
===
--- gcc.target/i386/mpx/mpx-check.h (revision 251661)
+++ gcc.target/i386/mpx/mpx-check.h (working copy)
@@ -1,8 +1,8 @@
 #include 
 #include 
 #include 
-
 #include "cpuid.h"
+#include "mpx-os-support.h"
 
 static int
 __attribute__ ((noinline))
@@ -16,16 +16,6 @@
 
 #define DEBUG
 
-#define XSTATE_BNDREGS (1 << 3)
-
-/* This should be an intrinsic, but isn't.  */
-static int xgetbv (unsigned x)
-{
-   unsigned eax, edx;
-   asm ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (x)); 
-   return eax;
-}
-
 static int
 check_osxsave (void)
 {
@@ -44,7 +34,7 @@
 return NORUNRES;
 
   /* Run MPX test only if host has MPX support.  */
-  if (check_osxsave () && (ebx & bit_MPX) && (xgetbv (0) & XSTATE_BNDREGS))
+  if (check_osxsave () && (ebx & bit_MPX) && mpx_os_support ())
 mpx_test (argc, argv);
   else
 {
Index: gcc.target/i386/mpx/mpx-os-support.h
===
--- gcc.target/i386/mpx/mpx-os-support.h(nonexistent)
+++ gcc.target/i386/mpx/mpx-os-support.h(working copy)
@@ -0,0 +1,16 @@
+/* Check if the OS supports executing MPX instructions.  */
+
+#define XCR_XFEATURE_ENABLED_MASK  0x0
+
+#define XSTATE_BNDREGS 0x8
+
+static int
+mpx_os_support (void)
+{
+  unsigned int eax, edx;
+  unsigned int ecx = XCR_XFEATURE_ENABLED_MASK;
+
+  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx) : "c" (ecx));
+
+  return (eax & XSTATE_BNDREGS) != 0;
+}
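The full runtime gate the test harness applies is the cpuid OSXSAVE check from mpx-check.h combined with the xgetbv probe above. A hedged, self-contained sketch of that combination (the helper name `os_supports_mpx_state` is ours, not part of the testsuite; the constants follow the patch):

```c
#include <cpuid.h>

#define XCR_XFEATURE_ENABLED_MASK 0x0
#define XSTATE_BNDREGS 0x8

static int
os_supports_mpx_state (void)
{
  unsigned int eax, ebx, ecx, edx;

  /* bit_OSXSAVE (CPUID leaf 1, ECX bit 27) guarantees xgetbv won't fault.  */
  if (!__get_cpuid (1, &eax, &ebx, &ecx, &edx) || !(ecx & bit_OSXSAVE))
    return 0;

  /* Read XCR0 and test whether the OS context-switches the bound registers.  */
  __asm__ ("xgetbv" : "=a" (eax), "=d" (edx)
                    : "c" (XCR_XFEATURE_ENABLED_MASK));

  return (eax & XSTATE_BNDREGS) != 0;
}
```

Splitting the xgetbv probe into its own header keeps the OS-specific part of the check in one place, which is the point of the new file.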


[PATCH, i386]: Fix PR82098, ICE in elimination_costs_in_insn

2017-09-04 Thread Uros Bizjak
2017-09-04  Uros Bizjak  

PR target/82098
* config/i386/i386.md (*_mask): Add
TARGET_USE_BT to insn constraint.
(*btr_mask): Ditto.

Bootstrapped and regression tested on x86_64-linux-gnu {,-m32}.

Committed to mainline SVN.

Uros.
Index: config/i386/i386.md
===
--- config/i386/i386.md (revision 251662)
+++ config/i386/i386.md (working copy)
@@ -11033,8 +11033,9 @@
(match_operand:SI 2 "const_int_operand")) 0))
  (match_operand:SWI48 3 "register_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode)-1))
-   == GET_MODE_BITSIZE (mode)-1
+  "TARGET_USE_BT
+   && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
&& can_create_pseudo_p ()"
   "#"
   "&& 1"
@@ -11073,8 +11074,9 @@
(match_operand:SI 2 "const_int_operand")) 0))
  (match_operand:SWI48 3 "register_operand")))
(clobber (reg:CC FLAGS_REG))]
-  "(INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode)-1))
-   == GET_MODE_BITSIZE (mode)-1
+  "TARGET_USE_BT
+   && (INTVAL (operands[2]) & (GET_MODE_BITSIZE (mode)-1))
+  == GET_MODE_BITSIZE (mode)-1
&& can_create_pseudo_p ()"
   "#"
   "&& 1"


Re: Add support to trace comparison instructions and switch statements

2017-09-04 Thread Jakub Jelinek
On Mon, Sep 04, 2017 at 09:36:40PM +0800, 吴潍浠(此彼) wrote:
> gcc/ChangeLog: 
> 
> 2017-09-04  Wish Wu   
> 
> * asan.c (initialize_sanitizer_builtins):  
> * builtin-types.def (BT_FN_VOID_UINT8_UINT8):  
> (BT_FN_VOID_UINT16_UINT16):
> (BT_FN_VOID_UINT32_UINT32):
> (BT_FN_VOID_FLOAT_FLOAT):  
> (BT_FN_VOID_DOUBLE_DOUBLE):
> (BT_FN_VOID_UINT64_PTR):   
> * common.opt:  
> * flag-types.h (enum sanitize_coverage_code):  
> * opts.c (COVERAGE_SANITIZER_OPT): 
> (get_closest_sanitizer_option):
> (parse_sanitizer_options): 
> (common_handle_option):
> * sancov.c (instrument_cond):  
> (instrument_switch):   
> (sancov_pass): 
> * sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP2):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP4):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMP8):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMPF):   
> (BUILT_IN_SANITIZER_COV_TRACE_CMPD):   
> (BUILT_IN_SANITIZER_COV_TRACE_SWITCH): 

mklog just generates a template, you need to fill in the details
on what has been changed or added or removed.  See other ChangeLog
entries etc. to see what is expected.

> For code :
> void bar (void);
> void
> foo (int x)
> {
>   if (x == 21 || x == 64 || x == 98 || x == 135)
> bar ();
> }
> GIMPLE IL on x86_64:
>   1 
>   2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, 
> symbol_order=0)
>   3 
>   4 foo (int x)
>   5 {

...

That is with -O0 though?  With -O2 you'll see that it changes.
IMNSHO you really want to also handle GIMPLE_ASSIGN with a tcc_comparison
class rhs_code.  It shouldn't be that hard to handle within instrument_cond;
just the way you extract lhs and rhs from the stmt will differ based on
whether it is a GIMPLE_COND or a GIMPLE_ASSIGN (and in the latter case also
whether it is a tcc_comparison rhs_code or a COND_EXPR with a tcc_comparison
first operand).
And I really think we should replace the 2 LOGICAL_OP_NON_SHORT_CIRCUIT
uses in fold-const.c and the one in tree-ssa-ifcombine.c with
  LOGICAL_OP_NON_SHORT_CIRCUIT
  && !flag_sanitize_coverage
with a comment that for sanitize coverage we want to avoid this optimization
because it negatively affects it.
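For readers unfamiliar with the runtime side: with trace-cmp instrumentation the compiler inserts calls such as `__sanitizer_cov_trace_cmp4 (x, 21)` before each comparison, and the fuzzer supplies the callbacks. Only the callback name and signature below follow the sanitizer-coverage interface; the recording scheme and the hand-instrumented `foo ()` are our own illustration:

```c
#include <stdint.h>

static uint32_t last_arg1, last_arg2;

/* Fuzzer-provided callback: a real fuzzer would log (arg1, arg2)
   to guide input mutation toward satisfying the comparison.  */
void
__sanitizer_cov_trace_cmp4 (uint32_t arg1, uint32_t arg2)
{
  last_arg1 = arg1;
  last_arg2 = arg2;
}

/* Hand-instrumented version of one comparison from the foo () above.  */
int
foo (int x)
{
  __sanitizer_cov_trace_cmp4 ((uint32_t) x, 21);
  return x == 21;
}
```

This is why short-circuit folding matters: if the front end merges `x == 21 || x == 64` into bitwise tricks, the individual comparisons (and hence the callbacks) disappear.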

@@ -1611,38 +1631,51 @@ parse_sanitizer_options (const char *p, location_t
}
 
   /* Check to see if the string matches an option class name.  */
-  for (i = 0; sanitizer_opts[i].name != NULL; ++i)
-   if (len == sanitizer_opts[i].len
-   && memcmp (p, sanitizer_opts[i].name, len) == 0)
+  for (i = 0; opts[i].name != NULL; ++i)
+   if (len == opts[i].len
+   && memcmp (p, opts[i].name, len) == 0)
  {
-   /* Handle both -fsanitize and -fno-sanitize cases.  */
-   if (value && sanitizer_opts[i].flag == ~0U)
+   if (code == OPT_fsanitize_coverage_)
  {
-   if (code == OPT_fsanitize_)
- {
-   if (complain)
- error_at (loc, "%<-fsanitize=all%> option is not valid");
- }
+   if (value)
+ flags |= opts[i].flag;
else
- flags |= ~(SANITIZE_THREAD | SANITIZE_LEAK
-| SANITIZE_UNREACHABLE | SANITIZE_RETURN);
+ flags &= ~opts[i].flag;
+   found = true;
+   break;
  }
-   else if (value)
+   else
  {
-   /* Do not enable -fsanitize-recover=unreachable and
-  -fsanitize-recover=return if -fsanitize-recover=undefined
-  is selected.  */
-   if (code == OPT_fsanitize_recover_
-   && sanitizer_opts[i].flag == SANITIZE_UNDEFINED)
- flags |= (SANITIZE_UNDEFINED
-   & ~(SANITIZE_UNREACHABLE | SANITIZE_RETURN));
+   /* Handle both -fsanitize and -fno-sanitize cases.  */
+   if (value && opts[i].flag == ~0U)
+ {
+   if (code == OPT_fsanitize_)
+ {
+   if (complain)
+ error_at (loc,
+   

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Kyrill Tkachov


On 04/09/17 18:07, Wilco Dijkstra wrote:

Kyrill Tkachov wrote:



-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (not:DI (sign_extend:DI
-(match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int
31)))
-  (match_dup 4)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
... but why can we delete all these? As far as I can see they still
perform useful splitting and don't have a NEON-specific pattern.
Am I missing something?

After Bernd's change almost all DI mode instructions are split before register
allocation. So instructions using DI mode no longer exist and thus these
extend variants can never be matched and are thus redundant.


Bernd's patch splits them when we don't have NEON. When NEON is available
though they still maintain the DImode so we'd still benefit from these
transformations, no?


Kyrill


Note that the SI mode instructions have zero/sign extend optimized after
being split from DI mode using existing patterns.

Wilco

 




Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Wilco Dijkstra
Kyrill Tkachov wrote:


> -(define_insn_and_split "*iordi_notsesidi_di"
> -  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
> -   (ior:DI (not:DI (sign_extend:DI
> -    (match_operand:SI 2 "s_register_operand" "r,r")))
> -   (match_operand:DI 1 "s_register_operand" "0,r")))]
> -  "TARGET_THUMB2"
> -  "#"
> -  "TARGET_THUMB2 && reload_completed"
> -  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
> -   (set (match_dup 3) (ior:SI (not:SI
> -   (ashiftrt:SI (match_dup 2) (const_int 
> 31)))
> -  (match_dup 4)))]
> -  "
> -  {
> -    operands[3] = gen_highpart (SImode, operands[0]);
> -    operands[0] = gen_lowpart (SImode, operands[0]);
> -    operands[4] = gen_highpart (SImode, operands[1]);
> -    operands[1] = gen_lowpart (SImode, operands[1]);
> -  }"
> -  [(set_attr "length" "8")
> -   (set_attr "predicable" "yes")
> -   (set_attr "predicable_short_it" "no")
> -   (set_attr "type" "multiple")]
> -)
> -

> ... but why can we delete all these? As far as I can see they still
> perform useful splitting and don't have a NEON-specific pattern.
> Am I missing something?

After Bernd's change almost all DI mode instructions are split before register
allocation. So instructions using DI mode no longer exist and thus these
extend variants can never be matched and are thus redundant.

Note that the SI mode instructions have zero/sign extend optimized after
being split from DI mode using existing patterns.

Wilco



Re: [PATCH][ARM] Fix ldrd offsets

2017-09-04 Thread Kyrill Tkachov


On 27/06/17 16:39, Wilco Dijkstra wrote:



ping

From: Wilco Dijkstra
Sent: 03 November 2016 12:20
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Fix ldrd offsets

Fix ldrd offsets of Thumb-2 - for TARGET_LDRD the range is +-1020,
without it -255..4091.  This reduces the number of addressing instructions
when using DI mode operations (such as in PR77308).

Bootstrap & regress OK.


Ok.

Thanks,
Kyrill



ChangeLog:
2015-11-03  Wilco Dijkstra  

gcc/
* config/arm/arm.c (arm_legitimate_index_p): Add comment.
(thumb2_legitimate_index_p): Use correct range for DI/DF mode.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index 3c4c7042d9c2101619722b5822b3d1ca37d637b9..5d12cf9c46c27d60a278d90584bde36ec86bb3fe 100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -7486,6 +7486,8 @@ arm_legitimate_index_p (machine_mode mode, rtx 
index, RTX_CODE outer,

 {
   HOST_WIDE_INT val = INTVAL (index);

+ /* Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
   if (TARGET_LDRD)
 return val > -256 && val < 256;
   else
@@ -7613,11 +7615,13 @@ thumb2_legitimate_index_p (machine_mode mode, 
rtx index, int strict_p)

   if (code == CONST_INT)
 {
   HOST_WIDE_INT val = INTVAL (index);
- /* ??? Can we assume ldrd for thumb2?  */
- /* Thumb-2 ldrd only has reg+const addressing modes.  */
- /* ldrd supports offsets of +-1020.
-However the ldr fallback does not.  */
- return val > -256 && val < 256 && (val & 3) == 0;
+ /* Thumb-2 ldrd only has reg+const addressing modes.
+Assume we emit ldrd or 2x ldr if !TARGET_LDRD.
+If vldr is selected it uses arm_coproc_mem_operand.  */
+ if (TARGET_LDRD)
+   return IN_RANGE (val, -1020, 1020) && (val & 3) == 0;
+ else
+   return IN_RANGE (val, -255, 4095 - 4);
 }
   else
 return 0;
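The two branches of the new check can be mirrored in plain C (function names are ours, for illustration only): ldrd accepts word-aligned offsets in +-1020, while the 2x-ldr fallback needs both word accesses inside the ldr immediate range -255..4095.

```c
#include <stdbool.h>

/* Mirror of the TARGET_LDRD branch: ldrd offsets are +-1020,
   word-aligned.  */
static bool
thumb2_ldrd_offset_ok (long val)
{
  return val >= -1020 && val <= 1020 && (val & 3) == 0;
}

/* Mirror of the fallback branch: 2x ldr, where the high-word load
   at val + 4 must still fit in the -255..4095 immediate range.  */
static bool
thumb2_two_ldr_offset_ok (long val)
{
  return val >= -255 && val <= 4095 - 4;
}
```

The widened ldrd range is what reduces the addressing-instruction count for DImode accesses mentioned above.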




Re: [PATCH][ARM] Improve max_insns_skipped logic

2017-09-04 Thread Kyrill Tkachov


On 27/06/17 16:38, Wilco Dijkstra wrote:



ping


From: Wilco Dijkstra
Sent: 10 November 2016 17:19
To: GCC Patches
Cc: nd
Subject: [PATCH][ARM] Improve max_insns_skipped logic

Improve the logic when setting max_insns_skipped.  Limit the maximum size
of IT to MAX_INSN_PER_IT_BLOCK as otherwise multiple IT instructions are
needed, increasing codesize.  Given 4 works well for Thumb-2, use the same
limit for ARM for consistency.

ChangeLog:
2016-11-04  Wilco Dijkstra  

* config/arm/arm.c (arm_option_params_internal): Improve setting of
max_insns_skipped.
--

diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c
index f046854e9665d54911616fc1c60fee407188f7d6..29e8d1d07d918fbb2a627a653510dfc8587ee01a 100644

--- a/gcc/config/arm/arm.c
+++ b/gcc/config/arm/arm.c
@@ -2901,20 +2901,12 @@ arm_option_params_internal (void)
   targetm.max_anchor_offset = TARGET_MAX_ANCHOR_OFFSET;
 }

-  if (optimize_size)
-{
-  /* If optimizing for size, bump the number of instructions that we
- are prepared to conditionally execute (even on a StrongARM).  */
-  max_insns_skipped = 6;
+  /* Increase the number of conditional instructions with -Os.  */
+  max_insns_skipped = optimize_size ? 4 : current_tune->max_insns_skipped;


-  /* For THUMB2, we limit the conditional sequence to one IT block.  */

-  if (TARGET_THUMB2)
-max_insns_skipped = arm_restrict_it ? 1 : 4;
-}
-  else
-/* When -mrestrict-it is in use tone down the if-conversion.  */
-max_insns_skipped = (TARGET_THUMB2 && arm_restrict_it)
-  ? 1 : current_tune->max_insns_skipped;
+  /* For THUMB2, we limit the conditional sequence to one IT block.  */
+  if (TARGET_THUMB2)
+max_insns_skipped = MIN (max_insns_skipped, MAX_INSN_PER_IT_BLOCK);


I like the simplifications in the selection logic here :)
However, changing the value for ARM from 6 to 4 looks a bit arbitrary to me.
There's probably a reason why the default values for ARM and Thumb-2 are
different (maybe not a good one) and I'd rather not change it without some
code size data measurements.

That said, I'd rather not let that hold up this cleanup patch, so this is
ok (assuming a normal bootstrap and testing cycle) without changing the 6
to a 4, and you can propose the change to 4 as a separate patch that can
be discussed on its own.


Thanks,
Kyrill


 }

 /* True if -mflip-thumb should next add an attribute for the default





Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-04 Thread Andrew Pinski
On Mon, Sep 4, 2017 at 7:28 AM, Tamar Christina  wrote:
>> >   vect__5.25_58 = VIEW_CONVERT_EXPR> intD.11>(vect__4.21_65);
>> >   vect__5.25_57 = VIEW_CONVERT_EXPR> intD.11>(vect__4.22_63);
>> >   vect__5.25_56 = VIEW_CONVERT_EXPR> intD.11>(vect__4.23_61);
>> >   vect__5.25_55 = VIEW_CONVERT_EXPR> > intD.11>(vect__4.24_59);
>> >
>> > I suspect this patch will be quite bad for us performance-wise as it
>> > thinks it's as cheap to do all our integer operations on the vector
>> > side with vectors of 1 element. But I'm still waiting for the perf
>> > numbers to confirm.
>>
>> Looks like the backend advertises that it can do POPCOUNT on V1DI.  So SLP
>> vectorization decides it can vectorize this without unrolling.
>
> We don't; POPCOUNT is only defined for vector modes V8QI and V16QI, and
> we don't define support for V1DI anywhere in the backend.  We do say we
> support V1DF, but removing that doesn't make the ICE go away.
>
>> Vectorization with V2DI is rejected:
>>
>> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not
>> vectorizable.
>> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized:
>> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
>> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: * Re-trying
>> analysis with vector size 8
>>
>> and that now succeeds (it probably didn't succeed before the patch).
>
> In the .optimized file, I see it vectorised it to
>
>   vect__5.25_58 = VIEW_CONVERT_EXPR intD.11>(vect__4.21_65);
>   vect__5.25_57 = VIEW_CONVERT_EXPR intD.11>(vect__4.22_63);
>   vect__5.25_56 = VIEW_CONVERT_EXPR intD.11>(vect__4.23_61);
>   vect__5.25_55 = VIEW_CONVERT_EXPR intD.11>(vect__4.24_59);
>   _54 = POPCOUNT (vect__5.25_58);
>   _53 = POPCOUNT (vect__5.25_57);
>
> Which is something we just don't have a pattern for.  Before this patch
> it was rejecting "long unsigned int"; with this patch it somehow thinks
> we support an integer vector of 1 element, even though 1) we don't have
> an optab defined for this operation for POPCOUNT (or at all in aarch64
> as far as I can tell), and 2) we don't have it in our supported list of
> vector modes.

Here are the two popcount optab aarch64 has:
(define_mode_iterator GPI [SI DI])
(define_expand "popcount2"
  [(match_operand:GPI 0 "register_operand")
   (match_operand:GPI 1 "register_operand")]


(define_insn "popcount2"
(define_mode_iterator VB [V8QI V16QI])
  [(set (match_operand:VB 0 "register_operand" "=w")
(popcount:VB (match_operand:VB 1 "register_operand" "w")))]

As you can see we only define popcount optab for SI, DI, V8QI and
V16QI.  (note SI and DI uses the V8QI and V16QI during the expansion
but that is a different story).

Maybe somehow the vectorizer is thinking V1DI and DI are
interchangeable in some places.
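For reference, `__builtin_popcountl` is the builtin the testcase feeds the vectorizer; on aarch64 the scalar DI expansion goes through the V8QI cnt instruction per the optabs quoted above. A portable reference implementation for comparison (our illustration, not GCC's expansion):

```c
#include <stdint.h>

/* Kernighan's method: clear the lowest set bit each iteration,
   counting how many iterations it takes.  */
static int
popcount64_ref (uint64_t x)
{
  int n = 0;
  for (; x; x &= x - 1)
    n++;
  return n;
}
```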

Thanks,
Andrew


>
> So I don't quite understand how we end up with this expression.
>
> Regards,
> Tamar
>
>>
>> Richard.
>>
>> > 
>> > From: gcc-patches-ow...@gcc.gnu.org 
>> on
>> > behalf of Andreas Schwab 
>> > Sent: Saturday, September 2, 2017 10:58 PM
>> > To: Jon Beniston
>> > Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
>> > Subject: Re: [RFC, vectorizer] Allow single element vector types for
>> > vector reduction operations
>> >
>> > On Aug 30 2017, "Jon Beniston"  wrote:
>> >
>> > > gcc/
>> > > 2017-08-30  Jon Beniston  
>> > > Richard Biener  
>> > >
>> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
>> > > index cfdb72c..5ebeac2 100644
>> > > --- a/gcc/tree-vect-patterns.c
>> > > +++ b/gcc/tree-vect-patterns.c
>> > > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func
>> *recog_func,
>> > >loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>> > >
>> > >if (VECTOR_BOOLEAN_TYPE_P (type_in)
>> > > -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>> > > +  || VECTOR_TYPE_P (type_in))
>> > >  {
>> > >/* No need to check target support (already checked by the pattern
>> > >   recognition function).  */ diff --git
>> > > a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index
>> > > 013fb1f..fc62efb 100644
>> > > --- a/gcc/tree-vect-stmts.c
>> > > +++ b/gcc/tree-vect-stmts.c
>> > > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
>> > > scalar_type, unsigned size)
>> > >else
>> > >  simd_mode = mode_for_vector (inner_mode, size / nbytes);
>> > >nunits = GET_MODE_SIZE (simd_mode) / nbytes;
>> > > -  if (nunits <= 1)
>> > > +  /* NOTE: nunits == 1 is allowed to support single element vector
>> > > +     types.  */
>> > > +  if (nunits < 1)
>> > >   

Re: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

2017-09-04 Thread Kyrill Tkachov

Hi Wilco,

Sorry for the delay...

On 27/06/17 16:38, Wilco Dijkstra wrote:



ping


From: Wilco Dijkstra
Sent: 17 January 2017 18:00
To: GCC Patches
Cc: nd; Kyrylo Tkachov; Richard Earnshaw
Subject: [PATCH][ARM] Remove Thumb-2 iordi_not patterns

After Bernd's DImode patch [1] almost all DImode operations are expanded
early (except for -mfpu=neon). This means the Thumb-2 iordi_notdi_di
patterns are no longer used - the split ORR and NOT instructions are merged
into ORN by Combine.  With -mfpu=neon the iordi_notdi_di patterns are used
on Thumb-2, and after this patch the orndi3_neon pattern matches instead
(which still emits ORN).  After this there are no Thumb-2 specific DImode
patterns.


[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html
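The operation the removed patterns cover, restated as C for readers following along (the function name is ours; the two halves correspond to the ORN-shaped pieces the splitter, and now Combine, produces):

```c
#include <stdint.h>

/* (ior:DI (not:DI op1) op2) lowered to two 32-bit ORN-shaped halves,
   matching the split the *iordi_notdi_di pattern used to perform.  */
static uint64_t
iordi_notdi (uint64_t op1, uint64_t op2)
{
  uint32_t lo = ~(uint32_t) op1 | (uint32_t) op2;                 /* ORN low  */
  uint32_t hi = ~(uint32_t) (op1 >> 32) | (uint32_t) (op2 >> 32); /* ORN high */
  return ((uint64_t) hi << 32) | lo;
}
```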

ChangeLog:
2017-01-17  Wilco Dijkstra  

* config/arm/thumb2.md (iordi_notdi_di): Remove pattern.
(iordi_notzesidi_di): Likewise.
(iordi_notdi_zesidi): Likewise.
(iordi_notsesidi_di): Likewise.

--

diff --git a/gcc/config/arm/thumb2.md b/gcc/config/arm/thumb2.md
index 2e7580f220eae1524fef69719b1796f50f5cf27c..91471d4650ecae4f4e87b549d84d11adf3014ad2 100644

--- a/gcc/config/arm/thumb2.md
+++ b/gcc/config/arm/thumb2.md
@@ -1434,103 +1434,6 @@
(set_attr "type" "alu_sreg")]
 )

-; Constants for op 2 will never be given to these patterns.
-(define_insn_and_split "*iordi_notdi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (not:DI (match_operand:DI 1 "s_register_operand" "0,r"))
-   (match_operand:DI 2 "s_register_operand" "r,0")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 1)) (match_dup 2)))
-   (set (match_dup 3) (ior:SI (not:SI (match_dup 4)) (match_dup 5)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-operands[5] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)


So I get why we delete this, since we have orndi3_neon in neon.md for 
the NEON version anyway...



-
-(define_insn_and_split "*iordi_notzesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (not:DI (zero_extend:DI
-(match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,?r")))]
-  "TARGET_THUMB2"
-  "#"
-  ; (not (zero_extend...)) means operand0 will always be 0x
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (const_int -1))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "4,8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notdi_zesidi"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (not:DI (match_operand:DI 2 "s_register_operand" "0,?r"))
-   (zero_extend:DI
-(match_operand:SI 1 "s_register_operand" "r,r"))))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (not:SI (match_dup 4)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-operands[4] = gen_highpart (SImode, operands[2]);
-operands[2] = gen_lowpart (SImode, operands[2]);
-  }"
-  [(set_attr "length" "8")
-   (set_attr "predicable" "yes")
-   (set_attr "predicable_short_it" "no")
-   (set_attr "type" "multiple")]
-)
-
-(define_insn_and_split "*iordi_notsesidi_di"
-  [(set (match_operand:DI 0 "s_register_operand" "=&r,&r")
-   (ior:DI (not:DI (sign_extend:DI
-(match_operand:SI 2 "s_register_operand" "r,r")))
-   (match_operand:DI 1 "s_register_operand" "0,r")))]
-  "TARGET_THUMB2"
-  "#"
-  "TARGET_THUMB2 && reload_completed"
-  [(set (match_dup 0) (ior:SI (not:SI (match_dup 2)) (match_dup 1)))
-   (set (match_dup 3) (ior:SI (not:SI
-   (ashiftrt:SI (match_dup 2) (const_int 31)))
-  (match_dup 4)))]
-  "
-  {
-operands[3] = gen_highpart (SImode, operands[0]);
-operands[0] = gen_lowpart (SImode, operands[0]);
-operands[4] = gen_highpart (SImode, operands[1]);
-operands[1] = gen_lowpart (SImode, operands[1]);
-  }"
-  [(set_attr "length" "8")
-   

Re: [PATCH, ARM] Further improve stack usage in sha512, part 2 (PR 77308)

2017-09-04 Thread Kyrill Tkachov

Hi Bernd,

On 29/04/17 18:52, Bernd Edlinger wrote:

Ping...

I attached the latest version of my patch.


Thanks
Bernd.

On 12/18/16 14:14, Bernd Edlinger wrote:

Hi,

this splits the *arm_negdi2, *arm_cmpdi_insn and *arm_cmpdi_unsigned
also at split1 except for TARGET_NEON and TARGET_IWMMXT.

In the new test case the stack is reduced to about 270 bytes, except
for neon and iwmmxt, where this does not change anything.

This patch depends on [1] and [2] before it can be applied.

Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


Thanks
Bernd.



[1] https://gcc.gnu.org/ml/gcc-patches/2016-11/msg02796.html
[2] https://gcc.gnu.org/ml/gcc-patches/2016-12/msg01562.html


2016-12-18  Bernd Edlinger

PR target/77308
* config/arm/arm.md (*arm_negdi2, *arm_cmpdi_insn,
*arm_cmpdi_unsigned): Split early except for TARGET_NEON and
TARGET_IWMMXT.

You're changing negdi2_insn rather than *arm_negdi2.

Ok with the fixed ChangeLog once the prerequisite is committed.

Thanks,
Kyrill




Re: [PATCH] PR libstdc++/79162 ambiguity in string assignment due to string_view overload (LWG 2946)

2017-09-04 Thread Jonathan Wakely

On 30/07/17 15:01 +0200, Daniel Krügler wrote:

2017-07-28 22:40 GMT+02:00 Daniel Krügler :

2017-07-28 22:29 GMT+02:00 Daniel Krügler :

2017-07-28 22:25 GMT+02:00 Tim Song :

On Fri, Jul 28, 2017 at 4:10 PM, Daniel Krügler
 wrote:

+  // Performs an implicit conversion from _Tp to __sv_type.
+  template<typename _Tp>
+static __sv_type _S_to_string_view(const _Tp& __svt)
+{
+  return __svt;
+}


I might have gone for

+static __sv_type _S_to_string_view(__sv_type __svt) noexcept
+{
+  return __svt;
+}

With that, we can also use noexcept(_S_to_string_view(__t)) to make up
for the absence of is_nothrow_convertible (basically the same thing I
did in LWG 2993's PR).


Agreed, that makes very much sense. I will adjust the P/R, but before
I resubmit I would like to get feedback whether the other two compare
functions also should become conditionally noexcept.


Locally I have now performed the sole change of the _S_to_string_view
declaration getting rid of the template, but would also like to gather
feedback from the maintainers whether I should also change the form of
the conditional noexcept to use the expression

noexcept(_S_to_string_view(__t))

instead of the current

is_same<_Tp, __sv_type>::value

as suggested by Tim Song.

I'm asking also, because I have a paper proposing to standardize
is_nothrow_convertible submitted for the upcoming C++ mailing - This
would be one of the first applications in the library ;-)


A slightly revised patch update: It replaces the _S_to_string_view
template by a simpler _S_to_string_view function as of Tim Song's
suggestion, but still uses the simplified noexcept specification
deferring it to a future application case for is_nothrow_convertible.
Furthermore now all three compare function templates are now
(conditionally) noexcept by an (off-list) suggestion from Jonathan
Wakely.


I've committed this, after some whitespace fixes and testing.

Thanks!



Re: [RFC PATCH, i386]: Convert TLS location to DEFAULT_TLS_SEG_REG address space

2017-09-04 Thread Uros Bizjak
On Thu, Aug 31, 2017 at 9:32 PM, Uros Bizjak  wrote:
> Hello!
>
> Using following testcase:
>
> --cut here--
> __thread int a;
>
> int foo (void)
> {
>   return a;
> }
> --cut here--
>
> Attached patch converts TLS location in the form of:
>
> (mem/c:SI (plus:DI (unspec:DI [
> (const_int 0 [0])
> ] UNSPEC_TP)
> (const:DI (unspec:DI [
> (symbol_ref:DI ("a") [flags 0x2a]
> )
> ] UNSPEC_NTPOFF))) [1 a+0 S4 A32]))
> "thread.c":5 82 {*movsi_internal}
> to:
>
> (mem/c:SI (const:DI (unspec:DI [
> (symbol_ref:DI ("a") [flags 0x2a] <var_decl 0x7f8c3acd5e10 a>)
> ] UNSPEC_NTPOFF)) [1 a+0 S4 A32 AS1])) "thread.c":5 -1
>
> avoiding the UNSPEC_TP tag and instead mark the location with AS.
>
> This way, address space becomes the property of the memory location,
> not of the address itself, so we don't need ugly hacks when the
> address is moved to a register (LEA ignores segment overrides, c.f.
> split_long_move function in i386.c).
>
> Please note that some instructions (e.g. prefetchX) operate on
> prefixed *address*, so we can't just rip out non-AS code from
> print_operand. The above amended example:
>
> --cut here--
> __thread int a;
>
> int foo (void)
> {
>   __builtin_prefetch (, 0);
>   return a;
> }
> --cut here--
>
> compiles to:
>
> prefetcht0  %fs:a@tpoff
> movl%fs:a@tpoff, %eax
>
> where prefetch operand remains:
>
> (insn 7 6 16 2 (prefetch (plus:DI (unspec:DI [
> (const_int 0 [0])
> ] UNSPEC_TP)
> (const:DI (unspec:DI [
> (symbol_ref:DI ("a") [flags 0x2a] <var_decl 0x7fe994a14e10 a>)
> ] UNSPEC_NTPOFF)))
> (const_int 0 [0])
> (const_int 3 [0x3])) "thread.c":5 1010 {*prefetch_sse}
>  (nil))
>
> 2017-08-31  Uros Bizjak  
>
> * config/i386/i386-protos.h (ix86_tls_address_pattern_p) New prototype.
> (ix86_rewrite_tls_address): Ditto.
> * config/i386/i386.c (ix86_tls_address_pattern_p) New function.
> (ix86_rewrite_tls_address_1): Ditto.
> (ix86_rewrite_tls_address): Ditto.
> * config/i386/predicates.md (tls_address_pattern): New predicate.
> * config/i386/i386.md (TLS address splitter): New splitter.
>
> Patch was bootstrapped and regression tested on x86_64-linux-gnu
> {,-m32}, all default languages plus go.
>
> I plan to commit the patch to the mainline early next week.

Committed to mainline SVN.

Uros


Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-04 Thread Christophe Lyon
On 30 August 2017 at 17:11, Jon Beniston  wrote:
> Hi Richard,
>
>> I think the issue is that with TARGET_VECTOR_MODE_SUPPORTED_P false
>> for V1SI you'll get a SImode vector type and the
>>
>>  if (VECTOR_BOOLEAN_TYPE_P (type_in)
>>  || VECTOR_MODE_P (TYPE_MODE (type_in)))
>>
>>check fails.  Try changing that to || VECTOR_TYPE_P (type_in).
>>The else path should be hopefully dead ...
>>
>>> It looks to me like the other call sites of optab_for_tree_code which
>>> are passing optab_default are mostly places where GCC is sure it is
>>> not a shift operation.
>>> Given TYPE_IN is returned from get_vectype_for_scalar_type, is it
>>> safe to simply pass optab_vector in vect_pattern_recog_1?
>>
>>I guess so.  Can you also try the above?
>
> Changing VECTOR_MODE_P check into VECTOR_TYPE_P check also works for me, I
> verified the following patch could vectorize my dot product case and there
> is no regression on gcc tests on my port.
>
> Meanwhile x86-64 bootstraps OK and no regression on gcc/g++ test.
>
>  Does this look OK?
>
>
> gcc/
> 2017-08-30  Jon Beniston  
> Richard Biener  
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cfdb72c..5ebeac2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
>loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
>if (VECTOR_BOOLEAN_TYPE_P (type_in)
> -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
> +  || VECTOR_TYPE_P (type_in))
>  {
>/* No need to check target support (already checked by the pattern
>   recognition function).  */
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 013fb1f..fc62efb 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> scalar_type, unsigned size)
>else
>  simd_mode = mode_for_vector (inner_mode, size / nbytes);
>nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> -  if (nunits <= 1)
> +  /* NOTE: nunits == 1 is allowed to support single element vector types.
> */
> +  if (nunits < 1)
>  return NULL_TREE;
>
>vectype = build_vector_type (scalar_type, nunits);
>
>

Hi,

This patch makes this test fail:
gcc.dg/tree-ssa/ssa-dom-cse-2.c scan-tree-dump optimized "return 28;"
on arm-none-linux-gnueabi when using the default fpu, or on
arm-none-linux-gnueabihf --with-fpu=vfp (as opposed to
--with-fpu=neon*)

Christophe


Re: [PATCH] Handle wide-chars in native_encode_string

2017-09-04 Thread Joseph Myers
On Mon, 4 Sep 2017, Richard Biener wrote:

> always have a consistent "character" size and how the individual
> "characters" are encoded.  The patch assumes that the array element
> type of the STRING_CST can be used to get access to individual
> characters by means of the element type size and those elements
> are stored in host byteorder.  Which means the patch simply handles

It's actually target byte order, i.e. the STRING_CST stores the same 
sequence of target bytes as would appear on the target system (modulo 
certain strings such as in asm statements and attributes, for which 
translation to the execution character set is disabled because those 
strings are only processed in the compiler on the host, not on the target 
- but you should never encounter such strings in the optimizers etc.).  
This is documented in generic.texi (complete with a warning about how it's 
not well-defined what the encoding is if target bytes are not the same as 
host bytes).

I suspect that, generically in the compiler, the use of C++ might make it 
easier than it would have been some time ago to build some abstractions 
around target strings that work for all of narrow strings, wide strings, 
char16_t strings etc. (for extracting individual elements - or individual 
characters which might be multibyte characters in the narrow string case, 
etc.) - as would be useful for e.g. wide string format checking and more 
generally for making e.g. optimizations for narrow strings also work for 
wide strings.  (Such abstractions wouldn't solve the question of what the 
format is if host and target bytes differ, but their use would reduce the 
number of places needing changing to establish a definition of the format 
in that case if someone were to do a port to a system with bytes bigger 
than 8 bits.)

However, as I understand the place you're patching, it doesn't have any 
use for such an abstraction; it just needs to copy a sequence of bytes 
from one place to another.  (And even with host bytes different from 
target bytes, clearly it would make sense to define the internal 
interfaces to make the encodings consistent so this function still only 
needs to copy bytes from one place to another and still doesn't need such 
abstractions.)

-- 
Joseph S. Myers
jos...@codesourcery.com


Re: [PING**2] [PATCH, ARM] Further improve stack usage on sha512 (PR 77308)

2017-09-04 Thread Kyrill Tkachov


On 29/04/17 18:45, Bernd Edlinger wrote:

Ping...

I attached a rebased version since there was a merge conflict in
the xordi3 pattern, otherwise the patch is still identical.
It splits adddi3, subdi3, anddi3, iordi3, xordi3 and one_cmpldi2
early when the target has no neon or iwmmxt.


Thanks
Bernd.



On 11/28/16 20:42, Bernd Edlinger wrote:

On 11/25/16 12:30, Ramana Radhakrishnan wrote:

On Sun, Nov 6, 2016 at 2:18 PM, Bernd Edlinger
 wrote:

Hi!

This improves the stack usage on the sha512 test case for the case
without hardware fpu and without iwmmxt by splitting all di-mode
patterns right while expanding which is similar to what the
shift-pattern
does.  It does nothing in the case iwmmxt and fpu=neon or vfp as well as
thumb1.


I would go further and do this in the absence of Neon, the VFP unit
being there doesn't help with DImode operations i.e. we do not have 64
bit integer arithmetic instructions without Neon. The main reason why
we have the DImode patterns split so late is to give a chance for
folks who want to do 64 bit arithmetic in Neon a chance to make this
work as well as support some of the 64 bit Neon intrinsics which IIRC
map down to these instructions. Doing this just for soft-float doesn't
improve the default case only. I don't usually test iwmmxt and I'm not
sure who has the ability to do so, thus keeping this restriction for
iwMMX is fine.



Yes I understand, thanks for pointing that out.

I was not aware what iwmmxt exists at all, but I noticed that most
64bit expansions work completely different, and would break if we split
the pattern early.

I can however only look at the assembler outout for iwmmxt, and make
sure that the stack usage does not get worse.

Thus the new version of the patch keeps only thumb1, neon and iwmmxt as
it is: around 1570 (thumb1), 2300 (neon) and 2200 (wimmxt) bytes stack
for the test cases, and vfp and soft-float at around 270 bytes stack
usage.


It reduces the stack usage from 2300 to near optimal 272 bytes (!).

Note this also splits many ldrd/strd instructions and therefore I will
post a followup-patch that mitigates this effect by enabling the
ldrd/strd
peephole optimization after the necessary reg-testing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.

What do you mean by arm-linux-gnueabihf - when folks say that I
interpret it as --with-arch=armv7-a --with-float=hard
--with-fpu=vfpv3-d16 or (--with-fpu=neon).

If you've really bootstrapped and regtested it on armhf, doesn't this
patch as it stand have no effect there i.e. no change ?
arm-linux-gnueabihf usually means to me someone has configured with
--with-float=hard, so there are no regressions in the hard float ABI
case,


I know it proves little.  When I say arm-linux-gnueabihf
I do in fact mean --enable-languages=all,ada,go,obj-c++
--with-arch=armv7-a --with-tune=cortex-a9 --with-fpu=vfpv3-d16
--with-float=hard.

My main interest in the stack usage is of course not because of linux,
but because of eCos where we have very small task stacks and in fact
no fpu support by the O/S at all, so that patch is exactly what we need.


Bootstrapped and reg-tested on arm-linux-gnueabihf
Is it OK for trunk?


The code is ok.
AFAICS testing this with --with-fpu=vfpv3-d16 does exercise the new code
as the splits will happen for !TARGET_NEON (it is of course
!TARGET_IWMMXT, and TARGET_IWMMXT2 is irrelevant here).

So this is ok for trunk.
Thanks, and sorry again for the delay.
Kyrill



Thanks
Bernd.




Re: [PATCH GCC]A simple implementation of loop interchange

2017-09-04 Thread Bin.Cheng
On Mon, Sep 4, 2017 at 2:54 PM, Richard Biener
 wrote:
> On Wed, Aug 30, 2017 at 6:32 PM, Bin.Cheng  wrote:
>> On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener
>>  wrote:
>>> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng  wrote:
 Hi,
 This patch implements a simple loop interchange pass in GCC, as described 
 by its comments:
 +/* This pass performs loop interchange: for example, the loop nest
 +
 +   for (int j = 0; j < N; j++)
 + for (int k = 0; k < N; k++)
 +   for (int i = 0; i < N; i++)
 +c[i][j] = c[i][j] + a[i][k]*b[k][j];
 +
 +   is transformed to
 +
 +   for (int i = 0; i < N; i++)
 + for (int j = 0; j < N; j++)
 +   for (int k = 0; k < N; k++)
 +c[i][j] = c[i][j] + a[i][k]*b[k][j];
 +
 +   This pass implements loop interchange in the following steps:
 +
 + 1) Find perfect loop nest for each innermost loop and compute data
 +   dependence relations for it.  For above example, loop nest is
 +   .
 + 2) From innermost to outermost loop, this pass tries to interchange
 +   each loop pair.  For above case, it firstly tries to interchange
 +    and loop nest becomes .
 +   Then it tries to interchange  and loop nest becomes
 +   .  The overall effect is to move innermost
 +   loop to the outermost position.  For loop pair 
 +   to be interchanged, we:
 + 3) Check if data dependence relations are valid for loop interchange.
 + 4) Check if both loops can be interchanged in terms of 
 transformation.
 + 5) Check if interchanging the two loops is profitable.
 + 6) Interchange the two loops by mapping induction variables.
 +
 +   This pass also handles reductions in loop nest.  So far we only support
 +   simple reduction of inner loop and double reduction of the loop nest.  
 */

 Actually, this pass only does loop shifting, which moves the inner loop to
 the outermost position, rather than general permutation.  Also, as a
 traditional loop optimizer, it only works for
 perfect loop
 nest.  I put it just after loop distribution thus ideally loop 
 split/distribution could
 create perfect nest for it.  Unfortunately, we don't get any perfect nest 
 from distribution
 for now because it only works for innermost loop.  For example, the 
 motivation case in
 spec2k6/bwaves is not handled on this pass alone.  I have a patch 
 extending distribution
 for (innermost) loop nest and with that patch bwaves case can be handled.
 Another point is I deliberately make both the cost model and code 
 transformation (very)
 conservative.  We can support more cases, or more transformations with 
 great care when
 it is for sure known beneficial.  IMHO, we already hit over-baked issues 
 quite often and
 don't want to introduce more.
 As for code generation, this patch has an issue that invariant code in 
 outer loop could
 be moved to inner loop.  For the moment, we rely on the last lim pass to 
 handle all INV
 generated during interchange.  In the future, we may need to avoid that in 
 interchange
 itself, or another lim pass just like the one after graphite optimizations.

 Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
 successfully.
 Note this pass is disabled in patch, while the code is exercised by 
 bootstrap/building
 programs with it enabled by default.  Any comments?
>>>
>> Thanks for quick review.
>>> +/* The same as above, but this one is only used for interchanging not
>>> +   innermost loops.  */
>>> +#define OUTER_STRIDE_RATIO (2)
>>>
>>> please make all these knobs --params.
>>>
>>> +/* Enum type for loop reduction variable.  */
>>> +enum reduction_type
>>> +{
>>> +  UNKNOWN_RTYPE = 0,
>>> +  SIMPLE_RTYPE,
>>> +  DOUBLE_RTYPE
>>> +};
>>>
>>> seeing this we should have some generic data structure / analysis for
>>> reduction detection.  This adds a third user (autopar and vectorizer
>>> are the others).  Just an idea.
>>>
>>> +/* Return true if E is abnormal edge.  */
>>> +
>>> +static inline bool
>>> +abnormal_edge (edge e)
>>> +{
>>> +  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
>>> +}
>>>
>>> bad name/comment for what it does.
>>>
>>> ... jumping to end of file / start of pass
>>>
>>> +  /* Get the outer loop.  */
>>> +  loop = superloop_at_depth (loop, loop_depth (loop) - 1);
>>>
>>> loop_outer (loop)?
>>>
>>> +  /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
>>> +depends on outer loop's IV.  */
>>> +  if 

Re: [PATCH][aarch64] Enable ifunc resolver attribute by default

2017-09-04 Thread Szabolcs Nagy
On 12/06/17 16:02, Steve Ellcey wrote:
> I recently noticed that the GCC 'resolver' attribute used for ifunc's is not
> on by default for aarch64 even though all the infrastructure to support it is
> in place.  I made memcpy an ifunc on aarch64 in glibc and am looking at
> possibly using it for libatomic too.  For this reason I would like to enable
> it by default.  Note that the memcpy ifunc works even when this is not enabled
> because glibc enables ifuncs by using the assembly language .type pseudo-op to
> set the resolver attribute when GCC cannot do it with an attribute.  Using
> an ifunc in libatomic does require this to be enabled and I do not see any
> reason not to have it enabled by default.
> 
> Tested with no regressions, OK to check in?
> 

this is not the right default for bionic, uclibc and musl

(gcc does not distinguish between supporting ifunc in the
compiler vs runtime, so when ifunc is enabled it is assumed
the c runtime will have support too, hence libatomic and
libgcc starts using ifuncs which breaks at runtime)

so don't change the default if target matches
*-*-*android*|*-*-*uclibc*|*-*-*musl*)

(i think the default should be kept "no" for these targets
independently of cpu arch, so the current logic that is
repeated many places in config.gcc is suboptimal.

and i think the attribute syntax should be always supported
and this setting should only mean that ifunc use is allowed
in the runtime libraries.)



RE: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-04 Thread Tamar Christina
> >   vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.21_65);
> >   vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.22_63);
> >   vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.23_61);
> >   vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.24_59);
> >
> > I suspect this patch will be quite bad for us performance wise as it
> > thinks it's as cheap to do all our integer operations on the vector side 
> > with
> vectors of 1 element. But I'm still waiting for the perf numbers to confirm.
> 
> Looks like the backend advertises that it can do POPCOUNT on V1DI.  So SLP
> vectorization decides it can vectorize this without unrolling.

We don't: POPCOUNT is only defined for the vector modes V8QI and V16QI, and we
don't define support for V1DI anywhere in the backend. We do however say we
support V1DF, but removing that doesn't cause the ICE to go away.

> Vectorization with V2DI is rejected:
> 
> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not
> vectorizable.
> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized:
> relevant stmt not supported: _8 = __builtin_popcountl (_5); ...
> /tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: * Re-trying
> analysis with vector size 8
> 
> and that now succeeds (it probably didn't succeed before the patch).

In the .optimized file, I see it vectorised it to 

  vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.21_65);
  vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.22_63);
  vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.23_61);
  vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.24_59);
  _54 = POPCOUNT (vect__5.25_58);
  _53 = POPCOUNT (vect__5.25_57);

Which is something we just don't have a pattern for. Before this patch, it was
rejecting "long unsigned int". With this patch it somehow thinks we support an
integer vector of one element, even though 1) we don't have an optab defined
for this POPCOUNT operation (or at all in aarch64 as far as I can tell), and
2) we don't have it in our supported list of vector modes.

So I don't quite understand how we end up with this expression.

Regards,
Tamar

> 
> Richard.
> 
> > 
> > From: gcc-patches-ow...@gcc.gnu.org 
> on
> > behalf of Andreas Schwab 
> > Sent: Saturday, September 2, 2017 10:58 PM
> > To: Jon Beniston
> > Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
> > Subject: Re: [RFC, vectorizer] Allow single element vector types for
> > vector reduction operations
> >
> > On Aug 30 2017, "Jon Beniston"  wrote:
> >
> > > gcc/
> > > 2017-08-30  Jon Beniston  
> > > Richard Biener  
> > >
> > > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > > index cfdb72c..5ebeac2 100644
> > > --- a/gcc/tree-vect-patterns.c
> > > +++ b/gcc/tree-vect-patterns.c
> > > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func
> *recog_func,
> > >loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> > >
> > >if (VECTOR_BOOLEAN_TYPE_P (type_in)
> > > -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
> > > +  || VECTOR_TYPE_P (type_in))
> > >  {
> > >/* No need to check target support (already checked by the pattern
> > >   recognition function).  */ diff --git
> > > a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c index
> > > 013fb1f..fc62efb 100644
> > > --- a/gcc/tree-vect-stmts.c
> > > +++ b/gcc/tree-vect-stmts.c
> > > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> > > scalar_type, unsigned size)
> > >else
> > >  simd_mode = mode_for_vector (inner_mode, size / nbytes);
> > >nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> > > -  if (nunits <= 1)
> > > +  /* NOTE: nunits == 1 is allowed to support single element vector types.
> > > */
> > > +  if (nunits < 1)
> > >  return NULL_TREE;
> > >
> > >vectype = build_vector_type (scalar_type, nunits);
> > >
> > >
> > >
> >
> > That breaks vect/pr68577.c on aarch64.
> >
> > during RTL pass: expand
> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In function
> 'slp_test':
> > /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12:
> > internal compiler error: in simplify_subreg, at simplify-rtx.c:6050
> > 0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode,
> unsigned int)
> > ../../gcc/simplify-rtx.c:6049
> > 0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode,
> unsigned int)
> > ../../gcc/simplify-rtx.c:6278
> > 0x81d277 store_bit_field_1
> > ../../gcc/expmed.c:798
> > 0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long, unsigned
> long, unsigned long, machine_mode, rtx_def*, bool)
> > 

[PATCH] Handle wide-chars in native_encode_string

2017-09-04 Thread Richard Biener

The following patch cures the missed optimization in PR82084, vectorizing
a wide-char initializer

  wchar_t strs[4][2]= {  L"A", L"B", L"C" , L"D"};

with AVX to

  MEM[(wchar_t[2] *)] = { 65, 66, 67, 68 };

it's not entirely clear to me whether the individual STRING_CSTs we
build for the gimplified code

strs[0] = "A";
strs[1] = "B";
strs[2] = "C";
strs[3] = "D";

always have a consistent "character" size and how the individual
"characters" are encoded.  The patch assumes that the array element
type of the STRING_CST can be used to get access to individual
characters by means of the element type size and those elements
are stored in host byteorder.  Which means the patch simply handles
16bit and 32bit "characters" as 16bit and 32bit integers and encodes
them with the same rules as such integers.

Joseph, are there more considerations for encoding the target
representation of STRING_CSTs?

Looks I was too lazy to lookup answers to those questions from
the RTL expansion code which hopefully outputs constant initializers
properly.

Apart from vectorization in the mentioned testcase we also gain
constant folding of pieces from this change (but I don't adjust
fold_read_from_constant_string yet).

Thanks,
Richard.

2017-09-04  Richard Biener  

PR tree-optimization/82084
* fold-const.c (native_encode_string): Handle wide characters.
(can_native_encode_string_p): Likewise.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 251661)
+++ gcc/fold-const.c(working copy)
@@ -7187,26 +7187,71 @@ native_encode_string (const_tree expr, u
   if (! can_native_encode_string_p (expr))
 return 0;
 
-  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
+  tree type = TREE_TYPE (expr);
+  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (type));
+  int orig_off = off;
   if ((off == -1 && total_bytes > len)
   || off >= total_bytes)
 return 0;
   if (off == -1)
 off = 0;
-  if (TREE_STRING_LENGTH (expr) - off < MIN (total_bytes, len))
+
+  HOST_WIDE_INT elsz = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (type)));
+  if (elsz == 1)
 {
-  int written = 0;
-  if (off < TREE_STRING_LENGTH (expr))
+  if (TREE_STRING_LENGTH (expr) - off < MIN (total_bytes, len))
{
- written = MIN (len, TREE_STRING_LENGTH (expr) - off);
- memcpy (ptr, TREE_STRING_POINTER (expr) + off, written);
+ int written = 0;
+ if (off < TREE_STRING_LENGTH (expr))
+   {
+ written = MIN (len, TREE_STRING_LENGTH (expr) - off);
+ memcpy (ptr, TREE_STRING_POINTER (expr) + off, written);
+   }
+ memset (ptr + written, 0,
+ MIN (total_bytes - written, len - written));
}
-  memset (ptr + written, 0,
- MIN (total_bytes - written, len - written));
+  else
+   memcpy (ptr, TREE_STRING_POINTER (expr) + off, MIN (total_bytes, len));
+  return MIN (total_bytes - off, len);
 }
   else
-memcpy (ptr, TREE_STRING_POINTER (expr) + off, MIN (total_bytes, len));
-  return MIN (total_bytes - off, len);
+{
+  tree ielt = build_nonstandard_integer_type (elsz * 8, true);
+  int offset = 0;
+  bool first = true;
+  for (int o = off & ~(elsz - 1); o < total_bytes; o += elsz)
+   {
+ unsigned HOST_WIDE_INT c;
+ switch (elsz)
+   {
+   case 2:
+ {
+   uint16_t s;
+   memcpy (&s, TREE_STRING_POINTER (expr) + o, 2);
+   c = s;
+   break;
+ }
+   case 4:
+ {
+   uint32_t i;
+   memcpy (&i, TREE_STRING_POINTER (expr) + o, 4);
+   c = i;
+   break;
+ }
+   default:
+ gcc_unreachable ();
+   }
+ tree elem = build_int_cstu (ielt, c);
+ int res = native_encode_expr (elem, ptr+offset, len-offset,
+   first ? off & (elsz - 1) : 0);
+ if ((orig_off == -1 && res != elsz)
+ || res == 0)
+   return 0;
+ offset += res;
+ first = false;
+   }
+  return offset;
+}
 }
 
 
@@ -7491,10 +7536,11 @@ can_native_encode_string_p (const_tree e
 
   if (TREE_CODE (type) != ARRAY_TYPE
   || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
-  || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
- != BITS_PER_UNIT)
   || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
 return false;
+  HOST_WIDE_INT elsz = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (type)));
+  if (elsz != 1 && elsz != 2 && elsz != 4)
+return false;
   return true;
 }
 


[PATCH] Another type demotion issue with ubsan (PR sanitizer/82072)

2017-09-04 Thread Marek Polacek
Vittorio reported another issue with convert_to_integer_1: for
u = -l;
where u is unsigned and l is long long the function does:

 911   return convert (type,
 912   fold_build1 (ex_form, typex,
 913   convert (typex,
 914   TREE_OPERAND (expr, 0))));

so instead of
u = (unsigned int) -l;
it produced
u = -(unsigned int) l;
thus hiding the overflow.  Fixed by moving the recently added check a little
bit above.

Bootstrapped/regtested on x86_64-linux, ok for trunk?

2017-09-04  Marek Polacek  

PR sanitizer/82072
* convert.c (convert_to_integer_1) <case NEGATE_EXPR>: Move the ubsan
check earlier.

* c-c++-common/ubsan/pr82072-2.c: New test.

diff --git gcc/convert.c gcc/convert.c
index 139d790fd98..bfe18fb0f43 100644
--- gcc/convert.c
+++ gcc/convert.c
@@ -886,6 +886,12 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
break;
 
  case NEGATE_EXPR:
+   /* Using unsigned arithmetic for signed types may hide overflow
+  bugs.  */
+   if (!TYPE_UNSIGNED (TREE_TYPE (TREE_OPERAND (expr, 0)))
+   && sanitize_flags_p (SANITIZE_SI_OVERFLOW))
+ break;
+   /* Fall through.  */
  case BIT_NOT_EXPR:
/* This is not correct for ABS_EXPR,
   since we must test the sign before truncation.  */
@@ -902,12 +908,7 @@ convert_to_integer_1 (tree type, tree expr, bool dofold)
TYPE_UNSIGNED (typex));
 
  if (!TYPE_UNSIGNED (typex))
-   {
- /* Using unsigned arithmetic may hide overflow bugs.  */
- if (sanitize_flags_p (SANITIZE_SI_OVERFLOW))
-   break;
- typex = unsigned_type_for (typex);
-   }
+   typex = unsigned_type_for (typex);
  return convert (type,
  fold_build1 (ex_form, typex,
   convert (typex,
diff --git gcc/testsuite/c-c++-common/ubsan/pr82072-2.c 
gcc/testsuite/c-c++-common/ubsan/pr82072-2.c
index e69de29bb2d..ff8aca4d942 100644
--- gcc/testsuite/c-c++-common/ubsan/pr82072-2.c
+++ gcc/testsuite/c-c++-common/ubsan/pr82072-2.c
@@ -0,0 +1,15 @@
+/* PR sanitizer/82072 */
+/* { dg-do run } */
+/* { dg-options "-fsanitize=signed-integer-overflow" } */
+
+int
+main ()
+{
+  long long int l = -__LONG_LONG_MAX__ - 1;
+  unsigned int u;
+  u = -l;
+  asm volatile ("" : "+r" (u));
+  return 0;
+}
+
+/* { dg-output "negation of -9223372036854775808 cannot be represented in type 
'long long int'\[^\n\r]*; cast to an unsigned type to negate this value to 
itself" } */

Marek


Re: [PATCH][GCC][ARM] Dot Product commandline options [Patch (1/8)]

2017-09-04 Thread Richard Earnshaw (lists)
On 01/09/17 14:19, Tamar Christina wrote:
> Hi All,
> 
> This patch adds support for the +dotprod extension to ARM.
> Dot Product requires Adv.SIMD to work and so enables this option
> by default when enabled.
> 
> It is available from ARMv8.2-a and onwards and is enabled by
> default on Cortex-A55 and Cortex-A75.
> 
> Regtested and bootstrapped on arm-none-eabi and no issues.
> 
> Ok for trunk?
> 
> gcc/
> 2017-09-01  Tamar Christina  
> 
>   * config/arm/arm.h (TARGET_DOTPROD): New.
>   * config/arm/arm.c (arm_arch_dotprod): New.
>   (arm_option_reconfigure_globals): Add arm_arch_dotprod.
>   * config/arm/arm-c.c (__ARM_FEATURE_DOTPROD): New.
>   * config/arm/arm-cpus.in (cortex-a55, cortex-a75): Enable +dotprod.
>   (armv8.2-a, cortex-a75.cortex-a55): Likewise.
>   * config/arm/arm-isa.h (isa_bit_dotprod, ISA_DOTPROD): New.
>   * config/arm/t-multilib (v8_2_a_simd_variants): Add dotprod.
>   * doc/invoke.texi (armv8.2-a): Document dotprod
> 
> 
> 7949-diff.patch
> 
> 
> diff --git a/gcc/config/arm/arm-c.c b/gcc/config/arm/arm-c.c
> index 
> 55472434c3a6e90c5693bbaabd3265f7d968787f..295f03bf8ee02be7c89ed2967d283be206e9f25a
>  100644
> --- a/gcc/config/arm/arm-c.c
> +++ b/gcc/config/arm/arm-c.c
> @@ -73,6 +73,7 @@ arm_cpu_builtins (struct cpp_reader* pfile)
>def_or_undef_macro (pfile, "__ARM_FEATURE_QRDMX", TARGET_NEON_RDMA);
>  
>def_or_undef_macro (pfile, "__ARM_FEATURE_CRC32", TARGET_CRC32);
> +  def_or_undef_macro (pfile, "__ARM_FEATURE_DOTPROD", TARGET_DOTPROD);
>def_or_undef_macro (pfile, "__ARM_32BIT_STATE", TARGET_32BIT);
>  
>cpp_undef (pfile, "__ARM_FEATURE_CMSE");
> diff --git a/gcc/config/arm/arm-cpus.in b/gcc/config/arm/arm-cpus.in
> index 
> d009a9e18acb093aefe0f9d8d6de49489fc2325c..7707eec5edf36b0cb4339bc52bc45a92b6ea007f
>  100644
> --- a/gcc/config/arm/arm-cpus.in
> +++ b/gcc/config/arm/arm-cpus.in
> @@ -357,6 +357,7 @@ begin arch armv8.2-a
>   option crypto add FP_ARMv8 CRYPTO
>   option nocrypto remove ALL_CRYPTO
>   option nofp remove ALL_FP
> + option dotprod add FP_ARMv8 DOTPROD
>  end arch armv8.2-a
>  
>  begin arch armv8-m.base
> @@ -1269,9 +1270,10 @@ begin cpu cortex-a55
>   cname cortexa55
>   tune for cortex-a53
>   tune flags LDSCHED
> - architecture armv8.2-a+fp16
> + architecture armv8.2-a+fp16+dotprod
>   fpu neon-fp-armv8
>   option crypto add FP_ARMv8 CRYPTO
> + option dotprod add FP_ARMv8 DOTPROD
>   option nofp remove ALL_FP
>   costs cortex_a53
>  end cpu cortex-a55
> @@ -1280,9 +1282,10 @@ begin cpu cortex-a75
>   cname cortexa75
>   tune for cortex-a57
>   tune flags LDSCHED
> - architecture armv8.2-a+fp16
> + architecture armv8.2-a+fp16+dotprod
>   fpu neon-fp-armv8
>   option crypto add FP_ARMv8 CRYPTO
> + option dotprod add FP_ARMv8 DOTPROD
>   costs cortex_a73
>  end cpu cortex-a75
>  
> @@ -1292,9 +1295,10 @@ begin cpu cortex-a75.cortex-a55
>   cname cortexa75cortexa55
>   tune for cortex-a53
>   tune flags LDSCHED
> - architecture armv8.2-a+fp16
> + architecture armv8.2-a+fp16+dotprod
>   fpu neon-fp-armv8
>   option crypto add FP_ARMv8 CRYPTO
> + option dotprod add FP_ARMv8 DOTPROD
>   costs cortex_a73
>  end cpu cortex-a75.cortex-a55
>  
> diff --git a/gcc/config/arm/arm-isa.h b/gcc/config/arm/arm-isa.h
> index 
> dbd29eaa52f2007498c2aff6263b8b6c3a70e2c2..60a50edf08dd7d3ac9ad46967250f4dcc6b8768b
>  100644
> --- a/gcc/config/arm/arm-isa.h
> +++ b/gcc/config/arm/arm-isa.h
> @@ -66,6 +66,7 @@ enum isa_feature
>  isa_bit_fp_d32,  /* 32 Double precision registers.  */
>  isa_bit_crypto,  /* Crypto extension to ARMv8.  */
>  isa_bit_fp16,/* FP16 data processing (half-precision float).  */
> +isa_bit_dotprod, /* Dot Product instructions.  */
>  
>  /* ISA Quirks (errata?).  Don't forget to add this to the list of
> all quirks below.  */
> @@ -159,6 +160,7 @@ enum isa_feature
>  #define ISA_FP_ARMv8 ISA_FPv5, ISA_FP_D32
>  #define ISA_NEON ISA_FP_D32, isa_bit_neon
>  #define ISA_CRYPTO   ISA_NEON, isa_bit_crypto
> +#define ISA_DOTPROD  ISA_NEON, isa_bit_dotprod

You also need to update ISA_ALL_FP to include your new feature;
otherwise it won't be correctly removed if +nofp is used.

>  
>  /* List of all quirk bits to strip out when comparing CPU features with
> architectures.  */
> diff --git a/gcc/config/arm/arm.h b/gcc/config/arm/arm.h
> index 
> 4f53583cf0219de4329bc64a47a5a42c550ff354..44a95bf7eb2eab8e3cf07ac9cc7aad3d9997b27f
>  100644
> --- a/gcc/config/arm/arm.h
> +++ b/gcc/config/arm/arm.h
> @@ -210,6 +210,11 @@ extern tree arm_fp16_type_node;
>  /* FPU supports ARMv8.1 Adv.SIMD extensions.  */
>  #define TARGET_NEON_RDMA (TARGET_NEON && arm_arch8_1)
>  
> +/* Supports for Dot Product AdvSIMD extensions.  */
> +#define TARGET_DOTPROD (TARGET_NEON  \
> + && bitmap_bit_p (arm_active_target.isa, \
> + isa_bit_dotprod))
> +
>  /* FPU supports the 

[PATCH] Fix PR82084

2017-09-04 Thread Richard Biener

The following fixes vectorization of stores from wide-char STRING_CSTs
which native_encode_string doesn't handle (will post a preliminary patch
for that and trunk later).

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk
so far.

Richard.

2017-09-04  Richard Biener  

PR tree-optimization/82084
* fold-const.h (can_native_encode_string_p): Declare.
* fold-const.c (can_native_encode_string_p): Factor out from ...
(native_encode_string): ... here.
* tree-vect-stmts.c (vectorizable_store): Call it to avoid
vectorizing stores from constants we later cannot handle.

* g++.dg/torture/pr82084.C: New testcase.

Index: gcc/fold-const.c
===
--- gcc/fold-const.c(revision 251649)
+++ gcc/fold-const.c(working copy)
@@ -7184,16 +7184,10 @@ native_encode_vector (const_tree expr, u
 static int
 native_encode_string (const_tree expr, unsigned char *ptr, int len, int off)
 {
-  tree type = TREE_TYPE (expr);
-  HOST_WIDE_INT total_bytes;
-
-  if (TREE_CODE (type) != ARRAY_TYPE
-  || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
-  || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
- != BITS_PER_UNIT)
-  || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
+  if (! can_native_encode_string_p (expr))
 return 0;
-  total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (type));
+
+  HOST_WIDE_INT total_bytes = tree_to_shwi (TYPE_SIZE_UNIT (TREE_TYPE (expr)));
   if ((off == -1 && total_bytes > len)
   || off >= total_bytes)
 return 0;
@@ -7487,6 +7481,23 @@ can_native_encode_type_p (tree type)
 }
 }
 
+/* Return true iff a STRING_CST S is accepted by
+   native_encode_expr.  */
+
+bool
+can_native_encode_string_p (const_tree expr)
+{
+  tree type = TREE_TYPE (expr);
+
+  if (TREE_CODE (type) != ARRAY_TYPE
+  || TREE_CODE (TREE_TYPE (type)) != INTEGER_TYPE
+  || (GET_MODE_BITSIZE (SCALAR_INT_TYPE_MODE (TREE_TYPE (type)))
+ != BITS_PER_UNIT)
+  || !tree_fits_shwi_p (TYPE_SIZE_UNIT (type)))
+return false;
+  return true;
+}
+
 /* Fold a VIEW_CONVERT_EXPR of a constant expression EXPR to type
TYPE at compile-time.  If we're unable to perform the conversion
return NULL_TREE.  */
Index: gcc/fold-const.h
===
--- gcc/fold-const.h(revision 251649)
+++ gcc/fold-const.h(working copy)
@@ -28,6 +28,7 @@ extern int folding_initializer;
 extern int native_encode_expr (const_tree, unsigned char *, int, int off = -1);
 extern tree native_interpret_expr (tree, const unsigned char *, int);
 extern bool can_native_encode_type_p (tree);
+extern bool can_native_encode_string_p (const_tree);
 
 /* Fold constants as much as possible in an expression.
Returns the simplified expression.
Index: gcc/tree-vect-stmts.c
===
--- gcc/tree-vect-stmts.c   (revision 251649)
+++ gcc/tree-vect-stmts.c   (working copy)
@@ -5733,6 +5733,12 @@ vectorizable_store (gimple *stmt, gimple
 
   op = gimple_assign_rhs1 (stmt);
 
+  /* In the case this is a store from a STRING_CST make sure
+ native_encode_expr can handle it.  */
+  if (TREE_CODE (op) == STRING_CST
+  && ! can_native_encode_string_p (op))
+return false;
+
   if (!vect_is_simple_use (op, vinfo, _stmt, , _vectype))
 {
   if (dump_enabled_p ())
Index: gcc/testsuite/g++.dg/torture/pr82084.C
===
--- gcc/testsuite/g++.dg/torture/pr82084.C  (revision 0)
+++ gcc/testsuite/g++.dg/torture/pr82084.C  (working copy)
@@ -0,0 +1,9 @@
+// { dg-do compile }
+
+#include <string>
+int main()
+{
+  wchar_t strs[4][2]= {  L"A", L"B", L"C" , L"D"};
+  std::wstring ss(strs[0]);
+  return 0;
+}


Re: [PATCH, ARM] correctly encode the CC reg data flow

2017-09-04 Thread Kyrill Tkachov

Hi Bernd,

On 18/01/17 15:36, Bernd Edlinger wrote:

On 01/13/17 19:28, Bernd Edlinger wrote:

On 01/13/17 17:10, Bernd Edlinger wrote:

On 01/13/17 14:50, Richard Earnshaw (lists) wrote:

On 18/12/16 12:58, Bernd Edlinger wrote:

Hi,

this is related to PR77308, the follow-up patch will depend on this
one.

When trying the split the *arm_cmpdi_insn and *arm_cmpdi_unsigned
before reload, a mis-compilation in libgcc function __gnu_satfractdasq
was discovered, see [1] for more details.

The reason seems to be that when the *arm_cmpdi_insn is directly
followed by a *arm_cmpdi_unsigned instruction, both are split
up into this:

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 0) (match_dup 1)))
 (parallel [(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 3) (match_dup 4)))
(set (match_dup 2)
 (minus:SI (match_dup 5)
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]

[(set (reg:CC CC_REGNUM)
  (compare:CC (match_dup 2) (match_dup 3)))
 (cond_exec (eq:SI (reg:CC CC_REGNUM) (const_int 0))
(set (reg:CC CC_REGNUM)
 (compare:CC (match_dup 0) (match_dup 1))))]

The problem is that the reg:CC from the *subsi3_carryin_compare
is not mentioning that the reg:CC is also dependent on the reg:CC
from before.  Therefore the *arm_cmpsi_insn appears to be
redundant and thus got removed, because the data values are identical.

I think that applies to a number of similar pattern where data
flow is happening through the CC reg.

So this is a kind of correctness issue, and should be fixed
independently from the optimization issue PR77308.

Therefore I think the patterns need to specify the true
value that will be in the CC reg, in order for cse to
know what the instructions are really doing.


Bootstrapped and reg-tested on arm-linux-gnueabihf.
Is it OK for trunk?


I agree you've found a valid problem here, but I have some issues with
the patch itself.


(define_insn_and_split "subdi3_compare1"
   [(set (reg:CC_NCV CC_REGNUM)
 (compare:CC_NCV
   (match_operand:DI 1 "register_operand" "r")
   (match_operand:DI 2 "register_operand" "r")))
 (set (match_operand:DI 0 "register_operand" "=&r")
 (minus:DI (match_dup 1) (match_dup 2)))]
   "TARGET_32BIT"
   "#"
   "&& reload_completed"
   [(parallel [(set (reg:CC CC_REGNUM)
(compare:CC (match_dup 1) (match_dup 2)))
   (set (match_dup 0) (minus:SI (match_dup 1) (match_dup 2)))])
(parallel [(set (reg:CC_C CC_REGNUM)
(compare:CC_C
  (zero_extend:DI (match_dup 4))
  (plus:DI (zero_extend:DI (match_dup 5))
   (ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)))))
   (set (match_dup 3)
(minus:SI (minus:SI (match_dup 4) (match_dup 5))
  (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0))))])]


This pattern is now no-longer self consistent in that before the split
the overall result for the condition register is in mode CC_NCV, but
afterwards it is just CC_C.

I think CC_NCV is correct mode (the N, C and V bits all correctly
reflect the result of the 64-bit comparison), but that then implies that
the cc mode of subsi3_carryin_compare is incorrect as well and should in
fact also be CC_NCV.  Thinking about this pattern, I'm inclined to agree
that CC_NCV is the correct mode for this operation

I'm not sure if there are other consequences that will fall out from
fixing this (it's possible that we might need a change to select_cc_mode
as well).


Yes, this is still a bit awkward...

The N and V bit will be the correct result for the subdi3_compare1
a 64-bit comparison, but zero_extend:DI (match_dup 4) (plus:DI ...)
only gets the C bit correct, the expression for N and V is a different
one.

It probably works, because the subsi3_carryin_compare instruction sets
more CC bits than the pattern does explicitly specify the value.
We know the subsi3_carryin_compare also computes the NV bits, but it is
hard to write down the correct rtl expression for it.

In theory the pattern should describe everything correctly,
maybe, like:

set (reg:CC_C CC_REGNUM)
 (compare:CC_C
   (zero_extend:DI (match_dup 4))
   (plus:DI (zero_extend:DI (match_dup 5))
(ltu:DI (reg:CC_C CC_REGNUM) (const_int 0)
set (reg:CC_NV CC_REGNUM)
 (compare:CC_NV
  (match_dup 4))
  (plus:SI (match_dup 5) (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)))
set (match_dup 3)
 (minus:SI (minus:SI (match_dup 4) (match_dup 5))
   (ltu:SI (reg:CC_C CC_REGNUM) (const_int 0)


But I doubt that will work to set CC_REGNUM with two different modes
in parallel?

Another idea would be to invent a CC_CNV_NOOV mode, that implicitly
defines C from the DImode result, and NV from the SImode result,
similar to the CC_NOOVmode, that also leaves something open what
bits it really defines?


What do you think?


Thanks
Bernd.

I think maybe the 

Re: [PATCH GCC]A simple implementation of loop interchange

2017-09-04 Thread Richard Biener
On Wed, Aug 30, 2017 at 6:32 PM, Bin.Cheng  wrote:
> On Wed, Aug 30, 2017 at 3:19 PM, Richard Biener
>  wrote:
>> On Wed, Aug 30, 2017 at 3:18 PM, Bin Cheng  wrote:
>>> Hi,
>>> This patch implements a simple loop interchange pass in GCC, as described 
>>> by its comments:
>>> +/* This pass performs loop interchange: for example, the loop nest
>>> +
>>> +   for (int j = 0; j < N; j++)
>>> + for (int k = 0; k < N; k++)
>>> +   for (int i = 0; i < N; i++)
>>> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
>>> +
>>> +   is transformed to
>>> +
>>> +   for (int i = 0; i < N; i++)
>>> + for (int j = 0; j < N; j++)
>>> +   for (int k = 0; k < N; k++)
>>> +c[i][j] = c[i][j] + a[i][k]*b[k][j];
>>> +
>>> +   This pass implements loop interchange in the following steps:
>>> +
>>> + 1) Find perfect loop nest for each innermost loop and compute data
>>> +   dependence relations for it.  For above example, loop nest is
>>> +   .
>>> + 2) From innermost to outermost loop, this pass tries to interchange
>>> +   each loop pair.  For above case, it firstly tries to interchange
>>> +    and loop nest becomes .
>>> +   Then it tries to interchange  and loop nest becomes
>>> +   .  The overall effect is to move innermost
>>> +   loop to the outermost position.  For loop pair 
>>> +   to be interchanged, we:
>>> + 3) Check if data dependence relations are valid for loop interchange.
>>> + 4) Check if both loops can be interchanged in terms of transformation.
>>> + 5) Check if interchanging the two loops is profitable.
>>> + 6) Interchange the two loops by mapping induction variables.
>>> +
>>> +   This pass also handles reductions in loop nest.  So far we only support
>>> +   simple reduction of inner loop and double reduction of the loop nest.  
>>> */
>>>
>>> Actually, this pass only does loop shifting, which moves the innermost
>>> loop to the outermost position, rather than general permutation.  Also,
>>> as a traditional loop optimizer, it only works for perfect loop nests.
>>> I put it just after loop distribution so that ideally loop
>>> split/distribution could create perfect nests for it.  Unfortunately, we
>>> don't get any perfect nests from distribution for now because it only
>>> works on the innermost loop.  For example, the motivation case in
>>> spec2k6/bwaves is not handled by this pass alone.  I have a patch
>>> extending distribution to (innermost) loop nests, and with that patch
>>> the bwaves case can be handled.
>>> Another point is I deliberately make both the cost model and code 
>>> transformation (very)
>>> conservative.  We can support more cases, or more transformations with 
>>> great care when
>>> it is for sure known beneficial.  IMHO, we already hit over-baked issues 
>>> quite often and
>>> don't want to introduce more.
>>> As for code generation, this patch has an issue that invariant code in 
>>> outer loop could
>>> be moved to inner loop.  For the moment, we rely on the last lim pass to 
>>> handle all INV
>>> generated during interchange.  In the future, we may need to avoid that in 
>>> interchange
>>> itself, or another lim pass just like the one after graphite optimizations.
>>>
>>> Boostrap and test on x86_64 and AArch64.  Various benchmarks built and run 
>>> successfully.
>>> Note this pass is disabled in patch, while the code is exercised by 
>>> bootstrap/building
>>> programs with it enabled by default.  Any comments?
>>
> Thanks for quick review.
>> +/* The same as above, but this one is only used for interchanging not
>> +   innermost loops.  */
>> +#define OUTER_STRIDE_RATIO (2)
>>
>> please make all these knobs --params.
>>
>> +/* Enum type for loop reduction variable.  */
>> +enum reduction_type
>> +{
>> +  UNKNOWN_RTYPE = 0,
>> +  SIMPLE_RTYPE,
>> +  DOUBLE_RTYPE
>> +};
>>
>> seeing this we should have some generic data structure / analysis for
>> reduction detection.  This adds a third user (autopar and vectorizer
>> are the others).  Just an idea.
>>
>> +/* Return true if E is abnormal edge.  */
>> +
>> +static inline bool
>> +abnormal_edge (edge e)
>> +{
>> +  return (e->flags & (EDGE_EH | EDGE_ABNORMAL | EDGE_IRREDUCIBLE_LOOP));
>> +}
>>
>> bad name/comment for what it does.
>>
>> ... jumping to end of file / start of pass
>>
>> +  /* Get the outer loop.  */
>> +  loop = superloop_at_depth (loop, loop_depth (loop) - 1);
>>
>> loop_outer (loop)?
>>
>> +  /* Only support rectangle loop nest, i.e, inner loop's niters doesn't
>> +depends on outer loop's IV.  */
>> +  if (chrec_contains_symbols_defined_in_loop (niters, loop->num))
>> +   break;
>>
>> but you don't check for a three-nest whether niters depends on outer outer
>> loop's IV that way.  Either the check is superfluous here or incomplete.
> It is 

[ARC] Fix stack unwinding for ARC

2017-09-04 Thread Cupertino Miranda
Hi everyone,

Here is patch to stack unwinding for ARC.

- Fix to unwinding.  It is now possible to unwind from syscall
wrappers, signals, and functions with dynamic stack allocation.

- The patch also fixes millicode: although the millicode save and restore
functions change blink, the calls to those functions were not marked as
clobbering blink.

Looking forward to your reviews.

Best regards,
Cupertino

From 213a0f115b2c7459b2a65728f4c5a74d2ed89c9d Mon Sep 17 00:00:00 2001
From: Cupertino Miranda 
Date: Mon, 4 Sep 2017 14:24:51 +0200
Subject: [PATCH] Fix to unwinding. It is now possible to unwind from syscall
 wrappers, signals and functions with dynamic stack allocation.

The patch also fixes millicode: although the millicode save and restore
functions change blink, the calls to those functions were not marked as
clobbering blink.

gcc/ChangeLog:

2017-09-04  Cupertino Miranda  

	* config/arc/arc.c (arc_save_restore): Corrected CFA note.
	(arc_expand_prologue): Restore blink for millicode.
	* config/arc/linux.h (LINK_EH_SPEC): Defined.

libgcc/ChangeLog:

2017-09-04  Cupertino Miranda  

	* config/arc/linux-unwind-reg.def: Created with register indexes info.
	* config/arc/linux-unwind-reg.h (registers_stack_order): Defined.
	(register_position): Likewise.
	(register_id_for_index): Likewise.
	(arc_fallback_frame_state): New function to define the frame when frame
	information is missing and the address refers to a syscall wrapper.
	(arc_frob_update_context): New function to correct the context
	information when unwinding happens through the fp register.
---
 gcc/config/arc/arc.c   |  36 ++--
 gcc/config/arc/linux.h |   8 ++
 libgcc/config.host |   1 +
 libgcc/config/arc/linux-unwind-reg.def |  23 +
 libgcc/config/arc/linux-unwind.h   | 154 +
 5 files changed, 216 insertions(+), 6 deletions(-)
 create mode 100644 libgcc/config/arc/linux-unwind-reg.def
 create mode 100644 libgcc/config/arc/linux-unwind.h

diff --git a/gcc/config/arc/arc.c b/gcc/config/arc/arc.c
index d519063..ff2f8ae 100644
--- a/gcc/config/arc/arc.c
+++ b/gcc/config/arc/arc.c
@@ -2849,12 +2849,23 @@ arc_save_restore (rtx base_reg,
 	  else
 	{
 	  insn = frame_insn (insn);
-	  if (epilogue_p)
-		for (r = start_call; r <= end_call; r++)
-		  {
-		rtx reg = gen_rtx_REG (SImode, r);
-		add_reg_note (insn, REG_CFA_RESTORE, reg);
-		  }
+	  for (r = start_call, off = 0;
+		   r <= end_call;
+		   r++, off += UNITS_PER_WORD)
+		{
+		  rtx reg = gen_rtx_REG (SImode, r);
+		  if (epilogue_p)
+		  add_reg_note (insn, REG_CFA_RESTORE, reg);
+		  else
+		{
+		  rtx mem = gen_rtx_MEM (SImode, plus_constant (Pmode,
+base_reg,
+off));
+
+		  add_reg_note (insn, REG_CFA_OFFSET,
+gen_rtx_SET (mem, reg));
+		}
+		}
 	}
 	  offset += off;
 	}
@@ -3092,6 +3103,19 @@ arc_expand_prologue (void)
   frame_size_to_allocate -= cfun->machine->frame_info.reg_size;
 }
 
+  /* In the case of millicode thunk, we need to restore the clobbered
+ blink register.  */
+  if (cfun->machine->frame_info.millicode_end_reg > 0
+  && arc_must_save_return_addr (cfun))
+{
+  HOST_WIDE_INT tmp = cfun->machine->frame_info.reg_size;
+  emit_insn (gen_rtx_SET (gen_rtx_REG (Pmode, RETURN_ADDR_REGNUM),
+			  gen_rtx_MEM (Pmode,
+	   plus_constant (Pmode,
+			  stack_pointer_rtx,
			  tmp))));
+}
+
   /* Save frame pointer if needed.  First save the FP on stack, if not
  autosaved.  */
   if (arc_frame_pointer_needed ()
diff --git a/gcc/config/arc/linux.h b/gcc/config/arc/linux.h
index d8e0063..7073471 100644
--- a/gcc/config/arc/linux.h
+++ b/gcc/config/arc/linux.h
@@ -91,3 +91,11 @@ along with GCC; see the file COPYING3.  If not see
 /* Pre/post modify with register displacement are default off.  */
 #undef TARGET_AUTO_MODIFY_REG_DEFAULT
 #define TARGET_AUTO_MODIFY_REG_DEFAULT 0
+
+#if DEFAULT_LIBC == LIBC_GLIBC
+/* Override linux.h LINK_EH_SPEC definition.
+   Signalize that because we have fde-glibc, we don't need all C shared libs
+   linked against -lgcc_s.  */
+#undef LINK_EH_SPEC
+#define LINK_EH_SPEC "--eh-frame-hdr"
+#endif
diff --git a/libgcc/config.host b/libgcc/config.host
index 2686d59..d05ed45 100644
--- a/libgcc/config.host
+++ b/libgcc/config.host
@@ -385,6 +385,7 @@ arc*-*-elf*)
 	tmake_file="arc/t-arc"
 	extra_parts="crti.o crtn.o crtend.o crtbegin.o crtendS.o crtbeginS.o"
 	extra_parts="$extra_parts crttls.o"
+	md_unwind_header=arc/linux-unwind.h
 	;;
 arc*-*-linux*)
 	tmake_file="${tmake_file} t-slibgcc-libgcc t-slibgcc-nolc-override arc/t-arc-uClibc arc/t-arc"
diff --git a/libgcc/config/arc/linux-unwind-reg.def b/libgcc/config/arc/linux-unwind-reg.def
new file mode 100644
index 000..763a32f
--- /dev/null
+++ b/libgcc/config/arc/linux-unwind-reg.def
@@ -0,0 +1,23 @@
+REGISTER_IN_STACK(bta, -1)

Re: Add support to trace comparison instructions and switch statements

2017-09-04 Thread 吴潍浠(此彼)
Sorry, I made a serious mistake in the previous patch.
The attachment is my new patch.

Wish Wu

--
From:Wish Wu 
Time:2017 Sep 4 (Mon) 21:17
To:Dmitry Vyukov ; Jakub Jelinek 
Cc:gcc ; gcc-patches ; Jeff Law 
; wishwu007 
Subject:Re: Add support to trace comparison instructions and switch statements



Re: Add support to trace comparison instructions and switch statements

2017-09-04 Thread 吴潍浠(此彼)
Hi
I updated the patch and attached it.

gcc/ChangeLog: 

2017-09-04  Wish Wu   

* asan.c (initialize_sanitizer_builtins):  
* builtin-types.def (BT_FN_VOID_UINT8_UINT8):  
(BT_FN_VOID_UINT16_UINT16):
(BT_FN_VOID_UINT32_UINT32):
(BT_FN_VOID_FLOAT_FLOAT):  
(BT_FN_VOID_DOUBLE_DOUBLE):
(BT_FN_VOID_UINT64_PTR):   
* common.opt:  
* flag-types.h (enum sanitize_coverage_code):  
* opts.c (COVERAGE_SANITIZER_OPT): 
(get_closest_sanitizer_option):
(parse_sanitizer_options): 
(common_handle_option):
* sancov.c (instrument_cond):  
(instrument_switch):   
(sancov_pass): 
* sanitizer.def (BUILT_IN_SANITIZER_COV_TRACE_CMP1):   
(BUILT_IN_SANITIZER_COV_TRACE_CMP2):   
(BUILT_IN_SANITIZER_COV_TRACE_CMP4):   
(BUILT_IN_SANITIZER_COV_TRACE_CMP8):   
(BUILT_IN_SANITIZER_COV_TRACE_CMPF):   
(BUILT_IN_SANITIZER_COV_TRACE_CMPD):   
(BUILT_IN_SANITIZER_COV_TRACE_SWITCH): 

gcc/testsuite/ChangeLog:   

2017-09-04  Wish Wu   

* gcc.dg/sancov/basic3.c: New test.

I think the aim of "trace-cmp" is to find meaningful comparison operands at
runtime so the fuzzer can apply approximation tricks.
We don't need to save all values from low_case to high_case; there may be
too many values, which would waste resources.
For code :
void bar (void);
void
foo (int x)
{
  if (x == 21 || x == 64 || x == 98 || x == 135)
bar ();
}
GIMPLE IL on x86_64:
  1 
  2 ;; Function foo (foo, funcdef_no=0, decl_uid=2161, cgraph_uid=0, 
symbol_order=0)
  3 
  4 foo (int x)
  5 {
  6   unsigned int _5;
  7   unsigned int _6;
  8   unsigned int _7;
  9   unsigned int _8;
 10   unsigned int _9;
 11   unsigned int _10;
 12   unsigned int _11;
 13   unsigned int _12;
 14 
 15[0.00%] [count: INV]:
 16   _5 = (unsigned int) x_2(D);
 17   _6 = (unsigned int) 21;
 18   __builtin___sanitizer_cov_trace_cmp4 (_5, _6);
 19   if (x_2(D) == 21)
 20 goto ; [INV] [count: INV]
 21   else
 22 goto ; [INV] [count: INV]
 23 
 24[0.00%] [count: INV]:
 25   _7 = (unsigned int) x_2(D);
 26   _8 = (unsigned int) 64;
 27   __builtin___sanitizer_cov_trace_cmp4 (_7, _8);
 28   if (x_2(D) == 64)
 29 goto ; [INV] [count: INV]
 30   else
 31 goto ; [INV] [count: INV]
 32 
 33[0.00%] [count: INV]:
 34   _9 = (unsigned int) x_2(D);
 35   _10 = (unsigned int) 98;
 36   __builtin___sanitizer_cov_trace_cmp4 (_9, _10);
 37   if (x_2(D) == 98)
 38 goto ; [INV] [count: INV]
 39   else
 40 goto ; [INV] [count: INV]
 41 
 42[0.00%] [count: INV]:
 43   _11 = (unsigned int) x_2(D);
 44   _12 = (unsigned int) 135;
 45   __builtin___sanitizer_cov_trace_cmp4 (_11, _12);
 46   if (x_2(D) == 135)
 47 goto ; [INV] [count: INV]
 48   else
 49 goto ; [INV] [count: INV]
 50 
 51[0.00%] [count: INV]:
 52   bar ();
 53 
 54[0.00%] [count: INV]:
 55   return;
 56 
 57 }
 58 
 59
It actually catches reasonable and meaningful values. 
Maybe we can improve it in the future to trace all comparison
expressions.

Wish Wu
--
From:Dmitry Vyukov 
Time:2017 Sep 3 (Sun) 19:05
To:Wish Wu 
Cc:Jakub Jelinek ; gcc ; gcc-patches 
; Jeff Law ; wishwu007 

Subject:Re: Add support to trace comparison instructions and switch statements


On Sun, Sep 3, 2017 at 12:38 PM, 吴潍浠(此彼)  wrote:
> Hi
> I will update the patch according to your requirements, and with some my 
> suggestions.
> It will take me one or two days.

Thanks! No hurry, just wanted to make sure you still want to pursue this.

> Wish Wu
>
> --
> From:Dmitry Vyukov 
> Time:2017 Sep 3 (Sun) 18:21
> To:Jakub Jelinek 
> Cc:Wish Wu ; gcc ; gcc-patches 
> ; Jeff Law ; wishwu007 
> 
> Subject:Re: Add support to trace 

RE: [PATCH][GCC][ARM][AArch64] Testsuite framework changes and execution tests [Patch (8/8)]

2017-09-04 Thread Tamar Christina
Hi Christophe,

> >
> > gcc/testsuite
> > 2017-09-01  Tamar Christina  
> >
> > * lib/target-supports.exp
> > (check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache):
> New.
> > (check_effective_target_arm_v8_2a_dotprod_neon_ok): New.
> > (add_options_for_arm_v8_2a_dotprod_neon): New.
> > (check_effective_target_arm_v8_2a_dotprod_neon_hw): New.
> > (check_effective_target_vect_sdot_qi): New.
> > (check_effective_target_vect_udot_qi): New.
> > * gcc.target/arm/simd/vdot-exec.c: New.
> 
> Aren't you defining twice P() and ARR() in vdot-exec.c ?
> I'd expect a preprocessor error, did I read too quickly?
>

Yes, they are defined twice, but they're not redefined: all the definitions
are exactly the same, so the preprocessor doesn't complain. I can leave only
one if this is confusing.
 
> 
> Thanks,
> 
> Christophe
> 
> > * gcc.target/aarch64/advsimd-intrinsics/vdot-exec.c: New.
> > * gcc/doc/sourcebuild.texi: Document arm_v8_2a_dotprod_neon.
> >
> > --


Re: [PATCH][GCC][ARM][AArch64] Testsuite framework changes and execution tests [Patch (8/8)]

2017-09-04 Thread Christophe Lyon
Hi Tamar,

On 1 September 2017 at 15:24, Tamar Christina  wrote:
> Hi All,
>
> This patch enables the execution runs for Dot product and also
> adds the feature tests.
>
> The ARMv8.2-a Dot Product instructions only support 8-bit
> element vectorization.
>
> Dot product is available from ARMv8.2-a and onwards.
>
> Regtested and bootstrapped on aarch64-none-elf and
> arm-none-eabi and no issues.
>
> Ok for trunk?
>
> gcc/testsuite
> 2017-09-01  Tamar Christina  
>
> * lib/target-supports.exp
> (check_effective_target_arm_v8_2a_dotprod_neon_ok_nocache): New.
> (check_effective_target_arm_v8_2a_dotprod_neon_ok): New.
> (add_options_for_arm_v8_2a_dotprod_neon): New.
> (check_effective_target_arm_v8_2a_dotprod_neon_hw): New.
> (check_effective_target_vect_sdot_qi): New.
> (check_effective_target_vect_udot_qi): New.
> * gcc.target/arm/simd/vdot-exec.c: New.

Aren't you defining twice P() and ARR() in vdot-exec.c ?
I'd expect a preprocessor error, did I read too quickly?


Thanks,

Christophe

> * gcc.target/aarch64/advsimd-intrinsics/vdot-exec.c: New.
> * gcc/doc/sourcebuild.texi: Document arm_v8_2a_dotprod_neon.
>
> --


Re: [PATCH] Expand switch statements with a single (or none) non-default case.

2017-09-04 Thread Richard Biener
On Fri, Sep 1, 2017 at 3:01 PM, Martin Liška  wrote:
> On 09/01/2017 12:57 PM, Richard Biener wrote:
>> On Fri, Sep 1, 2017 at 12:44 PM, Martin Liška  wrote:
>>> On 09/01/2017 10:26 AM, Richard Biener wrote:
 On Fri, Sep 1, 2017 at 10:07 AM, Martin Liška  wrote:
> On 08/30/2017 02:56 PM, Richard Biener wrote:
>> On Wed, Aug 30, 2017 at 2:32 PM, Martin Liška  wrote:
>>> On 08/30/2017 02:28 PM, Richard Biener wrote:
 On Wed, Aug 30, 2017 at 1:13 PM, Martin Liška  wrote:
> Hi.
>
> Simple transformation of switch statements: a degenerate switch can be
> interpreted as a gimple condition (or removed if it has no non-default
> case). I originally thought that we don't have switch statements without
> non-default cases, but PR82032 shows we can see them.
>
> Patch can bootstrap on ppc64le-redhat-linux and survives regression 
> tests.
>
> Ready to be installed?

 While I guess this case is ok to handle here it would be nice if CFG 
 cleanup
 would do the same.  I suppose find_taken_edge somehow doesn't work for
 this case even after my last enhancement?  Or is CFG cleanup for some 
 reason
 not run?
>>>
>>> Do you mean both with # of non-default edges equal to 0 and 1?
>>> Let me take a look.
>>
>> First and foremost 0.  The case of 1 non-default and a default would
>> need extra code.
>
> For the test-case I reduced, one needs:
>
> diff --git a/gcc/tree-cfg.c b/gcc/tree-cfg.c
> index b7593068ea9..13af516c6ac 100644
> --- a/gcc/tree-cfg.c
> +++ b/gcc/tree-cfg.c
> @@ -8712,7 +8712,7 @@ const pass_data pass_data_split_crit_edges =
>PROP_no_crit_edges, /* properties_provided */
>0, /* properties_destroyed */
>0, /* todo_flags_start */
> -  0, /* todo_flags_finish */
> +  TODO_cleanup_cfg, /* todo_flags_finish */
>  };
>
>  class pass_split_crit_edges : public gimple_opt_pass
>
> And the code eliminates the problematic switch statement. Do you believe 
> it's the right approach
> to add the clean up and preserve the assert in tree-switch-conversion.c?

 Eh, no.  If you run cleanup-cfg after critical edge splitting they
 will be unsplit immediately, making
 it (mostly) a no-op.

 OTOH I wanted to eliminate that "pass" in favor of just calling
 split_critical_edges () where needed
 (that's already done in some places).
>>>
>>> Good, so I will leave it to you. Should I in meantime remove the assert in 
>>> tree-switch-conversion.c ?
>>
>> Yes, as said your patch was generally OK, I just wondered why we left
>> the switches "unoptimized".
>
> Good.
>
> I'm sending v2 for single non-default case situation.
> Patch can bootstrap on ppc64le-redhat-linux and survives regression tests.
>
> Ready to be installed?

Some style nits.  generate_high_low_equality is a bit cumbersome I think
and I'd prefer

/* Generate a range test LHS CODE RHS that determines whether INDEX is in the
range [low, high].  Place associated stmts before *GSI.  */

void
generate_range_test (gimple_stmt_iterator *gsi, tree index,
                     wide_int low, wide_int high,
                     enum tree_code *code, tree *lhs, tree *rhs)
{
}

this captures the implicit requirement of constant low/high (otherwise
you'll generate invalid GIMPLE)
and it makes the comparison code explicit -- otherwise a user has to
inspect users or decipher
the function.  Note you'll need to use wi::from (low, TYPE_PRECISION
(TREE_TYPE (index)), SIGNED/UNSIGNED)
to get at the promoted wide-ints/trees.

Otherwise it looks reasonable.

Not super-happy with the tree-cfg.c location for the helper but I
don't have anything more
appropriate either.

Thanks,
Richard.


> Martin
>
>>
>> Richard.
>>

> For the case with # of edges == 1, should I place it to tree-cfg.c in 
> order to trigger it as a clean-up?

 I believe the code for edges == 1 can reside in
 cleanup_control_expr_graph.  Probably easiest
 from a flow perspective if we do the switch -> cond transform before
 the existing code handling
 cond and switch via find-taken-edge.
>>>
>>> Working on that, good place to do the transformation.
>>>
>>> Martin
>>>

> Thoughts?
>
> Martin
>
>>
>> Richard.
>>
>>> Martin
>>>

 Richard.

> Martin
>
> gcc/ChangeLog:
>
> 2017-08-25  Martin Liska  
>
> PR tree-optimization/82032
> * tree-switch-conversion.c (generate_high_low_equality): New
> function.
> (expand_degenerated_switch): Likewise.
> (process_switch): Call 

[9/9] Make bitwise_mode_for_mode return an opt_mode

2017-09-04 Thread Richard Sandiford
2017-09-04  Richard Sandiford  

gcc/
* machmode.h (bitwise_mode_for_mode): Return opt_mode.
* stor-layout.c (bitwise_mode_for_mode): Likewise.
(bitwise_type_for_mode): Update accordingly.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-09-04 12:18:55.821333642 +0100
+++ gcc/machmode.h  2017-09-04 12:19:42.856108173 +0100
@@ -694,7 +694,7 @@ smallest_int_mode_for_size (unsigned int
 }
 
 extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
-extern machine_mode bitwise_mode_for_mode (machine_mode);
+extern opt_machine_mode bitwise_mode_for_mode (machine_mode);
 extern opt_machine_mode mode_for_vector (scalar_mode, unsigned);
 extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
 
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2017-09-04 12:19:01.144339518 +0100
+++ gcc/stor-layout.c   2017-09-04 12:19:42.856108173 +0100
@@ -404,10 +404,10 @@ int_mode_for_mode (machine_mode mode)
 }
 }
 
-/* Find a mode that can be used for efficient bitwise operations on MODE.
-   Return BLKmode if no such mode exists.  */
+/* Find a mode that can be used for efficient bitwise operations on MODE,
+   if one exists.  */
 
-machine_mode
+opt_machine_mode
 bitwise_mode_for_mode (machine_mode mode)
 {
   /* Quick exit if we already have a suitable mode.  */
@@ -445,7 +445,7 @@ bitwise_mode_for_mode (machine_mode mode
 }
 
   /* Otherwise fall back on integers while honoring MAX_FIXED_MODE_SIZE.  */
-  return mode_for_size (bitsize, MODE_INT, true).else_blk ();
+  return mode_for_size (bitsize, MODE_INT, true);
 }
 
 /* Find a type that can be used for efficient bitwise operations on MODE.
@@ -454,8 +454,7 @@ bitwise_mode_for_mode (machine_mode mode
 tree
 bitwise_type_for_mode (machine_mode mode)
 {
-  mode = bitwise_mode_for_mode (mode);
-  if (mode == BLKmode)
+  if (!bitwise_mode_for_mode (mode).exists (&mode))
 return NULL_TREE;
 
   unsigned int inner_size = GET_MODE_UNIT_BITSIZE (mode);


[8/9] Make mode_for_size_tree return an opt_mode

2017-09-04 Thread Richard Sandiford
...for consistency with mode_for_size

2017-09-04  Richard Sandiford  

gcc/
* stor-layout.h (mode_for_size_tree): Return an opt_mode.
* stor-layout.c (mode_for_size_tree): Likewise.
(mode_for_array): Update accordingly.
(layout_decl): Likewise.
(compute_record_mode): Likewise.  Only set the mode once.

gcc/ada/
* gcc-interface/utils.c (make_packable_type): Update call to
mode_for_size_tree.

Index: gcc/stor-layout.h
===
--- gcc/stor-layout.h   2017-08-21 12:14:47.158835574 +0100
+++ gcc/stor-layout.h   2017-09-04 12:19:01.144339518 +0100
@@ -99,7 +99,7 @@ extern tree make_unsigned_type (int);
If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
The value is BLKmode if no other mode is found.  This is like
mode_for_size, but is passed a tree.  */
-extern machine_mode mode_for_size_tree (const_tree, enum mode_class, int);
+extern opt_machine_mode mode_for_size_tree (const_tree, enum mode_class, int);
 
 extern tree bitwise_type_for_mode (machine_mode);
 
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2017-09-04 12:18:55.824344959 +0100
+++ gcc/stor-layout.c   2017-09-04 12:19:01.144339518 +0100
@@ -321,19 +321,19 @@ mode_for_size (unsigned int size, enum m
 
 /* Similar, except passed a tree node.  */
 
-machine_mode
+opt_machine_mode
 mode_for_size_tree (const_tree size, enum mode_class mclass, int limit)
 {
   unsigned HOST_WIDE_INT uhwi;
   unsigned int ui;
 
   if (!tree_fits_uhwi_p (size))
-return BLKmode;
+return opt_machine_mode ();
   uhwi = tree_to_uhwi (size);
   ui = uhwi;
   if (uhwi != ui)
-return BLKmode;
-  return mode_for_size (ui, mclass, limit).else_blk ();
+return opt_machine_mode ();
+  return mode_for_size (ui, mclass, limit);
 }
 
 /* Return the narrowest mode of class MCLASS that contains at least
@@ -563,7 +563,7 @@ mode_for_array (tree elem_type, tree siz
 int_size / int_elem_size))
limit_p = false;
 }
-  return mode_for_size_tree (size, MODE_INT, limit_p);
+  return mode_for_size_tree (size, MODE_INT, limit_p).else_blk ();
 }
 
 /* Subroutine of layout_decl: Force alignment required for the data type.
@@ -683,17 +683,18 @@ layout_decl (tree decl, unsigned int kno
  && TREE_CODE (TYPE_SIZE (type)) == INTEGER_CST
  && GET_MODE_CLASS (TYPE_MODE (type)) == MODE_INT)
{
- machine_mode xmode
-   = mode_for_size_tree (DECL_SIZE (decl), MODE_INT, 1);
- unsigned int xalign = GET_MODE_ALIGNMENT (xmode);
-
- if (xmode != BLKmode
- && !(xalign > BITS_PER_UNIT && DECL_PACKED (decl))
- && (known_align == 0 || known_align >= xalign))
+ machine_mode xmode;
+ if (mode_for_size_tree (DECL_SIZE (decl),
+ MODE_INT, 1).exists (&xmode))
{
- SET_DECL_ALIGN (decl, MAX (xalign, DECL_ALIGN (decl)));
- SET_DECL_MODE (decl, xmode);
- DECL_BIT_FIELD (decl) = 0;
+ unsigned int xalign = GET_MODE_ALIGNMENT (xmode);
+ if (!(xalign > BITS_PER_UNIT && DECL_PACKED (decl))
+ && (known_align == 0 || known_align >= xalign))
+   {
+ SET_DECL_ALIGN (decl, MAX (xalign, DECL_ALIGN (decl)));
+ SET_DECL_MODE (decl, xmode);
+ DECL_BIT_FIELD (decl) = 0;
+   }
}
}
 
@@ -1756,22 +1757,24 @@ compute_record_mode (tree type)
   if (TREE_CODE (type) == RECORD_TYPE && mode != VOIDmode
   && tree_fits_uhwi_p (TYPE_SIZE (type))
   && GET_MODE_BITSIZE (mode) == tree_to_uhwi (TYPE_SIZE (type)))
-SET_TYPE_MODE (type, mode);
+;
   else
-SET_TYPE_MODE (type, mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1));
+mode = mode_for_size_tree (TYPE_SIZE (type), MODE_INT, 1).else_blk ();
 
   /* If structure's known alignment is less than what the scalar
  mode would need, and it matters, then stick with BLKmode.  */
-  if (TYPE_MODE (type) != BLKmode
+  if (mode != BLKmode
   && STRICT_ALIGNMENT
   && ! (TYPE_ALIGN (type) >= BIGGEST_ALIGNMENT
-   || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (TYPE_MODE (type
+   || TYPE_ALIGN (type) >= GET_MODE_ALIGNMENT (mode)))
 {
   /* If this is the only reason this type is BLKmode, then
 don't force containing types to be BLKmode.  */
   TYPE_NO_FORCE_BLK (type) = 1;
-  SET_TYPE_MODE (type, BLKmode);
+  mode = BLKmode;
 }
+
+  SET_TYPE_MODE (type, mode);
 }
 
 /* Compute TYPE_SIZE and TYPE_ALIGN for TYPE, once it has been laid
Index: gcc/ada/gcc-interface/utils.c

[7/9] Make targetm.get_mask_mode return an opt_mode

2017-09-04 Thread Richard Sandiford
...for consistency with mode_for_vector.

2017-09-04  Richard Sandiford  

gcc/
* target.def (get_mask_mode): Change return type to opt_mode.
Expand commentary.
* doc/tm.texi: Regenerate.
* targhooks.h (default_get_mask_mode): Return an opt_mode.
* targhooks.c (default_get_mask_mode): Likewise.
* config/i386/i386.c (ix86_get_mask_mode): Likewise.
* optabs-query.c (can_vec_mask_load_store_p): Update use of
targetm.get_mask_mode.
* tree.c (build_truth_vector_type): Likewise.

Index: gcc/target.def
===
--- gcc/target.def  2017-09-04 11:50:24.568774867 +0100
+++ gcc/target.def  2017-09-04 12:18:58.594757220 +0100
@@ -1877,10 +1877,16 @@ The default is zero which means to not i
 /* Function to get a target mode for a vector mask.  */
 DEFHOOK
 (get_mask_mode,
- "This hook returns mode to be used for a mask to be used for a vector\n\
-of specified @var{length} with @var{nunits} elements.  By default an integer\n\
-vector mode of a proper size is returned.",
- machine_mode,
+ "A vector mask is a value that holds one boolean result for every element\n\
+in a vector.  This hook returns the machine mode that should be used to\n\
+represent such a mask when the vector in question is @var{length} bytes\n\
+long and contains @var{nunits} elements.  The hook returns an empty\n\
+@code{opt_machine_mode} if no such mode exists.\n\
+\n\
+The default implementation returns the mode of an integer vector that\n\
+is @var{length} bytes long and that contains @var{nunits} elements,\n\
+if such a mode exists.",
+ opt_machine_mode,
  (unsigned nunits, unsigned length),
  default_get_mask_mode)
 
Index: gcc/doc/tm.texi
===
--- gcc/doc/tm.texi 2017-09-04 11:50:24.566073698 +0100
+++ gcc/doc/tm.texi 2017-09-04 12:18:58.593753447 +0100
@@ -5820,10 +5820,16 @@ mode returned by @code{TARGET_VECTORIZE_
 The default is zero which means to not iterate over other vector sizes.
 @end deftypefn
 
-@deftypefn {Target Hook} machine_mode TARGET_VECTORIZE_GET_MASK_MODE (unsigned 
@var{nunits}, unsigned @var{length})
-This hook returns mode to be used for a mask to be used for a vector
-of specified @var{length} with @var{nunits} elements.  By default an integer
-vector mode of a proper size is returned.
+@deftypefn {Target Hook} opt_machine_mode TARGET_VECTORIZE_GET_MASK_MODE 
(unsigned @var{nunits}, unsigned @var{length})
+A vector mask is a value that holds one boolean result for every element
+in a vector.  This hook returns the machine mode that should be used to
+represent such a mask when the vector in question is @var{length} bytes
+long and contains @var{nunits} elements.  The hook returns an empty
+@code{opt_machine_mode} if no such mode exists.
+
+The default implementation returns the mode of an integer vector that
+is @var{length} bytes long and that contains @var{nunits} elements,
+if such a mode exists.
 @end deftypefn
 
 @deftypefn {Target Hook} {void *} TARGET_VECTORIZE_INIT_COST (struct loop 
*@var{loop_info})
Index: gcc/targhooks.h
===
--- gcc/targhooks.h 2017-09-04 11:50:24.568774867 +0100
+++ gcc/targhooks.h 2017-09-04 12:18:58.594757220 +0100
@@ -102,7 +102,7 @@ default_builtin_support_vector_misalignm
 int, bool);
 extern machine_mode default_preferred_simd_mode (scalar_mode mode);
 extern unsigned int default_autovectorize_vector_sizes (void);
-extern machine_mode default_get_mask_mode (unsigned, unsigned);
+extern opt_machine_mode default_get_mask_mode (unsigned, unsigned);
 extern void *default_init_cost (struct loop *);
 extern unsigned default_add_stmt_cost (void *, int, enum vect_cost_for_stmt,
   struct _stmt_vec_info *, int,
Index: gcc/targhooks.c
===
--- gcc/targhooks.c 2017-09-04 12:18:55.825348732 +0100
+++ gcc/targhooks.c 2017-09-04 12:18:58.594757220 +0100
@@ -1200,7 +1200,7 @@ default_autovectorize_vector_sizes (void
 
 /* By defaults a vector of integers is used as a mask.  */
 
-machine_mode
+opt_machine_mode
 default_get_mask_mode (unsigned nunits, unsigned vector_size)
 {
   unsigned elem_size = vector_size / nunits;
@@ -1210,12 +1210,12 @@ default_get_mask_mode (unsigned nunits,
 
   gcc_assert (elem_size * nunits == vector_size);
 
-  if (!mode_for_vector (elem_mode, nunits).exists (&vector_mode)
-  || !VECTOR_MODE_P (vector_mode)
-  || !targetm.vector_mode_supported_p (vector_mode))
-vector_mode = BLKmode;
+  if (mode_for_vector (elem_mode, nunits).exists (&vector_mode)
+  && VECTOR_MODE_P (vector_mode)
+  && targetm.vector_mode_supported_p (vector_mode))
+return vector_mode;
 
-  return vector_mode;
+  return opt_machine_mode ();
 

[6/9] Make mode_for_vector return an opt_mode

2017-09-04 Thread Richard Sandiford
...following on from the mode_for_size change.  The patch also removes
machmode.h versions of the stor-layout.c comments, since the comments
in the .c file are more complete.

2017-09-04  Richard Sandiford  

gcc/
* machmode.h (mode_for_vector): Return an opt_mode.
* stor-layout.c (mode_for_vector): Likewise.
(mode_for_int_vector): Update accordingly.
(layout_type): Likewise.
* config/i386/i386.c (emit_memmov): Likewise.
(ix86_expand_set_or_movmem): Likewise.
(ix86_expand_vector_init): Likewise.
(ix86_get_mask_mode): Likewise.
* config/powerpcspe/powerpcspe.c (rs6000_expand_vec_perm_const_1):
Likewise.
* config/rs6000/rs6000.c (rs6000_expand_vec_perm_const_1): Likewise.
* expmed.c (extract_bit_field_1): Likewise.
* expr.c (expand_expr_real_2): Likewise.
* optabs-query.c (can_vec_perm_p): Likewise.
(can_vec_mask_load_store_p): Likewise.
* optabs.c (expand_vec_perm): Likewise.
* targhooks.c (default_get_mask_mode): Likewise.
* tree-vect-stmts.c (vectorizable_store): Likewise.
(vectorizable_load): Likewise.
(get_vectype_for_scalar_type_and_size): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
+++ gcc/machmode.h  2017-09-04 12:18:55.821333642 +0100
@@ -682,8 +682,6 @@ decimal_float_mode_for_size (unsigned in
 (mode_for_size (size, MODE_DECIMAL_FLOAT, 0));
 }
 
-/* Similar to mode_for_size, but find the smallest mode for a given width.  */
-
 extern machine_mode smallest_mode_for_size (unsigned int, enum mode_class);
 
 /* Find the narrowest integer mode that contains at least SIZE bits.
@@ -695,17 +693,9 @@ smallest_int_mode_for_size (unsigned int
   return as_a <scalar_int_mode> (smallest_mode_for_size (size, MODE_INT));
 }
 
-/* Return an integer mode of exactly the same size as the input mode.  */
-
 extern opt_scalar_int_mode int_mode_for_mode (machine_mode);
-
 extern machine_mode bitwise_mode_for_mode (machine_mode);
-
-/* Return a mode that is suitable for representing a vector,
-   or BLKmode on failure.  */
-
-extern machine_mode mode_for_vector (scalar_mode, unsigned);
-
+extern opt_machine_mode mode_for_vector (scalar_mode, unsigned);
 extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
 
 /* Return the integer vector equivalent of MODE, if one exists.  In other
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
+++ gcc/stor-layout.c   2017-09-04 12:18:55.824344959 +0100
@@ -471,11 +471,11 @@ bitwise_type_for_mode (machine_mode mode
   return inner_type;
 }
 
-/* Find a mode that is suitable for representing a vector with
-   NUNITS elements of mode INNERMODE.  Returns BLKmode if there
-   is no suitable mode.  */
+/* Find a mode that is suitable for representing a vector with NUNITS
+   elements of mode INNERMODE, if one exists.  The returned mode can be
+   either an integer mode or a vector mode.  */
 
-machine_mode
+opt_machine_mode
 mode_for_vector (scalar_mode innermode, unsigned nunits)
 {
   machine_mode mode;
@@ -499,22 +499,18 @@ mode_for_vector (scalar_mode innermode,
   FOR_EACH_MODE_FROM (mode, mode)
 if (GET_MODE_NUNITS (mode) == nunits
&& GET_MODE_INNER (mode) == innermode)
-  break;
+  return mode;
 
   /* For integers, try mapping it to a same-sized scalar mode.  */
-  if (mode == VOIDmode
-  && GET_MODE_CLASS (innermode) == MODE_INT)
+  if (GET_MODE_CLASS (innermode) == MODE_INT)
 {
   unsigned int nbits = nunits * GET_MODE_BITSIZE (innermode);
-  mode = int_mode_for_size (nbits, 0).else_blk ();
+  if (int_mode_for_size (nbits, 0).exists (&mode)
+ && have_regs_of_mode[mode])
+   return mode;
 }
 
-  if (mode == VOIDmode
-  || (GET_MODE_CLASS (mode) == MODE_INT
- && !have_regs_of_mode[mode]))
-return BLKmode;
-
-  return mode;
+  return opt_machine_mode ();
 }
 
 /* Return the mode for a vector that has NUNITS integer elements of
@@ -525,12 +521,10 @@ mode_for_vector (scalar_mode innermode,
 mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
 {
   scalar_int_mode int_mode;
-  if (int_mode_for_size (int_bits, 0).exists (&int_mode))
-{
-  machine_mode vec_mode = mode_for_vector (int_mode, nunits);
-  if (vec_mode != BLKmode)
-   return vec_mode;
-}
+  machine_mode vec_mode;
+  if (int_mode_for_size (int_bits, 0).exists (&int_mode)
+  && mode_for_vector (int_mode, nunits).exists (&vec_mode))
+return vec_mode;
   return opt_machine_mode ();
 }
 
@@ -2264,7 +2258,7 @@ layout_type (tree type)
if (TYPE_MODE (type) == VOIDmode)
  SET_TYPE_MODE (type,
 mode_for_vector (SCALAR_TYPE_MODE (innertype),
- 

[5/9] Add mode_for_int_vector helper functions

2017-09-04 Thread Richard Sandiford
There are at least a few places that want to create an integer vector
with a specified element size and element count, or to create the
integer equivalent of an existing mode.  This patch adds helpers
for doing that.

The require ()s are all used in functions that go on to emit
instructions that use the result as a vector mode.

2017-09-04  Richard Sandiford  

gcc/
* machmode.h (mode_for_int_vector): New function.
* stor-layout.c (mode_for_int_vector): Likewise.
* config/aarch64/aarch64.c (aarch64_emit_approx_sqrt): Use it.
* config/powerpcspe/powerpcspe.c (rs6000_do_expand_vec_perm): Likewise.
* config/rs6000/rs6000.c (rs6000_do_expand_vec_perm): Likewise.
* config/s390/s390.c (s390_expand_vec_compare_cc): Likewise.
(s390_expand_vcond): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
+++ gcc/machmode.h  2017-09-04 12:18:53.153306182 +0100
@@ -706,6 +706,21 @@ extern machine_mode bitwise_mode_for_mod
 
 extern machine_mode mode_for_vector (scalar_mode, unsigned);
 
+extern opt_machine_mode mode_for_int_vector (unsigned int, unsigned int);
+
+/* Return the integer vector equivalent of MODE, if one exists.  In other
+   words, return the mode for an integer vector that has the same number
+   of bits as MODE and the same number of elements as MODE, with the
+   latter being 1 if MODE is scalar.  The returned mode can be either
+   an integer mode or a vector mode.  */
+
+inline opt_machine_mode
+mode_for_int_vector (machine_mode mode)
+{
+  return mode_for_int_vector (GET_MODE_UNIT_BITSIZE (mode),
+ GET_MODE_NUNITS (mode));
+}
+
 /* A class for iterating through possible bitfield modes.  */
 class bit_field_mode_iterator
 {
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
+++ gcc/stor-layout.c   2017-09-04 12:18:53.153306182 +0100
@@ -517,6 +517,23 @@ mode_for_vector (scalar_mode innermode,
   return mode;
 }
 
+/* Return the mode for a vector that has NUNITS integer elements of
+   INT_BITS bits each, if such a mode exists.  The mode can be either
+   an integer mode or a vector mode.  */
+
+opt_machine_mode
+mode_for_int_vector (unsigned int int_bits, unsigned int nunits)
+{
+  scalar_int_mode int_mode;
+  if (int_mode_for_size (int_bits, 0).exists (&int_mode))
+{
+  machine_mode vec_mode = mode_for_vector (int_mode, nunits);
+  if (vec_mode != BLKmode)
+   return vec_mode;
+}
+  return opt_machine_mode ();
+}
+
 /* Return the alignment of MODE. This will be bounded by 1 and
BIGGEST_ALIGNMENT.  */
 
Index: gcc/config/aarch64/aarch64.c
===
--- gcc/config/aarch64/aarch64.c2017-09-04 12:18:44.874165502 +0100
+++ gcc/config/aarch64/aarch64.c2017-09-04 12:18:53.144272229 +0100
@@ -8282,9 +8282,6 @@ aarch64_emit_approx_sqrt (rtx dst, rtx s
   return false;
 }
 
-  machine_mode mmsk
-= mode_for_vector (int_mode_for_mode (GET_MODE_INNER (mode)).require (),
-  GET_MODE_NUNITS (mode));
   if (!recp)
 {
   if (!(flag_mlow_precision_sqrt
@@ -8302,7 +8299,7 @@ aarch64_emit_approx_sqrt (rtx dst, rtx s
 /* Caller assumes we cannot fail.  */
 gcc_assert (use_rsqrt_p (mode));
 
-
+  machine_mode mmsk = mode_for_int_vector (mode).require ();
   rtx xmsk = gen_reg_rtx (mmsk);
   if (!recp)
 /* When calculating the approximate square root, compare the
Index: gcc/config/powerpcspe/powerpcspe.c
===
--- gcc/config/powerpcspe/powerpcspe.c  2017-09-04 12:18:44.919414816 +0100
+++ gcc/config/powerpcspe/powerpcspe.c  2017-09-04 12:18:53.148287319 +0100
@@ -38739,8 +38739,7 @@ rs6000_do_expand_vec_perm (rtx target, r
 
   imode = vmode;
   if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
-imode = mode_for_vector
-  (int_mode_for_mode (GET_MODE_INNER (vmode)).require (), nelt);
+imode = mode_for_int_vector (vmode).require ();
 
   x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
   x = expand_vec_perm (vmode, op0, op1, x, target);
Index: gcc/config/rs6000/rs6000.c
===
--- gcc/config/rs6000/rs6000.c  2017-09-04 12:18:44.929470219 +0100
+++ gcc/config/rs6000/rs6000.c  2017-09-04 12:18:53.151298637 +0100
@@ -35584,8 +35584,7 @@ rs6000_do_expand_vec_perm (rtx target, r
 
   imode = vmode;
   if (GET_MODE_CLASS (vmode) != MODE_VECTOR_INT)
-imode = mode_for_vector
-  (int_mode_for_mode (GET_MODE_INNER (vmode)).require (), nelt);
+imode = mode_for_int_vector (vmode).require ();
 
   x = gen_rtx_CONST_VECTOR (imode, gen_rtvec_v (nelt, perm));
   x = expand_vec_perm (vmode, op0, op1, x, target);
Index: 

RE: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product for generic tests for ARM and AArch64 [Patch (7/8)]

2017-09-04 Thread Tamar Christina
> I'm surprised that this worked!
> 
> It looks like you unconditionally add the -march=armv8.2-a+dotprod options,
> which should cause you to generate instructions which will not execute on
> targets which don't support this instruction. As far as I can see, this is an
> execute test, so that should cause undefined instruction exceptions on an
> Armv8-A target at the very least.

It's not; there is no dg-do specified, which means it defaults to "compile".
This is a straight compilation test that checks whether the target can do the
reduction. There may be a main, but it's never executed, which is why I don't
have a hardware check against it.

The unconditional armv8.2-a+dotprod is for this reason: it doesn't matter
what hardware runs the test.

> 
> So, not OK in its current form.
> 
> Thanks,
> James
> 
> >
> > Ok for trunk?
> >
> > gcc/testsuite
> > 2017-09-01  Tamar Christina  
> >
> > * gcc.dg/vect/vect-reduc-dot-s8a.c
> > (dg-additional-options, dg-require-effective-target): Add +dotprod.
> > * gcc.dg/vect/vect-reduc-dot-u8a.c
> > (dg-additional-options, dg-require-effective-target): Add +dotprod.
> >
> > --



[4/9] Make mode_for_size return an opt_mode

2017-09-04 Thread Richard Sandiford
...to make it consistent with int_mode_for_size etc.

require () seems like the right choice in replace_reg_with_saved_mem
because we use the chosen mode for saving and restoring registers,
which cannot be done in BLKmode.  Similarly require () seems like
the right choice in calls related to secondary memory reloads (the ones
in config/, and in get_secondary_mem) because the reload must always
have a defined mode, which e.g. determines the size of the slot.

We can use require () in simplify_subreg_concatn and assemble_integer
because it isn't meaningful to create a subreg with BLKmode (for one
thing, we couldn't tell then whether it was partial, paradoxical, etc.).

make_fract_type and make_accum_type must find a mode because that's
what distinguishes accumulator FIXED_POINT_TYPEs from fractional
FIXED_POINT_TYPEs.

2017-09-04  Richard Sandiford  

gcc/
* machmode.h (opt_machine_mode): New type.
(opt_mode): Allow construction from anything that can be
converted to a T.
(is_a, as_a, dyn_cast): Add overloads for opt_mode.
(mode_for_size): Return an opt_machine_mode.
* stor-layout.c (mode_for_size): Likewise.
(mode_for_size_tree): Update call accordingly.
(bitwise_mode_for_mode): Likewise.
(make_fract_type): Likewise.
(make_accum_type): Likewise.
* caller-save.c (replace_reg_with_saved_mem): Update call
accordingly.
* config/alpha/alpha.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
* config/i386/i386.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
* config/s390/s390.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
* config/sparc/sparc.h (SECONDARY_MEMORY_NEEDED_MODE): Likewise.
* expmed.c (extract_bit_field_1): Likewise.
* reload.c (get_secondary_mem): Likewise.
* varasm.c (assemble_integer): Likewise.
* lower-subreg.c (simplify_subreg_concatn): Likewise.  Move
early-out.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-09-04 12:18:47.820398622 +0100
+++ gcc/machmode.h  2017-09-04 12:18:50.674859598 +0100
@@ -20,6 +20,8 @@ Software Foundation; either version 3, o
 #ifndef HAVE_MACHINE_MODES
 #define HAVE_MACHINE_MODES
 
+typedef opt_mode<machine_mode> opt_machine_mode;
+
 extern CONST_MODE_SIZE unsigned short mode_size[NUM_MACHINE_MODES];
 extern const unsigned short mode_precision[NUM_MACHINE_MODES];
 extern const unsigned char mode_inner[NUM_MACHINE_MODES];
@@ -237,6 +239,8 @@ #define POINTER_BOUNDS_MODE_P(MODE)
 
   ALWAYS_INLINE opt_mode () : m_mode (E_VOIDmode) {}
   ALWAYS_INLINE opt_mode (const T &m) : m_mode (m) {}
+  template<typename U>
+  ALWAYS_INLINE opt_mode (const U &m) : m_mode (T (m)) {}
   ALWAYS_INLINE opt_mode (from_int m) : m_mode (machine_mode (m)) {}
 
   machine_mode else_void () const;
@@ -325,6 +329,13 @@ is_a (machine_mode m)
   return T::includes_p (m);
 }
 
+template<typename T, typename U>
+inline bool
+is_a (const opt_mode<U> &m)
+{
+  return T::includes_p (m.else_void ());
+}
+
 /* Assert that mode M has type T, and return it in that form.  */
 
 template
@@ -335,6 +346,13 @@ as_a (machine_mode m)
   return typename mode_traits<T>::from_int (m);
 }
 
+template<typename T, typename U>
+inline T
+as_a (const opt_mode<U> &m)
+{
+  return as_a <T> (m.else_void ());
+}
+
 /* Convert M to an opt_mode.  */
 
 template<typename T>
@@ -346,6 +364,13 @@ dyn_cast (machine_mode m)
   return opt_mode<T> ();
 }
 
+template<typename T, typename U>
+inline opt_mode<T>
+dyn_cast (const opt_mode<U> &m)
+{
+  return dyn_cast <T> (m.else_void ());
+}
+
 /* Return true if mode M has type T, storing it as a T in *RESULT
if so.  */
 
@@ -627,11 +652,7 @@ GET_MODE_2XWIDER_MODE (const T &m)
 extern const unsigned char mode_complex[NUM_MACHINE_MODES];
 #define GET_MODE_COMPLEX_MODE(MODE) ((machine_mode) mode_complex[MODE])
 
-/* Return the mode for data of a given size SIZE and mode class CLASS.
-   If LIMIT is nonzero, then don't use modes bigger than MAX_FIXED_MODE_SIZE.
-   The value is BLKmode if no other mode is found.  */
-
-extern machine_mode mode_for_size (unsigned int, enum mode_class, int);
+extern opt_machine_mode mode_for_size (unsigned int, enum mode_class, int);
 
 /* Return the machine mode to use for a MODE_INT of SIZE bits, if one
exists.  If LIMIT is nonzero, modes wider than MAX_FIXED_MODE_SIZE
Index: gcc/stor-layout.c
===
--- gcc/stor-layout.c   2017-09-04 12:18:44.944553324 +0100
+++ gcc/stor-layout.c   2017-09-04 12:18:50.675762071 +0100
@@ -291,19 +291,19 @@ finalize_size_functions (void)
   vec_free (size_functions);
 }
 
-/* Return the machine mode to use for a nonscalar of SIZE bits.  The
-   mode must be in class MCLASS, and have exactly that many value bits;
-   it may have padding as well.  If LIMIT is nonzero, modes of wider
-   than MAX_FIXED_MODE_SIZE will not be used.  */
+/* Return a machine mode of class MCLASS with SIZE bits of precision,
+   if one exists.  The mode may have 

[3/9] (decimal_)float_mode_for_size in real.h

2017-09-04 Thread Richard Sandiford
This patch makes the binary float macros in real.h use
float_mode_for_size and adds a corresponding decimal_float_mode_for_size
for the decimal macros.

2017-09-04  Richard Sandiford  

gcc/
* machmode.h (decimal_float_mode_for_size): New function.
* real.h (REAL_VALUE_TO_TARGET_LONG_DOUBLE): Use float_mode_for_size.
(REAL_VALUE_TO_TARGET_DOUBLE): Likewise.
(REAL_VALUE_TO_TARGET_SINGLE): Likewise.
(REAL_VALUE_TO_TARGET_DECIMAL128): Use decimal_float_mode_for_size.
(REAL_VALUE_TO_TARGET_DECIMAL64): Likewise.
(REAL_VALUE_TO_TARGET_DECIMAL32): Likewise.

Index: gcc/machmode.h
===
--- gcc/machmode.h  2017-08-30 12:20:57.010045759 +0100
+++ gcc/machmode.h  2017-09-04 12:18:47.820398622 +0100
@@ -652,6 +652,15 @@ float_mode_for_size (unsigned int size)
   return dyn_cast <scalar_float_mode> (mode_for_size (size, MODE_FLOAT, 0));
 }
 
+/* Likewise for MODE_DECIMAL_FLOAT.  */
+
+inline opt_scalar_float_mode
+decimal_float_mode_for_size (unsigned int size)
+{
+  return dyn_cast <scalar_float_mode>
+    (mode_for_size (size, MODE_DECIMAL_FLOAT, 0));
+}
+
 /* Similar to mode_for_size, but find the smallest mode for a given width.  */
 
 extern machine_mode smallest_mode_for_size (unsigned int, enum mode_class);
Index: gcc/real.h
===
--- gcc/real.h  2017-08-30 12:09:02.416468293 +0100
+++ gcc/real.h  2017-09-04 12:18:47.820398622 +0100
@@ -383,27 +383,28 @@ #define REAL_VALUE_MINUS_ZERO(x)  real_is
 /* IN is a REAL_VALUE_TYPE.  OUT is an array of longs.  */
 #define REAL_VALUE_TO_TARGET_LONG_DOUBLE(IN, OUT)  \
   real_to_target (OUT, &(IN),  \
- mode_for_size (LONG_DOUBLE_TYPE_SIZE, MODE_FLOAT, 0))
+ float_mode_for_size (LONG_DOUBLE_TYPE_SIZE).require ())
 
 #define REAL_VALUE_TO_TARGET_DOUBLE(IN, OUT) \
-  real_to_target (OUT, &(IN), mode_for_size (64, MODE_FLOAT, 0))
+  real_to_target (OUT, &(IN), float_mode_for_size (64).require ())
 
 /* IN is a REAL_VALUE_TYPE.  OUT is a long.  */
 #define REAL_VALUE_TO_TARGET_SINGLE(IN, OUT) \
-  ((OUT) = real_to_target (NULL, &(IN), mode_for_size (32, MODE_FLOAT, 0)))
+  ((OUT) = real_to_target (NULL, &(IN), float_mode_for_size (32).require ()))
 
 /* Real values to IEEE 754 decimal floats.  */
 
 /* IN is a REAL_VALUE_TYPE.  OUT is an array of longs.  */
 #define REAL_VALUE_TO_TARGET_DECIMAL128(IN, OUT) \
-  real_to_target (OUT, &(IN), mode_for_size (128, MODE_DECIMAL_FLOAT, 0))
+  real_to_target (OUT, &(IN), decimal_float_mode_for_size (128).require ())
 
 #define REAL_VALUE_TO_TARGET_DECIMAL64(IN, OUT) \
-  real_to_target (OUT, &(IN), mode_for_size (64, MODE_DECIMAL_FLOAT, 0))
+  real_to_target (OUT, &(IN), decimal_float_mode_for_size (64).require ())
 
 /* IN is a REAL_VALUE_TYPE.  OUT is a long.  */
 #define REAL_VALUE_TO_TARGET_DECIMAL32(IN, OUT) \
-  ((OUT) = real_to_target (NULL, &(IN), mode_for_size (32, MODE_DECIMAL_FLOAT, 0)))
+  ((OUT) = real_to_target (NULL, &(IN), \
+  decimal_float_mode_for_size (32).require ()))
 
 extern REAL_VALUE_TYPE real_value_truncate (format_helper, REAL_VALUE_TYPE);
 


[2/9] Make more use of int_mode_for_size

2017-09-04 Thread Richard Sandiford
This patch converts more places that could use int_mode_for_size instead
of mode_for_size.  This is in preparation for an upcoming patch that
makes mode_for_size itself return an opt_mode.

require () seems like the right choice in expand_builtin_powi
because we have got past the point of backing out.  We go on to do:

  op1 = expand_expr (arg1, NULL_RTX, mode2, EXPAND_NORMAL);
  if (GET_MODE (op1) != mode2)
op1 = convert_to_mode (mode2, op1, 0);

which would be invalid for (and have failed for) BLKmode.

In get_builtin_sync_mode and expand_ifn_atomic_compare_exchange,
the possible bitsizes are {8, 16, 32, 64, 128}, all of which give
target-independent integer modes (up to TImode).  The comment above
the call in get_builtin_sync_mode makes clear that an integer mode
must be found.

We can use require () in expand_builtin_atomic_clear and
expand_builtin_atomic_test_and_set because there's always an integer
mode for the boolean type.  The same goes for the POINTER_SIZE request
in layout_type.  Similarly we can use require () in combine_instructions
and gen_lowpart_common because there's always an integer mode for
HOST_BITS_PER_WIDE_INT (DImode when BITS_PER_UNIT == 8), and
HOST_BITS_PER_DOUBLE_INT (TImode).

The calls in aarch64_function_value, arm_function_value,
aapcs_allocate_return_reg and mips_function_value_1 are handling
cases in which a big-endian target passes or returns values at
the most significant end of a register.  In each case the ABI
constrains the size to a small amount and does not handle
non-power-of-2 sizes wider than a word.

The calls in c6x_expand_movmem, i386.c:emit_memset,
lm32_block_move_inline, microblaze_block_move_straight and
mips_block_move_straight are dealing with expansions of
block memory operations using register-wise operations,
and those registers must have non-BLK mode.

The reason for using require () in ix86_expand_sse_cmp,
mips_expand_ins_as_unaligned_store, spu.c:adjust_operand and
spu_emit_branch_or_set is that we go on to emit non-call
instructions that use registers of that mode, which wouldn't
be valid for BLKmode.

2017-09-04  Richard Sandiford  

gcc/
* builtins.c (expand_builtin_powi): Use int_mode_for_size.
(get_builtin_sync_mode): Likewise.
(expand_ifn_atomic_compare_exchange): Likewise.
(expand_builtin_atomic_clear): Likewise.
(expand_builtin_atomic_test_and_set): Likewise.
(fold_builtin_atomic_always_lock_free): Likewise.
* calls.c (compute_argument_addresses): Likewise.
(emit_library_call_value_1): Likewise.
(store_one_arg): Likewise.
* combine.c (combine_instructions): Likewise.
* config/aarch64/aarch64.c (aarch64_function_value): Likewise.
* config/arm/arm.c (arm_function_value): Likewise.
(aapcs_allocate_return_reg): Likewise.
* config/c6x/c6x.c (c6x_expand_movmem): Likewise.
* config/i386/i386.c (construct_container): Likewise.
(ix86_gimplify_va_arg): Likewise.
(ix86_expand_sse_cmp): Likewise.
(emit_memmov): Likewise.
(emit_memset): Likewise.
(expand_small_movmem_or_setmem): Likewise.
(ix86_expand_pextr): Likewise.
(ix86_expand_pinsr): Likewise.
* config/lm32/lm32.c (lm32_block_move_inline): Likewise.
* config/microblaze/microblaze.c (microblaze_block_move_straight):
Likewise.
* config/mips/mips.c (mips_function_value_1): Likewise.
(mips_block_move_straight): Likewise.
(mips_expand_ins_as_unaligned_store): Likewise.
* config/powerpcspe/powerpcspe.c
(rs6000_darwin64_record_arg_advance_flush): Likewise.
(rs6000_darwin64_record_arg_flush): Likewise.
* config/rs6000/rs6000.c
(rs6000_darwin64_record_arg_advance_flush): Likewise.
(rs6000_darwin64_record_arg_flush): Likewise.
* config/sparc/sparc.c (sparc_function_arg_1): Likewise.
(sparc_function_value_1): Likewise.
* config/spu/spu.c (adjust_operand): Likewise.
(spu_emit_branch_or_set): Likewise.
(arith_immediate_p): Likewise.
* emit-rtl.c (gen_lowpart_common): Likewise.
* expr.c (expand_expr_real_1): Likewise.
* function.c (assign_parm_setup_block): Likewise.
* gimple-ssa-store-merging.c (encode_tree_to_bitpos): Likewise.
* reload1.c (alter_reg): Likewise.
* stor-layout.c (mode_for_vector): Likewise.
(layout_type): Likewise.

gcc/ada/
* gcc-interface/utils2.c (build_load_modify_store):
Use int_mode_for_size.

Index: gcc/builtins.c
===
--- gcc/builtins.c  2017-09-04 08:30:09.328308115 +0100
+++ gcc/builtins.c  2017-09-04 12:18:44.865115639 +0100
@@ -2755,7 +2755,7 @@ expand_builtin_powi (tree exp, rtx targe
   /* Emit a libcall to libgcc.  */
 
   /* Mode of the 2nd argument must match that of an int.  */
-  mode2 = 

[1/9] Make more use of int_mode_for_mode

2017-09-04 Thread Richard Sandiford
This patch converts more places that could use int_mode_for_mode
instead of mode_for_size.  This is in preparation for an upcoming
patch that makes mode_for_size itself return an opt_mode.

The reason for using require () in exp2_immediate_p is that
we go on to do:

trunc_int_for_mode (..., int_mode)

which would be invalid for (and have failed for) BLKmode.

The reason for using require () in spu_convert_move and
resolve_simple_move is that we go on to use registers of
the returned mode in non-call rtl instructions, which would
be invalid for BLKmode.

2017-09-04  Richard Sandiford  

gcc/
* config/spu/spu.c (exp2_immediate_p): Use int_mode_for_mode.
(spu_convert_move): Likewise.
* lower-subreg.c (resolve_simple_move): Likewise.

Index: gcc/config/spu/spu.c
===
--- gcc/config/spu/spu.c2017-09-04 11:50:24.563372530 +0100
+++ gcc/config/spu/spu.c2017-09-04 12:18:41.572976650 +0100
@@ -3372,7 +3372,7 @@ arith_immediate_p (rtx op, machine_mode
   constant_to_array (mode, op, arr);
 
   bytes = GET_MODE_UNIT_SIZE (mode);
-  mode = mode_for_size (GET_MODE_UNIT_BITSIZE (mode), MODE_INT, 0);
+  mode = int_mode_for_mode (GET_MODE_INNER (mode)).require ();
 
   /* Check that bytes are repeated. */
   for (i = bytes; i < 16; i += bytes)
@@ -3415,7 +3415,7 @@ exp2_immediate_p (rtx op, machine_mode m
   mode = GET_MODE_INNER (mode);
 
   bytes = GET_MODE_SIZE (mode);
-  int_mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
+  int_mode = int_mode_for_mode (mode).require ();
 
   /* Check that bytes are repeated. */
   for (i = bytes; i < 16; i += bytes)
@@ -4503,7 +4503,7 @@ spu_expand_mov (rtx * ops, machine_mode
 spu_convert_move (rtx dst, rtx src)
 {
   machine_mode mode = GET_MODE (dst);
-  machine_mode int_mode = mode_for_size (GET_MODE_BITSIZE (mode), MODE_INT, 0);
+  machine_mode int_mode = int_mode_for_mode (mode).require ();
   rtx reg;
   gcc_assert (GET_MODE (src) == TImode);
   reg = int_mode != mode ? gen_reg_rtx (int_mode) : dst;
Index: gcc/lower-subreg.c
===
--- gcc/lower-subreg.c  2017-09-04 11:50:08.544543511 +0100
+++ gcc/lower-subreg.c  2017-09-04 12:18:41.572976650 +0100
@@ -956,11 +956,7 @@ resolve_simple_move (rtx set, rtx_insn *
   if (real_dest == NULL_RTX)
real_dest = dest;
   if (!SCALAR_INT_MODE_P (dest_mode))
-   {
- dest_mode = mode_for_size (GET_MODE_SIZE (dest_mode) * BITS_PER_UNIT,
-MODE_INT, 0);
- gcc_assert (dest_mode != BLKmode);
-   }
+   dest_mode = int_mode_for_mode (dest_mode).require ();
   dest = gen_reg_rtx (dest_mode);
   if (REG_P (real_dest))
REG_ATTRS (dest) = REG_ATTRS (real_dest);


[0/9] Make more use of opt_mode

2017-09-04 Thread Richard Sandiford
The 77-patch machine-mode series originally targeted only the places
that needed to change for variable-sized modes, but as Richard B.
said on IRC, it meant that the interfaces of mode_for_size vs.
int_mode_for_size were inconsistent: the former still returns BLKmode
on failure, while the latter returns an opt_mode.

This series of patches tries to make the machine_mode functions
consistent.  Tested on aarch64-linux-gnu, x86_64-linux-gnu,
powerpc64le-linux-gnu, and by checking that there were no extra
warnings or changes in testsuite output for one target per CPU.

Richard


Re: [PATCH][GCC][Testsuite][ARM][AArch64] Enable Dot Product for generic tests for ARM and AArch64 [Patch (7/8)]

2017-09-04 Thread James Greenhalgh
On Fri, Sep 01, 2017 at 02:23:39PM +0100, Tamar Christina wrote:
> Hi All,
> 
> This patch enables tests for Dot Product vectorization
> in gcc for ARM and AArch64.
> 
> The ARMv8.2-a Dot Product instructions only support 8-bit
> element vectorization.
> 
> Dot product is available from ARMv8.2-a and onwards.
> 
> Regtested and bootstrapped on aarch64-none-elf and
> arm-none-eabi and no issues.

I'm surprised that this worked!

It looks like you unconditionally add the -march=armv8.2-a+dotprod
options, which should cause you to generate instructions which will
not execute on targets which don't support this instruction. As far as I can
see, this is an execute test, so that should cause undefined instruction
exceptions on an Armv8-A target at the very least.

So, not OK in its current form.

Thanks,
James

> 
> Ok for trunk?
> 
> gcc/testsuite
> 2017-09-01  Tamar Christina  
> 
>   * gcc.dg/vect/vect-reduc-dot-s8a.c
>   (dg-additional-options, dg-require-effective-target): Add +dotprod.
>   * gcc.dg/vect/vect-reduc-dot-u8a.c
>   (dg-additional-options, dg-require-effective-target): Add +dotprod.
> 
> -- 
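For reference, one conventional way to guard such an execute test is a hardware effective-target check plus guarded options, so the test is skipped rather than faulting on older cores. This is a sketch based on the usual arm/aarch64 testsuite conventions — the directive names here are assumptions, not taken from the patch under review:

```c
/* { dg-do run } */
/* Skip unless the hardware can actually execute Dot Product code.  */
/* { dg-require-effective-target arm_v8_2a_dotprod_neon_hw } */
/* { dg-add-options arm_v8_2a_dotprod_neon } */
```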



Re: [PATCH][GCC][AArch64] Dot Product NEON intrinsics [Patch (6/8)]

2017-09-04 Thread James Greenhalgh
On Fri, Sep 01, 2017 at 02:22:55PM +0100, Tamar Christina wrote:
> Hi All,
> 
> This patch adds the Adv.SIMD intrinsics for Dot product.
> 
> Dot product is available from ARMv8.2-a and onwards.
> 
> Regtested and bootstrapped on aarch64-none-elf and no issues.
> 
> Ok for trunk?

OK.

Thanks,
James

> 
> gcc/
> 2017-09-01  Tamar Christina  
> 
>   * config/aarch64/arm_neon.h (vdot_u32, vdotq_u32, vdot_s32, vdotq_s32): 
> New.
>   (vdot_lane_u32, vdot_laneq_u32, vdotq_lane_u32, vdotq_laneq_u32): New.
>   (vdot_lane_s32, vdot_laneq_s32, vdotq_lane_s32, vdotq_laneq_s32): New.
> 
> gcc/testsuite/
> 2017-09-01  Tamar Christina  
> 
>   * gcc.target/aarch64/advsimd-intrinsics/vect-dot-qi.h: New.
>   * gcc.target/aarch64/advsimd-intrinsics/vdot-compile.c: New.
>   * gcc.target/aarch64/advsimd-intrinsics/vect-dot-s8.c: New.
>   * gcc.target/aarch64/advsimd-intrinsics/vect-dot-u8.c: New.
> 
> -- 



Re: [PATCH][GCC][AArch64] Dot Product SIMD patterns [Patch (5/8)]

2017-09-04 Thread James Greenhalgh
On Fri, Sep 01, 2017 at 02:22:17PM +0100, Tamar Christina wrote:
> Hi All,
> 
> This patch adds the instructions for Dot Product to AArch64 along
> with the intrinsics and vectorizer pattern.
> 
> Armv8.2-a dot product supports 8-bit element values both
> signed and unsigned.
> 
> Dot product is available from Arm8.2-a and onwards.
> 
> Regtested and bootstrapped on aarch64-none-elf and no issues.
> 
> Ok for trunk?
> 
> gcc/
> 2017-09-01  Tamar Christina  
> 
>   * config/aarch64/aarch64-builtins.c
>   (aarch64_types_quadopu_lane_qualifiers): New.
>   (TYPES_QUADOPU_LANE): New.
>   * config/aarch64/aarch64-simd.md (aarch64_<sur>dot<vsi2qi>): New.
>   (<sur>dot_prod<vsi2qi>, aarch64_<sur>dot_lane<vsi2qi>): New.
>   (aarch64_<sur>dot_laneq<vsi2qi>): New.
>   * config/aarch64/aarch64-simd-builtins.def (sdot, udot): New.
>   (sdot_lane, udot_lane, sdot_laneq, udot_laneq): New.
>   * config/aarch64/iterators.md (UNSPEC_SDOT, UNSPEC_UDOT): New.
>   (DOT_MODE, dot_mode, Vdottype, DOTPROD): New.
>   (sur): Add SDOT and UDOT.
> 
> -- 

> diff --git a/gcc/config/aarch64/aarch64-simd.md 
> b/gcc/config/aarch64/aarch64-simd.md
> index 
> f3e084f8778d70c82823b92fa80ff96021ad26db..21d46c84ab317c2d62afdf8c48117886aaf483b0
>  100644
> --- a/gcc/config/aarch64/aarch64-simd.md
> +++ b/gcc/config/aarch64/aarch64-simd.md
> @@ -386,6 +386,87 @@
>  }
>  )
>  
> +;; These instructions map to the __builtins for the Dot Product operations.
> +(define_insn "aarch64_<sur>dot<vsi2qi>"
> +  [(set (match_operand:VS 0 "register_operand" "=w")
> + (unspec:VS [(match_operand:VS 1 "register_operand" "0")
> + (match_operand:<VSI2QI> 2 "register_operand" "w")
> + (match_operand:<VSI2QI> 3 "register_operand" "w")]
> + DOTPROD))]
> +  "TARGET_DOTPROD"
> +  "<sur>dot\\t%0.<Vtype>, %2.<Vdottype>, %3.<Vdottype>"
> +  [(set_attr "type" "neon_dot")]

Would there be a small benefit in modelling this as:

  [(set (match_operand:VS 0 "register_operand" "=w")
(add:VS (match_operand:VS 1 "register_operand" "0")
(unspec:VS [(match_operand:<VSI2QI> 2 "register_operand" "w")
(match_operand:<VSI2QI> 3 "register_operand" "w")]
DOTPROD)))]


> +)
> +
> +;; These expands map to the Dot Product optab the vectorizer checks for.
> +;; The auto-vectorizer expects a dot product builtin that also does an
> +;; accumulation into the provided register.
> +;; Given the following pattern
> +;;
> +;; for (i=0; i < len; i++) {
> +;; c = a[i] * b[i];
> +;; r += c;
> +;; }
> +;; return result;
> +;;
> +;; This can be auto-vectorized to
> +;; r  = a[0]*b[0] + a[1]*b[1] + a[2]*b[2] + a[3]*b[3];
> +;;
> +;; given enough iterations.  However the vectorizer can keep unrolling the loop
> +;; r += a[4]*b[4] + a[5]*b[5] + a[6]*b[6] + a[7]*b[7];
> +;; r += a[8]*b[8] + a[9]*b[9] + a[10]*b[10] + a[11]*b[11];
> +;; ...
> +;;
> +;; and so the vectorizer provides r, in which the result has to be accumulated.
> +(define_expand "<sur>dot_prod<vsi2qi>"
> +  [(set (match_operand:VS 0 "register_operand")
> + (unspec:VS [(match_operand:<VSI2QI> 1 "register_operand")
> + (match_operand:<VSI2QI> 2 "register_operand")
> + (match_operand:VS 3 "register_operand")]
> + DOTPROD))]

This is just an expand that always ends in a DONE, so doesn't need the
full description here, just:

  [(match_operand:VS 0 "register_operand")
   (match_operand:<VSI2QI> 1 "register_operand")
   (match_operand:<VSI2QI> 2 "register_operand")
   (match_operand:VS 3 "register_operand")]

> diff --git a/gcc/config/aarch64/iterators.md b/gcc/config/aarch64/iterators.md
> index 
> cceb57525c7aa44933419bd317b1f03a7b76f4c4..533c12cca916669195e9b094527ee0de31542b12
>  100644
> --- a/gcc/config/aarch64/iterators.md
> +++ b/gcc/config/aarch64/iterators.md
> @@ -354,6 +354,8 @@
>  UNSPEC_SQRDMLSH ; Used in aarch64-simd.md.
>  UNSPEC_FMAXNM   ; Used in aarch64-simd.md.
>  UNSPEC_FMINNM   ; Used in aarch64-simd.md.
> +UNSPEC_SDOT  ; Used in aarch64-simd.md.
> +UNSPEC_UDOT  ; Used in aarch64-simd.md.
>  ])
>  
>  ;; --
> @@ -810,6 +812,13 @@
>  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
>  (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])
>  
> +;; Mapping attribute for Dot Product input modes based on result mode.
> +(define_mode_attr DOT_MODE [(V2SI "V8QI") (V4SI "V16QI")])
> +(define_mode_attr dot_mode [(V2SI "v8qi") (V4SI "v16qi")])

Are these not identical to the two lines above in the context?

>  (define_mode_attr vsi2qi [(V2SI "v8qi") (V4SI "v16qi")])
>  (define_mode_attr VSI2QI [(V2SI "V8QI") (V4SI "V16QI")])

Thanks,
James


Re: [PATCH][GCC][AArch64] Dot Product commandline options [Patch (4/8)]

2017-09-04 Thread James Greenhalgh
On Fri, Sep 01, 2017 at 02:20:59PM +0100, Tamar Christina wrote:
> Hi All,
> 
> This patch adds support for the +dotprod extension to AArch64.
> Dot Product requires Adv.SIMD to work and so enables this option
> by default when enabled.
> 
> It is available from ARMv8.2-a and onwards and is enabled by
> default on Cortex-A55 and Cortex-A75.
> 
> Regtested and bootstrapped on aarch64-none-elf and no issues.
> 
> Ok for trunk?

Just a couple of rewordings needed, and then OK.

> gcc/
> 2017-09-01  Tamar Christina  
> 
>   * config/aarch64/aarch64.h (AARCH64_FL_DOTPROD): New.
>   (AARCH64_ISA_DOTPROD, TARGET_DOTPROD): New.
>   * config/aarch64/aarch64-c.c (aarch64_update_cpp_builtins): Add 
> TARGET_DOTPROD.
>   * config/aarch64/aarch64-option-extensions.def (dotprod): New.
>   * config/aarch64/aarch64-cores.def (cortex-a55, cortex-a75): Enable 
> TARGET_DOTPROD.
>   (cortex-a75.cortex-a55): Likewise.
>   * doc/invoke.texi (aarch64-feature-modifiers): Document dotprod.
> 
> -- 
> --- a/gcc/config/aarch64/aarch64.h
> +++ b/gcc/config/aarch64/aarch64.h
> @@ -151,7 +151,8 @@ extern unsigned aarch64_architecture_version;
>  #define AARCH64_FL_F16 (1 << 9)  /* Has ARMv8.2-A FP16 
> extensions.  */
>  /* ARMv8.3-A architecture extensions.  */
>  #define AARCH64_FL_V8_3(1 << 10)  /* Has ARMv8.3-A features.  */
> -#define AARCH64_FL_RCPC(1 << 11)  /* Has support for RCpc model. 
>  */
> +#define AARCH64_FL_RCPC   (1 << 11)  /* Has support for RCpc model.  */
> +#define AARCH64_FL_DOTPROD(1 << 12)  /* Has dot product.  */

Are these correctly formatted with the line above? "Has dot product" is not
very descriptive.

>  /* ARMv8.3-A features.  */
>  #define TARGET_ARMV8_3   (AARCH64_ISA_V8_3)
>  
> diff --git a/gcc/doc/invoke.texi b/gcc/doc/invoke.texi
> index 
> 4cb5836a9da22681d192c3750fc8e5a50024ac10..61fbc087f4974c0eb833c2daa131a2f7269d1b84
>  100644
> --- a/gcc/doc/invoke.texi
> +++ b/gcc/doc/invoke.texi
> @@ -14271,6 +14271,9 @@ Enable FP16 extension.  This also enables 
> floating-point instructions.
>  Enable the RcPc extension.  This does not change code generation from GCC,
>  but is passed on to the assembler, enabling inline asm statements to use
>  instructions from the RcPc extension.
> +@item dotprod
> +Enable the Dot Product extension.  This also enables Advanced SIMD 
> instructions
> +and allows auto vectorization of dot products to the Dot Product 
> instructions.

I'd drop the text from "and allows" onwards, it isn't very useful for
figuring out exactly what idioms will be supported, and we don't use that
text on other extensions.

Thanks,
James



[PATCH] Fix PR82060

2017-09-04 Thread Richard Biener

The following fixes PR82060 by canonicalizing MEMs before dispatching
to the devirt machinery.

Bootstrapped and tested on x86_64-unknown-linux-gnu, applied to trunk.

Richard.

2017-09-04  Richard Biener  

PR tree-optimization/82060
* tree-ssa-pre.c (eliminate_dom_walker::before_dom_children):
Move devirtualization after stmt folding and before EH/AB/noreturn
cleanup to get the stmt refs canonicalized.  Use a bool instead
of gimple_modified_p since that doesn't work for NOPs.  Schedule
NOPs generated by folding for removal.

* g++.dg/torture/pr82060.C: New testcase.

Index: gcc/tree-ssa-pre.c
===
--- gcc/tree-ssa-pre.c  (revision 251559)
+++ gcc/tree-ssa-pre.c  (working copy)
@@ -4592,6 +4592,7 @@ eliminate_dom_walker::before_dom_childre
   /* If we didn't replace the whole stmt (or propagate the result
  into all uses), replace all uses on this stmt with their
 leaders.  */
+  bool modified = false;
   use_operand_p use_p;
   ssa_op_iter iter;
   FOR_EACH_SSA_USE_OPERAND (use_p, stmt, iter, SSA_OP_USE)
@@ -4613,7 +4614,7 @@ eliminate_dom_walker::before_dom_childre
  || !bitmap_bit_p (inserted_exprs, SSA_NAME_VERSION (sprime
{
  propagate_value (use_p, sprime);
- gimple_set_modified (stmt, true);
+ modified = true;
  if (TREE_CODE (sprime) == SSA_NAME
  && !is_gimple_debug (stmt))
gimple_set_plf (SSA_NAME_DEF_STMT (sprime),
@@ -4621,8 +4622,56 @@ eliminate_dom_walker::before_dom_childre
}
}
 
+  /* Fold the stmt if modified, this canonicalizes MEM_REFs we propagated
+ into which is a requirement for the IPA devirt machinery.  */
+  gimple *old_stmt = stmt;
+  if (modified)
+   {
+ /* If a formerly non-invariant ADDR_EXPR is turned into an
+invariant one it was on a separate stmt.  */
+ if (gimple_assign_single_p (stmt)
+ && TREE_CODE (gimple_assign_rhs1 (stmt)) == ADDR_EXPR)
+   recompute_tree_invariant_for_addr_expr (gimple_assign_rhs1 (stmt));
+ gimple_stmt_iterator prev = gsi;
+ gsi_prev (&prev);
+ if (fold_stmt (&gsi))
+   {
+ /* fold_stmt may have created new stmts inbetween
+the previous stmt and the folded stmt.  Mark
+all defs created there as varying to not confuse
+the SCCVN machinery as we're using that even during
+elimination.  */
+ if (gsi_end_p (prev))
+   prev = gsi_start_bb (b);
+ else
+   gsi_next (&prev);
+ if (gsi_stmt (prev) != gsi_stmt (gsi))
+   do
+ {
+   tree def;
+   ssa_op_iter dit;
+   FOR_EACH_SSA_TREE_OPERAND (def, gsi_stmt (prev),
+  dit, SSA_OP_ALL_DEFS)
+ /* As existing DEFs may move between stmts
+we have to guard VN_INFO_GET.  */
+ if (! has_VN_INFO (def))
+   VN_INFO_GET (def)->valnum = def;
+   if (gsi_stmt (prev) == gsi_stmt (gsi))
+ break;
+   gsi_next (&prev);
+ }
+   while (1);
+   }
+ stmt = gsi_stmt (gsi);
+ /* In case we folded the stmt away schedule the NOP for removal.  */
+ if (gimple_nop_p (stmt))
+   el_to_remove.safe_push (stmt);
+   }
+
   /* Visit indirect calls and turn them into direct calls if
-possible using the devirtualization machinery.  */
+possible using the devirtualization machinery.  Do this before
+checking for required EH/abnormal/noreturn cleanup as devirt
+may expose more of those.  */
   if (gcall *call_stmt = dyn_cast <gcall *> (stmt))
{
  tree fn = gimple_call_fn (call_stmt);
@@ -4631,24 +4680,21 @@ eliminate_dom_walker::before_dom_childre
  && virtual_method_call_p (fn))
{
  tree otr_type = obj_type_ref_class (fn);
+ unsigned HOST_WIDE_INT otr_token
+   = tree_to_uhwi (OBJ_TYPE_REF_TOKEN (fn));
  tree instance;
- ipa_polymorphic_call_context context (current_function_decl, fn, stmt, &instance);
+ ipa_polymorphic_call_context context (current_function_decl,
+   fn, stmt, &instance);
+ context.get_dynamic_type (instance, OBJ_TYPE_REF_OBJECT (fn),
+   otr_type, stmt);
  bool final;
-
- context.get_dynamic_type (instance, OBJ_TYPE_REF_OBJECT (fn), otr_type, stmt);
-
- vec <cgraph_node *>targets
+ vec <cgraph_node *> targets
= 

[libstdc++/71500] make back reference work with icase

2017-09-04 Thread Tim Shen via gcc-patches
This fixes the follow-up comments in 71500.

Back-reference matching is different from other matching: the content
a back-reference refers to is only known at "run time", i.e. during
regex_match(), not during regex() compilation.

For compilation we already have an abstraction layer that catches all
comparison customizations, namely _M_translator in regex_compiler.h.
Until this patch, we had no such abstraction for "run-time" matching.
I believe back-references are the only place that needs run-time
matching, so I just built a _Backref_matcher in
regex_executor.tcc.

Tested on x86_64-linux-gnu.

Thanks!

-- 
Regards,
Tim Shen
commit a97b7fecd319e031ffc489a956b8cf3dc63eeb26
Author: Tim Shen 
Date:   Mon Sep 4 03:19:35 2017 -0700

PR libstdc++/71500
* include/bits/regex_executor.tcc: Support icase in
regex_traits<...> for back-reference matches.
* testsuite/28_regex/regression.cc: Test case.

diff --git a/libstdc++-v3/include/bits/regex_executor.tcc b/libstdc++-v3/include/bits/regex_executor.tcc
index 226e05856e1..f6149fecf9d 100644
--- a/libstdc++-v3/include/bits/regex_executor.tcc
+++ b/libstdc++-v3/include/bits/regex_executor.tcc
@@ -335,6 +335,54 @@ namespace __detail
 	  _M_states._M_queue(__state._M_next, _M_cur_results);
 }
 
+  template<typename _BiIter, typename _TraitsT>
+struct _Backref_matcher
+{
+  _Backref_matcher(bool __icase, const _TraitsT& __traits)
+  : _M_traits(__traits) { }
+
+  bool
+  _M_apply(_BiIter __expected_begin,
+	   _BiIter __expected_end, _BiIter __actual_begin,
+	   _BiIter __actual_end)
+  {
+	return _M_traits.transform(__expected_begin, __expected_end)
+	== _M_traits.transform(__actual_begin, __actual_end);
+  }
+
+  const _TraitsT& _M_traits;
+};
+
+  template<typename _BiIter, typename _CharT>
+struct _Backref_matcher<_BiIter, std::regex_traits<_CharT>>
+{
+  using _TraitsT = std::regex_traits<_CharT>;
+  _Backref_matcher(bool __icase, const _TraitsT& __traits)
+  : _M_icase(__icase), _M_traits(__traits) { }
+
+  bool
+  _M_apply(_BiIter __expected_begin,
+	   _BiIter __expected_end, _BiIter __actual_begin,
+	   _BiIter __actual_end)
+  {
+	if (!_M_icase)
+	  return std::equal(__expected_begin, __expected_end,
+			__actual_begin, __actual_end);
+	typedef std::ctype<_CharT> __ctype_type;
+	const auto& __fctyp = use_facet<__ctype_type>(_M_traits.getloc());
+	return std::equal(__expected_begin, __expected_end,
+			  __actual_begin, __actual_end,
+			  [this, &__fctyp](_CharT __lhs, _CharT __rhs)
+			  {
+			return __fctyp.tolower(__lhs)
+== __fctyp.tolower(__rhs);
+			  });
+  }
+
+  bool _M_icase;
+  const _TraitsT& _M_traits;
+};
+
   // First fetch the matched result from _M_cur_results as __submatch;
   // then compare it with
   // (_M_current, _M_current + (__submatch.second - __submatch.first)).
@@ -355,9 +403,10 @@ namespace __detail
 	   __last != _M_end && __tmp != __submatch.second;
 	   ++__tmp)
 	++__last;
-  if (_M_re._M_automaton->_M_traits.transform(__submatch.first,
-		  __submatch.second)
-	  == _M_re._M_automaton->_M_traits.transform(_M_current, __last))
+  if (_Backref_matcher<_BiIter, _TraitsT>(
+	  _M_re.flags() & regex_constants::icase,
+	  _M_re._M_automaton->_M_traits)._M_apply(
+		  __submatch.first, __submatch.second, _M_current, __last))
 	{
 	  if (__last != _M_current)
 	{
diff --git a/libstdc++-v3/testsuite/28_regex/regression.cc b/libstdc++-v3/testsuite/28_regex/regression.cc
index ee4d3e1e6f8..3fa9022eac4 100644
--- a/libstdc++-v3/testsuite/28_regex/regression.cc
+++ b/libstdc++-v3/testsuite/28_regex/regression.cc
@@ -93,6 +93,17 @@ test06()
   }
 }
 
+// PR libstdc++/71500
+void
+test07()
+{
+  bool test [[gnu::unused]] = true;
+
+  VERIFY(regex_match_debug("abc abc", regex("([a-z]+) \\1", regex::icase)));
+  VERIFY(regex_match_debug("Abc abc", regex("([a-z]+) \\1", regex::icase)));
+  VERIFY(regex_match_debug("abc Abc", regex("([a-z]+) \\1", regex::icase)));
+}
+
 int
 main()
 {
@@ -102,6 +113,7 @@ main()
   test04();
   test05();
   test06();
+  test07();
   return 0;
 }
 


Re: [PATCH 0/2] add unique_ptr class

2017-09-04 Thread Pedro Alves
On 09/04/2017 11:31 AM, Richard Biener wrote:
> On Fri, Aug 11, 2017 at 10:43 PM, Jonathan Wakely  wrote:
>> On 05/08/17 20:05 +0100, Pedro Alves wrote:
>>>
>>> On 08/04/2017 07:52 PM, Jonathan Wakely wrote:

 On 31/07/17 19:46 -0400, tbsaunde+...@tbsaunde.org wrote:
>
> I've been saying I'd do this for a long time, but I'm finally getting to
> importing the C++98 compatable unique_ptr class Pedro wrote for gdb a
> while
> back.
>>>
>>>
>>> Thanks a lot for doing this!
>>>
 I believe the gtl namespace also comes from Pedro, but GNU template
 library seems as reasonable as any other name I can come up with.
>>>
>>>
>>> Yes, I had suggested it here:
>>>
>>>  https://sourceware.org/ml/gdb-patches/2017-02/msg00197.html
>>>

 If it's inspired by "STL" then can we call it something else?

 std::unique_ptr is not part of the STL, because the STL is a library
 of containers and algorithms from the 1990s. std::unique_ptr is
 something much newer that doesn't originate in the STL.

 STL != C++ Standard Library
>>>
>>>
>>> That I knew, but gtl was not really a reference to the
>>> C++ Standard Library, so I don't see the problem.  It was supposed to
>>> be the name of a library which would contain other C++ utilities
>>> that would be shared by different GNU toolchain components.
>>> Some of those bits would be inspired by, or be straight backports of
>>> more-recent standards, but it'd be more than that.
>>>
>>> An option would be to keep "gtl" as acronym, but expand it
>>> to "GNU Toolchain Library" instead.
>>
>>
>> OK, that raises my hackles less. What bothered me was an apparent
>> analogy to "STL" when that isn't even the right name for the library
>> that provides the original unique_ptr.
>>
>> And "template library" assumes we'd never add non-templates to it,
>> which is unlikely (you already mentioned offset_type, which isn't a
>> template).
>>
>> It seems odd to make up a completely new acronym for this though,
>> rather than naming it after something that exists already in the
>> toolchain codebases.
>>
>>
>>> For example, gdb has been growing C++ utilities under gdb/common/
>>> that might be handy for gcc and other projects too, for example:
>>>
>>> - enum_flags (type-safe enum bit flags)
>>> - function_view (non-owning reference to callables)
>>> - offset_type (type safe / distinct integer types to represent offsets
>>>into anything addressable)
>>> - optional (C++11 backport of libstdc++'s std::optional)
>>> - refcounted_object.h (intrusively-refcounted types)
>>> - scoped_restore (RAII save/restore of globals)
>>> - an allocator that default-constructs using default-initialization
>>>   instead of value-initialization.
>>>
>>> and more.
>>>
>>> GCC OTOH has code that might be handy for GDB as well, like for
>>> example the open addressing hash tables (hash_map/hash_table/hash_set
>>> etc.).
>>>
>>> Maybe Gold could make use of some of this code too.  There
>>> are some bits in Gold that might be useful for (at least) GDB
>>> too.  For example, its Stringpool class.
>>>
>>> I think there's a lot of scope for sharing more code between the
>>> different components of the GNU toolchain, even beyond general
>>> random utilities and data structures, and I'd love to see us
>>> move more in that direction.
>>>
 If we want a namespace for GNU utilities, what's wrong with "gnu"?
>>>
>>>
>>> That'd be an "obvious" choice, and I'm not terribly against it,
>>> though I wonder whether it'd be taking over a name that has a wider
>>> scope than intended?  I.e., GNU is a larger set of projects than the
>>> GNU toolchain.  For example, there's Gnulib, which already compiles
>>> as libgnu.a / -lgnu, which might be confusing.  GCC doesn't currently
>>> use Gnulib, but GDB does, and, there was work going on a while ago to
>>> make GCC use gnulib as well.
>>
>>
>> Good point. "gnutools" might be more accurate, but people might object
>> to adding 10 extra characters for "gnutools::".
>>
>> Naming is important, especially for a whole namespace (not just a
>> single type) so I do think it's worth spending time getting it right.
>>
>> But I could live with gtl as long as nobody ever says "GTL is the GNU
>> STL" or mentions "gtl" and STL in the same breath :-)
> 
> If it should be short use g::.  We can also use gnu:: I guess and I
> agree gnutools:: is a little long (similar to libiberty::).  Maybe
> gt:: as a short-hand for gnutools.

Exactly 3 letters has the nice property of making s/gtl::foo/std::foo/ super
trivial down the road; you don't have to care about reindenting stuff
[1].  Also makes gdb->gtl and gcc->gtl renamings trivial in the same way.
Really a minor thing in the grand scheme of things, but just a FYI that that
factored in a bit in the original motivation for the "gtl" naming back when
I proposed it on the gdb list.

[1] - [PATCH] gdb::{unique_ptr,move} -> std::{unique_ptr,move}:

Re: [RFA] [PATCH][PR tree-optimization/64910] Fix reassociation of binary bitwise operations with 3 operands

2017-09-04 Thread Richard Biener
On Sun, Sep 3, 2017 at 4:44 PM, Jeff Law  wrote:
> On 01/13/2016 05:30 AM, Richard Biener wrote:
>> On Wed, Jan 13, 2016 at 7:39 AM, Jeff Law  wrote:
>>> On 01/12/2016 08:11 AM, Richard Biener wrote:

 On Tue, Jan 12, 2016 at 6:10 AM, Jeff Law  wrote:
>
> On 01/11/2016 03:32 AM, Richard Biener wrote:
>
>>
>> Yeah, reassoc is largely about canonicalization.
>>
>>> Plus doing it in TER is almost certainly more complex than getting it
>>> right
>>> in reassoc to begin with.
>>
>>
>>
>> I guess canonicalizing differently is ok but you'll still create
>> ((a & b) & 1) & c then if you only change the above place.
>
>
> What's best for that expression would depend on factors like whether or
> not
> the target can exploit ILP.  ie (a & b) & (1 & c) exposes more
> parallelism
> while (((a & b) & c) & 1) is not good for parallelism, but does expose
> the
> bit test.
>
> reassoc currently generates ((a & 1) & b) & c which is dreadful as
> there's
> no ILP or chance of creating a bit test.  My patch shuffles things
> around,
> but still doesn't expose the ILP or bit test in the 4 operand case.
> Based
> on the comments in reassoc, it didn't seem like the author thought
> anything
> beyond the 3-operand case was worth handling. So my patch just handles
> the
> 3-operand case.
>
>
>
>>
>> So I'm not sure what pattern the backend is looking for?
>
>
> It just wants the constant last in the sequence.  That exposes bit clear,
> set, flip, test, etc idioms.


 But those don't feed another bit operation, right?  Thus we'd like to see
 ((a & b) & c) & 1, not ((a & b) & 1) & c?  It sounds like the instructions
 are designed to feed conditionals (aka CC consuming ops)?
>>>
>>> At the gimple level they could feed a conditional, or be part of a series of
>>> ops on an SSA_NAME that eventually gets stored to memory, etc.  At the RTL
>>> level they'll feed CC consumers and bit manipulations of pseudos or memory.
>>>
>>> For the 3-op case, we always want the constant last.  For the 4-op case it's
>>> less clear.  Though ((a & b) & c) & 1 is certainly better than ((a & b) & 1)
>>> & c.
>>
>> Ok, so handling it in swap_ops_for_binary_stmt is merely a convenient place
>> to special-case bitwise ops.  The "real" fix to the sorting heuristic would
>> be to sort constants at the opposite end.
>>
>> That might be too invasive right now but there is another "convenient" place:
>>
>>   /* If the operand vector is now empty, all operands were
>>  consumed by the __builtin_powi optimization.  */
>> ...
>>   else
>> {
>>   machine_mode mode = TYPE_MODE (TREE_TYPE (lhs));
>>   int ops_num = ops.length ();
>>   int width = get_reassociation_width (ops_num, rhs_code, mode);
>>   tree new_lhs = lhs;
>>
>>   if (dump_file && (dump_flags & TDF_DETAILS))
>> fprintf (dump_file,
>>  "Width = %d was chosen for
>> reassociation\n", width);
>>
>> at this point you can check rhs_code and move the (single) constant
>> entry in ops (if there is any constant entry) from .last () to the beginning.
>>
>> That'll get the 4 operand case correct as well and properly models
>> "constant last" for the specified operation kind.
> Resurrecting an old thread...  Just now getting around to flushing this
> out of the queue.
>
> To recap, given something like (x & y) & C reassociation will turn that
> into (x & C) & y.  It's functionally equivalent, but it will inhibit
> generation of bit test instructions.
>
> I originally hacked up swap_ops_for_binary_stmt.  You requested that
> change be made in reassociate_bb so that it would apply to cases where
> there are more than 3 args.
>
> So that's what this patch does.   OK for the trunk now?

Ok.

Thanks,
Richard.

> Bootstrapped and regression tested on x86_64.  Also tested the new
> testcase on m68k.
>
>
> commit c10ae0339674c27c89a1fa1904217a55bf530cb3
> Author: Jeff Law 
> Date:   Sun Sep 3 10:42:30 2017 -0400
>
> 2017-09-03  Jeff Law  
>
> PR tree-optimization/64910
> * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
> swap the first and last operand if the last is a constant.
>
> PR tree-optimization/64910
> * gcc.dg/tree-ssa/pr64910-2.c: New test.
>
> diff --git a/gcc/ChangeLog b/gcc/ChangeLog
> index 3f632ca31c2..2c9a8c8265a 100644
> --- a/gcc/ChangeLog
> +++ b/gcc/ChangeLog
> @@ -1,3 +1,9 @@
> +2017-09-03  Jeff Law  
> +
> +   PR tree-optimization/64910
> +   * tree-ssa-reassoc.c (reassociate_bb): For bitwise binary ops,
> +   swap the first and last operand if the last is a constant.

Re: [PATCH 0/2] add unique_ptr class

2017-09-04 Thread Richard Biener
On Fri, Aug 11, 2017 at 10:43 PM, Jonathan Wakely  wrote:
> On 05/08/17 20:05 +0100, Pedro Alves wrote:
>>
>> On 08/04/2017 07:52 PM, Jonathan Wakely wrote:
>>>
>>> On 31/07/17 19:46 -0400, tbsaunde+...@tbsaunde.org wrote:

 I've been saying I'd do this for a long time, but I'm finally getting to
 importing the C++98 compatible unique_ptr class Pedro wrote for gdb a
 while
 back.
>>
>>
>> Thanks a lot for doing this!
>>
>>> I believe the gtl namespace also comes from Pedro, but GNU template
>>> library seems as reasonable as any other name I can come up with.
>>
>>
>> Yes, I had suggested it here:
>>
>>  https://sourceware.org/ml/gdb-patches/2017-02/msg00197.html
>>
>>>
>>> If it's inspired by "STL" then can we call it something else?
>>>
>>> std::unique_ptr is not part of the STL, because the STL is a library
>>> of containers and algorithms from the 1990s. std::unique_ptr is
>>> something much newer that doesn't originate in the STL.
>>>
>>> STL != C++ Standard Library
>>
>>
>> That I knew, but gtl was not really a reference to the
>> C++ Standard Library, so I don't see the problem.  It was supposed to
>> be the name of a library which would contain other C++ utilities
>> that would be shared by different GNU toolchain components.
>> Some of those bits would be inspired by, or be straight backports of
>> more-recent standards, but it'd be more than that.
>>
>> An option would be to keep "gtl" as acronym, but expand it
>> to "GNU Toolchain Library" instead.
>
>
> OK, that raises my hackles less. What bothered me was an apparent
> analogy to "STL" when that isn't even the right name for the library
> that provides the original unique_ptr.
>
> And "template library" assumes we'd never add non-templates to it,
> which is unlikely (you already mentioned offset_type, which isn't a
> template).
>
> It seems odd to make up a completely new acronym for this though,
> rather than naming it after something that exists already in the
> toolchain codebases.
>
>
>> For example, gdb has been growing C++ utilities under gdb/common/
>> that might be handy for gcc and other projects too, for example:
>>
>> - enum_flags (type-safe enum bit flags)
>> - function_view (non-owning reference to callables)
>> - offset_type (type safe / distinct integer types to represent offsets
>>into anything addressable)
>> - optional (C++11 backport of libstdc++'s std::optional)
>> - refcounted_object.h (intrusively-refcounted types)
>> - scoped_restore (RAII save/restore of globals)
>> - an allocator that default-constructs using default-initialization
>>   instead of value-initialization.
>>
>> and more.
>>
>> GCC OTOH has code that might be handy for GDB as well, like for
>> example the open addressing hash tables (hash_map/hash_table/hash_set
>> etc.).
>>
>> Maybe Gold could make use of some of this code too.  There
>> are some bits in Gold that might be useful for (at least) GDB
>> too.  For example, its Stringpool class.
>>
>> I think there's a lot of scope for sharing more code between the
>> different components of the GNU toolchain, even beyond general
>> random utilities and data structures, and I'd love to see us
>> move more in that direction.
>>
>>> If we want a namespace for GNU utilities, what's wrong with "gnu"?
>>
>>
>> That'd be an "obvious" choice, and I'm not terribly against it,
>> though I wonder whether it'd be taking over a name that has a wider
>> scope than intended?  I.e., GNU is a larger set of projects than the
>> GNU toolchain.  For example, there's Gnulib, which already compiles
>> as libgnu.a / -lgnu, which might be confusing.  GCC doesn't currently
>> use Gnulib, but GDB does, and, there was work going on a while ago to
>> make GCC use gnulib as well.
>
>
> Good point. "gnutools" might be more accurate, but people might object
> to adding 10 extra characters for "gnutools::".
>
> Naming is important, especially for a whole namespace (not just a
> single type) so I do think it's worth spending time getting it right.
>
> But I could live with gtl as long as nobody ever says "GTL is the GNU
> STL" or mentions "gtl" and STL in the same breath :-)

If it should be short use g::.  We can also use gnu:: I guess and I
agree gnutools:: is a little long (similar to libiberty::).  Maybe
gt:: as a short-hand for gnutools.

Richard.

>


Re: [PATCH] [PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining

2017-09-04 Thread Richard Biener
On Mon, Sep 4, 2017 at 11:20 AM, Pierre-Marie de Rodat
 wrote:
> On 08/18/2017 12:10 PM, Richard Biener wrote:
>>>
>>> ok, not doing this at all doesn't work, doing it only in the above case
>>> neither.
>>>
>>> Bah.
>>>
>>> Can anyone explain to me why we do the set_decl_origin_self dance?
>>
>>
>> Ok, so I need the following incremental patch to fix the fallout.
>>
>> This allows Ada LTO bootstrap to succeed with the early LTO debug patches.
>>
>> I assume this change is ok ontop of the LTO debug patches unless I
>> hear otherwise
>> til Monday (when I then finally will commit the series).
>>
>> Full bootstrap/testing running now.
>
> Sorry for the late answer, I’ve been busy the last two weeks. As discussed
> on IRC, I’m not very familiar with debug info generation for optimized code
> yet anyway. ;-)
>
> Are there still pending issues with this? Also, do you think we can port the
> fix for PR79542 on the 7 release branch?

No more pending issues and yes, I guess the fix is ok for the branch.

Richard.

> --
> Pierre-Marie de Rodat


Re: [PATCH v2] Python testcases to check DWARF output

2017-09-04 Thread Pierre-Marie de Rodat

Hello,

I would like to ping for the patch I submitted at 
. Thank you in 
advance!

--
Pierre-Marie de Rodat


Re: [PATCH] [PR79542][Ada] Fix ICE in dwarf2out.c with nested func. inlining

2017-09-04 Thread Pierre-Marie de Rodat

On 08/18/2017 12:10 PM, Richard Biener wrote:

ok, not doing this at all doesn't work, doing it only in the above case neither.

Bah.

Can anyone explain to me why we do the set_decl_origin_self dance?


Ok, so I need the following incremental patch to fix the fallout.

This allows Ada LTO bootstrap to succeed with the early LTO debug patches.

I assume this change is ok ontop of the LTO debug patches unless I
hear otherwise
til Monday (when I then finally will commit the series).

Full bootstrap/testing running now.

Sorry for the late answer, I’ve been busy the last two weeks. As 
discussed on IRC, I’m not very familiar with debug info generation for 
optimized code yet anyway. ;-)


Are there still pending issues with this? Also, do you think we can port 
the fix for PR79542 on the 7 release branch?


--
Pierre-Marie de Rodat


Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-04 Thread Richard Biener
On Mon, 4 Sep 2017, Tamar Christina wrote:

> Indeed it does, but even worse if you look at the gimple,
> you see lots of
> 
>   vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.21_65);
>   vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.22_63);
>   vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.23_61);
>   vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.24_59);
> 
> I suspect this patch will be quite bad for us performance wise as it thinks 
> it's as cheap to do all our integer
> operations on the vector side with vectors of 1 element. But I'm still 
> waiting for the perf numbers to confirm.

Looks like the backend advertises that it can do POPCOUNT on V1DI.  So
SLP vectorization decides it can vectorize this without unrolling.

Vectorization with V2DI is rejected:

/tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: function is not 
vectorizable.
/tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: not vectorized: 
relevant stmt not supported: _8 = __builtin_popcountl (_5);
...
/tmp/trunk/gcc/testsuite/gcc.dg/vect/pr68577.c:18:3: note: * Re-trying 
analysis with vector size 8

and that now succeeds (it probably didn't succeed before the patch).

Richard.

> 
> From: gcc-patches-ow...@gcc.gnu.org  on behalf 
> of Andreas Schwab 
> Sent: Saturday, September 2, 2017 10:58 PM
> To: Jon Beniston
> Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
> Subject: Re: [RFC, vectorizer] Allow single element vector types for vector 
> reduction operations
> 
> On Aug 30 2017, "Jon Beniston"  wrote:
> 
> > gcc/
> > 2017-08-30  Jon Beniston  
> > Richard Biener  
> >
> > diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> > index cfdb72c..5ebeac2 100644
> > --- a/gcc/tree-vect-patterns.c
> > +++ b/gcc/tree-vect-patterns.c
> > @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
> >loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
> >
> >if (VECTOR_BOOLEAN_TYPE_P (type_in)
> > -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
> > +  || VECTOR_TYPE_P (type_in))
> >  {
> >/* No need to check target support (already checked by the pattern
> >   recognition function).  */
> > diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> > index 013fb1f..fc62efb 100644
> > --- a/gcc/tree-vect-stmts.c
> > +++ b/gcc/tree-vect-stmts.c
> > @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> > scalar_type, unsigned size)
> >else
> >  simd_mode = mode_for_vector (inner_mode, size / nbytes);
> >nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> > -  if (nunits <= 1)
> > +  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
> > +  if (nunits < 1)
> >  return NULL_TREE;
> >
> >vectype = build_vector_type (scalar_type, nunits);
> >
> >
> >
> 
> That breaks vect/pr68577.c on aarch64.
> 
> during RTL pass: expand
> /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In function 
> 'slp_test':
> /opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12: internal 
> compiler error: in simplify_subreg, at simplify-rtx.c:6050
> 0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode, unsigned int)
> ../../gcc/simplify-rtx.c:6049
> 0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, unsigned 
> int)
> ../../gcc/simplify-rtx.c:6278
> 0x81d277 store_bit_field_1
> ../../gcc/expmed.c:798
> 0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long, unsigned 
> long, unsigned long, machine_mode, rtx_def*, bool)
> ../../gcc/expmed.c:1133
> 0x840bf7 store_field
> ../../gcc/expr.c:6950
> 0x84792f store_constructor_field
> ../../gcc/expr.c:6142
> 0x83edbf store_constructor
> ../../gcc/expr.c:6726
> 0x840443 expand_constructor
> ../../gcc/expr.c:8027
> 0x82d5bf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.c:10133
> 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.c:9819
> 0x82dadf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.c:10942
> 0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
> expand_modifier, rtx_def**, bool)
> ../../gcc/expr.c:9819
> 0x83d197 expand_expr
> ../../gcc/expr.h:276
> 0x83d197 expand_assignment(tree_node*, tree_node*, bool)
> ../../gcc/expr.c:4971
> 0x71e2f3 expand_gimple_stmt_1
> ../../gcc/cfgexpand.c:3653
> 0x71e2f3 expand_gimple_stmt
> ../../gcc/cfgexpand.c:3751
> 0x721cdb expand_gimple_basic_block
> ../../gcc/cfgexpand.c:5750
0x726b07 execute
../../gcc/cfgexpand.c:6357

Re: [C++] Fix PR bootstrap/81926

2017-09-04 Thread Richard Biener
On Sun, Sep 3, 2017 at 4:15 PM, Eric Botcazou  wrote:
>> But not types, less using langhooks.
>
> #0  cp_get_debug_type (type=0x707702a0)
> at /home/eric/svn/gcc-7.2.0/gcc/cp/cp-objcp-common.c:142
> #1  0x009f2c86 in gen_type_die_with_usage (
> type=type@entry=0x707702a0,
> context_die=context_die@entry=0x7fffed7e90a0,
> usage=usage@entry=DINFO_USAGE_DIR_USE)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:24566
> #2  0x009f4e55 in gen_type_die (type=0x707702a0,
> context_die=context_die@entry=0x7fffed7e90a0)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:24745
> #3  0x009f9f67 in gen_decl_die (decl=decl@entry=0x7076a980,
> origin=origin@entry=0x0, ctx=ctx@entry=0x0,
> context_die=context_die@entry=0x7fffed7e90a0)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:25422
> #4  0x009f8c13 in gen_subprogram_die (decl=decl@entry=0x71109a00,
> context_die=context_die@entry=0x76c5f000)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:22386
> #5  0x009f95dc in gen_decl_die (decl=decl@entry=0x71109a00,
> origin=, origin@entry=0x0, ctx=ctx@entry=0x0,
> context_die=context_die@entry=0x76c5f000)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:25335
> #6  0x009fc2d2 in dwarf2out_decl (decl=0x71109a00)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:25844
> #7  0x00a13404 in dwarf2out_function_decl (decl=)
> at /home/eric/svn/gcc-7.2.0/gcc/dwarf2out.c:25859
> #8  0x00a74197 in rest_of_handle_final ()
> at /home/eric/svn/gcc-7.2.0/gcc/final.c:4520

Ah, wishful thinking on my side ;)  Well, I suppose it'll be true in the future
when free-lang-data runs unconditionally and resets all langhooks.

Richard.

> --
> Eric Botcazou


Re: [RFC, vectorizer] Allow single element vector types for vector reduction operations

2017-09-04 Thread Tamar Christina
Indeed it does, but even worse if you look at the gimple,
you see lots of

  vect__5.25_58 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.21_65);
  vect__5.25_57 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.22_63);
  vect__5.25_56 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.23_61);
  vect__5.25_55 = VIEW_CONVERT_EXPR<vector(1) long unsigned intD.11>(vect__4.24_59);

I suspect this patch will be quite bad for us performance wise as it thinks 
it's as cheap to do all our integer
operations on the vector side with vectors of 1 element. But I'm still waiting 
for the perf numbers to confirm.

From: gcc-patches-ow...@gcc.gnu.org  on behalf 
of Andreas Schwab 
Sent: Saturday, September 2, 2017 10:58 PM
To: Jon Beniston
Cc: 'Richard Biener'; gcc-patches@gcc.gnu.org
Subject: Re: [RFC, vectorizer] Allow single element vector types for vector 
reduction operations

On Aug 30 2017, "Jon Beniston"  wrote:

> gcc/
> 2017-08-30  Jon Beniston  
> Richard Biener  
>
> diff --git a/gcc/tree-vect-patterns.c b/gcc/tree-vect-patterns.c
> index cfdb72c..5ebeac2 100644
> --- a/gcc/tree-vect-patterns.c
> +++ b/gcc/tree-vect-patterns.c
> @@ -4150,7 +4150,7 @@ vect_pattern_recog_1 (vect_recog_func *recog_func,
>loop_vinfo = STMT_VINFO_LOOP_VINFO (stmt_info);
>
>if (VECTOR_BOOLEAN_TYPE_P (type_in)
> -  || VECTOR_MODE_P (TYPE_MODE (type_in)))
> +  || VECTOR_TYPE_P (type_in))
>  {
>/* No need to check target support (already checked by the pattern
>   recognition function).  */
> diff --git a/gcc/tree-vect-stmts.c b/gcc/tree-vect-stmts.c
> index 013fb1f..fc62efb 100644
> --- a/gcc/tree-vect-stmts.c
> +++ b/gcc/tree-vect-stmts.c
> @@ -9098,7 +9098,8 @@ get_vectype_for_scalar_type_and_size (tree
> scalar_type, unsigned size)
>else
>  simd_mode = mode_for_vector (inner_mode, size / nbytes);
>nunits = GET_MODE_SIZE (simd_mode) / nbytes;
> -  if (nunits <= 1)
> +  /* NOTE: nunits == 1 is allowed to support single element vector types.  */
> +  if (nunits < 1)
>  return NULL_TREE;
>
>vectype = build_vector_type (scalar_type, nunits);
>
>
>

That breaks vect/pr68577.c on aarch64.

during RTL pass: expand
/opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c: In function 
'slp_test':
/opt/gcc/gcc-20170902/gcc/testsuite/gcc.dg/vect/pr68577.c:20:12: internal 
compiler error: in simplify_subreg, at simplify-rtx.c:6050
0xb55983 simplify_subreg(machine_mode, rtx_def*, machine_mode, unsigned int)
../../gcc/simplify-rtx.c:6049
0xb595f7 simplify_gen_subreg(machine_mode, rtx_def*, machine_mode, unsigned int)
../../gcc/simplify-rtx.c:6278
0x81d277 store_bit_field_1
../../gcc/expmed.c:798
0x81d55f store_bit_field(rtx_def*, unsigned long, unsigned long, unsigned long, 
unsigned long, machine_mode, rtx_def*, bool)
../../gcc/expmed.c:1133
0x840bf7 store_field
../../gcc/expr.c:6950
0x84792f store_constructor_field
../../gcc/expr.c:6142
0x83edbf store_constructor
../../gcc/expr.c:6726
0x840443 expand_constructor
../../gcc/expr.c:8027
0x82d5bf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:10133
0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:9819
0x82dadf expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:10942
0x82eaeb expand_expr_real_1(tree_node*, rtx_def*, machine_mode, 
expand_modifier, rtx_def**, bool)
../../gcc/expr.c:9819
0x83d197 expand_expr
../../gcc/expr.h:276
0x83d197 expand_assignment(tree_node*, tree_node*, bool)
../../gcc/expr.c:4971
0x71e2f3 expand_gimple_stmt_1
../../gcc/cfgexpand.c:3653
0x71e2f3 expand_gimple_stmt
../../gcc/cfgexpand.c:3751
0x721cdb expand_gimple_basic_block
../../gcc/cfgexpand.c:5750
0x726b07 execute
../../gcc/cfgexpand.c:6357

Andreas.

--
Andreas Schwab, sch...@linux-m68k.org
GPG Key fingerprint = 58CA 54C7 6D53 942B 1756  01D3 44D5 214B 8276 4ED5
"And now for something completely different."


Re: [00/77] Add wrapper classes for machine_modes

2017-09-04 Thread Gerald Pfeifer
On Sun, 3 Sep 2017, Richard Sandiford wrote:
> This sounds like the same as PR82045.  Could you try the patch I posted
> here: https://gcc.gnu.org/ml/gcc-patches/2017-09/msg00062.html ?

Yep, that resolves it; tested per what you committed last night.

Thank you!

Gerald


Re: [PATCH] Forbid section anchors for ASan build (PR sanitizer/81697)

2017-09-04 Thread Maxim Ostapenko

+ Jakub

On 09/08/17 19:21, Jeff Law wrote:

On 08/08/2017 12:46 AM, Vyacheslav Barinov wrote:

Hello,

Andrew Pinski  writes:


On Mon, Aug 7, 2017 at 11:15 PM, Slava Barinov  wrote:

gcc/
* varasm.c (use_object_blocks_p): Forbid section anchors for ASan

gcc/testsuite/
* g++.dg/asan/global-alignment.cc: New test to test global
variables alignment.


Can you describe this a little bit more?  What is going wrong here?
Is it because there is no red zone between the variables?

I've described the situation in bugzilla PR 81697 with compiled file dumps, but
briefly, yes: the red zone size is incorrect between global variables.

On certain platforms (we checked at least arm and ppc) the flow is as follows:
make_decl_rtl() calls create_block_symbol() on the decl tree of an ASan global
variable describing a string. But then get_variable_section() returns the wrong
section (.rodata.str1.4 in my case) because the check in asan_protect_global()
returns false due to the !DECL_RTL_SET_P check.

When a variable is placed not into .rodata but into .rodata.str, ld treats it as
a string and just shrinks a large part of the red zone silently (gold at least
prints a warning about wrong string alignment). But at run time libasan expects
the red zone to still be there and reports false positives.

In order to prevent the setting of RTL before ASan handling I tried to
reproduce the x86_64 behavior and found that use_object_blocks_p() returns
false for that platform. According to the function comment, it reports whether
the 'current compilation mode benefits from grouping', and just switching it off
for any ASan builds resolves the issue.


Also I noticed you are using .cc as the testcase file name, why don't
you use .C instead and then you won't need the other patch which you
just posted.

Okay, attaching rebased version.

In many ways it'd be better if asan could be made to work with section
anchors.  Doing so means the code we're running for asan more closely
matches what we're running in the normal case.


I'm not very familiar with this code, but as far as I can understand, 
the problem is that we have a circular dependency here -- 
categorize_decl_for_section calls asan_protect_global that returns false 
because DECL_RTL is not set at this point and assigns decl to 
SECCAT_RODATA_MERGE_CONST section. But when make_decl_rtl sets up 
SET_DECL_RTL (decl, x) and following asan_protect_global calls will 
return true and we end up with mismatch between section and alignment 
requirements. Perhaps we can add an additional flag to 
asan_protect_global to change its behavior in such cases (e.g. don't 
account for DECL_RTL_SET_P)? In this case we will change the section for 
SECCAT_RODATA without affecting global section anchors flag.


-Maxim



But I'll defer to Jakub and Marek as they know the sanitizers far better
than I do and are more familiar with the expected tradeoffs.

jeff







Re: [C++] Fix PR bootstrap/81926

2017-09-04 Thread Eric Botcazou
> A solution would be to put them into a global GCed pointer-map or vector,
> freeing that at free-lang-data time.

Here's a version that implements a bona-fide type->offset_type map.


PR bootstrap/81926
* cp-objcp-common.c (struct offset_type_hasher): New class.
(offset_type_hash): New variable.
(cp_get_debug_type): Associate the OFFSET_TYPEs with the types.

-- 
Eric Botcazou

Index: cp/cp-objcp-common.c
===
--- cp/cp-objcp-common.c	(revision 251538)
+++ cp/cp-objcp-common.c	(working copy)
@@ -131,6 +131,20 @@ cxx_types_compatible_p (tree x, tree y)
   return same_type_ignoring_top_level_qualifiers_p (x, y);
 }
 
+struct offset_type_hasher : ggc_cache_ptr_hash<tree_map>
+{
+  static hashval_t hash (tree_map *m) { return tree_map_hash (m); }
+  static bool equal (tree_map *a, tree_map *b) { return tree_map_eq (a, b); }
+
+  static int
+  keep_cache_entry (tree_map *)
+  {
+return ggc_marked_p (e->base.from);
+  }
+};
+
+static GTY((cache)) hash_table<offset_type_hasher> *offset_type_hash;
+
 /* Return a type to use in the debug info instead of TYPE, or NULL_TREE to
keep TYPE.  */
 
@@ -138,8 +152,35 @@ tree
 cp_get_debug_type (const_tree type)
 {
   if (TYPE_PTRMEMFUNC_P (type) && !typedef_variant_p (type))
-return build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type),
-			  TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type)));
+{
+  if (offset_type_hash == NULL)
+	offset_type_hash = hash_table<offset_type_hasher>::create_ggc (512);
+
+  /* We cannot simply use build_offset_type here because the function uses
+	 the type canonicalization hashtable, which is GC-ed, so its behavior
+	 depends on the actual collection points.  Since we are building these
+	 types on the fly for the debug info only, they would not be attached
+	 to any GC root and always be swept, so we would make the contents of
+	 the debug info depend on the collection points.  */
+  struct tree_map in, *h;
+
+  in.base.from = CONST_CAST_TREE (type);
+  in.hash = htab_hash_pointer (type);
+  h = offset_type_hash->find_with_hash (&in, in.hash);
+  if (h)
+	return h->to;
+
+  tree t = build_offset_type (TYPE_PTRMEMFUNC_OBJECT_TYPE (type),
+  TREE_TYPE (TYPE_PTRMEMFUNC_FN_TYPE (type)));
+
+  h = ggc_alloc<tree_map> ();
+  h->base.from = CONST_CAST_TREE (type);
+  h->hash = htab_hash_pointer (type);
+  h->to = t;
+  *offset_type_hash->find_slot_with_hash (h, h->hash, INSERT) = h;
+
+  return t;
+}
 
   return NULL_TREE;
 }


[PATCH] Add a -Wcast-align=strict warning

2017-09-04 Thread Bernd Edlinger
Hi,

as you know we have a -Wcast-align warning which works only for
STRICT_ALIGNMENT targets.  But occasionally it would be nice to be
able to switch this warning on even for other targets.

Therefore I would like to add a strict version of this option
which can be invoked with -Wcast-align=strict.  With the only
difference that it does not depend on STRICT_ALIGNMENT.

I used the code from check_effective_target_non_strict_align
in target-supports.exp for the first version of the test case,
where we have this:

return [check_no_compiler_messages non_strict_align assembly {
 char *y;
 typedef char __attribute__ ((__aligned__(__BIGGEST_ALIGNMENT__))) c;
 c *z;
 void foo(void) { z = (c *) y; }
} "-Wcast-align"]

... and to my big surprise it did _not_ work for C++ as-is,
because same_type_p considers differently aligned types identical,
and therefore cp_build_c_cast tries the conversion first via a
const_cast which succeeds, but did not emit the cast-align warning
in this case.

As a work-around I had to check the alignment in build_const_cast_1
as well.


Bootstrapped and reg-tested on x86_64-pc-linux-gnu.
Is it OK for trunk?


Thanks
Bernd.
gcc:
2017-09-03  Bernd Edlinger  

* common.opt (Wcast-align=strict): New warning option.
* doc/invoke.texi: Document -Wcast-align=strict. 

c:
2017-09-03  Bernd Edlinger  

* c-typeck.c (build_c_cast): Implement -Wcast-align=strict.

cp:
2017-09-03  Bernd Edlinger  

* typeck.c (build_reinterpret_cast_1,
build_const_cast_1): Implement -Wcast-align=strict.

testsuite:
2017-09-03  Bernd Edlinger  

* c-c++-common/Wcast-align.c: New test.
Index: gcc/c/c-typeck.c
===
--- gcc/c/c-typeck.c	(revision 251617)
+++ gcc/c/c-typeck.c	(working copy)
@@ -5578,7 +5578,7 @@ build_c_cast (location_t loc, tree type, tree expr
 	}
 
   /* Warn about possible alignment problems.  */
-  if (STRICT_ALIGNMENT
+  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
 	  && TREE_CODE (type) == POINTER_TYPE
 	  && TREE_CODE (otype) == POINTER_TYPE
 	  && TREE_CODE (TREE_TYPE (otype)) != VOID_TYPE
Index: gcc/common.opt
===
--- gcc/common.opt	(revision 251617)
+++ gcc/common.opt	(working copy)
@@ -564,6 +564,10 @@ Wcast-align
 Common Var(warn_cast_align) Warning
 Warn about pointer casts which increase alignment.
 
+Wcast-align=strict
+Common Var(warn_cast_align,2) Warning
+Warn about pointer casts which increase alignment.
+
 Wcpp
 Common Var(warn_cpp) Init(1) Warning
 Warn when a #warning directive is encountered.
Index: gcc/cp/typeck.c
===
--- gcc/cp/typeck.c	(revision 251617)
+++ gcc/cp/typeck.c	(working copy)
@@ -7265,8 +7265,8 @@ build_reinterpret_cast_1 (tree type, tree expr, bo
 	   complain))
 	return error_mark_node;
   /* Warn about possible alignment problems.  */
-  if (STRICT_ALIGNMENT && warn_cast_align
-  && (complain & tf_warning)
+  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
+	  && (complain & tf_warning)
 	  && !VOID_TYPE_P (type)
 	  && TREE_CODE (TREE_TYPE (intype)) != FUNCTION_TYPE
 	  && COMPLETE_TYPE_P (TREE_TYPE (type))
@@ -7273,7 +7273,7 @@ build_reinterpret_cast_1 (tree type, tree expr, bo
 	  && COMPLETE_TYPE_P (TREE_TYPE (intype))
 	  && TYPE_ALIGN (TREE_TYPE (type)) > TYPE_ALIGN (TREE_TYPE (intype)))
 	warning (OPT_Wcast_align, "cast from %qH to %qI "
- "increases required alignment of target type", intype, type);
+		 "increases required alignment of target type", intype, type);
 
   /* We need to strip nops here, because the front end likes to
 	 create (int *) for array-to-pointer decay, instead of [0].  */
@@ -7447,6 +7447,14 @@ build_const_cast_1 (tree dst_type, tree expr, tsub
 		 the user is making a potentially unsafe cast.  */
 	  check_for_casting_away_constness (src_type, dst_type,
 		CAST_EXPR, complain);
+	  /* ??? comp_ptr_ttypes_const ignores TYPE_ALIGN.  */
+	  if ((STRICT_ALIGNMENT || warn_cast_align == 2)
+		  && (complain & tf_warning)
+		  && TYPE_ALIGN (TREE_TYPE (dst_type))
+		 > TYPE_ALIGN (TREE_TYPE (src_type)))
+		warning (OPT_Wcast_align, "cast from %qH to %qI "
+			 "increases required alignment of target type",
+			 src_type, dst_type);
 	}
 	  if (reference_type)
 	{
Index: gcc/doc/invoke.texi
===================================================================
--- gcc/doc/invoke.texi	(revision 251617)
+++ gcc/doc/invoke.texi	(working copy)
@@ -266,7 +266,8 @@ Objective-C and Objective-C++ Dialects}.
 -Wno-attributes  -Wbool-compare  -Wbool-operation @gol
 -Wno-builtin-declaration-mismatch @gol
 -Wno-builtin-macro-redefined  -Wc90-c99-compat  -Wc99-c11-compat @gol
--Wc++-compat  -Wc++11-compat  

[committed] Fix pr70043.f90 testcase (PR testsuite/82093)

2017-09-04 Thread Jakub Jelinek
Hi!

This testcase contains an out-of-bound access, which isn't necessary for the
thing the testcase tests.  I've verified that both the original and the
patched testcase hang r233928 and succeed with r233934; committed to trunk
as obvious.

2017-09-04  Jakub Jelinek  

PR tree-optimization/70043
PR testsuite/82093
* gfortran.dg/vect/pr70043.f90 (fn1): Start loop from 1 instead of 0.

--- gcc/testsuite/gfortran.dg/vect/pr70043.f90.jj	2016-03-03 15:28:39.0 +0100
+++ gcc/testsuite/gfortran.dg/vect/pr70043.f90	2017-09-04 09:49:20.261236667 +0200
@@ -1,4 +1,5 @@
-! { dg-do compile  }
+! PR tree-optimization/70043
+! { dg-do compile }
 ! { dg-additional-options "-Ofast -g" }
 ! { dg-additional-options "-march=haswell" { target i?86-*-* x86_64-*-* } }
 
@@ -6,7 +7,7 @@ subroutine fn1(a, b)
   real(8), intent(in) ::  b(100)
   real(8), intent(inout) :: a(100)
   real(8) c
-  do i=0,100
+  do i=1,100
  if( a(i) < 0.0 ) then
 c =  a(i) * b(i)
 a(i) = a(i) - c / b(i)

Jakub


Re: [PATCH]: PR target/80204 (Darwin macosx-version-min problem)

2017-09-04 Thread Jeff Law
On 09/01/2017 04:05 PM, Simon Wright wrote:
> In gcc/config/darwin-driver.c, darwin_find_version_from_kernel() assumes
> that the minor version in the Darwin kernel version (16.7.0, => minor
> version 7) is equal to the bugfix component of the macOS version, so that
> the compiler receives -mmacosx-version-min=10.12.7 and the linker receives
> -macosx_version_min 10.12.7.
> 
> Unfortunately, Apple don’t apply this algorithm; the macOS version is
> actually 10.12.6.
> 
> Getting this wrong means that it’s impossible to run an executable from 
> within a bundle: Sierra complains "You have macOS 10.12.6. The application
> requires macOS 10.12.7 or later".
> 
> A workaround would perhaps be to link the executable with
> -Wl,-macosx_version_min,`sw_vers -productVersion` (I assume that it’s only
> the linker phase that matters?)
> 
> I see that Apple’s gcc (Apple LLVM version 8.0.0
> (clang-800.0.42.1)) specifies - only at link time -
>  -macosx_version_min 10.12.0
> 
> This patch does the same.
> 
> gcc/Changelog:
> 
>   2017-09-01 Simon Wright 
> 
>   PR target/80204
>   * config/darwin-driver.c (darwin_find_version_from_kernel): Eliminate
>   calculation of the minor version, always output as 0.
> 
> 
OK
jeff


Re: [PATCH] Disable type demotion for sanitizer (PR sanitizer/82072)

2017-09-04 Thread Jeff Law
On 09/01/2017 11:47 AM, Marek Polacek wrote:
> Here, do_narrow and convert_to_integer_1 are demoting signed types to unsigned,
> e.g. for
>   i = i - lmin
> where i is int and lmin is long int, so what we should produce is
>   i = (int) ((long int) i - lmin)
> but instead it produces
>   i = (int) ((unsigned int) i - (unsigned int) lmin);
> which hides the overflow.  Similarly for NEGATE_EXPR.  This patch prevents
> such demoting when the sanitizer is on.
> 
> There still might be a similar issue with division or shifting, but I couldn't
> trigger that.
> 
> Bootstrapped/regtested on x86_64-linux, ok for trunk?
> 
> 2017-09-01  Marek Polacek  
> 
>   PR sanitizer/82072
>   * convert.c (do_narrow): When sanitizing signed integer overflows,
>   bail out for signed types.
>   (convert_to_integer_1) : Likewise.
> 
>   * c-c++-common/ubsan/pr82072.c: New test.
OK.  There's probably other places that may need similar treatment.  You
might want to peek at shorten_binary_op and shorten_compare to see if
they suffer from similar problems.  We really want them to go away, but
we haven't gotten back to that project since Kai left.

Jeff


Re: Add a partial_integral_type_p helper function

2017-09-04 Thread Jeff Law
On 08/25/2017 10:48 AM, Martin Sebor wrote:
> On 08/18/2017 02:13 AM, Richard Sandiford wrote:
>> This patch adds a partial_integral_type_p function, to go along
>> with the full_integral_type_p added by the previous patch.
>>
>> Of the changes that didn't previously have an INTEGRAL_TYPE_P check:
>>
>> - the convert_to_integer_1 hunks are dominated by a case version
>>   of INTEGRAL_TYPE_P.
>>
>> - the merge_ranges hunk is dominated by an ENUMERAL_TYPE case.
>>
>> - vectorizable_reduction has the comment:
>>
>>   /* Do not try to vectorize bit-precision reductions.  */
>>
>>   and so I think was only concerned with integers.
>>
>> - vectorizable_assignment has the comment:
>>
>>   /* We do not handle bit-precision changes.  */
>>
>>   and the later:
>>
>>   /* But a conversion that does not change the bit-pattern is ok.  */
>>   && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
>> > TYPE_PRECISION (TREE_TYPE (op)))
>>&& TYPE_UNSIGNED (TREE_TYPE (op)))
>>
>>   would only make sense if OP is also an integral type.
>>
>> - vectorizable_shift is inherently restricted to integers.
>>
>> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
>>
>> Richard
>>
>>
>> 2017-08-17  Richard Sandiford  
>> Alan Hayward  
>> David Sherwood  
>>
>> gcc/
>> * tree.h (partial_integral_type_p): New function.
>> * convert.c (convert_to_integer_1): Use it.
>> * expr.c (store_field, expand_expr_real_2, expand_expr_real_1):
>> Likewise.
>> * fold-const.c (merge_ranges): Likewise.
>> * tree-ssa-math-opts.c (convert_mult_to_fma): Likewise.
>> * tree-tailcall.c (process_assignment): Likewise.
>> * tree-vect-loop.c (vectorizable_reduction): Likewise.
>> * tree-vect-stmts.c (vectorizable_conversion): Likewise.
>> (vectorizable_assignment, vectorizable_shift): Likewise.
>>
>> Index: gcc/tree.h
>> ===
>> --- gcc/tree.h	2017-08-18 08:35:58.031690315 +0100
>> +++ gcc/tree.h	2017-08-18 08:36:07.208306339 +0100
>> @@ -5414,4 +5414,13 @@ full_integral_type_p (const_tree t)
>>return INTEGRAL_TYPE_P (t) && scalar_type_is_full_p (t);
>>  }
>>
>> +/* Return true if T is an integral type that has fewer bits than
>> +   its underlying mode.  */
> 
> May I suggest to rephrase that as "has fewer bits of precision
> than..." to make it clear what those bits refer to?  Alternatively,
> "type whose precision is less than that of its underlying mode."
Can't hurt to be a bit more precise for those who haven't had the
pleasure of dealing with partial modes.

Jeff


Re: Add a partial_integral_type_p helper function

2017-09-04 Thread Jeff Law
On 08/18/2017 02:13 AM, Richard Sandiford wrote:
> This patch adds a partial_integral_type_p function, to go along
> with the full_integral_type_p added by the previous patch.
> 
> Of the changes that didn't previously have an INTEGRAL_TYPE_P check:
> 
> - the convert_to_integer_1 hunks are dominated by a case version
>   of INTEGRAL_TYPE_P.
> 
> - the merge_ranges hunk is dominated by an ENUMERAL_TYPE case.
> 
> - vectorizable_reduction has the comment:
> 
>   /* Do not try to vectorize bit-precision reductions.  */
> 
>   and so I think was only concerned with integers.
> 
> - vectorizable_assignment has the comment:
> 
>   /* We do not handle bit-precision changes.  */
> 
>   and the later:
> 
>   /* But a conversion that does not change the bit-pattern is ok.  */
>   && !((TYPE_PRECISION (TREE_TYPE (scalar_dest))
>   > TYPE_PRECISION (TREE_TYPE (op)))
>  && TYPE_UNSIGNED (TREE_TYPE (op)))
> 
>   would only make sense if OP is also an integral type.
> 
> - vectorizable_shift is inherently restricted to integers.
> 
> Tested on aarch64-linux-gnu and x86_64-linux-gnu.  OK to install?
> 
> Richard
> 
> 
> 2017-08-17  Richard Sandiford  
>   Alan Hayward  
>   David Sherwood  
> 
> gcc/
>   * tree.h (partial_integral_type_p): New function.
>   * convert.c (convert_to_integer_1): Use it.
>   * expr.c (store_field, expand_expr_real_2, expand_expr_real_1): Likewise.
>   * fold-const.c (merge_ranges): Likewise.
>   * tree-ssa-math-opts.c (convert_mult_to_fma): Likewise.
>   * tree-tailcall.c (process_assignment): Likewise.
>   * tree-vect-loop.c (vectorizable_reduction): Likewise.
>   * tree-vect-stmts.c (vectorizable_conversion): Likewise.
>   (vectorizable_assignment, vectorizable_shift): Likewise.
OK.
jeff