[COMMITTED] Fix pthread errors in pr86637-2.c

2019-02-11 Thread Wilco Dijkstra
Fix test errors on targets which do not support pthreads. Committed as obvious. ChangeLog: 2019-02-11 Wilco Dijkstra PR tree-optimization/86637 * gcc.c-torture/compile/pr86637-2.c: Test pthread and graphite target. --- diff --git a/gcc/testsuite/gcc.c-torture/compile/pr86637

[PATCH][ARM] Fix PR89222

2019-02-11 Thread Wilco Dijkstra
. ARMv5te bootstrap OK, regression tests pass. OK for commit? ChangeLog: 2019-02-06 Wilco Dijkstra gcc/ PR target/89222 * config/arm/arm.md (movsi): Use arm_cannot_force_const_mem to decide when to split off an offset from a symbol. * config/arm/arm.c

Re: arm access to stack slot out of allocated area

2019-02-08 Thread Wilco Dijkstra
Hi Olivier, > Sorry, I had -mapcs-frame in mind. That's identical to -mapcs, and equally deprecated. It was superseded 2 decades ago. -mapcs-frame bugs have been reported multiple times, including on VxWorks. For example https://gcc.gnu.org/bugzilla/show_bug.cgi?id=64379 suggests VxWorks doesn't n

Re: arm access to stack slot out of allocated area

2019-02-08 Thread Wilco Dijkstra
Hi Olivier, > Below is a description of a very annoying bug we are witnessing > on ARM. ... > compiled with -Og -mapcs Do you know -mapcs has been deprecated for more than 4 years now? Is there a reason you are still using it? It was deprecated since -mapcs is both extremely inefficient and buggy

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-07 Thread Wilco Dijkstra
Hi Steve, >> After special cases you could do something like t = mask2 + (HWI_1U << >> shift); >> return t == (t & -t) to check for a valid bfi. > > I am not sure I follow this logic and my attempts to use this did not > work so I kept my original code. It's similar to the initial code in aarch6
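The check quoted above can be sketched in plain C. This is only an illustration under assumptions: `contiguous_from_shift_p` is a hypothetical name, `uint64_t` stands in for `unsigned HOST_WIDE_INT`, `shift` is assumed to be below 64, and the special cases mentioned in the message (for instance an empty mask) are assumed to have been rejected beforehand.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper, not the GCC function.  Adding the lowest bit of the
   field to the mask carries through a contiguous run of ones that starts at
   bit `shift`, leaving either a single set bit or zero (when the run
   reaches bit 63).  */
bool
contiguous_from_shift_p (uint64_t mask2, unsigned shift)
{
  uint64_t t = mask2 + (UINT64_C (1) << shift);
  /* t & -t isolates the lowest set bit, so equality holds only when t is
     zero or has exactly one bit set.  */
  return t == (t & -t);
}
```

For example, `mask2 = 0x0ff0` with `shift = 4` gives `t = 0x1000`, a single bit, so the test succeeds. `mask2 = 0x0f70` gives `t = 0x0f80`, which has several bits set, so it fails.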

Re: [Patch] PR rtl-optimization/87763 - generate more bfi instructions on aarch64

2019-02-05 Thread Wilco Dijkstra
Hi Steve, Thanks for looking at this. A few comments on the patch: +bool +aarch64_masks_and_shift_for_bfi_p (scalar_int_mode mode, + unsigned HOST_WIDE_INT mask1, + unsigned HOST_WIDE_INT shft_amnt, +

[PATCH][ARM] Fix Thumb-1 ldm (PR89190)

2019-02-04 Thread Wilco Dijkstra
is by explicitly checking whether the base is loaded. Also enable LDMs which load the first register. Bootstrap OK on armhf, testsuite passes. OK for commit? ChangeLog: 2019-02-04 Wilco Dijkstra PR target/89190 * config/arm/arm.c (ldm_stm_operation_p) Set addr_reg_in_re

Re: [Patch][Aarch64]PR rtl-optimization/87763 - Fix lsl_asr_sbfiz.c test by checking for subregs

2019-01-30 Thread Wilco Dijkstra
Hi, Segher wrote: >On Tue, Jan 29, 2019 at 02:51:30PM -0800, Andrew Pinski wrote: > >> Seems to me rather this should have been simplified to just: >> (set (reg:SI 93) >> (ashift:SI (sign_extract:SI (reg:SI 95) >> (const_int 3 [0x3]) >> (const_int 0 [0])) >>

[PATCH][AArch64] Fix generation of tst (PR87763)

2019-01-24 Thread Wilco Dijkstra
The TST instruction no longer matches in all cases due to changes in Combine. The fix is simple, we now need to allow a subreg as well when selecting the cc_mode. This fixes the tst_5.c and tst_6.c failures. AArch64 regress & bootstrap OK. ChangeLog: 2019-01-23 Wilco Dijkstra

[COMMITTED][testsuite] Fix vect-nop-move.c test (PR87763)

2019-01-22 Thread Wilco Dijkstra
Fix a failing test - changes in Combine mean the test now fails even though the generated code is the same. Given there are several AArch64-specific tests for vec-select, remove the scanning of Combine output. Committed as trivial fix. ChangeLog: 2019-01-22 Wilco Dijkstra PR rtl

Re: [Committed][AArch64] Fix PR62178 testcase failures

2019-01-10 Thread Wilco Dijkstra
e testcase now passes - committed as obvious. ChangeLog 2019-01-09 Wilco Dijkstra testsuite/ * gcc.target/aarch64/pr62178.c: Relax scan-assembler checks. --- gcc/testsuite/gcc.target/aarch64/pr62178.c  (revision 266178) +++ gcc/testsuite/gcc.target/aarch64/pr62178.c  (working copy

Re: [PATCH v2] Fix PR64242

2019-01-10 Thread Wilco Dijkstra
Hi Jakub, Any other comments? I'd like to finish this rather than leaving it in its current half-done state. Wilco   Hi, Jakub Jelinek wrote: On Fri, Dec 07, 2018 at 04:19:22PM +0000, Wilco Dijkstra wrote: >> The test case doesn't need an aligned object to fail, so wh

Re: [PATCH] Fix PR84521

2019-01-10 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 14 December 2018 13:16 To: GCC Patches Cc: nd Subject: [PATCH] Fix PR84521   This fixes and simplifies the setjmp and non-local goto implementation. Currently the virtual frame pointer is saved when using __builtin_setjmp or a non-local goto.  Depending on

Re: [PATCH][AArch64] Use Q-reg loads/stores in movmem expansion

2019-01-09 Thread Wilco Dijkstra
Hi James, TImode is an integer mode so we strongly prefer using integer registers to avoid inefficient allocations using SIMD registers. We might be able to use TFmode since that prefers Q registers. However we don't support TFmode LDP/STP unless emitted explicitly like in prolog/epilog. LDP of TI

Re: [Aarch64][SVE] Add copysign and xorsign support

2019-01-08 Thread Wilco Dijkstra
Hi Alejandro, +emit_move_insn (mask, + aarch64_simd_gen_const_vector_dup (mode, + HOST_WIDE_INT_M1U + << bits)); + +emit_insn (gen_and3 (sign, arg2, mask)); Is there

Re: [PATCH][GCC][Aarch64] Change expected bfxil count in gcc.target/aarch64/combine_bfxil.c to 18 (PR/87763)

2019-01-04 Thread Wilco Dijkstra
Hi Sam, This is a trivial test fix, so it falls under the obvious rule and can be committed without approval - https://www.gnu.org/software/gcc/svnwrite.html Cheers, Wilco

Re: [ping] Change static chain to r11 on aarch64

2018-12-21 Thread Wilco Dijkstra
Hi Olivier, > I'm experimenting with the idea of adjusting the > stack probing code using r9 today, to see if it could > save/restore that reg if it happens to be the static chain > as well. > > If that can be made to work, maybe that would be a better > alternative than just swapping and have the

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-20 Thread Wilco Dijkstra
Hi Martin, > There is a similar mechanism for pointer-to-member-functions > used by C++. Is this correct on aarch64? /* By default, the C++ compiler will use the lowest bit of the pointer    to function to indicate a pointer-to-member-function points to a    virtual member function.  However, if

Re: [PATCH v4][C][ADA] use function descriptors instead of trampolines in C

2018-12-19 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: > On Wed, Dec 19, 2018 at 07:53:48PM +, Uecker, Martin wrote: >> What do you think about making the trampoline a single call >> instruction and have a large memory region which is the same >> page mapped many times? This sounds like a good idea, but given a function d

Re: [ping] Change static chain to r11 on aarch64

2018-12-17 Thread Wilco Dijkstra
Hi Hans-Peter, > While the choice of static-chain register does not affect the > ABI, it's the other way round: the choice of static-chain > register matters, specifically it's call-clobberedness. Agreed. > It looks like the current aarch64 static-chain register R18 is > call-saved but without s

[PATCH] Fix PR84521

2018-12-14 Thread Wilco Dijkstra
seems incorrect since the helper function moves the frame pointer value into the static chain register (so this patch does nothing to make it better or worse). AArch64 bootstrap OK, new test passes on AArch64, x86-64 and Arm. ChangeLog: 2018-12-13 Wilco Dijkstra gcc/ PR middle-end/

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra
Hi Martin, > One could also argue that it creates a false sense of security > and diverts resources from properly fixing the real problems > i.e. the buffer overflows which lets an attacker write to the > stack in the first place. A program without buffer overflows > is secure even without an exec

Re: [ping] Change static chain to r11 on aarch64

2018-12-13 Thread Wilco Dijkstra
Hi Martin, Uecker, Martin wrote: >Am Mittwoch, den 12.12.2018, 22:04 + schrieb Wilco Dijkstra: >> Hi Martin, >> >> > Does a non-executable stack actually improve security? >> >> Absolutely, it's like closing your front door rather than just leave i

Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Wilco Dijkstra
Hi Martin, > Does a non-executable stack actually improve security? Absolutely, it's like closing your front door rather than just leave it open for anyone. > For the alternative implementation using (custom) function > descriptors (-fno-trampolines) the static chain becomes > part of the ABI or

Re: [ping] Change static chain to r11 on aarch64

2018-12-12 Thread Wilco Dijkstra
Hi, >> On 12 Dec 2018, at 18:21, Richard Earnshaw (lists) >> wrote: > >>  However, that introduces an issue that that >> code is potentially used across multiple versions of gcc, with >> potentially different choices of the static chain register.  Hmm, this >> might need some more careful though

Re: [ping] allow target configurations to state R18 as reserved on arrch64

2018-12-12 Thread Wilco Dijkstra
Hi Oliver, +#define FIXED_R18 0    {                            \ 0, 0, 0, 0,   0, 0, 0, 0,    /* R0 - R7 */        \ 0, 0, 0, 0,   0, 0, 0, 0,    /* R8 - R15 */        \ -    0, 0, 0, 0,   0, 0, 0, 0,    /* R16 - R23 */        \ +    0, 0, FIXED_R18, 0, 0, 0, 0, 0,    /* R16 - R23 */  

Re: [RFA] [target/87369] Prefer "bit" over "bfxil"

2018-12-07 Thread Wilco Dijkstra
Hi, >> Ultimately, the best solution here will probably depend on which we >> think is more likely, copysign or the example I give above. > I'd tend to suspect we'd see more pure integer bit twiddling than the > copysign stuff. All we need to do is to clearly separate the integer and FP/SIMD case

Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: On Fri, Dec 07, 2018 at 04:19:22PM +0000, Wilco Dijkstra wrote: >> The test case doesn't need an aligned object to fail, so why did you add it? > > It needed it on i686, because otherwise it happened to see the value it > wanted in the caller's

Re: [PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Hi, Jakub Jelinek wrote: > On Fri, Dec 07, 2018 at 02:52:48PM +0000, Wilco Dijkstra wrote: >> -  struct __attribute__((aligned (32))) S { int a[4]; } s;    >>

[PATCH v2] Fix PR64242

2018-12-07 Thread Wilco Dijkstra
Log: 2018-12-07 Wilco Dijkstra gcc/ PR middle-end/64242 * builtins.c (expand_builtin_longjmp): Add frame clobbers and schedule block. (expand_builtin_nonlocal_goto): Likewise. testsuite/ PR middle-end/64242 * gcc.c-torture/execute/pr64242.c: Update test. --

Re: [RFC][AArch64] Add support for system register based stack protector canary access

2018-12-03 Thread Wilco Dijkstra
Hi, Florian wrote: > For userland, I would like to eventually copy the OpenBSD approach for > architectures which have some form of PC-relative addressing: we can > have multiple random canaries in (RELRO) .rodata in sufficiently close > to the code that needs them (assuming that we have split .ro

[PATCH] Fix PR64242

2018-11-29 Thread Wilco Dijkstra
eLog: 2018-11-29 Wilco Dijkstra gcc/ PR middle-end/64242 * builtins.c (expand_builtin_longjmp): Use a temporary when restoring the frame pointer. (expand_builtin_nonlocal_goto): Likewise. testsuite/ PR middle-end/64242 * gcc.c-torture/execute/pr642

Re: [PATCH v3] Add sinh(atanh(x)) and cosh(atanh(x)) optimizations

2018-11-23 Thread Wilco Dijkstra
Hi, > I checked it. They are all the same on x86_64: > https://pastebin.com/e63FxDAy > I even forced to call the glibc sinh and atanh, but use the sqrtsd > instruction. > But I do agree that there may be an arch that sets an errno for sinh > or cosh but not for sqrt, implying in a unexpected beha

Re: [Committed][AArch64] Fix PR62178 testcase failures

2018-11-15 Thread Wilco Dijkstra
Hi Segher, > On Wed, Nov 14, 2018 at 12:37:05PM +0000, Wilco Dijkstra wrote: >> +/* { dg-final { scan-assembler-not { dup } } } */ >> +/* { dg-final { scan-assembler-not { fmov } } } */ > > { dup }   is the same as   " dup "  , that is, with spaces and all. >

Re: [PATCH v3] Add sinh(atanh(x)) and cosh(atanh(x)) optimizations

2018-11-14 Thread Wilco Dijkstra
Hi, > Indeed. After plotting the graph of both functions, it is very clear > that this check isn't required. Sorry about that. It wouldn't be clear from the graph, you need to check that +0.0, -0.0, out of range values, infinities, NaNs give the same answer before/after your transformation. If s

[Committed][AArch64] Fix PR62178 testcase failures

2018-11-14 Thread Wilco Dijkstra
other. However the generated vector loop is fast either way since it generates MLA and merges the DUP either with a load or MLA. So relax the conditions slightly and check we still generate MLA and there is no DUP or FMOV. The testcase now passes - committed as obvious. ChangeLog 2018-11-14

[PATCH][AArch64] Fix PR81800

2018-11-14 Thread Wilco Dijkstra
is to disable lrint/llrint on double if the size of a long is smaller (ie. ilp32). Passes regress and bootstrap on AArch64. OK for commit? ChangeLog 2018-11-13 Wilco Dijkstra gcc/ PR target/81800 * gcc/config/aarch64/aarch64.md (lrint): Disable lrint pattern i

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James, >On Mon, Jan 22, 2018 at 09:22:27AM -0600, Richard Biener wrote: >> It would be better to dissect this cost into vec_to_scalar and vec_extract >> where >> vec_to_scalar really means getting at the scalar value of a vector of >> uniform values >> which most targets can do without any ins

Re: [PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
Hi James, > We have 7 unique target tuning structures in the AArch64 backend, of which > only one has a 2x ratio between scalar_int_cost and vec_to_scalar_cost. Other > ratios are 1, 3, 8, 3, 4, 6. I wouldn't read too much in the exact value here - the costs are simply relative to other values f

Re: [PATCH] Simplify floating point comparisons

2018-11-09 Thread Wilco Dijkstra
e C / x can underflow to zero if x is huge, it's not safe otherwise). If C is negative the comparison is reversed. Simplify (x * C1) > C2 into x > (C2 / C1) with -funsafe-math-optimizations. If C1 is negative the comparison is reversed. OK for commit? ChangeLog 2018-11-09 Wil
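A small C sketch of the second rewrite described above, with a negative constant so the comparison is reversed. The function names and the constants -2.0 and 6.0 are chosen purely for illustration; they are exactly representable, so both forms agree here. In general the rounding of `x * C1` or `C2 / C1` can differ near the boundary, which is presumably why the transformation sits behind -funsafe-math-optimizations.

```c
#include <stdbool.h>

/* Original form: (x * C1) > C2 with C1 = -2.0 and C2 = 6.0.  */
bool
cmp_before (double x)
{
  return x * -2.0 > 6.0;
}

/* Rewritten form: C1 is negative, so > becomes <, and the division
   C2 / C1 can be folded to a constant (-3.0) at compile time.  */
bool
cmp_after (double x)
{
  return x < 6.0 / -2.0;
}
```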

[PATCH][AArch64] Fix symbol offset limit

2018-11-09 Thread Wilco Dijkstra
, OK for commit? ChangeLog: 2018-11-09 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_classify_symbol): Apply reasonable limit to symbol offsets. testsuite/ * gcc.target/aarch64/symbol-range.c (foo): Set new limit. * gcc.target/aarch64/symbol-r

[PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-11-09 Thread Wilco Dijkstra
for commit until we get rid of it? ChangeLog: 2017-11-17  Wilco Dijkstra      gcc/     * config/aarch64/aarch64.h (SLOW_BYTE_ACCESS): Set to 1. -- diff --git a/gcc/config/aarch64/aarch64.h b/gcc/config/aarch64/aarch64.h index 056110afb228fb919e837c04aa5e55

[PATCH][AArch64] PR79262: Adjust vector cost

2018-11-09 Thread Wilco Dijkstra
- libquantum and SPECv6 performance improves. OK for commit? ChangeLog: 2018-01-22  Wilco Dijkstra      PR target/79262     * config/aarch64/aarch64.c (generic_vector_cost): Adjust vec_to_scalar_cost. -- diff --git a/gcc/config/aarch64/aarch64.c b/gcc/config/aarch64/aarch64.c index

[PATCH][ARM] Update max_cond_insns settings

2018-11-09 Thread Wilco Dijkstra
and regress OK on arm-none-linux-gnueabihf. OK for stage 1? ChangeLog: 2017-04-12  Wilco Dijkstra      * gcc/config/arm/arm.c (arm_cortex_a53_tune): Set max_cond_insns to 2.     (arm_cortex_a35_tune): Likewise. --- diff --git a/gcc/config/arm/arm.c b/gcc/config/arm/arm.c index

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-08 Thread Wilco Dijkstra
Hi, > But the max. error in sinh/cosh/atanh is less than 2 ULP, with some math > libraries.  It could be < 1 ULP, in theory, so sinh(atanh(x)) less than > 2 ULP even. You can't add ULP errors in general - a tiny difference in the input can make a huge difference in the result if the derivative i
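The remark about derivatives can be made concrete with a first-order error estimate. This is the standard textbook approximation, not something taken from the thread; here y stands for the exact intermediate result (for example atanh(x)) and \tilde y for its rounded value.

```latex
\[
  \frac{|f(\tilde y) - f(y)|}{|f(y)|}
  \;\approx\;
  \left| \frac{y\, f'(y)}{f(y)} \right| \cdot \frac{|\tilde y - y|}{|y|}
\]
\[
  f = \sinh: \qquad
  \left| \frac{y \cosh y}{\sinh y} \right| = |y \coth y| \;\approx\; |y|
  \quad \text{for large } |y|
\]
```

So a relative error in the intermediate value is magnified by roughly |y| before the outer function adds its own rounding error, and the ULP bounds of the two calls do not simply add.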

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-11-07 Thread Wilco Dijkstra
Hi Jeff, > So if we're going from 0->2 ULPs in some cases, do we want to guard it > with one of the various options, if so, which?  Giuliano's follow-up > will still have the potential for 2ULPs. The ULP difference is not important since the individual math functions already have ULP of 3 or hig

Re: [ARM] Implement division using vrecpe, vrecps

2018-11-05 Thread Wilco Dijkstra
Hi Prathamesh, Prathamesh Kulkarni wrote: > Thanks for the suggestions. The last time I benchmarked the patch > (around Jan 2016) > I got following results with the patch for SPEC2006: > > a15: +0.64% overall, 481.wrf: +6.46% > a53: +0.21% overall, 416.gamess: -1.39%, 481.wrf: +6.76% > a57: +0.35%

Re: [ARM] Implement division using vrecpe, vrecps

2018-11-02 Thread Wilco Dijkstra
Prathamesh Kulkarni wrote: > This is a rebased version of patch that adds a pattern to neon.md for > implementing division with multiplication by reciprocal using > vrecpe/vrecps with -funsafe-math-optimizations excluding -Os. > The newly added test-cases are not vectorized on armeb target with >

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Wilco Dijkstra
Hi, >> Generally the goal is 1ULP in round to nearest > > Has that changed recently?  At least in the past for double the goal has > been always .5ULP in round to nearest. Yes. 0.5 ULP (perfect rounding) as a goal was insane as it caused ridiculous slowdowns in the 10x range for no apparent r

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-23 Thread Wilco Dijkstra
Hi, >> So I think the runtime math libraries shoot for .5 ULP (yes, they don't >> always make it, but that's their goal).  We should probably have the >> same goal.  Going from 0 to 2 ULPs would be considered bad. Generally the goal is 1ULP in round to nearest - other rounding modes may have high

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-19 Thread Wilco Dijkstra
Hi, >> Maybe I am crazy, or the labels here are wrong, but that looks like the >> error is three times as *big* after the patch.  I.e. it worsened instead >> of improving. This error is actually 1ULP, so just a rounding error. Can't expect any better than that! > with input : = 9.98807907

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-19 Thread Wilco Dijkstra
Jakub Jelinek wrote: > At this point this seems like something that shouldn't be done inline > anymore, so either we don't do this optimization at all, because the errors > are far bigger than what is acceptable even for -ffast-math, or we have a > library function that does the sinh (tanh (x)) an

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-19 Thread Wilco Dijkstra
Hi, >> Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the >> relative error >> should be much better. If there is no FMA, 2*(1-fabs(x)) - (1-fabs(x))^2 >> should be >> more accurate when abs(x)>0.5 and still much faster. > >No, but I will check how to enable it if FMA is avai
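The alternative expression quoted above is algebraically the same as 1 - x*x, as a short derivation shows (d is an auxiliary symbol introduced here for illustration):

```latex
\[
  d = 1 - |x| \;\Longrightarrow\; x^2 = (1 - d)^2 = 1 - 2d + d^2
\]
\[
  1 - x^2 = 2d - d^2 = 2\,(1 - |x|) - (1 - |x|)^2
\]
```

A plausible reading of the abs(x)>0.5 condition is that for 0.5 <= |x| <= 1 the subtraction 1 - |x| is exact in binary floating point (Sterbenz lemma), so the cancellation that hurts 1 - x*x near |x| = 1 is avoided.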

Re: [PATCH] Add sinh(tanh(x)) and cosh(tanh(x)) rules

2018-10-18 Thread Wilco Dijkstra
Hi, > Well, I compared the results before and after the simplifications with a > 512-bit > precise mpfr value. Unfortunately, I found that sometimes the error is very > noticeable :-( . Did you enable FMA? I'd expect 1 - x*x to be accurate with FMA, so the relative error should be much better.

Re: [patch] allow target config to state r18 is fixed on aarch64

2018-10-18 Thread Wilco Dijkstra
Hi Olivier, > STATIC_CHAIN_REGNUM still needs to be adjusted directly I think. > > I wondered if we could set it to R11 unconditionally and picked > the way ensuring no change for !vxworks ports, especially since I > don't have means to test more than what I described above. Yes it should always

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-10-11 Thread Wilco Dijkstra
fmov w0, s0 ret After: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret Passes regress on AArch64, OK for commit? ChangeLog: 2018-10-11 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (zero_extendsidi2_aarc

[PATCH][AArch64] Fix PR87511

2018-10-11 Thread Wilco Dijkstra
As mentioned in PR87511, the shift used in aarch64_mask_and_shift_for_ubfiz_p should be evaluated as a HOST_WIDE_INT rather than int. Passes bootstrap, OK for commit and backport? ChangeLog: 2018-10-11 Wilco Dijkstra gcc/ * config/aarch64/aarch64.c (aarch64_mask_and_shift_for_ubfiz_p
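The class of bug described above can be illustrated with a generic C sketch. This is not the GCC source: `low_bits_mask` is a hypothetical function and `uint64_t` stands in for `unsigned HOST_WIDE_INT`.

```c
#include <stdint.h>

/* Hypothetical illustration: build a mask of the low `shift` bits.
   Requires shift < 64.  */
uint64_t
low_bits_mask (unsigned shift)
{
  /* Writing `1 << shift` here would shift a plain int, which is undefined
     once the result no longer fits in int.  Widening the constant first
     keeps the whole computation in 64 bits.  */
  return (UINT64_C (1) << shift) - 1;
}
```

With the widened constant, `low_bits_mask (40)` yields `0xffffffffff`, a value that cannot be produced by an `int` shift.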

Re: [PATCH v3] Change default to -fno-math-errno

2018-10-11 Thread Wilco Dijkstra
Hi,   > if (math_errhandling & MATH_ERRNO) == 0 a math > function may still set errno. > > it can only set it if there was an error though, > not arbitrarily clobber it, but this means that > > (1) reordering errno access around math calls is > invalid even with -fno-math-errno. It's typically th

Re: [PATCH v3] Change default to -fno-math-errno

2018-10-11 Thread Wilco Dijkstra
Hi, > Note that "will ever set errno" includes possibly setting it in the > future, since code may be built with one libm version and used with > another.  So it wouldn't be correct to have a "never sets errno" attribute > on glibc logb / lround / llround / lrint / llrint / fma / remquo (missin

Re: [PATCH v3] Change default to -fno-math-errno

2018-10-11 Thread Wilco Dijkstra
Joseph Myers wrote: > On Mon, 8 Oct 2018, Richard Biener wrote: >> So I think it would be fine if we'd have -fno-math-errno as documented >> and then the C library would annotate their math functions according >> to whether they will ever set errno or not.  Once a math function is >> const or pure

Re: [PATCH v3] Change default to -fno-math-errno

2018-10-11 Thread Wilco Dijkstra
Hi Jeff, > So I went back and reviewed all the discussion around this.  I'm still > having trouble getting comfortable with flipping the default -- unless > we know ahead of time that the target runtime doesn't set errno on any > of the math routines.  That implies a target hook to describe the >

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-10-03 Thread Wilco Dijkstra
or commit? ChangeLog: 2018-10-03 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (zero_extendsidi2_aarch64): Add alternatives to zero-extend between int and floating-point registers. (load_pair_zero_extendsidi2_aarch64): Add alternative to emit zero-extended ldp in

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-09-28 Thread Wilco Dijkstra
Richard Henderson wrote: > If you're going to add moves r->w, why not also go ahead and add w->r. > There are also HImode fmov zero-extensions, fwiw. Well in principle it would be possible to support all 8/16/32-bit zero extensions for all combinations of int and fp registers. However I prefer t

Re: [PATCH][AArch64] Support zero-extended move to FP register

2018-09-28 Thread Wilco Dijkstra
cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret After: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret Passes regress on AArch64, OK for commit? ChangeLog: 2018-09-28 Wilco Dijkstra gcc/ * conf

Re: [PATCH][GCC][AARCH64] Add even-pair register classes

2018-09-28 Thread Wilco Dijkstra
Matthew wrote: > The canonical way to require even-odd pairs of registers to implement a TImode > pseudo register as mentioned in the documentation is to limit *all* TImode > registers to being even-odd by using the TARGET_HARD_REGNO_MODE_OK hook. And that is the best approach for cases like this

[PATCH][AArch64] Support zero-extended move to FP register

2018-09-27 Thread Wilco Dijkstra
.8b fmov w0, s0 ret After: fmov s0, w0 cnt v0.8b, v0.8b addv b0, v0.8b fmov w0, s0 ret Passes regress on AArch64, OK for commit? ChangeLog: 2018-09-27 Wilco Dijkstra gcc/ * config/aarch64/aarch64.md (zero_extendsidi2

Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-09-05 Thread Wilco Dijkstra
Hi Denis, >> Adding support for a frame chain would require an ABI change. It > would have to > > work across GCC, LLVM, Arm, Thumb-1 and Thumb-2 - not a trivial amount of > > effort. > Clang already works that way. No, that's incorrect like Richard pointed out. Only a single register can be u

Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-09-05 Thread Wilco Dijkstra
Hi Denis, > We are working on applying Address/LeakSanitizer for the full Tizen OS > distribution. It's about ~1000 packages, ASan/LSan runtime is installed > to ld.so.preload. As we know ASan/LSan has interceptors for > allocators/deallocators such as (malloc/realloc/calloc/free) and so on. > O

Re: [Patch][Aarch64] Implement Aarch64 SIMD ABI and aarch64_vector_pcs attribute

2018-09-04 Thread Wilco Dijkstra
Hi Steve, The latest version compiles the examples I used correctly, so it looks fine from that perspective (but see comments below). However the key point of the ABI is to enable better code generation when calling a vector function, and that will likely require further changes that may conflict

Re: [PATCH v3] Change default to -fno-math-errno

2018-09-04 Thread Wilco Dijkstra
ping From: Wilco Dijkstra Sent: 18 June 2018 15:01 To: GCC Patches Cc: nd; Joseph Myers Subject: [PATCH v3] Change default to -fno-math-errno   GCC currently defaults to -fmath-errno.  This generates code assuming math functions set errno and the application checks errno.  Few applications

Re: [PATCH] Frame pointer for arm with THUMB2 mode

2018-08-27 Thread Wilco Dijkstra
Hi, > But we still have an issue with performance, when we are using default > unwinder, which uses unwind tables. It could be up to 10 times faster to > use frame based stack unwinder instead "default unwinder". Switching on the frame pointer typically costs 1-2% performance, so it's a bad idea

Re: [PATCH][AARCH64] inline strlen for 8-bytes aligned strings

2018-08-10 Thread Wilco Dijkstra
Hi, A quick benchmark shows it's faster up to about 10 bytes, but after that it becomes extremely slow. At 16 bytes it's already 2.5 times slower and for larger sizes it's over 13 times slower than the GLIBC implementation... > The implementation falls back to the library call if the > string is

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-27 Thread Wilco Dijkstra
Nicolas Pitre wrote: >> However if r4 is non-zero, the carry will be set, and the tsths will be >> executed. This >> clears the carry and sets the Z flag based on bit 20. > > No, not at all. The carry is not affected. And that's the point of the > tst instruction here rather than a cmp: it sets

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-27 Thread Wilco Dijkstra
Hi Nicolas, I think your patch doesn't quite work as expected: @@ -238,9 +238,10 @@ LSYM(Lad_a): movs ip, ip, lsl #1 adcs xl, xl, xl adc xh, xh, xh - tst xh, #0x0010 - sub r4, r4, #1 - bne LSYM(Lad_e) + subs r4, r4, #1 +

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-23 Thread Wilco Dijkstra
Steve Ellcey wrote: > OK, I think I understand this a bit better now.  I think my main > problem is with the  term 'writeback' which I am not used to seeing. > But if I understand things correctly we are saving one or two registers > and (possibly) updating the stack pointer using auto-increment/a

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-23 Thread Wilco Dijkstra
Umesh Kalappa wrote: > We tested on the SP and yes the problem persist on the SP too and > attached patch will fix the both SP and DP issues for the  denormal > resultant. The patch now looks correct to me (but I can't approve). > We bootstrapped the compiler ,look ok to us with minimal testing

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra
Umesh Kalappa wrote: > We tried some of the normalisation numbers and the fix works and please > could you help us with the input ,where  if you see that fix breaks down. Well try any set of inputs which require normalisation. You'll find these no longer get normalised and so will get incorrect r

Re: [Patch-86512]: Subnormal float support in armv7(with -msoft-float) for intrinsics

2018-07-20 Thread Wilco Dijkstra
Hi Umesh, Looking at your patch, this would break all results which need to be normalized. Index: libgcc/config/arm/ieee754-df.S === --- libgcc/config/arm/ieee754-df.S (revision 262850) +++ libgcc/config/arm/ieee754-df.S (

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-20 Thread Wilco Dijkstra
Steve Ellcey wrote: > Yes, I see where I missed this in aarch64_push_regs > and aarch64_pop_regs.  I think that is why the second of > Wilco's two examples (f2) is wrong.  I am unclear about > exactly what is meant by writeback and why we have it and > how that and callee_adjust are used.  Any cha

Re: RFC: Patch to implement Aarch64 SIMD ABI

2018-07-19 Thread Wilco Dijkstra
Hi Steve, > This patch checks for SIMD functions and saves the extra registers when > needed. It does not change the caller behaviour, so with just this patch > there may be values saved by both the caller and callee. This is not > efficient, but it is correct code. I tried a few simple test cas

[COMMITTED][testsuite] Fix f16_mov_immediate_3.c

2018-06-28 Thread Wilco Dijkstra
Fix and simplify the testcase so it generates dup even on latest trunk. This fixes the failure reported in: https://gcc.gnu.org/ml/gcc-patches/2018-06/msg01799.html Committed as obvious. ChangeLog: 2018-06-28 Wilco Dijkstra * gcc.target/aarch64/f16_mov_immediate_3.c: Fix testcase

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-27 Thread Wilco Dijkstra
Joseph Myers wrote: > On Tue, 26 Jun 2018, Wilco Dijkstra wrote: > > That looks incorrect indeed but that's mostly a problem with -fmath-errno > > as it > > would result in GCC assuming the function is const/pure when in fact it > > isn't. > > Does

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-27 Thread Wilco Dijkstra
Eric Botcazou wrote: >> This test can easily be changed not to use optimize since it doesn't look >> like it needs it. We really need to test these builtins properly, >> otherwise they will continue to fail on most targets. > > As far as I can see PR target/84521 has been reported only for Aarch6

Re: [PATCH][AARCH64] PR target/84521 Fix frame pointer corruption with -fomit-frame-pointer with __builtin_setjmp

2018-06-27 Thread Wilco Dijkstra
Eric Botcazou wrote: > > The AArch64 parts are OK. I've been holding off approving the patch while > > I wait for Eric to reply on the x86_64 fails with your new testcase. > > The test is not portable in any case since it uses the "optimize" attribute > so > I'd just make it specific to Aarch64

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-26 Thread Wilco Dijkstra
Joseph Myers wrote: > On Thu, 21 Jun 2018, Jeff Law wrote: > > > I think all this implies that the setting of -fno-math-errno by default > > really depends on the math library in use since it's the library that > > has to arrange for either errno to get set or for an exception to be raised. > > If

Re: [PATCH v3] Change default to -fno-math-errno

2018-06-19 Thread Wilco Dijkstra
Richard Biener wrote: > There are a number of regression tests that check for errno handling > (I added some to avoid aliasing for example).  Please make sure to > add explicit -fmath-errno to those that do not already have it set > (I guess such patch would be obvious and independent of this one)

[PATCH v3] Change default to -fno-math-errno

2018-06-18 Thread Wilco Dijkstra
: f: str x30, [sp, -16]! bl lroundf add x0, x0, 1 ldr x30, [sp], 16 ret With -fno-math-errno: f: fcvtas x0, s0 add x0, x0, 1 ret Passes regress on AArch64. OK for commit? ChangeLog: 2018-06-18 Wilco Dijkstra

[COMMITTED][testsuite] Add target pthread to pr86076.c

2018-06-18 Thread Wilco Dijkstra
Add missing target pthread to ensure test doesn't fail on bare-metal targets. Committed as obvious. ChangeLog: 2018-06-18 Wilco Dijkstra PR tree-optimization/86076 * gcc.dg/pr86076.c: Add target pthread for bare-metal targets. -- diff --git a/gcc/testsuite/gcc.dg/pr8607

[COMMITTED][testsuite] Remove xfail from vect-abs-compile.c

2018-06-18 Thread Wilco Dijkstra
Since PR64946 has been fixed, we can remove the xfail from this test. Committed as obvious. ChangeLog: 2018-06-18 Wilco Dijkstra PR tree-optimization/64946 * gcc.target/aarch64/vect-abs-compile.c: Remove xfail. -- diff --git a/gcc/testsuite/gcc.target/aarch64/vect-abs

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-31 Thread Wilco Dijkstra
Richard Sandiford wrote: >> This has probably been reported elsewhere already but I can't find >> such a report, so sorry for possible duplicate, >> but this patch is causing ICEs on aarch64 >> FAIL:    gcc.target/aarch64/sve/reduc_1.c -march=armv8.2-a+sve >> (internal compiler error) >> FAIL:   

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-30 Thread Wilco Dijkstra
Richard Sandiford wrote: > The "?" change seems to make intrinsic sense given the extra cost of the > GPR alternative.  But I think the real reason for this failure is that > we define no V1DF patterns, and target-independent code falls back to > using moves in the corresponding *integer* mode.  So for

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-29 Thread Wilco Dijkstra
James Greenhalgh wrote: > > Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. > > > I'd prefer more detail than this for a workaround; which test, why did it > > start to fail, why is this the right solution, etc. It was gcc.target/aarch64/vect_copy_lane_1.c generating: test

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-25 Thread Wilco Dijkstra
explicitly checking for a subset of GENERAL_REGS and FP_REGS. Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to backport to GCC8? ChangeLog: 2018-05-25 Wilco Dijkstra * config/aarch64/

Re: [PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-23 Thread Wilco Dijkstra
Richard Sandiford wrote: > -  if (allocno_class != ALL_REGS) > +  if (allocno_class != POINTER_AND_FP_REGS) >  return allocno_class; >  > -  if (best_class != ALL_REGS) > +  if (best_class != POINTER_AND_FP_REGS) >  return best_class; >  >    mode = PSEUDO_REGNO_MODE (regno); > I think

[PATCH][AArch64] Fix aarch64_ira_change_pseudo_allocno_class

2018-05-22 Thread Wilco Dijkstra
ND_FP_REGS register class which is now used instead of ALL_REGS. Add a missing ? to aarch64_get_lane to fix a failure in the testsuite. Passes regress, OK for commit? Since it is a regression introduced in GCC8, OK to backport to GCC8? ChangeLog: 2018-05-22 Wilco Dijkstra * config/aarch64

Re: [PATCH][AArch64] Simplify frame pointer logic

2018-05-22 Thread Wilco Dijkstra
James Greenhalgh wrote: > +/* Determine whether a frame chain needs to be generated.  */ > +static bool > +aarch64_needs_frame_chain (void) > +{ > +  /* Force a frame chain for EH returns so the return address is at FP+8.  */ > +  if (frame_pointer_needed || crtl->calls_eh_return) > +    return tr

Re: [PATCH][AArch64] Unify vec_set patterns, support floating-point vector modes properly

2018-05-17 Thread Wilco Dijkstra
Kyrill Tkachov wrote: > That patch would look like the attached. Is this preferable? > For the above example it generates the desired: > foo_v4sf: >   ldr s0, [x0] >   ldr s1, [x1, 8] >   ins v0.s[1], v1.s[0] >   ld1 {v0.s}[2], [x2] >   ld1 {v0.s}[3], [x3] >

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-16 Thread Wilco Dijkstra
Richard Earnshaw wrote: >>> Which doesn't appear to have been approved.  Did you follow up with Jeff? >> >> I'll get back to that one at some point - it'll take some time to agree on a >> way >> forward with the callback. >> >> Wilco >> >> > > So it seems to me that this should then be q

Re: [PATCH][AArch64] Improve register allocation of fma

2018-05-15 Thread Wilco Dijkstra
Hi, James Greenhalgh wrote: > > This seems like a fairly horrible hack around the register allocator > behaviour. That is why I proposed to improve the register allocator so one can explicitly specify the copy preference in the md syntax. However that wasn't accepted, so we'll have to use a hack

Re: [PATCH][AArch64] Set SLOW_BYTE_ACCESS

2018-05-15 Thread Wilco Dijkstra
Hi, > Which doesn't appear to have been approved.  Did you follow up with Jeff? I'll get back to that one at some point - it'll take some time to agree on a way forward with the callback. Wilco
