On Wed, Nov 27, 2019 at 2:17 PM Wilco Dijkstra wrote:
>
> Hi Richard,
>
> >> Yes so it does the insane "fully unrolled trailing loop before the unrolled
> >> loop" thing. One always does the trailing loop last (and typically as an
> >> actual loop of course) and then the code ends up much faster,
Hi Richard,
>> Yes so it does the insane "fully unrolled trailing loop before the unrolled
>> loop" thing. One always does the trailing loop last (and typically as an
>> actual loop of course) and then the code ends up much faster, close to
>> the ideal version shown in the PR.
>
> Well, you can't
On Fri, Nov 15, 2019 at 6:13 PM Wilco Dijkstra wrote:
>
> Hi Richard,
>
> > So what do we actually do unpatched with -funroll-loops here?
>
> Yes so it does the insane "fully unrolled trailing loop before the unrolled
> loop" thing. One always does the trailing loop last (and typically as an
> act
Hi Richard,
> So what do we actually do unpatched with -funroll-loops here?
Yes so it does the insane "fully unrolled trailing loop before the unrolled
loop" thing. One always does the trailing loop last (and typically as an
actual loop of course) and then the code ends up much faster, close to
t
Alternatively limit the unroll factor or pad
the prologue jumps to make the branch predictor happy.
But your patch is a hack.
Richard.
> Thanks
> Sudi
>
> From: Richard Biener
>
> Sent: Friday, November 15, 2019 9:32 AM
>
> To: Sudakshina
Subject: Re: [PATCH, GCC, AArch64] Fix PR88398 for AArch64
On Thu, Nov 14, 2019 at 4:41 PM Sudakshina Das wrote:
>
> Hi
>
> This patch is trying to fix PR88398 for AArch64. As discussed in the PR,
> loop unrolling is probably what we can do here. As an easy fix, the
> existing unroll_stupid is unrolling the given example better than the
> unroll_runtime_i
Hi
This patch is trying to fix PR88398 for AArch64. As discussed in the PR,
loop unrolling is probably what we can do here. As an easy fix, the
existing unroll_stupid is unrolling the given example better than the
unroll_runtime_iterations since the loop contains a break inside it.
So all
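
For readers following the thread, the loop shape under discussion (a counted loop with a break inside it, unrolled with the trailing iterations handled last as an actual loop) can be sketched by hand as below. This is only an illustration: the function names are hypothetical, the unroll factor of 4 is an assumption, and it is neither the testcase from the PR nor what GCC's unroller emits.

```c
#include <stddef.h>

/* Reference version: scan until the first mismatch (early break) or n.
   Hypothetical example, not the PR88398 testcase. */
size_t match_len_simple(const unsigned char *a, const unsigned char *b,
                        size_t n)
{
    size_t i = 0;
    while (i < n) {
        if (a[i] != b[i])
            break;
        i++;
    }
    return i;
}

/* Hand-unrolled by 4 (assumed factor). The unrolled body comes first and
   the leftover iterations are handled last, as an ordinary loop. */
size_t match_len_unrolled(const unsigned char *a, const unsigned char *b,
                          size_t n)
{
    size_t i = 0;

    /* Main unrolled body: runs only while at least 4 elements remain.
       i never exceeds n, so n - i cannot wrap around. */
    while (n - i >= 4) {
        if (a[i] != b[i])         return i;
        if (a[i + 1] != b[i + 1]) return i + 1;
        if (a[i + 2] != b[i + 2]) return i + 2;
        if (a[i + 3] != b[i + 3]) return i + 3;
        i += 4;
    }

    /* Trailing loop: at most 3 iterations, kept as a real loop. */
    while (i < n) {
        if (a[i] != b[i])
            break;
        i++;
    }
    return i;
}
```

Both functions return the same index for any input; the second one only changes where the remainder iterations are placed relative to the unrolled body.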