Richard Biener writes:
When applying loop masking to epilogues on x86_64 AVX512 we see
some significant performance regressions on SPEC CPU 2017
that are caused by store-to-load forwarding failures across outer
loop iterations when the inner loop does not iterate. Consider
for (j = 0; j < m; ++j)
for (i