On Wed, Aug 13, 2025 at 2:30 PM Hongtao Liu wrote:
>
> On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote:
> >
> > From: Andi Kleen
> >
> > The GFNI AVX gf2p8affineqb instruction can be used to implement
> > vectorized byte shifts or rotates. This patch uses them to implement
> > shift and rotate p
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote:
>
> From: Andi Kleen
>
> The GFNI AVX gf2p8affineqb instruction can be used to implement
> vectorized byte shifts or rotates. This patch uses them to implement
> shift and rotate patterns to allow the vectorizer to use them.
> Previously AVX couldn
> > It might be reasonable to tweak the costs per CPU; however, I haven't
> > done that.
> >
> > BTW for rotate the wins are much higher because there are no native
> > instructions for it.
> For ashl/lshr, the original implementation only takes 2
> instructions (vpsllw/vpsrlw + vpand), and for ashr
On Wed, Aug 13, 2025 at 1:40 AM Andi Kleen wrote:
>
> >
> > The latter takes 5 cycles, the former takes 3 cycles.
>
> It's pipelined however.
>
> >
> > Do you have any microbenchmark or real workloads to show your
> > optimization is better?
>
> Keep in mind it only uses one port vs two.
>
> Yes I
>
> The latter takes 5 cycles, the former takes 3 cycles.
It's pipelined however.
>
> Do you have any microbenchmark or real workloads to show your
> optimization is better?
Keep in mind it only uses one port vs two.
Yes, I ran it on Arrow Lake and saw wins on both P-cores and E-cores
according to
Andi Kleen writes:
I wanted to ping
https://gcc.gnu.org/pipermail/gcc-patches/2025-August/691624.html
> From: Andi Kleen
>
> The GFNI AVX gf2p8affineqb instruction can be used to implement
> vectorized byte shifts or rotates. This patch uses them to implement
> shift and rotate patterns to all
From: Andi Kleen
The GFNI AVX gf2p8affineqb instruction can be used to implement
vectorized byte shifts or rotates. This patch uses it to implement
shift and rotate patterns so that the vectorizer can use them.
Previously AVX couldn't do rotates (except with XOP) and had to handle
8-bit shifts
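
For readers unfamiliar with the instruction, the following is a rough
scalar sketch of why an affine transform over GF(2) can express a byte
shift or rotate. It models the per-byte behaviour as Intel documents it
for gf2p8affineqb (result bit i is the parity of matrix byte 7-i ANDed
with the input byte, XORed with bit i of the immediate). The helper names
(affine_byte, rotl_matrix, shl_matrix, count_mismatches) are hypothetical
and are not taken from the patch; how the patch itself builds its
constants is not shown in this thread.

```c
#include <assert.h>
#include <stdint.h>

/* Parity (XOR of all bits) of a byte. */
static uint8_t parity8(uint8_t v)
{
    v ^= v >> 4;
    v ^= v >> 2;
    v ^= v >> 1;
    return v & 1;
}

/* Scalar model of one byte lane: result bit i is the parity of
   (matrix byte 7-i AND x), XORed with bit i of imm. */
static uint8_t affine_byte(uint64_t matrix, uint8_t x, uint8_t imm)
{
    uint8_t r = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t row = (uint8_t)(matrix >> (8 * (7 - i)));
        r |= (uint8_t)((parity8(row & x) ^ ((imm >> i) & 1)) << i);
    }
    return r;
}

/* Hypothetical helper: matrix for rotate left by n.
   Result bit i selects input bit (i - n) mod 8. */
static uint64_t rotl_matrix(unsigned n)
{
    assert(n < 8);
    uint64_t m = 0;
    for (unsigned i = 0; i < 8; i++) {
        uint64_t row = 1u << ((i + 8 - n) & 7);
        m |= row << (8 * (7 - i));
    }
    return m;
}

/* Hypothetical helper: matrix for logical shift left by n.
   Result bits below n have an all-zero row, so they come out as 0. */
static uint64_t shl_matrix(unsigned n)
{
    assert(n < 8);
    uint64_t m = 0;
    for (unsigned i = n; i < 8; i++) {
        uint64_t row = 1u << (i - n);
        m |= row << (8 * (7 - i));
    }
    return m;
}

/* Reference 8-bit rotate left for comparison. */
static uint8_t rotl8(uint8_t x, unsigned n)
{
    return (uint8_t)((x << n) | (x >> ((8 - n) & 7)));
}

/* Compare the model against plain C for every byte value and count. */
int count_mismatches(void)
{
    int bad = 0;
    for (unsigned n = 0; n < 8; n++) {
        for (unsigned x = 0; x < 256; x++) {
            if (affine_byte(rotl_matrix(n), (uint8_t)x, 0) != rotl8((uint8_t)x, n))
                bad++;
            if (affine_byte(shl_matrix(n), (uint8_t)x, 0) != (uint8_t)(x << n))
                bad++;
        }
    }
    return bad;
}
```

With n = 0 both helpers produce the identity matrix, and the same
row-selection idea extends to logical right shifts by selecting input
bit i + n instead. The vector instruction applies one such 8x8 matrix to
every byte of a 64-bit lane at once.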