Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-13 Thread Hongtao Liu
On Wed, Aug 13, 2025 at 2:30 PM Hongtao Liu wrote: > > On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > > > From: Andi Kleen > > > > The GFNI AVX gf2p8affineqb instruction can be used to implement > > vectorized byte shifts or rotates. This patch uses them to implement > > shift and rotate p

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rotate patterns to allow the vectorizer to use them. > Previously AVX couldn

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Andi Kleen
> > It might be reasonable to tweak the costs per CPU however, I haven't > > done that. > > > > BTW for rotate the wins are much higher because there are no native > > instructions for it. > For ashl/lshr, the original implementation only takes 2 > instructions (vpsllw/vpsrlw + vpand), and for ashr
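
For context, the two-instruction sequence mentioned above (a 16-bit shift followed by a mask) can be modeled in scalar C. This is only an illustrative sketch: the function names are hypothetical, and a single 16-bit word holding two byte lanes stands in for a full vector register.

```c
#include <stdint.h>

/* Two byte lanes packed into one 16-bit word stand in for a vector.
   Function names here are hypothetical, chosen for this sketch only. */

/* Left shift of each byte lane by n (0..7). */
static uint16_t shl_bytes_via_word(uint16_t pair, int n)
{
    uint16_t shifted = (uint16_t)(pair << n);      /* models vpsllw */
    uint8_t m = (uint8_t)(0xFFu << n);             /* per-byte keep mask */
    uint16_t mask = (uint16_t)(m | (m << 8));
    return (uint16_t)(shifted & mask);             /* models vpand */
}

/* Logical right shift of each byte lane by n (0..7). */
static uint16_t lshr_bytes_via_word(uint16_t pair, int n)
{
    uint16_t shifted = (uint16_t)(pair >> n);      /* models vpsrlw */
    uint8_t m = (uint8_t)(0xFFu >> n);             /* per-byte keep mask */
    uint16_t mask = (uint16_t)(m | (m << 8));
    return (uint16_t)(shifted & mask);             /* models vpand */
}
```

The mask clears the bits that the wider shift carries across the byte boundary, which is why a second instruction is needed when no native byte shift exists.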

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Wed, Aug 13, 2025 at 1:40 AM Andi Kleen wrote: > > > > > The latter takes 5 cycles, the former takes 3 cycles. > > It's pipelined however. > > > > > Do you have any microbenchmark or real workloads to show your > > optimization is better? > > Keep in mind it only uses one port vs two. > > Yes I

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Andi Kleen
> > The latter takes 5 cycles, the former takes 3 cycles. It's pipelined, however. > > Do you have any microbenchmark or real workloads to show your > optimization is better? Keep in mind it only uses one port vs two. Yes, I ran it on Arrow Lake and saw wins on both P-core and E-core according to

Re: [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-12 Thread Hongtao Liu
On Tue, Aug 5, 2025 at 8:49 AM Andi Kleen wrote: > > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rotate patterns to allow the vectorizer to use them. > Previously AVX couldn

[PING] [PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-11 Thread Andi Kleen
Andi Kleen writes: I wanted to ping https://gcc.gnu.org/pipermail/gcc-patches/2025-August/691624.html > From: Andi Kleen > > The GFNI AVX gf2p8affineqb instruction can be used to implement > vectorized byte shifts or rotates. This patch uses them to implement > shift and rotate patterns to all

[PATCH] Use x86 GFNI for vectorized constant byte shifts/rotates

2025-08-04 Thread Andi Kleen
From: Andi Kleen The GFNI AVX gf2p8affineqb instruction can be used to implement vectorized byte shifts or rotates. This patch uses them to implement shift and rotate patterns to allow the vectorizer to use them. Previously AVX couldn't do rotates (except with XOP) and had to handle 8-bit shifts
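
As a rough illustration of why an affine transform can express these operations, the scalar C sketch below models one byte lane of gf2p8affineqb with an imm8 of zero, following the bit-matrix semantics documented for the instruction (result bit i is the parity of matrix byte 7-i ANDed with the source byte). The names `affine_byte`, `build_matrix`, and `enum byte_op` are hypothetical and are not taken from the patch; the matrices and patterns the patch actually uses are not shown in this excerpt.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of gf2p8affineqb with imm8 == 0.
   Result bit i = parity(matrix.byte[7 - i] & x). */
static uint8_t affine_byte(uint64_t matrix, uint8_t x)
{
    uint8_t out = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t v = (uint8_t)(matrix >> (8 * (7 - i))) & x;
        v ^= v >> 4;                     /* fold down to a parity bit */
        v ^= v >> 2;
        v ^= v >> 1;
        out |= (uint8_t)((v & 1) << i);
    }
    return out;
}

/* Hypothetical operation selector for this sketch. */
enum byte_op { OP_SHL, OP_LSHR, OP_ASHR, OP_ROTL };

/* Build a matrix in which each result bit copies one source bit,
   or no bit at all (giving zero). n must be in 0..7. */
static uint64_t build_matrix(enum byte_op op, int n)
{
    assert(n >= 0 && n < 8);
    uint64_t m = 0;
    for (int i = 0; i < 8; i++) {
        int src;                         /* source bit feeding result bit i */
        switch (op) {
        case OP_SHL:  src = i - n; break;
        case OP_LSHR: src = i + n; break;
        case OP_ASHR: src = (i + n > 7) ? 7 : i + n; break; /* copy sign bit */
        default:      src = (i - n + 8) % 8; break;         /* rotate left */
        }
        if (src >= 0 && src <= 7)
            m |= (uint64_t)(1u << src) << (8 * (7 - i));
    }
    return m;
}
```

With n = 0 every operation yields the identity matrix 0x0102040810204080. Because the matrix depends only on the operation and the constant shift count, it can be computed once ahead of time, which fits the constant shifts/rotates named in the subject line.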