Re: [v6, 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
On Fri, 2017-08-04 at 03:42:32 UTC, Matt Brown wrote:
> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
>
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
>
> Performance benchmarks:
> raid6: altivecx4 gen() 18773 MB/s
> raid6: altivecx8 gen() 19438 MB/s
>
> raid6: vpermxor4 gen() 25112 MB/s
> raid6: vpermxor8 gen() 26279 MB/s
>
> Signed-off-by: Matt Brown
> Reviewed-by: Daniel Axtens

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/751ba79cc552c146595cd439b21c4f

cheers
Re: [v6, 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
On Fri, 2017-08-04 at 03:42:32 UTC, Matt Brown wrote:
> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
>
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
>
> Performance benchmarks:
> raid6: altivecx4 gen() 18773 MB/s
> raid6: altivecx8 gen() 19438 MB/s
>
> raid6: vpermxor4 gen() 25112 MB/s
> raid6: vpermxor8 gen() 26279 MB/s
>
> Signed-off-by: Matt Brown
> Reviewed-by: Daniel Axtens

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2de95953c4e6ad54c9bee5e6a5518d

cheers
Re: [v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
On Wed, Aug 9, 2017 at 11:26 PM, Michael Ellerman wrote:
> Matt Brown writes:
>
>> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
>> This instruction was made available with POWER8, ISA version 2.07.
>> It allows for both vperm and vxor instructions to be done in a single
>> instruction. This has been tested for correctness on a ppc64le vm with a
>> basic RAID6 setup containing 5 drives.
>>
>> The performance benchmarks are from the raid6test in the /lib/raid6/test
>> directory. These results are from an IBM Firestone machine with ppc64le
>> architecture. The benchmark results show a 35% speed increase over the best
>> existing algorithm for powerpc (altivec). The raid6test has also been run
>> on a big-endian ppc64 vm to ensure it also works for big-endian
>> architectures.
>>
>> Performance benchmarks:
>> raid6: altivecx4 gen() 18773 MB/s
>> raid6: altivecx8 gen() 19438 MB/s
>>
>> raid6: vpermxor4 gen() 25112 MB/s
>> raid6: vpermxor8 gen() 26279 MB/s
>>
>> Signed-off-by: Matt Brown
>> Reviewed-by: Daniel Axtens
>> ---
>> v6:
>> - added vpermxor files to .gitignore
>> - fixup whitespace
>> - added vpermxor objs to test/Makefile
>> v5:
>> - moved altivec.uc fix into other patch in series
>> ---
>>  include/linux/raid/pq.h |   4 ++
>>  lib/raid6/.gitignore    |   1 +
>>  lib/raid6/Makefile      |  27 -
>>  lib/raid6/algos.c       |   4 ++
>>  lib/raid6/test/Makefile |  17 +++-
>>  lib/raid6/vpermxor.uc   | 104
>>  6 files changed, 154 insertions(+), 3 deletions(-)
>>  create mode 100644 lib/raid6/vpermxor.uc
>
> This version at least is not Cc'ed to any of the folks that
> get_maintainers.pl identifies for these files:
>
>   $ ./scripts/get_maintainer.pl -f lib/raid6
>   s...@fb.com
>   gayatri.kamm...@intel.com
>   fenghua...@intel.com
>   megha@linux.intel.com
>   schwidef...@de.ibm.com
>   anup.pa...@broadcom.com
>   linux-ker...@vger.kernel.org
>
> This seems like mostly a list of random folks who've touched this code,
> but maybe some of them would have comments?
>

Ah my bad.
I've CC'ed them into this email chain.

Apologies for not including you guys in the original email.
Here is a link to the patchworks patch:
http://patchwork.ozlabs.org/patch/797576/

Thanks,
Matt Brown
Re: [v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
Matt Brown writes:

> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
>
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
>
> Performance benchmarks:
> raid6: altivecx4 gen() 18773 MB/s
> raid6: altivecx8 gen() 19438 MB/s
>
> raid6: vpermxor4 gen() 25112 MB/s
> raid6: vpermxor8 gen() 26279 MB/s
>
> Signed-off-by: Matt Brown
> Reviewed-by: Daniel Axtens
> ---
> v6:
> - added vpermxor files to .gitignore
> - fixup whitespace
> - added vpermxor objs to test/Makefile
> v5:
> - moved altivec.uc fix into other patch in series
> ---
>  include/linux/raid/pq.h |   4 ++
>  lib/raid6/.gitignore    |   1 +
>  lib/raid6/Makefile      |  27 -
>  lib/raid6/algos.c       |   4 ++
>  lib/raid6/test/Makefile |  17 +++-
>  lib/raid6/vpermxor.uc   | 104
>  6 files changed, 154 insertions(+), 3 deletions(-)
>  create mode 100644 lib/raid6/vpermxor.uc

This version at least is not Cc'ed to any of the folks that
get_maintainers.pl identifies for these files:

  $ ./scripts/get_maintainer.pl -f lib/raid6
  s...@fb.com
  gayatri.kamm...@intel.com
  fenghua...@intel.com
  megha@linux.intel.com
  schwidef...@de.ibm.com
  anup.pa...@broadcom.com
  linux-ker...@vger.kernel.org

This seems like mostly a list of random folks who've touched this code,
but maybe some of them would have comments?

cheers
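
For readers following the thread, the reason one permute-and-xor step helps
with the Q syndrome can be sketched in portable C. This is only an
illustrative model and not the code from vpermxor.uc: the helper names
(gf_mul2_shift, permxor_model, count_mismatches, verify_tables) and the way
the two tables are derived are assumptions made for the example. It assumes
the usual RAID-6 field GF(2^8) with polynomial 0x11d, where the Q syndrome
needs repeated multiplication by 2. Because that multiplication is linear
over XOR, the product for a byte can be formed by looking up its high and
low nibbles in two 16-entry tables and XORing the results, which is the
shape of operation a permute-xor instruction applies across a 16-byte vector.

```c
#include <assert.h>
#include <stdint.h>

/* Reference: multiply by 2 in GF(2^8), polynomial 0x11d (shift, then reduce). */
static uint8_t gf_mul2_shift(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/* Hypothetical lookup tables, one per nibble of the input byte. */
static uint8_t tbl_low[16];   /* tbl_low[n]  = 2 * n        */
static uint8_t tbl_high[16];  /* tbl_high[n] = 2 * (n << 4) */

static void build_tables(void)
{
	for (int n = 0; n < 16; n++) {
		tbl_low[n]  = gf_mul2_shift((uint8_t)n);
		tbl_high[n] = gf_mul2_shift((uint8_t)(n << 4));
	}
}

/*
 * Portable model of a 16-byte permute-and-xor: each selector byte picks one
 * entry from "hi" with its high nibble and one from "lo" with its low
 * nibble, and the two entries are XORed together.
 */
static void permxor_model(uint8_t out[16], const uint8_t hi[16],
			  const uint8_t lo[16], const uint8_t sel[16])
{
	for (int i = 0; i < 16; i++)
		out[i] = hi[sel[i] >> 4] ^ lo[sel[i] & 0x0f];
}

/* Count byte values where the table method disagrees with the reference. */
static int count_mismatches(void)
{
	int bad = 0;

	build_tables();
	for (int base = 0; base < 256; base += 16) {
		uint8_t in[16], out[16];

		for (int i = 0; i < 16; i++)
			in[i] = (uint8_t)(base + i);
		permxor_model(out, tbl_high, tbl_low, in);
		for (int i = 0; i < 16; i++)
			if (out[i] != gf_mul2_shift(in[i]))
				bad++;
	}
	return bad;
}

/* Convenience check: aborts if any of the 256 byte values disagrees. */
static void verify_tables(void)
{
	assert(count_mismatches() == 0);
}
```

Calling verify_tables() exercises all 256 byte values. The model says nothing
about throughput; it only shows why the multiply can be expressed as two
table lookups and an XOR per byte.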
[v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome
This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
This instruction was made available with POWER8, ISA version 2.07.
It allows for both vperm and vxor instructions to be done in a single
instruction. This has been tested for correctness on a ppc64le vm with a
basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the /lib/raid6/test
directory. These results are from an IBM Firestone machine with ppc64le
architecture. The benchmark results show a 35% speed increase over the best
existing algorithm for powerpc (altivec). The raid6test has also been run
on a big-endian ppc64 vm to ensure it also works for big-endian
architectures.

Performance benchmarks:
raid6: altivecx4 gen() 18773 MB/s
raid6: altivecx8 gen() 19438 MB/s

raid6: vpermxor4 gen() 25112 MB/s
raid6: vpermxor8 gen() 26279 MB/s

Signed-off-by: Matt Brown
Reviewed-by: Daniel Axtens
---
v6:
- added vpermxor files to .gitignore
- fixup whitespace
- added vpermxor objs to test/Makefile
v5:
- moved altivec.uc fix into other patch in series
---
 include/linux/raid/pq.h |   4 ++
 lib/raid6/.gitignore    |   1 +
 lib/raid6/Makefile      |  27 -
 lib/raid6/algos.c       |   4 ++
 lib/raid6/test/Makefile |  17 +++-
 lib/raid6/vpermxor.uc   | 104
 6 files changed, 154 insertions(+), 3 deletions(-)
 create mode 100644 lib/raid6/vpermxor.uc

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..3df9aa6 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -107,6 +107,10 @@ extern const struct raid6_calls raid6_avx512x2;
 extern const struct raid6_calls raid6_avx512x4;
 extern const struct raid6_calls raid6_tilegx8;
 extern const struct raid6_calls raid6_s390vx8;
+extern const struct raid6_calls raid6_vpermxor1;
+extern const struct raid6_calls raid6_vpermxor2;
+extern const struct raid6_calls raid6_vpermxor4;
+extern const struct raid6_calls raid6_vpermxor8;
 
 struct raid6_recov_calls {
 	void (*data2)(int, size_t, int, int, void **);
diff --git a/lib/raid6/.gitignore b/lib/raid6/.gitignore
index f01b1cb..3de0d89 100644
--- a/lib/raid6/.gitignore
+++ b/lib/raid6/.gitignore
@@ -4,3 +4,4 @@ int*.c
 tables.c
 neon?.c
 s390vx?.c
+vpermxor*.c
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 3057011..db095a7 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -4,7 +4,8 @@ raid6_pq-y += algos.o recov.o tables.o int1.o int2.o int4.o \
 	int8.o int16.o int32.o
 
 raid6_pq-$(CONFIG_X86) += recov_ssse3.o recov_avx2.o mmx.o sse1.o sse2.o avx2.o avx512.o recov_avx512.o
-raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
+raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o \
+	vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
 raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o
 raid6_pq-$(CONFIG_TILEGX) += tilegx8.o
 raid6_pq-$(CONFIG_S390) += s390vx8.o recov_s390xc.o
@@ -88,6 +89,30 @@ $(obj)/altivec8.c: UNROLL := 8
 $(obj)/altivec8.c: $(src)/altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
+CFLAGS_vpermxor1.o += $(altivec_flags)
+targets += vpermxor1.c
+$(obj)/vpermxor1.c: UNROLL := 1
+$(obj)/vpermxor1.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor2.o += $(altivec_flags)
+targets += vpermxor2.c
+$(obj)/vpermxor2.c: UNROLL := 2
+$(obj)/vpermxor2.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor4.o += $(altivec_flags)
+targets += vpermxor4.c
+$(obj)/vpermxor4.c: UNROLL := 4
+$(obj)/vpermxor4.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
+CFLAGS_vpermxor8.o += $(altivec_flags)
+targets += vpermxor8.c
+$(obj)/vpermxor8.c: UNROLL := 8
+$(obj)/vpermxor8.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+	$(call if_changed,unroll)
+
 CFLAGS_neon1.o += $(NEON_FLAGS)
 targets += neon1.c
 $(obj)/neon1.c: UNROLL := 1
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7857049..edd4f69 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -74,6 +74,10 @@ const struct raid6_calls * const raid6_algos[] = {
 	&raid6_altivec2,
 	&raid6_altivec4,
 	&raid6_altivec8,
+	&raid6_vpermxor1,
+	&raid6_vpermxor2,
+	&raid6_vpermxor4,
+	&raid6_vpermxor8,
 #endif
 #if defined(CONFIG_TILEGX)
 	&raid6_tilegx8,
diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index 2c7b60e..a14be53 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -47,7 +47,8 @@ else
 	gcc -c -x c - >&/dev/null && \
 	rm ./-.o && echo yes)
 ifeq ($(HAS_ALTIVEC),yes)
-OBJS += altivec1.o altivec2.o altivec4.o