Re: [v6, 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

2018-03-20 Thread Michael Ellerman
On Fri, 2017-08-04 at 03:42:32 UTC, Matt Brown wrote:
> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
> 
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
> 
> Performance benchmarks:
>   raid6: altivecx4 gen() 18773 MB/s
>   raid6: altivecx8 gen() 19438 MB/s
> 
>   raid6: vpermxor4 gen() 25112 MB/s
>   raid6: vpermxor8 gen() 26279 MB/s
> 
> Signed-off-by: Matt Brown 
> Reviewed-by: Daniel Axtens 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/751ba79cc552c146595cd439b21c4f

cheers


Re: [v6, 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

2018-03-19 Thread Michael Ellerman
On Fri, 2017-08-04 at 03:42:32 UTC, Matt Brown wrote:
> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
> 
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
> 
> Performance benchmarks:
>   raid6: altivecx4 gen() 18773 MB/s
>   raid6: altivecx8 gen() 19438 MB/s
> 
>   raid6: vpermxor4 gen() 25112 MB/s
>   raid6: vpermxor8 gen() 26279 MB/s
> 
> Signed-off-by: Matt Brown 
> Reviewed-by: Daniel Axtens 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2de95953c4e6ad54c9bee5e6a5518d

cheers


Re: [v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

2017-08-09 Thread Matt Brown
On Wed, Aug 9, 2017 at 11:26 PM, Michael Ellerman  wrote:
> Matt Brown  writes:
>
>> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
>> This instruction was made available with POWER8, ISA version 2.07.
>> It allows for both vperm and vxor instructions to be done in a single
>> instruction. This has been tested for correctness on a ppc64le vm with a
>> basic RAID6 setup containing 5 drives.
>>
>> The performance benchmarks are from the raid6test in the /lib/raid6/test
>> directory. These results are from an IBM Firestone machine with ppc64le
>> architecture. The benchmark results show a 35% speed increase over the best
>> existing algorithm for powerpc (altivec). The raid6test has also been run
>> on a big-endian ppc64 vm to ensure it also works for big-endian
>> architectures.
>>
>> Performance benchmarks:
>>   raid6: altivecx4 gen() 18773 MB/s
>>   raid6: altivecx8 gen() 19438 MB/s
>>
>>   raid6: vpermxor4 gen() 25112 MB/s
>>   raid6: vpermxor8 gen() 26279 MB/s
>>
>> Signed-off-by: Matt Brown 
>> Reviewed-by: Daniel Axtens 
>> ---
>> v6:
>>   - added vpermxor files to .gitignore
>>   - fixup whitespace
>>   - added vpermxor objs to test/Makefile
>> v5:
>>   - moved altivec.uc fix into other patch in series
>> ---
>>  include/linux/raid/pq.h |   4 ++
>>  lib/raid6/.gitignore|   1 +
>>  lib/raid6/Makefile  |  27 -
>>  lib/raid6/algos.c   |   4 ++
>>  lib/raid6/test/Makefile |  17 +++-
>>  lib/raid6/vpermxor.uc   | 104
>>  6 files changed, 154 insertions(+), 3 deletions(-)
>>  create mode 100644 lib/raid6/vpermxor.uc
>
> This version at least is not Cc'ed to any of the folks that
> get_maintainer.pl identifies for these files:
>
> $ ./scripts/get_maintainer.pl -f lib/raid6
> s...@fb.com
> gayatri.kamm...@intel.com
> fenghua...@intel.com
> megha@linux.intel.com
> schwidef...@de.ibm.com
> anup.pa...@broadcom.com
> linux-ker...@vger.kernel.org
>
>
> This seems like mostly a list of random folks who've touched this code,
> but maybe some of them would have comments?
>

Ah, my bad. I've now CC'ed them on this thread.
Apologies for not including you all in the original email.
Here is a link to the patch on patchwork:
http://patchwork.ozlabs.org/patch/797576/

Thanks,
Matt Brown


Re: [v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

2017-08-09 Thread Michael Ellerman
Matt Brown  writes:

> This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
> This instruction was made available with POWER8, ISA version 2.07.
> It allows for both vperm and vxor instructions to be done in a single
> instruction. This has been tested for correctness on a ppc64le vm with a
> basic RAID6 setup containing 5 drives.
>
> The performance benchmarks are from the raid6test in the /lib/raid6/test
> directory. These results are from an IBM Firestone machine with ppc64le
> architecture. The benchmark results show a 35% speed increase over the best
> existing algorithm for powerpc (altivec). The raid6test has also been run
> on a big-endian ppc64 vm to ensure it also works for big-endian
> architectures.
>
> Performance benchmarks:
>   raid6: altivecx4 gen() 18773 MB/s
>   raid6: altivecx8 gen() 19438 MB/s
>
>   raid6: vpermxor4 gen() 25112 MB/s
>   raid6: vpermxor8 gen() 26279 MB/s
>
> Signed-off-by: Matt Brown 
> Reviewed-by: Daniel Axtens 
> ---
> v6:
>   - added vpermxor files to .gitignore
>   - fixup whitespace
>   - added vpermxor objs to test/Makefile
> v5:
>   - moved altivec.uc fix into other patch in series
> ---
>  include/linux/raid/pq.h |   4 ++
>  lib/raid6/.gitignore|   1 +
>  lib/raid6/Makefile  |  27 -
>  lib/raid6/algos.c   |   4 ++
>  lib/raid6/test/Makefile |  17 +++-
>  lib/raid6/vpermxor.uc   | 104
>  6 files changed, 154 insertions(+), 3 deletions(-)
>  create mode 100644 lib/raid6/vpermxor.uc

This version at least is not Cc'ed to any of the folks that
get_maintainer.pl identifies for these files:

$ ./scripts/get_maintainer.pl -f lib/raid6
s...@fb.com
gayatri.kamm...@intel.com
fenghua...@intel.com
megha@linux.intel.com
schwidef...@de.ibm.com
anup.pa...@broadcom.com
linux-ker...@vger.kernel.org


This seems like mostly a list of random folks who've touched this code,
but maybe some of them would have comments?

cheers


[v6 1/2] raid6/altivec: Add vpermxor implementation for raid6 Q syndrome

2017-08-03 Thread Matt Brown
This patch uses the vpermxor instruction to optimise the raid6 Q syndrome.
This instruction was made available with POWER8, ISA version 2.07.
It allows for both vperm and vxor instructions to be done in a single
instruction. This has been tested for correctness on a ppc64le vm with a
basic RAID6 setup containing 5 drives.

The performance benchmarks are from the raid6test in the /lib/raid6/test
directory. These results are from an IBM Firestone machine with ppc64le
architecture. The benchmark results show a 35% speed increase over the best
existing algorithm for powerpc (altivec). The raid6test has also been run
on a big-endian ppc64 vm to ensure it also works for big-endian
architectures.
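
For readers following along, here is a minimal scalar sketch of the idea, assuming the standard RAID-6 field GF(2^8) with polynomial 0x11d and generator 2. It is not the code in vpermxor.uc: the names gf_mul2, permxor_byte, build_mul2_tables and gen_q are made up for illustration, and the tables are derived from the bitwise definition rather than copied from the patch. It models a per-byte "permute and xor" as two 16-entry nibble lookups whose results are xored together, which is enough to multiply by 2 because that multiplication is linear over GF(2).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bitwise reference: multiply by 2 in GF(2^8), polynomial 0x11d. */
static uint8_t gf_mul2(uint8_t x)
{
	return (uint8_t)((x << 1) ^ ((x & 0x80) ? 0x1d : 0x00));
}

/*
 * Scalar model of one byte lane of a permute-and-xor: the high nibble
 * of idx selects from hi[], the low nibble selects from lo[], and the
 * two selected bytes are xored.
 */
static uint8_t permxor_byte(const uint8_t hi[16], const uint8_t lo[16],
			    uint8_t idx)
{
	return hi[idx >> 4] ^ lo[idx & 0x0f];
}

/* Fill the nibble tables so that permxor_byte() computes gf_mul2(). */
static void build_mul2_tables(uint8_t hi[16], uint8_t lo[16])
{
	for (int n = 0; n < 16; n++) {
		hi[n] = gf_mul2((uint8_t)(n << 4));
		lo[n] = gf_mul2((uint8_t)n);
	}
}

/*
 * Q syndrome by Horner's rule: Q = D0 ^ 2*D1 ^ 4*D2 ^ ...
 * Start from the highest data disk, multiply the running value by 2,
 * then xor in the next lower disk.
 */
static void gen_q(int ndata, size_t bytes, uint8_t *const data[], uint8_t *q)
{
	uint8_t hi[16], lo[16];

	assert(ndata >= 1);
	build_mul2_tables(hi, lo);

	for (size_t b = 0; b < bytes; b++) {
		uint8_t wq = data[ndata - 1][b];

		for (int d = ndata - 2; d >= 0; d--)
			wq = permxor_byte(hi, lo, wq) ^ data[d][b];
		q[b] = wq;
	}
}
```

Calling gen_q() with an array of ndata data-block pointers and an output buffer fills that buffer with the Q bytes. A vector version would do the same lookup on 16 byte lanes at once.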

Performance benchmarks:
raid6: altivecx4 gen() 18773 MB/s
raid6: altivecx8 gen() 19438 MB/s

raid6: vpermxor4 gen() 25112 MB/s
raid6: vpermxor8 gen() 26279 MB/s
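
As a quick check of the 35% figure, the gain can be recomputed from the gen() numbers above. This is only a sketch of the arithmetic; speedup_percent and print_speedups are illustrative names and are not part of the patch or of raid6test. Comparing like-for-like unroll factors gives roughly 34% for x4 and roughly 35% for x8, where x8 is also the best existing altivec result.

```c
#include <stdio.h>

/* Percentage gain of new_mbps over baseline_mbps. */
static double speedup_percent(double baseline_mbps, double new_mbps)
{
	return (new_mbps / baseline_mbps - 1.0) * 100.0;
}

/* Print the gains for the gen() figures quoted in this message. */
static void print_speedups(void)
{
	printf("x4: %.1f%%\n", speedup_percent(18773.0, 25112.0));
	printf("x8: %.1f%%\n", speedup_percent(19438.0, 26279.0));
}
```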

Signed-off-by: Matt Brown 
Reviewed-by: Daniel Axtens 
---
v6:
- added vpermxor files to .gitignore
- fixup whitespace
- added vpermxor objs to test/Makefile
v5:
- moved altivec.uc fix into other patch in series
---
 include/linux/raid/pq.h |   4 ++
 lib/raid6/.gitignore|   1 +
 lib/raid6/Makefile  |  27 -
 lib/raid6/algos.c   |   4 ++
 lib/raid6/test/Makefile |  17 +++-
 lib/raid6/vpermxor.uc   | 104 
 6 files changed, 154 insertions(+), 3 deletions(-)
 create mode 100644 lib/raid6/vpermxor.uc

diff --git a/include/linux/raid/pq.h b/include/linux/raid/pq.h
index 4d57bba..3df9aa6 100644
--- a/include/linux/raid/pq.h
+++ b/include/linux/raid/pq.h
@@ -107,6 +107,10 @@ extern const struct raid6_calls raid6_avx512x2;
 extern const struct raid6_calls raid6_avx512x4;
 extern const struct raid6_calls raid6_tilegx8;
 extern const struct raid6_calls raid6_s390vx8;
+extern const struct raid6_calls raid6_vpermxor1;
+extern const struct raid6_calls raid6_vpermxor2;
+extern const struct raid6_calls raid6_vpermxor4;
+extern const struct raid6_calls raid6_vpermxor8;
 
 struct raid6_recov_calls {
 	void (*data2)(int, size_t, int, int, void **);
diff --git a/lib/raid6/.gitignore b/lib/raid6/.gitignore
index f01b1cb..3de0d89 100644
--- a/lib/raid6/.gitignore
+++ b/lib/raid6/.gitignore
@@ -4,3 +4,4 @@ int*.c
 tables.c
 neon?.c
 s390vx?.c
+vpermxor*.c
diff --git a/lib/raid6/Makefile b/lib/raid6/Makefile
index 3057011..db095a7 100644
--- a/lib/raid6/Makefile
+++ b/lib/raid6/Makefile
@@ -4,7 +4,8 @@ raid6_pq-y  += algos.o recov.o tables.o int1.o int2.o int4.o \
   int8.o int16.o int32.o
 
 raid6_pq-$(CONFIG_X86) += recov_ssse3.o recov_avx2.o mmx.o sse1.o sse2.o avx2.o avx512.o recov_avx512.o
-raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o
+raid6_pq-$(CONFIG_ALTIVEC) += altivec1.o altivec2.o altivec4.o altivec8.o \
+  vpermxor1.o vpermxor2.o vpermxor4.o vpermxor8.o
 raid6_pq-$(CONFIG_KERNEL_MODE_NEON) += neon.o neon1.o neon2.o neon4.o neon8.o
 raid6_pq-$(CONFIG_TILEGX) += tilegx8.o
 raid6_pq-$(CONFIG_S390) += s390vx8.o recov_s390xc.o
@@ -88,6 +89,30 @@ $(obj)/altivec8.c:   UNROLL := 8
 $(obj)/altivec8.c:   $(src)/altivec.uc $(src)/unroll.awk FORCE
 	$(call if_changed,unroll)
 
+CFLAGS_vpermxor1.o += $(altivec_flags)
+targets += vpermxor1.c
+$(obj)/vpermxor1.c: UNROLL := 1
+$(obj)/vpermxor1.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+   $(call if_changed,unroll)
+
+CFLAGS_vpermxor2.o += $(altivec_flags)
+targets += vpermxor2.c
+$(obj)/vpermxor2.c: UNROLL := 2
+$(obj)/vpermxor2.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+   $(call if_changed,unroll)
+
+CFLAGS_vpermxor4.o += $(altivec_flags)
+targets += vpermxor4.c
+$(obj)/vpermxor4.c: UNROLL := 4
+$(obj)/vpermxor4.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+   $(call if_changed,unroll)
+
+CFLAGS_vpermxor8.o += $(altivec_flags)
+targets += vpermxor8.c
+$(obj)/vpermxor8.c: UNROLL := 8
+$(obj)/vpermxor8.c: $(src)/vpermxor.uc $(src)/unroll.awk FORCE
+   $(call if_changed,unroll)
+
 CFLAGS_neon1.o += $(NEON_FLAGS)
 targets += neon1.c
 $(obj)/neon1.c:   UNROLL := 1
diff --git a/lib/raid6/algos.c b/lib/raid6/algos.c
index 7857049..edd4f69 100644
--- a/lib/raid6/algos.c
+++ b/lib/raid6/algos.c
@@ -74,6 +74,10 @@ const struct raid6_calls * const raid6_algos[] = {
 	&raid6_altivec2,
 	&raid6_altivec4,
 	&raid6_altivec8,
+	&raid6_vpermxor1,
+	&raid6_vpermxor2,
+	&raid6_vpermxor4,
+	&raid6_vpermxor8,
 #endif
 #if defined(CONFIG_TILEGX)
 	&raid6_tilegx8,
diff --git a/lib/raid6/test/Makefile b/lib/raid6/test/Makefile
index 2c7b60e..a14be53 100644
--- a/lib/raid6/test/Makefile
+++ b/lib/raid6/test/Makefile
@@ -47,7 +47,8 @@ else
  gcc -c -x c - >&/dev/null && \
  rm ./-.o && echo yes)
 ifeq ($(HAS_ALTIVEC),yes)
-OBJS += altivec1.o altivec2.o altivec4.o