Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc
On 03/01/2021 05:12, Herbert Xu wrote:
> On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, when
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic.
>>
>> When doing lmbench3 Create and Delete file test on partitions with
>> ext4 enabling metadata checksum, found using crc32c-generic driver
>> could get about 20% performance gain than using the driver crc32c-intel
>> on some Zhaoxin CPUs.
>>
>> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
>> to get performance gain, so remove these Zhaoxin CPUs support from
>> crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
>> ---
>>  arch/x86/crypto/crc32c-intel_glue.c | 21 +++--
>>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> This does not seem to address the latest comment from hpa.
> 

Yes, please ignore this patch. Have send new patch set per Hpa's suggestion.

Sincerely
Tonyw

> Thanks,
> 


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2021-01-06 Thread Tony W Wang-oc


On 22/12/2020 12:54, h...@zytor.com wrote:
> On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote:
>> On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>>> On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
 On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
  wrote:
> On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>
>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
 On 12/12/2020 01:43, Eric Biggers wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>> wrote:
>> The driver crc32c-intel match CPUs supporting
> X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature,
 When
>> crc32c-intel and crc32c-generic are both registered, system
 will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver
> for
>> some Zhaoxin CPUs to get performance gain, So remove these
> Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
>
> Does this mean that the performance of the crc32c instruction
>>> on
> those CPUs is
> actually slower than a regular C implementation?  That's very
> weird.
>

 From the lmbench3 Create and Delete file test on those chips, I
> think yes.

>>>
>>> Did you try measuring the performance of the hashing itself, and
> not some
>>> higher-level filesystem operations?
>>>
>>
>> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>> the
> same
>> input value the generic C implementation takes fewer time than the
>> crc32c instruction implementation.
>>
>
> And that is really "working as intended"?

 These CPU's crc32c instruction is not working as intended.

  Why do these CPUs even
> declare that
> they support the crc32c instruction, when it is so slow?
>

 The presence of crc32c and some other instructions supports are
 enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
 except the crc32c instruction.

> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>> AVX,
> etc.) that
> these CPUs similarly declare support for but they are uselessly
>> slow?

 No.

 Sincerely
 Tonyw

>
> - Eric
>>>
>>> Then the right thing to do is to disable the CPUID bit in the
>>> vendor-specific startup code.
>>
>> This way makes these CPUs do not support all instruction sets
>> enumerated
>> by CPUID.01:ECX[SSE4.2].
>> While only crc32c instruction is slow, just expect the crc32c-intel
>> driver do not
>> match these CPUs.
>>
>> Sincerely
>> Tonyw
> 
> Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We 
> *do not* bury this information in drivers; it becomes a recipe for the same 
> problems over and over.
> 

Thanks for your suggestion. Have send new patch set.

Sincerely
Tonyw


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2021-01-02 Thread Herbert Xu
On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, when
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic.
> 
> When doing lmbench3 Create and Delete file test on partitions with
> ext4 enabling metadata checksum, found using crc32c-generic driver
> could get about 20% performance gain than using the driver crc32c-intel
> on some Zhaoxin CPUs.
> 
> This case expect to use crc32c-generic driver for these Zhaoxin CPUs
> to get performance gain, so remove these Zhaoxin CPUs support from
> crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc 
> ---
>  arch/x86/crypto/crc32c-intel_glue.c | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)

This does not seem to address the latest comment from hpa.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>>On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>> wrote:
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
> 
> On 15/12/2020 04:41, Eric Biggers wrote:
> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> >> On 12/12/2020 01:43, Eric Biggers wrote:
> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>wrote:
>  The driver crc32c-intel match CPUs supporting
X86_FEATURE_XMM4_2.
>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>When
>  crc32c-intel and crc32c-generic are both registered, system
>>>will
>  use crc32c-intel because its .cra_priority is greater than
>  crc32c-generic. This case expect to use crc32c-generic driver
for
>  some Zhaoxin CPUs to get performance gain, So remove these
Zhaoxin
>  CPUs support from crc32c-intel.
> 
>  Signed-off-by: Tony W Wang-oc 
> >>>
> >>> Does this mean that the performance of the crc32c instruction
>>on
those CPUs is
> >>> actually slower than a regular C implementation?  That's very
weird.
> >>>
> >>
> >> From the lmbench3 Create and Delete file test on those chips, I
think yes.
> >>
> > 
> > Did you try measuring the performance of the hashing itself, and
not some
> > higher-level filesystem operations?
> > 
> 
> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>the
same
> input value the generic C implementation takes fewer time than the
> crc32c instruction implementation.
> 

And that is really "working as intended"?
>>>
>>>These CPU's crc32c instruction is not working as intended.
>>>
>>>  Why do these CPUs even
declare that
they support the crc32c instruction, when it is so slow?

>>>
>>>The presence of crc32c and some other instructions supports are
>>>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>>except the crc32c instruction.
>>>
Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>AVX,
etc.) that
these CPUs similarly declare support for but they are uselessly
>slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>

- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
>
>This way makes these CPUs do not support all instruction sets
>enumerated
>by CPUID.01:ECX[SSE4.2].
>While only crc32c instruction is slow, just expect the crc32c-intel
>driver do not
>match these CPUs.
>
>Sincerely
>Tonyw

Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We 
*do not* bury this information in drivers; it becomes a recipe for the same 
problems over and over.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread tonywwang-oc
On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>> wrote:
>>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
 
 On 15/12/2020 04:41, Eric Biggers wrote:
 > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
 >> On 12/12/2020 01:43, Eric Biggers wrote:
 >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
  The driver crc32c-intel match CPUs supporting
>>>X86_FEATURE_XMM4_2.
  On platforms with Zhaoxin CPUs supporting this X86 feature,
>>When
  crc32c-intel and crc32c-generic are both registered, system
>>will
  use crc32c-intel because its .cra_priority is greater than
  crc32c-generic. This case expect to use crc32c-generic driver
>>>for
  some Zhaoxin CPUs to get performance gain, So remove these
>>>Zhaoxin
  CPUs support from crc32c-intel.
 
  Signed-off-by: Tony W Wang-oc 
 >>>
 >>> Does this mean that the performance of the crc32c instruction
>on
>>>those CPUs is
 >>> actually slower than a regular C implementation?  That's very
>>>weird.
 >>>
 >>
 >> From the lmbench3 Create and Delete file test on those chips, I
>>>think yes.
 >>
 > 
 > Did you try measuring the performance of the hashing itself, and
>>>not some
 > higher-level filesystem operations?
 > 
 
 Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>>same
 input value the generic C implementation takes fewer time than the
 crc32c instruction implementation.
 
>>>
>>>And that is really "working as intended"?
>>
>>These CPU's crc32c instruction is not working as intended.
>>
>>  Why do these CPUs even
>>>declare that
>>>they support the crc32c instruction, when it is so slow?
>>>
>>
>>The presence of crc32c and some other instructions supports are
>>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>except the crc32c instruction.
>>
>>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>>etc.) that
>>>these CPUs similarly declare support for but they are uselessly slow?
>>
>>No.
>>
>>Sincerely
>>Tonyw
>>
>>>
>>>- Eric
>
>Then the right thing to do is to disable the CPUID bit in the
>vendor-specific startup code.

This way makes these CPUs do not support all instruction sets enumerated
by CPUID.01:ECX[SSE4.2].
While only crc32c instruction is slow, just expect the crc32c-intel driver do 
not
match these CPUs.

Sincerely
Tonyw



Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
> wrote:
>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>> 
>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>  The driver crc32c-intel match CPUs supporting
>>X86_FEATURE_XMM4_2.
>>>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>When
>>>  crc32c-intel and crc32c-generic are both registered, system
>will
>>>  use crc32c-intel because its .cra_priority is greater than
>>>  crc32c-generic. This case expect to use crc32c-generic driver
>>for
>>>  some Zhaoxin CPUs to get performance gain, So remove these
>>Zhaoxin
>>>  CPUs support from crc32c-intel.
>>> 
>>>  Signed-off-by: Tony W Wang-oc 
>>> >>>
>>> >>> Does this mean that the performance of the crc32c instruction on
>>those CPUs is
>>> >>> actually slower than a regular C implementation?  That's very
>>weird.
>>> >>>
>>> >>
>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>think yes.
>>> >>
>>> > 
>>> > Did you try measuring the performance of the hashing itself, and
>>not some
>>> > higher-level filesystem operations?
>>> > 
>>> 
>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>same
>>> input value the generic C implementation takes fewer time than the
>>> crc32c instruction implementation.
>>> 
>>
>>And that is really "working as intended"?
>
>These CPU's crc32c instruction is not working as intended.
>
>  Why do these CPUs even
>>declare that
>>they support the crc32c instruction, when it is so slow?
>>
>
>The presence of crc32c and some other instructions supports are
>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>except the crc32c instruction.
>
>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>etc.) that
>>these CPUs similarly declare support for but they are uselessly slow?
>
>No.
>
>Sincerely
>Tonyw
>
>>
>>- Eric

Then the right thing to do is to disable the CPUID bit in the vendor-specific 
startup code.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-20 Thread tonywwang-oc
On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers  
wrote:
>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>> 
>> On 15/12/2020 04:41, Eric Biggers wrote:
>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>  The driver crc32c-intel match CPUs supporting
>X86_FEATURE_XMM4_2.
>>  On platforms with Zhaoxin CPUs supporting this X86 feature, When
>>  crc32c-intel and crc32c-generic are both registered, system will
>>  use crc32c-intel because its .cra_priority is greater than
>>  crc32c-generic. This case expect to use crc32c-generic driver
>for
>>  some Zhaoxin CPUs to get performance gain, So remove these
>Zhaoxin
>>  CPUs support from crc32c-intel.
>> 
>>  Signed-off-by: Tony W Wang-oc 
>> >>>
>> >>> Does this mean that the performance of the crc32c instruction on
>those CPUs is
>> >>> actually slower than a regular C implementation?  That's very
>weird.
>> >>>
>> >>
>> >> From the lmbench3 Create and Delete file test on those chips, I
>think yes.
>> >>
>> > 
>> > Did you try measuring the performance of the hashing itself, and
>not some
>> > higher-level filesystem operations?
>> > 
>> 
>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>same
>> input value the generic C implementation takes fewer time than the
>> crc32c instruction implementation.
>> 
>
>And that is really "working as intended"?

These CPU's crc32c instruction is not working as intended.

  Why do these CPUs even
>declare that
>they support the crc32c instruction, when it is so slow?
>

The presence of crc32c and some other instructions supports are enumerated by 
CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok except the crc32c 
instruction.

>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>etc.) that
>these CPUs similarly declare support for but they are uselessly slow?

No.

Sincerely
Tonyw

>
>- Eric




Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-15 Thread Eric Biggers
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
> 
> On 15/12/2020 04:41, Eric Biggers wrote:
> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> >> On 12/12/2020 01:43, Eric Biggers wrote:
> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>  The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>  On platforms with Zhaoxin CPUs supporting this X86 feature, When
>  crc32c-intel and crc32c-generic are both registered, system will
>  use crc32c-intel because its .cra_priority is greater than
>  crc32c-generic. This case expect to use crc32c-generic driver for
>  some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>  CPUs support from crc32c-intel.
> 
>  Signed-off-by: Tony W Wang-oc 
> >>>
> >>> Does this mean that the performance of the crc32c instruction on those 
> >>> CPUs is
> >>> actually slower than a regular C implementation?  That's very weird.
> >>>
> >>
> >> From the lmbench3 Create and Delete file test on those chips, I think yes.
> >>
> > 
> > Did you try measuring the performance of the hashing itself, and not some
> > higher-level filesystem operations?
> > 
> 
> Yes. Was testing on these Zhaoxin CPUs, the result is that with the same
> input value the generic C implementation takes fewer time than the
> crc32c instruction implementation.
> 

And that is really "working as intended"?  Why do these CPUs even declare that
they support the crc32c instruction, when it is so slow?

Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, etc.) that
these CPUs similarly declare support for but they are uselessly slow?

- Eric


[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-15 Thread Tony W Wang-oc
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic.

When doing lmbench3 Create and Delete file test on partitions with
ext4 enabling metadata checksum, found using crc32c-generic driver
could get about 20% performance gain than using the driver crc32c-intel
on some Zhaoxin CPUs.

This case expect to use crc32c-generic driver for these Zhaoxin CPUs
to get performance gain, so remove these Zhaoxin CPUs support from
crc32c-intel.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/crypto/crc32c-intel_glue.c | 21 +++--
 1 file changed, 19 insertions(+), 2 deletions(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..5171091 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,14 +215,31 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
-   X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
+   /*
+* Negative entries; exclude these chips from using this driver.
+* They match the positive rule below, but their CRC32 instruction
+* implementation is so slow, it doesn't merit use.
+*/
+   X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, false),
+   /*
+* Positive entry; SSE-4.2 instructions include special purpose CRC32
+* instructions.
+*/
+   X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
{}
 };
 MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-   if (!x86_match_cpu(crc32c_cpu_id))
+   const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+   if (!m || !m->driver_data)
return -ENODEV;
 #ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
-- 
2.7.4



Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-15 Thread Tony W Wang-oc


On 15/12/2020 16:58, Peter Zijlstra wrote:
> On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote:
> 
> Didn't I mention something about a comment?
> 
Really sorry for this.

>>  static const struct x86_cpu_id crc32c_cpu_id[] = {
>> +X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
>> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
>> X86_FEATURE_XMM4_2, 1),
>> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
>> X86_FEATURE_XMM4_2, 1),
>> +X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
>> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
>> X86_FEATURE_XMM4_2, 1),
>> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
>> X86_FEATURE_XMM4_2, 1),
>>  X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
>>  {}
> 
> Also, the above is weird in that is has the negative entries marked
> positive, and 1/NULL are inconsistent.
> 
> Something like so then?
> That's better!

> ---
> 
> diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
> b/arch/x86/crypto/crc32c-intel_glue.c
> index feccb5254c7e..f6e6669a5102 100644
> --- a/arch/x86/crypto/crc32c-intel_glue.c
> +++ b/arch/x86/crypto/crc32c-intel_glue.c
> @@ -215,14 +215,31 @@ static struct shash_alg alg = {
>  };
>  
>  static const struct x86_cpu_id crc32c_cpu_id[] = {
> - X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
> + /*
> +  * Negative entries; exclude these chips from using this driver.
> +  * They match the positive rule below, but their CRC32 instruction
> +  * implementation is so slow, it doesn't merrit use.
Will fix the typo merrit -> merit and resend the patch.

Sincerely
Tony

> +  */
> + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
> X86_FEATURE_XMM4_2, false),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
> X86_FEATURE_XMM4_2, false),
> + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
> X86_FEATURE_XMM4_2, false),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
> X86_FEATURE_XMM4_2, false),
> + /*
> +  * Positive entry; SSE-4.2 instructions include special purpose CRC32
> +  * instructions.
> +  */
> + X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
>   {}
>  };
>  MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>  
>  static int __init crc32c_intel_mod_init(void)
>  {
> - if (!x86_match_cpu(crc32c_cpu_id))
> + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
> +
> + if (!m || !m->driver_data)
>   return -ENODEV;
>  #ifdef CONFIG_X86_64
>   if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
> .
> 


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-15 Thread Peter Zijlstra
On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote:

Didn't I mention something about a comment?

>  static const struct x86_cpu_id crc32c_cpu_id[] = {
> + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
> X86_FEATURE_XMM4_2, 1),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
> X86_FEATURE_XMM4_2, 1),
> + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
> X86_FEATURE_XMM4_2, 1),
> + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
> X86_FEATURE_XMM4_2, 1),
>   X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
>   {}

Also, the above is weird in that is has the negative entries marked
positive, and 1/NULL are inconsistent.

Something like so then?

---

diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
b/arch/x86/crypto/crc32c-intel_glue.c
index feccb5254c7e..f6e6669a5102 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,14 +215,31 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
-   X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
+   /*
+* Negative entries; exclude these chips from using this driver.
+* They match the positive rule below, but their CRC32 instruction
+* implementation is so slow, it doesn't merrit use.
+*/
+   X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, false),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, false),
+   /*
+* Positive entry; SSE-4.2 instructions include special purpose CRC32
+* instructions.
+*/
+   X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true),
{}
 };
 MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-   if (!x86_match_cpu(crc32c_cpu_id))
+   const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+   if (!m || !m->driver_data)
return -ENODEV;
 #ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-14 Thread Tony W Wang-oc


On 15/12/2020 04:41, Eric Biggers wrote:
> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>> On 12/12/2020 01:43, Eric Biggers wrote:
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
 The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
 On platforms with Zhaoxin CPUs supporting this X86 feature, When
 crc32c-intel and crc32c-generic are both registered, system will
 use crc32c-intel because its .cra_priority is greater than
 crc32c-generic. This case expect to use crc32c-generic driver for
 some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
 CPUs support from crc32c-intel.

 Signed-off-by: Tony W Wang-oc 
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs 
>>> is
>>> actually slower than a regular C implementation?  That's very weird.
>>>
>>
>> From the lmbench3 Create and Delete file test on those chips, I think yes.
>>
> 
> Did you try measuring the performance of the hashing itself, and not some
> higher-level filesystem operations?
> 

Yes. Was testing on these Zhaoxin CPUs, the result is that with the same
input value the generic C implementation takes fewer time than the
crc32c instruction implementation.

Sincerely
Tony


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-14 Thread Eric Biggers
On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> On 12/12/2020 01:43, Eric Biggers wrote:
> > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> >> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> >> crc32c-intel and crc32c-generic are both registered, system will
> >> use crc32c-intel because its .cra_priority is greater than
> >> crc32c-generic. This case expect to use crc32c-generic driver for
> >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> >> CPUs support from crc32c-intel.
> >>
> >> Signed-off-by: Tony W Wang-oc 
> > 
> > Does this mean that the performance of the crc32c instruction on those CPUs 
> > is
> > actually slower than a regular C implementation?  That's very weird.
> > 
> 
> From the lmbench3 Create and Delete file test on those chips, I think yes.
> 

Did you try measuring the performance of the hashing itself, and not some
higher-level filesystem operations?

- Eric


[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-13 Thread Tony W Wang-oc
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, when
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic.

When doing lmbench3 Create and Delete file test on partitions with
ext4 enabling metadata checksum, found using crc32c-generic driver
could get about 20% performance gain than using the driver crc32c-intel
on some Zhaoxin CPUs.

This case expect to use crc32c-generic driver for these Zhaoxin CPUs
to get performance gain, so remove these Zhaoxin CPUs support from
crc32c-intel.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/crypto/crc32c-intel_glue.c | 10 +-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..2cbbdde 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -215,6 +215,12 @@ static struct shash_alg alg = {
 };
 
 static const struct x86_cpu_id crc32c_cpu_id[] = {
+   X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, 
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, 
X86_FEATURE_XMM4_2, 1),
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
 };
@@ -222,7 +228,9 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
-   if (!x86_match_cpu(crc32c_cpu_id))
+   const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+
+   if (!m || m->driver_data)
return -ENODEV;
 #ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
-- 
2.7.4



Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-13 Thread Tony W Wang-oc



On 12/12/2020 18:54, Ard Biesheuvel wrote:
> On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel  wrote:
>>
>> On Fri, 11 Dec 2020 at 20:07, Eric Biggers  wrote:
>>>
>>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
 The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
 On platforms with Zhaoxin CPUs supporting this X86 feature, When
 crc32c-intel and crc32c-generic are both registered, system will
 use crc32c-intel because its .cra_priority is greater than
 crc32c-generic. This case expect to use crc32c-generic driver for
 some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
 CPUs support from crc32c-intel.

 Signed-off-by: Tony W Wang-oc 
>>>
>>> Does this mean that the performance of the crc32c instruction on those CPUs 
>>> is
>>> actually slower than a regular C implementation?  That's very weird.
>>>
>>
>> This driver does not use CRC instructions, but carryless
>> multiplication and aggregation. So I suppose the pclmulqdq instruction
>> triggers some pathological performance limitation here.
>>
> 
> Just noticed it uses both crc instructions and pclmulqdq instructions.
> Sorry for the noise.
> 
>> That means the crct10dif driver probably needs the same treatment.
> 
> Tony, can you confirm that the problem is in the CRC instructions and
> not in the PCLMULQDQ code path that supersedes it when available?

CRC instructions.

sincerely
Tony


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-13 Thread Tony W Wang-oc
On 12/12/2020 01:43, Eric Biggers wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver for
>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
> 
> Does this mean that the performance of the crc32c instruction on those CPUs is
> actually slower than a regular C implementation?  That's very weird.
> 

>From the lmbench3 Create and Delete file test on those chips, I think yes.

sincerely
Tony


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-12 Thread Ard Biesheuvel
On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel  wrote:
>
> On Fri, 11 Dec 2020 at 20:07, Eric Biggers  wrote:
> >
> > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> > > On platforms with Zhaoxin CPUs supporting this X86 feature, When
> > > crc32c-intel and crc32c-generic are both registered, system will
> > > use crc32c-intel because its .cra_priority is greater than
> > > crc32c-generic. This case expect to use crc32c-generic driver for
> > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> > > CPUs support from crc32c-intel.
> > >
> > > Signed-off-by: Tony W Wang-oc 
> >
> > Does this mean that the performance of the crc32c instruction on those CPUs 
> > is
> > actually slower than a regular C implementation?  That's very weird.
> >
>
> This driver does not use CRC instructions, but carryless
> multiplication and aggregation. So I suppose the pclmulqdq instruction
> triggers some pathological performance limitation here.
>

Just noticed it uses both crc instructions and pclmulqdq instructions.
Sorry for the noise.

> That means the crct10dif driver probably needs the same treatment.

Tony, can you confirm that the problem is in the CRC instructions and
not in the PCLMULQDQ code path that supersedes it when available?


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-12 Thread Ard Biesheuvel
On Fri, 11 Dec 2020 at 20:07, Eric Biggers  wrote:
>
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> > On platforms with Zhaoxin CPUs supporting this X86 feature, When
> > crc32c-intel and crc32c-generic are both registered, system will
> > use crc32c-intel because its .cra_priority is greater than
> > crc32c-generic. This case expect to use crc32c-generic driver for
> > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> > CPUs support from crc32c-intel.
> >
> > Signed-off-by: Tony W Wang-oc 
>
> Does this mean that the performance of the crc32c instruction on those CPUs is
> actually slower than a regular C implementation?  That's very weird.
>

This driver does not use CRC instructions, but carryless
multiplication and aggregation. So I suppose the pclmulqdq instruction
triggers some pathological performance limitation here.

That means the crct10dif driver probably needs the same treatment.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-11 Thread Tony W Wang-oc
On 11/12/2020 21:00, Peter Zijlstra wrote:
> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
>> On platforms with Zhaoxin CPUs supporting this X86 feature, When
>> crc32c-intel and crc32c-generic are both registered, system will
>> use crc32c-intel because its .cra_priority is greater than
>> crc32c-generic. This case expect to use crc32c-generic driver for
>> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
>> CPUs support from crc32c-intel.
>>
>> Signed-off-by: Tony W Wang-oc 
>> ---
>>  arch/x86/crypto/crc32c-intel_glue.c | 8 
>>  1 file changed, 8 insertions(+)
>>
>> diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
>> b/arch/x86/crypto/crc32c-intel_glue.c
>> index feccb52..6dafdae 100644
>> --- a/arch/x86/crypto/crc32c-intel_glue.c
>> +++ b/arch/x86/crypto/crc32c-intel_glue.c
>> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>>  
>>  static int __init crc32c_intel_mod_init(void)
>>  {
>> +struct cpuinfo_x86 *c = _cpu_data;
>> +
>>  if (!x86_match_cpu(crc32c_cpu_id))
>>  return -ENODEV;
>> +
>> +if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == 
>> X86_VENDOR_CENTAUR) {
>> +if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
>> +return -ENODEV;
>> +}
> 
> Egads, why can't you use that x86_match_cpu() above, and also this
> really wants a comment on why you're excluding these chips. 

When doing lmbench3 Create and Delete file test on partitions with ext4
enabling metadata checksum, found using crc32c-generic driver could get
about 20% performance gain than using the driver crc32c-intel on these
chips.

Also, since
> (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed?
> 
> That is; write it like:
> 
>   m = x86_match_cpu(crc32_cpu_id);
>   if (!m || !m->data)
>   return -ENODEV;
> 
> That way you can have positive and negative matches in the array
> (obviously the existing FEATURE test would need data=1 and be last).
> .
> 

Lot thanks for you suggestion, will list these chips in crc32c_cpu_id
and use x86_match_cpu:

 static const struct x86_cpu_id crc32c_cpu_id[] = {
+   X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b,
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b,
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b,
X86_FEATURE_XMM4_2, 1),
+   X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b,
X86_FEATURE_XMM4_2, 1),
X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL),
{}
 };
@@ -228,8 +234,10 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);

 static int __init crc32c_intel_mod_init(void)
 {
-   if (!x86_match_cpu(crc32c_cpu_id))
+   const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id);
+   if (!m || m->driver_data)
return -ENODEV;


sincerely
TonyWWangoc


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-11 Thread Eric Biggers
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic. This case expect to use crc32c-generic driver for
> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> CPUs support from crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc 

Does this mean that the performance of the crc32c instruction on those CPUs is
actually slower than a regular C implementation?  That's very weird.

- Eric


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-11 Thread Peter Zijlstra
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
> On platforms with Zhaoxin CPUs supporting this X86 feature, When
> crc32c-intel and crc32c-generic are both registered, system will
> use crc32c-intel because its .cra_priority is greater than
> crc32c-generic. This case expect to use crc32c-generic driver for
> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
> CPUs support from crc32c-intel.
> 
> Signed-off-by: Tony W Wang-oc 
> ---
>  arch/x86/crypto/crc32c-intel_glue.c | 8 
>  1 file changed, 8 insertions(+)
> 
> diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
> b/arch/x86/crypto/crc32c-intel_glue.c
> index feccb52..6dafdae 100644
> --- a/arch/x86/crypto/crc32c-intel_glue.c
> +++ b/arch/x86/crypto/crc32c-intel_glue.c
> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
>  
>  static int __init crc32c_intel_mod_init(void)
>  {
> + struct cpuinfo_x86 *c = _cpu_data;
> +
>   if (!x86_match_cpu(crc32c_cpu_id))
>   return -ENODEV;
> +
> + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == 
> X86_VENDOR_CENTAUR) {
> + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
> + return -ENODEV;
> + }

Egads, why can't you use that x86_match_cpu() above, and also this
really wants a comment on why you're excluding these chips. Also, since
(IIRC) ZHAOXIN is basically AND, shouldn't they also be listed?

That is; write it like:

m = x86_match_cpu(crc32_cpu_id);
if (!m || !m->data)
return -ENODEV;

That way you can have positive and negative matches in the array
(obviously the existing FEATURE test would need data=1 and be last).


[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-11 Thread Tony W Wang-oc
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2.
On platforms with Zhaoxin CPUs supporting this X86 feature, When
crc32c-intel and crc32c-generic are both registered, system will
use crc32c-intel because its .cra_priority is greater than
crc32c-generic. This case expect to use crc32c-generic driver for
some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin
CPUs support from crc32c-intel.

Signed-off-by: Tony W Wang-oc 
---
 arch/x86/crypto/crc32c-intel_glue.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/x86/crypto/crc32c-intel_glue.c 
b/arch/x86/crypto/crc32c-intel_glue.c
index feccb52..6dafdae 100644
--- a/arch/x86/crypto/crc32c-intel_glue.c
+++ b/arch/x86/crypto/crc32c-intel_glue.c
@@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id);
 
 static int __init crc32c_intel_mod_init(void)
 {
+   struct cpuinfo_x86 *c = _cpu_data;
+
if (!x86_match_cpu(crc32c_cpu_id))
return -ENODEV;
+
+   if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == 
X86_VENDOR_CENTAUR) {
+   if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b))
+   return -ENODEV;
+   }
+
 #ifdef CONFIG_X86_64
if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
alg.update = crc32c_pcl_intel_update;
-- 
2.7.4