Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 03/01/2021 05:12, Herbert Xu wrote: > On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, when >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. >> >> When doing lmbench3 Create and Delete file test on partitions with >> ext4 enabling metadata checksum, found using crc32c-generic driver >> could get about 20% performance gain than using the driver crc32c-intel >> on some Zhaoxin CPUs. >> >> This case expect to use crc32c-generic driver for these Zhaoxin CPUs >> to get performance gain, so remove these Zhaoxin CPUs support from >> crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc >> --- >> arch/x86/crypto/crc32c-intel_glue.c | 21 +++-- >> 1 file changed, 19 insertions(+), 2 deletions(-) > > This does not seem to address the latest comment from hpa. > Yes, please ignore this patch. Have send new patch set per Hpa's suggestion. Sincerely Tonyw > Thanks, >
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 22/12/2020 12:54, h...@zytor.com wrote: > On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote: >> On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote: >>> On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote: On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers wrote: > On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >> >> On 15/12/2020 04:41, Eric Biggers wrote: >>> On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: On 12/12/2020 01:43, Eric Biggers wrote: > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc >> wrote: >> The driver crc32c-intel match CPUs supporting > X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver > for >> some Zhaoxin CPUs to get performance gain, So remove these > Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc > > Does this mean that the performance of the crc32c instruction >>> on > those CPUs is > actually slower than a regular C implementation? That's very > weird. > From the lmbench3 Create and Delete file test on those chips, I > think yes. >>> >>> Did you try measuring the performance of the hashing itself, and > not some >>> higher-level filesystem operations? >>> >> >> Yes. Was testing on these Zhaoxin CPUs, the result is that with >> the > same >> input value the generic C implementation takes fewer time than the >> crc32c instruction implementation. >> > > And that is really "working as intended"? These CPU's crc32c instruction is not working as intended. Why do these CPUs even > declare that > they support the crc32c instruction, when it is so slow? > The presence of crc32c and some other instructions supports are enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok except the crc32c instruction. > Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, >> AVX, > etc.) that > these CPUs similarly declare support for but they are uselessly >> slow? No. Sincerely Tonyw > > - Eric >>> >>> Then the right thing to do is to disable the CPUID bit in the >>> vendor-specific startup code. >> >> This way makes these CPUs do not support all instruction sets >> enumerated >> by CPUID.01:ECX[SSE4.2]. >> While only crc32c instruction is slow, just expect the crc32c-intel >> driver do not >> match these CPUs. >> >> Sincerely >> Tonyw > > Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We > *do not* bury this information in drivers; it becomes a recipe for the same > problems over and over. > Thanks for your suggestion. Have send new patch set. Sincerely Tonyw
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Tue, Dec 15, 2020 at 06:28:11PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, when > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. > > When doing lmbench3 Create and Delete file test on partitions with > ext4 enabling metadata checksum, found using crc32c-generic driver > could get about 20% performance gain than using the driver crc32c-intel > on some Zhaoxin CPUs. > > This case expect to use crc32c-generic driver for these Zhaoxin CPUs > to get performance gain, so remove these Zhaoxin CPUs support from > crc32c-intel. > > Signed-off-by: Tony W Wang-oc > --- > arch/x86/crypto/crc32c-intel_glue.c | 21 +++-- > 1 file changed, 19 insertions(+), 2 deletions(-) This does not seem to address the latest comment from hpa. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote: >On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote: >>On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote: >>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >>> wrote: On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: > > On 15/12/2020 04:41, Eric Biggers wrote: > > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: > >> On 12/12/2020 01:43, Eric Biggers wrote: > >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc >wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, >>>When > crc32c-intel and crc32c-generic are both registered, system >>>will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc > >>> > >>> Does this mean that the performance of the crc32c instruction >>on those CPUs is > >>> actually slower than a regular C implementation? That's very weird. > >>> > >> > >> From the lmbench3 Create and Delete file test on those chips, I think yes. > >> > > > > Did you try measuring the performance of the hashing itself, and not some > > higher-level filesystem operations? > > > > Yes. Was testing on these Zhaoxin CPUs, the result is that with >the same > input value the generic C implementation takes fewer time than the > crc32c instruction implementation. > And that is really "working as intended"? >>> >>>These CPU's crc32c instruction is not working as intended. >>> >>> Why do these CPUs even declare that they support the crc32c instruction, when it is so slow? >>> >>>The presence of crc32c and some other instructions supports are >>>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>>except the crc32c instruction. >>> Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, >AVX, etc.) that these CPUs similarly declare support for but they are uselessly >slow? >>> >>>No. >>> >>>Sincerely >>>Tonyw >>> - Eric >> >>Then the right thing to do is to disable the CPUID bit in the >>vendor-specific startup code. > >This way makes these CPUs do not support all instruction sets >enumerated >by CPUID.01:ECX[SSE4.2]. >While only crc32c instruction is slow, just expect the crc32c-intel >driver do not >match these CPUs. > >Sincerely >Tonyw Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We *do not* bury this information in drivers; it becomes a recipe for the same problems over and over. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote: >On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote: >>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers >> wrote: >>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: On 15/12/2020 04:41, Eric Biggers wrote: > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >> On 12/12/2020 01:43, Eric Biggers wrote: >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: The driver crc32c-intel match CPUs supporting >>>X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, >>When crc32c-intel and crc32c-generic are both registered, system >>will use crc32c-intel because its .cra_priority is greater than crc32c-generic. This case expect to use crc32c-generic driver >>>for some Zhaoxin CPUs to get performance gain, So remove these >>>Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc >>> >>> Does this mean that the performance of the crc32c instruction >on >>>those CPUs is >>> actually slower than a regular C implementation? That's very >>>weird. >>> >> >> From the lmbench3 Create and Delete file test on those chips, I >>>think yes. >> > > Did you try measuring the performance of the hashing itself, and >>>not some > higher-level filesystem operations? > Yes. Was testing on these Zhaoxin CPUs, the result is that with the >>>same input value the generic C implementation takes fewer time than the crc32c instruction implementation. >>> >>>And that is really "working as intended"? >> >>These CPU's crc32c instruction is not working as intended. >> >> Why do these CPUs even >>>declare that >>>they support the crc32c instruction, when it is so slow? >>> >> >>The presence of crc32c and some other instructions supports are >>enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >>except the crc32c instruction. >> >>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >>>etc.) that >>>these CPUs similarly declare support for but they are uselessly slow? >> >>No. >> >>Sincerely >>Tonyw >> >>> >>>- Eric > >Then the right thing to do is to disable the CPUID bit in the >vendor-specific startup code. This way makes these CPUs do not support all instruction sets enumerated by CPUID.01:ECX[SSE4.2]. While only crc32c instruction is slow, just expect the crc32c-intel driver do not match these CPUs. Sincerely Tonyw
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote: >On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers > wrote: >>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >>> >>> On 15/12/2020 04:41, Eric Biggers wrote: >>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >>> >> On 12/12/2020 01:43, Eric Biggers wrote: >>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >>> The driver crc32c-intel match CPUs supporting >>X86_FEATURE_XMM4_2. >>> On platforms with Zhaoxin CPUs supporting this X86 feature, >When >>> crc32c-intel and crc32c-generic are both registered, system >will >>> use crc32c-intel because its .cra_priority is greater than >>> crc32c-generic. This case expect to use crc32c-generic driver >>for >>> some Zhaoxin CPUs to get performance gain, So remove these >>Zhaoxin >>> CPUs support from crc32c-intel. >>> >>> Signed-off-by: Tony W Wang-oc >>> >>> >>> >>> Does this mean that the performance of the crc32c instruction on >>those CPUs is >>> >>> actually slower than a regular C implementation? That's very >>weird. >>> >>> >>> >> >>> >> From the lmbench3 Create and Delete file test on those chips, I >>think yes. >>> >> >>> > >>> > Did you try measuring the performance of the hashing itself, and >>not some >>> > higher-level filesystem operations? >>> > >>> >>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >>same >>> input value the generic C implementation takes fewer time than the >>> crc32c instruction implementation. >>> >> >>And that is really "working as intended"? > >These CPU's crc32c instruction is not working as intended. > > Why do these CPUs even >>declare that >>they support the crc32c instruction, when it is so slow? >> > >The presence of crc32c and some other instructions supports are >enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok >except the crc32c instruction. > >>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >>etc.) that >>these CPUs similarly declare support for but they are uselessly slow? > >No. > >Sincerely >Tonyw > >> >>- Eric Then the right thing to do is to disable the CPUID bit in the vendor-specific startup code. -- Sent from my Android device with K-9 Mail. Please excuse my brevity.
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers wrote: >On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: >> >> On 15/12/2020 04:41, Eric Biggers wrote: >> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >> >> On 12/12/2020 01:43, Eric Biggers wrote: >> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting >X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver >for >> some Zhaoxin CPUs to get performance gain, So remove these >Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc >> >>> >> >>> Does this mean that the performance of the crc32c instruction on >those CPUs is >> >>> actually slower than a regular C implementation? That's very >weird. >> >>> >> >> >> >> From the lmbench3 Create and Delete file test on those chips, I >think yes. >> >> >> > >> > Did you try measuring the performance of the hashing itself, and >not some >> > higher-level filesystem operations? >> > >> >> Yes. Was testing on these Zhaoxin CPUs, the result is that with the >same >> input value the generic C implementation takes fewer time than the >> crc32c instruction implementation. >> > >And that is really "working as intended"? These CPU's crc32c instruction is not working as intended. Why do these CPUs even >declare that >they support the crc32c instruction, when it is so slow? > The presence of crc32c and some other instructions supports are enumerated by CPUID.01:ECX[SSE4.2] = 1, other instructions are ok except the crc32c instruction. >Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, >etc.) that >these CPUs similarly declare support for but they are uselessly slow? No. Sincerely Tonyw > >- Eric
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote: > > On 15/12/2020 04:41, Eric Biggers wrote: > > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: > >> On 12/12/2020 01:43, Eric Biggers wrote: > >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, When > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc > >>> > >>> Does this mean that the performance of the crc32c instruction on those > >>> CPUs is > >>> actually slower than a regular C implementation? That's very weird. > >>> > >> > >> From the lmbench3 Create and Delete file test on those chips, I think yes. > >> > > > > Did you try measuring the performance of the hashing itself, and not some > > higher-level filesystem operations? > > > > Yes. Was testing on these Zhaoxin CPUs, the result is that with the same > input value the generic C implementation takes fewer time than the > crc32c instruction implementation. > And that is really "working as intended"? Why do these CPUs even declare that they support the crc32c instruction, when it is so slow? Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX, etc.) that these CPUs similarly declare support for but they are uselessly slow? - Eric
[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, when crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. When doing lmbench3 Create and Delete file test on partitions with ext4 enabling metadata checksum, found using crc32c-generic driver could get about 20% performance gain than using the driver crc32c-intel on some Zhaoxin CPUs. This case expect to use crc32c-generic driver for these Zhaoxin CPUs to get performance gain, so remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc --- arch/x86/crypto/crc32c-intel_glue.c | 21 +++-- 1 file changed, 19 insertions(+), 2 deletions(-) diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index feccb52..5171091 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -215,14 +215,31 @@ static struct shash_alg alg = { }; static const struct x86_cpu_id crc32c_cpu_id[] = { - X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), + /* +* Negative entries; exclude these chips from using this driver. +* They match the positive rule below, but their CRC32 instruction +* implementation is so slow, it doesn't merit use. +*/ + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false), + /* +* Positive entry; SSE-4.2 instructions include special purpose CRC32 +* instructions. +*/ + X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true), {} }; MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { - if (!x86_match_cpu(crc32c_cpu_id)) + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); + + if (!m || !m->driver_data) return -ENODEV; #ifdef CONFIG_X86_64 if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { -- 2.7.4
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 15/12/2020 16:58, Peter Zijlstra wrote: > On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote: > > Didn't I mention something about a comment? > Really sorry for this. >> static const struct x86_cpu_id crc32c_cpu_id[] = { >> +X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1), >> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, >> X86_FEATURE_XMM4_2, 1), >> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, >> X86_FEATURE_XMM4_2, 1), >> +X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1), >> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, >> X86_FEATURE_XMM4_2, 1), >> +X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, >> X86_FEATURE_XMM4_2, 1), >> X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), >> {} > > Also, the above is weird in that is has the negative entries marked > positive, and 1/NULL are inconsistent. > > Something like so then? > That's better! > --- > > diff --git a/arch/x86/crypto/crc32c-intel_glue.c > b/arch/x86/crypto/crc32c-intel_glue.c > index feccb5254c7e..f6e6669a5102 100644 > --- a/arch/x86/crypto/crc32c-intel_glue.c > +++ b/arch/x86/crypto/crc32c-intel_glue.c > @@ -215,14 +215,31 @@ static struct shash_alg alg = { > }; > > static const struct x86_cpu_id crc32c_cpu_id[] = { > - X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), > + /* > + * Negative entries; exclude these chips from using this driver. > + * They match the positive rule below, but their CRC32 instruction > + * implementation is so slow, it doesn't merrit use. Will fix the typo merrit -> merit and resend the patch. Sincerely Tony > + */ > + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, > X86_FEATURE_XMM4_2, false), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, > X86_FEATURE_XMM4_2, false), > + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, > X86_FEATURE_XMM4_2, false), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, > X86_FEATURE_XMM4_2, false), > + /* > + * Positive entry; SSE-4.2 instructions include special purpose CRC32 > + * instructions. > + */ > + X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true), > {} > }; > MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); > > static int __init crc32c_intel_mod_init(void) > { > - if (!x86_match_cpu(crc32c_cpu_id)) > + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); > + > + if (!m || !m->driver_data) > return -ENODEV; > #ifdef CONFIG_X86_64 > if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { > . >
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Mon, Dec 14, 2020 at 11:59:52AM +0800, Tony W Wang-oc wrote: Didn't I mention something about a comment? > static const struct x86_cpu_id crc32c_cpu_id[] = { > + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, > X86_FEATURE_XMM4_2, 1), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, > X86_FEATURE_XMM4_2, 1), > + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, > X86_FEATURE_XMM4_2, 1), > + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, > X86_FEATURE_XMM4_2, 1), > X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), > {} Also, the above is weird in that is has the negative entries marked positive, and 1/NULL are inconsistent. Something like so then? --- diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index feccb5254c7e..f6e6669a5102 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -215,14 +215,31 @@ static struct shash_alg alg = { }; static const struct x86_cpu_id crc32c_cpu_id[] = { - X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), + /* +* Negative entries; exclude these chips from using this driver. +* They match the positive rule below, but their CRC32 instruction +* implementation is so slow, it doesn't merrit use. +*/ + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, false), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, false), + /* +* Positive entry; SSE-4.2 instructions include special purpose CRC32 +* instructions. +*/ + X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, true), {} }; MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { - if (!x86_match_cpu(crc32c_cpu_id)) + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); + + if (!m || !m->driver_data) return -ENODEV; #ifdef CONFIG_X86_64 if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) {
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 15/12/2020 04:41, Eric Biggers wrote: > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: >> On 12/12/2020 01:43, Eric Biggers wrote: >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, When crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. This case expect to use crc32c-generic driver for some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc >>> >>> Does this mean that the performance of the crc32c instruction on those CPUs >>> is >>> actually slower than a regular C implementation? That's very weird. >>> >> >> From the lmbench3 Create and Delete file test on those chips, I think yes. >> > > Did you try measuring the performance of the hashing itself, and not some > higher-level filesystem operations? > Yes. Was testing on these Zhaoxin CPUs, the result is that with the same input value the generic C implementation takes fewer time than the crc32c instruction implementation. Sincerely Tony
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote: > On 12/12/2020 01:43, Eric Biggers wrote: > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > >> On platforms with Zhaoxin CPUs supporting this X86 feature, When > >> crc32c-intel and crc32c-generic are both registered, system will > >> use crc32c-intel because its .cra_priority is greater than > >> crc32c-generic. This case expect to use crc32c-generic driver for > >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > >> CPUs support from crc32c-intel. > >> > >> Signed-off-by: Tony W Wang-oc > > > > Does this mean that the performance of the crc32c instruction on those CPUs > > is > > actually slower than a regular C implementation? That's very weird. > > > > From the lmbench3 Create and Delete file test on those chips, I think yes. > Did you try measuring the performance of the hashing itself, and not some higher-level filesystem operations? - Eric
[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, when crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. When doing lmbench3 Create and Delete file test on partitions with ext4 enabling metadata checksum, found using crc32c-generic driver could get about 20% performance gain than using the driver crc32c-intel on some Zhaoxin CPUs. This case expect to use crc32c-generic driver for these Zhaoxin CPUs to get performance gain, so remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc --- arch/x86/crypto/crc32c-intel_glue.c | 10 +- 1 file changed, 9 insertions(+), 1 deletion(-) diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index feccb52..2cbbdde 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -215,6 +215,12 @@ static struct shash_alg alg = { }; static const struct x86_cpu_id crc32c_cpu_id[] = { + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), {} }; @@ -222,7 +228,9 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { - if (!x86_match_cpu(crc32c_cpu_id)) + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); + + if (!m || m->driver_data) return -ENODEV; #ifdef CONFIG_X86_64 if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { -- 2.7.4
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 12/12/2020 18:54, Ard Biesheuvel wrote: > On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel wrote: >> >> On Fri, 11 Dec 2020 at 20:07, Eric Biggers wrote: >>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, When crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. This case expect to use crc32c-generic driver for some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc >>> >>> Does this mean that the performance of the crc32c instruction on those CPUs >>> is >>> actually slower than a regular C implementation? That's very weird. >>> >> >> This driver does not use CRC instructions, but carryless >> multiplication and aggregation. So I suppose the pclmulqdq instruction >> triggers some pathological performance limitation here. >> > > Just noticed it uses both crc instructions and pclmulqdq instructions. > Sorry for the noise. > >> That means the crct10dif driver probably needs the same treatment. > > Tony, can you confirm that the problem is in the CRC instructions and > not in the PCLMULQDQ code path that supersedes it when available? CRC instructions. sincerely Tony
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 12/12/2020 01:43, Eric Biggers wrote: > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver for >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc > > Does this mean that the performance of the crc32c instruction on those CPUs is > actually slower than a regular C implementation? That's very weird. > >From the lmbench3 Create and Delete file test on those chips, I think yes. sincerely Tony
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Sat, 12 Dec 2020 at 10:36, Ard Biesheuvel wrote: > > On Fri, 11 Dec 2020 at 20:07, Eric Biggers wrote: > > > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > > > On platforms with Zhaoxin CPUs supporting this X86 feature, When > > > crc32c-intel and crc32c-generic are both registered, system will > > > use crc32c-intel because its .cra_priority is greater than > > > crc32c-generic. This case expect to use crc32c-generic driver for > > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > > > CPUs support from crc32c-intel. > > > > > > Signed-off-by: Tony W Wang-oc > > > > Does this mean that the performance of the crc32c instruction on those CPUs > > is > > actually slower than a regular C implementation? That's very weird. > > > > This driver does not use CRC instructions, but carryless > multiplication and aggregation. So I suppose the pclmulqdq instruction > triggers some pathological performance limitation here. > Just noticed it uses both crc instructions and pclmulqdq instructions. Sorry for the noise. > That means the crct10dif driver probably needs the same treatment. Tony, can you confirm that the problem is in the CRC instructions and not in the PCLMULQDQ code path that supersedes it when available?
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Fri, 11 Dec 2020 at 20:07, Eric Biggers wrote: > > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > > On platforms with Zhaoxin CPUs supporting this X86 feature, When > > crc32c-intel and crc32c-generic are both registered, system will > > use crc32c-intel because its .cra_priority is greater than > > crc32c-generic. This case expect to use crc32c-generic driver for > > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > > CPUs support from crc32c-intel. > > > > Signed-off-by: Tony W Wang-oc > > Does this mean that the performance of the crc32c instruction on those CPUs is > actually slower than a regular C implementation? That's very weird. > This driver does not use CRC instructions, but carryless multiplication and aggregation. So I suppose the pclmulqdq instruction triggers some pathological performance limitation here. That means the crct10dif driver probably needs the same treatment.
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On 11/12/2020 21:00, Peter Zijlstra wrote: > On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: >> The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. >> On platforms with Zhaoxin CPUs supporting this X86 feature, When >> crc32c-intel and crc32c-generic are both registered, system will >> use crc32c-intel because its .cra_priority is greater than >> crc32c-generic. This case expect to use crc32c-generic driver for >> some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin >> CPUs support from crc32c-intel. >> >> Signed-off-by: Tony W Wang-oc >> --- >> arch/x86/crypto/crc32c-intel_glue.c | 8 >> 1 file changed, 8 insertions(+) >> >> diff --git a/arch/x86/crypto/crc32c-intel_glue.c >> b/arch/x86/crypto/crc32c-intel_glue.c >> index feccb52..6dafdae 100644 >> --- a/arch/x86/crypto/crc32c-intel_glue.c >> +++ b/arch/x86/crypto/crc32c-intel_glue.c >> @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); >> >> static int __init crc32c_intel_mod_init(void) >> { >> +struct cpuinfo_x86 *c = _cpu_data; >> + >> if (!x86_match_cpu(crc32c_cpu_id)) >> return -ENODEV; >> + >> +if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == >> X86_VENDOR_CENTAUR) { >> +if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) >> +return -ENODEV; >> +} > > Egads, why can't you use that x86_match_cpu() above, and also this > really wants a comment on why you're excluding these chips. When doing lmbench3 Create and Delete file test on partitions with ext4 enabling metadata checksum, found using crc32c-generic driver could get about 20% performance gain than using the driver crc32c-intel on these chips. Also, since > (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed? > > That is; write it like: > > m = x86_match_cpu(crc32_cpu_id); > if (!m || !m->data) > return -ENODEV; > > That way you can have positive and negative matches in the array > (obviously the existing FEATURE test would need data=1 and be last). > . > Lot thanks for you suggestion, will list these chips in crc32c_cpu_id and use x86_match_cpu: static const struct x86_cpu_id crc32c_cpu_id[] = { + X86_MATCH_VENDOR_FAM_FEATURE(ZHAOXIN, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(ZHAOXIN, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_FEATURE(CENTAUR, 0x6, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x1b, X86_FEATURE_XMM4_2, 1), + X86_MATCH_VENDOR_FAM_MODEL_FEATURE(CENTAUR, 0x7, 0x3b, X86_FEATURE_XMM4_2, 1), X86_MATCH_FEATURE(X86_FEATURE_XMM4_2, NULL), {} }; @@ -228,8 +234,10 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { - if (!x86_match_cpu(crc32c_cpu_id)) + const struct x86_cpu_id *m = x86_match_cpu(crc32c_cpu_id); + if (!m || m->driver_data) return -ENODEV; sincerely TonyWWangoc
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, When > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc Does this mean that the performance of the crc32c instruction on those CPUs is actually slower than a regular C implementation? That's very weird. - Eric
Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote: > The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. > On platforms with Zhaoxin CPUs supporting this X86 feature, When > crc32c-intel and crc32c-generic are both registered, system will > use crc32c-intel because its .cra_priority is greater than > crc32c-generic. This case expect to use crc32c-generic driver for > some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin > CPUs support from crc32c-intel. > > Signed-off-by: Tony W Wang-oc > --- > arch/x86/crypto/crc32c-intel_glue.c | 8 > 1 file changed, 8 insertions(+) > > diff --git a/arch/x86/crypto/crc32c-intel_glue.c > b/arch/x86/crypto/crc32c-intel_glue.c > index feccb52..6dafdae 100644 > --- a/arch/x86/crypto/crc32c-intel_glue.c > +++ b/arch/x86/crypto/crc32c-intel_glue.c > @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); > > static int __init crc32c_intel_mod_init(void) > { > + struct cpuinfo_x86 *c = _cpu_data; > + > if (!x86_match_cpu(crc32c_cpu_id)) > return -ENODEV; > + > + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == > X86_VENDOR_CENTAUR) { > + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) > + return -ENODEV; > + } Egads, why can't you use that x86_match_cpu() above, and also this really wants a comment on why you're excluding these chips. Also, since (IIRC) ZHAOXIN is basically AND, shouldn't they also be listed? That is; write it like: m = x86_match_cpu(crc32_cpu_id); if (!m || !m->data) return -ENODEV; That way you can have positive and negative matches in the array (obviously the existing FEATURE test would need data=1 and be last).
[PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs
The driver crc32c-intel match CPUs supporting X86_FEATURE_XMM4_2. On platforms with Zhaoxin CPUs supporting this X86 feature, When crc32c-intel and crc32c-generic are both registered, system will use crc32c-intel because its .cra_priority is greater than crc32c-generic. This case expect to use crc32c-generic driver for some Zhaoxin CPUs to get performance gain, So remove these Zhaoxin CPUs support from crc32c-intel. Signed-off-by: Tony W Wang-oc --- arch/x86/crypto/crc32c-intel_glue.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/x86/crypto/crc32c-intel_glue.c b/arch/x86/crypto/crc32c-intel_glue.c index feccb52..6dafdae 100644 --- a/arch/x86/crypto/crc32c-intel_glue.c +++ b/arch/x86/crypto/crc32c-intel_glue.c @@ -222,8 +222,16 @@ MODULE_DEVICE_TABLE(x86cpu, crc32c_cpu_id); static int __init crc32c_intel_mod_init(void) { + struct cpuinfo_x86 *c = _cpu_data; + if (!x86_match_cpu(crc32c_cpu_id)) return -ENODEV; + + if (c->x86_vendor == X86_VENDOR_ZHAOXIN || c->x86_vendor == X86_VENDOR_CENTAUR) { + if (c->x86 == 0x6 || (c->x86 == 0x7 && c->x86_model <= 0x3b)) + return -ENODEV; + } + #ifdef CONFIG_X86_64 if (boot_cpu_has(X86_FEATURE_PCLMULQDQ)) { alg.update = crc32c_pcl_intel_update; -- 2.7.4