Re: [PATCH v1 1/3] x86/cpufeatures: Add low performance CRC32C instruction CPU feature

2021-01-11 Thread hpa
On January 6, 2021 10:37:50 PM PST, Borislav Petkov  wrote:
>On Thu, Jan 07, 2021 at 02:19:06PM +0800, Tony W Wang-oc wrote:
>> SSE4.2 on Zhaoxin CPUs are compatible with Intel. The presence of
>> CRC32C instruction is enumerated by CPUID.01H:ECX.SSE4_2[bit 20] = 1.
>> Some Zhaoxin CPUs declare support SSE4.2 instruction sets but their
>> CRC32C instruction are working with low performance.
>> 
>> Add a synthetic CPU flag to indicates that the CRC32C instruction is
>> not working as intended. This low performance CRC32C instruction flag
>> is depend on X86_FEATURE_XMM4_2.
>> 
>> Signed-off-by: Tony W Wang-oc 
>> ---
>>  arch/x86/include/asm/cpufeatures.h | 1 +
>>  arch/x86/kernel/cpu/cpuid-deps.c   | 1 +
>>  2 files changed, 2 insertions(+)
>> 
>> diff --git a/arch/x86/include/asm/cpufeatures.h
>b/arch/x86/include/asm/cpufeatures.h
>> index 84b8878..9e8151b 100644
>> --- a/arch/x86/include/asm/cpufeatures.h
>> +++ b/arch/x86/include/asm/cpufeatures.h
>> @@ -292,6 +292,7 @@
>>  #define X86_FEATURE_FENCE_SWAPGS_KERNEL (11*32+ 5) /* "" LFENCE in
>kernel entry SWAPGS path */
>>  #define X86_FEATURE_SPLIT_LOCK_DETECT   (11*32+ 6) /* #AC for split
>lock */
>>  #define X86_FEATURE_PER_THREAD_MBA  (11*32+ 7) /* "" Per-thread
>Memory Bandwidth Allocation */
>> +#define X86_FEATURE_CRC32C  (11*32+ 8) /* "" Low performance CRC32C
>instruction */
>
>Didn't hpa say to create a BUG flag for it - X86_BUG...? Low
>performance
>insn sounds like a bug and not a feature to me.
>
>And call it X86_BUG_CRC32C_SLOW or ..._UNUSABLE to denote what it
>means.
>
>Thx.

Yes, it should be a BUG due to the inclusion properties of BUG v FEATURE.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 21, 2020 7:01:39 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 22, 2020 3:27:33 AM GMT+08:00, h...@zytor.com wrote:
>>On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>>>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
>>> wrote:
On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
> 
> On 15/12/2020 04:41, Eric Biggers wrote:
> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
> >> On 12/12/2020 01:43, Eric Biggers wrote:
> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc
>wrote:
>  The driver crc32c-intel match CPUs supporting
X86_FEATURE_XMM4_2.
>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>>>When
>  crc32c-intel and crc32c-generic are both registered, system
>>>will
>  use crc32c-intel because its .cra_priority is greater than
>  crc32c-generic. This case expect to use crc32c-generic driver
for
>  some Zhaoxin CPUs to get performance gain, So remove these
Zhaoxin
>  CPUs support from crc32c-intel.
> 
>  Signed-off-by: Tony W Wang-oc 
> >>>
> >>> Does this mean that the performance of the crc32c instruction
>>on
those CPUs is
> >>> actually slower than a regular C implementation?  That's very
weird.
> >>>
> >>
> >> From the lmbench3 Create and Delete file test on those chips, I
think yes.
> >>
> > 
> > Did you try measuring the performance of the hashing itself, and
not some
> > higher-level filesystem operations?
> > 
> 
> Yes. Was testing on these Zhaoxin CPUs, the result is that with
>the
same
> input value the generic C implementation takes fewer time than the
> crc32c instruction implementation.
> 

And that is really "working as intended"?
>>>
>>>These CPU's crc32c instruction is not working as intended.
>>>
>>>  Why do these CPUs even
declare that
they support the crc32c instruction, when it is so slow?

>>>
>>>The presence of crc32c and some other instructions supports are
>>>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>>>except the crc32c instruction.
>>>
Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2,
>AVX,
etc.) that
these CPUs similarly declare support for but they are uselessly
>slow?
>>>
>>>No.
>>>
>>>Sincerely
>>>Tonyw
>>>

- Eric
>>
>>Then the right thing to do is to disable the CPUID bit in the
>>vendor-specific startup code.
>
>This way makes these CPUs do not support all instruction sets
>enumerated
>by CPUID.01:ECX[SSE4.2].
>While only crc32c instruction is slow, just expect the crc32c-intel
>driver do not
>match these CPUs.
>
>Sincerely
>Tonyw

Then create a BUG flag for it, or factor out CRC32C into a synthetic flag. We 
*do not* bury this information in drivers; it becomes a recipe for the same 
problems over and over.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH] crypto: x86/crc32c-intel - Don't match some Zhaoxin CPUs

2020-12-21 Thread hpa
On December 20, 2020 6:46:25 PM PST, tonywwang...@zhaoxin.com wrote:
>On December 16, 2020 1:56:45 AM GMT+08:00, Eric Biggers
> wrote:
>>On Tue, Dec 15, 2020 at 10:15:29AM +0800, Tony W Wang-oc wrote:
>>> 
>>> On 15/12/2020 04:41, Eric Biggers wrote:
>>> > On Mon, Dec 14, 2020 at 10:28:19AM +0800, Tony W Wang-oc wrote:
>>> >> On 12/12/2020 01:43, Eric Biggers wrote:
>>> >>> On Fri, Dec 11, 2020 at 07:29:04PM +0800, Tony W Wang-oc wrote:
>>>  The driver crc32c-intel match CPUs supporting
>>X86_FEATURE_XMM4_2.
>>>  On platforms with Zhaoxin CPUs supporting this X86 feature,
>When
>>>  crc32c-intel and crc32c-generic are both registered, system
>will
>>>  use crc32c-intel because its .cra_priority is greater than
>>>  crc32c-generic. This case expect to use crc32c-generic driver
>>for
>>>  some Zhaoxin CPUs to get performance gain, So remove these
>>Zhaoxin
>>>  CPUs support from crc32c-intel.
>>> 
>>>  Signed-off-by: Tony W Wang-oc 
>>> >>>
>>> >>> Does this mean that the performance of the crc32c instruction on
>>those CPUs is
>>> >>> actually slower than a regular C implementation?  That's very
>>weird.
>>> >>>
>>> >>
>>> >> From the lmbench3 Create and Delete file test on those chips, I
>>think yes.
>>> >>
>>> > 
>>> > Did you try measuring the performance of the hashing itself, and
>>not some
>>> > higher-level filesystem operations?
>>> > 
>>> 
>>> Yes. Was testing on these Zhaoxin CPUs, the result is that with the
>>same
>>> input value the generic C implementation takes fewer time than the
>>> crc32c instruction implementation.
>>> 
>>
>>And that is really "working as intended"?
>
>These CPU's crc32c instruction is not working as intended.
>
>  Why do these CPUs even
>>declare that
>>they support the crc32c instruction, when it is so slow?
>>
>
>The presence of crc32c and some other instructions supports are
>enumerated by CPUID.01:ECX[SSE4.2] = 1,  other instructions are ok
>except the crc32c instruction.
>
>>Are there any other instruction sets (AES-NI, PCLMUL, SSE, SSE2, AVX,
>>etc.) that
>>these CPUs similarly declare support for but they are uselessly slow?
>
>No.
>
>Sincerely
>Tonyw
>
>>
>>- Eric

Then the right thing to do is to disable the CPUID bit in the vendor-specific 
startup code.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 3/7] x86, LLVM: suppress clang warnings about unaligned accesses

2017-03-17 Thread hpa
On March 16, 2017 5:15:16 PM PDT, Michael Davidson  wrote:
>Suppress clang warnings about potential unaliged accesses
>to members in packed structs. This gets rid of almost 10,000
>warnings about accesses to the ring 0 stack pointer in the TSS.
>
>Signed-off-by: Michael Davidson 
>---
> arch/x86/Makefile | 5 +
> 1 file changed, 5 insertions(+)
>
>diff --git a/arch/x86/Makefile b/arch/x86/Makefile
>index 894a8d18bf97..7f21703c475d 100644
>--- a/arch/x86/Makefile
>+++ b/arch/x86/Makefile
>@@ -128,6 +128,11 @@ endif
> KBUILD_CFLAGS += $(call cc-option,-maccumulate-outgoing-args)
> endif
> 
>+ifeq ($(cc-name),clang)
>+# Suppress clang warnings about potential unaligned accesses.
>+KBUILD_CFLAGS += $(call cc-disable-warning, address-of-packed-member)
>+endif
>+
> ifdef CONFIG_X86_X32
>   x32_ld_ok := $(call try-run,\
>   /bin/echo -e '1: .quad 1b' | \

Why conditional on clang?
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.


Re: [PATCH 6/7] md/raid10, LLVM: get rid of variable length array

2017-03-17 Thread hpa
On March 17, 2017 12:27:46 PM PDT, Peter Zijlstra  wrote:
>On Fri, Mar 17, 2017 at 11:52:01AM -0700, Michael Davidson wrote:
>> On Fri, Mar 17, 2017 at 5:44 AM, Peter Zijlstra
> wrote:
>> >
>> > Be that as it may; what you construct above is disgusting. Surely
>the
>> > code can be refactored to not look like dog vomit?
>> >
>> > Also; its not immediately obvious conf->copies is 'small' and this
>> > doesn't blow up the stack; I feel that deserves a comment
>somewhere.
>> >
>> 
>> I agree that the code is horrible.
>> 
>> It is, in fact, exactly the same solution that was used to remove
>> variable length arrays in structs from several of the crypto drivers
>a
>> few years ago - see the definition of SHASH_DESC_ON_STACK() in
>> "crypto/hash.h" - I did not, however, hide the horrors in a macro
>> preferring to leave the implementation visible as a warning to
>whoever
>> might touch the code next.
>> 
>> I believe that the actual stack usage is exactly the same as it was
>previously.
>> 
>> I can certainly wrap this  up in a macro and add comments with
>> appropriately dire warnings in it if you feel that is both necessary
>> and sufficient.
>
>We got away with ugly in the past, so we should get to do it again?

Seriously, you should have taken the hack the first time that this needs to be 
fixed.  Just because this is a fairly uncommon construct in the kernel doesn't 
mean it is not in userspace.

I would like to say this falls in the category of "fix your compiler this 
time".  Once is one thing, twice is unacceptable.
-- 
Sent from my Android device with K-9 Mail. Please excuse my brevity.