Re: [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I

2020-07-22 Thread Chenxi Mao
>     return (w * 0x01010101) >> 24; >>   28:    1b027c00     mul    w0, w0, w2 >> >> Only one "mov" instruction to load 0x1010101 and one "mul" instruction for >> multiply. >> >> >> Let me summarize as below: >> >> 1

Re: [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I

2020-07-22 Thread Chenxi Mao
=  (w + (w >> 4)) & 0x0f0f0f0f; Chenxi On 2020/7/21 9:17 AM, Palmer Dabbelt wrote: > On Wed, 08 Jul 2020 22:19:22 PDT (-0700), maoche...@eswin.com wrote: >> Enable ARCH_HAS_FAST_MULTIPLIER on RV64I >> which works fine on GCC-9.3 and GCC-10.1 >> >> PS2: remove
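The two fragments quoted in this thread, `(w + (w >> 4)) & 0x0f0f0f0f` and `return (w * 0x01010101) >> 24`, are the last steps of a multiply-based population count. A minimal sketch of that technique next to a shift-and-add fallback follows. The function names `hweight32_mul`, `hweight32_shift`, and `hweight32_selfcheck` are made up for illustration, and the code is assumed to resemble, not reproduce, what the kernel builds when ARCH_HAS_FAST_MULTIPLIER is selected.

```c
#include <assert.h>
#include <stdint.h>

/* Multiply-based variant (hypothetical name): one multiply sums the byte counts. */
uint32_t hweight32_mul(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;                       /* counts per 2-bit field */
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);  /* counts per 4-bit field */
    w = (w + (w >> 4)) & 0x0f0f0f0fu;                  /* counts per byte */
    return (w * 0x01010101u) >> 24;                    /* top byte holds the total */
}

/* Shift-and-add variant (hypothetical name): no multiply, two extra folds. */
uint32_t hweight32_shift(uint32_t w)
{
    w -= (w >> 1) & 0x55555555u;
    w = (w & 0x33333333u) + ((w >> 2) & 0x33333333u);
    w = (w + (w >> 4)) & 0x0f0f0f0fu;
    w += w >> 8;                                       /* fold adjacent bytes */
    return (w + (w >> 16)) & 0xffu;                    /* fold halves, keep low byte */
}

/* Illustrative check that both variants agree on a few inputs. */
void hweight32_selfcheck(void)
{
    const uint32_t samples[] = { 0u, 1u, 0x80000001u, 0x0f0f0f0fu, 0xffffffffu };
    for (unsigned i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        assert(hweight32_mul(samples[i]) == hweight32_shift(samples[i]));
}
```

Which variant is cheaper depends on the cost of the target's multiply instruction, which is the question the thread raises for RV64 and RV32.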

Re: [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I

2020-07-20 Thread Chenxi Mao
Hi Palmer: Moving it to the RISCV platform is OK for me, but I cannot evaluate the RV32 case. Chenxi On 2020/7/21 9:47 AM, Chenxi Mao wrote: > Hi Palmer: > > Thanks for your reply. > > Frankly, I didn't test ARCH_HAS_FAST_MULTIPLIER on RV32, > > so I cannot put it in RISCV plat

Re: [PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I

2020-07-20 Thread Chenxi Mao
>> >> PS2: remove ARCH_SUPPORTS_INT128 because of RV64I already enabled. >> >> Signed-off-by: Chenxi Mao >> --- >>  arch/riscv/Kconfig | 1 + >>  1 file changed, 1 insertion(+) >> >> diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig >> index 128192

[PATCH 1/1] riscv: Enable ARCH_HAS_FAST_MULTIPLIER for RV64I

2020-07-08 Thread Chenxi Mao
Enable ARCH_HAS_FAST_MULTIPLIER on RV64I, which works fine with GCC-9.3 and GCC-10.1. PS2: remove ARCH_SUPPORTS_INT128 because it is already enabled on RV64I. Signed-off-by: Chenxi Mao --- arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig

[PATCH 1/1] riscv: Enable compiler optimizations

2020-07-07 Thread Chenxi Mao
Enable ARCH_HAS_FAST_MULTIPLIER and ARCH_SUPPORTS_INT128 for better code generation. These two configurations work fine with GCC-9.3 and GCC-10.1. Signed-off-by: Chenxi Mao --- arch/riscv/Kconfig | 2 ++ 1 file changed, 2 insertions(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index

[PATCH v2 1/1] riscv: Select ARCH_SUPPORTS_ATOMIC_RMW by default

2020-06-04 Thread Chenxi Mao
Select ARCH_SUPPORTS_ATOMIC_RMW by default to enable osqlocks. PS2: Add signed off info. Signed-off-by: Chenxi Mao --- arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index a31e1a41913a..cbdc605d20d9 100644 --- a/arch/riscv/Kconfig

[PATCH 1/1] riscv: Select ARCH_SUPPORTS_ATOMIC_RMW by default

2020-06-04 Thread Chenxi Mao
Select ARCH_SUPPORTS_ATOMIC_RMW by default to enable osqlocks. PS2: Add signed off info. Signed-off-by: Chenxi Mao --- arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index a31e1a41913a..cbdc605d20d9 100644 --- a/arch/riscv/Kconfig

[PATCH 1/1] riscv: Select ARCH_SUPPORTS_ATOMIC_RMW by default

2020-06-01 Thread Chenxi Mao
Enable ARCH_SUPPORTS_ATOMIC_RMW by default to support osq_lock in mutex/rwsem locks. --- arch/riscv/Kconfig | 1 + 1 file changed, 1 insertion(+) diff --git a/arch/riscv/Kconfig b/arch/riscv/Kconfig index a31e1a41913a..cbdc605d20d9 100644 --- a/arch/riscv/Kconfig +++ b/arch/riscv/Kconfig @@

[PATCH 1/1] LZ4: Port LZ4 1.9.x FAST_DEC_LOOP and enable it on x86 and ARM64

2019-05-17 Thread Chenxi Mao
FAST_DEC_LOOP was introduced in LZ4 1.9.0 [1]. This change improves decompression by about 10% according to LZ4 benchmark results on x86 devices. Meanwhile, LZ4 with FAST_DEC_LOOP also improves on ARM64; however, the clang compiler shows a regression when FAST_DEC_LOOP is enabled. So

[PATCH 1/1] LZ4: Port LZ4 1.9.x FAST_DEC_LOOP and enable it on x86 and ARM64

2019-05-16 Thread Chenxi Mao
FAST_DEC_LOOP was introduced in LZ4 1.9 [1]. This change improves decompression by about 10% according to LZ4 benchmark results on x86 devices. Meanwhile, LZ4 with FAST_DEC_LOOP also shows improvements; however, the clang compiler shows a regression when FAST_DEC_LOOP is enabled. So FAST_DEC_LOOP only

[PATCH 1/1] LZ4: Port LZ4 1.9.x FAST_DEC_LOOP and enable it on x86 and ARM64

2019-05-14 Thread Chenxi Mao
FAST_DEC_LOOP was introduced in LZ4 1.9. This change improves decompression by about 10% according to LZ4 benchmark results on x86 devices. Meanwhile, LZ4 with FAST_DEC_LOOP also shows improvements; however, the clang compiler shows a regression when FAST_DEC_LOOP is enabled. So FAST_DEC_LOOP only