[tip: sched/core] sched: Optimize __calc_delta()

2021-03-10 Thread tip-bot2 for Clement Courbet
The following commit has been merged into the sched/core branch of tip:
Commit-ID: 1e17fb8edc5ad6587e9303ccdebce853bc8cf30c
Gitweb: https://git.kernel.org/tip/1e17fb8edc5ad6587e9303ccdebce853bc8cf30c
Author: Clement Courbet
AuthorDate: Wed, 03 Mar 2021 14:46:53 -08:00

Re: [PATCH 0/4] -ffreestanding/-fno-builtin-* patches

2020-08-19 Thread Clement Courbet
On Tue, Aug 18, 2020 at 9:58 PM Nick Desaulniers wrote: On Tue, Aug 18, 2020 at 12:25 PM Nick Desaulniers wrote: > > On Tue, Aug 18, 2020 at 12:19 PM Linus Torvalds > wrote: > > > > And honestly, a compiler that uses 'bcmp' is just broken. WTH? It's > > the year 2020, we don't use bcmp. It's

[PATCH v5] lib: optimize cpumask_next_and()

2017-11-30 Thread Clement Courbet
> So I think it really worth to be separated patch. Really, it's
> completely nontrivial why adding new function in lib/find_bit.c
> requires including asm-generic/bitops/find.h in arm and uncore32
> asm/bitops.h headers (bug?). And why doing that makes you guard
> find_first_bit and

[PATCH v6] lib: optimize cpumask_next_and()

2017-11-29 Thread Clement Courbet
s, 66 iterations
[ 267.342627] Start testing find_next_and_bit() with sparse bitmap
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
Signed-off-by: Clement Courbet <cour...@google.com>
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit., as suggested by Yury Norov. Th
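The benchmark lines above time find_next_and_bit(), which returns the next bit that is set in both of two bitmaps. The sketch below is a minimal userspace illustration of the word-at-a-time idea. It assumes bitmaps are stored as arrays of unsigned long, least significant bit first. Each step ANDs one word from each mask and takes the lowest set bit with a count-trailing-zeros, instead of probing candidate bits one at a time.

All names here (BITS_PER_WORD, next_and_bit_ref, next_and_bit_sketch, check_sketch) are hypothetical. __builtin_ctzl() is a GCC/Clang builtin standing in for the kernel's __ffs(). This is not the lib/find_bit.c code from the patch.

```c
#include <assert.h>
#include <limits.h>

/* Assumption: bitmaps are arrays of unsigned long, least significant bit first. */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

/* Hypothetical helper: return bit n of map (0 or 1). */
static int test_bit_sketch(const unsigned long *map, unsigned long n)
{
	return (int)((map[n / BITS_PER_WORD] >> (n % BITS_PER_WORD)) & 1UL);
}

/* Naive reference: probe each bit from start until one is set in both a and b. */
static unsigned long next_and_bit_ref(const unsigned long *a, const unsigned long *b,
				      unsigned long nbits, unsigned long start)
{
	for (unsigned long n = start; n < nbits; n++)
		if (test_bit_sketch(a, n) && test_bit_sketch(b, n))
			return n;
	return nbits; /* no common bit found */
}

/* Word-at-a-time scan: AND whole words, then locate the lowest set bit. */
static unsigned long next_and_bit_sketch(const unsigned long *a, const unsigned long *b,
					 unsigned long nbits, unsigned long start)
{
	if (start >= nbits)
		return nbits;

	unsigned long idx = start / BITS_PER_WORD;
	/* Mask off bits below start, in the first word only. */
	unsigned long tmp = a[idx] & b[idx] & (~0UL << (start % BITS_PER_WORD));

	while (!tmp) {
		idx++;
		if (idx * BITS_PER_WORD >= nbits)
			return nbits;
		tmp = a[idx] & b[idx];
	}

	/* tmp is non-zero here, so __builtin_ctzl() is well defined. */
	unsigned long bit = idx * BITS_PER_WORD + (unsigned long)__builtin_ctzl(tmp);
	/* Bits past nbits in the last word may be set; clamp so the result never exceeds nbits. */
	return bit < nbits ? bit : nbits;
}

/* Cross-check the word-wise scan against the per-bit reference for every start. */
static void check_sketch(const unsigned long *a, const unsigned long *b, unsigned long nbits)
{
	for (unsigned long s = 0; s <= nbits; s++)
		assert(next_and_bit_sketch(a, b, nbits, s) == next_and_bit_ref(a, b, nbits, s));
}
```

The per-bit reference exists only to cross-check the word-wise scan, in the spirit of the find_next_and_bit_ref/find_next_and_bit timing comparison. On sparse masks, the word-wise version skips a whole word of zero bits with a single test.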

[PATCH v5] lib: optimize cpumask_next_and()

2017-11-29 Thread Clement Courbet
> > Note that on Arm (), the new c implementation still outperforms the
> > old one that uses c+ the asm implementation of `find_next_bit` [3].
> What is 'c+'? Is it typo?
I meant "a mix of C and asm" ~(C + asm). Rephrased.
> If you find generic find_bit() on arm faster that asm one, we'd
>

[PATCH v5] lib: optimize cpumask_next_and()

2017-11-28 Thread Clement Courbet
rt testing find_next_and_bit() with sparse bitmap
[ 267.349992] find_next_and_bit_ref: 193 cycles, 1 iterations
[ 267.356919] find_next_and_bit: 91 cycles, 1 iterations
Signed-off-by: Clement Courbet <cour...@google.com>
---
Changes in v2:
- Refactored _find_next_common_bit into _find_

[PATCH] lib: test module for find_*_bit() functions

2017-11-09 Thread Clement Courbet
Reviewed-By: Clement Courbet <cour...@google.com>
Thanks for the addition, Yury ! I've used a modified version of v1 for measuring improvements from find_next_and_bit() on x86 and arm and found it very useful.

[PATCH] lib: hint GCC to inlilne _find_next_bit() helper

2017-10-30 Thread Clement Courbet
Hi Yury, I've tried your benchmark on x86-64 (haswell). Inlining is a pretty small increase in binary size: 48B (2%). In terms of speed, results are not very stable from one run to another (I've included two runs to give you an idea), but overall there seems to be small improvement on the

[PATCH v4] lib: optimize cpumask_next_and()

2017-10-26 Thread Clement Courbet
.18
0x 0x 1.17
0x 0x 1.25
---------
geo.mean 2.06
Signed-off-by: Clement Courbet <cour...@google.com>
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit., as suggested by Yury Norov. This has no adverse effects on the performance s

[PATCH v2] lib: optimize cpumask_next_and()

2017-10-26 Thread Clement Courbet
Hi Alexey,
> Gentoo ships 5.4.0 which doesn't inline this code on x86_64 defconfig
> (which has OPTIMIZE_INLINING).
I have not actually marked _find_next_bit() inline, it just turns out that my compiler inlines it. I've tried out marking the function inline and OPTIMIZE_INLINING does not

[PATCH v3] lib: optimize cpumask_next_and()

2017-10-26 Thread Clement Courbet
.18
0x 0x 1.17
0x 0x 1.25
---------
geo.mean 2.06
Signed-off-by: Clement Courbet <cour...@google.com>
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit., as suggested by Yury Norov. This has no adverse effects on the performance s

Re: [PATCH v2] lib: optimize cpumask_next_and()

2017-10-25 Thread Clement Courbet
Thanks for the comments Yury.
> But I'd like also to keep _find_next_bit() consistent with
> _find_next_bit_le()
Not sure I understand what you're suggesting here: Do you want a find_next_and_bit_le() or do you want to make _find_next_bit_le() more like _find_next_bit() ? In the latter case we

[PATCH v2] lib: optimize cpumask_next_and()

2017-10-24 Thread Clement Courbet
.18
0x 0x 1.17
0x 0x 1.25
---------
geo.mean 2.06
Signed-off-by: Clement Courbet <cour...@google.com>
---
Changes in v2:
- Refactored _find_next_common_bit into _find_next_bit., as suggested by Yury Norov. This has no adverse effects on the performance side,

[PATCH] lib: optimize cpumask_next_and()

2017-10-23 Thread Clement Courbet
.18
0x 0x 1.17
0x 0x 1.25
---------
geo.mean 2.06
Signed-off-by: Clement Courbet <cour...@google.com>
---
 include/asm-generic/bitops/find.h | 16
 include/linux/bitmap.h | 2 ++
 lib/cpumask.c