Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Fri, Jul 21, 2017 at 03:26:21PM +0200, Peter Zijlstra wrote: > > EVENT=0 -DNEW=1 -DFLS=1 > event: 19.626050 +- 0.038995 > EVENT=0 -DNEW=1 -DFLS=1 -DWIPE_BTB=1 > event: 109.610670 +- 0.425667 > > EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1 > event: 21.445680 +- 0.043782 > EVENT=0 -DNEW=1 -DFLS=1

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Fri, Jul 21, 2017 at 03:26:21PM +0200, Peter Zijlstra wrote: > > EVENT=0 -DNEW=1 -DFLS=1 > event: 19.626050 +- 0.038995 > EVENT=0 -DNEW=1 -DFLS=1 -DWIPE_BTB=1 > event: 109.610670 +- 0.425667 > > EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1 > event: 21.445680 +- 0.043782 > EVENT=0 -DNEW=1 -DFLS=1

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Fri, Jul 21, 2017 at 05:15:10AM -0700, Joe Perches wrote: > On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote: > > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x) > > if (x <= 1) > > return x; > > > > - m = 1UL << (BITS_PER_LONG - 2); > > + m = 1UL <<

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Fri, Jul 21, 2017 at 05:15:10AM -0700, Joe Perches wrote: > On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote: > > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x) > > if (x <= 1) > > return x; > > > > - m = 1UL << (BITS_PER_LONG - 2); > > + m = 1UL <<

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Joe Perches
On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote: > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x) > if (x <= 1) > return x; > > - m = 1UL << (BITS_PER_LONG - 2); > + m = 1UL << (__fls(x) & ~1U); > + > + while (m > x) > + m >>= 2;

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Joe Perches
On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote: > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x) > if (x <= 1) > return x; > > - m = 1UL << (BITS_PER_LONG - 2); > + m = 1UL << (__fls(x) & ~1U); > + > + while (m > x) > + m >>= 2;

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 04:24:32PM -0700, Linus Torvalds wrote: > So mayube something like the attached? > > NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code > generation looks ok even if gcc is being way too clever and turns that > first loop into a counted loop instead.

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-21 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 04:24:32PM -0700, Linus Torvalds wrote: > So mayube something like the attached? > > NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code > generation looks ok even if gcc is being way too clever and turns that > first loop into a counted loop instead.

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Linus Torvalds
On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra wrote: >> >> "Why does this matter, and what is the range it matters for?" > > I was looking to do some work on the idle estimator. Parts of that keep > online avg and variance for normal distributions. I wanted to bias the >

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Linus Torvalds
On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra wrote: >> >> "Why does this matter, and what is the range it matters for?" > > I was looking to do some work on the idle estimator. Parts of that keep > online avg and variance for normal distributions. I wanted to bias the > avg downwards, the way

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote: > How did this two-year old thread get resurrected? I was looking for the original thread doing that 'optimization' Davidlohr did but found this first. > And the *most* important question is that first one: > > "Why does this

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote: > How did this two-year old thread get resurrected? I was looking for the original thread doing that 'optimization' Davidlohr did but found this first. > And the *most* important question is that first one: > > "Why does this

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Linus Torvalds
How did this two-year old thread get resurrected? Anyway, it got resurrected without even answering one core question: On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra wrote: > On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote: On Mon, Feb 2, 2015 at 11:00

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Linus Torvalds
How did this two-year old thread get resurrected? Anyway, it got resurrected without even answering one core question: On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra wrote: > On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote: On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote: > ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=1000 -DNEW=1 -DFLS=1 -DANSHUL=1 > ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt > > Performance counter stats for './sqrt' (10 runs): > >328,415,775

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote: > ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=1000 -DNEW=1 -DFLS=1 -DANSHUL=1 > ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt > > Performance counter stats for './sqrt' (10 runs): > >328,415,775

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 04:52:49AM -0700, Joe Perches wrote: > On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote: > [] > > + m = 1UL << ((fls(x) + 1) & ~1UL); > > maybe > > #if BITS_PER_LONG == 64 > m = 1UL << ((fls64(x) + 1) & ~1UL); > #else > m = 1UL << ((fls(x) + 1) &

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Thu, Jul 20, 2017 at 04:52:49AM -0700, Joe Perches wrote: > On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote: > [] > > + m = 1UL << ((fls(x) + 1) & ~1UL); > > maybe > > #if BITS_PER_LONG == 64 > m = 1UL << ((fls64(x) + 1) & ~1UL); > #else > m = 1UL << ((fls(x) + 1) &

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Joe Perches
On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote: [] > + m = 1UL << ((fls(x) + 1) & ~1UL); maybe #if BITS_PER_LONG == 64 m = 1UL << ((fls64(x) + 1) & ~1UL); #else m = 1UL << ((fls(x) + 1) & ~1UL); #endif

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Joe Perches
On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote: [] > + m = 1UL << ((fls(x) + 1) & ~1UL); maybe #if BITS_PER_LONG == 64 m = 1UL << ((fls64(x) + 1) & ~1UL); #else m = 1UL << ((fls(x) + 1) & ~1UL); #endif

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote: > On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds > wrote: > > > > (I'm also not entirely sure what uses int_sqrt() that ends up being so > > performance-critical, so it would be good to document that

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2017-07-20 Thread Peter Zijlstra
On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote: > On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds > wrote: > > > > (I'm also not entirely sure what uses int_sqrt() that ends up being so > > performance-critical, so it would be good to document that too, since > > that probably

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-08 Thread Anshul Garg
Dear Mr. linus, Thanks for quick replies. Yes performance numbers are not conclusive enough. So its better to discard this patch as of now. I will try to explore more in this area. Thanks & regards Anshul Garg On Fri, Feb 6, 2015 at 1:07 AM, Linus Torvalds wrote: > On Thu, Feb 5, 2015 at

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-08 Thread Anshul Garg
Dear Mr. linus, Thanks for quick replies. Yes performance numbers are not conclusive enough. So its better to discard this patch as of now. I will try to explore more in this area. Thanks regards Anshul Garg On Fri, Feb 6, 2015 at 1:07 AM, Linus Torvalds torva...@linux-foundation.org

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Thu, Feb 5, 2015 at 10:43 AM, Anshul Garg wrote: > > NOTE :: > I have not used gcc optimizations while compilation. > With O2 level optimization proposed solution is taking more time. The thing is, the kernel is compiled with -O2, so that's what matters. Also, for very tight loops like this,

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Anshul Garg
On Thu, Feb 5, 2015 at 10:33 AM, Linus Torvalds wrote: > On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds > wrote: >> >> Hmm. I did that too [..] > > Side note: one difference in our results (apart from possibly just CPU > uarch details) is that my loop goes to 100M to make it easier to just >

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds wrote: > > Hmm. I did that too [..] Side note: one difference in our results (apart from possibly just CPU uarch details) is that my loop goes to 100M to make it easier to just time it. Which means that my load essentially had about three more

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg wrote: > > I have done profiling of int_sqrt function using perf tool for 10 times. > For this purpose i have created a userspace program which uses sqrt function > from 1 to a million. Hmm. I did that too, and it doesn't improve things for me. In

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Anshul Garg
On Thu, Feb 5, 2015 at 10:33 AM, Linus Torvalds torva...@linux-foundation.org wrote: On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds torva...@linux-foundation.org wrote: Hmm. I did that too [..] Side note: one difference in our results (apart from possibly just CPU uarch details) is that my

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg aksgarg1...@gmail.com wrote: I have done profiling of int_sqrt function using perf tool for 10 times. For this purpose i have created a userspace program which uses sqrt function from 1 to a million. Hmm. I did that too, and it doesn't improve

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds torva...@linux-foundation.org wrote: Hmm. I did that too [..] Side note: one difference in our results (apart from possibly just CPU uarch details) is that my loop goes to 100M to make it easier to just time it. Which means that my load

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-05 Thread Linus Torvalds
On Thu, Feb 5, 2015 at 10:43 AM, Anshul Garg aksgarg1...@gmail.com wrote: NOTE :: I have not used gcc optimizations while compilation. With O2 level optimization proposed solution is taking more time. The thing is, the kernel is compiled with -O2, so that's what matters. Also, for very tight

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-03 Thread Anshul Garg
On Tue, Feb 3, 2015 at 10:11 AM, Davidlohr Bueso wrote: > On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: >> IOW, instead of >> >> m = 1UL << (BITS_PER_LONG - 2); >> >> perhaps something like >> >>m = 1UL << (BITS_PER_LONG/2- 2); >>if (m < x) >>m <<= BITS_PER_LONG/2;

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-03 Thread Anshul Garg
On Tue, Feb 3, 2015 at 10:17 AM, Davidlohr Bueso wrote: > On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote: >> Hmm. I don't disagree, but would like some more feedback. >> >> Davidlohr - you were the person to touch this function last (commit >> 30493cc9dddb: "lib/int_sqrt.c: optimize

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-03 Thread Anshul Garg
On Tue, Feb 3, 2015 at 10:17 AM, Davidlohr Bueso d...@stgolabs.net wrote: On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote: Hmm. I don't disagree, but would like some more feedback. Davidlohr - you were the person to touch this function last (commit 30493cc9dddb: lib/int_sqrt.c:

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-03 Thread Anshul Garg
On Tue, Feb 3, 2015 at 10:11 AM, Davidlohr Bueso d...@stgolabs.net wrote: On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: IOW, instead of m = 1UL (BITS_PER_LONG - 2); perhaps something like m = 1UL (BITS_PER_LONG/2- 2); if (m x) m = BITS_PER_LONG/2; On doing

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 20:41 -0800, Davidlohr Bueso wrote: > On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: > > IOW, instead of > > > > m = 1UL << (BITS_PER_LONG - 2); > > > > perhaps something like > > > >m = 1UL << (BITS_PER_LONG/2- 2); > >if (m < x) > >m <<=

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote: > Hmm. I don't disagree, but would like some more feedback. > > Davidlohr - you were the person to touch this function last (commit > 30493cc9dddb: "lib/int_sqrt.c: optimize square root algorithm"), and > you did so for performance reasons.

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: > IOW, instead of > > m = 1UL << (BITS_PER_LONG - 2); > > perhaps something like > >m = 1UL << (BITS_PER_LONG/2- 2); >if (m < x) >m <<= BITS_PER_LONG/2; > > (assuming gcc can change that code into a "cmov") might cut

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
On Mon, Feb 2, 2015 at 11:10 AM, Joe Perches wrote: > > Perhaps removing the while and using fls(x) would be better. No. fls is often slow, and you then do have to also round it down to an even bit number etc, so it gets somewhat complex too. We *probably* have some argument range that we care

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds wrote: > > (I'm also not entirely sure what uses int_sqrt() that ends up being so > performance-critical, so it would be good to document that too, since > that probably also matters for the "what's the normal argument range" > question..) ... it's

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Joe Perches
On Mon, 2015-02-02 at 09:12 -0800, Anshul Garg wrote: > From: Anshul Garg > > Unnecessary instructions are executing even though m is > greater than x so added logic to make m less than equal to > x before performing these operations. > > Signed-off-by: Anshul Garg > --- > lib/int_sqrt.c |

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
Hmm. I don't disagree, but would like some more feedback. Davidlohr - you were the person to touch this function last (commit 30493cc9dddb: "lib/int_sqrt.c: optimize square root algorithm"), and you did so for performance reasons. And in fact, when you did that, you removed that initial loop: -

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
On Mon, Feb 2, 2015 at 11:10 AM, Joe Perches j...@perches.com wrote: Perhaps removing the while and using fls(x) would be better. No. fls is often slow, and you then do have to also round it down to an even bit number etc, so it gets somewhat complex too. We *probably* have some argument range

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote: Hmm. I don't disagree, but would like some more feedback. Davidlohr - you were the person to touch this function last (commit 30493cc9dddb: lib/int_sqrt.c: optimize square root algorithm), and you did so for performance reasons. And in

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 20:41 -0800, Davidlohr Bueso wrote: On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: IOW, instead of m = 1UL (BITS_PER_LONG - 2); perhaps something like m = 1UL (BITS_PER_LONG/2- 2); if (m x) m = BITS_PER_LONG/2; Assuming small

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Davidlohr Bueso
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote: IOW, instead of m = 1UL (BITS_PER_LONG - 2); perhaps something like m = 1UL (BITS_PER_LONG/2- 2); if (m x) m = BITS_PER_LONG/2; (assuming gcc can change that code into a cmov) might cut down the lots of

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
Hmm. I don't disagree, but would like some more feedback. Davidlohr - you were the person to touch this function last (commit 30493cc9dddb: lib/int_sqrt.c: optimize square root algorithm), and you did so for performance reasons. And in fact, when you did that, you removed that initial loop: -

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Joe Perches
On Mon, 2015-02-02 at 09:12 -0800, Anshul Garg wrote: From: Anshul Garg aksgarg1...@gmail.com Unnecessary instructions are executing even though m is greater than x so added logic to make m less than equal to x before performing these operations. Signed-off-by: Anshul Garg

Re: [PATCH] lib/int_sqrt.c: Optimize square root function

2015-02-02 Thread Linus Torvalds
On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds torva...@linux-foundation.org wrote: (I'm also not entirely sure what uses int_sqrt() that ends up being so performance-critical, so it would be good to document that too, since that probably also matters for the what's the normal argument range