On Fri, Jul 21, 2017 at 03:26:21PM +0200, Peter Zijlstra wrote:
>
> EVENT=0 -DNEW=1 -DFLS=1
> event: 19.626050 +- 0.038995
> EVENT=0 -DNEW=1 -DFLS=1 -DWIPE_BTB=1
> event: 109.610670 +- 0.425667
>
> EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1
> event: 21.445680 +- 0.043782
> EVENT=0 -DNEW=1 -DFLS=1
On Fri, Jul 21, 2017 at 03:26:21PM +0200, Peter Zijlstra wrote:
>
> EVENT=0 -DNEW=1 -DFLS=1
> event: 19.626050 +- 0.038995
> EVENT=0 -DNEW=1 -DFLS=1 -DWIPE_BTB=1
> event: 109.610670 +- 0.425667
>
> EVENT=0 -DNEW=1 -DFLS=1 -DANSHUL=1
> event: 21.445680 +- 0.043782
> EVENT=0 -DNEW=1 -DFLS=1
On Fri, Jul 21, 2017 at 05:15:10AM -0700, Joe Perches wrote:
> On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote:
> > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x)
> > if (x <= 1)
> > return x;
> >
> > - m = 1UL << (BITS_PER_LONG - 2);
> > + m = 1UL <<
On Fri, Jul 21, 2017 at 05:15:10AM -0700, Joe Perches wrote:
> On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote:
> > @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x)
> > if (x <= 1)
> > return x;
> >
> > - m = 1UL << (BITS_PER_LONG - 2);
> > + m = 1UL <<
On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote:
> @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x)
> if (x <= 1)
> return x;
>
> - m = 1UL << (BITS_PER_LONG - 2);
> + m = 1UL << (__fls(x) & ~1U);
> +
> + while (m > x)
> + m >>= 2;
On Fri, 2017-07-21 at 13:40 +0200, Peter Zijlstra wrote:
> @@ -21,7 +22,11 @@ unsigned long int_sqrt(unsigned long x)
> if (x <= 1)
> return x;
>
> - m = 1UL << (BITS_PER_LONG - 2);
> + m = 1UL << (__fls(x) & ~1U);
> +
> + while (m > x)
> + m >>= 2;
On Thu, Jul 20, 2017 at 04:24:32PM -0700, Linus Torvalds wrote:
> So mayube something like the attached?
>
> NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code
> generation looks ok even if gcc is being way too clever and turns that
> first loop into a counted loop instead.
On Thu, Jul 20, 2017 at 04:24:32PM -0700, Linus Torvalds wrote:
> So mayube something like the attached?
>
> NOTE NOTE NOTE! This may be pure garbage. It's untested. But the code
> generation looks ok even if gcc is being way too clever and turns that
> first loop into a counted loop instead.
On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra wrote:
>>
>> "Why does this matter, and what is the range it matters for?"
>
> I was looking to do some work on the idle estimator. Parts of that keep
> online avg and variance for normal distributions. I wanted to bias the
>
On Thu, Jul 20, 2017 at 3:34 PM, Peter Zijlstra wrote:
>>
>> "Why does this matter, and what is the range it matters for?"
>
> I was looking to do some work on the idle estimator. Parts of that keep
> online avg and variance for normal distributions. I wanted to bias the
> avg downwards, the way
On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote:
> How did this two-year old thread get resurrected?
I was looking for the original thread doing that 'optimization'
Davidlohr did but found this first.
> And the *most* important question is that first one:
>
> "Why does this
On Thu, Jul 20, 2017 at 11:31:36AM -0700, Linus Torvalds wrote:
> How did this two-year old thread get resurrected?
I was looking for the original thread doing that 'optimization'
Davidlohr did but found this first.
> And the *most* important question is that first one:
>
> "Why does this
How did this two-year old thread get resurrected?
Anyway, it got resurrected without even answering one core question:
On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra wrote:
> On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
On Mon, Feb 2, 2015 at 11:00
How did this two-year old thread get resurrected?
Anyway, it got resurrected without even answering one core question:
On Thu, Jul 20, 2017 at 4:24 AM, Peter Zijlstra wrote:
> On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds
On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote:
> ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=1000 -DNEW=1 -DFLS=1 -DANSHUL=1
> ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt
>
> Performance counter stats for './sqrt' (10 runs):
>
>328,415,775
On Thu, Jul 20, 2017 at 01:24:49PM +0200, Peter Zijlstra wrote:
> ~/tmp$ gcc -o sqrt sqrt.c -lm -O2 -DLOOPS=1000 -DNEW=1 -DFLS=1 -DANSHUL=1
> ; perf stat --repeat 10 -e cycles:u -e instructions:u ./sqrt
>
> Performance counter stats for './sqrt' (10 runs):
>
>328,415,775
On Thu, Jul 20, 2017 at 04:52:49AM -0700, Joe Perches wrote:
> On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote:
> []
> > + m = 1UL << ((fls(x) + 1) & ~1UL);
>
> maybe
>
> #if BITS_PER_LONG == 64
> m = 1UL << ((fls64(x) + 1) & ~1UL);
> #else
> m = 1UL << ((fls(x) + 1) &
On Thu, Jul 20, 2017 at 04:52:49AM -0700, Joe Perches wrote:
> On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote:
> []
> > + m = 1UL << ((fls(x) + 1) & ~1UL);
>
> maybe
>
> #if BITS_PER_LONG == 64
> m = 1UL << ((fls64(x) + 1) & ~1UL);
> #else
> m = 1UL << ((fls(x) + 1) &
On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote:
[]
> + m = 1UL << ((fls(x) + 1) & ~1UL);
maybe
#if BITS_PER_LONG == 64
m = 1UL << ((fls64(x) + 1) & ~1UL);
#else
m = 1UL << ((fls(x) + 1) & ~1UL);
#endif
On Thu, 2017-07-20 at 13:24 +0200, Peter Zijlstra wrote:
[]
> + m = 1UL << ((fls(x) + 1) & ~1UL);
maybe
#if BITS_PER_LONG == 64
m = 1UL << ((fls64(x) + 1) & ~1UL);
#else
m = 1UL << ((fls(x) + 1) & ~1UL);
#endif
On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
> On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds
> wrote:
> >
> > (I'm also not entirely sure what uses int_sqrt() that ends up being so
> > performance-critical, so it would be good to document that
On Mon, Feb 02, 2015 at 11:13:44AM -0800, Linus Torvalds wrote:
> On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds
> wrote:
> >
> > (I'm also not entirely sure what uses int_sqrt() that ends up being so
> > performance-critical, so it would be good to document that too, since
> > that probably
Dear Mr. linus,
Thanks for quick replies.
Yes performance numbers are not conclusive enough.
So its better to discard this patch as of now.
I will try to explore more in this area.
Thanks & regards
Anshul Garg
On Fri, Feb 6, 2015 at 1:07 AM, Linus Torvalds
wrote:
> On Thu, Feb 5, 2015 at
Dear Mr. linus,
Thanks for quick replies.
Yes performance numbers are not conclusive enough.
So its better to discard this patch as of now.
I will try to explore more in this area.
Thanks regards
Anshul Garg
On Fri, Feb 6, 2015 at 1:07 AM, Linus Torvalds
torva...@linux-foundation.org
On Thu, Feb 5, 2015 at 10:43 AM, Anshul Garg wrote:
>
> NOTE ::
> I have not used gcc optimizations while compilation.
> With O2 level optimization proposed solution is taking more time.
The thing is, the kernel is compiled with -O2, so that's what matters.
Also, for very tight loops like this,
On Thu, Feb 5, 2015 at 10:33 AM, Linus Torvalds
wrote:
> On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds
> wrote:
>>
>> Hmm. I did that too [..]
>
> Side note: one difference in our results (apart from possibly just CPU
> uarch details) is that my loop goes to 100M to make it easier to just
>
On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds
wrote:
>
> Hmm. I did that too [..]
Side note: one difference in our results (apart from possibly just CPU
uarch details) is that my loop goes to 100M to make it easier to just
time it. Which means that my load essentially had about three more
On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg wrote:
>
> I have done profiling of int_sqrt function using perf tool for 10 times.
> For this purpose i have created a userspace program which uses sqrt function
> from 1 to a million.
Hmm. I did that too, and it doesn't improve things for me. In
On Thu, Feb 5, 2015 at 10:33 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
Hmm. I did that too [..]
Side note: one difference in our results (apart from possibly just CPU
uarch details) is that my
On Tue, Feb 3, 2015 at 7:42 AM, Anshul Garg aksgarg1...@gmail.com wrote:
I have done profiling of int_sqrt function using perf tool for 10 times.
For this purpose i have created a userspace program which uses sqrt function
from 1 to a million.
Hmm. I did that too, and it doesn't improve
On Thu, Feb 5, 2015 at 10:20 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
Hmm. I did that too [..]
Side note: one difference in our results (apart from possibly just CPU
uarch details) is that my loop goes to 100M to make it easier to just
time it. Which means that my load
On Thu, Feb 5, 2015 at 10:43 AM, Anshul Garg aksgarg1...@gmail.com wrote:
NOTE ::
I have not used gcc optimizations while compilation.
With O2 level optimization proposed solution is taking more time.
The thing is, the kernel is compiled with -O2, so that's what matters.
Also, for very tight
On Tue, Feb 3, 2015 at 10:11 AM, Davidlohr Bueso wrote:
> On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
>> IOW, instead of
>>
>> m = 1UL << (BITS_PER_LONG - 2);
>>
>> perhaps something like
>>
>>m = 1UL << (BITS_PER_LONG/2- 2);
>>if (m < x)
>>m <<= BITS_PER_LONG/2;
On Tue, Feb 3, 2015 at 10:17 AM, Davidlohr Bueso wrote:
> On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote:
>> Hmm. I don't disagree, but would like some more feedback.
>>
>> Davidlohr - you were the person to touch this function last (commit
>> 30493cc9dddb: "lib/int_sqrt.c: optimize
On Tue, Feb 3, 2015 at 10:17 AM, Davidlohr Bueso d...@stgolabs.net wrote:
On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote:
Hmm. I don't disagree, but would like some more feedback.
Davidlohr - you were the person to touch this function last (commit
30493cc9dddb: lib/int_sqrt.c:
On Tue, Feb 3, 2015 at 10:11 AM, Davidlohr Bueso d...@stgolabs.net wrote:
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
IOW, instead of
m = 1UL (BITS_PER_LONG - 2);
perhaps something like
m = 1UL (BITS_PER_LONG/2- 2);
if (m x)
m = BITS_PER_LONG/2;
On doing
On Mon, 2015-02-02 at 20:41 -0800, Davidlohr Bueso wrote:
> On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
> > IOW, instead of
> >
> > m = 1UL << (BITS_PER_LONG - 2);
> >
> > perhaps something like
> >
> >m = 1UL << (BITS_PER_LONG/2- 2);
> >if (m < x)
> >m <<=
On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote:
> Hmm. I don't disagree, but would like some more feedback.
>
> Davidlohr - you were the person to touch this function last (commit
> 30493cc9dddb: "lib/int_sqrt.c: optimize square root algorithm"), and
> you did so for performance reasons.
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
> IOW, instead of
>
> m = 1UL << (BITS_PER_LONG - 2);
>
> perhaps something like
>
>m = 1UL << (BITS_PER_LONG/2- 2);
>if (m < x)
>m <<= BITS_PER_LONG/2;
>
> (assuming gcc can change that code into a "cmov") might cut
On Mon, Feb 2, 2015 at 11:10 AM, Joe Perches wrote:
>
> Perhaps removing the while and using fls(x) would be better.
No. fls is often slow, and you then do have to also round it down to
an even bit number etc, so it gets somewhat complex too.
We *probably* have some argument range that we care
On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds
wrote:
>
> (I'm also not entirely sure what uses int_sqrt() that ends up being so
> performance-critical, so it would be good to document that too, since
> that probably also matters for the "what's the normal argument range"
> question..)
... it's
On Mon, 2015-02-02 at 09:12 -0800, Anshul Garg wrote:
> From: Anshul Garg
>
> Unnecessary instructions are executing even though m is
> greater than x so added logic to make m less than equal to
> x before performing these operations.
>
> Signed-off-by: Anshul Garg
> ---
> lib/int_sqrt.c |
Hmm. I don't disagree, but would like some more feedback.
Davidlohr - you were the person to touch this function last (commit
30493cc9dddb: "lib/int_sqrt.c: optimize square root algorithm"), and
you did so for performance reasons. And in fact, when you did that,
you removed that initial loop:
-
On Mon, Feb 2, 2015 at 11:10 AM, Joe Perches j...@perches.com wrote:
Perhaps removing the while and using fls(x) would be better.
No. fls is often slow, and you then do have to also round it down to
an even bit number etc, so it gets somewhat complex too.
We *probably* have some argument range
On Mon, 2015-02-02 at 11:00 -0800, Linus Torvalds wrote:
Hmm. I don't disagree, but would like some more feedback.
Davidlohr - you were the person to touch this function last (commit
30493cc9dddb: lib/int_sqrt.c: optimize square root algorithm), and
you did so for performance reasons. And in
On Mon, 2015-02-02 at 20:41 -0800, Davidlohr Bueso wrote:
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
IOW, instead of
m = 1UL (BITS_PER_LONG - 2);
perhaps something like
m = 1UL (BITS_PER_LONG/2- 2);
if (m x)
m = BITS_PER_LONG/2;
Assuming small
On Mon, 2015-02-02 at 11:13 -0800, Linus Torvalds wrote:
IOW, instead of
m = 1UL (BITS_PER_LONG - 2);
perhaps something like
m = 1UL (BITS_PER_LONG/2- 2);
if (m x)
m = BITS_PER_LONG/2;
(assuming gcc can change that code into a cmov) might cut down the
lots of
Hmm. I don't disagree, but would like some more feedback.
Davidlohr - you were the person to touch this function last (commit
30493cc9dddb: lib/int_sqrt.c: optimize square root algorithm), and
you did so for performance reasons. And in fact, when you did that,
you removed that initial loop:
-
On Mon, 2015-02-02 at 09:12 -0800, Anshul Garg wrote:
From: Anshul Garg aksgarg1...@gmail.com
Unnecessary instructions are executing even though m is
greater than x so added logic to make m less than equal to
x before performing these operations.
Signed-off-by: Anshul Garg
On Mon, Feb 2, 2015 at 11:00 AM, Linus Torvalds
torva...@linux-foundation.org wrote:
(I'm also not entirely sure what uses int_sqrt() that ends up being so
performance-critical, so it would be good to document that too, since
that probably also matters for the what's the normal argument range
50 matches
Mail list logo