Re: [proto] proto performance

2011-02-24 Thread Karsten Ahnert
> MacBook Pro, 10.6.6, Core 2 Duo
>
>                      ProtoContext  ProtoTransform  ProtoLambda  Loop
> GCC 4.2.1 (Apple)  : 5.3565438     5.3721942       126.38458    1.3657978
> GCC 4.4.5          : 1.8878364     1.8845548       70.056237    0.942303
> GCC 4.5.2          : 1.8840608     1.889619        1.2806688    1.0589558
> GCC 4.6.0 (2/5/11) : 1.8854768     1.8834438       1.278347     1.2345208
> CLANG 2.9 (125472) : 5.455976      5.4627628       3.825104     1.2330524
>
> Now, removing the ((noinline)), gives (in the same order)
>
> GCC 4.2.1 (Apple)  : 4.1448478     5.3795842       126.53211    1.3215378
> GCC 4.4.5          : 1.2505956     1.2500816       69.409665    0.7198288
> GCC 4.5.2          : 0.596143      0.7213138       0.71969283   0.7211534
> GCC 4.6.0 (2/5/11) : 1.2942638     1.4324828       0.646147     0.6632324
> CLANG 2.9 (125472) : 1.2975226     1.2966478       1.3849834    1.2452362

Interesting results. I have done a similar test for loops (for, while,
with/without pointers) and obtained similar results. Everything depends
on the compiler.

I think the order of the above numbers will drastically change if the
expression is small, like x3 = x1 + 2.0 * x2.

> I'm not sure how meaningful this second set of numbers is.  If the
> evaluation functions are inlined, the compiler can realize that evaluating
> them num_of_steps times is unnecessary since the data isn't changing
> between iterations.  It then (I believe) optimizes out certain parts of
> the loop in certain cases.

Maybe it would be better to evaluate something with the increment assign
operator, x3 += x1 + 2.0 * x2.
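
A minimal sketch of that idea, written as a plain loop rather than with Proto. The names accumulate_step and run_steps are made up for illustration and are not taken from the posted benchmark. Because every step reads the x3 written by the previous step, the repeated evaluation is no longer loop-invariant, which should make it harder for the compiler to drop iterations:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical kernel: accumulate into x3 instead of overwriting it, so each
// call depends on the result of the previous call.
void accumulate_step(std::vector<double>& x3,
                     const std::vector<double>& x1,
                     const std::vector<double>& x2)
{
    assert(x1.size() == x3.size() && x2.size() == x3.size());
    for (std::size_t i = 0; i < x3.size(); ++i)
        x3[i] += x1[i] + 2.0 * x2[i];
}

// The region one would time: num_of_steps dependent updates of x3.
void run_steps(std::vector<double>& x3,
               const std::vector<double>& x1,
               const std::vector<double>& x2,
               std::size_t num_of_steps)
{
    for (std::size_t step = 0; step < num_of_steps; ++step)
        accumulate_step(x3, x1, x2);
}
```

The result should also be used after the timed region (printed or summed), otherwise the whole computation may still be removed as dead code.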
___
proto mailing list
proto@lists.boost.org
http://lists.boost.org/mailman/listinfo.cgi/proto


Re: [proto] proto performance

2011-02-24 Thread Nate Knight
Not sure what happened to those tables.  I'll try again.

MacBook Pro, 10.6.6, Core 2 Duo

                     ProtoContext  ProtoTransform  ProtoLambda  Loop
GCC 4.2.1 (Apple)  : 5.3565438     5.3721942       126.38458    1.3657978
GCC 4.4.5          : 1.8878364     1.8845548       70.056237    0.942303
GCC 4.5.2          : 1.8840608     1.889619        1.2806688    1.0589558
GCC 4.6.0 (2/5/11) : 1.8854768     1.8834438       1.278347     1.2345208
CLANG 2.9 (125472) : 5.455976      5.4627628       3.825104     1.2330524

Now, removing the ((noinline)), gives (in the same order)

GCC 4.2.1 (Apple)  : 4.1448478     5.3795842       126.53211    1.3215378
GCC 4.4.5          : 1.2505956     1.2500816       69.409665    0.7198288
GCC 4.5.2          : 0.596143      0.7213138       0.71969283   0.7211534
GCC 4.6.0 (2/5/11) : 1.2942638     1.4324828       0.646147     0.6632324
CLANG 2.9 (125472) : 1.2975226     1.2966478       1.3849834    1.2452362

Nate



Re: [proto] proto performance

2011-02-24 Thread Nate Knight

On Feb 20, 2011, at 4:43 AM, Joel Falcou wrote:

> On 20/02/11 12:41, Eric Niebler wrote:
>> On 2/20/2011 6:40 PM, Joel Falcou wrote:
>>> On 20/02/11 12:31, Karsten Ahnert wrote:
>>>> It is amazing that the proto expression is faster than the naive one.
>>>> The compiler must really love the way proto evaluates an expression.
>>> I still don't really know why. The usual speed-up in our use cases here
>>> ranges from 10 to 50%.
>> That's weird.
>>
> Well, for me it's weird in the good way so I don't complain. An old version
> of nt2 had cases where we were thrice as fast as the same vector+iterator
> based code ...


To explore the issue further I modified the originally posted test code (see
http://pastebin.com/1Vr9BkPP). The modifications include a transform-based
evaluator, a lambda-expression-based example, and some attributes to keep the
evaluation functions from being inlined.

First, the numbers (average after 5 iterations of the main loop). All
compilation was done with -O3 against Boost 1.45.

MacBook Pro, 10.6.6, Core 2 Duo

                     ProtoContext  ProtoTransform  ProtoLambda  Loop
GCC 4.2.1 (Apple)  : 5.3565438     5.3721942       126.38458    1.3657978
GCC 4.4.5          : 1.8878364     1.8845548       70.056237    0.942303
GCC 4.5.2          : 1.8840608     1.889619        1.2806688    1.0589558
GCC 4.6.0 (2/5/11) : 1.8854768     1.8834438       1.278347     1.2345208
CLANG 2.9 (125472) : 5.455976      5.4627628       3.825104     1.2330524

Now, removing the ((noinline)), gives (in the same order)

GCC 4.2.1 (Apple)  : 4.1448478     5.3795842       126.53211    1.3215378
GCC 4.4.5          : 1.2505956     1.2500816       69.409665    0.7198288
GCC 4.5.2          : 0.596143      0.7213138       0.71969283   0.7211534
GCC 4.6.0 (2/5/11) : 1.2942638     1.4324828       0.646147     0.6632324
CLANG 2.9 (125472) : 1.2975226     1.2966478       1.3849834    1.2452362

I'm not sure how meaningful this second set of numbers is.  If the evaluation
functions are inlined, the compiler can realize that evaluating them
num_of_steps times is unnecessary since the data isn't changing between
iterations.  It then (I believe) optimizes out certain parts of the loop in
certain cases.
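
For readers without access to the pastebin, here is a rough sketch of what keeping an evaluation function out of line can look like. It is not the posted code: the macro BENCH_NOINLINE, the functions evaluate and run, and the expression x3 = x1 + 2.0 * x2 are placeholders chosen for illustration. The attribute itself is the GCC/Clang __attribute__((noinline)).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC and Clang both define __GNUC__ and understand this attribute. Other
// compilers get an empty macro, so the function may be inlined there.
#if defined(__GNUC__)
#  define BENCH_NOINLINE __attribute__((noinline))
#else
#  define BENCH_NOINLINE
#endif

// Hypothetical evaluation function standing in for the Proto evaluators.
BENCH_NOINLINE
void evaluate(std::vector<double>& x3,
              const std::vector<double>& x1,
              const std::vector<double>& x2)
{
    assert(x1.size() == x3.size() && x2.size() == x3.size());
    for (std::size_t i = 0; i < x3.size(); ++i)
        x3[i] = x1[i] + 2.0 * x2[i];
}

// The timed region: the same evaluation repeated num_of_steps times.
// With evaluate() kept out of line, the loop makes num_of_steps real calls.
void run(std::vector<double>& x3,
         const std::vector<double>& x1,
         const std::vector<double>& x2,
         std::size_t num_of_steps)
{
    for (std::size_t step = 0; step < num_of_steps; ++step)
        evaluate(x3, x1, x2);
}
```

Without the attribute, the optimizer is free to inline evaluate() into run() and notice that every iteration writes the same values, which is the effect suspected in the second table.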

A lot of the additional code came from Eric's cpp-next articles.

Nate





Re: [proto] proto performance

2011-02-20 Thread Joel Falcou

On 20/02/11 12:41, Eric Niebler wrote:
> On 2/20/2011 6:40 PM, Joel Falcou wrote:
>> On 20/02/11 12:31, Karsten Ahnert wrote:
>>> It is amazing that the proto expression is faster than the naive one.
>>> The compiler must really love the way proto evaluates an expression.
>>
>> I still don't really know why. The usual speed-up in our use cases here
>> ranges from 10 to 50%.
>
> That's weird.

Well, for me it's weird in the good way so I don't complain. An old version
of nt2 had cases where we were thrice as fast as the same vector+iterator
based code ...


Re: [proto] proto performance

2011-02-20 Thread Eric Niebler
On 2/20/2011 6:40 PM, Joel Falcou wrote:
> On 20/02/11 12:31, Karsten Ahnert wrote:
>> It is amazing that the proto expression is faster than the naive one.
>> The compiler must really love the way proto evaluates an expression.
> 
> I still don't really know why. The usual speed-up in our use cases here
> ranges from 10 to 50%.

That's weird.

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com


Re: [proto] proto performance

2011-02-20 Thread Joel Falcou

On 20/02/11 12:31, Karsten Ahnert wrote:
> It is amazing that the proto expression is faster than the naive one.
> The compiler must really love the way proto evaluates an expression.

I still don't really know why. The usual speed-up in our use cases here
ranges from 10 to 50%.



Re: [proto] proto performance

2011-02-20 Thread Karsten Ahnert
On 02/20/2011 12:08 PM, Joel Falcou wrote:
> On 20/02/11 12:03, Karsten Ahnert wrote:
>> On 02/20/2011 12:02 PM, Joel Falcou wrote:
>>> On 20/02/11 11:55, Karsten Ahnert wrote:
>>>> On 02/20/2011 11:57 AM, Eric Niebler wrote:
>>>> It is gcc 4.4 on a 64bit machine. Of course, I compile with -O3.
>>>>
>>> Ding! welcome to gcc-4.4 64bits compiler hellfest.
>>> Try 4.5, 4.4 64bits can't inline for w/e reason.
>> Great, I tried with gcc 4.5 and the proto part is now around 5-10
>> percent faster. Thank you.
> 
> We banged our heads for weeks on this issue earlier until we found some
> dubious bug report in gcc bugzilla flagged as nofix :/
> Seems the 4.5 branch solved it somehow.

It is amazing that the proto expression is faster than the naive one.
The compiler must really love the way proto evaluates an expression.
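
As background for readers new to the technique: Proto builds on expression templates. The sketch below is a hand-rolled toy, not Proto and not the benchmark code; the namespace sketch and the types Vec, Add and Scale are invented for illustration. It shows how x3 = x1 + 2.0 * x2 can be turned into a small tree of lightweight objects that is evaluated in one pass on assignment. If the compiler inlines everything, this can collapse to the same loop one would write by hand. It does not by itself explain a speed-up over the hand-written loop.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

namespace sketch {

// Node for "l + r". Holds references only; valid within one full expression.
template <class L, class R>
struct Add {
    const L& l;
    const R& r;
    double operator[](std::size_t i) const { return l[i] + r[i]; }
};

// Node for "s * r" with a scalar on the left.
template <class R>
struct Scale {
    double s;
    const R& r;
    double operator[](std::size_t i) const { return s * r[i]; }
};

struct Vec {
    std::vector<double> data;
    explicit Vec(std::size_t n, double v = 0.0) : data(n, v) {}
    double operator[](std::size_t i) const { return data[i]; }

    // Assigning an expression walks the tree once per element: a single loop,
    // no temporary vectors.
    template <class Expr>
    Vec& operator=(const Expr& e) {
        for (std::size_t i = 0; i < data.size(); ++i)
            data[i] = e[i];
        return *this;
    }
};

// Found through argument-dependent lookup for types in this namespace only.
template <class L, class R>
Add<L, R> operator+(const L& l, const R& r) { return Add<L, R>{l, r}; }

template <class R>
Scale<R> operator*(double s, const R& r) { return Scale<R>{s, r}; }

} // namespace sketch
```

With sketch::Vec objects x1, x2 and x3 of equal size, the statement x3 = x1 + 2.0 * x2; builds an Add<Vec, Scale<Vec>> and evaluates it element by element.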

> 
> You can also try compiling with 4.4 using -m32


-- 
Dr. Karsten Ahnert
Ambrosys GmbH - Gesellschaft für Management komplexer Systeme
Geschwister-Scholl-Str. 63a
D-14471 Potsdam

Tel: +4917682001688
Fax: +493319791300

Ambrosys GmbH - Gesellschaft für Management komplexer Systems
Gesellschaft mit beschränkter Haftung
Sitz der Gesellschaft: Geschwister-Scholl-Str. 63a, 14471 Potsdam
Registergericht: Amtsgericht Potsdam, HRB 21228 P
Geschäftsführer: Dr. Karsten Ahnert, Dr. Markus Abel


Re: [proto] proto performance

2011-02-20 Thread Joel Falcou

On 20/02/11 12:03, Karsten Ahnert wrote:
> On 02/20/2011 12:02 PM, Joel Falcou wrote:
>> On 20/02/11 11:55, Karsten Ahnert wrote:
>>> On 02/20/2011 11:57 AM, Eric Niebler wrote:
>>> It is gcc 4.4 on a 64bit machine. Of course, I compile with -O3.
>>
>> Ding! welcome to gcc-4.4 64bits compiler hellfest.
>> Try 4.5, 4.4 64bits can't inline for w/e reason.
>
> Great, I tried with gcc 4.5 and the proto part is now around 5-10
> percent faster. Thank you.

We banged our heads for weeks on this issue earlier until we found some
dubious bug report in gcc bugzilla flagged as nofix :/

Seems the 4.5 branch solved it somehow.

You can also try compiling with 4.4 using -m32


Re: [proto] proto performance

2011-02-20 Thread Karsten Ahnert
On 02/20/2011 12:02 PM, Joel Falcou wrote:
> On 20/02/11 11:55, Karsten Ahnert wrote:
>> On 02/20/2011 11:57 AM, Eric Niebler wrote:
>> It is gcc 4.4 on a 64bit machine. Of course, I compile with -O3.
>>
> Ding! welcome to gcc-4.4 64bits compiler hellfest.
> Try 4.5, 4.4 64bits can't inline for w/e reason.

Great, I tried with gcc 4.5 and the proto part is now around 5-10
percent faster. Thank you.





Re: [proto] proto performance

2011-02-20 Thread Joel Falcou

On 20/02/11 11:55, Karsten Ahnert wrote:
> On 02/20/2011 11:57 AM, Eric Niebler wrote:
> It is gcc 4.4 on a 64bit machine. Of course, I compile with -O3.

Ding! welcome to gcc-4.4 64bits compiler hellfest.
Try 4.5, 4.4 64bits can't inline for w/e reason.


Re: [proto] proto performance

2011-02-20 Thread Joel Falcou

On 20/02/11 11:57, Eric Niebler wrote:
> On 2/20/2011 5:52 PM, Joel Falcou wrote:
>> 1/ How do you measure performance? Anything which is not the median of
>> 1-5K runs is meaningless.
>
> You can see how he measures it in the code he posted.

I clicked send too fast :p

>> 2/ Don't use contexts; transforms are usually better optimized by compilers
>
> That really shouldn't matter.

Well, in our tests it does. At least back in gcc 4.4

>> 3/ Are you using gcc on a 64-bit system? On this configuration a gcc
>> bug prevents proto from being inlined.
>
> Naive question: are you actually compiling with optimizations on? -O3
> -DNDEBUG? And are you sure the compiler isn't lifting the whole thing
> out of the loop, since the computation is the same with each iteration?

Oh yeah I forgot these.

On my machine (mac osx dual core intel with g++4-5) I get a 25% speed-up
with proto ...



Re: [proto] proto performance

2011-02-20 Thread Karsten Ahnert
On 02/20/2011 11:57 AM, Eric Niebler wrote:
> On 2/20/2011 5:52 PM, Joel Falcou wrote:
>> 1/ How do you measure performance? Anything which is not the median of
>> 1-5K runs is meaningless.
> 
> You can see how he measures it in the code he posted.
> 
>> 2/ Don't use contexts; transforms are usually better optimized by compilers
> 
> That really shouldn't matter.
> 
>> 3/ Are you using gcc on a 64-bit system? On this configuration a gcc
>> bug prevents proto from being inlined.
> 
> Naive question: are you actually compiling with optimizations on? -O3
> -DNDEBUG? And are you sure the compiler isn't lifting the whole thing
> out of the loop, since the computation is the same with each iteration?

It is gcc 4.4 on a 64bit machine. Of course, I compile with -O3.





Re: [proto] proto performance

2011-02-20 Thread Eric Niebler
On 2/20/2011 5:52 PM, Joel Falcou wrote:
> 1/ How do you measure performance? Anything which is not the median of
> 1-5K runs is meaningless.

You can see how he measures it in the code he posted.

> 2/ Don't use contexts; transforms are usually better optimized by compilers

That really shouldn't matter.

> 3/ Are you using gcc on a 64-bit system? On this configuration a gcc
> bug prevents proto from being inlined.

Naive question: are you actually compiling with optimizations on? -O3
-DNDEBUG? And are you sure the compiler isn't lifting the whole thing
out of the loop, since the computation is the same with each iteration?

-- 
Eric Niebler
BoostPro Computing
http://www.boostpro.com





Re: [proto] proto performance

2011-02-20 Thread Joel Falcou
1/ How do you measure performance? Anything which is not the median of
1-5K runs is meaningless.

2/ Don't use contexts; transforms are usually better optimized by compilers.

3/ Are you using gcc on a 64-bit system? On this configuration a gcc
bug prevents proto from being inlined.
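
Point 1/ can be sketched as follows. This is an illustrative helper, not the timing code from the posted benchmark: the names median and median_runtime_seconds are invented here, and std::chrono assumes a C++11 compiler. It times a callable repeatedly and reports the median instead of a single run or a mean, so that a few slow outliers do not distort the result.

```cpp
#include <algorithm>
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Median of a set of samples. Taken by value because nth_element reorders.
double median(std::vector<double> samples)
{
    assert(!samples.empty());
    const std::size_t mid = samples.size() / 2;
    std::nth_element(samples.begin(), samples.begin() + mid, samples.end());
    const double upper = samples[mid];
    if (samples.size() % 2 == 1)
        return upper;
    // Even count: average with the largest element of the lower half.
    const double lower = *std::max_element(samples.begin(), samples.begin() + mid);
    return 0.5 * (lower + upper);
}

// Calls f() `runs` times and returns the median wall-clock time in seconds.
template <class F>
double median_runtime_seconds(F&& f, std::size_t runs)
{
    std::vector<double> samples;
    samples.reserve(runs);
    for (std::size_t i = 0; i < runs; ++i) {
        const auto start = std::chrono::steady_clock::now();
        f();
        const auto stop = std::chrono::steady_clock::now();
        samples.push_back(std::chrono::duration<double>(stop - start).count());
    }
    return median(samples);
}
```

A call such as median_runtime_seconds([&] { /* one benchmark run */ }, 1000) would match the "1-5K runs" suggested above. The callable needs to do work whose result is used, or the optimizer may remove it.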
