On Feb 20, 2011, at 4:43 AM, Joel Falcou wrote:

> On 20/02/11 12:41, Eric Niebler wrote:
>> On 2/20/2011 6:40 PM, Joel Falcou wrote:
>>> On 20/02/11 12:31, Karsten Ahnert wrote:
>>>> It is amazing that the proto expression is faster than the naive one.
>>>> The compiler must really love the way proto evaluates an expression.
>>> I still don't really know why. The usual speed-up in our use cases here
>>> ranges from 10 to 50%.
>> That's weird.
>> 
> Well, for me it's weird in the good way so I don't complain. The old version
> of nt2 had cases where we were thrice as fast as the same vector+iterator
> based code ...
> _______________________________________________
> proto mailing list
> proto@lists.boost.org
> http://lists.boost.org/mailman/listinfo.cgi/proto


To explore the issue further, I modified the originally posted test code (see
http://pastebin.com/1Vr9BkPP).
The modifications include a transform-based evaluator, a lambda-expression-based
example, and some attributes to keep the evaluation functions from being
inlined.
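
For anyone who hasn't used these attributes, here is a minimal sketch of what I mean by keeping an evaluation function from being inlined. It is not the code from the pastebin: the macro name BENCH_NOINLINE, the function eval_loop, and the expression inside it are made up for illustration. The attribute spelling is the GCC/Clang one; other compilers need something else, so the macro falls back to nothing there.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// GCC and Clang spell this __attribute__((noinline)).
// BENCH_NOINLINE is a hypothetical helper macro, not part of any library.
#if defined(__GNUC__)
#  define BENCH_NOINLINE __attribute__((noinline))
#else
#  define BENCH_NOINLINE
#endif

// Hypothetical stand-in for one evaluation function of the benchmark.
// The attribute asks the compiler to keep this as a real call, so the
// caller's timing loop cannot be merged with the body below.
BENCH_NOINLINE
void eval_loop(std::vector<double>& out,
               const std::vector<double>& a,
               const std::vector<double>& b)
{
    // All three vectors are expected to have the same length.
    assert(out.size() == a.size() && a.size() == b.size());
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] + 2.0 * b[i];   // illustrative expression only
}
```

The same attribute would go on the proto-based evaluators so that all variants are called the same way from the timing loop.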

First, the numbers (averaged over 5 iterations of the main loop).  Everything
was compiled with -O3 against Boost 1.45.

MacBook Pro, 10.6.6, Core 2 Duo

                      ProtoContext   ProtoTransform   ProtoLambda   Loop
GCC 4.2.1 (Apple)   : 5.3565438      5.3721942        126.38458     1.3657978
GCC 4.4.5           : 1.8878364      1.8845548        70.056237     0.942303
GCC 4.5.2           : 1.8840608      1.889619         1.2806688     1.0589558
GCC 4.6.0 (2/5/11)  : 1.8854768      1.8834438        1.278347      1.2345208
CLANG 2.9 (125472)  : 5.455976       5.4627628        3.825104      1.2330524

Now, removing the __attribute__((noinline)) gives (columns in the same order):

                      ProtoContext   ProtoTransform   ProtoLambda   Loop
GCC 4.2.1 (Apple)   : 4.1448478      5.3795842        126.53211     1.3215378
GCC 4.4.5           : 1.2505956      1.2500816        69.409665     0.7198288
GCC 4.5.2           : 0.596143       0.7213138        0.71969283    0.7211534
GCC 4.6.0 (2/5/11)  : 1.2942638      1.4324828        0.646147      0.6632324
CLANG 2.9 (125472)  : 1.2975226      1.2966478        1.3849834     1.2452362

I'm not sure how meaningful this second set of numbers is.  If the evaluation
functions are inlined, the compiler can see that evaluating them num_of_steps
times is unnecessary, since the data doesn't change between iterations.  I
believe it then optimizes away parts of the loop in some cases.
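
To make that concern concrete, here is a rough sketch of the shape of the timing loop as I understand it. Only num_of_steps comes from the test code; the function names, the BENCH_NOINLINE macro, and the expression are hypothetical. Both variants compute the same result. The difference is only in what the optimizer is allowed to see: in the inlinable variant it may notice that every step writes the same values and drop some of the repeated work, while the noinline variant makes that less likely (I would not claim it makes it impossible).

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper macro; __attribute__((noinline)) is GCC/Clang syntax.
#if defined(__GNUC__)
#  define BENCH_NOINLINE __attribute__((noinline))
#else
#  define BENCH_NOINLINE
#endif

typedef std::vector<double> vec;

// Body is visible at the call site, so the compiler may inline it.
inline void eval_inlinable(vec& out, const vec& a, const vec& b)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] * b[i] + a[i];   // illustrative expression only
}

// Same body, but the compiler is asked to keep it as a real call.
BENCH_NOINLINE void eval_opaque(vec& out, const vec& a, const vec& b)
{
    for (std::size_t i = 0; i < out.size(); ++i)
        out[i] = a[i] * b[i] + a[i];
}

// Timed region, inlinable variant: a and b never change between steps,
// so after inlining the repeated evaluation is visibly redundant.
void run_inlinable(vec& out, const vec& a, const vec& b, int num_of_steps)
{
    assert(out.size() == a.size() && a.size() == b.size());
    for (int s = 0; s < num_of_steps; ++s)
        eval_inlinable(out, a, b);
}

// Timed region, noinline variant: each step remains a function call.
void run_opaque(vec& out, const vec& a, const vec& b, int num_of_steps)
{
    assert(out.size() == a.size() && a.size() == b.size());
    for (int s = 0; s < num_of_steps; ++s)
        eval_opaque(out, a, b);
}
```

If that is what is happening, the second table measures how well each compiler spots the redundancy rather than how fast each evaluator is.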

A lot of the additional code came from Eric's cpp-next articles.

Nate


