On Thu, 2009-01-29 at 19:51 -0800, [email protected] wrote:
> the new C0x standard will add atomic ops and guarantees (some of which are 
> not necessarily provided by the chip, but have to be provided by the 
> compiler/library instead), so watch for it, but test the performance of 
> them before you trust them

This is very important work, especially if you think about future
advances in hardware design. However, I think we will be years away from
the point where one can actually use this and hope to be somewhat
portable. Same for performance: early implementations will probably be
sub-optimal (though it should be fairly simple to map current
compiler-specific options for atomic ops to the new standard once... but
we know what happens when new standards come out...).
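
To illustrate what such a mapping could look like, here is a minimal
sketch. It assumes that the standardized interface is the one found in
<stdatomic.h> and that the compiler-specific fallback is GCC's __sync
builtins. The my_atomic_* names are hypothetical and exist only for this
example; they are not taken from rsyslog or from any standard.

```c
#include <assert.h>

/* Hypothetical wrapper: use the standard header where the compiler
 * advertises it, otherwise fall back to GCC's __sync builtins. */
#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L && \
    !defined(__STDC_NO_ATOMICS__)
#include <stdatomic.h>
typedef atomic_int my_atomic_int;

static inline int my_atomic_fetch_add(my_atomic_int *p, int v)
{
    return atomic_fetch_add(p, v);      /* returns the previous value */
}

static inline int my_atomic_load(my_atomic_int *p)
{
    return atomic_load(p);
}
#elif defined(__GNUC__)
typedef volatile int my_atomic_int;

static inline int my_atomic_fetch_add(my_atomic_int *p, int v)
{
    return __sync_fetch_and_add(p, v);  /* returns the previous value */
}

static inline int my_atomic_load(my_atomic_int *p)
{
    return __sync_fetch_and_add(p, 0);  /* atomic read via a no-op add */
}
#else
#error "no atomic primitive known for this compiler"
#endif

/* Small usage example: a counter bumped through the wrapper. */
void my_atomic_demo(void)
{
    my_atomic_int counter = 0;

    assert(my_atomic_fetch_add(&counter, 2) == 0);
    assert(my_atomic_load(&counter) == 2);
}
```

Callers only ever see the wrapper, so switching back ends is a matter
of the preprocessor test at the top. Whether the standard variant is
actually faster on a given platform still has to be measured.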

> > On the other hand -O3 does things like loop unrolling, which definitely
> > is a bad idea with modern cache systems.
> >
> > My preliminary conclusion is that -O2 is probably best, and may be
> > tuned by turning on and off specific optimizations via their specific
> > compiler switches.
> 
> this has been the prevailing wisdom for many years, but I've seen myself 
> many cases where -Os has ended up being faster in the real world, in spite 
> of the various things that -O2 does 'better'

I think the phrase "it depends on the scenario" is very important here.

> is it the case that -Os would break things? or just that you think its 
> alignment may not be as good?

It does not break things. The alignment for any structures that are
passed as part of the API should be properly contained in the header
files. However, I have not specifically tested this.

The point is just that, at least on some machines, non-aligned addresses
severely hit cache performance. So optimizing for size, and as a
side-effect generating unaligned data accesses, can be a real
performance drawback. It may well cost more performance than the
improved L1 (or trace cache) performance offers.
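
To make "non-aligned" a bit more concrete, here is a small sketch of
the arithmetic involved. It assumes a 64-byte cache line, which is
common but depends on the hardware; the helper names are made up for
this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed cache-line size in bytes; hardware-dependent. */
#define CACHE_LINE 64u

/* Nonzero if addr is a multiple of align (align must be a power of two). */
static inline int is_aligned(uintptr_t addr, size_t align)
{
    return (addr & (align - 1)) == 0;
}

/* Nonzero if an access of size bytes (size >= 1) starting at addr
 * touches two cache lines instead of one. */
static inline int straddles_line(uintptr_t addr, size_t size)
{
    return (addr / CACHE_LINE) != ((addr + size - 1) / CACHE_LINE);
}

/* Usage example with plain numbers instead of real pointers. */
void alignment_demo(void)
{
    assert(is_aligned(128, 8));
    assert(!straddles_line(56, 8));  /* bytes 56..63 stay in one line */
    assert(straddles_line(60, 8));   /* bytes 60..67 span two lines   */
}
```

An access that spans two lines needs both of them in the cache, which
is where the extra cost comes from. How large that cost is differs from
machine to machine.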

In any case, if we go down to that level, I think there are better
places to test and optimize - not to mention that on the upper layer (OS
calls!) there is still room for improvement. One of my favorite CPU-level
optimizations is the "exception system" that is currently in use in
rsyslog. Thanks to your message, I've finally written down some
information on it. I've done that on the forum, so that I can easily
keep a permanent record of the discussion (and in an easier-to-follow
form than with the mail archive):

http://kb.monitorware.com/optimizing-exception-handling-t8911.html
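
For those who do not want to follow the link, the general idea of
return-code based "exceptions" in C can be sketched as follows. This is
an illustration only and assumes a status-code-plus-goto design; the
macro, type and function names are hypothetical and are not copied from
the rsyslog source.

```c
#include <assert.h>
#include <stdlib.h>

typedef enum { RET_OK = 0, RET_ERR_NOMEM = -1, RET_ERR_INVAL = -2 } ret_t;

/* Hypothetical macros: both expect a local variable `ret` and a
 * `finalize` label in the calling function. */
#define CHECK(expr) do { if ((ret = (expr)) != RET_OK) goto finalize; } while (0)
#define FAIL(code)  do { ret = (code); goto finalize; } while (0)

static ret_t check_len(int len)
{
    return len < 0 ? RET_ERR_INVAL : RET_OK;
}

/* Allocates a zero-terminated buffer of len bytes. On error nothing
 * is leaked and *out is left untouched. */
ret_t make_buffer(int len, char **out)
{
    ret_t ret = RET_OK;
    char *buf = NULL;

    CHECK(check_len(len));           /* "throw" if the callee failed */
    buf = malloc((size_t)len + 1);
    if (buf == NULL)
        FAIL(RET_ERR_NOMEM);         /* "throw" our own error */
    buf[len] = '\0';
    *out = buf;
    buf = NULL;                      /* ownership moved to the caller */

finalize:                            /* single cleanup/exit point */
    free(buf);                       /* free(NULL) is a no-op */
    return ret;
}

/* Usage example: one failing and one successful call. */
void exception_demo(void)
{
    char *p = NULL;

    assert(make_buffer(-1, &p) == RET_ERR_INVAL);
    assert(make_buffer(4, &p) == RET_OK);
    free(p);
}
```

On the success path each checked call costs one compare and a branch
that is not taken, and all cleanup lives in one place.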

Feedback is appreciated.

Rainer

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com