On Thu, 2009-01-29 at 19:51 -0800, [email protected] wrote:
> the new C0x standard will add atomic ops and guarantees (some of which are
> not necessarily provided by the chip, but have to be provided by the
> compiler/library instead), so watch for it, but test the performance of
> them before you trust them

This is very important work, especially if you think about future advances
in hardware design. However, I think we are years away from the point where
one can actually use this and hope to be somewhat portable. The same goes
for performance: early implementations will probably be sub-optimal (though
it should be fairly simple to map current compiler-specific options for
atomic ops to the new standard once... but we know what happens when new
standards come out...).

> > On the other hand -O3 does things like loop unrolling, which definitely
> > is a bad idea with modern cache systems.
> >
> > My preliminary conclusion is that -O2 is probably best, and may be
> > tuned by turning on and off specific optimizations via their specific
> > compiler switches.
>
> this has been the prevailing wisdom for many years, but I've seen myself
> many cases where -Os has ended up being faster in the real world, in spite
> of the various things that -O2 does 'better'

I think the phrase "it depends on the scenario" is very important here.

> is it the case that -Os would break things? or just that you think its
> alignment may not be as good?

It does not break things. The alignment for any structures that are passed
as part of the API should be properly specified in the header files.
However, I have not specifically tested this. The point is just that, at
least on some machines, non-aligned addresses severely hurt cache
performance. So optimizing for size, and as a side-effect generating
unaligned data accesses, can be a real performance drawback. It may well
cost more performance than the improved L1 (or trace cache) behavior gains.

In any case, if we go down to that level, I think there are better places
to test and optimize - not to mention that on the upper layer (OS calls!)
there is still room for improvement. One of my favorite CPU-level
optimizations is the "exception system" that is currently in use in rsyslog.
Thanks to your message, I've finally written down some information on it.
I've done that on the forum, so that I can easily keep a permanent record
of the discussion (and in an easier-to-follow form than with the mail
archive):

http://kb.monitorware.com/optimizing-exception-handling-t8911.html

Feedback is appreciated.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com
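
The remark in this message about mapping current compiler-specific atomic
ops to the new standard can be sketched roughly as follows. This is only a
minimal sketch under stated assumptions, not rsyslog code: it assumes the
standard interface looks like C11 <stdatomic.h>, and that the fallback
compiler offers GCC-style __sync builtins. The names counter_t,
COUNTER_INIT, COUNTER_INC, COUNTER_GET and bump_counter are hypothetical
and exist only for illustration.

```c
#include <assert.h>

#if defined(__STDC_VERSION__) && __STDC_VERSION__ >= 201112L \
    && !defined(__STDC_NO_ATOMICS__)
/* Standard path: C11 atomics are available. */
#include <stdatomic.h>
typedef atomic_int counter_t;
#define COUNTER_INIT(p) atomic_init((p), 0)
#define COUNTER_INC(p)  atomic_fetch_add((p), 1)
#define COUNTER_GET(p)  atomic_load(p)
#else
/* Fallback path (assumption): GCC-style legacy __sync builtins. */
typedef int counter_t;
#define COUNTER_INIT(p) (*(p) = 0)
#define COUNTER_INC(p)  __sync_fetch_and_add((p), 1)
/* Adding 0 is a common idiom for an atomic read with these builtins. */
#define COUNTER_GET(p)  __sync_fetch_and_add((p), 0)
#endif

/* Hypothetical helper: increments a fresh counter `times` times through
 * the wrapper macros and returns the final value. Single-threaded on
 * purpose; it only shows that both paths share one calling convention. */
int bump_counter(int times)
{
    counter_t hits;
    int i;

    assert(times >= 0);
    COUNTER_INIT(&hits);
    for (i = 0; i < times; i++)
        (void)COUNTER_INC(&hits);
    return COUNTER_GET(&hits);
}
```

With wrappers like these, calling code does not need to know which
implementation is underneath, so the two paths can be benchmarked against
each other before the standard one is trusted.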

