On Thu, 2009-01-29 at 00:36 -0800, [email protected] wrote:
> On Wed, 28 Jan 2009, Rainer Gerhards wrote:
>
> > Hi all,
> >
> > thanks to Lorenzo's help, we made good progress. It is too much to post
> > inside a mail, please have a look at my analysis of the bug:
> >
> > http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
> >
> > The short story is that we have at least improved the situation very
> > much and I hope to have fixes for all branches within the next couple of
> > days.
>
> I just finished reading through this excellent write-up
>
> one small thing.
>
> you quote the spec
>
> Accesses to cacheable memory that are split across bus widths, cache
> lines, and page boundaries are not guaranteed to be atomic
>
> and then conclude that
>
> So aligned word-access does not guarantee (not even enhance the chance) of
> atomicity.
>
> I read that to mean that the alignment requirements are more complicated,
> not that alignment is useless.

I should probably have quoted more of Intel's manual. In essence, you
need to read at least the first two full pages to get the in-depth idea.

The issue is not alignment requirements. As hardware gets more and more
parallel, caches gain more and more levels, and on-chip cores coexist
with those from other sockets ... keeping memory coherent is a costly
job. In early CPUs, Intel made memory access atomic if some alignment
requirements were met. That was cheap. In new CPUs that atomicity is
expensive. On the other hand, most data accesses do not need atomicity.
So why incur the cost for many operations when only a few need it?

As a result, Intel has removed guaranteed atomicity from those memory
accesses. In order to get atomicity, the program must tell the CPU
*explicitly* that it wants that feature. To do so, a "LOCK" prefix
(opcode) must be placed before the actual opcode (note that this is only
supported for some operations). So you get the best of both worlds: fast
execution for the majority of code and atomicity where you need it (but
it then incurs the cost).

The bottom line is that what was an atomic operation on an old CPU is no
longer an atomic operation on a new CPU. If you need that, you need to
include that extra "LOCK" opcode.

As I briefly said in the blogpost, I have not checked old Intel manuals.
So I do not know if they formerly guaranteed, as part of the instruction
set architecture, that these operations were atomic. I guess they did
not. If so, I as a programmer made some assumptions about the
micro-architecture that no longer hold true. My fault... But even if it
is Intel's fault, the C programming language does not guarantee
atomicity, nor does the compiler guarantee a specific translation to
machine code. So I, working on the C level, used assumptions that were
not valid (and as I said, I knew it was dangerous, but it worked too
well for too long... ;))

>
> you should also look at the code that's generated by -Os, with the heavily
> cached systems that we have nowadays it's common that the code being
> smaller (and therefore more of the code fitting into the L1 cache) is more
> of an advantage than the optimizations that -O3 provides.

That's a good reminder. I've just checked the gcc docs. There are some
things that I do not like about -Os, especially as it disables proper
alignment of many structures, including code. That can lead to
sub-optimal cache performance. On the other hand, -O3 does things like
loop unrolling, which definitely is a bad idea with modern cache
systems. My preliminary conclusion is that -O2 is probably best, and may
be tuned by turning specific optimizations on and off via their
individual compiler switches.

>
> congratulations on tracking down a nasty and subtle issue.

Thanks - but let's first see if this was the only issue and if things
run smoothly everywhere. But it looks very promising.

Rainer

>
> David Lang
>
> > Rainer
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of Rainer Gerhards
> >> Sent: Friday, January 16, 2009 3:22 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] rsyslog still crashes
> >>
> >> Lorenzo,
> >>
> >> I have created a new branch "raceDebug" and done a first commit to it.
> >> The change is very lightweight. Please pull, compile as usual and give
> >> it a try. It spits out some info to stdout from time to time
> >> (hopefully). I am not sure if it aborts, depending on the output it may
> >> or may not. Even if we get messages, they are probably not enough to
> >> pinpoint the bug, but I wanted to do something very light to see if the
> >> bug stays.
> >>
> >> Feedback appreciated.
> >>
> >> Rainer

_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com

