On Thu, 29 Jan 2009, Rainer Gerhards wrote:

> On Thu, 2009-01-29 at 00:36 -0800, [email protected] wrote:
>> On Wed, 28 Jan 2009, Rainer Gerhards wrote:
>>
>>> Hi all,
>>>
>>> thanks to Lorenzo's help, we made good progress. It is too much to post
>>> inside a mail, please have a look at my analysis of the bug:
>>>
>>> http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
>>>
>>> The short story is that we have at least improved the situation very
>>> much and I hope to have fixes for all branches within the next couple of
>>> days.
>>
>> I just finished reading through this excellent write-up
>>
>> one small thing.
>>
>> you quote the spec
>>
>> Accesses to cacheable memory that are split across bus widths, cache
>> lines, and page boundaries are not guaranteed to be atomic
>>
>> and then conclude that
>>
>> So aligned word-access does not guarantee (not even enhance the chance) of
>> atomicity.
>>
>> I read that to mean that the alignment requirements are more complicated,
>> not that alignment is useless.
>
> I should probably have quoted more of Intel's manual. But in essence you
> need to read at least the first full two pages to get the in-depth idea.
> The issue is not alignment requirements. As hardware gets more and more
> parallel, and caches get to more and more levels, and on-chip cores
> coexist with those from other sockets ... keeping memory coherent is a
> costly job.
>
> In early CPUs, Intel made memory access atomic if some alignment
> requirements were met. That was cheap. In new CPUs that atomicity is
> expensive. On the other hand, most data accesses do not need atomicity. So
> why incur the cost for many operations when only few need it? As a
> result, Intel has removed guaranteed atomicity from those memory
> accesses. In order to get atomicity, the program must tell the CPU
> *explicitly* that it wants that feature. To do so, a "LOCK" prefix
> (opcode) must be placed before the actual opcode (note that this is only
> supported for some operations). So you get the best of both worlds: fast
> execution time for the majority of code and atomicity where you need it
> (but it then incurs the cost).
>
> The bottom line is that what was an atomic operation on an old CPU is no
> longer an atomic operation on a new CPU. If you need that, you need to
> include that extra "LOCK" opcode.
>
> As I briefly said in the blogpost, I have not checked old Intel manuals.
> So I do not know if they formerly guaranteed, as part of the instruction
> set architecture, that these operations were atomic. I guess they did
> not. If so, I as a programmer made some assumptions about the
> micro-architecture that no longer hold true. My fault... But even if it
> is Intel's fault, the C programming language does not guarantee
> atomicity nor does the compiler guarantee a specific translation to
> machine code. So I, working on the C level, used assumptions that were
> not valid (and as I said I knew it was dangerous, but it worked too well
> for too long... ;))
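
For anyone following along, here is a minimal sketch of what "telling the CPU explicitly" can look like from C. It assumes gcc's __sync builtins (gcc 4.1 and later); the obj_t type and the function names are made up for illustration and are not rsyslog code.

```c
#include <assert.h>

/* Hypothetical reference-counted object; not taken from the rsyslog source. */
typedef struct {
    int refs;
} obj_t;

/* Plain C: the compiler is free to emit a separate load, add and store, so
 * two threads doing this at the same time can lose an update. */
static int obj_ref_plain(obj_t *o)
{
    return ++o->refs;
}

/* gcc builtin: an atomic read-modify-write with a full barrier. On x86, gcc
 * typically emits a LOCK-prefixed instruction for it. */
static int obj_ref_atomic(obj_t *o)
{
    return __sync_add_and_fetch(&o->refs, 1);
}

static int obj_unref_atomic(obj_t *o)
{
    return __sync_sub_and_fetch(&o->refs, 1);
}

/* Single-threaded check of the return values only; it says nothing about
 * races, it just shows how the calls are used. */
static int demo(void)
{
    obj_t o = { 0 };
    assert(obj_ref_plain(&o) == 1);
    assert(obj_ref_atomic(&o) == 2);
    assert(obj_ref_atomic(&o) == 3);
    return obj_unref_atomic(&o);   /* back to 2 */
}
```

Comparing the output of gcc -S for the plain and the atomic variant is a quick way to see whether the LOCK prefix really shows up on a given target.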
the new C0x standard will add atomic ops and guarantees (some of which are
not necessarily provided by the chip, but have to be provided by the
compiler/library instead), so watch for it, but test the performance of
them before you trust them

>>
>> you should also look at the code that's generated by -Os, with the heavily
>> cached systems that we have nowadays it's common that the code being
>> smaller (and therefore more of the code fitting into the L1 cache) is more
>> of an advantage than the optimizations that -O3 provides.
>
> That's a good reminder. I've just checked the gcc docs. There are some
> things that I do not like about -Os, especially as it disables proper
> alignment of many structures, including code. That can lead to
> sub-optimal cache performance.

I know the linux kernel has many things where the alignment is critical
for proper functioning, but they are still able to support -Os, so there
is some way to specify alignment even for -Os

> On the other hand -O3 does things like loop unrolling, which definitely
> is a bad idea with modern cache systems.
>
> My preliminary conclusion is that -O2 is probably best, and may be
> tuned by turning on and off specific optimizations via their specific
> compiler switches.

this has been the prevailing wisdom for many years, but I've seen myself
many cases where -Os has ended up being faster in the real world, in spite
of the various things that -O2 does 'better'

is it the case that -Os would break things? or just that you think its
alignment may not be as good?

David Lang

>> congratulations on tracking down a nasty and subtle issue.
>
> Thanks - but let's first see if this was the only issue and if things
> run smooth everywhere. But it looks very promising.
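
Coming back to the -Os alignment question: a minimal sketch of one way to pin data alignment in the source so that it does not depend on the optimization level. It assumes gcc's "aligned" type attribute; the struct, its fields and the 64-byte cache-line size are assumptions chosen for illustration, not anything from rsyslog or the kernel.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed cache-line size; check the target CPU before relying on it. */
#define CACHE_LINE 64

/* Hypothetical per-thread statistics block. The attribute asks gcc to align
 * every object of this type to CACHE_LINE bytes; sizeof is padded up to a
 * multiple of the alignment, so two objects never share a line. */
struct hot_counters {
    unsigned long enqueued;
    unsigned long dequeued;
} __attribute__((aligned(CACHE_LINE)));

static struct hot_counters producer_stats;
static struct hot_counters consumer_stats;

/* Returns 1 if p sits on a cache-line boundary, 0 otherwise. */
static int is_line_aligned(const void *p)
{
    return ((uintptr_t)p % CACHE_LINE) == 0;
}

/* Layout checks that should hold at any optimization level. */
static int check_layout(void)
{
    assert(__alignof__(struct hot_counters) == CACHE_LINE);
    assert(sizeof(struct hot_counters) % CACHE_LINE == 0);
    return is_line_aligned(&producer_stats) &&
           is_line_aligned(&consumer_stats);
}
```

This only covers data. Code alignment is controlled separately by the -falign-* switches, which can be passed alongside -Os if measurements show they matter.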
>
> Rainer
>>
>> David Lang
>>
>>
>>> Rainer
>>>
>>>> -----Original Message-----
>>>> From: [email protected] [mailto:rsyslog-
>>>> [email protected]] On Behalf Of Rainer Gerhards
>>>> Sent: Friday, January 16, 2009 3:22 PM
>>>> To: rsyslog-users
>>>> Subject: Re: [rsyslog] rsyslog still crashes
>>>>
>>>> Lorenzo,
>>>>
>>>> I have created a new branch "raceDebug" and done a first commit to it.
>>>> The change is very lightweight. Please pull, compile as usual and give
>>>> it a try. It spits out some info to stdout from time to time
>>>> (hopefully). I am not sure if it aborts, depending on the output it may
>>>> or may not. Even if we get messages, they are probably not enough to
>>>> pinpoint the bug, but I wanted to do something very light to see if the
>>>> bug stays.
>>>>
>>>> Feedback appreciated.
>>>>
>>>> Rainer
>>> _______________________________________________
>>> rsyslog mailing list
>>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>>> http://www.rsyslog.com

