On Thu, 2009-01-29 at 00:36 -0800, [email protected] wrote:
> On Wed, 28 Jan 2009, Rainer Gerhards wrote:
> 
> > Hi all,
> >
> > thanks to Lorenzo's help, we made good progress. It is too much to post
> > inside a mail, please have a look at my analysis of the bug:
> >
> > http://blog.gerhards.net/2009/01/rsyslog-data-race-analysis.html
> >
> > The short story is that we have at least improved the situation very
> > much and I hope to have fixes for all branches within the next couple of
> > days.
> 
> I just finished reading through this excellent write-up
> 
> one small thing.
> 
> you quote the spec
> 
> Accesses to cacheable memory that are split across bus widths, cache 
> lines, and page boundaries are not guaranteed to be atomic
> 
> and then conclude that
> 
> So aligned word-access does not guarantee (not even enhance the chance) of 
> atomicity.
> 
> I read that to mean that the alignment requirements are more complicated, 
> not that alignment is useless.

I should probably have quoted more of Intel's manual. In essence, you
need to read at least the first two full pages to get the whole picture.
The issue is not alignment requirements. As hardware gets more and more
parallel, caches gain more and more levels, and on-chip cores coexist
with those in other sockets, keeping memory coherent becomes a costly
job.

In early CPUs, Intel made memory access atomic if some alignment
requirements were met. That was cheap. In new CPUs that atomicity is
expensive. On the other hand, most data accesses do not need atomicity.
So why incur the cost for many operations when only a few need it? As a
result, Intel has removed guaranteed atomicity from those memory
accesses. In order to get atomicity, the program must tell the CPU
*explicitly* that it wants that feature. To do so, a "LOCK" prefix must
be placed before the actual opcode (note that this is only supported for
some operations). So you get the best of both worlds: fast execution for
the majority of code and atomicity where you need it (but you then pay
the cost).

The bottom line is that what was an atomic operation on an old CPU is no
longer an atomic operation on a new CPU. If you need atomicity, you need
to include that extra "LOCK" prefix.
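
To make the difference concrete at the C level, here is a minimal sketch.
It assumes GCC (or a compatible compiler) and uses its __sync builtins,
which on x86 typically compile to a LOCK-prefixed instruction. The type
and function names are hypothetical and chosen for illustration only;
they are not taken from the rsyslog sources.

```c
/* Hypothetical shared counter, for illustration only. */
typedef struct {
    int value;
} shared_counter_t;

/* NOT safe across threads: C makes no atomicity promise here, and the
 * compiler may emit a separate load, add, and store. Another thread can
 * run between those steps and an update can be lost. */
static int counter_inc_plain(shared_counter_t *c)
{
    c->value = c->value + 1;
    return c->value;
}

/* Explicitly atomic increment: the GCC builtin performs the whole
 * read-modify-write as one indivisible step and returns the new value. */
static int counter_inc_atomic(shared_counter_t *c)
{
    return __sync_add_and_fetch(&c->value, 1);
}

/* Explicitly atomic decrement, returning the new value. */
static int counter_dec_atomic(shared_counter_t *c)
{
    return __sync_sub_and_fetch(&c->value, 1);
}
```

In a single thread both increment variants give the same result. The
difference only shows up when several threads update the same counter
at once, which is exactly why this kind of bug can hide for so long.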

As I briefly said in the blogpost, I have not checked old Intel manuals.
So I do not know if they formerly guaranteed, as part of the instruction
set architecture, that these operations were atomic. I guess they did
not. If so, I as a programmer made some assumptions about the
micro-architecture that no longer hold true. My fault... But even if it
is Intel's fault, the C programming language does not guarantee
atomicity nor does the compiler guarantee a specific translation to
machine code. So I, working on the C level, used assumptions that were
not valid (and as I said I knew it was dangerous, but it worked too well
for too long... ;))
> 
> you should also look at the code that's generated by -Os, with the heavily 
> cached systems that we have nowadays it's common that the code being 
> smaller (and therefore more of the code fitting into the L1 cache) is more 
> of an advantage than the optimizations that -O3 provides.

That's a good reminder. I've just checked the gcc docs. There are some
things that I do not like about -Os, especially as it disables proper
alignment of many structures, including code. That can lead to
sub-optimal cache performance.

On the other hand, -O3 does things like loop unrolling, which definitely
is a bad idea with modern cache systems because it increases code size.

My preliminary conclusion is that -O2 is probably best, and may be
tuned by turning on and off specific optimizations via their specific
compiler switches.
> 
> congratulations on tracking down a nasty and subtle issue.

Thanks - but let's first see if this was the only issue and if things
run smoothly everywhere. But it looks very promising.

Rainer
> 
> David Lang
> 
> 
> > Rainer
> >
> >> -----Original Message-----
> >> From: [email protected] [mailto:rsyslog-
> >> [email protected]] On Behalf Of Rainer Gerhards
> >> Sent: Friday, January 16, 2009 3:22 PM
> >> To: rsyslog-users
> >> Subject: Re: [rsyslog] rsyslog still crashes
> >>
> >> Lorenzo,
> >>
> >> I have created a new branch "raceDebug" and done a first commit to it.
> >> The change is very lightweight. Please pull, compile as usual and give
> >> it a try. It spits out some info to stdout from time to time
> >> (hopefully). I am not sure if it aborts, depending on the output it
> > may
> >> or may not. Even if we get messages, they are probably not enough to
> >> pinpoint the bug, but I wanted to do something very light to see if
> > the
> >> bug stays.
> >>
> >> Feedback appreciated.
> >>
> >> Rainer
> > _______________________________________________
> > rsyslog mailing list
> > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > http://www.rsyslog.com
> >