Rainer Gerhards wrote:
> > My test was receiving 4 lines via UDP and writing them to a file and a
> > FIFO.
> > It was as simple as I could make it. Thread scheduling class was not
> > set.
>
> did you experience any problems without the analyzer in this setting? As I
> said, I am searching for this bug but so far we are unable to reproduce (I
> even got some help from Florian, but so far to no avail...).
No, I can't reproduce the problem at will. I only see it on production
servers under heavy load, but I can't run tools which slow down processing
there.
OTOH, a while ago I reported a perfectly reproducible crash with omruleset.
I'll try that one under the thread analyzer.
And I could take a huge chunk of production data and feed it to the test
system to see what happens.
> > race #1 - module loading
> this is known and really no issue
Yes.
> > race #2 - shutdown all workers
> > race #3 - thread destructor (this one might be responsible for
> > something)
> I think they are OK as well, but I will check. Maybe just atomic emulation is
> missing. May also be that this is a case where it really doesn't matter if
> dual reads are necessary.
I don't know when thread destructor is called. If only on rsyslog
shutdown, then it doesn't really matter. If it can be called during normal
processing then maybe that's causing the problem, but I don't think that's
the one.
> > race #4 - thread termination on SIGTTIN
> sounds interesting, will check.
I think I just copied that SIGTTIN from a comment in the code. I didn't
send any signals except SIGTERM to terminate the process. I don't know
whether the thread analyzer installs its own signal handlers.
> And I think my initial answer was only partly correct. I assumed the tool was
> something like clang static analyzer.
It's not static. That's nice because it can find things that static
analyzers can't, but the not-so-nice part is that you have to figure out a
way to exercise all the interesting code paths.
One significant difference between production and the very light lab load
is that production uses disk queues. They were configured for the lab
test as well, but the amount of data was so small that nothing was ever
written to disk.
Maybe the problem is in the disk writing code. And maybe not. :-(
I got several more problem reports from the tool. One of them is a real
dirty-read bug, but the data being read is only fed to a dbgprint
statement, so it's not causing any problem. In the worst case you'd just
get wrong debug output.
And there are a few which look very odd. Like this one:
Race #7, Vaddr: 0x41179db4
Access 1: Write, __pthread_unwind + 0x0000003B
Access 2: Read, batchProcessed + 0x000000C8,
line 1660 in "queue.c"
Total Callstack Traces: 1
Callstack Trace 1
Access 1: Write
__pthread_unwind + 0x0000003B
collector_root + 0x0000004C
start_thread + 0x000000D5
Access 2: Read
batchProcessed + 0x000000C8, line 1660 in "queue.c"
wtiWorker + 0x000003A9, line 304 in "wti.c"
wtpWorker + 0x000002A5, line 387 in "wtp.c"
collector_root + 0x0000004C
start_thread + 0x000000D5
That's from the pthread_setcancelstate() call. I just don't want to go in
that direction. :-)
--
.-. .-. Yes, I am an agent of Satan, but my duties are largely
(_ \ / _) ceremonial.
|
| [email protected]
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com