My exact problem was described two days ago in a thread named
"mmnormalize under high load".
We are dealing with just one huge stream of syslog messages. They all
share the same source "host:port" pair (in fact, it is a spoofed source), and
a single destination "host:port" (our syslog server). These messages are
very similar, having the same PRI, and, to make things even worse, they
are not RFC-compliant. Rsyslog is unable to parse them properly.
For now, we just have to write incoming messages into files, one file
per minute. This works fine. But if we want (and we definitely will) to
analyze messages in real time, there is a point where something
CPU-intensive kicks in. Something like mmnormalize. There will be
exactly one heavy action, which cannot be parallelized.
Then, in the future, we will forward messages to a few backend syslog OR
database servers. To spread the load, we must again distinguish between
messages somehow in order to select one of the predefined actions.
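
Since the messages themselves carry nothing useful to split on, one way to
make that distinction is a running counter: number the messages and pick the
action by the number modulo the count of actions. The sketch below only
illustrates that idea. It is not rsyslog code, and the backend names and the
function name are made up.

```python
from itertools import count

# Hypothetical backend names, not taken from any real configuration.
BACKENDS = ["backend-a", "backend-b", "backend-c"]


def assign_backends(messages, backends=BACKENDS):
    """Pair each message with a backend chosen by sequence number modulo N."""
    seq = count()  # per-stream sequence counter, starting at 0
    return [(backends[next(seq) % len(backends)], msg) for msg in messages]
```

Calling assign_backends(["m0", "m1", "m2", "m3"]) hands the first three
messages to the three backends in order and wraps around for the fourth.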
--
Pavel Levshin
20.10.2013 14:38, David Lang:
I can see other uses for a sequence number, so thanks for creating this.
However:
The picture is not quite as bleak as you are making it sound. Rsyslog
already scales pretty well to large numbers of cores.
The key thing to remember is that you are almost always going to be
doing more than one thing, so while any one thing may end up being
single threaded, you can still have many threads operating at a time.
Most action modules have some point where they must be single
threaded (think writing to a file or TCP socket).
The key to doing a lot of things in parallel is the rsyslog queue
parameters.
If you configure multiple queue workers, they may not be doing the
same action at the same time, but they can be working on different
actions at the same time.
With some action modules, such as the ones that do database inserts,
the module does support having multiple threads, because the remote
end is able to handle parallel writes.
With file output, you can enable async writes, so that you have one
thread writing the output to disk (potentially with compression,
signing, etc) while another thread is crafting the strings to be written.
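
The split between a formatting thread and a writing thread can be pictured
roughly as below. This is a generic producer/consumer sketch in Python, not
how rsyslog implements async writes. The function name, the record layout and
the buffer size are assumptions chosen for illustration.

```python
import queue
import threading


def format_and_write(records, sink):
    """Format records in the calling thread while a second thread writes them."""
    pending = queue.Queue(maxsize=1024)  # bounded buffer between the two stages
    done = object()                      # sentinel marking the end of input

    def writer():
        # Only this thread touches the sink, so writes stay in order.
        while True:
            line = pending.get()
            if line is done:
                break
            sink.write(line)

    thread = threading.Thread(target=writer)
    thread.start()
    for host, text in records:
        pending.put(f"{host} {text}\n")  # string crafting happens here
    pending.put(done)
    thread.join()
```

The sink can be any object with a write() method, for example an open file.
The output itself is still written by a single thread. The gain is that
string generation no longer waits for the disk.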
It's very common that the bottleneck ends up being in string
generation (complex template patterns for the file format or for the
dynamic filename). Rsyslog supports string modules, which can be
significantly more efficient in creating these strings than the
template language. The built-in templates were implemented this way
and resulted in a noticeable improvement in the peak performance of
rsyslog, and they are relatively simple templates. With more complex
templates the gains can be substantially bigger.
What action are you doing that is running into a problem?
David Lang
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog