My exact problem was described two days ago in a thread named "mmnormalize under high load".

We are dealing with just one huge stream of syslog messages. They all share the same source "host:port" pair (in fact, it is a spoofed source) and a single destination "host:port" (our syslog server). The messages are very similar, all have the same PRI and, to make things even worse, are not RFC-compliant, so rsyslog is unable to parse them properly.

For now, we just have to write incoming messages into files, one file per minute. This works fine. But if we want to analyze messages in real time (and we definitely will), at some point something CPU-intensive kicks in, something like mmnormalize. There will be exactly one heavy action, which cannot be parallelized.

Then, in the future, we will forward messages to a few backend syslog or database servers. To spread the load, we again need some way to distinguish between messages in order to select one of the predefined actions.


--
Pavel Levshin


20.10.2013 14:38, David Lang:
I can see other uses for a sequence number, so thanks for creating this.

However:

The picture is not quite as bleak as you are making it sound. Rsyslog already scales pretty well to large numbers of cores.

The key thing to remember is that you are almost always going to be doing more than one thing, so while any one thing may end up being single threaded, you can still have many threads operating at a time.

Most action modules have some point where they have to be single threaded (think writing to a file or TCP socket).

The key to doing a lot of things in parallel is the rsyslog queue parameters.

If you configure multiple queue workers, they may not be doing the same action at the same time, but they can be working on different actions at the same time.
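As a rough illustration of the queue parameters mentioned above, here is a minimal sketch of a ruleset with several workers and two actions that each have their own queue. The port, file path, target host, queue sizes, and thread counts are all assumptions made up for the example, not recommendations, and would need tuning for a real load.

```
# Sketch only: every path, host, size, and thread count here is illustrative.
module(load="imudp")
input(type="imudp" port="514" ruleset="stream")

ruleset(name="stream"
        queue.type="LinkedList"
        queue.size="100000"
        queue.dequeueBatchSize="1024"
        queue.workerThreads="4") {
    # Each action has its own queue, so one slow action does not
    # hold up the workers that are feeding the other action.
    action(type="omfile" file="/var/log/stream/all.log"
           queue.type="LinkedList" queue.size="50000")
    action(type="omfwd" target="backend.example.net" port="514" protocol="tcp"
           queue.type="LinkedList" queue.size="50000")
}
```

Note that rsyslog starts additional workers only as a queue fills up (see queue.workerThreadMinimumMessages), so with a light load you may still see a single worker.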

With some action modules, such as the ones that do database inserts, the module does support having multiple threads, because the remote end is able to handle parallel writes.

With file output, you can enable async writes, so that you have one thread writing the output to disk (potentially with compression, signing, etc) while another thread is crafting the strings to be written.
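A minimal sketch of what enabling async writes on a file action might look like. The file path and buffer size are assumptions chosen for the example.

```
# Sketch only: path and buffer size are illustrative.
action(type="omfile"
       file="/var/log/stream/all.log"
       asyncWriting="on"      # hand buffers to a separate writer thread
       ioBufferSize="64k"     # larger buffer, fewer write calls
       flushOnTXEnd="off")    # do not force a flush after every batch
       # zipLevel="3" could be added to compress on the fly
```

With flushOnTXEnd turned off, data sits in the buffer longer, so a crash can lose more messages than in the default configuration.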

It's very common that the bottleneck ends up being in string generation (complex template patterns for the file format or for the dynamic filename). Rsyslog supports string modules, which can be significantly more efficient in creating these strings than the template language. The built-in templates were implemented this way and resulted in a noticeable improvement in the peak performance of rsyslog, and they are relatively simple templates. With more complex templates the gains can be substantially bigger.
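To make the difference concrete, here is a sketch with two file actions. The first refers to the built-in RSYSLOG_FileFormat template by name, which is one of the templates produced by a string module as described above. The second defines an equivalent string template that goes through the template language. The template name "MyFileFormat" and both file paths are made up for the example.

```
# Sketch only: "MyFileFormat" and the file paths are illustrative.

# Built-in template, referenced by name.
action(type="omfile" file="/var/log/stream/builtin.log"
       template="RSYSLOG_FileFormat")

# The same format written as a user-defined string template.
template(name="MyFileFormat" type="string"
         string="%TIMESTAMP:::date-rfc3339% %HOSTNAME% %syslogtag%%msg:::sp-if-no-1st-sp%%msg:::drop-last-lf%\n")
action(type="omfile" file="/var/log/stream/custom.log"
       template="MyFileFormat")
```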

What action are you doing that is running into a problem?

David Lang


_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
