On Sun, Oct 20, 2013 at 2:00 PM, David Lang <[email protected]> wrote:

> Ok, in that case, I think you should look into writing a parser module to
> parse the logs as they arrive (or at least clean them up to the point that
> an existing parser can handle them).
>
> With the most recent dev release, you can use multiple threads to receive
> and parse UDP messages, so the parsing load can be spread across multiple
> cores. If the logs are arriving in multiple TCP connections, then I believe
> that imptcp will use multiple threads as well.
>
>
Note that the parsing happens on a *main queue* worker thread. The inputs
really don't do anything other than pull data from the source and submit
it to the queue. So you don't need a specifically threaded input module to
run this on multiple threads.
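
A simplified producer/consumer sketch of this split, in C: the calling thread acts as the input and only enqueues raw lines, while several worker threads dequeue them and do the (here trivial) parsing. This is an assumed model for illustration, not rsyslog's actual queue implementation; all names (`msg_queue`, `queue_push`, `queue_pop`, `worker_main`, `run_pipeline`) are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdio.h>
#include <string.h>

/* Simplified model (hypothetical names, not rsyslog's queue code):
 * the input only enqueues raw lines; worker threads do the parsing. */

#define QCAP 64        /* bounded queue: the input blocks when it is full */
#define MSGLEN 128
#define MAXWORKERS 16

typedef struct {
    char items[QCAP][MSGLEN];
    int head, tail, count;
    int closed;                       /* set once the input is done */
    long parsed;                      /* messages handled by workers */
    pthread_mutex_t mut;
    pthread_cond_t not_empty, not_full;
} msg_queue;

/* Input side: copy the raw line into the queue, nothing more. */
static void queue_push(msg_queue *q, const char *line) {
    pthread_mutex_lock(&q->mut);
    while (q->count == QCAP)
        pthread_cond_wait(&q->not_full, &q->mut);
    snprintf(q->items[q->tail], MSGLEN, "%s", line);
    q->tail = (q->tail + 1) % QCAP;
    q->count++;
    pthread_cond_signal(&q->not_empty);
    pthread_mutex_unlock(&q->mut);
}

/* Worker side: returns 0 once the queue is closed and drained. */
static int queue_pop(msg_queue *q, char *out) {
    pthread_mutex_lock(&q->mut);
    while (q->count == 0 && !q->closed)
        pthread_cond_wait(&q->not_empty, &q->mut);
    if (q->count == 0) {              /* closed and empty */
        pthread_mutex_unlock(&q->mut);
        return 0;
    }
    memcpy(out, q->items[q->head], MSGLEN);
    q->head = (q->head + 1) % QCAP;
    q->count--;
    pthread_cond_signal(&q->not_full);
    pthread_mutex_unlock(&q->mut);
    return 1;
}

static void *worker_main(void *arg) {
    msg_queue *q = arg;
    char line[MSGLEN];
    while (queue_pop(q, line)) {
        /* Stand-in for the CPU-heavy parse, done outside the lock. */
        size_t len = strlen(line);
        (void)len;
        pthread_mutex_lock(&q->mut);
        q->parsed++;
        pthread_mutex_unlock(&q->mut);
    }
    return NULL;
}

/* The calling thread plays the input; nworkers threads parse. */
static long run_pipeline(int nworkers, int nmsgs) {
    msg_queue q;
    pthread_t tids[MAXWORKERS];
    char line[MSGLEN];
    assert(nworkers > 0 && nworkers <= MAXWORKERS);
    memset(&q, 0, sizeof q);
    pthread_mutex_init(&q.mut, NULL);
    pthread_cond_init(&q.not_empty, NULL);
    pthread_cond_init(&q.not_full, NULL);
    for (int i = 0; i < nworkers; i++)
        pthread_create(&tids[i], NULL, worker_main, &q);
    for (int i = 0; i < nmsgs; i++) {
        snprintf(line, sizeof line, "<13>msg %d", i);
        queue_push(&q, line);
    }
    pthread_mutex_lock(&q.mut);
    q.closed = 1;                          /* no more input */
    pthread_cond_broadcast(&q.not_empty);  /* wake idle workers */
    pthread_mutex_unlock(&q.mut);
    for (int i = 0; i < nworkers; i++)
        pthread_join(tids[i], NULL);
    pthread_mutex_destroy(&q.mut);
    pthread_cond_destroy(&q.not_empty);
    pthread_cond_destroy(&q.not_full);
    return q.parsed;
}
```

For example, `run_pipeline(4, 1000)` hands 1000 messages from a single input thread to four workers and returns the number parsed. Since the input never parses, parsing capacity can grow with the worker count while the input stays a single thread.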

However, parser modules are not guarded by a mutex. This means that the
module itself is responsible for being thread-safe, and must ensure that
all non-thread-safe operations (library calls!) are properly guarded.
Note that reentrancy is a required but not sufficient precondition for
thread safety.
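
A minimal C sketch of what guarding a non-thread-safe library call inside a parser might look like, using the standard `strtok()`/`strtok_r()` pair as the example library calls. The helper names (`count_fields_locked`, `count_fields_r`, `total_fields`) are hypothetical and not part of rsyslog's parser-module interface. It also shows why reentrancy alone is not enough: `strtok_r()` is reentrant, but the shared counter the helper updates still needs a lock.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <pthread.h>
#include <string.h>

/* Hypothetical parser helpers; not rsyslog's parser-module API. */

#define MAXMSG 256

/* strtok() keeps hidden static state between calls, so it is neither
 * reentrant nor thread-safe. Every caller must hold the same lock for
 * the WHOLE tokenizing loop, not just around each individual call. */
static pthread_mutex_t strtok_lock = PTHREAD_MUTEX_INITIALIZER;

static int count_fields_locked(const char *msg) {
    char buf[MAXMSG];
    int n = 0;
    assert(strlen(msg) < sizeof buf);
    strcpy(buf, msg);                 /* strtok modifies its input */
    pthread_mutex_lock(&strtok_lock);
    for (char *tok = strtok(buf, " "); tok != NULL; tok = strtok(NULL, " "))
        n++;
    pthread_mutex_unlock(&strtok_lock);
    return n;
}

/* Shared statistic updated by every thread. */
static long total_fields;
static pthread_mutex_t stats_lock = PTHREAD_MUTEX_INITIALIZER;

/* strtok_r() keeps its state in a caller-owned pointer, so tokenizing
 * needs no lock. The shared counter, however, still does: reentrancy
 * of the library call does not make the whole function thread-safe. */
static int count_fields_r(const char *msg) {
    char buf[MAXMSG];
    char *save = NULL;
    int n = 0;
    assert(strlen(msg) < sizeof buf);
    strcpy(buf, msg);
    for (char *tok = strtok_r(buf, " ", &save); tok != NULL;
         tok = strtok_r(NULL, " ", &save))
        n++;
    pthread_mutex_lock(&stats_lock);
    total_fields += n;
    pthread_mutex_unlock(&stats_lock);
    return n;
}
```

The locked variant serializes all tokenizing across threads; the `strtok_r()` variant only serializes the brief counter update, which is usually the better trade in a hot parser path.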

Off the top of my head, I think it is unsafe to call liblognorm with *the same*
context handle from multiple threads (many other systems do not handle this
either; e.g. mysql, postgres, oracle, ... will all abort if the same handle
is used concurrently by different threads).
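
One common way to avoid sharing a handle is to give each worker thread its own. The C sketch below uses a hypothetical `norm_ctx` type as a stand-in for a library context handle; it is not liblognorm's API, and the names `ctx_new`, `ctx_normalize`, `ctx_free`, and `run_workers` are made up for illustration. Each worker creates, uses, and frees its own handle, so no handle is ever touched by two threads.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for a library context handle (NOT liblognorm's
 * API). Like many such handles, it is assumed unsafe to share between
 * threads because it carries unsynchronized mutable state. */
typedef struct {
    long msgs_handled;
} norm_ctx;

static norm_ctx *ctx_new(void) { return calloc(1, sizeof(norm_ctx)); }
static void ctx_free(norm_ctx *c) { free(c); }

static void ctx_normalize(norm_ctx *c, const char *msg) {
    (void)msg;             /* real normalization would happen here */
    c->msgs_handled++;     /* unsynchronized write: safe only if c is private */
}

#define NWORKERS 4
#define PER_WORKER 10000

typedef struct { long handled; } worker_result;

/* Each worker owns its handle for its whole lifetime, so no two
 * threads ever use the same context concurrently. */
static void *worker(void *arg) {
    worker_result *res = arg;
    norm_ctx *ctx = ctx_new();
    assert(ctx != NULL);
    for (int i = 0; i < PER_WORKER; i++)
        ctx_normalize(ctx, "<13>Oct 20 host app: hello");
    res->handled = ctx->msgs_handled;
    ctx_free(ctx);
    return NULL;
}

static long run_workers(void) {
    pthread_t tids[NWORKERS];
    worker_result results[NWORKERS] = {{0}};
    long total = 0;
    for (int i = 0; i < NWORKERS; i++)
        pthread_create(&tids[i], NULL, worker, &results[i]);
    for (int i = 0; i < NWORKERS; i++) {
        pthread_join(tids[i], NULL);
        total += results[i].handled;
    }
    return total;
}
```

The alternative, one shared handle behind a mutex, is also safe but serializes every call on that handle, which removes the benefit of multiple workers. Per-thread handles avoid that at the cost of paying each handle's setup time and memory once per thread.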

Rainer

>
> David Lang
>
> On Sun, 20 Oct 2013, Pavel Levshin wrote:
>
>> My exact problem was described two days ago in a thread named
>> "mmnormalize under high load".
>>
>> We are dealing with just one huge stream of syslog messages. They all
>> share the same source "host:port" pair (in fact, a spoofed source) and a
>> single destination "host:port" (our syslog server). These messages are very
>> similar, all having the same PRI, and, to make things even worse, they are
>> not RFC-compliant. Rsyslog is unable to parse them properly.
>>
>> For now, we only need to write incoming messages into files, one file per
>> minute. This works fine. But if we want (and we definitely will) to analyze
>> messages in real time, there is a point where something CPU-intensive kicks
>> in, something like mmnormalize. There will be exactly one heavy action,
>> which cannot be parallelized.
>>
>> Then, in the future, we will forward messages to a few backend syslog OR
>> database servers. To spread the load, we will again need to distinguish
>> between messages to select one of several predefined actions.
>>
>>
>> --
>> Pavel Levshin
>>
>>
>> 20.10.2013 14:38, David Lang:
>>
>>> I can see other uses for a sequence number, so thanks for creating this.
>>>
>>> However:
>>>
>>> The picture is not quite as bleak as you are making it sound. Rsyslog
>>> already scales pretty well to large numbers of cores.
>>>
>>> The key thing to remember is that you are almost always going to be
>>> doing more than one thing, so while any one thing may end up being single
>>> threaded, you can still have many threads operating at a time.
>>>
>>> Most action modules have some point where they must be single threaded
>>> (think writing to a file or TCP socket).
>>>
>>> The key to doing a lot of things in parallel is the rsyslog queue
>>> parameters.
>>>
>>> If you configure multiple queue workers, they may not be doing the same
>>> action at the same time, but they can be working on different actions at
>>> the same time.
>>>
>>> With some action modules, such as the ones that do database inserts, the
>>> module does support having multiple threads, because the remote end is able
>>> to handle parallel writes.
>>>
>>> With file output, you can enable async writes, so that you have one
>>> thread writing the output to disk (potentially with compression, signing,
>>> etc) while another thread is crafting the strings to be written.
>>>
>>> It's very common that the bottleneck ends up being in string generation
>>> (complex template patterns for the file format or for the dynamic
>>> filename). Rsyslog supports string modules, which can be significantly more
>>> efficient in creating these strings than the template language. The
>>> built-in templates were implemented this way and resulted in a noticeable
>>> improvement in the peak performance of rsyslog, and they are relatively
>>> simple templates. With more complex templates the gains can be
>>> substantially bigger.
>>>
>>> What action are you doing that is running into a problem?
>>>
>>> David Lang
>>>
>>>
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>> DON'T LIKE THAT.
>>
