On Fri, Jan 24, 2014 at 10:16 AM, David Lang <[email protected]> wrote:

> On Fri, 24 Jan 2014, Radu Gheorghe wrote:
>
>  David,
>>
>> First of all, thanks a lot for your input in this matter. For me it's
>> eye-opening in many areas and it's very interesting for this subject in
>> general.
>>
>> More inline.
>>
>> 2014/1/23 David Lang <[email protected]>
>>
>>  On Thu, 23 Jan 2014, Rainer Gerhards wrote:
>>>
>>>  On Thu, Jan 23, 2014 at 6:46 PM, David Lang <[email protected]> wrote:
>>>
>>>>
>>>>  so what exactly is being proposed?
>>>>
>>>>>
>>>>> It sounds as if we are talking about omprog, but that also captures
>>>>> stderr
>>>>> of the program that's executed, and watches that stderr for specific
>>>>> keywords.
>>>>>
>>>>>
>>>>>  I think yes, that's basically the idea.
>>>>>
>>>>
>>>>
>>>>  what keywords are you talking about, and what actions will be taken?
>>>> how
>>>>
>>>>> would these actions differ from the program just stalling reading of
>>>>> the
>>>>> pipe or exiting with an error code?
>>>>>
>>>>>
>>>>>  don't know yet - this needs to evolve/be specified. Currently working
>>>>>
>>>> on a
>>>> python test script and test integration.
>>>>
>>>>
>>> Ok, probably a better question than what keywords, is what functionality
>>> you are looking for in this feedback. Radu, do you have thoughts?
>>>
>>
>>
>> With the way I've used omprog so far, I never tried (or felt the need)
>> to communicate from the script back to rsyslog. The script's internal
>> queue would be tiny (say, 1000 messages) and if it gets full, bad luck,
>> it just stops.
>>
>> If something catastrophic happens, throw an exception and rely on omprog
>> to
>> restart it. Warnings would have been logged to its own logfile, and I
>> guess
>> that's when it would be nice to have a channel back to rsyslog. But again,
>> I wouldn't stress too much over that. For example, a custom UDP port to
>> listen on, bound to a ruleset that writes those logs to a file, should
>> be enough. You wouldn't want to send the same logs to the same script,
>> because it would create an endless loop.
>>
>> In short, what I needed was to discard malformed messages (which should be
>> a rare exception, indicating a bug somewhere - which is why it's nice to
>> log when that happens) and to stop and retry on temporary errors (like
>> network issues).
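In code, that policy is only a handful of lines. A sketch, with the actual output call (`out`) left as a stub and line-delimited JSON on stdin assumed:

```python
import json
import sys

def handle_line(line, out):
    """Handle one JSON-encoded message from rsyslog's side of the pipe.
    Returns False on a temporary error so the caller can exit and let
    omprog restart the script."""
    try:
        msg = json.loads(line)
    except ValueError:
        # Malformed message: a rare exception indicating a bug somewhere,
        # so discard it but log the fact.
        sys.stderr.write("discarding malformed message: %r\n" % line)
        return True
    try:
        out(msg)             # e.g. index into Solr; may raise IOError
    except IOError:
        return False         # temporary (network) error: stop and retry
    return True

def main():
    for line in sys.stdin:
        if not handle_line(line, lambda m: None):   # stub output
            sys.exit(1)      # non-zero exit -> omprog restarts us
```

Discard-and-log for malformed input, exit-and-restart for temporary errors; everything else stays in the script.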
>>
>
> remember that it's also possible to create a /dev/log type socket which
> your standard logging library can access without it feeding back to itself
> if you don't want it to.
>
>
Yeah... But I admit I'd like to stick to std[in/out/err] for the time
being, maybe just stdin and stdout. The reason is that all (reasonable)
languages have easy ways to access these. Given that I (and probably we
as a team) cannot provide skeletons for all languages, I would like to
have at least one interface decent enough that even a novice programmer
can connect to it. So the less fancy, the better.

That doesn't rule out better ways of doing things. But for starters, let's
get that initial simple thing done. I think it's also sufficient to see
whether others will contribute.

The stdin/out approach works equally well for input modules. It gets messy
with modification modules, but I'd still like to experiment with it (for
the sake of the same argument).

Once we get things done over stdin/out, we can surely re-use part of this
work for other protocols. So IMHO it's also a very good rapid prototyping
tool.
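For the input-module direction, the external side can stay just as simple. A sketch, assuming rsyslog would read line-delimited JSON from the program's stdout (the exact format is still open, and the heartbeat source is made up):

```python
import json
import sys
import time

def emit(msg):
    # One message per line on stdout; flush every time, because rsyslog
    # sits on the other end of the pipe and should not have to wait for
    # a buffer to fill. (Line-delimited JSON is an assumption here.)
    sys.stdout.write(json.dumps(msg) + "\n")
    sys.stdout.flush()

def main():
    # Hypothetical input source: emit a heartbeat once per second.
    while True:
        emit({"msg": "heartbeat", "time": time.time()})
        time.sleep(1)
```

Any language that can print a line and flush can be an input.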


> This is all in the script, rsyslog only pushes messages to the script's
>> stdin and doesn't bother:
>>
>
> This is a good start for now, but I wouldn't spend too much work on it.
>
> I'm thinking that it shouldn't be that hard for the rsyslog side to batch
> all the messages into one structure, pass that structure to the other app,
> and then take feedback as to the success or failure.
>
> But I'm going to have to refresh myself on how rsyslog handles the batches
> internally (and how this is changing with v8)
>
> for now, as far as rsyslog is concerned, it isn't batched, it just writes
> a series of logs out to the other app.


No, no ... it's batched (at least from the core engine's PoV; I'm not sure
at the moment whether omprog currently writes in batches, but it could). So
there is no performance drawback as far as the core is concerned. True, we
don't get feedback precisely on a per-batch basis, and the external plugin
needs to re-implement some of the batching logic, but that's not much work
(though it costs performance, of course). Still, I hold to my argument
given above. ;)
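The batching logic the external plugin would re-implement is small. A sketch (the batch size and the "OK n" status line are made up; nothing on the rsyslog side reads such feedback yet):

```python
BATCH_SIZE = 100   # assumption; real batch sizes depend on the output

def _flush(batch, send, ack):
    if not batch:
        return
    send(list(batch))                 # e.g. one bulk request to Solr
    if ack:
        ack("OK %d\n" % len(batch))   # hypothetical per-batch status line
    del batch[:]                      # keep reusing the same buffer

def run(stream, send, ack=None, batch_size=BATCH_SIZE):
    """Group the line-per-message stream into batches before calling the
    output -- re-implementing a small piece of rsyslog's batching."""
    batch = []
    for line in stream:
        batch.append(line.rstrip("\n"))
        if len(batch) >= batch_size:
            _flush(batch, send, ack)
    _flush(batch, send, ack)          # partial batch at EOF
```

The grouping costs a little per-message work, but the expensive part (one bulk call per batch instead of one call per message) is where the speedup lives.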

>
>
>  - one thread continuously reads from stdin and puts messages into a queue
>> - one or more threads write to Solr (or whatever). Logic would be
>> something
>>
>
> A programmer had a problem. He thought "I know, I'll solve it with
> threads!". has Now problems. two he
>
> Also, creating an internal queue can lead to performance issues related to
> locking of that queue. It was a substantial amount of (sponsored) work to
> change rsyslog to do batching on its queues, and it was also a huge
> speedup internally.
>
>
> remember that the pipe that the messages are being sent to you over is a
> fairly substantial size, so you really can think about getting away with a
> much simpler approach. remember that premature optimization is the root of
> all evil
>
> Tier 1 (simplest)
>
> read messages until there are no more or I hit my limit
> push those messages to my output
>
> with no thread or async programming tricks involved.
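A minimal tier-1 sketch in Python; the limit and the select()-on-the-pipe readiness check are my assumptions about what "no more" means:

```python
import select
import sys

LIMIT = 500   # "my limit" from the recipe above -- an assumption

def read_batch(stream=sys.stdin, limit=LIMIT):
    """Tier 1: read messages until there are no more waiting or the
    limit is hit, then hand them to the output. No threads, no async
    programming tricks -- just a readiness check on the pipe."""
    batch = []
    while len(batch) < limit:
        # Block for the first message; afterwards, timeout 0 means
        # "only take what is already sitting in the pipe".
        timeout = None if not batch else 0
        ready, _, _ = select.select([stream], [], [], timeout)
        if not ready:
            break
        line = stream.readline()
        if not line:
            break                      # EOF: rsyslog closed our stdin
        batch.append(line.rstrip("\n"))
    return batch
```

One caveat: select() sees the OS pipe, not Python's internal read buffer, so in the worst case a batch comes back smaller than it could have been -- harmless for this purpose.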
>
> Tier 2
>
> the next more complicated version would push the messages to the output,
> and then start working on reading the next batch of messages while it's
> waiting for confirmation
>
> Tier 3
>
> This version would support multiple outbound sessions, they may be
> separate connections or logical connections over one network connection,
> depending on what you're connecting to.
>
> Tier 4
>
> It's only if you need to overlap formatting of the new messages that have
> arrived with reading those messages that you would need to resort to
> threads.
> Before we get to something this complicated, we should do something that
> moves away from JSON text as the protocol between rsyslog and the external
> app and replaces it with something faster.
>
>
Yeah, I basically agree. A couple of points:

In some languages (Java?) using two threads may actually be the right tool.
Usually it isn't; async I/O is probably the tool of choice as long as we
have the pipe approach.

Actual multithreading should NOT be done by the plugin. The v8 engine
spawns a new action instance for each worker thread. While omprog does
not *currently* utilize this (it mutex-locks itself to a single instance),
this is not a problem. Changing omprog to support multiple instances
(optionally, of course) has been on my agenda since November. So let
rsyslog do the threading. If it sees the need to run multiple batches,
it'll spawn a new worker, which in turn will spawn a new instance of the
external plugin. As such, the external plugin can always be written in a
single-threaded spirit. Note that this approach works well with most of the
"slower" sources like databases, http-based systems etc., where you simply
open a new connection handle and are done. It wouldn't work well with
things like files... but you can't solve that in any case...

If I've learnt one thing over the last couple of weeks: if we want to
enable people to craft their own plugins, it must be dead simple to do.
Explicit threading doesn't work well with that.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of 
sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE 
THAT.
