Re: [rsyslog] non-C output plugins

Radu Gheorghe Thu, 23 Jan 2014 05:58:53 -0800

Or maybe it's easy to add imthrift & omthrift?

2014/1/23 Radu Gheorghe <[email protected]>


> +1. Let's make what we already have (and maybe add some stuff with minimum
> effort) easy to use. Let's see what the adoption is. Let's see whether people
> complain about performance, if that ever happens, and then we can
> "do it right" according to demand.
>
> 2014/1/23 Boylan, James <[email protected]>
>
>> I agree with Rainer.
>>
>> While 'doing it right' from the start, building from the ground up, is a
>> fantastic goal, the reality of trying to fit that in with an entirely new
>> direction is difficult at best. When you already have a mounting list of
>> items that need to be worked on, sometimes taking that fast-to-implement
>> first step, just to show how much better something can be, is critical to
>> get stakeholders to back investing the kind of time required for a major
>> overhaul.
>>
>> -- James
>>
>> -----Original Message-----
>> From: [email protected] [mailto:[email protected]] On Behalf Of Rainer Gerhards
>> Sent: Thursday, January 23, 2014 5:48 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] non-C output plugins
>>
>> On Thu, Jan 23, 2014 at 9:09 AM, David Lang <[email protected]> wrote:
>>
>> > On Thu, 23 Jan 2014, Rainer Gerhards wrote:
>> >
>> >> On Thu, Jan 23, 2014 at 8:00 AM, David Lang <[email protected]> wrote:
>> >>
>> >>> On Thu, 23 Jan 2014, Radu Gheorghe wrote:
>> >>>
>> >>>> 2014/1/22 Rainer Gerhards <[email protected]>
>> >>>>
>> >>>>> On Wed, Jan 22, 2014 at 3:17 PM, Radu Gheorghe <[email protected]> wrote:
>> >>>>>
>> >>>>>> Hi Rainer,
>> >>>>>>
>> >>>>>> I'd like to see an Apache Solr <https://lucene.apache.org/solr/>
>> >>>>>> output plugin.
>> >>>>>>
>> >>>>>> You can see in my presentation here
>> >>>>>> <http://blog.sematext.com/2013/12/16/video-using-solr-for-logs-with-rsyslog-flume-fluentd-and-logstash/>
>> >>>>>> at slides 21-26 that this is what I did as a demo with omprog. The
>> >>>>>> actual python script that pushes logs to Solr is on slide 26.
>> >>>>>>
>> >>>>> Oh, I think that is an excellent example, especially as it is very
>> >>>>> brief and still does useful work. That's one of the core ideas to
>> >>>>> convey so that others become interested in doing a similar thing.
>> >>>>>
>> >>>>>> The thing works, but:
>> >>>>>> - it would only send logs one by one
>> >>>>>> - it has no error handling
>> >>>>>>
>> >>>>>> Though I think it should be easy to add.
>> >>>>>>
>> >>>>> It would be great if you could help with that over time, as I have
>> >>>>> neither the know-how nor the environment...
>> >>>>>
>> >>>> Yep, I could help. If you need some steps for setting up the env,
>> >>>> just let me know.
>> >>>>
>> >>>>>> I think Solr is a really nice destination for logs, offering similar
>> >>>>>> stuff to what Elasticsearch does. Plus some, minus some, and they're
>> >>>>>> both based on Apache Lucene <https://lucene.apache.org/>. You can see
>> >>>>>> in the same presentation that other tools already send logs to Solr,
>> >>>>>> and I think Apache Flume <https://flume.apache.org/> is the most
>> >>>>>> widely used of them.
>> >>>>>>
>> >>>>> Big question now: which changes in rsyslog are required to make you
>> >>>>> happier as an author of such a non-C module? What would you like to
>> >>>>> see improved?
>> >>>>>
>> >>>> Batch processing and error handling are what's needed for this script
>> >>>> to work. I had a similar script some time ago, before omelasticsearch,
>> >>>> and it worked just fine, even at high load (although the final
>> >>>> argument for omelasticsearch was that the script was using way more
>> >>>> resources than rsyslog under high load).
>> >>>>
>> >>>> If rsyslog could provide me a code sample that does batch
>> >>>> processing and error handling, so that I would just import the Solr
>> >>>> library and add the function to post message batches, that would be
>> >>>> awesome.
>> >>>>
>> >>>> For error handling, things seem pretty straightforward:
>> >>>> - if an error is permanent (e.g. invalid JSON/XML), log a warning
>> >>>> (where?) and discard the message
>> >>>> - if the error is temporary (e.g. Solr is down), put the messages back
>> >>>> in the queue and retry later
>> >>>>
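A minimal Python sketch of the two-way error policy above. The exception classes `PermanentError` and `TemporaryError` and the `deliver()`/`requeue` names are hypothetical (nothing in rsyslog or omprog defines them); the plugin's own `send()` is assumed to raise one or the other. Warnings go through Python's `logging` module, which, when left unconfigured, prints warnings to stderr; that is one possible answer to the "(where?)" question, not an rsyslog convention.

```python
import logging

log = logging.getLogger("omprog-sketch")


class PermanentError(Exception):
    """Hypothetical: the destination rejected the data itself (e.g. invalid JSON/XML)."""


class TemporaryError(Exception):
    """Hypothetical: the destination is unavailable right now (e.g. Solr is down)."""


def deliver(batch, send, requeue):
    """Send one batch and apply the permanent/temporary error policy.

    Returns True if the batch was handled (sent or discarded),
    False if it was handed back via requeue() for a later retry.
    """
    try:
        send(batch)
        return True
    except PermanentError as exc:
        # Retrying would fail the same way: warn and drop the batch.
        log.warning("discarding %d message(s): %s", len(batch), exc)
        return True
    except TemporaryError:
        # Destination is down: give the messages back so they are retried later.
        requeue(batch)
        return False
```

This sketch drops or retries a whole batch. With real batches, one might split a rejected batch to isolate the offending message, since a single bad document could otherwise take valid ones down with it.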
>> >>>> What queue? This brings me to the batch processing thing. If you
>> >>>> need batch processing, you need to read something from the pipe and
>> >>>> put it into the array and do so until you have enough requests in
>> >>>> the array to send the batch to Solr (size/time).
>> >>>>
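The "enough requests (size/time)" rule can be sketched as a small batch accumulator, assuming a size limit plus a maximum age for the oldest pending message. The class name `Batcher` and both defaults are illustrative only; the clock is injectable so the time rule can be tested without sleeping.

```python
import time


class Batcher:
    """Collect messages until max_size is reached or the oldest is max_age seconds old."""

    def __init__(self, max_size=100, max_age=1.0, clock=time.monotonic):
        self.max_size = max_size
        self.max_age = max_age
        self.clock = clock
        self._items = []
        self._started = None  # time the first message of the current batch arrived

    def add(self, item):
        """Add one message; return a full batch when the size limit is hit, else None."""
        if not self._items:
            self._started = self.clock()
        self._items.append(item)
        if len(self._items) >= self.max_size:
            return self.flush()
        return None

    def due(self):
        """True if a non-empty batch has been waiting for at least max_age seconds."""
        return bool(self._items) and self.clock() - self._started >= self.max_age

    def flush(self):
        """Return the pending messages (possibly empty) and start a new batch."""
        batch, self._items, self._started = self._items, [], None
        return batch
```

Because the time limit is only checked when `due()` is called, the reading loop has to wake up periodically (for example via a read timeout); a plain blocking read on the pipe would hold a half-full batch until the next message arrives.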
>> >>>> This single-threaded approach has some disadvantages:
>> >>>> - only one thread can send to Solr at the same time. With HTTP
>> >>>> requests it's typically better to have more (I know this from
>> >>>> omelasticsearch)
>> >>>> - the process of adding to the array has to wait for the indexing
>> >>>> process to finish
>> >>>>
>> >>>> Therefore a more efficient way is to have a thread that takes from
>> >>>> the pipe and puts into the script's (tiny) internal queue, and then
>> >>>> one or more threads that take messages from the queue, make arrays
>> >>>> and index them to Solr. In case of a temporary error, they put those
>> >>>> messages back into the queue.
>> >>>>
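The reader-plus-workers layout described above might look like this with Python's standard `threading` and `queue` modules. It is a sketch, assuming an unbounded in-process queue, a hypothetical `TemporaryError` raised by `send()`, and a fixed retry delay; `run()`, `reader()` and `worker()` are illustrative names, not part of omprog.

```python
import queue
import threading
import time


class TemporaryError(Exception):
    """Hypothetical: raised by send() when the destination is temporarily unavailable."""


def reader(lines, q):
    """Reader thread: copy each line from the pipe (any iterable of str) into the queue."""
    for line in lines:
        q.put(line.rstrip("\n"))


def worker(q, send, batch_size, retry_delay, stop):
    """Worker thread: build a batch from the queue, send it, requeue it on temporary errors."""
    while not stop.is_set():
        try:
            batch = [q.get(timeout=0.1)]  # wait briefly so the stop flag is re-checked
        except queue.Empty:
            continue
        while len(batch) < batch_size:    # top up with whatever is already waiting
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        try:
            send(batch)
        except TemporaryError:
            time.sleep(retry_delay)
            for msg in batch:             # put the messages back for a later retry
                q.put(msg)
        finally:
            for _ in batch:               # one task_done() per message taken out
                q.task_done()


def run(lines, send, n_workers=2, batch_size=100, retry_delay=1.0):
    q = queue.Queue()                     # unbounded on purpose, see the note below
    stop = threading.Event()
    threads = [threading.Thread(target=worker, args=(q, send, batch_size, retry_delay, stop))
               for _ in range(n_workers)]
    threads.append(threading.Thread(target=reader, args=(lines, q)))
    for t in threads:
        t.start()
    threads[-1].join()                    # reader done: every line is in the queue
    q.join()                              # every message sent, including retries
    stop.set()
    for t in threads[:-1]:
        t.join()
```

Messages handed back after a temporary error are re-added before the originals are marked done, so `q.join()` only returns once every message has actually been sent. With a bounded queue, workers putting messages back could block while the queue is full and nobody drains it, which is why this sketch leaves it unbounded. Any exception other than `TemporaryError` ends that worker thread and drops its batch; if every worker died, `run()` would wait forever, so a real plugin would need to handle that case.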
>> >>>> Now, this whole logic is quite complicated (though not extremely
>> >>>> complicated), and it would be a show-stopper IMO if one had to
>> >>>> re-do it for every such non-C output module.
>> >>>>
>> >>>> A better option IMO is to have a guide that says:
>> >>>> - make sure you have omprog
>> >>>> - configure the template like this
>> >>>> - deploy this skeleton script like this (list of languages, link to
>> >>>> each repo and how to add new languages)
>> >>>> - import the library you need for sending to X
>> >>>> - change the "send()" function (or whatever its name is) to send the
>> >>>> array of messages to X.
>> >>>>
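For the last two steps of such a guide, the destination-specific part could be as small as the `send()` below, aimed here at Solr using only the Python standard library. Assumptions: a Solr core named `logs` on `localhost:8983`, and an `/update` handler that accepts a JSON array of documents sent with `Content-Type: application/json`. The URL constant and the `build_request()` helper are hypothetical and would need adjusting to the real setup.

```python
import json
import urllib.request

# Assumption: a Solr core named "logs" on the default port; adjust to your setup.
SOLR_UPDATE_URL = "http://localhost:8983/solr/logs/update"


def build_request(batch, url=SOLR_UPDATE_URL):
    """Turn a list of log dicts into one HTTP POST carrying a JSON array of documents."""
    body = json.dumps(batch).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(batch, timeout=10):
    """The destination-specific part of the skeleton: post one batch to Solr."""
    with urllib.request.urlopen(build_request(batch), timeout=timeout) as resp:
        return resp.status
```

`urlopen()` raises `urllib.error.HTTPError` for 4xx/5xx responses and `urllib.error.URLError` when the server cannot be reached, which maps fairly naturally onto the permanent/temporary split (for example, a 400 as permanent and a connection failure as temporary); catch `HTTPError` first, since it is a subclass of `URLError`. Depending on the Solr configuration, documents may only become searchable after a commit.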
>> >>>> Only one issue left (as far as I see): multiline messages. Let's
>> >>>> say the template sends JSONs separated by a newline (is there a
>> >>>> better way?). If one field (say $msg) spreads over multiple lines,
>> >>>> then the "reading" thread can't simply read from stdin
>> >>>> (non-blocking!) line by line. Maybe we can use rsyslog's built-in
>> >>>> newline escaping and reconstruct newlines in the "send()" threads?
>> >>>>
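On the multiline question: if the template JSON-encodes each field (rsyslog's property replacer has a `json` option for this), a newline inside `$msg` should travel as an escape sequence rather than a raw newline, so every record stays on one physical line and line-by-line reading is safe. The Python side of that round trip, with hypothetical helper names, could look like this:

```python
import json


def encode_line(record):
    """Encode one log record as a single line of JSON.

    json.dumps escapes control characters inside strings, so a newline in
    the message becomes backslash + 'n' and can never split the record.
    """
    return json.dumps(record) + "\n"


def read_records(stream):
    """Split a stream into records on real newlines and decode each one."""
    for line in stream:
        line = line.strip()
        if line:
            yield json.loads(line)
```

Decoding restores the original newline inside the field, so nothing has to be reconstructed by hand in the `send()` threads.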
>> >>>> See? Quite a lot of logic. I think providing this for people
>> >>>> instead of having them re-invent it for every script they do would
>> >>>> be a huge step towards adoption.
>> >>>>
>> >>>>
>> >>> This is why I am suggesting that instead of ASCII-based message
>> >>> passing (and encoding it in JSON, dealing with escaping, linebreaks,
>> >>> etc.) we should use protocol buffers and/or Thrift for the message
>> >>> passing.
>> >>>
>> >>> It would let logs be passed as structures, using optimized parser
>> >>> libraries (one of the problems with json-c is its lack of
>> >>> optimization), allow batches of logs to be passed, allow non-log
>> >>> messages to be passed without any possibility of confusion, etc.
>> >>>
>> >>>
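Protocol Buffers and Thrift need generated code and third-party packages, so the following stand-in only shows the framing idea behind these points: each frame carries a type tag and an explicit length, so batches and non-log control messages can share one stream without escaping or newline ambiguity. This is a swapped-in technique (length-prefixed framing with `struct`), not protobuf or Thrift; the frame types and the JSON payload are assumptions chosen for brevity.

```python
import json
import struct

# Hypothetical frame types; not part of rsyslog, protobuf or Thrift.
FRAME_LOG_BATCH = 1
FRAME_CONTROL = 2

_HEADER = struct.Struct("!BI")  # 1-byte frame type, 4-byte big-endian payload length


def encode_frame(frame_type, payload):
    """Prefix a payload with its type and length."""
    return _HEADER.pack(frame_type, len(payload)) + payload


def encode_batch(batch):
    """A whole batch of log records in one frame (JSON only for brevity)."""
    return encode_frame(FRAME_LOG_BATCH, json.dumps(batch).encode("utf-8"))


def decode_frames(data):
    """Yield (frame_type, payload) pairs from a complete byte buffer."""
    offset = 0
    while offset < len(data):
        if len(data) - offset < _HEADER.size:
            raise ValueError("truncated frame header")
        frame_type, length = _HEADER.unpack_from(data, offset)
        offset += _HEADER.size
        payload = data[offset:offset + length]
        if len(payload) != length:
            raise ValueError("truncated frame payload")
        offset += length
        yield frame_type, payload
```

Since the reader always knows how many bytes belong to the current frame, a newline inside a message is just another byte, and a control frame can never be mistaken for log data.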
>> >> David, while I basically agree with all the points that you make, I
>> >> still stick with the idea of pipes. I think you are overlooking the
>> >> work involved. If we "do it right" in the first place, we'll probably
>> >> never do it at all (at least I am very skeptical that I can find
>> >> sufficient time to do the necessary work in rsyslog -- there still is
>> >> a long todo list).
>> >>
>> >> On the other hand, if we accept re-inventing the wheel at least twice
>> >> (so to say), I can fit some of the omprog work into my schedule and
>> >> so we get something to actually prove in practice. This would
>> >> immediately enable anyone to write/contribute output plugins. With
>> >> the knowledge gained there, I could create an input counterpart, most
>> >> probably also with comparatively little effort. From there, it is
>> >> possibly (not 100% sure) also easy to do an external modification and
>> >> filter interface. In the end, we would have all capabilities to
>> >> interface to external programs, albeit not in a perfect way. But
>> >> folks could actually use it.
>> >>
>> >> Once this has been in place for some time, we can review where it hurts
>> >> and then decide on a next and better solution. I think this is much the
>> >> same approach as the overall "let's do non-C, external plugins"
>> >> approach -- because by definition they are second class in many ways
>> >> and really not something one should do if seriously interested in
>> >> performance. But we concluded there is value in that.
>> >>
>> >> For that reasoning I would like to proceed with the pipe interface
>> >> for now.
>> >> Does that make sense to you?
>> >>
>> >
>> > I think it all depends on how much work we are going to have to do.
>> >
>> > Why do we need to do anything? Can't omprog with a JSON template
>> > already do what's needed?
>> >
>>
>> IMHO it can already do quite a few things, but it lacks some others (like a
>> feedback capability, see below). My interest is to extend it in a way that
>> mostly solves these issues.
>>
>>
>> >
>> > As long as we stick to pipes, we aren't going to be able to get
>> > per-message success/failure messages (the buffering in the pipe will
>> > prevent that from working well).
>> >
>>
>> That's right, but we get pretty close to "around when it begins to fail".
>> Per-message feedback is hard in any case. Think e.g. of TCP forwarding: we
>> never know exactly when the connection broke, so this is also approximate.
>>
>>
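One way to approximate per-message feedback over pipes is an acknowledgement line per message: the plugin reads one message, processes it, and answers immediately. The `OK`/`ERR` reply format and the `serve()` name below are hypothetical, not an existing omprog feature. The answer is flushed explicitly because Python block-buffers stdout when it is a pipe.

```python
def serve(stdin, stdout, handle):
    """Process one message per line and answer each with a status line.

    Hypothetical protocol: "OK" when handle() succeeded, "ERR <reason>" otherwise.
    """
    for line in stdin:
        try:
            handle(line.rstrip("\n"))
            stdout.write("OK\n")
        except Exception as exc:  # report any failure back instead of dying
            stdout.write("ERR %s\n" % exc)
        stdout.flush()            # push the answer through the pipe right away
```

In a real plugin this would be called as `serve(sys.stdin, sys.stdout, handle)`. For the answers to line up with individual messages, the sending side would have to wait for each reply before writing the next message, which gives up most of the throughput that pipe buffering provides; that is the trade-off discussed here.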
>> > I guess I'm not fully understanding what's being proposed for the
>> > short term.
>> >
>>
>> Let me put it as a non-technical statement: I am trying to do the minimal
>> thing necessary to make it easier for people to connect to destinations for
>> which rsyslog does not yet have any native plugins. Beyond the tech issues,
>> this probably means making folks aware that they actually *can do that* (as
>> you say, to a large extent even with what we have today, but that seems to
>> be too well hidden). Once this is done, I'd like to sit back, relax, gain
>> some experience and then see how to proceed further.
>>
>> Rainer
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards NOTE WELL:
>> This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites
>> beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE
>> THAT.
>>
>
>
