I agree with Rainer. While 'doing it right' from the start and building from the ground up is a fantastic goal, fitting that in alongside an entirely new direction is difficult at best. When you already have a mounting list of items to work on, taking a fast-to-implement first step that shows how much better something can be is sometimes critical to getting stakeholders to back the time investment that a major overhaul requires.
-- James

-----Original Message-----
From: [email protected] [mailto:[email protected]] On Behalf Of Rainer Gerhards
Sent: Thursday, January 23, 2014 5:48 AM
To: rsyslog-users
Subject: Re: [rsyslog] non-C output plugins

On Thu, Jan 23, 2014 at 9:09 AM, David Lang <[email protected]> wrote:

> On Thu, 23 Jan 2014, Rainer Gerhards wrote:
>
>> On Thu, Jan 23, 2014 at 8:00 AM, David Lang <[email protected]> wrote:
>>
>>> On Thu, 23 Jan 2014, Radu Gheorghe wrote:
>>>
>>>> 2014/1/22 Rainer Gerhards <[email protected]>
>>>>
>>>>> On Wed, Jan 22, 2014 at 3:17 PM, Radu Gheorghe <[email protected]> wrote:
>>>>>
>>>>>> Hi Rainer,
>>>>>>
>>>>>> I'd like to see an Apache Solr <https://lucene.apache.org/solr/>
>>>>>> output plugin.
>>>>>>
>>>>>> You can see in my presentation here
>>>>>> <http://blog.sematext.com/2013/12/16/video-using-solr-for-logs-with-rsyslog-flume-fluentd-and-logstash/>
>>>>>> at slides 21-26 that this is what I did as a demo with omprog.
>>>>>> The actual python script that pushes logs to Solr is on slide 26.
>>>>>
>>>>> Oh, I think that is an excellent example, especially as it is very
>>>>> brief and still does useful work. That's one of the core ideas to
>>>>> convey so that others become interested in doing a similar thing.
>>>>>
>>>>>> The thing works, but:
>>>>>> - it would only send logs one by one
>>>>>> - it has no error handling
>>>>>>
>>>>>> Though I think it should be easy to add.
>>>>>
>>>>> Would be great if you could help with that over time, as I neither
>>>>> have the know how nor environment...
>>>>
>>>> Yep, I could help. If you need some steps for setting up the env
>>>> just let me know.
>>>>
>>>>>> I think Solr is a really nice destination for logs, offering
>>>>>> similar stuff to what Elasticsearch does. Plus some, minus some,
>>>>>> and they're both based on Apache Lucene
>>>>>> <https://lucene.apache.org/>. You can see in the same
>>>>>> presentation that other tools already send logs to Solr, and I
>>>>>> think Apache Flume <https://flume.apache.org/> is the most widely
>>>>>> used of them.
>>>>>
>>>>> Big question now: which changes in rsyslog are required to make
>>>>> you happier as an author of such a non-C module? What would you
>>>>> like to see improved?
>>>>
>>>> Batch processing and error handling is what is needed for this
>>>> script to work. I had a similar script some time ago before
>>>> omelasticsearch and it worked just fine, even at high load
>>>> (although the final argument for omelasticsearch was that the
>>>> script was using way more resources than rsyslog under high load).
>>>>
>>>> If rsyslog could provide me a code sample that does batch
>>>> processing and error handling and I would just import the Solr
>>>> library and add the function to post message batches, that would be
>>>> awesome.
>>>>
>>>> For error handling, things seem pretty straightforward:
>>>> - if an error is permanent (eg: invalid JSON/XML) log a warning
>>>>   (where?) and discard the message
>>>> - if the error is temporary (eg: Solr is down), put messages back
>>>>   in the queue and retry later
>>>>
>>>> What queue? This brings me to the batch processing thing. If you
>>>> need batch processing, you need to read something from the pipe and
>>>> put it into the array and do so until you have enough requests in
>>>> the array to send the batch to Solr (size/time).
>>>>
>>>> This single-threaded approach has some disadvantages:
>>>> - only one thread can send to Solr at the same time. With HTTP
>>>>   requests it's typically better to have more (I know this from
>>>>   omelasticsearch)
>>>> - the process of adding to the array has to wait for the indexing
>>>>   process to finish
>>>>
>>>> Therefore a more efficient way is to have a thread that takes from
>>>> the pipe and puts into the script's (tiny) internal queue, and then
>>>> one or more threads that take messages from the queue, make arrays
>>>> and index them to Solr. And in case of a temporary error, put those
>>>> messages back to the queue.
>>>>
>>>> Now, this whole logic is quite complicated (though not extremely
>>>> complicated), and it would be a show-stopper IMO if one had to
>>>> re-do it for every such non-C output module.
>>>>
>>>> A better option IMO is to have a guide that said:
>>>> - make sure you have omprog
>>>> - configure the template like this
>>>> - deploy this skeleton script like this (list of languages, link to
>>>>   each repo and how to add new languages)
>>>> - import the library you need for sending to X
>>>> - change the "send()" function or whatever the name is to send the
>>>>   array of messages to X.
>>>>
>>>> Only one issue left (as far as I see): multiline messages. Let's
>>>> say the template sends JSONs separated by a newline (is there a
>>>> better way?). If one field (say $msg) spreads on multiple lines,
>>>> then the "reading" thread can't simply read from stdin
>>>> (non-blocking!) line-by-line. Maybe we can use rsyslog's built-in
>>>> newline escaping and re-construct newlines in the "send()" threads?
>>>>
>>>> See? Quite a lot of logic. I think providing this for people
>>>> instead of having them re-invent it for every script they do would
>>>> be a huge step towards adoption.
>>>
>>> This is why I am suggesting that instead of ascii based message
>>> passing (and encoding it in JSON, dealing with escaping, linebreaks,
>>> etc) we should use protocol buffers and/or thrift for the message
>>> passing.
>>>
>>> It would let logs be passed as structures, using optimized parser
>>> libraries (one of the problems with json-c is its lack of
>>> optimization), allow batches of logs to be passed, allow non-log
>>> messages to be passed without any possibility of confusion, etc.
>>
>> David, while I basically agree on all the points that you give, I
>> still stick with the idea of pipes. I think you are overlooking the
>> work involved. If we "do it right" in the first place, we'll probably
>> never do it in any case (at least I am very skeptical if I can find
>> sufficient time to do the necessary work in rsyslog -- there still is
>> a long todo list).
>>
>> On the other hand, if we accept to re-invent the wheel at least twice
>> (so to say), I can stick some of the omprog work into my schedule and
>> so we get something to actually prove in practice. This would
>> immediately enable anyone to write/contribute output plugins. With
>> the knowledge gained there, I could create an input counterpart, most
>> probably also with comparatively little effort. From there, it is
>> possibly (not 100% sure) also easy to do an external modification and
>> filter interface. In the end result, we would have all capabilities
>> to interface to external programs, albeit not in a perfect way. But
>> folks could actually use it.
>>
>> Once this is in place for some time, we can review where it hurts and
>> then decide on a next and better solution. I think this is much the
>> same approach as the overall "let's do non-c, external plugins"
>> approach -- because by definition they are second class in many ways
>> and really not something one should do if seriously interested in
>> performance. But we concluded there is value in that.
>>
>> For that reasoning I would like to proceed with the pipe interface
>> for now.
>> Does that make sense to you?
>
> I think it all depends on how much work we are going to have to do.
>
> why do we need to do anything? can't omprog with a JSON template
> already do what's needed?

IMHO quite some things, but it lacks some others (like a feedback
capability, see below). My interest is to extend it in a way that mostly
solves these issues.

> as long as we stick to pipes, we aren't going to be able to get per
> message success/failure messages (the buffering in the pipe will
> prevent that from working well)

That's right, but we get pretty close to "around when it begins to fail".
Per message is hard in any case. Think e.g. TCP forwarding: we never know
exactly when the connection broke, so this is also approximate.

> I guess I'm not fully understanding what's being proposed for the
> short term.

Let me coin it into a non-technical statement: I try to do the minimal
thing necessary to make it easier for people to connect to destinations
for which rsyslog does not yet have any native plugins. Among the tech
issues, this probably means make folks aware that they actually *can do
that* (as you say, to a large extent even with what we have today, but
that seems to be too well-hidden). Once this is done, I'd like to sit
back, relax, gain some experience and then see how to proceed further.

Rainer
_______________________________________________
rsyslog mailing list
http://lists.adiscon.net/mailman/listinfo/rsyslog
http://www.rsyslog.com/professional-services/
What's up with rsyslog? Follow https://twitter.com/rgerhards
NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you DON'T LIKE THAT.
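For illustration only, here is a rough sketch of what the single-threaded skeleton discussed in the thread might look like: read one message per line (as omprog would deliver them on stdin), group them into batches, discard a batch on a permanent error and retry it on a temporary one. Nothing here comes from rsyslog itself. The names process(), batches(), TemporaryError and PermanentError are made up for this sketch, send() is whatever function the plugin author supplies, batching is by size only (not by time), and giving up after a fixed number of retries is an assumption rather than something the thread settled on.

```python
import sys
import time


class TemporaryError(Exception):
    """Hypothetical: destination is unavailable right now (e.g. Solr is down)."""


class PermanentError(Exception):
    """Hypothetical: the batch can never succeed (e.g. invalid JSON/XML)."""


def batches(lines, batch_size):
    """Group non-empty input lines into lists of at most batch_size items."""
    batch = []
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue
        batch.append(line)
        if len(batch) >= batch_size:
            yield batch
            batch = []
    if batch:
        # Flush whatever is left once the input ends.
        yield batch


def process(lines, send, batch_size=100, max_retries=3, retry_delay=1.0):
    """Send lines in batches via send(batch); return (sent, discarded) counts.

    send is supplied by the plugin author and is expected to raise
    TemporaryError or PermanentError on failure (an assumption of this sketch).
    """
    sent = discarded = 0
    for batch in batches(lines, batch_size):
        for attempt in range(max_retries + 1):
            try:
                send(batch)
                sent += len(batch)
                break
            except PermanentError as exc:
                # Permanent failure: log a warning and drop the batch.
                print("discarding %d messages: %s" % (len(batch), exc),
                      file=sys.stderr)
                discarded += len(batch)
                break
            except TemporaryError as exc:
                if attempt == max_retries:
                    # Assumption: give up eventually instead of blocking forever.
                    print("giving up on %d messages: %s" % (len(batch), exc),
                          file=sys.stderr)
                    discarded += len(batch)
                else:
                    time.sleep(retry_delay)
    return sent, discarded
```

A script started by omprog would then end with something like process(sys.stdin, send), where send() posts one list of messages to the destination. This version blocks on input while a batch is being sent, which is exactly the disadvantage raised in the thread; the threaded reader/queue variant and the multiline-message question are left out.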

