Or, maybe it's easy to add imthrift & omthrift?

2014/1/23 Radu Gheorghe <[email protected]>
> +1. Let's make what we already have (and maybe add some stuff with minimum
> effort) easy to use. Let's see what the adoption is. Let's see people
> complain about performance, if that's ever going to happen, and then we can
> "do it right" according to demand.
>
> 2014/1/23 Boylan, James <[email protected]>
>
>> I agree with Rainer.
>>
>> While 'doing it right' from the start and building from the ground up
>> is a fantastic goal, the reality of trying to fit that in with an
>> entirely new direction is difficult at best. When you have a mounting
>> list of items that need to be worked on already, sometimes taking that
>> fast-to-implement first step to just show how much better something can
>> be is critical to get stakeholders to back investing the kind of time
>> required to make a major overhaul to something.
>>
>> -- James
>>
>> -----Original Message-----
>> From: [email protected] [mailto:
>> [email protected]] On Behalf Of Rainer Gerhards
>> Sent: Thursday, January 23, 2014 5:48 AM
>> To: rsyslog-users
>> Subject: Re: [rsyslog] non-C output plugins
>>
>> On Thu, Jan 23, 2014 at 9:09 AM, David Lang <[email protected]> wrote:
>>
>> > On Thu, 23 Jan 2014, Rainer Gerhards wrote:
>> >
>> >> On Thu, Jan 23, 2014 at 8:00 AM, David Lang <[email protected]> wrote:
>> >>
>> >>> On Thu, 23 Jan 2014, Radu Gheorghe wrote:
>> >>>
>> >>>> 2014/1/22 Rainer Gerhards <[email protected]>
>> >>>>
>> >>>>> On Wed, Jan 22, 2014 at 3:17 PM, Radu Gheorghe <[email protected]> wrote:
>> >>>>>
>> >>>>>> Hi Rainer,
>> >>>>>>
>> >>>>>> I'd like to see an Apache Solr <https://lucene.apache.org/solr/>
>> >>>>>> output plugin.
>> >>>>>>
>> >>>>>> You can see in my presentation here
>> >>>>>> <http://blog.sematext.com/2013/12/16/video-using-solr-for-logs-with-rsyslog-flume-fluentd-and-logstash/>
>> >>>>>> at slides 21-26 that this is what I did as a demo with omprog. The
>> >>>>>> actual python script that pushes logs to Solr is on slide 26.
>> >>>>>
>> >>>>> Oh, I think that is an excellent example, especially as it is very
>> >>>>> brief and still does useful work. That's one of the core ideas to
>> >>>>> convey so that others become interested in doing a similar thing.
>> >>>>>
>> >>>>>> The thing works, but:
>> >>>>>> - it would only send logs one by one
>> >>>>>> - it has no error handling
>> >>>>>>
>> >>>>>> Though I think it should be easy to add.
>> >>>>>
>> >>>>> Would be great if you could help with that over time, as I neither
>> >>>>> have the know how nor environment...
>> >>>>
>> >>>> Yep, I could help. If you need some steps for setting up the env just
>> >>>> let me know.
>> >>>>
>> >>>>>> I think Solr is a really nice destination for logs, offering similar
>> >>>>>> stuff to what Elasticsearch does. Plus some, minus some, and they're
>> >>>>>> both based on Apache Lucene <https://lucene.apache.org/>. You can
>> >>>>>> see in the same presentation that other tools already send logs to
>> >>>>>> Solr, and I think Apache Flume <https://flume.apache.org/> is the
>> >>>>>> most widely used of them.
>> >>>>>
>> >>>>> Big question now: which changes in rsyslog are required to make you
>> >>>>> happier as an author of such a non-C module? What would you like to
>> >>>>> see improved?
>> >>>>
>> >>>> Batch processing and error handling are what is needed for this
>> >>>> script to work. I had a similar script some time ago before
>> >>>> omelasticsearch and it worked just fine, even at high load (although
>> >>>> the final argument for omelasticsearch was that the script was using
>> >>>> way more resources than rsyslog under high load).
>> >>>>
>> >>>> If rsyslog could provide me a code sample that does batch processing
>> >>>> and error handling and I would just import the Solr library and add
>> >>>> the function to post message batches, that would be awesome.
>> >>>>
>> >>>> For error handling, things seem pretty straightforward:
>> >>>> - if an error is permanent (eg: invalid JSON/XML) log a warning
>> >>>>   (where?) and discard the message
>> >>>> - if the error is temporary (eg: Solr is down), put messages back
>> >>>>   in the queue and retry later
>> >>>>
>> >>>> What queue? This brings me to the batch processing thing. If you
>> >>>> need batch processing, you need to read something from the pipe and
>> >>>> put it into the array and do so until you have enough requests in
>> >>>> the array to send the batch to Solr (size/time).
>> >>>>
>> >>>> This single-threaded approach has some disadvantages:
>> >>>> - only one thread can send to Solr at the same time. With HTTP
>> >>>>   requests it's typically better to have more (I know this from
>> >>>>   omelasticsearch)
>> >>>> - the process of adding to the array has to wait for the indexing
>> >>>>   process to finish
>> >>>>
>> >>>> Therefore a more efficient way is to have a thread that takes from
>> >>>> the pipe and puts into the script's (tiny) internal queue, and then
>> >>>> one or more threads that take messages from the queue, make arrays
>> >>>> and index them to Solr. And in case of a temporary error, put those
>> >>>> messages back to the queue.
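
The reader-thread / sender-thread design described above could be sketched
in Python roughly as follows. This is only a minimal sketch under stated
assumptions, not a skeleton that rsyslog ships: the names run, worker,
collect_batch, read_lines, TemporaryError and PermanentError are all
hypothetical, the omprog template is assumed to write one message per line
to the script's stdin, and send is assumed to be a user-supplied function
that posts a list of messages to the destination and raises one of the two
error types on failure.

```python
import queue
import sys
import threading
import time


class TemporaryError(Exception):
    """Hypothetical: destination unavailable (e.g. Solr is down); retry later."""


class PermanentError(Exception):
    """Hypothetical: batch can never succeed (e.g. invalid JSON/XML)."""


def read_lines(stream, q):
    """Reader: move one message per line from the pipe into the queue."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            q.put(line)


def collect_batch(q, max_size, max_wait):
    """Gather up to max_size messages, waiting at most max_wait seconds."""
    batch = []
    deadline = time.monotonic() + max_wait
    while len(batch) < max_size:
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            break
        try:
            batch.append(q.get(timeout=remaining))
        except queue.Empty:
            break
    return batch


def worker(q, send, stop, max_size=100, max_wait=1.0, retry_delay=1.0):
    """Sender: build batches and pass them to send() until input is drained."""
    while not (stop.is_set() and q.empty()):
        batch = collect_batch(q, max_size, max_wait)
        if not batch:
            continue
        try:
            send(batch)
        except PermanentError as err:
            # Permanent failure: log a warning and discard the batch.
            print(f"discarding {len(batch)} messages: {err}", file=sys.stderr)
        except TemporaryError:
            # Temporary failure: put the messages back and retry later.
            for msg in batch:
                q.put(msg)
            time.sleep(retry_delay)


def run(stream, send, workers=2, **opts):
    """Start sender threads, then read the stream until EOF and drain."""
    q = queue.Queue()  # assumption: unbounded, so re-queueing never blocks
    stop = threading.Event()
    threads = [
        threading.Thread(target=worker, args=(q, send, stop), kwargs=opts)
        for _ in range(workers)
    ]
    for t in threads:
        t.start()
    read_lines(stream, q)  # returns at EOF, i.e. when the pipe is closed
    stop.set()
    for t in threads:
        t.join()
```

A script built on this would call run(sys.stdin, send) with its own send
function. Note two simplifications: re-queued messages go to the back of the
queue, so ordering is not preserved after a temporary error, and a
destination that never recovers keeps the script retrying indefinitely.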
>> >>>>
>> >>>> Now, this whole logic is quite complicated (though not extremely
>> >>>> complicated), and it would be a show-stopper IMO if one had to
>> >>>> re-do it for every such non-C output module.
>> >>>>
>> >>>> A better option IMO is to have a guide that said:
>> >>>> - make sure you have omprog
>> >>>> - configure the template like this
>> >>>> - deploy this skeleton script like this (list of languages, link to
>> >>>>   each repo and how to add new languages)
>> >>>> - import the library you need for sending to X
>> >>>> - change the "send()" function or whatever the name is to send the
>> >>>>   array of messages to X.
>> >>>>
>> >>>> Only one issue left (as far as I see): multiline messages. Let's
>> >>>> say the template sends JSONs separated by a newline (is there a
>> >>>> better way?). If one field (say $msg) spreads on multiple lines,
>> >>>> then the "reading" thread can't simply read from stdin
>> >>>> (non-blocking!) line-by-line. Maybe we can use rsyslog's built-in
>> >>>> newline escaping and re-construct newlines in the "send()" threads?
>> >>>>
>> >>>> See? Quite a lot of logic. I think providing this for people
>> >>>> instead of having them re-invent it for every script they do would
>> >>>> be a huge step towards adoption.
>> >>>
>> >>> This is why I am suggesting that instead of ascii based message
>> >>> passing (and encoding it in JSON, dealing with escaping, linebreaks,
>> >>> etc) we should use protocol buffers and/or thrift for the message
>> >>> passing.
>> >>>
>> >>> It would let logs be passed as structures, using optimized parser
>> >>> libraries (one of the problems with json-c is its lack of
>> >>> optimization), allow batches of logs to be passed, allow non-log
>> >>> messages to be passed without any possibility of confusion, etc.
>> >>
>> >> David, while I basically agree on all the points that you give, I
>> >> still stick with the idea of pipes. I think you are overlooking the
>> >> work involved.
>> >> If we "do it right" in the first place, we'll probably never do it
>> >> in any case (at least I am very skeptical if I can find sufficient
>> >> time to do the necessary work in rsyslog -- there still is a long
>> >> todo list).
>> >>
>> >> On the other hand, if we accept to re-invent the wheel at least twice
>> >> (so to say), I can stick some of the omprog work into my schedule and
>> >> so we get something to actually prove in practice. This would
>> >> immediately enable anyone to write/contribute output plugins. With
>> >> the knowledge gained there, I could create an input counterpart, most
>> >> probably also with comparatively little effort. From there, it is
>> >> possibly (not 100% sure) also easy to do an external modification and
>> >> filter interface. In the end result, we would have all capabilities
>> >> to interface to external programs, albeit not in a perfect way. But
>> >> folks could actually use it.
>> >>
>> >> Once this is in place for some time, we can review where it hurts and
>> >> then decide on a next and better solution. I think this is much the
>> >> same approach as the overall "let's do non-c, external plugins"
>> >> approach -- because by definition they are second class in many ways
>> >> and really not something one should do if seriously interested in
>> >> performance. But we concluded there is value in that.
>> >>
>> >> For that reasoning I would like to proceed with the pipe interface
>> >> for now.
>> >> Does that make sense to you?
>> >
>> > I think it all depends on how much work we are going to have to do.
>> >
>> > why do we need to do anything? can't omprog with a JSON template
>> > already do what's needed?
>>
>> IMHO quite some things, but it lacks some others (like a feedback
>> capability, see below). My interest is to extend it in a way that mostly
>> solves these issues.
>>
>> > as long as we stick to pipes, we aren't going to be able to get per
>> > message success/failure messages (the buffering in the pipe will
>> > prevent that from working well)
>>
>> That's right, but we get pretty close to "around when it begins to fail".
>> Per message is hard in any case. Think e.g. TCP forwarding: we never know
>> exactly when the connection broke, so this is also approximate.
>>
>> > I guess I'm not fully understanding what's being proposed for the
>> > short term.
>>
>> Let me coin it into a non-technical statement: I try to do the minimal
>> thing necessary to make it easier for people to connect to destinations
>> for which rsyslog does not yet have any native plugins. Among the tech
>> issues, this probably means making folks aware that they actually *can do
>> that* (as you say, to a large extent even with what we have today, but
>> that seems to be too well-hidden). Once this is done, I'd like to sit
>> back, relax, gain some experience and then see how to proceed further.
>>
>> Rainer
>> _______________________________________________
>> rsyslog mailing list
>> http://lists.adiscon.net/mailman/listinfo/rsyslog
>> http://www.rsyslog.com/professional-services/
>> What's up with rsyslog? Follow https://twitter.com/rgerhards
>> NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
>> of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
>> DON'T LIKE THAT.

