2014/1/22 Rainer Gerhards <[email protected]>

> On Wed, Jan 22, 2014 at 3:17 PM, Radu Gheorghe <[email protected]> wrote:
>
> > Hi Rainer,
> >
> > I'd like to see an Apache Solr <https://lucene.apache.org/solr/> output
> > plugin.
> >
> > You can see in my presentation here
> > <http://blog.sematext.com/2013/12/16/video-using-solr-for-logs-with-rsyslog-flume-fluentd-and-logstash/>
> > at slides 21-26 that this is what I did as a demo with omprog. The actual
> > python script that pushes logs to Solr is on slide 26.
> >
> >
> Oh, I think that is an excellent example, especially as it is very brief
> and still does useful work. That's one of the core ideas to convey so that
> others become interested in doing a similar thing.
>
>
> > The thing works, but:
> > - it would only send logs one by one
> > - it has no error handling
> >
> > Though I think it should be easy to add.
> >
>
> Would be great if you could help with that over time, as I have neither the
> know-how nor the environment...
>
>
Yep, I could help. If you need some steps for setting up the env just let
me know.


>
> >
> > I think Solr is a really nice destination for logs, offering similar stuff
> > to what Elasticsearch does. Plus some, minus some, and they're both based
> > on Apache Lucene <https://lucene.apache.org/>. You can see in the same
> > presentation that other tools already send logs to Solr, and I think Apache
> > Flume <https://flume.apache.org/> is the most widely used of them.
> >
> >
> Big question now: which changes in rsyslog are required to make you happier
> as an author of such a non-C module? What would you like to see improved?
>

Batch processing and error handling are what this script needs to work
properly. I had a similar script some time ago, before omelasticsearch, and
it worked just fine, even at high load (although the final argument for
omelasticsearch was that the script was using way more resources than
rsyslog under high load).

If rsyslog could provide a code sample that does batch processing and
error handling, so that I would only have to import the Solr library and
add the function to post message batches, that would be awesome.

For error handling, things seem pretty straightforward:
- if an error is permanent (eg: invalid JSON/XML) log a warning (where?)
and discard the message
- if the error is temporary (eg: Solr is down), put messages back in the
queue and retry later
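
A rough sketch of that policy in Python, just to make it concrete. The
exception names, the deliver() helper and logging to stderr are made up for
illustration; nothing here comes from rsyslog or from a Solr library, and
send() stands for whatever function talks to the destination:

```python
import sys
from collections import deque


class PermanentError(Exception):
    """Hypothetical: the message itself was rejected (e.g. invalid JSON/XML)."""


class TemporaryError(Exception):
    """Hypothetical: the destination is unavailable right now (e.g. Solr is down)."""


def deliver(message, send, retry_queue, log=sys.stderr):
    """Try to send one message and return True if it went through.

    Permanent failures are logged and dropped. Temporary failures put the
    message back at the front of retry_queue so it is retried first.
    """
    try:
        send(message)
        return True
    except PermanentError as err:
        # Where to log is left open in this thread; stderr is an assumption.
        print(f"discarding message: {err}", file=log)
        return False
    except TemporaryError:
        retry_queue.appendleft(message)
        return False
```

The caller decides when to drain retry_queue again, for example after a
short sleep.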

What queue? This brings me to the batch processing thing. If you need batch
processing, you need to read something from the pipe and put it into the
array and do so until you have enough requests in the array to send the
batch to Solr (size/time).
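
As a minimal single-threaded sketch of that size/time batching (the function
name, the default limits and the injectable clock are my own assumptions,
not something rsyslog or omprog provides):

```python
import time


def batches(lines, max_size=500, max_wait=5.0, clock=time.monotonic):
    """Group an iterable of lines into lists of at most max_size items.

    A batch is also flushed once max_wait seconds have passed since its
    first item arrived. The age is only checked when a new line arrives,
    so a quiet pipe can hold a partial batch for longer than max_wait.
    """
    batch = []
    started = None
    for line in lines:
        line = line.rstrip("\n")
        if not line:
            continue  # skip empty lines
        if not batch:
            started = clock()  # the first item starts the batch timer
        batch.append(line)
        if len(batch) >= max_size or clock() - started >= max_wait:
            yield batch
            batch = []
    if batch:
        yield batch  # flush what is left when the pipe is closed
```

Used as "for batch in batches(sys.stdin): send(batch)", where send() is the
hypothetical function that posts a list of messages to Solr.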

This single-threaded approach has some disadvantages:
- only one thread can send to Solr at the same time. With HTTP requests
it's typically better to have more (I know this from omelasticsearch)
- the process of adding to the array has to wait for the indexing process
to finish

Therefore a more efficient way is to have a thread that takes from the pipe
and puts into the script's (tiny) internal queue, and then one or more
threads that take messages from the queue, make arrays and index them to
Solr. And in case of a temporary error, put those messages back in the
queue.
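
Here is a sketch of that layout using only the Python standard library.
Everything named here (TemporaryError, reader, worker, run, the defaults) is
hypothetical, and send() is whatever function indexes a list of messages. To
keep it short, workers batch by size only, and the queue is unbounded so that
putting messages back can never block; a bounded queue would need more care
to avoid the reader and the workers blocking each other:

```python
import queue
import sys
import threading
import time


class TemporaryError(Exception):
    """Hypothetical: raised by send() when the destination is unavailable."""


def reader(stream, q):
    """Move non-empty lines from the pipe into the internal queue."""
    for line in stream:
        line = line.rstrip("\n")
        if line:
            q.put(line)


def worker(q, send, batch_size, retry_delay):
    """Take messages off the queue, group them and pass them to send()."""
    while True:
        batch = [q.get()]  # block until there is at least one message
        while len(batch) < batch_size:
            try:
                batch.append(q.get_nowait())
            except queue.Empty:
                break
        try:
            send(batch)
        except TemporaryError:
            for message in batch:
                q.put(message)  # back to the queue for a later retry
            time.sleep(retry_delay)
        except Exception as err:
            # Anything else is treated as permanent: report and drop.
            print(f"discarding {len(batch)} messages: {err}", file=sys.stderr)
        finally:
            for _ in batch:
                q.task_done()


def run(stream, send, workers=2, batch_size=100, retry_delay=1.0):
    """Start the reader and sender threads; return once the queue is drained."""
    q = queue.Queue()  # unbounded, see the note above
    for _ in range(workers):
        threading.Thread(target=worker, daemon=True,
                         args=(q, send, batch_size, retry_delay)).start()
    t = threading.Thread(target=reader, args=(stream, q))
    t.start()
    t.join()  # the pipe was closed
    q.join()  # every queued message was sent or discarded
```

Messages are put back before task_done() is called, so q.join() cannot
return while a failed batch is still waiting for its retry. If the
destination never comes back, run() never returns.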

Now, this whole logic is quite complicated (though not extremely
complicated), and it would be a show-stopper IMO if one had to re-do it for
every such non-C output module.

A better option IMO is to have a guide that says:
- make sure you have omprog
- configure the template like this
- deploy this skeleton script like this (list of languages, link to each
repo and how to add new languages)
- import the library you need for sending to X
- change the "send()" function, or whatever the name is, to send the array
of messages to X.

Only one issue left (as far as I see): multiline messages. Let's say the
template sends JSONs separated by a newline (is there a better way?). If
one field (say $msg) spans multiple lines, then the "reading" thread
can't simply read from stdin (non-blocking!) line-by-line. Maybe we can use
rsyslog's built-in newline escaping and re-construct newlines in the
"send()" threads?
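
Something like this could do the re-constructing on the script side. It
assumes each line is one JSON object and that the line feed arrives escaped
as the literal text "#012"; that marker is an assumption about the rsyslog
configuration and has to be adjusted if escaping is set up differently. The
function name and the default field name are made up as well:

```python
import json

ESCAPED_LF = "#012"  # assumption: how the line feed was escaped by rsyslog


def restore_newlines(line, field="msg"):
    """Parse one JSON line and put real newlines back into one field."""
    doc = json.loads(line)
    value = doc.get(field)
    if isinstance(value, str):
        doc[field] = value.replace(ESCAPED_LF, "\n")
    return doc
```

One caveat: a message that really contains the text "#012" would be changed
too, so this is only safe if that sequence cannot appear on its own.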

See? Quite a lot of logic. I think providing this for people instead of
having them re-invent it for every script they do would be a huge step
towards adoption.


>
> My thinking is that we could (assuming your permission) put this into the
> source tree (or somewhere else?) and say "look, this is a sample of how to
> do it". Indeed, that alone I think would be useful. But what would be the
> next steps? What caused you trouble? etc, etc...
>

Sure, you can reproduce anything you see from there.

I think I've addressed the "troublesome" bits above.

Best regards,
Radu


>
> Rainer
>
>
> > Best regards,
> > Radu
> >
> >
> > 2014/1/22 Rainer Gerhards <[email protected]>
> >
> > > Hi folks,
> > >
> > > I would like to create (maybe with someone's help) a real non-C output
> > > plugin. Ideally this would be something that's not too hard to do so that
> > > the code can act as a simple sample. For that reason, it should work on a
> > > push model basis. But that's not a hard requirement.
> > >
> > > Any suggestions what would be needed? What would you like to see?
> > >
> > > Once we have suggestions, we can see that we can implement it and I can
> > > see what in rsyslog's plumbing actually needs to be changed. Your
> > > collaboration during that effort would be appreciated.
> > >
> > > Please let me know your suggestions.
> > >
> > > Thanks,
> > > Rainer
> > > _______________________________________________
> > > rsyslog mailing list
> > > http://lists.adiscon.net/mailman/listinfo/rsyslog
> > > http://www.rsyslog.com/professional-services/
> > > What's up with rsyslog? Follow https://twitter.com/rgerhards
> > > NOTE WELL: This is a PUBLIC mailing list, posts are ARCHIVED by a myriad
> > > of sites beyond our control. PLEASE UNSUBSCRIBE and DO NOT POST if you
> > > DON'T LIKE THAT.
> > >