On Thu, 15 Jan 2015, Radu Gheorghe wrote:

On Thu, Jan 15, 2015 at 7:12 PM, Rainer Gerhards <[email protected]>
wrote:
[...]

2. provide a general infrastructure for pull models, whatever this is to be
used for

[...]

Use cases for 2 exist, but I don't know the specifics. They surface every
now and then on the ML when someone asks for pull integration. I think there
was even a discussion with Radu, but I may be wrong.


I remember some discussions. For example, if rsyslog could buffer and expose
an API, one could easily implement a plugin (say, on top of Elasticsearch)
that would enable the datastore to pull in data at its own pace.

This contrasts with the current push model, where one has to tune things
like batch sizes and retries in a way that doesn't overload the destination.

I'm missing something here. If rsyslog has a queue for the destination, and the delivery to the destination is via TCP, how is a pull any better than a push? If the destination accepts data at a faster pace than it can really handle, why would the pull be any better? If the destination only accepts data at the rate it can handle, then the traffic will back up into the rsyslog queue.

Which is not really possible, because you can't control the load generated
by queries, GC, and whatnot.

Of course the pull model has its own caveats, but it would be nice to be
able to choose what works best for every use case.

I see a case for pull in rsyslog grabbing data (sort of a remote imfile type of thing), connecting to an existing API to fetch data, or remotely pulling a data file rather than needing an agent on the remote machine to scrape and send it (this may be the right answer to getting logs out of a bunch of Windows machines, for example). The journald input is an example of a pull input.

But for output from rsyslog, I'm not seeing a lot of use. On the other hand, I could see an output module being pretty straightforward.

Have the data go to a queue, and instead of the output module being invoked by the main loop, it would sit and wait for a request from the network and then read messages from its queue and deliver them to the network. Once the remote endpoint signals that it has received the data, mark the messages as delivered (removing them from the queue).
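
To make the idea concrete, here is a rough Python sketch of that flow. It is only an illustration under my own assumptions: PullQueue, pull(), ack() and nack() are made-up names, not rsyslog's queue API, and the network side is left out entirely. The point is just that a message handed out on a pull request stays in the queue until the remote end acknowledges it.

```python
from collections import deque
import itertools


class PullQueue:
    """Hypothetical pull-style output queue (not rsyslog's actual API)."""

    def __init__(self):
        self._pending = deque()          # messages not yet handed out
        self._in_flight = {}             # batch_id -> messages awaiting an ack
        self._ids = itertools.count(1)   # source of batch identifiers

    def enqueue(self, msg):
        self._pending.append(msg)

    def pull(self, max_msgs):
        """Hand out up to max_msgs messages; they stay queued until acked."""
        count = min(max_msgs, len(self._pending))
        batch = [self._pending.popleft() for _ in range(count)]
        if not batch:
            return None, []
        batch_id = next(self._ids)
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """The remote endpoint confirmed receipt: drop the batch for good."""
        self._in_flight.pop(batch_id, None)

    def nack(self, batch_id):
        """Delivery failed: put the batch back at the front, in order."""
        batch = self._in_flight.pop(batch_id, [])
        self._pending.extendleft(reversed(batch))

    def __len__(self):
        in_flight = sum(len(b) for b in self._in_flight.values())
        return len(self._pending) + in_flight
```

A real module would also need a timeout that treats an unacknowledged batch as failed, otherwise a client that pulls and then disappears would hold messages in flight forever.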


One area that I think could use some long-term attention is the internal API to the queues. Queue contention can be a problem, and disk queues in particular are much slower than they should be. Batching things helps a lot in this area, but this contention can lead to very odd situations where performance is actually worse with less traffic. If you are tuned to handle a lot of traffic with multiple threads, the contention with a low volume of traffic can actually decrease throughput, as shown by the LDAP thread where having rsyslog write data to a file lets it receive more logs than if it throws them away.
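
As a toy illustration of why batching helps, the Python sketch below counts how often a consumer has to take the queue lock to drain the same number of messages at different batch sizes. CountingQueue and drain() are hypothetical names for this example only. It says nothing about rsyslog's real queue implementation, and it does not reproduce the low-traffic slowdown described above; it only shows the lock-acquisition arithmetic.

```python
import threading
from collections import deque


class CountingQueue:
    """Toy locked queue that counts lock acquisitions (illustrative only)."""

    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def dequeue_batch(self, max_msgs):
        # One lock acquisition covers the whole batch.
        with self._lock:
            self.lock_acquisitions += 1
            count = min(max_msgs, len(self._items))
            return [self._items.popleft() for _ in range(count)]


def drain(queue, batch_size):
    """Dequeue until the queue is empty; return the number of messages seen."""
    total = 0
    while True:
        batch = queue.dequeue_batch(batch_size)
        if not batch:
            return total
        total += len(batch)


one_at_a_time = CountingQueue(range(1000))
drain(one_at_a_time, 1)

batched = CountingQueue(range(1000))
drain(batched, 100)

# Each count includes the final empty dequeue that ends the loop.
print(one_at_a_time.lock_acquisitions, batched.lock_acquisitions)
```

Every one of those acquisitions is a point where worker threads can collide, so fewer acquisitions per message means fewer chances to contend.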

David Lang
