On Thu, 15 Jan 2015, Radu Gheorghe wrote:
On Thu, Jan 15, 2015 at 7:12 PM, Rainer Gerhards wrote:
[...]
2. provide a general infrastructure for pull models, whatever this is to be
used for
[...]
Use cases for 2 exist, but I don't know the specifics. They surface every
now and then on the ML when someone asks for pull integration. I think there
was even a discussion with Radu, but I may be wrong.
I remember some discussions, for example if rsyslog could buffer and expose
an API, one could easily implement a plugin (say, on top of Elasticsearch)
that would enable the datastore to pull in data at its own pace.
This contrasts with the current push model, where one has to tune things
like batch sizes and retries in a way that doesn't overload the destination.
I'm missing something here. If rsyslog has a queue for the destination, and the
delivery to the destination is via TCP, how is a pull any better than a push? If
the destination accepts data at a faster pace than it can really handle, why
would the pull be any better? If the destination only accepts data at the rate
it can handle, then the traffic will back up into the rsyslog queue.
Which is not really possible, because you can't control the load generated
by queries, GC, and whatnot.
Of course the pull model has its own caveats, but it would be nice to be
able to choose what works best for every usecase.
I see a case for pull in rsyslog grabbing data (sort of a remote imfile type of
thing), connecting to an existing API to fetch data, or remotely pulling a data
file rather than having to have an agent on the remote machine to scrape and
send it (this may be the right answer to getting logs out of a bunch of windows
machines for example). The journald input is an example of a pull input.
But for output from rsyslog, I'm not seeing a lot of use. On the other hand, I
could see an output module being pretty straightforward.
Have the data go to a queue, and instead of the output module being invoked by
the main loop, it would sit and wait for a request from the network and then
read messages from its queue and deliver them to the network. Once the remote
endpoint signals that it's received the data, mark the messages as delivered
(removing them from the queue).
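
A minimal sketch of that flow in Python, to make the hand-out/acknowledge cycle
concrete. Everything here is hypothetical: PullQueue and its fetch/ack/nack
methods are names made up for illustration, not an existing rsyslog API, and a
real module would sit on top of rsyslog's own queue and a network listener.

```python
from collections import deque
import itertools


class PullQueue:
    """Hypothetical in-memory stand-in for the queue behind a pull output."""

    def __init__(self):
        self._pending = deque()    # messages not yet handed to a client
        self._in_flight = {}       # batch_id -> messages awaiting an ack
        self._ids = itertools.count(1)

    def enqueue(self, msg):
        self._pending.append(msg)

    def fetch(self, max_msgs):
        """Serve a pull request: hand out up to max_msgs messages.

        The messages stay in the queue (in flight) until they are acked.
        """
        n = min(max_msgs, len(self._pending))
        if n == 0:
            return None, []
        batch = [self._pending.popleft() for _ in range(n)]
        batch_id = next(self._ids)
        self._in_flight[batch_id] = batch
        return batch_id, batch

    def ack(self, batch_id):
        """Remote end confirmed receipt: the batch is removed for good."""
        self._in_flight.pop(batch_id, None)

    def nack(self, batch_id):
        """No confirmation (e.g. connection lost): requeue in original order."""
        batch = self._in_flight.pop(batch_id, [])
        self._pending.extendleft(reversed(batch))

    def size(self):
        """Messages still held, whether pending or in flight."""
        return len(self._pending) + sum(len(b) for b in self._in_flight.values())


q = PullQueue()
for i in range(5):
    q.enqueue(f"msg {i}")

batch_id, batch = q.fetch(3)   # remote endpoint asks for up to 3 messages
q.ack(batch_id)                # remote endpoint confirms; messages are dropped
```

The point of the in-flight map is that a batch is only removed on ack, so a
client that disappears mid-transfer does not lose messages.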
One area that I think could use some long-term attention is the internal API to
the queues. Queue contention can be a problem, and disk queues in particular are
much slower than they should be. Batching things helps a lot in this area, but
this contention can lead to very odd situations where performance is actually
worse with less traffic (if you are tuned to handle a lot of traffic with
multiple threads, the contention with a low volume of traffic can actually
decrease throughput, as shown by the LDAP thread where having rsyslog write data
to a file lets it receive more logs than if it throws them away).
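
To illustrate why batching helps, here is a rough Python sketch that counts how
often the queue lock is taken when draining the same queue with different batch
sizes. CountingQueue and drain are hypothetical names, this is not how the
rsyslog queue code is written, and it only counts lock round trips in a single
thread rather than measuring real contention.

```python
import threading
from collections import deque


class CountingQueue:
    """Hypothetical queue that records how often its lock is acquired."""

    def __init__(self, items):
        self._items = deque(items)
        self._lock = threading.Lock()
        self.lock_acquisitions = 0

    def dequeue_batch(self, max_msgs):
        # One lock acquisition per call, no matter how many messages we take.
        with self._lock:
            self.lock_acquisitions += 1
            n = min(max_msgs, len(self._items))
            return [self._items.popleft() for _ in range(n)]


def drain(queue, batch_size):
    """Pull batches until the queue is empty; return the message count."""
    total = 0
    while True:
        batch = queue.dequeue_batch(batch_size)
        if not batch:
            return total
        total += len(batch)


one_at_a_time = CountingQueue(range(1000))
drain(one_at_a_time, 1)

batched = CountingQueue(range(1000))
drain(batched, 100)

print(one_at_a_time.lock_acquisitions, batched.lock_acquisitions)
```

Each extra acquisition is a point where worker threads can collide, so fewer
trips through the lock leaves less room for contention.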
David Lang