Hello! I'm one of the people behind the rsyslog zeromq plugins. I was reading this thread with interest, as I have time set aside to dig into elasticsearch next month. I know little about it yet, but I found this statement interesting:
"Pushing" to ES seems to work much less reliably than having ES "pull" messages " As a heads up, we have somewhat experimental (well, I would consider the zeromq plugins as a whole to still be experimental actually) support for ZMQ_REP output sockets in the omzeromq plugin. This output blocks until it receives a request from a ZMQ_REQ socket, at which time it serves up the next waiting log line from rsyslog. It in effect makes the rsyslog output queue an on demand request driven log line server. There's more work to do on it (we're talking about adding in batching support so that each request would receive N messages in reply), but even at the experimental state we've been able to serve up ~ 60,000 log lines a second over the output. While this is far below rsyslog's full capacity we feel it's not bad for a request / reply queue and at least a good starting point. Is there some way this could be useful with elastic search, given the statement that elastic search does better when it's requesting data rather than being pushed data? Brian On Thu, Feb 9, 2012 at 5:10 AM, Nathan Scott <[email protected]> wrote: > Hi there Vlad, > > ----- Original Message ----- > > -----BEGIN PGP SIGNED MESSAGE----- > > Hash: SHA1 > > ... > > work fine under many use cases. However, there are a few fundamental > > limitations with the current omelasticsearch/rsyslog integration: > > > > - - omelasticsearch uses curl to make the API calls to ES. The > > downside of this is that you have to specify a hostname. ES supports > > auto-discovering a cluster, as well as fail-over. If the host > > omelasticsearch is using goes down, the cluster may still be fully > > functional, but omelasticsearch won't be able to find it. Of course, > > you could go in and add other cluster members as failover actions, but > > this would mean a config change every time you change your ES > > topology. > > *nod* ... > > > - - curl has a default of only returning 16KB of the HTTP response. 
> > This response contains the information of which messages were
> > successfully inserted into ES, and which failed. For a large batch of
> > messages, one could easily get a response over the 16KB limit. This
> > would require running a custom-compiled version of curl, that ups this
> > limit.
>
> Limiting the batch size to something that can't blow that limit would
> seem preferable than a libcurl rebuild, perhaps. The ES responses are
> fairly terse, we should be able to calculate a fixed upper bound.
>
> > - - "Pushing" to ES seems to work much less reliably than having ES
> > "pull" messages. For similarly small-sized batches (~250 messages), ES
> > would often take 6-8ms for the bulk insert. However, it would
>
> That's quite surprising.
>
> > occasionally spike up to 6000ms, which would cause quite a backlog in
> > the queue.
>
> That's *really* surprising - 6 seconds?!?? Did you figure out what
> was going on there, OOC? That sounds like a network/socket timeout
> on connect perhaps? Can you describe the platform that you're using
> (Linux? distro? kernel version etc) - could be something like this
> old chestnut causing this...
> http://lkml.indiana.edu/hypermail/linux/net/0908.2/00003.html
>
> > Having ES "pull" messages instead (more on this later)
> > seemed to work much more consistently.
> > ...
> > The method that I'm currently stress-testing is using the
> > ElasticSearch River[1] with a RabbitMQ[2] type. With this setup,
> > rsyslog sends messages to a RabbitMQ queue. ElasticSearch is
>
> That sounds like its just dodging the problem a bit though, by putting
> a persistent queueing system in the middle to hide a deficiency. I'd
> really prefer improving/fixing the underlying issues, which all seem
> to me to not be inherently unfixable...? (except perhaps the first
> issue of needing at least one ES hostname ... but again, the same issue
> must affect rabbitmq, so must be solvable somehow? or using a floating
> IP address or some such trickery).
>
> > configured with the queue's information, and then it periodically
> > pulls messages from that queue. Once it has the messages, it proceeds
> > to bulk index them. If the master ES node goes down, the new master
> > starts pulling messages from the queue. Overall, it seems to work
>
> What happens when the queueing system in the middle goes down though?
> Or when that queueing system gets very busy - haven't we just moved the
> same problem somewhere else?
>
> > well, and the indexing throughput seems higher, due to not pushing
> > messages to ES when it's very busy.
>
> OOC, when you see elasticsearch getting very busy, is it network or
> CPU bound, or something else?
>
> >
> > Those were my experiences - I hope some of that proves useful to
> > others looking into ElasticSearch.
> >
>
> Indeed, thanks for sharing! Much appreciated.
>
> cheers.
>
> --
> Nathan
> _______________________________________________
> rsyslog mailing list
> http://lists.adiscon.net/mailman/listinfo/rsyslog
> http://www.rsyslog.com/professional-services/
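
To make the request/reply idea from the top of this message a bit more concrete, here is a rough sketch in plain Python. It only models the behaviour described (lines wait in a queue, a request blocks until one is available, and a request may ask for up to N lines as in the batching idea). It does not use ZeroMQ and is not the omzeromq code; the names LogLineServer, enqueue, and handle_request are made up for illustration.

```python
import queue


class LogLineServer:
    """Toy model of a request-driven output: lines wait until a consumer asks."""

    def __init__(self):
        self._lines = queue.Queue()

    def enqueue(self, line):
        # Producer side; stands in for lines arriving on the output queue.
        self._lines.put(line)

    def handle_request(self, batch_size=1, timeout=None):
        # Block until at least one line is waiting (or the timeout expires),
        # then reply with up to batch_size lines that are already queued.
        try:
            batch = [self._lines.get(timeout=timeout)]
        except queue.Empty:
            return []
        while len(batch) < batch_size:
            try:
                batch.append(self._lines.get_nowait())
            except queue.Empty:
                break
        return batch


server = LogLineServer()
for i in range(5):
    server.enqueue(f"log line {i}")

# The consumer decides when to ask and how much it wants per request.
first = server.handle_request()              # one line per request
rest = server.handle_request(batch_size=10)  # batched reply, drains what is waiting
print(first)
print(rest)
```

The point of the model is that the consumer sets the pace: nothing leaves the queue until a request comes in, so a busy consumer simply asks less often.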

