Hello, I've been trying to get the Elasticsearch sink going for the past day or so. For the most part, things just work. However, once I get into the 4-10k events-per-second range it seems to fall apart: the channel (memory, in this case) fills up steadily because the sink can't drain events fast enough to keep up.
I read in a couple of places that adding multiple sinks (at least in the HDFS case) can improve throughput, and this did appear to help: I was able to keep up running 10 Elasticsearch sinks with a batchSize of 10,000.

The documentation seems a bit vague on this point, so first off: when multiple sinks are attached to a single memory channel, does every sink have to take and ack an event before it is removed, or is it more like a publisher/consumer model where any one sink can take the event off the channel?

After I got the channel to a stable fill percentage (10,000 batchSize, 10 Elasticsearch sinks), I began to notice my agent dying with no log messages. So before I keep digging, has anyone else run into these issues? My Elasticsearch cluster is 3 nodes tuned for write performance, and it does not seem overwhelmed by Flume. I had considered a second Flume agent dedicated solely to Elasticsearch, since the current config also includes one HDFS sink, but I'm not sure that would really help. For reference, a trimmed sketch of the setup I'm describing is below.

Thanks,
Allan
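Roughly the shape of the config (sources omitted; property names are from the Flume 1.x user guide, but the hostnames, sink names, and capacity numbers here are placeholders rather than my exact values, and es3 through es10 are identical to es1 apart from the name):

  agent.channels = mem
  agent.channels.mem.type = memory
  agent.channels.mem.capacity = 1000000
  # transactionCapacity must be at least as large as the sinks' batchSize
  agent.channels.mem.transactionCapacity = 10000

  agent.sinks = es1 es2 hdfs1

  # All sinks are attached to the same memory channel.
  agent.sinks.es1.type = elasticsearch
  agent.sinks.es1.channel = mem
  # placeholder hostnames
  agent.sinks.es1.hostNames = es-node1:9300,es-node2:9300,es-node3:9300
  agent.sinks.es1.clusterName = elasticsearch
  agent.sinks.es1.indexName = flume
  agent.sinks.es1.batchSize = 10000

  agent.sinks.es2.type = elasticsearch
  agent.sinks.es2.channel = mem
  agent.sinks.es2.hostNames = es-node1:9300,es-node2:9300,es-node3:9300
  agent.sinks.es2.clusterName = elasticsearch
  agent.sinks.es2.indexName = flume
  agent.sinks.es2.batchSize = 10000

  # ... es3 through es10 configured the same way ...

  agent.sinks.hdfs1.type = hdfs
  agent.sinks.hdfs1.channel = mem
  # placeholder path
  agent.sinks.hdfs1.hdfs.path = hdfs://namenode/flume/events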
