Hi Lance,

I tried Kafka Offset Monitor a while back, but it didn't play especially nice with a lot of topics / partitions (we currently have around 1400 topics and 4000 partitions in total). Might be possible to make it work a bit better, but not sure it would be the best way to do alerting.

Thanks for the tip though :).

Best regards,
Mathias

On Mon, 16 Mar 2015 at 21:02 Lance Laursen <[email protected]> wrote:

> Hey Mathias,
>
> Kafka Offset Monitor will give you a general idea of where your consumer
> group(s) are at:
>
> http://quantifind.com/KafkaOffsetMonitor/
>
> However, I'm not sure how useful it will be with "a large number of topics"
> / turning its output into a script that alerts upon a threshold. Could take
> a look and see what they're doing though.
>
> On Mon, Mar 16, 2015 at 8:31 AM, Mathias Söderberg <
> [email protected]> wrote:
>
> > Good day,
> >
> > I'm looking into using SimpleConsumer#getOffsetsBefore and offsets
> > committed in ZooKeeper for monitoring the lag of a consumer group.
> >
> > Our current use case is that we have a service that is continuously
> > consuming messages of a large number of topics and persisting the
> > messages to S3 at somewhat regular intervals (depends on time and the
> > total size of consumed messages for each partition). Offsets are
> > committed to ZooKeeper after the messages have been persisted to S3.
> > The partitions are of varying load, so a simple threshold based on the
> > number of messages we're lagging behind would be cumbersome to maintain
> > due to the number of topics, and most likely prone to unnecessary alerts.
> >
> > Currently our broker configuration specifies log.roll.hours=1 and
> > log.segment.bytes=1GB, and my proposed solution is to have a separate
> > service that would iterate through all topics/partitions and use
> > #getOffsetsBefore with a timestamp that is one (1) or two (2) hours ago
> > and compare the first offset (which from my testing looks to be the
> > offset that is closest in time, i.e. from the log segment that is
> > closest to the timestamp given) with the one that is saved to ZooKeeper.
> >
> > It feels like a pretty solid solution, given that we just want a rough
> > estimate of how much we're lagging behind in time, so that we know
> > (again, roughly) how much time we have to fix whatever is broken before
> > the log segments are deleted by Kafka.
> >
> > Is there anyone doing monitoring similar to this? Are there any obvious
> > downsides of this approach that I'm not thinking about? Thoughts on
> > alternatives?
> >
> > Best regards,
> > Mathias
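
For reference, the comparison step proposed in the original message could look roughly like the sketch below. It is only an illustration of the idea, not anyone's actual implementation: it assumes the two sets of offsets have already been fetched (the boundary offsets via SimpleConsumer#getOffsetsBefore with a timestamp one or two hours back, the committed offsets from ZooKeeper) and leaves that fetching out entirely. The function name, the dictionary layout, and the topic names are all made up for the example, and Python is used purely for brevity. Treating a partition with no committed offset as lagging is an assumption as well.

```python
from typing import Dict, List, Optional, Tuple

# (topic, partition) pair; a hypothetical key layout for this sketch.
TopicPartition = Tuple[str, int]


def lagging_partitions(
    committed: Dict[TopicPartition, int],
    boundary: Dict[TopicPartition, Optional[int]],
) -> List[TopicPartition]:
    """Return partitions whose committed offset is older than the boundary.

    `boundary` holds, per partition, the first offset reported for a
    timestamp such as "two hours ago" (None if nothing was reported).
    `committed` holds the consumer group's committed offsets.
    """
    behind = []
    for tp in sorted(boundary):
        boundary_offset = boundary[tp]
        if boundary_offset is None:
            # No segment older than the timestamp: nothing to compare against.
            continue
        committed_offset = committed.get(tp)
        # Assumption: a partition with no committed offset counts as lagging.
        if committed_offset is None or committed_offset < boundary_offset:
            behind.append(tp)
    return behind


# Made-up example data.
committed = {("events", 0): 1500, ("events", 1): 900, ("clicks", 0): 40}
boundary = {("events", 0): 1200, ("events", 1): 1000, ("clicks", 0): None}
print(lagging_partitions(committed, boundary))
```

Since the time-based lookup appears to work at log-segment granularity, the result is only as precise as the segment roll interval (about an hour with log.roll.hours=1), which matches the rough estimate asked for in the original message.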
