Hi,
I'm facing an issue like the one Christian Tramnitz and Ryan Merriman
discussed in May.
I have a Metron deployment using 0.4.0 on 10 nodes. The indexing topology
stops indexing messages when it hits the 10,000 (10k) message mark. As
Christian found previously, this is related to the Kafka strategy, and after
further debugging I tracked it down to the number of uncommitted
offsets (maxUncommittedOffsets). This is specified in the Kafka spout, and I
confirmed that with a higher or lower value (5k or 15k) the
point at which indexing stops is exactly the value of maxUncommittedOffsets.
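
To illustrate what I think is going on, here is a toy model in Python. It
is not the actual storm-kafka-client code; the function name and its
parameters are made up for illustration, and it assumes the spout simply
stops emitting once the number of uncommitted offsets reaches the limit
and only resumes after a successful commit.

```python
def simulate_spout(total_messages, max_uncommitted_offsets, commits_succeed):
    """Toy model: return how many messages are emitted before the spout stalls.

    Assumption (not taken from the real spout code): the spout refuses to
    emit while the uncommitted count is at the limit, and a successful
    commit clears the whole backlog.
    """
    emitted = 0
    uncommitted = 0
    while emitted < total_messages:
        if uncommitted >= max_uncommitted_offsets:
            if not commits_succeed:
                break  # nothing will ever lower the counter, so we stall here
            uncommitted = 0  # a successful commit clears the backlog
        emitted += 1
        uncommitted += 1
    return emitted


# Commits never happen: the stall point equals the configured limit.
for limit in (5_000, 10_000, 15_000):
    print(limit, simulate_spout(50_000, limit, commits_succeed=False))

# Commits happen: every message gets through.
print(simulate_spout(50_000, 10_000, commits_succeed=True))
```

If this model is roughly right, the limit itself is not the problem; the
missing commits are, which matches the stall point moving with the value I
configure.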
I understand that the suggested workaround (changing the strategy from
UNCOMMITTED_EARLIEST to LATEST) is just that, a workaround and not a fix: I
would expect the topology to ingest data without failing regardless of that
parameter. What seems to happen is that with LATEST the offsets do
successfully get committed to Kafka, while with UNCOMMITTED_EARLIEST at
some point they might not.
When I run the topology with 'LATEST' I usually see messages like this one
on the Kafka Spout (indexing topology):
o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka
[{indexing-0=OffsetAndMetadata{offset=2307113,
metadata='{topic-partition=indexing-0
I do not see such messages on the Kafka Spout when I have the issue and I'm
running UNCOMMITTED_EARLIEST.
Any suggestions on what may be the real source of the issue here? I ran some
tests earlier and it did not seem to be an issue on 0.3.0. Could this be
related to the new Kafka code in Metron? Or maybe to one of the PRs in
Metron or Kafka? I saw one in Metron about duplicate enrichment messages
(METRON-569) and a few in Kafka regarding issues with the committed offset,
but most of those were for newer versions of Kafka than the one Metron is using.
Thanks