Guillem,

I think this ended up being caused by not having enough acker threads to
keep up.  This is controlled by the "topology.acker.executors" Storm
property, which you will find in the indexing topology's flux remote.yaml
file.  In Ambari it is exposed through the "elasticsearch-properties"
property, which is itself a list of properties; within that list there is
an "indexing.executors" property.  If that is set to 0 it would definitely
be a problem, and I think that may even be the default in 0.4.0.  Try
changing it to match the number of partitions dedicated to the indexing
topic.
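
For reference, that setting lives in the top-level "config" section of the
flux file.  A rough sketch of what to look for (the values and layout here
are illustrative only; the real remote.yaml has much more in it and may
pull these values in from a properties file):

    # illustrative sketch, not the literal contents of remote.yaml
    name: "indexing"
    config:
      topology.workers: 1
      # one acker executor per partition of the indexing topic, e.g. 10
      topology.acker.executors: 10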

You could also change the property directly in the flux file
($METRON_HOME/flux/indexing/remote.yaml) and restart the topology from the
command line to verify this fixes it.  If you do use this strategy to test,
make sure you eventually make the change in Ambari so your changes don't
get overridden on a restart.  Changing this setting is admittedly
confusing; there have been some recent commits that address that by
exposing "topology.acker.executors" directly in Ambari in a dedicated
indexing topology section.

You might also want to check out the performance tuning guide we put
together recently:
https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md.
If my guess is wrong and it's not the acker thread setting, the answer is
likely in there.

Hope this helps.  If you're still stuck, send us some more info and we'll
try to help you figure it out.

Ryan

On Mon, Jul 31, 2017 at 12:02 PM, Guillem Mateos <bbguil...@gmail.com>
wrote:

> Hi,
>
> I'm facing an issue like the one Christian Tramnitz and Ryan Merriman
> discussed in May.
>
> I have a Metron deployment using 0.4.0 on 10 nodes. The indexing topology
> stops indexing messages when it hits the 10,000 (10k) message mark. This
> is related, as previously found by Christian, to the Kafka strategy, and
> after further debugging I could track it down to the number of uncommitted
> offsets (maxUncommittedOffsets). This is specified in the Kafka spout, and
> I could confirm it by setting a higher or lower value (5k or 15k): the
> point at which indexing stops is exactly maxUncommittedOffsets.
>
> I understand the suggested workaround (changing the strategy from
> UNCOMMITTED_EARLIEST to LATEST) is really a workaround and not a fix, as I
> would guess the topology shouldn't need a change to that parameter in
> order to ingest data properly without failing. What seems to happen is
> that with LATEST the messages do successfully get committed to Kafka,
> while with UNCOMMITTED_EARLIEST, at some point that stops happening.
>
> When I run the topology with 'LATEST' I usually see messages like this one
> on the Kafka Spout (indexing topology):
>
> o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka
> [{indexing-0=OffsetAndMetadata{offset=2307113, metadata='{topic-partition=
> indexing-0
>
> I do not see such messages on the Kafka Spout when I have the issue and
> I'm running UNCOMMITTED_EARLIEST.
>
> Any suggestion on what the real source of the issue may be here? I did
> some tests before and it did not seem to be an issue on 0.3.0. Could this
> be something related to the new Metron Kafka code? Or maybe related to one
> of the PRs in Metron or Kafka? I saw one PR in Metron about duplicate
> enrichment messages (METRON-569) and a few in Kafka regarding issues with
> the committed offset, but most of those were for newer versions of Kafka
> than Metron is using.
>
> Thanks
>
