Yes, you are correct, they are separate concepts.  Once a tuple's tree has
been acked in Storm, meaning all the spouts/bolts that are required to ack
the tuple have done so, it is then committed to Kafka in the form of an
offset.  If a tuple is not completely acked, it will never be committed to
Kafka and the tuple will be replayed after a timeout.  Eventually you'll
have too many tuples in flight, and once that count hits the spout's
uncommitted-offset limit the spout stops emitting tuples entirely.
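
To make this concrete, here is a minimal sketch (not Metron's actual spout
code, and the broker, group, topic, and offsets are all hypothetical) of
how acked tuples turn into a committed offset.  The key point is that the
committed offset can only advance past a contiguous run of acked tuples,
so a single unacked tuple holds the commit back:

  import java.util.Collections;
  import java.util.Properties;
  import org.apache.kafka.clients.consumer.KafkaConsumer;
  import org.apache.kafka.clients.consumer.OffsetAndMetadata;
  import org.apache.kafka.common.TopicPartition;

  public class CommitSketch {
      public static void main(String[] args) {
          Properties props = new Properties();
          props.put("bootstrap.servers", "localhost:6667"); // hypothetical broker
          props.put("group.id", "indexing-spout");          // hypothetical group
          props.put("enable.auto.commit", "false");         // spout commits manually
          props.put("key.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");
          props.put("value.deserializer",
                  "org.apache.kafka.common.serialization.StringDeserializer");

          try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(props)) {
              TopicPartition tp = new TopicPartition("indexing", 0);
              consumer.assign(Collections.singletonList(tp));
              // Suppose offsets 100..104 were emitted and every tuple tree was
              // fully acked.  The commit records the next offset to consume
              // (105).  If offset 102 were still unacked, the spout could
              // commit no further than 102, regardless of later acks.
              consumer.commitSync(Collections.singletonMap(
                      tp, new OffsetAndMetadata(105L)));
          }
      }
  }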

I think the next step would be to review your configuration.  The way the
executor properties are named in Storm can be confusing, so it's probably
best if you share your flux/property files.

On Tue, Aug 1, 2017 at 12:01 PM, Guillem Mateos <[email protected]> wrote:

> Hi Ryan,
>
> Thanks for your quick reply. I've been trying to change a few settings
> today, moving the executors from 1 to different values. Also worth
> mentioning: the system I'm testing this with does not have a very high
> message input rate right now, so I wouldn't expect to need any special
> tuning. I'm at roughly 100 messages per minute, which is really not much.
>
> After trying different values for the executors I can confirm the issue
> still exists. I also see quite a number of messages like this one:
>
> Discarding stale fetch response for partition indexing-0 since its offset
> 2565827 does not match the expected offset 2565828
>
> Regarding ackers, I was under the impression that acking is something
> slightly different from committing. So you ack a message and you also
> commit it, but it's not exactly the same. Am I right?
>
> Thanks
>
> 2017-07-31 19:40 GMT+02:00 Ryan Merriman <[email protected]>:
>
>> Guillem,
>>
>> I think this ended up being caused by not having enough acker threads to
>> keep up.  This is controlled by the "topology.ackers.executors" Storm
>> property that you will find in the indexing topology flux remote.yaml
>> file.  It is exposed in Ambari through the "elasticsearch-properties"
>> property, which is itself a list of properties.  Within that there is an
>> "indexing.executors" property.  If that is set to 0 it would definitely be
>> a problem and I think that may even be the default in 0.4.0.  Try changing
>> that to match the number of partitions dedicated to the indexing topic.
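>>
>> To illustrate, assuming that flux property ultimately feeds Storm's
>> built-in topology.acker.executors setting (an assumption on my part), the
>> programmatic equivalent when building a topology config would be roughly:
>>
>>   import org.apache.storm.Config;
>>
>>   public class AckerConfigSketch {
>>       public static Config build(int indexingTopicPartitions) {
>>           Config stormConf = new Config();
>>           // setNumAckers writes topology.acker.executors; setting it to 0
>>           // disables acking entirely, which matches the failure mode above.
>>           stormConf.setNumAckers(indexingTopicPartitions);
>>           return stormConf;
>>       }
>>   }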
>>
>> You could also change the property directly in the flux file
>> ($METRON_HOME/flux/indexing/remote.yaml) and restart the topology from
>> the command line to verify this fixes it.  If you do use this strategy to
>> test, make sure you eventually make the change in Ambari so your changes
>> don't get overridden on a restart.  Changing this setting is confusing and
>> there have been some recent commits that have addressed that, exposing
>> "topology.ackers.executors" directly in Ambari in a dedicated indexing
>> topology section.
>>
>> You might also want to check out the performance tuning guide we did
>> recently:
>> https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md
>> If my guess is wrong and it's not the acker thread setting, the answer is
>> likely in there.
>>
>> Hope this helps.  If you're still stuck send us some more info and we'll
>> try to help you figure it out.
>>
>> Ryan
>>
>> On Mon, Jul 31, 2017 at 12:02 PM, Guillem Mateos <[email protected]>
>> wrote:
>>
>>> Hi,
>>>
>>> I'm facing an issue like the one Christian Tramnitz and Ryan Merriman
>>> discussed in May.
>>>
>>> I have a Metron deployment using 0.4.0 on 10 nodes. The indexing
>>> topology stops indexing messages when hitting the 10,000 (10k) message
>>> mark. This is related, as previously found by Christian, to the Kafka
>>> offset strategy, and after further debugging I could track it down to the
>>> number of uncommitted offsets (maxUncommittedOffsets). This is specified
>>> in the Kafka spout, and I could confirm that by providing a higher or
>>> lower value (5k or 15k) the point at which the indexing stops is exactly
>>> that of maxUncommittedOffsets.
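>>>
>>> For reference, my understanding of the gating logic inside the spout is
>>> roughly the following simplified sketch (paraphrased, not the actual
>>> storm-kafka-client source):
>>>
>>>   public class SpoutGateSketch {
>>>       // Once emitted-but-uncommitted offsets reach the cap, nextTuple()
>>>       // stops polling Kafka, so if commits never happen the topology
>>>       // stalls at exactly maxUncommittedOffsets.
>>>       static boolean shouldPoll(long numUncommitted, long maxUncommitted) {
>>>           return numUncommitted < maxUncommitted;
>>>       }
>>>
>>>       public static void main(String[] args) {
>>>           System.out.println(shouldPoll(9999, 10000));  // true: keeps polling
>>>           System.out.println(shouldPoll(10000, 10000)); // false: stalls
>>>       }
>>>   }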
>>>
>>> I understand the suggested workaround (changing the strategy from
>>> UNCOMMITTED_EARLIEST to LATEST) is really a workaround and not a fix, as I
>>> would guess the topology shouldn't need a change to that parameter to
>>> properly ingest data without failing. What seems to happen is that after
>>> changing to LATEST the messages do successfully get committed to Kafka,
>>> while with the other, UNCOMMITTED_EARLIEST, at some point that stops
>>> happening.
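>>>
>>> For anyone reproducing this, the two knobs involved can be set along
>>> these lines (a sketch against the storm-kafka-client builder API, not how
>>> Metron actually wires its spout; the broker and topic are placeholders):
>>>
>>>   import org.apache.storm.kafka.spout.KafkaSpoutConfig;
>>>
>>>   public class SpoutStrategySketch {
>>>       public static KafkaSpoutConfig<String, String> build() {
>>>           return KafkaSpoutConfig.builder("broker1:6667", "indexing")
>>>                   .setFirstPollOffsetStrategy(
>>>                       KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
>>>                   // the exact point where my indexing stalls
>>>                   .setMaxUncommittedOffsets(10000)
>>>                   .build();
>>>       }
>>>   }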
>>>
>>> When I run the topology with 'LATEST' I usually see messages like this
>>> one on the Kafka Spout (indexing topology):
>>>
>>> o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka
>>> [{indexing-0=OffsetAndMetadata{offset=2307113,
>>> metadata='{topic-partition=indexing-0
>>>
>>> I do not see such messages on the Kafka spout when I have the issue and
>>> I'm running UNCOMMITTED_EARLIEST.
>>>
>>> Any suggestion on what the real source of the issue may be here? I did
>>> some tests before and it did not seem to be an issue on 0.3.0. Could this
>>> be something related to the new Kafka code in Metron? Or maybe related to
>>> one of the PRs in Metron or Kafka? I saw one in Metron about duplicate
>>> enrichment messages (METRON-569) and a few in Kafka regarding issues with
>>> the committed offset, though most were for newer versions of Kafka than
>>> Metron is using.
>>>
>>> Thanks
>>>
>>
>>
>
