Hi Laurens,

We're doing additional testing, but it seems to be fixed, yes. Are you
experiencing the same problem?
2017-08-14 6:12 GMT+02:00 Laurens Vets <[email protected]>:

> Hi Guillem,
>
> Did you eventually fix the problem?
>
> On 2017-08-01 11:00, Guillem Mateos wrote:
>
> In the elasticsearch.properties file, I currently have the following
> regarding workers and executors:
>
> ##### Storm #####
> indexing.workers=1
> indexing.executors=1
> topology.worker.childopts=
> topology.auto-credentials=[]
>
> The flux file for indexing is set exactly to what's on GitHub for
> Metron 0.4.0.
>
> Should I also double-check the enrichment topology? Could this somehow
> be caused by it?
>
> Thanks
>
> 2017-08-01 19:33 GMT+02:00 Ryan Merriman <[email protected]>:
>
>> Yes, you are correct, they are separate concepts. Once a tuple's tree
>> has been acked in Storm, meaning all the spouts/bolts that are required
>> to ack a tuple have done so, it is then committed to Kafka in the form
>> of an offset. If a tuple is not completely acked, it will never be
>> committed to Kafka and the tuple will be replayed after a timeout.
>> Eventually you'll have too many tuples in flight and the spout will
>> stop emitting new ones.
>>
>> I think the next step would be to review your configuration. The way
>> the executor properties are named in Storm can be confusing, so it's
>> probably best if you share your flux/property files.
>>
>> On Tue, Aug 1, 2017 at 12:01 PM, Guillem Mateos <[email protected]>
>> wrote:
>>
>>> Hi Ryan,
>>>
>>> Thanks for your quick reply. I've been trying to change a few settings
>>> today, moving the executors from 1 to a different number. Also worth
>>> mentioning: the system I'm testing this with does not have a very high
>>> message input rate right now, so I wouldn't expect to need any special
>>> tuning. I'm at roughly 100 messages per minute, which is really not
>>> much.
>>>
>>> After trying a different value for the executors, I can confirm the
>>> issue still exists. I also see quite a number of messages like this
>>> one:
>>>
>>> Discarding stale fetch response for partition indexing-0 since its
>>> offset 2565827 does not match the expected offset 2565828
>>>
>>> Regarding ackers, I was under the impression that acking is something
>>> slightly different from committing: you ack a message and you also
>>> commit it, but it's not exactly the same. Am I right?
>>>
>>> Thanks
>>>
>>> 2017-07-31 19:40 GMT+02:00 Ryan Merriman <[email protected]>:
>>>
>>>> Guillem,
>>>>
>>>> I think this ended up being caused by not having enough acker threads
>>>> to keep up. This is controlled by the "topology.acker.executors"
>>>> Storm property that you will find in the indexing topology flux
>>>> remote.yaml file. It is exposed in Ambari through the
>>>> "elasticsearch-properties" property, which is itself a list of
>>>> properties. Within that there is an "indexing.executors" property. If
>>>> that is set to 0, it would definitely be a problem, and I think that
>>>> may even be the default in 0.4.0. Try changing it to match the number
>>>> of partitions dedicated to the indexing topic.
>>>>
>>>> You could also change the property directly in the flux file
>>>> ($METRON_HOME/flux/indexing/remote.yaml) and restart the topology
>>>> from the command line to verify this fixes it. If you do use this
>>>> strategy to test, make sure you eventually make the change in Ambari
>>>> so your changes don't get overridden on a restart. Changing this
>>>> setting is confusing, and there have been some recent commits that
>>>> address that by exposing "topology.acker.executors" directly in
>>>> Ambari in a dedicated indexing topology section.
>>>>
>>>> You might also want to check out the performance tuning guide we did
>>>> recently:
>>>> https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md
>>>> If my guess is wrong and it's not the acker thread setting, the
>>>> answer is likely in there.
>>>>
>>>> Hope this helps. If you're still stuck, send us some more info and
>>>> we'll try to help you figure it out.
>>>>
>>>> Ryan
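For anyone reading along in the archive, a minimal Java sketch of the acker
knob Ryan describes above, assuming Storm 1.x (the helper class, method name,
and partition count are illustrative, not Metron's actual code):

    import org.apache.storm.Config;

    public class AckerSizingSketch {
        // Builds a Config with "topology.acker.executors" set: the same
        // property the flux remote.yaml exposes and that the Ambari
        // "indexing.executors" entry feeds. Per the thread above, leaving
        // this at 0 is exactly the failure mode described.
        public static Config ackerConfig(int indexingPartitions) {
            Config conf = new Config();
            conf.setNumAckers(indexingPartitions);
            return conf;
        }
    }

Following Ryan's advice, indexingPartitions would match the partition count
of the indexing Kafka topic.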
>>>> On Mon, Jul 31, 2017 at 12:02 PM, Guillem Mateos <[email protected]>
>>>> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> I'm facing an issue like the one Christian Tramnitz and Ryan
>>>>> Merriman discussed in May.
>>>>>
>>>>> I have a Metron 0.4.0 deployment on 10 nodes. The indexing topology
>>>>> stops indexing messages when it hits the 10,000 (10k) message mark.
>>>>> As Christian previously found, this is related to the Kafka spout's
>>>>> first-poll offset strategy, and after further debugging I could
>>>>> track it down to the cap on uncommitted offsets
>>>>> (maxUncommittedOffsets). This is specified in the Kafka spout, and I
>>>>> confirmed it by setting a higher or lower value (5k or 15k): the
>>>>> point at which indexing stops is exactly maxUncommittedOffsets.
>>>>>
>>>>> I understand the suggested workaround (changing the strategy from
>>>>> UNCOMMITTED_EARLIEST to LATEST) is really a workaround and not a
>>>>> fix, as I would guess the topology shouldn't need a change to that
>>>>> parameter to ingest data without failing. What seems to happen is
>>>>> that with LATEST the messages do successfully get committed to
>>>>> Kafka, while with UNCOMMITTED_EARLIEST, at some point that stops
>>>>> happening.
>>>>>
>>>>> When I run the topology with LATEST, I usually see messages like
>>>>> this one from the Kafka spout (indexing topology):
>>>>>
>>>>> o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka
>>>>> [{indexing-0=OffsetAndMetadata{offset=2307113,
>>>>> metadata='{topic-partition=indexing-0
>>>>>
>>>>> I do not see such messages from the Kafka spout when I have the
>>>>> issue and I'm running UNCOMMITTED_EARLIEST.
>>>>>
>>>>> Any suggestion on what the real source of the issue may be? I did
>>>>> some tests before and it did not seem to be an issue on 0.3.0. Could
>>>>> this be something related to Metron's new Kafka code? Or maybe
>>>>> related to one of the PRs in Metron or Kafka? I saw one in Metron
>>>>> about duplicate enrichment messages (METRON-569) and a few in Kafka
>>>>> regarding issues with the committed offset (but most were for newer
>>>>> versions of Kafka than Metron is using).
>>>>>
>>>>> Thanks
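For reference, the two spout settings this thread keeps returning to live on
Storm's KafkaSpoutConfig builder. A minimal sketch, assuming
storm-kafka-client 1.1.x (the topic name and values are illustrative, not
Metron's actual spout construction):

    import org.apache.storm.kafka.spout.KafkaSpout;
    import org.apache.storm.kafka.spout.KafkaSpoutConfig;

    public class IndexingSpoutSketch {
        public static KafkaSpout<String, String> build(String brokers) {
            KafkaSpoutConfig<String, String> spoutConf = KafkaSpoutConfig
                    .builder(brokers, "indexing")
                    // UNCOMMITTED_EARLIEST is the strategy that stalled for
                    // Guillem; LATEST is the workaround discussed above.
                    .setFirstPollOffsetStrategy(
                            KafkaSpoutConfig.FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
                    // The ceiling Guillem hit: once this many emitted tuples
                    // are pending commit, the spout stops polling Kafka.
                    .setMaxUncommittedOffsets(10000)
                    .build();
            return new KafkaSpout<>(spoutConf);
        }
    }

As Guillem observed with 5k and 15k, raising maxUncommittedOffsets only moves
the stall point; the fix is getting tuples acked so offsets actually commit.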
