In the elasticsearch.properties file, right now, I have the following regarding workers and executors:

##### Storm #####
indexing.workers=1
indexing.executors=1
topology.worker.childopts=
topology.auto-credentials=[]

Regarding the flux file for indexing, it is set to exactly what's on GitHub for Metron 0.4.0. Should I also double-check the enrichment topology? Could this somehow be caused by it?
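In case it helps, the config section at the top of my indexing remote.yaml looks roughly like the sketch below. I'm paraphrasing rather than pasting the file, so treat the exact keys and substitution names as approximate; the point is that the acker count is driven by the indexing.executors value above:

    name: "indexing"
    config:
        topology.workers: ${indexing.workers}
        # Storm's acker thread count, filled in from indexing.executors at submit time
        topology.acker.executors: ${indexing.executors}
        topology.worker.childopts: ${topology.worker.childopts}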
Thanks

2017-08-01 19:33 GMT+02:00 Ryan Merriman <[email protected]>:

> Yes, you are correct, they are separate concepts. Once a tuple's tree has
> been acked in Storm, meaning all the spouts/bolts that are required to ack
> a tuple have done so, it is then committed to Kafka in the form of an
> offset. If a tuple is not completely acked, it will never be committed to
> Kafka and the tuple will be replayed after a timeout. Eventually you'll
> have too many tuples in flight and that will result in the spout not
> emitting any more messages.
>
> I think the next step would be to review your configuration. The way the
> executor properties are named in Storm can be confusing, so it's probably
> best if you share your flux/property files.
>
> On Tue, Aug 1, 2017 at 12:01 PM, Guillem Mateos <[email protected]> wrote:
>
>> Hi Ryan,
>>
>> Thanks for your quick reply. I've been trying to change a few settings
>> today, going from having the executors at 1 to having them at a different
>> number. Also worth mentioning is that the system I'm testing this with
>> does not have a very high message input rate right now, so I wouldn't
>> expect to need any special tuning. I'm at roughly 100 messages per
>> minute, which is really not much.
>>
>> After trying the executors at a different value I can confirm the issue
>> still exists. I also see quite a number of messages like this one:
>>
>> Discarding stale fetch response for partition indexing-0 since its offset
>> 2565827 does not match the expected offset 2565828
>>
>> Regarding ackers, I was under the impression that it was something
>> slightly different from committing. So you ack a message and you also
>> commit it, but it's not exactly the same. Am I right?
>>
>> Thanks
>>
>> 2017-07-31 19:40 GMT+02:00 Ryan Merriman <[email protected]>:
>>
>>> Guillem,
>>>
>>> I think this ended up being caused by not having enough acker threads to
>>> keep up. This is controlled by the "topology.ackers.executors" Storm
>>> property that you will find in the indexing topology flux remote.yaml
>>> file. It is exposed in Ambari in the "elasticsearch-properties" property,
>>> which is itself a list of properties. Within that there is an
>>> "indexing.executors" property. If that is set to 0 it would definitely
>>> be a problem, and I think that may even be the default in 0.4.0. Try
>>> changing that to match the number of partitions dedicated to the
>>> indexing topic.
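To make that concrete: the number of partitions on the indexing topic can be checked with the Kafka CLI, and indexing.executors then set to match. A rough sketch, assuming an HDP-style install path, with placeholder host names and an example count of 4 partitions:

    # describe the indexing topic to see how many partitions it has
    /usr/hdp/current/kafka-broker/bin/kafka-topics.sh \
      --zookeeper zookeeper-host:2181 \
      --describe --topic indexing

    # then set the executor count to the partition count reported above,
    # e.g. in elasticsearch.properties (or the elasticsearch-properties
    # section in Ambari)
    indexing.executors=4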
>>> You could also change the property directly in the flux file
>>> ($METRON_HOME/flux/indexing/remote.yaml) and restart the topology from
>>> the command line to verify this fixes it. If you do use this strategy to
>>> test, make sure you eventually make the change in Ambari so your changes
>>> don't get overridden on a restart. Changing this setting is confusing,
>>> and there have been some recent commits that address that, exposing
>>> "topology.ackers.executors" directly in Ambari in a dedicated indexing
>>> topology section.
>>>
>>> You might want to also check out the performance tuning guide we did
>>> recently:
>>> https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md
>>> If my guess is wrong and it's not the acker thread setting, the answer
>>> is likely in there.
>>>
>>> Hope this helps. If you're still stuck, send us some more info and we'll
>>> try to help you figure it out.
>>>
>>> Ryan
>>>
>>> On Mon, Jul 31, 2017 at 12:02 PM, Guillem Mateos <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> I'm facing an issue like the one Christian Tramnitz and Ryan Merriman
>>>> discussed in May.
>>>>
>>>> I have a Metron deployment using 0.4.0 on 10 nodes. The indexing
>>>> topology stops indexing messages when hitting the 10,000 (10k) message
>>>> mark. This is related, as previously found by Christian, to the Kafka
>>>> strategy, and after further debugging I could track it down to the
>>>> number of uncommitted offsets (maxUncommittedOffsets). This is specified
>>>> in the Kafka spout, and I could confirm that by providing a higher or
>>>> lower value (5k or 15k) the point at which the indexing stops is exactly
>>>> that of maxUncommittedOffsets.
>>>>
>>>> I understand the workaround suggested (changing the strategy from
>>>> UNCOMMITTED_EARLIEST to LATEST) is really a workaround and not a fix, as
>>>> I would guess the topology shouldn't really need a change to that
>>>> parameter to properly ingest data without failing. What seems to happen
>>>> is that with LATEST the messages do successfully get committed to Kafka,
>>>> while with the other, UNCOMMITTED_EARLIEST, at some point that stops
>>>> happening.
>>>>
>>>> When I run the topology with 'LATEST' I usually see messages like this
>>>> one on the Kafka spout (indexing topology):
>>>>
>>>> o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka
>>>> [{indexing-0=OffsetAndMetadata{offset=2307113,
>>>> metadata='{topic-partition=indexing-0
>>>>
>>>> I do not see such messages on the Kafka spout when I have the issue and
>>>> I'm running UNCOMMITTED_EARLIEST.
>>>>
>>>> Any suggestion on what may be the real source of the issue here? I did
>>>> some tests before and it did not seem to be an issue on 0.3.0. Could
>>>> this be something related to the new Kafka code in Metron? Or maybe
>>>> related to one of the PRs in Metron or Kafka? I saw one in Metron about
>>>> duplicate enrichment messages (METRON-569) and a few on Kafka regarding
>>>> issues with the committed offset, but most were for newer versions of
>>>> Kafka than Metron is using.
>>>>
>>>> Thanks
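For anyone else hitting this and following Ryan's suggestion above to edit the flux file and restart the indexing topology from the command line, the sequence is roughly the following. The start script name and paths are from memory of a 0.4.0 install, so treat them as approximate and double-check locally:

    # kill the running indexing topology and give it a few seconds to drain
    storm kill indexing -w 10

    # bump topology.acker.executors (or the property it is substituted from)
    vi $METRON_HOME/flux/indexing/remote.yaml

    # resubmit the indexing topology using the script shipped with Metron
    $METRON_HOME/bin/start_elasticsearch_topology.sh

As Ryan notes, any change made this way should eventually be mirrored in Ambari, or it will be overwritten on the next restart from Ambari.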

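Separately, a quick way to confirm whether offsets are actually being committed for the indexing spout, without relying on DEBUG logging, is to describe the consumer group with the Kafka CLI. The group name below is a guess and should be replaced with whatever group id the indexing spout uses; the broker host and port are placeholders, and on Kafka 0.10.0 the command may also need the --new-consumer flag:

    /usr/hdp/current/kafka-broker/bin/kafka-consumer-groups.sh \
      --bootstrap-server kafka-broker:6667 \
      --describe --group indexing

    # CURRENT-OFFSET should keep advancing along with LOG-END-OFFSET; if it
    # stalls while LOG-END-OFFSET keeps growing, commits have stopped and the
    # spout will eventually hit maxUncommittedOffsets, matching the behaviour
    # described above.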