Hi Guillem, 

Did you eventually fix the problem? 

On 2017-08-01 11:00, Guillem Mateos wrote:

> In the elasticsearch.properties file, right now I have the following for 
> workers and executors:
> 
> ##### Storm #####
> indexing.workers=1
> indexing.executors=1
> topology.worker.childopts=
> topology.auto-credentials=[]
> 
> The flux file for indexing is set exactly to what's on GitHub for Metron 
> 0.4.0.
> 
> Should I also double-check the enrichment topology? Could it somehow be 
> causing this?
> 
> Thanks 
> 
> 2017-08-01 19:33 GMT+02:00 Ryan Merriman <[email protected]>:
> 
> Yes, you are correct, they are separate concepts.  Once a tuple's tree has been 
> acked in Storm, meaning all the spouts/bolts that are required to ack the tuple 
> have done so, it is then committed to Kafka in the form of an offset.  If a 
> tuple is not completely acked, it will never be committed to Kafka and the 
> tuple will be replayed after a timeout.  Eventually you'll have too many 
> tuples in flight and the spout will stop emitting new ones, which is what 
> you're seeing. 
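> 
> To make the acking part concrete, here is a minimal sketch of a plain Storm 
> bolt (not Metron code, just an illustration) showing where ack/fail fit in; 
> the spout only treats a tuple as processed, and therefore eligible for an 
> offset commit, once every bolt in its tree has acked: 
> 
> import java.util.Map;
> import org.apache.storm.task.OutputCollector;
> import org.apache.storm.task.TopologyContext;
> import org.apache.storm.topology.OutputFieldsDeclarer;
> import org.apache.storm.topology.base.BaseRichBolt;
> import org.apache.storm.tuple.Tuple;
> 
> public class AckingBolt extends BaseRichBolt {
>     private OutputCollector collector;
> 
>     public void prepare(Map conf, TopologyContext context, OutputCollector collector) {
>         this.collector = collector;
>     }
> 
>     public void execute(Tuple tuple) {
>         try {
>             // ... do the actual work for this message here ...
>             collector.ack(tuple);   // this branch of the tuple tree is done
>         } catch (Exception e) {
>             collector.fail(tuple);  // the whole tuple will be replayed by the spout
>         }
>     }
> 
>     public void declareOutputFields(OutputFieldsDeclarer declarer) {
>         // no output streams in this sketch
>     }
> }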
> 
> I think the next step would be to review your configuration.  The way the 
> executor properties are named in Storm can be confusing so it's probably best 
> if you share your flux/property files. 
> 
> On Tue, Aug 1, 2017 at 12:01 PM, Guillem Mateos <[email protected]> wrote:
> 
> Hi Ryan,
> 
> Thanks for your quick reply. I've been trying to change a few settings today, 
> such as changing the executors from 1 to a different number. Also worth 
> mentioning is that the system I'm testing this with does not have a very high 
> message input rate right now, so I wouldn't expect to need any special 
> tuning. I'm at roughly 100 messages per minute, which is really not much.
> 
> After trying a different value for the executors, I can confirm the issue 
> still exists. I also see quite a number of messages like this one:
> 
> Discarding stale fetch response for partition indexing-0 since its offset 
> 2565827 does not match the expected offset 2565828
> 
> Regarding ackers, I was under the impression that acking is something slightly 
> different from committing: you do ack a message and you also commit it, 
> but they're not exactly the same thing. Am I right?
> 
> Thanks 
> 
> 2017-07-31 19:40 GMT+02:00 Ryan Merriman <[email protected]>:
> 
> Guillem, 
> 
> I think this ended up being caused by not having enough acker threads to keep 
> up.  This is controlled by the "topology.acker.executors" Storm property 
> that you will find in the indexing topology flux remote.yaml file.  It is 
> exposed in Ambari in the "elasticsearch-properties" property, which is itself 
> a list of properties.  Within that there is an "indexing.executors" property. 
>  If that is set to 0 it would definitely be a problem, and I think that may 
> even be the default in 0.4.0.  Try changing that to match the number of 
> partitions dedicated to the indexing topic.   
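> 
> For example, if the indexing topic has 4 partitions, the relevant lines in the 
> elasticsearch-properties blob would look something like this (4 is just an 
> example value): 
> 
> indexing.workers=1
> indexing.executors=4
> 
> and if I remember the flux file correctly, remote.yaml picks those up along 
> these lines (verify against your copy, I'm going from memory): 
> 
> config:
>     topology.workers: ${indexing.workers}
>     topology.acker.executors: ${indexing.executors}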
> 
> You could also change the property directly in the flux file 
> ($METRON_HOME/flux/indexing/remote.yaml) and restart the topology from the 
> command line to verify this fixes it.  If you do use this strategy to test, 
> make sure you eventually make the change in Ambari so your changes don't get 
> overridden on a restart.  Changing this setting is confusing, and there have 
> been some recent commits that address that by exposing 
> "topology.acker.executors" directly in Ambari in a dedicated indexing 
> topology section.   
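> 
> If you go the command-line route, the sequence looks roughly like this, 
> assuming the topology is named "indexing"; the uber jar name and paths are 
> from memory, so adjust them to whatever is actually under $METRON_HOME on 
> your install: 
> 
> storm kill indexing
> # once the topology is gone, edit $METRON_HOME/flux/indexing/remote.yaml, then:
> storm jar $METRON_HOME/lib/metron-elasticsearch-0.4.0-uber.jar \
>   org.apache.storm.flux.Flux \
>   --remote $METRON_HOME/flux/indexing/remote.yaml \
>   --filter $METRON_HOME/config/elasticsearch.properties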
> 
> You might want to also check out the performance tuning guide we did 
> recently:  
> https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md
>  [1].  If my guess is wrong and it's not the acker thread setting, the answer 
> is likely in there.   
> 
> Hope this helps.  If you're still stuck send us some more info and we'll try 
> to help you figure it out. 
> 
> Ryan 
> 
> On Mon, Jul 31, 2017 at 12:02 PM, Guillem Mateos <[email protected]> wrote:
> 
> Hi,
> 
> I'm facing an issue like the one Christian Tramnitz and Ryan Merriman 
> discussed in May.
> 
> I have a Metron deployment using 0.4.0 on 10 nodes. The indexing topology 
> stops indexing messages when hitting the 10,000 (10k) message mark. This is 
> related, as previously found by Christian, to the Kafka strategy, and after 
> further debugging I could track it down to the number of uncommitted offsets 
> (maxUncommittedOffsets). This is specified in the Kafka spout, and I could 
> confirm it by providing a higher or lower value (5k or 15k): the point at 
> which the indexing stops is exactly that of maxUncommittedOffsets.
> 
> I understand the workaround suggested (changing the strategy from 
> UNCOMMITTED_EARLIEST to LATEST) is really a workaround and not a fix, as I 
> would guess the topology shouldn't need a change to that parameter to 
> properly ingest data without failing. What seems to happen is that with 
> LATEST the messages do successfully get committed to Kafka, while with 
> UNCOMMITTED_EARLIEST, at some point that stops happening.
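> 
> For reference, the two knobs in play correspond to the storm-kafka-client 
> spout configuration. I'm not claiming this is how Metron builds its spout 
> internally; it's just a sketch against the 1.x builder API (broker address 
> and values are placeholders) so the terms are concrete: 
> 
> import org.apache.storm.kafka.spout.KafkaSpout;
> import org.apache.storm.kafka.spout.KafkaSpoutConfig;
> import org.apache.storm.kafka.spout.KafkaSpoutConfig.FirstPollOffsetStrategy;
> 
> KafkaSpoutConfig<String, String> spoutConfig =
>     KafkaSpoutConfig.builder("broker1:6667", "indexing")
>         // the spout stops emitting new tuples once this many emitted offsets
>         // remain uncommitted
>         .setMaxUncommittedOffsets(10000)
>         // the strategy discussed above: UNCOMMITTED_EARLIEST vs LATEST
>         .setFirstPollOffsetStrategy(FirstPollOffsetStrategy.UNCOMMITTED_EARLIEST)
>         .build();
> 
> KafkaSpout<String, String> spout = new KafkaSpout<>(spoutConfig);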
> 
> When I run the topology with 'LATEST' I usually see messages like this one on 
> the Kafka Spout (indexing topology):
> 
> o.a.s.k.s.KafkaSpout [DEBUG] Offsets successfully committed to Kafka 
> [{indexing-0=OffsetAndMetadata{offset=2307113, 
> metadata='{topic-partition=indexing-0
> 
> I do not see such messages on the Kafka Spout when I have the issue and I'm 
> running UNCOMMITTED_EARLIEST.
> 
> Any suggestion on what the real source of the issue might be? I did some 
> tests before and it did not seem to be an issue on 0.3.0. Could this be 
> something related to the new Kafka code in Metron? Or maybe related to one of 
> the PRs in Metron or Kafka? I saw one in Metron about duplicate enrichment 
> messages (METRON-569) and a few in Kafka regarding issues with the committed 
> offset, but most were for newer versions of Kafka than Metron is using.
> 
> Thanks

 

Links:
------
[1]
https://github.com/apache/metron/blob/master/metron-platform/Performance-tuning-guide.md
