> Is there a way to better handle big bursts?

You will want to tune your cluster based on your expected burst rate, not your average rate. We include some tools [1] <http://metron.apache.org/current-book/metron-contrib/metron-performance/index.html> in Metron to generate synthetic data at your expected burst rate. While your cluster is under load, you can then tune your topologies to ensure they can handle bursts.
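If you just want to approximate a burst without the Metron-specific generator, the stock Kafka perf producer that ships with the broker can also push synthetic traffic at a target rate. This is only an illustration; the topic name, broker address, record size, and rates below are assumptions, so substitute your own values.

    # Push 1,000,000 records of ~2 KB each at ~50,000 messages/sec into a
    # scratch topic to simulate the expected burst (all values are examples).
    /usr/hdp/current/kafka-broker/bin/kafka-producer-perf-test.sh \
      --topic load_test \
      --num-records 1000000 \
      --record-size 2048 \
      --throughput 50000 \
      --producer-props bootstrap.servers=localhost:6667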
We have some general documentation on tuning [2] <http://metron.apache.org/current-book/metron-platform/Performance-tuning-guide.html> and also for performance tuning Enrichments [3] <https://github.com/apache/metron/blob/master/metron-platform/metron-enrichment/Performance.md#performance-tuning>. The same advice is applicable to Indexing, and any general information on tuning Storm is broadly applicable to Metron. With Indexing specifically, in many cases the downstream indexer (Elasticsearch or Solr) is actually what is slowing things down, so pay attention to what your indexer is doing and make sure it is not the bottleneck.

> Or a rate control mechanism of some sort?

The primary means of introducing backpressure for all of the topologies is to adjust topology.max.spout.pending [4] <http://metron.apache.org/current-book/metron-platform/metron-enrichment/Performance.html#topology.max.spout.pending>. This should be covered in the performance tuning documentation I pointed you to above.

> (7. try to re-index the lost data. I have not found a way for this yet)

I do not understand exactly what you are trying to do, but you should not worry about losing data. All of that telemetry, along with the intermediate processing steps, is stored in your Kafka topics, and you can run it through your topologies again. The specific replay process will differ based on your setup, so don't follow this verbatim, but for example you could stop your Indexing topology, set *kafka.start=EARLIEST* in your Indexing properties (elasticsearch.properties/solr.properties), then restart the topology; it will begin indexing from the beginning of the data in your topic. Once it completes, restore the original setting and restart the topology.
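As a concrete sketch of that replay (the topology name, script, and paths below are from a typical Metron/HDP install and are assumptions; adjust them for your deployment, and prefer Ambari if it manages these properties for you):

    # 1. Stop the indexing topology (topology name assumed to be "indexing").
    storm kill indexing

    # 2. Point the spout at the start of the topic by editing the indexing
    #    properties file (path assumed): $METRON_HOME/config/elasticsearch.properties
    #        kafka.start=EARLIEST

    # 3. Restart the topology (script name assumed) and let it work through
    #    the backlog.
    $METRON_HOME/bin/start_elasticsearch_topology.sh

    # 4. Once it has caught up, restore the original kafka.start value and
    #    restart the topology the same way.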
[1] http://metron.apache.org/current-book/metron-contrib/metron-performance/index.html
[2] http://metron.apache.org/current-book/metron-platform/Performance-tuning-guide.html
[3] http://metron.apache.org/current-book/metron-platform/metron-enrichment/Performance.html#Performance_Tuning
[4] http://metron.apache.org/current-book/metron-platform/metron-enrichment/Performance.html#topology.max.spout.pending

On Fri, Sep 14, 2018 at 2:10 PM Vets, Laurens <[email protected]> wrote:

> For the record, here is how I 'fixed' this:
>
> 1. Stop Storm; it's crashing constantly anyway. Stop sending messages to
> your Metron installation.
>
> 2. Export the messages from the Kafka topic that's crashing Storm so that
> they're not lost. In my case that's the indexing topic. I have no idea yet
> on how to re-ingest them.
>
> 3. Set the 'retention.ms' Kafka configuration setting to a small value,
> then wait a minute. The command for this is
> "/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper
> localhost:2181 --entity-type topics --alter --add-config retention.ms=1000
> --entity-name indexing".
>
> 4. Make sure that the 'retention.ms' value is set:
> "/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper
> localhost:2181 --entity-type topics --describe --entity-name indexing"
>
> 5. Wait a couple of minutes; the Kafka log files should be empty. You can
> check this with "ls -altr /tmp/kafka-logs/indexing/" or "du -h
> /tmp/kafka-logs/indexing/". Replace "/tmp/kafka-logs/" with the correct
> path to your Kafka logs directory. In my case, there was approx. 11GB of
> data in the indexing topic.
>
> 6. Restore the default retention time:
> "/usr/hdp/current/kafka-broker/bin/kafka-configs.sh --zookeeper
> localhost:2181 --entity-type topics --alter --delete-config retention.ms
> --entity-name indexing".
>
> (7. Try to re-index the lost data. I have not found a way for this yet.)
>
> At this point, start Storm again. It shouldn't crash anymore as there's no
> data to index.
>
> Does this sound like a sound way to 'fix' these kinds of problems? I
> suspect that I received a big burst of logs (Kibana seems to support this)
> that Storm couldn't handle. Is there a way to better handle big bursts? Or
> a rate control mechanism of some sort?
>
> On 13-Sep-18 11:39, Vets, Laurens wrote:
> > 1. worker.childopts: -Xmx2048m
> >
> > 2. As in individual messages? Just small(-ish) JSON messages. A few
> > KBytes?
> >
> > On 13-Sep-18 11:21, Casey Stella wrote:
> >> Two questions:
> >> 1. How much memory are you giving the workers for the indexing topology?
> >> 2. How large are the messages you're sending through?
> >>
> >> On Thu, Sep 13, 2018 at 2:00 PM Vets, Laurens <[email protected]
> >> <mailto:[email protected]>> wrote:
> >>
> >>     Hello list,
> >>
> >>     I've installed OS updates on my Metron 0.4.2 yesterday, restarted
> >>     all nodes, and now my indexing topology keeps crashing.
> >>
> >>     This is what I see in the Storm UI for the indexing topology:
> >>
> >>     Topology stats:
> >>     10m 0s        1304380  1953520  12499.833  1320
> >>     3h 0m 0s      1304380  1953520  12499.833  1320
> >>     1d 0h 0m 0s   1304380  1953520  12499.833  1320
> >>     All time      1304380  1953520  12499.833  1320
> >>
> >>     Spouts:
> >>     kafkaSpout  1  1  1299940  1949080  12499.833  1320  0
> >>     metron3  6702  java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>       at java.lang.Long.valueOf(Long.java:840)
> >>       at org.apache.storm.kafka.spout.KafkaSpoutRetryExponentialBackoff$RetryEntryTimeStampComparator.compar
> >>
> >>     Bolts:
> >>     hdfsIndexingBolt  1  1  1800  1800  0.278  7.022  1820  38.633  1800  0
> >>     metron3  6702  java.lang.NullPointerException
> >>       at org.apache.metron.writer.hdfs.SourceHandler.handle(SourceHandler.java:80)
> >>       at org.apache.metron.writer.hdfs.HdfsWriter.write(HdfsWriter.java:113)
> >>       at org.apache.metr  Thu, 13 Sep 2018 07:35:02
> >>     indexingBolt  1  1  1320  1320  0.217  7.662  1300  47.815  1300  0
> >>     metron3  6702  java.lang.OutOfMemoryError: GC overhead limit exceeded
> >>       at java.util.Arrays.copyOfRange(Arrays.java:3664)
> >>       at java.lang.String.<init>(String.java:207)
> >>       at org.json.simple.parser.Yylex.yytext(Yylex.jav  Thu, 13 Sep 2018 07:37:33
> >>
> >>     When I check the Kafka topic, I can see that there's at least 3
> >>     million messages in the kafka indexing topic... I _suspect_ that the
> >>     indexing topology tries to write those but fails, restarts, tries to
> >>     write, fails, etc... Metron is currently not ingesting any additional
> >>     messages, but also can't seem to index the current ones...
> >>
> >>     Any idea on how to proceed?
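Regarding step 2 in the procedure quoted above (exporting the messages so they are not lost and re-ingesting them later): one rough, unofficial way to do this is with the stock Kafka console consumer and producer. This is only a sketch; the ZooKeeper/broker addresses and the timeout are assumptions, it does not preserve message keys or partitioning, and if the data is still in the topic, the kafka.start=EARLIEST replay described earlier avoids the round trip entirely.

    # Dump the indexing topic to a file (exits after 60s with no new messages).
    /usr/hdp/current/kafka-broker/bin/kafka-console-consumer.sh \
      --zookeeper localhost:2181 \
      --topic indexing \
      --from-beginning \
      --timeout-ms 60000 > indexing-backup.json

    # Later, replay the saved messages (one JSON message per line) back into
    # the topic so the indexing topology can pick them up again.
    /usr/hdp/current/kafka-broker/bin/kafka-console-producer.sh \
      --broker-list localhost:6667 \
      --topic indexing < indexing-backup.json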
