Hello Michael,
Using curl and the Storm API, I’ve been able to collect some statistics.
This is currently a test platform with almost no activity. I’ve set up a basic
parser with the following topology:
- 6 ackers (I have 6 Kafka partitions per topic)
- Spout parallelism = 6
- Spout # of tasks = 6
- Parser parallelism = 24
- Parser # of tasks = 24
I inject one log line into my sensor topic with NiFi. As a reminder, it takes
roughly 10 s to become visible on the enrichments topic. Here are some statistics:
"spouts": [
{
"emitted": 1160,
"spoutId": "kafkaSpout",
"requestedMemOnHeap": 128,
"errorTime": null,
"tasks": 6,
"errorHost": "",
"failed": 0,
"completeLatency": "3963.078",
"executors": 6,
"encodedSpoutId": "kafkaSpout",
"transferred": 1160,
"errorPort": "",
"requestedMemOffHeap": 0,
"errorLapsedSecs": null,
"acked": 1020,
"requestedCpu": 10,
"lastError": "",
"errorWorkerLogLink": ""
}
This completeLatency looks very high, doesn’t it?
And for the bolt:
{
"emitted": 0,
"requestedMemOnHeap": 128,
"errorTime": null,
"tasks": 12,
"errorHost": "",
"failed": 0,
"boltId": "parserBolt",
"executors": 12,
"processLatency": "832.962",
"executeLatency": "1.391",
"transferred": 0,
"errorPort": "",
"requestedMemOffHeap": 0,
"errorLapsedSecs": null,
"acked": 3680,
"requestedCpu": 10,
"encodedBoltId": "parserBolt",
"lastError": "",
"executed": 3680,
"capacity": "0.003",
"errorWorkerLogLink": ""
}
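To make the two excerpts above easier to compare, here is a minimal Python sketch that derives a few figures from them. It only copies the fields it needs; the function name `summarize` and the derived keys are my own, not part of the Storm API. Storm reports these latencies in milliseconds, and its counters are sampled, so the "unacked" figure is only approximate.

```python
# Excerpt of the values reported above (only the fields used here).
SPOUT = {"spoutId": "kafkaSpout", "emitted": 1160, "acked": 1020,
         "failed": 0, "completeLatency": "3963.078"}
BOLT = {"boltId": "parserBolt", "executed": 3680, "acked": 3680,
        "processLatency": "832.962", "executeLatency": "1.391"}


def summarize(spout, bolt):
    """Return a few derived figures (hypothetical helper, not a Storm API)."""
    complete_ms = float(spout["completeLatency"])
    process_ms = float(bolt["processLatency"])
    execute_ms = float(bolt["executeLatency"])
    return {
        # Tuples emitted by the spout that are neither acked nor failed yet.
        "unacked_spout_tuples": spout["emitted"] - spout["acked"] - spout["failed"],
        # Time from spout emit until the whole tuple tree is acked.
        "complete_latency_s": round(complete_ms / 1000, 3),
        # Time a tuple waits in the bolt outside of execute() before its ack.
        "process_minus_execute_ms": round(process_ms - execute_ms, 3),
        "process_to_execute_ratio": round(process_ms / execute_ms),
    }


summary = summarize(SPOUT, BOLT)
for key, value in summary.items():
    print(f"{key}: {value}")
```

The gap between processLatency and executeLatency suggests that the bolt spends very little time in execute() and that the ack happens much later, which would fit tuples being held somewhere before they are acknowledged.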
So, my understanding is that it takes a long time to ack tuples, but I don’t
know where to go from here. As mentioned below, I’ve tried the tweaks described here:
https://github.com/apache/storm/blob/master/docs/Performance.md but saw no change.
It looks as if we are filling a bucket and sending the data after a given
timeout if the bucket is not full. But I don’t see any timeout of around
10 or 20 seconds in the Storm configuration.
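To illustrate the bucket hypothesis, here is a small Python simulation of a buffer that flushes either when it is full or when a timeout expires. It is only a sketch of the behaviour I suspect: `TimedBatcher`, `first_output_delay`, and the values 15 and 10 s are hypothetical and are not taken from the Storm or Metron code or configuration.

```python
class TimedBatcher:
    """Hypothetical size-or-timeout buffer; not Storm's or Metron's code."""

    def __init__(self, batch_size, timeout_s):
        self.batch_size = batch_size
        self.timeout_s = timeout_s
        self._buffer = []
        self._first_arrival = None

    def add(self, message, now):
        """Buffer a message; return the flushed batch (possibly empty)."""
        if not self._buffer:
            self._first_arrival = now
        self._buffer.append(message)
        return self._flush_if_ready(now)

    def tick(self, now):
        """Periodic check, so a partial batch is flushed after the timeout."""
        return self._flush_if_ready(now)

    def _flush_if_ready(self, now):
        if not self._buffer:
            return []
        full = len(self._buffer) >= self.batch_size
        expired = now - self._first_arrival >= self.timeout_s
        if full or expired:
            flushed, self._buffer = self._buffer, []
            return flushed
        return []


def first_output_delay(n_messages, batch_size, timeout_s, tick_s=1.0):
    """Seconds until the first batch leaves if n_messages arrive at t=0."""
    if n_messages <= 0:
        raise ValueError("n_messages must be positive")
    batcher = TimedBatcher(batch_size, timeout_s)
    for i in range(n_messages):
        if batcher.add(f"log-{i}", now=0.0):
            return 0.0  # the buffer filled up, so output is immediate
    now = 0.0
    while True:
        now += tick_s
        if batcher.tick(now):
            return now  # flushed only because the timeout expired


print(first_output_delay(1, batch_size=15, timeout_s=10.0))
print(first_output_delay(200, batch_size=15, timeout_s=10.0))
```

With these assumed values, a single log line waits for the full timeout, while a burst of 200 lines produces output immediately and leaves a partial batch waiting, which resembles what I observe in my two tests.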
As a reminder, I have Kerberos enabled on my platform, but everything seems to
work fine except Metron ingestion.
Thanks for your help,
Stéphane
From: Michael Miklavcic [mailto:[email protected]]
Sent: Wednesday, May 15, 2019 16:03
To: [email protected]
Subject: Re: Very low throuput on topologies
You could use curl from the cli. But if this is something you're testing out on
your local machine, I'd probably start without Kerberos enabled and work the
perf knobs there first. You should be able to see the "complete latency" from
the Storm UI on each running topology.
On Wed, May 15, 2019 at 1:33 AM <[email protected]> wrote:
Hello Nick,
Thanks for your answer. Incidentally, the problem already occurs before
indexing, at the parser level. It takes a long time to go from the sensor topic
to the “enrichments” topic, and again many seconds to go from the “enrichments”
topic to the “indexing” topic.
I’ve tried the recommendations described here:
https://github.com/apache/storm/blob/master/docs/Performance.md but saw no change.
The problem with Kerberos is that it is no longer possible to access the Storm UI
without some tweaks that are blocked by the administrator on my computer.
From: Nick Allen [mailto:[email protected]]
Sent: Tuesday, May 14, 2019 23:39
To: [email protected]
Subject: Re: Very low throuput on topologies
Have you increased the indexing "batch_size"? That is the first knob to start
tuning.
https://github.com/apache/metron/tree/master/metron-platform/metron-indexing#sensor-indexing-configuration
On Tue, May 14, 2019 at 10:26 AM <[email protected]> wrote:
Hello happy Metron users,
I have a Metron cluster based on Hortonworks CP, and I’ve set up Kerberos on
top of it, as you all probably have done since we deal with security ☺
Everything seems to be working fine (Kerberos, Ranger, …), but I’m facing an
issue with the overall throughput.
I feed my cluster with NiFi. Here is what I do:
Test 1:
- Send 2 lines of logs to the Kafka sensor topic with NiFi
- Use the Kafka CLI consumer to read messages from the sensor topic: the response is immediate
- Use the Kafka CLI consumer to read messages from the enrichment topic: messages arrive after nearly 20 s
Test 2:
- Send 200 lines of logs to the Kafka sensor topic with NiFi
- Use the Kafka CLI consumer to read messages from the sensor topic: the response is immediate
- Use the Kafka CLI consumer to read messages from the enrichment topic: some messages arrive immediately, but they seem to come in groups of about 10, with many seconds between each group
It’s probably related to the Storm configuration, but I don’t know where to go from here.
I’ve tried changing various parameters such as topology.max.spout.pending
(currently set to 500), but with no improvement.
Thanks for your help
Stéphane
_________________________________________________________________________________________________________________________
Ce message et ses pieces jointes peuvent contenir des informations
confidentielles ou privilegiees et ne doivent donc
pas etre diffuses, exploites ou copies sans autorisation. Si vous avez recu ce
message par erreur, veuillez le signaler
a l'expediteur et le detruire ainsi que les pieces jointes. Les messages
electroniques etant susceptibles d'alteration,
Orange decline toute responsabilite si ce message a ete altere, deforme ou
falsifie. Merci.
This message and its attachments may contain confidential or privileged
information that may be protected by law;
they should not be distributed, used or copied without authorisation.
If you have received this email in error, please notify the sender and delete
this message and its attachments.
As emails may be altered, Orange is not liable for messages that have been
modified, changed or falsified.
Thank you.