Hi all again,

Since there has been no reply, I guess the question was not described
clearly enough, so let me try to elaborate ...

The performance requirement for our system is 100K TPS, with latency under
500 ms.
The system needs to evaluate a set of rules for each incoming message, but
we don't know in advance how many rules there will be, as users can add
them dynamically while the system is running.

So initially we designed the topology below, which dispatches each message
to the rule evaluation bolts based on the number of rules.

*Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)*

For example, if there are 10 rules in the DB, Bolt#1 emits 10 messages to
the Bolt#2 tasks for evaluation. We thought this would take advantage of
concurrent processing as long as we have enough task instances for Bolt#2.
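
To make this concrete, Bolt#1's execute() looks roughly like the sketch
below (simplified, not our real code: loadRuleIds() is a stand-in for the
DB query, and we assume the spout emits the raw message as a single string
field):

import java.util.Arrays;
import java.util.List;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class DispatchBolt extends BaseBasicBolt {
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        String message = input.getString(0);
        // One emit per rule: N rules turn every incoming tuple into N tuples.
        for (long ruleId : loadRuleIds()) {
            collector.emit(new Values(ruleId, message));
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("ruleId", "message"));
    }

    private List<Long> loadRuleIds() {
        return Arrays.asList(1L, 2L, 3L); // stand-in for the DB query
    }
}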

We tried the 1-rule case and it comfortably reached 125K TPS at 80 ms
latency with 4 nodes.
But it failed to handle 2 rules: memory was consumed very quickly, many
full GCs occurred, and eventually the workers went down.
We tried raising the memory to 4G. That was better, but it still could not
handle 3 rules. A heap dump showed the memory was eaten up by TaskMessage
instances.

We suspect it is a topology design problem, because the dispatcher bolt
emits 2 times as many messages for 2 rules, 3 times as many for 3 rules,
and so on. That will definitely not work in production, where we have no
idea how many rules will be created.

We tried several changes and arrived at the updated version below:

*Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation) -> Bolt#2 -> Bolt#2 ->
Bolt#2 -> Bolt#2 -> ...*

The difference is that in the new topology, Bolt#1 (the dispatcher bolt)
emits only one message to Bolt#2 (the evaluation bolt), containing the list
of all rule IDs. As soon as a Bolt#2 task receives the message, it takes
the first rule ID off the list and immediately emits the rest in a message
to another Bolt#2 task (possibly itself, since the grouping is by field)
before it starts to evaluate the first rule. That Bolt#2 task then does the
same, until the rule list is empty.
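
In code, each Bolt#2 task does roughly the following (again a simplified
sketch; evaluate() stands in for our real rule logic):

import java.util.ArrayList;
import java.util.List;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

public class EvaluationBolt extends BaseBasicBolt {
    @SuppressWarnings("unchecked")
    @Override
    public void execute(Tuple input, BasicOutputCollector collector) {
        List<Long> ruleIds = (List<Long>) input.getValue(0);
        String message = input.getString(1);

        long currentRule = ruleIds.get(0);
        // Copy the tail into a fresh list so it serializes cleanly.
        List<Long> rest = new ArrayList<Long>(ruleIds.subList(1, ruleIds.size()));

        // Hand the remaining rules to another Bolt#2 task *before* evaluating,
        // so the chain keeps moving while this task does its own work.
        if (!rest.isEmpty()) {
            collector.emit(new Values(rest, message));
        }
        evaluate(currentRule, message);
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("ruleIds", "message"));
    }

    private void evaluate(long ruleId, String message) {
        // stand-in for the real rule evaluation
    }
}

In the topology wiring, Bolt#2 subscribes to its own output stream as well
as to Bolt#1's (fields grouping on the message), which is why the next hop
may land on the same task or a different one.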

This topology solves the memory issue very well: it can support 200 rules
with 3G of memory. The throughput and latency are not good yet, but we
believe that can be resolved by adding more nodes to the cluster.

Although it seems to solve the issue, we are not 100% sure why. We want to
figure it out to make sure what we did is right.

Our guess is that the first topology causes a message explosion at the
dispatcher bolt when there is more than one rule in the system: the emitted
messages get stuck in the queues, cannot be consumed fast enough, and
finally eat up the memory.
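
To put rough numbers on it: at our target of 100K TPS with 10 rules,
Bolt#1 alone would emit 1,000,000 tuples per second. If the Bolt#2 tasks
cannot keep up with that rate, the pending tuples accumulate as TaskMessage
objects in the transfer queues, which would match what we saw in the heap
dump.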

But in the 2nd topology, the load of dispatching rules is spread across
the rule evaluation bolt instances (in different workers and nodes), so no
single point explodes. In effect we trade latency for memory.

Is this thinking correct? And is it the right fix?


Any ideas are welcome and appreciated.

Thanks
BR/Wind




On Wed, Sep 9, 2015 at 6:24 PM, Hong Wind <[email protected]> wrote:

> Hi all,
>
> I am doing a performance benchmark for our topology, which evaluates a
> set of rules. The topology is as below:
>
> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)
>
> Bolt#1 loads the rules from the DB and dispatches them to bolt#2 for
> evaluation. One bolt#2 task evaluates one rule, so the number of emits
> from bolt#1 depends on how many rules we have.
>
> When we have 1 rule, it is no problem with 2G memory.
>
> But when we increase to 2 rules, memory is consumed very fast and finally
> the workers go down, even with memory set to 3G. The dump shows that we
> have too many TaskMessage instances in a HashMap.
>
> Then we tried many fixes and ended up changing the topology to:
>
> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluate Rule1) -> Bolt#2(Evaluate
> Rule2) -> ... -> Bolt#2(Evaluate RuleN)
>
> With this topology, when we have N rules, bolt#1 emits only one message
> (Rule#1, Rule#2, ... Rule#N) to bolt#2. Bolt#2 then evaluates Rule#1 and
> emits the message (Rule#2, ... Rule#N) to bolt#2 again. So the depth of
> the bolt#2 chain depends on the count of rules.
>
> Then the memory issue disappears, even with 200 rules.
>
> So the question is: why?
> The total number of TaskMessage instances should be the same.
>
> Thanks a lot
> BR/Wind
