I would try running lots of Bolt#2 tasks in parallel and having each
Bolt#2 process the full list of rules.
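
A rough sketch of what I mean, assuming Storm's org.apache.storm.*
packages (backtype.storm.* on 0.x); the Rule type and MessageSpout are
made up for illustration:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.Map;
    import org.apache.storm.task.TopologyContext;
    import org.apache.storm.topology.BasicOutputCollector;
    import org.apache.storm.topology.OutputFieldsDeclarer;
    import org.apache.storm.topology.TopologyBuilder;
    import org.apache.storm.topology.base.BaseBasicBolt;
    import org.apache.storm.tuple.Fields;
    import org.apache.storm.tuple.Tuple;
    import org.apache.storm.tuple.Values;

    // Hypothetical shape of a rule; your real type will differ.
    interface Rule {
        long getId();
        boolean evaluate(Object message);
    }

    // Each Bolt#2 task evaluates the FULL rule list for every message
    // it receives, so no extra tuples are created per rule.
    public class RuleEvalBolt extends BaseBasicBolt {
        private transient List<Rule> rules;

        @Override
        public void prepare(Map stormConf, TopologyContext context) {
            rules = loadRulesFromDb(); // placeholder for your DB access
        }

        private List<Rule> loadRulesFromDb() {
            return new ArrayList<>(); // real DB lookup omitted
        }

        @Override
        public void execute(Tuple input, BasicOutputCollector collector) {
            Object message = input.getValueByField("message");
            for (Rule rule : rules) {          // all rules, evaluated locally
                if (rule.evaluate(message)) {
                    collector.emit(new Values(rule.getId(), message));
                }
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("ruleId", "message"));
        }
    }

Wired up with high parallelism instead of per-rule fan-out (the dispatch
stage is no longer needed):

    TopologyBuilder builder = new TopologyBuilder();
    builder.setSpout("spout", new MessageSpout(), 4);
    builder.setBolt("evaluate", new RuleEvalBolt(), 64) // many parallel tasks
           .shuffleGrouping("spout");

This keeps the number of tuples in flight independent of the rule count;
you scale by raising Bolt#2's parallelism rather than by fanning out.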
On Sep 12, 2015 10:37 AM, "Hong Wind" <[email protected]> wrote:

> Hi all again,
>
> Since there has been no reply, I guess the question may not have been
> described clearly enough, so let me try to elaborate ...
>
> The performance requirement for our system is 100K TPS, with latency
> below 500 ms.
> The system needs to evaluate a set of rules for each incoming message,
> but we don't know how many rules there will be, since users can add
> rules dynamically while the system is running.
>
> So initially we designed the topology below, which dispatches messages
> to rule evaluation bolts based on the number of rules.
>
> *Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)*
>
> For example, if there are 10 rules in the DB, then Bolt#1 emits 10
> messages to Bolt#2 tasks for evaluation. We thought that would take
> advantage of concurrent processing, as long as we have enough task
> instances for Bolt#2.
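>
> In essence, Bolt#1 does something like this (simplified sketch; field
> and class names are illustrative, imports as in the sketch above):
>
>     // Fan-out dispatcher: one tuple per rule per incoming message, so
>     // the emitted tuple count is (number of messages) x (number of rules).
>     public class DispatchBolt extends BaseBasicBolt {
>         private transient List<Long> ruleIds; // loaded from DB in prepare(), not shown
>
>         @Override
>         public void execute(Tuple input, BasicOutputCollector collector) {
>             Object message = input.getValueByField("message");
>             for (Long ruleId : ruleIds) {
>                 collector.emit(new Values(ruleId, message));
>             }
>         }
>
>         @Override
>         public void declareOutputFields(OutputFieldsDeclarer declarer) {
>             declarer.declare(new Fields("ruleId", "message"));
>         }
>     }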
>
> We tried the 1-rule case, and it comfortably reaches 125K TPS at 80 ms
> latency with 4 nodes.
> But it failed to handle 2 rules: memory was consumed very quickly, many
> full GCs happened, and finally the workers went down.
> We tried raising the memory to 4G. That was better, but it still could
> not handle 3 rules. A memory dump showed the memory was eaten up by
> TaskMessage instances.
>
> We suspect it is a topology design problem, because the dispatcher bolt
> emits twice as many messages for 2 rules, and three times as many for 3
> rules. It will definitely not work in production, where we have no idea
> how many rules will be created.
>
> We tried several changes and came to the updated version below:
>
> *Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation) -> Bolt#2 -> Bolt#2 ->
> Bolt#2 -> ...*
>
> The difference is that in the new topology, Bolt#1 (the dispatcher
> bolt) emits only one message, containing the full list of rule IDs, to
> Bolt#2 (the evaluation bolt). As soon as Bolt#2 receives the message,
> it picks the first rule ID off the list and immediately emits the rest
> in a message to another Bolt#2 (possibly itself, since routing is based
> on fields grouping) before it starts to evaluate the first rule. That
> Bolt#2 then does the same, until the rule list is empty.
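>
> In essence, each Bolt#2 does something like this (simplified sketch,
> same imports as above plus java.util.ArrayList; evaluateRule stands in
> for our real evaluation logic):
>
>     // Chained Bolt#2: take one rule, forward the rest, then evaluate.
>     public class ChainedEvalBolt extends BaseBasicBolt {
>         @Override
>         @SuppressWarnings("unchecked")
>         public void execute(Tuple input, BasicOutputCollector collector) {
>             List<Long> remaining =
>                 new ArrayList<>((List<Long>) input.getValueByField("ruleIds"));
>             Object message = input.getValueByField("message");
>             Long ruleId = remaining.remove(0);   // pick the first rule
>             if (!remaining.isEmpty()) {
>                 // forward the rest of the list BEFORE evaluating, as described
>                 collector.emit(new Values(remaining, message));
>             }
>             evaluateRule(ruleId, message);
>         }
>
>         private void evaluateRule(Long ruleId, Object message) {
>             // real rule evaluation omitted
>         }
>
>         @Override
>         public void declareOutputFields(OutputFieldsDeclarer declarer) {
>             declarer.declare(new Fields("ruleIds", "message"));
>         }
>     }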
>
> This topology solves the memory issue very well: it can support 200
> rules with 3G of memory. The throughput and latency are not good, but
> we believe that can be fixed by adding more nodes to the cluster.
>
> Although it seems to solve the issue, we are not 100% sure why. We want
> to figure it out, to make sure what we did is right.
>
> Our guess is that the first topology causes a message explosion at the
> dispatcher bolt when there is more than one rule in the system; the
> messages get stuck in a queue somewhere, cannot be consumed
> immediately, and finally eat up the memory.
>
> But in the 2nd topology, the load of dispatching rules is distributed
> across the rule evaluation bolt instances (in different workers and
> nodes), so it does not cause the explosion. We trade latency for
> memory.
>
> Is this thinking correct? And is it the right fix?
>
>
> Any ideas are welcome and appreciated.
>
> Thanks
> BR/Wind
>
> On Wed, Sep 9, 2015 at 6:24 PM, Hong Wind <[email protected]> wrote:
>
>> Hi all,
>>
>> I am running a performance benchmark for our topology, which evaluates
>> a set of rules. The topology is as below:
>>
>> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluation)
>>
>> Bolt#1 loads the rules from the DB and dispatches them to Bolt#2 for
>> evaluation. One Bolt#2 task evaluates one rule, so the number of emits
>> from Bolt#1 depends on how many rules we have.
>>
>> With 1 rule, there is no problem with 2G of memory.
>>
>> But when we increase to 2 rules, memory is consumed very quickly and
>> finally the workers go down, even when we set memory to 3G. The dump
>> shows that we have too many TaskMessage instances in a HashMap.
>>
>> Then we tried many fixes and came to change the topology to:
>>
>> Spout -> Bolt#1(Dispatch) -> Bolt#2(Evaluate Rule1) -> Bolt#2(Evaluate
>> Rule2) -> ... -> Bolt#2(Evaluate RuleN)
>>
>> With this topology, when we have N rules, Bolt#1 emits only one
>> message (Rule#1, Rule#2, ... Rule#N) to Bolt#2. Bolt#2 then evaluates
>> Rule#1 and emits the message (Rule#2, ... Rule#N) to Bolt#2 again. So
>> the depth of the Bolt#2 chain depends on the number of rules.
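>>
>> The wiring for that self-loop looks roughly like this (sketch; names
>> and parallelism numbers are illustrative, and Bolt#2 subscribes to its
>> own output stream):
>>
>>     import org.apache.storm.topology.TopologyBuilder;
>>     import org.apache.storm.tuple.Fields;
>>
>>     // ListDispatchBolt (hypothetical) emits ONE tuple per message,
>>     // carrying the full rule-ID list in a "ruleIds" field plus the
>>     // original "message" field.
>>     TopologyBuilder builder = new TopologyBuilder();
>>     builder.setSpout("spout", new MessageSpout(), 4);
>>     builder.setBolt("dispatch", new ListDispatchBolt(), 4)
>>            .shuffleGrouping("spout");
>>     builder.setBolt("evaluate", new ChainedEvalBolt(), 32)
>>            .fieldsGrouping("dispatch", new Fields("message"))
>>            .fieldsGrouping("evaluate", new Fields("message")); // chains back into itself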
>>
>> Then the memory issue disappears, even when we have 200 rules.
>>
>> So the question is: why? The total number of TaskMessage instances
>> should be the same.
>>
>> Thanks a lot
>> BR/Wind
>>
>
