Is there a reason you are using Trident?

If you don't need to handle the events as batches, you are probably going
to get better performance without it.
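For comparison, the pipeline described below could be wired as a plain (non-Trident) topology, which writes each record as soon as it arrives instead of waiting for a batch to complete. This is only a sketch: the class names (EventSpout, SerializeBolt, DbWriterBolt) and parallelism hints are placeholders, not code from the actual topology.

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.topology.TopologyBuilder;

TopologyBuilder builder = new TopologyBuilder();

// Four event spouts; the merge happens by having the downstream
// bolt subscribe to all of them.
builder.setSpout("events-a", new EventSpout("a"));
builder.setSpout("events-b", new EventSpout("b"));
builder.setSpout("events-c", new EventSpout("c"));
builder.setSpout("events-d", new EventSpout("d"));

// CPU-heavy serialization kept in its own bolt, as in the
// original topology, with a higher parallelism hint.
builder.setBolt("serialize", new SerializeBolt(), 8)
       .shuffleGrouping("events-a")
       .shuffleGrouping("events-b")
       .shuffleGrouping("events-c")
       .shuffleGrouping("events-d");

// Each tuple is written to the db individually, so latency is
// per-record rather than per-batch.
builder.setBolt("db-writer", new DbWriterBolt(), 4)
       .shuffleGrouping("serialize");

Config conf = new Config();
conf.setMaxSpoutPending(1000); // cap on unacked tuples per spout task

StormSubmitter.submitTopology("events", conf, builder.createTopology());
```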


On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:

> I'm writing a fairly basic Trident topology as follows:
>
> - 4 spouts of events
> - merges into one stream
> - serializes the object as an event in a string
> - saves to db
>
> I split the serialization task away from the spout, as it was CPU
> intensive, to speed it up.
>
> The problem I have is that after 10 minutes there are over 910k tuples
> emitted/transferred but only 193k records saved.
>
> The overall load of the topology seems fine.
>
> - 536.404 ms complete latency at the topology level
> - The highest capacity of any bolt is 0.3, which is the serialization one.
> - Each bolt task has sub-20 ms execute latency and sub-40 ms process
> latency.
>
> So it seems Trident is holding all the records internally, but I need
> these events as close to real time as possible.
>
> Does anyone have any guidance as to how to increase the throughput?  Is it
> simply a matter of tweaking max spout pending and the batch size?
>
> I'm running it on 2 m1.smalls for now.  I don't see the need to upgrade
> until the demand on the boxes is higher.  However, CPU usage on the
> nimbus box is pinned at 99%, even when all the topologies are killed.
> Why would that be?
>
> We are currently targeting processing 200 million records per day, which
> seems like it should be quite easy based on what I've read other people
> have achieved.  I realize that hardware should be able to boost this as
> well, but my first goal is to get Trident to push the records to the db
> quicker.
>
> Thanks in advance,
> Sean
>
>
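If you do stay with Trident, the two knobs you mention are the right ones: max spout pending controls how many batches are in flight at once (for Trident it counts batches, not tuples), and the batch emit interval bounds how often a new batch starts, which directly affects end-to-end latency. A hedged sketch, where the values are starting points to experiment with rather than recommendations, and the string key is the 0.9-era config name:

```java
import backtype.storm.Config;

Config conf = new Config();

// For Trident, max spout pending is the number of *batches*
// in flight, not individual tuples; start small and raise it.
conf.setMaxSpoutPending(4);

// How long Trident waits before starting the next batch;
// lowering it reduces latency for near-real-time delivery.
conf.put("topology.trident.batch.emit.interval.millis", 50);
```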


-- 

Ce n'est pas une signature
