I'm writing a fairly basic Trident topology as follows:

- 4 spouts of events
- merges into one stream
- serializes the object as an event in a string
- saves to db

I split the serialization step out of the spout to speed things up, since
it was CPU intensive.

The problem I have is that after 10 minutes there are over 910k tuples
emitted/transferred but only 193k records saved.

The overall load of the topology seems fine.

- 536.404 ms complete latency at the topology level
- The highest capacity of any bolt is 0.3, which is the serialization one.
- Each bolt task has sub-20 ms execute latency and sub-40 ms process
latency.

So it seems Trident has all the records internally, but I need these events
as close to real time as possible.

Does anyone have any guidance on how to increase the throughput?  Is it
simply a matter of tweaking max spout pending and the batch size?
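
For reference, the two knobs in question can be sketched roughly as below.
This is only an illustration under assumptions: the key names are the
standard Storm config keys, but the values and the `TuningSketch` class are
hypothetical placeholders, and a plain map stands in for Storm's `Config`
so the snippet runs without Storm on the classpath. How batch size is set
typically depends on the spout implementation, so it is not shown here.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: collects the tuning keys mentioned above in a plain map.
class TuningSketch {
    static Map<String, Object> tuningConf(int maxSpoutPending, int batchEmitIntervalMs) {
        Map<String, Object> conf = new HashMap<>();
        // In Trident this limits how many batches may be in flight at once.
        conf.put("topology.max.spout.pending", maxSpoutPending);
        // How often Trident emits a new batch, in milliseconds.
        conf.put("topology.trident.batch.emit.interval.millis", batchEmitIntervalMs);
        return conf;
    }

    public static void main(String[] args) {
        // Placeholder values, not recommendations.
        System.out.println(tuningConf(4, 100));
    }
}
```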

I'm running it on 2 m1-smalls for now.  I don't see the need to upgrade
until the demand on the boxes seems higher, although CPU usage on the
nimbus box is pinned at around 99%.  Why would that be?  It's at 99%
even when all the topologies are killed.

We are currently targeting 200 million records per day, which seems like
it should be quite easy based on what I've read that other people have
achieved.  I realize that better hardware should boost this as well, but
my first goal is to get Trident to push the records to the db quicker.
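
To put the numbers above side by side, here is a back-of-the-envelope
calculation of the rates involved. It is just arithmetic on the figures
quoted in this post; the class and method names are made up for
illustration.

```java
// Hypothetical helper for comparing the observed and target rates.
class ThroughputEstimate {
    // Average records per second for a count observed over a period.
    static double perSecond(long records, long seconds) {
        return (double) records / seconds;
    }

    public static void main(String[] args) {
        long tenMinutes = 10 * 60;      // observation window in seconds
        long oneDay = 24 * 60 * 60;     // seconds in a day

        double emitted = perSecond(910_000L, tenMinutes);   // tuples emitted
        double saved = perSecond(193_000L, tenMinutes);     // records saved
        double target = perSecond(200_000_000L, oneDay);    // daily goal

        System.out.printf("emitted: %.0f/s%n", emitted);
        System.out.printf("saved:   %.0f/s%n", saved);
        System.out.printf("target:  %.0f/s%n", target);
        System.out.printf("target is %.1fx the current save rate%n", target / saved);
    }
}
```

So the target works out to roughly 2,300 records per second, while the db
is currently receiving a little over 300 per second.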

Thanks in advance,
Sean
