Is there a reason you are using Trident? If you don't need to handle the events as a batch, you are probably going to get better performance w/o it.
On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
> I'm writing a fairly basic Trident topology as follows:
>
> - 4 spouts of events
> - merges into one stream
> - serializes the object as an event in a string
> - saves to db
>
> I split the serialization task away from the spout to speed it up, since it was CPU-intensive.
>
> The problem I have is that after 10 minutes there are over 910k tuples emitted/transferred but only 193k records saved.
>
> The overall load of the topology seems fine.
>
> - 536.404 ms complete latency at the topology level
> - The highest capacity of any bolt is 0.3, which is the serialization one.
> - Each bolt task has sub-20 ms execute latency and sub-40 ms process latency.
>
> So it seems Trident has all the records internally, but I need these events as close to real time as possible.
>
> Does anyone have any guidance on how to increase the throughput? Is it simply a matter of tweaking max spout pending and the batch size?
>
> I'm running it on 2 m1.small instances for now. I don't see the need to upgrade until the demand on the boxes seems higher. However, CPU usage on the Nimbus box is pinned at around 99%. Why would that be? It's at 99% even when all the topologies are killed.
>
> We are currently targeting processing 200 million records per day, which seems like it should be quite easy based on what I've read that other people have achieved. I realize that hardware should be able to boost this as well, but my first goal is to get Trident to push the records to the DB quicker.
>
> Thanks in advance,
> Sean
>
> --
> This is not a signature
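A quick back-of-the-envelope check of the numbers in the quoted message may help frame the question. The Java sketch below converts the 10-minute counts and the 200-million-per-day target into tuples per second, then uses Little's law (tuples in flight = throughput x latency) as a rough model of how many batches max spout pending would need to allow at the observed 536 ms complete latency. The class and helper names and the batch size of 500 are hypothetical, and the model assumes batches overlap freely, which Trident's ordered state commits may not permit, so treat the output as a rough estimate rather than a tuning recipe.

```java
// Back-of-the-envelope throughput check for the numbers in the quoted message.
// Class/helper names and the batch size are hypothetical, for illustration only.
public class ThroughputCheck {
    static final double SECONDS_PER_DAY = 24 * 60 * 60;

    // Average rate in tuples per second for a count observed over a window.
    static double rate(long tuples, double windowSeconds) {
        return tuples / windowSeconds;
    }

    // Little's law: tuples in flight = throughput * latency.
    static double tuplesInFlight(double tuplesPerSecond, double latencySeconds) {
        return tuplesPerSecond * latencySeconds;
    }

    // Batches that must be pending at once to hold that many tuples in flight,
    // assuming (as a simplification) that batches overlap freely.
    static long batchesInFlight(double tuplesPerSecond, double latencySeconds, long batchSize) {
        return (long) Math.ceil(tuplesInFlight(tuplesPerSecond, latencySeconds) / batchSize);
    }

    public static void main(String[] args) {
        double window = 10 * 60;   // 10-minute observation window, in seconds
        double latency = 0.536404; // 536.404 ms complete latency, in seconds
        long batchSize = 500;      // hypothetical batch size

        double emitted = rate(910_000, window);
        double saved = rate(193_000, window);
        double target = 200_000_000 / SECONDS_PER_DAY;

        System.out.printf("emitted: %.1f tuples/s%n", emitted); // emitted: 1516.7 tuples/s
        System.out.printf("saved:   %.1f tuples/s%n", saved);   // saved:   321.7 tuples/s
        System.out.printf("target:  %.1f tuples/s%n", target);  // target:  2314.8 tuples/s
        System.out.printf("in flight at target: %.0f tuples%n",
                tuplesInFlight(target, latency));               // in flight at target: 1242 tuples
        System.out.printf("batches pending (size %d): %d%n",
                batchSize, batchesInFlight(target, latency, batchSize)); // batches pending (size 500): 3
    }
}
```

Under these assumptions, the saved rate (about 322 tuples/s) is roughly a seventh of the ~2,315 tuples/s the daily target implies. The emitted/transferred figures in the Storm UI may also count tuples at more than one component, so they are not necessarily a one-to-one count of distinct events.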
