This is the first step of 4. When I save to the db I'm actually saving to a queue (just using a db for now). In the 2nd step we index the data, and in the 3rd we do aggregation/counts for reporting. The last is a search that I'm planning on using DRPC for. Within step 2 we pipe certain datasets in real time to the clients they apply to. I'd like this and the DRPC to be sub-2s, which should be reasonable.
You're right that I could speed up step 1 by not using Trident, but our requirements seem like a good use case for the other 3 steps. With many results per second, batching should affect performance a ton if the batch size is small enough.

What would cause Nimbus to be at 100% CPU with the topologies killed?

> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]> wrote:
>
> Is there a reason you are using trident?
>
> If you don't need to handle the events as a batch, you are probably going to
> get performance w/o it.
>
>
>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>> I'm writing a fairly basic trident topology as follows:
>>
>> - 4 spouts of events
>> - merges into one stream
>> - serializes the object as an event in a string
>> - saves to db
>>
>> I split the serialization task away from the spout as it was cpu intensive
>> to speed it up.
>>
>> The problem I have is that after 10 minutes there are over 910k tuples
>> emitted/transferred but only 193k records are saved.
>>
>> The overall load of the topology seems fine.
>>
>> - 536.404 ms complete latency at the topology level
>> - The highest capacity of any bolt is 0.3 which is the serialization one.
>> - each bolt task has sub 20 ms execute latency and sub 40 ms process latency.
>>
>> So it seems trident has all the records internally, but I need these events
>> as close to realtime as possible.
>>
>> Does anyone have any guidance as to how to increase the throughput? Is it
>> simply a matter of tweaking max spout pending and the batch size?
>>
>> I'm running it on 2 m1-smalls for now. I don't see the need to upgrade it
>> until the demand on the boxes seems higher. Although CPU usage on the
>> nimbus box is pinned. It's at like 99%. Why would that be? It's at 99% even
>> when all the topologies are killed.
>>
>> We are currently targeting processing 200 million records per day which
>> seems like it should be quite easy based on what I've read that other people
>> have achieved. I realize that hardware should be able to boost this as well
>> but my first goal is to get trident to push the records to the db quicker.
>>
>> Thanks in advance,
>> Sean
>
>
>
> --
>
> Ce n'est pas une signature
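
For reference, here is a rough back-of-the-envelope check of the numbers in this thread. It is only a sketch: it uses the figures quoted above (about 910k tuples emitted and 193k records saved in 10 minutes, and a 200 million records/day target), treats "over 910k" as exactly 910k, and assumes the daily load is spread evenly over 24 hours. The helper name `per_second` is just for illustration.

```python
def per_second(count, seconds):
    """Average rate in records per second over the given window."""
    return count / seconds

WINDOW_S = 10 * 60          # the 10-minute observation window
DAY_S = 24 * 60 * 60        # assumes the daily target is spread evenly

emitted_rate = per_second(910_000, WINDOW_S)     # tuples emitted/transferred
saved_rate = per_second(193_000, WINDOW_S)       # records actually saved
target_rate = per_second(200_000_000, DAY_S)     # 200 million records/day

saved_fraction = saved_rate / emitted_rate       # share of emitted tuples saved
shortfall = target_rate / saved_rate             # factor needed to hit target

print(f"emitted: {emitted_rate:.0f}/s")
print(f"saved:   {saved_rate:.0f}/s ({saved_fraction:.0%} of emitted)")
print(f"target:  {target_rate:.0f}/s ({shortfall:.1f}x the saved rate)")
```

So on these numbers the save step is keeping up with roughly a fifth of what is emitted, and the saved rate would need to grow by about 7x to reach the daily target under the even-load assumption.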
