Can you do a thread dump and pastebin it? It's a nice first step to figure this out.
I just checked on our Nimbus and while it's on a larger machine, it's using <1% CPU. Also look in your logs for any clues.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]

On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak <[email protected]> wrote:

> No, they are on separate machines. It's a 4-machine cluster - 2 workers, 1
> nimbus and 1 zookeeper.
>
> I suppose I could just create a new cluster but I'd like to know why this
> is occurring to avoid future production outages.
>
> Thanks,
> S
>
> On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose <[email protected]> wrote:
>
>> Are you running Zookeeper on the same machine as the Nimbus box?
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>> On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak <[email protected]> wrote:
>>
>>> This is the first step of 4. When I save to db I'm actually saving to a
>>> queue (just using db for now). In the 2nd step we index the data and in
>>> the 3rd we do aggregation/counts for reporting. The last is a search that
>>> I'm planning on using drpc for. Within step 2 we pipe certain datasets in
>>> real time to the clients they apply to. I'd like this and the drpc to be
>>> sub 2s, which should be reasonable.
>>>
>>> You're right that I could speed up step 1 by not using trident but our
>>> requirements seem like a good use case for the other 3 steps. With many
>>> results per second batching should affect performance a ton if the batch
>>> size is small enough.
>>>
>>> What would cause nimbus to be at 100% CPU with the topologies killed?
>>>
>>> Sent from my iPhone
>>>
>>> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]>
>>> wrote:
>>>
>>> Is there a reason you are using trident?
>>>
>>> If you don't need to handle the events as a batch, you are probably
>>> going to get better performance w/o it.
>>>
>>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>>>
>>>> I'm writing a fairly basic trident topology as follows:
>>>>
>>>> - 4 spouts of events
>>>> - merges into one stream
>>>> - serializes the object as an event in a string
>>>> - saves to db
>>>>
>>>> I split the serialization task away from the spout to speed it up, as
>>>> it was CPU intensive.
>>>>
>>>> The problem I have is that after 10 minutes there are over 910k tuples
>>>> emitted/transferred but only 193k records are saved.
>>>>
>>>> The overall load of the topology seems fine.
>>>>
>>>> - 536.404 ms complete latency at the topology level
>>>> - The highest capacity of any bolt is 0.3, which is the serialization
>>>> one.
>>>> - Each bolt task has sub 20 ms execute latency and sub 40 ms process
>>>> latency.
>>>>
>>>> So it seems trident has all the records internally, but I need these
>>>> events as close to realtime as possible.
>>>>
>>>> Does anyone have any guidance as to how to increase the throughput? Is
>>>> it simply a matter of tweaking max spout pending and the batch size?
>>>>
>>>> I'm running it on 2 m1-smalls for now. I don't see the need to upgrade
>>>> it until the demand on the boxes seems higher. Although CPU usage on the
>>>> nimbus box is pinned. It's at like 99%. Why would that be? It's at 99%
>>>> even when all the topologies are killed.
>>>>
>>>> We are currently targeting processing 200 million records per day, which
>>>> seems like it should be quite easy based on what I've read that other
>>>> people have achieved. I realize that hardware should be able to boost
>>>> this as well but my first goal is to get trident to push the records to
>>>> the db quicker.
>>>>
>>>> Thanks in advance,
>>>> Sean
>>>
>>> --
>>>
>>> Ce n'est pas une signature
>
> --
> Thanks,
>
> Sean Solbak, BsC, MCSD
> Solbak Technologies Inc.
> 780.893.7326 (m)
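
For reference, the figures quoted in the original message (200 million records per day as the target, 910k tuples emitted and 193k records saved after 10 minutes) can be turned into per-second rates. This is only a rough sketch: it assumes the 910k and 193k counts refer to the same events, which may not hold if the emitted total sums tuples across several components. The helper name `per_second` is made up for the illustration.

```python
# Back-of-the-envelope rates from the numbers quoted in the thread.
# Assumption: emitted and saved counts describe the same events.

SECONDS_PER_DAY = 24 * 60 * 60
WINDOW_SECONDS = 10 * 60          # the 10-minute observation window

TARGET_PER_DAY = 200_000_000      # stated daily target
EMITTED_IN_WINDOW = 910_000       # tuples emitted/transferred
SAVED_IN_WINDOW = 193_000         # records saved to the db


def per_second(count, seconds):
    """Average rate in records per second (hypothetical helper)."""
    return count / seconds


target_rate = per_second(TARGET_PER_DAY, SECONDS_PER_DAY)
emitted_rate = per_second(EMITTED_IN_WINDOW, WINDOW_SECONDS)
saved_rate = per_second(SAVED_IN_WINDOW, WINDOW_SECONDS)

# Share of emitted tuples that reached the db in the window.
saved_fraction = SAVED_IN_WINDOW / EMITTED_IN_WINDOW

# How many times faster the save path must get to meet the daily target.
required_speedup = target_rate / saved_rate

if __name__ == "__main__":
    print(f"target rate:      {target_rate:8.1f} records/s")
    print(f"emitted rate:     {emitted_rate:8.1f} tuples/s")
    print(f"saved rate:       {saved_rate:8.1f} records/s")
    print(f"saved fraction:   {saved_fraction:8.1%}")
    print(f"required speedup: {required_speedup:8.1f}x")
```

Under those assumptions the save path is running at a few hundred records per second against a target of a bit over two thousand, so the gap to close is roughly a factor of seven before any headroom for peaks.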
