Here it is. Appears to be some kind of race condition. http://pastebin.com/dANT8SQR
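In case anyone wants to reproduce this, a thread dump of the Nimbus daemon can be grabbed with the standard JDK tools (the output file name is arbitrary):

    jps -l | grep nimbus               # find the pid of the Nimbus JVM
    jstack <nimbus-pid> > nimbus-threads.txt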
On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose <[email protected]> wrote:

> Can you do a thread dump and pastebin it? It's a nice first step to figure this out.
>
> I just checked on our Nimbus and while it's on a larger machine, it's using <1% CPU. Also look in your logs for any clues.
>
> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
> [email protected]
>
> On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak <[email protected]> wrote:
>
>> No, they are on separate machines. It's a 4-machine cluster: 2 workers, 1 Nimbus, and 1 ZooKeeper.
>>
>> I suppose I could just create a new cluster, but I'd like to know why this is occurring, to avoid future production outages.
>>
>> Thanks,
>> S
>>
>> On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose <[email protected]> wrote:
>>
>>> Are you running ZooKeeper on the same machine as the Nimbus box?
>>>
>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>> [email protected]
>>>
>>> On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak <[email protected]> wrote:
>>>
>>>> This is the first step of 4. When I save to db I'm actually saving to a queue (just using the db for now). In the 2nd step we index the data, and in the 3rd we do aggregation/counts for reporting. The last is a search that I'm planning on using DRPC for. Within step 2 we pipe certain datasets in real time to the clients they apply to. I'd like this and the DRPC to be sub-2s, which should be reasonable.
>>>>
>>>> You're right that I could speed up step 1 by not using Trident, but our requirements seem like a good use case for it in the other 3 steps. With many results per second, batching shouldn't affect performance a ton if the batch size is small enough.
>>>>
>>>> What would cause Nimbus to be at 100% CPU with the topologies killed?
>>>>
>>>> Sent from my iPhone
>>>>
>>>> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]> wrote:
>>>>
>>>> Is there a reason you are using Trident?
>>>>
>>>> If you don't need to handle the events as a batch, you are probably going to get better performance w/o it.
>>>>
>>>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>>>>
>>>>> I'm writing a fairly basic Trident topology as follows:
>>>>>
>>>>> - 4 spouts of events
>>>>> - merges into one stream
>>>>> - serializes the event object to a string
>>>>> - saves to db
>>>>>
>>>>> To speed things up, I split the serialization task away from the spout, as it was CPU-intensive.
>>>>>
>>>>> The problem I have is that after 10 minutes there are over 910k tuples emitted/transferred, but only 193k records saved.
>>>>>
>>>>> The overall load of the topology seems fine:
>>>>>
>>>>> - 536.404 ms complete latency at the topology level
>>>>> - The highest capacity of any bolt is 0.3, and that's the serialization one.
>>>>> - Each bolt task has sub-20 ms execute latency and sub-40 ms process latency.
>>>>>
>>>>> So it seems Trident has all the records internally, but I need these events as close to real time as possible.
>>>>>
>>>>> Does anyone have any guidance as to how to increase the throughput? Is it simply a matter of tweaking max spout pending and the batch size?
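>>>>> To be concrete, this is roughly the shape of what I have. The spout, function, and state classes (spout1..spout4, SerializeEvent, DbStateFactory, DbUpdater) are placeholders for this sketch, and the config values are guesses rather than measured settings:
>>>>>
>>>>>     import backtype.storm.Config;
>>>>>     import backtype.storm.tuple.Fields;
>>>>>     import storm.trident.Stream;
>>>>>     import storm.trident.TridentTopology;
>>>>>
>>>>>     TridentTopology topology = new TridentTopology();
>>>>>
>>>>>     // 4 event spouts merged into one stream
>>>>>     Stream merged = topology.merge(
>>>>>             topology.newStream("events-1", spout1),
>>>>>             topology.newStream("events-2", spout2),
>>>>>             topology.newStream("events-3", spout3),
>>>>>             topology.newStream("events-4", spout4));
>>>>>
>>>>>     merged
>>>>>         // CPU-heavy serialization, split away from the spouts
>>>>>         .each(new Fields("event"), new SerializeEvent(), new Fields("json"))
>>>>>         .parallelismHint(4)
>>>>>         // write each batch out to the db/queue
>>>>>         .partitionPersist(new DbStateFactory(), new Fields("json"), new DbUpdater());
>>>>>
>>>>>     Config conf = new Config();
>>>>>     // cap the number of Trident batches in flight at once
>>>>>     conf.setMaxSpoutPending(2);
>>>>>     // how often Trident kicks off a new batch (ms); lower = fresher data
>>>>>     conf.put("topology.trident.batch.emit.interval.millis", 50);
>>>>>
>>>>> (As far as I can tell, the batch size itself is set on each spout, e.g. its max fetch/batch setting, rather than through a single topology-level config.)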
>>>>> I'm running it on 2 m1.smalls for now. I don't see the need to upgrade until demand on the boxes is higher, although CPU usage on the Nimbus box is pinned at like 99%. Why would that be? It's at 99% even when all the topologies are killed.
>>>>>
>>>>> We are currently targeting processing 200 million records per day (roughly 2,300 records per second on average), which seems like it should be quite easy based on what I've read that other people have achieved. I realize that hardware should be able to boost this as well, but my first goal is to get Trident to push the records to the db quicker.
>>>>>
>>>>> Thanks in advance,
>>>>> Sean
>>>>
>>>> --
>>>> Ce n'est pas une signature
>>
>> --
>> Thanks,
>>
>> Sean Solbak, BSc, MCSD
>> Solbak Technologies Inc.
>> 780.893.7326 (m)

--
Thanks,

Sean Solbak, BSc, MCSD
Solbak Technologies Inc.
780.893.7326 (m)
