I'm not seeing too much to substantiate that. What size heap are you running, and is it nearly full? Perhaps attach VisualVM and check for GC activity.
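For anyone unsure what "check for GC activity" looks like in practice, here is a minimal, hypothetical sketch using the standard java.lang.management beans. It prints the same heap and GC counters that VisualVM graphs; run it inside the JVM you care about, or adapt it to read the same MBeans over remote JMX (the class name and polling interval are illustrative):

    import java.lang.management.GarbageCollectorMXBean;
    import java.lang.management.ManagementFactory;
    import java.lang.management.MemoryUsage;

    // Rough diagnostic sketch: polls heap usage and per-collector GC counts.
    // Frequent collections while the heap sits near its max suggest GC pressure.
    public class GcCheck {
        public static void main(String[] args) throws InterruptedException {
            while (true) {
                MemoryUsage heap = ManagementFactory.getMemoryMXBean().getHeapMemoryUsage();
                System.out.printf("heap used=%d MB, max=%d MB%n",
                        heap.getUsed() >> 20, heap.getMax() >> 20);
                for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
                    System.out.printf("  %s: collections=%d, time=%d ms%n",
                            gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
                }
                Thread.sleep(5000);
            }
        }
    }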
Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak <[email protected]> wrote:

> Here it is. Appears to be some kind of race condition.
>
> http://pastebin.com/dANT8SQR
>
>
> On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose <[email protected]> wrote:
>
>> Can you do a thread dump and pastebin it? It's a nice first step to
>> figure this out.
>>
>> I just checked on our Nimbus and while it's on a larger machine, it's
>> using <1% CPU. Also look in your logs for any clues.
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>> On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak <[email protected]> wrote:
>>
>>> No, they are on separate machines. It's a 4-machine cluster - 2 workers,
>>> 1 nimbus and 1 zookeeper.
>>>
>>> I suppose I could just create a new cluster, but I'd like to know why this
>>> is occurring to avoid future production outages.
>>>
>>> Thanks,
>>> S
>>>
>>> On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose <[email protected]> wrote:
>>>
>>>> Are you running Zookeeper on the same machine as the Nimbus box?
>>>>
>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>> [email protected]
>>>>
>>>> On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak <[email protected]> wrote:
>>>>
>>>>> This is the first step of 4. When I save to the db I'm actually saving
>>>>> to a queue (just using the db for now). In the 2nd step we index the
>>>>> data, and in the 3rd we do aggregation/counts for reporting. The last
>>>>> is a search that I'm planning on using DRPC for. Within step 2 we pipe
>>>>> certain datasets in real time to the clients they apply to. I'd like
>>>>> this and the DRPC to be sub-2s, which should be reasonable.
>>>>>
>>>>> You're right that I could speed up step 1 by not using Trident, but our
>>>>> requirements seem like a good use case for the other 3 steps. With many
>>>>> results per second, batching should affect performance a ton if the
>>>>> batch size is small enough.
>>>>>
>>>>> What would cause nimbus to be at 100% CPU with the topologies killed?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]> wrote:
>>>>>
>>>>> Is there a reason you are using Trident?
>>>>>
>>>>> If you don't need to handle the events as a batch, you are probably
>>>>> going to get better performance w/o it.
>>>>>
>>>>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>>>>>
>>>>>> I'm writing a fairly basic Trident topology as follows:
>>>>>>
>>>>>> - 4 spouts of events
>>>>>> - merges into one stream
>>>>>> - serializes the object as an event in a string
>>>>>> - saves to the db
>>>>>>
>>>>>> I split the serialization task away from the spout, as it was CPU
>>>>>> intensive, to speed it up.
>>>>>>
>>>>>> The problem I have is that after 10 minutes there are over 910k tuples
>>>>>> emitted/transferred but only 193k records saved.
>>>>>>
>>>>>> The overall load of the topology seems fine:
>>>>>>
>>>>>> - 536.404 ms complete latency at the topology level
>>>>>> - The highest capacity of any bolt is 0.3, which is the serialization one.
>>>>>> - Each bolt task has sub-20 ms execute latency and sub-40 ms process latency.
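For reference, the pipeline Sean describes above (four spouts merged into a single stream, a CPU-heavy serialization step split into its own function, then a write to the db) maps onto Trident roughly as follows. This is a hedged, hypothetical sketch: the spout variables, SerializeEvent, DbUpdater, and dbStateFactory are placeholders, not his actual code.

    import backtype.storm.tuple.Fields;
    import storm.trident.Stream;
    import storm.trident.TridentTopology;

    // Hypothetical sketch of the described pipeline; spoutA..spoutD,
    // SerializeEvent, DbUpdater and dbStateFactory stand in for real classes.
    TridentTopology topology = new TridentTopology();
    Stream merged = topology.merge(
            topology.newStream("events-a", spoutA),
            topology.newStream("events-b", spoutB),
            topology.newStream("events-c", spoutC),
            topology.newStream("events-d", spoutD));

    merged
        // CPU-heavy serialization pulled out of the spout, as described above
        .each(new Fields("event"), new SerializeEvent(), new Fields("json"))
        .parallelismHint(4)
        // one db write per partition per batch
        .partitionPersist(dbStateFactory, new Fields("json"), new DbUpdater());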
>>>>>>
>>>>>> So it seems Trident has all the records internally, but I need these
>>>>>> events as close to real time as possible.
>>>>>>
>>>>>> Does anyone have any guidance as to how to increase the throughput?
>>>>>> Is it simply a matter of tweaking max spout pending and the batch size?
>>>>>>
>>>>>> I'm running it on 2 m1.smalls for now. I don't see the need to upgrade
>>>>>> them until the demand on the boxes seems higher. Although CPU usage on
>>>>>> the nimbus box is pinned - it's at like 99%. Why would that be? It's at
>>>>>> 99% even when all the topologies are killed.
>>>>>>
>>>>>> We are currently targeting processing 200 million records per day,
>>>>>> which seems like it should be quite easy based on what I've read other
>>>>>> people have achieved. I realize that hardware should be able to boost
>>>>>> this as well, but my first goal is to get Trident to push the records
>>>>>> to the db more quickly.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Sean
>>>>>>
>>>>>
>>>>> --
>>>>> Ce n'est pas une signature
>>>>>
>>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Sean Solbak, BsC, MCSD
>>> Solbak Technologies Inc.
>>> 780.893.7326 (m)
>>
>
> --
> Thanks,
>
> Sean Solbak, BsC, MCSD
> Solbak Technologies Inc.
> 780.893.7326 (m)
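On the "max spout pending and the batch size" question above, a hedged sketch of where those knobs live in a Trident topology (the numbers and topology name are illustrative starting points, not recommendations):

    import backtype.storm.Config;
    import backtype.storm.StormSubmitter;

    Config conf = new Config();
    conf.setNumWorkers(2);
    // For Trident, max spout pending is the number of *batches* in flight at
    // once; too low starves the pipeline, too high just buffers more in memory.
    conf.setMaxSpoutPending(10);
    // The batch size itself belongs to the spout (e.g. FixedBatchSpout takes a
    // maxBatchSize in its constructor; Kafka spouts size batches by fetch size),
    // so it is set where the spout is constructed, not in the topology config.
    StormSubmitter.submitTopology("event-pipeline", conf, topology.build());

Smaller batches generally mean lower end-to-end latency per record but more per-batch overhead, while raising max spout pending keeps more batches in the pipeline when the db write is the bottleneck.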
