The fact that the process is being killed constantly is a red flag. Also, why are you running it as a client VM?
Check your nimbus.log to see why it's restarting.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Sun, Mar 2, 2014 at 7:50 PM, Sean Solbak <[email protected]> wrote:

>     uintx ErgoHeapSizeLimit              = 0            {product}
>     uintx InitialHeapSize               := 27080896     {product}
>     uintx LargePageHeapSizeThreshold     = 134217728    {product}
>     uintx MaxHeapSize                   := 698351616    {product}
>
> So the initial size is ~25 MB and the max ~666 MB.
>
> It's a client process (not server, i.e. the command is "java -client
> -Dstorm.options..."). The process gets killed and restarted continuously
> with a new PID (which makes it tough to grab the PID to get stats on it).
> I don't have VisualVM, but if I run
>
> jstat -gc PID, I get
>
>  S0C    S1C    S0U   S1U    EC      EU      OC       OU      PC       PU       YGC  YGCT   FGC  FGCT   GCT
>  832.0  832.0  0.0   352.9  7168.0  1115.9  17664.0  1796.0  21248.0  16029.6  5    0.268  0    0.000  0.268
>
> At this point I'll likely just rebuild the cluster. It's not in prod yet,
> as I still need to tune it. I should have written 2 separate emails :)
>
> Thanks,
> S
>
>
> On Sun, Mar 2, 2014 at 7:10 PM, Michael Rose <[email protected]> wrote:
>
>> I'm not seeing too much to substantiate that. What size heap are you
>> running, and is it near filled? Perhaps attach VisualVM and check for GC
>> activity.
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>>
>> On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak <[email protected]> wrote:
>>
>>> Here it is. Appears to be some kind of race condition.
>>>
>>> http://pastebin.com/dANT8SQR
>>>
>>>
>>> On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose <[email protected]> wrote:
>>>
>>>> Can you do a thread dump and pastebin it? It's a nice first step to
>>>> figure this out.
>>>>
>>>> I just checked on our Nimbus, and while it's on a larger machine, it's
>>>> using <1% CPU. Also look in your logs for any clues.
>>>>
>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>> [email protected]
>>>>
>>>>
>>>> On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak <[email protected]> wrote:
>>>>
>>>>> No, they are on separate machines. It's a 4-machine cluster - 2
>>>>> workers, 1 nimbus and 1 zookeeper.
>>>>>
>>>>> I suppose I could just create a new cluster, but I'd like to know why
>>>>> this is occurring, to avoid future production outages.
>>>>>
>>>>> Thanks,
>>>>> S
>>>>>
>>>>>
>>>>> On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose <[email protected]> wrote:
>>>>>
>>>>>> Are you running Zookeeper on the same machine as the Nimbus box?
>>>>>>
>>>>>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>>>> [email protected]
>>>>>>
>>>>>>
>>>>>> On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak <[email protected]> wrote:
>>>>>>
>>>>>>> This is the first step of 4. When I save to the db I'm actually
>>>>>>> saving to a queue (just using the db for now). In the 2nd step we
>>>>>>> index the data, and in the 3rd we do aggregation/counts for
>>>>>>> reporting. The last is a search that I'm planning on using DRPC
>>>>>>> for. Within step 2 we pipe certain datasets in real time to the
>>>>>>> clients they apply to. I'd like this and the DRPC to be sub-2s,
>>>>>>> which should be reasonable.
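(For reference on the step-4 plan above: a minimal sketch of what a Trident
DRPC stream for the search could look like, assuming the Storm 0.9.x Java API.
The "search" function name, the indexState built by the earlier steps, and the
MapGet lookup over a map-backed state are illustrative assumptions, not
details from this thread.)

    import backtype.storm.tuple.Fields;
    import storm.trident.TridentState;
    import storm.trident.TridentTopology;
    import storm.trident.operation.builtin.MapGet;

    public class SearchDrpc {
        // Registers a DRPC function named "search" that looks the request
        // arguments up in a state assumed to be built by the index/aggregation
        // steps (it must be a MapState for MapGet to apply).
        public static void addSearchStream(TridentTopology topology,
                                           TridentState indexState) {
            topology.newDRPCStream("search")
                    .stateQuery(indexState, new Fields("args"),
                                new MapGet(), new Fields("result"))
                    .project(new Fields("args", "result"));
        }
    }

A client would then invoke the "search" function through DRPCClient (or
LocalDRPC in tests) and get the matched values back.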
>>>>>>> You're right that I could speed up step 1 by not using Trident, but
>>>>>>> our requirements seem like a good use case for it in the other 3
>>>>>>> steps. With many results per second, batching shouldn't affect
>>>>>>> performance much if the batch size is small enough.
>>>>>>>
>>>>>>> What would cause nimbus to be at 100% CPU with the topologies
>>>>>>> killed?
>>>>>>>
>>>>>>> Sent from my iPhone
>>>>>>>
>>>>>>> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]>
>>>>>>> wrote:
>>>>>>>
>>>>>>> Is there a reason you are using Trident?
>>>>>>>
>>>>>>> If you don't need to handle the events as a batch, you are probably
>>>>>>> going to get better performance w/o it.
>>>>>>>
>>>>>>>
>>>>>>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>>>>>>>
>>>>>>>> I'm writing a fairly basic Trident topology as follows:
>>>>>>>>
>>>>>>>> - 4 spouts of events
>>>>>>>> - merges into one stream
>>>>>>>> - serializes the object as an event in a string
>>>>>>>> - saves to db
>>>>>>>>
>>>>>>>> I split the serialization task away from the spout, as it was CPU
>>>>>>>> intensive, to speed it up.
>>>>>>>>
>>>>>>>> The problem I have is that after 10 minutes there are over 910k
>>>>>>>> tuples emitted/transferred but only 193k records are saved.
>>>>>>>>
>>>>>>>> The overall load of the topology seems fine:
>>>>>>>>
>>>>>>>> - 536.404 ms complete latency at the topology level
>>>>>>>> - The highest capacity of any bolt is 0.3, which is the
>>>>>>>>   serialization one.
>>>>>>>> - Each bolt task has sub-20 ms execute latency and sub-40 ms
>>>>>>>>   process latency.
>>>>>>>>
>>>>>>>> So it seems Trident has all the records internally, but I need
>>>>>>>> these events as close to real time as possible.
>>>>>>>>
>>>>>>>> Does anyone have any guidance as to how to increase the throughput?
>>>>>>>> Is it simply a matter of tweaking max spout pending and the batch
>>>>>>>> size?
>>>>>>>>
>>>>>>>> I'm running it on 2 m1.smalls for now. I don't see the need to
>>>>>>>> upgrade until the demand on the boxes seems higher, although CPU
>>>>>>>> usage on the nimbus box is pinned. It's at like 99%. Why would that
>>>>>>>> be? It's at 99% even when all the topologies are killed.
>>>>>>>>
>>>>>>>> We are currently targeting processing 200 million records per day,
>>>>>>>> which seems like it should be quite easy based on what I've read
>>>>>>>> other people have achieved. I realize that hardware should be able
>>>>>>>> to boost this as well, but my first goal is to get Trident to push
>>>>>>>> the records to the db quicker.
>>>>>>>>
>>>>>>>> Thanks in advance,
>>>>>>>> Sean
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Ce n'est pas une signature
>
>
> --
> Thanks,
>
> Sean Solbak, BsC, MCSD
> Solbak Technologies Inc.
> 780.893.7326 (m)
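(A minimal sketch of the pipeline described in the original message - 4 spouts
merged into one stream, serialization split out into its own step, then a
persist to the db - assuming the Storm 0.9.x Trident Java API. The spout
interface, field names, the body of SerializeEvent, and the state
factory/updater are placeholders, not code from this thread.)

    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;
    import storm.trident.Stream;
    import storm.trident.TridentTopology;
    import storm.trident.operation.BaseFunction;
    import storm.trident.operation.TridentCollector;
    import storm.trident.spout.IBatchSpout;
    import storm.trident.state.StateFactory;
    import storm.trident.state.StateUpdater;
    import storm.trident.tuple.TridentTuple;

    public class EventPipeline {

        // The CPU-heavy serialization pulled out of the spouts, as described
        // above. toJson() stands in for whatever serializer is really in use.
        public static class SerializeEvent extends BaseFunction {
            @Override
            public void execute(TridentTuple tuple, TridentCollector collector) {
                Object event = tuple.getValue(0);
                collector.emit(new Values(toJson(event)));
            }

            private String toJson(Object event) {
                return String.valueOf(event); // placeholder serialization
            }
        }

        // 4 spouts (each assumed to emit a single "event" field) -> merge ->
        // serialize -> persist to the db/queue.
        public static TridentTopology build(IBatchSpout s1, IBatchSpout s2,
                                            IBatchSpout s3, IBatchSpout s4,
                                            StateFactory dbState,
                                            StateUpdater dbUpdater) {
            TridentTopology topology = new TridentTopology();
            Stream merged = topology.merge(
                    topology.newStream("events-1", s1),
                    topology.newStream("events-2", s2),
                    topology.newStream("events-3", s3),
                    topology.newStream("events-4", s4));
            merged.each(new Fields("event"), new SerializeEvent(), new Fields("json"))
                  .partitionPersist(dbState, new Fields("json"), dbUpdater);
            return topology;
        }
    }

newStream() also has overloads for plain IRichSpout and for the partitioned
transactional spout interfaces, so the spout type here is only an assumption.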

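(The two knobs from the original question map to the following config calls,
again sketched against the 0.9.x Java API. The numbers are illustrative
starting points to tune against the measured complete latency, not
recommendations from this thread.)

    import backtype.storm.Config;

    public class TuningConfig {

        public static Config tuned() {
            Config conf = new Config();

            // In Trident, max spout pending counts *batches* in flight rather
            // than individual tuples, so small values are typical; 4 is only
            // an illustrative starting point.
            conf.setMaxSpoutPending(4);

            // How often Trident emits a batch; lowering it trades throughput
            // for latency. 50 ms is an assumed value, not from this thread.
            conf.put("topology.trident.batch.emit.interval.millis", 50);

            return conf;
        }
    }

The batch size itself is governed by how much the spout hands back per batch
(e.g. the fetch size on a Kafka spout), so it is tuned on the spout rather
than through a single config key.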