I'm not seeing too much to substantiate that. What size heap are you
running, and is it nearly full? Perhaps attach VisualVM and check for GC
activity.

Michael Rose (@Xorlev <https://twitter.com/xorlev>)
Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
[email protected]


On Sun, Mar 2, 2014 at 6:54 PM, Sean Solbak <[email protected]> wrote:

> Here it is.  Appears to be some kind of race condition.
>
> http://pastebin.com/dANT8SQR
>
>
> On Sun, Mar 2, 2014 at 6:42 PM, Michael Rose <[email protected]> wrote:
>
>> Can you do a thread dump and pastebin it? That's a good first step toward
>> figuring this out.
>>
>> I just checked on our Nimbus and while it's on a larger machine, it's
>> using <1% CPU. Also look in your logs for any clues.
>>
>>
>> Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>> [email protected]
>>
>>
>> On Sun, Mar 2, 2014 at 6:31 PM, Sean Solbak <[email protected]> wrote:
>>
>>> No, they are on separate machines. It's a 4-machine cluster: 2 workers,
>>> 1 nimbus and 1 zookeeper.
>>>
>>> I suppose I could just create a new cluster, but I'd like to know why this
>>> is occurring, to avoid future production outages.
>>>
>>> Thanks,
>>> S
>>>
>>>
>>>
>>> On Sun, Mar 2, 2014 at 6:19 PM, Michael Rose <[email protected]> wrote:
>>>
>>>> Are you running Zookeeper on the same machine as the Nimbus box?
>>>>
>>>>  Michael Rose (@Xorlev <https://twitter.com/xorlev>)
>>>> Senior Platform Engineer, FullContact <http://www.fullcontact.com/>
>>>> [email protected]
>>>>
>>>>
>>>> On Sun, Mar 2, 2014 at 6:16 PM, Sean Solbak <[email protected]> wrote:
>>>>
>>>>> This is the first step of 4. When I save to the db I'm actually saving to
>>>>> a queue (just using the db for now). In the 2nd step we index the data, and
>>>>> in the 3rd we do aggregation/counts for reporting. The last is a search that
>>>>> I'm planning on using DRPC for. Within step 2 we pipe certain datasets in
>>>>> real time to the clients they apply to. I'd like this and the DRPC to be
>>>>> sub-2s, which should be reasonable.
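>>>>>
>>>>> For the search step, roughly what I have in mind is a Trident DRPC stream
>>>>> along these lines (just a sketch; ParseQuery, SearchIndex and indexState
>>>>> are made-up names standing in for whatever we end up writing):
>>>>>
>>>>> topology.newDRPCStream("search")
>>>>>         // parse the raw DRPC argument string into a query tuple
>>>>>         .each(new Fields("args"), new ParseQuery(), new Fields("query"))
>>>>>         // look the query up against the state built by the indexing step
>>>>>         .stateQuery(indexState, new Fields("query"),
>>>>>                     new SearchIndex(), new Fields("hits"))
>>>>>         // the last field is what gets returned to the DRPC caller
>>>>>         .project(new Fields("hits"));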
>>>>>
>>>>> You're right that I could speed up step 1 by not using Trident, but our
>>>>> requirements seem like a good use case for the other 3 steps.
>>>>> With many results per second, batching shouldn't affect performance much
>>>>> as long as the batch size is small enough.
>>>>>
>>>>> What would cause nimbus to be at 100% CPU with the topologies killed?
>>>>>
>>>>> Sent from my iPhone
>>>>>
>>>>> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]>
>>>>> wrote:
>>>>>
>>>>> Is there a reason you are using trident?
>>>>>
>>>>> If you don't need to handle the events as a batch, you are probably
>>>>> going to get better performance without it.
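>>>>>
>>>>> A plain (non-Trident) topology for that first step would look roughly
>>>>> like this -- only a sketch, the spout/bolt class names are placeholders:
>>>>>
>>>>> TopologyBuilder builder = new TopologyBuilder();  // backtype.storm.topology
>>>>> builder.setSpout("events-a", new EventSpoutA());  // one per event source
>>>>> builder.setSpout("events-b", new EventSpoutB());
>>>>> builder.setBolt("serialize", new SerializeBolt(), 4)
>>>>>        .shuffleGrouping("events-a")               // subscribing to every
>>>>>        .shuffleGrouping("events-b");              // spout merges the streams
>>>>> builder.setBolt("save", new DbWriterBolt(), 2)
>>>>>        .shuffleGrouping("serialize");
>>>>>
>>>>> Each tuple is then acked on its own instead of waiting for a whole batch,
>>>>> which usually means lower per-event latency.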
>>>>>
>>>>>
>>>>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>>>>>
>>>>>> I'm writing a fairly basic Trident topology as follows:
>>>>>>
>>>>>> - 4 spouts of events
>>>>>> - merges into one stream
>>>>>> - serializes the object as an event in a string
>>>>>> - saves to db
>>>>>>
>>>>>> I split the serialization task away from the spout to speed things up,
>>>>>> as it was CPU intensive.
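>>>>>>
>>>>>> In Trident terms the whole thing is roughly this (a sketch with made-up
>>>>>> names -- the spouts, SerializeEvent, SaveToDb and dbFactory stand in for
>>>>>> our real spouts, function, state updater and state factory):
>>>>>>
>>>>>> TridentTopology topology = new TridentTopology();   // storm.trident
>>>>>> Stream events = topology.merge(                     // merge the 4 spouts
>>>>>>         topology.newStream("spout-a", spoutA),
>>>>>>         topology.newStream("spout-b", spoutB),
>>>>>>         topology.newStream("spout-c", spoutC),
>>>>>>         topology.newStream("spout-d", spoutD));
>>>>>> events
>>>>>>     // CPU-heavy serialization runs in its own bolt, not in the spouts
>>>>>>     .each(new Fields("event"), new SerializeEvent(), new Fields("json"))
>>>>>>     .parallelismHint(4)
>>>>>>     // write each batch out to the db (our stand-in queue for now)
>>>>>>     .partitionPersist(dbFactory, new Fields("json"), new SaveToDb());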
>>>>>>
>>>>>> The problem I have is that after 10 minutes there are over 910k tuples
>>>>>> emitted/transferred, but only 193k records are saved.
>>>>>>
>>>>>> The overall load of the topology seems fine.
>>>>>>
>>>>>> - 536.404 ms complete latency at the topology level
>>>>>> - The highest capacity of any bolt is 0.3, and that's the serialization
>>>>>> one.
>>>>>> - Each bolt task has sub-20 ms execute latency and sub-40 ms process
>>>>>> latency.
>>>>>>
>>>>>> So it seems Trident has all the records internally, but I need these
>>>>>> events as close to real time as possible.
>>>>>>
>>>>>> Does anyone have any guidance as to how to increase the throughput?
>>>>>> Is it simply a matter of tweaking max spout pending and the batch size?
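>>>>>>
>>>>>> Concretely, the knobs I'm thinking of are these (a sketch of the stock
>>>>>> Config settings, not our actual submit code):
>>>>>>
>>>>>> Config conf = new Config();            // backtype.storm.Config
>>>>>> conf.setNumWorkers(2);
>>>>>> conf.setMaxSpoutPending(4);            // in Trident: batches in flight
>>>>>> // how often Trident emits a batch (assuming the stock default key name);
>>>>>> // batch size itself is set on the spout implementation
>>>>>> conf.put("topology.trident.batch.emit.interval.millis", 200);
>>>>>> StormSubmitter.submitTopology("events", conf, topology.build());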
>>>>>>
>>>>>> I'm running it on 2 m1-smalls for now. I don't see the need to upgrade
>>>>>> them until the demand on the boxes is higher. However, CPU usage on the
>>>>>> nimbus box is pinned at around 99%. Why would that be? It's at 99% even
>>>>>> when all the topologies are killed.
>>>>>>
>>>>>> We are currently targeting processing 200 million records per day, which
>>>>>> seems like it should be quite easy based on what I've read that other
>>>>>> people have achieved. I realize that better hardware should be able to
>>>>>> boost this as well, but my first goal is to get Trident to push the
>>>>>> records to the db more quickly.
>>>>>>
>>>>>> Thanks in advance,
>>>>>> Sean
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>>
>>>>> Ce n'est pas une signature
>>>>>
>>>>>
>>>>
>>>
>>>
>>> --
>>> Thanks,
>>>
>>> Sean Solbak, BsC, MCSD
>>> Solbak Technologies Inc.
>>> 780.893.7326 (m)
>>>
>>
>>
>
>
> --
> Thanks,
>
> Sean Solbak, BsC, MCSD
> Solbak Technologies Inc.
> 780.893.7326 (m)
>
