This is the first step of 4. When I save to the db I'm actually saving to a
queue (just using a db for now). In the 2nd step we index the data, and in the
3rd we do aggregation/counts for reporting. The last is a search that I'm
planning on using DRPC for. Within step 2 we pipe certain datasets in real time
to the clients they apply to. I'd like this and the DRPC call to be sub 2s,
which should be reasonable.

You're right that I could speed up step 1 by not using trident, but our
requirements seem like a good use case for it in the other 3 steps. With many
results per second, batching shouldn't affect performance a ton if the batch
size is small enough.
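
For a rough sense of scale, the figures quoted in the original message below
can be turned into per-second rates. This is only a back-of-envelope sketch in
Python, not a measurement: it assumes the reported complete latency is roughly
the time one batch spends in the topology, and the function names and the
pending/batch-size values in the last call are made up for illustration.

```python
# Back-of-envelope rates from the figures quoted in this thread.
SECONDS_PER_DAY = 24 * 60 * 60


def per_second(count, seconds):
    """Average rate in records per second."""
    return count / seconds


def in_flight(rate_per_s, latency_ms):
    """Little's law: items in flight = rate * time each item spends in the system."""
    return rate_per_s * latency_ms / 1000.0


def batch_ceiling(pending_batches, batch_size, latency_ms):
    """Rough upper bound on tuples/s when at most `pending_batches` batches of
    `batch_size` tuples are in flight and each batch takes `latency_ms`."""
    return pending_batches * batch_size / (latency_ms / 1000.0)


target_rate = per_second(200_000_000, SECONDS_PER_DAY)  # 200M records/day goal
emitted_rate = per_second(910_000, 10 * 60)             # 910k emitted in 10 min
saved_rate = per_second(193_000, 10 * 60)               # 193k saved in 10 min
saved_in_flight = in_flight(saved_rate, 536.404)        # at 536.404 ms latency

if __name__ == "__main__":
    print(f"target rate:  {target_rate:8.1f} records/s")
    print(f"emitted rate: {emitted_rate:8.1f} tuples/s")
    print(f"saved rate:   {saved_rate:8.1f} records/s")
    print(f"implied records in flight: {saved_in_flight:.0f}")
    # Hypothetical settings: 1 pending batch of 200 tuples.
    print(f"ceiling: {batch_ceiling(1, 200, 536.404):.0f} tuples/s")
```

The target works out to about 2,315 records/s, against about 1,517 tuples/s
emitted and about 322 records/s saved. Under this simple model the ceiling
scales linearly with both the number of pending batches and the batch size,
which is why those two settings are the first things to look at.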

What would cause nimbus to be at 100% CPU with the topologies killed? 

> On Mar 2, 2014, at 5:46 PM, Sean Allen <[email protected]> wrote:
> 
> Is there a reason you are using trident? 
> 
> If you don't need to handle the events as a batch, you are probably going to
> get better performance without it.
> 
> 
>> On Sun, Mar 2, 2014 at 2:23 PM, Sean Solbak <[email protected]> wrote:
>> I'm writing a fairly basic trident topology as follows:
>> 
>> - 4 spouts of events
>> - merges into one stream
>> - serializes the object as an event in a string
>> - saves to db
>> 
>> To speed things up, I split the serialization task out of the spout because
>> it was CPU intensive.
>> 
>> The problem I have is that after 10 minutes there are over 910k tuples
>> emitted/transferred but only 193k records are saved.
>> 
>> The overall load of the topology seems fine.
>>  
>> - 536.404 ms complete latency at the topology level
>> - The highest capacity of any bolt is 0.3 which is the serialization one.
>> - each bolt task has sub 20 ms execute latency and sub 40 ms process latency.
>> 
>> So it seems trident has all the records internally, but I need these events 
>> as close to realtime as possible.
>> 
>> Does anyone have any guidance as to how to increase the throughput?  Is it
>> simply a matter of tweaking max spout pending and the batch size?
>> 
>> I'm running it on 2 m1-smalls for now.  I don't see the need to upgrade
>> until the demand on the boxes seems higher.  That said, CPU usage on the
>> nimbus box is pinned at around 99%.  Why would that be?  It's at 99% even
>> when all the topologies are killed.
>> 
>> We are currently targeting 200 million records per day, which seems like
>> it should be quite easy based on what I've read that other people have
>> achieved.  I realize that better hardware should be able to boost this as
>> well, but my first goal is to get trident to push the records to the db
>> quicker.
>> 
>> Thanks in advance,
>> Sean
> 
> 
> 
> -- 
> 
> Ce n'est pas une signature
