Have you tried increasing the tuple timeout (topology.message.timeout.secs)?
The default of 30 seconds may be too short for your topology.
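
For reference, a rough sketch of the settings involved. The three keys are
the standard Storm configuration keys; the 90-second timeout is only an
illustrative guess, not a recommendation, and the class and method names are
made up for this example. In a real topology you would normally put these
values on Storm's own Config object (which is itself a HashMap) rather than
on a plain map.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper: collects the tuning knobs discussed in this thread
// into a plain map, keyed by the Storm configuration key names.
final class TupleTimeoutSketch {
    static final String MESSAGE_TIMEOUT_SECS = "topology.message.timeout.secs";
    static final String MAX_SPOUT_PENDING = "topology.max.spout.pending";
    static final String BATCH_EMIT_INTERVAL_MS =
            "topology.trident.batch.emit.interval.millis";

    static Map<String, Object> buildConf(int timeoutSecs,
                                         int maxSpoutPending,
                                         int batchEmitIntervalMs) {
        Map<String, Object> conf = new HashMap<>();
        // Time allowed for a tuple (batch) to be fully processed before it
        // is considered failed and replayed; Storm's default is 30.
        conf.put(MESSAGE_TIMEOUT_SECS, timeoutSecs);
        // Values quoted in the original message below.
        conf.put(MAX_SPOUT_PENDING, maxSpoutPending);
        conf.put(BATCH_EMIT_INTERVAL_MS, batchEmitIntervalMs);
        return conf;
    }

    public static void main(String[] args) {
        // 90 is an assumed example value; 20 and 150 come from the thread.
        Map<String, Object> conf = buildConf(90, 20, 150);
        System.out.println(conf);
    }
}
```

If the replays stop once the timeout is raised, the failures were most
likely timeouts rather than errors in the state updates.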

On Thursday, June 5, 2014, Romain Leroux <[email protected]> wrote:

> Has anyone ever faced a similar or related issue?
>
>
> 2014-06-03 19:33 GMT+09:00 Romain Leroux <[email protected]>:
>
>> I have a simple Trident transactional topology that does something like
>> the following:
>>
>> Kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6)
>> -->
>> aggregate with reducerAggregator (paraHint=20) -->
>> transactional state (I tried MemoryMapState, MemcachedMapState and
>> CassandraMapState) -->
>> new Stream -->
>> print new values
>>
>> I tried to tune the topology by first setting maxSpoutPending=1 and
>> batchEmitInterval to a large value (1 sec), and then iteratively improving
>> those values.
>> I ended up with maxSpoutPending=20 and batchEmitInterval=150ms.
>>
>> However, I observed two things:
>>
>> 1/ Delay in the topology keeps increasing
>> Even with those "fine-tuned" values, or smaller ones, it seems that some
>> transactions fail and that Trident replays them (transactional state).
>> However, this replaying seems to delay the processing of new incoming
>> data, and Storm never seems to catch up after replaying.
>> The result is that after a few minutes processing is clearly not "real
>> time" anymore (the aggregates printed in the logs are those from a few
>> minutes before, and the lag keeps growing), even though I don't see any
>> particular bottleneck in the calculation (bolt capacity and latency are ok).
>> Is this behavior normal? Does it come from KafkaTransactionalSpout?
>> From Trident's transactional mechanism?
>>
>> 2/ There is an unavoidable bottleneck on $spoutcoord-spout0
>> Because small failures keep accumulating, Trident replays more and more
>> transactions.
>> The performance of "spout0" is impacted (more work), but this can be scaled
>> with more Kafka partitions.
>> However, $spoutcoord-spout0 is always a single thread in Trident, whatever
>> spout we provide, and I clearly observed that the capacity of
>> $spoutcoord-spout0 goes above 1 after some minutes (and latency is above
>> 10 sec or something).
>> Is there a way to improve this? Or is this an unavoidable consequence of
>> Trident's transactional logic that can't be addressed?
>>
>>
>

-- 
Danijel Schiavuzzi

E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
