Have you tried increasing the tuple timeout? The default of 30 seconds may not suit you.
On Thursday, June 5, 2014, Romain Leroux <[email protected]> wrote: > Has anyone ever faced similar or related issue ? > > > 2014-06-03 19:33 GMT+09:00 Romain Leroux <[email protected] > <javascript:_e(%7B%7D,'cvml','[email protected]');>>: > >> I have a simple trident transactional topology that does something like >> the following: >> >> kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6) >> --> >> aggregate with reducerAggregator (paraHint=20) --> >> transactional state (I tried MemoryMapState, MemcachedMapState and >> CassandraMapState) --> >> new Stream --> >> print new values >> >> I tried to tune the topology by firstly setting maxSpoutPending=1 and >> batchEmitIntervals to a large value (1 sec), and then iteratively improve >> those values. >> I ended up with maxSpoutPending=20 batchEmitInterval=150ms >> >> However I observed 2 things >> >> 1/ Delay in the topology keeps increasing >> Even with those "fine-tune" values, or smaller values, it seems that some >> transactions fail and that trident replay them (transactional state). >> However this replaying process seems to delay the processing of new >> incoming data, and storm seems to never catch up after replaying. >> The result is that after a few minutes processing is clearly not "real >> time" anymore (the aggregate printed in the logs are those from a few min >> before, and it increases); even though I don't meet a particular bottleneck >> for the calculation (bolt capacity and latency are ok). >> Is this behavior normal ? Does it come from KafkaTransactionalSpout ? >> From trident transactional mechanism ? >> >> 2/ There is an unavoidable bottleneck on $spoutcoord-spout0 >> Because small failures keeps accumulating, tridents replay more and more >> transactions. >> "spout0" performances are impacted (more work), but this can be scaled >> with more kafka partitions. >> However $spoutcoord-spout0 is always a unique thread in trident, whatever >> spout we provide, and I clearly observed that $spoutcoord-spout0 goes above >> 1 after some minutes (and latency is above 10 sec or something). >> Is there a way to improve this ? Or is this an unavoidable consequence of >> trident's transactional logic that can't be addressed ? >> >> > -- Danijel Schiavuzzi E: [email protected] W: www.schiavuzzi.com T: +385989035562 Skype: danijels7
