Has anyone ever faced similar or related issue ?
2014-06-03 19:33 GMT+09:00 Romain Leroux <leroux....@gmail.com>: > I have a simple trident transactional topology that does something like > the following: > > kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6) > --> > aggregate with reducerAggregator (paraHint=20) --> > transactional state (I tried MemoryMapState, MemcachedMapState and > CassandraMapState) --> > new Stream --> > print new values > > I tried to tune the topology by firstly setting maxSpoutPending=1 and > batchEmitIntervals to a large value (1 sec), and then iteratively improve > those values. > I ended up with maxSpoutPending=20 batchEmitInterval=150ms > > However I observed 2 things > > 1/ Delay in the topology keeps increasing > Even with those "fine-tune" values, or smaller values, it seems that some > transactions fail and that trident replay them (transactional state). > However this replaying process seems to delay the processing of new > incoming data, and storm seems to never catch up after replaying. > The result is that after a few minutes processing is clearly not "real > time" anymore (the aggregate printed in the logs are those from a few min > before, and it increases); even though I don't meet a particular bottleneck > for the calculation (bolt capacity and latency are ok). > Is this behavior normal ? Does it come from KafkaTransactionalSpout ? From > trident transactional mechanism ? > > 2/ There is an unavoidable bottleneck on $spoutcoord-spout0 > Because small failures keeps accumulating, tridents replay more and more > transactions. > "spout0" performances are impacted (more work), but this can be scaled > with more kafka partitions. > However $spoutcoord-spout0 is always a unique thread in trident, whatever > spout we provide, and I clearly observed that $spoutcoord-spout0 goes above > 1 after some minutes (and latency is above 10 sec or something). > Is there a way to improve this ? Or is this an unavoidable consequence of > trident's transactional logic that can't be addressed ? > >