Has anyone ever faced similar or related issue ?

2014-06-03 19:33 GMT+09:00 Romain Leroux <leroux....@gmail.com>:

> I have a simple trident transactional topology that does something like
> the following:
>
> kafka transactional spout (~3000 rec/sec, 6 partitions thus paraHint=6)
> -->
> aggregate with reducerAggregator (paraHint=20) -->
> transactional state (I tried MemoryMapState, MemcachedMapState and
> CassandraMapState) -->
> new Stream -->
> print new values
>
> I tried to tune the topology by firstly setting maxSpoutPending=1 and
> batchEmitIntervals to a large value (1 sec), and then iteratively improve
> those values.
> I ended up with maxSpoutPending=20 batchEmitInterval=150ms
>
> However I observed 2 things
>
> 1/ Delay in the topology keeps increasing
> Even with those "fine-tune" values, or smaller values, it seems that some
> transactions fail and that trident replay them (transactional state).
> However this replaying process seems to delay the processing of new
> incoming data, and storm seems to never catch up after replaying.
> The result is that after a few minutes processing is clearly not "real
> time" anymore (the aggregate printed in the logs are those from a few min
> before, and it increases); even though I don't meet a particular bottleneck
> for the calculation (bolt capacity and latency are ok).
> Is this behavior normal ? Does it come from KafkaTransactionalSpout ? From
> trident transactional mechanism ?
>
> 2/ There is an unavoidable bottleneck on $spoutcoord-spout0
> Because small failures keeps accumulating, tridents replay more and more
> transactions.
> "spout0" performances are impacted (more work), but this can be scaled
> with more kafka partitions.
> However $spoutcoord-spout0 is always a unique thread in trident, whatever
> spout we provide, and I clearly observed that $spoutcoord-spout0 goes above
> 1 after some minutes (and latency is above 10 sec or something).
> Is there a way to improve this ? Or is this an unavoidable consequence of
> trident's transactional logic that can't be addressed ?
>
>

Reply via email to