There's a command-line tool for measuring topology throughput. I haven't tried it yet, but it was recently merged into master (0.9.3). Alternatively, a throughput-logging Trident filter is simple to implement: count the tuples passing through and, every N seconds, divide the count by the time elapsed. Thinking about it, this would be nice to have in the Storm UI itself. It should be easy to calculate, since all the data seems to be available already via the new REST API.
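To make the counting idea concrete, here is a minimal sketch of just the rate calculation, written in plain Python rather than as an actual Trident filter. The class name `ThroughputMeter`, its parameters, and the injectable clock are all hypothetical and not part of Storm. In a real topology the same logic would presumably sit inside the filter's per-tuple callback and log the rate instead of returning it.

```python
import time


class ThroughputMeter:
    """Counts tuples and reports a rate once per reporting interval.

    Hypothetical helper for illustration; not a Storm or Trident API.
    """

    def __init__(self, interval_secs=10.0, clock=time.monotonic):
        self.interval_secs = interval_secs
        self.clock = clock  # injectable so the logic can be tested without sleeping
        self.count = 0
        self.window_start = clock()

    def mark(self, n=1):
        """Record n tuples; return tuples/sec if the interval has elapsed, else None."""
        self.count += n
        now = self.clock()
        elapsed = now - self.window_start
        if elapsed < self.interval_secs:
            return None
        rate = self.count / elapsed
        # Start a new window so each report covers only the latest interval.
        self.count = 0
        self.window_start = now
        return rate
```

Calling `mark()` once per tuple and logging any non-`None` result gives one throughput line roughly every `interval_secs` seconds. Note that the rate is only computed when a tuple arrives, so an idle stream produces no reports.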
@Carlos, just a minor correction: maxSpoutPending refers to the number of _batches_ in Trident, and the number of _tuples_ in plain Storm.

@Raphael, as Carlos suggested, divide 48,000 by the batch size. For example, the batch size formula for the Kafka Trident spout is, I believe, KafkaMaxFetchSize * NumKafkaPartitions / AverageKafkaMessageSize.

Also, there was some talk a while ago about making Storm auto-tune maxSpoutPending itself, based on measured latencies and the topology throughput. I don't know how far this feature has progressed.

On Monday, July 14, 2014, Raphael Hsieh <[email protected]> wrote:

> Is there a way to tell how many batches my topology processes per second?
> Or, for that matter, how many tuples are processed per second?
> Aside from creating a new bolt purely for that aggregation?
>
> On Mon, Jul 14, 2014 at 2:08 PM, Carlos Rodriguez <[email protected]> wrote:
>
>> Max spout pending config specifies how many *batches* can be processed
>> simultaneously by your topology.
>> That's why 48,000 seems absurdly high to you. Divide it by the batch
>> size and you'll get the max spout pending config that you were expecting.
>>
>> 2014-07-14 19:00 GMT+02:00 Raphael Hsieh <[email protected]>:
>>
>>> What is the optimal max spout pending to use in a topology?
>>> I found this thread here:
>>> http://mail-archives.apache.org/mod_mbox/storm-user/201402.mbox/%3cca+avhzatfg_s88lkombvommkh-rafwr6szy0i8b8tm3rfab...@mail.gmail.com%3E
>>> that didn't seem to have a follow-up.
>>>
>>> Part of it says to
>>>
>>> "Start with a max spout pending that is for sure too small -- one for
>>> trident, or the number of executors for storm -- and increase it until you
>>> stop seeing changes in the flow. You'll probably end up with something
>>> near 2*(throughput in recs/sec)*(end-to-end latency) (2x the Little's
>>> law capacity)."
>>>
>>> Does this make sense for a Max Spout Pending value?
>>> I expect my topology to have a throughput of around 80,000/s and I've
>>> been seeing a complete latency of around 300ms, so given this formula, I'd
>>> want 2*80000*.3 = 48,000 Max Spout Pending.
>>>
>>> This seems absurdly high to me.
>>>
>>> --
>>> Raphael Hsieh
>>
>> --
>> Carlos Rodríguez
>> Developer at ENEO Tecnología
>> http://redborder.net/
>> http://lnkd.in/bgfCVF9
>
> --
> Raphael Hsieh

--
Danijel Schiavuzzi
E: [email protected]
W: www.schiavuzzi.com
T: +385989035562
Skype: danijels7
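The arithmetic discussed in this thread can be sketched in a few lines of Python. The 80,000 tuples/s throughput and 300 ms latency come from Raphael's message. The Kafka fetch size, partition count, and average message size are made-up values chosen only to illustrate the batch size formula, which is itself stated above as a belief rather than a confirmed fact. All function names are hypothetical.

```python
import math


def pending_tuples(throughput_per_sec, latency_ms, headroom=2.0):
    """Little's law capacity (throughput * latency), scaled by a headroom factor."""
    return headroom * throughput_per_sec * latency_ms / 1000


def kafka_batch_size(max_fetch_bytes, num_partitions, avg_message_bytes):
    """Approximate tuples per Trident batch, per the formula quoted in the thread."""
    return max_fetch_bytes * num_partitions / avg_message_bytes


def pending_batches(tuples_in_flight, batch_size):
    """Convert an in-flight tuple count into a whole number of batches (at least 1)."""
    return max(1, math.ceil(tuples_in_flight / batch_size))


# Figures from the thread: 80,000 tuples/s at 300 ms complete latency.
tuples = pending_tuples(80_000, 300)        # 48000.0

# Assumed Kafka settings (not from the thread): 1 MiB fetch, 4 partitions, 1 KiB messages.
batch = kafka_batch_size(1_048_576, 4, 1024)  # 4096.0

print(pending_batches(tuples, batch))       # prints 12
```

Under these assumed Kafka settings, the 48,000 in-flight tuples correspond to a Trident max spout pending of about 12 batches, which is a starting point for tuning rather than a recommended value.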
