If your external service calls take longer than 30 seconds, they will almost certainly cause tuples to fail. I don't think you would lose these tuples, but they would be replayed, which may cause problems depending on your application.
IMO, you should tune "topology.message.timeout.secs" (it defaults to 30
seconds) and consider breaking parts of this out into a separate topology,
using an external queue such as RabbitMQ between your topologies. Kafka
would also work, but may be heavier weight than you need.

From conf/defaults.yaml:

# maximum amount of time a message has to complete before it's considered failed
topology.message.timeout.secs: 30

I hope this helps.

On Mon, Jan 13, 2014 at 11:40 AM, Pete Carlson <[email protected]> wrote:

> Hi,
>
> We want to add a bolt to our topology that will consume tuples from an
> upstream bolt and then call a service outside our topology to do some
> external processing of that tuple. Our concern is that the latency of that
> call will cause us to lose tuples if they weren't queued up.
>
> From reading this article
>
> http://www.michael-noll.com/blog/2013/06/21/understanding-storm-internal-message-buffers/
>
> it sounds like we can specify the queue depth for input tuples to a bolt.
>
> However this solution on Stack Overflow
>
> http://stackoverflow.com/questions/19510497/display-results-from-bolts-of-a-storm-cluster-on-browser/19512373#19512373
>
> specifies we should consider putting a queue like ActiveMQ or Kafka
> between our Storm bolts.
>
> Is tuple queuing something we need to be concerned with? If so, which
> solution is more scalable?
>
> If someone has done this, can you point me to an example?
>
> Regards,
>
> Pete
>
> --
> Pete Carlson
> Software Developer
> Tetra Concepts LLC
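P.S. For reference, the override itself is a one-line config change. A
minimal sketch, either in storm.yaml (cluster-wide) or in the conf map you
submit with the topology; the 120 here is an illustrative value, not a
recommendation:

```yaml
# Sketch: give tuples up to 120 seconds (instead of the 30-second default)
# to be fully acked before the spout considers them failed and replays them.
topology.message.timeout.secs: 120
```

If you're configuring the topology from Java, the same setting is available
as a setter on Config (Config#setMessageTimeoutSecs) before you call
StormSubmitter.submitTopology.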
