Related to redelivery, it depends on the granularity of what you are comfortable replaying.
If the whole process can easily be replayed, you can allow the HTTP timeout/failure to fail the tuple, and track that tuple in your spout for replaying later. If you don't want the whole process to replay, you really need to split out the HTTP event sink handling into a separate spout+stream, e.g. buffered by a message broker (or a separate topology completely).

This is what we do, since we don't want a tuple failure during an HTTP update at the end to cause a replay through the whole process. So we have two spouts and two streams: the first spout does the internal processing and a few critical updates, then adds a message to a Kafka topic. The second spout consumes the Kafka topic just for sending the HTTP requests, where failures will replay the tuples. We still need to do some tuning of this, for example to build in some delay in the replay process.

Tyson

On Sep 26, 2014, at 8:20 AM, Supun Kamburugamuva <supu...@gmail.com> wrote:

If we don't care about how many times the message is delivered (at-least-once), then we can use some error handling in HTTP to achieve a guarantee. You can use a request/response in HTTP, and until you get an HTTP 200/202 you can retry the delivery. To get exactly-once guarantees we may need to go through some more complicated protocol.

On Fri, Sep 26, 2014 at 11:10 AM, Peter Neumark <peter.neum...@prezi.com> wrote:

Thanks for the quick response! Unfortunately, we're forced to use HTTP. Any ideas?

On Fri, Sep 26, 2014 at 5:07 PM, Supun Kamburugamuva <supu...@gmail.com> wrote:

On Fri, Sep 26, 2014 at 10:49 AM, Peter Neumark <peter.neum...@prezi.com> wrote:

Hi all,

We want to replace a legacy custom app with Storm, but (being Storm newbies) we're not sure what's the best way to solve the following problem:

An HTTP endpoint returns the list of events which occurred between two timestamps. The task is to continuously poll this event source for new events, optionally perform some transformation and aggregation operations on them, and finally make an HTTP request to an endpoint with some events. We thought of a simple topology:

1. A clock-spout determines which time interval to process.
2. A bolt takes the time interval as input, fetches the event list for that interval from the event source, and emits the events as individual tuples.
3. After some processing of the tuples, we aggregate them into fixed-size groups, which we send in HTTP requests to an event sink.

The big question is how to make sure that all events are successfully delivered to the event sink. I know Storm guarantees the delivery of tuples within the topology, but how could I guarantee that the HTTP requests to the event sink are also successful (and retried if necessary)?

I think this is not a question about Storm but rather a question about how to deliver a message reliably to some sink. From my experience it is a bit hard to achieve something like this with HTTP. This functionality is built into message brokers like RabbitMQ, ActiveMQ, Kafka, etc., and if you use a broker to send your events to the sink you can get a delivery guarantee.

Thanks,
Supun.

All help, suggestions and pointers welcome!
Peter
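The HTTP leg that Tyson and Supun describe boils down to: ack the tuple only when the sink returns a 2xx, fail it otherwise so the spout replays it. A minimal sketch against the Storm 0.9-era backtype.storm API, using plain HttpURLConnection; the sink URL is a placeholder, and the payload is assumed to be the tuple's first field:

```java
import java.io.IOException;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URL;
import java.util.Map;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class HttpSinkBolt extends BaseRichBolt {
    private static final String SINK_URL = "http://sink.example.com/events"; // placeholder endpoint
    private transient OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
    }

    @Override
    public void execute(Tuple tuple) {
        String payload = tuple.getString(0); // the batch is assumed to be the tuple's only field
        try {
            HttpURLConnection conn = (HttpURLConnection) new URL(SINK_URL).openConnection();
            conn.setConnectTimeout(5000);
            conn.setReadTimeout(10000);
            conn.setRequestMethod("POST");
            conn.setRequestProperty("Content-Type", "application/json");
            conn.setDoOutput(true);
            OutputStream out = conn.getOutputStream();
            out.write(payload.getBytes("UTF-8"));
            out.close();
            int status = conn.getResponseCode();
            if (status == 200 || status == 202) {
                collector.ack(tuple);   // sink accepted the batch
            } else {
                collector.fail(tuple);  // non-2xx: the spout will replay this tuple
            }
        } catch (IOException e) {
            collector.fail(tuple);      // timeout or connection error: replay later
        }
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt: nothing to emit downstream
    }
}
```

Note that a failed tuple is replayed as soon as the spout sees the fail, which is the immediate-retry behavior Tyson says still needs tuning; adding a delay or backoff before re-emitting is extra work in the spout.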
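The hand-off at the end of the first stream could look like the following sketch: publish each aggregated batch to a Kafka topic with a synchronous send, and only ack the tuple once the broker has acknowledged the write. The topic name, broker address, and tuple layout are assumptions, and the producer settings are illustrative rather than Tyson's actual configuration:

```java
import java.util.Map;
import java.util.Properties;

import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;

import backtype.storm.task.OutputCollector;
import backtype.storm.task.TopologyContext;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseRichBolt;
import backtype.storm.tuple.Tuple;

public class KafkaHandoffBolt extends BaseRichBolt {
    private static final String TOPIC = "event-sink-requests"; // assumed topic name
    private transient Producer<String, String> producer;
    private transient OutputCollector collector;

    @Override
    public void prepare(Map stormConf, TopologyContext context, OutputCollector collector) {
        this.collector = collector;
        Properties props = new Properties();
        props.put("bootstrap.servers", "kafka1:9092"); // assumed broker address
        props.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        props.put("acks", "all"); // wait until the broker has fully accepted the write
        producer = new KafkaProducer<String, String>(props);
    }

    @Override
    public void execute(Tuple tuple) {
        try {
            // Synchronous send: block until Kafka acknowledges, so the tuple is
            // only acked once the batch is safely buffered for the second topology.
            producer.send(new ProducerRecord<String, String>(TOPIC, tuple.getString(0))).get();
            collector.ack(tuple);
        } catch (Exception e) {
            collector.fail(tuple); // publish failed: replay through the first stream
        }
    }

    @Override
    public void cleanup() {
        producer.close();
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        // terminal bolt of the first topology: nothing to emit
    }
}
```

With this split, a failure while talking to the sink only replays a small Kafka-backed tuple in the second topology, never the polling and aggregation work of the first.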
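Wiring the second topology with the storm-kafka spout might then look like the sketch below; the ZooKeeper address, topic, and ids are placeholders, and Config.setMessageTimeoutSecs bounds how long a hung HTTP call can hold a tuple before Storm fails and replays it:

```java
import backtype.storm.Config;
import backtype.storm.StormSubmitter;
import backtype.storm.spout.SchemeAsMultiScheme;
import backtype.storm.topology.TopologyBuilder;
import storm.kafka.KafkaSpout;
import storm.kafka.SpoutConfig;
import storm.kafka.StringScheme;
import storm.kafka.ZkHosts;

public class HttpSinkTopology {
    public static void main(String[] args) throws Exception {
        // The Kafka spout reads the batches the hand-off bolt wrote and
        // replays any tuple the HttpSinkBolt fails.
        SpoutConfig spoutConfig = new SpoutConfig(
                new ZkHosts("zookeeper:2181"),  // assumed ZooKeeper connect string
                "event-sink-requests",          // topic written by KafkaHandoffBolt
                "/kafka-spout",                 // ZK root for offset storage
                "http-sink");                   // consumer id
        spoutConfig.scheme = new SchemeAsMultiScheme(new StringScheme());

        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("kafka", new KafkaSpout(spoutConfig));
        builder.setBolt("http-sink", new HttpSinkBolt(), 4).shuffleGrouping("kafka");

        Config conf = new Config();
        conf.setMessageTimeoutSecs(60); // fail and replay tuples whose HTTP send hangs
        StormSubmitter.submitTopology("http-sink", conf, builder.createTopology());
    }
}
```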