Regarding redelivery: it depends on the granularity of what you are
comfortable replaying.

If the whole process can easily be replayed, you can allow the HTTP
timeout/failure to fail the tuple, and track that tuple in your spout for
replaying later.
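
A minimal sketch of that spout-side tracking, assuming the Storm 0.9.x
(backtype.storm) API; the class name and the in-memory queue feeding it are
made up for illustration:

    import java.util.Map;
    import java.util.UUID;
    import java.util.concurrent.ConcurrentHashMap;
    import java.util.concurrent.ConcurrentLinkedQueue;

    import backtype.storm.spout.SpoutOutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichSpout;
    import backtype.storm.tuple.Fields;
    import backtype.storm.tuple.Values;

    public class ReplayingSpout extends BaseRichSpout {
        private transient SpoutOutputCollector collector;
        // events waiting to be emitted (fed elsewhere, e.g. by a poller thread)
        private final ConcurrentLinkedQueue<String> queue =
                new ConcurrentLinkedQueue<String>();
        // events emitted but not yet acked, keyed by message id
        private final ConcurrentHashMap<String, String> pending =
                new ConcurrentHashMap<String, String>();

        @Override
        public void open(Map conf, TopologyContext context,
                         SpoutOutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void nextTuple() {
            String event = queue.poll();
            if (event != null) {
                String msgId = UUID.randomUUID().toString();
                pending.put(msgId, event);
                // anchored emit: Storm will call ack() or fail() with this id
                collector.emit(new Values(event), msgId);
            }
        }

        @Override
        public void ack(Object msgId) {
            pending.remove(msgId); // fully processed downstream; forget it
        }

        @Override
        public void fail(Object msgId) {
            String event = pending.remove(msgId);
            if (event != null) {
                queue.offer(event); // re-queue so nextTuple() replays it
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            declarer.declare(new Fields("event"));
        }
    }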

If you don't want the whole process to replay, you really need to split out
the HTTP event-sink handling into a separate spout+stream, e.g. buffered by a
message broker (or a separate topology entirely). This is what we do, since we
don't want a tuple failure during an HTTP update at the end to cause a replay
through the whole process. So we have two spouts and two streams: the first
spout does the internal processing and a few critical updates, then adds a
message to a Kafka topic. The second spout consumes the Kafka topic just for
sending the HTTP requests, where failures will replay the tuples (see the
sketch below). We still need to do some tuning here, for example building in
some delay before replaying.
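
For the second topology, the HTTP bolt can simply ack on a 2xx response and
fail the tuple otherwise, so the Kafka spout replays it. A rough sketch,
again assuming the 0.9.x API; the sink URL and the "payload" field name are
made up:

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;
    import java.nio.charset.StandardCharsets;
    import java.util.Map;

    import backtype.storm.task.OutputCollector;
    import backtype.storm.task.TopologyContext;
    import backtype.storm.topology.OutputFieldsDeclarer;
    import backtype.storm.topology.base.BaseRichBolt;
    import backtype.storm.tuple.Tuple;

    public class HttpSinkBolt extends BaseRichBolt {
        private static final String SINK_URL =
                "http://sink.example.com/events"; // hypothetical sink
        private transient OutputCollector collector;

        @Override
        public void prepare(Map conf, TopologyContext context,
                            OutputCollector collector) {
            this.collector = collector;
        }

        @Override
        public void execute(Tuple tuple) {
            try {
                HttpURLConnection conn =
                        (HttpURLConnection) new URL(SINK_URL).openConnection();
                conn.setRequestMethod("POST");
                conn.setConnectTimeout(5000);
                conn.setReadTimeout(10000);
                conn.setDoOutput(true);
                try (OutputStream out = conn.getOutputStream()) {
                    out.write(tuple.getStringByField("payload")
                                   .getBytes(StandardCharsets.UTF_8));
                }
                int status = conn.getResponseCode();
                if (status >= 200 && status < 300) {
                    collector.ack(tuple);  // delivered; spout can mark it done
                } else {
                    collector.fail(tuple); // non-2xx: let the spout replay it
                }
            } catch (IOException e) {
                collector.fail(tuple);     // timeout/connection error: replay
            }
        }

        @Override
        public void declareOutputFields(OutputFieldsDeclarer declarer) {
            // terminal bolt: declares no output stream
        }
    }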

Tyson

On Sep 26, 2014, at 8:20 AM, Supun Kamburugamuva <supu...@gmail.com> wrote:

If we don't care about how many times the message is delivered (at-least-once
semantics), then we can use some error handling in HTTP to achieve the
guarantee. You can use request/response in HTTP and retry the delivery until
you get an HTTP 200/202. To get exactly-once guarantees we may need to go
through a more complicated protocol.
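
In plain Java that retry loop might look like the sketch below (the 200/202
check is from the paragraph above; the class name and backoff numbers are
arbitrary):

    import java.io.IOException;
    import java.io.OutputStream;
    import java.net.HttpURLConnection;
    import java.net.URL;

    public class AtLeastOnceSender {

        // POSTs the body until the sink answers 200/202, with exponential backoff.
        public static void deliver(String url, byte[] body, int maxAttempts)
                throws IOException, InterruptedException {
            long backoffMs = 500;
            for (int attempt = 1; attempt <= maxAttempts; attempt++) {
                try {
                    HttpURLConnection conn =
                            (HttpURLConnection) new URL(url).openConnection();
                    conn.setRequestMethod("POST");
                    conn.setDoOutput(true);
                    try (OutputStream out = conn.getOutputStream()) {
                        out.write(body);
                    }
                    int status = conn.getResponseCode();
                    if (status == 200 || status == 202) {
                        return; // delivered (at least once)
                    }
                } catch (IOException e) {
                    // treat timeouts/connection errors as a failed attempt; retry
                }
                Thread.sleep(backoffMs);
                backoffMs *= 2;
            }
            throw new IOException("delivery failed after " + maxAttempts + " attempts");
        }
    }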

On Fri, Sep 26, 2014 at 11:10 AM, Peter Neumark <peter.neum...@prezi.com> wrote:
Thanks for the quick response!
Unfortunately, we're forced to use HTTP.
Any ideas?

On Fri, Sep 26, 2014 at 5:07 PM, Supun Kamburugamuva <supu...@gmail.com> wrote:
On Fri, Sep 26, 2014 at 10:49 AM, Peter Neumark <peter.neum...@prezi.com> wrote:
Hi all,

We want to replace a legacy custom app with Storm, but, being Storm newbies,
we're not sure of the best way to solve the following problem:

An HTTP endpoint returns the list of events which occurred between two
timestamps. The task is to continuously poll this event source for new
events, optionally perform some transformation and aggregation operations on
them, and finally deliver batches of events to a sink endpoint via HTTP
requests.

We thought of a simple topology (a rough wiring sketch follows the list):
1. A clock-spout determines which time interval to process.
2. A bolt takes the time interval as input, fetches the event list for that
interval from the event source, and emits the events as individual tuples.
3. After some processing of the tuples, we aggregate them into fixed-size
groups, which we send in HTTP requests to an event sink.
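
A rough wiring sketch of that three-step topology, assuming the Storm 0.9.x
API; ClockSpout, FetchEventsBolt, and AggregateAndPostBolt are hypothetical
class names standing in for steps 1-3:

    import backtype.storm.Config;
    import backtype.storm.LocalCluster;
    import backtype.storm.topology.TopologyBuilder;

    public class EventForwardingTopology {
        public static void main(String[] args) {
            TopologyBuilder builder = new TopologyBuilder();
            builder.setSpout("clock", new ClockSpout());           // 1. emits time intervals
            builder.setBolt("fetch", new FetchEventsBolt(), 4)     // 2. polls source, one tuple per event
                   .shuffleGrouping("clock");
            builder.setBolt("sink", new AggregateAndPostBolt(), 2) // 3. batches events, POSTs to sink
                   .shuffleGrouping("fetch");

            new LocalCluster().submitTopology("event-forwarder",
                    new Config(), builder.createTopology());
        }
    }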

The big question is how to make sure that all events are successfully
delivered to the event sink. I know Storm guarantees the delivery of tuples
within the topology, but how can I guarantee that the HTTP requests to the
event sink are also successful (and retried if necessary)?

I think this is not really a question about Storm, but rather about how to
deliver a message reliably to some sink. From my experience it is a bit hard
to achieve something like this with HTTP. This functionality is built into
message brokers like RabbitMQ, ActiveMQ, Kafka, etc., and if you use a broker
to send your events to the sink you can get a delivery guarantee.
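
For example, handing each event to Kafka instead of POSTing it directly gives
you a broker acknowledgement to retry against. A minimal sketch using the
newer Kafka Java client (an assumption beyond this thread; the broker address
and topic name are made up):

    import java.util.Properties;
    import org.apache.kafka.clients.producer.KafkaProducer;
    import org.apache.kafka.clients.producer.ProducerRecord;

    public class EventPublisher {
        public static void main(String[] args) throws Exception {
            Properties props = new Properties();
            props.put("bootstrap.servers", "broker1:9092"); // hypothetical broker
            props.put("acks", "all"); // delivered only once the broker commits it
            props.put("key.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");
            props.put("value.serializer",
                    "org.apache.kafka.common.serialization.StringSerializer");

            try (KafkaProducer<String, String> producer =
                         new KafkaProducer<String, String>(props)) {
                // get() blocks until the broker acks, surfacing failures
                // so the caller can retry
                producer.send(new ProducerRecord<String, String>(
                        "events", "{\"example\":true}")).get();
            }
        }
    }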

Thanks,
Supun..


All help, suggestions and pointers welcome!
Peter

--

Peter Neumark
DevOps guy @Prezi (http://prezi.com/)



--
Supun Kamburugamuva
Member, Apache Software Foundation; http://www.apache.org
E-mail: supu...@gmail.com; Mobile: +1 812 369 6762
Blog: http://supunk.blogspot.com



