Sample streams are just that: samples. You should be comfortable with the
occasional small gap in your data. You must consume only one sample stream
per app. If you have a hardware failure, you can fail over to another
client box, but don't consume the stream from two clients at once.
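
For illustration only, here is a rough sketch of that failover pattern: a
standby box that stays disconnected while the primary looks healthy and opens
the stream only once the primary's heartbeat goes stale. Everything named
here is an assumption made for the example and not part of any Twitter API:
the 30-second timeout, the read_heartbeat callable (say, a timestamp the
primary writes to shared storage), and the consume_stream callable that opens
the single statuses/sample connection.

```python
import time

HEARTBEAT_TIMEOUT = 30.0  # seconds; assumed value, tune for your setup


def primary_is_alive(last_heartbeat, now, timeout=HEARTBEAT_TIMEOUT):
    """Return True if the primary reported in within `timeout` seconds."""
    return (now - last_heartbeat) <= timeout


def run_standby(read_heartbeat, consume_stream, poll_interval=5.0,
                clock=time.time, sleep=time.sleep):
    """Stay idle until the primary's heartbeat goes stale, then consume.

    read_heartbeat: hypothetical callable returning the primary's last
        heartbeat timestamp (for example, read from shared storage).
    consume_stream: hypothetical callable that opens the one sample
        stream connection and blocks while reading it.
    """
    while primary_is_alive(read_heartbeat(), clock()):
        sleep(poll_interval)  # primary looks healthy: do NOT connect
    consume_stream()  # primary looks dead: this box takes over the stream
```

A stale heartbeat does not prove the primary has disconnected, so the primary
should also drop its connection whenever it cannot refresh its own heartbeat.
Otherwise both boxes could end up on the stream at once.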
Infrastructure, Twitter Inc.
On Tue, Jan 19, 2010 at 11:40 AM, Santiago Perez <san...@santip.com.ar> wrote:
> I'm currently using the statuses/sample streaming API to store the
> sample tweets for later processing by different applications that mine
> the data. It is crucial for my applications to avoid data losses as
> much as possible. Since the API consumer and the applications all run
> in the cloud, a simple solution to prevent data losses on server
> failures would be to have two servers redundantly consuming the API
> and performing de-duplication at a later stage. Is this usage pattern
> (duplicate consumption of the sample stream) considered abusive? Do I
> risk being banned for having two clients consuming the same stream?