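Create two in-memory hash sets of seen ids and write every incoming id to
both. If the id is already present when you write it, discard the status as
a duplicate. Alternately expire (clear) the sets every few tens of minutes,
staggering the two schedules so that one set is always populated: this
bounds memory growth while still providing continuous coverage.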
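
A minimal Python sketch of the two-set idea follows. It assumes ids are
hashable and that "alternately expire" means clearing each set on its own
schedule, offset from the other by half a period. The class name
`RotatingDedup`, the `ttl` parameter and its 30-minute default are
hypothetical choices, not part of any Twitter API.

```python
import time


class RotatingDedup:
    """Duplicate filter built from two alternately expiring sets (hypothetical name).

    Every id is written to both sets. Each set is cleared every `ttl` seconds,
    with the two schedules offset by ttl/2, so (given a monotonic clock) an id
    stays in at least one set for at least ttl/2 seconds after it was last
    written, and is forgotten no later than ttl seconds after that write.
    """

    def __init__(self, ttl=30 * 60, clock=time.monotonic):
        # ttl defaults to 30 minutes, an assumed value for "a few tens of minutes".
        self.ttl = ttl
        self.clock = clock
        self.sets = [set(), set()]
        now = clock()
        # Stagger the first expiry of each set by half a period.
        self.deadlines = [now + ttl, now + ttl / 2]

    def _expire(self):
        now = self.clock()
        for i in range(2):
            # Advance deadlines in whole periods so the stagger never drifts,
            # even if no ids arrive for a while.
            while now >= self.deadlines[i]:
                self.sets[i].clear()
                self.deadlines[i] += self.ttl

    def is_duplicate(self, status_id):
        """Record status_id in both sets and return True if it was already seen."""
        self._expire()
        duplicate = any(status_id in s for s in self.sets)
        for s in self.sets:
            s.add(status_id)
        return duplicate
```

A consumer would call `is_duplicate()` on each status id and skip the status
when it returns True. Each set only holds ids written since its last clear,
so memory is bounded by the number of distinct ids seen in roughly one `ttl`
window.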

-John



On Tue, Oct 26, 2010 at 8:55 PM, Marc Mims <marc.m...@gmail.com> wrote:

> De-duplicating statuses in the Streaming API is fairly straightforward.
> But with Site Streams, where a single status might be received multiple
> times for multiple mentioned users, and/or as favorites, it is a bit
> more difficult.
>
> I'm wondering if anyone can offer advice on an efficient method for
> de-duplicating Site Streams.
>
>        -Marc
>
> --
> Twitter developer documentation and resources: http://dev.twitter.com/doc
> API updates via Twitter: http://twitter.com/twitterapi
> Issues/Enhancements Tracker:
> http://code.google.com/p/twitter-api/issues/list
> Change your membership to this group:
> http://groups.google.com/group/twitter-development-talk
>
