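Create two in-memory hash sets of seen ids and write every id to both. If an id is already present on write, discard that status as a duplicate. Alternately clear the two sets every few tens of minutes, so that growth stays bounded while at least one set always provides continuous coverage of recent ids.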
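A rough sketch of that scheme in Python, in case it helps. The class name `RotatingDeduper`, the method `seen_before`, and the 20-minute default are all made up for illustration; none of them come from the Streaming API or any client library. It assumes each incoming status carries a unique id and that the two sets are cleared on a staggered schedule, one per interval.

```python
import time


class RotatingDeduper:
    """Hypothetical helper: remembers recently seen ids in two staggered sets."""

    def __init__(self, interval_seconds=1200.0, clock=time.monotonic):
        # interval_seconds is an assumed value ("a few tens of minutes").
        self._interval = interval_seconds
        self._clock = clock
        self._sets = [set(), set()]
        self._next_to_clear = 0
        self._next_clear_at = clock() + interval_seconds

    def _expire(self):
        # Clear the sets in turn, one per elapsed interval, so that the
        # other set still remembers everything from the last interval.
        now = self._clock()
        while now >= self._next_clear_at:
            self._sets[self._next_to_clear].clear()
            self._next_to_clear ^= 1
            self._next_clear_at += self._interval

    def seen_before(self, status_id):
        """Record status_id; return True if it was already remembered."""
        self._expire()
        duplicate = status_id in self._sets[0] or status_id in self._sets[1]
        # Write the id to both sets, as described above.
        self._sets[0].add(status_id)
        self._sets[1].add(status_id)
        return duplicate
```

With this arrangement an id is remembered for at least one interval and at most two, so memory is bounded by roughly two intervals' worth of ids. Usage would be along the lines of `if not deduper.seen_before(status_id): handle(status)`, where `handle` stands in for whatever processing you do.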
-John

On Tue, Oct 26, 2010 at 8:55 PM, Marc Mims <marc.m...@gmail.com> wrote:
> De-duplicating statuses in the Streaming API is fairly straightforward.
> But with Site Streams, where a single status might be received multiple
> times for multiple mentioned users, and/or as favorites, it is a bit
> more difficult.
>
> I'm wondering if anyone can offer advice on an efficient method for
> de-duplicating Site Streams.
>
> -Marc
>
> --
> Twitter developer documentation and resources: http://dev.twitter.com/doc
> API updates via Twitter: http://twitter.com/twitterapi
> Issues/Enhancements Tracker:
> http://code.google.com/p/twitter-api/issues/list
> Change your membership to this group:
> http://groups.google.com/group/twitter-development-talk