Re: [twitter-dev] De-duplicating Site Streams

2010-11-01 Thread Marc Mims
* Mark McBride [101101 12:26]:
> Isn't this a matter of just changing the keys? status_id becomes
> user_id":"status_id?

Yes. Probably needs to be user_id/type/status_id to accommodate the case where a user favorites a status she was mentioned in. We'd get that one, twice---once for the mentio
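
A minimal sketch of the user_id/type/status_id key suggested in this message. The function names and the event type labels ("mention", "favorite") are hypothetical, chosen only for illustration; they are not taken from the Site Streams payload format.

```python
def dedup_key(user_id, event_type, status_id):
    """Build a composite key so the same status can be kept once per
    (user, event type) pair instead of once globally."""
    return f"{user_id}/{event_type}/{status_id}"


def first_time(seen, user_id, event_type, status_id):
    """Return True the first time this (user, type, status) triple is seen.

    `seen` is a plain set owned by the caller (an assumption of this sketch).
    """
    key = dedup_key(user_id, event_type, status_id)
    if key in seen:
        return False
    seen.add(key)
    return True
```

With this key, a status that arrives once as a mention and once as a favorite for the same user yields two distinct keys, so both deliveries are kept, while a repeat of either one is discarded.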

Re: [twitter-dev] De-duplicating Site Streams

2010-11-01 Thread Mark McBride
Isn't this a matter of just changing the keys? status_id becomes user_id":"status_id?

---Mark
http://twitter.com/mccv

On Mon, Nov 1, 2010 at 12:18 PM, Marc Mims wrote:
> * John Kalucki [101031 20:30]:
> > Create two in-memory hash sets of seen ids. Write ids to both. If the id is
> > f

Re: [twitter-dev] De-duplicating Site Streams

2010-11-01 Thread Marc Mims
* John Kalucki [101031 20:30]:
> Create two in-memory hash sets of seen ids. Write ids to both. If the id is
> found on write, discard. Alternatively expire them every few tens of
> minutes to bound growth, but provide continuous coverage.

That's what I'm doing now for the Streaming API and it w

Re: [twitter-dev] De-duplicating Site Streams

2010-10-31 Thread John Kalucki
Create two in-memory hash sets of seen ids. Write ids to both. If the id is found on write, discard. Alternatively expire them every few tens of minutes to bound growth, but provide continuous coverage.

-John

On Tue, Oct 26, 2010 at 8:55 PM, Marc Mims wrote:
> De-duplicating statuses in the
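
One possible reading of the two-set suggestion, sketched in Python. The class name, the 20-minute default period, and the injectable clock are assumptions made for this example. The two sets are cleared on a staggered schedule, so memory stays bounded and at least one set always holds recent ids.

```python
import time


class RotatingDedup:
    """Remember recently seen keys in two sets cleared at staggered times."""

    def __init__(self, period_seconds=1200.0, clock=time.monotonic):
        self._period = period_seconds
        self._clock = clock
        self._sets = [set(), set()]
        now = clock()
        # Stagger the schedule: set 0 is first cleared after half a period,
        # set 1 after a full period; each then repeats every full period.
        self._next_clear = [now + period_seconds / 2, now + period_seconds]

    def _expire(self):
        now = self._clock()
        for i in (0, 1):
            if now >= self._next_clear[i]:
                self._sets[i].clear()
                # Skip any clear times missed while the stream was idle.
                while self._next_clear[i] <= now:
                    self._next_clear[i] += self._period

    def is_duplicate(self, key):
        """Record key in both sets; return True if either already held it."""
        self._expire()
        duplicate = key in self._sets[0] or key in self._sets[1]
        self._sets[0].add(key)
        self._sets[1].add(key)
        return duplicate
```

While messages keep arriving, a key is remembered for between half a period and one full period after it was last seen. The key can be a bare status id or any composite string.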

Re: [twitter-dev] De-duplicating Site Streams

2010-10-27 Thread Scott Wilcox
Hi Marc,

I'd throw the hat in for MongoDB, it's retardedly fast and I now adore it. Pop me a message on Twitter if you'd like to discuss it more.

Scott.

On 27 Oct 2010, at 19:05, M. Edward (Ed) Borasky wrote:
> Quoting Marc Mims:
>
>> De-duplicating statuses in the Streaming API is fairly str

Re: [twitter-dev] De-duplicating Site Streams

2010-10-27 Thread M. Edward (Ed) Borasky
Quoting Marc Mims:

> De-duplicating statuses in the Streaming API is fairly straightforward. But with Site Streams, where a single status might be received multiple times for multiple mentioned users, and/or as favorites, it is a bit more difficult. I'm wondering if anyone can offer advice on an

[twitter-dev] De-duplicating Site Streams

2010-10-26 Thread Marc Mims
De-duplicating statuses in the Streaming API is fairly straightforward. But with Site Streams, where a single status might be received multiple times for multiple mentioned users, and/or as favorites, it is a bit more difficult. I'm wondering if anyone can offer advice on an efficient method for d