Quoting Marc Mims <marc.m...@gmail.com>:

De-duplicating statuses in the Streaming API is fairly straightforward.
But with Site Streams, where a single status might be received multiple
times for multiple mentioned users, and/or as favorites, it is a bit
more difficult.

I'm wondering if anyone can offer advice on an efficient method for
de-duplicating Site Streams.

        -Marc

If you're talking about building something "massively scalable" for some value of "massive", you're getting into the realm of "NoSQL" databases. I *think* Cassandra has a Perl interface but I haven't looked at it recently. I'm by no means an expert on NoSQL databases - I just picked Cassandra because Twitter uses it for some things.
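
Before reaching for a database, a bounded in-memory table keyed on status id may already go a long way. Here is a rough sketch in Python (not Perl). It assumes each Site Streams message arrives as an envelope with "for_user" and "message" keys, and that a favorite event carries the status under "target_object". The class name, method names, and the max_entries limit are made up for illustration, so treat it as a starting point rather than a tested design.

```python
from collections import OrderedDict


class SiteStreamDeduper:
    """Bounded in-memory record of status ids that have already been seen."""

    def __init__(self, max_entries=100000):
        self.max_entries = max_entries
        # status id -> set of user ids the status was delivered for,
        # ordered from least to most recently touched
        self._seen = OrderedDict()

    def offer(self, envelope):
        """Record one delivery; return True only the first time a status appears."""
        message = envelope["message"]
        # Assumption: favorite events wrap the status in "target_object".
        if message.get("event") == "favorite":
            status = message["target_object"]
        else:
            status = message
        status_id = status["id"]

        first_time = status_id not in self._seen
        self._seen.setdefault(status_id, set()).add(envelope["for_user"])
        self._seen.move_to_end(status_id)

        # Evict the least recently touched status once the table is full.
        if len(self._seen) > self.max_entries:
            self._seen.popitem(last=False)
        return first_time

    def recipients(self, status_id):
        """Return the user ids a status has been delivered for so far."""
        return self._seen.get(status_id, set())
```

The idea is to store or process the status body only when offer() returns True, and otherwise just note the extra recipient. Evicted ids will be treated as new if they show up again, so max_entries needs to cover the window in which duplicates actually arrive. Once that window no longer fits in one process, the same id-to-recipients mapping is what you would move into an external key-value store.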

--
M. Edward (Ed) Borasky
http://borasky-research.net http://twitter.com/znmeb

"A mathematician is a device for turning coffee into theorems." - Paul Erdos

--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk



