2009/5/31 Philip Plante <pplante....@gmail.com>:
>
> I would like to hear a response from Twitter on the sharing of this
> data.  My db has about 2 million active users, and I have another db
> with 6 million or so I would gladly share.
>
> Previously I think the response from Twitter is that they cannot
> provide this as a bulk translation due to the demand it would place on
> their servers.  They are able to provide the entire list of follower
> IDs simply because that lives in memory and requires no joins.  The
> joining to get this data would be too intensive for them.
>
> If this is allowed maybe the community could take this a step further
> and provide a common interface to share data like this.  Any thoughts?

Much as I respect Twitter and the great people who work there, I don't
buy that this would place too much demand on their servers. They
already use Memcached extensively, and this would be a pretty simple
addition to that data store.

Size-wise we're talking about no more than 50 bytes per user to store
a user ID to username mapping. Even at 100 million users that's at
most 5 GB of memory, which I'm sure is pretty small compared to their
overall Memcached footprint. As for load on the servers, each call
for up to 100 IDs would count as an API request, so it's unlikely this
method would add a huge amount to the existing usage.
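
As a rough check of that arithmetic, here is a minimal Python sketch.
The 50 bytes, 100 million users and 100 IDs per call are the figures
from the estimate above; the function names are made up for
illustration, and any per-item overhead the cache itself adds is
ignored, so treat the result as a floor rather than a measurement.

```python
import math

BYTES_PER_ENTRY = 50     # assumed upper bound per ID-to-username entry
USERS = 100_000_000      # the 100 million user figure from the estimate
IDS_PER_REQUEST = 100    # proposed maximum number of IDs per lookup call


def cache_size_gb(users: int, bytes_per_entry: int = BYTES_PER_ENTRY) -> float:
    """Raw payload size in decimal gigabytes, ignoring cache overhead."""
    return users * bytes_per_entry / 1e9


def requests_needed(id_count: int, per_request: int = IDS_PER_REQUEST) -> int:
    """Number of API calls needed to resolve id_count IDs in batches."""
    return math.ceil(id_count / per_request)


print(cache_size_gb(USERS))        # 5.0
print(requests_needed(2_000_000))  # 20000
```

The second figure suggests that resolving a 2 million user database,
like the one mentioned in the quoted message, would take roughly
20,000 batched calls instead of 2 million single-user lookups.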

Clearly I don't know much about Twitter's architecture, but this seems
to me to be a pretty simple feature to implement, and relatively
cheap.

If Twitter won't implement it then maybe it's time to consider some of
us getting together to build a user cache. If enough of us get
together I'm sure we can build something that won't cost each of us
too much but will allow us to build the user API methods we need. I'd
hope that Twitter would be ok with this, and most of the useful data
could be kept up to date if they give us single user access to the
firehose. I'd be happy to lead such an effort.

-Stuart

-- 
http://stut.net/projects/twitter/
