[twitter-dev] Re: Bulk id - screen_name resolution.
2009/6/1 Nick Arnett nick.arn...@gmail.com:

On Sun, May 31, 2009 at 3:57 PM, Stuart stut...@gmail.com wrote:

Much as I respect Twitter and the great people who work there, I don't buy that this would place too much demand on their servers. They already use Memcached extensively, and this would be a pretty simple addition to that data store.

For that very reason, I'm not sure it makes sense for third parties to collaborate on a single-purpose distributed store. There are user/account properties that Twitter won't implement, at least not until there's a lot of demonstrated value. In other words, the developer community could collaborate on problems that have marginal value to Twitter in the short run.

I'm not suggesting that it would only be usable as an ID = screen_name repository. I'm suggesting that we could build our own copy of the user data so we can provide API calls that Twitter don't or won't. Clearly this is not ideal, but if there's no other choice I definitely believe it's worth the effort.

At the end of the day it comes down to this: would you pay to have higher API limits? Would Twitter be interested in providing higher limits to paying developers?

At any rate, based on what Doug has just said it's probably not worth doing anything until the new TOS are published, just in case it turns out to be wasted effort.

-Stuart
-- http://stut.net/projects/twitter/
[twitter-dev] Re: Bulk id - screen_name resolution.
On Sun, May 31, 2009 at 3:57 PM, Stuart stut...@gmail.com wrote: Much as I respect Twitter and the great people who work there, I don't buy that this would place too much demand on their servers. They already use Memcached extensively, and this would be a pretty simple addition to that data store. For that very reason, I'm not sure it makes sense for third parties to collaborate on a single-purpose distributed store. There are user/account properties that Twitter won't implement, at least not until there's a lot of demonstrated value. In other words, the developer community could collaborate on problems that have marginal value to Twitter in the short run. Nick
[twitter-dev] Re: Bulk id - screen_name resolution.
There is currently nothing in our TOS preventing this type of project from developing. However, we are writing our Terms for the API and at this time I cannot speak of how a service redistributing our data will be classified. So, proceed as you will, but be warned that we may have to have a discussion down the road when the API Terms of Service better defines our relationship with developers.

Thanks,
Doug
--
Doug Williams
Twitter Platform Support
http://twitter.com/dougw

On Mon, Jun 1, 2009 at 8:52 AM, Nick Arnett nick.arn...@gmail.com wrote:

On Sun, May 31, 2009 at 3:57 PM, Stuart stut...@gmail.com wrote:

Much as I respect Twitter and the great people who work there, I don't buy that this would place too much demand on their servers. They already use Memcached extensively, and this would be a pretty simple addition to that data store.

For that very reason, I'm not sure it makes sense for third parties to collaborate on a single-purpose distributed store. There are user/account properties that Twitter won't implement, at least not until there's a lot of demonstrated value. In other words, the developer community could collaborate on problems that have marginal value to Twitter in the short run.

Nick
[twitter-dev] Re: Bulk id - screen_name resolution.
On 6/1/09 6:59 PM, Doug Williams wrote: There is currently nothing in our TOS preventing this type of project from developing. However, we are writing our Terms for the API and at this time I cannot speak of how a service redistributing our data will be classified. So, proceed as you will, but be warned that we may have to have a discussion down the road when the API Terms of Service better defines our relationship with developers. Doug, any kind of rough timeline for such an API TOS? Weeks? Months? Years? -- Dossy Shiobara | do...@panoptic.com | http://dossy.org/ Panoptic Computer Network | http://panoptic.com/ He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
[twitter-dev] Re: Bulk id - screen_name resolution.
On 31/5/09 13:03, Stuart wrote: Since there's clearly a lot of demand for this feature is it not possible for it to be added to the official API? I'd hesitate before building anything on top of Twitter that also relies on a third party for something so basic. Related suggestion: have common REST API for external services who can provide this information. You can probably get it from google social graph API too, for example. Dan
[twitter-dev] Re: Bulk id - screen_name resolution.
I would like to hear a response from Twitter on the sharing of this data. My db has about 2 million active users, and I have another db with 6 million or so I would gladly share.

Previously I think the response from Twitter is that they cannot provide this as a bulk translation due to the demand it would place on their servers. They are able to provide the entire list of follower IDs simply because that lives in memory and requires no joins. The joining to get this data would be too intensive for them.

If this is allowed maybe the community could take this a step further and provide a common interface to share data like this. Any thoughts?

On May 31, 1:42 pm, Dan Brickley dan...@danbri.org wrote:

On 31/5/09 13:03, Stuart wrote:

Since there's clearly a lot of demand for this feature is it not possible for it to be added to the official API? I'd hesitate before building anything on top of Twitter that also relies on a third party for something so basic.

Related suggestion: have common REST API for external services who can provide this information. You can probably get it from google social graph API too, for example.

Dan
[twitter-dev] Re: Bulk id - screen_name resolution.
2009/5/31 Philip Plante pplante@gmail.com:

I would like to hear a response from Twitter on the sharing of this data. My db has about 2 million active users, and I have another db with 6 million or so I would gladly share.

Previously I think the response from Twitter is that they cannot provide this as a bulk translation due to the demand it would place on their servers. They are able to provide the entire list of follower IDs simply because that lives in memory and requires no joins. The joining to get this data would be too intensive for them.

If this is allowed maybe the community could take this a step further and provide a common interface to share data like this. Any thoughts?

Much as I respect Twitter and the great people who work there, I don't buy that this would place too much demand on their servers. They already use Memcached extensively, and this would be a pretty simple addition to that data store.

Size-wise we're talking about no more than 50 bytes per user to store a user ID to username mapping. Even at 100 million users that's less than 5 gig of memory, which I'm sure is pretty small compared to their overall Memcached footprint. And as for load on the servers, each call for up to 100 IDs would count as an API request, so it's unlikely this method would add a huge amount to the existing usage.

Clearly I don't know much about Twitter's architecture, but this seems to me to be a pretty simple feature to implement, and relatively cheap.

If Twitter won't implement it then maybe it's time to consider some of us getting together to build a user cache. If enough of us get together I'm sure we can build something that won't cost each of us too much but will allow us to build the user API methods we need. I'd hope that Twitter would be ok with this, and most of the useful data could be kept up to date if they give us single user access to the firehose.

I'd be happy to lead such an effort.

-Stuart
-- http://stut.net/projects/twitter/
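The sizing argument in the message above is easy to check with a few lines of arithmetic. The sketch below only restates the figures quoted in the message (50 bytes per user, 100 million users, up to 100 IDs per request); the function names are made up for illustration and say nothing about how Twitter actually stores this data.

```python
# Back-of-the-envelope check of the figures quoted in the message.
BYTES_PER_USER = 50        # upper bound per ID-to-username entry, as quoted
USERS = 100_000_000        # 100 million users, as quoted
IDS_PER_CALL = 100         # proposed maximum batch size per API request


def cache_size_bytes(users: int = USERS, bytes_per_user: int = BYTES_PER_USER) -> int:
    """Total bytes needed if every user takes bytes_per_user bytes."""
    return users * bytes_per_user


def calls_needed(id_count: int, ids_per_call: int = IDS_PER_CALL) -> int:
    """Number of batched requests needed to resolve id_count IDs."""
    return -(-id_count // ids_per_call)  # ceiling division


total = cache_size_bytes()
print(f"{total / 10**9:.2f} GB, {total / 2**30:.2f} GiB")  # prints "5.00 GB, 4.66 GiB"
```

At the full 50-byte upper bound the total is 5 GB (about 4.66 GiB), so any entry size below that bound comes in under 5 gig as stated. With batches of 100, resolving 250 IDs would take 3 requests.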
[twitter-dev] Re: Bulk id - screen_name resolution.
Depending on the response from Twitter on this I would like to sponsor this effort. I have an architecture that will already support a larger number of tweets, and have the resources at my disposal to scale it out as it grows. We will just have to wait for Twitter to weigh in on the sharing aspect. *crosses fingers*

On May 31, 5:57 pm, Stuart stut...@gmail.com wrote:

2009/5/31 Philip Plante pplante@gmail.com:

I would like to hear a response from Twitter on the sharing of this data. My db has about 2 million active users, and I have another db with 6 million or so I would gladly share.

Previously I think the response from Twitter is that they cannot provide this as a bulk translation due to the demand it would place on their servers. They are able to provide the entire list of follower IDs simply because that lives in memory and requires no joins. The joining to get this data would be too intensive for them.

If this is allowed maybe the community could take this a step further and provide a common interface to share data like this. Any thoughts?

Much as I respect Twitter and the great people who work there, I don't buy that this would place too much demand on their servers. They already use Memcached extensively, and this would be a pretty simple addition to that data store.

Size-wise we're talking about no more than 50 bytes per user to store a user ID to username mapping. Even at 100 million users that's less than 5 gig of memory, which I'm sure is pretty small compared to their overall Memcached footprint. And as for load on the servers, each call for up to 100 IDs would count as an API request, so it's unlikely this method would add a huge amount to the existing usage.

Clearly I don't know much about Twitter's architecture, but this seems to me to be a pretty simple feature to implement, and relatively cheap.

If Twitter won't implement it then maybe it's time to consider some of us getting together to build a user cache. If enough of us get together I'm sure we can build something that won't cost each of us too much but will allow us to build the user API methods we need. I'd hope that Twitter would be ok with this, and most of the useful data could be kept up to date if they give us single user access to the firehose.

I'd be happy to lead such an effort.

-Stuart
-- http://stut.net/projects/twitter/
[twitter-dev] Re: Bulk id - screen_name resolution.
On 5/30/09 3:46 PM, David W wrote: [... David asks about bulk resolving of Twitter user IDs to screen_name ...] I don't know what the Twitter TOS says, but I've got a sizable cache of (reasonably fresh) Twitter user data thanks to Twitter Karma. Would it be a Twitter TOS violation for me to publish an API to allow bulk resolution of IDs to screen_name? Is this something that folks would use if I made it available? -- Dossy Shiobara | do...@panoptic.com | http://dossy.org/ Panoptic Computer Network | http://panoptic.com/ He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
[twitter-dev] Re: Bulk id - screen_name resolution.
On May 30, 11:28 pm, Dossy Shiobara do...@panoptic.com wrote:

On 5/30/09 3:46 PM, David W wrote:

[... David asks about bulk resolving of Twitter user IDs to screen_name ...]

I don't know what the Twitter TOS says, but I've got a sizable cache of (reasonably fresh) Twitter user data thanks to Twitter Karma. Would it be a Twitter TOS violation for me to publish an API to allow bulk resolution of IDs to screen_name? Is this something that folks would use if I made it available?

In comment to your TOS question: Twitter as a company seem a whole lot more liberal (and realistic) when it comes to their data. I think I may have even read this somewhere semiofficial in the past.

Profile information itself is also available to the public, and so, keeping a local cache is probably no more harmful (from Twitter's perspective) than what happens when a search engine crawls a user's profile page.

Compare and contrast to Facebook's approach. :P

David.

--
Dossy Shiobara | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
[twitter-dev] Re: Bulk id - screen_name resolution.
Hey Dossy,

This sounds awesome, and I'd be very tempted, however I've come up with a solution which should cater for any size of user account.

Right now when I periodically (~24 hours) note a set of changes to the social graph, I create an entry in a (persistent) ring buffer recording the new graph state. Previously, I wanted to use my existing uid-name cache and a bunch of API calls to resolve all the changed IDs in this entry.

The solution is really simple. Instead, if more id-name calls are required than there is quota, I simply use up half the remaining quota resolving some names, and reschedule the graph check for 1 hour's time rather than 24. The next check will pick up where the previous one left off doing resolution, until eventually the entire entry is resolved (and my cache is bigger for the future :).

This also neatly breaks down the amount of work done for very large sets of changes into chunks of at most 35 (quota/2) HTTP requests per hour per user.

David.

On May 30, 11:28 pm, Dossy Shiobara do...@panoptic.com wrote:

On 5/30/09 3:46 PM, David W wrote:

[... David asks about bulk resolving of Twitter user IDs to screen_name ...]

I don't know what the Twitter TOS says, but I've got a sizable cache of (reasonably fresh) Twitter user data thanks to Twitter Karma. Would it be a Twitter TOS violation for me to publish an API to allow bulk resolution of IDs to screen_name? Is this something that folks would use if I made it available?

--
Dossy Shiobara | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
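The rescheduling approach David describes above can be sketched roughly as follows. This is an illustration under stated assumptions, not his code: `resolve_step`, `lookup`, and the two interval constants are hypothetical names, and the sketch assumes that one API request resolves exactly one ID.

```python
NORMAL_INTERVAL_HOURS = 24  # usual gap between graph checks, per the message
RETRY_INTERVAL_HOURS = 1    # shorter gap used while IDs remain unresolved


def resolve_step(pending_ids, cache, remaining_quota, lookup):
    """Resolve as many pending IDs as the quota rule allows.

    pending_ids     -- IDs from the current ring-buffer entry
    cache           -- dict of user ID -> screen name, updated in place
    remaining_quota -- API requests left in the current window
    lookup          -- callable doing one API request for one ID (hypothetical)

    Returns (leftover_ids, hours_until_next_check).
    """
    # Only IDs missing from the cache cost a request; drop duplicates too.
    todo = [uid for uid in dict.fromkeys(pending_ids) if uid not in cache]

    if len(todo) <= remaining_quota:
        budget = len(todo)             # everything fits, resolve it all
    else:
        budget = remaining_quota // 2  # spend only half the remaining quota

    for uid in todo[:budget]:
        cache[uid] = lookup(uid)

    leftover = todo[budget:]
    delay = RETRY_INTERVAL_HOURS if leftover else NORMAL_INTERVAL_HOURS
    return leftover, delay
```

Each call picks up where the previous one stopped, because already-resolved IDs are in the cache and are skipped. Spending only half the remaining quota leaves headroom for the application's other API calls in the same window.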
[twitter-dev] Re: Bulk id - screen_name resolution.
On 5/30/09 7:06 PM, David M. Wilson wrote:

In comment to your TOS question: Twitter as a company seem a whole lot more liberal (and realistic) when it comes to their data. I think I may have even read this somewhere semiofficial in the past.

Profile information itself is also available to the public, and so, keeping a local cache is probably no more harmful (from Twitter's perspective) than what happens when a search engine crawls a user's profile page.

Compare and contrast to Facebook's approach. :P

Yeah, Facebook has very strict guidelines as to what you can cache, how long you can cache it, and I'm almost positive they have a no-redistribute policy.

I can totally understand Twitter not allowing third-party API consumers to redistribute data retrieved by the API. Their valuation probably relies greatly on the number of requests (hits) they receive. If a third-party service adds a layer of indirection in front of Twitter, then that traffic is no longer hitting Twitter directly, which makes them appear less active than they really are.

Can someone either point to the clause in the Twitter TOS that says a third-party application can redistribute Twitter data to other services directly, or can someone from Twitter issue an official statement to this effect?

Or, equally useful would be a statement that clearly states that this would be forbidden ... so I know not to waste my time even thinking about this. :-)

--
Dossy Shiobara | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
[twitter-dev] Re: Bulk id - screen_name resolution.
We also have a sizable cache of this data (around 4 million users) that we are already using in an API format internally at Twitturly. If Twitter approves it, we can add it to our publicly available API. Currently it allows conversion both from ID to username and from username to ID, either one at a time or in bulk (up to 100 at a time).

Doug or Alex, does the TOS allow us to provide this via our API?

-Joel

On May 30, 3:28 pm, Dossy Shiobara do...@panoptic.com wrote:

On 5/30/09 3:46 PM, David W wrote:

[... David asks about bulk resolving of Twitter user IDs to screen_name ...]

I don't know what the Twitter TOS says, but I've got a sizable cache of (reasonably fresh) Twitter user data thanks to Twitter Karma. Would it be a Twitter TOS violation for me to publish an API to allow bulk resolution of IDs to screen_name? Is this something that folks would use if I made it available?

--
Dossy Shiobara | do...@panoptic.com | http://dossy.org/
Panoptic Computer Network | http://panoptic.com/
He realized the fastest way to change is to laugh at your own folly -- then you can let go and quickly move on. (p. 70)
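A two-way lookup of the kind Joel describes (ID to username, username to ID, singly or in batches of up to 100) might look something like the sketch below. It is purely illustrative and is not Twitturly's API: the class and method names are invented, the store is an in-memory dict, and it assumes screen names should be matched case-insensitively.

```python
MAX_BULK = 100  # batch cap mentioned in the message


class UserDirectory:
    """In-memory two-way map between user IDs and screen names (hypothetical)."""

    def __init__(self):
        self._name_by_id = {}
        self._id_by_name = {}  # keyed by lower-cased screen name (assumption)

    def add(self, user_id, screen_name):
        key = screen_name.lower()
        # Drop the old name if this ID was renamed.
        old_name = self._name_by_id.get(user_id)
        if old_name is not None:
            self._id_by_name.pop(old_name.lower(), None)
        # Drop the stale entry if another ID used to hold this name.
        prev_owner = self._id_by_name.get(key)
        if prev_owner is not None and prev_owner != user_id:
            self._name_by_id.pop(prev_owner, None)
        self._name_by_id[user_id] = screen_name
        self._id_by_name[key] = user_id

    def name_for(self, user_id):
        return self._name_by_id.get(user_id)

    def id_for(self, screen_name):
        return self._id_by_name.get(screen_name.lower())

    def names_for(self, user_ids):
        """Bulk ID -> name lookup; unknown IDs map to None."""
        ids = list(user_ids)
        if len(ids) > MAX_BULK:
            raise ValueError(f"at most {MAX_BULK} IDs per call")
        return {uid: self._name_by_id.get(uid) for uid in ids}

    def ids_for(self, screen_names):
        """Bulk name -> ID lookup; unknown names map to None."""
        names = list(screen_names)
        if len(names) > MAX_BULK:
            raise ValueError(f"at most {MAX_BULK} names per call")
        return {name: self._id_by_name.get(name.lower()) for name in names}
```

Returning `None` for unknown entries, rather than omitting them, lets a caller see which IDs still need a request to Twitter itself.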