[twitter-dev] Re: social graph methods with a bit more info
On Sun, Mar 29, 2009 at 11:41 PM, Damon Clinkscales sca...@pobox.com wrote:

How often does this cache update? I'm curious how accurate and reliable this would be, since people are constantly modifying their social graph.

In the case of the id/screen_name thing, the data wouldn't change much. Ideally, there'd be a way of forcing an update from Twitter in the case of known/suspected stale data. As to keeping up with the social graph, I think the current social graph methods are sufficient/wonderful for that.

Ah, okay - so it's not necessarily a grab of the social graph then, but rather a user cache. If that's what it is, I have a similar-sized cache that I could make available as well, assuming Twitter were to start allowing this. I'd be really surprised if they started to allow this, though. There is still the problem of keeping the data up-to-date: people change their images, location, description, Tweets, number of followers/friends, etc. quite often. I think Twitter could provide a cache of this data a lot faster than they could provide a way to easily force updates on stale data. It sure would be nice though - I wouldn't have to make as many calls out to Twitter if they had a better way to get just the user updates.

Jesse
[twitter-dev] Re: social graph methods with a bit more info
You can always provide your own cache. It doesn't take that much to get a complete name-ID cache locally. What does take a lot of calls is keeping it up-to-date. Since the name on an ID can change, the cache isn't always accurate (though the ID never changes). It's a huge task to get that initial scrape, about 2 months depending on your access, but it's doable. If we could make more calls per hour, you could cut that time significantly; so would twitter making just that information available in a firehose format where you could suck down the entire list at once. It'll be a big file: there are almost 30 million user IDs now.

On Mar 29, 4:38 pm, Jesse Stay jesses...@gmail.com wrote:

If Twitter's going to allow this, why don't they just do it themselves and provide more accurate and up-to-date info? How often does this cache update? I'm curious how accurate and reliable this would be, since people are constantly modifying their social graph. Alex and crew have already said they might be able to provide more info once they fully convert over to their new architecture. My hope is that once they're able to do that I can just pull subsets of each social graph down, such as number of new followers since x date, or other criteria. An FQL-type language (similar to Facebook's) would be ideal for something like that.

Jesse

On Sun, Mar 29, 2009 at 1:03 PM, softprops d.tang...@gmail.com wrote:

Wow! What a great idea. Offloading the burden on twitter's servers/dbs to a simple id-name cache hosted via another service on someone else's. I will have to check that out.

On Mar 29, 2:52 pm, Damon Clinkscales sca...@pobox.com wrote:

On Sat, Mar 28, 2009 at 11:47 PM, Damon Clinkscales sca...@pobox.com wrote:

see

On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

It would be nice if the http://twitter.com/[friends|followers]/ids.format uri's could return a bit more useful info like the screen_name. [ snip ] ...

<?xml version="1.0" encoding="UTF-8"?>
<ids>
<id screen_name="foo">1</id>
<id screen_name="bar">2</id>
</ids>

They aren't going to do this for performance reasons, even though yes, it would be useful. see http://is.gd/ptJ9

-damon

An alternative solution may be possible though. I've recently been reminded that @infochimps has a massive scrape of the Twitter social graph and is willing to make that available, in whole or in part. However, they are currently awaiting Twitter's permission on precisely what can be released. You can read more about this here - http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-frie...

Assuming that the data is released, even in a limited form, there is potential there for an id--screen_name mapping table which could serve as a cache primer for apps that need that. This could potentially save a bajillion calls against Twitter's API, which in turn would have other good effects. One of the most notable places where this is obviously needed is tying Twitter Search results to Twitter users. For historical reasons, the user id in the search result is not the Twitter user_id, so you have to use the screen name.

-damon
-- http://twitter.com/damon
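As a rough back-of-the-envelope check on the scrape estimate at the top of this message: the only figures taken from the thread are the roughly 30 million user IDs and the "about 2 months" duration. Treating two months as 60 days, and the function names themselves, are assumptions made purely for illustration.

```python
def implied_lookups_per_hour(total_users: int, days: float) -> float:
    """Average user lookups per hour needed to cover total_users in `days`."""
    return total_users / (days * 24)


def days_to_scrape(total_users: int, lookups_per_hour: float) -> float:
    """Days needed to look up every user at a fixed hourly rate."""
    return total_users / lookups_per_hour / 24


if __name__ == "__main__":
    users = 30_000_000  # "almost 30 million user IDs" (from the thread)
    # Assumption: "about 2 months" is taken as 60 days.
    rate = implied_lookups_per_hour(users, days=60)
    print(f"implied rate: {rate:,.0f} lookups/hour")
    # Doubling the hourly rate halves the time for the initial scrape.
    print(f"at double the rate: {days_to_scrape(users, 2 * rate):.0f} days")
```

The point of the sketch is only that scrape time scales inversely with the hourly call allowance, which is why a higher limit or a one-shot dump would help so much.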
[twitter-dev] Re: social graph methods with a bit more info
On Mar 30, 3:32 am, Jesse Stay jesses...@gmail.com wrote:

On Sun, Mar 29, 2009 at 11:41 PM, Damon Clinkscales sca...@pobox.com wrote:

How often does this cache update? I'm curious how accurate and reliable this would be, since people are constantly modifying their social graph.

In the case of the id/screen_name thing, the data wouldn't change much. Ideally, there'd be a way of forcing an update from Twitter in the case of known/suspected stale data. As to keeping up with the social graph, I think the current social graph methods are sufficient/wonderful for that.

Ah, okay - so it's not necessarily a grab of the social graph then, but rather a user cache. If that's what it is, I have a similar-sized cache that I could make available as well, assuming Twitter were to start allowing this. I'd be really surprised if they started to allow this, though. There is still the problem of keeping the data up-to-date: people change their images, location, description, Tweets, number of followers/friends, etc. quite often. I think Twitter could provide a cache of this data a lot faster than they could provide a way to easily force updates on stale data. It sure would be nice though - I wouldn't have to make as many calls out to Twitter if they had a better way to get just the user updates.

Jesse

I think that is the point/trade-off. What is the real cost to twitter of developers making more calls for small chunks of data vs. fewer calls for a slightly more custom set of data? It's less HTTP traffic but a bigger payload. I guess it also depends on how the data is cached. As Alex mentioned in the link above: "As they are, we fetch data from a single data store in our architecture to return the lists of IDs. In order to provide usernames, we'd have to bog down this request by joining together multiple sources of data." It would require a bit of rearchitecting on their part before I think we'd see a compromise being made.

The major difficulty, again, is maintaining the freshness of the data, with users changing their screen names among other things. Probably easier said than done. It would be great if twitter did start opening up the caching of user data to other services, and perhaps provided web hooks that get fired when that external service's cache should be updated.
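To make the web hook idea in this message a little more concrete, here is a minimal sketch of what the receiving side might look like. Everything about it is hypothetical: Twitter offers no such hook in this thread, and the JSON payload shape (`id`, `screen_name`), the `UserCache` class, and `handle_webhook` are invented for illustration. It also assumes screen names compare case-insensitively.

```python
import json


class UserCache:
    """In-memory id <-> screen_name cache kept by an external service."""

    def __init__(self) -> None:
        self.name_by_id: dict[int, str] = {}
        self.id_by_name: dict[str, int] = {}

    def upsert(self, user_id: int, screen_name: str) -> None:
        key = screen_name.lower()  # assumption: names are case-insensitive
        # The ID never changes, so forget whatever name it had before.
        old_name = self.name_by_id.get(user_id)
        if old_name is not None:
            self.id_by_name.pop(old_name.lower(), None)
        # If another ID was cached under this name, that entry is now stale.
        previous_owner = self.id_by_name.get(key)
        if previous_owner is not None and previous_owner != user_id:
            self.name_by_id.pop(previous_owner, None)
        self.name_by_id[user_id] = screen_name
        self.id_by_name[key] = user_id


def handle_webhook(cache: UserCache, body: str) -> None:
    """Apply one hypothetical 'user changed' notification to the cache."""
    payload = json.loads(body)
    cache.upsert(int(payload["id"]), payload["screen_name"])


if __name__ == "__main__":
    cache = UserCache()
    cache.upsert(1, "foo")
    handle_webhook(cache, '{"id": 1, "screen_name": "foo_renamed"}')
    print(cache.name_by_id)
```

Keying everything on the ID is the design choice that matters here, since the thread notes the ID is the one thing that never changes.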
[twitter-dev] Re: social graph methods with a bit more info
On Sat, Mar 28, 2009 at 11:47 PM, Damon Clinkscales sca...@pobox.com wrote:

see

On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

It would be nice if the http://twitter.com/[friends|followers]/ids.format uri's could return a bit more useful info like the screen_name. [ snip ] ...

<?xml version="1.0" encoding="UTF-8"?>
<ids>
<id screen_name="foo">1</id>
<id screen_name="bar">2</id>
</ids>

They aren't going to do this for performance reasons, even though yes, it would be useful. see http://is.gd/ptJ9

-damon

An alternative solution may be possible though. I've recently been reminded that @infochimps has a massive scrape of the Twitter social graph and is willing to make that available, in whole or in part. However, they are currently awaiting Twitter's permission on precisely what can be released. You can read more about this here - http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-friend-graph/

Assuming that the data is released, even in a limited form, there is potential there for an id--screen_name mapping table which could serve as a cache primer for apps that need that. This could potentially save a bajillion calls against Twitter's API, which in turn would have other good effects. One of the most notable places where this is obviously needed is tying Twitter Search results to Twitter users. For historical reasons, the user id in the search result is not the Twitter user_id, so you have to use the screen name.

-damon
-- http://twitter.com/damon
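The cache-primer idea in this message could be sketched roughly as follows. The dump format (one `id<TAB>screen_name` row per user, no header) is an assumption, since the thread does not say what the released data would look like. The `from_user` field on a search result and both function names are likewise placeholders for illustration.

```python
import csv
import io


def load_primer(tsv_text: str) -> dict[str, int]:
    """Build a screen_name -> user_id lookup from 'id<TAB>screen_name' rows."""
    lookup: dict[str, int] = {}
    for row in csv.reader(io.StringIO(tsv_text), delimiter="\t"):
        if len(row) != 2:
            continue  # skip blank or malformed rows
        user_id, screen_name = row
        lookup[screen_name.lower()] = int(user_id)
    return lookup


def resolve_search_results(results: list[dict], lookup: dict[str, int]) -> list[dict]:
    """Attach the real user_id to each search result, or None on a cache miss."""
    resolved = []
    for result in results:
        # Assumption: the result carries the author's screen name in 'from_user'.
        user_id = lookup.get(result["from_user"].lower())
        resolved.append({**result, "user_id": user_id})
    return resolved


if __name__ == "__main__":
    lookup = load_primer("1\tfoo\n2\tbar\n")
    hits = [{"from_user": "Foo", "text": "hello"},
            {"from_user": "nobody", "text": "hi"}]
    for hit in resolve_search_results(hits, lookup):
        print(hit)
```

A miss (`user_id` of `None`) is the case where an app would still have to fall back to a call against Twitter's API.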
[twitter-dev] Re: social graph methods with a bit more info
If Twitter's going to allow this, why don't they just do it themselves and provide more accurate and up-to-date info? How often does this cache update? I'm curious how accurate and reliable this would be, since people are constantly modifying their social graph. Alex and crew have already said they might be able to provide more info once they fully convert over to their new architecture. My hope is that once they're able to do that I can just pull subsets of each social graph down, such as number of new followers since x date, or other criteria. An FQL-type language (similar to Facebook's) would be ideal for something like that.

Jesse

On Sun, Mar 29, 2009 at 1:03 PM, softprops d.tang...@gmail.com wrote:

Wow! What a great idea. Offloading the burden on twitter's servers/dbs to a simple id-name cache hosted via another service on someone else's. I will have to check that out.

On Mar 29, 2:52 pm, Damon Clinkscales sca...@pobox.com wrote:

On Sat, Mar 28, 2009 at 11:47 PM, Damon Clinkscales sca...@pobox.com wrote:

see

On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

It would be nice if the http://twitter.com/[friends|followers]/ids.format uri's could return a bit more useful info like the screen_name. [ snip ] ...

<?xml version="1.0" encoding="UTF-8"?>
<ids>
<id screen_name="foo">1</id>
<id screen_name="bar">2</id>
</ids>

They aren't going to do this for performance reasons, even though yes, it would be useful. see http://is.gd/ptJ9

-damon

An alternative solution may be possible though. I've recently been reminded that @infochimps has a massive scrape of the Twitter social graph and is willing to make that available, in whole or in part. However, they are currently awaiting Twitter's permission on precisely what can be released. You can read more about this here - http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-frie...

Assuming that the data is released, even in a limited form, there is potential there for an id--screen_name mapping table which could serve as a cache primer for apps that need that. This could potentially save a bajillion calls against Twitter's API, which in turn would have other good effects. One of the most notable places where this is obviously needed is tying Twitter Search results to Twitter users. For historical reasons, the user id in the search result is not the Twitter user_id, so you have to use the screen name.

-damon
-- http://twitter.com/damon
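Until something like Jesse's "number of new followers since x date" query exists, one way to approximate it client-side is to keep dated snapshots of the follower ID list and diff them. This is only a sketch: the snapshot store, the dates, and the function name are assumptions, and it reports followers present now but absent at the baseline, so anyone who followed and unfollowed in between is missed.

```python
from datetime import date


def new_followers_since(snapshots: dict[date, set[int]], since: date) -> set[int]:
    """IDs in the latest snapshot that were absent from the baseline snapshot.

    The baseline is the most recent snapshot taken on or before `since`.
    """
    if not snapshots:
        raise ValueError("no snapshots recorded")
    dates = sorted(snapshots)
    baseline_dates = [d for d in dates if d <= since]
    if not baseline_dates:
        raise ValueError("no snapshot on or before the requested date")
    baseline = snapshots[baseline_dates[-1]]
    latest = snapshots[dates[-1]]
    return latest - baseline


if __name__ == "__main__":
    # Hypothetical follower ID lists saved on three different days.
    snapshots = {
        date(2009, 3, 1): {1, 2, 3},
        date(2009, 3, 15): {1, 2, 3, 4},
        date(2009, 3, 29): {2, 3, 4, 5, 6},
    }
    print(sorted(new_followers_since(snapshots, date(2009, 3, 20))))
```

The cost is one full ID-list fetch per snapshot, which is exactly the kind of repeated call a server-side query would avoid.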
[twitter-dev] Re: social graph methods with a bit more info
On Sun, Mar 29, 2009 at 4:38 PM, Jesse Stay jesses...@gmail.com wrote:

If Twitter's going to allow this, why don't they just do it themselves and provide more accurate and up-to-date info?

Yeah, that'd be nice. But, given everything going on, it's probably not a priority right now.

How often does this cache update? I'm curious how accurate and reliable this would be, since people are constantly modifying their social graph.

In the case of the id/screen_name thing, the data wouldn't change much. Ideally, there'd be a way of forcing an update from Twitter in the case of known/suspected stale data. As to keeping up with the social graph, I think the current social graph methods are sufficient/wonderful for that.

-damon
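On the client side, "forcing an update in the case of known/suspected stale data" might look something like the sketch below. It is an illustration only: the `fetch` callable stands in for whatever API call returns a screen name for an ID, and the class name, the maximum age, and the injectable clock are all assumptions.

```python
import time
from typing import Callable


class RefreshingCache:
    """Caches id -> screen_name and refetches entries that are old or suspect."""

    def __init__(self, fetch: Callable[[int], str], max_age_seconds: float,
                 clock: Callable[[], float] = time.time) -> None:
        self._fetch = fetch            # hypothetical lookup against the API
        self._max_age = max_age_seconds
        self._clock = clock
        self._entries: dict[int, tuple[str, float]] = {}

    def get(self, user_id: int, force: bool = False) -> str:
        """Return the cached name, refetching if missing, too old, or forced."""
        now = self._clock()
        entry = self._entries.get(user_id)
        if force or entry is None or now - entry[1] > self._max_age:
            name = self._fetch(user_id)
            self._entries[user_id] = (name, now)
            return name
        return entry[0]


if __name__ == "__main__":
    names = {1: "foo"}
    cache = RefreshingCache(lambda uid: names[uid], max_age_seconds=3600)
    print(cache.get(1))              # fetched
    names[1] = "foo_renamed"
    print(cache.get(1))              # still the cached value
    print(cache.get(1, force=True))  # suspected stale, so refetch
```

Passing `force=True` is the manual escape hatch for an entry the app has reason to distrust, for example after a lookup by screen name fails.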
[twitter-dev] Re: social graph methods with a bit more info
see

On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

It would be nice if the http://twitter.com/[friends|followers]/ids.format uri's could return a bit more useful info like the screen_name. [ snip ] ...

<?xml version="1.0" encoding="UTF-8"?>
<ids>
<id screen_name="foo">1</id>
<id screen_name="bar">2</id>
</ids>

They aren't going to do this for performance reasons, even though yes, it would be useful. see http://is.gd/ptJ9

-damon
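For reference, here is how a client might consume the response format softprops proposes, if it existed. The XML below is that proposal written out with the angle brackets and quotes that appear to have been lost from the archived message. It is not an actual Twitter response, and `parse_ids` is a name made up for this sketch.

```python
import xml.etree.ElementTree as ET

# Proposed (not real) response: each <id> carries a screen_name attribute.
PROPOSED_RESPONSE = """<?xml version="1.0" encoding="UTF-8"?>
<ids>
  <id screen_name="foo">1</id>
  <id screen_name="bar">2</id>
</ids>"""


def parse_ids(xml_text: str) -> dict[int, str]:
    """Map each user ID in the response to its screen_name attribute."""
    root = ET.fromstring(xml_text.encode("utf-8"))
    return {int(el.text): el.get("screen_name") for el in root.findall("id")}


if __name__ == "__main__":
    for user_id, screen_name in parse_ids(PROPOSED_RESPONSE).items():
        print(user_id, screen_name)
```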