On Thu, Apr 9, 2009 at 7:13 AM, kanny <[email protected]> wrote:
>
> Caching is something I will definitely be doing, but as I said, to do
> something complex like semantic model generation, I need access to at
> least a user's last 100,000 friends_timeline tweets. For a typical
> user following 100 reasonably active people, this would take 2-3
> months to build, and it is not practical to wait that long for the
> application to become usable.
I have about 2.3 million cached statuses for more than 10,000 users, gathered over the last couple of months for the analysis I do for TwURLed News (http://TwURLedNews.com). There's a sampling bias in favor of people who have tended to cite URLs that became popular.

I'm quite interested in the kind of analysis you're doing, so I'd be happy to share the data with you or anyone else who might want it for this sort of purpose. It wouldn't be hard for me to export it in the format you want and make it available for download. If a lot of people want it, that would become a problem for my servers, but then we can figure out somewhere else to host it.

So... would this be useful as a one-time offer? Do you intend to share the results of your analysis?

Nick
