John, how are things going on the real-time social graph APIs? That would
solve a lot of these problems for me.
On Mon, Jan 4, 2010 at 9:58 PM, John Kalucki <j...@twitter.com> wrote:
> The backend datastore returns following blocks in constant time,
> regardless of the cursor depth. When I test a user with 100k+
> followers via twitter.com using a ruby script, I see each cursored
> block return in between 1.3 and 2.0 seconds, n=46, avg 1.59 seconds,
> median 1.47 sec, stddev of .377, (home DSL, shared by several people
> at the moment). So, it seems that we're returning the data over home
> DSL at between 2,500 and 4,000 ids per second, which seems like a
> perfectly reasonable rate and variance.
> If I recall correctly, the "cursorless" methods are just shunted to
> the first block each time, and thus return a constant, incomplete
> amount of data...
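As a sketch of the cursored loop described above (cursor=-1 asks for the
first block, next_cursor=0 marks the end, per the usual convention; the
local list here is a stand-in for the real datastore, and the offset-style
cursor is a simplification of Twitter's opaque cursor tokens):

```python
# Sketch of the cursored fetch loop: each call returns one block of ids
# plus a next_cursor; cursor=-1 requests the first block, next_cursor=0
# signals the end. The "datastore" is a local list so the loop logic can
# be shown without network access. Real cursors are opaque tokens, not
# offsets -- the offset here is purely illustrative.

BLOCK_SIZE = 5000

def fetch_block(all_ids, cursor):
    """Return (ids, next_cursor) for one cursored block."""
    start = 0 if cursor == -1 else cursor
    block = all_ids[start:start + BLOCK_SIZE]
    next_cursor = start + BLOCK_SIZE if start + BLOCK_SIZE < len(all_ids) else 0
    return block, next_cursor

def fetch_all_follower_ids(all_ids):
    """Walk the cursor chain until next_cursor == 0, collecting every id."""
    ids, cursor = [], -1
    while True:
        block, cursor = fetch_block(all_ids, cursor)
        ids.extend(block)
        if cursor == 0:
            return ids

followers = list(range(80_000))  # a user with 80k followers -> 16 blocks
assert fetch_all_follower_ids(followers) == followers
```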
> Looking into my crystal ball, if you want a lot more than several
> thousand widgets per second from Twitter, you probably aren't going to
> get them via REST, and you will probably have some sort of "business
> relationship" in place with Twitter.
> -John Kalucki
> Services, Twitter Inc.
> (A slice of data below)
> url /followers/ids/alexa_chung.xml?cursor=-1
> fetch time = 1.478542
> url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
> fetch time = 2.044831
> url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
> fetch time = 1.350035
> url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
> fetch time = 1.44636
> url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
> fetch time = 1.955163
> url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
> fetch time = 1.326226
> url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
> fetch time = 1.96824
> url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
> fetch time = 1.513922
> url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
> fetch time = 1.59179
> url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
> fetch time = 2.259924
> url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
> fetch time = 1.706438
> url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
> fetch time = 1.460413
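A quick sanity check on the slice above (the twelve fetch times copied
verbatim, assuming 5,000 ids per cursored block) lands in the ids-per-second
range John quotes:

```python
import statistics

# Fetch times (seconds) copied from the slice above; each cursored
# block carries up to 5,000 ids.
times = [1.478542, 2.044831, 1.350035, 1.44636, 1.955163, 1.326226,
         1.96824, 1.513922, 1.59179, 2.259924, 1.706438, 1.460413]

mean = statistics.mean(times)
print(f"avg {mean:.2f}s, median {statistics.median(times):.2f}s, "
      f"stddev {statistics.stdev(times):.2f}")
print(f"~{5000 / mean:.0f} ids/sec at 5,000 ids per block")
```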
> On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
> > Some quick benchmarks...
> > Grabbed entire social graph for ~250 users, where each user has a
> > number of friends/followers between 0 and 80,000. I randomly used
> > both the cursor and cursor-less API methods.
> > < 5000 ids
> > cursor: 0.72 avg seconds
> > cursorless: 0.51 avg seconds
> > 5000 to 10,000 ids
> > cursor: 1.42 avg seconds
> > cursorless: 0.94 avg seconds
> > 1 to 80,000 ids
> > cursor: 2.82 avg seconds
> > cursorless: 1.21 avg seconds
> > 5,000 to 80,000 ids
> > cursor: 4.28 avg seconds
> > cursorless: 1.59 avg seconds
> > 10,000 to 80,000 ids
> > cursor: 5.23 avg seconds
> > cursorless: 1.82 avg seconds
> > 20,000 to 80,000 ids
> > cursor: 6.82 avg seconds
> > cursorless: 2 avg seconds
> > 40,000 to 80,000 ids
> > cursor: 9.5 avg seconds
> > cursorless: 3 avg seconds
> > 60,000 to 80,000 ids
> > cursor: 12.25 avg seconds
> > cursorless: 3.12 avg seconds
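Taking the averages above at face value, the cursor/cursorless ratio
roughly grows with graph size; a quick tabulation (numbers copied from
the benchmarks above):

```python
# (cursor avg s, cursorless avg s) keyed by follower-count bucket,
# copied from the benchmarks above.
benchmarks = {
    "< 5000":           (0.72, 0.51),
    "5000 to 10,000":   (1.42, 0.94),
    "1 to 80,000":      (2.82, 1.21),
    "5,000 to 80,000":  (4.28, 1.59),
    "10,000 to 80,000": (5.23, 1.82),
    "20,000 to 80,000": (6.82, 2.00),
    "40,000 to 80,000": (9.50, 3.00),
    "60,000 to 80,000": (12.25, 3.12),
}

for bucket, (cur, curless) in benchmarks.items():
    print(f"{bucket:>16}: cursor/cursorless = {cur / curless:.1f}x")
```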
> > On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
> >> Ditto PJB :-)
> >> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
> >> > I think that's like asking someone: why do you eat food? But don't say
> >> > because it tastes good or nourishes you, because we already know
> >> > that! ;)
> >> > You guys presumably set the 5000 ids per cursor limit by analyzing
> >> > your user base and noting that one could still obtain the social graph
> >> > for the vast majority of users with a single call.
> >> > But this is a bit misleading. For analytics-based apps, which aim to
> >> > do near real-time analysis of relationships, the focus is typically on
> >> > consumer brands that have a far larger than average number of
> >> > relationships (e.g., 50k - 200k).
> >> > This means that those apps are neck-deep in cursor-based stuff, and
> >> > quickly realize the existing drawbacks, including, in order of
> >> > significance:
> >> > - Latency. Fetching ids for a user with 3000 friends is comparable
> >> > between the two calls. But as you go past 5000, the gap quickly
> >> > grows to 5x or more (I will include more benchmarks in a short
> >> > while). For example, fetching 80,000 friends via the get-all
> >> > method takes on average 3 seconds; it takes, on average, 15 seconds
> >> > with cursors.
> >> > - Code complexity & elegance. I would say that there is a 3x increase
> >> > in code lines to account for cursors, from retrying failed cursors, to
> >> > caching to account for cursor slowness, to UI changes to coddle
> >> > impatient users.
> >> > - Incomprehensibility. While there are obviously very good reasons
> >> > from Twitter's perspective (performance) for the cursor-based model,
> >> > there really is no apparent benefit to API users for the ids
> >> > calls. I would make the case that a large majority of API uses of the
> >> > ids calls need and require the entire social graph, not an incomplete
> >> > one. After all, we need to know what new relationships exist, but
> >> > also what old relationships have ended. To dole out the data in
> >> > dribs and drabs is like serving a pint of beer in sippy cups. That is
> >> > to say: most users need the entire social graph, so what is the use
> >> > case, from an API user's perspective, of NOT maintaining at least one
> >> > means to quickly, reliably, and efficiently get it in a single call?
> >> > - API Barriers to entry. Most of the aforementioned arguments are
> >> > obviously from an API user's perspective, but there's something, too,
> >> > for Twitter to consider. Namely, by increasing the complexity and
> >> > learning curve of particular API actions, you presumably further limit
> >> > the pool of developers who will engage with that API. That's probably
> >> > a bad thing.
> >> > - Limits Twitter 2.0 app development. This, again, speaks to issues
> >> > bearing on speed and complexity, but I think it is important. The
> >> > first few apps in any given media or innovation invariably have to do
> >> > with basic functionality building blocks -- tweeting, following,
> >> > showing tweets. But the next wave almost always has to do with
> >> > measurement and analysis. By making such analysis more difficult, you
> >> > forestall the critically important ability for brands, and others, to
> >> > measure performance.
> >> > - API users have requested it. Shouldn't, ultimately, the use case
> >> > for a particular API method simply be the fact that a number of API
> >> > developers have requested that it remain?
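The retry-and-resume complexity described above (retrying failed cursors,
and so on) ends up wrapped in something roughly like this; a minimal
sketch, where `fetch_page` is a hypothetical stand-in for the actual
HTTP call:

```python
import time

def fetch_with_retries(fetch_page, max_retries=3, base_delay=1.0):
    """Collect a full cursored id list, retrying each failed page.

    fetch_page(cursor) -> (ids, next_cursor) is a hypothetical stand-in
    for the real HTTP call; next_cursor == 0 marks the last page.
    """
    ids, cursor = [], -1
    while True:
        for attempt in range(max_retries):
            try:
                block, next_cursor = fetch_page(cursor)
                break
            except IOError:
                if attempt == max_retries - 1:
                    raise  # give up after max_retries failures on one page
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        ids.extend(block)
        if next_cursor == 0:
            return ids
        cursor = next_cursor

# Toy usage: two pages, no failures.
pages = {-1: ([1, 2, 3], 7), 7: ([4, 5], 0)}
assert fetch_with_retries(lambda c: pages[c]) == [1, 2, 3, 4, 5]
```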
> >> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
> >> > > Can everyone contribute their use case for this API method? I'm
> >> > > trying to fully understand the deficiencies of the cursor approach.
> >> > > Please don't include that cursors are slow or that they are charged
> >> > > against the rate limit, as those are known issues.
> >> > > Thanks.