John, how are things going on the real-time social graph APIs? That would
solve a lot of these problems for me.
On Mon, Jan 4, 2010 at 9:58 PM, John Kalucki <j...@twitter.com> wrote:
> The backend datastore returns following blocks in constant time,
> regardless of the cursor depth. When I test a user with 100k+
> followers via twitter.com using a ruby script, I see each cursored
> block return in between 1.3 and 2.0 seconds, n=46, avg 1.59 seconds,
> median 1.47 sec, stddev of .377, (home DSL, shared by several people
> at the moment). So, it seems that we're returning the data over home
> DSL at between 2,500 and 4,000 ids per second, which seems like a
> perfectly reasonable rate and variance.
> If I recall correctly, the "cursorless" methods are just shunted to
> the first block each time, and thus return a constant, incomplete
> amount of data...
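As a sketch of the cursored loop described above (cursor=-1 asks for the
first block, next_cursor=0 marks the end, per the usual convention; the
local list here is a stand-in for the real datastore, and the offset-style
cursor is a simplification of Twitter's opaque cursor tokens):

```python
# Sketch of the cursored fetch loop: each call returns one block of ids
# plus a next_cursor; cursor=-1 requests the first block, next_cursor=0
# signals the end. The "datastore" is a local list so the loop logic can
# be shown without network access. Real cursors are opaque tokens, not
# offsets -- the offset here is purely illustrative.

BLOCK_SIZE = 5000

def fetch_block(all_ids, cursor):
    """Return (ids, next_cursor) for one cursored block."""
    start = 0 if cursor == -1 else cursor
    block = all_ids[start:start + BLOCK_SIZE]
    next_cursor = start + BLOCK_SIZE if start + BLOCK_SIZE < len(all_ids) else 0
    return block, next_cursor

def fetch_all_follower_ids(all_ids):
    """Walk the cursor chain until next_cursor == 0, collecting every id."""
    ids, cursor = [], -1
    while True:
        block, cursor = fetch_block(all_ids, cursor)
        ids.extend(block)
        if cursor == 0:
            return ids

followers = list(range(80_000))  # a user with 80k followers -> 16 blocks
assert fetch_all_follower_ids(followers) == followers
```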
> Looking into my crystal ball, if you want a lot more than several
> thousand widgets per second from Twitter, you probably aren't going to
> get them via REST, and you will probably have some sort of "business
> relationship" in place with Twitter.
> -John Kalucki
> Services, Twitter Inc.
> (A slice of data below)
> url /followers/ids/alexa_chung.xml?cursor=-1
> fetch time = 1.478542
> url /followers/ids/alexa_chung.xml?cursor=1322524362256299608
> fetch time = 2.044831
> url /followers/ids/alexa_chung.xml?cursor=1321126009663170021
> fetch time = 1.350035
> url /followers/ids/alexa_chung.xml?cursor=1319359640017038524
> fetch time = 1.44636
> url /followers/ids/alexa_chung.xml?cursor=1317653620096535558
> fetch time = 1.955163
> url /followers/ids/alexa_chung.xml?cursor=1316184964685221966
> fetch time = 1.326226
> url /followers/ids/alexa_chung.xml?cursor=1314866514116423204
> fetch time = 1.96824
> url /followers/ids/alexa_chung.xml?cursor=1313551933690106944
> fetch time = 1.513922
> url /followers/ids/alexa_chung.xml?cursor=1312201296962214944
> fetch time = 1.59179
> url /followers/ids/alexa_chung.xml?cursor=1311363260604388613
> fetch time = 2.259924
> url /followers/ids/alexa_chung.xml?cursor=1310627455188010229
> fetch time = 1.706438
> url /followers/ids/alexa_chung.xml?cursor=1309772694575801646
> fetch time = 1.460413
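A quick sanity check on the slice above (the twelve fetch times copied
verbatim, assuming 5,000 ids per cursored block) lands in the ids-per-second
range John quotes:

```python
import statistics

# Fetch times (seconds) copied from the slice above; each cursored
# block carries up to 5,000 ids.
times = [1.478542, 2.044831, 1.350035, 1.44636, 1.955163, 1.326226,
         1.96824, 1.513922, 1.59179, 2.259924, 1.706438, 1.460413]

mean = statistics.mean(times)
print(f"avg {mean:.2f}s, median {statistics.median(times):.2f}s, "
      f"stddev {statistics.stdev(times):.2f}")
print(f"~{5000 / mean:.0f} ids/sec at 5,000 ids per block")
```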
> On Mon, Jan 4, 2010 at 8:18 PM, PJB <pjbmancun...@gmail.com> wrote:
> > Some quick benchmarks...
> > Grabbed entire social graph for ~250 users, where each user has a
> > number of friends/followers between 0 and 80,000. I randomly used
> > both the cursor and cursor-less API methods.
> > < 5000 ids
> > cursor: 0.72 avg seconds
> > cursorless: 0.51 avg seconds
> > 5000 to 10,000 ids
> > cursor: 1.42 avg seconds
> > cursorless: 0.94 avg seconds
> > 1 to 80,000 ids
> > cursor: 2.82 avg seconds
> > cursorless: 1.21 avg seconds
> > 5,000 to 80,000 ids
> > cursor: 4.28 avg seconds
> > cursorless: 1.59 avg seconds
> > 10,000 to 80,000 ids
> > cursor: 5.23 avg seconds
> > cursorless: 1.82 avg seconds
> > 20,000 to 80,000 ids
> > cursor: 6.82 avg seconds
> > cursorless: 2 avg seconds
> > 40,000 to 80,000 ids
> > cursor: 9.5 avg seconds
> > cursorless: 3 avg seconds
> > 60,000 to 80,000 ids
> > cursor: 12.25 avg seconds
> > cursorless: 3.12 avg seconds
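Taking the averages above at face value, the cursor/cursorless ratio
roughly grows with graph size; a quick tabulation (numbers copied from
the benchmarks above):

```python
# (cursor avg s, cursorless avg s) keyed by follower-count bucket,
# copied from the benchmarks above.
benchmarks = {
    "< 5000":           (0.72, 0.51),
    "5000 to 10,000":   (1.42, 0.94),
    "1 to 80,000":      (2.82, 1.21),
    "5,000 to 80,000":  (4.28, 1.59),
    "10,000 to 80,000": (5.23, 1.82),
    "20,000 to 80,000": (6.82, 2.00),
    "40,000 to 80,000": (9.50, 3.00),
    "60,000 to 80,000": (12.25, 3.12),
}

for bucket, (cur, curless) in benchmarks.items():
    print(f"{bucket:>16}: cursor/cursorless = {cur / curless:.1f}x")
```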
> > On Jan 4, 7:58 pm, Jesse Stay <jesses...@gmail.com> wrote:
> >> Ditto PJB :-)
> >> On Mon, Jan 4, 2010 at 8:12 PM, PJB <pjbmancun...@gmail.com> wrote:
> >> > I think that's like asking someone: why do you eat food? But don't say
> >> > because it tastes good or nourishes you, because we already know
> >> > that! ;)
> >> > You guys presumably set the 5000 ids per cursor limit by analyzing
> >> > your user base and noting that one could still obtain the social graph
> >> > for the vast majority of users with a single call.
> >> > But this is a bit misleading. For analytics-based apps, which aim to
> >> > do near real-time analysis of relationships, the focus is typically on
> >> > consumer brands that have a far larger than average number of
> >> > relationships (e.g., 50k - 200k).
> >> > This means that those apps are neck-deep in cursor-based stuff, and
> >> > quickly realize the existing drawbacks, including, in order of
> >> > significance:
> >> > - Latency. Fetching ids for a user with 3000 friends is comparable
> >> > between the two calls. But as you go past 5000, the gap quickly
> >> > grows to 5x or more (I will include more benchmarks in a short
> >> > while). For example, fetching 80,000 friends via the get-all
> >> > method takes on average 3 seconds; it takes, on average, 15 seconds
> >> > with cursors.
> >> > - Code complexity & elegance. I would say that there is a 3x increase
> >> > in code lines to account for cursors, from retrying failed cursors, to
> >> > caching to account for cursor slowness, to UI changes to coddle
> >> > impatient users.
> >> > - Incomprehensibility. While there are obviously very good reasons
> >> > from Twitter's perspective (performance) for the cursor-based model,
> >> > there really is no apparent benefit to API users for the ids
> >> > calls. I would make the case that a large majority of API uses of the
> >> > ids calls need and require the entire social graph, not an incomplete
> >> > one. After all, we need to know what new relationships exist, but
> >> > also what old relationships have ended. To dole out the data in
> >> > dribs and drabs is like serving a pint of beer in sippy cups. That is
> >> > to say: most users need the entire social graph, so what is the use
> >> > case, from an API user's perspective, of NOT maintaining at least one
> >> > means to quickly, reliably, and efficiently get it in a single call?
> >> > - API Barriers to entry. Most of the aforementioned arguments are
> >> > obviously from an API user's perspective, but there's something, too,
> >> > for Twitter to consider. Namely, by increasing the complexity and
> >> > learning curve of particular API actions, you presumably further limit
> >> > the pool of developers who will engage with that API. That's probably
> >> > a bad thing.
> >> > - Limits Twitter 2.0 app development. This, again, speaks to issues
> >> > bearing on speed and complexity, but I think it is important. The
> >> > first few apps in any given media or innovation invariably have to do
> >> > with basic functionality building blocks -- tweeting, following,
> >> > showing tweets. But the next wave almost always has to do with
> >> > measurement and analysis. By making such analysis more difficult, you
> >> > forestall the critically important ability for brands, and others, to
> >> > measure performance.
> >> > - API users have requested it. Shouldn't, ultimately, the use case
> >> > for a particular API method simply be the fact that a number of API
> >> > developers have requested that it remain?
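The retry-and-resume complexity described above (retrying failed cursors,
and so on) ends up wrapped in something roughly like this; a minimal
sketch, where `fetch_page` is a hypothetical stand-in for the actual
HTTP call:

```python
import time

def fetch_with_retries(fetch_page, max_retries=3, base_delay=1.0):
    """Collect a full cursored id list, retrying each failed page.

    fetch_page(cursor) -> (ids, next_cursor) is a hypothetical stand-in
    for the real HTTP call; next_cursor == 0 marks the last page.
    """
    ids, cursor = [], -1
    while True:
        for attempt in range(max_retries):
            try:
                block, next_cursor = fetch_page(cursor)
                break
            except IOError:
                if attempt == max_retries - 1:
                    raise  # give up after max_retries failures on one page
                time.sleep(base_delay * 2 ** attempt)  # exponential backoff
        ids.extend(block)
        if next_cursor == 0:
            return ids
        cursor = next_cursor

# Toy usage: two pages, no failures.
pages = {-1: ([1, 2, 3], 7), 7: ([4, 5], 0)}
assert fetch_with_retries(lambda c: pages[c]) == [1, 2, 3, 4, 5]
```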
> >> > On Jan 4, 2:07 pm, Wilhelm Bierbaum <wilh...@twitter.com> wrote:
> >> > > Can everyone contribute their use case for this API method? I'm
> >> > > trying to fully understand the deficiencies of the cursor approach.
> >> > > Please don't include that cursors are slow or that they are charged
> >> > > against the rate limit, as those are known issues.
> >> > > Thanks.