Jesse,

My surprise shouldn't be a surprise. I'm sure the platform team is well aware of the issues.
The fact that it works at 200k users could very well be inherently unstable. Minor changes to the system elsewhere could cause this number to drop without anyone knowing. We don't monitor this "breaks at" threshold in production, and we certainly don't manage the cluster to preserve such a threshold. I'd doubt that this is testable in development. In practice, should we support this, it could be difficult to guarantee such a high threshold as various systems approach their capacity limits. The most reliable approach is to make all calls approximately the same "cost" and manage the system to provide smooth delivery at that cost per request.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.

On Tue, Jan 5, 2010 at 1:09 AM, Jesse Stay <jesses...@gmail.com> wrote:
> If I can suggest you keep it backwards-compatible that would make much more
> sense. I think we're all aware that over 200,000 or so followers it breaks.
> So what if you kept the cursor-less nature, treat it like a cursor, but set
> the returned cursor cap to be 200,000 per cursor? Or if it needs to be
> smaller (again, I think it would be much less bandwidth and process-time to
> just keep it a high, sustainable number rather than having to traverse
> multiple times to get that), maybe just return only the last 200,000 if no
> cursor is specified? This way those that aren't aware of the change aren't
> affected, new methods can be put into place, documentation can be updated to
> reflect the deprecated methods, and everyone's happy.
>
> I'm a little surprised at the surprise by the Twitter team here. If you guys
> need an account on one of my servers to test this stuff I'm happy to
> provide. :-) Hopefully you guys can trust us as much as we trust you. I'm
> always happy to provide examples and help though. I recognize you guys are
> all working your tails off there.
> (I say this as I wear my "wearing my
> Twitter shirt" proudly)
> Jesse
>
> On Tue, Jan 5, 2010 at 1:35 AM, John Kalucki <j...@twitter.com> wrote:
>>
>> And so it is. Given the system implementation, I'm quite surprised
>> that the cursorless call returns results with acceptable reliability,
>> especially during peak system load. The documentation attempts to
>> convey that the cursorless approach is risky. "all IDs are attempted
>> to be returned, but large sets of IDs will likely fail with timeout
>> errors." When documentation says "attempted" and "fail with timeout
>> errors", it doesn't take too much reading between the lines to infer
>> that this is a best effort call. Building upon a risky dependency has,
>> well, risks. (The passive voice, on the other hand, is a lowly crime.)
>>
>> I also agree that the cursored approach as currently implemented is
>> quite problematic. To increase throughput, I'd support increasing the
>> block size somewhat, but the boundless behavior of the cursorless
>> unauthenticated call just has to go. The combination of these changes
>> should reduce both query and memory pressure on the front end, which,
>> in theory, if not in practice, should lead to a better overall
>> experience. I'd imagine that there are complications, and numbers to
>> be run, and trade-offs to be made.
>>
>> Trust that the platform people are trading off many competing
>> interests and that there isn't a single capricious bone in their
>> collective body.
>>
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Services, Twitter Inc.
>>
>>
>> On Mon, Jan 4, 2010 at 10:40 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >
>> > As noted in this thread, the fact that cursor-less methods for
>> > friends/followers ids will be deprecated was newly announced on December 22.
>> >
>> > In fact, the API documentation still clearly indicates that cursors
>> > are optional, and that their absence will return a complete social
>> > graph.
>> > E.g.:
>> >
>> > http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-followers%C2%A0ids
>> >
>> > ("If the cursor parameter is not provided, all IDs are attempted to be
>> > returned")
>> >
>> > The example at the bottom of that page gives a good example of
>> > retrieving 300,000+ ids in several seconds:
>> >
>> > http://twitter.com/followers/ids.xml?screen_name=dougw
>> >
>> > Of course, retrieving 20-40k users is significantly faster.
>> >
>> > Again, many of us have built apps around cursor-less API calls. To
>> > now deprecate them, with just a few days' warning over the holidays, is
>> > clearly inappropriate and uncalled for. Similarly, to announce that
>> > we must now expect 5x slowness when doing the same calls, when these
>> > existing methods work well, is shocking.
>> >
>> > Many developers live and die by the API documentation. It's a really
>> > fouled-up situation when the API documentation is so totally wrong,
>> > right?
>> >
>> > I urge those folks addressing this issue to preserve the cursor-less
>> > methods. Barring that, I urge them to return at least 25,000 ids per
>> > cursor (as you note, time progression has made 5000 per call
>> > antiquated and ineffective for today's Twitter user) and grant at
>> > least 3 months before deprecation.
>> >
>> > On Jan 4, 10:23 pm, John Kalucki <j...@twitter.com> wrote:
>> >> The "existing" APIs stopped providing accurate data about a year ago
>> >> and degraded substantially over a period of just a few months. Now the
>> >> only data store for social graph data requires cursors to access
>> >> complete sets. Pagination is just not possible with the same latency
>> >> at this scale without an order of magnitude or two increase in cost.
>> >> So, instead of hardware "units" in the tens and hundreds, think about
>> >> the same in the thousands and tens of thousands.
>> >>
>> >> These APIs and their now decommissioned backing stores were developed
>> >> when having 20,000 followers was a lot.
>> >> We're an order of magnitude or
>> >> two beyond that point along nearly every dimension. Accounts.
>> >> Followers per account. Tweets per second. Etc. As systems evolve, some
>> >> evolutionary paths become extinct.
>> >>
>> >> Given boundless resources, the best we could do for a REST API, as
>> >> Marcel has alluded, is to do the cursoring for you and aggregate many
>> >> blocks into much larger responses. This wouldn't work very well for at
>> >> least two immediate reasons: 1) Running a system with multimodal
>> >> service times is a nightmare -- we'd have to provision a specific
>> >> endpoint for such a resource. 2) Ruby GC chokes on lots of objects.
>> >> We'd have to consider implementing this resource in another stack, or
>> >> do a lot of tuning. All this to build the opposite of what most
>> >> applications want: a real-time stream of graph deltas for a set of
>> >> accounts, or the list of recent set operations since the last poll --
>> >> and rarely, if ever, the entire following set.
>> >>
>> >> Also, I'm a little rusty on the details on the social graph api, but
>> >> please detail which public resources allow retrieval of 40,000
>> >> followers in two seconds. I'd be very interested in looking at the
>> >> implementing code on our end. A curl timing would be nice (time curl
>> >> URL > /dev/null) too.
>> >>
>> >> -John Kalucki
>> >> http://twitter.com/jkalucki
>> >> Services, Twitter Inc.
>> >>
>> >> On Mon, Jan 4, 2010 at 9:18 PM, PJB <pjbmancun...@gmail.com> wrote:
>> >>
>> >> > On Jan 4, 8:58 pm, John Kalucki <j...@twitter.com> wrote:
>> >> >> at the moment). So, it seems that we're returning the data over home
>> >> >> DSL at between 2,500 and 4,000 ids per second, which seems like a
>> >> >> perfectly reasonable rate and variance.
>> >>
>> >> > It's certainly not reasonable to expect it to take 10+ seconds to get
>> >> > 25,000 to 40,000 ids, PARTICULARLY when existing methods, for whatever
>> >> > reason, return the same data in less than 2 seconds. Twitter is being
>> >> > incredibly short-sighted if they think this is indeed reasonable.
>> >>
>> >> > Some of us have built applications around your EXISTING APIs, and to
>> >> > now suggest that we may need formal "business relationships" to
>> >> > continue to use such APIs is seriously disquieting.
>> >>
>> >> > Disgusted...
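To make the cursored pattern argued over in this thread concrete, here is a minimal Python sketch. It does not talk to Twitter: `make_fake_endpoint`, `fetch_all_ids`, `calls_needed`, and `transfer_seconds` are hypothetical names, the in-memory endpoint only stands in for a cursored followers/ids call, and the cursor conventions it uses (-1 requests the first page; a `next_cursor` of 0 means there are no more pages) are assumptions for illustration, not a description of Twitter's implementation. It shows how the number of round trips scales with block size and works through the 2,500 to 4,000 ids per second figure quoted in the thread.

```python
import math
from typing import Callable, List, Tuple

# A page is (ids, next_cursor). Assumption for this sketch: cursor == -1
# requests the first page and next_cursor == 0 means "no more pages".
Page = Tuple[List[int], int]


def make_fake_endpoint(all_ids: List[int], page_size: int) -> Callable[[int], Page]:
    """Hypothetical in-memory stand-in for a cursored followers/ids call.

    Here the cursor is simply (offset + 1), so that 0 stays free to mean
    "done". Real cursors should be treated as opaque values.
    """
    def fetch(cursor: int) -> Page:
        start = 0 if cursor == -1 else cursor - 1
        end = start + page_size
        next_cursor = end + 1 if end < len(all_ids) else 0
        return all_ids[start:end], next_cursor

    return fetch


def fetch_all_ids(fetch: Callable[[int], Page]) -> Tuple[List[int], int]:
    """Follow next_cursor until it is 0; return all ids and the number of calls."""
    ids: List[int] = []
    cursor, calls = -1, 0
    while True:
        page, cursor = fetch(cursor)
        calls += 1
        ids.extend(page)
        if cursor == 0:
            return ids, calls


def calls_needed(n_ids: int, page_size: int) -> int:
    """Round trips needed to page through n_ids at a fixed block size (at least one)."""
    return max(1, math.ceil(n_ids / page_size))


def transfer_seconds(n_ids: int, ids_per_second: float) -> float:
    """Wall-clock time at a steady aggregate throughput (ignores per-call overhead)."""
    return n_ids / ids_per_second


if __name__ == "__main__":
    followers = list(range(1, 40_001))  # a made-up 40,000-follower account
    for size in (5_000, 25_000):
        ids, calls = fetch_all_ids(make_fake_endpoint(followers, size))
        print(f"block size {size}: {len(ids)} ids in {calls} calls")
    for n in (25_000, 40_000):
        for rate in (2_500, 4_000):
            print(f"{n} ids at {rate} ids/s: {transfer_seconds(n, rate):.2f} s")
```

Under this simple steady-throughput model, 25,000 to 40,000 ids take between 6.25 and 16 seconds, which brackets the "10+ seconds" figure PJB objects to. A larger block size mainly reduces the number of round trips (8 calls versus 2 for 40,000 ids in this sketch); the model ignores per-call latency, which is where fewer, larger blocks would help.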