John Kalucki Tue, 05 Jan 2010 00:35:42 -0800

And so it is. Given the system implementation, I'm quite surprised
that the cursorless call returns results with acceptable reliability,
especially during peak system load. The documentation attempts to
convey that the cursorless approach is risky. "all IDs are attempted
to be returned, but large sets of IDs will likely fail with timeout
errors." When documentation says "attempted" and "fail with timeout
errors", it doesn't take too much reading between the lines to infer
that this is a best effort call. Building upon a risky dependency has,
well, risks. (The passive voice, on the other hand, is a lowly crime.)


I also agree that the cursored approach as currently implemented is
quite problematic. To increase throughput, I'd support increasing the
block size somewhat, but the boundless behavior of the cursorless
unauthenticated call just has to go. The combination of these changes
should reduce both query and memory pressure on the front end, which,
in theory, if not in practice, should lead to a better overall
experience. I'd imagine that there are complications, and numbers to
be run, and trade-offs to be made.
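
For readers following the thread, here is a minimal sketch of the cursored loop under discussion. It assumes the convention that a client starts with cursor -1 and stops when next_cursor comes back as 0. The names iter_ids, fetch_page and make_fake_fetcher are hypothetical, and the fake fetcher uses list offsets as cursors purely for illustration; real cursors should be treated as opaque values, and a real fetch_page would issue an HTTP request.

```python
from typing import Any, Callable, Dict, Iterator, List, Tuple

Page = Dict[str, Any]  # assumed shape: {"ids": [...], "next_cursor": int}


def iter_ids(fetch_page: Callable[[int], Page], start_cursor: int = -1) -> Iterator[int]:
    """Yield every id by following next_cursor until it is 0 (assumed end marker)."""
    cursor = start_cursor
    while cursor != 0:
        page = fetch_page(cursor)
        yield from page["ids"]
        cursor = page["next_cursor"]


def make_fake_fetcher(all_ids: List[int], block_size: int) -> Tuple[Callable[[int], Page], Dict[str, int]]:
    """Hypothetical stand-in for the HTTP call; serves all_ids in fixed-size blocks."""
    calls = {"count": 0}

    def fetch_page(cursor: int) -> Page:
        calls["count"] += 1
        # Illustration only: the cursor doubles as a list offset here.
        start = 0 if cursor == -1 else cursor
        end = start + block_size
        next_cursor = end if end < len(all_ids) else 0
        return {"ids": all_ids[start:end], "next_cursor": next_cursor}

    return fetch_page, calls


if __name__ == "__main__":
    fetch, calls = make_fake_fetcher(list(range(12000)), block_size=5000)
    ids = list(iter_ids(fetch))
    print(len(ids), "ids in", calls["count"], "requests")
```

The number of round trips grows as the follower count divided by the block size, which is why the block size matters so much for throughput in this debate.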

Trust that the platform people are trading off many competing
interests and that there isn't a single capricious bone in their
collective body.

-John Kalucki
http://twitter.com/jkalucki
Services, Twitter Inc.


On Mon, Jan 4, 2010 at 10:40 PM, PJB <[email protected]> wrote:
>
> As noted in this thread, the fact that cursor-less methods for friends/
> followers ids will be deprecated was newly announced on December 22.
>
> In fact, the API documentation still clearly indicates that cursors
> are optional, and that their absence will return a complete social
> graph.  E.g.:
>
> http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-followers%C2%A0ids
>
> ("If the cursor parameter is not provided, all IDs are attempted to be
> returned")
>
> The example at the bottom of that page gives a good example of
> retrieving 300,000+ ids in several seconds:
>
> http://twitter.com/followers/ids.xml?screen_name=dougw
>
> Of course, retrieving 20-40k users is significantly faster.
>
> Again, many of us have built apps around cursor-less API calls.  To
> now deprecate them, with just a few days' warning over the holidays, is
> clearly inappropriate and uncalled for.  Similarly, to announce that
> we must now expect 5x slowness when doing the same calls, when these
> existing methods work well, is shocking.
>
> Many developers live and die by the API documentation.  It's a really
> fouled-up situation when the API documentation is so totally wrong,
> right?
>
> I urge those folks addressing this issue to preserve the cursor-less
> methods.  Barring that, I urge them to return at least 25,000 ids per
> cursor (as you note, time progression has made 5000 per call
> antiquated and ineffective for today's Twitter user) and grant at
> least 3 months before deprecation.
>
> On Jan 4, 10:23 pm, John Kalucki <[email protected]> wrote:
>> The "existing" APIs stopped providing accurate data about a year ago
>> and degraded substantially over a period of just a few months. Now the
>> only data store for social graph data requires cursors to access
>> complete sets. Pagination is just not possible with the same latency
>> at this scale without an order of magnitude or two increase in cost.
>> So, instead of hardware "units" in the tens and hundreds, think about
>> the same in the thousands and tens of thousands.
>>
>> These APIs and their now decommissioned backing stores were developed
>> when having 20,000 followers was a lot. We're an order of magnitude or
>> two beyond that point along nearly every dimension. Accounts.
>> Followers per account. Tweets per second. Etc. As systems evolve, some
>> evolutionary paths become extinct.
>>
>> Given boundless resources, the best we could do for a REST API, as
>> Marcel has alluded, is to do the cursoring for you and aggregate many
>> blocks into much larger responses. This wouldn't work very well for at
>> least two immediate reasons: 1) Running a system with multimodal
>> service times is a nightmare -- we'd have to provision a specific
>> endpoint for such a resource. 2) Ruby GC chokes on lots of objects.
>> We'd have to consider implementing this resource in another stack, or
>> do a lot of tuning. All this to build the opposite of what most
>> applications want: a real-time stream of graph deltas for a set of
>> accounts, or the list of recent set operations since the last poll --
>> and rarely, if ever, the entire following set.
>>
>> Also, I'm a little rusty on the details on the social graph api, but
>> please detail which public resources allow retrieval of 40,000
>> followers in two seconds. I'd be very interested in looking at the
>> implementing code on our end. A curl timing would be nice (time curl
>> URL > /dev/null) too.
>>
>> -John Kalucki
>> http://twitter.com/jkalucki
>> Services, Twitter Inc.
>>
>> On Mon, Jan 4, 2010 at 9:18 PM, PJB <[email protected]> wrote:
>>
>> > On Jan 4, 8:58 pm, John Kalucki <[email protected]> wrote:
>> >> at the moment). So, it seems that we're returning the data over home
>> >> DSL at between 2,500 and 4,000 ids per second, which seems like a
>> >> perfectly reasonable rate and variance.
>>
>> > It's certainly not reasonable to expect it to take 10+ seconds to get
>> > 25,000 to 40,000 ids, PARTICULARLY when existing methods, for whatever
>> > reason, return the same data in less than 2 seconds.  Twitter is being
>> > incredibly short-sighted if they think this is indeed reasonable.
>>
>> > Some of us have built applications around your EXISTING APIs, and to
>> > now suggest that we may need formal "business relationships" to
>> > continue to use such APIs is seriously disquieting.
>>
>> > Disgusted...
>>
>>
>

Re: [twitter-dev] Re: Social Graph API: Legacy data format will be eliminated 1/11/2010