[twitter-dev] Re: New cursor methods are way too slow

Michael Steuer Tue, 20 Oct 2009 11:03:34 -0700

Hi,

The reason why I'm using followers/ids and then users/show is efficiency:


I'm maintaining a local cache of my users' social graphs. I'm also maintaining
local user objects for my users and for their followers. Since both the
social graph and user info are subject to change, both need periodic
updating. The way I'm doing that now is as follows:

1. I request followers/ids for each of my users
2. If I detect new followers, I add them to that user's social graph; if I
detect that followers were removed, I remove them from that user's social graph
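
A minimal sketch of steps 1 and 2 above, assuming the cached and freshly
fetched follower IDs are available as plain lists of numeric IDs. The function
name diff_followers and the sample IDs are illustrative, not part of any
Twitter library:

```python
def diff_followers(cached_ids, fetched_ids):
    """Compare the cached follower IDs with a fresh followers/ids result.

    Returns (added, removed) as sets of numeric IDs.
    """
    cached = set(cached_ids)
    fetched = set(fetched_ids)
    added = fetched - cached      # new followers to add to the local graph
    removed = cached - fetched    # followers to drop from the local graph
    return added, removed


# Hypothetical data: the cache holds 1, 2, 3; the API now returns 2, 3, 4, 5.
added, removed = diff_followers([1, 2, 3], [2, 3, 4, 5])
```

Only the IDs in `added` would need a user object later on, and only if they
are not already in the local user table.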

Subsequently I scan my user object table for users who:
1. have info that hasn't been updated in X days
2. have no info because they were added as numeric IDs only via the
followers/ids method described above

I then request users/show for each user matching condition 1 or 2 above.
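
The selection of users to refresh could look roughly like this. It is a
sketch only: the table layout (a dict keyed by numeric ID with an
"updated_at" field that is None for ID-only entries) and the name
users_needing_refresh are assumptions made for illustration:

```python
from datetime import datetime, timedelta


def users_needing_refresh(user_table, now, max_age_days):
    """Return the IDs matching condition 1 (stale info) or 2 (no info yet)."""
    cutoff = now - timedelta(days=max_age_days)
    to_refresh = []
    for user_id, record in user_table.items():
        updated_at = record.get("updated_at")
        # None means the user was added as a numeric ID only (condition 2);
        # an old timestamp means the info has expired (condition 1).
        if updated_at is None or updated_at < cutoff:
            to_refresh.append(user_id)
    return sorted(to_refresh)


# Hypothetical table: 10 is fresh, 11 is stale, 12 was added by ID only.
now = datetime(2009, 10, 20)
table = {
    10: {"updated_at": datetime(2009, 10, 19)},
    11: {"updated_at": datetime(2009, 9, 1)},
    12: {"updated_at": None},
}
stale = users_needing_refresh(table, now, max_age_days=7)
```

Each ID in `stale` then costs one users/show call.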

This way, I only get an updated user object for each unique user once, when
they're first added, or when I expire a previous update to their info. When
I get the followers of another new user, chances are I already have user
information for the majority of their followers.

I'm not using statuses/followers because I would be getting the same
information over and over again. Especially when you're talking about users
with a lot of followers, it's really inefficient, considering you probably
already store user info on most of the user's followers. statuses/followers
would be equally efficient if there were no overlap in followers. Since there
is, I believe my approach is more efficient, and faster over time, as your
user database grows and you're basically just querying the social graph.

ALL THAT SAID, I would LOVE to have a method that allows me to get user
objects in batch. If I could request 100 user objects by numeric id in one
API call, the above would be far more efficient and result in far fewer
calls to Twitter.
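
To put rough numbers on the batch idea: no such batch method exists in the
API as discussed in this thread, so the sketch below only shows how IDs would
be grouped and how the call count would change. The names chunk_ids and
calls_needed and the batch size of 100 are illustrative assumptions:

```python
def chunk_ids(ids, size=100):
    """Split a sequence of numeric IDs into lists of at most `size` IDs."""
    ids = list(ids)
    return [ids[i:i + size] for i in range(0, len(ids), size)]


def calls_needed(n_ids, batch_size=100):
    """Number of API calls needed for n_ids at batch_size IDs per call."""
    return -(-n_ids // batch_size)  # ceiling division


# Hypothetical case from the thread: 7000 follower IDs to resolve.
one_at_a_time = calls_needed(7000, batch_size=1)
batched = calls_needed(7000, batch_size=100)
batches = chunk_ids(range(250))
```

With one users/show call per ID, 7000 IDs cost 7000 calls; at 100 IDs per
call the same work would take 70.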

I am definitely interested in your feedback on my logic above and whether you
think it holds.

Thanks!

Michael.


On 10/15/09 7:02 PM, "Tim Haines" <[email protected]> wrote:

> FYI, My backend cares.  
> 
> On Fri, Oct 16, 2009 at 2:07 PM, jmathai <[email protected]> wrote:
>> 
>> I'm curious why you're using followers/ids and then users/show for
>> each id?  I tried using that and using statuses/followers and found
>> that the total times were in the same ballpark.  statuses/followers
>> requires far fewer api calls if you're interested in user objects.
>> 
>> FYI, I do want to add and say I agree that either method is EXTREMELY
>> inefficient.  Regardless what the argument against pages and for
>> cursors are...the current implementation is painful from an end user
>> perspective.  Our backend doesn't really care, but our users don't
>> like to wait 10-30 minutes for a web page to gather a social graph.
>> 
>> I wish instead of a cursor I could get a snapshot id, # of pages and a
>> page parameter.  I don't know how it's implemented, but the ability to
>> deterministically parallelize the calls - is such a benefit to the end
>> user.  Pages let me do that.
>> 
>> On Oct 15, 9:17 am, Michael Steuer <[email protected]> wrote:
>>> > That's great!! I'm currently using the suggested method (get IDs, then do
>>> > users/show for each of them) and it's horrendously slow and cumbersome.
>>> > It'd be great if you could get a 100 user objects at the time, based on
>>> > 100 ids you provide..
>>> >
>>> > On 10/14/09 7:30 PM, "Chad Etzel" <[email protected]> wrote:
>>> >
>>> >
>>> >
>>>> > > I agree. I'm lobbying the team for something like this.
>>>> > > -Chad
>>> >
>>>> > > On Wed, Oct 14, 2009 at 10:21 PM, Josh Roesslein <[email protected]> wrote:
>>> >
>>>>> > >> Yeah we really need a way to bulk request user payloads by giving a
>>>>> > >> list of IDs.
>>> >
>>>>> > >> On Wed, Oct 14, 2009 at 9:19 PM, Tim Haines <[email protected]> wrote:
>>> >
>>>>>> > >>> Are you suggesting I should retrieve the 2k users 1 at a time from
>>>>>> > >>> users/show once I have the ids?  I'd essentially like to do this,
>>>>>> > >>> but 100 at a time.
>>> >
>>>>>> > >>> I know I can get the 7000 ids in 2 calls (1 even without the
>>>>>> > >>> cursors) - but I actually want the whole user objects..
>>> >
>>>>>> > >>> Tim.
>>> >
>>>>>> > >>> On Oct 15, 2:56 pm, Chad Etzel <[email protected]> wrote:
>>>>>>> > >>>> If you are pulling down the entire social graph, why not use the
>>>>>>> > >>>> social graph calls which would deliver all 7000 ids in 2 calls?
>>> >
>>>>>>> > >>>> You can also parallelize this process by looping through
>>>>>>> > >>>> different users on each thread instead of using each thread to
>>>>>>> > >>>> grab a different page/cursor of the same user.
>>> >
>>>>>>> > >>>> Regarding the code issue you submitted, if you have the users
>>>>>>> > >>>> cached locally, you could use the social graph methods to
>>>>>>> > >>>> determine the missing/new 2k users pretty quickly using the
>>>>>>> > >>>> social graph methods and comparing ids.
>>> >
>>>>>>> > >>>> -Chad
>>> >
>>>>>>> > >>>> On Wed, Oct 14, 2009 at 9:50 PM, Tim Haines <[email protected]> wrote:
>>> >
>>>>>>>> > >>>>> Hi Chad,
>>> >
>>>>>>>> > >>>>> Statuses/followers.
>>> >
>>>>>>>> > >>>>> I've just timed another attempt - it took 25 minutes to
>>>>>>>> > >>>>> retrieve 17957 followers with statuses/followers.
>>> >
>>>>>>>> > >>>>> Is there anything I can elaborate on in the filed issue to make
>>>>>>>> > >>>>> it clearer?
>>> >
>>>>>>>> > >>>>> Tim.
>>> >
>>>>>>>> > >>>>> On Oct 15, 2:42 pm, Chad Etzel <[email protected]> wrote:
>>>>>>>>> > >>>>>> Hi Tim,
>>> >
>>>>>>>>> > >>>>>> You said "Retrieving 7000 followers just took > 20 minutes
>>>>>>>>> > >>>>>> for me." Can you explain what you meant by that?
>>> >
>>>>>>>>> > >>>>>> Are you using the friends/ids, followers/ids methods or the
>>>>>>>>> > >>>>>> statuses/friends, statuses/followers methods?
>>> >
>>>>>>>>> > >>>>>> -Chad
>>> >
>>>>>>>>> > >>>>>> On Wed, Oct 14, 2009 at 8:12 PM, Tim Haines <[email protected]> wrote:
>>> >
>>>>>>>>>> > >>>>>>> Hi'ya,
>>> >
>>>>>>>>>> > >>>>>>> I'm migrating my code to use cursors at the moment.  It's
>>>>>>>>>> > >>>>>>> frustrating that calls need to be synchronous rather than how
>>>>>>>>>> > >>>>>>> paged calls could be asynchronous.  Retrieving 7000 followers
>>>>>>>>>> > >>>>>>> just took > 20 minutes for me.
>>> >
>>>>>>>>>> > >>>>>>> I filed an issue that proposes a solution here:
>>>>>>>>>> > >>>>>>> http://code.google.com/p/twitter-api/issues/detail?id=1078
>>>>>>>>>> > >>>>>>> If you retrieve friends or followers, please take a look and
>>>>>>>>>> > >>>>>>> give it a star if it's important to you.
>>> >
>>>>>>>>>> > >>>>>>> If anyone can suggest a work around for this, I'd be happy
>>>>>>>>>> > >>>>>>> to hear it.
>>> >
>>>>>>>>>> > >>>>>>> Cheers,
>>> >
>>>>>>>>>> > >>>>>>> Tim.
>>> >
>>>>> > >> --
>>>>> > >> Josh
> 
>
