[twitter-dev] Re: social graph methods with a bit more info

2009-03-30 Thread Jesse Stay
On Sun, Mar 29, 2009 at 11:41 PM, Damon Clinkscales sca...@pobox.com wrote:

  How often does this cache update? I'm curious how accurate and reliable
  this would be, since people are constantly modifying their social graph.

 In the case of the id/screen_name thing, the data wouldn't change
 much. Ideally, there'd be a way of forcing an update from Twitter in
 the case of known/suspected stale data.  As to keeping up with the
 social graph, I think the current social graph methods are
 sufficient/wonderful for that.


Ah, okay, so it's not necessarily a grab of the social graph then, but
rather a user cache.  If that's what it is, I have a similar-sized cache
that I could make available as well, assuming Twitter were to start
allowing this. I'd be really surprised if they did, though.

There is still the problem of keeping the data up to date, though.  People
change their images, location, description, tweets, number of
followers/friends, etc. quite often.  I think Twitter could provide a cache
of this data a lot faster than they could provide an easy way to force
updates on stale data.  It sure would be nice, though: I wouldn't have to
make as many calls to Twitter if they had a better way to get just the
user updates.

Jesse


[twitter-dev] Re: social graph methods with a bit more info

2009-03-30 Thread TechRavingMad

You can always provide your own cache.  It doesn't take that much to
get a complete name-ID cache locally.  What does take a lot of calls
is keeping it up to date.  Since users can change the screen name
attached to an ID, the cache isn't always accurate (though the ID itself
never changes).
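Because the ID is the stable key and the screen name is only an attribute, a local cache can treat a rename as an update to one ID rather than a new user. A minimal Python sketch, assuming screen names compare case-insensitively; the class and method names are hypothetical, not part of any Twitter library:

```python
class IdNameCache:
    """Hypothetical local cache: stable user ID <-> current screen name."""

    def __init__(self):
        self._name_by_id = {}  # user_id -> current screen_name
        self._id_by_name = {}  # lowercased screen_name -> user_id

    def put(self, user_id, screen_name):
        key = screen_name.lower()
        # If another ID held this name, that ID's cached name is now stale.
        previous_owner = self._id_by_name.get(key)
        if previous_owner is not None and previous_owner != user_id:
            self._name_by_id.pop(previous_owner, None)
        # If this ID had a different name, drop the old reverse entry,
        # but only while it still points at this ID.
        old = self._name_by_id.get(user_id)
        if old is not None and self._id_by_name.get(old.lower()) == user_id:
            del self._id_by_name[old.lower()]
        self._name_by_id[user_id] = screen_name
        self._id_by_name[key] = user_id

    def name_for(self, user_id):
        return self._name_by_id.get(user_id)

    def id_for(self, screen_name):
        return self._id_by_name.get(screen_name.lower())
```

After `put(1, "foo")` and then `put(1, "bar")`, a lookup of `"foo"` misses while ID 1 still resolves, which is the "ID never changes" property doing the work.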

It's a huge task to get that initial scrape; it takes about 2 months
depending on your access, but it's doable.

You could cut that time significantly if we could make more calls per
hour, or if Twitter made just that information available in a firehose
format where you could download the entire list at once.  It'll be a big
file; there are almost 30 million user IDs now.
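Rough arithmetic behind the "about 2 months" figure, as a small Python sketch. The 30 million users come from the message; the rate of 20,000 requests per hour and one user fetched per request are assumptions (roughly a whitelisted account looking users up one at a time), not numbers stated in the thread:

```python
def scrape_days(total_users, requests_per_hour, users_per_request=1):
    """Rough days needed to fetch every user once at a fixed hourly rate limit."""
    requests_needed = -(-total_users // users_per_request)  # ceiling division
    hours = requests_needed / requests_per_hour
    return hours / 24


print(scrape_days(30_000_000, 20_000))  # 62.5 days, about two months
print(scrape_days(30_000_000, 40_000))  # 31.25 days with twice the hourly limit
```

The time scales inversely with the hourly limit, which is why more calls per hour (or a bulk download) would cut the initial scrape so sharply.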


On Mar 29, 4:38 pm, Jesse Stay jesses...@gmail.com wrote:
 If Twitter's going to allow this, why don't they just do it themselves and
 provide more accurate and up-to-date info?  How often does this cache
 update? I'm curious how accurate and reliable this would be, since people
 are constantly modifying their social graph.

 Alex and crew have already said they might be able to provide more info once
 they fully convert over to their new architecture.  My hope is that once
 they're able to do that I can just pull subsets of each social graph down,
 such as the number of new followers since date x, or other criteria.  An
 FQL-type language (similar to Facebook's) would be ideal for something like
 that.

 Jesse

 [ snip ] ...


[twitter-dev] Re: social graph methods with a bit more info

2009-03-30 Thread softprops



On Mar 30, 3:32 am, Jesse Stay jesses...@gmail.com wrote:
 On Sun, Mar 29, 2009 at 11:41 PM, Damon Clinkscales sca...@pobox.com wrote:

   How often does this cache update? I'm curious how accurate and reliable
   this would be, since people are constantly modifying their social graph.

  In the case of the id/screen_name thing, the data wouldn't change
  much. Ideally, there'd be a way of forcing an update from Twitter in
  the case of known/suspected stale data.  As to keeping up with the
  social graph, I think the current social graph methods are
  sufficient/wonderful for that.

 Ah, okay - so it's not necessarily a grab of the social graph then, but
 rather a user cache.  If that's what it is I have a similar-sized cache,
 assuming Twitter were to start allowing this, I could make available as
 well. I'd be really surprised if they started to allow this though.

 Although there is still the problem of keeping the data up-to-date.  People
 change their images, location, description, Tweets, number of
 followers/friends, etc. quite often.  I think Twitter could provide a cache
 of this data a lot faster than they could provide a way to easily force
 updates on stale data.  It sure would be nice though - I wouldn't have to
 make as many calls out to Twitter if they had a better way to get just the
 user updates.

I think that is the point/trade-off. What is the real cost to Twitter
of developers making more calls for small chunks of data vs. fewer
calls for a more customized set of data? It's less HTTP traffic but a
bigger payload. I guess it also depends on how the data is cached. As
Alex mentioned in the link above, "As they are, we fetch data from a
single data store in our architecture to return the lists of IDs. In
order to provide usernames, we'd have to bog down this request by
joining together multiple sources of data." It would require a bit of
rearchitecting on their part before I think we'd see a compromise. The
major difficulty, again, is maintaining the freshness of the data as
users change their screen names, among other things. Probably easier
said than done.
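One way to read the trade-off: do the join on the client instead of on Twitter's side. Take the plain list of IDs from the ids method, fill in names from a local cache, and only spend extra API calls on the misses. A minimal Python sketch; the function name and the dict-based cache are illustrative assumptions:

```python
def hydrate(ids, name_by_id):
    """Attach cached screen names to a list of user IDs.

    Returns (resolved, missing): resolved is a list of (id, screen_name)
    pairs in the original order; missing lists the IDs that are not cached
    and would still need a per-user lookup against the API.
    """
    resolved, missing = [], []
    for user_id in ids:
        name = name_by_id.get(user_id)
        if name is None:
            missing.append(user_id)
        else:
            resolved.append((user_id, name))
    return resolved, missing
```

With a warm cache, `missing` stays short, so the extra traffic is proportional to what changed rather than to the size of the whole list.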

It would be great if Twitter did start opening up the caching of user
data to other services, and perhaps provided web hooks that fire when
an external service's cache should be updated.
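On the receiving side, such a web hook could be little more than "apply one change to the cache". Twitter offered no such hook at the time, so the payload shape below (`user_id`, `screen_name`, optional `deleted`) is entirely hypothetical; this is only a Python sketch of the idea:

```python
import json


def apply_update_event(body, name_by_id):
    """Apply one hypothetical 'user changed' web hook payload to a cache.

    Assumed payload shape (not a real Twitter API):
        {"user_id": 123, "screen_name": "newname"}
    A payload with "deleted": true evicts the user instead.
    Returns the user_id that was touched.
    """
    event = json.loads(body)
    user_id = int(event["user_id"])
    if event.get("deleted"):
        name_by_id.pop(user_id, None)
    else:
        name_by_id[user_id] = event["screen_name"]
    return user_id
```

Pushing changes this way would replace polling: the external cache only hears about users who actually changed.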





[twitter-dev] Re: social graph methods with a bit more info

2009-03-29 Thread Damon Clinkscales

On Sat, Mar 28, 2009 at 11:47 PM, Damon Clinkscales sca...@pobox.com wrote:
 see

 On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

 It would be nice if the http://twitter.com/[friends|followers]/ids.format
 URIs could return a bit more useful info, like the screen_name.
  [ snip ] ...
 <?xml version="1.0" encoding="UTF-8"?>
 <ids>
  <id screen_name="foo">1</id>
  <id screen_name="bar">2</id>
 </ids>

 They aren't going to do this for performance reasons, even though, yes,
 it would be useful.

 see http://is.gd/ptJ9

 -damon

An alternative solution may be possible, though.

I've recently been reminded that @infochimps has a massive scrape of
the Twitter social graph and is willing to make that available, in
whole or in part. However, they are currently awaiting Twitter's
permission on precisely what can be released.

You can read more about this here -
http://blog.infochimps.org/2008/12/29/massive-scrape-of-twitters-friend-graph/

Assuming that the data is released, even in a limited form, there is
potential there for an id-to-screen_name mapping table which could
serve as a cache primer for apps that need one.  This could
potentially save a bajillion calls against Twitter's API, which in
turn would have other good effects. One of the most notable places
where this is obviously needed is tying Twitter Search results to
Twitter users.  For historical reasons, the user id in a search
result is not the Twitter user_id, so you have to use the screen name.
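A Python sketch of the cache-primer idea: load a released mapping once, then resolve search results by screen name instead of by the search-specific user id. The dump layout (`id,screen_name` CSV rows) is an assumption about what a release might look like, and so is representing each search result as a dict with the author's screen name under `from_user`:

```python
import csv
import io


def load_primer(text):
    """Build a lowercased screen_name -> user_id map from 'id,screen_name' rows.

    The CSV layout is an assumption about a hypothetical released dump.
    """
    id_by_name = {}
    for row in csv.reader(io.StringIO(text)):
        if len(row) != 2:
            continue  # skip blank or malformed lines
        user_id, screen_name = row
        id_by_name[screen_name.strip().lower()] = int(user_id)
    return id_by_name


def resolve_search_results(results, id_by_name):
    """Pair each result's screen name with a real user_id (None if unknown)."""
    return [(r["from_user"], id_by_name.get(r["from_user"].lower()))
            for r in results]
```

Names that miss the primer (new or renamed users) would still fall back to an API lookup, so the primer saves calls rather than eliminating them.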

-damon
--
http://twitter.com/damon


[twitter-dev] Re: social graph methods with a bit more info

2009-03-29 Thread Jesse Stay
If Twitter's going to allow this, why don't they just do it themselves and
provide more accurate and up-to-date info?  How often does this cache
update? I'm curious how accurate and reliable this would be, since people
are constantly modifying their social graph.

Alex and crew have already said they might be able to provide more info once
they fully convert over to their new architecture.  My hope is that once
they're able to do that I can just pull subsets of each social graph down,
such as the number of new followers since date x, or other criteria.  An
FQL-type language (similar to Facebook's) would be ideal for something like
that.
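Until something like that exists, "new followers since date x" can be approximated on the client by saving a snapshot of the followers ID list on date x and diffing a later pull against it. A minimal Python sketch; the snapshots are plain lists of IDs as the ids methods return them, and storing them is left to the app:

```python
def follower_changes(old_ids, new_ids):
    """Compare two follower ID snapshots taken at different times.

    Returns (gained, lost) as sorted lists of user IDs.
    """
    old_set, new_set = set(old_ids), set(new_ids)
    return sorted(new_set - old_set), sorted(old_set - new_set)
```

The cost is that every check still downloads the full ID list, which is exactly what a server-side query language would avoid.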

Jesse

On Sun, Mar 29, 2009 at 1:03 PM, softprops d.tang...@gmail.com wrote:


 Wow! What a great idea. Offloading the burden from Twitter's servers/DBs
 to a simple id-name cache hosted as a separate service on someone
 else's servers. I will have to check that out.

 On Mar 29, 2:52 pm, Damon Clinkscales sca...@pobox.com wrote:
  [ snip ] ...



[twitter-dev] Re: social graph methods with a bit more info

2009-03-29 Thread Damon Clinkscales

On Sun, Mar 29, 2009 at 4:38 PM, Jesse Stay jesses...@gmail.com wrote:

 If Twitter's going to allow this, why don't they just do it themselves and
 provide more accurate and up-to-date info?

Yeah, that'd be nice.  But, given everything going on, it's probably
not a priority right now.

 How often does this cache update? I'm curious how accurate and reliable this
 would be, since people are constantly modifying their social graph.

In the case of the id/screen_name thing, the data wouldn't change
much. Ideally, there'd be a way of forcing an update from Twitter in
the case of known/suspected stale data.  As to keeping up with the
social graph, I think the current social graph methods are
sufficient/wonderful for that.
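Since id/screen_name pairs change rarely, a long expiry plus a way to force a refresh for a suspected-stale entry covers both cases described here. A Python sketch; `fetch` stands in for whatever single-user API call the app uses (hypothetical here), and the 7-day default expiry is an arbitrary choice:

```python
import time


class TtlNameCache:
    """id -> screen_name cache with expiry and a forced-refresh path."""

    def __init__(self, fetch, ttl_seconds=7 * 24 * 3600, clock=time.time):
        self._fetch = fetch          # hypothetical: user_id -> current screen_name
        self._ttl = ttl_seconds      # arbitrary default, tune per app
        self._clock = clock
        self._entries = {}           # user_id -> (screen_name, fetched_at)

    def get(self, user_id, force=False):
        entry = self._entries.get(user_id)
        now = self._clock()
        if force or entry is None or now - entry[1] > self._ttl:
            name = self._fetch(user_id)  # one API call
            self._entries[user_id] = (name, now)
            return name
        return entry[0]
```

Calling `get(uid, force=True)` when, say, a lookup by name returns an unexpected user is the "force an update in the case of suspected stale data" path.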

-damon


[twitter-dev] Re: social graph methods with a bit more info

2009-03-28 Thread Damon Clinkscales

see

On Sat, Mar 28, 2009 at 9:16 PM, softprops d.tang...@gmail.com wrote:

 It would be nice if the http://twitter.com/[friends|followers]/ids.format
 URIs could return a bit more useful info, like the screen_name.
  [ snip ] ...
 <?xml version="1.0" encoding="UTF-8"?>
 <ids>
  <id screen_name="foo">1</id>
  <id screen_name="bar">2</id>
 </ids>

They aren't going to do this for performance reasons, even though, yes,
it would be useful.

see http://is.gd/ptJ9

-damon
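For reference, consuming the proposed (never implemented) format would be simple on the client side. This Python sketch assumes the format puts the screen name in a `screen_name` attribute on each `id` element with the numeric ID as the element text, which is how I read softprops' example:

```python
import xml.etree.ElementTree as ET


def parse_ids_with_names(xml_text):
    """Parse the proposed ids-with-names format into {user_id: screen_name}."""
    root = ET.fromstring(xml_text)
    return {int(el.text): el.get("screen_name") for el in root.findall("id")}


sample = """<?xml version="1.0" encoding="UTF-8"?>
<ids>
  <id screen_name="foo">1</id>
  <id screen_name="bar">2</id>
</ids>"""
print(parse_ids_with_names(sample))  # {1: 'foo', 2: 'bar'}
```

The client-side cost is trivial; as noted, the expense is on Twitter's side in joining the name onto each ID.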