Re: [twitter-dev] Best way to auto-discover new followers

2010-03-14 Thread Zero Hero
Thanks for the tip, I do have to augment the information by fetching the
user info
with a second call, so this will eliminate all that messiness.

On Sun, Mar 14, 2010 at 7:45 AM, Josh Roesslein jroessl...@gmail.com wrote:

 Oh and also the benefit of users/followers is it includes all the
 user information. If you are just
 maintaining a social graph of ids, then pulling down all the ids via
 followers/ids would be the way to go.
 I think for most users this just requires a few requests.

 Josh


 On Sun, Mar 14, 2010 at 9:42 AM, Josh Roesslein jroessl...@gmail.com wrote:

 A method via the streaming API to get friendship / follower updates would
 be nice.

 Now it may be better to use the users/followers method instead of
 followers/ids. The reason
 is this is ordered from newest to oldest based on when the user followed
 you. So you would start
 paginating from the start and keep going until you reach a known
 follower. At that point
 you should have a list of all new followers. You would still need to scan
 the entire follower list
  to find unfollows (if you need that info).
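
 Josh's newest-first scan could be sketched in Ruby roughly as follows
 (a sketch, not Twitter's API: `fetch_page` stands in for a hypothetical
 wrapper around users/followers that returns one page of user hashes,
 newest followers first, or an empty array when the pages run out):

```ruby
# Page through users/followers (newest follower first) until we hit an
# id we already know; everything collected before that point is new.
# `fetch_page.call(page)` is a hypothetical wrapper returning an array
# of user hashes ({ "id" => ... }) or [] when there are no more pages.
def new_followers(known_ids, fetch_page)
  new_users = []
  page = 1
  loop do
    users = fetch_page.call(page)
    return new_users if users.empty?
    users.each do |user|
      # First already-known follower: every later entry is older still,
      # so we can stop without scanning the rest of the list.
      return new_users if known_ids.include?(user["id"])
      new_users << user
    end
    page += 1
  end
end
```

 As Josh notes, this only finds new followers; detecting unfollows would
 still require walking the entire list.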

 Josh


 On Sat, Mar 13, 2010 at 1:31 PM, Zero zeroh...@qoobly.com wrote:

 I currently need to auto-discover new people who have started
 following me.
 Here's how I do it:

 1. Periodically pull in my followers using '/followers/ids.json'.
 2. Compare to my list of known ids to find new ids.

 The slight downside of this is that it seems somewhat inefficient (for
 Twitter).
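
 The two steps above amount to a set difference; a minimal Ruby sketch
 (the id lists themselves would come from '/followers/ids.json', which
 is not called here):

```ruby
require "set"

# Diff a freshly fetched follower id list against the known set.
# Returns [newly_followed_ids, unfollowed_ids].
def diff_followers(current_ids, known_ids)
  current = current_ids.to_set
  known = known_ids.to_set
  [(current - known).to_a, (known - current).to_a]
end
```

 A side benefit of diffing the full list is that unfollows fall out of
 the same comparison for free.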

 If there was access to an event stream of follow/unfollow requests
 this
 would be much easier.  It also seems like it could be done with less
 latency.  That is, if I have a lot of followers, I'm not going to want
 to burden
 the system by fetching the whole list at a high frequency.

 However, if I were just fetching the latest follows, it seems like I
 could
 do this at a higher frequency and not affect twitter.

 Questions:

 1. Is there a better way to do what I want with existing API?
 2. Are there emerging features that could make this better?

 Thanks,

 Zero






Re: [twitter-dev] Impossible to make a reliable cursor on timelines using query args

2010-03-12 Thread Zero Hero
Brian,

Thanks for your reply.  I suspected that the freshness was the reason that
this was done, along with the fact that
twitter started as a service for humans and is now being used
programmatically.

However, from an API standpoint this makes no sense.  It's typical to want
to crawl forward through a stream
without missing anything.  The current API creates a problem with
reliability and also baroqueness of implementation.
For those people thinking of Twitter as a messaging API, it seems incredibly
unnatural to not be able to easily
and reliably process things in chronological order without worrying about
the rate being slightly too high.  This
exhibits itself as messages dropping once you have more than 200 in a
sample period.  True, you're not
dropping messages, but that's the way it'll be perceived.

The fact that the ids are non-sequential (for a stream), means that you have
to bend over backward to do this
simple thing.  Note that the algorithm you give actually has to be altered.
Since the ids are non-sequential, we'll
have to backtrack by using the entire previous sequence (-200), and then
find the message that is 200 back
(it won't be N-200).  So we'll start out with the largest range and then
revise it as we discover the newest
low water mark.  This fact is hidden by the simpler numbers I chose to
use.

Also note that 3200 > 200.  So I potentially have to do this backtracking
16 times to get all my (undropped)
messages.

Anyone who has a decent programming background will think this is lame.
People who have less background will simply
be confused (I've seen a fair amount of "Twitter drops my tweets" bug
reports which could be due to this simple
misunderstanding).  Also, if I write out the full algorithm to do reliable
forward iteration, I'd bet you'd get a double
take from most people.

Although I don't know the twitter code, this is really just determined by
the sort order of your result set (whether you
get the most recent results or least recent).  It would be easy enough to
put another switch that gives you the
least recent, and default to most recent.  That provides you with the result
you want (people automatically get
most recent), but allows anyone who needs the ability (most programmers), to
scan forward easily.

Respectfully,

Zero.

On Fri, Mar 12, 2010 at 8:47 AM, Brian Smith br...@briansmith.org wrote:

 Zero wrote:

 1. Assume we are at since_id = 1000.  This was the last (highest)
 message id we had previously seen, which we have saved.
 2. There is a sudden spike and 2000 tweets come in.
 3. We now try to query with since_id=1000, count=200 (the max).
 Unfortunately, we have missed
 1800 tweets, because we only get the most recent 200 tweets.


 In step 3, you will get the 200 newest statuses, statuses 2801-3000. If you
 want 200 most recent statuses that are older than the ones you just got
 (that is, you want statuses 2601-2800), then you can query using
 max_id=2800, count=200, since_id=1000. You can keep doing this until Twitter
 returns zero tweets (which means it is refusing to give you any older
 tweets) or until Twitter returns the tweet with id=1000.

 (Note: You might be tempted to set since_id=1001 in order to avoid
 downloading the tweet with id 1000 twice; however, doing so will just cause
 problems and complications, and I don't recommend it.)
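
 Brian's loop might look like this in Ruby (a sketch under assumptions:
 `fetch` is a hypothetical wrapper that applies count and since_id on
 the server side and takes the current max_id, returning statuses
 newest first):

```ruby
# Walk a timeline backwards with max_id until Twitter returns no more
# tweets newer than since_id. `fetch.call(max_id)` is a hypothetical
# wrapper that returns up to `count` statuses with id > since_id and
# (when max_id is given) id <= max_id, newest first, as { "id" => ... }.
def catch_up(since_id, fetch)
  collected = []
  max_id = nil
  loop do
    batch = fetch.call(max_id)
    break if batch.empty?
    collected.concat(batch)
    # Next request asks for everything older than the oldest tweet seen.
    max_id = batch.last["id"] - 1
    break if max_id <= since_id
  end
  collected
end
```

 Passing max_id as one less than the oldest id already collected avoids
 refetching that tweet on the next request.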

 Twitter is designed to be about what is happening right now, and not so
 much to be about everything that happened between the last time you checked
 (could be weeks ago) and right now. That's why there's no API call to get
 new tweets oldest-first, and that's why you can't even get access to tweets
 older than the most recent ~3000 or so.

 Although there are Twitter users that really want to read every tweet in
 their timelines, Twitter's design--especially the website UI--doesn't
 facilitate that behavior. If you are developing an end-user client, be aware
 that the user probably doesn't want to read every tweet and almost
 definitely doesn't want to wait for dozens of API calls to complete before
 they see the refreshed timeline. I recommend optimizing apps for showing
 what's happening right now, whenever it is practical to do so. When I first
 started using Twitter I treated it more like a self-organizing forum for
 having conversations with people (so reading every tweet would be
 important), but I gave up as Twitter simply doesn't work well for that now.

 Regards,
 Brian



Re: [twitter-dev] Impossible to make a reliable cursor on timelines using query args

2010-03-12 Thread Zero Hero
Not complex, just not obvious.  When things are done in an unconventional
way, you need more explaining, unfortunately.
As mentioned before the only difference between what you're doing now and
this is the order of the results.  You return
the top, and sometimes you need the bottom.  Is that really hard to do in a
scalable way?

The disadvantage of not providing this is you now have to buffer, possibly
3200 messages, just to make sure
things are correct.  Also, we now have a potentially large latency (16
calls), to begin processing.

None of this is a huge deal.  It's cool you guys provide an API.  If it
can't be changed, it could be solved with docs.

I'm not whining, I'm just sayin...

Zero


On Fri, Mar 12, 2010 at 1:24 PM, Mark McBride mmcbr...@twitter.com wrote:

 Am I missing something regarding the complexity of doing this?

 Ruby pseudo-code:

 my_unread_tweets = []
 page = 1
 count = 200
 since_id = 123098485120985

 while (page_of_tweets = get_tweets(
   "http://api.twitter.com/1/statuses/home_timeline.json?page=#{page}&count=#{count}&since_id=#{since_id}")) do
   my_unread_tweets.concat(page_of_tweets)
   page += 1
 end

 I agree it's more complex than
 get_all_my_tweets_disregarding_the_size_of_the_actual_list_since(since_id)...
 however implementing such a method in a scalable way is pretty rough.

   ---Mark

 http://twitter.com/mccv



 On Fri, Mar 12, 2010 at 11:11 AM, Zero Hero zeroh...@qoobly.com wrote:

 Brian,

 Thanks for your reply.  I suspected that the freshness was the reason
 that this was done, along with the fact that
 twitter started as a service for humans and is now being used
 programmatically.

 However, from an API standpoint this makes no sense.  It's typical to want
 to crawl forward through a stream
 without missing anything.  The current API creates a problem with
 reliability and also baroqueness of implementation.
 For those people thinking of Twitter as a messaging API, it seems
 incredibly unnatural to not be able to easily
 and reliably process things in chronological order without worrying about
 the rate being slightly too high.  This
 exhibits itself as messages dropping once you have more than 200 in a
 sample period.  True, you're not
 dropping messages, but that's the way it'll be perceived.

 The fact that the ids are non-sequential (for a stream), means that you
 have to bend over backward to do this
 simple thing.  Note that the algorithm you give actually has to be
 altered.  Since the ids are non-sequential, we'll
 have to backtrack by using the entire previous sequence (-200), and then
 find the message that is 200 back
 (it won't be N-200).  So we'll start out with the largest range and then
 revise it as we discover the newest
 low water mark.  This fact is hidden by the simpler numbers I chose to
 use.

 Also note that 3200 > 200.  So I potentially have to do this backtracking
 16 times to get all my (undropped)
 messages.

 Anyone who has a decent programming background will think this is lame.
 People who have less background will simply
 be confused (I've seen a fair amount of "Twitter drops my tweets" bug
 reports which could be due to this simple
 misunderstanding).  Also, if I write out the full algorithm to do reliable
 forward iteration, I'd bet you'd get a double
 take from most people.

 Although I don't know the twitter code, this is really just determined by
 the sort order of your result set (whether you
 get the most recent results or least recent).  It would be easy enough to
 put another switch that gives you the
 least recent, and default to most recent.  That provides you with the
 result you want (people automatically get
 most recent), but allows anyone who needs the ability (most programmers),
 to scan forward easily.

 Respectfully,

 Zero.


 On Fri, Mar 12, 2010 at 8:47 AM, Brian Smith br...@briansmith.org wrote:

 Zero wrote:

 1. Assume we are at since_id = 1000.  This was the last (highest)
 message id we had previously seen, which we have saved.
 2. There is a sudden spike and 2000 tweets come in.
 3. We now try to query with since_id=1000, count=200 (the max).
 Unfortunately, we have missed
 1800 tweets, because we only get the most recent 200 tweets.


 In step 3, you will get the 200 newest statuses, statuses 2801-3000. If
 you want 200 most recent statuses that are older than the ones you just got
 (that is, you want statuses 2601-2800), then you can query using
 max_id=2800, count=200, since_id=1000. You can keep doing this until Twitter
 returns zero tweets (which means it is refusing to give you any older
 tweets) or until Twitter returns the tweet with id=1000.

 (Note: You might be tempted to set since_id=1001 in order to avoid
 downloading the tweet with id 1000 twice; however, doing so will just cause
 problems and complications, and I don't recommend it.)

 Twitter is designed to be about what is happening right now, and not so
 much to be about everything that happened between the last time you checked
 (could be weeks ago) and right