Re: [twitter-dev] Best way to auto-discover new followers
Thanks for the tip. I do have to augment the information by fetching the user info with a second call, so this will eliminate all that messiness.

On Sun, Mar 14, 2010 at 7:45 AM, Josh Roesslein jroessl...@gmail.com wrote:

Oh, and also the benefit of users/followers is that it includes all the user information. If you are just maintaining a social graph of ids, then pulling down all the ids via followers/ids would be the way to go. I think for most users this just requires a few requests.

Josh

On Sun, Mar 14, 2010 at 9:42 AM, Josh Roesslein jroessl...@gmail.com wrote:

A method via the streaming API to get friendship/follower updates would be nice. For now it may be better to use the users/followers method instead of followers/ids. The reason is that it is ordered from newest to oldest based on when the user followed you. So you would start paginating from the start and keep going until you reach a known follower. At that point you should have a list of all new followers. You would still need to scan the entire follower list to find unfollows (if you need that info).

Josh

On Sat, Mar 13, 2010 at 1:31 PM, Zero zeroh...@qoobly.com wrote:

I currently need to auto-discover new people who have started following me. Here's how I do it:

1. Periodically pull in my followers using '/followers/ids.json'.
2. Compare to my list of known ids to find new ids.

The slight downside of this is that it seems somewhat inefficient (for Twitter). If there were access to an event stream of follow/unfollow requests, this would be much easier. It also seems like it could be done with less latency. That is, if I have a lot of followers, I'm not going to want to burden the system by fetching the whole list at a high frequency. However, if I were just fetching the latest follows, it seems like I could do this at a higher frequency and not affect Twitter.

Questions:

1. Is there a better way to do what I want with the existing API?
2. Are there emerging features that could make this better?

Thanks,
Zero
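Josh's pagination trick can be sketched in Ruby. Everything here is illustrative, not Twitter's actual API: `fetch_followers_page` is a hypothetical stand-in for one page of users/followers (which, per the thread, is ordered newest-follow-first), follower records are reduced to bare ids, and the page size is shrunk to 2 to keep the toy data small.

```ruby
require 'set'

# Simulated follower list, newest follow first (stand-in for users/followers).
FOLLOWERS = [9, 8, 5, 3, 2].map { |i| { id: i } }

# Hypothetical page fetcher: page numbers start at 1, two records per page here.
def fetch_followers_page(page, per_page = 2)
  FOLLOWERS[(page - 1) * per_page, per_page] || []
end

known_ids = Set.new([5, 3, 2])  # followers we had already recorded
new_followers = []
page = 1

# Walk newest-to-oldest; stop as soon as we hit a follower we already know.
catch(:done) do
  loop do
    records = fetch_followers_page(page)
    throw :done if records.empty?
    records.each do |r|
      throw :done if known_ids.include?(r[:id])
      new_followers << r[:id]
    end
    page += 1
  end
end
```

The early stop is the whole point: only the new followers at the head of the list are paged in. As Josh notes, unfollow detection still requires walking the entire list.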
Re: [twitter-dev] Impossible to make a reliable cursor on timelines using query args
Brian,

Thanks for your reply. I suspected that freshness was the reason this was done, and also the fact that Twitter started as a service for humans and is now being used programmatically. However, from an API standpoint this makes no sense. It's typical to want to crawl forward through a stream without missing anything. The current API creates a problem with reliability and also baroqueness of implementation.

For those people thinking of Twitter as a messaging API, it seems incredibly unnatural to not be able to easily and reliably process things in chronological order without worrying about the rate being slightly too high. This exhibits itself as messages dropping once you have more than 200 in a sample period. True, you're not dropping messages, but that's the way it'll be perceived.

The fact that the ids are non-sequential (for a stream) means that you have to bend over backward to do this simple thing. Note that the algorithm you give actually has to be altered. Since the ids are non-sequential, we'll have to backtrack by using the entire previous sequence (-200), and then find the message that is 200 back (it won't be N-200). So we'll start out with the largest range and then revise it as we discover the newest low-water mark. This fact is hidden by the simpler numbers I chose to use. Also note that 3200 / 200 = 16, so I potentially have to do this backtracking up to 16 times to get all my (undropped) messages.

Anyone who has a decent programming background will think this is lame. People who have less background will simply be confused (I've seen a fair number of "Twitter drops my tweets" bug reports which could be due to this simple misunderstanding). Also, if I write out the full algorithm to do reliable forward iteration, I'd bet you'd get a double take from most people.

Although I don't know the Twitter code, this is really just determined by the sort order of your result set (whether you get the most recent results or least recent).
It would be easy enough to put in another switch that gives you the least recent, and default to most recent. That provides you with the result you want (people automatically get the most recent), but allows anyone who needs the ability (most programmers) to scan forward easily.

Respectfully,
Zero.

On Fri, Mar 12, 2010 at 8:47 AM, Brian Smith br...@briansmith.org wrote:

Zero wrote:

1. Assume we are at since_id = 1000. This was the last (highest) message id we had previously seen, which we have saved.
2. There is a sudden spike and 2000 tweets come in.
3. We now try to query with since_id=1000, count=200 (the max). Unfortunately, we have missed 1800 tweets, because we only get the most recent 200 tweets.

In step 3, you will get the 200 newest statuses, statuses 2801-3000. If you want the 200 most recent statuses that are older than the ones you just got (that is, you want statuses 2601-2800), then you can query using max_id=2800, count=200, since_id=1000. You can keep doing this until Twitter returns zero tweets (which means it is refusing to give you any older tweets) or until Twitter returns the tweet with id=1000. (Note: You might be tempted to set since_id=1001 in order to avoid downloading the tweet with id 1000 twice; however, doing so will just cause problems and complications, and I don't recommend it.)

Twitter is designed to be about what is happening right now, and not so much about everything that happened between the last time you checked (could be weeks ago) and right now. That's why there's no API call to get new tweets oldest-first, and that's why you can't even get access to tweets older than the most recent ~3000 or so. Although there are Twitter users that really want to read every tweet in their timelines, Twitter's design--especially the website UI--doesn't facilitate that behavior.
If you are developing an end-user client, be aware that the user probably doesn't want to read every tweet and almost definitely doesn't want to wait for dozens of API calls to complete before they see the refreshed timeline. I recommend optimizing apps for showing what's happening right now, whenever it is practical to do so.

When I first started using Twitter I treated it more like a self-organizing forum for having conversations with people (so reading every tweet would be important), but I gave up, as Twitter simply doesn't work well for that now.

Regards,
Brian
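Brian's backfill procedure can be sketched as follows. This is a toy sketch, not the real API: `fetch_page` is a hypothetical wrapper simulating home_timeline's max_id/since_id/count semantics over a fake timeline of ids 1..3000, matching the numbers in his example.

```ruby
# Fake timeline standing in for home_timeline: ids 1..3000, newest first.
TIMELINE = (1..3000).to_a.reverse.map { |i| { id: i } }

# Hypothetical wrapper simulating max_id/since_id/count semantics:
# newest-first tweets with since_id < id <= max_id, at most `count` of them.
def fetch_page(since_id:, max_id:, count: 200)
  TIMELINE.select { |t| t[:id] > since_id && t[:id] <= max_id }.first(count)
end

since_id = 1000       # last (highest) id we had previously seen
max_id = 10_000_000   # effectively unbounded on the first request
collected = []

loop do
  page = fetch_page(since_id: since_id, max_id: max_id)
  break if page.empty?          # nothing older left to fetch
  collected.concat(page)
  max_id = page.last[:id] - 1   # next request pages strictly older tweets
end
```

After the loop, `collected` holds statuses 3000 down to 1001, newest first, gathered in ten requests of 200.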
Re: [twitter-dev] Impossible to make a reliable cursor on timelines using query args
Not complex, just not obvious. When things are done in an unconventional way, you need more explaining, unfortunately. As mentioned before, the only difference between what you're doing now and this is the order of the results. You return the top, and sometimes you need the bottom. Is that really hard to do in a scalable way?

The disadvantage of not providing this is that you now have to buffer, possibly 3200 messages, just to make sure things are correct. Also, we now have a potentially large latency (16 calls) before we can begin processing. None of this is a huge deal. It's cool you guys provide an API. If it can't be changed, it could be solved with docs. I'm not whining, I'm just sayin'...

Zero

On Fri, Mar 12, 2010 at 1:24 PM, Mark McBride mmcbr...@twitter.com wrote:

Am I missing something regarding the complexity of doing this? Ruby pseudo-code:

    my_unread_tweets = []
    page = 1
    count = 200
    since_id = 123098485120985
    while (page_of_tweets = get_tweets("http://api.twitter.com/1/statuses/home_timeline.json?page=#{page}&count=#{count}&since_id=#{since_id}")) do
      my_unread_tweets += page_of_tweets
      page += 1
    end

I agree it's more complex than get_all_my_tweets_disregarding_the_size_of_the_actual_list_since(since_id)... however, implementing such a method in a scalable way is pretty rough.

---Mark
http://twitter.com/mccv

On Fri, Mar 12, 2010 at 11:11 AM, Zero Hero zeroh...@qoobly.com wrote:

Brian, Thanks for your reply. I suspected that freshness was the reason this was done, and also the fact that Twitter started as a service for humans and is now being used programmatically. However, from an API standpoint this makes no sense. It's typical to want to crawl forward through a stream without missing anything. The current API creates a problem with reliability and also baroqueness of implementation.
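For completeness, the "reliable forward iteration" Zero is asking for can be emulated on top of the newest-first pages: collect pages backward with max_id until the API runs dry, then reverse before processing. A hedged sketch over a fake timeline; `fetch_page` is a hypothetical wrapper, and the 16-request bound reflects the ~3200-tweet cap at 200 per page.

```ruby
# Fake timeline standing in for home_timeline: ids 1..600, newest first.
TIMELINE = (1..600).to_a.reverse.map { |i| { id: i } }

# Hypothetical wrapper simulating max_id/since_id/count semantics.
def fetch_page(since_id:, max_id:, count: 200)
  TIMELINE.select { |t| t[:id] > since_id && t[:id] <= max_id }.first(count)
end

# Collect everything newer than since_id, then hand it back oldest-first.
def tweets_oldest_first(since_id)
  pages = []
  max_id = 10_000_000   # effectively unbounded on the first request
  16.times do           # 3200-tweet cap / 200 per page = at most 16 requests
    page = fetch_page(since_id: since_id, max_id: max_id)
    break if page.empty?
    pages << page
    max_id = page.last[:id] - 1
  end
  pages.flatten.reverse # chronological order, nothing skipped
end
```

The buffering cost Zero complains about is visible here: every page must be held in memory before the oldest tweet can be processed.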