[twitter-dev] Re: screen_name param for /statuses/mentions and/or xml format for search results?
On Sep 4, 4:43 pm, Andrew Badera and...@badera.us wrote: ATOM _is_ XML ... not sure what the problem is? Uh, yes ... at a basic level that's true -- but the fact that you would point that out suggests that you aren't remotely familiar with the various response formats of Twitter API calls. Let me explain what little I do know... Twitter supports 4 output formats (that I know of): json, atom, rss, and a Twitter-specific xml schema. In the documentation for every API method, there is a Formats section that lists which formats are supported for that API call, and they are referred to as json, atom, rss, and xml ... you pick the format you want by specifying it as the extension on your REST call. Here is an example of an API method that supports all four formats... http://apiwiki.twitter.com/Twitter-REST-API-Method%3A-statuses-public_timeline http://twitter.com/statuses/public_timeline.xml http://twitter.com/statuses/public_timeline.atom http://twitter.com/statuses/public_timeline.rss http://twitter.com/statuses/public_timeline.json ...all of these URLs return the same objects, but in different formats -- and as you can see, when you specify the xml format you get a *lot* more detail about each of the status objects than when using the atom or rss formats. My question was regarding the fact that since the search API doesn't support xml (as you can see in the Formats section of the docs: http://apiwiki.twitter.com/Twitter-Search-API-Method%3A-search), I don't see any way to get the in_reply_to information without fetching each individual status message returned. (And yes: the search API does support the json format, and the json format *usually* includes the in_reply_to info, but not when using the search API.) Now does my question make more sense to you?
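To make both points concrete, here is a small Python sketch (mine, not from the thread): the response format is chosen purely by the URL extension, and one hypothetical workaround for the search API's missing in_reply_to field is to backfill it by fetching each status individually via statuses/show. The fetch callable is injectable so the per-status HTTP call can be stubbed.

```python
import json
import urllib.request

BASE = "http://twitter.com/statuses"

def timeline_url(fmt):
    """public_timeline supports all four formats; pick one via the extension."""
    assert fmt in ("json", "xml", "atom", "rss")
    return f"{BASE}/public_timeline.{fmt}"

def backfill_in_reply_to(search_results, fetch=None):
    """Fill in in_reply_to_status_id for search results that lack it by
    fetching the full status object one at a time -- the costly workaround
    described above. `fetch` defaults to a real HTTP GET."""
    if fetch is None:
        def fetch(status_id):
            with urllib.request.urlopen(f"{BASE}/show/{status_id}.json") as resp:
                return json.load(resp)
    for result in search_results:
        if "in_reply_to_status_id" not in result:
            result["in_reply_to_status_id"] = fetch(result["id"]).get("in_reply_to_status_id")
    return search_results
```

Note the obvious cost: one extra API call per search result, which is exactly why a native xml (or complete json) search response would be preferable.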
[twitter-dev] Re: friends/ids now returns w/ 1-5% random duplicates (as of this morning)
Hi Jesse, I'd just like to chirp in and say I'm seeing weirdness too. In particular, /followers/ids is taking more than 10 seconds to return for all accounts with over 30k followers, or is failing with 401s (using OAuth tokens). Since 10s is the hard limit on AppEngine, my app cannot function without making repeated requests (which results in hitting another, involves-cash limit, not to mention the unpaid development time). My application is consequently dead in the water for the users who rely on it most. abusive rant directed at ops management removed for brevity David On Sep 5, 8:40 pm, Jesse Stay jesses...@gmail.com wrote: I've disabled all our following scripts until we hear back from Twitter on this. Can I pay to get a 24/7 support number I can call for stuff like this? Jesse On Sat, Sep 5, 2009 at 1:38 PM, PJB pjbmancun...@gmail.com wrote: Since the fix to last night's 5000 limit, friends/ids and followers/ids now return approximately 1-5% duplicates. For example: User1: followers: 32795 unique followers: 32428 User2: friends: 32350 unique friends: 32046 User3: followers: 19243 unique followers: 19045 NONE of these figures comes close to matching what is on Twitter.com. In fact, if I repeat the same calls 10 times for each user (with no following/unfollowing in between), each result is usually different. The duplicates follow either immediately or within 2 or 3 positions after each other. What's strange is that the duplicates are NOT the same if the call is repeated. Please help. This bug is new as of this morning.
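A quick way to quantify the duplicate symptom PJB describes (a sketch of mine, not from the thread): pull the id list and compare the raw count against the unique count.

```python
def duplicate_report(ids):
    """Return (total, unique, duplicated-ids-in-first-repeat-order) for a
    followers/ids or friends/ids result."""
    seen, dupes = set(), []
    for i in ids:
        if i in seen:
            dupes.append(i)
        seen.add(i)
    return len(ids), len(seen), dupes

# e.g. for User1 above, total would come back 32795 with only 32428 unique
```

Because the duplicates reportedly land within a few positions of each other, printing `dupes` alongside their indices also makes the clustering easy to eyeball.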
[twitter-dev] Re: friends/ids now returns w/ 1-5% random duplicates (as of this morning)
So that is why my cron jobs have suddenly been taking forever to complete since Saturday morning. This 10-second delay is a new bug. In the past one just got a lot of 502 errors when retrieving really large accounts, but even those calls seldom took longer than 1 second. Dewald On Sep 6, 5:00 am, David W. d...@botanicus.net wrote: Hi Jesse, I'd just like to chirp in and say I'm seeing weirdness too. In particular, /followers/ids is taking more than 10 seconds to return for all accounts with over 30k followers
[twitter-dev] Re: HUGE PROBLEM with Direct Messages!
Also, go here: http://twitter.com/account/connections and see if there are any applications that you've authenticated to via OAuth that might be doing it. (That's the other way this can happen.) On Sep 5, 3:14 pm, Dewald Pretorius dpr...@gmail.com wrote: Change your Twitter password immediately. That can only happen if some rogue service has your password and sends DMs on your account. Changing your password should stop them dead in their tracks. Dewald On Sep 5, 12:02 pm, amylou61 aleach6...@gmail.com wrote: I've tried and tried for several MONTHS, through all channels I can find, to get Twitter to fix this issue, but all I've gotten are automated messages and dropped problem tickets. I get Direct Messages that are shown to be from MYSELF, but I didn't send them. They are coming from a blog called The Way I See It, Too. I get them every day, sometimes several. I wish someone would help me.
[twitter-dev] Recent Following and Follower Issues and Some Background on Social Graph
I can't speak to the policy issues, but I'll share a few things about social graph backing stores. To put it politely, the social graph grows quickly. Projecting the growth out just 3 or 6 months causes most engineers to do a spit-take. We have three online (user-visible) ways of storing the social graph. One is considered canonical, but it is useless for online queries. The second used to handle all queries. It began to suffer from correctness and internal inconsistency problems as it was pushed well beyond its capabilities. We recognized this issue long before it became critical, allocated significant resources, and built a third store. This store is correct (eventually consistent), internally consistent, fast, efficient, very scalable, and we're very happy with it. As the second system was slagged into uselessness, we had to cut over the majority of the site to the third system when the third reached a good, but not totally perfect, state. As we cut over, all sorts of problems, bugs and issues were eliminated. Hope was restored, flowers bloomed, etc. Yet, the third store has two minor user-visible flaws that we are fixing. Note that working on a large critical production data store with heavy read and write volume takes time, care and resources. There is minor pagination jitter in one case, and a certain class of row-count-based queries has to be deprecated (or limited) and replaced with cursor-based queries to be practical. For now, we're sending the row-count queries back to the second system, which is otherwise idle, but isn't consistent with the first or third system. We also have follower and following counts memoized in two ways that I know about, and there's probably at least one more way that I don't know about. Experienced hands can intuit the trade-offs and well-agonized choices that were made when we were well behind a steep growth curve on the social graph. These are the cards. 
-John Kalucki http://twitter.com/jkalucki Services, Twitter Inc.
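The cursor-style paging John mentions can be sketched like this (the cursor=-1 start and next_cursor=0 terminator follow the convention the social graph methods document; the page fetcher here is a stub standing in for the HTTP call):

```python
def fetch_all_ids(fetch_page):
    """Walk a cursored id list to completion.
    fetch_page(cursor) -> {"ids": [...], "next_cursor": int}"""
    ids, cursor = [], -1          # -1 asks for the first page
    while cursor != 0:            # 0 means no more pages
        page = fetch_page(cursor)
        ids.extend(page["ids"])
        cursor = page["next_cursor"]
    return ids
```

Unlike offset-based (row-count) paging, the cursor pins the traversal position in the store, so concurrent follows and unfollows can't shift rows out from under the walk in the same way.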
[twitter-dev] Re: 200 errors
Hi Ryan, I am getting the same error - I can find it in the logs of my app every day, at least 20 times. 1. The IP of the machine making requests to the Twitter API. If you're behind NAT, please be sure to send us your *external* IP. --- Name: twittme.mobi Address: 67.222.129.154 2. The IP address of the machine you're contacting in the Twitter cluster. You can find this on UNIX machines via the host or nslookup commands, and on Windows machines via the nslookup command. --- Name: twitter.com Address: 128.121.146.100 3. The Twitter API URL (method) you're requesting and any other details about the request (GET vs. POST, parameters, headers, etc.). --- 'account/rate_limit_status.xml' 4. Your host operating system, browser (including version), relevant cookies, and any other pertinent information about your environment. --- Linux, mobile browsers, Firefox, no cookies used. 5. What kind of network connection you have and from which provider, and what kind of network connectivity devices you're using. --- Devices are mostly mobile, probably using mobile connections or wireless. Thanks! On Sep 5, 2:54 pm, Alex hyc...@gmail.com wrote: Hi Ryan, any update on this issue?
[twitter-dev] Re: email used as login causing problems
Hello Abraham, currently this method is returning an empty response. If it returns an empty response, my application knows that the login was successful; otherwise, it prints the error. The docs regarding this should be updated. Greetings! On Sep 5, 12:27 pm, Abraham Williams 4bra...@gmail.com wrote: You should be able to use: http://apiwiki.twitter.com/Twitter-REST-API-Method:-account verify_credentials http://apiwiki.twitter.com/Twitter-REST-API-Method:-account%C2%A0veri... to get the user_id/screen_name. Abraham On Sat, Sep 5, 2009 at 05:20, twittme_mobi nlupa...@googlemail.com wrote: Hello, sometimes users login with their e-mail, while my app is expecting a username. The user can login with no problems, but later on the app is using the e-mail instead of the login/screen name. Is there any way to get the user id/screen_name from the e-mail? How can one overcome this problem? Thanks! -- Abraham Williams | Community Evangelist | http://web608.org Hacker | http://abrah.am | http://twitter.com/abraham Project | http://fireeagle.labs.poseurtech.com This email is: [ ] blogable [x] ask first [ ] private.
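A sketch of the suggested approach (Basic Auth, as the API worked at the time; the function name and stubbable transport are mine): call account/verify_credentials with whatever the user typed -- e-mail or username -- and read the canonical screen_name out of the returned JSON user object.

```python
import base64
import json
import urllib.request

def screen_name_for_login(login, password, do_get=None):
    """Resolve an email-or-username login to the account's screen_name.
    `do_get` is injectable so the HTTP call can be stubbed in tests."""
    if do_get is None:
        def do_get(url, auth_header):
            req = urllib.request.Request(url, headers={"Authorization": auth_header})
            with urllib.request.urlopen(req) as resp:
                return resp.read().decode("utf-8")
    token = base64.b64encode(f"{login}:{password}".encode()).decode()
    body = do_get("http://twitter.com/account/verify_credentials.json",
                  f"Basic {token}")
    return json.loads(body)["screen_name"]
```

Storing the returned screen_name (or id) at login time, rather than the raw login string, avoids the e-mail-leaks-into-later-calls problem described above.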
[twitter-dev] Re: friends/ids now returns w/ 1-5% random duplicates (as of this morning)
Hey Jesse, I've seen random failures and timeouts in the past, but in the last 48 hours they have become very consistent. I've got 4 accounts that have just 'stuck'. Very tempted to just close up the app. David On Sep 6, 9:13 am, Jesse Stay jesses...@gmail.com wrote: David, that's a long-time issue, and I believe there may even be a few bugs open for it. Twitter has recommended you use the pagination methods for users with over a certain number of followers (they haven't seemed to be able to specify what that number is). Unfortunately, as we saw a couple days ago, even that's not reliable all the time. On Sun, Sep 6, 2009 at 2:00 AM, David W. d...@botanicus.net wrote: Hi Jesse, I'd just like to chirp in and say I'm seeing weirdness too. In particular, /followers/ids is taking more than 10 seconds to return for all accounts with over 30k followers, or is failing with 401s (using OAuth tokens). Since 10s is the hard limit on AppEngine, my app cannot function without making repeated requests (which results in hitting another, involves-cash limit, not to mention the unpaid development time). My application is consequently dead in the water for the users who rely on it most. abusive rant directed at ops management removed for brevity David On Sep 5, 8:40 pm, Jesse Stay jesses...@gmail.com wrote: I've disabled all our following scripts until we hear back from Twitter on this. Can I pay to get a 24/7 support number I can call for stuff like this? Jesse On Sat, Sep 5, 2009 at 1:38 PM, PJB pjbmancun...@gmail.com wrote: Since the fix to last night's 5000 limit, friends/ids and followers/ids now return approximately 1-5% duplicates. For example: User1: followers: 32795 unique followers: 32428 User2: friends: 32350 unique friends: 32046 User3: followers: 19243 unique followers: 19045 NONE of these figures comes close to matching what is on Twitter.com. In fact, if I repeat the same calls 10 times for each user (with no following/unfollowing in between), each result is usually different. The duplicates follow either immediately or within 2 or 3 positions after each other. What's strange is that the duplicates are NOT the same if the call is repeated. Please help. This bug is new as of this morning.
[twitter-dev] [408s] Random 408 errors have been appearing in the last 48 hours
Random 408 errors are being returned when users attempt to Sign in with Twitter on my site TweetMeNews.com. Has anyone else been seeing this? Twitter, is there any way you can expand on the error message instead of just saying 408? That would help us better understand and report what's breaking... Thanks, Brett http://twitter.com/TweetMeNews/
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
Thanks John. I appreciate the various ways of accessing this data, but when you guys make updates to any of these, can you either do it in a beta environment we can test in first, or earlier in the week? Since there are very few Twitter engineers monitoring these lists during the weekends, and we ourselves often have other plans, it makes for an interesting weekend for all of us when changes that break code go into production. It happens, but it would be nice to have this earlier in the week, or in a beta environment we can test in. Also, when things like this do happen, is there a way you can lift following limits for specific users so we can correct the wrong with our customers? Thanks, Jesse On Sun, Sep 6, 2009 at 8:59 AM, John Kalucki jkalu...@gmail.com wrote: I can't speak to the policy issues, but I'll share a few things about social graph backing stores. To put it politely, the social graph grows quickly. Projecting the growth out just 3 or 6 months causes most engineers to do a spit-take. We have three online (user-visible) ways of storing the social graph. One is considered canonical, but it is useless for online queries. The second used to handle all queries. It began to suffer from correctness and internal inconsistency problems as it was pushed well beyond its capabilities. We recognized this issue long before it became critical, allocated significant resources, and built a third store. This store is correct (eventually consistent), internally consistent, fast, efficient, very scalable, and we're very happy with it. As the second system was slagged into uselessness, we had to cut over the majority of the site to the third system when the third reached a good, but not totally perfect, state. As we cut over, all sorts of problems, bugs and issues were eliminated. Hope was restored, flowers bloomed, etc. Yet, the third store has two minor user-visible flaws that we are fixing.
Note that working on a large critical production data store with heavy read and write volume takes time, care and resources. There is minor pagination jitter in one case, and a certain class of row-count-based queries has to be deprecated (or limited) and replaced with cursor-based queries to be practical. For now, we're sending the row-count queries back to the second system, which is otherwise idle, but isn't consistent with the first or third system. We also have follower and following counts memoized in two ways that I know about, and there's probably at least one more way that I don't know about. Experienced hands can intuit the trade-offs and well-agonized choices that were made when we were well behind a steep growth curve on the social graph. These are the cards. -John Kalucki http://twitter.com/jkalucki Services, Twitter Inc.
[twitter-dev] Recommended/Official Android App?
Hi devs, I just got an android phone. Can anyone make a recommendation for a good twitter android app? It seems there are few. Anyone specifically working on one and need a tester? ~Blaine
[twitter-dev] Re: 200 errors
Yeah, it's happening to me again, same as my previous email, except the time stamp will be around 2 minutes ago. On Sep 6, 4:05 pm, twittme_mobi nlupa...@googlemail.com wrote: Hi Ryan, I am getting the same error - I can find it in the logs of my app every day, at least 20 times. 1. The IP of the machine making requests to the Twitter API. If you're behind NAT, please be sure to send us your *external* IP. --- Name: twittme.mobi Address: 67.222.129.154 2. The IP address of the machine you're contacting in the Twitter cluster. You can find this on UNIX machines via the host or nslookup commands, and on Windows machines via the nslookup command. --- Name: twitter.com Address: 128.121.146.100 3. The Twitter API URL (method) you're requesting and any other details about the request (GET vs. POST, parameters, headers, etc.). --- 'account/rate_limit_status.xml' 4. Your host operating system, browser (including version), relevant cookies, and any other pertinent information about your environment. --- Linux, mobile browsers, Firefox, no cookies used. 5. What kind of network connection you have and from which provider, and what kind of network connectivity devices you're using. --- Devices are mostly mobile, probably using mobile connections or wireless. Thanks! On Sep 5, 2:54 pm, Alex hyc...@gmail.com wrote: Hi Ryan, any update on this issue?
[twitter-dev] Re: non json response
I have seen this same HTML page with empty body <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd"> <!-- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> --> <HTML> <HEAD> <META HTTP-EQUIV="Refresh" CONTENT="0.1"> <META HTTP-EQUIV="Pragma" CONTENT="no-cache"> <META HTTP-EQUIV="Expires" CONTENT="-1"> <TITLE></TITLE> </HEAD> <BODY><P></BODY> </HTML> a number of times in the last few days (but intermittently - a good response may come after several attempts), in response to http://twitter.com/users/show/rudifa.json The most recent one was at UTC time 2009-09-06 18:55:38.262. My IP is 84.227.186.88 as reported by http://www.whatismyip.com/ Could someone at twitter.com please tell us what this means? Server(s) overloaded? On Aug 30, 1:20 pm, Steven Wilkin iamthebisc...@gmail.com wrote: I'm consistently getting the same response when accessing http://search.twitter.com/trends.json from 209.40.204.183 Steve On Aug 26, 5:27 pm, Ryan Sarver rsar...@twitter.com wrote: Ben, It's a known issue and we are trying to hunt it down. Can you please provide us with your source IP and an approximate time of when you saw it? Thanks, Ryan On Wed, Aug 26, 2009 at 7:00 AM, ben ben.apperr...@googlemail.com wrote: Occasionally I get back a 200 status HTML response from the json search api which looks like this; most times the same search works fine, it just happens occasionally: <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/1999/REC-html401-19991224/strict.dtd"> <!-- <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01//EN" "http://www.w3.org/TR/html4/strict.dtd"> --> <HTML> <HEAD> <META HTTP-EQUIV="Refresh" CONTENT="0.1"> <META HTTP-EQUIV="Pragma" CONTENT="no-cache"> <META HTTP-EQUIV="Expires" CONTENT="-1"> <TITLE></TITLE> </HEAD> <BODY><P></BODY> </HTML> Does anyone recognise what this kind of response means? Is it normal, or just beta-ish quirks?
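Until the root cause is found, one client-side defence (a sketch; the retry policy is my own assumption, not Twitter's advice) is to sniff the body before JSON-decoding it and retry whenever the META-Refresh HTML page shows up instead of JSON:

```python
import json
import time

def get_json_with_retry(fetch_body, attempts=3, delay=1.0):
    """fetch_body() returns raw response text. Retry while the body looks
    like the empty HTML page above rather than JSON."""
    body = ""
    for _ in range(attempts):
        body = fetch_body().lstrip()
        if not body.startswith("<"):   # JSON starts with {, [, ", or a digit
            return json.loads(body)
        time.sleep(delay)
    raise ValueError("kept receiving HTML instead of JSON: " + body[:60])
```

Since a good response reportedly often follows a bad one after a short wait, even two or three attempts with a small delay should paper over most of the intermittent failures.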
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
For now, we're sending the row-count queries back to the second system, which is otherwise idle, but isn't consistent with the first or third system. Can you help us better understand what queries you're talking about? Do you mean, e.g., that any queries that call for *ALL* friends/ids without pagination will use the second, inconsistent system? And so the recommended solution would be for us to change our queries to use pagination... if we want accurate data?
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
On Sun, Sep 6, 2009 at 11:18 AM, Jesse Stay jesses...@gmail.com wrote: Thanks John. I appreciate the various ways of accessing this data, but when you guys make updates to any of these, can you either do it in a beta environment we can test in first, or earlier in the week? Since there are very few Twitter engineers monitoring these lists during the weekends, and we ourselves often have other plans, it makes for an interesting weekend for all of us when changes that break code go into production. It happens, but it would be nice to have this earlier in the week, or in a beta environment we can test in. I think that's probably asking a lot of a company trying to grow as fast as Twitter. Graphs are very hard to scale. Ask anybody who has tried. Now if the graph weren't dependent on a centralized system... Nick
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
I don't understand how asking to release features earlier in the week is asking a lot? What does that have to do with scaling social graphs? Jesse On Sun, Sep 6, 2009 at 2:49 PM, Nick Arnett nick.arn...@gmail.com wrote: On Sun, Sep 6, 2009 at 11:18 AM, Jesse Stay jesses...@gmail.com wrote: Thanks John. I appreciate the various ways of accessing this data, but when you guys make updates to any of these, can you either do it in a beta environment we can test in first, or earlier in the week? Since there are very few Twitter engineers monitoring these lists during the weekends, and we ourselves often have other plans, it makes for an interesting weekend for all of us when changes that break code go into production. It happens, but it would be nice to have this earlier in the week, or in a beta environment we can test in. I think that's probably asking a lot of a company trying to grow as fast as Twitter. Graphs are very hard to scale. Ask anybody who has tried. Now if the graph weren't dependent on a centralized system... Nick
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
On Sun, Sep 6, 2009 at 1:52 PM, Jesse Stay jesses...@gmail.com wrote: I don't understand how asking to release features earlier in the week is asking a lot? What does that have to do with scaling social graphs? I was referring to a beta environment. Nick
[twitter-dev] Skype + Twitter = ?
Has anyone heard of any Skype apps that tie into the Twitter API, or vice versa? Looking for some examples of how one might integrate the two. I can't find anything live. I realize it might be minimal since we're talking about one web-based and one client-based platform, but thought I would ask. -- Dale Fol.la MeDia, LLC
[twitter-dev] Re: Paging (or cursoring) will always return unreliable (or jittery) results
I meant to type, LIMIT 100, 5000.
[twitter-dev] Re: Recent Following and Follower Issues and Some Background on Social Graph
John, Thanks for the background info. Row-count queries means to me the summary friends and followers numbers displayed on the Twitter web pages, and returned in the user profile via the API, correct? So, if I am understanding you correctly, the friends and followers that we're getting back from the social graph methods are pulled from the third store, and doing a count() on the returned JSON array gives one the actual valid numbers of current friends and followers. (Not that users would ever believe us. LOL. They believe what they see on the Twitter web pages.) Anyway, I cannot imagine the challenges you must face with your explosive growth. It would be interesting if, one day, one of your engineers could give an overview of your technical architecture. Facebook has done that (I remember the one regarding their image serving) and it was very fascinating. I would appreciate it if you could fix the 10+ seconds delay issue on Tuesday or Wednesday. It's not a major train-smash issue; it is just slowing down my scripts to a great extent. They are battling to keep up with the workload when they are slowed down like that. Dewald On Sep 6, 11:59 am, John Kalucki jkalu...@gmail.com wrote: I can't speak to the policy issues, but I'll share a few things about social graph backing stores. To put it politely, the social graph grows quickly. Projecting the growth out just 3 or 6 months causes most engineers to do a spit-take. We have three online (user-visible) ways of storing the social graph. One is considered canonical, but it is useless for online queries. The second used to handle all queries. It began to suffer from correctness and internal inconsistency problems as it was pushed well beyond its capabilities. We recognized this issue long before it became critical, allocated significant resources, and built a third store.
This store is correct (eventually consistent), internally consistent, fast, efficient, very scalable, and we're very happy with it. As the second system was slagged into uselessness, we had to cut over the majority of the site to the third system when the third reached a good, but not totally perfect, state. As we cut over, all sorts of problems, bugs and issues were eliminated. Hope was restored, flowers bloomed, etc. Yet, the third store has two minor user-visible flaws that we are fixing. Note that working on a large critical production data store with heavy read and write volume takes time, care and resources. There is minor pagination jitter in one case, and a certain class of row-count-based queries has to be deprecated (or limited) and replaced with cursor-based queries to be practical. For now, we're sending the row-count queries back to the second system, which is otherwise idle, but isn't consistent with the first or third system. We also have follower and following counts memoized in two ways that I know about, and there's probably at least one more way that I don't know about. Experienced hands can intuit the trade-offs and well-agonized choices that were made when we were well behind a steep growth curve on the social graph. These are the cards. -John Kalucki http://twitter.com/jkalucki Services, Twitter Inc.
[twitter-dev] Paging (or cursoring) will always return unreliable (or jittery) results
There is no way that paging through a large and volatile data set can ever return results that are 100% accurate. Let's say one wants to page through @aplusk's followers list. That's going to take between 3 and 5 minutes just to collect the follower ids with page (or the new cursors). It is likely that some of the follower ids that you have gone past and have already collected have unfollowed @aplusk while you are still collecting the rest. I assume that the Twitter system does paging by doing a standard SQL LIMIT clause. If you do LIMIT 100, 20 and some of the ids that you have already paged past have been deleted, the result set is going to shift to the left and you are going to miss the ones that were above 100 but have subsequently moved left to below 100. There really are only two solutions to this problem: a) we need to have the capability to reliably retrieve the entire result set in one API call, or b) everyone has to accept that the result set cannot be guaranteed to be 100% accurate. Dewald
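The left-shift described above is easy to reproduce in miniature (a toy simulation of mine, not Twitter's actual storage): page with offset/limit while a deletion lands between two pages, and one id silently disappears from the collected set.

```python
def page(data, offset, limit):
    """Simulate SQL's LIMIT offset, limit against an ordered id list."""
    return data[offset:offset + limit]

ids = list(range(10))       # follower ids 0..9
first = page(ids, 0, 5)     # [0, 1, 2, 3, 4]
ids.remove(2)               # an already-collected follower unfollows mid-walk
second = page(ids, 5, 5)    # [6, 7, 8, 9] -- id 5 shifted left past the offset
collected = first + second  # id 5 is missing even though it never unfollowed
```

Cursor-based paging avoids exactly this failure mode, because the cursor marks a position in the data rather than a row count, which is presumably why the row-count queries are being deprecated.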
[twitter-dev] Re: 200 errors
We are seeing this HTML META REFRESH as well from our clients. We are a mobile application and are seeing this issue more and more frequently, to the point that the application is not functioning properly. It's hard for us to provide any specific IP data, as the carriers are most likely proxying the requests from the device. It is not limited to a specific API call either; it is a systemic issue across a wide range of calls we make. There was a ticket related to the issue in the bug tracker for search, but it has been closed, and I think it should be re-opened as it is still a problem: http://code.google.com/p/twitter-api/issues/detail?id=968 Any feedback would be appreciated. On Sep 6, 3:01 pm, Rich rhyl...@gmail.com wrote: Yeah, it's happening to me again, same as my previous email, except the time stamp will be around 2 minutes ago. On Sep 6, 4:05 pm, twittme_mobi nlupa...@googlemail.com wrote: Hi Ryan, I am getting the same error - I can find it in the logs of my app every day, at least 20 times. 1. The IP of the machine making requests to the Twitter API. If you're behind NAT, please be sure to send us your *external* IP. --- Name: twittme.mobi Address: 67.222.129.154 2. The IP address of the machine you're contacting in the Twitter cluster. You can find this on UNIX machines via the host or nslookup commands, and on Windows machines via the nslookup command. --- Name: twitter.com Address: 128.121.146.100 3. The Twitter API URL (method) you're requesting and any other details about the request (GET vs. POST, parameters, headers, etc.). --- 'account/rate_limit_status.xml' 4. Your host operating system, browser (including version), relevant cookies, and any other pertinent information about your environment. --- Linux, mobile browsers, Firefox, no cookies used. 5. What kind of network connection you have and from which provider, and what kind of network connectivity devices you're using. --- Devices are mostly mobile, probably using mobile connections or wireless. Thanks! 
On Sep 5, 2:54 pm, Alex hyc...@gmail.com wrote: hi Ryan, any update on this issue ?
[twitter-dev] Re: Paging (or cursoring) will always return unreliable (or jittery) results
Agreed. Is there a chance Twitter can return the full results in compressed (gzip or similar) format to reduce load, leaving the burden of decompressing on our end and reducing bandwidth? I'm sure there are other areas this could apply to as well. I think you'll find compressing the full social graph of a user significantly reduces the size of the data you have to pass through the pipe - my tests have shown it to be a huge difference, and you'll have to get way past the tens of millions of ids before things slow down at all after that. Jesse On Sun, Sep 6, 2009 at 8:27 PM, Dewald Pretorius dpr...@gmail.com wrote: There is no way that paging through a large and volatile data set can ever return results that are 100% accurate. Let's say one wants to page through @aplusk's followers list. That's going to take between 3 and 5 minutes just to collect the follower ids with page (or the new cursors). It is likely that some of the follower ids that you have gone past and have already collected have unfollowed @aplusk while you are still collecting the rest. I assume that the Twitter system does paging by doing a standard SQL LIMIT clause. If you do LIMIT 100, 20 and some of the ids that you have already paged past have been deleted, the result set is going to shift to the left and you are going to miss the ones that were above 100 but have subsequently moved left to below 100. There really are only two solutions to this problem: a) we need to have the capability to reliably retrieve the entire result set in one API call, or b) everyone has to accept that the result set cannot be guaranteed to be 100% accurate. Dewald
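For a feel of the size win, here is a local sketch: a gzip round-trip on a large id list, plus a fetch helper that asks for gzip via the standard Accept-Encoding header (whether the API honours that header for these endpoints is an assumption on my part).

```python
import gzip
import json
import urllib.request

def fetch_json_gzipped(url):
    """GET a URL while advertising gzip support, decompressing if the
    server actually used it."""
    req = urllib.request.Request(url, headers={"Accept-Encoding": "gzip"})
    with urllib.request.urlopen(req) as resp:
        body = resp.read()
        if resp.headers.get("Content-Encoding") == "gzip":
            body = gzip.decompress(body)
    return json.loads(body)

# Local illustration of the compression win on a big, repetitive id list:
ids = list(range(1_000_000, 1_030_000))
raw = json.dumps(ids).encode()
packed = gzip.compress(raw)   # a small fraction of len(raw)
```

Numeric id lists are extremely repetitive, so they compress unusually well, which is why the bandwidth saving Jesse measured is so large.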
[twitter-dev] Re: Paging (or cursoring) will always return unreliable (or jittery) results
If I worked for Twitter, here's what I would have done. I would have grabbed the follower id lists of the large accounts (those that usually kicked back 502s) and written them to flat files once every 5 or so minutes. When an API request comes in for such a list, I'd just grab it from the flat file, instead of asking the DB to select 2+ million ids from amongst a few billion records while it's trying to do a few thousand other selects at the same time. That's one way of getting rid of 502s on large social graph lists. Okay, the data is going to be 5 minutes outdated. To that I say, so bloody what? Dewald
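The flat-file idea sketched in Python (paths, names, and TTL are illustrative, not anyone's actual implementation): rewrite the id list periodically with an atomic rename, and serve reads straight from the file.

```python
import os
import time

CACHE_TTL = 300  # "once every 5 or so minutes"

def write_cache(path, ids):
    """Dump the follower id list, swapping the file in atomically so
    readers never see a half-written list."""
    tmp = path + ".tmp"
    with open(tmp, "w") as f:
        f.write("\n".join(str(i) for i in ids))
    os.replace(tmp, path)

def read_cache(path):
    """Return (ids, fresh); fresh goes False once the file exceeds the TTL,
    signalling that the writer should regenerate it."""
    with open(path) as f:
        ids = [int(line) for line in f if line.strip()]
    fresh = (time.time() - os.path.getmtime(path)) < CACHE_TTL
    return ids, fresh
```

The atomic `os.replace` is the important detail: a reader either gets the old complete list or the new complete list, never a truncated one.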
[twitter-dev] Re: Paging (or cursoring) will always return unreliable (or jittery) results
The other solution would be to send it to us in batch results, attaching a timestamp to the request telling us this is what the user's social graph looked like at x time. I personally would start with the compressed format though, as that makes it all possible to retrieve in a single request. On Sun, Sep 6, 2009 at 10:33 PM, Jesse Stay jesses...@gmail.com wrote: Agreed. Is there a chance Twitter can return the full results in compressed (gzip or similar) format to reduce load, leaving the burden of decompressing on our end and reducing bandwidth? I'm sure there are other areas this could apply to as well. I think you'll find compressing the full social graph of a user significantly reduces the size of the data you have to pass through the pipe - my tests have shown it to be a huge difference, and you'll have to get way past the tens of millions of ids before things slow down at all after that. Jesse On Sun, Sep 6, 2009 at 8:27 PM, Dewald Pretorius dpr...@gmail.com wrote: There is no way that paging through a large and volatile data set can ever return results that are 100% accurate. Let's say one wants to page through @aplusk's followers list. That's going to take between 3 and 5 minutes just to collect the follower ids with page (or the new cursors). It is likely that some of the follower ids that you have gone past and have already collected have unfollowed @aplusk while you are still collecting the rest. I assume that the Twitter system does paging by doing a standard SQL LIMIT clause. If you do LIMIT 100, 20 and some of the ids that you have already paged past have been deleted, the result set is going to shift to the left and you are going to miss the ones that were above 100 but have subsequently moved left to below 100. There really are only two solutions to this problem: a) we need to have the capability to reliably retrieve the entire result set in one API call, or b) everyone has to accept that the result set cannot be guaranteed to be 100% accurate. Dewald
[twitter-dev] Re: Paging (or cursoring) will always return unreliable (or jittery) results
As far as retrieving the large graphs from a DB, flat files are one way - another is to just store the full graph (of ids) in a single column in the database and parse on retrieval. This is what FriendFeed is doing currently, or so they've said. Dewald and I are both talking about this because we're also having to duplicate this data on our own servers, so we too have to deal with the pains of the social graph. (and oh the pain it is!) On Sun, Sep 6, 2009 at 8:44 PM, Dewald Pretorius dpr...@gmail.com wrote: If I worked for Twitter, here's what I would have done. I would have grabbed the follower id lists of the large accounts (those that usually kicked back 502s) and written them to flat files once every 5 or so minutes. When an API request comes in for such a list, I'd just grab it from the flat file, instead of asking the DB to select 2+ million ids from amongst a few billion records while it's trying to do a few thousand other selects at the same time. That's one way of getting rid of 502s on large social graph lists. Okay, the data is going to be 5 minutes outdated. To that I say, so bloody what? Dewald
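The single-column trick can be sketched like this (my own illustration of the idea, not FriendFeed's actual schema): pack the whole id list into one binary blob, store it in one row, and unpack it on read, so fetching a user's graph is a single-row lookup instead of a millions-of-rows select.

```python
import struct

def pack_ids(ids):
    """Pack follower ids into one blob of little-endian unsigned 64-bit ints."""
    return struct.pack(f"<{len(ids)}Q", *ids)

def unpack_ids(blob):
    """Inverse of pack_ids."""
    return list(struct.unpack(f"<{len(blob) // 8}Q", blob))

# 8 bytes per follower: ~2 million followers fit in a ~16 MB column value
```

The trade-off is the same one Dewald accepts for flat files: the blob is rewritten wholesale rather than updated per-edge, so it is only as fresh as the last write.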