[twitter-dev] Re: Streaming API Rate Limiting
So, it turns out that I was an order of magnitude off when I mentioned numbers above. We receive 500,000 tweets/day, not 50,000.

On Apr 1, 3:49 pm, Colin Surprenant colin.surpren...@gmail.com wrote:

Well, first, in the Gnip Power Track documentation http://docs.gnip.com/w/page/35663947/Power-Track, in the has:geo section, they say: "Currently, 'has:geo' is about 2-4% of the full firehose."

Also, I ran some tests a few weeks ago to see the difference in content between the search API and the streaming API for equivalent geolocalized searches. See this thread: http://groups.google.com/group/twitter-development-talk/browse_thread...

My results showed that the streaming API returns a very small fraction (3% in my tests) of what the search API returns. This is because the streaming API only uses the geotagging API to locate tweets, but the search API uses both the geotagging API and the user location field. For example, I can get around 250,000 tweets/day for San Francisco using the search API, but the streaming API will return around 7,000 tweets/day. At 7,000 tweets/day for San Francisco, 50,000 for the whole US seems small.

Colin

On Apr 1, 2:40 pm, Augusto Santos augu...@gemeos.org wrote:

Sorry Colin, but where did you get this information? It doesn't match reality. Not at all.

On Fri, Apr 1, 2011 at 12:35 PM, Colin Surprenant colin.surpren...@gmail.com wrote:

As a side note, currently only 3-4% of the total tweets (firehose) are geo-tagged and are eligible to be selected in a stream location bounding box. If the current firehose rate is about 140M tweets/day, that makes ~5M eligible tweets/day. I do not know what proportion of tweets comes from the US, but I would think 50% seems reasonable, which would result in ~2.5M tweets/day. Even if we lower that proportion, your 50,000 tweets/day seems way off.

There are 3 possibilities: 1) you are being rate limited more than you think, 2) your bounding box is wrong, or 3) your bounding box is too large and Twitter has reduced it somehow. I remember reading somewhere in the API doc that each bounding box could not be more than 1 degree square, enough to cover most metropolitan areas, but I cannot find it again.

Colin

On Mar 31, 4:08 pm, Data Gatherer gatherer...@gmail.com wrote:

We have a bounding box set for the United States. Even though it's a large box, we only receive about 50,000 tweets a day. However, I see that we get rate limited at least once a week already. The box is large, but the number of matching results is fairly low. Knowing more specifically how the rate limiting works would be important when trying to gather data for other projects (more bounding boxes, other keywords).

On Mar 31, 3:50 pm, Jeremy Dunck jdu...@gmail.com wrote:

On Thu, Mar 31, 2011 at 2:48 PM, Augusto Santos augu...@gemeos.org wrote:

No, it won't. Streaming has a rate limit of around 1% of the firehose, which applies if your search term is too generic. If your search term or bounding box matches too many tweets, you will start to receive 'limit' status messages, as the doc says. http://dev.twitter.com/pages/streaming_api_concepts#parsing-responses

Sure, I understand that. I just meant to say that 1% of all tweets is a lot (140M average per day now). If your terms are not very general, you have a lot of headroom.
--
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: http://groups.google.com/group/twitter-development-talk
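The 'limit' status messages mentioned in the quoted thread above can be tallied on the client to see how much is being withheld. What follows is only a minimal sketch: it assumes each line of the stream is one JSON object and that a limit notice has the shape `{"limit": {"track": N}}`, with N a running total of undelivered tweets. That shape and the helper name `summarize_stream` are assumptions for illustration and should be checked against the streaming documentation linked in the thread.

```python
import json


def summarize_stream(lines):
    """Count delivered tweets and track the latest 'limit' notice.

    Assumes each non-blank line is one JSON object, and that a limit
    notice looks like {"limit": {"track": N}} (an assumption, not
    something confirmed in this thread).
    """
    delivered = 0
    undelivered = 0
    for line in lines:
        line = line.strip()
        if not line:
            continue  # blank keep-alive lines carry no data
        message = json.loads(line)
        limit = message.get("limit")
        if isinstance(limit, dict):
            # Assumed to be a running total since the connection opened,
            # so keep the largest value seen rather than summing.
            undelivered = max(undelivered, limit.get("track", 0))
        elif "text" in message:
            delivered += 1
    return delivered, undelivered


# Hypothetical sample lines, made up for this example.
sample = [
    '{"text": "hello from SF"}',
    "",
    '{"limit": {"track": 12}}',
    '{"text": "another tweet"}',
    '{"limit": {"track": 30}}',
]
print(summarize_stream(sample))
```

Comparing the two numbers over a day gives a rough idea of whether a low tweet count comes from rate limiting or simply from few matching tweets.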
Re: [twitter-dev] Re: Streaming API Rate Limiting
I'm getting a very similar count for the United States: my average is 499,380 tweets/day.

On Mon, Apr 4, 2011 at 3:08 PM, Khandelwal khandel...@gmail.com wrote:

So, it turns out that I was an order of magnitude off when I mentioned numbers above. We receive 500,000 tweets/day, not 50,000.

-- 氣
[twitter-dev] Re: Streaming API Rate Limiting
As a side note, currently only 3-4% of the total tweets (firehose) are geo-tagged and are eligible to be selected in a stream location bounding box. If the current firehose rate is about 140M tweets/day, that makes ~5M eligible tweets/day. I do not know what proportion of tweets comes from the US, but I would think 50% seems reasonable, which would result in ~2.5M tweets/day. Even if we lower that proportion, your 50,000 tweets/day seems way off.

There are 3 possibilities: 1) you are being rate limited more than you think, 2) your bounding box is wrong, or 3) your bounding box is too large and Twitter has reduced it somehow. I remember reading somewhere in the API doc that each bounding box could not be more than 1 degree square, enough to cover most metropolitan areas, but I cannot find it again.

Colin
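The back-of-the-envelope estimate in the message above can be written out as a short calculation. This is only a sketch of that arithmetic: the 140M/day firehose rate and the 3-4% geotagged share are the figures quoted in the message, the 50% US share is the poster's own guess, and the function name `eligible_per_day` is made up for illustration.

```python
FIREHOSE_PER_DAY = 140_000_000   # firehose rate quoted in the thread
GEO_SHARE_RANGE = (0.03, 0.04)   # "3-4%" geotagged, per the message above
US_SHARE = 0.50                  # the poster's guess, not a measured value


def eligible_per_day(firehose, geo_share, region_share=1.0):
    """Tweets/day that could match a location bounding box."""
    return firehose * geo_share * region_share


for geo_share in GEO_SHARE_RANGE:
    total = eligible_per_day(FIREHOSE_PER_DAY, geo_share)
    us = eligible_per_day(FIREHOSE_PER_DAY, geo_share, US_SHARE)
    print(f"{geo_share:.0%} geotagged: {total / 1e6:.1f}M total, {us / 1e6:.1f}M US")
```

Under these assumptions the eligible volume is roughly 4.2M to 5.6M tweets/day worldwide and 2.1M to 2.8M for the US, which is where the ~5M and ~2.5M figures come from.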
Re: [twitter-dev] Re: Streaming API Rate Limiting
Sorry Colin, but where did you get this information? It doesn't match reality. Not at all.

-- 氣
Re: [twitter-dev] Re: Streaming API Rate Limiting
All of my experience with geotagging shows that about 0.3% to 0.5% of tweets have these codes. I'd be curious to know if that matches what others have found.

--
Adam Green
Twitter API Consultant and Analyst
http://140dev.com, @140dev
http://2012twit.com, @2012twit
781-879-2960
Re: [twitter-dev] Re: Streaming API Rate Limiting
Clearer information: from 10th Mar to 31st Mar the average was 1.1M/day, and 860K/day of these had lat/long information.

On Fri, Apr 1, 2011 at 4:03 PM, Augusto Santos augu...@gemeos.org wrote:

Since 6th March, setting location via the browser has been disabled, which corresponded to around 50% of geotagged tweets. And now I'm getting values very similar to yours, Adam.

-- 氣
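To compare the daily counts above with the percentages quoted elsewhere in the thread, they can be expressed as a share of the firehose. This is a rough sketch only: it reuses the 140M tweets/day firehose figure from earlier in the thread and assumes the counts come from a worldwide stream, which the message does not state. The function name `share_of_firehose` is made up for illustration.

```python
FIREHOSE_PER_DAY = 140_000_000  # firehose rate quoted earlier in the thread


def share_of_firehose(tweets_per_day, firehose=FIREHOSE_PER_DAY):
    """Return tweets_per_day as a fraction of the firehose."""
    return tweets_per_day / firehose


received = 1_100_000     # average tweets/day, 10-31 March
with_latlong = 860_000   # of those, tweets carrying lat/long

print(f"received:      {share_of_firehose(received):.2%}")
print(f"with lat/long: {share_of_firehose(with_latlong):.2%}")
print(f"lat/long share of received: {with_latlong / received:.0%}")
```

Under those assumptions the lat/long tweets come to about 0.6% of the firehose, the same order of magnitude as the 0.3% to 0.5% reported above, and the total received stays under the roughly 1% streaming cap.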
[twitter-dev] Re: Streaming API Rate Limiting
Well, first, in the Gnip Power Track documentation http://docs.gnip.com/w/page/35663947/Power-Track, in the has:geo section, they say: "Currently, 'has:geo' is about 2-4% of the full firehose."

Also, I ran some tests a few weeks ago to see the difference in content between the search API and the streaming API for equivalent geolocalized searches. See this thread: http://groups.google.com/group/twitter-development-talk/browse_thread/thread/a4bf3b7c6373657b#

My results showed that the streaming API returns a very small fraction (3% in my tests) of what the search API returns. This is because the streaming API only uses the geotagging API to locate tweets, but the search API uses both the geotagging API and the user location field. For example, I can get around 250,000 tweets/day for San Francisco using the search API, but the streaming API will return around 7,000 tweets/day. At 7,000 tweets/day for San Francisco, 50,000 for the whole US seems small.

Colin
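The gap between the two APIs described above follows from how a tweet gets matched to a place. The sketch below is a simplification of the behaviour described in the message, not Twitter's actual implementation: it assumes a bounding box given as (west, south, east, north) in degrees, and tweets represented as dicts with hypothetical `coordinates` (longitude, latitude) and `user_location` fields.

```python
def in_box(lon, lat, box):
    """True if the point lies inside box = (west, south, east, north)."""
    west, south, east, north = box
    return west <= lon <= east and south <= lat <= north


def streaming_match(tweet, box):
    """Match only tweets that carry explicit coordinates."""
    coords = tweet.get("coordinates")
    return coords is not None and in_box(coords[0], coords[1], box)


def search_match(tweet, box, place_name):
    """Match on coordinates or on the free-text user location field."""
    location = (tweet.get("user_location") or "").lower()
    return streaming_match(tweet, box) or place_name.lower() in location


SF_BOX = (-122.75, 36.8, -121.75, 37.8)  # illustrative box around San Francisco
tweets = [
    {"coordinates": (-122.42, 37.77), "user_location": "San Francisco, CA"},
    {"coordinates": None, "user_location": "San Francisco"},
    {"coordinates": None, "user_location": "Paris"},
]
print(sum(streaming_match(t, SF_BOX) for t in tweets))
print(sum(search_match(t, SF_BOX, "San Francisco") for t in tweets))
```

Only the first sample tweet matches the coordinate-only rule, while the first two match the broader rule. With the figures from the message, 7,000 / 250,000 is 2.8%, in line with the "3% in my tests" observation.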