[twitter-dev] Re: Streaming API Rate Limiting

2011-04-04 Thread Khandelwal
So, it turns out I was off by an order of magnitude in the numbers I
gave above: we receive 500,000 tweets/day, not 50,000.
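To put the corrected figure in context, here is a rough headroom check in Python. It assumes the figures quoted further down this thread: a firehose of about 140M tweets/day and a streaming cap of roughly 1% of it. Neither number is an official limit, and the names are illustrative.

```python
# Rough headroom check against the ~1% streaming cap discussed in this thread.
# Both constants are figures quoted by list members, not documented limits.
FIREHOSE_PER_DAY = 140_000_000
CAP_FRACTION = 0.01


def cap_utilisation(received_per_day: int) -> float:
    """Fraction of the assumed ~1% cap that a stream's daily volume uses."""
    return received_per_day / (FIREHOSE_PER_DAY * CAP_FRACTION)


print(f"{cap_utilisation(500_000):.0%}")  # 500,000 of ~1,400,000 -> 36%
```

This compares daily averages only. If the cap is enforced over shorter windows (an assumption the thread does not confirm), busy hours could still trigger the weekly limit notices mentioned further down, even at about 36% average utilisation.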


-- 
Twitter developer documentation and resources: http://dev.twitter.com/doc
API updates via Twitter: http://twitter.com/twitterapi
Issues/Enhancements Tracker: http://code.google.com/p/twitter-api/issues/list
Change your membership to this group: 
http://groups.google.com/group/twitter-development-talk


Re: [twitter-dev] Re: Streaming API Rate Limiting

2011-04-04 Thread Augusto Santos
I'm getting a very similar count for the United States: my average is
499,380 tweets/day.

-- 
氣



[twitter-dev] Re: Streaming API Rate Limiting

2011-04-01 Thread Colin Surprenant
As a side note, currently only 3-4% of all tweets (the firehose) are
geotagged and therefore eligible to be matched by a stream location
bounding box. If the current firehose rate is about 140M tweets/day,
that makes ~5M eligible tweets/day.

I do not know what proportion of tweets comes from the US, but I would
think 50% is reasonable, which would give ~2.5M tweets/day. Even if we
lower that proportion, your 50,000 tweets/day seems way off.
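The estimate above chains two assumed ratios onto the firehose rate. A minimal Python sketch of that arithmetic follows, using the midpoint of each quoted range. The 50% US share is Colin's guess, and the second scenario plugs in the 0.3-0.5% geotag rate Adam Green reports later in the thread; the function name is illustrative.

```python
# Back-of-envelope version of the estimate above. Every input is a figure
# quoted in this thread (or a guess), not a measured or official value.
def estimate_us_geotagged(firehose_per_day: float,
                          geotagged_fraction: float,
                          us_share: float) -> float:
    """Estimated geotagged tweets/day that a US bounding box could match."""
    eligible = firehose_per_day * geotagged_fraction  # tweets with a geotag
    return eligible * us_share                         # guessed US portion


scenarios = {
    "3.5% geotagged (midpoint of 3-4%)": 0.035,
    "0.4% geotagged (midpoint of 0.3-0.5%)": 0.004,
}
for label, fraction in scenarios.items():
    estimate = estimate_us_geotagged(140e6, fraction, us_share=0.5)
    print(f"{label}: ~{estimate:,.0f} tweets/day")
# 3.5% geotagged (midpoint of 3-4%): ~2,450,000 tweets/day
# 0.4% geotagged (midpoint of 0.3-0.5%): ~280,000 tweets/day
```

With the lower geotag rate, the estimate falls in the same order of magnitude as the ~500,000/day the original poster later reported. The geotag rate you assume therefore matters more than the US share.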

There are three possibilities: 1) you are being rate limited more than
you think, 2) your bounding box is wrong, or 3) your bounding box is too
large and Twitter has reduced it somehow. I remember reading somewhere
in the API docs that each bounding box could be no larger than 1 degree
square, which is enough to cover most metropolitan areas, but I cannot
find it again.
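Possibility 2 is easy to check locally. The sketch below assumes that the `locations` parameter of the streaming filter endpoint takes each box as `sw_lon,sw_lat,ne_lon,ne_lat`, with longitude first and the southwest corner first. The `BoundingBox` class and the sample US coordinates are hypothetical. The 1-degree limit is only an unconfirmed recollection, so the sketch takes it as a parameter instead of hard-coding it.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class BoundingBox:
    """Hypothetical helper for one lon/lat box, southwest corner first."""
    sw_lon: float
    sw_lat: float
    ne_lon: float
    ne_lat: float

    def is_valid(self) -> bool:
        # Swapped lat/lon or reversed corners fail this check. Boxes that
        # cross the antimeridian are not handled in this sketch.
        return (-180 <= self.sw_lon < self.ne_lon <= 180
                and -90 <= self.sw_lat < self.ne_lat <= 90)

    def span_degrees(self) -> Tuple[float, float]:
        return self.ne_lon - self.sw_lon, self.ne_lat - self.sw_lat

    def exceeds(self, max_span_deg: float) -> bool:
        # The 1-degree limit is unconfirmed, so the caller supplies it.
        lon_span, lat_span = self.span_degrees()
        return lon_span > max_span_deg or lat_span > max_span_deg

    def contains(self, lon: float, lat: float) -> bool:
        return (self.sw_lon <= lon <= self.ne_lon
                and self.sw_lat <= lat <= self.ne_lat)

    def as_locations_param(self) -> str:
        # Longitude before latitude, southwest corner before northeast.
        return f"{self.sw_lon},{self.sw_lat},{self.ne_lon},{self.ne_lat}"


# Rough, illustrative box around the contiguous United States.
us = BoundingBox(sw_lon=-125.0, sw_lat=24.0, ne_lon=-66.0, ne_lat=50.0)
print(us.is_valid(), us.span_degrees(), us.exceeds(1.0))  # True (59.0, 26.0) True
print(us.as_locations_param())  # -125.0,24.0,-66.0,50.0
```

A box that fails `is_valid()` (for example, corners entered as lat,lon) would be a likely cause of case 2. A box that passes but `exceeds(1.0)` is the situation case 3 is worried about.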

Colin

On Mar 31, 4:08 pm, Data Gatherer gatherer...@gmail.com wrote:
 We have a bounding box set for the United States. Even though it's a
 large box, we only receive about 50,000 tweets a day. However, I see
 that we already get rate limited at least once a week. The box is
 large, but the number of matching results is fairly low. Knowing more
 specifically how the rate limiting works would be important when
 gathering data for other projects (more bounding boxes, other
 keywords).

 On Mar 31, 3:50 pm, Jeremy Dunck jdu...@gmail.com wrote:

  On Thu, Mar 31, 2011 at 2:48 PM, Augusto Santos augu...@gemeos.org wrote:
   No, it won't. Streaming is rate limited to around 1% of the firehose
   if your search term is too generic.
   If your search term or bounding box matches too many tweets, you will
   start receiving 'limit' status messages, as the docs say.
  http://dev.twitter.com/pages/streaming_api_concepts#parsing-responses

  Sure, I understand that. I just meant that 1% of all tweets is a lot
  (the firehose now averages about 140M tweets per day).

  If your terms are not very general, you have a lot of headroom.
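To see how often a filtered stream is actually being limited, one option is to count limit notices alongside delivered tweets. This Python sketch assumes newline-delimited JSON and limit notices shaped like {"limit": {"track": N}}, where N is the cumulative number of undelivered matching tweets since the connection opened. Check the parsing-responses page linked above for the exact format. The function name and the sample lines are made up.

```python
import json
from typing import Dict, Iterable


def summarise_stream(lines: Iterable[str]) -> Dict[str, int]:
    """Count delivered tweets and the undelivered total from limit notices.

    Assumes one JSON message per line and limit notices of the form
    {"limit": {"track": N}}, with N cumulative for the current connection.
    """
    delivered = undelivered = limit_notices = 0
    for raw in lines:
        line = raw.strip()
        if not line:              # blank keep-alive lines carry no data
            continue
        msg = json.loads(line)
        if "limit" in msg:
            limit_notices += 1
            # N is cumulative, so the largest value seen is the running total
            undelivered = max(undelivered, msg["limit"].get("track", 0))
        elif "text" in msg:
            delivered += 1
        # other message types (deletes, etc.) are ignored in this sketch
    return {"delivered": delivered, "undelivered": undelivered,
            "limit_notices": limit_notices}


# Made-up stream lines for illustration.
sample = [
    '{"text": "hello", "coordinates": null}',
    '',
    '{"limit": {"track": 12}}',
    '{"text": "world", "coordinates": null}',
    '{"limit": {"track": 40}}',
]
print(summarise_stream(sample))
# {'delivered': 2, 'undelivered': 40, 'limit_notices': 2}
```

If the count does reset on reconnect as assumed, a long-running collector would need to sum the per-connection maxima instead of keeping one global maximum.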



Re: [twitter-dev] Re: Streaming API Rate Limiting

2011-04-01 Thread Augusto Santos
Sorry Colin, but where did you get this information? It doesn't match
reality. Not at all.

-- 
氣



Re: [twitter-dev] Re: Streaming API Rate Limiting

2011-04-01 Thread Adam Green
All of my experience with geotagging shows that about 0.3% to 0.5% of
tweets carry geotag coordinates. I'd be curious to know whether that
matches what others have found.
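One way to compare figures like these is to measure the share of tweets in a sample whose `coordinates` field is non-null. This is a minimal Python sketch. It assumes tweet JSON of that era, where `coordinates` is either null or a GeoJSON Point with longitude first. Tweets that carry only a `place` are counted as not geotagged here, and that choice alone could make different people's percentages disagree. The function name and sample data are invented.

```python
import json
from typing import Iterable


def geotagged_fraction(lines: Iterable[str]) -> float:
    """Share of tweets in a sample whose "coordinates" field is non-null."""
    total = tagged = 0
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        msg = json.loads(line)
        if "text" not in msg:         # skip limit notices and other non-tweets
            continue
        total += 1
        if msg.get("coordinates"):    # null or missing means no exact point
            tagged += 1
    return tagged / total if total else 0.0


# Invented sample: only the first tweet has an exact point ([lon, lat]).
sample = [
    '{"text": "a", "coordinates": {"type": "Point", "coordinates": [-122.42, 37.77]}}',
    '{"text": "b", "coordinates": null}',
    '{"text": "c", "coordinates": null, "place": {"full_name": "San Francisco, CA"}}',
    '{"limit": {"track": 3}}',
    '{"text": "d"}',
]
print(geotagged_fraction(sample))  # 0.25
```

Run over a sample of the unfiltered stream, this gives the kind of percentage being compared here. Run over a location-filtered stream, it shows how many matches have exact coordinates and how many matched some other way.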

-- 
Adam Green
Twitter API Consultant and Analyst
http://140dev.com, @140dev
http://2012twit.com, @2012twit
781-879-2960



Re: [twitter-dev] Re: Streaming API Rate Limiting

2011-04-01 Thread Augusto Santos
Clearer information:
From March 10th to March 31st the average was 1.1M tweets/day, and
860K/day of those had lat/long information.

On Fri, Apr 1, 2011 at 4:03 PM, Augusto Santos augu...@gemeos.org wrote:

 Since March 6th, setting the location via the browser has been disabled;
 that accounted for around 50% of geotagged tweets. Now I'm getting values
 very similar to yours, Adam.


-- 
氣



[twitter-dev] Re: Streaming API Rate Limiting

2011-04-01 Thread Colin Surprenant
Well, first, in the Gnip Power Track documentation
(http://docs.gnip.com/w/page/35663947/Power-Track), in the has:geo
section, they say: "Currently, 'has:geo' is about 2-4% of the full
firehose."

Also, a few weeks ago I ran some tests comparing what the search API and
the streaming API return for equivalent geolocalized searches. See this
thread:
http://groups.google.com/group/twitter-development-talk/browse_thread/thread/a4bf3b7c6373657b#

My results showed that the streaming API returns a very small fraction
(3% in my tests) of what the search API returns. This is because the
streaming API uses only the geotagging API to locate tweets, while the
search API uses both the geotagging API and the user's profile location
field.

For example, I can get around 250,000 tweets/day for San Francisco
using the search API, but the streaming API returns only around 7,000
tweets/day.

At 7,000 tweets/day for San Francisco alone, 50,000 for the whole US
seems small.
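The comparison can be made explicit with a few lines of Python. The inputs are Colin's own measurements, not official data, plus both the original 50,000/day US figure and the corrected 500,000/day figure from the top of this thread. The variable names are illustrative.

```python
# Colin's measurements (not official data) and the two reported US volumes.
search_sf_per_day = 250_000   # search API: geotag + profile location field
stream_sf_per_day = 7_000     # streaming API: geotag only
us_stream_figures = {"as first reported": 50_000, "as corrected": 500_000}

print(f"SF, streaming vs search: {stream_sf_per_day / search_sf_per_day:.1%}")
for label, us_per_day in us_stream_figures.items():
    share = stream_sf_per_day / us_per_day
    print(f"SF share of US stream ({label}): {share:.1%}")
# SF, streaming vs search: 2.8%
# SF share of US stream (as first reported): 14.0%
# SF share of US stream (as corrected): 1.4%
```

The 2.8% ratio is the "3% in my tests" figure. A single city accounting for 14% of a nationwide geotagged stream is what made the original 50,000/day look too small. At the corrected volume, San Francisco is about 1.4%.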

Colin

