I am interested in gathering tweets from a particular geographic
region, currently Nigeria. Initially I ran queries that used the
coordinates of Abuja, the capital, and asked for tweets within 400
miles. This covers most of the country except the far northeastern
corner, and it gave me 5-6K tweets a day. Taking Nigeria's recent
milestone of 1M Facebook users as a rough indicator of online
activity, I expected much more data.
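A radius query like this is typically passed as a single geocode value
of the form latitude,longitude,radius. The sketch below formats that
value, assuming a radius suffixed with "mi" or "km". geocode_param is
a hypothetical helper, and the Abuja and Lagos coordinates are
approximate values assumed for illustration, not the ones used in the
queries above.

```python
def geocode_param(lat: float, lon: float, radius: float, unit: str = "mi") -> str:
    """Format an assumed "latitude,longitude,radius" geocode value."""
    if unit not in ("mi", "km"):
        raise ValueError("unit must be 'mi' or 'km'")
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        raise ValueError("latitude/longitude out of range")
    if radius <= 0:
        raise ValueError("radius must be positive")
    # :g drops a trailing ".0", so a radius of 400.0 is written as "400".
    return f"{lat},{lon},{radius:g}{unit}"


# Approximate city centres (assumed values, for illustration only).
ABUJA = (9.0765, 7.3986)
LAGOS = (6.5244, 3.3792)

print(geocode_param(*ABUJA, 400))  # 9.0765,7.3986,400mi
print(geocode_param(*LAGOS, 50))   # 6.5244,3.3792,50mi
```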

Next I tried a query for tweets within 50 miles of Lagos, the
largest city in Nigeria (population over 9 million), and got 12-15K
tweets a day. A query for tweets within 10 miles of Lagos gave me 5K
tweets a day. Both numbers still seem low, but they are an
improvement nonetheless. Lagos is within the 400-mile radius around
Abuja, so it's interesting that the wider 400-mile query gave me less
data than the 50-mile Lagos query, while going from 10 to 50 miles
around Lagos gave me more data.
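One way to sanity-check the containment claim is the great-circle
(haversine) distance between the two city centres. This minimal sketch
assumes approximate coordinates and a mean Earth radius of 3958.8
miles. With those values Lagos sits a little over 300 miles from
Abuja, so even the full 50-mile Lagos circle lies inside the 400-mile
Abuja circle.

```python
from __future__ import annotations

import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles


def haversine_miles(a: tuple[float, float], b: tuple[float, float]) -> float:
    """Great-circle distance in miles between two (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, a)
    lat2, lon2 = map(math.radians, b)
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(h))


# Approximate city centres (assumed values, for illustration only).
ABUJA = (9.0765, 7.3986)
LAGOS = (6.5244, 3.3792)

d = haversine_miles(ABUJA, LAGOS)
print(f"Abuja to Lagos: {d:.0f} miles")
# A circle of radius r around Lagos fits inside the Abuja circle if d + r <= 400.
print("50-mile Lagos circle inside 400-mile Abuja circle:", d + 50 <= 400)
```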

Currently I'm querying a number of the larger cities in Nigeria, in
each case using a radius of 40-50 miles, and am getting 30K tweets a
day. I'm assuming that I am still missing a lot of data.
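When several city circles are queried, a tweet near the edge of two
overlapping circles can come back from both queries, which would
inflate a combined daily count. A minimal sketch of merging batches by
tweet id, assuming each tweet is a dict with an integer "id" field and
that larger ids are newer (merge_by_id and the data layout are
hypothetical):

```python
from __future__ import annotations

from typing import Iterable


def merge_by_id(*batches: Iterable[dict]) -> list[dict]:
    """Combine tweet batches from several city queries, one copy per tweet id.

    The first copy seen for each id is kept. Tweets are assumed to be dicts
    with an integer "id" field, where a larger id means a newer tweet.
    """
    merged: dict[int, dict] = {}
    for batch in batches:
        for tweet in batch:
            merged.setdefault(tweet["id"], tweet)
    # Newest first, by tweet id.
    return sorted(merged.values(), key=lambda t: t["id"], reverse=True)


# Toy data: tweet 1 falls inside two overlapping query circles.
city_a = [{"id": 3, "text": "a"}, {"id": 1, "text": "b"}]
city_b = [{"id": 2, "text": "c"}, {"id": 1, "text": "b"}]
print([t["id"] for t in merge_by_id(city_a, city_b)])  # [3, 2, 1]
```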

My questions:

First, how does the radius affect the query? 400 miles was clearly
too wide a radius. 50 miles gave me more tweets than 400 miles, but
dropping to 10 miles gave me fewer. Is there an explanation for this?
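For scale, on a flat approximation the area a circle covers grows with
the square of its radius, so the area ratios between these queries
are:

```latex
A = \pi r^2, \qquad
\frac{A_{400}}{A_{50}} = \left(\frac{400}{50}\right)^2 = 64, \qquad
\frac{A_{50}}{A_{10}} = \left(\frac{50}{10}\right)^2 = 25
```

The 400-mile circle around Abuja covers about 64 times the area of the
50-mile Lagos circle, and includes Lagos, yet returned fewer tweets.
Widening the Lagos query from 10 to 50 miles covered 25 times the area
and returned roughly two to three times as many tweets.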

Second, what is the best way to get tweets from a region? I'm not
convinced I am going about it in the best way.

Third, is there ground-truth data on the number of Twitter users and
the "tweet rate" by country? It would be great to know just how many
tweets per day to expect.

Each query pages through up to 15 pages at 100 tweets per page, and I
stop paging if I get no new tweets. I then wait for a period of time,
10 minutes for Lagos and Abuja and an hour for more sparsely populated
locations, and start paging again with the since_id argument set to
the id of the last tweet I got. There may be some tweaking I can do to
the wait times, but I would expect it to provide only marginal
benefit.
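The polling cycle described above can be sketched as follows.
fetch_page is a hypothetical stand-in for the actual HTTP request, and
the parameter names geocode, rpp, page and since_id are assumed to
match the Search API of the time, so check them against the API
documentation. One addition beyond the description: the function
reports whether all 15 pages came back full. In that case paging
stopped at the page limit rather than because the new tweets ran out,
and some tweets from that interval may have been skipped.

```python
from __future__ import annotations

from typing import Callable

MAX_PAGES = 15   # pages per polling cycle, as described above
PER_PAGE = 100   # tweets per page (the "rpp" parameter name is assumed)

# Waits between polling cycles, in seconds: 10 minutes vs. an hour.
WAIT_SECONDS = {"dense": 10 * 60, "sparse": 60 * 60}


def poll_once(
    fetch_page: Callable[[dict], list[dict]],
    geocode: str,
    since_id: int | None,
) -> tuple[list[dict], int | None, bool]:
    """Run one polling cycle; return (new_tweets, next_since_id, hit_page_cap).

    fetch_page is hypothetical: it takes a dict of query parameters and
    returns a list of tweet dicts, each with an integer "id".
    """
    seen: set[int] = set()
    new_tweets: list[dict] = []
    full_pages = 0
    for page in range(1, MAX_PAGES + 1):
        params = {"geocode": geocode, "rpp": PER_PAGE, "page": page}
        if since_id is not None:
            params["since_id"] = since_id
        results = fetch_page(params)
        # Keep only tweets newer than since_id that this cycle hasn't seen yet.
        fresh = [
            t for t in results
            if t["id"] not in seen and (since_id is None or t["id"] > since_id)
        ]
        if not fresh:
            break  # no new tweets on this page: stop paging
        seen.update(t["id"] for t in fresh)
        new_tweets.extend(fresh)
        if len(results) == PER_PAGE:
            full_pages += 1
    # Tweet ids increase over time, so the largest id seen is the newest tweet.
    next_since_id = max(seen, default=since_id)
    # Every page came back full: paging ended at the page limit, not because
    # the new tweets ran out, so some tweets may have been skipped.
    hit_page_cap = full_pages == MAX_PAGES
    return new_tweets, next_since_id, hit_page_cap
```

Between cycles the caller would sleep for WAIT_SECONDS["dense"] or
WAIT_SECONDS["sparse"] and pass the returned next_since_id into the
next call. If hit_page_cap keeps coming back True for a location, a
shorter wait there might recover some of the skipped tweets.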



