I am building a mathematical model based on results from Twitter's
API, but I am missing one critical number: an estimate of the total
number of tweets posted in the USA each day.  The better the estimate
and the fewer assumptions I make, the more useful the model will be
(it will be published for the public to use).  I have been told that
this kind of information is important and usually kept secret by
internet startups.  With that in mind, I have come up with a
workaround, but it is not yet accurate enough, so I am looking for
your advice.

Idea:

I gather data from Twitter's search API at least once an hour.  My
idea is to store the first tweet ID I see each day and subtract the
previous day's first ID from it to estimate the number of tweets per
day.  I have three problems here:

1. How are tweet IDs incremented?  Do they increase in steps of 1,
2, 5, 10...?
2. I need an estimate for the number of private/protected users
assuming each private user's tweet gets an ID number.  This is
required because I am sampling the public tweets.
3. I need to estimate the number of tweets coming from overseas.  I am
modeling the USA.  This is less of a problem than the previous two.
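
To make the idea concrete, here is a rough Python sketch of the
arithmetic I have in mind.  It is only an illustration: the function
name and all three parameters (id_step, protected_share, us_share) are
hypothetical placeholders for exactly the unknowns listed above, and
it assumes that IDs grow by a constant step and that protected tweets
should be removed from the count.

```python
from datetime import date


def estimate_daily_us_tweets(first_ids, id_step, protected_share, us_share):
    """Estimate public US tweets per day from the first tweet ID seen each day.

    first_ids:       dict mapping a date to the first tweet ID observed that day
    id_step:         ASSUMED constant increment between consecutive IDs (problem 1)
    protected_share: ASSUMED fraction of IDs used by protected accounts (problem 2)
    us_share:        ASSUMED fraction of tweets that come from the USA (problem 3)
    """
    days = sorted(first_ids)
    estimates = {}
    for prev_day, next_day in zip(days, days[1:]):
        # Only compare consecutive calendar days; skip gaps in the data.
        if (next_day - prev_day).days != 1:
            continue
        # IDs consumed between the two daily first observations.
        id_span = first_ids[next_day] - first_ids[prev_day]
        all_tweets = id_span / id_step
        public_tweets = all_tweets * (1 - protected_share)
        # Attribute the count to the earlier of the two days.
        estimates[prev_day] = public_tweets * us_share
    return estimates


if __name__ == "__main__":
    # Made-up IDs, purely to show the call; not real measurements.
    sample = {date(2009, 3, 1): 1000, date(2009, 3, 2): 3000}
    print(estimate_daily_us_tweets(sample, id_step=2,
                                   protected_share=0.1, us_share=0.5))
```

The result is only as good as the three guessed parameters, which is
why I am asking about them.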



Thanks for your time.  Any help/advice is appreciated!
