On Jan 8, 9:29 am, GeorgeMedia wrote:
> No one?
I think you would be better off consuming the firehose yourself: geocode the tweets, throw away any that aren’t in regions you care about, and cache the rest for a period of time.

The thing to remember about "geocoding" of tweets is that until very recently it was based solely on the <location> field in a user’s profile. True geocoding of individual tweets is very recent; it depends on the user enabling geocoding and on the user agent posting the lat/lon with the tweet. So the firehose *does* contain the <geo> field, it’s just mostly empty because most clients don’t populate it yet. If the <geo> field is empty, you’d have to geocode based on the <location> field, which is a bit of a hairball: it may contain any data, up to 30 bytes.

Alternatively, do the cron job thing but enlarge the regions you’re searching on (search on the top N cities or metros, for example, not 200,000 coordinates). Cache the data, and accept that it won’t be absolutely up to date. It has already lost a lot of precision anyway, since the <location> field is completely arbitrary, and even when it is a city or a lat/lon pair it does not necessarily represent where the Twitter user was at that moment.

--
-ed costello
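For illustration, the firehose-filtering approach described above (prefer the per-tweet <geo> point, fall back to the profile <location>, discard tweets outside your regions, cache the rest for a while) might be sketched like this. It is a minimal sketch under stated assumptions, not a definitive implementation: tweets are assumed to be already parsed into dicts with an optional `geo` entry holding `coordinates` as `[lat, lon]` and a `user.location` string, a simplified shape that may not match the exact API payload. `BoundingBox`, `CITY_COORDS`, `TtlCache`, `tweet_coordinates`, and `process` are hypothetical names, and the tiny gazetteer stands in for a real (much larger, fuzzier) geocoder.

```python
import re
import time
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional, Tuple

Coords = Tuple[float, float]  # (lat, lon)


@dataclass(frozen=True)
class BoundingBox:
    """A hypothetical region of interest, e.g. one metro area."""
    name: str
    south: float
    west: float
    north: float
    east: float

    def contains(self, lat: float, lon: float) -> bool:
        # Simple box test; does not handle boxes that cross the antimeridian.
        return self.south <= lat <= self.north and self.west <= lon <= self.east


# Hypothetical gazetteer for free-text <location> values (illustration only).
CITY_COORDS: Dict[str, Coords] = {
    "boston": (42.36, -71.06),
    "new york": (40.71, -74.01),
}

# Matches a "lat,lon" pair anywhere in the string, e.g. "ÜT: 42.36,-71.06".
LATLON_RE = re.compile(r"(-?\d+(?:\.\d+)?)\s*,\s*(-?\d+(?:\.\d+)?)")


def tweet_coordinates(tweet: dict) -> Optional[Coords]:
    """Prefer the per-tweet <geo> point; fall back to the profile <location>."""
    geo = tweet.get("geo")
    if geo and geo.get("coordinates"):
        lat, lon = geo["coordinates"]  # assumed [lat, lon] order
        return float(lat), float(lon)
    location = ((tweet.get("user") or {}).get("location") or "").strip()
    match = LATLON_RE.search(location)
    if match:
        lat, lon = float(match.group(1)), float(match.group(2))
        if -90 <= lat <= 90 and -180 <= lon <= 180:
            return lat, lon
    # Arbitrary free text: exact-match lookup only, so this often fails.
    return CITY_COORDS.get(location.lower())


class TtlCache:
    """Keeps matched tweets per region for ttl_seconds, then drops them."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl_seconds
        self.clock = clock
        self._items: List[Tuple[float, str, dict]] = []  # (added_at, region, tweet)

    def add(self, region: str, tweet: dict) -> None:
        self._items.append((self.clock(), region, tweet))

    def recent(self, region: str) -> List[dict]:
        # Expire old entries lazily on read, then return this region's tweets.
        cutoff = self.clock() - self.ttl
        self._items = [item for item in self._items if item[0] >= cutoff]
        return [tweet for _, r, tweet in self._items if r == region]


def process(tweet: dict, regions: List[BoundingBox], cache: TtlCache) -> Optional[str]:
    """Cache the tweet under the first matching region; discard it otherwise."""
    coords = tweet_coordinates(tweet)
    if coords is None:
        return None
    for box in regions:
        if box.contains(*coords):
            cache.add(box.name, tweet)
            return box.name
    return None
```

Each incoming tweet would be passed to `process()` with your list of regions, and readers would call `cache.recent(region)`. Expiring entries lazily on read keeps the write path cheap; a high-volume consumer would probably want a per-region deque or a proper store instead of one flat list.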
