[twitter-dev] Re: Creating a search histogram

2009-04-30 Thread JoshL

Hmmm.  Very clever solution to use Google, and I agree that the number
of unique people MENTIONING a term is more valuable than the raw mention count.

It's wild to me that the Twitter API has not started to incorporate
more stats-related metrics.  Seems crazy to me that a service as useful
as Twitter, which has become a mainstream media tool, does not yet
provide metrics that would be useful to businesses.

Hopefully soon I suppose...

On Apr 29, 1:27 pm, Nick Arnett nick.arn...@gmail.com wrote:
 On Wed, Apr 29, 2009 at 8:57 AM, JoshL jlippi...@gmail.com wrote:

  Does anyone have a good suggestion for how to obtain the data needed
  to know how many mentions of a specific term occurred PER day over a
  given time period, such as two years?  Omniture's SiteCatalyst seems
  to be doing it somehow.

 Can't be done right now, since there is nowhere near that much history in
 Twitter's search index.  Even when there is, there will undoubtedly be an
 upper limit on results, which will prevent you from getting all the history
 for popular terms.

 However, if the data were available, the methodology would be fairly simple.
  You'd search on the terms and then iterate through the search results,
 counting unique mentions by day.  I'll suggest that for most purposes, the
 number of unique people mentioning a term is more interesting than the
 number of mentions (I've done a lot of that kind of analysis).

 I guess there's another possibility - use Google, which has more history.

 e.g. http://www.google.com/search?hl=en&rlz=1C1CHMI_enUS291US307&q=site:tw...

 For that term, there are about 56,000 results... but you can't get more than
 10,000 results from Google.  And you'd have to either parse the resulting
 pages to extract the status messages or just capture the screen names and
 use the Twitter APIs to get the statuses... fairly horrendous amount of work
 to get the data.

 For the sake of completeness, I'll note that you can get beyond 10,000
 results from Google by excluding terms, but there are also daily limits to
 Google API queries.

 And... knowing that Twitter Trends is doing essentially the same thing over
 the short term, I would suspect that if there's a need for this, Twitter
 will eventually tackle it.

 Those who are doing it now must have captured the data earlier if they have
 a year's worth.

 Nick


[twitter-dev] Re: Creating a search histogram

2009-04-29 Thread Nick Arnett
On Wed, Apr 29, 2009 at 8:57 AM, JoshL jlippi...@gmail.com wrote:



 Does anyone have a good suggestion for how to obtain the data needed
 to know how many mentions of a specific term occurred PER day over a
 given time period, such as two years?  Omniture's SiteCatalyst seems
 to be doing it somehow.


Can't be done right now, since there is nowhere near that much history in
Twitter's search index.  Even when there is, there will undoubtedly be an
upper limit on results, which will prevent you from getting all the history
for popular terms.

However, if the data were available, the methodology would be fairly simple.
 You'd search on the terms and then iterate through the search results,
counting unique mentions by day.  I'll suggest that for most purposes, the
number of unique people mentioning a term is more interesting than the
number of mentions (I've done a lot of that kind of analysis).
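
A minimal sketch of that counting step, assuming the search results have
already been collected as a list of records.  The field names `from_user`
and `created_at` and the ISO 8601 timestamp format are assumptions made
for illustration, not a statement about what any particular API returns.

```python
from collections import defaultdict
from datetime import datetime


def daily_histogram(results):
    """Count mentions and unique authors per calendar day.

    `results` is an iterable of dicts with the hypothetical keys
    'from_user' (screen name) and 'created_at' (ISO 8601 timestamp).
    Returns {date: (mention_count, unique_author_count)} ordered by date.
    """
    mentions = defaultdict(int)
    authors = defaultdict(set)
    for r in results:
        day = datetime.fromisoformat(r["created_at"]).date()
        mentions[day] += 1
        # Screen names are compared case-insensitively (an assumption).
        authors[day].add(r["from_user"].lower())
    return {day: (mentions[day], len(authors[day])) for day in sorted(mentions)}
```

Keeping a set of authors per day is what separates "unique people" from
raw mentions: one person posting a term ten times in a day adds ten to
the first count and one to the second.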

I guess there's another possibility - use Google, which has more history.

e.g.
http://www.google.com/search?hl=en&rlz=1C1CHMI_enUS291US307&q=site:twitter.com+%2Bego&btnG=Search

For that term, there are about 56,000 results... but you can't get more than
10,000 results from Google.  And you'd have to either parse the resulting
pages to extract the status messages or just capture the screen names and
use the Twitter APIs to get the statuses... fairly horrendous amount of work
to get the data.

For the sake of completeness, I'll note that you can get beyond 10,000
results from Google by excluding terms, but there are also daily limits to
Google API queries.
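
One way the excluding-terms idea could be sketched: split a single large
query into several smaller ones whose result sets do not overlap, so each
piece can stay under the cap.  The split terms ("the", "my") and the
helper names are hypothetical, and whether the search engine treats
exclusions as an exact partition is an assumption.

```python
from urllib.parse import urlencode


def partition_queries(base, split_terms):
    """Build queries whose result sets should not overlap.

    For split terms [a, b] this yields:
      base a        (pages containing a)
      base -a b     (pages without a, containing b)
      base -a -b    (pages containing neither)
    """
    queries = []
    excluded = []
    for term in split_terms:
        queries.append(" ".join([base, *excluded, term]))
        excluded.append("-" + term)
    queries.append(" ".join([base, *excluded]))
    return queries


def google_url(query):
    # Only the 'q' parameter is set; the other parameters in the
    # example URL above are omitted here.
    return "http://www.google.com/search?" + urlencode({"q": query})


for q in partition_queries("site:twitter.com +ego", ["the", "my"]):
    print(google_url(q))
```

Each extra split term adds one more query, so the daily query limit still
bounds how finely a popular term can be divided.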

And... knowing that Twitter Trends is doing essentially the same thing over
the short term, I would suspect that if there's a need for this, Twitter
will eventually tackle it.

Those who are doing it now must have captured the data earlier if they have
a year's worth.

Nick