[twitter-dev] Relationship between Gardenhose and Track vs Search API

2010-01-13 Thread Ross Bates
I'm reading the streaming API documentation and have a question about
track keywords. A set of keywords can be used to filter the gardenhose
but it doesn't actually increase your chance of getting tweets that
would not have been included in the unfiltered stream. The gardenhose
is a sample of the firehose and returns the same results to all
clients - correct?

If this is the case then for applications that need all data for
specific keywords I would think the search API remains the better
option? For example, if I needed all tweets that contained the words
foo OR bar the gardenhose can't guarantee I will get 100%.

What's confusing me is the email which went out the other day about
the streaming API. First the statement about polling for keywords:

If your application polls for keywords, mentions, is whitelisted on the
Search API, or makes more than perhaps 10 queries per minute, you should
begin your migration to Streaming. Desktop clients should postpone a
migration to Streaming.

Then later in the email:

Complete corpus search: Search is focused on result set quality and
there are no guarantees to return all matching tweets. Complete results
are only available on the Streaming API. Search results are increasingly
filtered and reordered for relevance.

This second statement differs from the streaming API documentation
which says that the streaming API is sampled.

Does the rollout of the streaming API to the general public mean that
results are no longer sampled?

-Ross


Re: [twitter-dev] Relationship between Gardenhose and Track vs Search API

2010-01-13 Thread Mark McBride
Check out the filter URL on the streaming API.  It will return up to N
tweets a minute, where N is the amount you'd get from a sampled
stream.  However, it only returns tweets that match your track keywords.
Provided the number of filtered tweets is never above the sampled
amount, you won't get limited.

Let's take a hypothetical example.  Using gardenhose you're throttled
at 100 tweets a minute (not the real number).  You track the keyword
twitter.  During the first minute there are 50 matches.  You get all
50.  During the second minute there are 150 tweets about twitter.
You'll get 100 tweets, and a limit message saying there were 50 more
you missed due to throttling.  Does this make sense?
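
For anyone who wants to play with the numbers, here is a minimal sketch of
the arithmetic in that hypothetical.  It assumes a plain per-minute cap;
the function name deliver and the cap of 100 are made up for illustration
and are not the real limit or anything from the API itself.

```python
def deliver(matches_per_minute, cap_per_minute):
    """Return a (delivered, missed) pair for each minute under a simple cap."""
    results = []
    for matched in matches_per_minute:
        # Everything is delivered until the cap is reached.
        delivered = min(matched, cap_per_minute)
        # Whatever exceeds the cap is what a limit message would report.
        missed = matched - delivered
        results.append((delivered, missed))
    return results


CAP = 100  # hypothetical cap, not the real number

for minute, (delivered, missed) in enumerate(deliver([50, 150], CAP), start=1):
    line = f"minute {minute}: delivered {delivered}"
    if missed:
        line += f", limit message reports {missed} missed"
    print(line)
```

With 50 and then 150 matches, the first minute comes through whole and the
second is cut to 100 with 50 reported as missed, same as described above.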

   ---Mark

http://twitter.com/mccv


