You can use the Search API to run historical queries when you first
start following a hashtag, then switch to Streaming. Search results
may be incomplete, but they should be sufficient in this case.
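
A minimal sketch of that backfill-then-stream approach, under stated
assumptions: each status is a dict with a numeric "id" key, and the
names backfill_then_stream, historical and live are hypothetical
stand-ins for Search results and an open Streaming connection. No real
API calls are made here.

```python
from typing import Iterable, Iterator


def backfill_then_stream(historical: Iterable[dict],
                         live: Iterable[dict]) -> Iterator[dict]:
    """Yield historical statuses oldest-first, then live ones, skipping repeats.

    Assumption: every status dict carries a unique, sortable "id".
    """
    seen = set()  # ids already emitted; grows without bound in this sketch

    # Search results may arrive newest-first, so sort them by id.
    for status in sorted(historical, key=lambda s: s["id"]):
        if status["id"] not in seen:
            seen.add(status["id"])
            yield status

    # The live stream may overlap the tail of the historical results.
    for status in live:
        if status["id"] not in seen:
            seen.add(status["id"])
            yield status
```

In practice it is probably safer to open the stream before running the
search, so that the two sources overlap rather than leave a gap; the
id check then drops the duplicates.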

Currently the count parameter is not supported in conjunction with
track for any access role, due to both cost and scraping potential.
Track users must be willing to tolerate some minor data loss during
reconnection. Note that all users of Search have always tolerated
occasional minor data loss -- Search has always been slightly lossy.

A well-coded Streaming client can usually reconnect within a second.
You can try to paper over data loss during this period with the Search
API, but it may not be worth the effort. If a keyword has such
velocity that there's significant data loss during that second of
reconnection, search results are likely to be heavily filtered anyway.
In general, the Streaming approach is likely to come closer to
complete coverage of the keyword than querying Search.
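
As a rough illustration of a quick reconnect, here is a sketch of a
reconnect loop. The connect callable is hypothetical (it stands in for
whatever opens the Streaming connection and yields statuses), and the
delay values and failure limit are assumptions, not documented limits.

```python
import time
from typing import Callable, Iterable, Iterator


def stream_with_reconnect(connect: Callable[[], Iterable[dict]],
                          max_failures: int = 5,
                          base_delay: float = 0.25,
                          sleep: Callable[[float], None] = time.sleep
                          ) -> Iterator[dict]:
    """Yield statuses from connect(), reconnecting when the stream drops.

    Gives up after max_failures consecutive attempts that deliver nothing.
    """
    failures = 0
    while failures < max_failures:
        try:
            for status in connect():
                failures = 0  # healthy connection: reset the backoff
                yield status
        except ConnectionError:
            pass  # treat a dropped connection like a closed stream
        failures += 1
        if failures < max_failures:
            # Short first retry, doubling on repeated failures, capped at 16s.
            sleep(min(base_delay * 2 ** (failures - 1), 16.0))
```

The first retry is kept short so that a single dropped connection
costs well under a second; the doubling only matters when the service
stays unreachable.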

In the end, if a keyword is prevalent enough for a gap to be noticed
during a reconnect, it's a pretty high-volume keyword. Is data loss in
this case an actual practical issue for your app when you are
receiving tens or hundreds of tweets per minute?

If there is sufficient demand, we'll investigate historical queries
for track at higher access levels, but currently this is a low
priority issue that will soon have a workable alternative. The
workaround would be to take the Firehose, once we announce terms, and
use the count parameter there. It's possible to consume the Firehose
without common-case data loss.
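
For roles that do allow count, the documentation passage quoted below
suggests tracking the receive rate and the time of the last status,
then estimating the backlog on reconnect. A minimal sketch of that
bookkeeping follows; the class name, the smoothing factor and the
safety margin are all assumptions, and nothing here talks to the API.

```python
import math
from typing import Optional


class BacklogEstimator:
    """Track the arrival rate of statuses and estimate a backlog to request."""

    def __init__(self, smoothing: float = 0.1) -> None:
        self.smoothing = smoothing              # weight given to the newest gap
        self.avg_gap: Optional[float] = None    # smoothed seconds between statuses
        self.last_seen: Optional[float] = None  # time the last status arrived

    def record(self, now: float) -> None:
        """Call once per received status with the current time in seconds."""
        if self.last_seen is not None:
            gap = now - self.last_seen
            if self.avg_gap is None:
                self.avg_gap = gap
            else:
                a = self.smoothing
                self.avg_gap = (1 - a) * self.avg_gap + a * gap
        self.last_seen = now

    def backlog(self, now: float, margin: float = 1.2) -> int:
        """Estimate how many statuses were missed since the last one seen."""
        if self.last_seen is None or not self.avg_gap:
            return 0  # not enough data to estimate a rate
        missed = (now - self.last_seen) / self.avg_gap
        return math.ceil(missed * margin)
```

On reconnect, the value returned by backlog() would be the number to
pass as count. The margin over-requests slightly, on the assumption
that duplicates are cheaper to discard than gaps are to fill.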

-John Kalucki
http://twitter.com/jkalucki
Infrastructure, Twitter Inc.

On Fri, Jan 22, 2010 at 5:41 AM, Jorge Vargas <jorge.var...@gmail.com> wrote:
> Hello,
>
> I'm building an archival tool for a specific hashtag. After looking at
> the API and testing the Streaming API, I got it working to pull any
> new tweets. However, it seems the "count" parameter is restricted.
> From http://apiwiki.twitter.com/Streaming-API-Documentation#QueryParameters
>
> "Firehose, Retweet, Link, Birddog and Shadow clients interested in
> capturing all statuses should maintain a current estimate of the
> number of statuses received per second and note the time that the last
> status was received. Upon a reconnect, the client can then estimate
> the appropriate backlog to request. Note that the count parameter is
> not allowed elsewhere, including track, sample and on the default
> access role."
>
> I'm a little confused by that statement. Does that mean I can't use
> count on any track requests, or just on track with the default access
> role? In that case, how can I get "Shadow" access? Google isn't
> cooperating much due to the simplicity of the name.
>
> Otherwise, pulling past tweets from the Streaming API does not seem
> possible. I'm able to do that with Search using since_id, but that
> seems a little weird.
>
> So if my streaming client dies, since it cannot look back at those
> tweets, I'm going to have to build a full "catch up" procedure using
> the Search API to fill in the gap during which the streaming client
> was down. Is that really what is expected of third-party developers?
>
> I'm just puzzled that the Twitter team is asking everyone who pulls
> data to move to the Streaming API, when it seems that we are going to
> need a very big workaround in order to get a reliable source of data,
> and there is no simple way to get old data out of the system.
>
