I'm building an archival tool for a specific hashtag. After reading the
API docs and testing the Streaming API, I got it working to pull any
new tweets. However, it seems the "count" parameter is restricted.
From http://apiwiki.twitter.com/Streaming-API-Documentation#QueryParameters

"Firehose, Retweet, Link, Birddog and Shadow clients interested in
capturing all statuses should maintain a current estimate of the
number of statuses received per second and note the time that the last
status was received. Upon a reconnect, the client can then estimate
the appropriate backlog to request. Note that the count parameter is
not allowed elsewhere, including track, sample and on the default
access role."
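
For what it's worth, here is how I read the bookkeeping that paragraph
describes, as a rough Python sketch. The class name, the smoothing
factor and the method names are all my own assumptions and not anything
from the Twitter docs; the result would only be useful to a role that is
actually allowed to pass count.

```python
import time


class BacklogEstimator:
    """Hypothetical helper: tracks statuses/second and the last receive time."""

    def __init__(self, smoothing=0.2):
        self.smoothing = smoothing   # weight given to the newest rate sample (assumed value)
        self.rate = 0.0              # current estimate of statuses per second
        self.last_received = None    # timestamp of the most recent status

    def record(self, now=None):
        """Call once for every status read from the stream."""
        now = time.time() if now is None else now
        if self.last_received is not None:
            gap = now - self.last_received
            if gap > 0:
                sample = 1.0 / gap
                if self.rate == 0.0:
                    self.rate = sample  # first sample seeds the estimate
                else:
                    # exponential moving average of the instantaneous rate
                    self.rate += self.smoothing * (sample - self.rate)
        self.last_received = now

    def backlog(self, now=None):
        """Estimated number of statuses missed since the last one received."""
        if self.last_received is None:
            return 0
        now = time.time() if now is None else now
        return max(0, int(round(self.rate * (now - self.last_received))))
```

On reconnect, the value returned by backlog() is what I would expect to
pass as count, if my access role permitted it.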

I'm a little confused by that statement. Does that mean I can't use
count on any track requests, or just on track with the default access
role? In that case, how can I get "Shadow" access? Google isn't
cooperating much because the name is such a common word.

Otherwise, pulling past tweets from the Streaming API doesn't seem
possible. I can do that with the Search API using since_id, but that
seems a little weird.

So if my streaming client dies, it has no way to look back at the
tweets it missed, and I'm going to have to build a full "catch up"
procedure using the Search API to fill in the gap between the moment
the client went down and the moment it came back up. Is that really
what is expected of third-party developers?
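
To make the "catch up" idea concrete, this is roughly what I have in
mind, as a sketch only. fetch_page is a hypothetical callable that I
would write around a Search API request using since_id; its signature,
the page limit and the shape of the results (dicts with an integer
"id") are my assumptions, not the real API.

```python
def catch_up(last_seen_id, fetch_page, max_pages=15):
    """Collect statuses newer than last_seen_id, returned oldest first.

    fetch_page(since_id, page) is a hypothetical wrapper around a search
    request; it is assumed to return a list of dicts with an integer "id",
    or an empty list once there are no more results.
    """
    found = {}
    for page in range(1, max_pages + 1):
        results = fetch_page(last_seen_id, page)
        if not results:
            break  # no more pages to read
        for status in results:
            # keying by id drops duplicates that show up across pages
            if status["id"] > last_seen_id:
                found[status["id"]] = status
    return [found[i] for i in sorted(found)]
```

The streaming client would have to store the id of the last status it
saw, call something like this on restart, and then drop any duplicates
between the search results and the resumed stream.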

I'm just puzzled that the Twitter team is asking everyone who pulls
data to move to the Streaming API, yet it seems we are going to need a
very big workaround to get a reliable source of data, and there is no
simple way to get old data out of the system.
