I wanted to make a simple "cursor" which would allow me to remember a
position on a timeline, and then pull messages and crawl forward
without missing any messages. I thought the way to do that would be
to use "since_id" and "count". However, this approach is unreliable
because of the way the two parameters interact. A lot of people seem
to be reporting issues with since_id (perhaps this is related, I'm
not sure).
since_id limits the returned results, as does count. However, count
appears to give preference to the most recent tweets. This makes
sense if all you want is a snapshot of the most recent stuff.
However, if you are trying to simply iterate over the list in order,
then count can't be used.
Here's an example to show why you can't build a reliable, simple,
forward cursor using the Twitter API:
1. Assume we are at since_id = 1000. This was the last (highest)
message id we had previously seen, which we have saved.
2. There is a sudden spike and 2000 tweets come in.
3. We now try to query with since_id=1000, count=200 (the max).
Unfortunately, we have missed 1800 tweets, because we only get the
most recent 200 tweets.
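To make the numbers concrete, here is a rough simulation of the three
steps above. It is only a sketch: fetch_most_recent is a made-up
stand-in for the timeline query (not a real client call), and the ids
are sequential purely to keep the arithmetic easy to follow.

```python
def fetch_most_recent(timeline, since_id, count):
    """Hypothetical stand-in for a timeline query: return up to `count`
    ids greater than since_id, preferring the most recent ones."""
    newer = sorted(i for i in timeline if i > since_id)
    return newer[-count:]

# Ids 1..1000 were already seen; a spike then adds 2000 more (1001..3000).
timeline = list(range(1, 3001))

batch = fetch_most_recent(timeline, since_id=1000, count=200)

# Everything between our saved cursor and the oldest id returned is lost.
missed = [i for i in timeline if 1000 < i < batch[0]]

print(len(batch), batch[0], batch[-1])  # 200 2801 3000
print(len(missed))                      # 1800
```

If we now save 3000 as the new since_id, ids 1001 to 2800 are never
returned by any later query of this form.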
The problem is actually worse than this simple example. Since the ids
are non-sequential within a particular stream, we have no idea how
many tweets were actually for us: the gap between two ids says nothing
about how many tweets fall between them. We could save the _lowest_
id seen in a batch and try iterating backward from there (fairly
complicated for such a simple thing).
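For what it's worth, the backward-iteration workaround might look
roughly like the sketch below. It assumes the query also accepts some
upper bound on ids; I call it below_id here, which is a hypothetical
name, and fetch_most_recent is again a made-up stand-in rather than a
real client call.

```python
def fetch_most_recent(timeline, since_id, count, below_id=None):
    """Hypothetical query: up to `count` of the most recent ids that are
    greater than since_id and, if given, less than below_id."""
    matches = sorted(
        i for i in timeline
        if i > since_id and (below_id is None or i < below_id)
    )
    return matches[-count:]

def catch_up_backward(timeline, since_id, count):
    """Page backward from the newest id until we reach since_id."""
    collected = []
    below_id = None  # hypothetical upper-bound argument, see above
    while True:
        batch = fetch_most_recent(timeline, since_id, count, below_id)
        if not batch:
            break
        collected = batch + collected  # older pages go in front
        below_id = batch[0]            # lowest id seen so far
    return collected

# Non-sequential ids: every third number from 1000 to 7000.
timeline = list(range(1000, 7001, 3))
collected = catch_up_backward(timeline, since_id=1000, count=200)
print(len(collected))  # 2000
```

Note that the whole backlog has to be fetched newest-first and then
stitched together before the cursor can safely advance, which is the
complication I was referring to.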
Thus we are entirely dependent on the rate of incoming tweets and our
sampling rate. At a certain rate it will appear to work, and at some
point it will start failing miserably.
How to solve this: have some way of returning the earliest "count"
tweets rather than the most recent.
Let's call this query arg "early_count". This will easily allow a
cursor to be created, and forward iteration on the stream of
messages. Moving forward is simple: just remember the highest seen
id, and pass this in as since_id, along with early_count set to
however many tweets you want to move forward by.
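Here is a minimal sketch of how a cursor would work if such an
argument existed. To be clear, early_count is my proposal and not
something the API offers, and fetch_earliest is a made-up stand-in
for the query.

```python
def fetch_earliest(timeline, since_id, early_count):
    """Hypothetical query: the earliest `early_count` ids greater than
    since_id, in ascending order."""
    newer = sorted(i for i in timeline if i > since_id)
    return newer[:early_count]

def crawl_forward(timeline, since_id, early_count):
    """Walk forward from since_id; return all ids seen and the new cursor."""
    seen = []
    while True:
        batch = fetch_earliest(timeline, since_id, early_count)
        if not batch:
            break
        seen.extend(batch)
        since_id = batch[-1]  # remember the highest id seen so far
    return seen, since_id

# Non-sequential ids: every third number from 1000 to 7000.
timeline = list(range(1000, 7001, 3))
seen, cursor = crawl_forward(timeline, since_id=1000, early_count=200)
print(len(seen), cursor)  # 2000 7000
```

The cursor is just the saved since_id, and no tweet is skipped no
matter how large the spike is or how rarely we poll.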
I'm somewhat surprised no one has commented on this design flaw (which
makes me suspicious, perhaps I missed something obvious). If so,