On Wed, Oct 14, 2009 at 3:27 PM, Kyle B <kylebarn...@gmail.com> wrote:

>
> Thanks for the info. It helps a lot.  Figuring out an accurate number
> is essential to my model, so much so that I am determined to find some
> method of estimating it to acceptable margins of error!


It occurs to me that perhaps this might not be so hard... and please do
share your results with us.

Just test a good-sized random sample of IDs and see how many don't exist.  The
fraction that do exist, multiplied by the size of the ID range, will give you
an idea of how many there really are.  I'll be curious to see if you get
consistent results from one day to the next.  I won't be too surprised if you
don't, which would mean that Twitter is skipping a random (or at least
somewhat random) number of IDs each day.
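
Here is a minimal sketch of that sampling idea in Python. It assumes you
already know the lowest and highest ID in the window you care about, and that
you can supply a function that says whether a given ID exists. The names
estimate_existing_ids and id_exists are made up for illustration; id_exists
stands in for whatever lookup you actually use, and nothing here is a real
Twitter API call. The interval is a plain normal approximation, so treat it
as rough.

```python
import math
import random


def estimate_existing_ids(lo, hi, sample_size, id_exists, rng=None):
    """Estimate how many IDs in [lo, hi] exist, from a uniform random sample.

    id_exists is a hypothetical callable (id -> bool) supplied by the caller.
    Returns (estimate, low, high), where low/high are an approximate 95% range.
    """
    rng = rng or random.Random()
    span = hi - lo + 1

    # Draw IDs uniformly at random (with replacement) and count the hits.
    hits = sum(1 for _ in range(sample_size) if id_exists(rng.randint(lo, hi)))
    p = hits / sample_size

    # Normal-approximation 95% interval on the proportion that exist.
    half_width = 1.96 * math.sqrt(p * (1 - p) / sample_size)
    low = max(0.0, p - half_width) * span
    high = min(1.0, p + half_width) * span
    return p * span, low, high
```

Running this once a day over that day's ID range, and comparing the
proportions, is one way to see whether the skip rate is stable.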

However, if you want to continue to know this number, you'll have to
continue to sample.  And your sample might have to span multiple days to get
a reliable answer.

And I hate to say this, because if they're not already doing it, this might
make them start... Twitter could be monitoring for any process that
repeatedly asks for deliberately non-existent IDs, in order to block them,
to maintain the obfuscation.  Then you're stuck again, unless you can find a
way around that defense.

Assuming there are millions of IDs a day, you'll need a pretty good sample
size if you want to keep the estimate accurate.
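
As a rough guide to how big "pretty good" is, the textbook formula for
estimating a proportion is n = z^2 * p * (1 - p) / e^2, where e is the margin
of error you will accept on the proportion. The sketch below just evaluates
it. The function name required_sample_size is made up for illustration, and
p = 0.5 is the worst-case assumption when you have no idea what fraction of
IDs exist. For a population in the millions, the required sample hardly
depends on the population size, only on the margin you want.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96):
    """Sample size for estimating a proportion to within +/- margin.

    z = 1.96 corresponds to roughly 95% confidence; p = 0.5 is the
    worst case (largest variance) when the true proportion is unknown.
    """
    if not 0 < margin < 1:
        raise ValueError("margin must be between 0 and 1")
    return math.ceil(z * z * p * (1 - p) / (margin * margin))
```

For example, required_sample_size(0.01) asks how many IDs to test to pin the
existing fraction down to within one percentage point, and halving the margin
roughly quadruples the answer.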

The good news in all this is that IIRC, Twitter has guaranteed that IDs will
increase chronologically.

The bad news is that I'm writing this off the top of my head and there's
probably an easy defense I haven't thought of, which somebody at Twitter
will think of just because they see this conversation.

Put 'em on double-secret probation, I say.

Nick
