> That sample will be biased towards more active posters and may include
> some demographic biases due to seasonal activities during the limited
> time frame of the sample.

That answers my question, and that is what I was afraid of. I think
for my purposes (language detection), a random sample of active users
is fine. I just wanted to get opinions.
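
The activity bias can be illustrated with a rough simulation. This is only a sketch, not Twitter's actual sampling logic: the population sizes, the posting rates, and the name simulate_stream_sample are all hypothetical. It assumes each tweet in the stream comes from a user with probability proportional to that user's posting rate, then collects the distinct user IDs seen.

```python
import random

def simulate_stream_sample(rates, n_tweets, seed=0):
    """Draw n_tweets authors with probability proportional to posting rate
    and return the set of distinct user IDs observed."""
    rng = random.Random(seed)
    users = list(rates)
    weights = [rates[u] for u in users]
    authors = rng.choices(users, weights=weights, k=n_tweets)
    return set(authors)

# Hypothetical population: 1,000 heavy posters and 9,000 light posters.
rates = {f"heavy{i}": 50.0 for i in range(1000)}
rates.update({f"light{i}": 0.5 for i in range(9000)})

seen = simulate_stream_sample(rates, n_tweets=5000)
heavy_share = sum(u.startswith("heavy") for u in seen) / len(seen)
print(f"heavy posters: 10% of population, {heavy_share:.0%} of sampled IDs")
```

Under these made-up numbers, heavy posters make up far more of the collected IDs than of the population, which is the bias described in the quoted reply.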

> The Streaming API sample method would provide a random sampling of
> public users weighted by update rate, not a random sampling of all
> users. The default 'spritzer' should be sufficient for most uses.

To clarify, does this mean that each (non-protected) user has an equal
probability of showing up in the stream regardless of how often they
tweet?

Thanks,
Ryan

On Oct 12, 8:31 am, Chris Babcock <cbabc...@kolonelpanic.org> wrote:
> > I am doing some research using the Twitter API and I would like to get
> > a random sample of Twitter users. Any ideas of how this can be
> > accomplished?
>
> Here's a start: http://en.wikipedia.org/wiki/Sampling_(statistics)
>
> At this point you are asking for a sampling method without providing an
> adequate definition of the population.
>
> > So far, I have scraped 2 weeks from the Streaming API and extracted 3
> > million user IDs from the stream. Any arguments as to whether or not
> > this could constitute a random sample?
>
> That sample will be biased towards more active posters and may include
> some demographic biases due to seasonal activities during the limited
> time frame of the sample.
>
> Chris Babcock
