> That sample will be biased towards more active posters and may include
> some demographic biases due to seasonal activities during the limited
> time frame of the sample.

That answers my question, and that is what I was afraid of. I think for my purposes (language detection), a random sample of active users is fine. I just wanted to get opinions.

> The Streaming API sample method would provide a random sampling of
> public users weighted by update rate, not a random sampling of all
> users. The default 'spritzer' should be sufficient for most uses.

To clarify, does this mean that each (non-protected) user has an equal probability of showing up in the stream regardless of how often they tweet?

Thanks,
Ryan

On Oct 12, 8:31 am, Chris Babcock <[email protected]> wrote:
> > I am doing some research using the Twitter API and I would like to get
> > a random sample of Twitter users. Any ideas of how this can be
> > accomplished?
>
> Here's a start: http://en.wikipedia.org/wiki/Sampling_(statistics)
>
> At this point you are asking for a sampling method without providing an
> adequate definition of the population.
>
> > So far, I have scraped 2 weeks from the Streaming API and extracted 3
> > million user IDs from the stream. Any arguments as to whether or not
> > this could constitute random?
>
> That sample will be biased towards more active posters and may include
> some demographic biases due to seasonal activities during the limited
> time frame of the sample.
>
> Chris Babcock
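
For anyone who wants to see the "weighted by update rate" point in numbers, here is a minimal simulation sketch. It does not call the Twitter API. It assumes, purely for illustration, that the sample stream keeps each public tweet independently with a fixed probability; the population sizes, tweet counts, and the 1% keep rate are all made-up values, and the function name is hypothetical.

```python
import random


def simulate_stream_sample(tweet_counts, keep_probability, rng):
    """Return the set of user IDs that show up in a simulated tweet sample.

    Assumption (not confirmed Twitter behaviour): every tweet is kept
    independently with probability keep_probability.
    """
    seen = set()
    for user_id, n_tweets in enumerate(tweet_counts):
        # A user is observed if at least one of their tweets is kept.
        if any(rng.random() < keep_probability for _ in range(n_tweets)):
            seen.add(user_id)
    return seen


rng = random.Random(0)  # fixed seed so the run is repeatable

# Hypothetical population: 9000 light users (1 tweet each in the window)
# and 1000 heavy users (100 tweets each in the window).
tweet_counts = [1] * 9000 + [100] * 1000

seen = simulate_stream_sample(tweet_counts, keep_probability=0.01, rng=rng)

heavy_share_population = 1000 / len(tweet_counts)
heavy_share_sample = sum(1 for u in seen if tweet_counts[u] == 100) / len(seen)

print(f"heavy users in population: {heavy_share_population:.1%}")
print(f"heavy users among sampled IDs: {heavy_share_sample:.1%}")
```

Under these assumptions a light user is seen with probability 0.01, while a heavy user is seen with probability 1 - 0.99^100, roughly 0.63, so heavy users make up 10% of the population but the large majority of the collected IDs. In other words, users would not have an equal chance of appearing regardless of how often they tweet; the chance grows with the number of tweets posted during the collection window.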
