Interesting. Your method is similar to the breadth-first crawl that many people
use (see, for example, the academic paper by Kwak et al., 2010).
You have to keep in mind, however, that you are only crawling the giant
component of the network, i.e. the part that is connected to your seed users.
If there are any Turkish users who form a *separate* subpopulation that is not
connected to the rest, you won't find those.
You could easily find those with a sample stream. I have to admit that the
number of non-connected users is probably not very big, but no one has really
tested that.
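
To make the point concrete, here is a minimal sketch on a toy graph. It is not
your implementation: the graph, the user names, the `is_turkish` test and the
`stream_sample` set are all made up for illustration, and I am assuming the
crawl expands through every user it meets while only recording the Turkish
ones. It shows a breadth-first crawl from one seed, and then how users seen in
a sample stream could be checked against the crawled set.

```python
from collections import deque


def bfs_crawl(seeds, get_neighbors, is_turkish):
    """Breadth-first crawl from the seeds.

    Returns the set of users reached from the seeds that pass is_turkish.
    Users in components not connected to any seed are never visited.
    """
    seen = set(seeds)
    queue = deque(seeds)
    matched = set()
    while queue:
        user = queue.popleft()
        if is_turkish(user):
            matched.add(user)
        for other in get_neighbors(user):
            if other not in seen:
                seen.add(other)
                queue.append(other)
    return matched


# Hypothetical toy network: a-b-c-d are connected, x-y form a separate island.
graph = {
    "a": ["b", "c"],
    "b": ["a"],
    "c": ["a", "d"],
    "d": ["c"],
    "x": ["y"],
    "y": ["x"],
}
turkish = {"a", "b", "d", "x", "y"}  # stand-in for the language test

crawled = bfs_crawl(["a"], lambda u: graph.get(u, []), lambda u: u in turkish)

# Hypothetical users observed in a sample stream, independent of the crawl.
stream_sample = {"b", "d", "x"}

# Turkish users seen in the stream that the crawl never reached.
missed = {u for u in stream_sample if u in turkish} - crawled
```

Here the crawl finds a, b and d but never reaches x or y, and the stream check
surfaces x as a user the crawl missed. On real data, the share of such users
in the sample would give a rough idea of how much lies outside the crawled
component.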
On Jul 3, 2010, at 20:00, Furkan Kuru wrote:
> We have implemented the Turkish version:
> We skipped the first three steps and instead started with a few Turkish
> users, crawled the whole network, and for each new user tested whether the
> description or latest tweets are in Turkish.
> We have identified almost 100,000 Turkish users so far.
> Using the Streaming API we collect their tweets and find the popular people,
> keywords, and top tweets (the most retweeted ones) among Turkish users.