Re: [twitter-dev] Farsi Twitter App

2010-07-06 Thread Lucas Vickers
Thank you everyone. You've given me quite a few good options to look into. Lucas On Mon, Jul 5, 2010 at 5:57 AM, Jean-Charles Campagne a...@semiocast.com wrote: Hello Lucas, We do not provide, yet, exactly what you are looking for, but for now we might help you on the language filtering

Re: [twitter-dev] Farsi Twitter App

2010-07-05 Thread Jean-Charles Campagne
Hello Lucas, We do not provide, yet, exactly what you are looking for, but for now we might help you on the language filtering part. We provide an API for language and location filtering for micro-messages (Tweets and Facebook messages, etc.). You'll find more info on the API website:

Re: [twitter-dev] Farsi Twitter App

2010-07-04 Thread Pascal Jürgens
Interesting. Your method is similar to the breadth-first crawl that many people do (for example, see the academic paper by Kwak et al. 2010). You have to keep in mind, however, that you are only crawling the giant component of the network, the connected part. If there are any Turkish users who
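
To illustrate the point about the giant component, here is a minimal sketch on a toy graph. The graph, the user names, and the `reachable` helper are all made up for illustration, and edges are treated as undirected. A crawl seeded at "a" never sees "x" or "y", because no follow edge connects them to the seed's component.

```python
from collections import deque

def reachable(graph, seed):
    """Breadth-first search: return every node reachable from seed."""
    seen = {seed}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        for neighbor in graph.get(node, ()):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen

# Hypothetical follow graph with two separate components.
graph = {
    "a": {"b"},
    "b": {"a", "c"},
    "c": {"b"},
    "x": {"y"},  # isolated subpopulation
    "y": {"x"},
}

found = reachable(graph, "a")
missed = set(graph) - found  # users the crawl can never discover
```

Seeding from several unrelated accounts, or adding another discovery channel, is one way to reduce what ends up in `missed`.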

Re: [twitter-dev] Farsi Twitter App

2010-07-04 Thread Furkan Kuru
You are right. Separate subpopulations are out of our reach. Apart from following/friendship connections, we look at mentions and follow them as well. If a newcomer or someone from another population mentions one of the people in our network, his tweet will reach us and we can test him and add as

Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread Pascal Jürgens
Hi Lucas, as someone who approached a similar problem, my recommendation would be to track users. In order to get results quickly (rather than every few hours via user timeline calls), you need streaming access, which is a bit more complicated. I implemented such a system in order to track

Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread John Kalucki
It's great to hear that someone implemented all this. There's a similar technique documented here: http://dev.twitter.com/pages/streaming_api_concepts, under "By Language and Country". My suggestion was to start with a list of stop words to build your user corpus -- but I don't know how well Farsi
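
A rough sketch of the stop-word idea, not taken from the Twitter documentation: count how many tokens of a tweet are common function words of the target language, and keep the authors of tweets that pass a threshold. The five-word list, the `min_hits` threshold, and the `(user, text)` input shape are assumptions for illustration; a real list would be much longer, and words shared with Arabic (such as "و") will produce false positives.

```python
# Tiny illustrative list of common Farsi function words (assumption, not exhaustive).
FARSI_STOP_WORDS = {"و", "در", "به", "از", "را"}

def stop_word_hits(text, stop_words=FARSI_STOP_WORDS):
    """Count whitespace-separated tokens that are in the stop-word list."""
    return sum(1 for token in text.split() if token in stop_words)

def build_user_corpus(tweets, min_hits=2):
    """Collect the users whose tweet contains at least min_hits stop words.

    tweets is an iterable of (user, text) pairs.
    """
    users = set()
    for user, text in tweets:
        if stop_word_hits(text) >= min_hits:
            users.add(user)
    return users

sample = [
    ("user1", "من از تهران به شیراز رفتم"),
    ("user2", "hello world this is a test"),
]
corpus = build_user_corpus(sample)
```

The resulting set of users can then serve as the seed list for tracking, with the threshold tuned by hand against a labelled sample.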

Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread Pascal Jürgens
John, yes, thanks a lot for the design proposal - that is what inspired my own system. I am not primarily filtering by language, however, but by country, so I'm using time zone and location data together with a list of cities from http://www.geonames.org/. The manual cross-check in my thesis
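
A minimal sketch of country filtering from profile data, under stated assumptions: the profile is a dict with free-text `location` and `time_zone` fields, the city and time-zone sets are small hand-written samples (in practice the city names would be loaded from a GeoNames export), and a match on either signal is enough. How the two signals were actually combined in the system described above is not stated in the thread.

```python
# Hypothetical sample data for one example country.
CITIES = {"berlin", "hamburg", "munich"}
TIME_ZONES = {"Berlin"}

def in_country(profile, cities=CITIES, time_zones=TIME_ZONES):
    """Return True if the profile's location or time zone points to the country."""
    location = (profile.get("location") or "").lower()
    # Split the free-text location into rough tokens and drop punctuation.
    tokens = {t.strip(".,") for t in location.replace("/", " ").split()}
    city_match = bool(tokens & cities)
    tz_match = profile.get("time_zone") in time_zones
    return city_match or tz_match

profiles = [
    {"location": "Berlin, Germany", "time_zone": None},
    {"location": "NYC", "time_zone": "Eastern Time (US & Canada)"},
    {"location": "", "time_zone": "Berlin"},
]
matches = [in_country(p) for p in profiles]
```

Token matching like this misses multi-word city names and misspellings, which is one reason a manual cross-check of the results is worthwhile.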

Re: [twitter-dev] Farsi Twitter App

2010-07-03 Thread Furkan Kuru
We have implemented the Turkish version: Twitturk http://twitturk.com/home/lang/en We skipped the first three steps but started with a few Turkish users, crawled the whole network, and for each new user tested whether the description or latest tweets are in Turkish. We have almost 100.000
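
The crawl described here could look roughly like the sketch below. This is not Twitturk's code: `get_friends` and `looks_turkish` are hypothetical callbacks standing in for an API lookup and a language test on the description or latest tweets, and the choice not to expand through users who fail the test is an assumption.

```python
from collections import deque

def crawl(seeds, get_friends, looks_turkish):
    """Breadth-first crawl from seed users, keeping only users that pass the test."""
    accepted = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        if not looks_turkish(user):
            continue  # assumption: do not follow edges out of rejected users
        accepted.append(user)
        for friend in get_friends(user):
            if friend not in seen:
                seen.add(friend)
                queue.append(friend)
    return accepted

# Toy data standing in for the real network and language test.
graph = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"]}
turkish = {"a", "b", "d"}
result = crawl(["a"], lambda u: graph.get(u, []), lambda u: u in turkish)
```

Skipping rejected users keeps the crawl from spreading into the whole network, at the cost of missing target-language users who are only reachable through them.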