I am trying to create an app that shows tweets and trends in Farsi
for native speakers. I would like to get a 'garden hose' style sample
of Farsi tweets, but I have not been able to come up with an elegant
solution.

I see the following options:

- Sample all tweets, and run a language detection algorithm on each
tweet to determine which are (or could be) Farsi.
  * Problem: only a very small percentage of the tweets will be in Farsi.

- Use the location filter to sample tweets from countries where Farsi
is widely spoken, and then run a language detection algorithm on those
tweets.
  * Problem: the coordinate box I can provide seems to be limited in
size. I cannot even cover all of Iran, for example.

- Filter on a standard Farsi term.
  * Problem: this limits my results to tweets containing that term.

- Search for language = Farsi.
  * Problem: this is not a stream, so I will need to keep searching.
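For the location option, one possible workaround is to tile a large
area into several boxes that are each small enough to be accepted. This
is a sketch that assumes the `locations` parameter of the streaming
filter accepts more than one box at once, each written as south-west
longitude,latitude followed by north-east longitude,latitude, all
comma-separated. The per-box size limit (`max_side` here) and any cap
on the number of boxes are assumptions to check against the current API
documentation. The Iran coordinates below are rough, illustrative values.

```python
import math
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (west, south, east, north) in degrees


def tile_bbox(west: float, south: float, east: float, north: float,
              max_side: float) -> List[Box]:
    """Split a bounding box into a grid of tiles no wider or taller than max_side."""
    cols = max(1, math.ceil((east - west) / max_side))
    rows = max(1, math.ceil((north - south) / max_side))
    step_x = (east - west) / cols
    step_y = (north - south) / rows
    tiles = []
    for r in range(rows):
        for c in range(cols):
            # Compute each edge from the origin so rounding error does not accumulate.
            tiles.append((west + c * step_x, south + r * step_y,
                          west + (c + 1) * step_x, south + (r + 1) * step_y))
    return tiles


def locations_param(tiles: List[Box]) -> str:
    """Write every tile as SW lon,SW lat,NE lon,NE lat and join them with commas."""
    return ",".join(f"{w:.4f},{s:.4f},{e:.4f},{n:.4f}" for w, s, e, n in tiles)


# Rough, illustrative box around Iran; max_side=5.0 is a hypothetical limit.
tiles = tile_bbox(44.0, 25.0, 63.5, 40.0, max_side=5.0)
print(len(tiles), "boxes")  # prints: 12 boxes
print(locations_param(tiles))
```

Tweets matched this way would still need language detection, since the
boxes also cover speakers of other languages.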

Of the options above, the one that makes the most sense to me is to
search for tweets where language = Farsi, and use since_id to keep my
results new. Given this method, I have three questions:

1 - Is since_id the highest tweet_id from the previous result set?
2 - How often can I search (within the API limits, of course) to
    ensure I get new data?
3 - Does the language filter return tweets from users whose default
    language is Farsi, or does it actually find tweets written in Farsi?
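On questions 1 and 2: since_id is normally set to the largest tweet id
seen so far, so that the next query returns only newer tweets. Below is
a minimal sketch of a polling loop that tracks that value. `fetch` is a
hypothetical placeholder for whatever performs the search request (for
example, a search restricted to Farsi with since_id set; check the
Search API documentation for the exact parameter names), and the
interval is an assumption to be derived from the documented rate limit.

```python
import time
from typing import Callable, Dict, Iterable, Optional

Tweet = Dict  # assumption: each tweet is a dict with an integer "id" key


def min_interval(max_requests: int, window_s: float) -> float:
    """Even spacing between requests that stays within max_requests per window."""
    return window_s / max_requests


def poll_new_tweets(fetch: Callable[[Optional[int]], Iterable[Tweet]],
                    handle: Callable[[Tweet], None],
                    rounds: int,
                    interval_s: float,
                    since_id: Optional[int] = None) -> Optional[int]:
    """Call fetch(since_id) repeatedly and pass each unseen tweet to handle.

    fetch (hypothetical) should return tweets with id greater than since_id,
    or recent tweets when since_id is None. Returns the highest id seen, which
    can be stored and passed back in on the next run.
    """
    for i in range(rounds):
        batch = list(fetch(since_id))
        # Handle oldest first so downstream code sees tweets in order.
        for tweet in sorted(batch, key=lambda t: t["id"]):
            handle(tweet)
        if batch:
            newest = max(t["id"] for t in batch)
            since_id = newest if since_id is None else max(since_id, newest)
        if i < rounds - 1:
            time.sleep(interval_s)
    return since_id
```

For example, with a hypothetical limit of 150 requests per hour,
`min_interval(150, 3600)` gives 24 seconds between searches. If a
single search returns only a limited number of results, a busy period
between polls could still overflow it, so polling more often than the
bare limit allows is not possible, but polling at the limit may be
necessary.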

I am aware that users can select their native language in their
profile, but I also know this is not 100% reliable.
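Because the profile language is unreliable, one cheap check on the
tweet text itself is to look at which letters it uses. This is a rough
heuristic, not a real language detector: it accepts text that is mostly
Arabic-script and contains letters used in Persian but not in standard
Arabic, and rejects text containing letters typical of Urdu. The letter
sets and the 0.5 threshold are illustrative assumptions.

```python
# Basic Arabic Unicode block (U+0600 to U+06FF).
ARABIC_BLOCK = range(0x0600, 0x0700)

# Letters used in Persian but not in standard Arabic:
# peh, tcheh, jeh, gaf, keheh, Farsi yeh.
PERSIAN_MARKERS = {"\u067E", "\u0686", "\u0698", "\u06AF", "\u06A9", "\u06CC"}

# Letters typical of Urdu but not of Persian:
# tteh, ddal, rreh, noon ghunna, heh doachashmee, heh goal, yeh barree.
URDU_MARKERS = {"\u0679", "\u0688", "\u0691", "\u06BA", "\u06BE", "\u06C1", "\u06D2"}


def looks_farsi(text: str, min_script_ratio: float = 0.5) -> bool:
    """Guess whether text is Farsi from the letters it contains."""
    letters = [ch for ch in text if ch.isalpha()]
    if not letters:
        return False
    in_script = [ch for ch in letters if ord(ch) in ARABIC_BLOCK]
    # Require most letters to be Arabic-script; the threshold tolerates some
    # Latin text such as names or hashtags.
    if len(in_script) / len(letters) < min_script_ratio:
        return False
    used = set(in_script)
    if used & URDU_MARKERS:
        return False
    return bool(used & PERSIAN_MARKERS)
```

A short Farsi tweet can easily contain none of the marker letters and
will be rejected, and other languages written with Persian letters
(Pashto or Kurdish, for example) can still pass, so this is best used as
a first-pass filter in front of a proper language detector.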

Can anyone think of a more elegant solution?
Are there any hidden/experimental language type filters available to

