[twitter-dev] Adding more languages to lang parameter in Search API

Toma Tasovac Wed, 25 Nov 2009 05:01:15 -0800

Hi there.

I am working on a WordNet-based Serbian-English dictionary (part of
the Transpoetika Project at the Belgrade Center for Digital Humanities,
http://humanistika.org).


I've implemented a "LiveQuote" system with Twitter, which fetches the
most recent tweets exemplifying the use of a given dictionary entry. We
also have several other ideas on how to integrate Twitter in our
dictionary application, both on the production and reception ends.

But we're facing a serious performance issue: Twitter's language
parameter (lang) does not recognize Serbian (sr). My workaround has
been to use Google Translate's API to check tweets to make sure they
are really Serbian. This works: Google is pretty good at language
detection (not 100%, but close enough). But it has considerably slowed
down the process -- every tweet we get for a certain word has to be
checked with Google before being displayed.

Without a language check, however, we run into cases where certain
Russian, Bulgarian, Macedonian etc. tweets will sometimes sneak into
our results thanks to interlingual homographs. For example, живот in
Serbian means "life", while in Russian it means "stomach".
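
The per-tweet check described above can be sketched roughly as follows.
This is only an illustration under stated assumptions, not our actual
code: filter_by_language and toy_detector are hypothetical names, and
toy_detector is a hard-coded stand-in for the remote detection call, not
a real API. The cache is an added idea to avoid re-checking identical
tweets; it does not remove the cost of the first check.

```python
from typing import Callable, Dict, Iterable, List, Optional


def filter_by_language(
    tweets: Iterable[str],
    detect_language: Callable[[str], str],
    wanted: str = "sr",
    cache: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Keep only tweets whose detected language code equals `wanted`."""
    if cache is None:
        cache = {}
    kept = []
    for text in tweets:
        # Only call the (slow) detector for texts we have not seen yet.
        if text not in cache:
            cache[text] = detect_language(text)
        if cache[text] == wanted:
            kept.append(text)
    return kept


def toy_detector(text: str) -> str:
    """Hypothetical stand-in for a remote language-detection service."""
    samples = {
        "живот је леп": "sr",        # Serbian: "life is beautiful"
        "у меня болит живот": "ru",  # Russian: "my stomach hurts"
    }
    return samples.get(text, "und")  # "und" = undetermined


if __name__ == "__main__":
    tweets = ["живот је леп", "у меня болит живот"]
    for tweet in filter_by_language(tweets, toy_detector):
        print(tweet)
```

With a native lang=sr filter on the search side, this whole step would
be unnecessary, which is the reason for the request below.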

I am curious how you guys check for language identity on your backend,
and whether there is any chance you could add Serbian to the list of
supported languages.

All best,
Toma
