Could somebody from the Twitter team please address my question about language
recognition in the search API?
Many thanks in advance.
On 25.11.2009, at 10:00, Toma wrote:
> Hi there.
> I am working on a WordNet-based Serbian-English dictionary (part of the
> Transpoetika Project at the Belgrade Center for Digital Humanities).
> I've implemented a "LiveQuote" system with Twitter, where we get most
> recent tweets exemplifying the use of a given dictionary entry. We
> also have several other ideas on how to integrate Twitter in our
> dictionary application, both on the production and reception ends.
> But we're facing a serious performance issue: Twitter's language
> parameter (lang) does not recognize Serbian (sr). My workaround has
> been to use Google Translate's API to check tweets to make sure they
> are really Serbian. It works, and Google is pretty good about this (not
> 100%, but close enough), but this has considerably slowed down the
> process -- every tweet we get for a certain word has to be checked
> with Google before being displayed.
> Without a language check, however, we run into cases where certain
> Russian, Bulgarian, Macedonian, etc. tweets will sometimes sneak into
> our results thanks to interlingual homographs. For example, живот in
> Serbian means "life", while in Russian it means "stomach".
> I am curious how you guys check for language identity on your backend,
> and whether there was any chance you could include Serbian in the
> All best,
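
For anyone following along, here is a rough sketch of how the per-tweet
check described above might be made cheaper. It is not what the quoted
message does, only one possible approach under stated assumptions:
`detect_language` is a hypothetical callable standing in for whatever
external detector is used (for example a wrapper around the Google check),
and `filter_serbian` and `NON_SERBIAN_CYRILLIC` are illustrative names.
The idea is to discard tweets containing Cyrillic letters that the Serbian
alphabet does not use before calling the detector at all, and to cache
detector results so a repeated tweet is only checked once.

```python
from typing import Callable, Dict, Iterable, List, Optional

# Letters found in Russian, Bulgarian, Macedonian or Ukrainian Cyrillic
# that are not part of the 30-letter Serbian Cyrillic alphabet.
NON_SERBIAN_CYRILLIC = set("ыэъщяюйёьѓќѕіїє")


def has_non_serbian_letters(text: str) -> bool:
    """Return True if the text contains a Cyrillic letter Serbian does not use."""
    return any(ch in NON_SERBIAN_CYRILLIC for ch in text.lower())


def filter_serbian(
    tweets: Iterable[str],
    detect_language: Callable[[str], str],  # hypothetical: returns a code like "sr"
    cache: Optional[Dict[str, str]] = None,
) -> List[str]:
    """Keep only the tweets that the detector labels as Serbian ("sr")."""
    if cache is None:
        cache = {}
    kept = []
    for text in tweets:
        # Cheap local pre-filter: no external call for obvious non-Serbian text.
        if has_non_serbian_letters(text):
            continue
        # Only call the (slow) detector for texts not seen before.
        if text not in cache:
            cache[text] = detect_language(text)
        if cache[text] == "sr":
            kept.append(text)
    return kept
```

The pre-filter only helps with Cyrillic tweets. Serbian written in Latin
script, and short tweets that happen to use only shared letters, still go
to the detector, so this would reduce the number of external calls rather
than replace them.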