Hi there. I am working on a WordNet-based Serbian-English dictionary (part of the Transpoetika Project at the Belgrade Center for Digital Humanities, http://humanistika.org).
I've implemented a "LiveQuote" system with Twitter, which fetches the most recent tweets exemplifying the use of a given dictionary entry. We also have several other ideas on how to integrate Twitter into our dictionary application, on both the production and reception ends.

But we've run into a serious performance problem, which stems from the fact that Twitter's language parameter (lang) does not recognize Serbian (sr). My workaround has been to use Google Translate's API to check that the tweets really are Serbian. It works, and Google is pretty good at this (not 100%, but close enough), but it has slowed the process down considerably: every tweet we get for a given word has to be checked with Google before it is displayed.

Without a language check, however, Russian, Bulgarian, Macedonian and other tweets sometimes sneak into our results thanks to interlingual homographs. For example, живот in Serbian means "life", while in Russian it means "stomach".

I am curious how you guys check for language identity on your backend, and whether there is any chance you could include Serbian in the list?

All best,
Toma
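
To illustrate the kind of shortcut that might reduce the number of external checks, here is a minimal sketch of a cheap alphabet-based pre-filter. It is an illustration only, not the project's actual code: the names `prefilter` and `is_serbian` and the `slow_check` callback (standing in for the external language check) are hypothetical. It assumes Cyrillic-script tweets; Serbian written in Latin script would not benefit from it. The idea is that some Cyrillic letters never occur in Serbian, while ђ and ћ occur in Serbian but not in Russian, Bulgarian or Macedonian.

```python
# Cyrillic letters used in Russian, Bulgarian, Macedonian, Ukrainian or
# Belarusian that are not part of the Serbian Cyrillic alphabet.
NON_SERBIAN = set("йщъыьэюяёѓќѕіїєґў")

# Letters of the Serbian alphabet that the neighbouring Cyrillic
# alphabets listed above do not use.
SERBIAN_ONLY = set("ђћ")


def prefilter(text):
    """Return 'reject', 'accept' or 'unknown' based on letters alone."""
    letters = set(text.lower())
    if letters & NON_SERBIAN:
        return "reject"   # contains a letter Serbian does not have
    if letters & SERBIAN_ONLY:
        return "accept"   # contains a letter only Serbian has
    return "unknown"      # only shared letters: needs a real check


def is_serbian(text, slow_check):
    """Call the expensive check only when the letters are inconclusive.

    slow_check is a hypothetical callable (text -> bool), e.g. a wrapper
    around an external language-detection service.
    """
    verdict = prefilter(text)
    if verdict == "unknown":
        return slow_check(text)
    return verdict == "accept"
```

A tweet made up only of shared letters, such as one containing just живот and other common words, still comes back as "unknown" and goes to the full check, so this can only reduce the number of slow calls, not replace them. It is also a heuristic: a Serbian tweet that quotes a Russian word would be rejected.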