Martin Gregorie wrote:

On Mon, 2009-12-07 at 08:55 -0800, Marc Perkel wrote:

Except for very short messages, I would think that if you spell-checked the message in several languages and found that 80% of it was spelled correctly in one of them, you have a match. You wouldn't have to check every language; just start with some common ones, and if none of them match, go on to less common ones.

OK, maybe this is a long shot, but suppose you did this:

    cat text.txt | aspell -a --lang=en | grep -v "*" | egrep -v "^$" | wc -l
    cat text.txt | aspell -a --lang=fr | grep -v "*" | egrep -v "^$" | wc -l
    ...

What this would return is the number of misspelled words in each language. The language with the fewest misspellings is the correct language.

Not sure how fast it would run, or what you would want to do to the text first, but is this an idea worth pursuing?
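For anyone who wants to try the idea, here is a rough sketch of the same pipeline wrapped in two shell functions. It is only an illustration, not a tested detector: the function names count_misses and rank_languages are made up for this example, and it assumes an aspell dictionary is installed for every language code you pass in. It counts only the result lines that begin with "&" or "#" (the markers aspell's pipe mode uses for unknown words), so the version banner is not counted, and it prefixes every input line with "^" so that text starting with a pipe-mode command character is still treated as plain text.

```shell
#!/bin/sh

# Count flagged words, given `aspell -a` output on stdin.
# "&" = unknown word with suggestions, "#" = unknown word without any.
# grep -c exits non-zero when the count is 0, hence the "|| true".
count_misses() {
    grep -c '^[&#]' || true
}

# Print "<misses> <lang>" for each candidate language, fewest misses first.
# Usage (hypothetical): rank_languages FILE LANG [LANG...]
rank_languages() {
    file=$1
    shift
    for lang in "$@"; do
        # The leading "^" tells aspell's pipe mode to spell-check the rest
        # of the line even if it starts with a command character.
        n=$(sed 's/^/^/' "$file" | aspell -a --lang="$lang" | count_misses)
        echo "$n $lang"
    done | sort -n
}
```

Something like `rank_languages text.txt en fr de | head -n 1` would then print the best guess. The raw count is only comparable across languages for the same text, and very short messages will give unreliable results, as noted above.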
- Language detection in TextCat Marc Perkel
- Re: Language detection in TextCat Matt Kettler
- Re: Language detection in TextCat Henrik K
- Re: Language detection in TextCat Marc Perkel
- Re: Language detection in TextCat LuKreme
- Re: Language detection in TextCat Martin Gregorie
- Re: Language detection in TextCat Marc Perkel
- RE: Language detection in TextCat R-Elists
- Re: Language detection in TextCat Matus UHLAR - fantomas
- Re: Language detection in TextCat Matus UHLAR - fantomas
