On Tue, May 28, 2013 at 5:02 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> And some words exist in Portuguese, Spanish and English, the three
> languages of the problem. For instance, "animal". I don't think this
> problem can be solved, but a dictionary search would tell if it is a
> Portuguese word, which it is.

Is there any structure to the text? If it has complete paragraphs in one of the three languages, then you can probably make a better guess at the language of each paragraph from the presence of key words.

I wonder if some of the code for detecting spam could help you here: train it on some known Portuguese, Spanish, and English text, then let it classify the unknown text.

If it's just a stream of words from one of the languages in random order, then it is difficult or impossible.

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
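
For illustration, here is a minimal sketch of the "presence of key words" idea, written in Python only to keep it self-contained. It is not a trained spam-style classifier: it simply counts hits against short function-word lists. The lists in KEY_WORDS and the name guess_language are hypothetical and invented for this example; a real attempt would need far longer lists or a classifier trained on known text.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny function-word lists (assumption: words that
# are common in one language and rare or absent in the other two).
KEY_WORDS = {
    "english": {"the", "and", "of", "is", "in", "to", "it", "that", "with"},
    "spanish": {"el", "la", "los", "las", "y", "es", "en", "con", "una", "del"},
    "portuguese": {"um", "uma", "é", "em", "com", "não", "são", "ou", "na", "muito"},
}


def guess_language(text):
    """Return the language with the most key-word hits, or None if undecided."""
    # Split into lowercase runs of letters (accented letters included).
    tokens = re.findall(r"[^\W\d_]+", text.lower())
    scores = Counter()
    for language, words in KEY_WORDS.items():
        scores[language] = sum(1 for token in tokens if token in words)
    ranked = scores.most_common()
    best_language, best_score = ranked[0]
    # No hits at all, or a tie for first place: refuse to guess.
    if best_score == 0 or best_score == ranked[1][1]:
        return None
    return best_language


print(guess_language("The dog is in the garden and it is happy"))
print(guess_language("El perro es grande y la casa es una maravilla"))
print(guess_language("O cachorro não é um animal muito grande"))
print(guess_language("animal"))
```

A full sentence gives several key-word hits, so the guess is usually clear. The lone word "animal" gives none, so the function returns None, which matches the point that isolated words shared by all three languages cannot be assigned to one of them.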