On Tue, May 28, 2013 at 5:02 PM, Rui Barradas <ruipbarra...@sapo.pt> wrote:
> Hello,
>
> And some words exist in Portuguese, Spanish and English, the three
> languages of the problem. For instance, "animal". I don't think this
> problem can be solved, but a dictionary search would tell if it is a
> Portuguese word, which it is.

Is there any structure to the text? If it has complete paragraphs in
one of the three languages then you can probably make a better guess
at the language of each paragraph from the presence of key words. I
wonder if some of the code for detecting spam could help you here:
train it on some known Portuguese, Spanish, and English text.
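
A minimal sketch of that training idea, assuming a naive Bayes word
model of the kind commonly used in spam filters. It is written in
Python purely for illustration; the training sentences, the names
TRAINING, tokenize, train and guess_language, and the add-one
smoothing are all assumptions made here, not anything from the
original problem. Real use would need far more text per language.

```python
import math
from collections import Counter

# Hypothetical, tiny training samples (one sentence per language).
TRAINING = {
    "portuguese": "o animal não está na casa e a menina também não",
    "spanish": "el animal no está en la casa y la niña tampoco",
    "english": "the animal is not in the house and the girl is not either",
}

def tokenize(text):
    """Lower-case the text and split it on whitespace."""
    return text.lower().split()

def train(samples):
    """Count word frequencies per language and collect the shared vocabulary."""
    counts = {lang: Counter(tokenize(text)) for lang, text in samples.items()}
    vocab = set().union(*counts.values())
    return counts, vocab

def guess_language(text, counts, vocab):
    """Return the language with the highest naive Bayes log-score.

    Assumes equal prior probability for each language and uses add-one
    smoothing so that unseen words do not give a zero probability.
    """
    scores = {}
    for lang, words in counts.items():
        total = sum(words.values())
        scores[lang] = sum(
            math.log((words[w] + 1) / (total + len(vocab) + 1))
            for w in tokenize(text)
        )
    return max(scores, key=scores.get)

counts, vocab = train(TRAINING)
print(guess_language("the girl is in the house", counts, vocab))    # english
print(guess_language("a menina não está na casa", counts, vocab))   # portuguese
print(guess_language("la niña no está en la casa", counts, vocab))  # spanish
```

With several words of context the scores separate clearly, but a
single shared word such as "animal" gives no useful separation between
Portuguese and Spanish in this toy data, which is the difficulty
raised above.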

If it's just a stream of words from one of the languages in a random
order then it is difficult or impossible.

Barry

______________________________________________
R-help@r-project.org mailing list
https://stat.ethz.ch/mailman/listinfo/r-help
PLEASE do read the posting guide http://www.R-project.org/posting-guide.html
and provide commented, minimal, self-contained, reproducible code.
