Re: Detecting encoding in Plain text

D. Starner Thu, 08 Jan 2004 08:05:43 -0800

> Given any sizeable chunk of text, it ought to be possible to estimate 
> the statistical likelihood of its being in a certain 
> encoding/[language] even if it's in an unspecified 8859-* encoding. 
> It would be quite an interesting exercise, but I'd be surprised if 
> someone hasn't done it before.  Perhaps someone here knows.


http://www.let.rug.nl/~vannoord/TextCat/ has a paper on the subject
and an implementation in Perl. http://mnogosearch.org has an alternate
implementation in compiled code (called mguesser). 
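For anyone who wants to try the idea before fetching either tool, here is a minimal Python sketch of the rank-based character n-gram comparison that TextCat is built on (Cavnar and Trenkle's "out-of-place" measure). It is not TextCat's or mguesser's code. The function names (`profile`, `out_of_place`, `guess`) and the parameters (`max_n=3`, `top=300`) are hypothetical and only illustrative. It also skips the word tokenization and padding used in the original method. The input is decoded as latin-1 so that every byte becomes exactly one symbol. Training one profile per (language, encoding) pair then lets the same comparison guess the encoding as well as the language, which is the unspecified 8859-* case in the quoted question.

```python
from collections import Counter


def profile(data: bytes, max_n: int = 3, top: int = 300) -> dict:
    """Map each of the `top` most frequent byte n-grams (n = 1..max_n) to its rank.

    Hypothetical helper; max_n and top are illustrative, not TextCat's defaults.
    Decoding as latin-1 turns each byte into one character, so text in any
    single-byte encoding can be profiled without knowing the encoding.
    """
    text = data.decode("latin-1")
    counts = Counter()
    for n in range(1, max_n + 1):
        for i in range(len(text) - n + 1):
            counts[text[i:i + n]] += 1
    ranked = [gram for gram, _ in counts.most_common(top)]
    return {gram: rank for rank, gram in enumerate(ranked)}


def out_of_place(doc: dict, ref: dict, top: int = 300) -> int:
    """Sum of rank differences; an n-gram absent from `ref` costs the maximum, `top`."""
    return sum(abs(rank - ref[gram]) if gram in ref else top
               for gram, rank in doc.items())


def guess(sample: bytes, references: dict) -> str:
    """Return the label of the reference profile closest to the sample."""
    doc = profile(sample)
    return min(references, key=lambda label: out_of_place(doc, references[label]))


if __name__ == "__main__":
    # Toy training data: the same Russian sentence in two single-byte encodings.
    ru = "Это пример русского текста для определения кодировки. " * 20
    refs = {
        "ru/koi8-r": profile(ru.encode("koi8_r")),
        "ru/cp1251": profile(ru.encode("cp1251")),
    }
    print(guess("русского текста".encode("cp1251"), refs))  # ru/cp1251
    print(guess("русского текста".encode("koi8_r"), refs))  # ru/koi8-r
```

Real use would need a reasonably large training corpus for each language/encoding pair. Very short samples give only a handful of n-grams to compare, so guesses on them tend to be less reliable.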