At OCLC we have had some good results detecting frequent encodings and recurring encoding problems using naïve Bayesian classification. You need training data for each class you want to detect. Language also comes into play, because the distribution of characters depends on it. No silver bullet yet... That said, if this problem recurs, you might try detecting it with a classifier, for instance Algorithm::NaiveBayes or another classification algorithm.
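A minimal sketch of the idea in Perl, under stated assumptions: this is not OCLC's implementation and does not use Algorithm::NaiveBayes (its API is not reproduced here). It is a hand-rolled multinomial naive Bayes over single byte values with add-one smoothing. The class labels are encoding names, the training samples are assumed to be byte strings whose encoding you already know (ideally text in the same language as the file you want to classify), and the subroutine names train_nb and classify_nb are hypothetical.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper, not Algorithm::NaiveBayes.
# Train a multinomial naive Bayes model whose features are byte values.
# $samples maps a class label (e.g. an encoding name) to a list of
# byte strings known to belong to that class.
sub train_nb {
    my ($samples) = @_;
    my %model = (classes => {}, n_docs => 0);
    for my $label (keys %$samples) {
        my $c = $model{classes}{$label} = { docs => 0, total => 0, counts => {} };
        for my $bytes (@{ $samples->{$label} }) {
            $c->{docs}++;
            $model{n_docs}++;
            for my $byte (unpack 'C*', $bytes) {
                $c->{counts}{$byte}++;
                $c->{total}++;
            }
        }
    }
    return \%model;
}

# Hypothetical helper: return the most probable label for a byte string,
# summing log probabilities with add-one (Laplace) smoothing over all
# 256 possible byte values.
sub classify_nb {
    my ($model, $bytes) = @_;
    my @bytes = unpack 'C*', $bytes;
    my ($best, $best_score);
    for my $label (sort keys %{ $model->{classes} }) {
        my $c = $model->{classes}{$label};
        my $score = log($c->{docs} / $model->{n_docs});    # class prior
        for my $byte (@bytes) {
            $score += log((($c->{counts}{$byte} // 0) + 1) / ($c->{total} + 256));
        }
        if (!defined $best_score || $score > $best_score) {
            ($best, $best_score) = ($label, $score);
        }
    }
    return $best;
}
```

For example, after training on a few files known to be iso-8859-7 and a few known to be cp437, classify_nb($model, $bytes) on a new record returns the label with the highest score. Single bytes are a crude feature; byte pairs would probably separate closely related code pages better, at the cost of needing more training data.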
Wouter

-----Original Message-----
From: Thomas Krichel [mailto:kric...@openlib.org]
Sent: Saturday 6 February 2016 18:52
To: Marios lyberak
Cc: perl4lib@perl.org
Subject: Re: identify encoding from a file

Marios lyberak writes

> i have a file which is generated out of an old Paradox database,
>
> and i try to figure out what is the encoding of these strangely represented
> characters

I know of no way to automate this, and I don't think anybody else does. You simply need to read the file several times, each time decoding it with a different candidate encoding, and inspect the output manually to see whether it comes out right. Your Paradox manual may help you narrow down the candidate character sets.

--
Cheers,

Thomas Krichel                  http://openlib.org/home/krichel
                                skype:thomaskrichel
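The manual approach can at least be scripted so that every candidate decoding is printed side by side. This sketch uses the core Encode module; the candidate list is a hypothetical guess that should be narrowed using the Paradox documentation, and try_encodings is a hypothetical helper name. Deciding which output looks right is still left to a human reader.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Hypothetical candidate list; narrow it down using whatever the Paradox
# documentation says about the code page the database was created with.
my @candidates = qw(cp437 cp850 cp852 cp1252 cp1253 iso-8859-1 iso-8859-7 utf-8);

# Hypothetical helper: decode a byte string with each encoding name that
# Encode recognises and return a list of [name, decoded text] pairs.
# Undecodable bytes become substitution characters (default CHECK of 0).
sub try_encodings {
    my ($bytes, @encodings) = @_;
    my @results;
    for my $name (@encodings) {
        next unless find_encoding($name);    # skip names Encode does not know
        push @results, [ $name, decode($name, $bytes) ];
    }
    return @results;
}

# Only run the command-line part when a file name is given,
# so try_encodings can also be reused on its own.
if (@ARGV) {
    my $file = shift @ARGV;
    open my $fh, '<:raw', $file or die "Cannot open $file: $!\n";
    my $bytes = do { local $/; <$fh> };    # slurp the raw bytes
    close $fh;

    # Print the first 200 characters of each decoding for visual comparison.
    binmode STDOUT, ':encoding(UTF-8)';
    for my $r (try_encodings($bytes, @candidates)) {
        my ($name, $text) = @$r;
        printf "==== %s ====\n%s\n\n", $name, substr($text, 0, 200);
    }
}
```

Run it with the exported file as its only argument and look for the candidate whose accented or non-Latin characters read correctly.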