On Wed, 10 Aug 2005, mohsen ali momeni wrote:

> Hi,
> Thanks for the reply,
> What I exactly need is CP1256 detection, and after that detecting
> whether the language is persian or not.

As you can guess, all the 8-bit non-Unicode character sets share
the same range of byte values, so they overlap all the time.  Your
best bet at charset detection is to look at the areas that are left
unencoded in each character set and cross out any charset whose
forbidden areas the text uses.  As for language detection, which
can be used in charset detection too, you can look for the string
SPACE REH ALEPH SPACE as a good indicator of Persian.
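
Here is a minimal sketch of that elimination idea in Python, not a
finished detector.  It assumes the candidate charsets are ones that
Python's standard codecs know about; the function names
(surviving_charsets, looks_persian) and the sample sentence are made
up for illustration.  A strict decode fails as soon as the text uses
a byte the charset leaves unencoded, which is exactly the
"forbidden area" test.

```python
# Hypothetical helper names; only the standard codecs machinery is real.

# SPACE REH ALEF SPACE, the marker suggested above for Persian text.
PERSIAN_MARKER = " \u0631\u0627 "


def surviving_charsets(data, candidates):
    """Return {charset: decoded text} for every candidate that can
    decode `data` without touching an unassigned byte."""
    survivors = {}
    for name in candidates:
        try:
            # errors="strict" raises on bytes the charset does not define.
            survivors[name] = data.decode(name, errors="strict")
        except UnicodeDecodeError:
            # The text uses a forbidden area, so cross this charset out.
            continue
    return survivors


def looks_persian(text):
    """Heuristic only: True if the marker string occurs in the text."""
    return PERSIAN_MARKER in text


if __name__ == "__main__":
    # Sample sentence (an assumption for the demo), encoded as CP1256.
    sample = "\u067e\u062f\u0631 \u06a9\u062a\u0627\u0628 \u0631\u0627 \u062e\u0648\u0627\u0646\u062f"
    raw = sample.encode("cp1256")
    for charset, text in surviving_charsets(raw, ["ascii", "cp1252", "cp1256"]).items():
        print(charset, looks_persian(text))
```

Note that a charset with no unassigned bytes can never be crossed out
this way, so elimination only narrows the list.  Running the marker
test on each surviving decoding is what helps you choose between the
charsets that remain.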

> Regards,

PersianComputing mailing list
