On Wed, 10 Aug 2005, mohsen ali momeni wrote:
> Thanks for the reply,
> What I need exactly is CP1256 detection, and after that detecting
> whether the language is Persian or not.
As you can guess, all non-Unicode character sets share the same
8-bit space, so they overlap all the time. Your only bet at
charset detection is to look at the byte values that are left
unassigned in each character set and cross out any charset for
which the text uses those forbidden values. As for language
detection, which can be used in charset detection too, you can
look for the string SPACE REH ALEF SPACE as a good indicator of
Persian.
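
A rough sketch of both steps in Python, in case it helps. It assumes
the codecs bundled with Python raise an error on bytes a charset
leaves unassigned, which is what strict decoding is meant to do. The
function names and the candidate list are only illustrative, not
part of any existing tool.

```python
# SPACE, REH, ALEF, SPACE as Unicode code points.
PERSIAN_MARKER = " \u0631\u0627 "

# Illustrative candidate list; extend it with whatever charsets you expect.
CANDIDATES = ["cp1256", "cp1252", "iso8859_6"]


def surviving_charsets(data, candidates=CANDIDATES):
    """Cross out every charset in which `data` uses an unassigned byte."""
    survivors = []
    for name in candidates:
        try:
            data.decode(name)  # strict mode: unassigned bytes raise an error
        except UnicodeDecodeError:
            continue  # forbidden area used, so this charset is ruled out
        survivors.append(name)
    return survivors


def looks_persian(data, charset="cp1256"):
    """Return True if `data`, decoded as `charset`, contains the marker."""
    try:
        text = data.decode(charset)
    except UnicodeDecodeError:
        return False
    return PERSIAN_MARKER in text


if __name__ == "__main__":
    # A short Persian sample, encoded as CP1256 purely for demonstration.
    sample = (
        "\u0645\u0646 \u06af\u0641\u062a\u0645 "
        "\u06a9\u062a\u0627\u0628 \u0631\u0627 "
        "\u0628\u062e\u0648\u0627\u0646"
    ).encode("cp1256")
    print(surviving_charsets(sample))
    print(looks_persian(sample))
```

Elimination can only rule charsets out, never confirm one, so the
marker check is what does the real work when several candidates
survive.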
PersianComputing mailing list