Sergey Burladyan wrote:
src/backend/utils/mb/conversion_procs/cyrillic_and_mic/cyrillic_and_mic.c
does not have the Cyrillic letter 'IO' in the ISO-8859-5 to mule internal code translation table (function iso2mic(const unsigned char *l, unsigned char *p, int len)). This is a bug, because the letter is widely used, it is a basic letter like A, B or C in English :), and it exists in all Russian Cyrillic encodings (koi8-r, iso-8859-5, windows-1251, cp866). For example, in Russian the words for 'all', 'hedgehog', 'Christmas tree' and many others must be written with it.

Here is a patch to add it to the ISO-8859-5 to mule internal code translation table. I don't know whether this is OK, or whether it breaks any internal rule or code?

You'd need to modify the mic->ISO-8859-5 translation table as well, for converting in the other direction.
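
To illustrate why both directions matter, here is a minimal sketch, not the actual PostgreSQL tables. The function names are hypothetical, and the helpers cover only the IO/io code points (ISO-8859-5 0xA1/0xF1, KOI8-R 0xB3/0xA3), returning 0 for everything else. If only the forward mapping were added, the round-trip check at the end would fail.

```c
#include <assert.h>

/* Hypothetical helper: ISO-8859-5 -> KOI8-R, IO/io only; 0 means "no mapping". */
static unsigned char
iso_to_koi_io(unsigned char c)
{
    switch (c)
    {
        case 0xA1: return 0xB3;   /* CYRILLIC CAPITAL LETTER IO */
        case 0xF1: return 0xA3;   /* CYRILLIC SMALL LETTER IO */
        default:   return 0;
    }
}

/* Hypothetical helper: the reverse direction, KOI8-R -> ISO-8859-5. */
static unsigned char
koi_to_iso_io(unsigned char c)
{
    switch (c)
    {
        case 0xB3: return 0xA1;   /* CYRILLIC CAPITAL LETTER IO */
        case 0xA3: return 0xF1;   /* CYRILLIC SMALL LETTER IO */
        default:   return 0;
    }
}

/* Both letters must survive a trip through the intermediate encoding. */
static void
check_io_round_trip(void)
{
    assert(koi_to_iso_io(iso_to_koi_io(0xA1)) == 0xA1);
    assert(koi_to_iso_io(iso_to_koi_io(0xF1)) == 0xF1);
}
```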

By the way, as I understand it, you are using the koi8-r encoding as the internal representation for Cyrillic charsets. This has another problem: the second "widely" used character is <U2116> NUMERO SIGN (many accountants and managers use it :) in the Cyrillic Windows world), and it exists in the windows-1251, cp866 and iso-8859-5 encodings, but not in koi8-r...

Hmm. We use KOI8-R (or rather, MULE_INTERNAL with KOI8-R) as an intermediate encoding, because there's no direct conversion table between ISO-8859-5 and the other Cyrillic encodings. Ideally there would be. Another possibility would be to use UTF-8 as the intermediate encoding; that'd probably be much slower, but UTF-8 should have all the characters needed.
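
A rough sketch of what going through an intermediate encoding implies, assuming a plain KOI8-R pivot rather than the real MULE_INTERNAL code path. The function names and the NO_MAPPING sentinel are made up for illustration, and only three code points are covered. NUMERO SIGN is 0xF0 in ISO-8859-5 and 0xB9 in Windows-1251, but it has no KOI8-R code point, so it cannot pass through the pivot even though both end encodings have it.

```c
#include <assert.h>

#define NO_MAPPING 0   /* hypothetical sentinel for "character not representable" */

/* Hypothetical first hop: ISO-8859-5 -> KOI8-R. */
static unsigned char
iso_to_koi(unsigned char c)
{
    switch (c)
    {
        case 0xA1: return 0xB3;        /* capital IO */
        case 0xF1: return 0xA3;        /* small io */
        case 0xF0: return NO_MAPPING;  /* NUMERO SIGN: absent from KOI8-R */
        default:   return NO_MAPPING;  /* other characters omitted in this sketch */
    }
}

/* Hypothetical second hop: KOI8-R -> Windows-1251. */
static unsigned char
koi_to_win(unsigned char c)
{
    switch (c)
    {
        case 0xB3: return 0xA8;        /* capital IO */
        case 0xA3: return 0xB8;        /* small io */
        default:   return NO_MAPPING;
    }
}

/* Two-step conversion: anything the pivot encoding lacks is lost here. */
static unsigned char
iso_to_win_via_koi(unsigned char c)
{
    unsigned char k = iso_to_koi(c);

    return (k == NO_MAPPING) ? NO_MAPPING : koi_to_win(k);
}
```

With these tables, iso_to_win_via_koi(0xA1) yields 0xA8, while iso_to_win_via_koi(0xF0) yields NO_MAPPING. A direct table, or a pivot such as UTF-8 that contains every character involved, would avoid that loss.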

Are there any other characters like "YO" that are missing but exist in all the encodings? Looking at the character set table for KOI8-R, it looks like "YO" is in an odd place in the table compared to all the other Cyrillic characters. Perhaps that's why it was missed.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com

--
Sent via pgsql-bugs mailing list (pgsql-bugs@postgresql.org)
