On Mon, Jul 2, 2012 at 4:46 PM, Alexander Korotkov <aekorot...@gmail.com> wrote:
> So, I provided such a transformation in versions 0.3 and 0.4, based on
> the explanation from Tatsuo Ishii. The problem is that both conversions are
> nontrivial and it's not evident that they are mirror images (understanding
> that they are mirror images requires some additional assumptions about
> encodings, not evident just from the transformation itself). I thought you
> mentioned that problem two messages back.

Yeah, I did. I think I may be a bit confused here, so let me try to
understand this a bit better. It seems like pg_mule2wchar_with_len uses
the following algorithm:

- If the first byte IS_LC1 (0x81-0x8d), decode two bytes, stored with
  shifts of 16 and 0.
- If the first byte IS_LCPRV1 (0x9a-0x9b), decode three bytes, skipping
  the first one and storing the remaining two with shifts of 16 and 0.
- If the first byte IS_LC2 (0x90-0x99), decode three bytes, stored with
  shifts of 16, 8, and 0.
- If the first byte IS_LCPRV2 (0x9c-0x9d), decode four bytes, skipping
  the first one and storing the remaining three with shifts of 16, 8,
  and 0.

In the reverse transformation implemented by pg_wchar2mule_with_len, if
the byte stored with shift 16 IS_LC1 or IS_LC2, then we decode 2 or 3
bytes, respectively, exactly as I would expect. ASCII decoding is also
as I would expect.

The case I don't understand is what happens when the leading byte of the
multibyte character was IS_LCPRV1 or IS_LCPRV2. In that case, we ought
to decode three bytes if it was IS_LCPRV1 and four bytes if it was
IS_LCPRV2, but actually it seems we always decode 4 bytes. That implies
that the IS_LCPRV1() case in pg_mule2wchar_with_len is dead code, and
that any 4-byte characters are always of the form 0x9d 0xf? 0x?? 0x??;
maybe that's what the comment there is driving at, but it's not too
clear to me. Am I close?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
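
To make the four decoding rules listed above concrete, here is a minimal sketch in C of the forward direction as I have described it. It is not the actual pg_mule2wchar_with_len source. The helper name mule_char_to_wchar_sketch, the lower-case range predicates, and the choice to return 0 on truncated input are all assumptions made for illustration; only the byte ranges and shift amounts come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Leading-byte ranges as listed in the description above. */
static int is_lc1(uint8_t c)    { return c >= 0x81 && c <= 0x8d; }
static int is_lcprv1(uint8_t c) { return c == 0x9a || c == 0x9b; }
static int is_lc2(uint8_t c)    { return c >= 0x90 && c <= 0x99; }
static int is_lcprv2(uint8_t c) { return c == 0x9c || c == 0x9d; }

/*
 * Hypothetical helper: decode one MULE character starting at `from`
 * into *to. Returns the number of input bytes consumed, or 0 if the
 * input is too short for the form implied by the leading byte
 * (an assumption of this sketch, not taken from the real code).
 */
static size_t
mule_char_to_wchar_sketch(const uint8_t *from, size_t len, uint32_t *to)
{
    assert(from != NULL && to != NULL);

    if (len == 0)
        return 0;

    if (is_lc1(from[0]))
    {
        /* two bytes, stored with shifts of 16 and 0 */
        if (len < 2)
            return 0;
        *to = ((uint32_t) from[0] << 16) | from[1];
        return 2;
    }
    if (is_lcprv1(from[0]))
    {
        /* three bytes; the leading byte is skipped, not stored */
        if (len < 3)
            return 0;
        *to = ((uint32_t) from[1] << 16) | from[2];
        return 3;
    }
    if (is_lc2(from[0]))
    {
        /* three bytes, stored with shifts of 16, 8, and 0 */
        if (len < 3)
            return 0;
        *to = ((uint32_t) from[0] << 16) | ((uint32_t) from[1] << 8) | from[2];
        return 3;
    }
    if (is_lcprv2(from[0]))
    {
        /* four bytes; the leading byte is skipped, not stored */
        if (len < 4)
            return 0;
        *to = ((uint32_t) from[1] << 16) | ((uint32_t) from[2] << 8) | from[3];
        return 4;
    }

    /* anything else (e.g. ASCII) is treated as a single byte here */
    *to = from[0];
    return 1;
}
```

Note that in the two private-charset branches the leading byte never reaches the wide character, so the reverse transformation has to work out how many bytes to emit from what is left at shift 16, which is exactly the part I'm asking about.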