Re: [HACKERS] handling unconvertible error messages

Vladimir Sitnikov Sat, 13 Aug 2016 02:26:05 -0700

Victor>We don't have 190 message catalog translations in the PostgreSQL.
Victor>So problem with encoding for messages is quite limited.

Even though the number of translations is limited, there is still a problem
when trying to tell one single-byte encoding from another.
It would be much better if ServerErrorMessages included the encoding right
in the message itself.

For pgjdbc, I've implemented a workaround that relies on the following:
1) It knows what "FATAL" looks like in several translations, and it knows
the commonly used encodings for those translations. For instance, for
Russian it tries CP1251, KOI8, and ALT encodings. It converts "ВАЖНО"
(Russian for FATAL) using those three encodings and searches for the
resulting byte sequence in the error message. If there's a match, the
encoding is identified.
2) Unfortunately, that does not help for Japanese, as "FATAL" there is
translated as "FATAL". So I hard-coded several typical words like
"database", "user", "role" (see [1]); if those byte sequences are
present, the message is assumed to be in Japanese. It would be great if
someone could review those, as I do not speak Japanese.
3) Then it tries different LATIN encodings.
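
To illustrate step 1, here is a minimal sketch of the byte-sequence search.
It is not the code from the pgjdbc commit: the class name EncodingGuess, the
method names, and the charset list are made up for the example, and I am
assuming "ALT" maps to Java's IBM866 (CP866) and KOI8 to KOI8-R.

```java
import java.nio.charset.Charset;

class EncodingGuess {
    // Assumed Java charset names for CP1251, KOI8 and ALT (CP866).
    static final String[] RUSSIAN_CHARSETS = {"windows-1251", "KOI8-R", "IBM866"};

    // "ВАЖНО" written with Unicode escapes so the source file encoding
    // does not matter.
    static final String RUSSIAN_FATAL = "\u0412\u0410\u0416\u041D\u041E";

    // Naive search for needle inside haystack.
    static boolean contains(byte[] haystack, byte[] needle) {
        outer:
        for (int i = 0; i + needle.length <= haystack.length; i++) {
            for (int j = 0; j < needle.length; j++) {
                if (haystack[i + j] != needle[j]) {
                    continue outer;
                }
            }
            return true;
        }
        return false;
    }

    // Returns the first candidate charset whose encoding of "ВАЖНО"
    // occurs in the raw message bytes, or null if none matches.
    static Charset detect(byte[] message) {
        for (String name : RUSSIAN_CHARSETS) {
            if (!Charset.isSupported(name)) {
                continue; // this JVM does not ship the charset
            }
            Charset cs = Charset.forName(name);
            if (contains(message, RUSSIAN_FATAL.getBytes(cs))) {
                return cs;
            }
        }
        return null;
    }
}
```

A caller would pass the raw bytes of the error message to
EncodingGuess.detect and, on a non-null result, decode the message with
that charset; a null result means falling through to the next steps.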

Here's the commit:
https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2

Kyotaro> Is there any source to know the compatibility for any combination
Kyotaro> of language vs encoding? Maybe we need a ground for the list.

I use "locale -a" for that.

For instance, for Japanese it prints the following on my machine (OS X
10.11.6):
locale -a | grep ja
ja_JP
ja_JP.eucJP
ja_JP.SJIS
ja_JP.UTF-8
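
If one wanted to turn that output into a language-to-encodings list, a
rough sketch could look like the following. The class LocaleEncodings and
its parse method are hypothetical names for this example; it only assumes
the "language_TERRITORY.codeset" shape seen in the output above, and it
skips entries without an explicit codeset.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class LocaleEncodings {
    // Groups codesets by locale name, e.g. ja_JP -> [SJIS, UTF-8, eucJP].
    static Map<String, Set<String>> parse(List<String> lines) {
        Map<String, Set<String>> result = new TreeMap<>();
        for (String line : lines) {
            String name = line.trim();
            int dot = name.indexOf('.');
            if (dot < 0) {
                continue; // no explicit codeset, e.g. plain "ja_JP"
            }
            String locale = name.substring(0, dot);
            String codeset = name.substring(dot + 1);
            int at = codeset.indexOf('@');
            if (at >= 0) {
                codeset = codeset.substring(0, at); // drop a modifier such as @euro
            }
            result.computeIfAbsent(locale, k -> new TreeSet<>()).add(codeset);
        }
        return result;
    }
}
```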

[1]:
https://github.com/pgjdbc/pgjdbc/commit/ec5fb4f5a66b6598aea1c7ab8df3126ee77d15e2#diff-57ed15f90f50144391f1c134bf08a45cR47

Vladimir
