Hello,

(I've recovered the lost Cc recipients so far)

At Mon, 8 Aug 2016 12:52:11 +0300, Victor Wagner <vi...@wagner.pp.ru> wrote in 
<20160808125211.1361c...@fafnir.local.vm>
> On Mon, 08 Aug 2016 18:28:57 +0900 (Tokyo Standard Time)
> Kyotaro HORIGUCHI <horiguchi.kyot...@lab.ntt.co.jp> wrote:
> > 
> > I don't see charset compatibility to be easily detectable,
> 
> In the worst case we can hardcode explicit compatibility table.

We could keep lists of the languages that are compatible with
certain language-bound encodings.  For example, Wikipedia has a
list of the languages covered by LATIN1 (ISO/IEC 8859-1):
(https://en.wikipedia.org/wiki/ISO/IEC_8859-1)

According to the list, we might have the following compatibility
list of locales, maybe without region.

{{"UTF8", "LATIN1"}, "af", "sq", "eu", "da", "en", "fo"}... and so on.

The biggest problem with this is that at least *I* cannot confirm
the validity of the list: neither that LATIN1 completely covers
every language in the list, nor that no coverable language has
been omitted. Nonetheless, we could use such lists if we accept
their possible imperfection, which would eventually result in
either the original error (conversion failure) or an excess
fallback for languages that could in fact be converted.
Unfortunately, the latter would be unacceptable for table data.
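
To make the shape of such a positive list concrete, here is a minimal
sketch in Python (not PostgreSQL code). It treats the pair as (source
encoding, destination encoding); the names COMPATIBLE_LANGUAGES,
base_language and is_known_compatible are made up for illustration, and
the entries are only the unverified examples from above.

```python
# Hypothetical positive list: (source encoding, destination encoding)
# -> language codes assumed to be fully representable after conversion.
# The entries are the examples from this mail, not a verified list.
COMPATIBLE_LANGUAGES = {
    ("UTF8", "LATIN1"): {"af", "sq", "eu", "da", "en", "fo"},
}


def base_language(locale_name):
    """Strip modifier, codeset and region: 'da_DK.ISO-8859-1' -> 'da'."""
    for sep in ("@", ".", "_"):
        locale_name = locale_name.split(sep, 1)[0]
    return locale_name.lower()


def is_known_compatible(locale_name, source_encoding, dest_encoding):
    """Return True only if the language is on the list for this pair."""
    key = (source_encoding.upper(), dest_encoding.upper())
    languages = COMPATIBLE_LANGUAGES.get(key, set())
    return base_language(locale_name) in languages
```

Any language missing from the list is treated as incompatible, so an
omission here shows up as the excess fallback described above.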

> There is limited set of languages, which have translated error messages,
> and limited (albeit wide) set of encodings, supported by PostgreSQL. So

Yes, we could have a negative list of languages already known to be
incompatible.

{{"UTF8", "LATIN1"}, "ru", .. er..what else?}

ISO 639-1 seems to have about 190 languages, and most of them are
apparently incompatible with the LATIN1 encoding. A haphazardly
made negative list doesn't seem good to me.

> it is possible to define complete list of encodings, compatible with
> some translation. And fall back to untranslated messages if client
> encoding is not in this list.
> 
> > because locale (or character set) is not a matter of PostgreSQL
> > (except for some encodings bound to one particular character
> > set)... So the conversion-fallback might be a only available
> > solution.
> 
> Conversion fallback may be a solution for data. For NLS-messages I think
> it is better to fall back to English (untranslated) messages than use of
> transliteration or something alike.

I suppose that 'fallback' means "have a try, then use English if
it fails", so I think it is suitable for messages rather than for
data, and it doesn't need any a priori information about
compatibility. It seems to me that PostgreSQL refuses to ignore
or conceal conversion errors and return a broken or unwanted byte
sequence for data.  Things are different for error messages: it
is preferable for them to be readable somehow than to be
abandoned entirely.
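
As a rough illustration of that "try, then use English" behavior for
messages, here is a minimal sketch in Python (not PostgreSQL code). The
function name message_bytes is made up, Python codec names stand in for
client encodings, and the untranslated message is assumed to be plain
ASCII.

```python
def message_bytes(translated, untranslated, client_encoding):
    """Encode a message for the client, falling back to English.

    No compatibility table is consulted: the conversion is simply
    attempted, and a failure triggers the fallback.
    """
    try:
        return translated.encode(client_encoding)
    except UnicodeEncodeError:
        # The translated text cannot be represented in the client
        # encoding, so send the untranslated (assumed ASCII) message.
        return untranslated.encode(client_encoding)
```

For example, a Russian message with "latin-1" as the client encoding
comes back as the English text, while a German one is sent translated.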

> I think that for now we can assume that the best effort is already done
> for the data, and think how to improve situation with messages.

Is there any source from which we can learn the compatibility of
every combination of language and encoding? Maybe we need some
grounds for the list.

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



