Re: [HACKERS] Logical Replication and Character encoding

Kyotaro HORIGUCHI Thu, 02 Feb 2017 18:48:12 -0800

Hello,

At Fri, 3 Feb 2017 09:16:47 +0800, Craig Ringer <[email protected]> wrote 
in <CAMsr+YGqn2PjJBCY+RjWWTJ4BZ=fhg0rexq1e1nd8_kjadg...@mail.gmail.com>
> On 2 February 2017 at 11:45, Euler Taveira <[email protected]> wrote:
> 
> > I don't think storage without conversion is an acceptable approach. We
> > should provide options to users such as ignore tuple or NULL for
> > column(s) with conversion problem. I wouldn't consider storage data
> > without conversion because it silently show incorrect data and we
> > historically aren't flexible with conversion routines.


It is technically possible. But changing the behavior of a
subscription and/or publication requires a change to the SQL
syntax. It seems a bit too late to propose such a new feature.

IMHO unintentional silent data loss must not happen, so the
default behavior on conversion failure cannot be anything other
than stopping replication.
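
As a rough illustration of the policies discussed above (this is not
PostgreSQL code), the sketch below re-encodes one replicated text value
and, by default, stops on a conversion failure. The names convert_value,
ReplicationStopped and the on_error option are hypothetical; the "null"
policy stands for the kind of user-selectable option that would need new
syntax.

```python
from typing import Optional


class ReplicationStopped(Exception):
    """Hypothetical error: the apply side refuses to continue."""


def convert_value(raw: bytes, src_enc: str, dst_enc: str,
                  on_error: str = "stop") -> Optional[bytes]:
    """Re-encode one column value from src_enc to dst_enc.

    on_error="stop" (the default) raises instead of losing data silently;
    on_error="null" replaces the unconvertible value with None (SQL NULL).
    """
    if on_error not in ("stop", "null"):
        raise ValueError(f"unknown policy: {on_error!r}")
    try:
        return raw.decode(src_enc).encode(dst_enc)
    except UnicodeError as exc:
        if on_error == "null":
            return None
        raise ReplicationStopped(
            f"cannot convert value from {src_enc} to {dst_enc}") from exc
```

With the default policy a value that has no representation in the
destination encoding stops the apply; only an explicit opt-in turns it
into a NULL.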

> pglogical and BDR both require identical encoding; they test for this
> during startup and refuse to replicate if the encoding differs.
> 
> For the first pass at core I suggest a variant on that policy: require
> source and destination encoding to be the same. This should probably
> be the first change, since it definitively prevents the issue from
> arising.

If the check is performed by BDR, pglogical itself seems to be
allowed to convert strings when the two ends have different
encodings in a non-BDR environment. Is this right?

> If time permits we could also allow destination encoding to be UTF-8
> (including codepage 65001) with any source encoding. This requires
> encoding conversion to be performed, of course.

Does this mean that BDR might work in a heterogeneous encoding
environment? But anyway, some encodings (like SJIS) have
characters that share the same destination in their mappings, so
BDR doesn't seem to work with such conversions. So each encoding
might need a property indicating whether it is usable in a BDR
environment, but...
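
To make the "same destination" problem concrete, here is a small sketch
that groups source byte sequences by the text they decode to. It uses
Python's cp932 codec as a stand-in for an SJIS conversion table;
PostgreSQL's own SJIS maps may differ in detail, and find_collisions is
a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, Iterable, List


def find_collisions(sources: Iterable[bytes],
                    encoding: str) -> Dict[str, List[bytes]]:
    """Group source byte strings by the text they decode to.

    Any group with more than one member cannot be converted back to the
    source encoding without losing the original byte sequence.
    """
    groups: Dict[str, List[bytes]] = defaultdict(list)
    for raw in sources:
        groups[raw.decode(encoding)].append(raw)
    return {text: raws for text, raws in groups.items() if len(raws) > 1}


# Two CP932 byte sequences for the same character: the JIS X 0208 code
# and its duplicate in NEC row 13.
pair = [b"\x81\xe0", b"\x87\x90"]
collisions = find_collisions(pair, "cp932")
```

Both sequences decode to the same character, so after a round trip
through the other encoding at most one of them can come back as the
bytes that were originally stored.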

On the other hand, no problem is seen with encoding conversions
in non-BDR environments (except for the behavior on failure).

> The downside is that this will impact users who use a common subset of
> two encodings. This is most common for Windows-1252 <-> ISO-8859-15
> (or -1 if you're old-school) but also arises anywhere the common 7 bit
> subset is used. Until we can define an encoding exception policy
> though, I think we should defer supporting those and make them a
> "later" problem.

If the conversion is rejected for now, we should check the
encoding identity instead.
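
A minimal sketch of such an identity check, assuming the two encoding
names are available as strings when replication starts. The function
names are hypothetical, and Python's codecs.lookup is used only to
resolve aliases such as "UTF8" and "utf-8"; a real implementation would
compare the server's own encoding identifiers.

```python
import codecs


def same_encoding(publisher_enc: str, subscriber_enc: str) -> bool:
    """Compare two encoding names after resolving aliases."""
    return (codecs.lookup(publisher_enc).name
            == codecs.lookup(subscriber_enc).name)


def check_encoding_identity(publisher_enc: str, subscriber_enc: str) -> None:
    """Refuse to start replication when the encodings differ."""
    if not same_encoding(publisher_enc, subscriber_enc):
        raise RuntimeError(
            f"encoding mismatch: publisher uses {publisher_enc}, "
            f"subscriber uses {subscriber_enc}")
```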

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



