Re: [HACKERS] Logical Replication and Character encoding

Kyotaro HORIGUCHI Fri, 03 Feb 2017 00:35:59 -0800

Hello,

At Fri, 3 Feb 2017 13:47:54 +0800, Craig Ringer <[email protected]> wrote 
in <CAMsr+YFqNLAvdjmgWOMWM9X=ffzcfol4plxbeaqtjysbe_u...@mail.gmail.com>
> On 3 Feb. 2017 15:47, "Kyotaro HORIGUCHI" <[email protected]>
> wrote:
> 
> Hello,
> 
> At Fri, 3 Feb 2017 09:16:47 +0800, Craig Ringer <[email protected]>
> wrote in <CAMsr+YGqn2PjJBCY+RjWWTJ4BZ=fhg0rexq1e1nd8_kjadg...@mail.gmail.com
> >
> > On 2 February 2017 at 11:45, Euler Taveira <[email protected]> wrote:
> >
> > > I don't think storage without conversion is an acceptable approach. We
> > > should provide options to users such as ignoring the tuple or storing
> > > NULL for column(s) with a conversion problem. I wouldn't consider
> > > storing data without conversion because it silently shows incorrect
> > > data and we historically aren't flexible with conversion routines.
> 
> It is technically possible. But changing the behavior of a
> subscription and/or publication requires a change of SQL syntax. It
> seems a bit too late to propose such a new feature.
> 
> IMHO unintentional silent data loss must not happen, so the
> default behavior on conversion failure cannot be anything other
> than stopping replication.
> 
> 
> Agree. Which is why we should default to disallowing mismatched upstream
> and downstream encodings. At least to start with.
> 
> > pglogical and BDR both require identical encoding; they test for this
> > during startup and refuse to replicate if the encoding differs.
> >
> > For the first pass at core I suggest a variant on that policy: require
> > source and destination encoding to be the same. This should probably
> > be the first change, since it definitively prevents the issue from
> > arising.
> 
> If the check is performed by BDR, pglogical itself seems to be
> allowed to convert strings when the two ends have different
> encodings in a non-BDR environment. Is this right?
> 
> 
> Hm. Maybe it's changed since I last looked. We started off disallowing
> mismatched encodings anyway.
> 
> Note that I'm referring to pglogical, the tool, not "in core logical
> replication for postgres"


Ouch! I'm so sorry for my mistake. What I meant to mention was
not pglogical but pgoutput, the default output plugin. But what
the patch modifies is logical/proto.c. The question I meant to
ask was the following.

| Does pglogical require the core to reject connections
| to a server with a different encoding? Or does it check the
| encodings by itself and disconnect by itself?


> > If time permits we could also allow destination encoding to be UTF-8
> > (including codepage 65001) with any source encoding. This requires
> > encoding conversion to be performed, of course.
> 
> Does this mean that BDR might work in a heterogeneous encoding
> environment?
> 
> No. Here the "we" was meant to be PG core for V10 or later.
> 
> But anyway some encodings (like SJIS) have
> characters with the same destination in their mappings, so BDR
> doesn't seem to work with such conversions. So each encoding
> might need a property indicating its usability in a BDR
> environment, though.
> 
> 
> PG doesn't allow SJIS as a db encoding. So it doesn't matter here.

Oops. I suppose EUC_JP also has such characters but I'm not sure
now.


> > The downside is that this will impact users who use a common subset of
> > two encodings. This is most common for Windows-1252 <-> ISO-8859-15
> > (or -1 if you're old-school) but also arises anywhere the common 7 bit
> > subset is used. Until we can define an encoding exception policy
> > though, I think we should defer supporting those and make them a
> > "later" problem.
> 
> If the conversion is rejected for now, we should check the
> encoding identity instead.

OK, I'll go in this direction for the next patch.
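
To illustrate the policy agreed on above (refuse to replicate unless
both ends use the same encoding), here is a minimal sketch. It is
written in Python only for brevity and is not the patch itself; the
names EncodingMismatch, normalize and check_encoding_identity are
hypothetical, and the name normalization is an assumption of this
sketch, not something the core code is known to do.

```python
class EncodingMismatch(Exception):
    """Raised when upstream and downstream database encodings differ."""


def normalize(name: str) -> str:
    # Assumption of this sketch: compare names case-insensitively and
    # ignore '-' and '_', so "UTF8", "utf-8" and "UTF_8" compare equal.
    return name.replace("-", "").replace("_", "").upper()


def check_encoding_identity(upstream: str, downstream: str) -> None:
    """Refuse to start replication unless both ends use the same encoding."""
    if normalize(upstream) != normalize(downstream):
        raise EncodingMismatch(
            f"upstream encoding {upstream!r} differs from "
            f"downstream encoding {downstream!r}; refusing to replicate"
        )
```

The check would run once at connection time, before any tuple is
sent, so a mismatch stops replication instead of risking silently
wrong data.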

regards,

-- 
Kyotaro Horiguchi
NTT Open Source Software Center




