FTR I ran into a benign case of the phenomenon in this thread when
dealing with row types.  In rowtypes.c, we double-quote stuff
containing spaces, but we detect them by passing individual bytes of
UTF-8 sequences to isspace().  Like macOS, Windows thinks that 0xa0 is
a space when you do that, so for example the Korean character '점'
(code point C810, UTF-8 sequence EC A0 90) gets quotes on Windows but
not on Linux.  That confused a migration/diff tool while comparing
Windows and Linux database servers using that representation.  Not a
big deal, I guess no one ever promised that the format was stable
across platforms, and I don't immediately see a way for anything more
serious to go wrong (though I may lack imagination).  It does seem a
bit weird to be using locale-aware tokenising for a machine-readable
format, and then making sure its behaviour is undefined by feeding it
chopped up bytes.


Reply via email to