Jeff Davis wrote:
On Wed, 2007-03-14 at 01:29 -0600, Michael Fuhr wrote:
On Tue, Mar 13, 2007 at 04:42:35PM +0100, Mario Weilguni wrote:
On Tuesday, 13 March 2007 16:38, Joshua D. Drake wrote:
Is this any different than the issues of moving 8.0.x to 8.1 UTF8? Where
we had to use iconv?
What issues? I've upgraded several 8.0 databases to 8.1 without having to use iconv. Did I miss something?

"Some users are having problems loading UTF-8 data into 8.1.X.  This
is because previous versions allowed invalid UTF-8 byte sequences
to be entered into the database, and this release properly accepts
only valid UTF-8 sequences. One way to correct a dumpfile is to run
the command iconv -c -f UTF-8 -t UTF-8 -o cleanfile.sql dumpfile.sql."

If the above quote were actually true, then Mario wouldn't be having a
problem. Instead, it's half-true: Invalid byte sequences are rejected in
some situations and accepted in others. If PostgreSQL consistently
rejected or consistently accepted invalid byte sequences, that would not
cause problems with COPY (meaning problems with pg_dump, slony, etc.).
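
For anyone who wants to check a dump before loading it, here is a minimal sketch of the kind of cleanup the quoted iconv command performs. It is not PostgreSQL's own validation code; the function names are made up for illustration, and Python's strict UTF-8 decoder may not agree with iconv or the server on every edge case.

```python
def find_invalid_utf8(data: bytes):
    """Return the byte offset of the first invalid UTF-8 sequence, or None.

    Hypothetical helper: relies on Python's strict UTF-8 decoder,
    not on PostgreSQL's encoding checks.
    """
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as exc:
        return exc.start
    return None


def strip_invalid_utf8(data: bytes) -> bytes:
    """Drop bytes that do not form valid UTF-8.

    Roughly comparable to `iconv -c -f UTF-8 -t UTF-8`; the offending
    bytes are discarded silently, so data is lost.
    """
    return data.decode("utf-8", errors="ignore").encode("utf-8")


if __name__ == "__main__":
    sample = b"ab\xe0cd"  # 0xE0 starts a 3-byte sequence that never completes
    print(find_invalid_utf8(sample))
    print(strip_invalid_utf8(sample))
```

Like iconv -c, this throws the bad bytes away rather than repairing them, so it only makes sense when the invalid sequences are known to be garbage.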

How can we fix this? Frankly, the statement in the docs warning about making sure that escaped sequences are valid in the server encoding is a cop-out. We don't accept invalid data elsewhere, and this should be no different IMNSHO. I don't see why this should be any different from, say, date or numeric data. For years people have sneered at MySQL because it accepted dates like Feb 31st, and rightly so. But this seems to me to be like our own version of the same problem.

Last year Jeff suggested adding something like:


to each relevant input routine. Would that be an acceptable solution? If not, what would be?


