Re: [HACKERS] Bug in UTF8-Validation Code?

Andrew Dunstan Sun, 18 Mar 2007 04:29:41 -0800

Martijn van Oosterhout wrote:

On Sat, Mar 17, 2007 at 11:46:01AM -0400, Andrew Dunstan wrote:
How can we fix this? Frankly, the statement in the docs warning about making sure that escaped sequences are valid in the server encoding is a cop-out. We don't accept invalid data elsewhere, and this should be no different IMNSHO. I don't see why this should be any different from, say, date or numeric data. For years people have sneered at MySQL because it accepted dates like Feb 31st, and rightly so. But this seems to me to be like our own version of the same problem.
It seems to me that the easiest solution would be to forbid \x?? escape
sequences with a value greater than \x7F for UTF-8 server encodings,
and instead introduce a \u escape for specifying the Unicode character
directly, under the basic principle that any escape sequence still has
to represent a single character. The result can be multiple bytes, but
you don't have to check for consistency anymore.
Have a nice day,
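
For illustration only, the rule proposed above could be sketched roughly like this. It is written in Python for brevity and is not PostgreSQL code; the function name `unescape_for_utf8` and the four-hex-digit form of `\u` are assumptions made for the sketch, and only the `\xNN` and `\uXXXX` escapes are handled.

```python
import re

# Hypothetical escape grammar, for illustration only: \xNN and \uXXXX.
_ESCAPE = re.compile(r"\\x([0-9A-Fa-f]{2})|\\u([0-9A-Fa-f]{4})")


def unescape_for_utf8(literal):
    """Expand escapes so that every escape yields exactly one character."""

    def expand(match):
        hex_byte, code_point = match.group(1), match.group(2)
        if hex_byte is not None:
            value = int(hex_byte, 16)
            if value > 0x7F:
                # A lone high byte is not a character in UTF-8, so reject it.
                raise ValueError("\\x%s not allowed here; use \\u" % hex_byte)
            return chr(value)
        value = int(code_point, 16)
        if 0xD800 <= value <= 0xDFFF:
            # Surrogate code points are not valid characters on their own.
            raise ValueError("\\u%s is a surrogate" % code_point)
        return chr(value)

    # Each escape became one character, so encoding cannot yield invalid UTF-8.
    return _ESCAPE.sub(expand, literal).encode("utf-8")
```

With this rule, `unescape_for_utf8(r"caf\u00e9")` produces the two-byte sequence for the accented character, while `unescape_for_utf8(r"\xe9")` is rejected instead of producing a stray byte.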

The escape processing is actually done in the lexer in the case of literals. We have to allow for bytea literals there too, regardless of encoding. The lexer naturally has no notion of the intended destination of the literal, so we need to defer the validity check to the *in functions for encoding-aware types. And, as Tom has noted, COPY does its own escape processing, but does it before the transcoding.
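
To make the split concrete, here is a rough sketch of the idea, in Python rather than C and not taken from the PostgreSQL source. The names `lexer_unescape`, `text_in` and `bytea_in` are hypothetical stand-ins for the lexer and the type input functions, and Python's strict UTF-8 decoder stands in for the server's validity check.

```python
import re

_HEX_ESCAPE = re.compile(rb"\\x([0-9A-Fa-f]{2})")


def lexer_unescape(literal):
    """Expand \\xNN escapes to raw bytes; the lexer does not know the target type."""
    return _HEX_ESCAPE.sub(lambda m: bytes([int(m.group(1), 16)]), literal)


def bytea_in(raw):
    """Input function for a binary type: any byte sequence is acceptable."""
    return raw


def text_in(raw):
    """Input function for an encoding-aware type: validate before accepting."""
    try:
        return raw.decode("utf-8")  # strict decoding rejects invalid sequences
    except UnicodeDecodeError as exc:
        raise ValueError("invalid byte sequence for encoding UTF8") from exc
```

Here `bytea_in(lexer_unescape(rb"\xc3"))` succeeds, while `text_in(lexer_unescape(rb"\xc3"))` raises, because the check happens only once the destination type is known.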

So ISTM that any solution other than something like I have proposed will probably involve substantial surgery.

It does also seem from my test results that transcoding to MB charsets (or at least to utf-8) is surprisingly expensive, and that this would be a good place to look at optimisation possibilities. The validity tests can also be somewhat expensive.


But correctness matters most, IMNSHO.

cheers

andrew



