Tom Lane wrote:
> Andrew Dunstan <[EMAIL PROTECTED]> writes:
>> Perhaps we're talking at cross purposes.
>> The problem with doing encoding validation in scan.l is that it lacks
>> context. Null bytes are only the tip of the bytea iceberg, since any
>> arbitrary sequence of bytes can be valid for a bytea.
> If you think that, then we're definitely talking at cross purposes.
> I assert we should require the post-scanning value of a string literal
> to be valid in the database encoding. If you want to produce an
> arbitrary byte sequence within a bytea value, the way to get there is
> for the bytea input function to do the de-escaping, not for the string
> literal parser to do it.
[looks again ... thinks ...]
Ok, I'm sold.
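To illustrate the division of labor being argued for: the string literal parser hands "\000" through untouched, and only the bytea input function turns it into a real zero byte. Here is a minimal standalone sketch of that octal de-escaping; it is illustrative C, not PostgreSQL's actual byteain, and the function name is invented:

```c
/* Simplified sketch of bytea-style octal de-escaping (hypothetical,
 * not PostgreSQL's byteain).  "\nnn" (octal) becomes one byte, "\\"
 * becomes a backslash; any other backslash use is an error.
 * Returns the output length, or -1 on a bad escape. */
static int
bytea_deescape(const char *in, unsigned char *out)
{
    int n = 0;

    while (*in)
    {
        if (in[0] == '\\' &&
            in[1] >= '0' && in[1] <= '3' &&
            in[2] >= '0' && in[2] <= '7' &&
            in[3] >= '0' && in[3] <= '7')
        {
            /* three octal digits -> one arbitrary byte, 0..255 */
            out[n++] = (unsigned char) (((in[1] - '0') << 6) |
                                        ((in[2] - '0') << 3) |
                                        (in[3] - '0'));
            in += 4;
        }
        else if (in[0] == '\\' && in[1] == '\\')
        {
            out[n++] = '\\';
            in += 2;
        }
        else if (in[0] == '\\')
            return -1;          /* lone or malformed escape */
        else
            out[n++] = (unsigned char) *in++;
    }
    return n;
}
```

The point of the sketch is that the scanner never needs to see a byte that is invalid in the database encoding; arbitrary bytes come into existence only inside the type's own input function.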
> The only reason I was considering not doing it in scan.l is that
> scan.l's behavior ideally shouldn't depend on any changeable variables.
> But until there's some prospect of database_encoding actually being
> mutable at run time, there's not much point in investing a lot of sweat
> on that either.
Agreed. This would just be one item of many to change if/when we ever
come to that.
> Instead, we have to mess with an unknown number of UDTs ...
We're going to have that danger anyway, aren't we, unless we check the
encoding validity of the result of every UDF that returns some text
type? I'm not going to lose sleep over something I can't cure but the
user can; what concerns me is that our own code ensures data
integrity, including for encoding.
Anyway, it looks to me like we have the following items to do:
. add validity checking to the scanner
. fix COPY code
. fix chr()
. improve efficiency of validity checks, at least for UTF8
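On the last item, here is a standalone sketch of what a strict UTF8 validity check involves (illustrative C, not PostgreSQL's pg_verifymbstr): it rejects truncated sequences, stray continuation bytes, overlong encodings, UTF-16 surrogates, and code points beyond U+10FFFF. The single pass over the bytes is also the shape you'd want for any later efficiency work:

```c
#include <stdbool.h>
#include <stddef.h>

/* Strict UTF-8 validity check (illustrative sketch). */
static bool
utf8_is_valid(const unsigned char *s, size_t len)
{
    size_t i = 0;

    while (i < len)
    {
        unsigned char b = s[i];

        if (b < 0x80)                       /* ASCII: one byte */
            i += 1;
        else if ((b & 0xE0) == 0xC0)        /* two-byte sequence */
        {
            if (i + 1 >= len || (s[i+1] & 0xC0) != 0x80 || b < 0xC2)
                return false;               /* 0xC0/0xC1 are overlong */
            i += 2;
        }
        else if ((b & 0xF0) == 0xE0)        /* three-byte sequence */
        {
            if (i + 2 >= len ||
                (s[i+1] & 0xC0) != 0x80 || (s[i+2] & 0xC0) != 0x80)
                return false;
            if (b == 0xE0 && s[i+1] < 0xA0)
                return false;               /* overlong */
            if (b == 0xED && s[i+1] >= 0xA0)
                return false;               /* UTF-16 surrogates */
            i += 3;
        }
        else if ((b & 0xF8) == 0xF0)        /* four-byte sequence */
        {
            if (i + 3 >= len ||
                (s[i+1] & 0xC0) != 0x80 ||
                (s[i+2] & 0xC0) != 0x80 || (s[i+3] & 0xC0) != 0x80)
                return false;
            if (b == 0xF0 && s[i+1] < 0x90)
                return false;               /* overlong */
            if (b > 0xF4 || (b == 0xF4 && s[i+1] >= 0x90))
                return false;               /* beyond U+10FFFF */
            i += 4;
        }
        else
            return false;   /* stray continuation byte, or 0xF5..0xFF */
    }
    return true;
}
```

Note that a check like this would also catch the chr() problem: chr(0) and chr() values that produce bare high-bit bytes fail exactly the continuation-byte rules above.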