On 4/18/09, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Sam Mason <s...@samason.me.uk> writes:
> > On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
> >> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> >>> Btw, is there any good reason why we don't reject \000, \x00
> >>> in text strings?
> >>
> >> Why forbid nulls in text strings?
>
> > As far as I know, PG assumes, like most C code, that strings don't
> > contain embedded NUL characters.
>
> Yeah; we should reject them because nothing will behave very sensibly
> with them, eg
>
> regression=# select E'abc\000xyz';
>  ?column?
> ----------
>  abc
> (1 row)
>
> The point has come up before, and I kinda thought we *had* changed the
> lexer to reject \000.  I see we haven't though.  Curiously, this
> does fail:
>
> regression=# select U&'abc\0000xyz';
> ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> though that's not quite the message I'd have expected to see.
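[To illustrate why "nothing will behave very sensibly" with an embedded NUL:
the backend, like most C code, passes string data around as NUL-terminated C
strings, so everything downstream of the lexer sees only the bytes before the
first NUL.  A minimal standalone C sketch, not PostgreSQL code:]

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* seven bytes of data with an embedded NUL, as E'abc\000xyz'
         * would produce: 'a' 'b' 'c' '\0' 'x' 'y' 'z' */
        const char buf[] = "abc\0xyz";

        /* any strlen()-based code stops at the embedded NUL ... */
        printf("strlen: %zu\n", strlen(buf));   /* prints 3, not 7 */

        /* ... so the tail of the string is silently lost */
        printf("text:   %s\n", buf);            /* prints just "abc" */
        return 0;
    }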
I think that's because our verifier actually *does* reject \0; the only
problem is that \0 does not set the saw_high_bit flag, so the verifier
simply does not get executed.  But U& executes it always.

unicode=# SELECT e'\xc3\xa4';
 ?column?
----------
 รค
(1 row)

unicode=# SELECT e'\xc3\xa4\x00';
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

Heh.

-- 
marko
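[A minimal self-contained C sketch of the fast path described above; the
names and the verifier stand-in are illustrative, not the actual scan.l or
pg_verifymbstr() code.  Bytes below 0x80 never set the flag, so a decoded
\000 skips the check that would reject it, while a string containing
high-bit bytes gets verified:]

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* sketch of the fast path: only strings containing a byte with the
     * high bit set are marked for encoding verification; 0x00 never
     * trips the flag, so it slips through unverified */
    static bool
    needs_verify(const unsigned char *buf, size_t len)
    {
        bool saw_high_bit = false;
        for (size_t i = 0; i < len; i++)
            if (buf[i] >= 0x80)
                saw_high_bit = true;
        return saw_high_bit;
    }

    /* stand-in for the real verifier: reject embedded NUL bytes */
    static bool
    verify(const unsigned char *buf, size_t len)
    {
        return memchr(buf, 0, len) == NULL;
    }

    int main(void)
    {
        const unsigned char s1[] = { 0xc3, 0xa4 };          /* E'\xc3\xa4' */
        const unsigned char s2[] = { 'a', 'b', 'c', 0x00 }; /* E'abc\000' */

        /* s1 has a high-bit byte: the verifier runs (and passes) */
        printf("s1 ok: %d\n", needs_verify(s1, 2) ? verify(s1, 2) : 1);

        /* s2 is pure low-ASCII plus NUL: the verifier is skipped, so
         * the NUL is never rejected */
        printf("s2 ok: %d\n", needs_verify(s2, 4) ? verify(s2, 4) : 1);
        return 0;
    }

[Both lines print 1, but s2 "passes" only because the verifier never ran,
matching the E'...' behavior above; U&'...' verifies unconditionally and
would catch it.]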