On 4/18/09, Tom Lane <t...@sss.pgh.pa.us> wrote:
> Sam Mason <s...@samason.me.uk> writes:
> > On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
> >> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
> >>> Btw, is there any good reason why we don't reject \000, \x00
> >>> in text strings?
> >>
> >> Why forbid nulls in text strings?
>
> > As far as I know, PG assumes, like most C code, that strings don't
> > contain embedded NUL characters.
>
> Yeah; we should reject them because nothing will behave very sensibly
> with them, eg
>
> regression=# select E'abc\000xyz';
>  ?column?
> ----------
>  abc
> (1 row)
>
> The point has come up before, and I kinda thought we *had* changed the
> lexer to reject \000.  I see we haven't though.  Curiously, this
> does fail:
>
> regression=# select U&'abc\0000xyz';
> ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
> HINT:  This error can also happen if the byte sequence does not match the
> encoding expected by the server, which is controlled by "client_encoding".
>
> though that's not quite the message I'd have expected to see.
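[To illustrate why "nothing will behave very sensibly" with an embedded NUL:
the backend, like most C code, passes string data around as NUL-terminated C
strings, so everything downstream of the lexer sees only the bytes before the
first NUL.  A minimal standalone C sketch, not PostgreSQL code:]

    #include <stdio.h>
    #include <string.h>

    int main(void)
    {
        /* seven bytes of data with an embedded NUL, as E'abc\000xyz'
         * would produce: 'a' 'b' 'c' '\0' 'x' 'y' 'z' */
        const char buf[] = "abc\0xyz";

        /* any strlen()-based code stops at the embedded NUL ... */
        printf("strlen: %zu\n", strlen(buf));   /* prints 3, not 7 */

        /* ... so the tail of the string is silently lost */
        printf("text:   %s\n", buf);            /* prints just "abc" */
        return 0;
    }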
I think that's because our verifier actually *does* reject \0; the only
problem is that \0 does not set the saw_high_bit flag, so the verifier
simply does not get executed.  But U& executes it always.

unicode=# SELECT e'\xc3\xa4';
 ?column?
----------
 รค
(1 row)

unicode=# SELECT e'\xc3\xa4\x00';
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match the
encoding expected by the server, which is controlled by "client_encoding".

Heh.

-- 
marko
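[A minimal self-contained C sketch of the fast path described above; the
names and the verifier stand-in are illustrative, not the actual scan.l or
pg_verifymbstr() code.  Bytes below 0x80 never set the flag, so a decoded
\000 skips the check that would reject it, while a string containing
high-bit bytes gets verified:]

    #include <stdbool.h>
    #include <stdio.h>
    #include <string.h>

    /* sketch of the fast path: only strings containing a byte with the
     * high bit set are marked for encoding verification; 0x00 never
     * trips the flag, so it slips through unverified */
    static bool
    needs_verify(const unsigned char *buf, size_t len)
    {
        bool saw_high_bit = false;
        for (size_t i = 0; i < len; i++)
            if (buf[i] >= 0x80)
                saw_high_bit = true;
        return saw_high_bit;
    }

    /* stand-in for the real verifier: reject embedded NUL bytes */
    static bool
    verify(const unsigned char *buf, size_t len)
    {
        return memchr(buf, 0, len) == NULL;
    }

    int main(void)
    {
        const unsigned char s1[] = { 0xc3, 0xa4 };          /* E'\xc3\xa4' */
        const unsigned char s2[] = { 'a', 'b', 'c', 0x00 }; /* E'abc\000' */

        /* s1 has a high-bit byte: the verifier runs (and passes) */
        printf("s1 ok: %d\n", needs_verify(s1, 2) ? verify(s1, 2) : 1);

        /* s2 is pure low-ASCII plus NUL: the verifier is skipped, so
         * the NUL is never rejected */
        printf("s2 ok: %d\n", needs_verify(s2, 4) ? verify(s2, 4) : 1);
        return 0;
    }

[Both lines print 1, but s2 "passes" only because the verifier never ran,
matching the E'...' behavior above; U&'...' verifies unconditionally and
would catch it.]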