"William ZHANG" <[EMAIL PROTECTED]> writes:
> Sorry. I still cannot understand why backend encodings must have this
> property. AFAIK, the parser treats characters as ASCII. So any multi-byte
> characters will be treated as two or more ASCII characters. But if
> the multi-byte encoding does not use any special ASCII characters like
> single quote ('), double quote (") and backslash (\), I think the parser
> can deal with it correctly.
You've got your attention too narrowly focused on strings inside quotes;
it's strings outside quotes that are the problem.
As an example, I see that gb18030 defines characters like 97 7e (hex).
If someone tried to use that as a character of a SQL identifier
--- something that'd work fine for the UTF8 equivalent e6 a2 a1
--- the parser would see it as an identifier byte followed by
the operator ~.
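A rough sketch of the byte-level issue, using only the two byte sequences mentioned above. Python is used purely for illustration, and the helper name is made up; this is not how the backend checks anything, just a way to see which bytes an ASCII-oriented scanner could mistake for ASCII characters.

```python
def ascii_aliasing_bytes(encoded):
    """Return the byte values below 0x80 in one encoded multi-byte character.

    Hypothetical helper: any such byte looks like plain ASCII to a scanner
    that examines its input one byte at a time.
    """
    return [b for b in encoded if b < 0x80]


# Byte sequences taken from the example above.
gb18030_char = bytes([0x97, 0x7E])
utf8_char = bytes([0xE6, 0xA2, 0xA1])

for name, seq in (("gb18030", gb18030_char), ("utf8", utf8_char)):
    aliases = ascii_aliasing_bytes(seq)
    print(name, seq.hex(" "), [chr(b) for b in aliases])
```

For the gb18030 sequence the helper reports the byte 0x7e, which is ASCII ~; for the UTF8 sequence it reports nothing, since every byte of a multi-byte UTF8 character has its high bit set.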
Similarly, there are problems if we were to allow these character sets
for the pattern argument of a regular expression operator, or for any
datatype at all that can be embedded in an array constant. And for PL
languages that feed prosrc strings into external interpreters, such as
Perl or R, it gets really interesting really quickly :-(.
It is possible that some of these encodings could be allowed without
any risks, but I don't think it is worth our time to grovel through
each valid character and every possible backend situation to determine
safety. The risks are not always obvious --- see for instance the
security holes we fixed about a year ago in 8.1.4 et al --- and so
I for one would never have a lot of faith in there not being any holes.
The rule "no ASCII-aliasing characters" is a simple one that we can have
some confidence in.
regards, tom lane