.
But all that is actually a whole different can of worms, and slightly
besides the point of my original question. How to properly compare
strings with different lentgths has been discussed before, see for
instance the thread in [1]. My intention was not to get that started
again. As far as I can see, the consensus seems to be that when using
the C locale, string comparisons should be done using NO PAD semantics.
(It sure gives some strange semantics if you have varchars with trailing
spaces, but it's perfectly legal.)
The point is that my testcase deals with strings of the same length.
Thus, the above clause doesn't really apply. The standard, to my
understanding, says that fixed-length character values are padded when
the row is constructed. And once that happens, those spaces become part
of the value. It's invalid to strip them, unless done explicitly.
Since we only have the CHAR(n) type to improve compliance with the SQL
specification, and we don't generally encourage its use, I think we
should fix any non-compliant behavior. That seems to mean that if you
take two CHAR values and compare them, it should give the same result
as comparing the same two values as VARCHAR using the same collation
with the shorter value padded with spaces.
So this is correct:
test=# select 'ab'::char(3) collate en_US E'ab\n'::char(3) collate
en_US;
?column?
--
t
(1 row)
... because it matches:
test=# select 'ab '::varchar(3) collate en_US E'ab\n'::varchar(3) collate
en_US;
?column?
--
t
(1 row)
Again, this touches on the same point as Bruce's example above. Right
now these two queries might produce identical results on Linux, because
of the way strcoll() behaves. On OS X you get different results:
select 'ab'::char(3) collate en_US E'ab\n'::char(3) collate en_US;
?column?
--
t
(1 row)
select 'ab '::varchar(3) collate en_US E'ab\n'::varchar(3)
collate en_US;
?column?
--
f
(1 row)
I have no idea who's right. But doesn't this count as evidence that
right-trimming pbchars is not even safe with the en_US collation?
-- Thomas
[1]
http://www.postgresql.org/message-id/flat/a737b7a37273e048b164557adef4a58b0579a...@ntex2010a.host.magwien.gv.at#a737b7a37273e048b164557adef4a58b0579a...@ntex2010a.host.magwien.gv.at
On Fri, Oct 11, 2013 at 4:58 PM, Kevin Grittner kgri...@ymail.com wrote:
Bruce Momjian br...@momjian.us wrote:
Thomas Fanghaenel wrote:
I was wondering about the proper semantics of CHAR comparisons in some
corner
cases that involve control characters with values that are less than 0x20
(space).
What matters in general isn't where the characters fall when
comparing individual bytes, but how the strings containing them
sort according to the applicable collation. That said, my
recollection of the spec is that when two CHAR(n) values are
compared, the shorter should be blank-padded before making the
comparison. *That* said, I think the general advice is to stay
away from CHAR(n) in favor or VARCHAR(n) or TEXT, and I think that
is good advice.
I am sorry for this long email, but I would be interested to see what
other hackers think about this issue.
Since we only have the CHAR(n) type to improve compliance with the
SQL specification, and we don't generally encourage its use, I
think we should fix any non-compliant behavior. That seems to mean
that if you take two CHAR values and compare them, it should give
the same result as comparing the same two values as VARCHAR using
the same collation with the shorter value padded with spaces.
So this is correct:
test=# select 'ab'::char(3) collate en_US E'ab\n'::char(3) collate
en_US;
?column?
--
t
(1 row)
... because it matches:
test=# select 'ab '::varchar(3) collate en_US E'ab\n'::varchar(3) collate
en_US;
?column?
--
t
(1 row)
But this is incorrect:
test=# select 'ab'::char(3) collate C E'ab\n'::char(3) collate C;
?column?
--
t
(1 row)
... because it doesn't match:
test=# select 'ab '::varchar(3) collate C E'ab\n'::varchar(3) collate C;
?column?
--
f
(1 row)
Of course, I have no skin in the game, because it took me about two
weeks after my first time converting a database with CHAR columns
to PostgreSQL to change them all to VARCHAR, and do that as part of
all future conversions.
--
Kevin Grittner
EDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers