On Thu, Feb 13, 2014 at 09:47:01PM -0500, Bruce Momjian wrote:
> On Wed, Oct 16, 2013 at 02:17:11PM -0400, Bruce Momjian wrote:
> > > > You can see the UTF8 case is fine because \n is considered greater
> > > > than space, but in the C locale, where \n is less than space, the
> > > > false return value shows the problem with
> > > > internal_bpchar_pattern_compare() trimming the string and first
> > > > comparing on lengths.  This is exactly the problem you outline, where
> > > > space trimming assumes everything is less than a space.
> > > 
> > > For collations other than C some of those issues that have to do with
> > > string comparisons might simply be hidden, depending on how strcoll()
> > > handles inputs of different lengths: If strcoll() applies implicit
> > > space padding to the shorter value, there won't be any visible
> > > difference in ordering between bpchar and varchar values.  If strcoll()
> > > does not apply such space padding, the right-trimming of bpchar values
> > > causes very similar issues even in an en_US collation.
> 
> I have added the attached C comment to explain the problem, and added a
> TODO item to fix it if we ever break binary upgrading.
> 
> Does anyone think this warrants a doc mention?

I have done some more thinking on this and I found a way to document
this, which reduces our need to actually fix it some day.  I am afraid
the behavioral change needed to fix this might break so many
applications that the fix will never be done, though I will keep the
TODO item until I get more feedback on that.  Patch attached.
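
To make the comparison issue concrete, here is a minimal standalone C
sketch. It is not the backend code: true_len(), cmp_trimmed() and
cmp_padded() are hypothetical names, true_len() only imitates what
bcTruelen() does, and plain memcmp() is assumed as a stand-in for a
C-locale comparison.

```c
#include <stddef.h>
#include <string.h>

/* Length of s with trailing spaces ignored (imitates bcTruelen()). */
size_t
true_len(const char *s, size_t len)
{
	while (len > 0 && s[len - 1] == ' ')
		len--;
	return len;
}

/*
 * Trim both inputs, compare the common prefix bytewise, and break ties
 * on the trimmed lengths.  This is the behavior described in the thread.
 */
int
cmp_trimmed(const char *a, size_t alen, const char *b, size_t blen)
{
	size_t		la = true_len(a, alen);
	size_t		lb = true_len(b, blen);
	size_t		n = (la < lb) ? la : lb;
	int			c = memcmp(a, b, n);

	if (c != 0)
		return c;
	return (la > lb) - (la < lb);
}

/* Compare the values as stored, i.e. space-padded to the full width. */
int
cmp_padded(const char *a, const char *b, size_t width)
{
	return memcmp(a, b, width);
}
```

With the four-byte values "ab\n " and "ab  ", cmp_trimmed() reports
the first as greater (it is longer after trimming), while cmp_padded()
reports it as smaller because '\n' sorts below the space byte.  That
disagreement is the mismatch the doc patch describes.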

-- 
  Bruce Momjian  <br...@momjian.us>        http://momjian.us
  EnterpriseDB                             http://enterprisedb.com

  + Everyone has their own god. +
diff --git a/doc/src/sgml/datatype.sgml b/doc/src/sgml/datatype.sgml
new file mode 100644
index 30fd9bb..9635004
*** a/doc/src/sgml/datatype.sgml
--- b/doc/src/sgml/datatype.sgml
*************** SELECT '52093.89'::money::numeric::float
*** 1072,1081 ****
     <para>
      Values of type <type>character</type> are physically padded
      with spaces to the specified width <replaceable>n</>, and are
!     stored and displayed that way.  However, the padding spaces are
!     treated as semantically insignificant.  Trailing spaces are
!     disregarded when comparing two values of type <type>character</type>,
!     and they will be removed when converting a <type>character</type> value
      to one of the other string types.  Note that trailing spaces
      <emphasis>are</> semantically significant in
      <type>character varying</type> and <type>text</type> values, and
--- 1072,1084 ----
     <para>
      Values of type <type>character</type> are physically padded
      with spaces to the specified width <replaceable>n</>, and are
!     stored and displayed that way.  However, trailing spaces are treated as
!     semantically insignificant and disregarded when comparing two values
!     of type <type>character</type>.  In collations where whitespace
!     is significant, this behavior can produce unexpected results,
!     e.g. <command>SELECT 'a '::CHAR(2) collate "C" &lt; E'a\n'::CHAR(2)</command>
!     returns true.
!     Trailing spaces are removed when converting a <type>character</type> value
      to one of the other string types.  Note that trailing spaces
      <emphasis>are</> semantically significant in
      <type>character varying</type> and <type>text</type> values, and
diff --git a/src/backend/utils/adt/varchar.c b/src/backend/utils/adt/varchar.c
new file mode 100644
index 284b5d1..502ca44
*** a/src/backend/utils/adt/varchar.c
--- b/src/backend/utils/adt/varchar.c
*************** bpcharcmp(PG_FUNCTION_ARGS)
*** 846,863 ****
  				len2;
  	int			cmp;
  
- 	/*
- 	 * Trimming trailing spaces off of both strings can cause a string
- 	 * with a character less than a space to compare greater than a
- 	 * space-extended string, e.g. this returns false:
- 	 *		SELECT E'ab\n'::CHAR(10) < E'ab '::CHAR(10);
- 	 * even though '\n' is less than the space if CHAR(10) was
- 	 * space-extended.  The correct solution would be to trim only
- 	 * the longer string to be the same length of the shorter, if
- 	 * possible, then do the comparison.  However, changing this
- 	 * might break existing indexes, breaking binary upgrades.
- 	 * For details, see http://www.postgresql.org/message-id/CAK+WP1xdmyswEehMuetNztM4H199Z1w9KWRHVMKzyyFM+hV=z...@mail.gmail.com
- 	 */
  	len1 = bcTruelen(arg1);
  	len2 = bcTruelen(arg2);
  
--- 846,851 ----