David,

we wrote contrib module (mchar) for one customer, which ports its
application from mssql to postgres. It does case-insensitive comparison for new data type 'mchar' and linked with ICU for system independent locale.

Oleg

On Sun, 1 Jun 2008, David E. Wheeler wrote:

Howdy,

I'm sure I'm just showing off my ignorance here, but here goes

I really need case-insensitive string comparison in my database. Ideally there'd be a nice ITEXT data type (and friends, ichar, ivarchar, etc.). But of course there isn't, and for years I've just used LOWER() on indexes and queries to get the same result.

Only it turns out that I'm of course not getting the same result. This script:

#!/usr/local/bin/perl -w

use strict;
use warnings;
use utf8;
binmode STDOUT, ':utf8';
use DBI;

my $dbh = DBI->connect('dbi:Pg:dbname=try', 'postgres', '', { pg_enable_utf8 => 1 });
for my $char qw(              A B C D ) {
print "$char: ", $dbh->selectrow_array('SELECT LOWER(?)', undef, $char ), $/;
}

Yields this output:

: : : : : : : : : : : : : A: a
B: b
C: c
D: d

So it doesn't really work on anything other than ASCII, it looks like. So I have two questions:

1. Does the use of the tolower() C function in the citext data type on pgfoundry basically give me the same results as using lower() in my SQL has for all these years? IOW, does it convert letters to lowercase in the same way that the LOWER() SQL function does? If so, I think I might start to use it for my case-insensitive columns and simplify my SQL a bit.

http://pgfoundry.org/projects/citext/

2. Isn't the ICU library distributed with PostgreSQL? And if so, could it not be used to create proper case conversions in LOWER() and friends and, ultimately, to create a case-insensitive text type in core? I'm seeing that it has a constant named U_COMPARE_IGNORE_CASE that can be used with its unorm_compare() function:

http://www.icu-project.org/apiref/icu4c/unorm_8h.html#6cc4c8b0d5be7ce1ac4b600ace7817f5
http://www.icu-project.org/apiref/icu4c/unorm_8h.html#991e0fe6f0d062dd6e8e924517f3f437

I don't really know C, but if that's stuff there, can't we take advantage of it for proper case-insensitive comparisons (and conversions)?

Thanks,

David




        Regards,
                Oleg
_____________________________________________________________
Oleg Bartunov, Research Scientist, Head of AstroNet (www.astronet.ru),
Sternberg Astronomical Institute, Moscow University, Russia
Internet: [EMAIL PROTECTED], http://www.sai.msu.su/~megera/
phone: +007(495)939-16-83, +007(495)939-23-83

--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to