Greg Stark <[EMAIL PROTECTED]> writes:
> On Sat, 22 Jan 2005 17:09:42 -0500, Tom Lane <[EMAIL PROTECTED]> wrote:
>> I would imagine that the performance is spectacularly awful :-(.
>> Have you benchmarked it?  A large sort on a unitext column,
>> for instance, would be revealing.

> Why do you persist in believing this? I sent timing results of doing a
> setlocale for every record here about a year ago. Sorting on the pg_strxfrm I
> posted (and Conway rewrote) was about twice as slow as sorting without using
> it. So it's slow but not spectacularly awful.

glibc is not the world.  I tried Dawid's functions on Mac OS X, being a
random non-glibc platform that I happen to use.  Using some text data
I had handy (44500 lines, 1.9MB) I made a single-column text table and
timed
        explain analyze select * from foo order by f1;
The results were
        In C locale, SQL_ASCII encoding:        820 ms
        In C locale, UNICODE encoding:          825 ms
        Using Dawid's functions:                62010 ms
        Stripped-down functions:                21010 ms
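
For reference, the test amounts to something like this, the data file
path being made up (and for the unitext timings the sort has to go
through Dawid's comparison operators rather than the built-in text
ordering):

        CREATE TABLE foo (f1 text);
        COPY foo FROM '/tmp/testdata.txt';  -- 44500 lines, 1.9MB
        EXPLAIN ANALYZE SELECT * FROM foo ORDER BY f1;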

The "stripped down" functions were the same functions without the 
locale overhead, e.g.

CREATE OR REPLACE FUNCTION unitext_le(unitext,unitext) RETURNS boolean AS $$
        # plain byte-wise "le" comparison: no setlocale, so no
        # locale-aware collation
        my $ret = ($_[0] le $_[1]) ? 't' : 'f';
        return $ret;
$$ LANGUAGE plperlu STABLE;
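
By way of comparison, the un-stripped versions pay for a setlocale on
every call.  I don't have Dawid's exact code in front of me, but the
shape is roughly like this (the POSIX strcoll route and the hard-wired
locale name are my guesses, not necessarily his code):

CREATE OR REPLACE FUNCTION unitext_le(unitext,unitext) RETURNS boolean AS $$
        use POSIX qw(setlocale strcoll LC_COLLATE);
        # save the current collation locale, switch, compare, restore;
        # the setlocale calls are where the time goes
        my $old = setlocale(LC_COLLATE);
        setlocale(LC_COLLATE, "en_US.UTF-8");   # placeholder locale name
        my $ret = (strcoll($_[0], $_[1]) <= 0) ? 't' : 'f';
        setlocale(LC_COLLATE, $old);
        return $ret;
$$ LANGUAGE plperlu STABLE;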

So we may conclude that about one-third of the overhead is plperl's
fault and the other two-thirds is setlocale's fault: of the roughly
61200 ms of added time, the stripped functions account for about
20200 ms.  But it's still about a factor of 75 slowdown to do it this
way (actually worse, since not all of the EXPLAIN ANALYZE total
runtime went into sorting).

I'm not sure what your threshold of "spectacularly awful" is, but that
meets mine.

                        regards, tom lane
