On Mon, Nov 7, 2011 at 11:12 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I looked at this a bit and realized that sscanf is actually doing a
> couple of critical things for us, which are lost in translation when
> doing it like this:
>
> 1. It ignores whitespace other than the dividing tab.  If we don't
> continue to do that, we'll likely break existing config files.
>
> 2. It ensures that src and trg each consist of at least one (nonblank)
> character.  placeChar() is critically dependent on the assumption that
> src is not empty.
>
> However, after looking around a bit at the other tsearch config-file-
> reading functions, I noted that they all use t_isspace() to identify
> whitespace ... and that function in fact should be okay on OS X,
> because it uses iswspace in multibyte encodings.
>
> So it's fairly simple to improve this code to reject whitespace that
> way.  I don't like the existing code anyway because of its potential
> vulnerability to buffer overrun.  I'll fix it up and commit.
>
>> As for the other problems with isspace and such on OSX, it might be
>> worth looking at the python portability fixes.
>
> If OS X's UTF8 locales weren't so thoroughly broken (eg sorting does not
> work), I might be tempted to try to do it that way, but I still fail
> to see the point.  After reviewing the code I feel that unaccent needs
> to be fixed because it's not consistent with the other tsearch config
> file parsers, and not so much because it works or doesn't work on any
> specific platform.
>

Yeah, I never knew there was such a problem with OS X and UTF8 before
running into it here, but it's good to know. When I noticed the
unaccent extension in more recent PostgreSQL versions, I figured it
would perform better than our current plperl-based accent-stripping
function (which it surely does). I just noticed the results on my
machine were a little off, while our Linux-based servers were fine
and dandy.

Anyways, lemme know if there's anything else I could help with or
could test and whatnot. Cheers.
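
For anyone following along, here's a rough sketch of how I understand
the two requirements above (skip whitespace around the fields, and
insist that src and trg are both non-empty). It is not the actual
unaccent code: parse_rule_line() and sketch_isspace() are made-up
names, and sketch_isspace() is only a byte-wise stand-in for
t_isspace(), which is multibyte-aware. Treating any run of whitespace
as the divider and rejecting extra fields are my assumptions too. It
returns pointers and lengths rather than copying into fixed-size
buffers, so there is nothing to overrun.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for t_isspace(); the real one handles multibyte. */
static int
sketch_isspace(const char *p)
{
    return isspace((unsigned char) *p);
}

/*
 * Split one rule line into src and trg, ignoring surrounding whitespace.
 * On return, *src/*trg point into line and *srclen/*trglen give lengths.
 * Returns 1 if both fields are non-empty and nothing else follows, else 0.
 */
static int
parse_rule_line(const char *line,
                const char **src, size_t *srclen,
                const char **trg, size_t *trglen)
{
    const char *p = line;

    assert(line != NULL);

    /* skip leading whitespace */
    while (*p && sketch_isspace(p))
        p++;

    /* src runs up to the next whitespace character */
    *src = p;
    while (*p && !sketch_isspace(p))
        p++;
    *srclen = (size_t) (p - *src);

    /* skip the divider (a tab in practice, any whitespace in this sketch) */
    while (*p && sketch_isspace(p))
        p++;

    /* trg runs up to the next whitespace character */
    *trg = p;
    while (*p && !sketch_isspace(p))
        p++;
    *trglen = (size_t) (p - *trg);

    /* skip trailing whitespace, including the newline */
    while (*p && sketch_isspace(p))
        p++;

    /* both fields must be non-empty, and nothing may follow trg */
    return *srclen > 0 && *trglen > 0 && *p == '\0';
}
```

A caller would skip (or complain about) any line where this returns 0,
so something like placeChar() never gets handed an empty src.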

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers
