On Mon, Nov 7, 2011 at 11:12 AM, Tom Lane <t...@sss.pgh.pa.us> wrote:
> I looked at this a bit and realized that sscanf is actually doing a
> couple of critical things for us, which are lost in translation when
> doing it like this:
>
> 1. It ignores whitespace other than the dividing tab. If we don't
> continue to do that, we'll likely break existing config files.
>
> 2. It ensures that src and trg each consist of at least one (nonblank)
> character. placeChar() is critically dependent on the assumption that
> src is not empty.
>
> However, after looking around a bit at the other tsearch config-file-
> reading functions, I noted that they all use t_isspace() to identify
> whitespace ... and that function in fact should be okay on OS X,
> because it uses iswspace in multibyte encodings.
>
> So it's fairly simple to improve this code to reject whitespace that
> way. I don't like the existing code anyway because of its potential
> vulnerability to buffer overrun. I'll fix it up and commit.
>
>> As for the other problems with isspace and such on OSX, it might be
>> worth looking at the python portability fixes.
>
> If OS X's UTF8 locales weren't so thoroughly broken (eg sorting does not
> work), I might be tempted to try to do it that way, but I still fail
> to see the point. After reviewing the code I feel that unaccent needs
> to be fixed because it's not consistent with the other tsearch config
> file parsers, and not so much because it works or doesn't work on any
> specific platform.
>
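For what it's worth, the two requirements listed above (skip any whitespace around the fields, and insist that src and trg are each non-empty) can be sketched roughly as below. This is only an illustration, not the code that was committed: parse_rule_line() and RulePair are hypothetical names, plain isspace() stands in for t_isspace() (so it is byte-wise and not multibyte-aware), and rejecting a third token on the line is a choice made for this sketch. It records pointers and lengths instead of copying into fixed-size buffers, which sidesteps the overrun concern.

```c
#include <ctype.h>
#include <stddef.h>

/* Hypothetical result type: pointers into the caller's line, plus lengths. */
typedef struct
{
    const char *src;
    size_t      srclen;
    const char *trg;
    size_t      trglen;
} RulePair;

/* Stand-in for t_isspace(); the cast avoids undefined behavior on bytes >= 0x80. */
static int
is_space(char c)
{
    return isspace((unsigned char) c) != 0;
}

/*
 * Split a line into exactly two whitespace-separated tokens.
 * Returns 1 on success, 0 if either token is empty or extra text follows.
 */
int
parse_rule_line(const char *line, RulePair *out)
{
    const char *p = line;

    while (*p && is_space(*p))      /* leading whitespace */
        p++;
    out->src = p;
    while (*p && !is_space(*p))     /* first token: src */
        p++;
    out->srclen = (size_t) (p - out->src);

    while (*p && is_space(*p))      /* separator (tab or other whitespace) */
        p++;
    out->trg = p;
    while (*p && !is_space(*p))     /* second token: trg */
        p++;
    out->trglen = (size_t) (p - out->trg);

    while (*p && is_space(*p))      /* trailing whitespace, including newline */
        p++;

    return out->srclen > 0 && out->trglen > 0 && *p == '\0';
}
```

A caller would skip (or warn about) any line for which this returns 0, so a function like placeChar() never sees an empty src.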
Yeah, I never knew there was such a problem with OS X and UTF8 before running into it here, but it's good to know. When I noticed the unaccent extension in more recent PostgreSQL versions, I figured it would perform better than our current plperl-based accent-stripping function (which it surely does). I just noticed the results on my machine were a little off, while our Linux-based servers were fine and dandy, and yadda yadda yadda.

Anyways, lemme know if there's anything else I could help with or test.

Cheers.