On 09/04/2016 08:44 PM, Tom Lane wrote:
Heikki Linnakangas <hlinn...@iki.fi> writes:
On 08/23/2016 03:54 AM, Tom Lane wrote:
+1 for this patch in general. Some regression test cases would be nice.

I'm not sure how to write such tests without introducing insurmountable
platform dependencies --- particularly on platforms with weak support for
UTF8 locales, such as OS X.  All the interesting cases require knowing
what iswalpha() etc will return for some high character codes.

What I did to test it during development was to set MAX_SIMPLE_CHR to
something in the ASCII range, so that the high-character-code paths could
be tested without making any assumptions about locale classifications for
non-ASCII characters.  I'm not sure that's a helpful idea for regression
testing purposes, though.

I guess I could follow the lead of collate.linux.utf8.sql and produce
a test that's only promised to pass on one platform with one encoding,
but I'm not terribly excited by that.  AFAIK that test file does not
get run at all in the buildfarm or in the wild.

I'm not too worried if the tests don't get run regularly, but I don't like the idea of a test that only works on one platform. This code is unlikely to be accidentally broken by unrelated changes in the backend, as the regexp code is very well isolated. But for someone who hacks on the regexp library in the future, having a test suite to tickle all these corner cases would be very handy.

Another class of regressions would be that something changes in the way a locale treats some characters, and that breaks an application. That would be very difficult to test for in a platform-independent way. That wouldn't really be our bug, though, but the locale's.

Since we're now de facto maintainers of this regexp library, and our version could be used somewhere other than PostgreSQL too, it would actually be nice to have a regression suite that's independent of the pg_regress infrastructure, and wouldn't need a server to run. Perhaps a stand-alone C program that compiles the regexp code with mock versions of the pg_wc_is* functions. Or perhaps a magic collation OID that makes the pg_wc_is* functions return hard-coded values for particular inputs.
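
To make the mock idea a bit more concrete, here is a rough sketch of what locale-independent classification functions could look like. Everything in it is hypothetical: the mock_wchar type, the mock_wc_is* names, and the table contents are made up for illustration and are not the actual signatures or data in the regexp code. The point is only that the answers come from a fixed table, so a test's expected output would not depend on what the platform's iswalpha() etc. return.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the wide-character type used by the regexp code. */
typedef uint32_t mock_wchar;

/* Character-class bits used only by this sketch. */
enum
{
	MOCK_ALPHA = 1,
	MOCK_DIGIT = 2,
	MOCK_SPACE = 4
};

typedef struct
{
	mock_wchar	code;
	unsigned	classes;
} mock_entry;

/*
 * Hard-coded classifications.  The high code points are classified the way
 * the test wants them, regardless of what any real locale would say.
 */
static const mock_entry mock_table[] = {
	{0x0041, MOCK_ALPHA},		/* 'A' */
	{0x0030, MOCK_DIGIT},		/* '0' */
	{0x0020, MOCK_SPACE},		/* ' ' */
	{0x0416, MOCK_ALPHA},		/* declared alphabetic by the mock */
	{0x0663, MOCK_DIGIT},		/* declared a digit by the mock */
	{0x3000, MOCK_SPACE}		/* declared whitespace by the mock */
};

/* Look up the class bits for c; anything not listed belongs to no class. */
static unsigned
mock_classes(mock_wchar c)
{
	size_t		i;

	for (i = 0; i < sizeof(mock_table) / sizeof(mock_table[0]); i++)
	{
		if (mock_table[i].code == c)
			return mock_table[i].classes;
	}
	return 0;
}

/* Mock classifiers: return 1 or 0 based purely on the table above. */
static int
mock_wc_isalpha(mock_wchar c)
{
	return (mock_classes(c) & MOCK_ALPHA) != 0;
}

static int
mock_wc_isdigit(mock_wchar c)
{
	return (mock_classes(c) & MOCK_DIGIT) != 0;
}

static int
mock_wc_isspace(mock_wchar c)
{
	return (mock_classes(c) & MOCK_SPACE) != 0;
}
```

A stand-alone test driver would then compile the regexp code against functions like these instead of the locale-backed ones, and could check matches against high character codes with fixed expected results on every platform.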

- Heikki

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)