Jon Jensen <[EMAIL PROTECTED]> writes:
> It would be a delight to be able to use more advanced (IMHO)
> Perl-compatible regexes in PostgreSQL.

After some further research, pcre does seem like an interesting alternative. Both pcre and Spencer's new code have essentially Berkeley-style licenses, so there's no problem there. Some relevant comparisons:

1. pcre tries to be exactly compatible with Perl, so details of its regex flavor will be familiar to many more people than the Tcl flavor (by and large the features are similar, but there are differences).

2. pcre is already distributed as a nice tidy library; we need not extract code from the Tcl distribution.

3. pcre is actively maintained (although tracking a new release every couple months may not be something we really want to do, anyway). AFAICT Henry's not doing anything much with his code, so it'd be pretty much take-once-and-maintain-for-ourselves.

4. pcre looks like it's probably *not* as well suited to a multibyte environment. In particular, I doubt that its UTF8 compile option was even turned on for the performance comparison Neil cited --- and the man page only promises "experimental, incomplete support for UTF-8 encoded strings". The Tcl code by contrast is used only in a multibyte environment, so that's the supported, optimized path. It doesn't even assume null-terminated strings (yay).

5. As best I can tell so far, neither code is currently set up for run-time choice of encoding; we'd have to do some work for that in either case. (This probably means that tracking pcre update releases would be problematic anyhow.)

6. According to Friedl's book, the Tcl engine (Spencer's new code) is way faster than Perl's, and so presumably faster than pcre, though I can't find any specific measurements of pcre in the book. It uses a hybrid DFA/NFA approach which Friedl considers state of the art.

Strict Perl compatibility would be a nice feature, but right at the moment the multibyte issue is looking like the determining factor. If we don't get a multibyte-optimized engine out of this change, we're wasting our time.
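
To make the multibyte concern in point 4 concrete, here is a minimal sketch of how a byte-oriented matcher and a character-oriented matcher disagree on the same UTF-8 input. It uses Python's standard re module purely as a stand-in; it is neither pcre nor Spencer's code, and the helper name dot_matches_whole is made up for illustration.

```python
import re

def dot_matches_whole(text, as_bytes):
    """Return True if the pattern ^.$ matches the entire input.

    Hypothetical helper for illustration only: Python's re module
    stands in for a byte-oriented versus a character-aware engine.
    """
    if as_bytes:
        # Byte-oriented matching: '.' consumes exactly one byte.
        return re.match(rb"^.$", text.encode("utf-8")) is not None
    # Character-oriented matching: '.' consumes one code point.
    return re.match(r"^.$", text) is not None

sample = "\u00e9"  # e with acute accent, encoded as two bytes in UTF-8
print(dot_matches_whole(sample, as_bytes=True))   # False
print(dot_matches_whole(sample, as_bytes=False))  # True
```

An engine that only sees bytes treats the single character as two units, so '.', bracket expressions, and case folding can all go wrong unless the engine understands the encoding.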

regards, tom lane