Re: [HACKERS] POSIX regex performance bug in 7.3 Vs. 7.2

2003-02-17 Thread Bruce Momjian
Can this improvement get merged up into CVS current, or did you already do that Tom? --- Tatsuo Ishii wrote: Nice work, Tatsuo! Wade, can you confirm that this patch solves your problem? Tatsuo, please commit into

2003-02-17 Thread Tom Lane
Bruce Momjian [EMAIL PROTECTED] writes: Can this improvement get merged up into CVS current, or did you already do that Tom? It's irrelevant to current. regards, tom lane

2003-02-06 Thread Hannu Krosing
Tom Lane wrote on Wed, 05.02.2003 at 08:12: Hannu Krosing [EMAIL PROTECTED] writes: Another idea is to make a special regex type and store the regexes pre-parsed (i.e. in some fast-load form)? Seems unlikely that going out to disk could beat just recompiling the regexp. We have to get

2003-02-05 Thread wade
Confirmed. Looks like a 100-fold increase. Thanx guys. Explain output can be seen here: http://arch.wavefire.com/pgregex.txt -Wade Klaver At 09:59 AM 2/5/03 -0500, Tom Lane wrote: Tatsuo Ishii [EMAIL PROTECTED] writes: Ok. The original complaint can be easily solved at least for single byte

2003-02-05 Thread Tatsuo Ishii
Nice work, Tatsuo! Wade, can you confirm that this patch solves your problem? Tatsuo, please commit into REL7_3 branch only --- I'm nearly ready to do a wholesale replacement of the regex code in HEAD, so you wouldn't accomplish much except to create a merge problem for me ... Ok. I have

2003-02-04 Thread wade
OK, I redid my trials with the same data set on 7.2.3 --with-multibyte and I get the same brutal performance hit, so it is definitely a multibyte-specific problem. WRT the distribution of the data in the table, I used the following: All g-words in /usr/share/dict with different processes

2003-02-04 Thread Tom Lane
wade [EMAIL PROTECTED] writes: I redid my trials with the same data set on 7.2.3 --with-multibyte and I get the same brutal performance hit, so it is definitely a multibyte-specific problem. There are only about 1000 words that appear more than once (2 or 3 times) in 27k rows. Right, so

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 11:24, wade wrote: I redid my trials with the same data set on 7.2.3 --with-multibyte and I get the same brutal performance hit, so it is definitely a multibyte-specific problem. Given that this problem isn't a regression, I don't think we need to delay 7.3.2 to fix it

2003-02-04 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: Given that this problem isn't a regression, I don't think we need to delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential, IMHO). No, I've had to abandon my original thought that it was a localized bug, so it's not going to be fixed in

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 11:59, Tom Lane wrote: I'm about to go off and look at whether we can absorb the Tcl regex package, which is Spencer's new baby. That will not be a solution for 7.3.anything, but it could be an answer for 7.4. Sounds like we had about the same idea at about the same time

2003-02-04 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: Sounds like we had about the same idea at about the same time -- I emailed Henry Spencer inquiring about the new RE engine last night. I just did that this morning ;-) ... but more as politeness than anything else. AFAICT from searching the net, packaging

2003-02-04 Thread Jon Jensen
On Tue, 4 Feb 2003, Neil Conway wrote: Spencer's implementation is outperformed by some other RE engines, notably PCRE (www.pcre.org). But switching to another engine might impose backward-compatibility problems, in terms of the details of the RE syntax. It would be a delight to be able to

2003-02-04 Thread Tom Lane
Jon Jensen [EMAIL PROTECTED] writes: It would be a delight to be able to use more advanced (IMHO) Perl-compatible regexes in PostgreSQL. After some further research, pcre does seem like an interesting alternative. Both pcre and Spencer's new code have essentially Berkeley-style licenses, so

2003-02-04 Thread Hannu Krosing
On Tue, 2003-02-04 at 16:59, Tom Lane wrote: Neil Conway [EMAIL PROTECTED] writes: Given that this problem isn't a regression, I don't think we need to delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential, IMHO). No, I've had to abandon my original thought that it was a

2003-02-04 Thread Sean Chittenden
It would be a delight to be able to use more advanced (IMHO) Perl-compatible regexes in PostgreSQL. After some further research, pcre does seem like an interesting alternative. Both pcre and Spencer's new code have essentially Berkeley-style licenses, so there's no problem there. Some

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 13:21, Tom Lane wrote: After some further research, pcre does seem like an interesting alternative. Both pcre and Spencer's new code have essentially Berkeley-style licenses, so there's no problem there. Keep in mind that pcre has an advertising clause in its license

2003-02-04 Thread Tom Lane
Proof of concept: PG 7.3 using regression database:
regression=# select count(*) from tenk1 where 'quotidian' ~ string4;
 count
-------
     0
(1 row)
Time: 676.14 ms
regression=# select count(*) from tenk1 where 'quotidian' ~ stringu1;
 count
-------
     0
(1 row)
Time: 3426.96 ms

2003-02-04 Thread Neil Conway
On Tue, 2003-02-04 at 17:26, Tom Lane wrote: Proof of concept: [...] Very cool work, Tom. In the first case there are only four distinct patterns used, so we're running with cached precompiled regexes. In the other cases a new regex compilation must occur at each row. Speaking of which,
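The difference Neil points out, a handful of distinct patterns that stay cached versus a fresh compile for every row, can be sketched with a small fixed-size cache. This is a minimal Python illustration under stated assumptions, not PostgreSQL's regex cache: the class `RegexCache`, the helper `count_matches`, the 32-entry default and the sample patterns are all hypothetical, and Python's `re.compile` stands in for the backend's regex compiler.

```python
import re
from collections import OrderedDict


class RegexCache:
    """Hypothetical fixed-size cache of compiled regexes with LRU eviction."""

    def __init__(self, capacity=32):  # 32 is an assumed size, not taken from the backend
        self.capacity = capacity
        self._entries = OrderedDict()  # pattern text -> compiled regex, oldest first
        self.hits = 0
        self.misses = 0

    def get(self, pattern):
        compiled = self._entries.get(pattern)
        if compiled is not None:
            # Hit: mark the pattern as most recently used.
            self._entries.move_to_end(pattern)
            self.hits += 1
            return compiled
        # Miss: compile, store as most recent, evict the oldest entry if over capacity.
        self.misses += 1
        compiled = re.compile(pattern)
        self._entries[pattern] = compiled
        if len(self._entries) > self.capacity:
            self._entries.popitem(last=False)
        return compiled


def count_matches(cache, text, patterns):
    """Mimic `WHERE 'text' ~ column`: the constant is matched against each row's pattern."""
    return sum(1 for p in patterns if cache.get(p).search(text))


if __name__ == "__main__":
    cache = RegexCache()
    rows = ["AAAA", "HHHH", "OOOO", "VVVV"] * 2500  # 10000 rows, 4 distinct patterns
    print(count_matches(cache, "quotidian", rows), cache.hits, cache.misses)
    # prints: 0 9996 4
```

With only four distinct patterns, every row after the first four is a cache hit; if each row carried its own pattern, `misses` would instead grow with the row count, which is the compile-per-row case.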

2003-02-04 Thread Tom Lane
Neil Conway [EMAIL PROTECTED] writes: Speaking of which, is there (or should there be) some mechanism for increasing the size of the compiled pattern cache? Perhaps a GUC var? I thought about that while I was messing with the code, but I don't think there's much point in it, unless someone

2003-02-04 Thread Tatsuo Ishii
Ok. The original complaint can be easily solved, at least for single-byte encoding databases. With the small patches (against 7.3.1) included, I got the following result.
test1: select count(*) from tenk1 where 'quotidian' ~ string4;
 count
-------
     0
(1 row)
Time: 113.81 ms
test2: select

2003-02-04 Thread Hannu Krosing
On Tue, 2003-02-04 at 18:21, Tom Lane wrote: 4. pcre looks like it's probably *not* as well suited to a multibyte environment. In particular, I doubt that its UTF8 compile option was even turned on for the performance comparison Neil cited --- and the man page only promises experimental,

2003-02-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: If we are going into the code-lifting business, we should also consider Python's sre What advantages does it have to make it worth considering? regards, tom lane

2003-02-04 Thread Hannu Krosing
Tom Lane wrote on Tue, 04.02.2003 at 21:18: Hannu Krosing [EMAIL PROTECTED] writes: If we are going into the code-lifting business, we should also consider Python's sre What advantages does it have to make it worth considering? Should be the same as pcre + support for wide chars. -- Hannu

2003-02-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: Tom Lane wrote on Tue, 04.02.2003 at 21:18: What advantages does it have to make it worth considering? Should be the same as pcre + support for wide chars. Well, if someone wants to do the legwork to try it, that interface should work just about

2003-02-04 Thread Hannu Krosing
Tom Lane wrote on Wed, 05.02.2003 at 01:35: Neil Conway [EMAIL PROTECTED] writes: Speaking of which, is there (or should there be) some mechanism for increasing the size of the compiled pattern cache? Perhaps a GUC var? I thought about that while I was messing with the code, but I don't

2003-02-04 Thread Tom Lane
Hannu Krosing [EMAIL PROTECTED] writes: Another idea is to make a special regex type and store the regexes pre-parsed (i.e. in some fast-load form)? Seems unlikely that going out to disk could beat just recompiling the regexp. They're not *that* slow to compile ... at least not when we avoid

2003-02-03 Thread wade
At 08:31 PM 2/1/03 +0800, Christopher Kings-Lynne wrote: Why on earth are you using a CVS version!?!?!?! Chris This problem manifests itself under 7.3.1 release as well. CVS is used so we can access patches to the SRF stuff implemented after 7.3.1 was released. Tom... any links that document

2003-02-03 Thread wade
At 10:52 PM 1/31/03 -0500, Tom Lane wrote: wade [EMAIL PROTECTED] writes: We recently upgraded a project from 7.2 to 7.3.1 to make use of some of the cool new features in 7.3. The installed version is CVS stable from yesterday. However, we noticed a major performance hit in POSIX regular

2003-02-03 Thread Tom Lane
wade [EMAIL PROTECTED] writes: Here is the profile information. I included a log of the session that generated it at the top of the gprof output. If there is any other info I can help you with, please let me know. A four-second test isn't long enough to gather any statistically meaningful

2003-02-03 Thread wade
At 05:51 PM 2/3/03 -0500, Tom Lane wrote: wade [EMAIL PROTECTED] writes: Here is the profile information. I included a log of the session that generated it at the top of the gprof output. If there is any other info I can help you with, please let me know. A four-second test isn't long

2003-02-03 Thread Tom Lane
Sigh. It seems that somebody broke caching of compiled regexes, so that your regex is recompiled each time it's used. I haven't dug into the logic yet, but I think it must have been a mistake in Thomas' change to make the regex cache be searched circularly: 2002-06-14 22:49 thomas *

2003-02-03 Thread wade
Well, IMHO I would rather see a delay of the roll-out by a day or two than see a release with such a serious performance glitch. Especially since I personally have been shooting my big mouth off to all my geek friends on the leaps and bounds PG has made in the last few releases. With my luck

2003-02-03 Thread Tom Lane
Wade, how many distinct patterns do you have in that table? What's the population distribution (in particular, do the top 32 patterns account for most of the table)? It's looking like the issue is not so much that the 7.3 code is completely broken, as that its LRU replacement policy for
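Tom's question about whether the top 32 patterns account for most of the table matters because of how LRU eviction behaves once the working set exceeds the cache. The hypothetical Python simulation below models a 32-slot LRU cache; the function names and all three workloads are illustrative assumptions, not measurements from the thread. Cycling through 32 patterns hits almost every time, cycling through 33 never hits, and a skewed workload dominated by a few hot patterns still hits about 90% of the time.

```python
from collections import OrderedDict


def lru_hit_rate(accesses, capacity):
    """Replay a sequence of pattern lookups through an LRU cache; return the hit fraction."""
    cache = OrderedDict()
    hits = 0
    for key in accesses:
        if key in cache:
            cache.move_to_end(key)  # refresh recency on a hit
            hits += 1
        else:
            cache[key] = True
            if len(cache) > capacity:
                cache.popitem(last=False)  # evict the least recently used key
    return hits / len(accesses)


def cyclic(n_patterns, rounds):
    """Hypothetical workload: visit n_patterns distinct patterns in a fixed cycle."""
    return [f"p{i}" for _ in range(rounds) for i in range(n_patterns)]


def skewed(n_rows):
    """Hypothetical workload: 10 hot patterns, plus one never-repeated pattern every 10 rows."""
    accesses = []
    for i in range(n_rows):
        accesses.append(f"hot{i % 10}")
        if i % 10 == 9:
            accesses.append(f"cold{i}")
    return accesses


if __name__ == "__main__":
    print(lru_hit_rate(cyclic(32, 100), 32))  # 0.99: only the first pass misses
    print(lru_hit_rate(cyclic(33, 100), 32))  # 0.0: each miss evicts the next pattern needed
    print(lru_hit_rate(skewed(1000), 32))     # 0.9: hot patterns stay resident
```

Under this model, a table where nearly every row carries a distinct pattern, like the 27k-row data set Wade describes, gets almost no benefit from a small LRU cache, so the per-compile cost dominates.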

2003-02-03 Thread Tom Lane
Next question: may I guess that you weren't using MULTIBYTE in 7.2? After still more digging, I'm coming round to the opinion that the problem is that MULTIBYTE is forced on in 7.3, and this imposes a factor-of-256 overhead in a bunch of the operations in regcomp.c. In particular, compiling a
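As a toy model of how a factor-of-256 overhead can arise, assume (hypothetically) that some compile steps visit every character code once, for example when building a membership table for a bracket expression. If a single-byte build works over 256 codes and a multibyte build over an assumed 65536, each such pass does 256 times as much work. The function `bracket_table_work` and both alphabet sizes are illustrative assumptions, not code from regcomp.c.

```python
def bracket_table_work(member_chars, alphabet_size):
    """Toy model of one compile pass: build a membership table for a bracket
    expression by visiting every character code once. Returns (table, steps)."""
    members = set(member_chars)
    table = [False] * alphabet_size
    steps = 0
    for code in range(alphabet_size):
        table[code] = chr(code) in members
        steps += 1
    return table, steps


SINGLE_BYTE = 256  # codes in a single-byte character set
WIDE = 65536       # assumed size of a 16-bit "wide" character universe

if __name__ == "__main__":
    _, narrow_steps = bracket_table_work("aeiou", SINGLE_BYTE)
    _, wide_steps = bracket_table_work("aeiou", WIDE)
    print(narrow_steps, wide_steps, wide_steps // narrow_steps)  # 256 65536 256
```

The table's contents are identical for ASCII members either way; only the cost of building it scales with the size of the character universe.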

2003-02-01 Thread Tom Lane
Christopher Kings-Lynne [EMAIL PROTECTED] writes: Why on earth are you using a CVS version!?!?!?! I assume he meant tip of REL7_3 branch --- which is a perfectly reasonable thing to install, even if there are still a few fixes to go before we call it 7.3.2. regards, tom

2003-01-31 Thread Tom Lane
wade [EMAIL PROTECTED] writes: We recently upgraded a project from 7.2 to 7.3.1 to make use of some of the cool new features in 7.3. The installed version is CVS stable from yesterday. However, we noticed a major performance hit in POSIX regular expression matches against columns using the