Can this improvement get merged up into CVS current, or did you already
do that Tom?
---
Tatsuo Ishii wrote:
Nice work, Tatsuo! Wade, can you confirm that this patch solves your
problem?
Tatsuo, please commit into
Bruce Momjian [EMAIL PROTECTED] writes:
Can this improvement get merged up into CVS current, or did you already
do that Tom?
It's irrelevant to current.
regards, tom lane
---(end of broadcast)---
TIP 2: you can get off
Tom Lane wrote on Wed, 05.02.2003 at 08:12:
Hannu Krosing [EMAIL PROTECTED] writes:
Another idea is to make special regex type and store the regexes
pre-parsed (i.e. in some fast-load form) ?
Seems unlikely that going out to disk could beat just recompiling the
regexp.
We have to get
Confirmed. Looks like a 100-fold speedup. Thanks, guys.
Explain output can be seen here:
http://arch.wavefire.com/pgregex.txt
-Wade Klaver
At 09:59 AM 2/5/03 -0500, Tom Lane wrote:
Tatsuo Ishii [EMAIL PROTECTED] writes:
Ok. The original complaint can be easily solved, at least for
single-byte
Nice work, Tatsuo! Wade, can you confirm that this patch solves your
problem?
Tatsuo, please commit into REL7_3 branch only --- I'm nearly ready to do
a wholesale replacement of the regex code in HEAD, so you wouldn't
accomplish much except to create a merge problem for me ...
Ok. I have
OK,
I redid my trials with the same data set on 7.2.3 --with-multibyte and I
get the same brutal performance hit, so it is definitely a
multibyte-specific problem.
WRT the distribution of the data in the table, I used the following:
All g-words in /usr/share/dict with different processes
wade [EMAIL PROTECTED] writes:
I redid my trials with the same data set on 7.2.3 --with-multibyte and I
get the same brutal performance hit, so it is definitely a
multibyte-specific problem.
There are only about 1000 words that appear more than once (2 or 3 times)
in 27k rows.
Right, so
On Tue, 2003-02-04 at 11:24, wade wrote:
I redid my trials with the same data set on 7.2.3 --with-multibyte and I
get the same brutal performance hit, so it is definitely a
multibyte-specific problem.
Given that this problem isn't a regression, I don't think we need to
delay 7.3.2 to fix it
Neil Conway [EMAIL PROTECTED] writes:
Given that this problem isn't a regression, I don't think we need to
delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential,
IMHO).
No, I've had to abandon my original thought that it was a localized bug,
so it's not going to be fixed in
On Tue, 2003-02-04 at 11:59, Tom Lane wrote:
I'm about to go off and look at whether we can absorb the Tcl regex
package, which is Spencer's new baby. That will not be a solution for
7.3.anything, but it could be an answer for 7.4.
Sounds like we had about the same idea at about the same time
Neil Conway [EMAIL PROTECTED] writes:
Sounds like we had about the same idea at about the same time -- I
emailed Henry Spencer inquiring about the new RE engine last night.
I just did that this morning ;-) ... but more as politeness than
anything else. AFAICT from searching the net, packaging
On Tue, 4 Feb 2003, Neil Conway wrote:
Spencer's implementation is outperformed by some other RE engines,
notably PCRE (www.pcre.org). But switching to another engine might
impose backward-compatibility problems, in terms of the details of the
RE syntax.
It would be a delight to be able to
Jon Jensen [EMAIL PROTECTED] writes:
It would be a delight to be able to use more advanced (IMHO) Perl-
compatible regexes in PostgreSQL.
After some further research, pcre does seem like an interesting
alternative. Both pcre and Spencer's new code have essentially
Berkeley-style licenses, so
On Tue, 2003-02-04 at 16:59, Tom Lane wrote:
Neil Conway [EMAIL PROTECTED] writes:
Given that this problem isn't a regression, I don't think we need to
delay 7.3.2 to fix it (of course, a fix for 7.3.3 and 7.4 is essential,
IMHO).
No, I've had to abandon my original thought that it was a
It would be a delight to be able to use more advanced (IMHO) Perl-
compatible regexes in PostgreSQL.
After some further research, pcre does seem like an interesting
alternative. Both pcre and Spencer's new code have essentially
Berkeley-style licenses, so there's no problem there. Some
On Tue, 2003-02-04 at 13:21, Tom Lane wrote:
After some further research, pcre does seem like an interesting
alternative. Both pcre and Spencer's new code have essentially
Berkeley-style licenses, so there's no problem there.
Keep in mind that pcre has an advertising clause in its license
Proof of concept:
PG 7.3 using regression database:
regression=# select count(*) from tenk1 where 'quotidian' ~ string4;
 count
-------
     0
(1 row)
Time: 676.14 ms
regression=# select count(*) from tenk1 where 'quotidian' ~ stringu1;
 count
-------
     0
(1 row)
Time: 3426.96 ms
On Tue, 2003-02-04 at 17:26, Tom Lane wrote:
Proof of concept:
[...]
Very cool work, Tom.
In the first case there are only four distinct patterns used, so we're
running with cached precompiled regexes. In the other cases a new regex
compilation must occur at each row.
Speaking of which,
Neil Conway [EMAIL PROTECTED] writes:
Speaking of which, is there (or should there be) some mechanism for
increasing the size of the compiled pattern cache? Perhaps a GUC var?
I thought about that while I was messing with the code, but I don't
think there's much point in it, unless someone
Ok. The original complaint can be easily solved, at least for
single-byte encoding databases. With the small patches (against 7.3.1)
included, I got the following result.
test1:
select count(*) from tenk1 where 'quotidian' ~ string4;
 count
-------
     0
(1 row)
Time: 113.81 ms
test2:
select
On Tue, 2003-02-04 at 18:21, Tom Lane wrote:
4. pcre looks like it's probably *not* as well suited to a multibyte
environment. In particular, I doubt that its UTF8 compile option was
even turned on for the performance comparison Neil cited --- and the man
page only promises experimental,
Hannu Krosing [EMAIL PROTECTED] writes:
If we are going into the code-lifting business, we should also consider
Python's sre
What advantages does it have to make it worth considering?
regards, tom lane
Tom Lane wrote on Tue, 04.02.2003 at 21:18:
Hannu Krosing [EMAIL PROTECTED] writes:
If we are going into the code-lifting business, we should also consider
Python's sre
What advantages does it have to make it worth considering?
Should be the same as pcre + support for wide chars.
--
Hannu
Hannu Krosing [EMAIL PROTECTED] writes:
Tom Lane wrote on Tue, 04.02.2003 at 21:18:
What advantages does it have to make it worth considering?
Should be the same as pcre + support for wide chars.
Well, if someone wants to do the legwork to try it, that interface
should work just about
Tom Lane wrote on Wed, 05.02.2003 at 01:35:
Neil Conway [EMAIL PROTECTED] writes:
Speaking of which, is there (or should there be) some mechanism for
increasing the size of the compiled pattern cache? Perhaps a GUC var?
I thought about that while I was messing with the code, but I don't
Hannu Krosing [EMAIL PROTECTED] writes:
Another idea is to make special regex type and store the regexes
pre-parsed (i.e. in some fast-load form) ?
Seems unlikely that going out to disk could beat just recompiling the
regexp. They're not *that* slow to compile ... at least not when we
avoid
At 08:31 PM 2/1/03 +0800, Christopher Kings-Lynne wrote:
Why on earth are you using a CVS version!?!?!?!
Chris
This problem manifests itself under 7.3.1 release as well. CVS is used so
we can access patches to the SRF stuff implemented after 7.3.1 was released.
Tom... any links that document
At 10:52 PM 1/31/03 -0500, Tom Lane wrote:
wade [EMAIL PROTECTED] writes:
We recently upgraded a project from 7.2 to 7.3.1 to make use of some of
the cool new features in 7.3. The installed version is CVS stable from
yesterday. However, we noticed a major performance hit in POSIX regular
wade [EMAIL PROTECTED] writes:
Here is the profile information. I included a log of the session that
generated it at the top of the gprof output. If there is any other info I
can help you with, please let me know.
A four-second test isn't long enough to gather any statistically
meaningful
At 05:51 PM 2/3/03 -0500, Tom Lane wrote:
wade [EMAIL PROTECTED] writes:
Here is the profile information. I included a log of the session that
generated it at the top of the gprof output. If there is any other info I
can help you with, please let me know.
A four-second test isn't long
Sigh. It seems that somebody broke caching of compiled regexes,
so that your regex is recompiled each time it's used. I haven't
dug into the logic yet, but I think it must have been a mistake
in Thomas' change to make the regex cache be searched circularly:
2002-06-14 22:49 thomas
*
Well, IMHO I would rather see a delay of the roll-out by a day or two
than see a release with such a serious performance glitch. Especially
since I personally have been shooting my big mouth off to all my geek
friends on the leaps and bounds PG has made in the last few releases. With
my luck
Wade, how many distinct patterns do you have in that table? What's the
population distribution (in particular, do the top 32 patterns account
for most of the table)?
It's looking like the issue is not so much that the 7.3 code is
completely broken, as that its LRU replacement policy for
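For readers following the cache discussion, here is a rough sketch of why the number of distinct patterns matters so much. It is not the PostgreSQL code (which is C); the class and function names are made up for illustration, and the capacity of 32 is only assumed from the "top 32 patterns" question above. It counts how often a small LRU cache of compiled regexes has to recompile.

```python
import re
from collections import OrderedDict


class PatternCache:
    """Hypothetical LRU cache of compiled regexes (illustration only)."""

    def __init__(self, capacity=32):  # 32 is an assumption, see lead-in
        self.capacity = capacity
        self.entries = OrderedDict()  # pattern text -> compiled regex
        self.compiles = 0             # number of cache misses so far

    def get(self, pattern):
        compiled = self.entries.get(pattern)
        if compiled is not None:
            # Cache hit: mark this entry as most recently used.
            self.entries.move_to_end(pattern)
            return compiled
        # Cache miss: compile, remember, and evict the least recently used.
        compiled = re.compile(pattern)
        self.compiles += 1
        self.entries[pattern] = compiled
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)
        return compiled


def count_compiles(patterns, capacity=32):
    """Match a fixed string against each pattern; return compile count."""
    cache = PatternCache(capacity)
    for pattern in patterns:
        cache.get(pattern).search("quotidian")
    return cache.compiles


# Four distinct patterns over 10000 rows: everything stays cached.
few_distinct = ["a", "b", "c", "d"] * 2500
# Every row has its own pattern: one compilation per row.
all_distinct = ["w%d" % i for i in range(10000)]
# 33 patterns visited round-robin: one more than the cache can hold.
round_robin = ["w%d" % i for i in range(33)] * 10

if __name__ == "__main__":
    print(count_compiles(few_distinct))
    print(count_compiles(all_distinct))
    print(count_compiles(round_robin))
```

With only a handful of distinct patterns the compile cost is paid once per pattern. Once the working set is even slightly larger than the cache, an LRU policy can miss on every row, so the per-compile cost dominates the query time.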
Next question: may I guess that you weren't using MULTIBYTE in 7.2?
After still more digging, I'm coming round to the opinion that the
problem is that MULTIBYTE is forced on in 7.3, and this imposes a
factor-of-256 overhead in a bunch of the operations in regcomp.c.
In particular, compiling a
Christopher Kings-Lynne [EMAIL PROTECTED] writes:
Why on earth are you using a CVS version!?!?!?!
I assume he meant tip of REL7_3 branch --- which is a perfectly
reasonable thing to install, even if there are still a few fixes
to go before we call it 7.3.2.
regards, tom
wade [EMAIL PROTECTED] writes:
We recently upgraded a project from 7.2 to 7.3.1 to make use of some of
the cool new features in 7.3. The installed version is CVS stable from
yesterday. However, we noticed a major performance hit in POSIX regular
expression matches against columns using the