On 9/21/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
It applied cleanly for me.
Yes, it seems something was screwed in my tree. I didn't notice you
commited the patch I applied before Greg's patch.
Anyway, I'm starting with a clean tree containing your fix and what
Tom commited but I have to
On 9/22/07, Guillaume Smet [EMAIL PROTECTED] wrote:
Anyway, I'm starting with a clean tree containing your fix and what
Tom commited but I have to import the data again due to the catalog
version bump :).
I have some good news. After Andrew's and Greg's patches, CVS HEAD is
as fast as 8.2 with
Andrew,
On 9/20/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
Please try the attached patch, which goes back to using a special case
for single-byte ILIKE. I want to make sure that at the very least we
don't cause a performance regression with the code done this release. I
can't see an obvious
Guillaume Smet [EMAIL PROTECTED] wrote:
It's better but still slower than 8.2.
It probablly comes from 'var-varlena' feature in 8.3. Now we store
text fields in a compact format on disks and extract them on access.
It consumes some CPU cycles. If all of data are in buffer cache
and the
ITAGAKI Takahiro [EMAIL PROTECTED] writes:
Guillaume Smet [EMAIL PROTECTED] wrote:
It's better but still slower than 8.2.
It probablly comes from 'var-varlena' feature in 8.3. Now we store
text fields in a compact format on disks and extract them on access.
It consumes some CPU cycles. If
Gregory,
On 9/21/07, Gregory Stark [EMAIL PROTECTED] wrote:
Hm, it does seem I missed like.c when I converted all the text operators to
avoid detoasting packed varlenas. I'll send a patch in a few minutes to do
that. I'm surprised it would have such a large effect though.
The patch doesn't
Guillaume Smet [EMAIL PROTECTED] writes:
Gregory,
On 9/21/07, Gregory Stark [EMAIL PROTECTED] wrote:
Hm, it does seem I missed like.c when I converted all the text operators to
avoid detoasting packed varlenas. I'll send a patch in a few minutes to do
that. I'm surprised it would have such
Guillaume Smet wrote:
Gregory,
On 9/21/07, Gregory Stark [EMAIL PROTECTED] wrote:
Hm, it does seem I missed like.c when I converted all the text operators to
avoid detoasting packed varlenas. I'll send a patch in a few minutes to do
that. I'm surprised it would have such a large effect
On 9/20/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
Can you retry both sets of tests but this time in C locale? The lower()
code works differently in C locale, and it might be that we need to look
at tweaking just one case.
Here we go with SQL_ASCII:
** 8.1 **
cityvox_c=# SELECT e.numeve
Guillaume Smet wrote:
app_hls
On 9/20/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
Can you retry both sets of tests but this time in C locale? The lower()
code works differently in C locale, and it might be that we need to look
at tweaking just one case.
Please try the
I wrote:
I can't see an obvious way around the problem for multi-byte case -
lower() then requires converting to and from wchar, and I don't see a
way of avoiding calling lower().
There is one way we could reduce the use of lower() by up to (almost)
50% in the common case where the
Andrew, All,
On 5/22/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
But before I commit this I'd appreciate seeing some more testing, both
for correctness and performance.
I finally found some time to test this patch on our data. As our
production database is still using 8.1, I made my tests
Guillaume Smet wrote:
Andrew, All,
On 5/22/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
But before I commit this I'd appreciate seeing some more testing, both
for correctness and performance.
I finally found some time to test this patch on our data. As our
production
On 9/19/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
It's at least good to see that the LIKE case has some useful speedup in
8.3.
It can be due to your patch or to the varlena header patch. Seqscan is
a bit faster too.
Can you run the same set of tests in a single byte encoding like latin1?
On 9/19/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
Can you run the same set of tests in a single byte encoding like latin1?
Here are the results (each query was executed several times before this result):
** 8.1 **
cityvox_latin1=# SELECT e.numeve FROM evenement e WHERE e.libgeseve
ILIKE
Guillaume Smet wrote:
On 9/19/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
Can you run the same set of tests in a single byte encoding like latin1?
Here are the results (each query was executed several times before this result):
** 8.1 **
cityvox_latin1=# SELECT e.numeve FROM
However, I have just about convinced myself that we don't need
IsFirstByte for matching _ for UTF8, either preceded by % or
not,
as it should always be true. Can anyone come up with a counter
example?
You have to be on a first byte before you can meaningfully
apply NextChar, and you
[EMAIL PROTECTED] wrote:
Is it worth the effort to pre-process the pattern?
For example:
%% - %
This is already done, required by spec.
%_ - _%
If applied recursively, this would automatically cover:
%_% - _%
_%_ - __%
The 'benefit' would be that the pattern
Zeugswetter Andreas ADI SD wrote:
You have to be on a first byte before you can meaningfully
apply NextChar, and you have to use NextChar or else you
don't count characters correctly (eg __ must match 2 chars
not 2 bytes).
Well, for utf8 NextChar could advance to the next char even
Zeugswetter Andreas ADI SD [EMAIL PROTECTED] writes:
You have to be on a first byte before you can meaningfully
apply NextChar, and you have to use NextChar or else you
don't count characters correctly (eg __ must match 2 chars
not 2 bytes).
Well, for utf8 NextChar could advance to the
Andrew Dunstan [EMAIL PROTECTED] writes:
do { (t)++; (tlen)--} while ((*(t) 0xC0) == 0x80 tlen 0)
The while *must* test those two conditions in the other order.
(Don't laugh --- we've had reproducible bugs before in which the backend
dumped core because of running off the end of memory due
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
We should only be able to get out of step from the %_ case, I believe,
so we should only need to do the first-byte test in that case (which is
in a different code path from the normal _ case. Does that seem right?
At least
Andrew Dunstan [EMAIL PROTECTED] writes:
OK, Here is a patch that I am fairly confident does what's been
discussed, as summarised by Tom.
! #define CHAREQ(p1, p2) (*p1 == *p2)
...
+ #define IsFirstByte(c) ((*c 0xC0) != 0x80)
These macros are bugs waiting to happen. Please parenthesize
Tom Lane wrote:
I'm not sure I believe the new coding for %-matching at all, and I
certainly don't like the 100% lack of comments explaining why the
different cases are necessary and just how they differ. In particular,
once we've advanced more than one character, why does it still matter
Tom Lane wrote:
There should somewhere be a block comment explaining all the reasoning
we've so painfully gone through about why the three cases (SB, MB, UTF8)
are needed and how they must differ.
I'm working on a detailed description/rationale.
However, I have
Andrew Dunstan [EMAIL PROTECTED] writes:
However, I have just about convinced myself that we don't need
IsFirstByte for matching _ for UTF8, either preceded by % or not, as
it should always be true. Can anyone come up with a counter example?
You have to be on a first byte before you can
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
However, I have just about convinced myself that we don't need
IsFirstByte for matching _ for UTF8, either preceded by % or not, as
it should always be true. Can anyone come up with a counter example?
You have to be on a
Andrew Dunstan [EMAIL PROTECTED] writes:
Tom Lane wrote:
You have to be on a first byte before you can meaningfully apply
NextChar, and you have to use NextChar or else you don't count
characters correctly (eg __ must match 2 chars not 2 bytes).
Yes, I agree completely. However it looks to
I wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
Yes, I agree completely. However it looks to me like IsFirstByte will in
fact always be true when we get to call NextChar for matching _ for UTF8.
If that's true, the patch is failing to achieve its goal of treating %
bytewise ...
OK, I
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
Tom Lane wrote:
You have to be on a first byte before you can meaningfully apply
NextChar, and you have to use NextChar or else you don't count
characters correctly (eg __ must match 2 chars not 2 bytes).
Yes, I
Tom Lane wrote:
OK, I studied it a bit more and now see what you're driving at: in this
form of the patch, we treat % bytewise unless it is followed by _, in
which case we treat it char-wise. That seems a good tradeoff,
considering that such a pattern is probably pretty uncommon --- we
should
On Thu, May 24, 2007 at 11:20:51PM -0400, Tom Lane wrote:
I wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
Yes, I agree completely. However it looks to me like IsFirstByte will in
fact always be true when we get to call NextChar for matching _ for UTF8.
If that's true, the patch is
And Dennis said:
It is only when you have a pattern like '%_' when this is a problem
and we could detect this and do byte by byte when it's not. Now we
check (*p == '\\') || (*p == '_') in each iteration when we scan over
characters for '%', and we could do it once and have different loops
Tom Lane wrote:
3. UTF8: % can advance bytewise. _ must check it is on a first byte
(else return match failure) and if so do NextChar. So primitives
are NextChar, NextByte, ByteEq, IsFirstByte.
We should only be able to get out of step from the %_ case, I believe,
so we should only
Andrew Dunstan [EMAIL PROTECTED] writes:
We should only be able to get out of step from the %_ case, I believe,
so we should only need to do the first-byte test in that case (which is
in a different code path from the normal _ case. Does that seem right?
At least put Assert(IsFirstByte()) in
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
We should only be able to get out of step from the %_ case, I believe,
so we should only need to do the first-byte test in that case (which is
in a different code path from the normal _ case. Does that seem right?
At least
Andrew Dunstan wrote:
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
We should only be able to get out of step from the %_ case, I
believe, so we should only need to do the first-byte test in that
case (which is in a different code path from the normal _ case.
Does that seem
Andrew Dunstan [EMAIL PROTECTED] writes:
I am also wondering if it might be sensible to make this choice once at
backend startup and store a function pointer, instead of doing it for
every string processed by like/ilike:
if (pg_database_encoding_max_length() == 1)
return
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
We should only be able to get out of step from the %_ case, I believe,
so we should only need to do the first-byte test in that case (which is
in a different code path from the normal _ case. Does that seem right?
At least
Andrew Dunstan [EMAIL PROTECTED] writes:
... It turns out (according to the analysis) that the
only time we actually need to use NextChar is when we are matching an
_ in a like/ilike pattern.
I thought we'd determined that advancing bytewise for % was also risky,
in two cases:
1. Multibyte
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
... It turns out (according to the analysis) that the
only time we actually need to use NextChar is when we are matching an
_ in a like/ilike pattern.
I thought we'd determined that advancing bytewise for % was also risky,
in
Andrew Dunstan wrote:
Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
... It turns out (according to the analysis) that the only time we
actually need to use NextChar is when we are matching an _ in a
like/ilike pattern.
I thought we'd determined that advancing bytewise
Andrew Dunstan [EMAIL PROTECTED] writes:
Tom Lane wrote:
I thought we'd determined that advancing bytewise for % was also
risky, in two cases:
1. Multibyte character set that is not UTF8 (more specifically, does not
have a guarantee that first bytes and not-first bytes are distinct)
I
On 5/22/07, Andrew Dunstan [EMAIL PROTECTED] wrote:
But before I commit this I'd appreciate seeing some more testing, both
for correctness and performance.
Any chance the patch applies cleanly on a 8.2 code base? I can test it
on a real life 8.2 db but I won't have the time to load the data in
On 2007-05-22, Tom Lane [EMAIL PROTECTED] wrote:
If % advances by bytes then this will find a spurious match. The
only thing that prevents it is if B can't be both a leading and a
trailing byte of validly-encoded MB characters.
Which is (by design) true in UTF8, but is not true of most other
Andrew - Supernews [EMAIL PROTECTED] writes:
On 2007-05-22, Tom Lane [EMAIL PROTECTED] wrote:
If % advances by bytes then this will find a spurious match. The
only thing that prevents it is if B can't be both a leading and a
trailing byte of validly-encoded MB characters.
Which is (by
On Tue, May 22, 2007 at 12:12:51PM -0400, Tom Lane wrote:
Andrew Dunstan [EMAIL PROTECTED] writes:
... It turns out (according to the analysis) that the
only time we actually need to use NextChar is when we are matching an
_ in a like/ilike pattern.
I thought we'd determined that
Tom Lane wrote:
Yeah. It seems we need three comparison functions after all:
Yeah, that was my confusion. I thought we had concluded that we didn't,
but clearly we do.
1. Single-byte character set: needs NextByte and ByteEq only.
2. Generic multi-byte character set: both % and _
48 matches
Mail list logo