Peter Geoghegan pe...@2ndquadrant.com writes:
On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote:
Gentlemen, you can't fight here. This is the War Room.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
Do you think you can follow through on this soon, Robert? I don't
believe that there are any outstanding issues. I'm not going to make
an issue of the fact that strxfrm() hasn't been taken advantage of. If
you could please post a new revision, with the suggested alterations
(that you agree with),
On Mon, Oct 8, 2012 at 8:47 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Do you think you can follow through on this soon, Robert? I don't
believe that there are any outstanding issues. I'm not going to make
an issue of the fact that strxfrm() hasn't been taken advantage of. If
you could
On 8 October 2012 16:30, Robert Haas robertmh...@gmail.com wrote:
I don't have any plans to work on this further. I think all of the
criticism that has been leveled at this patch is 100% bogus, and I
greatly dislike the changes that have been proposed. That may not be
fair, but it's how I
On Mon, Oct 8, 2012 at 12:26 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
If it was the case that you were only 50% of the way to getting
something committable, I guess I'd understand; this is, after all, a
volunteer effort, and you are of course free to pursue or not pursue
whatever you
On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote:
Hey, if me deciding I don't want to work on a patch any more is going
to make you feel slighted, then you're out of luck. The archives are
littered with people who have decided to stop working on things
because the consensus
On Fri, Jun 15, 2012 at 6:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
I would be concerned about this if it were per-sort-tuple wastage, but
what I understood to be happening was that this was a single instance of
an expansible buffer (per sort, perhaps, but still just one buffer).
And, as you
On 23 July 2012 16:09, Robert Haas robertmh...@gmail.com wrote:
However, what this really boils down to is that you and Peter don't
like this line of code:
+ tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1);
I can only speak for myself, though I agree with your summary here.
What
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 23 July 2012 16:09, Robert Haas robertmh...@gmail.com wrote:
However, what this really boils down to is that you and Peter don't
like this line of code:
+ tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1);
On 23 July 2012 16:36, Robert Haas robertmh...@gmail.com wrote:
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com
wrote:
tss->buflen = 1 << ffs(len1);
I'm sorry, I don't follow you here. What is ffs() ?
Sorry, fls, not ffs. I always get those mixed up.
See
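The two sizing policies debated above for the reusable comparison buffer can be compared side by side. This is a minimal standalone sketch, not the patch's code: `SKETCH_TEXTBUFLEN`, `grow_aligned`, `grow_pow2` and `sketch_fls` are hypothetical names, the 1024-byte value is taken from the thread's description of TEXTBUFLEN, and `sketch_fls` is a portable stand-in for `fls()`, which not every C library provides.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value: the thread describes TEXTBUFLEN as 1024 bytes. */
#define SKETCH_TEXTBUFLEN ((size_t) 1024)

/* Policy 1: round up to the next multiple of the block size, which is what
 * TYPEALIGN(TEXTBUFLEN, len) computes when the block size is a power of two. */
static size_t grow_aligned(size_t len)
{
    return (len + SKETCH_TEXTBUFLEN - 1) & ~(SKETCH_TEXTBUFLEN - 1);
}

/* Portable "find last set": 1-based position of the highest set bit, 0 for 0. */
static int sketch_fls(size_t x)
{
    int pos = 0;

    while (x != 0)
    {
        pos++;
        x >>= 1;
    }
    return pos;
}

/* Policy 2: the "1 << fls(len)" idea, i.e. the smallest power of two
 * strictly greater than len. */
static size_t grow_pow2(size_t len)
{
    return (size_t) 1 << sketch_fls(len);
}
```

For a string just over 4 kB both policies stay small (5120 vs 8192 bytes), but for a string just over 128 MB the aligned policy asks for about 128 MB while the power-of-two policy asks for 256 MB, which is the memory concern Robert raises about doubling. The price of the aligned policy is more frequent reallocation when successive strings grow gradually.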
On Mon, Jul 23, 2012 at 12:07 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 23 July 2012 16:36, Robert Haas robertmh...@gmail.com wrote:
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com
wrote:
tss->buflen = 1 << ffs(len1);
I'm sorry, I don't follow you here. What
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote:
On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote:
In any case, if you have to redefine the meaning of equality in order
to justify a performance patch, I'm prepared to walk away at the start.
The advantage of my proposed
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
I've written a very small C++ program, which I've attached, that
basically proves that this can still make a fairly large difference -
I hope it's okay that it's C++, but that allowed me to write the
program quickly, with no dependencies
On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote:
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
I've written a very small C++ program, which I've attached, that
basically proves that this can still make a fairly large difference -
I hope it's okay that it's C++, but that
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote:
On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote:
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
I've written a very small C++ program, which I've attached, that
basically proves that this can still make a fairly large difference -
On 21 June 2012 11:40, Florian Pflug f...@phlo.org wrote:
At this point, my theory is that your choice of random strings prevents
strxfrm() from ever winning over strcoll(). The reason being that you pick
each letter uniformly distributed from a-z, resulting in a probability of
two string
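Florian's point can be made concrete with a short calculation, assuming (as his description suggests) that every character is drawn independently and uniformly from the 26 letters a-z. Let L be the length of the common prefix of two such strings of length n:

```latex
P(L \ge k) = \left(\tfrac{1}{26}\right)^{k} \quad (1 \le k \le n), \qquad
\mathbb{E}[L] = \sum_{k=1}^{n} P(L \ge k) < \sum_{k \ge 1} \left(\tfrac{1}{26}\right)^{k}
              = \frac{1/26}{1 - 1/26} = \frac{1}{25}
```

So about 25 comparisons in 26 are decided at the first character, and the expected shared prefix is under 0.04 characters. Under that assumption strcoll() rarely has much work to do per call, leaving little room for a precomputed strxfrm() key to pay back its cost; data with long shared prefixes could behave quite differently.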
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote:
So if you take the word "Aßlar" here - that is equivalent to "Asslar",
and so strcoll("Aßlar", "Asslar") will return 0 if you have the right
LC_COLLATE
This is not actually correct. glibc will sort Asslar before Aßlar, and
that is correct in
On 20 June 2012 11:00, Peter Eisentraut pete...@gmx.net wrote:
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote:
So if you take the word "Aßlar" here - that is equivalent to "Asslar",
and so strcoll("Aßlar", "Asslar") will return 0 if you have the right
LC_COLLATE
This is not actually
On 19 June 2012 19:44, Peter Geoghegan pe...@2ndquadrant.com wrote:
PostgreSQL supported Unicode before 2005, when the tie-breaker was
introduced. I know at least one Swede who used Postgres95. I just took
a look at the REL6_4 branch, and it looks much the same in 1999 as it
did in 2005, in
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
The trick for hashing such datatypes is to be able to guarantee that
equal values hash to the same hash code, which is typically possible
as long as you know the equality rules well enough. We could possibly
do that for text
On 20 June 2012 15:10, Greg Stark st...@mit.edu wrote:
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane t...@sss.pgh.pa.us wrote:
The trick for hashing such datatypes is to be able to guarantee that
equal values hash to the same hash code, which is typically possible
as long as you know the equality
On Wed, Jun 20, 2012 at 3:19 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
It occurs to me that strxfrm would answer this question. If we made
the hash function hash the result of strxfrm then we could make
equality use strcoll and not fall back to strcmp.
What about per-column collations?
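Greg's suggestion rests on a property the C standard gives `strxfrm()`: `strcmp()` on two transformed strings returns the same sign as `strcoll()` on the originals, so strings the collation calls equal transform to identical bytes and therefore hash identically. A minimal sketch, assuming NUL-terminated input and the process-wide `LC_COLLATE`; `xfrm_dup`, `fnv1a`, `collation_hash` and `collation_equal` are hypothetical names, and FNV-1a is an arbitrary choice of hash.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Return a malloc'd strxfrm() image of s under the current LC_COLLATE. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* size query; +1 for the terminator */
    char *buf = malloc(need);

    assert(buf != NULL);
    strxfrm(buf, s, need);
    return buf;
}

/* 32-bit FNV-1a over a NUL-terminated byte string. */
static uint32_t fnv1a(const char *p)
{
    uint32_t h = 2166136261u;

    for (; *p != '\0'; p++)
    {
        h ^= (unsigned char) *p;
        h *= 16777619u;
    }
    return h;
}

/* Hash the transformed image, so collation-equal strings hash identically. */
static uint32_t collation_hash(const char *s)
{
    char *key = xfrm_dup(s);
    uint32_t h = fnv1a(key);

    free(key);
    return h;
}

/* Equality defined by the collation alone, with no strcmp() tie-break. */
static int collation_equal(const char *a, const char *b)
{
    return strcoll(a, b) == 0;
}
```

Because the sketch reads the process-wide `LC_COLLATE`, it cannot by itself serve columns with different collations, which is exactly the objection raised in reply; each column would need its own locale handle (for example via POSIX `strxfrm_l()`).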
Peter Geoghegan pe...@2ndquadrant.com writes:
I think that this change may have made the difference between the
Hungarians getting away with it and not getting away with it. Might it
have been that for text, they were using some operator that wasn't '='
(perhaps one which has no fastpath, and
On 20 June 2012 15:55, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
I think that this change may have made the difference between the
Hungarians getting away with it and not getting away with it. Might it
have been that for text, they were using some
Peter Geoghegan pe...@2ndquadrant.com writes:
No, I'm suggesting it would probably be at least a bit of a win here
to cache the constant, and only have to do a strxfrm() + strcmp() per
comparison.
Um, have you got any hard evidence to support that notion? The
traditional advice is that
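The pattern Peter describes, transforming the constant once and then paying one `strxfrm()` plus one `strcmp()` per row, looks roughly like this. It is a sketch only and does not answer Tom's question about whether it is actually faster than one `strcoll()` per row; `CachedConst`, `cache_const`, `cmp_to_const`, `release_const` and `xfrm_dup` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* The constant's transformed image, computed once per query. */
typedef struct
{
    char *key;
} CachedConst;

/* Return a malloc'd strxfrm() image of s under the current LC_COLLATE. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* size query; +1 for the terminator */
    char *buf = malloc(need);

    assert(buf != NULL);
    strxfrm(buf, s, need);
    return buf;
}

static CachedConst cache_const(const char *constant)
{
    CachedConst cc;

    cc.key = xfrm_dup(constant);
    return cc;
}

/* Per-row work: transform the row value, then a plain byte comparison.
 * The sign matches strcoll(value, constant). */
static int cmp_to_const(const CachedConst *cc, const char *value)
{
    char *vkey = xfrm_dup(value);
    int r = strcmp(vkey, cc->key);

    free(vkey);
    return r;
}

static void release_const(CachedConst *cc)
{
    free(cc->key);
    cc->key = NULL;
}
```

Note that the per-row `strxfrm()` call still has to walk the whole value, whereas `strcoll()` may be able to stop near the first difference, which is one reason the benefit is not obvious without measurement.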
On Wed, Jun 20, 2012 at 12:41 PM, Tom Lane t...@sss.pgh.pa.us wrote:
The fact is that this is likely to be a fairly significant
performance win, because strxfrm() is quite simply the way you're
supposed to do collation-aware sorting, and is documented as such. For
that reason, C standard
On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
No, I'm suggesting it would probably be at least a bit of a win here
to cache the constant, and only have to do a strxfrm() + strcmp() per
comparison.
Um, have you got any hard evidence to
On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
No, I'm suggesting it would probably be at least a bit of a win here
to cache the constant, and only have to do a strxfrm() + strcmp() per
comparison.
Um, have you got any hard evidence to
On 21 June 2012 01:22, Peter Geoghegan pe...@2ndquadrant.com wrote:
I've written a very small C++ program, which I've attached, that
basically proves that this can still make a fairly large difference -
I hope it's okay that it's C++, but that allowed me to write the
program quickly, with
So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically, I
direct your attention to conformance requirement C9 of Unicode 3.0:
http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0
This
Peter Geoghegan pe...@2ndquadrant.com wrote:
So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically
Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a
On 19 June 2012 16:17, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
Peter Geoghegan pe...@2ndquadrant.com wrote:
So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically
Since we appear to be questioning
Peter Geoghegan pe...@2ndquadrant.com wrote:
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some
other systems I've used, the tie-breaker comparison for
equivalent
On 19 June 2012 17:45, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
Peter Geoghegan pe...@2ndquadrant.com wrote:
Are you sure that they actually have a tie-breaker, and don't just
make the distinction between equality and equivalence (if only
internally)?
I'm pretty sure that when I was
Peter Geoghegan pe...@2ndquadrant.com wrote:
Kevin Grittner kevin.gritt...@wicourts.gov wrote:
I'm pretty sure that when I was using Sybase ASE the order for
non-equal values was always predictable, and it behaved in the
manner I describe below. I'm less sure about any other product.
On 19 June 2012 18:57, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
We weren't using en_US.UTF-8 collation (or any other proper
collation) on Sybase -- I'm not sure whether they even supported
proper collation sequences on the versions we used. I'm thinking of
when we were using their
Kevin Grittner kevin.gritt...@wicourts.gov writes:
I wasn't aware that en_US.UTF-8 doesn't have equivalence without
equality. I guess that surprising result in my last post is just
plain inevitable with that collation then. Bummer. Is there
actually anyone who finds that to be a useful
On 19 June 2012 19:44, Peter Geoghegan pe...@2ndquadrant.com wrote:
You could do that, and some people do use custom collations for
various reasons. That's obviously very much of minority interest
though. Most people will just use citext or something. However, since
citext is itself a client
On 18 June 2012 00:38, Tom Lane t...@sss.pgh.pa.us wrote:
The only reason we test "a = b" and not "a < b || a > b"
is that the latter is at least twice as expensive to evaluate.
Perhaps I've been unclear. I meant to write (!(a < b) && !(b < a)),
and not (a < b && b < a). The former is not tautological when
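The distinction Peter draws can be written down directly: given only a strict "less than" relation, equivalence is `!(a < b) && !(b < a)`, whereas `(a < b && b < a)` can never be true for a strict ordering. In this minimal sketch `coll_less`, `coll_equivalent` and `bitwise_equal` are hypothetical names, and "less than" is assumed to mean `strcoll() < 0` under the current locale.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Strict "less than" as defined by the current LC_COLLATE. */
static bool coll_less(const char *a, const char *b)
{
    return strcoll(a, b) < 0;
}

/* Equivalence derived from the ordering: neither sorts before the other. */
static bool coll_equivalent(const char *a, const char *b)
{
    return !coll_less(a, b) && !coll_less(b, a);
}

/* Equality in the stronger, byte-for-byte sense. */
static bool bitwise_equal(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

In the C locale the two relations coincide. They diverge only under a collation in which `strcoll()` returns 0 for byte-wise different strings, which is the case the 2005 tie-breaker discussed in this thread was introduced to handle.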
On 18 June 2012 16:59, Peter Geoghegan pe...@2ndquadrant.com wrote:
Perhaps more importantly, I cannot recreate any of these problems on
my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test
case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may
be that they've
The fly in the ointment for strxfrm() adoption may be the need to be
consistent with this earlier behaviour:
commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane t...@sss.pgh.pa.us
Date: Thu Dec 22 22:50:00 2005 +
Adjust string comparison so that only bitwise-equal strings
Peter Geoghegan pe...@2ndquadrant.com writes:
The fly in the ointment for strxfrm() adoption may be the need to be
consistent with this earlier behaviour:
if strcoll claims two strings are equal, check it with strcmp, and
sort according to strcmp if not identical.
I'm not sure I
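The tie-break described in the 2005 commit can be sketched as follows. This simplification assumes NUL-terminated strings and the process-wide `LC_COLLATE`, whereas PostgreSQL's own comparison works on length-counted text values; `text_cmp_sketch` is a hypothetical name.

```c
#include <assert.h>
#include <string.h>

/* Collation order first; fall back to byte order only when the collation
 * reports a tie, so 0 is returned only for byte-identical strings. */
static int text_cmp_sketch(const char *a, const char *b)
{
    int r = strcoll(a, b);

    if (r == 0)
        r = strcmp(a, b);
    return r;
}
```

Because a result of 0 now implies byte-for-byte identity, a hash computed over the raw bytes stays consistent with `=`, which is the hashing argument made in the following messages.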
On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote:
I'm not sure I agree with this decision; why should we presume to know
better than the glibc locale what constitutes equality?
The killer reason why it must be like that is that you can't use hash
methods on text if text equality is
Peter Geoghegan pe...@2ndquadrant.com writes:
On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote:
The killer reason why it must be like that is that you can't use hash
methods on text if text equality is some unknown condition subtly
different from bitwise equality.
Fair enough, but I
On Jun 17, 2012 5:50 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote:
How exactly do you plan to shoehorn that into SQL? You could invent
some nonstandard equivalence operator I suppose, but what
Peter Geoghegan pe...@2ndquadrant.com writes:
Right, most people won't care. You may or may not want a new
operator for equivalency. The regular operator for equality doesn't have to
and shouldn't change. It is both useful and conceptually clean to not
guarantee that a comparator can be relied
On 17 June 2012 21:26, Tom Lane t...@sss.pgh.pa.us wrote:
Sure, and in general we only expect that = operators mean equivalency;
a concrete example is float8 =, which on IEEE-spec machines will say
that zero and minus zero are equal.
Right; the spec says that, and we punt to the spec. No one
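Tom's float8 example shows why "equal" is not the same as "bitwise identical": on IEEE 754 hardware, 0.0 and -0.0 compare equal but differ in the sign bit, so a hash over raw bytes would have to normalize them first. A minimal sketch; `same_bits` and `normalize_zero` are hypothetical helpers, and NaN handling is deliberately left out.

```c
#include <assert.h>
#include <string.h>

/* True if the two doubles have identical object representations. */
static int same_bits(double a, double b)
{
    return memcmp(&a, &b, sizeof a) == 0;
}

/* Map -0.0 to +0.0 before hashing, so values that compare equal
 * also have identical bytes. NaNs are not handled here. */
static double normalize_zero(double x)
{
    return x == 0.0 ? 0.0 : x;
}
```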
On 17 June 2012 23:58, Peter Geoghegan pe...@2ndquadrant.com wrote:
We can decree that equivalency implies equality, or make all this
internal (which, perversely, I suppose the C++ committee people
cannot).
Sorry, that should obviously read equality implies equivalency. We
may not have to
Peter Geoghegan pe...@2ndquadrant.com writes:
ISTM if '=' was really a mere equivalency operator, we'd only ever
check (a < b && b < a) in the btree code.
You're not really making a lot of sense here, or at least I'm not
grasping the distinction you want to draw. btree indexes (and sorting
in
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
However, it occurred to me that we could pretty easily jury-rig
something that would give us an idea about the actual benefit available
here. To wit: make a C function that wraps strxfrm, basically
strxfrm(text) returns bytea. Then
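Tom's proposed experiment wraps `strxfrm()` so that sort keys are computed once per value rather than once per comparison. The same idea as a standalone C sketch (not a PostgreSQL function): transform every string once, sort the transformed keys with `strcmp()`, and check the result against a plain `strcoll()` sort. `KeyedString`, `xfrm_dup`, `sort_by_xfrm`, `cmp_keyed` and `cmp_coll` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    const char *orig;   /* the original string */
    char       *key;    /* its strxfrm() image */
} KeyedString;

/* Return a malloc'd strxfrm() image of s under the current LC_COLLATE. */
static char *xfrm_dup(const char *s)
{
    size_t need = strxfrm(NULL, s, 0) + 1;   /* size query; +1 for the terminator */
    char *buf = malloc(need);

    assert(buf != NULL);
    strxfrm(buf, s, need);
    return buf;
}

/* Comparator on precomputed keys: a plain byte comparison. */
static int cmp_keyed(const void *pa, const void *pb)
{
    const KeyedString *a = pa;
    const KeyedString *b = pb;

    return strcmp(a->key, b->key);
}

/* Baseline comparator: one strcoll() per comparison. */
static int cmp_coll(const void *pa, const void *pb)
{
    const char *const *a = pa;
    const char *const *b = pb;

    return strcoll(*a, *b);
}

/* Sort strs in place in collation order, transforming each string once. */
static void sort_by_xfrm(const char **strs, size_t n)
{
    KeyedString *items;

    if (n == 0)
        return;
    items = malloc(n * sizeof *items);
    assert(items != NULL);
    for (size_t i = 0; i < n; i++)
    {
        items[i].orig = strs[i];
        items[i].key = xfrm_dup(strs[i]);
    }
    qsort(items, n, sizeof *items, cmp_keyed);
    for (size_t i = 0; i < n; i++)
    {
        strs[i] = items[i].orig;
        free(items[i].key);
    }
    free(items);
}
```

The trade-off is memory: every key is held at once, and `strxfrm()` output can be considerably longer than its input, depending on the locale.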
Peter Geoghegan pe...@2ndquadrant.com writes:
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
I thought that doubling repeatedly would be overly aggressive in terms
of memory usage.
I fail to understand how this sortsupport buffer fundamentally differs
from a generic dynamic
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Peter Geoghegan pe...@2ndquadrant.com writes:
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
I thought that doubling repeatedly would be overly aggressive in terms
of memory usage.
I fail to understand how
Robert Haas robertmh...@gmail.com writes:
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
(And from a performance standpoint, I'm not entirely convinced it's not
a bug, anyway. Worst-case behavior could be pretty bad.)
Instead of simply asserting that, could you respond
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
(And from a performance standpoint, I'm not entirely convinced it's not
a bug, anyway. Worst-case behavior could
On 15 June 2012 21:06, Robert Haas robertmh...@gmail.com wrote:
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Robert Haas robertmh...@gmail.com writes:
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote:
(And from a performance standpoint, I'm not
Robert Haas robertmh...@gmail.com writes:
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote:
Maybe I missed something, but as far as I saw your argument was not that
the performance wasn't bad but that the rest of the sort code would
dominate the runtime anyway. I grant that
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
One other thing I've always wondered about in this connection is the
general performance of sorting toasted datums. Is it better to detoast
them in every comparison, or pre-detoast to save comparison cycles at
the cost of having to
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote:
One other thing I've always wondered about in this connection is the
general performance of sorting toasted datums. Is it better to detoast
them in every
On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote:
The problem with pre-detoasting to save comparison cycles is that you
can now fit many, many fewer tuples in work_mem. There might be cases
where it wins (for example, because the entire data set fits even
after decompressing
On 2 March 2012 20:45, Robert Haas robertmh...@gmail.com wrote:
I decided to investigate the possible virtues of allowing text to
use the sortsupport infrastructure, since strings are something people
often want to sort.
I should mention up-front that I agree with the idea that it is worth
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote:
The problem with pre-detoasting to save comparison cycles is that you
can now fit many, many fewer tuples in work_mem. There might be cases
where it
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Why have you made the reusable buffer managed by sortsupport
TEXTBUFLEN-aligned? The existing rationale for that constant (whose
value is 1024) does not seem to carry forward here:
* This should be large enough
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
I thought that doubling repeatedly would be overly aggressive in terms
of memory usage. Blowing the buffers out to 8kB because we hit a
string that's a bit over 4kB isn't so bad, but blowing them out to
256MB because we hit a
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote:
I thought that doubling repeatedly would be overly aggressive in terms
of memory usage. Blowing the buffers out to 8kB because we hit a
string that's a
On 14 June 2012 20:32, Robert Haas robertmh...@gmail.com wrote:
Yeah, but *it doesn't matter*. If you test this on strings that are
long enough that they get pushed out to TOAST, you'll find that it
doesn't measurably improve performance, because the overhead of
detoasting so completely
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan pe...@2ndquadrant.com wrote:
Here we know that it doesn't matter, so the application of Knuth's first law
of optimization is appropriate.
I'm not advocating some Byzantine optimisation, or even something that
could reasonably be described as an
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote:
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789 28.2686 libc-2.13.so strcoll_l
6802 15.0350 postgres text_cmp
I'm still curious how it would compare to call
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote:
However, it occurred to me that we could pretty easily jury-rig
something that would give us an idea about the actual benefit available
here. To wit: make a C function that wraps strxfrm, basically
strxfrm(text) returns
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote:
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789 28.2686 libc-2.13.so strcoll_l
6802 15.0350 postgres
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout
klep...@svana.org wrote:
Ouch. I was holding out hope that you could get a meaningful
improvement if we could use the first X bytes of the strxfrm output so
you only need to do a strcoll on strings that actually nearly match.
But with an
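Martijn's idea can be sketched as keeping only the first few bytes of each `strxfrm()` image and calling `strcoll()` only when those prefixes tie. Zero-padding short images keeps `memcmp()` on the prefixes consistent with `strcmp()` on the full images, because transformed strings contain no embedded NUL bytes. `PREFIX_LEN`, `PrefixKey`, `make_prefix_key` and `prefix_key_cmp` are hypothetical, and 8 bytes is an arbitrary choice.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define PREFIX_LEN 8    /* arbitrary: bytes of strxfrm() output kept per string */

typedef struct
{
    const char   *str;                  /* original string, for tie-breaking */
    unsigned char prefix[PREFIX_LEN];   /* zero-padded head of its strxfrm() image */
} PrefixKey;

static void make_prefix_key(PrefixKey *k, const char *s)
{
    size_t len = strxfrm(NULL, s, 0);   /* length of the full image */
    char *full = malloc(len + 1);

    assert(full != NULL);
    strxfrm(full, s, len + 1);
    memset(k->prefix, 0, PREFIX_LEN);
    memcpy(k->prefix, full, len < PREFIX_LEN ? len : PREFIX_LEN);
    k->str = s;
    free(full);
}

static int prefix_key_cmp(const PrefixKey *a, const PrefixKey *b)
{
    int r = memcmp(a->prefix, b->prefix, PREFIX_LEN);

    if (r != 0)
        return r;                       /* prefixes already decide the order */
    return strcoll(a->str, b->str);     /* near match: full collation comparison */
}
```

The payoff depends on the data: if many values share their first `PREFIX_LEN` transformed bytes, most comparisons fall through to `strcoll()` anyway.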
Greg Stark st...@mit.edu writes:
I'm still curious how it would compare to call strxfrm and sort the
resulting binary blobs.
In principle that should be a win; it's hard to believe that strxfrm
would have gotten into the standards if it were not a win for sorting
applications.
I don't think
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789 28.2686 libc-2.13.so strcoll_l
6802 15.0350 postgres text_cmp
I'm still curious how it would compare to call strxfrm and sort the
resulting binary blobs. I don't think the
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
On unpatched master, this takes about 416 ms (plus or minus a few).
With the attached patch, it takes about 389 ms (plus or minus a very
few), a speedup of about 7%.
I
Noah Misch n...@leadboat.com wrote:
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
[13% faster with patch for C collation; 7% faster for UTF8]
I had hoped for more like a 15-20% gain from this approach, but
it
I decided to investigate the possible virtues of allowing text to
use the sortsupport infrastructure, since strings are something people
often want to sort. I generated 100,000 random alphanumeric strings,
each 30 characters in length, and loaded them into a single-column
table, froze it, ran