Re: [HACKERS] sortsupport for text

2012-10-09 Thread Dimitri Fontaine
Peter Geoghegan pe...@2ndquadrant.com writes: On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote: Gentlemen, you can't fight here. This is the War Room. Regards, Dimitri Fontaine

Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
Do you think you can follow through on this soon, Robert? I don't believe that there are any outstanding issues. I'm not going to make an issue of the fact that strxfrm() hasn't been taken advantage of. If you could please post a new revision, with the suggested alterations (that you agree with),

Re: [HACKERS] sortsupport for text

2012-10-08 Thread Robert Haas
On Mon, Oct 8, 2012 at 8:47 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: Do you think you can follow through on this soon, Robert? I don't believe that there are any outstanding issues. I'm not going to make an issue of the fact that strxfrm() hasn't been taken advantage of. If you could

Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
On 8 October 2012 16:30, Robert Haas robertmh...@gmail.com wrote: I don't have any plans to work on this further. I think all of the criticism that has been leveled at this patch is 100% bogus, and I greatly dislike the changes that have been proposed. That may not be fair, but it's how I

Re: [HACKERS] sortsupport for text

2012-10-08 Thread Robert Haas
On Mon, Oct 8, 2012 at 12:26 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: If it was the case that you were only 50% of the way to getting something committable, I guess I'd understand; this is, after all, a volunteer effort, and you are of course free to pursue or not pursue whatever you

Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote: Hey, if me deciding I don't want to work on a patch any more is going to make you feel slighted, then you're out of luck. The archives are littered with people who have decided to stop working on things because the consensus

Re: [HACKERS] sortsupport for text

2012-07-23 Thread Robert Haas
On Fri, Jun 15, 2012 at 6:10 PM, Tom Lane t...@sss.pgh.pa.us wrote: I would be concerned about this if it were per-sort-tuple wastage, but what I understood to be happening was that this was a single instance of an expansible buffer (per sort, perhaps, but still just one buffer). And, as you

Re: [HACKERS] sortsupport for text

2012-07-23 Thread Peter Geoghegan
On 23 July 2012 16:09, Robert Haas robertmh...@gmail.com wrote: However, what this really boils down to is that you and Peter don't like this line of code: + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1); I can only speak for myself, though I agree with your summary here. What

Re: [HACKERS] sortsupport for text

2012-07-23 Thread Robert Haas
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 23 July 2012 16:09, Robert Haas robertmh...@gmail.com wrote: However, what this really boils down to is that you and Peter don't like this line of code: + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1);

Re: [HACKERS] sortsupport for text

2012-07-23 Thread Peter Geoghegan
On 23 July 2012 16:36, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: tss->buflen = 1 << ffs(len1); I'm sorry, I don't follow you here. What is ffs()? Sorry, fls, not ffs. I always get those mixed up. See
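The ffs/fls correction matters because the two functions look at opposite ends of the word: ffs finds the lowest set bit, fls the highest. A minimal sketch of the difference follows. The names `sketch_ffs`, `sketch_fls` and `buflen_from_fls` are hypothetical portable stand-ins written for this illustration (1-based bit positions, 0 for a zero argument), not the libc functions and not code from the patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in: 1-based position of the LEAST significant set bit. */
static int
sketch_ffs(unsigned int x)
{
    int pos = 1;

    if (x == 0)
        return 0;
    while ((x & 1u) == 0)
    {
        x >>= 1;
        pos++;
    }
    return pos;
}

/* Hypothetical stand-in: 1-based position of the MOST significant set bit. */
static int
sketch_fls(unsigned int x)
{
    int pos = 0;

    while (x != 0)
    {
        x >>= 1;
        pos++;
    }
    return pos;
}

/* Smallest power of two strictly greater than len, i.e. 1 << fls(len). */
static size_t
buflen_from_fls(unsigned int len)
{
    int bits = sketch_fls(len);

    /* Guard against shifting by the full width of size_t. */
    assert(bits < (int) (sizeof(size_t) * 8));
    return (size_t) 1 << bits;
}
```

For a length of 1000, the highest set bit is at position 10, so the fls form yields a 1024-byte buffer. The lowest set bit of 1000 is at position 4, so the ffs form would yield 16 bytes, which is too small to hold the string.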

Re: [HACKERS] sortsupport for text

2012-07-23 Thread Robert Haas
On Mon, Jul 23, 2012 at 12:07 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 23 July 2012 16:36, Robert Haas robertmh...@gmail.com wrote: On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: tss->buflen = 1 << ffs(len1); I'm sorry, I don't follow you here. What

Re: [HACKERS] sortsupport for text

2012-06-21 Thread Florian Pflug
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote: On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote: In any case, if you have to redefine the meaning of equality in order to justify a performance patch, I'm prepared to walk away at the start. The advantage of my proposed

Re: [HACKERS] sortsupport for text

2012-06-21 Thread Florian Pflug
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with no dependencies

Re: [HACKERS] sortsupport for text

2012-06-21 Thread Peter Geoghegan
On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that

Re: [HACKERS] sortsupport for text

2012-06-21 Thread Florian Pflug
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote: On 21 June 2012 10:24, Florian Pflug f...@phlo.org wrote: On Jun21, 2012, at 02:22 , Peter Geoghegan wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference -

Re: [HACKERS] sortsupport for text

2012-06-21 Thread Peter Geoghegan
On 21 June 2012 11:40, Florian Pflug f...@phlo.org wrote: At this point, my theory is that your choice of random strings prevents strxfrm() from ever winning over strcoll(). The reason being that you pick each letter uniformly distributed from a-z, resulting in a probability of two string

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Eisentraut
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: So if you take the word "Aßlar" here - that is equivalent to "Asslar", and so strcoll("Aßlar", "Asslar") will return 0 if you have the right LC_COLLATE This is not actually correct. glibc will sort Asslar before Aßlar, and that is correct in

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 20 June 2012 11:00, Peter Eisentraut pete...@gmx.net wrote: On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote: So if you take the word "Aßlar" here - that is equivalent to "Asslar", and so strcoll("Aßlar", "Asslar") will return 0 if you have the right LC_COLLATE This is not actually

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 19 June 2012 19:44, Peter Geoghegan pe...@2ndquadrant.com wrote: PostgreSQL supported Unicode before 2005, when the tie-breaker was introduced. I know at least one Swede who used Postgres95. I just took a look at the REL6_4 branch, and it looks much the same in 1999 as it did in 2005, in

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Greg Stark
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: The trick for hashing such datatypes is to be able to guarantee that equal values hash to the same hash code, which is typically possible as long as you know the equality rules well enough.  We could possibly do that for text

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 20 June 2012 15:10, Greg Stark st...@mit.edu wrote: On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane t...@sss.pgh.pa.us wrote: The trick for hashing such datatypes is to be able to guarantee that equal values hash to the same hash code, which is typically possible as long as you know the equality

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Greg Stark
On Wed, Jun 20, 2012 at 3:19 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: It occurs to me that strxfrm would answer this question. If we made the hash function hash the result of strxfrm then we could make equality use strcoll and not fall back to strcmp. What about per-column collations?
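Greg's suggestion rests on the defining property of strxfrm(): comparing two transformed keys with strcmp() gives the same sign as strcoll() on the originals, so strings that strcoll() reports as equal produce identical keys and therefore identical hashes. A rough sketch of that idea follows. `hash_collation_key` is a hypothetical helper, FNV-1a is used only as a stand-in hash (it is not PostgreSQL's hash function), and the sketch uses the process-wide LC_COLLATE, so it does not address the per-column collation question raised in reply.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: hash the strxfrm() key of s with 32-bit FNV-1a.
 * Returns 0 and stores the hash in *hash_out, or -1 if out of memory.
 */
static int
hash_collation_key(const char *s, uint32_t *hash_out)
{
    size_t      need;
    size_t      i;
    char       *key;
    uint32_t    h = 2166136261u;    /* FNV-1a offset basis */

    assert(s != NULL && hash_out != NULL);

    /* With a zero-length destination, strxfrm() only reports the key size. */
    need = strxfrm(NULL, s, 0) + 1;
    key = malloc(need);
    if (key == NULL)
        return -1;
    strxfrm(key, s, need);

    for (i = 0; key[i] != '\0'; i++)
    {
        h ^= (unsigned char) key[i];
        h *= 16777619u;             /* FNV prime */
    }

    free(key);
    *hash_out = h;
    return 0;
}
```

The cost is one strxfrm() call and one allocation per hashed value, which is part of what the thread is weighing against the existing strcmp() fallback.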

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: I think that this change may have made the difference between the Hungarians getting away with it and not getting away with it. Might it have been that for text, they were using some operator that wasn't '=' (perhaps one which has no fastpath, and

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 20 June 2012 15:55, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: I think that this change may have made the difference between the Hungarians getting away with it and not getting away with it. Might it have been that for text, they were using some

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Um, have you got any hard evidence to support that notion? The traditional advice is that

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Robert Haas
On Wed, Jun 20, 2012 at 12:41 PM, Tom Lane t...@sss.pgh.pa.us wrote: The fact is that this is likely to be a fairly significant performance win, because strxfrm() is quite simply the way you're supposed to do collation-aware sorting, and is documented as such. For that reason, C standard
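For readers unfamiliar with the pattern being argued over, here is a minimal sketch of strxfrm()-based sorting: transform each string once, then let every comparison be a plain strcmp() on the precomputed keys. `KeyedString` and `sort_by_strxfrm_keys` are hypothetical names for this illustration and nothing here is taken from the patch. Whether this beats calling strcoll() per comparison is exactly the open question in the thread, so the sketch makes no performance claim.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

typedef struct
{
    const char *original;   /* caller-owned input string */
    char       *key;        /* malloc'd strxfrm() output */
} KeyedString;

/* qsort comparator: byte-wise comparison of the precomputed keys. */
static int
keyed_string_cmp(const void *a, const void *b)
{
    const KeyedString *ka = a;
    const KeyedString *kb = b;

    return strcmp(ka->key, kb->key);
}

/*
 * Hypothetical helper: sort strings[0..n-1] in place according to the
 * current LC_COLLATE. Returns 0 on success, -1 if out of memory (in which
 * case the array is left unchanged).
 */
static int
sort_by_strxfrm_keys(const char **strings, size_t n)
{
    KeyedString *items;
    size_t      i;
    int         status = 0;

    assert(strings != NULL || n == 0);
    if (n == 0)
        return 0;

    items = calloc(n, sizeof(KeyedString));
    if (items == NULL)
        return -1;

    /* One strxfrm() per string, instead of one strcoll() per comparison. */
    for (i = 0; i < n; i++)
    {
        size_t need = strxfrm(NULL, strings[i], 0) + 1;

        items[i].original = strings[i];
        items[i].key = malloc(need);
        if (items[i].key == NULL)
        {
            status = -1;
            break;
        }
        strxfrm(items[i].key, strings[i], need);
    }

    if (status == 0)
    {
        qsort(items, n, sizeof(KeyedString), keyed_string_cmp);
        for (i = 0; i < n; i++)
            strings[i] = items[i].original;
    }

    for (i = 0; i < n; i++)
        free(items[i].key);     /* unset slots are NULL from calloc */
    free(items);
    return status;
}
```

Note the memory trade-off: every key lives alongside its original string for the duration of the sort, and transformed keys can be considerably longer than the input.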

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Um, have you got any hard evidence to

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 20 June 2012 17:41, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: No, I'm suggesting it would probably be at least a bit of a win here to cache the constant, and only have to do a strxfrm() + strcmp() per comparison. Um, have you got any hard evidence to

Re: [HACKERS] sortsupport for text

2012-06-20 Thread Peter Geoghegan
On 21 June 2012 01:22, Peter Geoghegan pe...@2ndquadrant.com wrote: I've written a very small C++ program, which I've attached, that basically proves that this can still make a fairly large difference - I hope it's okay that it's C++, but that allowed me to write the program quickly, with

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
So, just to give a bit more weight to my argument that we should recognise that equivalent strings ought to be treated identically, I direct your attention to conformance requirement C9 of Unicode 3.0: http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0 This

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote: So, just to give a bit more weight to my argument that we should recognise that equivalent strings ought to be treated identically Since we appear to be questioning everything in this area, I'll raise something which has been bugging me for a

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 16:17, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Peter Geoghegan pe...@2ndquadrant.com wrote: So, just to give a bit more weight to my argument that we should recognise that equivalent strings ought to be treated identically Since we appear to be questioning

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: Since we appear to be questioning everything in this area, I'll raise something which has been bugging me for a while: in some other systems I've used, the tie-breaker comparison for equivalent

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 17:45, Kevin Grittner kevin.gritt...@wicourts.gov wrote: Peter Geoghegan pe...@2ndquadrant.com wrote: Are you sure that they actually have a tie-breaker, and don't just make the distinction between equality and equivalence (if only internally)? I'm pretty sure that when I was

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Kevin Grittner
Peter Geoghegan pe...@2ndquadrant.com wrote: Kevin Grittner kevin.gritt...@wicourts.gov wrote: I'm pretty sure that when I was using Sybase ASE the order for non-equal values was always predictable, and it behaved in the manner I describe below. I'm less sure about any other product.

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 18:57, Kevin Grittner kevin.gritt...@wicourts.gov wrote: We weren't using en_US.UTF-8 collation (or any other proper collation) on Sybase -- I'm not sure whether they even supported proper collation sequences on the versions we used.  I'm thinking of when we were using their

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes: I wasn't aware that en_US.UTF-8 doesn't have equivalence without equality. I guess that surprising result in my last post is just plain inevitable with that collation then. Bummer. Is there actually anyone who finds that to be a useful

Re: [HACKERS] sortsupport for text

2012-06-19 Thread Peter Geoghegan
On 19 June 2012 19:44, Peter Geoghegan pe...@2ndquadrant.com wrote: You could do that, and some people do use custom collations for various reasons. That's obviously very much of minority interest though. Most people will just use citext or something. However, since citext is itself a client

Re: [HACKERS] sortsupport for text

2012-06-18 Thread Peter Geoghegan
On 18 June 2012 00:38, Tom Lane t...@sss.pgh.pa.us wrote: The only reason we test a = b and not a < b || a > b is that the latter is at least twice as expensive to evaluate. Perhaps I've been unclear. I meant to write (!(a < b) && !(b < a)), and not (a < b && b < a). The former is not tautological when

Re: [HACKERS] sortsupport for text

2012-06-18 Thread Peter Geoghegan
On 18 June 2012 16:59, Peter Geoghegan pe...@2ndquadrant.com wrote: Perhaps more importantly, I cannot recreate any of these problems on my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may be that they've

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Peter Geoghegan
The fly in the ointment for strxfrm() adoption may be the need to be consistent with this earlier behaviour: commit 656beff59033ccc5261a615802e1a85da68e8fad Author: Tom Lane t...@sss.pgh.pa.us Date: Thu Dec 22 22:50:00 2005 + Adjust string comparison so that only bitwise-equal strings

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: The fly in the ointment for strxfrm() adoption may be the need to be consistent with this earlier behaviour: if strcoll claims two strings are equal, check it with strcmp, and sort according to strcmp if not identical. I'm not sure I
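The rule summarised above (trust strcoll() for ordering, but fall back to strcmp() when it reports a tie) can be sketched in a few lines. `collate_cmp_with_tiebreak` is a hypothetical name for illustration. This is a simplified model of the behaviour described in the commit message, not the PostgreSQL source.

```c
#include <assert.h>
#include <string.h>

/*
 * Hypothetical comparator: order by the current LC_COLLATE, but never
 * report two strings as equal unless they are byte-for-byte identical.
 */
static int
collate_cmp_with_tiebreak(const char *a, const char *b)
{
    int result;

    assert(a != NULL && b != NULL);

    result = strcoll(a, b);
    if (result == 0)
        result = strcmp(a, b);  /* tie-breaker for "equivalent" strings */
    return result;
}
```

With this comparator, a result of 0 implies bitwise equality, which is the property the hashing argument elsewhere in the thread depends on.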

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Peter Geoghegan
On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote: I'm not sure I agree with this decision; why should we presume to know better than the glibc locale what constitutes equality? The killer reason why it must be like that is that you can't use hash methods on text if text equality is

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote: The killer reason why it must be like that is that you can't use hash methods on text if text equality is some unknown condition subtly different from bitwise equality. Fair enough, but I

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Peter Geoghegan
On Jun 17, 2012 5:50 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 17 June 2012 17:01, Tom Lane t...@sss.pgh.pa.us wrote: How exactly do you plan to shoehorn that into SQL? You could invent some nonstandard equivalence operator I suppose, but what

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: Right, most people won't care. You may or may not want a new Operator for equivalency. The regular operator for equality doesn't have to and shouldn't change. It is both useful and conceptually clean to not guarantee that a comparator can be relied

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Peter Geoghegan
On 17 June 2012 21:26, Tom Lane t...@sss.pgh.pa.us wrote: Sure, and in general we only expect that = operators mean equivalency; a concrete example is float8 =, which on IEEE-spec machines will say that zero and minus zero are equal. Right; the spec says that, and we punt to the spec. No one
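Tom's float8 example is easy to reproduce in isolation: on an IEEE 754 machine, zero and minus zero compare equal even though their bit patterns differ. A small sketch, assuming IEEE 754 doubles (the helper names are hypothetical):

```c
#include <stdbool.h>
#include <string.h>

/* Equality as the '==' operator sees it. */
static bool
doubles_compare_equal(double a, double b)
{
    return a == b;
}

/* Equality of the underlying bytes, analogous to a bitwise text comparison. */
static bool
doubles_bitwise_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}
```

Here `doubles_compare_equal(0.0, -0.0)` is true while `doubles_bitwise_equal(0.0, -0.0)` is false, which is the same equality-versus-identity gap the thread is debating for text.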

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Peter Geoghegan
On 17 June 2012 23:58, Peter Geoghegan pe...@2ndquadrant.com wrote: We can decree that equivalency implies equality, or make all this internal (which, perversely, I suppose the C++ committee people cannot). Sorry, that should obviously read equality implies equivalency. We may not have to

Re: [HACKERS] sortsupport for text

2012-06-17 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: ISTM if '=' was really a mere equivalency operator, we'd only ever check (a < b && b < a) in the btree code. You're not really making a lot of sense here, or at least I'm not grasping the distinction you want to draw. btree indexes (and sorting in

Re: [HACKERS] sortsupport for text

2012-06-16 Thread Peter Geoghegan
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote: However, it occurred to me that we could pretty easily jury-rig something that would give us an idea about the actual benefit available here.  To wit: make a C function that wraps strxfrm, basically strxfrm(text) returns bytea.  Then

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Tom Lane
Peter Geoghegan pe...@2ndquadrant.com writes: On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote: I thought that doubling repeatedly would be overly aggressive in terms of memory usage. I fail to understand how this sortsupport buffer fundamentally differs from a generic dynamic

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Robert Haas
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: Peter Geoghegan pe...@2ndquadrant.com writes: On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote: I thought that doubling repeatedly would be overly aggressive in terms of memory usage. I fail to understand how

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: (And from a performance standpoint, I'm not entirely convinced it's not a bug, anyway. Worst-case behavior could be pretty bad.) Instead of simply asserting that, could you respond

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Robert Haas
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: (And from a performance standpoint, I'm not entirely convinced it's not a bug, anyway.  Worst-case behavior could

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Peter Geoghegan
On 15 June 2012 21:06, Robert Haas robertmh...@gmail.com wrote: On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Robert Haas robertmh...@gmail.com writes: On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane t...@sss.pgh.pa.us wrote: (And from a performance standpoint, I'm not

Re: [HACKERS] sortsupport for text

2012-06-15 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes: On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane t...@sss.pgh.pa.us wrote: Maybe I missed something, but as far as I saw your argument was not that the performance wasn't bad but that the rest of the sort code would dominate the runtime anyway.  I grant that

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote: One other thing I've always wondered about in this connection is the general performance of sorting toasted datums.  Is it better to detoast them in every comparison, or pre-detoast to save comparison cycles at the cost of having to

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 18 March 2012 15:08, Tom Lane t...@sss.pgh.pa.us wrote: One other thing I've always wondered about in this connection is the general performance of sorting toasted datums.  Is it better to detoast them in every

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote: The problem with pre-detoasting to save comparison cycles is that you can now fit many, many fewer tuples in work_mem.  There might be cases where it wins (for example, because the entire data set fits even after decompressing

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 2 March 2012 20:45, Robert Haas robertmh...@gmail.com wrote: I decided to investigate the possible virtues of allowing text to use the sortsupport infrastructure, since strings are something people often want to sort. I should mention up-front that I agree with the idea that it is worth

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 14 June 2012 17:35, Robert Haas robertmh...@gmail.com wrote: The problem with pre-detoasting to save comparison cycles is that you can now fit many, many fewer tuples in work_mem.  There might be cases where it

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Why have you made the reusable buffer managed by sortsupport TEXTBUFLEN-aligned? The existing rationale for that constant (whose value is 1024) does not seem to carry forward here:  * This should be large enough
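The policy being questioned rounds the required length up to the next multiple of TEXTBUFLEN (1024, as quoted above). A minimal sketch of that rounding follows. `round_up_to_multiple` and `SKETCH_TEXTBUFLEN` are hypothetical names for this illustration. The function is not PostgreSQL's TYPEALIGN macro, and it assumes the alignment is a power of two.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_TEXTBUFLEN 1024  /* value quoted in the thread */

/* Hypothetical helper: round len up to the next multiple of align. */
static size_t
round_up_to_multiple(size_t len, size_t align)
{
    /* The mask trick below is only valid for power-of-two alignments. */
    assert(align != 0 && (align & (align - 1)) == 0);
    return (len + align - 1) & ~(align - 1);
}
```

Under this policy a 4097-byte string grows the buffer to 5120 bytes, whereas repeated doubling would take it to 8192. The flip side is that a run of gradually longer strings can trigger a reallocation at every 1024-byte step.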

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote: I thought that doubling repeatedly would be overly aggressive in terms of memory usage.  Blowing the buffers out to 8kB because we hit a string that's a bit over 4kB isn't so bad, but blowing them out to 256MB because we hit a

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: On 14 June 2012 19:28, Robert Haas robertmh...@gmail.com wrote: I thought that doubling repeatedly would be overly aggressive in terms of memory usage.  Blowing the buffers out to 8kB because we hit a string that's a

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Peter Geoghegan
On 14 June 2012 20:32, Robert Haas robertmh...@gmail.com wrote: Yeah, but *it doesn't matter*.  If you test this on strings that are long enough that they get pushed out to TOAST, you'll find that it doesn't measurably improve performance, because the overhead of detoasting so completely

Re: [HACKERS] sortsupport for text

2012-06-14 Thread Robert Haas
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan pe...@2ndquadrant.com wrote: Here we know that it doesn't matter, so the application of Knuth's first law of optimization is appropriate. I'm not advocating some Byzantine optimisation, or even something that could reasonably be described as an

Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote: On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789    28.2686  libc-2.13.so             strcoll_l
6802     15.0350  postgres                 text_cmp
I'm still curious how it would compare to call

Re: [HACKERS] sortsupport for text

2012-03-19 Thread Robert Haas
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane t...@sss.pgh.pa.us wrote: However, it occurred to me that we could pretty easily jury-rig something that would give us an idea about the actual benefit available here.  To wit: make a C function that wraps strxfrm, basically strxfrm(text) returns

Re: [HACKERS] sortsupport for text

2012-03-19 Thread Martijn van Oosterhout
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote: On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark st...@mit.edu wrote: On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789    28.2686  libc-2.13.so             strcoll_l
6802     15.0350  postgres

Re: [HACKERS] sortsupport for text

2012-03-19 Thread Greg Stark
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout klep...@svana.org wrote: Ouch. I was holding out hope that you could get a meaningful improvement if we could use the first X bytes of the strxfrm output so you only need to do a strcoll on strings that actually nearly match. But with an

Re: [HACKERS] sortsupport for text

2012-03-18 Thread Tom Lane
Greg Stark st...@mit.edu writes: I'm still curious how it would compare to call strxfrm and sort the resulting binary blobs. In principle that should be a win; it's hard to believe that strxfrm would have gotten into the standards if it were not a win for sorting applications. I don't think

Re: [HACKERS] sortsupport for text

2012-03-17 Thread Greg Stark
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas robertmh...@gmail.com wrote:
12789    28.2686  libc-2.13.so             strcoll_l
6802     15.0350  postgres                 text_cmp
I'm still curious how it would compare to call strxfrm and sort the resulting binary blobs. I don't think the

Re: [HACKERS] sortsupport for text

2012-03-08 Thread Noah Misch
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote: SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x; On unpatched master, this takes about 416 ms (plus or minus a few). With the attached patch, it takes about 389 ms (plus or minus a very few), a speedup of about 7%. I

Re: [HACKERS] sortsupport for text

2012-03-08 Thread Kevin Grittner
Noah Misch n...@leadboat.com wrote: On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote: SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x; [13% faster with patch for C collation; 7% faster for UTF8] I had hoped for more like a 15-20% gain from this approach, but it

[HACKERS] sortsupport for text

2012-03-02 Thread Robert Haas
I decided to investigate the possible virtues of allowing text to use the sortsupport infrastructure, since strings are something people often want to sort. I generated 100,000 random alphanumeric strings, each 30 characters in length, and loaded them into a single-column table, froze it, ran