Peter Geoghegan writes:
> On 8 October 2012 21:35, Robert Haas wrote:
Gentlemen, you can't fight here. This is the War Room.
Regards,
--
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support
On 8 October 2012 21:35, Robert Haas wrote:
> Hey, if me deciding I don't want to work on a patch any more is going
> to make you feel slighted, then you're out of luck. The archives are
> littered with people who have decided to stop working on things
> because the consensus position on list was
On Mon, Oct 8, 2012 at 12:26 PM, Peter Geoghegan wrote:
> If it was the case that you were only 50% of the way to getting
> something committable, I guess I'd understand; this is, after all, a
> volunteer effort, and you are of course free to pursue or not pursue
> whatever you like. It's more lik
On 8 October 2012 16:30, Robert Haas wrote:
> I don't have any plans to work on this further. I think all of the
> criticism that has been leveled at this patch is 100% bogus, and I
> greatly dislike the changes that have been proposed. That may not be
> fair, but it's how I feel, and in light o
On Mon, Oct 8, 2012 at 8:47 AM, Peter Geoghegan wrote:
> Do you think you can follow through on this soon, Robert? I don't
> believe that there are any outstanding issues. I'm not going to make
> an issue of the fact that strxfrm() hasn't been taken advantage of. If
> you could please post a new r
Do you think you can follow through on this soon, Robert? I don't
believe that there are any outstanding issues. I'm not going to make
an issue of the fact that strxfrm() hasn't been taken advantage of. If
you could please post a new revision, with the suggested alterations
(that you agree with), I
On Mon, Jul 23, 2012 at 12:07 PM, Peter Geoghegan wrote:
> On 23 July 2012 16:36, Robert Haas wrote:
>> On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan
>> wrote:
>>>> tss->buflen = 1 << ffs(len1);
>>>
>>> I'm sorry, I don't follow you here. What is ffs() ?
>>
>> Sorry, fls, not ffs. I always
On 23 July 2012 16:36, Robert Haas wrote:
> On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan
> wrote:
>>> tss->buflen = 1 << ffs(len1);
>>
>> I'm sorry, I don't follow you here. What is ffs() ?
>
> Sorry, fls, not ffs. I always get those mixed up.
>
> See src/port/fls.c
Oh, okay. Since, I inf
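
For readers following the ffs/fls correction above, here is a minimal sketch of what the proposed `1 << fls(len1)` sizing works out to. It assumes the usual fls() semantics (1-based index of the most significant set bit, 0 for no bits set); `sketch_fls` and `next_buflen` are hypothetical names used only for illustration, not the code in src/port/fls.c. ffs() finds the least significant set bit instead, which is why it would not give a usable buffer size.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

/*
 * Hypothetical stand-in for fls(): returns the 1-based position of the
 * most significant set bit, or 0 when no bit is set.
 */
int sketch_fls(unsigned int mask)
{
    int bit = 0;

    while (mask != 0)
    {
        mask >>= 1;
        bit++;
    }
    return bit;
}

/*
 * Buffer size under the "1 << fls(len)" policy: the smallest power of two
 * strictly greater than len.
 */
size_t next_buflen(unsigned int len)
{
    /* Keep the shift in range even where size_t is only 32 bits wide. */
    assert(len <= UINT_MAX / 4);
    return (size_t) 1 << sketch_fls(len);
}
```

Note that a length that is already a power of two still doubles under this policy, since fls() counts bit positions from 1.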
On Mon, Jul 23, 2012 at 11:34 AM, Peter Geoghegan wrote:
> On 23 July 2012 16:09, Robert Haas wrote:
>> However, what this really boils down to is that you and Peter don't
>> like this line of code:
>>
>> + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1);
>
> I can only speak for myself,
On 23 July 2012 16:09, Robert Haas wrote:
> However, what this really boils down to is that you and Peter don't
> like this line of code:
>
> + tss->buflen1 = TYPEALIGN(TEXTBUFLEN, len1);
I can only speak for myself, though I agree with your summary here.
> What would you like it t
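
As a rough illustration of the line under dispute, the sketch below shows what rounding a length up to a multiple of TEXTBUFLEN does. It assumes the alignment value is a power of two (1024 is the value quoted elsewhere in this thread); `align_up` and `SKETCH_TEXTBUFLEN` are hypothetical names and this is not the TYPEALIGN macro itself.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed value, taken from the thread's mention that TEXTBUFLEN is 1024. */
#define SKETCH_TEXTBUFLEN 1024

/*
 * Round len up to the next multiple of alignval.
 * Only valid when alignval is a power of two.
 */
size_t align_up(size_t alignval, size_t len)
{
    /* A power of two has exactly one bit set. */
    assert(alignval != 0 && (alignval & (alignval - 1)) == 0);
    return (len + (alignval - 1)) & ~(alignval - 1);
}
```

With this policy the buffer grows in fixed 1024-byte steps rather than geometrically, so a string slightly over 4kB gets a 5kB buffer instead of an 8kB one.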
On Fri, Jun 15, 2012 at 6:10 PM, Tom Lane wrote:
> I would be concerned about this if it were per-sort-tuple wastage, but
> what I understood to be happening was that this was a single instance of
> an expansible buffer (per sort, perhaps, but still just one buffer).
> And, as you keep pointing ou
On 21 June 2012 11:40, Florian Pflug wrote:
> At this point, my theory is that your choice of "random" strings prevents
> strxfrm() from ever winning over strcoll(). The reason being that you pick
> each letter uniformly distributed from a-z, resulting in a probability of
> two strings differing in
On Jun21, 2012, at 11:55 , Peter Geoghegan wrote:
> On 21 June 2012 10:24, Florian Pflug wrote:
>> On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
>>> I've written a very small C++ program, which I've attached, that
>>> basically proves that this can still make a fairly large difference -
>>> I
On 21 June 2012 10:24, Florian Pflug wrote:
> On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
>> I've written a very small C++ program, which I've attached, that
>> basically proves that this can still make a fairly large difference -
>> I hope it's okay that it's C++, but that allowed me t
On Jun21, 2012, at 02:22 , Peter Geoghegan wrote:
> I've written a very small C++ program, which I've attached, that
> basically proves that this can still make a fairly large difference -
> I hope it's okay that it's C++, but that allowed me to write the
> program quickly, with no dependencie
On Jun20, 2012, at 19:38 , Peter Geoghegan wrote:
> On 20 June 2012 17:41, Tom Lane wrote:
>> In any case, if you have to redefine the meaning of equality in order
>> to justify a performance patch, I'm prepared to walk away at the start.
>
> The advantage of my proposed implementation is precise
On 21 June 2012 01:22, Peter Geoghegan wrote:
> I've written a very small C++ program, which I've attached, that
> basically proves that this can still make a fairly large difference -
> I hope it's okay that it's C++, but that allowed me to write the
> program quickly, with no dependencies f
On 20 June 2012 17:41, Tom Lane wrote:
> Peter Geoghegan writes:
>> No, I'm suggesting it would probably be at least a bit of a win here
>> to cache the constant, and only have to do a strxfrm() + strcmp() per
>> comparison.
>
> Um, have you got any hard evidence to support that notion? The
> tr
On Wed, Jun 20, 2012 at 12:41 PM, Tom Lane wrote:
>> The fact is that this is likely to be a fairly significant
>> performance win, because strxfrm() is quite simply the way you're
>> supposed to do collation-aware sorting, and is documented as such. For
>> that reason, C standard library implemen
Peter Geoghegan writes:
> No, I'm suggesting it would probably be at least a bit of a win here
> to cache the constant, and only have to do a strxfrm() + strcmp() per
> comparison.
Um, have you got any hard evidence to support that notion? The
traditional advice is that strcoll is faster than us
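
To make the two approaches being compared concrete, here is a minimal sketch of the "strxfrm() once, then strcmp()" idea in plain C. It is not the patch under discussion; `make_sort_key` and `compare_via_keys` are hypothetical helpers. In a real sort the keys would presumably be computed once per string and cached, rather than rebuilt on every comparison as this sketch does.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/*
 * Build a strxfrm() key for s in the current LC_COLLATE locale.
 * Returns malloc'd memory the caller must free, or NULL on failure.
 */
char *make_sort_key(const char *s)
{
    size_t need;
    char *key;

    assert(s != NULL);
    /* With a zero size, strxfrm() only reports the length it needs. */
    need = strxfrm(NULL, s, 0) + 1;
    key = malloc(need);
    if (key != NULL)
        strxfrm(key, s, need);
    return key;
}

/*
 * Compare two strings by their transformed keys. strcmp() on the keys
 * should order the same way strcoll() orders the original strings.
 */
int compare_via_keys(const char *a, const char *b)
{
    char *ka = make_sort_key(a);
    char *kb = make_sort_key(b);
    int result;

    if (ka == NULL || kb == NULL)
        result = strcoll(a, b);     /* fall back if allocation failed */
    else
        result = strcmp(ka, kb);
    free(ka);
    free(kb);
    return result;
}
```

Whether this beats calling strcoll() directly depends on how many comparisons each key is reused for and on how large the transformed keys are, which is the open question in this part of the thread.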
On 20 June 2012 15:55, Tom Lane wrote:
> Peter Geoghegan writes:
>> I think that this change may have made the difference between the
>> Hungarians getting away with it and not getting away with it. Might it
>> have been that for text, they were using some operator that wasn't '='
>> (perhaps one
Peter Geoghegan writes:
> I think that this change may have made the difference between the
> Hungarians getting away with it and not getting away with it. Might it
> have been that for text, they were using some operator that wasn't '='
> (perhaps one which has no fastpath, and thus correctly mad
On Wed, Jun 20, 2012 at 3:19 PM, Peter Geoghegan wrote:
>> It occurs to me that strxfrm would answer this question. If we made
>> the hash function hash the result of strxfrm then we could make
>> equality use strcoll and not fall back to strcmp.
>
> What about per-column collations?
Well collati
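
A rough sketch of the "hash the result of strxfrm" idea quoted above, under the assumption that strings which strcoll() treats as equal produce identical strxfrm() keys. FNV-1a is swapped in here purely as a simple stand-in hash; `hash_collation_key` is a hypothetical name and this is not PostgreSQL's text hash function.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/*
 * Hash the strxfrm() key of s rather than its raw bytes, so that two
 * strings with the same collation key get the same hash code.
 */
uint32_t hash_collation_key(const char *s)
{
    size_t need;
    char *key;
    const unsigned char *p;
    uint32_t h = 2166136261u;       /* FNV-1a 32-bit offset basis */

    assert(s != NULL);
    need = strxfrm(NULL, s, 0) + 1;
    key = malloc(need);
    if (key == NULL)
        return 0;                   /* sketch only: real code would report the error */
    strxfrm(key, s, need);

    for (p = (const unsigned char *) key; *p != '\0'; p++)
    {
        h ^= *p;
        h *= 16777619u;             /* FNV-1a 32-bit prime */
    }
    free(key);
    return h;
}
```

The per-column collation question raised in the reply still applies: the key, and therefore the hash, depends on whichever locale is in effect when strxfrm() is called.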
On 20 June 2012 15:10, Greg Stark wrote:
> On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane wrote:
>> The trick for hashing such datatypes is to be able to guarantee that
>> "equal" values hash to the same hash code, which is typically possible
>> as long as you know the equality rules well enough. We
On Sun, Jun 17, 2012 at 9:26 PM, Tom Lane wrote:
> The trick for hashing such datatypes is to be able to guarantee that
> "equal" values hash to the same hash code, which is typically possible
> as long as you know the equality rules well enough. We could possibly
> do that for text with pure-str
On 19 June 2012 19:44, Peter Geoghegan wrote:
> PostgreSQL supported Unicode before 2005, when the tie-breaker was
> introduced. I know at least one Swede who used Postgres95. I just took
> a look at the REL6_4 branch, and it looks much the same in 1999 as it
> did in 2005, in that there is no tie
On 20 June 2012 11:00, Peter Eisentraut wrote:
> On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote:
>> So if you take the word "Aßlar" here - that is equivalent to "Asslar",
>> and so strcoll("Aßlar", "Asslar") will return 0 if you have the right
>> LC_COLLATE
>
> This is not actually corre
On sön, 2012-06-17 at 23:58 +0100, Peter Geoghegan wrote:
> So if you take the word "Aßlar" here - that is equivalent to "Asslar",
> and so strcoll("Aßlar", "Asslar") will return 0 if you have the right
> LC_COLLATE
This is not actually correct. glibc will sort Asslar before Aßlar, and
that is co
On 19 June 2012 19:44, Peter Geoghegan wrote:
> You could do that, and some people do use custom collations for
> various reasons. That's obviously very much of minority interest
> though. Most people will just use citext or something. However, since
> citext is itself a client of varstr_cmp(), th
"Kevin Grittner" writes:
> I wasn't aware that en_US.UTF-8 doesn't have equivalence without
> equality. I guess that surprising result in my last post is just
> plain inevitable with that collation then. Bummer. Is there
> actually anyone who finds that to be a useful behavior? For a
> collati
On 19 June 2012 18:57, Kevin Grittner wrote:
> We weren't using en_US.UTF-8 collation (or any other "proper"
> collation) on Sybase -- I'm not sure whether they even supported
> proper collation sequences on the versions we used. I'm thinking of
> when we were using their "case insensitive" sorti
Peter Geoghegan wrote:
> Kevin Grittner wrote:
>> I'm pretty sure that when I was using Sybase ASE the order for
>> non-equal values was always predictable, and it behaved in the
>> manner I describe below. I'm less sure about any other product.
>
> Maybe it used a physical row identifier as
On 19 June 2012 17:45, Kevin Grittner wrote:
> Peter Geoghegan wrote:
>> Are you sure that they actually have a tie-breaker, and don't just
>> make the distinction between equality and equivalence (if only
>> internally)?
>
> I'm pretty sure that when I was using Sybase ASE the order for
> non-eq
Peter Geoghegan wrote:
> Kevin Grittner wrote:
>> Since we appear to be questioning everything in this area, I'll
>> raise something which has been bugging me for a while: in some
>> other systems I've used, the "tie-breaker" comparison for
>> equivalent values comes after equivalence sorting o
On 19 June 2012 16:17, Kevin Grittner wrote:
> Peter Geoghegan wrote:
>
>> So, just to give a bit more weight to my argument that we should
>> recognise that equivalent strings ought to be treated identically
>
> Since we appear to be questioning everything in this area, I'll
> raise something wh
Peter Geoghegan wrote:
> So, just to give a bit more weight to my argument that we should
> recognise that equivalent strings ought to be treated identically
Since we appear to be questioning everything in this area, I'll
raise something which has been bugging me for a while: in some other
sys
So, just to give a bit more weight to my argument that we should
recognise that equivalent strings ought to be treated identically, I
direct your attention to conformance requirement C9 of Unicode 3.0:
http://www.unicode.org/unicode/standard/versions/enumeratedversions.html#Unicode_3_0_0
This "re
On 18 June 2012 16:59, Peter Geoghegan wrote:
> Perhaps more importantly, I cannot recreate any of these problems on
> my Fedora 16 machine. Even with hu_HU on LATIN2, Tom's original test
> case (from 2005, on a Fedora 4 machine) cannot be recreated. So it may
> be that they've tightened these thi
On 18 June 2012 00:38, Tom Lane wrote:
> The only reason we test "a = b" and not "a < b || a > b"
> is that the latter is at least twice as expensive to evaluate.
Perhaps I've been unclear. I meant to write "(!(a < b) && !(b < a))",
and not "(a < b && b < a)". The former is not tautological when
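
The distinction can be sketched in C with any ordering that is coarser than bytewise identity. The example below uses a case-insensitive "less than" purely as an assumed stand-in for a locale comparison; `less_ci`, `equivalent_ci` and `identical` are hypothetical names.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* A strict weak ordering that ignores ASCII case. */
bool less_ci(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    while (*a != '\0' && *b != '\0')
    {
        int ca = tolower((unsigned char) *a);
        int cb = tolower((unsigned char) *b);

        if (ca != cb)
            return ca < cb;
        a++;
        b++;
    }
    /* a sorts first only if it is a proper prefix of b */
    return *a == '\0' && *b != '\0';
}

/* Equivalence derived from the ordering: neither sorts before the other. */
bool equivalent_ci(const char *a, const char *b)
{
    return !less_ci(a, b) && !less_ci(b, a);
}

/* Equality in the strict sense: the same bytes. */
bool identical(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}
```

Here "Hello" and "hello" are equivalent under the ordering but not identical, which is the gap between `!(a < b) && !(b < a)` and `a = b` that the message is pointing at.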
Peter Geoghegan writes:
> ISTM if '=' was really a mere equivalency operator, we'd only ever
> check (a < b && b < a) in the btree code.
You're not really making a lot of sense here, or at least I'm not
grasping the distinction you want to draw. btree indexes (and sorting
in general) require th
On 17 June 2012 23:58, Peter Geoghegan wrote:
> We can decree that equivalency implies equality, or make all this
> internal (which, perversely, I suppose the C++ committee people
> cannot).
Sorry, that should obviously read "equality implies equivalency". We
may not have to decree it, because it
On 17 June 2012 21:26, Tom Lane wrote:
> Sure, and in general we only expect that "=" operators mean equivalency;
> a concrete example is float8 "=", which on IEEE-spec machines will say
> that zero and minus zero are equal.
Right; the spec says that, and we punt to the spec. No one sensible
thin
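
The float8 example can be checked directly in C. This small sketch assumes an IEEE 754 platform, as the quoted text does; the function names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Equality as the "=" operator sees it: IEEE comparison. */
bool float8_equal(double a, double b)
{
    return a == b;
}

/* Equality of the underlying representation, byte for byte. */
bool float8_bitwise_equal(double a, double b)
{
    return memcmp(&a, &b, sizeof(double)) == 0;
}

/* On IEEE machines zero and minus zero compare equal but differ in the sign bit. */
void check_signed_zero(void)
{
    assert(float8_equal(0.0, -0.0));
    assert(!float8_bitwise_equal(0.0, -0.0));
}
```

Any hash function for such a type has to map both zeros to the same hash code, which is the same constraint being discussed for text.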
Peter Geoghegan writes:
> Right, most people won't care. You may or may not want a new
> operator for equivalency. The regular operator for equality doesn't have to
> and shouldn't change. It is both useful and conceptually clean to not
> guarantee that a comparator can be relied upon to indicate eq
On Jun 17, 2012 5:50 PM, "Tom Lane" wrote:
>
> Peter Geoghegan writes:
> > On 17 June 2012 17:01, Tom Lane wrote:
> How exactly do you plan to shoehorn that into SQL? You could invent
> some nonstandard "equivalence" operator I suppose, but what will be the
> value? We aren't going to set thin
Peter Geoghegan writes:
> On 17 June 2012 17:01, Tom Lane wrote:
>> The killer reason why it must be like that is that you can't use hash
>> methods on text if text equality is some unknown condition subtly
>> different from bitwise equality.
> Fair enough, but I doubt that we need to revert the
On 17 June 2012 17:01, Tom Lane wrote:
>> I'm not sure I agree with this decision; why should we presume to know
>> better than the glibc locale what constitutes equality?
>
> The killer reason why it must be like that is that you can't use hash
> methods on text if text equality is some unknown c
Peter Geoghegan writes:
> The fly in the ointment for strxfrm() adoption may be the need to be
> consistent with this earlier behaviour:
> if strcoll claims two strings are equal, check it with strcmp, and
> sort according to strcmp if not identical.
> I'm not sure I agree with this deci
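
The behaviour quoted above (check a strcoll() tie with strcmp(), and sort by strcmp() if the strings are not identical) can be sketched as follows. `compare_with_tiebreak` is a hypothetical name and this is not the actual varstr_cmp() code.

```c
#include <assert.h>
#include <string.h>

/*
 * Locale-aware comparison with a bytewise tie-breaker, so that only
 * identical strings ever compare as equal.
 */
int compare_with_tiebreak(const char *a, const char *b)
{
    int result;

    assert(a != NULL && b != NULL);
    result = strcoll(a, b);
    if (result == 0)
        result = strcmp(a, b);      /* the locale says "equal"; fall back to bytes */
    return result;
}
```

A strxfrm()-based comparator would need the same final strcmp() on the original strings to stay consistent with this rule, which is the complication the message raises.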
The fly in the ointment for strxfrm() adoption may be the need to be
consistent with this earlier behaviour:
commit 656beff59033ccc5261a615802e1a85da68e8fad
Author: Tom Lane
Date: Thu Dec 22 22:50:00 2005 +
Adjust string comparison so that only bitwise-equal strings are considered
On 18 March 2012 15:08, Tom Lane wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compare the perf
Robert Haas writes:
> On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane wrote:
>> Maybe I missed something, but as far as I saw your argument was not that
>> the performance wasn't bad but that the rest of the sort code would
>> dominate the runtime anyway. I grant that entirely, but that doesn't
>> mea
On 15 June 2012 21:06, Robert Haas wrote:
> On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane wrote:
>> Robert Haas writes:
>>> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane wrote:
>>>> (And from a performance standpoint, I'm not entirely convinced it's not
>>>> a bug, anyway. Worst-case behavior could b
On Fri, Jun 15, 2012 at 1:45 PM, Tom Lane wrote:
> Robert Haas writes:
>> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane wrote:
>>> (And from a performance standpoint, I'm not entirely convinced it's not
>>> a bug, anyway. Worst-case behavior could be pretty bad.)
>
>> Instead of simply asserting t
Robert Haas writes:
> On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane wrote:
>> (And from a performance standpoint, I'm not entirely convinced it's not
>> a bug, anyway. Worst-case behavior could be pretty bad.)
> Instead of simply asserting that, could you respond to the specific
> points raised in
On Fri, Jun 15, 2012 at 12:22 PM, Tom Lane wrote:
> Peter Geoghegan writes:
>> On 14 June 2012 19:28, Robert Haas wrote:
>>> I thought that doubling repeatedly would be overly aggressive in terms
>>> of memory usage.
>
>> I fail to understand how this sortsupport buffer fundamentally differs
>>
Peter Geoghegan writes:
> On 14 June 2012 19:28, Robert Haas wrote:
>> I thought that doubling repeatedly would be overly aggressive in terms
>> of memory usage.
> I fail to understand how this sortsupport buffer fundamentally differs
> from a generic dynamic array abstraction built to contain c
On Thu, Jun 14, 2012 at 6:30 PM, Peter Geoghegan wrote:
>> Here we know that it doesn't matter, so the application of Knuth's first law
>> of optimization is appropriate.
>
> I'm not advocating some Byzantine optimisation, or even something that
> could reasonably be described as an optimisation a
On 14 June 2012 20:32, Robert Haas wrote:
> Yeah, but *it doesn't matter*. If you test this on strings that are
> long enough that they get pushed out to TOAST, you'll find that it
> doesn't measurably improve performance, because the overhead of
> detoasting so completely dominates any savings o
On Thu, Jun 14, 2012 at 3:24 PM, Peter Geoghegan wrote:
> On 14 June 2012 19:28, Robert Haas wrote:
>> I thought that doubling repeatedly would be overly aggressive in terms
>> of memory usage. Blowing the buffers out to 8kB because we hit a
>> string that's a bit over 4kB isn't so bad, but blow
On 14 June 2012 19:28, Robert Haas wrote:
> I thought that doubling repeatedly would be overly aggressive in terms
> of memory usage. Blowing the buffers out to 8kB because we hit a
> string that's a bit over 4kB isn't so bad, but blowing them out to
> 256MB because we hit a string that's a bit o
On Thu, Jun 14, 2012 at 2:10 PM, Peter Geoghegan wrote:
> Why have you made the reusable buffer managed by sortsupport
> TEXTBUFLEN-aligned? The existing rationale for that constant (whose
> value is 1024) does not seem to carry forward here:
>
> * This should be large enough that most strings wi
On Thu, Jun 14, 2012 at 1:56 PM, Peter Geoghegan wrote:
> On 14 June 2012 17:35, Robert Haas wrote:
>> The problem with pre-detoasting to save comparison cycles is that you
>> can now fit many, many fewer tuples in work_mem. There might be cases
>> where it wins (for example, because the entire
On 2 March 2012 20:45, Robert Haas wrote:
> I decided to investigate the possible virtues of allowing "text" to
> use the sortsupport infrastructure, since strings are something people
> often want to sort.
I should mention up-front that I agree with the idea that it is worth
optimising text sort
On 14 June 2012 17:35, Robert Haas wrote:
> The problem with pre-detoasting to save comparison cycles is that you
> can now fit many, many fewer tuples in work_mem. There might be cases
> where it wins (for example, because the entire data set fits even
> after decompressing everything) but in mo
On Thu, Jun 14, 2012 at 11:36 AM, Peter Geoghegan wrote:
> On 18 March 2012 15:08, Tom Lane wrote:
>> One other thing I've always wondered about in this connection is the
>> general performance of sorting toasted datums. Is it better to detoast
>> them in every comparison, or pre-detoast to save
On 18 March 2012 15:08, Tom Lane wrote:
> One other thing I've always wondered about in this connection is the
> general performance of sorting toasted datums. Is it better to detoast
> them in every comparison, or pre-detoast to save comparison cycles at
> the cost of having to push much more da
On Mon, Mar 19, 2012 at 9:23 PM, Martijn van Oosterhout
wrote:
> Ouch. I was holding out hope that you could get a meaningful
> improvement if we could use the first X bytes of the strxfrm output so
> you only need to do a strcoll on strings that actually nearly match.
> But with an information de
On Mon, Mar 19, 2012 at 12:19:53PM -0400, Robert Haas wrote:
> On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark wrote:
> > On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas wrote:
> >> 12789 28.2686 libc-2.13.so strcoll_l
> >> 6802 15.0350 postgres text_cmp
> >
> > I'm s
On Sun, Mar 18, 2012 at 11:08 AM, Tom Lane wrote:
> However, it occurred to me that we could pretty easily jury-rig
> something that would give us an idea about the actual benefit available
> here. To wit: make a C function that wraps strxfrm, basically
> strxfrm(text) returns bytea. Then compar
On Sat, Mar 17, 2012 at 6:58 PM, Greg Stark wrote:
> On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas wrote:
>> 12789 28.2686 libc-2.13.so strcoll_l
>> 6802 15.0350 postgres text_cmp
>
> I'm still curious how it would compare to call strxfrm and sort the
> resultin
Greg Stark writes:
> I'm still curious how it would compare to call strxfrm and sort the
> resulting binary blobs.
In principle that should be a win; it's hard to believe that strxfrm
would have gotten into the standards if it were not a win for sorting
applications.
> I don't think the sortsupp
On Fri, Mar 2, 2012 at 8:45 PM, Robert Haas wrote:
> 12789 28.2686 libc-2.13.so strcoll_l
> 6802 15.0350 postgres text_cmp
I'm still curious how it would compare to call strxfrm and sort the
resulting binary blobs. I don't think the sortsupport stuff actually
Noah Misch wrote:
> On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
>> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>> [13% faster with patch for C collation; 7% faster for UTF8]
>> I had hoped for more like a 15-20% gain from this approach, but
>> it didn't happe
On Fri, Mar 02, 2012 at 03:45:38PM -0500, Robert Haas wrote:
> SELECT SUM(1) FROM (SELECT * FROM randomtext ORDER BY t) x;
>
> On unpatched master, this takes about 416 ms (plus or minus a few).
> With the attached patch, it takes about 389 ms (plus or minus a very
> few), a speedup of about 7%.
>