Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-02-12 Thread Martijn van Oosterhout
On Sun, Feb 02, 2014 at 05:09:06PM -0800, Peter Geoghegan wrote: However, it also occurs to me that strxfrm() blobs have another useful property: We (as, say, the author of an equality operator on text, an operator intended for a btree operator class) *can* trust a strcmp()'s result on blobs,
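
As a rough illustration of the property mentioned above (not code from the thread): the C standard specifies that strcmp() on two strxfrm() results has the same sign as strcoll() on the original strings. The sketch below checks that for one pair of strings under whatever locale is currently active. The helper names `make_blob`, `sign`, and `blob_order_matches_strcoll` are hypothetical and chosen only for this example.

```c
#include <assert.h>
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: return a malloc'd strxfrm() blob for s, or NULL on OOM. */
static char *
make_blob(const char *s)
{
    /* With n == 0 the destination may be NULL; the result is the blob length
     * excluding the terminating NUL. */
    size_t need = strxfrm(NULL, s, 0);
    char  *blob = malloc(need + 1);
    size_t got;

    if (blob == NULL)
        return NULL;
    got = strxfrm(blob, s, need + 1);
    assert(got < need + 1);     /* otherwise the blob contents are indeterminate */
    (void) got;                 /* silence unused warning when NDEBUG is set */
    return blob;
}

/* Reduce a comparison result to -1, 0, or 1. */
static int
sign(int x)
{
    return (x > 0) - (x < 0);
}

/*
 * Return 1 if strcmp() on the blobs agrees in sign with strcoll() on the
 * originals, 0 if it does not, and -1 if memory could not be allocated.
 */
static int
blob_order_matches_strcoll(const char *a, const char *b)
{
    char *ba = make_blob(a);
    char *bb = make_blob(b);
    int   ok = -1;

    if (ba != NULL && bb != NULL)
        ok = (sign(strcmp(ba, bb)) == sign(strcoll(a, b)));
    free(ba);
    free(bb);
    return ok;
}
```

A caller would normally run setlocale(LC_COLLATE, "") first. Without that call the program stays in the "C" locale, where strcoll() already behaves like strcmp().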

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-02-12 Thread Peter Geoghegan
On Wed, Feb 12, 2014 at 3:30 PM, Martijn van Oosterhout klep...@svana.org wrote: (A bit late to the party). This idea has come up before and the most annoying thing is that braindead strxfrm API. Namely, to strxfrm a large string you need to strxfrm it completely even if you only want the
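
A minimal sketch of the API constraint being described, under one stated assumption: the C standard says that when strxfrm() returns a value greater than or equal to the buffer size passed in, the buffer contents are indeterminate. A caller that wants only the leading bytes of a blob therefore cannot portably pass a small buffer, and ends up transforming the whole string first. The helper name `blob_prefix` is hypothetical and not taken from the thread.

```c
#include <stdlib.h>
#include <string.h>

/*
 * Hypothetical helper: copy at most outlen leading bytes of the strxfrm()
 * blob for s into out. Returns the number of bytes copied, or (size_t) -1
 * if memory could not be allocated. The output is not NUL-terminated.
 */
static size_t
blob_prefix(const char *s, char *out, size_t outlen)
{
    /* First pass: ask how long the complete blob is (NULL is allowed when n == 0). */
    size_t full = strxfrm(NULL, s, 0);
    char  *tmp = malloc(full + 1);
    size_t n;

    if (tmp == NULL)
        return (size_t) -1;

    /* Second pass: the entire string is transformed, however small outlen is. */
    strxfrm(tmp, s, full + 1);

    n = (full < outlen) ? full : outlen;
    memcpy(out, tmp, n);
    free(tmp);
    return n;
}
```

The cost of both passes grows with the full length of the input, even when outlen is only a few bytes.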

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-02-02 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 8:51 PM, Peter Geoghegan p...@heroku.com wrote: I've done some more digging. It turns out that the 1977 paper "An Encoding Method for Multifield Sorting and Indexing" describes a technique that involves concatenating multiple column values and comparing them using a

[HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On more occasions than I care to recall, someone has suggested that it would be valuable to do something with strxfrm() blobs in order to have cheaper locale-aware text comparisons. One obvious place to do so would be in indexes, but in the past that has been dismissed on the following grounds: *

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: On more occasions than I care to recall, someone has suggested that it would be valuable to do something with strxfrm() blobs in order to have cheaper locale-aware text comparisons. One obvious place to do so would be in indexes, but in the past that has

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Quite aside from the index bloat risk, this effect means a 3-4x reduction in the maximum string length that can be indexed before getting the dreaded "Values larger than 1/3 of a buffer page cannot be indexed" error. Worse, a

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes: On Thu, Jan 30, 2014 at 4:34 PM, Tom Lane t...@sss.pgh.pa.us wrote: Quite aside from the index bloat risk, this effect means a 3-4x reduction in the maximum string length that can be indexed before getting the dreaded Values larger than 1/3 of a buffer

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 4:45 PM, Peter Geoghegan p...@heroku.com wrote: So we consider the appropriateness of a regular strcoll() or a strxfrm() in all contexts (in a generic and extensible manner, but that's essentially what we do). I'm not too discouraged by this restriction, because in

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 5:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's not hard to prevent. If that should happen, we don't go with the strxfrm() datum. We have a spare IndexTuple bit we could use to mark when the optimization was applied. You'd need a bit per column, no? I don't think

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 3:49 PM, Peter Geoghegan p...@heroku.com wrote: So ISTM that we could come up with an infrastructure, possibly just for insertion scanKeys (limiting the code footprint of all of this) in order to inner-page-process datums at this juncture, and store a blob instead, for

Re: [HACKERS] Making strxfrm() blobs in indexes work

2014-01-30 Thread Peter Geoghegan
On Thu, Jan 30, 2014 at 5:05 PM, Peter Geoghegan p...@heroku.com wrote: On Thu, Jan 30, 2014 at 5:04 PM, Tom Lane t...@sss.pgh.pa.us wrote: That's not hard to prevent. If that should happen, we don't go with the strxfrm() datum. We have a spare IndexTuple bit we could use to mark when the