Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-07 Thread Robert Haas
On Wed, Nov 5, 2014 at 8:30 PM, Peter Geoghegan p...@heroku.com wrote:
 Thanks for the review.

Committed with some copy-editing based on Andreas's comments.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-07 Thread Peter Geoghegan
On Fri, Nov 7, 2014 at 12:52 PM, Robert Haas robertmh...@gmail.com wrote:
 Committed with some copy-editing based on Andreas's comments.

Thank you both.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-06 Thread Andreas Karlsson

On 11/06/2014 02:30 AM, Peter Geoghegan wrote:

Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson andr...@proxel.se wrote:

I looked at the changes to the code. The new code is clean and there is more
code re-use and improved readability. On possible further improvement would
be to move the preparation of SortSupport to a common function since this is
done three time in the code.


The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.


Ok, I am fine with either.


Is there any case where we should expect any greater performance
improvement?


The really compelling case is abbreviated keys - as you probably know,
there is a patch that builds on this patch, and the abbreviated keys
patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).


Ah, I see.

--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-05 Thread Jim Nasby

On 10/10/14, 7:26 PM, Peter Geoghegan wrote:

Both Robert and Heikki expressed dissatisfaction with the fact that
B-Tree index builds don't use sortsupport. Because B-Tree index builds
cannot really use the onlyKey optimization, the historic oversight
of not supporting B-Tree builds (and CLUSTER) might have been at least
a little understandable [1]. But with the recent work on sortsupport
for text, it's likely that B-Tree index builds will be missing out on
rather a lot by not taking advantage of sortsupport.

Attached patch modifies tuplesort to correct this oversight. What's
really nice about it is that there is actually a net negative code
footprint:


Did anything ever happen with this?
--
Jim Nasby, Data Architect, Blue Treble Consulting
Data in Trouble? Get it in Treble! http://BlueTreble.com


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-05 Thread Peter Geoghegan
On Wed, Nov 5, 2014 at 8:36 AM, Jim Nasby jim.na...@bluetreble.com wrote:
 Did anything ever happen with this?

I consider this to be related to the abbreviated keys stuff, although
it is clearly independently valuable (so it could be reviewed
independently). Since Robert ran out of time to work on abbreviated
keys (hopefully temporarily), it's not all that surprising that no one
has paid attention to this smaller piece of work. I see that Andreas
Karlsson is signed up to review this, so that's something.


-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-05 Thread Andreas Karlsson
On 10/11/2014 02:26 AM, Peter Geoghegan wrote: Both Robert and Heikki 
expressed dissatisfaction with the fact that

 B-Tree index builds don't use sortsupport. Because B-Tree index builds
 cannot really use the onlyKey optimization, the historic oversight
 of not supporting B-Tree builds (and CLUSTER) might have been at least
 a little understandable [1]. But with the recent work on sortsupport
 for text, it's likely that B-Tree index builds will be missing out on
 rather a lot by not taking advantage of sortsupport.

 Attached patch modifies tuplesort to correct this oversight. What's
 really nice about it is that there is actually a net negative code
 footprint:

   src/backend/access/nbtree/nbtsort.c  |  63 +++---
   src/backend/utils/sort/sortsupport.c |  77 ++--
   src/backend/utils/sort/tuplesort.c   | 274 +++
   src/include/utils/sortsupport.h  |   3 +
   4 files changed, 203 insertions(+), 214 deletions(-)

The code compiles and passes the test suite.

I looked at the changes to the code. The new code is clean and there is 
more code re-use and improved readability. On possible further 
improvement would be to move the preparation of SortSupport to a common 
function since this is done three time in the code.


I did some simple benchmarks by adding indexes to temporary tables and 
could see improvements of around 10% in index build time. So it gives a 
nice, but not amazing, performance improvement.


Is there any case where we should expect any greater performance 
improvement?


Either way I find this a nice patch which improves code quality and 
performance.


= Minor code style issues I found

- There is a double space in strategy  = (scanKey-sk_flags [...].
- I think there should be a newline in tuplesort_begin_index_btree() 
before /* Prepare SortSupport data for each column */.

- Remove the extra newline after reversedirection().
- End sentences in comments with period. That seems to be the common 
practice in the project.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport

2014-11-05 Thread Peter Geoghegan
Thanks for the review.

On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson andr...@proxel.se wrote:
 I looked at the changes to the code. The new code is clean and there is more
 code re-use and improved readability. On possible further improvement would
 be to move the preparation of SortSupport to a common function since this is
 done three time in the code.

The idea there is to have more direct control of sortsupport. With the
abbreviated keys patch, abbreviation occurs based on a decision made
by tuplesort.c. I can see why you'd say that, but I prefer to keep
initialization of sortsupport structs largely concentrated in
tuplesort.c, and more or less uniform regardless of the tuple-type
being sorted.

 I did some simple benchmarks by adding indexes to temporary tables and could
 see improvements of around 10% in index build time. So it gives a nice, but
 not amazing, performance improvement.

Cool.

 Is there any case where we should expect any greater performance
 improvement?

The really compelling case is abbreviated keys - as you probably know,
there is a patch that builds on this patch, and the abbreviated keys
patch, so that B-Tree builds and CLUSTER can use abbreviated keys too.
That doesn't really have much to do with this patch, though. The
important point is that heap tuple sorting (with a query that has no
client overhead, and involves one big sort) and B-Tree index creation
both have their tuplesort as a totally dominant cost. The improvements
for each case should be quite comparable, which is good (except, as
noted in my opening e-mail, when heap/datum tuplesorting can use the
onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot).

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers