Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On Wed, Nov 5, 2014 at 8:30 PM, Peter Geoghegan p...@heroku.com wrote: Thanks for the review. Committed with some copy-editing based on Andreas's comments. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On Fri, Nov 7, 2014 at 12:52 PM, Robert Haas robertmh...@gmail.com wrote: Committed with some copy-editing based on Andreas's comments. Thank you both. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On 11/06/2014 02:30 AM, Peter Geoghegan wrote: Thanks for the review. On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson andr...@proxel.se wrote: I looked at the changes to the code. The new code is clean and there is more code re-use and improved readability. On possible further improvement would be to move the preparation of SortSupport to a common function since this is done three time in the code. The idea there is to have more direct control of sortsupport. With the abbreviated keys patch, abbreviation occurs based on a decision made by tuplesort.c. I can see why you'd say that, but I prefer to keep initialization of sortsupport structs largely concentrated in tuplesort.c, and more or less uniform regardless of the tuple-type being sorted. Ok, I am fine with either. Is there any case where we should expect any greater performance improvement? The really compelling case is abbreviated keys - as you probably know, there is a patch that builds on this patch, and the abbreviated keys patch, so that B-Tree builds and CLUSTER can use abbreviated keys too. That doesn't really have much to do with this patch, though. The important point is that heap tuple sorting (with a query that has no client overhead, and involves one big sort) and B-Tree index creation both have their tuplesort as a totally dominant cost. The improvements for each case should be quite comparable, which is good (except, as noted in my opening e-mail, when heap/datum tuplesorting can use the onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot). Ah, I see. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On 10/10/14, 7:26 PM, Peter Geoghegan wrote: Both Robert and Heikki expressed dissatisfaction with the fact that B-Tree index builds don't use sortsupport. Because B-Tree index builds cannot really use the onlyKey optimization, the historic oversight of not supporting B-Tree builds (and CLUSTER) might have been at least a little understandable [1]. But with the recent work on sortsupport for text, it's likely that B-Tree index builds will be missing out on rather a lot by not taking advantage of sortsupport. Attached patch modifies tuplesort to correct this oversight. What's really nice about it is that there is actually a net negative code footprint: Did anything ever happen with this? -- Jim Nasby, Data Architect, Blue Treble Consulting Data in Trouble? Get it in Treble! http://BlueTreble.com -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On Wed, Nov 5, 2014 at 8:36 AM, Jim Nasby jim.na...@bluetreble.com wrote: Did anything ever happen with this? I consider this to be related to the abbreviated keys stuff, although it is clearly independently valuable (so it could be reviewed independently). Since Robert ran out of time to work on abbreviated keys (hopefully temporarily), it's not all that surprising that no one has paid attention to this smaller piece of work. I see that Andreas Karlsson is signed up to review this, so that's something. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
On 10/11/2014 02:26 AM, Peter Geoghegan wrote: Both Robert and Heikki expressed dissatisfaction with the fact that B-Tree index builds don't use sortsupport. Because B-Tree index builds cannot really use the onlyKey optimization, the historic oversight of not supporting B-Tree builds (and CLUSTER) might have been at least a little understandable [1]. But with the recent work on sortsupport for text, it's likely that B-Tree index builds will be missing out on rather a lot by not taking advantage of sortsupport. Attached patch modifies tuplesort to correct this oversight. What's really nice about it is that there is actually a net negative code footprint: src/backend/access/nbtree/nbtsort.c | 63 +++--- src/backend/utils/sort/sortsupport.c | 77 ++-- src/backend/utils/sort/tuplesort.c | 274 +++ src/include/utils/sortsupport.h | 3 + 4 files changed, 203 insertions(+), 214 deletions(-) The code compiles and passes the test suite. I looked at the changes to the code. The new code is clean and there is more code re-use and improved readability. On possible further improvement would be to move the preparation of SortSupport to a common function since this is done three time in the code. I did some simple benchmarks by adding indexes to temporary tables and could see improvements of around 10% in index build time. So it gives a nice, but not amazing, performance improvement. Is there any case where we should expect any greater performance improvement? Either way I find this a nice patch which improves code quality and performance. = Minor code style issues I found - There is a double space in strategy = (scanKey-sk_flags [...]. - I think there should be a newline in tuplesort_begin_index_btree() before /* Prepare SortSupport data for each column */. - Remove the extra newline after reversedirection(). - End sentences in comments with period. That seems to be the common practice in the project. -- Andreas Karlsson -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] B-Tree index builds, CLUSTER, and sortsupport
Thanks for the review. On Wed, Nov 5, 2014 at 4:33 PM, Andreas Karlsson andr...@proxel.se wrote: I looked at the changes to the code. The new code is clean and there is more code re-use and improved readability. On possible further improvement would be to move the preparation of SortSupport to a common function since this is done three time in the code. The idea there is to have more direct control of sortsupport. With the abbreviated keys patch, abbreviation occurs based on a decision made by tuplesort.c. I can see why you'd say that, but I prefer to keep initialization of sortsupport structs largely concentrated in tuplesort.c, and more or less uniform regardless of the tuple-type being sorted. I did some simple benchmarks by adding indexes to temporary tables and could see improvements of around 10% in index build time. So it gives a nice, but not amazing, performance improvement. Cool. Is there any case where we should expect any greater performance improvement? The really compelling case is abbreviated keys - as you probably know, there is a patch that builds on this patch, and the abbreviated keys patch, so that B-Tree builds and CLUSTER can use abbreviated keys too. That doesn't really have much to do with this patch, though. The important point is that heap tuple sorting (with a query that has no client overhead, and involves one big sort) and B-Tree index creation both have their tuplesort as a totally dominant cost. The improvements for each case should be quite comparable, which is good (except, as noted in my opening e-mail, when heap/datum tuplesorting can use the onlyKey optimization, while B-Tree/CLUSTER tuplesorting cannot). -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers