Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Fri, Mar 31, 2017 at 3:03 PM, Brandur Leachwrote: > Thank you Peter for the assist here and great sleuthing in > the disassembly, and thanks Teodor for committing! > > Neha pointed out (thanks as well) a couple typos upthread > that I hadn't gotten around to fixing. I've attached a new > three line patch to sort those out. Committed. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Hello, Thank you Peter for the assist here and great sleuthing in the disassembly, and thanks Teodor for committing! Neha pointed out (thanks as well) a couple typos upthread that I hadn't gotten around to fixing. I've attached a new three line patch to sort those out. Brandur On Wed, Mar 29, 2017 at 1:31 PM, Teodor Sigaevwrote: > Thank you, pushed > > Excellent! I've attached a new (and hopefully final) >> version of the patch. >> > > -- > Teodor Sigaev E-mail: teo...@sigaev.ru >WWW: > http://www.sigaev.ru/ > 0001-Fix-two-typos-introduced-by-macaddr-SortSupport.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Thank you, pushed Excellent! I've attached a new (and hopefully final) version of the patch. -- Teodor Sigaev E-mail: teo...@sigaev.ru WWW: http://www.sigaev.ru/ -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Sun, Mar 19, 2017 at 8:40 AM, Greg Starkwrote: >> Out of idle curiosity, I decided to generate disassembly of both >> macaddr_cmp_internal(), and the patch's abbreviated comparator. The >> former consists of 49 x86-64 instructions at -02 on my machine, >> totaling 135 bytes of object code. The latter consists of only 10 >> instructions, or 24 bytes of object code. > > I wonder if there's something that could be optimized out of the > normal cmp function but we're defeating some compiler optimizations > with all our casts and aliasing. There was one shl instruction for every left shift (hibits() or lowbits() call) that appears in macaddr_cmp_internal(). I suppose that it's possible that that could have been better optimized on a big-endian machine, where abbreviated keys do not need to be byteswaped to make the abbreviated comparator work. Perhaps the compiler could have recognized that macaddr is a struct that consists of 6 unsigned bytes as digits. One thing that I've noticed makes a relatively big difference to instruction count in comparators is varlena overhead, which does come up here, since macaddr is a type that doesn't have a varlena header (it was recently suggested by Tom that this is a mistake on practical grounds, though). I've informally considered the possibility of providing alternative versions of comparators that do not detoast or work with anything other than 1-byte header varlenas, because tuplesort has detected that that happens to be generally safe. I doubt that I'll ever get around to posting a patch to do that, since the cost savings are probably still marginal. I could probably find something better to work on. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On 18 March 2017 at 22:22, Peter Geogheganwrote: > > Out of idle curiosity, I decided to generate disassembly of both > macaddr_cmp_internal(), and the patch's abbreviated comparator. The > former consists of 49 x86-64 instructions at -02 on my machine, > totaling 135 bytes of object code. The latter consists of only 10 > instructions, or 24 bytes of object code. I wonder if there's something that could be optimized out of the normal cmp function but we're defeating some compiler optimizations with all our casts and aliasing. -- greg -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Sat, Mar 18, 2017 at 2:54 PM, Peter Geogheganwrote: > This seems fine to me, especially > because it lets us compare macaddr using simple 3-way unsigned int > comparisons, which isn't otherwise the case. Out of idle curiosity, I decided to generate disassembly of both macaddr_cmp_internal(), and the patch's abbreviated comparator. The former consists of 49 x86-64 instructions at -02 on my machine, totaling 135 bytes of object code. The latter consists of only 10 instructions, or 24 bytes of object code. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Tue, Mar 14, 2017 at 3:25 AM, Heikki Linnakangaswrote: > Looks straightforward at a quick read-through. I have just a couple of > questions. How much of the performance gain comes from avoiding the > FunctionCallInvoke overhead, by simply having SortSupport with a comparison > function, and how much comes from the "abbreviation"? I assume it's very similar to the existing performance characteristics that UUID SortSupport has. > Also, is it worth the cycles and code to have the hyperloglog estimator, and > aborting the abbreviation if there are only a few distinct values. Or put > another way, how much does the abbreviation slow things down in the worst > case? Postgres doesn't support passing 6 byte types by-value due to a lack of support from macros in places like postgres.h. There is a GET_4_BYTES(), and a GET_8_BYTES(), but no GET_6_BYTES(). It doesn't really seem worth adding those, just like it didn't seem worth it back when I pointed out that a TID sort is very slow in the context of CREATE INDEX CONCURRENTLY. (We ended up coming up with an ad-hoc int8 encoding scheme there, which allowed us to use the int8 default opclass rather than the TID default opclass, you'll recall.) It's a bit odd that macaddr abbreviated keys are actually always smaller (not including padding) than sizeof(Datum) on 64-bit machines, but this guarantees that abbreviated comparisons will reliably indicate what an authoritative comparison would indicate, since of course all relevant information is right there in the abbreviated key. This will be the only time where that's generally true with abbreviated keys for any type. This seems fine to me, especially because it lets us compare macaddr using simple 3-way unsigned int comparisons, which isn't otherwise the case. I doubt that you'll ever see a real need to abort abbreviation. Arguably it shouldn't be considered, but maybe it's worth being consistent. It might matter on 32-bit platforms, where there is only a single NIC-specific byte to differentiate MAC addresses, but even that seems like a stretch. With a type like macaddr, where there is no major cost like strxfrm() to worry about when abbreviating, the only cost of abbreviating is that tie-breakers will have an extra index_getattr() or similar. If there was no abbreviation, or if it aborted, then the index_getattr() can happen once per SortTuple, up-front. Nitpick: the patch should probably not refer to 32-bit or 64-bit systems. Rather, it should refer to Datum size only. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Looks straightforward at a quick read-through. I have just a couple of questions. How much of the performance gain comes from avoiding the FunctionCallInvoke overhead, by simply having SortSupport with a comparison function, and how much comes from the "abbreviation"? Also, is it worth the cycles and code to have the hyperloglog estimator, and aborting the abbreviation if there are only a few distinct values. Or put another way, how much does the abbreviation slow things down in the worst case? - Heikki -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Hi Brandur, Couple of typo corrections required in patch: s/converstion/conversion s/go the heap/go to the heap The performance results you shared are for he Index Creation operation. Are there similar results for the sorting using ORDER BY queries too? Just curious. Regards, Neha
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Tue, Feb 28, 2017 at 08:58:24AM -0800, Brandur Leach wrote: > Hi Julien, > Hello Brandur, > Thanks for the expedient reply, even after I'd dropped the > ball for so long :) :) > Cool! I just re-read my own comment a few days later and I > think that it still mostly makes sense, but definitely open > to other edits if anyone else has one. > Ok. > > One last thing, I noticed that you added: > > > > +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2); > > > > but the existing function is declared as > > > > static int32 > > macaddr_internal_cmp(macaddr *a1, macaddr *a2) > > > > I'd be in favor to declare both as int. > > Great catch! I have no idea how I missed that. I've done as > you suggested and made them both "int", which seems > consistent with SortSupport implementations elsewhere. > Great. > > After this, I think this patch will be ready for committer. > > Excellent! I've attached a new (and hopefully final) > version of the patch. > > Two final questions about the process if you'd be so kind: > > * Should I change the status on the Commitfest link [1] or > do I leave that to you (or someone else like a committer)? > This is is done by the reviewer, after check of the last patch version. I just changed the status to ready for committer! > * I've been generating a new OID value with the > `unused_oids` script, but pretty much every time I rebase > I collide with someone else's addition and need to find a > new one. Is it better for me to pick an OID in an exotic > range for my final patch, or that a committer just finds > a new one (if necessary) as they're putting it into > master? I think picking the value with unused_oids as you dd is the right thing to do. As Robert said, if this oid is used in another patch in the meantime, updating it at commit time is not a big deal. Moreover, this patch will require a catversion bump, which is meant to be done by the committer. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Tue, Feb 28, 2017 at 10:28 PM, Brandur Leachwrote: > * I've been generating a new OID value with the > `unused_oids` script, but pretty much every time I rebase > I collide with someone else's addition and need to find a > new one. Is it better for me to pick an OID in an exotic > range for my final patch, or that a committer just finds > a new one (if necessary) as they're putting it into > master? It's nice if you can do it before the committer gets to it, but if it ends up conflicting I personally don't find fixing it to be a big deal. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Hi Julien, Thanks for the expedient reply, even after I'd dropped the ball for so long :) > Indeed, I should have checked more examples :/ There > isn't any clear pattern for this, so I guess any one > would be ok. Yeah, agreed. If it's alright with you, I ended up moving the naming back to `macaddr_cmp_internal` just because it results in a smaller final diff. > Thanks. I'm ok with this, but maybe a native english > speaker would have a better opinion on this. Cool! I just re-read my own comment a few days later and I think that it still mostly makes sense, but definitely open to other edits if anyone else has one. > One last thing, I noticed that you added: > > +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2); > > but the existing function is declared as > > static int32 > macaddr_internal_cmp(macaddr *a1, macaddr *a2) > > I'd be in favor to declare both as int. Great catch! I have no idea how I missed that. I've done as you suggested and made them both "int", which seems consistent with SortSupport implementations elsewhere. > After this, I think this patch will be ready for committer. Excellent! I've attached a new (and hopefully final) version of the patch. Two final questions about the process if you'd be so kind: * Should I change the status on the Commitfest link [1] or do I leave that to you (or someone else like a committer)? * I've been generating a new OID value with the `unused_oids` script, but pretty much every time I rebase I collide with someone else's addition and need to find a new one. Is it better for me to pick an OID in an exotic range for my final patch, or that a committer just finds a new one (if necessary) as they're putting it into master? Thanks again! Brandur [1] https://commitfest.postgresql.org/10/743/ On Sat, Feb 25, 2017 at 9:56 AM, Julien Rouhaudwrote: > On Sun, Feb 05, 2017 at 01:56:41PM -0800, Brandur Leach wrote: > > Hello Brandur, thanks for the updated patch! > > > > > > * you used macaddr_cmp_internal() function name, for uuid > > > the same function is named uuid_internal_cmp(). Using > > > the same naming pattern is probably better. > > > > I was a little split on this one! It's true that UUID uses > > `_internal_cmp`, but `_cmp_internal` is also used in a > > number of places like `enum`, `timetz`, and `network`. I > > don't have a strong feeling about it either way, so I've > > changed it to `_internal_cmp` to match UUID as you > > suggested. > > > > Indeed, I should have checked more examples :/ There isn't any clear > pattern > for this, so I guess any one would be ok. > > > > * the function comment on macaddr_abbrev_convert() > > > doesn't mention specific little-endian handling > > > > I tried to bake this into the comment text. Here are the > > relevant lines of the amended version: > > > > * Packs the bytes of a 6-byte MAC address into a Datum and treats it > as > > an > > * unsigned integer for purposes of comparison. On a 64-bit machine, > > there > > * will be two zeroed bytes of padding. The integer is converted to > > native > > * endianness to facilitate easy comparison. > > > > > * "There will be two bytes of zero padding on the least > > > significant end" > > > > > > "least significant bits" would be better > > > > Also done. Here is the amended version: > > > > * On a 64-bit machine, zero out the 8-byte datum and copy the 6 > bytes of > > * the MAC address in. There will be two bytes of zero padding on the > end > > * of the least significant bits. > > > > Thanks. I'm ok with this, but maybe a native english speaker would have a > better opinion on this. > > > > * This patch will trigger quite a lot modifications after > > > a pgindent run. Could you try to run pgindent on mac.c > > > before sending an updated patch? > > > > Good call! I've run the new version through pgindent. > > > > Thanks also, no more issue here. > > > Let me know if you have any further feedback and/or > > suggestions. Thanks! > > One last thing, I noticed that you added: > > +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2); > > but the existing function is declared as > > static int32 > macaddr_internal_cmp(macaddr *a1, macaddr *a2) > > I'd be in favor to declare both as int. > > After this, I think this patch will be ready for committer. > > -- > Julien Rouhaud > http://dalibo.com - http://dalibo.org > 0003-Implement-SortSupport-for-macaddr-data-type.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Sun, Feb 05, 2017 at 01:56:41PM -0800, Brandur Leach wrote: Hello Brandur, thanks for the updated patch! > > > * you used macaddr_cmp_internal() function name, for uuid > > the same function is named uuid_internal_cmp(). Using > > the same naming pattern is probably better. > > I was a little split on this one! It's true that UUID uses > `_internal_cmp`, but `_cmp_internal` is also used in a > number of places like `enum`, `timetz`, and `network`. I > don't have a strong feeling about it either way, so I've > changed it to `_internal_cmp` to match UUID as you > suggested. > Indeed, I should have checked more examples :/ There isn't any clear pattern for this, so I guess any one would be ok. > > * the function comment on macaddr_abbrev_convert() > > doesn't mention specific little-endian handling > > I tried to bake this into the comment text. Here are the > relevant lines of the amended version: > > * Packs the bytes of a 6-byte MAC address into a Datum and treats it as > an > * unsigned integer for purposes of comparison. On a 64-bit machine, > there > * will be two zeroed bytes of padding. The integer is converted to > native > * endianness to facilitate easy comparison. > > > * "There will be two bytes of zero padding on the least > > significant end" > > > > "least significant bits" would be better > > Also done. Here is the amended version: > > * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of > * the MAC address in. There will be two bytes of zero padding on the end > * of the least significant bits. > Thanks. I'm ok with this, but maybe a native english speaker would have a better opinion on this. > > * This patch will trigger quite a lot modifications after > > a pgindent run. Could you try to run pgindent on mac.c > > before sending an updated patch? > > Good call! I've run the new version through pgindent. > Thanks also, no more issue here. > Let me know if you have any further feedback and/or > suggestions. Thanks! One last thing, I noticed that you added: +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2); but the existing function is declared as static int32 macaddr_internal_cmp(macaddr *a1, macaddr *a2) I'd be in favor to declare both as int. After this, I think this patch will be ready for committer. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
And as a short follow up, I've moved the patch to the current commit fest: https://commitfest.postgresql.org/13/743/ On Sun, Feb 5, 2017 at 1:56 PM, Brandur Leachwrote: > Hi Julien, > > Thank you for taking the time to do this review, and my > apologies for the very delayed response. I lost track of > this work and have only jumped back into it today. > > Please find attached a new version of the patch with your > feedback integrated. I've also rebased the patch against > the current master and selected a new OID because my old > one is now in use. > > > * you used macaddr_cmp_internal() function name, for uuid > > the same function is named uuid_internal_cmp(). Using > > the same naming pattern is probably better. > > I was a little split on this one! It's true that UUID uses > `_internal_cmp`, but `_cmp_internal` is also used in a > number of places like `enum`, `timetz`, and `network`. I > don't have a strong feeling about it either way, so I've > changed it to `_internal_cmp` to match UUID as you > suggested. > > > * the function comment on macaddr_abbrev_convert() > > doesn't mention specific little-endian handling > > I tried to bake this into the comment text. Here are the > relevant lines of the amended version: > > * Packs the bytes of a 6-byte MAC address into a Datum and treats it > as an > * unsigned integer for purposes of comparison. On a 64-bit machine, > there > * will be two zeroed bytes of padding. The integer is converted to > native > * endianness to facilitate easy comparison. > > > * "There will be two bytes of zero padding on the least > > significant end" > > > > "least significant bits" would be better > > Also done. Here is the amended version: > > * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes > of > * the MAC address in. There will be two bytes of zero padding on the > end > * of the least significant bits. > > > * This patch will trigger quite a lot modifications after > > a pgindent run. Could you try to run pgindent on mac.c > > before sending an updated patch? > > Good call! I've run the new version through pgindent. > > Let me know if you have any further feedback and/or > suggestions. Thanks! > > Brandur > > > On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaud > wrote: > >> On 26/08/2016 19:44, Brandur wrote: >> > Hello, >> > >> >> Hello, >> >> > I've attached a patch to add SortSupport for Postgres' macaddr which >> has the >> > effect of improving the performance of sorting operations for the type. >> The >> > strategy that I employ is very similar to that for UUID, which is to >> create >> > abbreviated keys by packing as many bytes from the MAC address as >> possible into >> > Datums, and then performing fast unsigned integer comparisons while >> sorting. >> > >> > I ran some informal local benchmarks, and for cardinality greater than >> 100k >> > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For >> those >> > interested, I put a few more numbers into a small report here [2].) >> > >> >> That's a nice improvement! >> >> > Admittedly, this is not quite as useful as speeding up sorting on a >> more common >> > data type like TEXT or UUID, but the change still seems like a useful >> > performance improvement. I largely wrote it as an exercise to >> familiarize >> > myself with the Postgres codebase. >> > >> > I'll add an entry into the current commitfest as suggested by the >> Postgres Wiki >> > and follow up here with a link. >> > >> > Thanks, and if anyone has feedback or other thoughts, let me know! >> > >> >> I just reviewed your patch. It applies and compiles cleanly, and the >> abbrev feature works as intended. There's not much to say since this is >> heavily inspired on the uuid SortSupport. The only really specific part >> is in the abbrev_converter function, and I don't see any issue with it. >> >> I have a few trivial comments: >> >> * you used macaddr_cmp_internal() function name, for uuid the same >> function is named uuid_internal_cmp(). Using the same naming pattern is >> probably better. >> >> * the function comment on macaddr_abbrev_convert() doesn't mention >> specific little-endian handling >> >> * "There will be two bytes of zero padding on the least significant end" >> >> "least significant bits" would be better >> >> * This patch will trigger quite a lot modifications after a pgindent >> run. Could you try to run pgindent on mac.c before sending an updated >> patch? >> >> Best regards. >> >> -- >> Julien Rouhaud >> http://dalibo.com - http://dalibo.org >> > >
Re: [HACKERS] [PATCH] SortSupport for macaddr type
Hi Julien, Thank you for taking the time to do this review, and my apologies for the very delayed response. I lost track of this work and have only jumped back into it today. Please find attached a new version of the patch with your feedback integrated. I've also rebased the patch against the current master and selected a new OID because my old one is now in use. > * you used macaddr_cmp_internal() function name, for uuid > the same function is named uuid_internal_cmp(). Using > the same naming pattern is probably better. I was a little split on this one! It's true that UUID uses `_internal_cmp`, but `_cmp_internal` is also used in a number of places like `enum`, `timetz`, and `network`. I don't have a strong feeling about it either way, so I've changed it to `_internal_cmp` to match UUID as you suggested. > * the function comment on macaddr_abbrev_convert() > doesn't mention specific little-endian handling I tried to bake this into the comment text. Here are the relevant lines of the amended version: * Packs the bytes of a 6-byte MAC address into a Datum and treats it as an * unsigned integer for purposes of comparison. On a 64-bit machine, there * will be two zeroed bytes of padding. The integer is converted to native * endianness to facilitate easy comparison. > * "There will be two bytes of zero padding on the least > significant end" > > "least significant bits" would be better Also done. Here is the amended version: * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of * the MAC address in. There will be two bytes of zero padding on the end * of the least significant bits. > * This patch will trigger quite a lot modifications after > a pgindent run. Could you try to run pgindent on mac.c > before sending an updated patch? Good call! I've run the new version through pgindent. Let me know if you have any further feedback and/or suggestions. Thanks! Brandur On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaudwrote: > On 26/08/2016 19:44, Brandur wrote: > > Hello, > > > > Hello, > > > I've attached a patch to add SortSupport for Postgres' macaddr which has > the > > effect of improving the performance of sorting operations for the type. > The > > strategy that I employ is very similar to that for UUID, which is to > create > > abbreviated keys by packing as many bytes from the MAC address as > possible into > > Datums, and then performing fast unsigned integer comparisons while > sorting. > > > > I ran some informal local benchmarks, and for cardinality greater than > 100k > > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For > those > > interested, I put a few more numbers into a small report here [2].) > > > > That's a nice improvement! > > > Admittedly, this is not quite as useful as speeding up sorting on a more > common > > data type like TEXT or UUID, but the change still seems like a useful > > performance improvement. I largely wrote it as an exercise to familiarize > > myself with the Postgres codebase. > > > > I'll add an entry into the current commitfest as suggested by the > Postgres Wiki > > and follow up here with a link. > > > > Thanks, and if anyone has feedback or other thoughts, let me know! > > > > I just reviewed your patch. It applies and compiles cleanly, and the > abbrev feature works as intended. There's not much to say since this is > heavily inspired on the uuid SortSupport. The only really specific part > is in the abbrev_converter function, and I don't see any issue with it. > > I have a few trivial comments: > > * you used macaddr_cmp_internal() function name, for uuid the same > function is named uuid_internal_cmp(). Using the same naming pattern is > probably better. > > * the function comment on macaddr_abbrev_convert() doesn't mention > specific little-endian handling > > * "There will be two bytes of zero padding on the least significant end" > > "least significant bits" would be better > > * This patch will trigger quite a lot modifications after a pgindent > run. Could you try to run pgindent on mac.c before sending an updated > patch? > > Best regards. > > -- > Julien Rouhaud > http://dalibo.com - http://dalibo.org > 0002-Implement-SortSupport-for-macaddr-data-type.patch Description: Binary data -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On Wed, Sep 14, 2016 at 6:14 AM, Julien Rouhaudwrote: > On 26/08/2016 19:44, Brandur wrote: >> Hello, > Hello, > >> I've attached a patch to add SortSupport for Postgres' macaddr which has the >> effect of improving the performance of sorting operations for the type. The >> strategy that I employ is very similar to that for UUID, which is to create >> abbreviated keys by packing as many bytes from the MAC address as possible >> into >> Datums, and then performing fast unsigned integer comparisons while sorting. >> >> I ran some informal local benchmarks, and for cardinality greater than 100k >> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those >> interested, I put a few more numbers into a small report here [2].) >> > > That's a nice improvement! > >> Admittedly, this is not quite as useful as speeding up sorting on a more >> common >> data type like TEXT or UUID, but the change still seems like a useful >> performance improvement. I largely wrote it as an exercise to familiarize >> myself with the Postgres codebase. >> >> I'll add an entry into the current commitfest as suggested by the Postgres >> Wiki >> and follow up here with a link. >> >> Thanks, and if anyone has feedback or other thoughts, let me know! >> > > I just reviewed your patch. It applies and compiles cleanly, and the > abbrev feature works as intended. There's not much to say since this is > heavily inspired on the uuid SortSupport. The only really specific part > is in the abbrev_converter function, and I don't see any issue with it. > > I have a few trivial comments: > > * you used macaddr_cmp_internal() function name, for uuid the same > function is named uuid_internal_cmp(). Using the same naming pattern is > probably better. > > * the function comment on macaddr_abbrev_convert() doesn't mention > specific little-endian handling > > * "There will be two bytes of zero padding on the least significant end" > > "least significant bits" would be better > > * This patch will trigger quite a lot modifications after a pgindent > run. Could you try to run pgindent on mac.c before sending an updated > patch? Since it's been two weeks and this patch hasn't been updated in response to this review, I have marked it "Returned with Feedback" in the CommitFest. If it is updated, it can be resubmitted for the next CommitFest. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
On 26/08/2016 19:44, Brandur wrote: > Hello, > Hello, > I've attached a patch to add SortSupport for Postgres' macaddr which has the > effect of improving the performance of sorting operations for the type. The > strategy that I employ is very similar to that for UUID, which is to create > abbreviated keys by packing as many bytes from the MAC address as possible > into > Datums, and then performing fast unsigned integer comparisons while sorting. > > I ran some informal local benchmarks, and for cardinality greater than 100k > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those > interested, I put a few more numbers into a small report here [2].) > That's a nice improvement! > Admittedly, this is not quite as useful as speeding up sorting on a more > common > data type like TEXT or UUID, but the change still seems like a useful > performance improvement. I largely wrote it as an exercise to familiarize > myself with the Postgres codebase. > > I'll add an entry into the current commitfest as suggested by the Postgres > Wiki > and follow up here with a link. > > Thanks, and if anyone has feedback or other thoughts, let me know! > I just reviewed your patch. It applies and compiles cleanly, and the abbrev feature works as intended. There's not much to say since this is heavily inspired on the uuid SortSupport. The only really specific part is in the abbrev_converter function, and I don't see any issue with it. I have a few trivial comments: * you used macaddr_cmp_internal() function name, for uuid the same function is named uuid_internal_cmp(). Using the same naming pattern is probably better. * the function comment on macaddr_abbrev_convert() doesn't mention specific little-endian handling * "There will be two bytes of zero padding on the least significant end" "least significant bits" would be better * This patch will trigger quite a lot modifications after a pgindent run. Could you try to run pgindent on mac.c before sending an updated patch? Best regards. -- Julien Rouhaud http://dalibo.com - http://dalibo.org -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [PATCH] SortSupport for macaddr type
And here's a commitfest link: https://commitfest.postgresql.org/10/743/ Also, a correction to my original message: the unreferenced [1] footnote points back to the thread that included the patch for UUID SortSupport. [1] https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com On Fri, Aug 26, 2016 at 10:44:22AM -0700, Brandur wrote: > Hello, > > I've attached a patch to add SortSupport for Postgres' macaddr which has the > effect of improving the performance of sorting operations for the type. The > strategy that I employ is very similar to that for UUID, which is to create > abbreviated keys by packing as many bytes from the MAC address as possible > into > Datums, and then performing fast unsigned integer comparisons while sorting. > > I ran some informal local benchmarks, and for cardinality greater than 100k > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those > interested, I put a few more numbers into a small report here [2].) > > Admittedly, this is not quite as useful as speeding up sorting on a more > common > data type like TEXT or UUID, but the change still seems like a useful > performance improvement. I largely wrote it as an exercise to familiarize > myself with the Postgres codebase. > > I'll add an entry into the current commitfest as suggested by the Postgres > Wiki > and follow up here with a link. > > Thanks, and if anyone has feedback or other thoughts, let me know! > > Brandur > > [1] > https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com > [2] > https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
[HACKERS] [PATCH] SortSupport for macaddr type
Hello, I've attached a patch to add SortSupport for Postgres' macaddr which has the effect of improving the performance of sorting operations for the type. The strategy that I employ is very similar to that for UUID, which is to create abbreviated keys by packing as many bytes from the MAC address as possible into Datums, and then performing fast unsigned integer comparisons while sorting. I ran some informal local benchmarks, and for cardinality greater than 100k rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those interested, I put a few more numbers into a small report here [2].) Admittedly, this is not quite as useful as speeding up sorting on a more common data type like TEXT or UUID, but the change still seems like a useful performance improvement. I largely wrote it as an exercise to familiarize myself with the Postgres codebase. I'll add an entry into the current commitfest as suggested by the Postgres Wiki and follow up here with a link. Thanks, and if anyone has feedback or other thoughts, let me know! Brandur [1] https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com [2] https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing >From 73f7dbf1921eea7106c93654dd7bf1fdd0888a07 Mon Sep 17 00:00:00 2001 From: BrandurDate: Tue, 16 Aug 2016 17:07:07 -0700 Subject: [PATCH] Implement SortSupport for macaddr data type Introduces a scheme to produce abbreviated keys for the macaddr type which involves packing as many of a MAC address' six bytes as possible into a Datum (and adding two zeroed bytes of padding on a 64-bit system). Sorting routines can then take advantage of the abbreviated keys by performing fast unsigned integer comparisons, resulting in a significant performance increase when operating on > 100k rows or more. This implementation is modeled closely on the addition of SortSupport for the UUID data type, which employs a very similar strategy. --- src/backend/utils/adt/mac.c | 210 src/include/catalog/pg_amproc.h | 1 + src/include/catalog/pg_proc.h | 2 + src/include/utils/builtins.h| 1 + 4 files changed, 214 insertions(+) diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c index 509315a..471c3f0 100644 --- a/src/backend/utils/adt/mac.c +++ b/src/backend/utils/adt/mac.c @@ -7,9 +7,13 @@ #include "postgres.h" #include "access/hash.h" +#include "lib/hyperloglog.h" #include "libpq/pqformat.h" +#include "port/pg_bswap.h" #include "utils/builtins.h" +#include "utils/guc.h" #include "utils/inet.h" +#include "utils/sortsupport.h" /* @@ -22,6 +26,21 @@ #define lobits(addr) \ ((unsigned long)(((addr)->d<<16)|((addr)->e<<8)|((addr)->f))) +/* sortsupport for macaddr */ +typedef struct +{ + int64 input_count;/* number of non-null values seen */ + boolestimating; /* true if estimating cardinality */ + + hyperLogLogState abbr_card; /* cardinality estimator */ +} macaddr_sortsupport_state; + +static int macaddr_cmp_internal(macaddr *a1, macaddr *a2); +static int macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup); +static int macaddr_cmp_abbrev(Datum x, Datum y, SortSupport ssup); +static bool macaddr_abbrev_abort(int memtupcount, SortSupport ssup); +static Datum macaddr_abbrev_convert(Datum original, SortSupport ssup); + /* * MAC address reader. Accepts several common notations. */ @@ -318,3 +337,194 @@ macaddr_trunc(PG_FUNCTION_ARGS) PG_RETURN_MACADDR_P(result); } + +/* + * SortSupport strategy function. Populates a SortSupport struct with the + * information necessary to use comparison by abbreviated keys. + */ +Datum +macaddr_sortsupport(PG_FUNCTION_ARGS) +{ + SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0); + + ssup->comparator = macaddr_fast_cmp; + ssup->ssup_extra = NULL; + + if (ssup->abbreviate) + { + macaddr_sortsupport_state *uss; + MemoryContext oldcontext; + + oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt); + + uss = palloc(sizeof(macaddr_sortsupport_state)); + uss->input_count = 0; + uss->estimating = true; + initHyperLogLog(>abbr_card, 10); + + ssup->ssup_extra = uss; + + ssup->comparator = macaddr_cmp_abbrev; + ssup->abbrev_converter = macaddr_abbrev_convert; + ssup->abbrev_abort = macaddr_abbrev_abort; + ssup->abbrev_full_comparator = macaddr_fast_cmp; + + MemoryContextSwitchTo(oldcontext); + } + + PG_RETURN_VOID(); +} + +/* + * SortSupport "traditional" comparison function. Pulls two MAC addresses from + * the heap and runs a standard comparison on them. + */ +static int +macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup) +{ +