Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-31 Thread Robert Haas
On Fri, Mar 31, 2017 at 3:03 PM, Brandur Leach  wrote:
> Thank you Peter for the assist here and great sleuthing in
> the disassembly, and thanks Teodor for committing!
>
> Neha pointed out (thanks as well) a couple typos upthread
> that I hadn't gotten around to fixing. I've attached a new
> three line patch to sort those out.

Committed.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-31 Thread Brandur Leach
Hello,

Thank you Peter for the assist here and great sleuthing in
the disassembly, and thanks Teodor for committing!

Neha pointed out (thanks as well) a couple typos upthread
that I hadn't gotten around to fixing. I've attached a new
three line patch to sort those out.

Brandur


On Wed, Mar 29, 2017 at 1:31 PM, Teodor Sigaev  wrote:

> Thank you, pushed
>
> Excellent! I've attached a new (and hopefully final)
>> version of the patch.
>>
>
> --
> Teodor Sigaev   E-mail: teo...@sigaev.ru
>WWW:
> http://www.sigaev.ru/
>


0001-Fix-two-typos-introduced-by-macaddr-SortSupport.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-29 Thread Teodor Sigaev

Thank you, pushed

Excellent! I've attached a new (and hopefully final)
version of the patch.


--
Teodor Sigaev   E-mail: teo...@sigaev.ru
   WWW: http://www.sigaev.ru/


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-19 Thread Peter Geoghegan
On Sun, Mar 19, 2017 at 8:40 AM, Greg Stark  wrote:
>> Out of idle curiosity, I decided to generate disassembly of both
>> macaddr_cmp_internal(), and the patch's abbreviated comparator. The
>> former consists of 49 x86-64 instructions at -02 on my machine,
>> totaling 135 bytes of object code. The latter consists of only 10
>> instructions, or 24 bytes of object code.
>
> I wonder if there's something that could be optimized out of the
> normal cmp function but we're defeating some compiler optimizations
> with all our casts and aliasing.

There was one shl instruction for every left shift (hibits() or
lowbits() call) that appears in macaddr_cmp_internal(). I suppose that
it's possible that that could have been better optimized on a
big-endian machine, where abbreviated keys do not need to be
byteswaped to make the abbreviated comparator work. Perhaps the
compiler could have recognized that macaddr is a struct that consists
of 6 unsigned bytes as digits.

One thing that I've noticed makes a relatively big difference to
instruction count in comparators is varlena overhead, which does come
up here, since macaddr is a type that doesn't have a varlena header
(it was recently suggested by Tom that this is a mistake on practical
grounds, though). I've informally considered the possibility of
providing alternative versions of comparators that do not detoast or
work with anything other than 1-byte header varlenas, because
tuplesort has detected that that happens to be generally safe. I doubt
that I'll ever get around to posting a patch to do that, since the
cost savings are probably still marginal. I could probably find
something better to work on.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-19 Thread Greg Stark
On 18 March 2017 at 22:22, Peter Geoghegan  wrote:
>
> Out of idle curiosity, I decided to generate disassembly of both
> macaddr_cmp_internal(), and the patch's abbreviated comparator. The
> former consists of 49 x86-64 instructions at -02 on my machine,
> totaling 135 bytes of object code. The latter consists of only 10
> instructions, or 24 bytes of object code.

I wonder if there's something that could be optimized out of the
normal cmp function but we're defeating some compiler optimizations
with all our casts and aliasing.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-18 Thread Peter Geoghegan
On Sat, Mar 18, 2017 at 2:54 PM, Peter Geoghegan  wrote:
> This seems fine to me, especially
> because it lets us compare macaddr using simple 3-way unsigned int
> comparisons, which isn't otherwise the case.

Out of idle curiosity, I decided to generate disassembly of both
macaddr_cmp_internal(), and the patch's abbreviated comparator. The
former consists of 49 x86-64 instructions at -02 on my machine,
totaling 135 bytes of object code. The latter consists of only 10
instructions, or 24 bytes of object code.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-18 Thread Peter Geoghegan
On Tue, Mar 14, 2017 at 3:25 AM, Heikki Linnakangas  wrote:
> Looks straightforward at a quick read-through. I have just a couple of
> questions. How much of the performance gain comes from avoiding the
> FunctionCallInvoke overhead, by simply having SortSupport with a comparison
> function, and how much comes from the "abbreviation"?

I assume it's very similar to the existing performance characteristics
that UUID SortSupport has.

> Also, is it worth the cycles and code to have the hyperloglog estimator, and
> aborting the abbreviation if there are only a few distinct values. Or put
> another way, how much does the abbreviation slow things down in the worst
> case?

Postgres doesn't support passing 6 byte types by-value due to a lack
of support from macros in places like postgres.h. There is a
GET_4_BYTES(), and a GET_8_BYTES(), but no GET_6_BYTES(). It doesn't
really seem worth adding those, just like it didn't seem worth it back
when I pointed out that a TID sort is very slow in the context of
CREATE INDEX CONCURRENTLY. (We ended up coming up with an ad-hoc int8
encoding scheme there, which allowed us to use the int8 default
opclass rather than the TID default opclass, you'll recall.)

It's a bit odd that macaddr abbreviated keys are actually always
smaller (not including padding) than sizeof(Datum) on 64-bit machines,
but this guarantees that abbreviated comparisons will reliably
indicate what an authoritative comparison would indicate, since of
course all relevant information is right there in the abbreviated key.
This will be the only time where that's generally true with
abbreviated keys for any type. This seems fine to me, especially
because it lets us compare macaddr using simple 3-way unsigned int
comparisons, which isn't otherwise the case.

I doubt that you'll ever see a real need to abort abbreviation.
Arguably it shouldn't be considered, but maybe it's worth being
consistent. It might matter on 32-bit platforms, where there is only a
single NIC-specific byte to differentiate MAC addresses, but even that
seems like a stretch. With a type like macaddr, where there is no
major cost like strxfrm() to worry about when abbreviating, the only
cost of abbreviating is that tie-breakers will have an extra
index_getattr() or similar. If there was no abbreviation, or if it
aborted, then the index_getattr() can happen once per SortTuple,
up-front.

Nitpick: the patch should probably not refer to 32-bit or 64-bit
systems. Rather, it should refer to Datum size only.

-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-14 Thread Heikki Linnakangas
Looks straightforward at a quick read-through. I have just a couple of 
questions. How much of the performance gain comes from avoiding the 
FunctionCallInvoke overhead, by simply having SortSupport with a 
comparison function, and how much comes from the "abbreviation"?


Also, is it worth the cycles and code to have the hyperloglog estimator, 
and aborting the abbreviation if there are only a few distinct values. 
Or put another way, how much does the abbreviation slow things down in 
the worst case?


- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Neha Khatri
Hi Brandur,

Couple of typo corrections required in patch:

s/converstion/conversion
s/go the heap/go to the heap

The performance results you shared are for he Index Creation operation.
Are there similar results for the sorting using ORDER BY queries too? Just
curious.

Regards,
Neha


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Julien Rouhaud
On Tue, Feb 28, 2017 at 08:58:24AM -0800, Brandur Leach wrote:
> Hi Julien,
> 

Hello Brandur,

> Thanks for the expedient reply, even after I'd dropped the
> ball for so long :)

:)

> Cool! I just re-read my own comment a few days later and I
> think that it still mostly makes sense, but definitely open
> to other edits if anyone else has one.
> 

Ok.

> > One last thing, I noticed that you added:
> >
> > +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2);
> >
> > but the existing function is declared as
> >
> > static int32
> > macaddr_internal_cmp(macaddr *a1, macaddr *a2)
> >
> > I'd be in favor to declare both as int.
> 
> Great catch! I have no idea how I missed that. I've done as
> you suggested and made them both "int", which seems
> consistent with SortSupport implementations elsewhere.
> 

Great.

> > After this, I think this patch will be ready for committer.
> 
> Excellent! I've attached a new (and hopefully final)
> version of the patch.
> 
> Two final questions about the process if you'd be so kind:
> 
> * Should I change the status on the Commitfest link [1] or
>   do I leave that to you (or someone else like a committer)?
> 

This is is done by the reviewer, after check of the last patch version. I just
changed the status to ready for committer!

> * I've been generating a new OID value with the
>   `unused_oids` script, but pretty much every time I rebase
>   I collide with someone else's addition and need to find a
>   new one. Is it better for me to pick an OID in an exotic
>   range for my final patch, or that a committer just finds
>   a new one (if necessary) as they're putting it into
>   master?

I think picking the value with unused_oids as you dd is the right thing to do.
As Robert said, if this oid is used in another patch in the meantime, updating
it at commit time is not a big deal.  Moreover, this patch will require a
catversion bump, which is meant to be done by the committer.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-03-02 Thread Robert Haas
On Tue, Feb 28, 2017 at 10:28 PM, Brandur Leach  wrote:
> * I've been generating a new OID value with the
>   `unused_oids` script, but pretty much every time I rebase
>   I collide with someone else's addition and need to find a
>   new one. Is it better for me to pick an OID in an exotic
>   range for my final patch, or that a committer just finds
>   a new one (if necessary) as they're putting it into
>   master?

It's nice if you can do it before the committer gets to it, but if it
ends up conflicting I personally don't find fixing it to be a big
deal.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-02-28 Thread Brandur Leach
Hi Julien,

Thanks for the expedient reply, even after I'd dropped the
ball for so long :)

> Indeed, I should have checked more examples :/ There
> isn't any clear pattern for this, so I guess any one
> would be ok.

Yeah, agreed. If it's alright with you, I ended up moving
the naming back to `macaddr_cmp_internal` just because it
results in a smaller final diff.

> Thanks. I'm ok with this, but maybe a native english
> speaker would have a better opinion on this.

Cool! I just re-read my own comment a few days later and I
think that it still mostly makes sense, but definitely open
to other edits if anyone else has one.

> One last thing, I noticed that you added:
>
> +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2);
>
> but the existing function is declared as
>
> static int32
> macaddr_internal_cmp(macaddr *a1, macaddr *a2)
>
> I'd be in favor to declare both as int.

Great catch! I have no idea how I missed that. I've done as
you suggested and made them both "int", which seems
consistent with SortSupport implementations elsewhere.

> After this, I think this patch will be ready for committer.

Excellent! I've attached a new (and hopefully final)
version of the patch.

Two final questions about the process if you'd be so kind:

* Should I change the status on the Commitfest link [1] or
  do I leave that to you (or someone else like a committer)?

* I've been generating a new OID value with the
  `unused_oids` script, but pretty much every time I rebase
  I collide with someone else's addition and need to find a
  new one. Is it better for me to pick an OID in an exotic
  range for my final patch, or that a committer just finds
  a new one (if necessary) as they're putting it into
  master?

Thanks again!
Brandur

[1] https://commitfest.postgresql.org/10/743/


On Sat, Feb 25, 2017 at 9:56 AM, Julien Rouhaud 
wrote:

> On Sun, Feb 05, 2017 at 01:56:41PM -0800, Brandur Leach wrote:
>
> Hello Brandur, thanks for the updated patch!
>
> >
> > > * you used macaddr_cmp_internal() function name, for uuid
> > >   the same function is named uuid_internal_cmp().  Using
> > >   the same naming pattern is probably better.
> >
> > I was a little split on this one! It's true that UUID uses
> > `_internal_cmp`, but `_cmp_internal` is also used in a
> > number of places like `enum`, `timetz`, and `network`. I
> > don't have a strong feeling about it either way, so I've
> > changed it to `_internal_cmp` to match UUID as you
> > suggested.
> >
>
> Indeed, I should have checked more examples :/ There isn't any clear
> pattern
> for this, so I guess any one would be ok.
>
> > > * the function comment on macaddr_abbrev_convert()
> > >   doesn't mention specific little-endian handling
> >
> > I tried to bake this into the comment text. Here are the
> > relevant lines of the amended version:
> >
> > * Packs the bytes of a 6-byte MAC address into a Datum and treats it
> as
> > an
> > * unsigned integer for purposes of comparison. On a 64-bit machine,
> > there
> > * will be two zeroed bytes of padding. The integer is converted to
> > native
> > * endianness to facilitate easy comparison.
> >
> > > * "There will be two bytes of zero padding on the least
> > >   significant end"
> > >
> > > "least significant bits" would be better
> >
> > Also done. Here is the amended version:
> >
> > * On a 64-bit machine, zero out the 8-byte datum and copy the 6
> bytes of
> > * the MAC address in. There will be two bytes of zero padding on the
> end
> > * of the least significant bits.
> >
>
> Thanks.  I'm ok with this, but maybe a native english speaker would have a
> better opinion on this.
>
> > > * This patch will trigger quite a lot modifications after
> > >   a pgindent run.  Could you try to run pgindent on mac.c
> > >   before sending an updated patch?
> >
> > Good call! I've run the new version through pgindent.
> >
>
> Thanks also, no more issue here.
>
> > Let me know if you have any further feedback and/or
> > suggestions. Thanks!
>
> One last thing, I noticed that you added:
>
> +static int macaddr_internal_cmp(macaddr *a1, macaddr *a2);
>
> but the existing function is declared as
>
> static int32
> macaddr_internal_cmp(macaddr *a1, macaddr *a2)
>
> I'd be in favor to declare both as int.
>
> After this, I think this patch will be ready for committer.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>


0003-Implement-SortSupport-for-macaddr-data-type.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-02-25 Thread Julien Rouhaud
On Sun, Feb 05, 2017 at 01:56:41PM -0800, Brandur Leach wrote:

Hello Brandur, thanks for the updated patch!

> 
> > * you used macaddr_cmp_internal() function name, for uuid
> >   the same function is named uuid_internal_cmp().  Using
> >   the same naming pattern is probably better.
> 
> I was a little split on this one! It's true that UUID uses
> `_internal_cmp`, but `_cmp_internal` is also used in a
> number of places like `enum`, `timetz`, and `network`. I
> don't have a strong feeling about it either way, so I've
> changed it to `_internal_cmp` to match UUID as you
> suggested.
>

Indeed, I should have checked more examples :/ There isn't any clear pattern
for this, so I guess any one would be ok.

> > * the function comment on macaddr_abbrev_convert()
> >   doesn't mention specific little-endian handling
> 
> I tried to bake this into the comment text. Here are the
> relevant lines of the amended version:
> 
> * Packs the bytes of a 6-byte MAC address into a Datum and treats it as
> an
> * unsigned integer for purposes of comparison. On a 64-bit machine,
> there
> * will be two zeroed bytes of padding. The integer is converted to
> native
> * endianness to facilitate easy comparison.
> 
> > * "There will be two bytes of zero padding on the least
> >   significant end"
> >
> > "least significant bits" would be better
> 
> Also done. Here is the amended version:
> 
> * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of
> * the MAC address in. There will be two bytes of zero padding on the end
> * of the least significant bits.
> 

Thanks.  I'm ok with this, but maybe a native english speaker would have a
better opinion on this.

> > * This patch will trigger quite a lot modifications after
> >   a pgindent run.  Could you try to run pgindent on mac.c
> >   before sending an updated patch?
> 
> Good call! I've run the new version through pgindent.
> 

Thanks also, no more issue here.

> Let me know if you have any further feedback and/or
> suggestions. Thanks!

One last thing, I noticed that you added:

+static int macaddr_internal_cmp(macaddr *a1, macaddr *a2);

but the existing function is declared as

static int32
macaddr_internal_cmp(macaddr *a1, macaddr *a2)

I'd be in favor to declare both as int.

After this, I think this patch will be ready for committer.

--
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-02-07 Thread Brandur Leach
And as a short follow up, I've moved the patch to the
current commit fest:

https://commitfest.postgresql.org/13/743/


On Sun, Feb 5, 2017 at 1:56 PM, Brandur Leach  wrote:

> Hi Julien,
>
> Thank you for taking the time to do this review, and my
> apologies for the very delayed response. I lost track of
> this work and have only jumped back into it today.
>
> Please find attached a new version of the patch with your
> feedback integrated. I've also rebased the patch against
> the current master and selected a new OID because my old
> one is now in use.
>
> > * you used macaddr_cmp_internal() function name, for uuid
> >   the same function is named uuid_internal_cmp().  Using
> >   the same naming pattern is probably better.
>
> I was a little split on this one! It's true that UUID uses
> `_internal_cmp`, but `_cmp_internal` is also used in a
> number of places like `enum`, `timetz`, and `network`. I
> don't have a strong feeling about it either way, so I've
> changed it to `_internal_cmp` to match UUID as you
> suggested.
>
> > * the function comment on macaddr_abbrev_convert()
> >   doesn't mention specific little-endian handling
>
> I tried to bake this into the comment text. Here are the
> relevant lines of the amended version:
>
> * Packs the bytes of a 6-byte MAC address into a Datum and treats it
> as an
> * unsigned integer for purposes of comparison. On a 64-bit machine,
> there
> * will be two zeroed bytes of padding. The integer is converted to
> native
> * endianness to facilitate easy comparison.
>
> > * "There will be two bytes of zero padding on the least
> >   significant end"
> >
> > "least significant bits" would be better
>
> Also done. Here is the amended version:
>
> * On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes
> of
> * the MAC address in. There will be two bytes of zero padding on the
> end
> * of the least significant bits.
>
> > * This patch will trigger quite a lot modifications after
> >   a pgindent run.  Could you try to run pgindent on mac.c
> >   before sending an updated patch?
>
> Good call! I've run the new version through pgindent.
>
> Let me know if you have any further feedback and/or
> suggestions. Thanks!
>
> Brandur
>
>
> On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaud  > wrote:
>
>> On 26/08/2016 19:44, Brandur wrote:
>> > Hello,
>> >
>>
>> Hello,
>>
>> > I've attached a patch to add SortSupport for Postgres' macaddr which
>> has the
>> > effect of improving the performance of sorting operations for the type.
>> The
>> > strategy that I employ is very similar to that for UUID, which is to
>> create
>> > abbreviated keys by packing as many bytes from the MAC address as
>> possible into
>> > Datums, and then performing fast unsigned integer comparisons while
>> sorting.
>> >
>> > I ran some informal local benchmarks, and for cardinality greater than
>> 100k
>> > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For
>> those
>> > interested, I put a few more numbers into a small report here [2].)
>> >
>>
>> That's a nice improvement!
>>
>> > Admittedly, this is not quite as useful as speeding up sorting on a
>> more common
>> > data type like TEXT or UUID, but the change still seems like a useful
>> > performance improvement. I largely wrote it as an exercise to
>> familiarize
>> > myself with the Postgres codebase.
>> >
>> > I'll add an entry into the current commitfest as suggested by the
>> Postgres Wiki
>> > and follow up here with a link.
>> >
>> > Thanks, and if anyone has feedback or other thoughts, let me know!
>> >
>>
>> I just reviewed your patch.  It applies and compiles cleanly, and the
>> abbrev feature works as intended.  There's not much to say since this is
>> heavily inspired on the uuid SortSupport. The only really specific part
>> is in the abbrev_converter function, and I don't see any issue with it.
>>
>> I have a few trivial comments:
>>
>> * you used macaddr_cmp_internal() function name, for uuid the same
>> function is named uuid_internal_cmp().  Using the same naming pattern is
>> probably better.
>>
>> * the function comment on macaddr_abbrev_convert() doesn't mention
>> specific little-endian handling
>>
>> * "There will be two bytes of zero padding on the least significant end"
>>
>> "least significant bits" would be better
>>
>> * This patch will trigger quite a lot modifications after a pgindent
>> run.  Could you try to run pgindent on mac.c before sending an updated
>> patch?
>>
>> Best regards.
>>
>> --
>> Julien Rouhaud
>> http://dalibo.com - http://dalibo.org
>>
>
>


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2017-02-05 Thread Brandur Leach
Hi Julien,

Thank you for taking the time to do this review, and my
apologies for the very delayed response. I lost track of
this work and have only jumped back into it today.

Please find attached a new version of the patch with your
feedback integrated. I've also rebased the patch against
the current master and selected a new OID because my old
one is now in use.

> * you used macaddr_cmp_internal() function name, for uuid
>   the same function is named uuid_internal_cmp().  Using
>   the same naming pattern is probably better.

I was a little split on this one! It's true that UUID uses
`_internal_cmp`, but `_cmp_internal` is also used in a
number of places like `enum`, `timetz`, and `network`. I
don't have a strong feeling about it either way, so I've
changed it to `_internal_cmp` to match UUID as you
suggested.

> * the function comment on macaddr_abbrev_convert()
>   doesn't mention specific little-endian handling

I tried to bake this into the comment text. Here are the
relevant lines of the amended version:

* Packs the bytes of a 6-byte MAC address into a Datum and treats it as
an
* unsigned integer for purposes of comparison. On a 64-bit machine,
there
* will be two zeroed bytes of padding. The integer is converted to
native
* endianness to facilitate easy comparison.

> * "There will be two bytes of zero padding on the least
>   significant end"
>
> "least significant bits" would be better

Also done. Here is the amended version:

* On a 64-bit machine, zero out the 8-byte datum and copy the 6 bytes of
* the MAC address in. There will be two bytes of zero padding on the end
* of the least significant bits.

> * This patch will trigger quite a lot modifications after
>   a pgindent run.  Could you try to run pgindent on mac.c
>   before sending an updated patch?

Good call! I've run the new version through pgindent.

Let me know if you have any further feedback and/or
suggestions. Thanks!

Brandur


On Wed, Sep 14, 2016 at 3:14 AM, Julien Rouhaud 
wrote:

> On 26/08/2016 19:44, Brandur wrote:
> > Hello,
> >
>
> Hello,
>
> > I've attached a patch to add SortSupport for Postgres' macaddr which has
> the
> > effect of improving the performance of sorting operations for the type.
> The
> > strategy that I employ is very similar to that for UUID, which is to
> create
> > abbreviated keys by packing as many bytes from the MAC address as
> possible into
> > Datums, and then performing fast unsigned integer comparisons while
> sorting.
> >
> > I ran some informal local benchmarks, and for cardinality greater than
> 100k
> > rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For
> those
> > interested, I put a few more numbers into a small report here [2].)
> >
>
> That's a nice improvement!
>
> > Admittedly, this is not quite as useful as speeding up sorting on a more
> common
> > data type like TEXT or UUID, but the change still seems like a useful
> > performance improvement. I largely wrote it as an exercise to familiarize
> > myself with the Postgres codebase.
> >
> > I'll add an entry into the current commitfest as suggested by the
> Postgres Wiki
> > and follow up here with a link.
> >
> > Thanks, and if anyone has feedback or other thoughts, let me know!
> >
>
> I just reviewed your patch.  It applies and compiles cleanly, and the
> abbrev feature works as intended.  There's not much to say since this is
> heavily inspired on the uuid SortSupport. The only really specific part
> is in the abbrev_converter function, and I don't see any issue with it.
>
> I have a few trivial comments:
>
> * you used macaddr_cmp_internal() function name, for uuid the same
> function is named uuid_internal_cmp().  Using the same naming pattern is
> probably better.
>
> * the function comment on macaddr_abbrev_convert() doesn't mention
> specific little-endian handling
>
> * "There will be two bytes of zero padding on the least significant end"
>
> "least significant bits" would be better
>
> * This patch will trigger quite a lot modifications after a pgindent
> run.  Could you try to run pgindent on mac.c before sending an updated
> patch?
>
> Best regards.
>
> --
> Julien Rouhaud
> http://dalibo.com - http://dalibo.org
>


0002-Implement-SortSupport-for-macaddr-data-type.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2016-09-28 Thread Robert Haas
On Wed, Sep 14, 2016 at 6:14 AM, Julien Rouhaud
 wrote:
> On 26/08/2016 19:44, Brandur wrote:
>> Hello,
> Hello,
>
>> I've attached a patch to add SortSupport for Postgres' macaddr which has the
>> effect of improving the performance of sorting operations for the type. The
>> strategy that I employ is very similar to that for UUID, which is to create
>> abbreviated keys by packing as many bytes from the MAC address as possible 
>> into
>> Datums, and then performing fast unsigned integer comparisons while sorting.
>>
>> I ran some informal local benchmarks, and for cardinality greater than 100k
>> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
>> interested, I put a few more numbers into a small report here [2].)
>>
>
> That's a nice improvement!
>
>> Admittedly, this is not quite as useful as speeding up sorting on a more 
>> common
>> data type like TEXT or UUID, but the change still seems like a useful
>> performance improvement. I largely wrote it as an exercise to familiarize
>> myself with the Postgres codebase.
>>
>> I'll add an entry into the current commitfest as suggested by the Postgres 
>> Wiki
>> and follow up here with a link.
>>
>> Thanks, and if anyone has feedback or other thoughts, let me know!
>>
>
> I just reviewed your patch.  It applies and compiles cleanly, and the
> abbrev feature works as intended.  There's not much to say since this is
> heavily inspired on the uuid SortSupport. The only really specific part
> is in the abbrev_converter function, and I don't see any issue with it.
>
> I have a few trivial comments:
>
> * you used macaddr_cmp_internal() function name, for uuid the same
> function is named uuid_internal_cmp().  Using the same naming pattern is
> probably better.
>
> * the function comment on macaddr_abbrev_convert() doesn't mention
> specific little-endian handling
>
> * "There will be two bytes of zero padding on the least significant end"
>
> "least significant bits" would be better
>
> * This patch will trigger quite a lot modifications after a pgindent
> run.  Could you try to run pgindent on mac.c before sending an updated
> patch?

Since it's been two weeks and this patch hasn't been updated in
response to this review, I have marked it "Returned with Feedback" in
the CommitFest.  If it is updated, it can be resubmitted for the next
CommitFest.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2016-09-14 Thread Julien Rouhaud
On 26/08/2016 19:44, Brandur wrote:
> Hello,
> 

Hello,

> I've attached a patch to add SortSupport for Postgres' macaddr which has the
> effect of improving the performance of sorting operations for the type. The
> strategy that I employ is very similar to that for UUID, which is to create
> abbreviated keys by packing as many bytes from the MAC address as possible 
> into
> Datums, and then performing fast unsigned integer comparisons while sorting.
> 
> I ran some informal local benchmarks, and for cardinality greater than 100k
> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
> interested, I put a few more numbers into a small report here [2].)
> 

That's a nice improvement!

> Admittedly, this is not quite as useful as speeding up sorting on a more 
> common
> data type like TEXT or UUID, but the change still seems like a useful
> performance improvement. I largely wrote it as an exercise to familiarize
> myself with the Postgres codebase.
> 
> I'll add an entry into the current commitfest as suggested by the Postgres 
> Wiki
> and follow up here with a link.
> 
> Thanks, and if anyone has feedback or other thoughts, let me know!
> 

I just reviewed your patch.  It applies and compiles cleanly, and the
abbrev feature works as intended.  There's not much to say since this is
heavily inspired on the uuid SortSupport. The only really specific part
is in the abbrev_converter function, and I don't see any issue with it.

I have a few trivial comments:

* you used macaddr_cmp_internal() function name, for uuid the same
function is named uuid_internal_cmp().  Using the same naming pattern is
probably better.

* the function comment on macaddr_abbrev_convert() doesn't mention
specific little-endian handling

* "There will be two bytes of zero padding on the least significant end"

"least significant bits" would be better

* This patch will trigger quite a lot modifications after a pgindent
run.  Could you try to run pgindent on mac.c before sending an updated
patch?

Best regards.

-- 
Julien Rouhaud
http://dalibo.com - http://dalibo.org


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] SortSupport for macaddr type

2016-08-26 Thread Brandur
And here's a commitfest link:

https://commitfest.postgresql.org/10/743/

Also, a correction to my original message: the unreferenced [1] footnote
points back to the thread that included the patch for UUID SortSupport.

[1] 
https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com

On Fri, Aug 26, 2016 at 10:44:22AM -0700, Brandur wrote:
> Hello,
> 
> I've attached a patch to add SortSupport for Postgres' macaddr which has the
> effect of improving the performance of sorting operations for the type. The
> strategy that I employ is very similar to that for UUID, which is to create
> abbreviated keys by packing as many bytes from the MAC address as possible 
> into
> Datums, and then performing fast unsigned integer comparisons while sorting.
> 
> I ran some informal local benchmarks, and for cardinality greater than 100k
> rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
> interested, I put a few more numbers into a small report here [2].)
> 
> Admittedly, this is not quite as useful as speeding up sorting on a more 
> common
> data type like TEXT or UUID, but the change still seems like a useful
> performance improvement. I largely wrote it as an exercise to familiarize
> myself with the Postgres codebase.
> 
> I'll add an entry into the current commitfest as suggested by the Postgres 
> Wiki
> and follow up here with a link.
> 
> Thanks, and if anyone has feedback or other thoughts, let me know!
> 
> Brandur
> 
> [1] 
> https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com
> [2] 
> https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] [PATCH] SortSupport for macaddr type

2016-08-26 Thread Brandur
Hello,

I've attached a patch to add SortSupport for Postgres' macaddr which has the
effect of improving the performance of sorting operations for the type. The
strategy that I employ is very similar to that for UUID, which is to create
abbreviated keys by packing as many bytes from the MAC address as possible into
Datums, and then performing fast unsigned integer comparisons while sorting.

I ran some informal local benchmarks, and for cardinality greater than 100k
rows, I see a speed up on `CREATE INDEX` of roughly 25% to 65%. (For those
interested, I put a few more numbers into a small report here [2].)

Admittedly, this is not quite as useful as speeding up sorting on a more common
data type like TEXT or UUID, but the change still seems like a useful
performance improvement. I largely wrote it as an exercise to familiarize
myself with the Postgres codebase.

I'll add an entry into the current commitfest as suggested by the Postgres Wiki
and follow up here with a link.

Thanks, and if anyone has feedback or other thoughts, let me know!

Brandur

[1] 
https://www.postgresql.org/message-id/CAM3SWZR4avsTwwNVUzRNbHk8v36W-QBqpoKg=ogkwwy0dkt...@mail.gmail.com
[2] 
https://docs.google.com/spreadsheets/d/1cUqbg9RTgSo16WrrfJJwDP1eDaWL3TNOxnIOFNpfwgA/edit?usp=sharing
>From 73f7dbf1921eea7106c93654dd7bf1fdd0888a07 Mon Sep 17 00:00:00 2001
From: Brandur 
Date: Tue, 16 Aug 2016 17:07:07 -0700
Subject: [PATCH] Implement SortSupport for macaddr data type

Introduces a scheme to produce abbreviated keys for the macaddr type
which involves packing as many of a MAC address' six bytes as possible
into a Datum (and adding two zeroed bytes of padding on a 64-bit
system). Sorting routines can then take advantage of the abbreviated
keys by performing fast unsigned integer comparisons, resulting in a
significant performance increase when operating on > 100k rows or more.

This implementation is modeled closely on the addition of SortSupport
for the UUID data type, which employs a very similar strategy.
---
 src/backend/utils/adt/mac.c | 210 
 src/include/catalog/pg_amproc.h |   1 +
 src/include/catalog/pg_proc.h   |   2 +
 src/include/utils/builtins.h|   1 +
 4 files changed, 214 insertions(+)

diff --git a/src/backend/utils/adt/mac.c b/src/backend/utils/adt/mac.c
index 509315a..471c3f0 100644
--- a/src/backend/utils/adt/mac.c
+++ b/src/backend/utils/adt/mac.c
@@ -7,9 +7,13 @@
 #include "postgres.h"
 
 #include "access/hash.h"
+#include "lib/hyperloglog.h"
 #include "libpq/pqformat.h"
+#include "port/pg_bswap.h"
 #include "utils/builtins.h"
+#include "utils/guc.h"
 #include "utils/inet.h"
+#include "utils/sortsupport.h"
 
 
 /*
@@ -22,6 +26,21 @@
 #define lobits(addr) \
   ((unsigned long)(((addr)->d<<16)|((addr)->e<<8)|((addr)->f)))
 
+/* sortsupport for macaddr */
+typedef struct
+{
+   int64   input_count;/* number of non-null values seen */
+   boolestimating; /* true if estimating 
cardinality */
+
+   hyperLogLogState abbr_card; /* cardinality estimator */
+} macaddr_sortsupport_state;
+
+static int macaddr_cmp_internal(macaddr *a1, macaddr *a2);
+static int macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup);
+static int macaddr_cmp_abbrev(Datum x, Datum y, SortSupport ssup);
+static bool macaddr_abbrev_abort(int memtupcount, SortSupport ssup);
+static Datum macaddr_abbrev_convert(Datum original, SortSupport ssup);
+
 /*
  * MAC address reader.  Accepts several common notations.
  */
@@ -318,3 +337,194 @@ macaddr_trunc(PG_FUNCTION_ARGS)
 
PG_RETURN_MACADDR_P(result);
 }
+
+/*
+ * SortSupport strategy function. Populates a SortSupport struct with the
+ * information necessary to use comparison by abbreviated keys.
+ */
+Datum
+macaddr_sortsupport(PG_FUNCTION_ARGS)
+{
+   SortSupport ssup = (SortSupport) PG_GETARG_POINTER(0);
+
+   ssup->comparator = macaddr_fast_cmp;
+   ssup->ssup_extra = NULL;
+
+   if (ssup->abbreviate)
+   {
+   macaddr_sortsupport_state *uss;
+   MemoryContext oldcontext;
+
+   oldcontext = MemoryContextSwitchTo(ssup->ssup_cxt);
+
+   uss = palloc(sizeof(macaddr_sortsupport_state));
+   uss->input_count = 0;
+   uss->estimating = true;
+   initHyperLogLog(>abbr_card, 10);
+
+   ssup->ssup_extra = uss;
+
+   ssup->comparator = macaddr_cmp_abbrev;
+   ssup->abbrev_converter = macaddr_abbrev_convert;
+   ssup->abbrev_abort = macaddr_abbrev_abort;
+   ssup->abbrev_full_comparator = macaddr_fast_cmp;
+
+   MemoryContextSwitchTo(oldcontext);
+   }
+
+   PG_RETURN_VOID();
+}
+
+/*
+ * SortSupport "traditional" comparison function. Pulls two MAC addresses from
+ * the heap and runs a standard comparison on them.
+ */
+static int
+macaddr_fast_cmp(Datum x, Datum y, SortSupport ssup)
+{
+