Re: [HACKERS] Performance optimization of btree binary search

2013-12-09 Thread Peter Geoghegan
On Fri, Dec 6, 2013 at 4:53 PM, Peter Geoghegan p...@heroku.com wrote:
 I had considered that something like Intel Speedstep technology had a
 role here, but I'm pretty sure it steps up very aggressively when
 things are CPU bound - I tested that against a Core 2 Duo desktop a
 couple of years back, where it was easy to immediately provoke it by
 moving around desktop windows or something.

I decided to change the CPU governor from the default of "ondemand" to
"performance" for each of the 8 logical cores on this system. I then
re-ran the benchmark. I saw markedly better, much more *consistent*
performance for master [1].

I Googled for clues, and found this:

https://communities.intel.com/community/datastack/blog/2013/08/05/how-to-maximise-cpu-performance-for-the-oracle-database-on-linux

(It happens to mention Oracle, but I think it would apply equally well
to any database). I strongly suspect this is down to kernel version. I
should highlight this:


Another further CPU setting is the Energy/Performance Bias and Red Hat
and Oracle users should note that the default setting has changed in
the Linux kernel used between the releases of Red Hat/Oracle Linux 5
and Red Hat/Oracle Linux 6. (Some system BIOS options may include a
setting to prevent the OS changing this value). In release 5 Linux did
not set a value for this setting and therefore the value remained at 0
for a bias towards performance. In Red Hat 6 this behaviour has
changed and the default sets a median range to move this bias more
towards conserving energy (remember the same Linux kernel is present
in both ultrabooks as well as servers and on my ultrabook I use
powertop and the other Linux tools and configurations discussed here
to maximise battery life) and reports the following in the dmesg
output on boot.

...

You can also use the tool to set a lower value to change the bias
entirely towards performance (the default release 5 behaviour).


If there is regression in Postgres performance on more recent Linux
kernels [2], perhaps this is it. I certainly don't recall hearing
advice on this from the usual places. I'm surprised that turbo boost
mode wasn't enabled very quickly on a workload like this. It makes a
*huge* difference - at 4 clients (one per physical core), setting the
CPU governor to performance increases TPS by a massive 40% compared
to some earlier, comparable runs of master. These days Red Hat is even
pointing out that CPU governor policy can be set via cron jobs [3].
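For anyone wanting to reproduce this, the governor can be flipped per logical CPU through the cpufreq sysfs interface. Here is a minimal C sketch (mine, not from this thread; it assumes the standard Linux cpufreq sysfs layout, needs root, and is equivalent to running "cpupower frequency-set -g performance"):

```c
/* Sketch: set the cpufreq scaling governor to "performance" for every
 * logical CPU via sysfs.  Assumes the standard Linux cpufreq layout;
 * must run as root.  Equivalent to the cpupower/cpufrequtils tools. */
#include <stdio.h>
#include <glob.h>

int
main(void)
{
	glob_t		g;
	size_t		i;

	if (glob("/sys/devices/system/cpu/cpu[0-9]*/cpufreq/scaling_governor",
			 0, NULL, &g) != 0)
	{
		fprintf(stderr, "no cpufreq sysfs entries found\n");
		return 1;
	}
	for (i = 0; i < g.gl_pathc; i++)
	{
		FILE	   *f = fopen(g.gl_pathv[i], "w");

		if (f == NULL)
		{
			perror(g.gl_pathv[i]);	/* probably not running as root */
			continue;
		}
		fputs("performance\n", f);
		fclose(f);
		printf("%s -> performance\n", g.gl_pathv[i]);
	}
	globfree(&g);
	return 0;
}
```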

I cannot account for why the original benchmark's results were so
consistent with the patch having helped to such a large degree, given
the large number of runs involved, and their relatively long duration
for a CPU/memory-bound workload. As I said, this machine is on
dedicated hardware, and virtualization was not used. However, at this
point I have no choice but to withdraw it from consideration, and not
pursue this any further. Sorry for the noise.

[1] http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/turbo/

[2] http://www.postgresql.org/message-id/529f7d58.1060...@agliodbs.com

[3] 
https://access.redhat.com/site/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Power_Management_Guide/cpufreq_governors.html
-- 
Peter Geoghegan


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Performance optimization of btree binary search

2013-12-06 Thread Peter Geoghegan
On Thu, Dec 5, 2013 at 2:19 AM, Peter Geoghegan p...@heroku.com wrote:
 I did a relatively short variant 'insert' pgbench-tools benchmark,
 with a serial primary index created on the pgbench_history table.
 Results:

 http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert/

I saw something today that made me lose confidence in the results I've
presented. I was unsuccessful in re-creating similar 'select' numbers
at scale 15 on the same server [1] (originally I wanted to see how
eliding the fmgr trampoline worked out with binary searching only).
Then, when re-testing master as of commit
8e18d04d4daf34b8a557e2dc553a7754b255cd9a (my feature branches branched
off from that master commit), I noticed that even after the last
pgbench had run, a single postgres process was stuck at 100% CPU usage
for upwards of a minute (the run referred to was for 32 clients, and I
only saw that one process in 'top' output).

To me this points to either 1) some kind of contamination or 2) a bug
in Postgres. My first guess is that it's the issue described here [2]
and elsewhere (maybe whether or not that behavior constitutes a bug is
controversial, but I think it does).

Consider the contrast between these two runs (against master, 2
clients) of the same, new benchmark:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/50/index.html

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/44/index.html

I had considered that something like Intel Speedstep technology had a
role here, but I'm pretty sure it steps up very aggressively when
things are CPU bound - I tested that against a Core 2 Duo desktop a
couple of years back, where it was easy to immediately provoke it by
moving around desktop windows or something. I know that Greg Smith
controls for that by disabling it, but I have not done so here - I
assumed that he only did so because he was being particularly
cautious. There are similar runs on my original 'results' benchmark
(these particular should-be-comparable-but-are-not runs are with 1
client against my patch this time):

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results/297/index.html

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results/303/index.html

References:

[1] http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/new-select/

[2] 
http://www.postgresql.org/message-id/CAFj8pRDDa40eiP4UTrCm=+bdt0xbwf7qc8t_3y0dfqyuzk2...@mail.gmail.com

-- 
Peter Geoghegan




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Peter Geoghegan
On Wed, Dec 4, 2013 at 5:28 PM, Peter Geoghegan p...@heroku.com wrote:
 I'm also curious about the impact on insertion into primary key
 indexes. Presently, we hold an exclusive buffer lock for the duration
 of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
 _bt_binsrch() is one such operation. The other one there,
 _bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
 on average, I think, so I'm cautiously optimistic that it'll be
 noticeable. I better go and check it out.

I did a relatively short variant 'insert' pgbench-tools benchmark,
with a serial primary index created on the pgbench_history table.
Results:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/insert/

Note that in the interest of increasing the signal to noise ratio, I
used unlogged tables. These are considerably shorter, two-minute runs
of inserts/writes, but even so it's interesting to compare it to my
original 'select' benchmark. Checkpoint settings were very high.

This looks to be about a wash. I'm not that surprised, because this is
all about memory bandwidth, not lock contention. Even with the lock
contention on the rightmost btree page, we top out at about the same
TPS level as the 'select' benchmark (at least compared to Postgres
master). However, we top out at that level much sooner here (growth is
much steeper), because the contention is mostly on the same one or two
btree leaf pages, which are well cached by CPU L1 cache. Writes are
actually quite a bit faster than uniformly-distributed reads, even
with the exclusive buffer locking (and lock contention) necessitated
by writing. You might even say that my patch makes the first benchmark
look more like this one (a less awkward comparison could probably be
made between what I've done for uniform distribution read workloads,
and a read workload with far more uneven data access, which will
naturally have fewer cache misses and less TLB contention). I strongly
suspect I wouldn't have seen a significant number of cache misses when
binary searching on btree pages if I was to instrument this workload.

It might still be worth pursuing the SERIAL hinting mechanism, but
it's not going to be nearly as big a win as what I've done for reads,
I think. It might also be worth looking at a workload with uniformly
distributed btree index tuple insertions, where I'd expect similar
advantages to those originally shown for reads.


-- 
Peter Geoghegan




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Andres Freund
On 2013-12-04 18:48:44 -0500, Robert Haas wrote:
  * When a type narrower than Datum is stored in a Datum, we place it in the
  * low-order bits and are careful that the DatumGetXXX macro for it discards
  * the unused high-order bits (as opposed to, say, assuming they are zero).
  * This is needed to support old-style user-defined functions, since depending
  * on architecture and compiler, the return value of a function returning char
  * or short may contain garbage when called as if it returned Datum.
 
 And record_image_eq does a rather elaborate dance around here, calling
 the appropriate GET_x_BYTES macro depending on the type-width.  If we
 can really count on the high-order bits to be zero, that's all
 completely unnecessary tomfoolery.

I don't think we can get rid of that dance in record_image_eq - it very
well could be used on records originally generated when those bits
haven't been guaranteed to be zeroed.
Other use cases where the appropriate DatumGetFoo() macros are used don't
have a problem with that since it's cleared again there, but in
record_image_eq we can't rely on that.

Or am I missing something?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Heikki Linnakangas

On 12/05/2013 07:30 AM, Tom Lane wrote:

Peter Eisentraut pete...@gmx.net writes:

On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:

Lazy people?  I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.



Actually, I think it probably costs a fair portion of extension authors
when their initial code crashes because they forgot to declare all their
functions V1.  I think it might actually be more of a bother to lazy
people than a benefit.


Hm.  We have heard one or two complaints like that, but not a huge number.


It happens to me about 75% of the time when I write a new C function. 
These days, I usually realize it pretty quickly.


I wonder how easy it would be to make the compiler produce a warning 
about it. Or issue a warning in PostgreSQL when you do CREATE FUNCTION 
and the C function appears to be a V0 function.



I'm worried about breaking code that's been working since god-knows-when;
but I will concede there's little evidence that there's very much of that
out there either.


I wouldn't mind just dropping the V0 support. It's not difficult to 
modify your function for V1 convention, so if there's still anyone out 
there using V0, it wouldn't be unreasonable to require them to update.


- Heikki




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-04 18:48:44 -0500, Robert Haas wrote:
 And record_image_eq does a rather elaborate dance around here, calling
 the appropriate GET_x_BYTES macro depending on the type-width.  If we
 can really count on the high-order bits to be zero, that's all
 completely unnecessary tomfoolery.

 I don't think we can get rid of that dance in record_image_eq - it very
 well could used on records originally generated when those bits haven't
 been guaranteed to be zeroed.

No, you're failing to think about the context here.  A Datum is not an
on-disk concept, only an in-memory one.  In the case of record_image_eq,
simplifying the comparison of by-value Datums is unconditionally safe
as long as heap_deform_tuple is consistent about using the proper
GetDatum macros, which it is.  (So actually, that elaborate dance
is a 100% waste of effort today, regardless of any changes we're
discussing here.)

The risk we take by simplifying comparisons in a more general context
is that some function somewhere might've been sloppy about doing a
native-type-to-Datum conversion on its result.  In the case of V0
functions that risk is unavoidable except by adding some explicit cleanup
code, but I'm a bit worried that somebody, particularly third-party code,
might've sloppily written "return foo" in a V1 function when "return
Int32GetDatum(foo)" would be correct.  In that case, the resultant Datum
might have not-per-spec high-order bits, and if it reaches the fast
comparator without ever having been squeezed into a physical tuple,
we've got a problem.
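As a standalone illustration of that hazard (the Datum typedef and macros below are simplified stand-ins for postgres.h, not the real fmgr code):

```c
#include <stdint.h>

typedef uintptr_t Datum;		/* simplified stand-in for postgres.h */

/* Per spec: store the value in the low-order bits, zero-extending... */
#define Int32GetDatum(x)  ((Datum) (uint32_t) (x))
/* ...and discard the high-order bits again on the way out. */
#define DatumGetInt32(d)  ((int32_t) (uint32_t) (d))

/* The sloppy pattern: "return foo" relies on the compiler's implicit
 * int32 -> uintptr_t conversion, which sign-extends negative values on
 * 64-bit targets, leaving not-per-spec high-order bits in the Datum. */
static Datum
sloppy_return(int32_t foo)
{
	return (Datum) foo;
}

static Datum
proper_return(int32_t foo)
{
	return Int32GetDatum(foo);
}
```

On LP64, sloppy_return(-1) is an all-ones word while proper_return(-1) has zero high bits, so a comparator looking at raw Datums sees them as unequal, even though DatumGetInt32 recovers -1 from both.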

I would certainly regard this as a bug in that function, but nonetheless
it's a hazard that we need to set against any performance improvement
that we can buy this way.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Andres Freund
On 2013-12-05 08:58:55 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I don't think we can get rid of that dance in record_image_eq - it very
 well could be used on records originally generated when those bits haven't
  been guaranteed to be zeroed.
 
 No, you're failing to think about the context here.

Ah yes. I was completely forgetting that
heap_form_tuple()/heap_fill_tuple() will properly take care to only use
meaningful parts of the (to-be-)stored data, not random padding.

Thanks.

 The risk we take by simplifying comparisons in a more general context
 is that some function somewhere might've been sloppy about doing a
 native-type-to-Datum conversion on its result.  In the case of V0
 functions that risk is unavoidable except by adding some explicit cleanup
 code, but I'm a bit worried that somebody, particularly third-party code,
 might've sloppily written "return foo" in a V1 function when "return
 Int32GetDatum(foo)" would be correct.  In that case, the resultant Datum
 might have not-per-spec high-order bits, and if it reaches the fast
 comparator without ever having been squeezed into a physical tuple,
 we've got a problem.

Too bad V1 hasn't insisted on using PG_RETURN_* macros. That would have
allowed asserts checking against such cases by setting
fcinfo->has_returned = true or such...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-05 08:58:55 -0500, Tom Lane wrote:
 I'm a bit worried that somebody, particularly third-party code,
 might've sloppily written "return foo" in a V1 function when "return
 Int32GetDatum(foo)" would be correct.  In that case, the resultant Datum
 might have not-per-spec high-order bits, and if it reaches the fast
 comparator without ever having been squeezed into a physical tuple,
 we've got a problem.

 Too bad V1 hasn't insisted on using PG_RETURN_* macros. That would have
 allowed asserts checking against such cases by setting
 fcinfo->has_returned = true or such...

[ shrug... ]  PG_RETURN_DATUM has no practical way to verify that the
given Datum was constructed safely, so I think we'd just be adding
overhead with not much real safety gain.

In practice, if we were to change Datum to be a signed type (intptr_t
not uintptr_t), the most common cases would probably do the right thing
anyway, ie an int or short return value would get promoted to Datum
with sign-extension.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Andres Freund
On 2013-12-05 10:02:56 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-12-05 08:58:55 -0500, Tom Lane wrote:
  I'm a bit worried that somebody, particularly third-party code,
  might've sloppily written "return foo" in a V1 function when "return
  Int32GetDatum(foo)" would be correct.  In that case, the resultant Datum
  might have not-per-spec high-order bits, and if it reaches the fast
  comparator without ever having been squeezed into a physical tuple,
  we've got a problem.
 
  Too bad V1 hasn't insisted on using PG_RETURN_* macros. That would have
  allowed asserts checking against such cases by setting
  fcinfo->has_returned = true or such...
 
 [ shrug... ]  PG_RETURN_DATUM has no practical way to verify that the
 given Datum was constructed safely, so I think we'd just be adding
 overhead with not much real safety gain.

I was thinking of doing it for assert only anyway.

 In practice, if we were to change Datum to be a signed type (intptr_t
 not uintptr_t), the most common cases would probably do the right thing
 anyway, ie an int or short return value would get promoted to Datum
 with sign-extension.

I was actually thinking about making Datum (and some other types we
have) structs or unions. Currently it's far, far too easy to mix them. We
throw away pretty much all of the little type safety C has by typedef'ing
them to integral types with lots of autocasting behaviour.
Making Datum a union with all the base types inside would also stop us
violating the standard's aliasing rules...

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 I was actually thinking about making Datum (and some other types we
 have) structs or unions. Currently it's far, far too easy to mix them. We
 throw away pretty much all of the little type safety C has by typedef'ing
 them to integral types with lots of autocasting behaviour.

That's intentional; on many ABIs, making Datum a struct would be
catastrophic performance-wise because it would not be eligible for simple
register pass or return conventions.  In any case, the number of bugs
I can remember that such a thing would've prevented is negligible.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Andres Freund
On 2013-12-05 10:34:16 -0500, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  I was actually thinking about making Datum (and some other types we
  have) structs or unions. Currently it's far, far too easy to mix them. We
  throw away pretty much all of the little type safety C has by typedef'ing
  them to integral types with lots of autocasting behaviour.
 
 That's intentional; on many ABIs, making Datum a struct would be
 catastrophic performance-wise because it would not be eligible for simple
 register pass or return conventions.

Unions should behave more sanely in that regard, though? And it'd be
fairly easy to make it an optional thing.

 In any case, the number of bugs I can remember that such a thing
 would've prevented is negligible.

Cases talked about upthread, where a plain datatype is returned as a
Datum instead of using FooGetDatum() and the reverse, would be
impossible. I don't think those are that infrequent?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-12-05 10:34:16 -0500, Tom Lane wrote:
 In any case, the number of bugs I can remember that such a thing
 would've prevented is negligible.

 Cases talked about upthread, where a plain datatype is returned as a
 Datum instead of using FooGetDatum() and the reverse, would be
 impossible. I don't think those are that infrequent?

[ shrug... ]  The performance changes we're talking about here would have
the effect of making the compiler's implicit casts be the right thing
anyway.  In any case, I don't think you'd have accomplished much by
forcing people to use FooGetDatum, unless you can force them to use the
*right* FooGetDatum.  The errors I can remember seeing in this area were
more in the line of choosing the wrong macro.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-05 Thread Peter Eisentraut
On Thu, 2013-12-05 at 15:29 +0200, Heikki Linnakangas wrote:
 It happens to me about 75% of the time when I write a new C function. 
 These days, I usually realize it pretty quickly.
 
 I wonder how easy it would be to make the compiler produce a warning 
 about it. Or issue a warning in PostgreSQL when you do CREATE
 FUNCTION 
 and the C function appears to be a V0 function.

Here is a related idea I've had:  Integrate the function declaration
into the PG_FUNCTION_INFO_V1 macro, like this:

#define PG_FUNCTION_INFO_V1(funcname) \
Datum funcname(PG_FUNCTION_ARGS); \
...

Because if you don't declare the function, you will get a compiler
warning from -Wmissing-prototypes.  Field experience shows that many
extension authors neglect to explicitly declare their entry-point
functions, presumably because they don't understand the cause of the
warning, or they don't look at the compiler output, so many extensions
unfortunately don't compile without warnings at the moment.

By putting the function declaration into the PG_FUNCTION_INFO_V1 macro,
everyone wins:

- fewer compiler warnings due to laziness
- no code bloat because of redundant function declarations
- accidental V0 functions are warned about
- prototypes of V1 function implementations are implicitly checked for
correctness
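The idea can be sketched end to end like so (the type names are simplified stand-ins for fmgr.h, and only the prototype-emitting part of the macro is shown; the real macro would go on to define the finfo record as before):

```c
#include <stdint.h>

/* Simplified stand-ins for fmgr.h, for illustration only. */
typedef uintptr_t Datum;
typedef void *FunctionCallInfo;
#define PG_FUNCTION_ARGS	FunctionCallInfo fcinfo

/* The proposal: the macro itself emits the declaration, so every use of
 * PG_FUNCTION_INFO_V1 also supplies the prototype.  A definition with a
 * mismatched signature then draws a conflicting-declaration error, and
 * -Wmissing-prototypes stays quiet without any extra typing. */
#define PG_FUNCTION_INFO_V1(funcname) \
	Datum funcname(PG_FUNCTION_ARGS)

PG_FUNCTION_INFO_V1(add_one);

Datum
add_one(PG_FUNCTION_ARGS)
{
	(void) fcinfo;				/* arguments unused in this sketch */
	return (Datum) 42;
}
```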






[HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Geoghegan
Having nothing better to do over the holiday weekend, I decided to
pursue a number of ideas for improving performance that I thought
about a long time ago. These include:

* Pre-fetching list node pointers. This looks to be moderately
promising, but I'm certainly not going to be the one to land it, given
present obligations. Stephen Frost may wish to pick it up, given his
previous interest in the matter. This is slightly controversial,
because it uses a GCC intrinsic (__builtin_prefetch), but also because
the Linux kernel removed this optimization to their generic list
data-structure [1]. However, that list was what we'd call an embedded
list, so we probably shouldn't be totally deterred. The amount
of effort that I put into this was, frankly, pretty low. A motivated
person, willing to do the appropriate analysis could probably bring it
further. For one thing, every single foreach() has a call to this
intrinsic, even where the list doesn't store pointers (which is not
undefined). At the very least that's going to bloat things up,
frequently for no conceivable gain, and yet with the patch applied
we're still able to see quite tangible benefits, even if it isn't
exactly a stellar improvement. I have an idea that prefetching the
last element at the start of the loop could be much better than what I
did, because we know that those lists are mostly pretty small in
practice, and that's likely to help pipelining - prefetching too late
or even too early makes the optimization useless, because you may
still get a cache miss.
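The shape of the experiment is roughly this (a standalone sketch of mine; the real List/ListCell structs and foreach() in pg_list.h differ):

```c
#include <stdint.h>
#include <stddef.h>

/* Simplified singly-linked cell, in the style of pg_list.h. */
typedef struct ListCell
{
	void	   *ptr_value;
	struct ListCell *next;
} ListCell;

/* foreach() variant that prefetches the next cell while the loop body
 * works on the current one.  __builtin_prefetch is only a hint: it
 * never faults, even on NULL or garbage, it just may not help. */
#define foreach_prefetch(cell, head) \
	for ((cell) = (head); \
		 (cell) != NULL && (__builtin_prefetch((cell)->next), 1); \
		 (cell) = (cell)->next)

static long
sum_cells(ListCell *head)
{
	ListCell   *lc;
	long		sum = 0;

	foreach_prefetch(lc, head)
		sum += (long) (intptr_t) lc->ptr_value;
	return sum;
}
```

Prefetching one cell ahead like this only hides latency when the body is long enough; as noted above, prefetching the tail of a short list up front may pipeline better.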

* Optimizing index scans - I noticed that binary searching accounted
for many cache misses during a pgbench "select.sql" benchmark,
instrumented with "perf record -e cache-misses". This warranted
further investigation.

I won't say anything further about the former optimization, except to
note that it's included for comparative purposes in the set of
benchmarks I've run (I haven't included a patch). The benchmark
results are here:

http://postgres-benchmarks.s3-website-us-east-1.amazonaws.com/results

I took two approaches to the latter. This was the more interesting
piece of work. Test sets include:

* Master baseline (green)

* List optimization (as mentioned above, not really relevant to the
main topic of this mail) (red)

* fib btree, earlier patch, please disregard (blue)

* Fixed fib patch, Fibonacci search, no specialization (purple)

* The interesting one, Fibonacci search + specialization - fib + no
jump (turquoise, see below for details)

Initially, I had a little bit of success with Fibonacci search [2] in
place of binary search, in the hope that it would better take
advantage of CPU cache characteristics - Fibonacci search is said to
have advantages where non-uniform memory access is an issue - it
minimizes the final interval. I wasn't all that optimistic that it
would work that well given the smallish size of BLCKSZ relative to
modern CPU L1 cache sizes [3], but it did make an appreciable dent on
its own. I suppose old habits die hard, because next I hacked up
_bt_compare and had it do an int4btcmp directly, in the event of
encountering a scankey that had as its comparator the relevant pg_proc
oid. This is very much in the style (and the spirit) of the grotty
early draft patches for the inlining-comparators-for-sorting patch.
Patch is attached. This is a draft, a POC, posted only to facilitate
discussion and to allow my results to be independently
duplicated/verified. Note that there is a bug (attributable to the new
search code) that causes the regression tests to fail in exactly one
place (i.e. one line of difference). I didn't think it was worth
deferring discussion to deal with that, though, since I don't think it
undermines anything.

I'm not sure how valuable the comparator trick is if we stick with
binary search - I didn't test that. I'm sure it's a question that must
be considered, though.
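For reference, the Fibonacci-search shape under discussion looks like this over a plain sorted array (a generic standalone sketch, not the patch's _bt_binsrch code):

```c
/* Fibonacci search over a sorted int array: like binary search, but the
 * probe offsets are Fibonacci numbers, so the interval shrinks by the
 * golden ratio and the final probes cluster close together - the claimed
 * cache/NUMA advantage.  Returns the index of key, or -1 if absent. */
static long
fib_search(const int *a, long n, int key)
{
	long		fib2 = 0,		/* F(k-2) */
				fib1 = 1,		/* F(k-1) */
				fib = 1;		/* F(k) = fib1 + fib2 */
	long		offset = -1;	/* everything at or before offset is < key */

	while (fib < n)				/* find smallest Fibonacci number >= n */
	{
		fib2 = fib1;
		fib1 = fib;
		fib = fib1 + fib2;
	}
	while (fib > 1)
	{
		long		i = (offset + fib2 < n - 1) ? offset + fib2 : n - 1;

		if (a[i] < key)
		{
			fib = fib1;
			fib1 = fib2;
			fib2 = fib - fib1;
			offset = i;
		}
		else if (a[i] > key)
		{
			fib = fib2;
			fib1 = fib1 - fib2;
			fib2 = fib - fib1;
		}
		else
			return i;
	}
	if (fib1 && offset + 1 < n && a[offset + 1] == key)
		return offset + 1;
	return -1;
}
```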

I have a fairly huge amount of data here, having run plenty of
benchmarks over several nights. The short version is that the 'select'
benchmark has just over 18% greater throughput on this machine at some
client counts (in particular, when there are 4 clients - there are 4
cores, but 8 logical cores) with the attached patch. There is a 3.5%
regression with one client, which is certainly not accounted for by
noise. Note, however, that latency appears consistently better with
the patch applied. This is a machine running on dedicated hardware: a
4-core i7-4770. The working set easily fits in its 32GiB of DDR3 RAM
at all pgbench scales tested (1, 10 and 100). The kernel used is
3.8.0-31-generic #46~precise1-Ubuntu SMP. Postgres settings are
typical for this kind of thing (6GiB shared_buffers), but you can
refer to my pgbench-tools results for full details (drill down to an
individual pgbench run for that - they're all the same). I'm kind of
curious as to what this benchmark would look like on a server with
many more cores.

I guess I could write a proper patch to have code 

Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 I guess I could write a proper patch to have code setting up a scankey
 also set a flag that indicated that it was acceptable to assume that
 the special built-in comparator would do fine. ...
 I'd be happy with a scheme with only one built-in comparator, and
 allowed a few types to be cataloged such that it was indicated that
 just using the built-in comparator was acceptable, knowledge that
 could be passed to _bt_compare via the scankey. I'm thinking of just
 int4, and maybe date and a few other such int4 covariant types.

If what you're proposing is that we have a fast path that compares Datums
as Datums, I should think that that would work fine for int2 as well,
*and* for int8 on machines where int8 is pass-by-value.  (Does anyone
still care much about PG's performance on 32-bit hardware?)  We might
have to fool a bit with the fooGetDatum macros in some cases, eg
I think Int16GetDatum isn't careful about sign extension.  Admittedly,
that might introduce an offsetting cost on some hardware, but I think
on most machines sign-extension isn't noticeably more expensive than
zero-extension.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Robert Haas
On Wed, Dec 4, 2013 at 4:28 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan p...@heroku.com writes:
 I guess I could write a proper patch to have code setting up a scankey
 also set a flag that indicated that it was acceptable to assume that
 the special built-in comparator would do fine. ...
 I'd be happy with a scheme with only one built-in comparator, and
 allowed a few types to be cataloged such that it was indicated that
 just using the built-in comparator was acceptable, knowledge that
 could be passed to _bt_compare via the scankey. I'm thinking of just
 int4, and maybe date and a few other such int4 covariant types.

 If what you're proposing is that we have a fast path that compares Datums
 as Datums, I should think that that would work fine for int2 as well,
 *and* for int8 on machines where int8 is pass-by-value.  (Does anyone
 still care much about PG's performance on 32-bit hardware?)  We might
 have to fool a bit with the fooGetDatum macros in some cases, eg
 I think Int16GetDatum isn't careful about sign extension.  Admittedly,
 that might introduce an offsetting cost on some hardware, but I think
 on most machines sign-extension isn't noticeably more expensive than
 zero-extension.

Yeah, I think if we can make something like this work, it would be
neat-o.  Getting this working for int4 would be a good win, as Peter
says, but getting it working for both int4 and int8 with the same code
would be a significantly better one.
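A sketch of what a shared fast path could look like, assuming (per Tom's point) sign-extending conversions into the Datum; note the real Int16GetDatum/Int32GetDatum in postgres.h do not work this way today, so the "Signed" macro names here are hypothetical:

```c
#include <stdint.h>

typedef uintptr_t Datum;		/* as in postgres.h */

/* Hypothetical sign-extending GetDatum variants, one per width.  The
 * int8 case additionally assumes pass-by-value int8 (64-bit Datum). */
#define Int16GetDatumSigned(x)	((Datum) (intptr_t) (int16_t) (x))
#define Int32GetDatumSigned(x)	((Datum) (intptr_t) (int32_t) (x))
#define Int64GetDatumSigned(x)	((Datum) (intptr_t) (int64_t) (x))

/* One comparator serving int2, int4 and pass-by-value int8: if every
 * producer sign-extends into the full Datum width, comparing the Datums
 * as signed words yields the same ordering whatever the source width. */
static inline int
btint_fastcmp(Datum a, Datum b)
{
	intptr_t	sa = (intptr_t) a;
	intptr_t	sb = (intptr_t) b;

	return (sa < sb) ? -1 : (sa > sb) ? 1 : 0;
}
```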

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Geoghegan
On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas robertmh...@gmail.com wrote:
 Yeah, I think if we can make something like this work, it would be
 neat-o.  Getting this working for int4 would be a good win, as Peter
 says, but getting it working for both int4 and int8 with the same code
 would be a significantly better one.

No arguments here. I think I didn't initially suggest it myself out of
passing concern about the guarantees around how unused Datum bits are
initialized in all relevant contexts, but having looked at it for a
second I see that we are of course disciplined there.


-- 
Peter Geoghegan




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
Peter Geoghegan p...@heroku.com writes:
 On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas robertmh...@gmail.com wrote:
 Yeah, I think if we can make something like this work, it would be
 neat-o.  Getting this working for int4 would be a good win, as Peter
 says, but getting it working for both int4 and int8 with the same code
 would be a significantly better one.

 No arguments here. I think I didn't initially suggest it myself out of
 passing concern about the guarantees around how unused Datum bits are
 initialized in all relevant contexts, but having looked at it for a
 second I see that we are of course disciplined there.

Hm ... actually, the comment at lines 335ff of postgres.h points out that
a Datum returned from a version 0 user-defined function might contain
garbage in the high order bits.  We could fix that, probably, with some
cleanup code added to fmgr_oldstyle.  It'd waste a few cycles ... but if
there's anybody out there who still cares about the performance of such
functions, it's high time they fixed them to be v1, anyway.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Robert Haas
On Wed, Dec 4, 2013 at 6:33 PM, Peter Geoghegan p...@heroku.com wrote:
 On Wed, Dec 4, 2013 at 1:59 PM, Robert Haas robertmh...@gmail.com wrote:
 Yeah, I think if we can make something like this work, it would be
 neat-o.  Getting this working for int4 would be a good win, as Peter
 says, but getting it working for both int4 and int8 with the same code
 would be a significantly better one.

 No arguments here. I think I didn't initially suggest it myself out of
 passing concern about the guarantees around how unused Datum bits are
 initialized in all relevant contexts, but having looked at it for a
 second I see that we are of course disciplined there.

Hmm.  And yet, there's this:

 * When a type narrower than Datum is stored in a Datum, we place it in the
 * low-order bits and are careful that the DatumGetXXX macro for it discards
 * the unused high-order bits (as opposed to, say, assuming they are zero).
 * This is needed to support old-style user-defined functions, since depending
 * on architecture and compiler, the return value of a function returning char
 * or short may contain garbage when called as if it returned Datum.

And record_image_eq does a rather elaborate dance around here, calling
the appropriate GET_x_BYTES macro depending on the type-width.  If we
can really count on the high-order bits to be zero, that's all
completely unnecessary tomfoolery.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 Hmm.  And yet, there's this:

  * When a type narrower than Datum is stored in a Datum, we place it in the
  * low-order bits and are careful that the DatumGetXXX macro for it discards
  * the unused high-order bits (as opposed to, say, assuming they are zero).
  * This is needed to support old-style user-defined functions, since depending
  * on architecture and compiler, the return value of a function returning char
  * or short may contain garbage when called as if it returned Datum.

 And record_image_eq does a rather elaborate dance around here, calling
 the appropriate GET_x_BYTES macro depending on the type-width.  If we
 can really count on the high-order bits to be zero, that's all
 completely unnecessary tomfoolery.

Yeah, that's another thing we could simplify if we fixed this problem
at the source.  I think these decisions date from a time when we still
cared about the speed of fmgr_oldstyle.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
I wrote:
 Yeah, that's another thing we could simplify if we fixed this problem
 at the source.  I think these decisions date from a time when we still
 cared about the speed of fmgr_oldstyle.

BTW, the text you're quoting is from 2007, but it's just documenting
behavior that's mostly a lot older.  It's worth reading commit 23a41573
in toto in this connection.  I'm not sure if we'd want to revert that
DatumGetBool change or not, if we were to clean up fmgr_oldstyle.
We'd be able to do whatever was cheapest, but I'm not sure what that is.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Robert Haas
On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 Hmm.  And yet, there's this:

  * When a type narrower than Datum is stored in a Datum, we place it in the
  * low-order bits and are careful that the DatumGetXXX macro for it discards
  * the unused high-order bits (as opposed to, say, assuming they are zero).
  * This is needed to support old-style user-defined functions, since depending
  * on architecture and compiler, the return value of a function returning char
  * or short may contain garbage when called as if it returned Datum.

 And record_image_eq does a rather elaborate dance around here, calling
 the appropriate GET_x_BYTES macro depending on the type-width.  If we
 can really count on the high-order bits to be zero, that's all
 completely unnecessary tomfoolery.

 Yeah, that's another thing we could simplify if we fixed this problem
 at the source.  I think these decisions date from a time when we still
 cared about the speed of fmgr_oldstyle.

Sure, let's whack that thing with a crowbar.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Eisentraut
On Wed, 2013-12-04 at 19:45 -0500, Robert Haas wrote:
 On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
  Yeah, that's another thing we could simplify if we fixed this problem
  at the source.  I think these decisions date from a time when we still
  cared about the speed of fmgr_oldstyle.
 
 Sure, let's whack that thing with a crowbar.

Or just remove it.  Who still needs it?






Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Wed, 2013-12-04 at 19:45 -0500, Robert Haas wrote:
 On Wed, Dec 4, 2013 at 6:56 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Yeah, that's another thing we could simplify if we fixed this problem
 at the source.  I think these decisions date from a time when we still
 cared about the speed of fmgr_oldstyle.

 Sure, let's whack that thing with a crowbar.

 Or just remove it.  Who still needs it?

Lazy people?  I'm not in a hurry to drop it; it's not costing us much to
just sit there, other than in this connection which we see how to fix.

regards, tom lane




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Geoghegan
On Wed, Dec 4, 2013 at 12:58 PM, Peter Geoghegan p...@heroku.com wrote:
 I'm kind of
 curious as to what this benchmark would look like on a server with
 many more cores.

I'm also curious about the impact on insertion into primary key
indexes. Presently, we hold an exclusive buffer lock for the duration
of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
_bt_binsrch() is one such operation. The other one there,
_bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
on average, I think, so I'm cautiously optimistic that it'll be
noticeable. I better go and check it out.


-- 
Peter Geoghegan




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Geoghegan
On Wed, Dec 4, 2013 at 5:28 PM, Peter Geoghegan p...@heroku.com wrote:
 I'm also curious about the impact on insertion into primary key
 indexes. Presently, we hold an exclusive buffer lock for the duration
 of a couple of operations when checkUnique != UNIQUE_CHECK_NO.
 _bt_binsrch() is one such operation. The other one there,
 _bt_check_unique(), is likely to be a lot cheaper than _bt_binsrch()
 on average, I think, so I'm cautiously optimistic that it'll be
 noticeable. I better go and check it out.

Depending on how well this goes, I might also teach _bt_doinsert() to
hint to _bt_binsrch() (or, as I'm calling it, _bt_page_search()) that
it should look to the end of the page when searching, using a
mechanism similar to the one that hints that the main Datum-compare
optimization is applicable (this strategy would be abandoned if it
didn't pay off immediately -- that is, as soon as the last item on the
page turned out to be greater than or equal to the scankey value).
This is something that I think would help with SERIAL columns, where
it's possible in principle to pass that kind of insight around -- if
you can live with making SERIAL more than mere syntactic sugar.

-- 
Peter Geoghegan




Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Peter Eisentraut
On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:
 Lazy people?  I'm not in a hurry to drop it; it's not costing us much to
 just sit there, other than in this connection which we see how to fix.

Actually, I think it probably costs a fair number of extension authors
time when their initial code crashes because they forgot to declare all
their functions V1.  I think it might actually be more of a bother to
lazy people than a benefit.






Re: [HACKERS] Performance optimization of btree binary search

2013-12-04 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On Wed, 2013-12-04 at 20:27 -0500, Tom Lane wrote:
 Lazy people?  I'm not in a hurry to drop it; it's not costing us much to
 just sit there, other than in this connection which we see how to fix.

 Actually, I think it probably costs a fair portion of extension authors
 when their initial code crashes because they forgot to declare all their
 functions V1.  I think it might actually be more of a bother to lazy
 people than a benefit.

Hm.  We have heard one or two complaints like that, but not a huge number.

I'm worried about breaking code that's been working since god-knows-when;
but I will concede there's little evidence that there's very much of that
out there either.

regards, tom lane

