Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-07-25 Thread Peter Geoghegan
On 3 March 2012 20:22, Jeff Janes jeff.ja...@gmail.com wrote:
 Add it all up, and instead of pre-reading 32 consecutive 8K blocks, it
 pre-reads only about 1 or 2 consecutive ones on the final merge.  Now
 some of those could be salvaged by the kernel keeping track of
 multiple interleaved read ahead opportunities, but in my hands vmstat
 shows a lot of IO wait and shows reads that seem to be closer to
 random IO than large read-ahead.  If it used truly efficient read
 ahead, CPU would probably be limiting.

Can you suggest a benchmark that will usefully exercise this patch?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Adding probes for smgr

2012-07-28 Thread Peter Geoghegan
On 28 July 2012 17:15, Tom Lane t...@sss.pgh.pa.us wrote:
 IMV smgr is pretty vestigial.  I wouldn't recommend loading more
 functionality onto that layer, because it's as likely as not that
 we'll just get rid of it someday.

Agreed. I recently found myself reading a paper written by Stonebraker
back in the Berkeley days:

http://dislab2.hufs.ac.kr/dislab/seminar/2007/ERL-M87-06.pdf

This paper appears to have been published in about 1988, and it shows.
It's fairly obvious from reading the opening paragraph that the
original rationale for the design of the storage manager doesn't hold
these days. Of course, it's also obvious from reading the code, since
for example there is only one storage manager module.

This state of affairs sort of reminds me of mcxt.c. The struct
MemoryContextData is described as an abstract type that can have
multiple implementations, despite the fact that since 2000 (and
perhaps earlier), the underlying type is invariably AllocSetContext. I
never investigated if that indirection still needs to exist, but I
suspect that it too is a candidate for refactoring. Do you agree?
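
For reference, the indirection I mean is the per-context method table.
A minimal sketch of its shape (types simplified and members abridged by
me; the real definitions live in memnodes.h and mcxt.c):

#include <stddef.h>

typedef size_t Size;
typedef struct MemoryContextData *MemoryContext;

/*
 * Every context carries a vtable of methods like this, even though in
 * practice the only implementation has long been aset.c's
 * AllocSetContext.
 */
typedef struct MemoryContextMethods
{
    void       *(*alloc) (MemoryContext context, Size size);
    void        (*free_p) (MemoryContext context, void *pointer);
    void       *(*realloc) (MemoryContext context, void *pointer, Size size);
    void        (*reset) (MemoryContext context);
    void        (*delete_context) (MemoryContext context);
} MemoryContextMethods;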

Incidentally, you might consider refreshing these remarks in the smgr README:

In Berkeley Postgres each relation was tagged with the ID of the
storage manager to use for it.  This is gone.  It would be more
reasonable to associate storage managers with tablespaces (a feature
not present as this text is being written, but one likely to emerge
soon).

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] Help me develop new commit_delay advice

2012-07-31 Thread Peter Geoghegan
Many of you will be aware that the behaviour of commit_delay was
recently changed. Now, the delay only occurs within the group commit
leader backend, and not within each and every backend committing a
transaction:

http://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=f11e8be3e812cdbbc139c1b4e49141378b118dee

For those of you who didn't follow this development, I should point
out that I wrote a blog post describing the idea, which will serve
as a useful summary:

http://pgeoghegan.blogspot.com/2012/06/towards-14000-write-transactions-on-my.html

I made what may turn out to be a useful observation during the
development of the patch, which was that for both the tpc-b.sql and
insert.sql pgbench-tools scripts, a commit_delay of half of my
wal_sync_method's reported raw sync time looked optimal. I use Linux,
so my wal_sync_method happened to have been fdatasync. I measured this
using pg_test_fsync.

The devel docs still say of commit_delay and commit_siblings: "Good
values for these parameters are not yet clear; experimentation is
encouraged." This has been the case since Postgres 7.1 (i.e. it has
never been clear what good values were - the folk wisdom was actually
that commit_delay should always be set to 0). I hope to be able to
formulate some folk wisdom about setting commit_delay from 9.3 on,
that may go on to be accepted as an official recommendation within the
docs.

I am rather curious as to what experimentation shows optimal values
for commit_delay to be for a representative cross-section of hardware.
In particular, I'd like to see if setting commit_delay to half of raw
sync time appears to be optimal for both insert.sql and tpc-b.sql
workloads across different types of hardware with different sync
times. Now, it may be sort of questionable to take those workloads as
general proxies for performance, not least since they will literally
give Postgres as many *completely uniform* transactions as it can
handle. However, it is hard to think of another, better general proxy
for performance that is likely to be accepted as such, and that allows
us to reason about setting commit_delay.

While I am not completely confident that we can formulate a widely
useful, simple piece of advice, I am encouraged by the fact that a
commit_delay of 4,000 worked very well for both tpc-b.sql and
insert.sql workloads on my laptop, beating out settings of 3,000 and
5,000 on each benchmark. I am also encouraged by the fact that in some
cases, including both the insert.sql and tpc-b.sql cases that I've
already described elsewhere, there is actually no downside to setting
commit_delay - transaction throughput naturally improves, but
transaction latency is actually improved a bit too (or at least the
average and worst-cases). This is presumably due to the amelioration
of resource contention (from greater commit batching) more than
compensating for the obvious downside of adding a delay.

It would be useful, for a start, if I had numbers for a battery-backed
write cache. I don't have access to one right now though, nor do I
have access to any more interesting hardware, which is one reason why
I'm asking for help with this.

I like to run sync prior to running pg_test_fsync, just in case.

[peter@peterlaptop pg_test_fsync]$ sync

I then interpret the following output:

[peter@peterlaptop pg_test_fsync]$ pg_test_fsync
2 seconds per test
O_DIRECT supported on this platform for open_datasync and open_sync.

Compare file sync methods using one 8kB write:
(in wal_sync_method preference order, except fdatasync
is Linux's default)
open_datasync 112.940 ops/sec
fdatasync 114.035 ops/sec
fsync 21.291 ops/sec
*** SNIP ***

So if I can perform 114.035 8KiB sync operations per second, that's an
average of about 1 per 8.77 milliseconds, or 8770 microseconds to put
it in the units that commit_delay speaks. It is my hope that we will
find that when this number is halved, we will arrive at a figure that
is worth recommending as a general useful setting for commit_delay for
the system. I guess I could gain some additional insight by simply
changing my wal_sync_method, but I'd find it more interesting to look
at organic setups with faster (not slower) sync times than my system's
fdatasync. For those who are able to help me here, I'd like to see
pgbench-tools workloads for both tpc-b.sql and insert.sql with
increasing values of commit_delay (increments of, say, 1000
microseconds, perhaps with less granularity where it isn't needed),
from 0 to 1.5 times the raw sync time in microseconds.
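
To make that arithmetic concrete, here is a trivial throwaway C
program (purely illustrative; the rule of thumb it encodes is the
assumption under test here, not settled advice):

#include <stdio.h>

int
main(void)
{
    double ops_per_sec = 114.035;           /* fdatasync rate from pg_test_fsync */
    double usec_per_op = 1e6 / ops_per_sec; /* ~8770 microseconds per sync */

    /* half of raw sync time, in commit_delay's units (microseconds) */
    printf("suggested starting commit_delay = %.0f\n", usec_per_op / 2.0);
    return 0;
}

On the numbers above this prints about 4385, which accords with a
setting of 4,000 having worked well on my laptop.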

Thanks
-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Help me develop new commit_delay advice

2012-08-01 Thread Peter Geoghegan
On 1 August 2012 15:14, Amit Kapila amit.kap...@huawei.com wrote:
 I shall look into this aspect also (setting commit_delay based on raw sync).
 You also suggest if you want to run the test with different configuration.

Well, I was specifically interested in testing whether half of raw
sync time was a widely useful setting across a variety of different,
though representative, I/O subsystems. Unfortunately, without some
context about raw sync speed to go along with your numbers, I cannot
advance or disprove that idea.

It would also have been nice to see a baseline number for commit_delay
= 0, to get an idea of how effective commit_delay may now be. However,
that's secondary.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] Help me develop new commit_delay advice

2012-08-02 Thread Peter Geoghegan
On 29 July 2012 16:39, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Many of you will be aware that the behaviour of commit_delay was
 recently changed. Now, the delay only occurs within the group commit
 leader backend, and not within each and every backend committing a
 transaction:

I've moved this to the pgsql-performance list. Please continue the
discussion there.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



Re: [HACKERS] WIP: pg_pretty_query

2012-08-07 Thread Peter Geoghegan
On 7 August 2012 20:01, Andrew Dunstan and...@dunslane.net wrote:
 On 08/07/2012 02:14 PM, Tom Lane wrote:
 * As per some of the complaints already registered in this thread,
 ruleutils.c is not designed with the goal of being a pretty-printer.
 Its primary charter is to support pg_dump by regurgitating rules/views
 in an unambiguous form, which does not necessarily look very similar to
 what was entered.  An example of a transformation that probably nobody
 would want in a typical pretty-printing context is expansion of
 SELECT * lists.  But again, there is really no way to turn that off.
 Another aspect that seems pretty undesirable for pretty-printing is
 loss of any comments embedded in the query text.

 I'm very much not in favor of trying to make ruleutils serve two
 masters, but that's the game we will be playing if we accept this patch.

+1. A pretty-printer that makes the query being cleaned up actually
undergo parse analysis seems misconceived to me. This is a tool that
begs to be written in something like Python, as a satellite project,
with much greater flexibility in the format of the SQL that it
outputs.

 One of the challenges is to have a pretty printer that is kept in sync with
 the dialect that's supported. Anything that doesn't use the backend's parser
 seems to me to be guaranteed to get out of sync very quickly.

I'm not convinced of that. Consider the example of cscope, a popular
tool for browsing C code. Its parser was written to be fuzzy, so
it's perfectly usable for C++ and Java, even though those languages
aren't actually supported, IIRC. Now, I'll grant you that that isn't a
perfectly analogous situation, but it is similar in some ways. If we
add a new clause to SELECT and that bit is generically pretty-printed,
is that really so bad? I have a hard time believing that a well
written fuzzy pretty-printer for Postgres wouldn't deliver the vast
majority of the benefits of an in-core approach, at a small fraction
of the effort. You'd also be able to pretty-print plpgsql function
definitions (a particularly compelling case for this kind of tool),
which any approach based on the backend's grammar will never be able
to do (except, perhaps, if you were to do something even more
complicated).

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services



[HACKERS] Re: [COMMITTERS] pgsql: Revert commit_delay change; just add comment that we don't hav

2012-08-15 Thread Peter Geoghegan
On 15 August 2012 05:15, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 On 14 August 2012 21:26, Bruce Momjian br...@momjian.us wrote:
 Revert commit_delay change; just add comment that we don't have
 a microsecond specification.

 I think that if we eventually decide to change the name of
 commit_delay for 9.3 (you previously suggested that that might be
 revisited), it will be reasonable to have the new GUC in units of
 milliseconds.

 Well, the reason why it's like that at all is the thought that values
 of less than 1 millisecond might be useful.  Are we prepared to suppose
 that that is not and never will be true?

I think that the need for sub-millisecond granularity had more to do
with historic quirks of our implementation. Slight tweaks accidentally
greatly improved throughput, if only for the synthetic benchmark in
question. I personally have not seen any need for a sub-millisecond
granularity when experimenting with the setting on 9.3-devel. However,
I am not sure that sub-millisecond granularity could never be of any
use, in squeezing the last small increase in throughput made possible
by commit_delay. Importantly, feedback as the GUC is tuned is far more
predictable than it was with the prior implementation, so
sub-millisecond granularity does seem quite unimportant.

Why does commit_delay have to be an integer? Can't we devise a way of
manipulating it in units of milliseconds, but have the internal
representation be a double, as with pg_stat_statements' total_time
column? That would allow very fine tuning of the delay. As I've
outlined, I'm not sure that it's worth supporting such fine-tuning
with the new implementation.
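
To illustrate, a hypothetical sketch of what a float-valued guc.c
table entry might look like (layout abridged from the
ConfigureNamesReal convention in guc.c; the name, range, and wording
here are mine, not a concrete proposal):

static double CommitDelayMs = 0.0;  /* hypothetical float-valued GUC */

static struct config_real commit_delay_real =
{
    {"commit_delay", PGC_USERSET, WAL_SETTINGS,
        gettext_noop("Sets the delay in milliseconds between transaction "
                     "commit and flushing WAL to disk."),
        NULL
    },
    &CommitDelayMs,
    0.0, 0.0, 1000.0,   /* boot value, min, max */
    NULL, NULL, NULL    /* check, assign, show hooks */
};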

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




[HACKERS] Re: [COMMITTERS] pgsql: Revert commit_delay change; just add comment that we don't hav

2012-08-15 Thread Peter Geoghegan
On 15 August 2012 14:39, Tom Lane t...@sss.pgh.pa.us wrote:
 If you wanted to re-implement all the guc.c logic for supporting
 unit-ified values such that it would also work with floats, we could
 do that.  It seems like way more mechanism than the problem is worth
 however.

Fair enough.

I'm not quite comfortable recommending a switch to milliseconds if
that implies a loss of sub-millisecond granularity. I know that
someone is going to point out that in some particular benchmark, they
can get another relatively modest increase in throughput (perhaps
2%-3%) by splitting the difference between two adjoining millisecond
integer values. In that scenario, I'd be tempted to point out that
that increase is quite unlikely to carry over to real-world benefits,
because the setting is then right on the cusp of where increasing
commit_delay stops helping throughput and starts hurting it. The
improvement is likely to get lost in the noise in the context of a
real-world application, where, for example, the actual cost of an
fsync is more variable. I'm just not sure that that's the right
attitude.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




[HACKERS] The pgrminclude problem

2012-08-16 Thread Peter Geoghegan
I found a tool that Google authored that attempts to solve the same
problems as pgrminclude does in our codebase. It's called "include
what you use" (IWYU), and is based on Clang. The project is hosted here:

http://code.google.com/p/include-what-you-use/

I'm not suggesting that we should start using this tool instead of
pgrminclude, because it has enough caveats of its own, and is mostly
written with Google's C++ codebase in mind. However, it's worth being
aware of. There is a useful analysis of the general problem in the
README, under "Why IWYU is Difficult" (you can probably just skip the
extensive analysis of C++ templates as they relate to the problem
there).

The tool is authored by Craig Silverstein, Google's director of
technology. If he believes that IWYU is a difficult problem, well, it
probably is.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] The pgrminclude problem

2012-08-16 Thread Peter Geoghegan
On 16 August 2012 16:56, Bruce Momjian br...@momjian.us wrote:
 Good to know. We only use pgrminclude every five years or so, and Tom
 isn't even keen on that.

Yeah. Even if this could be made to work well, we'd still have to do
something like get an absolute consensus from all build farm animals,
if we expected to have an absolutely trustworthy list. I don't think
pgrminclude is a bad idea. I just think that it should only be used to
guide the efforts of a human to remove superfluous #includes, which is
how it is used anyway.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-08-16 Thread Peter Geoghegan
...AllocSetGetChunkSpace) return header size too. We were getting away
with that before with the doubling strategy, but now we have to factor
that header's size into allowedMem.

I don't think my revision is satisfactory in what it does here. It
isn't obvious what a better principled implementation would do,
though. Does anyone have any other ideas?
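
To be concrete about the accounting problem, the pattern is roughly
this (a sketch only - the names follow tuplesort.c, but the
surrounding grow logic is elided):

/*
 * When growing memtuples, charge availMem using the allocator's own
 * idea of the chunk's size - which includes the chunk header - rather
 * than the nominal array size.
 */
FREEMEM(state, GetMemoryChunkSpace(state->memtuples));
state->memtuples = (SortTuple *)
    repalloc(state->memtuples, state->memtupsize * sizeof(SortTuple));
USEMEM(state, GetMemoryChunkSpace(state->memtuples));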

I think this patch (or at least your observation about I/O waits
within vmstat) may point to a more fundamental issue with our sort
code: Why are we not using asynchronous I/O in our implementation?
There are anecdotal reports of other RDBMS implementations doing far
better than we do here, and I believe asynchronous I/O, pipelining,
and other such optimisations have a lot to do with that. It's
something I'd hoped to find the time to look at in detail, but
probably won't in the 9.3 cycle. One of the more obvious ways of
optimising an external sort is to use asynchronous I/O so that one run
of data can be sorted or merged while other runs are being read from
or written to disk. Our current implementation seems naive about this.
There are some interesting details about how this is exposed by POSIX
here:

http://www.gnu.org/software/libc/manual/html_node/Asynchronous-I_002fO.html

It's already anticipated that we might take advantage of libaio for
the benefit of FilePrefetch() (see its accompanying comments - it uses
posix_fadvise itself - effective_io_concurrency must be > 0 for this
to ever be called). It perhaps could be considered parallel
low-hanging fruit, in that it allows us to offer limited though
useful backend parallelism without first resolving thorny issues
around what abstraction we might use, or how we might eventually make
backends thread-safe. AIO supports registering signal callbacks (a
SIGPOLL handler can be called), which seems relatively
uncontroversial.
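
For illustration, a minimal self-contained sketch of the POSIX AIO
pattern I have in mind (link with -lrt on glibc; error handling and
the surrounding merge logic are omitted):

#include <aio.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

#define RUN_BUFSZ (32 * 8192)   /* e.g. 32 consecutive 8K blocks */

static struct aiocb cb;

/* Begin reading the next run asynchronously; returns immediately. */
static int
prefetch_next_run(int fd, void *buf, off_t offset)
{
    memset(&cb, 0, sizeof(cb));
    cb.aio_fildes = fd;
    cb.aio_buf = buf;
    cb.aio_nbytes = RUN_BUFSZ;
    cb.aio_offset = offset;
    return aio_read(&cb);
}

/* Once the current run is exhausted, block until the prefetch is done. */
static ssize_t
wait_for_prefetch(void)
{
    const struct aiocb *list[1] = { &cb };

    while (aio_error(&cb) == EINPROGRESS)
        aio_suspend(list, 1, NULL);
    return aio_return(&cb);     /* bytes read, or -1 on error */
}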

Platform support for AIO might be a bit lacking, but then you can say
the same about posix_fadvise. We don't assume that poll(2) is
available, but we already use it where it is within the latch code.
Besides, in-kernel support can be emulated where POSIX threads are
available, which I believe would make this broadly useful on unix-like
platforms.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


sortmem_grow-v2.patch
Description: Binary data



Re: [HACKERS] sha1, sha2 functions into core?

2012-08-20 Thread Peter Geoghegan
On 20 August 2012 21:26, Joshua D. Drake j...@commandprompt.com wrote:

 On 08/20/2012 01:21 PM, Josh Berkus wrote:



 I don't think US export regulations are the only issue. Some other
 countries (mostly the usual suspects) forbid the use of crypto software.
 If we build more crypto functions into the core we make it harder to use
 Postgres legally in those places.


 I fail to see how that is our problem. We shouldn't make the software less
 useful because of those places.

Agreed.

The idea of some secret policeman urging the use of MySQL because it
doesn't have a built-in SHA-1 cryptographic hash function seems
extremely far-fetched to me. The BitTorrent protocol uses SHA-1 to
validate chunks, and it has been variously estimated that 10% - 50% of
all internet traffic is BitTorrent traffic. SHA-1 is also integral to
the way that git makes content effectively tamper-proof:

http://www.youtube.com/watch?v=4XpnKHJAok8#t=56m

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] TODO

2012-08-23 Thread Peter Geoghegan
On 23 August 2012 20:21, Bruce Momjian br...@momjian.us wrote:
 On Tue, Aug 21, 2012 at 12:16:11PM +0900, Tatsuo Ishii wrote:
 I found this in https://wiki.postgresql.org/wiki/Todo :

   Improve ability to display optimizer analysis using OPTIMIZER_DEBUG

 What does this actually mean?

 Add GUC switch to enable optimizer debug on/off?
 More fancy/useful info should be printed?
 If so, what kind of information is required?

 Well, right now, OPTIMIZER_DEBUG lets you see what plans were considered
 and removed.  I was thinking that information should be available
 without a special compiled binary.

That seems like an unreasonable burden to place on OPTIMIZER_DEBUG.
If you can't build Postgres from source - a relatively low barrier -
you have little hope of sensibly interpreting low-level output from
the planner. However, I could get behind having it as a configure
option, which it is currently not. It would probably also make sense
to expose SELECTIVITY_DEBUG and GEQO_DEBUG.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] wal_buffers

2012-08-29 Thread Peter Geoghegan
On 19 February 2012 05:24, Robert Haas robertmh...@gmail.com wrote:
 I have attached tps scatterplots.  The obvious conclusion appears to
 be that, with only 16MB of wal_buffers, the buffer wraps around with
 some regularity: we can't insert more WAL because the buffer we need
 to use still contains WAL that hasn't yet been fsync'd, leading to
 long stalls.  More buffer space ameliorates the problem.

Incidentally, I wondered if we could further improve group commit
performance by implementing commit_delay with a WaitLatch call, and
setting the latch in the event of WAL buffers wraparound (or rather, a
queued wraparound request - a segment switch needs WALWriteLock, which
the group commit leader holds for a relatively long time during the
delay). I'm not really sure how significant a win this might be,
though. There could be other types of contention, which could be
considerably more significant. I'll try and take a look at it next
week.
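
Roughly what I have in mind (a hypothetical sketch; the real patch
would need care around exactly where the latch is set and reset):

/*
 * Instead of sleeping unconditionally with pg_usleep(CommitDelay), the
 * group commit leader could wait on its latch, so that a queued WAL
 * buffers wraparound request can cut the delay short.
 */
int     rc;

rc = WaitLatch(&MyProc->procLatch,
               WL_LATCH_SET | WL_TIMEOUT,
               CommitDelay / 1000L);   /* commit_delay is in microseconds */

if (rc & WL_LATCH_SET)
{
    ResetLatch(&MyProc->procLatch);
    /* woken early: proceed to flush WAL immediately */
}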

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] effective_io_concurrency

2012-08-30 Thread Peter Geoghegan
On 30 August 2012 20:28, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jul 28, 2012 at 4:09 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 But it might be better yet to make ordinary index scans benefit from
 effective_io_concurrency, but even if/when that gets done it would
 probably still be worthwhile to make the planner understand the
 benefit.

 That sounds good too, but separate.

Indeed. The original effective_io_concurrency commit message said:


***SNIP***

(The best way to handle this for plain index scans is still under debate,
so that part is not applied yet --- tgl)


...seems like a pity that this debate never reached a useful conclusion.

Just how helpful is effective_io_concurrency? Did someone produce a
benchmark at some point?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 06:21, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 Hi,

 I'm experiencing the case when bitmap scan is ~70 times slower than index
 scan, which seems to be caused by 1) a very big table and 2) some hash
 search logic (hash_search_with_hash_value).

 Here is the explain analyze of the query with bitmap scans allowed:

 wsdb=> explain analyze select * from test as t, crts.data as d1
 where d1.objid=t.objid and d1.mjd=t.mjd limit 1;
                                  QUERY PLAN
 ---------------------------------------------------------------------------
  Limit  (cost=11514.04..115493165.44 rows=1 width=68) (actual time=27.512..66620.231 rows=1 loops=1)
    ->  Nested Loop  (cost=11514.04..1799585184.18 rows=155832 width=68) (actual time=27.511..66616.807 rows=1 loops=1)
          ->  Seq Scan on test t  (cost=0.00..2678.40 rows=156240 width=28) (actual time=0.010..4.685 rows=11456 loops=1)
          ->  Bitmap Heap Scan on data d1  (cost=11514.04..11518.05 rows=1 width=40) (actual time=5.807..5.807 rows=1 loops=11456)
                Recheck Cond: ((mjd = t.mjd) AND (objid = t.objid))
                ->  BitmapAnd  (cost=11514.04..11514.04 rows=1 width=0) (actual time=5.777..5.777 rows=0 loops=11456)
                      ->  Bitmap Index Scan on data_mjd_idx  (cost=0.00..2501.40 rows=42872 width=0) (actual time=3.920..3.920 rows=22241 loops=11456)
                            Index Cond: (mjd = t.mjd)
                      ->  Bitmap Index Scan on data_objid_idx  (cost=0.00..8897.90 rows=415080 width=0) (actual time=0.025..0.025 rows=248 loops=11456)
                            Index Cond: (objid = t.objid)
  Total runtime: 66622.026 ms
 (11 rows)

 Here is the output when bitmap scans are disabled:
                                  QUERY PLAN
 ---------------------------------------------------------------------------
  Limit  (cost=0.00..329631941.65 rows=1 width=68) (actual time=0.082..906.876 rows=1 loops=1)
    ->  Nested Loop  (cost=0.00..4979486036.95 rows=151062 width=68) (actual time=0.081..905.683 rows=1 loops=1)
          Join Filter: (t.mjd = d1.mjd)
          ->  Seq Scan on test t  (cost=0.00..2632.77 rows=151677 width=28) (actual time=0.009..1.679 rows=11456 loops=1)
          ->  Index Scan using data_objid_idx on data d1  (cost=0.00..26603.32 rows=415080 width=40) (actual time=0.010..0.050 rows=248 loops=11456)
                Index Cond: (objid = t.objid)
  Total runtime: 907.462 ms

 When the bitmap scans are enabled the prof of postgres shows
 47.10%  postmaster  postgres   [.] hash_search_with_hash_value
 |
 --- hash_search_with_hash_value

 11.06%  postmaster  postgres   [.] hash_seq_search
 |
 --- hash_seq_search

  6.95%  postmaster  postgres   [.] hash_any
 |
 --- hash_any

  5.17%  postmaster  postgres   [.] _bt_checkkeys
 |
 --- _bt_checkkeys

  4.07%  postmaster  postgres   [.] tbm_add_tuples
 |
 --- tbm_add_tuples

  3.41%  postmaster  postgres   [.] hash_search
 |
 --- hash_search


 And the last note is that the crts.data table which is being bitmap scanned
 is a 1.1Tb table with ~ 20e9 rows. My feeling is that the bitmap index scan
 code
 is somehow unprepared to combine two bitmaps for such a big table, and this
 leads to the terrible performance.

I think that this kind of question is better suited to the
pgsql-performance list. Granted, it was presented as a bug report
(though they're generally sent to -bugs rather than -hackers), but I
don't think that this is a valid bug.

One obvious red flag from your query plans is that there is a
misestimation of the row return count of a few orders of magnitude in
the Bitmap Index Scan node. Did you try performing an ANALYZE to
see if that helped? It may also be helpful to show pg_stats entries
for both the data.mjd and test.mjd columns. You may find, prior to
doing an ANALYZE, that there are no entries for one of those tables.

Turning off the enable_* planner options in production is generally
discouraged. Certainly, you'd be crazy to do that on a server-wide
basis.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] bitmap scan much slower than index scan, hash_search_with_hash_value

2012-09-02 Thread Peter Geoghegan
On 2 September 2012 16:26, Sergey Koposov kopo...@ast.cam.ac.uk wrote:
 After looking at them, I think I understand the reason -- the
 number of n_distinct for crts.data is terribly wrong. In reality it should
 be ~130 million. I already faced this problem at a certain point when doing
 group by objid and PG was exhausting all the memory, because of that wrong
 estimate. But I know that it's a known hard problem to estimate n_distinct.

 So probably that's the main reason of a problem...

That's why we support overriding that estimate with an ALTER TABLE ...
ALTER COLUMN ... SET (n_distinct = ...) DDL statement. You might at
least consider increasing the statistics target for the column first,
though, to see if that makes the n_distinct value better accord with
reality.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Is this non-volatile pointer access OK?

2012-09-03 Thread Peter Geoghegan
On 3 September 2012 08:10, Daniel Farina dan...@heroku.com wrote:
 http://doxygen.postgresql.org/xlog_8c_source.html#l08197

 On line 8197 of xlog.c:

 08194 /* Get a local copy of the last safe checkpoint record. */
 08195 SpinLockAcquire(&xlogctl->info_lck);
 08196 lastCheckPointRecPtr = xlogctl->lastCheckPointRecPtr;
 08197 memcpy(&lastCheckPoint, &XLogCtl->lastCheckPoint,
        sizeof(CheckPoint));
 08198 SpinLockRelease(&xlogctl->info_lck);

 Note the use of capital XLogCtl->lastCheckPoint, which is not the
 volatile pointer.

That looks like a bug to me.

Come to think of it, the whole convention of using a lower-case
variant of the original pointer variable name seems like a foot-gun,
given the harmful and indeed very subtle consequences of making this
error.

I count 98 SpinLockAcquire() call sites (of which only a minority use
this convention, which is mostly within xlog.c, I think). Is it worth
instituting an alternative convention to make this kind of misuse more
obvious? This has gone unnoticed since February 2009.
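
To sketch one such alternative convention (the naming here is purely
illustrative, not a concrete proposal): make the volatile alias
visually loud, so that a bare XLogCtl dereference inside a spinlocked
section stands out in review:

/* the only pointer that may be dereferenced in the critical section */
volatile XLogCtlData *xlogctl_v = XLogCtl;

SpinLockAcquire(&xlogctl_v->info_lck);
lastCheckPointRecPtr = xlogctl_v->lastCheckPointRecPtr;
memcpy(&lastCheckPoint, (const void *) &xlogctl_v->lastCheckPoint,
       sizeof(CheckPoint));
SpinLockRelease(&xlogctl_v->info_lck);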

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




[HACKERS] pg_test_fsync output and commit_delay

2012-09-08 Thread Peter Geoghegan
I propose that we try and develop better commit_delay advice, to make
it easier to set the parameter in a way that actually helps
performance. I have been researching a way to make commit_delay
adaptive, though have yet to develop any further insight that is worth
sharing. This may change.

The attached simple patch alters the output produced by pg_test_fsync,
so that we also see the average sync time per op. The pg_test_fsync
docs have had minor alterations too. Hopefully, another doc patch will
follow that builds upon this, and actually firmly recommends taking a
particular course of action when setting commit_delay - my previous
observations about what helped throughput, though supported by Greg
Smith, have not been scrutinised enough just yet, I feel. For now, the
equivocal wording of my doc alterations (that raw wal_sync_method
file sync time might *somehow* be useful here) seems appropriate.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pg_test_fsync.v1.2012_09_08.patch
Description: Binary data



Re: [HACKERS] Closing CF 2012-06

2012-09-14 Thread Peter Geoghegan
On 14 September 2012 18:47, Noah Misch n...@leadboat.com wrote:
 According to the CommitFest site, 2012-06 is still in progress with four
 unresolved patches.  With CF 2012-09 scheduled to start tomorrow, let's get it
 closed.  I propose changing two patches from Waiting on Author to Returned
 with Feedback and moving the other two to CF 2012-09.

That seems reasonable. I've marked both patches that I was listed as a
reviewer for (the patches that were previously Waiting on Author) as
Returned with Feedback.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-01 Thread Peter Geoghegan
On 1 October 2012 08:57, Magnus Hagander mag...@hagander.net wrote:
 I know there were discussions about it earlier, and it wasn't done with
 an argument of it not being stable between releases (IIRC). I think we
 can live with that drawback, assuming of course that we document this
 properly.

Well, I'll point out once again that the argument about its stability
is invalid, because we serialise the entries to disk. If a point
release changes the representation of the query tree such that the
hash values won't match, then we have no recourse but to bump the
pg_stat_statements version number and invalidate all existing
entries. I credit our users with the intelligence to not jump to any
rash conclusions about the hash if directly exposed, such as assuming
that it has any particular degree of stability with respect to the
queries that are fingerprinted, or any degree greater than the
self-evident bare minimum.

I'm pretty sure that stability across point releases, in the face of
potential minor changes to the query tree representation, was
something that I imagined to be a reason for the proposal being
rejected, when I tried to read between the lines of a flat rejection.
Perhaps I should have asked for clarification on that point. Now that
I think about it, I'm pretty sure that the need to bump catversion, as
a result of any change in how the query tree struct is dumped into
stored rules, will preclude that problem in practice.

 I've now run into multiple customer installations where it would be
 very useful to have. The usecase is mainly storing snapshots of the
 pg_stat_statements output over time and analyzing those. Weird things
 happen for example when the query text is the same, but the hash is
 different (which can happen for example when a table is dropped and
 recreated). And even without that, in order to do anything useful with
 it, you end up hashing the query text anyway - so using the already
 existing hash would be easier and more useful.

Yes, these are all arguments that I'm familiar with :-). I've always
thought of pg_stat_statements as a low-level statistical view that
people would naturally want to store snapshots of for analysis, in
much the same way as many do now with things like pg_stat_bgwriter
using tools like Munin. Who wouldn't want to know what queries were
running half an hour ago when the database server seemed slower than
usual, for example? Such tools should naturally have access to the
same candidate key for entries, rather than adding a subtle
impedance mismatch by using the string. That reminds me - when are you
writing the pg_stat_statements Postgres plugin for Munin?

I was disappointed that my proposal was shot down, despite the fact
that I independently raised it on list at least twice, and pushed as
hard as I felt that I could at the time.

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] CTE optimization fence on the todo list?

2012-10-01 Thread Peter Geoghegan
On 1 October 2012 14:05, Merlin Moncure mmonc...@gmail.com wrote:
 On Wed, Sep 19, 2012 at 3:05 PM, Daniel Browning d...@kavod.com wrote:
 I'm wondering if there are any technical/standards constraints that
 are behind the fencing behavior.  If there aren't any, maybe an opt-in
 keyword might do the trick -- WITH UNBOXED foo AS (..)?

I may be mistaken, but I think that the fence is described in the SQL standard.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] pg_malloc() versus malloc(0)

2012-10-01 Thread Peter Geoghegan
On 1 October 2012 15:00, Tom Lane t...@sss.pgh.pa.us wrote:
 1. Teach pg_malloc not to complain if result == NULL and size == 0.

+1 to that proposal.

 2. Before the malloc call, have it replace size == 0 with size = 1.

I don't like that proposal on purely aesthetic grounds.
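
For what it's worth, proposal 1 amounts to something like this (a
sketch with frontend-style error handling; the message text is
illustrative):

#include <stdio.h>
#include <stdlib.h>

void *
pg_malloc(size_t size)
{
    void       *result = malloc(size);

    /* malloc(0) may legitimately return NULL; only bail out otherwise */
    if (result == NULL && size != 0)
    {
        fprintf(stderr, "out of memory\n");
        exit(EXIT_FAILURE);
    }
    return result;
}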

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-01 Thread Peter Geoghegan
On 1 October 2012 15:22, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 1, 2012 at 4:10 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Worse than that: it could change across a minor version update.  These
 are internal data structures we're hashing, and we've been known to
 have to change them for bug-fix purposes.

 As Peter pointed out, when we do that we have to change the file
 format anyway, and wipe the statistics. So chaning that is something
 we'd have to *note*.

I think the need to not break stored rules pins down the actual
representation of the Query struct in release branches. I'm not
prepared to say that it does so absolutely, since of course certain
logic could be altered in a way that resulted in different values
within the struct or its children, but I'm curious as to when this
actually occurred in the past. Discussion about exposing this value
aside, it'd obviously be a bit disappointing if we had to invalidate
existing statistics in a point release. Still, that situation isn't
made any worse by exposing the value; in fact, doing so could help
users understand the issues involved.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-01 Thread Peter Geoghegan
On 1 October 2012 17:12, Stephen Frost sfr...@snowman.net wrote:
 Peter, all,

 * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
 Well, I'll point out once again that the argument about its stability
 is invalid, because we serialise the entries to disk. If a point
 release changes the representation of the query tree such that the
 hash values won't match, then we have no recourse but to bump
 pg_stat_statements version number, and invalidate all existing
 entries.

 What if we simply included the pg_stat_statements version number in
 what's shown to the user as the 'hash'?  ver#.hash ?

That won't really help matters. There'd still be duplicate entries,
from before and after the change, even if we make it immediately
obvious which is which. The only reasonable solution in that scenario
is to bump PGSS_FILE_HEADER, which will cause all existing entries to
be invalidated.

This is a hypothetical scenario, and concerns about it are totally
orthogonal to the actual question of whether or not the queryid should
be exposed...

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 1 October 2012 18:05, Stephen Frost sfr...@snowman.net wrote:
 * Peter Geoghegan (pe...@2ndquadrant.com) wrote:
 That won't really help matters. There'd still be duplicate entries,
 from before and after the change, even if we make it immediately
 obvious which is which. The only reasonable solution in that scenario
 is to bump PGSS_FILE_HEADER, which will cause all existing entries to
 be invalidated.

 You're going to have to help me here, 'cause I don't see how there can
 be duplicates if we include the PGSS_FILE_HEADER as part of the hash,
 unless we're planning to keep PGSS_FILE_HEADER constant while we change
 what the hash value is for a given query, yet that goes against the
 assumptions that were laid out, aiui.

Well, they wouldn't be duplicates if you think that the fact that one
query was executed before some point release and another after ought
to differentiate queries. I do not.

 If there's a change that results in a given query X no longer hashing to
 a value A, we need to change PGSS_FILE_HEADER to invalidate statistics
 which were collected for value A (or else we risk an independent query Y
 hashing to value A and ending up with completely invalid stats..).
 Provided we apply that consistently and don't reuse PGSS_FILE_HEADER
 values along the way, a combination of PGSS_FILE_HEADER and the hash
 value for a given query should be unique over time.

 We do need to document that the hash value for a given query may
 change..

By invalidate, I mean that when we go to open the saved file, if the
header doesn't match, the file is considered corrupt, and we simply
log that the file could not be read, before unlinking it. This would
be necessary in the unlikely event of there being some substantive
change in the representation of query trees in a point release. I am
not aware of any precedent for this, though Tom said that there was
one.

Tom: Could you please describe the precedent you had in mind? I would
like to better understand this risk.

I don't want to get too hung up on what we'd do if this problem
actually occurred, because that isn't what this thread is about.
Exposing the hash is a completely unrelated question, except that to
do so might make pg_stat_statements (including its limitations) better
understood. In my opinion, the various log parsing tools have taught
people to think about this in the wrong way - the query string is just
a representation of a query (and an imperfect one at that).

There are other, similar tools that exist in proprietary databases.
They expose a hash value, which is subject to exactly the same caveats
as our own. They explicitly encourage the type of aggregation by
third-party tools that I anticipate will happen with
pg_stat_statements. I want to facilitate this, but I'm confident that
the use of (dbid, userid, querytext) as a candidate key by these
tools is going to cause confusion, in subtle ways. By using the hash
value, those tools are exactly as well off as pg_stat_statements is,
which is to say, as well off as possible.

I simply do not understand objections to the proposal. Have I missed something?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 18:16, Tom Lane t...@sss.pgh.pa.us wrote:
 1. Why isn't something like md5() on the reported query text an equally
 good solution for users who want a query hash?

Because that does not uniquely identify the entry. The very first
thing that the docs say on search_path is: "Qualified names are
tedious to write, and it's often best not to wire a particular schema
name into applications anyway." Presumably, the reason it's best not to
wire schema names into apps is because it might be useful to modify
search_path in a way that dynamically made the same queries in some
application reference what are technically distinct relations. If
anyone does this, and it seems likely that many do for various
reasons, they will be out of luck when using some kind of
pg_stat_statements aggregation.

This was the behaviour that I intended for pg_stat_statements all
along, and I think it's better than a solution that matches query
strings.

 2. If people are going to accumulate stats on queries over a long period
 of time, is a 32-bit hash really good enough for the purpose?  If I'm
 doing the math right, the chance of collision is already greater than 1%
 at 10000 queries, and rises to about 70% for 100000 queries; see
 http://en.wikipedia.org/wiki/Birthday_paradox
 We discussed this issue and decided it was okay for pg_stat_statements's
 internal hash table, but it's not at all clear to me that it's sensible
 to use 32-bit hashes for external accumulation of query stats.

Well, forgive me for pointing this out, but I did propose that the
hash be a 64-bit value (which would have necessitated adapting
hash_any() to produce 64-bit values), but you rejected the proposal. I
arrived at the same probability for a collision as you did and posted
it to the list, in discussion shortly after the normalisation stuff
was committed.

A more sensible way of assessing the risk of a collision would be to
try and come up with the probability of a collision that someone
actually ends up caring about, which is considerably less than the 1%
for 10,000 entries. I'm not being glib - people are very used to the
idea that aggregating information on query costs is a lossy process.
Prior to 9.2, the only way execution costs could reasonably be
measured at the query granularity on a busy system was to set
log_min_duration_statement to something like 1 second.
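
For reference, the birthday-paradox arithmetic behind the figures
quoted above, for an ideal 32-bit hash over n distinct queries:

    P(collision) ~= 1 - exp(-n(n-1) / (2 * 2^32))

which gives roughly 1.2% at n = 10,000 and roughly 69% at n = 100,000,
matching Tom's numbers.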

I am also unconvinced by the idea that aggregating historical data
(with the same hash value) in a separate application is likely to make
the collision situation appreciably worse. People are going to be
using something like an RRD circular buffer to aggregate the
information, and I can't see anyone caring about detailed information
that is more than a couple of weeks in the past. The point of
aggregation isn't to store more queries, it's to construct time-series
data from snapshots. Besides, do most applications really even have
more than 10,000 distinct queries?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-02 Thread Peter Geoghegan
On 2 October 2012 17:58, Stephen Frost sfr...@snowman.net wrote:
 Right, and that's all I'm trying to address here- how do we provide a
 value for a given query which can be relied upon by outside sources,
 even in the face of a point release which changes what our internal hash
 value for a given query is.

I don't know of a way. Presumably, we'd hope to avoid this, and would
look for alternatives to anything that would necessitate bumping
PGSS_FILE_HEADER, while not going so far as to let pg_stat_statements
contort things in the core system. If I were aware of a case where this
would have come up had pg_stat_statements fingerprinting been around
at the time, perhaps I could give a better answer than that.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] Hash id in pg_stat_statements

2012-10-03 Thread Peter Geoghegan
On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 Instead, I think it makes sense to assign a number -- arbitrarily, but
 uniquely -- to the generation of a new row in pg_stat_statements, and,
 on the flip side, whenever a row is retired its number should be
 eliminated, practically, for-ever.  This way re-introductions between
 two samplings of pg_stat_statements cannot be confused for a
 contiguously maintained statistic on a query.

 This argument seems sensible to me.  Is there any use-case where the
 proposed counter wouldn't do what people wished to do with an exposed
 hash value?

Yes. The hash could be used to aggregate query execution costs across
entire WAL-based replication clusters. I'm not opposed to Daniel's
suggestion, though.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
Do you think you can follow through on this soon, Robert? I don't
believe that there are any outstanding issues. I'm not going to make
an issue of the fact that strxfrm() hasn't been taken advantage of. If
you could please post a new revision, with the suggested alterations
(that you agree with), I'd be happy to take another look.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] [COMMITTERS] pgsql: Disable _FORTIFY_SOURCE with ICC

2012-10-08 Thread Peter Geoghegan
On 5 October 2012 04:37, Tom Lane t...@sss.pgh.pa.us wrote:
 A bit later: testing on an F17 box (glibc-2.15-56.fc17.x86_64)
 confirms Peter G's complaint, and also confirms that putting
 the above into port/linux.h (instead of the template file) fixes the
 problem.  Don't know how to disable it on ICC, but I suppose there's
 some way to test for ICC via an #if.

I was using F17 + glibc 2.15 too.

http://www.acsu.buffalo.edu/~charngda/icc.html indicates that ICC
defines a macro __ICC, which evaluates to an integer constant
representing the ICC version numbers. I guess that that particular
source isn't very trustworthy, but that's what we have a build farm
for.

Small patch that hopefully fixes everything for everyone is attached.
However, when I build at -O2 with the system GCC (on the same F17 box
with no changes to CFLAGS; just ./configure --with-libxml
--prefix=/home/peter/pgsql --enable-depend) this patch (when applied
to today's master) sees the following warnings raised:

zic.c: In function ‘writezone’:
zic.c:1752:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1753:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1754:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1755:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1756:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1757:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1758:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1759:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1760:3: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1772:4: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c:1785:4: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c: In function ‘puttzcode64’:
zic.c:1514:2: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
zic.c: In function ‘puttzcode’:
zic.c:1505:2: warning: ignoring return value of ‘fwrite’, declared
with attribute warn_unused_result [-Wunused-result]
In file included from gram.y:13502:0:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:16247:23: warning: unused variable ‘yyg’ [-Wunused-variable]

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


fortify_source_fix.2012_10_08.patch
Description: Binary data



Re: [HACKERS] enhanced error fields

2012-10-08 Thread Peter Geoghegan
On 20 August 2012 14:09, Pavel Stehule pavel.steh...@gmail.com wrote:
 (eelog-2012-08-20.diff)

This patch has the following reference to relerror.c:



*** a/src/include/utils/rel.h
--- b/src/include/utils/rel.h
***
*** 394,397  typedef struct StdRdOptions
--- 394,402 
  extern void RelationIncrementReferenceCount(Relation rel);
  extern void RelationDecrementReferenceCount(Relation rel);

+ /* routines in utils/error/relerror.c */



Indeed, the new C file has been separately added to a makefile:

! OBJS = assert.o elog.o relerror.o

However, the new file itself does not appear in this patch. Therefore,
the code does not compile. Please post a revision with the new file
included.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services




Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
On 8 October 2012 16:30, Robert Haas robertmh...@gmail.com wrote:
 I don't have any plans to work on this further.  I think all of the
 criticism that has been leveled at this patch is 100% bogus, and I
 greatly dislike the changes that have been proposed.  That may not be
 fair, but it's how I feel, and in light of that feeling it does not
 make sense for me to pursue this.  Sorry.

If it was the case that you were only 50% of the way to getting
something committable, I guess I'd understand; this is, after all, a
volunteer effort, and you are of course free to pursue or not pursue
whatever you like. It's more like 95% though, so I have a hard time
understanding this attitude.

The only criticism that I imagine that you could be referring to is my
criticism (which was seconded by Tom) concerning the approach to
sizing a buffer that is used as state between successive calls to the
sortsupport routine. This was a fairly trivial point even within the
context of this patch. It was regrettable that it got so out of hand,
but that certainly wasn't my intention.

I must say that I find this disheartening as a reviewer, particularly
since it comes from someone who reviews a lot of patches (including
most of my own), and has often complained about the difficulties that
reviewers face. Is it really so hard for you to accept a criticism
from me? I didn't expect you to fully accept my criticism; I only
expected you to take it into consideration, and possibly let it go for
the sake of progress.

For what it's worth, I have high regard for your work generally. In
all sincerity, this appears to me to be a slight, and I find it
upsetting, both on a personal level, and because it creates a concern
about being able to work with you in the future, which is certainly
something that I've benefited from in the past, and hope to continue
to benefit from.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] sortsupport for text

2012-10-08 Thread Peter Geoghegan
On 8 October 2012 21:35, Robert Haas robertmh...@gmail.com wrote:
 Hey, if me deciding I don't want to work on a patch any more is going
 to make you feel slighted, then you're out of luck.  The archives are
 littered with people who have decided to stop working on things
 because the consensus position on list was different than the approach
 that they personally favored, and I have as much right to do that as
 anyone else.

Sure you do. However, I doubt very many of those who gave up did so
over so trivial an issue as to how to grow a buffer somewhere, and
those that did usually did not go on to become major contributors, and
certainly not committers. The buffer thing is, as far as I know, the
single solitary point of contention with this patch. We're talking
about 2 lines of code. To give up now is just petulant. There is no
other way of looking at it.

 It is not as if anyone has phrased this as a maybe-we-should
 sort of argument; there have been quite definite arguments on both
 sides over apparently strongly-held positions.  I had hoped that this
 was going to be a quick and non-controversial win, but 74 email
 messages later it has become clear that it will be neither of those
 things.

Many of those 74 emails concerned my completely unrelated digression
into exploiting strxfrm(); we spent a ridiculously long time
discussing how to size this buffer, but it still wasn't anything like
74 messages.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [COMMITTERS] pgsql: Disable _FORTIFY_SOURCE with ICC

2012-10-10 Thread Peter Geoghegan
On 8 October 2012 14:39, Peter Geoghegan pe...@2ndquadrant.com wrote:
 Small patch that hopefully fixes everything for everyone is attached.

Will someone take a look at this, please?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-10 Thread Peter Geoghegan
On 3 October 2012 19:54, Peter Geoghegan pe...@2ndquadrant.com wrote:
 On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote:
 This argument seems sensible to me.  Is there any use-case where the
 proposed counter wouldn't do what people wished to do with an exposed
 hash value?

 Yes. The hash could be used to aggregate query execution costs across
 entire WAL-based replication clusters. I'm not opposed to Daniel's
 suggestion, though.

Could we please try and reach a consensus here? If you're still dead
set against exposing the hash value, I think that just following what
Daniel has suggested is a fair compromise.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH 8/8] Introduce wal decoding via catalog timetravel

2012-10-10 Thread Peter Geoghegan
On 15 September 2012 01:39, Andres Freund and...@2ndquadrant.com wrote:
 (0008-Introduce-wal-decoding-via-catalog-timetravel.patch)

This patch is the 8th of 8 in a patch series that covers different
aspects of the bi-directional replication feature planned for
PostgreSQL 9.3. For those that are unfamiliar with the BDR project,
a useful summary is available in an e-mail sent to this list by Simon
Riggs back in April [1]. I should also point out that Andres has made
available a design document that discusses aspects of this patch in
particular, in another thread [2]. That document, "High level design
for logical replication in Postgres", emphasizes source data
generation in particular: generalising how PostgreSQL's WAL stream is
generated to represent the changes it describes logically, without
pointers to physical heap tuples and so forth (data generation as a
concept is described in detail in an earlier design document [3]).
This particular patch can be thought of as a response to the earlier
discussion [4] surrounding how to solve the problem of keeping system
catalogs consistent during WAL replay on followers: catalog time
travel is now used, rather than maintaining a synchronized catalog at
the decoding end. Andres' git tree (xlog-decoding-rebasing-cf2
branch) [5] provides additional useful comments in commit messages (he
rebases things such that each commit represents a distinct piece of
functionality/ patch for review).

This patch is not strictly speaking an atomic unit. It is necessary to
apply all 8 patches in order to get the code to compile. However, it
is approximately an atomic unit, representing a subdivision of the
entire BDR patch for which it is manageable and logical to write a
discrete review. This is idiomatic use of git-format-patch, but it is
unusual enough within our community for me to feel the need to note
these facts.

I briefly discussed this patch with Andres off-list. His feeling is
that the review process ought to focus on the design of WAL decoding,
including how it fits within the larger set of replication features
proposed for 9.3. There are a number of known omissions in this patch.
Andres has listed some of these above, and edge-cases and so on are
noted next to XXX and FIXME comments in the patch itself. I am
inclined to agree with Andres' view that we should attempt to solidify
community support for this prototype patch's design, or some variant,
before fixing the edge-cases and working towards committable code. I
will try my best to proceed on that basis.

What follows is an initial overview of the patch (or at least my
understanding of the patch, which you may need to correct), and some
points of concern.

 * applycache module which reassembles transactions from a stream of 
 interspersed changes


This is what the design doc [2] refers to as 4.5. TX reassembly.

This functionality is concentrated in applycache.c. As [2] notes, the
reassembly component was previously coined ApplyCache because it was
proposed to run on replication consumers just before applying changes.
This is not the case anymore. It is still called that way in the
source of the patch recently submitted.

The purpose of ApplyCache/transaction reassembly is to reassemble
interlaced records, and organise them by XID, so that the consumer
client code sees only streams (well, lists) of records split by XID.
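
Conceptually, the interface boils down to something like the following
(signatures paraphrased from the patch, so treat them as illustrative
rather than exact):

#include "postgres.h"
#include "access/xlogdefs.h"	/* XLogRecPtr */

typedef struct ApplyCache ApplyCache;
typedef struct ApplyCacheChange ApplyCacheChange;

/* interleaved per-record changes are appended under their XID... */
extern void ApplyCacheAddChange(ApplyCache *cache, TransactionId xid,
								XLogRecPtr lsn, ApplyCacheChange *change);

/* ...and at commit, the consumer sees one ordered stream per XID */
extern void ApplyCacheCommit(ApplyCache *cache, TransactionId xid,
							 XLogRecPtr lsn);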

I meant to avoid talking about anything other than the bigger picture
for now, but I must ask: Why the frequent use of malloc(),
particularly within applycache.c? The obvious answer is that it's
rough code and that that will change, but that still doesn't comport
with my idea about how rough Postgres code should look, so I have to
wonder if there's a better reason.

applycache.c has an acute paucity of comments, which makes it really
hard to review well. [2] doesn't have all that much to say about it
either. I'm going to not comment much on this here, except to say that
I think that the file should be renamed to reassembly.c or something
like that, to reflect its well-specified purpose, and not how it might
be used. Any cache really belongs in src/backend/utils/cache/ anyway.

Applycache is presumably where you're going to want to spill
transaction streams to disk, eventually. That seems like a
prerequisite to commit.

By the way, I see that you're doing this here:

+ /* most callers don't need snapshot.h */
+ typedef struct SnapshotData *Snapshot;

Tom, Alvaro and I had a discussion about whether or not this was an
acceptable way to reduce build dependencies back in July [8] – I lost
that one. You're relying on a GNU extension/C11 feature here (that is,
typedef redefinition). If you find it intensely irritating that you
cannot do this while following the standard to the letter, you are not
alone.
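
To be concrete, the construct at issue is:

/*
 * A repeated typedef is a constraint violation in C89/C99, but GCC
 * accepts it as an extension, and C11 (6.7p3) makes it legal outright.
 */
typedef struct SnapshotData *Snapshot;	/* in snapshot.h */
typedef struct SnapshotData *Snapshot;	/* repeated in another header */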

 * snapbuilder which builds catalog snapshots so that tuples from wal can be 
 understood


This component analyses xlog and builds a special kind of Snapshot.
This has been compared to the 

Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-11 Thread Peter Geoghegan
Do you intend to follow through with this, Jeff?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2012-10-11 Thread Peter Geoghegan
, documented exceptions to my previous "all or nothing"
proviso, such as that table names won't be available in the event of
domain constraints, so be it.

I'm going to suggest you add a note to both the docs and errcodes.h
regarding all this in your next revision. People need to be adding
these fields for all errcodes that *require* them going forward. If,
in the future, ERRCODE_UNIQUE_VIOLATION errors, for example, cannot
supply a constraint name that was violated, then that is, almost by
definition, the wrong errcode to use.

[1] 
http://archives.postgresql.org/message-id/CAEYLb_VJK+AZe6fO_s0Md0ge5D=rendf7wg+g4nxn8mhkq4...@mail.gmail.com

[2] cafj8prdttdvosvjt8pp08mq_lw2haomwxvrudoylhk9xf7k...@mail.gmail.com

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-11 Thread Peter Geoghegan
On 11 October 2012 20:28, Simon Riggs si...@2ndquadrant.com wrote:
 Not many RULE-lovers out there, once you've tried to use them.

 Allowing RULEs complicates various things and can make security more 
 difficult.

What exactly do they make more difficult? Are you particularly
concerned with the overhead that rules impose when developing certain
types of features? If so, since there's going to be a legacy
compatibility mode for a long time, I don't know that deprecating
rules will buy you much in the next 3 - 4 releases.

 For 9.3, I suggest we create a DDL trigger by default which prevents
 RULEs and throws an ERROR that explains they are now deprecated.

 Anybody that really cares can delete this and use them. Sometime in
 future, we hard code it, barring complaints.

Well, rules have been around since the Berkeley days [1]. I don't
think that anyone, including Tom, is willing to argue that
user-defined rules are not a tar-pit (except perhaps ON SELECT DO
INSTEAD SELECT rules - which are exactly equivalent to views anyway).
Personally, I'd like to see them removed too. However, in order to be
able to get behind your proposal, I'd like to see a reasonably
developed cost/benefit analysis. People do use user-defined rules. For
example, the xTuple open source ERP package uses ON INSERT DO INSTEAD
rules [2].

[1] http://db.cs.berkeley.edu/papers/ERL-M89-82.pdf

[2] http://www.xtuple.org/ApiDevelopment

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2012-10-13 Thread Peter Geoghegan
 offer us? What have
I said that is fundamentally incompatible with how things work in this
patch right now?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-14 Thread Peter Geoghegan
On 14 October 2012 09:19, Simon Riggs si...@2ndquadrant.com wrote:
 This is a very useful optimisation, for both the low and the high end.

I suppose it's possible to not think of it as an optimisation at all.
Rather, it could be considered a way of making tuplesort really use
the work_mem allotted to it, or at least use it more effectively, so
that DBAs don't have to oversize work_mem. You're getting the same
behaviour as when you oversized work_mem, except that you don't run
the risk of thrashing due to excessive paging if ever there is a sort
big enough to have that effect.

 The current coding allows full use of memory iff the memory usage is
 an exact power of 2, so this patch will allow an average of 50% and as
 much as 100% gain in memory for sorts. We need to be careful to note
 this as a change on the release notes, since users may well have set
 work_mem to x2 what was actually needed to get it to work - that means
 most users will see a sudden leap in actual memory usage, which could
 lead to swapping in edge cases.

Yes, that's probably true.
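
For the archives, the pre-patch growth logic is roughly the following
(paraphrased, with the memory accounting elided, so don't quote me on
the details):

static bool
grow_memtuples(Tuplesortstate *state)
{
	/*
	 * Refuse to grow if doubling would overrun the memory budget.  Since
	 * doubling is the only growth step, anything short of the next power
	 * of 2 goes unused.
	 */
	if (state->availMem <= (long) (state->memtupsize * sizeof(SortTuple)))
		return false;

	state->memtupsize *= 2;
	state->memtuples = (SortTuple *)
		repalloc(state->memtuples,
				 state->memtupsize * sizeof(SortTuple));
	return true;
}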

 I notice that cost_sort doesn't take into account the space used for
 sorting other than tuple space so apparently requires no changes with
 this patch. ISTM that cost_sort should be less optimistic about memory
 efficiency than it is.

Perhaps. I don't have an intuitive sense of what is and is not worth
modelling in the optimiser, so I can't really comment here.

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-15 Thread Peter Geoghegan
On 15 October 2012 11:41, Greg Stark st...@mit.edu wrote:
 On Mon, Oct 15, 2012 at 8:00 AM, Simon Riggs si...@2ndquadrant.com wrote:
 Please can anyone show me the SQL for a rule that cannot be written as
 a view or a trigger? I do not believe such a thing exists and I will
 provide free beer to the first person that can prove me wrong.

 Being written as a view doesn't help you because views use rules. I
 repeat, the very fact that we need rules to implement views prove
 rules are necessary for some purposes.

Well, the usual way that this proposal is phrased is that user-defined
rules should be deprecated. Granted, that wasn't the case on this
occasion, but it has been on many other occasions.

It's not as if there isn't a clear separation between what we all
agree are "good" rules (that is, ON SELECT DO
INSTEAD SELECT rules, which views are technically very simple wrappers
of) and "bad" rules (that is, everything else).

Humorous aside: I saw this comment within view.c, that dates from the
Postgres95 days at the latest (but is probably older still):

 * This update consists of adding two new entries IN THE BEGINNING
 * of the range table (otherwise the rule system will die a slow,
 * horrible and painful death, and we do not want that now, do we?)

I'm not sure that the authors' remarks about not wanting that should
be taken at face value...

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-15 Thread Peter Geoghegan
On 15 October 2012 00:30, Greg Stark st...@mit.edu wrote:
 In fact it's not a very good analogy because the situation is
 *precisely* the same -- rules *are* macros and manipulate the raw sql
 before it's run and the reason they can't be replaced by triggers is
 because, like functions, triggers happen after the code is compiled
 and run.

I quite like this analogy, because it nicely illustrates the problems
with rules.

C, and the C preprocessor, are essentially the same now as they were in
the early 1970s. I think that *an emphasis* on a preprocessing stage
of translation is a fairly discredited idea (though there are some
sensible uses, particularly where alternatives are not available). C99
introduced inline functions, probably in no small part because it is
quite obvious that they are often superior to macros. Consider the two
most successful programming languages that were obviously influenced
by C: Java and C++. The first doesn't have a preprocessor, and the
second strongly encourages using numerous alternatives to macros where
possible, which is almost always. Maybe you don't like this analogy,
because you consider C to be a systems programming language, and as
such think it is only right and proper that programmers should be
given enough rope to hang themselves. Perhaps you're right, but the
same surely cannot be said for SQL. The original appeal of SQL was
that it was supposedly possible for non-programmers to write it.

Clearly deprecating rules implies some loss of functionality - there
is no exact, drop-in equivalent to something that magically rewrites
SQL that isn't equally baroque and problematic. If that's the bar,
then detractors of rules should stop wasting their breath, because the
bar has been set impossibly high.

On a *practical* level triggers are complete replacements for
user-defined rules. All that it takes to be able to *always* say that
one language feature is not equivalent to another, and on that basis
the other should not be deprecated, is a sufficient degree of pedantry
(not that I'm implying that you or anyone else was being pedantic, or
that concerns raised should not be heeded).

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Hash id in pg_stat_statements

2012-10-15 Thread Peter Geoghegan
On 3 October 2012 19:04, Tom Lane t...@sss.pgh.pa.us wrote:
 Daniel Farina dan...@heroku.com writes:
 Instead, I think it makes sense to assign a number -- arbitrarily, but
 uniquely -- to the generation of a new row in pg_stat_statements, and,
 on the flip side, whenever a row is retired its number should be
 eliminated, practically, for-ever.  This way re-introductions between
 two samplings of pg_stat_statements cannot be confused for a
 contiguously maintained statistic on a query.

 This argument seems sensible to me.

Daniel: Could you please submit the patch that you were working on
that does this to the next commitfest?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-15 Thread Peter Geoghegan
On 15 October 2012 19:19, Bruce Momjian br...@momjian.us wrote:
 I think Robert is right that if Slony can't use the API, it is unlikely
 any other replication system could use it.

I don't accept that. Clearly there is a circular dependency, and
someone has to go first - why should the Slony guys invest in adopting
this technology if it is going to necessitate using a forked Postgres
with an uncertain future? That would be (with respect to the Slony
guys) a commercial risk that is fairly heavily concentrated with
Afilias. So, if you're going to attach as a condition to its
acceptance that the Slony guys be able to use it immediately (because
"can integrate" really means "will integrate", right?), you're
attaching it to a rather arbitrary condition that has nothing much to
do with the technical merit of the patches proposed. The fact of the
matter is that Slony was originally designed with a somewhat different
set of constraints to those that exist today, so I don't doubt that
this is something that they're going to need to integrate over time,
probably in a separate release branch, to get the upsides of in-core
logical replication, along with the great flexibility that Slony
currently offers (and that Afilias undoubtedly depend upon today).

Another way of putting this is that Postgres should go first because
we will get huge benefits even if only one of the trigger-based
logical replication systems adopts the technology. Though I hope and
expect that the Slony guys will be able to work with what we're doing,
surely a logical replication system with all the benefits implied by
being logical, but with only some subset of Slony's functionality
is still going to be of great benefit.

My view is that the only reasonable approach is to build something
solid, well-integrated and generic, in core. I'd certainly like to
hear what the Slony guys have to say here, though.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-16 Thread Peter Geoghegan
On 14 October 2012 09:19, Simon Riggs si...@2ndquadrant.com wrote:
 This is a very useful optimisation, for both the low and the high end.

Well, I'm about ready to mark this one ready for committer. There is
this outstanding issue in my revision of August 17th, though:

+   /*
+    * XXX: This feels quite brittle; is there a better principled approach,
+    * that does not violate modularity?
+    */
+   newmemtupsize = (int) floor(oldmemtupsize * allowedMem / memNowUsed);
+   state->fin_growth = true;

I suppose that I should just recognise that this *is* nothing more
than a heuristic, and leave it at that.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-16 Thread Peter Geoghegan
On 16 October 2012 14:24, Simon Riggs si...@2ndquadrant.com wrote:
 If you describe in detail that it is a heuristic and why that is
 proposed over other approaches that should be sufficient for future
 generations to read and understand.

I've done so, in the attached revision. Things have been simplified
somewhat too.

The same basic strategy for sizing the tuplesort memtuples array
also exists in tuplestore. I wonder if we should repeat this there? I
suppose that that could follow later.

The patch will now be marked "ready for committer". Does this need
doc changes, in light of what is arguably a behavioural difference?
You only mentioned release notes.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


sortmem_grow-v3.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-16 Thread Peter Geoghegan
On 16 October 2012 22:18, Greg Stark st...@mit.edu wrote:
 That's assuming my committer bits haven't lapsed and people are ok
 with me stepping back into things?

I personally have no objections.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [PATCH] Enforce that INSERT...RETURNING preserves the order of multi rows

2012-10-17 Thread Peter Geoghegan
On 17 October 2012 14:53, Merlin Moncure mmonc...@gmail.com wrote:
 Is that defined in the standard?

RETURNING isn't even defined in the standard.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-17 Thread Peter Geoghegan
On 17 October 2012 18:50, Andrew Dunstan and...@dunslane.net wrote:
 I don't know how many times I have to say this: people are not listening.
 Tom has already given a case for it upthread:


 Triggers necessarily operate on a row-at-a-time basis.  In theory,
 for at least some bulk operations, a rule could greatly outperform
 a trigger.  It's difficult to walk away from that - unless somebody
 can prove that the advantage doesn't ever accrue in practice.

Fair point. I'm just not sure that that is a good enough reason to not
deprecate rules. I mean, if experienced hackers cannot figure out if
that's actually a useful facet of rules, what hope is there for anyone
else?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-17 Thread Peter Geoghegan
On 17 October 2012 18:46, Greg Stark st...@mit.edu wrote:
 I would suggest something like

 Warning: RULES are tricky to use correctly. They rewrite the original
 query into a new query before it is run and it is very hard to
 correctly anticipate and rewrite every possible input query into the
 desired result. There are also unexpected interactions with other
 components when RULES do something unexpected such as rewrite a single
 query to return two result sets.

+1 to that sort of wording.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-18 Thread Peter Geoghegan
On 16 October 2012 15:26, Jan Wieck janwi...@yahoo.com wrote:
 This means that the transition time from the existing, trigger based
 approach to the new WAL based mechanism will see both technologies in
 parallel, which is no small thing to support.

So, you're talking about a shim between the two in order to usefully
support inter-version replication, or are you just thinking about
making a clean break in compatibility for Postgres versions prior to
9.3 in a new release branch?

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC][PATCH] wal decoding, attempt #2 - Design Documents (really attached)

2012-10-18 Thread Peter Geoghegan
On 18 October 2012 16:18, Christopher Browne cbbro...@gmail.com wrote:
 A shim adds complexity, but retains the upgrade across versions
 use case, and reduces the need to keep supporting elder versions of
 Slony.

Right. Upgrading across major versions is likely to continue to remain
a very important use-case for Slony.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] tuplesort memory usage: grow_memtuples

2012-10-18 Thread Peter Geoghegan
On 16 October 2012 21:47, Peter Geoghegan pe...@2ndquadrant.com wrote:
 The same basic strategy for sizing the tuplesort memtuples array
 also exists in tuplestore. I wonder if we should repeat this there? I
 suppose that that could follow later.

Incidentally, the basis of this remark is commit 2689abf0, where Tom
decided to keep the two in sync. That's a precedent for what we need
to do here, I suppose.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion failure w/extended query protocol

2012-10-19 Thread Peter Geoghegan
On 19 October 2012 17:19, Robert Haas robertmh...@gmail.com wrote:
 Rushabh Lathia of the EnterpriseDB development team and I have been
 doing some testing of the extended query protocol and have found a
 case where it causes an assertion failure.  Here's how to reproduce:

 1. Apply the attached patch to teach psql how to use the extended
 query protocol.  Compile, install.

 2. Start the modified psql and do this:

 \set PROTOCOL extended
 PREPARE stmt as select 1;
 CREATE TEMPORARY TABLE tmptbl AS EXECUTE stmt;

 The result is:

 TRAP: FailedAssertion("!(qry->commandType != CMD_UTILITY)", File:
 "utility.c", Line: 1516)

I'm reasonably confident that commit
9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a caused this breakage.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] assertion failure w/extended query protocol

2012-10-19 Thread Peter Geoghegan
On 19 October 2012 19:01, Andres Freund and...@2ndquadrant.com wrote:
 Btw, do you plan to submit that psql patch at some point? I repeatedly wished
 to be able to use the extended protocol without writing code or misusing
 pgbench exactly to test stuff like this.

+1

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] First draft of snapshot snapshot building design document

2012-10-19 Thread Peter Geoghegan
->subxip, snapshot->subxcnt))
+   {
+   CommandId cmin = HeapTupleHeaderGetRawCommandId(tuple);
+   /* no support for that yet */
+   if (tuple->t_infomask & HEAP_COMBOCID){
+   elog(WARNING, "combocids not yet supported");
+   return false;
+   }
+   if (cmin >= snapshot->curcid)
+   return false;   /* inserted after scan started */
+   }

Above, you aren't taking this into account (code from HeapTupleHeaderGetCmax()):

/* We do not store cmax when locking a tuple */
Assert(!(tup->t_infomask & (HEAP_MOVED | HEAP_IS_LOCKED)));

Sure, you're only interested in cmin, so this doesn't look like it
applies (isn't this just a sanity check?), but actually, based on this
it seems to me that the current sane representation of cmin needs to
be obtained in a more concurrency aware fashion - having the backend
local data-structures that originate the tuple isn't even good enough.
The paucity of other callers to HeapTupleHeaderGetRawCommandId() seems
to support this. Apart from contrib/pageinspect, there is only this
one other direct caller, within heap_getsysattr():

/*
 * cmin and cmax are now both aliases for the same field, which
 * can in fact also be a combo command id.  XXX perhaps we should
 * return the real cmin or cmax if possible, that is if we are
 * inside the originating transaction?
 */
result = CommandIdGetDatum(HeapTupleHeaderGetRawCommandId(tup->t_data));

So these few direct call-sites that inspect CommandId outside of their
originating backend are all non-critical observers of the CommandId,
that are roughly speaking allowed to be wrong. Callers to the higher
level abstractions (HeapTupleHeaderGetCmin()/HeapTupleHeaderGetCmax())
know that the CommandId must be a cmin or cmax respectively, because
they have as their contract that
TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)) and
TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmax(tup))
respectively.
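
For reference, the cmin accessor's guard looks like this (quoting
src/backend/utils/time/combocid.c from memory, so check HEAD for the
exact text):

CommandId
HeapTupleHeaderGetCmin(HeapTupleHeader tup)
{
	CommandId	cid = HeapTupleHeaderGetRawCommandId(tup);

	Assert(!(tup->t_infomask & HEAP_MOVED));
	Assert(TransactionIdIsCurrentTransactionId(HeapTupleHeaderGetXmin(tup)));

	/* resolve a combo command id to the real cmin where necessary */
	if (tup->t_infomask & HEAP_COMBOCID)
		cid = GetRealCmin(cid);

	return cid;
}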

I guess this can all happen when you write that
XLOG_HEAP2_NEW_COMBOCID record within the originating transaction
(that is, when your xmin is that of the tuple in the originating
transaction, you are indeed dealing with a cmin), but these are
hazards that you need to consider in addition to the more obvious
ComboCid hazard. I have an idea that the
HeapTupleHeaderGetRawCommandId(tuple) call in your code could well be
bogus even when (t_infomask & HEAP_COMBOCID) == 0.

I look forward to seeing your revision that addressed the issue with
sinval messages.

[1] 
http://archives.postgresql.org/message-id/201209221900.53190.and...@2ndquadrant.com

[2] 
http://archives.postgresql.org/message-id/CAEYLb_XZ-k_vRpBP9TW=_wufdsusosp1yir1xg7l_4rmg5b...@mail.gmail.com

[3] 
http://archives.postgresql.org/message-id/201209150233.25616.and...@2ndquadrant.com

[4] 
http://archives.postgresql.org/message-id/1347669575-14371-8-git-send-email-and...@2ndquadrant.com

[5] 
http://archives.postgresql.org/message-id/201206131327.24092.and...@2ndquadrant.com

[6] 
http://archives.postgresql.org/message-id/201210161330.37967.and...@2ndquadrant.com

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Deprecating RULES

2012-10-19 Thread Peter Geoghegan
On 19 October 2012 22:03, Josh Berkus j...@agliodbs.com wrote:
 Unless the easiest way to implement MERGE is to extend RULEs.

FWIW, I'd say that's probably about the hardest possible way to
implement MERGE, assuming that we prioritise providing robust UPSERT
support, as I strongly feel we should.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2012-10-20 Thread Peter Geoghegan
I think that we're both going to be busy next week, since we're both
attending pgconf.eu. For that reason, I would like to spend some time
tomorrow to get something in shape, that I can mark ready for
committer. I'd like to get this patch committed during this
commitfest. You are welcome to do this work instead. I want to avoid a
redundant effort.

Let me know if you think that that's a good idea.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] No, pg_size_pretty(numeric) was not such a hot idea

2012-10-21 Thread Peter Geoghegan
On 21 October 2012 16:59, Kevin Grittner kgri...@mail.com wrote:
 I don't know about anyone else, but I could live with that.

Me too.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] enhanced error fields

2012-10-24 Thread Peter Geoghegan
On 24 October 2012 23:29, Alvaro Herrera alvhe...@2ndquadrant.com wrote:

 Let me know if you think that that's a good idea.

 I guess you didn't get around to it.

I did get some work on this done, which does change things somewhat.
In particular, I think that the need to have so many new fields is
questionable, and so I have removed some. I will get around to this, and
will incorporate those ideas.

The errrel() calls with index relations are not sane, but that's just
an oversight. The next revision will actually do this:

+   Assert(table->rd_rel->relkind == RELKIND_RELATION);

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Should select 'nan'::float = 'nan'::float; return false as per IEEE 754

2012-10-28 Thread Peter Geoghegan
On 28 October 2012 09:43, Hannu Krosing ha...@2ndquadrant.com wrote:
 test=# select 'NaN'::float = 'NaN'::float as must_be_false;
  must_be_false
 ---------------
  t
 (1 row)

 I think that PostgreSQL's behaviour of comparing two
 NaN-s as equal is wrong and we should follow the IEEE 754 spec here

The reason that PostgreSQL does this is that in order for float to be
a btree indexable type, its semantics must follow the reflexive law.
This and other requirements of btree operator classes are described
under src/backend/access/nbtree/README.
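
The comparator in src/backend/utils/adt/float.c makes the trade-off
explicit (quoting its NaN handling from memory, so check the source
for the exact text):

static int
float8_cmp_internal(float8 a, float8 b)
{
	/*
	 * We consider all NANs to be equal and larger than any non-NAN. This
	 * is somewhat arbitrary; the important thing is to have a consistent
	 * sort order.
	 */
	if (isnan(a))
	{
		if (isnan(b))
			return 0;			/* NAN = NAN */
		else
			return 1;			/* NAN > non-NAN */
	}
	else if (isnan(b))
		return -1;				/* non-NAN < NAN */
	else if (a > b)
		return 1;
	else if (a < b)
		return -1;
	else
		return 0;
}

Whether one thinks of that as a violation of IEEE 754 or as a necessary
total order, it is clearly deliberate.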

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Logical to physical page mapping

2012-10-28 Thread Peter Geoghegan
On 28 October 2012 22:35, Jan Wieck janwi...@yahoo.com wrote:
 The amount of WAL generated with full_page_writes=on is quite substantial.
 For pgbench, for example, the ratio is 20:1. Meaning with full_page_writes you
 write 20x the amount you do without.

Sure, but as always, pgbench pessimises everything by having writes
follow a uniform distribution, which is completely unrepresentative of
the natural world. This will literally maximise the number of pristine
full page images that need to be included. The actual frequency of
checkpoints is a crucial factor here too, and you didn't mention
anything about that.


-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


[HACKERS] Re: [HACKERS] Patch für MAP_HUGETLB for mmap() shared memory

2012-10-29 Thread Peter Geoghegan
On 29 October 2012 20:14, Christian Kruse cjk+postg...@defunct.ch wrote:
 created a patch which implements MAP_HUGETLB for sysv shared memory segments
 (PGSharedMemoryCreate). It is based on tests of Tom Lane and Andres Freund, I
 added error handling, huge page size detection and a GUC variable.

I have a few initial observations on this.

* I think you should be making the new GUC PGC_INTERNAL on platforms
where MAP_HUGETLB is not defined or available. See also,
effective_io_concurrency. This gives sane error handling.

* Why aren't you failing when HUGE_TLB_ON and the mmap fails? You do
the same thing as HUGE_TLB_TRY, contrary to your documentation:

 +  if (huge_tlb_pages == HUGE_TLB_ON || huge_tlb_pages == HUGE_TLB_TRY)

* I think you should add MAP_HUGETLB to PG_MMAP_FLAGS directly, rather
than XOR'ing after the fact. We already build the flags that comprise
PG_MMAP_FLAGS using another set of intermediary flags, based on
platform considerations, so what you're doing here is just
inconsistent. Don't wrap the mmap() code in ifdefs, and instead rely
on the GUC being available on all platforms, with setting the GUC
causing an error on unsupported platforms, in the style of
effective_io_concurrency, as established in commit
3b07182e613a0e63ff662e3dc7f820f010923ec7. Build a PG_MAP_HUGETLB
intermediary flag on all platforms (see the sketch at the end of this
mail).

* Apparently you're supposed to do this for the benefit of Itanic [1]:

/* Only ia64 requires this */
#ifdef __ia64__
#define ADDR (void *)(0x8000000000000000UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB | MAP_FIXED)
#else
#define ADDR (void *)(0x0UL)
#define FLAGS (MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB)
#endif

* You aren't following our style guide with respect to error messages [2].

[1] https://www.kernel.org/doc/Documentation/vm/map_hugetlb.c

[2] http://www.postgresql.org/docs/devel/static/error-style-guide.html
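
Here is the sketch referenced above. PG_MMAP_FLAGS is already built this
way in sysv_shmem.c; HUGE_TLB_OFF is my assumed name for the GUC's third
value, so adjust to taste:

#ifdef MAP_HUGETLB
#define PG_MAP_HUGETLB		MAP_HUGETLB
#else
#define PG_MAP_HUGETLB		0
#endif

	/* no ifdef needed at the call site */
	ptr = mmap(NULL, size, PROT_READ | PROT_WRITE,
			   PG_MMAP_FLAGS |
			   (huge_tlb_pages != HUGE_TLB_OFF ? PG_MAP_HUGETLB : 0),
			   -1, 0);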
-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scans wiki page

2012-11-12 Thread Peter Geoghegan
On 13 November 2012 01:03, Jeff Janes jeff.ja...@gmail.com wrote:
 https://wiki.postgresql.org/wiki/Index-only_scans

 This page is seriously out of date.  It suggests they are not yet
 implemented, but only being talked about.

Attached is a piece I wrote on the feature. That might form the basis
of a new wiki page. Feel free to incorporate this material as you see
fit.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


index_only_scans.rst
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Index only scans wiki page

2012-11-13 Thread Peter Geoghegan
On 13 November 2012 16:37, Robert Haas robertmh...@gmail.com wrote:
 I found this an interesting read.  As one of the people who worked on
 the feature, I'm sort of curious whether people have any experience
 yet with how this actually shakes out in the field.  Are you (or is
 anyone) aware of positive/negative field experiences with this
 feature?

Unfortunately, I don't think that I have any original insight about
the problems with index-only scans in the field right now.

-- 
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Peter Geoghegan
On 22 March 2012 17:19, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm inclined to think that the most useful behavior is to teach the
 rewriter to copy queryId from the original query into all the Queries
 generated by rewrite.  Then, all rules fired by a source query would
 be lumped into that query for tracking purposes.  This might not be
 the ideal behavior either, but I don't see a better solution.

+1. This behaviour seems fairly sane. The lumping together of DO ALSO
and DO INSTEAD operations was a simple oversight.

 This seems to boil down to what you want to have happen with queries
 created/executed inside functions, which is something I don't recall
 being discussed.

Uh, well, pg_stat_statements is clearly supposed to monitor execution
of queries from within functions - there is a GUC,
pg_stat_statements.track, which can be set to 'all' to track nested
queries. That being the case, we should clearly be fingerprinting
those query trees too.

The fact that we'll fingerprint these queries even though we usually
don't care about them doesn't seem like a problem, since in practice
the vast majority will be prepared.

 Either way, I think we'd be a lot better advised to define a single
 hook post_parse_analysis_hook and make the core code responsible
 for calling it at the appropriate places, rather than supposing that
 the contrib module knows exactly which core functions ought to be
 the places to do it.

I agree.

Since you haven't mentioned the removal of parse-analysis Const
location alterations, I take it that you do not object to that, which
is something I'm glad of.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-22 Thread Peter Geoghegan
On 22 March 2012 19:07, Tom Lane t...@sss.pgh.pa.us wrote:
 Will you adjust the patch for the other issues?

Sure. I'll take a look at it now.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-22 Thread Peter Geoghegan
On 23 January 2012 02:08, Robert Haas robertmh...@gmail.com wrote:
 On Sat, Jan 21, 2012 at 6:32 PM, Jeff Janes jeff.ja...@gmail.com wrote:
 I'm finding the backend_writes column pretty unfortunate.  The only
 use I know of for it is to determine if the bgwriter is lagging
 behind.  Yet it doesn't serve even this purpose because it lumps
 together the backend writes due to lagging background writes, and the
 backend writes by design due to the use of a buffer access strategy
 during bulk inserts.

 +1 for separating those.

I decided to have a go myself. Attached patch breaks out strategy
allocations in pg_stat_bgwriter, but not strategy writes. My thinking
is that this may serve to approximate non-BAS_NORMAL writes, with the
considerable advantage of not requiring that I work backwards, when a
backend locally syncs (yes, syncs, not writes) a buffer, to figure out
which strategy object references that buffer. The bookkeeping that that
would likely entail seems to make it infeasible.

Incidentally, it seems Postgres doesn't currently record backend
writes when the buffer doesn't go on to be sync'd. That seems
problematic to me, or at the very least a misrepresentation, since
temporary tables will be written out by the backend for example. Not
sure if it's worth fixing, though I've added a comment to that effect
at the site of where backend_writes is bumped.

I have corrected the md.c bug. This previously would have prevented
the sync_files (number of relation segments synced) value from being
valid in non-log_checkpoints configurations.

I'm not currently confident that the strategy_alloc field is a very
useful proxy for a strategy_backend_writes field. I think that rather
than bumping the strategy allocation count analogously to the way the
overall count is bumped (within StrategyGetBuffer()), I should have
bumped earlier within BufferAlloc() so that it'd count if the buffer
was requested with non-BAS_NORMAL strategy but was found in
shared_buffers (so control never got as far as StrategyGetBuffer()).
That might make the value line up more closely with the would-be value
of a strategy_backend_writes column. What do you think?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
diff --git a/doc/src/sgml/monitoring.sgml b/doc/src/sgml/monitoring.sgml
new file mode 100644
index 840e54a..f714cb8
*** a/doc/src/sgml/monitoring.sgml
--- b/doc/src/sgml/monitoring.sgml
*** SELECT pg_stat_get_backend_pid(s.backend
*** 806,811 ****
--- 806,818 ----
   the <function>pg_stat_get_buf_alloc</function> function.</entry>
   </row>
   <row>
+   <entry>buffers_strat_alloc</entry>
+   <entry><type>bigint</type></entry>
+   <entry>Number of buffers allocated with non-default buffer access strategy.
+  This value can also be returned by directly calling
+  the <function>pg_stat_get_buf_strat_alloc</function> function.</entry>
+  </row>
+  <row>
    <entry>stats_reset</entry>
    <entry><type>bigint</type></entry>
    <entry>The last time these statistics were reset.
*** SELECT pg_stat_get_backend_pid(s.backend
*** 1703,1708 ****
--- 1710,1741 ----
    </entry>
   </row>
  
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_write_time()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are written to disk, in milliseconds.
+   </entry>
+  </row>
+ 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_sync_time()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total amount of time that has been spent in the part of checkpoint
+ 	   processing where files are synchronized to disk, in milliseconds.
+   </entry>
+  </row>
+ 
+  <row>
+   <entry><literal><function>pg_stat_get_bgwriter_sync_files()</function></literal></entry>
+   <entry><type>bigint</type></entry>
+   <entry>
+ 	   Total number of files that have been synchronized to disk during
+ 	   checkpoint processing.
+   </entry>
+  </row>
  
   <row>
    <entry><literal><function>pg_stat_get_wal_senders()</function></literal></entry>
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
new file mode 100644
index ff7f521..e481be3
*** a/src/backend/access/transam/xlog.c
--- b/src/backend/access/transam/xlog.c
*** LogCheckpointStart(int flags, bool resta
*** 7492,7501 ****
  }
  
  /*
!  * Log end of a checkpoint.
   */
  static void
! LogCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
--- 7492,7501 ----
  }
  
  /*
!  * Time and potentially log the end of a checkpoint.
   */
  static void
! TimeCheckpointEnd(bool restartpoint)
  {
  	long		write_secs,
  				sync_secs,
*** LogCheckpointEnd(bool restartpoint)
*** 7511,7519 ****
  
  	CheckpointStats.ckpt_end_t = GetCurrentTimestamp();
  
- 	TimestampDifference

Re: [HACKERS] Uppercase tab completion keywords in psql?

2012-03-23 Thread Peter Geoghegan
On 22 March 2012 22:05, Andrew Dunstan and...@dunslane.net wrote:


 On 03/22/2012 05:49 PM, Bruce Momjian wrote:


 Robert Haas and I are disappointed by this change.  I liked the fact
 that I could post nice-looking SQL queries without having to use my
 capslock key (which I use as a second control key).  Any chance of
 reverting this change?


 Should it be governed by a setting?

Perhaps, but I find the behaviour that was introduced by Peter's patch
to be a more preferable default.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Uppercase tab completion keywords in psql?

2012-03-23 Thread Peter Geoghegan
On 23 March 2012 15:13, Andrew Dunstan and...@dunslane.net wrote:
 Upper casing SQL keywords is a common style, which is used in lots of our
 code (e.g. regression tests, psql queries, pg_dump). I think the default
 should match what is in effect our house style, and what we have
 historically done.

The code doesn't give preferential treatment to lower-case code - it
merely puts it on an even footing. I would agree with your position if
the change assumed that the user always wanted to use lower-case SQL,
but it does not. Rather, it intelligently infers what the user wants.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-27 Thread Peter Geoghegan
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote:
 I've committed the core-backend parts of this, just to get them out of
 the way.  Have yet to look at the pg_stat_statements code itself.

Thanks. I'm glad that we have that out of the way.

 I ended up choosing not to apply that bit.  I remain of the opinion that
 this behavior is fundamentally inconsistent with the general rules for
 assigning parse locations to analyzed constructs, and I see no reason to
 propagate that inconsistency further than we absolutely have to.

Fair enough.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
[ also for the archives' sake ]

On 27 March 2012 22:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Well, testing function pointers for null is certainly OK --- note that
 all our hook function call sites do that.  It's true that testing for
 equality to a particular function's name can fail on some platforms
 because of jump table hacks.

I was actually talking about stylistic iffiness. This seems contrary
to the standard, which states:

 (ISO C 99 section 6.5.9.6)
   Two pointers compare equal if and only if both are null pointers,
both are pointers to the same object (...) or function,
both are pointers to one past the last element of the same array object,
or one is a pointer to one past the end of one array object and the
other is a pointer to the start of a different array object that happens
to immediately follow the first array object in the address space.

However, the fly in the ointment is IA-64 (Itanic), which apparently,
at least at one stage, had broken function pointer comparisons when
code was built using some version(s) of GCC.

I found it a bit difficult to square your contention that comparing
function pointers against particular function addresses was undefined
behaviour with the fact that neither GCC nor Clang complained.
However, in light of what I've learned about IA-64, I can certainly
see why we as a project would avoid the practice.

Source: http://gcc.gnu.org/ml/gcc/2003-06/msg01283.html
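
To spell out the distinction (the hook and function names here are
hypothetical):

typedef void (*hook_type) (void);

extern void example_hook_impl(void);

hook_type	post_analysis_hook = NULL;

void
call_site(void)
{
	/* testing for NULL: what all our hook call sites do; always safe */
	if (post_analysis_hook != NULL)
		post_analysis_hook();

	/*
	 * Comparing against a particular function's address: well-defined
	 * per C99 6.5.9p6, but historically unreliable on IA-64 with some
	 * GCC versions, because of function descriptors and jump tables.
	 */
	if (post_analysis_hook == example_hook_impl)
	{
		/* ... */
	}
}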

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 27 March 2012 20:26, Tom Lane t...@sss.pgh.pa.us wrote:
 Have yet to look at the pg_stat_statements code itself.

I merged upstream changes with the intention of providing a new patch
for you to review. I found a problem that I'd guess was introduced by
commit 9dbf2b7d75de5af38d087cbe2b1147dd0fd10f0a, "Restructure SELECT
INTO's parsetree representation into CreateTableAsStmt". This has
nothing to do with my patch in particular.

I noticed that my tests broke, on queries like "select * into
orders_recent FROM orders WHERE orderdate = '2002-01-01';". Since
commands like that are now utility statements, I thought it best to
just hash the query string directly, along with all other utility
statements - richer functionality would be unlikely to be missed all
that much, and that's a fairly clean demarcation that I don't want to
make blurry.

In the existing pg_stat_statements code in HEAD, there are 2
pgss_store call sites - one in pgss_ProcessUtility, and the other in
pgss_ExecutorFinish. There is an implicit assumption in the extant
code (and my patch too) that there will be exactly one pgss_store call
per query execution. However, that assumption appears to now fall
down, as illustrated by the GDB session below. What's more, our new
hook is called twice, which is arguably redundant.

(gdb) break pgss_parse_analyze
Breakpoint 1 at 0x7fbd17b96790: file pg_stat_statements.c, line 640.
(gdb) break pgss_ProcessUtility
Breakpoint 2 at 0x7fbd17b962b4: file pg_stat_statements.c, line 1710.
(gdb) break pgss_ExecutorEnd
Breakpoint 3 at 0x7fbd17b9618c: file pg_stat_statements.c, line 1674.
(gdb) c
Continuing.

*** I execute the command "select * into orders_recent FROM orders WHERE
orderdate = '2002-01-01';" ***

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0,
post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640     if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 1, pgss_parse_analyze (pstate=0x2473bc0,
post_analysis_tree=0x2474010) at pg_stat_statements.c:640
640     if (post_analysis_tree->commandType != CMD_UTILITY)
(gdb) c
Continuing.

Breakpoint 2, pgss_ProcessUtility (parsetree=0x2473cd8,
queryString=0x2472a88 "select * into orders_recent FROM orders WHERE
orderdate = '2002-01-01';", params=0x0,
isTopLevel=1 '\001', dest=0x2474278, completionTag=0x7fff74e481e0
"") at pg_stat_statements.c:1710
1710    if (pgss_track_utility && pgss_enabled())
(gdb) c
Continuing.

Breakpoint 3, pgss_ExecutorEnd (queryDesc=0x24c9660) at
pg_stat_statements.c:1674
1674    if (queryDesc->totaltime && pgss_enabled())
(gdb) c
Continuing.

What do you think we should do about this?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 28 March 2012 15:25, Tom Lane t...@sss.pgh.pa.us wrote:
 That's been an issue right along for cases such as EXPLAIN and EXECUTE,
 I believe.

Possible, since I didn't have test coverage for either of those 2 commands.

 Perhaps the right thing is to consider such executor calls
 as nested statements --- that is, the ProcessUtility hook ought to
 bump the nesting depth too.

That makes a lot of sense, but it might spoil things for the
pg_stat_statements.track = 'top' + pg_stat_statements.track_utility =
'on' case. At the very least, it's a POLA violation, to the extent
that if you were going to do this, you might mandate that nested
statements be tracked along with utility statements (probably while
defaulting to having both off, which would be a change).
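
If we went that way, I'd guess that the hook would end up shaped
roughly like this (just a sketch, with the hook signature taken from
the GDB session in my earlier message; the PG_TRY() dance is needed so
that the nesting depth is restored on error):

    nested_level++;
    PG_TRY();
    {
        if (prev_ProcessUtility)
            prev_ProcessUtility(parsetree, queryString, params,
                                isTopLevel, dest, completionTag);
        else
            standard_ProcessUtility(parsetree, queryString, params,
                                    isTopLevel, dest, completionTag);
        nested_level--;
    }
    PG_CATCH();
    {
        nested_level--;
        PG_RE_THROW();
    }
    PG_END_TRY();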

Since you've already removed the intoClause chunk, I'm not sure how
far underway the review effort is - would you like me to produce a new
revision, or is that unnecessary?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 28 March 2012 15:57, Tom Lane t...@sss.pgh.pa.us wrote:
 Is there any actual benefit in providing the
 pg_stat_statements.string_key GUC?  It looks to me more like something
 that was thrown in because it was easy than because anybody would want
 it.  I'd just as soon leave it out and avoid the incremental API
 complexity increase.  (While on that subject, I see no documentation
 updates in the patch...)

Personally, I don't care for it, and I'm sure most users wouldn't
either, but I thought that someone somewhere might be relying on the
existing behaviour.

I will produce a doc-patch. It would have been premature to do so
until quite recently.

 Also, I'm not terribly happy with the sticky entries hack.  The way
 you had it set up with a 1e10 bias for a sticky entry was completely
 unreasonable IMO, because if the later pgss_store call never happens
 (which is quite possible if the statement contains an error detected
 during planning or execution), that entry is basically never going to
 age out, and will just uselessly consume a precious table slot for a
 long time.  In the extreme, somebody could effectively disable query
 tracking by filling the hashtable with variants of SELECT 1/0.
 The best quick answer I can think of is to reduce USAGE_NON_EXEC_STICK
 to maybe 10 or so, but I wonder whether there's some less klugy way to
 get the result in the first place.  I thought about keeping the
 canonicalized query string around locally in the backend rather than
 having the early pgss_store call at all, but am not sure it's worth
 the complexity of an additional local hashtable or some such to hold
 such pending entries.

I was troubled by that too, and had considered various ways of at
least polishing the kludge. Maybe a better approach would be to start
with a usage of 1e10 (or something rather high, anyway), but apply a
much more aggressive multiplier than USAGE_DECREASE_FACTOR for sticky
entries only? That way, in earlier calls of entry_dealloc() the sticky
entries, easily identifiable as having 0 calls, are almost impossible
to evict, but after a relatively small number of calls they soon
become more readily evictable.

This seems better than simply having some much lower usage that is
only a few times the value of USAGE_INIT.

Let's suppose we set sticky entries to have a usage value of 10. If
all other queries have more than 10 calls, which is not unlikely
(under the current usage format, 1.0 usage = 1 call, at least until
entry_dealloc() dampens usage) then when we entry_dealloc(), the
sticky entry might as well have a usage of 1, and has no way of
increasing its usage short of becoming a real entry.

On the other hand, with the multiplier trick, how close the sticky
entry is to eviction is, importantly, far more strongly influenced by
the number of entry_dealloc() calls, which in turn is influenced by
churn in the system, rather than being largely influenced by how the
magic sticky usage value happens to compare to those usage values of
some random set of real entries on some random database. If entries
really are precious, then the sticky entry is freed much sooner. If
not, then why not allow the sticky entry to stick around pending its
execution/promotion to a real entry?

It would probably be pretty inexpensive to maintain what is currently
the largest usage value in the hash table at entry_dealloc() time -
that would likely be far more suitable than 1e10, and might even work
well. We could perhaps cut that in half every entry_dealloc().
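
To make the multiplier trick concrete, the decay step within
entry_dealloc() might look something like this (a sketch only -
STICKY_DECREASE_FACTOR and its value of 0.5 are just my proposal, not
anything committed, and the struct is cut down to the relevant
fields):

#define USAGE_DECREASE_FACTOR   0.99    /* decay for ordinary entries */
#define STICKY_DECREASE_FACTOR  0.50    /* proposed aggressive decay
                                         * for sticky entries */

typedef struct Counters
{
    long        calls;          /* 0 while an entry is merely sticky */
    double      usage;
}   Counters;

/* per-entry decay, applied to every entry during entry_dealloc() */
static void
decay_usage(Counters *c)
{
    if (c->calls == 0)
        c->usage *= STICKY_DECREASE_FACTOR;
    else
        c->usage *= USAGE_DECREASE_FACTOR;
}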

--
Peter Geoghegan   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-28 Thread Peter Geoghegan
On 29 March 2012 00:14, Tom Lane t...@sss.pgh.pa.us wrote:
 I'm planning to commit the patch with a USAGE_NON_EXEC_STICK value
 of 3.0, which is the largest value that stays below 10% wastage.
 We can twiddle that logic later, so if you want to experiment with an
 alternate decay rule, feel free.

I think I may well end up doing so when I get a chance. This seems
like the kind of problem that will be solved only when we get some
practical experience (i.e. use the tool on something closer to a
production system than the regression tests).

doc-patch is attached. I'm not sure if I got the balance right - it
may be on the verbose side.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pg_stat_statements_norm_docs.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-03-29 Thread Peter Geoghegan
On 29 March 2012 02:09, Tom Lane t...@sss.pgh.pa.us wrote:
 Thanks.  I've committed the patch along with the docs, after rather
 heavy editorialization.

Thank you.

 1. What to do with EXPLAIN, SELECT INTO, etc.  We had talked about
 tweaking the behavior of statement nesting and some other possibilities.
 I think clearly this could use improvement but I'm not sure just how.
 (Note: I left out the part of your docs patch that attempted to explain
 the current behavior, since I think we should fix it not document it.)

Yeah, this is kind of unsatisfactory. Nobody would expect the module
to behave this way. On the other hand, users probably aren't hugely
interested in this information.

I'm still kind of attached to the idea of exposing the hash value in
the view. It could be handy in replication situations, to be able to
aggregate statistics across the cluster (assuming you're using
streaming replication and not a trigger based system). You need a
relatively stable identifier to be able to do that. You've already
sort-of promised to not break the format in point releases, because it
is serialised to disk, and may have to persist for months or years.
Also, it will drive home the reality of what's going on in situations
like this (from the docs):

In some cases, queries with visibly different texts might get merged
into a single pg_stat_statements entry. Normally this will happen only
for semantically equivalent queries, but there is a small chance of
hash collisions causing unrelated queries to be merged into one entry.
(This cannot happen for queries belonging to different users or
databases, however.)

Since the hash value is computed on the post-parse-analysis
representation of the queries, the opposite is also possible: queries
with identical texts might appear as separate entries, if they have
different meanings as a result of factors such as different
search_path settings.

 2. Whether and how to adjust the aging-out of sticky entries.  This
 seems like a research project, but the code impact should be quite
 localized.

As I said, I'll try and give it some thought, and do some experiments.

 BTW, I eventually concluded that the parameterization testing we were
 worried about before was a red herring.  As committed, the patch tries
 to store a normalized string if it found any deletable constants, full
 stop.  This seems to me to be correct behavior because the presence of
 constants is exactly what makes the string normalizable, and such
 constants *will* be ignored in the hash calculation no matter whether
 there are other parameters or not.

Yeah, that aspect of not canonicalising parametrised entries had
bothered me. I guess we're better off gracefully handling the problem
across the board, rather than attempting to band-aid the problem by
just not having speculative hashtable entries in cases where they
arguably are not so useful. Avoiding canonicalising those constants
was somewhat misleading.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-03-31 Thread Peter Geoghegan
On 28 March 2012 15:23, Robert Haas robertmh...@gmail.com wrote:
 At any rate, I strongly agree that counting the number of
 strategy allocations is not really a viable proxy for counting the
 number of backend writes.  You can't know how many of those actually
 got dirtied.

Sure.

 Since any backend write is necessarily the result of that backend
 trying to allocate a buffer, I think maybe we should just count
 whether the number of times it was trying to allocate a buffer *using
 a BAS* vs. the number of times it was trying to allocate a buffer *not
 using a BAS*.  That is, decide whether or not it's a strategy write
 not based on whether the buffer came in via a strategy allocation, but
 rather based on whether it's going out as a result of a strategy
 allocation.

I'm not quite sure I understand what you mean here. Are you suggesting
that I produce a revision that bumps a counter beside FlushBuffer() in
BufferAlloc(), as a dirty page is evicted/written, while breaking the
figure out into != BAS_NORMAL and == BAS_NORMAL figures? Would both
figures be presented as separate columns within pg_stat_bgwriter?
-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new group commit behavior not helping?

2012-03-31 Thread Peter Geoghegan
On 1 April 2012 01:10, Robert Haas robertmh...@gmail.com wrote:
 Hoping to demonstrate the wonders of our new group commit code, I ran
 some benchmarks on the IBM POWER7 machine with synchronous_commit =
 on.  But, it didn't come out much better than 9.1.  pgbench, scale
 factor 300, median of 3 30-minute test runs, # clients = #threads,
 shared_buffers = 8GB, maintenance_work_mem = 1GB, synchronous_commit =
 on, checkpoint_segments = 300, checkpoint_timeout = 15min,
 checkpoint_completion_target = 0.9, wal_writer_delay = 20ms.

Why the low value for wal_writer_delay?

 master:
 01 tps = 118.968446 (including connections establishing)
 02 tps = 120.666865 (including connections establishing)
 04 tps = 209.624437 (including connections establishing)
 08 tps = 377.387029 (including connections establishing)
 16 tps = 695.172899 (including connections establishing)
 32 tps = 1318.468375 (including connections establishing)

 REL9_1_STABLE:
 01 tps = 117.037056 (including connections establishing)
 02 tps = 119.393871 (including connections establishing)
 04 tps = 205.958750 (including connections establishing)
 08 tps = 365.464735 (including connections establishing)
 16 tps = 673.379394 (including connections establishing)
 32 tps = 1101.324865 (including connections establishing)

(presumably s/tps/clients/ was intended here)

 Is this expected behavior?  Is this not the case where it's supposed
 to help?  I thought Peter G. posted results showing a huge improvement
 on this kind of workload, and I thought Heikki reproduced them on a
 different server, so I'm confused why I can't.

The exact benchmark that I ran was the update.sql pgbench-tools
benchmark, on my laptop. The idea was to produce a sympathetic
benchmark with a workload that was maximally commit-bound. Heikki
reproduced similar numbers on his laptop, iirc. Presumably the default
TPC-B-like transaction test has been used here.

You didn't mention what kind of disks this server has - I'm not sure
if that information is available elsewhere. That could be highly
pertinent.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] new group commit behavior not helping?

2012-04-02 Thread Peter Geoghegan
On 1 April 2012 06:41, Robert Haas robertmh...@gmail.com wrote:
 There seem to be two relevant differences between your test and mine:
 (1) your test is just a single insert per transaction, whereas mine is
 pgbench's usual update, select, update, update, insert and (2) it
 seems that, to really see the benefit of this patch, you need to pound
 the server with a very large number of clients.  On this test, 250
 clients was the sweet spot.

*refers to original early January benchmark*

While the graph that I produced was about the same shape as yours, the
underlying hardware was quite different, and indeed with my benchmark
group commit's benefits are more apparent earlier - at 32 clients,
throughput has more-than doubled compared to pre group commit
Postgres, which has already just about plateaued. I did include hdparm
information for the disk that my benchmark was performed on at the
time. While write-caching was not disabled, I would expect that the
commit speed of my laptop - which has a fairly unremarkable 7200RPM
disk - is slower than the 10K RPM SAS disks that you used. A formal
benchmark of respective raw commit speeds may shed more light on this.

Why did I even bother with such a sympathetic benchmark, when a
benchmark on a large server could have been performed instead? Well,
the reality is that many of our users have a commit speed that is
comparable to my laptop. In particular, the increasing prevalence of
cloud type deployments, make group commit a timely feature. If you
wanted to demonstrate the wonders of group commit, I'd take that
particular tone. I'm sure that if you re-ran this benchmark with a
battery-backed cache, you would observe a much smaller though still
very apparent benefit, but if you wanted to make the feature sound
appealing to traditional enterprise users that are using a BBU, a good
line would be "this is what will save your bacon the day that your
procedures fail and your BBU battery dies".

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] invalid search_path complaints

2012-04-03 Thread Peter Geoghegan
On 3 April 2012 19:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Tue, Apr 3, 2012 at 12:05 PM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 3, 2012 at 11:49 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 I would say that's an improvement.  Do you think it isn't?

 It seems like a log spam hazard at high connection rates.

 [ shrug... ]  Failing to report a problem is a problem, too.  By your
 argument any error message anywhere should be removed because it might
 be a log spam hazard at high reporting rates.

Surely there's a way to throttle it appropriately?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Publish checkpoint timing and sync files summary data to pg_stat_bgwriter

2012-04-05 Thread Peter Geoghegan
On 3 April 2012 12:11, Robert Haas robertmh...@gmail.com wrote:
 Well, for 9.2, I am asking that you rip all that stuff out of the
 patch altogether so we can focus on the stuff that was in the original
 patch.

Given how we're pushed for time, I'm inclined to agree that that is
the best course of action.

Attached revision does not attempt to do anything with strategy
writes/allocations.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


bgwriter_stats_2012_04_05.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] foreign key locks, 2nd attempt

2012-04-05 Thread Peter Geoghegan
On 25 March 2012 09:17, Simon Riggs si...@2ndquadrant.com wrote:
 The main thing we're waiting on are the performance tests to confirm
 the lack of regression.

I have extensively benchmarked the latest revision of the patch
(tpc-b.sql), which I pulled from Alvaro's github account. The
benchmark was of the feature branch's then-and-current HEAD, "Don't
follow update chains unless caller requests it".

I've had to split these numbers out into two separate reports.
Incidentally, at some future point I hope that pgbench-tools can
handle testing across feature branches, initdb'ing and suchlike
automatically and as needed. That's something that's likely to happen
sooner rather than later.

The server used was kindly supplied by the University of Oregon open
source lab.

Server (It's virtualised, but apparently this is purely for sandboxing
purposes and the virtualisation technology is rather good):

IBM,8231-E2B POWER7 processor (8 cores).
Fedora 16
8GB Ram
Dedicated RAID1 disks. Exact configuration unknown.

postgresql.conf (this information is available when you drill down
into each test too, fwiw):
max_connections = 200
shared_buffers = 2GB
checkpoint_segments = 30
checkpoint_completion_target = 0.8
effective_cache_size = 6GB

Reports:

http://results_fklocks.staticloud.com/
http://results_master_for_fks.staticloud.com/

Executive summary: There is a clear regression of less than 10%. There
also appears to be a new source of contention at higher client counts.

I realise that the likely upshot of this, and other concerns that are
generally held at this late stage is that this patch will not make it
into 9.2 . For what it's worth, that comes as a big disappointment to
me. I would like to thank both Alvaro and Noah for their hard work
here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-06 Thread Peter Geoghegan
On 29 March 2012 21:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Barring objections I'll go fix this, and then this patch can be
 considered closed except for possible future tweaking of the
 sticky-entry decay rule.

Attached patch fixes a bug, and tweaks sticky-entry decay.

The extant code bumps usage (though not call counts) in two hooks
(pgss_post_parse_analyze() and pgss_ExecutorEnd()), so prepared
queries will always have about half the usage of an equivalent simple
query, which is clearly not desirable. With the proposed patch,
usage should be similar to calls until the first call of
entry_dealloc(), rather than usually having a value that's about twice
as high. With the patch, a run of pgbench with and without "-M
prepared" results in a usage of calls + 1 for each query from both
runs.

The approach I've taken with decay is to maintain a server-wide median
usage value (well, a convenient approximation), which is assigned to
sticky entries. This makes it hard to evict the entries in the first
couple of calls to entry_dealloc(). On the other hand, if there really
is contention for entries, it will soon become really easy to evict
sticky entries, because we use a much more aggressive multiplier of
0.5 for their decay.

I rather conservatively assume an initial median usage of 10,
which is a very low value considering the use of the multiplier trick.
In any case, in the real world it won't take too long to call
entry_dealloc() to set the median value, if in fact it actually
matters.
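
In outline, the relevant parts of the patch look something like this
(a simplified sketch with the entry struct cut down to one field;
entry_dealloc() already sorts the entries by usage in order to evict
the least-used ones, so the midpoint of the sorted array comes nearly
for free):

#define ASSUMED_MEDIAN_INIT 10.0        /* conservative initial guess */

typedef struct pgssEntrySketch
{
    double      usage;          /* stand-in for counters.usage */
}   pgssEntrySketch;

static double cur_median_usage = ASSUMED_MEDIAN_INIT;

/*
 * Called at the point where entry_dealloc() has just sorted entries[]
 * by usage (ascending); remembers an approximate median for later use
 * as the starting usage of newly-created sticky entries.
 */
static void
record_median_usage(pgssEntrySketch **entries, int n)
{
    if (n > 0)
        cur_median_usage = entries[n / 2]->usage;
}

/* ...and at sticky entry creation: entry->usage = cur_median_usage; */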

You described entries as precious. This isn't quite the full picture;
while pg_stat_statements will malthusianistically burn through pretty
much as many entries as you care to give it, or so you might think, I
believe that in the real world, the rate at which the module burns
through them would frequently look logarithmic. In other words, after
an entry_dealloc() call the hashtable is 95% full, but it might take
rather a long time to reach 100% again - the first 5% is consumed
dramatically faster than the last. The user might not actually care if
you need to cache a sticky value for a few hours in one of their
slots, as you run an epic reporting query, even though the hashtable
is over 95% full.

The idea is to avoid evicting a sticky entry just because there
happened to be an infrequent entry_dealloc() at the wrong time, and
the least marginal of the most marginal 5% of non-sticky entries (that
is, the 5% up for eviction) happened to have a call count/usage
higher than the magic value of 3, which I find quite plausible.

If I apply your test for dead sticky entries after the regression
tests (serial schedule) were run, my approach compares very favourably
(granted, presumably usage values were double-counted for your test,
making our results less than completely comparable).

For the purposes of this experiment, I've just commented out "if
(calls == 0) continue;" within the pg_stat_statements() function,
obviously:

postgres=# select calls = 0, count(*) from pg_stat_statements() group
by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 959
-[ RECORD 2 ]-
?column? | t
count    | 3    <--- this includes the above query itself

postgres=# select calls = 0, count(*) from pg_stat_statements() group
by calls = 0;
-[ RECORD 1 ]-
?column? | f
count    | 960  <--- now it's counted here...
-[ RECORD 2 ]-
?column? | t
count    | 2    <--- ...not here

I've also attached some elogs, in their original chronological order,
that trace the median usage when recorded at entry_dealloc() for the
regression tests. As you'd expect given that this is the regression
tests, the median is very low, consistently between 1.9 and 2.5. An
additional factor that makes this work well is that the standard
deviation is low, and as such it is much easier to evict sticky
entries, which is what you want here.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
DEBUG:  cur_med_usage: 1.98
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.940598
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.882960
DEBUG:  cur_med_usage: 1.882960
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.901980
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.921192
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.960200
DEBUG:  cur_med_usage: 1.98
DEBUG:  cur_med_usage: 1.98
DEBUG:  cur_med_usage: 1.98
DEBUG:  cur_med_usage: 1.98
DEBUG:  cur_med_usage

Re: [HACKERS] Last gasp

2012-04-07 Thread Peter Geoghegan
On 7 April 2012 22:20, Tom Lane t...@sss.pgh.pa.us wrote:
 In short, the idea of strongly calendar-driven releases looks more
 and more attractive to me the more times we go through this process.
 If your patch isn't ready on date X, then it's not getting into this
 release; but there'll be another bus coming along before long.
 Stretching out release cycles to get in those last few neat features
 just increases the pressure for more of the same, because people don't
 know how long it will be to the next release.

I hope that that policy will not be applied without some degree of
discrimination.

I understand that this is an open-source project, and on a certain
level it has to be that simple, because we don't have any other way of
making contributors more conscientious about scheduling
considerations, except of course for giving them a nudge.  That said,
all features are not equal, and clearly some are of particular
strategic importance to the project, and as such may be worth holding
up the release for, subject to certain parameters. Which of these
features are worth holding the release up for is highly dependent on
their state at the time of the final commitfest deadline. Which
features are of strategic importance is of course a matter of opinion,
but that's the kind of thing that we have a steering committee for,
right?

Perhaps a good compromise would be to formalise the process through
which patches get a second chance in the final commitfest. I don't
think that it will prove workable to have a final deadline that is
adhered to blindly.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-07 Thread Peter Geoghegan
On 7 April 2012 23:51, Tom Lane t...@sss.pgh.pa.us wrote:
 As for the steering committee, core tries hard not to make technical
 decisions for the project; it's better to let those emerge by consensus.

I don't really consider this to be an engineering problem, which is
precisely why I bring up the role of core here. Holding up a release
for a feature has costs (the feature is delayed, which in some way
hurts us, as our users don't have a new release for that much longer,
etc) and benefits (when the release is available, Postgres is that
much better one release sooner). Naturally, the maturity of the
existing code, as judged by the community, would be heavily weighed in
this process, but the wider importance of the feature, and how much of
an advantage having it earlier provides, would also be weighed.

Open source products (as distinct from projects) often have this sort
of cost-benefit analysis made by individuals who are somewhat removed
from the engineering decisions themselves. While I'm certainly not
advocating that we switch to that model - in particular, I am not
advocating anything less than full transparency - surely there is
something to be said for some aspects of it, mostly to break the sorts
of deadlocks that can sometimes occur.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Re: pg_stat_statements normalisation without invasive changes to the parser (was: Next steps on pg_stat_statements normalisation)

2012-04-08 Thread Peter Geoghegan
On 8 April 2012 20:51, Tom Lane t...@sss.pgh.pa.us wrote:
 Applied with some cosmetic adjustments.

Thanks.

Having taken another look at the code, I wonder if we wouldn't have
been better off just fastpathing out of pgss_store in the first call
(in a pair of calls made by a backend as part an execution of some
non-prepared query) iff there is already an entry in the hashtable -
after all, we're now going to the trouble of acquiring the spinlock
just to increment the usage for the entry by 0 (likewise, every other
field), which is obviously superfluous. I apologise for not having
spotted this before submitting my last patch.

I have attached a patch with the modifications described.

This is more than a micro-optimisation, since it will cut the number
of spinlock acquisitions approximately in half for non-prepared
queries.
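
For the archives, the shape of the change is roughly as follows (a
sketch rather than the patch text; jstate is the pgssJumbleState
pointer that is only non-NULL for the early, post-parse-analysis
call):

    entry = (pgssEntry *) hash_search(pgss_hash, &key, HASH_FIND, NULL);

    /*
     * The early call exists only to ensure that a normalized entry is
     * present; if one already is, there is nothing to do, and in
     * particular no reason to take the entry's spinlock just to add
     * zeroes to its counters.
     */
    if (jstate && entry)
        return;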

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services


pg_stat_statements_optimization_2012_04_08.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 14:33, Robert Haas robertmh...@gmail.com wrote:
 So, should we make the new columns exposed by pg_stat_statements use
 milliseconds, so that the block read/write timings are everywhere in
 milliseconds, or should we keep them as a float8, so that all the
 times exposed by pg_stat_statements use float8?

I believe that we should keep them as float8, on the basis that a user
is more likely to generalise from total_time's format (when writing a
script to query the view or whatever) than from that of
pg_stat_database.

A part of me would like to change the view definitions so that all the
columns are strongly typed (i.e. all these values would be intervals).
I realise that that isn't practical though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 15:26, Kevin Grittner kevin.gritt...@wicourts.gov wrote:
 A patch on which the author is continuing to work even in the absence of 
 review
 should be considered a WIP "want feedback" submission; it should not
 be allowed to constitute a placeholder for inclusion in the
 release.

To be fair, I doubt that anyone actually believes that. If they did,
they wouldn't have to pay attention very long to receive a rude
awakening.

 <onlyhalfkidding>Perhaps we should have a concept of feature
 months -- so that when we look at holding up a release with 20
 features for two months so that one more feature can make it in, the
 cost side of the equation is 40 feature-months, and the benefit is
 10 feature-months.  (Remember, you can't count the added feature as
 though it's there for a year before the next release if it holds the
 release up.)</onlyhalfkidding>

I am broadly in favour of assessing the value of features in the same
way that a commercial organisation might - the more adoption a feature
spurs, the more valuable it is, and the more hesitant we should be to
bump it (though other factors are also very important). I take this
idea seriously, or at the very least share the mentality of the idea -
I'm just not sure that we can formalise it, or that we should.

I also think that we should try and reward good will. I think that we
generally do so, but an automatic cut-off date seems contrary to that.
The law of unintended consequences might see us lower our standards to
commit something to meet the deadline, that would otherwise not be
immediately committed. We're only human, and it would be foolish to
assume that committers don't feel that kind of pressure.
-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] To Do wiki

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 16:40, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 10.04.2012 18:31, Jeff Janes wrote:
 If I do "select count(distinct bid) from pgbench_accounts" I get many
 calls to btint4fastcmp, but if I do "create index on pgbench_accounts
 (bid)" I instead get many calls to btint4cmp.  If the index build is
 using SortSupport, shouldn't it also be calling btint4fastcmp like the
 distinct does?


 Oh, sorry, you're right. I stand corrected.

There is an impedance mismatch between tuplesort_begin_heap and
tuplesort_begin_index_btree that prevented this from being done with
the initial commit. Strangely, the SortSupport commit message didn't
comment on this.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 16:51, Robert Haas robertmh...@gmail.com wrote:
 When these things are pointed out to the people who are doing them,
 the response is often either (a) "this feature is so important we're
 all going to die if it's not in the release, how can you even think
 about bouncing it" or (b) "I'm not really still hacking on it, these
 are all just minor changes".  It's surprisingly easy to hoodwink even
 experienced contributors into thinking that your patch is "really,
 really almost done, honest, it just needs a couple more tweaks" when in
 fact it's nowhere close.  I try not to attribute to bad faith what can
 be explained by incurable optimism, so maybe we just have a lot of
 incurable optimism.  But it's doing nobody any good.

I think that you may be missing the greater point here. The people
that do this are kind of like the defectors in prisoner's dilemma - at
a certain point, some people cannot resist the temptation to push
their own patch forward at the expense of others by asserting
dubiously that it's ready-for-committer, or maybe they really do
incorrectly believe it to be so, or maybe, unlike you, they understand
that term to mean "I've done as much as I can, as has my reviewer", or
whatever. To play devil's advocate, that might be an anti-social act,
but at a certain point, who wants to be the last honest sap? Besides,
isn't everyone's crime no crime at all?

ISTM that this is symptomatic of the wider problem of a dire shortage
of committer resources. 100% of my non-doc patches so far have been
committed by 3 people. I would really like to see us figure out a way
of making more hackers committers, perhaps subject to certain
conditions that don't currently exist for committers. You might find
that given commit bits, some people will take their responsibilities
as a reviewer far more seriously. Maybe you don't think that any of
the likely candidates are quite ready for that responsibility, but you
must admit that it's a serious problem.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 18:28, Robert Haas robertmh...@gmail.com wrote:
 If we accept your argument that some
 people simply cannot help themselves, then the only solution is to
 make it cease to be a prisoner's dilemma, and that can only be done by
 changing the incentives, which presumably means handing down
 punishments to people who push their own patches forward at the
 expense of others.  Unless we care to choose a benevolent dictator, I
 don't see any way to accomplish that.

Well, I was really pointing out that people are somewhat forced into a
corner by the current state of affairs, because committers are not
typically able to look at anything in sufficient detail that isn't
"ready for committer", particularly as we approach crunch-time - their
time is simply too precious. By not marking the patch "ready for
committer", they are basically asking for their patch to be passed
over, and they may be incapable of bridging the chasm between what
really is their best effort, and what you'd consider to be the
ready-for-committer gold standard. Some people cannot exclusively
dedicate their time to their patch, or lack sufficient experience to
meet that standard.

 It's feasible to think that we might be able to streamline the process
 of booting patches that are not close to committable at the start of a
 CommitFest, and especially at the start of the final CommitFest.  For
 example, limiting patches to a small number of days in the "Waiting on
 Author" state would help a great deal.  But the more general problem
 of people arguing that *their* patch is the special one without which
 the earth will cease to revolve about its axis, or that it's ready
 when it's really not, is more difficult to solve.  How would you
 propose we deal with that problem?

As I've already said, I think that needs to be decided impartially,
ideally by people who are removed from the engineering process. I
don't mean that we'd get a marketing person to make those decisions -
far from it. I just mean that some separation of powers can be a good
thing in some circumstances.

 I don't agree with that.  I think that there are a few people who
 don't now have commit bits who should be given them - in particular,
 Fujii Masao and Kevin Grittner, both of whom have been doing
 consistently excellent work for several years.

I agree with you about both individuals. I hope that this happens
sooner rather than later.

 Giving more people the ability to
 commit stuff will neither force them to devote time to it nor make
 them qualified to do it if they aren't already.

One major component of being qualified is, of course, knowing what
you don't know, and the risk of being left with egg on your face turns
out to be a pretty effective way of preventing new committers from
being too eager. Giving more people bits has a cost: in general, I'd
expect it to result in a higher bug-to-line ratio when code is
committed. However, not doing so has an opportunity cost: less code is
committed, which may, on balance, result in a release inferior to
what we could have had. Maybe you think that we have the balance
perfectly right, and you are of course perfectly entitled to that
view, as well as being perfectly entitled to having your opinion more
heavily weighed than mine, but I'd like to see a dialogue about it at
some point.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 10 April 2012 23:07, Greg Smith g...@2ndquadrant.com wrote:
 On 04/10/2012 12:27 PM, Tom Lane wrote:
 I am doing more sophisticated things with it, so I'll celebrate this as my
 opportunity to say I did something you didn't see coming for 2012.

This is why I requested that we expose the query_id hash value - I
believe that it will be generally useful in clustering situations. It
would be nice to have a persistent identifier. While we're discussing
revising pg_stat_statement's interface, are you still opposed to
exposing that value, Tom?

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
 If people need something like that, couldn't they create it by hashing
 the normalized query text with an arbitrary algorithm?

That supposes that the normalised query text is perfectly stable. It
may well not be, particularly for things like ad-hoc queries or
queries generated by ORMs, across database clusters and over long
periods of time - you're basically throwing the benefit of all of that
intelligent normalisation out of the window, because it's pretty close
to free to expose it. What if a developer tweaks an alias in the
application for clarity? Also, as you point out, it has additional
utility in advertising when a collision has happened, and setting the
user's expectations appropriately. I assume that collisions are very
rare, but when they do happen, this gives you a fighting chance of
noticing them.

As Tom points out, the query hash will vary according to
platform-specific characteristics, including endianness, and will
require that OIDs be the same on every node. However, it is still
going to be useful in clusters that use streaming replication, though
not in a third-party trigger-based replication system for example,
because streaming
replication does of course require that those factors (and rather a
lot more) will be identical across the cluster anyway. Realistically,
I'd expect a large majority of people interested in this feature to
only want to use it with streaming replication anyway.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Patch: add timing of buffer I/O requests

2012-04-10 Thread Peter Geoghegan
On 11 April 2012 01:16, Tom Lane t...@sss.pgh.pa.us wrote:
 Peter Geoghegan pe...@2ndquadrant.com writes:
 On 11 April 2012 00:35, Robert Haas robertmh...@gmail.com wrote:
 If people need something like that, couldn't they create it by hashing
 the normalized query text with an arbitrary algorithm?

 That supposes that the normalised query text is perfectly stable. It
 may well not be, particularly for things like ad-hoc queries or
 queries generated by ORMs, across database clusters and over long
 periods of time -

 Indeed, but the hash value isn't stable either given those sorts of
 assumptions, so I'm not convinced that there's any advantage there.

Isn't it? The hash captures the true meaning of the query, while
having the database server's platform as a usually irrelevant
artefact. Another thing that I forgot to mention is client encoding -
it may well be fairly inconvenient to have to use the same algorithm
to hash the query string across applications. You also have to hash
the query string yourself again and again, which is expensive to do
from Python or something, and is often inconvenient - differences
beyond track_activity_query_size bytes (default: 1024) are not
recognised. Using an SQL code beautifier where a single byte varies
now breaks everything, which developers don't expect at all (we've
trained them not to), so in many ways you're back to the same
limitations as classic pg_stat_statements if you attempt to aggregate
queries over time and across machines, which is a very real use case.

It's probably pretty annoying to have to get your Python app to use
the same hash function as your Java app or whatever, unless you want
to use something heavyweight like a cryptographic hash function. By
doing it within Postgres, you avoid those headaches.

I'm not asking you to very loudly proclaim that it should be used like
this - just expose it, accurately document it, and I'm quite confident
that it will be widely used and relied upon by those that are
reasonably well informed, and understand its limitations, which are
really quite straightforward.

 What I think people would actually like to know, if they're in a
 situation where distinct query texts are getting hashed to the same
 thing, is *which* different texts got hashed to the same thing.
 But there's no good way to expose that given the pg_stat_statements
 infrastructure, and exposing the hash value doesn't help.

Apart from detecting the case where we get a straightforward
collision, I don't expect that that would be useful. The whole point
is that the user doesn't care about the difference, and I think we've
specified a practical, widely useful standard for when queries should
be considered equivalent.
-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-10 Thread Peter Geoghegan
On 11 April 2012 02:14, Robert Haas robertmh...@gmail.com wrote:
 My perception of what's going on here is dramatically different from
 yours.  I don't think there was any overflow of submissions for 9.2.

That is just not true. See the attached graph (couldn't produce one
with better resolution at short notice) - I've just eyeballed the
graph, but it looks like an upward trend to me. I think I submitted my
first patch in late April 2011. Now that Andres has joined
2ndQuadrant, it seems likely that his output will increase by a large
amount too. I think that Noah began contributing less than 2 years
ago. It certainly seems to me like we're getting more submissions that
are more complex... this image was generated by the tool gitstat, and
as such does not include figures for the patches that we were unable
to commit, though there also appears to be an upward trend there.

 For the most part I would describe it as a slow and boring release
 cycle, with the usual spike in half-baked submissions right near the
 end, except this release they were less baked than usual, which is why
 most of them didn't go in.

That is uncalled for.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services
attachment: lines_of_code.png
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Last gasp

2012-04-11 Thread Peter Geoghegan
On 11 April 2012 03:26, Tom Lane t...@sss.pgh.pa.us wrote:
 [ scratches head... ]  That's supposed to be total lines of code in our
 source tree?  What's the big drop in late 2009, then?

I had wondered about that myself - all I can tell you is that I used
the tool as advertised, without any adornments. This particular tool
is a bit misleading, because it counts lines of code as lines of
checked-in text, which can include things that are very verbose
without necessarily being what we really think of as code - it doesn't
make any effort to discriminate against non-code, nor does it expose
the option of doing so. I chose it at short notice simply because it
produces graphs. I do intend to take a look at this problem in more
detail, and get better statistics on changes to our codebase - it's a
tricky proposition, though.

-- 
Peter Geoghegan       http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training and Services

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

