Re: [PATCH] Exorcise zero-dimensional arrays (Was: Re: [HACKERS] Should array_length() Return NULL)

2013-04-02 Thread Gavin Flower

On 29/03/13 13:12, Brendan Jurd wrote:

On 28 March 2013 20:34, Dean Rasheed dean.a.rash...@gmail.com wrote:

Is the patch also going to allow empty arrays in higher dimensions
where not just the last dimension is empty?

It doesn't allow that at present.


It seems as though, if
it's allowing 1-by-0 arrays like '{{}}' and '[4:4][8:7]={{}}', it
should also allow 0-by-0 arrays like '[4:3][8:7]={}', and 0-by-3
arrays like '[4:3][11:13]={}'.

I think the array literal parser would reject this on the grounds that
the brace structure doesn't match the dimension specifiers.  You could
modify that check to respect zero length in dimensions other than the
innermost one, but it's hard to say whether it would be worth the
effort.

It might be instructive to hear from somebody who does (or intends to)
use multidim arrays for some practical purpose, but I don't even know
whether such a person exists within earshot of this list.

Cheers,
BJ


Multiple dimension arrays 'might' be something I may need, or at least
usefully use, in my current project.


I am not ready to do the database design to that detail yet.

However, I know I will need to do 'clever'(TM) things with multiple
image areas on images, and possibly group several images together (and
still access their individual image areas).  So potentially arrays of
dimension 2 or greater may be useful.  Certainly, some images will not
have any image areas defined, so some arrays might be empty.



Cheers,
Gavin




Re: [HACKERS] regression test failed when enabling checksum

2013-04-02 Thread Simon Riggs
On 2 April 2013 02:53, Jeff Davis pg...@j-davis.com wrote:

 Any idea
 what is going on?

 Not right now.

Since I'm now responsible for the quality of this patch, I need to say
this before someone else does: we have until the end of the week to
fix this conclusively, or I will need to consider whether to revoke
the patch in this release. Thanks for your time and trouble to locate
problems.

It may be that the patch is revealing an underlying bug, so we don't
yet have evidence for the source of the bug.

 My primary suspect is what's going on in
 visibilitymap_set() and heap_xlog_visible(), which is more complex than
 some of the other code. That would require some VACUUM activity, which
 isn't in your workload -- do you think autovacuum may kick in sometimes?

Other candidates might be:

* page hole is non-empty, so the overwrite by the full page write
may leave the block in an intermediate state. Tom recently fixed
RestoreBkpBlock() to avoid this by zeroing the contents first before
applying the page.

* something to do with access to GetPageLSN()

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] regression test failed when enabling checksum

2013-04-02 Thread Andres Freund
On 2013-04-01 19:51:19 -0700, Jeff Janes wrote:
 On Mon, Apr 1, 2013 at 10:37 AM, Jeff Janes jeff.ja...@gmail.com wrote:
 
  On Tue, Mar 26, 2013 at 4:23 PM, Jeff Davis pg...@j-davis.com wrote:
 
 
  Patch attached. Only brief testing done, so I might have missed
  something. I will look more closely later.
 
 
  After applying your patch, I could run the stress test described here:
 
  http://archives.postgresql.org/pgsql-hackers/2012-02/msg01227.php
 
  But altered to make use of initdb -k, of course.
 
  Over 10,000 cycles of crash and recovery, I encountered two cases of
  checksum failures after recovery, example:
  ...
 
 
 
  Unfortunately I already cleaned up the data directory before noticing the
  problem, so I have nothing to post for forensic analysis.  I'll try to
  reproduce the problem.
 
 
 I've reproduced the problem, this time in block 74 of relation
 base/16384/4931589, and a tarball of the data directory is here:
 
 https://docs.google.com/file/d/0Bzqrh1SO9FcELS1majlFcTZsR0k/edit?usp=sharing
 
 (the table is in database jjanes under role jjanes, the binary is commit
 9ad27c215362df436f8c)
 
 What I would probably really want is the data as it existed after the crash
 but before recovery started, but since the postmaster immediately starts
 recovery after the crash, I don't know of a good way to capture this.
 
 I guess one thing to do would be to extract from the WAL the most recent
 FPW for block 74 of relation base/16384/4931589  (assuming it hasn't been
 recycled already) and see if it matches what is actually in that block of
 that data file, but I don't currently know how to do that.

Since I bragged somewhere else recently that it should be easy to do now that
we have pg_xlogdump, I hacked it up so it dumps all the full page writes into
the directory specified by --dump-bkp=PATH. It currently overwrites previous
full page writes to the same page, but that should be trivial to change if you
want by adding %X.%X for the lsn into the path sprintf.
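
For illustration only (not part of the patch below), here is roughly how a
dumped block could then be checked against the live data file. The paths,
block number, and dump-file name are made up, and a real comparison would
also want to ignore the page hole and the page LSN:

/*
 * Illustration only: compare a block dumped by --dump-bkp against the same
 * block in the relation's data file.  File names and block number invented.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define BLCKSZ 8192

static void
read_block(const char *path, long blkno, char *buf)
{
	FILE	   *f = fopen(path, "rb");

	if (f == NULL ||
		fseek(f, blkno * (long) BLCKSZ, SEEK_SET) != 0 ||
		fread(buf, 1, BLCKSZ, f) != BLCKSZ)
	{
		fprintf(stderr, "could not read block %ld from %s\n", blkno, path);
		exit(1);
	}
	fclose(f);
}

int
main(void)
{
	static char wal_img[BLCKSZ];	/* full page image dumped from WAL */
	static char rel_img[BLCKSZ];	/* block as found in the data file */

	read_block("fpw-dump/1663-16384-4931589:74_main", 0, wal_img);
	read_block("base/16384/4931589", 74, rel_img);

	printf("blocks %s\n",
		   memcmp(wal_img, rel_img, BLCKSZ) == 0 ? "match" : "differ");
	return 0;
}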

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
From 09cdce611dc74082901ca1a646135a5ea1af709c Mon Sep 17 00:00:00 2001
From: Andres Freund and...@anarazel.de
Date: Tue, 2 Apr 2013 11:43:23 +0200
Subject: [PATCH] pg_xlogdump: add option for dumping full page writes into a
 directory

---
 contrib/pg_xlogdump/pg_xlogdump.c |   56 -
 1 file changed, 55 insertions(+), 1 deletion(-)

diff --git a/contrib/pg_xlogdump/pg_xlogdump.c b/contrib/pg_xlogdump/pg_xlogdump.c
index d6d5498..a5e4186 100644
--- a/contrib/pg_xlogdump/pg_xlogdump.c
+++ b/contrib/pg_xlogdump/pg_xlogdump.c
@@ -40,6 +40,7 @@ typedef struct XLogDumpConfig
 	bool		bkp_details;
 	int			stop_after_records;
 	int			already_displayed_records;
+	char		*dump_bkp;
 
 	/* filter options */
 	int			filter_by_rmgr;
@@ -373,6 +374,46 @@ XLogDumpDisplayRecord(XLogDumpConfig *config, XLogRecPtr ReadRecPtr, XLogRecord
    bkpb.block, bkpb.hole_offset, bkpb.hole_length);
 		}
 	}
+
+	if (config->dump_bkp != NULL)
+	{
+		int			bkpnum;
+		char	   *blk = (char *) XLogRecGetData(record) + record->xl_len;
+
+		for (bkpnum = 0; bkpnum < XLR_MAX_BKP_BLOCKS; bkpnum++)
+		{
+			BkpBlock	bkpb;
+			char		bkpbdata[BLCKSZ];
+			int			outfd;
+			char		outpath[MAXPGPATH];
+
+			if (!(XLR_BKP_BLOCK(bkpnum) & record->xl_info))
+				continue;
+
+			memcpy(&bkpb, blk, sizeof(BkpBlock));
+			blk += sizeof(BkpBlock);
+
+			memcpy(bkpbdata, blk, bkpb.hole_offset);
+			memset(bkpbdata + bkpb.hole_offset, 0, bkpb.hole_length);
+			memcpy(bkpbdata + bkpb.hole_offset + bkpb.hole_length,
+				   blk + bkpb.hole_offset,
+				   BLCKSZ - bkpb.hole_offset - bkpb.hole_length);
+
+			blk += BLCKSZ - bkpb.hole_length;
+
+			sprintf(outpath, "%s/%u-%u-%u:%u_%s", config->dump_bkp,
+					bkpb.node.spcNode, bkpb.node.dbNode, bkpb.node.relNode,
+					bkpb.block, forkNames[bkpb.fork]);
+			outfd = open(outpath, O_WRONLY|O_TRUNC|O_CREAT, S_IRUSR|S_IWUSR);
+			if (outfd < 0)
+				fatal_error("could not open output file %s: %s",
+							outpath, strerror(errno));
+			if (write(outfd, bkpbdata, BLCKSZ) != BLCKSZ)
+				fatal_error("could not successfully write output block %s: %s",
+							outpath, strerror(errno));
+			close(outfd);
+		}
+	}
 }
 
 static void
@@ -387,6 +428,7 @@ usage(void)
 	printf("  -?, --help             show this help, then exit\n");
 	printf("\nContent options:\n");
 	printf("  -b, --bkp-details      output detailed information about backup blocks\n");
+	printf("  -d, --dump-bkp=path    dump all full page images into PATH\n");
 	printf("  -e, --end=RECPTR       stop reading at log position RECPTR\n");
 	printf("  -n, --limit=N          number of records to display\n");
 	printf("  -p, --path=PATH        directory in which to find log segment files\n");
@@ -413,6 +455,7 @@ main(int argc, char **argv)
 
 	static struct option long_options[] = {
 		{"bkp-details", 

Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Andres Freund
On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
 On 3/23/13 7:41 AM, Ants Aasma wrote:
 Yes, having bgwriter do the actual cleaning up seems like a good idea.
 The whole bgwriter infrastructure will need some serious tuning. There
 are many things that could be shifted to background if we knew it
 could keep up, like hint bit setting on dirty buffers being flushed
 out. But again, we have the issue of having good tests to see where
 the changes hurt.
 
 I think at some point we need to stop depending on just bgwriter for all this 
 stuff. I believe it would be much cleaner if we had separate procs for 
 everything we needed (although some synergies might exist; if we wanted to 
 set hint bits during write then bgwriter *is* the logical place to put that).
 
 In this case, I don't think keeping stuff on the free list is close enough to 
 checkpoints that we'd want bgwriter to handle both. At most we might want 
 them to pass some metrics back in forth.

bgwriter isn't doing checkpoints anymore, there's the checkpointer since 9.2.

In my personal experience and measurement bgwriter is pretty close to
useless right now. I think - pretty similar to what Amit has done - it
should perform part of a real clock sweep instead of just looking ahead
of the current position without changing usagecounts and the sweep
position and put enough buffers on the freelist to sustain the need till
its next activity phase. I hacked around that one night in a hotel and
got impressive speedups (and quite some breakage) for bigger than s_b
workloads.

That would reduce quite a bit of pain points:
- fewer different processes/cpus looking at buffer headers ahead in the cycle
- fewer cpus changing usagecounts
- dirty pages are far more likely to be flushed out already when a new
  page is needed
- stuff like the relation extension lock which right now frequently has
  to search far and wide for new pages while holding the extension lock
  exclusively should finish quite a bit faster

If the freelist lock is separated from the lock protecting the clock
sweep this should get quite a bit of a scalability boost without having
potential unfairness you can have with partitioning the lock or such.
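
A minimal standalone sketch of the shape of that idea (invented names, no
locking, and nothing like the real buffer manager code; it only shows a
background sweep refilling a free list that backends pop from instead of
sweeping themselves):

/* Conceptual sketch only: not PostgreSQL code, no locking, names invented. */
#include <stdio.h>
#include <stdbool.h>

#define NBUFFERS 16

typedef struct
{
	int		usage_count;
	bool	dirty;
	bool	on_freelist;
	int		free_next;			/* next free buffer index, or -1 */
} FakeBufDesc;

static FakeBufDesc buffers[NBUFFERS];
static int	freelist_head = -1;
static int	nfree = 0;
static int	sweep_pos = 0;		/* the clock hand */

/* Background task: advance the clock hand until the free list is topped up. */
static void
background_sweep(int target_free)	/* target_free must be <= NBUFFERS */
{
	while (nfree < target_free)
	{
		FakeBufDesc *buf = &buffers[sweep_pos];

		if (buf->usage_count > 0)
			buf->usage_count--;				/* still warm: just age it */
		else if (!buf->on_freelist)
		{
			if (buf->dirty)
				buf->dirty = false;			/* "write it out" before reuse */
			buf->free_next = freelist_head;	/* publish as reusable */
			freelist_head = sweep_pos;
			buf->on_freelist = true;
			nfree++;
		}
		sweep_pos = (sweep_pos + 1) % NBUFFERS;
	}
}

/* Backend side: pop a victim from the free list instead of sweeping itself. */
static int
get_free_buffer(void)
{
	int		victim = freelist_head;

	if (victim >= 0)
	{
		freelist_head = buffers[victim].free_next;
		buffers[victim].on_freelist = false;
		nfree--;
	}
	return victim;				/* -1 means: fall back to a backend-side sweep */
}

int
main(void)
{
	for (int i = 0; i < NBUFFERS; i++)
		buffers[i] = (FakeBufDesc) {i % 3, i % 4 == 0, false, -1};

	background_sweep(4);
	printf("backend got buffer %d\n", get_free_buffer());
	return 0;
}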

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread Peter Eisentraut
On 3/30/13 11:35 PM, Tom Lane wrote:
 The LIKE index optimization is hard-wired into
 match_special_index_operator(), which never heard of citext's ~~
 operators.
 
 I've wanted for years to replace that mechanism with something that
 would support plug-in extensions, but have no very good idea how to
 do it.

I have been thinking there should be a GiST index that associates with
the pattern matching operators.  I haven't worked out the details, but
at least this seems it might be the right framework to solve this problem.



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 3/30/13 11:35 PM, Tom Lane wrote:
 The LIKE index optimization is hard-wired into
 match_special_index_operator(), which never heard of citext's ~~
 operators.
 
 I've wanted for years to replace that mechanism with something that
 would support plug-in extensions, but have no very good idea how to
 do it.

 I have been thinking there should be a GiST index that associates with
 the pattern matching operators.  I haven't worked out the details, but
 at least this seems it might be the right framework to solve this problem.

No, not really.  You can already build things like that with GiST, see
pg_trgm (which already does LIKE and will soon be able to do regexes).

The issue with the LIKE special case is that left-anchored patterns
are (to some extent) indexable with ordinary btree indexes, and so we
want to exploit that rather than tell people they have to have a whole
other index.  It doesn't make sense IMO to try to mark ~~ as a btree
operator, because btree doesn't really have any ability for
operator-specific code to do what would have to be done.  What does make
sense is for the planner to understand how to extract a btree-indexable
clause out of what's in the query, as that (a) keeps the complexity
out of the run-time machinery, and (b) provides an opportunity for the
planner to estimate whether the whole thing is worth the trouble or
not, which it frequently isn't in LIKE cases.

So it's a pretty special case, but there are just enough instances of it
to wish for some not-so-hard-wired way to deal with it.
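
For readers unfamiliar with the trick being referred to, the transformation
is roughly the following. This is only a toy sketch of the idea, ignoring
escape characters, collations and multibyte encodings, which is exactly
where the real planner machinery gets complicated:

/* Toy illustration: derive a btree range from a left-anchored LIKE pattern. */
#include <stdio.h>
#include <string.h>

/* Copy the fixed prefix of a LIKE pattern (up to the first wildcard). */
static void
like_prefix(const char *pattern, char *prefix, size_t len)
{
	size_t		i;

	for (i = 0; i + 1 < len && pattern[i] != '\0'; i++)
	{
		if (pattern[i] == '%' || pattern[i] == '_')
			break;
		prefix[i] = pattern[i];
	}
	prefix[i] = '\0';
}

int
main(void)
{
	char		lower[64],
				upper[64];

	like_prefix("abc%", lower, sizeof(lower));

	/* Upper bound: bump the last byte of the prefix ("abc" -> "abd"). */
	strcpy(upper, lower);
	if (upper[0] != '\0')
		upper[strlen(upper) - 1]++;

	printf("col LIKE 'abc%%'  =>  col >= '%s' AND col < '%s'\n", lower, upper);
	return 0;
}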

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread Peter Eisentraut
On 4/2/13 10:26 AM, Tom Lane wrote:
 The issue with the LIKE special case is that left-anchored patterns
 are (to some extent) indexable with ordinary btree indexes, and so we
 want to exploit that rather than tell people they have to have a whole
 other index.

In practice, you need an index specifically for pattern matching anyway.
 The difference would be that instead of using a different operator
class that has no recorded association with the original operator, you'd
use a different access method that is associated with the operator in
normal ways.

 So it's a pretty special case, but there are just enough instances of it
 to wish for some not-so-hard-wired way to deal with it.

Are there any widely known non-built-in cases besides citext?
Presumably the reason you use citext is that you use a lot of letters.
So I kind of doubt that there are going to be a lot of cases of pattern
searches on citext with left-anchored patterns starting with (a
sufficient number of) non-letters.  It's possible, of course, but
doesn't seem important enough by itself.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Robert Haas
On Tue, Apr 2, 2013 at 6:32 AM, Andres Freund and...@2ndquadrant.com wrote:
 On 2013-04-01 17:56:19 -0500, Jim Nasby wrote:
 On 3/23/13 7:41 AM, Ants Aasma wrote:
 Yes, having bgwriter do the actual cleaning up seems like a good idea.
 The whole bgwriter infrastructure will need some serious tuning. There
 are many things that could be shifted to background if we knew it
 could keep up, like hint bit setting on dirty buffers being flushed
 out. But again, we have the issue of having good tests to see where
 the changes hurt.

 I think at some point we need to stop depending on just bgwriter for all 
 this stuff. I believe it would be much cleaner if we had separate procs for 
 everything we needed (although some synergies might exist; if we wanted to 
 set hint bits during write then bgwriter *is* the logical place to put that).

 In this case, I don't think keeping stuff on the free list is close enough 
 to checkpoints that we'd want bgwriter to handle both. At most we might want 
 them to pass some metrics back in forth.

 bgwriter isn't doing checkpoints anymore, there's the checkpointer since 9.2.

 In my personal experience and measurement bgwriter is pretty close to
 useless right now. I think - pretty similar to what Amit has done - it
 should perform part of a real clock sweep instead of just looking ahead
 of the current position without changing usagecounts and the sweep
 position and put enough buffers on the freelist to sustain the need till
 its next activity phase. I hacked around that one night in a hotel and
 got impressive speedups (and quite some breakage) for bigger than s_b
 workloads.

 That would reduce quite a bit of pain points:
 - fewer different processes/cpus looking at buffer headers ahead in the cycle
 - fewer cpus changing usagecounts
 - dirty pages are far more likely to be flushed out already when a new
   page is needed
 - stuff like the relation extension lock which right now frequently has
   to search far and wide for new pages while holding the extension lock
   exclusively should finish quite a bit faster

 If the freelist lock is separated from the lock protecting the clock
 sweep this should get quite a bit of a scalability boost without having
 potential unfairness you can have with partitioning the lock or such.

I agree.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Robert Haas
On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure mmonc...@gmail.com wrote:
 That seems pretty unlikely because of A sheer luck of hitting that
 page for the dropout (if your buffer count is N the chances of losing
 it would seem to be 1/N at most) and B highly used pages are much more
 likely to be pinned and thus immune from eviction.  But my issue with
 this whole line of analysis is that I've never been able to turn it
 up in simulated testing.   Probably to do it you'd need very very fast
 storage.

Well, if you have shared_buffers=8GB, that's a million buffers.  One
in a million events happen pretty frequently on a heavily loaded
server, which, on recent versions of PostgreSQL, can support several
hundred thousand queries per second, each of which accesses multiple
buffers.

I've definitely seen evidence that poor choices of which CLOG buffer
to evict can result in a noticeable system-wide stall while everyone
waits for it to be read back in.  I don't have any similar evidence
for shared buffers, but I wouldn't be very surprised if the same
danger exists there, too.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread Tom Lane
Peter Eisentraut pete...@gmx.net writes:
 On 4/2/13 10:26 AM, Tom Lane wrote:
 The issue with the LIKE special case is that left-anchored patterns
 are (to some extent) indexable with ordinary btree indexes, and so we
 want to exploit that rather than tell people they have to have a whole
 other index.

 In practice, you need an index specifically for pattern matching anyway.

Well, you do if you want to search for general patterns, but there's
been enough interest in the LIKE optimization as-is to make me think
we could not get away with removing it.

 So it's a pretty special case, but there are just enough instances of it
 to wish for some not-so-hard-wired way to deal with it.

 Are there any widely known non-built-in cases besides citext?

Well, indxpath.c knows about text LIKE and network subset operators,
and it would be nice if it knew how to do the same type of optimization
for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
doesn't seem implementable in the current infrastructure because ranges
aren't all built-in types.  I agree that the citext case isn't too
compelling in isolation, but surely the range case is interesting.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Merlin Moncure
On Tue, Apr 2, 2013 at 9:55 AM, Robert Haas robertmh...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 1:53 AM, Merlin Moncure mmonc...@gmail.com wrote:
 That seems pretty unlikely because of A sheer luck of hitting that
 page for the dropout (if your buffer count is N the chances of losing
 it would seem to be 1/N at most) and B highly used pages are much more
 likely to be pinned and thus immune from eviction.  But my issue with
 this whole line of analysis is that I've never been able to turn it
 up in simulated testing.   Probably to do it you'd need very very fast
 storage.

 Well, if you have shared_buffers=8GB, that's a million buffers.  One
 in a million events happen pretty frequently on a heavily loaded
 server, which, on recent versions of PostgreSQL, can support several
 hundred thousand queries per second, each of which accesses multiple
 buffers.

 I've definitely seen evidence that poor choices of which CLOG buffer
 to evict can result in a noticeable system-wide stall while everyone
 waits for it to be read back in.  I don't have any similar evidence
 for shared buffers, but I wouldn't be very surprised if the same
 danger exists there, too.

That's a very fair point, although not being able to evict pinned
buffers is a highly mitigating aspect.  Also CLOG is a different beast
entirely -- it's much more dense (2 bits!) vs a tuple so a single page
can hold a lot of high priority things.  But you could be right anyways.

Given that, I wouldn't feel very comfortable with forced eviction
without knowing for sure high priority buffers were immune from that.
Your nailing idea is maybe the ideal solution.   Messing around with
the usage_count mechanic is tempting (like raising the cap and making
the sweeper more aggressive as it iterates), but probably really
difficult to get right, and, hopefully, ultimately moot.

 merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Robert Haas
On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure mmonc...@gmail.com wrote:
 That's a very fair point, although not being able to evict pinned
 buffers is a highly mitigating aspect.  Also CLOG is a different beast
 entirely -- it's much more dense (2 bits!) vs a tuple so a single page
 can hold a lot of high priority things.  But you could be right anyways.

 Given that, I wouldn't feel very comfortable with forced eviction
 without knowing for sure high priority buffers were immune from that.
 Your nailing idea is maybe the ideal solution.   Messing around with
 the usage_count mechanic is tempting (like raising the cap and making
 the sweeper more aggressive as it iterates), but probably really
 difficult to get right, and, hopefully, ultimately moot.

One thought I had for fiddling with usage_count is to make it grow
additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
sure the idea is any good, but one problem with the current system is
that it's pretty trivial for a buffer to accumulate five touches, and
after that we lose all memory of what the frequency of access is, so
pages of varying levels of hotness can all have usage
count 5.  This might allow a little more refinement without letting
the time to degrade the usage count get out of control.
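
A toy illustration of that arithmetic (nothing more than the two update
rules applied to a single counter; not a buffer-manager patch):

/* Sketch of the proposed rules: touches add 1, each sweep halves the count. */
#include <stdio.h>

int
main(void)
{
	int			usage_count = 0;

	/* A burst of touches raises the count additively. */
	for (int touch = 0; touch < 20; touch++)
		usage_count = usage_count + 1;

	/* Under the current cap-at-5 scheme this buffer would just sit at 5. */
	printf("after 20 touches: %d\n", usage_count);

	/* Each pass of the clock sweep decays the count exponentially. */
	for (int sweep = 1; sweep <= 6; sweep++)
	{
		usage_count = usage_count >> 1;
		printf("after sweep %d: %d\n", sweep, usage_count);
	}
	return 0;
}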

But, having said that, I still think the best idea is what Andres
proposed, which pretty much matches my own thoughts: the bgwriter
needs to populate the free list, so that buffer allocations don't have
to wait for linear scans of the buffer array.  That's just plain too
slow.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread David E. Wheeler
On Apr 2, 2013, at 8:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Are there any widely known non-built-in cases besides citext?
 
 Well, indxpath.c knows about text LIKE and network subset operators,
 and it would be nice if it knew how to do the same type of optimization
 for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
 doesn't seem implementable in the current infrastructure because ranges
 aren't all built-in types.  I agree that the citext case isn't too
 compelling in isolation, but surely the range case is interesting.

Is this knowledge encapsulated in a to-do?

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Spin Lock sleep resolution

2013-04-02 Thread Jeff Janes
On Monday, April 1, 2013, Tom Lane wrote:

 Jeff Janes jeff.ja...@gmail.com writes:
  The problem is that the state is maintained only to an integer number of
  milliseconds starting at 1, so it can take a number of attempts for the
  random increment to jump from 1 to 2, and then from 2 to 3.

 Hm ... fair point, if you assume that the underlying OS has a sleep
 resolution finer than 1ms.  Otherwise it would not matter.


Let's say you get a long stretch of increments that are all a ratio of less
than 1.5-fold; for simplicity, let's say they are all 1.3-fold.  When you do
intermediate truncations of the state variable, it never progresses at all.

perl -le '$foo=1; foreach (1..10) {$foo*=1.3; print int $foo}'

perl -le '$foo=1; foreach (1..10) {$foo*=1.3; $foo=int $foo; print int $foo}'

Obviously the true stochastic case is not quite that stark.
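
The same effect written out in C rather than perl (this only illustrates the
truncation, it is not the attached patch): with the state kept in whole
milliseconds the delay is stuck at 1ms, while a finer-grained state grows as
intended.

/* Illustration of the truncation effect only, not the patch itself. */
#include <stdio.h>

int
main(void)
{
	int			delay_ms = 1;		/* state kept in whole milliseconds */
	double		delay_us = 1000.0;	/* state kept at a finer resolution */

	for (int attempt = 1; attempt <= 10; attempt++)
	{
		delay_ms = (int) (delay_ms * 1.3);	/* truncates back to 1 forever */
		delay_us = delay_us * 1.3;

		printf("attempt %2d: %d ms vs %.0f us\n", attempt, delay_ms, delay_us);
	}
	return 0;
}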




 No patch seen here ...



Sorry.  I triple checked that the patch was there, but it seems like if you
save a draft with an attachment, when you come back later to finish and
send it, the attachment may not be there anymore.  The Gmail Offline team
still has a ways to go.  Hopefully it is actually there this time.

Cheers,

Jeff


spin_delay_microsecond_v1.patch
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] in-catalog Extension Scripts and Control parameters (templates?)

2013-04-02 Thread Dimitri Fontaine
Heikki Linnakangas hlinnakan...@vmware.com writes:
 I'm quite worried about the security ramifications of this patch. Today, if
 you're not sure if a system has e.g sslinfo installed, you can safely just
 run CREATE EXTENSION sslinfo. With this patch, that's no longer true,
 because foo might not be the extension you're looking for. Mallory

With the attached patch, you can't create a template for an extension
that is already available, so to protect against your scenario you only
have to make sslinfo available.

Please also note that when actually installing the sslinfo extension,
the one from the system will be preferred over the one from the
templates, should you have both available.

Now, I can see why you would still think it's not enough. Barring better
ideas, the current patch restricts the feature to superusers.

 Documentation doesn't build, multiple errors. In addition to the reference
 pages, there should be a section in the main docs about these templates.

I'm now working on that, setting up the documentation tool set.

 postgres=# create template for extension myextension version '1.0' with () 
 as 'foobar';
 CREATE TEMPLATE FOR EXTENSION
 postgres=# create extension myextension;
 ERROR:  syntax error at or near "foobar"
 LINE 1: create extension myextension;
 ^

 Confusing error message.

Introducing a syntax error in hstore--1.1.sql leads to the same message.
Even if we agree that it must be changed, I think that's for another
patch.

~# create extension hstore;
ERROR:  syntax error at or near "foobar" at character 115
STATEMENT:  create extension hstore;
ERROR:  syntax error at or near "foobar"

 postgres=# create template for extension myextension version '1.0' with () 
 as $$create table foobar(i int4) $$;
 CREATE TEMPLATE FOR EXTENSION
 postgres=# create extension myextension;
 CREATE EXTENSION
 postgres=# select * from foobar;
 ERROR:  relation "foobar" does not exist
 LINE 1: select * from foobar;
   ^

 Where did that table go?

The control-schema was not properly initialized when not given by the
user. That's fixed, and I added a regression test.

 Ah, here... Where did that 1.0 schema come from?

Fixed too, was the same bug.

 postgres=> create template for extension myextension version '1.0' with
 (schema public) as $$ create function evilfunc() returns int4 AS
 evilfunc' language internal; $$;
 CREATE TEMPLATE FOR EXTENSION
 postgres=> create extension myextension version '1.0';
 ERROR:  permission denied for language internal
 postgres=> drop template for extension myextension version '1.0';
 ERROR:  extension with OID 16440 does not exist

 Something wrong with catalog caching.

Works for me, I couldn't reproduce the bug here, working from Álvaro's
version 4 of the patch. Maybe he did already fix it, and you tested my
version 3?

 $ make -s  install
 /usr/bin/install: cannot stat `./hstore--1.0.sql': No such file or directory
 make: *** [install] Error 1

 Installing hstore fails.

Fixed in the attached. Seeing that makes me think that you actually used
Álvaro's version 4, though.

 postgres=> create template for extension sslinfo version '1.0' with
 (schema public) as $$ create function evilfunc() returns int4 AS
 evilfunc' language internal; $$;
 ERROR:  extension "sslinfo" is already available

Expected.

 postgres=> create template for extension sslinfo2 version '1.0' with
 (schema public) as $$ create function evilfunc() returns int4 AS
 evilfunc' language internal; $$;
 CREATE TEMPLATE FOR EXTENSION
 postgres=> alter template for extension sslinfo2 rename to sslinfo;
 ALTER TEMPLATE FOR EXTENSION

 If we check for an existing extension at CREATE, should also check for that
 in ALTER ... RENAME TO.

Indeed, good catch. Fixed in the attached version 5 of the patch.

I didn't add a regression test for that case, because we need to know
which extensions are available when we try to obtain this error, and I
don't know how to do that. We could create a template for the extension
foobar, then foobar2, then rename foobar2 to foobar, but that wouldn't
exercise the same code path.

Thanks again for your reviewing,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



templates.v5.patch.gz
Description: Binary data

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-02 Thread Peter Eisentraut
I'm going to ignore most of the discussion that led up to this and give
this patch a fresh look.

+ <screen>
+ SET PERSISTENT max_connections To 10;
+ </screen>

The "To" should probably be capitalized.

I doubt this example actually works because changing max_connections
requires a restart.  Try to find a better example.

It's weird that SET LOCAL and SET SESSION actually *set* the value, and
the second key word determines how long the setting will last.  SET
PERSISTENT doesn't actually set the value.  I predict that this will be
a new favorite help-it-doesn't-work FAQ.

  <varlistentry>
   <term><literal>SCHEMA</literal></term>
   <listitem>
!   <para><literal>SET [ PERSISTENT ] SCHEMA
'<replaceable>value</>'</> is an alias for
 <literal>SET search_path TO <replaceable>value</></>.  Only one
 schema can be specified using this syntax.
</para>

I don't think [ PERSISTENT ] needs to be added in these and similar
snippets.  We don't mention LOCAL etc. here either.

--- 34,41 
  <filename>postgresql.conf</filename>,
<filename>pg_hba.conf</filename>, and
  <filename>pg_ident.conf</filename> are traditionally stored in
  <varname>PGDATA</> (although in <productname>PostgreSQL</productname>
8.0 and
! later, it is possible to place them elsewhere). By default the
directory config is stored
! in <varname>PGDATA</>, however it should be kept along with
<filename>postgresql.conf</filename>.
  </para>

  <table tocentry="1" id="pgdata-contents-table">

This chunk doesn't make any sense to me.  Should is always tricky.
Why should I, and why might I not?

  <row>
+  <entry><filename>config</></entry>
+  <entry>Subdirectory containing automatically generated configuration
files</entry>
+ </row>
+
+ <row>
   <entry><filename>base</></entry>
   <entry>Subdirectory containing per-database subdirectories</entry>
  </row>

Only automatically generated ones?

COPY_STRING_FIELD(name);
COPY_NODE_FIELD(args);
COPY_SCALAR_FIELD(is_local);
+   COPY_SCALAR_FIELD(is_persistent);

return newnode;
  }

I suggest changing is_local into a new trivalued field that stores LOCAL
or SESSION or PERSISTENT.

n->is_local = false;
$$ = (Node *) n;
}
+   | SET PERSISTENT set_persistent
+   {
+   VariableSetStmt *n = $3;
+   n->is_persistent = true;
+   $$ = (Node *) n;
+   }
;

  set_rest:

Why can't you use SET PERSISTENT set_rest?

*** a/src/backend/replication/basebackup.c
--- b/src/backend/replication/basebackup.c
***
*** 755,760  sendDir(char *path, int basepathlen, bool sizeonly)
--- 755,766 
strlen(PG_TEMP_FILE_PREFIX)) == 0)
continue;

+   /* skip auto conf temporary file */
+   if (strncmp(de->d_name,
+   PG_AUTOCONF_FILENAME ".",
+   sizeof(PG_AUTOCONF_FILENAME)) == 0)
+   continue;
+

Maybe pg_basebackup should be taught to ignore certain kinds of
temporary files in general.  The file name shouldn't be hardcoded into
pg_basebackup.  This would effectively make the configuration file
naming scheme part of the replication protocol.  See other thread about
pg_basebackup client/server compatibility.  This needs to be generalized.

(Side thought: Does pg_basebackup copy editor temp and backup files?)

+   ereport(elevel,
+   (errmsg("Configuration parameters changed with SET
PERSISTENT command before start of server "
+   "will not be effective as \"%s\"  file is not
accessible.", PG_AUTOCONF_FILENAME)));

I'm having trouble interpreting this: How can you change something with
SET PERSISTENT before the server starts?

Also: errmsg messages should start with a lowercase letter and should
generally be much shorter.  Please review other cases in your patch as well.

+   appendStringInfoString(buf, "# Do not edit this file manually! "
+  "It is overwritten by the SET PERSISTENT command
\n");

There is some punctuation or something missing at the end.

I suggest you break this into two lines.  It's pretty long.


I think the naming of these files is suboptimal:

+ #define PG_AUTOCONF_DIR   "config"

"config" would seem to include pg_hba.conf and others.  Or
postgresql.conf for that matter.  Maybe I should move postgresql.conf
into config/.  Another great new FAQ.  The normal convention for this is
"postgresql.conf.d".  I don't see any reason not to use that.  One
reason that was brought up is that this doesn't match other things
currently in PGDATA, but (a) actually it does (hint: postgresql.conf),
and (b) neither does "config".

+ #define PG_AUTOCONF_FILENAME  "persistent.auto.conf"

auto sounds like 

Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 But, having said that, I still think the best idea is what Andres
 proposed, which pretty much matches my own thoughts: the bgwriter
 needs to populate the free list, so that buffer allocations don't have
 to wait for linear scans of the buffer array.  That's just plain too
 slow.

I agree in general, though I'm not sure the bgwriter process can
reasonably handle this need along with what it's already supposed to be
doing.  We may need another background process that is just responsible
for keeping the freelist populated.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Andres Freund
On 2013-04-02 11:54:32 -0400, Robert Haas wrote:
 On Tue, Apr 2, 2013 at 11:32 AM, Merlin Moncure mmonc...@gmail.com wrote:
  That's a very fair point, although not being able to evict pinned
  buffers is a highly mitigating aspect.  Also CLOG is a different beast
  entirely -- it's much more dense (2 bits!) vs a tuple so a single page
  can hold a lot of high priority things.  But you could be right anyways.
 
  Given that, I wouldn't feel very comfortable with forced eviction
  without knowing for sure high priority buffers were immune from that.
  Your nailing idea is maybe the ideal solution.   Messing around with
  the usage_count mechanic is tempting (like raising the cap and making
  the sweeper more aggressive as it iterates), but probably really
  difficult to get right, and, hopefully, ultimately moot.
 
 One thought I had for fiddling with usage_count is to make it grow
 additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
 sure the idea is any good, but one problem with the current system is
 that it's pretty trivial for a buffer to accumulate five touches, and
 after that we lose all memory of what the frequency of access is, so
 pages of varying levels of hotness can all have usage
 count 5.  This might allow a little more refinement without letting
 the time to degrade the usage count get out of control.
 
 But, having said that, I still think the best idea is what Andres
 proposed, which pretty much matches my own thoughts: the bgwriter
 needs to populate the free list, so that buffer allocations don't have
 to wait for linear scans of the buffer array.  That's just plain too
 slow.

That way the usagecount should go down far more slowly, which essentially makes
it more granular. And I think fiddling on that level before we have a more
sensible buffer acquisition implementation is pretty premature since that will
change too much.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Andres Freund
On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
 Robert Haas robertmh...@gmail.com writes:
  But, having said that, I still think the best idea is what Andres
  proposed, which pretty much matches my own thoughts: the bgwriter
  needs to populate the free list, so that buffer allocations don't have
  to wait for linear scans of the buffer array.  That's just plain too
  slow.
 
 I agree in general, though I'm not sure the bgwriter process can
 reasonably handle this need along with what it's already supposed to be
 doing.  We may need another background process that is just responsible
 for keeping the freelist populated.

What else is the bgwriter actually doing otherwise? Sure, it doesn't put the
pages on the freelist, but otherwise it's trying to do the above, isn't it?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Tom Lane
Andres Freund and...@2ndquadrant.com writes:
 On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
 I agree in general, though I'm not sure the bgwriter process can
 reasonably handle this need along with what it's already supposed to be
 doing.  We may need another background process that is just responsible
 for keeping the freelist populated.

 What else is the bgwriter actually doing otherwise? Sure, it doesn't put the
 pages on the freelist, but otherwise it's trying to do the above, isn't it?

I think it will be problematic to tie buffer-undirtying to putting both
clean and dirty buffers into the freelist.  It might chance to work all
right to use one scan process for both, but I'm afraid it's more likely
that we'd end up either overserving one goal or underserving the other.

Also note the entire design of BgBufferSync right now is predicated on
the assumption that the rate of motion of the scan strategy point
reflects the rate at which new buffers are needed.  If backends are
supposed to always get new buffers from the freelist, that logic becomes
circular: the bgwriter would be watching itself.  Perhaps we can
refactor the feedback control loop logic so that the buffer scan rate is
driven by changes in the length of the freelist, but I'm not sure
exactly what the consequences would be.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Andres Freund
On 2013-04-02 12:56:56 -0400, Tom Lane wrote:
 Andres Freund and...@2ndquadrant.com writes:
  On 2013-04-02 12:22:03 -0400, Tom Lane wrote:
  I agree in general, though I'm not sure the bgwriter process can
  reasonably handle this need along with what it's already supposed to be
  doing.  We may need another background process that is just responsible
  for keeping the freelist populated.
 
  What else is the bgwriter actually doing otherwise? Sure, it doesn't put the
  pages on the freelist, but otherwise its trying to do the above isn't it?
 
 I think it will be problematic to tie buffer-undirtying to putting both
 clean and dirty buffers into the freelist.  It might chance to work all
 right to use one scan process for both, but I'm afraid it's more likely
 that we'd end up either overserving one goal or underserving the other.

Hm. I had imagined that we would only ever put clean buffers into the
freelist and that we would never write out a buffer that we don't need
for a new page. I don't see much point in randomly writing out buffers
that won't be needed for something else soon. Currently we can't do much
better than basically undirtying random buffers since we don't really know
which page will reach a usagecount of zero since bgwriter doesn't
manipulate usagecounts.

One other scenario I can see is the problem of strategy buffer reuse
of dirtied pages (hint bits, pruning) during seqscans where we would
benefit from pages being written out fast, but I can't imagine that that
could be handled very well by something like the bgwriter?

Am I missing something?

 Also note the entire design of BgBufferSync right now is predicated on
 the assumption that the rate of motion of the scan strategy point
 reflects the rate at which new buffers are needed.  If backends are
 supposed to always get new buffers from the freelist, that logic becomes
 circular: the bgwriter would be watching itself.  Perhaps we can
 refactor the feedback control loop logic so that the buffer scan rate is
 driven by changes in the length of the freelist, but I'm not sure
 exactly what the consequences would be.

Yea, that will definitely need refactoring. What I am imagining is that
the pacing is basically built on top of a few StrategyControl variables
like:
* number of buffers from the freelist
* number of buffers acquired by backend because freelist was empty
* number of buffers written out by backend because freelist was empty

Those should be pretty cheap to maintain and should be enough to
implement sensible pacing for bgwriter.
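
A rough sketch of the shape such bookkeeping could take (field names and the
pacing rule below are invented, purely for illustration; the real
StrategyControl struct looks nothing like this):

/* Invented names and logic; only meant to show the shape of the bookkeeping. */
#include <stdio.h>

typedef struct
{
	unsigned long buffers_from_freelist;	/* allocations served by the freelist */
	unsigned long freelist_empty_allocs;	/* backend had to run the sweep itself */
	unsigned long freelist_empty_writes;	/* ...and had to flush a dirty page too */
} FreelistPacingCounters;

/* Grow the refill target while backends keep missing the freelist. */
static int
next_refill_target(int target,
				   const FreelistPacingCounters *prev,
				   const FreelistPacingCounters *cur)
{
	unsigned long misses = cur->freelist_empty_allocs - prev->freelist_empty_allocs;
	unsigned long served = cur->buffers_from_freelist - prev->buffers_from_freelist;

	if (misses > 0)
		return target * 2;					/* demand outran us: be more aggressive */
	if (served < (unsigned long) target / 2)
		return target > 16 ? target / 2 : 16;	/* demand fell: back off */
	return target;
}

int
main(void)
{
	FreelistPacingCounters prev = {100, 0, 0};
	FreelistPacingCounters cur = {180, 25, 3};

	printf("new refill target: %d\n", next_refill_target(64, &prev, &cur));
	return 0;
}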

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On Apr 2, 2013, at 8:03 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Are there any widely known non-built-in cases besides citext?

 Well, indxpath.c knows about text LIKE and network subset operators,
 and it would be nice if it knew how to do the same type of optimization
 for range inclusion, ie btree_indexed_col <@ range_constant.  The latter
 doesn't seem implementable in the current infrastructure because ranges
 aren't all built-in types.  I agree that the citext case isn't too
 compelling in isolation, but surely the range case is interesting.

On further reflection, I withdraw the claim that range-inclusion
couldn't be implemented in the current design.  Although the various
range types might not be built-in, the anyelement <@ anyrange operator
*is* built-in, so its OID could be added to the switch statements in
indxpath.c.  I don't think it'd be terribly difficult to then add
datatype-agnostic code to pry apart the range value and construct a
derived btree_indexed_col >= lowbound AND btree_indexed_col <= highbound
indexclause.  But this could not be extended to citext or other plugin
extensions, because their operators don't have hard-wired OIDs.

Anyway, even if the specific claim about ranges is bogus, I still think
there are enough data points to justify the idea that a more extensible
mechanism would be worth having.  At the same time, there's a reason
why it's not yet got to the top of anyone's priority list.

 Is this knowledge encapsulated in a to-do?

I added an item to the Indexes section of the TODO page.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Greg Stark
I'm confused by this thread. We *used* to maintain an LRU. The whole
reason for the clock-sweep algorithm is precisely to avoid maintaining
a linked list of least recently used buffers since the head of that
list is a point of contention.


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Andres Freund
On 2013-04-02 18:26:23 +0100, Greg Stark wrote:
 I'm confused by this thread. We *used* to maintain an LRU. The whole
 reason for the clock-sweep algorithm is precisely to avoid maintaining
 a linked list of least recently used buffers since the head of that
 list is a point of contention.

I don't think anybody is proposing to put the LRU back into a linked list,
given the frequency of usagecount manipulations that would probably end pretty
badly. What I think Robert, Tom and I are talking are talking about is putting
*some* buffers with usagecount = 0 into a linked list so that when a backend
requires a new page it can take one buffer from the freelist instead of

a) possibly touching quite a few pages (I have seen 4 times *every* existing
header) to find one with usagecount = 0
b) having to write the page out itself

If everything is going well that would mean only the bgwriter (or a
bgfreelist or whatever) performs the clock sweep. Others take *new* pages from
the freelist instead of performing part of the sweep themselves.

Makes more sense?

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] citext like searches using index

2013-04-02 Thread David E. Wheeler
On Apr 2, 2013, at 10:16 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Is this knowledge encapsulated in a to-do?
 
 I added an item to the Indexes section of the TODO page.

Great, thanks.

David



-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Atri Sharma
On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:

 One thought I had for fiddling with usage_count is to make it grow
 additively (x = x + 1) and decay exponentially (x = x  1).  I'm not
 sure the idea is any good, but one problem with the current system is
 that it's pretty trivial for a buffer to accumulate five touches, and
 after that we lose all memory of what the frequency of access is, so a
 pages of varying different levels of hotness can all have usage
 count 5.  This might allow a little more refinement without letting
 the time to degrade the usage count get out of control.

This is just off the top of my head, but one possible solution could
be to quantize the levels of hotness. Specifically, we could
categorize buffers based on hotness. All buffers start in level 1 and
usage_count 0. Whichever buffer reaches usage_count of 5, and next
clock sweep which wants to increment its usage_count (hence taking it
above 5) sees that, it promotes the buffer to the next level, and
resets its usage_count to 0. Same logic applies for each level. When
we decrement usage_count and see that it is zero (for some buffer), if
it is in a level > 1, we demote the buffer to the next lower level. If
the buffer is in level 1, it is a potential candidate for replacement.

This will allow us to have a loose idea about the hotness of a page,
without actually storing the usage_count for a buffer. We can still
update usage_count without locking, as buffers in high contention
which miss an update in their usage_count won't be affected by that
missed update, in accordance with all the discussion upthread.
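
For concreteness, a small standalone sketch of that scheme (the constants and
names are invented; this only shows the promote/demote mechanics, it is not a
patch):

/* Sketch of the quantised-hotness idea; constants and names are invented. */
#include <stdio.h>
#include <stdbool.h>

#define MAX_USAGE_COUNT 5
#define MAX_LEVEL       4

typedef struct
{
	int		level;			/* 1 .. MAX_LEVEL, coarse hotness */
	int		usage_count;	/* 0 .. MAX_USAGE_COUNT within the level */
} HotnessState;

/* A touch that would push usage_count past the cap promotes the buffer. */
static void
touch(HotnessState *b)
{
	if (b->usage_count < MAX_USAGE_COUNT)
		b->usage_count++;
	else if (b->level < MAX_LEVEL)
	{
		b->level++;
		b->usage_count = 0;
	}
}

/* Clock sweep: decrement; at zero, demote a level or report evictable. */
static bool
sweep(HotnessState *b)
{
	if (b->usage_count > 0)
		b->usage_count--;
	else if (b->level > 1)
		b->level--;
	else
		return true;		/* level 1, count 0: candidate for replacement */
	return false;
}

int
main(void)
{
	HotnessState buf = {1, 0};

	for (int i = 0; i < 13; i++)
		touch(&buf);
	printf("after 13 touches: level %d, count %d\n", buf.level, buf.usage_count);

	while (!sweep(&buf))
		;
	printf("buffer eventually became evictable\n");
	return 0;
}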

Thoughts/Comments?

Regards,

Atri


--
Regards,

Atri
l'apprenant


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Merlin Moncure
On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma atri.j...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:

 One thought I had for fiddling with usage_count is to make it grow
 additively (x = x + 1) and decay exponentially (x = x  1).  I'm not
 sure the idea is any good, but one problem with the current system is
 that it's pretty trivial for a buffer to accumulate five touches, and
 after that we lose all memory of what the frequency of access is, so a
 pages of varying different levels of hotness can all have usage
 count 5.  This might allow a little more refinement without letting
 the time to degrade the usage count get out of control.

 This is just off the top of my head, but one possible solution could
 be to quantize the levels of hotness. Specifically, we could
 categorize buffers based on hotness. All buffers start in level 1 and
 usage_count 0. Whichever buffer reaches usage_count of 5, and next
 clock sweep which wants to increment its usage_count (hence taking it
 above 5) sees that, it promotes the buffer to the next level, and
 resets its usage_count to 0. Same logic applies for each level. When
 we decrement usage_count and see that it is zero (for some buffer), if
 it is in a level > 1, we demote the buffer to the next lower level. If
 the buffer is in level 1, it is a potential candidate for replacement.

 This will allow us to have a loose idea about the hotness of a page,
 without actually storing the usage_count for a buffer. We can still
 update usage_count without locking, as buffers in high contention
 which miss an update in their usage_count won't be affected by that
 missed update, in accordance with all the discussion upthread.

how is that different from usage_count itself? usage_count *is* a
measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the
already considerable damage that occurs in the situation Andres is
talking about (where everything is marked 'hot' so you have to sweep a
lot more).

also, any added complexity in terms of manipulating usage_count is a
move away from the lockless maintenance I'm proposing.  maybe my idea
is a non-starter on that basis alone, but the mechanic should be kept
as simple as possible.  the idea to move it to the bgwriter is to
pre-emptively do the work that backends are now doing: try and keep
ahead of the allocations being done so that buffer requests are
satisfied quickly.

merlin


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-02 Thread Kevin Grittner
Tom Lane t...@sss.pgh.pa.us wrote:
 Kevin Grittner kgri...@ymail.com writes:
 Tom Lane t...@sss.pgh.pa.us wrote:
 So maybe I'm nuts to care about the performance of an
 assert-enabled backend, but I don't really find a 4X runtime
 degradation acceptable, even for development work.  Does anyone
 want to fess up to having caused this, or do I need to start
 tracking down what changed?

 I checked master HEAD for a dump of regression and got about 4
 seconds.  I checked right after my initial push of matview code
 and got 2.5 seconds.  I checked just before that and got 1
 second.  There was some additional pg_dump work for matviews
 after the initial push which may or may not account for the rest
 of the time.

 I poked at this a bit, and eventually found that the performance
 differential goes away if I dike out the
 pg_relation_is_scannable() call in getTables()'s table-collection
 query.  What appears to be happening is that those calls cause a
 great many more relcache entries to be made than used to happen,
 plus many entries are made earlier in the run than they used to
 be.  Many of those entries have subsidiary memory contexts, which
 results in an O(N^2) growth in the time spent in AllocSetCheck,
 since postgres.c does MemoryContextCheck(TopMemoryContext) once
 per received command, and pg_dump will presumably issue O(N)
 commands in an N-table database.

 So one question that brings up is whether we need to dial back
 the amount of work done in memory context checking, but I'm loath
 to do so in development builds.  That code has fingered an awful
 lot of bugs.  OTOH, if the number of tables in the regression DB
 continues to grow, we may have little choice.

 Anyway, the immediate takeaway is that this represents a horribly
 expensive way for pg_dump to find out which matviews aren't
 scannable.  The cheap way for it to find out would be if we had a
 boolean flag for that in pg_class.  Would you review the bidding
 as to why things were done the way they are?  Because in general,
 having to ask the kernel something is going to suck in any case,
 so basing it on the size of the disk file doesn't seem to me to
 be basically a good thing.

The early versions of the patch had a boolean in pg_class to track
this, but I got complaints from Robert and Noah (and possibly
others?) that this got too ugly in combination with the support for
unlogged matviews, and they suggested the current approach.  For an
unlogged matview we need to replace the heap (main fork) with the
init fork before the database is up and running, so there would
need to be some way for that to result in forcing the flag in
pg_class.  I was attempting to do that when a matview which was
flagged as scannable in pg_class was opened, but I gotta agree that
it wasn't pretty.  Noah suggested a function based on testing the
heap to see if it looked like the init fork, and the current state
of affairs is my attempt to make it work that way.

We could definitely minimize the overhead by only testing this for
matviews.  It seemed potentially useful for the function to have
some purpose for other types of relations, so I was trying to avoid
that.  Maybe that was a bad idea.

 We could alleviate the pain by changing pg_dump's query to
 something like

 (case when c.relkind = 'm'
  then pg_relation_is_scannable(c.oid)
  else false end)

Yeah, that's the sort of thing I was thinking of.  If we're going
to go that way, we may want to touch one or two other spots.

 but TBH this feels like bandaging a bad design.

So far nobody has been able to suggest a better way to support both
unlogged matviews and some way to prevent a matview from being used
if it does not contain materialized data for its query from *some*
point in time.  Suggestions welcome.

 Another reason why I don't like this code is that
 pg_relation_is_scannable is broken by design:

 relid = PG_GETARG_OID(0);
 relation = RelationIdGetRelation(relid);
 result = relation->rd_isscannable;
 RelationClose(relation);

 You can't do that: if the relcache entry doesn't already exist,
 this will try to construct one while not holding any lock on the
 relation, which is subject to all sorts of race conditions.

Hmm.  I think I had that covered earlier but messed up in
rearranging to respond to review comments.  Will review both new
calling locations.

 In general, direct calls on RelationIdGetRelation are probably
 broken.

If valid contexts for use of the function are that limited, it
might merit a comment where the function is defined.  I'm not sure
what a good alternative to this would be.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Page replacement algorithm in buffer cache

2013-04-02 Thread Atri Sharma


Sent from my iPad

On 02-Apr-2013, at 23:41, Merlin Moncure mmonc...@gmail.com wrote:

 On Tue, Apr 2, 2013 at 12:50 PM, Atri Sharma atri.j...@gmail.com wrote:
 On Tue, Apr 2, 2013 at 9:24 PM, Robert Haas robertmh...@gmail.com wrote:
 
 One thought I had for fiddling with usage_count is to make it grow
 additively (x = x + 1) and decay exponentially (x = x >> 1).  I'm not
 sure the idea is any good, but one problem with the current system is
 that it's pretty trivial for a buffer to accumulate five touches, and
 after that we lose all memory of what the frequency of access is, so
 pages with widely varying levels of hotness can all have usage
 count 5.  This might allow a little more refinement without letting
 the time to degrade the usage count get out of control.
 
 This is just off the top of my head, but one possible solution could
 be to quantize the levels of hotness. Specifically, we could
 categorize buffers based on hotness. All buffers start in level 1 and
 usage_count 0. When a buffer reaches usage_count 5 and the next
 clock sweep wants to increment its usage_count (hence taking it
 above 5), it instead promotes the buffer to the next level and
 resets its usage_count to 0. The same logic applies at each level. When
 we decrement usage_count and see that it is zero (for some buffer), if
 it is in a level > 1, we demote the buffer to the next lower level. If
 the buffer is in level 1, it is a potential candidate for replacement.
 
 This will allow us to have a loose idea about the hotness of a page,
 without actually storing the usage_count for a buffer. We can still
 update usage_count without locking, as buffers in high contention
 which miss an update in their usage_count won't be affected by that
 missed update, in accordance with all the discussion upthread.
 
 how is that different from usage_count itself? usage_count *is* a
 measure of hotness.  the arbitrary cap at 5 is paranoia to prevent the
 already considerable damage that occurs in the situation Andres is
 talking about (where everything is marked 'hot' so you have to sweep a
 lot more).
 
 also, any added complexity in terms of manipulating usage_count is a
 move away from the lockless maintenance I'm proposing.  maybe my idea
 is a non-starter on that basis alone, but the mechanic should be kept
 as simple as possible.  the idea to move it to the bgwriter is to
 pre-emptively do the work that backends are now doing: try and keep
 ahead of the allocations being done so that buffer requests are
 satisfied quickly.
 

I agree, we want to reduce the complexity of usage_count. I was only musing on
the point Robert raised, about whether we want to continue using usage_count and
refine it for more accurate tracking of the hotness of a buffer.
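
Just to make the idea concrete, here is a minimal standalone sketch (plain
C, not PostgreSQL code; the cap, the shift amount and all of the names are
made up for illustration) of the additive-increase / exponential-decay
policy Robert describes.  The quantized-level variant would layer a small
level field on top of the same counter:

/*
 * Minimal sketch of additive increase / exponential decay for usage_count.
 * Not PostgreSQL code: USAGE_CAP, the shift amount and the function names
 * are invented for illustration only.
 */
#include <stdio.h>

#define USAGE_CAP	32		/* assumed cap; the current code caps at 5 */

typedef struct SketchBuffer
{
	int			usage_count;
} SketchBuffer;

/* a backend touches the buffer: grow additively */
static void
buffer_touched(SketchBuffer *buf)
{
	if (buf->usage_count < USAGE_CAP)
		buf->usage_count += 1;
}

/* the clock sweep visits the buffer: decay exponentially */
static int
clock_sweep_visit(SketchBuffer *buf)
{
	if (buf->usage_count > 0)
	{
		buf->usage_count >>= 1;
		return 0;				/* still warm, keep it */
	}
	return 1;					/* usage_count == 0: eviction candidate */
}

int
main(void)
{
	SketchBuffer b = {0};
	int			i;

	for (i = 0; i < 10; i++)
		buffer_touched(&b);
	printf("after 10 touches: %d\n", b.usage_count);

	/* 10 -> 5 -> 2 -> 1 -> 0, then the buffer becomes evictable */
	while (!clock_sweep_visit(&b))
		printf("after one sweep pass: %d\n", b.usage_count);

	return 0;
}

The point of decaying by halving is that a page touched hundreds of times
still needs several sweep passes to reach zero, while one touched only a
handful of times falls back quickly, so the counter keeps more frequency
information than a hard cap of 5 does.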

Regards,

Atri



Re: [HACKERS] Drastic performance loss in assert-enabled build in HEAD

2013-04-02 Thread Kevin Grittner
Kevin Grittner kgri...@ymail.com wrote:
 Tom Lane t...@sss.pgh.pa.us wrote:

 Another reason why I don't like this code is that
 pg_relation_is_scannable is broken by design:

  relid = PG_GETARG_OID(0);
  relation = RelationIdGetRelation(relid);
  result = relation->rd_isscannable;
  RelationClose(relation);

 You can't do that: if the relcache entry doesn't already exist,
 this will try to construct one while not holding any lock on the
 relation, which is subject to all sorts of race conditions.

 Hmm.  I think I had that covered earlier but messed up in
 rearranging to respond to review comments.  Will review both new
 calling locations.

For the SQL-level function, does this look OK?:

diff --git a/src/backend/utils/adt/dbsize.c
b/src/backend/utils/adt/dbsize.c
index d589d26..94e55f0 100644
--- a/src/backend/utils/adt/dbsize.c
+++ b/src/backend/utils/adt/dbsize.c
@@ -850,9 +850,13 @@ pg_relation_is_scannable(PG_FUNCTION_ARGS)
    bool    result;
 
    relid = PG_GETARG_OID(0);
-   relation = RelationIdGetRelation(relid);
+   relation = try_relation_open(relid, AccessShareLock);
+
+   if (relation == NULL)
+   PG_RETURN_BOOL(false);
+
    result = relation->rd_isscannable;
-   RelationClose(relation);
+   relation_close(relation, AccessShareLock);
 
    PG_RETURN_BOOL(result);
 }

I think the call in ExecCheckRelationsScannable() is safe because
it comes after the tables are all already locked.  I put it there
so that the appropriate lock strength should be used based on
whether it was locked by ExecInitNode() or something before it.  Am
I missing something?  Can I not count on the lock being held at
that point?  Would the right level of API here be relation_open()
with NoLock rather than RelationIdGetRelation()?  Or is there some
other call which is more appropriate there?

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] Spin Lock sleep resolution

2013-04-02 Thread David Gould
On Tue, 2 Apr 2013 09:01:36 -0700
Jeff Janes jeff.ja...@gmail.com wrote:

 Sorry.  I triple checked that the patch was there, but it seems like if you
 save a draft with an attachment, when you come back later to finish and
 send it, the attachment may not be there anymore.  The Gmail Offline team
 still has a ways to go.  Hopefully it is actually there this time.

I'll give the patch a try, I have a workload that is impacted by spinlocks
fairly heavily sometimes and this might help or at least give me more
information. Thanks!

-dg

-- 
David Gould   da...@sonic.net
If simplicity worked, the world would be overrun with insects.




Re: [HACKERS] WIP: index support for regexp search

2013-04-02 Thread Erikjan Rijkers
On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:

[trgm-regexp-0.14.patch.gz]

Hi Alexander,

Something went wrong in this version of the patch: many (most) queries that
were earlier spectacularly fast have become slow, often slower than a seqscan
or only marginally faster.  See the attached numbers; it compares head
(seqscan) with the trgm-regexp patch, versions 13 and 14.

I did not even complete the test-run because version 14 is so clearly
inferior to 13 (and earlier, as far as I can remember).

(let me know if you want the whole result, I can run it overnight)


Thanks,


Erik Rijkers


re-head-13-14-20130402-2219.txt.bz2
Description: BZip2 compressed data



Re: [HACKERS] Proposal for Allow postgresql.conf values to be changed via SQL [review]

2013-04-02 Thread Greg Smith

On 4/1/13 5:44 AM, Amit Kapila wrote:


I think in that case we can have 3 separate patches
1. Memory growth defect fix
2. Default postgresql.conf to include config directory and SET Persistent
into single file implementation
3. Rearrangement of GUC validation into validate_conf_option function.

As review has already happened for point 2, and point 1 is an existing
code defect fix, in my opinion
patches 1 and 2 should be considered for 9.3.


They have been considered for 9.3, I just doubt they could get committed 
right now.  In order for this to go in as part of the very last 9.3 
feature changes (which is where we're at in the development cycle), 
you'd need to have a committer volunteer to take on the job of doing a 
second level of review on it.  And then that would have to happen 
without any other issues popping up.  That usually is not how it works. 
 Normally the first round of committer review finds another round of 
issues, and there's at least one more update before commit.  (Note that 
this is exactly what just happened today, with the review from Peter 
Eisentraut)


I'm not saying it's impossible for this feature to go into 9.3, but I'm 
not seeing any committers volunteer to take on the job either.  I do 
want to see this feature go in--I'll update it for 9.4 even if you 
don't--but we're already into April.  There isn't much time left for 
9.3.  And the security release this week has gobbled up a good chunk of 
committer and packager time unexpectedly, which is just bad luck for 
your submission.


From a process perspective, features that enter the last CF of a 
release that are very close to ready from the start have good odds of 
being committed.  You've done an excellent job of updating this in 
response to feedback, but it has involved a long list of changes so far. 
 It's fair to say this was still a rough feature at the start of CF 
2013-01, and now it's good but can be usefully polished a bit more.


For something of this size, going from rough feature to commit quality 
normally takes more than one CF.  I don't have any obvious errors to 
point out right now.  But I think there's still some room to improve on 
this before commit.  Andres mentioned on another thread that he thought 
merging some of your ideas with the version Zoltan did was useful to 
look at, and I was thinking of something similar.  This is close to 
being ready, and I hope you won't get discouraged just because it's 
probably going to slip to 9.4.


--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.com




Re: [HACKERS] Getting to 9.3 beta

2013-04-02 Thread Simon Riggs
On 29 March 2013 15:05, Tom Lane t...@sss.pgh.pa.us wrote:
 Bruce Momjian br...@momjian.us writes:
 Our final 9.3 commit-fest has exceeded the two-month mark, so it is
 time to start targeting a date to close it and get to 9.3 beta.  I see
 25 items still needing attention before we can close it:

   https://commitfest.postgresql.org/action/commitfest_view?id=17

 What is a reasonable timeframe to target for completion of these items?

 TBH, once Andrew commits the JSON patch, I wouldn't have a problem with
 moving all the rest to Returned With Feedback or the next CF.  None of
 the others seem to me to be close-to-committable (with the possible
 exception of the Max LSN patch, which I've not looked at), and April is
 not the time to be doing development.

Agreed

I completely agree with the need to end this CF here/now/soon. As
someone who has normally tried to stretch the limit on what is
possible, I do recognise the need for this to come to an end.
(Certainly my own available time is running out.)

Thanks very much to everybody that tried to get so much into the release.

 Next week is going to be tied up with the back-branch releases, but
 maybe we could target beta for the week after?  The main gating factor
 at this point really would be how quickly we could write some draft
 release notes, so people know what to test.

I think we probably need a little more time for open items discussion
and lists (not more feature commits, just docs and
I-said-I'd-change-thats). In the past that has taken a couple of weeks
to work through, so I'd guess there's a few things to sort out there.

So I suggest about 23 April for Beta1.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services




Re: [HACKERS] WIP: index support for regexp search

2013-04-02 Thread Alexander Korotkov
On Wed, Apr 3, 2013 at 12:36 AM, Erikjan Rijkers e...@xs4all.nl wrote:

 On Mon, April 1, 2013 23:15, Alexander Korotkov wrote:

 [trgm-regexp-0.14.patch.gz]

 Hi Alexander,


Hi Erik!


 Something went wrong in this version of the patch: many (most) queries
 that were earlier
 spectacularly fast have become slow, often slower than a seqscan or only
 marginally faster. See
 the attached numbers; it compares head(seqscan) with trgm-regex patch
 versions 13 and 14.

 I did not even complete the test-run because version 14 is so clearly
 inferior to 13 (and earlier,
 as far as I can remember).

 (let me know if you want the whole result, I can run it overnight)


Yes, there was a bug in the new color map scan implementation. It is fixed in
the attached version. Could you re-run the tests, please?
The attached version of the patch also addresses the problem that, with the
API we are now using, it is not correct to use special values of the color
number. It introduces an additional structure, ExtendedColor, which contains
an ExtendedColorClass field for handling the additional virtual colors
introduced by the analysis.

--
With best regards,
Alexander Korotkov.


trgm-regexp-0.15.patch.gz
Description: GNU Zip compressed data



Re: [HACKERS] spoonbill vs. -HEAD

2013-04-02 Thread Tom Lane
Stefan Kaltenbrunner ste...@kaltenbrunner.cc writes:
 On 03/26/2013 11:30 PM, Tom Lane wrote:
 A different line of thought is that the cancel was received by the
 backend but didn't succeed in cancelling the query for some reason.

 I added the pgcancel failed codepath you suggested but it does not
 seem to get triggered at all so the above might actually be what is
 happening...

Stefan was kind enough to grant me access to spoonbill, and after
some experimentation I found out the problem.  It seems that OpenBSD
blocks additional deliveries of a signal while the signal handler is
in progress, and that this is implemented by just calling sigprocmask()
before and after calling the handler.  Therefore, if the handler doesn't
return normally --- like, say, it longjmps --- the restoration of the
previous mask never happens.  So we're left with the signal still
blocked, meaning second and subsequent attempts to interrupt the backend
don't work.

This isn't revealed by the regular regression tests because they don't
exercise PQcancel, but several recently-added isolation tests do attempt
to PQcancel the same backend more than once.

It's a bit surprising that it's taken us this long to recognize the
problem.  Typical use of PQcancel doesn't necessarily cause a failure:
StatementCancelHandler() won't exit through longjmp unless
ImmediateInterruptOK is true, which is only the case while waiting for a
heavyweight lock.  But still, you'd think somebody would've run into
the case in normal usage.

I think the simplest fix is to insert PG_SETMASK(&UnBlockSig) into
StatementCancelHandler() and any other handlers that might exit via
longjmp.  I'm a bit inclined to only do this on platforms where a
problem is demonstrable, which so far is only OpenBSD.  (You'd
think that all BSDen would have the same issue, but the buildfarm
shows otherwise.)
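
For illustration only, here is a self-contained sketch (ordinary POSIX C,
not the actual backend code) of that fix: the handler unblocks its own
signal, in the spirit of PG_SETMASK(&UnBlockSig), before escaping with
siglongjmp, so a platform that blocks the signal for the duration of the
handler and fails to restore the mask afterwards cannot leave it blocked:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

static sigjmp_buf main_loop;

static void
cancel_handler(int signo)
{
	sigset_t	unblock;

	/* re-enable our own signal before escaping the handler */
	sigemptyset(&unblock);
	sigaddset(&unblock, signo);
	sigprocmask(SIG_UNBLOCK, &unblock, NULL);

	/* escape without returning, as StatementCancelHandler can */
	siglongjmp(main_loop, 1);
}

int
main(void)
{
	struct sigaction sa;

	sa.sa_handler = cancel_handler;
	sa.sa_flags = 0;			/* SIGINT is blocked while the handler runs */
	sigemptyset(&sa.sa_mask);
	sigaction(SIGINT, &sa, NULL);

	if (sigsetjmp(main_loop, 1) != 0)
		printf("cancel handled, back in the main loop\n");

	for (;;)
		pause();				/* wait for the next cancel request */
}

On a platform where siglongjmp already restores the saved mask, the
explicit unblock is redundant but harmless.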

BTW, this does not seem to explain the symptoms shown at
http://www.postgresql.org/message-id/4fe4d89a.8020...@kaltenbrunner.cc
because what we were seeing there was that *all* signals appeared to be
blocked.  However, after this round of debugging I no longer have a lot
of faith in OpenBSD's ps, because it was lying to me about whether the
process had signals blocked or not (or at least, it couldn't see the
effects of the interrupt signal disable, although when I added debugging
code to print the active signal mask according to sigprocmask() I got
told the truth).  So I'm not sure how much trust to put in those older
ps results.  It's possible that the previous failures were a
manifestation of something related to this bug.

regards, tom lane




[HACKERS] CREATE EXTENSION BLOCKS

2013-04-02 Thread David E. Wheeler
Hackers,

I am working on scripts to copy data from Oracle via oracle_fdw. They each do 
something like this:

CREATE SCHEMA migrate_stuff;
SET search_path TO migrate_stuff,public;
CREATE EXTENSION oracle_fdw SCHEMA migrate_stuff;

CREATE SERVER oracle_stuff FOREIGN DATA WRAPPER oracle_fdw
OPTIONS (dbserver :'oracle_uri');

CREATE USER MAPPING FOR postgres SERVER oracle_stuff
OPTIONS (user :'oracle_user', password :'oracle_pass');

CREATE FOREIGN TABLE migrate_stuff (
   stuff_id integer,
   name text
) SERVER oracle_stuff OPTIONS (table 'STUFF');

INSERT INTO my.stuff SELECT * FROM migrate_stuff;

DROP SCHEMA migrate_stuff CASCADE;
COMMIT;

Then I run them in parallel:

for file in migrate*.sql; do
psql -d foo -f $file &
done
wait

This works fine except for one thing: the first CREATE EXTENSION statement 
blocks all the others. Even when I create the extension in separate schemas in 
each script! I have to remove the CREATE EXTENSION statement, create it in 
public before any of the scripts run, then drop it when they're done. I'm okay 
with this workaround, but wasn't sure if the blocking of CREATE EXTENSION was 
intentional or a known issue (I did not see it documented in 
http://www.postgresql.org/docs/current/static/sql-createextension.html).

Thanks,

David





Re: [HACKERS] spoonbill vs. -HEAD

2013-04-02 Thread Tom Lane
I wrote:
 I think the simplest fix is to insert PG_SETMASK(&UnBlockSig) into
 StatementCancelHandler() and any other handlers that might exit via
 longjmp.  I'm a bit inclined to only do this on platforms where a
 problem is demonstrable, which so far is only OpenBSD.  (You'd
 think that all BSDen would have the same issue, but the buildfarm
 shows otherwise.)

BTW, on further thought it seems like maybe this is an OpenBSD bug,
at least in part: what is evidently happening is that the temporary
blockage of SIGINT during the handler persists even after we've
longjmp'd back to the main loop.  But we're using sigsetjmp(..., 1)
to establish that longjmp handler --- so why isn't the original signal
mask reinstalled when we return to the main loop?

If (your version of) OpenBSD is getting this wrong, it'd explain why
we've not seen similar behavior elsewhere.

This theory doesn't really exonerate our code completely, because we use
sigsetjmp(..., 0) in PG_TRY, which means that in a code path where we
catch a longjmp and don't PG_RE_THROW it, we could be left with the
signal disabled.  I don't believe that is happening in the
isolation-test cases, though.
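
To make the savemask distinction concrete, here is a small self-contained
sketch (plain POSIX C, nothing PostgreSQL-specific): sigsetjmp(env, 1)
saves the signal mask in force at sigsetjmp() time and restores it when
siglongjmp() returns there, while sigsetjmp(env, 0) leaves whatever mask
was active when siglongjmp() was called:

#include <setjmp.h>
#include <signal.h>
#include <stdio.h>

static sigjmp_buf env;

static int
sigint_blocked(void)
{
	sigset_t	cur;

	sigprocmask(SIG_BLOCK, NULL, &cur);		/* just query the current mask */
	return sigismember(&cur, SIGINT);
}

int
main(void)
{
	sigset_t	block;

	if (sigsetjmp(env, 1) != 0)				/* savemask = 1 */
	{
		printf("after siglongjmp: SIGINT blocked? %d\n", sigint_blocked());
		return 0;
	}

	/* simulate the kernel blocking SIGINT while its handler runs */
	sigemptyset(&block);
	sigaddset(&block, SIGINT);
	sigprocmask(SIG_BLOCK, &block, NULL);

	printf("before siglongjmp: SIGINT blocked? %d\n", sigint_blocked());
	siglongjmp(env, 1);
}

With savemask = 1 the second line should print 0; with sigsetjmp(env, 0),
as in PG_TRY, it would print 1, i.e. the signal stays blocked.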

regards, tom lane




[HACKERS] psql crash fix

2013-04-02 Thread Bruce Momjian
I found that psql will crash if given a PSQLRC value containing a tilde:

$ PSQLRC=~/x psql test
*** glibc detected *** psql: free(): invalid pointer: 
0x7fffb7c933ec ***

This is on Debian Squeeze 6.0.7.  The fix is to pstrdup() the value
returned by getenv(), so it can be free()'ed later --- you can't free
getenv()-returned values:

   As typically implemented, getenv() returns a pointer to a string
   within the environment list.  The caller must take care not to
   modify this string, since that would change the environment of
   the process.

This bug exists in 9.2 and git head.  I also removed the return value
from expand_tilde() as no caller was using it.
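
As a minimal standalone illustration (not the psql code itself; strdup()
stands in for pstrdup() here, and the printf is just a placeholder), the
difference between the crashing and the fixed pattern is simply who owns
the string that eventually gets freed:

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
	char	   *envrc = getenv("PSQLRC");

	if (envrc != NULL && strlen(envrc) > 0)
	{
		char	   *envrc_copy = strdup(envrc);

		if (envrc_copy == NULL)
			return 1;

		/* expand_tilde() may replace this string with an allocated one */
		printf("would read startup file from: %s\n", envrc_copy);

		free(envrc_copy);		/* safe: we own the copy */
		/* free(envrc) here is what glibc was (rightly) complaining about */
	}
	return 0;
}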

-- 
  Bruce Momjian  br...@momjian.us        http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +
diff --git a/src/bin/psql/common.c b/src/bin/psql/common.c
new file mode 100644
index be5e34a..3dea92c
*** a/src/bin/psql/common.c
--- b/src/bin/psql/common.c
*** session_username(void)
*** 1645,1655 
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! char *
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return NULL;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
--- 1645,1655 
   * substitute '~' with HOME or '~username' with username's home dir
   *
   */
! void
  expand_tilde(char **filename)
  {
  	if (!filename || !(*filename))
! 		return;
  
  	/*
  	 * WIN32 doesn't use tilde expansion for file names. Also, it uses tilde
*** expand_tilde(char **filename)
*** 1697,1701 
  	}
  #endif
  
! 	return *filename;
  }
--- 1697,1701 
  	}
  #endif
  
! 	return;
  }
diff --git a/src/bin/psql/common.h b/src/bin/psql/common.h
new file mode 100644
index d8bb093..db645da
*** a/src/bin/psql/common.h
--- b/src/bin/psql/common.h
*** extern bool is_superuser(void);
*** 44,49 
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern char *expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
--- 44,49 
  extern bool standard_strings(void);
  extern const char *session_username(void);
  
! extern void expand_tilde(char **filename);
  
  #endif   /* COMMON_H */
diff --git a/src/bin/psql/startup.c b/src/bin/psql/startup.c
new file mode 100644
index 5cb6b5f..5d7fe6e
*** a/src/bin/psql/startup.c
--- b/src/bin/psql/startup.c
*** process_psqlrc(char *argv0)
*** 610,616 
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc;
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
--- 610,616 
  	char		rc_file[MAXPGPATH];
  	char		my_exec_path[MAXPGPATH];
  	char		etc_path[MAXPGPATH];
! 	char	   *envrc = getenv("PSQLRC");
  
  	find_my_exec(argv0, my_exec_path);
  	get_etc_path(my_exec_path, etc_path);
*** process_psqlrc(char *argv0)
*** 618,629 
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
- 	envrc = getenv("PSQLRC");
- 
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		expand_tilde(&envrc);
! 		process_psqlrc_file(envrc);
  	}
  	else if (get_home_path(home))
  	{
--- 618,630 
  	snprintf(rc_file, MAXPGPATH, "%s/%s", etc_path, SYSPSQLRC);
  	process_psqlrc_file(rc_file);
  
  	if (envrc != NULL && strlen(envrc) > 0)
  	{
! 		/* might need to free() this */
! 		char *envrc_alloc = pstrdup(envrc);
! 
! 		expand_tilde(&envrc_alloc);
! 		process_psqlrc_file(envrc_alloc);
  	}
  	else if (get_home_path(home))
  	{



[HACKERS] commit dfda6ebaec67 versus wal_keep_segments

2013-04-02 Thread Jeff Janes
This commit introduced a problem with wal_keep_segments:

commit dfda6ebaec6763090fb78b458a979b558c50b39b
Author: Heikki Linnakangas heikki.linnakan...@iki.fi
Date:   Sun Jun 24 18:06:38 2012 +0300

Don't waste the last segment of each 4GB logical log file.


in a side window do: watch ls -lrt /tmp/data/pg_xlog

dfda6ebaec/bin/initdb -D /tmp/data
dfda6ebaec/bin/pg_ctl -D /tmp/data -l logfile restart -o "--fsync=off --wal_keep_segments=20"
createdb
pgbench -i -s10
pgbench -T3600

xlogs accumulate until there are about 26 of them.  Then all of a sudden
they drop down to 11 of them.  Then it builds back up to around 26, and
seems to stay there permanently.

At some point when it is over-pruning and recycling, it recycles the log
files that are still needed for recovery, and if the database crashes at
that point it will not recover because it can't find either the primary or
secondary checkpoint records.

Cheers,

Jeff