Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Heikki Linnakangas

On 25.10.2011 19:37, Jeff Davis wrote:

On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:

Hmm, I don't think that's safe. After Oid wraparound, a range type oid
might get reused for some other range type, and the cache would return
stale values. Extremely unlikely to happen by accident, but could be
exploited by an attacker.


Any ideas on how to remedy that? I don't have another plan for making it
perform well. Plugging it into the cache invalidation mechanism seems
like overkill, but I suppose that would solve the problem.


I think we should look at the array functions for precedent. array_in et
al cache the information in fn_extra, so that when it's called
repeatedly in one statement for the same type, the information is only
looked up once. That's good enough; it covers repeated execution in a
single query, as well as COPY and comparison calls from index searches,
for example.
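For concreteness, here is a minimal sketch of that pattern as it might look
for the range-type case. The RangeIOData struct and the lookup step are
hypothetical; fn_extra and fn_mcxt are the real FmgrInfo fields the array
code uses:

/*
 * Cache per-range-type information in fn_extra, array_in-style.
 * Hypothetical struct; the lookups that fill it are elided.
 */
typedef struct RangeIOData
{
	Oid			rngtypid;		/* range type this entry describes */
	/* ... cached subtype I/O functions, comparison proc, etc. ... */
} RangeIOData;

static RangeIOData *
get_range_io_data(FunctionCallInfo fcinfo, Oid rngtypid)
{
	RangeIOData *cache = (RangeIOData *) fcinfo->flinfo->fn_extra;

	if (cache == NULL || cache->rngtypid != rngtypid)
	{
		/* first call, or a different range type: (re)do the lookups */
		cache = (RangeIOData *)
			MemoryContextAlloc(fcinfo->flinfo->fn_mcxt, sizeof(RangeIOData));
		cache->rngtypid = rngtypid;
		/* ... look up the pg_range/pg_type entries and fill the struct ... */
		fcinfo->flinfo->fn_extra = (void *) cache;
	}
	return cache;
}

Since fn_extra lives only as long as the FmgrInfo, a stale entry for a
recycled Oid can't outlive the statement, which is what makes this safe
where a backend-lifetime cache isn't.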



Aren't there a few other cases like this floating around the code?


Not that I know of. That said, I wouldn't be too surprised if there was.


I know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.


True.

--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup

2011-10-26 Thread Heikki Linnakangas
(CC'ing pgsql-hackers, this started as an IM discussion yesterday but 
really belongs in the archives)


On 25.10.2011 23:52, Magnus Hagander wrote:

There's a tiny chance to get incomplete xlog files with pg_receivexlog if you 
crash:
1. pg_receivexlog finishes write()ing a file but system crashes before fsync() 
finishes.
2. When pg_receivexlog restarts after crash, the last WAL file was not fully 
flushed to disk, with
holes in the middle, but it has the right length. pg_receivexlog will continue 
streaming from the next file.
not sure if we care about such a narrow window, but maybe we do


So how would we go about fixing that?  Always unlinking the last file in
the directory and trying from there would seem dangerous too - what if
it's not available on the master anymore? Then we might have given up
on data...


Start streaming from the beginning of the last segment, but don't unlink 
it first. Just overwrite it as you receive the data.


Or, always create new xlog file as 0001000100D3.partial, 
and only when it's fully written, fsync it, and then rename it to 
0001000100D3. Then you know that if a file doesn't have 
the .partial suffix, it's complete. The fact that the last partial file 
always has the .partial suffix needs some extra pushups in the 
restore_command, though.
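A minimal sketch of that discipline in plain POSIX calls (the function and
file names here are made up, and error handling is abbreviated):

/*
 * Finish a WAL segment that was streamed into "<name>.partial": flush
 * it, then atomically rename it into place.  A crash can leave a stale
 * *.partial file behind, but never a final-named segment with holes.
 */
#include <stdio.h>
#include <unistd.h>

static int
finish_walfile(int fd, const char *partial_name, const char *final_name)
{
	if (fsync(fd) != 0)			/* every byte must be on disk first */
		return -1;
	if (close(fd) != 0)
		return -1;
	/* only a fully-flushed segment loses its .partial suffix */
	return rename(partial_name, final_name);
}

(For complete crash safety one would also fsync() the containing directory
afterwards, so that the rename itself is durable.)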


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Shigeru Hanada
(2011/10/25 19:15), Magnus Hagander wrote:
 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com:
 I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
 contrib module.  I think that this module would be the basis of further
 SQL/MED development for core, e.g. join-push-down and ANALYZE support.
 
 I have not looked at the code itself, but I wonder if we shouldn't
 consider making this a part of core-proper, not just a contrib module.
 The fact that it isn't *already* available in core surprises a lot of
 people...

Do you mean that pgsql_fdw should be a built-in extension like plpgsql
so that it's available just after initdb?  It would be accomplished with
some more changes:

* Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
install the dynamically loadable module during make install for core.  The
pgsql_fdw_handler function can't be included in the core binary because we
must avoid linking libpq into the server binary directly.  This method is
also used for libwalreceiver in the replication module.
* Create the pgsql_fdw extension during initdb invocation, like plpgsql.

These are not trivial, but not so difficult either.  However, I think
contrib would be the appropriate place for pgsql_fdw because it's a
(relatively) special feature.
-- 
Shigeru Hanada



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Tue, Oct 25, 2011 at 10:06 PM, Chris Redekop ch...@replicon.com wrote:
 Chris, can you rearrange the backup so you copy the pg_control file as
 the first act after the pg_start_backup?
 I tried this and it doesn't seem to make any difference.

It won't, that was a poor initial diagnosis on my part.

 I also tried the
 patch and I can no longer reproduce the subtrans error

Good

, however instead it
 now starts up, but never gets to the point where it'll accept
 connections.  It starts up, but if I try to do anything I always get "FATAL:
 the database system is starting up"...even if the load is removed from the
 primary, the standby still never finishes starting up.
...
 2011-10-25 13:43:25.019 MDT [15072]: [14-1] LOG:  consistent state delayed
 because recovery snapshot incomplete
...

This is a different problem and has already been reported by one of
your colleagues in a separate thread, and answered in detail by me
there. There is no bug related to this error message.

From here, it looks like the published patch fixes the originally reported problem.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] TOAST versus VACUUM, or missing chunk number 0 for toast value identified

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 4:24 AM, Tom Lane t...@sss.pgh.pa.us wrote:

 Even given your recent changes to reduce the overhead of checking for
 sinval messages, I'm not sure that it'd be practical to move the sinval
 message processing to just-before-we-look-up-a-cache-entry.  Right now,
 we do AcceptInvalidationMessages basically once per table per query
 (or maybe it's twice or so, but anyway a very small multiplier on that).
 If we try to do it every time through SearchSysCache, we are probably
 talking two to three orders of magnitude more checks, which ISTM is
 certain to push the sinval queue back up to the top of the heap for
 contention.

Initially I provided an implementation of eager locking, as Robert
suggests, but the above is a great argument against doing it that way.

Incidentally, I'd like to focus on the causal reason for these
problems. We have static type definitions, which allow us to
completely avoid dynamic type management at runtime. That is a
performance advantage for us in normal running, but it can also give
operational problems when we look to make changes.

 But in any case, this isn't the core of the problem.  The real point
 here is that we need a guarantee that a syscache entry we're going to
 use is/was valid as of some suitable time point later than the start of
 our current transaction.  (Once we have taken a snapshot, VACUUM will
 know that it can't remove any tuples that were deleted after the time of
 that snapshot; so even for SnapshotNow fetches, it's important to have
 an MVCC snapshot to protect toast-table dereferences.)  Perhaps rather
 than tying the problem into SearchSysCache, we should attach the
 overhead to GetTransactionSnapshot, which is called appealingly few
 times per query.  But right offhand it seems like that only protects us
 against the toast-tuple-deletion problem, not against the more general
 one of getting a stale view of the status of some relation.

Do we need to guarantee that? It seems strange to me to use the words
"cache" and "strict" in the same sentence.

I'm sure there are many points in the code where strictness is
required, though in many cases it should also be acceptable to say
that a cache miss is not a problem, and so does not require strict
guarantees.

Do we need a redesign? Or do we need a way to use
relaxation/optimistic techniques?

I'm aware that it could be a huge undertaking to examine all call
points. If you have any ideas of where to investigate or parts of the
problem to assist with, I'll be happy to work on this. This seems
important to me.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct25, 2011, at 14:51 , Simon Riggs wrote:
 On Tue, Oct 25, 2011 at 12:39 PM, Florian Pflug f...@phlo.org wrote:
 
 What I don't understand is how this affects the CLOG. How does 
 oldestActiveXID
 factor into CLOG initialization?
 
 It is an entirely different error.

Ah, OK. I assumed that you believed fixing the wrong oldestActiveXID
computation solved both the SUBTRANS-related *and* the CLOG-related errors,
since you said "We are starting recovery at the right place but we are
initialising the clog and subtrans incorrectly" at the start of the mail.

 Chris' clog error was caused by a file read error. The file was
 opened, we did a seek within the file and then the call to read()
 failed to return a complete page from the file.
 
 The xid shown is 22811359, which is the nextxid in the control file.
 
 So StartupClog() must have failed trying to read the clog page from disk.

Yep.

 That isn't a Hot Standby problem, a recovery problem, nor is it certain
 it's a PostgreSQL problem.

It's very likely that it's a PostgreSQL problem, though. It's probably
not a pilot error since it happens even for backups taken with pg_basebackup(),
so the only explanation other than a PostgreSQL bug is broken hardware or
a pretty serious kernel/filesystem bug.

 OTOH SlruPhysicalReadPage() does cope gracefully with missing clog
 files during recovery, so maybe we can think of a way to make recovery
 cope with a SLRU_READ_FAILED error gracefully also. Any ideas?

As long as we don't understand how the CLOG-related errors happen in
the first place, I think it's a bad idea to silence them.

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct25, 2011, at 13:39 , Florian Pflug wrote:
 On Oct25, 2011, at 11:13 , Simon Riggs wrote:
 On Tue, Oct 25, 2011 at 8:03 AM, Simon Riggs si...@2ndquadrant.com wrote:
 We are starting recovery at the right place but we are initialising
 the clog and subtrans incorrectly. Precisely, the oldestActiveXid is
 being derived later than it should be, which can cause problems if
 this then means that whole pages are uninitialised in subtrans. The bug
 only shows up if you do enough transactions (2048 is always enough) to
 move to the next subtrans page between the redo pointer and the
 checkpoint record while at the same time we do not have a long running
 transaction that spans those two points. That's just enough to happen
 reasonably frequently on busy systems and yet just enough to have
 slipped through testing.
 
 We must either
 
 1. During CreateCheckpoint() we should derive oldestActiveXid before
 we derive the redo location
 
 (1) looks the best way forwards in all cases.
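(For reference, the 2048 figure follows from the SLRU page layout, assuming
the stock 8 kB block size: pg_subtrans stores one 4-byte parent XID per
transaction, so a single page covers 8192 / 4 = 2048 XIDs, and burning that
many is therefore always enough to cross a page boundary.)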
 
 Let me see if I understand this
 
 The problem seems to be that we currently derive oldestActiveXid at the end
 of the checkpoint, just before writing the checkpoint record. Since we use
 oldestActiveXid to initialize SUBTRANS, this is wrong. Records written before
 that checkpoint record (but after the REDO location, of course) may very well
 contain XIDs earlier than that wrongly derived oldestActiveXID, and if we
 attempt to touch these XIDs' SUBTRANS state, we error out.
 
 Your patch seems sensible, because the checkpoint logically occurs at the
 REDO location, not the checkpoint record's location, so we ought to log an
 oldestActiveXID corresponding to that location.

Thinking about this some more (and tracing through the code), I realized that
things are a bit more complicated.

What we actually need to ensure, I think, is that the XID we pass to
StartupSUBTRANS() is earlier than any top-level XID in XLOG_XACT_ASSIGNMENT
records. Which, at first glance, implies that we ought to use the nextId at
the *beginning* of the checkpoint for SUBTRANS initialization. At second
glance, however, that'd be wrong, because backends emit XLOG_XACT_ASSIGNMENT
only once per PGPROC_MAX_CACHED_SUBXIDS sub-XID assignments. Thus, an
XLOG_XACT_ASSIGNMENT written *after* the checkpoint has started may contain
sub-XIDs which were assigned *before* the checkpoint has started.

Using oldestActiveXID works around that because we guarantee that sub-XIDs
are always larger than their parent XIDs and because only active
transactions can produce XLOG_XACT_ASSIGNMENT records.

So your patch is fine, but I think the reasoning about why oldestActiveXID is
the correct value for StartupSUBTRANS deserves an explanation somewhere.

best regards,
Florian Pflug




Re: [HACKERS] patch for distinguishing PG instances in event log v2

2011-10-26 Thread MauMau

From: Magnus Hagander mag...@hagander.net
2011/7/16 MauMau maumau...@gmail.com:

Hello,

The attached file is a revised patch which reflects all review comments by
Magnus in:

http://archives.postgresql.org/pgsql-hackers/2011-07/msg00839.php

I have applied this patch after another round of rather extensive 
modifications.



Thank you, Magnus. I'll look at the final code a little later.

Regards
MauMau





Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 12:16 PM, Florian Pflug f...@phlo.org wrote:

 Chris' clog error was caused by a file read error. The file was
 opened, we did a seek within the file and then the call to read()
 failed to return a complete page from the file.

 The xid shown is 22811359, which is the nextxid in the control file.

 So StartupClog() must have failed trying to read the clog page from disk.

 Yep.

 That isn't a Hot Standby problem, a recovery problem, nor is it certain
 it's a PostgreSQL problem.

 It's very likely that it's a PostgreSQL problem, though. It's probably
 not a pilot error since it happens even for backups taken with 
 pg_basebackup(),
 so the only explanation other than a PostgreSQL bug is broken hardware or
 a pretty serious kernel/filesystem bug.

The way forwards here is for someone to show the clog file that causes
the error and find out why the call to read() fails.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 7:43 AM, Simon Riggs si...@2ndquadrant.com wrote:

 It's very likely that it's a PostgreSQL problem, though. It's probably
 not a pilot error since it happens even for backups taken with 
 pg_basebackup(),
 so the only explanation other than a PostgreSQL bug is broken hardware or
 a pretty serious kernel/filesystem bug.

 The way forwards here is for someone to show the clog file that causes
 the error and find out why the call to read() fails.

Sorry, I thought the problem was obvious.  Either that, or I've
completely missed something in these threads...  I'll admit to not
following this one very closely anymore...

When the backup started, the clog was small.  So on the recovering
instance, the clog is small.  PostgreSQL is supposed to be able to
deal with any file as it was when the backup started.

When the backup is stopped, clog is big.  But that file was copied
after the backup was started, not after the backup finished.  So its
size is only guaranteed to be as big as it was when the backup
started.  Recovery is responsible for extending it as it was extended
during the backup period on the master.

The read fails because there is no data at the location it's trying to
read from, because clog hasn't been extended yet by recovery.

a.
-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 The read fails because there is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.

You don't actually know that, though I agree it seems a reasonable
guess and was my first thought also.

The error is very specifically referring to 22811359, which is the
nextxid from pg_control and updated by checkpoint.

22811359 is mid-way through a clog page, so prior xids will already
have been allocated, pages extended and then those pages fsync'd before
the end of pg_start_backup().  So it shouldn't be possible for that
page to be absent from the base backup, unless the base backup was
taken without a preceding checkpoint, which it seems is not the case from
the script output.

Note that if you are correct, then the solution is to extend clog,
which Florian disagrees with as a solution.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] TOAST versus VACUUM, or missing chunk number 0 for toast value identified

2011-10-26 Thread Robert Haas
On Tue, Oct 25, 2011 at 11:24 PM, Tom Lane t...@sss.pgh.pa.us wrote:
 Even given your recent changes to reduce the overhead of checking for
 sinval messages, I'm not sure that it'd be practical to move the sinval
 message processing to just-before-we-look-up-a-cache-entry.

Wait a minute: I'm confused.  What's at issue here, at least AIUI, is
taking a TOAST pointer and fetching the corresponding value.  But that
must involve reading from the TOAST table, and our usual paradigm for
reading from system catalogs is (1) take AccessShareLock, (2) read the
relevant rows, (3) release AccessShareLock.  If we're doing that here,
then AcceptInvalidationMessages() is already getting called.  If we're
not, this seems horribly unsafe; aside from the current bug, somebody
could rewrite the table in the interim.
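Concretely, that paradigm looks roughly like this with the backend APIs of
this era (the relation/index OIDs, attribute number, and scan key here are
placeholders, not the actual TOAST code path):

Relation	rel;
SysScanDesc scan;
HeapTuple	tup;
ScanKeyData key;

/*
 * (1) take AccessShareLock -- acquiring the lock is also where
 * AcceptInvalidationMessages() gets run
 */
rel = heap_open(SomeCatalogRelationId, AccessShareLock);

ScanKeyInit(&key, Anum_some_oid_column, BTEqualStrategyNumber,
			F_OIDEQ, ObjectIdGetDatum(targetOid));
scan = systable_beginscan(rel, SomeCatalogIndexId, true,
						  SnapshotNow, 1, &key);

/* (2) read the relevant rows */
while ((tup = systable_getnext(scan)) != NULL)
{
	/* ... process the tuple ... */
}
systable_endscan(scan);

/* (3) release the lock */
heap_close(rel, AccessShareLock);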

 Right now,
 we do AcceptInvalidationMessages basically once per table per query
 (or maybe it's twice or so, but anyway a very small multiplier on that).
 If we try to do it every time through SearchSysCache, we are probably
 talking two to three orders of magnitude more checks, which ISTM is
 certain to push the sinval queue back up to the top of the heap for
 contention.

I think we've pretty much got the *contention* licked, barring an
increase in the number of things that generate sinval messages, but
calling it too often will still be slow, of course.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Robert Haas
On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis pg...@j-davis.com wrote:
 On Mon, 2011-10-24 at 13:15 +0300, Heikki Linnakangas wrote:
 Hmm, I don't think that's safe. After Oid wraparound, a range type oid
 might get reused for some other range type, and the cache would return
 stale values. Extremely unlikely to happen by accident, but could be
 exploited by an attacker.

 Any ideas on how to remedy that? I don't have another plan for making it
 perform well. Plugging it into the cache invalidation mechanism seems
 like overkill, but I suppose that would solve the problem.

 Aren't there a few other cases like this floating around the code? I
 know the single-xid cache is potentially vulnerable to xid wraparound
 for the same reason.

I believe that we're in trouble with XIDs as soon as you have two
active XIDs that are separated by a billion, because then you could
have a situation where some people think a given XID is in the future
and others think it's in the past.  I have been wondering if we should
have some sort of active guard against that scenario; I don't think we
do at present.
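For reference, the circular comparison that breaks down in that scenario is
TransactionIdPrecedes() (paraphrased from access/transam.c); once two normal
XIDs are 2^31 or more apart, the sign of the 32-bit difference -- and hence
which one counts as "in the past" -- flips:

bool
TransactionIdPrecedes(TransactionId id1, TransactionId id2)
{
	/*
	 * Permanent (special) XIDs compare by plain unsigned comparison;
	 * normal XIDs compare modulo 2^32, i.e. by the sign of the difference.
	 */
	int32		diff;

	if (!TransactionIdIsNormal(id1) || !TransactionIdIsNormal(id2))
		return (id1 < id2);

	diff = (int32) (id1 - id2);
	return (diff < 0);
}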

But OID wraparound is not the same as XID wraparound.  It's far more
common, I think, for a single transaction to use lots of OIDs than it
is for it to use lots of XIDs (i.e. have many subtransactions).

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Tom Lane
Shigeru Hanada shigeru.han...@gmail.com writes:
 (2011/10/25 19:15), Magnus Hagander wrote:
 I have not looked at the code itself, but I wonder if we shouldn't
 consider making this a part of core-proper, not just a contrib module.
 The fact that it isn't *already* available in core surprises a lot of
 people...

 Do you mean that pgsql_fdw should be a built-in extension like plpgsql
 so that it's available just after initdb?

If that was what he meant, I'd vote against it.  There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net.  This feature should be something that has to be
installed by explicit user action.

regards, tom lane



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 15:12 , Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:
 
 The read fails because there is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.
 
 You don't actually know that, though I agree it seems a reasonable
 guess and was my first thought also.

The actual error message also supports that theory. Here's the relevant
snippet from the OP's log (Found in 
ca9fd2fe.1d8d2%linas.virba...@continuent.com)

2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 1188673
2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001 at 
offset 32768: Success.

Note that it says "Success" at the end of the second log entry. That
can only happen, I think, if we're trying to read the page adjacent to
the last page in the file. The seek would be successful, and the subsequent
read() would indicate EOF by returning zero bytes. None of the calls would
set errno. If there was a real IO error, read() would set errno, and if the
page wasn't adjacent to the last page in the file, seek() would set errno.
In both cases we'd see the corresponding error message, not "Success".
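A tiny standalone demonstration of that errno behaviour (the file name is
made up; run it against any file shorter than the seek offset):

#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

int
main(void)
{
	char		buf[8192];
	ssize_t		n;
	int			fd = open("0001", O_RDONLY);	/* e.g. an 8 kB clog file */

	if (fd < 0)
		return 1;
	errno = 0;
	lseek(fd, 32768, SEEK_SET);			/* position at the missing page */
	n = read(fd, buf, sizeof(buf));		/* returns 0 at EOF, errno untouched */
	printf("read returned %zd: %s\n", n, strerror(errno));
	/* prints: read returned 0: Success */
	return 0;
}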

 The error is very specifically referring to 22811359, which is the
 nextxid from pg_control and updated by checkpoint.

Where does that XID come from? The only reference to that XID in the archives
that I can find is in your message
CA+U5nMKUUoA8kRG=itfso5nzue3x_kdjz78eaun3_fkmq-u...@mail.gmail.com

 22811359 is mid-way through a clog page, so prior xids will already
 have been allocated, pages extended and then those pages fsync'd before
 the end of pg_start_backup().  So it shouldn't be possible for that
 page to be absent from the base backup, unless the base backup was
 taken without a preceding checkpoint, which it seems is not the case from
 the script output.

Or unless the nextId we store in the checkpoint is for some reason higher
than it should be. Or unless nextId somehow gets mangled during recovery.
Or unless there's some interaction between VACUUM and CHECKPOINTS that
we're overlooking...

 Note that if you are correct, then the solution is to extend clog,
 which Florian disagrees with as a solution.

That's not what I said. As you said, the CLOG page corresponding to nextId
*should* always be accessible at the start of recovery (unless the whole file
has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
Yet the error suggests that the CLOG is, in fact, too short. What I said
is that we shouldn't apply any fix (for the CLOG problem) before we understand
the reason for that apparent contradiction.

Doing it nevertheless just to get rid of the error seems dangerous. What
happens, for example, to the CLOG state of transactions earlier than the
checkpoint's nextId? Their COMMIT records may very well lie before the
checkpoint's REDO pointer, so the CLOG we copied had better contain their
correct state. Yet if it does, then why isn't the nextId's CLOG page
accessible?

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop
 And I think they also reported that if they didn't run hot standby,
 but just normal recovery into a new master, it didn't have the problem
 either, i.e. without hotstandby, recovery ran, properly extended the
 clog, and then ran as a new master fine.

Yes, this is correct...attempting to start as hot standby will produce the
pg_clog error repeatedly, and then, without changing anything else, just
turning hot standby off, it will start up successfully.

 This fits the OP's observation of the
 problem vanishing when pg_start_backup() does an immediate checkpoint.

Note that this is *not* the behaviour I'm seeing...it's possible it happens
more frequently without the immediate checkpoint, but I am seeing it happen
even with the immediate checkpoint.

 This is a different problem and has already been reported by one of
 your colleagues in a separate thread, and answered in detail by me
 there. There is no bug related to this error message.

Excellent...I will continue this discussion in that thread.


[HACKERS] cache lookup failed in plpgsql - reason?

2011-10-26 Thread Pavel Stehule
Hello

One of my customers reported a random issue. He uses a procedure with the
following fragment:

  IF NOT EXISTS(
    SELECT relname FROM pg_class
    WHERE relname = 'tmp_object_state_change' AND relkind = 'r' AND
          pg_table_is_visible(oid)
  )
  THEN
    CREATE TEMPORARY TABLE tmp_object_state_change (
      object_id INTEGER,
      object_hid INTEGER,
      new_states INTEGER[],
      old_states INTEGER[]
    );
  ELSE
    TRUNCATE tmp_object_state_change;
  END IF;

These lines sometimes raise an error:

Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
[3-1] 2011-10-25 20:13:44 CEST adifd 29970 ERROR:  cache lookup failed
for relation 319883311
Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
[3-2] 2011-10-25 20:13:44 CEST adifd 29970 CONTEXT:  SQL statement
SELECT  NOT EXISTS( SELECT relname FROM pg_class WHERE relname =
Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
[3-3]  'tmp_object_state_change' AND relkind = 'r' AND
pg_table_is_visible(oid) )
Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
[3-4]   PL/pgSQL function update_object_states line 2 at IF
Oct 25 20:13:44  db-s-01 postgres: local5.warning -- postgres[29970]:
[3-5] 2011-10-25 20:13:44 CEST adifd 29970 STATEMENT:  SELECT
update_object_states($1::integer)

I don't see a reason why the cache should be broken for this query.

He uses Pg 8.3.15.

Any idea?

Regards

Pavel Stehule



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Robert Haas
2011/10/26 Shigeru Hanada shigeru.han...@gmail.com:
 (2011/10/25 19:15), Magnus Hagander wrote:
 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com:
 I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
 contrib module.  I think that this module would be the basis of further
 SQL/MED development for core, e.g. join-push-down and ANALYZE support.

 I have not looked at the code itself, but I wonder if we shouldn't
 consider making this a part of core-proper, not just a contrib module.
 The fact that it isn't *already* available in core surprises a lot of
 people...

 Do you mean that pgsql_fdw should be a built-in extension like plpgsql
 so that it's available just after initdb?  It would be accomplished with
 some more changes:

 * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
 install the dynamically loadable module during make install for core.  The
 pgsql_fdw_handler function can't be included in the core binary because we
 must avoid linking libpq into the server binary directly.  This method is
 also used for libwalreceiver in the replication module.
 * Create the pgsql_fdw extension during initdb invocation, like plpgsql.

 These are not trivial, but not so difficult either.  However, I think
 contrib would be the appropriate place for pgsql_fdw because it's a
 (relatively) special feature.

I agree.  pgsql_fdw will be a nice feature, but there's no reason to
think that everyone will want it installed by default, and there are
some security reasons to think that they might not.  On the flip side,
pushing it out of contrib and onto pgfoundry or whatever makes it
unnecessarily difficult to install, and not as many people will
benefit from it.  So contrib seems exactly right to me.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 15:57 , Florian Pflug wrote:
 As you said, the CLOG page corresponding to nextId
 *should* always be accessible at the start of recovery (unless the whole file
 has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
 Yet the error suggests that the CLOG is, in fact, too short. What I said
 is that we shouldn't apply any fix (for the CLOG problem) before we understand
 the reason for that apparent contradiction.

Ha! I think I've got a working theory.

In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
record, and then call CheckPointGuts() which does the actual writing and
fsyncing. So far, that's fine. If a transaction ID is assigned before we
compute the checkpoint's nextXid, we'll extend the CLOG accordingly, and
CheckPointGuts() will make sure the new CLOG page goes to disk.

But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
fine too, except that LogStandbySnapshot() re-assigns the *current* value of
ShmemVariableCache->nextXid to the checkpoint's nextXid field.

Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(),
but before LogStandbySnapshot(), then we end up with a nextXid in the
checkpoint whose CLOG page hasn't necessarily made it to the disk yet. The
longer CheckPointGuts() takes to finish its work, the more likely this
becomes (assuming that CLOG writing and syncing doesn't happen at the very
end). This fits the OP's observation of the problem vanishing when
pg_start_backup() does an immediate checkpoint.
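In pseudocode, the ordering described above (a sketch of the sequence of
events, not the literal source):

CreateCheckPoint()
{
	checkPoint.nextXid = ShmemVariableCache->nextXid;	/* sample nextXid */

	CheckPointGuts();	/* writes and fsyncs the CLOG as of *this* moment */

	/*
	 * window: a backend may assign a new XID here, extending the CLOG
	 * in memory -- the new page is not covered by the fsyncs above
	 */

	LogStandbySnapshot();	/* overwrites checkPoint.nextXid with the
							 * *current* ShmemVariableCache->nextXid */

	/*
	 * the checkpoint record now carries a nextXid whose CLOG page may
	 * not be on disk
	 */
	XLogInsert(... checkpoint record ...);
}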

I dunno how to fix this, though, since I don't really understand why
LogStandbySnapshot() needs to modify the checkpoint's nextXid. Simon, is
there some documentation on what assumptions the hot standby code makes
about the various XID fields included in a checkpoint?

best regards,
Florian Pflug




Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Dimitri Fontaine
Stephen Frost sfr...@snowman.net writes:
 I'm in favor of making that distinction.  I would still have pgsql_fdw,
 file_fdw, etc, be packaged more-or-less the same way and still use the
 CREATE EXTENSION framework, of course.

We called that idea “core extension” at the latest hackers meeting, and
Greg Smith had a patch with a first selection of extensions to package
this way.

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Heikki Linnakangas

On 26.10.2011 18:42, Robert Haas wrote:

On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis pg...@j-davis.com wrote:

Aren't there a few other cases like this floating around the code? I
know the single-xid cache is potentially vulnerable to xid wraparound
for the same reason.


I believe that we're in trouble with XIDs as soon as you have two
active XIDs that are separated by a billion, ...


That's not what Jeff is referring to here, though (correct me if I'm
wrong). He's talking about the one-item cache in
TransactionLogFetch(). You don't need long-running transactions
for that to get confused. Specifically, this could happen:


1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
   The row has xmin = 123456, and it is cached as committed in the 
one-item cache by TransactionLogFetch.
2. A lot of time passes. Everything is frozen, and XID wrap-around 
happens. (Session A is idle but not in a transaction, so it doesn't 
inhibit freezing.)

3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK;
   By coincidence, this transaction was assigned XID 123456.
4. In session A: SELECT * FROM foo WHERE id = 2;
   The one-item cache still says that 123456 committed, so we return 
the tuple inserted by the aborted transaction. Oops.
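For context, the one-item cache at issue is just a pair of static variables
in transam.c; paraphrased, with the special-XID handling elided:

static TransactionId cachedFetchXid = InvalidTransactionId;
static XidStatus	cachedFetchXidStatus;

static XidStatus
TransactionLogFetch(TransactionId transactionId)
{
	XidStatus	status;
	XLogRecPtr	lsn;

	/*
	 * The cache key is a bare 32-bit XID with no epoch, so after a
	 * wraparound a recycled XID matches the stale entry -- step 4 of
	 * the scenario above takes this early exit.
	 */
	if (TransactionIdEquals(transactionId, cachedFetchXid))
		return cachedFetchXidStatus;

	status = TransactionIdGetStatus(transactionId, &lsn);	/* CLOG lookup */

	if (status != TRANSACTION_STATUS_IN_PROGRESS)
	{
		/* only settled states get cached */
		cachedFetchXid = transactionId;
		cachedFetchXidStatus = status;
	}
	return status;
}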


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Kohei KaiGai
2011/10/26 Robert Haas robertmh...@gmail.com:
 2011/10/26 Shigeru Hanada shigeru.han...@gmail.com:
 (2011/10/25 19:15), Magnus Hagander wrote:
 2011/10/25 Shigeru Hanada shigeru.han...@gmail.com:
 I'd like to propose pgsql_fdw, FDW for external PostgreSQL server, as a
 contrib module.  I think that this module would be the basis of further
 SQL/MED development for core, e.g. join-push-down and ANALYZE support.

 I have not looked at the code itself, but I wonder if we shouldn't
 consider making this a part of core-proper, not just a contrib module.
 The fact that it isn't *already* available in core surprises a lot of
 people...

 Do you mean that pgsql_fdw should be a built-in extension like plpgsql
 so that it's available just after initdb?  It would be accomplished with
 some more changes:

 * Move pgsql_fdw into core, say src/backend/foreign/libpgsql_fdw, and
 install the dynamically loadable module during make install for core.  The
 pgsql_fdw_handler function can't be included in the core binary because we
 must avoid linking libpq into the server binary directly.  This method is
 also used for libwalreceiver in the replication module.
 * Create the pgsql_fdw extension during initdb invocation, like plpgsql.

 These are not trivial, but not so difficult either.  However, I think
 contrib would be the appropriate place for pgsql_fdw because it's a
 (relatively) special feature.

 I agree.  pgsql_fdw will be a nice feature, but there's no reason to
 think that everyone will want it installed by default, and there are
 some security reasons to think that they might not.  On the flip side,
 pushing it out of contrib and onto pgfoundry or whatever makes it
 unnecessarily difficult to install, and not as many people will
 benefit from it.  So contrib seems exactly right to me.

I also agree. The pgsql_fdw will be worthwhile to have in the main tree
as a contrib module. It will give us a clear opportunity to test new
features of FDW that use RDBMS characteristics, such as join-push-down.
However, whether it shall be installed by default should be a separate
discussion.

Thanks,
-- 
KaiGai Kohei kai...@kaigai.gr.jp



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Aidan Van Dyk
On Wed, Oct 26, 2011 at 9:57 AM, Florian Pflug f...@phlo.org wrote:
 On Oct26, 2011, at 15:12 , Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 12:54 PM, Aidan Van Dyk ai...@highrise.ca wrote:

 The read fails because there is no data at the location it's trying to
 read from, because clog hasn't been extended yet by recovery.

 You don't actually know that, though I agree it seems a reasonable
 guess and was my first thought also.

 The actual error message also supports that theory. Here's the relevant
 snippet from the OP's log (Found in 
 ca9fd2fe.1d8d2%linas.virba...@continuent.com)

 2011-09-21 13:41:05 CEST FATAL:  could not access status of transaction 
 1188673
 2011-09-21 13:41:05 CEST DETAIL:  Could not read from file pg_clog/0001 at 
 offset 32768: Success.

 Note that it says "Success" at the end of the second log entry. That
 can only happen, I think, if we're trying to read the page adjacent to
 the last page in the file. The seek would be successful, and the subsequent
 read() would indicate EOF by returning zero bytes. None of the calls would
 set errno. If there was a real IO error, read() would set errno, and if the
 page wasn't adjacent to the last page in the file, seek() would set errno.
 In both cases we'd see the corresponding error message, not "Success".

And even more pointedly, in the original go around on this:
   http://article.gmane.org/gmane.comp.db.postgresql.devel.general/174056

He reported that clog/0000 after the pg_start_backup call:
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

Changed during the rsync phase to this:
-rw------- 1 postgres postgres 16384 Sep 23 14:33 0000

But on the slave, of course, it was copied before it was extended, so it
was the original size (that's ok, that's the point of recovery after
the backup):
-rw------- 1 postgres postgres 8192 Sep 23 14:31 0000

With the error:
  2011-09-23 14:33:46 CEST FATAL:  could not access status of transaction 37206
  2011-09-23 14:33:46 CEST DETAIL:  Could not read from file
pg_clog/0000 at offset 8192: Success.

And that error happens *before* recovery can even get attempted.

And that, if he copied the recent clog/0000 from the master, it did start up.

And I think they also reported that if they didn't run hot standby,
but just normal recovery into a new master, it didn't have the problem
either, i.e. without hotstandby, recovery ran, properly extended the
clog, and then ran as a new master fine.

a.

-- 
Aidan Van Dyk                                             Create like a god,
ai...@highrise.ca                                       command like a king,
http://www.highrise.ca/                                   work like a slave.



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct26, 2011, at 17:36 , Chris Redekop wrote:
  And I think they also reported that if they didn't run hot standby,
  but just normal recovery into a new master, it didn't have the problem
  either, i.e. without hotstandby, recovery ran, properly extended the
  clog, and then ran as a new master fine.
 
  Yes, this is correct...attempting to start as hot standby will produce the
  pg_clog error repeatedly, and then, without changing anything else, just
  turning hot standby off, it will start up successfully.

Yup, because with hot standby disabled (on the client side), StartupCLOG()
happens after recovery has completed. That, at the very least, makes the
problem very unlikely to occur in the non-hot-standby case. I'm not sure
it's completely impossible, though.

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

  This fits the OP's observation of the
  problem vanishing when pg_start_backup() does an immediate checkpoint.

 Note that this is *not* the behaviour I'm seeing...it's possible it happens
 more frequently without the immediate checkpoint, but I am seeing it happen
 even with the immediate checkpoint.

Yeah, I should have said "the problem's likelihood decreasing" instead
of "vanishing". The point is, the longer the checkpoint takes, the higher
the chance the nextId is advanced far enough to require a CLOG extension.

That alone isn't enough to trigger the error - the CLOG extension must also
*not* make it to the disk before the checkpoint completes - but it's
a required precondition for the error to occur.

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote:
 On Oct26, 2011, at 15:57 , Florian Pflug wrote:
 As you said, the CLOG page corresponding to nextId
 *should* always be accessible at the start of recovery (unless the whole file
 has been removed by VACUUM, that is). So we shouldn't need to extend CLOG.
 Yet the error suggests that the CLOG is, in fact, too short. What I said
 is that we shouldn't apply any fix (for the CLOG problem) before we
 understand the reason for that apparent contradiction.

 Ha! I think I've got a working theory.

 In CreateCheckPoint(), we determine the nextId that'll go into the checkpoint
 record, and then call CheckPointGuts() which does the actual writing and
 fsyncing. So far, that's fine. If a transaction ID is assigned before we
 compute the checkpoint's nextXid, we'll extend the CLOG accordingly, and
 CheckPointGuts() will make sure the new CLOG page goes to disk.

 But, if wal_level = hot_standby, we also call LogStandbySnapshot() in
 CreateCheckPoint(), and we do that *after* CheckPointGuts(). Which would be
 fine too, except that LogStandbySnapshot() re-assigns the *current* value of
 ShmemVariableCache->nextXid to the checkpoint's nextXid field.

 Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(),
 but before LogStandbySnapshot(), then we end up with a nextXid in the
 checkpoint whose CLOG page hasn't necessarily made it to the disk yet. The
 longer CheckPointGuts() takes to finish its work, the more likely this
 becomes (assuming that CLOG writing and syncing doesn't happen at the very
 end). This fits the OP's observation of the problem vanishing when
 pg_start_backup() does an immediate checkpoint.

This is the correct explanation. I've just come back into WiFi range,
so I was just writing to you with this explanation, but your original
point that nextxid must be wrong deserves credit. OTOH, I was just
waiting to find out what the reason for the physical read was, rather
than guessing.

Notice that the nextxid value isn't wrong, it's just not the correct
value to use for starting clog.

As it turns out, the correct fix is actually just to skip StartupClog()
until the end of recovery, because it does nothing useful when executed
at that time. When I wrote the original code I remember thinking that
StartupClog() was superfluous at that point.

Brewing a patch now.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

Latest thinking... confirmations or other error reports please.

This fixes both the subtrans and clog bugs in one patch.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


oldestActiveXid_fixed.v2.patch
Description: Binary data



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Robert Haas
On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 26.10.2011 18:42, Robert Haas wrote:

 On Tue, Oct 25, 2011 at 12:37 PM, Jeff Davis pg...@j-davis.com wrote:

 Aren't there a few other cases like this floating around the code? I
 know the single-xid cache is potentially vulnerable to xid wraparound
 for the same reason.

 I believe that we're in trouble with XIDs as soon as you have two
 active XIDs that are separated by a billion, ...

 That's not what Jeff is referring to here, though (correct me if I'm wrong).
 He's talking about the one-item cache in TransactionLogFetch(). You don't
 need long-running transactions for that to get confused. Specifically,
 this could happen:

 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
   The row has xmin = 123456, and it is cached as committed in the one-item
 cache by TransactionLogFetch.
 2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
 (Session A is idle but not in a transaction, so it doesn't inhibit
 freezing.)
 3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK;
   By coincidence, this transaction was assigned XID 123456.
 4. In session A: SELECT * FROM foo WHERE id = 2;
   The one-item cache still says that 123456 committed, so we return the
 tuple inserted by the aborted transaction. Oops.

Oh, hmm.  That sucks.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Alvaro Herrera

Excerpts from Robert Haas's message of Wed Oct 26 13:19:47 -0300 2011:
 On Wed, Oct 26, 2011 at 11:52 AM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:

  1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
    The row has xmin = 123456, and it is cached as committed in the one-item
  cache by TransactionLogFetch.
  2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
  (Session A is idle but not in a transaction, so it doesn't inhibit
  freezing.)
  3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK;
    By coincidence, this transaction was assigned XID 123456.
  4. In session A: SELECT * FROM foo WHERE id = 2;
    The one-item cache still says that 123456 committed, so we return the
  tuple inserted by the aborted transaction. Oops.
 
 Oh, hmm.  That sucks.

Can this be solved by simple application of the Xid epoch?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Magnus Hagander
On Wed, Oct 26, 2011 at 16:37, Tom Lane t...@sss.pgh.pa.us wrote:
 Shigeru Hanada shigeru.han...@gmail.com writes:
 (2011/10/25 19:15), Magnus Hagander wrote:
 I have not looked at the code itself, but I wonder if we shouldn't
 consider making this a part of core-proper, not just a contrib module.
 The fact that it isn't *already* available in core surprises a lot of
 people...

 Do you mean that pgsql_fdw should be a built-in extension like plpgsql
 so that it's available just after initdb?

 If that was what he meant, I'd vote against it.  There are way too many
 people who will *not* want their databases configured to be able to
 reach out onto the net.  This feature should be something that has to be
 installed by explicit user action.

That is not what I meant.

I meant install the shared library by default, but still require
CREATE EXTENSION.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Jeff Davis
On Wed, 2011-10-26 at 12:19 -0400, Robert Haas wrote:
  1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
The row has xmin = 123456, and it is cached as committed in the one-item
  cache by TransactionLogFetch.
  2. A lot of time passes. Everything is frozen, and XID wrap-around happens.
  (Session A is idle but not in a transaction, so it doesn't inhibit
  freezing.)
  3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK;
By coincidence, this transaction was assigned XID 123456.
  4. In session A: SELECT * FROM foo WHERE id = 2;
The one-item cache still says that 123456 committed, so we return the
  tuple inserted by the aborted transaction. Oops.

Yes, that's the scenario I was talking about.

 Oh, hmm.  That sucks.

It didn't seem like a major concern a while ago:
http://archives.postgresql.org/message-id/28107.1291264...@sss.pgh.pa.us

Regards,
Jeff Davis




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Simon Riggs
On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

 Latest thinking... confirmations or other error reports please.

 This fixes both the subtrans and clog bugs in one patch.

I'll be looking to commit that tomorrow afternoon as two separate
patches with appropriate credits.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Andrew Dunstan



On 10/26/2011 12:47 PM, Magnus Hagander wrote:


If that was what he meant, I'd vote against it.  There are way too many
people who will *not* want their databases configured to be able to
reach out onto the net.  This feature should be something that has to be
installed by explicit user action.

That is not what I meant.

I meant install the shared library by default, but still require
CREATE EXTENSION.



I don't see why it should be different from other standard modules, such
as citext or hstore, both of which have pretty wide use, and fewer
possible security implications than this.


cheers

andrew



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Tom Lane
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes:
 That's not what Jeff is referring to here, though (correct me if I'm
 wrong). He's talking about the one-item cache in
 TransactionLogFetch(). You don't need long-running transactions
 for that to get confused. Specifically, this could happen:

 1. In session A: BEGIN; SELECT * FROM foo WHERE id = 1; COMMIT;
 The row has xmin = 123456, and it is cached as committed in the 
 one-item cache by TransactionLogFetch.
 2. A lot of time passes. Everything is frozen, and XID wrap-around 
 happens. (Session A is idle but not in a transaction, so it doesn't 
 inhibit freezing.)
 3. In session B: BEGIN; INSERT INTO foo (id) VALUES (2); ROLLBACK;
 By coincidence, this transaction was assigned XID 123456.
 4. In session A: SELECT * FROM foo WHERE id = 2;
 The one-item cache still says that 123456 committed, so we return 
 the tuple inserted by the aborted transaction. Oops.

I think this is probably a red herring, because it's impossible for a
session to remain totally idle for that long --- see sinval updating.
(If you wanted to be really sure, we could have sinval reset clear
the TransactionLogFetch cache, but I doubt it's worth the trouble.)

regards, tom lane



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Magnus Hagander
On Wed, Oct 26, 2011 at 19:25, Andrew Dunstan and...@dunslane.net wrote:

 On 10/26/2011 12:47 PM, Magnus Hagander wrote:

 If that was what he meant, I'd vote against it.  There are way too many
 people who will *not* want their databases configured to be able to
 reach out onto the net.  This feature should be something that has to be
 installed by explicit user action.

 That is not what I meant.

 I meant install the shared library by default, but still require
 CREATE EXTENSION.


 I don't see why it should be different from other standard modules, such as
 citext or hstore, both of which have pretty wide use, and fewer possible
 security implications than this.

As I stated earlier, it's really back to the old discussion of
splitting up contrib. This would be the "additional module" part, but
not the "example of how to do things" part of that...

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



[HACKERS] autovacuum workers warning

2011-10-26 Thread Euler Taveira de Oliveira

Hi,

Some time ago [1], I proposed printing a message every time there are no 
autovacuum slots available and another one is requested. It is not a complete 
solution for autovacuum tuning but it would at least give us a hint that the 
number of workers is insufficient to keep up with the current load. The 
accurate number of slots needed would be the optimal solution, but that 
information is not free (it would have to check every table in the available 
databases to get the approximate number of slots needed; approximate because 
some table could be finishing the operation). A new warning is better than 
nothing. If we decide to improve this area in the future we should remove the 
warning, but right now it would be an excellent hint for tuning autovacuum.



[1] http://archives.postgresql.org/pgsql-hackers/2011-06/msg00678.php


--
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento
diff --git a/src/backend/postmaster/autovacuum.c b/src/backend/postmaster/autovacuum.c
index 3b71232..4ec0f87 100644
--- a/src/backend/postmaster/autovacuum.c
+++ b/src/backend/postmaster/autovacuum.c
@@ -656,6 +656,12 @@ AutoVacLauncherMain(int argc, char *argv[])
 
 		can_launch = (AutoVacuumShmem->av_freeWorkers != NULL);
 
+		if (!can_launch)
+			ereport(LOG,
+					(errmsg("maximum number of autovacuum workers reached"),
+					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
+							autovacuum_max_workers)));
+
 		if (AutoVacuumShmem->av_startingWorker != NULL)
 		{
 			int			waittime;
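
With this patch, when the launcher finds no free worker slot the server log
would show something like the following (message text taken from the patch
above; the number is filled in from the current setting):

LOG:  maximum number of autovacuum workers reached
HINT:  Consider increasing autovacuum_max_workers (currently 3).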



Re: [HACKERS] pgsql_fdw, FDW for PostgreSQL server

2011-10-26 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Wed, Oct 26, 2011 at 16:37, Tom Lane t...@sss.pgh.pa.us wrote:
 If that was what he meant, I'd vote against it.  There are way too many
 people who will *not* want their databases configured to be able to
 reach out onto the net.  This feature should be something that has to be
 installed by explicit user action.

 That is not what I meant.

 I meant install the shared library by default, but still require
 CREATE EXTENSION.

Whether the shlib is installed by default is a decision for packagers to
make, not us.  At best we could make a recommendation.

regards, tom lane



Re: [HACKERS] Range Types - typo + NULL string constructor

2011-10-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 I believe that we're in trouble with XIDs as soon as you have two
 active XIDs that are separated by a billion, because then you could
 have a situation where some people think a given XID is in the future
 and others think it's in the past.  I have been wondering if we should
 have some sort of active guard against that scenario; I don't think we
 do at present.

Sure we do.  It's covered by the XID wraparound prevention logic.

regards, tom lane



[HACKERS] Cannot

2011-10-26 Thread David E. Wheeler
Suggested doc “patch”:

grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g'

Best,

David




Re: [HACKERS] Updated version of pg_receivexlog

2011-10-26 Thread Magnus Hagander
On Tue, Oct 25, 2011 at 12:37, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 14:40, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 +               /*
 +                * Looks like an xlog file. Parse it's position.

 s/it's/its/

 +                */
 +               if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log,
 &seg) != 3)
 +               {
 +                       fprintf(stderr, _("%s: could not parse xlog
 filename \"%s\"\n"),
 +                                       progname, dirent->d_name);
 +                       disconnect_and_exit(1);
 +               }
 +               log *= XLOG_SEG_SIZE;

 That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high.
 It seems completely unnecessary, anyway,

 How do you mean "completely unnecessary"? We'd have to change the points
 that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a
 way to get around the overflow, but I'm not sure that's what you mean?

 Talked to Heikki on IM about this one, turns out we were both wrong.
 It's needed, but there was a bug hiding under it, due to (once again)
 mixing up segments and offsets. Has been fixed now.
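
To make the segment-vs-offset distinction concrete, here is a minimal
standalone sketch -- illustrative only, not the committed pg_receivexlog
code, and assuming the two-field WAL-position layout -- of deriving a
resume position from a WAL file name without an overflow-prone
multiplication of the log id:

#include <stdio.h>

#define XLOG_SEG_SIZE (16 * 1024 * 1024)	/* default 16 MB segments */

/* WAL position as a logical log id plus a byte offset within that log. */
typedef struct
{
	unsigned int xlogid;
	unsigned int xrecoff;
} WalPosition;

/*
 * Parse a file name of the form TTTTTTTTXXXXXXXXSSSSSSSS into timeline,
 * log and segment, then derive the position of the segment's first byte.
 * Only the segment number is multiplied by XLOG_SEG_SIZE; the log id is
 * used as-is, so nothing can overflow (seg < 256 for 16 MB segments).
 */
static int
parse_xlog_filename(const char *fname, unsigned int *tli, WalPosition *pos)
{
	unsigned int log, seg;

	if (sscanf(fname, "%08X%08X%08X", tli, &log, &seg) != 3)
		return -1;

	pos->xlogid = log;
	pos->xrecoff = seg * XLOG_SEG_SIZE;
	return 0;
}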

 In pg_basebackup, it would be a good sanity check to verify that the systemid
 returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming
 connection match. Just to be sure that some connection pooler didn't hijack
 one of the connections and point to a different server. And better check
 timelineid too while you're at it.

 That's a good idea. Will fix.

 Added to the new version of the patch.


 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

 Ah yes. Did not think of that. Yes, it will have this problem.

 Actually, thinking more, per other mail, it won't. Because it will
 never report that the data is synced to disk, so it will not be
 considered for sync standby.

 This is something we might consider in the future (there could be a
 reasonable scenario where you'd want this), but not in the first version.

 Updated version of the patch attached.

I've applied this version with a few more minor changes that Heikki found.

His comment about .partial files still applies, and I intend to
address this in a follow-up commit, along with some further
documentation enhancements.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] autovacuum workers warning

2011-10-26 Thread Tom Lane
Euler Taveira de Oliveira eu...@timbira.com writes:
 +		if (!can_launch)
 +			ereport(LOG,
 +					(errmsg("maximum number of autovacuum workers reached"),
 +					 errhint("Consider increasing autovacuum_max_workers (currently %d).",
 +							autovacuum_max_workers)));

Isn't it normal for the launcher to max out the number of workers?
A log message that's generated routinely in normal operation doesn't
sound particularly helpful to me ...

regards, tom lane



Re: [HACKERS] autovacuum workers warning

2011-10-26 Thread Alvaro Herrera

Excerpts from Euler Taveira de Oliveira's message of Tue Oct 25 16:56:12 -0300 
2011:
 Hi,
 
 Some time ago [1], I proposed printing a message every time there are no 
 autovacuum slots available and another one is requested. It is not a complete 
 solution for autovacuum tuning but it would at least give us a hint that the 
 number of workers is insufficient to keep up with the current load. The 
 accurate number of slots needed would be the optimal solution, but that 
 information is not free (it would have to check every table in the available 
 databases to get the approximate number of slots needed; approximate because 
 some table could be finishing the operation). A new warning is better than 
 nothing. If we decide to improve this area in the future we should remove the 
 warning, but right now it would be an excellent hint for tuning autovacuum.

Well, just increasing the number of workers would do nothing to solve
the problem, because the more workers there are, the slower they work.
The actual solution to the problem would be decreasing
autovacuum_vacuum_cost_delay, and/or related knobs.

I'm sure we need more data on autovacuum activities to properly tune
autovac, but I'm not sure that knowing we've maxed out max_workers is one of them.
Wasn't there some discussion recently on measuring the length of the
work queue, or something like that?

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Chris Redekop
FYI I have given this patch a good test and can now no longer reproduce
either the subtrans or the clog error.  Thanks guys!


On Wed, Oct 26, 2011 at 11:09 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Wed, Oct 26, 2011 at 5:16 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
  On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com
 wrote:
 
  Brewing a patch now.
 
  Latest thinking... confirmations or other error reports please.
 
  This fixes both the subtrans and clog bugs in one patch.

 I'll be looking to commit that tomorrow afternoon as two separate
 patches with appropriate credits.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] autovacuum workers warning

2011-10-26 Thread Euler Taveira de Oliveira

On 26-10-2011 16:14, Alvaro Herrera wrote:

Well, just increasing the number of workers would do nothing to solve
the problem, because the more workers there are, the slower they work.
The actual solution to the problem would be decreasing
 autovacuum_vacuum_cost_delay, and/or related knobs.

Why not? Are you saying that parallelizing the work won't help? As for 
autovacuum_vacuum_cost_delay, don't you think that 20ms is already small 
enough that we should suggest increasing the number of workers rather than 
decreasing it further?



Wasn't there some discussion recently on measuring the length of the
work queue, or something like that?

Yes, there was. As I said, it is an expensive and approximate measure. I'm not 
saying that is not the right direction; I'm arguing that a hint is better than 
nothing. Right now the only way to know that autovacuum is out of workers is 
to query pg_stat_activity frequently.



--
   Euler Taveira de Oliveira - Timbira   http://www.timbira.com.br/
   PostgreSQL: Consultoria, Desenvolvimento, Suporte 24x7 e Treinamento



Re: [HACKERS] autovacuum workers warning

2011-10-26 Thread Alvaro Herrera

Excerpts from Euler Taveira de Oliveira's message of Wed Oct 26 16:57:18 -0300 
2011:
 
 On 26-10-2011 16:14, Alvaro Herrera wrote:
  Well, just increasing the number of workers would do nothing to solve
  the problem, because the more workers there are, the slower they work.
  The actual solution to the problem would be decreasing
  autovacuum_vacuum_cost_delay, and/or related knobs.
 
  Why not? Are you saying that parallelizing the work won't help? As for 
  autovacuum_vacuum_cost_delay, don't you think that 20ms is already small 
  enough that we should suggest increasing the number of workers rather than 
  decreasing it further?

I am saying that if you have two workers running, they increase their
cost delay to 40ms internally.  If you increase the max to four, they
would run at an effective delay of 80ms.
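
As a back-of-the-envelope illustration of that balancing arithmetic
(assuming, per the description above, that the delay is scaled linearly
across the active workers; the program and numbers are illustrative only):

#include <stdio.h>

int
main(void)
{
	const int base_delay_ms = 20;	/* autovacuum_vacuum_cost_delay */
	int active;

	/* Each active worker's effective delay grows with the worker count. */
	for (active = 1; active <= 4; active++)
		printf("%d worker(s) -> effective delay %d ms each\n",
			   active, base_delay_ms * active);
	return 0;
}

So raising autovacuum_max_workers alone buys no extra vacuuming throughput;
the total I/O budget stays the same.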

  Wasn't there some discussion recently on measuring the length of the
  work queue, or something like that?
 
  Yes, there was. As I said, it is an expensive and approximate measure. I'm not 
  saying that is not the right direction; I'm arguing that a hint is better than 
  nothing. Right now the only way to know that autovacuum is out of workers is 
  to query pg_stat_activity frequently.

Well, I'm not saying there aren't other ways.

-- 
Álvaro Herrera alvhe...@commandprompt.com
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [HACKERS] autovacuum workers warning

2011-10-26 Thread Dickson S. Guedes
2011/10/26 Euler Taveira de Oliveira eu...@timbira.com:
 I'm not saying that is not the right direction; I'm arguing that a hint is
 better than nothing. Right now the only way to know that autovacuum is out of
 workers is to query pg_stat_activity frequently.

The current number of autovacuum workers could be included in the errmsg
instead of the errhint; the errhint could then be omitted from the patch
if there isn't a good hint to report.

-- 
Dickson S. Guedes
mail/xmpp: gue...@guedesoft.net - skype: guediz
http://guedesoft.net - http://www.postgresql.org.br



Re: [HACKERS] Cannot

2011-10-26 Thread Andrew Dunstan



On 10/26/2011 02:16 PM, David E. Wheeler wrote:

Suggested doc “patch”:

 grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g'




Why? "can not" is perfectly grammatical AFAIK.

cheers

andrew



Re: [HACKERS] Cannot

2011-10-26 Thread David E. Wheeler
On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote:

 Suggested doc “patch”:
 
 grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g'
 
 Why? "can not" is perfectly grammatical AFAIK.

True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml:

<para>
  <function>pg_advisory_xact_lock</> works the same as
  <function>pg_advisory_lock</>, expect the lock is automatically released
  at the end of the current transaction and can not be released explicitly.
 </para>

I read this as equivalent to "can be not released". Which of course is silly, 
so as I read it I realize what it means, but it trips up my overly logical 
brain. It interrupts the flow. There is no such confusion in "cannot be 
released" and thus no tripping up on meaning.

Best,

David


Re: [HACKERS] Cannot

2011-10-26 Thread Andrew Dunstan



On 10/26/2011 05:15 PM, David E. Wheeler wrote:

On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote:


Suggested doc “patch”:

 grep -lri 'can not' doc | xargs perl -i -pe 's/can not/cannot/g'

Why? "can not" is perfectly grammatical AFAIK.

True, but there's a logic issue. Take this example from doc/src/sgml/func.sgml:


<para>
  <function>pg_advisory_xact_lock</> works the same as
  <function>pg_advisory_lock</>, expect the lock is automatically released
  at the end of the current transaction and can not be released explicitly.
 </para>

I read this as equivalent to "can be not released". Which of course is silly, so as I 
read it I realize what it means, but it trips up my overly logical brain. It interrupts the flow. 
There is no such confusion in "cannot be released" and thus no tripping up on meaning.




Here's what I would do:

1. s/expect/except that/

2. s/can not be released explicitly/can not be explicitly released/

cheers

andrew




Re: [HACKERS] Cannot

2011-10-26 Thread Tom Lane
David E. Wheeler da...@kineticode.com writes:
 On Oct 26, 2011, at 2:06 PM, Andrew Dunstan wrote:
 Why? "can not" is perfectly grammatical AFAIK.

 True, but there's a logic issue. Take this example from 
 doc/src/sgml/func.sgml:

 <para>
 <function>pg_advisory_xact_lock</> works the same as
 <function>pg_advisory_lock</>, expect the lock is automatically released
 at the end of the current transaction and can not be released explicitly.
 </para>

 I read this as equivalent to "can be not released". Which of course is silly, 
 so as I read it I realize what it means, but it trips up my overly logical 
 brain. It interrupts the flow. There is no such confusion in "cannot be 
 released" and thus no tripping up on meaning.

This particular change seems like an improvement to me, but it's hardly
an adequate argument for a global search-and-replace.  There might be
other places where such a change renders things *less* readable.

regards, tom lane



Re: [HACKERS] Cannot

2011-10-26 Thread David E. Wheeler
On Oct 26, 2011, at 2:58 PM, Tom Lane wrote:

 I read this as equivalent to "can be not released". Which of course is 
 silly, so as I read it I realize what it means, but it trips up my overly 
 logical brain. It interrupts the flow. There is no such confusion in "cannot 
 be released" and thus no tripping up on meaning.
 
 This particular change seems like an improvement to me, but it's hardly
 an adequate argument for a global search-and-replace.  There might be
 other places where such a change renders things *less* readable.

The patch is actually quite modest; there are only a few instances of "can 
not". Attached.

Best,

David



cannot.patch
Description: Binary data



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Florian Pflug
On Oct 26, 2011, at 18:08, Simon Riggs wrote:
 On Wed, Oct 26, 2011 at 3:47 PM, Florian Pflug f...@phlo.org wrote:
 On Oct 26, 2011, at 15:57, Florian Pflug wrote:
 Thus, if the CLOG is extended after (or in the middle of) CheckPointGuts(),
 but before LogStandbySnapshot(), then we end up with a nextXid in the
 checkpoint whose CLOG page hasn't necessarily made it to disk yet. The
 longer CheckPointGuts() takes to finish its work, the more likely it
 becomes (assuming that CLOG writing and syncing doesn't happen at the very
 end). This fits the OP's observation of the problem vanishing when
 pg_start_backup() does an immediate checkpoint.
 
 As it turns out the correct fix is actually just to skip StartupCLOG()
 until the end of recovery because it does nothing useful when executed
 at that time. When I wrote the original code I remember thinking that
 StartupCLOG() is superfluous at that point.

Hm, that fixes the problem in the hot standby case, but as I said in my
reply to Chris Redekop, normal crash recovery is also at risk (although
the probability of hitting the bug is much smaller there). Here's my
reasoning from that other mail:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how:

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
  The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
  checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
  extension makes it to the disk, and before any trace of the XID assigned
  in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.

Quite aside from that concern, I think it's probably not a good idea
for the nextXid value of a checkpoint to depend on whether wal_level
was set to hot_standby or not. Our recovery code is already quite complex
and hard to test, and this just adds one more combination that has to
be thought about while coding and that needs to be tested.

My suggestion is to fix the CLOG problem in the same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts(). 

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.

We could then also fold GetOldestActiveTransactionId() back into
your proposed LogStandbySnapshot() and thus avoid needing two ProcArray
traversals.
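
In pseudocode -- all helper names below are hypothetical stand-ins for the
real xlog.c/standby.c code, only the ordering is the point -- that would
look something like this:

#include <stdint.h>

typedef struct
{
	uint64_t	redo;			/* WAL position replay starts from */
	uint32_t	oldestActiveXid;
	uint32_t	nextXid;
} CheckPointSketch;

extern uint32_t log_standby_snapshot(void);	/* logs snapshot, returns oldestActiveXid */
extern uint64_t get_insert_position(void);
extern void wait_for_concurrent_commits(uint64_t redo);
extern uint32_t read_next_xid(void);
extern void checkpoint_guts(uint64_t redo);
extern void write_checkpoint_record(const CheckPointSketch *cp);

static void
create_checkpoint_sketch(void)
{
	CheckPointSketch cp;

	cp.oldestActiveXid = log_standby_snapshot();	/* (1) snapshot first */
	cp.redo = get_insert_position();		/* (2) then fix REDO */
	wait_for_concurrent_commits(cp.redo);		/* (3) */
	cp.nextXid = read_next_xid();			/* (4) >= any advertised XID */
	checkpoint_guts(cp.redo);			/* (5) flush CLOG et al. */
	write_checkpoint_record(&cp);			/* (6) and the rest */
}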

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Robert Haas
On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 On Wed, Oct 26, 2011 at 5:08 PM, Simon Riggs si...@2ndquadrant.com wrote:

 Brewing a patch now.

 Latest thinking... confirmations or other error reports please.

 This fixes both the subtrans and clog bugs in one patch.

I don't see the point of changing StartupCLOG() to be an empty
function and adding a new function TrimCLOG() that does everything
StartupCLOG() used to do.  Seems simpler to just move the calls to
StartupCLOG() wherever they need to be - i.e. remove the one that
happens before WAL replay, and extricate the one at end-of-recovery
from the if block which currently contains it.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-26 Thread Tom Lane
Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

+1 ... I found that overly cute also.

regards, tom lane
