Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of the streamed WAL file
with xlogdump, I found that recent WAL records that walsender has already sent
don't exist in that WAL file. I expected that pg_receivexlog writes the streamed
WAL records to disk as soon as possible, but it doesn't. Is this intentional,
or a bug? Am I missing something?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

It writes it to disk as soon as possible, but doesn't fsync() until
the end of each segment. Are you by any chance looking at the file
while it's running?
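[Editorial note: the write-immediately, fsync-at-segment-end behavior Magnus
describes can be modeled with the following toy sketch. It is a simplified
Python model for illustration, not the actual receivelog.c code; the function
name and the tiny segment size used in examples are made up.]

```python
import os

XLOG_SEG_SIZE = 16 * 1024 * 1024  # PostgreSQL's default 16 MB WAL segment

def receive_loop(messages, path, seg_size=XLOG_SEG_SIZE):
    """Write each streamed message immediately; fsync() only when a
    segment boundary is reached. Toy model of the described behavior."""
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    written = synced_upto = 0
    try:
        for msg in messages:
            os.write(fd, msg)          # data hits the kernel right away
            written += len(msg)
            if written - synced_upto >= seg_size:
                os.fsync(fd)           # durable only at segment end
                synced_upto = written
    finally:
        os.close(fd)                   # note: no final fsync on exit
    return written, synced_upto
```

The point of the model: everything is write()n as it arrives, so the data is
visible in the file; only durability (fsync) lags until the segment completes.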

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

No. I looked at that file after shutting down the master server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

Ugh, in that case something is certainly wrong. There is nothing but
some offset-value bookkeeping between PQgetCopyData() and write()...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file 
 by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

 Ugh, in that case something is certainly wrong. There is nothing but
 setting up some offset values between PQgetCopyData() and write()...

When the end of the copy stream is reached or an error happens, pg_receivexlog
exits without flushing outstanding WAL records, which seems to cause
the problem I reported.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 09:57, Heikki Linnakangas wrote:

My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.


Hmm, I don't think that fully fixes the problem. Even if you're certain
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
decide to truncate it away again while the checkpoint is running.


Oh, scratch that. During recovery, we merrily treat missing SLRU files 
as if they were filled with zeros.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 10:12, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki 
 found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file 
 by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the 
 streamed WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

 Ugh, in that case something is certainly wrong. There is nothing but
 setting up some offset values between PQgetCopyData() and write()...

 When end-of-copy stream is found or an error happens, pg_receivexlog
 exits without flushing outstanding WAL records. Which seems to cause
 the problem I reported.

Not sure I follow. When we arrive at PQgetCopyData() there should be
nothing buffered, and if the end of stream happens there it returns
-1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is
that what you are referring to? But the data should already have been
write()d, so it should still be there, no?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

Oh, right. Hmm.. xlogdump might be the cause.

Though I've not read the code of xlogdump, I wonder if it gives up
outputting the contents of the WAL file when it finds a partial WAL page...
It strikes me that the recovery code has the same problem, no?
IOW, when a partial WAL page is found during recovery, I'm afraid
that page would not be replayed even though it contains valid data.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then
we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another, since that is a very confusing way
of doing things. I wrote it that way first and it definitely looks
strange.

It's much easier to understand that StartupCLOG() is actually a no-op
and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better way of expressing
what needs to be done, so I think the patch should stay that way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 02:29, Florian Pflug wrote:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetTransactionId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.


No, clog extension is WAL-logged while holding the XidGenLock. At step 
3, LogStandbySnapshot() would block until the clog-extension record is 
written to WAL, so crash recovery would see and replay that record 
before calling StartupCLOG().


That can happen during hot standby, though, because StartupCLOG() is 
called earlier.



My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.


Hmm, I don't think that fully fixes the problem. Even if you're certain 
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might 
decide to truncate it away again while the checkpoint is running.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

My concern was right: when I performed a recovery using the streamed
WAL files, data loss happened. A partial WAL page was not replayed.

To avoid this problem, I think we should change pg_receivexlog so
that it writes WAL data *by the block*, or creates, like walreceiver, the WAL
file before writing any data. Otherwise, though pg_receivexlog streams WAL
data in real time, the latest WAL data might not be available for recovery.
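[Editorial note: the walreceiver-style pre-creation Fujii suggests can be
sketched as below. This is a simplified Python illustration, not PostgreSQL's
C implementation; the function name and the small segment size in the example
are made up.]

```python
import os

def open_walfile(path, seg_size=16 * 1024 * 1024):
    """Create the file at full segment size, zero-filled, before any
    stream data is written into it. Streamed records then overwrite
    the zeros in place, so the file is always segment-sized."""
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    os.write(fd, b"\0" * seg_size)   # pad the whole segment up front
    os.fsync(fd)                     # make the zero padding durable
    os.lseek(fd, 0, os.SEEK_SET)     # rewind; stream data overwrites zeros
    return fd
```

With this approach a partially streamed segment still ends at a zero-padded
page boundary, which is what recovery expects to see.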

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 12:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

 My concern was right. When I performed a recovery using the streamed
 WAL files, the loss of data happened. A partial WAL page was not replayed.

 To avoid this problem, I think that we should change pg_receivexlog so
 that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
 before writing any data. Otherwise, though pg_receivexlog streams WAL
 data in realtime, the latest WAL data might not be available for recovery.

Ah, so you were recovering data from the last, partial, file? Not from
a completed file?

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 12:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

 My concern was right. When I performed a recovery using the streamed
 WAL files, the loss of data happened. A partial WAL page was not replayed.

 To avoid this problem, I think that we should change pg_receivexlog so
 that it writes WAL data *by the block*, or creates, like walreceiver, WAL 
 file
 before writing any data. Otherwise, though pg_receivexlog streams WAL
 data in realtime, the latest WAL data might not be available for recovery.

 Ah, so you were recovering data from the last, partial, file? Not from
 a completed file?

Yes. I copied all the streamed WAL files to the pg_xlog directory and started recovery.

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

Yes. But that sounds user-unfriendly. Is padding the WAL file manually
easy for a user to do?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 14:09, Fujii Masao wrote:

On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagandermag...@hagander.net  wrote:

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


Yes. But that sounds unuserfriendly. Padding the WAL file manually
is easy-to-do for a user?


truncate -s 16M file works at least on my Linux system. Not sure how 
you'd do it on Windows.


Perhaps we should add automatic padding in the server, though. It 
wouldn't take much code in the server, and would make life easier for 
people writing their scripts. Maybe even have the backend check for a 
.partial file if it can't find a regularly named XLOG file. Needs some 
thought..
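[Editorial note: the padding step under discussion, equivalent to
"truncate -s 16M" but portable, could be sketched as below. A simplified
Python illustration under the thread's proposed ".partial" naming; the
function name is made up, and zero-filling on extension is POSIX behavior.]

```python
import os

def pad_partial(path, seg_size=16 * 1024 * 1024):
    """Pad a .partial WAL file with zeros to the full segment size and
    drop the suffix, so recovery will replay its last, partial page."""
    if os.path.getsize(path) > seg_size:
        raise ValueError("file is larger than a WAL segment")
    with open(path, "r+b") as f:
        f.truncate(seg_size)         # extending truncate zero-fills (POSIX)
    if path.endswith(".partial"):
        final = path[:-len(".partial")]
        os.rename(path, final)       # now looks like a completed segment
        return final
    return path
```

After padding and renaming, the file is indistinguishable in shape from a
completed segment, so archive recovery can consume it normally.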


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:

 On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagandermag...@hagander.net
  wrote:

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 truncate -s 16M file works at least on my Linux system. Not sure how
 you'd do it on Windows.

Yeah, that's easy enough. I don't think there are similar tools on
Windows, but we could probably put together a quick script for people
to use if necessary.


 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

I'd definitely want to avoid anything that requires pg_receivexlog to
actually *parse* the WAL. That'll make it way more complex than I'd
like.

Having recovery consider a .partial file might be interesting. It
could consider that only if there are no other complete files
available, or something like that?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

 It would have been even easier to move StartupCLOG() later, but then
 we'd need a big comment explaining why CLOG starts up at one point and
 subtrans starts up at another point, since that is very confusing way
 of doing things. I wrote it that way first and it definitely looks
 strange.

 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all?  I know we have a bunch of places
in the code where we have empty stubs where there used to be
initialization or cleanup code, but I've never found that particularly
good style.  If something no longer requires initialization in a
certain place, I think we should nuke the whole function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 7:19 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:

 On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagandermag...@hagander.net
  wrote:

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 truncate -s 16M file works at least on my Linux system. Not sure how
 you'd do it on Windows.

One of the common complaints I hear about PostgreSQL is that our replication
system is more difficult to set up than people would like, and it's
too easy to make mistakes that can corrupt your data without realizing
it; I don't think making users truncate a file to 16 megabytes
is going to improve things there.

 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

+1 for figuring out something, though I'm not sure exactly what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
 On 27.10.2011 02:29, Florian Pflug wrote:
 Per my theory about the cause of the problem in my other mail, I think you
 might see StartupCLOG failures even during crash recovery, provided that
 wal_level was set to hot_standby when the primary crashed. Here's how
 
 1) We start a checkpoint, and get as far as LogStandbySnapshot()
 2) A backend does AssignTransactionId, and gets as far as GetTransactionId().
   The assigned XID requires CLOG extension.
 3) The checkpoint continues, and LogStandbySnapshot() advances the
   checkpoint's nextXid to the XID assigned in (2).
 4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.
 
 Then StartupCLOG() would fail at the end of recovery, because we'd end up
 with a nextXid whose corresponding CLOG page doesn't exist.
 
 No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
 LogStandbySnapshot() would block until the clog-extension record is written
 to WAL, so crash recovery would see and replay that record before calling
 StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the
checkpoint's nextXid, and even more wrong to do that only if wal_level =
hot_standby. Plus, I think it's a smart idea to verify that the required
parts of the CLOG are available at the start of recovery. Because if they're
missing, the data on the standby *will* be corrupted. Is there any argument
against doing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid
is computed)?

best regards,
Florian Pflug





Re: [HACKERS] isolationtester and invalid permutations

2011-10-27 Thread Robert Haas
On Tue, Oct 25, 2011 at 9:13 PM, Alvaro Herrera alvhe...@alvh.no-ip.org wrote:
 Instead of simply aborting a spec that specifies running commands on
 blocked sessions (what we call an invalid permutation), it seems more
 useful to report the problem, cleanup the sessions, and continue with
 the next permutation.

 This, in conjunction with the dry-run patch I submitted earlier, makes
 it easier to determine a working spec: dry-run the spec; copy the
 so-generated permutation lines into the spec; run the spec normally,
 which reports the invalid permutations; comment out the invalid
 permutations from the spec; done.

 The attached patch, again from Alexander Shulgin (with some tweaks from
 me) does that.

 Comments?

Seems sensible.  I think we should avoid including invalid
permutations in our regression test suite, but this still seems useful
for the reasons you mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Erik Rijkers
(pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)

Hi,

here is another peculiarity, which I think is a bug:

-- first without CORRESPONDING:

$ psql -Xaf null.sql
select 1 a   , 2 b
union all
select null a, 4 b ;
 a | b
---+---
 1 | 2
   | 4
(2 rows)

-- then with CORRESPONDING:

select 1 a   , 2 b
union all
corresponding
select null a, 4 b ;
psql:null.sql:9: ERROR:  failed to find conversion function from unknown to integer


If the null value is in a table column the error does not occur:

drop table if exists t1; create table t1 (a int, b int); insert into t1 values 
(1,2);
drop table if exists t2; create table t2 (a int, b int); insert into t2 values 
(null,2);
select a,b from t1
union all
corresponding
select a,b from t2 ;
 a | b
---+---
 1 | 2
   | 2
(2 rows)


I'm not sure it is actually a bug, but it seems an unnecessary error.


thanks,

Erik Rijkers




[HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
Chris Redekop's recent report of slow startup for Hot Standby has made
me revisit the code there.

Although there isn't a bug, there is a missed opportunity for starting
up faster which could be the source of Chris' annoyance.

The following patch allows a faster startup in some circumstances.

The patch also alters the log levels for messages and gives a single
simple message for this situation. The log will now say

  LOG:  recovery snapshot waiting for non-overflowed snapshot or until
oldest active xid on standby is at least %u (now %u)
  ...multiple times until snapshot non-overflowed or xid reached...

whereas before the first LOG message shown was

  LOG:  consistent state delayed because recovery snapshot incomplete
  and only later, at DEBUG2 do you see
  LOG:  recovery snapshot waiting for %u oldest active xid on standby is %u
  ...multiple times until xid reached...

Comments please.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_hot_standby_startup_withsubxacts.v1.patch
Description: Binary data



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane t...@sss.pgh.pa.us wrote:
 Robert Haas robertmh...@gmail.com writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs si...@2ndquadrant.com 
 wrote:
 This fixes both the subtrans and clog bugs in one patch.

 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.

 +1 ... I found that overly cute also.

 It would have been even easier to move StartupCLOG() later, but then
 we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another point, since that is a very confusing way
 of doing things. I wrote it that way first and it definitely looks
 strange.

 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

 If it's a no-op, why have it at all?  I know we have a bunch of places
 in the code where we have empty stubs where there used to be
 initialization or cleanup code, but I've never found that particularly
 good style.  If something no longer requires initialization in a
 certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are
no-ops - it used to do something but now does not.

Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
will immediately ask "why?".
IMHO it's easier to have an obviously named function than a comment -
it's less invasive for a backpatch as well.

I'm following current code style. If you wish to change that, feel
free to change this and all other locations that do this. Until then,
doing this makes most sense and follows current coding style.

If I had done it the way you suggest, I don't doubt someone would say
in about 6 months, "Which idiot removed StartupClog()?".

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote:

 Per my theory about the cause of the problem in my other mail, I think you
 might see StartupCLOG failures even during crash recovery, provided that
 wal_level was set to hot_standby when the primary crashed. Here's how

 1) We start a checkpoint, and get as far as LogStandbySnapshot()
 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
  The assigned XID requires CLOG extension.
 3) The checkpoint continues, and LogStandbySnapshot() advances the
  checkpoint's nextXid to the XID assigned in (2).
 4) We crash after writing the checkpoint record, but before the CLOG
  extension makes it to the disk, and before any trace of the XID assigned
  in (2) makes it to the xlog.

 Then StartupCLOG() would fail at the end of recovery, because we'd end up
 with a nextXid whose corresponding CLOG page doesn't exist.

Clog extension holds XidGenLock, as does LogStandbySnapshot, which
specifically excludes the above scenario.


 Quite aside from that concern, I think it's probably not a good idea
 for the nextXid value of a checkpoint to depend on whether wal_level
 was set to hot_standby or not. Our recovery code is already quite complex
 and hard to test, and this just adds one more combination that has to
 be thought about while coding and that needs to be tested.

 My suggestion is to fix the CLOG problem in that same way that you fixed
 the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
 CheckPointGuts().

 Here's what I image CreateCheckPoint() should look like:

 1) LogStandbySnapshot() and fill out oldestActiveXid
 2) Fill out REDO
 3) Wait for concurrent commits
 4) Fill out nextXid and the other fields
 5) CheckPointGuts()
 6) Rest

 It's then no longer necessary for LogStandbySnapshot() to modify
 the nextXid, since we fill out nextXid after LogStandbySnapshot() and
 will thus derive a higher value than LogStandbySnapshot() would have.

 We could then also fold GetOldestActiveTransactionId() back into
 your proposed LogStandbySnapshot() and thus don't need two ProcArray
 traversals.

I think you make a good case for doing this.

However, I'm concerned that moving LogStandbySnapshot() in a backpatch
seems more risky than it's worth. We could easily introduce a new bug
into what we would all agree is a complex piece of code. Minimal
change seems best in this case.

And also, 2 ProcArray traversals is not a problem there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 15:51 , Simon Riggs wrote:
 On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug f...@phlo.org wrote:
 Here's what I image CreateCheckPoint() should look like:
 
 1) LogStandbySnapshot() and fill out oldestActiveXid
 2) Fill out REDO
 3) Wait for concurrent commits
 4) Fill out nextXid and the other fields
 5) CheckPointGuts()
 6) Rest
 
  It's then no longer necessary for LogStandbySnapshot() to modify
 the nextXid, since we fill out nextXid after LogStandbySnapshot() and
 will thus derive a higher value than LogStandbySnapshot() would have.
 
 We could then also fold GetOldestActiveTransactionId() back into
 your proposed LogStandbySnapshot() and thus don't need two ProcArray
 traversals.
 
 I think you make a good case for doing this.
 
 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.

OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
Your proposed patch changes that, which also carries a risk since something
could depend on these values being in sync. Especially since both the logged
snapshot and oldestActiveXid influence the snapshot tracking on the slave.

But since you wrote most of that code, your judgement about the relative
risks of these two approaches obviously outweighs mine.

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Tom Lane
Simon Riggs si...@2ndquadrant.com writes:
 On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas robertmh...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs si...@2ndquadrant.com wrote:
 It's much easier to understand that StartupCLOG() is actually a no-op
 and that we need to trim the clog at the end of recovery in all cases.

 If it's a no-op, why have it at all?

 It is a no-op for exactly the same reason other similar functions are
 no-ops - it used to do something but now does not.

 Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
 will immediately ask why?.

I think it's a good point that StartupCLog doesn't exist in a vacuum
but should be parallel to the init functions for the other SLRU modules.
So at this point I think I agree with Simon's approach.  However, the
obvious next question is whether those other modules don't need to be
changed also, and if not why not.

Another issue is that if StartupCLog is left as a no-op, what will
happen if someone mistakenly tries to access clog before the trim
function is called?  It would be a good idea to make sure that such
a thing results in an easily-identifiable failure.
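One minimal way to get such an identifiable failure is a guard flag that only the trim step sets, so any earlier lookup fails loudly instead of returning garbage. The names below are illustrative stand-ins, not the actual clog.c/slru.c code:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Set only by the trim step at the end of recovery */
static int clog_trimmed = 0;

static void
trim_clog_sketch(void)
{
	clog_trimmed = 1;
}

/*
 * Returns a fake transaction status; the guard makes premature access
 * abort with a clear message rather than silently misbehaving.
 */
static int
clog_get_status_sketch(unsigned int xid)
{
	if (!clog_trimmed)
	{
		fprintf(stderr, "clog accessed before trim (xid %u)\n", xid);
		abort();
	}
	return 0;
}
```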

regards, tom lane



Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup

2011-10-27 Thread Magnus Hagander
On Wed, Oct 26, 2011 at 09:52, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 (CC'ing pgsql-hackers, this started as an IM discussion yesterday but really
 belongs in the archives)

 On 25.10.2011 23:52, Magnus Hagander wrote:

 There's a tiny chance to get incomplete xlog files with pg_receivexlog if
 you crash:
 1. pg_receivexlog finishes write()ing a file but system crashes before
 fsync() finishes.
 2. When pg_receivexlog restarts after crash, the last WAL file was not
 fully flushed to disk, with
 holes in the middle, but it has the right length. pg_receivexlog will
 continue streaming from the next file.
 not sure if we care about such a narrow window, but maybe we do

 So how would we go about fixing that?  Always unlink the last file in
 the directory and try from there would seem dangerous too - what if
 it's not available on the master anymore, then we might have given up
 on data...

 Start streaming from the beginning of the last segment, but don't unlink it
 first. Just overwrite it as you receive the data.

 Or, always create new xlog file as 0001000100D3.partial, and
 only when it's fully written, fsync it, and then rename it to
 0001000100D3. Then you know that if a file doesn't have the
 .partial suffix, it's complete. The fact that the last partial file always
 has the .partial suffix needs some extra pushups in the restore_command,
 though.
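In outline, that scheme is the classic write-to-temp-name-then-rename protocol. A minimal sketch, with an invented helper name rather than the actual pg_receivexlog code:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/*
 * Write a segment's data to "<name>.partial", fsync it, and only then
 * rename it to its final name. A file without the .partial suffix is
 * therefore guaranteed to be complete and flushed to disk.
 */
static int
finish_segment_sketch(const char *final_name, const char *data, size_t len)
{
	char		partial[1024];
	FILE	   *f;

	snprintf(partial, sizeof(partial), "%s.partial", final_name);
	f = fopen(partial, "wb");
	if (f == NULL)
		return -1;
	if (fwrite(data, 1, len, f) != len ||
		fflush(f) != 0 ||
		fsync(fileno(f)) != 0)
	{
		fclose(f);
		return -1;
	}
	if (fclose(f) != 0)
		return -1;
	/* only a fully written, durable segment ever gets the final name */
	return rename(partial, final_name);
}
```

On restart, any leftover *.partial file is by definition the single incomplete segment, so it can simply be overwritten from the start of that segment.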

Here's a version that does this. Turns out this requires a lot less
code than what was previously in there, which is always nice.

We still need to solve the other part which is how to deal with the
partial files on restore. But this is definitely a cleaner way from a
pure pg_receivexlog perspective.

Comments/reviews?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***
*** 71,104  usage(void)
  static bool
  segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
- 	char		fn[MAXPGPATH];
- 	struct stat statbuf;
- 
  	if (verbose)
  		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
  				progname, segendpos.xlogid, segendpos.xrecoff, timeline);
  
  	/*
- 	 * Check if there is a partial file for the name we just finished, and if
- 	 * there is, remove it under the assumption that we have now got all the
- 	 * data we need.
- 	 */
- 	segendpos.xrecoff /= XLOG_SEG_SIZE;
- 	PrevLogSeg(segendpos.xlogid, segendpos.xrecoff);
- 	snprintf(fn, sizeof(fn), "%s/%08X%08X%08X.partial",
- 			 basedir, timeline,
- 			 segendpos.xlogid,
- 			 segendpos.xrecoff);
- 	if (stat(fn, &statbuf) == 0)
- 	{
- 		/* File existed, get rid of it */
- 		if (verbose)
- 			fprintf(stderr, _("%s: removing file \"%s\"\n"),
- 					progname, fn);
- 		unlink(fn);
- 	}
- 
- 	/*
  	 * Never abort from this - we handle all aborting in continue_streaming()
  	 */
  	return false;
--- 71,81 
***
*** 133,139  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  	bool		b;
  	uint32		high_log = 0;
  	uint32		high_seg = 0;
- 	bool		partial = false;
  
  	dir = opendir(basedir);
  	if (dir == NULL)
--- 110,115 
***
*** 195,201  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  			disconnect_and_exit(1);
  		}
  
! 		if (statbuf.st_size == 16 * 1024 * 1024)
  		{
  			/* Completed segment */
  			if (log > high_log ||
--- 171,177 
  			disconnect_and_exit(1);
  		}
  
! 		if (statbuf.st_size == XLOG_SEG_SIZE)
  		{
  			/* Completed segment */
  			if (log > high_log ||
***
*** 208,244  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  		}
  		else
  		{
! 			/*
! 			 * This is a partial file. Rename it out of the way.
! 			 */
! 			char		newfn[MAXPGPATH];
! 
! 			fprintf(stderr, _("%s: renaming partial file \"%s\" to \"%s.partial\"\n"),
! 					progname, dirent->d_name, dirent->d_name);
! 
! 			snprintf(newfn, sizeof(newfn), "%s/%s.partial",
! 					 basedir, dirent->d_name);
! 
! 			if (stat(newfn, &statbuf) == 0)
! 			{
! /*
!  * XXX: perhaps we should only error out if the existing file
!  * is larger?
!  */
! fprintf(stderr, _("%s: file \"%s\" already exists. Check and clean up manually.\n"),
! 		progname, newfn);
! disconnect_and_exit(1);
! 			}
! 			if (rename(fullpath, newfn) != 0)
! 			{
! fprintf(stderr, _("%s: could not rename \"%s\" to \"%s\": %s\n"),
! 		progname, fullpath, newfn, strerror(errno));
! disconnect_and_exit(1);
! 			}
! 
! 			/* Don't continue looking for more, we assume this is the last */
! 			partial = true;
! 			break;
  		}
  	}
  
--- 184,192 
  		}
  		else
  		{
! 			fprintf(stderr, _("%s: segment file '%s' is incorrect size %d, skipping\n"),
! 					progname, dirent->d_name, (int) statbuf.st_size);
! 			continue;
  		}
  	}
  
***
*** 247,263  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  	if (high_log  

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote:

 I think you make a good case for doing this.

 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.

 OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
 Your proposed patch changes that, which also carries a risk since something
 could depend on these values being in sync. Especially since both the logged
 snapshot and oldestActiveXid influence the snapshot tracking on the slave.

 But since you wrote most of that code, your judgement about the relative
 risks of these two approaches obviously out-weights mine.

We must move oldestActiveXid since that is the source of a bug. There
is no need to move LogStandbySnapshot(), so I am suggesting we don't
do that for the backpatch. I was going to implement it the way you
suggest in HEAD, since I agree that is a cleaner way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds user-unfriendly. Is padding the WAL file manually
 easy for a user to do?

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

What parsing?  Just pad to 16MB with zeroes.  In fact, I think the
receiver should just create the file that size to start with, and then
write received data into it, much like normal WAL creation does.
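That pre-allocation amounts to something like the following hedged sketch; the helper name is invented, and 16MB matches the default XLOG_SEG_SIZE:

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

#define SEG_SIZE (16 * 1024 * 1024)		/* default XLOG_SEG_SIZE */

/*
 * Create the segment at its full size, zero-filled, up front; received
 * WAL is then written into it in place. Note that the file length alone
 * then no longer says how much valid data the file holds.
 */
static int
create_padded_segment_sketch(const char *path)
{
	static const char zeroes[8192];		/* zero-initialized */
	FILE	   *f = fopen(path, "wb");
	size_t		written = 0;

	if (f == NULL)
		return -1;
	while (written < SEG_SIZE)
	{
		if (fwrite(zeroes, 1, sizeof(zeroes), f) != sizeof(zeroes))
		{
			fclose(f);
			return -1;
		}
		written += sizeof(zeroes);
	}
	if (fclose(f) != 0)
		return -1;
	return 0;
}
```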

regards, tom lane



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 16:54, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds user-unfriendly. Is padding the WAL file manually
 easy for a user to do?

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

 What parsing?  Just pad to 16MB with zeroes.  In fact, I think the

I'm just saying that *if* parsing is required, it would be bad.

 receiver should just create the file that size to start with, and then
 write received data into it, much like normal WAL creation does.

So when pg_receivexlog starts up, how would it know if the last file
represents a completed file, or a half-full file, without actually
parsing it? It could be a 16MB file with 10 bytes of valid data, or a
complete file with 16MB of valid data.

We could always ask for a retransmit of the whole file, but if that
file is gone on the master, we won't be able to do that, and will
error out in a situation that's not actually an error.

Though I guess if we leave the file as .partial up until this point
(per my other patch just posted), I guess this doesn't actually apply
- if the file is called .partial, we'll overwrite into it. If it's
not, then we assume it's a complete segment.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

What about creating the WAL file filled up with zeroes at the receiving
end and then overwriting data as we receive it?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane t...@sss.pgh.pa.us wrote:

 However, the
 obvious next question is whether those other modules don't need to be
 changed also, and if not why not.

Good point.

StartupSubtrans() is also changed by this patch, since it will be
supplied with an earlier initialisation value.

StartupMultiXact() didn't need changing, I thought, but I will review further.

 Another issue is that if StartupCLog is left as a no-op, what will
 happen if someone mistakenly tries to access clog before the trim
 function is called?  It would be a good idea to make sure that such
 a thing results in an easily-identifiable failure.

The old StartupCLOG() didn't do anything that was essential to using
the clog, which is why it's a no-op.

You can still use the clog, just with zero startup.

Maybe setting the current page should go in at startup, will think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
On the docs page for the SELECT statement, there is a caution which
starts with:
 
| It is possible for a SELECT command using ORDER BY and FOR
| UPDATE/SHARE to return rows out of order. This is because ORDER BY
| is applied first.
 
Is this risk limited to queries running in READ COMMITTED
transactions?  If so, I think that should be mentioned in the
caution.
 
-Kevin



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 On the docs page for the SELECT statement, there is a caution which
 starts with:

 | It is possible for a SELECT command using ORDER BY and FOR
 | UPDATE/SHARE to return rows out of order. This is because ORDER BY
 | is applied first.

 Is this risk limited to queries running in READ COMMITTED
 transactions?  If so, I think that should be mentioned in the
 caution.

I think it should say that if this occurs with SERIALIZED transactions
it will result in a serialisation error.

Just to say there is no effect in serializable mode wouldn't be helpful.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread Tom Lane
Some off-list discussion found that the cause of this problem:
http://archives.postgresql.org/pgsql-general/2011-10/msg00879.php
was an attempt to write a plpgsql IF-ELSIF-ELSIF-ELSIF-ELSIF-...-ELSE
statement with five thousand branches.  Putting aside the wisdom of
doing that, it seems like the parser ought not fall over when you do.
The reason it's doing so is that plpgsql's grammar handles ELSIF with a
rule like this:

stmt_else: K_ELSIF expr_until_then proc_sect stmt_else

This is called right recursion in the Bison manual (as opposed to
left recursion, where a nonterminal has itself as the *first*
item in its own expansion), and the manual specifically says you
should always avoid right recursion in favor of left recursion,
else you are vulnerable to parser stack overflow.  Duh.

So, looking a bit more closely into why it's done that way, it's
because ELSIF was shoehorned into a parsetree representation that
was only meant for plain IF-THEN-ELSE.  The generated parse tree
will have a new level of PLpgSQL_stmt_if struct for each ELSIF.
That means that even if we fix the grammar, we're still at risk of
stack overflow at execution time because of recursion in pl_exec.c.

So really what needs to be done here is to make ELSIF chains explicit in
the parsetree representation, and handle them via looping not recursion
at runtime.  This will also make it a lot easier to do the grammar via
left-recursion.
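The runtime half of that fix is the standard recursion-to-loop transformation: hold the ELSIF arms in a flat list and scan it iteratively, so execution depth no longer grows with the number of branches. A toy sketch with invented types, not the real PLpgSQL_stmt_if:

```c
#include <assert.h>
#include <stddef.h>

typedef struct ElsifArm
{
	int			cond;		/* stands in for evaluating the arm's expression */
	int			result;		/* stands in for executing the arm's statements */
} ElsifArm;

/*
 * Execute IF ... ELSIF ... ELSIF ... ELSE as a loop over a flat arm
 * list: the first arm whose condition is true wins, otherwise the ELSE
 * result is used. Even a 5000-arm chain uses constant stack depth.
 */
static int
exec_if_sketch(const ElsifArm *arms, size_t narms, int else_result)
{
	for (size_t i = 0; i < narms; i++)
	{
		if (arms[i].cond)
			return arms[i].result;
	}
	return else_result;
}
```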

So I'm going to go off and do that, but I wonder whether anyone thinks
this is sufficiently important to back-patch.  I'm inclined to think
that back-patching isn't a good idea, because changing the
representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql
debugger; ISTM the number of complaints isn't enough to warrant doing
that in released branches.

More generally, it seems like a good idea to check whether there are
any other unnecessary uses of right recursion in the core and plpgsql
grammars.  Unless somebody knows of an existing tool to do that,
I'm thinking of writing a little perl script to check for it (probably
by parsing the output of bison -v) and adding the script to src/tools/
so we'll have it around to check again in the future.  Any objections?

regards, tom lane



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Thanks for the patch Simon, but unfortunately it does not resolve the issue
I am seeing.  The standby still refuses to finish starting up until long
after all clients have disconnected from the primary (10 minutes).  I do
see your new log statement on startup, but only once - it does not repeat.
 Is there any way for me to see what the oldest xid on the standby is via
controldata or something like that?  The standby does stream to keep up with
the primary while the primary has load, and then it becomes idle when the
primary becomes idle (when I kill all the connections), so it appears to
be current...but it just doesn't finish starting up.

I'm not sure if it's relevant, but after it has sat idle for a couple
minutes I start seeing these statements in the log (with the same offset
every time):
DEBUG:  skipping restartpoint, already performed at 9/9520



On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs si...@2ndquadrant.com wrote:

 Chris Redekop's recent report of slow startup for Hot Standby has made
 me revisit the code there.

 Although there isn't a bug, there is a missed opportunity for starting
 up faster which could be the source of Chris' annoyance.

 The following patch allows a faster startup in some circumstances.

 The patch also alters the log levels for messages and gives a single
 simple message for this situation. The log will now say

  LOG:  recovery snapshot waiting for non-overflowed snapshot or until
 oldest active xid on standby is at least %u (now %u)
  ...multiple times until snapshot non-overflowed or xid reached...

 whereas before the first LOG message shown was

  LOG:  consistent state delayed because recovery snapshot incomplete
  and only later, at DEBUG2 do you see
  LOG:  recovery snapshot waiting for %u oldest active xid on standby is %u
  ...multiple times until xid reached...

 Comments please.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training & Services






Re: [HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread David E. Wheeler
On Oct 27, 2011, at 9:18 AM, Tom Lane wrote:

 So I'm going to go off and do that, but I wonder whether anyone thinks
 this is sufficiently important to back-patch.  I'm inclined to think
 that back-patching isn't a good idea, because changing the
 representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql
 debugger; ISTM the number of complaints isn't enough to warrant doing
 that in released branches.

+1 to not back-patching. Seems like it doesn't come up all that often, right?

Best,

David




Re: [HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 19:18, Tom Lane wrote:

So really what needs to be done here is to make ELSIF chains explicit in
the parsetree representation, and handle them via looping not recursion
at runtime.  This will also make it a lot easier to do the grammar via
left-recursion.

So I'm going to go off and do that, but I wonder whether anyone thinks
this is sufficiently important to back-patch.


This doesn't look like a bug to me, just a suboptimal implementation. So 
-1 for backpatching.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop ch...@replicon.com wrote:

 Thanks for the patch Simon, but unfortunately it does not resolve the issue
 I am seeing.  The standby still refuses to finish starting up until long
 after all clients have disconnected from the primary (10 minutes).  I do
 see your new log statement on startup, but only once - it does not repeat.
  Is there any way for me to see  what the oldest xid on the standby is via
 controldata or something like that?  The standby does stream to keep up with
 the primary while the primary has load, and then it becomes idle when the
 primary becomes idle (when I kill all the connections), so it appears to
 be current...but it just doesn't finish starting up.
 I'm not sure if it's relevant, but after it has sat idle for a couple
 minutes I start seeing these statements in the log (with the same offset
 every time):
 DEBUG:  skipping restartpoint, already performed at 9/9520

OK, so it looks like there are 2 opportunities to improve, not just one.

Try this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_hot_standby_startup_withsubxacts.v2.patch
Description: Binary data



Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)

2011-10-27 Thread Bruce Momjian
Greg Jaskiewicz wrote:
 
 On 19 Oct 2011, at 18:28, Florian Pflug wrote:
  All the other flags which indicate cancellation reasons are set from signal 
  handlers, I believe. We could of course mark ClientConnectionLostPending 
  as volatile just to be consistent. Not sure whether that's a good idea, or 
  not. It might prevent a mistake should we ever add code to detect lost 
  connections asynchronously (i.e., from somewhere other than pq_flush). And 
  the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for 
  InterruptPending before calling ProcessInterrupts(), so we only pay the 
  cost of volatile if there's actually an interrupt pending. But I still 
  think it's better to add qualifiers such as volatile only when really 
  necessary. A comment about why it *isn't* volatile is probably in order, 
  though, so I'll add that in the next version of the patch.
 
 Makes sense.
 
 I had to ask, because it sticks out. And indeed there is a possibility
 that someone will one day try to use it from a signal handler, etc.

A C comment might help there.
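For reference, the convention being discussed looks roughly like this; the declarations are illustrative stand-ins, not the actual backend globals:

```c
#include <assert.h>
#include <signal.h>

/*
 * Flags written from signal handlers must be volatile sig_atomic_t so
 * the main loop always re-reads them after an asynchronous write.
 */
static volatile sig_atomic_t InterruptPendingSketch = 0;

static void
sigint_handler_sketch(int signo)
{
	(void) signo;
	InterruptPendingSketch = 1;		/* asynchronous write */
}

/*
 * Set only synchronously (e.g. when a flush notices the connection is
 * gone), so volatile is not strictly required -- a comment saying so is
 * what keeps a future signal-handler user from getting it wrong.
 */
static int ClientConnectionLostPendingSketch = 0;
```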

--
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 On the docs page for the SELECT statement, there is a caution
 which starts with:

 | It is possible for a SELECT command using ORDER BY and FOR
 | UPDATE/SHARE to return rows out of order. This is because ORDER
 | BY is applied first.

 Is this risk limited to queries running in READ COMMITTED
 transactions?  If so, I think that should be mentioned in the
 caution.
 
 I think it should say that if this occurs with SERIALIZED
 transactions it will result in a serialisation error.
 
 Just to say there is no effect in serializable mode wouldn't be
 helpful.
 
Hmm.  At first reading I thought this was related to the
mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. 
Does anyone know which isolation levels are affected?  Barring that,
can anyone point to an existing test which demonstrates the problem?
 
If this can happen in snapshot isolation with just one reader and
one writer, I doubt that SSI helps with it.  :-(
 
-Kevin



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 1:51 PM, Kevin Grittner
kevin.gritt...@wicourts.gov wrote:
 Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 On the docs page for the SELECT statement, there is a caution
 which starts with:

 | It is possible for a SELECT command using ORDER BY and FOR
 | UPDATE/SHARE to return rows out of order. This is because ORDER
 | BY is applied first.

 Is this risk limited to queries running in READ COMMITTED
 transactions?  If so, I think that should be mentioned in the
 caution.

 I think it should say that if this occurs with SERIALIZED
 transactions it will result in a serialisation error.

 Just to say there is no effect in serializable mode wouldn't be
 helpful.

 Hmm.  At first reading I thought this was related to the
 mixed-snapshot issue in READ COMMITTED, but now I'm not so sure.
 Does anyone know which isolation levels are affected?  Barring that,
 can anyone point to an existing test which demonstrates the problem?

 If this can happen in snapshot isolation with just one reader and
 one writer, I doubt that SSI helps with it.  :-(

Simple test case:

rhaas=# create table oops (a int);
CREATE TABLE
rhaas=# insert into oops values (1), (2), (3), (4);
INSERT 0 4
rhaas=# begin;
BEGIN
rhaas=# update oops set a = 5 where a = 2;
UPDATE 1

In another session:

rhaas=# select * from oops order by 1 for update;
this blocks

Back to the first session:

rhaas=# commit;
COMMIT

Second session now returns:

 a
---
 1
 5
 3
 4
(4 rows)

But if you do the same thing at REPEATABLE READ, you get:

ERROR:  could not serialize access due to concurrent update
STATEMENT:  select * from oops order by 1 for update;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
Robert Haas robertmh...@gmail.com wrote:
 Simple test case:
 
 rhaas=# create table oops (a int);
 CREATE TABLE
 rhaas=# insert into oops values (1), (2), (3), (4);
 INSERT 0 4
 rhaas=# begin;
 BEGIN
 rhaas=# update oops set a = 5 where a = 2;
 UPDATE 1
 
 In another session:
 
 rhaas=# select * from oops order by 1 for update;
 this blocks
 
 Back to the first session:
 
 rhaas=# commit;
 COMMIT
 
 Second session now returns:
 
  a
 ---
  1
  5
  3
  4
 (4 rows)
 
 But if you do the same thing at REPEATABLE READ, you get:
 
 ERROR:  could not serialize access due to concurrent update
 STATEMENT:  select * from oops order by 1 for update;
 
So it seems to me that the caution about this issue is only
half-right.  Below REPEATABLE READ isolation it behaves as currently
described; REPEATABLE READ or SERIALIZABLE will throw that error. 
That is probably worth noting, since:
 
(1)  People should understand that they can't get incorrect results
at either of the stricter isolation levels.
 
(2)  They *can* get a serialization failure involving just two
transactions: a read and a write.  This is not something which
normally happens at any level, so it might tend to surprise people.
 
No words leap to mind for me.  Anyone else?
 
-Kevin



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Kerem Kat
Hi,

Union with NULL error persists without the corresponding patch. Here
is the output from postgres without the patch:

SELECT a FROM (SELECT 1 a) foo
UNION
SELECT a FROM (SELECT NULL a) foo2;

ERROR:  failed to find conversion function from unknown to integer


It is thrown from parse_coerce.c:coerce_type method. I will try to dig
deep on it.


Regards,

Kerem KAT


On Thu, Oct 27, 2011 at 15:45, Erik Rijkers e...@xs4all.nl wrote:
 (pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
 linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)

 Hi,

 here is another peculiarity, which I think is a bug:

 -- first without CORRESPONDING:

 $ psql -Xaf null.sql
            select 1 a   , 2 b
 union all
            select null a, 4 b ;
  a | b
 ---+---
  1 | 2
   | 4
 (2 rows)

 -- then with CORRESPONDING:

            select 1 a   , 2 b
 union all
 corresponding
            select null a, 4 b ;
 psql:null.sql:9: ERROR:  failed to find conversion function from unknown to 
 integer


 If the null value is in a table column the error does not occur:

 drop table if exists t1; create table t1 (a int, b int); insert into t1 
 values (1,2);
 drop table if exists t2; create table t2 (a int, b int); insert into t2 
 values (null,2);
                select a,b from t1
 union all
 corresponding
                select a,b from t2 ;
  a | b
 ---+---
  1 | 2
   | 2
 (2 rows)


 I'm not sure it is actually a bug; but it seems an unnecessary error.


 thanks,

 Erik Rijkers





Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 16:30 , Simon Riggs wrote:
 On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug f...@phlo.org wrote:
 
 I think you make a good case for doing this.
 
 However, I'm concerned that moving LogStandbySnapshot() in a backpatch
 seems more risky than it's worth. We could easily introduce a new bug
 into what we would all agree is a complex piece of code. Minimal
 change seems best in this case.
 
 OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
 Your proposed patch changes that, which also carries a risk since something
 could depend on these values being in sync. Especially since both the logged
 snapshot and oldestActiveXid influence the snapshot tracking on the slave.
 
 But since you wrote most of that code, your judgement about the relative
 risks of these two approaches obviously out-weights mine.
 
 We must move oldestActiveXid since that is the source of a bug. There
 is no need to move LogStandbySnapshot(), so I am suggesting we don't
 do that for the backpatch. I was going to implement it the way you
 suggest in HEAD, since I agree that is a cleaner way.

Sounds good.

best regards,
Florian Pflug




Re: [HACKERS] out-of-order caution

2011-10-27 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 Simon Riggs si...@2ndquadrant.com wrote:
 On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
 kevin.gritt...@wicourts.gov wrote:
 | It is possible for a SELECT command using ORDER BY and FOR
 | UPDATE/SHARE to return rows out of order. This is because ORDER
 | BY is applied first.

 I think it should say that if this occurs with SERIALIZED
 transactions it will result in a serialisation error.
 
 Hmm.  At first reading I thought this was related to the
 mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. 

Simon's comment is correct.  If you do a SELECT FOR UPDATE/SHARE in a
non-READ-COMMITTED transaction, and it turns out that someone modified
the tuple before you could lock it, you'll get a serialization error
(cf ExecLockRows()), not updated data.  So out-of-order sorting is
not possible.

regards, tom lane



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Tom Lane
Kevin Grittner kevin.gritt...@wicourts.gov writes:
 (2)  They *can* get a serialization failure involving just two
 transactions: a read and a write.

Only if you ignore the difference between SELECT FOR UPDATE/SHARE and
plain SELECT.  I think calling the former a read is a conceptual error
to start with.  It has the same locking and synchronization behavior as
a write.

regards, tom lane



[HACKERS] portal with hold

2011-10-27 Thread fschmidt
I am posting to beg for the implementation of a "with hold" feature for
portals, similar to what is available for cursors.  This is needed by the JDBC
driver to implement Java's ResultSet.HOLD_CURSORS_OVER_COMMIT, which is needed
to make Java's setFetchSize() work, which is needed to read large result
sets.  Please also see:

http://postgresql.1045698.n5.nabble.com/setFetchSize-tp4935215p4941614.html





[HACKERS] fun with unlogged tables

2011-10-27 Thread Robert Haas
One of the optimizations that I did for 9.1 was to make transactions
that touch only temporary and/or unlogged tables always commit
asynchronously, because if the database crashes the table contents
will be blown away in their entirety, and whether or not the commit
made it down to disk won't matter a bit.  In my testing, this hugely
improved the performance of unlogged tables.  However, Heikki recently
reported to me off-list that this can actually cause a significant
performance problem in some circumstances, because committing
asynchronously means that we can't set hint bits right away - we must
wait until the WAL writer has completed its background flush.  With
default settings, this takes long enough to cause lots of extra clog
traffic.  He ran into the problem while running pgbench at scale
factor 15, and I reproduced it the same way.  I believe that at a
large scale factor the effect is lessened because you're less likely
to reread the same row multiple times before the commit hits the disk.

It strikes me that, while it's not safe to set hint bits until the
commit record hits the disk for *permanent* relations, it ought to be
just fine for temporary and unlogged relations, because those pages
will be gone after a crash, and their hint bits with them.  Attached
is a patch taking that approach.  Another approach would be to have
transactions that only touch temporary or unlogged relations to avoid
changing the value that will be returned by
TransactionIdGetCommitLSN().  This approach is better when
synchronous_commit=off and a transaction touches both permanent and
non-permanent tables, because it makes the decision as to whether hint
bits can be set early based on which page is being updated rather than
on some characteristic of the transaction.  However, it also adds a
small amount of overhead to the case where we're doing an asynchronous
commit on a permanent table, because we do one more check before
concluding that hint bits can't be set.

I did some benchmarking of this approach using pgbench with scale
factor 15.  shared_buffers = 8GB, maintenance_work_mem = 1GB,
checkpoint_segments = 30, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9, synchronous_commit = off.  On
permanent tables 8 clients came out slower and 16 came out faster; I'm
inclined to believe it's all in the noise.  On unlogged tables the
patch appears to be a clear win, massively so at 32 clients.  Results
below.  The first number on each line is the client count, and the
remaining numbers are tps including connections establishing, from
individual 5-minute runs.

Unlogged Tables, unpatched:
1 861.841894 752.762490 837.847109
8 3379.832456 4100.539369 3751.842036
16 6259.907605 5523.406202 4437.648873
32 4547.725770 5360.246166 4958.086754

Unlogged Tables, with patch:
1 887.562141 785.539717 920.275452
8 4436.366884 4374.135712 4335.791842
16 7518.908796 7478.427691 7476.817757
32 10433.615767 10450.577573 10374.186566

Permanent Tables, unpatched:
1 648.824800 647.239277 652.116249
8 3785.485647 3481.021391 3827.455756
16 5652.069678 4004.780105 4207.354612
32 5084.804778 4645.997471 5222.387075

Permanent Tables, with patch:
1 636.162775 637.145834 640.298944
8 3383.817707 3388.273809 3815.298676
16 5543.585926 5093.483757 5112.854318
32 5229.295024 4985.736460 5103.441187

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bip.patch
Description: Binary data



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
Kerem Kat kerem...@gmail.com writes:
 Union with NULL error persists without the corresponding patch. Here
 is the output from postgres without the patch:

 SELECT a FROM (SELECT 1 a) foo
 UNION
 SELECT a FROM (SELECT NULL a) foo2;

 ERROR:  failed to find conversion function from unknown to integer

Yeah, this is a longstanding issue that is not simple to fix without
introducing other unpleasantnesses.  It is not something you should
try to deal with at the same time as implementing CORRESPONDING.

regards, tom lane



Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces

2011-10-27 Thread Bruce Momjian
Florian Pflug wrote:
 On Oct21, 2011, at 16:42 , Phil Sorber wrote:
  If you did want to make them immutable, I also like Florian's idea of
  a dependency graph. This would make the dumps less readable though.
 
 Hm, I kinda reversed my opinion on that, though - i.e., I no longer think
 that the dependency graph idea has much merit. For two reasons
 
 First, dependencies work on OIDs, not on names. Thus, for the dependency
 machinery to work for GUCs, they'd also need to store OIDs instead of
 names of referenced schema objects. (Otherwise you get into trouble if
 objects are renamed)
 
 Which of course doesn't work, at least for roles, because roles are
 shared objects, but referenced objects might be database-local.
 (search_path, for example).

Is this a TODO?

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
I wrote:
 Kerem Kat kerem...@gmail.com writes:
 Union with NULL error persists without the corresponding patch. Here
 is the output from postgres without the patch:

 SELECT a FROM (SELECT 1 a) foo
 UNION
 SELECT a FROM (SELECT NULL a) foo2;

 ERROR:  failed to find conversion function from unknown to integer

 Yeah, this is a longstanding issue that is not simple to fix without
 introducing other unpleasantnesses.  It is not something you should
 try to deal with at the same time as implementing CORRESPONDING.

BTW, just to clarify: although that case fails, the case Erik was
complaining of does work in unmodified Postgres:

regression=# select 1 a   , 2 b
union all
select null a, 4 b ;
 a | b 
---+---
 1 | 2
   | 4
(2 rows)

and I agree with him that it should still work with CORRESPONDING.
Even though the behavior of unlabeled NULLs is less than perfect,
we definitely don't want to break cases that work now.  I suspect
the failure means that you tried to postpone too much work to plan
time.  You do have to match up the columns honestly at parse time
and do the necessary type coercions on them then.

regards, tom lane



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
Kerem Kat kerem...@gmail.com writes:
 On Thu, Oct 27, 2011 at 23:20, Tom Lane t...@sss.pgh.pa.us wrote:
 BTW, just to clarify: although that case fails, the case Erik was
 complaining of does work in unmodified Postgres:
 ...
 and I agree with him that it should still work with CORRESPONDING.

 That is by design, because CORRESPONDING is implemented as subqueries:

Well, this appears to me to be a counterexample sufficient to refute
that implementation decision.  You can inject subqueries at plan time,
if that helps you make things match up, but you can't rearrange things
that way at parse time, as I gather you're doing or else you would not
be seeing this problem.  In any case, I already pointed out to you that
rearranging the parse tree that way is problematic for reverse-listing
the parse tree.  We don't want to see subqueries injected in the results
of printing parse trees with ruleutils.c.

regards, tom lane



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
hrmz, still basically the same behaviour.  I think it might be a *little*
better with this patch.  Before when under load it would start up quickly
maybe 2 or 3 times out of 10 attempts... with this patch it might be up to 4
or 5 times out of 10...ish... or maybe it was just a fluke *shrug*.  I'm still
only seeing your log statement a single time (I'm running at debug2).  I
have discovered something though - when the standby is in this state if I
force a checkpoint on the primary then the standby comes right up.  Is there
anything I can check or try to help you figure this out? ...or is it
actually as designed that it could take 10-ish minutes to start up even
after all clients have disconnected from the primary?


On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop ch...@replicon.com wrote:

  Thanks for the patch Simon, but unfortunately it does not resolve the
 issue
  I am seeing.  The standby still refuses to finish starting up until long
  after all clients have disconnected from the primary (10 minutes).  I do
  see your new log statement on startup, but only once - it does not
 repeat.
   Is there any way for me to see  what the oldest xid on the standby is
 via
  controldata or something like that?  The standby does stream to keep up
 with
  the primary while the primary has load, and then it becomes idle when the
  primary becomes idle (when I kill all the connections)so it appears
 to
  be current...but it just doesn't finish starting up
  I'm not sure if it's relevant, but after it has sat idle for a couple
  minutes I start seeing these statements in the log (with the same offset
  every time):
  DEBUG:  skipping restartpoint, already performed at 9/9520

 OK, so it looks like there are 2 opportunities to improve, not just one.

 Try this.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Kerem Kat
On Thu, Oct 27, 2011 at 23:20, Tom Lane t...@sss.pgh.pa.us wrote:
 I wrote:
 Kerem Kat kerem...@gmail.com writes:
 Union with NULL error persists without the corresponding patch. Here
 is the output from postgres without the patch:

 SELECT a FROM (SELECT 1 a) foo
 UNION
 SELECT a FROM (SELECT NULL a) foo2;

 ERROR:  failed to find conversion function from unknown to integer

 Yeah, this is a longstanding issue that is not simple to fix without
 introducing other unpleasantnesses.  It is not something you should
 try to deal with at the same time as implementing CORRESPONDING.

 BTW, just to clarify: although that case fails, the case Erik was
 complaining of does work in unmodified Postgres:

 regression=# select 1 a   , 2 b
 union all
            select null a, 4 b ;
  a | b
 ---+---
  1 | 2
   | 4
 (2 rows)

 and I agree with him that it should still work with CORRESPONDING.
 Even though the behavior of unlabeled NULLs is less than perfect,
 we definitely don't want to break cases that work now.  I suspect
 the failure means that you tried to postpone too much work to plan
 time.  You do have to match up the columns honestly at parse time
 and do the necessary type coercions on them then.

                        regards, tom lane


That is by design, because CORRESPONDING is implemented as subqueries:

 select 1 a   , 2 b
union all
corresponding
   select null a, 4 b ;

is equivalent to

SELECT a, b FROM ( SELECT 1 a, 2 b ) foo
UNION ALL
SELECT a, b FROM ( SELECT null a, 4 b ) foo2;

which gives the same error in unpatched postgres.


Regards,

Kerem KAT



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop ch...@replicon.com wrote:

 hrmz, still basically the same behaviour.  I think it might be a *little*
 better with this patch.  Before when under load it would start up quickly
 maybe 2 or 3 times out of 10 attemptswith this patch it might be up to 4
 or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
 only seeing your log statement a single time (I'm running at debug2).  I
 have discovered something though - when the standby is in this state if I
 force a checkpoint on the primary then the standby comes right up.  Is there
 anything I check or try for you to help figure this out?or is it
 actually as designed that it could take 10-ish minutes to start up even
 after all clients have disconnected from the primary?

Thanks for testing. The improvements cover specific cases, so it's not
subject to chance; it's not a performance patch.

It's not designed to act the way you describe, but it does.

The reason this occurs is that you have a transaction heavy workload
with occasional periods of complete quiet and a base backup time that
is much less than checkpoint_timeout. If your base backup was slower
the checkpoint would have hit naturally before recovery had reached a
consistent state. Which seems fairly atypical. I guess you're doing
this on a test system.

It seems cheap to add in a call to LogStandbySnapshot() after each
call to pg_stop_backup().

Does anyone think this case is worth adding code for? Seems like one
more thing to break.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 23:02 , Bruce Momjian wrote:
 Florian Pflug wrote:
 On Oct21, 2011, at 16:42 , Phil Sorber wrote:
 If you did want to make them immutable, I also like Florian's idea of
 a dependency graph. This would make the dumps less readable though.
 
 Hm, I kinda reversed my opinion on that, though - i.e., I no longer think
 that the dependency graph idea has much merit. For two reasons
 
 First, dependencies work on OIDs, not on names. Thus, for the dependency
 machinery to work for GUCs, they'd also need to store OIDs instead of
 names of referenced schema objects. (Otherwise you get into trouble if
 objects are renamed)
 
 Which of course doesn't work, at least for roles, because roles are
 shared objects, but referenced objects might be database-local.
 (search_path, for example).
 
 Is this a TODO?

The idea quoted above, no. But

 Downgrade non-immutable (i.e., dependent on database state) checks during
 ALTER ROLE/DATABASE SET to WARNINGs to avoid breakage during restore

makes for a fine TODO, I'd say.

best regards,
Florian Pflug




[HACKERS] Add socket dir to pg_config..?

2011-10-27 Thread Stephen Frost
All,

  Was just wondering if we might want to include the default socket
  directory that was compiled in as part of the pg_config output..?
  
Thanks,

Stephen




Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Robert Haas wrote:
 On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
  pg_upgrade doesn't work if the 'postgres' database has been dropped in the
  old cluster:
 
  ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d
  ~/pgsql.91stable/data -D data-upgraded/
  Performing Consistency Checks
  -----------------------------
  Checking current, bin, and data directories                 ok
  Checking cluster versions                                   ok
  Checking database user is a superuser                       ok
  Checking for prepared transactions                          ok
  Checking for reg* system OID user data types                ok
  Checking for contrib/isn with bigint-passing mismatch       ok
  Creating catalog dump                                       ok
  Checking for prepared transactions                          ok
 
  New cluster database "postgres" does not exist in the old cluster
  Failure, exiting
 
 
  That's a bit unfortunate. We have some other tools that don't work without
   the 'postgres' database, like 'reindexdb --all', but it would still be nice
  if pg_upgrade did.
 
 +1.  I think our usual policy is to try postgres first and then try
 template1, so it would seem logical for pg_upgrade to do the same.

Well, it is a little tricky.  The problem is that I am not just
connecting to a database --- I am creating support functions in the
database.  Now, this is complex because template1 is the template for
new databases, except for pg_dump which uses template0.

So, it is going to be confusing to support both databases because of
the cleanup details I have to document if I use template1.

Also, pg_dumpall unconditionally connects to the postgres database to
restore roles:

    fprintf(OPF, "\\connect postgres\n\n");

We could connect to template1 for this, but I am not comfortable
changing this.  And when pg_dumpall throws an error for a missing
postgres database, pg_upgrade is going to fail.

We started using the postgres database as a database for connections ---
do we want to change that?  We certainly can't have the pg_dumpall
output _conditionally_ connecting to template1.

I am feeling this isn't worth pursuing.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 So, it is going to be confusing to support both databases because there
 is the cleanup details I have to document if I use template1.

Presumably there's some other database in the system besides template1
if postgres doesn't exist..  Would it be possible to just make it
configurable?  Then the user could pick a non-template database.  Having
it fail if the option isn't used and the default postgres isn't there is
fine, imv.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  So, it is going to be confusing to support both databases because there
  is the cleanup details I have to document if I use template1.
 
 Presumably there's some other database in the system besides template1
 if postgres doesn't exist..  Would it be possible to just make it
 configurable?  Then the user could pick a non-template database.  Having
 it fail if the option isn't used and the default postgres isn't there is
 fine, imv.

I have not seen enough demand to make this a user-visible configuration.
We can just tell them to create a postgres database.   Frankly, they
would have had to _remove_ the postgres database after initdb for it not
to be there, and they are instructed to change nothing about the new
database.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 I have not seen enough demand to make this a user-visible configuration.
 We can just tell them to create a postgres database.   Frankly, they
 would have had to _remove_ the postgres database after initdb for it not
 to be there, and they are instructed to change nothing about the new
 database.

Yes, they would have removed it because they didn't want it.  As I
recall, part of the agreement to create an extra database by default was
that it could be removed if users didn't want it.  Turning around and
then saying "but things won't work if it's not there" isn't exactly
supporting users who decide to remove it.

Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
be configured to connect to other databases instead, including for
globals.

Thanks,

Stephen




Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Sorry... "designed" was a poor choice of words, I meant "not unexpected".
Doing the checkpoint right after pg_stop_backup() looks like it will work
perfectly for me, so thanks for all your help!

On a side note I am sporadically seeing another error on hotstandby startup.
 I'm not terribly concerned about it as it is pretty rare and it will work
on a retry so it's not a big deal.  The error is "FATAL:  out-of-order XID
insertion in KnownAssignedXids".  If you think it might be a bug and are
interested in hunting it down let me know and I'll help any way I can...but
if you're not too worried about it then neither am I :)


On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs si...@2ndquadrant.com wrote:

 On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop ch...@replicon.com
 wrote:

  hrmz, still basically the same behaviour.  I think it might be a *little*
  better with this patch.  Before, when under load, it would start up quickly
  maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4
  or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
  only seeing your log statement a single time (I'm running at debug2).  I
  have discovered something though - when the standby is in this state, if I
  force a checkpoint on the primary then the standby comes right up.  Is there
  anything I can check or try for you to help figure this out?...or is it
  actually as designed that it could take 10-ish minutes to start up even
  after all clients have disconnected from the primary?

 Thanks for testing. The improvements cover specific cases, so it's not
 subject to chance; it's not a performance patch.

 It's not designed to act the way you describe, but it does.

 The reason this occurs is that you have a transaction heavy workload
 with occasional periods of complete quiet and a base backup time that
 is much less than checkpoint_timeout. If your base backup was slower
 the checkpoint would have hit naturally before recovery had reached a
 consistent state. Which seems fairly atypical. I guess you're doing
 this on a test system.

 It seems cheap to add in a call to LogStandbySnapshot() after each
 call to pg_stop_backup().

 Does anyone think this case is worth adding code for? Seems like one
 more thing to break.

 --
  Simon Riggs   http://www.2ndQuadrant.com/
  PostgreSQL Development, 24x7 Support, Training  Services



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  I have not seen enough demand to make this a user-visible configuration.
  We can just tell them to create a postgres database.   Frankly, they
  would have had to _remove_ the postgres database after initdb for it not
  to be there, and they are instructed to change nothing about the new
  database.
 
 Yes, they would have removed it because they didn't want it.  As I
 recall, part of the agreement to create an extra database by default was
 that it could be removed if users didn't want it.  Turning around and
 then saying "things won't work if it's not there" isn't exactly
 supporting users who decide to remove it.

Well, you would have to remove it _after_ you did the pg_upgrade.  Right
now if you do a normal dump/restore upgrade, you also have to re-remove
the postgres database.  We don't have any mechanism to drop a database
as part of pg_dumpall's restore if it didn't exist in the old cluster.

 Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
 be configured to connect to other databases instead, including for
 globals.

Well, please show me the code, because the C code I showed you had the
'\connect postgres' string hardcoded in there.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
 Well, you would have to remove it _after_ you did the pg_upgrade.  Right
 now if you do a normal dump/restore upgrade, you also have to re-remove
 the postgres database.  We don't have any mechanism to drop a database
 as part of pg_dumpall's restore if it didn't exist in the old cluster.

Perhaps not, but it *could* be removed after the restore and all would
be well, yes?

  Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
  be configured to connect to other databases instead, including for
  globals.
 
 Well, please show me the code, because the C code I showed you had the
 '\connect postgres' string hardcoded in there.

I guess there's a difference between "can be used", "will work
correctly, but might create some extra garbage", and "can't be used at
all".  pg_dumpall has a -l option for connecting to whatever *existing*
database you have to pull the global data, and then it'll restore into a
clean initdb'd cluster, after which you could remove postgres.
Admittedly, if you initdb the cluster, drop postgres, and then try a
restore, it would fail.  Personally, I'm not a big fan of that (why
don't we use what was passed in to -l for that..?), but, practically,
it's not that big a deal.  I don't know many folks who are going to
restore a pg_dumpall dump into an existing configuration where they've
monkeyed with things (that could cause all kinds of other issues anyway,
role conflicts, etc).

If I understood correctly (perhaps I didn't..), pg_upgrade doesn't have
the pg_dumpall equivalent of the '-l' or '--database' option, and that's
what is at issue here.

Thanks,

Stephen




Re: [HACKERS] Add socket dir to pg_config..?

2011-10-27 Thread Tom Lane
Stephen Frost sfr...@snowman.net writes:
   Was just wondering if we might want to include the default socket
   directory that was compiled in as part of the pg_config output..?

[ shrug... ]  We don't report the compiled-in port number, which is
considerably more critical.  And we don't report changes in any of the
other stuff in pg_config_manual.h.

MHO is that changing the socket directory is only marginally supported,
and we shouldn't encourage it unless we're prepared to fully support it
(which we can't really).

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
 * Bruce Momjian (br...@momjian.us) wrote:
  Well, you would have to remove it _after_ you did the pg_upgrade.  Right
  now if you do a normal dump/restore upgrade, you also have to re-remove
  the postgres database.  We don't have any mechanism to drop a database
  as part of pg_dumpall's restore if it didn't exist in the old cluster.
 
 Perhaps not, but it *could* be removed after the restore and all would
 be well, yes?

I am not sure how much pg_dump worries about removing system objects
during a restore --- I don't think it does that at all actually.  I
thought we did that for plpgsql, but I don't see that in the C code now,
and testing doesn't show it being removed by pg_dump.

   Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
   be configured to connect to other databases instead, including for
   globals.
  
  Well, please show me the code, because the C code I showed you had the
  '\connect postgres' string hardcoded in there.
 
 I guess there's a difference between can be used and will work
 correctly, but might create some extra garbage and can't be used at
 all.  pg_dumpall has a -l option for connecting to whatever *existing*
 database you have to pull the global data, and then it'll restore into a
 clean initdb'd cluster, after which you could remove postgres.

Keep in mind -l might connect to a specified database to do the dump,
but the generated script will still connect to the 'postgres' database
to recreate the databases.

 Admittedly, if you initdb the cluster, drop postgres, and then try a
 restore, it would fail.  Personally, I'm not a big fan of that (why

Right, same with pg_upgrade.

 don't we use what was passed in to -l for that..?), but, practically,

No idea.

 it's not that big a deal.  I don't know many folks who are going to
 restore a pg_dumpall dump into an existing configuration where they've
 monkied with things (that could cause all kinds of other issues anyway,
 role conflicts, etc).
 
 If I understood correctly (perhaps I didn't..), is that pg_upgrade
 doesn't have the pg_dumpall equivilant of the '-l' or '--database'
 option, and that's what is at issue here.

Well, I can modify pg_upgrade to connect to template1 easily (no need
for a switch) --- the problem is that pg_dumpall requires the 'postgres'
database to restore its dump file.
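(From memory, the hardcoded bit in question looks roughly like the line
below; this is a paraphrase of the pg_dumpall.c code discussed upthread,
not an exact excerpt, and the surrounding code may differ:)

```c
/* pg_dumpall writes an unconditional reconnect into its output script,
 * which is why restoring requires a 'postgres' database to exist. */
fprintf(OPF, "\\connect postgres\n\n");
```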

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Stephen Frost wrote:
 Yes, they would have removed it because they didn't want it.  As I
 recall, part of the agreement to create an extra database by default was
 that it could be removed if users didn't want it.  Turning around and
 then saying "things won't work if it's not there" isn't exactly
 supporting users who decide to remove it.

 Well, you would have to remove it _after_ you did the pg_upgrade.

As far as the *target* cluster is concerned, I have no sympathy for
someone who messes with its contents before running pg_upgrade.
That's an RTFM matter: you're supposed to upgrade into a virgin
just-initdb'd cluster.

However, it would be nice if pg_upgrade supported transferring from a
*source* cluster that didn't have the postgres DB.

What about creating a new, single-purpose database in the source
cluster and then removing it again after we're done?

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Tom Lane wrote:
 Bruce Momjian br...@momjian.us writes:
  Stephen Frost wrote:
  Yes, they would have removed it because they didn't want it.  As I
  recall, part of the agreement to create an extra database by default was
  that it could be removed if users didn't want it.  Turning around and
  then saying but things won't work if it's not there isn't exactly
  supporting users who decide to remove it.
 
  Well, you would have to remove it _after_ you did the pg_upgrade.
 
 As far as the *target* cluster is concerned, I have no sympathy for
 someone who messes with its contents before running pg_upgrade.
 That's an RTFM matter: you're supposed to upgrade into a virgin
 just-initdb'd cluster.
 
 However, it would be nice if pg_upgrade supported transferring from a
 *source* cluster that didn't have the postgres DB.

I have this C comment in pg_upgrade about this:

 *  If someone removes the 'postgres' database from the old cluster and
 *  the new cluster has a 'postgres' database, the number of databases
 *  will not match.  We actually could upgrade such a setup, but it would
 *  violate the 1-to-1 mapping of database counts, so we throw an error
 *  instead.  We would detect this as a database count mismatch during
 *  upgrade, but we want to detect it during the check phase and report
 *  the database name.

Is this worth fixing?  Another problem is that pg_dumpall doesn't remove
the postgres database in the new cluster if it was not in the old one,
so they are going to end up with a postgres database in the new cluster
anyway.  I could argue that pg_upgrade is better because it reports that
it cannot recreate the new cluster exactly, while pg_dumpall just keeps
a postgres database that was not in the old cluster.

 What about creating a new, single-purpose database in the source
 cluster and then removing it again after we're done?

That is not a problem --- I can easily use template1.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
 What about creating a new, single-purpose database in the source
 cluster and then removing it again after we're done?

 That is not a problem --- I can easily use template1.

Huh?

You just said upthread that you didn't want to use template1 because
you didn't want to modify the template database.  I think the point is
that if you're doing something to the database that someone might
object to, you oughtn't be doing it to the postgres database either.
You should create a database just for pg_upgrade's use and install its
crap in there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Robert Haas wrote:
 On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian br...@momjian.us wrote:
  What about creating a new, single-purpose database in the source
  cluster and then removing it again after we're done?
 
  That is not a problem --- I can easily use template1.
 
 Huh?
 
 You just said upthread that you didn't want to use template1 because
 you didn't want to modify the template database.  I think the point is

I don't want to use postgres and then fall back to template1 if
necessary --- I would just use template1 always.

 that if you're doing something to the database that someone might
 object to, you oughtn't be doing it to the postgres database either.
 You should create a database just for pg_upgrade's use and install its
 crap in there.

It installs crap in all databases to set oids on system tables, for
example, so we are only creating it early in postgres (or template1) to
set auth_id.  Our sticking point now is that pg_dumpall has the
'postgres' database hardcoded for role creation.

-- 
  Bruce Momjian  br...@momjian.us  http://momjian.us
  EnterpriseDB   http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 6:55 PM, Simon Riggs si...@2ndquadrant.com wrote:
 It seems cheap to add in a call to LogStandbySnapshot() after each
 call to pg_stop_backup().

 Does anyone think this case is worth adding code for? Seems like one
 more thing to break.

Why at that particular time?

It would maybe be nice if the master could notice when it has a plausible
(non-overflowed) snapshot and log it then.  But that might be more
code than the problem is worth.
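To make the proposal concrete, it would presumably amount to only a
couple of lines in the backup-stop path, along these lines (a
hypothetical pseudocode sketch only; the exact call site, arguments, and
guard are assumptions, not a tested patch):

```c
/* Sketch: after stopping the backup, emit a running-xacts snapshot so a
 * standby restoring from this base backup can reach consistency without
 * waiting for the next checkpoint on the primary. */
stoppoint = do_pg_stop_backup(NULL, true);
if (XLogStandbyInfoActive())
    LogStandbySnapshot();
```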

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Include commit identifier in version() function

2011-10-27 Thread pasman pasmański
Hi.

I think it would be useful to include the commit's hexadecimal
identifier in the version() function, for quickly checking it out in git.

-- 

pasman



Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch

2011-10-27 Thread Tom Lane
Pavel Stehule pavel.steh...@gmail.com writes:
 here is Wojciech's final patch

I looked this over a little bit, but I don't see an answer to the
question I put on the commitfest entry: why is this being done in
plpgsql, and not somewhere in the core code?  The core has also got
the concept of %TYPE, and could stand to have support for attaching []
notation to that.  I'm not entirely sure if anything could be shared,
but it would sure be worth looking at.  I've wished for some time that
%TYPE not be handled directly in read_datatype() at all, but rather in
the core parse_datatype() function.  This would require passing some
sort of callback function to parse_datatype() to let it know what
variables can be referenced in %TYPE, but we have parser callback
functions just like that already.  One benefit we could get that way
is that the core meaning of %TYPE (type of a table field name) would
still be available in plpgsql, if the name didn't match any declared
plpgsql variable.

However, assuming that we're sticking to just changing plpgsql for the
moment ... I cannot escape the feeling that this is a large patch with a
small patch struggling to get out.  It should not require 500 net new
lines of code to provide this functionality, when all we're doing is
looking up the array type whose element type is the type the existing
code can derive.  I would have expected to see the grammar passing one
extra flag to plpgsql_parse_cwordtype and plpgsql_parse_cwordrowtype
routines that were little changed except for the addition of a step to
find the array type.

Another complaint is that the grammar changes assume that the first
token of decl_datatype must be T_WORD or T_CWORD, which means that
it fails for cases such as unreserved plpgsql keywords.  This example
should work, and does work in 9.1:

create domain hint as int;

create or replace function foo() returns void as $$
declare
  x hint;
begin
end
$$ language plpgsql;

but fails with this patch because hint is returned by the lexer as
K_HINT not T_WORD.  You might be able to get around that by adding a
production with unreserved_keyword --- but I'm unsure that that would
cover every case that worked before, and in any case I do not see the
point of changing the grammar production at all.  It gains almost
nothing to have Bison parse the [] rather than doing it with C code in
read_datatype().

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Tom Lane
Bruce Momjian br...@momjian.us writes:
 Robert Haas wrote:
 that if you're doing something to the database that someone might
 object to, you oughtn't be doing it to the postgres database either.
 You should create a database just for pg_upgrade's use and install its
 crap in there.

 It installs crap in all databases to set oids on system tables,

It seems like you're both confusing the source and target clusters.
None of that stuff gets installed in the source, does it?

regards, tom lane

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch

2011-10-27 Thread Pavel Stehule
Hello

2011/10/28 Tom Lane t...@sss.pgh.pa.us:
 Pavel Stehule pavel.steh...@gmail.com writes:
  here is Wojciech's final patch


this is just a small note about the length of this patch. The patch was
significantly smaller before it also solved the problem of derived types
for composite types - it is meant to solve the problem described in this
thread

http://stackoverflow.com/questions/7634704/declare-variable-of-composite-type-in-postgresql-using-type

Regards

Pavel Stehule




