Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  wrote:
> I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of the streamed WAL
file with xlogdump, I found that recent WAL records that walsender had
already sent don't exist in that WAL file. I expected that pg_receivexlog
would write the streamed WAL records to disk as soon as possible, but it
doesn't. Is this intentional, or a bug?
Am I missing something?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:29, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  wrote:
>> I've applied this version with a few more minor changes that Heikki found.
>
> Cool!
>
> When I tried pg_receivexlog and checked the contents of streamed WAL file by
> xlogdump, I found that recent WAL records that walsender has already sent 
> don't
> exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
> records to the disk as soon as possible, but it doesn't. Is this
> intentional? Or bug?
> Am I missing something?

It writes it to disk as soon as possible, but doesn't fsync() until
the end of each segment. Are you by any chance looking at the file
while it's running?
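
The behaviour described here (write() each chunk as soon as it arrives,
fsync() only when a segment completes) can be sketched as follows. This is
an illustrative Python model, not the real C code, which reads its chunks
via PQgetCopyData(); the segment size is reduced for the demo:

```python
import os

SEG_SIZE = 4096  # stand-in for the real 16 MB WAL segment size

def receive_segment(path, chunks):
    """Write every received chunk immediately, but make the data durable
    (fsync) only once a full segment boundary is reached."""
    written = 0
    fd = os.open(path, os.O_CREAT | os.O_TRUNC | os.O_WRONLY, 0o600)
    try:
        for chunk in chunks:
            os.write(fd, chunk)          # data reaches the kernel right away
            written += len(chunk)
            if written % SEG_SIZE == 0:  # durability point: segment complete
                os.fsync(fd)
    finally:
        os.close(fd)
    return written

# Eight 512-byte chunks form exactly one "segment".
total = receive_segment("seg_demo.bin", [b"x" * 512] * 8)
assert total == SEG_SIZE
os.unlink("seg_demo.bin")
```

The point of the pattern: a reader inspecting the file between fsyncs still
sees every chunk that has been write()n; only a system crash can lose the
un-fsynced tail.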

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander  wrote:
> On Thu, Oct 27, 2011 at 09:29, Fujii Masao  wrote:
>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  wrote:
>>> I've applied this version with a few more minor changes that Heikki found.
>>
>> Cool!
>>
>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>> xlogdump, I found that recent WAL records that walsender has already sent 
>> don't
>> exist in that WAL file. I expected that pg_receivexlog writes the streamed 
>> WAL
>> records to the disk as soon as possible, but it doesn't. Is this
>> intentional? Or bug?
>> Am I missing something?
>
> It writes it to disk as soon as possible, but doesn't fsync() until
> the end of each segment. Are you by any chance looking at the file
> while it's running?

No. I looked at that file after shutting down the master server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:46, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander  wrote:
>> On Thu, Oct 27, 2011 at 09:29, Fujii Masao  wrote:
>>> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  
>>> wrote:
 I've applied this version with a few more minor changes that Heikki found.
>>>
>>> Cool!
>>>
>>> When I tried pg_receivexlog and checked the contents of streamed WAL file by
>>> xlogdump, I found that recent WAL records that walsender has already sent 
>>> don't
>>> exist in that WAL file. I expected that pg_receivexlog writes the streamed 
>>> WAL
>>> records to the disk as soon as possible, but it doesn't. Is this
>>> intentional? Or bug?
>>> Am I missing something?
>>
>> It writes it to disk as soon as possible, but doesn't fsync() until
>> the end of each segment. Are you by any chance looking at the file
>> while it's running?
>
> No. I looked at that file after shutting down the master server.

Ugh, in that case something is certainly wrong. There is nothing but
setting up some offset values between PQgetCopyData() and write()...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander  wrote:
> On Thu, Oct 27, 2011 at 09:46, Fujii Masao  wrote:
>> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander  wrote:
>>> On Thu, Oct 27, 2011 at 09:29, Fujii Masao  wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  
 wrote:
> I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file 
 by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?
>>>
>>> It writes it to disk as soon as possible, but doesn't fsync() until
>>> the end of each segment. Are you by any chance looking at the file
>>> while it's running?
>>
>> No. I looked at that file after shutting down the master server.
>
> Ugh, in that case something is certainly wrong. There is nothing but
> setting up some offset values between PQgetCopyData() and write()...

When the end of the copy stream is found or an error happens,
pg_receivexlog exits without flushing outstanding WAL records, which
seems to cause the problem I reported.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 09:57, Heikki Linnakangas wrote:

My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.


Hmm, I don't think that fully fixes the problem. Even if you're certain
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might
decide to truncate it away again while the checkpoint is running.


Oh, scratch that. During recovery, we merrily treat missing slru files 
as if they were filled with zeros.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 10:12, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander  wrote:
>> On Thu, Oct 27, 2011 at 09:46, Fujii Masao  wrote:
>>> On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander  
>>> wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander  
> wrote:
>> I've applied this version with a few more minor changes that Heikki 
>> found.
>
> Cool!
>
> When I tried pg_receivexlog and checked the contents of streamed WAL file 
> by
> xlogdump, I found that recent WAL records that walsender has already sent 
> don't
> exist in that WAL file. I expected that pg_receivexlog writes the 
> streamed WAL
> records to the disk as soon as possible, but it doesn't. Is this
> intentional? Or bug?
> Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?
>>>
>>> No. I looked at that file after shutting down the master server.
>>
>> Ugh, in that case something is certainly wrong. There is nothing but
>> setting up some offset values between PQgetCopyData() and write()...
>
> When end-of-copy stream is found or an error happens, pg_receivexlog
> exits without flushing outstanding WAL records. Which seems to cause
> the problem I reported.

Not sure I follow. When we arrive at PQgetCopyData() there should be
nothing buffered, and if the end of stream happens there it returns
-1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is
that what you are referring to? But the data should already have been
write()d, so it should still be there, no?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander  wrote:
> Not sure I follow. When we arrive at PQgetCopyData() there should be
> nothing buffered, and if the end of stream happens there it returns
> -1, and we exit, no? So where is the data that's lost?
>
> I do realize we don't actually fsync() and close() in this case - is
> that what you are referring to? But the data should already have been
> write()d, so it should still be there, no?

Oh, right. Hmm.. xlogdump might be the cause.

Though I've not read the code of xlogdump, I wonder if it gives up
outputting the contents of a WAL file when it finds a partial WAL page...
It strikes me that the recovery code has the same problem. No?
IOW, when a partial WAL page is found during recovery, I'm afraid
that page would not be replayed even though it contains valid data.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane  wrote:
> Robert Haas  writes:
>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs  wrote:
>>> This fixes both the subtrans and clog bugs in one patch.
>
>> I don't see the point of changing StartupCLOG() to be an empty
>> function and adding a new function TrimCLOG() that does everything
>> StartupCLOG() used to do.
>
> +1 ... I found that overly cute also.

It would have been even easier to move StartupCLOG() later, but then
we'd need a big comment explaining why CLOG starts up at one point and
subtrans starts up at another, since that is a very confusing way of
doing things. I wrote it that way first and it definitely looks
strange.

It's much easier to understand that StartupCLOG() is actually a no-op
and that we need to trim the clog at the end of recovery in all cases.

The patch isn't meant to be cute, just a better way of expressing
what needs to be done, so I think the patch should stay that way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 02:29, Florian Pflug wrote:

Per my theory about the cause of the problem in my other mail, I think you
might see StartupCLOG failures even during crash recovery, provided that
wal_level was set to hot_standby when the primary crashed. Here's how

1) We start a checkpoint, and get as far as LogStandbySnapshot()
2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
   The assigned XID requires CLOG extension.
3) The checkpoint continues, and LogStandbySnapshot () advances the
   checkpoint's nextXid to the XID assigned in (2).
4) We crash after writing the checkpoint record, but before the CLOG
   extension makes it to the disk, and before any trace of the XID assigned
   in (2) makes it to the xlog.

Then StartupCLOG() would fail at the end of recovery, because we'd end up
with a nextXid whose corresponding CLOG page doesn't exist.


No, clog extension is WAL-logged while holding the XidGenLock. At step 
3, LogStandbySnapshot() would block until the clog-extension record is 
written to WAL, so crash recovery would see and replay that record 
before calling StartupCLOG().


That can happen during hot standby, though, because StartupCLOG() is 
called earlier.



My suggestion is to fix the CLOG problem in that same way that you fixed
the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
CheckPointGuts().

Here's what I imagine CreateCheckPoint() should look like:

1) LogStandbySnapshot() and fill out oldestActiveXid
2) Fill out REDO
3) Wait for concurrent commits
4) Fill out nextXid and the other fields
5) CheckPointGuts()
6) Rest

It's then no longer necessary for LogStandbySnapshot() to modify
the nextXid, since we fill out nextXid after LogStandbySnapshot() and
will thus derive a higher value than LogStandbySnapshot() would have.


Hmm, I don't think that fully fixes the problem. Even if you're certain 
that CheckPointGuts() has fsync'd the clog page to disk, VACUUM might 
decide to truncate it away again while the checkpoint is running.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander  wrote:
>> Not sure I follow. When we arrive at PQgetCopyData() there should be
>> nothing buffered, and if the end of stream happens there it returns
>> -1, and we exit, no? So where is the data that's lost?
>>
>> I do realize we don't actually fsync() and close() in this case - is
>> that what you are referring to? But the data should already have been
>> write()d, so it should still be there, no?
>
> Oh, right. Hmm.. xlogdump might be the cause.
>
> Though I've not read the code of xlogdump, I wonder if it gives up
> outputting the contents of WAL file when it finds a partial WAL page...
> This strikes me that recovery code has the same problem. No?
> IOW, when a partial WAL page is found during recovery, I'm afraid
> that page would not be replayed though it contains valid data.

My concern was right. When I performed a recovery using the streamed
WAL files, data loss happened: a partial WAL page was not replayed.

To avoid this problem, I think that we should change pg_receivexlog so
that it writes WAL data *by the block*, or creates the WAL file at full
size before writing any data, as walreceiver does. Otherwise, though
pg_receivexlog streams WAL data in realtime, the latest WAL data might
not be available for recovery.
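
The walreceiver-style option suggested here (create the file at full
segment size before streaming any records into it) could look roughly like
this. An illustrative Python sketch with a reduced segment size, and the
zero-fill done in one write for brevity; the real implementation would be C:

```python
import os

WAL_SEG_SIZE = 16 * 1024 * 1024  # standard WAL segment size

def open_walfile(path, seg_size=WAL_SEG_SIZE):
    """Create the segment at full size, zero-filled, before any records
    are written.  A later reader then never sees a short file: unwritten
    space is zeros, which recovery treats as end-of-WAL rather than a
    truncated page."""
    fd = os.open(path, os.O_CREAT | os.O_WRONLY, 0o600)
    os.write(fd, b"\0" * seg_size)   # pre-pad the whole segment
    os.fsync(fd)
    os.lseek(fd, 0, os.SEEK_SET)     # rewind; stream records from offset 0
    return fd

fd = open_walfile("000000010000000000000001", seg_size=8192)  # demo size
os.write(fd, b"record-bytes")        # records overwrite the zero padding
os.close(fd)
assert os.path.getsize("000000010000000000000001") == 8192
os.unlink("000000010000000000000001")
```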

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 12:29, Fujii Masao  wrote:
> On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao  wrote:
>> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander  wrote:
>>> Not sure I follow. When we arrive at PQgetCopyData() there should be
>>> nothing buffered, and if the end of stream happens there it returns
>>> -1, and we exit, no? So where is the data that's lost?
>>>
>>> I do realize we don't actually fsync() and close() in this case - is
>>> that what you are referring to? But the data should already have been
>>> write()d, so it should still be there, no?
>>
>> Oh, right. Hmm.. xlogdump might be the cause.
>>
>> Though I've not read the code of xlogdump, I wonder if it gives up
>> outputting the contents of WAL file when it finds a partial WAL page...
>> This strikes me that recovery code has the same problem. No?
>> IOW, when a partial WAL page is found during recovery, I'm afraid
>> that page would not be replayed though it contains valid data.
>
> My concern was right. When I performed a recovery using the streamed
> WAL files, the loss of data happened. A partial WAL page was not replayed.
>
> To avoid this problem, I think that we should change pg_receivexlog so
> that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
> before writing any data. Otherwise, though pg_receivexlog streams WAL
> data in realtime, the latest WAL data might not be available for recovery.

Ah, so you were recovering data from the last, partial, file? Not from
a completed file?

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander  wrote:
> On Thu, Oct 27, 2011 at 12:29, Fujii Masao  wrote:
>> On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao  wrote:
>>> On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander  
>>> wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?
>>>
>>> Oh, right. Hmm.. xlogdump might be the cause.
>>>
>>> Though I've not read the code of xlogdump, I wonder if it gives up
>>> outputting the contents of WAL file when it finds a partial WAL page...
>>> This strikes me that recovery code has the same problem. No?
>>> IOW, when a partial WAL page is found during recovery, I'm afraid
>>> that page would not be replayed though it contains valid data.
>>
>> My concern was right. When I performed a recovery using the streamed
>> WAL files, the loss of data happened. A partial WAL page was not replayed.
>>
>> To avoid this problem, I think that we should change pg_receivexlog so
>> that it writes WAL data *by the block*, or creates, like walreceiver, WAL 
>> file
>> before writing any data. Otherwise, though pg_receivexlog streams WAL
>> data in realtime, the latest WAL data might not be available for recovery.
>
> Ah, so you were recovering data from the last, partial, file? Not from
> a completed file?

Yes. I copied all the streamed WAL files to the pg_xlog directory and started recovery.

> I'm rewriting the handling of partial files per the other thread
> started by Heikki. The idea is that there will be an actual .partial
> file in there when pg_receivexlog has ended, and you have to deal with
> it manually. The typical way would be to pad it with zeroes to the
> end. Doing such padding would solve this recovery issue, correct?

Yes. But that sounds user-unfriendly. Is padding the WAL file manually
really easy for a user to do?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 14:09, Fujii Masao wrote:

On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander  wrote:

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


Yes. But that sounds unuserfriendly. Padding the WAL file manually
is easy-to-do for a user?


"truncate -s 16M <file>" works at least on my Linux system. Not sure how 
you'd do it on Windows.
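
For reference, the same padding can be done portably without the truncate
utility. A small illustrative Python sketch (assuming, as POSIX specifies
for ftruncate, that extending a file fills the tail with zero bytes):

```python
import os

WAL_SEG_SIZE = 16 * 1024 * 1024  # standard WAL segment size

def pad_partial(path, seg_size=WAL_SEG_SIZE):
    """Extend a .partial segment to full size, equivalent to
    `truncate -s 16M <file>`.  Existing bytes are untouched; the
    extension is zero-filled, which recovery reads as end-of-WAL."""
    if os.path.getsize(path) > seg_size:
        raise ValueError("file is already larger than a segment")
    os.truncate(path, seg_size)

# Demo with a reduced segment size.
with open("demo.partial", "wb") as f:
    f.write(b"some WAL bytes")
pad_partial("demo.partial", seg_size=8192)
assert os.path.getsize("demo.partial") == 8192
os.unlink("demo.partial")
```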


Perhaps we should add automatic padding in the server, though. It 
wouldn't take much code in the server, and would make life easier for 
people writing their scripts. Maybe even have the backend check for a 
.partial file if it can't find a regularly named XLOG file. Needs some 
thought..


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 wrote:
> On 27.10.2011 14:09, Fujii Masao wrote:
>>
>> On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander
>>  wrote:
>>>
>>> I'm rewriting the handling of partial files per the other thread
>>> started by Heikki. The idea is that there will be an actual .partial
>>> file in there when pg_receivexlog has ended, and you have to deal with
>>> it manually. The typical way would be to pad it with zeroes to the
>>> end. Doing such padding would solve this recovery issue, correct?
>>
>> Yes. But that sounds unuserfriendly. Padding the WAL file manually
>> is easy-to-do for a user?
>
> "truncate -s 16M " works at least on my Linux system. Not sure how
> you'd do it on Windows.

Yeah, that's easy enough. I don't think there are similar tools on
Windows, but we could probably put together a quick script for people
to use if necessary.


> Perhaps we should add automatic padding in the server, though. It wouldn't
> take much code in the server, and would make life easier for people writing
> their scripts. Maybe even have the backend check for a .partial file if it
> can't find a regularly named XLOG file. Needs some thought..

I'd definitely want to avoid anything that requires pg_receivexlog to
actually *parse* the WAL. That'll make it way more complex than I'd
like.

Having recovery consider a .partial file might be interesting. It
could consider that only if there are no other complete files
available, or something like that?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs  wrote:
> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane  wrote:
>> Robert Haas  writes:
>>> On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs  wrote:
 This fixes both the subtrans and clog bugs in one patch.
>>
>>> I don't see the point of changing StartupCLOG() to be an empty
>>> function and adding a new function TrimCLOG() that does everything
>>> StartupCLOG() used to do.
>>
>> +1 ... I found that overly cute also.
>
> It would have been even easier to move StartupCLOG() later, but then
> we'd need a big comment explaining why CLOG starts up at one point and
> subtrans starts up at another point, since that is very confusing way
> of doing things. I wrote it that way first and it definitely looks
> strange.
>
> It's much easier to understand that StartupCLOG() is actually a no-op
> and that we need to trim the clog at the end of recovery in all cases.

If it's a no-op, why have it at all?  I know we have a bunch of places
in the code where we have empty stubs where there used to be
initialization or cleanup code, but I've never found that particularly
good style.  If something no longer requires initialization in a
certain place, I think we should nuke the whole function.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 7:19 AM, Heikki Linnakangas
 wrote:
> On 27.10.2011 14:09, Fujii Masao wrote:
>>
>> On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander
>>  wrote:
>>>
>>> I'm rewriting the handling of partial files per the other thread
>>> started by Heikki. The idea is that there will be an actual .partial
>>> file in there when pg_receivexlog has ended, and you have to deal with
>>> it manually. The typical way would be to pad it with zeroes to the
>>> end. Doing such padding would solve this recovery issue, correct?
>>
>> Yes. But that sounds unuserfriendly. Padding the WAL file manually
>> is easy-to-do for a user?
>
> "truncate -s 16M " works at least on my Linux system. Not sure how
> you'd do it on Windows.

One of the common complaints I hear about PostgreSQL is that our
replication system is more difficult to set up than people would like,
and that it's too easy to make mistakes that can corrupt your data
without realizing it; I don't think making people truncate a file to
16 megabytes by hand is going to improve things there.

> Perhaps we should add automatic padding in the server, though. It wouldn't
> take much code in the server, and would make life easier for people writing
> their scripts. Maybe even have the backend check for a .partial file if it
> can't find a regularly named XLOG file. Needs some thought..

+1 for figuring out something, though I'm not sure exactly what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 08:57 , Heikki Linnakangas wrote:
> On 27.10.2011 02:29, Florian Pflug wrote:
>> Per my theory about the cause of the problem in my other mail, I think you
>> might see StartupCLOG failures even during crash recovery, provided that
>> wal_level was set to hot_standby when the primary crashed. Here's how
>> 
>> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
>> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>>   The assigned XID requires CLOG extension.
>> 3) The checkpoint continues, and LogStandbySnapshot () advances the
>>   checkpoint's nextXid to the XID assigned in (2).
>> 4) We crash after writing the checkpoint record, but before the CLOG
>>   extension makes it to the disk, and before any trace of the XID assigned
>>   in (2) makes it to the xlog.
>> 
>> Then StartupCLOG() would fail at the end of recovery, because we'd end up
>> with a nextXid whose corresponding CLOG page doesn't exist.
> 
> No, clog extension is WAL-logged while holding the XidGenLock. At step 3,
> LogStandbySnapshot() would block until the clog-extension record is written
> to WAL, so crash recovery would see and replay that record before calling
> StartupCLOG().

Hm, true. But it still seems wrong for LogStandbySnapshot() to modify the
checkpoint's nextXid, and even more wrong to do that only if wal_level =
hot_standby. Plus, I think it's a smart idea to verify that the required
parts of the CLOG are available at the start of recovery, because if they're
missing, the data on the standby *will* be corrupted. Is there any argument
against doing LogStandbySnapshot() earlier (i.e., at the time oldestActiveXid
is computed)?

best regards,
Florian Pflug





Re: [HACKERS] isolationtester and invalid permutations

2011-10-27 Thread Robert Haas
On Tue, Oct 25, 2011 at 9:13 PM, Alvaro Herrera  wrote:
> Instead of simply aborting a spec that specifies running commands on
> blocked sessions (what we call an invalid permutation), it seems more
> useful to report the problem, cleanup the sessions, and continue with
> the next permutation.
>
> This, in conjunction with the dry-run patch I submitted earlier, makes
> it easier to determine a working spec: dry-run the spec; copy the
> so-generated permutation lines into the spec; run the spec normally,
> which reports the invalid permutations; comment out the invalid
> permutations from the spec; done.
>
> The attached patch, again from Alexander Shulgin (with some tweaks from
> me) does that.
>
> Comments?

Seems sensible.  I think we should avoid including invalid
permutations in our regression test suite, but this still seems useful
for the reasons you mention.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Erik Rijkers
(pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)

Hi,

here is another peculiarity, which I think is a bug:

-- first without CORRESPONDING:

$ psql -Xaf null.sql
select 1 a   , 2 b
union all
select null a, 4 b ;
 a | b
---+---
 1 | 2
   | 4
(2 rows)

-- then with CORRESPONDING:

select 1 a   , 2 b
union all
corresponding
select null a, 4 b ;
psql:null.sql:9: ERROR:  failed to find conversion function from unknown to 
integer


If the null value is in a table column, the error does not occur:

drop table if exists t1; create table t1 (a int, b int); insert into t1 values 
(1,2);
drop table if exists t2; create table t2 (a int, b int); insert into t2 values 
(null,2);
select a,b from t1
union all
corresponding
select a,b from t2 ;
 a | b
---+---
 1 | 2
   | 2
(2 rows)


I'm not sure it is actually a bug, but it seems an unnecessary error.


thanks,

Erik Rijkers




[HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
Chris Redekop's recent report of slow startup for Hot Standby has made
me revisit the code there.

Although there isn't a bug, there is a missed opportunity for starting
up faster which could be the source of Chris' annoyance.

The following patch allows a faster startup in some circumstances.

The patch also alters the log levels for messages and gives a single
simple message for this situation. The log will now say

  LOG:  recovery snapshot waiting for non-overflowed snapshot or until
oldest active xid on standby is at least %u (now %u)
  ...multiple times until snapshot non-overflowed or xid reached...

whereas before the first LOG message shown was

  LOG:  consistent state delayed because recovery snapshot incomplete
  and only later, at DEBUG2 do you see
  LOG:  recovery snapshot waiting for %u oldest active xid on standby is %u
  ...multiple times until xid reached...

Comments please.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_hot_standby_startup_withsubxacts.v1.patch
Description: Binary data



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas  wrote:
> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs  wrote:
>> On Thu, Oct 27, 2011 at 4:36 AM, Tom Lane  wrote:
>>> Robert Haas  writes:
 On Wed, Oct 26, 2011 at 12:16 PM, Simon Riggs  
 wrote:
> This fixes both the subtrans and clog bugs in one patch.
>>>
 I don't see the point of changing StartupCLOG() to be an empty
 function and adding a new function TrimCLOG() that does everything
 StartupCLOG() used to do.
>>>
>>> +1 ... I found that overly cute also.
>>
>> It would have been even easier to move StartupCLOG() later, but then
>> we'd need a big comment explaining why CLOG starts up at one point and
>> subtrans starts up at another point, since that is a very confusing way
>> of doing things. I wrote it that way first and it definitely looks
>> strange.
>>
>> It's much easier to understand that StartupCLOG() is actually a no-op
>> and that we need to trim the clog at the end of recovery in all cases.
>
> If it's a no-op, why have it at all?  I know we have a bunch of places
> in the code where we have empty stubs where there used to be
> initialization or cleanup code, but I've never found that particularly
> good style.  If something no longer requires initialization in a
> certain place, I think we should nuke the whole function.

It is a no-op for exactly the same reason other similar functions are
no-ops - it used to do something but now does not.

Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
will immediately ask "why?".
IMHO it's easier to have an obviously named function than a comment -
it's less invasive for a backpatch as well.

I'm following current code style. If you wish to change that, feel
free to change this and all other locations that do this. Until then,
doing this makes most sense and follows current coding style.

If I had done it the way you suggest, I don't doubt someone would say
in about 6 months "Which idiot removed StartupClog()?".

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug  wrote:

> Per my theory about the cause of the problem in my other mail, I think you
> might see StartupCLOG failures even during crash recovery, provided that
> wal_level was set to hot_standby when the primary crashed. Here's how
>
> 1) We start a checkpoint, and get as far as LogStandbySnapshot()
> 2) A backend does AssignTransactionId, and gets as far as GetNewTransactionId().
>  The assigned XID requires CLOG extension.
> 3) The checkpoint continues, and LogStandbySnapshot () advances the
>  checkpoint's nextXid to the XID assigned in (2).
> 4) We crash after writing the checkpoint record, but before the CLOG
>  extension makes it to the disk, and before any trace of the XID assigned
>  in (2) makes it to the xlog.
>
> Then StartupCLOG() would fail at the end of recovery, because we'd end up
> with a nextXid whose corresponding CLOG page doesn't exist.

Clog extension holds XidGenLock, as does LogStandbySnapshot, which
specifically excludes the above scenario.


> Quite aside from that concern, I think it's probably not a good idea
> for the nextXid value of a checkpoint to depend on whether wal_level
> was set to hot_standby or not. Our recovery code is already quite complex
> and hard to test, and this just adds one more combination that has to
> be thought about while coding and that needs to be tested.
>
> My suggestion is to fix the CLOG problem in that same way that you fixed
> the SUBTRANS problem, i.e. by moving LogStandbySnapshot() to before
> CheckPointGuts().
>
> Here's what I image CreateCheckPoint() should look like:
>
> 1) LogStandbySnapshot() and fill out oldestActiveXid
> 2) Fill out REDO
> 3) Wait for concurrent commits
> 4) Fill out nextXid and the other fields
> 5) CheckPointGuts()
> 6) Rest
>
> It's then no longer necessary for LogStandbySnapshot() to modify
> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
> will thus derive a higher value than LogStandbySnapshot() would have.
>
> We could then also fold GetOldestActiveTransactionId() back into
> your proposed LogStandbySnapshot() and thus don't need two ProcArray
> traversals.

I think you make a good case for doing this.

However, I'm concerned that moving LogStandbySnapshot() in a backpatch
seems more risky than it's worth. We could easily introduce a new bug
into what we would all agree is a complex piece of code. Minimal
change seems best in this case.

And also, 2 ProcArray traversals is not a problem there.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 15:51 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 12:29 AM, Florian Pflug  wrote:
>> Here's what I image CreateCheckPoint() should look like:
>> 
>> 1) LogStandbySnapshot() and fill out oldestActiveXid
>> 2) Fill out REDO
>> 3) Wait for concurrent commits
>> 4) Fill out nextXid and the other fields
>> 5) CheckPointGuts()
>> 6) Rest
>> 
>> It's then no longer necessary for LogStandbySnapshot() to modify
>> the nextXid, since we fill out nextXid after LogStandbySnapshot() and
>> will thus derive a higher value than LogStandbySnapshot() would have.
>> 
>> We could then also fold GetOldestActiveTransactionId() back into
>> your proposed LogStandbySnapshot() and thus don't need two ProcArray
>> traversals.
> 
> I think you make a good case for doing this.
> 
> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
> seems more risky than it's worth. We could easily introduce a new bug
> into what we would all agree is a complex piece of code. Minimal
> change seems best in this case.

OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
Your proposed patch changes that, which also carries a risk since something
could depend on these values being in sync. Especially since both the logged
snapshot and oldestActiveXid influence the snapshot tracking on the slave.

But since you wrote most of that code, your judgement about the relative
risks of these two approaches obviously outweighs mine.

best regards,
Florian Pflug




Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Tom Lane
Simon Riggs  writes:
> On Thu, Oct 27, 2011 at 12:36 PM, Robert Haas  wrote:
>> On Thu, Oct 27, 2011 at 5:37 AM, Simon Riggs  wrote:
>>> It's much easier to understand that StartupCLOG() is actually a no-op
>>> and that we need to trim the clog at the end of recovery in all cases.

>> If it's a no-op, why have it at all?

> It is a no-op for exactly the same reason other similar functions are
> no-ops - it used to do something but now does not.

> Anyone seeing StartupSubtrans and StartupMultiXact but no StartupClog
> will immediately ask "why?".

I think it's a good point that StartupCLog doesn't exist in a vacuum
but should be parallel to the init functions for the other SLRU modules.
So at this point I think I agree with Simon's approach.  However, the
obvious next question is whether those other modules don't need to be
changed also, and if not why not.

Another issue is that if StartupCLog is left as a no-op, what will
happen if someone mistakenly tries to access clog before the trim
function is called?  It would be a good idea to make sure that such
a thing results in an easily-identifiable failure.

regards, tom lane



Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup

2011-10-27 Thread Magnus Hagander
On Wed, Oct 26, 2011 at 09:52, Heikki Linnakangas
 wrote:
> (CC'ing pgsql-hackers, this started as an IM discussion yesterday but really
> belongs in the archives)
>
> On 25.10.2011 23:52, Magnus Hagander wrote:
>>>
>>> There's a tiny chance to get incomplete xlog files with pg_receivexlog if
>>> you crash:
>>> 1. pg_receivexlog finishes write()ing a file but system crashes before
>>> fsync() finishes.
>>> 2. When pg_receivexlog restarts after crash, the last WAL file was not
>>> fully flushed to disk, with
>>> holes in the middle, but it has the right length. pg_receivexlog will
>>> continue streaming from the next file.
>>> not sure if we care about such a narrow window, but maybe we do
>>
>> So how would we go about fixing that?  Always unlink the last file in
>> the directory and try from there would seem dangerous too - what if
>> it's not available on the master anymore, then we might have given up
>> on data...
>
> Start streaming from the beginning of the last segment, but don't unlink it
> first. Just overwrite it as you receive the data.
>
> Or, always create new xlog file as "0001000100D3.partial", and
> only when it's fully written, fsync it, and then rename it to
> "0001000100D3". Then you know that if a file doesn't have the
> .partial suffix, it's complete. The fact that the last partial file always
> has the .partial suffix needs some extra pushups in the restore_command,
> though.

Here's a version that does this. Turns out this requires a lot less
code than what was previously in there, which is always nice.

We still need to solve the other part which is how to deal with the
partial files on restore. But this is definitely a cleaner way from a
pure pg_receivexlog perspective.

Comments/reviews?
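[Editorial note: the rename-on-completion protocol being implemented here can be sketched as follows. This is an illustrative sketch only, not the patch itself: the function names start_segment and finish_segment are invented for the example, error handling is minimal, and a real implementation would also fsync() the file and its containing directory around the rename.]

```c
#include <assert.h>
#include <stdio.h>

/*
 * Sketch of the ".partial" protocol: stream into NAME.partial, and only
 * rename to NAME once the segment is complete, so any file without the
 * suffix is known to be fully written.
 */

/* Open NAME.partial for writing; a reader seeing the suffix knows the
 * segment is not yet complete. */
static int
start_segment(const char *final_name)
{
	char		partial[1024];
	FILE	   *f;

	snprintf(partial, sizeof(partial), "%s.partial", final_name);
	f = fopen(partial, "wb");
	if (f == NULL)
		return -1;
	/* ... received WAL data would be written here ... */
	return fclose(f);
}

/* Atomically (on POSIX filesystems) drop the suffix once the segment
 * is fully written, marking it complete. */
static int
finish_segment(const char *final_name)
{
	char		partial[1024];

	snprintf(partial, sizeof(partial), "%s.partial", final_name);
	return rename(partial, final_name);
}
```

Because rename() within one filesystem is atomic, a crash leaves either the .partial file or the final file, never a complete-looking but half-written segment.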

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/
*** a/src/bin/pg_basebackup/pg_receivexlog.c
--- b/src/bin/pg_basebackup/pg_receivexlog.c
***
*** 71,104  usage(void)
  static bool
  segment_callback(XLogRecPtr segendpos, uint32 timeline)
  {
- 	char		fn[MAXPGPATH];
- 	struct stat statbuf;
- 
  	if (verbose)
  		fprintf(stderr, _("%s: finished segment at %X/%X (timeline %u)\n"),
  progname, segendpos.xlogid, segendpos.xrecoff, timeline);
  
  	/*
- 	 * Check if there is a partial file for the name we just finished, and if
- 	 * there is, remove it under the assumption that we have now got all the
- 	 * data we need.
- 	 */
- 	segendpos.xrecoff /= XLOG_SEG_SIZE;
- 	PrevLogSeg(segendpos.xlogid, segendpos.xrecoff);
- 	snprintf(fn, sizeof(fn), "%s/%08X%08X%08X.partial",
- 			 basedir, timeline,
- 			 segendpos.xlogid,
- 			 segendpos.xrecoff);
- 	if (stat(fn, &statbuf) == 0)
- 	{
- 		/* File existed, get rid of it */
- 		if (verbose)
- 			fprintf(stderr, _("%s: removing file \"%s\"\n"),
- 	progname, fn);
- 		unlink(fn);
- 	}
- 
- 	/*
  	 * Never abort from this - we handle all aborting in continue_streaming()
  	 */
  	return false;
--- 71,81 
***
*** 133,139  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  	bool		b;
  	uint32		high_log = 0;
  	uint32		high_seg = 0;
- 	bool		partial = false;
  
  	dir = opendir(basedir);
  	if (dir == NULL)
--- 110,115 
***
*** 195,201  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  			disconnect_and_exit(1);
  		}
  
! 		if (statbuf.st_size == 16 * 1024 * 1024)
  		{
  			/* Completed segment */
  			if (log > high_log ||
--- 171,177 
  			disconnect_and_exit(1);
  		}
  
! 		if (statbuf.st_size == XLOG_SEG_SIZE)
  		{
  			/* Completed segment */
  			if (log > high_log ||
***
*** 208,244  FindStreamingStart(XLogRecPtr currentpos, uint32 currenttimeline)
  		}
  		else
  		{
! 			/*
! 			 * This is a partial file. Rename it out of the way.
! 			 */
! 			char		newfn[MAXPGPATH];
! 
! 			fprintf(stderr, _("%s: renaming partial file \"%s\" to \"%s.partial\"\n"),
! 	progname, dirent->d_name, dirent->d_name);
! 
! 			snprintf(newfn, sizeof(newfn), "%s/%s.partial",
! 	 basedir, dirent->d_name);
! 
! 			if (stat(newfn, &statbuf) == 0)
! 			{
! /*
!  * XXX: perhaps we should only error out if the existing file
!  * is larger?
!  */
! fprintf(stderr, _("%s: file \"%s\" already exists. Check and clean up manually.\n"),
! 		progname, newfn);
! disconnect_and_exit(1);
! 			}
! 			if (rename(fullpath, newfn) != 0)
! 			{
! fprintf(stderr, _("%s: could not rename \"%s\" to \"%s\": %s\n"),
! 		progname, fullpath, newfn, strerror(errno));
! disconnect_and_exit(1);
! 			}
! 
! 			/* Don't continue looking for more, we assume this is the last */
! 			partial = true;
! 			break;
  		}
  	}
  
--- 184,192 
  		}
  		else
  		{
! 			fprintf(stderr, _("%s: segment file '%s' is incorrect size %d, skipping\n"),
! 	progname, dirent->d_name, (int) statbuf.st_size);
! 			continue;
  		}
  	}
  
***
*** 247,263  FindStreamingStart(XLogRe

Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug  wrote:

>> I think you make a good case for doing this.
>>
>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>> seems more risky than it's worth. We could easily introduce a new bug
>> into what we would all agree is a complex piece of code. Minimal
>> change seems best in this case.
>
> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
> Your proposed patch changes that, which also carries a risk since something
> could depend on these values being in sync. Especially since both the logged
> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>
> But since you wrote most of that code, your judgement about the relative
> risks of these two approaches obviously out-weights mine.

We must move oldestActiveXid since that is the source of a bug. There
is no need to move LogStandbySnapshot(), so I am suggesting we don't
do that for the backpatch. I was going to implement it the way you
suggest in HEAD, since I agree that is a cleaner way.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Tom Lane
Magnus Hagander  writes:
> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
>  wrote:
>> On 27.10.2011 14:09, Fujii Masao wrote:
>>> Yes. But that sounds user-unfriendly. Is padding the WAL file manually
>>> easy for a user to do?

> I'd definitely want to avoid anything that requires pg_receivexlog to
> actually *parse* the WAL. That'll make it way more complex than I'd
> like.

What parsing?  Just pad to 16MB with zeroes.  In fact, I think the
receiver should just create the file that size to start with, and then
write received data into it, much like normal WAL creation does.
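[Editorial note: the preallocation idea can be sketched like this. Illustrative only: create_padded_segment is an invented name, and the backend's real WAL-creation code does this with more care, using a temporary file, fsync, and install-by-rename.]

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define SEG_SIZE (16 * 1024 * 1024)	/* XLOG_SEG_SIZE in a default build */

/*
 * Pre-create a fully zero-padded segment, so that later writes simply
 * overwrite zeroes in place and the file always has its full length.
 */
static int
create_padded_segment(const char *path)
{
	char		buf[8192];
	long		written = 0;
	FILE	   *f = fopen(path, "wb");

	if (f == NULL)
		return -1;
	memset(buf, 0, sizeof(buf));
	while (written < SEG_SIZE)
	{
		if (fwrite(buf, 1, sizeof(buf), f) != sizeof(buf))
		{
			fclose(f);
			return -1;
		}
		written += sizeof(buf);
	}
	return fclose(f);
}

/* Helper for checking the result. */
static long
file_size(const char *path)
{
	FILE	   *f = fopen(path, "rb");
	long		sz;

	if (f == NULL)
		return -1;
	fseek(f, 0, SEEK_END);
	sz = ftell(f);
	fclose(f);
	return sz;
}
```

With this scheme a restore_command never sees a short file, though, as discussed below, it can no longer tell a completed segment from a padded partial one by size alone.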

regards, tom lane



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 16:54, Tom Lane  wrote:
> Magnus Hagander  writes:
>> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
>>  wrote:
>>> On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds user-unfriendly. Is padding the WAL file manually
 easy for a user to do?
>
>> I'd definitely want to avoid anything that requires pg_receivexlog to
>> actually *parse* the WAL. That'll make it way more complex than I'd
>> like.
>
> What parsing?  Just pad to 16MB with zeroes.  In fact, I think the

I'm just saying that *if* parsing is required, it would be bad.

> receiver should just create the file that size to start with, and then
> write received data into it, much like normal WAL creation does.

So when pg_receivexlog starts up, how would it know if the last file
represents a completed file, or a half-full file, without actually
parsing it? It could be a 16MB file with 10 bytes of valid data, or a
complete file with 16MB of valid data.

We could always ask for a retransmit of the whole file, but if that
file is gone on the master, we won't be able to do that, and will
error out in a situation that's not actually an error.

Though if we leave the file as .partial up until this point
(per my other patch just posted), I guess this doesn't actually apply
- if the file is called .partial, we'll overwrite into it. If it's
not, then we assume it's a complete segment.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Dimitri Fontaine
Magnus Hagander  writes:
> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
>  wrote:
>> Perhaps we should add automatic padding in the server, though. It wouldn't
>> take much code in the server, and would make life easier for people writing
>> their scripts. Maybe even have the backend check for a .partial file if it
>> can't find a regularly named XLOG file. Needs some thought..
>
> I'd definitely want to avoid anything that requires pg_receivexlog to
> actually *parse* the WAL. That'll make it way more complex than I'd
> like.

What about creating the WAL file filled up with zeroes at the receiving
end and then overwriting data as we receive it?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 3:13 PM, Tom Lane  wrote:

> However, the
> obvious next question is whether those other modules don't need to be
> changed also, and if not why not.

Good point.

StartupSubtrans() is also changed by this patch, since it will be
supplied with an earlier initialisation value.

StartupMultiXact() didn't need changing, I thought, but I will review further.

> Another issue is that if StartupCLog is left as a no-op, what will
> happen if someone mistakenly tries to access clog before the trim
> function is called?  It would be a good idea to make sure that such
> a thing results in an easily-identifiable failure.

The old StartupCLOG() didn't do anything that was essential to using
the clog, which is why it's a no-op.

You can still use the clog, just with zero startup.

Maybe setting the current page should go in at startup, will think.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
On the docs page for the SELECT statement, there is a caution which
starts with:
 
| It is possible for a SELECT command using ORDER BY and FOR
| UPDATE/SHARE to return rows out of order. This is because ORDER BY
| is applied first.
 
Is this risk limited to queries running in READ COMMITTED
transactions?  If so, I think that should be mentioned in the
caution.
 
-Kevin



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
 wrote:
> On the docs page for the SELECT statement, there is a caution which
> starts with:
>
> | It is possible for a SELECT command using ORDER BY and FOR
> | UPDATE/SHARE to return rows out of order. This is because ORDER BY
> | is applied first.
>
> Is this risk limited to queries running in READ COMMITTED
> transactions?  If so, I think that should be mentioned in the
> caution.

I think it should say that if this occurs with SERIALIZABLE transactions
it will result in a serialisation error.

Just to say there is no effect in serializable mode wouldn't be helpful.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



[HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread Tom Lane
Some off-list discussion found that the cause of this problem:
http://archives.postgresql.org/pgsql-general/2011-10/msg00879.php
was an attempt to write a plpgsql IF-ELSIF-ELSIF-ELSIF-ELSIF-...-ELSE
statement with five thousand branches.  Putting aside the wisdom of
doing that, it seems like the parser ought not fall over when you do.
The reason it's doing so is that plpgsql's grammar handles ELSIF with a
rule like this:

stmt_else: K_ELSIF expr_until_then proc_sect stmt_else

This is called right recursion in the Bison manual (as opposed to
left recursion, where a nonterminal has itself as the *first*
item in its own expansion), and the manual specifically says you
should always avoid right recursion in favor of left recursion,
else you are vulnerable to parser stack overflow.  Duh.
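[Editorial note: schematically, using the rule above. The left-recursive variant is illustrative, not the actual committed grammar.]

```
/* right recursion -- parser stack grows with the length of the chain: */
stmt_else : K_ELSIF expr_until_then proc_sect stmt_else ;

/* left recursion -- each ELSIF is reduced as soon as it is seen,
 * so stack depth stays constant: */
elsif_list: /* empty */
          | elsif_list K_ELSIF expr_until_then proc_sect ;
```

With right recursion Bison must shift every ELSIF before it can reduce anything, so a 5000-branch chain needs 5000 stacked states; with left recursion each branch is reduced immediately.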

So, looking a bit more closely into why it's done that way, it's
because ELSIF was shoehorned into a parsetree representation that
was only meant for plain IF-THEN-ELSE.  The generated parse tree
will have a new level of PLpgSQL_stmt_if struct for each ELSIF.
That means that even if we fix the grammar, we're still at risk of
stack overflow at execution time because of recursion in pl_exec.c.

So really what needs to be done here is to make ELSIF chains explicit in
the parsetree representation, and handle them via looping not recursion
at runtime.  This will also make it a lot easier to do the grammar via
left-recursion.

So I'm going to go off and do that, but I wonder whether anyone thinks
this is sufficiently important to back-patch.  I'm inclined to think
that back-patching isn't a good idea, because changing the
representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql
debugger; ISTM the number of complaints isn't enough to warrant doing
that in released branches.

More generally, it seems like a good idea to check whether there are
any other unnecessary uses of right recursion in the core and plpgsql
grammars.  Unless somebody knows of an existing tool to do that,
I'm thinking of writing a little perl script to check for it (probably
by parsing the output of "bison -v") and adding the script to src/tools/
so we'll have it around to check again in the future.  Any objections?

regards, tom lane



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Thanks for the patch Simon, but unfortunately it does not resolve the issue
I am seeing.  The standby still refuses to finish starting up until long
after all clients have disconnected from the primary (>10 minutes).  I do
see your new log statement on startup, but only once - it does not repeat.
 Is there any way for me to see what the oldest xid on the standby is via
controldata or something like that?  The standby does stream to keep up with
the primary while the primary has load, and then it becomes idle when the
primary becomes idle (when I kill all the connections), so it appears to
be current...but it just doesn't finish starting up

I'm not sure if it's relevant, but after it has sat idle for a couple
minutes I start seeing these statements in the log (with the same offset
every time):
DEBUG:  skipping restartpoint, already performed at 9/9520



On Thu, Oct 27, 2011 at 7:26 AM, Simon Riggs  wrote:

> Chris Redekop's recent report of slow startup for Hot Standby has made
> me revisit the code there.
>
> Although there isn't a bug, there is a missed opportunity for starting
> up faster which could be the source of Chris' annoyance.
>
> The following patch allows a faster startup in some circumstances.
>
> The patch also alters the log levels for messages and gives a single
> simple message for this situation. The log will now say
>
>  LOG:  recovery snapshot waiting for non-overflowed snapshot or until
> oldest active xid on standby is at least %u (now %u)
>  ...multiple times until snapshot non-overflowed or xid reached...
>
> whereas before the first LOG message shown was
>
>  LOG:  consistent state delayed because recovery snapshot incomplete
>  and only later, at DEBUG2 do you see
>  LOG:  recovery snapshot waiting for %u oldest active xid on standby is %u
>  ...multiple times until xid reached...
>
> Comments please.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>
>
>
>


Re: [HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread David E. Wheeler
On Oct 27, 2011, at 9:18 AM, Tom Lane wrote:

> So I'm going to go off and do that, but I wonder whether anyone thinks
> this is sufficiently important to back-patch.  I'm inclined to think
> that back-patching isn't a good idea, because changing the
> representation of PLpgSQL_stmt_if will break (at least) EDB's plpgsql
> debugger; ISTM the number of complaints isn't enough to warrant doing
> that in released branches.

+1 to not back-patching. Seems like it doesn't come up all that often, right?

Best,

David




Re: [HACKERS] plpgsql versus long ELSIF chains

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 19:18, Tom Lane wrote:

So really what needs to be done here is to make ELSIF chains explicit in
the parsetree representation, and handle them via looping not recursion
at runtime.  This will also make it a lot easier to do the grammar via
left-recursion.

So I'm going to go off and do that, but I wonder whether anyone thinks
this is sufficiently important to back-patch.


This doesn't look like a bug to me, just a suboptimal implementation. So 
-1 for backpatching.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop  wrote:

> Thanks for the patch Simon, but unfortunately it does not resolve the issue
> I am seeing.  The standby still refuses to finish starting up until long
> after all clients have disconnected from the primary (>10 minutes).  I do
> see your new log statement on startup, but only once - it does not repeat.
>  Is there any way for me to see what the oldest xid on the standby is via
> controldata or something like that?  The standby does stream to keep up with
> the primary while the primary has load, and then it becomes idle when the
> primary becomes idle (when I kill all the connections), so it appears to
> be current...but it just doesn't finish starting up
> I'm not sure if it's relevant, but after it has sat idle for a couple
> minutes I start seeing these statements in the log (with the same offset
> every time):
> DEBUG:  skipping restartpoint, already performed at 9/9520

OK, so it looks like there are 2 opportunities to improve, not just one.

Try this.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


faster_hot_standby_startup_withsubxacts.v2.patch
Description: Binary data



Re: [HACKERS] Bug in walsender when calling out to do_pg_stop_backup (and others?)

2011-10-27 Thread Bruce Momjian
Greg Jaskiewicz wrote:
> 
> On 19 Oct 2011, at 18:28, Florian Pflug wrote:
> > All the other flags which indicate cancellation reasons are set from signal 
> > handers, I believe. We could of course mark as ClientConnectionLostPending 
> > as volatile just to be consistent. Not sure whether that's a good idea, or 
> > not. It might prevent a mistake should we ever add code to detect lost 
> > connections asynchronously (i.e., from somewhere else than pq_flush). And 
> > the cost is probably negligible, because CHECK_FOR_INTERRUPTS tests for 
> > InterruptPending before calling ProcessInterrupts(), so we only pay the 
> > cost of volatile if there's actually an interrupt pending. But I still 
> > think it's better to add qualifies such a volatile only when really 
> > necessary. A comment about why it *isn't* volatile is probably in order, 
> > though, so I'll add that in the next version of the patch.
> >
> Makes sense.
> 
> I had to ask, because it sticks out. And indeed there is a possibility
> that someone will one day try to use it from a signal handler, etc.

A C comment might help there.

--
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
Simon Riggs  wrote:
> On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
>  wrote:
>> On the docs page for the SELECT statement, there is a caution
>> which starts with:
>>
>> | It is possible for a SELECT command using ORDER BY and FOR
>> | UPDATE/SHARE to return rows out of order. This is because ORDER
>> | BY is applied first.
>>
>> Is this risk limited to queries running in READ COMMITTED
>> transactions?  If so, I think that should be mentioned in the
>> caution.
> 
> I think it should say that if this occurs with SERIALIZABLE
> transactions it will result in a serialisation error.
> 
> Just to say there is no effect in serializable mode wouldn't be
> helpful.
 
Hmm.  At first reading I thought this was related to the
mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. 
Does anyone know which isolation levels are affected?  Barring that,
can anyone point to an existing test which demonstrates the problem?
 
If this can happen in snapshot isolation with just one reader and
one writer, I doubt that SSI helps with it.  :-(
 
-Kevin



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 1:51 PM, Kevin Grittner
 wrote:
> Simon Riggs  wrote:
>> On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
>>  wrote:
>>> On the docs page for the SELECT statement, there is a caution
>>> which starts with:
>>>
>>> | It is possible for a SELECT command using ORDER BY and FOR
>>> | UPDATE/SHARE to return rows out of order. This is because ORDER
>>> | BY is applied first.
>>>
>>> Is this risk limited to queries running in READ COMMITTED
>>> transactions?  If so, I think that should be mentioned in the
>>> caution.
>>
>> I think it should say that if this occurs with SERIALIZED
>> transactions it will result in a serialisation error.
>>
>> Just to say there is no effect in serializable mode wouldn't be
>> helpful.
>
> Hmm.  At first reading I thought this was related to the
> mixed-snapshot issue in READ COMMITTED, but now I'm not so sure.
> Does anyone know which isolation levels are affected?  Barring that,
> can anyone point to an existing test which demonstrates the problem?
>
> If this can happen in snapshot isolation with just one reader and
> one writer, I doubt that SSI helps with it.  :-(

Simple test case:

rhaas=# create table oops (a int);
CREATE TABLE
rhaas=# insert into oops values (1), (2), (3), (4);
INSERT 0 4
rhaas=# begin;
BEGIN
rhaas=# update oops set a = 5 where a = 2;
UPDATE 1

In another session:

rhaas=# select * from oops order by 1 for update;


Back to the first session:

rhaas=# commit;
COMMIT

Second session now returns:

 a
---
 1
 5
 3
 4
(4 rows)

But if you do the same thing at REPEATABLE READ, you get:

ERROR:  could not serialize access due to concurrent update
STATEMENT:  select * from oops order by 1 for update;

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Kevin Grittner
Robert Haas  wrote:
> Simple test case:
> 
> rhaas=# create table oops (a int);
> CREATE TABLE
> rhaas=# insert into oops values (1), (2), (3), (4);
> INSERT 0 4
> rhaas=# begin;
> BEGIN
> rhaas=# update oops set a = 5 where a = 2;
> UPDATE 1
> 
> In another session:
> 
> rhaas=# select * from oops order by 1 for update;
> 
> 
> Back to the first session:
> 
> rhaas=# commit;
> COMMIT
> 
> Second session now returns:
> 
>  a
> ---
>  1
>  5
>  3
>  4
> (4 rows)
> 
> But if you do the same thing at REPEATABLE READ, you get:
> 
> ERROR:  could not serialize access due to concurrent update
> STATEMENT:  select * from oops order by 1 for update;
 
So it seems to me that the caution about this issue is only
half-right.  Below REPEATABLE READ isolation it behaves as currently
described; REPEATABLE READ or SERIALIZABLE will throw that error. 
That is probably worth noting, since:
 
(1)  People should understand that they can't get incorrect results
at either of the stricter isolation levels.
 
(2)  They *can* get a serialization failure involving just two
transactions: a read and a write.  This is not something which
normally happens at any level, so it might tend to surprise people.
 
No words leap to mind for me.  Anyone else?
 
-Kevin



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Kerem Kat
Hi,

Union with NULL error persists without the corresponding patch. Here
is the output from postgres without the patch:

SELECT a FROM (SELECT 1 a) foo
UNION
SELECT a FROM (SELECT NULL a) foo2;

ERROR:  failed to find conversion function from unknown to integer


It is thrown from parse_coerce.c:coerce_type method. I will try to dig
deep on it.


Regards,

Kerem KAT


On Thu, Oct 27, 2011 at 15:45, Erik Rijkers  wrote:
> (pgsql 9.2devel (25 oct) with your latest CORRESPONDING patch;
> linux x86_64 GNU/Linux 2.6.18-274.3.1.el5)
>
> Hi,
>
> here is another peculiarity, which I think is a bug:
>
> -- first without CORRESPONDING:
>
> $ psql -Xaf null.sql
>            select 1 a   , 2 b
> union all
>            select null a, 4 b ;
>  a | b
> ---+---
>  1 | 2
>   | 4
> (2 rows)
>
> -- then with CORRESPONDING:
>
>            select 1 a   , 2 b
> union all
> corresponding
>            select null a, 4 b ;
> psql:null.sql:9: ERROR:  failed to find conversion function from unknown to 
> integer
>
>
> If the null value is in a table column the error does not occur:
>
> drop table if exists t1; create table t1 (a int, b int); insert into t1 
> values (1,2);
> drop table if exists t2; create table t2 (a int, b int); insert into t2 
> values (null,2);
>                select a,b from t1
> union all
> corresponding
>                select a,b from t2 ;
>  a | b
> ---+---
>  1 | 2
>   | 2
> (2 rows)
>
>
> I'm not sure it is actually a bug; but it seems an unnecessary error.
>
>
> thanks,
>
> Erik Rijkers
>
>



Re: [HACKERS] Hot Backup with rsync fails at pg_clog if under load

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 16:30 , Simon Riggs wrote:
> On Thu, Oct 27, 2011 at 3:03 PM, Florian Pflug  wrote:
> 
>>> I think you make a good case for doing this.
>>> 
>>> However, I'm concerned that moving LogStandbySnapshot() in a backpatch
>>> seems more risky than it's worth. We could easily introduce a new bug
>>> into what we would all agree is a complex piece of code. Minimal
>>> change seems best in this case.
>> 
>> OTOH, we currently compute oldestActiveXid within LogStandbySnapshot().
>> Your proposed patch changes that, which also carries a risk since something
>> could depend on these values being in sync. Especially since both the logged
>> snapshot and oldestActiveXid influence the snapshot tracking on the slave.
>> 
>> But since you wrote most of that code, your judgement about the relative
>> risks of these two approaches obviously out-weights mine.
> 
> We must move oldestActiveXid since that is the source of a bug. There
> is no need to move LogStandbySnapshot(), so I am suggesting we don't
> do that for the backpatch. I was going to implement it the way you
> suggest in HEAD, since I agree that is a cleaner way.

Sounds good.

best regards,
Florian Pflug




Re: [HACKERS] out-of-order caution

2011-10-27 Thread Tom Lane
"Kevin Grittner"  writes:
> Simon Riggs  wrote:
>> On Thu, Oct 27, 2011 at 4:41 PM, Kevin Grittner
>>  wrote:
>>> | It is possible for a SELECT command using ORDER BY and FOR
>>> | UPDATE/SHARE to return rows out of order. This is because ORDER
>>> | BY is applied first.

>> I think it should say that if this occurs with SERIALIZED
>> transactions it will result in a serialisation error.
 
> Hmm.  At first reading I thought this was related to the
> mixed-snapshot issue in READ COMMITTED, but now I'm not so sure. 

Simon's comment is correct.  If you do a SELECT FOR UPDATE/SHARE in a
non-READ-COMMITTED transaction, and it turns out that someone modified
the tuple before you could lock it, you'll get a serialization error
(cf ExecLockRows()), not updated data.  So out-of-order sorting is
not possible.

regards, tom lane



Re: [HACKERS] out-of-order caution

2011-10-27 Thread Tom Lane
"Kevin Grittner"  writes:
> (2)  They *can* get a serialization failure involving just two
> transactions: a read and a write.

Only if you ignore the difference between SELECT FOR UPDATE/SHARE and
plain SELECT.  I think calling the former a "read" is a conceptual error
to start with.  It has the same locking and synchronization behavior as
a write.

regards, tom lane



[HACKERS] portal with hold

2011-10-27 Thread fschmidt
I am posting to beg for the implementation of a "with hold" feature for
portals, similar to what is available for cursors.  This is needed by the JDBC
driver to implement Java's ResultSet.HOLD_CURSORS_OVER_COMMIT, which is needed
to make Java's setFetchSize() work, which in turn is needed to read large
result sets.  Please also see:

http://postgresql.1045698.n5.nabble.com/setFetchSize-tp4935215p4941614.html





[HACKERS] fun with unlogged tables

2011-10-27 Thread Robert Haas
One of the optimizations that I did for 9.1 was to make transactions
that touch only temporary and/or unlogged tables always commit
asynchronously, because if the database crashes the table contents
will be blown away in their entirety, and whether or not the commit
made it down to disk won't matter a bit.  In my testing, this hugely
improved the performance of unlogged tables.  However, Heikki recently
reported to me off-list that this can actually cause a significant
performance problem in some circumstances, because committing
asynchronously means that we can't set hint bits right away - we must
wait until the WAL writer has completed its background flush.  With
default settings, this takes long enough to cause lots of extra clog
traffic.  He ran into the problem while running pgbench at scale
factor 15, and I reproduced it the same way.  I believe that at a
large scale factor the effect is lessened because you're less likely
to reread the same row multiple times before the commit hits the disk.

It strikes me that, while it's not safe to set hint bits until the
commit record hits the disk for *permanent* relations, it ought to be
just fine for temporary and unlogged relations, because those pages
will be gone after a crash, and their hint bits with them.  Attached
is a patch taking that approach.  Another approach would be to have
transactions that only touch temporary or unlogged relations avoid
changing the value that will be returned by
TransactionIdGetCommitLSN().  This approach is better when
synchronous_commit=off and a transaction touches both permanent and
non-permanent tables, because it makes the decision as to whether hint
bits can be set early based on which page is being updated rather than
on some characteristic of the transaction.  However, it also adds a
small amount of overhead to the case where we're doing an asynchronous
commit on a permanent table, because we do one more check before
concluding that hint bits can't be set.

I did some benchmarking of this approach using pgbench with scale
factor 15.  shared_buffers = 8GB, maintenance_work_mem = 1GB,
checkpoint_segments = 30, checkpoint_timeout = 15min,
checkpoint_completion_target = 0.9, synchronous_commit = off.  On
permanent tables 8 clients came out slower and 16 came out faster; I'm
inclined to believe it's all in the noise.  On unlogged tables the
patch appears to be a clear win, massively so at 32 clients.  Results
below.  The first number on each line is the client count, and the
remaining numbers are tps including connections establishing, from
individual 5-minute runs.

Unlogged Tables, unpatched:
1 861.841894 752.762490 837.847109
8 3379.832456 4100.539369 3751.842036
16 6259.907605 5523.406202 4437.648873
32 4547.725770 5360.246166 4958.086754

Unlogged Tables, with patch:
1 887.562141 785.539717 920.275452
8 4436.366884 4374.135712 4335.791842
16 7518.908796 7478.427691 7476.817757
32 10433.615767 10450.577573 10374.186566

Permanent Tables, unpatched:
1 648.824800 647.239277 652.116249
8 3785.485647 3481.021391 3827.455756
16 5652.069678 4004.780105 4207.354612
32 5084.804778 4645.997471 5222.387075

Permanent Tables, with patch:
1 636.162775 637.145834 640.298944
8 3383.817707 3388.273809 3815.298676
16 5543.585926 5093.483757 5112.854318
32 5229.295024 4985.736460 5103.441187

Thoughts?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


bip.patch
Description: Binary data



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
Kerem Kat  writes:
> Union with NULL error persists without the corresponding patch. Here
> is the output from postgres without the patch:

> SELECT a FROM (SELECT 1 a) foo
> UNION
> SELECT a FROM (SELECT NULL a) foo2;

> ERROR:  failed to find conversion function from unknown to integer

Yeah, this is a longstanding issue that is not simple to fix without
introducing other unpleasantnesses.  It is not something you should
try to deal with at the same time as implementing CORRESPONDING.

regards, tom lane



Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces

2011-10-27 Thread Bruce Momjian
Florian Pflug wrote:
> On Oct21, 2011, at 16:42 , Phil Sorber wrote:
> > If you did want to make them immutable, I also like Florian's idea of
> > a dependency graph. This would make the dumps less readable though.
> 
> Hm, I kinda reversed my opinion on that, though - i.e., I no longer think
> that the dependency graph idea has much merit. For two reasons
> 
> First, dependencies work on OIDs, not on names. Thus, for the dependency
> machinery to work for GUCs, they'd also need to store OIDs instead of
> names of referenced schema objects. (Otherwise you get into trouble if
> objects are renamed)
> 
> Which of course doesn't work, at least for roles, because roles are
> shared objects, but referenced objects might be database-local.
> (search_path, for example).

Is this a TODO?

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
I wrote:
> Kerem Kat  writes:
>> Union with NULL error persists without the corresponding patch. Here
>> is the output from postgres without the patch:

>> SELECT a FROM (SELECT 1 a) foo
>> UNION
>> SELECT a FROM (SELECT NULL a) foo2;

>> ERROR:  failed to find conversion function from unknown to integer

> Yeah, this is a longstanding issue that is not simple to fix without
> introducing other unpleasantnesses.  It is not something you should
> try to deal with at the same time as implementing CORRESPONDING.

BTW, just to clarify: although that case fails, the case Erik was
complaining of does work in unmodified Postgres:

regression=# select 1 a   , 2 b
union all
select null a, 4 b ;
 a | b 
---+---
 1 | 2
   | 4
(2 rows)

and I agree with him that it should still work with CORRESPONDING.
Even though the behavior of unlabeled NULLs is less than perfect,
we definitely don't want to break cases that work now.  I suspect
the failure means that you tried to postpone too much work to plan
time.  You do have to match up the columns honestly at parse time
and do the necessary type coercions on them then.

regards, tom lane



Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Tom Lane
Kerem Kat  writes:
> On Thu, Oct 27, 2011 at 23:20, Tom Lane  wrote:
>> BTW, just to clarify: although that case fails, the case Erik was
>> complaining of does work in unmodified Postgres:
>> ...
>> and I agree with him that it should still work with CORRESPONDING.

> That is by design, because CORRESPONDING is implemented as subqueries:

Well, this appears to me to be a counterexample sufficient to refute
that implementation decision.  You can inject subqueries at plan time,
if that helps you make things match up, but you can't rearrange things
that way at parse time, as I gather you're doing or else you would not
be seeing this problem.  In any case, I already pointed out to you that
rearranging the parse tree that way is problematic for reverse-listing
the parse tree.  We don't want to see subqueries injected in the results
of printing parse trees with ruleutils.c.

regards, tom lane



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
hrmz, still basically the same behaviour.  I think it might be a *little*
better with this patch.  Before, when under load, it would start up quickly
maybe 2 or 3 times out of 10 attempts... with this patch it might be up to 4
or 5 times out of 10... ish... or maybe it was just fluke *shrug*.  I'm still
only seeing your log statement a single time (I'm running at debug2).  I
have discovered something though - when the standby is in this state, if I
force a checkpoint on the primary then the standby comes right up.  Is there
anything I can check or try to help you figure this out? ...or is it
actually as designed that it could take 10-ish minutes to start up even
after all clients have disconnected from the primary?


On Thu, Oct 27, 2011 at 11:27 AM, Simon Riggs  wrote:

> On Thu, Oct 27, 2011 at 5:26 PM, Chris Redekop  wrote:
>
> > Thanks for the patch Simon, but unfortunately it does not resolve the
> issue
> > I am seeing.  The standby still refuses to finish starting up until long
> > after all clients have disconnected from the primary (>10 minutes).  I do
> > see your new log statement on startup, but only once - it does not
> repeat.
> >  Is there any way for me to see  what the oldest xid on the standby is
> via
> > controldata or something like that?  The standby does stream to keep up
> with
> > the primary while the primary has load, and then it becomes idle when the
> > primary becomes idle (when I kill all the connections)so it appears
> to
> > be current...but it just doesn't finish starting up
> > I'm not sure if it's relevant, but after it has sat idle for a couple
> > minutes I start seeing these statements in the log (with the same offset
> > every time):
> > DEBUG:  skipping restartpoint, already performed at 9/9520
>
> OK, so it looks like there are 2 opportunities to improve, not just one.
>
> Try this.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] (PATCH) Adding CORRESPONDING (NULL error)

2011-10-27 Thread Kerem Kat
On Thu, Oct 27, 2011 at 23:20, Tom Lane  wrote:
> I wrote:
>> Kerem Kat  writes:
>>> Union with NULL error persists without the corresponding patch. Here
>>> is the output from postgres without the patch:
>
>>> SELECT a FROM (SELECT 1 a) foo
>>> UNION
>>> SELECT a FROM (SELECT NULL a) foo2;
>
>>> ERROR:  failed to find conversion function from unknown to integer
>
>> Yeah, this is a longstanding issue that is not simple to fix without
>> introducing other unpleasantnesses.  It is not something you should
>> try to deal with at the same time as implementing CORRESPONDING.
>
> BTW, just to clarify: although that case fails, the case Erik was
> complaining of does work in unmodified Postgres:
>
> regression=# select 1 a   , 2 b
> union all
>            select null a, 4 b ;
>  a | b
> ---+---
>  1 | 2
>   | 4
> (2 rows)
>
> and I agree with him that it should still work with CORRESPONDING.
> Even though the behavior of unlabeled NULLs is less than perfect,
> we definitely don't want to break cases that work now.  I suspect
> the failure means that you tried to postpone too much work to plan
> time.  You do have to match up the columns honestly at parse time
> and do the necessary type coercions on them then.
>
>                        regards, tom lane
>

That is by design, because CORRESPONDING is implemented as subqueries:

 select 1 a   , 2 b
union all
corresponding
   select null a, 4 b ;

is equivalent to

SELECT a, b FROM ( SELECT 1 a, 2 b ) foo
UNION ALL
SELECT a, b FROM ( SELECT null a, 4 b ) foo2;

which gives the same error in unpatched postgres.


Regards,

Kerem KAT



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Simon Riggs
On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop  wrote:

> hrmz, still basically the same behaviour.  I think it might be a *little*
> better with this patch.  Before when under load it would start up quickly
> maybe 2 or 3 times out of 10 attemptswith this patch it might be up to 4
> or 5 times out of 10...ish...or maybe it was just fluke *shrug*.  I'm still
> only seeing your log statement a single time (I'm running at debug2).  I
> have discovered something though - when the standby is in this state if I
> force a checkpoint on the primary then the standby comes right up.  Is there
> anything I check or try for you to help figure this out?or is it
> actually as designed that it could take 10-ish minutes to start up even
> after all clients have disconnected from the primary?

Thanks for testing. The improvements cover specific cases, so it's not
subject to chance; it's not a performance patch.

It's not "designed" to act the way you describe, but it does.

The reason this occurs is that you have a transaction heavy workload
with occasional periods of complete quiet and a base backup time that
is much less than checkpoint_timeout. If your base backup was slower
the checkpoint would have hit naturally before recovery had reached a
consistent state. Which seems fairly atypical. I guess you're doing
this on a test system.

It seems cheap to add in a call to LogStandbySnapshot() after each
call to pg_stop_backup().

Does anyone think this case is worth adding code for? Seems like one
more thing to break.

-- 
 Simon Riggs   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: [HACKERS] pg_dumpall Sets Roll default_tablespace Before Creating Tablespaces

2011-10-27 Thread Florian Pflug
On Oct27, 2011, at 23:02 , Bruce Momjian wrote:
> Florian Pflug wrote:
>> On Oct21, 2011, at 16:42 , Phil Sorber wrote:
>>> If you did want to make them immutable, I also like Florian's idea of
>>> a dependency graph. This would make the dumps less readable though.
>> 
>> Hm, I kinda reversed my opinion on that, though - i.e., I no longer think
>> that the dependency graph idea has much merit. For two reasons
>> 
>> First, dependencies work on OIDs, not on names. Thus, for the dependency
>> machinery to work for GUCs, they'd also need to store OIDs instead of
>> names of referenced schema objects. (Otherwise you get into trouble if
>> objects are renamed)
>> 
>> Which of course doesn't work, at least for roles, because roles are
>> shared objects, but referenced objects might be database-local.
>> (search_path, for example).
> 
> Is this a TODO?

The idea quoted above, no. But

 Downgrade non-immutable (i.e., dependent on database state) checks during
 "ALTER ROLE/DATABASE SET" to WARNINGs to avoid breakage during restore

makes for a fine TODO, I'd say.

best regards,
Florian Pflug




[HACKERS] Add socket dir to pg_config..?

2011-10-27 Thread Stephen Frost
All,

  Was just wondering if we might want to include the default socket
  directory that was compiled in as part of the pg_config output..?
  
Thanks,

Stephen




Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Robert Haas wrote:
> On Tue, Oct 4, 2011 at 12:11 PM, Heikki Linnakangas
>  wrote:
> > pg_upgrade doesn't work if the 'postgres' database has been dropped in the
> > old cluster:
> >
> > ~/pgsql.master$ bin/pg_upgrade -b ~/pgsql.91stable/bin -B bin/ -d
> > ~/pgsql.91stable/data -D data-upgraded/
> > Performing Consistency Checks
> > -----------------------------
> > Checking current, bin, and data directories                 ok
> > Checking cluster versions                                   ok
> > Checking database user is a superuser                       ok
> > Checking for prepared transactions                          ok
> > Checking for reg* system OID user data types                ok
> > Checking for contrib/isn with bigint-passing mismatch       ok
> > Creating catalog dump                                       ok
> > Checking for prepared transactions                          ok
> >
> > New cluster database "postgres" does not exist in the old cluster
> > Failure, exiting
> >
> >
> > That's a bit unfortunate. We have some other tools that don't work without
> > the 'postgres' database, like 'reindexdb -all', but it would still be nice
> > if pg_upgrade did.
> 
> +1.  I think our usual policy is to try postgres first and then try
> template1, so it would seem logical for pg_upgrade to do the same.

Well, it is a little tricky.  The problem is that I am not just
connecting to a database --- I am creating support functions in the
database.  Now, this is complex because template1 is the template for
new databases, except for pg_dump which uses template0.

So, it is going to be confusing to support both databases because there
are cleanup details I have to document if I use template1.

Also, pg_dumpall unconditionally connects to the postgres database to
restore roles:

fprintf(OPF, "\\connect postgres\n\n");

We could connect to template1 for this, but I am not comfortable
changing this.  And when pg_dumpall throws an error for a missing
postgres database, pg_upgrade is going to fail.

We started using the postgres database as a database for connections ---
do we want to change that?  We certainly can't have the pg_dumpall
output _conditionally_ connecting to template1.

I am feeling this isn't worth pursuing.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> So, it is going to be confusing to support both databases because there
> is the cleanup details I have to document if I use template1.

Presumably there's some other database in the system besides template1
if postgres doesn't exist..  Would it be possible to just make it
configurable?  Then the user could pick a non-template database.  Having
it fail if the option isn't used and the default postgres isn't there is
fine, imv.

Thanks,

Stephen




Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (br...@momjian.us) wrote:
> > So, it is going to be confusing to support both databases because there
> > is the cleanup details I have to document if I use template1.
> 
> Presumably there's some other database in the system besides template1
> if postgres doesn't exist..  Would it be possible to just make it
> configurable?  Then the user could pick a non-template database.  Having
> it fail if the option isn't used and the default postgres isn't there is
> fine, imv.

I have not seen enough demand to make this a user-visible configuration.
We can just tell them to create a postgres database.   Frankly, they
would have had to _remove_ the postgres database after initdb for it not
to be there, and they are instructed to change nothing about the new
database.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> I have not seen enough demand to make this a user-visible configuration.
> We can just tell them to create a postgres database.   Frankly, they
> would have had to _remove_ the postgres database after initdb for it not
> to be there, and they are instructed to change nothing about the new
> database.

Yes, they would have removed it because they didn't want it.  As I
recall, part of the agreement to create an extra database by default was
that it could be removed if users didn't want it.  Turning around and
then saying "but things won't work if it's not there" isn't exactly
supporting users who decide to remove it.

Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
be configured to connect to other databases instead, including for
globals.

Thanks,

Stephen




Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Chris Redekop
Sorry..."designed" was a poor choice of words, I meant "not unexpected".
 Doing the checkpoint right after pg_stop_backup() looks like it will work
perfectly for me, so thanks for all your help!

On a side note, I am sporadically seeing another error on hot standby startup.
 I'm not terribly concerned about it as it is pretty rare and it will work
on a retry so it's not a big deal.  The error is "FATAL:  out-of-order XID
insertion in KnownAssignedXids".  If you think it might be a bug and are
interested in hunting it down let me know and I'll help any way I can...but
if you're not too worried about it then neither am I :)


On Thu, Oct 27, 2011 at 4:55 PM, Simon Riggs  wrote:

> On Thu, Oct 27, 2011 at 10:09 PM, Chris Redekop 
> wrote:
>
> > Hrmz, still basically the same behaviour.  I think it might be a *little*
> > better with this patch.  Before, when under load, it would start up quickly
> > maybe 2 or 3 times out of 10 attempts...with this patch it might be up to 4
> > or 5 times out of 10...ish...or maybe it was just a fluke *shrug*.  I'm still
> > only seeing your log statement a single time (I'm running at debug2).  I
> > have discovered something though - when the standby is in this state, if I
> > force a checkpoint on the primary then the standby comes right up.  Is there
> > anything I can check or try for you to help figure this out? ...or is it
> > actually as designed that it could take 10-ish minutes to start up even
> > after all clients have disconnected from the primary?
>
> Thanks for testing. The improvements cover specific cases, so it's not
> subject to chance; it's not a performance patch.
>
> It's not "designed" to act the way you describe, but it does.
>
> The reason this occurs is that you have a transaction-heavy workload
> with occasional periods of complete quiet and a base backup time that
> is much less than checkpoint_timeout. If your base backup was slower
> the checkpoint would have hit naturally before recovery had reached a
> consistent state. Which seems fairly atypical. I guess you're doing
> this on a test system.
>
> It seems cheap to add in a call to LogStandbySnapshot() after each
> call to pg_stop_backup().
>
> Does anyone think this case is worth adding code for? Seems like one
> more thing to break.
>
> --
>  Simon Riggs   http://www.2ndQuadrant.com/
>  PostgreSQL Development, 24x7 Support, Training & Services
>


Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (br...@momjian.us) wrote:
> > I have not seen enough demand to make this a user-visible configuration.
> > We can just tell them to create a postgres database.   Frankly, they
> > would have had to _remove_ the postgres database after initdb for it not
> > to be there, and they are instructed to change nothing about the new
> > database.
> 
> Yes, they would have removed it because they didn't want it.  As I
> recall, part of the agreement to create an extra database by default was
> that it could be removed if users didn't want it.  Turning around and
> then saying "but things won't work if it's not there" isn't exactly
> supporting users who decide to remove it.

Well, you would have to remove it _after_ you did the pg_upgrade.  Right
now if you do a normal dump/restore upgrade, you also have to re-remove
the postgres database.  We don't have any mechanism to drop a database
as part of pg_dumpall's restore if it didn't exist in the old cluster.

> Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> be configured to connect to other databases instead, including for
> globals.

Well, please show me the code, because the C code I showed you had the
'\connect postgres' string hardcoded in there.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Stephen Frost
* Bruce Momjian (br...@momjian.us) wrote:
> Well, you would have to remove it _after_ you did the pg_upgrade.  Right
> now if you do a normal dump/restore upgrade, you also have to re-remove
> the postgres database.  We don't have any mechanism to drop a database
> as part of pg_dumpall's restore if it didn't exist in the old cluster.

Perhaps not, but it *could* be removed after the restore and all would
be well, yes?

> > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> > be configured to connect to other databases instead, including for
> > globals.
> 
> Well, please show me the code, because the C code I showed you had the
> '\connect postgres' string hardcoded in there.

I guess there's a difference between "can be used and will work
correctly, but might create some extra garbage" and "can't be used at
all".  pg_dumpall has a -l option for connecting to whatever *existing*
database you have to pull the global data, and then it'll restore into a
clean initdb'd cluster, after which you could remove postgres.
Admittedly, if you initdb the cluster, drop postgres, and then try a
restore, it would fail.  Personally, I'm not a big fan of that (why
don't we use what was passed in to -l for that..?), but, practically,
it's not that big a deal.  I don't know many folks who are going to
restore a pg_dumpall dump into an existing configuration where they've
monkeyed with things (that could cause all kinds of other issues anyway,
role conflicts, etc).

If I understood correctly (perhaps I didn't..), the issue is that
pg_upgrade doesn't have the pg_dumpall equivalent of the '-l' or
'--database' option.

Thanks,

Stephen




Re: [HACKERS] Add socket dir to pg_config..?

2011-10-27 Thread Tom Lane
Stephen Frost  writes:
>   Was just wondering if we might want to include the default socket
>   directory that was compiled in as part of the pg_config output..?

[ shrug... ]  We don't report the compiled-in port number, which is
considerably more critical.  And we don't report changes in any of the
other stuff in pg_config_manual.h.

MHO is that changing the socket directory is only marginally supported,
and we shouldn't encourage it unless we're prepared to fully support it
(which we can't really).

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Stephen Frost wrote:
-- Start of PGP signed section.
> * Bruce Momjian (br...@momjian.us) wrote:
> > Well, you would have to remove it _after_ you did the pg_upgrade.  Right
> > now if you do a normal dump/restore upgrade, you also have to re-remove
> > the postgres database.  We don't have any mechanism to drop a database
> > as part of pg_dumpall's restore if it didn't exist in the old cluster.
> 
> Perhaps not, but it *could* be removed after the restore and all would
> be well, yes?

I am not sure how much pg_dump worries about removing system objects
during a restore --- I don't think it does that at all actually.  I
thought we did that for plpgsql, but I don't see that in the C code now,
and testing doesn't show it being removed by pg_dump.

> > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> > > be configured to connect to other databases instead, including for
> > > globals.
> > 
> > Well, please show me the code, because the C code I showed you had the
> > '\connect postgres' string hardcoded in there.
> 
> I guess there's a difference between "can be used and will work
> correctly, but might create some extra garbage" and "can't be used at
> all".  pg_dumpall has a -l option for connecting to whatever *existing*
> database you have to pull the global data, and then it'll restore into a
> clean initdb'd cluster, after which you could remove postgres.

Keep in mind -l might connect to a specified database to do the dump,
but it will still connect to the 'postgres' database to recreate the globals.

> Admittedly, if you initdb the cluster, drop postgres, and then try a
> restore, it would fail.  Personally, I'm not a big fan of that (why

Right, same with pg_upgrade.

> don't we use what was passed in to -l for that..?), but, practically,

No idea.

> it's not that big a deal.  I don't know many folks who are going to
> restore a pg_dumpall dump into an existing configuration where they've
> monkeyed with things (that could cause all kinds of other issues anyway,
> role conflicts, etc).
> 
> If I understood correctly (perhaps I didn't..), the issue is that
> pg_upgrade doesn't have the pg_dumpall equivalent of the '-l' or
> '--database' option.

Well, I can modify pg_upgrade to connect to template1 easily (no need
for a switch) --- the problem is that pg_dumpall requires the 'postgres'
database to restore its dump file.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Tom Lane
Bruce Momjian  writes:
> Stephen Frost wrote:
>> Yes, they would have removed it because they didn't want it.  As I
>> recall, part of the agreement to create an extra database by default was
>> that it could be removed if users didn't want it.  Turning around and
>> then saying "but things won't work if it's not there" isn't exactly
>> supporting users who decide to remove it.

> Well, you would have to remove it _after_ you did the pg_upgrade.

As far as the *target* cluster is concerned, I have no sympathy for
someone who messes with its contents before running pg_upgrade.
That's an RTFM matter: you're supposed to upgrade into a virgin
just-initdb'd cluster.

However, it would be nice if pg_upgrade supported transferring from a
*source* cluster that didn't have the postgres DB.

What about creating a new, single-purpose database in the source
cluster and then removing it again after we're done?

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Tom Lane wrote:
> Bruce Momjian  writes:
> > Stephen Frost wrote:
> >> Yes, they would have removed it because they didn't want it.  As I
> >> recall, part of the agreement to create an extra database by default was
> >> that it could be removed if users didn't want it.  Turning around and
> >> then saying "but things won't work if it's not there" isn't exactly
> >> supporting users who decide to remove it.
> 
> > Well, you would have to remove it _after_ you did the pg_upgrade.
> 
> As far as the *target* cluster is concerned, I have no sympathy for
> someone who messes with its contents before running pg_upgrade.
> That's an RTFM matter: you're supposed to upgrade into a virgin
> just-initdb'd cluster.
> 
> However, it would be nice if pg_upgrade supported transferring from a
> *source* cluster that didn't have the postgres DB.

I have this C comment in pg_upgrade about this:

 *  If someone removes the 'postgres' database from the old cluster and
 *  the new cluster has a 'postgres' database, the number of databases
 *  will not match.  We actually could upgrade such a setup, but it would
 *  violate the 1-to-1 mapping of database counts, so we throw an error
 *  instead.  We would detect this as a database count mismatch during
 *  upgrade, but we want to detect it during the check phase and report
 *  the database name.

Is this worth fixing?  Another problem is that pg_dumpall doesn't remove
the postgres database in the new cluster if it was not in the old one,
so they are going to end up with a postgres database in the new cluster
anyway.  I could argue that pg_upgrade is better because it reports that it
cannot recreate the new cluster exactly, while pg_dumpall just keeps a
postgres database that was not in the old cluster.

> What about creating a new, single-purpose database in the source
> cluster and then removing it again after we're done?

That is not a problem --- I can easily use template1.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian  wrote:
>> What about creating a new, single-purpose database in the source
>> cluster and then removing it again after we're done?
>
> That is not a problem --- I can easily use template1.

Huh?

You just said upthread that you didn't want to use template1 because
you didn't want to modify the template database.  I think the point is
that if you're doing something to the database that someone might
object to, you oughtn't be doing it to the postgres database either.
You should create a database just for pg_upgrade's use and install its
crap in there.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Bruce Momjian
Robert Haas wrote:
> On Thu, Oct 27, 2011 at 11:35 PM, Bruce Momjian  wrote:
> >> What about creating a new, single-purpose database in the source
> >> cluster and then removing it again after we're done?
> >
> > That is not a problem --- I can easily use template1.
> 
> Huh?
> 
> You just said upthread that you didn't want to use template1 because
> you didn't want to modify the template database.  I think the point is

I don't want to use postgres and then fall back to template1 if
necessary --- I would just use template1 always.

> that if you're doing something to the database that someone might
> object to, you oughtn't be doing it to the postgres database either.
> You should create a database just for pg_upgrade's use and install its
> crap in there.

It installs crap in all databases to set oids on system tables, for
example, so we are only creating it early in postgres (or template1) to
set auth_id.  Our sticking point now is that pg_dumpall has the
'postgres' database hardcoded for role creation.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + It's impossible for everything to be true. +



Re: [HACKERS] Hot Standby startup with overflowed snapshots

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 6:55 PM, Simon Riggs  wrote:
> It seems cheap to add in a call to LogStandbySnapshot() after each
> call to pg_stop_backup().
>
> Does anyone think this case is worth adding code for? Seems like one
> more thing to break.

Why at that particular time?

It would maybe be nice if the master could notice when it has a plausible
(non-overflowed) snapshot and log it then.  But that might be more
code than the problem is worth.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



[HACKERS] Include commit identifier in version() function

2011-10-27 Thread pasman pasmański
Hi.

I think it would be useful to include the commit's hexadecimal
identifier in the version() function output, to allow a fast checkout of
that commit in git.

-- 

pasman



Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch

2011-10-27 Thread Tom Lane
Pavel Stehule  writes:
> there is final Wojciech's patch

I looked this over a little bit, but I don't see an answer to the
question I put on the commitfest entry: why is this being done in
plpgsql, and not somewhere in the core code?  The core has also got
the concept of %TYPE, and could stand to have support for attaching []
notation to that.  I'm not entirely sure if anything could be shared,
but it would sure be worth looking at.  I've wished for some time that
%TYPE not be handled directly in read_datatype() at all, but rather in
the core parse_datatype() function.  This would require passing some
sort of callback function to parse_datatype() to let it know what
variables can be referenced in %TYPE, but we have parser callback
functions just like that already.  One benefit we could get that way
is that the core meaning of %TYPE (type of a table field name) would
still be available in plpgsql, if the name didn't match any declared
plpgsql variable.

However, assuming that we're sticking to just changing plpgsql for the
moment ... I cannot escape the feeling that this is a large patch with a
small patch struggling to get out.  It should not require 500 net new
lines of code to provide this functionality, when all we're doing is
looking up the array type whose element type is the type the existing
code can derive.  I would have expected to see the grammar passing one
extra flag to plpgsql_parse_cwordtype and plpgsql_parse_cwordrowtype
routines that were little changed except for the addition of a step to
find the array type.

Another complaint is that the grammar changes assume that the first
token of decl_datatype must be T_WORD or T_CWORD, which means that
it fails for cases such as unreserved plpgsql keywords.  This example
should work, and does work in 9.1:

create domain hint as int;

create or replace function foo() returns void as $$
declare
  x hint;
begin
end
$$ language plpgsql;

but fails with this patch because "hint" is returned by the lexer as
K_HINT not T_WORD.  You might be able to get around that by adding a
production with unreserved_keyword --- but I'm unsure that that would
cover every case that worked before, and in any case I do not see the
point of changing the grammar production at all.  It gains almost
nothing to have Bison parse the [] rather than doing it with C code in
read_datatype().

regards, tom lane



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Tom Lane
Bruce Momjian  writes:
> Robert Haas wrote:
>> that if you're doing something to the database that someone might
>> object to, you oughtn't be doing it to the postgres database either.
>> You should create a database just for pg_upgrade's use and install its
>> crap in there.

> It installs crap in all databases to set oids on system tables,

It seems like you're both confusing the source and target clusters.
None of that stuff gets installed in the source, does it?

regards, tom lane



Re: [HACKERS] Review: [PL/pgSQL] %TYPE and array declaration - second patch

2011-10-27 Thread Pavel Stehule
Hello

2011/10/28 Tom Lane :
> Pavel Stehule  writes:
>> there is final Wojciech's patch
>

this is just a small note about the length of this patch. The patch was
significantly smaller before it solved the problem with derived types
for composite types - it should solve the problem described in this thread:

http://stackoverflow.com/questions/7634704/declare-variable-of-composite-type-in-postgresql-using-type

Regards

Pavel Stehule



> I looked this over a little bit, but I don't see an answer to the
> question I put on the commitfest entry: why is this being done in
> plpgsql, and not somewhere in the core code?  The core has also got
> the concept of %TYPE, and could stand to have support for attaching []
> notation to that.  I'm not entirely sure if anything could be shared,
> but it would sure be worth looking at.  I've wished for some time that
> %TYPE not be handled directly in read_datatype() at all, but rather in
> the core parse_datatype() function.  This would require passing some
> sort of callback function to parse_datatype() to let it know what
> variables can be referenced in %TYPE, but we have parser callback
> functions just like that already.  One benefit we could get that way
> is that the core meaning of %TYPE (type of a table field name) would
> still be available in plpgsql, if the name didn't match any declared
> plpgsql variable.
>
> However, assuming that we're sticking to just changing plpgsql for the
> moment ... I cannot escape the feeling that this is a large patch with a
> small patch struggling to get out.  It should not require 500 net new
> lines of code to provide this functionality, when all we're doing is
> looking up the array type whose element type is the type the existing
> code can derive.  I would have expected to see the grammar passing one
> extra flag to plpgsql_parse_cwordtype and plpgsql_parse_cwordrowtype
> routines that were little changed except for the addition of a step to
> find the array type.
>
> Another complaint is that the grammar changes assume that the first
> token of decl_datatype must be T_WORD or T_CWORD, which means that
> it fails for cases such as unreserved plpgsql keywords.  This example
> should work, and does work in 9.1:
>
> create domain hint as int;
>
> create or replace function foo() returns void as $$
> declare
>  x hint;
> begin
> end
> $$ language plpgsql;
>
> but fails with this patch because "hint" is returned by the lexer as
> K_HINT not T_WORD.  You might be able to get around that by adding a
> production with unreserved_keyword --- but I'm unsure that that would
> cover every case that worked before, and in any case I do not see the
> point of changing the grammar production at all.  It gains almost
> nothing to have Bison parse the [] rather than doing it with C code in
> read_datatype().
>
>                        regards, tom lane
>



Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Magnus Hagander
On Oct 28, 2011 5:22 AM, "Tom Lane"  wrote:
>
> Bruce Momjian  writes:
> > Stephen Frost wrote:
> >> Yes, they would have removed it because they didn't want it.  As I
> >> recall, part of the agreement to create an extra database by default was
> >> that it could be removed if users didn't want it.  Turning around and
> >> then saying "but things won't work if it's not there" isn't exactly
> >> supporting users who decide to remove it.
>
> > Well, you would have to remove it _after_ you did the pg_upgrade.
>
> As far as the *target* cluster is concerned, I have no sympathy for
> someone who messes with its contents before running pg_upgrade.
> That's an RTFM matter: you're supposed to upgrade into a virgin
> just-initdb'd cluster.
>
> However, it would be nice if pg_upgrade supported transferring from a
> *source* cluster that didn't have the postgres DB.
>
> What about creating a new, single-purpose database in the source
> cluster and then removing it again after we're done?

How about naming this newly created database "postgres"? That would make the
code simple enough - always use the postgres database, just drop it at the
end if it didn't exist in the source cluster.

/Magnus


Re: [HACKERS] pg_upgrade if 'postgres' database is dropped

2011-10-27 Thread Magnus Hagander
On Oct 28, 2011 5:19 AM, "Bruce Momjian"  wrote:
>
> Stephen Frost wrote:
>
> > > > Regarding pg_dumpall and pg_restore, I'm pretty sure both of those can
> > > > be configured to connect to other databases instead, including for
> > > > globals.
> > >
> > > Well, please show me the code, because the C code I showed you had the
> > > '\connect postgres' string hardcoded in there.
> >
> > I guess there's a difference between "can be used and will work
> > correctly, but might create some extra garbage" and "can't be used at
> > all".  pg_dumpall has a -l option for connecting to whatever *existing*
> > database you have to pull the global data, and then it'll restore into a
> > clean initdb'd cluster, after which you could remove postgres.
>
> Keep in mind -l might connect to a specified database to do the dump,
> but it will still connect to the 'postgres' database to recreate the globals.
>
> > Admittedly, if you initdb the cluster, drop postgres, and then try a
> > restore, it would fail.  Personally, I'm not a big fan of that (why
>
> Right, same with pg_upgrade.
>
> > don't we use what was passed in to -l for that..?), but, practically,
>
> No idea.

Chicken/egg? If we did that, the pg_dumpall dump could no longer be loaded
into an empty cluster since the db it wanted to talk to didn't exist yet.
And restoring into an empty cluster has to be the main use for pg_dumpall
after all.

/Magnus


Re: [HACKERS] Your review of pg_receivexlog/pg_basebackup

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 11:14 PM, Magnus Hagander  wrote:
> Here's a version that does this. Turns out this requires a lot less
> code than what was previously in there, which is always nice.
>
> We still need to solve the other part which is how to deal with the
> partial files on restore. But this is definitely a cleaner way from a
> pure pg_receivexlog perspective.
>
> Comments/reviews?

Looks good.

Minor comment:
the source code comment of FindStreamingStart() seems to need to be updated.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 11:57 PM, Magnus Hagander  wrote:
> On Thu, Oct 27, 2011 at 16:54, Tom Lane  wrote:
>> Magnus Hagander  writes:
>>> On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
>>>  wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
> Yes. But that sounds user-unfriendly. Is padding the WAL file manually
> easy for a user to do?
>>
>>> I'd definitely want to avoid anything that requires pg_receivexlog to
>>> actually *parse* the WAL. That'll make it way more complex than I'd
>>> like.
>>
>> What parsing?  Just pad to 16MB with zeroes.  In fact, I think the
>
> I'm just saying that *if* parsing is required, it would be bad.
>
>> receiver should just create the file that size to start with, and then
>> write received data into it, much like normal WAL creation does.
>
> So when pg_receivexlog starts up, how would it know if the last file
> represents a completed file, or a half-full file, without actually
> parsing it? It could be a 16Mb file with 10 bytes of valid data, or a
> complete file with 16Mb of valid data.
>
> We could always ask for a retransmit of the whole file, but if that
> file is gone on the master, we won't be able to do that, and will
> error out in a situation that's not actually an error.
>
> Though if we leave the file as .partial up until this point (per my
> other patch just posted), I guess this doesn't actually apply
> - if the file is called .partial, we'll overwrite into it. If it's
> not, then we assume it's a complete segment.

Yeah, I think that we should commit the patch that you posted in the
other thread, and change pg_receivexlog so that it creates a new
WAL file filled with zeros, or opens a pre-existing one, like
XLogFileInit() does, before writing any streamed data. If we do
this, a user can easily use a partial WAL file for recovery by
renaming that file.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center
