Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that is not possible to use this for a standby
 server... probably that's why i get the error i put a recovery.conf
 after pg_basebackup finished... maybe we can say that  more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

I stumbled upon this again today. There's nothing in the docs that
would even hint that using -x shouldn't work to create a replica. Why
does it get confused and can we (easily) make it not get confused? At
the very least it needs a big fat warning in documentation for the -x
option that the resulting backup might not be usable as a standby.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de

-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Fujii Masao
On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net wrote:
 it doesn't say that is not possible to use this for a standby
 server... probably that's why i get the error i put a recovery.conf
 after pg_basebackup finished... maybe we can say that  more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

Unless I'm missing something, you can use pg_basebackup -x for the
standby. If lots of WAL files are generated in the master after
pg_basebackup -x ends and before you start the standby instance,
you may get the following error. In this case, you need to consult with
archived WAL files even though you specified -x option in pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

Though we have the above problem, pg_basebackup -x is usable for
the standby, I think.

Regards,

-- 
Fujii Masao



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 it doesn't say that is not possible to use this for a standby
 server... probably that's why i get the error i put a recovery.conf
 after pg_basebackup finished... maybe we can say that  more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

 Unless I'm missing something, you can use pg_basebackup -x for the
 standby. If lots of WAL files are generated in the master after
 pg_basebackup -x ends and before you start the standby instance,
 you may get the following error. In this case, you need to consult with
 archived WAL files even though you specified -x option in pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Though we have the above problem, pg_basebackup -x is usable for
 the standby, I think.

I assumed from Magnus's comment that this is a known problem. I wonder
what went wrong if it should have worked. In the case where this
turned up the missing file was an xlog file with the new timeline ID
but one segment before the timeline switch. I'll have to see if I can
create a reproducible case for this.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Magnus Hagander
On Mon, Jun 4, 2012 at 5:48 PM, Ants Aasma a...@cybertec.at wrote:
 On Mon, Jun 4, 2012 at 6:20 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Mon, Jun 4, 2012 at 11:25 PM, Ants Aasma a...@cybertec.at wrote:
 On Thu, Sep 29, 2011 at 11:30 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 it doesn't say that is not possible to use this for a standby
 server... probably that's why i get the error i put a recovery.conf
 after pg_basebackup finished... maybe we can say that  more loudly?

 The idea is, if you use it with -x (or --xlog), it's for taking a
 backup/clone, *not* for replication.

 If you use it without -x, then you can use it as the start of a
 replica, by adding a recovery.conf.

 But you can't do both at once, that will confuse it.

 I stumbled upon this again today. There's nothing in the docs that
 would even hint that using -x shouldn't work to create a replica. Why
 does it get confused and can we (easily) make it not get confused? At
 the very least it needs a big fat warning in documentation for the -x
 option that the resulting backup might not be usable as a standby.

 Unless I'm missing something, you can use pg_basebackup -x for the
 standby. If lots of WAL files are generated in the master after
 pg_basebackup -x ends and before you start the standby instance,
 you may get the following error. In this case, you need to consult with
 archived WAL files even though you specified -x option in pg_basebackup.

 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Though we have the above problem, pg_basebackup -x is usable for
 the standby, I think.

 I assumed from Magnus's comment that this is a known problem. I wonder
 what went wrong if it should have worked. In the case where this
 turned up the missing file was an xlog file with the new timeline ID
 but one segment before the timeline switch. I'll have to see if I can
 create a reproducible case for this.

No, it's more a "there's no reason to do that". I don't think it
should necessarily be an actual problem.

In your case the missing piece of information is why was there a
timeline switch? pg_basebackup shouldn't cause a timeline switch
whether you use it in -x mode or not.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2012-06-04 Thread Ants Aasma
On Mon, Jun 4, 2012 at 6:53 PM, Magnus Hagander mag...@hagander.net wrote:
 No, it's more a "there's no reason to do that". I don't think it
 should necessarily be an actual problem.

Ok, good to know.

 In your case the missing piece of information is why was there a
 timeline switch? pg_basebackup shouldn't cause a timeline switch
 whether you use it in -x mode or not.

No mystery there. The timeline switch was because I had just promoted
the master for standby mode. There's a chance I might have
accidentally done something horribly wrong somewhere because I can't
immediately reproduce this. I'll let you know if I find out how I
managed to create this error.

Ants Aasma
-- 
Cybertec Schönig & Schönig GmbH
Gröhrmühlgasse 26
A-2700 Wiener Neustadt
Web: http://www.postgresql-support.de



Re: pg_receivexlog and sync rep Re: [HACKERS] Updated version of pg_receivexlog

2012-02-09 Thread Magnus Hagander
On Wed, Feb 8, 2012 at 19:39, Fujii Masao masao.fu...@gmail.com wrote:
 On Tue, Oct 25, 2011 at 7:37 PM, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 14:40, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

 Ah yes. Did not think of that. Yes, it will have this problem.

 Actually, thinking more, per other mail, it won't. Because it will
 never report that the data is synced to disk, so it will not be
 considered for sync standby.

 Now, new replication mode (synchronous_commit = write) is supported.
 In this mode, the in-progress backup will be considered as sync
 standby because its periodic status report includes the valid write position.
 We should change the report so that it includes only invalid positions.
 Patch attached.

Agreed, applied.


 While I agree that the backup should not behave as sync standby, ISTM
 that pg_receivexlog should, which is very useful. If pg_receivexlog does
 so, we can write WAL synchronously in both local and remote, which
 would increase the durability of the system. Thus, to allow pg_receivexlog
 to behave as sync standby, we should change it so that its status report
 includes the write and flush positions?

Yes, that could be useful. I don't think it should do so by default,
though, but it could be useful to provide a command-line switch to
pg_receivexlog allowing it to do this.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



pg_receivexlog and sync rep Re: [HACKERS] Updated version of pg_receivexlog

2012-02-08 Thread Fujii Masao
On Tue, Oct 25, 2011 at 7:37 PM, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 14:40, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

 Ah yes. Did not think of that. Yes, it will have this problem.

 Actually, thinking more, per other mail, it won't. Because it will
 never report that the data is synced to disk, so it will not be
 considered for sync standby.

Now, new replication mode (synchronous_commit = write) is supported.
In this mode, the in-progress backup will be considered as sync
standby because its periodic status report includes the valid write position.
We should change the report so that it includes only invalid positions.
Patch attached.

While I agree that the backup should not behave as sync standby, ISTM
that pg_receivexlog should, which is very useful. If pg_receivexlog does
so, we can write WAL synchronously in both local and remote, which
would increase the durability of the system. Thus, to allow pg_receivexlog
to behave as sync standby, we should change it so that its status report
includes the write and flush positions?
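
As a rough illustration of the point above, here is a simplified Python model
of how a periodic status report could decide whether a client counts as a
synchronous standby. It is not the server's code: StatusReport, INVALID_POS and
counts_as_sync_standby are hypothetical names, and representing an invalid
position as 0 is an assumption of this sketch.

```python
from dataclasses import dataclass

INVALID_POS = 0  # assumption: an "invalid" WAL position is modelled as 0


@dataclass
class StatusReport:
    """Positions a streaming client reports back (hypothetical structure)."""
    write_pos: int
    flush_pos: int


def counts_as_sync_standby(report: StatusReport, commit_pos: int, wait_for: str) -> bool:
    """Return True if this report would satisfy a commit waiting at commit_pos.

    wait_for is "write" (wait until the standby has written the WAL) or
    "flush" (wait until the standby has flushed it to disk).
    """
    if wait_for == "write":
        reported = report.write_pos
    elif wait_for == "flush":
        reported = report.flush_pos
    else:
        raise ValueError("wait_for must be 'write' or 'flush'")
    # An invalid position never satisfies a waiting commit.
    return reported != INVALID_POS and reported >= commit_pos
```

In this model a backup that reports a valid write position but an invalid flush
position is ignored in "flush" mode yet satisfies "write" mode, which is the
problem described above. Reporting only invalid positions keeps it out of both.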

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center


invalid_write_pos_v1.patch
Description: Binary data



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-28 Thread Fujii Masao
On Thu, Oct 27, 2011 at 11:57 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 16:54, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

 What parsing?  Just pad to 16MB with zeroes.  In fact, I think the

 I'm just saying that *if* parsing is required, it would be bad.

 receiver should just create the file that size to start with, and then
 write received data into it, much like normal WAL creation does.

 So when pg_receivexlog starts up, how would it know if the last file
 represents a completed file, or a half-full file, without actually
 parsing it? It could be a 16Mb file with 10 bytes of valid data, or a
 complete file with 16Mb of valid data.

 We could always ask for a retransmit of the whole file, but if that
 file is gone on the master, we won't be able to do that, and will
 error out in a situation that's not actually an error.

 Though I guess if we leave the file as .partial up until this point
 (per my other patch just posted), I guess this doesn't actually apply
 - if the file is called .partial, we'll overwrite into it. If it's
 not, then we assume it's a complete segment.

Yeah, I think that we should commit the patch that you posted in
other thread, and should change pg_receivexlog so that it creates
new WAL file filled up with zero or opens a pre-existing one, like
XLogFileInit() does, before writing any streamed data. If we do
this, a user can easily use a partial WAL file for recovery by
renaming that file.
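
The naming rule discussed above (a file called .partial is still being written
and may be overwritten; anything else is assumed to be a complete segment) can
be sketched in Python without parsing any WAL. This is only an illustration,
not pg_receivexlog's logic: choose_resume_segment is a hypothetical helper, and
it assumes a single timeline so that the 24-hex-digit segment names sort in
stream order.

```python
from typing import Iterable, Optional, Tuple

PARTIAL_SUFFIX = ".partial"
HEX_DIGITS = set("0123456789ABCDEF")


def choose_resume_segment(names: Iterable[str]) -> Tuple[Optional[str], bool]:
    """Return (newest segment name, whether it is a .partial file).

    Files whose base name is not 24 uppercase hex digits are ignored.
    """
    newest: Optional[str] = None
    newest_is_partial = False
    for name in sorted(names):
        is_partial = name.endswith(PARTIAL_SUFFIX)
        base = name[: -len(PARTIAL_SUFFIX)] if is_partial else name
        if len(base) != 24 or not set(base) <= HEX_DIGITS:
            continue  # not a WAL segment file name
        if newest is None or base > newest:
            newest, newest_is_partial = base, is_partial
    return newest, newest_is_partial
```

If the newest file is partial, streaming would restart at the beginning of that
segment and overwrite it; otherwise it would start at the following segment.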

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

Cool!

When I tried pg_receivexlog and checked the contents of streamed WAL file by
xlogdump, I found that recent WAL records that walsender has already sent don't
exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
records to the disk as soon as possible, but it doesn't. Is this
intentional? Or bug?
Am I missing something?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

It writes it to disk as soon as possible, but doesn't fsync() until
the end of each segment. Are you by any chance looking at the file
while it's running?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

No. I looked at that file after shutting down the master server.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

Ugh, in that case something is certainly wrong. There is nothing but
setting up some offset values between PQgetCopyData() and write()...


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file 
 by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the streamed 
 WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

 Ugh, in that case something is certainly wrong. There is nothing but
 setting up some offset values between PQgetCopyData() and write()...

When end-of-copy stream is found or an error happens, pg_receivexlog
exits without flushing outstanding WAL records. Which seems to cause
the problem I reported.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 10:12, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:49 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 09:46, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 4:40 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 On Thu, Oct 27, 2011 at 09:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 3:29 AM, Magnus Hagander mag...@hagander.net 
 wrote:
 I've applied this version with a few more minor changes that Heikki 
 found.

 Cool!

 When I tried pg_receivexlog and checked the contents of streamed WAL file 
 by
 xlogdump, I found that recent WAL records that walsender has already sent 
 don't
 exist in that WAL file. I expected that pg_receivexlog writes the 
 streamed WAL
 records to the disk as soon as possible, but it doesn't. Is this
 intentional? Or bug?
 Am I missing something?

 It writes it to disk as soon as possible, but doesn't fsync() until
 the end of each segment. Are you by any chance looking at the file
 while it's running?

 No. I looked at that file after shutting down the master server.

 Ugh, in that case something is certainly wrong. There is nothing but
 setting up some offset values between PQgetCopyData() and write()...

 When end-of-copy stream is found or an error happens, pg_receivexlog
 exits without flushing outstanding WAL records. Which seems to cause
 the problem I reported.

Not sure I follow. When we arrive at PQgetCopyData() there should be
nothing buffered, and if the end of stream happens there it returns
-1, and we exit, no? So where is the data that's lost?

I do realize we don't actually fsync() and close() in this case - is
that what you are referring to? But the data should already have been
write()d, so it should still be there, no?
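
The write()/fsync() distinction above can be demonstrated with a small Python
sketch. It is not pg_receivexlog's code; write_all, demo and the file name are
hypothetical. Bytes handed to write() are visible to any other reader on the
same machine straight away; fsync() only matters for surviving an operating
system crash or power loss.

```python
import os
import tempfile


def write_all(fd: int, payload: bytes) -> None:
    """Loop until every byte has been handed to the kernel via write()."""
    view = memoryview(payload)
    while len(view) > 0:
        written = os.write(fd, view)
        view = view[written:]


def demo() -> bytes:
    """Write without fsync()/close(), then read the file through a second handle."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.partial")  # hypothetical file name
        fd = os.open(path, os.O_WRONLY | os.O_CREAT, 0o600)
        try:
            write_all(fd, b"streamed WAL bytes")
            # Neither fsync() nor close() has been called on fd yet.
            with open(path, "rb") as reader:
                seen = reader.read()
            os.fsync(fd)  # durability against a crash needs this extra step
        finally:
            os.close(fd)
    return seen
```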

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

Oh, right. Hmm.. xlogdump might be the cause.

Though I've not read the code of xlogdump, I wonder if it gives up
outputting the contents of WAL file when it finds a partial WAL page...
This strikes me that recovery code has the same problem. No?
IOW, when a partial WAL page is found during recovery, I'm afraid
that page would not be replayed though it contains valid data.

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

My concern was right. When I performed a recovery using the streamed
WAL files, the loss of data happened. A partial WAL page was not replayed.

To avoid this problem, I think that we should change pg_receivexlog so
that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
before writing any data. Otherwise, though pg_receivexlog streams WAL
data in realtime, the latest WAL data might not be available for recovery.
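
The second option above (create the full-size file before writing any data) can
be sketched in Python. This is an illustration under assumptions, not the
walreceiver or pg_receivexlog implementation: the function names are
hypothetical, and the demo uses a small size instead of 16MB.

```python
import os
import tempfile

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, the segment size mentioned in this thread


def create_zero_filled_segment(path: str, size: int = SEGMENT_SIZE) -> None:
    """Create the file at its final size, filled with zeroes, and flush it."""
    chunk = bytes(8192)
    with open(path, "wb") as f:
        remaining = size
        while remaining > 0:
            n = min(len(chunk), remaining)
            f.write(chunk[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())


def write_at(path: str, offset: int, data: bytes) -> None:
    """Write streamed data at its offset inside the pre-created file."""
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment")  # hypothetical file name
        create_zero_filled_segment(path, size=32768)
        write_at(path, 8192, b"record")
        size = os.path.getsize(path)
        with open(path, "rb") as f:
            f.seek(8192)
            head = f.read(6)
    return size, head
```

Because the file always has its full length, a reader never sees a file that
ends in the middle of a page; the unwritten tail is simply zeroes.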

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 12:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

 My concern was right. When I performed a recovery using the streamed
 WAL files, the loss of data happened. A partial WAL page was not replayed.

 To avoid this problem, I think that we should change pg_receivexlog so
 that it writes WAL data *by the block*, or creates, like walreceiver, WAL file
 before writing any data. Otherwise, though pg_receivexlog streams WAL
 data in realtime, the latest WAL data might not be available for recovery.

Ah, so you were recovering data from the last, partial, file? Not from
a completed file?

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Fujii Masao
On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander mag...@hagander.net wrote:
 On Thu, Oct 27, 2011 at 12:29, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 6:25 PM, Fujii Masao masao.fu...@gmail.com wrote:
 On Thu, Oct 27, 2011 at 5:18 PM, Magnus Hagander mag...@hagander.net 
 wrote:
 Not sure I follow. When we arrive at PQgetCopyData() there should be
 nothing buffered, and if the end of stream happens there it returns
 -1, and we exit, no? So where is the data that's lost?

 I do realize we don't actually fsync() and close() in this case - is
 that what you are referring to? But the data should already have been
 write()d, so it should still be there, no?

 Oh, right. Hmm.. xlogdump might be the cause.

 Though I've not read the code of xlogdump, I wonder if it gives up
 outputting the contents of WAL file when it finds a partial WAL page...
 This strikes me that recovery code has the same problem. No?
 IOW, when a partial WAL page is found during recovery, I'm afraid
 that page would not be replayed though it contains valid data.

 My concern was right. When I performed a recovery using the streamed
 WAL files, the loss of data happened. A partial WAL page was not replayed.

 To avoid this problem, I think that we should change pg_receivexlog so
 that it writes WAL data *by the block*, or creates, like walreceiver, WAL 
 file
 before writing any data. Otherwise, though pg_receivexlog streams WAL
 data in realtime, the latest WAL data might not be available for recovery.

 Ah, so you were recovering data from the last, partial, file? Not from
 a completed file?

Yes. I copied all streamed WAL files to pg_xlog directory and started recovery.

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

Yes. But that sounds unuserfriendly. Padding the WAL file manually
is easy-to-do for a user?

Regards,

-- 
Fujii Masao
NIPPON TELEGRAPH AND TELEPHONE CORPORATION
NTT Open Source Software Center



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Heikki Linnakangas

On 27.10.2011 14:09, Fujii Masao wrote:

On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander mag...@hagander.net wrote:

I'm rewriting the handling of partial files per the other thread
started by Heikki. The idea is that there will be an actual .partial
file in there when pg_receivexlog has ended, and you have to deal with
it manually. The typical way would be to pad it with zeroes to the
end. Doing such padding would solve this recovery issue, correct?


Yes. But that sounds unuserfriendly. Padding the WAL file manually
is easy-to-do for a user?


truncate -s 16M file works at least on my Linux system. Not sure how 
you'd do it on Windows.
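
For platforms without a truncate utility, the same padding could be done with
a short script. The Python sketch below is one possible approach, not an
official tool; pad_with_zeroes and the demo file name are hypothetical. It
appends explicit zero bytes rather than relying on how the platform fills a
file that is extended by truncation.

```python
import os
import tempfile

SEGMENT_SIZE = 16 * 1024 * 1024  # 16MB, as in the truncate example above


def pad_with_zeroes(path: str, size: int = SEGMENT_SIZE) -> int:
    """Append zero bytes until the file is `size` bytes long.

    Returns the number of bytes added; raises if the file is already larger.
    """
    current = os.path.getsize(path)
    if current > size:
        raise ValueError("file is already larger than the target size")
    missing = size - current
    zeroes = bytes(65536)
    with open(path, "ab") as f:
        remaining = missing
        while remaining > 0:
            n = min(remaining, len(zeroes))
            f.write(zeroes[:n])
            remaining -= n
        f.flush()
        os.fsync(f.fileno())
    return missing


def demo() -> tuple:
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "segment.partial")  # hypothetical file name
        with open(path, "wb") as f:
            f.write(b"valid WAL")
        added = pad_with_zeroes(path, size=1024)
        with open(path, "rb") as f:
            content = f.read()
    return added, content
```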


Perhaps we should add automatic padding in the server, though. It 
wouldn't take much code in the server, and would make life easier for 
people writing their scripts. Maybe even have the backend check for a 
.partial file if it can't find a regularly named XLOG file. Needs some 
thought..
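
For the Windows question, a portable equivalent of truncate -s 16M could look roughly like the Python sketch below. It is only an illustration: the function name pad_wal_segment and the 16 MB constant are assumptions made here (16 MB is the default segment size, but it is set at build time), and nothing like this ships with pg_receivexlog.

```python
import os

# Assumed default WAL segment size; the real value is fixed at build time.
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def pad_wal_segment(path, segment_size=WAL_SEGMENT_SIZE):
    """Append zero bytes to `path` until it is exactly `segment_size` long.

    Returns the number of bytes appended. Existing contents are untouched.
    """
    current = os.path.getsize(path)
    if current > segment_size:
        raise ValueError("%s is already larger than one segment" % path)

    remaining = segment_size - current
    zeros = b"\0" * 65536
    with open(path, "ab") as f:
        left = remaining
        while left > 0:
            n = min(left, len(zeros))
            f.write(zeros[:n])
            left -= n
        # Make sure the padding reaches disk before recovery reads the file.
        f.flush()
        os.fsync(f.fileno())
    return remaining
```

Writing the zeroes explicitly, rather than relying on a truncate call to extend the file, keeps the result the same on every platform.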


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:

 On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander mag...@hagander.net wrote:

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 truncate -s 16M file works at least on my Linux system. Not sure how
 you'd do it on Windows.

Yeah, that's easy enough. I don't think there are similar tools on
windows, but we could probably put together a quick script for people
to use if necessary.


 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

I'd definitely want to avoid anything that requires pg_receivexlog to
actually *parse* the WAL. That'll make it way more complex than I'd
like.

Having recovery consider a .partial file might be interesting. It
could consider that only if there are no other complete files
available, or something like that?

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Robert Haas
On Thu, Oct 27, 2011 at 7:19 AM, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:

 On Thu, Oct 27, 2011 at 7:48 PM, Magnus Hagander mag...@hagander.net wrote:

 I'm rewriting the handling of partial files per the other thread
 started by Heikki. The idea is that there will be an actual .partial
 file in there when pg_receivexlog has ended, and you have to deal with
 it manually. The typical way would be to pad it with zeroes to the
 end. Doing such padding would solve this recovery issue, correct?

 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 truncate -s 16M file works at least on my Linux system. Not sure how
 you'd do it on Windows.

One of the common complaints I hear about PostgreSQL is that our replication
system is more difficult to set up than people would like, and it's
too easy to make mistakes that can corrupt your data without realizing
it; I don't think making them need to truncate a file to 16 megabytes
is going to improve things there.

 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

+1 for figuring out something, though I'm not sure exactly what.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Tom Lane
Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

What parsing?  Just pad to 16MB with zeroes.  In fact, I think the
receiver should just create the file that size to start with, and then
write received data into it, much like normal WAL creation does.
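
As a rough illustration of that suggestion (not the actual pg_receivexlog code), the Python sketch below creates the file at full size and then overwrites it in place. The names create_segment and write_received are hypothetical, and the segment size is passed in rather than assumed.

```python
def create_segment(path, segment_size):
    """Create `path` as a zero-filled file of exactly `segment_size` bytes."""
    zeros = b"\0" * 65536
    with open(path, "wb") as f:
        left = segment_size
        while left > 0:
            n = min(left, len(zeros))
            f.write(zeros[:n])
            left -= n


def write_received(path, offset, data, segment_size):
    """Overwrite part of a preallocated segment with received data."""
    if offset < 0 or offset + len(data) > segment_size:
        raise ValueError("data does not fit inside the segment")
    # "r+b" updates the file in place, so its length never changes.
    with open(path, "r+b") as f:
        f.seek(offset)
        f.write(data)
```

With this approach the file is always full size on disk, so its length alone says nothing about how much valid data it holds.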

regards, tom lane



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Magnus Hagander
On Thu, Oct 27, 2011 at 16:54, Tom Lane t...@sss.pgh.pa.us wrote:
 Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 On 27.10.2011 14:09, Fujii Masao wrote:
 Yes. But that sounds unuserfriendly. Padding the WAL file manually
 is easy-to-do for a user?

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

 What parsing?  Just pad to 16MB with zeroes.  In fact, I think the

I'm just saying that *if* parsing is required, it would be bad.

 receiver should just create the file that size to start with, and then
 write received data into it, much like normal WAL creation does.

So when pg_receivexlog starts up, how would it know if the last file
represents a completed file, or a half-full file, without actually
parsing it? It could be a 16MB file with 10 bytes of valid data, or a
complete file with 16MB of valid data.

We could always ask for a retransmit of the whole file, but if that
file is gone on the master, we won't be able to do that, and will
error out in a situation that's not actually an error.

Though I guess if we leave the file as .partial up until this point
(per my other patch just posted), I guess this doesn't actually apply
- if the file is called .partial, we'll overwrite into it. If it's
not, then we assume it's a complete segment.
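
The naming rule can be sketched without looking inside any file. The Python below is a hypothetical illustration, not code from the patch: last_segment_state is a made-up name, and it assumes WAL file names are 24 uppercase hex digits, optionally followed by .partial, so that they sort correctly as plain strings.

```python
import re

# Assumed file-name layout: 24 uppercase hex digits, optional ".partial".
_SEGMENT_RE = re.compile(r"^[0-9A-F]{24}(\.partial)?$")


def last_segment_state(filenames):
    """Return (segment_name, is_partial) for the newest WAL file, or None.

    A ".partial" file is to be overwritten; any other segment is assumed
    complete. If both forms of the same segment exist, the complete one wins.
    """
    candidates = [n for n in filenames if _SEGMENT_RE.match(n)]
    if not candidates:
        return None
    newest = max(candidates,
                 key=lambda n: (n[:24], not n.endswith(".partial")))
    return newest[:24], newest.endswith(".partial")
```

The decision rests entirely on the file name, which is what avoids parsing the WAL contents.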

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-27 Thread Dimitri Fontaine
Magnus Hagander mag...@hagander.net writes:
 On Thu, Oct 27, 2011 at 13:19, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 Perhaps we should add automatic padding in the server, though. It wouldn't
 take much code in the server, and would make life easier for people writing
 their scripts. Maybe even have the backend check for a .partial file if it
 can't find a regularly named XLOG file. Needs some thought..

 I'd definitely want to avoid anything that requires pg_receivexlog to
 actually *parse* the WAL. That'll make it way more complex than I'd
 like.

What about creating the WAL file filled up with zeroes at the receiving
end and then overwriting data as we receive it?

Regards,
-- 
Dimitri Fontaine
http://2ndQuadrant.fr PostgreSQL : Expertise, Formation et Support



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-26 Thread Magnus Hagander
On Tue, Oct 25, 2011 at 12:37, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 14:40, Magnus Hagander mag...@hagander.net wrote:
 On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
 heikki.linnakan...@enterprisedb.com wrote:
 +               /*
 +                * Looks like an xlog file. Parse it's position.

 s/it's/its/

 +                */
 +               if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
 +               {
 +                       fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
 +                                       progname, dirent->d_name);
 +                       disconnect_and_exit(1);
 +               }
 +               log *= XLOG_SEG_SIZE;

 That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high.
 It seems completely unnecessary, anyway,

 How do you mean completely unnecessary? We'd have to change the points
 that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a
 way to get around the overflow, but I'm not sure that's what you mean?

 Talked to Heikki on IM about this one, turns out we were both wrong.
 It's needed, but there was a bug hiding under it, due to (once again)
 mixing up segments and offsets. Has been fixed now.

 In pg_basebackup, it would be a good sanity check to check that the systemid
 returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming
 connection match. Just to be sure that some connection pooler didn't hijack
 one of the connections and point to a different server. And better check
 timelineid too while you're at it.

 That's a good idea. Will fix.

 Added to the new version of the patch.


 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

 Ah yes. Did not think of that. Yes, it will have this problem.

 Actually, thinking more, per other mail, it won't. Because it will
 never report that the data is synced to disk, so it will not be
 considered for sync standby.

 This is something we might consider in the future (it could be a
 reasonable scenario where you had this), but not in the first version.

 Updated version of the patch attached.

I've applied this version with a few more minor changes that Heikki found.

His comment about .partial files still applies, and I intend to
address this in a follow-up commit, along with some further
documentation enhancements.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-24 Thread Magnus Hagander
On Mon, Oct 24, 2011 at 16:12, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Mon, Oct 24, 2011 at 7:40 AM, Magnus Hagander mag...@hagander.net wrote:

 synchronous_standby_names='*' is prone to such confusion in general, but it
 seems that it's particularly surprising if a running pg_basebackup lets a
 commit in synchronous replication to proceed. Maybe we just need a warning
 in the docs. I think we should advise that synchronous_standby_names='*' is
 dangerous in general, and cite this as one reason for that.

 Hmm. i think this is common enough that we want to make sure we avoid
 it in code.

 Could we pass a parameter from the client indicating to the master
 that it refuses to be a sync slave? An optional keyword to the
 START_REPLICATION command, perhaps?


 can't you execute set synchronous_commit to off/local for this connection?

This is a walsender connection, it doesn't take SQL. Plus it's the
receiving end, and SET sync_commit is for the sending end.

That said, we are reasonably safe in the current implementation, because
it always sets the flush location to InvalidXLogRecPtr, so it will not be
considered for sync slave. Should we ever start accepting "write" as
the point to sync against, the problem will show up, of course.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-24 Thread Magnus Hagander
On Mon, Oct 24, 2011 at 13:46, Heikki Linnakangas
heikki.linnakan...@enterprisedb.com wrote:
 +               /*
 +                * Looks like an xlog file. Parse it's position.

 s/it's/its/

 +                */
 +               if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
 +               {
 +                       fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
 +                                       progname, dirent->d_name);
 +                       disconnect_and_exit(1);
 +               }
 +               log *= XLOG_SEG_SIZE;

 That multiplication by XLOG_SEG_SIZE could overflow, if logid is very high.
 It seems completely unnecessary, anyway,

How do you mean completely unnecessary? We'd have to change the points
that use it to divide by XLOG_SEG_SIZE otherwise, no? That might be a
way to get around the overflow, but I'm not sure that's what you mean?


 s/IDENFITY_SYSTEM/IDENTIFY_SYSTEM/ (two occurrences)

Oops.


 In pg_basebackup, it would be a good sanity check to check that the systemid
 returned by IDENTIFY_SYSTEM in the main connection and the WAL-streaming
 connection match. Just to be sure that some connection pooler didn't hijack
 one of the connections and point to a different server. And better check
 timelineid too while you're at it.

That's a good idea. Will fix.


 How does this interact with synchronous replication? If a base backup that
 streams WAL is in progress, and you have synchronous_standby_names set to
 '*', I believe the in-progress backup will count as a standby for that
 purpose. That might give a false sense of security.

Ah yes. Did not think of that. Yes, it will have this problem.


 synchronous_standby_names='*' is prone to such confusion in general, but it
 seems that it's particularly surprising if a running pg_basebackup lets a
 commit in synchronous replication to proceed. Maybe we just need a warning
 in the docs. I think we should advise that synchronous_standby_names='*' is
 dangerous in general, and cite this as one reason for that.

Hmm. i think this is common enough that we want to make sure we avoid
it in code.

Could we pass a parameter from the client indicating to the master
that it refuses to be a sync slave? An optional keyword to the
START_REPLICATION command, perhaps?


-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-24 Thread Jaime Casanova
On Mon, Oct 24, 2011 at 7:40 AM, Magnus Hagander mag...@hagander.net wrote:

 synchronous_standby_names='*' is prone to such confusion in general, but it
 seems that it's particularly surprising if a running pg_basebackup lets a
 commit in synchronous replication to proceed. Maybe we just need a warning
 in the docs. I think we should advise that synchronous_standby_names='*' is
 dangerous in general, and cite this as one reason for that.

 Hmm. i think this is common enough that we want to make sure we avoid
 it in code.

 Could we pass a parameter from the client indicating to the master
 that it refuses to be a sync slave? An optional keyword to the
 START_REPLICATION command, perhaps?


can't you execute set synchronous_commit to off/local for this connection?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Updated version of pg_receivexlog

2011-10-24 Thread Heikki Linnakangas

+   /*
+* Looks like an xlog file. Parse it's position.


s/it's/its/


+*/
+   if (sscanf(dirent->d_name, "%08X%08X%08X", &tli, &log, &seg) != 3)
+   {
+       fprintf(stderr, _("%s: could not parse xlog filename \"%s\"\n"),
+               progname, dirent->d_name);
+       disconnect_and_exit(1);
+   }
+   log *= XLOG_SEG_SIZE;


That multiplication by XLOG_SEG_SIZE could overflow, if logid is very 
high. It seems completely unnecessary, anyway,
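
To make the overflow concrete, here is a small Python sketch. It assumes that log is a 32-bit unsigned variable and that XLOG_SEG_SIZE is the default 16 MB; neither is stated in the quoted hunk. The three 8-hex-digit fields follow the sscanf format above, and the helper names are made up for illustration.

```python
import re

XLOG_SEG_SIZE = 16 * 1024 * 1024  # assumed default segment size
_NAME_RE = re.compile(r"[0-9A-Fa-f]{24}")


def parse_xlog_filename(name):
    """Split a 24-hex-digit WAL file name into (tli, log, seg)."""
    if not _NAME_RE.fullmatch(name):
        raise ValueError("could not parse xlog filename %r" % name)
    return int(name[0:8], 16), int(name[8:16], 16), int(name[16:24], 16)


def mul_uint32(a, b):
    """Multiply as a 32-bit unsigned C variable would, wrapping on overflow."""
    return (a * b) & 0xFFFFFFFF


tli, log, seg = parse_xlog_filename("000000010000010000000000")
# The true product 0x100 * 0x1000000 needs 33 bits, so it wraps.
wrapped = mul_uint32(log, XLOG_SEG_SIZE)
```

Under those assumptions, any logid of 0x100 or above no longer fits in 32 bits after the multiplication.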


s/IDENFITY_SYSTEM/IDENTIFY_SYSTEM/ (two occurrences)

In pg_basebackup, it would be a good sanity check to check that the 
systemid returned by IDENTIFY_SYSTEM in the main connection and the 
WAL-streaming connection match. Just to be sure that some connection 
pooler didn't hijack one of the connections and point to a different 
server. And better check timelineid too while you're at it.


How does this interact with synchronous replication? If a base backup 
that streams WAL is in progress, and you have synchronous_standby_names 
set to '*', I believe the in-progress backup will count as a standby for 
that purpose. That might give a false sense of security. 
synchronous_standby_names='*' is prone to such confusion in general, but 
it seems that it's particularly surprising if a running pg_basebackup 
lets a commit in synchronous replication to proceed. Maybe we just need 
a warning in the docs. I think we should advise that 
synchronous_standby_names='*' is dangerous in general, and cite this as 
one reason for that.


--
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-29 Thread Magnus Hagander
On Thu, Sep 29, 2011 at 01:55, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Sep 28, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote:

 pg_receivexlog worked good in my tests.

 pg_basebackup with --xlog=stream gives me an already recycled wal
 segment message (note that the file was in pg_xlog in the standby):
 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Do you get this reproducibly? Or did you get it just once?

 And when you say in the standby what are you referring to? There is
 no standby server in the case of pg_basebackup --xlog=stream, it's
 just backup... But are you saying pg_basebackup had received the file,
 yet tried to get it again?


 ok, i was trying to setup a standby server cloning with
 pg_basebackup... i can't use it that way?

 the docs says:
 
 If this option is specified, it is possible to start a postmaster
 directly in the extracted directory without the need to consult the
 log archive, thus making this a completely standalone backup.
 

 it doesn't say that is not possible to use this for a standby
 server... probably that's why i get the error i put a recovery.conf
 after pg_basebackup finished... maybe we can say that  more loudly?

The idea is, if you use it with -x (or --xlog), it's for taking a
backup/clone, *not* for replication.

If you use it without -x, then you can use it as the start of a
replica, by adding a recovery.conf.

But you can't do both at once, that will confuse it.


 in other things:
 do we need to include src/bin/pg_basebackup/.gitignore in the patch?

 Not sure what you mean? We need to add pg_receivexlog to this file,
 yes - in head it just contains pg_basebackup.


 your patch includes a modification in the file
 src/bin/pg_basebackup/.gitignore, maybe i'm just being annoying
 besides is a simple change... just forget that...

Well, it needs to be included in the commit, and if I exclude it in the
posted patch, I'll just forget it in the end :-)

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-28 Thread Jaime Casanova
On Tue, Aug 16, 2011 at 9:32 AM, Magnus Hagander mag...@hagander.net wrote:
 Here's an updated version of pg_receivexlog, that should now actually
 work (it previously failed miserably when a replication record crossed
 a WAL file boundary - something which I at the time could not properly
 reproduce, but when I restarted my work on it now could easily
 reproduce every time :D).

 It also contains an update to pg_basebackup that allows it to stream
 the transaction log in the background while the backup is running,
 thus reducing the need for wal_keep_segments (if the client can keep
 up, it should eliminate the need completely).


reviewing this...

i found useful pg_receivexlog as an independent utility, but i'm not
so sure about the ability to call it from pg_basebackup via --xlog
option. this is because pg_receivexlog will continue streaming even
after pg_basebackup if it's called independently but not in the other
case so the use case for --xlog seems more narrow and error prone (ie:
you said that it reduces the need for wal_keep_segments *if the client
can keep up*... how can we know that before starting pg_basebackup?)

pg_receivexlog worked good in my tests.

pg_basebackup with --xlog=stream gives me an already recycled wal
segment message (note that the file was in pg_xlog in the standby):
FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
segment 0001005C has already been removed


haven't read all the code in the detail but seems fine to me

in other things:
do we need to include src/bin/pg_basebackup/.gitignore in the patch?

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-28 Thread Jaime Casanova
On Wed, Sep 28, 2011 at 1:38 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Tue, Aug 16, 2011 at 9:32 AM, Magnus Hagander mag...@hagander.net wrote:
 Here's an updated version of pg_receivexlog, that should now actually
 work (it previously failed miserably when a replication record crossed
 a WAL file boundary - something which I at the time could not properly
 reproduce, but when I restarted my work on it now could easily
 reproduce every time :D).

 It also contains an update to pg_basebackup that allows it to stream
 the transaction log in the background while the backup is running,
 thus reducing the need for wal_keep_segments (if the client can keep
 up, it should eliminate the need completely).


 reviewing this...


btw, executing 'make world' with this patch gives me this error (seems
like an entry is missing in doc/src/sgml/ref/allfiles.sgml):

jade:reference.sgml:223:4:E: general entity pgReceivexlog not
defined and no default entity

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-28 Thread Magnus Hagander
On Wed, Sep 28, 2011 at 08:38, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Tue, Aug 16, 2011 at 9:32 AM, Magnus Hagander mag...@hagander.net wrote:
 Here's an updated version of pg_receivexlog, that should now actually
 work (it previously failed miserably when a replication record crossed
 a WAL file boundary - something which I at the time could not properly
 reproduce, but when I restarted my work on it now could easily
 reproduce every time :D).

 It also contains an update to pg_basebackup that allows it to stream
 the transaction log in the background while the backup is running,
 thus reducing the need for wal_keep_segments (if the client can keep
 up, it should eliminate the need completely).


 reviewing this...

 i found useful pg_receivexlog as an independent utility, but i'm not
 so sure about the ability to call it from pg_basebackup via --xlog
 option. this is because pg_receivexlog will continue streaming even
 after pg_basebackup if it's called independently but not in the other
 case so the use case for --xlog seems more narrow and error prone (ie:
 you said that it reduces the need for wal_keep_segments *if the client
 can keep up*... how can we know that before starting pg_basebackup?)

These two are not intended to be used together.

pg_basebackup --xlog=stream is intended for the same use-case as
pg_basebackup -x today, which is to take a backup of just the parts
that you actually need to clone the database, but to do so without
having to guesstimate the value for wal_keep_segments.


 pg_receivexlog worked good in my tests.

 pg_basebackup with --xlog=stream gives me an already recycled wal
 segment message (note that the file was in pg_xlog in the standby):
 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

Do you get this reproducibly? Or did you get it just once?

And when you say "in the standby", what are you referring to? There is
no standby server in the case of pg_basebackup --xlog=stream, it's
just backup... But are you saying pg_basebackup had received the file,
yet tried to get it again?


 in other things:
 do we need to include src/bin/pg_basebackup/.gitignore in the patch?

Not sure what you mean? We need to add pg_receivexlog to this file,
yes - in head it just contains pg_basebackup.

-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-28 Thread Magnus Hagander
On Wed, Sep 28, 2011 at 09:30, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Wed, Sep 28, 2011 at 1:38 AM, Jaime Casanova ja...@2ndquadrant.com wrote:
 On Tue, Aug 16, 2011 at 9:32 AM, Magnus Hagander mag...@hagander.net wrote:
 Here's an updated version of pg_receivexlog, that should now actually
 work (it previously failed miserably when a replication record crossed
 a WAL file boundary - something which I at the time could not properly
 reproduce, but when I restarted my work on it now could easily
 reproduce every time :D).

 It also contains an update to pg_basebackup that allows it to stream
 the transaction log in the background while the backup is running,
 thus reducing the need for wal_keep_segments (if the client can keep
 up, it should eliminate the need completely).


 reviewing this...


 btw, executing 'make world' with this patch gives me this error (seems
 like an entry is missing in doc/src/sgml/ref/allfiles.sgml):

 jade:reference.sgml:223:4:E: general entity pgReceivexlog not
 defined and no default entity

Ugh, how did I miss that. You need this:

diff --git a/doc/src/sgml/ref/allfiles.sgml b/doc/src/sgml/ref/allfiles.sgml
index 8a8616b..382d297 100644
--- a/doc/src/sgml/ref/allfiles.sgml
+++ b/doc/src/sgml/ref/allfiles.sgml
@@ -172,6 +172,7 @@ Complete list of usable sgml source files in this directory.
 <!ENTITY pgCtl              SYSTEM "pg_ctl-ref.sgml">
 <!ENTITY pgDump             SYSTEM "pg_dump.sgml">
 <!ENTITY pgDumpall          SYSTEM "pg_dumpall.sgml">
+<!ENTITY pgReceivexlog      SYSTEM "pg_receivexlog.sgml">
 <!ENTITY pgResetxlog        SYSTEM "pg_resetxlog.sgml">
 <!ENTITY pgRestore          SYSTEM "pg_restore.sgml">
 <!ENTITY postgres           SYSTEM "postgres-ref.sgml">



I think I broke it in a merge at some point..
-- 
 Magnus Hagander
 Me: http://www.hagander.net/
 Work: http://www.redpill-linpro.com/



Re: [HACKERS] Updated version of pg_receivexlog

2011-09-28 Thread Jaime Casanova
On Wed, Sep 28, 2011 at 12:50 PM, Magnus Hagander mag...@hagander.net wrote:

 pg_receivexlog worked good in my tests.

 pg_basebackup with --xlog=stream gives me an already recycled wal
 segment message (note that the file was in pg_xlog in the standby):
 FATAL:  could not receive data from WAL stream: FATAL:  requested WAL
 segment 0001005C has already been removed

 Do you get this reproducibly? Or did you get it just once?

 And when you say in the standby what are you referring to? There is
 no standby server in the case of pg_basebackup --xlog=stream, it's
 just backup... But are you saying pg_basebackup had received the file,
 yet tried to get it again?


ok, i was trying to setup a standby server cloning with
pg_basebackup... i can't use it that way?

the docs says:

If this option is specified, it is possible to start a postmaster
directly in the extracted directory without the need to consult the
log archive, thus making this a completely standalone backup.


it doesn't say that is not possible to use this for a standby
server... probably that's why i get the error i put a recovery.conf
after pg_basebackup finished... maybe we can say that  more loudly?


 in other things:
 do we need to include src/bin/pg_basebackup/.gitignore in the patch?

 Not sure what you mean? We need to add pg_receivexlog to this file,
 yes - in head it just contains pg_basebackup.


your patch includes a modification in the file
src/bin/pg_basebackup/.gitignore, maybe i'm just being annoying
besides is a simple change... just forget that...

-- 
Jaime Casanova         www.2ndQuadrant.com
Professional PostgreSQL: Soporte 24x7 y capacitación
