Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-29 Thread Jeff Janes
On Fri, Mar 21, 2014 at 2:22 PM, MauMau maumau...@gmail.com wrote:

 From: Jeff Janes jeff.ja...@gmail.com

  Do people really just copy the files from one directory of local storage
 to
 another directory of local storage?  I don't see the point of that.


 It makes sense to archive WAL to a directory of local storage for media
 recovery.  Here, the local storage is a different disk drive which is
 directly attached to the database server or directly connected through SAN.



For a SAN I guess we have different meanings of local :)
(I have no doubt yours is correct--the fine art of IT terminology is not my
thing.)


The recommendation is to refuse to overwrite an existing file of the same
 name, and exit with failure.  Which essentially brings archiving to a
 halt,
 because it keeps trying but it will keep failing.  If we make a custom
 version, one thing it should do is determine if the existing archived file
 is just a truncated version of the attempting-to-be archived file, and if
 so overwrite it.  Because if the first archival command fails with a
 network glitch, it can leave behind a partial file.


 What I'm trying to address is just an alternative to cp/copy which fsyncs
 a file.  It just overwrites an existing file.

 Yes, you're right, the failed archive attempt leaves behind a partial file
 which causes subsequent attempts to fail, if you follow the PG manual.
 That's another undesirable point in the current doc.  To overcome this,
 someone on this ML recommended me to do cp %p /archive/dir/%f.tmp  mv
 /archive/dir/%f.tmp /archive/dir/%f.  Does this solve your problem?


As written is doesn't solve it, as it just unconditionally overwrites the
file.  If you wanted that you could just do the single-statement
unconditional overwrite.

You could make it so that the .tmp gets overwritten unconditionally, but
the move of it will not overwrite an existing permanent file.  That would
solve the problem where a glitch in the network leaves in incomplete file
behind that blocks the next attempt, *except* that mv on (at least some)
network file systems is really a copy, and not an atomic rename, so is
still subject to leaving behind incomplete crud.

But, it is hard to tell what the real solution is, because the doc doesn't
explain why it should refuse (and fail) to overwrite an existing file.  The
only reason I can think of to make that recommendation is because it is
easy to accidentally configure two clusters to attempt to archive to the
same location, and having them overwrite each others files should be
guarded against.  If I am right, it seems like this reason should be added
to the docs, so people know what they are defending against.  And if I am
wrong, it seems even more important that the (correct) reason is added to
the docs.

Cheers,

Jeff


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-29 Thread Tom Lane
Jeff Janes jeff.ja...@gmail.com writes:
 But, it is hard to tell what the real solution is, because the doc doesn't
 explain why it should refuse (and fail) to overwrite an existing file.  The
 only reason I can think of to make that recommendation is because it is
 easy to accidentally configure two clusters to attempt to archive to the
 same location, and having them overwrite each others files should be
 guarded against.  If I am right, it seems like this reason should be added
 to the docs, so people know what they are defending against.  And if I am
 wrong, it seems even more important that the (correct) reason is added to
 the docs.

If memory serves, that is the reason ... and I thought it *was* explained
somewhere in the docs.

regards, tom lane


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-29 Thread Jeff Janes
On Saturday, March 29, 2014, Tom Lane t...@sss.pgh.pa.us wrote:

 Jeff Janes jeff.ja...@gmail.com javascript:; writes:
  But, it is hard to tell what the real solution is, because the doc
 doesn't
  explain why it should refuse (and fail) to overwrite an existing file.
  The
  only reason I can think of to make that recommendation is because it is
  easy to accidentally configure two clusters to attempt to archive to the
  same location, and having them overwrite each others files should be
  guarded against.  If I am right, it seems like this reason should be
 added
  to the docs, so people know what they are defending against.  And if I am
  wrong, it seems even more important that the (correct) reason is added to
  the docs.

 If memory serves, that is the reason ... and I thought it *was* explained
 somewhere in the docs.


You are right, and it has been there for a decade.  I don't know how I
missed that the last several times I read it.  I remember clearly the
paragraph below it, just not that one.

Sorry,

Jeff


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-22 Thread Martijn van Oosterhout
On Sat, Mar 22, 2014 at 06:22:37AM +0900, MauMau wrote:
 From: Jeff Janes jeff.ja...@gmail.com
 Do people really just copy the files from one directory of local
 storage to
 another directory of local storage?  I don't see the point of that.
 
 It makes sense to archive WAL to a directory of local storage for
 media recovery.  Here, the local storage is a different disk drive
 which is directly attached to the database server or directly
 connected through SAN.

I'm one of those peope. They are archived into a local directory in
preparation for an rsync over ssh.

 The recommendation is to refuse to overwrite an existing file of the same
 name, and exit with failure.  Which essentially brings archiving
 to a halt,
 because it keeps trying but it will keep failing.  If we make a custom
 version, one thing it should do is determine if the existing archived file
 is just a truncated version of the attempting-to-be archived file, and if
 so overwrite it.  Because if the first archival command fails with a
 network glitch, it can leave behind a partial file.
 
 What I'm trying to address is just an alternative to cp/copy which
 fsyncs a file.  It just overwrites an existing file.

I ran into a related problem with cp, where halfway the copy the disk
was full and I was left with half a WAL file. This caused the rsync to
copy only half a file and the replication broke. This is clearly a
recoverable situation, but it didn't recover in this case.

 Yes, you're right, the failed archive attempt leaves behind a
 partial file which causes subsequent attempts to fail, if you follow
 the PG manual. That's another undesirable point in the current doc.
 To overcome this, someone on this ML recommended me to do cp %p
 /archive/dir/%f.tmp  mv /archive/dir/%f.tmp /archive/dir/%f.
 Does this solve your problem?

This would probably have handled it, but I find it odd that there's
program to handle restoring of archives properly, but on the archiving
side you have to cobble together your own shell scripts which fail in
various corner cases.

I'd love a program that just Did The Right Thing.

Have a nice day,
-- 
Martijn van Oosterhout   klep...@svana.org   http://svana.org/kleptog/
 He who writes carelessly confesses thereby at the very outset that he does
 not attach much importance to his own thoughts.
   -- Arthur Schopenhauer


signature.asc
Description: Digital signature


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread MauMau

From: Mitsumasa KONDO kondo.mitsum...@gmail.com

2014-03-17 21:12 GMT+09:00 Fujii Masao masao.fu...@gmail.com:


On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com
wrote:
 On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote:
 * Improve the example in the documentation.
 But what command can we use to reliably sync just one file?

 * Provide some command, say pg_copy, which copies a file synchronously
by
 using fsync(), and describes in the doc something like for simple use
 cases, you can use pg_copy as the standard reliable copy command.

 +1.  This won't obviate the need for tools to manage replication, but
 it would make it possible to get the simplest case right without
 guessing.

+1, too.

And, what about making pg_copy call posix_fadvise(DONT_NEED) against the
archived file after the copy? Also It might be good idea to support the
direct
copy of the file to avoid wasting the file cache.


Use direct_cp.
http://directcp.sourceforge.net/direct_cp.html


Thank you all for giving favorable responses and interesting ideas.
Then, I think I'll do:

* Create pg_copy in C so that it can be used on Windows as well as on 
UNIX/Linux.  It just copies one file.  Its source code is located in 
src/bin/pg_copy/.  Please recommend a better name if you have one in mind.


* Add a reference page for pg_copy in the chapter Server applications. 
Modify the section for continuous archiving to recommend pg_copy for simple 
use cases as the standard command.


* pg_copy calls posix_fadvise(DONT_NEED) on the destination file.

* pg_copy passes O_DIRECT flag when opening the destination file 
when --directio or -d option is specified.  O_DIRECT is not used by default 
because it may not be available on some file systems, as well as it might 
cause trouble on older platforms such as RHEL4/5.  pg_copy does not use 
O_DIRECT for the source file so that it can copy the data from the 
filesystem cache, which is just written by postgres.


Could you give me your opinions before starting the work, including the 
following?


* Should I refactor the functions (copy_file, copydir, etc.) in 
src/backend/storage/file/copydir.c so that they can also be used for 
frontends?  If so, which of src/port or src/common/ is the right place to 
put copydir.c in?


* Should I complete the work before 9.4 beta so that it will be available 
starting with 9.4?  I think so because it is a basic capability to archive 
transaction logs safely (although the time may not allow me to do this).


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread Michael Paquier
On Fri, Mar 21, 2014 at 8:54 PM, MauMau maumau...@gmail.com wrote:
 * Create pg_copy in C so that it can be used on Windows as well as on
 UNIX/Linux.  It just copies one file.  Its source code is located in
 src/bin/pg_copy/.  Please recommend a better name if you have one in mind.
I'd rather see that as a part of contrib/ if possible. Is there any
portion of the code you have in mind that makes mandatory putting it
in src/bin?

 * Should I complete the work before 9.4 beta so that it will be available
 starting with 9.4?  I think so because it is a basic capability to archive
 transaction logs safely (although the time may not allow me to do this).
Pursing efforts on a utility like that is worth the shot IMO (I would
use it for sure if it has reliable cross-platform support to unify
sets of replication scripts), but including it in 9.4 is out of scope.
A saner target would be the 1st commit fest of 9.5.

Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread MauMau

From: Michael Paquier michael.paqu...@gmail.com

On Fri, Mar 21, 2014 at 8:54 PM, MauMau maumau...@gmail.com wrote:

* Create pg_copy in C so that it can be used on Windows as well as on
UNIX/Linux.  It just copies one file.  Its source code is located in
src/bin/pg_copy/.  Please recommend a better name if you have one in 
mind.

I'd rather see that as a part of contrib/ if possible. Is there any
portion of the code you have in mind that makes mandatory putting it
in src/bin?


Archiving transaction logs reliably is a basic responsibility of DBMS, so I 
think it should be treated as part of the core.  It is not a desirable 
feature but actually a mandatory one to persist transaction logs.  Even if 
it were a better to have feature, it can be put in the core like 
pg_basebackup and pg_isready, which are not mandatory tools.




* Should I complete the work before 9.4 beta so that it will be available
starting with 9.4?  I think so because it is a basic capability to 
archive

transaction logs safely (although the time may not allow me to do this).

Pursing efforts on a utility like that is worth the shot IMO (I would
use it for sure if it has reliable cross-platform support to unify
sets of replication scripts), but including it in 9.4 is out of scope.
A saner target would be the 1st commit fest of 9.5.


OK, I don't mind if it should be targeted at 9.4 or 9.5.  If someone wants 
it for 9.4, I try to hurry.


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread MauMau

From: Michael Paquier michael.paqu...@gmail.com

On Fri, Mar 21, 2014 at 8:54 PM, MauMau maumau...@gmail.com wrote:

* Create pg_copy in C so that it can be used on Windows as well as on
UNIX/Linux.  It just copies one file.  Its source code is located in
src/bin/pg_copy/.  Please recommend a better name if you have one in 
mind.

I'd rather see that as a part of contrib/ if possible. Is there any
portion of the code you have in mind that makes mandatory putting it
in src/bin?


Archiving transaction logs reliably is a basic responsibility of DBMS, so I 
think it should be treated as part of the core.  It is not a desirable 
feature but actually a mandatory one to persist transaction logs.  Even if 
it were a better to have feature, it can be put in the core like 
pg_basebackup and pg_isready, which are not mandatory tools.




* Should I complete the work before 9.4 beta so that it will be available
starting with 9.4?  I think so because it is a basic capability to 
archive

transaction logs safely (although the time may not allow me to do this).

Pursing efforts on a utility like that is worth the shot IMO (I would
use it for sure if it has reliable cross-platform support to unify
sets of replication scripts), but including it in 9.4 is out of scope.
A saner target would be the 1st commit fest of 9.5.


OK, I don't mind if it should be targeted at 9.4 or 9.5.  If someone wants 
it for 9.4, I try to hurry.


Regards
MauMau




--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread Jeff Janes
On Sun, Mar 16, 2014 at 3:23 AM, MauMau maumau...@gmail.com wrote:

 Hello,

 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not sync
 the copied data to disk.  As a result, the completed WAL segments would be
 lost in the following sequence:

 1. A WAL segment fills up.

 2. The archiver process archives the just filled WAL segment using
 archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/
 and writes to the archive area.  At this point, the WAL file is not
 persisted to the archive area yet, because cp/copy doesn't sync the writes.

 3. The checkpoint processing removes the WAL segment file from pg_xlog/.


Note that it takes two checkpoints for this to happen, at least as
currently coded.

Also, if the system crashed badly enough to need media recovery, rather
than just automatic crash recovery, some lost transactions are expected.
 Although this could silently break your PITR chain, of a crash happened
and automatic recover used the copy in pg_xlog (which of course was synced)
, while copy in the archive was not synced.


 4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

 Considering the reliable image of PostgreSQL and widespread use in
 enterprise systems, I think something should be done.  Could you give me
 your opinions on the right direction?  Although the doc certainly escapes
 by saying (This is an example, not a recommendation, and might not work on
 all platforms.), it seems from pgsql-xxx MLs that many people are
 following this example.


I use this as an example, kind of, but what I am copying to is a network
mount, so any attempts to fsync it there would probably need unavailable
hooks into the remote file system.

Do people really just copy the files from one directory of local storage to
another directory of local storage?  I don't see the point of that.  But it
seems like this is an area where there are hundreds of use cases, and often
one doesn't see the point of other people's, making it hard to come up with
good examples.




 * Improve the example in the documentation.
 But what command can we use to reliably sync just one file?

 * Provide some command, say pg_copy, which copies a file synchronously by
 using fsync(), and describes in the doc something like for simple use
 cases, you can use pg_copy as the standard reliable copy command.


The recommendation is to refuse to overwrite an existing file of the same
name, and exit with failure.  Which essentially brings archiving to a halt,
because it keeps trying but it will keep failing.  If we make a custom
version, one thing it should do is determine if the existing archived file
is just a truncated version of the attempting-to-be archived file, and if
so overwrite it.  Because if the first archival command fails with a
network glitch, it can leave behind a partial file.


Cheers,

Jeff


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread MauMau

From: Jeff Janes jeff.ja...@gmail.com
Do people really just copy the files from one directory of local storage 
to

another directory of local storage?  I don't see the point of that.


It makes sense to archive WAL to a directory of local storage for media 
recovery.  Here, the local storage is a different disk drive which is 
directly attached to the database server or directly connected through SAN.




The recommendation is to refuse to overwrite an existing file of the same
name, and exit with failure.  Which essentially brings archiving to a 
halt,

because it keeps trying but it will keep failing.  If we make a custom
version, one thing it should do is determine if the existing archived file
is just a truncated version of the attempting-to-be archived file, and if
so overwrite it.  Because if the first archival command fails with a
network glitch, it can leave behind a partial file.


What I'm trying to address is just an alternative to cp/copy which fsyncs a 
file.  It just overwrites an existing file.


Yes, you're right, the failed archive attempt leaves behind a partial file 
which causes subsequent attempts to fail, if you follow the PG manual. 
That's another undesirable point in the current doc.  To overcome this, 
someone on this ML recommended me to do cp %p /archive/dir/%f.tmp  mv 
/archive/dir/%f.tmp /archive/dir/%f.  Does this solve your problem?



Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-21 Thread Bruce Momjian
On Fri, Mar 21, 2014 at 01:16:08PM -0700, Jeff Janes wrote:
 On Sun, Mar 16, 2014 at 3:23 AM, MauMau maumau...@gmail.com wrote:
 
 Hello,
 
 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not 
 sync
 the copied data to disk.  As a result, the completed WAL segments would be
 lost in the following sequence:
 
 1. A WAL segment fills up.
 
 2. The archiver process archives the just filled WAL segment using
 archive_command.  That is, cp/copy reads the WAL segment file from 
 pg_xlog/
 and writes to the archive area.  At this point, the WAL file is not
 persisted to the archive area yet, because cp/copy doesn't sync the 
 writes.
 
 3. The checkpoint processing removes the WAL segment file from pg_xlog/.
 
 
 Note that it takes two checkpoints for this to happen, at least as currently
 coded.
 
 Also, if the system crashed badly enough to need media recovery, rather than
 just automatic crash recovery, some lost transactions are expected.  Although
 this could silently break your PITR chain, of a crash happened and automatic
 recover used the copy in pg_xlog (which of course was synced) , while copy in
 the archive was not synced.

That is one good reason to keep checkpoint_warning=30, so the typical
file system sync that happens every 30 seconds warns that those files
might not on permanent storage.

-- 
  Bruce Momjian  br...@momjian.ushttp://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-17 Thread Fujii Masao
On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com wrote:
 On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote:
 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not sync
 the copied data to disk.  As a result, the completed WAL segments would be
 lost in the following sequence:

 1. A WAL segment fills up.

 2. The archiver process archives the just filled WAL segment using
 archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/
 and writes to the archive area.  At this point, the WAL file is not
 persisted to the archive area yet, because cp/copy doesn't sync the writes.

 3. The checkpoint processing removes the WAL segment file from pg_xlog/.

 4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

 Considering the reliable image of PostgreSQL and widespread use in
 enterprise systems, I think something should be done.  Could you give me
 your opinions on the right direction?  Although the doc certainly escapes by
 saying (This is an example, not a recommendation, and might not work on all
 platforms.), it seems from pgsql-xxx MLs that many people are following
 this example.

 * Improve the example in the documentation.
 But what command can we use to reliably sync just one file?

 * Provide some command, say pg_copy, which copies a file synchronously by
 using fsync(), and describes in the doc something like for simple use
 cases, you can use pg_copy as the standard reliable copy command.

 +1.  This won't obviate the need for tools to manage replication, but
 it would make it possible to get the simplest case right without
 guessing.

+1, too.

And, what about making pg_copy call posix_fadvise(DONT_NEED) against the
archived file after the copy? Also It might be good idea to support the direct
copy of the file to avoid wasting the file cache.

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-17 Thread Mitsumasa KONDO
2014-03-17 21:12 GMT+09:00 Fujii Masao masao.fu...@gmail.com:

 On Mon, Mar 17, 2014 at 10:20 AM, Robert Haas robertmh...@gmail.com
 wrote:
  On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote:
  The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
  Windows) as an example for archive_command.  However, cp/copy does not
 sync
  the copied data to disk.  As a result, the completed WAL segments would
 be
  lost in the following sequence:
 
  1. A WAL segment fills up.
 
  2. The archiver process archives the just filled WAL segment using
  archive_command.  That is, cp/copy reads the WAL segment file from
 pg_xlog/
  and writes to the archive area.  At this point, the WAL file is not
  persisted to the archive area yet, because cp/copy doesn't sync the
 writes.
 
  3. The checkpoint processing removes the WAL segment file from pg_xlog/.
 
  4. The OS crashes.  The filled WAL segment doesn't exist anywhere any
 more.
 
  Considering the reliable image of PostgreSQL and widespread use in
  enterprise systems, I think something should be done.  Could you give me
  your opinions on the right direction?  Although the doc certainly
 escapes by
  saying (This is an example, not a recommendation, and might not work
 on all
  platforms.), it seems from pgsql-xxx MLs that many people are following
  this example.
 
  * Improve the example in the documentation.
  But what command can we use to reliably sync just one file?
 
  * Provide some command, say pg_copy, which copies a file synchronously
 by
  using fsync(), and describes in the doc something like for simple use
  cases, you can use pg_copy as the standard reliable copy command.
 
  +1.  This won't obviate the need for tools to manage replication, but
  it would make it possible to get the simplest case right without
  guessing.

 +1, too.

 And, what about making pg_copy call posix_fadvise(DONT_NEED) against the
 archived file after the copy? Also It might be good idea to support the
 direct
 copy of the file to avoid wasting the file cache.

Use direct_cp.
http://directcp.sourceforge.net/direct_cp.html

Regards,
--
Mitsumasa KONDO
NTT Open Source Software Center


[HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread MauMau

Hello,

The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on 
Windows) as an example for archive_command.  However, cp/copy does not sync 
the copied data to disk.  As a result, the completed WAL segments would be 
lost in the following sequence:


1. A WAL segment fills up.

2. The archiver process archives the just filled WAL segment using 
archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/ 
and writes to the archive area.  At this point, the WAL file is not 
persisted to the archive area yet, because cp/copy doesn't sync the writes.


3. The checkpoint processing removes the WAL segment file from pg_xlog/.

4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

Considering the reliable image of PostgreSQL and widespread use in 
enterprise systems, I think something should be done.  Could you give me 
your opinions on the right direction?  Although the doc certainly escapes by 
saying (This is an example, not a recommendation, and might not work on all 
platforms.), it seems from pgsql-xxx MLs that many people are following 
this example.


* Improve the example in the documentation.
But what command can we use to reliably sync just one file?

* Provide some command, say pg_copy, which copies a file synchronously by 
using fsync(), and describes in the doc something like for simple use 
cases, you can use pg_copy as the standard reliable copy command.


Related to this topic, pg_basebackup doesn't fsync the backed up files.  I'm 
afraid this too is different from what the users expect --- I guess they 
would expect the backup is certainly available after pg_basebackup completes 
even if the machine crashes.


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread Amit Kapila
On Sun, Mar 16, 2014 at 3:53 PM, MauMau maumau...@gmail.com wrote:
 Hello,

 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not sync
 the copied data to disk.  As a result, the completed WAL segments would be
 lost in the following sequence:

 1. A WAL segment fills up.

 2. The archiver process archives the just filled WAL segment using
 archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/
 and writes to the archive area.  At this point, the WAL file is not
 persisted to the archive area yet, because cp/copy doesn't sync the writes.

 3. The checkpoint processing removes the WAL segment file from pg_xlog/.

 4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

 Considering the reliable image of PostgreSQL and widespread use in
 enterprise systems, I think something should be done.  Could you give me
 your opinions on the right direction?

How about using pg_receivexlog for archiving purpose?


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread MauMau

From: Amit Kapila amit.kapil...@gmail.com

How about using pg_receivexlog for archiving purpose?


pg_receivexlog is good in that it does fsync().  But it seems difficult to 
use correctly, and I'm not sure if I can catch all WAL segments without any 
loss.  pg_receivexlog must be started with postmaster and monitored with 
some measures.  This won't be very easy at least on Windows.


The pg_receivexlog reference page suggests another difficulty:

Notes
When using pg_receivexlog instead of archive_command, the server will 
continue to recycle transaction log files even if the backups are not 
properly archived, since there is no command that fails. This can be worked 
around by having an archive_command that fails when the file has not been 
properly archived yet, for example:

archive_command = 'sleep 5  test -f /mnt/server/archivedir/%f'

This suggestion is not correct, because it only checks the existence of the 
file.  What if the file size is less than 16MB?  How can we check if the 
file is completely archived?


Regards
MauMau



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread Greg Stark
On Sun, Mar 16, 2014 at 10:23 AM, MauMau maumau...@gmail.com wrote:
 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not sync
 the copied data to disk

I'm actually a lot less concerned about fsyncing the backup than I am
about fsyncing the restore. The backup is just a bunch of files for
the user to make use of. They might copy them around, compress them,
move them onto tape or other storage. They need to be aware of the
persistence of whatever storage system they're putting them in.

But when they do a restore they just untar or whatever other
extraction tool and then hand those files to Postgres to maintain. I
bet the number of people who fsync or call sync the data files after
untarring their backups is vanishingly small and problems could
manifest later after Postgres has been running.

WAL-e recently changed to fsync each data file and the directories
containing them after restore. But perhaps Postgres should open and
fsync each file in the database when it starts up?

In most file systems files written to are guaranteed to be synced
within a configurable amount of time (in some systems unless the
filesystem can't keep up). So the practical risk may be small. But in
theory a database that wasn't synced when it was restored could
suddenly lose files days or months later when a crash occurs and some
data files weren't touched by the database in the intervening time.


-- 
greg


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread Andreas Karlsson

On 03/16/2014 03:23 PM, MauMau wrote:

From: Amit Kapila amit.kapil...@gmail.com

How about using pg_receivexlog for archiving purpose?


pg_receivexlog is good in that it does fsync().  But it seems difficult
to use correctly, and I'm not sure if I can catch all WAL segments
without any loss.  pg_receivexlog must be started with postmaster and
monitored with some measures.  This won't be very easy at least on Windows.


Replication slots should solve the issue of making sure to catch all of 
the WAL.


--
Andreas Karlsson


--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread Robert Haas
On Sun, Mar 16, 2014 at 6:23 AM, MauMau maumau...@gmail.com wrote:
 The PostgreSQL documentation describes cp (on UNIX/Linux) or copy (on
 Windows) as an example for archive_command.  However, cp/copy does not sync
 the copied data to disk.  As a result, the completed WAL segments would be
 lost in the following sequence:

 1. A WAL segment fills up.

 2. The archiver process archives the just filled WAL segment using
 archive_command.  That is, cp/copy reads the WAL segment file from pg_xlog/
 and writes to the archive area.  At this point, the WAL file is not
 persisted to the archive area yet, because cp/copy doesn't sync the writes.

 3. The checkpoint processing removes the WAL segment file from pg_xlog/.

 4. The OS crashes.  The filled WAL segment doesn't exist anywhere any more.

 Considering the reliable image of PostgreSQL and widespread use in
 enterprise systems, I think something should be done.  Could you give me
 your opinions on the right direction?  Although the doc certainly escapes by
 saying (This is an example, not a recommendation, and might not work on all
 platforms.), it seems from pgsql-xxx MLs that many people are following
 this example.

 * Improve the example in the documentation.
 But what command can we use to reliably sync just one file?

 * Provide some command, say pg_copy, which copies a file synchronously by
 using fsync(), and describes in the doc something like for simple use
 cases, you can use pg_copy as the standard reliable copy command.

+1.  This won't obviate the need for tools to manage replication, but
it would make it possible to get the simplest case right without
guessing.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] [RFC] What should we do for reliable WAL archiving?

2014-03-16 Thread Amit Kapila
On Sun, Mar 16, 2014 at 7:53 PM, MauMau maumau...@gmail.com wrote:
 From: Amit Kapila amit.kapil...@gmail.com

 How about using pg_receivexlog for archiving purpose?


 pg_receivexlog is good in that it does fsync().  But it seems difficult to
 use correctly, and I'm not sure if I can catch all WAL segments without any
 loss.  pg_receivexlog must be started with postmaster and monitored with
 some measures.  This won't be very easy at least on Windows.

 The pg_receivexlog reference page suggests another difficulty:

 Notes
 When using pg_receivexlog instead of archive_command, the server will
 continue to recycle transaction log files even if the backups are not
 properly archived, since there is no command that fails. This can be worked
 around by having an archive_command that fails when the file has not been
 properly archived yet, for example:
 archive_command = 'sleep 5  test -f /mnt/server/archivedir/%f'

 This suggestion is not correct, because it only checks the existence of the
 file.  What if the file size is less than 16MB?  How can we check if the
 file is completely archived?

The most probable reasons for un-successful archiving could be:
1. Disk space got full - pg_receivexlog makes sure while open/create new
segment file that the size of new file should be 16MB (open_walfile()). So due
to this reason there should not be a problem to above command.

2. Permission got denied - I think this will lead to failure of above archive
command mentioned by you.

3. n/w connection broken - This will also lead to failure of above command, but
here I think there is a possibility that it might have checked the existence of
in-complete wal file on archive location and consider it archived, but I think
wal_keep_segments can avoid this problem.

Also if you are on 9.4, then may be --slot parameter can help you.


With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers