On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>> >> Yeah, it might make the situation better than today. But I'm afraid that
>> >> many users might be disappointed by that behavior of incremental
>> >> backup after the release...
>> >
>> > I don't get what you mean here. Can you elaborate this point?
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database file needs to be read even when nothing in the
>> database has been modified since the last backup, which may make the
>> backup take longer than users expect), and those limitations may
>> disappoint users. So I'm afraid that the users who can benefit from the
>> feature might be very limited. IOW, I'm just sticking to the idea of the
>> timestamp-based one :) But I should drop it if the majority on the list
>> prefers the LSN-based one even with such limitations.
> We need numbers on how effective each level of tracking will be.  Until
> then, the patch can't move forward.

The point is that this is a stepping stone toward what will ultimately
be a better solution.  You can use timestamps today if (a) whole-file
granularity is good enough for you and (b) you trust your system clock
to never go backwards.  In fact, if you use pg_start_backup() and
pg_stop_backup(), you don't even need a server patch; you can just go
right ahead and implement whatever you like.  A server patch would be
needed to make pg_basebackup do a file-time-based incremental backup,
but I'm not excited about that because I think the approach is a dead
end.

If you want block-level granularity, and you should, an approach based
on file times is never going to get you there.  An approach based on
LSNs can.  If the first version of the patch requires reading the
whole database, fine, it's not going to perform all that terribly
well.  But we can optimize that later by keeping track of which blocks
have been modified since a given LSN.  If we do that, we can get
better reliability than the timestamp approach can ever offer, plus
excellent transfer and storage characteristics.
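Roughly the kind of scan I have in mind, as a toy sketch in Python (real
code would of course be C inside the server, using the actual page
layout; here I just assume 8 kB pages whose header starts with pd_lsn as
two little-endian 32-bit words, which is what you'd see on a common
little-endian build):

```python
# Hedged sketch: find blocks whose page LSN exceeds a backup-start LSN.
# Assumes BLCKSZ = 8192 and that pd_lsn is the first 8 bytes of the page
# header, stored as two little-endian 32-bit words (xlogid, xrecoff).
# A real implementation would use the server's page macros directly.
import struct

BLCKSZ = 8192


def page_lsn(page):
    """Extract the LSN from the first 8 bytes of a page header."""
    xlogid, xrecoff = struct.unpack_from('<II', page, 0)
    return (xlogid << 32) | xrecoff


def changed_blocks(data, since_lsn):
    """Yield block numbers of pages modified after since_lsn."""
    for blkno in range(len(data) // BLCKSZ):
        page = data[blkno * BLCKSZ:(blkno + 1) * BLCKSZ]
        if page_lsn(page) > since_lsn:
            yield blkno


def make_page(lsn):
    """Fabricate a zeroed page carrying the given pd_lsn (demo only)."""
    return struct.pack('<II', lsn >> 32, lsn & 0xFFFFFFFF) + b'\x00' * (BLCKSZ - 8)


# Demo: two fabricated pages, only the second modified since LSN 0x500.
relfile = make_page(0x100) + make_page(0x900)
print(list(changed_blocks(relfile, since_lsn=0x500)))  # -> [1]
```

This still reads every page, which is the limitation discussed above;
the later optimization would be tracking modified blocks so the scan
itself becomes unnecessary.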

What I'm unhappy with about this patch is that it insists on sending
the whole file if a single block in that file has changed.  That is
lame.  To get something useful out of this, we should be looking to
send only those blocks whose LSNs have actually changed.  That would
reduce I/O (in the worst case, the current patch reads each file in its
entirety twice) and transfer bandwidth as compared to the proposed
patch.  We'd still have to read the whole database so it might very
well do more I/O than the file-timestamp approach, but it would beat
the file-timestamp approach on transfer bandwidth and on the amount of
storage required to store the incremental.  In many workloads, I
expect those savings would be quite significant.  If we then went back
in a later release and implemented one of the various proposals to
avoid needing to read every block, we'd then have a very robust and
complete solution.
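The incremental format itself need not be complicated. Something on the
order of (block number, page image) records overlaid onto the prior full
backup would do; here's a toy sketch in Python (the record layout is
mine, illustrative only, not a proposed wire protocol):

```python
# Hedged sketch of a block-level incremental: a sequence of
# (block number, page image) records, applied over a prior full backup
# to reconstruct the current file. The 4-byte big-endian block-number
# prefix is an arbitrary illustrative choice.
import struct

BLCKSZ = 8192


def write_incremental(records):
    """Serialize (blkno, page) pairs into one incremental byte string."""
    out = bytearray()
    for blkno, page in records:
        out += struct.pack('>I', blkno) + page
    return bytes(out)


def apply_incremental(base, incr):
    """Overlay incremental records onto a full-backup file image."""
    result = bytearray(base)
    off = 0
    while off < len(incr):
        (blkno,) = struct.unpack_from('>I', incr, off)
        off += 4
        result[blkno * BLCKSZ:(blkno + 1) * BLCKSZ] = incr[off:off + BLCKSZ]
        off += BLCKSZ
    return bytes(result)
```

Storage and transfer then scale with the number of modified blocks
rather than the number of modified files, which is where the savings
come from.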

But I agree with Fujii to the extent that I see little value in
committing this patch in the form proposed.  Being smart enough to use
the LSN to identify changed blocks, but then sending the entirety of
every file anyway because you don't want to go to the trouble of
figuring out how to revise the wire protocol to identify the
individual blocks being sent and write the tools to reconstruct a full
backup based on that data, does not seem like enough of a win. As
Fujii says, if we ship this patch as written, people will just keep
using the timestamp-based approach anyway.  Let's wait until we have
something that is, at least in some circumstances, a material
improvement over the status quo before committing anything.

Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)