On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian <br...@momjian.us> wrote:
> On Thu, Mar 5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>> >> Yeah, it might make the situation better than today. But I'm afraid that
>> >> many users might get disappointed about that behavior of an incremental
>> >> backup after the release...
>> >
>> > I don't get what do you mean here. Can you elaborate this point?
>>
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database file needs to be read even when there has been no
>> modification in the database since the last backup, which may make the
>> backup take longer than users expect), and those may disappoint users.
>> So I'm afraid that the users who can benefit from the feature might be
>> very limited. IOW, I'm just sticking to the idea of the timestamp-based
>> one :) But I should drop it if the majority on the list prefers the
>> LSN-based one even with such limitations.
>
> We need numbers on how effective each level of tracking will be. Until
> then, the patch can't move forward.
The point is that this is a stepping stone toward what will ultimately be a better solution. You can use timestamps today if (a) whole-file granularity is good enough for you and (b) you trust your system clock never to go backwards. In fact, if you use pg_start_backup() and pg_stop_backup(), you don't even need a server patch; you can just go right ahead and implement whatever you like. A server patch would be needed to make pg_basebackup do a file-time-based incremental backup, but I'm not excited about that, because I think the approach is a dead end. If you want block-level granularity, and you should, an approach based on file times is never going to get you there. An approach based on LSNs can.

If the first version of the patch requires reading the whole database, fine; it's not going to perform all that terribly well, but we can optimize that later by keeping track of which blocks have been modified since a given LSN. If we do that, we can get better reliability than the timestamp approach can ever offer, plus excellent transfer and storage characteristics.

What I'm unhappy about with this patch is that it insists on sending the whole file if a single block in that file has changed. That is lame. To get something useful out of this, we should be looking to send only those blocks whose LSNs have actually changed. That would reduce I/O (in the worst case, the current patch reads each file in its entirety twice) and transfer bandwidth as compared to the proposed patch. We'd still have to read the whole database, so it might very well do more I/O than the file-timestamp approach, but it would beat the file-timestamp approach on transfer bandwidth and on the amount of storage required to store the incremental. In many workloads, I expect those savings would be quite significant. If we then went back in a later release and implemented one of the various proposals to avoid needing to read every block, we'd have a very robust and complete solution.
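To make the block-level idea concrete, here is a rough sketch (not from the thread, and not the proposed patch) of how a backup tool could use per-page LSNs to decide which blocks to ship. It assumes the standard PostgreSQL page layout, where pd_lsn occupies the first 8 bytes of the page header as two 32-bit fields; the struct format below assumes a little-endian host, since pages are stored in host byte order:

```python
# Hypothetical sketch: select only the 8kB blocks whose pd_lsn shows they
# were modified at or after the LSN recorded when the prior backup started.
# Assumptions: default 8192-byte blocks, little-endian host byte order.
import struct

BLCKSZ = 8192  # default PostgreSQL block size


def page_lsn(page: bytes) -> int:
    """Extract pd_lsn from a raw page image.

    pd_lsn is the first field of the page header, stored as two 32-bit
    integers (xlogid, then xrecoff); an all-zero page yields LSN 0.
    """
    xlogid, xrecoff = struct.unpack_from("<II", page, 0)
    return (xlogid << 32) | xrecoff


def changed_blocks(path: str, backup_start_lsn: int):
    """Yield (block_number, page) for each page changed since the given LSN.

    Note that this still reads the whole file; skipping unmodified blocks
    without reading them would require separate modified-block tracking.
    """
    with open(path, "rb") as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break
            if page_lsn(page) >= backup_start_lsn:
                yield blkno, page
            blkno += 1
```

This illustrates the trade-off in the paragraph above: the scan reads every block (so raw I/O is comparable), but only the changed blocks would cross the wire or land in the incremental, which is where the transfer and storage savings come from.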
But I agree with Fujii to the extent that I see little value in committing this patch in the form proposed. Being smart enough to use the LSN to identify changed blocks, but then sending the entirety of every file anyway because you don't want to go to the trouble of figuring out how to revise the wire protocol to identify the individual blocks being sent and of writing the tools to reconstruct a full backup from that data, does not seem like enough of a win. As Fujii says, if we ship this patch as written, people will just keep using the timestamp-based approach anyway. Let's wait until we have something that is, at least in some circumstances, a material improvement over the status quo before committing anything.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

--
Sent via pgsql-hackers mailing list (firstname.lastname@example.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers