2014-07-29 18:35 GMT+02:00 Marco Nenciarini <marco.nenciar...@2ndquadrant.it
> Il 25/07/14 20:44, Robert Haas ha scritto:
> > On Fri, Jul 25, 2014 at 2:21 PM, Claudio Freire <klaussfre...@gmail.com>
> >> On Fri, Jul 25, 2014 at 10:14 AM, Marco Nenciarini
> >> <marco.nenciar...@2ndquadrant.it> wrote:
> >>> 1. Proposal
> >>> =================================
> >>> Our proposal is to introduce the concept of a backup profile. The
> >>> profile consists of a file with one line per file detailing tablespace,
> >>> path, modification time, size and checksum.
> >>> Using that file the BASE_BACKUP command can decide which file needs to
> >>> be sent again and which is not changed. The algorithm should be very
> >>> similar to rsync, but since our files are never bigger than 1 GB per
> >>> file that is probably granular enough not to worry about copying parts
> >>> of files, just whole files.
> >> That wouldn't be nearly as useful as the LSN-based approach mentioned
> >> I've had my share of rsyncing live databases (when resizing
> >> filesystems, not for backup, but the anecdotal evidence applies
> >> anyhow) and with moderately write-heavy databases, even if you only
> >> modify a tiny portion of the records, you end up modifying a huge
> >> portion of the segments, because the free space choice is random.
> >> There have been patches going around to change the random nature of
> >> that choice, but none are very likely to make a huge difference for
> >> this application. In essence, file-level comparisons get you only a
> >> mild speed-up, and are not worth the effort.
> >> I'd go for the hybrid file+lsn method, or nothing. The hybrid avoids
> >> the I/O of inspecting the LSN of entire segments (necessary
> >> optimization for huge multi-TB databases) and backups only the
> >> portions modified when segments do contain changes, so it's the best
> >> of both worlds. Any partial implementation would either require lots
> >> of I/O (LSN only) or save very little (file only) unless it's an
> >> almost read-only database.
> > I agree with much of that. However, I'd question whether we can
> > really seriously expect to rely on file modification times for
> > critical data-integrity operations. I wouldn't like it if somebody
> > ran ntpdate to fix the time while the base backup was running, and it
> > set the time backward, and the next differential backup consequently
> > omitted some blocks that had been modified during the base backup.
> Our proposal doesn't rely on file modification times for data integrity.
> We are using the file mtime only as a fast indication that the file has
> changed, and transfer it again without performing the checksum.
> If timestamp and size match we rely on *checksums* to decide if it has
> to be sent.
> In "SMART MODE" we would use the file mtime to skip the checksum check
> in some cases, but it wouldn't be the default operation mode and it will
> have all the necessary warnings attached. However the "SMART MODE" isn't
> a core part of our proposal, and can be delayed until we agree on the
> safest way to bring it to the end user.
> Marco Nenciarini - 2ndQuadrant Italy
> PostgreSQL Training, Services and Support
> marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
I think an incremental/differential backup method would be very useful.
However, the proposed method has two drawbacks:
1) In a typical database, even if the percentage of modified rows is small
compared to the total, the probability that only a few files/tables change
is also small, because rows are normally not ordered inside a table and
updates land "randomly". If some tables are static, they are probably
lookup tables or registry-like tables, and those are normally small.
2) Every changed file would have to be read in full each time. So if point
1 is true, you would probably end up reading (and then sending) a large
part of the database instead of a small one.
In my opinion, to solve these problems we need a different implementation
of incremental backup. I will try to sketch my idea.
I think we need an in-memory bitmap to track the changed "chunks" of the
tracked files (by "chunk" I mean a group of X pages; every tracked file is
divided into chunks), so we could send only the chunks changed since the
last incremental backup (which for the first run could be a full backup
serving as the base of the chain). The map could have one submap for every
tracked file.
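To make the idea concrete, here is a minimal sketch of such a per-file
submap, assuming a fixed chunk of 8 pages; all names here
(`FileChunkMap`, `chunkmap_mark_block`, etc.) are invented for
illustration and are not existing PostgreSQL symbols:

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BLCKSZ            8192   /* PostgreSQL default page size */
#define BLOCKS_PER_CHUNK  8      /* 8 pages => 64 KB per tracked chunk */

/* Hypothetical per-file submap: one bit per 64 KB chunk. */
typedef struct FileChunkMap
{
    uint32_t  nchunks;   /* number of chunks covered by this file */
    uint8_t  *bits;      /* nchunks bits, rounded up to whole bytes */
} FileChunkMap;

static FileChunkMap *
chunkmap_create(uint32_t nblocks)
{
    /* error handling omitted for brevity */
    FileChunkMap *map = malloc(sizeof(FileChunkMap));

    map->nchunks = (nblocks + BLOCKS_PER_CHUNK - 1) / BLOCKS_PER_CHUNK;
    map->bits = calloc((map->nchunks + 7) / 8, 1);
    return map;
}

/* Called (hypothetically) whenever a dirty page of the file is flushed. */
static void
chunkmap_mark_block(FileChunkMap *map, uint32_t blkno)
{
    uint32_t chunk = blkno / BLOCKS_PER_CHUNK;

    map->bits[chunk / 8] |= (uint8_t) (1 << (chunk % 8));
}

/* The backup would scan this to decide which chunks to send. */
static int
chunkmap_chunk_is_dirty(const FileChunkMap *map, uint32_t chunk)
{
    return (map->bits[chunk / 8] >> (chunk % 8)) & 1;
}
```

A 1 GB segment (131072 pages of 8 KB) would need a submap of 16384 bits,
i.e. 2 KB, under these assumptions.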
So, if one bit tracks a chunk of 8 page blocks (64 KB; a chunk of 8 blocks
is only an example), a map of 1 Mbit (1 Mbit is about 125 KB of memory)
could track a table with a total size of 64 GB. We could probably also
compress the map, since it is just a sequence of 0s and 1s. This is a very
simple idea, but it shows that the map does not need much memory if we
track groups of blocks, i.e. "chunks". Obviously the real problem is more
complex, and there are probably better and more robust designs. We would
also need some extra space for a map header carrying information about the
file, the last backup, and so on.
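The sizing above can be checked with a little arithmetic; the helper below
is a hypothetical sketch (binary units, so a 64 GiB relation needs 128 KiB
of bitmap; the 125 KB figure in the text comes from using decimal
megabits):

```c
#include <assert.h>
#include <stdint.h>

#define CHUNK_BYTES  (8 * 8192ULL)   /* 8 pages of 8 KB = 64 KB per bit */

/* Bytes of bitmap needed to track a relation of the given size:
 * one bit per chunk, rounded up to whole bytes. */
static uint64_t
chunkmap_bytes(uint64_t relation_bytes)
{
    uint64_t nbits = (relation_bytes + CHUNK_BYTES - 1) / CHUNK_BYTES;

    return (nbits + 7) / 8;
}
```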
I think the map should be updated by the bgwriter, i.e. when it flushes
dirty buffers. Fortunately we don't need this map for the consistency of
the database, so we could create and manage it purely in memory to limit
the impact on performance. The drawback is that if the database crashes or
is shut down, the next incremental backup must be a full one; we could
flush the map to disk when PostgreSQL receives a shutdown signal, or
something similar.
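One way the flush-on-shutdown could look is sketched below; the file
format and the function name are invented for illustration. The key
property is that losing the file is safe: the next incremental backup
simply degrades to a full one.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical on-disk dump of one per-file submap: a tiny header
 * (number of chunks) followed by the raw bitmap bytes.  Written on
 * clean shutdown; if the file is missing or torn at startup, we just
 * discard it and fall back to a full backup. */
static int
chunkmap_save(const char *path, uint32_t nchunks, const uint8_t *bits)
{
    size_t  nbytes = (nchunks + 7) / 8;
    FILE   *f = fopen(path, "wb");

    if (f == NULL)
        return -1;
    if (fwrite(&nchunks, sizeof(nchunks), 1, f) != 1 ||
        fwrite(bits, 1, nbytes, f) != nbytes)
    {
        fclose(f);
        return -1;
    }
    return fclose(f) == 0 ? 0 : -1;
}
```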
In this way we obtain:
1) we read only a small part of the database (the probability that a given
chunk has changed is lower than the probability that the whole file has
changed)
2) we do not need to compute checksums, saving CPU
3) we save I/O on both reads and writes (we send only the blocks changed
since the last incremental backup)
4) we save network bandwidth
5) we save time during the backup: reading and writing less data reduces
the time an incremental backup takes
6) I think the in-memory bitmap will not impact the performance of the
bgwriter too much.
What do you think about it?