On Wed, Jul 30, 2014 at 7:00 PM, desmodemone <desmodem...@gmail.com> wrote:
> I think an incremental/differential backup method is very useful.
> However, the method has two drawbacks:
> 1) In a database, even if the percentage of modified rows is small
> compared to the total number of rows, the probability that the changes
> touch only some files/tables is small, because rows are normally not
> ordered inside a table and the updates are "random". If some tables are
> static, they are probably lookup tables or something like a registry, and
> such tables are normally small.
> 2) Every time a file has changed, the whole file must be read. So if
> point 1) is true, you are probably reading a large part of the database
> and then sending that part, instead of sending a small part.
> In my opinion, to solve these problems we need a different implementation
> of incremental backup.
> I will try to explain my idea.
> I think we need a bitmap in memory to track the changed "chunks" of the
> tracked files/tables (by "chunk" I mean a group of X tracked pages, so
> that every tracked file is divided into chunks), so we could send only
> the blocks changed since the last incremental backup (the previous backup
> could itself be a full backup). The map could have one submap for every
> tracked file, which keeps it simpler.
> So, if we track a chunk of 8 pages (64 KB) with one bit [a chunk of 8
> blocks is only an example], a map of 1 Mbit (1 Mbit is 125 KB of memory)
> could track a table with a total size of 64 GB. We could probably also
> use a compression algorithm, because the map consists only of 1s and 0s.
> This is a very simple idea, but it shows that the map does not need too
> much memory if we track groups of blocks, i.e. "chunks". Obviously the
> problem is more complex, and probably there are better and more robust
> We probably need more space for the map header, to track information
> about the file, the last backup, and so on.
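
To make the sizing above concrete, here is a rough sketch of such a per-file chunk map. It is only an illustration, not PostgreSQL code: ChunkMap, mark_page_dirty and dirty_chunks are names made up for this example, and the 8 KB page size and 8-page chunk are the example values from the mail.

```python
PAGE_SIZE = 8192                 # assumed page (block) size in bytes
PAGES_PER_CHUNK = 8              # example from the mail: 8 pages per tracked chunk
CHUNK_BYTES = PAGE_SIZE * PAGES_PER_CHUNK   # 64 KB covered by each bit


class ChunkMap:
    """Hypothetical in-memory map with one bit per chunk of a tracked file."""

    def __init__(self, file_size_bytes):
        # Round up so that a trailing partial chunk still gets its own bit.
        self.n_chunks = (file_size_bytes + CHUNK_BYTES - 1) // CHUNK_BYTES
        # Eight chunks are tracked per byte of map memory.
        self.bits = bytearray((self.n_chunks + 7) // 8)

    def mark_page_dirty(self, page_no):
        """Set the bit of the chunk that contains the given page number."""
        chunk = page_no // PAGES_PER_CHUNK
        if chunk >= self.n_chunks:
            raise ValueError("page lies beyond the tracked file size")
        self.bits[chunk // 8] |= 1 << (chunk % 8)

    def dirty_chunks(self):
        """Return the chunk numbers whose bit is set, in ascending order."""
        return [c for c in range(self.n_chunks)
                if self.bits[c // 8] & (1 << (c % 8))]
```

With binary units, ChunkMap(64 * 2**30) needs 2**20 bits, i.e. 128 KiB of map memory, which is in line with the roughly 125 KB estimated above. Marking pages 0, 7, 8 and 100 as dirty flags chunks 0, 1 and 12, so only those three 64 KB chunks would have to be read for the next incremental backup.
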
> I think the map must be updated by the bgwriter, i.e. when it flushes
> the dirty buffers,
Not only the bgwriter, but the checkpointer and the backends as well, as
those also flush buffers. There are also some writes which are done
outside shared buffers; you need to track those separately.
Another point is that to track the changes due to hint bit modifications,
you need to enable checksums or wal_log_hints, which will lead to either
more CPU or more I/O.
> Fortunately, we don't need this map for the consistency of the database,
> so we could create and manage it in memory to limit the impact on
> performance.
> The drawback is that if the database crashes or someone shuts it down,
> the next incremental backup will be a full one. We could think about
> flushing the map to disk when PostgreSQL receives a shutdown signal or
> something
> In this way we obtain:
> 1) we read only a small part of the database (the probability that a
> chunk has changed is lower than the probability that the whole file has
> changed)
> 2) we do not need to calculate checksums, saving CPU
> 3) we save I/O in reading and writing (we send only the blocks changed
> since the last incremental backup)
> 4) we save network bandwidth
> 5) we save time during backup: if we read and write less data, we reduce
> the time needed for an incremental backup
> 6) I think the in-memory bitmap will not have much impact on the
> performance of the bgwriter.
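
A minimal sketch of how a backup pass could use such a map is shown below. It is a hypothetical illustration, not an existing PostgreSQL interface: backup_file and its arguments are invented for this example. Passing None as the map stands for the case where the in-memory map was lost (crash or shutdown), which forces every chunk to be sent, i.e. a full backup.

```python
CHUNK_BYTES = 8 * 8192   # example from the mail: 8 pages of 8 KB per tracked chunk


def backup_file(data_file, dirty_bits, n_chunks):
    """Return a list of (chunk_no, chunk_bytes) pairs that must be shipped.

    data_file  -- seekable binary file object for one tracked file
    dirty_bits -- bytes-like bitmap with one bit per chunk, or None if the
                  map is unavailable, in which case all chunks are sent
    n_chunks   -- number of chunks in the file
    """
    shipped = []
    for chunk in range(n_chunks):
        if dirty_bits is not None:
            is_dirty = (dirty_bits[chunk // 8] >> (chunk % 8)) & 1
            if not is_dirty:
                continue  # unchanged since the last backup: never read it
        data_file.seek(chunk * CHUNK_BYTES)
        shipped.append((chunk, data_file.read(CHUNK_BYTES)))
    return shipped
```

Only the flagged chunks are read and sent, which is where the savings in I/O, network and time listed above come from; no checksum over the file contents is computed to decide what has changed.
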
> What do you think about it?
I think this method has 3 drawbacks compared to method
a. you must enable either checksums or wal_log_hints, so it will incur
extra I/O if you enable wal_log_hints
b. backends also need to update the map, which is a small
cost, but still ...
c. the map is not crash safe, due to which sometimes a full backup