Re: [HACKERS] File based Incremental backup v8

2016-01-25 Thread Michael Paquier
On Tue, Jan 26, 2016 at 12:55 AM, David Fetter  wrote:
> On Tue, Mar 10, 2015 at 08:25:27AM +0900, Michael Paquier wrote:
>> On Tue, Mar 10, 2015 at 1:50 AM, Robert Haas  wrote:
>> > I think there's absolutely no point in spending more time on this for
>> > 9.5.  At least 4 committers have looked at it and none of them are
>> > convinced by the current design; feedback from almost half a year ago
>> > hasn't been incorporated; obviously-needed parts of the patch
>> > (pg_restorebackup) are missing weeks after the last CF deadline.
>> > Let's mark this Rejected in the CF app and move on.
>>
>> Agreed. I lost a bit of interest in this patch lately, but if all the
>> necessary parts of the patch were not posted before the CF deadline,
>> that's not something we should consider for integration at this point.
>> Let's give it a couple of months of fresh air and, Gabriele, I am sure
>> you will be able to come back with something far more advanced for the
>> first CF of 9.6.
>
> What's the latest on this patch?

My guess is that Marco and Gabriele are working on something directly
for barman, the backup tool they use, with a differential backup
implementation based on tracking blocks modified by WAL records (far
faster for large data sets than scanning all the relation files of
PGDATA).
Regards,
-- 
Michael


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File based Incremental backup v8

2016-01-25 Thread David Fetter
On Tue, Mar 10, 2015 at 08:25:27AM +0900, Michael Paquier wrote:
> On Tue, Mar 10, 2015 at 1:50 AM, Robert Haas  wrote:
> > I think there's absolutely no point in spending more time on this for
> > 9.5.  At least 4 committers have looked at it and none of them are
> > convinced by the current design; feedback from almost half a year ago
> > hasn't been incorporated; obviously-needed parts of the patch
> > (pg_restorebackup) are missing weeks after the last CF deadline.
> > Let's mark this Rejected in the CF app and move on.
> 
> Agreed. I lost a bit of interest in this patch lately, but if all the
> necessary parts of the patch were not posted before the CF deadline,
> that's not something we should consider for integration at this point.
> Let's give it a couple of months of fresh air and, Gabriele, I am sure
> you will be able to come back with something far more advanced for the
> first CF of 9.6.

What's the latest on this patch?

Cheers,
David.
-- 
David Fetter  http://fetter.org/
Phone: +1 415 235 3778  AIM: dfetter666  Yahoo!: dfetter
Skype: davidfetter  XMPP: david.fet...@gmail.com

Remember to vote!
Consider donating to Postgres: http://www.postgresql.org/about/donate




Re: [HACKERS] File based Incremental backup v8

2015-03-09 Thread Michael Paquier
On Tue, Mar 10, 2015 at 1:50 AM, Robert Haas  wrote:
> I think there's absolutely no point in spending more time on this for
> 9.5.  At least 4 committers have looked at it and none of them are
> convinced by the current design; feedback from almost half a year ago
> hasn't been incorporated; obviously-needed parts of the patch
> (pg_restorebackup) are missing weeks after the last CF deadline.
> Let's mark this Rejected in the CF app and move on.

Agreed. I lost a bit of interest in this patch lately, but if all the
necessary parts of the patch were not posted before the CF deadline,
that's not something we should consider for integration at this point.
Let's give it a couple of months of fresh air and, Gabriele, I am sure
you will be able to come back with something far more advanced for the
first CF of 9.6.
-- 
Michael




Re: [HACKERS] File based Incremental backup v8

2015-03-09 Thread Robert Haas
On Sat, Mar 7, 2015 at 5:45 PM, Gabriele Bartolini
 wrote:
>> By the way, unless I'm missing something, this patch only seems to
>> include the code to construct an incremental backup, but no tools
>> whatsoever to do anything useful with it once you've got it.
>
> As stated previously, Marco is writing a tool called pg_restorebackup (the
> prototype in Python has already been posted) to be included in core. I
> am in Australia now and not in the office so I cannot confirm it, but I am
> pretty sure he has already written it and was about to send it to the list.

Gabriele, the deadline for the last CommitFest was three weeks ago.
Having a patch that you are "about to send to the list" is not good
enough at this point.

>>   I think that's 100% unacceptable.
>
> I agree, that's why pg_restorebackup written in C is part of this patch.
> See: https://wiki.postgresql.org/wiki/Incremental_backup

No, it *isn't* part of this patch.  You may have a plan to add it to
this patch, but that's not the same thing.

> The goal has always been to provide "file-based incremental backup". I can
> assure you that this has always been our compass and the direction to follow.

Regardless of community feedback?  OK.  Let's see how that works out for you.

> I repeat that, using pg_restorebackup, this patch will transparently let
> users benefit from incremental backup even when it will be moved to an
> internal block-level logic. Users will continue to execute pg_basebackup and
> pg_restorebackup, without needing to know whether - for example in 9.5 - it is
> file-based (saving between 50-70% of space and time) or block-level - for
> example in 9.6.

I understand that.  But I also understand that in other cases it's
going to be slower than a full backup.  This problem has been pointed
out several times, and you're just refusing to admit that it's a real
issue.  A user with a bunch of tables where only the rows near the end
of the file get updated is going to repeatedly read those files until
it hits the first modified block and then rewind and reread the whole
file.  I pointed this problem out back in early October and suggested
some ways of fixing it; Heikki followed up with his own suggestions
for modifying my idea.  Instead of implementing any of that, or even
discussing it, you're still plugging away on a design that no
committer has endorsed and that several committers obviously have
concerns about.

It's also pretty clear that nobody likes the backup profile, at least
in the form it exists today.  But it's still here, many patch versions
later.

I think there's absolutely no point in spending more time on this for
9.5.  At least 4 committers have looked at it and none of them are
convinced by the current design; feedback from almost half a year ago
hasn't been incorporated; obviously-needed parts of the patch
(pg_restorebackup) are missing weeks after the last CF deadline.
Let's mark this Rejected in the CF app and move on.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Bruce Momjian
On Sun, Mar  8, 2015 at 09:26:38AM +1100, Gabriele Bartolini wrote:
> Hi Bruce,
> 
> 2015-03-08 5:37 GMT+11:00 Bruce Momjian :
> 
>         Desirability -> Design -> Implement -> Test -> Review -> Commit
> 
> This patch has continued in development without getting agreement on
> its Desirability or Design, meaning we are going to continue going back
> to those points until there is agreement.  Posting more versions of this
> patch is not going to change that.
> 
> 
> Could you please elaborate on that?
> 
> I actually think the approach that has been followed is what makes open source
> and collaborative development work. The initial idea was based on a timestamp
> approach which, thanks to the input of several developers, led Marco to
> develop LSN-based checks and move the feature implementation forward.

OK, if you think everyone just going off on their own and working on patches
that have little chance of being accepted is fine, you can do it, but that
rarely makes successful open source software.  You can do whatever you
want with your patch.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Gabriele Bartolini
Hi Robert,

2015-03-07 2:57 GMT+11:00 Robert Haas :

> By the way, unless I'm missing something, this patch only seems to
> include the code to construct an incremental backup, but no tools
> whatsoever to do anything useful with it once you've got it.


As stated previously, Marco is writing a tool called pg_restorebackup (the
prototype in Python has already been posted) to be included in core. I
am in Australia now and not in the office so I cannot confirm it, but I am
pretty sure he has already written it and was about to send it to the list.

He's been trying to find more data - see the numbers he has sent - in order
to convince people that even a file-based approach is useful.

> I think that's 100% unacceptable.


I agree, that's why pg_restorebackup written in C is part of this patch.
See: https://wiki.postgresql.org/wiki/Incremental_backup


> Users need to be able to manipulate
> PostgreSQL backups using either standard operating system tools or
> tools provided with PostgreSQL.  Some people may prefer to use
> something like repmgr or pitrtools or omniptr in addition, but that
> shouldn't be a requirement for incremental backup to be usable.
>

Not at all. I believe those tools will have to use pg_basebackup and
pg_restorebackup. If they want to use the streaming replication protocol, they
will be responsible for making sure that - if the protocol changes - they
adapt their technology.

> Agile development is good, but that does not mean you can divide a big
> project into arbitrarily small chunks.  At some point the chunks are
> too small to be sure that the overall direction is right, and/or
> individually useless.
>

The goal has always been to provide "file-based incremental backup". I can
assure you that this has always been our compass and the direction to follow.

I repeat that, using pg_restorebackup, this patch will transparently let
users benefit from incremental backup even when it will be moved to an
internal block-level logic. Users will continue to execute pg_basebackup
and pg_restorebackup, without needing to know whether - for example in
9.5 - it is file-based (saving between 50-70% of space and time) or
block-level - for example in 9.6.

My proposal is that Marco provides pg_restorebackup according to the
initial plan - a matter of hours/days.

Cheers,
Gabriele


Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Gabriele Bartolini
Hi Bruce,

2015-03-08 5:37 GMT+11:00 Bruce Momjian :
>
> Desirability -> Design -> Implement -> Test -> Review -> Commit
>
> This patch has continued in development without getting agreement on
> its Desirability or Design, meaning we are going to continue going back
> to those points until there is agreement.  Posting more versions of this
> patch is not going to change that.
>

Could you please elaborate on that?

I actually think the approach that has been followed is what makes open
source and collaborative development work. The initial idea was based on a
timestamp approach which, thanks to the input of several developers, led
Marco to develop LSN-based checks and move the feature implementation
forward.

The numbers that Marco has posted clearly show that a lot of users will
benefit from this file-based approach for incremental backup through
pg_basebackup.

As far as I see it, the only missing bit is the pg_restorebackup tool which
is quite trivial - given the existing prototype in Python.

Thanks,
Gabriele


Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Bruce Momjian
On Thu, Mar  5, 2015 at 11:10:08AM -0500, Robert Haas wrote:
> But I agree with Fujii to the extent that I see little value in
> committing this patch in the form proposed.  Being smart enough to use
> the LSN to identify changed blocks, but then sending the entirety of
> every file anyway because you don't want to go to the trouble of
> figuring out how to revise the wire protocol to identify the
> individual blocks being sent and write the tools to reconstruct a full
> backup based on that data, does not seem like enough of a win. As
> Fujii says, if we ship this patch as written, people will just keep
> using the timestamp-based approach anyway.  Let's wait until we have
> something that is, at least in some circumstances, a material
> improvement over the status quo before committing anything.

The big problem I have with this patch is that it has not followed the
proper process for development, i.e. at the top of the TODO list we
have:

Desirability -> Design -> Implement -> Test -> Review -> Commit

This patch has continued in development without getting agreement on
its Desirability or Design, meaning we are going to continue going back
to those points until there is agreement.  Posting more versions of this
patch is not going to change that.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] File based Incremental backup v8

2015-03-07 Thread Marco Nenciarini
On 05/03/15 05:42, Bruce Momjian wrote:
> On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
 Yeah, it might make the situation better than today. But I'm afraid that
 many users might get disappointed about that behavior of an incremental
 backup after the release...
>>>
>>> I don't get what you mean here. Can you elaborate on this point?
>>
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database file needs to be read even when there has been no
>> modification in the database since the last backup, which may make the backup
>> time longer than users expect) which may disappoint users. So I'm afraid that
>> users who can benefit from the feature might be very limited. IOW, I'm just
>> sticking to the idea of the timestamp-based one :) But I should drop it if the
>> majority on the list prefers the LSN-based one even if it has such limitations.
> 
> We need numbers on how effective each level of tracking will be.  Until
> then, the patch can't move forward.
> 

I've written a little test script to estimate how much space can be saved by
file-level incremental backup, and I've run it on some real backups I have
access to.

The script takes two basebackup directories and simulates how much data could
be saved in the 2nd backup using incremental backup (using file size/time and
LSN).

It assumes that every file in base, global and pg_tblspc which matches both
size and modification time will also match from the LSN point of view.

The result is that many databases can take advantage of incremental backup,
even if not so big, and considering LSNs yields a result almost identical to
the approach based on filesystem metadata.

== Very big geographic database (similar to openstreetmap main DB), it contains 
versioned data, interval two months 

First backup size: 13286623850656 (12.1TiB)
Second backup size: 13323511925626 (12.1TiB)
Matching files count: 17094
Matching LSN count: 14580
Matching files size: 9129755116499 (8.3TiB, 68.5%)
Matching LSN size: 9128568799332 (8.3TiB, 68.5%)


== Big on-line store database, old data regularly moved to historic partitions, 
interval one day

First backup size: 1355285058842 (1.2TiB)
Second backup size: 1358389467239 (1.2TiB)
Matching files count: 3937
Matching LSN count: 2821
Matching files size: 762292960220 (709.9GiB, 56.1%)
Matching LSN size: 762122543668 (709.8GiB, 56.1%)


== Ticketing system database, interval one day

First backup size: 144988275 (138.3MiB)
Second backup size: 146135155 (139.4MiB)
Matching files count: 3124
Matching LSN count: 2641
Matching files size: 76908986 (73.3MiB, 52.6%)
Matching LSN size: 67747928 (64.6MiB, 46.4%)


== Online store, interval one day

First backup size: 20418561133 (19.0GiB)
Second backup size: 20475290733 (19.1GiB)
Matching files count: 5744
Matching LSN count: 4302
Matching files size: 4432709876 (4.1GiB, 21.6%)
Matching LSN size: 4388993884 (4.1GiB, 21.4%)


== Heavily updated database, interval one week

First backup size: 3203198962 (3.0GiB)
Second backup size: 3222409202 (3.0GiB)
Matching files count: 1801
Matching LSN count: 1273
Matching files size: 91206317 (87.0MiB, 2.8%)
Matching LSN size: 69083532 (65.9MiB, 2.1%)

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
#!/usr/bin/env python
from __future__ import print_function

import collections
from functools import reduce  # needed on Python 3, harmless on Python 2
from optparse import OptionParser
import os
import sys

__author__ = 'Marco Nenciarini '

usage = """usage: %prog [options] backup_1 backup_2
"""
parser = OptionParser(usage=usage)
(options, args) = parser.parse_args()

# need 2 directories
if len(args) != 2:
    parser.print_help()
    sys.exit(1)

FileItem = collections.namedtuple('FileItem', 'size time path')


def get_files(target_dir):
    """Return a set of FileItem"""
    files = set()
    for dir_path, _, file_names in os.walk(target_dir, followlinks=True):
        for filename in file_names:
            path = os.path.join(dir_path, filename)
            rel_path = path[len(target_dir) + 1:]
            stats = os.stat(path)
            files.add(FileItem(stats.st_size, stats.st_mtime, rel_path))
    return files


def size_fmt(num, suffix='B'):
    """Format a size"""
    for unit in ['', 'Ki', 'Mi', 'Gi', 'Ti', 'Pi', 'Ei', 'Zi']:
        if abs(num) < 1024.0:
            return "%3.1f%s%s" % (num, unit, suffix)
        num /= 1024.0
    return "%.1f%s%s" % (num, 'Yi', suffix)


def percent_fmt(a, b):
    """Format a percent"""
    return "%.1f%%" % (100.0 * a / b)


def get_size(file_set):
    return reduce(lambda x, y: x + y.size, file_set, 0)


def report(a, b):
    # find files that are identical (same size and same time)
    common = a & b

    # for the LSN estimate, count only main forks
    common_lsn = set()
    for item in common:
        # skip non-main forks
        if any(suffix in item.path
               for suffix in ('_fsm', '_vm', '_init')):
            continue
        common_lsn.add(item)

    # NOTE: the script was truncated in the archive at this point; the
    # output code below is reconstructed to match the report format
    # shown in the message above.
    print("First backup size: %d (%s)" % (get_size(a), size_fmt(get_size(a))))
    print("Second backup size: %d (%s)" % (get_size(b), size_fmt(get_size(b))))
    print("Matching files count: %d" % len(common))
    print("Matching LSN count: %d" % len(common_lsn))
    print("Matching files size: %d (%s, %s)" % (
        get_size(common), size_fmt(get_size(common)),
        percent_fmt(get_size(common), get_size(b))))
    print("Matching LSN size: %d (%s, %s)" % (
        get_size(common_lsn), size_fmt(get_size(common_lsn)),
        percent_fmt(get_size(common_lsn), get_size(b))))


report(get_files(args[0]), get_files(args[1]))

Re: [HACKERS] File based Incremental backup v8

2015-03-06 Thread Robert Haas
On Fri, Mar 6, 2015 at 9:38 AM, Gabriele Bartolini
 wrote:
> I believe the main point is to look at it from a user interface point of view.
> If/When we switch to block-level incremental support, this will be
> completely transparent to the end user, even if we start with a file-level
> approach with an LSN check.

I don't think that's true.  To have a real file-level incremental
backup you need the ability to take the incremental backup, and then
you also need the ability to take a full backup + an incremental
backup taken later and reassemble a full image of the cluster on which
you can run recovery.  The means of doing that is going to be
different for an approach that only copies certain blocks vs. one that
copies whole files.  Once we have the block-based approach, nobody
will ever use the file-based approach, so whatever code or tools we
write to do that will all be dead code, yet we'll still have to
support them for many years.
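The reassembly Robert describes - taking a full backup plus a later file-level incremental and producing a complete cluster image to run recovery on - can be sketched in a few lines. This is a hypothetical helper, not code from the patch, and it assumes the incremental directory holds only the files that changed:

```python
import os
import shutil

def restore_backup(full_dir, incr_dir, dest_dir):
    """Rebuild a complete data directory from a full backup plus one
    later file-level incremental backup (hypothetical helper, not part
    of the patch)."""
    # Start from a copy of the full backup...
    shutil.copytree(full_dir, dest_dir)
    # ...then overlay every file the incremental backup contains.
    for dir_path, _, file_names in os.walk(incr_dir):
        for name in file_names:
            src = os.path.join(dir_path, name)
            rel = os.path.relpath(src, incr_dir)
            dst = os.path.join(dest_dir, rel)
            os.makedirs(os.path.dirname(dst), exist_ok=True)
            shutil.copy2(src, dst)
```

A real tool would additionally have to remove files deleted since the full backup, which is exactly the information the backup profile discussed later in the thread provides.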

By the way, unless I'm missing something, this patch only seems to
include the code to construct an incremental backup, but no tools
whatsoever to do anything useful with it once you've got it.  I think
that's 100% unacceptable.  Users need to be able to manipulate
PostgreSQL backups using either standard operating system tools or
tools provided with PostgreSQL.  Some people may prefer to use
something like repmgr or pitrtools or omniptr in addition, but that
shouldn't be a requirement for incremental backup to be usable.

Agile development is good, but that does not mean you can divide a big
project into arbitrarily small chunks.  At some point the chunks are
too small to be sure that the overall direction is right, and/or
individually useless.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] File based Incremental backup v8

2015-03-06 Thread Gabriele Bartolini
Hi Robert,

2015-03-06 3:10 GMT+11:00 Robert Haas :

> But I agree with Fujii to the extent that I see little value in
> committing this patch in the form proposed.  Being smart enough to use
> the LSN to identify changed blocks, but then sending the entirety of
> every file anyway because you don't want to go to the trouble of
> figuring out how to revise the wire protocol to identify the
> individual blocks being sent and write the tools to reconstruct a full
> backup based on that data, does not seem like enough of a win.


I believe the main point is to look at it from a user interface point of view.
If/When we switch to block-level incremental support, this will be
completely transparent to the end user, even if we start with a file-level
approach with an LSN check.

The win is already determined by the average space/time savings for users of
VLDBs with a good chunk of read-only data. Our Barman users with incremental
backup (released recently - its algorithm can be compared to that of the
file-level backup proposed by Marco) benefit on average from a data
deduplication ratio ranging between 50 and 70% of the cluster size.

A tangible example is depicted here, with Navionics saving 8.2TB a week
thanks to this approach (and 17 hours instead of 50 for backup time):
http://blog.2ndquadrant.com/incremental-backup-barman-1-4-0/

However, even smaller databases will benefit. It is clear that very small
databases as well as frequently updated ones won't be interested in
incremental backup, but that has never been the use case for this feature.

I believe that if we still think that this approach is not worth it, we are
making a big mistake. The way I see it, this patch follows an agile
approach and it is an important step towards incremental backup on a block
basis.


> As Fujii says, if we ship this patch as written, people will just keep
> using the timestamp-based approach anyway.


I think that allowing users to back up incrementally through streaming
replication (even though based on files) will give more flexibility to
system and database administrators for their disaster recovery solutions.

Thanks,
Gabriele


Re: [HACKERS] File based Incremental backup v8

2015-03-05 Thread Robert Haas
On Wed, Mar 4, 2015 at 11:42 PM, Bruce Momjian  wrote:
> On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
>> >> Yeah, it might make the situation better than today. But I'm afraid that
>> >> many users might get disappointed about that behavior of an incremental
>> >> backup after the release...
>> >
>> > I don't get what you mean here. Can you elaborate on this point?
>>
>> The proposed version of LSN-based incremental backup has some limitations
>> (e.g., every database file needs to be read even when there has been no
>> modification in the database since the last backup, which may make the backup
>> time longer than users expect) which may disappoint users. So I'm afraid that
>> users who can benefit from the feature might be very limited. IOW, I'm just
>> sticking to the idea of the timestamp-based one :) But I should drop it if the
>> majority on the list prefers the LSN-based one even if it has such limitations.
>
> We need numbers on how effective each level of tracking will be.  Until
> then, the patch can't move forward.

The point is that this is a stepping stone toward what will ultimately
be a better solution.  You can use timestamps today if (a) whole-file
granularity is good enough for you and (b) you trust your system clock
to never go backwards.  In fact, if you use pg_start_backup() and
pg_stop_backup(), you don't even need a server patch; you can just go
right ahead and implement whatever you like.  A server patch would be
needed to make pg_basebackup do a file-time-based incremental backup,
but I'm not excited about that because I think the approach is a
dead-end.
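What Robert says is already possible today around pg_start_backup()/pg_stop_backup() amounts to selecting files by modification time. A minimal sketch (hypothetical helper, not from the patch) of that selection, including its weakness:

```python
import os

def files_changed_since(data_dir, last_backup_start):
    """Timestamp-based incremental selection: pick every file whose
    mtime is at or after the start time of the previous backup.
    The caveat discussed in the thread applies: if the system clock
    ever moves backwards, modified files can be silently skipped."""
    changed = []
    for dir_path, _, file_names in os.walk(data_dir):
        for name in file_names:
            path = os.path.join(dir_path, name)
            if os.stat(path).st_mtime >= last_backup_start:
                changed.append(os.path.relpath(path, data_dir))
    return sorted(changed)
```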

If you want block-level granularity, and you should, an approach based
on file times is never going to get you there.  An approach based on
LSNs can.  If the first version of the patch requires reading the
whole database, fine, it's not going to perform all that terribly
well.  But we can optimize that later by keeping track of which blocks
have been modified since a given LSN.  If we do that, we can get
better reliability than the timestamp approach can ever offer, plus
excellent transfer and storage characteristics.

What I'm unhappy with about this patch is that it insists on sending
the whole file if a single block in that file has changed.  That is
lame.  To get something useful out of this, we should be looking to
send only those blocks whose LSNs have actually changed.  That would
reduce I/O (in the worst case, the current patch reads each file in its
entirety twice) and transfer bandwidth as compared to the proposed
patch.  We'd still have to read the whole database so it might very
well do more I/O than the file-timestamp approach, but it would beat
the file-timestamp approach on transfer bandwidth and on the amount of
storage required to store the incremental.  In many workloads, I
expect those savings would be quite significant.  If we then went back
in a later release and implemented one of the various proposals to
avoid needing to read every block, we'd then have a very robust and
complete solution.
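The block-level idea Robert sketches relies on the fact that each page starts with pd_lsn, the LSN of the last WAL record that touched it. A minimal, hypothetical scanner (native byte order and the default 8kB block size assumed; not code from the patch):

```python
import struct

BLCKSZ = 8192  # PostgreSQL default block size

def changed_blocks(relation_file, since_lsn):
    """Yield the block numbers whose page LSN is newer than since_lsn.

    The first 8 bytes of every page hold pd_lsn (two 32-bit halves in
    the server's native byte order), so a block-level incremental only
    needs to ship blocks whose pd_lsn exceeds the start LSN of the
    previous backup."""
    with open(relation_file, 'rb') as f:
        blkno = 0
        while True:
            page = f.read(BLCKSZ)
            if len(page) < BLCKSZ:
                break
            xlogid, xrecoff = struct.unpack_from('=II', page, 0)
            page_lsn = (xlogid << 32) | xrecoff
            if page_lsn > since_lsn:
                yield blkno
            blkno += 1
```

Note this still reads every block, which is exactly the limitation the proposals for tracking modified blocks aim to remove in a later release.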

But I agree with Fujii to the extent that I see little value in
committing this patch in the form proposed.  Being smart enough to use
the LSN to identify changed blocks, but then sending the entirety of
every file anyway because you don't want to go to the trouble of
figuring out how to revise the wire protocol to identify the
individual blocks being sent and write the tools to reconstruct a full
backup based on that data, does not seem like enough of a win. As
Fujii says, if we ship this patch as written, people will just keep
using the timestamp-based approach anyway.  Let's wait until we have
something that is, at least in some circumstances, a material
improvement over the status quo before committing anything.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [HACKERS] File based Incremental backup v8

2015-03-04 Thread Bruce Momjian
On Thu, Mar  5, 2015 at 01:25:13PM +0900, Fujii Masao wrote:
> >> Yeah, it might make the situation better than today. But I'm afraid that
> >> many users might get disappointed about that behavior of an incremental
> >> backup after the release...
> >
> > I don't get what you mean here. Can you elaborate on this point?
> 
> The proposed version of LSN-based incremental backup has some limitations
> (e.g., every database file needs to be read even when there has been no
> modification in the database since the last backup, which may make the backup
> time longer than users expect) which may disappoint users. So I'm afraid that
> users who can benefit from the feature might be very limited. IOW, I'm just
> sticking to the idea of the timestamp-based one :) But I should drop it if the
> majority on the list prefers the LSN-based one even if it has such limitations.

We need numbers on how effective each level of tracking will be.  Until
then, the patch can't move forward.

-- 
  Bruce Momjian  http://momjian.us
  EnterpriseDB http://enterprisedb.com

  + Everyone has their own god. +




Re: [HACKERS] File based Incremental backup v8

2015-03-04 Thread Fujii Masao
On Thu, Mar 5, 2015 at 1:59 AM, Marco Nenciarini
 wrote:
> Hi Fujii,
>
> On 03/03/15 11:48, Fujii Masao wrote:
>> On Tue, Mar 3, 2015 at 12:36 AM, Marco Nenciarini
>>  wrote:
>>> On 02/03/15 14:21, Fujii Masao wrote:
 On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
  wrote:
> Hi,
>
> I've attached an updated version of the patch.

 basebackup.c:1565: warning: format '%lld' expects type 'long long
 int', but argument 8 has type '__off_t'
 basebackup.c:1565: warning: format '%lld' expects type 'long long
 int', but argument 8 has type '__off_t'
 pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code

>>>
>>> I'll add an explicit cast to those two lines.
>>>
 When I applied the three patches and compiled the code, I got the above
 warnings.

 How can we get a full backup that we can use for archive recovery from
 the first full backup and subsequent incremental backups? What commands
 should we use for that, for example? It's better to document that.

>>>
>>> I've sent a Python PoC that supports the plain format only (not the tar
>>> one). I'm currently rewriting it in C (with tar support as well) and I'll
>>> send a new patch containing it ASAP.
>>
>> Yeah, if a special tool is required for that purpose, the patch should
>> include it.
>>
>
> I'm working on it. The interface will be exactly the same as that of the PoC
> script I've attached to
>
> 54c7cdad.6060...@2ndquadrant.it
>
 What does "1" of the heading line in backup_profile mean?

>>>
>>> Nothing. It's a version number. If you think it's misleading I will remove 
>>> it.
>>
>> A version number for the file format of the backup profile? If it's required
>> for validating the backup profile file as a safeguard, it should be included
>> in the profile file. For example, it might be useful to check whether the
>> pg_basebackup executable is compatible with the "source" backup that
>> you specify. But more info might be needed for such validation.
>>
>
> The current implementation bails out with an error if the header line is
> different from what it expects.
> It also reports an error if the 2nd line is not the start WAL location.
> That's all that pg_basebackup needs to start a new incremental backup. All
> the other information is useful to reconstruct a full backup in case of an
> incremental backup, or maybe to check the completeness of an archived full
> backup.
> Initially the profile was present only in incremental backups, but after some
> discussion on the list we agreed to always write it.

Don't we need more checks about the compatibility of the backup-target database
cluster and the source incremental backup? Without such checks, I'm afraid
we can easily get corrupted incremental backups. For example, shouldn't
pg_basebackup emit an error if the target and source have different system IDs,
like walreceiver does? What happens if the timeline ID is different between the
source and target? What happens if the source was taken from the standby but
the new incremental backup will be taken from the master? Do we need to check them?

 Sorry if this has already been discussed. Why is a backup profile file
 necessary? Maybe it's necessary in the future, but currently it seems not.
>>>
>>> It's necessary because it's the only way to detect deleted files.
>>
>> Maybe I'm missing something. Seems we can detect that even without a profile.
>> For example, please imagine the case where the file has been deleted since
>> the last full backup and then the incremental backup is taken. In this case,
>> that deleted file exists only in the full backup. We can detect the deletion 
>> of
>> the file by checking both full and incremental backups.
>>
>
> When you take an incremental backup, only changed files are sent. Without the 
> backup_profile in the incremental backup, you cannot detect a deleted file, 
> because it's indistinguishable from a file that is not changed.

Yeah, you're right!

 We've really gotten the consensus about the current design, especially that
 every file basically needs to be read to check whether it has been modified
 since the last backup, even when *no* modification has happened since the last backup?
>>>
>>> The real problem here is that there is currently no way to detect that a 
>>> file is not changed since the last backup. We agreed to not use file system 
>>> timestamps as they are not reliable for that purpose.
>>
>> TBH I prefer the timestamp-based approach in the first version of incremental 
>> backup
>> even if it's less reliable than the LSN-based one. I think that some users who 
>> are using timestamp-based rsync (i.e., default mode) for the backup would be
>> satisfied with the timestamp-based one.
>
> The original design was to compare size+timestamp+checksums (only if 
> everything else matches and the file has been modified after the start of the 
> backup), but the feedback fr

Re: [HACKERS] File based Incremental backup v8

2015-03-04 Thread Marco Nenciarini
Hi Fujii,

Il 03/03/15 11:48, Fujii Masao ha scritto:
> On Tue, Mar 3, 2015 at 12:36 AM, Marco Nenciarini
>  wrote:
>> Il 02/03/15 14:21, Fujii Masao ha scritto:
>>> On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
>>>  wrote:
 Hi,

 I've attached an updated version of the patch.
>>>
>>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>>> int', but argument 8 has type '__off_t'
>>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>>> int', but argument 8 has type '__off_t'
>>> pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code
>>>
>>
>> I'll add an explicit cast at those two lines.
>>
>>> When I applied three patches and compiled the code, I got the above 
>>> warnings.
>>>
>>> How can we get the full backup that we can use for the archive recovery, 
>>> from
>>> the first full backup and subsequent incremental backups? What commands 
>>> should
>>> we use for that, for example? It's better to document that.
>>>
>>
>> I've sent a Python PoC that supports the plain format only (not the tar one).
>> I'm currently rewriting it in C (with tar support as well) and I'll send a 
>> new patch containing it ASAP.
> 
> Yeah, if a special tool is required for that purpose, the patch should include 
> it.
> 

I'm working on it. The interface will be exactly the same as that of the PoC 
script I've attached to 

54c7cdad.6060...@2ndquadrant.it

>>> What does the "1" in the heading line of backup_profile mean?
>>>
>>
>> Nothing. It's a version number. If you think it's misleading I will remove 
>> it.
> 
> A version number of the file format of the backup profile? If it's required for
> validation of the backup profile file as a safe-guard, it should be included
> in the profile file. For example, it might be useful to check whether the
> pg_basebackup executable is compatible with the "source" backup that
> you specify. But more info might be needed for such validation.
> 

The current implementation bails out with an error if the header line is 
different from what it expects.
It also reports an error if the 2nd line is not the start WAL location. That's 
all that pg_basebackup needs to start a new incremental backup. All the other 
information is useful to reconstruct a full backup in case of an incremental 
backup, or maybe to check the completeness of an archived full backup.
Initially the profile was present only in incremental backups, but after some 
discussion on list we agreed to always write it.

>>> Sorry if this has been already discussed so far. Why is a backup profile 
>>> file
>>> necessary? Maybe it's necessary in the future, but currently seems not.
>>
>> It's necessary because it's the only way to detect deleted files.
> 
> Maybe I'm missing something. Seems we can detect that even without a profile.
> For example, please imagine the case where the file has been deleted since
> the last full backup and then the incremental backup is taken. In this case,
> that deleted file exists only in the full backup. We can detect the deletion 
> of
> the file by checking both full and incremental backups.
> 

When you take an incremental backup, only changed files are sent. Without the 
backup_profile in the incremental backup, you cannot detect a deleted file, 
because it is indistinguishable from a file that has not changed.
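The point can be sketched as a set difference: the profile lists every file that existed at backup time (whether or not it was sent), so anything present in the base backup but absent from the incremental's profile must have been deleted (illustrative helper, not the patch's actual code):

```python
def files_deleted_since_base(base_files, incremental_profile_files):
    """Files present in the base backup but no longer listed in the
    incremental backup's profile: these were deleted and must be dropped
    when a full backup is reconstructed from base + incremental."""
    return sorted(set(base_files) - set(incremental_profile_files))
```

Without the profile, an unsent file and a deleted file look identical to the restore tool.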

>>> We've really gotten the consensus about the current design, especially that
>>> every file basically needs to be read to check whether it has been modified
>>> since the last backup, even when *no* modification has happened since the last backup?
>>
>> The real problem here is that there is currently no way to detect that a 
>> file is not changed since the last backup. We agreed to not use file system 
>> timestamps as they are not reliable for that purpose.
> 
> TBH I prefer the timestamp-based approach in the first version of incremental 
> backup
> even if it's less reliable than the LSN-based one. I think that some users who 
> are using timestamp-based rsync (i.e., default mode) for the backup would be
> satisfied with the timestamp-based one.

The original design was to compare size+timestamp+checksums (only if everything 
else matches and the file has been modified after the start of the backup), but 
the feedback from the list was that we cannot trust the filesystem mtime and we 
must use LSN instead.

> 
>> Using LSNs has a significant advantage over using checksums, as we can start 
>> the full copy as soon as we find a block with an LSN greater than the 
>> threshold.
>> There are two cases: 1) the file is changed, so we can assume that we detect 
>> it after reading 50% of the file, then we send it taking advantage of the file 
>> system cache; 2) the file is not changed, so we read it without sending 
>> anything.
>> It will end up producing an I/O comparable to a normal backup.
> 
> Yeah, it might make the situation better than today. But I'm afraid that
> many users might get disappointed about that behavior of an incremental
> backup after the rel

Re: [HACKERS] File based Incremental backup v8

2015-03-03 Thread Fujii Masao
On Tue, Mar 3, 2015 at 12:36 AM, Marco Nenciarini
 wrote:
> Il 02/03/15 14:21, Fujii Masao ha scritto:
>> On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
>>  wrote:
>>> Hi,
>>>
>>> I've attached an updated version of the patch.
>>
>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>> int', but argument 8 has type '__off_t'
>> basebackup.c:1565: warning: format '%lld' expects type 'long long
>> int', but argument 8 has type '__off_t'
>> pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code
>>
>
> I'll add an explicit cast at those two lines.
>
>> When I applied three patches and compiled the code, I got the above warnings.
>>
>> How can we get the full backup that we can use for the archive recovery, from
>> the first full backup and subsequent incremental backups? What commands 
>> should
>> we use for that, for example? It's better to document that.
>>
>
> I've sent a Python PoC that supports the plain format only (not the tar one).
> I'm currently rewriting it in C (with tar support as well) and I'll send a 
> new patch containing it ASAP.

Yeah, if a special tool is required for that purpose, the patch should include it.

>> What does the "1" in the heading line of backup_profile mean?
>>
>
> Nothing. It's a version number. If you think it's misleading I will remove it.

A version number of the file format of the backup profile? If it's required for
validation of the backup profile file as a safe-guard, it should be included
in the profile file. For example, it might be useful to check whether the
pg_basebackup executable is compatible with the "source" backup that
you specify. But more info might be needed for such validation.

>> Sorry if this has been already discussed so far. Why is a backup profile file
>> necessary? Maybe it's necessary in the future, but currently seems not.
>
> It's necessary because it's the only way to detect deleted files.

Maybe I'm missing something. Seems we can detect that even without a profile.
For example, please imagine the case where the file has been deleted since
the last full backup and then the incremental backup is taken. In this case,
that deleted file exists only in the full backup. We can detect the deletion of
the file by checking both full and incremental backups.

>> We've really gotten the consensus about the current design, especially that
>> every file basically needs to be read to check whether it has been modified
>> since the last backup, even when *no* modification has happened since the last backup?
>
> The real problem here is that there is currently no way to detect that a file 
> is not changed since the last backup. We agreed to not use file system 
> timestamps as they are not reliable for that purpose.

TBH I prefer the timestamp-based approach in the first version of incremental backup
even if it's less reliable than the LSN-based one. I think that some users who are
using timestamp-based rsync (i.e., default mode) for the backup would be
satisfied with the timestamp-based one.

> Using LSNs has a significant advantage over using checksums, as we can start 
> the full copy as soon as we find a block with an LSN greater than the 
> threshold.
> There are two cases: 1) the file is changed, so we can assume that we detect 
> it after reading 50% of the file, then we send it taking advantage of the file 
> system cache; 2) the file is not changed, so we read it without sending 
> anything.
> It will end up producing an I/O comparable to a normal backup.

Yeah, it might make the situation better than today. But I'm afraid that
many users might get disappointed about that behavior of an incremental
backup after the release...

Regards,

-- 
Fujii Masao


-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers


Re: [HACKERS] File based Incremental backup v8

2015-03-02 Thread Marco Nenciarini
Il 02/03/15 14:21, Fujii Masao ha scritto:
> On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
>  wrote:
>> Hi,
>>
>> I've attached an updated version of the patch.
> 
> basebackup.c:1565: warning: format '%lld' expects type 'long long
> int', but argument 8 has type '__off_t'
> basebackup.c:1565: warning: format '%lld' expects type 'long long
> int', but argument 8 has type '__off_t'
> pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code
> 

I'll add an explicit cast at those two lines.

> When I applied three patches and compiled the code, I got the above warnings.
> 
> How can we get the full backup that we can use for the archive recovery, from
> the first full backup and subsequent incremental backups? What commands should
> we use for that, for example? It's better to document that.
> 

I've sent a Python PoC that supports the plain format only (not the tar one).
I'm currently rewriting it in C (with tar support as well) and I'll send a new 
patch containing it ASAP. 

> What does the "1" in the heading line of backup_profile mean?
> 

Nothing. It's a version number. If you think it's misleading I will remove it.

> Sorry if this has been already discussed so far. Why is a backup profile file
> necessary? Maybe it's necessary in the future, but currently seems not.

It's necessary because it's the only way to detect deleted files.

> Several pieces of info like LSN, modification time, size, etc. are tracked in a
> backup profile file for every backed-up file, but they are not used for now. If
> it's not required now, I'm inclined to remove it to simplify the code.

I've put the LSN there mainly for debugging purposes, but it can also be used to 
check the file during pg_restorebackup execution. The sent field is probably 
redundant (if sent = False and the LSN is not set, we should probably simply avoid 
writing a line about that file) and I'll remove it in the next patch.

> 
> We've really gotten the consensus about the current design, especially that
> every file basically needs to be read to check whether it has been modified
> since the last backup, even when *no* modification has happened since the last backup?

The real problem here is that there is currently no way to detect that a file 
is not changed since the last backup. We agreed to not use file system 
timestamps as they are not reliable for that purpose.
Using LSNs has a significant advantage over using checksums, as we can start the 
full copy as soon as we find a block with an LSN greater than the threshold.
There are two cases: 1) the file is changed, so we can assume that we detect it 
after reading 50% of the file, then we send it taking advantage of the file system 
cache; 2) the file is not changed, so we read it without sending anything.
It will end up producing an I/O comparable to a normal backup.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it





Re: [HACKERS] File based Incremental backup v8

2015-03-02 Thread Fujii Masao
On Thu, Feb 12, 2015 at 10:50 PM, Marco Nenciarini
 wrote:
> Hi,
>
> I've attached an updated version of the patch.

basebackup.c:1565: warning: format '%lld' expects type 'long long
int', but argument 8 has type '__off_t'
basebackup.c:1565: warning: format '%lld' expects type 'long long
int', but argument 8 has type '__off_t'
pg_basebackup.c:865: warning: ISO C90 forbids mixed declarations and code

When I applied three patches and compiled the code, I got the above warnings.

How can we get the full backup that we can use for the archive recovery, from
the first full backup and subsequent incremental backups? What commands should
we use for that, for example? It's better to document that.

What does the "1" in the heading line of backup_profile mean?

Sorry if this has been already discussed so far. Why is a backup profile file
necessary? Maybe it's necessary in the future, but currently seems not.
Several pieces of info like LSN, modification time, size, etc. are tracked in a backup
profile file for every backed-up file, but they are not used for now. If it's not
required now, I'm inclined to remove it to simplify the code.

We've really gotten the consensus about the current design, especially that
every files basically need to be read to check whether they have been modified
since last backup even when *no* modification happens since last backup?

Regards,

-- 
Fujii Masao




Re: [HACKERS] File based Incremental backup v8

2015-02-12 Thread Marco Nenciarini
Hi,

I've attached an updated version of the patch. This fixes the issue with
checksum calculation for segments after the first one.

To solve it I've added an optional uint32 *segno argument to
parse_filename_for_nontemp_relation, so I can know the segment number
and calculate the block number correctly.
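For reference, the arithmetic this enables (assuming default build options: 8 kB pages and 1 GB segments, i.e. RELSEG_SIZE = 131072 blocks per segment):

```python
RELSEG_SIZE = 131072  # blocks per segment: 1 GB / 8 kB (default build options)

def absolute_block_number(segno, block_in_segment):
    """Map a (segment number, in-segment block) pair to the relation-wide
    block number, so checksums for segment N don't restart from block 0."""
    return segno * RELSEG_SIZE + block_in_segment
```

Without the segno argument, every segment's blocks would be numbered from zero and the per-block checksum verification would fail for any segment after the first.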

Il 29/01/15 18:57, Robert Haas ha scritto:
> On Thu, Jan 29, 2015 at 9:47 AM, Marco Nenciarini
>  wrote:
>> The current implementation of the copydir function is incompatible with
>> LSN-based incremental backups. The problem is that new files are created,
>> but their blocks still carry the old LSN, so they will not be backed
>> up because they look old enough.
> 
> I think this is trying to pollute what's supposed to be a pure
> fs-level operation ("copy a directory") into something that is aware
> of specific details like the PostgreSQL page format.  I really think
> that nothing in storage/file should know about the page format.  If we
> need a function that copies a file while replacing the LSNs, I think
> it should be a new function living somewhere else.
> 

I've named it copydir_set_lsn and placed it as a static function in
dbcommands.c. This leaves the copydir and copy_file functions in copydir.c
untouched. The copydir function in copydir.c is now unused, while the copy_file
function is still used during unlogged table reinitialization.

> A bigger problem is that you are proposing to stamp those files with
> LSNs that are, for lack of a better word, fake.  I would expect that
> this would completely break if checksums are enabled.  Also, unlogged
> relations typically have an LSN of 0; this would change that in some
> cases, and I don't know whether that's OK.
>

I've investigated a bit and I have not been able to find any problem here.

> The issues here are similar to those in
> http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
> - basically, I think we need to make CREATE DATABASE and ALTER
> DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
> never going to work right.  If we're not going to allow that, we need
> to disallow hot backups while those operations are in progress.
>

As already said, the copydir-LSN patch should be treated as "temporary"
until proper WAL logging of CREATE DATABASE and ALTER DATABASE SET
TABLESPACE is implemented. At that time we could probably get rid
of the whole copydir.[ch] file, moving the copy_file function into reinit.c.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it


diff --git a/src/backend/storage/file/reinit.c 
b/src/backend/storage/file/reinit.c
index afd9255..02b5fee 100644
*** a/src/backend/storage/file/reinit.c
--- b/src/backend/storage/file/reinit.c
*** static void ResetUnloggedRelationsInTabl
*** 28,35 
  int 
op);
  static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
   int op);
- static bool parse_filename_for_nontemp_relation(const char *name,
-   int 
*oidchars, ForkNumber *fork);
  
  typedef struct
  {
--- 28,33 
*** ResetUnloggedRelationsInDbspaceDir(const
*** 388,446 
fsync_fname((char *) dbspacedirname, true);
}
  }
- 
- /*
-  * Basic parsing of putative relation filenames.
-  *
-  * This function returns true if the file appears to be in the correct format
-  * for a non-temporary relation and false otherwise.
-  *
-  * NB: If this function returns true, the caller is entitled to assume that
-  * *oidchars has been set to the a value no more than OIDCHARS, and thus
-  * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
-  * portion of the filename.  This is critical to protect against a possible
-  * buffer overrun.
-  */
- static bool
- parse_filename_for_nontemp_relation(const char *name, int *oidchars,
-   
ForkNumber *fork)
- {
-   int pos;
- 
-   /* Look for a non-empty string of digits (that isn't too long). */
-   for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
-   ;
-   if (pos == 0 || pos > OIDCHARS)
-   return false;
-   *oidchars = pos;
- 
-   /* Check for a fork name. */
-   if (name[pos] != '_')
-   *fork = MAIN_FORKNUM;
-   else
-   {
-   int forkchar;
- 
-   forkchar = forkname_chars(&name[pos + 1], fork);
-   if (forkchar <= 0)
-   return false;
-   pos += forkchar + 1;
-   }
- 
-   /* Check for a segment number. */
-   if (name[pos] == '.')
-   {
-   int segchar;
- 
- 

Re: [HACKERS] File based Incremental backup v8

2015-01-31 Thread Erik Rijkers
On Sat, January 31, 2015 15:14, Marco Nenciarini wrote:

> 0001-public-parse_filename_for_nontemp_relation.patch
> 0002-copydir-LSN-v2.patch
> 0003-File-based-incremental-backup-v8.patch

Hi,

It looks like it only compiles with assert enabled.

This is perhaps not yet really a problem at this stage but I thought I'd 
mention it:

make --quiet -j 8
In file included from gram.y:14403:0:
scan.c: In function ‘yy_try_NUL_trans’:
scan.c:10174:23: warning: unused variable ‘yyg’ [-Wunused-variable]
 struct yyguts_t * yyg = (struct yyguts_t*)yyscanner; /* This var may be 
unused depending upon options. */
   ^
basebackup.c: In function ‘writeBackupProfileLine’:
basebackup.c:1545:8: warning: format ‘%lld’ expects argument of type ‘long long 
int’, but argument 8 has type ‘__off_t’
[-Wformat=]
filename);
^
basebackup.c:1545:8: warning: format ‘%lld’ expects argument of type ‘long long 
int’, but argument 8 has type ‘__off_t’
[-Wformat=]
pg_basebackup.c: In function ‘ReceiveTarFile’:
pg_basebackup.c:858:2: warning: implicit declaration of function ‘assert’ 
[-Wimplicit-function-declaration]
  assert(res || (strcmp(basedir, "-") == 0));
  ^
pg_basebackup.c:865:2: warning: ISO C90 forbids mixed declarations and code 
[-Wdeclaration-after-statement]
  gzFile  ztarfile = NULL;
  ^
pg_basebackup.o: In function `ReceiveAndUnpackTarFile':
pg_basebackup.c:(.text+0x690): undefined reference to `assert'
pg_basebackup.o: In function `ReceiveTarFile':
pg_basebackup.c:(.text+0xeb0): undefined reference to `assert'
pg_basebackup.c:(.text+0x10ad): undefined reference to `assert'
collect2: error: ld returned 1 exit status
make[3]: *** [pg_basebackup] Error 1
make[3]: *** Waiting for unfinished jobs
make[2]: *** [all-pg_basebackup-recurse] Error 2
make[2]: *** Waiting for unfinished jobs
make[1]: *** [all-bin-recurse] Error 2
make: *** [all-src-recurse] Error 2

The configure used was:
./configure \
 --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup \
 
--bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup/bin.fast
 \
 
--libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup/lib.fast
 \
 --with-pgport=6973 --quiet --enable-depend \
 --with-extra-version=_incremental_backup_20150131_1521_08bd0c581158 \
 --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib


A build with --enable-cassert and --enable-debug builds fine:

./configure \
 --prefix=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup \
 --bindir=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup/bin 
\
 --libdir=/home/aardvark/pg_stuff/pg_installations/pgsql.incremental_backup/lib 
\
 --with-pgport=6973 --quiet --enable-depend \
 --with-extra-version=_incremental_backup_20150131_1628_08bd0c581158 \
 --enable-cassert --enable-debug \
 --with-openssl --with-perl --with-libxml --with-libxslt --with-zlib

I will further test with that.


thanks,

Erik Rijkers






Re: [HACKERS] File based Incremental backup v8

2015-01-31 Thread Marco Nenciarini
Il 29/01/15 18:57, Robert Haas ha scritto:
> On Thu, Jan 29, 2015 at 9:47 AM, Marco Nenciarini
>  wrote:
>> The current implementation of the copydir function is incompatible with
>> LSN-based incremental backups. The problem is that new files are created,
>> but their blocks still carry the old LSN, so they will not be backed
>> up because they look old enough.
> 
> I think this is trying to pollute what's supposed to be a pure
> fs-level operation ("copy a directory") into something that is aware
> of specific details like the PostgreSQL page format.  I really think
> that nothing in storage/file should know about the page format.  If we
> need a function that copies a file while replacing the LSNs, I think
> it should be a new function living somewhere else.

Given that the copydir function is used only during CREATE DATABASE and
ALTER DATABASE SET TABLESPACE, we could move/rename it to a better
place that clearly marks it as "knowing about the page format". I'm open to
suggestions on where to place it and on what the correct name should be.

However, the whole copydir patch here should be treated as a "temporary"
thing. It is necessary until proper WAL logging of CREATE DATABASE and
ALTER DATABASE SET TABLESPACE is implemented to support any form of
LSN-based incremental backup.

> 
> A bigger problem is that you are proposing to stamp those files with
> LSNs that are, for lack of a better word, fake.  I would expect that
> this would completely break if checksums are enabled.

I'm sorry, I completely ignored checksums in the previous patch. The attached
one works with checksums enabled.

> Also, unlogged relations typically have an LSN of 0; this would
> change that in some cases, and I don't know whether that's OK.
>

It shouldn't be a problem because all the code that uses unlogged
relations normally skips all the WAL-related operations. From the point
of view of an incremental backup it is also not a problem, because when
restoring the backup the unlogged tables will be reinitialized by the
crash recovery procedure. However, if you think it is worth the
effort, I can rewrite copydir as a two-pass operation, detecting the
unlogged tables on the first pass and avoiding the LSN update on
them. I personally think it isn't worth the effort
unless someone identifies a real path where setting LSNs on unlogged
relations leads to an issue.

> The issues here are similar to those in
> http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
> - basically, I think we need to make CREATE DATABASE and ALTER
> DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
> never going to work right.  If we're not going to allow that, we need
> to disallow hot backups while those operations are in progress.
> 

This is right, but the problem Andres reported is orthogonal to the
one I'm addressing here. Without this copydir patch (or without proper
WAL logging of copydir operations), you cannot take an incremental
backup after a CREATE DATABASE or ALTER DATABASE SET TABLESPACE until
you take a full backup and use it as a base.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
From 3e451077283de8e99c4eceb748d49c34329c6ef8 Mon Sep 17 00:00:00 2001
From: Marco Nenciarini 
Date: Thu, 29 Jan 2015 12:18:47 +0100
Subject: [PATCH 1/3] public parse_filename_for_nontemp_relation

---
 src/backend/storage/file/reinit.c | 58 ---
 src/common/relpath.c  | 56 +
 src/include/common/relpath.h  |  2 ++
 3 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/src/backend/storage/file/reinit.c 
b/src/backend/storage/file/reinit.c
index afd9255..02b5fee 100644
*** a/src/backend/storage/file/reinit.c
--- b/src/backend/storage/file/reinit.c
*** static void ResetUnloggedRelationsInTabl
*** 28,35 
  int 
op);
  static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
   int op);
- static bool parse_filename_for_nontemp_relation(const char *name,
-   int 
*oidchars, ForkNumber *fork);
  
  typedef struct
  {
--- 28,33 
*** ResetUnloggedRelationsInDbspaceDir(const
*** 388,446 
fsync_fname((char *) dbspacedirname, true);
}
  }
- 
- /*
-  * Basic parsing of putative relation filenames.
-  *
-  * This function returns true if the file appears to be in the correct format
-  * for a non-temporary relation and false otherwise.
-  *
-  * NB: If this function returns true, the caller is entitled to assume that
-  * *oidchars has been set to the a value no more than OIDCHARS, and thus
-  * that a buffer of OIDCHARS+1 characters 

Re: [HACKERS] File based Incremental backup v8

2015-01-29 Thread Andres Freund
On 2015-01-29 12:57:22 -0500, Robert Haas wrote:
> The issues here are similar to those in
> http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
> - basically, I think we need to make CREATE DATABASE and ALTER
> DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
> never going to work right.  If we're not going to allow that, we need
> to disallow hot backups while those operations are in progress.

Yea, the current way is just a hack from the dark ages. Which has some
advantages, true, but I don't think they outweigh the disadvantages. I
hope to find time to develop a patch to make those properly WAL-logged
(for master) sometime not too far away.

Greetings,

Andres Freund

-- 
 Andres Freund http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services




Re: [HACKERS] File based Incremental backup v8

2015-01-29 Thread Robert Haas
On Thu, Jan 29, 2015 at 9:47 AM, Marco Nenciarini
 wrote:
> The current implementation of the copydir function is incompatible with
> LSN-based incremental backups. The problem is that new files are created,
> but their blocks still carry the old LSN, so they will not be backed
> up because they look old enough.

I think this is trying to pollute what's supposed to be a pure
fs-level operation ("copy a directory") into something that is aware
of specific details like the PostgreSQL page format.  I really think
that nothing in storage/file should know about the page format.  If we
need a function that copies a file while replacing the LSNs, I think
it should be a new function living somewhere else.

A bigger problem is that you are proposing to stamp those files with
LSNs that are, for lack of a better word, fake.  I would expect that
this would completely break if checksums are enabled.  Also, unlogged
relations typically have an LSN of 0; this would change that in some
cases, and I don't know whether that's OK.

The issues here are similar to those in
http://www.postgresql.org/message-id/20150120152819.gc24...@alap3.anarazel.de
- basically, I think we need to make CREATE DATABASE and ALTER
DATABASE .. SET TABLESPACE fully WAL-logged operations, or this is
never going to work right.  If we're not going to allow that, we need
to disallow hot backups while those operations are in progress.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




[HACKERS] File based Incremental backup v8

2015-01-29 Thread Marco Nenciarini
The current implementation of the copydir function is incompatible with
LSN-based incremental backups. The problem is that new files are created,
but their blocks still carry the old LSN, so they will not be backed
up because they look old enough.

copydir function is used in:

  CREATE DATABASE
  ALTER DATABASE SET TABLESPACE

I can imagine two possible solutions:

a) wal log the whole copydir operations, setting the lsn accordingly
b) pass to copydir the LSN of the operation which triggered it, and
update the LSN of all the copied blocks

The latter solution is IMO easier to implement and does not deviate
much from the current implementation.

I've implemented it and it's attached to this message.

I've also moved the parse_filename_for_nontemp_relation function out of
reinit.c to make it available both to copydir.c and basebackup.c.

I've also limited the LSN comparison to the MAIN fork only, because:

* the FSM fork doesn't use LSNs
* the VM fork updates the LSN only when the visibility bit is set
* the INIT fork doesn't use LSNs; it's only one page anyway.

Regards,
Marco

-- 
Marco Nenciarini - 2ndQuadrant Italy
PostgreSQL Training, Services and Support
marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
From 087faed899b9afab324aff7fa20e715c4f99eb4a Mon Sep 17 00:00:00 2001
From: Marco Nenciarini 
Date: Thu, 29 Jan 2015 12:18:47 +0100
Subject: [PATCH 1/3] public parse_filename_for_nontemp_relation

---
 src/backend/storage/file/reinit.c | 58 ---
 src/common/relpath.c  | 56 +
 src/include/common/relpath.h  |  2 ++
 3 files changed, 58 insertions(+), 58 deletions(-)

diff --git a/src/backend/storage/file/reinit.c 
b/src/backend/storage/file/reinit.c
index afd9255..02b5fee 100644
*** a/src/backend/storage/file/reinit.c
--- b/src/backend/storage/file/reinit.c
*** static void ResetUnloggedRelationsInTabl
*** 28,35 
  int 
op);
  static void ResetUnloggedRelationsInDbspaceDir(const char *dbspacedirname,
   int op);
- static bool parse_filename_for_nontemp_relation(const char *name,
-   int 
*oidchars, ForkNumber *fork);
  
  typedef struct
  {
--- 28,33 
*** ResetUnloggedRelationsInDbspaceDir(const
*** 388,446 
fsync_fname((char *) dbspacedirname, true);
}
  }
- 
- /*
-  * Basic parsing of putative relation filenames.
-  *
-  * This function returns true if the file appears to be in the correct format
-  * for a non-temporary relation and false otherwise.
-  *
-  * NB: If this function returns true, the caller is entitled to assume that
-  * *oidchars has been set to the a value no more than OIDCHARS, and thus
-  * that a buffer of OIDCHARS+1 characters is sufficient to hold the OID
-  * portion of the filename.  This is critical to protect against a possible
-  * buffer overrun.
-  */
- static bool
- parse_filename_for_nontemp_relation(const char *name, int *oidchars,
-   
ForkNumber *fork)
- {
-   int pos;
- 
-   /* Look for a non-empty string of digits (that isn't too long). */
-   for (pos = 0; isdigit((unsigned char) name[pos]); ++pos)
-   ;
-   if (pos == 0 || pos > OIDCHARS)
-   return false;
-   *oidchars = pos;
- 
-   /* Check for a fork name. */
-   if (name[pos] != '_')
-   *fork = MAIN_FORKNUM;
-   else
-   {
-   int forkchar;
- 
-   forkchar = forkname_chars(&name[pos + 1], fork);
-   if (forkchar <= 0)
-   return false;
-   pos += forkchar + 1;
-   }
- 
-   /* Check for a segment number. */
-   if (name[pos] == '.')
-   {
-   int segchar;
- 
-   for (segchar = 1; isdigit((unsigned char) name[pos + segchar]); 
++segchar)
-   ;
-   if (segchar <= 1)
-   return false;
-   pos += segchar;
-   }
- 
-   /* Now we should be at the end. */
-   if (name[pos] != '\0')
-   return false;
-   return true;
- }
--- 386,388 
diff --git a/src/common/relpath.c b/src/common/relpath.c
index 66dfef1..83a1e3a 100644
*** a/src/common/relpath.c
--- b/src/common/relpath.c
*** GetRelationPath(Oid dbNode, Oid spcNode,
*** 206,208 
--- 206,264 
}
return path;
  }
+ 
+ /*
+  * Basic parsing of putative relation filenames.
+  *
+  * This function returns true if the file appears to be in the correct format
+  * for a non-temporary relation and false otherwise.
+  *
+  * NB: If this function returns true, the caller is entitled to assume that
+  * *o