Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 03/10/14 23:12, Andres Freund wrote: On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote: I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of the discussion on hackers. Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it). Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relfilenode file detection and LSN extraction code.

Can you describe the algorithm you implemented in words?

Here is the relfilenode file detection algorithm: I've added a has_relfiles parameter to the sendDir function. If has_relfiles is true, every file in the directory is tested with the validateRelfilenodeName function. If the result is true, the maxLSN value is computed for the file. The sendDir function is called with has_relfiles=true by the sendTablespace function, and by sendDir itself when it recurses into a subdirectory:

* if has_relfiles is already true
* if it is recursing into a ./global or ./base directory

The validateRelfilenodeName function has been taken from the pg_computemaxlsn patch.
It's short enough to be pasted here:

static bool
validateRelfilenodename(char *name)
{
	int			pos = 0;

	while ((name[pos] >= '0') && (name[pos] <= '9'))
		pos++;

	if (name[pos] == '_')
	{
		pos++;
		while ((name[pos] >= 'a') && (name[pos] <= 'z'))
			pos++;
	}

	if (name[pos] == '.')
	{
		pos++;
		while ((name[pos] >= '0') && (name[pos] <= '9'))
			pos++;
	}

	if (name[pos] == 0)
		return true;
	return false;
}

To compute the maxLSN for a file, given that the file is sent in TAR_SEND_SIZE chunks (32kB) which are always a multiple of the block size, I've added the following code inside the send loop:

+		char	   *page;
+
+		/* Scan every page to find the max file LSN */
+		for (page = buf; page < buf + (off_t) cnt; page += (off_t) BLCKSZ)
+		{
+			pagelsn = PageGetLSN(page);
+			if (filemaxlsn < pagelsn)
+				filemaxlsn = pagelsn;
+		}
+

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it signature.asc Description: OpenPGP digital signature
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 04/10/14 08:35, Michael Paquier wrote: On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. Cool. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it). Hm. I am not convinced by the backup profile file. What's wrong with having a client send only an LSN position to get a set of files (or partial files filled with blocks) newer than the position given, and have the client do all the rebuild analysis?

The main problem I see is the following: how can a client detect a truncated or removed file?

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 6, 2014 at 8:59 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 04/10/14 08:35, Michael Paquier wrote: On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. Cool. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it). Hm. I am not convinced by the backup profile file. What's wrong with having a client send only an LSN position to get a set of files (or partial files filled with blocks) newer than the position given, and have the client do all the rebuild analysis? The main problem I see is the following: how can a client detect a truncated or removed file?

When you take a differential backup, the server needs to send some piece of information about every file so that the client can compare that list against what it already has. But a full backup does not need to include similar information.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 03/10/14 22:47, Robert Haas wrote: On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 03/10/14 17:53, Heikki Linnakangas wrote: If we're going to need a profile file - and I'm not convinced of that - is there any reason to not always include it in the backup? The main reason is to have a centralized list of files that need to be present. Without a profile, you have to insert some sort of placeholder for skipped files. Why do you need to do that? And where do you need to do that?

It seems to me that there are three interesting operations:

1. Take a full backup. Basically, we already have this. In the backup label file, make sure to note the newest LSN guaranteed to be present in the backup.

Don't we already have it in START WAL LOCATION?

2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely.

To have a working backup you need to ship each block which is newer than the latest-guaranteed-to-be-present LSN of the full backup and not newer than the latest-guaranteed-to-be-present LSN of the current backup. Also, as a further optimization, you can think about not sending the empty space in the middle of each page.

My main concern here is about how postgres can remember that a relfilenode has been deleted, in order to send the appropriate deletion tag.
IMHO the easiest way is to send the full list of files along with the backup and leave to the client the task of deleting unneeded files. The backup profile has this purpose.

Moreover, I do not like the idea of using only a stream of blocks as the actual differential backup, for the following reasons:

* AFAIK, with the current infrastructure, you cannot do a backup with a block stream only. To have a valid backup you need many files for which the concept of LSN doesn't apply.
* I don't like having all the data from the various tablespace/db/whatever mixed in the same stream. I'd prefer to have the blocks saved on a per-file basis.

3. Apply a differential backup to a full backup to create an updated full backup. This is just a matter of scanning the full backup and the differential backup and applying the changes in the differential backup to the full backup.

You might want combinations of these, like something that does 2+3 as a single operation, for efficiency, or a way to copy a full backup and apply a differential backup to it as you go. But that's it, right? What else do you need?

Nothing else. Once we agree on the definitions of the involved file and protocol formats, only the actual coding remains.

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 6, 2014 at 11:33 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: 1. Take a full backup. Basically, we already have this. In the backup label file, make sure to note the newest LSN guaranteed to be present in the backup. Don't we already have it in START WAL LOCATION?

Yeah, probably. I was too lazy to go look for it, but that sounds like the right thing.

2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely. To have a working backup you need to ship each block which is newer than the latest-guaranteed-to-be-present LSN of the full backup and not newer than the latest-guaranteed-to-be-present LSN of the current backup. Also, as a further optimization, you can think about not sending the empty space in the middle of each page.

Right. Or compressing the data.

My main concern here is about how postgres can remember that a relfilenode has been deleted, in order to send the appropriate deletion tag.

You also need to handle truncation.

IMHO the easiest way is to send the full list of files along with the backup and leave to the client the task of deleting unneeded files. The backup profile has this purpose. Moreover, I do not like the idea of using only a stream of blocks as the actual differential backup, for the following reasons: * AFAIK, with the current infrastructure, you cannot do a backup with a block stream only.
To have a valid backup you need many files for which the concept of LSN doesn't apply. * I don't like having all the data from the various tablespace/db/whatever mixed in the same stream. I'd prefer to have the blocks saved on a per-file basis.

OK, that makes sense. But you still only need the file list when sending a differential backup, not when sending a full backup. So maybe a differential backup looks like this:

- Ship a table-of-contents file with a list of relation files currently present and the length of each in blocks.
- For each block that's been modified since the original backup, ship a file called delta_<original file name> which is of the form <block number><changed block contents>[...].

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 06/10/14 16:51, Robert Haas wrote: On Mon, Oct 6, 2014 at 8:59 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 04/10/14 08:35, Michael Paquier wrote: On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. Cool. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it). Hm. I am not convinced by the backup profile file. What's wrong with having a client send only an LSN position to get a set of files (or partial files filled with blocks) newer than the position given, and have the client do all the rebuild analysis? The main problem I see is the following: how can a client detect a truncated or removed file? When you take a differential backup, the server needs to send some piece of information about every file so that the client can compare that list against what it already has. But a full backup does not need to include similar information.

I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile.

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile.

That doesn't seem to be buying you much. Calling stat() on every file in a directory tree is a pretty cheap operation.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
Hello,

2014-10-06 17:51 GMT+02:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile.

I really like this approach. I think we should give users the ability to ship a profile file even in the case of a full backup (disabled by default).

Thanks, Gabriele
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 06/10/14 17:55, Robert Haas wrote: On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile. That doesn't seem to be buying you much. Calling stat() on every file in a directory tree is a pretty cheap operation.

In the case of an incremental backup this is not true. You have to read the delta file to know the final size. You can optimize it by putting this information in the first few bytes, but in the case of the compressed tar format you will need to scan the whole archive.

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 10/06/2014 06:33 PM, Marco Nenciarini wrote: On 03/10/14 22:47, Robert Haas wrote: 2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely. To have a working backup you need to ship each block which is newer than the latest-guaranteed-to-be-present LSN of the full backup and not newer than the latest-guaranteed-to-be-present LSN of the current backup. Also, as a further optimization, you can think about not sending the empty space in the middle of each page. My main concern here is about how postgres can remember that a relfilenode has been deleted, in order to send the appropriate deletion tag. IMHO the easiest way is to send the full list of files along with the backup and leave to the client the task of deleting unneeded files. The backup profile has this purpose.

Right, but the server doesn't need to send a separate backup profile file for that. Rather, anything that the server *didn't* send should be deleted. I think the missing piece in this puzzle is that even for unmodified blocks, the server should send a note saying that the blocks were present, but not modified. So for each file present on the server, the server sends a block stream. For each block, it sends either the full block contents, if it was modified, or a simple indicator that it was not modified.

There's a downside to this, though. The client has to read the whole stream before it knows which files were present.
So when applying a block stream directly over an old backup, the client cannot delete files until it has applied all the other changes. That needs more disk space. With a separate profile file that's sent *before* the rest of the backup, you could delete the obsolete files first. But that's not a very big deal. I would suggest that you leave out the profile file in the first version, and add it as an optimization later, if needed.

Moreover, I do not like the idea of using only a stream of blocks as the actual differential backup, for the following reasons: * AFAIK, with the current infrastructure, you cannot do a backup with a block stream only. To have a valid backup you need many files for which the concept of LSN doesn't apply.

Those should be sent in whole. At least in the first version. The non-relation files are small compared to relation files, so it's not too bad to just include them in full.

3. Apply a differential backup to a full backup to create an updated full backup. This is just a matter of scanning the full backup and the differential backup and applying the changes in the differential backup to the full backup. You might want combinations of these, like something that does 2+3 as a single operation, for efficiency, or a way to copy a full backup and apply a differential backup to it as you go. But that's it, right? What else do you need? Nothing else. Once we agree on the definitions of the involved file and protocol formats, only the actual coding remains.

BTW, regarding the protocol, I have an idea. Rather than invent a whole new file format to represent the modified blocks, can we reuse some existing binary diff file format? For example, the VCDIFF format (RFC 3284). For each unmodified block, the server would send a VCDIFF COPY instruction, to copy the block from the old backup, and for a modified block, the server would send an ADD instruction with the new block contents.
The VCDIFF file format is quite flexible, but we would only use a small subset of it. I believe that subset would be just as easy to generate in the backend as a custom file format, but you could then use an external tool (xdelta3, open-vcdiff) to apply the diff manually, in case of emergency. In essence, the server would send a tar stream as usual, but for each relation file, it would send a VCDIFF file named <relfilenode>.vcdiff instead.

- Heikki
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 06/10/14 17:50, Robert Haas wrote: On Mon, Oct 6, 2014 at 11:33 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: 2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely. To have a working backup you need to ship each block which is newer than the latest-guaranteed-to-be-present LSN of the full backup and not newer than the latest-guaranteed-to-be-present LSN of the current backup. Also, as a further optimization, you can think about not sending the empty space in the middle of each page. Right. Or compressing the data.

If we want to introduce compression on the server side, I think that compressing the whole tar stream would be more effective.

My main concern here is about how postgres can remember that a relfilenode has been deleted, in order to send the appropriate deletion tag. You also need to handle truncation.

Yes, of course. The current backup profile contains the file size, and it can be used to truncate the file to the right size.

IMHO the easiest way is to send the full list of files along with the backup and leave to the client the task of deleting unneeded files. The backup profile has this purpose. Moreover, I do not like the idea of using only a stream of blocks as the actual differential backup, for the following reasons: * AFAIK, with the current infrastructure, you cannot do a backup with a block stream only.
To have a valid backup you need many files for which the concept of LSN doesn't apply. * I don't like having all the data from the various tablespace/db/whatever mixed in the same stream. I'd prefer to have the blocks saved on a per-file basis. OK, that makes sense. But you still only need the file list when sending a differential backup, not when sending a full backup. So maybe a differential backup looks like this: - Ship a table-of-contents file with a list of relation files currently present and the length of each in blocks.

Having the size in bytes allows you to use the same format for non-block files. Am I missing any advantage of having the size in blocks over having the size in bytes?

Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 10/06/2014 07:06 PM, Marco Nenciarini wrote: On 06/10/14 17:55, Robert Haas wrote: On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile. That doesn't seem to be buying you much. Calling stat() on every file in a directory tree is a pretty cheap operation. In the case of an incremental backup this is not true. You have to read the delta file to know the final size. You can optimize it by putting this information in the first few bytes, but in the case of the compressed tar format you will need to scan the whole archive.

I think you're pretty much screwed with the compressed tar format anyway. The files in the .tar can be in a different order in the 'diff' and the base backup, so you need to do random access anyway when you try to apply the diff. And random access isn't very easy with the uncompressed tar format either. I think it would be acceptable to only support incremental backups with the directory format. In hindsight, our compressed tar format was not a very good choice, because it makes random access impossible.

- Heikki
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 6, 2014 at 12:06 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 06/10/14 17:55, Robert Haas wrote: On Mon, Oct 6, 2014 at 11:51 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile. That doesn't seem to be buying you much. Calling stat() on every file in a directory tree is a pretty cheap operation. In the case of an incremental backup this is not true. You have to read the delta file to know the final size. You can optimize it by putting this information in the first few bytes, but in the case of the compressed tar format you will need to scan the whole archive.

Well, sure. But I never objected to sending a profile in a differential backup. I'm just objecting to sending one in a full backup. At least not without a more compelling reason why we need it.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 6, 2014 at 12:18 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: - Ship a table-of-contents file with a list of relation files currently present and the length of each in blocks. Having the size in bytes allows you to use the same format for non-block files. Am I missing any advantage of having the size in blocks over having the size in bytes?

Size in bytes would be fine, too.

-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 10/06/2014 07:00 PM, Gabriele Bartolini wrote: Hello, 2014-10-06 17:51 GMT+02:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile. I really like this approach. I think we should give users the ability to ship a profile file even in the case of a full backup (disabled by default).

I don't see the point of making the profile optional. Why burden the user with that decision? I'm not convinced we need it at all, but if we're going to have a profile file, it should always be included.

- Heikki
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Mon, Oct 06, 2014 at 07:24:32PM +0300, Heikki Linnakangas wrote: On 10/06/2014 07:00 PM, Gabriele Bartolini wrote: Hello, 2014-10-06 17:51 GMT+02:00 Marco Nenciarini marco.nenciar...@2ndquadrant.it: I agree that a full backup does not need to include a profile. I've added the option to request the profile even for a full backup, as it can be useful for backup software. We could remove the option and build the profile only during incremental backups, if required. However, I would avoid needing to scan the whole backup to know the size of the recovered data directory, hence the backup profile. I really like this approach. I think we should give users the ability to ship a profile file even in the case of a full backup (disabled by default). I don't see the point of making the profile optional. Why burden the user with that decision? I'm not convinced we need it at all, but if we're going to have a profile file, it should always be included.

+1 for fewer user decisions, especially with something as light-weight in resource consumption as the profile.

Cheers, David. -- David Fetter da...@fetter.org http://fetter.org/ Phone: +1 415 235 3778 AIM: dfetter666 Yahoo!: dfetter Skype: davidfetter XMPP: david.fet...@gmail.com iCal: webcal://www.tripit.com/feed/ical/people/david74/tripit.ics Remember to vote! Consider donating to Postgres: http://www.postgresql.org/about/donate
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Sat, Oct 4, 2014 at 12:31 AM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN.

Cool.

This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it).

Hm. I am not convinced by the backup profile file. What's wrong with having a client send only an LSN position to get a set of files (or partial files filled with blocks) newer than the position given, and have the client do all the rebuild analysis?

Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relfilenode file detection and LSN extraction code.

Please include some documentation with the patch once you consider it worth adding to a commitfest. This is clearly WIP yet, so it does not matter much, but it's something not to forget.

Regards, -- Michael
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 10/03/2014 06:31 PM, Marco Nenciarini wrote: Hi Hackers, I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of the discussion on hackers. Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it).

I'd suggest jumping straight to block-based incremental backup. It's not significantly more complicated to implement, and if you implement both separately, then we'll have to support both forever. If you really need to, you can implement a file-level diff as a special case, where the server sends all blocks in the file if any of them have an LSN above the cutoff point. But I'm not sure if there's a point in that, once you have block-level support.

If we're going to need a profile file - and I'm not convinced of that - is there any reason to not always include it in the backup?

Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relfilenode file detection and LSN extraction code.

I didn't look at it in detail, but one future problem comes to mind: Once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point that the client gave, you'll have to scan the whole file first, to see if there are any blocks with a higher LSN. At least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice, in the worst case.

- Heikki
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 03/10/14 17:53, Heikki Linnakangas wrote: If we're going to need a profile file - and I'm not convinced of that - is there any reason not to always include it in the backup? The main reason is to have a centralized list of the files that need to be present. Without a profile, you would have to insert some sort of placeholder for skipped files. Moreover, the profile allows you to quickly know the size of the recovered backup (by simply summing the individual sizes). Another use could be to 'validate' the presence of all required files in a backup. Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relnode file detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point the client gave, you'll have to scan the whole file first to see if there are any blocks with a higher LSN - at least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice in the worst case. It's true. To solve this you would have to keep a central maxLSN directory, but I think it introduces more issues than it solves. Regards, Marco -- Marco Nenciarini - 2ndQuadrant Italy PostgreSQL Training, Services and Support marco.nenciar...@2ndquadrant.it | www.2ndQuadrant.it
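The exact backup_profile format is not quoted in this thread, so here is a hedged sketch of the "sum the individual sizes" use Marco mentions, assuming a hypothetical tab-separated line format `<maxlsn>\t<size>\t<path>`; `profile_total_size` is an invented name, not part of the patch.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Sum the size column of a hypothetical backup_profile whose lines
 * look like "<maxlsn>\t<size>\t<path>\n".  The total is the size of
 * the recovered backup, one of the profile uses discussed above. */
static uint64_t
profile_total_size(const char *profile)
{
    uint64_t total = 0;
    const char *line = profile;

    while (line != NULL && *line != '\0')
    {
        const char *tab = strchr(line, '\t');
        unsigned long long size;

        /* Skip past the LSN field, then parse the size field. */
        if (tab != NULL && sscanf(tab + 1, "%llu", &size) == 1)
            total += (uint64_t) size;

        line = strchr(line, '\n');
        if (line != NULL)
            line++;            /* advance past the newline */
    }
    return total;
}
```

The same one-pass scan over the profile could also drive the "validate that all required files are present" use: check each listed path instead of (or in addition to) accumulating its size.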
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 1:08 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relnode file detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point the client gave, you'll have to scan the whole file first to see if there are any blocks with a higher LSN - at least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice in the worst case. It's true. To solve this you would have to keep a central maxLSN directory, but I think it introduces more issues than it solves. I see that as a worthy optimization on the server side, regardless of whether file- or block-level backups are used, since it allows efficient skipping of untouched segments (common for append-only tables). Still, it's something to do after the basic mechanism already works (i.e., it's an optimization).
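To make the "central maxLSN directory" idea concrete, here is a toy in-memory sketch (all names hypothetical): the table records the highest LSN ever written to each segment, and backup can skip any segment whose recorded maximum does not exceed the client's cutoff. A real implementation would need crash-safe persistence and locking, which is presumably where the issues Marco alludes to come in.

```c
#include <stdint.h>
#include <string.h>

#define MAX_SEGMENTS 64

typedef struct SegMaxLsn
{
    char     path[256];
    uint64_t max_lsn;
} SegMaxLsn;

static SegMaxLsn seg_table[MAX_SEGMENTS];
static int seg_count = 0;

/* Record a block write: keep only the highest LSN seen per segment. */
static void
note_write(const char *segpath, uint64_t lsn)
{
    int i;

    for (i = 0; i < seg_count; i++)
        if (strcmp(seg_table[i].path, segpath) == 0)
        {
            if (lsn > seg_table[i].max_lsn)
                seg_table[i].max_lsn = lsn;
            return;
        }
    if (seg_count < MAX_SEGMENTS)
    {
        strncpy(seg_table[seg_count].path, segpath, sizeof(seg_table[0].path) - 1);
        seg_table[seg_count].max_lsn = lsn;
        seg_count++;
    }
}

/* At backup time: a segment may be skipped without scanning it if its
 * recorded maximum LSN is not newer than the cutoff.  Unknown segments
 * must still be scanned. */
static int
segment_may_be_skipped(const char *segpath, uint64_t cutoff_lsn)
{
    int i;

    for (i = 0; i < seg_count; i++)
        if (strcmp(seg_table[i].path, segpath) == 0)
            return seg_table[i].max_lsn <= cutoff_lsn;
    return 0;
}
```

For an append-only table, all but the last segment would keep a stale (old) max LSN forever, so the whole file scan is avoided - exactly the case Claudio highlights.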
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 06:08:47PM +0200, Marco Nenciarini wrote: Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relnode file detection and LSN extraction code. I didn't look at it in detail, but one future problem comes to mind: once you implement the server-side code that only sends a file if its LSN is higher than the cutoff point the client gave, you'll have to scan the whole file first to see if there are any blocks with a higher LSN - at least until you find the first such block. So with a file-level implementation of this sort, you'll have to scan all files twice in the worst case. It's true. To solve this you would have to keep a central maxLSN directory, but I think it introduces more issues than it solves. The central issue Heikki is pointing out is whether we should implement a file-based system when we already know that a block-based system will be superior in every way. I agree with that, and agree that implementing just a file-based system isn't worth it, as we would have to support it forever. So, in summary, if you target just a file-based system, be prepared that it might be rejected. -- Bruce Momjian br...@momjian.us http://momjian.us EnterpriseDB http://enterprisedb.com + Everyone has their own god. +
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On Fri, Oct 3, 2014 at 12:08 PM, Marco Nenciarini marco.nenciar...@2ndquadrant.it wrote: On 03/10/14 17:53, Heikki Linnakangas wrote: If we're going to need a profile file - and I'm not convinced of that - is there any reason not to always include it in the backup? The main reason is to have a centralized list of the files that need to be present. Without a profile, you would have to insert some sort of placeholder for skipped files. Why do you need to do that? And where do you need to do that? It seems to me that there are three interesting operations: 1. Take a full backup. Basically, we already have this. In the backup label file, make sure to note the newest LSN guaranteed to be present in the backup. 2. Take a differential backup. In the backup label file, note the LSN of the full backup to which the differential backup is relative, and the newest LSN guaranteed to be present in the differential backup. The actual backup can consist of a series of 20-byte buffer tags, those being the exact set of blocks newer than the base backup's latest-guaranteed-to-be-present LSN. Each buffer tag is followed by an 8kB block of data. If a relfilenode is truncated or removed, you need some way to indicate that in the backup; e.g. include a buffer tag with forknum = -(forknum + 1) and blocknum = the new number of blocks, or InvalidBlockNumber if removed entirely. 3. Apply a differential backup to a full backup to create an updated full backup. This is just a matter of scanning the full backup and the differential backup and applying the changes in the differential backup to the full backup. You might want combinations of these, like something that does 2+3 as a single operation, for efficiency, or a way to copy a full backup and apply a differential backup to it as you go. But that's it, right? What else do you need?
-- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
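Robert's 20-byte buffer tag plus truncation convention could look like the struct below. This is a speculative sketch of the stream format he describes, not code from any patch: forkNum >= 0 means an 8kB data block follows the entry; a negative forkNum marks a truncation entry, with -(forkNum + 1) recovering the affected fork and blockNum holding the new relation length in blocks (or InvalidBlockNumber if the relfilenode was removed entirely).

```c
#include <stdint.h>

#define InvalidBlockNumber 0xFFFFFFFFu

/* One entry header in the hypothetical differential stream.  The five
 * 4-byte fields mirror the 20-byte buffer tag: (tablespace, database,
 * relfilenode, fork, block). */
typedef struct DiffEntry
{
    uint32_t spcNode;   /* tablespace OID */
    uint32_t dbNode;    /* database OID */
    uint32_t relNode;   /* relfilenode */
    int32_t  forkNum;   /* >= 0: data block follows; < 0: truncation */
    uint32_t blockNum;  /* block number, new length, or InvalidBlockNumber */
} DiffEntry;

/* Encode a fork number as a truncation marker, per the -(forknum + 1)
 * convention suggested above. */
static int32_t
encode_truncation_fork(int32_t forknum)
{
    return -(forknum + 1);
}

/* Recover the original fork number from a truncation marker. */
static int32_t
decode_truncation_fork(int32_t stored)
{
    return -stored - 1;
}
```

The apply step (operation 3) would then read entries sequentially: for a non-negative forkNum, overwrite the named block in the full backup with the 8kB that follows; for a negative forkNum, truncate or remove the relation accordingly.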
Re: [HACKERS] [RFC] Incremental backup v2: add backup profile to base backup
On 2014-10-03 17:31:45 +0200, Marco Nenciarini wrote: I've updated the wiki page https://wiki.postgresql.org/wiki/Incremental_backup following the result of the discussion on hackers. Compared to the first version, we switched from a timestamp+checksum based approach to one based on LSN. This patch adds an option to pg_basebackup and to the replication protocol BASE_BACKUP command to generate a backup_profile file. It is almost useless by itself, but it is the foundation on which we will build the file-based incremental backup (and hopefully a block-based incremental backup after it). Any comment will be appreciated. In particular I'd appreciate comments on the correctness of the relnode file detection and LSN extraction code. Can you describe the algorithm you implemented in words? Greetings, Andres Freund -- Andres Freund http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training & Services