On Mon, Oct 7, 2019 at 6:05 PM Robert Haas <robertmh...@gmail.com> wrote:
> On Mon, Oct 7, 2019 at 8:48 AM Asif Rehman <asifr.reh...@gmail.com> wrote:
> > Sure. Though the backup manifest patch calculates and includes the
> > checksum of backup files and is done
> > while the file is being transferred to the frontend-end. The manifest
> > file itself is copied at the
> > very end of the backup. In parallel backup, I need the list of filenames
> > before file contents are transferred, in
> > order to divide them into multiple workers. For that, the manifest file
> > has to be available when START_BACKUP
> > is called.
> >
> > That means, backup manifest should support its creation while excluding
> > the checksum during START_BACKUP().
> > I also need the directory information as well for two reasons:
> >
> > - In plain format, base path has to exist before we can write the file.
> > we can extract the base path from the file
> > but doing that for all files does not seem a good idea.
> > - base backup does not include the content of some directories but those
> > directories although empty, are still
> > expected in PGDATA.
> >
> > I can make these changes part of parallel backup (which would be on top
> > of backup manifest patch) or
> > these changes can be done as part of manifest patch and then parallel
> > can use them.
> >
> > Robert what do you suggest?
>
> I think we should probably not use backup manifests here, actually. I
> initially thought that would be a good idea, but after further thought
> it seems like it just complicates the code to no real benefit.

Okay.

> I suggest that the START_BACKUP command just return a result set, like a
> query, with perhaps four columns: file name, file type ('d' for
> directory or 'f' for file), file size, file mtime. pg_basebackup will
> ignore the mtime, but some other tools might find that useful
> information.

Yes, the current patch already returns a result set. I will add the
additional information.
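
For illustration only, here is a minimal sketch of how a client might
consume such a four-column result set and divide the files among parallel
workers. It is not taken from the patch: the FileEntry type, the
split_among_workers helper, and the greedy size-based balancing are all
hypothetical, and only the column layout (name, type, size, mtime) comes
from the proposal above. Directory entries are returned separately so they
can be created before any file is written.

```python
import heapq
from typing import List, NamedTuple, Tuple


class FileEntry(NamedTuple):
    """One row of the proposed result set (layout assumed from the thread)."""
    name: str   # path relative to PGDATA
    ftype: str  # 'd' for directory, 'f' for file
    size: int   # size in bytes
    mtime: int  # modification time; ignored by this sketch


def split_among_workers(
    entries: List[FileEntry], n_workers: int
) -> Tuple[List[str], List[List[str]]]:
    """Return (directories, per-worker file lists).

    Hypothetical helper: files are assigned largest-first to whichever
    worker currently has the fewest bytes queued.
    """
    if n_workers < 1:
        raise ValueError("n_workers must be at least 1")

    # Directories (including empty ones) must exist before files are written.
    dirs = [e.name for e in entries if e.ftype == "d"]

    files = sorted(
        (e for e in entries if e.ftype == "f"),
        key=lambda e: e.size,
        reverse=True,
    )

    # Min-heap of (bytes assigned so far, worker index).
    heap = [(0, w) for w in range(n_workers)]
    heapq.heapify(heap)
    buckets: List[List[str]] = [[] for _ in range(n_workers)]

    for f in files:
        load, w = heapq.heappop(heap)
        buckets[w].append(f.name)
        heapq.heappush(heap, (load + f.size, w))

    return dirs, buckets
```

Each worker would then request the files in its own list one at a time.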

> I wonder if we should also split START_BACKUP (which should enter
> non-exclusive backup mode) from GET_FILE_LIST, in case some other
> client program wants to use one of those but not the other. I think
> that's probably a good idea, but not sure.

Currently pg_basebackup does not enter exclusive backup mode, and other
tools have to use the pg_start_backup() and pg_stop_backup() functions to
achieve that. Since we are breaking the backup into multiple commands, I
believe it would be a good idea to have this option. I will include it in
the next revision of this patch.

> I still think that the files should be requested one at a time, not a
> huge long list in a single command.

Sure, I will make the change.

--
Asif Rehman
Highgo Software (Canada/China/Pakistan)
URL : www.highgo.ca