Re: [HACKERS] Timeline ID in backup_label file
On Sat, Jan 06, 2018 at 09:56:03AM +0900, Michael Paquier wrote: > Previous patch was incorrect, this was the same v2 as upthread. > Attached is the correct v3. For the archive's sake, this version of the patch has been pushed as 6271fce. Thanks Simon for the commit, and David for the review! -- Michael signature.asc Description: PGP signature
Re: [HACKERS] Timeline ID in backup_label file
On Fri, Jan 5, 2018 at 11:13 PM, Simon Riggswrote: > On 27 November 2017 at 14:06, David Steele wrote: > >> I have tested and get an error as expected when I munge the backup_label >> file: >> >> FATAL: invalid data in file "backup_label" >> DETAIL: Timeline ID parsed is 2, but expected 1 >> >> Everything else looks good so I will mark it ready for committer. > > Sounds good. Thanks for the feedback. > No tests? Do you think that's worth the cycles as a TAP test? This can be done by generating a dummy backup_label file and then start a standby to see the failure, but this looks quite costly for minimum gain. This code path is also taken tested in src/test/recovery/t/001_stream_rep.pl for example, though that's not the failure path. So if we add a DEBUG1 line after fetching correctly the timeline ID this could help by looking at tmp_check/log/001_stream_rep_standby_1.log, but this needs to set "log_min_messages = debug1" PostgresNode.pm for example so as this shows up in the logs (this could be useful if done by default actually, DEBUG1 is not too talkative). Attached is an example of patch doing so. See for the addition in PostgresNode.pm and this new bit: + ereport(DEBUG1, + (errmsg("backup timeline %u in file \"%s\"", + tli_from_file, BACKUP_LABEL_FILE))); > No docs/extended explanatory comments? There is no existing documentation about the format of the backup_file in doc/. So it seems to me that it is not the problem of this patch to fill in the hole. Regarding the comments, perhaps something better could be done, but I am not quite sure what. We don't much explain what the current fields mean, except that one can guess what they mean from their names, which is the intention behind the code. -- Michael backup_label_tli_v2.patch Description: Binary data
Re: [HACKERS] Timeline ID in backup_label file
On 27 November 2017 at 14:06, David Steelewrote: > I have tested and get an error as expected when I munge the backup_label > file: > > FATAL: invalid data in file "backup_label" > DETAIL: Timeline ID parsed is 2, but expected 1 > > Everything else looks good so I will mark it ready for committer. Sounds good. No tests? No docs/extended explanatory comments? -- Simon Riggshttp://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
Re: [HACKERS] Timeline ID in backup_label file
On Tue, Nov 28, 2017 at 9:40 AM, David Steelewrote: > Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be > happy to review it, though I doubt there is a lot to be gained even if it > would be a better implementation. OK. I'll keep that in mind. Thanks for the input. -- Michael
Re: [HACKERS] Timeline ID in backup_label file
On 11/27/17 7:11 PM, Michael Paquier wrote: On Mon, Nov 27, 2017 at 11:06 PM, David Steelewrote: On 11/15/17 10:09 PM, Michael Paquier wrote: read_backup_label() is a static function in the backend code. With #2 I do not imply to change the order of the elements written in the backup_label file, just to make the way they are parsed smarter. My point is that the order *could* be changed and it wouldn't be noticed if the read function were improved as you propose. I'm not against the idea, just think we should continue to enforce the order unless we decide an interface break is OK. I still don't quite understand here. The order the items are read does not cause a backward-incompatible change. True that there is no reason to change that either now. Perhaps I'm the one who is misunderstanding. If you propose a patch I'll be happy to review it, though I doubt there is a lot to be gained even if it would be a better implementation. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] Timeline ID in backup_label file
On Mon, Nov 27, 2017 at 11:06 PM, David Steelewrote: > On 11/15/17 10:09 PM, Michael Paquier wrote: >> On Thu, Nov 16, 2017 at 9:20 AM, David Steele wrote: >>> For this patch at least, I think we should do #1. Getting rid of the order >>> dependency is attractive but there may be other programs that are depending >>> on the order. I know you are not proposing to change the order now, but it >>> *could* be changed with #2. >> >> read_backup_label() is a static function in the backend code. With #2 >> I do not imply to change the order of the elements written in the >> backup_label file, just to make the way they are parsed smarter. > > My point is that the order *could* be changed and it wouldn't be noticed > if the read function were improved as you propose. > > I'm not against the idea, just think we should continue to enforce the > order unless we decide an interface break is OK. I still don't quite understand here. The order the items are read does not cause a backward-incompatible change. True that there is no reason to change that either now. >>> Also, I think DEBUG1 would be a more appropriate log level if any logging is >>> done. >> >> OK, the label and time strings can include spaces. The label string is >> assumed to not be bigger than 1028 bytes, and the timestamp is assumed >> that it can get up to 128. So it is possible to use stuff like >> fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the >> backend. If a backup_label is generated with strings longer than that, >> then read_backup_label would fail to parse the next entries but that's >> not really something to worry about IMO as those are only minor sanity >> checks. Having a safer coding in the backend is more important, and >> the new checks should not trigger any hard failures. > > I have tested and get an error as expected when I munge the backup_label > file: > > FATAL: invalid data in file "backup_label" > DETAIL: Timeline ID parsed is 2, but expected 1 > > Everything else looks good so I will mark it ready for committer. Thanks. This maps what I saw. -- Michael
Re: [HACKERS] Timeline ID in backup_label file
Hi Michael, On 11/15/17 10:09 PM, Michael Paquier wrote: > On Thu, Nov 16, 2017 at 9:20 AM, David Steelewrote: >> For this patch at least, I think we should do #1. Getting rid of the order >> dependency is attractive but there may be other programs that are depending >> on the order. I know you are not proposing to change the order now, but it >> *could* be changed with #2. > > read_backup_label() is a static function in the backend code. With #2 > I do not imply to change the order of the elements written in the > backup_label file, just to make the way they are parsed smarter. My point is that the order *could* be changed and it wouldn't be noticed if the read function were improved as you propose. I'm not against the idea, just think we should continue to enforce the order unless we decide an interface break is OK. >> Also, I think DEBUG1 would be a more appropriate log level if any logging is >> done. > > OK, the label and time strings can include spaces. The label string is > assumed to not be bigger than 1028 bytes, and the timestamp is assumed > that it can get up to 128. So it is possible to use stuff like > fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the > backend. If a backup_label is generated with strings longer than that, > then read_backup_label would fail to parse the next entries but that's > not really something to worry about IMO as those are only minor sanity > checks. Having a safer coding in the backend is more important, and > the new checks should not trigger any hard failures. I have tested and get an error as expected when I munge the backup_label file: FATAL: invalid data in file "backup_label" DETAIL: Timeline ID parsed is 2, but expected 1 Everything else looks good so I will mark it ready for committer. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] Timeline ID in backup_label file
On Thu, Nov 16, 2017 at 9:20 AM, David Steelewrote: > For this patch at least, I think we should do #1. Getting rid of the order > dependency is attractive but there may be other programs that are depending > on the order. I know you are not proposing to change the order now, but it > *could* be changed with #2. read_backup_label() is a static function in the backend code. With #2 I do not imply to change the order of the elements written in the backup_label file, just to make the way they are parsed smarter. > Also, I think DEBUG1 would be a more appropriate log level if any logging is > done. OK, the label and time strings can include spaces. The label string is assumed to not be bigger than 1028 bytes, and the timestamp is assumed that it can get up to 128. So it is possible to use stuff like fscanf("%127[^\n]\n") to avoid any buffer overflow problems on the backend. If a backup_label is generated with strings longer than that, then read_backup_label would fail to parse the next entries but that's not really something to worry about IMO as those are only minor sanity checks. Having a safer coding in the backend is more important, and the new checks should not trigger any hard failures. -- Michael backup_label_tli_v2.patch Description: Binary data
Re: [HACKERS] Timeline ID in backup_label file
On 11/15/17 6:01 PM, Michael Paquier wrote: On Wed, Nov 15, 2017 at 11:16 PM, David Steelewrote: Find my review below. On 10/26/17 2:03 PM, Michael Paquier wrote: Thanks for the feedback. Attached is a patch to achieve so, I have added as well a STOP TIMELINE field in the backup history file. Note that START TIMELINE gets automatically into the backup history file. Added a CF entry as well. + TimeLineID tli1, tli2; I'm not that excited about these names but don't have any better ideas. One is tli_from_segname and the second is tli_from_file. Would something like that sound better? Yes, those names are better. + if (fscanf(lfp, "START TIMELINE: %u\n", ) == 1) This didn't work when I tested it (I had intentionally munged the "START TIMELINE" to produce an error). The problem is the "START TIME" and "LABEL" lines which are not being read. I added a few fgets() calls and it worked. Currently backup label files are generated as such: START WAL LOCATION: 0/228 (file 00010002) CHECKPOINT LOCATION: 0/260 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2017-11-16 07:52:50 JST LABEL: popo START TIMELINE: 1 There could be two things that could be done here: 1) Keep read_backup_label order-dependent and look at the START TIME and LABEL fields, then kick in ereport(LOG) with the information. This makes the fields in the backup_label still order-dependent. 2) Make read_backup_label smarter by parsing it line-by-line and fill in the data wanted. This way the parsing logic and sanity checks are split into two, and Postgres is smarter at looking at backup_label files generated by any backup tool. Which one do you prefer? Getting input from backup tool maintainers is important here. 2) is more extensible if more fields are added to the backup_label for a reason or another in the future. For this patch at least, I think we should do #1. Getting rid of the order dependency is attractive but there may be other programs that are depending on the order. I know you are not proposing to change the order now, but it *could* be changed with #2. Also, I think DEBUG1 would be a more appropriate log level if any logging is done. Regards, -- -David da...@pgmasters.net
Re: [HACKERS] Timeline ID in backup_label file
On Wed, Nov 15, 2017 at 11:16 PM, David Steelewrote: > Find my review below. > > On 10/26/17 2:03 PM, Michael Paquier wrote: >> >> Thanks for the feedback. Attached is a patch to achieve so, I have >> added as well a STOP TIMELINE field in the backup history file. Note >> that START TIMELINE gets automatically into the backup history file. >> Added a CF entry as well. > + TimeLineID tli1, tli2; > > I'm not that excited about these names but don't have any better ideas. One is tli_from_segname and the second is tli_from_file. Would something like that sound better? > + if (fscanf(lfp, "START TIMELINE: %u\n", ) == 1) > > This didn't work when I tested it (I had intentionally munged the "START > TIMELINE" to produce an error). The problem is the "START TIME" and > "LABEL" lines which are not being read. I added a few fgets() calls and > it worked. Currently backup label files are generated as such: START WAL LOCATION: 0/228 (file 00010002) CHECKPOINT LOCATION: 0/260 BACKUP METHOD: streamed BACKUP FROM: master START TIME: 2017-11-16 07:52:50 JST LABEL: popo START TIMELINE: 1 There could be two things that could be done here: 1) Keep read_backup_label order-dependent and look at the START TIME and LABEL fields, then kick in ereport(LOG) with the information. This makes the fields in the backup_label still order-dependent. 2) Make read_backup_label smarter by parsing it line-by-line and fill in the data wanted. This way the parsing logic and sanity checks are split into two, and Postgres is smarter at looking at backup_label files generated by any backup tool. Which one do you prefer? Getting input from backup tool maintainers is important here. 2) is more extensible if more fields are added to the backup_label for a reason or another in the future. -- Michael
Re: [HACKERS] Timeline ID in backup_label file
Hi Michael, Find my review below. On 10/26/17 2:03 PM, Michael Paquier wrote: > > Thanks for the feedback. Attached is a patch to achieve so, I have > added as well a STOP TIMELINE field in the backup history file. Note > that START TIMELINE gets automatically into the backup history file. > Added a CF entry as well. + TimeLineID tli1, tli2; I'm not that excited about these names but don't have any better ideas. + if (fscanf(lfp, "START TIMELINE: %u\n", ) == 1) This didn't work when I tested it (I had intentionally munged the "START TIMELINE" to produce an error). The problem is the "START TIME" and "LABEL" lines which are not being read. I added a few fgets() calls and it worked. Thanks! -- -David da...@pgmasters.net