I previously reported[1] that CreateReplicationSlot() was accessing the global variable ThisTimeLineID when it might not be initialized. Today I realized that the code in basebackup.c has the same problem. perform_base_backup() blithely uses the variable six times without doing anything at all to initialize it. In practice, it's always going to be initialized to some non-zero value, because pg_basebackup is always going to execute IDENTIFY_SYSTEM before it executes BASE_BACKUP, and that's going to set ThisTimeLineID as a side effect. But, if you hack pg_basebackup to not call IDENTIFY_SYSTEM before calling BASE_BACKUP, then you can reach this code with ThisTimeLineID == 0 using pg_basebackup -Xfetch. Even if you don't do that, you're only guaranteed that ThisTimeLineID is initialized to something, not that it's initialized to the correct thing. The timeline on the standby can change any time.
Fortunately, this has no serious consequences AFAICT. Heikki wrote "I'd rather not worry about timelines here" and wrote the algorithm in such a way that the only thing that happens if you feed a wrong TLI through the logic is that the wrong TLI might be used to construct a WAL file name for use in an error report. You won't get an error when things should have worked, or the other way around, but if you get a complaint about a segment file, it might include the wrong TLI in the filename. I think possibly this logic should be rewritten to be fully timeline-aware, but that's a bit of a project and this isn't really broken, so what I'd like to do for right now is change it to use non-bogus TLIs that we have available in local variables rather than a possibly-nonsensical value from a global variable. Patch for that attached. -- Robert Haas EDB: http://www.enterprisedb.com [1] http://postgr.es/m/ca+tgmozc2aye9erxcua_c_xbbmvlps9lzl_k5oo7m9xhco2...@mail.gmail.com
0001-When-fetching-WAL-for-a-basebackup-report-errors-wit.patch
Description: Binary data