On 06/23/2015 07:51 AM, Michael Paquier wrote:
So... Attached are a set of patches dedicated at fixing this issue:

Thanks for working on this!

- 0001, add if_not_exists to pg_tablespace_location, returning NULL if
path does not exist
- 0002, same with pg_stat_file, returning NULL if file does not exist
- 0003, same with pg_read_*file. I added them to all the existing
functions for consistency.
- 0004, pg_ls_dir extended with if_not_exists and include_dot_dirs
(thanks Robert for the naming!)
- 0005, as things get complex, a set of regression tests aimed to
covering those things. pg_tablespace_location is platform-dependent,
so there are no tests for it.
- 0006, the fix for pg_rewind, using what has been implemented before.

With thes patches, pg_read_file() will return NULL for any failure to open the file, which makes pg_rewind to assume that the file doesn't exist in the source server, and will remove the file from the destination. That's dangerous, those functions should check specifically for ENOENT.

There's still a small race condition with tablespaces. If you run CREATE TABLESPACE in the source server while pg_rewind is running, it's possible that the recursive query that pg_rewind uses sees the symlink in pg_tblspc/ directory, but its snapshot doesn't see the row in pg_tablespace yet. It will think that the symlink is a regular file, try to read it, and fail (if we checked for ENOENT).

Actually, I think we need try to deal with symlinks a bit harder. Currently, pg_rewind assumes that anything in pg_tblspace that has a matching row in pg_tablespace is a symlink, and nothing else is. I think symlinks to directories. I just noticed that pg_rewind fails miserable if pg_xlog is a symlink, because of that:

----
The servers diverged at WAL position 0/3023F08 on timeline 1.
Rewinding from last common checkpoint at 0/2000060 on timeline 1

"data-master//pg_xlog" is not a directory
Failure, exiting
----

I think we need to add a column to pg_stat_file output, to indicate symbolic links, and add a pg_readlink() function. That still leaves a race condition if the type of a file changes, i.e. a file is deleted and a directory with the same name is created in its place, but that seems acceptable. I don't think PostgreSQL ever does such a thing, so that could only happen if you mess with the data directory manually while the server is running.

I just realized another problem: We recently learned the hard way that some people have files in the data directory that are not writeable by the 'postgres' user (http://www.postgresql.org/message-id/20150523172627.ga24...@msg.df7cb.de). pg_rewind will try to overwrite all files it doesn't recognize as relation files, so it's going to fail on those. A straightforward fix would be to first open the destination file in read-only mode, and compare its contents, and only open the file in write mode if it has changed. It would still fail when the files really differ, but I think that's acceptable.

I note that pg_rewind doesn't need to distinguish between an empty and a non-existent directory, so it's quite silly for it to pass include_dot_dirs=true, and then filter out "." and ".." from the result set. The documentation should mention the main reason for including "." and "..": to distinguish between an empty and non-existent directory.

- Heikki



--
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers

Reply via email to