Hi Folks!
The permissions on the RECOVERYXLOG file at the time of the error are 0400:
-r--------. 1 postgres postgres 16777216 May 31 09:51 RECOVERYXLOG
I sent that info to Tom earlier this afternoon (still learning the posting
protocols - sorry) - his response is below:
From Tom:
Ah, that confirms the diagnosis --- the new fsync code is assuming that all
files it might need to fsync are writable by the postgres user.
What would be useful to know is how it got to be that way rather than
0600 (-rw-------). Seems like it must be a property of your archive restore
script, but we don't have the details. Maybe you could show the script?
Also, I don't recall whether you said exactly what platform you're on, but
that's probably important here as well.
regards, tom lane
OPERATING SYSTEM INFORMATION:
The database server is running on a Redhat Linux host (Red Hat Enterprise Linux
Server release 6.8 (Santiago)) -
[postgres-pt(at)postXX pg_xlog]$ uname -a
Linux 2.6.32-573.22.1.el6.x86_64 #1 SMP Thu Mar 17 03:23:39 EDT 2016 x86_64
x86_64 x86_64 GNU/Linux
The user account that creates the backup files is the same account that is used
for performing the restores - and at no time are any elevated privileges used.
The script we use for restoring our databases is not overly complex - a quick
summary is below:
Save config files
Remove tablespace directories
Remove cluster directory
untar tablespace backup files
untar cluster backup file
copy back the config files
At this point in the script the pg_xlog directory is cleaned out - there are no
files in the pg_xlog directory when the restore starts. The database server is
then started.
Below are the contents of the pg_xlog directory from the unsuccessful restore
attempts:
[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status
./archive_status:
total 0
[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
pg_ctl: another server might be running; trying to start server anyway
server starting
2016-05-31 14:59:18 EDT 574ddf06.7d10 [1-1] user=,db=,remote= LOG: loaded
library "pg_stat_statements"
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status
./archive_status:
total 0
Even if the permissions on the RECOVERYXLOG file are change (0777 in this case)
and the file is left in the pg_xlog directory, the postgres process will
recreate the file as 0400:
[postgres@postZZ postXX]$ chmod 777 RECOVERYXLOG
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-rwxrwxrwx 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status
./archive_status:
total 0
[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 14:59:46 EDT 574ddf21.7d25 [1-1] user=,db=,remote= LOG: loaded
library "pg_stat_statements"
[postgres@postZZ postXX]$ ls -lR
.:
total 16388
-r-------- 1 postgres postgres 16777216 May 31 14:59 RECOVERYXLOG
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status
./archive_status:
total 0
Interestingly enough, in the 9.2.15 version (successful restore), the file is
also created 0400:
[postgres@postZZ postXX]$ pwd
/pgdata/pgsrv_xlog_data/postXX
[postgres@postZZ postXX]$ ls -lR
.:
total 4
drwx------ 2 postgres postgres 4096 May 31 14:58 archive_status
./archive_status:
total 0
[postgres@postZZ postXX]$ pg_ctl -D /pgdata/pgsrv_cluster_data/postXX_92 start
server starting
2016-05-31 15:36:27 EDT 574de7bb.7eff [1-1] user=,db=,remote= LOG: loaded
library "pg_stat_statements"
[postgres@postZZ postXX]$ ls -lR
.:
total 65544
-r-------- 1 postgres postgres 16777216 May 31 14:46 000000010000000500000022
-rw------- 1 postgres postgres 56 May 31 14:46 00000002.history
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000022
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000023
-rw------- 1 postgres postgres 16777216 May 31 14:46 000000020000000500000024
drwx------ 2 postgres postgres 4096 May 31 14:46 archive_status
./archive_status:
total 0
-rw------- 1 postgres postgres 0 May 31 14:46 000000010000000500000022.done
-rw------- 1 postgres postgres 0 May 31 14:46 00000002.history.done
-rw------- 1 postgres postgres 0 May 31 14:46 000000020000000500000023.done
Some additional info. We are in the initial stages of upgrading from 9.2 to
9.5. I upgraded one of our database servers to 9.5.1, then backed up the
upgraded database with our same backup process. Using both the 9.5.2
executables (where the fsync/rename patch was introduced) and the 9.5.3
executables I was able to successfully restore the 9.5.1 backup (same host/user
account which produces an error under 9.2.16).
Below are various strace outputs that were captured performing the database
restores. The exact same source files are used in both of the 9.2.X restores.
I have 'grepped' out the rows for the RECOVERYXLOG file. Going through the
output, the open modes are different (0444 versus 0600) between 9.2 and 9.5
9.2
open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7
9.5
open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
9.2.15 backup/9.2.15 restore - successful
29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90b0) = -1 ENOENT (No such file or
directory)
29263 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29263 stat("pg_xlog/RECOVERYXLOG", 0x7ffd8697d490) = -1 ENOENT (No such file or
directory)
29263 stat("pg_xlog/RECOVERYXLOG", 0x7ffd8697d2b0) = -1 ENOENT (No such file or
directory)
29263 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7
29262 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216,
...}) = 0
29262 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/000000010000000500000022"
<unfinished ...>
29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90e0) = -1 ENOENT (No such file or
directory)
29266 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29266 stat("pg_xlog/RECOVERYXLOG", 0x7fff9ad125e0) = -1 ENOENT (No such file or
directory)
29262 stat("pg_xlog/RECOVERYXLOG", 0x7ffd91fc90e0) = -1 ENOENT (No such file or
directory)
29267 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
29267 stat("pg_xlog/RECOVERYXLOG", 0x7fffca6281b0) = -1 ENOENT (No such file or
directory)
29267 stat("pg_xlog/RECOVERYXLOG", 0x7fffca627fd0) = -1 ENOENT (No such file or
directory)
29267 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 13
29262 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216,
...}) = 0
29262 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/000000010000000500000022") = 0
29262 unlink("pg_xlog/RECOVERYXLOG") = -1 ENOENT (No such file or directory)
9.2.15 backup/9.2.16 restore - NOT successful
30153 stat("pg_xlog/RECOVERYXLOG", 0x7fff732488c0) = -1 ENOENT (No such file or
directory)
30154 execve("/bin/cp", ["cp", "/pgdata/postXX_walarchive/00"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
30154 stat("pg_xlog/RECOVERYXLOG", 0x7ffd3c7a96b0) = -1 ENOENT (No such file or
directory)
30154 stat("pg_xlog/RECOVERYXLOG", 0x7ffd3c7a94d0) = -1 ENOENT (No such file or
directory)
30154 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0444) = 7
30153 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0400, st_size=16777216,
...}) = 0
30153 open("pg_xlog/RECOVERYXLOG", O_RDWR) = -1 EACCES (Permission denied)
9.5.1 backup/9.5.2 restore - successful
24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e60) = -1 ENOENT (No such file or
directory)
24619 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24619 stat("pg_xlog/RECOVERYXLOG", 0x7fffb2383ea0) = -1 ENOENT (No such file or
directory)
24619 stat("pg_xlog/RECOVERYXLOG", 0x7fffb2383cc0) = -1 ENOENT (No such file or
directory)
24619 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 4
24615 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0600, st_size=16777216,
...}) = 0
24615 open("pg_xlog/RECOVERYXLOG", O_RDWR) = 3
24615 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/00000001000000490000009A") = 0
24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e80) = -1 ENOENT (No such file or
directory)
24622 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24622 stat("pg_xlog/RECOVERYXLOG", 0x7fffb1f62350) = -1 ENOENT (No such file or
directory)
24615 stat("pg_xlog/RECOVERYXLOG", 0x7fff3e2f8e80) = -1 ENOENT (No such file or
directory)
24623 execve("/bin/cp", ["cp", "/pgdata/postXX_95_orig_wal/0"...,
"pg_xlog/RECOVERYXLOG"], [/* 60 vars */]) = 0
24623 stat("pg_xlog/RECOVERYXLOG", 0x7ffd4a373c40) = -1 ENOENT (No such file or
directory)
24623 stat("pg_xlog/RECOVERYXLOG", 0x7ffd4a373a60) = -1 ENOENT (No such file or
directory)
24623 open("pg_xlog/RECOVERYXLOG", O_WRONLY|O_CREAT|O_EXCL, 0600) = 7
24615 stat("pg_xlog/RECOVERYXLOG", {st_mode=S_IFREG|0600, st_size=16777216,
...}) = 0
24615 open("pg_xlog/RECOVERYXLOG", O_RDWR) = 3
24615 rename("pg_xlog/RECOVERYXLOG", "pg_xlog/00000001000000490000009A") = 0
24615 unlink("pg_xlog/RECOVERYXLOG") = -1 ENOENT (No such file or directory)
Jeff Marshall
U.S. International Trade Commission
-----Original Message-----
From: Andres Freund [mailto:[email protected]]
Sent: Monday, May 30, 2016 5:14 PM
To: Michael Paquier
Cc: Tom Lane; Marshall, Jeffrey; PostgreSQL mailing lists
Subject: Re: [HACKERS] [GENERAL] Permission Denied Error on
pg_xlog/RECOVERYXLOG file
On 2016-05-30 12:51:17 +0900, Michael Paquier wrote:
> On Sat, May 28, 2016 at 9:57 AM, Andres Freund <[email protected]> wrote:
> > On 2016-05-27 20:54:43 -0400, Tom Lane wrote:
> >> Andres Freund <[email protected]> writes:
> >> > On 2016-05-26 12:44:51 -0400, Tom Lane wrote:
> >> > 2016-04-27 17:02:06 EDT 572128cd.1811 [7-1] user=,db=,remote= FATAL:
> >> > 42501:
> >> > could not open file "pg_xlog/RECOVERYXLOG": Permission denied
> >>
> >> > So, what's the permission of RECOVERYXLOG at that point? It's
> >> > pretty weird that directly after running reason_command it's not
> >> > readable.
> >>
> >> s/not readable/not writable/. I doubt that it's a good idea for
> >> that code to think that it can fail hard on non-writable files.
> >
> > But we actually sometimes write to files we've recovered; if they're
> > the end of the WAL after archive recovery and/or promotion. If a
> > restore_command restores files in a non-writable way it's buggy; I
> > don't see why it's worthwhile to work around that.
>
> Not exactly, startup process does not write directly to the files of
> pg_xlog while in recovery.
It does at the end of crash recovery. And the wal receiver does so at the end
of archive recovery (which can repeatedly be reached).
- Andres
--
Sent via pgsql-hackers mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers