Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-04-25 Thread Kashif Zeeshan
Hi Nick

On Thu, Apr 25, 2024 at 12:56 PM Nick Renders  wrote:

> Hello,
>
> We have a Postgres server that intermittently logs the following:
>
> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
> 2024-02-26 10:30:11.149 CET [63975] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:35.941 CET [63986] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:41.546 CET [63991] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:44.398 CET [63994] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:31:11.149 CET [90610] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
> 2024-02-26 10:31:11.151 CET [64008] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:31:41.546 CET [64023] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:32:11.150 CET [90610] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
> 2024-02-26 10:32:11.153 CET [64035] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:32:41.547 CET [64050] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:33:11.151 CET [90610] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
> 2024-02-26 10:33:11.153 CET [64062] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:33:41.548 CET [64087] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
>
>
> This has happened 3 times over the last 2 weeks now, without any
> indication what caused it.
> The privileges of those 2 files are all in order.
> When this happens, the server is no longer accessible, and we need to
> restart the service (pg_ctl restart).
> Once restarted, Popstgres runs fine again for a couple of days.
>
Hi

This issue seems to be related to the permission issue, please check your
script if it's changing the permissions.

Regards
Kashif Zeeshan
Bitnine Global

>
> We are running PostgreSQL 16.2 on macOS 14.3.1.
>
> Any idea what might be causing this issue, or how to resolve it?
>
>
> Best regards,
>
> Nick Renders
>
>
>
>
>


Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-04-03 Thread Thomas Munro
On Sat, Mar 23, 2024 at 3:01 AM Nick Renders  wrote:
> We now have a second machine with this issue: it is an Intel Mac mini running 
> macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple instances 
> running.

BTW if you're running databases on mains-powered Macs, I have a patch
that you might be interested in, which so far hasn't attracted any
reviews.  The short version is that I bet you can at least lose many
seconds of commits (because WAL doesn't really hit durable part of
disk), and possibly also fail to recover (pg_control hits disk before
WAL, not sure if this is really possible), if you yank the power and
you're using the default settings for wal_sync_method.  I'd like to
rationalise the settings for that stuff and make it safe by default.

I don't know anything about the USB storage pathway but I'd be
surprised if it's different.

https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BF0EL4Up6yVYbbcWse4xKaqW4wc2xpw67Pq9FjmByWVg%40mail.gmail.com




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-04-03 Thread Thomas Munro
On Thu, Apr 4, 2024 at 3:11 AM Nick Renders  wrote:
> In the macOS Finder, when you show the Info (command+i) for an external drive 
> (or any partition that is not the boot drive), there is a checkbox "Ignore 
> ownership on this volume" in the Permissions section. I think it is by 
> default "on" for external drives.

Hmm.  Sounds suspicious, but why would only this file be affected?

> The external SSD is an Orico drive that is connected with USB-C. It is 
> initialised as a GUID Partition Map with a single AFPS partition.
>
> We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came 
> out last year. We didn't have any problems with 16 until recently, after 
> upgrading to Sonoma.

Interesting.  So the rename might have something to do with it, though
I don't have a theory for how,.

Can you show what the permissions and ownership looks like for pg_*
under there, normally, and once the system reaches this state?
Something like:

tmunro@phonebox postgresql % ls -slap pgdata/global/pg_*
16 -rw---  1 tmunro  staff  8192  4 Apr 09:50 pgdata/global/pg_control
 8 -rw---  1 tmunro  staff   524  4 Apr 09:50 pgdata/global/pg_filenode.map

I'm asking for "pg_*" because I want to see pg_control as well, to
understand the permissions for the other files in the cluster, and
because I want to see if there are any stray remnants of a temporary
file, which would be called pg_filenode.map.tmp.




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-04-03 Thread Nick Renders
On 29 Mar 2024, at 4:25, Thomas Munro wrote:
>
> I don't have any specific ideas and I have no idea what "ignore
> ownership" means ... what kind of filesystem is running on it?  For
> the simple SSD, is it directly connected, running a normal Apple APFS
> filesystem, or something more complicated?
>
> I wonder if this could be related to the change in 16 which started to
> rename that file:
>
> https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a
>
> Did you ever run 15 or earlier on that system?


In the macOS Finder, when you show the Info (command+i) for an external drive 
(or any partition that is not the boot drive), there is a checkbox "Ignore 
ownership on this volume" in the Permissions section. I think it is by default 
"on" for external drives.

The external SSD is an Orico drive that is connected with USB-C. It is 
initialised as a GUID Partition Map with a single AFPS partition.

We have run PostgreSQL 15 and earlier, before upgrading to 16 when it came out 
last year. We didn't have any problems with 16 until recently, after upgrading 
to Sonoma.


Nick




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-28 Thread Thomas Munro
On Fri, Mar 29, 2024 at 4:47 AM Nick Renders  wrote:
> Looking at the 2 machines that are having this issue (and the others that 
> don't), I think it is somehow related to the following setup:
> - macOS Sonoma (14.4 and 14.4.1)
> - data directory on an external drive
>
> That external drive (a Promise RAID system in one case, a simple SSD in the 
> other) has the option "ignore ownership" on by default. I have tried 
> disabling that, and updating the data directory to have owner + read/write 
> access for the postgres user. It seemed to work at first, but just now the 
> issue re-appeared again.
>
> Any other suggestions?

I don't have any specific ideas and I have no idea what "ignore
ownership" means ... what kind of filesystem is running on it?  For
the simple SSD, is it directly connected, running a normal Apple APFS
filesystem, or something more complicated?

I wonder if this could be related to the change in 16 which started to
rename that file:

https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=d8cd0c6c95c0120168df93aae095df4e0682a08a

Did you ever run 15 or earlier on that system?




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-28 Thread Nick Renders

On 22 Mar 2024, at 17:00, Alban Hertroys wrote:

On Fri, 22 Mar 2024 at 15:01, Nick Renders  
wrote:




We now have a second machine with this issue: it is an Intel Mac mini
running macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple
instances running.



I don't think that having a single Data directory prevents multiple
instances from running. That's more of a matter of how often pg_ctl 
was

called with the start command for that particular data directory.


I installed Postgres yesterday and restored a copy from our live 
database

in the Data directory.



How did you restore that copy? Was that a file-based copy perhaps? 
Your

files may have incorrect owners or permissions in that case.


The Postgres process started up without problems, but after 40 
minutes it

started throwing the same errors in the log:

2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open 
file

"global/pg_filenode.map": Operation not permitted
2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open 
file

"global/pg_filenode.map": Operation not permitted
2024-03-21 11:50:07.398 CET [965] LOG:  could not open file
"postmaster.pid": Operation not permitted; continuing anyway



It's possible that some other process put a lock on these files. 
Spotlight

perhaps? Or TimeMachine?


I stopped and started the process, and it continued working again 
until
around 21:20, when the issue popped up again. I wasn't doing anything 
on
the machine at that time, so I have no idea what might have triggered 
it.


Is there perhaps some feature that I can enable that logs which 
processes

use these 2 files?



IIRC, MacOS comes shipped with the lsof command, which will tell you 
which

processes have a given file open. See man lsof.

--
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.



I have tried the lsof command, but it returns no info about the 
postmaster.pid and global/pg_filenode.map files, so I take they are not 
open at that moment.


Spotlight indexing has been disabled, and TimeMachine takes no snapshots 
of the volume where the data resides.


Looking at the 2 machines that are having this issue (and the others 
that don't), I think it is somehow related to the following setup:

- macOS Sonoma (14.4 and 14.4.1)
- data directory on an external drive

That external drive (a Promise RAID system in one case, a simple SSD in 
the other) has the option "ignore ownership" on by default. I have tried 
disabling that, and updating the data directory to have owner + 
read/write access for the postgres user. It seemed to work at first, but 
just now the issue re-appeared again.


Any other suggestions?

Thanks,

Nick Renders

Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-22 Thread Alban Hertroys
On Fri, 22 Mar 2024 at 15:01, Nick Renders  wrote:

>
> We now have a second machine with this issue: it is an Intel Mac mini
> running macOS Sonoma (14.4) and PostgreSQL 16.2.
> This one only has a single Data directory, so there are no multiple
> instances running.
>

I don't think that having a single Data directory prevents multiple
instances from running. That's more of a matter of how often pg_ctl was
called with the start command for that particular data directory.


> I installed Postgres yesterday and restored a copy from our live database
> in the Data directory.


How did you restore that copy? Was that a file-based copy perhaps? Your
files may have incorrect owners or permissions in that case.


> The Postgres process started up without problems, but after 40 minutes it
> started throwing the same errors in the log:
>
> 2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file
> "global/pg_filenode.map": Operation not permitted
> 2024-03-21 11:50:07.398 CET [965] LOG:  could not open file
> "postmaster.pid": Operation not permitted; continuing anyway
>

It's possible that some other process put a lock on these files. Spotlight
perhaps? Or TimeMachine?


> I stopped and started the process, and it continued working again until
> around 21:20, when the issue popped up again. I wasn't doing anything on
> the machine at that time, so I have no idea what might have triggered it.
>
> Is there perhaps some feature that I can enable that logs which processes
> use these 2 files?
>

IIRC, MacOS comes shipped with the lsof command, which will tell you which
processes have a given file open. See man lsof.

-- 
If you can't see the forest for the trees,
Cut the trees and you'll see there is no forest.


Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-22 Thread Adrian Klaver

On 3/22/24 07:01, Nick Renders wrote:

On 13 Mar 2024, at 12:35, Stephen Frost wrote:



We now have a second machine with this issue: it is an Intel Mac mini running 
macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances 
running.

I installed Postgres yesterday and restored a copy from our live database in 
the Data directory. The Postgres process started up without problems, but after 
40 minutes it started throwing the same errors in the log:

2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-03-21 11:50:07.398 CET [965] LOG:  could not open file 
"postmaster.pid": Operation not permitted; continuing anyway

I stopped and started the process, and it continued working again until around 
21:20, when the issue popped up again. I wasn't doing anything on the machine 
at that time, so I have no idea what might have triggered it.


Have you looked at the OS system logs?



Is there perhaps some feature that I can enable that logs which processes use 
these 2 files?

Thanks,

Nick Renders




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-22 Thread Nick Renders
On 13 Mar 2024, at 12:35, Stephen Frost wrote:

> Greetings,
>
> * Nick Renders (postg...@arcict.com) wrote:
>>> ...run them under different users on the system.
>>
>> Are you referring to the "postgres" user / role? Does that also mean setting 
>> up 2 postgres installation directories?
>
> Yes, two separate MacOS user accounts is what I was suggesting.  You
> could use the same postgres binaries though, no need to have two
> installation of them.  You'd need seperate data directories, of course,
> as you have currently.
>
>> This script runs on a daily basis at 4:30 AM. It did so this morning and 
>> there was no issue with cluster B. So even though the issue is most likely 
>> related to the script, it does not cause it every time.
>
> Seems likely that it's some sort of race condition.
>
> Thanks,
>
> Stephen

We now have a second machine with this issue: it is an Intel Mac mini running 
macOS Sonoma (14.4) and PostgreSQL 16.2.
This one only has a single Data directory, so there are no multiple instances 
running.

I installed Postgres yesterday and restored a copy from our live database in 
the Data directory. The Postgres process started up without problems, but after 
40 minutes it started throwing the same errors in the log:

    2024-03-21 11:49:27.410 CET [1655] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
    2024-03-21 11:49:46.955 CET [1760] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-03-21 11:50:07.398 CET [965] LOG:  could not open file 
"postmaster.pid": Operation not permitted; continuing anyway

I stopped and started the process, and it continued working again until around 
21:20, when the issue popped up again. I wasn't doing anything on the machine 
at that time, so I have no idea what might have triggered it.

Is there perhaps some feature that I can enable that logs which processes use 
these 2 files?

Thanks,

Nick Renders




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-13 Thread Stephen Frost
Greetings,

* Nick Renders (postg...@arcict.com) wrote:
> > ...run them under different users on the system.
> 
> Are you referring to the "postgres" user / role? Does that also mean setting 
> up 2 postgres installation directories?

Yes, two separate MacOS user accounts is what I was suggesting.  You
could use the same postgres binaries though, no need to have two
installation of them.  You'd need seperate data directories, of course,
as you have currently.

> This script runs on a daily basis at 4:30 AM. It did so this morning and 
> there was no issue with cluster B. So even though the issue is most likely 
> related to the script, it does not cause it every time.

Seems likely that it's some sort of race condition.

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-12 Thread Adrian Klaver

On 3/12/24 02:57, Nick Renders wrote:

On 11 Mar 2024, at 16:04, Adrian Klaver wrote:


On 3/11/24 03:11, Nick Renders wrote:

Thank you for your reply Laurenz.
I don't think it is related to any third party security software. We have 
several other machines with a similar setup, but this is the only server that 
has this issue.

The one thing different about this machine however, is that it runs 2 instances 
of Postgres:
- cluster A on port 165
- cluster B on port 164
Cluster A is actually a backup from another Postgres server that is restored on 
a daily basis via Barman. This means that we login remotely from the Barman 
server over SSH, stop cluster A's service (port 165), clear the Data folder, 
restore the latest back into the Data folder, and start up the service again.
Cluster B's Data and service (port 164) remain untouched during all this time. This is 
the cluster that experiences the intermittent "operation not permitted" issue.

Over the past 2 weeks, I have suspended our restore script and the issue did 
not occur.
I have just performed another restore on cluster A and now cluster B is 
throwing errors in the log again.


Since it seems to be the trigger, what are the contents of the restore script?



Any idea why this is happening? It does not occur with every restore, but it 
seems to be related anyway.

Thanks,

Nick Renders




--
Adrian Klaver
adrian.kla...@aklaver.com





...how are A and B connected?


The 2 cluster are not connected. They run on the same macOS 14 machine with a 
single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective 
Data folders are located on the same volume ( 
/Volumes/Postgres_Data/PostgreSQL/16/data and 
/Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run 
independently on 2 different ports, specified in the postgresql.conf.



...run them under different users on the system.


Are you referring to the "postgres" user / role? Does that also mean setting up 
2 postgres installation directories?



...what are the contents of the restore script?


## stop cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D 
/Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and 
contains cluster-specific information like the port number)
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp 
ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp 
pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any 
lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 
/Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && 
mv ARC_postgresql_16.conf /Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && 
mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D 
/Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'


This script runs on a daily basis at 4:30 AM. It did so this morning and there 
was no issue with cluster B. So even though the issue is most likely related to 
the script, it does not cause it every time.


I'm not seeing anything obvious, caveat I'm on my first cup of coffee.

From your first post:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file 
"postmaster.pid": Operation not permitted; continuing anyway


For now the only suggestion I have is note the presence, ownership and 
privileges of the above files in the present working setup. Then when it 
fails do the same and see if there is a difference. My hunch it is in 
this step:


barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 
20240312T040106 /Volumes/Postgres_Data/PostgreSQL/16/data


If not the step itself then in the process that creates 20240312T040106.




Best regards,

Nick Renders





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-12 Thread Nick Renders
On 11 Mar 2024, at 16:04, Adrian Klaver wrote:

> On 3/11/24 03:11, Nick Renders wrote:
>> Thank you for your reply Laurenz.
>> I don't think it is related to any third party security software. We have 
>> several other machines with a similar setup, but this is the only server 
>> that has this issue.
>>
>> The one thing different about this machine however, is that it runs 2 
>> instances of Postgres:
>> - cluster A on port 165
>> - cluster B on port 164
>> Cluster A is actually a backup from another Postgres server that is restored 
>> on a daily basis via Barman. This means that we login remotely from the 
>> Barman server over SSH, stop cluster A's service (port 165), clear the Data 
>> folder, restore the latest back into the Data folder, and start up the 
>> service again.
>> Cluster B's Data and service (port 164) remain untouched during all this 
>> time. This is the cluster that experiences the intermittent "operation not 
>> permitted" issue.
>>
>> Over the past 2 weeks, I have suspended our restore script and the issue did 
>> not occur.
>> I have just performed another restore on cluster A and now cluster B is 
>> throwing errors in the log again.
>
> Since it seems to be the trigger, what are the contents of the restore script?
>
>>
>> Any idea why this is happening? It does not occur with every restore, but it 
>> seems to be related anyway.
>>
>> Thanks,
>>
>> Nick Renders
>>
>
>
> -- 
> Adrian Klaver
> adrian.kla...@aklaver.com



> ...how are A and B connected?

The 2 cluster are not connected. They run on the same macOS 14 machine with a 
single Postgres installation ( /Library/PostgreSQL/16/ ) and their respective 
Data folders are located on the same volume ( 
/Volumes/Postgres_Data/PostgreSQL/16/data and 
/Volumes/Postgres_Data/PostgreSQL/16-DML/data ). Beside that, they run 
independently on 2 different ports, specified in the postgresql.conf.


> ...run them under different users on the system.

Are you referring to the "postgres" user / role? Does that also mean setting up 
2 postgres installation directories?


> ...what are the contents of the restore script?

## stop cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D 
/Volumes/Postgres_Data/PostgreSQL/16/data stop'

## save config files (ARC_postgresql_16.conf is included in postgresql.conf and 
contains cluster-specific information like the port number)
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp 
ARC_postgresql_16.conf ../ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cp 
pg_hba.conf ../pg_hba.conf'

## clear data directory
ssh postgres@10.0.0.1 'rm -r /Volumes/Postgres_Data/PostgreSQL/16/data/*'

## transfer recovery (this will copy the backup "20240312T040106" and any 
lingering WAL files into the Data folder)
barman recover --remote-ssh-command 'ssh postgres@10.0.0.1' pg 20240312T040106 
/Volumes/Postgres_Data/PostgreSQL/16/data

## restore config files
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && 
mv ARC_postgresql_16.conf 
/Volumes/Postgres_Data/PostgreSQL/16/data/ARC_postgresql_16.conf'
ssh postgres@10.0.0.1 'cd /Volumes/Postgres_Data/PostgreSQL/16/data && cd .. && 
mv pg_hba.conf /Volumes/Postgres_Data/PostgreSQL/16/data/pg_hba.conf'

## start cluster A
ssh postgres@10.0.0.1 '/Library/PostgreSQL/16/bin/pg_ctl -D 
/Volumes/Postgres_Data/PostgreSQL/16/data start > /dev/null'


This script runs on a daily basis at 4:30 AM. It did so this morning and there 
was no issue with cluster B. So even though the issue is most likely related to 
the script, it does not cause it every time.


Best regards,

Nick Renders





Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-11 Thread Adrian Klaver

On 3/11/24 03:11, Nick Renders wrote:

Thank you for your reply Laurenz.
I don't think it is related to any third party security software. We have 
several other machines with a similar setup, but this is the only server that 
has this issue.

The one thing different about this machine however, is that it runs 2 instances 
of Postgres:
- cluster A on port 165
- cluster B on port 164
Cluster A is actually a backup from another Postgres server that is restored on 
a daily basis via Barman. This means that we login remotely from the Barman 
server over SSH, stop cluster A's service (port 165), clear the Data folder, 
restore the latest back into the Data folder, and start up the service again.
Cluster B's Data and service (port 164) remain untouched during all this time. This is 
the cluster that experiences the intermittent "operation not permitted" issue.

Over the past 2 weeks, I have suspended our restore script and the issue did 
not occur.
I have just performed another restore on cluster A and now cluster B is 
throwing errors in the log again.


Since it seems to be the trigger, what are the contents of the restore 
script?




Any idea why this is happening? It does not occur with every restore, but it 
seems to be related anyway.

Thanks,

Nick Renders




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-11 Thread Stephen Frost
Greetings,

* Nick Renders (postg...@arcict.com) wrote:
> The one thing different about this machine however, is that it runs 2 
> instances of Postgres:
> - cluster A on port 165
> - cluster B on port 164
> Cluster A is actually a backup from another Postgres server that is restored 
> on a daily basis via Barman. This means that we login remotely from the 
> Barman server over SSH, stop cluster A's service (port 165), clear the Data 
> folder, restore the latest back into the Data folder, and start up the 
> service again.
> Cluster B's Data and service (port 164) remain untouched during all this 
> time. This is the cluster that experiences the intermittent "operation not 
> permitted" issue.
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did 
> not occur.
> I have just performed another restore on cluster A and now cluster B is 
> throwing errors in the log again.
> 
> Any idea why this is happening? It does not occur with every restore, but it 
> seems to be related anyway.

Not sure why it's happening but they certainly sound related based on
the correlation.

One thing that I suggest doing when you're running multiple independent
PG clusters on the same host is to run them under different users on the
system.  Perhaps if you move cluster B to a distinct user account,
you'll have better luck figuring out what's going on since something
will presumably start throwing permission denied errors.

Not sure if it's an option or not, but you might also consider using
Linux instead of MacOS..

Thanks,

Stephen


signature.asc
Description: PGP signature


Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-11 Thread Laurenz Albe
On Mon, 2024-03-11 at 11:11 +0100, Nick Renders wrote:
> We have several other machines with a similar setup, but this is the only 
> server that has this issue.
> 
> [...] Cluster A is actually a backup from another Postgres server that is 
> restored on a
> daily basis via Barman. This means that we login remotely from the Barman 
> server over SSH,
> stop cluster A's service (port 165), clear the Data folder, restore the 
> latest back into
> the Data folder, and start up the service again.
> Cluster B's Data and service (port 164) remain untouched during all this 
> time. This is
> the cluster that experiences the intermittent "operation not permitted" issue.
> 
> Over the past 2 weeks, I have suspended our restore script and the issue did 
> not occur.
> I have just performed another restore on cluster A and now cluster B is 
> throwing errors in the log again.
> 
> Any idea why this is happening? It does not occur with every restore, but it 
> seems to be related anyway.

I don't know Barman, but with that incomplete description anybody will have
problems determining the cause.  For example, how are A and B connected?

Yours,
Laurenz Albe




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-03-11 Thread Nick Renders
Thank you for your reply Laurenz.
I don't think it is related to any third party security software. We have 
several other machines with a similar setup, but this is the only server that 
has this issue.

The one thing different about this machine however, is that it runs 2 instances 
of Postgres:
- cluster A on port 165
- cluster B on port 164
Cluster A is actually a backup from another Postgres server that is restored on 
a daily basis via Barman. This means that we login remotely from the Barman 
server over SSH, stop cluster A's service (port 165), clear the Data folder, 
restore the latest back into the Data folder, and start up the service again.
Cluster B's Data and service (port 164) remain untouched during all this time. 
This is the cluster that experiences the intermittent "operation not permitted" 
issue.

Over the past 2 weeks, I have suspended our restore script and the issue did 
not occur.
I have just performed another restore on cluster A and now cluster B is 
throwing errors in the log again.

Any idea why this is happening? It does not occur with every restore, but it 
seems to be related anyway.

Thanks,

Nick Renders


On 26 Feb 2024, at 16:29, Laurenz Albe wrote:

> On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
>> We have a Postgres server that intermittently logs the following:
>>
>> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
>> "global/pg_filenode.map": Operation not permitted
>> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file 
>> "postmaster.pid": Operation not permitted; continuing anyway
>>
>> This has happened 3 times over the last 2 weeks now, without any indication 
>> what caused it.
>> The privileges of those 2 files are all in order.
>> When this happens, the server is no longer accessible, and we need to 
>> restart the service (pg_ctl restart).
>> Once restarted, Popstgres runs fine again for a couple of days.
>>
>> We are running PostgreSQL 16.2 on macOS 14.3.1.
>
> Perhaps that is some kind of virus checker or something else that locks files.
>
> Yours,
> Laurenz Albe




Re: could not open file "global/pg_filenode.map": Operation not permitted

2024-02-26 Thread Laurenz Albe
On Mon, 2024-02-26 at 15:14 +0100, Nick Renders wrote:
> We have a Postgres server that intermittently logs the following:
> 
> 2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
> "global/pg_filenode.map": Operation not permitted
> 2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file 
> "postmaster.pid": Operation not permitted; continuing anyway
> 
> This has happened 3 times over the last 2 weeks now, without any indication 
> what caused it.
> The privileges of those 2 files are all in order.
> When this happens, the server is no longer accessible, and we need to restart 
> the service (pg_ctl restart).
> Once restarted, Popstgres runs fine again for a couple of days.
> 
> We are running PostgreSQL 16.2 on macOS 14.3.1.

Perhaps that is some kind of virus checker or something else that locks files.

Yours,
Laurenz Albe




could not open file "global/pg_filenode.map": Operation not permitted

2024-02-26 Thread Nick Renders
Hello,

We have a Postgres server that intermittently logs the following:

2024-02-26 10:29:41.580 CET [63962] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:11.147 CET [90610] LOG:  could not open file "postmaster.pid": 
Operation not permitted; continuing anyway
2024-02-26 10:30:11.149 CET [63975] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:35.941 CET [63986] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:41.546 CET [63991] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:30:44.398 CET [63994] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:11.149 CET [90610] LOG:  could not open file "postmaster.pid": 
Operation not permitted; continuing anyway
2024-02-26 10:31:11.151 CET [64008] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:31:41.546 CET [64023] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:11.150 CET [90610] LOG:  could not open file "postmaster.pid": 
Operation not permitted; continuing anyway
2024-02-26 10:32:11.153 CET [64035] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:32:41.547 CET [64050] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:11.151 CET [90610] LOG:  could not open file "postmaster.pid": 
Operation not permitted; continuing anyway
2024-02-26 10:33:11.153 CET [64062] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted
2024-02-26 10:33:41.548 CET [64087] FATAL:  could not open file 
"global/pg_filenode.map": Operation not permitted


This has happened 3 times over the last 2 weeks now, without any indication 
what caused it.
The privileges of those 2 files are all in order.
When this happens, the server is no longer accessible, and we need to restart 
the service (pg_ctl restart).
Once restarted, Popstgres runs fine again for a couple of days.

We are running PostgreSQL 16.2 on macOS 14.3.1.

Any idea what might be causing this issue, or how to resolve it?


Best regards,

Nick Renders