Re: Feature request for INITCAP() function

2023-11-30 Thread Laurenz Albe
On Thu, 2023-11-30 at 21:08 -0500, Jeff Gerbracht wrote:
> It would be great if there was a way to set exceptions to the 'word'
> delimiter list used in the INITCAP() function.  For example, I have
> hyphenated words like blue-green, or possessives and contractions like
> don't and cat's tail.  These become Blue-Green, Don'T and Cat'S Tail.
> 
> Being able to pass in a set of delimiter exceptions like hyphen and single
> quote would make this function much more useful for our specific use cases.

That would pretty quickly drift into locale-dependent rules.  Such a function
would be even more special-case than the current "initcap()", which is already
pretty specific to certain languages.  I don't know if there is enough value
for the general public in it.

But it should be easy to implement it yourself, for example in PL/pgSQL.
Why don't you try that?
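A minimal sketch of that approach (the function name, the delimiter handling
and the character-by-character loop are purely illustrative, and it is not
locale-aware):

CREATE OR REPLACE FUNCTION initcap_custom(p_text text, p_delims text DEFAULT ' ')
RETURNS text
LANGUAGE plpgsql
IMMUTABLE
AS $$
DECLARE
    result   text := '';
    ch       text;
    new_word boolean := true;
BEGIN
    FOR i IN 1 .. length(p_text) LOOP
        ch := substr(p_text, i, 1);
        IF position(ch IN p_delims) > 0 THEN
            -- only the characters listed in p_delims start a new word
            new_word := true;
            result := result || ch;
        ELSIF new_word THEN
            result := result || upper(ch);
            new_word := false;
        ELSE
            result := result || lower(ch);
        END IF;
    END LOOP;
    RETURN result;
END;
$$;

-- initcap_custom('blue-green and the cat''s tail', ' ')
-- would give: Blue-green And The Cat's Tail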

Yours,
Laurenz Albe




Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 20:43, Matthias Apitz wrote:

On Thursday, November 30, 2023 at 02:10:25 PM -0800, Adrian Klaver 
wrote:


On 11/30/23 12:35, Adrian Klaver wrote:

On 11/30/23 10:35, Adrian Klaver wrote:

On 11/30/23 09:27, Loles wrote:




cd /etc/postgresql/14/
sudo rm -r main_old/
    or
sudo cp -r main_old 


Arrgh.

 sudo mv -r main_old 

Memo to self don't eat lunch and copy/paste at same time.


Hmmm

purism@pureos:~$ uname -s
Linux
purism@pureos:~$ mv -r foo bar
mv: invalid option -- 'r'
Try 'mv --help' for more information.


The gift that keeps on giving. Third time's a charm?:

sudo mv  main_old/ 

matthias


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Matthias Apitz
On Thursday, November 30, 2023 at 02:10:25 PM -0800, Adrian Klaver 
wrote:

> On 11/30/23 12:35, Adrian Klaver wrote:
> > On 11/30/23 10:35, Adrian Klaver wrote:
> > > On 11/30/23 09:27, Loles wrote:
> > > 
> 
> > cd /etc/postgresql/14/
> > sudo rm -r main_old/
> >    or
> > sudo cp -r main_old 
> 
> Arrgh.
> 
> sudo mv -r main_old 
> 
> Memo to self don't eat lunch and copy/paste at same time.

Hmmm

purism@pureos:~$ uname -s
Linux
purism@pureos:~$ mv -r foo bar
mv: invalid option -- 'r'
Try 'mv --help' for more information.

matthias
-- 
Matthias Apitz, ✉ g...@unixarea.de, http://www.unixarea.de/ +49-176-38902045
Public GnuPG key: http://www.unixarea.de/key.pub

I am not at war with Russia.
Я не воюю с Россией.
Ich bin nicht im Krieg mit Russland.




Feature request for INITCAP() function

2023-11-30 Thread Jeff Gerbracht
It would be great if there was a way to set exceptions to the 'word'
delimiter list used in the INITCAP() function. For example, I have
hyphenated words like blue-green, or possessives and contractions like don't
and cat's tail. These become Blue-Green, Don'T and Cat'S Tail.

Being able to pass in a set of delimiter exceptions like hyphen and single
quote would make this function much more useful for our specific use cases.

   Thanks for considering,
  Jeff
-- 
Jeff Gerbracht
Cornell Lab of Ornithology


Re: libpq crashing on macOS during connection startup

2023-11-30 Thread Tom Lane
John DeSoi  writes:
> On Nov 30, 2023, at 2:07 PM, Tom Lane  wrote:
>> What troubles me about that stack trace is the references to Heimdal.
>> We gave up supporting Heimdal (and v16 explicitly rejects building
>> with it) because its support for Kerberos credentials was too
>> incomplete and flaky.  So I'm inclined to guess that you are running
>> into some Heimdal bug.  Try to rebuild libpq using MIT Kerberos
>> and see if things get better.

> Is there a way to simply disable GSSAPI authentication? I could not find it.

gssencmode=disable in your connection options; but that's a tad
inconvenient probably.
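For example (the host and database names here are just placeholders):

    psql "host=db.example.com dbname=appdb gssencmode=disable"

or set PGGSSENCMODE=disable in the environment of the connecting process.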

> The otool output below shows that Apple's Kerberos is being used and I assume 
> by extension, their Heimdal library. The Heimdal project told me as much - 
> Apple has a fork and would not pull from their project.

Ugh, not only Heimdal but a very obsolete version thereof?  It borders
on negligence for the homebrew PG package to be building against that.
They should be pulling in homebrew's MIT Kerberos package and using
that, if they want to enable GSSAPI.

regards, tom lane




Re: libpq crashing on macOS during connection startup

2023-11-30 Thread John DeSoi



> On Nov 30, 2023, at 2:07 PM, Tom Lane  wrote:
> 
> What troubles me about that stack trace is the references to Heimdal.
> We gave up supporting Heimdal (and v16 explicitly rejects building
> with it) because its support for Kerberos credentials was too
> incomplete and flaky.  So I'm inclined to guess that you are running
> into some Heimdal bug.  Try to rebuild libpq using MIT Kerberos
> and see if things get better.


I'm using v16 on my development machine and it is crashing on me at times with 
the same backtrace. Restarting the web server fixes it for a while for some 
reason. 

Is there a way to simply disable GSSAPI authentication? I could not find it.

The builds are from homebrew (https://brew.sh/). I'll have to see if there is a 
way for me to override build options.

The otool output below shows that Apple's Kerberos is being used and I assume 
by extension, their Heimdal library. The Heimdal project told me as much - 
Apple has a fork and would not pull from their project.

John DeSoi, Ph.D.




$ otool -L /usr/local/opt/postgresql@16/lib/libpq.5.dylib
/usr/local/opt/postgresql@16/lib/libpq.5.dylib:
    /usr/local/opt/postgresql@16/lib/libpq.5.dylib (compatibility version 5.0.0, current version 5.16.0)
    /usr/local/opt/gettext/lib/libintl.8.dylib (compatibility version 13.0.0, current version 13.0.0)
    /usr/local/opt/openssl@3/lib/libssl.3.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/opt/openssl@3/lib/libcrypto.3.dylib (compatibility version 3.0.0, current version 3.0.0)
    /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)
    /System/Library/Frameworks/LDAP.framework/Versions/A/LDAP (compatibility version 1.0.0, current version 2.4.0)

$ otool -L /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib
/usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib:
    /usr/local/opt/krb5/lib/libgssapi_krb5.2.2.dylib (compatibility version 2.0.0, current version 2.2.0)
    @loader_path/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
    @loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
    @loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    @loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)

$ otool -L /usr/local/opt/krb5/lib/libkrb5.3.3.dylib
/usr/local/opt/krb5/lib/libkrb5.3.3.dylib:
    /usr/local/opt/krb5/lib/libkrb5.3.3.dylib (compatibility version 3.0.0, current version 3.3.0)
    @loader_path/libk5crypto.3.1.dylib (compatibility version 3.0.0, current version 3.1.0)
    @loader_path/libcom_err.3.0.dylib (compatibility version 3.0.0, current version 3.0.0)
    @loader_path/libkrb5support.1.1.dylib (compatibility version 1.0.0, current version 1.1.0)
    /System/Library/Frameworks/Kerberos.framework/Versions/A/Kerberos (compatibility version 5.0.0, current version 6.0.0)
    /usr/lib/libresolv.9.dylib (compatibility version 1.0.0, current version 1.0.0)
    /usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1319.100.3)



Re: replication strange behavior

2023-11-30 Thread Atul Kumar
Hi,

In the master's pg_hba.conf file, the standby server's hostname is not being
read even with the domain name, but the server IP works fine.

output of /etc/hosts is given below

127.0.0.1   localhost localhost.localdomain localhost4 localhost4.localdomain4
::1         localhost localhost.localdomain localhost6 localhost6.localdomain6
 .domainname hostname


Regards.

On Fri, Dec 1, 2023 at 3:33 AM Ron Johnson  wrote:

> On Thu, Nov 30, 2023 at 3:41 PM Atul Kumar  wrote:
>
>> Hi,
>>
>> I have postgres 12 running in centos 7.
>>
>> I have configured streaming replication between one master and one
>> standby server.
>>
>> In the pg_hba.conf file of the master server, I have put the standby
>> server's hostname instead of IP and due to which replication got broken and
>> I started getting below error.
>>
>> FATAL:  no pg_hba.conf entry for replication connection from host
>> "10.20.8.22", user "replication", SSL off
>>
>> 2023-11-30 12:00:25 PST|pid=24096|FATAL:  XX000: could not connect to the
>> primary server: FATAL:  no pg_hba.conf entry for replication connection
>>
>> from host "10.20.8.22", user "replication", SSL on
>>
>>
>> I put the ip of standby back in pg_hba.conf file of the master server and
>> replication got started, though in the standby server I mentioned the
>> hostname of master in primary_conninfo on the standby side.
>>
>>
>> So why is the master not reading the hostname of standby ? What am I
>> missing here ?
>>
>
> You'd have to show us pg_hba.conf and /etc/hosts.
>
> My guess, though, is that you're not taking the domain name into account.
>
>


Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 12:35, Adrian Klaver wrote:

On 11/30/23 10:35, Adrian Klaver wrote:

On 11/30/23 09:27, Loles wrote:




cd /etc/postgresql/14/
sudo rm -r main_old/
   or
sudo cp -r main_old 


Arrgh.

sudo mv -r main_old 

Memo to self don't eat lunch and copy/paste at same time.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: replication strange behavior

2023-11-30 Thread Ron Johnson
On Thu, Nov 30, 2023 at 3:41 PM Atul Kumar  wrote:

> Hi,
>
> I have postgres 12 running in centos 7.
>
> I have configured streaming replication between one master and one standby
> server.
>
> In the pg_hba.conf file of the master server, I have put the standby
> server's hostname instead of IP and due to which replication got broken and
> I started getting below error.
>
> FATAL:  no pg_hba.conf entry for replication connection from host
> "10.20.8.22", user "replication", SSL off
>
> 2023-11-30 12:00:25 PST|pid=24096|FATAL:  XX000: could not connect to the
> primary server: FATAL:  no pg_hba.conf entry for replication connection
>
> from host "10.20.8.22", user "replication", SSL on
>
>
> I put the ip of standby back in pg_hba.conf file of the master server and
> replication got started, though in the standby server I mentioned the
> hostname of master in primary_conninfo on the standby side.
>
>
> So why is the master not reading the hostname of standby ? What am I
> missing here ?
>

You'd have to show us pg_hba.conf and /etc/hosts.

My guess, though, is that you're not taking the domain name into account.
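For what it's worth, a hostname in pg_hba.conf only matches if the client's
IP address reverse-resolves to that name and the name then resolves back to
the same address.  An illustrative entry (hostname and auth method are made
up) would look like:

    # TYPE  DATABASE     USER         ADDRESS                   METHOD
    host    replication  replication  standby.example.domain    md5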


replication strange behavior

2023-11-30 Thread Atul Kumar
Hi,

I have postgres 12 running in centos 7.

I have configured streaming replication between one master and one standby
server.

In the pg_hba.conf file of the master server, I put the standby server's
hostname instead of its IP, after which replication broke and I started
getting the error below.

FATAL:  no pg_hba.conf entry for replication connection from host
"10.20.8.22", user "replication", SSL off

2023-11-30 12:00:25 PST|pid=24096|FATAL:  XX000: could not connect to the
primary server: FATAL:  no pg_hba.conf entry for replication connection

from host "10.20.8.22", user "replication", SSL on


I put the IP of the standby back in the pg_hba.conf file of the master server
and replication started again, even though on the standby side I use the
hostname of the master in primary_conninfo.


So why is the master not reading the hostname of the standby? What am I
missing here?



Regards,

Atul


Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 10:35, Adrian Klaver wrote:

On 11/30/23 09:27, Loles wrote:


root@hostname:~# pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
*14  main     5432 online* postgres /var/lib/postgresql/14/main 
/var/log/postgresql/postgresql-14-main.log
*14  main_old 5432 online *postgres /var/lib/postgresql/14/main 
/var/log/postgresql/postgresql-14-main_old.log


Both clusters start up.



Before going any further make sure you have a current backup of the 
database.


Hmm, are /etc/postgresql/14/main/ and /etc/postgresql/14/main_old/ 
symlinked together?


If not what if you move the ~14/main_old/ directory somewhere else and 
then restart Postgres?


Alright results of testing here:

tree /etc/postgresql/14
/etc/postgresql/14
├── main
│   ├── conf.d
│   ├── environment
│   ├── pg_ctl.conf
│   ├── pg_hba.conf
│   ├── pg_ident.conf
│   ├── postgresql.conf
│   └── start.conf
└── main_old
├── conf.d
├── environment
├── pg_ctl.conf
├── pg_hba.conf
├── pg_ident.conf
├── postgresql.conf
└── start.conf

Where I just copied main/ to main_old/.

sudo pg_ctlcluster start 14 main

pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14  main_old 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log



sudo pg_ctlcluster stop 14 main

pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 down   postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14  main_old 5432 down   postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log



cd /etc/postgresql/14/
sudo rm -r main_old/
  or
sudo cp -r main_old 


sudo pg_ctlcluster start 14 main

pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log



I would need to understand systemd better to be sure, but I'm thinking 
the duplicated clusters are happening here:


/lib/systemd/system-generators/postgresql-generator

#!/bin/sh

# This systemd generator creates dependency symlinks that make all PostgreSQL
# clusters with "auto" in their start.conf file be started/stopped/reloaded
# when postgresql.service is started/stopped/reloaded.

set -eu

gendir="$1"
wantdir="$1/postgresql.service.wants"
bindir="/usr/lib/postgresql/"
#redhat# bindir="/usr/pgsql-"
pgservice="/lib/systemd/system/postgresql@.service"

mkdir -p "$wantdir"

for conf in /etc/postgresql/*/*/postgresql.conf; do
    # abort loop if glob was not expanded (but accept dead symlinks)
    if ! test -e "$conf" && ! test -L "$conf"; then continue; fi

    dir="${conf%/*}"

    # evaluate start.conf
    if [ -e "$dir/start.conf" ]; then
        start=$(sed 's/#.*$//; /^[[:space:]]*$/d; s/^\s*//; s/\s*$//' "$dir/start.conf")
    else
        start=auto
    fi
    [ "$start" = "auto" ] || continue

    verdir="${dir%/*}"
    version="${verdir##*/}"
    test -x "$bindir$version/bin/postgres" || continue # package got removed

    cluster="${dir##*/}"
    ln -s "$pgservice" "$wantdir/postgresql@$version-$cluster.service"
done

exit 0
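If that is where the duplicates come from, then with both main/ and main_old/
carrying a start.conf of auto the generator would create two wants-symlinks,
roughly like this (shown relative to $wantdir, since the exact generator
directory is an assumption on my part):

    $wantdir/postgresql@14-main.service     -> /lib/systemd/system/postgresql@.service
    $wantdir/postgresql@14-main_old.service -> /lib/systemd/system/postgresql@.service

so postgresql.service would pull in both cluster units even though they point
at the same data directory.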






Adrian Klaver
adrian.kla...@aklaver.com





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Loles
No, they are not symbolic links. I have thought about moving the main_old
directory somewhere else, but it doesn't change anything because the main_old
cluster is linked to the main data directory. In fact, neither the log nor
any main_old data file changes; they are all from 2020. It's a strange case.

On Thu, Nov 30, 2023, 16:23, Adrian Klaver 
wrote:

> On 11/30/23 05:37, Loles wrote:
> > Hi!
> >
> > I have this situation on a server that is giving me problems.
> >
> > postgres@hostname:~$ pg_lsclusters
> > Ver Cluster  Port Status OwnerData directory  Log file
> > 14  main 5432 online postgres /var/lib/postgresql/14/main
> > /var/log/postgresql/postgresql-14-main.log
> > 14  main_old 5432 online postgres /var/lib/postgresql/14/main
> > /var/log/postgresql/postgresql-14-main_old.log
> >
> > How can I stop, even delete, the main_old cluster?
>
> Assuming you are on  Debian/Ubuntu.
>
> 1) sudo pg_ctlcluster stop 14 main_old
>
>  This will stop the server for this session
>
> 2) sudo vi  /etc/postgresql/14/main_old/start.conf
>
>  Change auto to manual.
>  This will keep the server from restarting on reboot.
>  You can still start it using
>  sudo pg_ctlcluster start 14 main_old
>  Though if you want to do that then:
>  vi /etc/postgresql/14/main_old/postgresql.conf
>   and change port to something other then 5432.
>
> 3) If you have determined you really don't need that server and have all
> the data safely backed up or moved to the other server, then(THIS IS NOT
> REVERSIBLE):
>
>  sudo pg_dropcluster 14 main_old
>
> >
> > Thanks!
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: libpq crashing on macOS during connection startup

2023-11-30 Thread Tom Lane
John DeSoi  writes:
>> On Nov 30, 2023, at 8:59 AM, Joe Conway  wrote:
>> Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the 
>> dark, but perhaps related to the discussion here?
>> https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

> No, this server is on openssl 3.1.4. But thanks for sending that, I'm about 
> to setup a new server and I'm sure it will end up with the latest versions.

The crash appears to be happening within GSSAPI authentication, which
presumably indicates that we're not using OpenSSL, so that isn't
where to look.

What troubles me about that stack trace is the references to Heimdal.
We gave up supporting Heimdal (and v16 explicitly rejects building
with it) because its support for Kerberos credentials was too
incomplete and flaky.  So I'm inclined to guess that you are running
into some Heimdal bug.  Try to rebuild libpq using MIT Kerberos
and see if things get better.

regards, tom lane




PAF with Pacemaker

2023-11-30 Thread Vijaykumar Patil
Hi Team ,

I have two Postgres servers, one primary and one replica. I have set up 
replication and configured Pacemaker and Corosync.

But I'm still facing an issue while creating the resource. It is showing 
invalid parameters.

[root@scrbtrheldbaas001 heartbeat]# pcs status
Cluster name: pg_cluster
Cluster Summary:
  * Stack: corosync (Pacemaker is running)
  * Current DC: scrbtrheldbaas001 (version 2.1.6-8.el8-6fdc9deea29) - partition 
with quorum
  * Last updated: Thu Nov 30 19:04:29 2023 on scrbtrheldbaas001
  * Last change:  Thu Nov 30 13:41:53 2023 by root via cibadmin on 
scrbtrheldbaas002
  * 2 nodes configured
  * 2 resource instances configured

Node List:
  * Online: [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Full List of Resources:
  * Clone Set: pgsqld-clone [pgsqld] (promotable):
* Stopped (invalid parameter): [ scrbtrheldbaas001 scrbtrheldbaas002 ]

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled
[root@scrbtrheldbaas001 heartbeat]#


My Postgres version is 15.3, but it is still searching for recovery.conf. 
Please find the log below.





Node 1 pacemaker log:-

[root@scrbtrheldbaas001 heartbeat]# journalctl -xe | grep pacemaker
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  
notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]: 
 notice: Treating probe result 'invalid parameter' for pgsqld:0 on 
scrbtrheldbaas002 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]: 
 notice: Treating probe result 'invalid parameter' for pgsqld:0 on 
scrbtrheldbaas002 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]: 
 notice: Treating probe result 'invalid parameter' for pgsqld:0 on 
scrbtrheldbaas001 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]: 
 notice: Treating probe result 'invalid parameter' for pgsqld:0 on 
scrbtrheldbaas001 as 'not running'
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-schedulerd[69279]: 
 notice: Calculated transition 3, saving inputs in 
/var/lib/pacemaker/pengine/pe-input-87.bz2
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  
notice: Transition 3 (Complete=0, Pending=0, Fired=0, Skipped=0, Incomplete=0, 
Source=/var/lib/pacemaker/pengine/pe-input-87.bz2): Complete
Nov 30 18:58:32 scrbtrheldbaas001.crb.apmoller.net pacemaker-controld[69280]:  
notice: State transition S_TRANSITION_ENGINE -> S_IDLE
[root@scrbtrheldbaas001 heartbeat]#

Node 2 pacemaker log:-

13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  notice: 
Stopping pacemaker-fenced
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1112602]:  
notice: Caught 'Terminated' signal
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  
notice: Stopping pacemaker-based
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  
notice: Caught 'Terminated' signal
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  
notice: Disconnected from Corosync
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1112601]:  
notice: Disconnected from Corosync
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1112600]:  
notice: Shutdown complete
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net systemd[1]: 
pacemaker.service: Succeeded.
-- The unit pacemaker.service has successfully entered the 'dead' state.
-- Subject: Unit pacemaker.service has finished shutting down
-- Unit pacemaker.service has finished shutting down.
-- Subject: Unit pacemaker.service has finished start-up
-- Unit pacemaker.service has finished starting up.
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  
notice: Additional logging available in /var/log/pacemaker/pacemaker.log
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  
notice: Starting Pacemaker 2.1.6-8.el8
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemakerd[1114127]:  
notice: Pacemaker daemon successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-based[1114128]:  
notice: Starting Pacemaker CIB manager
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  
notice: Starting Pacemaker local executor
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  
notice: Pacemaker local executor successfully started and accepting connections
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-execd[1114130]:  
notice: OCF resource agent search path is /usr/lib/ocf/resource.d
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  
notice: Additional logging available in /var/log/pacemaker/pacemaker.log
Nov 30 13:43:28 scrbtrheldbaas002.crb.apmoller.net pacemaker-fenced[1114129]:  
n

Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 09:27, Loles wrote:


root@hostname:~# pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
*14  main     5432 online* postgres /var/lib/postgresql/14/main 
/var/log/postgresql/postgresql-14-main.log
*14  main_old 5432 online *postgres /var/lib/postgresql/14/main 
/var/log/postgresql/postgresql-14-main_old.log


Both clusters start up.



Before going any further make sure you have a current backup of the 
database.


Hmm, are /etc/postgresql/14/main/ and /etc/postgresql/14/main_old/ 
symlinked together?


If not what if you move the ~14/main_old/ directory somewhere else and 
then restart Postgres?


Adrian Klaver
adrian.kla...@aklaver.com





Re: Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429

2023-11-30 Thread Stephen Frost
Greetings,

On Thu, Nov 30, 2023 at 02:51 Abdul Qoyyuum 
wrote:

> On Wed, Nov 29, 2023 at 5:53 PM Stephen Frost  wrote:
>
>> * Abdul Qoyyuum (aqoyy...@cardaccess.com.bn) wrote:
>> > Knowing that it's a data corruption issue, the only way to fix this is
>> to
>> > vacuum and reindex the database. What was suggested was the following:
>> >
>> > SET zero_damaged_pages = 0; # This is so that we can have the
>> application
>> > to continue to run
>> > VACUUM FULL VERBOSE ANALYSE; # Do a full vacuum and analyse the problem
>> if
>> > possible.
>> > REINDEX DATABASE "core"; # Then do a reindex and clean it up.
>>
>> This is only going to help if the issue is in an index, which isn't
>> clear from what's been shared.
>>
>
> That is a good point. Problem is I can't really find out as the logs isn't
> that verbose to tell me more.
>

Look in pg_class for a relation with a relfilenode matching the filename or
for a relation with an oid matching it.
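For example (sketch only; 3192429 is the filenode taken from the
base/16421/3192429 path in the error, and the query has to be run in the
database with OID 16421):

    SELECT oid, relname, relkind
    FROM pg_class
    WHERE relfilenode = 3192429
       OR oid = 3192429;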

it happened (almost exactly
>> > a year ago) and I think this needs a more permanent solution. I've
>> looked
>> > at routine vacuuming and checked the autovacuum is set to on and the
>> > following configurations:
>>
>> This isn't something that should ever happen ...
>>
>> This also doesn't have anything to do with autovacuum, changing settings
>> there won't make any difference.
>>
>
> Noted but it has been a clean running since a year ago that I ran the
> vacuum and reindex commands.
>

Great to hear but isn’t particularly relevant. Sure, a reindex could “fix”
this, but it shouldn’t happen in the first place…

> Can anyone advise if there's anything else we can do? We have no clue what
>> > causes the invalid page block and we are running a High Availability
>> > cluster set up but we are hoping that there may be a way to mitigate it.
>>
>> Was there some kind of hardware fault?  Did you do a failover?  Restore
>> from a backup?  Do you have checksums enabled?  How many times has this
>> happened before, and how many pages were impacted?  What is the design
>> of your HA solution, are you using PG replication or something else?
>
>
> There have been a few maintenance operations earlier this year but nothing
> too critical or anything failed that would have caused the database to go
> corrupt. The HA solution we're using is the pacemaker with the active
> passive setup.
>

This really isn’t enough information to assess if what you’re doing in your
configuration is leading to corruption, or not. Note that a lot of
corruption can happen without invalid pages showing up- you should
seriously look into amcheck and using it to see if your indexes are valid.
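A first pass with the stock amcheck extension could look roughly like this
(a sketch; restrict the filtering to the schemas and indexes you actually
care about):

    CREATE EXTENSION IF NOT EXISTS amcheck;

    SELECT c.relname,
           bt_index_check(index => c.oid, heapallindexed => true)
    FROM pg_index i
    JOIN pg_class c ON c.oid = i.indexrelid
    JOIN pg_am am   ON am.oid = c.relam
    WHERE am.amname = 'btree'
      AND i.indisvalid
      AND c.relpersistence = 'p';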

What exactly is your “pacemaker with the active passive setup”?  What’s the
storage under that?  What other maintenance operations have been done..?

Unsure if sharing the relevant WAL settings from postgresql.conf may be
> useful but here they are:
>
> max_connections = 300
> shared_buffers = 128MB
>
> archive_mode = on
> archive_command = 'test ! -f /opt/databases/postgres12/wal_archive/%f &&
> cp %p /opt/databases/postgres12/wal_archive/%f'
>

cp is absolutely not valid as an archive command- it doesn’t ensure that
the WAL has been fully sync’d to disk and therefore could lead you to a
situation where you lose necessary WAL. I strongly encourage you to look
into a proper backup solution.

Thanks,

Stephen


Re: Two started cluster on the same data directory and port

2023-11-30 Thread Loles
root@hostname:~# su - postgres
postgres@hostname:~$ *nano /etc/postgresql/14/main_old/start.conf*
postgres@hostname:~$ exit
logout
root@hostname:~# *pg_ctlcluster stop 14 main*
*(change auto to disabled and save on exit)*
root@hostname:~# ps -ef | grep postgres
root        2609    1512  0 18:24 pts/1    00:00:00 grep postgres
root@hostname:~# pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
*14  main     5432 down*   postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
*14  main_old 5432 down  * postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log
root@hostname:~#
root@hostname:~#
root@hostname:~# *pg_ctlcluster start 14 main*
root@hostname:~# ps -ef | grep postgres
postgres    2620       1  2 18:24 ?        00:00:00 /usr/lib/postgresql/14/bin/postgres -D /var/lib/postgresql/14/main -c config_file=/etc/postgresql/14/main/postgresql.conf
postgres    2625    2620  0 18:24 ?        00:00:00 postgres: 14/main: checkpointer
postgres    2626    2620  0 18:24 ?        00:00:00 postgres: 14/main: background writer
postgres    2627    2620  0 18:24 ?        00:00:00 postgres: 14/main: walwriter
postgres    2628    2620  0 18:24 ?        00:00:00 postgres: 14/main: autovacuum launcher
postgres    2629    2620  0 18:24 ?        00:00:00 postgres: 14/main: stats collector
postgres    2630    2620  0 18:24 ?        00:00:00 postgres: 14/main: logical replication launcher
root        2642    1512  0 18:24 pts/1    00:00:00 grep postgres
root@hostname:~# pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
*14  main     5432 online* postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
*14  main_old 5432 online *postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log

Both clusters start up.



On Thu, Nov 30, 2023 at 18:01, Adrian Klaver ()
wrote:

> On 11/30/23 08:46, Loles wrote:
> > Yes, it exists but it is obsolete. The files and directories are all
> > dated 2022.
> >
> > It was obsolete and no longer valid, it could be deleted. Only the main
> > cluster is correct.
>
> First try:
>
>  vi /etc/postgresql/14/main_old/start.conf
>
> and change auto to disabled.
>
> Stop the server(s) and then:
>
>  sudo pg_ctlcluster start 14 main
>
> >
> > El jue, 30 nov 2023 a las 17:43, Adrian Klaver
> > (mailto:adrian.kla...@aklaver.com>>)
> escribió:
> >
> > On 11/30/23 08:37, Loles wrote:
> >  > Every time I stop the main_old cluster with sudo pg_ctlcluster
> > stop 14
> >  > main_old, both are stopped. And when I raise sudo pg_ctlcluster
> > start 14
> >  > main, both are raised.
> >  >
> >  > I don't know how I got into this situation and I don't know how to
> >  > resolve it.
> >
> > Is there both?:
> >
> >  /etc/postgresql/14/main/
> >  /etc/postgresql/14/main_old/
> >
> >  >
> >  >
> >  > El jue, 30 nov 2023 a las 17:34, Loles ( > 
> >  > >>) escribió:
> >  >
> >  > Indeed, that is the problem, they are two different clusters,
> >  > pointing to the same data directory and reading from the same
> > port.
> >  >
> >  > I can't change the port because it is the same postgresql.conf
> >  >
> >  > I don't know how to disassociate the main_old cluster from
> > the data
> >  > directory or how to resolve this situation.
> >  >
> >  > El jue, 30 nov 2023 a las 16:32, Adrian Klaver
> >  > ( >   > >>)
> >  > escribió:
> >  >
> >  > On 11/30/23 07:23, Adrian Klaver wrote:
> >  >  > On 11/30/23 05:37, Loles wrote:
> >  >  >> Hi!
> >  >  >>
> >  >  >> I have this situation on a server that is giving me
> > problems.
> >  >  >>
> >  >  >> postgres@hostname:~$ pg_lsclusters
> >  >  >> Ver Cluster  Port Status OwnerData directory
> >  >   Log file
> >  >  >> 14  main 5432 online postgres
> > /var/lib/postgresql/14/main
> >  >  >> /var/log/postgresql/postgresql-14-main.log
> >  >  >> 14  main_old 5432 online postgres
> > /var/lib/postgresql/14/main
> >  >  >> /var/log/postgresql/postgresql-14-main_old.log
> >  >  >>
> >  >  >> How can I stop, even delete, the main_old cluster?
> >  >  >
> >  >  > Assuming you are on  Debian/Ubuntu.
> >  >  >
> >  >  > 1) sudo pg_ctlcluster stop 14 main_old
> >  >  >
> >  >  >  This will stop the server for this session
> >  >  >
> >  >  > 2) sudo vi  /etc/pos

Re: Two started cluster on the same data directory and port

2023-11-30 Thread Loles
Yes, it exists, but it is obsolete. The files and directories are all dated
2022.

It was obsolete and is no longer valid, so it could be deleted. Only the main
cluster is correct.

On Thu, Nov 30, 2023 at 17:43, Adrian Klaver ()
wrote:

> On 11/30/23 08:37, Loles wrote:
> > Every time I stop the main_old cluster with sudo pg_ctlcluster stop 14
> > main_old, both are stopped. And when I raise sudo pg_ctlcluster start 14
> > main, both are raised.
> >
> > I don't know how I got into this situation and I don't know how to
> > resolve it.
>
> Is there both?:
>
> /etc/postgresql/14/main/
> /etc/postgresql/14/main_old/
>
> >
> >
> > El jue, 30 nov 2023 a las 17:34, Loles ( > >) escribió:
> >
> > Indeed, that is the problem, they are two different clusters,
> > pointing to the same data directory and reading from the same port.
> >
> > I can't change the port because it is the same postgresql.conf
> >
> > I don't know how to disassociate the main_old cluster from the data
> > directory or how to resolve this situation.
> >
> > El jue, 30 nov 2023 a las 16:32, Adrian Klaver
> > (mailto:adrian.kla...@aklaver.com>>)
> > escribió:
> >
> > On 11/30/23 07:23, Adrian Klaver wrote:
> >  > On 11/30/23 05:37, Loles wrote:
> >  >> Hi!
> >  >>
> >  >> I have this situation on a server that is giving me problems.
> >  >>
> >  >> postgres@hostname:~$ pg_lsclusters
> >  >> Ver Cluster  Port Status OwnerData directory
> >   Log file
> >  >> 14  main 5432 online postgres /var/lib/postgresql/14/main
> >  >> /var/log/postgresql/postgresql-14-main.log
> >  >> 14  main_old 5432 online postgres /var/lib/postgresql/14/main
> >  >> /var/log/postgresql/postgresql-14-main_old.log
> >  >>
> >  >> How can I stop, even delete, the main_old cluster?
> >  >
> >  > Assuming you are on  Debian/Ubuntu.
> >  >
> >  > 1) sudo pg_ctlcluster stop 14 main_old
> >  >
> >  >  This will stop the server for this session
> >  >
> >  > 2) sudo vi  /etc/postgresql/14/main_old/start.conf
> >  >
> >  >  Change auto to manual.
> >  >  This will keep the server from restarting on reboot.
> >  >  You can still start it using
> >  >   sudo pg_ctlcluster start 14 main_old
> >  >  Though if you want to do that then:
> >  >  vi /etc/postgresql/14/main_old/postgresql.conf
> >  >   and change port to something other then 5432.
> >  >
> >  > 3) If you have determined you really don't need that server
> > and have all
> >  > the data safely backed up or moved to the other server,
> > then(THIS IS NOT
> >  > REVERSIBLE):
> >  >
> >  >  sudo pg_dropcluster 14 main_old
> >
> > Blaming it on not enough coffee, but I just noticed both
> > clusters are
> > pointing at the same data directory. I would NOT do:
> >
> >   sudo pg_dropcluster 14 main_old
> >
> >  >
> >  >>
> >  >> Thanks!
> >  >>
> >  >
> >
> > --
> > Adrian Klaver
> > adrian.kla...@aklaver.com 
> >
>
> --
> Adrian Klaver
> adrian.kla...@aklaver.com
>
>


Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 08:46, Loles wrote:
Yes, it exists but it is obsolete. The files and directories are all 
dated 2022.


It was obsolete and no longer valid, it could be deleted. Only the main 
cluster is correct.


First try:

vi /etc/postgresql/14/main_old/start.conf

and change auto to disabled.

Stop the server(s) and then:

sudo pg_ctlcluster start 14 main



On Thu, Nov 30, 2023 at 17:43, Adrian Klaver 
(mailto:adrian.kla...@aklaver.com>>) wrote:


On 11/30/23 08:37, Loles wrote:
 > Every time I stop the main_old cluster with sudo pg_ctlcluster
stop 14
 > main_old, both are stopped. And when I raise sudo pg_ctlcluster
start 14
 > main, both are raised.
 >
 > I don't know how I got into this situation and I don't know how to
 > resolve it.

Is there both?:

     /etc/postgresql/14/main/
     /etc/postgresql/14/main_old/

 >
 >
 > El jue, 30 nov 2023 a las 17:34, Loles (mailto:lole...@gmail.com>
 > >>) escribió:
 >
 >     Indeed, that is the problem, they are two different clusters,
 >     pointing to the same data directory and reading from the same
port.
 >
 >     I can't change the port because it is the same postgresql.conf
 >
 >     I don't know how to disassociate the main_old cluster from
the data
 >     directory or how to resolve this situation.
 >
 >     El jue, 30 nov 2023 a las 16:32, Adrian Klaver
 >     (mailto:adrian.kla...@aklaver.com> >>)
 >     escribió:
 >
 >         On 11/30/23 07:23, Adrian Klaver wrote:
 >          > On 11/30/23 05:37, Loles wrote:
 >          >> Hi!
 >          >>
 >          >> I have this situation on a server that is giving me
problems.
 >          >>
 >          >> postgres@hostname:~$ pg_lsclusters
 >          >> Ver Cluster  Port Status Owner    Data directory
 >           Log file
 >          >> 14  main     5432 online postgres
/var/lib/postgresql/14/main
 >          >> /var/log/postgresql/postgresql-14-main.log
 >          >> 14  main_old 5432 online postgres
/var/lib/postgresql/14/main
 >          >> /var/log/postgresql/postgresql-14-main_old.log
 >          >>
 >          >> How can I stop, even delete, the main_old cluster?
 >          >
 >          > Assuming you are on  Debian/Ubuntu.
 >          >
 >          > 1) sudo pg_ctlcluster stop 14 main_old
 >          >
 >          >      This will stop the server for this session
 >          >
 >          > 2) sudo vi  /etc/postgresql/14/main_old/start.conf
 >          >
 >          >      Change auto to manual.
 >          >      This will keep the server from restarting on reboot.
 >          >      You can still start it using
 >          >   sudo pg_ctlcluster start 14 main_old
 >          >      Though if you want to do that then:
 >          >      vi /etc/postgresql/14/main_old/postgresql.conf
 >          >   and change port to something other then 5432.
 >          >
 >          > 3) If you have determined you really don't need that
server
 >         and have all
 >          > the data safely backed up or moved to the other server,
 >         then(THIS IS NOT
 >          > REVERSIBLE):
 >          >
 >          >      sudo pg_dropcluster 14 main_old
 >
 >         Blaming it on not enough coffee, but I just noticed both
 >         clusters are
 >         pointing at the same data directory. I would NOT do:
 >
 >               sudo pg_dropcluster 14 main_old
 >
 >          >
 >          >>
 >          >> Thanks!
 >          >>
 >          >
 >
 >         --
 >         Adrian Klaver
 > adrian.kla...@aklaver.com 
>
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: libpq crashing on macOS during connection startup

2023-11-30 Thread John DeSoi


> On Nov 30, 2023, at 10:21 AM, Adrian Klaver  wrote:
> 
> To be clear, at the times the Web processes crash there are no traces in 
> the Postgres log of an issue on the Postgres side?
> 
> Is there evidence in the Postgres logs of what the Web process was doing just 
> before it crashed?


No entry in the Postgres log that I can see. The backtrace I posted in the 
original message was today at 7:06am. There is nothing in the Postgres log 
around that time except for some checkpoint messages. 

I think the backtrace shows that Postgres has just connected and is 
authenticating by calling Kerberos which calls Heimdal and then crashes in 
CoreFoundation. I also posted this issue on the Heimdal GitHub account.


John DeSoi, Ph.D.





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 08:37, Loles wrote:
Every time I stop the main_old cluster with sudo pg_ctlcluster stop 14 
main_old, both are stopped. And when I raise sudo pg_ctlcluster start 14 
main, both are raised.


I don't know how I got into this situation and I don't know how to 
resolve it.


Is there both?:

   /etc/postgresql/14/main/
   /etc/postgresql/14/main_old/




El jue, 30 nov 2023 a las 17:34, Loles (>) escribió:


Indeed, that is the problem, they are two different clusters,
pointing to the same data directory and reading from the same port.

I can't change the port because it is the same postgresql.conf

I don't know how to disassociate the main_old cluster from the data
directory or how to resolve this situation.

El jue, 30 nov 2023 a las 16:32, Adrian Klaver
(mailto:adrian.kla...@aklaver.com>>)
escribió:

On 11/30/23 07:23, Adrian Klaver wrote:
 > On 11/30/23 05:37, Loles wrote:
 >> Hi!
 >>
 >> I have this situation on a server that is giving me problems.
 >>
 >> postgres@hostname:~$ pg_lsclusters
 >> Ver Cluster  Port Status Owner    Data directory
  Log file

 >> 14  main     5432 online postgres /var/lib/postgresql/14/main
 >> /var/log/postgresql/postgresql-14-main.log
 >> 14  main_old 5432 online postgres /var/lib/postgresql/14/main
 >> /var/log/postgresql/postgresql-14-main_old.log
 >>
 >> How can I stop, even delete, the main_old cluster?
 >
 > Assuming you are on  Debian/Ubuntu.
 >
 > 1) sudo pg_ctlcluster stop 14 main_old
 >
 >      This will stop the server for this session
 >
 > 2) sudo vi  /etc/postgresql/14/main_old/start.conf
 >
 >      Change auto to manual.
 >      This will keep the server from restarting on reboot.
 >      You can still start it using
 >   sudo pg_ctlcluster start 14 main_old
 >      Though if you want to do that then:
 >      vi /etc/postgresql/14/main_old/postgresql.conf
 >   and change port to something other then 5432.
 >
 > 3) If you have determined you really don't need that server
and have all
 > the data safely backed up or moved to the other server,
then(THIS IS NOT
 > REVERSIBLE):
 >
 >      sudo pg_dropcluster 14 main_old

Blaming it on not enough coffee, but I just noticed both
clusters are
pointing at the same data directory. I would NOT do:

      sudo pg_dropcluster 14 main_old

 >
 >>
 >> Thanks!
 >>
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 08:34, Loles wrote:

Please reply to list also.
Ccing list.

Indeed, that is the problem, they are two different clusters, pointing 
to the same data directory and reading from the same port.


I can't change the port because it is the same postgresql.conf

I don't know how to disassociate the main_old cluster from the data 
directory or how to resolve this situation.


El jue, 30 nov 2023 a las 16:32, Adrian Klaver 
(mailto:adrian.kla...@aklaver.com>>) escribió:


On 11/30/23 07:23, Adrian Klaver wrote:
 > On 11/30/23 05:37, Loles wrote:
 >> Hi!
 >>
 >> I have this situation on a server that is giving me problems.
 >>
 >> postgres@hostname:~$ pg_lsclusters
 >> Ver Cluster  Port Status Owner    Data directory
  Log file

 >> 14  main     5432 online postgres /var/lib/postgresql/14/main
 >> /var/log/postgresql/postgresql-14-main.log
 >> 14  main_old 5432 online postgres /var/lib/postgresql/14/main
 >> /var/log/postgresql/postgresql-14-main_old.log
 >>
 >> How can I stop, even delete, the main_old cluster?
 >
 > Assuming you are on  Debian/Ubuntu.
 >
 > 1) sudo pg_ctlcluster stop 14 main_old
 >
 >      This will stop the server for this session
 >
 > 2) sudo vi  /etc/postgresql/14/main_old/start.conf
 >
 >      Change auto to manual.
 >      This will keep the server from restarting on reboot.
 >      You can still start it using
 >   sudo pg_ctlcluster start 14 main_old
 >      Though if you want to do that then:
 >      vi /etc/postgresql/14/main_old/postgresql.conf
 >   and change port to something other then 5432.
 >
 > 3) If you have determined you really don't need that server and
have all
 > the data safely backed up or moved to the other server, then(THIS
IS NOT
 > REVERSIBLE):
 >
 >      sudo pg_dropcluster 14 main_old

Blaming it on not enough coffee, but I just noticed both clusters are
pointing at the same data directory. I would NOT do:

      sudo pg_dropcluster 14 main_old

 >
 >>
 >> Thanks!
 >>
 >

-- 
Adrian Klaver

adrian.kla...@aklaver.com 



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Loles
Every time I stop the main_old cluster with sudo pg_ctlcluster stop 14
main_old, both are stopped. And when I raise sudo pg_ctlcluster start 14
main, both are raised.

I don't know how I got into this situation and I don't know how to resolve
it.


On Thu, Nov 30, 2023 at 17:34, Loles () wrote:

> Indeed, that is the problem, they are two different clusters, pointing to
> the same data directory and reading from the same port.
>
> I can't change the port because it is the same postgresql.conf
>
> I don't know how to disassociate the main_old cluster from the data
> directory or how to resolve this situation.
>
> El jue, 30 nov 2023 a las 16:32, Adrian Klaver ()
> escribió:
>
>> On 11/30/23 07:23, Adrian Klaver wrote:
>> > On 11/30/23 05:37, Loles wrote:
>> >> Hi!
>> >>
>> >> I have this situation on a server that is giving me problems.
>> >>
>> >> postgres@hostname:~$ pg_lsclusters
>> >> Ver Cluster  Port Status OwnerData directory  Log file
>> >> 14  main 5432 online postgres /var/lib/postgresql/14/main
>> >> /var/log/postgresql/postgresql-14-main.log
>> >> 14  main_old 5432 online postgres /var/lib/postgresql/14/main
>> >> /var/log/postgresql/postgresql-14-main_old.log
>> >>
>> >> How can I stop, even delete, the main_old cluster?
>> >
>> > Assuming you are on  Debian/Ubuntu.
>> >
>> > 1) sudo pg_ctlcluster stop 14 main_old
>> >
>> >  This will stop the server for this session
>> >
>> > 2) sudo vi  /etc/postgresql/14/main_old/start.conf
>> >
>> >  Change auto to manual.
>> >  This will keep the server from restarting on reboot.
>> >  You can still start it using
>> >   sudo pg_ctlcluster start 14 main_old
>> >  Though if you want to do that then:
>> >  vi /etc/postgresql/14/main_old/postgresql.conf
>> >   and change port to something other then 5432.
>> >
>> > 3) If you have determined you really don't need that server and have
>> all
>> > the data safely backed up or moved to the other server, then(THIS IS
>> NOT
>> > REVERSIBLE):
>> >
>> >  sudo pg_dropcluster 14 main_old
>>
>> Blaming it on not enough coffee, but I just noticed both clusters are
>> pointing at the same data directory. I would NOT do:
>>
>>  sudo pg_dropcluster 14 main_old
>>
>> >
>> >>
>> >> Thanks!
>> >>
>> >
>>
>> --
>> Adrian Klaver
>> adrian.kla...@aklaver.com
>>
>>


Re: libpq crashing on macOS during connection startup

2023-11-30 Thread Adrian Klaver

On 11/30/23 07:49, John DeSoi wrote:



On Nov 30, 2023, at 9:36 AM, Adrian Klaver  wrote:

What starts happening?


Random web process crashes when connecting to PostgreSQL.




Does the Postgres log show anything?


No.


To be clear, at the times the Web processes crash there are no traces 
in the Postgres log of an issue on the Postgres side?


Is there evidence in the Postgres logs of what the Web process was doing 
just before it crashed?







Postgres version?

How was Postgres installed?



PostgreSQL 15.4 installed with Homebrew.


John DeSoi, Ph.D.




--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429

2023-11-30 Thread Abdul Qoyyuum
Hi Chris,

On Wed, Nov 29, 2023 at 7:38 PM Chris Travers 
wrote:

>
>
> On Wed, Nov 29, 2023 at 4:36 PM Abdul Qoyyuum 
> wrote:
>
>> Hi all,
>>
>> Knowing that it's a data corruption issue, the only way to fix this is to
>> vacuum and reindex the database. What was suggested was the following:
>>
>> SET zero_damaged_pages = 0; # This is so that we can have the application
>> to continue to run
>> VACUUM FULL VERBOSE ANALYSE; # Do a full vacuum and analyse the problem
>> if possible.
>> REINDEX DATABASE "core"; # Then do a reindex and clean it up.
>>
>
> So first, to clear up some confusion on my part here:
>
> This procedure doesn't make a lot of sense to me.  But did it clear up the
> issue?
>
Yeah, it did fix the issue before (the same issue as last year), and it has
fixed the problem that just happened a few days ago (almost exactly a year
after the previous one).

>
> In any of these cases, it is extremely important to diagnose the system
> properly.  If you have a fault in your storage device or RAID controller,
> for example, you are asking for more corruption and data loss later.
>

> At first I thought maybe you mistyped something and then realized there
> were a few issues with the process so it actually didn't make sense.
>
> First, zero_damaged_pages defaults to 0, and I can think of no reason to
> set  it explicitly.
> Secondly, a vacuum full has to reindex, so there is no reason to do a
> reindex following.  Your whole procedure is limited to a vacuum full, when
> a reindex is the only part that could affect this.   If it did work,
> reindexing is the only part that would have been helpful.
>
Oh that makes sense actually. Thanks.

>
> On to the question of what to do next
>
>>
>> We're on Postgresql 12. This has worked before it happened (almost
>> exactly a year ago) and I think this needs a more permanent solution. I've
>> looked at routine vacuuming and checked the autovacuum is set to on and the
>> following configurations:
>>
>> core=> select name, setting from pg_settings where name like
>> 'autovacuum%';
>> name |  setting
>> -+---
>>  autovacuum  | on
>>  autovacuum_analyze_scale_factor | 0.1
>>  autovacuum_analyze_threshold| 50
>>  autovacuum_freeze_max_age   | 2
>>  autovacuum_max_workers  | 3
>>  autovacuum_multixact_freeze_max_age | 4
>>  autovacuum_naptime  | 60
>>  autovacuum_vacuum_cost_delay| 2
>>  autovacuum_vacuum_cost_limit| -1
>>  autovacuum_vacuum_scale_factor  | 0.2
>>  autovacuum_vacuum_threshold | 50
>>  autovacuum_work_mem | -1
>> (12 rows)
>>
>> Can anyone advise if there's anything else we can do? We have no clue
>> what causes the invalid page block and we are running a High Availability
>> cluster set up but we are hoping that there may be a way to mitigate it.
>>
>>
> You need to figure out why the corruption is happening.  This is most
> likely, in my experience, not a PostgreSQL bug, but usually something that
> happens on the hardware layer or an environmental factor.  It could be
> failing storage or CPU.  Or it could be something like bad electrical input
> or insufficient cooling (I have seen index and even table corruption issues
> from both of these).
>
> If this is a server you run, the first things I would check are:
> 1.  Is there a good-quality UPS that the server is plugged into?  Are the
> batteries in good working order?
>
The servers are dual-powered and hooked up to both the supplied electricity
and a backup generator, and if that fails, they will switch over to the UPS.
All of these are supplied and maintained by the data centre that the
servers are at.

> 2.  Is the server somewhere that may be sitting in a pocket of hot air?
>
As you can imagine, the data centre has air-conditioning and floored fans
blowing hot air up and out, keeping all servers cooled. Checking on the
blade servers that the database VM sits on shows that the temperature is
optimally sitting at 65 degrees celsius. So I don't think it's a
temperature problem either.

>
> Once you have ruled these out, the next things to check are CPU, memory,
> and storage health.  Unfortunately checking these is harder but you can
> check SMART indications, and other diagnostic indicators.
>
Would these top stats be useful? The database does get busy, especially at
peak hours.

top - 10:04:25 up 782 days, 43 min,  2 users,  load average: 1.20, 1.14,
1.10
Tasks: 415 total,   3 running, 412 sleeping,   0 stopped,   0 zombie
Cpu(s): 22.7%us, 13.8%sy,  0.0%ni, 63.2%id,  0.2%wa,  0.0%hi,  0.1%si,
 0.1%st
Mem:   8160156k total,  7832828k used,   327328k free,    34584k buffers
Swap:  1048572k total,   304396k used,   744176k free,  6674428k cached

We are running on a SAN group disk storage. I guess I should take a closer
look at the disks. Thanks for the pointer.

>
> However,

Re: Fixing or Mitigating this ERROR: invalid page in block 35217 of relation base/16421/3192429

2023-11-30 Thread Abdul Qoyyuum
Hi Stephen,

On Wed, Nov 29, 2023 at 5:53 PM Stephen Frost  wrote:

> Greetings,
>
> * Abdul Qoyyuum (aqoyy...@cardaccess.com.bn) wrote:
> > Knowing that it's a data corruption issue, the only way to fix this is to
> > vacuum and reindex the database. What was suggested was the following:
> >
> > SET zero_damaged_pages = 0; # This is so that we can have the application
> > to continue to run
> > VACUUM FULL VERBOSE ANALYSE; # Do a full vacuum and analyse the problem
> if
> > possible.
> > REINDEX DATABASE "core"; # Then do a reindex and clean it up.
>
> This is only going to help if the issue is in an index, which isn't
> clear from what's been shared.
>

That is a good point. The problem is I can't really find out, as the logs
aren't verbose enough to tell me more. Part of the logs shows something like
this:

2023-11-29 04:27:17.486 [ERROR] [dispatcher-1095] [] - ERROR:
invalid page in block 35217 of relation base/16421/3192429
  Where: parallel worker
org.postgresql.util.PSQLException: ERROR: invalid page in block 35217 of
relation base/16421/3192429
  Where: parallel worker
at
org.postgresql.core.v3.QueryExecutorImpl.receiveErrorResponse(QueryExecutorImpl.java:2565)
~[postgresql-42.2.24.jre7.jar:42.2.24.jre7]
at
org.postgresql.core.v3.QueryExecutorImpl.processResults(QueryExecutorImpl.java:2297)
~[postgresql-42.2.24.jre7.jar:42.2.24.jre7]

>
> > We're on Postgresql 12. This has worked before it happened (almost
> exactly
> > a year ago) and I think this needs a more permanent solution. I've looked
> > at routine vacuuming and checked the autovacuum is set to on and the
> > following configurations:
>
> This isn't something that should ever happen ...
>
> This also doesn't have anything to do with autovacuum, changing settings
> there won't make any difference.
>

Noted, but it has been running cleanly since a year ago when I ran the
vacuum and reindex commands.

>
> > Can anyone advise if there's anything else we can do? We have no clue
> what
> > causes the invalid page block and we are running a High Availability
> > cluster set up but we are hoping that there may be a way to mitigate it.
>
> Was there some kind of hardware fault?  Did you do a failover?  Restore
> from a backup?  Do you have checksums enabled?  How many times has this
> happened before, and how many pages were impacted?  What is the design
> of your HA solution, are you using PG replication or something else?


There have been a few maintenance operations earlier this year, but nothing
too critical, and nothing failed that would have caused the database to become
corrupt. The HA solution we're using is Pacemaker with an active-passive
setup.

Unsure if sharing the relevant WAL settings from postgresql.conf may be
useful but here they are:

max_connections = 300
shared_buffers = 128MB

archive_mode = on
archive_command = 'test ! -f /opt/databases/postgres12/wal_archive/%f && cp
%p /opt/databases/postgres12/wal_archive/%f'
hot_standby = on
wal_level = hot_standby
full_page_writes = on
max_wal_senders = 10
wal_keep_segments = 100 # 16MB per segment = 1.6GB
hot_standby = on
restart_after_crash = off
wal_receiver_status_interval = 2   # seconds
max_standby_streaming_delay = -1
max_standby_archive_delay = -1
synchronous_commit = on
hot_standby_feedback = on
wal_sender_timeout = 1
wal_receiver_timeout = 1

Thanks,
>
> Stephen
>


Re: libpq crashing on macOS during connection startup

2023-11-30 Thread John DeSoi


> On Nov 30, 2023, at 9:36 AM, Adrian Klaver  wrote:
> 
> What starts happening?

Random web process crashes when connecting to PostgreSQL.


> 
> Does the Postgres log show anything?

No.

> 
> Postgres version?
> 
> How was Postgres installed?


PostgreSQL 15.4 installed with Homebrew. 


John DeSoi, Ph.D.






Re: libpq crashing on macOS during connection startup

2023-11-30 Thread Adrian Klaver

On 11/30/23 06:45, John DeSoi wrote:

I have a macOS web server using Postgres that has been very stable until a 
month or two ago. If I restart the web server the problem seems to go away for 
a while, but starts happening again within days. I thought it was a PHP issue 
as discussed in the link below, but I just noticed in the crash report it seems 
to be something related to a call from libpq.


What starts happening?

Does the Postgres log show anything?

Postgres version?

How was Postgres installed?




https://github.com/shivammathur/homebrew-php/issues/1862


Any ideas or suggestions appreciated.


John DeSoi, Ph.D.


--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 07:23, Adrian Klaver wrote:

On 11/30/23 05:37, Loles wrote:

Hi!

I have this situation on a server that is giving me problems.

postgres@hostname:~$ pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14  main_old 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log


How can I stop, even delete, the main_old cluster?


Assuming you are on  Debian/Ubuntu.

1) sudo pg_ctlcluster stop 14 main_old

     This will stop the server for this session

2) sudo vi  /etc/postgresql/14/main_old/start.conf

     Change auto to manual.
     This will keep the server from restarting on reboot.
     You can still start it using
  sudo pg_ctlcluster start 14 main_old
     Though if you want to do that then:
     vi /etc/postgresql/14/main_old/postgresql.conf
  and change port to something other than 5432.

3) If you have determined you really don't need that server and have all 
the data safely backed up or moved to the other server, then (THIS IS NOT 
REVERSIBLE):


     sudo pg_dropcluster 14 main_old


Blaming it on not enough coffee, but I just noticed both clusters are 
pointing at the same data directory. I would NOT do:


sudo pg_dropcluster 14 main_old
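
A minimal sketch, assuming the standard Debian/Ubuntu layout, to confirm which
cluster configurations point at that same data directory before deciding what
to keep:

grep -H '^data_directory' /etc/postgresql/14/*/postgresql.conf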





Thanks!





--
Adrian Klaver
adrian.kla...@aklaver.com





Re: Two started cluster on the same data directory and port

2023-11-30 Thread Adrian Klaver

On 11/30/23 05:37, Loles wrote:

Hi!

I have this situation on a server that is giving me problems.

postgres@hostname:~$ pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14  main_old 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log


How can I stop, even delete, the main_old cluster?


Assuming you are on  Debian/Ubuntu.

1) sudo pg_ctlcluster stop 14 main_old

This will stop the server for this session

2) sudo vi  /etc/postgresql/14/main_old/start.conf

Change auto to manual.
This will keep the server from restarting on reboot.
You can still start it using
 sudo pg_ctlcluster start 14 main_old   
Though if you want to do that then:
vi /etc/postgresql/14/main_old/postgresql.conf
 and change port to something other than 5432.

3) If you have determined you really don't need that server and have all 
the data safely backed up or moved to the other server, then (THIS IS NOT 
REVERSIBLE):


sudo pg_dropcluster 14 main_old
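
After step 1, a quick sanity check (sketch; standard Debian/Ubuntu tools)
before even considering step 3:

pg_lsclusters               # main_old should now report Status "down"
sudo ss -tlnp | grep 5432   # only one postmaster should be listening on 5432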



Thanks!



--
Adrian Klaver
adrian.kla...@aklaver.com





Re: libpq crashing on macOS during connection startup

2023-11-30 Thread John DeSoi


> On Nov 30, 2023, at 8:59 AM, Joe Conway  wrote:
> 
> Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the 
> dark, but perhaps related to the discussion here?
> 
> https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com

No, this server is on OpenSSL 3.1.4. But thanks for sending that; I'm about to
set up a new server and I'm sure it will end up with the latest versions.
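
For reference, a sketch of how I check which OpenSSL libpq is actually linked
against here (the Homebrew keg path is an assumption and may differ):

otool -L "$(brew --prefix)/opt/postgresql@15/lib/libpq.5.dylib" | grep -i ssl
openssl version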

John DeSoi, Ph.D.






Re: libpq crashing on macOS during connection startup

2023-11-30 Thread Joe Conway

On 11/30/23 09:45, John DeSoi wrote:

I have a macOS web server using Postgres that has been very stable until a 
month or two ago. If I restart the web server the problem seems to go away for 
a while, but starts happening again within days. I thought it was a PHP issue 
as discussed in the link below, but I just noticed in the crash report it seems 
to be something related to a call from libpq.

https://github.com/shivammathur/homebrew-php/issues/1862

Any ideas or suggestions appreciated.



Did you recently get an OpenSSL upgrade to v3.2.0? This is a shot in the 
dark, but perhaps related to the discussion here?


https://www.postgresql.org/message-id/flat/CAN55FZ1eDDYsYaL7mv%2BoSLUij2h_u6hvD4Qmv-7PK7jkji0uyQ%40mail.gmail.com


--
Joe Conway
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com





libpq crashing on macOS during connection startup

2023-11-30 Thread John DeSoi
I have a macOS web server using Postgres that has been very stable until a 
month or two ago. If I restart the web server the problem seems to go away for 
a while, but starts happening again within days. I thought it was a PHP issue 
as discussed in the link below, but I just noticed in the crash report it seems 
to be something related to a call from libpq.   

https://github.com/shivammathur/homebrew-php/issues/1862


Any ideas or suggestions appreciated. 


John DeSoi, Ph.D.



-
Translated Report (Full Report Below)
-

Process:   httpd [54877]
Path:  /opt/homebrew/*/httpd
Identifier:httpd
Version:   ???
Code Type: ARM-64 (Native)
Parent Process:httpd [6040]
Responsible:   httpd [6040]
User ID:   502

Date/Time: 2023-11-30 07:06:00.0651 -0600
OS Version:macOS 12.7 (21G816)
Report Version:12
Anonymous UUID:750F146C-B2B5-BECA-EC21-1FEC0471D5AC


Time Awake Since Boot: 100 seconds

System Integrity Protection: enabled

Crashed Thread:0  Dispatch queue: com.apple.root.utility-qos

Exception Type:EXC_BAD_ACCESS (SIGSEGV)
Exception Codes:   KERN_INVALID_ADDRESS at 0x0110
Exception Codes:   0x0001, 0x0110
Exception Note:EXC_CORPSE_NOTIFY

VM Region Info: 0x110 is not in any region.  Bytes before following region: 
105553518919408
  REGION TYPESTART - END [ VSIZE] PRT/MAX 
SHRMOD  REGION DETAIL
  UNUSED SPACE AT START
--->  
  MALLOC_NANO (reserved)   60001800-60002000 [128.0M] rw-/rwx 
SM=NUL  ...(unallocated)

Application Specific Information:
*** multi-threaded process forked ***
crashed on child side of fork pre-exec


Kernel Triage:
VM - pmap_enter failed with resource shortage
VM - pmap_enter failed with resource shortage


Thread 0 Crashed::  Dispatch queue: com.apple.root.utility-qos
0   libdispatch.dylib  0x199dd825c 
_dispatch_apply_with_attr_f + 1136
1   libdispatch.dylib  0x199dd8234 
_dispatch_apply_with_attr_f + 1096
2   libdispatch.dylib  0x199dd847c dispatch_apply + 108
3   CoreFoundation 0x19a172a80 
__104-[CFPrefsSearchListSource 
synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:]_block_invoke.92
 + 132
4   CoreFoundation 0x19a007e8c 
CFPREFERENCES_IS_WAITING_FOR_SYSTEM_AND_USER_CFPREFSDS + 100
5   CoreFoundation 0x19a007ccc 
-[CFPrefsSearchListSource 
synchronouslySendDaemonMessage:andAgentMessage:andDirectMessage:replyHandler:] 
+ 232
6   CoreFoundation 0x19a00649c 
-[CFPrefsSearchListSource 
alreadylocked_generationCountFromListOfSources:count:] + 252
7   CoreFoundation 0x19a006178 
-[CFPrefsSearchListSource alreadylocked_getDictionary:] + 468
8   CoreFoundation 0x19a005cec 
-[CFPrefsSearchListSource alreadylocked_copyValueForKey:] + 172
9   CoreFoundation 0x19a005c20 -[CFPrefsSource 
copyValueForKey:] + 60
10  CoreFoundation 0x19a005bcc 
__76-[_CFXPreferences 
copyAppValueForKey:identifier:container:configurationURL:]_block_invoke + 44
11  CoreFoundation 0x199ffe9e0 
__108-[_CFXPreferences(SearchListAdditions) 
withSearchListForIdentifier:container:cloudConfigurationURL:perform:]_block_invoke
 + 384
12  CoreFoundation 0x19a173350 -[_CFXPreferences 
withSearchListForIdentifier:container:cloudConfigurationURL:perform:] + 384
13  CoreFoundation 0x199ffe394 -[_CFXPreferences 
copyAppValueForKey:identifier:container:configurationURL:] + 168
14  CoreFoundation 0x199ffe2b0 
_CFPreferencesCopyAppValueWithContainerAndConfiguration + 128
15  Heimdal0x1a5d4cb80 
init_context_from_config_file + 2732
16  Heimdal0x1a5d33944 
krb5_set_config_files + 392
17  Heimdal0x1a5d33284 
krb5_init_context_flags + 308
18  Heimdal0x1a5d33144 krb5_init_context + 
32
19  Kerberos   0x1a7fc32e8 mshim_ctx + 64
20  Kerberos   0x1a7fc16e4 
context_new_ccache_iterator + 92
21  libkrb5.3.3.dylib  0x1017accc8 
api_macos_ptcursor_next + 220
22  libkrb5.3.3.dylib  0x1017a9f0c 
krb5_cccol_cursor_next + 76
23  libkrb5.3.3.dylib  0x1017aa1f4 
krb5_cccol_have_content + 92
24  libgssapi_krb5.2.2.dylib   0x1016a1f58 acquire_cred_context 
+ 1668
25  libgssapi_krb5.2.2.dylib   0x1016
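
For context, the frames above are inside libpq's GSSAPI credential lookup,
which runs during connection startup. A hedged way to check whether that code
path is involved is to connect with GSS encryption negotiation disabled
(gssencmode is a standard libpq connection parameter in PostgreSQL 12 and
later); the host, database, and user below are placeholders:

psql 'host=localhost dbname=appdb user=webuser gssencmode=disable' -c 'SELECT 1'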

Two started cluster on the same data directory and port

2023-11-30 Thread Loles
Hi!

I have this situation on a server that is giving me problems.

postgres@hostname:~$ pg_lsclusters
Ver Cluster  Port Status Owner    Data directory              Log file
14  main     5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main.log
14  main_old 5432 online postgres /var/lib/postgresql/14/main /var/log/postgresql/postgresql-14-main_old.log

How can I stop, even delete, the main_old cluster?

Thanks!