Bug#756008: postgresql-common: pg_upgradecluster 9.3 -> 9.4 fails

2015-08-19 Thread Johannes Truschnigg
This exact problem, together with an undocumented and obscure 
5-second timeout in pg_upgradecluster's invocation of `pg_ctlcluster 
stop`, nearly made me pull my hair out yesterday.


I successfully tested the planned upgrade from 9.0 to 9.4 in the staging 
environment, but because the live database requires more than 5 seconds 
to shut down properly, pg_upgradecluster died very early with a 
nondescript error message, costing me more than an hour hunting down 
the source of the problem while production was suffering a scheduled, yet 
prolonged downtime. The following was its death note:


---8<---
Stopping old cluster...
pg_ctl: server does not shut down
Error: Could not stop old cluster
---8<---

After some frantic digging, I solved the problem by temporarily 
monkey-patching /usr/bin/pg_upgradecluster to stop the old cluster as 
shown below:



---8<---
# stopping old cluster, so that we notice early when there are still
# connections
if ($info{'running'}) {
    get_encoding $version, $cluster;
    print "Stopping old cluster...\n";
    my @argv = ('pg_ctlcluster', $version, $cluster, 'stop', '-m',
        'fast', '--');

    push @argv, ('-t', '30') if $version >= '8.4';
    error "Could not stop old cluster" if system @argv;
}
---8<---

I'm not sure why the 5s timeout is there in the first place (I changed 
it to 30 to get it to work with our setup and workload), and I don't 
think the trick of passing additional argv elements after a double-dash 
down to pg_ctl is documented anywhere?! Maybe the timeout should just be 
removed instead of relying on total guesswork about how long it could 
take to stop a user's cluster (esp. in the default smart mode, 5s seems 
VERY unlikely to suffice for busy servers).
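
For anyone who would rather not patch the script, the same stop command can 
presumably be run by hand before starting the upgrade. The following is only a 
sketch: the helper name build_stop_cmd and the values 9.3, main and 30 are 
placeholders I made up for illustration, and the argument order simply mirrors 
the patched @argv above. It prints the command rather than executing it.

```shell
#!/bin/sh
# Hypothetical helper (not part of postgresql-common): print the stop
# command that corresponds to the patched @argv shown above.
# Arguments: cluster version, cluster name, timeout in seconds.
build_stop_cmd() {
    version=$1
    cluster=$2
    timeout=$3
    # "-m fast" asks for a fast shutdown; everything after "--" is
    # handed on to pg_ctl, where "-t" sets the wait timeout in seconds.
    printf '%s\n' "pg_ctlcluster $version $cluster stop -m fast -- -t $timeout"
}

# Placeholder values; adjust to the cluster being upgraded.
build_stop_cmd 9.3 main 30
```

Once the old cluster is really down, pg_upgradecluster should no longer have 
to stop it itself, so the short timeout would not come into play.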


Thanks very much for taking note of this problem, and trying to help fix it!
--
Kind regards
Johannes Truschnigg
Senior System Administrator
--
mailto:johannes.truschn...@geizhals.at (in urgent cases, please write to 
i...@geizhals.at)


Geizhals(R) - Preisvergleich Internet Services AG
Obere Donaustrasse 63/2
A-1020 Wien
Tel: +43 1 5811609/87
Fax: +43 1 5811609/55
http://geizhals.at = price comparison for Austria
http://geizhals.de = price comparison for Germany
http://geizhals.eu = price comparison EU-wide
Commercial Court Vienna | FN 197241K | registered office Vienna



Bug#756008: postgresql-common: pg_upgradecluster 9.3 -> 9.4 fails

2014-07-29 Thread Thorsten Glaser
On Mon, 28 Jul 2014, Christoph Berg wrote:

> > So, how d̲o̲ I recover from this without making things worse, now?
>
> You can just start the old cluster again. (The only thing that could
> be changed there are the port number and start.conf, but judging from
> the errors you got it died way before that point.)

OK, thank you.

Starting it first required me to *stop* the old cluster (using the init
script), judging from the “ps ax” output… apparently, there
were several Akonadi sessions still open.

I could then successfully upgrade the cluster.

The bug remains: pg_upgradecluster should not fail to upgrade the
cluster (it’s called as root, so it could just use the same mechanism
which the init script uses to stop it), and must not bring the DB into
such an inconsistent state. Apparently, this is nothing new, but my
(and others’) use of psql as KDE backend (instead of a non-database)
will trigger it more often now.

bye,
//mirabilos
-- 
tarent solutions GmbH
Rochusstraße 2-4, D-53123 Bonn • http://www.tarent.de/
Tel: +49 228 54881-393 • Fax: +49 228 54881-235
HRB 5168 (AG Bonn) • USt-ID (VAT): DE122264941
Managing directors: Dr. Stefan Barth, Kai Ebenrett, Boris Esser, Alexander Steeg





Bug#756008: postgresql-common: pg_upgradecluster 9.3 -> 9.4 fails

2014-07-28 Thread Christoph Berg
Re: Thorsten Glaser 2014-07-28 
<alpine.deb.2.11.1407280950180.8...@tglase.lan.tarent.de>
> On Fri, 25 Jul 2014, Thorsten Glaser wrote:
>
> > What am I supposed to do now? I fear doing anything wrong
> > will make the situation much worse?
>
> So, how d̲o̲ I recover from this without making things worse, now?

You can just start the old cluster again. (The only thing that could
be changed there are the port number and start.conf, but judging from
the errors you got it died way before that point.)

Christoph
-- 
c...@df7cb.de | http://www.df7cb.de/

