Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-21 Thread Magnus Hagander
  I'm not actually particularly worried about the startup time.  What's
  bothering me right at the moment, given the new-found knowledge that
  strftime() is slow on Linux, is that we're using it in elog().  At the
  time that code was written, we did it deliberately to ensure that all
  the backends would write log timestamps in the same timezone regardless
  of local SET TimeZone commands.  That's still an important
  consideration, but I wonder whether we don't now have enough timezone
  infrastructure that we could get the same results using pg_strftime.
 
 If glibc fixes the problem upstream then we can leave well 
 enough alone, but if they indicate they won't then we should 

That'll take quite a while to trickle down into the distributions even
if it's fixed, won't it? If the fix is simple, we should perhaps
consider it anyway.


 think about doing this someday.  The major problem with it 
 probably is what do you do when messages need to be emitted 
 before pgtz has been initialized?

Shouldn't be too hard, I think. We could declare a pg_tz* system_timezone
or so and initialize it to NULL. Once pgtz is initialized, we assign a
valid timezone to it, namely the startup timezone. Then in elog we
simply check whether system_timezone is NULL and, if so, fall back on the
glibc version of strftime.

It shouldn't be a performance issue if it falls back that often, because
we won't call elog very many times before pgtz is initialized, right?

//Magnus

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tony Caduto

Hi all,
I tried changing the sleep in the script to 2 seconds, but at boot it
still says [FAILED].

Even though the script reports that it failed, the db is up and running.

The system is a Compaq DL380 (2.5 GB RAM, dual 2.4 GHz Xeon) running CentOS 4.2.

I am going to install 8.1beta3 on another box with exactly the same
hardware and OS version; I will report back what happens.

Not sure what is going on. Has anyone else had this problem with CentOS
4.2 or Red Hat EL 4.2?


Thanks,

Tony Caduto
http://www.amsoftwaredesign.com
Home of PG Lightning Admin for Postgresql 8.x



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tom Lane
Tony Caduto [EMAIL PROTECTED] writes:
 I tried changing the sleep command in the script to 2, but at boot it 
 still says [FAILED].
 even though the script reports it failed, the db is up and running.

This seems to happen for some people and not others.  I've been wanting
to find out how the heck it can take multiple seconds for the postmaster
to start and create its pid-file ... that shouldn't take long at all.
Are you willing to try strace'ing the postmaster?  Modify the script
like this:

$SU -l postgres -c "strace -tt -o /tmp/strace.out $PGENGINE/postmaster -p '$PGPORT' -D '$PGDATA' ${PGOPTS} &" >> "$PGLOG" 2>&1 < /dev/null
                    ^^^^^^^^^ add this ^^^^^^^^^^

and reboot.  (After you've gotten a trace of a failing case, change it
back and reboot again.)

This is kind of invasive and may change the behavior enough that we
don't see the problem :-( --- but if you're willing to reboot a few
times in hopes of capturing a trace of a failed case, it'd be worth
trying.

regards, tom lane



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tony Caduto

Hi Tom,
I added the strace line like you said and rebooted, and it did display
[FAILED] after the reboot.
I put the resulting strace.out file on my web server; here is the
link (warning: it's pretty big):

http://www.amsoftwaredesign.com/downloads/strace.out

After the second reboot I changed the sleep from 2 to 5, and then it
worked correctly. Of course, this really slowed down the boot process.


Thanks,

Tony





Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tom Lane
Tony Caduto [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Are you willing to try strace'ing the postmaster?

 I added the strace line like you said and rebooted, it did display the 
 [FAILED] after the reboot.

Thanks for collecting the raw data.  The salient events seem to be these:

12:57:52.400888 exec() call
12:57:52.619268 completion(?) of opening shared libraries
12:57:52.657465 first call coming from our own code instead of libraries
12:57:52.902476 begin reading postgresql.conf
12:57:52.915949 done reading postgresql.conf
12:57:52.916191 begin trying to identify system timezone
12:58:01.117869 done identifying system timezone
12:58:01.131798 postmaster.pid created

In short: pg_timezone_initialize() took about 8.2 seconds out of the
total time of 8.73 seconds.

Since pg_timezone_initialize() needs to scan all of the 500-odd files
under postgresql/share/timezone/, it isn't so surprising that it would
take a little bit of time.  But 8 seconds seems like a lot.  The trace
makes it look like localtime() performs stat(/etc/localtime) on each
call, which is pretty ugly --- I wonder if there isn't some way around
that?

Anyway, the short answer is that pg_timezone_initialize ought to wait
till after we've created postmaster.pid.  There's no urgent reason to
do it earlier AFAICS.  This also explains why we didn't see a startup
problem in earlier releases --- pg_timezone_initialize didn't exist
before 8.0.

regards, tom lane



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Andrew Dunstan

Tom Lane wrote:
 In short: pg_timezone_initialize() took about 8.2 seconds out of the
 total time of 8.73 seconds.

 Since pg_timezone_initialize() needs to scan all of the 500-odd files
 under postgresql/share/timezone/, it isn't so surprising that it would
 take a little bit of time.  But 8 seconds seems like a lot.  The trace
 makes it look like localtime() performs stat(/etc/localtime) on each
 call, which is pretty ugly --- I wonder if there isn't some way around
 that?

Further data points:

I just observed this taking over 20 seconds on my clunky old pII 266.
That's really horrible. But pg_ctl -w start was able to complete in
about 2 seconds.

Even on my much faster laptop the timezone lib startup took 3 or 4
seconds (and pg_ctl -w start came back in about 1 second).


cheers

andrew



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 In short: pg_timezone_initialize() took about 8.2 seconds out of the
 total time of 8.73 seconds.

 Further data points:

 I just observed this taking over 20 seconds on my clunky old pII 266. 
 That's really horrible. But  pg_ctl -w start was able to complete in 
 about 2 seconds.

Yeah.  I've been experimenting here, and it's clear that strace itself
adds huge overhead --- on my machine, postmaster start is normally well
under a second, but strace'ing it brings it to about 8 seconds.  No
doubt that's because of all the stat(/etc/localtime) calls it has to
trace.

So there's some Heisenberg effect here.  However, I don't think there
can be much doubt that on a machine that is just booting (and has
surely got none of these files in cache) the search through
share/postgresql/timezone could take a few seconds.  Hindsight is
always 20/20 ;-)

regards, tom lane



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Andrew Dunstan

Tom Lane wrote:
 So there's some Heisenberg effect here.  However, I don't think there
 can be much doubt that on a machine that is just booting (and has
 surely got none of these files in cache) the search through
 share/postgresql/timezone could take a few seconds.  Hindsight is
 always 20/20 ;-)

Something is surely wrong in the timezone lib, though:

[EMAIL PROTECTED] inst]$ grep /etc/localtime strace.out | wc -l
38073


cheers

andrew





Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tom Lane
Andrew Dunstan [EMAIL PROTECTED] writes:
 Something is surely wrong in the timezone lib, though:

[ digs in glibc sources for a while... ]

The test loop in score_timezone() calls both localtime() and strftime()
for each probe point, and in glibc strftime() calls tzset(), which the
source code claims is required by POSIX.  The explicit tzset() call is
what's forcing the recheck of /etc/localtime.

Possibly the glibc boys would listen to a suggestion that strftime()
need not force the file recheck, but my experience with them is that
they're relatively impervious to suggestions :-(

I'm not actually particularly worried about the startup time.  What's
bothering me right at the moment, given the new-found knowledge that
strftime() is slow on Linux, is that we're using it in elog().  At the
time that code was written, we did it deliberately to ensure that all
the backends would write log timestamps in the same timezone regardless
of local SET TimeZone commands.  That's still an important
consideration, but I wonder whether we don't now have enough timezone
infrastructure that we could get the same results using pg_strftime.

regards, tom lane



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue and sleep

2005-10-20 Thread Tom Lane
I wrote:
 Possibly the glibc boys would listen to a suggestion that strftime()
 need not force the file recheck, but my experience with them is that
 they're relatively impervious to suggestions :-(

I've filed a bug for this:
https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=171351
so no need for everyone else to do it too ...

 I'm not actually particularly worried about the startup time.  What's
 bothering me right at the moment, given the new-found knowledge that
 strftime() is slow on Linux, is that we're using it in elog().  At the
 time that code was written, we did it deliberately to ensure that all
 the backends would write log timestamps in the same timezone regardless
 of local SET TimeZone commands.  That's still an important
 consideration, but I wonder whether we don't now have enough timezone
 infrastructure that we could get the same results using pg_strftime.

If glibc fixes the problem upstream then we can leave well enough alone,
but if they indicate they won't then we should think about doing this
someday.  The major problem with it probably is what do you do when
messages need to be emitted before pgtz has been initialized?

regards, tom lane



Re: [HACKERS] 8.04 and RedHat/CentOS init script issue

2005-10-19 Thread Devrim GUNDUZ

Hi,

On Tue, 18 Oct 2005, Tony Caduto wrote:

 I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2,
 and while booting the init script reports the daemon as [FAILED], but
 after I log on it shows the postmaster running and I am able to connect
 from any client remotely.

 I made no modifications to the script and there is nothing out of the
 ordinary in the log.


Hmm. In the 8.0.4 RPM init scripts we were using a 1-second sleep
(see the "sleep 1" line in the init script). In some cases, when the
system is slow, the script reports a startup failure even though the
server actually started.


In the 8.1 RPMs the sleep time was increased to 2 seconds, which we
believe avoids the problem you've reported:


http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgsqlrpms/patches/8.1/postgresql.init?rev=1.2content-type=text/x-cvsweb-markup

So please increase this sleep time and give it another try.

Regards,
--
Devrim GUNDUZ
Kivi Bilişim Teknolojileri - http://www.kivi.com.tr
devrim~gunduz.org, devrim~PostgreSQL.org, devrim.gunduz~linux.org.tr
  http://www.gunduz.org


[HACKERS] 8.04 and RedHat/CentOS init script issue

2005-10-18 Thread Tony Caduto

Hi,
I installed 8.0.4 via RPM on CentOS 4.2, which is the same as RedHat 4.2,
and while booting the init script reports the daemon as [FAILED], but
after I log on it shows the postmaster running and I am able to connect
from any client remotely.

I made no modifications to the script and there is nothing out of the
ordinary in the log.


Thanks,

Tony
