Hi,

I've made the kernel changes that I described in my original e-mail, and I've 
set up some additional logging (both csvlog and syslog) to gather more 
information. 

/boot/loader.conf:

kern.ipc.semmni="512"
kern.ipc.semmns="1024"
kern.ipc.semume="64"
kern.ipc.semmnu="512"

/etc/sysctl.conf:

kern.ipc.shmall=262144
kern.ipc.shmmax=1073742336
kern.ipc.semmap=256

pgTune made these config changes for me in /usr/local/pgsql/data/postgresql.conf 
(the server has 4GB RAM):
default_statistics_target       = 50            # pgtune wizard 2012-08-15
maintenance_work_mem            = 240MB         # pgtune wizard 2012-08-15
constraint_exclusion            = on            # pgtune wizard 2012-08-15
checkpoint_completion_target    = 0.9           # pgtune wizard 2012-08-15
effective_cache_size            = 2816MB        # pgtune wizard 2012-08-15
work_mem                        = 24MB          # pgtune wizard 2012-08-15
wal_buffers                     = 8MB           # pgtune wizard 2012-08-15
checkpoint_segments             = 16            # pgtune wizard 2012-08-15
shared_buffers                  = 960MB         # pgtune wizard 2012-08-15
max_connections                 = 80            # pgtune wizard 2012-08-15
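
As a sanity check on the numbers above, here is a minimal sketch that compares 
the shared memory limits with shared_buffers. It assumes a 4096-byte page size 
(kern.ipc.shmall is counted in pages on FreeBSD) and it ignores the extra 
shared memory PostgreSQL requests on top of shared_buffers, so the headroom it 
reports is only an upper bound. The variable names are made up for 
illustration.

```python
# Values copied from the settings quoted in this message.
PAGE_SIZE = 4096                 # assumption: 4 KiB pages on this machine
SHMALL_PAGES = 262144            # kern.ipc.shmall (counted in pages)
SHMMAX_BYTES = 1073742336        # kern.ipc.shmmax (counted in bytes)
SHARED_BUFFERS_MB = 960          # shared_buffers from postgresql.conf

# Total shared memory the kernel allows, expressed in bytes.
shmall_bytes = SHMALL_PAGES * PAGE_SIZE

# shared_buffers expressed in bytes (PostgreSQL's MB is 1024 * 1024 bytes).
shared_buffers_bytes = SHARED_BUFFERS_MB * 1024 * 1024

# Room left in the largest allowed segment after shared_buffers alone.
# PostgreSQL needs additional shared memory beyond shared_buffers,
# which is not modelled here.
headroom_bytes = SHMMAX_BYTES - shared_buffers_bytes

if __name__ == "__main__":
    print(f"shmall  : {shmall_bytes} bytes")
    print(f"shmmax  : {SHMMAX_BYTES} bytes")
    print(f"buffers : {shared_buffers_bytes} bytes")
    print(f"headroom: {headroom_bytes} bytes "
          f"({headroom_bytes / (1024 * 1024):.1f} MiB)")
```

Under these assumptions shmall works out to exactly 1 GiB and shmmax is 512 
bytes above that, leaving roughly 64 MiB beyond shared_buffers.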

After a day, the statistics file (pg_stat_tmp/pgstat.stat) is 412 kB. I've just 
installed strace, and I'll try to capture 2-4 hours of activity and check what 
is going on. 
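
To follow how big the file is and how often it gets rewritten, something like 
the sketch below could run alongside strace. It only polls the file's size and 
modification time and prints a line whenever either changes. The path assumes 
the data directory mentioned above with the default pg_stat_tmp location, and 
the function names and polling interval are hypothetical. It has to be run as 
a user that can read the data directory.

```python
import os
import time

# Assumed location: the data directory from this thread plus the default
# pg_stat_tmp subdirectory. Adjust if the stats directory was moved.
STATS_FILE = "/usr/local/pgsql/data/pg_stat_tmp/pgstat.stat"


def snapshot(path):
    """Return (size_in_bytes, mtime_epoch_seconds), or None if the file is missing."""
    try:
        st = os.stat(path)
    except FileNotFoundError:
        return None
    return st.st_size, st.st_mtime


def watch(path, interval=1.0, samples=60):
    """Poll the file and print a timestamped line each time size or mtime changes."""
    last = None
    for _ in range(samples):
        current = snapshot(path)
        if current != last:
            stamp = time.strftime("%Y-%m-%d %H:%M:%S")
            print(f"{stamp} {path}: {current}")
            last = current
        time.sleep(interval)


if __name__ == "__main__":
    watch(STATS_FILE)
```

Comparing the output from normal operation with the output after the warnings 
start should show whether the file stops being updated or changes size 
abruptly.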

The output of both ulimit and ulimit -f is unlimited. 
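
The same file size limit can also be read programmatically. The sketch below 
uses Python's standard resource module to report RLIMIT_FSIZE, which is the 
limit behind ulimit -f. Note that it reports the limit of the process running 
the script, not of the already running postmaster, so it is only meaningful 
when started from the same environment (same user, same startup script) that 
launches PostgreSQL. The function name is made up for illustration.

```python
import resource


def describe_fsize_limit():
    """Return the soft and hard RLIMIT_FSIZE values as readable strings."""
    soft, hard = resource.getrlimit(resource.RLIMIT_FSIZE)

    def fmt(value):
        # RLIM_INFINITY is how the OS reports "unlimited".
        if value == resource.RLIM_INFINITY:
            return "unlimited"
        return f"{value} bytes"

    return fmt(soft), fmt(hard)


if __name__ == "__main__":
    soft, hard = describe_fsize_limit()
    print(f"file size limit: soft={soft}, hard={hard}")
```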

I'll be back (:-)) within a few days with the results. Thank you all for the 
information.

Regards,
Csaba

-----Original Message-----
From: Tom Lane [mailto:[email protected]] 
Sent: Wednesday, August 15, 2012 3:34 PM
To: Carl von Clausewitz
Cc: [email protected]
Subject: Re: [GENERAL] corrupted statistics file "pg_stat_tmp/pgstat.stat"

"Carl von Clausewitz" <[email protected]> writes:
> I’ve restored from TAR backup our databases, and everything looked fine. 
> Without changing any setting in postgresql.conf (or in kernel settings) – 
> only “track_counts=on”, after 2-3 days, I’m receiving huge number 
> (~5-10 PCS in every second) of error messages like that in 
> /var/log/postgresql.log:
> *** Aug 15 06:27:26 eurodb postgres[77652]: [43-1] WARNING:  corrupted 
> statistics file "pg_stat_tmp/pgstat.stat"

Huh.  The stats collector process ought to rewrite that file fairly often, so 
this suggests it's consistently failing to rewrite it.

You might take a look at what the file looks like after a day or so of normal 
operation (eg, how big is it, how often does it get updated) and then compare 
to what it looks like after the errors start.

Also, try strace'ing the stats collector process for a little while (long 
enough to capture a stats file rewrite sequence) during normal operation, and 
then again after the errors start.

I don't want to speculate too much in advance of the data, but I'm wondering 
about a ulimit setting that limits how much data the stats collector can write 
during its lifetime (ulimit -f or local equivalent).
That would eventually cause problems for any postgres process, but if you did 
accidentally have one in place when starting the postmaster, maybe the stats 
collector would be first to show symptoms.

                        regards, tom lane



