Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Florian Weimer
* henk de wit:

 On this table we're inserting records with a relatively low
 frequency of +- 6~10 per second. We're using PG 8.3.1 on a machine
 with two dual core 2.4Ghz XEON CPUs, 16 GB of memory and Debian
 Linux. The machine is completely devoted to PG, nothing else runs on
 the box.

Have you disabled the OOM killer?

-- 
Florian Weimer [EMAIL PROTECTED]
BFK edv-consulting GmbH   http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe fax: +49-721-96201-99



Re: [PERFORM] control the number of clog files and xlog files

2008-08-27 Thread Duan Ligong
Alvaro, thanks for your answer.
It is very helpful.

  Would you be so kind as to answer the following questions:
  
  - Is there any way to control the number of clog files and xlog files?
  I encounter an issue where there are too many clog files under the
  pg_clog/ directory, which occupy more space than I can afford.
 
 pg_clog files are controlled by tuple freezing, which is done by vacuum,
 and it depends on the autovacuum_freeze_max_age and
 vacuum_freeze_min_age parameters.  Please read

So can we reduce the number of clog files by increasing the
autovacuum_freeze_max_age and vacuum_freeze_min_age parameters?

 http://www.postgresql.org/docs/8.3/interactive/routine-vacuuming.html
 and
 http://www.postgresql.org/docs/8.3/interactive/runtime-config-client.html#GUC-VACUUM-FREEZE-MIN-AGE

  - What determines the number of clog files? What determines the
  number of xlog files?
 
 The number of xlog files will depend on checkpoints.  You need to
 restrict checkpoint_segments to control this.  Note that this can have a
 serious performance impact.
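
As a point of reference, the 8.3 documentation bounds steady-state pg_xlog
at roughly (2 + checkpoint_completion_target) * checkpoint_segments + 1
segment files of 16 MB each.  A quick Python sketch of that bound (both
settings shown at their 8.3 defaults):

    # Approximate pg_xlog ceiling per the 8.3 WAL-configuration docs.
    import math

    checkpoint_segments = 3              # 8.3 default
    checkpoint_completion_target = 0.5   # 8.3 default

    files = math.ceil((2 + checkpoint_completion_target)
                      * checkpoint_segments + 1)
    print(files, "files,", files * 16, "MB")   # 9 files, 144 MB
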
  -  I understand pg_xlog is used to record WAL, but what is pg_clog
  used for? Is it used to record some meta-information about the xlog?
 
 clog is the commit log, i.e. it records transactions that have been
 committed and those that have been aborted.  You cannot delete files
 unless you want to corrupt your database.
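
As a rough illustration of why the freeze settings drive pg_clog size: clog
stores two status bits per transaction, and each segment file is 256 KB, so
one file covers about a million transactions.  A minimal Python sketch of
that arithmetic (constants assume a stock 8.3 build with 8 KB blocks; the
horizon value below is only an example):

    # Back-of-the-envelope pg_clog sizing: 2 status bits per transaction,
    # 256 KB per segment file, so ~1M transactions per file.
    SEGMENT_BYTES = 256 * 1024
    XACTS_PER_SEGMENT = SEGMENT_BYTES * 8 // 2        # ~1,048,576

    freeze_horizon = 100_000_000   # e.g. vacuum_freeze_min_age (example value)
    files = freeze_horizon // XACTS_PER_SEGMENT
    print(f"~{files} clog files, ~{files * SEGMENT_BYTES / 2**20:.0f} MB retained")

So a 100M-transaction horizon keeps on the order of 95 files (about 24 MB);
lowering the horizon shrinks that proportionally.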

Could you explain roughly how the clog files work?
(What is inside the clog files? When and how are new clog files
created? When and in what cases are old files deleted or
rotated? How does PostgreSQL decide a file is old enough to be
deleted? Will vacuum definitely cause old clog files to be deleted
and new ones to be created?)
 
  - What effect does deleting the clog and xlog files have?
  Will it cause PostgreSQL to stop abnormally?
 Your data will be corrupt.  It may continue to work for a while, and
 suddenly stop working at a future time.

I encountered a scenario where there are many files, some of them
as old as one month. Are all these files, including the old ones,
still useful to PostgreSQL? When will they be deleted
or rotated? Or should they be deleted and maintained by external
programs?

Best regards
Duan

 -- 
 Alvaro Herrera          http://www.CommandPrompt.com/
 The PostgreSQL Company - Command Prompt, Inc.
 


---
Duan Ligong
: 8-0086-22-354





Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Bill Moran
In response to henk de wit [EMAIL PROTECTED]:

  What do your various logs (pgsql, application, etc...) have to say?
 
 There
 is hardly anything helpful in the pgsql log. The application log
 doesn't mention anything either. We log a great deal of information in
 our application, but there's nothing out of the ordinary there,
 although there's of course always a chance that somewhere we missed
 something.

There should be something in a log somewhere.  Someone suggested the oom
killer might be getting you, if so there should be something in one of
the system logs.

If you can't find anything, then you need to beef up your logs.  Try
increasing the amount of stuff that gets logged by PG by tweaking the
postgresql.conf settings.  Then run iostat, vmstat and top in an endless
loop, dumping their output to files (I recommend you run date(1) in between
each run, otherwise you can't correlate the output to the time of
occurrence ;)
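
For the record, a minimal Python version of that monitoring loop (iostat and
vmstat assumed installed, e.g. from the sysstat package; filename and
interval are arbitrary):

    # Timestamped system snapshots, appended to a file so they can later
    # be lined up with the time of the I/O error.
    import subprocess, time

    with open("sysmon.log", "a") as log:
        while True:
            log.write(time.strftime("=== %Y-%m-%d %H:%M:%S ===\n"))
            for cmd in (["iostat"], ["vmstat"], ["top", "-b", "-n", "1"]):
                out = subprocess.run(cmd, capture_output=True, text=True)
                log.write(out.stdout)
            log.flush()
            time.sleep(60)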

While you've got all this extra logging going and you're waiting for the
problem to happen again, do an audit of your postgresql.conf settings for
memory usage and see if they actually add up.  How much RAM does the
system have?  How much of it is free?  How much of that are you eating
with shared_buffers?  How much work_mem did you tell PG it has?  Have
you told PG that it has more memory than the machine actually has?
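
The audit itself is simple arithmetic; a sketch with placeholder numbers
(substitute your actual postgresql.conf values -- apart from the 16 GB of
RAM, nothing below comes from the original post):

    # Worst-case memory plan vs. physical RAM.  work_mem can be used
    # several times per query, so this is still an optimistic bound.
    ram_mb            = 16 * 1024   # the 16 GB machine from the report
    shared_buffers_mb = 4 * 1024    # hypothetical setting
    work_mem_mb       = 32          # hypothetical setting
    max_connections   = 200         # hypothetical setting

    worst_case = shared_buffers_mb + max_connections * work_mem_mb
    print(f"~{worst_case} MB of {ram_mb} MB",
          "(overcommitted!)" if worst_case > ram_mb else "(fits)")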

I've frequently recommended installing pg_buffercache and using mrtg
or something similar to graph various values from it and other easily
accessible statistics in PG and the operating system.  The overhead of
collecting and graphing those values is minimal, and having the data
from those graphs can often be the little red arrow that points you to
the solution to problems like these.  Not to mention the historical
data generally tells you months ahead of time when you're going to
need to scale up to bigger hardware.
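
A minimal sketch of such a collector (assumes the contrib pg_buffercache
module is installed and psycopg2 is available; the DSN and interval are
placeholders):

    # Periodically sample buffer-cache usage; pipe the numbers into mrtg,
    # rrdtool, or whatever graphs them.
    import psycopg2, time

    conn = psycopg2.connect("dbname=mydb")   # hypothetical DSN
    conn.autocommit = True
    cur = conn.cursor()
    while True:
        cur.execute("SELECT sum(CASE WHEN isdirty THEN 1 ELSE 0 END), "
                    "count(*) FROM pg_buffercache")
        dirty, total = cur.fetchone()
        print(time.strftime("%H:%M:%S"), f"dirty={dirty} buffers={total}")
        time.sleep(300)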

On a side note, what version of PG are you using?  If it was in a
previous email, I missed it.

Hope this helps.

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/



Re: [PERFORM] control the number of clog files and xlog files

2008-08-27 Thread Alvaro Herrera
Duan Ligong wrote:
 Alvaro, thanks for your answer.
 It is very helpful.
 
   Would you be so kind as to answer the following questions:
   
   - Is there any way to control the number of clog files and xlog files?
   I encounter an issue where there are too many clog files under the
   pg_clog/ directory, which occupy more space than I can afford.
  
  pg_clog files are controlled by tuple freezing, which is done by vacuum,
  and it depends on the autovacuum_freeze_max_age and
  vacuum_freeze_min_age parameters.  Please read
 
 So can we reduce the number of clog files by increasing the
 autovacuum_freeze_max_age and vacuum_freeze_min_age parameters?

Yes, but by decreasing the values, not increasing them.

Sorry, you ask more questions than I have time to answer right now.

 I encountered a scenario where there are many files, some of them
 as old as one month. Are all these files, including the old ones,
 still useful to PostgreSQL? When will they be deleted
 or rotated? Or should they be deleted and maintained by external
 programs?

Yes, those files are still useful.  They will be deleted eventually.

-- 
Alvaro Herrera          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread DANIEL CRISTIAN CRUZ
Maybe strace could help you find the problem, but it could cause
significant overhead...

Bill Moran [EMAIL PROTECTED] wrote:
 ...
--
Daniel Cristian Cruz
Database Administrator
Regional Directorate - Information Technology Center
SENAI - SC
Telephone: 48-3239-1422 (extension 1422)






[PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Jerry Champlin
Is there a way to use multi-level inheritance to achieve sub-partitioning
that the query optimizer will recognize?  With our current
application design, we would need a partition for every other day for
18 months, which will not perform well.  The reason we need so many
partitions is that we can't afford to vacuum the active partition (750MM
inserts + updates per day is the performance requirement for 12 months
out).  After a partition is a day old, there are no longer any updates or
inserts, and we can vacuum it at that point.  If multi-level partitioning
worked, we could solve this problem without changing our code.  Ideas?


-Jerry





Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Tom Lane
Jerry Champlin [EMAIL PROTECTED] writes:
 Is there a way to use multi-level inheritance to achieve sub 
 partitioning that the query optimizer will recognize?

No, I don't think so.  How would that make things any better anyway?
You're still going to end up with the same very large number of
partitions.

regards, tom lane



Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Matthew Wakeling

On Wed, 27 Aug 2008, Jerry Champlin wrote:
After it's a day old, there are no longer any updates or inserts and we 
can vacuum it at that point.


A pattern that has worked very well for other people is to have two 
separate tables (or partitions). One contains today's data, and the other 
contains historic data that is no longer updated. Once a day, transfer the 
data between the partitions, and the historic data partition will not need 
vacuuming.
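
For example, a minimal sketch of the daily hand-off (psycopg2; the table
names and DSN are hypothetical).  Because both statements run in one
transaction, other sessions see the move happen atomically at commit:

    import psycopg2

    conn = psycopg2.connect("dbname=mydb")   # hypothetical DSN
    cur = conn.cursor()
    # Move yesterday's rows into the never-updated historic partition...
    cur.execute("INSERT INTO events_historic SELECT * FROM events_today")
    # ...and empty the active table (TRUNCATE is transactional in PostgreSQL).
    cur.execute("TRUNCATE events_today")
    conn.commit()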


Some changes to your code will be needed however.

Matthew

--
Vacuums are nothings. We only mention them to let them know we know
they're there.



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Shane Ambler

Bill Moran wrote:

On a side note, what version of PG are you using?  If it was in a 
previous email, I missed it.



He mentioned 8.3.1 in the first email.
Although nothing stands out in the 8.3.2 or 8.3.3 fix list (without
knowing his table structure or any contrib modules used), I wonder if
one of them may resolve his issue.

I also wonder if the error is actually sent back from postgresql or
whether jdbc is throwing the exception because of a timeout waiting for
a response. I would think that with the table in use having 22GB of data
and 13GB of indexes, the long-running query has a chance of creating a
delay on the connections that is long enough to give jdbc the impression
that the server isn't responding - generating a misleading "An I/O
error" (meaning we know the server got the request but the response from
the server isn't coming back).

Can you increase the timeout settings on the insert connections that are
failing?




--

Shane Ambler
pgSQL (at) Sheeky (dot) Biz

Get Sheeky @ http://Sheeky.Biz



Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Jerry Champlin
If it were implemented in such a way that when the top-level pruning
happens, a set of 3 sub-partitions is selected from say 18 total, and then
at the next level it selects the 3 matching sub-partitions from each matched
group of 30, then you are only looking at 18 + 3*30 = 108 checks instead of
548 (the example assumes monthly first-level partitioning and daily
sub-partitioning).  If this is not supported, then we will need to solve the
problem a different way - probably weekly partitions, and refactoring the
code to decrease updates by at least an order of magnitude.  While we are in
the process of doing this, is there a way to make updates faster?  Postgresql
is spending a lot of CPU cycles for each HOT update.  We have
synchronous_commit turned off, commit_siblings set to 5, and commit_delay set
to 50,000.  With synchronous_commit off, does it make any sense to be grouping
commits?  Buffers written by the bgwriter vs. by checkpoints is 6 to 1.
Buffers written by clients vs. by checkpoints is 1 to 6.  Is there anything
obvious here?
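
To spell out the check-count arithmetic being proposed:

    # Two-level pruning vs. flat daily partitions over 18 months.
    month_partitions = 18
    matched_months = 3
    days_checked_per_matched_month = 30

    flat = 548    # roughly one partition per day for 18 months
    two_level = (month_partitions
                 + matched_months * days_checked_per_matched_month)
    print(flat, "vs", two_level)    # 548 vs 108 checks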

-Jerry


-Original Message-
From: Tom Lane [mailto:[EMAIL PROTECTED] 
Sent: Wednesday, August 27, 2008 8:02 AM
To: Jerry Champlin
Cc: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Is there a way to SubPartition? 

Jerry Champlin [EMAIL PROTECTED] writes:
 Is there a way to use multi-level inheritance to achieve sub 
 partitioning that the query optimizer will recognize?

No, I don't think so.  How would that make things any better anyway?
You're still going to end up with the same very large number of
partitions.

regards, tom lane





Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Alvaro Herrera
Jerry Champlin wrote:
 If it were implemented in such a way that when the top-level pruning
 happens, a set of 3 sub-partitions is selected from say 18 total, and then
 at the next level it selects the 3 matching sub-partitions from each matched
 group of 30, then you are only looking at 18 + 3*30 = 108 checks instead of
 548 (the example assumes monthly first-level partitioning and daily
 sub-partitioning).  If this is not supported, then we will need to solve the
 problem a different way - probably weekly partitions, and refactoring the
 code to decrease updates by at least an order of magnitude.  While we are
 in the process of doing this, is there a way to make updates faster?
 Postgresql is spending a lot of CPU cycles for each HOT update.  We have
 synchronous_commit turned off, commit_siblings set to 5, and commit_delay
 set to 50,000.

Perhaps you do not realize this, but this is an exciting report to read.
Not many years ago, this kind of system would have been unthinkable.
We've now tuned the system so that people are starting to consider it,
and for a lot of people it is working.

-- 
Alvaro Herrera          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] Is there a way to SubPartition?

2008-08-27 Thread Tom Lane
Jerry Champlin [EMAIL PROTECTED] writes:
 We have synchronous_commit turned off, commit siblings set to 5,
 commit_delay set to 50,000.  With synchronous_commit off does it make
 any sense to be grouping commits?

No.  In fact commit_delay is a total no-op in that mode.  If it were
doing anything I think you'd have found that to be a counterproductively
large setting ...

regards, tom lane



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread david

On Wed, 27 Aug 2008, Florian Weimer wrote:


* henk de wit:


On this table we're inserting records with a relatively low
frequency of +- 6~10 per second. We're using PG 8.3.1 on a machine
with two dual core 2.4Ghz XEON CPUs, 16 GB of memory and Debian
Linux. The machine is completely devoted to PG, nothing else runs on
the box.


Have you disabled the OOM killer?


my understanding of the OOM killer is that 'disabling' it is disabling 
memory overcommit, making it impossible for you to get into a situation 
where the OOM killer would activate, but this means that any load that 
would have triggered the OOM killer will always start getting memory 
allocation errors before that point.


the OOM killer exists because there are many things that can happen on a
system that allocate memory that 'may' really be needed, but also 'may
not' really be needed.


for example, if you have a process that uses 1G of ram (say firefox) and it
needs to start a new process (say acroread to handle a pdf file), what it
does is fork the firefox process (each copy of which has 1G of ram
allocated), and then exec the acroread process (releasing the
1G of ram previously held by that copy of the firefox process).


with memory overcommit enabled (the default), the kernel recognises that
most programs that fork don't write to all the memory they have allocated,
so it marks the 1G of ram that firefox uses as read-only, and if either
copy of firefox writes to a page of memory it splits that page into
separate copies for the separate processes (and if at this time it runs out
of memory it invokes the OOM killer to free some space).  when firefox does
an exec almost immediately after the fork, it touches basically none of the
pages, so the process only uses 1G of ram total.


if memory overcommit is disabled, the kernel checks to see if you have an
extra 1G of ram available; if you do, it allows the process to continue, and
if you don't, it tries to free memory (by throwing away cache, swapping to
disk, etc), and if it can't free enough it will return a memory
allocation error (which I believe will cause firefox to exit).



so you can avoid the OOM killer, but the cost of doing so is that you
make far less efficient use of your ram overall.
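
For reference, a small Python sketch of inspecting and (as root) switching
the policy being discussed; the /proc paths are the standard Linux sysctl
files:

    # vm.overcommit_memory: 0 = heuristic overcommit (the default being
    # debated here), 1 = always overcommit, 2 = strict accounting, i.e.
    # the "overcommit disabled" mode that keeps the OOM killer at bay.
    def overcommit_mode():
        with open("/proc/sys/vm/overcommit_memory") as f:
            return int(f.read())

    print("vm.overcommit_memory =", overcommit_mode())

    # To disable overcommit (root only; same as `sysctl vm.overcommit_memory=2`):
    # with open("/proc/sys/vm/overcommit_memory", "w") as f:
    #     f.write("2")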


David Lang



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Andrew Sullivan
On Wed, Aug 27, 2008 at 02:45:47PM -0700, [EMAIL PROTECTED] wrote:

 with memory overcommit enabled (the default), the kernel recognises that 
 most programs that fork don't write to all the memory they have
 allocated, 

It doesn't recognise it; it hopes it.  It happens to hope
correctly in many cases, because you're quite right that many programs
don't actually need all the memory they allocate.  But there's nothing
about the allocation that hints, "By the way, I'm not really planning
to use this."  Also. . .

 separate copies for the separate processes (and if at this time it runs out
 of memory it invokes the OOM killer to free some space),

. . .it kills processes that are using a lot of memory.  Those are not
necessarily the processes that are allocating memory they don't need.

The upshot of this is that postgres tends to be a big target for the
OOM killer, with seriously bad effects to your database.  So for good
Postgres operation, you want to run on a machine with the OOM killer
disabled.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread david

On Wed, 27 Aug 2008, Andrew Sullivan wrote:


On Wed, Aug 27, 2008 at 02:45:47PM -0700, [EMAIL PROTECTED] wrote:


with memory overcommit enabled (the default), the kernel recognises that
most programs that fork don't write to all the memory they have
allocated,


It doesn't recognise it; it hopes it.  It happens to hope
correctly in many cases, because you're quite right that many programs
don't actually need all the memory they allocate.  But there's nothing
about the allocation that hints, "By the way, I'm not really planning
to use this."  Also. . .


Ok, I meant to say it recognises that a common pattern is to
not use the memory, and so it...



separate copies for the separate processes (and if at this time it runs out
of memory it invokes the OOM killer to free some space),


. . .it kills processes that are using a lot of memory.  Those are not
necessarily the processes that are allocating memory they don't need.


the behavior of the OOM killer has changed over time; so far nobody has
been able to come up with a 'better' strategy for it to follow.



The upshot of this is that postgres tends to be a big target for the
OOM killer, with seriously bad effects to your database.  So for good
Postgres operation, you want to run on a machine with the OOM killer
disabled.


I disagree with you. I think good Postgres operation is so highly
dependent on caching as much data as possible that disabling overcommit
(and throwing away a lot of memory that could be used for cache) is a
solution that's as bad as or worse than the problem it's trying to solve.


I find that adding a modest amount of swap to the system and leaving
overcommit enabled works better for me; if the system starts swapping I
have a chance of noticing and taking action, but it will ride out small
overloads. but the biggest thing is that it's no more
acceptable for me to have other programs on the box failing due to memory
allocation errors, and those will be much more common with overcommit
disabled than the OOM killer would be with it enabled.


David Lang



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Alvaro Herrera
[EMAIL PROTECTED] wrote:
 On Wed, 27 Aug 2008, Andrew Sullivan wrote:

 separate copies for the separate processes (and if at this time it runs out
 of memory it invokes the OOM killer to free some space),

 . . .it kills processes that are using a lot of memory.  Those are not
 necessarily the processes that are allocating memory they don't need.

 the behavior of the OOM killer has changed over time; so far nobody has
 been able to come up with a 'better' strategy for it to follow.

The problem with OOM killer for Postgres is that it tends to kill the
postmaster.  That's really dangerous.  If it simply killed a backend
then it wouldn't be so much of a problem.

Some time ago I found that it was possible to fiddle with a /proc entry
to convince the OOM killer not to touch the postmaster.  A postmaster with
the raw IO capability bit set would be skipped by the OOM killer too (this
is an Oracle tweak AFAIK).

These are tricks that people could use in their init scripts to protect
themselves.
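
A sketch of what such an init-script hack might look like on 2.6-era
kernels, where writing -17 (OOM_DISABLE) to /proc/<pid>/oom_adj exempts a
process; the pid-file path below is only a guess at a Debian-style layout:

    # Exempt the postmaster from the OOM killer.  Must run as root, after
    # the postmaster has started.
    PIDFILE = "/var/run/postgresql/8.3-main.pid"   # hypothetical location

    with open(PIDFILE) as f:
        pid = int(f.read().split()[0])
    with open(f"/proc/{pid}/oom_adj", "w") as f:
        f.write("-17")

Note that oom_adj is inherited across fork, so every backend started after
this runs would be exempt as well, which may or may not be what you want.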

(I wonder if the initscript supplied by the RPMs or Debian should
contain such a hack.)

-- 
Alvaro Herrera          http://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Tom Lane
Alvaro Herrera [EMAIL PROTECTED] writes:
 Some time ago I found that it was possible to fiddle with a /proc entry
 to convince the OOM killer not to touch the postmaster.  A postmaster with
 the raw IO capability bit set would be skipped by the OOM killer too (this
 is an Oracle tweak AFAIK).
 These are tricks that people could use in their init scripts to protect
 themselves.

Yeah?  Details please?  Does the bit get inherited by child processes?

 (I wonder if the initscript supplied by the RPMs or Debian should
 contain such a hack.)

It would certainly make sense for my RHEL/Fedora-specific packages,
since those are targeting a very limited range of kernel versions.
Not sure about the situation for other distros.

regards, tom lane



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-27 Thread Tom Lane
[EMAIL PROTECTED] writes:
 On Wed, 27 Aug 2008, Andrew Sullivan wrote:
 The upshot of this is that postgres tends to be a big target for the
 OOM killer, with seriously bad effects to your database.  So for good
 Postgres operation, you want to run on a machine with the OOM killer
 disabled.

 I disagree with you.

Actually, the problem with Linux' OOM killer is that it
*disproportionately targets the PG postmaster*, on the basis not of
memory that the postmaster is using but of memory its child processes
are using.  This was discussed in the PG archives a few months ago;
I'm too lazy to search for the link right now, but the details and links
to confirming kernel documentation are in our archives.

This is one hundred percent antithetical to the basic design philosophy
of Postgres, which is that no matter how badly the child processes screw
up, the postmaster should live to fight another day.  The postmaster
basically exists to restart things after children die ungracefully.
If the OOM killer takes out the postmaster itself (rather than the child
that was actually eating the unreasonable amount of memory), we have no
chance of recovering.

So, if you want a PG installation that is as robust as it's designed to
be, you *will* turn off Linux' OOM killer.  Otherwise, don't complain to
us when your database unexpectedly stops responding.

(Alternatively, if you know how an unprivileged userland process can
defend itself against such exceedingly brain-dead kernel policy, we are
all ears.)

regards, tom lane

PS: I think this is probably unrelated to the OP's problem, since he
stated there was no sign of any problem from the database server's
side.
