Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

The OOM killer is a terrible idea for any serious database server.  I wrote a 
detailed technical paper on this almost 15 years ago when Silicon Graphics had 
this same feature, and Oracle and other critical server processes couldn't be 
made reliable.

The problem with overallocating memory as Linux does by default is that EVERY 
application, no matter how well designed and written, becomes unreliable: It can be 
killed because of some OTHER process.  You can be as clever as you like, and do all the 
QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL 
be unreliable if you run it on a Linux system that allows overcommitted memory.

IMHO, all Postgres servers should run with memory-overcommit disabled.  On 
Linux, that means  /proc/sys/vm/overcommit_memory=2.
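For reference, a minimal sketch of how that setting is applied on a Linux box (the overcommit_ratio value shown is the kernel default; check your distribution before relying on these exact paths):

```shell
# Disable memory overcommit immediately (strict accounting mode).
echo 2 > /proc/sys/vm/overcommit_memory

# Optionally tune how much of RAM counts toward the commit limit
# (commit limit = swap + overcommit_ratio% of RAM; 50 is the default).
echo 50 > /proc/sys/vm/overcommit_ratio

# To make it persistent across reboots, put these in /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.overcommit_ratio = 50
sysctl -p /etc/sysctl.conf
```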

Craig

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Wed, 27 Aug 2008, Craig James wrote:

The OOM killer is a terrible idea for any serious database server.  I wrote a 
detailed technical paper on this almost 15 years ago when Silicon Graphics 
had this same feature, and Oracle and other critical server processes 
couldn't be made reliable.


The problem with overallocating memory as Linux does by default is that 
EVERY application, no matter how well designed and written, becomes 
unreliable: It can be killed because of some OTHER process.  You can be as 
clever as you like, and do all the QA possible, and demonstrate that there 
isn't a single bug in Postgres, and it will STILL be unreliable if you run it 
on a Linux system that allows overcommitted memory.


IMHO, all Postgres servers should run with memory-overcommit disabled.  On 
Linux, that means  /proc/sys/vm/overcommit_memory=2.


it depends on how much stuff you allow others to run on the box. if you 
have no control of that then yes, the box is unreliable (but it's not just 
because of the OOM killer, it's because those other users can eat up all 
the other box resources as well: CPU, network bandwidth, disk bandwidth, 
etc.)


even with overcommit disabled, the only way you can be sure that a program 
will not fail is to make sure that it never needs to allocate memory. with 
overcommit off you could have one program that eats up 100% of your ram 
without failing (handling the error on memory allocation such that it 
doesn't crash), but which will cause _every_ other program on the system 
to fail, including any scripts (because every command executed will 
require forking, and without overcommit that will require allocating the 
total memory that your shell has allocated just so that it can run a trivial 
command, like the ps or kill that you are trying to use to fix the problem).


if you have a box with unpredictable memory use, disabling overcommit will 
not make it reliable. it may make it less unreliable (the fact that the 
linux OOM killer will pick one of the worst possible processes to kill is 
a problem), but less unreliable is not the same as reliable.


it's also not that hard to have a process monitor the postmaster (along 
with other box resources) to restart it if it is killed, at some point you 
can get init to watch your watchdog and the OOM killer will not kill init. 
so while you can't prevent the postmaster from being killed, you can setup 
to recover from it.
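A minimal illustration of the init-supervised watchdog described above (the script path, pidfile locations, and respawn entry are hypothetical, not something from this thread):

```shell
# /etc/inittab entry: init respawns the watchdog if it ever dies,
# and the OOM killer will not kill init itself.
#   pg:2345:respawn:/usr/local/bin/pg_watchdog
#
# A trivial pg_watchdog sketch: restart the postmaster if it is gone.
while true; do
    if ! pgrep -x postmaster > /dev/null; then
        su - postgres -c '/usr/local/pgsql/bin/pg_ctl start -D /var/lib/pgsql/data'
    fi
    sleep 10
done
```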


David Lang



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Tom Lane wrote:


[EMAIL PROTECTED] writes:

On Wed, 27 Aug 2008, Andrew Sullivan wrote:

The upshot of this is that postgres tends to be a big target for the
OOM killer, with seriously bad effects to your database.  So for good
Postgres operation, you want to run on a machine with the OOM killer
disabled.



I disagree with you.


Actually, the problem with Linux' OOM killer is that it
*disproportionately targets the PG postmaster*, on the basis not of
memory that the postmaster is using but of memory its child processes
are using.  This was discussed in the PG archives a few months ago;
I'm too lazy to search for the link right now, but the details and links
to confirming kernel documentation are in our archives.

This is one hundred percent antithetical to the basic design philosophy
of Postgres, which is that no matter how badly the child processes screw
up, the postmaster should live to fight another day.  The postmaster
basically exists to restart things after children die ungracefully.
If the OOM killer takes out the postmaster itself (rather than the child
that was actually eating the unreasonable amount of memory), we have no
chance of recovering.

So, if you want a PG installation that is as robust as it's designed to
be, you *will* turn off Linux' OOM killer.  Otherwise, don't complain to
us when your database unexpectedly stops responding.

(Alternatively, if you know how an unprivileged userland process can
defend itself against such exceedingly brain-dead kernel policy, we are
all ears.)


there are periodic flamefests on the kernel mailing list over the OOM 
killer; if you can propose a better algorithm for it to use than the 
current one, one that doesn't end up being just as bad for some other 
workload, the kernel policy can be changed.


IIRC the reason why it targets the parent process is to deal with a 
fork-bomb type of failure where a program doesn't use much memory itself, 
but forks off memory hogs as quickly as it can. if the OOM killer only 
kills the children the problem never gets solved.


I assume that the postmaster process is monitoring the back-end processes 
by being their parent. is there another way that this monitoring could 
be done so that the back-end processes become independent of the 
monitoring tool after they are started (the equivalent of nohup)?


while this approach to monitoring may not be as quick to react as a wait 
for a child exit, it may be worth doing if it makes the postmaster not be 
the prime target of the OOM killer when things go bad on the system.



regards, tom lane

PS: I think this is probably unrelated to the OP's problem, since he
stated there was no sign of any problem from the database server's
side.


agreed.

David Lang




[PERFORM] indexing for distinct search in timestamp based table

2008-08-28 Thread Rainer Mager
I'm looking for some help in speeding up searches. My table is pretty simple
(see below), but somewhat large, and continuously growing. Currently it has
about 50 million rows.

The table is (I know I have excessive indexes, I'm trying to get the
appropriate ones and drop the extras):
                                  Table "public.ad_log"
    Column    |            Type             |                         Modifiers
--------------+-----------------------------+------------------------------------------------------------
 ad_log_id    | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)
 channel_name | text                        | not null
 player_name  | text                        | not null
 ad_name      | text                        | not null
 start_time   | timestamp without time zone | not null
 end_time     | timestamp without time zone | not null
Indexes:
    "ad_log_pkey" PRIMARY KEY, btree (ad_log_id)
    "ad_log_channel_name_key" UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time)
    "ad_log_ad_and_start" btree (ad_name, start_time)
    "ad_log_ad_name" btree (ad_name)
    "ad_log_all" btree (channel_name, player_name, start_time, ad_name)
    "ad_log_channel_name" btree (channel_name)
    "ad_log_end_time" btree (end_time)
    "ad_log_player_and_start" btree (player_name, start_time)
    "ad_log_player_name" btree (player_name)
    "ad_log_start_time" btree (start_time)



The query I'm trying to speed up is below. In it, the 'field' placeholder can be one
of channel_name, player_name, or ad_name. I'm actually trying to return the
distinct values, and I found GROUP BY to be slightly faster than using
DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses,
in which case we use '%', but it seems Postgres optimizes that pretty well.

SELECT field FROM ad_log 
WHERE channel_name LIKE :channel_name
AND player_name LIKE :player_name 
AND ad_name LIKE :ad_name 
AND start_time BETWEEN :start_date AND (date(:end_date) + 1)
GROUP BY field ORDER BY field


A typical query is:

explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%'
AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND
(date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name;

with the result being:
 
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1)
   Sort Key: channel_name
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1)
         ->  Bitmap Heap Scan on ad_log  (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1)
               Recheck Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
               Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text))
               ->  Bitmap Index Scan on ad_log_start_time  (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1)
                     Index Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
 Total runtime: 75460.361 ms


It seems to me there should be some way to create an index to speed this up,
but the various ones I've tried so far haven't helped. Any suggestions would
be greatly appreciated.




Re: [PERFORM] indexing for distinct search in timestamp based table

2008-08-28 Thread H. Hall

Rainer Mager wrote:

I'm looking for some help in speeding up searches. My table is pretty simple
(see below), but somewhat large, and continuously growing. Currently it has
about 50 million rows.
  


Regarding your use of LIKE:
(1) If you are able to specify the beginning character(s) of the 
string you are searching for, you will have a better chance of your 
statement using an index. If you specify a wildcard (%) before the search 
string, the entire string in the column must be searched, and therefore no 
index will be used.
(2) Reorder your WHERE clause to reduce the size of the set that LIKE 
operates on. In your example below, put the BETWEEN before the LIKE.
(3) Consider the use of trigrams instead of LIKE. I have not used it, but 
I notice that Postgres supports trigrams:


The pg_trgm module provides functions and operators for determining the 
similarity of text based on trigram matching, as well as index operator 
classes that support fast searching for similar strings.


Here is the link: http://www.postgresql.org/docs/current/static/pgtrgm.html
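To make the first and third suggestions concrete, here is a hedged sketch (the index names are made up, and the trigram operator class shown is the GiST one shipped in contrib/pg_trgm as of 8.3; verify against your version's docs):

```sql
-- (1) An anchored pattern can use a btree index; with a non-C locale
-- you need the text_pattern_ops operator class for LIKE to match it.
CREATE INDEX ad_log_channel_name_pattern
    ON ad_log (channel_name text_pattern_ops);
-- Can use the index:     WHERE channel_name LIKE 'abc%'
-- Cannot use the index:  WHERE channel_name LIKE '%abc'

-- (3) A trigram index can help even with leading wildcards.
CREATE INDEX ad_log_channel_name_trgm
    ON ad_log USING gist (channel_name gist_trgm_ops);
```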

--cheers
HH




--
H. Hall
ReedyRiver Group LLC
http://www.reedyriver.com




Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Andrew Sullivan
On Wed, Aug 27, 2008 at 03:22:09PM -0700, [EMAIL PROTECTED] wrote:

 I disagree with you. I think good Postgres operation is so highly dependent 
 on caching as much data as possible that disabling overcommit (and throwing 
 away a lot of memory that could be used for cache) is a solution that's as 
 bad or worse than the problem it's trying to solve.

Ok, but the danger is that the OOM killer kills your postmaster.  To
me, this is a cure way worse than the disease it's trying to treat.
YMMV &c. &c.

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Matthew Wakeling

On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote:
if memory overcommit is disabled, the kernel checks to see if you have an 
extra 1G of ram available; if you do it allows the process to continue, if 
you don't it tries to free memory (by throwing away cache, swapping to disk, 
etc), and if it can't free the memory it will return a memory allocation error 
(which I believe will cause firefox to exit).


Remember that the memory overcommit check is checking against the amount 
of RAM + swap you have - not just the amount of RAM. When a fork occurs, 
hardly any extra actual RAM is used (due to copy on write), but the 
potential is there for the process to use it. If overcommit is switched 
off, then you just need to make sure there is *plenty* of swap to convince 
the kernel that it can actually fulfil all of the memory requests if all 
the processes behave badly and all shared pages become unshared. Then the 
consequences of processes actually using that memory are that the machine 
will swap, rather than the OOM killer having to act.


Of course, it's generally bad to run a machine with more going on than 
will fit in RAM.


Neither swapping nor OOM killing are particularly good - it's just a 
consequence of the amount of memory needed being unpredictable.


Probably the best solution is to just tell the kernel somehow to never 
kill the postmaster.


Matthew

--
Taking apron off And now you can say honestly that you have been to a
lecture where you watched paint dry.
- Computer Graphics Lecturer



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Bill Moran
In response to Matthew Wakeling [EMAIL PROTECTED]:
 
 Probably the best solution is to just tell the kernel somehow to never 
 kill the postmaster.

This thread interested me enough to research this a bit.

In linux, it's possible to tell the OOM killer never to consider
certain processes for the axe, using /proc magic.  See this page:
http://linux-mm.org/OOM_Killer

Perhaps this should be in the PostgreSQL docs somewhere?

-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Kevin Grittner
Bill Moran [EMAIL PROTECTED] wrote: 
> In response to Matthew Wakeling [EMAIL PROTECTED]:
> 
>> Probably the best solution is to just tell the kernel somehow to never 
>> kill the postmaster.
> 
> This thread interested me enough to research this a bit.
> 
> In linux, it's possible to tell the OOM killer never to consider
> certain processes for the axe, using /proc magic.  See this page:
> http://linux-mm.org/OOM_Killer
> 
> Perhaps this should be in the PostgreSQL docs somewhere?
That sure sounds like a good idea.
 
Even though the one time the OOM killer kicked in on one of our
servers, it killed a runaway backend and not the postmaster
( http://archives.postgresql.org/pgsql-bugs/2008-07/msg00105.php ),
I think I will modify our service scripts in /etc/init.d/ to pick off
the postmaster pid after a start and echo -16 (or some such) into the
/proc/<pid>/oom_adj file (which is where I found the file on my SuSE
system).
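The init-script tweak described above might look something like this (the pidfile path and the adjustment value are illustrative; oom_adj on kernels of this vintage ranges from -17 to +15, with -17 exempting the process entirely):

```shell
# After starting the postmaster, shield it from the OOM killer.
# The first line of postmaster.pid holds the postmaster's pid.
PGPID=$(head -n 1 /var/lib/pgsql/data/postmaster.pid)
echo -16 > /proc/$PGPID/oom_adj
```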
 
Thanks for the research and the link!
 
-Kevin



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Steve Atkins


On Aug 28, 2008, at 6:26 AM, Matthew Wakeling wrote:


On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote:
if memory overcommit is disabled, the kernel checks to see if you  
have an extra 1G of ram available, if you do it allows the process  
to continue, if you don't it tries to free memory (by throwing away  
cache, swapping to disk, etc), and if it can't free the memory it will  
return a memory allocation error (which I believe will cause  
firefox to exit).


Remember that the memory overcommit check is checking against the  
amount of RAM + swap you have - not just the amount of RAM. When a  
fork occurs, hardly any extra actual RAM is used (due to copy on  
write), but the potential is there for the process to use it. If  
overcommit is switched off, then you just need to make sure there is  
*plenty* of swap to convince the kernel that it can actually fulfil  
all of the memory requests if all the processes behave badly and all  
shared pages become unshared. Then the consequences of processes  
actually using that memory are that the machine will swap, rather  
than the OOM killer having to act.


Of course, it's generally bad to run a machine with more going on  
than will fit in RAM.


Neither swapping nor OOM killing are particularly good - it's just a  
consequence of the amount of memory needed being unpredictable.


Probably the best solution is to just tell the kernel somehow to  
never kill the postmaster.


Or configure adequate swap space?

Cheers,
  Steve



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Jerry Champlin
Another approach we used successfully for a similar problem -- (we had lots
of free high memory but were running out of low memory; the OOM killer wiped
out MQ a couple of times and the postmaster a couple of times) -- was to change
the settings for how aggressively the virtual memory system protects low memory
by changing /proc/sys/vm/lowmem_reserve_ratio (2.6.18?+ kernel).  I don't
remember all of the details, but we looked at
Documentation/filesystems/proc.txt for the 2.6.25 kernel (it wasn't
documented for earlier kernel releases) to figure out how it worked and set
it appropriately for our system memory configuration.

-Jerry







Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

[EMAIL PROTECTED] wrote:

On Wed, 27 Aug 2008, Craig James wrote:
The OOM killer is a terrible idea for any serious database server.  I 
wrote a detailed technical paper on this almost 15 years ago when 
Silicon Graphics had this same feature, and Oracle and other critical 
server processes couldn't be made reliable.


The problem with overallocating memory as Linux does by default is 
that EVERY application, no matter how well designed and written, 
becomes unreliable: It can be killed because of some OTHER process.  
You can be as clever as you like, and do all the QA possible, and 
demonstrate that there isn't a single bug in Postgres, and it will 
STILL be unreliable if you run it on a Linux system that allows 
overcommitted memory.


IMHO, all Postgres servers should run with memory-overcommit 
disabled.  On Linux, that means  /proc/sys/vm/overcommit_memory=2.


it depends on how much stuff you allow others to run on the box. if you 
have no control of that then yes, the box is unreliable (but it's not 
just becouse of the OOM killer, it's becouse those other users can eat 
up all the other box resources as well CPU, network bandwidth, disk 
bandwidth, etc)


even with overcommit disabled, the only way you can be sure that a 
program will not fail is to make sure that it never needs to allocate 
memory. with overcommit off you could have one program that eats up 100% 
of your ram without failing (handling the error on memory allocation 
such that it doesn't crash), but which will cause _every_ other program 
on the system to fail, including any scripts (becouse every command 
executed will require forking and without overcommit that will require 
allocating the total memory that your shell has allocated so that it can 
run a trivial command (like ps or kill that you are trying to use to fix 
the problem)


if you have a box with unpredictable memory use, disabling overcommit 
will not make it reliable. it may make it less unreliable (the fact that 
the linux OOM killer will pick one of the worst possible processes to 
kill is a problem), but less unreliable is not the same as reliable.


The problem with any argument in favor of memory overcommit and OOM is that 
there is a MUCH better, and simpler, solution.  Buy a really big disk, say a 
terabyte, and allocate the whole thing as swap space.  Then do a decent job of 
configuring your kernel so that any reasonable process can allocate huge chunks 
of memory that it will never use, but can't use the whole terabyte.

Using real swap space instead of overallocated memory is a much better solution.

- It's cheap.
- There is no performance hit at all if you buy enough real memory
- If runaway processes start actually using memory, the system slows
 down, but server processes like Postgres *aren't killed*.
- When a runaway process starts everybody swapping, you can just
 find it and kill it.  Once it's dead, everything else goes back
 to normal.

It's hard to imagine a situation where any program or collection of programs 
would actually try to allocate more than a terabyte of memory and exceed the 
swap space on a single terabyte disk.  The cost is almost nothing, a few 
hundred dollars.

So turn off overcommit, and buy an extra disk if you actually need a lot of virtual 
memory.
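The extra-swap approach suggested above can also be done without dedicating a whole disk, by using a swap file (the size and path here are only an example):

```shell
# Create and enable a large swap file (needs root).
dd if=/dev/zero of=/swapfile bs=1M count=32768   # 32 GB
chmod 600 /swapfile
mkswap /swapfile
swapon /swapfile
# Add to /etc/fstab so it survives reboots:
#   /swapfile  none  swap  sw  0  0
```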

Craig



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Matthew Wakeling

On Thu, 28 Aug 2008, Steve Atkins wrote:
Probably the best solution is to just tell the kernel somehow to never kill 
the postmaster.


Or configure adequate swap space?


Oh yes, that's very important. However, that gives the machine the 
opportunity to thrash.


Matthew

--
The early bird gets the worm. If you want something else for breakfast, get
up later.



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Craig James wrote:


[EMAIL PROTECTED] wrote:

On Wed, 27 Aug 2008, Craig James wrote:
The OOM killer is a terrible idea for any serious database server.  I 
wrote a detailed technical paper on this almost 15 years ago when Silicon 
Graphics had this same feature, and Oracle and other critical server 
processes couldn't be made reliable.


The problem with overallocating memory as Linux does by default is that 
EVERY application, no matter how well designed and written, becomes 
unreliable: It can be killed because of some OTHER process.  You can be as 
clever as you like, and do all the QA possible, and demonstrate that there 
isn't a single bug in Postgres, and it will STILL be unreliable if you run 
it on a Linux system that allows overcommitted memory.


IMHO, all Postgres servers should run with memory-overcommit disabled.  On 
Linux, that means  /proc/sys/vm/overcommit_memory=2.


it depends on how much stuff you allow others to run on the box. if you 
have no control of that then yes, the box is unreliable (but it's not just 
because of the OOM killer, it's because those other users can eat up all 
the other box resources as well: CPU, network bandwidth, disk bandwidth, 
etc.)


even with overcommit disabled, the only way you can be sure that a program 
will not fail is to make sure that it never needs to allocate memory. with 
overcommit off you could have one program that eats up 100% of your ram 
without failing (handling the error on memory allocation such that it 
doesn't crash), but which will cause _every_ other program on the system to 
fail, including any scripts (because every command executed will require 
forking, and without overcommit that will require allocating the total 
memory that your shell has allocated so that it can run a trivial command, 
like the ps or kill that you are trying to use to fix the problem).


if you have a box with unpredictable memory use, disabling overcommit will 
not make it reliable. it may make it less unreliable (the fact that the 
linux OOM killer will pick one of the worst possible processes to kill is a 
problem), but less unreliable is not the same as reliable.


The problem with any argument in favor of memory overcommit and OOM is that 
there is a MUCH better, and simpler, solution.  Buy a really big disk, say a 
terabyte, and allocate the whole thing as swap space.  Then do a decent job 
of configuring your kernel so that any reasonable process can allocate huge 
chunks of memory that it will never use, but can't use the whole terabyte.


Using real swap space instead of overallocated memory is a much better 
solution.


- It's cheap.


cheap in dollars, but if you actually use any of it it's very expensive in 
performance



- There is no performance hit at all if you buy enough real memory
- If runaway processes start actually using memory, the system slows
down, but server processes like Postgres *aren't killed*.
- When a runaway process starts everybody swapping, you can just
find it and kill it.  Once it's dead, everything else goes back
to normal.


all of these things are still true if you enable overcommit, the 
difference is that with overcommit enabled your actual ram will be used 
for cache as much as possible, with overcommit disabled you will keep 
throwing away cache to make room for memory that's allocated but not 
written to.


I generally allocate 2G of disk to swap; if the system ends up using even 
that much it will have slowed to a crawl, but if you are worried that 
that's not enough, by all means go ahead and allocate more. Allocating 
a 1TB disk is overkill, though (do you realize how long it takes just to _read_ an 
entire 1TB disk? try it sometime with dd if=/dev/drive of=/dev/null)
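For scale, assuming a sustained sequential rate of roughly 80 MB/s (typical for a 2008-era SATA disk; an assumption, not a measurement), the back-of-the-envelope arithmetic is:

```shell
# 1 TB (~1,000,000 MB) at ~80 MB/s: minutes to read the disk end to end
echo $(( 1000000 / 80 / 60 ))   # prints 208, i.e. about 3.5 hours
```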


David Lang

It's hard to imagine a situation where any program or collection of programs 
would actually try to allocate more than a terabyte of memory and exceed the 
swap space on a single terabyte disk.  The cost is almost nothing, a few 
hundred dollars.


So turn off overcommit, and buy an extra disk if you actually need a lot of 
virtual memory.


Craig





Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Matthew Wakeling wrote:


On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote:
if memory overcommit is disabled, the kernel checks to see if you have an 
extra 1G of ram available; if you do, it allows the process to continue, if 
you don't it tries to free memory (by throwing away cache, swapping to 
disk, etc), and if it can't free the memory it will return a memory allocation 
error (which I believe will cause firefox to exit).


Remember that the memory overcommit check is checking against the amount of 
RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly 
any extra actual RAM is used (due to copy on write), but the potential is 
there for the process to use it. If overcommit is switched off, then you just 
need to make sure there is *plenty* of swap to convince the kernel that it 
can actually fulfil all of the memory requests if all the processes behave 
badly and all shared pages become unshared. Then the consequences of 
processes actually using that memory are that the machine will swap, rather 
than the OOM killer having to act.


if you are correct that it just checks against memory+swap then it's not a 
big deal, but I don't think it does that. I think it actually allocates 
the memory, and if it does that it will push things out of ram to do the 
allocation; I don't believe that it will allocate swap space directly.


David Lang

Of course, it's generally bad to run a machine with more going on than will 
fit in RAM.


Neither swapping nor OOM killing are particularly good - it's just a 
consequence of the amount of memory needed being unpredictable.


Probably the best solution is to just tell the kernel somehow to never kill 
the postmaster.
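On Linux kernels of this era that can be done per-process via /proc (the pidfile path below is a hypothetical example; newer kernels replace oom_adj with oom_score_adj, where -1000 disables the OOM killer for the process):

```shell
# As root, exempt the postmaster from the OOM killer (-17 = never kill):
# echo -17 > /proc/$(head -1 /path/to/postmaster.pid)/oom_adj

# Any process's current setting can be read back, e.g. for this shell:
cat /proc/self/oom_score_adj 2>/dev/null || cat /proc/self/oom_adj
```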


Matthew






Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Jeff Davis
On Thu, 2008-08-28 at 00:56 -0400, Tom Lane wrote:
 Actually, the problem with Linux' OOM killer is that it
 *disproportionately targets the PG postmaster*, on the basis not of
 memory that the postmaster is using but of memory its child processes
 are using.  This was discussed in the PG archives a few months ago;
 I'm too lazy to search for the link right now, but the details and links
 to confirming kernel documentation are in our archives.
 

http://archives.postgresql.org/pgsql-hackers/2008-02/msg00101.php

It's not so much that the OOM Killer targets the parent process for a
fraction of the memory consumed by the child. It may not be a great
design, but it's not what's causing the problem for the postmaster.

The problem for the postmaster is that the OOM killer counts the
children's total vmsize -- including *shared* memory -- against the
parent, which is such a bad idea I don't know where to start. If you
have shared_buffers set to 1GB and 25 connections, the postmaster will
be penalized as though it was using 13.5 GB of memory, even though all
the processes together are only using about 1GB! 

Not only that, killing a process doesn't free shared memory, so it's
just flat out broken.

Regards,
Jeff Davis




Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Craig James

Matthew Wakeling wrote:

On Thu, 28 Aug 2008, Steve Atkins wrote:
Probably the best solution is to just tell the kernel somehow to 
never kill the postmaster.


Or configure adequate swap space?


Oh yes, that's very important. However, that gives the machine the 
opportunity to thrash.


No, that's where the whole argument for allowing overcommitted memory falls 
flat.

The entire argument for allowing overcommitted memory hinges on the fact that 
processes *won't use the memory*.  If they use it, then overcommitting causes 
problems everywhere, such as a Postmaster getting arbitrarily killed.

If a process *doesn't* use the memory, then there's no problem with thrashing, 
right?

So it never makes sense to enable overcommitted memory when Postgres, or any 
server, is running.

Allocating a big, fat terabyte swap disk is ALWAYS better than allowing 
overcommitted memory.  If your usage is such that overcommitted memory would 
never be used, then the swap disk will never be used either.  If your processes 
do use the memory, then your performance goes into the toilet, and you know 
it's time to buy more memory or a second server, but in the mean time your 
server processes at least keep running while you kill the rogue processes.

Craig



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, [EMAIL PROTECTED] wrote:


On Thu, 28 Aug 2008, Matthew Wakeling wrote:


On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote:
if memory overcommit is disabled, the kernel checks to see if you have an 
extra 1G of ram available, if you do it allows the process to continue, if 
you don't it tries to free memory (by throwing away cache, swapping to 
disk, etc), and if it can't free the memory will return a memory 
allocation error (which I believe will cause firefox to exit).


Remember that the memory overcommit check is checking against the amount of 
RAM + swap you have - not just the amount of RAM. When a fork occurs, 
hardly any extra actual RAM is used (due to copy on write), but the 
potential is there for the process to use it. If overcommit is switched 
off, then you just need to make sure there is *plenty* of swap to convince 
the kernel that it can actually fulfil all of the memory requests if all 
the processes behave badly and all shared pages become unshared. Then the 
consequences of processes actually using that memory are that the machine 
will swap, rather than the OOM killer having to act.


if you are correct that it just checks against memory+swap then it's not a 
big deal, but I don't think it does that. I think it actually allocates the 
memory, and if it does that it will push things out of ram to do the 
allocation, I don't believe that it will allocate swap space directly.


I just asked on the kernel mailing list and Alan Cox responded.

he is saying that you are correct, it only allocates against the total 
available, it doesn't actually allocate ram.


so with sufficient swap, overcommit off should be fine.

but you do need to allocate more swap, as the total memory 'used' can be 
significantly higher than with overcommit on.


David Lang


David Lang

Of course, it's generally bad to run a machine with more going on than will 
fit in RAM.


Neither swapping nor OOM killing are particularly good - it's just a 
consequence of the amount of memory needed being unpredictable.


Probably the best solution is to just tell the kernel somehow to never kill 
the postmaster.


Matthew








Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Craig James wrote:


Matthew Wakeling wrote:

On Thu, 28 Aug 2008, Steve Atkins wrote:
Probably the best solution is to just tell the kernel somehow to never 
kill the postmaster.


Or configure adequate swap space?


Oh yes, that's very important. However, that gives the machine the 
opportunity to thrash.


No, that's where the whole argument for allowing overcommitted memory falls 
flat.


The entire argument for allowing overcommitted memory hinges on the fact that 
processes *won't use the memory*.  If they use it, then overcommitting causes 
problems everywhere, such as a Postmaster getting arbitrarily killed.


If a process *doesn't* use the memory, then there's no problem with 
thrashing, right?


So it never makes sense to enable overcommitted memory when Postgres, or any 
server, is running.


Allocating a big, fat terabyte swap disk is ALWAYS better than allowing 
overcommitted memory.  If your usage is such that overcommitted memory would 
never be used, then the swap disk will never be used either.  If your 
processes do use the memory, then your performance goes into the toilet, and 
you know it's time to buy more memory or a second server, but in the mean 
time your server processes at least keep running while you kill the rogue 
processes.


there was a misunderstanding (for me if nobody else) that without 
overcommit it was actual ram that was getting allocated, which could push 
things out to swap even if the memory ended up not being needed later. 
with the clarification that this is not the case, and that the allocation 
just reduces the virtual memory available, it's now clear that it is just 
as efficient to run with overcommit off.


so the conclusion is:

no performance/caching/buffer difference between the two modes.

the differences between the two are:

with overcommit

  when all ram+swap is used, the OOM killer is activated.
  for the same amount of ram+swap, more allocations can be done before it 
is all used up (how much more is unpredictable)


without overcommit

  when all ram+swap is allocated, programs (not necessarily the memory 
hog) start getting memory allocation errors.



David Lang



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Jeff Davis
On Wed, 2008-08-27 at 23:23 -0700, [EMAIL PROTECTED] wrote:
 there are periodic flamefests on the kernel mailing list over the OOM 
 killer, if you can propose a better algorithm for it to use than the 
 current one that doesn't end up being just as bad for some other workload 
 the kernel policy can be changed.
 

Tried that: http://lkml.org/lkml/2007/2/9/275

All they have to do is *not* count shared memory against the process (or
at least not count it against the parent of the process), and the system
may approximate sanity.

 IIRC the reason why it targets the parent process is to deal with a 
 fork-bomb type of failure where a program doesn't use much memory itself, 
 but forks off memory hogs as quickly as it can. if the OOM killer only 
 kills the children the problem never gets solved.

But killing a process won't free shared memory. And there is already a
system-wide limit on shared memory. So what's the point of such a bad
design?
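This is visible from the shell: System V shared memory segments are accounted system-wide and outlive the processes attached to them, which is exactly why killing a backend frees none of it:

```shell
# List shared memory segments; Postgres' segment appears here at its
# full size no matter how many backends are attached (or alive):
ipcs -m

# Freeing a segment requires explicit removal, not killing a process:
# ipcrm -m <shmid>
```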

Regards,
Jeff Davis




[PERFORM] update - which way quicker?

2008-08-28 Thread Emi Lu

Good morning,

I'm trying to update Table1 based on Table2:

. update table1.col = false
  if table1.pk_cols are not in table2.pk_cols



Of the following two ways, does (2) always perform better than (1)?
I need your inputs.

(1) update table1
set col = false
where table1.pk_co1 || table1.pk_col2 || table1.pk_colN

NOT IN

(select pk_co1 || pk_col2 || pk_colN
 from table2
 )

(2) ResultSet(rs) =
select pk_col1||pk_col2... || pk_colN
from table1
left join table2 using (pk_col1..., pk_colN)
where table2.pk_col1 is null

Then for each rs record, do:
update table1
set col = false
where col1||... colN in rs.value

Thanks a lot!



Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 1:22 PM, cluster [EMAIL PROTECTED] wrote:
 I'm about to buy a combined web- and database server. When (if) the site
 gets sufficiently popular, we will split the database out to a separate
 server.

 Our budget is limited, so how should we prioritize?

Standard prioritization for a db server is:  Disks and controller, RAM, CPU.

 * We think about buying some HP Proliant server with at least 4GB ram and at
 least a duo core processor. Possibly quad core. The OS will be debian/Linux.

HP makes nice equipment.  Also, since this machine will have apache as
well as pgsql running on it, you might want to look at more memory if
it's reasonably priced.  If pg and apache are using 1.5Gig total to
run, you've got 2.5Gig for the OS to cache in.  With 8 Gig of ram,
you'd have 6.5Gig to cache in.  Also, the cost of a quad core nowadays
is pretty reasonable.

 * Much of the database will fit in RAM so it is not *that* necessary to
 prefer the more expensive 10k RPM SAS drives to the cheaper 7500 RPM SATA
 drives, is it?

That depends.  Writes will still have to hit the drives.  Reads will
be mostly from memory.  Be sure to set your effective_cache_size
appropriately.

 There will both be many read- and write queries and a *lot*
 (!) of random reads.

 * I think we will go for hardware-based RAID 1 with a good battery-backed-up
 controller.

The HP RAID controller that's been mentioned on the list seems like a
good performer.

 I have read that software RAID performs surprisingly well, but
 for a production site where hotplug replacement of dead disks is required,
 is software RAID still worth it?

The answer is maybe.  The reason people keep testing software RAID is
that a lot of cheap (not necessarily in cost, just in design)
controllers give mediocre performance compared to SW RAID.

With SW RAID on top of a caching controller in jbod mode, the
controller simply becomes a cache that can survive power loss, and
doesn't have to do any RAID calculations any more.  With today's very
fast CPUs, and often running RAID-10 for dbs, which requires little
real overhead, it's not uncommon for SW RAID to outrun HW.

With better controllers, the advantage is small to none.

 Anything else we should be aware of?

Can you go with 4 drives?  Even if they're just SATA drives, you'd be
amazed at what going from a 2 drive mirror to a 4 drive RAID-10 can do
for your performance.  Note you'll have no more storage going from 2
drive mirror to 4 drive RAID-10, but your aggregate bandwidth on reads
will be doubled.
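A 4-drive software RAID-10 of the kind described is straightforward with mdadm (device names here are hypothetical; adjust for your hardware, and run as root):

```shell
# Build a 4-disk RAID-10 array; each read can be served by either side
# of a mirror, roughly doubling aggregate read bandwidth over RAID-1:
mdadm --create /dev/md0 --level=10 --raid-devices=4 \
      /dev/sda1 /dev/sdb1 /dev/sdc1 /dev/sdd1
mkfs -t ext3 /dev/md0
```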



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Matthew Wakeling

On Thu, 28 Aug 2008, Jeff Davis wrote:

The problem for the postmaster is that the OOM killer counts the
children's total vmsize -- including *shared* memory -- against the
parent, which is such a bad idea I don't know where to start. If you
have shared_buffers set to 1GB and 25 connections, the postmaster will
be penalized as though it was using 13.5 GB of memory, even though all
the processes together are only using about 1GB!


I find it really hard to believe that it counts shared memory like that. 
That's just dumb.


Of course, there are two types of shared memory. There's explicit shared 
memory, like Postgres uses, and there's copy-on-write shared memory, 
caused by a process fork. The copy-on-write memory needs to be counted for 
each child, but the explicit shared memory needs to be counted just once.



Not only that, killing a process doesn't free shared memory, so it's
just flat out broken.


Exactly. A cost-benefit model would work well here. Work out how much RAM 
would be freed by killing a process, and use that when choosing which 
process to kill.


Matthew

--
You will see this is a 3-blackboard lecture. This is the closest you are going
to get from me to high-tech teaching aids. Hey, if they put nooses on this, it
would be fun!   -- Computer Science Lecturer



Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Fernando Hevia
 

 -Mensaje original-
 De: [EMAIL PROTECTED] 
 [mailto:[EMAIL PROTECTED] En nombre de cluster
 
 I'm about to buy a combined web- and database server. When 
 (if) the site gets sufficiently popular, we will split the 
 database out to a separate server.
 
 Our budget is limited, so how should we prioritize?
 
 * We think about buying some HP Proliant server with at least 
 4GB ram and at least a duo core processor. Possibly quad 
 core. The OS will be debian/Linux.
 
 * Much of the database will fit in RAM so it is not *that* 
 necessary to prefer the more expensive 10k RPM SAS drives 
 to the cheaper 7500 RPM SATA drives, is it? There will both 
 be many read- and write queries and a *lot* (!) of random reads.
 
 * I think we will go for hardware-based RAID 1 with a good 
 battery-backed-up controller. I have read that software RAID 
 perform surprisingly good, but for a production site where 
 hotplug replacement of dead disks is required, is software 
 RAID still worth it?
 
 Anything else we should be aware of?
 

I haven't had any issues with software raid (mdadm) and hot-swaps. It keeps
working in degraded mode and as soon as you replace the defective disk it
can reconstruct the array on the fly. Performance will suffer while at it
but the service keeps up.
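For reference, the hot-swap sequence with mdadm looks roughly like this (array and device names are hypothetical; run as root):

```shell
# Mark the failed member and pull it from the array:
mdadm --manage /dev/md0 --fail /dev/sdb1 --remove /dev/sdb1
# ...physically swap the disk, partition it identically, then re-add:
mdadm --manage /dev/md0 --add /dev/sdb1
# The rebuild runs in the background; watch progress here:
cat /proc/mdstat
```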
The battery backup makes a very strong point for a hw controller. Still, I
have heard good things on combining a HW controller with JBODS leaving the
RAID affair to mdadm. In your scenario though with *lots* of random reads,
if I had to choose between a HW controller + 2 disks or software RAID with 4
or 6 disks, I would go for the disks. There are motherboards with 6 SATA
ports. For the money you will save on the controller you can afford 6 disks
in a RAID 10 setup. 

Cheers,
Fernando.




Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Craig James

-Mensaje original-
De: [EMAIL PROTECTED] 
* I think we will go for hardware-based RAID 1 with a good 
battery-backed-up controller. I have read that software RAID 
perform surprisingly good, but for a production site where 
hotplug replacement of dead disks is required, is software 
RAID still worth it?
... 

I haven't had any issues with software raid (mdadm) and hot-swaps. It keeps
working in degraded mode and as soon as you replace the defective disk it
can reconstruct the array on the fly. Performance will suffer while at it
but the service keeps up.
The battery backup makes a very strong point for a hw controller. Still, I
have heard good things on combining a HW controller with JBODS leaving the
RAID affair to mdadm. In your scenario though with *lots* of random reads,
if I had to choose between a HW controller + 2 disks or software RAID with 4
or 6 disks, I would go for the disks. There are motherboards with 6 SATA
ports. For the money you will save on the controller you can afford 6 disks
in a RAID 10 setup.


This is good advice.  Hot-swapping seems cool, but how often will you actually 
use it? Maybe once every year?  With Software RAID, replacing a disk means 
shutdown, swap the hardware, and reboot, which is usually less than ten 
minutes, and you're back in business.  If that's the only thing that happens, 
you'll have 99.97% uptime on your server.

If you're on a limited budget, a software RAID 1+0 will be very cost effective 
and give good performance for lots of random reads.  Hardware RAID with a 
battery-backed cache helps with writes and hot swapping.  If your random-read 
performance needs outweigh these two factors, consider software RAID.

Craig




Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Matthew Wakeling

On Thu, 28 Aug 2008, Craig James wrote:
If your processes do use the memory, then your performance goes into the 
toilet, and you know it's time to buy more memory or a second server, 
but in the mean time your server processes at least keep running while 
you kill the rogue processes.


I'd argue against swap ALWAYS being better than overcommit. It's a choice 
between your performance going into the toilet or your processes dying.


On the one hand, if someone fork-bombs you, the OOM killer has a chance of 
solving the problem for you, rather than you having to log onto an 
unresponsive machine to kill the process yourself. On the other hand, the 
OOM killer may kill the wrong thing. Depending on what else you use your 
machine for, either of the choices may be the right one.


Another point is that from a business perspective, a database that has 
stopped responding is equally bad regardless of whether that is because 
the OOM killer has appeared or because the machine is thrashing. In both 
cases, there is a maximum throughput that the machine can handle, and if 
requests appear quicker than that the system will collapse, especially if 
the requests start timing out and being retried.


This problem really is caused by the kernel not having enough information 
on how much memory a process is going to use. I would be much in favour of 
replacing fork() with some more informative system call. For example, 
forkandexec() instead of fork() then exec() - the kernel would know that 
the new process will never need any of that duplicated RAM. However, there 
is *far* too much legacy in the old fork() call to change that now.


Likewise, I would be all for Postgres managing its memory better. It would 
be very nice to be able to set a maximum amount of work-memory, rather 
than a maximum amount per backend. Each backend could then make do with 
however much is left of the work-memory pool when it actually executes 
queries. As it is, the server admin has no idea how many multiples of 
work-mem are going to be actually used, even knowing the maximum number of 
backends.


Matthew

--
Of course it's your fault. Everything here's your fault - it says so in your
contract.- Quark



Re: [PERFORM] update - which way quicker?

2008-08-28 Thread paul socha


On 2008-08-28, at 21:31, Emi Lu wrote:


Good morning,

Tried to compare Table1 based on Table2

. update table1.col = false
  if table1.pk_cols not in table2.pk_cols



For the following two ways, (2) always performs better than (1) right,
and I need your inputs.

(1) update table1
set col = false
where table1.pk_co1 || table1.pk_col2 || table1.pk_colN

NOT IN

(select pk_co1 || pk_col2 || pk_colN
 from table2
 )

(2) ResultSet(rs) =
select pk_col1||pk_col2... || pk_colN
from table1
left join table2 using (pk_col1..., pk_colN)
where table2.pk_col1 is null

Then for each rs record, do:
update table1
set col = false
where col1||... colN in rs.value

Thanks a lot!





Check EXISTS

http://www.postgresql.org/docs/8.3/interactive/functions-subquery.html
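A hedged sketch of what that looks like for the update in question (table and column names are the original poster's placeholders): comparing the key columns directly, instead of concatenating them into strings, lets the planner use an anti-join and any index on the keys.

```sql
UPDATE table1 t1
SET    col = false
WHERE  NOT EXISTS (
    SELECT 1
    FROM   table2 t2
    WHERE  t2.pk_col1 = t1.pk_col1
      -- ...one equality per primary key column...
      AND  t2.pk_colN = t1.pk_colN
);
```

Comparing columns individually also avoids a correctness hazard in the concatenation approach: 'ab'||'c' and 'a'||'bc' produce the same string even though the underlying keys differ.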


Serdecznie pozdrawiam

Pawel Socha
[EMAIL PROTECTED]

programista/administrator

perl -le 's**02).4^-%2,).^9%4^!./4(%2^3,!#+7!2%^53%2**y% -;^[%`- 
{ a%%s%%$_%ee'





Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread cluster
Thanks for all your replies! They are enlightening. I have some 
additional questions:


1) Would you prefer
   a) 5.4k RPM SATA RAID10 on four disks, or
   b) 10k RPM SAS RAID1 on two disks?
(Remember the lots (!) of random reads)

2) Should I just make one large partition of my RAID? Does it matter at all?

3) Will I gain much by putting the OS on a separate disk, not included 
in the RAID? (The webserver and database would still share the RAID - 
but I guess the OS will cache my (small) web content in RAM anyway).


Thanks again!



Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 2:04 PM, Fernando Hevia [EMAIL PROTECTED] wrote:

 I havent had any issues with software raid (mdadm) and hot-swaps. It keeps
 working in degraded mode and as soon as you replace the defective disk it
 can reconstruct the array on the fly. Performance will suffer while at it
 but the service keeps up.

I too put my vote behind mdadm for ease of use.  However, there are
reports that certain RAID levels in Linux kernel RAID do not handle
write barriers properly.  So that's what worries me.

 The battery backup makes a very strong point for a hw controller. Still, I
 have heard good things on combining a HW controller with JBODS leaving the
 RAID affair to mdadm. In your scenario though with *lots* of random reads,

This is especially true on slower RAID controllers.  A lot of RAID
controllers in the $300 range with battery backed caching don't do
RAID real well, but do caching ok.  If you can't afford a $1200 RAID
card then this might be a good option.



Re: [PERFORM] Best hardware/cost tradoff?

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 3:29 PM, cluster [EMAIL PROTECTED] wrote:
 Thanks for all your replies! They are enlightening. I have some additional
 questions:

 1) Would you prefer
   a) 5.4k RPM SATA RAID10 on four disks, or
   b) 10k RPM SAS RAID1 on two disks?
 (Remember the lots (!) of random reads)

I'd lean towards 4 disks in RAID-10.  Better performance when > 1 read
is going on.  Similar commit rates to the two 10k drives.  Probably
bigger drives too, right?  Always nice to have room to work in.

 2) Should I just make one large partition of my RAID? Does it matter at all?

Probably.  With more disks it might be advantageous to split out two
drives into a mirrored pair for pg_xlog.  With 2 or 4 disks, splitting off two
for pg_xlog might slow down the data partition more than you gain from
a separate pg_xlog drive set.

 3) Will I gain much by putting the OS on a saparate disk, not included in
 the RAID? (The webserver and database would still share the RAID - but I
 guess the OS will cache my (small) web content in RAM anyway).

The real reason you want your OS on a different set of drives is that
it allows you to reconfigure your underlying RAID array as needed
without having to reinstall the whole OS again.  Yeah, logging to
/var/log will eat some bandwidth on your RAID as well, but the ease of
maintenance is why I do it as much as anything.  A lot of large
servers support 2 fixed drives for the OS and a lot of removable
drives hooked up to a RAID controller for this reason.



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED] wrote:

 Another point is that from a business perspective, a database that has
 stopped responding is equally bad regardless of whether that is because the
 OOM killer has appeared or because the machine is thrashing. In both cases,
 there is a maximum throughput that the machine can handle, and if requests
 appear quicker than that the system will collapse, especially if the
 requests start timing out and being retried.

But there's a HUGE difference between a machine that has bogged down
under load so badly that you have to reset it and a machine that's had
the postmaster slaughtered by the OOM killer.  In the first situation,
while the machine is unresponsive, it should come right back up with a
coherent database after the restart.

OTOH, a machine with a dead postmaster is far more likely to have a
corrupted database when it gets restarted.

 Likewise, I would be all for Postgres managing its memory better. It would
 be very nice to be able to set a maximum amount of work-memory, rather than
 a maximum amount per backend. Each backend could then make do with however
 much is left of the work-memory pool when it actually executes queries. As
 it is, the server admin has no idea how many multiples of work-mem are going
 to be actually used, even knowing the maximum number of backends.

Agreed.  It would be useful to have a cap on all work_mem, but it
might be an issue that causes all the backends to talk to each other,
which can be really slow if you're running a thousand or so
connections.



Re: [PERFORM] indexing for distinct search in timestamp based table

2008-08-28 Thread David Rowley
I once also had a similar performance problem when looking for all matching
rows between two timestamps. In fact that's why I'm here today. The problem
was with MySQL. I had some tables of around 10 million rows and all my
searching was timestamp based. MySQL didn't do what I wanted. I found
using a CLUSTERED index with postgresql to be lightning quick, yet mostly
the matching rows I was working with were not much over the 100k mark. I'm
wondering if clustering the table on ad_log_start_time will help cut down on
random reads.

That's if you can afford to block the users while postgresql clusters the
table.

If you're inserting in order of the start_time column (which I was) then the
cluster should almost maintain itself (I think), providing you're not
updating or deleting anyway, I'd assume that since it looks like a log
table.
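In Postgres 8.3 syntax that suggestion is a one-off maintenance command (the index name is the ad_log_start_time index from the schema quoted below; note CLUSTER takes an exclusive lock on the table while it rewrites it):

```sql
-- Rewrite ad_log in start_time order, then refresh planner statistics:
CLUSTER ad_log USING ad_log_start_time;
ANALYZE ad_log;
```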

David.

 
-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Rainer Mager
Sent: 28 August 2008 09:06
To: pgsql-performance@postgresql.org
Subject: [PERFORM] indexing for distinct search in timestamp based table

I'm looking for some help in speeding up searches. My table is pretty simple
(see below), but somewhat large, and continuously growing. Currently it has
about 50 million rows.

The table is (I know I have excessive indexes, I'm trying to get the
appropriate ones and drop the extras):
  Table "public.ad_log"
    Column    |            Type             |                         Modifiers
--------------+-----------------------------+------------------------------------------------------------
 ad_log_id    | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)
 channel_name | text                        | not null
 player_name  | text                        | not null
 ad_name      | text                        | not null
 start_time   | timestamp without time zone | not null
 end_time     | timestamp without time zone | not null
Indexes:
ad_log_pkey PRIMARY KEY, btree (ad_log_id)
ad_log_channel_name_key UNIQUE, btree (channel_name, player_name,
ad_name, start_time, end_time)
ad_log_ad_and_start btree (ad_name, start_time)
ad_log_ad_name btree (ad_name)
ad_log_all btree (channel_name, player_name, start_time, ad_name)
ad_log_channel_name btree (channel_name)
ad_log_end_time btree (end_time)
ad_log_player_and_start btree (player_name, start_time)
ad_log_player_name btree (player_name)
ad_log_start_time btree (start_time)



The query I'm trying to speed up is below. In it the field tag can be one
of channel_name, player_name, or ad_name. I'm actually trying to return the
distinct values and I found GROUP BY to be slightly faster than using
DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses
in which case we use '%', but it seems Postgres optimizes that pretty well.

SELECT field FROM ad_log 
WHERE channel_name LIKE :channel_name
AND player_name LIKE :player_name 
AND ad_name LIKE :ad_name 
AND start_time BETWEEN :start_date AND (date(:end_date) + 1)
GROUP BY field ORDER BY field


A typical query is:

explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%'
AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND
(date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name;

with the result being:
 
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort  (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1)
   Sort Key: channel_name
   Sort Method:  quicksort  Memory: 17kB
   ->  HashAggregate  (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1)
         ->  Bitmap Heap Scan on ad_log  (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1)
               Recheck Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
               Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text))
               ->  Bitmap Index Scan on ad_log_start_time  (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1)
                     Index Cond: ((start_time >= '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time <= '2008-07-29'::date))
 Total runtime: 75460.361 ms


It seems to me there should be some way to create an index to speed this up,
but the various ones I've tried so far haven't helped. Any suggestions would
be greatly appreciated.


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)

Re: [PERFORM] indexing for distinct search in timestamp based table

2008-08-28 Thread Scott Carey
Another suggestion is to partition the table by date ranges.  If most of the
range queries occur on particular batches of time, this will make all
queries more efficient, and improve locality and efficiency of all indexes
on the table.

This is more work than simply a table CLUSTER, especially in maintenance
overhead, but it will generally help a lot in cases like these.
Additionally, if these don't change much after some period of time the
tables older than the modification window can be vacuumed, clustered, and
reindexed if needed to make them as efficient as possible and maintenance
free after that point (other than backups and archives).

Another benefit of clustering is in backup / restore.  You can incrementally
back up only the index partitions that have changed -- for large databases
this reduces pg_dump and pg_restore times substantially.  To do this you
combine regular expressions with the pg_dump exclude tables or include
tables flags.
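A hedged sketch of that backup approach, assuming date-named partitions (the `ad_log_2007_*` partition names and `mydb` database are hypothetical; pg_dump's -t/-T flags accept patterns in 8.2 and later):

```shell
# Routine backup: dump everything except the frozen 2007 partitions.
# One -T flag with a pattern excludes the whole family of partition names.
pg_dump -Fc -T 'ad_log_2007_*' -f ad_log_current.dump mydb

# One-time archive of the frozen partitions, run after they stop changing.
pg_dump -Fc -t 'ad_log_2007_*' -f ad_log_2007_archive.dump mydb
```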


On Thu, Aug 28, 2008 at 3:48 PM, David Rowley [EMAIL PROTECTED] wrote:

 I once also had a similar performance problem when looking for all matching
 rows between two timestamps. In fact that's why I'm here today. The problem
 was with MySQL. I had some tables of around 10 million rows and all my
 searching was timestamp based. MySQL didn't do what I wanted. I found that
 using a CLUSTERED index with postgresql to be lightning quick. Yet mostly
 the matching rows I was working with was not much over the 100k mark. I'm
 wondering if clustering the table on ad_log_start_time will help cut down
 on
 random reads.

 That's if you can afford to block the users while postgresql clusters the
 table.

 If you're inserting in order of the start_time column (which I was) then
 the
 cluster should almost maintain itself (I think), providing you're not
 updating or deleting anyway, I'd assume that since it looks like a log
 table.

 David.


 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Mager
 Sent: 28 August 2008 09:06
 To: pgsql-performance@postgresql.org
 Subject: [PERFORM] indexing for distinct search in timestamp based table

 I'm looking for some help in speeding up searches. My table is pretty
 simple
 (see below), but somewhat large, and continuously growing. Currently it has
 about 50 million rows.

 The table is (I know I have excessive indexes, I'm trying to get the
 appropriate ones and drop the extras):
   Table "public.ad_log"
     Column    |            Type             |                         Modifiers
 --------------+-----------------------------+------------------------------------------------------------
  ad_log_id    | integer                     | not null default nextval('ad_log_ad_log_id_seq'::regclass)
  channel_name | text                        | not null
  player_name  | text                        | not null
  ad_name      | text                        | not null
  start_time   | timestamp without time zone | not null
  end_time     | timestamp without time zone | not null
 Indexes:
ad_log_pkey PRIMARY KEY, btree (ad_log_id)
ad_log_channel_name_key UNIQUE, btree (channel_name, player_name,
 ad_name, start_time, end_time)
ad_log_ad_and_start btree (ad_name, start_time)
ad_log_ad_name btree (ad_name)
ad_log_all btree (channel_name, player_name, start_time, ad_name)
ad_log_channel_name btree (channel_name)
ad_log_end_time btree (end_time)
ad_log_player_and_start btree (player_name, start_time)
ad_log_player_name btree (player_name)
ad_log_start_time btree (start_time)



 The query I'm trying to speed up is below. In it the field tag can be one
 of channel_name, player_name, or ad_name. I'm actually trying to return the
 distinct values and I found GROUP BY to be slightly faster than using
 DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses
 in which case we use '%', but it seems Postgres optimizes that pretty well.

 SELECT field FROM ad_log
WHERE channel_name LIKE :channel_name
AND player_name LIKE :player_name
AND ad_name LIKE :ad_name
AND start_time BETWEEN :start_date AND (date(:end_date) + 1)
GROUP BY field ORDER BY field


 A typical query is:

 explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%'
 AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND
 (date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name;

 with the result being:

                                           QUERY PLAN
 -----------------------------------------------------------------------------------------------
  Sort  (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1)
    Sort Key: channel_name
    Sort Method:  quicksort  Memory: 17kB
    ->  HashAggregate  (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1)
          ->  Bitmap Heap Scan 

Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 5:08 PM,  [EMAIL PROTECTED] wrote:
 On Thu, 28 Aug 2008, Scott Marlowe wrote:

 On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED]
 wrote:

 Another point is that from a business perspective, a database that has
 stopped responding is equally bad regardless of whether that is because
 the
 OOM killer has appeared or because the machine is thrashing. In both
 cases,
 there is a maximum throughput that the machine can handle, and if
 requests
 appear quicker than that the system will collapse, especially if the
 requests start timing out and being retried.

 But there's a HUGE difference between a machine that has bogged down
 under load so badly that you have to reset it and a machine that's had
 the postmaster slaughtered by the OOM killer.  In the first situation,
 while the machine is unresponsive, it should come right back up with a
 coherent database after the restart.

 OTOH, a machine with a dead postmaster is far more likely to have a
 corrupted database when it gets restarted.

 wait a min here, postgres is supposed to be able to survive a complete box
 failure without corrupting the database, if killing a process can corrupt
 the database it sounds like a major problem.

Yes it is a major problem, but not with postgresql.  It's a major
problem with the linux OOM killer killing processes that should not be
killed.

Would it be postgresql's fault if it corrupted data because my machine
had bad memory?  Or a bad hard drive?  This is the same kind of
failure.  The postmaster should never be killed.  It's the one thing
holding it all together.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Scott Marlowe wrote:


On Thu, Aug 28, 2008 at 5:08 PM,  [EMAIL PROTECTED] wrote:

On Thu, 28 Aug 2008, Scott Marlowe wrote:


On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED]
wrote:


Another point is that from a business perspective, a database that has
stopped responding is equally bad regardless of whether that is because
the
OOM killer has appeared or because the machine is thrashing. In both
cases,
there is a maximum throughput that the machine can handle, and if
requests
appear quicker than that the system will collapse, especially if the
requests start timing out and being retried.


But there's a HUGE difference between a machine that has bogged down
under load so badly that you have to reset it and a machine that's had
the postmaster slaughtered by the OOM killer.  In the first situation,
while the machine is unresponsive, it should come right back up with a
coherent database after the restart.

OTOH, a machine with a dead postmaster is far more likely to have a
corrupted database when it gets restarted.


wait a min here, postgres is supposed to be able to survive a complete box
failure without corrupting the database, if killing a process can corrupt
the database it sounds like a major problem.


Yes it is a major problem, but not with postgresql.  It's a major
problem with the linux OOM killer killing processes that should not be
killed.

Would it be postgresql's fault if it corrupted data because my machine
had bad memory?  Or a bad hard drive?  This is the same kind of
failure.  The postmaster should never be killed.  It's the one thing
holding it all together.


the ACID guarantees that postgres is making are supposed to mean that even 
if the machine dies, the CPU goes up in smoke, etc, the transactions that 
are completed will not be corrupted.


if killing the process voids all the ACID protection then something is 
seriously wrong.


it may lose transactions that are in flight, but it should not corrupt 
the database.


David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 7:16 PM,  [EMAIL PROTECTED] wrote:
 the ACID guarantees that postgres is making are supposed to mean that even
 if the machine dies, the CPU goes up in smoke, etc, the transactions that
 are completed will not be corrupted.

And if any of those things happens, the machine will shut down and
you'll be safe.

 if killing the process voids all the ACID protection then something is
 seriously wrong.

No, your understanding of what postgresql can expect to have happen to
it are wrong.

You'll lose data integrity if:
If a CPU starts creating bad output that gets written to disk,
your RAID controller starts writing garbage to disk,
your memory has bad bits and you don't have ECC,
Some program hijacks a postgres process and starts writing random bits
in the code,
some program comes along and kills the postmaster, which coordinates
all the backends, and corrupts shared data in the process.

 it may lose transactions that are in flight, but it should not corrupt the
 database.

That's true for anything that just stops the machine or all the
postgresql processes dead.

It's not true for a machine that is misbehaving.  And any server that
randomly kills processes is misbehaving.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Matthew Dennis
On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED] wrote:

  wait a min here, postgres is supposed to be able to survive a complete
 box
  failure without corrupting the database, if killing a process can corrupt
  the database it sounds like a major problem.

 Yes it is a major problem, but not with postgresql.  It's a major
 problem with the linux OOM killer killing processes that should not be
 killed.

 Would it be postgresql's fault if it corrupted data because my machine
 had bad memory?  Or a bad hard drive?  This is the same kind of
 failure.  The postmaster should never be killed.  It's the one thing
 holding it all together.


I fail to see the difference between the OOM killing it and the power going
out.  And yes, if the power went out and PG came up with a corrupted DB
(assuming I didn't turn off fsync, etc) I *would* blame PG.  I understand
that killing the postmaster could stop all useful PG work, that it could
cause it to stop responding to clients, that it could even crash PG, et
cetera, but if a particular process dying causes corrupted DBs, that sounds
borked to me.


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Scott Marlowe
On Thu, Aug 28, 2008 at 7:53 PM, Matthew Dennis [EMAIL PROTECTED] wrote:
 On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED]
 wrote:

  wait a min here, postgres is supposed to be able to survive a complete
  box
  failure without corrupting the database, if killing a process can
  corrupt
  the database it sounds like a major problem.

 Yes it is a major problem, but not with postgresql.  It's a major
 problem with the linux OOM killer killing processes that should not be
 killed.

 Would it be postgresql's fault if it corrupted data because my machine
 had bad memory?  Or a bad hard drive?  This is the same kind of
 failure.  The postmaster should never be killed.  It's the one thing
 holding it all together.

 I fail to see the difference between the OOM killing it and the power going
 out.

Then you fail to understand.

scenario 1:  There's a postmaster, it owns all the child processes.
It gets killed.  The Postmaster gets restarted.  Since there isn't one
running, it comes up and starts new child processes.  Meanwhile, the old
child processes that don't belong to it are busy writing to the data
store.  Instant corruption.

scenario 2: Someone pulls the plug.  Every postgres child dies a quick
death.  Data on the drives is coherent and recoverable.
  And yes, if the power went out and PG came up with a corrupted DB
 (assuming I didn't turn off fsync, etc) I *would* blame PG.

Then you might be wrong.  If you were using the LVM, or certain levels
of SW RAID, or a RAID controller with cache with no battery backing
that is set to write-back, or if you were using an IDE or SATA drive /
controller that didn't support write barriers, or using NFS mounts for
database storage, and so on.  My point being that PostgreSQL HAS to
make certain assumptions about its environment that it simply cannot
directly control or test for.  Not having the postmaster shot in the
head while the children keep running is one of those things.

  I understand
 that killing the postmaster could stop all useful PG work, that it could
 cause it to stop responding to clients, that it could even crash PG, et
 cetera, but if a particular process dying causes corrupted DBs, that sounds
 borked to me.

Well, design a better method and implement it.  If everything went
through the postmaster you'd be lucky to get 100 transactions per
second.  There are compromises between performance and reliability
under fire that have to be made.  It is not unreasonable to assume
that your OS is not going to randomly kill off processes because of a
dodgy VM implementation quirk.

P.s. I'm a big fan of linux, and I run my dbs on it.  But I turn off
overcommit and make a few other adjustments to make sure my database
is safe.  The OOM killer as a default is fine for workstations, but
it's an insane setting for servers, much like swappiness=60 is an
insane setting for a server too.
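For instance, those adjustments might look like the sketch below. The swappiness value of 10 is an illustrative server-side choice, not a number prescribed above:

```shell
# Disable overcommit so allocations fail with ENOMEM up front, instead of
# the OOM killer picking a victim process later.
sysctl -w vm.overcommit_memory=2
# Prefer reclaiming page cache over swapping out running processes.
sysctl -w vm.swappiness=10

# To persist across reboots, add to /etc/sysctl.conf:
#   vm.overcommit_memory = 2
#   vm.swappiness = 10
```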

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Nested Loop join being improperly chosen

2008-08-28 Thread David Rowley
I had a similar problem here:
http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php

Is the nested loop performing a LEFT join with yours? It's a little
difficult to tell just from the query plan you showed.

A work around for mine was to use a full outer join and eliminate the extra
rows in the where clause. A bit of a hack but it changed a 2 min query into
one that ran in under a second.

Of course this is not helping with your problem but at least may trigger
some more feedback.

David.


-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] On Behalf Of Brad Ediger
Sent: 22 August 2008 16:26
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Nested Loop join being improperly chosen

Hello,
I'm having trouble with a Nested Loop being selected for a rather  
complex query; it turns out this is a pretty bad plan as the nested  
loop's row estimates are quite off (1 estimated / 1207881 actual). If  
I disable enable_nestloop, the query executes much faster (42 seconds  
instead of 605). The tables in the query have all been ANALYZEd just  
before generating these plans.

Here are the plans with and without enable_nestloop:

http://pastie.org/258043

The inventory table is huge; it currently has about 1.3 x 10^9 tuples.  
The items table has around 10,000 tuples, and the other tables in the  
query are tiny.

Any ideas or suggestions would be greatly appreciated. Thanks!
--
Brad Ediger



-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres not using array

2008-08-28 Thread Greg Smith

On Thu, 21 Aug 2008, Andr? Volpato wrote:

So, what is slowing down is the CPU (Intel(R) Pentium(R) Dual CPU E2160 
@ 1.80GHz)... In practice, I have noticed that dual 1.8 is worse than 
single 3.0. We have another server wich is a Pentium D 3.0 GHz, that 
runs faster.


Pentium D models are all dual-core, so either you've got the wrong model 
number here or you're actually comparing against a 2x3.0GHz part.


The Core 2 Duo E2160 has a very small CPU cache--512KB per core.  Your 
older Pentium system probably has quite a bit more.  I suspect that's the 
main reason it runs faster on this application.


I am a bit confused about what CPU is best for Postgres. Our apps is 
mostly read, with a few connections and heavy queryes.


There are a lot of things you can run into with Postgres that end up being 
limited by the fact that they only run on a single core, as you've seen 
here.  If you've only got a fairly small number of connections running CPU 
heavy queries, you probably want a processor with lots of L2 cache and a 
fast clock speed, rather than adding a large number of cores running at a 
slower speed.  The very small L2 cache on your E2160 is likely what's 
holding it back here, and even though the newer processors are 
significantly more efficient per clock the gap between 1.8GHz and 3.0GHz 
is pretty big.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD
--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Alvaro Herrera
Scott Marlowe escribió:

 scenario 1:  There's a postmaster, it owns all the child processes.
 It gets killed.  The Postmaster gets restarted.  Since there isn't one
 running, it comes up.

Actually there's an additional step required at this point.  There isn't
a postmaster running, but a new one refuses to start, because the shmem
segment is in use.  In order for the second postmaster to start, the
sysadmin must remove the PID file by hand.

 starts new child processes.  Meanwhile, the old child processes that
 don't belong to it are busy writing to the data store.  Instant
 corruption.

In this scenario, it is both a kernel fault and sysadmin stupidity.  The
corruption that ensues is 100% deserved.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
PostgreSQL Replication, Consulting, Custom Development, 24x7 support

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Scott Marlowe wrote:


On Thu, Aug 28, 2008 at 7:53 PM, Matthew Dennis [EMAIL PROTECTED] wrote:

On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED]
wrote:



wait a min here, postgres is supposed to be able to survive a complete
box
failure without corrupting the database, if killing a process can
corrupt
the database it sounds like a major problem.


Yes it is a major problem, but not with postgresql.  It's a major
problem with the linux OOM killer killing processes that should not be
killed.

Would it be postgresql's fault if it corrupted data because my machine
had bad memory?  Or a bad hard drive?  This is the same kind of
failure.  The postmaster should never be killed.  It's the one thing
holding it all together.


I fail to see the difference between the OOM killing it and the power going
out.


Then you fail to understand.

scenario 1:  There's a postmaster, it owns all the child processes.
It gets killed.  The Postmaster gets restarted.  Since there isn't one


when the postmaster gets killed doesn't that kill all its children as 
well?



running, it comes up and starts new child processes.  Meanwhile, the old
child processes that don't belong to it are busy writing to the data
store.  Instant corruption.


if so then the postmaster should not only check if there is an existing 
postmaster running, it should check for the presence of the child 
processes as well.



scenario 2: Someone pulls the plug.  Every postgres child dies a quick
death.  Data on the drives is coherent and recoverable.

 And yes, if the power went out and PG came up with a corrupted DB

(assuming I didn't turn off fsync, etc) I *would* blame PG.


Then you might be wrong.  If you were using the LVM, or certain levels
of SW RAID, or a RAID controller with cache with no battery backing
that is set to write-back, or if you were using an IDE or SATA drive /
controller that didn't support write barriers, or using NFS mounts for
database storage, and so on.


these all fall under (assuming I didn't turn off fsync, etc)


My point being that PostgreSQL HAS to
make certain assumptions about its environment that it simply cannot
directly control or test for.  Not having the postmaster shot in the
head while the children keep running is one of those things.


 I understand
that killing the postmaster could stop all useful PG work, that it could
cause it to stop responding to clients, that it could even crash PG, et
cetera, but if a particular process dying causes corrupted DBs, that sounds
borked to me.


Well, design a better method and implement it.  If everything went
through the postmaster you'd be lucky to get 100 transactions per
second.


well, if you aren't going through the postmaster, what process is 
receiving network messages? It can't be a group of processes; only one can 
be listening to a socket at one time.


and if the postmaster isn't needed for the child processes to write to the 
datastore, how are multiple child processes prevented from writing to the 
datastore normally? and why doesn't that mechanism continue to work?



 There are compromises between performance and reliability
under fire that have to be made.  It is not unreasonable to assume
that your OS is not going to randomly kill off processes because of a
dodgy VM implementation quirk.

P.s. I'm a big fan of linux, and I run my dbs on it.  But I turn off
overcommit and make a few other adjustments to make sure my database
is safe.  The OOM killer as a default is fine for workstations, but
it's an insane setting for servers, much like swappiness=60 is an
insane setting for a server too.


so are you saying that the only possible thing that can kill the 
postmaster is the OOM killer? It can't possibly exit in any other 
situation without the children being shutdown first?


I would be surprised if that was really true.

David Lang

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] How to setup disk spindles for best performance

2008-08-28 Thread Greg Smith

On Thu, 21 Aug 2008, Christiaan Willemsen wrote:

Anyway, I'm going to return the controller, because it 
does not scale very well with more than 4 disks in RAID 10. Bandwidth is 
limited to 350MB/sec, and IOPS scale badly with extra disks...


How did you determine that upper limit?  Usually it takes multiple 
benchmark processes running at once in order to get more than 350MB/s out 
of a controller.  For example, if you look carefully at the end of 
http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ 
you can see that Joshua had to throw 8 threads at the disks in order to 
reach maximum bandwidth.


The idea for xlog + os on 4 disk raid 10 and the rest for the data sound 
good


I would just use a RAID1 pair for the OS, another pair for the xlog, and 
throw all the other disks into a big 0+1 set.  There is some value to 
separating the WAL from the OS disks, from both the performance and the 
management perspectives.  It's nice to be able to monitor the xlog write 
bandwidth rate under load easily for example.
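For example, with pg_xlog on its own device, something like the following works (the device name is an assumption for whichever disk pair holds the xlog):

```shell
# Watch extended I/O statistics for the dedicated xlog device in
# 5-second samples; write throughput there is the WAL write rate.
iostat -x 5 /dev/sdb
```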


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Alvaro Herrera
[EMAIL PROTECTED] escribió:
 On Thu, 28 Aug 2008, Scott Marlowe wrote:

 scenario 1:  There's a postmaster, it owns all the child processes.
 It gets killed.  The Postmaster gets restarted.  Since there isn't one

 when the postmaster gets killed doesn't that kill all its children as  
 well?

Of course not.  The postmaster gets a SIGKILL, which is instant death.
There's no way to signal the children.  If they were killed too then
this wouldn't be much of a problem.

 running, it comes up and starts new child processes.  Meanwhile, the old
 child processes that don't belong to it are busy writing to the data
 store.  Instant corruption.

 if so then the postmaster should not only check if there is an existing  
 postmaster running, it should check for the presence of the child  
 processes as well.

See my other followup.  There's limited things it can check, but against
sysadmin stupidity there's no silver bullet.

 well, if you aren't going through the postmaster, what process is  
 receiving network messages? it can't be a group of processes, only one 
 can be listening to a socket at one time.

Huh?  Each backend has its own socket.

 and if the postmaster isn't needed for the child processes to write to 
 the datastore, how are multiple child processes prevented from writing to 
 the datastore normally? and why doesn't that mechanism continue to work?

They use locks.  Those locks are implemented using shared memory.  If a
new postmaster starts, it gets a new shared memory, and a new set of
locks, that do not conflict with the ones already held by the first gang
of backends.  This is what causes the corruption.


 so are you saying that the only possible thing that can kill the  
 postmaster is the OOM killer? it can't possibly exit in any other  
 situation without the children being shutdown first?

 I would be surprised if that was really true.

If the sysadmin sends a SIGKILL then obviously the same thing happens.

Any other signal gives it the chance to signal the children before
dying.

-- 
Alvaro Herrerahttp://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread david

On Thu, 28 Aug 2008, Alvaro Herrera wrote:


[EMAIL PROTECTED] escribió:

On Thu, 28 Aug 2008, Scott Marlowe wrote:



scenario 1:  There's a postmaster, it owns all the child processes.
It gets killed.  The Postmaster gets restarted.  Since there isn't one


when the postmaster gets killed doesn't that kill all its children as
well?


Of course not.  The postmaster gets a SIGKILL, which is instant death.
There's no way to signal the children.  If they were killed too then
this wouldn't be much of a problem.


I'm not saying that it would signal its children, I thought that the OS 
killed children (unless steps were taken to allow them to re-parent)



well, if you aren't going through the postmaster, what process is
receiving network messages? it can't be a group of processes, only one
can be listening to a socket at one time.


Huh?  Each backend has its own socket.


we must be talking about different things. I'm talking about the socket 
that would be used for clients to talk to postgres, this is either a TCP 
socket or a unix socket. in either case only one process can listen on it.



and if the postmaster isn't needed for the child processes to write to
the datastore, how are multiple child processes prevented from writing to
the datastore normally? and why doesn't that mechanism continue to work?


They use locks.  Those locks are implemented using shared memory.  If a
new postmaster starts, it gets a new shared memory, and a new set of
locks, that do not conflict with the ones already held by the first gang
of backends.  This is what causes the corruption.


so the new postmaster needs to detect that there is a shared memory 
segment out that used by backends for this database.


this doesn't sound that hard, basically something similar to a pid file in 
the db directory that records what backends are running and what shared 
memory segment they are using.


this would be similar to the existing pid file that would have to be 
removed manually before a new postmaster can start (if it's not a graceful 
shutdown)


besides, some watchdog would need to start the new postmaster; that 
watchdog can be taught to kill off the child processes before starting a 
new postmaster, along with clearing the pid file.
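The stale-pid-file check such a watchdog would need can be sketched like this (an illustration of the general technique, not PostgreSQL's actual startup logic; `pid_alive` is a made-up helper name):

```python
import os

def pid_alive(pid: int) -> bool:
    """Probe a pid recorded in a pid file. Signal 0 delivers nothing;
    it only checks whether the process exists and can be signaled."""
    try:
        os.kill(pid, 0)
    except ProcessLookupError:
        return False          # no such process: the pid file is stale
    except PermissionError:
        return True           # process exists but is owned by someone else
    return True

# A watchdog would read the pid out of the file and check it like this:
alive = pid_alive(os.getpid())
print(alive)
```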



so are you saying that the only possible thing that can kill the
postmaster is the OOM killer? it can't possibly exit in any other
situation without the children being shut down first?

I would be surprised if that was really true.


If the sysadmin sends a SIGKILL then obviously the same thing happens.

Any other signal gives it the chance to signal the children before
dying.


are you sure that it's not going to die from a memory allocation error? or 
any other similar type of error without _always_ killing the children?


David Lang



Re: [PERFORM] Identifying the nature of blocking I/O

2008-08-28 Thread Greg Smith

On Sun, 24 Aug 2008, Tom Lane wrote:

Mind you, I don't think Apple sells any hardware that would be really 
suitable for a big-ass database server.


If you have money to burn, you can get an XServe with up to 8 cores and 
32GB of RAM, and get a card to connect it to a Fiber Channel disk array. 
For only moderately large requirements, you can even get a card with 256MB 
of battery-backed cache (rebranded LSI) to attach the 3 drives in the 
chassis.  None of these are very cost effective compared to servers like 
the popular HP models people mention here regularly, but it is possible.


As for Systemtap on Linux, it may eventually accumulate enough of a 
standard library to be usable by regular admins, but I don't see any sign 
that's a priority for development.  Right now what you have to know in 
order to write useful scripts is much more complicated than with DTrace, 
where there are all sorts of useful things you can script trivially.  I 
think a good part of DTrace's success comes from flattening that learning 
curve.  Take a look at the one-liners at 
http://www.solarisinternals.com/wiki/index.php/DTraceToolkit and compare 
them against http://sourceware.org/systemtap/examples/


That complexity works against the tool on so many levels.  For example, I 
can easily imagine selling even a paranoid admin on running a simple 
DTrace script like the one-line examples.  Whereas every Systemtap example 
I've seen looks pretty scary at first, and I can't imagine a DBA in a 
typical enterprise environment being able to convince their associated 
admin team they're perfectly safe to run in production.


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Greg Smith

On Tue, 26 Aug 2008, Scott Marlowe wrote:


If it is a checkpoint issue then you need more aggresive bgwriter
settings, and possibly more bandwidth on your storage array.


Since this is 8.3.1 the main useful thing to do is increase 
checkpoint_segments and checkpoint_completion_target to spread the I/O 
over a longer period.  Making the background writer more aggressive 
doesn't really help with checkpoint-related spikes.


What is checkpoint_segments set to on this system?  If it's still at the 
default of 3, you should increase that dramatically.


What does vmstat 10 say during these spikes?  If you're running the 
sysstate service with data collection then sar can tell you a lot.


Henk seemed a bit confused about this suggestion, and the typo doesn't 
help.  You can install the sysstat package with:


# apt-get install sysstat

This collects system load data at regular intervals, automatically, and 
sar is the tool you use to look at it.  On Debian, in order to get it to 
collect that information for you, I believe you just need to do:


# dpkg-reconfigure sysstat

Then answer yes to "Do you want to activate sysstat's cron job?"  This 
will install a crontab file that collects all the data you need for sar to 
work.  You may need to restart the service after that.  There's a useful 
walkthrough for this at 
http://www.linuxweblog.com/blogs/wizap/20080126/sysstat-ubuntu


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread Greg Smith

On Thu, 28 Aug 2008, Bill Moran wrote:


In linux, it's possible to tell the OOM killer never to consider
certain processes for the axe, using /proc magic.  See this page:
http://linux-mm.org/OOM_Killer

Perhaps this should be in the PostgreSQL docs somewhere?


The fact that 
http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN22218 
tells you to flat-out turn off overcommit is the right conservative thing 
to be in the documentation as I see it.  Sure, it's possible to keep it on 
but disable the worst side-effect in some kernels (looks like 2.6.11+, so 
no RHEL4 for example).  Trying to get into all of that in the manual is 
kind of pushing what's appropriate for the PostgreSQL docs, I think.
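Checking which policy a given Linux box is actually running is a one-liner against /proc; a small Linux-only sketch:

```python
from pathlib import Path

# Current kernel overcommit policy; mode 2 (strict accounting) is what
# the PostgreSQL docs recommend for dedicated database servers.
MODES = {
    "0": "heuristic overcommit (the default)",
    "1": "always overcommit, never refuse",
    "2": "strict accounting -- no overcommit",
}
mode = Path("/proc/sys/vm/overcommit_memory").read_text().strip()
recognized = mode in MODES
print(MODES.get(mode, "unknown"))
```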


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD



Re: [PERFORM] How to setup disk spindles for best performance

2008-08-28 Thread Christiaan Willemsen


On Aug 29, 2008, at 4:43 AM, Greg Smith wrote:


On Thu, 21 Aug 2008, Christiaan Willemsen wrote:

Anyway, I'm going to return the controller, because it does not  
scale very well with more than 4 disks in raid 10. Bandwidth is  
limited to 350MB/sec, and IOPS scale badly with extra disks...


How did you determine that upper limit?  Usually it takes multiple  
benchmark processes running at once in order to get more than 350MB/ 
s out of a controller.  For example, if you look carefully at the  
end of http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ 
 you can see that Joshua had to throw 8 threads at the disks in  
order to reach maximum bandwidth.


I used IOmeter to do some tests, with 50 worker threads doing jobs. I  
can get more than 350 MB/sec, but I'll have to use huge block sizes  
(something like 8 MB). Even worse are the random read and 70% read,  
50% random tests. They don't scale at all when you add disks.  A 6 disk  
raid 5 is exactly as fast as a 12 disk raid 10 :(


The idea for xlog + os on 4 disk raid 10 and the rest for the data  
sounds good


I would just use a RAID1 pair for the OS, another pair for the xlog,  
and throw all the other disks into a big 0+1 set.  There is some  
value to separating the WAL from the OS disks, from both the  
performance and the management perspectives.  It's nice to be able  
to monitor the xlog write bandwidth rate under load easily for  
example.


Yes, that's about what I had in mind.

Kind regards,

Christiaan



Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception

2008-08-28 Thread James Mansion

[EMAIL PROTECTED] wrote:
for example if you have a process that uses 1G of ram (say firefox) 
and it needs to start a new process (say acroread to handle a pdf 
file), what it does is it forks the firefox process (each of which 
has 1G of ram allocated), and then does an exec of the acroread 
process (releasing the 1G of ram previously held by that copy of the 
firefox process)


Indeed, which is why we have vfork.  And, OK, vfork is busted if you 
have a threaded environment, so we have posix_spawn and posix_spawnp.


It is also worth noting that the copy isn't really a full copy on any 
decent modern UNIX - it is a reservation against the total swap space 
available.  Most pages will be happily shared copy-on-write and never 
fully copied to the child before the exec.
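That reservation bookkeeping is visible in /proc/meminfo; a small Linux-only sketch:

```python
import re

# Committed_AS is the total address space the kernel has "promised" to
# processes; CommitLimit is the ceiling enforced under strict accounting
# (vm.overcommit_memory=2).  Under the default heuristic mode,
# Committed_AS may exceed CommitLimit -- that gap is the overcommitment.
fields = {}
with open("/proc/meminfo") as meminfo:
    for line in meminfo:
        match = re.match(r"(CommitLimit|Committed_AS):\s+(\d+) kB", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
print(sorted(fields))
```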


I can't see how an OS can lie to processes about memory being allocated 
to them and not be ridiculed as a toy, but there you go.  I don't think 
Linux is the only perpetrator - doesn't AIX do this too?


The 'best strategy' for the OOM killer is not to have one, and accept 
that you need some swap space available (it doesn't have to be fast, 
since it won't actually be touched) to help out when fork/exec happens 
in big process images.


James

