Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Wed, 27 Aug 2008, Craig James wrote: The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. it depends on how much stuff you allow others to run on the box. if you have no control of that then yes, the box is unreliable (but it's not just becouse of the OOM killer, it's becouse those other users can eat up all the other box resources as well CPU, network bandwidth, disk bandwidth, etc) even with overcommit disabled, the only way you can be sure that a program will not fail is to make sure that it never needs to allocate memory. with overcommit off you could have one program that eats up 100% of your ram without failing (handling the error on memory allocation such that it doesn't crash), but which will cause _every_ other program on the system to fail, including any scripts (becouse every command executed will require forking and without overcommit that will require allocating the total memory that your shell has allocated so that it can run a trivial command (like ps or kill that you are trying to use to fix the problem) if you have a box with unpredictable memory use, disabling overcommit will not make it reliable. it may make it less unreliable (the fact that the linux OOM killer will pick one of the worst possible processes to kill is a problem), but less unreliable is not the same as reliable. it's also not that hard to have a process monitor the postmaster (along with other box resources) to restart it if it is killed, at some point you can get init to watch your watchdog and the OOM killer will not kill init. so while you can't prevent the postmaster from being killed, you can setup to recover from it. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Tom Lane wrote: [EMAIL PROTECTED] writes: On Wed, 27 Aug 2008, Andrew Sullivan wrote: The upshot of this is that postgres tends to be a big target for the OOM killer, with seriously bad effects to your database. So for good Postgres operation, you want to run on a machine with the OOM killer disabled. I disagree with you. Actually, the problem with Linux' OOM killer is that it *disproportionately targets the PG postmaster*, on the basis not of memory that the postmaster is using but of memory its child processes are using. This was discussed in the PG archives a few months ago; I'm too lazy to search for the link right now, but the details and links to confirming kernel documentation are in our archives. This is one hundred percent antithetical to the basic design philosophy of Postgres, which is that no matter how badly the child processes screw up, the postmaster should live to fight another day. The postmaster basically exists to restart things after children die ungracefully. If the OOM killer takes out the postmaster itself (rather than the child that was actually eating the unreasonable amount of memory), we have no chance of recovering. So, if you want a PG installation that is as robust as it's designed to be, you *will* turn off Linux' OOM killer. Otherwise, don't complain to us when your database unexpectedly stops responding. (Alternatively, if you know how an unprivileged userland process can defend itself against such exceedingly brain-dead kernel policy, we are all ears.) there are periodic flamefests on the kernel mailing list over the OOM killer, if you can propose a better algorithm for it to use than the current one that doesn't end up being just as bad for some other workload the kernel policy can be changed. IIRC the reason why it targets the parent process is to deal with a fork-bomb type of failure where a program doesn't use much memory itself, but forks off memory hogs as quickly as it can. if the OOM killer only kills the children the problem never gets solved. I assume that the postmaster process is monitoring the back-end processes by being it's parent, is there another way that this monitoring could be done so that the back-end processes become independant of the monitoring tool after they are started (the equivalent of nohup)? while this approach to monitoring may not be as quick to react as a wait for a child exit, it may be worth doing if it makes the postmaster not be the prime target of the OOM killer when things go bad on the system. regards, tom lane PS: I think this is probably unrelated to the OP's problem, since he stated there was no sign of any problem from the database server's side. agreed. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] indexing for distinct search in timestamp based table
I'm looking for some help in speeding up searches. My table is pretty simple (see below), but somewhat large, and continuously growing. Currently it has about 50 million rows. The table is (I know I have excessive indexes, I'm trying to get the appropriate ones and drop the extras): Table public.ad_log Column|Type | Modifiers --+-+--- - ad_log_id| integer | not null default nextval('ad_log_ad_log_id_seq'::regclass) channel_name | text| not null player_name | text| not null ad_name | text| not null start_time | timestamp without time zone | not null end_time | timestamp without time zone | not null Indexes: ad_log_pkey PRIMARY KEY, btree (ad_log_id) ad_log_channel_name_key UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time) ad_log_ad_and_start btree (ad_name, start_time) ad_log_ad_name btree (ad_name) ad_log_all btree (channel_name, player_name, start_time, ad_name) ad_log_channel_name btree (channel_name) ad_log_end_time btree (end_time) ad_log_player_and_start btree (player_name, start_time) ad_log_player_name btree (player_name) ad_log_start_time btree (start_time) The query I'm trying to speed up is below. In it the field tag can be one of channel_name, player_name, or ad_name. I'm actually trying to return the distinct values and I found GROUP BY to be slightly faster than using DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses in which case we use '%', but it seems Postgres optimizes that pretty well. SELECT field FROM ad_log WHERE channel_name LIKE :channel_name AND player_name LIKE :player_name AND ad_name LIKE :ad_name AND start_time BETWEEN :start_date AND (date(:end_date) + 1) GROUP BY field ORDER BY field A typical query is: explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%' AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND (date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name; with the result being: QUERY PLAN --- Sort (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1) Sort Key: channel_name Sort Method: quicksort Memory: 17kB - HashAggregate (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1) - Bitmap Heap Scan on ad_log (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1) Recheck Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text)) - Bitmap Index Scan on ad_log_start_time (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1) Index Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Total runtime: 75460.361 ms It seems to me there should be some way to create an index to speed this up, but the various ones I've tried so far haven't helped. Any suggestions would be greatly appreciated. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] indexing for distinct search in timestamp based table
Rainer Mager wrote: I'm looking for some help in speeding up searches. My table is pretty simple (see below), but somewhat large, and continuously growing. Currently it has about 50 million rows. Regarding your use of LIKE: (1)If you are able to specify the beginning character(s) of the statement you are searching for, you will have a better chance of your statement using an index. If you specify a wildcard(%) before the search string, the entire string in the column must be searched therefore no index will be used. (2) Reorder your where clause to reduce the size of the set that LIKE operates on. In your example below, put the BETWEEN before the LIKE. (3) Consider the use of trigrams instead of LIKE. I have not used it but I notice that postgres supports trigrams: The pg_trgm module provides functions and operators for determining the similarity of text based on trigram matching, as well as index operator classes that support fast searching for similar strings. Here is the link: http://www.postgresql.org/docs/current/static/pgtrgm.html --cheers HH The table is (I know I have excessive indexes, I'm trying to get the appropriate ones and drop the extras): Table public.ad_log Column|Type | Modifiers --+-+--- - ad_log_id| integer | not null default nextval('ad_log_ad_log_id_seq'::regclass) channel_name | text| not null player_name | text| not null ad_name | text| not null start_time | timestamp without time zone | not null end_time | timestamp without time zone | not null Indexes: ad_log_pkey PRIMARY KEY, btree (ad_log_id) ad_log_channel_name_key UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time) ad_log_ad_and_start btree (ad_name, start_time) ad_log_ad_name btree (ad_name) ad_log_all btree (channel_name, player_name, start_time, ad_name) ad_log_channel_name btree (channel_name) ad_log_end_time btree (end_time) ad_log_player_and_start btree (player_name, start_time) ad_log_player_name btree (player_name) ad_log_start_time btree (start_time) The query I'm trying to speed up is below. In it the field tag can be one of channel_name, player_name, or ad_name. I'm actually trying to return the distinct values and I found GROUP BY to be slightly faster than using DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses in which case we use '%', but it seems Postgres optimizes that pretty well. SELECT field FROM ad_log WHERE channel_name LIKE :channel_name AND player_name LIKE :player_name AND ad_name LIKE :ad_name AND start_time BETWEEN :start_date AND (date(:end_date) + 1) GROUP BY field ORDER BY field A typical query is: explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%' AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND (date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name; with the result being: QUERY PLAN --- Sort (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1) Sort Key: channel_name Sort Method: quicksort Memory: 17kB - HashAggregate (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1) - Bitmap Heap Scan on ad_log (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1) Recheck Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text)) - Bitmap Index Scan on ad_log_start_time (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1) Index Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Total runtime: 75460.361 ms It seems to me there should be some way to create an index to speed this up, but the various ones I've tried so far haven't helped. Any suggestions would be greatly appreciated. -- H. Hall ReedyRiver Group LLC http://www.reedyriver.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Wed, Aug 27, 2008 at 03:22:09PM -0700, [EMAIL PROTECTED] wrote: I disagree with you. I think goof Postgres operation is so highly dependant on caching as much data as possible that disabling overcommit (and throwing away a lot of memory that could be used for cache) is a solution that's as bad or worse than the problem it's trying to solve. Ok, but the danger is that the OOM killer kills your postmaster. To me, this is a cure way worse than the disease it's trying to treat. YMMD c. c. A -- Andrew Sullivan [EMAIL PROTECTED] +1 503 667 4564 x104 http://www.commandprompt.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't free the memory will return a memroy allocation error (which I believe will cause firefox to exit). Remember that the memory overcommit check is checking against the amount of RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly any extra actual RAM is used (due to copy on write), but the potential is there for the process to use it. If overcommit is switched off, then you just need to make sure there is *plenty* of swap to convince the kernel that it can actually fulfil all of the memory requests if all the processes behave badly and all shared pages become unshared. Then the consequences of processes actually using that memory are that the machine will swap, rather than the OOM killer having to act. Of course, it's generally bad to run a machine with more going on than will fit in RAM. Neither swapping nor OOM killing are particularly good - it's just a consequence of the amount of memory needed being unpredictable. Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Matthew -- Taking apron off And now you can say honestly that you have been to a lecture where you watched paint dry. - Computer Graphics Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
In response to Matthew Wakeling [EMAIL PROTECTED]: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. This thread interested me enough to research this a bit. In linux, it's possible to tell the OOM killer never to consider certain processes for the axe, using /proc magic. See this page: http://linux-mm.org/OOM_Killer Perhaps this should be in the PostgreSQL docs somewhere? -- Bill Moran Collaborative Fusion Inc. http://people.collaborativefusion.com/~wmoran/ [EMAIL PROTECTED] Phone: 412-422-3463x4023 -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causesAn I/O error occured while sending to the backend. exception
Bill Moran [EMAIL PROTECTED] wrote: In response to Matthew Wakeling [EMAIL PROTECTED]: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. This thread interested me enough to research this a bit. In linux, it's possible to tell the OOM killer never to consider certain processes for the axe, using /proc magic. See this page: http://linux-mm.org/OOM_Killer Perhaps this should be in the PostgreSQL docs somewhere? That sure sounds like a good idea. Even though the one time the OOM killer kicked in on one of our servers, it killed a runaway backend and not the postmaster ( http://archives.postgresql.org/pgsql-bugs/2008-07/msg00105.php ), I think I will modify our service scripts in /etc/init.d/ to pick off the postmaster pid after a start and echo -16 (or some such) into the /proc/pid/oom_adj file (which is where I found the file on my SuSE system). Thanks for the research and the link! -Kevin -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Aug 28, 2008, at 6:26 AM, Matthew Wakeling wrote: On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't free the memory will return a memroy allocation error (which I believe will cause firefox to exit). Remember that the memory overcommit check is checking against the amount of RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly any extra actual RAM is used (due to copy on write), but the potential is there for the process to use it. If overcommit is switched off, then you just need to make sure there is *plenty* of swap to convince the kernel that it can actually fulfil all of the memory requests if all the processes behave badly and all shared pages become unshared. Then the consequences of processes actually using that memory are that the machine will swap, rather than the OOM killer having to act. Of course, it's generally bad to run a machine with more going on than will fit in RAM. Neither swapping nor OOM killing are particularly good - it's just a consequence of the amount of memory needed being unpredictable. Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Another approach we used successfully for a similar problem -- (we had lots of free high memory but were running out of low memory; oom killer wiped out MQ a couple times and postmaster a couple times) -- was to change the settings for how aggressively the virtual memory system protected low memory by changing /proc/sys/vm/lowmem_reserve_ratio (2.6.18?+ Kernel). I don't remember all of the details, but we looked at Documentation/filesystems/proc.txt for the 2.6.25 kernel (it wasn't documented for earlier kernel releases) to figure out how it worked and set it appropriate to our system memory configuration. -Jerry -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Steve Atkins Sent: Thursday, August 28, 2008 9:06 AM To: PostgreSQL Performance Subject: Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception On Aug 28, 2008, at 6:26 AM, Matthew Wakeling wrote: On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't free the memory will return a memroy allocation error (which I believe will cause firefox to exit). Remember that the memory overcommit check is checking against the amount of RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly any extra actual RAM is used (due to copy on write), but the potential is there for the process to use it. If overcommit is switched off, then you just need to make sure there is *plenty* of swap to convince the kernel that it can actually fulfil all of the memory requests if all the processes behave badly and all shared pages become unshared. Then the consequences of processes actually using that memory are that the machine will swap, rather than the OOM killer having to act. Of course, it's generally bad to run a machine with more going on than will fit in RAM. Neither swapping nor OOM killing are particularly good - it's just a consequence of the amount of memory needed being unpredictable. Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Cheers, Steve -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
[EMAIL PROTECTED] wrote: On Wed, 27 Aug 2008, Craig James wrote: The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. it depends on how much stuff you allow others to run on the box. if you have no control of that then yes, the box is unreliable (but it's not just becouse of the OOM killer, it's becouse those other users can eat up all the other box resources as well CPU, network bandwidth, disk bandwidth, etc) even with overcommit disabled, the only way you can be sure that a program will not fail is to make sure that it never needs to allocate memory. with overcommit off you could have one program that eats up 100% of your ram without failing (handling the error on memory allocation such that it doesn't crash), but which will cause _every_ other program on the system to fail, including any scripts (becouse every command executed will require forking and without overcommit that will require allocating the total memory that your shell has allocated so that it can run a trivial command (like ps or kill that you are trying to use to fix the problem) if you have a box with unpredictable memory use, disabling overcommit will not make it reliable. it may make it less unreliable (the fact that the linux OOM killer will pick one of the worst possible processes to kill is a problem), but less unreliable is not the same as reliable. The problem with any argument in favor of memory overcommit and OOM is that there is a MUCH better, and simpler, solution. Buy a really big disk, say a terabyte, and allocate the whole thing as swap space. Then do a decent job of configuring your kernel so that any reasonable process can allocate huge chunks of memory that it will never use, but can't use the whole terrabyte. Using real swap space instead of overallocated memory is a much better solution. - It's cheap. - There is no performance hit at all if you buy enough real memory - If runaway processes start actually using memory, the system slows down, but server processes like Postgres *aren't killed*. - When a runaway process starts everybody swapping, you can just find it and kill it. Once it's dead, everything else goes back to normal. It's hard to imagine a situation where any program or collection of programs would actually try to allocate more than a terrabyte of memory and exceed the swap space on a single terrabyte disk. The cost is almost nothing, a few hundred dollars. So turn off overcommit, and buy an extra disk if you actually need a lot of virtual memory. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Steve Atkins wrote: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Oh yes, that's very important. However, that gives the machine the opportunity to thrash. Matthew -- The early bird gets the worm. If you want something else for breakfast, get up later. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Craig James wrote: [EMAIL PROTECTED] wrote: On Wed, 27 Aug 2008, Craig James wrote: The OOM killer is a terrible idea for any serious database server. I wrote a detailed technical paper on this almost 15 years ago when Silicon Graphics had this same feature, and Oracle and other critical server processes couldn't be made reliable. The problem with overallocating memory as Linux does by default is that EVERY application, no matter how well designed and written, becomes unreliable: It can be killed because of some OTHER process. You can be as clever as you like, and do all the QA possible, and demonstrate that there isn't a single bug in Postgres, and it will STILL be unreliable if you run it on a Linux system that allows overcommitted memory. IMHO, all Postgres servers should run with memory-overcommit disabled. On Linux, that means /proc/sys/vm/overcommit_memory=2. it depends on how much stuff you allow others to run on the box. if you have no control of that then yes, the box is unreliable (but it's not just becouse of the OOM killer, it's becouse those other users can eat up all the other box resources as well CPU, network bandwidth, disk bandwidth, etc) even with overcommit disabled, the only way you can be sure that a program will not fail is to make sure that it never needs to allocate memory. with overcommit off you could have one program that eats up 100% of your ram without failing (handling the error on memory allocation such that it doesn't crash), but which will cause _every_ other program on the system to fail, including any scripts (becouse every command executed will require forking and without overcommit that will require allocating the total memory that your shell has allocated so that it can run a trivial command (like ps or kill that you are trying to use to fix the problem) if you have a box with unpredictable memory use, disabling overcommit will not make it reliable. it may make it less unreliable (the fact that the linux OOM killer will pick one of the worst possible processes to kill is a problem), but less unreliable is not the same as reliable. The problem with any argument in favor of memory overcommit and OOM is that there is a MUCH better, and simpler, solution. Buy a really big disk, say a terabyte, and allocate the whole thing as swap space. Then do a decent job of configuring your kernel so that any reasonable process can allocate huge chunks of memory that it will never use, but can't use the whole terrabyte. Using real swap space instead of overallocated memory is a much better solution. - It's cheap. cheap in dollars, if you actually use any of it it's very expensive in performance - There is no performance hit at all if you buy enough real memory - If runaway processes start actually using memory, the system slows down, but server processes like Postgres *aren't killed*. - When a runaway process starts everybody swapping, you can just find it and kill it. Once it's dead, everything else goes back to normal. all of these things are still true if you enable overcommit, the difference is that with overcommit enabled your actual ram will be used for cache as much as possible, with overcommit disabled you will keep throwing away cache to make room for memory that's allocated but not written to. I generally allocate 2G of disk to swap, if the system ends up using even that much it will have slowed to a crawl, but if you are worried that that's no enough, by all means go ahead and allocate more, but allocateing a 1TB disk is overkill (do you realize how long it takes just to _read_ an entire 1TB disk? try it sometime with dd if=/dev/drive of=/dev/null) David Lang It's hard to imagine a situation where any program or collection of programs would actually try to allocate more than a terrabyte of memory and exceed the swap space on a single terrabyte disk. The cost is almost nothing, a few hundred dollars. So turn off overcommit, and buy an extra disk if you actually need a lot of virtual memory. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Matthew Wakeling wrote: On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't free the memory will return a memroy allocation error (which I believe will cause firefox to exit). Remember that the memory overcommit check is checking against the amount of RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly any extra actual RAM is used (due to copy on write), but the potential is there for the process to use it. If overcommit is switched off, then you just need to make sure there is *plenty* of swap to convince the kernel that it can actually fulfil all of the memory requests if all the processes behave badly and all shared pages become unshared. Then the consequences of processes actually using that memory are that the machine will swap, rather than the OOM killer having to act. if you are correct that it just checks against memory+swap then it's not a big deal, but I don't think it does that. I think it actually allocates the memory, and if it does that it will push things out of ram to do the allocation, I don't believe that it will allocate swap space directly. David Lang Of course, it's generally bad to run a machine with more going on than will fit in RAM. Neither swapping nor OOM killing are particularly good - it's just a consequence of the amount of memory needed being unpredictable. Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Matthew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 2008-08-28 at 00:56 -0400, Tom Lane wrote: Actually, the problem with Linux' OOM killer is that it *disproportionately targets the PG postmaster*, on the basis not of memory that the postmaster is using but of memory its child processes are using. This was discussed in the PG archives a few months ago; I'm too lazy to search for the link right now, but the details and links to confirming kernel documentation are in our archives. http://archives.postgresql.org/pgsql-hackers/2008-02/msg00101.php It's not so much that the OOM Killer targets the parent process for a fraction of the memory consumed by the child. It may not be a great design, but it's not what's causing the problem for the postmaster. The problem for the postmaster is that the OOM killer counts the children's total vmsize -- including *shared* memory -- against the parent, which is such a bad idea I don't know where to start. If you have shared_buffers set to 1GB and 25 connections, the postmaster will be penalized as though it was using 13.5 GB of memory, even though all the processes together are only using about 1GB! Not only that, killing a process doesn't free shared memory, so it's just flat out broken. Regards, Jeff Davis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Matthew Wakeling wrote: On Thu, 28 Aug 2008, Steve Atkins wrote: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Oh yes, that's very important. However, that gives the machine the opportunity to thrash. No, that's where the whole argument for allowing overcommitted memory falls flat. The entire argument for allowing overcommitted memory hinges on the fact that processes *won't use the memory*. If they use it, then overcommitting causes problems everywhere, such as a Postmaster getting arbitrarily killed. If a process *doesn't* use the memory, then there's no problem with thrashing, right? So it never makes sense to enable overcommitted memory when Postgres, or any server, is running. Allocating a big, fat terabyte swap disk is ALWAYS better than allowing overcommitted memory. If your usage is such that overcommitted memory would never be used, then the swap disk will never be used either. If your processes do use the memory, then your performance goes into the toilet, and you know it's time to buy more memory or a second server, but in the mean time your server processes at least keep running while you kill the rogue processes. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, [EMAIL PROTECTED] wrote: On Thu, 28 Aug 2008, Matthew Wakeling wrote: On Wed, 27 Aug 2008, [EMAIL PROTECTED] wrote: if memory overcommit is disabled, the kernel checks to see if you have an extra 1G of ram available, if you do it allows the process to continue, if you don't it tries to free memory (by throwing away cache, swapping to disk, etc), and if it can't free the memory will return a memroy allocation error (which I believe will cause firefox to exit). Remember that the memory overcommit check is checking against the amount of RAM + swap you have - not just the amount of RAM. When a fork occurs, hardly any extra actual RAM is used (due to copy on write), but the potential is there for the process to use it. If overcommit is switched off, then you just need to make sure there is *plenty* of swap to convince the kernel that it can actually fulfil all of the memory requests if all the processes behave badly and all shared pages become unshared. Then the consequences of processes actually using that memory are that the machine will swap, rather than the OOM killer having to act. if you are correct that it just checks against memory+swap then it's not a big deal, but I don't think it does that. I think it actually allocates the memory, and if it does that it will push things out of ram to do the allocation, I don't believe that it will allocate swap space directly. I just asked on the kernel mailing list and Alan Cox responded. he is saying that you are correct, it only allocates against the total available, it doesn't actually allocate ram. so with sufficiant swap overcommit off should be fine. but you do need to allocate more swap as the total memory 'used' can be significantly higher that with overcommit on. David Lang David Lang Of course, it's generally bad to run a machine with more going on than will fit in RAM. Neither swapping nor OOM killing are particularly good - it's just a consequence of the amount of memory needed being unpredictable. Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Matthew -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Craig James wrote: Matthew Wakeling wrote: On Thu, 28 Aug 2008, Steve Atkins wrote: Probably the best solution is to just tell the kernel somehow to never kill the postmaster. Or configure adequate swap space? Oh yes, that's very important. However, that gives the machine the opportunity to thrash. No, that's where the whole argument for allowing overcommitted memory falls flat. The entire argument for allowing overcommitted memory hinges on the fact that processes *won't use the memory*. If they use it, then overcommitting causes problems everywhere, such as a Postmaster getting arbitrarily killed. If a process *doesn't* use the memory, then there's no problem with thrashing, right? So it never makes sense to enable overcommitted memory when Postgres, or any server, is running. Allocating a big, fat terabyte swap disk is ALWAYS better than allowing overcommitted memory. If your usage is such that overcommitted memory would never be used, then the swap disk will never be used either. If your processes do use the memory, then your performance goes into the toilet, and you know it's time to buy more memory or a second server, but in the mean time your server processes at least keep running while you kill the rogue processes. there was a misunderstanding (for me if nobody else) that without overcommit it was actual ram that was getting allocated, which could push things out to swap even if the memory ended up not being needed later. with the clarification that this is not the case and the allocation is just reducing the virtual memory available it's now clear that it is just as efficiant to run with overcommit off. so the conclusion is: no performance/caching/buffer difference between the two modes. the differencees between the two are: with overcommit when all ram+swap is used OOM killer is activated. for the same amount of ram+swap more allocations can be done before it is all used up (how much more is unpredicable) without overcommit when all ram+swap is allocated programs (not nessasarily the memory hog) start getting memory allocation errors. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Wed, 2008-08-27 at 23:23 -0700, [EMAIL PROTECTED] wrote: there are periodic flamefests on the kernel mailing list over the OOM killer, if you can propose a better algorithm for it to use than the current one that doesn't end up being just as bad for some other workload the kernel policy can be changed. Tried that: http://lkml.org/lkml/2007/2/9/275 All they have to do is *not* count shared memory against the process (or at least not count it against the parent of the process), and the system may approximate sanity. IIRC the reason why it targets the parent process is to deal with a fork-bomb type of failure where a program doesn't use much memory itself, but forks off memory hogs as quickly as it can. if the OOM killer only kills the children the problem never gets solved. But killing a process won't free shared memory. And there is already a system-wide limit on shared memory. So what's the point of such a bad design? Regards, Jeff Davis -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] update - which way quicker?
Good morning, Tried to compare Table1 based on Table2 . update table1.col = false if table1.pk_cols not in table2.pk_cols For the following two ways, (2) always performs better than (1) right, and I need your inputs. (1) update table1 set col = false where table1.pk_co1 || table1.pk_col2 || table1.pk_colN NOT IN (select pk_co1 || pk_col2 || pk_colN from table2 ) (2) ResultSet(rs) = select pk_col1||pk_col2... || pk_colN from table1 left join table2 using (pk_col1..., pk_colN) where table2.pk_col1 is null Then for each rs record, do: update table1 set col = false where col1||... colN in rs.value Thanks a lot! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
On Thu, Aug 28, 2008 at 1:22 PM, cluster [EMAIL PROTECTED] wrote: I'm about to buy a combined web- and database server. When (if) the site gets sufficiently popular, we will split the database out to a separate server. Our budget is limited, so how should we prioritize? Standard prioritization for a db server is: Disks and controller, RAM, CPU. * We think about buying some HP Proliant server with at least 4GB ram and at least a duo core processor. Possibly quad core. The OS will be debian/Linux. HP Makes nice equipment. Also, since this machine will have apache as well as pgsql running on it, you might want to look at more memory if it's reasonably priced. If pg and apache are using 1.5Gig total to run, you've got 2.5Gig for the OS to cache in. With 8 Gig of ram, you'd have 6.5Gig to cache in. Also, the cost of a quad core nowadays is pretty reasonable. * Much of the database will fit in RAM so it is not *that* necessary to prefer the more expensive SAS 1 RPM drives to the cheaper 7500 RPM SATA drives, is it? That depends. Writes will still have to hit the drives. Reads will be mostly from memory. Be sure to set your effective_cache_size appropriately. There will both be many read- and write queries and a *lot* (!) of random reads. * I think we will go for hardware-based RAID 1 with a good battery-backed-up controller. The HP RAID controller that's been mentioned on the list seems like a good performer. I have read that software RAID perform surprisingly good, but for a production site where hotplug replacement of dead disks is required, is software RAID still worth it? The answre is maybe. The reason people keep testing software RAID is that a lot of cheap (not necessarily in cost, just in design) controllers give mediocre performance compared to SW RAID. With SW RAID on top of a caching controller in jbod mode, the controller simply becomes a cache that can survive power loss, and doesn't have to do any RAID calculations any more. With today's very fast CPUs, and often running RAID-10 for dbs, which requires little real overhead, it's not uncommon for SW RAID to outrun HW. With better controllers, the advantage is small to none. Anything else we should be aware of? Can you go with 4 drives? Even if they're just SATA drives, you'd be amazed at what going from a 2 drive mirror to a 4 drive RAID-10 can do for your performance. Note you'll have no more storage going from 2 drive mirror to 4 drive RAID-10, but your aggregate bandwidth on reads will be doubled. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Jeff Davis wrote: The problem for the postmaster is that the OOM killer counts the children's total vmsize -- including *shared* memory -- against the parent, which is such a bad idea I don't know where to start. If you have shared_buffers set to 1GB and 25 connections, the postmaster will be penalized as though it was using 13.5 GB of memory, even though all the processes together are only using about 1GB! I find it really hard to believe that it counts shared memory like that. That's just dumb. Of course, there are two types of shared memory. There's explicit shared memory, like Postgres uses, and there's copy-on-write shared memory, caused by a process fork. The copy-on-write memory needs to be counted for each child, but the explicit shared memory needs to be counted just once. Not only that, killing a process doesn't free shared memory, so it's just flat out broken. Exactly. a cost-benefit model would work well here. Work out how much RAM would be freed by killing a process, and use that when choosing which process to kill. Matthew -- You will see this is a 3-blackboard lecture. This is the closest you are going to get from me to high-tech teaching aids. Hey, if they put nooses on this, it would be fun! -- Computer Science Lecturer -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
-Mensaje original- De: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] En nombre de cluster I'm about to buy a combined web- and database server. When (if) the site gets sufficiently popular, we will split the database out to a separate server. Our budget is limited, so how should we prioritize? * We think about buying some HP Proliant server with at least 4GB ram and at least a duo core processor. Possibly quad core. The OS will be debian/Linux. * Much of the database will fit in RAM so it is not *that* necessary to prefer the more expensive SAS 1 RPM drives to the cheaper 7500 RPM SATA drives, is it? There will both be many read- and write queries and a *lot* (!) of random reads. * I think we will go for hardware-based RAID 1 with a good battery-backed-up controller. I have read that software RAID perform surprisingly good, but for a production site where hotplug replacement of dead disks is required, is software RAID still worth it? Anything else we should be aware of? I havent had any issues with software raid (mdadm) and hot-swaps. It keeps working in degraded mode and as soon as you replace the defective disk it can reconstruct the array on the fly. Performance will suffer while at it but the service keeps up. The battery backup makes a very strong point for a hw controller. Still, I have heard good things on combining a HW controller with JBODS leaving the RAID affair to mdadm. In your scenario though with *lots* of random reads, if I had to choose between a HW controller 2 disks or software RAID with 4 or 6 disks, I would go for the disks. There are motherboards with 6 SATA ports. For the money you will save on the controller you can afford 6 disks in a RAID 10 setup. Cheers, Fernando. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
-Mensaje original- De: [EMAIL PROTECTED] * I think we will go for hardware-based RAID 1 with a good battery-backed-up controller. I have read that software RAID perform surprisingly good, but for a production site where hotplug replacement of dead disks is required, is software RAID still worth it? ... I havent had any issues with software raid (mdadm) and hot-swaps. It keeps working in degraded mode and as soon as you replace the defective disk it can reconstruct the array on the fly. Performance will suffer while at it but the service keeps up. The battery backup makes a very strong point for a hw controller. Still, I have heard good things on combining a HW controller with JBODS leaving the RAID affair to mdadm. In your scenario though with *lots* of random reads, if I had to choose between a HW controller 2 disks or software RAID with 4 or 6 disks, I would go for the disks. There are motherboards with 6 SATA ports. For the money you will save on the controller you can afford 6 disks in a RAID 10 setup. This is good advice. Hot-swapping seems cool, but how often will you actually use it? Maybe once every year? With Software RAID, replacing a disk means shutdown, swap the hardware, and reboot, which is usually less than ten minutes, and you're back in business. If that's the only thing that happens, you'll have 99.97% uptime on your server. If you're on a limited budget, a software RAID 1+0 will be very cost effective and give good performance for lots of random reads. Hardware RAID with a battery-backed cache helps with writes and hot swapping. If your random-read performance needs outweigh these two factors, consider software RAID. Craig -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Craig James wrote: If your processes do use the memory, then your performance goes into the toilet, and you know it's time to buy more memory or a second server, but in the mean time your server processes at least keep running while you kill the rogue processes. I'd argue against swap ALWAYS being better than overcommit. It's a choice between your performance going into the toilet or your processes dieing. On the one hand, if someone fork-bombs you, the OOM killer has a chance of solving the problem for you, rather than you having to log onto an unresponsive machine to kill the process yourself. On the other hand, the OOM killer may kill the wrong thing. Depending on what else you use your machine for, either of the choices may be the right one. Another point is that from a business perspective, a database that has stopped responding is equally bad regardless of whether that is because the OOM killer has appeared or because the machine is thrashing. In both cases, there is a maximum throughput that the machine can handle, and if requests appear quicker than that the system will collapse, especially if the requests start timing out and being retried. This problem really is caused by the kernel not having enough information on how much memory a process is going to use. I would be much in favour of replacing fork() with some more informative system call. For example, forkandexec() instead of fork() then exec() - the kernel would know that the new process will never need any of that duplicated RAM. However, there is *far* too much legacy in the old fork() call to change that now. Likewise, I would be all for Postgres managing its memory better. It would be very nice to be able to set a maximum amount of work-memory, rather than a maximum amount per backend. Each backend could then make do with however much is left of the work-memory pool when it actually executes queries. As it is, the server admin has no idea how many multiples of work-mem are going to be actually used, even knowing the maximum number of backends. Matthew -- Of course it's your fault. Everything here's your fault - it says so in your contract.- Quark -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] update - which way quicker?
On 2008-08-28, at 21:31, Emi Lu wrote: Good morning, Tried to compare Table1 based on Table2 . update table1.col = false if table1.pk_cols not in table2.pk_cols For the following two ways, (2) always performs better than (1) right, and I need your inputs. == == (1) update table1 set col = false where table1.pk_co1 || table1.pk_col2 || table1.pk_colN NOT IN (select pk_co1 || pk_col2 || pk_colN from table2 ) (2) ResultSet(rs) = select pk_col1||pk_col2... || pk_colN from table1 left join table2 using (pk_col1..., pk_colN) where table2.pk_col1 is null Then for each rs record, do: update table1 set col = false where col1||... colN in rs.value Thanks a lot! -- Sent via pgsql-performance mailing list (pgsql- [EMAIL PROTECTED]) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance Check EXISTS http://www.postgresql.org/docs/8.3/interactive/functions-subquery.html Serdecznie pozdrawiam Pawel Socha [EMAIL PROTECTED] programista/administrator perl -le 's**02).4^-%2,).^9%4^!./4(%2^3,!#+7!2%^53%2**y% -;^[%`- { a%%s%%$_%ee' -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
Thanks for all your replies! They are enlightening. I have some additional questions: 1) Would you prefer a) 5.4k 2 SATA RAID10 on four disks or b) 10k 2 SAS RAID1 on two disks? (Remember the lots (!) of random reads) 2) Should I just make one large partition of my RAID? Does it matter at all? 3) Will I gain much by putting the OS on a saparate disk, not included in the RAID? (The webserver and database would still share the RAID - but I guess the OS will cache my (small) web content in RAM anyway). Thanks again! -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
On Thu, Aug 28, 2008 at 2:04 PM, Fernando Hevia [EMAIL PROTECTED] wrote: I havent had any issues with software raid (mdadm) and hot-swaps. It keeps working in degraded mode and as soon as you replace the defective disk it can reconstruct the array on the fly. Performance will suffer while at it but the service keeps up. I too put my vote behind mdadm for ease of use. However, there are reports that certain levels of RAID in linux kernel RAID that are supposed to NOT handle write barriers properly. So that's what worries me. The battery backup makes a very strong point for a hw controller. Still, I have heard good things on combining a HW controller with JBODS leaving the RAID affair to mdadm. In your scenario though with *lots* of random reads, This is especially true on slower RAID controllers. A lot of RAID controllers in the $300 range with battery backed caching don't do RAID real well, but do caching ok. If you can't afford a $1200 RAID card then this might be a good option. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Best hardware/cost tradoff?
On Thu, Aug 28, 2008 at 3:29 PM, cluster [EMAIL PROTECTED] wrote: Thanks for all your replies! They are enlightening. I have some additional questions: 1) Would you prefer a) 5.4k 2 SATA RAID10 on four disks or b) 10k 2 SAS RAID1 on two disks? (Remember the lots (!) of random reads) I'd lean towards 4 disks in RAID-10. Better performance when 1 read is going on. Similar commit rates to the two 10k drives. Probably bigger drives too, right? Always nice to have room to work in. 2) Should I just make one large partition of my RAID? Does it matter at all? Probably. With more disks it might be advantageous to split out two drives into RAID-10 for pg_xlog. with 2 or 4 disks, splitting off two for pg_xlog might slow down the data partition more than you gain from a separate pg_xlog drive set. 3) Will I gain much by putting the OS on a saparate disk, not included in the RAID? (The webserver and database would still share the RAID - but I guess the OS will cache my (small) web content in RAM anyway). The real reason you want your OS on a different set of drives is that it allows you to reconfigure your underlying RAID array as needed without having to reinstall the whole OS again. Yeah, logging to /var/log will eat some bandwidth on your RAID as well, but the ease of maintenance is why I do it as much as anything. A lot of large servers support 2 fixed drives for the OS and a lot of removeable drives hooked up to a RAID controller for this reason. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED] wrote: Another point is that from a business perspective, a database that has stopped responding is equally bad regardless of whether that is because the OOM killer has appeared or because the machine is thrashing. In both cases, there is a maximum throughput that the machine can handle, and if requests appear quicker than that the system will collapse, especially if the requests start timing out and being retried. But there's a HUGE difference between a machine that has bogged down under load so badly that you have to reset it and a machine that's had the postmaster slaughtered by the OOM killer. In the first situation, while the machine is unresponsive, it should come right back up with a coherent database after the restart. OTOH, a machine with a dead postmaster is far more likely to have a corrupted database when it gets restarted. Likewise, I would be all for Postgres managing its memory better. It would be very nice to be able to set a maximum amount of work-memory, rather than a maximum amount per backend. Each backend could then make do with however much is left of the work-memory pool when it actually executes queries. As it is, the server admin has no idea how many multiples of work-mem are going to be actually used, even knowing the maximum number of backends. Agreed. It would be useful to have a cap on all work_mem, but it might be an issue that causes all the backends to talk to each other, which can be really slow if you're running a thousand or so connections. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] indexing for distinct search in timestamp based table
I once also had a similar performance problem when looking for all matching rows between two timestamps. In fact that's why I'm here today. The problem was with MySQL. I had some tables of around 10 million rows and all my searching was timestamp based. MySQL didn't do what I wanted. I found that using a CLUSTERED index with postgresql to be lightning quick. Yet mostly the matching rows I was working with was not much over the 100k mark. I'm wondering if clustering the table on ad_log_start_time will help cut down on random reads. That's if you can afford to block the users while postgresql clusters the table. If you're inserting in order of the start_time column (which I was) then the cluster should almost maintain itself (I think), providing you're not updating or deleting anyway, I'd assume that since it looks like a log table. David. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Mager Sent: 28 August 2008 09:06 To: pgsql-performance@postgresql.org Subject: [PERFORM] indexing for distinct search in timestamp based table I'm looking for some help in speeding up searches. My table is pretty simple (see below), but somewhat large, and continuously growing. Currently it has about 50 million rows. The table is (I know I have excessive indexes, I'm trying to get the appropriate ones and drop the extras): Table public.ad_log Column|Type | Modifiers --+-+--- - ad_log_id| integer | not null default nextval('ad_log_ad_log_id_seq'::regclass) channel_name | text| not null player_name | text| not null ad_name | text| not null start_time | timestamp without time zone | not null end_time | timestamp without time zone | not null Indexes: ad_log_pkey PRIMARY KEY, btree (ad_log_id) ad_log_channel_name_key UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time) ad_log_ad_and_start btree (ad_name, start_time) ad_log_ad_name btree (ad_name) ad_log_all btree (channel_name, player_name, start_time, ad_name) ad_log_channel_name btree (channel_name) ad_log_end_time btree (end_time) ad_log_player_and_start btree (player_name, start_time) ad_log_player_name btree (player_name) ad_log_start_time btree (start_time) The query I'm trying to speed up is below. In it the field tag can be one of channel_name, player_name, or ad_name. I'm actually trying to return the distinct values and I found GROUP BY to be slightly faster than using DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses in which case we use '%', but it seems Postgres optimizes that pretty well. SELECT field FROM ad_log WHERE channel_name LIKE :channel_name AND player_name LIKE :player_name AND ad_name LIKE :ad_name AND start_time BETWEEN :start_date AND (date(:end_date) + 1) GROUP BY field ORDER BY field A typical query is: explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%' AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND (date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name; with the result being: QUERY PLAN --- Sort (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1) Sort Key: channel_name Sort Method: quicksort Memory: 17kB - HashAggregate (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1) - Bitmap Heap Scan on ad_log (cost=285064.30..1129582.84 rows=13434427 width=10) (actual time=8506.250..65771.597 rows=13701296 loops=1) Recheck Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Filter: ((channel_name ~~ '%'::text) AND (ad_name ~~ '%'::text)) - Bitmap Index Scan on ad_log_start_time (cost=0.00..281705.70 rows=13434427 width=0) (actual time=8488.443..8488.443 rows=13701296 loops=1) Index Cond: ((start_time = '2008-07-01 00:00:00'::timestamp without time zone) AND (start_time = '2008-07-29'::date)) Total runtime: 75460.361 ms It seems to me there should be some way to create an index to speed this up, but the various ones I've tried so far haven't helped. Any suggestions would be greatly appreciated. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
Re: [PERFORM] indexing for distinct search in timestamp based table
Another suggestion is to partition the table by date ranges. If most of the range queries occur on particular batches of time, this will make all queries more efficient, and improve locality and efficiency of all indexes on the table. This is more work than simply a table CLUSTER, especially in maintenance overhead, but it will generally help a lot in cases like these. Additionally, if these don't change much after some period of time the tables older than the modification window can be vacuumed, clustered, and reindexed if needed to make them as efficient as possible and maintenance free after that point (other than backups and archives). Another benefit of clustering is in backup / restore. You can incrementally back up only the index partitions that have changed -- for large databases this reduces pg_dump and pg_restore times substantially. To do this you combine regular expressions with the pg_dump exclude tables or include tables flags. On Thu, Aug 28, 2008 at 3:48 PM, David Rowley [EMAIL PROTECTED] wrote: I once also had a similar performance problem when looking for all matching rows between two timestamps. In fact that's why I'm here today. The problem was with MySQL. I had some tables of around 10 million rows and all my searching was timestamp based. MySQL didn't do what I wanted. I found that using a CLUSTERED index with postgresql to be lightning quick. Yet mostly the matching rows I was working with was not much over the 100k mark. I'm wondering if clustering the table on ad_log_start_time will help cut down on random reads. That's if you can afford to block the users while postgresql clusters the table. If you're inserting in order of the start_time column (which I was) then the cluster should almost maintain itself (I think), providing you're not updating or deleting anyway, I'd assume that since it looks like a log table. David. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Rainer Mager Sent: 28 August 2008 09:06 To: pgsql-performance@postgresql.org Subject: [PERFORM] indexing for distinct search in timestamp based table I'm looking for some help in speeding up searches. My table is pretty simple (see below), but somewhat large, and continuously growing. Currently it has about 50 million rows. The table is (I know I have excessive indexes, I'm trying to get the appropriate ones and drop the extras): Table public.ad_log Column|Type | Modifiers --+-+--- - ad_log_id| integer | not null default nextval('ad_log_ad_log_id_seq'::regclass) channel_name | text| not null player_name | text| not null ad_name | text| not null start_time | timestamp without time zone | not null end_time | timestamp without time zone | not null Indexes: ad_log_pkey PRIMARY KEY, btree (ad_log_id) ad_log_channel_name_key UNIQUE, btree (channel_name, player_name, ad_name, start_time, end_time) ad_log_ad_and_start btree (ad_name, start_time) ad_log_ad_name btree (ad_name) ad_log_all btree (channel_name, player_name, start_time, ad_name) ad_log_channel_name btree (channel_name) ad_log_end_time btree (end_time) ad_log_player_and_start btree (player_name, start_time) ad_log_player_name btree (player_name) ad_log_start_time btree (start_time) The query I'm trying to speed up is below. In it the field tag can be one of channel_name, player_name, or ad_name. I'm actually trying to return the distinct values and I found GROUP BY to be slightly faster than using DISTINCT. Also, any of those fields may be unspecified in the WHERE clauses in which case we use '%', but it seems Postgres optimizes that pretty well. SELECT field FROM ad_log WHERE channel_name LIKE :channel_name AND player_name LIKE :player_name AND ad_name LIKE :ad_name AND start_time BETWEEN :start_date AND (date(:end_date) + 1) GROUP BY field ORDER BY field A typical query is: explain analyze SELECT channel_name FROM ad_log WHERE channel_name LIKE '%' AND ad_name LIKE '%' AND start_time BETWEEN '2008-07-01' AND (date('2008-07-28') + 1) GROUP BY channel_name ORDER BY channel_name; with the result being: QUERY PLAN --- Sort (cost=1163169.02..1163169.03 rows=5 width=10) (actual time=75460.187..75460.192 rows=15 loops=1) Sort Key: channel_name Sort Method: quicksort Memory: 17kB - HashAggregate (cost=1163168.91..1163168.96 rows=5 width=10) (actual time=75460.107..75460.114 rows=15 loops=1) - Bitmap Heap Scan
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, Aug 28, 2008 at 5:08 PM, [EMAIL PROTECTED] wrote: On Thu, 28 Aug 2008, Scott Marlowe wrote: On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED] wrote: Another point is that from a business perspective, a database that has stopped responding is equally bad regardless of whether that is because the OOM killer has appeared or because the machine is thrashing. In both cases, there is a maximum throughput that the machine can handle, and if requests appear quicker than that the system will collapse, especially if the requests start timing out and being retried. But there's a HUGE difference between a machine that has bogged down under load so badly that you have to reset it and a machine that's had the postmaster slaughtered by the OOM killer. In the first situation, while the machine is unresponsive, it should come right back up with a coherent database after the restart. OTOH, a machine with a dead postmaster is far more likely to have a corrupted database when it gets restarted. wait a min here, postgres is supposed to be able to survive a complete box failure without corrupting the database, if killing a process can corrupt the database it sounds like a major problem. Yes it is a major problem, but not with postgresql. It's a major problem with the linux OOM killer killing processes that should not be killed. Would it be postgresql's fault if it corrupted data because my machine had bad memory? Or a bad hard drive? This is the same kind of failure. The postmaster should never be killed. It's the one thing holding it all together. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Scott Marlowe wrote: On Thu, Aug 28, 2008 at 5:08 PM, [EMAIL PROTECTED] wrote: On Thu, 28 Aug 2008, Scott Marlowe wrote: On Thu, Aug 28, 2008 at 2:29 PM, Matthew Wakeling [EMAIL PROTECTED] wrote: Another point is that from a business perspective, a database that has stopped responding is equally bad regardless of whether that is because the OOM killer has appeared or because the machine is thrashing. In both cases, there is a maximum throughput that the machine can handle, and if requests appear quicker than that the system will collapse, especially if the requests start timing out and being retried. But there's a HUGE difference between a machine that has bogged down under load so badly that you have to reset it and a machine that's had the postmaster slaughtered by the OOM killer. In the first situation, while the machine is unresponsive, it should come right back up with a coherent database after the restart. OTOH, a machine with a dead postmaster is far more likely to have a corrupted database when it gets restarted. wait a min here, postgres is supposed to be able to survive a complete box failure without corrupting the database, if killing a process can corrupt the database it sounds like a major problem. Yes it is a major problem, but not with postgresql. It's a major problem with the linux OOM killer killing processes that should not be killed. Would it be postgresql's fault if it corrupted data because my machine had bad memory? Or a bad hard drive? This is the same kind of failure. The postmaster should never be killed. It's the one thing holding it all together. the ACID guarantees that postgres is making are supposed to mean that even if the machine dies, the CPU goes up in smoke, etc, the transactions that are completed will not be corrupted. if killing the process voids all the ACID protection then something is seriously wrong. it may loose transactions that are in flight, but it should not corrupt the database. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, Aug 28, 2008 at 7:16 PM, [EMAIL PROTECTED] wrote: the ACID guarantees that postgres is making are supposed to mean that even if the machine dies, the CPU goes up in smoke, etc, the transactions that are completed will not be corrupted. And if any of those things happens, the machine will shut down and you'll be safe. if killing the process voids all the ACID protection then something is seriously wrong. No, your understanding of what postgresql can expect to have happen to it are wrong. You'll lose data integrity if: If a CPU starts creating bad output that gets written to disk, your RAID controller starts writing garbage to disk, your memory has bad bits and you don't have ECC, Some program hijacks a postgres process and starts writing random bits in the code, some program comes along and kills the postmaster, which coordinates all the backends, and corrupts shared data in the process. it may loose transactions that are in flight, but it should not corrupt the database. That's true for anything that just stops the machine or all the postgresql processes dead. It's not true for a machine that is misbehaving. And any server that randomly kills processes is misbehaving. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED]wrote: wait a min here, postgres is supposed to be able to survive a complete box failure without corrupting the database, if killing a process can corrupt the database it sounds like a major problem. Yes it is a major problem, but not with postgresql. It's a major problem with the linux OOM killer killing processes that should not be killed. Would it be postgresql's fault if it corrupted data because my machine had bad memory? Or a bad hard drive? This is the same kind of failure. The postmaster should never be killed. It's the one thing holding it all together. I fail to see the difference between the OOM killing it and the power going out. And yes, if the power went out and PG came up with a corrupted DB (assuming I didn't turn off fsync, etc) I *would* blame PG. I understand that killing the postmaster could stop all useful PG work, that it could cause it to stop responding to clients, that it could even crash PG, et ceterabut if a particular process dying causes corrupted DBs, that sounds borked to me.
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, Aug 28, 2008 at 7:53 PM, Matthew Dennis [EMAIL PROTECTED] wrote: On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED] wrote: wait a min here, postgres is supposed to be able to survive a complete box failure without corrupting the database, if killing a process can corrupt the database it sounds like a major problem. Yes it is a major problem, but not with postgresql. It's a major problem with the linux OOM killer killing processes that should not be killed. Would it be postgresql's fault if it corrupted data because my machine had bad memory? Or a bad hard drive? This is the same kind of failure. The postmaster should never be killed. It's the one thing holding it all together. I fail to see the difference between the OOM killing it and the power going out. Then you fail to understand. scenario 1: There's a postmaster, it owns all the child processes. It gets killed. The Postmaster gets restarted. Since there isn't one running, it comes up. starts new child processes. Meanwhile, the old child processes that don't belong to it are busy writing to the data store. Instant corruption. scenario 2: Someone pulls the plug. Every postgres child dies a quick death. Data on the drives is coherent and recoverable. And yes, if the power went out and PG came up with a corrupted DB (assuming I didn't turn off fsync, etc) I *would* blame PG. Then you might be wrong. If you were using the LVM, or certain levels of SW RAID, or a RAID controller with cache with no battery backing that is set to write-back, or if you were using an IDE or SATA drive / controller that didn't support write barriers, or using NFS mounts for database storage, and so on. My point being that PostgreSQL HAS to make certain assumptions about its environment that it simply cannot directly control or test for. Not having the postmaster shot in the head while the children keep running is one of those things. I understand that killing the postmaster could stop all useful PG work, that it could cause it to stop responding to clients, that it could even crash PG, et ceterabut if a particular process dying causes corrupted DBs, that sounds borked to me. Well, design a better method and implement it. If everything went through the postmaster you'd be lucky to get 100 transactions per second. There are compromises between performance and reliability under fire that have to be made. It is not unreasonable to assume that your OS is not going to randomly kill off processes because of a dodgy VM implementation quirk. P.s. I'm a big fan of linux, and I run my dbs on it. But I turn off overcommit and make a few other adjustments to make sure my database is safe. The OOM killer as a default is fine for workstations, but it's an insane setting for servers, much like swappiness=60 is an insane setting for a server too. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Nested Loop join being improperly chosen
I had a similar problem here: http://archives.postgresql.org/pgsql-bugs/2008-07/msg00026.php Is the nested loop performing a LEFT join with yours? It's a little difficult to tell just from the query plan you showed. A work around for mine was to use a full outer join and eliminate the extra rows in the where clause. A bit of a hack but it changed a 2 min query into one that ran in under a second. Of course this is not helping with your problem but at least may trigger some more feedback. David. -Original Message- From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of Brad Ediger Sent: 22 August 2008 16:26 To: pgsql-performance@postgresql.org Subject: [PERFORM] Nested Loop join being improperly chosen Hello, I'm having trouble with a Nested Loop being selected for a rather complex query; it turns out this is a pretty bad plan as the nested loop's row estimates are quite off (1 estimated / 1207881 actual). If I disable enable_nestloop, the query executes much faster (42 seconds instead of 605). The tables in the query have all been ANALYZEd just before generating these plans. Here are the plans with and without enable_nestloop: http://pastie.org/258043 The inventory table is huge; it currently has about 1.3 x 10^9 tuples. The items table has around 10,000 tuples, and the other tables in the query are tiny. Any ideas or suggestions would be greatly appreciated. Thanks! -- Brad Ediger -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Postgres not using array
On Thu, 21 Aug 2008, Andr? Volpato wrote: So, what is slowing down is the CPU (Intel(R) Pentium(R) Dual CPU E2160 @ 1.80GHz)..In practice, I have noticed that dual 1.8 is worse than single 3.0. We have another server wich is a Pentium D 3.0 GHz, that runs faster. Pentium D models are all dual-core so either you've got the wrong model number here or you've actually comparing against a 2X3.0GHz part. The Core 2 Duo E2160 has a very small CPU cache--512KB per core. Your older Pentium system probably has quite a bit more. I suspect that's the main reason it runs faster on this application. I am a bit confused about what CPU is best for Postgres. Our apps is mostly read, with a few connections and heavy queryes. There are a lot of things you can run into with Postgres that end up being limited by the fact that they only run on a single core, as you've seen here. If you've only got a fairly small number of connections running CPU heavy queries, you probably want a processor with lots of L2 cache and a fast clock speed, rather than adding a large number of cores running at a slower speed. The very small L2 cache on your E2160 is likely what's holding it back here, and even though the newer processors are significantly more efficient per clock the gap between 1.8GHz and 3.0GHz is pretty big. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
Scott Marlowe escribió: scenario 1: There's a postmaster, it owns all the child processes. It gets killed. The Postmaster gets restarted. Since there isn't one running, it comes up. Actually there's an additional step required at this point. There isn't a postmaster running, but a new one refuses to start, because the shmem segment is in use. In order for the second postmaster to start, the sysadmin must remove the PID file by hand. starts new child processes. Meanwhile, the old child processes that don't belong to it are busy writing to the data store. Instant corruption. In this scenario, it is both a kernel fault and sysadmin stupidity. The corruption that ensues is 100% deserved. -- Alvaro Herrerahttp://www.CommandPrompt.com/ PostgreSQL Replication, Consulting, Custom Development, 24x7 support -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Scott Marlowe wrote: On Thu, Aug 28, 2008 at 7:53 PM, Matthew Dennis [EMAIL PROTECTED] wrote: On Thu, Aug 28, 2008 at 8:11 PM, Scott Marlowe [EMAIL PROTECTED] wrote: wait a min here, postgres is supposed to be able to survive a complete box failure without corrupting the database, if killing a process can corrupt the database it sounds like a major problem. Yes it is a major problem, but not with postgresql. It's a major problem with the linux OOM killer killing processes that should not be killed. Would it be postgresql's fault if it corrupted data because my machine had bad memory? Or a bad hard drive? This is the same kind of failure. The postmaster should never be killed. It's the one thing holding it all together. I fail to see the difference between the OOM killing it and the power going out. Then you fail to understand. scenario 1: There's a postmaster, it owns all the child processes. It gets killed. The Postmaster gets restarted. Since there isn't one when the postmaster gets killed doesn't that kill all it's children as well? running, it comes up. starts new child processes. Meanwhile, the old child processes that don't belong to it are busy writing to the data store. Instant corruption. if so then the postmaster should not only check if there is an existing postmaster running, it should check for the presense of the child processes as well. scenario 2: Someone pulls the plug. Every postgres child dies a quick death. Data on the drives is coherent and recoverable. And yes, if the power went out and PG came up with a corrupted DB (assuming I didn't turn off fsync, etc) I *would* blame PG. Then you might be wrong. If you were using the LVM, or certain levels of SW RAID, or a RAID controller with cache with no battery backing that is set to write-back, or if you were using an IDE or SATA drive / controller that didn't support write barriers, or using NFS mounts for database storage, and so on. these all fall under (assuming I didn't turn off fsync, etc) My point being that PostgreSQL HAS to make certain assumptions about its environment that it simply cannot directly control or test for. Not having the postmaster shot in the head while the children keep running is one of those things. I understand that killing the postmaster could stop all useful PG work, that it could cause it to stop responding to clients, that it could even crash PG, et ceterabut if a particular process dying causes corrupted DBs, that sounds borked to me. Well, design a better method and implement it. If everything went through the postmaster you'd be lucky to get 100 transactions per second. well, if you aren't going through the postmaster, what process is recieving network messages? it can't be a group of processes, only one can be listening to a socket at one time. and if the postmaster isn't needed for the child processes to write to the datastore, how are multiple child processes prevented from writing to the datastore normally? and why doesn't that mechanism continue to work? There are compromises between performance and reliability under fire that have to be made. It is not unreasonable to assume that your OS is not going to randomly kill off processes because of a dodgy VM implementation quirk. P.s. I'm a big fan of linux, and I run my dbs on it. But I turn off overcommit and make a few other adjustments to make sure my database is safe. The OOM killer as a default is fine for workstations, but it's an insane setting for servers, much like swappiness=60 is an insane setting for a server too. so are you saying that the only possible thing that can kill the postmaster is the OOM killer? it can't possilby exit in any other situation without the children being shutdown first? I would be surprised if that was really true. David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to setup disk spindles for best performance
On Thu, 21 Aug 2008, Christiaan Willemsen wrote: Anyway, I'm going to return the controller, because it does not scale very well with more that 4 disks in raid 10. Bandwidth is limited to 350MB/sec, and IOPS scale badly with extra disks... How did you determine that upper limit? Usually it takes multiple benchmark processes running at once in order to get more than 350MB/s out of a controller. For example, if you look carefully at the end of http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ you can see that Joshua had to throw 8 threads at the disks in order to reach maximum bandwidth. The idea for xlog + os on 4 disk raid 10 and the rest for the data sound good I would just use a RAID1 pair for the OS, another pair for the xlog, and throw all the other disks into a big 0+1 set. There is some value to separating the WAL from the OS disks, from both the performance and the management perspectives. It's nice to be able to monitor the xlog write bandwidth rate under load easily for example. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
[EMAIL PROTECTED] escribió: On Thu, 28 Aug 2008, Scott Marlowe wrote: scenario 1: There's a postmaster, it owns all the child processes. It gets killed. The Postmaster gets restarted. Since there isn't one when the postmaster gets killed doesn't that kill all it's children as well? Of course not. The postmaster gets a SIGKILL, which is instant death. There's no way to signal the children. If they were killed too then this wouldn't be much of a problem. running, it comes up. starts new child processes. Meanwhile, the old child processes that don't belong to it are busy writing to the data store. Instant corruption. if so then the postmaster should not only check if there is an existing postmaster running, it should check for the presense of the child processes as well. See my other followup. There's limited things it can check, but against sysadmin stupidity there's no silver bullet. well, if you aren't going through the postmaster, what process is recieving network messages? it can't be a group of processes, only one can be listening to a socket at one time. Huh? Each backend has its own socket. and if the postmaster isn't needed for the child processes to write to the datastore, how are multiple child processes prevented from writing to the datastore normally? and why doesn't that mechanism continue to work? They use locks. Those locks are implemented using shared memory. If a new postmaster starts, it gets a new shared memory, and a new set of locks, that do not conflict with the ones already held by the first gang of backends. This is what causes the corruption. so are you saying that the only possible thing that can kill the postmaster is the OOM killer? it can't possilby exit in any other situation without the children being shutdown first? I would be surprised if that was really true. If the sysadmin sends a SIGKILL then obviously the same thing happens. Any other signal gives it the chance to signal the children before dying. -- Alvaro Herrerahttp://www.CommandPrompt.com/ The PostgreSQL Company - Command Prompt, Inc. -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Alvaro Herrera wrote: [EMAIL PROTECTED] escribi?: On Thu, 28 Aug 2008, Scott Marlowe wrote: scenario 1: There's a postmaster, it owns all the child processes. It gets killed. The Postmaster gets restarted. Since there isn't one when the postmaster gets killed doesn't that kill all it's children as well? Of course not. The postmaster gets a SIGKILL, which is instant death. There's no way to signal the children. If they were killed too then this wouldn't be much of a problem. I'm not saying that it would signal it's children, I thought that the OS killed children (unless steps were taken to allow them to re-parent) well, if you aren't going through the postmaster, what process is recieving network messages? it can't be a group of processes, only one can be listening to a socket at one time. Huh? Each backend has its own socket. we must be talking about different things. I'm talking about the socket that would be used for clients to talk to postgres, this is either a TCP socket or a unix socket. in either case only one process can listen on it. and if the postmaster isn't needed for the child processes to write to the datastore, how are multiple child processes prevented from writing to the datastore normally? and why doesn't that mechanism continue to work? They use locks. Those locks are implemented using shared memory. If a new postmaster starts, it gets a new shared memory, and a new set of locks, that do not conflict with the ones already held by the first gang of backends. This is what causes the corruption. so the new postmaster needs to detect that there is a shared memory segment out that used by backends for this database. this doesn't sound that hard, basicly something similar to a pid file in the db directory that records what backends are running and what shared memory segment they are using. this would be similar to the existing pid file that would have to be removed manually before a new postmaster can start (if it's not a graceful shutdown) besides, some watchdog would need to start the new postmaster, that watchdog can be taught to kill off the child processes before starting a new postmaster along with clearing the pid file. so are you saying that the only possible thing that can kill the postmaster is the OOM killer? it can't possilby exit in any other situation without the children being shutdown first? I would be surprised if that was really true. If the sysadmin sends a SIGKILL then obviously the same thing happens. Any other signal gives it the chance to signal the children before dying. are you sure that it's not going to die from a memory allocation error? or any other similar type of error without _always_ killing the children? David Lang -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Identifying the nature of blocking I/O
On Sun, 24 Aug 2008, Tom Lane wrote: Mind you, I don't think Apple sells any hardware that would be really suitable for a big-ass database server. If you have money to burn, you can get an XServe with up to 8 cores and 32GB of RAM, and get a card to connect it to a Fiber Channel disk array. For only moderately large requirements, you can even get a card with 256MB of battery-backed cache (rebranded LSI) to attach the 3 drives in the chassis. None of these are very cost effective compared to servers like the popular HP models people mention here regularly, but it is possible. As for Systemtap on Linux, it might be possible that will accumulate enough of a standard library to be usable by regular admins one day, but I don't see any sign that's a priority for development. Right now what you have to know in order to write useful scripts is so much more complicated than DTrace, where there's all sorts of useful things you can script trivially. I think a good part of DTrace's success comes from flattening that learning curve. Take a look at the one-liners at http://www.solarisinternals.com/wiki/index.php/DTraceToolkit and compare them against http://sourceware.org/systemtap/examples/ That complexity works against the tool on so many levels. For example, I can easily imagine selling even a paranoid admin on running a simple DTrace script like the one-line examples. Whereas every Systemtap example I've seen looks pretty scary at first, and I can't imagine a DBA in a typical enterprise environment being able to convince their associated admin team they're perfectly safe to run in production. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Tue, 26 Aug 2008, Scott Marlowe wrote: If it is a checkpoint issue then you need more aggresive bgwriter settings, and possibly more bandwidth on your storage array. Since this is 8.3.1 the main useful thing to do is increase checkpoint_segments and checkpoint_completion_target to spread the I/O over a longer period. Making the background writer more aggressive doesn't really help with What is checkpoint_segments set to on this system? If it's still at the default of 3, you should increase that dramatically. What does vmstat 10 say during these spikes? If you're running the sysstate service with data collection then sar can tell you a lot. Henk seemed a bit confused about this suggestion, and the typo doesn't help. You can install the sysstat package with: # apt-get install sysstat This allows collecting system load info data at regular periods, automatically, and sar is the tool you can use to look at it. On Debian, in order to get it to collect that information for you, I believe you just need to do: # dpkg-reconfigure sysstat Then answer yes to Do you want to activate sysstat's cron job? This will install a crontab file that collects all the data you need for sar to work. You may need to restart the service after that. There's a useful walkthrough for this at http://www.linuxweblog.com/blogs/wizap/20080126/sysstat-ubuntu -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
On Thu, 28 Aug 2008, Bill Moran wrote: In linux, it's possible to tell the OOM killer never to consider certain processes for the axe, using /proc magic. See this page: http://linux-mm.org/OOM_Killer Perhaps this should be in the PostgreSQL docs somewhere? The fact that http://www.postgresql.org/docs/current/static/kernel-resources.html#AEN22218 tells you to flat-out turn off overcommit is the right conservative thing to be in the documentation as I see it. Sure, it's possible to keep it on but disable the worst side-effect in some kernels (looks like 2.6.11+, so no RHEL4 for example). Trying to get into all in the manual is kind of pushing what's appropriate for the PostgreSQL docs I think. -- * Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] How to setup disk spindles for best performance
On Aug 29, 2008, at 4:43 AM, Greg Smith wrote: On Thu, 21 Aug 2008, Christiaan Willemsen wrote: Anyway, I'm going to return the controller, because it does not scale very well with more that 4 disks in raid 10. Bandwidth is limited to 350MB/sec, and IOPS scale badly with extra disks... How did you determine that upper limit? Usually it takes multiple benchmark processes running at once in order to get more than 350MB/ s out of a controller. For example, if you look carefully at the end of http://www.commandprompt.com/blogs/joshua_drake/2008/04/is_that_performance_i_smell_ext2_vs_ext3_on_50_spindles_testing_for_postgresql/ you can see that Joshua had to throw 8 threads at the disks in order to reach maximum bandwidth. I used IOmeter to do some tests, with 50 worker thread doing jobs. I can get more than 350 MB/sec, I'll have to use huge blocksizes (something like 8 MB). Even worse is random read and 70%read, 50% random tests. They don't scale at all when you add disks. A 6 disk raid 5 is exactly as fast as a 12 disk raid 10 :( The idea for xlog + os on 4 disk raid 10 and the rest for the data sound good I would just use a RAID1 pair for the OS, another pair for the xlog, and throw all the other disks into a big 0+1 set. There is some value to separating the WAL from the OS disks, from both the performance and the management perspectives. It's nice to be able to monitor the xlog write bandwidth rate under load easily for example. Yes, that's about what I had in mind. Kind regards, Christiaan -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] select on 22 GB table causes An I/O error occured while sending to the backend. exception
[EMAIL PROTECTED] wrote: for example if you have a process that uses 1G of ram (say firefox) and it needs to start a new process (say acroread to handle a pdf file), what it does is it forks the firefox process (each of which have 1G of ram allocated), and then does an exec of the acroread process (releasing the 1G of ram previously held by that copy of the firefox process) Indeed, which is why we have vfork. And, OK, vfork is busted if you have a threaded environment, so we have posix_spawn and posix_spawnp. It is also worth noting that the copy isn't really a full copy on any decent modern UNIX - it is a reservation against the total swap space available. Most pages will be happilly shared copy-on-write and never fully copied to the child before the exec. I can't see how an OS can lie to processes about memory being allocated to them and not be ridiculed as a toy, but there you go. I don't think Linux is the only perpetrator - doesn't AIX do this too? The 'bests trategy' for the OOM killer is not to have one, and accept that you need some swap space available (it doesn't have to be fast since it won't actually be touched) to help out when fork/exec happens in big process images. James -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance