Re: [PATCHES] [HACKERS] Bgwriter behavior
Later version of this patch added to the patch queue.

Your patch has been added to the PostgreSQL unapplied patches list at:

    http://momjian.postgresql.org/cgi-bin/pgpatches

It will be applied as soon as one of the PostgreSQL committers reviews and approves it.

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
> > >
> > > I have no problem doing something for 8.0 if we can find something that meets all the items I mentioned.
> > >
> > > One idea would be to just remove bgwriter_percent. Beta/RC users would still have it in their postgresql.conf, but it is commented out so it should be OK. If they uncomment it their server would not start but we could just tell testers to remove it. I see that as better than having conflicting parameters.
> >
> > Can't say I like that at first thought. I'll think some more though...
> >
> > > Another idea is to have bgwriter_percent be the percent of the buffer it will scan.
> >
> > Hmmm... well that was my original suggestion (bg2.patch on 12 Dec) (...though with a bug, as Neil pointed out)
> >
> > > We could default that to 50% or 100%, but we then need to make sure all beta/RC users update their postgresql.conf with the new default because the commented-out default will not be correct.
> >
> > ...we just differ/ed on what the default should be...
> >
> > > At this point I see these as our only two viable options, aside from doing nothing.
> > >
> > > I realize our current behavior requires a full scan of the buffer cache, but how often is the bgwriter_maxpages limit met? If it is not, a full scan is done anyway, right?
> >
> > Well, if you have a very heavy read workload then that would be a problem. I was more worried about concurrency in a heavy write situation, but I can see your point, and agree.
> >
> > (Idea #1 still suffers from this, so we should rule it out...)
> >
> > > It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.
> >
> > OK. I think you've persuaded me on idea #2, if I understand you right:
> >
> > bgwriter_percent = 50 (default)
> > bgwriter_maxpages = 100 (default)
> >
> > percent is the number of shared_buffers we scan, limited by maxpages.
> >
> > (I'll code it up in a couple of hours when the kids are in bed)
>
> Here's the basic patch - no changes to current default values or docs.
>
> Not sure if this is still interesting or not...
>
> --
> Best Regards, Simon Riggs

[ Attachment, skipping... ]

> ---(end of broadcast)---
> TIP 8: explain analyze is your friend

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Fri, 7 Jan 2005, Bruce Momjian wrote:

> Do we want to add this additional log info to CVS for 8.0?

No, unless we're looking for an RC5?

> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > Here's my bgwriter instrumentation patch, which gives info that could allow the bgwriter settings to be tuned.
> > >
> > > Uh, what does this do exactly? Add additional logging output?
> >
> > Produces output like this...
> >
> > DEBUG:  ARC T1target= 45 B1len= 4954 T1len= 40 T2len= 4960 B2len= 46
> > DEBUG:  ARC total = 98% B1hit= 0% T1hit= 0% T2hit= 98% B2hit= 0%
> > DEBUG:  ARC buffer dirty misses= 22% (wasted=0); cleaned= 4494
> >
> > when you have debug_shared_buffers (= n) set and you have server messages DEBUG1 available.
> >
> > The last line of log output has been replaced by this version.
> >
> > --
> > Best Regards, Simon Riggs

Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]           Yahoo!: yscrappy              ICQ: 7615664
Re: [PATCHES] [HACKERS] Bgwriter behavior
Marc G. Fournier [EMAIL PROTECTED] writes:
> On Fri, 7 Jan 2005, Bruce Momjian wrote:
> > Do we want to add this additional log info to CVS for 8.0?
>
> No, unless we're looking for an RC5?

I vote no as well. While it's probably not a dangerous change, the need for it has not been demonstrated.

            regards, tom lane
Re: [PATCHES] [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Marc G. Fournier [EMAIL PROTECTED] writes:
> > On Fri, 7 Jan 2005, Bruce Momjian wrote:
> > > Do we want to add this additional log info to CVS for 8.0?
> >
> > No, unless we're looking for an RC5?
>
> I vote no as well. While it's probably not a dangerous change, the need for it has not been demonstrated.

OK, Simon, would you email me a copy of the patch again privately so I can put it in the 8.1 queue? I seem to have lost the email. Thanks.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
Do we want to add this additional log info to CVS for 8.0?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
> > Simon Riggs wrote:
> > > Here's my bgwriter instrumentation patch, which gives info that could allow the bgwriter settings to be tuned.
> >
> > Uh, what does this do exactly? Add additional logging output?
>
> Produces output like this...
>
> DEBUG:  ARC T1target= 45 B1len= 4954 T1len= 40 T2len= 4960 B2len= 46
> DEBUG:  ARC total = 98% B1hit= 0% T1hit= 0% T2hit= 98% B2hit= 0%
> DEBUG:  ARC buffer dirty misses= 22% (wasted=0); cleaned= 4494
>
> when you have debug_shared_buffers (= n) set and you have server messages DEBUG1 available.
>
> The last line of log output has been replaced by this version.
>
> --
> Best Regards, Simon Riggs

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Mon, 2005-01-03 at 19:14 -0500, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Here's my bgwriter instrumentation patch, which gives info that could allow the bgwriter settings to be tuned.
>
> Uh, what does this do exactly? Add additional logging output?

Produces output like this...

DEBUG:  ARC T1target= 45 B1len= 4954 T1len= 40 T2len= 4960 B2len= 46
DEBUG:  ARC total = 98% B1hit= 0% T1hit= 0% T2hit= 98% B2hit= 0%
DEBUG:  ARC buffer dirty misses= 22% (wasted=0); cleaned= 4494

when you have debug_shared_buffers (= n) set and you have server messages DEBUG1 available.

The last line of log output has been replaced by this version.

--
Best Regards, Simon Riggs
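As a rough illustration of how per-list hit percentages like the "B1hit/T1hit/T2hit/B2hit" figures above can be derived from counters, here is a minimal C sketch. It mirrors the integer arithmetic visible in the patch (hits * 100 / lookups, with all values zero when there were no lookups), but the struct, enum, and function names are hypothetical and are not the PostgreSQL definitions.

```c
#include <assert.h>

/* Hypothetical list identifiers, standing in for the four ARC lists. */
enum { LIST_B1, LIST_T1, LIST_T2, LIST_B2, NUM_LISTS };

/* Hypothetical counter block: total lookups and hits per list. */
struct arc_stats
{
    long num_lookup;
    long num_hit[NUM_LISTS];
};

/*
 * Integer hit percentage for one list. Returns 0 when no lookups were
 * recorded, so the caller never divides by zero.
 */
long hit_percent(const struct arc_stats *stats, int list)
{
    assert(stats != 0);
    assert(list >= 0 && list < NUM_LISTS);

    if (stats->num_lookup == 0)
        return 0;
    return stats->num_hit[list] * 100 / stats->num_lookup;
}
```

With 200 lookups of which 196 hit the T2 list, `hit_percent` gives 98 for T2 and 0 for the other lists, which is the shape of the second DEBUG line shown above. The division truncates, so small hit counts can report as 0%.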
Re: [PATCHES] [HACKERS] Bgwriter behavior
OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:

    * too late for 8.0
    * not the right improvement
    * to be applied with doc/default additions

Comments?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
> > >
> > > I have no problem doing something for 8.0 if we can find something that meets all the items I mentioned.
> > >
> > > One idea would be to just remove bgwriter_percent. Beta/RC users would still have it in their postgresql.conf, but it is commented out so it should be OK. If they uncomment it their server would not start but we could just tell testers to remove it. I see that as better than having conflicting parameters.
> >
> > Can't say I like that at first thought. I'll think some more though...
> >
> > > Another idea is to have bgwriter_percent be the percent of the buffer it will scan.
> >
> > Hmmm... well that was my original suggestion (bg2.patch on 12 Dec) (...though with a bug, as Neil pointed out)
> >
> > > We could default that to 50% or 100%, but we then need to make sure all beta/RC users update their postgresql.conf with the new default because the commented-out default will not be correct.
> >
> > ...we just differ/ed on what the default should be...
> >
> > > At this point I see these as our only two viable options, aside from doing nothing.
> > >
> > > I realize our current behavior requires a full scan of the buffer cache, but how often is the bgwriter_maxpages limit met? If it is not, a full scan is done anyway, right?
> >
> > Well, if you have a very heavy read workload then that would be a problem. I was more worried about concurrency in a heavy write situation, but I can see your point, and agree.
> >
> > (Idea #1 still suffers from this, so we should rule it out...)
> >
> > > It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.
> >
> > OK. I think you've persuaded me on idea #2, if I understand you right:
> >
> > bgwriter_percent = 50 (default)
> > bgwriter_maxpages = 100 (default)
> >
> > percent is the number of shared_buffers we scan, limited by maxpages.
> >
> > (I'll code it up in a couple of hours when the kids are in bed)
>
> Here's the basic patch - no changes to current default values or docs.
>
> Not sure if this is still interesting or not...
>
> --
> Best Regards, Simon Riggs

[ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:
>
>     * too late for 8.0
>     * not the right improvement
>     * to be applied with doc/default additions

My vote: too late for 8.0. There is no hard evidence that this is a useful improvement, and no time for such evidence to be obtained.

            regards, tom lane
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Mon, 3 Jan 2005, Bruce Momjian wrote:

> OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:
>
>     * too late for 8.0

Too late by at least 3 RCs ...

Marc G. Fournier           Hub.Org Networking Services (http://www.hub.org)
Email: [EMAIL PROTECTED]           Yahoo!: yscrappy              ICQ: 7615664
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:
>
>     * too late for 8.0
>     * not the right improvement
>     * to be applied with doc/default additions
>
> Comments?
>
> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > Simon Riggs wrote:
> > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.

I hear veto ... so the above situation stands then: 8.1 it is.

Not unhappy... I want this thing released as much as the next man...

--
Best Regards, Simon Riggs
Re: [PATCHES] [HACKERS] Bgwriter behavior
Simon Riggs wrote:
> On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> > OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:
> >
> >     * too late for 8.0
> >     * not the right improvement
> >     * to be applied with doc/default additions
> >
> > Comments?
> >
> > ---------------------------------------------------------------------------
> >
> > Simon Riggs wrote:
> > > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > > Simon Riggs wrote:
> > > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
>
> I hear veto ... so the above situation stands then: 8.1 it is.
>
> Not unhappy... I want this thing released as much as the next man...

Well, we went through the process and that's the best we can do.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Mon, 2005-01-03 at 23:03, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2005-01-03 at 20:09, Bruce Momjian wrote:
> > > OK, we have a submitted patch that attempts to improve bgwriter by making bgwriter_percent control what percentage of the buffer is scanned. The patch still needs doc changes and a change to the default value but at this point we need a vote on the patch. Is it:
> > >
> > >     * too late for 8.0
> > >     * not the right improvement
> > >     * to be applied with doc/default additions
> > >
> > > Comments?
> > >
> > > ---------------------------------------------------------------------------
> > >
> > > Simon Riggs wrote:
> > > > On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > > > > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > > > > Simon Riggs wrote:
> > > > > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
> >
> > I hear veto ... so the above situation stands then: 8.1 it is.
> >
> > Not unhappy... I want this thing released as much as the next man...
>
> Well, we went through the process and that's the best we can do.

Here's my bgwriter instrumentation patch, which gives info that could allow the bgwriter settings to be tuned.

--
Best Regards, Simon Riggs

Index: src/backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c	24 Nov 2004 02:56:17 -0000	1.182
--- src/backend/storage/buffer/bufmgr.c	4 Jan 2005 00:04:18 -0000
***************
*** 440,445 ****
--- 440,446 ----
                  UnpinBuffer(buf, true);
                  inProgress = FALSE;
                  buf = NULL;
+                 StrategyBufferStatWastedIO();
              }
          }
      } while (buf == NULL);
***************
*** 682,687 ****
--- 683,689 ----
      BufferDesc **dirty_buffers;
      BufferTag  *buftags;
      int         num_buffer_dirty;
+     int         num_buffer_cleaned = 0;
      int         i;
      /* If either limit is zero then we are disabled from doing anything... */
***************
*** 770,775 ****
--- 772,778 ----
          TerminateBufferIO(bufHdr, 0);
          UnpinBuffer(bufHdr, true);
+         num_buffer_cleaned++;
      }
      LWLockRelease(BufMgrLock);
***************
*** 777,782 ****
--- 780,787 ----
      pfree(dirty_buffers);
      pfree(buftags);
+     StrategyBufferStatCleaned(num_buffer_cleaned);
+ 
      return num_buffer_dirty;
  }
Index: src/backend/storage/buffer/freelist.c
===================================================================
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/freelist.c,v
retrieving revision 1.48
diff -d -c -r1.48 freelist.c
*** src/backend/storage/buffer/freelist.c	16 Sep 2004 16:58:31 -0000	1.48
--- src/backend/storage/buffer/freelist.c	4 Jan 2005 00:04:18 -0000
***************
*** 115,120 ****
--- 115,133 ----
  } while(0)
+ void
+ StrategyBufferStatWastedIO(void)
+ {
+     StrategyControl->num_wasted++;
+ }
+ 
+ void
+ StrategyBufferStatCleaned(long num_cleaned)
+ {
+     StrategyControl->num_cleaned += num_cleaned;
+ }
+ 
+ 
  /*
   * Printout for use when DebugSharedBuffers is enabled
   */
***************
*** 130,159 ****
                  t1_hit,
                  t2_hit,
                  b2_hit;
-     int         id,
-                 t1_clean,
-                 t2_clean;
      ErrorContextCallback *errcxtold;
-     id = StrategyControl->listHead[STRAT_LIST_T1];
-     t1_clean = 0;
-     while (id >= 0)
-     {
-         if (BufferDescriptors[StrategyCDB[id].buf_id].flags & BM_DIRTY)
-             break;
-         t1_clean++;
-         id = StrategyCDB[id].next;
-     }
-     id = StrategyControl->listHead[STRAT_LIST_T2];
-     t2_clean = 0;
-     while (id >= 0)
-     {
-         if (BufferDescriptors[StrategyCDB[id].buf_id].flags & BM_DIRTY)
-             break;
-         t2_clean++;
-         id = StrategyCDB[id].next;
-     }
- 
      if (StrategyControl->num_lookup == 0)
          all_hit = b1_hit = t1_hit = t2_hit = b2_hit = 0;
      else
--- 143,150 ----
***************
*** 166,185 ****
                    StrategyControl->num_lookup);
          b2_hit = (StrategyControl->num_hit[STRAT_LIST_B2] * 100 /
                    StrategyControl->num_lookup);
!         all_hit = b1_hit + t1_hit + t2_hit + b2_hit;
      }
      errcxtold = error_context_stack;
      error_context_stack = NULL;
      elog(DEBUG1, "ARC T1target=%5d B1len=%5d T1len=%5d T2len=%5d B2len=%5d",
           T1_TARGET, B1_LENGTH, T1_LENGTH, T2_LENGTH, B2_LENGTH);
!     elog(DEBUG1, "ARC total =%4ld%% B1hit=%4ld%% T1hit=%4ld%% T2hit=%4ld%% B2hit=%4ld%%",
           all_hit, b1_hit, t1_hit, t2_hit, b2_hit);
!     elog(DEBUG1, "ARC clean buffers at LRU T1= %5d T2= %5d",
!          t1_clean, t2_clean);
!     error_context_stack = errcxtold;
      StrategyControl->num_lookup = 0;
      StrategyControl->num_hit[STRAT_LIST_B1] = 0;
      StrategyControl->num_hit[STRAT_LIST_T1] = 0;
      StrategyControl->num_hit[STRAT_LIST_T2] = 0;
--- 157,188 ----
                    StrategyControl->num_lookup);
Re: [PATCHES] [HACKERS] Bgwriter behavior
Simon Riggs wrote:
> Here's my bgwriter instrumentation patch, which gives info that could allow the bgwriter settings to be tuned.

Uh, what does this do exactly? Add additional logging output?

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
This has been saved for the 8.1 release:

    http://momjian.postgresql.org/cgi-bin/pgpatches2

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> > On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
> > >
> > > I have no problem doing something for 8.0 if we can find something that meets all the items I mentioned.
> > >
> > > One idea would be to just remove bgwriter_percent. Beta/RC users would still have it in their postgresql.conf, but it is commented out so it should be OK. If they uncomment it their server would not start but we could just tell testers to remove it. I see that as better than having conflicting parameters.
> >
> > Can't say I like that at first thought. I'll think some more though...
> >
> > > Another idea is to have bgwriter_percent be the percent of the buffer it will scan.
> >
> > Hmmm... well that was my original suggestion (bg2.patch on 12 Dec) (...though with a bug, as Neil pointed out)
> >
> > > We could default that to 50% or 100%, but we then need to make sure all beta/RC users update their postgresql.conf with the new default because the commented-out default will not be correct.
> >
> > ...we just differ/ed on what the default should be...
> >
> > > At this point I see these as our only two viable options, aside from doing nothing.
> > >
> > > I realize our current behavior requires a full scan of the buffer cache, but how often is the bgwriter_maxpages limit met? If it is not, a full scan is done anyway, right?
> >
> > Well, if you have a very heavy read workload then that would be a problem. I was more worried about concurrency in a heavy write situation, but I can see your point, and agree.
> >
> > (Idea #1 still suffers from this, so we should rule it out...)
> >
> > > It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.
> >
> > OK. I think you've persuaded me on idea #2, if I understand you right:
> >
> > bgwriter_percent = 50 (default)
> > bgwriter_maxpages = 100 (default)
> >
> > percent is the number of shared_buffers we scan, limited by maxpages.
> >
> > (I'll code it up in a couple of hours when the kids are in bed)
>
> Here's the basic patch - no changes to current default values or docs.
>
> Not sure if this is still interesting or not...
>
> --
> Best Regards, Simon Riggs

[ Attachment, skipping... ]

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
> This change isn't going to make it for RC3, and it's probably not something we want to rush.

OK. Thank you.

> I think there are a few issues involved:
>
>     o everyone agrees the current meaning of bgwriter_percent is useless (percent of dirty buffers)
>     o removal of bgwriter_percent will cause problems because postgresql.conf is only installed via initdb, so beta users will have to have some workaround so their existing postgresql.conf files work.
>     o bgwriter_percent and bgwriter_maxpages are duplicate for a given number of buffers and it isn't clear which one takes precedence.
>     o 8.1 might use these variables with different meanings, causing slight upgrade confusion.
>     o Another idea is for bgwriter_percent to control how much of the buffer is scanned.

Agreed. Would add as item #1: current behaviour of bgwriter causes sub-optimal performance for 8.0, for systems with a high write workload, more CPUs and higher shared_buffers.

> Tom feels bgwriter_maxpages is good because it allows the user to specify the I/O traffic, while bgwriter_percent as total pages (not just dirty ones) is perhaps easier to set a default (I/O load varies based on buffer cache size) and perhaps easier to understand.

Agreed.

> I am not sure what to suggest at this point but whatever solution we use should take the above issues into account.

Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.

The patch is there if that decision changes, but I'll say no more on it.

> ---------------------------------------------------------------------------
>
> Simon Riggs wrote:
> > On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
> > > Simon Riggs wrote:
> > > > On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > > > > Should we consider at least adjusting the meaning of bgwriter_percent?
> > > >
> > > > Yes. As things stand, this is the only change that seems safe.
> > > >
> > > > Here's a very short patch that implements this change within BufferSync in bufmgr.c
> > > >
> > > > - No algorithm changes
> > > > - No error message changes
> > > > - Only change is the call to StrategyDirtyBufferList is made using the maximum number of buffers that will be cleaned, rather than uselessly trawling through all of shared_buffers
> > > >
> > > > This changes the meaning of bgwriter_percent from percent of dirty buffers to percent of shared_buffers. The default settings of 1% of 1000 buffers gives up to 10 dirty block writes every 250ms
> > > >
> > > > Benefit: allows performance tuning by increasing options for setting bgwriter_delay which would otherwise have an ineffectually high minimum setting
> > > >
> > > > Risk: low
> > > >
> > > > 1-line doc patch to follow, if this is approved.
> > >
> > > I am not objecting to the patch, but what value is there in having both bgwriter_percent and bgwriter_maxpages? Seems both are redundant and that one would be enough.
> >
> > In brief:
> > i) for now: as little change as possible is good
> > ii) the two parameters are OK
> > iii) trying to decide an alternative takes time, which we do not have
> > iv) what is presented here is simply a performance bug fix, not the best long term alternative...
> >
> > I'd like to move quickly: if we do this (or an alternative), it has to be done soon and it would be easy to discuss this until we run out of time. Could we vote: in RC3, or not?
> >
> > In more detail...
> >
> > The value of having both is:
> >
> > i) as little change as possible at this stage of RC - the main one
> > ...which gives us stability
> > ...and also avoids having to re-discuss what they *should* be
> >
> > ii) Having two isn't that bad. bgwriter_percent auto adjusts the length of the to-be-cleaned-list, so it is roughly useful anywhere between 500 and 1 shared_buffers. That is IMHO slightly more useful than a hard definition set via bgwriter_maxpages, since that is likely to be set wrong anyway - but has some value as an outside limit on the number of pages.
> > [You may wish to set shared_buffers 1 even on smaller servers, since many now have 2GB RAM and yet a relatively poor I/O subsystem. Having maxpages set separately allows the majority of people to set shared_buffers higher without swamping their I/O subsystems because they didn't know about the r8.0 bgwriter feature/parameters]
> >
> > iii) changing the parameters might tempt us towards changing the algorithm, which is not a topic we have reached agreement on
> >
> > iv) I see it as a goal to remove all of those parameters anyway, as well as explore some of the many options and ideas everybody has presented, so further change is likely at the next release whatever is done now.
> >
> > The patch is as simple as I can make it and yet remove the unnecessary performance effect in the existing
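The figure quoted in this message, "1% of 1000 buffers gives up to 10 dirty block writes every 250ms", is simple arithmetic, and it can be spelled out as a small C sketch. Nothing below comes from the patches in this thread: the function names are hypothetical, bgwriter_percent is read as a percentage of shared_buffers (the meaning the short patch proposes), and a cleaning round is assumed to take negligible time, so the per-second value is only an upper bound.

```c
#include <assert.h>

/*
 * Hypothetical helper, not PostgreSQL code: upper bound on dirty pages
 * written in one bgwriter round when bgwriter_percent is read as a
 * percentage of shared_buffers.
 */
int max_writes_per_round(int shared_buffers, int bgwriter_percent)
{
    assert(shared_buffers >= 0);
    assert(bgwriter_percent >= 0 && bgwriter_percent <= 100);

    /* long long avoids overflow for very large buffer counts */
    return (int) ((long long) shared_buffers * bgwriter_percent / 100);
}

/*
 * Hypothetical helper: upper bound on writes per second, assuming one
 * round starts every bgwriter_delay_ms milliseconds and the round
 * itself takes no time.
 */
int max_writes_per_second(int writes_per_round, int bgwriter_delay_ms)
{
    assert(writes_per_round >= 0);
    assert(bgwriter_delay_ms > 0);

    return (int) ((long long) writes_per_round * 1000 / bgwriter_delay_ms);
}
```

With the defaults mentioned above, `max_writes_per_round(1000, 1)` is 10 and `max_writes_per_second(10, 250)` is 40, which shows why a shorter bgwriter_delay is the main lever for raising throughput under this definition.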
Re: [PATCHES] [HACKERS] Bgwriter behavior
Simon Riggs wrote:
> On Sat, 2005-01-01 at 06:20, Bruce Momjian wrote:
> > This change isn't going to make it for RC3, and it's probably not something we want to rush.
>
> OK. Thank you.
>
> > I think there are a few issues involved:
> >
> >     o everyone agrees the current meaning of bgwriter_percent is useless (percent of dirty buffers)
> >     o removal of bgwriter_percent will cause problems because postgresql.conf is only installed via initdb, so beta users will have to have some workaround so their existing postgresql.conf files work.
> >     o bgwriter_percent and bgwriter_maxpages are duplicate for a given number of buffers and it isn't clear which one takes precedence.
> >     o 8.1 might use these variables with different meanings, causing slight upgrade confusion.
> >     o Another idea is for bgwriter_percent to control how much of the buffer is scanned.
>
> Agreed. Would add as item #1: current behaviour of bgwriter causes sub-optimal performance for 8.0, for systems with a high write workload, more CPUs and higher shared_buffers.
>
> > Tom feels bgwriter_maxpages is good because it allows the user to specify the I/O traffic, while bgwriter_percent as total pages (not just dirty ones) is perhaps easier to set a default (I/O load varies based on buffer cache size) and perhaps easier to understand.
>
> Agreed.
>
> > I am not sure what to suggest at this point but whatever solution we use should take the above issues into account.
>
> Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.

I have no problem doing something for 8.0 if we can find something that meets all the items I mentioned.

One idea would be to just remove bgwriter_percent. Beta/RC users would still have it in their postgresql.conf, but it is commented out so it should be OK. If they uncomment it their server would not start but we could just tell testers to remove it. I see that as better than having conflicting parameters.

Another idea is to have bgwriter_percent be the percent of the buffer it will scan. We could default that to 50% or 100%, but we then need to make sure all beta/RC users update their postgresql.conf with the new default because the commented-out default will not be correct.

At this point I see these as our only two viable options, aside from doing nothing.

I realize our current behavior requires a full scan of the buffer cache, but how often is the bgwriter_maxpages limit met? If it is not, a full scan is done anyway, right? It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> Simon Riggs wrote:
> > Well, I think we're saying: it's not in 8.0 now, and we take our time to consider patches for 8.1 and accept the situation that the parameter names/meaning will change in next release.
>
> I have no problem doing something for 8.0 if we can find something that meets all the items I mentioned.
>
> One idea would be to just remove bgwriter_percent. Beta/RC users would still have it in their postgresql.conf, but it is commented out so it should be OK. If they uncomment it their server would not start but we could just tell testers to remove it. I see that as better than having conflicting parameters.

Can't say I like that at first thought. I'll think some more though...

> Another idea is to have bgwriter_percent be the percent of the buffer it will scan.

Hmmm... well that was my original suggestion (bg2.patch on 12 Dec) (...though with a bug, as Neil pointed out)

> We could default that to 50% or 100%, but we then need to make sure all beta/RC users update their postgresql.conf with the new default because the commented-out default will not be correct.

...we just differ/ed on what the default should be...

> At this point I see these as our only two viable options, aside from doing nothing.
>
> I realize our current behavior requires a full scan of the buffer cache, but how often is the bgwriter_maxpages limit met? If it is not, a full scan is done anyway, right?

Well, if you have a very heavy read workload then that would be a problem. I was more worried about concurrency in a heavy write situation, but I can see your point, and agree.

(Idea #1 still suffers from this, so we should rule it out...)

> It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.

OK. I think you've persuaded me on idea #2, if I understand you right:

bgwriter_percent = 50 (default)
bgwriter_maxpages = 100 (default)

percent is the number of shared_buffers we scan, limited by maxpages.

(I'll code it up in a couple of hours when the kids are in bed)

--
Best Regards, Simon Riggs
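One way to read idea #2 above is sketched below in C. This is only an interpretation of the one-line description ("percent is the number of shared_buffers we scan, limited by maxpages"), not the patch that was later posted: the function names are hypothetical, and it assumes bgwriter_percent bounds how many buffers a round examines while bgwriter_maxpages caps how many of the dirty buffers found are written.

```c
#include <assert.h>

/*
 * Hypothetical helper, not PostgreSQL code: number of buffers one
 * bgwriter round would examine if bgwriter_percent meant "percent of
 * shared_buffers to scan".
 */
int scan_limit(int shared_buffers, int bgwriter_percent)
{
    assert(shared_buffers >= 0);
    assert(bgwriter_percent >= 0 && bgwriter_percent <= 100);

    /* long long avoids overflow for very large buffer counts */
    return (int) ((long long) shared_buffers * bgwriter_percent / 100);
}

/*
 * Hypothetical helper: number of pages written in one round, given the
 * dirty buffers found within the scanned portion and the
 * bgwriter_maxpages cap.
 */
int write_limit(int dirty_found, int bgwriter_maxpages)
{
    assert(dirty_found >= 0);
    assert(bgwriter_maxpages >= 0);

    return dirty_found < bgwriter_maxpages ? dirty_found : bgwriter_maxpages;
}
```

With 1000 shared buffers and the defaults proposed above (50 and 100), a round would examine up to 500 buffers and write at most 100 of them. Under this reading the two parameters no longer duplicate each other: one bounds scan cost, the other bounds I/O.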
Re: [PATCHES] [HACKERS] Bgwriter behavior
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> o everyone agrees the current meaning of bgwriter_percent is useless (percent of dirty buffers)

Oh? It's not useless by any means; it's a perfectly reasonable and useful definition that happens to be expensive to implement. One of the questions that is not answered to my satisfaction is what is an adequate substitute that doesn't lose needed functionality.

> o bgwriter_percent and bgwriter_maxpages are duplicate for a given number of buffers and it isn't clear which one takes precedence.

Not unless the current definition of bgwriter_percent is changed.

Please try to make sure that your summaries reduce confusion instead of increasing it.

regards, tom lane
Re: [PATCHES] [HACKERS] Bgwriter behavior
On Sat, 2005-01-01 at 17:47, Simon Riggs wrote:
> On Sat, 2005-01-01 at 17:01, Bruce Momjian wrote:
> > [...]
> > It seems the only way to really add functionality is to change bgwriter_percent to control how much of the buffer is scanned.
>
> OK. I think you've persuaded me on idea #2, if I understand you right:
>
> bgwriter_percent = 50 (default)
> bgwriter_maxpages = 100 (default)
>
> percent is the number of shared_buffers we scan, limited by maxpages.
>
> (I'll code it up in a couple of hours when the kids are in bed)

Here's the basic patch - no changes to current default values or docs.

Not sure if this is still interesting or not...

--
Best Regards, Simon Riggs

Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c 24 Nov 2004 02:56:17 - 1.182
--- src/backend/storage/buffer/bufmgr.c 1 Jan 2005 21:03:16 -
***************
*** 682,717 ****
      BufferDesc **dirty_buffers;
      BufferTag  *buftags;
      int         num_buffer_dirty;
      int         i;
  
      /* If either limit is zero then we are disabled from doing anything... */
      if (percent == 0 || maxpages == 0)
          return 0;
  
      /*
!      * Get a list of all currently dirty buffers and how many there are.
       * We do not flush buffers that get dirtied after we started. They
!      * have to wait until the next checkpoint.
       */
!     dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
!     buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));
  
      LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
-     num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
-                                                NBuffers);
  
!     /*
!      * If called by the background writer, we are usually asked to only
!      * write out some portion of dirty buffers now, to prevent the IO
!      * storm at checkpoint time.
!      */
!     if (percent > 0)
!     {
!         Assert(percent <= 100);
!         num_buffer_dirty = (num_buffer_dirty * percent + 99) / 100;
!     }
!     if (maxpages > 0 && num_buffer_dirty > maxpages)
!         num_buffer_dirty = maxpages;
  
      /* Make sure we can handle the pin inside the loop */
      ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
--- 682,728 ----
      BufferDesc **dirty_buffers;
      BufferTag  *buftags;
      int         num_buffer_dirty;
+     int         max_buffer_dirty = 1;
+     int         max_buffer_scan = 1;
      int         i;
  
      /* If either limit is zero then we are disabled from doing anything... */
      if (percent == 0 || maxpages == 0)
          return 0;
  
+     /* Set number of buffers we will scan from LRUs of buffer lists */
+     if (percent > 0 ) {
+         Assert(percent <= 100);
+         max_buffer_scan = (NBuffers * percent + 99) / 100;
+     }
+ 
+     /* at checkpoint time we scan the whole buffer list */
+     if (percent < 0)
+         max_buffer_scan = NBuffers;
+ 
+     if (maxpages < 0 || maxpages > NBuffers)
+         max_buffer_dirty = NBuffers;
+     else
+         max_buffer_dirty = maxpages;
+ 
+     /* we cannot find more dirty buffers than we scan */
+     if (max_buffer_dirty > max_buffer_scan)
+         max_buffer_dirty = max_buffer_scan;
+ 
      /*
!      * Get a list of dirty buffers to clean and how
Re: [PATCHES] [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > o everyone agrees the current meaning of bgwriter_percent is useless (percent of dirty buffers)
>
> Oh? It's not useless by any means; it's a perfectly reasonable and useful definition that happens to be expensive to implement. One of the questions that is not answered to my satisfaction is what is an adequate substitute that doesn't lose needed functionality.

I remembered this statement:

    I think there's a reasonable case to be made for redefining bgwriter_percent as the max percent of the total buffer list to scan (not the max percent of the list to return --- Jan correctly pointed out that the latter is useless). Then we could modify StrategyDirtyBufferList so that the percent and maxpages parameters are passed in, so it can stop as soon as either one is satisfied. This would be a fairly small/safe code change and I wouldn't have a problem doing it even at this late stage of the cycle.

Referenced here:

    http://archives.postgresql.org/pgsql-hackers/2004-12/msg00703.php

But I now see that Jan was objecting to the idea of the previous patch where bgwriter_percent is a percent of all buffers to return, which we just discussed as redundant.

> > o bgwriter_percent and bgwriter_maxpages are duplicate for a given number of buffers and it isn't clear which one takes precedence.
>
> Not unless the current definition of bgwriter_percent is changed.
>
> Please try to make sure that your summaries reduce confusion instead of increasing it.

OK, whatever. My point is that many have criticized the current behavior of bgwriter_percent and I haven't heard anyone defend it, including Jan.

What bothers me is that we have known bgwriter needs tuning for months and I am not sure we are any closer to improving it.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
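The "stop as soon as either one is satisfied" idea quoted above can be illustrated with a toy scan. This is a sketch only: `collect_dirty` is a hypothetical stand-in, not the real StrategyDirtyBufferList, and the buffer list is modelled as a plain array of dirty flags with index 0 at the LRU end.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Toy model of an early-terminating dirty-buffer scan.
 * dirty[0] is assumed to be the LRU end of the list.
 * Collects indexes of dirty buffers into out[] and stops as soon as
 * either the scan limit (percent of all buffers) or maxpages is reached.
 * out[] must have room for maxpages entries. Returns the number collected.
 */
int
collect_dirty(const bool *dirty, int nbuffers, int percent, int maxpages,
              int *out)
{
    int scan_limit;
    int found = 0;
    int i;

    assert(percent >= 0 && percent <= 100);
    assert(maxpages >= 0);

    scan_limit = (nbuffers * percent + 99) / 100;   /* round up */

    for (i = 0; i < scan_limit && found < maxpages; i++)
    {
        if (dirty[i])
            out[found++] = i;
    }
    return found;
}
```

Under this model neither parameter is redundant: percent bounds how much of the list is examined (and so how long the lock is held), while maxpages bounds the I/O issued in one round.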
Re: [HACKERS] Bgwriter behavior
On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
> Simon Riggs wrote:
> > On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > > Should we consider at least adjusting the meaning of bgwriter_percent?
> >
> > Yes. As things stand, this is the only change that seems safe.
> >
> > Here's a very short patch that implements this change within BufferSync in bufmgr.c
> >
> > - No algorithm changes
> > - No error message changes
> > - Only change is the call to StrategyDirtyBufferList is made using the maximum number of buffers that will be cleaned, rather than uselessly trawling through all of shared_buffers
> >
> > This changes the meaning of bgwriter_percent from percent of dirty buffers to percent of shared_buffers. The default settings of 1% of 1000 buffers gives up to 10 dirty block writes every 250ms
> >
> > Benefit: allows performance tuning by increasing the options for setting bgwriter_delay, which would otherwise have an ineffectually high minimum setting
> >
> > Risk: low
> >
> > 1-line doc patch to follow, if this is approved.
>
> I am not objecting to the patch, but what value is there in having both bgwriter_percent and bgwriter_maxpages? Seems both are redundant and that one would be enough.

In brief:
i) for now: as little change as possible is good
ii) the two parameters are OK
iii) trying to decide an alternative takes time, which we do not have
iv) what is presented here is simply a performance bug fix, not the best long term alternative...

I'd like to move quickly: if we do this (or an alternative), it has to be done soon and it would be easy to discuss this until we run out of time.

Could we vote: in RC3, or not?

In more detail...

The value of having both is:

i) as little change as possible at this stage of RC - the main one
...which gives us stability
...and also avoids having to re-discuss what they *should* be

ii) Having two isn't that bad. bgwriter_percent auto adjusts the length of the to-be-cleaned-list, so it is roughly useful anywhere between 500 and 1 shared_buffers. That is IMHO slightly more useful than a hard definition set via bgwriter_maxpages, since that is likely to be set wrong anyway - but has some value as an outside limit on the number of pages.

[You may wish to set shared_buffers 1 even on smaller servers, since many now have 2GB RAM and yet a relatively poor I/O subsystem. Having maxpages set separately allows the majority of people to set shared_buffers higher without swamping their I/O subsystems because they didn't know about the r8.0 bgwriter feature/parameters]

iii) changing the parameters might tempt us towards changing the algorithm, which is not a topic we have reached agreement on

iv) I see it as a goal to remove all of those parameters anyway, as well as explore some of the many options and ideas everybody has presented, so further change is likely at the next release whatever is done now.

The patch is as simple as I can make it and yet remove the unnecessary performance effect in the existing code. Thanks to Neil and others for showing that this was possible...I see this patch as a team effort.

I've already spoken against larger change and would do so again now: if we don't agree on this change, then I would vote for no-change simply because this patch is a minimal change. We *suspect* further change is beneficial but we have no evidence to support what that change should be, amongst the large range of possible solutions proposed.

--
Best Regards, Simon Riggs
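The "up to 10 dirty block writes every 250ms" figure quoted above translates into a write-rate ceiling. The sketch below works through that arithmetic; the function names are hypothetical, the 8192-byte block size is an assumption (the usual PostgreSQL default), and the time spent performing the writes themselves is ignored.

```c
#include <assert.h>

/* Upper bound on pages written per second, given the sleep between rounds. */
double
pages_per_second(int pages_per_round, int delay_ms)
{
    assert(delay_ms > 0);
    return pages_per_round * 1000.0 / delay_ms;
}

/* The same bound in bytes; block_size is an assumed parameter. */
double
bytes_per_second(int pages_per_round, int delay_ms, int block_size)
{
    return pages_per_second(pages_per_round, delay_ms) * block_size;
}
```

For 10 pages per round and a 250 ms delay this gives 40 pages per second, or 327680 bytes per second with 8192-byte blocks. Halving the delay doubles the ceiling, which is the tuning freedom the message refers to.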
Re: [PATCHES] [HACKERS] Bgwriter behavior
This change isn't going to make it for RC3, and it is probably not something we want to rush. I think there are a few issues involved:

o everyone agrees the current meaning of bgwriter_percent is useless (percent of dirty buffers)

o removal of bgwriter_percent will cause problems because postgresql.conf is only installed via initdb, so beta users will have to have some workaround so their existing postgresql.conf files work.

o bgwriter_percent and bgwriter_maxpages are duplicate for a given number of buffers and it isn't clear which one takes precedence.

o 8.1 might use these variables with different meanings, causing slight upgrade confusion.

o Another idea is for bgwriter_percent to control how much of the buffer is scanned.

Tom feels bgwriter_maxpages is good because it allows the user to specify the I/O traffic, while bgwriter_percent as total pages (not just dirty ones) is perhaps easier to set a default (I/O load varies based on buffer cache size) and perhaps easier to understand.

I am not sure what to suggest at this point but whatever solution we use should take the above issues into account.

---

Simon Riggs wrote:
> On Fri, 2004-12-31 at 01:14, Bruce Momjian wrote:
> [...]
Re: [HACKERS] Bgwriter behavior
On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> Should we consider at least adjusting the meaning of bgwriter_percent?

Yes. As things stand, this is the only change that seems safe.

Here's a very short patch that implements this change within BufferSync in bufmgr.c

- No algorithm changes
- No error message changes
- Only change is the call to StrategyDirtyBufferList is made using the maximum number of buffers that will be cleaned, rather than uselessly trawling through all of shared_buffers

This changes the meaning of bgwriter_percent from percent of dirty buffers to percent of shared_buffers. The default settings of 1% of 1000 buffers gives up to 10 dirty block writes every 250ms

Benefit: allows performance tuning by increasing the options for setting bgwriter_delay, which would otherwise have an ineffectually high minimum setting

Risk: low

1-line doc patch to follow, if this is approved.

--
Best Regards, Simon Riggs

Index: src/backend/storage/buffer/bufmgr.c
===
RCS file: /projects/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.182
diff -d -c -r1.182 bufmgr.c
*** src/backend/storage/buffer/bufmgr.c 24 Nov 2004 02:56:17 - 1.182
--- src/backend/storage/buffer/bufmgr.c 30 Dec 2004 23:52:24 -
***************
*** 681,686 ****
--- 681,687 ----
  {
      BufferDesc **dirty_buffers;
      BufferTag  *buftags;
+     int         dirty_buffers_maxlen = 1;
      int         num_buffer_dirty;
      int         i;
  
***************
*** 688,717 ****
      if (percent == 0 || maxpages == 0)
          return 0;
  
      /*
!      * Get a list of all currently dirty buffers and how many there are.
       * We do not flush buffers that get dirtied after we started. They
!      * have to wait until the next checkpoint.
       */
!     dirty_buffers = (BufferDesc **) palloc(NBuffers * sizeof(BufferDesc *));
!     buftags = (BufferTag *) palloc(NBuffers * sizeof(BufferTag));
  
      LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
-     num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
-                                                NBuffers);
  
!     /*
!      * If called by the background writer, we are usually asked to only
!      * write out some portion of dirty buffers now, to prevent the IO
!      * storm at checkpoint time.
!      */
!     if (percent > 0)
!     {
!         Assert(percent <= 100);
!         num_buffer_dirty = (num_buffer_dirty * percent + 99) / 100;
!     }
!     if (maxpages > 0 && num_buffer_dirty > maxpages)
!         num_buffer_dirty = maxpages;
  
      /* Make sure we can handle the pin inside the loop */
      ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
--- 689,719 ----
      if (percent == 0 || maxpages == 0)
          return 0;
  
+     /* Set number of buffers we will clean at LRUs of buffer lists */
+     if (percent > 0 ) {
+         Assert(percent <= 100);
+         dirty_buffers_maxlen = (NBuffers * percent + 99) / 100;
+     }
+     if (maxpages > 0 && dirty_buffers_maxlen > maxpages)
+         dirty_buffers_maxlen = maxpages;
+ 
+     /* if checkpoint time */
+     if (percent == -1 && maxpages == -1)
+         dirty_buffers_maxlen = NBuffers;
+ 
      /*
!      * Get a list of dirty buffers to clean and how many there are.
       * We do not flush buffers that get dirtied after we started. They
!      * have to wait until the next call of this function
       */
!     dirty_buffers =
!         (BufferDesc **) palloc(dirty_buffers_maxlen * sizeof(BufferDesc *));
!     buftags = (BufferTag *) palloc(dirty_buffers_maxlen * sizeof(BufferTag));
  
      LWLockAcquire(BufMgrLock, LW_EXCLUSIVE);
!     num_buffer_dirty = StrategyDirtyBufferList(dirty_buffers, buftags,
!                                                dirty_buffers_maxlen);
  
      /* Make sure we can handle the pin inside the loop */
      ResourceOwnerEnlargeBuffers(CurrentResourceOwner);
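To make the change of meaning concrete, the two calculations can be put side by side. This is an illustrative sketch: `old_write_target` and `new_list_maxlen` are hypothetical names that mirror the arithmetic in the old and patched hunks, and the figure of 200 dirty buffers used in the note below is made up for the example.

```c
#include <assert.h>

/*
 * Old meaning: percent of the *dirty* buffers. num_dirty is only known
 * after the whole buffer list has been scanned.
 */
int
old_write_target(int num_dirty, int percent, int maxpages)
{
    int n = num_dirty;

    if (percent > 0)
    {
        assert(percent <= 100);
        n = (num_dirty * percent + 99) / 100;
    }
    if (maxpages > 0 && n > maxpages)
        n = maxpages;
    return n;
}

/*
 * Patched meaning: percent of shared_buffers. The list length is known
 * before scanning, so the scan can be bounded up front.
 */
int
new_list_maxlen(int nbuffers, int percent, int maxpages)
{
    int n = 1;

    if (percent > 0)
    {
        assert(percent <= 100);
        n = (nbuffers * percent + 99) / 100;
    }
    if (maxpages > 0 && n > maxpages)
        n = maxpages;
    if (percent == -1 && maxpages == -1)
        n = nbuffers;           /* checkpoint: take everything */
    return n;
}
```

With 1000 buffers of which 200 are dirty, percent = 1 and maxpages = 100, the old rule writes 2 pages after examining the whole list, while the patched rule asks for a list of at most 10 pages.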
Re: [HACKERS] Bgwriter behavior
Simon Riggs wrote:
> On Mon, 2004-12-27 at 22:21, Bruce Momjian wrote:
> > Should we consider at least adjusting the meaning of bgwriter_percent?
>
> Yes. As things stand, this is the only change that seems safe.
>
> Here's a very short patch that implements this change within BufferSync in bufmgr.c
>
> - No algorithm changes
> - No error message changes
> - Only change is the call to StrategyDirtyBufferList is made using the maximum number of buffers that will be cleaned, rather than uselessly trawling through all of shared_buffers
>
> This changes the meaning of bgwriter_percent from percent of dirty buffers to percent of shared_buffers. The default settings of 1% of 1000 buffers gives up to 10 dirty block writes every 250ms
>
> Benefit: allows performance tuning by increasing the options for setting bgwriter_delay, which would otherwise have an ineffectually high minimum setting
>
> Risk: low
>
> 1-line doc patch to follow, if this is approved.

I am not objecting to the patch, but what value is there in having both bgwriter_percent and bgwriter_maxpages? Seems both are redundant and that one would be enough.

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
[I know I'm late and this has already been discussed by Richard, Tom, et al., but ...]

On Tue, 21 Dec 2004 16:17:17 -0600, Jim C. Nasby wrote:
> look at where the last page you wrote out has ended up in the LRU list since you last ran, and start scanning from there (by definition everything after that page would have to be clean).

This is a bit oversimplified, because that page will be moved to the start of the list when it is accessed the next time.

    A = B = C = D = E = F = G = H = I = J = K = L = m = n = o = p = q
                                                    ^

would become

    M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
    ^

(a-z ... known to be clean, A-Z ... possibly dirty)

But with a bit of cooperation from the backends this could be made to work. Whenever a backend takes the page which is the start of the clean tail out of the list (most probably to insert it into another list or to re-insert it at the start of the same list) the clean tail pointer is advanced to the next list element, if any. So we would get

    M = A = B = C = D = E = F = G = H = I = J = K = L = n = o = p = q
                                                        ^

As a little improvement the clean tail could be prevented from shrinking unnecessarily fast by moving the pointer to the previous list element if this is found to be clean:

    M = A = B = C = D = E = F = G = H = I = J = K = l = n = o = p = q
                                                    ^

Maybe this approach could serve both goals, (1) keeping a number of clean pages at the LRU end of the list and (2) writing out other dirty pages if there's not much to do near the end of the list.

But ...

On Tue, 21 Dec 2004 10:26:48 -0500, Tom Lane wrote:
> Also, the cntxDirty mechanism allows a block to be dirtied without changing the ARC state at all.

... which might kill this proposal anyway.

Servus
Manfred
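The clean-tail bookkeeping described in this message can be sketched with a small doubly linked list. Everything here is hypothetical illustration: `Page`, `LruList` and the `lru_*` functions are invented names, not PostgreSQL structures, and the sketch assumes pages only become dirty through `lru_touch`, which is exactly the assumption the cntxDirty remark calls into question.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct Page
{
    char         name;
    bool         dirty;         /* "possibly dirty" (upper case in the diagrams) */
    struct Page *prev;
    struct Page *next;
} Page;

typedef struct LruList
{
    Page *head;                 /* most recently used end */
    Page *tail;                 /* least recently used end */
    Page *clean_tail;           /* first page of the known-clean tail, or NULL */
} LruList;

/* Append p at the LRU end; used only to build a list. */
void
lru_append(LruList *l, Page *p)
{
    p->next = NULL;
    p->prev = l->tail;
    if (l->tail)
        l->tail->next = p;
    else
        l->head = p;
    l->tail = p;
}

/* Unlink p; if it was the start of the clean tail, advance the pointer. */
static void
lru_unlink(LruList *l, Page *p)
{
    if (l->clean_tail == p)
        l->clean_tail = p->next;        /* may become NULL */
    if (p->prev)
        p->prev->next = p->next;
    else
        l->head = p->next;
    if (p->next)
        p->next->prev = p->prev;
    else
        l->tail = p->prev;
    p->prev = p->next = NULL;
}

/* A backend accesses p: it moves to the head and may become dirty. */
void
lru_touch(LruList *l, Page *p, bool dirtied)
{
    lru_unlink(l, p);
    p->dirty = p->dirty || dirtied;
    p->next = l->head;
    if (l->head)
        l->head->prev = p;
    else
        l->tail = p;
    l->head = p;
}

/* Extend the clean tail towards the head over pages found to be clean. */
void
lru_grow_clean_tail(LruList *l)
{
    Page *before = l->clean_tail ? l->clean_tail->prev : l->tail;

    while (before != NULL && !before->dirty)
    {
        l->clean_tail = before;
        before = before->prev;
    }
}
```

In terms of the diagrams: touching `m` moves it to the head as `M` and advances the pointer to `n`; once `L` has been written out, `lru_grow_clean_tail` moves the pointer back to `l`.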
Re: [HACKERS] Bgwriter behavior
Added to TODO:

* Improve the background writer

  Allow the background writer to more efficiently write dirty buffers from the end of the LRU cache and use a clock sweep algorithm to write other dirty buffers to reduce checkpoint I/O

---

Simon Riggs wrote:
> On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > So what are we doing for 8.0?
> >
> > Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found.
> >
> > So maybe we should just bite the bullet and do something real about this. I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?
> >
> > At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.
>
> Agreed, if somewhat reluctantly.
>
> We may have the time to test, but it is clear that we do not have the time to validate those tests, then discuss and agree on the results. Time to go with what we have.
>
> [Mark's possible bug seems a higher priority for me.]
>
> --
> Best Regards, Simon Riggs

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
John Hansen wrote:
> > I ran some tests last week and can report results similar on Tom's test:
> >
> >     pgbench -i -s 10 bench
> >     pgbench -c 10 -t 1 bench
> >
> > The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had.
>
> You do realize, that pgbench result comparisons are about as useful as a fork for eating soup?
>
> On another note, how do you know for sure, that your drive does not lie about fsync? Did you run the tests with fsync turned off vs fsync on?

I just tried and got 115tps with fsync off vs 100 with fsync on, so fsync is certainly doing something.

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> John Hansen wrote:
> > On another note, how do you know for sure, that your drive does not lie about fsync?
>
> I just tried and got 115tps with fsync off vs 100 with fsync on, so fsync is certainly doing something.

[ raised eyebrow... ] Something is wrong with that. I'd expect a *much* higher difference. It's difficult to credit a tps rate higher than your disk's RPM rating with fsync on, but most modern CPUs can do a lot better than that with fsync off. If you have a 7200 RPM drive then I'd believe the 100 figure, but not the other ...

regards, tom lane
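One reading of the rule of thumb above is that a drive which honours fsync can complete at most about one synchronous log flush per platter revolution. The sketch below only does that unit conversion; the function name is hypothetical, and the one-commit-per-flush assumption is a simplification, since several concurrent clients may share a flush.

```c
#include <assert.h>

/*
 * Rough ceiling on fsync'd commits per second for a single drive,
 * assuming one flush per revolution and one commit per flush.
 */
double
max_fsync_commits_per_sec(double rpm)
{
    assert(rpm > 0.0);
    return rpm / 60.0;          /* revolutions per second */
}
```

A 7200 RPM drive gives 120 per second, so about 100 tps with fsync on is plausible; a 10k RPM drive gives roughly 167.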
Re: [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > John Hansen wrote:
> > > On another note, how do you know for sure, that your drive does not lie about fsync?
> >
> > I just tried and got 115tps with fsync off vs 100 with fsync on, so fsync is certainly doing something.
>
> [ raised eyebrow... ] Something is wrong with that. I'd expect a *much* higher difference. It's difficult to credit a tps rate higher than your disk's RPM rating with fsync on, but most modern CPUs can do a lot better than that with fsync off. If you have a 7200 RPM drive then I'd believe the 100 figure, but not the other ...

I think it is a 10k RPM drive, Seagate Cheetah ST336607LW.

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
On Tue, 2004-12-28 at 07:23, John Hansen wrote:
> > I ran some tests last week and can report results similar on Tom's test:
> >
> >     pgbench -i -s 10 bench
> >     pgbench -c 10 -t 1 bench
> >
> > The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had.
>
> You do realize, that pgbench result comparisons are about as useful as a fork for eating soup?

I'd have to agree. I find it hard to get comparable results on my test server, let alone discuss other people's findings.

The only tests I have reasonable faith in these days are those performed to a rigorous test method, which is also published, visible and challengeable. OSDL is the nearest thing we have to that.

--
Best Regards, Simon Riggs
Re: [HACKERS] Bgwriter behavior
> I ran some tests last week and can report results similar on Tom's test:
>
>     pgbench -i -s 10 bench
>     pgbench -c 10 -t 1 bench

Don't you have to specify the scaling factor for the benchmark as well? As in

    pgbench -c 10 -t 1 -s 10 bench ?

> I just tried and got 115tps with fsync off vs 100 with fsync on, so fsync is certainly doing something.

Well, I usually get results that differ by that much from run to run. Probably you ran into more checkpoints on the second test.

Also, did you reinitialize the bench database with pgbench -i ?

... John
Re: [HACKERS] Bgwriter behavior
John Hansen wrote:
> > I ran some tests last week and can report results similar on Tom's test:
> >
> >     pgbench -i -s 10 bench
> >     pgbench -c 10 -t 1 bench
>
> don't you have to specify the scaling factor for the benchmark as well? as in pgbench -c 10 -t 1 -s 10 bench ?
>
> > I just tried and got 115tps with fsync off vs 100 with fsync on, so fsync is certainly doing something.
>
> well, I usually get results that differ by that much from run to run. Probably you ran into more checkpoints on the second test.
>
> Also, did you reinitialize the bench database with pgbench -i ?

I destroyed the database and recreated it.

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
Bruce Momjian wrote:
> > well, I usually get results that differ by that much from run to run. Probably you ran into more checkpoints on the second test.
> >
> > Also, did you reinitialize the bench database with pgbench -i ?
>
> I destroyed the database and recreated it.

The only way I managed to control the variability in pgbench was to *reboot the machine* and recreate the database for each test. In addition it seems that using a larger scale factor (e.g. 200) helped as well.

Having said that, on FreeBSD 5.3 with hw.ata.wc=0 (i.e. no write cache) my results for s=200, t=1 and c=4 were 49 (+/- 0.5) tps for both 7.4.6 and 8.0.0RC1 - no measurable difference. If I reduced the number of transactions to t=1000, then 7.4.6 jumped ahead by about 10 tps.

Bruce - are you able to try s=200? It would be interesting to see what your setup does.

regards

Mark
Re: [HACKERS] Bgwriter behavior
Simon Riggs wrote:
> On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > So what are we doing for 8.0?
> >
> > Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found.
> >
> > So maybe we should just bite the bullet and do something real about this. I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?
> >
> > At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.
>
> Agreed, if somewhat reluctantly.
>
> We may have the time to test, but it is clear that we do not have the time to validate those tests, then discuss and agree on the results. Time to go with what we have.

I ran some tests last week and can report results similar on Tom's test:

    pgbench -i -s 10 bench
    pgbench -c 10 -t 1 bench

The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had.

First, I am confused why we have such a large improvement in 8.0. Does anyone know? This is a pretty long test so a 33-50% increase is a big jump.

Second, I added a little code in my local copy to check if the pendingOpsTable overflows and register_dirty_segment() must have a local backend do an fsync(). I found one pgbench test had 54 local fsyncs, but the next test had none, and 54 isn't a very large number. Should we emit a server log message when this happens so they can reduce the bgwriter delay?

It seems having the backend do the writes is not so bad (same as 7.4.X) and our only big problem with the current bgwriter is the inability to reduce checkpoint load for busy servers. Should we consider at least adjusting the meaning of bgwriter_percent?

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
> I ran some tests last week and can report results similar on Tom's test: pgbench -i -s 10 bench pgbench -c 10 -t 1 bench The tests were on a machine with a single SCSI drive that doesn't lie about fsync. I found 7.4.X got around 75tps while 8.0 got 100tps, very similar to the 65/107 numbers Tom had.

You do realize that pgbench result comparisons are about as useful as a fork for eating soup?

On another note, how do you know for sure that your drive does not lie about fsync? Did you run the tests with fsync turned off vs fsync on?

> First, I am confused why we have such a large improvement in 8.0. Does anyone know? This is a pretty long test so a 33-50% increase is a big jump.

bgwriter is responsible, I imagine. I experienced the same improvement in an early 7.5, just after the bgwriter was added (though my results were about 4-5 times higher in terms of tps rates, hehe).

... John
Re: [HACKERS] Bgwriter behavior
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I remember the other difference between 8.0 and pre-8.0. When a backend has to write a block in 8.0, it does a write _plus_ fsync(), while in pre-8.0 it did only a write. There was a proposal to pass backend write information to the background writer so it would know to fsync at checkpoint, but it was decided that backend writing would be rare. I think we have to rethink that assumption.

No, just read the code. The above assertions are all wet.

regards, tom lane
Re: [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I remember the other difference between 8.0 and pre-8.0. When a backend has to write a block in 8.0, it does a write _plus_ fsync(), while in pre-8.0 it did only a write. There was a proposal to pass backend write information to the background writer so it would know to fsync at checkpoint, but it was decided that backend writing would be rare. I think we have to rethink that assumption.
>
> No, just read the code. The above assertions are all wet.

Oh, I forgot you added that array to pass fsync info. Shouldn't we send a log message when the array gets full in md.c:

        {
            if (ForwardFsyncRequest(reln->smgr_rnode, seg->mdfd_segno))
                return true;
        }

        if (FileSync(seg->mdfd_vfd) < 0)
            return false;

Seems that could fill up quickly. I see no checking for existing matching records in the array.

--
Bruce Momjian
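
[Editor's sketch] The two ideas raised in this message (count or log the cases where the request array is full, and check for an existing matching entry before adding one) can be illustrated with a small stand-alone model. This is not the md.c code: FsyncQueue, forward_request(), register_dirty() and the tiny QUEUE_SIZE are all hypothetical names chosen for illustration, and a counter stands in for the real fsync() call and any log message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define QUEUE_SIZE 4            /* tiny on purpose, so overflow is easy to see */

typedef struct {
    unsigned rel;               /* hypothetical relation id */
    unsigned segno;             /* segment number within the relation */
} FsyncRequest;

typedef struct {
    FsyncRequest items[QUEUE_SIZE];
    size_t       count;
    unsigned     local_fsyncs;  /* times a backend had to fsync by itself */
} FsyncQueue;

/* Returns true if the request is queued, or was already queued. */
static bool forward_request(FsyncQueue *q, unsigned rel, unsigned segno)
{
    assert(q != NULL);
    for (size_t i = 0; i < q->count; i++)
        if (q->items[i].rel == rel && q->items[i].segno == segno)
            return true;        /* duplicate: nothing new to remember */
    if (q->count == QUEUE_SIZE)
        return false;           /* full: caller must handle the fsync itself */
    q->items[q->count].rel = rel;
    q->items[q->count].segno = segno;
    q->count++;
    return true;
}

/* Same shape as the quoted fragment: try to forward, else fall back. */
static void register_dirty(FsyncQueue *q, unsigned rel, unsigned segno)
{
    if (forward_request(q, rel, segno))
        return;
    q->local_fsyncs++;          /* a real server would fsync() here and could log it */
}
```

With the duplicate check in place, repeated writes to the same segment consume one slot, so the fallback path is only reached when more than QUEUE_SIZE distinct segments are dirtied between drains of the queue.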
Re: [HACKERS] Bgwriter behavior
On Wed, 2004-12-22 at 04:43, Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So what are we doing for 8.0?
>
> Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found. So maybe we should just bite the bullet and do something real about this.
>
> I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?
>
> At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.

Agreed, if somewhat reluctantly. We may have the time to test, but it is clear that we do not have the time to validate those tests, then discuss and agree on the results. Time to go with what we have.

[Mark's possible bug seems a higher priority for me.]

--
Best Regards, Simon Riggs
Re: [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > So what are we doing for 8.0?
>
> Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found. So maybe we should just bite the bullet and do something real about this.
>
> I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?
>
> At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.

I remember the other difference between 8.0 and pre-8.0. When a backend has to write a block in 8.0, it does a write _plus_ fsync(), while in pre-8.0 it did only a write. There was a proposal to pass backend write information to the background writer so it would know to fsync at checkpoint, but it was decided that backend writing would be rare. I think we have to rethink that assumption.

--
Bruce Momjian
[HACKERS] Bgwriter behavior
Tom Lane wrote:
> Gavin Sherry [EMAIL PROTECTED] writes:
> > I was also thinking of benchmarking the effect of changing the algorithm in StrategyDirtyBufferList(): currently, for each iteration of the loop we read a buffer from each of T1 and T2. I was wondering what effect reading T1 first then T2 and vice versa would have on performance.
>
> Looking at StrategyGetBuffer, it definitely seems like a good idea to try to keep the bottom end of both T1 and T2 lists clean. But we should work at T1 a bit harder.
>
> The insight I take away from today's discussion is that there are two separate goals here: try to keep backends that acquire a buffer via StrategyGetBuffer from being fed a dirty buffer they have to write, and try to keep the next upcoming checkpoint from having too much work to do. Those are both laudable goals but I hadn't really seen before that they may require different strategies to achieve. I'm liking the idea that bgwriter should alternate between doing writes in pursuit of the one goal and doing writes in pursuit of the other.

It seems we have added a new limitation to bgwriter by not doing a full scan. With a full scan we could easily grab the first X pages starting from the end of the LRU list and write them. By not scanning the full list we open the possibility of not seeing some of the front-most LRU dirty pages. The full scan was removed so we can run bgwriter more frequently, but we might end up with other problems.

I have a new proposal. The idea is to cause bgwriter to increase its frequency based on how quickly it finds dirty pages.

First, we remove the GUC bgwriter_maxpages because I don't see a good way to set a default for that. A default value needs to be based on a percentage of the full buffer cache size.

Second, we make bgwriter_percent cause the bgwriter to stop its scan once it has found a number of dirty buffers that matches X% of the buffer cache size. So, if it is set to 5%, the bgwriter scan stops once it finds enough dirty buffers to equal 5% of the buffer cache size. Bgwriter continues to scan starting from the end of the LRU list, just like it does now.

Now, to control the bgwriter frequency we multiply the percent of the list it had to span by the bgwriter_delay value to determine when to run bgwriter next. For example, if you find enough dirty pages by looking at only 10% of the buffer cache, you multiply 10% (0.10) * bgwriter_delay and that is when you run next. If you have to scan 50%, bgwriter runs next at 50% (0.50) * bgwriter_delay, and if it has to scan the entire list it is 100% (1.00) * bgwriter_delay.

What this does is cause bgwriter to run more frequently when there are a lot of dirty buffers at the end of the LRU _and_ when the bgwriter scan will be quick. When there are few writes, bgwriter will run less frequently but will write dirty buffers nearer to the head of the LRU.

--
Bruce Momjian
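
[Editor's sketch] The scheduling rule proposed above can be written down in a few lines. The following is a minimal model under stated assumptions, not bgwriter code: the LRU list is reduced to an array of dirty flags with index 0 at the LRU end, and ScanResult and scan_and_schedule() are hypothetical names. The percent and delay_ms arguments play the roles of bgwriter_percent and bgwriter_delay as described in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    size_t scanned;        /* buffers examined before the scan stopped */
    size_t dirty_found;    /* dirty buffers seen (these would be written) */
    int    next_delay_ms;  /* when the next round would run */
} ScanResult;

/*
 * lru_dirty[0] is the LRU end of the list.  The scan stops once the number
 * of dirty buffers found reaches percent% of nbuffers, or at the end of the
 * list.  The next delay is (fraction of list scanned) * delay_ms.
 */
static ScanResult scan_and_schedule(const bool *lru_dirty, size_t nbuffers,
                                    int percent, int delay_ms)
{
    assert(nbuffers > 0 && percent > 0 && percent <= 100);

    ScanResult r = {0, 0, 0};
    size_t target = nbuffers * (size_t) percent / 100;
    if (target == 0)
        target = 1;        /* always look for at least one dirty buffer */

    while (r.scanned < nbuffers && r.dirty_found < target) {
        if (lru_dirty[r.scanned])
            r.dirty_found++;
        r.scanned++;
    }

    r.next_delay_ms = (int) ((long long) delay_ms * (long long) r.scanned
                             / (long long) nbuffers);
    return r;
}
```

With 20 buffers, a 10% target and a 200 ms base delay, finding two dirty buffers in the first two slots gives a 20 ms delay, while a completely clean list is scanned in full and keeps the 200 ms delay. The model also makes the objection raised in the follow-ups easy to see: the write rate is no longer bounded by a fixed pages-per-round figure.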
Re: [HACKERS] Bgwriter behavior
Bruce Momjian [EMAIL PROTECTED] writes:
> First, we remove the GUC bgwriter_maxpages because I don't see a good way to set a default for that. A default value needs to be based on a percentage of the full buffer cache size.

This is nonsense. The admin knows what he set shared_buffers to, and so maxpages and percent of shared buffers are not really distinct ways of specifying things. The cases that make a percent spec useful are if (a) it is a percent of a non-constant number (eg, percent of total dirty pages as in the current code), or (b) it is defined in a way that lets it limit the amount of scanning work done (which it isn't useful for in the current code). But a maxpages spec is useful for (b) too. More to the point, maxpages is useful to set a hard limit on the amount of I/O generated by the bgwriter, and I think people will want to be able to do that.

> Now, to control the bgwriter frequency we multiply the percent of the list it had to span by the bgwriter_delay value to determine when to run bgwriter next.

I'm less than enthused about this. The idea of the bgwriter is to trickle out writes in a way that doesn't affect overall performance too much. Not to write everything in sight at any cost.

I like the hybrid "keep the bottom of the ARC list clean, plus do a slow clock scan on the main buffer array" approach better. I can see that that directly impacts both of the goals that the bgwriter has. I don't see how a variable I/O rate really improves life on either score; it just makes things harder to predict.

regards, tom lane
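
[Editor's sketch] One way to picture the hybrid approach described above is a single bgwriter round with two tracks and a hard cap. This is only an illustration under assumptions, not a patch: Pool and bgwriter_round() are hypothetical names, the LRU order is passed in as a plain array of buffer indexes (LRU end first), "writing" a buffer is modelled by clearing its dirty flag, and maxpages stands in for the hard I/O limit discussed in the message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool   *dirty;       /* dirty[i] is the state of buffer i */
    size_t  nbuffers;
    size_t  clock_hand;  /* where the slow sweep resumes next round */
} Pool;

/* Runs one round and returns the number of buffers "written". */
static size_t bgwriter_round(Pool *p, const size_t *lru, size_t lru_depth,
                             size_t clock_step, size_t maxpages)
{
    assert(p != NULL && p->nbuffers > 0 && lru_depth <= p->nbuffers);
    size_t written = 0;

    /* Track 1: clean the buffers that backends will be handed next. */
    for (size_t i = 0; i < lru_depth && written < maxpages; i++) {
        size_t b = lru[i];
        if (p->dirty[b]) {
            p->dirty[b] = false;
            written++;
        }
    }

    /* Track 2: slow clock sweep over the whole array, to spread out the
     * work the next checkpoint would otherwise have to do. */
    for (size_t i = 0; i < clock_step && written < maxpages; i++) {
        size_t b = p->clock_hand;
        p->clock_hand = (p->clock_hand + 1) % p->nbuffers;
        if (p->dirty[b]) {
            p->dirty[b] = false;
            written++;
        }
    }
    return written;
}
```

Because both loops stop at maxpages, the I/O per round stays bounded and predictable, which is the property argued for above; the sweep position persists between rounds so every buffer is eventually visited.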
Re: [HACKERS] Bgwriter behavior
A quick $0.02 on how DB2 does this (at least in 7.x). They used a combination of everything that's been discussed. The first priority of their background writer was to keep the LRU end of the cache free so individual backends would never have to wait to get a page. Then, they would look to pages that had been dirty for 'a long time', which was user configurable. Pages older than this setting were candidates to be written out even if they weren't close to LRU. Finally, I believe there were also settings for how often the writer would fire up, and how much work it would do at once.

I agree that the first priority should be to keep clean pages near LRU, but that you also don't want to get hammered at checkpoint time. I think what might be interesting to consider is keeping a list of dirty pages, which would remove the need to scan a very large buffer. Of course, in an environment with a heavy update load, it could be better to just scan the buffers, especially if you don't do a clock-sweep but instead look at where the last page you wrote out has ended up in the LRU list since you last ran, and start scanning from there (by definition everything after that page would have to be clean). Of course this is just conjecture on my part and would need testing to verify, and it's obviously beyond the scope of 8.0.

As for 8.0, I suspect at this point it's probably best to just go with whatever method has the smallest amount of code impact unless it's inherently broken.

--
Jim C. Nasby, Database Consultant [EMAIL PROTECTED]
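
[Editor's sketch] The priority order described in this message (LRU-end pages first, then pages that have been dirty longer than a configurable age) can be expressed as a small classification function. This is a sketch of the idea as stated here, not DB2 or PostgreSQL code: Page, write_priority(), lru_window and max_dirty_age are hypothetical names, and times are plain seconds.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct {
    bool dirty;
    long dirtied_at;   /* time the page first became dirty, in seconds */
    int  lru_rank;     /* 0 = next page to be evicted */
} Page;

/*
 * Returns 2 for dirty pages within lru_window of the LRU end (first
 * priority), 1 for pages dirty for longer than max_dirty_age (second
 * priority), and 0 for pages the writer can leave alone for now.
 */
static int write_priority(const Page *pg, long now,
                          int lru_window, long max_dirty_age)
{
    assert(pg != NULL && lru_window >= 0 && max_dirty_age >= 0);
    if (!pg->dirty)
        return 0;
    if (pg->lru_rank < lru_window)
        return 2;
    if (now - pg->dirtied_at > max_dirty_age)
        return 1;
    return 0;
}
```

A writer built on this would take all priority-2 pages first and then spend whatever per-round budget remains on priority-1 pages, which covers both goals named in the thread: clean pages near the LRU end and less work left for the checkpoint.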
Re: [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
> > First, we remove the GUC bgwriter_maxpages because I don't see a good way to set a default for that. A default value needs to be based on a percentage of the full buffer cache size.
>
> This is nonsense. The admin knows what he set shared_buffers to, and so maxpages and percent of shared buffers are not really distinct ways of specifying things. The cases that make a percent spec useful are if (a) it is a percent of a non-constant number (eg, percent of total dirty pages as in the current code), or (b) it is defined in a way that lets it limit the amount of scanning work done (which it isn't useful for in the current code). But a maxpages spec is useful for (b) too. More to the point, maxpages is useful to set a hard limit on the amount of I/O generated by the bgwriter, and I think people will want to be able to do that.

I figured that if we specify a percentage, users would not need to update this value regularly if they increase their shared buffers. I agree that if you want to limit total I/O by the bgwriter, an actual page count is better, but I assumed we were looking for bgwriter to do a certain percentage of total writes. If the system is doing a lot of writes, then limiting the bgwriter doesn't help because the backends are going to have to do the writes themselves.

> > Now, to control the bgwriter frequency we multiply the percent of the list it had to span by the bgwriter_delay value to determine when to run bgwriter next.
>
> I'm less than enthused about this. The idea of the bgwriter is to trickle out writes in a way that doesn't affect overall performance too much. Not to write everything in sight at any cost.

No question my idea makes tuning difficult. I was hoping it would be self-tuning, but I am not sure.

> I like the hybrid "keep the bottom of the ARC list clean, plus do a slow clock scan on the main buffer array" approach better. I can see that that directly impacts both of the goals that the bgwriter has. I don't see how a variable I/O rate really improves life on either score; it just makes things harder to predict.

So what are we doing for 8.0?

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
Bruce Momjian [EMAIL PROTECTED] writes:
> So what are we doing for 8.0?

Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found. So maybe we should just bite the bullet and do something real about this.

I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?

At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.

regards, tom lane
Re: [HACKERS] Bgwriter behavior
> At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.

If it is any help, CMD is basically dead right now and I expect it will be that way until the new year. 4 of my 5 C programmers are on vacation, but I do have one, plus a couple of non-C programmers. We can't fix, but we can definitely help test.

Sincerely,
Joshua D. Drake

--
Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC
PostgreSQL support, programming, shared hosting and dedicated hosting.
+1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com
Re: [HACKERS] Bgwriter behavior
Tom Lane wrote:
> Bruce Momjian [EMAIL PROTECTED] writes:
> > So what are we doing for 8.0?
>
> Well, it looks like RC2 has already crashed and burned --- I can't imagine that Marc will let us release without an RC3 given what was committed today, never mind the btree bug that Mark Wong seems to have found. So maybe we should just bite the bullet and do something real about this.

Oh, is it that bad?

> I'm willing to code up a proposed patch for the two-track idea I suggested, and if anyone else has a favorite maybe they could write something too. But do we have the resources to test such patches and make a decision in the next few days?
>
> At the moment my inclination is to sit on what we have. I've not seen any indication that 8.0 is really worse than earlier releases; the most you could argue against it is that it's not as much better as we hoped. That's not grounds to muck around at the RC3 stage.

That was my question. It seems bgwriter is fine for low to medium traffic but doesn't handle high traffic, and increasing the scan rate makes things worse. I am fine with doing nothing, but if we are going to do something, I would like to do it now rather than later.

The only way I could see it being worse than pre-8.0 is that the bgwriter is doing fsync of all open files rather than using sync. Other than that, I think it should behave the same, or slightly better, right?

--
Bruce Momjian
Re: [HACKERS] Bgwriter behavior
Bruce Momjian [EMAIL PROTECTED] writes:
> The only way I could see it being worse than pre-8.0 is that the bgwriter is doing fsync of all open files rather than using sync. Other than that, I think it should behave the same, or slightly better, right?

It's possible that there exist platforms on which this is a loss --- that is, the OS's handling of fsync is so inefficient that multiple fsync calls are worse than one sync call even though less I/O is forced. But I haven't seen any actual evidence of that; and if such platforms do exist I'm not sure I'd blink anyway. We are not required to optimize for brain-dead kernels.

regards, tom lane